diff --git a/.gitignore b/.gitignore new file mode 100644 index 0000000000000000000000000000000000000000..ba0430d26c996e7f078385407f959c96c271087c --- /dev/null +++ b/.gitignore @@ -0,0 +1 @@ +__pycache__/ \ No newline at end of file diff --git a/.gitmodules b/.gitmodules new file mode 100644 index 0000000000000000000000000000000000000000..3dad1fe94b0ebcd12bd55052a2842990b23d7b2a --- /dev/null +++ b/.gitmodules @@ -0,0 +1,25 @@ +[submodule "SpeechT5/fairseq"] + path = SpeechT5/fairseq + url = https://github.com/pytorch/fairseq +[submodule "Speech2C/fairseq"] + path = Speech2C/fairseq + url = https://github.com/facebookresearch/fairseq.git +[submodule "YiTrans/fairseq"] + path = YiTrans/fairseq + url = https://github.com/facebookresearch/fairseq +[submodule "SpeechLM/fairseq"] + path = SpeechLM/fairseq + url = https://github.com/facebookresearch/fairseq.git +[submodule "SpeechUT/fairseq"] + path = SpeechUT/fairseq + url = https://github.com/facebookresearch/fairseq.git +[submodule "VATLM/fairseq"] + path = VATLM/fairseq + url = https://github.com/facebookresearch/fairseq.git +[submodule "Speech2S/fairseq"] + path = Speech2S/fairseq + url = https://github.com/facebookresearch/fairseq.git + branch = adding_womenbios +[submodule "WavLLM/fairseq"] + path = WavLLM/fairseq + url = https://github.com/pytorch/fairseq.git diff --git a/CODE_OF_CONDUCT.md b/CODE_OF_CONDUCT.md new file mode 100644 index 0000000000000000000000000000000000000000..f9ba8cf65f3e3104dd061c178066ec8247811f33 --- /dev/null +++ b/CODE_OF_CONDUCT.md @@ -0,0 +1,9 @@ +# Microsoft Open Source Code of Conduct + +This project has adopted the [Microsoft Open Source Code of Conduct](https://opensource.microsoft.com/codeofconduct/). + +Resources: + +- [Microsoft Open Source Code of Conduct](https://opensource.microsoft.com/codeofconduct/) +- [Microsoft Code of Conduct FAQ](https://opensource.microsoft.com/codeofconduct/faq/) +- Contact [opencode@microsoft.com](mailto:opencode@microsoft.com) with questions or concerns diff --git a/LICENSE b/LICENSE new file mode 100644 index 0000000000000000000000000000000000000000..9e841e7a26e4eb057b24511e7b92d42b257a80e5 --- /dev/null +++ b/LICENSE @@ -0,0 +1,21 @@ + MIT License + + Copyright (c) Microsoft Corporation. + + Permission is hereby granted, free of charge, to any person obtaining a copy + of this software and associated documentation files (the "Software"), to deal + in the Software without restriction, including without limitation the rights + to use, copy, modify, merge, publish, distribute, sublicense, and/or sell + copies of the Software, and to permit persons to whom the Software is + furnished to do so, subject to the following conditions: + + The above copyright notice and this permission notice shall be included in all + copies or substantial portions of the Software. + + THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. 
IN NO EVENT SHALL THE
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ SOFTWARE
diff --git a/README.md b/README.md
index e69de29bb2d1d6434b8b29ae775ad8c2e48c5391..49d15c926a75feb4dea63da0e3a5952cd59304b4 100644
--- a/README.md
+++ b/README.md
@@ -0,0 +1,209 @@
+# SpeechT5
+
+## Paper
+
+ - https://arxiv.org/abs/2110.07205
+
+
+## Source Code
+
+ - https://github.com/microsoft/SpeechT5
+
+
+## Model Architecture
+
+At its core, SpeechT5 is a standard **Transformer encoder-decoder**. So that the same Transformer can handle both text and speech, **pre-nets** and **post-nets** are added: a **pre-net** converts the input text or speech into the hidden representation consumed by the Transformer, and a **post-net** takes the Transformer output and converts it back into text or speech.
+
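+Below is a minimal, hedged usage sketch of this pre-net/post-net design using the Hugging Face `transformers` port of SpeechT5 (the `microsoft/speecht5_tts` and `microsoft/speecht5_hifigan` checkpoints are the public Hugging Face ones, not the weights used by the training recipe in this repository):
+
+```python
+# Illustrative TTS sketch only; not part of this repo's DCU training scripts.
+import torch
+from transformers import SpeechT5Processor, SpeechT5ForTextToSpeech, SpeechT5HifiGan
+
+processor = SpeechT5Processor.from_pretrained("microsoft/speecht5_tts")    # text pre-net tokenizer
+model = SpeechT5ForTextToSpeech.from_pretrained("microsoft/speecht5_tts")  # shared encoder-decoder + speech post-net
+vocoder = SpeechT5HifiGan.from_pretrained("microsoft/speecht5_hifigan")    # mel-spectrogram -> waveform
+
+inputs = processor(text="Hello, SpeechT5.", return_tensors="pt")
+speaker_embeddings = torch.zeros(1, 512)  # placeholder x-vector; a real one comes from a speaker encoder
+speech = model.generate_speech(inputs["input_ids"], speaker_embeddings, vocoder=vocoder)
+print(speech.shape)  # 1-D waveform tensor at 16 kHz
+```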
+
+## How It Works
+
+During pre-training, all of the pre-nets and post-nets are used together. After pre-training, the entire encoder-decoder backbone is fine-tuned on a single task, and the fine-tuned model only keeps the pre-nets and post-nets specific to that task. *For example, to use SpeechT5 for text-to-speech, you plug in the text encoder pre-net for the text input and the speech decoder pre-net and post-net for the speech output.*
+
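+For the ASR direction that this repository fine-tunes, the corresponding hedged sketch with the Hugging Face port looks like the following (the `microsoft/speecht5_asr` checkpoint is the public Hugging Face one, not the `speecht5_base.pt` weights used below; the silent waveform is a stand-in for real 16 kHz audio):
+
+```python
+# Illustrative ASR inference sketch only; training on DCU uses the fairseq recipe below.
+import numpy as np
+from transformers import SpeechT5Processor, SpeechT5ForSpeechToText
+
+processor = SpeechT5Processor.from_pretrained("microsoft/speecht5_asr")   # speech pre-net features + char tokenizer
+model = SpeechT5ForSpeechToText.from_pretrained("microsoft/speecht5_asr")
+
+waveform = np.zeros(16000, dtype=np.float32)  # one second of silence as placeholder input
+inputs = processor(audio=waveform, sampling_rate=16000, return_tensors="pt")
+predicted_ids = model.generate(**inputs, max_length=100)
+print(processor.batch_decode(predicted_ids, skip_special_tokens=True)[0])
+```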
+
+**Note: even though the fine-tuned models start out from the same set of weights as the shared pre-trained model, the final versions end up quite different. For example, you cannot take a fine-tuned ASR model and swap out its pre-nets and post-nets to get a working TTS model. SpeechT5 is flexible, but not that flexible.**
+
+
+## Environment Setup
+
+### Docker (Method 1)
+**Note: adjust the path arguments below to your own environment.**
+
+```
+docker pull image.sourcefind.cn:5000/dcu/admin/base/pytorch:2.1.0-ubuntu20.04-dtk24.04.1-py3.10
+
+docker run -it --network=host --ipc=host --name=your_container_name --shm-size=32G --device=/dev/kfd --device=/dev/mkfd --device=/dev/dri -v /opt/hyhal:/opt/hyhal:ro -v /path/your_code_data/:/path/your_code_data/ --group-add video --cap-add=SYS_PTRACE --security-opt seccomp=unconfined image.sourcefind.cn:5000/dcu/admin/base/pytorch:2.1.0-centos7.6-dtk24.04-py310 /bin/bash
+
+# Install the fairseq dependency separately
+git clone -b 1.0.0a0 http://developer.hpccube.com/codes/OpenDAS/fairseq.git
+cd fairseq
+pip3 install --editable ./
+
+# Install the remaining project dependencies
+cd speechT5_pytorch
+pip3 install -r requirements.txt -i https://mirrors.aliyun.com/pypi/simple/
+```
+
+### Dockerfile (Method 2)
+
+```
+cd ./docker
+docker build --no-cache -t speecht5 .
+docker run -it -v /path/your_code_data/:/path/your_code_data/ --shm-size=32G --privileged=true --device=/dev/kfd --device=/dev/dri/ --group-add video --name docker_name imageID bash
+
+git clone -b 1.0.0a0 http://developer.hpccube.com/codes/OpenDAS/fairseq.git
+cd fairseq
+pip3 install --editable ./
+
+cd speechT5_pytorch
+pip3 install -r requirements.txt
+```
+
+### Anaconda (Method 3)
+
+1. The DCU-specific deep-learning libraries required by this project can be downloaded from the developer community: https://developer.hpccube.com/tool/
+
+```
+DTK stack: dtk24.04
+python: python3.10
+torch: 2.1.0
+torchvision: 0.16.0
+```
+
+Tips: the DTK stack, Python, torch, and other DCU-related tool versions listed above must match each other exactly.
+
+2. The fairseq library needs to be installed separately, for example:
+```
+git clone -b 1.0.0a0 http://developer.hpccube.com/codes/OpenDAS/fairseq.git
+cd fairseq
+pip3 install --editable ./
+```
+
+3. Install the remaining (ordinary) dependencies from requirements.txt:
+
+```
+cd speechT5_pytorch
+pip3 install -r requirements.txt
+```
+
+## Dataset
+**On the internal network, the dataset can be copied from: /public/home/changhl/dataset/LibriSpeech**
+- SCnet fast download link:
+  - [librispeech_asr dataset](http://113.200.138.88:18080/aidatasets/librispeech_asr_dummy)
+
+- Official download link:
+  - [librispeech_asr dataset](http://www.openslr.org/12)
+
+
+`librispeech_asr` is a speech-recognition dataset consisting of audio files and their text transcriptions. The audio files are stored in flac format and the transcription files in txt format.
+
+```
+LibriSpeech
+├── train-clean-100
+│   ├── 1272
+│   │   ├── 1272-128104
+│   │   │   ├── 1272-128104-0000.flac
+│   │   │   ├── 1272-128104-0001.flac
+│   │   │   ├── 1272-128104-0002.flac
+│   │   │   ├── 1272-128104-0003.flac
+│   │   │   ├── ...
+│   │   │   ├── 1272-128104.trans.txt
+│   │   └── ...
+│   └── ...
+├── train-clean-360
+├── train-other-500
+├── dev-clean
+├── dev-other
+├── test-clean
+└── test-other
+```
+
+ - `train-clean-100`: about 100 hours of clean speech.
+ - `1272`: speaker ID (1272).
+ - `1272-128104`: speaker ID (1272) - chapter ID (128104).
+ - `1272-128104-0000.flac`: audio file for speaker ID (1272) - chapter ID (128104) - utterance ID (0000).
+ - `1272-128104.trans.txt`: transcription file for speaker ID (1272) - chapter ID (128104).
+
+## Pre-trained Models
+**Before ASR training, first download the pre-trained weights, the SPM_TOKENIZER, and the dictionary files.**
+- Official download links:
+  - [SpeechT5 initial weights](https://huggingface.co/ajyy/SpeechT5/resolve/main/speecht5_base.pt)
+  - [SPM_TOKENIZER download](https://drive.google.com/uc?export=download&id=1wClgQjXXoU2lmpbaEa1v2SqMbg7cAutq)
+  - [Dictionary](https://huggingface.co/ajyy/SpeechT5/resolve/main/speecht5_base.pt)
+
+
+
+## Training
+
+### ASR Training
+**Step 1: generate the train.tsv and valid.tsv manifest files for the training set**
+```
+cd speecht5_pytorch/SpeechT5/fairseq
+python examples/wav2vec/wav2vec_manifest.py dataset/LibriSpeech/dev-clean --dest /public/home/changhl/py_project/train_0910 --ext flac --valid-percent 0.1
+# python wav2vec_manifest.py <dataset dir> --dest <output dir for the tsv files> --ext <audio file type (flac)> --valid-percent 0.1
+# dev-clean is used as the dataset directory here because training on everything takes a long time; you can use the full LibriSpeech dataset instead.
+```
+
+**Step 2: generate the label files (train.txt and valid.txt) for the tsv files produced in Step 1**
+```
+cd speecht5_pytorch/SpeechT5/fairseq
+python examples/wav2vec/libri_labels.py /public/home/changhl/py_project/train_0910/train.tsv --output-dir /public/home/changhl/py_project/train_0910 --output-name train
+python examples/wav2vec/libri_labels.py /public/home/changhl/py_project/train_0910/valid.tsv --output-dir /public/home/changhl/py_project/train_0910 --output-name valid
+# python libri_labels.py <path to the tsv file> \
+#   --output-dir <output dir for the label files> \
+#   --output-name <output file name (same as the tsv file name)>
+```
+
+**Step 3: move the dictionary file**
+Move the downloaded dict.txt dictionary file into the same directory as train.tsv.
+
+**Step 4: train**
+```
+cd speecht5_pytorch
+export HIP_VISIBLE_DEVICES=0 # set the visible devices as needed
+bash asr_train.sh \
+ --dcu 1 \
+ --log /public/home/changhl/py_project/SpeechT5/log \
+ --td /public/home/changhl/py_project/train_0910 \
+ --res /public/home/changhl/py_project/SpeechT5/res \
+ --lab /public/home/changhl/py_project/train_0910 \
+ --token /public/home/changhl/dataset/spm_char.model \
+ --speecht5 /public/home/changhl/py_project/SpeechT5/SpeechT5/speecht5 \
+ --checkpoint /public/home/changhl/dataset/speecht5_base.pt \
+ --epoch 3
+# run with -h for a description of all arguments
+```
+ - `dcu`: number of DCU cards used for training
+ - `log`: directory where training logs are saved
+ - `td`: directory containing train.tsv and valid.tsv
+ - `res`: directory where the resulting .pt checkpoints are saved
+ - `lab`: directory containing train.txt and valid.txt
+ - `token`: path to the downloaded spm_char.model file (the SPM_TOKENIZER)
+ - `speecht5`: absolute path of the /speecht5_pytorch/SpeechT5/speecht5 directory
+ - `checkpoint`: path to the pre-trained initial weights
+ - `epoch`: number of training epochs
+
+
+## Application Scenarios
+
+### Algorithm Category
+```
+Speech recognition
+```
+
+### Key Application Industries
+```
+Finance, telecommunications, broadcast media
+```
+
+## Source Repository and Issue Feedback
+
+https://developer.hpccube.com/codes/modelzoo/SpeechT5_pytorch
+
+## References
+
+[GitHub - microsoft/SpeechT5](https://github.com/microsoft/SpeechT5/tree/main/SpeechT5)
diff --git a/README_ori.md b/README_ori.md
new file mode 100644
index 0000000000000000000000000000000000000000..38ba8779169478628811ff1e9199d9325b68cac2
--- /dev/null
+++ b/README_ori.md
@@ -0,0 +1,271 @@
+# SpeechT5
+
+Unified-modal speech-text pre-training for spoken language processing:
+
+> [**SpeechT5**](https://arxiv.org/abs/2110.07205) (```ACL 2022```): **SpeechT5: Unified-Modal Encoder-Decoder Pre-training for Spoken Language Processing**
+
+> [**Speech2C**](https://arxiv.org/abs/2203.17113) (```INTERSPEECH 2022```): **Pre-Training Transformer Decoder for End-to-End ASR Model with Unpaired Speech Data**
+
+> [**YiTrans**](https://arxiv.org/abs/2206.05777) (```IWSLT 2022```): **The YiTrans
End-to-End Speech Translation System for IWSLT 2022 Offline Shared Task** + +> [**SpeechUT**](https://arxiv.org/abs/2210.03730) (```EMNLP 2022```): **SpeechUT: Bridging Speech and Text with Hidden-Unit for Encoder-Decoder Based Speech-Text Pre-training** + +> [**SpeechLM**](https://arxiv.org/abs/2209.15329) (```IEEE/ACM TASLP```): **SpeechLM: Enhanced Speech Pre-Training with Unpaired Textual Data** + +> [**Speech2S**](https://arxiv.org/abs/2210.17027) (```ICASSP 2023```): **Joint Pre-Training with Speech and Bilingual Text for Direct Speech to Speech Translation** + +> [**Prosody-SpeechT5**](https://ieeexplore.ieee.org/document/10096530/) (```ICASSP 2023```): **Prosody-aware SpeechT5 for Expressive Neural TTS** + +> [**VATLM**](https://arxiv.org/abs/2211.11275) (```IEEE Transactions on Multimedia```): **VATLM: Visual-Audio-Text Pre-Training with Unified Masked Prediction for Speech Representation Learning** + +> [**VALL-E X**](https://arxiv.org/abs/2303.03926) (```Arxiv 2023```): **Speak Foreign Languages with Your Own Voice: Cross-Lingual Neural Codec Language Modeling** + +> [**VioLA**](https://arxiv.org/abs/2305.16107) (```Arxiv 2023```): **VioLA: Unified Codec Language Models for Speech Recognition, Synthesis, and Translation** + +> [**WavLLM**](https://arxiv.org/abs/2404.00656) (```Arxiv 2024```): **WavLLM: Towards Robust and Adaptive Speech Large Language Model** + + + + +## Update + +- April, 2024: WavLLM [**Arxiv**](https://arxiv.org/abs/2404.00656). +- March, 2024: [**SpeechLM**](https://arxiv.org/abs/2209.15329) was accepted by IEEE/ACM Transactions on Audio, Speech, and Language Processing. +- May, 2023: VioLA [**Arxiv**](https://arxiv.org/abs/2305.16107). +- May, 2023: [**VATLM**](https://arxiv.org/abs/2211.11275) was accepted by IEEE Transactions on Multimedia. +- March, 2023: VALL-E X [**Arxiv**](https://arxiv.org/abs/2303.03926) and [**Demo**](https://aka.ms/vallex). +- February, 2023: [**Speech2S**](https://arxiv.org/abs/2210.17027) and [**Prosody-SpeechT5**](https://arxiv.org/abs/2211.11275) were accepted by ICASSP 2023. +- [HuggingFace Integration] February, 2023: [**SpeechT5**](https://aclanthology.org/2022.acl-long.393/) models are on [**HuggingFace**](https://huggingface.co/blog/speecht5). +- [Model Release] November, 2022: [**VATLM**](https://github.com/microsoft/SpeechT5/tree/main/VATLM) models are released. +- November, 2022: VATLM [**Arxiv**](https://arxiv.org/abs/2211.11275). +- November, 2022: Speech2S [**Arxiv**](https://arxiv.org/abs/2210.17027). +- [Model Release] October, 2022: [**SpeechUT**](https://github.com/microsoft/SpeechT5/tree/main/SpeechUT) models are released. +- October, 2022: [**SpeechUT**](https://arxiv.org/abs/2210.03730) was accepted by EMNLP 2022. +- [Model Release] October, 2022: [**SpeechLM**](https://github.com/microsoft/SpeechT5/tree/main/SpeechLM) models are released. +- September, 2022: SpeechLM [**Arxiv**](https://arxiv.org/abs/2209.15329). +- [Evaluation] June, 2022: The end-to-end ST system [**YiTrans**](https://arxiv.org/abs/2206.05777) achieved top results on [**IWSLT 2022**](https://iwslt.org/2022/offline) shared tasks. +- June, 2022: [**Speech2C**](https://www.isca-speech.org/archive/interspeech_2022/ao22_interspeech.html) was accepted by InterSpeech 2022. +- [Model Release] May, 2022: [**Speech2C**](https://github.com/microsoft/SpeechT5/tree/main/Speech2C) models are released. +- [Model Release] April, 2022: [**SpeechT5**](https://github.com/microsoft/SpeechT5/tree/main/SpeechT5) models are released. 
+- March, 2022: Speech2C [**Arxiv**](https://arxiv.org/abs/2203.17113). +- February, 2022: [**SpeechT5**](https://aclanthology.org/2022.acl-long.393/) was accepted by ACL 2022. +- October, 2021: SpeechT5 [**Arxiv**](https://arxiv.org/abs/2110.07205). + + +## Pre-Trained Models + + +| Model | Pre-training Dataset | Fine-tuning Dataset | Model | +| :------: | :----------------------------------------------: | :-----------------: | :-----: | +| SpeechT5 Base | [960 hrs LibriSpeech](http://www.openslr.org/12) + [LibriSpeech LM Dataset](https://www.openslr.org/11/) | - | [HuggingFace](https://huggingface.co/ajyy/SpeechT5/resolve/main/speecht5_base.pt)
[Google Drive](https://drive.google.com/file/d/1Sq00uZ1pw6Z4OUaqhOWzQEJxIVWgAO5U/view?usp=sharing) | +| SpeechT5 Base | [960 hrs LibriSpeech](http://www.openslr.org/12) + [LibriSpeech LM Dataset](https://www.openslr.org/11/) | [100 hrs LibriSpeech](http://www.openslr.org/12) | [HuggingFace](https://huggingface.co/ajyy/SpeechT5/resolve/main/speecht5_base_asr.pt)
[Google Drive](https://drive.google.com/file/d/1qLKJ81JPWOGf1MHfjSmgtZyqqTqgI6kT/view?usp=sharing) | +| SpeechT5 Large | [60k hrs Libri-Light](https://github.com/facebookresearch/libri-light) + [LibriSpeech LM Dataset](https://www.openslr.org/11/) | - | [Google Drive](https://drive.google.com/file/d/1M79b1jetSPOVxWVMIX-y0URvDjNskZKp/view?usp=sharing) | +| Speech2C | [960 hrs LibriSpeech](http://www.openslr.org/12) | - | [Google Drive](https://drive.google.com/file/d/1nGZ0LWEwlLq2pz7o805YALsMr9irV0Za/view?usp=sharing) | +| Speech2C | [960 hrs LibriSpeech](http://www.openslr.org/12) | [10 hrs LibriSpeech](http://www.openslr.org/12) | [Google Drive](https://drive.google.com/file/d/1nWSAc-33LmcDQHzH8IjXVJsuk0JZTWgN/view?usp=sharing) | +| Speech2C | [960 hrs LibriSpeech](http://www.openslr.org/12) | [100 hrs LibriSpeech](http://www.openslr.org/12) | [Google Drive](https://drive.google.com/file/d/1LwbQ5Y3tKZoK3s1ayLQgsfLTFnmkKNZs/view?usp=sharing) | +| SpeechLM-P Base | [960 hrs LibriSpeech](http://www.openslr.org/12) + [40M Text](http://www.openslr.org/11) | - | [Google drive](https://drive.google.com/file/d/1iJvhSGghNrMT-wAY1nwVu2YaYuTy1pxx/view?usp=sharing) | +| SpeechLM-P Base | [960 hrs LibriSpeech](http://www.openslr.org/12) + [40M Text](http://www.openslr.org/11) | [100 hrs LibriSpeech](http://www.openslr.org/12) | [Google drive](https://drive.google.com/file/d/1mH3N7iKMWYk3rSBJErQPYf3x5ugqDq5x/view?usp=sharing) | +| SpeechLM-H Base | [960 hrs LibriSpeech](http://www.openslr.org/12) + [40M Text](http://www.openslr.org/11) | - | [Google drive](https://drive.google.com/file/d/1eblW8U8f9t-NTuCNRrNHwr-8BeLAUAmQ/view?usp=sharing) | +| SpeechLM-H Base | [960 hrs LibriSpeech](http://www.openslr.org/12) + [40M Text](http://www.openslr.org/11) | [100 hrs LibriSpeech](http://www.openslr.org/12) | [Google drive](https://drive.google.com/file/d/1vXyO5DolbiWiTYZ6pkkKQsu2wJetaPlv/view?usp=sharing) | +| SpeechLM-P Base | [960 hrs LibriSpeech](http://www.openslr.org/12) + [40M Text](http://www.openslr.org/11) | [En-De CoVoST-2](https://github.com/facebookresearch/covost) | [Azure Storage] | +| SpeechLM-P Base | [960 hrs LibriSpeech](http://www.openslr.org/12) + [40M Text](http://www.openslr.org/11) | [En-Ca CoVoST-2](https://github.com/facebookresearch/covost) | [Azure Storage] | +| SpeechLM-P Base | [960 hrs LibriSpeech](http://www.openslr.org/12) + [40M Text](http://www.openslr.org/11) | [En-Ar CoVoST-2](https://github.com/facebookresearch/covost) | [Azure Storage] | +| SpeechLM-P Base | [960 hrs LibriSpeech](http://www.openslr.org/12) + [40M Text](http://www.openslr.org/11) | [En-Tr CoVoST-2](https://github.com/facebookresearch/covost) | [Azure Storage] | +| SpeechLM-P Large | [60k hrs LibriLight](https://github.com/facebookresearch/libri-light) + [40M Text](http://www.openslr.org/11) | - | [Google drive](https://drive.google.com/file/d/1QjLIgTJKIylVIp5hUkfSjGPtz8Xo7Lky/view?usp=sharing) | +| SpeechLM-P Large | [60k hrs LibriLight](https://github.com/facebookresearch/libri-light) + [40M Text](http://www.openslr.org/11) | [960 hrs LibriSpeech](http://www.openslr.org/12) | [Google drive](https://drive.google.com/file/d/1YZQDVv096o8Opt0RBnkRiZXYPRDqKZnP/view?usp=sharing) | +| SpeechLM-P Large | [60k hrs LibriLight](https://github.com/facebookresearch/libri-light) + [40M Text](http://www.openslr.org/11) | [En-De CoVoST-2](https://github.com/facebookresearch/covost) | [Google drive](https://drive.google.com/file/d/1qYygNWSc11TQbBI1OzC4ChlR-dNh8t9S/view?usp=sharing) | +| SpeechLM-P Large | [60k hrs 
LibriLight](https://github.com/facebookresearch/libri-light) + [40M Text](http://www.openslr.org/11) | [En-Ca CoVoST-2](https://github.com/facebookresearch/covost) | [Google drive](https://drive.google.com/file/d/162U88mwso2aVfzzPkEM2nP_vwTpcb57T/view?usp=sharing) | +| SpeechLM-P Large | [60k hrs LibriLight](https://github.com/facebookresearch/libri-light) + [40M Text](http://www.openslr.org/11) | [En-Ar CoVoST-2](https://github.com/facebookresearch/covost) | [Google drive](https://drive.google.com/file/d/1lbTSRXewEeb2t45URunD6EiJcbniyjWW/view?usp=sharing) | +| SpeechLM-P Large | [60k hrs LibriLight](https://github.com/facebookresearch/libri-light) + [40M Text](http://www.openslr.org/11) | [En-Tr CoVoST-2](https://github.com/facebookresearch/covost) | [Google drive](https://drive.google.com/file/d/1Er4I_jHS175pQQph223yKtiiLQ378VvH/view?usp=sharing) | +| SpeechUT Base (ASR) | [960 hrs LibriSpeech](http://www.openslr.org/12) + [40M Text](http://www.openslr.org/11) | - | [Azure Storage]| +| SpeechUT Base (ASR) | [960 hrs LibriSpeech](http://www.openslr.org/12) + [40M Text](http://www.openslr.org/11) | [100 hrs LibriSpeech](http://www.openslr.org/12) | [Azure Storage]| +| SpeechUT Large (ASR) | [60k hrs LibriSpeech](http://www.openslr.org/12) + [40M Text](http://www.openslr.org/11) | - | [Azure Storage]| +| SpeechUT Large (ASR) | [60k hrs LibriSpeech](http://www.openslr.org/12) + [40M Text](http://www.openslr.org/11) | [960 hrs LibriSpeech](http://www.openslr.org/12) | [Azure Storage]| +| SpeechUT Base (En-De) | [960 hrs LibriSpeech](http://www.openslr.org/12) + [408 hrs MuST-C v1](https://ict.fbk.eu/must-c/) + [4.6M Text](https://www.statmt.org/wmt16/) | - | [Azure Storage]| +| SpeechUT Base (En-De) | [960 hrs LibriSpeech](http://www.openslr.org/12) + [408 hrs MuST-C v1](https://ict.fbk.eu/must-c/) + [4.6M Text](https://www.statmt.org/wmt16/) | [En-De MuST-C v1](https://ict.fbk.eu/must-c/) | [Azure Storage]| +| SpeechUT Base (En-Es) | [960 hrs LibriSpeech](http://www.openslr.org/12) + [504 hrs MuST-C v1](https://ict.fbk.eu/must-c/) + [15M Text](https://www.statmt.org/wmt13/) | - | [Azure Storage]| +| SpeechUT Base (En-Es) | [960 hrs LibriSpeech](http://www.openslr.org/12) + [504 hrs MuST-C v1](https://ict.fbk.eu/must-c/) + [15M Text](https://www.statmt.org/wmt13/) | [En-Es MuST-C v1](https://ict.fbk.eu/must-c/) | [Azure Storage]| +| SpeechUT Base (En-Fr) | [960 hrs LibriSpeech](http://www.openslr.org/12) + [492 hrs MuST-C v1](https://ict.fbk.eu/must-c/) + [40M Text](https://www.statmt.org/wmt14/) | - | [Azure Storage]| +| SpeechUT Base (En-Fr) | [960 hrs LibriSpeech](http://www.openslr.org/12) + [492 hrs MuST-C v1](https://ict.fbk.eu/must-c/) + [40M Text](https://www.statmt.org/wmt14/) | [En-Fr MuST-C v1](https://ict.fbk.eu/must-c/) | [Azure Storage]| + + + +## SpeechT5 Introduction + +Motivated by the success of T5 (Text-To-Text Transfer Transformer) in pre-trained natural language processing models, we propose a unified-modal SpeechT5 framework that explores the encoder-decoder pre-training for self-supervised speech/text representation learning. +The SpeechT5 framework consists of a shared encoder-decoder network and six modal-specific (speech/text) pre/post-nets. +After preprocessing the input speech/text through the pre-nets, the shared encoder-decoder network models the sequence-to-sequence transformation, and then the post-nets generate the output in the speech/text modality based on the output of the decoder. 
+ +se + +Leveraging large-scale unlabeled speech and text data, we pre-train SpeechT5 to learn a unified-modal representation, hoping to improve the modeling capability for both speech and text. +To align the textual and speech information into this unified semantic space, we propose a cross-modal vector quantization approach that randomly mixes up speech/text states with latent units as the interface between encoder and decoder. +Extensive evaluations show the superiority of the proposed SpeechT5 framework on a wide variety of spoken language processing tasks, including automatic speech recognition, speech synthesis, speech translation, voice conversion, speech enhancement, and speaker identification. + + + +## SpeechT5 Downstream Task Performance + +We evaluate our models on typical spoken language processing tasks, including automatic speech recognition, text to speech, speech to text translation, voice conversion, speech enhancement, and speaker identification. + +### Automatic Speech Recognition + +Evaluation on the [LibriSpeech](http://www.openslr.org/12) + +| Model |LM | dev-clean | dev-other | test-clean | test-other | +| ------------- |------------- | ------| ----- | ----| ----| +| wav2vec2.0 Base | - | 6.1 | 13.5 | 6.1 | 13.3 | +| HuBERT Base | - | 5.5 | 13.1 | 5.8 | 13.3 | +| Baseline (w/o CTC) | - | 5.8 | 12.3 | 6.2 | 12.3 | +| Baseline | - | 4.9 | 11.7 | 5.0 | 11.9 | +| SpeechT5 (w/o CTC) | - | 5.4 | 10.7 | 5.8 | 10.7 | +| **SpeechT5** | - | **4.3** | **10.3** | **4.4** | **10.4** | +| DiscreteBERT | 4-gram | 4.0 |10.9 |4.5 |12.1 | +| wav2vec 2.0 Base | 4-gram | 2.7 |7.9 |3.4 |8.0 | +| HuBERT Base | 4-gram | 2.7 |7.8 |3.4 |8.1 | +| wav2vec 2.0 Base | Transf. | 2.2 |6.3 |2.6 |6.3 | +| Baseline | Transf. | 2.3 |6.3 |2.5 |6.3 | +| **SpeechT5** | Transf. 
| **2.1** |**5.5** |**2.4** |**5.8** | + +### Text-to-Speech + +Evaluation on the [LibriTTS](http://www.openslr.org/60/) + + +| Model | Naturalness | MOS | CMOS | +| ------------- |------------ | ------ | ----- | +| Ground Truth | - | 3.87 | - | +| Baseline | 2.76 | 3.56 | 0 | +| **SpeechT5** | 2.91 | **3.65** | **+0.290** | + +### Speech Translation + +Evaluation on the [MUST-C v1](https://ict.fbk.eu/must-c/) + +| Model | EN-DE | EN-FR | +| ------------- |------------ | ------ | +| Fairseq ST | 22.70 | 32.90 | +| ESPnet ST | 22.91 | 32.69 | +| Adapter Tuning| 24.63 | 34.98 | +| Baseline | 23.43 | 33.76 | +| SpeechT5 (w/o initializing decoder) | 24.44 | 34.5 | +| **SpeechT5** | **25.18** | **35.30** | + + +### Voice Conversion + +Evaluation on the [CMU Arctic](http://www.festvox.org/cmu_arctic/) + + +| Model | WER | WER | MCD | MCD | +| ------------- | ------ | ----- | ---- | ----| +| | bdl to slt | clb to slt | bdl to slt | clb to slt | +| VTN w/ ASR | 11.1 | 10.9 | 6.5 | 6.11 | +| VTN w/ TTS | 7.6 | 9.1 | 6.33 | 13.3 | +| Many-to-many VTN | - | - | 6.13 | 5.97 | +| Baseline | 21.5 | 10.8 | 6.26 | 6.16 | +| **SpeechT5** | **7.8** | **6.4** | **5.93**| **5.87** | + + + +### Speech Enhancement + +Evaluation on the [WSJ0 Hipster AmbientMixtures (WHAM!)](http://wham.whisper.ai/) + + +| Model | WER | +| ------------- |------------ | +| Ground Truth Speech | 3.2 | +| Noisy Speech | 76.1 | +| Baseline | 10.9 | +| **SpeechT5** | **8.9** | + + +### Speaker Identification + +Evaluation on the [VoxCeleb1](https://www.robots.ox.ac.uk/~vgg/data/voxceleb/vox1.html) + +| Model | Acc | +| ------------- |------------ | +| SUPERB, wav2vec 2.0 Base | 75.18% | +| SUPERB, HuBERT Base | 81.42% | +| SUPERB, HuBERT Large | 90.33% | +| SpeechNet, single task | 86.00% | +| SpeechNet, multi-task with TTS | 87.90% | +| Thin ResNet-34 | 89.00% | +| Baseline | 91.92% | +| **SpeechT5** | **96.49%** | + +## License + +This project is licensed under the license found in the LICENSE file in the root directory of this source tree. +Portions of the source code are based on the [FAIRSEQ](https://github.com/pytorch/fairseq) and [ESPnet](https://github.com/espnet/espnet) projects. 
+ +[Microsoft Open Source Code of Conduct](https://opensource.microsoft.com/codeofconduct) + +### Reference + +If you find our work is useful in your research, please cite the following paper: + +```bibtex +@article{Ao2021SpeechT5, + title = {SpeechT5: Unified-Modal Encoder-Decoder Pre-training for Spoken Language Processing}, + author = {Junyi Ao and Rui Wang and Long Zhou and Chengyi Wang and Shuo Ren and Yu Wu and Shujie Liu and Tom Ko and Qing Li and Yu Zhang and Zhihua Wei and Yao Qian and Jinyu Li and Furu Wei}, + eprint={2110.07205}, + archivePrefix={arXiv}, + primaryClass={eess.AS}, + year={2021} +} +``` + +```bibtex +@article{Ao2022Speech2C, + title = {Pre-Training Transformer Decoder for End-to-End ASR Model with Unpaired Speech Data}, + author = {Junyi Ao and Ziqiang Zhang and Long Zhou and Shujie Liu and Haizhou Li and Tom Ko and Lirong Dai and Jinyu Li and Yao Qian and Furu Wei}, + eprint={2203.17113}, + archivePrefix={arXiv}, + primaryClass={cs.SD}, + year={2022} +} +``` + +```bibtex +@article{Zhang2022Yitrans, + title = {The YiTrans End-to-End Speech Translation System for IWSLT 2022 Offline Shared Task}, + author = {Zhang, Ziqiang and Ao, Junyi and Zhou, Long and Liu, Shujie and Wei, Furu and Li, Jinyu}, + eprint={2206.05777}, + archivePrefix={arXiv}, + primaryClass={cs.CL}, + year={2022} +} +``` + +```bibtex +@article{zhang2022speechut, + title = {SpeechUT: Bridging Speech and Text with Hidden-Unit for Encoder-Decoder Based Speech-Text Pre-training}, + author = {Zhang, Ziqiang and Zhou, Long and Ao, Junyi and Liu, Shujie and Dai, Lirong and Li, Jinyu and Wei, Furu}, + eprint={2210.03730}, + archivePrefix={arXiv}, + primaryClass={cs.CL}, + year={2022} +} +``` + +```bibtex +@article{zhang2022speechlm, + title = {SpeechLM: Enhanced Speech Pre-Training with Unpaired Textual Data}, + author = {Zhang, Ziqiang and Chen, Sanyuan and Zhou, Long and Wu, Yu and Ren, Shuo and Liu, Shujie and Yao, Zhuoyuan and Gong, Xun and Dai, Lirong and Li, Jinyu and Wei, Furu}, + eprint={2209.15329}, + archivePrefix={arXiv}, + primaryClass={cs.CL}, + year={2022} +} +``` + +### Contact Information + +For help or issues using SpeechT5 models, please submit a GitHub issue. + +For other communications related to SpeechT5, please contact Long Zhou (`lozhou@microsoft.com`). diff --git a/SECURITY.md b/SECURITY.md new file mode 100644 index 0000000000000000000000000000000000000000..869fdfe2b246991a053fab9cfec1bed3ab532ab1 --- /dev/null +++ b/SECURITY.md @@ -0,0 +1,41 @@ + + +## Security + +Microsoft takes the security of our software products and services seriously, which includes all source code repositories managed through our GitHub organizations, which include [Microsoft](https://github.com/Microsoft), [Azure](https://github.com/Azure), [DotNet](https://github.com/dotnet), [AspNet](https://github.com/aspnet), [Xamarin](https://github.com/xamarin), and [our GitHub organizations](https://opensource.microsoft.com/). + +If you believe you have found a security vulnerability in any Microsoft-owned repository that meets [Microsoft's definition of a security vulnerability](https://aka.ms/opensource/security/definition), please report it to us as described below. + +## Reporting Security Issues + +**Please do not report security vulnerabilities through public GitHub issues.** + +Instead, please report them to the Microsoft Security Response Center (MSRC) at [https://msrc.microsoft.com/create-report](https://aka.ms/opensource/security/create-report). 
+ +If you prefer to submit without logging in, send email to [secure@microsoft.com](mailto:secure@microsoft.com). If possible, encrypt your message with our PGP key; please download it from the [Microsoft Security Response Center PGP Key page](https://aka.ms/opensource/security/pgpkey). + +You should receive a response within 24 hours. If for some reason you do not, please follow up via email to ensure we received your original message. Additional information can be found at [microsoft.com/msrc](https://aka.ms/opensource/security/msrc). + +Please include the requested information listed below (as much as you can provide) to help us better understand the nature and scope of the possible issue: + + * Type of issue (e.g. buffer overflow, SQL injection, cross-site scripting, etc.) + * Full paths of source file(s) related to the manifestation of the issue + * The location of the affected source code (tag/branch/commit or direct URL) + * Any special configuration required to reproduce the issue + * Step-by-step instructions to reproduce the issue + * Proof-of-concept or exploit code (if possible) + * Impact of the issue, including how an attacker might exploit the issue + +This information will help us triage your report more quickly. + +If you are reporting for a bug bounty, more complete reports can contribute to a higher bounty award. Please visit our [Microsoft Bug Bounty Program](https://aka.ms/opensource/security/bounty) page for more details about our active programs. + +## Preferred Languages + +We prefer all communications to be in English. + +## Policy + +Microsoft follows the principle of [Coordinated Vulnerability Disclosure](https://aka.ms/opensource/security/cvd). + + diff --git a/Speech2C/README.md b/Speech2C/README.md new file mode 100644 index 0000000000000000000000000000000000000000..9e568918c7ee624ba9bfe8c39f810a72af69f3f2 --- /dev/null +++ b/Speech2C/README.md @@ -0,0 +1,145 @@ +# Speech2C + +> [**Speech2C**](https://arxiv.org/abs/2203.17113) (```INTERSPEECH 2022```): **Pre-Training Transformer Decoder for End-to-End ASR Model with Unpaired Speech Data** + +## Pre-Trained and Fine-tuned Models + +| Model | Pre-training Dataset | Fine-tuning Dataset | Model | +| :------: | :----------------------------------------------: | :-----------------: | :-----: | +| Speech2C | [960 hrs LibriSpeech](http://www.openslr.org/12) | - | [Google Drive](https://drive.google.com/file/d/1nGZ0LWEwlLq2pz7o805YALsMr9irV0Za/view?usp=sharing) | +| Speech2C | [960 hrs LibriSpeech](http://www.openslr.org/12) | [10 hrs LibriSpeech](http://www.openslr.org/12) | [Google Drive](https://drive.google.com/file/d/1nWSAc-33LmcDQHzH8IjXVJsuk0JZTWgN/view?usp=sharing) | +| Speech2C | [960 hrs LibriSpeech](http://www.openslr.org/12) | [100 hrs LibriSpeech](http://www.openslr.org/12) | [Google Drive](https://drive.google.com/file/d/1LwbQ5Y3tKZoK3s1ayLQgsfLTFnmkKNZs/view?usp=sharing) | + + +## Language Model and Vocabulary +| Model | Dataset | Model | Vocabulary | +| :------: | :------: | :---: | :--------: | +| LM | [LibriSpeech LM Dataset](https://www.openslr.org/11/) | [Model](https://drive.google.com/file/d/1UDCcNJT1DlquSRw0iRAXH6GHlf6zK6-8/view?usp=sharing) | [Vocabulary](https://dl.fbaipublicfiles.com/fairseq/wav2vec/dict.ltr.txt) | + +## Setup +``` +git submodule update --init Speech2C/fairseq +cd Speech2C/ +pip install --editable fairseq/ +``` + +## Data Preparation +Please follow the steps of data preparation for HuBERT in 
[here](https://github.com/facebookresearch/fairseq/tree/main/examples/hubert#data-preparation). + +## Pre-Training +``` +DATA_DIR= +LABEL_DIR= +FAIRSEQ_PATH= + +python ${FAIRSEQ_PATH}/fairseq_cli/hydra_train.py \ + --config-dir speech2c/config \ + --config-name speech2c_base_librispeech \ + task.data=${DATA_DIR} task.label_dir=${LABEL_DIR} task.labels='["km"]' \ + model.label_rate=50 common.user_dir=SpeechT5/Speech2C/speech2c \ +``` + +## Finetune + +``` +DATA_DIR= +LABEL_DIR= +FAIRSEQ_PATH= +W2V_PATH= +CONFIG_NAME= + +python ${FAIRSEQ_PATH}/fairseq_cli/hydra_train.py \ + --config-dir speech2c/config \ + --config-name ${CONFIG_NAME} \ + task.data=${DATA_DIR} task.label_dir=${LABEL_DIR} \ + model.w2v_path=${W2V_PATH} common.user_dir=SpeechT5/Speech2C/speech2c \ +``` + +## Inference +Note that joint CTC and decoder inference is only supported when the batch size is 1. + +``` +FAIRSEQ_PATH= +DATA_DIR= +LABEL_DIR= +BEAM_SIZE= +CTC_WEIGHT= +TEST_SET= +CHECKPOINT_PATH= +W2V_PATH= + + +python ${FAIRSEQ_PATH}/fairseq_cli/generate.py ${DATA_DIR} \ + --label-dir ${LABEL_DIR} \ + --path ${CHECKPOINT_PATH} \ + --user-dir SpeechT5/Speech2C/speech2c \ + --model-overrides "{'w2v_path': '${W2V_PATH}'}" \ + --gen-subset ${TEST_SET} \ + --task speech2c_pretraining \ + --post-process letter \ + --add-decoder \ + --labels '["ltr"]' \ + --fine-tuning \ + --scoring wer \ + --max-len-a 0 \ + --max-len-b 620 \ + --pad-audio \ + --random-crop \ + --ctc-weight ${CTC_WEIGHT} \ + --max-tokens 8000000 \ + --beam ${BEAM_SIZE} \ + --single-target \ +``` + +## Results on Librispeech + +### Evaluation on the [LibriSpeech](http://www.openslr.org/12) 10hr subset + +| Model |LM | test-clean | test-other | +| ------------- |------------- | ----| ----| +| wav2vec2.0 Base | - | 11.1 | 17.6 | +| HuBERT Base | - | 10.1 | 16.8 | +| **Speech2C** | - | **7.8** | **13.1** | +| wav2vec 2.0 Base | 4-gram | 4.3 |9.5 | +| wav2vec 2.0 Base | Transf. |3.2 |7.8 | +| HuBERT Base | 4-gram |4.3 |9.4 | +| **Speech2C** | **Transf.** | **3.1** | **7.0** | + +### Evaluation on the [LibriSpeech](http://www.openslr.org/12) 100hr subset + +| Model |LM | test-clean | test-other | +| ------------- |------------- | ----| ----| +| wav2vec2.0 Base | - | 6.1 | 13.3 | +| wav2vec2.0 Large | - | 4.7 | 9.0 | +| HuBERT Base | - | 6.3 | 13.2 | +| SpeechT5 | - | 4.4 | 10.4 | +| Baseline | - | 5.0 | 11.9 | +| **Speech2C** | - | **4.3** |**9.0** | +| wav2vec 2.0 Base | 4-gram | 3.4 |8.0 | +| wav2vec 2.0 Base | Transf. | 2.6 | 6.3 | +| HuBERT Base | 4-gram | 3.4 |8.1 | +| SpeechT5 | Transf. | 2.4 |5.8 | +| Baseline | Transf. | 2.5 |6.3 | +| **Speech2C** | **Transf.** | **2.4** |**5.2** | + +## License + +This project is licensed under the license found in the LICENSE file in the root directory of this source tree. +Portions of the source code are based on the [FAIRSEQ](https://github.com/pytorch/fairseq). 
+ +[Microsoft Open Source Code of Conduct](https://opensource.microsoft.com/codeofconduct) + +## Reference + +If you find our work is useful in your research, please cite the following paper: + +```bibtex +@article{Ao2022Speech2C, + title = {Pre-Training Transformer Decoder for End-to-End ASR Model with Unpaired Speech Data}, + author = {Junyi Ao and Ziqiang Zhang and Long Zhou and Shujie Liu and Haizhou Li and Tom Ko and Lirong Dai and Jinyu Li and Yao Qian and Furu Wei}, + eprint={2203.17113}, + archivePrefix={arXiv}, + primaryClass={cs.SD}, + year={2022} +} +``` diff --git a/Speech2C/speech2c/__init__.py b/Speech2C/speech2c/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..8994f9a368ae4b2eff720fffb134e2a5b813ee1c --- /dev/null +++ b/Speech2C/speech2c/__init__.py @@ -0,0 +1 @@ +from . import data, tasks, criterions, models # noqa \ No newline at end of file diff --git a/Speech2C/speech2c/config/base_100h.yaml b/Speech2C/speech2c/config/base_100h.yaml new file mode 100644 index 0000000000000000000000000000000000000000..2af86af96e3719a1419a4dd49af156d4c61e9c49 --- /dev/null +++ b/Speech2C/speech2c/config/base_100h.yaml @@ -0,0 +1,93 @@ +# @package _group_ + +common: + fp16: true + log_format: json + log_interval: 200 + tensorboard_logdir: tblog + seed: 1337 + +checkpoint: + no_epoch_checkpoints: true + best_checkpoint_metric: dec_accuracy + maximize_best_checkpoint_metric: true + +distributed_training: + ddp_backend: c10d + find_unused_parameters: true + distributed_world_size: 1 + distributed_port: 29671 + nprocs_per_node: 8 + +task: + _name: speech2c_pretraining + data: ??? + fine_tuning: true + label_dir: ??? + normalize: false # must be consistent with pre-training + labels: ["ltr"] + single_target: true + add_decoder: true + pad_audio: true + random_crop: false + +dataset: + num_workers: 6 + max_tokens: 3200000 + skip_invalid_size_inputs_valid_test: true + train_subset: train_100h + valid_subset: dev_other + +criterion: + _name: ctc_ce + zero_infinity: true + +optimization: + max_update: 80000 + lr: [0.00004] + sentence_avg: true + update_freq: [1] + +optimizer: + _name: adam + adam_betas: (0.9,0.98) + adam_eps: 1e-08 + +lr_scheduler: + _name: tri_stage + phase_ratio: [0.1, 0.4, 0.5] + final_lr_scale: 0.05 + +model: + _name: speech2c_ctc + w2v_path: ??? + apply_mask: true + mask_prob: 0.65 + mask_channel_prob: 0.5 + mask_channel_length: 64 + layerdrop: 0.1 + decoder_layerdrop: 0.1 + activation_dropout: 0.1 + feature_grad_mult: 0.0 + freeze_finetune_updates: 25000 + +hydra: + job: + config: + override_dirname: + kv_sep: '-' + item_sep: '__' + exclude_keys: + - run + - task.data + - task.label_dir + - model.w2v_path + - dataset.train_subset + - dataset.valid_subset + - criterion.wer_kenlm_model + - criterion.wer_lexicon + run: + dir: ??? + sweep: + dir: ??? 
+ subdir: ${hydra.job.config_name}__${hydra.job.override_dirname} diff --git a/Speech2C/speech2c/config/base_10h.yaml b/Speech2C/speech2c/config/base_10h.yaml new file mode 100644 index 0000000000000000000000000000000000000000..aaa4ed7a79998fc1a09480f2917e2557e8aba457 --- /dev/null +++ b/Speech2C/speech2c/config/base_10h.yaml @@ -0,0 +1,104 @@ +# @package _group_ + +common: + fp16: true + log_format: json + log_interval: 200 + tensorboard_logdir: tblog + seed: 1337 + +checkpoint: + save_interval: 5 + keep_interval_updates: 1 + no_epoch_checkpoints: true + best_checkpoint_metric: dec_accuracy + maximize_best_checkpoint_metric: true + +distributed_training: + ddp_backend: c10d + find_unused_parameters: true + distributed_world_size: 1 + distributed_port: 29671 + nprocs_per_node: 8 + +task: + _name: speech2c_pretraining + data: ??? + fine_tuning: true + label_dir: ??? + normalize: false # must be consistent with pre-training + labels: ["ltr"] + single_target: true + add_decoder: true + pad_audio: true + random_crop: false + +dataset: + num_workers: 6 + max_tokens: 3200000 + skip_invalid_size_inputs_valid_test: true + validate_after_updates: ${model.freeze_finetune_updates} + validate_interval: 5 + train_subset: train_10h + valid_subset: dev_other + +criterion: + _name: ctc_ce + zero_infinity: true + +optimization: + max_update: 25000 + lr: [2e-5] + sentence_avg: true + update_freq: [1] + +optimizer: + _name: adam + adam_betas: (0.9,0.98) + adam_eps: 1e-08 + +lr_scheduler: + _name: tri_stage + phase_ratio: [0.1, 0.4, 0.5] + final_lr_scale: 0.05 + +model: + _name: speech2c_ctc + w2v_path: ??? + apply_mask: true + mask_selection: static + mask_length: 10 + mask_other: 0 + mask_prob: 0.75 + mask_channel_selection: static + mask_channel_length: 64 + mask_channel_other: 0 + mask_channel_prob: 0.5 + layerdrop: 0.1 + decoder_layerdrop: 0.1 + dropout: 0.0 + activation_dropout: 0.1 + attention_dropout: 0.0 + feature_grad_mult: 0.0 + freeze_finetune_updates: 10000 + +hydra: + job: + config: + override_dirname: + kv_sep: '-' + item_sep: '__' + exclude_keys: + - run + - task.data + - task.label_dir + - model.w2v_path + - dataset.train_subset + - dataset.valid_subset + - criterion.wer_kenlm_model + - criterion.wer_lexicon + run: + dir: ??? + sweep: + dir: ??? + subdir: ${hydra.job.config_name}__${hydra.job.override_dirname} diff --git a/Speech2C/speech2c/config/speech2c_base_librispeech.yaml b/Speech2C/speech2c/config/speech2c_base_librispeech.yaml new file mode 100644 index 0000000000000000000000000000000000000000..1f361375d8d11d6d3f7dc5573bbfc1e779930d52 --- /dev/null +++ b/Speech2C/speech2c/config/speech2c_base_librispeech.yaml @@ -0,0 +1,100 @@ +# @package _group_ + +common: + fp16: true + log_format: json + log_interval: 200 + seed: 1337 + tensorboard_logdir: tblog + +checkpoint: + save_interval_updates: 25000 + keep_interval_updates: 1 + no_epoch_checkpoints: true + + +distributed_training: + ddp_backend: no_c10d + distributed_backend: 'nccl' + distributed_world_size: 32 + distributed_port: 29671 + nprocs_per_node: 8 + find_unused_parameters: true + +task: + _name: speech2c_pretraining + data: ??? + label_dir: ??? + labels: ??? 
+ label_rate: ${model.label_rate} + sample_rate: 16000 + max_sample_size: 250000 + min_sample_size: 32000 + pad_audio: false + random_crop: true + normalize: false # must be consistent with extractor + add_decoder: true + +dataset: + num_workers: 6 + max_tokens: 1400000 + skip_invalid_size_inputs_valid_test: true + validate_interval: 5 + validate_interval_updates: 10000 + +criterion: + _name: speech2c + pred_masked_weight: 1.0 + pred_nomask_weight: 0.0 + loss_weights: [10,] + +optimization: + max_update: 400000 + lr: [0.0005] + clip_norm: 10.0 + +optimizer: + _name: adam + adam_betas: (0.9,0.98) + adam_eps: 1e-06 + weight_decay: 0.01 + +lr_scheduler: + _name: polynomial_decay + warmup_updates: 32000 + +model: + _name: speech2c + label_rate: ??? + skip_masked: false + skip_nomask: false + mask_prob: 0.80 + extractor_mode: default + conv_feature_layers: '[(512,10,5)] + [(512,3,2)] * 4 + [(512,2,2)] * 2' + final_dim: 256 + encoder_layerdrop: 0.05 + dropout_input: 0.1 + dropout_features: 0.1 + dropout: 0.1 + attention_dropout: 0.1 + feature_grad_mult: 0.1 + untie_final_proj: true + activation_dropout: 0.0 + use_rel_pos_enc: true + decoder_dict_size: -1 + +hydra: + job: + config: + override_dirname: + kv_sep: '-' + item_sep: '__' + exclude_keys: + - run + - task.data + - task.label_dir + run: + dir: ??? + sweep: + dir: ??? + subdir: ${hydra.job.config_name}__${hydra.job.override_dirname} diff --git a/Speech2C/speech2c/criterions/__init__.py b/Speech2C/speech2c/criterions/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..69fc7d7c6fa06ee16e28752119410410bf3e212f --- /dev/null +++ b/Speech2C/speech2c/criterions/__init__.py @@ -0,0 +1,10 @@ +import importlib +import os + + +for file in os.listdir(os.path.dirname(__file__)): + if file.endswith(".py") and not file.startswith("_"): + criterion_name = file[: file.find(".py")] + importlib.import_module( + "speech2c.criterions." 
+ criterion_name + ) diff --git a/Speech2C/speech2c/criterions/ctc_ce.py b/Speech2C/speech2c/criterions/ctc_ce.py new file mode 100644 index 0000000000000000000000000000000000000000..39922924a1f22f6405f743cf262ca3609de59268 --- /dev/null +++ b/Speech2C/speech2c/criterions/ctc_ce.py @@ -0,0 +1,404 @@ +# -------------------------------------------------------- +# Pre-Training Transformer Decoder for End-to-End ASR Model with Unpaired Speech Data (https://arxiv.org/abs/2203.17113) +# Github source: https://github.com/microsoft/SpeechT5/tree/main/Speech2C +# Copyright (c) 2022 Microsoft +# Licensed under The MIT License [see LICENSE for details] +# Based on fairseq code bases +# https://github.com/pytorch/fairseq +# -------------------------------------------------------- + +import math +from argparse import Namespace +from dataclasses import dataclass, field +from omegaconf import II +from typing import Optional + +import torch +import torch.nn.functional as F +from fairseq import metrics, utils +from fairseq.criterions import FairseqCriterion, register_criterion +from fairseq.criterions.label_smoothed_cross_entropy import label_smoothed_nll_loss +from fairseq.dataclass import FairseqDataclass +from fairseq.data.data_utils import post_process +from fairseq.tasks import FairseqTask +from fairseq.logging.meters import safe_round + + +@dataclass +class CtcCeCriterionConfig(FairseqDataclass): + zero_infinity: bool = field( + default=False, + metadata={"help": "zero inf loss when source length <= target length"}, + ) + sentence_avg: bool = II("optimization.sentence_avg") + post_process: str = field( + default="letter", + metadata={ + "help": "how to post process predictions into words. can be letter, " + "wordpiece, BPE symbols, etc. " + "See fairseq.data.data_utils.post_process() for full list of options" + }, + ) + wer_kenlm_model: Optional[str] = field( + default=None, + metadata={ + "help": "if this is provided, use kenlm to compute wer (along with other wer_* args)" + }, + ) + wer_lexicon: Optional[str] = field( + default=None, + metadata={"help": "lexicon to use with wer_kenlm_model"}, + ) + wer_lm_weight: float = field( + default=2.0, + metadata={"help": "lm weight to use with wer_kenlm_model"}, + ) + wer_word_score: float = field( + default=-1.0, + metadata={"help": "lm word score to use with wer_kenlm_model"}, + ) + + wer_args: Optional[str] = field( + default=None, + metadata={ + "help": "DEPRECATED: tuple of (wer_kenlm_model, wer_lexicon, wer_lm_weight, wer_word_score)" + }, + ) + + dec_weight: float = field( + default=0.5, + metadata={"help": "weights for decoder CE Loss, loss will be ((1 - dec_weight) * hubert_loss + dec_weight * CE_Loss)"}, + ) + report_accuracy: bool = field( + default=True, + metadata={"help": "report decoder accuracy metric"}, + ) + ignore_prefix_size: int = field( + default=0, + metadata={"help": "Ignore first N tokens"}, + ) + label_smoothing: float = field( + default=0.1, + metadata={"help": "epsilon for label smoothing, 0 means no label smoothing"}, + ) + + +@register_criterion("ctc_ce", dataclass=CtcCeCriterionConfig) +class CtcCeCriterion(FairseqCriterion): + def __init__(self, cfg: CtcCeCriterionConfig, task: FairseqTask): + super().__init__(task) + self.blank_idx = ( + task.target_dictionary.index(task.blank_symbol) + if hasattr(task, "blank_symbol") + else 0 + ) + self.pad_idx = task.target_dictionary.pad() + self.eos_idx = task.target_dictionary.eos() + self.post_process = cfg.post_process + + if cfg.wer_args is not None: + ( + cfg.wer_kenlm_model, + 
cfg.wer_lexicon, + cfg.wer_lm_weight, + cfg.wer_word_score, + ) = eval(cfg.wer_args) + + if cfg.wer_kenlm_model is not None: + from examples.speech_recognition.w2l_decoder import W2lKenLMDecoder + + dec_args = Namespace() + dec_args.nbest = 1 + dec_args.criterion = "ctc" + dec_args.kenlm_model = cfg.wer_kenlm_model + dec_args.lexicon = cfg.wer_lexicon + dec_args.beam = 50 + dec_args.beam_size_token = min(50, len(task.target_dictionary)) + dec_args.beam_threshold = min(50, len(task.target_dictionary)) + dec_args.lm_weight = cfg.wer_lm_weight + dec_args.word_score = cfg.wer_word_score + dec_args.unk_weight = -math.inf + dec_args.sil_weight = 0 + + self.w2l_decoder = W2lKenLMDecoder(dec_args, task.target_dictionary) + else: + self.w2l_decoder = None + + self.zero_infinity = cfg.zero_infinity + self.sentence_avg = cfg.sentence_avg + + self.dec_weight = cfg.dec_weight + self.report_accuracy = cfg.report_accuracy + self.ignore_prefix_size = cfg.ignore_prefix_size + self.eps = cfg.label_smoothing + + def forward(self, model, sample, reduce=True): + net_output = model(**sample["net_input"]) + lprobs = model.get_normalized_probs( + net_output, log_probs=True + ).contiguous() # (T, B, C) from the encoder + + if "src_lengths" in sample["net_input"]: + input_lengths = sample["net_input"]["src_lengths"] + else: + if net_output["padding_mask"] is not None: + non_padding_mask = ~net_output["padding_mask"] + input_lengths = non_padding_mask.long().sum(-1) + else: + input_lengths = lprobs.new_full( + (lprobs.size(1),), lprobs.size(0), dtype=torch.long + ) + + pad_mask = (sample["target"] != self.pad_idx) & ( + sample["target"] != self.eos_idx + ) + targets_flat = sample["target"].masked_select(pad_mask) + if "target_lengths" in sample: + target_lengths = sample["target_lengths"] + else: + target_lengths = pad_mask.sum(-1) + + with torch.backends.cudnn.flags(enabled=False): + loss = F.ctc_loss( + lprobs, + targets_flat, + input_lengths, + target_lengths, + blank=self.blank_idx, + reduction="sum", + zero_infinity=self.zero_infinity, + ) + + ntokens = ( + sample["ntokens"] if "ntokens" in sample else target_lengths.sum().item() + ) + + sample_size = sample["target"].size(0) if self.sentence_avg else ntokens + + logging_output = {} + if "decoder_target" in sample: + dec_sample_size = sample["target"].size(0) if self.sentence_avg else sample["dec_ntokens"] + dec_loss, dec_nll_loss = self.compute_ce_loss(model, net_output["decoder_out"], sample, reduce=reduce) + logging_output["ctc_loss"] = loss.item() + loss = (1 - self.dec_weight) * loss + (self.dec_weight * dec_loss * sample_size / dec_sample_size) + logging_output["dec_loss"] = dec_loss.item() + logging_output["dec_nll_loss"] = dec_nll_loss.item() + logging_output["dec_sample_size"] = dec_sample_size + + if self.report_accuracy: + n_correct, total = self.compute_accuracy(model, net_output["decoder_out"], sample) + logging_output["dec_n_correct"] = utils.item(n_correct.data) + logging_output["total"] = utils.item(total.data) + + logging_output = { + "loss": utils.item(loss.data), # * sample['ntokens'], + "ntokens": ntokens, + "nsentences": sample["id"].numel(), + "sample_size": sample_size, + **logging_output, + } + + if not model.training: + import editdistance + + with torch.no_grad(): + lprobs_t = lprobs.transpose(0, 1).float().contiguous().cpu() + + c_err = 0 + c_len = 0 + w_errs = 0 + w_len = 0 + wv_errs = 0 + for lp, t, inp_l in zip( + lprobs_t, + sample["target_label"] + if "target_label" in sample + else sample["target"], + input_lengths, + ): + lp = 
lp[:inp_l].unsqueeze(0) + + decoded = None + if self.w2l_decoder is not None: + decoded = self.w2l_decoder.decode(lp) + if len(decoded) < 1: + decoded = None + else: + decoded = decoded[0] + if len(decoded) < 1: + decoded = None + else: + decoded = decoded[0] + + p = (t != self.task.target_dictionary.pad()) & ( + t != self.task.target_dictionary.eos() + ) + targ = t[p] + targ_units = self.task.target_dictionary.string(targ) + targ_units_arr = targ.tolist() + + toks = lp.argmax(dim=-1).unique_consecutive() + pred_units_arr = toks[toks != self.blank_idx].tolist() + + c_err += editdistance.eval(pred_units_arr, targ_units_arr) + c_len += len(targ_units_arr) + + targ_words = post_process(targ_units, self.post_process).split() + + pred_units = self.task.target_dictionary.string(pred_units_arr) + pred_words_raw = post_process(pred_units, self.post_process).split() + + if decoded is not None and "words" in decoded: + pred_words = decoded["words"] + w_errs += editdistance.eval(pred_words, targ_words) + wv_errs += editdistance.eval(pred_words_raw, targ_words) + else: + dist = editdistance.eval(pred_words_raw, targ_words) + w_errs += dist + wv_errs += dist + + w_len += len(targ_words) + + logging_output["wv_errors"] = wv_errs + logging_output["w_errors"] = w_errs + logging_output["w_total"] = w_len + logging_output["c_errors"] = c_err + logging_output["c_total"] = c_len + + return loss, sample_size, logging_output + + def compute_ce_loss(self, model, net_output, sample, reduce=True): + lprobs, target = self.get_lprobs_and_target(model, net_output, sample) + loss, nll_loss = label_smoothed_nll_loss( + lprobs, + target, + self.eps, + ignore_index=self.pad_idx, + reduce=reduce, + ) + return loss, nll_loss + + def compute_accuracy(self, model, net_output, sample): + lprobs, target = self.get_lprobs_and_target(model, net_output, sample) + mask = target.ne(self.pad_idx) + n_correct = torch.sum( + lprobs.argmax(1).masked_select(mask).eq(target.masked_select(mask)) + ) + total = torch.sum(mask) + return n_correct, total + + def get_lprobs_and_target(self, model, net_output, sample): + lprobs = model.get_normalized_probs(net_output, log_probs=True) + target = sample["decoder_target"] + if self.ignore_prefix_size > 0: + if getattr(lprobs, "batch_first", False): + lprobs = lprobs[:, self.ignore_prefix_size :, :].contiguous() + target = target[:, self.ignore_prefix_size :].contiguous() + else: + lprobs = lprobs[self.ignore_prefix_size :, :, :].contiguous() + target = target[self.ignore_prefix_size :, :].contiguous() + return lprobs.view(-1, lprobs.size(-1)), target.view(-1) + + + @staticmethod + def reduce_metrics(logging_outputs) -> None: + """Aggregate logging outputs from data parallel training.""" + + loss_sum = utils.item(sum(log.get("loss", 0) for log in logging_outputs)) + ntokens = utils.item(sum(log.get("ntokens", 0) for log in logging_outputs)) + nsentences = utils.item( + sum(log.get("nsentences", 0) for log in logging_outputs) + ) + sample_size = utils.item( + sum(log.get("sample_size", 0) for log in logging_outputs) + ) + + metrics.log_scalar( + "loss", loss_sum / sample_size / math.log(2), sample_size, round=3 + ) + metrics.log_scalar("ntokens", ntokens) + metrics.log_scalar("nsentences", nsentences) + if sample_size != ntokens: + metrics.log_scalar( + "nll_loss", loss_sum / ntokens / math.log(2), ntokens, round=3 + ) + + c_errors = sum(log.get("c_errors", 0) for log in logging_outputs) + metrics.log_scalar("_c_errors", c_errors) + c_total = sum(log.get("c_total", 0) for log in logging_outputs) + 
metrics.log_scalar("_c_total", c_total) + w_errors = sum(log.get("w_errors", 0) for log in logging_outputs) + metrics.log_scalar("_w_errors", w_errors) + wv_errors = sum(log.get("wv_errors", 0) for log in logging_outputs) + metrics.log_scalar("_wv_errors", wv_errors) + w_total = sum(log.get("w_total", 0) for log in logging_outputs) + metrics.log_scalar("_w_total", w_total) + + if c_total > 0: + metrics.log_derived( + "uer", + lambda meters: safe_round( + meters["_c_errors"].sum * 100.0 / meters["_c_total"].sum, 3 + ) + if meters["_c_total"].sum > 0 + else float("nan"), + ) + if w_total > 0: + metrics.log_derived( + "wer", + lambda meters: safe_round( + meters["_w_errors"].sum * 100.0 / meters["_w_total"].sum, 3 + ) + if meters["_w_total"].sum > 0 + else float("nan"), + ) + metrics.log_derived( + "raw_wer", + lambda meters: safe_round( + meters["_wv_errors"].sum * 100.0 / meters["_w_total"].sum, 3 + ) + if meters["_w_total"].sum > 0 + else float("nan"), + ) + + if "dec_loss" in logging_outputs[0]: + ctc_loss_sum = sum(log.get("ctc_loss", 0) for log in logging_outputs) + dec_loss_sum = sum(log.get("dec_loss", 0) for log in logging_outputs) + dec_nll_loss_sum = sum(log.get("dec_nll_loss", 0) for log in logging_outputs) + dec_sample_size = sum(log.get("dec_sample_size", 0) for log in logging_outputs) + metrics.log_scalar( + "dec_loss", dec_loss_sum / dec_sample_size / math.log(2), dec_sample_size, round=3 + ) + metrics.log_scalar( + "ctc_loss", ctc_loss_sum / sample_size / math.log(2), sample_size, round=3 + ) + metrics.log_scalar( + "dec_nll_loss", dec_nll_loss_sum / dec_sample_size / math.log(2), dec_sample_size, round=3 + ) + metrics.log_derived( + "dec_ppl", lambda meters: utils.get_perplexity(meters["dec_nll_loss"].avg) + ) + total = utils.item(sum(log.get("total", 0) for log in logging_outputs)) + if total > 0: + metrics.log_scalar("total", total) + n_correct = utils.item( + sum(log.get("dec_n_correct", 0) for log in logging_outputs) + ) + metrics.log_scalar("dec_n_correct", n_correct) + metrics.log_derived( + "dec_accuracy", + lambda meters: round( + meters["dec_n_correct"].sum * 100.0 / meters["total"].sum, 3 + ) + if meters["total"].sum > 0 + else float("nan"), + ) + + @staticmethod + def logging_outputs_can_be_summed() -> bool: + """ + Whether the logging outputs returned by `forward` can be summed + across workers prior to calling `reduce_metrics`. Setting this + to True will improves distributed training speed. 
+ """ + return True diff --git a/Speech2C/speech2c/criterions/speech2c_criterion.py b/Speech2C/speech2c/criterions/speech2c_criterion.py new file mode 100644 index 0000000000000000000000000000000000000000..f6a695fc04024df3f2b5f8d87077484491c90d84 --- /dev/null +++ b/Speech2C/speech2c/criterions/speech2c_criterion.py @@ -0,0 +1,261 @@ +# -------------------------------------------------------- +# Pre-Training Transformer Decoder for End-to-End ASR Model with Unpaired Speech Data (https://arxiv.org/abs/2203.17113) +# Github source: https://github.com/microsoft/SpeechT5/tree/main/Speech2C +# Copyright (c) 2022 Microsoft +# Licensed under The MIT License [see LICENSE for details] +# Based on fairseq code bases +# https://github.com/pytorch/fairseq +# -------------------------------------------------------- + +import math +import re +from dataclasses import dataclass, field + +import torch +import torch.nn.functional as F +from fairseq import metrics, utils +from fairseq.criterions import FairseqCriterion, register_criterion +from fairseq.criterions.label_smoothed_cross_entropy import label_smoothed_nll_loss +from fairseq.criterions.hubert_criterion import HubertCriterionConfig + +@dataclass +class Speech2cCriterionConfig(HubertCriterionConfig): + dec_weight: float = field( + default=1.0, + metadata={"help": "weights for decoder CE Loss, loss will be (hubert_loss + dec_weight * CE_Loss)"}, + ) + report_accuracy: bool = field( + default=True, + metadata={"help": "report decoder accuracy metric"}, + ) + ignore_prefix_size: int = field( + default=0, + metadata={"help": "Ignore first N tokens"}, + ) + label_smoothing: float = field( + default=0.0, + metadata={"help": "epsilon for label smoothing, 0 means no label smoothing"}, + ) + + +@register_criterion("speech2c", dataclass=Speech2cCriterionConfig) +class Speech2cCriterion(FairseqCriterion): + def __init__(self, task, pred_masked_weight, pred_nomask_weight, loss_weights=None, log_keys=None, dec_weight=1.0, report_accuracy=False, ignore_prefix_size=0, label_smoothing=0.0): + super().__init__(task) + self.pred_masked_weight = pred_masked_weight + self.pred_nomask_weight = pred_nomask_weight + self.loss_weights = loss_weights + self.log_keys = [] if log_keys is None else log_keys + self.dec_weight = dec_weight + self.report_accuracy = report_accuracy + self.ignore_prefix_size = ignore_prefix_size + self.eps = label_smoothing + self.padding_idx = task.dictionaries[0].pad() + + def forward(self, model, sample, reduce=True, log_pred=False): + """Compute the loss for the given sample. 
+ Returns a tuple with three elements: + 1) the loss + 2) the sample size, which is used as the denominator for the gradient + 3) logging outputs to display while training + """ + net_output = model(target_list=sample["target_list"], **sample["net_input"]) + loss = 0.0 + sample_size = 0 + logging_output = {} + reduction = "sum" if reduce else "none" + + loss_m_list = [] + logp_m_list = model.get_logits(net_output, True) + targ_m_list = model.get_targets(net_output, True) + assert self.pred_masked_weight == 0 or len(logp_m_list) > 0 + for i, (logp_m, targ_m) in enumerate(zip(logp_m_list, targ_m_list)): + loss_m = F.cross_entropy(logp_m, targ_m, reduction=reduction) + loss_m_list.append(loss_m) + logging_output[f"loss_m_{i}"] = loss_m.detach().item() + if self.pred_masked_weight > 0: + loss += self.pred_masked_weight * sum(loss_m_list) + sample_size += targ_m_list[0].numel() + + loss_u_list = [] + logp_u_list = model.get_logits(net_output, False) + targ_u_list = model.get_targets(net_output, False) + assert self.pred_nomask_weight == 0 or len(logp_u_list) > 0 + for i, (logp_u, targ_u) in enumerate(zip(logp_u_list, targ_u_list)): + loss_u = F.cross_entropy(logp_u, targ_u, reduction=reduction) + loss_u_list.append(loss_u) + logging_output[f"loss_u_{i}"] = loss_u.detach().item() + if self.pred_nomask_weight > 0: + loss += self.pred_nomask_weight * sum(loss_u_list) + sample_size += targ_u_list[0].numel() + + if self.loss_weights is not None: + assert hasattr(model, "get_extra_losses") + extra_losses, names = model.get_extra_losses(net_output) + if torch.is_tensor(extra_losses): + extra_losses = [extra_losses] + names = [names] + if len(self.loss_weights) == 1 and len(extra_losses) != 1: + self.loss_weights = [self.loss_weights[0]] * len(extra_losses) + assert len(extra_losses) == len( + self.loss_weights + ), f"{len(extra_losses)}, {len(self.loss_weights)}" + for p, n, coef in zip(extra_losses, names, self.loss_weights): + if coef != 0 and p is not None: + p = coef * p.float() * sample_size + loss += p + logging_output[f"loss_{n}"] = p.item() + + if "decoder_target" in sample: + dec_sample_size = sample["dec_ntokens"] + dec_loss, dec_nll_loss = self.compute_ce_loss(model, net_output["decoder_out"], sample, reduce=reduce) + loss = loss + (self.dec_weight * dec_loss * sample_size / dec_sample_size) + logging_output["dec_loss"] = dec_loss.item() + logging_output["dec_nll_loss"] = dec_nll_loss.item() + logging_output["dec_sample_size"] = dec_sample_size + + if self.report_accuracy: + n_correct, total = self.compute_accuracy(model, net_output["decoder_out"], sample) + logging_output["dec_n_correct"] = utils.item(n_correct.data) + logging_output["total"] = utils.item(total.data) + + logging_output = { + "loss": loss.item() if reduce else loss, + "ntokens": sample_size, + "nsentences": sample["id"].numel(), + "sample_size": sample_size, + **logging_output, + } + + for lk in self.log_keys: + if lk in net_output: + logging_output[lk] = float((net_output[lk])) + + def compute_correct(logits): + if logits.numel() == 0: + return 0, 0 + else: + assert logits.dim() > 1, logits.shape + max = logits.argmax(-1) == 0 + min = logits.argmin(-1) == 0 + both = max & min + corr = max.long().sum().item() - both.long().sum().item() + count = max.numel() + return corr, count + + with torch.no_grad(): + for i, logp_m in enumerate(logp_m_list): + corr_m, count_m = compute_correct(logp_m) + logging_output[f"correct_m_{i}"] = corr_m + logging_output[f"count_m_{i}"] = count_m + + for i, logp_u in enumerate(logp_u_list): + 
corr_u, count_u = compute_correct(logp_u) + logging_output[f"correct_u_{i}"] = corr_u + logging_output[f"count_u_{i}"] = count_u + + return loss, sample_size, logging_output + + def compute_ce_loss(self, model, net_output, sample, reduce=True): + lprobs, target = self.get_lprobs_and_target(model, net_output, sample) + loss, nll_loss = label_smoothed_nll_loss( + lprobs, + target, + self.eps, + ignore_index=self.padding_idx, + reduce=reduce, + ) + return loss, nll_loss + + def compute_accuracy(self, model, net_output, sample): + lprobs, target = self.get_lprobs_and_target(model, net_output, sample) + mask = target.ne(self.padding_idx) + n_correct = torch.sum( + lprobs.argmax(1).masked_select(mask).eq(target.masked_select(mask)) + ) + total = torch.sum(mask) + return n_correct, total + + def get_lprobs_and_target(self, model, net_output, sample): + lprobs = model.get_normalized_probs(net_output, log_probs=True) + target = sample["decoder_target"] + if self.ignore_prefix_size > 0: + if getattr(lprobs, "batch_first", False): + lprobs = lprobs[:, self.ignore_prefix_size :, :].contiguous() + target = target[:, self.ignore_prefix_size :].contiguous() + else: + lprobs = lprobs[self.ignore_prefix_size :, :, :].contiguous() + target = target[self.ignore_prefix_size :, :].contiguous() + return lprobs.view(-1, lprobs.size(-1)), target.view(-1) + + @staticmethod + def reduce_metrics(logging_outputs) -> None: + """Aggregate logging outputs from data parallel training (copied from normal cross entropy).""" + loss_sum = sum(log.get("loss", 0) for log in logging_outputs) + ntokens = sum(log.get("ntokens", 0) for log in logging_outputs) + sample_size = sum(log.get("sample_size", 0) for log in logging_outputs) + + metrics.log_scalar("loss", loss_sum / sample_size / math.log(2), sample_size, round=3) + if sample_size != ntokens: + metrics.log_scalar("nll_loss", loss_sum / ntokens / math.log(2), ntokens, round=3) + metrics.log_derived("ppl", lambda meters: utils.get_perplexity(meters["nll_loss"].avg)) + else: + metrics.log_derived("ppl", lambda meters: utils.get_perplexity(meters["loss"].avg)) + + counts = {} + for lk in logging_outputs[0].keys(): + if lk.startswith("count_"): + val = sum(log[lk] for log in logging_outputs) + metrics.log_scalar(lk, val) + counts[lk] = val + + for lk in logging_outputs[0].keys(): + if lk.startswith("loss_"): + val = sum(log[lk] for log in logging_outputs) + metrics.log_scalar(lk, val / sample_size / math.log(2), round=3) + elif lk.startswith("correct_"): + val = sum(log[lk] for log in logging_outputs) + metrics.log_scalar(lk, val / counts[re.sub("correct", "count", lk)]) + + if "dec_loss" in logging_outputs[0]: + dec_loss_sum = sum(log.get("dec_loss", 0) for log in logging_outputs) + dec_nll_loss_sum = sum(log.get("dec_nll_loss", 0) for log in logging_outputs) + dec_sample_size = sum(log.get("dec_sample_size", 0) for log in logging_outputs) + metrics.log_scalar( + "dec_loss", dec_loss_sum / dec_sample_size / math.log(2), dec_sample_size, round=3 + ) + metrics.log_scalar( + "dec_nll_loss", dec_nll_loss_sum / dec_sample_size / math.log(2), dec_sample_size, round=3 + ) + metrics.log_derived( + "dec_ppl", lambda meters: utils.get_perplexity(meters["dec_nll_loss"].avg) + ) + total = utils.item(sum(log.get("total", 0) for log in logging_outputs)) + if total > 0: + metrics.log_scalar("total", total) + n_correct = utils.item( + sum(log.get("dec_n_correct", 0) for log in logging_outputs) + ) + metrics.log_scalar("dec_n_correct", n_correct) + metrics.log_derived( + "dec_accuracy", + 
lambda meters: round( + meters["dec_n_correct"].sum * 100.0 / meters["total"].sum, 3 + ) + if meters["total"].sum > 0 + else float("nan"), + ) + + @staticmethod + def aggregate_logging_outputs(logging_outputs): + """Aggregate logging outputs from data parallel training.""" + raise NotImplementedError() + + @staticmethod + def logging_outputs_can_be_summed() -> bool: + """ + Whether the logging outputs returned by `forward` can be summed + across workers prior to calling `reduce_metrics`. Setting this + to True will improves distributed training speed. + """ + return False diff --git a/Speech2C/speech2c/data/speech2c_dataset.py b/Speech2C/speech2c/data/speech2c_dataset.py new file mode 100644 index 0000000000000000000000000000000000000000..7af1303b0faa145d19e0bdf1d0a1ed9db61ad625 --- /dev/null +++ b/Speech2C/speech2c/data/speech2c_dataset.py @@ -0,0 +1,145 @@ +# -------------------------------------------------------- +# Pre-Training Transformer Decoder for End-to-End ASR Model with Unpaired Speech Data (https://arxiv.org/abs/2203.17113) +# Github source: https://github.com/microsoft/SpeechT5/tree/main/Speech2C +# Copyright (c) 2022 Microsoft +# Licensed under The MIT License [see LICENSE for details] +# Based on fairseq code bases +# https://github.com/pytorch/fairseq +# -------------------------------------------------------- + +import logging +from typing import Any, List, Optional, Union + +import torch +from fairseq.data import data_utils, Dictionary +from fairseq.data.audio.hubert_dataset import HubertDataset +logger = logging.getLogger(__name__) + + +class Speech2cDataset(HubertDataset): + def __init__( + self, + manifest_path: str, + sample_rate: float, + label_paths: List[str], + label_rates: Union[List[float], float], # -1 for sequence labels + pad_list: List[str], + eos_list: List[str], + label_processors: Optional[List[Any]] = None, + max_keep_sample_size: Optional[int] = None, + min_keep_sample_size: Optional[int] = None, + max_sample_size: Optional[int] = None, + shuffle: bool = True, + pad_audio: bool = False, + normalize: bool = False, + store_labels: bool = True, + random_crop: bool = False, + single_target: bool = False, + tgt_dict: Optional[Dictionary] = None, + add_decoder: bool = False, + fine_tuning: bool = False, + ): + super().__init__( + manifest_path, + sample_rate, + label_paths, + label_rates, + pad_list, + eos_list, + label_processors, + max_keep_sample_size, + min_keep_sample_size, + max_sample_size, + shuffle, + pad_audio, + normalize, + store_labels, + random_crop, + single_target + ) + + self.tgt_dict = tgt_dict + self.add_decoder = add_decoder + self.fine_tuning = fine_tuning + + def collater(self, samples): + # target = max(sizes) -> random_crop not used + # target = max_sample_size -> random_crop used for long + samples = [s for s in samples if s["source"] is not None] + if len(samples) == 0: + return {} + + audios = [s["source"] for s in samples] + audio_sizes = [len(s) for s in audios] + if self.pad_audio: + audio_size = min(max(audio_sizes), self.max_sample_size) + else: + audio_size = min(min(audio_sizes), self.max_sample_size) + collated_audios, padding_mask, audio_starts = self.collater_audio( + audios, audio_size + ) + + targets_by_label = [ + [s["label_list"][i] for s in samples] for i in range(self.num_labels) + ] + targets_list, lengths_list, ntokens_list = self.collater_label( + targets_by_label, audio_size, audio_starts + ) + + if self.add_decoder: + if self.fine_tuning: + decoder_label = [ + torch.cat((targets_list[0][i, :lengths_list[0][i]], 
torch.tensor([self.tgt_dict.eos()])), 0).long() + for i in range(targets_list[0].size(0)) + ] + else: + decoder_label = [ + torch.cat((targets_list[0][i, :lengths_list[0][i]].unique_consecutive(), torch.tensor([self.tgt_dict.eos()])), 0).long() + for i in range(targets_list[0].size(0)) + ] + dec_ntokens = sum(x.size(0) for x in decoder_label) + decoder_target = data_utils.collate_tokens( + decoder_label, + self.tgt_dict.pad(), + self.tgt_dict.eos(), + left_pad=False, + move_eos_to_beginning=False, + ) + decoder_target_lengths = torch.tensor( + [x.size(0) for x in decoder_label], dtype=torch.long + ) + prev_output_tokens = data_utils.collate_tokens( + decoder_label, + self.tgt_dict.pad(), + self.tgt_dict.eos(), + left_pad=False, + move_eos_to_beginning=True, + ) + net_input = { + "source": collated_audios, + "padding_mask": padding_mask, + "prev_output_tokens": prev_output_tokens, + } + batch = { + "id": torch.LongTensor([s["id"] for s in samples]), + "net_input": net_input, + "decoder_target": decoder_target, + "decoder_target_lengths": decoder_target_lengths, + "dec_ntokens": dec_ntokens, + } + else: + net_input = {"source": collated_audios, "padding_mask": padding_mask} + batch = { + "id": torch.LongTensor([s["id"] for s in samples]), + "net_input": net_input, + } + + if self.single_target: + batch["target_lengths"] = lengths_list[0] + batch["ntokens"] = ntokens_list[0] + batch["target"] = targets_list[0] + else: + batch["target_lengths_list"] = lengths_list + batch["ntokens_list"] = ntokens_list + batch["target_list"] = targets_list + return batch diff --git a/Speech2C/speech2c/models/modules/ctc_prefix_score.py b/Speech2C/speech2c/models/modules/ctc_prefix_score.py new file mode 100644 index 0000000000000000000000000000000000000000..b42cbd819abf7bdd718bef3db3f553c8360ac384 --- /dev/null +++ b/Speech2C/speech2c/models/modules/ctc_prefix_score.py @@ -0,0 +1,93 @@ +#!/usr/bin/env python3 + +# Copyright 2018 Mitsubishi Electric Research Labs (Takaaki Hori) +# Apache 2.0 (http://www.apache.org/licenses/LICENSE-2.0) + +import numpy as np +import six + + +class CTCPrefixScore(object): + """Compute CTC label sequence scores + which is based on Algorithm 2 in WATANABE et al. + "HYBRID CTC/ATTENTION ARCHITECTURE FOR END-TO-END SPEECH RECOGNITION," + but extended to efficiently compute the probablities of multiple labels + simultaneously + """ + + def __init__(self, x, blank, eos, xp): + self.xp = xp + self.logzero = -10000000000.0 + self.blank = blank + self.eos = eos + self.input_length = len(x) + self.x = x + + def initial_state(self): + """Obtain an initial CTC state + :return: CTC state + """ + # initial CTC state is made of a frame x 2 tensor that corresponds to + # r_t^n() and r_t^b(), where 0 and 1 of axis=1 represent + # superscripts n and b (non-blank and blank), respectively. + r = self.xp.full((self.input_length, 2), self.logzero, dtype=np.float32) + r[0, 1] = self.x[0, self.blank] + for i in six.moves.range(1, self.input_length): + r[i, 1] = r[i - 1, 1] + self.x[i, self.blank] + return r + + def __call__(self, y, cs, r_prev): + """Compute CTC prefix scores for next labels + :param y : prefix label sequence + :param cs : array of next labels + :param r_prev: previous CTC state + :return ctc_scores, ctc_states + """ + # initialize CTC states + output_length = len(y) - 1 # ignore sos + # new CTC states are prepared as a frame x (n or b) x n_labels tensor + # that corresponds to r_t^n(h) and r_t^b(h). 
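+        # r[:, 0, :] holds log r_t^n(h) (prefix ending in a non-blank) and
+        # r[:, 1, :] holds log r_t^b(h) (prefix ending in a blank); xs gathers
+        # the per-frame log-probabilities of each candidate next label in cs.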
+ r = self.xp.ndarray((self.input_length, 2, len(cs)), dtype=np.float32) + xs = self.x[:, cs] + if output_length == 0: + r[0, 0] = xs[0] + r[0, 1] = self.logzero + else: + r[output_length - 1] = self.logzero + + # prepare forward probabilities for the last label + r_sum = self.xp.logaddexp( + r_prev[:, 0], r_prev[:, 1] + ) # log(r_t^n(g) + r_t^b(g)) + last = y[-1] + if output_length > 0 and last in cs: + log_phi = self.xp.ndarray((self.input_length, len(cs)), dtype=np.float32) + for i in six.moves.range(len(cs)): + log_phi[:, i] = r_sum if cs[i] != last else r_prev[:, 1] + else: + log_phi = r_sum + + # compute forward probabilities log(r_t^n(h)), log(r_t^b(h)), + # and log prefix probabilities log(psi) + start = max(output_length, 1) + log_psi = r[start - 1, 0] + for t in six.moves.range(start, self.input_length): + r[t, 0] = self.xp.logaddexp(r[t - 1, 0], log_phi[t - 1]) + xs[t] + r[t, 1] = ( + self.xp.logaddexp(r[t - 1, 0], r[t - 1, 1]) + self.x[t, self.blank] + ) + log_psi = self.xp.logaddexp(log_psi, log_phi[t - 1] + xs[t]) + + # get P(...eos|X) that ends with the prefix itself + eos_pos = self.xp.where(cs == self.eos)[0] + if len(eos_pos) > 0: + log_psi[eos_pos] = r_sum[-1] # log(r_T^n(g) + r_T^b(g)) + + # exclude blank probs + blank_pos = self.xp.where(cs == self.blank)[0] + if len(blank_pos) > 0: + log_psi[blank_pos] = self.logzero + + # return the log prefix probability and CTC states, where the label axis + # of the CTC states is moved to the first axis to slice it easily + return log_psi, self.xp.rollaxis(r, 2) diff --git a/Speech2C/speech2c/models/modules/multihead_attention.py b/Speech2C/speech2c/models/modules/multihead_attention.py new file mode 100644 index 0000000000000000000000000000000000000000..7b1c1445037ada5aef5b8cf9fd3b63b05d95aca1 --- /dev/null +++ b/Speech2C/speech2c/models/modules/multihead_attention.py @@ -0,0 +1,341 @@ +# -------------------------------------------------------- +# Pre-Training Transformer Decoder for End-to-End ASR Model with Unpaired Speech Data (https://arxiv.org/abs/2203.17113) +# Github source: https://github.com/microsoft/SpeechT5/tree/main/Speech2C +# Copyright (c) 2022 Microsoft +# Licensed under The MIT License [see LICENSE for details] +# Based on fairseq code bases +# https://github.com/pytorch/fairseq +# -------------------------------------------------------- + +from typing import Dict, Optional, Tuple + +import torch +import torch.nn.functional as F +from fairseq import utils +from torch import Tensor + +from fairseq.modules import MultiheadAttention as FairseqMultiheadAttention + + +class MultiheadAttention(FairseqMultiheadAttention): + """Multi-headed attention. + + See "Attention Is All You Need" for more details. 
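+
+    This subclass extends fairseq's MultiheadAttention with an optional
+    *position_bias* argument; when given, a relative position term is added
+    to the attention logits before the softmax.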
+ """ + + def __init__( + self, + embed_dim, + num_heads, + kdim=None, + vdim=None, + dropout=0.0, + bias=True, + add_bias_kv=False, + add_zero_attn=False, + self_attention=False, + encoder_decoder_attention=False, + q_noise=0.0, + qn_block_size=8, + ): + super().__init__( + embed_dim, + num_heads, + kdim, + vdim, + dropout, + bias, + add_bias_kv, + add_zero_attn, + self_attention, + encoder_decoder_attention, + q_noise, + qn_block_size, + ) + + def forward( + self, + query, + key: Optional[Tensor], + value: Optional[Tensor], + key_padding_mask: Optional[Tensor] = None, + incremental_state: Optional[Dict[str, Dict[str, Optional[Tensor]]]] = None, + need_weights: bool = True, + static_kv: bool = False, + attn_mask: Optional[Tensor] = None, + before_softmax: bool = False, + need_head_weights: bool = False, + position_bias: Optional[Tensor] = None, + ) -> Tuple[Tensor, Optional[Tensor]]: + """Input shape: Time x Batch x Channel + + Args: + key_padding_mask (ByteTensor, optional): mask to exclude + keys that are pads, of shape `(batch, src_len)`, where + padding elements are indicated by 1s. + need_weights (bool, optional): return the attention weights, + averaged over heads (default: False). + attn_mask (ByteTensor, optional): typically used to + implement causal attention, where the mask prevents the + attention from looking forward in time (default: None). + before_softmax (bool, optional): return the raw attention + weights and values before the attention softmax. + need_head_weights (bool, optional): return the attention + weights for each head. Implies *need_weights*. Default: + return the average attention weights over all heads. + """ + if need_head_weights: + need_weights = True + + is_tpu = query.device.type == "xla" + + tgt_len, bsz, embed_dim = query.size() + src_len = tgt_len + assert embed_dim == self.embed_dim, f"query dim {embed_dim} != {self.embed_dim}" + assert list(query.size()) == [tgt_len, bsz, embed_dim] + if key is not None: + src_len, key_bsz, _ = key.size() + if not torch.jit.is_scripting(): + assert key_bsz == bsz + assert value is not None + assert src_len, bsz == value.shape[:2] + + if ( + not self.onnx_trace + and not is_tpu # don't use PyTorch version on TPUs + and incremental_state is None + and not static_kv + # A workaround for quantization to work. Otherwise JIT compilation + # treats bias in linear module as method. 
+ and not torch.jit.is_scripting() + and position_bias is None + ): + assert key is not None and value is not None + return F.multi_head_attention_forward( + query, + key, + value, + self.embed_dim, + self.num_heads, + torch.empty([0]), + torch.cat((self.q_proj.bias, self.k_proj.bias, self.v_proj.bias)), + self.bias_k, + self.bias_v, + self.add_zero_attn, + self.dropout_module.p, + self.out_proj.weight, + self.out_proj.bias, + self.training or self.dropout_module.apply_during_inference, + key_padding_mask, + need_weights, + attn_mask, + use_separate_proj_weight=True, + q_proj_weight=self.q_proj.weight, + k_proj_weight=self.k_proj.weight, + v_proj_weight=self.v_proj.weight, + ) + + if incremental_state is not None: + saved_state = self._get_input_buffer(incremental_state) + if saved_state is not None and "prev_key" in saved_state: + # previous time steps are cached - no need to recompute + # key and value if they are static + if static_kv: + assert self.encoder_decoder_attention and not self.self_attention + key = value = None + else: + saved_state = None + + if self.self_attention: + q = self.q_proj(query) + k = self.k_proj(query) + v = self.v_proj(query) + elif self.encoder_decoder_attention: + # encoder-decoder attention + q = self.q_proj(query) + if key is None: + assert value is None + k = v = None + else: + k = self.k_proj(key) + v = self.v_proj(key) + + else: + assert key is not None and value is not None + q = self.q_proj(query) + k = self.k_proj(key) + v = self.v_proj(value) + q *= self.scaling + + if self.bias_k is not None: + assert self.bias_v is not None + k = torch.cat([k, self.bias_k.repeat(1, bsz, 1)]) + v = torch.cat([v, self.bias_v.repeat(1, bsz, 1)]) + if attn_mask is not None: + attn_mask = torch.cat( + [attn_mask, attn_mask.new_zeros(attn_mask.size(0), 1)], dim=1 + ) + if key_padding_mask is not None: + key_padding_mask = torch.cat( + [ + key_padding_mask, + key_padding_mask.new_zeros(key_padding_mask.size(0), 1), + ], + dim=1, + ) + + q = ( + q.contiguous() + .view(tgt_len, bsz * self.num_heads, self.head_dim) + .transpose(0, 1) + ) + if k is not None: + k = ( + k.contiguous() + .view(-1, bsz * self.num_heads, self.head_dim) + .transpose(0, 1) + ) + if v is not None: + v = ( + v.contiguous() + .view(-1, bsz * self.num_heads, self.head_dim) + .transpose(0, 1) + ) + + if saved_state is not None: + # saved states are stored with shape (bsz, num_heads, seq_len, head_dim) + if "prev_key" in saved_state: + _prev_key = saved_state["prev_key"] + assert _prev_key is not None + prev_key = _prev_key.view(bsz * self.num_heads, -1, self.head_dim) + if static_kv: + k = prev_key + else: + assert k is not None + k = torch.cat([prev_key, k], dim=1) + src_len = k.size(1) + if "prev_value" in saved_state: + _prev_value = saved_state["prev_value"] + assert _prev_value is not None + prev_value = _prev_value.view(bsz * self.num_heads, -1, self.head_dim) + if static_kv: + v = prev_value + else: + assert v is not None + v = torch.cat([prev_value, v], dim=1) + prev_key_padding_mask: Optional[Tensor] = None + if "prev_key_padding_mask" in saved_state: + prev_key_padding_mask = saved_state["prev_key_padding_mask"] + assert k is not None and v is not None + key_padding_mask = MultiheadAttention._append_prev_key_padding_mask( + key_padding_mask=key_padding_mask, + prev_key_padding_mask=prev_key_padding_mask, + batch_size=bsz, + src_len=k.size(1), + static_kv=static_kv, + ) + + saved_state["prev_key"] = k.view(bsz, self.num_heads, -1, self.head_dim) + saved_state["prev_value"] = v.view(bsz, 
self.num_heads, -1, self.head_dim) + saved_state["prev_key_padding_mask"] = key_padding_mask + # In this branch incremental_state is never None + assert incremental_state is not None + incremental_state = self._set_input_buffer(incremental_state, saved_state) + assert k is not None + assert k.size(1) == src_len + + # This is part of a workaround to get around fork/join parallelism + # not supporting Optional types. + if key_padding_mask is not None and key_padding_mask.dim() == 0: + key_padding_mask = None + + if key_padding_mask is not None: + assert key_padding_mask.size(0) == bsz + assert key_padding_mask.size(1) == src_len + + if self.add_zero_attn: + assert v is not None + src_len += 1 + k = torch.cat([k, k.new_zeros((k.size(0), 1) + k.size()[2:])], dim=1) + v = torch.cat([v, v.new_zeros((v.size(0), 1) + v.size()[2:])], dim=1) + if attn_mask is not None: + attn_mask = torch.cat( + [attn_mask, attn_mask.new_zeros(attn_mask.size(0), 1)], dim=1 + ) + if key_padding_mask is not None: + key_padding_mask = torch.cat( + [ + key_padding_mask, + torch.zeros(key_padding_mask.size(0), 1).type_as( + key_padding_mask + ), + ], + dim=1, + ) + + attn_weights = torch.bmm(q, k.transpose(1, 2)) + attn_weights = self.apply_sparse_mask(attn_weights, tgt_len, src_len, bsz) + + if position_bias is not None: ## first order + ## position_bias: [241, 241, 64] + #print ("attn_weights: ", attn_weights.size()) # [492, 241, 241] + reshape_q = q.contiguous().view(bsz * self.num_heads, -1, self.head_dim).transpose(0,1) #[241, 492, 64] + #print ("reshape_q: ", reshape_q.size()) + B = torch.matmul(reshape_q, position_bias.transpose(-2, -1)) + #print ("B: ", B.size()) ## [241, 492, 241] + #B = B.transpose(0, 1).view(bsz, self.num_heads, position_bias.size(0), position_bias.size(1)) + B = B.transpose(0, 1).view(bsz*self.num_heads, position_bias.size(0), position_bias.size(1)) + #print ("B 2: ", B.size()) + attn_weights += B + + assert list(attn_weights.size()) == [bsz * self.num_heads, tgt_len, src_len] + + if attn_mask is not None: + attn_mask = attn_mask.unsqueeze(0) + if self.onnx_trace: + attn_mask = attn_mask.repeat(attn_weights.size(0), 1, 1) + attn_weights += attn_mask + + if key_padding_mask is not None: + # don't attend to padding symbols + attn_weights = attn_weights.view(bsz, self.num_heads, tgt_len, src_len) + if not is_tpu: + attn_weights = attn_weights.masked_fill( + key_padding_mask.unsqueeze(1).unsqueeze(2).to(torch.bool), + float("-inf"), + ) + else: + attn_weights = attn_weights.transpose(0, 2) + attn_weights = attn_weights.masked_fill(key_padding_mask, float("-inf")) + attn_weights = attn_weights.transpose(0, 2) + attn_weights = attn_weights.view(bsz * self.num_heads, tgt_len, src_len) + + if before_softmax: + return attn_weights, v + + attn_weights_float = utils.softmax( + attn_weights, dim=-1, onnx_trace=self.onnx_trace + ) + attn_weights = attn_weights_float.type_as(attn_weights) + attn_probs = self.dropout_module(attn_weights) + + assert v is not None + attn = torch.bmm(attn_probs, v) + assert list(attn.size()) == [bsz * self.num_heads, tgt_len, self.head_dim] + if self.onnx_trace and attn.size(1) == 1: + # when ONNX tracing a single decoder step (sequence length == 1) + # the transpose is a no-op copy before view, thus unnecessary + attn = attn.contiguous().view(tgt_len, bsz, embed_dim) + else: + attn = attn.transpose(0, 1).contiguous().view(tgt_len, bsz, embed_dim) + attn = self.out_proj(attn) + attn_weights: Optional[Tensor] = None + if need_weights: + attn_weights = attn_weights_float.view( + 
bsz, self.num_heads, tgt_len, src_len + ).transpose(1, 0) + if not need_head_weights: + # average attention weights over heads + attn_weights = attn_weights.mean(dim=0) + + return attn, attn_weights diff --git a/Speech2C/speech2c/models/modules/relative_pos_enc.py b/Speech2C/speech2c/models/modules/relative_pos_enc.py new file mode 100644 index 0000000000000000000000000000000000000000..2a073ebf2893e9e9b092aa520bdaf927e9388c2b --- /dev/null +++ b/Speech2C/speech2c/models/modules/relative_pos_enc.py @@ -0,0 +1,35 @@ +# -------------------------------------------------------- +# Pre-Training Transformer Decoder for End-to-End ASR Model with Unpaired Speech Data (https://arxiv.org/abs/2203.17113) +# Github source: https://github.com/microsoft/SpeechT5/tree/main/Speech2C +# Copyright (c) 2022 Microsoft +# Licensed under The MIT License [see LICENSE for details] +# Based on fairseq code bases +# https://github.com/pytorch/fairseq +# -------------------------------------------------------- + +import torch + +class RelativePositionalEncoding(torch.nn.Module): + def __init__(self, d_model, maxlen=1000, embed_v=False): + super(RelativePositionalEncoding, self).__init__() + + self.d_model = d_model + self.maxlen = maxlen + self.pe_k = torch.nn.Embedding(2*maxlen, d_model) + if embed_v: + self.pe_v = torch.nn.Embedding(2*maxlen, d_model) + self.embed_v = embed_v + + + def forward(self, pos_seq, incremental_state=None): + pos_seq[pos_seq < -self.maxlen] = -self.maxlen + pos_seq[pos_seq >= self.maxlen] = self.maxlen - 1 + pos_seq = pos_seq + self.maxlen + + if incremental_state is not None: + pos_seq = pos_seq[-1:] + + if self.embed_v: + return self.pe_k(pos_seq), self.pe_v(pos_seq) + else: + return self.pe_k(pos_seq), None diff --git a/Speech2C/speech2c/models/modules/transformer_decoder.py b/Speech2C/speech2c/models/modules/transformer_decoder.py new file mode 100644 index 0000000000000000000000000000000000000000..aaf4dce4ac717453bf4c37f3f393092ea53ef062 --- /dev/null +++ b/Speech2C/speech2c/models/modules/transformer_decoder.py @@ -0,0 +1,485 @@ +# -------------------------------------------------------- +# Pre-Training Transformer Decoder for End-to-End ASR Model with Unpaired Speech Data (https://arxiv.org/abs/2203.17113) +# Github source: https://github.com/microsoft/SpeechT5/tree/main/Speech2C +# Copyright (c) 2022 Microsoft +# Licensed under The MIT License [see LICENSE for details] +# Based on fairseq code bases +# https://github.com/pytorch/fairseq +# -------------------------------------------------------- + +import math +from typing import Any, Dict, List, Optional + +import torch +import torch.nn as nn +from fairseq import utils +from fairseq.distributed import fsdp_wrap +from fairseq.models import FairseqIncrementalDecoder +from fairseq.models.transformer import TransformerConfig +from fairseq.models.transformer.transformer_decoder import module_name_fordropout, Linear +from fairseq.modules import ( + AdaptiveSoftmax, + BaseLayer, + FairseqDropout, + LayerDropModuleList, + LayerNorm, + PositionalEmbedding, + SinusoidalPositionalEmbedding, +) +from fairseq.modules.checkpoint_activations import checkpoint_wrapper +from fairseq.modules.quant_noise import quant_noise as apply_quant_noise_ +from torch import Tensor + + +from speech2c.models.modules.transformer_decoder_layer import TransformerDecoderLayerBase +from speech2c.models.modules.relative_pos_enc import RelativePositionalEncoding + + +class TransformerDecoderBase(FairseqIncrementalDecoder): + """ + Transformer decoder consisting of 
*cfg.decoder.layers* layers. Each layer + is a :class:`TransformerDecoderLayer`. + + Args: + args (argparse.Namespace): parsed command-line arguments + dictionary (~fairseq.data.Dictionary): decoding dictionary + embed_tokens (torch.nn.Embedding): output embedding + no_encoder_attn (bool, optional): whether to attend to encoder outputs + (default: False). + """ + + def __init__( + self, + cfg, + dictionary, + embed_tokens, + no_encoder_attn=False, + output_projection=None, + use_rel_pos_enc=False, + ): + self.cfg = cfg + super().__init__(dictionary) + self.register_buffer("version", torch.Tensor([3])) + self._future_mask = torch.empty(0) + + self.dropout_module = FairseqDropout( + cfg.dropout, module_name=module_name_fordropout(self.__class__.__name__) + ) + self.decoder_layerdrop = cfg.decoder.layerdrop + self.share_input_output_embed = cfg.share_decoder_input_output_embed + + input_embed_dim = embed_tokens.embedding_dim + embed_dim = cfg.decoder.embed_dim + self.embed_dim = embed_dim + self.output_embed_dim = cfg.decoder.output_dim + + self.padding_idx = embed_tokens.padding_idx + self.max_target_positions = cfg.max_target_positions + + self.embed_tokens = embed_tokens + + self.embed_scale = 1.0 if cfg.no_scale_embedding else math.sqrt(embed_dim) + + if not cfg.adaptive_input and cfg.quant_noise.pq > 0: + self.quant_noise = apply_quant_noise_( + nn.Linear(embed_dim, embed_dim, bias=False), + cfg.quant_noise.pq, + cfg.quant_noise.pq_block_size, + ) + else: + self.quant_noise = None + + self.project_in_dim = ( + Linear(input_embed_dim, embed_dim, bias=False) + if embed_dim != input_embed_dim + else None + ) + self.embed_positions = ( + PositionalEmbedding( + self.max_target_positions, + embed_dim, + self.padding_idx, + learned=cfg.decoder.learned_pos, + ) + if not cfg.no_token_positional_embeddings + else None + ) + if cfg.layernorm_embedding: + self.layernorm_embedding = LayerNorm(embed_dim, export=cfg.export) + else: + self.layernorm_embedding = None + + self.cross_self_attention = cfg.cross_self_attention + + self.use_rel_pos_enc = use_rel_pos_enc + if self.decoder_layerdrop > 0.0: + self.layers = LayerDropModuleList(p=self.decoder_layerdrop) + else: + self.layers = nn.ModuleList([]) + self.layers.extend( + [ + self.build_decoder_layer(cfg, no_encoder_attn) + for _ in range(cfg.decoder.layers) + ] + ) + self.num_layers = len(self.layers) + + if cfg.decoder.normalize_before and not cfg.no_decoder_final_norm: + self.layer_norm = LayerNorm(embed_dim, export=cfg.export) + else: + self.layer_norm = None + + self.project_out_dim = ( + Linear(embed_dim, self.output_embed_dim, bias=False) + if embed_dim != self.output_embed_dim and not cfg.tie_adaptive_weights + else None + ) + + self.adaptive_softmax = None + self.output_projection = output_projection + if self.output_projection is None: + self.build_output_projection(cfg, dictionary, embed_tokens) + + if self.use_rel_pos_enc: + self.pos_emb = RelativePositionalEncoding(self.embed_dim // cfg.decoder.attention_heads, 24) + + def build_output_projection(self, cfg, dictionary, embed_tokens): + if cfg.adaptive_softmax_cutoff is not None: + self.adaptive_softmax = AdaptiveSoftmax( + len(dictionary), + self.output_embed_dim, + utils.eval_str_list(cfg.adaptive_softmax_cutoff, type=int), + dropout=cfg.adaptive_softmax_dropout, + adaptive_inputs=embed_tokens if cfg.tie_adaptive_weights else None, + factor=cfg.adaptive_softmax_factor, + tie_proj=cfg.tie_adaptive_proj, + ) + elif self.share_input_output_embed: + self.output_projection = nn.Linear( + 
self.embed_tokens.weight.shape[1], + self.embed_tokens.weight.shape[0], + bias=False, + ) + self.output_projection.weight = self.embed_tokens.weight + else: + self.output_projection = nn.Linear( + self.output_embed_dim, len(dictionary), bias=False + ) + nn.init.normal_( + self.output_projection.weight, mean=0, std=self.output_embed_dim ** -0.5 + ) + num_base_layers = cfg.base_layers + for i in range(num_base_layers): + self.layers.insert( + ((i + 1) * cfg.decoder.layers) // (num_base_layers + 1), + BaseLayer(cfg), + ) + + def build_decoder_layer(self, cfg, no_encoder_attn=False): + layer = TransformerDecoderLayerBase(cfg, no_encoder_attn, has_relative_attention_bias=self.use_rel_pos_enc) + checkpoint = cfg.checkpoint_activations + if checkpoint: + offload_to_cpu = cfg.offload_activations + layer = checkpoint_wrapper(layer, offload_to_cpu=offload_to_cpu) + # if we are checkpointing, enforce that FSDP always wraps the + # checkpointed layer, regardless of layer size + min_params_to_wrap = cfg.min_params_to_wrap if not checkpoint else 0 + layer = fsdp_wrap(layer, min_num_params=min_params_to_wrap) + return layer + + def forward( + self, + prev_output_tokens, + encoder_out: Optional[Dict[str, List[Tensor]]] = None, + incremental_state: Optional[Dict[str, Dict[str, Optional[Tensor]]]] = None, + features_only: bool = False, + full_context_alignment: bool = False, + alignment_layer: Optional[int] = None, + alignment_heads: Optional[int] = None, + src_lengths: Optional[Any] = None, + return_all_hiddens: bool = False, + ): + """ + Args: + prev_output_tokens (LongTensor): previous decoder outputs of shape + `(batch, tgt_len)`, for teacher forcing + encoder_out (optional): output from the encoder, used for + encoder-side attention, should be of size T x B x C + incremental_state (dict): dictionary used for storing state during + :ref:`Incremental decoding` + features_only (bool, optional): only return features without + applying output layer (default: False). + full_context_alignment (bool, optional): don't apply + auto-regressive mask to self-attention (default: False). + + Returns: + tuple: + - the decoder's output of shape `(batch, tgt_len, vocab)` + - a dictionary with any model-specific outputs + """ + + x, extra = self.extract_features( + prev_output_tokens, + encoder_out=encoder_out, + incremental_state=incremental_state, + full_context_alignment=full_context_alignment, + alignment_layer=alignment_layer, + alignment_heads=alignment_heads, + ) + + if not features_only: + x = self.output_layer(x) + return x, extra + + def extract_features_scriptable( + self, + prev_output_tokens, + encoder_out: Optional[Dict[str, List[Tensor]]], + incremental_state: Optional[Dict[str, Dict[str, Optional[Tensor]]]] = None, + full_context_alignment: bool = False, + alignment_layer: Optional[int] = None, + alignment_heads: Optional[int] = None, + ): + """ + Similar to *forward* but only return features. + + Includes several features from "Jointly Learning to Align and + Translate with Transformer Models" (Garg et al., EMNLP 2019). + + Args: + full_context_alignment (bool, optional): don't apply + auto-regressive mask to self-attention (default: False). + alignment_layer (int, optional): return mean alignment over + heads at this layer (default: last layer). + alignment_heads (int, optional): only average alignment over + this many heads (default: all heads). 
+ + Returns: + tuple: + - the decoder's features of shape `(batch, tgt_len, embed_dim)` + - a dictionary with any model-specific outputs + """ + bs, slen = prev_output_tokens.size() + if alignment_layer is None: + alignment_layer = self.num_layers - 1 + + enc: Optional[Tensor] = None + padding_mask: Optional[Tensor] = None + if encoder_out is not None and len(encoder_out["encoder_out"]) > 0: + enc = encoder_out["encoder_out"][0] + assert ( + enc.size()[1] == bs + ), f"Expected enc.shape == (t, {bs}, c) got {enc.shape}" + if encoder_out is not None and len(encoder_out["encoder_padding_mask"]) > 0: + padding_mask = encoder_out["encoder_padding_mask"][0] + + # embed positions + positions = None + if self.embed_positions is not None: + positions = self.embed_positions( + prev_output_tokens, incremental_state=incremental_state + ) + + if incremental_state is not None: + prev_output_tokens = prev_output_tokens[:, -1:] + if positions is not None: + positions = positions[:, -1:] + + # embed tokens and positions + x = self.embed_scale * self.embed_tokens(prev_output_tokens) + + if self.quant_noise is not None: + x = self.quant_noise(x) + + if self.project_in_dim is not None: + x = self.project_in_dim(x) + + if positions is not None: + x += positions + + if self.layernorm_embedding is not None: + x = self.layernorm_embedding(x) + + x = self.dropout_module(x) + + # B x T x C -> T x B x C + x = x.transpose(0, 1) + if self.use_rel_pos_enc: + pos_seq = torch.arange(0, slen).long().to(x.device) + pos_seq = pos_seq[:, None] - pos_seq[None, :] + pos_k, _ = self.pos_emb(pos_seq, incremental_state) + else: + pos_k = None + + self_attn_padding_mask: Optional[Tensor] = None + if self.cross_self_attention or prev_output_tokens.eq(self.padding_idx).any(): + self_attn_padding_mask = prev_output_tokens.eq(self.padding_idx) + + # decoder layers + attn: Optional[Tensor] = None + inner_states: List[Optional[Tensor]] = [x] + for idx, layer in enumerate(self.layers): + if incremental_state is None and not full_context_alignment: + self_attn_mask = self.buffered_future_mask(x) + else: + self_attn_mask = None + + x, layer_attn, _ = layer( + x, + enc, + padding_mask, + incremental_state, + self_attn_mask=self_attn_mask, + self_attn_padding_mask=self_attn_padding_mask, + need_attn=bool((idx == alignment_layer)), + need_head_weights=bool((idx == alignment_layer)), + pos_bias=pos_k, + ) + inner_states.append(x) + if layer_attn is not None and idx == alignment_layer: + attn = layer_attn.float().to(x) + + if attn is not None: + if alignment_heads is not None: + attn = attn[:alignment_heads] + + # average probabilities over heads + attn = attn.mean(dim=0) + + if self.layer_norm is not None: + x = self.layer_norm(x) + + # T x B x C -> B x T x C + x = x.transpose(0, 1) + + if self.project_out_dim is not None: + x = self.project_out_dim(x) + + return x, {"attn": [attn], "inner_states": inner_states} + + def output_layer(self, features): + """Project features to the vocabulary size.""" + if self.adaptive_softmax is None: + # project back to size of vocabulary + return self.output_projection(features) + else: + return features + + def max_positions(self): + """Maximum output length supported by the decoder.""" + if self.embed_positions is None: + return self.max_target_positions + return min(self.max_target_positions, self.embed_positions.max_positions) + + def buffered_future_mask(self, tensor): + dim = tensor.size(0) + # self._future_mask.device != tensor.device is not working in TorchScript. This is a workaround. 
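+        # Rebuild the cached causal mask only when it is missing, lives on the
+        # wrong device, or is smaller than the current target length; otherwise
+        # the cached buffer is reused and sliced below.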
+ if ( + self._future_mask.size(0) == 0 + or (not self._future_mask.device == tensor.device) + or self._future_mask.size(0) < dim + ): + self._future_mask = torch.triu( + utils.fill_with_neg_inf(torch.zeros([dim, dim])), 1 + ) + self._future_mask = self._future_mask.to(tensor) + return self._future_mask[:dim, :dim] + + def upgrade_state_dict_named(self, state_dict, name): + """Upgrade a (possibly old) state dict for new versions of fairseq.""" + if isinstance(self.embed_positions, SinusoidalPositionalEmbedding): + weights_key = "{}.embed_positions.weights".format(name) + if weights_key in state_dict: + del state_dict[weights_key] + state_dict[ + "{}.embed_positions._float_tensor".format(name) + ] = torch.FloatTensor(1) + + if f"{name}.output_projection.weight" not in state_dict: + if self.share_input_output_embed: + embed_out_key = f"{name}.embed_tokens.weight" + else: + embed_out_key = f"{name}.embed_out" + if embed_out_key in state_dict: + state_dict[f"{name}.output_projection.weight"] = state_dict[ + embed_out_key + ] + if not self.share_input_output_embed: + del state_dict[embed_out_key] + + for i in range(self.num_layers): + # update layer norms + layer_norm_map = { + "0": "self_attn_layer_norm", + "1": "encoder_attn_layer_norm", + "2": "final_layer_norm", + } + for old, new in layer_norm_map.items(): + for m in ("weight", "bias"): + k = "{}.layers.{}.layer_norms.{}.{}".format(name, i, old, m) + if k in state_dict: + state_dict[ + "{}.layers.{}.{}.{}".format(name, i, new, m) + ] = state_dict[k] + del state_dict[k] + + version_key = "{}.version".format(name) + if utils.item(state_dict.get(version_key, torch.Tensor([1]))[0]) <= 2: + # earlier checkpoints did not normalize after the stack of layers + self.layer_norm = None + self.normalize = False + state_dict[version_key] = torch.Tensor([1]) + + return state_dict + + +class TransformerDecoder(TransformerDecoderBase): + def __init__( + self, + args, + dictionary, + embed_tokens, + no_encoder_attn=False, + output_projection=None, + ): + self.args = args + super().__init__( + TransformerConfig.from_namespace(args), + dictionary, + embed_tokens, + no_encoder_attn=no_encoder_attn, + output_projection=output_projection, + use_rel_pos_enc=args.use_rel_pos_enc, + ) + + def build_output_projection(self, args, dictionary, embed_tokens): + super().build_output_projection( + TransformerConfig.from_namespace(args), dictionary, embed_tokens + ) + + def build_decoder_layer(self, args, no_encoder_attn=False): + return super().build_decoder_layer( + TransformerConfig.from_namespace(args), no_encoder_attn=no_encoder_attn + ) + +class TransformerDecoderScriptable(TransformerDecoder): + def extract_features( + self, + prev_output_tokens, + encoder_out: Optional[Dict[str, List[Tensor]]] = None, + incremental_state: Optional[Dict[str, Dict[str, Optional[Tensor]]]] = None, + full_context_alignment: bool = False, + alignment_layer: Optional[int] = None, + alignment_heads: Optional[int] = None, + ): + # call scriptable method from parent class + x, _ = self.extract_features_scriptable( + prev_output_tokens, + encoder_out, + incremental_state, + full_context_alignment, + alignment_layer, + alignment_heads, + ) + return x, None + diff --git a/Speech2C/speech2c/models/modules/transformer_decoder_layer.py b/Speech2C/speech2c/models/modules/transformer_decoder_layer.py new file mode 100644 index 0000000000000000000000000000000000000000..780bb43d8d3aaf456c0ae4cf5223b9b7eae599e8 --- /dev/null +++ b/Speech2C/speech2c/models/modules/transformer_decoder_layer.py @@ -0,0 
+1,215 @@ +# -------------------------------------------------------- +# Pre-Training Transformer Decoder for End-to-End ASR Model with Unpaired Speech Data (https://arxiv.org/abs/2203.17113) +# Github source: https://github.com/microsoft/SpeechT5/tree/main/Speech2C +# Copyright (c) 2022 Microsoft +# Licensed under The MIT License [see LICENSE for details] +# Based on fairseq code bases +# https://github.com/pytorch/fairseq +# -------------------------------------------------------- + +from typing import Dict, List, Optional + +import torch +from torch import Tensor +from fairseq.modules.transformer_layer import TransformerDecoderLayerBase as FairseqTransformerDecoderLayerBase +from fairseq.modules import LayerNorm + +from speech2c.models.modules.multihead_attention import MultiheadAttention + + +class TransformerDecoderLayerBase(FairseqTransformerDecoderLayerBase): + """Decoder layer block. + + In the original paper each operation (multi-head attention, encoder + attention or FFN) is postprocessed with: `dropout -> add residual -> + layernorm`. In the tensor2tensor code they suggest that learning is more + robust when preprocessing each layer with layernorm and postprocessing with: + `dropout -> add residual`. We default to the approach in the paper, but the + tensor2tensor approach can be enabled by setting + *cfg.decoder.normalize_before* to ``True``. + + Args: + args (argparse.Namespace): parsed command-line arguments + no_encoder_attn (bool, optional): whether to attend to encoder outputs + (default: False). + """ + + def __init__( + self, cfg, no_encoder_attn=False, add_bias_kv=False, add_zero_attn=False, has_relative_attention_bias=False + ): + super().__init__( + cfg, + no_encoder_attn, + add_bias_kv, + add_zero_attn, + ) + + if has_relative_attention_bias: + self.norm_k = LayerNorm(self.embed_dim // cfg.decoder.attention_heads) + + def build_self_attention( + self, embed_dim, cfg, add_bias_kv=False, add_zero_attn=False + ): + return MultiheadAttention( + embed_dim, + cfg.decoder.attention_heads, + dropout=cfg.attention_dropout, + add_bias_kv=add_bias_kv, + add_zero_attn=add_zero_attn, + self_attention=not cfg.cross_self_attention, + q_noise=self.quant_noise, + qn_block_size=self.quant_noise_block_size, + ) + + def forward( + self, + x, + encoder_out: Optional[torch.Tensor] = None, + encoder_padding_mask: Optional[torch.Tensor] = None, + incremental_state: Optional[Dict[str, Dict[str, Optional[Tensor]]]] = None, + prev_self_attn_state: Optional[List[torch.Tensor]] = None, + prev_attn_state: Optional[List[torch.Tensor]] = None, + self_attn_mask: Optional[torch.Tensor] = None, + self_attn_padding_mask: Optional[torch.Tensor] = None, + need_attn: bool = False, + need_head_weights: bool = False, + pos_bias=None, + ): + """ + Args: + x (Tensor): input to the layer of shape `(seq_len, batch, embed_dim)` + encoder_padding_mask (ByteTensor, optional): binary + ByteTensor of shape `(batch, src_len)` where padding + elements are indicated by ``1``. + need_attn (bool, optional): return attention weights + need_head_weights (bool, optional): return attention weights + for each head (default: return average over heads). 
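+            pos_bias (Tensor, optional): relative position embeddings for
+                self-attention, used to add a position-dependent bias to the
+                attention logits.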
+ Returns: + encoded output of shape `(seq_len, batch, embed_dim)` + """ + if need_head_weights: + need_attn = True + + residual = x + if self.normalize_before: + x = self.self_attn_layer_norm(x) + if pos_bias is not None: + pos_bias = self.norm_k(pos_bias) + if prev_self_attn_state is not None: + prev_key, prev_value = prev_self_attn_state[:2] + saved_state: Dict[str, Optional[Tensor]] = { + "prev_key": prev_key, + "prev_value": prev_value, + } + if len(prev_self_attn_state) >= 3: + saved_state["prev_key_padding_mask"] = prev_self_attn_state[2] + assert incremental_state is not None + self.self_attn._set_input_buffer(incremental_state, saved_state) + _self_attn_input_buffer = self.self_attn._get_input_buffer(incremental_state) + if self.cross_self_attention and not ( + incremental_state is not None + and _self_attn_input_buffer is not None + and "prev_key" in _self_attn_input_buffer + ): + if self_attn_mask is not None: + assert encoder_out is not None + self_attn_mask = torch.cat( + (x.new_zeros(x.size(0), encoder_out.size(0)), self_attn_mask), dim=1 + ) + if self_attn_padding_mask is not None: + if encoder_padding_mask is None: + assert encoder_out is not None + encoder_padding_mask = self_attn_padding_mask.new_zeros( + encoder_out.size(1), encoder_out.size(0) + ) + self_attn_padding_mask = torch.cat( + (encoder_padding_mask, self_attn_padding_mask), dim=1 + ) + assert encoder_out is not None + y = torch.cat((encoder_out, x), dim=0) + else: + y = x + + x, attn = self.self_attn( + query=x, + key=y, + value=y, + key_padding_mask=self_attn_padding_mask, + incremental_state=incremental_state, + need_weights=False, + attn_mask=self_attn_mask, + position_bias=pos_bias, + ) + if self.c_attn is not None: + tgt_len, bsz = x.size(0), x.size(1) + x = x.view(tgt_len, bsz, self.nh, self.head_dim) + x = torch.einsum("tbhd,h->tbhd", x, self.c_attn) + x = x.reshape(tgt_len, bsz, self.embed_dim) + if self.attn_ln is not None: + x = self.attn_ln(x) + x = self.dropout_module(x) + x = self.residual_connection(x, residual) + if not self.normalize_before: + x = self.self_attn_layer_norm(x) + + if self.encoder_attn is not None and encoder_out is not None: + residual = x + if self.normalize_before: + x = self.encoder_attn_layer_norm(x) + if prev_attn_state is not None: + prev_key, prev_value = prev_attn_state[:2] + saved_state: Dict[str, Optional[Tensor]] = { + "prev_key": prev_key, + "prev_value": prev_value, + } + if len(prev_attn_state) >= 3: + saved_state["prev_key_padding_mask"] = prev_attn_state[2] + assert incremental_state is not None + self.encoder_attn._set_input_buffer(incremental_state, saved_state) + + x, attn = self.encoder_attn( + query=x, + key=encoder_out, + value=encoder_out, + key_padding_mask=encoder_padding_mask, + incremental_state=incremental_state, + static_kv=True, + need_weights=need_attn or (not self.training and self.need_attn), + need_head_weights=need_head_weights, + ) + x = self.dropout_module(x) + x = self.residual_connection(x, residual) + if not self.normalize_before: + x = self.encoder_attn_layer_norm(x) + + residual = x + if self.normalize_before: + x = self.final_layer_norm(x) + + x = self.activation_fn(self.fc1(x)) + x = self.activation_dropout_module(x) + if self.ffn_layernorm is not None: + x = self.ffn_layernorm(x) + x = self.fc2(x) + x = self.dropout_module(x) + if self.w_resid is not None: + residual = torch.mul(self.w_resid, residual) + x = self.residual_connection(x, residual) + if not self.normalize_before: + x = self.final_layer_norm(x) + if self.onnx_trace and 
incremental_state is not None: + saved_state = self.self_attn._get_input_buffer(incremental_state) + assert saved_state is not None + if self_attn_padding_mask is not None: + self_attn_state = [ + saved_state["prev_key"], + saved_state["prev_value"], + saved_state["prev_key_padding_mask"], + ] + else: + self_attn_state = [saved_state["prev_key"], saved_state["prev_value"]] + return x, attn, self_attn_state + return x, attn, None + + def make_generation_fast_(self, need_attn: bool = False, **kwargs): + self.need_attn = need_attn diff --git a/Speech2C/speech2c/models/modules/transformer_encoder.py b/Speech2C/speech2c/models/modules/transformer_encoder.py new file mode 100644 index 0000000000000000000000000000000000000000..6916c7960cf5bf6fc4fc60257ddb377bfea368fc --- /dev/null +++ b/Speech2C/speech2c/models/modules/transformer_encoder.py @@ -0,0 +1,278 @@ +# -------------------------------------------------------- +# Pre-Training Transformer Decoder for End-to-End ASR Model with Unpaired Speech Data (https://arxiv.org/abs/2203.17113) +# Github source: https://github.com/microsoft/SpeechT5/tree/main/Speech2C +# Copyright (c) 2022 Microsoft +# Licensed under The MIT License [see LICENSE for details] +# Based on fairseq code bases +# https://github.com/pytorch/fairseq +# -------------------------------------------------------- + +import math + +import numpy as np +import torch +import torch.nn as nn +import torch.nn.functional as F +from fairseq import utils +from fairseq.dataclass import ChoiceEnum +from fairseq.modules import ( + LayerNorm, + MultiheadAttention, + SamePad, +) +from fairseq.modules.checkpoint_activations import checkpoint_wrapper +from fairseq.modules.transformer_sentence_encoder import init_bert_params +from fairseq.utils import index_put +from fairseq.distributed import fsdp_wrap +from fairseq.models.wav2vec.utils import pad_to_multiple +from fairseq.models.wav2vec.wav2vec2 import TransformerEncoder as W2vTransformerEncoder + +from speech2c.models.modules.relative_pos_enc import RelativePositionalEncoding +from speech2c.models.modules.multihead_attention import MultiheadAttention + +EXTRACTOR_MODE_CHOICES = ChoiceEnum(["default", "layer_norm"]) +MASKING_DISTRIBUTION_CHOICES = ChoiceEnum(["static", "uniform", "normal", "poisson"]) + + +class TransformerEncoder(W2vTransformerEncoder): + def __init__(self, args): + super().__init__(args) + + self.dropout = args.dropout + self.embedding_dim = args.encoder_embed_dim + self.required_seq_len_multiple = args.required_seq_len_multiple + self.use_rel_pos_enc = getattr(args, "use_rel_pos_enc", False) + + self.pos_conv = nn.Conv1d( + self.embedding_dim, + self.embedding_dim, + kernel_size=args.conv_pos, + padding=args.conv_pos // 2, + groups=args.conv_pos_groups, + ) + dropout = 0 + std = math.sqrt((4 * (1.0 - dropout)) / (args.conv_pos * self.embedding_dim)) + nn.init.normal_(self.pos_conv.weight, mean=0, std=std) + nn.init.constant_(self.pos_conv.bias, 0) + + self.pos_conv = nn.utils.weight_norm(self.pos_conv, name="weight", dim=2) + self.pos_conv = nn.Sequential(self.pos_conv, SamePad(args.conv_pos), nn.GELU()) + + layers = [] + for _ in range(args.encoder_layers): + layer = TransformerSentenceEncoderLayer( + embedding_dim=self.embedding_dim, + ffn_embedding_dim=args.encoder_ffn_embed_dim, + num_attention_heads=args.encoder_attention_heads, + dropout=self.dropout, + attention_dropout=args.attention_dropout, + activation_dropout=args.activation_dropout, + activation_fn=args.activation_fn, + layer_norm_first=args.layer_norm_first, + 
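+                # when use_rel_pos_enc is set, each layer builds a LayerNorm
+                # (norm_k) for the shared relative position bias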
has_relative_attention_bias=self.use_rel_pos_enc, + ) + if args.checkpoint_activations: + layer = fsdp_wrap(layer) + layer = checkpoint_wrapper(layer) + layers.append(layer) + self.layers = nn.ModuleList(layers) + + self.layer_norm_first = args.layer_norm_first + self.layer_norm = LayerNorm(self.embedding_dim) + self.layerdrop = args.encoder_layerdrop + if self.use_rel_pos_enc: + self.pos_emb = RelativePositionalEncoding(args.encoder_embed_dim // args.encoder_attention_heads, 160) + + + self.apply(init_bert_params) + + def forward(self, x, padding_mask=None, layer=None): + x, layer_results = self.extract_features(x, padding_mask, layer) + + if self.layer_norm_first and layer is None: + x = self.layer_norm(x) + + return x, layer_results + + def extract_features(self, x, padding_mask=None, tgt_layer=None): + + if padding_mask is not None: + x = index_put(x, padding_mask, 0) + + x_conv = self.pos_conv(x.transpose(1, 2)) + x_conv = x_conv.transpose(1, 2) + x = x + x_conv + + if not self.layer_norm_first: + x = self.layer_norm(x) + + # pad to the sequence length dimension + x, pad_length = pad_to_multiple( + x, self.required_seq_len_multiple, dim=-2, value=0 + ) + if pad_length > 0 and padding_mask is None: + padding_mask = x.new_zeros((x.size(0), x.size(1)), dtype=torch.bool) + padding_mask[:, -pad_length:] = True + else: + padding_mask, _ = pad_to_multiple( + padding_mask, self.required_seq_len_multiple, dim=-1, value=True + ) + x = F.dropout(x, p=self.dropout, training=self.training) + + # B x T x C -> T x B x C + x = x.transpose(0, 1) + + if self.use_rel_pos_enc: + x_len = x.shape[0] + pos_seq = torch.arange(0, x_len).long().to(x.device) + pos_seq = pos_seq[:, None] - pos_seq[None, :] + pos_k, pos_v = self.pos_emb(pos_seq) + else: + pos_k = None + + layer_results = [] + r = None + for i, layer in enumerate(self.layers): + dropout_probability = np.random.random() + if not self.training or (dropout_probability > self.layerdrop): + x, z = layer(x, self_attn_padding_mask=padding_mask, need_weights=False, pos_bias=pos_k) + if tgt_layer is not None: + # unpad if needed + if pad_length > 0: + layer_results.append( + ( + x[:-pad_length], + z[:, :-pad_length, :-pad_length] + if z is not None + else z, + ) + ) + else: + layer_results.append((x, z)) + if i == tgt_layer: + r = x + break + + if r is not None: + x = r + + # T x B x C -> B x T x C + x = x.transpose(0, 1) + # undo paddding + if pad_length > 0: + x = x[:, :-pad_length] + + return x, layer_results + + +class TransformerSentenceEncoderLayer(nn.Module): + """ + Implements a Transformer Encoder Layer used in BERT/XLM style pre-trained + models. 
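+
+    When *has_relative_attention_bias* is enabled, a LayerNorm (``norm_k``) is
+    created for the relative position bias (*pos_bias*) fed to self-attention.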
+ """ + + def __init__( + self, + embedding_dim: float = 768, + ffn_embedding_dim: float = 3072, + num_attention_heads: float = 8, + dropout: float = 0.1, + attention_dropout: float = 0.1, + activation_dropout: float = 0.1, + activation_fn: str = "relu", + layer_norm_first: bool = False, + has_relative_attention_bias: bool = False, + ) -> None: + + super().__init__() + # Initialize parameters + self.embedding_dim = embedding_dim + self.dropout = dropout + self.activation_dropout = activation_dropout + + # Initialize blocks + self.activation_fn = utils.get_activation_fn(activation_fn) + self.self_attn = MultiheadAttention( + self.embedding_dim, + num_attention_heads, + dropout=attention_dropout, + self_attention=True, + ) + + self.dropout1 = nn.Dropout(dropout) + self.dropout2 = nn.Dropout(self.activation_dropout) + self.dropout3 = nn.Dropout(dropout) + + self.layer_norm_first = layer_norm_first + + # layer norm associated with the self attention layer + self.self_attn_layer_norm = LayerNorm(self.embedding_dim) + self.fc1 = nn.Linear(self.embedding_dim, ffn_embedding_dim) + self.fc2 = nn.Linear(ffn_embedding_dim, self.embedding_dim) + + # layer norm associated with the position wise feed-forward NN + self.final_layer_norm = LayerNorm(self.embedding_dim) + + if has_relative_attention_bias: + self.norm_k = LayerNorm(self.embedding_dim//num_attention_heads) + + def forward( + self, + x: torch.Tensor, + self_attn_mask: torch.Tensor = None, + self_attn_padding_mask: torch.Tensor = None, + need_weights: bool = False, + att_args=None, + pos_bias=None, + ): + """ + LayerNorm is applied either before or after the self-attention/ffn + modules similar to the original Transformer imlementation. + """ + residual = x + + if self.layer_norm_first: + x = self.self_attn_layer_norm(x) + if pos_bias is not None: + pos_bias = self.norm_k(pos_bias) + x, attn = self.self_attn( + query=x, + key=x, + value=x, + key_padding_mask=self_attn_padding_mask, + attn_mask=self_attn_mask, + position_bias=pos_bias, + ) + x = self.dropout1(x) + x = residual + x + + residual = x + x = self.final_layer_norm(x) + x = self.activation_fn(self.fc1(x)) + x = self.dropout2(x) + x = self.fc2(x) + x = self.dropout3(x) + x = residual + x + else: + x, attn = self.self_attn( + query=x, + key=x, + value=x, + key_padding_mask=self_attn_padding_mask, + position_bias=pos_bias, + ) + + x = self.dropout1(x) + x = residual + x + + x = self.self_attn_layer_norm(x) + + residual = x + x = self.activation_fn(self.fc1(x)) + x = self.dropout2(x) + x = self.fc2(x) + x = self.dropout3(x) + x = residual + x + x = self.final_layer_norm(x) + + return x, attn diff --git a/Speech2C/speech2c/models/speech2c.py b/Speech2C/speech2c/models/speech2c.py new file mode 100644 index 0000000000000000000000000000000000000000..7ec69a679451172f8e32047c1bd2275932636e65 --- /dev/null +++ b/Speech2C/speech2c/models/speech2c.py @@ -0,0 +1,321 @@ +# -------------------------------------------------------- +# Pre-Training Transformer Decoder for End-to-End ASR Model with Unpaired Speech Data (https://arxiv.org/abs/2203.17113) +# Github source: https://github.com/microsoft/SpeechT5/tree/main/Speech2C +# Copyright (c) 2022 Microsoft +# Licensed under The MIT License [see LICENSE for details] +# Based on fairseq code bases +# https://github.com/pytorch/fairseq +# -------------------------------------------------------- + +import logging +import copy +import contextlib +from typing import Dict, List, Optional, Tuple + +import torch +from dataclasses import dataclass, field +from 
fairseq.data.dictionary import Dictionary +from fairseq.models import register_model +from fairseq.models.hubert import HubertConfig, HubertModel +from fairseq.models.transformer import Embedding +from torch import Tensor +from speech2c.tasks.speech2c_pretraining import ( + Speech2cPretrainingConfig, + Speech2cPretrainingTask, +) + +from speech2c.models.modules.transformer_decoder import TransformerDecoderScriptable +from speech2c.models.modules.transformer_encoder import TransformerEncoder + +logger = logging.getLogger(__name__) + + +@dataclass +class Speech2cConfig(HubertConfig): + use_rel_pos_enc: bool = field( + default=False, + metadata={"help": "whether to use relative positional encoding"}, + ) + + # decoder + decoder_layers: int = field( + default=6, metadata={"help": "num decoder layers in the transformer"} + ) + decoder_embed_dim: int = field( + default=768, metadata={"help": "decoder embedding dimension"} + ) + decoder_ffn_embed_dim: int = field( + default=3072, metadata={"help": "decoder embedding dimension for FFN"} + ) + decoder_attention_heads: int = field( + default=12, metadata={"help": "num decoder attention heads"} + ) + decoder_normalize_before: bool = field( + default=False, + metadata={"help": "apply layernorm before each decoder block"}, + ) + decoder_layerdrop: float = field( + default=0.0, + metadata={"help": "probability of dropping a tarnsformer layer"}, + ) + share_decoder_input_output_embed: bool = field( + default=False, + metadata={"help": "share decoder input and output embeddings"}, + ) + decoder_output_dim: int = field( + default=768, metadata={"help": "decoder output dimension"} + ) + max_target_positions: int = field( + default=3000, metadata={"help": "max target position"} + ) + no_scale_embedding: bool = field( + default=False, + metadata={"help": "not scale embedding"}, + ) + adaptive_input: bool = field( + default=False, + metadata={"help": "adaptive input"}, + ) + quant_noise_pq: int = field( + default=0, metadata={"help": "quant noise pq"} + ) + decoder_learned_pos: bool = field( + default=False, + metadata={"help": "decoder learnable positional embedding"}, + ) + no_token_positional_embeddings: bool = field( + default=False, + metadata={"help": "no token positional embeddings"}, + ) + decoder_dict_size: int = field( + default=-1, + metadata={"help": "decoder dictionary dimension, only used for fine-tuning"}, + ) + + # FP16 optimization + required_seq_len_multiple: int = field( + default=1, + metadata={ + "help": "pad the input to encoder such that the sequence length is divisible by multiple" + }, + ) + crop_seq_to_multiple: int = field( + default=1, + metadata={ + "help": "crop convolutional feature extractor output such that the sequence length is divisible by multiple" + }, + ) + + +@register_model("speech2c", dataclass=Speech2cConfig) +class Speech2cModel(HubertModel): + def __init__( + self, + cfg: Speech2cConfig, + task_cfg: Speech2cPretrainingConfig, + dictionaries: List[Dictionary], + ) -> None: + super().__init__(cfg, task_cfg, dictionaries) + logger.info(f"Speech2cModel Config: {cfg}") + + self.encoder = TransformerEncoder(cfg) + + self.add_decoder = task_cfg.add_decoder + if task_cfg.add_decoder: + def build_embedding(dictionary, embed_dim): + num_embeddings = len(dictionary) + padding_idx = dictionary.pad() + return Embedding(num_embeddings, embed_dim, padding_idx) + + # To make sure that the decoder dict size is the same as the fine-tuning tgt_dict size + cut_dictionary = copy.deepcopy(dictionaries[0]) + if cfg.decoder_dict_size != 
-1: + cut_dictionary.symbols = cut_dictionary.symbols[:cfg.decoder_dict_size] + + decoder_embed_tokens = build_embedding( + cut_dictionary, cfg.decoder_embed_dim + ) + + self.decoder = TransformerDecoderScriptable(cfg, cut_dictionary, decoder_embed_tokens) + + + @classmethod + def build_model(cls, cfg: Speech2cConfig, task: Speech2cPretrainingTask): + """Build a new model instance.""" + + model = Speech2cModel(cfg, task.cfg, task.dictionaries) + return model + + def get_normalized_probs( + self, + net_output: Tuple[Tensor, Optional[Dict[str, List[Optional[Tensor]]]]], + log_probs: bool, + sample: Optional[Dict[str, Tensor]] = None, + ): + # net_output['encoder_out'] is a (B, T, D) tensor + lprobs = self.get_normalized_probs_scriptable(net_output, log_probs, sample) + lprobs.batch_first = True + return lprobs + + def forward( + self, + source: torch.Tensor, + target_list: Optional[List[torch.Tensor]] = None, + padding_mask: Optional[torch.Tensor] = None, + mask: bool = True, + features_only: bool = False, + output_layer: Optional[int] = None, + prev_output_tokens: Optional[torch.Tensor] = None, + ) -> Dict[str, torch.Tensor]: + """output layer is 1-based""" + features = self.forward_features(source) + if target_list is not None: + features, target_list = self.forward_targets(features, target_list) + + features_pen = features.float().pow(2).mean() + + features = features.transpose(1, 2) + features = self.layer_norm(features) + unmasked_features = features.clone() + + if padding_mask is not None: + padding_mask = self.forward_padding_mask(features, padding_mask) + + if self.post_extract_proj is not None: + features = self.post_extract_proj(features) + + features = self.dropout_input(features) + unmasked_features = self.dropout_features(unmasked_features) + + if mask: + x, mask_indices = self.apply_mask(features, padding_mask, target_list) + else: + x = features + mask_indices = None + + # feature: (B, T, D), float + # target: (B, T), long + # x: (B, T, D), float + # padding_mask: (B, T), bool + # mask_indices: (B, T), bool + x, _ = self.encoder( + x, + padding_mask=padding_mask, + layer=None if output_layer is None else output_layer - 1, + ) + + if features_only: + return {"x": x, "padding_mask": padding_mask, "features": features} + + def compute_pred(proj_x, target, label_embs): + # compute logits for the i-th label set + y = torch.index_select(label_embs, 0, target.long()) + negs = label_embs.unsqueeze(1).expand(-1, proj_x.size(0), -1) + if self.target_glu: + y = self.target_glu(y) + negs = self.target_glu(negs) + # proj_x: (S, D) + # y: (S, D) + # negs: (Neg, S, D) + return self.compute_nce(proj_x, y, negs) + + label_embs_list = self.label_embs_concat.split(self.num_classes, 0) + + if not self.skip_masked: + masked_indices = torch.logical_and(~padding_mask, mask_indices) + proj_x_m = self.final_proj(x[masked_indices]) + if self.untie_final_proj: + proj_x_m_list = proj_x_m.chunk(len(target_list), dim=-1) + else: + proj_x_m_list = [proj_x_m for _ in range(len(target_list))] + logit_m_list = [ + compute_pred(proj_x_m, t[masked_indices], label_embs_list[i]) + for i, (proj_x_m, t) in enumerate(zip(proj_x_m_list, target_list)) + ] + else: + logit_m_list = [None for _ in target_list] + + if not self.skip_nomask: + nomask_indices = torch.logical_and(~padding_mask, ~mask_indices) + proj_x_u = self.final_proj(x[nomask_indices]) + if self.untie_final_proj: + proj_x_u_list = proj_x_u.chunk(len(target_list), dim=-1) + else: + proj_x_u_list = [proj_x_u for _ in range(len(target_list))] + + logit_u_list 
= [ + compute_pred(proj_x_u, t[nomask_indices], label_embs_list[i]) + for i, (proj_x_u, t) in enumerate(zip(proj_x_u_list, target_list)) + ] + else: + logit_u_list = [None for _ in target_list] + + result = { + "logit_m_list": logit_m_list, + "logit_u_list": logit_u_list, + "padding_mask": padding_mask, + "features_pen": features_pen, + } + if self.add_decoder: + encoder_out = { + "encoder_out": [x.transpose(0, 1)], # T x B x C + "encoder_padding_mask": [padding_mask], # B x T + } + assert prev_output_tokens is not None + decoder_out = self.decoder( + prev_output_tokens=prev_output_tokens, encoder_out=encoder_out + ) + result['decoder_out'] = decoder_out + return result + + def forward_torchscript(self, net_input: Dict[str, Tensor]): + """A TorchScript-compatible version of forward. + Encoders which use additional arguments may want to override + this method for TorchScript compatibility. + """ + res = self.forward( + net_input["source"], + padding_mask=net_input["padding_mask"], + mask=False, + features_only=True + ) + + encoder_out = { + "encoder_out": [res["x"].transpose(0, 1)], # T x B x C + "encoder_padding_mask": [res["padding_mask"]], # B x T + } + return encoder_out + + def extract_features( + self, + source: torch.Tensor, + padding_mask: Optional[torch.Tensor] = None, + mask: bool = False, + ret_conv: bool = False, + output_layer: Optional[int] = None, + prev_output_tokens: Optional[torch.Tensor] = None, + ft: bool = True, + ) -> Tuple[torch.Tensor, torch.Tensor]: + with torch.no_grad() if not ft else contextlib.ExitStack(): + res = self.forward( + source, + padding_mask=padding_mask, + mask=mask, + features_only=True, + output_layer=output_layer, + ) + + feature = res["features"] if ret_conv else res["x"] + if self.add_decoder: + encoder_out = { + "encoder_out": [feature.transpose(0, 1)], # T x B x C + "encoder_padding_mask": [res["padding_mask"]], # B x T + } + assert prev_output_tokens is not None + decoder_out = self.decoder( + prev_output_tokens=prev_output_tokens, + encoder_out=encoder_out, + ) + else: + decoder_out = None + return feature, res["padding_mask"], decoder_out diff --git a/Speech2C/speech2c/models/speech2c_asr.py b/Speech2C/speech2c/models/speech2c_asr.py new file mode 100644 index 0000000000000000000000000000000000000000..9bf8aed97d97f1fd352a884f10173c11043f6a92 --- /dev/null +++ b/Speech2C/speech2c/models/speech2c_asr.py @@ -0,0 +1,276 @@ +# -------------------------------------------------------- +# Pre-Training Transformer Decoder for End-to-End ASR Model with Unpaired Speech Data (https://arxiv.org/abs/2203.17113) +# Github source: https://github.com/microsoft/SpeechT5/tree/main/Speech2C +# Copyright (c) 2022 Microsoft +# Licensed under The MIT License [see LICENSE for details] +# Based on fairseq code bases +# https://github.com/pytorch/fairseq +# -------------------------------------------------------- + +from argparse import Namespace +from omegaconf import II + +import torch.nn as nn +from dataclasses import dataclass, field +from fairseq import checkpoint_utils, tasks, utils +from fairseq.dataclass.utils import convert_namespace_to_omegaconf +from fairseq.models import BaseFairseqModel, FairseqEncoder, register_model +from fairseq.models.hubert.hubert_asr import HubertAsrConfig, Linear +from fairseq.tasks import FairseqTask + + +@dataclass +class Speech2cAsrConfig(HubertAsrConfig): + # for decoder + decoder_layerdrop: float = field( + default=0.0, + metadata={"help": "probability of dropping a decoder layer in hubert"}, + ) + + add_decoder: bool = 
II("task.add_decoder") + +@dataclass +class Speech2cCtcConfig(Speech2cAsrConfig): + pass + + +@register_model("speech2c_ctc", dataclass=Speech2cCtcConfig) +class Speech2cCtc(BaseFairseqModel): + def __init__(self, cfg: Speech2cCtcConfig, w2v_encoder: BaseFairseqModel): + super().__init__() + self.cfg = cfg + self.w2v_encoder = w2v_encoder + + def upgrade_state_dict_named(self, state_dict, name): + super().upgrade_state_dict_named(state_dict, name) + return state_dict + + @classmethod + def build_model(cls, cfg: Speech2cCtcConfig, task: FairseqTask): + """Build a new model instance.""" + w2v_encoder = Speech2cEncoder(cfg, task.target_dictionary) + return cls(cfg, w2v_encoder) + + def get_normalized_probs(self, net_output, log_probs, sample=None): + """Get normalized probabilities (or log probs) from a net's output.""" + if "encoder_out" not in net_output: + return self.w2v_encoder.get_normalized_probs_decoder(net_output, log_probs, sample) + + if "encoder_out_for_ctc" in net_output: + logits = net_output["encoder_out_for_ctc"] + else: + logits = net_output["encoder_out"] + + if isinstance(logits, list): + logits = logits[0] + + if log_probs: + return utils.log_softmax(logits.float(), dim=-1) + else: + return utils.softmax(logits.float(), dim=-1) + + def get_logits(self, net_output): + logits = net_output["encoder_out"] + padding = net_output["encoder_padding_mask"] + if padding is not None and padding.any(): + padding = padding.T + logits[padding][..., 0] = 0 + logits[padding][..., 1:] = float("-inf") + + return logits + + def forward(self, **kwargs): + x = self.w2v_encoder(**kwargs) + return x + + @property + def encoder(self): + return self.w2v_encoder + + def reorder_encoder_out(self, encoder_out, new_order): + return self.encoder.reorder_encoder_out(encoder_out, new_order) + + @property + def decoder(self): + return self.w2v_encoder.w2v_model.decoder + + +class Speech2cEncoder(FairseqEncoder): + def __init__(self, cfg: Speech2cAsrConfig, tgt_dict=None): + self.apply_mask = cfg.apply_mask + + arg_overrides = { + "dropout": cfg.dropout, + "activation_dropout": cfg.activation_dropout, + "dropout_input": cfg.dropout_input, + "attention_dropout": cfg.attention_dropout, + "mask_length": cfg.mask_length, + "mask_prob": cfg.mask_prob, + "mask_selection": cfg.mask_selection, + "mask_other": cfg.mask_other, + "no_mask_overlap": cfg.no_mask_overlap, + "mask_channel_length": cfg.mask_channel_length, + "mask_channel_prob": cfg.mask_channel_prob, + "mask_channel_selection": cfg.mask_channel_selection, + "mask_channel_other": cfg.mask_channel_other, + "no_mask_channel_overlap": cfg.no_mask_channel_overlap, + "encoder_layerdrop": cfg.layerdrop, + "decoder_layerdrop": cfg.decoder_layerdrop, + "feature_grad_mult": cfg.feature_grad_mult, + "decoder_dict_size": len(tgt_dict) if cfg.add_decoder else -1, + } + + if cfg.w2v_args is None: + state = checkpoint_utils.load_checkpoint_to_cpu(cfg.w2v_path, arg_overrides) + w2v_args = state.get("cfg", None) + if w2v_args is None: + w2v_args = convert_namespace_to_omegaconf(state["args"]) + cfg.w2v_args = w2v_args + else: + state = None + w2v_args = cfg.w2v_args + if isinstance(w2v_args, Namespace): + cfg.w2v_args = w2v_args = convert_namespace_to_omegaconf(w2v_args) + + assert cfg.normalize == w2v_args.task.normalize, ( + "Fine-tuning works best when data normalization is the same. 
" + "Please check that --normalize is set or unset for " + "both pre-training and here" + ) + + w2v_args.task.data = cfg.data + w2v_args.task.add_decoder = cfg.add_decoder + task = tasks.setup_task(w2v_args.task) + if state is not None and "task_state" in state: + # This will load the stored "dictionaries" object + task.load_state_dict(state["task_state"]) + model = task.build_model(w2v_args.model) + + if state is not None and not cfg.no_pretrained_weights: + if "decoder.embed_tokens.weight" in state["model"]: + del state["model"]["decoder.embed_tokens.weight"] + if "decoder.output_projection.weight" in state["model"]: + del state["model"]["decoder.output_projection.weight"] + # set strict=False because we omit some modules + model.load_state_dict(state["model"], strict=False) + + model.remove_pretraining_modules() + + super().__init__(task.source_dictionary) + + d = model.mask_emb.size(0) + + self.w2v_model = model + + self.final_dropout = nn.Dropout(cfg.final_dropout) + self.freeze_finetune_updates = cfg.freeze_finetune_updates + self.num_updates = 0 + + if tgt_dict is not None: + self.proj = Linear(d, len(tgt_dict)) + elif getattr(cfg, "decoder_embed_dim", d) != d: + self.proj = Linear(d, cfg.decoder_embed_dim) + else: + self.proj = None + + def set_num_updates(self, num_updates): + """Set the number of parameters updates.""" + super().set_num_updates(num_updates) + self.num_updates = num_updates + + def forward(self, source, padding_mask, prev_output_tokens=None, tbc=True, **kwargs): + + ft = self.freeze_finetune_updates <= self.num_updates + w2v_args = { + "source": source, + "padding_mask": padding_mask, + "mask": self.apply_mask and self.training, + "prev_output_tokens": prev_output_tokens, + "ft": ft, + } + + x, padding_mask, decoder_out = self.w2v_model.extract_features(**w2v_args) + + if tbc: + # B x T x C -> T x B x C + x = x.transpose(0, 1) + + x = self.final_dropout(x) + + if self.proj: + x = self.proj(x) + + return { + "encoder_out": x, # T x B x C + "encoder_padding_mask": padding_mask, # B x T + "padding_mask": padding_mask, + "decoder_out": decoder_out, + } + + def get_normalized_probs_decoder(self, net_output, log_probs, sample=None): + # net_output['encoder_out'] is a (B, T, D) tensor + return self.w2v_model.get_normalized_probs(net_output, log_probs, sample) + + def reorder_encoder_out(self, encoder_out, new_order): + if encoder_out["encoder_out"] is not None: + if isinstance(encoder_out["encoder_out"], list): + encoder_out["encoder_out"] = ( + [] if len(encoder_out["encoder_out"]) == 0 + else [x.index_select(1, new_order) for x in encoder_out["encoder_out"]] + ) + else: + encoder_out["encoder_out"] = encoder_out[ + "encoder_out" + ].index_select(1, new_order) + if encoder_out["encoder_padding_mask"] is not None: + if isinstance(encoder_out["encoder_padding_mask"], list): + encoder_out["encoder_padding_mask"] = ( + [] if len(encoder_out["encoder_padding_mask"]) == 0 + else [x.index_select(0, new_order) for x in encoder_out["encoder_padding_mask"]] + ) + else: + encoder_out["encoder_padding_mask"] = encoder_out[ + "encoder_padding_mask" + ].index_select(0, new_order) + if "decoder_out" in encoder_out and encoder_out["decoder_out"] is not None: + if isinstance(encoder_out["decoder_out"], list): + encoder_out["decoder_out"] = ( + [] if len(encoder_out["decoder_out"]) == 0 + else [x.index_select(0, new_order) for x in encoder_out["decoder_out"]] + ) + else: + encoder_out["decoder_out"] = encoder_out[ + "decoder_out" + ].index_select(0, new_order) + if "encoder_out_for_ctc" 
in encoder_out and encoder_out["encoder_out_for_ctc"] is not None: + if isinstance(encoder_out["encoder_out_for_ctc"], list): + encoder_out["encoder_out_for_ctc"] = ( + [] if len(encoder_out["encoder_out_for_ctc"]) == 0 + else [x.index_select(1, new_order) for x in encoder_out["encoder_out_for_ctc"]] + ) + else: + encoder_out["encoder_out_for_ctc"] = encoder_out[ + "encoder_out_for_ctc" + ].index_select(1, new_order) + + return encoder_out + + def forward_torchscript(self, net_input): + """A TorchScript-compatible version of forward. + Encoders which use additional arguments may want to override + this method for TorchScript compatibility. + """ + encoder_out = self.w2v_model.forward_torchscript(net_input) + + assert self.proj is not None + encoder_out['encoder_out_for_ctc'] = [self.proj(encoder_out['encoder_out'][0])] + + return encoder_out + + def max_positions(self): + """Maximum input length supported by the encoder.""" + return None + + def upgrade_state_dict_named(self, state_dict, name): + return state_dict + diff --git a/Speech2C/speech2c/models/t5_transformer_lm.py b/Speech2C/speech2c/models/t5_transformer_lm.py new file mode 100644 index 0000000000000000000000000000000000000000..3d16a2df00b692114f8d84d254cf486d09e1137b --- /dev/null +++ b/Speech2C/speech2c/models/t5_transformer_lm.py @@ -0,0 +1,25 @@ +# -------------------------------------------------------- +# Pre-Training Transformer Decoder for End-to-End ASR Model with Unpaired Speech Data (https://arxiv.org/abs/2203.17113) +# Github source: https://github.com/microsoft/SpeechT5/tree/main/Speech2C +# Copyright (c) 2022 Microsoft +# Licensed under The MIT License [see LICENSE for details] +# Based on fairseq code bases +# https://github.com/pytorch/fairseq +# -------------------------------------------------------- + +from fairseq.models import ( + register_model_architecture, +) +from fairseq.models.transformer_lm import base_lm_architecture + + +@register_model_architecture(model_name="transformer_lm", arch_name="transformer_lm_t5") +def transformer_lm_t5(args): + args.decoder_embed_dim = getattr(args, "decoder_embed_dim", 1280) + args.decoder_ffn_embed_dim = getattr(args, "decoder_ffn_embed_dim", 6144) + args.decoder_layers = getattr(args, "decoder_layers", 20) + args.decoder_attention_heads = getattr(args, "decoder_attention_heads", 16) + args.dropout = getattr(args, "dropout", 0.1) + args.attention_dropout = getattr(args, "attention_dropout", 0.1) + args.activation_fn = getattr(args, "activation_fn", "gelu") + base_lm_architecture(args) diff --git a/Speech2C/speech2c/squence_generator.py b/Speech2C/speech2c/squence_generator.py new file mode 100644 index 0000000000000000000000000000000000000000..e51e8021fe9e4e48619340412df012937db54198 --- /dev/null +++ b/Speech2C/speech2c/squence_generator.py @@ -0,0 +1,1028 @@ +# -------------------------------------------------------- +# Pre-Training Transformer Decoder for End-to-End ASR Model with Unpaired Speech Data (https://arxiv.org/abs/2203.17113) +# Github source: https://github.com/microsoft/SpeechT5/tree/main/Speech2C +# Copyright (c) 2022 Microsoft +# Licensed under The MIT License [see LICENSE for details] +# Based on fairseq code bases +# https://github.com/pytorch/fairseq +# -------------------------------------------------------- + +import math +from typing import Dict, List, Optional +import sys + +import torch +import torch.nn as nn +from fairseq import search, utils +from fairseq.data import data_utils +from fairseq.models import FairseqIncrementalDecoder +from torch 
import Tensor +from fairseq.ngram_repeat_block import NGramRepeatBlock +from speech2c.models.modules.ctc_prefix_score import CTCPrefixScore +import numpy + + +CTC_SCORING_RATIO = 7.0 + +class SequenceGenerator(nn.Module): + def __init__( + self, + models, + tgt_dict, + beam_size=1, + max_len_a=0, + max_len_b=200, + max_len=0, + min_len=1, + normalize_scores=True, + len_penalty=1.0, + unk_penalty=0.0, + temperature=1.0, + match_source_len=False, + no_repeat_ngram_size=0, + search_strategy=None, + eos=None, + symbols_to_strip_from_output=None, + lm_model=None, + lm_weight=1.0, + ctc_weight=0.0, + ): + """Generates translations of a given source sentence. + Args: + models (List[~fairseq.models.FairseqModel]): ensemble of models, + currently support fairseq.models.TransformerModel for scripting + beam_size (int, optional): beam width (default: 1) + max_len_a/b (int, optional): generate sequences of maximum length + ax + b, where x is the source length + max_len (int, optional): the maximum length of the generated output + (not including end-of-sentence) + min_len (int, optional): the minimum length of the generated output + (not including end-of-sentence) + normalize_scores (bool, optional): normalize scores by the length + of the output (default: True) + len_penalty (float, optional): length penalty, where <1.0 favors + shorter, >1.0 favors longer sentences (default: 1.0) + unk_penalty (float, optional): unknown word penalty, where <0 + produces more unks, >0 produces fewer (default: 0.0) + temperature (float, optional): temperature, where values + >1.0 produce more uniform samples and values <1.0 produce + sharper samples (default: 1.0) + match_source_len (bool, optional): outputs should match the source + length (default: False) + """ + super().__init__() + if isinstance(models, EnsembleModel): + self.model = models + else: + self.model = EnsembleModel(models) + self.tgt_dict = tgt_dict + self.pad = tgt_dict.pad() + self.unk = tgt_dict.unk() + self.eos = tgt_dict.eos() if eos is None else eos + self.blank = self.tgt_dict.index("") + self.symbols_to_strip_from_output = ( + symbols_to_strip_from_output.union({self.eos}) + if symbols_to_strip_from_output is not None + else {self.eos} + ) + self.vocab_size = len(tgt_dict) + self.beam_size = beam_size + # the max beam size is the dictionary size - 1, since we never select pad + self.beam_size = min(beam_size, self.vocab_size - 1) + self.max_len_a = max_len_a + self.max_len_b = max_len_b + self.min_len = min_len + self.max_len = max_len or self.model.max_decoder_positions() + + self.normalize_scores = normalize_scores + self.len_penalty = len_penalty + self.unk_penalty = unk_penalty + self.temperature = temperature + self.match_source_len = match_source_len + + if no_repeat_ngram_size > 0: + self.repeat_ngram_blocker = NGramRepeatBlock(no_repeat_ngram_size) + else: + self.repeat_ngram_blocker = None + + assert temperature > 0, "--temperature must be greater than 0" + + self.search = ( + search.BeamSearch(tgt_dict) if search_strategy is None else search_strategy + ) + # We only need to set src_lengths in LengthConstrainedBeamSearch. + # As a module attribute, setting it would break in multithread + # settings when the model is shared. 
+ self.should_set_src_lengths = ( + hasattr(self.search, "needs_src_lengths") and self.search.needs_src_lengths + ) + + self.model.eval() + + self.lm_model = lm_model + self.lm_weight = lm_weight + self.ctc_weight = ctc_weight + if self.lm_model is not None: + self.lm_model.eval() + + def cuda(self): + self.model.cuda() + return self + + @torch.no_grad() + def forward( + self, + sample: Dict[str, Dict[str, Tensor]], + prefix_tokens: Optional[Tensor] = None, + bos_token: Optional[int] = None, + ): + """Generate a batch of translations. + Args: + sample (dict): batch + prefix_tokens (torch.LongTensor, optional): force decoder to begin + with these tokens + bos_token (int, optional): beginning of sentence token + (default: self.eos) + """ + return self._generate(sample, prefix_tokens, bos_token=bos_token) + + # TODO(myleott): unused, deprecate after pytorch-translate migration + def generate_batched_itr(self, data_itr, beam_size=None, cuda=False, timer=None): + """Iterate over a batched dataset and yield individual translations. + Args: + cuda (bool, optional): use GPU for generation + timer (StopwatchMeter, optional): time generations + """ + for sample in data_itr: + s = utils.move_to_cuda(sample) if cuda else sample + if "net_input" not in s: + continue + input = s["net_input"] + # model.forward normally channels prev_output_tokens into the decoder + # separately, but SequenceGenerator directly calls model.encoder + encoder_input = { + k: v for k, v in input.items() if k != "prev_output_tokens" + } + if timer is not None: + timer.start() + with torch.no_grad(): + hypos = self.generate(encoder_input) + if timer is not None: + timer.stop(sum(len(h[0]["tokens"]) for h in hypos)) + for i, id in enumerate(s["id"].data): + # remove padding + src = utils.strip_pad(input["src_tokens"].data[i, :], self.pad) + ref = ( + utils.strip_pad(s["target"].data[i, :], self.pad) + if s["target"] is not None + else None + ) + yield id, src, ref, hypos[i] + + @torch.no_grad() + def generate(self, models, sample: Dict[str, Dict[str, Tensor]], **kwargs) -> List[List[Dict[str, Tensor]]]: + """Generate translations. Match the api of other fairseq generators. 
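+        Note: the ``models`` argument is accepted only for API compatibility
+        with other fairseq generators; decoding uses the ensemble that was
+        wrapped at construction time (``self.model``).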
+ Args: + models (List[~fairseq.models.FairseqModel]): ensemble of models + sample (dict): batch + prefix_tokens (torch.LongTensor, optional): force decoder to begin + with these tokens + constraints (torch.LongTensor, optional): force decoder to include + the list of constraints + bos_token (int, optional): beginning of sentence token + (default: self.eos) + """ + return self._generate(sample, **kwargs) + + def _generate( + self, + sample: Dict[str, Dict[str, Tensor]], + prefix_tokens: Optional[Tensor] = None, + constraints: Optional[Tensor] = None, + bos_token: Optional[int] = None, + ): + incremental_states = torch.jit.annotate( + List[Dict[str, Dict[str, Optional[Tensor]]]], + [ + torch.jit.annotate(Dict[str, Dict[str, Optional[Tensor]]], {}) + for i in range(self.model.models_size) + ], + ) + net_input = sample["net_input"] + + if "src_tokens" in net_input: + src_tokens = net_input["src_tokens"] + # length of the source text being the character length except EndOfSentence and pad + src_lengths = ( + (src_tokens.ne(self.eos) & src_tokens.ne(self.pad)).long().sum(dim=1) + ) + elif "source" in net_input: + src_tokens = net_input["source"] + src_lengths = ( + net_input["padding_mask"].size(-1) - net_input["padding_mask"].sum(-1) + if net_input["padding_mask"] is not None + else torch.tensor(src_tokens.size(-1)).to(src_tokens) + ) + elif "features" in net_input: + src_tokens = net_input["features"] + src_lengths = ( + net_input["padding_mask"].size(-1) - net_input["padding_mask"].sum(-1) + if net_input["padding_mask"] is not None + else torch.tensor(src_tokens.size(-1)).to(src_tokens) + ) + else: + raise Exception("expected src_tokens or source in net input. input keys: " + str(net_input.keys())) + + # bsz: total number of sentences in beam + # Note that src_tokens may have more than 2 dimensions (i.e. audio features) + bsz, src_len = src_tokens.size()[:2] + beam_size = self.beam_size + + if constraints is not None and not self.search.supports_constraints: + raise NotImplementedError( + "Target-side constraints were provided, but search method doesn't support them" + ) + + # Initialize constraints, when active + self.search.init_constraints(constraints, beam_size) + + max_len: int = -1 + if self.match_source_len: + max_len = src_lengths.max().item() + else: + max_len = min( + int(self.max_len_a * src_len + self.max_len_b), + self.max_len - 1, + ) + assert ( + self.min_len <= max_len + ), "min_len cannot be larger than max_len, please adjust these!" + # compute the encoder output for each beam + with torch.autograd.profiler.record_function("EnsembleModel: forward_encoder"): + encoder_outs = self.model.forward_encoder(net_input) + + # Get CTC lprobs and prep ctc_scorer + if self.ctc_weight > 0: + ctc_lprobs = self.model.models[0].get_normalized_probs( + encoder_outs[0], log_probs=True + ).contiguous().transpose(0, 1) # (B, T, C) from the encoder + + hyp = {} + ctc_prefix_score = CTCPrefixScore(ctc_lprobs[0].detach().cpu().numpy(), self.blank, self.eos, numpy) + hyp["ctc_state_prev"] = ctc_prefix_score.initial_state() + hyp["ctc_score_prev"] = 0.0 + ctc_beam = min(ctc_lprobs.shape[-1], int(beam_size * CTC_SCORING_RATIO)) + ctc_hyps = {str(self.eos): hyp} + + # placeholder of indices for bsz * beam_size to hold tokens and accumulative scores + new_order = torch.arange(bsz).view(-1, 1).repeat(1, beam_size).view(-1) + new_order = new_order.to(src_tokens.device).long() + encoder_outs = self.model.reorder_encoder_out(encoder_outs, new_order) + # ensure encoder_outs is a List. 
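+        # Illustration: with bsz=2 and beam_size=3, new_order is
+        # [0, 0, 0, 1, 1, 1], i.e. each sentence's encoder output is repeated
+        # beam_size times so that every beam hypothesis can attend to its own
+        # copy of the source.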
+ assert encoder_outs is not None + + # initialize buffers + scores = ( + torch.zeros(bsz * beam_size, max_len + 1).to(src_tokens).float() + ) # +1 for eos; pad is never chosen for scoring + tokens = ( + torch.zeros(bsz * beam_size, max_len + 2) + .to(src_tokens) + .long() + .fill_(self.pad) + ) # +2 for eos and pad + tokens[:, 0] = self.eos if bos_token is None else bos_token + attn: Optional[Tensor] = None + + # A list that indicates candidates that should be ignored. + # For example, suppose we're sampling and have already finalized 2/5 + # samples. Then cands_to_ignore would mark 2 positions as being ignored, + # so that we only finalize the remaining 3 samples. + cands_to_ignore = ( + torch.zeros(bsz, beam_size).to(src_tokens).eq(-1) + ) # forward and backward-compatible False mask + + # list of completed sentences + finalized = torch.jit.annotate( + List[List[Dict[str, Tensor]]], + [torch.jit.annotate(List[Dict[str, Tensor]], []) for i in range(bsz)], + ) # contains lists of dictionaries of infomation about the hypothesis being finalized at each step + + # a boolean array indicating if the sentence at the index is finished or not + finished = [False for i in range(bsz)] + num_remaining_sent = bsz # number of sentences remaining + + # number of candidate hypos per step + cand_size = 2 * beam_size # 2 x beam size in case half are EOS + + # offset arrays for converting between different indexing schemes + bbsz_offsets = ( + (torch.arange(0, bsz) * beam_size) + .unsqueeze(1) + .type_as(tokens) + .to(src_tokens.device) + ) + cand_offsets = torch.arange(0, cand_size).type_as(tokens).to(src_tokens.device) + + reorder_state: Optional[Tensor] = None + batch_idxs: Optional[Tensor] = None + + original_batch_idxs: Optional[Tensor] = None + if "id" in sample and isinstance(sample["id"], Tensor): + original_batch_idxs = sample["id"] + else: + original_batch_idxs = torch.arange(0, bsz).type_as(tokens) + + for step in range(max_len + 1): # one extra step for EOS marker + # reorder decoder internal states based on the prev choice of beams + if reorder_state is not None: + if batch_idxs is not None: + # update beam indices to take into account removed sentences + corr = batch_idxs - torch.arange(batch_idxs.numel()).type_as( + batch_idxs + ) + reorder_state.view(-1, beam_size).add_( + corr.unsqueeze(-1) * beam_size + ) + original_batch_idxs = original_batch_idxs[batch_idxs] + self.model.reorder_incremental_state(incremental_states, reorder_state) + encoder_outs = self.model.reorder_encoder_out( + encoder_outs, reorder_state + ) + with torch.autograd.profiler.record_function("EnsembleModel: forward_decoder"): + lprobs, avg_attn_scores = self.model.forward_decoder( + tokens[:, : step + 1], + encoder_outs, + incremental_states, + self.temperature, + ) + + if self.ctc_weight > 0 and step != 0: + # lprobs[:, self.blank] = -math.inf # never select blank + ctc_lprobs = lprobs.clone() + ctc_lprobs[:, self.blank] = -math.inf # never select blank + _, local_best_ids = torch.topk(ctc_lprobs, ctc_beam, dim=-1) + for b in range(tokens.size(0)): + hyp_key = " ".join(str(x) for x in tokens[b, : step + 1].tolist()) + ctc_scores, ctc_states = ctc_prefix_score( + tokens[b, : step + 1].cpu(), local_best_ids[b].cpu(), ctc_hyps[hyp_key]["ctc_state_prev"] + ) + lprobs[b] = lprobs[b] + lprobs[b, local_best_ids[b]] = (1 - self.ctc_weight) * (lprobs[b, local_best_ids[b]]) + self.ctc_weight * torch.from_numpy( + ctc_scores - ctc_hyps[hyp_key]["ctc_score_prev"] + ).to(device="cuda") + for j in range(len(local_best_ids[b])): + 
ctc_hyps[hyp_key + " " + str(local_best_ids[b][j].item())] = {} + ctc_hyps[hyp_key + " " + str(local_best_ids[b][j].item())]["ctc_score_prev"] = ctc_scores[j] + ctc_hyps[hyp_key + " " + str(local_best_ids[b][j].item())]["ctc_state_prev"] = ctc_states[j] + + elif self.ctc_weight > 0 and step == 0: + ctc_lprobs = lprobs.clone() + ctc_lprobs[:, self.blank] = -math.inf # never select blank + _, local_best_ids = torch.topk(ctc_lprobs, ctc_beam, dim=-1) + for b in range(tokens.size(0)): + hyp_key = " ".join(str(x) for x in tokens[b, : step + 1].tolist()) + ctc_scores, ctc_states = ctc_prefix_score( + tokens[b, : step + 1].cpu(), local_best_ids[b].cpu(), ctc_hyps[hyp_key]["ctc_state_prev"] + ) + lprobs[b] = lprobs[b] + lprobs[b, local_best_ids[b]] = (1 - self.ctc_weight) * (lprobs[b, local_best_ids[b]]) + self.ctc_weight * torch.from_numpy( + ctc_scores - ctc_hyps[hyp_key]["ctc_score_prev"] + ).to(device="cuda") + for j in range(len(local_best_ids[b])): + if b == 0: + ctc_hyps[hyp_key + " " + str(local_best_ids[b][j].item())] = {} + ctc_hyps[hyp_key + " " + str(local_best_ids[b][j].item())]["ctc_score_prev"] = ctc_scores[j] + ctc_hyps[hyp_key + " " + str(local_best_ids[b][j].item())]["ctc_state_prev"] = ctc_states[j] + + if self.lm_model is not None: + lm_out = self.lm_model(tokens[:, : step + 1]) + probs = self.lm_model.get_normalized_probs( + lm_out, log_probs=True, sample=None + ) + probs = probs[:, -1, :] * self.lm_weight + lprobs += probs + # handle prefix tokens (possibly with different lengths) + if ( + prefix_tokens is not None + and step < prefix_tokens.size(1) + and step < max_len + ): + lprobs, tokens, scores = self._prefix_tokens( + step, lprobs, scores, tokens, prefix_tokens, beam_size + ) + elif step < self.min_len: + # minimum length constraint (does not apply if using prefix_tokens) + lprobs[:, self.eos] = -math.inf + + lprobs[lprobs != lprobs] = torch.tensor(-math.inf).to(lprobs) + + lprobs[:, self.pad] = -math.inf # never select pad + lprobs[:, self.unk] -= self.unk_penalty # apply unk penalty + lprobs[:, self.blank] = -math.inf # never select blank + + # handle max length constraint + if step >= max_len: + lprobs[:, : self.eos] = -math.inf + lprobs[:, self.eos + 1 :] = -math.inf + + # Record attention scores, only support avg_attn_scores is a Tensor + if avg_attn_scores is not None: + if attn is None: + attn = torch.empty( + bsz * beam_size, avg_attn_scores.size(1), max_len + 2 + ).to(scores) + attn[:, :, step + 1].copy_(avg_attn_scores) + + scores = scores.type_as(lprobs) + eos_bbsz_idx = torch.empty(0).to( + tokens + ) # indices of hypothesis ending with eos (finished sentences) + eos_scores = torch.empty(0).to( + scores + ) # scores of hypothesis ending with eos (finished sentences) + + if self.should_set_src_lengths: + self.search.set_src_lengths(src_lengths) + + if self.repeat_ngram_blocker is not None: + lprobs = self.repeat_ngram_blocker(tokens, lprobs, bsz, beam_size, step) + + # Shape: (batch, cand_size) + cand_scores, cand_indices, cand_beams = self.search.step( + step, + lprobs.view(bsz, -1, self.vocab_size), + scores.view(bsz, beam_size, -1)[:, :, :step], + tokens[:, : step + 1], + original_batch_idxs, + ) + + # cand_bbsz_idx contains beam indices for the top candidate + # hypotheses, with a range of values: [0, bsz*beam_size), + # and dimensions: [bsz, cand_size] + cand_bbsz_idx = cand_beams.add(bbsz_offsets) + + # finalize hypotheses that end in eos + # Shape of eos_mask: (batch size, beam size) + eos_mask = cand_indices.eq(self.eos) & cand_scores.ne(-math.inf) + 
eos_mask[:, :beam_size][cands_to_ignore] = torch.tensor(0).to(eos_mask) + + # only consider eos when it's among the top beam_size indices + # Now we know what beam item(s) to finish + # Shape: 1d list of absolute-numbered + eos_bbsz_idx = torch.masked_select( + cand_bbsz_idx[:, :beam_size], mask=eos_mask[:, :beam_size] + ) + + finalized_sents: List[int] = [] + if eos_bbsz_idx.numel() > 0: + eos_scores = torch.masked_select( + cand_scores[:, :beam_size], mask=eos_mask[:, :beam_size] + ) + + finalized_sents = self.finalize_hypos( + step, + eos_bbsz_idx, + eos_scores, + tokens, + scores, + finalized, + finished, + beam_size, + attn, + src_lengths, + max_len, + ) + num_remaining_sent -= len(finalized_sents) + + assert num_remaining_sent >= 0 + if num_remaining_sent == 0: + break + if self.search.stop_on_max_len and step >= max_len: + break + assert step < max_len, f"{step} < {max_len}" + + # Remove finalized sentences (ones for which {beam_size} + # finished hypotheses have been generated) from the batch. + if len(finalized_sents) > 0: + new_bsz = bsz - len(finalized_sents) + + # construct batch_idxs which holds indices of batches to keep for the next pass + batch_mask = torch.ones( + bsz, dtype=torch.bool, device=cand_indices.device + ) + batch_mask[finalized_sents] = False + # TODO replace `nonzero(as_tuple=False)` after TorchScript supports it + batch_idxs = torch.arange( + bsz, device=cand_indices.device + ).masked_select(batch_mask) + + # Choose the subset of the hypothesized constraints that will continue + self.search.prune_sentences(batch_idxs) + + eos_mask = eos_mask[batch_idxs] + cand_beams = cand_beams[batch_idxs] + bbsz_offsets.resize_(new_bsz, 1) + cand_bbsz_idx = cand_beams.add(bbsz_offsets) + cand_scores = cand_scores[batch_idxs] + cand_indices = cand_indices[batch_idxs] + + if prefix_tokens is not None: + prefix_tokens = prefix_tokens[batch_idxs] + src_lengths = src_lengths[batch_idxs] + cands_to_ignore = cands_to_ignore[batch_idxs] + + scores = scores.view(bsz, -1)[batch_idxs].view(new_bsz * beam_size, -1) + tokens = tokens.view(bsz, -1)[batch_idxs].view(new_bsz * beam_size, -1) + if attn is not None: + attn = attn.view(bsz, -1)[batch_idxs].view( + new_bsz * beam_size, attn.size(1), -1 + ) + bsz = new_bsz + else: + batch_idxs = None + + # Set active_mask so that values > cand_size indicate eos hypos + # and values < cand_size indicate candidate active hypos. + # After, the min values per row are the top candidate active hypos + + # Rewrite the operator since the element wise or is not supported in torchscript. + + eos_mask[:, :beam_size] = ~((~cands_to_ignore) & (~eos_mask[:, :beam_size])) + active_mask = torch.add( + eos_mask.type_as(cand_offsets) * cand_size, + cand_offsets[: eos_mask.size(1)], + ) + + # get the top beam_size active hypotheses, which are just + # the hypos with the smallest values in active_mask. + # {active_hypos} indicates which {beam_size} hypotheses + # from the list of {2 * beam_size} candidates were + # selected. Shapes: (batch size, beam size) + new_cands_to_ignore, active_hypos = torch.topk( + active_mask, k=beam_size, dim=1, largest=False + ) + + # update cands_to_ignore to ignore any finalized hypos. + cands_to_ignore = new_cands_to_ignore.ge(cand_size)[:, :beam_size] + # Make sure there is at least one active item for each sentence in the batch. 
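+            # Illustration: active_mask = eos_mask * cand_size + cand_offsets,
+            # so candidates that ended in EOS (or were already ignored) receive
+            # values >= cand_size while active ones keep their small offsets.
+            # Taking the beam_size smallest entries therefore prefers active
+            # hypotheses, and any selected entry >= cand_size is re-flagged in
+            # cands_to_ignore above.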
+ assert (~cands_to_ignore).any(dim=1).all() + + # update cands_to_ignore to ignore any finalized hypos + + # {active_bbsz_idx} denotes which beam number is continued for each new hypothesis (a beam + # can be selected more than once). + active_bbsz_idx = torch.gather(cand_bbsz_idx, dim=1, index=active_hypos) + active_scores = torch.gather(cand_scores, dim=1, index=active_hypos) + + active_bbsz_idx = active_bbsz_idx.view(-1) + active_scores = active_scores.view(-1) + + # copy tokens and scores for active hypotheses + + # Set the tokens for each beam (can select the same row more than once) + tokens[:, : step + 1] = torch.index_select( + tokens[:, : step + 1], dim=0, index=active_bbsz_idx + ) + # Select the next token for each of them + tokens.view(bsz, beam_size, -1)[:, :, step + 1] = torch.gather( + cand_indices, dim=1, index=active_hypos + ) + if step > 0: + scores[:, :step] = torch.index_select( + scores[:, :step], dim=0, index=active_bbsz_idx + ) + scores.view(bsz, beam_size, -1)[:, :, step] = torch.gather( + cand_scores, dim=1, index=active_hypos + ) + + # Update constraints based on which candidates were selected for the next beam + self.search.update_constraints(active_hypos) + + # copy attention for active hypotheses + if attn is not None: + attn[:, :, : step + 2] = torch.index_select( + attn[:, :, : step + 2], dim=0, index=active_bbsz_idx + ) + + # reorder incremental state in decoder + reorder_state = active_bbsz_idx + + # sort by score descending + for sent in range(len(finalized)): + scores = torch.tensor( + [float(elem["score"].item()) for elem in finalized[sent]] + ) + _, sorted_scores_indices = torch.sort(scores, descending=True) + finalized[sent] = [finalized[sent][ssi] for ssi in sorted_scores_indices] + finalized[sent] = torch.jit.annotate( + List[Dict[str, Tensor]], finalized[sent] + ) + return finalized + + def _prefix_tokens( + self, step: int, lprobs, scores, tokens, prefix_tokens, beam_size: int + ): + """Handle prefix tokens""" + prefix_toks = prefix_tokens[:, step].unsqueeze(-1).repeat(1, beam_size).view(-1) + prefix_lprobs = lprobs.gather(-1, prefix_toks.unsqueeze(-1)) + prefix_mask = prefix_toks.ne(self.pad) + lprobs[prefix_mask] = torch.min(prefix_lprobs) - 1 + lprobs[prefix_mask] = lprobs[prefix_mask].scatter( + -1, prefix_toks[prefix_mask].unsqueeze(-1), prefix_lprobs[prefix_mask] + ) + # if prefix includes eos, then we should make sure tokens and + # scores are the same across all beams + eos_mask = prefix_toks.eq(self.eos) + if eos_mask.any(): + # validate that the first beam matches the prefix + first_beam = tokens[eos_mask].view(-1, beam_size, tokens.size(-1))[ + :, 0, 1 : step + 1 + ] + eos_mask_batch_dim = eos_mask.view(-1, beam_size)[:, 0] + target_prefix = prefix_tokens[eos_mask_batch_dim][:, :step] + assert (first_beam == target_prefix).all() + + # copy tokens, scores and lprobs from the first beam to all beams + tokens = self.replicate_first_beam(tokens, eos_mask_batch_dim, beam_size) + scores = self.replicate_first_beam(scores, eos_mask_batch_dim, beam_size) + lprobs = self.replicate_first_beam(lprobs, eos_mask_batch_dim, beam_size) + return lprobs, tokens, scores + + def replicate_first_beam(self, tensor, mask, beam_size: int): + tensor = tensor.view(-1, beam_size, tensor.size(-1)) + tensor[mask] = tensor[mask][:, :1, :] + return tensor.view(-1, tensor.size(-1)) + + def finalize_hypos( + self, + step: int, + bbsz_idx, + eos_scores, + tokens, + scores, + finalized: List[List[Dict[str, Tensor]]], + finished: List[bool], + beam_size: int, + attn: 
Optional[Tensor], + src_lengths, + max_len: int, + ): + """Finalize hypothesis, store finalized information in `finalized`, and change `finished` accordingly. + A sentence is finalized when {beam_size} finished items have been collected for it. + Returns number of sentences (not beam items) being finalized. + These will be removed from the batch and not processed further. + Args: + bbsz_idx (Tensor): + """ + assert bbsz_idx.numel() == eos_scores.numel() + + # clone relevant token and attention tensors. + # tokens is (batch * beam, max_len). So the index_select + # gets the newly EOS rows, then selects cols 1..{step + 2} + tokens_clone = tokens.index_select(0, bbsz_idx)[ + :, 1 : step + 2 + ] # skip the first index, which is EOS + + tokens_clone[:, step] = self.eos + attn_clone = ( + attn.index_select(0, bbsz_idx)[:, :, 1 : step + 2] + if attn is not None + else None + ) + + # compute scores per token position + pos_scores = scores.index_select(0, bbsz_idx)[:, : step + 1] + pos_scores[:, step] = eos_scores + # convert from cumulative to per-position scores + pos_scores[:, 1:] = pos_scores[:, 1:] - pos_scores[:, :-1] + + # normalize sentence-level scores + if self.normalize_scores: + eos_scores /= (step + 1) ** self.len_penalty + + # cum_unfin records which sentences in the batch are finished. + # It helps match indexing between (a) the original sentences + # in the batch and (b) the current, possibly-reduced set of + # sentences. + cum_unfin: List[int] = [] + prev = 0 + for f in finished: + if f: + prev += 1 + else: + cum_unfin.append(prev) + cum_fin_tensor = torch.tensor(cum_unfin, dtype=torch.int).to(bbsz_idx) + + unfin_idx = bbsz_idx // beam_size + sent = unfin_idx + torch.index_select(cum_fin_tensor, 0, unfin_idx) + + # Create a set of "{sent}{unfin_idx}", where + # "unfin_idx" is the index in the current (possibly reduced) + # list of sentences, and "sent" is the index in the original, + # unreduced batch + # For every finished beam item + # sentence index in the current (possibly reduced) batch + seen = (sent << 32) + unfin_idx + unique_seen: List[int] = torch.unique(seen).tolist() + + if self.match_source_len: + condition = step > torch.index_select(src_lengths, 0, unfin_idx) + eos_scores = torch.where(condition, torch.tensor(-math.inf), eos_scores) + sent_list: List[int] = sent.tolist() + for i in range(bbsz_idx.size()[0]): + # An input sentence (among those in a batch) is finished when + # beam_size hypotheses have been collected for it + if len(finalized[sent_list[i]]) < beam_size: + if attn_clone is not None: + # remove padding tokens from attn scores + hypo_attn = attn_clone[i] + else: + hypo_attn = torch.empty(0) + + finalized[sent_list[i]].append( + { + "tokens": tokens_clone[i], + "score": eos_scores[i], + "attention": hypo_attn, # src_len x tgt_len + "alignment": torch.empty(0), + "positional_scores": pos_scores[i], + } + ) + + newly_finished: List[int] = [] + for unique_s in unique_seen: + # check termination conditions for this sentence + unique_sent: int = unique_s >> 32 + unique_unfin_idx: int = unique_s - (unique_sent << 32) + + if not finished[unique_sent] and self.is_finished( + step, unique_unfin_idx, max_len, len(finalized[unique_sent]), beam_size + ): + finished[unique_sent] = True + newly_finished.append(unique_unfin_idx) + + return newly_finished + + def is_finished( + self, + step: int, + unfin_idx: int, + max_len: int, + finalized_sent_len: int, + beam_size: int, + ): + """ + Check whether decoding for a sentence is finished, which + occurs when the list of 
finalized sentences has reached the + beam size, or when we reach the maximum length. + """ + assert finalized_sent_len <= beam_size + if finalized_sent_len == beam_size or step == max_len: + return True + return False + + +class EnsembleModel(nn.Module): + """A wrapper around an ensemble of models.""" + + def __init__(self, models): + super().__init__() + self.models_size = len(models) + # method '__len__' is not supported in ModuleList for torch script + self.single_model = models[0] + self.models = nn.ModuleList(models) + + self.has_incremental: bool = False + if all( + hasattr(m, "decoder") and isinstance(m.decoder, FairseqIncrementalDecoder) + for m in models + ): + self.has_incremental = True + + def forward(self): + pass + + def has_encoder(self): + return hasattr(self.single_model, "encoder") + + def has_incremental_states(self): + return self.has_incremental + + def max_decoder_positions(self): + return min([m.max_decoder_positions() for m in self.models if hasattr(m, "max_decoder_positions")] + [sys.maxsize]) + + @torch.jit.export + def forward_encoder(self, net_input: Dict[str, Tensor]): + if not self.has_encoder(): + return None + return [model.encoder.forward_torchscript(net_input) for model in self.models] + + @torch.jit.export + def forward_decoder( + self, + tokens, + encoder_outs: List[Dict[str, List[Tensor]]], + incremental_states: List[Dict[str, Dict[str, Optional[Tensor]]]], + temperature: float = 1.0, + ): + log_probs = [] + avg_attn: Optional[Tensor] = None + encoder_out: Optional[Dict[str, List[Tensor]]] = None + for i, model in enumerate(self.models): + if self.has_encoder(): + encoder_out = encoder_outs[i] + # decode each model + if self.has_incremental_states(): + decoder_out = model.decoder.forward( + tokens, + encoder_out=encoder_out, + incremental_state=incremental_states[i], + ) + else: + if hasattr(model, "decoder"): + decoder_out = model.decoder.forward(tokens, encoder_out=encoder_out) + else: + decoder_out = model.forward(tokens) + + attn: Optional[Tensor] = None + decoder_len = len(decoder_out) + if decoder_len > 1 and decoder_out[1] is not None: + if isinstance(decoder_out[1], Tensor): + attn = decoder_out[1] + else: + attn_holder = decoder_out[1]["attn"] + if isinstance(attn_holder, Tensor): + attn = attn_holder + elif attn_holder is not None: + attn = attn_holder[0] + if attn is not None: + attn = attn[:, -1, :] + + decoder_out_tuple = ( + decoder_out[0][:, -1:, :].div_(temperature), + None if decoder_len <= 1 else decoder_out[1], + ) + probs = model.get_normalized_probs( + decoder_out_tuple, log_probs=True, sample=None + ) + probs = probs[:, -1, :] + if self.models_size == 1: + return probs, attn + + log_probs.append(probs) + if attn is not None: + if avg_attn is None: + avg_attn = attn + else: + avg_attn.add_(attn) + + avg_probs = torch.logsumexp(torch.stack(log_probs, dim=0), dim=0) - math.log( + self.models_size + ) + + if avg_attn is not None: + avg_attn.div_(self.models_size) + return avg_probs, avg_attn + + @torch.jit.export + def reorder_encoder_out( + self, encoder_outs: Optional[List[Dict[str, List[Tensor]]]], new_order + ): + """ + Reorder encoder output according to *new_order*. 
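+        Beam search prunes and re-selects hypotheses at every step, so the
+        cached encoder outputs must be gathered with the same *new_order* to
+        stay aligned with the surviving beam items.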
+ Args: + encoder_out: output from the ``forward()`` method + new_order (LongTensor): desired order + Returns: + *encoder_out* rearranged according to *new_order* + """ + new_outs: List[Dict[str, List[Tensor]]] = [] + if not self.has_encoder(): + return new_outs + for i, model in enumerate(self.models): + assert encoder_outs is not None + new_outs.append( + model.encoder.reorder_encoder_out(encoder_outs[i], new_order) + ) + return new_outs + + @torch.jit.export + def reorder_incremental_state( + self, + incremental_states: List[Dict[str, Dict[str, Optional[Tensor]]]], + new_order, + ): + if not self.has_incremental_states(): + return + for i, model in enumerate(self.models): + model.decoder.reorder_incremental_state_scripting( + incremental_states[i], new_order + ) + + +class SequenceGeneratorWithAlignment(SequenceGenerator): + def __init__( + self, models, tgt_dict, left_pad_target=False, print_alignment="hard", **kwargs + ): + """Generates translations of a given source sentence. + Produces alignments following "Jointly Learning to Align and + Translate with Transformer Models" (Garg et al., EMNLP 2019). + Args: + left_pad_target (bool, optional): Whether or not the + hypothesis should be left padded or not when they are + teacher forced for generating alignments. + """ + super().__init__(EnsembleModelWithAlignment(models), tgt_dict, **kwargs) + self.left_pad_target = left_pad_target + + if print_alignment == "hard": + self.extract_alignment = utils.extract_hard_alignment + elif print_alignment == "soft": + self.extract_alignment = utils.extract_soft_alignment + + @torch.no_grad() + def generate(self, models, sample, **kwargs): + finalized = super()._generate(sample, **kwargs) + + src_tokens = sample["net_input"]["src_tokens"] + bsz = src_tokens.shape[0] + beam_size = self.beam_size + ( + src_tokens, + src_lengths, + prev_output_tokens, + tgt_tokens, + ) = self._prepare_batch_for_alignment(sample, finalized) + if any(getattr(m, "full_context_alignment", False) for m in self.model.models): + attn = self.model.forward_align(src_tokens, src_lengths, prev_output_tokens) + else: + attn = [ + finalized[i // beam_size][i % beam_size]["attention"].transpose(1, 0) + for i in range(bsz * beam_size) + ] + + if src_tokens.device != "cpu": + src_tokens = src_tokens.to("cpu") + tgt_tokens = tgt_tokens.to("cpu") + attn = [i.to("cpu") for i in attn] + + # Process the attn matrix to extract hard alignments. 
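+        # Hard alignment maps each target token to its most-attended source
+        # position (pad/eos masked out), while soft alignment keeps the
+        # attention weights themselves.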
+ for i in range(bsz * beam_size): + alignment = self.extract_alignment( + attn[i], src_tokens[i], tgt_tokens[i], self.pad, self.eos + ) + finalized[i // beam_size][i % beam_size]["alignment"] = alignment + return finalized + + def _prepare_batch_for_alignment(self, sample, hypothesis): + src_tokens = sample["net_input"]["src_tokens"] + bsz = src_tokens.shape[0] + src_tokens = ( + src_tokens[:, None, :] + .expand(-1, self.beam_size, -1) + .contiguous() + .view(bsz * self.beam_size, -1) + ) + src_lengths = sample["net_input"]["src_lengths"] + src_lengths = ( + src_lengths[:, None] + .expand(-1, self.beam_size) + .contiguous() + .view(bsz * self.beam_size) + ) + prev_output_tokens = data_utils.collate_tokens( + [beam["tokens"] for example in hypothesis for beam in example], + self.pad, + self.eos, + self.left_pad_target, + move_eos_to_beginning=True, + ) + tgt_tokens = data_utils.collate_tokens( + [beam["tokens"] for example in hypothesis for beam in example], + self.pad, + self.eos, + self.left_pad_target, + move_eos_to_beginning=False, + ) + return src_tokens, src_lengths, prev_output_tokens, tgt_tokens + + +class EnsembleModelWithAlignment(EnsembleModel): + """A wrapper around an ensemble of models.""" + + def __init__(self, models): + super().__init__(models) + + def forward_align(self, src_tokens, src_lengths, prev_output_tokens): + avg_attn = None + for model in self.models: + decoder_out = model(src_tokens, src_lengths, prev_output_tokens) + attn = decoder_out[1]["attn"][0] + if avg_attn is None: + avg_attn = attn + else: + avg_attn.add_(attn) + if len(self.models) > 1: + avg_attn.div_(len(self.models)) + return avg_attn diff --git a/Speech2C/speech2c/tasks/speech2c_pretraining.py b/Speech2C/speech2c/tasks/speech2c_pretraining.py new file mode 100644 index 0000000000000000000000000000000000000000..de275630bb08ad3ffae5120eee93d0c75d9ed8b0 --- /dev/null +++ b/Speech2C/speech2c/tasks/speech2c_pretraining.py @@ -0,0 +1,91 @@ +# -------------------------------------------------------- +# Pre-Training Transformer Decoder for End-to-End ASR Model with Unpaired Speech Data (https://arxiv.org/abs/2203.17113) +# Github source: https://github.com/microsoft/SpeechT5/tree/main/Speech2C +# Copyright (c) 2022 Microsoft +# Licensed under The MIT License [see LICENSE for details] +# Based on fairseq code bases +# https://github.com/pytorch/fairseq +# -------------------------------------------------------- + +import logging + +from dataclasses import dataclass, field +from fairseq.data import Dictionary +from fairseq.tasks import register_task +from fairseq.tasks.hubert_pretraining import HubertPretrainingConfig, HubertPretrainingTask, LabelEncoder +from speech2c.data.speech2c_dataset import Speech2cDataset + +logger = logging.getLogger(__name__) + + +@dataclass +class Speech2cPretrainingConfig(HubertPretrainingConfig): + add_decoder: bool = field( + default=False, + metadata={"help": "whether to add decoder for CE Loss on code"}, + ) + + # For inference + ctc_weight: float = field( + default=0.0, + metadata={"help": "ctc weight during inference"}, + ) + + +@register_task("speech2c_pretraining", dataclass=Speech2cPretrainingConfig) +class Speech2cPretrainingTask(HubertPretrainingTask): + + cfg: Speech2cPretrainingConfig + + def load_dictionaries(self): + label_dir = self.cfg.data if self.cfg.label_dir is None else self.cfg.label_dir + dictionaries = [Dictionary.load(f"{label_dir}/dict.{label}.txt") for label in self.cfg.labels] + return dictionaries[0] if self.cfg.fine_tuning else dictionaries + + def 
load_dataset(self, split: str, **kwargs) -> None: + manifest = f"{self.cfg.data}/{split}.tsv" + dicts = [self.target_dictionary] if self.cfg.fine_tuning else self.dictionaries + pad_list = [dict.pad() for dict in dicts] + eos_list = [dict.eos() for dict in dicts] + procs = [LabelEncoder(dict) for dict in dicts] + paths = [ + f"{self.get_label_dir()}/{split}.{l}" for l in self.cfg.labels + ] + + # hubert v1: pad_audio=True, random_crop=False; + self.datasets[split] = Speech2cDataset( + manifest, + sample_rate=self.cfg.sample_rate, + label_paths=paths, + label_rates=self.cfg.label_rate, + pad_list=pad_list, + eos_list=eos_list, + label_processors=procs, + max_keep_sample_size=self.cfg.max_keep_size, + min_keep_sample_size=self.cfg.min_sample_size, + max_sample_size=self.cfg.max_sample_size, + pad_audio=self.cfg.pad_audio, + normalize=self.cfg.normalize, + store_labels=False, + random_crop=self.cfg.random_crop, + single_target=self.cfg.single_target, + tgt_dict=dicts[0], + add_decoder=self.cfg.add_decoder, + fine_tuning=self.cfg.fine_tuning, + ) + + def build_generator( + self, + models, + args, + seq_gen_cls=None, + extra_gen_cls_kwargs=None, + ): + from speech2c.squence_generator import SequenceGenerator + extra_gen_cls_kwargs = { + "ctc_weight": self.cfg.ctc_weight, + **extra_gen_cls_kwargs + } + return super().build_generator( + models, args, seq_gen_cls=SequenceGenerator, extra_gen_cls_kwargs=extra_gen_cls_kwargs + ) diff --git a/Speech2S/README.md b/Speech2S/README.md new file mode 100644 index 0000000000000000000000000000000000000000..fc827e237111d872dac19ce407b8d11e52a5ee44 --- /dev/null +++ b/Speech2S/README.md @@ -0,0 +1,64 @@ +# Speech2S + + + [**Joint Pre-Training with Speech and Bilingual Text for Direct Speech to Speech Translation**](https://arxiv.org/abs/2210.17027) + + +- (Updating) Nov. 2022: release the code and models +- Nov. 2022: release preprint in [arXiv](https://arxiv.org/abs/2210.17027) + +## Pre-Trained and Fine-tuned Models + +| Model | Pre-training Dataset | Fine-tuning Dataset | Model | +| :------: | :----------------------------------------------: | :-----------------: | :-----: | +| Speech2S_enes | Voxpopuli_en_v2 | - | [Google Drive](https://drive.google.com/file/d/1TYypFiEKoCixUro8FTTG23bRZYwAxhkX/view?usp=share_link) | +| Speech2S_enes | Voxpopuli_en_v2 | Voxpopuli_s2s | [Google Drive](https://drive.google.com/file/d/11RxeKznSrHcoP_KK9A1VgwRt3fNh_U_C/view?usp=share_link) | +| Speech2S_esen | Voxpopuli_es_v2 | - | [Google Drive](https://drive.google.com/file/d/1NoC7W-UtQZ-ugIptF1ex0ZlGJncsT1S4/view?usp=share_link) | +| Speech2S_esen | Voxpopuli_es_v2 | Voxpopuli_s2s | [Google Drive](https://drive.google.com/file/d/1eNcKw4ZWGmcABWXJxlf6MKocmiPrKSkH/view?usp=share_link) | + + +## Setup +``` +cd Speech2S/speech2s +pip install --editable fairseq/ +``` + +## Data Preparation +Please follow the steps of data preparation for S2ST in [here](https://github.com/facebookresearch/fairseq/blob/main/examples/speech_to_speech/docs/enhanced_direct_s2st_discrete_units.md). + +## Pre-Training +``` +cd speech2s/stpretrain_scripts +base_sc2c_enes.sh +``` +## Finetune +``` +cd speech2s/stpretrain_scripts +finetune_enes.sh +``` +## Inference +``` +cd speech2s/stpretrain_scripts +inference_ed.sh +``` +## Results on Voxpopuli and Covst + + +## License + +This project is licensed under the license found in the LICENSE file in the root directory of this source tree. +Portions of the source code are based on the [FAIRSEQ](https://github.com/pytorch/fairseq). 
+ +[Microsoft Open Source Code of Conduct](https://opensource.microsoft.com/codeofconduct) + +## Reference + +If you find our work is useful in your research, please cite the following paper: +```bibtex +@article{wei2022joint, + title={Joint Pre-Training with Speech and Bilingual Text for Direct Speech to Speech Translation}, + author={Wei, Kun and Zhou, Long and Zhang, Ziqiang and Chen, Liping and Liu, Shujie and He, Lei and Li, Jinyu and Wei, Furu}, + journal={arXiv preprint arXiv:2210.17027}, + year={2022} +} +``` diff --git a/Speech2S/speech2s/__init__.py b/Speech2S/speech2s/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..97327d269e93a13cd135f6c1a187fd820a8decb8 --- /dev/null +++ b/Speech2S/speech2s/__init__.py @@ -0,0 +1 @@ +from . import data, tasks, criterions, models diff --git a/Speech2S/speech2s/config/finetune_asr/speechut_base_100h.yaml b/Speech2S/speech2s/config/finetune_asr/speechut_base_100h.yaml new file mode 100644 index 0000000000000000000000000000000000000000..736c3c72b9a7ba85eacaf44e1952fa7f0fc15a4f --- /dev/null +++ b/Speech2S/speech2s/config/finetune_asr/speechut_base_100h.yaml @@ -0,0 +1,101 @@ +# @package _group_ + +common: + fp16: true + log_format: json + log_interval: 100 + tensorboard_logdir: tblog + seed: 1337 + +checkpoint: + save_interval: 1 + keep_last_epochs: 1 + keep_best_checkpoints: 5 + best_checkpoint_metric: dec_accuracy + maximize_best_checkpoint_metric: true + restore_file: checkpoint_last.pt + +distributed_training: + ddp_backend: legacy_ddp + find_unused_parameters: true + distributed_world_size: 1 + distributed_port: -1 + nprocs_per_node: 8 + +task: + _name: joint_sc2t_pretraining + data: ??? + fine_tuning: true + label_dir: ??? + normalize: false # must be consistent with pre-training + labels: ["ltr"] + store_labels: true + single_target: true + add_decoder_target: true + pad_audio: false + random_crop: true + hubert_tokenizer: "none" + sp_path: None + +dataset: + num_workers: 0 + max_tokens: 1300000 + skip_invalid_size_inputs_valid_test: true + train_subset: train_100 + valid_subset: dev_other + required_batch_size_multiple: 1 + +criterion: + _name: ctc_ce + zero_infinity: true + +optimization: + max_update: 40000 + lr: [0.00001] + sentence_avg: true + update_freq: [2] + +optimizer: + _name: adam + adam_betas: (0.9,0.98) + adam_eps: 1e-08 + weight_decay: 0.0 + +lr_scheduler: + _name: tri_stage + phase_ratio: [0.1, 0.4, 0.5] + final_lr_scale: 0.05 + +model: + _name: speechut_asr + w2v_path: ??? + apply_mask: true + mask_prob: 0.65 + mask_channel_prob: 0.5 + mask_channel_length: 64 + layerdrop: 0.1 + activation_dropout: 0.1 + feature_grad_mult: 0.0 + freeze_finetune_updates: 0 + add_decoder: true + +hydra: + job: + config: + override_dirname: + kv_sep: '-' + item_sep: '__' + exclude_keys: + - run + - task.data + - task.label_dir + - model.w2v_path + - dataset.train_subset + - dataset.valid_subset + - criterion.wer_kenlm_model + - criterion.wer_lexicon + run: + dir: ??? + sweep: + dir: ??? 
+ subdir: ${hydra.job.config_name}__${hydra.job.override_dirname} diff --git a/Speech2S/speech2s/config/finetune_asr/speechut_large_100h.yaml b/Speech2S/speech2s/config/finetune_asr/speechut_large_100h.yaml new file mode 100644 index 0000000000000000000000000000000000000000..7cbc59e61f10ab00b997286d6355f22ce1008677 --- /dev/null +++ b/Speech2S/speech2s/config/finetune_asr/speechut_large_100h.yaml @@ -0,0 +1,102 @@ +# @package _group_ + +common: + fp16: true + log_format: json + log_interval: 100 + tensorboard_logdir: tblog + seed: 1337 + +checkpoint: + save_interval: 1 + keep_last_epochs: 5 + keep_best_checkpoints: 5 + best_checkpoint_metric: dec_accuracy + maximize_best_checkpoint_metric: true + restore_file: checkpoint_last.pt + +distributed_training: + ddp_backend: legacy_ddp + find_unused_parameters: true + distributed_world_size: 16 + distributed_port: -1 + nprocs_per_node: 8 + +task: + _name: joint_sc2t_pretraining + data: ??? + fine_tuning: true + label_dir: ??? + normalize: true # must be consistent with pre-training + labels: ["ltr"] + store_labels: true + single_target: true + add_decoder_target: true + pad_audio: false + random_crop: true + hubert_tokenizer: "none" + sp_path: None + +dataset: + num_workers: 0 + max_tokens: 1300000 + skip_invalid_size_inputs_valid_test: true + train_subset: train_100 + valid_subset: dev_other + required_batch_size_multiple: 1 + +criterion: + _name: ctc_ce + zero_infinity: true + +optimization: + max_update: 40000 + lr: [0.00001] + sentence_avg: true + update_freq: [2] + +optimizer: + _name: adam + adam_betas: (0.9,0.98) + adam_eps: 1e-08 + weight_decay: 0.0 + +lr_scheduler: + _name: tri_stage + phase_ratio: [0.1, 0.4, 0.5] + final_lr_scale: 0.05 + +model: + _name: speechut_asr + w2v_path: ??? + apply_mask: true + mask_prob: 0.5 + mask_channel_prob: 0.5 + mask_channel_length: 64 + layerdrop: 0.0 + activation_dropout: 0.1 + attention_dropout: 0.1 + feature_grad_mult: 0.0 + freeze_finetune_updates: 0 + add_decoder: true + +hydra: + job: + config: + override_dirname: + kv_sep: '-' + item_sep: '__' + exclude_keys: + - run + - task.data + - task.label_dir + - model.w2v_path + - dataset.train_subset + - dataset.valid_subset + - criterion.wer_kenlm_model + - criterion.wer_lexicon + run: + dir: ??? + sweep: + dir: ??? + subdir: ${hydra.job.config_name}__${hydra.job.override_dirname} diff --git a/Speech2S/speech2s/config/finetune_asr/speechut_large_960h.yaml b/Speech2S/speech2s/config/finetune_asr/speechut_large_960h.yaml new file mode 100644 index 0000000000000000000000000000000000000000..f10d6002555e5cbcfbf31035d8258e77abc26050 --- /dev/null +++ b/Speech2S/speech2s/config/finetune_asr/speechut_large_960h.yaml @@ -0,0 +1,100 @@ +# @package _group_ + +common: + fp16: true + log_format: json + log_interval: 100 + tensorboard_logdir: tblog + +checkpoint: + save_interval: 1 + keep_last_epochs: 5 + keep_best_checkpoints: 5 + best_checkpoint_metric: dec_accuracy + maximize_best_checkpoint_metric: true + restore_file: checkpoint_last.pt + +distributed_training: + ddp_backend: legacy_ddp + find_unused_parameters: true + distributed_world_size: 24 + distributed_port: -1 + nprocs_per_node: 8 + +task: + _name: joint_sc2t_pretraining + data: ??? + fine_tuning: true + label_dir: ??? 
+ normalize: true # must be consistent with pre-training + labels: ["ltr"] + store_labels: true + single_target: true + add_decoder_target: true + pad_audio: false + random_crop: true + hubert_tokenizer: "none" + sp_path: None + +dataset: + num_workers: 0 + max_tokens: 1300000 + skip_invalid_size_inputs_valid_test: true + train_subset: train_960 + valid_subset: dev_other + required_batch_size_multiple: 1 + +criterion: + _name: ctc_ce + zero_infinity: true + +optimization: + max_update: 40000 + lr: [0.00001] + sentence_avg: true + update_freq: [2] + +optimizer: + _name: adam + adam_betas: (0.9,0.98) + adam_eps: 1e-08 + weight_decay: 0.0 + +lr_scheduler: + _name: tri_stage + phase_ratio: [0.1, 0.4, 0.5] + final_lr_scale: 0.05 + +model: + _name: speechut_asr + w2v_path: ??? + apply_mask: true + mask_prob: 0.5 + mask_channel_prob: 0.25 + mask_channel_length: 64 + layerdrop: 0.0 + activation_dropout: 0.1 + feature_grad_mult: 0.0 + freeze_finetune_updates: 0 + add_decoder: true + +hydra: + job: + config: + override_dirname: + kv_sep: '-' + item_sep: '__' + exclude_keys: + - run + - task.data + - task.label_dir + - model.w2v_path + - dataset.train_subset + - dataset.valid_subset + - criterion.wer_kenlm_model + - criterion.wer_lexicon + run: + dir: ??? + sweep: + dir: ??? + subdir: ${hydra.job.config_name}__${hydra.job.override_dirname} diff --git a/Speech2S/speech2s/config/pretrain/speechut_base_librispeech.yaml b/Speech2S/speech2s/config/pretrain/speechut_base_librispeech.yaml new file mode 100644 index 0000000000000000000000000000000000000000..6a3751febf2efc3cbf7a91e3a75f05b570559f2c --- /dev/null +++ b/Speech2S/speech2s/config/pretrain/speechut_base_librispeech.yaml @@ -0,0 +1,153 @@ +# @package _group_ + +common: + fp16: true + log_format: json + log_interval: 200 + seed: 1337 + tensorboard_logdir: tblog + +checkpoint: + save_dir: ??? + save_interval: 4 + keep_last_epochs: 4 + save_interval_updates: 50000 + keep_interval_updates: -1 + keep_interval_updates_pattern: 50000 + # no_epoch_checkpoints: true + +distributed_training: + ddp_backend: no_c10d + distributed_backend: 'nccl' + distributed_port: -1 + distributed_world_size: 32 + nprocs_per_node: 8 + find_unused_parameters: true + +task: + _name: joint_sc2t_pretraining + data: ??? + label_dir: ??? + labels: ??? + label_rate: ${model.label_rate} + store_labels: true + sample_rate: 16000 + max_sample_size: 250000 + min_sample_size: 32000 + pad_audio: false + random_crop: true + normalize: false # must be consistent with extractor + add_decoder_target: true + text_cfg: + seed: ${common.seed} + text_data: ??? + data_config: config.yaml + sample_break_mode: eos + tokens_per_sample: 1024 + shorten_method: "random_crop" + text_maxtokens_ratio: 1.5 + +dataset: + num_workers: 6 + max_tokens: 1400000 + skip_invalid_size_inputs_valid_test: true + validate_interval: ${checkpoint.save_interval} + validate_interval_updates: ${checkpoint.save_interval_updates} + required_batch_size_multiple: 1 + +criterion: + _name: speechut_criterion + pred_masked_weight: 1.0 + pred_nomask_weight: 0.0 + loss_weights: [10,] + label_smoothing: 0.1 + u2t_ed_weight: 0.1 + u2t_ctc_weight: 0.1 + text_mum_weight: 0.5 + +optimization: + max_update: 400000 + lr: [0.0005] + clip_norm: 10.0 + +optimizer: + _name: adam + adam_betas: (0.9,0.98) + adam_eps: 1e-06 + weight_decay: 0.01 + +lr_scheduler: + _name: polynomial_decay + warmup_updates: 32000 + +model: + _name: speechut + label_rate: ??? 
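+  # label_rate is the frame rate (Hz) of the unit labels, e.g. 50 for 50 Hz HuBERT-style units (example value; must match the provided label files)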
+ skip_masked: false + skip_nomask: false + mask_prob: 0.80 + extractor_mode: default + conv_feature_layers: '[(512,10,5)] + [(512,3,2)] * 4 + [(512,2,2)] * 2' + final_dim: 256 + activation_fn: "gelu" + encoder_layers: 6 + encoder_attention_heads: 8 + encoder_layerdrop: 0.0 + dropout_input: 0.1 + dropout_features: 0.1 + dropout: 0.1 + attention_dropout: 0.1 + feature_grad_mult: 0.1 + untie_final_proj: true + activation_dropout: 0.0 + use_rel_pos_enc: true + add_unit_encoder: true + add_text_ctc: true + mask_u2t: false + mix_with_unit: true + add_decoder: true + reset_decoder_embedding_config: true + text_transformer: + activation_fn: ${model.activation_fn} + dropout: ${model.dropout} + attention_dropout: ${model.attention_dropout} + activation_dropout: ${model.activation_dropout} + max_source_positions: 3000 + max_target_positions: 3000 + no_scale_embedding: true + layernorm_embedding: true + no_token_positional_embeddings: false + share_decoder_input_output_embed: false + encoder: + embed_dim: 768 + ffn_embed_dim: 3072 + layers: 6 + attention_heads: 8 + normalize_before: false + learned_pos: true + layerdrop: ${model.encoder_layerdrop} + decoder: + layerdrop: 0.1 + embed_dim: 768 + ffn_embed_dim: 3072 + layers: 6 + attention_heads: 12 + normalize_before: false + learned_pos: false + output_dim: 768 + +hydra: + job: + config: + override_dirname: + kv_sep: '-' + item_sep: '__' + exclude_keys: + - run + - task.data + - task.label_dir + run: + dir: ??? + sweep: + dir: ??? + subdir: ${hydra.job.config_name}__${hydra.job.override_dirname} diff --git a/Speech2S/speech2s/config/pretrain/speechut_large_librilight.yaml b/Speech2S/speech2s/config/pretrain/speechut_large_librilight.yaml new file mode 100644 index 0000000000000000000000000000000000000000..849c1d986126f6e26f3e10feb14fae0a299be4b4 --- /dev/null +++ b/Speech2S/speech2s/config/pretrain/speechut_large_librilight.yaml @@ -0,0 +1,159 @@ +# @package _group_ + +common: + fp16: true + fp16_scale_tolerance: 0.1 # alleviate fp16 overflow issue + log_format: json + log_interval: 200 + seed: 1234 + tensorboard_logdir: tblog + +checkpoint: + save_dir: ??? + save_interval: 1 + keep_last_epochs: 4 + save_interval_updates: 10000 + keep_interval_updates: -1 + keep_interval_updates_pattern: 10000 + # no_epoch_checkpoints: true + +distributed_training: + ddp_backend: no_c10d + distributed_backend: 'nccl' + distributed_port: -1 + distributed_world_size: 128 + nprocs_per_node: 8 + find_unused_parameters: true + +task: + _name: joint_sc2t_pretraining + data: ??? + label_dir: ??? + labels: ??? + label_rate: ${model.label_rate} + store_labels: true + sample_rate: 16000 + max_sample_size: 250000 + min_sample_size: 32000 + pad_audio: false + random_crop: true + normalize: true # must be consistent with extractor + add_decoder_target: true + text_cfg: + seed: ${common.seed} + text_data: ??? 
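+    # text_data points to the root directory of the text corpus used by the text branch of joint pre-training (set per experiment)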
+ data_config: config.yaml + sample_break_mode: eos + tokens_per_sample: 1024 + shorten_method: "random_crop" + text_maxtokens_ratio: 1.4 + +dataset: + num_workers: 6 + max_tokens: 900000 + skip_invalid_size_inputs_valid_test: true + validate_interval: ${checkpoint.save_interval} + validate_interval_updates: ${checkpoint.save_interval_updates} + required_batch_size_multiple: 2 + +criterion: + _name: speechut_criterion + pred_masked_weight: 1.0 + pred_nomask_weight: 0.0 + loss_weights: [10,] + label_smoothing: 0.1 + u2t_ed_weight: 0.1 + u2t_ctc_weight: 0.1 + text_mum_weight: 0.5 + +optimization: + max_update: 400000 + lr: [0.0005] + clip_norm: 1.0 + +optimizer: + _name: adam + adam_betas: (0.9,0.98) + adam_eps: 1e-06 + weight_decay: 0.01 + +lr_scheduler: + _name: polynomial_decay + warmup_updates: 32000 + end_learning_rate: 0.00015 # for future longger pre-training, e.g. 600K step + +model: + _name: speechut + label_rate: ??? + encoder_embed_dim: 1024 + encoder_ffn_embed_dim: 4096 + skip_masked: false + skip_nomask: false + mask_prob: 0.80 + extractor_mode: layer_norm + conv_feature_layers: '[(512,10,5)] + [(512,3,2)] * 4 + [(512,2,2)] * 2' + final_dim: 768 + activation_fn: "gelu" + encoder_layers: 12 + encoder_attention_heads: 16 + encoder_layerdrop: 0.0 + dropout_input: 0.0 + dropout_features: 0.0 + dropout: 0.0 + attention_dropout: 0.0 + layer_norm_first: true + feature_grad_mult: 1.0 + untie_final_proj: true + activation_dropout: 0.0 + use_rel_pos_enc: true + add_unit_encoder: true + add_text_ctc: true + mask_u2t: false + mix_with_unit: true + add_decoder: true + reset_decoder_embedding_config: true + scaling_for_att: 32 # alleviate fp16 overflow issue + text_transformer: + activation_fn: ${model.activation_fn} + dropout: ${model.dropout} + attention_dropout: ${model.attention_dropout} + activation_dropout: ${model.activation_dropout} + max_source_positions: 3000 + max_target_positions: 3000 + no_scale_embedding: true + layernorm_embedding: true + no_token_positional_embeddings: true + share_decoder_input_output_embed: false + encoder: + embed_dim: 1024 + ffn_embed_dim: 4096 + layers: 12 + attention_heads: 16 + normalize_before: false + learned_pos: true + layerdrop: ${model.encoder_layerdrop} + decoder: + layerdrop: 0.1 + embed_dim: 768 + ffn_embed_dim: 3072 + layers: 6 + attention_heads: 12 + normalize_before: false + learned_pos: false + output_dim: 768 + +hydra: + job: + config: + override_dirname: + kv_sep: '-' + item_sep: '__' + exclude_keys: + - run + - task.data + - task.label_dir + run: + dir: ??? + sweep: + dir: ??? + subdir: ${hydra.job.config_name}__${hydra.job.override_dirname} diff --git a/Speech2S/speech2s/criterions/__init__.py b/Speech2S/speech2s/criterions/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..2bf9fac9a8c00d76decd07417d86a2625c4c851c --- /dev/null +++ b/Speech2S/speech2s/criterions/__init__.py @@ -0,0 +1,9 @@ +import importlib +import os + +for file in os.listdir(os.path.dirname(__file__)): + if file.endswith(".py") and not file.startswith("_"): + criterion_name = file[: file.find(".py")] + importlib.import_module( + "speechut.criterions." 
+ criterion_name + ) diff --git a/Speech2S/speech2s/criterions/ctc_ce.py b/Speech2S/speech2s/criterions/ctc_ce.py new file mode 100644 index 0000000000000000000000000000000000000000..aab6c9d23ac3b7dc410704bcba8982a697a57656 --- /dev/null +++ b/Speech2S/speech2s/criterions/ctc_ce.py @@ -0,0 +1,414 @@ +# ---------------------------------------------------------------------------- +# SpeechUT: Bridging Speech and Text with Hidden-Unit for Encoder-Decoder Based Speech-Text Pre-training (https://arxiv.org/abs/2210.03730) +# Github source: https://github.com/microsoft/SpeechT5/tree/main/SpeechUT +# Code based on fairseq: https://github.com/facebookresearch/fairseq/tree/272c4c5197250997148fb12c0db6306035f166a4 +# +# Copyright (c) 2022 Microsoft +# Licensed under The MIT License [see LICENSE for details] +# ---------------------------------------------------------------------------- + +import math +from argparse import Namespace +from dataclasses import dataclass, field +from omegaconf import II +from typing import Optional + +import torch +import torch.nn.functional as F +from fairseq import metrics, utils +from fairseq.criterions import FairseqCriterion, register_criterion +from fairseq.criterions.label_smoothed_cross_entropy import label_smoothed_nll_loss +from fairseq.dataclass import FairseqDataclass +from fairseq.data.data_utils import post_process +from fairseq.tasks import FairseqTask +from fairseq.logging.meters import safe_round + + +@dataclass +class CtcCeCriterionConfig(FairseqDataclass): + zero_infinity: bool = field( + default=False, + metadata={"help": "zero inf loss when source length <= target length"}, + ) + sentence_avg: bool = II("optimization.sentence_avg") + post_process: str = field( + default="letter", + metadata={ + "help": "how to post process predictions into words. can be letter, " + "wordpiece, BPE symbols, etc. 
" + "See fairseq.data.data_utils.post_process() for full list of options" + }, + ) + wer_kenlm_model: Optional[str] = field( + default=None, + metadata={ + "help": "if this is provided, use kenlm to compute wer (along with other wer_* args)" + }, + ) + wer_lexicon: Optional[str] = field( + default=None, + metadata={"help": "lexicon to use with wer_kenlm_model"}, + ) + wer_lm_weight: float = field( + default=2.0, + metadata={"help": "lm weight to use with wer_kenlm_model"}, + ) + wer_word_score: float = field( + default=-1.0, + metadata={"help": "lm word score to use with wer_kenlm_model"}, + ) + + wer_args: Optional[str] = field( + default=None, + metadata={ + "help": "DEPRECATED: tuple of (wer_kenlm_model, wer_lexicon, wer_lm_weight, wer_word_score)" + }, + ) + + dec_weight: float = field( + default=0.5, + metadata={"help": "weights for decoder CE Loss, loss will be ((1 - dec_weight) * hubert_loss + dec_weight * CE_Loss)"}, + ) + report_accuracy: bool = field( + default=True, + metadata={"help": "report decoder accuracy metric"}, + ) + ignore_prefix_size: int = field( + default=0, + metadata={"help": "Ignore first N tokens"}, + ) + label_smoothing: float = field( + default=0.1, + metadata={"help": "epsilon for label smoothing, 0 means no label smoothing"}, + ) + + +@register_criterion("ctc_ce", dataclass=CtcCeCriterionConfig) +class CtcCeCriterion(FairseqCriterion): + def __init__(self, cfg: CtcCeCriterionConfig, task: FairseqTask): + super().__init__(task) + self.blank_idx = ( + task.target_dictionary.index(task.blank_symbol) + if hasattr(task, "blank_symbol") + else 0 + ) + self.pad_idx = task.target_dictionary.pad() + self.eos_idx = task.target_dictionary.eos() + self.post_process = cfg.post_process + + if cfg.wer_args is not None: + ( + cfg.wer_kenlm_model, + cfg.wer_lexicon, + cfg.wer_lm_weight, + cfg.wer_word_score, + ) = eval(cfg.wer_args) + + if cfg.wer_kenlm_model is not None: + from examples.speech_recognition.w2l_decoder import W2lKenLMDecoder + + dec_args = Namespace() + dec_args.nbest = 1 + dec_args.criterion = "ctc" + dec_args.kenlm_model = cfg.wer_kenlm_model + dec_args.lexicon = cfg.wer_lexicon + dec_args.beam = 50 + dec_args.beam_size_token = min(50, len(task.target_dictionary)) + dec_args.beam_threshold = min(50, len(task.target_dictionary)) + dec_args.lm_weight = cfg.wer_lm_weight + dec_args.word_score = cfg.wer_word_score + dec_args.unk_weight = -math.inf + dec_args.sil_weight = 0 + + self.w2l_decoder = W2lKenLMDecoder(dec_args, task.target_dictionary) + else: + self.w2l_decoder = None + + self.zero_infinity = cfg.zero_infinity + self.sentence_avg = cfg.sentence_avg + + self.dec_weight = cfg.dec_weight + self.report_accuracy = cfg.report_accuracy + self.ignore_prefix_size = cfg.ignore_prefix_size + self.eps = cfg.label_smoothing + + def forward(self, model, sample, reduce=True): + net_output = model(**sample["net_input"]) + lprobs = model.get_normalized_probs( + net_output, log_probs=True + ).contiguous() # (T, B, C) from the encoder + + if "src_lengths" in sample["net_input"]: + input_lengths = sample["net_input"]["src_lengths"] + else: + if net_output["padding_mask"] is not None: + non_padding_mask = ~net_output["padding_mask"] + input_lengths = non_padding_mask.long().sum(-1) + else: + input_lengths = lprobs.new_full( + (lprobs.size(1),), lprobs.size(0), dtype=torch.long + ) + + pad_mask = (sample["target"] != self.pad_idx) & ( + sample["target"] != self.eos_idx + ) + targets_flat = sample["target"].masked_select(pad_mask) + if "target_lengths" in sample: + 
target_lengths = sample["target_lengths"] + else: + target_lengths = pad_mask.sum(-1) + + with torch.backends.cudnn.flags(enabled=False): + loss = F.ctc_loss( + lprobs, + targets_flat, + input_lengths, + target_lengths, + blank=self.blank_idx, + reduction="sum", + zero_infinity=self.zero_infinity, + ) + + ntokens = ( + sample["ntokens"] if "ntokens" in sample else target_lengths.sum().item() + ) + + sample_size = sample["target"].size(0) if self.sentence_avg else ntokens + + logging_output = {} + if "decoder_target" in sample: + if net_output["decoder_out"] is not None: + dec_sample_size = sample["target"].size(0) if self.sentence_avg else sample["dec_ntokens"] + dec_loss, dec_nll_loss = self.compute_ce_loss(model, net_output["decoder_out"], sample, reduce=reduce) + logging_output["ctc_loss"] = loss.item() + loss = (1 - self.dec_weight) * loss + (self.dec_weight * dec_loss * sample_size / dec_sample_size) + logging_output["dec_loss"] = dec_loss.item() + logging_output["dec_nll_loss"] = dec_nll_loss.item() + logging_output["dec_sample_size"] = dec_sample_size + + if self.report_accuracy: + n_correct, total = self.compute_accuracy(model, net_output["decoder_out"], sample) + logging_output["dec_n_correct"] = utils.item(n_correct.data) + logging_output["total"] = utils.item(total.data) + else: + logging_output["ctc_loss"] = loss.item() + loss = (1 - self.dec_weight) * loss + logging_output["dec_loss"] = 0 + logging_output["dec_nll_loss"] = 0 + logging_output["dec_sample_size"] = 1 + if self.report_accuracy: + logging_output["dec_n_correct"] = 0 + logging_output["total"] = 1 + + logging_output = { + "loss": utils.item(loss.data), # * sample['ntokens'], + "ntokens": ntokens, + "nsentences": sample["id"].numel(), + "sample_size": sample_size, + **logging_output, + } + + if not model.training and self.dec_weight < 1.0: + import editdistance + + with torch.no_grad(): + lprobs_t = lprobs.transpose(0, 1).float().contiguous().cpu() + + c_err = 0 + c_len = 0 + w_errs = 0 + w_len = 0 + wv_errs = 0 + for lp, t, inp_l in zip( + lprobs_t, + sample["target_label"] + if "target_label" in sample + else sample["target"], + input_lengths, + ): + lp = lp[:inp_l].unsqueeze(0) + + decoded = None + if self.w2l_decoder is not None: + decoded = self.w2l_decoder.decode(lp) + if len(decoded) < 1: + decoded = None + else: + decoded = decoded[0] + if len(decoded) < 1: + decoded = None + else: + decoded = decoded[0] + + p = (t != self.task.target_dictionary.pad()) & ( + t != self.task.target_dictionary.eos() + ) + targ = t[p] + targ_units = self.task.target_dictionary.string(targ) + targ_units_arr = targ.tolist() + + toks = lp.argmax(dim=-1).unique_consecutive() + pred_units_arr = toks[toks != self.blank_idx].tolist() + + c_err += editdistance.eval(pred_units_arr, targ_units_arr) + c_len += len(targ_units_arr) + + targ_words = post_process(targ_units, self.post_process).split() + + pred_units = self.task.target_dictionary.string(pred_units_arr) + pred_words_raw = post_process(pred_units, self.post_process).split() + + if decoded is not None and "words" in decoded: + pred_words = decoded["words"] + w_errs += editdistance.eval(pred_words, targ_words) + wv_errs += editdistance.eval(pred_words_raw, targ_words) + else: + dist = editdistance.eval(pred_words_raw, targ_words) + w_errs += dist + wv_errs += dist + + w_len += len(targ_words) + + logging_output["wv_errors"] = wv_errs + logging_output["w_errors"] = w_errs + logging_output["w_total"] = w_len + logging_output["c_errors"] = c_err + logging_output["c_total"] = c_len + + 
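+        # `loss` now combines the CTC loss and, when decoder targets are present, the
+        # label-smoothed CE loss according to dec_weight (the CE term is rescaled to the
+        # CTC sample_size); sample_size is the denominator used for gradient normalization.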
return loss, sample_size, logging_output + + def compute_ce_loss(self, model, net_output, sample, reduce=True): + lprobs, target = self.get_lprobs_and_target(model, net_output, sample) + loss, nll_loss = label_smoothed_nll_loss( + lprobs, + target, + self.eps, + ignore_index=self.pad_idx, + reduce=reduce, + ) + return loss, nll_loss + + def compute_accuracy(self, model, net_output, sample): + lprobs, target = self.get_lprobs_and_target(model, net_output, sample) + mask = target.ne(self.pad_idx) + n_correct = torch.sum( + lprobs.argmax(1).masked_select(mask).eq(target.masked_select(mask)) + ) + total = torch.sum(mask) + return n_correct, total + + def get_lprobs_and_target(self, model, net_output, sample): + lprobs = model.get_normalized_probs(net_output, log_probs=True) + target = sample["decoder_target"] + if self.ignore_prefix_size > 0: + if getattr(lprobs, "batch_first", False): + lprobs = lprobs[:, self.ignore_prefix_size :, :].contiguous() + target = target[:, self.ignore_prefix_size :].contiguous() + else: + lprobs = lprobs[self.ignore_prefix_size :, :, :].contiguous() + target = target[self.ignore_prefix_size :, :].contiguous() + return lprobs.view(-1, lprobs.size(-1)), target.view(-1) + + + @staticmethod + def reduce_metrics(logging_outputs) -> None: + """Aggregate logging outputs from data parallel training.""" + + loss_sum = utils.item(sum(log.get("loss", 0) for log in logging_outputs)) + ntokens = utils.item(sum(log.get("ntokens", 0) for log in logging_outputs)) + nsentences = utils.item( + sum(log.get("nsentences", 0) for log in logging_outputs) + ) + sample_size = utils.item( + sum(log.get("sample_size", 0) for log in logging_outputs) + ) + + metrics.log_scalar( + "loss", loss_sum / sample_size / math.log(2), sample_size, round=3 + ) + metrics.log_scalar("ntokens", ntokens) + metrics.log_scalar("nsentences", nsentences) + if sample_size != ntokens: + metrics.log_scalar( + "nll_loss", loss_sum / ntokens / math.log(2), ntokens, round=3 + ) + + c_errors = sum(log.get("c_errors", 0) for log in logging_outputs) + metrics.log_scalar("_c_errors", c_errors) + c_total = sum(log.get("c_total", 0) for log in logging_outputs) + metrics.log_scalar("_c_total", c_total) + w_errors = sum(log.get("w_errors", 0) for log in logging_outputs) + metrics.log_scalar("_w_errors", w_errors) + wv_errors = sum(log.get("wv_errors", 0) for log in logging_outputs) + metrics.log_scalar("_wv_errors", wv_errors) + w_total = sum(log.get("w_total", 0) for log in logging_outputs) + metrics.log_scalar("_w_total", w_total) + + if c_total > 0: + metrics.log_derived( + "uer", + lambda meters: safe_round( + meters["_c_errors"].sum * 100.0 / meters["_c_total"].sum, 3 + ) + if meters["_c_total"].sum > 0 + else float("nan"), + ) + if w_total > 0: + metrics.log_derived( + "wer", + lambda meters: safe_round( + meters["_w_errors"].sum * 100.0 / meters["_w_total"].sum, 3 + ) + if meters["_w_total"].sum > 0 + else float("nan"), + ) + metrics.log_derived( + "raw_wer", + lambda meters: safe_round( + meters["_wv_errors"].sum * 100.0 / meters["_w_total"].sum, 3 + ) + if meters["_w_total"].sum > 0 + else float("nan"), + ) + + if "dec_loss" in logging_outputs[0]: + ctc_loss_sum = sum(log.get("ctc_loss", 0) for log in logging_outputs) + dec_loss_sum = sum(log.get("dec_loss", 0) for log in logging_outputs) + dec_nll_loss_sum = sum(log.get("dec_nll_loss", 0) for log in logging_outputs) + dec_sample_size = sum(log.get("dec_sample_size", 0) for log in logging_outputs) + metrics.log_scalar( + "dec_loss", dec_loss_sum / dec_sample_size / 
math.log(2), dec_sample_size, round=3 + ) + metrics.log_scalar( + "ctc_loss", ctc_loss_sum / sample_size / math.log(2), sample_size, round=3 + ) + metrics.log_scalar( + "dec_nll_loss", dec_nll_loss_sum / dec_sample_size / math.log(2), dec_sample_size, round=3 + ) + metrics.log_derived( + "dec_ppl", lambda meters: utils.get_perplexity(meters["dec_nll_loss"].avg) + ) + total = utils.item(sum(log.get("total", 0) for log in logging_outputs)) + if total > 0: + metrics.log_scalar("total", total) + n_correct = utils.item( + sum(log.get("dec_n_correct", 0) for log in logging_outputs) + ) + metrics.log_scalar("dec_n_correct", n_correct) + metrics.log_derived( + "dec_accuracy", + lambda meters: round( + meters["dec_n_correct"].sum * 100.0 / meters["total"].sum, 3 + ) + if meters["total"].sum > 0 + else float("nan"), + ) + + @staticmethod + def logging_outputs_can_be_summed() -> bool: + """ + Whether the logging outputs returned by `forward` can be summed + across workers prior to calling `reduce_metrics`. Setting this + to True will improves distributed training speed. + """ + return True diff --git a/Speech2S/speech2s/criterions/speechut_criterion.py b/Speech2S/speech2s/criterions/speechut_criterion.py new file mode 100644 index 0000000000000000000000000000000000000000..0d735f1efd16aebf4146e26d5a5ebaeca2516ad7 --- /dev/null +++ b/Speech2S/speech2s/criterions/speechut_criterion.py @@ -0,0 +1,384 @@ +# ---------------------------------------------------------------------------- +# SpeechUT: Bridging Speech and Text with Hidden-Unit for Encoder-Decoder Based Speech-Text Pre-training (https://arxiv.org/abs/2210.03730) +# Github source: https://github.com/microsoft/SpeechT5/tree/main/SpeechUT +# Code based on fairseq: https://github.com/facebookresearch/fairseq/tree/272c4c5197250997148fb12c0db6306035f166a4 +# +# Copyright (c) 2022 Microsoft +# Licensed under The MIT License [see LICENSE for details] +# ---------------------------------------------------------------------------- + +import logging +import math +import re +from dataclasses import dataclass, field +from typing import List, Optional + +import numpy as np +import torch +import torch.nn.functional as F +from fairseq import metrics, utils +from fairseq.criterions import FairseqCriterion, register_criterion +from fairseq.criterions.label_smoothed_cross_entropy import label_smoothed_nll_loss +from fairseq.dataclass import FairseqDataclass + +logger = logging.getLogger(__name__) + +@dataclass +class SpeechUTCriterionConfig(FairseqDataclass): + pred_masked_weight: float = field( + default=1.0, + metadata={"help": "weight for predictive loss for masked frames"}, + ) + pred_nomask_weight: float = field( + default=0.0, + metadata={"help": "weight for predictive loss for unmasked frames"}, + ) + loss_weights: Optional[List[float]] = field( + default=None, + metadata={"help": "weights for additional loss terms (not first one)"}, + ) + log_keys: List[str] = field( + default_factory=lambda: [], + metadata={"help": "output keys to log"}, + ) + u2t_ed_weight: float = field( + default=0.1, + metadata={"help": "weights for text ED Loss, loss will be (hubert_loss + text_mum_weight * MUM_Loss + u2t_ed_weight * CE_Loss + u2t_ctc_weight * CTC_loss)"}, + ) + u2t_ctc_weight: float = field( + default=0.0, + metadata={"help": "weights for text ED Loss, loss will be (hubert_loss + text_mum_weight * MUM_Loss + u2t_ed_weight * CE_Loss + u2t_ctc_weight * CTC_loss)"}, + ) + text_mum_weight: float = field( + default=0.0, + metadata={"help": "masked unit modeling weight 
from the text end"}, + ) + report_accuracy: bool = field( + default=True, + metadata={"help": "report decoder accuracy metric"}, + ) + ignore_prefix_size: int = field( + default=0, + metadata={"help": "Ignore first N tokens"}, + ) + label_smoothing: float = field( + default=0.0, + metadata={"help": "epsilon for label smoothing, 0 means no label smoothing"}, + ) + no_ctc_blank: bool = field( + default=False, + metadata={"help": "mask out the blank of ctc, only when dec_loss_type=ctc"}, + ) + label_smoothing: float = field( + default=0.0, + metadata={"help": "epsilon for label smoothing, 0 means no label smoothing"}, + ) + +@register_criterion("speechut_criterion", dataclass=SpeechUTCriterionConfig) +class SpeechUTCriterion(FairseqCriterion): + def __init__( + self, + task, + pred_masked_weight, + pred_nomask_weight, + loss_weights=None, + log_keys=None, + u2t_ed_weight=0.1, + u2t_ctc_weight=0, + text_mum_weight=0, + report_accuracy=False, + ignore_prefix_size=0, + label_smoothing=0, + no_ctc_blank=False, + ): + super().__init__(task) + self.pred_masked_weight = pred_masked_weight + self.pred_nomask_weight = pred_nomask_weight + self.loss_weights = loss_weights + self.log_keys = [] if log_keys is None else log_keys + self.u2t_ed_weight = u2t_ed_weight + self.u2t_ctc_weight = u2t_ctc_weight + self.text_mum_weight = text_mum_weight + self.report_accuracy = report_accuracy + self.ignore_prefix_size = ignore_prefix_size + self.eps = label_smoothing + self.no_ctc_blank = no_ctc_blank + self.padding_idx = task.dictionaries[0].pad() + self.eos_idx = task.dictionaries[0].eos() + self.blank_idx = task.dictionaries[0].bos() + + def compute_hubert_loss(self, model, net_output, reduction, preffix='', suffix=''): + loss = 0 + sample_size = [] + logging_output = {} + loss_m_list = [] + logp_m_list = model.get_logits(net_output, True) + targ_m_list = model.get_targets(net_output, True) + assert self.pred_masked_weight == 0 or len(logp_m_list) > 0 + for i, (logp_m, targ_m) in enumerate(zip(logp_m_list, targ_m_list)): + loss_m = F.cross_entropy(logp_m, targ_m, reduction=reduction) + loss_m_list.append(loss_m) + logging_output[f"{preffix}loss_m_{i}"] = loss_m.detach().item() + if self.pred_masked_weight > 0: + loss += self.pred_masked_weight * sum(loss_m_list) + sample_size.append(targ_m_list[0].numel()) + + loss_u_list = [] + logp_u_list = model.get_logits(net_output, False) + targ_u_list = model.get_targets(net_output, False) + assert self.pred_nomask_weight == 0 or len(logp_u_list) > 0 + for i, (logp_u, targ_u) in enumerate(zip(logp_u_list, targ_u_list)): + loss_u = F.cross_entropy(logp_u, targ_u, reduction=reduction) + loss_u_list.append(loss_u) + logging_output[f"{preffix}loss_u_{i}"] = loss_u.detach().item() + if self.pred_nomask_weight > 0: + loss += self.pred_nomask_weight * sum(loss_u_list) + sample_size.append(targ_u_list[0].numel()) + + sample_size = np.mean(sample_size) + + def compute_correct(logits, targets): + if logits.numel() == 0: + return 0, 0 + else: + assert logits.dim() > 1, logits.shape + max = logits.argmax(-1) == targets + min = logits.argmin(-1) == targets + both = max & min + corr = max.long().sum().item() - both.long().sum().item() + count = max.numel() + return corr, count + + with torch.no_grad(): + for i, (logp_m, targ_m) in enumerate(zip(logp_m_list, targ_m_list)): + corr_m, count_m = compute_correct(logp_m, targ_m) + logging_output[f"correct_m_{i}{suffix}"] = corr_m + logging_output[f"count_m_{i}{suffix}"] = count_m + + for i, (logp_u, targ_u) in enumerate(zip(logp_u_list, 
targ_u_list)): + corr_u, count_u = compute_correct(logp_u, targ_u) + logging_output[f"correct_u_{i}{suffix}"] = corr_u + logging_output[f"count_u_{i}{suffix}"] = count_u + + return loss, sample_size, logging_output + + + def forward(self, model, sample, reduce=True, log_pred=False): + """Compute the loss for the given sample. + Returns a tuple with three elements: + 1) the loss + 2) the sample size, which is used as the denominator for the gradient + 3) logging outputs to display while training + """ + reduction = "sum" if reduce else "none" + + if "net_input" in sample: + unit_sample = text_sample = None + else: + unit_sample = sample.get("text_mono", None) + text_sample = sample.get("text_paired", None) + assert unit_sample is not None or text_sample is not None + sample = sample.get("speech") + + ### 1. S2U: do hubert forward and loss computation + sample["modality"] = "speech" + net_output = model(target_list=sample["target_list"], **sample["net_input"]) + loss, sample_size, logging_output = self.compute_hubert_loss( + model, + net_output, + reduction, + ) + if self.loss_weights is not None: + assert hasattr(model, "get_extra_losses") + extra_losses, names = model.get_extra_losses(net_output) + if torch.is_tensor(extra_losses): + extra_losses = [extra_losses] + names = [names] + if len(self.loss_weights) == 1 and len(extra_losses) != 1: + self.loss_weights = [self.loss_weights[0]] * len(extra_losses) + assert len(extra_losses) == len( + self.loss_weights + ), f"{len(extra_losses)}, {len(self.loss_weights)}" + for p, n, coef in zip(extra_losses, names, self.loss_weights): + if coef != 0 and p is not None: + p = coef * p.float() * sample_size + loss += p + logging_output[f"loss_{n}"] = p.item() + for lk in self.log_keys: + if lk in net_output: + logging_output[lk] = float((net_output[lk])) + + ### 2. do text U2T forward and loss computation + if text_sample is not None and (self.u2t_ctc_weight + self.u2t_ed_weight) > 0: + ## 2.1 re-loading "target_list", in default case, target_list = [src_tokens], + ## while in case of using "unit-phone-char" structure, target_list will be [ref_tokens] + text_sample["net_input"]["target_list"] = [ + text_sample.get("ref_tokens", text_sample["net_input"]["src_tokens"].clone()), + ] + text_net_output = model(**text_sample["net_input"]) + text_sample_size = text_sample["ntokens"] + + ### 2.1 U2T_UCTC + if self.u2t_ctc_weight > 0: + text_ctc_loss = self.compute_ctc_loss(model, text_net_output, text_sample["target"], reduction=reduction) + loss += self.u2t_ctc_weight * text_ctc_loss * sample_size / text_sample_size + logging_output["text_ctc_loss"] = utils.item(text_ctc_loss) + logging_output["text_sample_size"] = text_sample_size + + ### 2.2 U2T_ED + if self.u2t_ed_weight > 0: + text_dec_loss, text_dec_nll_loss = self.compute_ce_loss(model, text_net_output["decoder_out"], text_sample, reduce=reduce) + loss += self.u2t_ed_weight * text_dec_loss * sample_size / text_sample_size + logging_output["text_dec_loss"] = utils.item(text_dec_loss) + logging_output["text_dec_nll_loss"] = utils.item(text_dec_nll_loss) + logging_output["text_sample_size"] = text_sample_size + if self.report_accuracy: + n_correct, total = self.compute_accuracy(model, text_net_output["decoder_out"], text_sample) + logging_output["correct_text_dec"] = utils.item(n_correct.data) + logging_output["count_text_dec"] = utils.item(total.data) + + ### 3. 
do unit MUM forward and loss computation + if unit_sample is not None and self.text_mum_weight > 0: + src_tokens = unit_sample["net_input"]["src_tokens"] + target = unit_sample.get("target", None) + target = src_tokens.clone() if target is None else target + unit_net_output = model.forward_mum(src_tokens, target) + loss_num, sample_size_mum, logging_output_mum = self.compute_hubert_loss( + model, + unit_net_output, + reduction, + preffix="mum_", + suffix="_mum", + ) + loss += self.text_mum_weight * loss_num * sample_size / sample_size_mum + logging_output["unit_sample_size"] = sample_size_mum + logging_output.update(logging_output_mum) + + logging_output = { + "loss": utils.item(loss) if reduce else loss, + "ntokens": sample_size, + "nsentences": sample["id"].numel() + (text_sample["id"].numel() if text_sample is not None else 0), + "sample_size": sample_size, + **logging_output, + } + + return loss, sample_size, logging_output + + def compute_ctc_loss(self, model, net_output, target, reduction): + logits = net_output["encoder_out_ctc"][0] # (T, B, C) from the code-encoder + if self.no_ctc_blank: + ## set prob of to -inf + logits = logits.float() + logits[:, :, self.blank_idx] = -1000000.0 + + lprobs = F.log_softmax(logits.float(), dim=-1) + + encoder_padding_mask = net_output["encoder_padding_mask"][0] + non_padding_mask = ~encoder_padding_mask + input_lengths = non_padding_mask.long().sum(-1) + pad_mask = (target != self.padding_idx) & (target != self.eos_idx) + targets_flat = target.masked_select(pad_mask) + target_lengths = pad_mask.sum(-1) + + with torch.backends.cudnn.flags(enabled=False): + loss = F.ctc_loss( + lprobs, + targets_flat, + input_lengths, + target_lengths, + blank=self.blank_idx, + reduction=reduction, + zero_infinity=True, + ) + return loss + + def compute_ce_loss(self, model, net_output, sample, reduce=True): + lprobs, target = self.get_lprobs_and_target(model, net_output, sample) + loss, nll_loss = label_smoothed_nll_loss( + lprobs, + target, + self.eps, + ignore_index=self.padding_idx, + reduce=reduce, + ) + return loss, nll_loss + + def compute_accuracy(self, model, net_output, sample): + lprobs, target = self.get_lprobs_and_target(model, net_output, sample) + mask = target.ne(self.padding_idx) + n_correct = torch.sum( + lprobs.argmax(1).masked_select(mask).eq(target.masked_select(mask)) + ) + total = torch.sum(mask) + return n_correct, total + + def get_lprobs_and_target(self, model, net_output, sample): + lprobs = model.get_normalized_probs(net_output, log_probs=True) + target = sample["target"] + + return lprobs.view(-1, lprobs.size(-1)), target.view(-1) + + @staticmethod + def reduce_metrics(logging_outputs) -> None: + """Aggregate logging outputs from data parallel training (copied from normal cross entropy).""" + loss_sum = sum(log.get("loss", 0) for log in logging_outputs) + ntokens = sum(log.get("ntokens", 0) for log in logging_outputs) + sample_size = sum(log.get("sample_size", 0) for log in logging_outputs) + + metrics.log_scalar( + "loss", loss_sum / sample_size / math.log(2), sample_size, round=3 + ) + if sample_size != ntokens: + metrics.log_scalar( + "nll_loss", loss_sum / ntokens / math.log(2), ntokens, round=3 + ) + metrics.log_derived( + "ppl", lambda meters: utils.get_perplexity(meters["nll_loss"].avg) + ) + else: + metrics.log_derived( + "ppl", lambda meters: utils.get_perplexity(meters["loss"].avg) + ) + + counts = {} + for lk in logging_outputs[0].keys(): + if lk.startswith("count_"): + val = sum(log.get(lk, 0) for log in logging_outputs) + 
metrics.log_scalar(lk, val) + counts[lk] = val + + for lk in logging_outputs[0].keys(): + if lk.startswith("loss_"): + val = sum(log.get(lk, 0) for log in logging_outputs) + metrics.log_scalar(lk, val / sample_size / math.log(2), round=3) + elif lk.startswith("correct_"): + val = sum(log.get(lk, 0) for log in logging_outputs) + metrics.log_scalar(lk, val / counts[re.sub("correct", "count", lk)]) + + if "text_sample_size" in logging_outputs[0]: + text_sample_size = sum(log.get("text_sample_size", 0) for log in logging_outputs) + for lk in logging_outputs[0].keys(): + if lk.startswith("text_") and lk.endswith("_loss"): + val = sum(log.get(lk, 0) for log in logging_outputs) + metrics.log_scalar(lk, val / text_sample_size / math.log(2), round=3) + + if "unit_sample_size" in logging_outputs[0]: + unit_sample_size = sum(log.get("unit_sample_size", 0) for log in logging_outputs) + for lk in logging_outputs[0].keys(): + if lk.startswith("mum_loss_"): + val = sum(log.get(lk, 0) for log in logging_outputs) + metrics.log_scalar(lk, val / unit_sample_size / math.log(2), round=3) + + @staticmethod + def aggregate_logging_outputs(logging_outputs): + """Aggregate logging outputs from data parallel training.""" + raise NotImplementedError() + + @staticmethod + def logging_outputs_can_be_summed() -> bool: + """ + Whether the logging outputs returned by `forward` can be summed + across workers prior to calling `reduce_metrics`. Setting this + to True will improves distributed training speed. + """ + return False diff --git a/Speech2S/speech2s/data/concat_dataset.py b/Speech2S/speech2s/data/concat_dataset.py new file mode 100644 index 0000000000000000000000000000000000000000..5766921ac39b571010b318e0d4b6f967cd21d96e --- /dev/null +++ b/Speech2S/speech2s/data/concat_dataset.py @@ -0,0 +1,129 @@ +# -------------------------------------------------------- +# Copyright (c) 2022 Microsoft +# Licensed under The MIT License [see LICENSE for details] +# Based on fairseq code bases +# https://github.com/facebookresearch/fairseq +# -------------------------------------------------------- + +import bisect + +import numpy as np +from torch.utils.data.dataloader import default_collate + +from fairseq.data import FairseqDataset + + +class ConcatDataset(FairseqDataset): + @staticmethod + def cumsum(sequence, sample_ratios): + r, s = [], 0 + for e, ratio in zip(sequence, sample_ratios): + curr_len = int(ratio * len(e)) + r.append(curr_len + s) + s += curr_len + return r + + def __init__(self, datasets, sample_ratios=1): + super(ConcatDataset, self).__init__() + assert len(datasets) > 0, "datasets should not be an empty iterable" + self.datasets = list(datasets) + if isinstance(sample_ratios, int): + sample_ratios = [sample_ratios] * len(self.datasets) + self.sample_ratios = sample_ratios + self.cumulative_sizes = self.cumsum(self.datasets, sample_ratios) + self.real_sizes = [len(d) for d in self.datasets] + + def __len__(self): + return self.cumulative_sizes[-1] + + def __getitem__(self, idx): + dataset_idx, sample_idx = self._get_dataset_and_sample_index(idx) + return self.datasets[dataset_idx][sample_idx] + + def _get_dataset_and_sample_index(self, idx: int): + dataset_idx = bisect.bisect_right(self.cumulative_sizes, idx) + if dataset_idx == 0: + sample_idx = idx + else: + sample_idx = idx - self.cumulative_sizes[dataset_idx - 1] + sample_idx = sample_idx % self.real_sizes[dataset_idx] + return dataset_idx, sample_idx + + def collater(self, samples, **extra_args): + # For now only supports datasets with same underlying 
collater implementations + if hasattr(self.datasets[0], "collater"): + return self.datasets[0].collater(samples, **extra_args) + else: + return default_collate(samples, **extra_args) + + def size(self, idx: int): + """ + Return an example's size as a float or tuple. + """ + dataset_idx, sample_idx = self._get_dataset_and_sample_index(idx) + return self.datasets[dataset_idx].size(sample_idx) + + def num_tokens(self, index: int): + return np.max(self.size(index)) + + def attr(self, attr: str, index: int): + dataset_idx = bisect.bisect_right(self.cumulative_sizes, index) + return getattr(self.datasets[dataset_idx], attr, None) + + @property + def sizes(self): + _dataset_sizes = [] + for ds, sr in zip(self.datasets, self.sample_ratios): + if isinstance(ds.sizes, np.ndarray): + _dataset_sizes.append(np.tile(ds.sizes, sr)) + else: + # Only support underlying dataset with single size array. + assert isinstance(ds.sizes, list) + _dataset_sizes.append(np.tile(ds.sizes[0], sr)) + return np.concatenate(_dataset_sizes) + + @property + def supports_prefetch(self): + return all(d.supports_prefetch for d in self.datasets) + + def ordered_indices(self): + """ + Returns indices sorted by length. So less padding is needed. + """ + if isinstance(self.sizes, np.ndarray) and len(self.sizes.shape) > 1: + # special handling for concatenating lang_pair_datasets + if getattr(self.datasets[0], "shuffle", False): + indices = np.random.permutation(len(self)).astype(np.int64) + else: + indices = np.arange(len(self), dtype=np.int64) + sizes = self.sizes + tgt_sizes = ( + sizes[:, 1] if len(sizes.shape) > 0 and sizes.shape[1] > 1 else None + ) + src_sizes = ( + sizes[:, 0] if len(sizes.shape) > 0 and sizes.shape[1] > 1 else sizes + ) + # sort by target length, then source length + if tgt_sizes is not None: + indices = indices[np.argsort(tgt_sizes[indices], kind="mergesort")] + return indices[np.argsort(src_sizes[indices], kind="mergesort")] + else: + return np.argsort(self.sizes) + + def prefetch(self, indices): + frm = 0 + for to, ds in zip(self.cumulative_sizes, self.datasets): + real_size = len(ds) + if getattr(ds, "supports_prefetch", False): + ds.prefetch([(i - frm) % real_size for i in indices if frm <= i < to]) + frm = to + + @property + def can_reuse_epoch_itr_across_epochs(self): + return all(d.can_reuse_epoch_itr_across_epochs for d in self.datasets) + + def set_epoch(self, epoch): + super().set_epoch(epoch) + for ds in self.datasets: + if hasattr(ds, "set_epoch"): + ds.set_epoch(epoch) diff --git a/Speech2S/speech2s/data/hubert_dataset.py b/Speech2S/speech2s/data/hubert_dataset.py new file mode 100644 index 0000000000000000000000000000000000000000..64965dea445a0a5afc63c887b1bc89cece0b203b --- /dev/null +++ b/Speech2S/speech2s/data/hubert_dataset.py @@ -0,0 +1,597 @@ +# -------------------------------------------------------- +# Copyright (c) 2022 Microsoft +# Licensed under The MIT License [see LICENSE for details] +# Based on fairseq code bases +# https://github.com/facebookresearch/fairseq +# -------------------------------------------------------- + +import itertools +import logging +import io +import os +import sys +import time +from pathlib import Path +from typing import Any, List, Optional, Union, Tuple + +import numpy as np + +import torch +import torch.nn.functional as F +from fairseq.data import data_utils, Dictionary +from fairseq.data.fairseq_dataset import FairseqDataset +from fairseq.data.audio.audio_utils import ( + read_from_stored_zip, + is_sf_audio_data, +) + 
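+# HubertDataset below pairs raw audio (or zip-/kaldi-stored features) with frame- or
+# sequence-level labels, and can additionally build decoder targets for joint training.
+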
+FEATURE_OR_SF_AUDIO_FILE_EXTENSIONS = {".npy", ".wav", ".flac", ".ogg"} + +logger = logging.getLogger(__name__) + +def parse_path(path: str) -> Tuple[str, List[int]]: + """Parse data path which is either a path to + 1. a .npy/.wav/.flac/.ogg file + 2. a stored ZIP file with slicing info: "[zip_path]:[offset]:[length]" + + Args: + path (str): the data path to parse + + Returns: + file_path (str): the file path + slice_ptr (list of int): empty in case 1; + byte offset and length for the slice in case 2 + """ + + if Path(path).suffix in FEATURE_OR_SF_AUDIO_FILE_EXTENSIONS: + _path, slice_ptr = path, [] + else: + _path, *slice_ptr = path.split(":") + if not Path(_path).is_file(): + raise FileNotFoundError(f"File not found: {_path}") + assert len(slice_ptr) in {0, 1, 2}, f"Invalid path: {path}" + slice_ptr = [int(i) for i in slice_ptr] + return _path, slice_ptr + +def load_audio(manifest_path, max_keep, min_keep, retry_times=5): + n_long, n_short = 0, 0 + names, inds, sizes, chunk_names, chunk_indices = [], [], [], [], [] + for i in range(retry_times): + with open(manifest_path) as f: + root = f.readline().strip() + for ind, line in enumerate(f): + items = line.strip().split("\t") + assert len(items) == 2, line + sz = int(items[1]) + if min_keep is not None and sz < min_keep: + n_short += 1 + elif max_keep is not None and sz > max_keep: + n_long += 1 + else: + fname = items[0].split(":") + if len(fname) > 2: + if len(chunk_names) == 0 or fname[0] != chunk_names[-1]: + chunk_names.append(fname[0]) + chunk_indices.append(len(names)) + names.append(items[0]) + inds.append(ind) + sizes.append(sz) + if len(names) == 0: + logger.warn(f"Fail to load manifest for the {i} time") + time.sleep(1) + continue + else: + break + tot = ind + 1 + logger.info( + ( + f"max_keep={max_keep}, min_keep={min_keep}, " + f"loaded {len(names)}, skipped {n_short} short and {n_long} long, " + f"longest-loaded={max(sizes)}, shortest-loaded={min(sizes)}" + ) + ) + return root, names, inds, tot, sizes, chunk_names, chunk_indices + + +def load_label(label_path, inds, tot, retry_times=5): + for i in range(retry_times): + with open(label_path) as f: + labels = [line.rstrip() for line in f] + if len(labels) == 0: + logger.warn(f"Fail to load label for the {i} time") + time.sleep(1) + continue + else: + break + assert ( + len(labels) == tot + ), f"number of labels does not match ({len(labels)} != {tot})" + labels = [labels[i] for i in inds] + return labels + + +def load_label_offset(label_path, inds, tot, retry_times=5): + for i in range(retry_times): + with open(label_path) as f: + code_lengths = [len(line.encode("utf-8")) for line in f] + if len(code_lengths) == 0: + logger.warn(f"Fail to load label for the {i} time") + time.sleep(1) + continue + else: + break + assert ( + len(code_lengths) == tot + ), f"number of labels does not match ({len(code_lengths)} != {tot})" + offsets = list(itertools.accumulate([0] + code_lengths)) + offsets = [(offsets[i], offsets[i + 1]) for i in inds] + return offsets + + +def verify_label_lengths( + audio_sizes, + audio_rate, + label_path, + label_rate, + inds, + tot, + tol=0.1, # tolerance in seconds +): + if label_rate < 0: + logger.info(f"{label_path} is sequence label. 
skipped") + return + + with open(label_path) as f: + lengths = [len(line.rstrip().split()) for line in f] + assert len(lengths) == tot + lengths = [lengths[i] for i in inds] + num_invalid = 0 + for i, ind in enumerate(inds): + dur_from_audio = audio_sizes[i] / audio_rate + dur_from_label = lengths[i] / label_rate + if abs(dur_from_audio - dur_from_label) > tol: + logger.warning( + ( + f"audio and label duration differ too much " + f"(|{dur_from_audio} - {dur_from_label}| > {tol}) " + f"in line {ind+1} of {label_path}. Check if `label_rate` " + f"is correctly set (currently {label_rate}). " + f"num. of samples = {audio_sizes[i]}; " + f"label length = {lengths[i]}" + ) + ) + num_invalid += 1 + if num_invalid > 0: + logger.warning( + f"total {num_invalid} (audio, label) pairs with mismatched lengths" + ) + + +class HubertDataset(FairseqDataset): + def __init__( + self, + manifest_path: str, + sample_rate: float, + label_paths: List[str], + label_rates: Union[List[float], float], # -1 for sequence labels + pad_list: List[str], + eos_list: List[str], + label_processors: Optional[List[Any]] = None, + max_keep_sample_size: Optional[int] = None, + min_keep_sample_size: Optional[int] = None, + max_sample_size: Optional[int] = None, + shuffle: bool = True, + pad_audio: bool = False, + normalize: bool = False, + store_labels: bool = True, + random_crop: bool = False, + single_target: bool = False, + tgt_dict: Optional[Dictionary] = None, + add_decoder_target: bool = False, + fine_tuning: bool = False, + tgt_lang_idx: int = None, + tokenizer = None, + mbart_style_lang_id: bool = False, + retry_times: int = 5, + reduce_label_for_dec: bool = True, + ): + self.audio_root, self.audio_names, inds, tot, self.wav_sizes, self.chunk_names, self.chunk_indices = load_audio( + manifest_path, max_keep_sample_size, min_keep_sample_size, retry_times + ) + self.sample_rate = sample_rate + self.shuffle = shuffle + self.random_crop = random_crop + self.tgt_dict = tgt_dict + self.add_decoder_target = add_decoder_target + self.fine_tuning = fine_tuning + + self.num_labels = len(label_paths) + self.pad_list = pad_list + self.eos_list = eos_list + self.label_processors = label_processors + self.single_target = single_target + self.epoch = 0 + + self.label_rates = ( + [label_rates for _ in range(len(label_paths))] + if isinstance(label_rates, int) + else label_rates + ) + self.store_labels = store_labels + if store_labels: + self.label_list = [load_label(p, inds, tot, retry_times) for p in label_paths] + else: + self.label_paths = label_paths + self.label_offsets_list = [ + load_label_offset(p, inds, tot, retry_times) for p in label_paths + ] + assert label_processors is None or len(label_processors) == self.num_labels + for label_path, label_rate in zip(label_paths, self.label_rates): + verify_label_lengths( + self.wav_sizes, sample_rate, label_path, label_rate, inds, tot + ) + + self.max_sample_size = ( + max_sample_size if max_sample_size is not None else sys.maxsize + ) + self.pad_audio = pad_audio + self.normalize = normalize + self.tgt_lang_idx = tgt_lang_idx + self.tokenizer = tokenizer + self.mbart_style_lang_id = mbart_style_lang_id + self.retry_times = retry_times + self.reduce_label_for_dec = reduce_label_for_dec + logger.info( + f"pad_audio={pad_audio}, random_crop={random_crop}, tgt_lang_idx={self.tgt_lang_idx}, reduce_label_for_dec={reduce_label_for_dec}, " + f"mbart_style_lang_id={mbart_style_lang_id}, normalize={normalize}, max_sample_size={self.max_sample_size}" + ) + + def set_epoch(self, epoch): + 
self.epoch = epoch + + def batch_by_size(self, indices, max_tokens=None, max_sentences=None, required_batch_size_multiple=1): + self.max_tokens = max_tokens + self.max_sentences = max_sentences + self.required_batch_size_multiple = required_batch_size_multiple + if isinstance(indices[0], np.ndarray): + batch_list = [] + for indice in indices: + batch = super(HubertDataset, self).batch_by_size(indice, max_tokens, max_sentences, required_batch_size_multiple) + batch_list.append(batch) + return batch_list + else: + return super(HubertDataset, self).batch_by_size(indices, max_tokens, max_sentences, required_batch_size_multiple) + def shuffle_batches(self, batches, seed): + if isinstance(batches[0], list): + new_batches = [] + with data_utils.numpy_seed(seed): + np.random.shuffle(batches) + for batch in batches: + np.random.shuffle(batch) + new_batches.extend(batch) + return new_batches + else: + with data_utils.numpy_seed(seed): + np.random.shuffle(batches) + return batches + + def get_audio(self, index): + import soundfile as sf + + wav_path = os.path.join(self.audio_root, self.audio_names[index]) + _path, slice_ptr = parse_path(wav_path) + if len(slice_ptr) == 1: + import kaldiio + feat = kaldiio.load_mat(wav_path) + feat = torch.from_numpy(feat).float() + if self.normalize: + with torch.no_grad(): + feat = F.layer_norm(feat, feat.shape[-1]) + return feat + else: + if len(slice_ptr) == 2: + byte_data = read_from_stored_zip(_path, slice_ptr[0], slice_ptr[1]) + assert is_sf_audio_data(byte_data) + wav_path = io.BytesIO(byte_data) + for i in range(self.retry_times): + if i < self.retry_times - 1: + try: + wav, cur_sample_rate = sf.read(wav_path) + break + except Exception as e: + logger.warn(f"Fail to load wav for the {i} time") + logger.warn(e) + time.sleep(1) + continue + else: + wav, cur_sample_rate = sf.read(wav_path) + + wav = torch.from_numpy(wav).float() + wav = self.postprocess(wav, cur_sample_rate) + return wav + + def get_label(self, index, label_idx): + if self.store_labels: + label = self.label_list[label_idx][index] + else: + with open(self.label_paths[label_idx]) as f: + offset_s, offset_e = self.label_offsets_list[label_idx][index] + f.seek(offset_s) + label = f.read(offset_e - offset_s) + + if self.tokenizer is not None and self.fine_tuning: + label = self.tokenizer.encode(label) + + if self.label_processors is not None: + label = self.label_processors[label_idx](label) + return label + + def get_labels(self, index): + return [self.get_label(index, i) for i in range(self.num_labels)] + + def __getitem__(self, index): + wav = self.get_audio(index) + labels = self.get_labels(index) + return {"id": index, "source": wav, "label_list": labels} + + def __len__(self): + return len(self.wav_sizes) + + def crop_to_max_size(self, wav, target_size): + size = len(wav) + diff = size - target_size + if diff <= 0: + return wav, 0 + + start, end = 0, target_size + if self.random_crop: + start = np.random.randint(0, diff + 1) + end = size - diff + start + return wav[start:end], start + + def collater(self, samples): + # target = max(sizes) -> random_crop not used + # target = max_sample_size -> random_crop used for long + samples = [s for s in samples if s["source"] is not None] + if len(samples) == 0: + return {} + + audios = [s["source"] for s in samples] + audio_sizes = [len(s) for s in audios] + if self.pad_audio: + audio_size = min(max(audio_sizes), self.max_sample_size) + else: + audio_size = min(min(audio_sizes), self.max_sample_size) + feat_dim = audios[0].size(-1) if audios[0].dim() > 1 
else 1 + collated_audios, padding_mask, audio_starts = self.collater_audio( + audios, audio_size, feat_dim, + ) + + targets_by_label = [ + [s["label_list"][i] for s in samples] for i in range(self.num_labels) + ] + targets_list, lengths_list, ntokens_list = self.collater_label( + targets_by_label, audio_size, audio_starts + ) + + if self.add_decoder_target: + if self.fine_tuning: + decoder_label = [ + torch.cat((targets_list[0][i, :lengths_list[0][i]], torch.tensor([self.tgt_dict.eos()])), 0).long() + for i in range(targets_list[0].size(0)) + ] + else: + if self.tokenizer is not None: + decoder_label = [ + # Set 48 for translate int to char and avoid \n + torch.cat( + ( + torch.tensor( + self.tokenizer.sp.Encode( + "".join( + [chr(j + 48) for j in ( + targets_list[0][i, :lengths_list[0][i]].unique_consecutive() if self.reduce_label_for_dec else targets_list[0][i, :lengths_list[0][i]] + ).tolist()] + ), out_type=int + ) + ), + torch.tensor([self.tgt_dict.eos()]) + ), dim=0 + ).long() + for i in range(targets_list[0].size(0)) + ] + else: + decoder_label = [ + torch.cat((targets_list[0][i, :lengths_list[0][i]].unique_consecutive() if self.reduce_label_for_dec else targets_list[0][i, :lengths_list[0][i]], torch.tensor([self.tgt_dict.eos()])), 0).long() + for i in range(targets_list[0].size(0)) + ] + + if self.mbart_style_lang_id: + decoder_label = [ + torch.cat((decoder_label[i], torch.tensor([self.tgt_lang_idx])), 0).long() + for i in range(targets_list[0].size(0)) + ] + + dec_ntokens = sum(x.size(0) for x in decoder_label) + decoder_target = data_utils.collate_tokens( + decoder_label, + self.tgt_dict.pad(), + self.tgt_dict.eos() if not self.mbart_style_lang_id else self.tgt_lang_idx, + left_pad=False, + move_eos_to_beginning=False, + ) + decoder_target_lengths = torch.tensor( + [x.size(0) for x in decoder_label], dtype=torch.long + ) + prev_output_tokens = data_utils.collate_tokens( + decoder_label, + self.tgt_dict.pad(), + self.tgt_dict.eos() if not self.mbart_style_lang_id else self.tgt_lang_idx, + left_pad=False, + move_eos_to_beginning=True, + ) + + if self.tgt_lang_idx is not None and not self.mbart_style_lang_id: + assert (prev_output_tokens[:, 0] != self.tgt_dict.eos()).sum() == 0 + prev_output_tokens[:, 0] = self.tgt_lang_idx + + net_input = { + "source": collated_audios, + "padding_mask": padding_mask, + "prev_output_tokens": prev_output_tokens, + } + batch = { + "id": torch.LongTensor([s["id"] for s in samples]), + "net_input": net_input, + "decoder_target": decoder_target, + "decoder_target_lengths": decoder_target_lengths, + "dec_ntokens": dec_ntokens, + "lang_idx": self.tgt_lang_idx, + } + else: + net_input = {"source": collated_audios, "padding_mask": padding_mask} + batch = { + "id": torch.LongTensor([s["id"] for s in samples]), + "net_input": net_input, + } + + if self.single_target: + batch["target_lengths"] = lengths_list[0] + batch["ntokens"] = ntokens_list[0] + batch["target"] = targets_list[0] + else: + batch["target_lengths_list"] = lengths_list + batch["ntokens_list"] = ntokens_list + batch["target_list"] = targets_list + return batch + + def collater_audio(self, audios, audio_size, feat_dim=1): + collated_audios = audios[0].new_zeros(len(audios), audio_size, feat_dim) + padding_mask = ( + torch.BoolTensor(collated_audios.shape[0:2]).fill_(False) + # if self.pad_audio else None + ) + audio_starts = [0 for _ in audios] + for i, audio in enumerate(audios): + audio = audio.view(-1, feat_dim) + diff = len(audio) - audio_size + if diff == 0: + collated_audios[i] = audio + 
elif diff < 0: + assert self.pad_audio + collated_audios[i] = torch.cat([audio, audio.new_full((-diff, feat_dim), 0.0)]) + padding_mask[i, diff:] = True + else: + collated_audios[i], audio_starts[i] = self.crop_to_max_size( + audio, audio_size + ) + return collated_audios.squeeze(-1), padding_mask, audio_starts + + def collater_frm_label(self, targets, audio_size, audio_starts, label_rate, pad): + assert label_rate > 0 + s2f = label_rate / self.sample_rate + frm_starts = [int(round(s * s2f)) for s in audio_starts] + frm_size = int(round(audio_size * s2f)) + if not self.pad_audio: + rem_size = [len(t) - s for t, s in zip(targets, frm_starts)] + frm_size = min(frm_size, *rem_size) + targets = [t[s : s + frm_size] for t, s in zip(targets, frm_starts)] + logger.debug(f"audio_starts={audio_starts}") + logger.debug(f"frame_starts={frm_starts}") + logger.debug(f"frame_size={frm_size}") + + lengths = torch.LongTensor([len(t) for t in targets]) + ntokens = lengths.sum().item() + targets = data_utils.collate_tokens(targets, pad_idx=pad, left_pad=False) + return targets, lengths, ntokens + + def collater_seq_label(self, targets, pad): + lengths = torch.LongTensor([len(t) for t in targets]) + ntokens = lengths.sum().item() + targets = data_utils.collate_tokens(targets, pad_idx=pad, left_pad=False) + return targets, lengths, ntokens + + def collater_label(self, targets_by_label, audio_size, audio_starts): + targets_list, lengths_list, ntokens_list = [], [], [] + itr = zip(targets_by_label, self.label_rates, self.pad_list) + for targets, label_rate, pad in itr: + if label_rate == -1: + targets, lengths, ntokens = self.collater_seq_label(targets, pad) + else: + targets, lengths, ntokens = self.collater_frm_label( + targets, audio_size, audio_starts, label_rate, pad + ) + targets_list.append(targets) + lengths_list.append(lengths) + ntokens_list.append(ntokens) + return targets_list, lengths_list, ntokens_list + + def num_tokens(self, index): + return self.size(index) + + def size(self, index): + if self.pad_audio: + return self.wav_sizes[index] + return min(self.wav_sizes[index], self.max_sample_size) + + @property + def sizes(self): + return np.array(self.wav_sizes) + + def ordered_indices(self): + """Return an ordered list of indices. 
Batches will be constructed based + on this order.""" + + if self.shuffle: + if len(self.chunk_names) > 0: + logger.info(f"ordered indices for epoch {self.epoch}") + with data_utils.numpy_seed(self.epoch): + self.chunk_order = np.random.permutation(len(self.chunk_names)) + chunk_count = 0 + tmp_sizes = [] + tmp_indices = [] + indice = [] + for i in self.chunk_order: + chunk_count += 1 + start = self.chunk_indices[i] + end = self.chunk_indices[i+1] if i < len(self.chunk_names) - 1 else len(self) + size = list(self.sizes[start:end]) + tmp_indices.extend(list(np.arange(start, end))) + tmp_sizes.extend(size) + if chunk_count % 10 == 0 or i == self.chunk_order[0]: + order = [np.random.permutation(len(tmp_indices))] + order.append( + np.minimum( + np.array(tmp_sizes), + self.max_sample_size, + ) + ) + sort_idx = np.lexsort(order)[::-1] + indice.append(np.array([tmp_indices[k] for k in sort_idx])) + tmp_indices = [] + tmp_sizes =[] + return indice + else: + order = [np.random.permutation(len(self))] + order.append( + np.minimum( + np.array(self.sizes), + self.max_sample_size, + ) + ) + return np.lexsort(order)[::-1] + else: + return np.arange(len(self)) + + def postprocess(self, wav, cur_sample_rate): + if wav.dim() == 2: + wav = wav.mean(-1) + assert wav.dim() == 1, wav.dim() + + if cur_sample_rate != self.sample_rate: + raise Exception(f"sr {cur_sample_rate} != {self.sample_rate}") + + if self.normalize: + with torch.no_grad(): + wav = F.layer_norm(wav, wav.shape) + return wav diff --git a/Speech2S/speech2s/data/language_trible_dataset.py b/Speech2S/speech2s/data/language_trible_dataset.py new file mode 100644 index 0000000000000000000000000000000000000000..6494127d6bb5d993d557f9f534f7cca83b0f7fa1 --- /dev/null +++ b/Speech2S/speech2s/data/language_trible_dataset.py @@ -0,0 +1,669 @@ +# -------------------------------------------------------- +# Copyright (c) 2022 Microsoft +# Licensed under The MIT License [see LICENSE for details] +# Based on fairseq code bases +# https://github.com/facebookresearch/fairseq +# -------------------------------------------------------- + +import logging +import numpy as np +import torch +import os +import itertools + +from fairseq.data import FairseqDataset, data_utils +from fairseq.data import ( + AppendTokenDataset, + ConcatDataset, + PrependTokenDataset, + data_utils, + indexed_dataset, +) + +logger = logging.getLogger(__name__) + +def load_langtriple_dataset( + data_path, + split, + src, + src_dict, + ref, + ref_dict, + tgt, + tgt_dict, + combine, + dataset_impl, + upsample_primary, + left_pad_source, + left_pad_target, + max_source_positions, + max_target_positions, + prepend_bos=False, + load_alignments=False, + truncate_source=False, + append_source_id=False, + num_buckets=0, + shuffle=True, + pad_to_multiple=1, + prepend_bos_src=None, + lang_format="[{}]", +): + assert not truncate_source + def split_exists(split, src, ref, tgt, lang, data_path): + filename = os.path.join(data_path, "{}.{}-{}-{}.{}".format(split, src, ref, tgt, lang)) + return indexed_dataset.dataset_exists(filename, impl=dataset_impl) + + src_datasets = [] + ref_datasets = [] + tgt_datasets = [] + + for k in itertools.count(): + split_k = split + (str(k) if k > 0 else "") + + # infer langcode + if split_exists(split_k, src, ref, tgt, src, data_path): + prefix = os.path.join(data_path, "{}.{}-{}-{}.".format(split_k, src, ref, tgt)) + elif split_exists(split_k, tgt, ref, src, src, data_path): + prefix = os.path.join(data_path, "{}.{}-{}-{}.".format(split_k, tgt, ref, src)) + else: + if k 
> 0: + break + else: + raise FileNotFoundError( + "Dataset not found: {} ({})".format(split, data_path) + ) + + src_dataset = data_utils.load_indexed_dataset( + prefix + src, src_dict, dataset_impl + ) + src_datasets.append(src_dataset) + + ref_dataset = data_utils.load_indexed_dataset( + prefix + ref, ref_dict, dataset_impl + ) + ref_datasets.append(ref_dataset) + + tgt_dataset = data_utils.load_indexed_dataset( + prefix + tgt, tgt_dict, dataset_impl + ) + if tgt_dataset is not None: + tgt_datasets.append(tgt_dataset) + + logger.info( + "{} {} {}-{}-{} {} examples".format( + data_path, split_k, src, ref, tgt, len(src_datasets[-1]) + ) + ) + + if not combine: + break + + assert len(src_datasets) == len(ref_datasets) + assert len(src_datasets) == len(tgt_datasets) or len(tgt_datasets) == 0 + + if len(src_datasets) == 1: + src_dataset = src_datasets[0] + ref_dataset = ref_datasets[0] + tgt_dataset = tgt_datasets[0] if len(tgt_datasets) > 0 else None + else: + sample_ratios = [1] * len(src_datasets) + sample_ratios[0] = upsample_primary + src_dataset = ConcatDataset(src_datasets, sample_ratios) + ref_dataset = ConcatDataset(ref_datasets, sample_ratios) + if len(tgt_datasets) > 0: + tgt_dataset = ConcatDataset(tgt_datasets, sample_ratios) + else: + tgt_dataset = None + + if prepend_bos: + assert hasattr(src_dict, "bos_index") and hasattr(ref_dict, "bos_index") and hasattr(tgt_dict, "bos_index") + src_dataset = PrependTokenDataset(src_dataset, src_dict.bos()) + ref_dataset = PrependTokenDataset(ref_dataset, ref_dict.bos()) + if tgt_dataset is not None: + tgt_dataset = PrependTokenDataset(tgt_dataset, tgt_dict.bos()) + elif prepend_bos_src is not None: + logger.info(f"prepending src bos: {prepend_bos_src}") + src_dataset = PrependTokenDataset(src_dataset, prepend_bos_src) + ref_dataset = PrependTokenDataset(ref_dataset, prepend_bos_src) + + eos = None + if append_source_id: + src_dataset = AppendTokenDataset( + src_dataset, src_dict.index(lang_format.format(src)) + ) + ref_dataset = AppendTokenDataset( + ref_dataset, ref_dict.index(lang_format.format(ref)) + ) + if tgt_dataset is not None: + tgt_dataset = AppendTokenDataset( + tgt_dataset, tgt_dict.index(lang_format.format(tgt)) + ) + eos = tgt_dict.index(lang_format.format(tgt)) + + align_dataset = None + if load_alignments: + align_path = os.path.join(data_path, "{}.align.{}-{}".format(split, src, tgt)) + if indexed_dataset.dataset_exists(align_path, impl=dataset_impl): + align_dataset = data_utils.load_indexed_dataset( + align_path, None, dataset_impl + ) + + tgt_dataset_sizes = tgt_dataset.sizes if tgt_dataset is not None else None + return LanguageTripleDataset( + src_dataset, + src_dataset.sizes, + src_dict, + ref_dataset, + ref_dataset.sizes, + ref_dict, + tgt_dataset, + tgt_dataset_sizes, + tgt_dict, + left_pad_source=left_pad_source, + left_pad_target=left_pad_target, + align_dataset=align_dataset, + eos=eos, + num_buckets=num_buckets, + shuffle=shuffle, + pad_to_multiple=pad_to_multiple, + ) + + +def collate( + samples, + pad_idx, + eos_idx, + left_pad_source=True, + left_pad_target=False, + input_feeding=True, + pad_to_length=None, + pad_to_multiple=1, +): + if len(samples) == 0: + return {} + + def merge(key, left_pad, move_eos_to_beginning=False, pad_to_length=None): + return data_utils.collate_tokens( + [s[key] for s in samples], + pad_idx, + None, + left_pad, + move_eos_to_beginning, + pad_to_length=pad_to_length, + pad_to_multiple=pad_to_multiple, + ) + + def check_alignment(alignment, src_len, tgt_len): + if alignment is None or 
len(alignment) == 0: + return False + if ( + alignment[:, 0].max().item() >= src_len - 1 + or alignment[:, 1].max().item() >= tgt_len - 1 + ): + logger.warning("alignment size mismatch found, skipping alignment!") + return False + return True + + def compute_alignment_weights(alignments): + """ + Given a tensor of shape [:, 2] containing the source-target indices + corresponding to the alignments, a weight vector containing the + inverse frequency of each target index is computed. + For e.g. if alignments = [[5, 7], [2, 3], [1, 3], [4, 2]], then + a tensor containing [1., 0.5, 0.5, 1] should be returned (since target + index 3 is repeated twice) + """ + align_tgt = alignments[:, 1] + _, align_tgt_i, align_tgt_c = torch.unique( + align_tgt, return_inverse=True, return_counts=True + ) + align_weights = align_tgt_c[align_tgt_i[np.arange(len(align_tgt))]] + return 1.0 / align_weights.float() + + id = torch.LongTensor([s["id"] for s in samples]) + src_tokens = merge( + "source", + left_pad=left_pad_source, + pad_to_length=pad_to_length["source"] if pad_to_length is not None else None, + ) + ref_tokens = merge( + "reference", + left_pad=left_pad_source, + pad_to_length=pad_to_length["source"] if pad_to_length is not None else None, + ) + # sort by descending source length + src_lengths = torch.LongTensor( + [s["source"].ne(pad_idx).long().sum() for s in samples] + ) + ref_lengths = torch.LongTensor( + [s["reference"].ne(pad_idx).long().sum() for s in samples] + ) + src_lengths, sort_order = src_lengths.sort(descending=True) + id = id.index_select(0, sort_order) + src_tokens = src_tokens.index_select(0, sort_order) + ref_lengths = ref_lengths.index_select(0, sort_order) + ref_tokens = ref_tokens.index_select(0, sort_order) + + prev_output_tokens = None + target = None + if samples[0].get("target", None) is not None: + target = merge( + "target", + left_pad=left_pad_target, + pad_to_length=pad_to_length["target"] + if pad_to_length is not None + else None, + ) + target = target.index_select(0, sort_order) + tgt_lengths = torch.LongTensor( + [s["target"].ne(pad_idx).long().sum() for s in samples] + ).index_select(0, sort_order) + ntokens = tgt_lengths.sum().item() + + if samples[0].get("prev_output_tokens", None) is not None: + prev_output_tokens = merge("prev_output_tokens", left_pad=left_pad_target) + elif input_feeding: + # we create a shifted version of targets for feeding the + # previous output token(s) into the next decoder step + prev_output_tokens = merge( + "target", + left_pad=left_pad_target, + move_eos_to_beginning=True, + pad_to_length=pad_to_length["target"] + if pad_to_length is not None + else None, + ) + else: + ntokens = src_lengths.sum().item() + + batch = { + "id": id, + "nsentences": len(samples), + "ntokens": ntokens, + "net_input": { + "src_tokens": src_tokens, + "src_lengths": src_lengths, + }, + "target": target, + "ref_tokens": ref_tokens, + "ref_lengths": ref_lengths, + } + if prev_output_tokens is not None: + batch["net_input"]["prev_output_tokens"] = prev_output_tokens.index_select( + 0, sort_order + ) + + if samples[0].get("alignment", None) is not None: + bsz, tgt_sz = batch["target"].shape + src_sz = batch["net_input"]["src_tokens"].shape[1] + + offsets = torch.zeros((len(sort_order), 2), dtype=torch.long) + offsets[:, 1] += torch.arange(len(sort_order), dtype=torch.long) * tgt_sz + if left_pad_source: + offsets[:, 0] += src_sz - src_lengths + if left_pad_target: + offsets[:, 1] += tgt_sz - tgt_lengths + + alignments = [ + alignment + offset + for align_idx, offset, 
src_len, tgt_len in zip( + sort_order, offsets, src_lengths, tgt_lengths + ) + for alignment in [samples[align_idx]["alignment"].view(-1, 2)] + if check_alignment(alignment, src_len, tgt_len) + ] + + if len(alignments) > 0: + alignments = torch.cat(alignments, dim=0) + align_weights = compute_alignment_weights(alignments) + + batch["alignments"] = alignments + batch["align_weights"] = align_weights + + if samples[0].get("constraints", None) is not None: + # Collate the packed constraints across the samples, padding to + # the length of the longest sample. + lens = [sample.get("constraints").size(0) for sample in samples] + max_len = max(lens) + constraints = torch.zeros((len(samples), max(lens))).long() + for i, sample in enumerate(samples): + constraints[i, 0 : lens[i]] = samples[i].get("constraints") + batch["constraints"] = constraints.index_select(0, sort_order) + + return batch + + +class LanguageTripleDataset(FairseqDataset): + """ + A pair of torch.utils.data.Datasets. + + Args: + src (torch.utils.data.Dataset): source dataset to wrap + src_sizes (List[int]): source sentence lengths + src_dict (~fairseq.data.Dictionary): source vocabulary + tgt (torch.utils.data.Dataset, optional): target dataset to wrap + tgt_sizes (List[int], optional): target sentence lengths + tgt_dict (~fairseq.data.Dictionary, optional): target vocabulary + left_pad_source (bool, optional): pad source tensors on the left side + (default: True). + left_pad_target (bool, optional): pad target tensors on the left side + (default: False). + shuffle (bool, optional): shuffle dataset elements before batching + (default: True). + input_feeding (bool, optional): create a shifted version of the targets + to be passed into the model for teacher forcing (default: True). + remove_eos_from_source (bool, optional): if set, removes eos from end + of source if it's present (default: False). + append_eos_to_target (bool, optional): if set, appends eos to end of + target if it's absent (default: False). + align_dataset (torch.utils.data.Dataset, optional): dataset + containing alignments. + constraints (Tensor, optional): 2d tensor with a concatenated, zero- + delimited list of constraints for each sentence. + append_bos (bool, optional): if set, appends bos to the beginning of + source/target sentence. + num_buckets (int, optional): if set to a value greater than 0, then + batches will be bucketed into the given number of batch shapes. + src_lang_id (int, optional): source language ID, if set, the collated batch + will contain a field 'src_lang_id' in 'net_input' which indicates the + source language of the samples. + tgt_lang_id (int, optional): target language ID, if set, the collated batch + will contain a field 'tgt_lang_id' which indicates the target language + of the samples. 
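+
+ Note: unlike LanguagePairDataset, this dataset additionally wraps a reference
+ dataset (``ref``, ``ref_sizes``, ``ref_dict``); the collated mini-batch exposes
+ it as ``ref_tokens`` and ``ref_lengths`` alongside the usual fields.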
+ """ + + def __init__( + self, + src, + src_sizes, + src_dict, + ref, + ref_sizes, + ref_dict, + tgt=None, + tgt_sizes=None, + tgt_dict=None, + left_pad_source=True, + left_pad_target=False, + shuffle=True, + input_feeding=True, + remove_eos_from_source=False, + append_eos_to_target=False, + align_dataset=None, + constraints=None, + append_bos=False, + eos=None, + num_buckets=0, + src_lang_id=None, + tgt_lang_id=None, + pad_to_multiple=1, + ): + if tgt_dict is not None: + assert src_dict.pad() == tgt_dict.pad() + assert src_dict.eos() == tgt_dict.eos() + assert src_dict.unk() == tgt_dict.unk() + if tgt is not None: + assert len(src) == len( + tgt + ), "Source and target must contain the same number of examples" + assert len(src) == len( + ref + ), "Source and reference must contain the same number of examples" + self.src = src + self.ref = ref + self.tgt = tgt + self.src_sizes = np.array(src_sizes) + self.ref_sizes = np.array(ref_sizes) + self.tgt_sizes = np.array(tgt_sizes) if tgt_sizes is not None else None + self.sizes = ( + np.vstack((self.src_sizes, self.tgt_sizes)).T + if self.tgt_sizes is not None + else self.src_sizes + ) + self.src_dict = src_dict + self.ref_dict = ref_dict + self.tgt_dict = tgt_dict + self.left_pad_source = left_pad_source + self.left_pad_target = left_pad_target + self.shuffle = shuffle + self.input_feeding = input_feeding + self.remove_eos_from_source = remove_eos_from_source + self.append_eos_to_target = append_eos_to_target + self.align_dataset = align_dataset + if self.align_dataset is not None: + assert ( + self.tgt_sizes is not None + ), "Both source and target needed when alignments are provided" + self.constraints = constraints + self.append_bos = append_bos + self.eos = eos if eos is not None else src_dict.eos() + self.src_lang_id = src_lang_id + self.tgt_lang_id = tgt_lang_id + if num_buckets > 0: + from fairseq.data import BucketPadLengthDataset + + self.src = BucketPadLengthDataset( + self.src, + sizes=self.src_sizes, + num_buckets=num_buckets, + pad_idx=self.src_dict.pad(), + left_pad=self.left_pad_source, + ) + self.src_sizes = self.src.sizes + logger.info("bucketing source lengths: {}".format(list(self.src.buckets))) + self.ref = BucketPadLengthDataset( + self.ref, + sizes=self.ref_sizes, + num_buckets=num_buckets, + pad_idx=self.ref_dict.pad(), + left_pad=self.left_pad_source, + ) + self.ref_sizes = self.ref.sizes + logger.info("bucketing reference lengths: {}".format(list(self.src.buckets))) + if self.tgt is not None: + self.tgt = BucketPadLengthDataset( + self.tgt, + sizes=self.tgt_sizes, + num_buckets=num_buckets, + pad_idx=self.tgt_dict.pad(), + left_pad=self.left_pad_target, + ) + self.tgt_sizes = self.tgt.sizes + logger.info( + "bucketing target lengths: {}".format(list(self.tgt.buckets)) + ) + + # determine bucket sizes using self.num_tokens, which will return + # the padded lengths (thanks to BucketPadLengthDataset) + num_tokens = np.vectorize(self.num_tokens, otypes=[np.compat.long]) + self.bucketed_num_tokens = num_tokens(np.arange(len(self.src))) + self.buckets = [ + (None, num_tokens) for num_tokens in np.unique(self.bucketed_num_tokens) + ] + else: + self.buckets = None + self.pad_to_multiple = pad_to_multiple + + def get_batch_shapes(self): + return self.buckets + + def __getitem__(self, index): + tgt_item = self.tgt[index] if self.tgt is not None else None + src_item = self.src[index] + ref_item = self.ref[index] + # Append EOS to end of tgt sentence if it does not have an EOS and remove + # EOS from end of src sentence if it 
exists. This is useful when we use + # use existing datasets for opposite directions i.e., when we want to + # use tgt_dataset as src_dataset and vice versa + if self.append_eos_to_target: + eos = self.tgt_dict.eos() if self.tgt_dict else self.src_dict.eos() + if self.tgt and self.tgt[index][-1] != eos: + tgt_item = torch.cat([self.tgt[index], torch.LongTensor([eos])]) + + if self.append_bos: + bos = self.tgt_dict.bos() if self.tgt_dict else self.src_dict.bos() + if self.tgt and self.tgt[index][0] != bos: + tgt_item = torch.cat([torch.LongTensor([bos]), self.tgt[index]]) + + bos = self.src_dict.bos() + if self.src[index][0] != bos: + src_item = torch.cat([torch.LongTensor([bos]), self.src[index]]) + if self.ref[index][0] != bos: + ref_item = torch.cat([torch.LongTensor([bos]), self.ref[index]]) + + if self.remove_eos_from_source: + eos = self.src_dict.eos() + if self.src[index][-1] == eos: + src_item = self.src[index][:-1] + if self.ref[index][-1] == eos: + ref_item = self.ref[index][:-1] + + example = { + "id": index, + "source": src_item, + "reference": ref_item, + "target": tgt_item, + } + if self.align_dataset is not None: + example["alignment"] = self.align_dataset[index] + if self.constraints is not None: + example["constraints"] = self.constraints[index] + return example + + def __len__(self): + return len(self.src) + + def collater(self, samples, pad_to_length=None): + """Merge a list of samples to form a mini-batch. + + Args: + samples (List[dict]): samples to collate + pad_to_length (dict, optional): a dictionary of + {'source': source_pad_to_length, 'target': target_pad_to_length} + to indicate the max length to pad to in source and target respectively. + + Returns: + dict: a mini-batch with the following keys: + + - `id` (LongTensor): example IDs in the original input order + - `ntokens` (int): total number of tokens in the batch + - `net_input` (dict): the input to the Model, containing keys: + + - `src_tokens` (LongTensor): a padded 2D Tensor of tokens in + the source sentence of shape `(bsz, src_len)`. Padding will + appear on the left if *left_pad_source* is ``True``. + - `src_lengths` (LongTensor): 1D Tensor of the unpadded + lengths of each source sentence of shape `(bsz)` + - `prev_output_tokens` (LongTensor): a padded 2D Tensor of + tokens in the target sentence, shifted right by one + position for teacher forcing, of shape `(bsz, tgt_len)`. + This key will not be present if *input_feeding* is + ``False``. Padding will appear on the left if + *left_pad_target* is ``True``. + - `src_lang_id` (LongTensor): a long Tensor which contains source + language IDs of each sample in the batch + + - `target` (LongTensor): a padded 2D Tensor of tokens in the + target sentence of shape `(bsz, tgt_len)`. Padding will appear + on the left if *left_pad_target* is ``True``. 
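+ - `ref_tokens` (LongTensor): a padded 2D Tensor of tokens in the
+ reference sentence of shape `(bsz, ref_len)`. Padding will appear
+ on the left if *left_pad_source* is ``True``.
+ - `ref_lengths` (LongTensor): 1D Tensor of the unpadded lengths of
+ each reference sentence of shape `(bsz)`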
+ - `tgt_lang_id` (LongTensor): a long Tensor which contains target language + IDs of each sample in the batch + """ + res = collate( + samples, + pad_idx=self.src_dict.pad(), + eos_idx=self.eos, + left_pad_source=self.left_pad_source, + left_pad_target=self.left_pad_target, + input_feeding=self.input_feeding, + pad_to_length=pad_to_length, + pad_to_multiple=self.pad_to_multiple, + ) + if self.src_lang_id is not None or self.tgt_lang_id is not None: + src_tokens = res["net_input"]["src_tokens"] + bsz = src_tokens.size(0) + if self.src_lang_id is not None: + res["net_input"]["src_lang_id"] = ( + torch.LongTensor([[self.src_lang_id]]).expand(bsz, 1).to(src_tokens) + ) + if self.tgt_lang_id is not None: + res["tgt_lang_id"] = ( + torch.LongTensor([[self.tgt_lang_id]]).expand(bsz, 1).to(src_tokens) + ) + return res + + def num_tokens(self, index): + """Return the number of tokens in a sample. This value is used to + enforce ``--max-tokens`` during batching.""" + return max( + self.src_sizes[index], + self.tgt_sizes[index] if self.tgt_sizes is not None else 0, + ) + + def num_tokens_vec(self, indices): + """Return the number of tokens for a set of positions defined by indices. + This value is used to enforce ``--max-tokens`` during batching.""" + sizes = self.src_sizes[indices] + if self.tgt_sizes is not None: + sizes = np.maximum(sizes, self.tgt_sizes[indices]) + return sizes + + def size(self, index): + """Return an example's size as a float or tuple. This value is used when + filtering a dataset with ``--max-positions``.""" + return ( + self.src_sizes[index], + self.tgt_sizes[index] if self.tgt_sizes is not None else 0, + ) + + def ordered_indices(self): + """Return an ordered list of indices. Batches will be constructed based + on this order.""" + if self.shuffle: + indices = np.random.permutation(len(self)).astype(np.int64) + else: + indices = np.arange(len(self), dtype=np.int64) + if self.buckets is None: + # sort by target length, then source length + if self.tgt_sizes is not None: + indices = indices[np.argsort(self.tgt_sizes[indices], kind="mergesort")] + return indices[np.argsort(self.src_sizes[indices], kind="mergesort")] + else: + # sort by bucketed_num_tokens, which is: + # max(padded_src_len, padded_tgt_len) + return indices[ + np.argsort(self.bucketed_num_tokens[indices], kind="mergesort") + ] + + @property + def supports_prefetch(self): + return getattr(self.src, "supports_prefetch", False) and ( + getattr(self.tgt, "supports_prefetch", False) or self.tgt is None + ) + + def prefetch(self, indices): + self.src.prefetch(indices) + if self.tgt is not None: + self.tgt.prefetch(indices) + if self.align_dataset is not None: + self.align_dataset.prefetch(indices) + + def filter_indices_by_size(self, indices, max_sizes): + """Filter a list of sample indices. Remove those that are longer + than specified in max_sizes. 
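+ Source and target sizes are checked jointly via
+ ``data_utils.filter_paired_dataset_indices_by_size``.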
+ + Args: + indices (np.array): original array of sample indices + max_sizes (int or list[int] or tuple[int]): max sample size, + can be defined separately for src and tgt (then list or tuple) + + Returns: + np.array: filtered sample array + list: list of removed indices + """ + return data_utils.filter_paired_dataset_indices_by_size( + self.src_sizes, + self.tgt_sizes, + indices, + max_sizes, + ) diff --git a/Speech2S/speech2s/data/load_langpair_dataset.py b/Speech2S/speech2s/data/load_langpair_dataset.py new file mode 100644 index 0000000000000000000000000000000000000000..bfd204598e67d41a5688e16b0835f96fd40cf384 --- /dev/null +++ b/Speech2S/speech2s/data/load_langpair_dataset.py @@ -0,0 +1,172 @@ +# -------------------------------------------------------- +# Copyright (c) 2022 Microsoft +# Licensed under The MIT License [see LICENSE for details] +# Based on fairseq code bases +# https://github.com/facebookresearch/fairseq +# -------------------------------------------------------- + +""" + Modified from https://github.com/facebookresearch/fairseq/blob/272c4c5197250997148fb12c0db6306035f166a4/fairseq/tasks/translation.py + 1. Add custom lang_format in function load_langpair_dataset + 2. If truncate_source (default no), use RandomCropDataset instead of TruncateDataset +""" + +import itertools +import logging +import os + +from fairseq.data import ( + AppendTokenDataset, + LanguagePairDataset, + PrependTokenDataset, + StripTokenDataset, + TruncateDataset, + RandomCropDataset, + data_utils, + indexed_dataset, +) + +from speechut.data.concat_dataset import ConcatDataset + + +EVAL_BLEU_ORDER = 4 + + +logger = logging.getLogger(__name__) + + +def load_langpair_dataset( + data_path, + split, + src, + src_dict, + tgt, + tgt_dict, + combine, + dataset_impl, + upsample_primary, + left_pad_source, + left_pad_target, + max_source_positions, + max_target_positions, + prepend_bos=False, + load_alignments=False, + truncate_source=False, + append_source_id=False, + num_buckets=0, + shuffle=True, + pad_to_multiple=1, + prepend_bos_src=None, + lang_format="[{}]", + input_feeding=True, +): + def split_exists(split, src, tgt, lang, data_path): + filename = os.path.join(data_path, "{}.{}-{}.{}".format(split, src, tgt, lang)) + return indexed_dataset.dataset_exists(filename, impl=dataset_impl) + + src_datasets = [] + tgt_datasets = [] + + for k in itertools.count(): + split_k = split + (str(k) if k > 0 else "") + + # infer langcode + if split_exists(split_k, src, tgt, src, data_path): + prefix = os.path.join(data_path, "{}.{}-{}.".format(split_k, src, tgt)) + elif split_exists(split_k, tgt, src, src, data_path): + prefix = os.path.join(data_path, "{}.{}-{}.".format(split_k, tgt, src)) + else: + if k > 0: + break + else: + raise FileNotFoundError( + "Dataset not found: {} ({})".format(split, data_path) + ) + + src_dataset = data_utils.load_indexed_dataset( + prefix + src, src_dict, dataset_impl + ) + if truncate_source: + src_dataset = AppendTokenDataset( + RandomCropDataset( + StripTokenDataset(src_dataset, src_dict.eos()), + max_source_positions - 1, + ), + src_dict.eos(), + ) + src_datasets.append(src_dataset) + + tgt_dataset = data_utils.load_indexed_dataset( + prefix + tgt, tgt_dict, dataset_impl + ) + if tgt_dataset is not None: + tgt_datasets.append(tgt_dataset) + + logger.info( + "{} {} {}-{} {} examples".format( + data_path, split_k, src, tgt, len(src_datasets[-1]) + ) + ) + + if not combine: + break + + assert len(src_datasets) == len(tgt_datasets) or len(tgt_datasets) == 0 + + if len(src_datasets) == 
1: + src_dataset = src_datasets[0] + tgt_dataset = tgt_datasets[0] if len(tgt_datasets) > 0 else None + else: + sample_ratios = [1] * len(src_datasets) + sample_ratios[0] = upsample_primary + src_dataset = ConcatDataset(src_datasets, sample_ratios) + if len(tgt_datasets) > 0: + tgt_dataset = ConcatDataset(tgt_datasets, sample_ratios) + else: + tgt_dataset = None + + if prepend_bos: + assert hasattr(src_dict, "bos_index") and hasattr(tgt_dict, "bos_index") + src_dataset = PrependTokenDataset(src_dataset, src_dict.bos()) + if tgt_dataset is not None: + tgt_dataset = PrependTokenDataset(tgt_dataset, tgt_dict.bos()) + elif prepend_bos_src is not None: + logger.info(f"prepending src bos: {prepend_bos_src}") + src_dataset = PrependTokenDataset(src_dataset, prepend_bos_src) + + eos = None + if append_source_id: + src_dataset = AppendTokenDataset( + src_dataset, src_dict.index(lang_format.format(src)) + ) + if tgt_dataset is not None: + tgt_dataset = AppendTokenDataset( + tgt_dataset, tgt_dict.index(lang_format.format(tgt)) + ) + eos = tgt_dict.index(lang_format.format(tgt)) + + align_dataset = None + if load_alignments: + align_path = os.path.join(data_path, "{}.align.{}-{}".format(split, src, tgt)) + if indexed_dataset.dataset_exists(align_path, impl=dataset_impl): + align_dataset = data_utils.load_indexed_dataset( + align_path, None, dataset_impl + ) + + tgt_dataset_sizes = tgt_dataset.sizes if tgt_dataset is not None else None + return LanguagePairDataset( + src_dataset, + src_dataset.sizes, + src_dict, + tgt_dataset, + tgt_dataset_sizes, + tgt_dict, + left_pad_source=left_pad_source, + left_pad_target=left_pad_target, + align_dataset=align_dataset, + eos=eos, + num_buckets=num_buckets, + shuffle=shuffle, + pad_to_multiple=pad_to_multiple, + input_feeding=input_feeding, + ) diff --git a/Speech2S/speech2s/data/multimodal_corpus_dataset.py b/Speech2S/speech2s/data/multimodal_corpus_dataset.py new file mode 100644 index 0000000000000000000000000000000000000000..19a6f8962757dec9b32430a98cd6e850d1f30d19 --- /dev/null +++ b/Speech2S/speech2s/data/multimodal_corpus_dataset.py @@ -0,0 +1,368 @@ +# -------------------------------------------------------- +# Copyright (c) 2022 Microsoft +# Licensed under The MIT License [see LICENSE for details] +# Based on fairseq code bases +# https://github.com/facebookresearch/fairseq +# -------------------------------------------------------- + +import logging +from os import replace +import time +from collections import OrderedDict +from typing import Any, Dict, List, Optional + +import numpy as np +from fairseq.data import data_utils + +from fairseq.data import FairseqDataset + +logger = logging.getLogger(__name__) + + +class MultiCorpusDataset(FairseqDataset): + """ + see fairseq/fairseq/data/multi_corpus_dataset.__doc__ + + Args: + datasets: a OrderedDict of FairseqDataset instances. 
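+ max_positions: a Dict mapping each dataset key to its maximum allowed
+ sample size; only enforced when check_length is True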
+ distribution: a List containing the probability of getting an utterance from + corresponding dataset + seed: random seed for sampling the datsets + sort_indices: if true, will sort the ordered indices by size + batch_sample: if true, will ensure each batch is from a single dataset + """ + + def __init__( + self, + datasets: Dict[str, FairseqDataset], + max_positions: Dict, + distribution: List[float], + max_tokens_ratio: List[float], + seed: int = 1234, + sort_indices: bool = False, + check_length: bool = False, + ): + super().__init__() + assert isinstance(datasets, OrderedDict) + assert len(datasets) == len(distribution) + # assert sum(distribution) == 1 + self.datasets = datasets + self.distribution = distribution + self.max_tokens_ratio = max_tokens_ratio + self.seed = seed + self.sort_indices = sort_indices + self.max_positions = max_positions + self.check_length = check_length + + # Avoid repeated conversions to list later + self.dataset_list = list(datasets.values()) + self.total_num_instances = 0 + + # first_dataset = self.dataset_list[0] + + self.num_instances_per_dataset = [] + self.dataset_offsets = [] + for i, dataset in enumerate(self.dataset_list): + assert isinstance(dataset, FairseqDataset) + # assert type(dataset) is type(first_dataset) + self.num_instances_per_dataset.append( + 0 if self.distribution[i] == 0 else len(dataset) + ) + self.dataset_offsets.append(self.total_num_instances) + self.total_num_instances += self.num_instances_per_dataset[i] + + def ordered_indices(self): + start = time.time() + with data_utils.numpy_seed(self.seed, self.epoch): + logger.info(f"sampling new dataset with seed {self.seed} epoch {self.epoch}") + sampled_indices = {} + + # For each dataset i, sample self.distribution[i] * self.total_num_instances + for i, key in enumerate(self.datasets): + tp = time.time() + if self.distribution[i] == 0: + # skip dataset if sampling probability is 0 + continue + + if i < len(self.datasets) - 1: + num_instances = int(self.distribution[i] * self.total_num_instances) + high = self.dataset_offsets[i + 1] + else: + num_instances = int(self.distribution[i] * self.total_num_instances) + high = self.total_num_instances + + logger.info(f"sampling {num_instances} from {key} dataset") + + # First, add k copies of the dataset where k = num_instances // len(dataset). + # This ensures an equal distribution of the data points as much as possible. 
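+ # (e.g. num_instances=25 with len(dataset)=10 gives num_copies=2 full passes
+ # plus 5 extra indices drawn from a random permutation)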
+ # For the remaining entries randomly sample them + dataset_size = len(self.datasets[key]) + num_copies = num_instances // dataset_size + dataset_indices = np.random.permutation(high - self.dataset_offsets[i])[: num_instances - num_copies * dataset_size] + if num_copies > 0: + dataset_indices = np.concatenate( + ( + np.repeat( + np.arange(high - self.dataset_offsets[i]), num_copies + ), + dataset_indices, + ) + ) + # filter by size, we should ignore it by setting check_length=False + # , as it is very time-consuming on large dadaset + if self.max_positions[key] is not None and self.check_length: + dataset_indices, ignored = self.datasets[key].filter_indices_by_size( + dataset_indices, + self.max_positions[key], + ) + if len(ignored) > 0: + logger.warning( + ( + "{:,} samples have invalid sizes and will be skipped, " + "max_positions={}, first few sample ids={}" + ).format(len(ignored), self.max_positions[key], ignored[:10]) + ) + + if self.sort_indices: + logger.info(" - sampled indices took {}s".format(time.time() - tp)) + tp = time.time() + dataset_indices = np.sort(dataset_indices) + ordered_indices = self.datasets[key].ordered_indices() + if isinstance(ordered_indices[0], np.ndarray): # chunked audio data + dataset_indices = [order_idx + self.dataset_offsets[i] for order_idx in ordered_indices] + assert self.dataset_offsets[i] == 0 + # TODO for chunked audio data, now assume len(dataset_indices) == len(dataset). Don't filter any data. + else: + dataset_indices = ordered_indices[dataset_indices] + self.dataset_offsets[i] + logger.info(" - ordered_indices took {}s".format(time.time() - tp)) + else: + np.random.shuffle(dataset_indices) + + sampled_indices[key] = dataset_indices + + logger.info( + "multi_corpus_dataset ordered_indices took {}s".format( + time.time() - start + ) + ) + return sampled_indices + + def _map_index(self, index: int): + """ + If dataset A has length N and dataset B has length M + then index 1 maps to index 1 of dataset A, and index N + 1 + maps to index 1 of B. + """ + counter = 0 + for num_instances, key in zip(self.num_instances_per_dataset, self.datasets): + if index < counter + num_instances: + return index - counter, key + counter += num_instances + raise ValueError( + "Invalid index: {}, max: {}".format(index, self.total_num_instances) + ) + + def __len__(self): + """ + Length of this dataset is the sum of individual datasets + """ + return self.total_num_instances + + def __getitem__(self, index): + new_index, key = self._map_index(index) + try: + item = self.datasets[key][new_index] + item["full_id"] = index + return item + except Exception as e: + e.args = (f"Error from {key} dataset", *e.args) + raise + + def collater(self, samples): + """ + If we are doing batch sampling, then pick the right collater to use. + + Otherwise we assume all collaters are the same. 
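+
+ Returns a dict mapping each dataset key to the mini-batch produced by that
+ dataset's own collater; keys with no samples are skipped.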
+ """ + if len(samples) == 0: + return None + + samples_dict = {key: [] for key in self.datasets} + for s in samples: + _, key = self._map_index(s["full_id"]) + samples_dict[key].append(s) + + batch = {} + for key in samples_dict: + if len(samples_dict[key]) == 0: + continue + batch[key] = self.datasets[key].collater(samples_dict[key]) + + return batch + + + def num_tokens(self, index: int): + index, key = self._map_index(index) + return self.datasets[key].num_tokens(index) + + def size(self, index: int): + index, key = self._map_index(index) + return self.datasets[key].size(index) + + @property + def can_reuse_epoch_itr_across_epochs(self): + return False + + def set_epoch(self, epoch, **unused): + super().set_epoch(epoch) + logger.info(f"setting epoch of multi_corpus_dataset to {epoch}") + for ds in self.dataset_list: + if hasattr(ds, "set_epoch"): + ds.set_epoch(epoch) + self.epoch = epoch + + @property + def supports_prefetch(self): + return False + + @property + def supports_fetch_outside_dataloader(self): + return all( + self.datasets[key].supports_fetch_outside_dataloader + for key in self.datasets + ) + + + def batch_by_size( + self, + indices, + max_tokens=None, + max_sentences=None, + required_batch_size_multiple=1, + ): + dataset_indices = indices + batches_dict = {} + for n, key in enumerate(dataset_indices): + max_tokens_ratio = self.max_tokens_ratio[n] + if isinstance(dataset_indices[key][0], np.ndarray): # chunked audio data + cur_batches = self.datasets[key].batch_by_size( + dataset_indices[key], + round(max_tokens * max_tokens_ratio), + max_sentences, + required_batch_size_multiple, + ) + logger.info(f"Created {sum([len(b) for b in cur_batches])} [{len(cur_batches)}] batches for dataset {key}") + else: + cur_batches = super().batch_by_size( + np.array(dataset_indices[key], dtype=np.int64), + round(max_tokens * max_tokens_ratio), + max_sentences, + required_batch_size_multiple, + ) + logger.info(f"Created {len(cur_batches)} batches for dataset {key}") + batches_dict[key] = cur_batches + + return batches_dict + + + def get_batch_sampler( + self, + indices, + num_shards, + seed, + max_tokens=None, + max_sentences=None, + required_batch_size_multiple=1, + split_modality_batch=False, + ): + + def batch_sampler(dataset, epoch): + start = time.time() + batches_dict = dataset.batch_by_size( + indices, + max_tokens=max_tokens, + max_sentences=max_sentences, + required_batch_size_multiple=required_batch_size_multiple, + ) + logger.info(f"multi_corpus_dataset, batch_by_size took {time.time() - start}s") + start = time.time() + new_batches = [] + + ### shuffle inner group size, split into speech/text batches + shuffled_batches_list = [] + speech_batches = [] + ### we should specify the speech_batches because: we need concatenate different speech datasets + # (e.g. ltr or km) instead of loading them parellelly. 
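+ # Below: non-chunked "speech*" datasets are pooled into speech_batches and
+ # shuffled together; chunked speech data and text datasets each keep their
+ # own shuffled batch list. Non-chunked lists are trimmed to a multiple of
+ # num_shards so every shard receives the same number of batches.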
+ for name, batches in batches_dict.items(): + if name.startswith("speech"): + if isinstance(batches[0], list): # chunked audio data + batches = self.datasets[name].shuffle_batches(list(batches), seed + epoch) + shuffled_batches_list.append(batches) + else: + batches = inner_bucket_shuffle(batches, seed+epoch, num_shards*10) + batches = batches[: (len(batches) // num_shards) * num_shards] + if len(batches) == 0: + logger.warning(f"Sample 0 batch for {name}, you should ensure that no {name} data provided.") + else: + speech_batches += batches + else: + batches = inner_bucket_shuffle(batches, seed+epoch, num_shards*10) + batches = batches[: (len(batches) // num_shards) * num_shards] + if len(batches) == 0: + logger.warning(f"Sample 0 batch for {name}, you should ensure that no {name} data provided.") + else: + batches = shuffle_buckets(batches, seed=seed+epoch, inner_shuf=False) + shuffled_batches_list.append(batches) + if len(speech_batches) > 0: + speech_batches = shuffle_buckets(speech_batches, seed=seed+epoch, inner_shuf=False) + shuffled_batches_list.append(speech_batches) + + ### create the final new_batches + num_batch = min(len(batches) for batches in shuffled_batches_list) + if split_modality_batch: + for i in range(0, num_batch, num_shards): + for batches in shuffled_batches_list: + new_batches += batches[i: i + num_shards] + else: + for i in range(num_batch): + new_batches.append(np.concatenate([batches[i] for batches in shuffled_batches_list])) + + logger.info(f"multi_corpus_dataset sample {len(new_batches)} batches, took {time.time() - start}s") + return new_batches + + def inner_bucket_shuffle(batches, seed, bucket_size=10, thr=0): + """we assert batches is sorted form long to short. + shuffle samples in a buctet(e.g. 10 batches). + batches: a list of numpy array""" + num_batch = len(batches) + new_batches = [] + num_buckets = len(batches) // bucket_size + i = 0 + while i < num_batch: + if (i < bucket_size * thr or + i >= bucket_size * (num_buckets - thr) + ): + new_batches.append(batches[i]) + i += 1 + else: + group = np.concatenate(batches[i: i+bucket_size]) + with data_utils.numpy_seed(seed): + np.random.shuffle(group) + new_batches += np.array_split(group, bucket_size) + i += bucket_size + assert all([len(batch) > 0 for batch in new_batches]) + return new_batches + + def shuffle_buckets(batches, seed, inner_shuf=True): + if inner_shuf: + batches = inner_bucket_shuffle(batches, seed, num_shards*10) + batches = [batches[i: i + num_shards] for i in range(0, len(batches)-num_shards+1, num_shards)] + assert len(batches[-1]) == num_shards + new_batches = [] + with data_utils.numpy_seed(seed): + np.random.shuffle(batches) + for group in batches: + new_batches += group + return new_batches + + return batch_sampler diff --git a/Speech2S/speech2s/models/__init__.py b/Speech2S/speech2s/models/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 diff --git a/Speech2S/speech2s/models/speechut.py b/Speech2S/speech2s/models/speechut.py new file mode 100644 index 0000000000000000000000000000000000000000..cb668286c1c1c420d0c7d7b9e74a3bca17c6c871 --- /dev/null +++ b/Speech2S/speech2s/models/speechut.py @@ -0,0 +1,785 @@ +# ---------------------------------------------------------------------------- +# SpeechUT: Bridging Speech and Text with Hidden-Unit for Encoder-Decoder Based Speech-Text Pre-training (https://arxiv.org/abs/2210.03730) +# Github source: https://github.com/microsoft/SpeechT5/tree/main/SpeechUT +# Code based 
on fairseq: https://github.com/facebookresearch/fairseq/tree/272c4c5197250997148fb12c0db6306035f166a4 +# +# Copyright (c) 2022 Microsoft +# Licensed under The MIT License [see LICENSE for details] +# ---------------------------------------------------------------------------- + +import logging +from dataclasses import dataclass, field +from typing import Dict, List, Optional, Tuple + +import numpy as np +import torch +import torch.nn as nn +import torch.nn.functional as F + +from fairseq import utils, checkpoint_utils +from fairseq.data.data_utils import compute_mask_indices +from fairseq.data.dictionary import Dictionary +from fairseq.dataclass import ChoiceEnum +from fairseq.models import BaseFairseqModel, register_model +from fairseq.models.transformer import Embedding +from fairseq.file_io import PathManager +from torch import Tensor +from fairseq.models.wav2vec.wav2vec2 import ConvFeatureExtractionModel +from fairseq.modules import GradMultiply, LayerNorm +from fairseq.tasks.hubert_pretraining import ( + HubertPretrainingConfig, + HubertPretrainingTask, +) +from fairseq.models.hubert import HubertConfig +from fairseq.models.transformer import TransformerConfig +from speechut.modules import TransformerEncoder +from speechut.modules import TransformerEncoderBase +from speechut.modules import TransformerDecoderBaseScriptable + +logger = logging.getLogger(__name__) + +EXTRACTOR_MODE_CHOICES = ChoiceEnum(["default", "layer_norm"]) +MASKING_DISTRIBUTION_CHOICES = ChoiceEnum(["static", "uniform", "normal", "poisson"]) + + +@dataclass + +class SpeechutConfig(HubertConfig): + use_rel_pos_enc: bool = field( + default=False, + metadata={"help": "whether to use relative positional encoding"}, + ) + scaling_for_att: float = field( + default=1.0, + metadata={"help": "scaling for attention weights to prevent overflow issue (for large model)"}, + ) + + # unit encoder-decoder + text_transformer: TransformerConfig = TransformerConfig() + reset_decoder_embedding_config: bool = field( + default=False, + metadata={"help": "reset the no_scale_embedding/layernorm_embedding to default for the decoder"}, + ) + add_unit_encoder: bool = field( + default=False, + metadata={"help": "add unit encoder"}, + ) + add_decoder: bool = field( + default=True, + metadata={"help": "add decoder"}, + ) + add_text_ctc: bool = field( + default=False, + metadata={"help": "add_text_ctc head"}, + ) + text_ctc_conv_kernel: int = field( + default=2, + metadata={"help": "text_ctc_conv kernel size"}, + ) + mask_u2t: bool = field( + default=True, + metadata={"help": "mask the unit input in unit-to-text task"}, + ) + + # embedding mixing + mix_with_unit: bool = field( + default=True, + metadata={"help": "mix with the unit embeddings"}, + ) + use_pred_unit: bool = field( + default=False, + metadata={"help": "use the embeddings of predicted units"}, + ) + l2_embedding: bool = field( + default=False, + metadata={"help": "compute l2 loss between unit embedding and unit hidden state"}, + ) + + # Finetune related + encoder_dict_size: int = field( + default=-1, + metadata={"help": "text encoder dictionary dimension"}, + ) + + decoder_dict_size: int = field( + default=-1, + metadata={"help": "decoder dictionary dimension"}, + ) + + +@register_model("speechut", dataclass=SpeechutConfig) +class SpeechutModel(BaseFairseqModel): + def __init__( + self, + cfg: SpeechutConfig, + task_cfg: HubertPretrainingConfig, + dictionaries: List[Dictionary], + unit_dictionary: Dictionary = None, + text_tgt_dictionary: Dictionary = None, + ) -> None: + 
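+ # The constructor builds: (1) the convolutional waveform pre-net and the HuBERT-style
+ # masked-prediction speech encoder, (2) an optional unit encoder whose embedding table
+ # can be shared with the HuBERT label embeddings, plus an optional text CTC head,
+ # and (3) an optional unit-to-text Transformer decoder.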
super().__init__() + logger.info(f"SpeechutModel Config: {cfg}") + + feature_enc_layers = eval(cfg.conv_feature_layers) # noqa + self.embed = feature_enc_layers[-1][0] + + self.feature_extractor = ConvFeatureExtractionModel( + conv_layers=feature_enc_layers, + dropout=0.0, + mode=cfg.extractor_mode, + conv_bias=cfg.conv_bias, + ) + feature_ds_rate = np.prod([s for _, _, s in feature_enc_layers]) + self.feat2tar_ratio = cfg.label_rate * feature_ds_rate / task_cfg.sample_rate + + self.post_extract_proj = ( + nn.Linear(self.embed, cfg.encoder_embed_dim) + if self.embed != cfg.encoder_embed_dim + else None + ) + + self.mask_prob = cfg.mask_prob + self.mask_selection = cfg.mask_selection + self.mask_other = cfg.mask_other + self.mask_length = cfg.mask_length + self.no_mask_overlap = cfg.no_mask_overlap + self.mask_min_space = cfg.mask_min_space + + self.mask_channel_prob = cfg.mask_channel_prob + self.mask_channel_selection = cfg.mask_channel_selection + self.mask_channel_other = cfg.mask_channel_other + self.mask_channel_length = cfg.mask_channel_length + self.no_mask_channel_overlap = cfg.no_mask_channel_overlap + self.mask_channel_min_space = cfg.mask_channel_min_space + + self.dropout_input = nn.Dropout(cfg.dropout_input) + self.dropout_features = nn.Dropout(cfg.dropout_features) + + self.feature_grad_mult = cfg.feature_grad_mult + self.logit_temp = cfg.logit_temp + self.skip_masked = cfg.skip_masked + self.skip_nomask = cfg.skip_nomask + + final_dim = cfg.final_dim if cfg.final_dim > 0 else cfg.encoder_embed_dim + + self.mask_emb = nn.Parameter( + torch.FloatTensor(cfg.encoder_embed_dim).uniform_() + ) + + self.encoder = TransformerEncoder(cfg) + self.layer_norm = LayerNorm(self.embed) + + self.target_glu = None + if cfg.target_glu: + self.target_glu = nn.Sequential( + nn.Linear(final_dim, final_dim * 2), nn.GLU() + ) + + self.final_dim = final_dim + assert len(dictionaries) <= 2, f"Only support <=2 kinds of targets, get {len(dictionaries)} dictionaries" + if len(dictionaries) == 1: + dictionaries = [dictionaries[0], dictionaries[0]] + self.num_classes = [len(d) for d in dictionaries] + + self.final_proj = nn.Linear(cfg.encoder_embed_dim, final_dim) + self.code_encoder_proj = nn.Linear(cfg.text_transformer.encoder.embed_dim, self.num_classes[-1]) + self.final_proj_list = [self.final_proj, self.code_encoder_proj] + + self.label_embs_concat = nn.Parameter(torch.FloatTensor(self.num_classes[0], final_dim)) + self.label_embs_list = [self.label_embs_concat] + for p in self.label_embs_list: + nn.init.uniform_(p) + + ### build unit encoder: + self.mask_u2t = cfg.mask_u2t + self.add_text_ctc = cfg.add_text_ctc + self.text_ctc_conv_kernel = cfg.text_ctc_conv_kernel + self.padding_idx = unit_dictionary.pad() + self.unit_mask_idx = unit_dictionary.index("") + + self.add_unit_encoder = cfg.add_unit_encoder + self.mix_with_unit = cfg.mix_with_unit + self.use_pred_unit = cfg.use_pred_unit + self.l2_embedding = cfg.l2_embedding + if self.add_unit_encoder: + assert len(unit_dictionary) == self.num_classes[0], f"unit_dictionary: {len(unit_dictionary)}, self.num_classes[0]: {self.num_classes[0]}" + ### build unit pre-net, and shared with hubert label_embs if needed (default: False) + self.unit_embed_tokens = self.build_embedding( + unit_dictionary, + cfg.text_transformer.encoder.embed_dim, + ) + if self.final_dim == cfg.text_transformer.encoder.embed_dim: + logger.info("Share label_embs[0] with unit_embed_tokens ...") + nn.init.uniform_(self.unit_embed_tokens.weight) + self.label_embs_list[0] = 
self.unit_embed_tokens.weight + + ### build unit encoder + self.unit_encoder = TransformerEncoderBase( + cfg.text_transformer, + unit_dictionary, + self.unit_embed_tokens, + use_rel_pos_enc=cfg.use_rel_pos_enc, + scaling_for_att=cfg.scaling_for_att, + ) + + ### build text ctc head + if self.add_text_ctc: + conv = nn.Conv1d( + cfg.text_transformer.encoder.embed_dim, cfg.text_transformer.encoder.embed_dim, + self.text_ctc_conv_kernel, + stride=self.text_ctc_conv_kernel // 2, + bias=False, + padding=self.text_ctc_conv_kernel // 2, + ) + nn.init.kaiming_normal_(conv.weight) + self.unit_encoder_ctc_head = nn.Sequential( + Rotate3D(), + conv, + nn.Dropout(p=0.1), + nn.Sequential( + Rotate3D(), + Rotate3D(), + LayerNorm(cfg.text_transformer.encoder.embed_dim), + ), + nn.GELU(), + nn.Linear(cfg.text_transformer.encoder.embed_dim, len(text_tgt_dictionary)), + ) + + ### build unit2text decoder, not available for now + self.add_decoder = cfg.add_decoder + self.text_transformer_cfg = cfg.text_transformer + if self.add_decoder: + # To make sure that the decoder dict size is the same as the fine-tuning tgt_dict size or bpe code dict size + dec_dictionary = self.cutting_dictionary(text_tgt_dictionary, cfg.decoder_dict_size) + decoder_embed_tokens = self.build_embedding( + dec_dictionary, cfg.text_transformer.decoder.embed_dim + ) + if cfg.reset_decoder_embedding_config: + cfg.text_transformer.no_scale_embedding = False + cfg.text_transformer.layernorm_embedding = False + cfg.text_transformer.no_token_positional_embeddings = False + self.decoder = TransformerDecoderBaseScriptable(cfg.text_transformer, dec_dictionary, decoder_embed_tokens, use_rel_pos_enc=cfg.use_rel_pos_enc) + + + def cutting_dictionary(self, dictionary, dict_size): + if dictionary is None or dict_size <= 0: + return dictionary + else: + import copy + cut_dictionary = copy.deepcopy(dictionary) + if dict_size > len(cut_dictionary): + for i in range(dict_size - len(cut_dictionary)): + cut_dictionary.symbols.append(f'_{i}_') + else: + cut_dictionary.symbols = cut_dictionary.symbols[:dict_size] + return cut_dictionary + + def build_embedding(self, dictionary, embed_dim): + num_embeddings = len(dictionary) + padding_idx = dictionary.pad() + return Embedding(num_embeddings, embed_dim, padding_idx) + + def upgrade_state_dict_named(self, state_dict, name): + """Upgrade a (possibly old) state dict for new versions of fairseq.""" + + super().upgrade_state_dict_named(state_dict, name) + return state_dict + + @classmethod + def build_model(cls, cfg: SpeechutConfig, task: HubertPretrainingTask): + """Build a new model instance.""" + unit_dictionary = getattr(task, "text_src_dictionary", None) + text_tgt_dictionary = getattr(task, "text_dictionary", None) + model = SpeechutModel(cfg, task.cfg, task.dictionaries, unit_dictionary, text_tgt_dictionary) + return model + + def apply_mask(self, x, padding_mask, target_list): + B, T, C = x.shape + if self.mask_prob > 0: + mask_indices = compute_mask_indices( + (B, T), + padding_mask, + self.mask_prob, + self.mask_length, + self.mask_selection, + self.mask_other, + min_masks=2, + no_overlap=self.no_mask_overlap, + min_space=self.mask_min_space, + ) + mask_indices = torch.from_numpy(mask_indices).to(x.device) + x[mask_indices] = self.mask_emb + else: + mask_indices = None + + if self.mask_channel_prob > 0: + mask_channel_indices = compute_mask_indices( + (B, C), + None, + self.mask_channel_prob, + self.mask_channel_length, + self.mask_channel_selection, + self.mask_channel_other, + 
no_overlap=self.no_mask_channel_overlap, + min_space=self.mask_channel_min_space, + ) + mask_channel_indices = ( + torch.from_numpy(mask_channel_indices) + .to(x.device) + .unsqueeze(1) + .expand(-1, T, -1) + ) + x[mask_channel_indices] = 0 + + return x, mask_indices + + def forward_features(self, source: torch.Tensor) -> torch.Tensor: + if self.feature_grad_mult > 0: + features = self.feature_extractor(source) + if self.feature_grad_mult != 1.0: + features = GradMultiply.apply(features, self.feature_grad_mult) + else: + with torch.no_grad(): + features = self.feature_extractor(source) + return features + + def forward_targets( + self, + features: torch.Tensor, + target_list: List[torch.Tensor], + ) -> Tuple[torch.Tensor, torch.Tensor]: + # Trim features to ensure labels exist and then get aligned labels + feat_tsz = features.size(2) + targ_tsz = min([t.size(1) for t in target_list]) + if self.feat2tar_ratio * feat_tsz > targ_tsz: + feat_tsz = int(targ_tsz / self.feat2tar_ratio) + features = features[..., :feat_tsz] + target_inds = torch.arange(feat_tsz).float() * self.feat2tar_ratio + target_inds += np.random.choice(int(self.feat2tar_ratio)) + target_list = [t[:, target_inds.long()] for t in target_list] + return features, target_list + + def forward_padding_mask( + self, + features: torch.Tensor, + padding_mask: torch.Tensor, + ) -> torch.Tensor: + extra = padding_mask.size(1) % features.size(1) + if extra > 0: + padding_mask = padding_mask[:, :-extra] + padding_mask = padding_mask.view(padding_mask.size(0), features.size(1), -1) + padding_mask = padding_mask.all(-1) + return padding_mask + + def get_normalized_probs( + self, + net_output: Tuple[Tensor, Optional[Dict[str, List[Optional[Tensor]]]]], + log_probs: bool, + sample: Optional[Dict[str, Tensor]] = None, + ): + lprobs = self.get_normalized_probs_scriptable(net_output, log_probs, sample) + lprobs.batch_first = True + return lprobs + + def downsample_ctc_padding_mask(self, padding_mask): + """ + padding_mask: (B, T) + """ + stride = self.text_ctc_conv_kernel // 2 + return padding_mask[:, ::stride] + + def compute_pred(self, proj_x, label_embs): + if self.target_glu: + label_embs = self.target_glu(label_embs) + x = F.normalize(proj_x.float(), dim=-1) # (S, D) + label_embs = F.normalize(label_embs.float(), dim=-1) # (C, D) + logits = torch.matmul(x, label_embs.T).type_as(proj_x) # (S, C) + logits /= self.logit_temp + return logits + + def compute_hubert_logits(self, x, target, proj, label_embs, padding_mask, mask_indices): + if not self.skip_masked: + masked_indices = torch.logical_and(~padding_mask, mask_indices) + proj_x_m = proj(x[masked_indices]) + logit_m_list = [(self.compute_pred(proj_x_m, label_embs), target[masked_indices])] + else: + logit_m_list = [None] + + if not self.skip_nomask: + nomask_indices = torch.logical_and(~padding_mask, ~mask_indices) + proj_x_u = proj(x[nomask_indices]) + logit_u_list = [(self.compute_pred(proj_x_u, label_embs), target[nomask_indices])] + else: + logit_u_list = [None] + + return logit_m_list, logit_u_list + + def compute_ce_logits(self, x, target, proj, padding_mask, mask_indices): + if not self.skip_masked: + masked_indices = torch.logical_and(~padding_mask, mask_indices) + logit_m_list = [(proj(x[masked_indices]), target[masked_indices])] + else: + logit_m_list = [None] + + if not self.skip_nomask: + nomask_indices = torch.logical_and(~padding_mask, ~mask_indices) + logit_u_list = [(proj(x[nomask_indices]), target[nomask_indices])] + else: + logit_u_list = [None] + + return logit_m_list, 
logit_u_list + + def convert_embeddings(self, + x, + padding_mask, + target=None, + mask_indices=None, + mix_with_unit=False, + use_pred_unit=False, + l2_embedding=False, + remask=False + ): + """ + 1. Mix with units if needed (default: True) + 2. Prepare for unit_encoder inputs + Inputs: + x, (B, T, D) + Return: + src_tokens, (B, T) + soft_embeddings, (B, T, D) + l2_loss, a loss + """ + soft_embeddings = self.final_proj_list[0](x) if x.size(-1) == self.final_dim else x + if padding_mask is None: + padding_mask = soft_embeddings.new_zeros(soft_embeddings.size(0), soft_embeddings.size(1), dtype=torch.long) + if use_pred_unit: + src_tokens = self.compute_pred(self.final_proj_list[0](x), self.label_embs_list[0]).argmax(dim=-1) + src_tokens[padding_mask] = self.padding_idx + elif target is not None: + src_tokens = target + else: + src_tokens = padding_mask.long() + + if l2_embedding | mix_with_unit: + unit_embeddings = self.unit_embed_tokens(src_tokens) # (B, T, D) + + l2_loss = 0 + if l2_embedding: + if mask_indices is not None: + l2_loss = (soft_embeddings - unit_embeddings)[mask_indices].float().pow(2).mean(dim=-1) + scale = unit_embeddings[mask_indices].float().pow(2).sum(dim=-1) + else: + l2_loss = (soft_embeddings - unit_embeddings).float().pow(2).mean(dim=-1) + scale = unit_embeddings.float().pow(2).sum(dim=-1) + l2_loss = (l2_loss / scale).mean() + + if mix_with_unit: + B, T, D = x.shape + selected_indices = compute_mask_indices( + (B, T), + padding_mask, + self.mask_prob / 2, + self.mask_length // 2, + self.mask_selection, + self.mask_other, + min_masks=2, + no_overlap=self.no_mask_overlap, + min_space=self.mask_min_space, + ) + selected_indices = torch.from_numpy(selected_indices).to(x.device) + if mask_indices is not None: + if remask: + remask_indices = torch.logical_and(selected_indices, mask_indices) + soft_embeddings[remask_indices] = self.mask_emb + swap_indices = torch.logical_and(selected_indices, ~mask_indices) + else: + swap_indices = selected_indices + soft_embeddings[swap_indices] = unit_embeddings[swap_indices] + + soft_embeddings = soft_embeddings * (1 - padding_mask.unsqueeze(-1).type_as(x)) + return src_tokens, soft_embeddings, l2_loss + + def forward( + self, + source: torch.Tensor = None, + src_tokens: torch.Tensor = None, + src_lengths: torch.Tensor = None, + prev_output_tokens: torch.Tensor = None, + target_list: Optional[List[torch.Tensor]] = None, + padding_mask: Optional[torch.Tensor] = None, + mask: bool = True, + features_only: bool = False, + output_layer: Optional[int] = None, + ) -> Dict[str, torch.Tensor]: + assert source is not None or src_tokens is not None + if source is not None: + return self.forward_speech( + source=source, + target_list=target_list, + padding_mask=padding_mask, + mask=mask, + features_only=features_only, + output_layer=output_layer, + ) + else: + return self.forward_text( + src_tokens=src_tokens, + src_lengths=src_lengths, + prev_output_tokens=prev_output_tokens, + mask=self.mask_u2t, + features_only=features_only, + output_layer=output_layer, + ) + + def forward_speech( + self, + source: torch.Tensor = None, + target_list: Optional[List[torch.Tensor]] = None, + padding_mask: Optional[torch.Tensor] = None, + mask: bool = True, + features_only: bool = False, + output_layer: Optional[int] = None, + ) -> Dict[str, torch.Tensor]: + """output layer is 1-based""" + features = self.forward_features(source) + if target_list is not None: + features, target_list = self.forward_targets(features, target_list) + + features_pen = 
features.float().pow(2).mean() + + features = features.transpose(1, 2) + features = self.layer_norm(features) + unmasked_features = features.clone() + + if padding_mask is not None: + padding_mask = self.forward_padding_mask(features, padding_mask) + + if self.post_extract_proj is not None: + features = self.post_extract_proj(features) + + features = self.dropout_input(features) + unmasked_features = self.dropout_features(unmasked_features) + + if mask: + x, mask_indices = self.apply_mask(features, padding_mask, target_list) + else: + x = features + mask_indices = None + + # feature: (B, T, D), float + # target: (B, T), long + # x: (B, T, D), float + # padding_mask: (B, T), bool + # mask_indices: (B, T), bool + x, _ = self.encoder( + x, + padding_mask=padding_mask, + layer=None if output_layer is None else output_layer - 1, + ) + + if features_only: + return {"x": x, "padding_mask": padding_mask, "features": features} + + logit_m_list, logit_u_list = self.compute_hubert_logits( + x, + target_list[0], + self.final_proj_list[0], + self.label_embs_list[0], + padding_mask, + mask_indices, + ) + + result = { + "logit_m_list": logit_m_list, + "logit_u_list": logit_u_list, + "padding_mask": padding_mask, + "features_pen": features_pen, + } + + if self.add_unit_encoder: + src_tokens, x_emb, l2_loss = self.convert_embeddings( + x, + padding_mask, target_list[0], + mask_indices=mask_indices, + mix_with_unit=self.mix_with_unit, + use_pred_unit=self.use_pred_unit, + l2_embedding=self.l2_embedding, + ) + encoder_out = self.unit_encoder(src_tokens, token_embeddings=x_emb) + + result['encoder_out'] = encoder_out['encoder_out'] # [(T, B, D)] + result['encoder_padding_mask'] = encoder_out['encoder_padding_mask'] # [(B, T)] + if self.l2_embedding: + result['embedding_l2_loss'] = l2_loss + + code_logit_m_list, code_logit_u_list = self.compute_ce_logits( + encoder_out['encoder_out'][0].transpose(0, 1), # -> (B, T, C) + target_list[-1], + self.final_proj_list[1], + padding_mask, + mask_indices, + ) + result['logit_m_list'] += code_logit_m_list + result['logit_u_list'] += code_logit_u_list + return result + + def forward_text( + self, + src_tokens: torch.Tensor = None, + src_lengths: torch.Tensor = None, + prev_output_tokens: torch.Tensor = None, + target_list: Optional[List[torch.Tensor]] = None, + mask: bool = True, + features_only: bool = False, + output_layer: Optional[int] = None, + ) -> Dict[str, torch.Tensor]: + assert self.add_unit_encoder, f"Can not forward unit-text branch without unit_encoder!" 
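+ # Unit-to-text branch (descriptive note): embed the discrete unit tokens, optionally apply the
+ # same span masking used for the speech branch, feed the result through the shared unit encoder,
+ # and, when enabled, attach the convolutional CTC head and the unit-to-text decoder on top of the
+ # encoder output.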
+ + padding_mask = src_tokens == self.padding_idx + unit_embeddings = self.unit_embed_tokens(src_tokens) + if mask: + unit_embeddings, mask_indices = self.apply_mask(unit_embeddings, padding_mask, [src_tokens]) + + encoder_out = self.unit_encoder( + src_tokens, + token_embeddings=unit_embeddings, + return_all_hiddens=output_layer is not None, + ) + + result = {} + result["encoder_out"] = encoder_out["encoder_out"] + result["encoder_states"] = encoder_out["encoder_states"] + result["padding_mask"] = padding_mask + + if self.add_text_ctc: + result["encoder_out_ctc"] = [self.unit_encoder_ctc_head(x) for x in encoder_out['encoder_out']] + result["encoder_padding_mask"] = [ + self.downsample_ctc_padding_mask(padding_mask) for padding_mask in encoder_out['encoder_padding_mask'] + ] + + if features_only: + return result + if self.add_decoder: + assert prev_output_tokens is not None + decoder_out = self.decoder( + prev_output_tokens=prev_output_tokens, encoder_out=encoder_out, + ) + result['decoder_out'] = decoder_out + return result + + def forward_mum(self, src_tokens, target, mask=True): + target_list = [target] + padding_mask = src_tokens.eq(self.unit_encoder.padding_idx) + unit_embeddings = self.unit_embed_tokens(src_tokens) + if mask: + unit_embeddings, mask_indices = self.apply_mask(unit_embeddings, padding_mask, target_list) + else: + ### If already applied mask on src_tokens, then the target_list should contains many padding_idx + mask_indices = target_list[-1] != self.padding_idx + unit_embeddings[mask_indices] = self.mask_emb + + encoder_out = self.unit_encoder( + src_tokens, + token_embeddings=unit_embeddings, + ) + code_logit_m_list, code_logit_u_list = self.compute_ce_logits( + encoder_out["encoder_out"][0].transpose(0, 1), + target_list[-1], + self.final_proj_list[1], + padding_mask, + mask_indices, + ) + result = {} + result["logit_m_list"] = code_logit_m_list + result["logit_u_list"] = code_logit_u_list + result["padding_mask"] = padding_mask + return result + + def extract_features( + self, + source: torch.Tensor, + padding_mask: Optional[torch.Tensor] = None, + mask: bool = False, + ret_conv: bool = False, + output_layer: Optional[int] = None, + **kwargs, + ) -> Tuple[torch.Tensor, torch.Tensor]: + """Extract encoder features for only speech input""" + res = self.forward( + source, + padding_mask=padding_mask, + mask=mask, + features_only=True, + output_layer=output_layer, + ) + x = res["x"] # B x T x D + padding_mask = res["padding_mask"] + + if self.add_unit_encoder: + src_tokens, x, _ = self.convert_embeddings( + x, + padding_mask, + mix_with_unit=False, + use_pred_unit=False, + ) + encoder_out = self.unit_encoder( + src_tokens, + token_embeddings=x, + return_all_hiddens=output_layer is not None + ) + res["x"] = encoder_out['encoder_out'][0].transpose(0, 1) # (B, T, D) + + feature = res["features"] if ret_conv else res["x"] + if output_layer is not None: + feature = encoder_out['encoder_states'] + + return feature, padding_mask + + def get_logits(self, net_output, is_masked=True): + if is_masked: + logits_list = net_output["logit_m_list"] + else: + logits_list = net_output["logit_u_list"] + logits_list = [x[0].float() for x in logits_list if x is not None] + return logits_list + + def get_targets(self, net_output, is_masked=True): + if is_masked: + logits_list = net_output["logit_m_list"] + else: + logits_list = net_output["logit_u_list"] + targets_list = [x[1].long() for x in logits_list if x is not None] + return targets_list + + def get_extra_losses(self, net_output): + 
extra_losses = [] + names = [] + + if "features_pen" in net_output: + extra_losses.append(net_output["features_pen"]) + names.append("features_pen") + + if "embedding_l2_loss" in net_output: + extra_losses.append(net_output["embedding_l2_loss"]) + names.append("embedding_l2_loss") + + return extra_losses, names + + def remove_pretraining_modules(self, step2=False): + self.target_glu = None + + def load_checkpoint(self, checkpoint: str): + if not PathManager.exists(checkpoint): + raise IOError("Model file not found: {}".format(checkpoint)) + state = checkpoint_utils.load_checkpoint_to_cpu(checkpoint) + return state + +class Rotate3D(nn.Module): + """ + (T, B, D) --> (B, D, T) --> (D, T, B) --> (T, B, D) + """ + def __init__(self): + super().__init__() + + def forward(self, x): + return x.permute(1, 2, 0) diff --git a/Speech2S/speech2s/models/speechut_asr.py b/Speech2S/speech2s/models/speechut_asr.py new file mode 100644 index 0000000000000000000000000000000000000000..f9ec9d8488b4f7e552804d355de000c80fb35b78 --- /dev/null +++ b/Speech2S/speech2s/models/speechut_asr.py @@ -0,0 +1,165 @@ +# ---------------------------------------------------------------------------- +# SpeechUT: Bridging Speech and Text with Hidden-Unit for Encoder-Decoder Based Speech-Text Pre-training (https://arxiv.org/abs/2210.03730) +# Github source: https://github.com/microsoft/SpeechT5/tree/main/SpeechUT +# Code based on fairseq: https://github.com/facebookresearch/fairseq/tree/272c4c5197250997148fb12c0db6306035f166a4 +# +# Copyright (c) 2022 Microsoft +# Licensed under The MIT License [see LICENSE for details] +# ---------------------------------------------------------------------------- + +import contextlib +import torch +from dataclasses import dataclass, field +from fairseq import utils +from fairseq.models import BaseFairseqModel, register_model +from fairseq.models.fairseq_encoder import FairseqEncoder +from fairseq.models.hubert import HubertAsrConfig, HubertEncoder +from fairseq.tasks import FairseqTask + +@dataclass +class SpeechUTASRConfig(HubertAsrConfig): + add_decoder: bool = field( + default=True, + metadata={"help": "add decoder for fine-tune"}, + ) + +@register_model("speechut_asr", dataclass=SpeechUTASRConfig) +class SpeechUTASR(BaseFairseqModel): + """ + A encoder-ctc-decoder model if cfg.add_decoder is True, or a encoder-ctc model + """ + def __init__(self, cfg: SpeechUTASRConfig, encoder: FairseqEncoder): + super().__init__() + self.cfg = cfg + self.encoder = encoder + if not cfg.add_decoder: + self.encoder.w2v_model.decoder = None + + def upgrade_state_dict_named(self, state_dict, name): + super().upgrade_state_dict_named(state_dict, name) + return state_dict + + @classmethod + def build_model(cls, cfg: SpeechUTASRConfig, task: FairseqTask): + """Build a new model instance.""" + encoder = SpeechUTEncoder(cfg, task) + return cls(cfg, encoder) + + def forward(self, source, padding_mask, prev_output_tokens, **kwargs): + encoder_out = self.encoder(source, padding_mask, **kwargs) + + x = self.encoder.final_dropout(encoder_out['encoder_out'][0]) # (T, B, C) + if self.encoder.proj: + x = self.encoder.proj(x) + if self.encoder.conv_ctc_proj: + padding_mask = self.encoder.w2v_model.downsample_ctc_padding_mask(encoder_out["encoder_padding_mask"][0]) + else: + padding_mask = encoder_out["encoder_padding_mask"] + + decoder_out = self.decoder( + prev_output_tokens, encoder_out=encoder_out, **kwargs + ) if self.cfg.add_decoder else None + + return { + "encoder_out_ctc": x, # (T, B, C), for CTC loss + 
"padding_mask": padding_mask, # (B, T), for CTC loss + "decoder_out": decoder_out, # for ED loss + } + + def forward_decoder(self, prev_output_tokens, **kwargs): + return self.decoder(prev_output_tokens, **kwargs) + + def get_logits(self, net_output): + """For CTC decoding""" + logits = net_output["encoder_out"] + padding = net_output["encoder_padding_mask"] + if padding is not None and padding.any(): + padding = padding.T + logits[padding][..., 0] = 0 + logits[padding][..., 1:] = float("-inf") + + return logits + + def get_normalized_probs(self, net_output, log_probs, sample=None): + """For 1) computing CTC loss, 2) decoder decoding.""" + + if "encoder_out_ctc" in net_output: + logits = net_output["encoder_out_ctc"] + else: + return self.decoder.get_normalized_probs(net_output, log_probs, sample) + + if isinstance(logits, list): + logits = logits[0] + + if log_probs: + return utils.log_softmax(logits.float(), dim=-1) + else: + return utils.softmax(logits.float(), dim=-1) + + @property + def decoder(self): + return self.encoder.w2v_model.decoder + + +class SpeechUTEncoder(HubertEncoder): + """ + Modified from fairseq.models.hubert.hubert_asr.HubertEncoder + 1. make it compatible with encoder-decoder model + """ + def __init__(self, cfg: HubertAsrConfig, task): + super().__init__(cfg, task) + + if (task.target_dictionary is not None) and ( + hasattr(self.w2v_model, "unit_encoder_ctc_head") + ): + self.proj = self.w2v_model.unit_encoder_ctc_head + self.conv_ctc_proj = True + else: + self.conv_ctc_proj = False + + def forward(self, source, padding_mask, tbc=True, **kwargs): + w2v_args = { + "source": source, + "padding_mask": padding_mask, + "mask": self.apply_mask and self.training, + } + ft = self.freeze_finetune_updates <= self.num_updates + with torch.no_grad() if not ft else contextlib.ExitStack(): + x, padding_mask = self.w2v_model.extract_features(**w2v_args) + if tbc: + # B x T x C -> T x B x C + x = x.transpose(0, 1) + return { + "encoder_out": [x], # T x B x C + "encoder_padding_mask": [padding_mask], # B x T + } + + def forward_torchscript(self, net_input): + """A TorchScript-compatible version of forward. + + Forward the encoder out. 
+ """ + x, padding_mask = self.w2v_model.extract_features(**net_input, mask=False) + # B x T x C -> T x B x C + x = x.transpose(0, 1) + + encoder_out = { + "encoder_out" : [x], + "encoder_padding_mask" : [padding_mask], + } + if self.proj: + x = self.proj(x) + encoder_out["encoder_out_ctc"] = x + + return encoder_out + + def reorder_encoder_out(self, encoder_out, new_order): + if encoder_out["encoder_out"] is not None: + encoder_out["encoder_out"] = [ + x.index_select(1, new_order) for x in encoder_out["encoder_out"] + ] + if encoder_out["encoder_padding_mask"] is not None: + encoder_out["encoder_padding_mask"] = [ + x.index_select(0, new_order) for x in encoder_out["encoder_padding_mask"] + ] + return encoder_out diff --git a/Speech2S/speech2s/models/speechut_st.py b/Speech2S/speech2s/models/speechut_st.py new file mode 100644 index 0000000000000000000000000000000000000000..6faaccfc89748a2692bd1eaec200588449d10423 --- /dev/null +++ b/Speech2S/speech2s/models/speechut_st.py @@ -0,0 +1,221 @@ +# ---------------------------------------------------------------------------- +# SpeechUT: Bridging Speech and Text with Hidden-Unit for Encoder-Decoder Based Speech-Text Pre-training (https://arxiv.org/abs/2210.03730) +# Github source: https://github.com/microsoft/SpeechT5/tree/main/SpeechUT +# Code based on fairseq: https://github.com/facebookresearch/fairseq/tree/272c4c5197250997148fb12c0db6306035f166a4 +# +# Copyright (c) 2022 Microsoft +# Licensed under The MIT License [see LICENSE for details] +# ---------------------------------------------------------------------------- + +import logging +import contextlib +import torch +import torch.nn as nn +from argparse import Namespace +from dataclasses import dataclass +from typing import Any +from fairseq import checkpoint_utils, tasks +from fairseq.models import BaseFairseqModel, register_model +from fairseq.models.fairseq_encoder import FairseqEncoder +from fairseq.tasks import FairseqTask +from fairseq.dataclass.utils import convert_namespace_to_omegaconf +from fairseq.data.data_utils import lengths_to_padding_mask + +from fairseq.models.hubert import HubertAsrConfig + +logger = logging.getLogger(__name__) + +@dataclass +class SpeechUTS2TConfig(HubertAsrConfig): + ### the following config is only for the compatibility to fairseq speech_to_text task + input_feat_per_channel: Any = None + input_channels: Any = None + speaker_to_id: Any = None + +@register_model("speechut_st_legacy", dataclass=SpeechUTS2TConfig) +class SpeechUTS2T(BaseFairseqModel): + """An encoder-decoder model.""" + def __init__(self, cfg: SpeechUTS2TConfig, encoder: FairseqEncoder): + super().__init__() + self.cfg = cfg + self.encoder = encoder + + def upgrade_state_dict_named(self, state_dict, name): + super().upgrade_state_dict_named(state_dict, name) + return state_dict + + @classmethod + def build_model(cls, cfg: SpeechUTS2TConfig, task: FairseqTask): + """Build a new model instance.""" + encoder = SpeechUTEncoder(cfg, task) + return cls(cfg, encoder) + + def forward(self, src_tokens, src_lengths, prev_output_tokens, **kwargs): + encoder_out = self.encoder(src_tokens, src_lengths, **kwargs) + decoder_out = self.encoder.w2v_model.decoder( + prev_output_tokens, encoder_out=encoder_out, **kwargs + ) + return decoder_out + + def forward_decoder(self, prev_output_tokens, **kwargs): + return self.encoder.w2v_model.decoder(prev_output_tokens, **kwargs) + + def get_normalized_probs(self, net_output, log_probs, sample=None): + """For decoder decoding.""" + return 
self.encoder.w2v_model.decoder.get_normalized_probs(net_output, log_probs, sample) + + @property + def decoder(self): + return self.encoder.w2v_model.decoder + + +class SpeechUTEncoder(FairseqEncoder): + """ + Modified from fairseq.models.hubert.hubert_asr.HubertEncoder + 1. make it compatible with fairseq speech_to_text task + 2. make it compatible with encoder-decoder model + """ + def __init__(self, cfg: SpeechUTS2TConfig, task): + self.apply_mask = cfg.apply_mask + + arg_overrides = { + "dropout": cfg.dropout, + "activation_dropout": cfg.activation_dropout, + "dropout_input": cfg.dropout_input, + "attention_dropout": cfg.attention_dropout, + "mask_length": cfg.mask_length, + "mask_prob": cfg.mask_prob, + "mask_selection": cfg.mask_selection, + "mask_other": cfg.mask_other, + "no_mask_overlap": cfg.no_mask_overlap, + "mask_channel_length": cfg.mask_channel_length, + "mask_channel_prob": cfg.mask_channel_prob, + "mask_channel_selection": cfg.mask_channel_selection, + "mask_channel_other": cfg.mask_channel_other, + "no_mask_channel_overlap": cfg.no_mask_channel_overlap, + "encoder_layerdrop": cfg.layerdrop, + "feature_grad_mult": cfg.feature_grad_mult, + } + + if cfg.w2v_args is None: + state = checkpoint_utils.load_checkpoint_to_cpu(cfg.w2v_path, arg_overrides) + w2v_args = state.get("cfg", None) + if w2v_args is None: + w2v_args = convert_namespace_to_omegaconf(state["args"]) + cfg.w2v_args = w2v_args + else: + state = None + w2v_args = cfg.w2v_args + if isinstance(w2v_args, Namespace): + cfg.w2v_args = w2v_args = convert_namespace_to_omegaconf(w2v_args) + + assert task.data_cfg.standardize_audio() == w2v_args.task.normalize, ( + "Fine-tuning works best when data normalization is the same. " + "Please check that --normalize is set or unset for " + "both pre-training and here" + ) + + pretrain_task = tasks.setup_task(w2v_args.task, load_local_states=False) + assert state is not None and "task_state" in state, f"the stored dictionaries not found in checkpoint!" + # This will load the stored "dictionaries" object + pretrain_task.load_state_dict(state["task_state"]) + + model = pretrain_task.build_model(w2v_args.model, from_checkpoint=True) + if state is not None and not cfg.no_pretrained_weights: + try: + model.load_state_dict(state["model"], strict=True) + except Exception as e: + logger.warn(e) + model.load_state_dict(state["model"], strict=False) + + model.remove_pretraining_modules() + + super().__init__(pretrain_task.source_dictionary) + + d = w2v_args.model.encoder_embed_dim + + self.w2v_model = model + + self.final_dropout = nn.Dropout(cfg.final_dropout) + self.freeze_finetune_updates = cfg.freeze_finetune_updates + self.num_updates = 0 + + def set_num_updates(self, num_updates): + """Set the number of parameters updates.""" + super().set_num_updates(num_updates) + self.num_updates = num_updates + + def forward(self, src_tokens=None, src_lengths=None, **kwargs): + + w2v_args = { + "source": src_tokens, + "padding_mask": lengths_to_padding_mask(src_lengths), + "mask": self.apply_mask and self.training, + } + + ft = self.freeze_finetune_updates <= self.num_updates + + with torch.no_grad() if not ft else contextlib.ExitStack(): + x, padding_mask = self.w2v_model.extract_features(**w2v_args) + # B x T x C -> T x B x C + x = x.transpose(0, 1) + + return { + "encoder_out": [x], # T x B x C + "encoder_padding_mask": [padding_mask], # B x T + "padding_mask": [padding_mask], + } + + def forward_torchscript(self, net_input): + """A TorchScript-compatible version of forward. 
+ + Forward the encoder out. + """ + _net_input = { + "source": net_input["src_tokens"], + "padding_mask": lengths_to_padding_mask(net_input["src_lengths"]), + "mask": False, + } + + x, padding_mask = self.w2v_model.extract_features(**_net_input) + # B x T x C -> T x B x C + x = x.transpose(0, 1) + + encoder_out = { + "encoder_out" : [x], + "encoder_padding_mask" : [padding_mask], + } + return encoder_out + + def reorder_encoder_out(self, encoder_out, new_order): + if encoder_out["encoder_out"] is not None: + encoder_out["encoder_out"] = [ + x.index_select(1, new_order) for x in encoder_out["encoder_out"] + ] + if encoder_out["encoder_padding_mask"] is not None: + encoder_out["encoder_padding_mask"] = [ + x.index_select(0, new_order) for x in encoder_out["encoder_padding_mask"] + ] + return encoder_out + + def max_positions(self): + """Maximum input length supported by the encoder.""" + return None + + def upgrade_state_dict_named(self, state_dict, name): + return state_dict + + +def Embedding(num_embeddings, embedding_dim, padding_idx): + m = nn.Embedding(num_embeddings, embedding_dim, padding_idx=padding_idx) + nn.init.normal_(m.weight, mean=0, std=embedding_dim**-0.5) + nn.init.constant_(m.weight[padding_idx], 0) + return m + + +def Linear(in_features, out_features, bias=True): + m = nn.Linear(in_features, out_features, bias) + nn.init.xavier_uniform_(m.weight) + if bias: + nn.init.constant_(m.bias, 0.0) + return m diff --git a/Speech2S/speech2s/models/t5_transformer_lm.py b/Speech2S/speech2s/models/t5_transformer_lm.py new file mode 100644 index 0000000000000000000000000000000000000000..3d16a2df00b692114f8d84d254cf486d09e1137b --- /dev/null +++ b/Speech2S/speech2s/models/t5_transformer_lm.py @@ -0,0 +1,25 @@ +# -------------------------------------------------------- +# Pre-Training Transformer Decoder for End-to-End ASR Model with Unpaired Speech Data (https://arxiv.org/abs/2203.17113) +# Github source: https://github.com/microsoft/SpeechT5/tree/main/Speech2C +# Copyright (c) 2022 Microsoft +# Licensed under The MIT License [see LICENSE for details] +# Based on fairseq code bases +# https://github.com/pytorch/fairseq +# -------------------------------------------------------- + +from fairseq.models import ( + register_model_architecture, +) +from fairseq.models.transformer_lm import base_lm_architecture + + +@register_model_architecture(model_name="transformer_lm", arch_name="transformer_lm_t5") +def transformer_lm_t5(args): + args.decoder_embed_dim = getattr(args, "decoder_embed_dim", 1280) + args.decoder_ffn_embed_dim = getattr(args, "decoder_ffn_embed_dim", 6144) + args.decoder_layers = getattr(args, "decoder_layers", 20) + args.decoder_attention_heads = getattr(args, "decoder_attention_heads", 16) + args.dropout = getattr(args, "dropout", 0.1) + args.attention_dropout = getattr(args, "attention_dropout", 0.1) + args.activation_fn = getattr(args, "activation_fn", "gelu") + base_lm_architecture(args) diff --git a/Speech2S/speech2s/modules/__init__.py b/Speech2S/speech2s/modules/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..dad97814e515d8e68d68e4e031d4f9c9055f3864 --- /dev/null +++ b/Speech2S/speech2s/modules/__init__.py @@ -0,0 +1,27 @@ +# -------------------------------------------------------- +# Copyright (c) 2022 Microsoft +# Licensed under The MIT License [see LICENSE for details] +# Based on fairseq code bases +# https://github.com/facebookresearch/fairseq +# -------------------------------------------------------- + +from 
.learned_positional_embedding import LearnedPositionalEmbedding +from .multihead_attention import MultiheadAttention +from .relative_pos_enc import RelativePositionalEncoding +from .transformer_layer import TransformerEncoderLayerBase, TransformerDecoderLayerBase +from .w2v_encoder import TransformerEncoder, TransformerSentenceEncoderLayer +from .transformer_encoder import TransformerEncoderBase +from .transformer_decoder import TransformerDecoderScriptable, TransformerDecoderBaseScriptable + +__all__ = [ + "MultiheadAttention", + "RelativePositionalEncoding", + "LearnedPositionalEmbedding", + "TransformerEncoderLayerBase", + "TransformerDecoderLayerBase", + "TransformerEncoder", + "TransformerSentenceEncoderLayer", + "TransformerEncoderBase", + "TransformerDecoderScriptable", + "TransformerDecoderBaseScriptable", +] diff --git a/Speech2S/speech2s/modules/ctc_prefix_score.py b/Speech2S/speech2s/modules/ctc_prefix_score.py new file mode 100644 index 0000000000000000000000000000000000000000..b42cbd819abf7bdd718bef3db3f553c8360ac384 --- /dev/null +++ b/Speech2S/speech2s/modules/ctc_prefix_score.py @@ -0,0 +1,93 @@ +#!/usr/bin/env python3 + +# Copyright 2018 Mitsubishi Electric Research Labs (Takaaki Hori) +# Apache 2.0 (http://www.apache.org/licenses/LICENSE-2.0) + +import numpy as np +import six + + +class CTCPrefixScore(object): + """Compute CTC label sequence scores + which is based on Algorithm 2 in WATANABE et al. + "HYBRID CTC/ATTENTION ARCHITECTURE FOR END-TO-END SPEECH RECOGNITION," + but extended to efficiently compute the probablities of multiple labels + simultaneously + """ + + def __init__(self, x, blank, eos, xp): + self.xp = xp + self.logzero = -10000000000.0 + self.blank = blank + self.eos = eos + self.input_length = len(x) + self.x = x + + def initial_state(self): + """Obtain an initial CTC state + :return: CTC state + """ + # initial CTC state is made of a frame x 2 tensor that corresponds to + # r_t^n() and r_t^b(), where 0 and 1 of axis=1 represent + # superscripts n and b (non-blank and blank), respectively. + r = self.xp.full((self.input_length, 2), self.logzero, dtype=np.float32) + r[0, 1] = self.x[0, self.blank] + for i in six.moves.range(1, self.input_length): + r[i, 1] = r[i - 1, 1] + self.x[i, self.blank] + return r + + def __call__(self, y, cs, r_prev): + """Compute CTC prefix scores for next labels + :param y : prefix label sequence + :param cs : array of next labels + :param r_prev: previous CTC state + :return ctc_scores, ctc_states + """ + # initialize CTC states + output_length = len(y) - 1 # ignore sos + # new CTC states are prepared as a frame x (n or b) x n_labels tensor + # that corresponds to r_t^n(h) and r_t^b(h). 
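+ # Forward recursion implemented below (all in log space), for each candidate label c appended
+ # to prefix g (h = g + c):
+ #   r_t^n(h) = logaddexp(r_{t-1}^n(h), phi_{t-1}) + x_t(c)
+ #   r_t^b(h) = logaddexp(r_{t-1}^n(h), r_{t-1}^b(h)) + x_t(blank)
+ #   psi      = logaddexp(psi, phi_{t-1} + x_t(c))
+ # where phi = log(r^n(g) + r^b(g)), except phi = r^b(g) when c equals the last label of g,
+ # so that repeated labels are not merged across the missing blank.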
+ r = self.xp.ndarray((self.input_length, 2, len(cs)), dtype=np.float32) + xs = self.x[:, cs] + if output_length == 0: + r[0, 0] = xs[0] + r[0, 1] = self.logzero + else: + r[output_length - 1] = self.logzero + + # prepare forward probabilities for the last label + r_sum = self.xp.logaddexp( + r_prev[:, 0], r_prev[:, 1] + ) # log(r_t^n(g) + r_t^b(g)) + last = y[-1] + if output_length > 0 and last in cs: + log_phi = self.xp.ndarray((self.input_length, len(cs)), dtype=np.float32) + for i in six.moves.range(len(cs)): + log_phi[:, i] = r_sum if cs[i] != last else r_prev[:, 1] + else: + log_phi = r_sum + + # compute forward probabilities log(r_t^n(h)), log(r_t^b(h)), + # and log prefix probabilities log(psi) + start = max(output_length, 1) + log_psi = r[start - 1, 0] + for t in six.moves.range(start, self.input_length): + r[t, 0] = self.xp.logaddexp(r[t - 1, 0], log_phi[t - 1]) + xs[t] + r[t, 1] = ( + self.xp.logaddexp(r[t - 1, 0], r[t - 1, 1]) + self.x[t, self.blank] + ) + log_psi = self.xp.logaddexp(log_psi, log_phi[t - 1] + xs[t]) + + # get P(...eos|X) that ends with the prefix itself + eos_pos = self.xp.where(cs == self.eos)[0] + if len(eos_pos) > 0: + log_psi[eos_pos] = r_sum[-1] # log(r_T^n(g) + r_T^b(g)) + + # exclude blank probs + blank_pos = self.xp.where(cs == self.blank)[0] + if len(blank_pos) > 0: + log_psi[blank_pos] = self.logzero + + # return the log prefix probability and CTC states, where the label axis + # of the CTC states is moved to the first axis to slice it easily + return log_psi, self.xp.rollaxis(r, 2) diff --git a/Speech2S/speech2s/modules/learned_positional_embedding.py b/Speech2S/speech2s/modules/learned_positional_embedding.py new file mode 100644 index 0000000000000000000000000000000000000000..20c8558e20b2172a8c607e2f5c32aa146ff2b9cf --- /dev/null +++ b/Speech2S/speech2s/modules/learned_positional_embedding.py @@ -0,0 +1,69 @@ +# -------------------------------------------------------- +# Copyright (c) 2022 Microsoft +# Licensed under The MIT License [see LICENSE for details] +# Based on fairseq code bases +# https://github.com/facebookresearch/fairseq +# -------------------------------------------------------- + +""" + Modified from https://github.com/facebookresearch/fairseq/blob/main/fairseq/modules/learned_positional_embedding.py + 1. Add clamping if the input length exceeds the max-source-tokens +""" + +from typing import Dict, Optional + +import torch +import torch.nn as nn +import torch.nn.functional as F +from fairseq import utils +from torch import Tensor + + +class LearnedPositionalEmbedding(nn.Embedding): + """ + This module learns positional embeddings up to a fixed maximum size. + Padding ids are ignored by either offsetting based on padding_idx + or by setting padding_idx to None and ensuring that the appropriate + position ids are passed to the forward function. + """ + + def __init__(self, num_embeddings: int, embedding_dim: int, padding_idx: int): + super().__init__(num_embeddings, embedding_dim, padding_idx) + self.onnx_trace = False + if self.padding_idx is not None: + self.max_positions = self.num_embeddings - self.padding_idx - 1 + else: + self.max_positions = self.num_embeddings + + def forward( + self, + input: Tensor, + incremental_state: Optional[Dict[str, Dict[str, Optional[Tensor]]]] = None, + positions: Optional[Tensor] = None, + ): + """Input is expected to be of size [bsz x seqlen].""" + assert (positions is None) or ( + self.padding_idx is None + ), "If positions is pre-computed then padding_idx should not be set." 
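+ # Positions are offset by padding_idx (the fairseq convention for learned positional
+ # embeddings) and then clamped to padding_idx + max_positions, so inputs longer than the
+ # embedding table reuse the last learned position instead of indexing out of range
+ # (this clamping is the modification noted in the file header).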
+ + if positions is None: + if incremental_state is not None: + # positions is the same for every token when decoding a single step + # Without the int() cast, it doesn't work in some cases when exporting to ONNX + positions = torch.zeros( + (1, 1), device=input.device, dtype=input.dtype + ).fill_(int(self.padding_idx + input.size(1))) + else: + positions = utils.make_positions( + input, self.padding_idx, onnx_trace=self.onnx_trace + ) + positions = torch.clamp(positions, max=self.padding_idx + self.max_positions) + return F.embedding( + positions, + self.weight, + self.padding_idx, + self.max_norm, + self.norm_type, + self.scale_grad_by_freq, + self.sparse, + ) diff --git a/Speech2S/speech2s/modules/multihead_attention.py b/Speech2S/speech2s/modules/multihead_attention.py new file mode 100644 index 0000000000000000000000000000000000000000..89f46ab628ebe7faa1a3db2fd4f31a7269bb006a --- /dev/null +++ b/Speech2S/speech2s/modules/multihead_attention.py @@ -0,0 +1,346 @@ +# -------------------------------------------------------- +# Copyright (c) 2022 Microsoft +# Licensed under The MIT License [see LICENSE for details] +# Based on fairseq code bases +# https://github.com/facebookresearch/fairseq +# -------------------------------------------------------- + +from typing import Dict, Optional, Tuple + +import torch +import torch.nn.functional as F +from fairseq import utils +from torch import Tensor + +from fairseq.modules import MultiheadAttention as FairseqMultiheadAttention + + +class MultiheadAttention(FairseqMultiheadAttention): + """Multi-headed attention. + + See "Attention Is All You Need" for more details. + """ + + def __init__( + self, + embed_dim, + num_heads, + kdim=None, + vdim=None, + dropout=0.0, + bias=True, + add_bias_kv=False, + add_zero_attn=False, + self_attention=False, + encoder_decoder_attention=False, + q_noise=0.0, + qn_block_size=8, + scaling_for_att=1.0 + ): + super().__init__( + embed_dim, + num_heads, + kdim, + vdim, + dropout, + bias, + add_bias_kv, + add_zero_attn, + self_attention, + encoder_decoder_attention, + q_noise, + qn_block_size, + ) + self.scaling_for_att = scaling_for_att + + def forward( + self, + query, + key: Optional[Tensor], + value: Optional[Tensor], + key_padding_mask: Optional[Tensor] = None, + incremental_state: Optional[Dict[str, Dict[str, Optional[Tensor]]]] = None, + need_weights: bool = True, + static_kv: bool = False, + attn_mask: Optional[Tensor] = None, + before_softmax: bool = False, + need_head_weights: bool = False, + position_bias: Optional[Tensor] = None, + ) -> Tuple[Tensor, Optional[Tensor]]: + """Input shape: Time x Batch x Channel + + Args: + key_padding_mask (ByteTensor, optional): mask to exclude + keys that are pads, of shape `(batch, src_len)`, where + padding elements are indicated by 1s. + need_weights (bool, optional): return the attention weights, + averaged over heads (default: False). + attn_mask (ByteTensor, optional): typically used to + implement causal attention, where the mask prevents the + attention from looking forward in time (default: None). + before_softmax (bool, optional): return the raw attention + weights and values before the attention softmax. + need_head_weights (bool, optional): return the attention + weights for each head. Implies *need_weights*. Default: + return the average attention weights over all heads. 
+ """ + if need_head_weights: + need_weights = True + + is_tpu = query.device.type == "xla" + + tgt_len, bsz, embed_dim = query.size() + src_len = tgt_len + assert embed_dim == self.embed_dim, f"query dim {embed_dim} != {self.embed_dim}" + assert list(query.size()) == [tgt_len, bsz, embed_dim] + if key is not None: + src_len, key_bsz, _ = key.size() + if not torch.jit.is_scripting(): + assert key_bsz == bsz + assert value is not None + assert src_len, bsz == value.shape[:2] + + if ( + not self.onnx_trace + and not is_tpu # don't use PyTorch version on TPUs + and incremental_state is None + and not static_kv + # A workaround for quantization to work. Otherwise JIT compilation + # treats bias in linear module as method. + and not torch.jit.is_scripting() + and position_bias is None + ): + assert key is not None and value is not None + return F.multi_head_attention_forward( + query, + key, + value, + self.embed_dim, + self.num_heads, + torch.empty([0]), + torch.cat((self.q_proj.bias, self.k_proj.bias, self.v_proj.bias)), + self.bias_k, + self.bias_v, + self.add_zero_attn, + self.dropout_module.p, + self.out_proj.weight, + self.out_proj.bias, + self.training or self.dropout_module.apply_during_inference, + key_padding_mask, + need_weights, + attn_mask, + use_separate_proj_weight=True, + q_proj_weight=self.q_proj.weight, + k_proj_weight=self.k_proj.weight, + v_proj_weight=self.v_proj.weight, + ) + + if incremental_state is not None: + saved_state = self._get_input_buffer(incremental_state) + if saved_state is not None and "prev_key" in saved_state: + # previous time steps are cached - no need to recompute + # key and value if they are static + if static_kv: + assert self.encoder_decoder_attention and not self.self_attention + key = value = None + else: + saved_state = None + + if self.self_attention: + q = self.q_proj(query) + k = self.k_proj(query) + v = self.v_proj(query) + elif self.encoder_decoder_attention: + # encoder-decoder attention + q = self.q_proj(query) + if key is None: + assert value is None + k = v = None + else: + k = self.k_proj(key) + v = self.v_proj(key) + + else: + assert key is not None and value is not None + q = self.q_proj(query) + k = self.k_proj(key) + v = self.v_proj(value) + q *= self.scaling + q *= (1 / self.scaling_for_att) + + if self.bias_k is not None: + assert self.bias_v is not None + k = torch.cat([k, self.bias_k.repeat(1, bsz, 1)]) + v = torch.cat([v, self.bias_v.repeat(1, bsz, 1)]) + if attn_mask is not None: + attn_mask = torch.cat( + [attn_mask, attn_mask.new_zeros(attn_mask.size(0), 1)], dim=1 + ) + if key_padding_mask is not None: + key_padding_mask = torch.cat( + [ + key_padding_mask, + key_padding_mask.new_zeros(key_padding_mask.size(0), 1), + ], + dim=1, + ) + + q = ( + q.contiguous() + .view(tgt_len, bsz * self.num_heads, self.head_dim) + .transpose(0, 1) + ) + if k is not None: + k = ( + k.contiguous() + .view(-1, bsz * self.num_heads, self.head_dim) + .transpose(0, 1) + ) + if v is not None: + v = ( + v.contiguous() + .view(-1, bsz * self.num_heads, self.head_dim) + .transpose(0, 1) + ) + + if saved_state is not None: + # saved states are stored with shape (bsz, num_heads, seq_len, head_dim) + if "prev_key" in saved_state: + _prev_key = saved_state["prev_key"] + assert _prev_key is not None + prev_key = _prev_key.view(bsz * self.num_heads, -1, self.head_dim) + if static_kv: + k = prev_key + else: + assert k is not None + k = torch.cat([prev_key, k], dim=1) + src_len = k.size(1) + if "prev_value" in saved_state: + _prev_value = 
saved_state["prev_value"] + assert _prev_value is not None + prev_value = _prev_value.view(bsz * self.num_heads, -1, self.head_dim) + if static_kv: + v = prev_value + else: + assert v is not None + v = torch.cat([prev_value, v], dim=1) + prev_key_padding_mask: Optional[Tensor] = None + if "prev_key_padding_mask" in saved_state: + prev_key_padding_mask = saved_state["prev_key_padding_mask"] + assert k is not None and v is not None + key_padding_mask = MultiheadAttention._append_prev_key_padding_mask( + key_padding_mask=key_padding_mask, + prev_key_padding_mask=prev_key_padding_mask, + batch_size=bsz, + src_len=k.size(1), + static_kv=static_kv, + ) + + saved_state["prev_key"] = k.view(bsz, self.num_heads, -1, self.head_dim) + saved_state["prev_value"] = v.view(bsz, self.num_heads, -1, self.head_dim) + saved_state["prev_key_padding_mask"] = key_padding_mask + # In this branch incremental_state is never None + assert incremental_state is not None + incremental_state = self._set_input_buffer(incremental_state, saved_state) + assert k is not None + assert k.size(1) == src_len + + # This is part of a workaround to get around fork/join parallelism + # not supporting Optional types. + if key_padding_mask is not None and key_padding_mask.dim() == 0: + key_padding_mask = None + + if key_padding_mask is not None: + assert key_padding_mask.size(0) == bsz + assert key_padding_mask.size(1) == src_len + + if self.add_zero_attn: + assert v is not None + src_len += 1 + k = torch.cat([k, k.new_zeros((k.size(0), 1) + k.size()[2:])], dim=1) + v = torch.cat([v, v.new_zeros((v.size(0), 1) + v.size()[2:])], dim=1) + if attn_mask is not None: + attn_mask = torch.cat( + [attn_mask, attn_mask.new_zeros(attn_mask.size(0), 1)], dim=1 + ) + if key_padding_mask is not None: + key_padding_mask = torch.cat( + [ + key_padding_mask, + torch.zeros(key_padding_mask.size(0), 1).type_as( + key_padding_mask + ), + ], + dim=1, + ) + + attn_weights = torch.bmm(q, k.transpose(1, 2)) + attn_weights = self.apply_sparse_mask(attn_weights, tgt_len, src_len, bsz) + + if position_bias is not None: ## first order + ## position_bias: [241, 241, 64] + #print ("attn_weights: ", attn_weights.size()) # [492, 241, 241] + reshape_q = q.contiguous().view(bsz * self.num_heads, -1, self.head_dim).transpose(0,1) #[241, 492, 64] + #print ("reshape_q: ", reshape_q.size()) + B = torch.matmul(reshape_q, position_bias.transpose(-2, -1)) + #print ("B: ", B.size()) ## [241, 492, 241] + #B = B.transpose(0, 1).view(bsz, self.num_heads, position_bias.size(0), position_bias.size(1)) + B = B.transpose(0, 1).view(bsz*self.num_heads, position_bias.size(0), position_bias.size(1)) + #print ("B 2: ", B.size()) + attn_weights += B + + attn_weights *= self.scaling_for_att + assert list(attn_weights.size()) == [bsz * self.num_heads, tgt_len, src_len] + + if attn_mask is not None: + attn_mask = attn_mask.unsqueeze(0) + if self.onnx_trace: + attn_mask = attn_mask.repeat(attn_weights.size(0), 1, 1) + attn_weights += attn_mask + + if key_padding_mask is not None: + # don't attend to padding symbols + attn_weights = attn_weights.view(bsz, self.num_heads, tgt_len, src_len) + if not is_tpu: + attn_weights = attn_weights.masked_fill( + key_padding_mask.unsqueeze(1).unsqueeze(2).to(torch.bool), + float("-inf"), + ) + else: + attn_weights = attn_weights.transpose(0, 2) + attn_weights = attn_weights.masked_fill(key_padding_mask, float("-inf")) + attn_weights = attn_weights.transpose(0, 2) + attn_weights = attn_weights.view(bsz * self.num_heads, tgt_len, src_len) + + if 
self.scaling_for_att > 1.0: + attn_weights = attn_weights - attn_weights.detach().max(dim=-1, keepdim=True)[0] + + if before_softmax: + return attn_weights, v + + attn_weights_float = utils.softmax( + attn_weights, dim=-1, onnx_trace=self.onnx_trace + ) + attn_weights = attn_weights_float.type_as(attn_weights) + attn_probs = self.dropout_module(attn_weights) + + assert v is not None + attn = torch.bmm(attn_probs, v) + assert list(attn.size()) == [bsz * self.num_heads, tgt_len, self.head_dim] + if self.onnx_trace and attn.size(1) == 1: + # when ONNX tracing a single decoder step (sequence length == 1) + # the transpose is a no-op copy before view, thus unnecessary + attn = attn.contiguous().view(tgt_len, bsz, embed_dim) + else: + attn = attn.transpose(0, 1).contiguous().view(tgt_len, bsz, embed_dim) + attn = self.out_proj(attn) + attn_weights: Optional[Tensor] = None + if need_weights: + attn_weights = attn_weights_float.view( + bsz, self.num_heads, tgt_len, src_len + ).transpose(1, 0) + if not need_head_weights: + # average attention weights over heads + attn_weights = attn_weights.mean(dim=0) + + return attn, attn_weights diff --git a/Speech2S/speech2s/modules/relative_pos_enc.py b/Speech2S/speech2s/modules/relative_pos_enc.py new file mode 100644 index 0000000000000000000000000000000000000000..7021fc0941fef310ca5571c101b8a8e18ffc1db6 --- /dev/null +++ b/Speech2S/speech2s/modules/relative_pos_enc.py @@ -0,0 +1,33 @@ +# -------------------------------------------------------- +# Copyright (c) 2022 Microsoft +# Licensed under The MIT License [see LICENSE for details] +# Based on fairseq code bases +# https://github.com/facebookresearch/fairseq +# -------------------------------------------------------- + +import torch + +class RelativePositionalEncoding(torch.nn.Module): + def __init__(self, d_model, maxlen=1000, embed_v=False): + super(RelativePositionalEncoding, self).__init__() + + self.d_model = d_model + self.maxlen = maxlen + self.pe_k = torch.nn.Embedding(2*maxlen, d_model) + if embed_v: + self.pe_v = torch.nn.Embedding(2*maxlen, d_model) + self.embed_v = embed_v + + + def forward(self, pos_seq, incremental_state=None): + pos_seq[pos_seq < -self.maxlen] = -self.maxlen + pos_seq[pos_seq >= self.maxlen] = self.maxlen - 1 + pos_seq = pos_seq + self.maxlen + + if incremental_state is not None: + pos_seq = pos_seq[-1:] + + if self.embed_v: + return self.pe_k(pos_seq), self.pe_v(pos_seq) + else: + return self.pe_k(pos_seq), None diff --git a/Speech2S/speech2s/modules/transformer_decoder.py b/Speech2S/speech2s/modules/transformer_decoder.py new file mode 100644 index 0000000000000000000000000000000000000000..84417b44b2672e49cf92bad8355d2dae48661b55 --- /dev/null +++ b/Speech2S/speech2s/modules/transformer_decoder.py @@ -0,0 +1,543 @@ +# -------------------------------------------------------- +# Copyright (c) 2022 Microsoft +# Licensed under The MIT License [see LICENSE for details] +# Based on fairseq code bases +# https://github.com/facebookresearch/fairseq +# -------------------------------------------------------- + +""" + Modified from https://github.com/facebookresearch/fairseq/blob/main/fairseq/models/transformer/transformer_decoder.py +""" + +import math +from typing import Any, Dict, List, Optional + +import torch +import torch.nn as nn +from fairseq import utils +from fairseq.distributed import fsdp_wrap +from fairseq.models import FairseqIncrementalDecoder +from fairseq.models.transformer import TransformerConfig +from fairseq.modules import ( + AdaptiveSoftmax, + BaseLayer, + 
FairseqDropout, + LayerDropModuleList, + LayerNorm, + PositionalEmbedding, + SinusoidalPositionalEmbedding, +) +from fairseq.modules.checkpoint_activations import checkpoint_wrapper +from fairseq.modules.quant_noise import quant_noise as apply_quant_noise_ +from torch import Tensor + +from speechut.modules import transformer_layer +from speechut.modules import RelativePositionalEncoding + +# rewrite name for backward compatibility in `make_generation_fast_` +def module_name_fordropout(module_name: str) -> str: + if module_name == "TransformerDecoderBase": + return "TransformerDecoder" + else: + return module_name + + +class TransformerDecoderBase(FairseqIncrementalDecoder): + """ + Transformer decoder consisting of *cfg.decoder.layers* layers. Each layer + is a :class:`TransformerDecoderLayer`. + + Args: + args (argparse.Namespace): parsed command-line arguments + dictionary (~fairseq.data.Dictionary): decoding dictionary + embed_tokens (torch.nn.Embedding): output embedding + no_encoder_attn (bool, optional): whether to attend to encoder outputs + (default: False). + """ + + def __init__( + self, + cfg, + dictionary, + embed_tokens, + no_encoder_attn=False, + output_projection=None, + use_rel_pos_enc=False, + ): + self.cfg = cfg + super().__init__(dictionary) + self.register_buffer("version", torch.Tensor([3])) + self._future_mask = torch.empty(0) + + self.dropout_module = FairseqDropout( + cfg.dropout, module_name=module_name_fordropout(self.__class__.__name__) + ) + self.decoder_layerdrop = cfg.decoder.layerdrop + self.share_input_output_embed = cfg.share_decoder_input_output_embed + + input_embed_dim = embed_tokens.embedding_dim + embed_dim = cfg.decoder.embed_dim + self.embed_dim = embed_dim + self.output_embed_dim = cfg.decoder.output_dim + + self.padding_idx = embed_tokens.padding_idx + self.max_target_positions = cfg.max_target_positions + + self.embed_tokens = embed_tokens + + self.embed_scale = 1.0 if cfg.no_scale_embedding else math.sqrt(embed_dim) + + if not cfg.adaptive_input and cfg.quant_noise.pq > 0: + self.quant_noise = apply_quant_noise_( + nn.Linear(embed_dim, embed_dim, bias=False), + cfg.quant_noise.pq, + cfg.quant_noise.pq_block_size, + ) + else: + self.quant_noise = None + + self.project_in_dim = ( + Linear(input_embed_dim, embed_dim, bias=False) + if embed_dim != input_embed_dim + else None + ) + self.embed_positions = ( + PositionalEmbedding( + self.max_target_positions, + embed_dim, + self.padding_idx, + learned=cfg.decoder.learned_pos, + ) + if not cfg.no_token_positional_embeddings + else None + ) + if cfg.layernorm_embedding: + self.layernorm_embedding = LayerNorm(embed_dim, export=cfg.export) + else: + self.layernorm_embedding = None + + self.cross_self_attention = cfg.cross_self_attention + + if self.decoder_layerdrop > 0.0: + self.layers = LayerDropModuleList(p=self.decoder_layerdrop) + else: + self.layers = nn.ModuleList([]) + self.use_rel_pos_enc = use_rel_pos_enc + self.layers.extend( + [ + self.build_decoder_layer(cfg, no_encoder_attn) + for _ in range(cfg.decoder.layers) + ] + ) + self.num_layers = len(self.layers) + + if cfg.decoder.normalize_before and not cfg.no_decoder_final_norm: + self.layer_norm = LayerNorm(embed_dim, export=cfg.export) + else: + self.layer_norm = None + + self.project_out_dim = ( + Linear(embed_dim, self.output_embed_dim, bias=False) + if embed_dim != self.output_embed_dim and not cfg.tie_adaptive_weights + else None + ) + + self.adaptive_softmax = None + self.output_projection = output_projection + if self.output_projection is 
None: + self.build_output_projection(cfg, dictionary, embed_tokens) + if self.use_rel_pos_enc: + self.pos_emb = RelativePositionalEncoding(embed_dim // cfg.decoder.attention_heads, 24) + + def build_output_projection(self, cfg, dictionary, embed_tokens): + if cfg.adaptive_softmax_cutoff is not None: + self.adaptive_softmax = AdaptiveSoftmax( + len(dictionary), + self.output_embed_dim, + utils.eval_str_list(cfg.adaptive_softmax_cutoff, type=int), + dropout=cfg.adaptive_softmax_dropout, + adaptive_inputs=embed_tokens if cfg.tie_adaptive_weights else None, + factor=cfg.adaptive_softmax_factor, + tie_proj=cfg.tie_adaptive_proj, + ) + elif self.share_input_output_embed: + self.output_projection = nn.Linear( + self.embed_tokens.weight.shape[1], + self.embed_tokens.weight.shape[0], + bias=False, + ) + self.output_projection.weight = self.embed_tokens.weight + else: + self.output_projection = nn.Linear( + self.output_embed_dim, len(dictionary), bias=False + ) + nn.init.normal_( + self.output_projection.weight, mean=0, std=self.output_embed_dim ** -0.5 + ) + num_base_layers = cfg.base_layers + for i in range(num_base_layers): + self.layers.insert( + ((i + 1) * cfg.decoder.layers) // (num_base_layers + 1), + BaseLayer(cfg), + ) + + def build_decoder_layer(self, cfg, no_encoder_attn=False): + layer = transformer_layer.TransformerDecoderLayerBase(cfg, no_encoder_attn, has_relative_attention_bias=self.use_rel_pos_enc) + checkpoint = cfg.checkpoint_activations + if checkpoint: + offload_to_cpu = cfg.offload_activations + layer = checkpoint_wrapper(layer, offload_to_cpu=offload_to_cpu) + # if we are checkpointing, enforce that FSDP always wraps the + # checkpointed layer, regardless of layer size + min_params_to_wrap = cfg.min_params_to_wrap if not checkpoint else 0 + layer = fsdp_wrap(layer, min_num_params=min_params_to_wrap) + return layer + + def forward( + self, + prev_output_tokens, + encoder_out: Optional[Dict[str, List[Tensor]]] = None, + incremental_state: Optional[Dict[str, Dict[str, Optional[Tensor]]]] = None, + features_only: bool = False, + full_context_alignment: bool = False, + alignment_layer: Optional[int] = None, + alignment_heads: Optional[int] = None, + src_lengths: Optional[Any] = None, + return_all_hiddens: bool = False, + ): + """ + Args: + prev_output_tokens (LongTensor): previous decoder outputs of shape + `(batch, tgt_len)`, for teacher forcing + encoder_out (optional): output from the encoder, used for + encoder-side attention, should be of size T x B x C + incremental_state (dict): dictionary used for storing state during + :ref:`Incremental decoding` + features_only (bool, optional): only return features without + applying output layer (default: False). + full_context_alignment (bool, optional): don't apply + auto-regressive mask to self-attention (default: False). 
+ + Returns: + tuple: + - the decoder's output of shape `(batch, tgt_len, vocab)` + - a dictionary with any model-specific outputs + """ + + x, extra = self.extract_features( + prev_output_tokens, + encoder_out=encoder_out, + incremental_state=incremental_state, + full_context_alignment=full_context_alignment, + alignment_layer=alignment_layer, + alignment_heads=alignment_heads, + ) + + if not features_only: + x = self.output_layer(x) + return x, extra + + def extract_features( + self, + prev_output_tokens, + encoder_out: Optional[Dict[str, List[Tensor]]], + incremental_state: Optional[Dict[str, Dict[str, Optional[Tensor]]]] = None, + full_context_alignment: bool = False, + alignment_layer: Optional[int] = None, + alignment_heads: Optional[int] = None, + ): + return self.extract_features_scriptable( + prev_output_tokens, + encoder_out, + incremental_state, + full_context_alignment, + alignment_layer, + alignment_heads, + ) + + """ + A scriptable subclass of this class has an extract_features method and calls + super().extract_features, but super() is not supported in torchscript. A copy of + this function is made to be used in the subclass instead. + """ + + def extract_features_scriptable( + self, + prev_output_tokens, + encoder_out: Optional[Dict[str, List[Tensor]]], + incremental_state: Optional[Dict[str, Dict[str, Optional[Tensor]]]] = None, + full_context_alignment: bool = False, + alignment_layer: Optional[int] = None, + alignment_heads: Optional[int] = None, + ): + """ + Similar to *forward* but only return features. + + Includes several features from "Jointly Learning to Align and + Translate with Transformer Models" (Garg et al., EMNLP 2019). + + Args: + full_context_alignment (bool, optional): don't apply + auto-regressive mask to self-attention (default: False). + alignment_layer (int, optional): return mean alignment over + heads at this layer (default: last layer). + alignment_heads (int, optional): only average alignment over + this many heads (default: all heads). 
+ + Returns: + tuple: + - the decoder's features of shape `(batch, tgt_len, embed_dim)` + - a dictionary with any model-specific outputs + """ + bs, slen = prev_output_tokens.size() + if alignment_layer is None: + alignment_layer = self.num_layers - 1 + + enc: Optional[Tensor] = None + padding_mask: Optional[Tensor] = None + if encoder_out is not None and len(encoder_out["encoder_out"]) > 0: + enc = encoder_out["encoder_out"][0] + assert ( + enc.size()[1] == bs + ), f"Expected enc.shape == (t, {bs}, c) got {enc.shape}" + if encoder_out is not None and len(encoder_out["encoder_padding_mask"]) > 0: + padding_mask = encoder_out["encoder_padding_mask"][0] + + # embed positions + positions = None + if self.embed_positions is not None: + positions = self.embed_positions( + prev_output_tokens, incremental_state=incremental_state + ) + + if incremental_state is not None: + prev_output_tokens = prev_output_tokens[:, -1:] + if positions is not None: + positions = positions[:, -1:] + + # embed tokens and positions + x = self.embed_scale * self.embed_tokens(prev_output_tokens) + + if self.quant_noise is not None: + x = self.quant_noise(x) + + if self.project_in_dim is not None: + x = self.project_in_dim(x) + + if positions is not None: + x += positions + + if self.layernorm_embedding is not None: + x = self.layernorm_embedding(x) + + x = self.dropout_module(x) + + # B x T x C -> T x B x C + x = x.transpose(0, 1) + if self.use_rel_pos_enc: + pos_seq = torch.arange(0, slen).long().to(x.device) + pos_seq = pos_seq[:, None] - pos_seq[None, :] + pos_k, _ = self.pos_emb(pos_seq, incremental_state) + else: + pos_k = None + + self_attn_padding_mask: Optional[Tensor] = None + if self.cross_self_attention or prev_output_tokens.eq(self.padding_idx).any(): + self_attn_padding_mask = prev_output_tokens.eq(self.padding_idx) + + # decoder layers + attn: Optional[Tensor] = None + inner_states: List[Optional[Tensor]] = [x] + for idx, layer in enumerate(self.layers): + if incremental_state is None and not full_context_alignment: + self_attn_mask = self.buffered_future_mask(x) + else: + self_attn_mask = None + + x, layer_attn, _ = layer( + x, + enc, + padding_mask, + incremental_state, + self_attn_mask=self_attn_mask, + self_attn_padding_mask=self_attn_padding_mask, + need_attn=bool((idx == alignment_layer)), + need_head_weights=bool((idx == alignment_layer)), + pos_bias=pos_k, + ) + inner_states.append(x) + if layer_attn is not None and idx == alignment_layer: + attn = layer_attn.float().to(x) + + if attn is not None: + if alignment_heads is not None: + attn = attn[:alignment_heads] + + # average probabilities over heads + attn = attn.mean(dim=0) + + if self.layer_norm is not None: + x = self.layer_norm(x) + + # T x B x C -> B x T x C + x = x.transpose(0, 1) + + if self.project_out_dim is not None: + x = self.project_out_dim(x) + + return x, {"attn": [attn], "inner_states": inner_states} + + def output_layer(self, features): + """Project features to the vocabulary size.""" + if self.adaptive_softmax is None: + # project back to size of vocabulary + return self.output_projection(features) + else: + return features + + def max_positions(self): + """Maximum output length supported by the decoder.""" + if self.embed_positions is None: + return self.max_target_positions + return min(self.max_target_positions, self.embed_positions.max_positions) + + def buffered_future_mask(self, tensor): + dim = tensor.size(0) + # self._future_mask.device != tensor.device is not working in TorchScript. This is a workaround. 
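+        # The cached mask is an upper-triangular matrix with -inf strictly above
+        # the main diagonal, e.g. for dim=3:
+        #   [[0., -inf, -inf],
+        #    [0.,   0., -inf],
+        #    [0.,   0.,   0.]]
+        # so each target position can attend only to itself and earlier positions.
+        # It is rebuilt below whenever it is missing, on the wrong device, or too small.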
+ if ( + self._future_mask.size(0) == 0 + or (not self._future_mask.device == tensor.device) + or self._future_mask.size(0) < dim + ): + self._future_mask = torch.triu( + utils.fill_with_neg_inf(torch.zeros([dim, dim])), 1 + ) + self._future_mask = self._future_mask.to(tensor) + return self._future_mask[:dim, :dim] + + def upgrade_state_dict_named(self, state_dict, name): + """Upgrade a (possibly old) state dict for new versions of fairseq.""" + if isinstance(self.embed_positions, SinusoidalPositionalEmbedding): + weights_key = "{}.embed_positions.weights".format(name) + if weights_key in state_dict: + del state_dict[weights_key] + state_dict[ + "{}.embed_positions._float_tensor".format(name) + ] = torch.FloatTensor(1) + + if f"{name}.output_projection.weight" not in state_dict: + if self.share_input_output_embed: + embed_out_key = f"{name}.embed_tokens.weight" + else: + embed_out_key = f"{name}.embed_out" + if embed_out_key in state_dict: + state_dict[f"{name}.output_projection.weight"] = state_dict[ + embed_out_key + ] + if not self.share_input_output_embed: + del state_dict[embed_out_key] + + for i in range(self.num_layers): + # update layer norms + layer_norm_map = { + "0": "self_attn_layer_norm", + "1": "encoder_attn_layer_norm", + "2": "final_layer_norm", + } + for old, new in layer_norm_map.items(): + for m in ("weight", "bias"): + k = "{}.layers.{}.layer_norms.{}.{}".format(name, i, old, m) + if k in state_dict: + state_dict[ + "{}.layers.{}.{}.{}".format(name, i, new, m) + ] = state_dict[k] + del state_dict[k] + + version_key = "{}.version".format(name) + if utils.item(state_dict.get(version_key, torch.Tensor([1]))[0]) <= 2: + # earlier checkpoints did not normalize after the stack of layers + self.layer_norm = None + self.normalize = False + state_dict[version_key] = torch.Tensor([1]) + + return state_dict + + +def Linear(in_features, out_features, bias=True): + m = nn.Linear(in_features, out_features, bias) + nn.init.xavier_uniform_(m.weight) + if bias: + nn.init.constant_(m.bias, 0.0) + return m + +class TransformerDecoderBaseScriptable(TransformerDecoderBase): + def extract_features( + self, + prev_output_tokens, + encoder_out: Optional[Dict[str, List[Tensor]]] = None, + incremental_state: Optional[Dict[str, Dict[str, Optional[Tensor]]]] = None, + full_context_alignment: bool = False, + alignment_layer: Optional[int] = None, + alignment_heads: Optional[int] = None, + ): + # call scriptable method from parent class + x, _ = self.extract_features_scriptable( + prev_output_tokens, + encoder_out, + incremental_state, + full_context_alignment, + alignment_layer, + alignment_heads, + ) + return x, None + + +class TransformerDecoder(TransformerDecoderBase): + def __init__( + self, + args, + dictionary, + embed_tokens, + no_encoder_attn=False, + output_projection=None, + ): + self.args = args + super().__init__( + TransformerConfig.from_namespace(args), + dictionary, + embed_tokens, + no_encoder_attn=no_encoder_attn, + output_projection=output_projection, + use_rel_pos_enc=getattr(args, "use_rel_pos_enc", False), + ) + + def build_output_projection(self, args, dictionary, embed_tokens): + super().build_output_projection( + TransformerConfig.from_namespace(args), dictionary, embed_tokens + ) + + def build_decoder_layer(self, args, no_encoder_attn=False): + return super().build_decoder_layer( + TransformerConfig.from_namespace(args), no_encoder_attn=no_encoder_attn + ) + +class TransformerDecoderScriptable(TransformerDecoder): + def extract_features( + self, + prev_output_tokens, + 
encoder_out: Optional[Dict[str, List[Tensor]]] = None, + incremental_state: Optional[Dict[str, Dict[str, Optional[Tensor]]]] = None, + full_context_alignment: bool = False, + alignment_layer: Optional[int] = None, + alignment_heads: Optional[int] = None, + ): + # call scriptable method from parent class + x, _ = self.extract_features_scriptable( + prev_output_tokens, + encoder_out, + incremental_state, + full_context_alignment, + alignment_layer, + alignment_heads, + ) + return x, None diff --git a/Speech2S/speech2s/modules/transformer_encoder.py b/Speech2S/speech2s/modules/transformer_encoder.py new file mode 100644 index 0000000000000000000000000000000000000000..f94e1fed8a005ec59d1e422157e08d88ff95bfda --- /dev/null +++ b/Speech2S/speech2s/modules/transformer_encoder.py @@ -0,0 +1,401 @@ +# -------------------------------------------------------- +# Copyright (c) 2022 Microsoft +# Licensed under The MIT License [see LICENSE for details] +# Based on fairseq code bases +# https://github.com/facebookresearch/fairseq +# -------------------------------------------------------- + +import math +from typing import Dict, List, Optional + +import torch +import torch.nn as nn +import torch.nn.functional as F +from fairseq import utils +from fairseq.distributed import fsdp_wrap +from fairseq.models import FairseqEncoder +from fairseq.modules import ( + FairseqDropout, + LayerDropModuleList, + LayerNorm, + SinusoidalPositionalEmbedding, +) +from fairseq.modules.checkpoint_activations import checkpoint_wrapper +from fairseq.modules.quant_noise import quant_noise as apply_quant_noise_ +from torch import Tensor +from fairseq.models.transformer import ( + TransformerConfig, +) + + +from speechut.modules import transformer_layer, LearnedPositionalEmbedding +from speechut.modules import RelativePositionalEncoding + +# rewrite name for backward compatibility in `make_generation_fast_` +def module_name_fordropout(module_name: str) -> str: + if module_name == "TransformerEncoderBase": + return "TransformerEncoder" + else: + return module_name + + +class TransformerEncoderBase(FairseqEncoder): + """ + Transformer encoder consisting of *cfg.encoder.layers* layers. Each layer + is a :class:`TransformerEncoderLayer`. 
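+    Compared to the stock fairseq encoder, this variant can additionally apply a
+    shared relative positional encoding (*use_rel_pos_enc*) and an attention
+    scaling factor (*scaling_for_att*), both of which are handed to every layer.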
+ + Args: + args (argparse.Namespace): parsed command-line arguments + dictionary (~fairseq.data.Dictionary): encoding dictionary + embed_tokens (torch.nn.Embedding): input embedding + """ + + def __init__(self, cfg, dictionary, embed_tokens, use_rel_pos_enc=False, scaling_for_att=1.0): + self.cfg = cfg + super().__init__(dictionary) + self.register_buffer("version", torch.Tensor([3])) + + self.dropout_module = FairseqDropout( + cfg.dropout, module_name=module_name_fordropout(self.__class__.__name__) + ) + self.encoder_layerdrop = cfg.encoder.layerdrop + + embed_dim = embed_tokens.embedding_dim + self.padding_idx = embed_tokens.padding_idx + self.max_source_positions = cfg.max_source_positions + + self.embed_tokens = embed_tokens + + self.embed_scale = 1.0 if cfg.no_scale_embedding else math.sqrt(embed_dim) + + self.embed_positions = ( + PositionalEmbedding( + cfg.max_source_positions, + embed_dim, + self.padding_idx, + learned=cfg.encoder.learned_pos, + ) + if not cfg.no_token_positional_embeddings + else None + ) + if cfg.layernorm_embedding: + self.layernorm_embedding = LayerNorm(embed_dim, export=cfg.export) + else: + self.layernorm_embedding = None + + if not cfg.adaptive_input and cfg.quant_noise.pq > 0: + self.quant_noise = apply_quant_noise_( + nn.Linear(embed_dim, embed_dim, bias=False), + cfg.quant_noise.pq, + cfg.quant_noise.pq_block_size, + ) + else: + self.quant_noise = None + + if self.encoder_layerdrop > 0.0: + self.layers = LayerDropModuleList(p=self.encoder_layerdrop) + else: + self.layers = nn.ModuleList([]) + self.use_rel_pos_enc = use_rel_pos_enc + self.scaling_for_att = scaling_for_att + self.layers.extend( + [self.build_encoder_layer(cfg) for i in range(cfg.encoder.layers)] + ) + self.num_layers = len(self.layers) + + if cfg.encoder.normalize_before: + self.layer_norm = LayerNorm(embed_dim, export=cfg.export) + else: + self.layer_norm = None + if self.use_rel_pos_enc: + self.pos_emb = RelativePositionalEncoding(embed_dim // cfg.encoder.attention_heads, 160) + + def build_encoder_layer(self, cfg): + layer = transformer_layer.TransformerEncoderLayerBase(cfg, has_relative_attention_bias=self.use_rel_pos_enc, scaling_for_att=self.scaling_for_att) + checkpoint = cfg.checkpoint_activations + if checkpoint: + offload_to_cpu = cfg.offload_activations + layer = checkpoint_wrapper(layer, offload_to_cpu=offload_to_cpu) + # if we are checkpointing, enforce that FSDP always wraps the + # checkpointed layer, regardless of layer size + min_params_to_wrap = cfg.min_params_to_wrap if not checkpoint else 0 + layer = fsdp_wrap(layer, min_num_params=min_params_to_wrap) + return layer + + def forward_embedding( + self, src_tokens, token_embedding: Optional[torch.Tensor] = None + ): + # embed tokens and positions + if token_embedding is None: + token_embedding = self.embed_tokens(src_tokens) + x = embed = self.embed_scale * token_embedding + if self.embed_positions is not None: + x = embed + self.embed_positions(src_tokens) + if self.layernorm_embedding is not None: + x = self.layernorm_embedding(x) + x = self.dropout_module(x) + if self.quant_noise is not None: + x = self.quant_noise(x) + return x, embed + + def forward( + self, + src_tokens, + src_lengths: Optional[torch.Tensor] = None, + return_all_hiddens: bool = False, + token_embeddings: Optional[torch.Tensor] = None, + uniformity_layers: Optional[List[int]] = None, + ): + """ + Args: + src_tokens (LongTensor): tokens in the source language of shape + `(batch, src_len)` + src_lengths (torch.LongTensor): lengths of each source 
sentence of + shape `(batch)` + return_all_hiddens (bool, optional): also return all of the + intermediate hidden states (default: False). + token_embeddings (torch.Tensor, optional): precomputed embeddings + default `None` will recompute embeddings + + Returns: + dict: + - **encoder_out** (Tensor): the last encoder layer's output of + shape `(src_len, batch, embed_dim)` + - **encoder_padding_mask** (ByteTensor): the positions of + padding elements of shape `(batch, src_len)` + - **encoder_embedding** (Tensor): the (scaled) embedding lookup + of shape `(batch, src_len, embed_dim)` + - **encoder_states** (List[Tensor]): all intermediate + hidden states of shape `(src_len, batch, embed_dim)`. + Only populated if *return_all_hiddens* is True. + """ + return self.forward_scriptable( + src_tokens, src_lengths, return_all_hiddens, token_embeddings, uniformity_layers + ) + + # TorchScript doesn't support super() method so that the scriptable Subclass + # can't access the base class model in Torchscript. + # Current workaround is to add a helper function with different name and + # call the helper function from scriptable Subclass. + def forward_scriptable( + self, + src_tokens, + src_lengths: Optional[torch.Tensor] = None, + return_all_hiddens: bool = False, + token_embeddings: Optional[torch.Tensor] = None, + uniformity_layers: Optional[List[int]] = None, + ): + """ + Args: + src_tokens (LongTensor): tokens in the source language of shape + `(batch, src_len)` + src_lengths (torch.LongTensor): lengths of each source sentence of + shape `(batch)` + return_all_hiddens (bool, optional): also return all of the + intermediate hidden states (default: False). + token_embeddings (torch.Tensor, optional): precomputed embeddings + default `None` will recompute embeddings + + Returns: + dict: + - **encoder_out** (Tensor): the last encoder layer's output of + shape `(src_len, batch, embed_dim)` + - **encoder_padding_mask** (ByteTensor): the positions of + padding elements of shape `(batch, src_len)` + - **encoder_embedding** (Tensor): the (scaled) embedding lookup + of shape `(batch, src_len, embed_dim)` + - **encoder_states** (List[Tensor]): all intermediate + hidden states of shape `(src_len, batch, embed_dim)`. + Only populated if *return_all_hiddens* is True. 
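+                - **uniformity_hiddens** (List[Tensor]): hidden states that were
+                  L2-normalized at the layers listed in *uniformity_layers*, each
+                  of shape `(src_len, batch, embed_dim)`. Empty unless
+                  *uniformity_layers* is given.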
+ """ + # compute padding mask + encoder_padding_mask = src_tokens.eq(self.padding_idx) + has_pads = src_tokens.device.type == "xla" or encoder_padding_mask.any() + + x, encoder_embedding = self.forward_embedding(src_tokens, token_embeddings) + + # account for padding while computing the representation + if has_pads: + x = x * (1 - encoder_padding_mask.unsqueeze(-1).type_as(x)) + + # B x T x C -> T x B x C + x = x.transpose(0, 1) + if self.use_rel_pos_enc: + x_len = x.shape[0] + pos_seq = torch.arange(0, x_len).long().to(x.device) + pos_seq = pos_seq[:, None] - pos_seq[None, :] + pos_k, pos_v = self.pos_emb(pos_seq) + else: + pos_k = None + + encoder_states = [] + uniformity_hiddens = [] + + if return_all_hiddens: + encoder_states.append(x) + + if uniformity_layers is not None and 0 in uniformity_layers: + x = F.normalize(x.float(), dim=-1).type_as(x) + uniformity_hiddens.append(x) + + # encoder layers + for i, layer in enumerate(self.layers): + x = layer( + x, encoder_padding_mask=encoder_padding_mask if has_pads else None, + pos_bias=pos_k, + ) + if uniformity_layers is not None and i+1 in uniformity_layers: + x = F.normalize(x.float(), dim=-1).type_as(x) + uniformity_hiddens.append(x) + if return_all_hiddens: + assert encoder_states is not None + encoder_states.append(x) + + if self.layer_norm is not None: + x = self.layer_norm(x) + + # The Pytorch Mobile lite interpreter does not supports returning NamedTuple in + # `forward` so we use a dictionary instead. + # TorchScript does not support mixed values so the values are all lists. + # The empty list is equivalent to None. + src_lengths = ( + src_tokens.ne(self.padding_idx) + .sum(dim=1, dtype=torch.int32) + .reshape(-1, 1) + .contiguous() + ) + return { + "encoder_out": [x], # T x B x C + "encoder_padding_mask": [encoder_padding_mask], # B x T + "encoder_embedding": [encoder_embedding], # B x T x C + "encoder_states": encoder_states, # List[T x B x C] + "uniformity_hiddens": uniformity_hiddens, # List[T x B x C] + "src_tokens": [], + "src_lengths": [src_lengths], + } + + @torch.jit.export + def reorder_encoder_out(self, encoder_out: Dict[str, List[Tensor]], new_order): + """ + Reorder encoder output according to *new_order*. 
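+        This is typically called during incremental generation (e.g. beam search)
+        so that the cached encoder states stay aligned with the reordered batch.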
+ + Args: + encoder_out: output from the ``forward()`` method + new_order (LongTensor): desired order + + Returns: + *encoder_out* rearranged according to *new_order* + """ + if len(encoder_out["encoder_out"]) == 0: + new_encoder_out = [] + else: + new_encoder_out = [encoder_out["encoder_out"][0].index_select(1, new_order)] + if len(encoder_out["encoder_padding_mask"]) == 0: + new_encoder_padding_mask = [] + else: + new_encoder_padding_mask = [ + encoder_out["encoder_padding_mask"][0].index_select(0, new_order) + ] + if len(encoder_out["encoder_embedding"]) == 0: + new_encoder_embedding = [] + else: + new_encoder_embedding = [ + encoder_out["encoder_embedding"][0].index_select(0, new_order) + ] + + if len(encoder_out["src_tokens"]) == 0: + src_tokens = [] + else: + src_tokens = [(encoder_out["src_tokens"][0]).index_select(0, new_order)] + + if len(encoder_out["src_lengths"]) == 0: + src_lengths = [] + else: + src_lengths = [(encoder_out["src_lengths"][0]).index_select(0, new_order)] + + encoder_states = encoder_out["encoder_states"] + if len(encoder_states) > 0: + for idx, state in enumerate(encoder_states): + encoder_states[idx] = state.index_select(1, new_order) + + return { + "encoder_out": new_encoder_out, # T x B x C + "encoder_padding_mask": new_encoder_padding_mask, # B x T + "encoder_embedding": new_encoder_embedding, # B x T x C + "encoder_states": encoder_states, # List[T x B x C] + "src_tokens": src_tokens, # B x T + "src_lengths": src_lengths, # B x 1 + } + + def max_positions(self): + """Maximum input length supported by the encoder.""" + if self.embed_positions is None: + return self.max_source_positions + return min(self.max_source_positions, self.embed_positions.max_positions) + + def upgrade_state_dict_named(self, state_dict, name): + """Upgrade a (possibly old) state dict for new versions of fairseq.""" + if isinstance(self.embed_positions, SinusoidalPositionalEmbedding): + weights_key = "{}.embed_positions.weights".format(name) + if weights_key in state_dict: + print("deleting {0}".format(weights_key)) + del state_dict[weights_key] + state_dict[ + "{}.embed_positions._float_tensor".format(name) + ] = torch.FloatTensor(1) + for i in range(self.num_layers): + # update layer norms + self.layers[i].upgrade_state_dict_named( + state_dict, "{}.layers.{}".format(name, i) + ) + + version_key = "{}.version".format(name) + if utils.item(state_dict.get(version_key, torch.Tensor([1]))[0]) < 2: + # earlier checkpoints did not normalize after the stack of layers + self.layer_norm = None + self.normalize = False + state_dict[version_key] = torch.Tensor([1]) + return state_dict + + +class TransformerEncoder(TransformerEncoderBase): + def __init__(self, args, dictionary, embed_tokens): + self.args = args + super().__init__( + TransformerConfig.from_namespace(args), + dictionary, + embed_tokens, + use_rel_pos_enc=getattr(args, "use_rel_pos_enc", False), + scaling_for_att=getattr(args, "scaling_for_att", 1.0), + ) + + def build_encoder_layer(self, args): + return super().build_encoder_layer( + TransformerConfig.from_namespace(args), + ) + + +def PositionalEmbedding( + num_embeddings: int, + embedding_dim: int, + padding_idx: int, + learned: bool = False, +): + if learned: + # if padding_idx is specified then offset the embedding ids by + # this index and adjust num_embeddings appropriately + # TODO: The right place for this offset would be inside + # LearnedPositionalEmbedding. Move this there for a cleaner implementation. 
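+        # e.g. with num_embeddings=1024 and padding_idx=1 the embedding table gets
+        # 1024 + 1 + 1 = 1026 rows, since real positions are offset by padding_idx + 1.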
+ if padding_idx is not None: + num_embeddings = num_embeddings + padding_idx + 1 + m = LearnedPositionalEmbedding(num_embeddings, embedding_dim, padding_idx) + nn.init.normal_(m.weight, mean=0, std=embedding_dim**-0.5) + if padding_idx is not None: + nn.init.constant_(m.weight[padding_idx], 0) + else: + m = SinusoidalPositionalEmbedding( + embedding_dim, + padding_idx, + init_size=num_embeddings + padding_idx + 1, + ) + return m diff --git a/Speech2S/speech2s/modules/transformer_layer.py b/Speech2S/speech2s/modules/transformer_layer.py new file mode 100644 index 0000000000000000000000000000000000000000..a71a848f1a5436756168aafd12d71637520b6b67 --- /dev/null +++ b/Speech2S/speech2s/modules/transformer_layer.py @@ -0,0 +1,330 @@ +# -------------------------------------------------------- +# Copyright (c) 2022 Microsoft +# Licensed under The MIT License [see LICENSE for details] +# Based on fairseq code bases +# https://github.com/facebookresearch/fairseq +# -------------------------------------------------------- + +""" + Modified from https://github.com/facebookresearch/fairseq/blob/main/fairseq/modules/transformer_layer.py + https://github.com/microsoft/SpeechT5/blob/main/Speech2C/speech2c/models/modules/transformer_decoder_layer.py +""" + +from typing import Dict, List, Optional + +import torch +from torch import Tensor +from fairseq.modules import LayerNorm +from fairseq.modules.transformer_layer import TransformerEncoderLayerBase as FairseqTransformerEncoderLayerBase +from fairseq.modules.transformer_layer import TransformerDecoderLayerBase as FairseqTransformerDecoderLayerBase + +from speechut.modules import MultiheadAttention + +class TransformerEncoderLayerBase(FairseqTransformerEncoderLayerBase): + """Encoder layer block. + + In the original paper each operation (multi-head attention or FFN) is + postprocessed with: `dropout -> add residual -> layernorm`. In the + tensor2tensor code they suggest that learning is more robust when + preprocessing each layer with layernorm and postprocessing with: + `dropout -> add residual`. We default to the approach in the paper, but the + tensor2tensor approach can be enabled by setting + *cfg.encoder.normalize_before* to ``True``. + + Args: + args (argparse.Namespace): parsed command-line arguments + """ + + def __init__(self, cfg, has_relative_attention_bias=False, scaling_for_att=1.0): + self.scaling_for_att = scaling_for_att + super().__init__(cfg) + if has_relative_attention_bias: + self.norm_k = LayerNorm(self.embed_dim // cfg.encoder.attention_heads) + + def build_self_attention(self, embed_dim, cfg, scaling_for_att=1.0): + return MultiheadAttention( + embed_dim, + cfg.encoder.attention_heads, + dropout=cfg.attention_dropout, + self_attention=True, + q_noise=self.quant_noise, + qn_block_size=self.quant_noise_block_size, + scaling_for_att=self.scaling_for_att, + ) + + def forward( + self, + x, + encoder_padding_mask: Optional[Tensor], + attn_mask: Optional[Tensor] = None, + pos_bias=None, + ): + """ + Args: + x (Tensor): input to the layer of shape `(seq_len, batch, embed_dim)` + encoder_padding_mask (ByteTensor): binary ByteTensor of shape + `(batch, seq_len)` where padding elements are indicated by ``1``. + attn_mask (ByteTensor): binary tensor of shape `(tgt_len, src_len)`, + where `tgt_len` is the length of output and `src_len` is the + length of input, though here both are equal to `seq_len`. + `attn_mask[tgt_i, src_j] = 1` means that when calculating the + embedding for `tgt_i`, we exclude (mask out) `src_j`. 
This is + useful for strided self-attention. + + Returns: + encoded output of shape `(seq_len, batch, embed_dim)` + """ + # anything in original attn_mask = 1, becomes -1e8 + # anything in original attn_mask = 0, becomes 0 + # Note that we cannot use -inf here, because at some edge cases, + # the attention weight (before softmax) for some padded element in query + # will become -inf, which results in NaN in model parameters + if attn_mask is not None: + attn_mask = attn_mask.masked_fill( + attn_mask.to(torch.bool), -1e8 if x.dtype == torch.float32 else -1e4 + ) + + residual = x + if self.normalize_before: + x = self.self_attn_layer_norm(x) + if pos_bias is not None: + pos_bias = self.norm_k(pos_bias) + x, _ = self.self_attn( + query=x, + key=x, + value=x, + key_padding_mask=encoder_padding_mask, + need_weights=False, + attn_mask=attn_mask, + position_bias=pos_bias, + ) + x = self.dropout_module(x) + x = self.residual_connection(x, residual) + if not self.normalize_before: + x = self.self_attn_layer_norm(x) + + residual = x + if self.normalize_before: + x = self.final_layer_norm(x) + x = self.activation_fn(self.fc1(x)) + x = self.activation_dropout_module(x) + x = self.fc2(x) + x = self.dropout_module(x) + x = self.residual_connection(x, residual) + if not self.normalize_before: + x = self.final_layer_norm(x) + return x + + + +class TransformerDecoderLayerBase(FairseqTransformerDecoderLayerBase): + """Decoder layer block. + + In the original paper each operation (multi-head attention, encoder + attention or FFN) is postprocessed with: `dropout -> add residual -> + layernorm`. In the tensor2tensor code they suggest that learning is more + robust when preprocessing each layer with layernorm and postprocessing with: + `dropout -> add residual`. We default to the approach in the paper, but the + tensor2tensor approach can be enabled by setting + *cfg.decoder.normalize_before* to ``True``. + + Args: + args (argparse.Namespace): parsed command-line arguments + no_encoder_attn (bool, optional): whether to attend to encoder outputs + (default: False). 
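+        has_relative_attention_bias (bool, optional): add a layer-normalized
+            relative position bias to self-attention (default: False).
+        scaling_for_att (float, optional): extra scaling factor forwarded to the
+            attention modules (default: 1.0).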
+ """ + + def __init__( + self, cfg, no_encoder_attn=False, add_bias_kv=False, add_zero_attn=False, has_relative_attention_bias=False, scaling_for_att=1.0, + ): + self.scaling_for_att = scaling_for_att + super().__init__(cfg, + no_encoder_attn, + add_bias_kv, + add_zero_attn, + ) + + if has_relative_attention_bias: + self.norm_k = LayerNorm(self.embed_dim // cfg.decoder.attention_heads) + + def build_self_attention( + self, embed_dim, cfg, add_bias_kv=False, add_zero_attn=False + ): + return MultiheadAttention( + embed_dim, + cfg.decoder.attention_heads, + dropout=cfg.attention_dropout, + add_bias_kv=add_bias_kv, + add_zero_attn=add_zero_attn, + self_attention=not cfg.cross_self_attention, + q_noise=self.quant_noise, + qn_block_size=self.quant_noise_block_size, + scaling_for_att=self.scaling_for_att, + ) + + def build_encoder_attention(self, embed_dim, cfg): + return MultiheadAttention( + embed_dim, + cfg.decoder.attention_heads, + kdim=cfg.encoder.embed_dim, + vdim=cfg.encoder.embed_dim, + dropout=cfg.attention_dropout, + encoder_decoder_attention=True, + q_noise=self.quant_noise, + qn_block_size=self.quant_noise_block_size, + scaling_for_att=self.scaling_for_att, + ) + + def forward( + self, + x, + encoder_out: Optional[torch.Tensor] = None, + encoder_padding_mask: Optional[torch.Tensor] = None, + incremental_state: Optional[Dict[str, Dict[str, Optional[Tensor]]]] = None, + prev_self_attn_state: Optional[List[torch.Tensor]] = None, + prev_attn_state: Optional[List[torch.Tensor]] = None, + self_attn_mask: Optional[torch.Tensor] = None, + self_attn_padding_mask: Optional[torch.Tensor] = None, + need_attn: bool = False, + need_head_weights: bool = False, + pos_bias=None, + ): + """ + Args: + x (Tensor): input to the layer of shape `(seq_len, batch, embed_dim)` + encoder_padding_mask (ByteTensor, optional): binary + ByteTensor of shape `(batch, src_len)` where padding + elements are indicated by ``1``. + need_attn (bool, optional): return attention weights + need_head_weights (bool, optional): return attention weights + for each head (default: return average over heads). 
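+            pos_bias (Tensor, optional): relative position embeddings that are
+                layer-normalized and passed to self-attention as *position_bias*.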
+ + Returns: + encoded output of shape `(seq_len, batch, embed_dim)` + """ + if need_head_weights: + need_attn = True + + residual = x + if self.normalize_before: + x = self.self_attn_layer_norm(x) + if pos_bias is not None: + pos_bias = self.norm_k(pos_bias) + if prev_self_attn_state is not None: + prev_key, prev_value = prev_self_attn_state[:2] + saved_state: Dict[str, Optional[Tensor]] = { + "prev_key": prev_key, + "prev_value": prev_value, + } + if len(prev_self_attn_state) >= 3: + saved_state["prev_key_padding_mask"] = prev_self_attn_state[2] + assert incremental_state is not None + self.self_attn._set_input_buffer(incremental_state, saved_state) + _self_attn_input_buffer = self.self_attn._get_input_buffer(incremental_state) + if self.cross_self_attention and not ( + incremental_state is not None + and _self_attn_input_buffer is not None + and "prev_key" in _self_attn_input_buffer + ): + if self_attn_mask is not None: + assert encoder_out is not None + self_attn_mask = torch.cat( + (x.new_zeros(x.size(0), encoder_out.size(0)), self_attn_mask), dim=1 + ) + if self_attn_padding_mask is not None: + if encoder_padding_mask is None: + assert encoder_out is not None + encoder_padding_mask = self_attn_padding_mask.new_zeros( + encoder_out.size(1), encoder_out.size(0) + ) + self_attn_padding_mask = torch.cat( + (encoder_padding_mask, self_attn_padding_mask), dim=1 + ) + assert encoder_out is not None + y = torch.cat((encoder_out, x), dim=0) + else: + y = x + + x, attn = self.self_attn( + query=x, + key=y, + value=y, + key_padding_mask=self_attn_padding_mask, + incremental_state=incremental_state, + need_weights=False, + attn_mask=self_attn_mask, + position_bias=pos_bias, + ) + if self.c_attn is not None: + tgt_len, bsz = x.size(0), x.size(1) + x = x.view(tgt_len, bsz, self.nh, self.head_dim) + x = torch.einsum("tbhd,h->tbhd", x, self.c_attn) + x = x.reshape(tgt_len, bsz, self.embed_dim) + if self.attn_ln is not None: + x = self.attn_ln(x) + x = self.dropout_module(x) + x = self.residual_connection(x, residual) + if not self.normalize_before: + x = self.self_attn_layer_norm(x) + + if self.encoder_attn is not None and encoder_out is not None: + residual = x + if self.normalize_before: + x = self.encoder_attn_layer_norm(x) + if prev_attn_state is not None: + prev_key, prev_value = prev_attn_state[:2] + saved_state: Dict[str, Optional[Tensor]] = { + "prev_key": prev_key, + "prev_value": prev_value, + } + if len(prev_attn_state) >= 3: + saved_state["prev_key_padding_mask"] = prev_attn_state[2] + assert incremental_state is not None + self.encoder_attn._set_input_buffer(incremental_state, saved_state) + + x, attn = self.encoder_attn( + query=x, + key=encoder_out, + value=encoder_out, + key_padding_mask=encoder_padding_mask, + incremental_state=incremental_state, + static_kv=True, + need_weights=need_attn or (not self.training and self.need_attn), + need_head_weights=need_head_weights, + ) + x = self.dropout_module(x) + x = self.residual_connection(x, residual) + if not self.normalize_before: + x = self.encoder_attn_layer_norm(x) + + residual = x + if self.normalize_before: + x = self.final_layer_norm(x) + + x = self.activation_fn(self.fc1(x)) + x = self.activation_dropout_module(x) + if self.ffn_layernorm is not None: + x = self.ffn_layernorm(x) + x = self.fc2(x) + x = self.dropout_module(x) + if self.w_resid is not None: + residual = torch.mul(self.w_resid, residual) + x = self.residual_connection(x, residual) + if not self.normalize_before: + x = self.final_layer_norm(x) + if self.onnx_trace and 
incremental_state is not None: + saved_state = self.self_attn._get_input_buffer(incremental_state) + assert saved_state is not None + if self_attn_padding_mask is not None: + self_attn_state = [ + saved_state["prev_key"], + saved_state["prev_value"], + saved_state["prev_key_padding_mask"], + ] + else: + self_attn_state = [saved_state["prev_key"], saved_state["prev_value"]] + return x, attn, self_attn_state + return x, attn, None + + def make_generation_fast_(self, need_attn: bool = False, **kwargs): + self.need_attn = need_attn diff --git a/Speech2S/speech2s/modules/w2v_encoder.py b/Speech2S/speech2s/modules/w2v_encoder.py new file mode 100644 index 0000000000000000000000000000000000000000..386f1eb0a4f4f67b552271e65c0b402d197e5bb2 --- /dev/null +++ b/Speech2S/speech2s/modules/w2v_encoder.py @@ -0,0 +1,281 @@ +# -------------------------------------------------------- +# Copyright (c) 2022 Microsoft +# Licensed under The MIT License [see LICENSE for details] +# Based on fairseq code bases +# https://github.com/facebookresearch/fairseq +# -------------------------------------------------------- + +""" + wav2vec encoder adding relitive position bias, modified from + https://github.com/microsoft/SpeechT5/blob/main/Speech2C/speech2c/models/modules/transformer_encoder.py + https://github.com/facebookresearch/fairseq/blob/main/fairseq/models/wav2vec/wav2vec2.py +""" + +import math +import numpy as np +import torch +import torch.nn as nn +import torch.nn.functional as F +from fairseq import utils +from fairseq.dataclass import ChoiceEnum +from fairseq.modules import ( + LayerNorm, + SamePad, +) +from fairseq.modules.checkpoint_activations import checkpoint_wrapper +from fairseq.modules.transformer_sentence_encoder import init_bert_params +from fairseq.utils import index_put +from fairseq.distributed import fsdp_wrap +from fairseq.models.wav2vec.utils import pad_to_multiple + +## reload multi-head attition with rel-pos-bias +from fairseq.models.wav2vec.wav2vec2 import TransformerEncoder as W2vTransformerEncoder +from speechut.modules import RelativePositionalEncoding +from speechut.modules import MultiheadAttention + +EXTRACTOR_MODE_CHOICES = ChoiceEnum(["default", "layer_norm"]) +MASKING_DISTRIBUTION_CHOICES = ChoiceEnum(["static", "uniform", "normal", "poisson"]) + + +class TransformerEncoder(W2vTransformerEncoder): + def __init__(self, args): + super().__init__(args) + + self.dropout = args.dropout + self.embedding_dim = args.encoder_embed_dim + self.required_seq_len_multiple = args.required_seq_len_multiple + self.use_rel_pos_enc = getattr(args, "use_rel_pos_enc", False) + + self.pos_conv = nn.Conv1d( + self.embedding_dim, + self.embedding_dim, + kernel_size=args.conv_pos, + padding=args.conv_pos // 2, + groups=args.conv_pos_groups, + ) + dropout = 0 + std = math.sqrt((4 * (1.0 - dropout)) / (args.conv_pos * self.embedding_dim)) + nn.init.normal_(self.pos_conv.weight, mean=0, std=std) + nn.init.constant_(self.pos_conv.bias, 0) + + self.pos_conv = nn.utils.weight_norm(self.pos_conv, name="weight", dim=2) + self.pos_conv = nn.Sequential(self.pos_conv, SamePad(args.conv_pos), nn.GELU()) + + layers = [] + for _ in range(args.encoder_layers): + layer = TransformerSentenceEncoderLayer( + embedding_dim=self.embedding_dim, + ffn_embedding_dim=args.encoder_ffn_embed_dim, + num_attention_heads=args.encoder_attention_heads, + dropout=self.dropout, + attention_dropout=args.attention_dropout, + activation_dropout=args.activation_dropout, + activation_fn=args.activation_fn, + 
layer_norm_first=args.layer_norm_first, + has_relative_attention_bias=self.use_rel_pos_enc, + ) + if args.checkpoint_activations: + layer = fsdp_wrap(layer) + layer = checkpoint_wrapper(layer) + layers.append(layer) + self.layers = nn.ModuleList(layers) + + self.layer_norm_first = args.layer_norm_first + self.layer_norm = LayerNorm(self.embedding_dim) + self.layerdrop = args.encoder_layerdrop + if self.use_rel_pos_enc: + self.pos_emb = RelativePositionalEncoding(args.encoder_embed_dim // args.encoder_attention_heads, 160) + + + self.apply(init_bert_params) + + def forward(self, x, padding_mask=None, layer=None): + x, layer_results = self.extract_features(x, padding_mask, layer) + + if self.layer_norm_first and layer is None: + x = self.layer_norm(x) + + return x, layer_results + + def extract_features(self, x, padding_mask=None, tgt_layer=None): + + if padding_mask is not None: + x = index_put(x, padding_mask, 0) + + x_conv = self.pos_conv(x.transpose(1, 2)) + x_conv = x_conv.transpose(1, 2) + x = x + x_conv + + if not self.layer_norm_first: + x = self.layer_norm(x) + + # pad to the sequence length dimension + x, pad_length = pad_to_multiple( + x, self.required_seq_len_multiple, dim=-2, value=0 + ) + if pad_length > 0 and padding_mask is None: + padding_mask = x.new_zeros((x.size(0), x.size(1)), dtype=torch.bool) + padding_mask[:, -pad_length:] = True + else: + padding_mask, _ = pad_to_multiple( + padding_mask, self.required_seq_len_multiple, dim=-1, value=True + ) + x = F.dropout(x, p=self.dropout, training=self.training) + + # B x T x C -> T x B x C + x = x.transpose(0, 1) + + if self.use_rel_pos_enc: + x_len = x.shape[0] + pos_seq = torch.arange(0, x_len).long().to(x.device) + pos_seq = pos_seq[:, None] - pos_seq[None, :] + pos_k, pos_v = self.pos_emb(pos_seq) + else: + pos_k = None + + layer_results = [] + r = None + for i, layer in enumerate(self.layers): + dropout_probability = np.random.random() + if not self.training or (dropout_probability > self.layerdrop): + x, z = layer(x, self_attn_padding_mask=padding_mask, need_weights=False, pos_bias=pos_k) + if tgt_layer is not None: + # unpad if needed + if pad_length > 0: + layer_results.append( + ( + x[:-pad_length], + z[:, :-pad_length, :-pad_length] + if z is not None + else z, + ) + ) + else: + layer_results.append((x, z)) + if i == tgt_layer: + r = x + break + + if r is not None: + x = r + + # T x B x C -> B x T x C + x = x.transpose(0, 1) + # undo paddding + if pad_length > 0: + x = x[:, :-pad_length] + + return x, layer_results + + +class TransformerSentenceEncoderLayer(nn.Module): + """ + Implements a Transformer Encoder Layer used in BERT/XLM style pre-trained + models. 
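+
+    This copy additionally accepts a relative position bias (*pos_bias*) that is
+    forwarded to the self-attention module as *position_bias*.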
+ """ + + def __init__( + self, + embedding_dim: float = 768, + ffn_embedding_dim: float = 3072, + num_attention_heads: float = 8, + dropout: float = 0.1, + attention_dropout: float = 0.1, + activation_dropout: float = 0.1, + activation_fn: str = "relu", + layer_norm_first: bool = False, + has_relative_attention_bias: bool = False, + ) -> None: + + super().__init__() + # Initialize parameters + self.embedding_dim = embedding_dim + self.dropout = dropout + self.activation_dropout = activation_dropout + + # Initialize blocks + self.activation_fn = utils.get_activation_fn(activation_fn) + self.self_attn = MultiheadAttention( + self.embedding_dim, + num_attention_heads, + dropout=attention_dropout, + self_attention=True, + ) + + self.dropout1 = nn.Dropout(dropout) + self.dropout2 = nn.Dropout(self.activation_dropout) + self.dropout3 = nn.Dropout(dropout) + + self.layer_norm_first = layer_norm_first + + # layer norm associated with the self attention layer + self.self_attn_layer_norm = LayerNorm(self.embedding_dim) + self.fc1 = nn.Linear(self.embedding_dim, ffn_embedding_dim) + self.fc2 = nn.Linear(ffn_embedding_dim, self.embedding_dim) + + # layer norm associated with the position wise feed-forward NN + self.final_layer_norm = LayerNorm(self.embedding_dim) + + if has_relative_attention_bias: + self.norm_k = LayerNorm(self.embedding_dim//num_attention_heads) + + def forward( + self, + x: torch.Tensor, + self_attn_mask: torch.Tensor = None, + self_attn_padding_mask: torch.Tensor = None, + need_weights: bool = False, + att_args=None, + pos_bias=None, + ): + """ + LayerNorm is applied either before or after the self-attention/ffn + modules similar to the original Transformer imlementation. + """ + residual = x + + if self.layer_norm_first: + x = self.self_attn_layer_norm(x) + if pos_bias is not None: + pos_bias = self.norm_k(pos_bias) + x, attn = self.self_attn( + query=x, + key=x, + value=x, + key_padding_mask=self_attn_padding_mask, + attn_mask=self_attn_mask, + position_bias=pos_bias, + ) + x = self.dropout1(x) + x = residual + x + + residual = x + x = self.final_layer_norm(x) + x = self.activation_fn(self.fc1(x)) + x = self.dropout2(x) + x = self.fc2(x) + x = self.dropout3(x) + x = residual + x + else: + x, attn = self.self_attn( + query=x, + key=x, + value=x, + key_padding_mask=self_attn_padding_mask, + position_bias=pos_bias, + ) + + x = self.dropout1(x) + x = residual + x + + x = self.self_attn_layer_norm(x) + + residual = x + x = self.activation_fn(self.fc1(x)) + x = self.dropout2(x) + x = self.fc2(x) + x = self.dropout3(x) + x = residual + x + x = self.final_layer_norm(x) + + return x, attn diff --git a/Speech2S/speech2s/scripts copy/pretrain_speechut/base_speechut_for_asr.sh b/Speech2S/speech2s/scripts copy/pretrain_speechut/base_speechut_for_asr.sh new file mode 100644 index 0000000000000000000000000000000000000000..d5bc7311331208c3f2f65c17586c73ee63cd98f0 --- /dev/null +++ b/Speech2S/speech2s/scripts copy/pretrain_speechut/base_speechut_for_asr.sh @@ -0,0 +1,40 @@ +# #################################### +# SpeechUT Base model # +# #################################### +[ $# -lt 2 ] && echo "Usage: $0 [mount=${PWD}] [world_size=32] [update_freq=1]" && exit 1 +[ ${PWD##*/} != SpeechUT ] && echo "Error: dir not match! Switch to SpeechUT/ and run it again!" 
&& exit 1 +DATA_DIR=$1 +TEXT_DATA_DIR=$2 +mount=$3 +world_size=$4 +update_freq=$5 +[ -z $mount ] && mount=${PWD} +[ -z $world_size ] && world_size=32 +[ -z $update_freq ] && update_freq=1 + +CODE_ROOT=${PWD} +MODEL_DIR="${mount}/exp/pretrain/base_speechut4asr_${world_size}gpu_${update_freq}accum" +[ -d $MODEL_DIR ] || mkdir -p $MODEL_DIR + +python $CODE_ROOT/fairseq/fairseq_cli/hydra_train.py \ + --config-dir $CODE_ROOT/speechut/config/pretrain \ + --config-name speechut_base_librispeech \ + common.user_dir=$CODE_ROOT/speechut \ + \ + task.labels='["km"]' \ + model.label_rate=50 \ + task.data=$DATA_DIR \ + task.label_dir=$DATA_DIR \ + task.text_cfg.text_data=$TEXT_DATA_DIR \ + \ + dataset.train_subset=\"train_960+pseudo_libritext.kmu-ltr+merge_960.kmu-none\" \ + dataset.valid_subset=\"dev_clean+dev.kmu-ltr+dev.kmu-none\" \ + dataset.num_workers=0 \ + dataset.max_tokens=1400000 \ + distributed_training.distributed_world_size=${world_size} \ + optimization.update_freq=[${update_freq}] \ + \ + common.tensorboard_logdir=$MODEL_DIR \ + checkpoint.save_dir=$MODEL_DIR \ + hydra.run.dir=$MODEL_DIR \ + hydra.job.name=base_speechut4asr_${world_size}gpu_${update_freq}accum diff --git a/Speech2S/speech2s/scripts copy/pretrain_speechut/base_speechut_for_st.sh b/Speech2S/speech2s/scripts copy/pretrain_speechut/base_speechut_for_st.sh new file mode 100644 index 0000000000000000000000000000000000000000..438a43f55275938c51faefab181dacc1af3567d0 --- /dev/null +++ b/Speech2S/speech2s/scripts copy/pretrain_speechut/base_speechut_for_st.sh @@ -0,0 +1,47 @@ +# #################################### +# SpeechUT Base model # +# #################################### +[ $# -lt 3 ] && echo "Usage: $0 [mount=${PWD}] [world_size=32] [update_freq=1]" && exit 1 +[ ${PWD##*/} != SpeechUT ] && echo "Error: dir not match! Switch to SpeechUT/ and run it again!" 
&& exit 1 +DATA_DIR=$1 +TEXT_DATA_DIR=$2 +lang=$3 +mount=$4 +world_size=$5 +update_freq=$6 +[ -z $mount ] && mount=${PWD} +[ -z $world_size ] && world_size=32 +[ -z $update_freq ] && update_freq=1 + +CODE_ROOT=${PWD} +MODEL_DIR="${mount}/exp/pretrain/base_speechut4en${lang}_${world_size}gpu_${update_freq}accum" +[ -d $MODEL_DIR ] || mkdir -p $MODEL_DIR + +python $CODE_ROOT/fairseq/fairseq_cli/hydra_train.py \ + --config-dir $CODE_ROOT/speechut/config/pretrain \ + --config-name speechut_base_librispeech \ + common.user_dir=$CODE_ROOT/speechut \ + \ + task.labels='["km"]' \ + model.label_rate=50 \ + task.data=$DATA_DIR \ + task.label_dir=$DATA_DIR \ + task.text_cfg.text_data=$TEXT_DATA_DIR \ + \ + model.add_text_ctc=false \ + model.text_transformer.share_decoder_input_output_embed=true \ + criterion.u2t_ed_weight=1.0 \ + criterion.u2t_ctc_weight=0 \ + \ + dataset.train_subset=\"train_960,mustcuns_${lang}+pseudo_wmt_en${lang}.kmu-spm+train_960.kmu-none,mustcuns_${lang}.kmu-none\" \ + dataset.valid_subset=\"dev_clean+pseudo_valid.kmu-spm+dev.kmu-none\" \ + dataset.num_workers=0 \ + dataset.max_tokens=1400000 \ + distributed_training.distributed_world_size=${world_size} \ + optimization.update_freq=[${update_freq}] \ + \ + common.tensorboard_logdir=$MODEL_DIR \ + checkpoint.save_dir=$MODEL_DIR \ + hydra.run.dir=$MODEL_DIR \ + hydra.job.name=base_speechut4en${lang}_${world_size}gpu_${update_freq}accum + diff --git a/Speech2S/speech2s/scripts copy/pretrain_speechut/base_speechut_for_st_enfr.sh b/Speech2S/speech2s/scripts copy/pretrain_speechut/base_speechut_for_st_enfr.sh new file mode 100644 index 0000000000000000000000000000000000000000..c0c7217d0c124e603bb3b95ff11b7e7e462290c0 --- /dev/null +++ b/Speech2S/speech2s/scripts copy/pretrain_speechut/base_speechut_for_st_enfr.sh @@ -0,0 +1,48 @@ +# #################################### +# SpeechUT Base model # +# #################################### +[ $# -lt 3 ] && echo "Usage: $0 [lang=fr] [mount=${PWD}] [world_size=32] [update_freq=1]" && exit 1 +[ ${PWD##*/} != SpeechUT ] && echo "Error: dir not match! Switch to SpeechUT/ and run it again!" 
&& exit 1 +DATA_DIR=$1 +TEXT_DATA_DIR=$2 +lang=$3 +mount=$4 +world_size=$5 +update_freq=$6 +[ -z $lang ] && lang=fr +[ -z $mount ] && mount=${PWD} +[ -z $world_size ] && world_size=32 +[ -z $update_freq ] && update_freq=1 + +CODE_ROOT=${PWD} +MODEL_DIR="${mount}/exp/pretrain/base_speechut4en${lang}_${world_size}gpu_${update_freq}accum" +[ -d $MODEL_DIR ] || mkdir -p $MODEL_DIR + +python $CODE_ROOT/fairseq/fairseq_cli/hydra_train.py \ + --config-dir $CODE_ROOT/speechut/config/pretrain \ + --config-name speechut_base_librispeech \ + common.user_dir=$CODE_ROOT/speechut \ + \ + task.labels='["km"]' \ + model.label_rate=50 \ + task.data=$DATA_DIR \ + task.label_dir=$DATA_DIR \ + task.text_cfg.text_data=$TEXT_DATA_DIR \ + \ + model.add_text_ctc=false \ + criterion.u2t_ed_weight=1.0 \ + criterion.u2t_ctc_weight=0 \ + \ + dataset.train_subset=\"train_960,pretrain_mustc+pseudo_wmt14_enfr.kmu-spm+train_960.kmu-none,pretrain_mustc.kmu-none\" \ + dataset.valid_subset=\"dev_clean+pseudo_valid.kmu-spm+dev.kmu-none\" \ + dataset.num_workers=0 \ + dataset.max_tokens=1400000 \ + optimization.max_update=600000 \ + distributed_training.distributed_world_size=${world_size} \ + optimization.update_freq=[${update_freq}] \ + \ + common.tensorboard_logdir=$MODEL_DIR \ + checkpoint.save_dir=$MODEL_DIR \ + hydra.run.dir=$MODEL_DIR \ + hydra.job.name=base_speechut4en${lang}_${world_size}gpu_${update_freq}accum + diff --git a/Speech2S/speech2s/scripts copy/pretrain_speechut/large_speechut_for_asr.sh b/Speech2S/speech2s/scripts copy/pretrain_speechut/large_speechut_for_asr.sh new file mode 100644 index 0000000000000000000000000000000000000000..e9d64d789ed0421252edd71aa9c8268a42dc42f3 --- /dev/null +++ b/Speech2S/speech2s/scripts copy/pretrain_speechut/large_speechut_for_asr.sh @@ -0,0 +1,41 @@ +# #################################### +# SpeechUT Large model # +# #################################### +[ $# -lt 2 ] && echo "Usage: $0 [mount=${PWD}] [world_size=32] [update_freq=4]" && exit 1 +[ ${PWD##*/} != SpeechUT ] && echo "Error: dir not match! Switch to SpeechUT/ and run it again!" 
&& exit 1 +DATA_DIR=$1 +TEXT_DATA_DIR=$2 +mount=$3 +world_size=$4 +update_freq=$5 +[ -z $mount ] && mount=${PWD} +[ -z $world_size ] && world_size=32 +[ -z $update_freq ] && update_freq=4 + +CODE_ROOT=${PWD} +MODEL_DIR="${mount}/exp/pretrain/large_speechut4asr_${world_size}gpu_${update_freq}accum" +[ -d $MODEL_DIR ] || mkdir -p $MODEL_DIR + +python $CODE_ROOT/fairseq/fairseq_cli/hydra_train.py \ + --config-dir $CODE_ROOT/speechut/config/pretrain \ + --config-name speechut_large_librilight \ + common.user_dir=$CODE_ROOT/speechut \ + \ + task.labels='["km"]' \ + model.label_rate=50 \ + task.data=$DATA_DIR \ + task.label_dir=$DATA_DIR \ + task.text_cfg.text_data=$TEXT_DATA_DIR \ + \ + dataset.train_subset=\"train_small+pseudo_libritext.kmu-ltr\" \ + dataset.valid_subset=\"dev_clean+dev.kmu-ltr\" \ + dataset.num_workers=0 \ + dataset.max_tokens=900000 \ + distributed_training.distributed_world_size=${world_size} \ + optimization.update_freq=[${update_freq}] \ + \ + common.tensorboard_logdir=$MODEL_DIR \ + checkpoint.save_dir=$MODEL_DIR \ + hydra.run.dir=$MODEL_DIR \ + hydra.job.name=large_speechut4asr_${world_size}gpu_${update_freq}accum + \ No newline at end of file diff --git a/Speech2S/speech2s/scripts copy/tune_speechut_asr/finetune960h_large_edctc.sh b/Speech2S/speech2s/scripts copy/tune_speechut_asr/finetune960h_large_edctc.sh new file mode 100644 index 0000000000000000000000000000000000000000..08a25818bc9fc519e65fa175886545a8650c0906 --- /dev/null +++ b/Speech2S/speech2s/scripts copy/tune_speechut_asr/finetune960h_large_edctc.sh @@ -0,0 +1,45 @@ +# #################################### +# SpeechUT Large model # +# #################################### +[ $# -lt 3 ] && echo "Usage: $0 [mount=${PWD}] [world_size=8] [update_freq=3]" && exit 1 +[ ${PWD##*/} != SpeechUT ] && echo "Error: dir not match! Switch to SpeechUT/ and run it again!" 
&& exit 1 + +w2v_path=$1 +DATA_DIR=$2 +cpt=$3 +mount=$4 +world_size=$5 +update_freq=$6 +[ -z $mount ] && mount=${PWD} +[ -z $world_size ] && world_size=8 +[ -z $update_freq ] && update_freq=3 + +CODE_ROOT=${PWD} + +exp_name=${w2v_path%/*} +exp_name=${exp_name##*/} +MODEL_DIR="${mount}/exp/finetune_asr/$exp_name/960h_edctc80k_from_${cpt}_bz3.3m_lr1e-5" +[ -d $MODEL_DIR ] || mkdir -p $MODEL_DIR + +python $CODE_ROOT/fairseq/fairseq_cli/hydra_train.py \ + --config-dir $CODE_ROOT/speechut/config/finetune_asr \ + --config-name speechut_large_960h \ + common.user_dir=$CODE_ROOT/speechut \ + \ + task.data=$DATA_DIR \ + task.label_dir=$DATA_DIR \ + model.w2v_path=${w2v_path} \ + \ + optimization.lr=[0.00001] \ + optimization.max_update=80000 \ + dataset.max_tokens=1100000 \ + optimization.update_freq=[${update_freq}] \ + distributed_training.distributed_world_size=${world_size} \ + \ + dataset.train_subset="train_960" \ + dataset.valid_subset="dev_other" \ + \ + common.tensorboard_logdir=$MODEL_DIR \ + checkpoint.save_dir=$MODEL_DIR \ + hydra.run.dir=$MODEL_DIR \ + hydra.job.name=960h_edctc80k_from_${cpt}_bz3.3m_lr1e-5 diff --git a/Speech2S/speech2s/scripts copy/tune_speechut_asr/finetune_base_edctc.sh b/Speech2S/speech2s/scripts copy/tune_speechut_asr/finetune_base_edctc.sh new file mode 100644 index 0000000000000000000000000000000000000000..cad7bd0a11336a2b5e0c34372d57b7b4b953a414 --- /dev/null +++ b/Speech2S/speech2s/scripts copy/tune_speechut_asr/finetune_base_edctc.sh @@ -0,0 +1,45 @@ +# #################################### +# SpeechUT Base model # +# #################################### +[ $# -lt 3 ] && echo "Usage: $0 [mount=${PWD}] [world_size=8] [update_freq=2]" && exit 1 +[ ${PWD##*/} != SpeechUT ] && echo "Error: dir not match! Switch to SpeechUT/ and run it again!" 
&& exit 1 + +w2v_path=$1 +DATA_DIR=$2 +cpt=$3 +mount=$4 +world_size=$5 +update_freq=$6 +[ -z $mount ] && mount=${PWD} +[ -z $world_size ] && world_size=8 +[ -z $update_freq ] && update_freq=2 + +CODE_ROOT=${PWD} + +exp_name=${w2v_path%/*} +exp_name=${exp_name##*/} +MODEL_DIR="${mount}/exp/finetune_asr/$exp_name/edctc40k_from_${cpt}_bz2.6m_lr1e-5" +[ -d $MODEL_DIR ] || mkdir -p $MODEL_DIR + +python $CODE_ROOT/fairseq/fairseq_cli/hydra_train.py \ + --config-dir $CODE_ROOT/speechut/config/finetune_asr \ + --config-name speechut_base_100h \ + common.user_dir=$CODE_ROOT/speechut \ + \ + task.data=$DATA_DIR \ + task.label_dir=$DATA_DIR \ + model.w2v_path=${w2v_path} \ + \ + optimization.lr=[0.00001] \ + optimization.max_update=40000 \ + dataset.max_tokens=1300000 \ + optimization.update_freq=[${update_freq}] \ + distributed_training.distributed_world_size=${world_size} \ + \ + dataset.train_subset="train_clean_100" \ + dataset.valid_subset="dev_other" \ + \ + common.tensorboard_logdir=$MODEL_DIR \ + checkpoint.save_dir=$MODEL_DIR \ + hydra.run.dir=$MODEL_DIR \ + hydra.job.name=edctc40k_from_${cpt}_bz2.6m_lr1e-5 diff --git a/Speech2S/speech2s/scripts copy/tune_speechut_asr/inference_edctc.sh b/Speech2S/speech2s/scripts copy/tune_speechut_asr/inference_edctc.sh new file mode 100644 index 0000000000000000000000000000000000000000..9dce06398c476a26290839b7f3a8f8632a5060e0 --- /dev/null +++ b/Speech2S/speech2s/scripts copy/tune_speechut_asr/inference_edctc.sh @@ -0,0 +1,61 @@ +##################################### +# SpeechUT ASR model # +##################################### +[ $# -lt 2 ] && echo "Usage: $0 [gen-set=dev_other] [beam_size=10] [ctc_weight=0.2] [--normalize]" && exit 1 +[ ${PWD##*/} != SpeechUT ] && echo "Error: dir not match! Switch to SpeechUT/ and run it again!" && exit 1 + +model_path=$1 +DATA_DIR=$2 +gen_set=$3 +beam_size=$4 +ctc_weight=$5 +extra=$6 +[ -z $extra ] && echo "Assert decoding base model! If you are decoding large model, please add '--normalize' at the end..." +[ -z $gen_set ] && gen_set="dev_other" +[ -z $beam_size ] && beam_size=10 +[ -z $ctc_weight ] && ctc_weight=0.2 +[ $ctc_weight == 0 ] && [ $beam_size != 1 ] && echo "Change beam size to 1 as no ctc-decoding used..." && beam_size=1 +[ $ctc_weight != 0 ] && extra="$extra --batch-size 1" + +src_dir=${model_path%/*} +cpt=${model_path##*/} +cpt=${cpt%.*} + +CODE_ROOT=${PWD} + +for subset in ${gen_set//,/ }; do + results_path=$src_dir/decode_${cpt}/beam${beam_size}_ctc${ctc_weight}/${subset}_${world_size}_${rank} + [ ! 
-d $results_path ] && mkdir -p $results_path + + python $CODE_ROOT/fairseq/fairseq_cli/generate.py $DATA_DIR \ + --user-dir $CODE_ROOT/speechut \ + --label-dir ${DATA_DIR} \ + --labels '["ltr"]' \ + --single-target \ + --post-process letter \ + --gen-subset ${subset} \ + --max-tokens 2000000 \ + \ + --task joint_sc2t_pretraining \ + --add-decoder-target \ + --fine-tuning \ + --pad-audio \ + --random-crop \ + \ + --ctc-weight ${ctc_weight} $extra \ + --beam ${beam_size} \ + \ + --path ${model_path} \ + --results-path $results_path \ + \ + --scoring wer --max-len-a 0.00078125 --max-len-b 200 \ + & +done +wait + + +for subset in ${gen_set//,/ }; do + results_path=$src_dir/decode_${cpt}/beam${beam_size}_ctc${ctc_weight}/${subset}_${world_size}_${rank} + echo $results_path + tail -n 1 $results_path/generate-*.txt +done diff --git a/Speech2S/speech2s/scripts copy/tune_speechut_asr/inference_edctclm.sh b/Speech2S/speech2s/scripts copy/tune_speechut_asr/inference_edctclm.sh new file mode 100644 index 0000000000000000000000000000000000000000..dadd1a4286de52cef0250640ef64fd4117e11ecb --- /dev/null +++ b/Speech2S/speech2s/scripts copy/tune_speechut_asr/inference_edctclm.sh @@ -0,0 +1,66 @@ +##################################### +# SpeechUT ASR model # +##################################### +[ $# -lt 2 ] && echo "Usage: $0 [gen-set=dev_other] [beam_size=30] [ctc_weight=0.3] [lm_weight=0.7] [lm_path] [--normalize]" && exit 1 +[ ${PWD##*/} != SpeechUT ] && echo "Error: dir not match! Switch to SpeechUT/ and run it again!" && exit 1 + +model_path=$1 +DATA_DIR=$2 +gen_set=$3 +beam_size=$4 +ctc_weight=$5 +lm_weight=$6 +lm_path=$7 +extra=$8 +[ -z $extra ] && echo "Assert decoding base model! If you are decoding large model, please add '--normalize' at the end..." +[ -z $gen_set ] && gen_set="dev_other" +[ -z $beam_size ] && beam_size=30 +[ -z $ctc_weight ] && ctc_weight=0.3 +[ -z $lm_weight ] && lm_weight=0.7 +[ -z $lm_path ] && lm_path="/mnt/default/v-junyiao/librispeech/lm/lm_ctc_form/checkpoint_best.pt" +[ $ctc_weight == 0 ] && [ $beam_size != 1 ] && echo "Change beam size to 1 and lm_weight to 0 as no ctc-decoding used..." && beam_size=1 && lm_weight=0 +[ $ctc_weight != 0 ] && extra="$extra --batch-size 1" + +src_dir=${model_path%/*} +cpt=${model_path##*/} +cpt=${cpt%.*} + +CODE_ROOT=${PWD} + +for subset in ${gen_set//,/ }; do + results_path=$src_dir/decode_${cpt}/beam${beam_size}_ctc${ctc_weight}_lm${lm_weight}/${subset}_${world_size}_${rank} + [ ! 
-d $results_path ] && mkdir -p $results_path + + python $CODE_ROOT/fairseq/fairseq_cli/generate.py $DATA_DIR \ + --user-dir $CODE_ROOT/speechut \ + --label-dir ${DATA_DIR} \ + --labels '["ltr"]' \ + --single-target \ + --post-process letter \ + --gen-subset ${subset} \ + --max-tokens 800000 \ + \ + --task joint_sc2t_pretraining \ + --add-decoder-target \ + --fine-tuning \ + --pad-audio \ + --random-crop \ + \ + --ctc-weight ${ctc_weight} $extra \ + --lm-weight ${lm_weight} --lm-path ${lm_path} \ + --beam ${beam_size} \ + \ + --path ${model_path} \ + --results-path ${results_path} \ + \ + --scoring wer --max-len-a 0.00078125 --max-len-b 200 \ + & +done +wait + + +for subset in ${gen_set//,/ }; do + results_path=$src_dir/decode_${cpt}/beam${beam_size}_ctc${ctc_weight}_lm${lm_weight}/${subset}_${world_size}_${rank} + echo $results_path + tail -n 1 $results_path/generate-*.txt +done diff --git a/Speech2S/speech2s/scripts copy/tune_speechut_asr/inference_lm_nj.sh b/Speech2S/speech2s/scripts copy/tune_speechut_asr/inference_lm_nj.sh new file mode 100644 index 0000000000000000000000000000000000000000..a5627a59975a01736907a5cc3fb76df335709b43 --- /dev/null +++ b/Speech2S/speech2s/scripts copy/tune_speechut_asr/inference_lm_nj.sh @@ -0,0 +1,74 @@ +##################################### +# SpeechUT ASR model # +##################################### +[ $# -lt 2 ] && echo "Usage: $0 [gen-set=dev_other] [beam_size=30] [ctc_weight=0.3] [lm_weight=0.7] [lm_path] [nj=8] [ngpu=8] [--normalize]" && exit 1 +[ ${PWD##*/} != SpeechUT ] && echo "Error: dir not match! Switch to SpeechUT/ and run it again!" && exit 1 + +model_path=$1 +DATA_DIR=$2 +gen_set=$3 +beam_size=$4 +ctc_weight=$5 +lm_weight=$6 +lm_path=$7 +nj=$8 +ngpu=$9 +extra=${10} +[ -z $extra ] && echo "Assert decoding base model! If you are decoding large model, please add '--normalize' at the end..." +[ -z $gen_set ] && gen_set="dev_other" +[ -z $beam_size ] && beam_size=30 +[ -z $ctc_weight ] && ctc_weight=0.3 +[ -z $lm_weight ] && lm_weight=0.7 +[ -z $lm_path ] && lm_path="/mnt/default/v-junyiao/librispeech/lm/lm_ctc_form/checkpoint_best.pt" +[ $ctc_weight == 0 ] && [ $beam_size != 1 ] && echo "Change beam size to 1 and lm_weight to 0 as no ctc-decoding used..." && beam_size=1 && lm_weight=0 +[ $ctc_weight != 0 ] && extra="$extra --batch-size 1" +[ -z $nj ] && nj=8 +[ -z $ngpu ] && ngpu=8 + +src_dir=${model_path%/*} +cpt=${model_path##*/} +cpt=${cpt%.*} + +CODE_ROOT=${PWD} + +world_size=$nj +for rank in $(seq 0 $((nj - 1))); do + export CUDA_VISIBLE_DEVICES=$((rank % $ngpu)) + for subset in ${gen_set//,/ }; do + results_path=$src_dir/decode_${cpt}/beam${beam_size}_ctc${ctc_weight}_lm${lm_weight}/${subset}_${world_size}_${rank} + [ ! 
-d $results_path ] && mkdir -p $results_path + + python $CODE_ROOT/fairseq/fairseq_cli/generate.py $DATA_DIR \ + --user-dir $CODE_ROOT/speechut \ + --label-dir ${DATA_DIR} \ + --labels '["ltr"]' \ + --single-target \ + --post-process letter \ + --gen-subset ${subset} \ + --max-tokens 800000 \ + \ + --task joint_sc2t_pretraining \ + --add-decoder-target \ + --fine-tuning \ + --pad-audio \ + --random-crop \ + \ + --ctc-weight ${ctc_weight} $extra \ + --lm-weight ${lm_weight} --lm-path ${lm_path} \ + --beam ${beam_size} \ + \ + --path ${model_path} \ + --results-path $results_path \ + \ + --scoring wer --max-len-a 0.00078125 --max-len-b 200 \ + --distributed-world-size ${world_size} --distributed-rank ${rank} \ + & + done +done +wait + + +for subset in ${gen_set//,/ }; do + results_dir=$src_dir/decode_${cpt}/beam${beam_size}_ctc${ctc_weight}_lm${lm_weight} + cat $results_dir/${subset}_${world_size}_*/generate-${subset}.txt | grep -v "^Generate" > $results_dir/generate-${subset}.all.txt +done diff --git a/Speech2S/speech2s/scripts copy/tune_speechut_asr/inference_nj.sh b/Speech2S/speech2s/scripts copy/tune_speechut_asr/inference_nj.sh new file mode 100644 index 0000000000000000000000000000000000000000..08e6df431c9856f24122118017b8ae85bacc5444 --- /dev/null +++ b/Speech2S/speech2s/scripts copy/tune_speechut_asr/inference_nj.sh @@ -0,0 +1,69 @@ +##################################### +# SpeechUT ASR model # +##################################### +[ $# -lt 2 ] && echo "Usage: $0 [gen-set=dev_other] [beam_size=10] [ctc_weight=0.2] [nj=32] [ngpu=8] [--normalize]" && exit 1 +[ ${PWD##*/} != SpeechUT ] && echo "Error: dir not match! Switch to SpeechUT/ and run it again!" && exit 1 + +model_path=$1 +DATA_DIR=$2 +gen_set=$3 +beam_size=$4 +ctc_weight=$5 +nj=$6 +ngpu=$7 +extra=$8 +[ -z $extra ] && echo "Assert decoding base model! If you are decoding large model, please add '--normalize' at the end..." +[ -z $gen_set ] && gen_set="dev_other" +[ -z $beam_size ] && beam_size=10 +[ -z $ctc_weight ] && ctc_weight=0.2 +[ $ctc_weight == 0 ] && [ $beam_size != 1 ] && echo "Change beam size to 1 as no ctc-decoding used..." && beam_size=1 +[ $ctc_weight != 0 ] && extra="$extra --batch-size 1" +[ -z $nj ] && nj=32 +[ -z $ngpu ] && ngpu=8 + +src_dir=${model_path%/*} +cpt=${model_path##*/} +cpt=${cpt%.*} + +CODE_ROOT=${PWD} + +world_size=$nj +for rank in $(seq 0 $((nj - 1))); do + export CUDA_VISIBLE_DEVICES=$((rank % $ngpu)) + for subset in ${gen_set//,/ }; do + results_path=$src_dir/decode_${cpt}/beam${beam_size}_ctc${ctc_weight}/${subset}_${world_size}_${rank} + [ ! 
-d $results_path ] && mkdir -p $results_path + + python $CODE_ROOT/fairseq/fairseq_cli/generate.py $DATA_DIR \ + --user-dir $CODE_ROOT/speechut \ + --label-dir ${DATA_DIR} \ + --labels '["ltr"]' \ + --single-target \ + --post-process letter \ + --gen-subset ${subset} \ + --max-tokens 2000000 \ + \ + --task joint_sc2t_pretraining \ + --add-decoder-target \ + --fine-tuning \ + --pad-audio \ + --random-crop \ + \ + --ctc-weight ${ctc_weight} $extra \ + --beam ${beam_size} \ + \ + --path ${model_path} \ + --results-path $results_path \ + \ + --scoring wer --max-len-a 0.00078125 --max-len-b 200 \ + --distributed-world-size ${world_size} --distributed-rank ${rank} \ + & + done +done +wait + + +for subset in ${gen_set//,/ }; do + results_dir=$src_dir/decode_${cpt}/beam${beam_size}_ctc${ctc_weight} + cat $results_dir/${subset}_${world_size}_*/generate-${subset}.txt | grep -v "^Generate" > $results_dir/generate-${subset}.all.txt +done diff --git a/Speech2S/speech2s/scripts copy/tune_speechut_st/finetune_base_mustc_enxx.sh b/Speech2S/speech2s/scripts copy/tune_speechut_st/finetune_base_mustc_enxx.sh new file mode 100644 index 0000000000000000000000000000000000000000..59c8a2a0346b708894b1568fa691c062537aa559 --- /dev/null +++ b/Speech2S/speech2s/scripts copy/tune_speechut_st/finetune_base_mustc_enxx.sh @@ -0,0 +1,77 @@ +# #################################### +# SpeechUT Base model # +# #################################### +[ $# -lt 4 ] && echo "Usage: $0 [mount=${PWD}] [world_size=8] [update_freq=4/6]" && exit 0 +[ ${PWD##*/} != SpeechUT ] && echo "Error: dir not match! Switch to SpeechUT/ and run it again!" && exit 1 + +w2v_path=$1 +DATA_DIR=$2 +lang=$3 +cpt=$4 +mount=$5 +world_size=$6 +update_freq=$7 +[ -z $mount ] && mount=${PWD} +[ -z $world_size ] && world_size=8 +[ -z $update_freq ] && update_freq=4 + +CODE_ROOT=${PWD} + +exp_name=${w2v_path%/*} +exp_name=${exp_name##*/} +MODEL_DIR="$mount/exp/finetune_mustc/$exp_name/legacy_en${lang}_from_${cpt}_bz3.2m_lr3e-5" +[ -d $MODEL_DIR ] || mkdir -p $MODEL_DIR + +max_tokens=800000 +python $CODE_ROOT/fairseq/fairseq_cli/train.py ${DATA_DIR} \ + --save-dir ${MODEL_DIR} \ + --user-dir $CODE_ROOT/speechut \ + --task speech_to_text \ + --config-yaml config_en${lang}.yaml \ + --train-subset "train_st" \ + --valid-subset "dev_st" \ + --fp16 \ + --seed 1 \ + \ + --ddp-backend no_c10d \ + --distributed-world-size ${world_size} \ + --tensorboard-logdir ${MODEL_DIR} \ + \ + --criterion label_smoothed_cross_entropy --report-accuracy \ + --label-smoothing 0.3 \ + \ + --optimizer adam \ + --clip-norm 1.0 \ + --lr 3e-05 \ + --lr-scheduler polynomial_decay --warmup-updates 5000 \ + --max-update 50000 \ + --total-num-update 50000 \ + --update-freq ${update_freq} \ + \ + --max-tokens ${max_tokens} \ + --max-sentences 16 \ + --max-tokens-valid ${max_tokens} \ + --grouped-shuffling \ + --max-source-positions ${max_tokens} \ + --skip-invalid-size-inputs-valid-test \ + --num-workers 0 \ + --best-checkpoint-metric "accuracy" \ + --maximize-best-checkpoint-metric \ + \ + --arch "speechut_st_legacy" \ + --w2v-path ${w2v_path} \ + --layerdrop 0.1 \ + --activation-dropout 0.1 \ + --attention-dropout 0.1 \ + --feature-grad-mult 1.0 \ + \ + --apply-mask --mask-prob 0.5 \ + \ + --log-format json \ + --log-interval 100 \ + --save-interval 1 \ + --keep-last-epochs 5 \ + --keep-best-checkpoints 5 \ + \ + 2>&1 | tee ${MODEL_DIR}/train_en${lang}.log + diff --git a/Speech2S/speech2s/scripts copy/tune_speechut_st/inference_st.sh b/Speech2S/speech2s/scripts 
copy/tune_speechut_st/inference_st.sh new file mode 100644 index 0000000000000000000000000000000000000000..3aefa10e360f57dbf66cff9d84c800b4da89619f --- /dev/null +++ b/Speech2S/speech2s/scripts copy/tune_speechut_st/inference_st.sh @@ -0,0 +1,44 @@ +# #################################### +# SpeechUT Base model # +# #################################### +[ $# -lt 3 ] && echo "Usage: $0 [gen-set=dev] [beam_size=10] [lenpen=1.0]" && exit 0 +[ ${PWD##*/} != SpeechUT ] && echo "Error: dir not match! Switch to SpeechUT/ and run it again!" && exit 1 + +model_path=$1 +DATA_DIR=$2 +lang=$3 +gen_set=$4 +beam_size=$5 +lenpen=$6 +[ -z $gen_set ] && gen_set="dev" +[ -z $beam_size ] && beam_size=10 +[ -z $lenpen ] && lenpen=1 +src_dir=${model_path%/*} +cpt=${model_path##*/} +cpt=${cpt%.*} + +CODE_ROOT=${PWD} +results_path=$src_dir/decode_${cpt}_beam${beam_size}/${gen_set} +[ ! -d $results_path ] && mkdir -p $results_path + +python $CODE_ROOT/fairseq/fairseq_cli/generate.py $DATA_DIR \ + --gen-subset ${gen_set}_st \ + --max-tokens 2000000 \ + --max-source-positions 2000000 \ + --num-workers 0 \ + \ + --user-dir $CODE_ROOT/speechut \ + --task speech_to_text \ + --config-yaml config_en${lang}.yaml \ + \ + --path ${model_path} \ + --results-path $results_path \ + \ + --scoring sacrebleu --max-len-a 0 --max-len-b 512 \ + --beam ${beam_size} \ + --lenpen $lenpen \ + # --model-overrides "{'model':{'w2v_path':'/path/to/your/pretrained/model.pt'}}" \ + + echo $results_path + tail -n 1 $results_path/generate-*.txt + sleep 1s diff --git a/Speech2S/speech2s/scripts/__init__.py b/Speech2S/speech2s/scripts/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 diff --git a/Speech2S/speech2s/scripts/average_checkpoints.py b/Speech2S/speech2s/scripts/average_checkpoints.py new file mode 100644 index 0000000000000000000000000000000000000000..a4711e4840a45118c9e28d0258f89fe64e964cf3 --- /dev/null +++ b/Speech2S/speech2s/scripts/average_checkpoints.py @@ -0,0 +1,160 @@ +#!/usr/bin/env python3 +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +import argparse +import collections +import os +import re + +import torch +from fairseq.file_io import PathManager + + +def average_checkpoints(inputs): + """Loads checkpoints from inputs and returns a model with averaged weights. + + Args: + inputs: An iterable of string paths of checkpoints to load from. + + Returns: + A dict of string keys mapping to various values. The 'model' key + from the returned dict should correspond to an OrderedDict mapping + string parameter names to torch Tensors. 
+ """ + params_dict = collections.OrderedDict() + params_keys = None + new_state = None + num_models = len(inputs) + + for fpath in inputs: + with PathManager.open(fpath, "rb") as f: + state = torch.load( + f, + map_location=( + lambda s, _: torch.serialization.default_restore_location(s, "cpu") + ), + ) + # Copies over the settings from the first checkpoint + if new_state is None: + new_state = state + + model_params = state["model"] + + model_params_keys = list(model_params.keys()) + if params_keys is None: + params_keys = model_params_keys + elif params_keys != model_params_keys: + raise KeyError( + "For checkpoint {}, expected list of params: {}, " + "but found: {}".format(f, params_keys, model_params_keys) + ) + + for k in params_keys: + p = model_params[k] + if isinstance(p, torch.HalfTensor): + p = p.float() + if k not in params_dict: + params_dict[k] = p.clone() + # NOTE: clone() is needed in case of p is a shared parameter + else: + params_dict[k] += p + + averaged_params = collections.OrderedDict() + for k, v in params_dict.items(): + averaged_params[k] = v + if averaged_params[k].is_floating_point(): + averaged_params[k].div_(num_models) + else: + averaged_params[k] //= num_models + new_state["model"] = averaged_params + return new_state + + +def last_n_checkpoints(paths, n, update_based, upper_bound=None): + assert len(paths) == 1 + path = paths[0] + if update_based: + pt_regexp = re.compile(r"checkpoint_\d+_(\d+)\.pt") + else: + pt_regexp = re.compile(r"checkpoint(\d+)\.pt") + files = PathManager.ls(path) + + entries = [] + for f in files: + m = pt_regexp.fullmatch(f) + if m is not None: + sort_key = int(m.group(1)) + if upper_bound is None or sort_key <= upper_bound: + entries.append((sort_key, m.group(0))) + if len(entries) < n: + raise Exception( + "Found {} checkpoint files but need at least {}", len(entries), n + ) + return [os.path.join(path, x[1]) for x in sorted(entries, reverse=True)[:n]] + + +def main(): + parser = argparse.ArgumentParser( + description="Tool to average the params of input checkpoints to " + "produce a new checkpoint", + ) + # fmt: off + parser.add_argument('--inputs', required=True, nargs='+', + help='Input checkpoint file paths.') + parser.add_argument('--output', required=True, metavar='FILE', + help='Write the new checkpoint containing the averaged weights to this path.') + num_group = parser.add_mutually_exclusive_group() + num_group.add_argument('--num-epoch-checkpoints', type=int, + help='if set, will try to find checkpoints with names checkpoint_xx.pt in the ' + 'path specified by input, and average last this many of them.') + num_group.add_argument('--num-update-checkpoints', type=int, + help='if set, will try to find checkpoints with names checkpoint_ee_xx.pt in the path specified by' + ' input, and average last this many of them.') + parser.add_argument('--checkpoint-upper-bound', type=int, + help='when using --num-epoch-checkpoints, this will set an upper bound on which epoch to use, ' + 'when using --num-update-checkpoints, this will set an upper bound on which update to use' + 'e.g., with --num-epoch-checkpoints=10 --checkpoint-upper-bound=50, checkpoints 41-50 would be' + ' averaged.' 
+ 'e.g., with --num-update-checkpoints=10 --checkpoint-upper-bound=50000, checkpoints 40500-50000 would' + ' be averaged assuming --save-interval-updates 500' + ) + # fmt: on + args = parser.parse_args() + print(args) + + num = None + is_update_based = False + if args.num_update_checkpoints is not None: + num = args.num_update_checkpoints + is_update_based = True + elif args.num_epoch_checkpoints is not None: + num = args.num_epoch_checkpoints + + assert args.checkpoint_upper_bound is None or ( + args.num_epoch_checkpoints is not None + or args.num_update_checkpoints is not None + ), "--checkpoint-upper-bound requires --num-epoch-checkpoints or --num-update-checkpoints" + assert ( + args.num_epoch_checkpoints is None or args.num_update_checkpoints is None + ), "Cannot combine --num-epoch-checkpoints and --num-update-checkpoints" + + if num is not None: + args.inputs = last_n_checkpoints( + args.inputs, + num, + is_update_based, + upper_bound=args.checkpoint_upper_bound, + ) + print("averaging checkpoints: ", args.inputs) + + new_state = average_checkpoints(args.inputs) + with PathManager.open(args.output, "wb") as f: + torch.save(new_state, f) + print("Finished writing averaged checkpoint to {}".format(args.output)) + + +if __name__ == "__main__": + main() diff --git a/Speech2S/speech2s/scripts/build_sym_alignment.py b/Speech2S/speech2s/scripts/build_sym_alignment.py new file mode 100644 index 0000000000000000000000000000000000000000..0ca5c18f7bd4b0fbf58b203793506ca395466129 --- /dev/null +++ b/Speech2S/speech2s/scripts/build_sym_alignment.py @@ -0,0 +1,97 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. +""" +Use this script in order to build symmetric alignments for your translation +dataset. +This script depends on fast_align and mosesdecoder tools. You will need to +build those before running the script. +fast_align: + github: http://github.com/clab/fast_align + instructions: follow the instructions in README.md +mosesdecoder: + github: http://github.com/moses-smt/mosesdecoder + instructions: http://www.statmt.org/moses/?n=Development.GetStarted +The script produces the following files under --output_dir: + text.joined - concatenation of lines from the source_file and the + target_file. + align.forward - forward pass of fast_align. + align.backward - backward pass of fast_align. + aligned.sym_heuristic - symmetrized alignment. 
+""" + +import argparse +import os +from itertools import zip_longest + + +def main(): + parser = argparse.ArgumentParser(description="symmetric alignment builer") + # fmt: off + parser.add_argument('--fast_align_dir', + help='path to fast_align build directory') + parser.add_argument('--mosesdecoder_dir', + help='path to mosesdecoder root directory') + parser.add_argument('--sym_heuristic', + help='heuristic to use for symmetrization', + default='grow-diag-final-and') + parser.add_argument('--source_file', + help='path to a file with sentences ' + 'in the source language') + parser.add_argument('--target_file', + help='path to a file with sentences ' + 'in the target language') + parser.add_argument('--output_dir', + help='output directory') + # fmt: on + args = parser.parse_args() + + fast_align_bin = os.path.join(args.fast_align_dir, "fast_align") + symal_bin = os.path.join(args.mosesdecoder_dir, "bin", "symal") + sym_fast_align_bin = os.path.join( + args.mosesdecoder_dir, "scripts", "ems", "support", "symmetrize-fast-align.perl" + ) + + # create joined file + joined_file = os.path.join(args.output_dir, "text.joined") + with open(args.source_file, "r", encoding="utf-8") as src, open( + args.target_file, "r", encoding="utf-8" + ) as tgt: + with open(joined_file, "w", encoding="utf-8") as joined: + for s, t in zip_longest(src, tgt): + print("{} ||| {}".format(s.strip(), t.strip()), file=joined) + + bwd_align_file = os.path.join(args.output_dir, "align.backward") + + # run forward alignment + fwd_align_file = os.path.join(args.output_dir, "align.forward") + fwd_fast_align_cmd = "{FASTALIGN} -i {JOINED} -d -o -v > {FWD}".format( + FASTALIGN=fast_align_bin, JOINED=joined_file, FWD=fwd_align_file + ) + assert os.system(fwd_fast_align_cmd) == 0 + + # run backward alignment + bwd_align_file = os.path.join(args.output_dir, "align.backward") + bwd_fast_align_cmd = "{FASTALIGN} -i {JOINED} -d -o -v -r > {BWD}".format( + FASTALIGN=fast_align_bin, JOINED=joined_file, BWD=bwd_align_file + ) + assert os.system(bwd_fast_align_cmd) == 0 + + # run symmetrization + sym_out_file = os.path.join(args.output_dir, "aligned") + sym_cmd = "{SYMFASTALIGN} {FWD} {BWD} {SRC} {TGT} {OUT} {HEURISTIC} {SYMAL}".format( + SYMFASTALIGN=sym_fast_align_bin, + FWD=fwd_align_file, + BWD=bwd_align_file, + SRC=args.source_file, + TGT=args.target_file, + OUT=sym_out_file, + HEURISTIC=args.sym_heuristic, + SYMAL=symal_bin, + ) + assert os.system(sym_cmd) == 0 + + +if __name__ == "__main__": + main() diff --git a/Speech2S/speech2s/scripts/compare_namespaces.py b/Speech2S/speech2s/scripts/compare_namespaces.py new file mode 100644 index 0000000000000000000000000000000000000000..bc24db624f8db36f546c263ba3a806dae6d466bf --- /dev/null +++ b/Speech2S/speech2s/scripts/compare_namespaces.py @@ -0,0 +1,46 @@ +#!/usr/bin/env python +"""Helper script to compare two argparse.Namespace objects.""" + +from argparse import Namespace # noqa + + +def main(): + + ns1 = eval(input("Namespace 1: ")) + ns2 = eval(input("Namespace 2: ")) + + def keys(ns): + ks = set() + for k in dir(ns): + if not k.startswith("_"): + ks.add(k) + return ks + + k1 = keys(ns1) + k2 = keys(ns2) + + def print_keys(ks, ns1, ns2=None): + for k in ks: + if ns2 is None: + print("{}\t{}".format(k, getattr(ns1, k, None))) + else: + print( + "{}\t{}\t{}".format(k, getattr(ns1, k, None), getattr(ns2, k, None)) + ) + + print("Keys unique to namespace 1:") + print_keys(k1 - k2, ns1) + print() + + print("Keys unique to namespace 2:") + print_keys(k2 - k1, ns2) + print() + + 
print("Overlapping keys with different values:") + ks = [k for k in k1 & k2 if getattr(ns1, k, "None") != getattr(ns2, k, "None")] + print_keys(ks, ns1, ns2) + print() + + +if __name__ == "__main__": + main() diff --git a/Speech2S/speech2s/scripts/compound_split_bleu.sh b/Speech2S/speech2s/scripts/compound_split_bleu.sh new file mode 100644 index 0000000000000000000000000000000000000000..1972fddcebff9a43a70bcf14c287175c68f60e3f --- /dev/null +++ b/Speech2S/speech2s/scripts/compound_split_bleu.sh @@ -0,0 +1,20 @@ +#!/bin/bash + +if [ $# -ne 1 ]; then + echo "usage: $0 GENERATE_PY_OUTPUT" + exit 1 +fi + +GEN=$1 + +SYS=$GEN.sys +REF=$GEN.ref + +if [ $(tail -n 1 $GEN | grep BLEU | wc -l) -ne 1 ]; then + echo "not done generating" + exit +fi + +grep ^H $GEN | awk -F '\t' '{print $NF}' | perl -ple 's{(\S)-(\S)}{$1 ##AT##-##AT## $2}g' > $SYS +grep ^T $GEN | cut -f2- | perl -ple 's{(\S)-(\S)}{$1 ##AT##-##AT## $2}g' > $REF +fairseq-score --sys $SYS --ref $REF diff --git a/Speech2S/speech2s/scripts/constraints/extract.py b/Speech2S/speech2s/scripts/constraints/extract.py new file mode 100644 index 0000000000000000000000000000000000000000..437b373856966e568ca93c13ebbd1417291e49da --- /dev/null +++ b/Speech2S/speech2s/scripts/constraints/extract.py @@ -0,0 +1,90 @@ +#!/usr/bin/env python3 +# +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +"""Extracts random constraints from reference files.""" + +import argparse +import random +import sys + + +def get_phrase(words, index, length): + assert index < len(words) - length + 1 + phr = " ".join(words[index : index + length]) + for i in range(index, index + length): + words.pop(index) + return phr + + +def main(args): + + if args.seed: + random.seed(args.seed) + + for line in sys.stdin: + constraints = [] + + def add_constraint(constraint): + constraints.append(constraint) + + source = line.rstrip() + if "\t" in line: + source, target = line.split("\t") + if args.add_sos: + target = f" {target}" + if args.add_eos: + target = f"{target} " + + if len(target.split()) >= args.len: + words = [target] + + num = args.number + + choices = {} + for i in range(num): + if len(words) == 0: + break + segmentno = random.choice(range(len(words))) + segment = words.pop(segmentno) + tokens = segment.split() + phrase_index = random.choice(range(len(tokens))) + choice = " ".join( + tokens[phrase_index : min(len(tokens), phrase_index + args.len)] + ) + for j in range( + phrase_index, min(len(tokens), phrase_index + args.len) + ): + tokens.pop(phrase_index) + if phrase_index > 0: + words.append(" ".join(tokens[0:phrase_index])) + if phrase_index + 1 < len(tokens): + words.append(" ".join(tokens[phrase_index:])) + choices[target.find(choice)] = choice + + # mask out with spaces + target = target.replace(choice, " " * len(choice), 1) + + for key in sorted(choices.keys()): + add_constraint(choices[key]) + + print(source, *constraints, sep="\t") + + +if __name__ == "__main__": + parser = argparse.ArgumentParser() + parser.add_argument("--number", "-n", type=int, default=1, help="number of phrases") + parser.add_argument("--len", "-l", type=int, default=1, help="phrase length") + parser.add_argument( + "--add-sos", default=False, action="store_true", help="add token" + ) + parser.add_argument( + "--add-eos", default=False, action="store_true", help="add token" + ) + parser.add_argument("--seed", "-s", default=0, type=int) + args = parser.parse_args() 
+ + main(args) diff --git a/Speech2S/speech2s/scripts/constraints/validate.py b/Speech2S/speech2s/scripts/constraints/validate.py new file mode 100644 index 0000000000000000000000000000000000000000..d531ad9f39b1df42c98fe8f26ad61fe53a9ac0c5 --- /dev/null +++ b/Speech2S/speech2s/scripts/constraints/validate.py @@ -0,0 +1,34 @@ +#!/usr/bin/env python3 +# +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +import sys + + +"""Reads in a fairseq output file, and verifies that the constraints +(C- lines) are present in the output (the first H- line). Assumes that +constraints are listed prior to the first hypothesis. +""" + +constraints = [] +found = 0 +total = 0 +for line in sys.stdin: + if line.startswith("C-"): + constraints.append(line.rstrip().split("\t")[1]) + elif line.startswith("H-"): + text = line.split("\t")[2] + + for constraint in constraints: + total += 1 + if constraint in text: + found += 1 + else: + print(f"No {constraint} in {text}", file=sys.stderr) + + constraints = [] + +print(f"Found {found} / {total} = {100 * found / total:.1f}%") diff --git a/Speech2S/speech2s/scripts/convert_dictionary.lua b/Speech2S/speech2s/scripts/convert_dictionary.lua new file mode 100644 index 0000000000000000000000000000000000000000..14ee8c997f642c8ff196617c2dcd0584037a60c4 --- /dev/null +++ b/Speech2S/speech2s/scripts/convert_dictionary.lua @@ -0,0 +1,34 @@ +-- Copyright (c) Facebook, Inc. and its affiliates. +-- +-- This source code is licensed under the MIT license found in the +-- LICENSE file in the root directory of this source tree. +-- +-- Usage: convert_dictionary.lua +require 'fairseq' +require 'torch' +require 'paths' + +if #arg < 1 then + print('usage: convert_dictionary.lua ') + os.exit(1) +end +if not paths.filep(arg[1]) then + print('error: file does not exit: ' .. arg[1]) + os.exit(1) +end + +dict = torch.load(arg[1]) +dst = paths.basename(arg[1]):gsub('.th7', '.txt') +assert(dst:match('.txt$')) + +f = io.open(dst, 'w') +for idx, symbol in ipairs(dict.index_to_symbol) do + if idx > dict.cutoff then + break + end + f:write(symbol) + f:write(' ') + f:write(dict.index_to_freq[idx]) + f:write('\n') +end +f:close() diff --git a/Speech2S/speech2s/scripts/convert_model.lua b/Speech2S/speech2s/scripts/convert_model.lua new file mode 100644 index 0000000000000000000000000000000000000000..61b92139294fb90a25989ebd2ee52a765fb278a2 --- /dev/null +++ b/Speech2S/speech2s/scripts/convert_model.lua @@ -0,0 +1,108 @@ +-- Copyright (c) Facebook, Inc. and its affiliates. +-- +-- This source code is licensed under the MIT license found in the +-- LICENSE file in the root directory of this source tree. +-- +-- Usage: convert_model.lua +require 'torch' +local fairseq = require 'fairseq' + +model = torch.load(arg[1]) + +function find_weight_norm(container, module) + for _, wn in ipairs(container:listModules()) do + if torch.type(wn) == 'nn.WeightNorm' and wn.modules[1] == module then + return wn + end + end +end + +function push_state(dict, key, module) + if torch.type(module) == 'nn.Linear' then + local wn = find_weight_norm(model.module, module) + assert(wn) + dict[key .. '.weight_v'] = wn.v:float() + dict[key .. '.weight_g'] = wn.g:float() + elseif torch.type(module) == 'nn.TemporalConvolutionTBC' then + local wn = find_weight_norm(model.module, module) + assert(wn) + local v = wn.v:float():view(wn.viewOut):transpose(2, 3) + dict[key .. '.weight_v'] = v + dict[key .. 
'.weight_g'] = wn.g:float():view(module.weight:size(3), 1, 1) + else + dict[key .. '.weight'] = module.weight:float() + end + if module.bias then + dict[key .. '.bias'] = module.bias:float() + end +end + +encoder_dict = {} +decoder_dict = {} +combined_dict = {} + +function encoder_state(encoder) + luts = encoder:findModules('nn.LookupTable') + push_state(encoder_dict, 'embed_tokens', luts[1]) + push_state(encoder_dict, 'embed_positions', luts[2]) + + fcs = encoder:findModules('nn.Linear') + assert(#fcs >= 2) + local nInputPlane = fcs[1].weight:size(1) + push_state(encoder_dict, 'fc1', table.remove(fcs, 1)) + push_state(encoder_dict, 'fc2', table.remove(fcs, #fcs)) + + for i, module in ipairs(encoder:findModules('nn.TemporalConvolutionTBC')) do + push_state(encoder_dict, 'convolutions.' .. tostring(i - 1), module) + if nInputPlane ~= module.weight:size(3) / 2 then + push_state(encoder_dict, 'projections.' .. tostring(i - 1), table.remove(fcs, 1)) + end + nInputPlane = module.weight:size(3) / 2 + end + assert(#fcs == 0) +end + +function decoder_state(decoder) + luts = decoder:findModules('nn.LookupTable') + push_state(decoder_dict, 'embed_tokens', luts[1]) + push_state(decoder_dict, 'embed_positions', luts[2]) + + fcs = decoder:findModules('nn.Linear') + local nInputPlane = fcs[1].weight:size(1) + push_state(decoder_dict, 'fc1', table.remove(fcs, 1)) + push_state(decoder_dict, 'fc2', fcs[#fcs - 1]) + push_state(decoder_dict, 'fc3', fcs[#fcs]) + + table.remove(fcs, #fcs) + table.remove(fcs, #fcs) + + for i, module in ipairs(decoder:findModules('nn.TemporalConvolutionTBC')) do + if nInputPlane ~= module.weight:size(3) / 2 then + push_state(decoder_dict, 'projections.' .. tostring(i - 1), table.remove(fcs, 1)) + end + nInputPlane = module.weight:size(3) / 2 + + local prefix = 'attention.' .. tostring(i - 1) + push_state(decoder_dict, prefix .. '.in_projection', table.remove(fcs, 1)) + push_state(decoder_dict, prefix .. '.out_projection', table.remove(fcs, 1)) + push_state(decoder_dict, 'convolutions.' .. tostring(i - 1), module) + end + assert(#fcs == 0) +end + + +_encoder = model.module.modules[2] +_decoder = model.module.modules[3] + +encoder_state(_encoder) +decoder_state(_decoder) + +for k, v in pairs(encoder_dict) do + combined_dict['encoder.' .. k] = v +end +for k, v in pairs(decoder_dict) do + combined_dict['decoder.' .. k] = v +end + + +torch.save('state_dict.t7', combined_dict) diff --git a/Speech2S/speech2s/scripts/count_docs.py b/Speech2S/speech2s/scripts/count_docs.py new file mode 100644 index 0000000000000000000000000000000000000000..58d85af85e91377a34dbd01f7674436152fd08e8 --- /dev/null +++ b/Speech2S/speech2s/scripts/count_docs.py @@ -0,0 +1,58 @@ +#!/usr/bin/env python3 +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. +""" +Count the number of documents and average number of lines and tokens per +document in a large file. Documents should be separated by a single empty line. 
+""" + +import argparse +import gzip +import sys + +import numpy as np + + +def main(): + parser = argparse.ArgumentParser() + parser.add_argument("input") + parser.add_argument("--gzip", action="store_true") + args = parser.parse_args() + + def gopen(): + if args.gzip: + return gzip.open(args.input, "r") + else: + return open(args.input, "r", encoding="utf-8") + + num_lines = [] + num_toks = [] + with gopen() as h: + num_docs = 1 + num_lines_in_doc = 0 + num_toks_in_doc = 0 + for i, line in enumerate(h): + if len(line.strip()) == 0: # empty line indicates new document + num_docs += 1 + num_lines.append(num_lines_in_doc) + num_toks.append(num_toks_in_doc) + num_lines_in_doc = 0 + num_toks_in_doc = 0 + else: + num_lines_in_doc += 1 + num_toks_in_doc += len(line.rstrip().split()) + if i % 1000000 == 0: + print(i, file=sys.stderr, end="", flush=True) + elif i % 100000 == 0: + print(".", file=sys.stderr, end="", flush=True) + print(file=sys.stderr, flush=True) + + print("found {} docs".format(num_docs)) + print("average num lines per doc: {}".format(np.mean(num_lines))) + print("average num toks per doc: {}".format(np.mean(num_toks))) + + +if __name__ == "__main__": + main() diff --git a/Speech2S/speech2s/scripts/read_binarized.py b/Speech2S/speech2s/scripts/read_binarized.py new file mode 100644 index 0000000000000000000000000000000000000000..a414095d03fb022a6753e816fc8bfd80e11db24d --- /dev/null +++ b/Speech2S/speech2s/scripts/read_binarized.py @@ -0,0 +1,48 @@ +#!/usr/bin/env python3 +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +import argparse + +from fairseq.data import Dictionary, data_utils, indexed_dataset + + +def get_parser(): + parser = argparse.ArgumentParser( + description="writes text from binarized file to stdout" + ) + # fmt: off + parser.add_argument('--dataset-impl', help='dataset implementation', + choices=indexed_dataset.get_available_dataset_impl()) + parser.add_argument('--dict', metavar='FP', help='dictionary containing known words', default=None) + parser.add_argument('--input', metavar='FP', required=True, help='binarized file to read') + # fmt: on + + return parser + + +def main(): + parser = get_parser() + args = parser.parse_args() + + dictionary = Dictionary.load(args.dict) if args.dict is not None else None + dataset = data_utils.load_indexed_dataset( + args.input, + dictionary, + dataset_impl=args.dataset_impl, + default="lazy", + ) + + for tensor_line in dataset: + if dictionary is None: + line = " ".join([str(int(x)) for x in tensor_line]) + else: + line = dictionary.string(tensor_line) + + print(line) + + +if __name__ == "__main__": + main() diff --git a/Speech2S/speech2s/scripts/rm_pt.py b/Speech2S/speech2s/scripts/rm_pt.py new file mode 100644 index 0000000000000000000000000000000000000000..6cd063d21f0610fa7c42c2cfb2ee8af7c9c78677 --- /dev/null +++ b/Speech2S/speech2s/scripts/rm_pt.py @@ -0,0 +1,141 @@ +#!/usr/bin/env python3 +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. 
+ +import argparse +import os +import re +import shutil +import sys + + +pt_regexp = re.compile(r"checkpoint(\d+|_\d+_\d+|_[a-z]+)\.pt") +pt_regexp_epoch_based = re.compile(r"checkpoint(\d+)\.pt") +pt_regexp_update_based = re.compile(r"checkpoint_\d+_(\d+)\.pt") + + +def parse_checkpoints(files): + entries = [] + for f in files: + m = pt_regexp_epoch_based.fullmatch(f) + if m is not None: + entries.append((int(m.group(1)), m.group(0))) + else: + m = pt_regexp_update_based.fullmatch(f) + if m is not None: + entries.append((int(m.group(1)), m.group(0))) + return entries + + +def last_n_checkpoints(files, n): + entries = parse_checkpoints(files) + return [x[1] for x in sorted(entries, reverse=True)[:n]] + + +def every_n_checkpoints(files, n): + entries = parse_checkpoints(files) + return [x[1] for x in sorted(sorted(entries)[::-n])] + + +def main(): + parser = argparse.ArgumentParser( + description=( + "Recursively delete checkpoint files from `root_dir`, " + "but preserve checkpoint_best.pt and checkpoint_last.pt" + ) + ) + parser.add_argument("root_dirs", nargs="*") + parser.add_argument( + "--save-last", type=int, default=0, help="number of last checkpoints to save" + ) + parser.add_argument( + "--save-every", type=int, default=0, help="interval of checkpoints to save" + ) + parser.add_argument( + "--preserve-test", + action="store_true", + help="preserve checkpoints in dirs that start with test_ prefix (default: delete them)", + ) + parser.add_argument( + "--delete-best", action="store_true", help="delete checkpoint_best.pt" + ) + parser.add_argument( + "--delete-last", action="store_true", help="delete checkpoint_last.pt" + ) + parser.add_argument( + "--no-dereference", action="store_true", help="don't dereference symlinks" + ) + args = parser.parse_args() + + files_to_desymlink = [] + files_to_preserve = [] + files_to_delete = [] + for root_dir in args.root_dirs: + for root, _subdirs, files in os.walk(root_dir): + if args.save_last > 0: + to_save = last_n_checkpoints(files, args.save_last) + else: + to_save = [] + if args.save_every > 0: + to_save += every_n_checkpoints(files, args.save_every) + for file in files: + if not pt_regexp.fullmatch(file): + continue + full_path = os.path.join(root, file) + if ( + not os.path.basename(root).startswith("test_") or args.preserve_test + ) and ( + (file == "checkpoint_last.pt" and not args.delete_last) + or (file == "checkpoint_best.pt" and not args.delete_best) + or file in to_save + ): + if os.path.islink(full_path) and not args.no_dereference: + files_to_desymlink.append(full_path) + else: + files_to_preserve.append(full_path) + else: + files_to_delete.append(full_path) + + if len(files_to_desymlink) == 0 and len(files_to_delete) == 0: + print("Nothing to do.") + sys.exit(0) + + files_to_desymlink = sorted(files_to_desymlink) + files_to_preserve = sorted(files_to_preserve) + files_to_delete = sorted(files_to_delete) + + print("Operations to perform (in order):") + if len(files_to_desymlink) > 0: + for file in files_to_desymlink: + print(" - preserve (and dereference symlink): " + file) + if len(files_to_preserve) > 0: + for file in files_to_preserve: + print(" - preserve: " + file) + if len(files_to_delete) > 0: + for file in files_to_delete: + print(" - delete: " + file) + while True: + resp = input("Continue? 
(Y/N): ") + if resp.strip().lower() == "y": + break + elif resp.strip().lower() == "n": + sys.exit(0) + + print("Executing...") + if len(files_to_desymlink) > 0: + for file in files_to_desymlink: + realpath = os.path.realpath(file) + print("rm " + file) + os.remove(file) + print("cp {} {}".format(realpath, file)) + shutil.copyfile(realpath, file) + if len(files_to_delete) > 0: + for file in files_to_delete: + print("rm " + file) + os.remove(file) + + +if __name__ == "__main__": + main() diff --git a/Speech2S/speech2s/scripts/sacrebleu.sh b/Speech2S/speech2s/scripts/sacrebleu.sh new file mode 100644 index 0000000000000000000000000000000000000000..c10bf2b76ea032deabab6f5c9d8a3e1e884f1642 --- /dev/null +++ b/Speech2S/speech2s/scripts/sacrebleu.sh @@ -0,0 +1,27 @@ +#!/bin/bash + +if [ $# -ne 4 ]; then + echo "usage: $0 TESTSET SRCLANG TGTLANG GEN" + exit 1 +fi + +TESTSET=$1 +SRCLANG=$2 +TGTLANG=$3 + +GEN=$4 + +if ! command -v sacremoses &> /dev/null +then + echo "sacremoses could not be found, please install with: pip install sacremoses" + exit +fi + +grep ^H $GEN \ +| sed 's/^H\-//' \ +| sort -n -k 1 \ +| cut -f 3 \ +| sacremoses detokenize \ +> $GEN.sorted.detok + +sacrebleu --test-set $TESTSET --language-pair "${SRCLANG}-${TGTLANG}" < $GEN.sorted.detok diff --git a/Speech2S/speech2s/scripts/shard_docs.py b/Speech2S/speech2s/scripts/shard_docs.py new file mode 100644 index 0000000000000000000000000000000000000000..97232c3c845ee01dc5ab627388934cc0f9588280 --- /dev/null +++ b/Speech2S/speech2s/scripts/shard_docs.py @@ -0,0 +1,54 @@ +#!/usr/bin/env python3 +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. +""" +Split a large file into shards while respecting document boundaries. Documents +should be separated by a single empty line. +""" + +import argparse +import contextlib + + +def main(): + parser = argparse.ArgumentParser() + parser.add_argument("input") + parser.add_argument("--num-shards", type=int) + args = parser.parse_args() + + assert args.num_shards is not None and args.num_shards > 1 + + with open(args.input, "r", encoding="utf-8") as h: + with contextlib.ExitStack() as stack: + outputs = [ + stack.enter_context( + open(args.input + ".shard" + str(i), "w", encoding="utf-8") + ) + for i in range(args.num_shards) + ] + + doc = [] + first_doc = [True] * args.num_shards + + def output_doc(i): + if not first_doc[i]: + outputs[i].write("\n") + first_doc[i] = False + for line in doc: + outputs[i].write(line) + doc.clear() + + num_docs = 0 + for line in h: + if line.strip() == "": # empty line indicates new document + output_doc(num_docs % args.num_shards) + num_docs += 1 + else: + doc.append(line) + output_doc(num_docs % args.num_shards) + + +if __name__ == "__main__": + main() diff --git a/Speech2S/speech2s/scripts/split_train_valid_docs.py b/Speech2S/speech2s/scripts/split_train_valid_docs.py new file mode 100644 index 0000000000000000000000000000000000000000..ff159785284a13b44626b207d84430c592acaf8f --- /dev/null +++ b/Speech2S/speech2s/scripts/split_train_valid_docs.py @@ -0,0 +1,86 @@ +#!/usr/bin/env python3 +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. +""" +Split a large file into a train and valid set while respecting document +boundaries. Documents should be separated by a single empty line. 
+""" + +import argparse +import random +import sys + + +def main(): + parser = argparse.ArgumentParser() + parser.add_argument("input") + parser.add_argument("sample_output", help="train output file") + parser.add_argument("remainder_output", help="valid output file") + parser.add_argument("-k", type=int, help="remainder size") + parser.add_argument( + "--lines", action="store_true", help="split lines instead of docs" + ) + args = parser.parse_args() + + assert args.k is not None + + sample = [] + remainder = [] + num_docs = [0] + + def update_sample(doc): + if len(sample) < args.k: + sample.append(doc.copy()) + else: + i = num_docs[0] + j = random.randrange(i + 1) + if j < args.k: + remainder.append(sample[j]) + sample[j] = doc.copy() + else: + remainder.append(doc.copy()) + num_docs[0] += 1 + doc.clear() + + with open(args.input, "r", encoding="utf-8") as h: + doc = [] + for i, line in enumerate(h): + if line.strip() == "": # empty line indicates new document + update_sample(doc) + else: + doc.append(line) + if args.lines: + update_sample(doc) + if i % 1000000 == 0: + print(i, file=sys.stderr, end="", flush=True) + elif i % 100000 == 0: + print(".", file=sys.stderr, end="", flush=True) + if len(doc) > 0: + update_sample(doc) + print(file=sys.stderr, flush=True) + + assert len(sample) == args.k + + with open(args.sample_output, "w", encoding="utf-8") as out: + first = True + for doc in sample: + if not first and not args.lines: + out.write("\n") + first = False + for line in doc: + out.write(line) + + with open(args.remainder_output, "w", encoding="utf-8") as out: + first = True + for doc in remainder: + if not first and not args.lines: + out.write("\n") + first = False + for line in doc: + out.write(line) + + +if __name__ == "__main__": + main() diff --git a/Speech2S/speech2s/scripts/spm_decode.py b/Speech2S/speech2s/scripts/spm_decode.py new file mode 100644 index 0000000000000000000000000000000000000000..7d7b68b240265924601ca6a738ed3d7b4b8e9cda --- /dev/null +++ b/Speech2S/speech2s/scripts/spm_decode.py @@ -0,0 +1,53 @@ +#!/usr/bin/env python +# Copyright (c) Facebook, Inc. and its affiliates. +# All rights reserved. +# +# This source code is licensed under the license found in the +# LICENSE file in the root directory of this source tree. 
+ +from __future__ import absolute_import, division, print_function, unicode_literals + +import argparse + +import sentencepiece as spm + + +def main(): + parser = argparse.ArgumentParser() + parser.add_argument( + "--model", required=True, help="sentencepiece model to use for decoding" + ) + parser.add_argument("--input", required=True, help="input file to decode") + parser.add_argument("--input_format", choices=["piece", "id"], default="piece") + args = parser.parse_args() + + sp = spm.SentencePieceProcessor() + sp.Load(args.model) + + if args.input_format == "piece": + + def decode(input): + return "".join(sp.DecodePieces(input)) + + elif args.input_format == "id": + + def decode(input): + return "".join(sp.DecodeIds(input)) + + else: + raise NotImplementedError + + def tok2int(tok): + # remap reference-side (represented as <>) to 0 + return int(tok) if tok != "<>" else 0 + + with open(args.input, "r", encoding="utf-8") as h: + for line in h: + if args.input_format == "id": + print(decode(list(map(tok2int, line.rstrip().split())))) + elif args.input_format == "piece": + print(decode(line.rstrip().split())) + + +if __name__ == "__main__": + main() diff --git a/Speech2S/speech2s/scripts/spm_encode.py b/Speech2S/speech2s/scripts/spm_encode.py new file mode 100644 index 0000000000000000000000000000000000000000..f91e0bb728a33448c1415aee6036ac9d0feac11f --- /dev/null +++ b/Speech2S/speech2s/scripts/spm_encode.py @@ -0,0 +1,119 @@ +#!/usr/bin/env python +# Copyright (c) Facebook, Inc. and its affiliates. +# All rights reserved. +# +# This source code is licensed under the license found in the +# LICENSE file in the root directory of this source tree. + +from __future__ import absolute_import, division, print_function, unicode_literals + +import argparse +import contextlib +import sys + +import sentencepiece as spm + + +def main(): + parser = argparse.ArgumentParser() + parser.add_argument( + "--model", required=True, help="sentencepiece model to use for encoding" + ) + parser.add_argument( + "--inputs", nargs="+", default=["-"], help="input files to filter/encode" + ) + parser.add_argument( + "--outputs", nargs="+", default=["-"], help="path to save encoded outputs" + ) + parser.add_argument("--output_format", choices=["piece", "id"], default="piece") + parser.add_argument( + "--min-len", + type=int, + metavar="N", + help="filter sentence pairs with fewer than N tokens", + ) + parser.add_argument( + "--max-len", + type=int, + metavar="N", + help="filter sentence pairs with more than N tokens", + ) + args = parser.parse_args() + + assert len(args.inputs) == len( + args.outputs + ), "number of input and output paths should match" + + sp = spm.SentencePieceProcessor() + sp.Load(args.model) + + if args.output_format == "piece": + + def encode(input): + return sp.EncodeAsPieces(input) + + elif args.output_format == "id": + + def encode(input): + return list(map(str, sp.EncodeAsIds(input))) + + else: + raise NotImplementedError + + if args.min_len is not None or args.max_len is not None: + + def valid(line): + return (args.min_len is None or len(line) >= args.min_len) and ( + args.max_len is None or len(line) <= args.max_len + ) + + else: + + def valid(lines): + return True + + with contextlib.ExitStack() as stack: + inputs = [ + stack.enter_context(open(input, "r", encoding="utf-8")) + if input != "-" + else sys.stdin + for input in args.inputs + ] + outputs = [ + stack.enter_context(open(output, "w", encoding="utf-8")) + if output != "-" + else sys.stdout + for output in args.outputs + ] + + stats 
= { + "num_empty": 0, + "num_filtered": 0, + } + + def encode_line(line): + line = line.strip() + if len(line) > 0: + line = encode(line) + if valid(line): + return line + else: + stats["num_filtered"] += 1 + else: + stats["num_empty"] += 1 + return None + + for i, lines in enumerate(zip(*inputs), start=1): + enc_lines = list(map(encode_line, lines)) + if not any(enc_line is None for enc_line in enc_lines): + for enc_line, output_h in zip(enc_lines, outputs): + print(" ".join(enc_line), file=output_h) + if i % 10000 == 0: + print("processed {} lines".format(i), file=sys.stderr) + + print("skipped {} empty lines".format(stats["num_empty"]), file=sys.stderr) + print("filtered {} lines".format(stats["num_filtered"]), file=sys.stderr) + + +if __name__ == "__main__": + main() diff --git a/Speech2S/speech2s/scripts/spm_train.py b/Speech2S/speech2s/scripts/spm_train.py new file mode 100644 index 0000000000000000000000000000000000000000..9db668fd4166a860198784990de68ea26157995d --- /dev/null +++ b/Speech2S/speech2s/scripts/spm_train.py @@ -0,0 +1,16 @@ +#!/usr/bin/env python +# Copyright (c) Facebook, Inc. and its affiliates. +# All rights reserved. +# +# This source code is licensed under the license found in the +# LICENSE file in the root directory of this source tree. + +from __future__ import absolute_import, division, print_function, unicode_literals + +import sys + +import sentencepiece as spm + + +if __name__ == "__main__": + spm.SentencePieceTrainer.Train(" ".join(sys.argv[1:])) diff --git a/Speech2S/speech2s/scripts/test_fsdp.sh b/Speech2S/speech2s/scripts/test_fsdp.sh new file mode 100644 index 0000000000000000000000000000000000000000..1f428a035e4474427ded991f8e8307ea59f61f69 --- /dev/null +++ b/Speech2S/speech2s/scripts/test_fsdp.sh @@ -0,0 +1,24 @@ +#!/usr/bin/env bash +rm -rf fsdp_dummy +mkdir -p fsdp_dummy +CUDA_VISIBLE_DEVICES=0,1,2,3 fairseq-train /private/home/sshleifer/data-bin/stories_mmap \ + --ddp-backend fully_sharded --fp16 --fp16-init-scale 4 \ + --cpu-offload --checkpoint-activations \ + --task language_modeling --tokens-per-sample 256 --batch-size 8 \ + --arch transformer_lm_gpt2_tiny \ + --optimizer cpu_adam --adam-betas "(0.9,0.98)" \ + --lr 0.0001 --lr-scheduler polynomial_decay --warmup-updates 5 --total-num-update 10 \ + --max-update 5 --log-format json --log-interval 1 \ + --save-interval-updates 5 --save-dir fsdp_dummy --disable-validation \ + --restore-file x.pt "$@" + +# Now we try to load the checkpoint +CUDA_VISIBLE_DEVICES=0,1 fairseq-train /private/home/sshleifer/data-bin/stories_mmap \ + --ddp-backend fully_sharded --fp16 --fp16-init-scale 4 \ + --cpu-offload --checkpoint-activations \ + --task language_modeling --tokens-per-sample 256 --batch-size 8 \ + --arch transformer_lm_gpt2_tiny \ + --optimizer cpu_adam --adam-betas "(0.9,0.98)" \ + --lr 0.0001 --lr-scheduler polynomial_decay --warmup-updates 5 --total-num-update 10 \ + --max-update 2 --log-format json --log-interval 1 \ + --save-interval-updates 2 --save-dir fsdp_dummy diff --git a/Speech2S/speech2s/stpretrain_scripts/base_sc2c_enes.sh b/Speech2S/speech2s/stpretrain_scripts/base_sc2c_enes.sh new file mode 100644 index 0000000000000000000000000000000000000000..08e00403f961625ec2c819f5ee85a2ce74e64e9a --- /dev/null +++ b/Speech2S/speech2s/stpretrain_scripts/base_sc2c_enes.sh @@ -0,0 +1,64 @@ + +# #################################### +# Hubert SCT2T ED model # +# #################################### + +world_size=$1 +update_freq=$2 +exp_name=$3 +[ -z $world_size ] && world_size=8 +[ -z $update_freq ] 
&& update_freq=1 +[ -z $exp_name ] && exp_name=sc2t_base_enes_${world_size}gpu_${update_freq}accum6666 + + +FAIRSEQ_ROOT=/mnt/output/users/v-kunwei/code/fairseq_mlstku +CONFIG_DIR=/mnt/output/users/v-kunwei/code/stpretrain_scripts/config +DATA_DIR="/mnt/output/users/v-kunwei/data/s2s_data/speech_enes" +TEXT_DATA_DIR="/mnt/output/users/v-kunwei/data/s2s_data/text_enes/bin-idx" +MODEL_DIR="/mnt/output/v-kunwei/data/s2s_data/exp/S2S_enes/$exp_name" + +[ -d $MODEL_DIR ] || mkdir -p $MODEL_DIR + + +python $FAIRSEQ_ROOT/fairseq_cli/hydra_train.py \ + --config-dir $CONFIG_DIR/pretrain \ + --config-name sc2t_base_librispeech \ + \ + +task.store_labels=true \ + task.labels='["km"]' \ + model.label_rate=50 \ + task.data=$DATA_DIR \ + task.label_dir=$DATA_DIR \ + task.text_cfg.text_data=$TEXT_DATA_DIR \ + +task.text_cfg.data_config=config.yaml \ + task.text_cfg.text_maxtokens_ratio=3.0 \ + \ + +criterion.dec_loss_type="ce" \ + \ + criterion.text_weight=1.0 \ + \ + model.use_rel_pos_enc=true \ + +model.code_use_rel_pos_enc=true \ + +model.pad_with_code=true \ + model.text_transformer.no_scale_embedding=true \ + model.text_transformer.layernorm_embedding=true \ + +model.share_decoder_input_output_embed=true \ + \ + dataset.train_subset=\"train_all+en.kmu-spm\" \ + dataset.valid_subset=\"valid+en_valid.kmu-spm\" \ + dataset.num_workers=0 \ + dataset.max_tokens=1000000 \ + optimization.update_freq=[${update_freq}] \ + optimization.max_update=400000 \ + \ + distributed_training.distributed_world_size=${world_size} \ + \ + common.tensorboard_logdir=$MODEL_DIR \ + checkpoint.save_dir=$MODEL_DIR \ + hydra.run.dir=$MODEL_DIR \ + hydra.job.name=${exp_name} + + +sleep 5m +echo "All finished" + diff --git a/Speech2S/speech2s/stpretrain_scripts/base_sc2c_esen.sh b/Speech2S/speech2s/stpretrain_scripts/base_sc2c_esen.sh new file mode 100644 index 0000000000000000000000000000000000000000..2a15bd129b961e9c5eeff211f7c03f7f8fcc20c9 --- /dev/null +++ b/Speech2S/speech2s/stpretrain_scripts/base_sc2c_esen.sh @@ -0,0 +1,64 @@ + +# #################################### +# Hubert SCT2T ED model # +# #################################### + +world_size=$1 +update_freq=$2 +exp_name=$3 +[ -z $world_size ] && world_size=24 +[ -z $update_freq ] && update_freq=3 +[ -z $exp_name ] && exp_name=sc2t_base_esen_${world_size}gpu_${update_freq}accum1 + + +FAIRSEQ_ROOT=/mnt/output/users/v-kunwei/code/fairseq_mlstku +CONFIG_DIR=/mnt/output/users/v-kunwei/code/stpretrain_scripts/config +DATA_DIR="/mnt/output/users/v-kunwei/data/s2s_data/speech_esen" +TEXT_DATA_DIR="/mnt/output/users/v-kunwei/data/s2s_data/text_esen" +MODEL_DIR="/mnt/output/v-kunwei/data/s2s_data/exp/S2S_esen/$exp_name" + +[ -d $MODEL_DIR ] || mkdir -p $MODEL_DIR + + +python $FAIRSEQ_ROOT/fairseq_cli/hydra_train.py \ + --config-dir $CONFIG_DIR/pretrain \ + --config-name sc2t_base_librispeech \ + \ + +task.store_labels=true \ + task.labels='["km"]' \ + model.label_rate=50 \ + task.data=$DATA_DIR \ + task.label_dir=$DATA_DIR \ + task.text_cfg.text_data=$TEXT_DATA_DIR \ + +task.text_cfg.data_config=config.yaml \ + task.text_cfg.text_maxtokens_ratio=3.0 \ + \ + +criterion.dec_loss_type="ce" \ + \ + criterion.text_weight=1.0 \ + \ + model.use_rel_pos_enc=true \ + +model.code_use_rel_pos_enc=true \ + +model.pad_with_code=true \ + model.text_transformer.no_scale_embedding=true \ + model.text_transformer.layernorm_embedding=true \ + +model.share_decoder_input_output_embed=true \ + \ + dataset.train_subset=\"train+en.kmu-spm\" \ + dataset.valid_subset=\"valid+en_valid.kmu-spm\" \ + 
dataset.num_workers=0 \ + dataset.max_tokens=1000000 \ + optimization.update_freq=[${update_freq}] \ + optimization.max_update=400000 \ + \ + distributed_training.distributed_world_size=${world_size} \ + \ + common.tensorboard_logdir=$MODEL_DIR \ + checkpoint.save_dir=$MODEL_DIR \ + hydra.run.dir=$MODEL_DIR \ + hydra.job.name=${exp_name} + + +sleep 5m +echo "All finished" + diff --git a/Speech2S/speech2s/stpretrain_scripts/config.yaml b/Speech2S/speech2s/stpretrain_scripts/config.yaml new file mode 100644 index 0000000000000000000000000000000000000000..58ba896d1a38a7ac980d213d818b1d2e427c9eb6 --- /dev/null +++ b/Speech2S/speech2s/stpretrain_scripts/config.yaml @@ -0,0 +1,4 @@ +audio_root: ./ +standardize_audio: true +use_audio_input: true +vocab_filename: dict.txt diff --git a/Speech2S/speech2s/stpretrain_scripts/config/finetune_asr/base_100h.yaml b/Speech2S/speech2s/stpretrain_scripts/config/finetune_asr/base_100h.yaml new file mode 100644 index 0000000000000000000000000000000000000000..7c9fae8e626ccb3d209334d754ff6823b40c2c4e --- /dev/null +++ b/Speech2S/speech2s/stpretrain_scripts/config/finetune_asr/base_100h.yaml @@ -0,0 +1,101 @@ +# @package _group_ + +common: + fp16: true + log_format: json + log_interval: 200 + tensorboard_logdir: tblog + seed: 1337 + +checkpoint: + save_interval: 1 + keep_last_epochs: 5 + keep_best_checkpoints: 5 + best_checkpoint_metric: wer + restore_file: checkpoint_last.pt + +distributed_training: + ddp_backend: c10d + find_unused_parameters: true + distributed_world_size: 1 + distributed_port: -1 + nprocs_per_node: 8 + +task: + _name: hubert_pretraining + data: ??? + fine_tuning: true + label_dir: ??? + normalize: false # must be consistent with pre-training + labels: ["ltr"] + single_target: true + add_decoder: false + pad_audio: false + random_crop: true + tokenizer: "none" + sp_path: None + +dataset: + num_workers: 0 + max_tokens: 1200000 + skip_invalid_size_inputs_valid_test: true + train_subset: train_100 + valid_subset: dev_other + required_batch_size_multiple: 1 + +criterion: + _name: label_smoothed_cross_entropy + #zero_infinity: true + + +optimization: + max_update: 80000 + lr: [0.00003] + sentence_avg: true + update_freq: [1] + +optimizer: + _name: adam + adam_betas: (0.9,0.98) + adam_eps: 1e-08 + weight_decay: 0.0 + +lr_scheduler: + _name: tri_stage + phase_ratio: [0.1, 0.4, 0.5] + final_lr_scale: 0.05 + +model: + _name: hubert_ctc + w2v_path: ??? + apply_mask: true + mask_prob: 0.65 + mask_channel_prob: 0.5 + mask_channel_length: 64 + layerdrop: 0.1 + decoder_layerdrop: 0.1 + activation_dropout: 0.1 + feature_grad_mult: 0.0 + freeze_finetune_updates: 0 + add_decoder: false + +hydra: + job: + config: + override_dirname: + kv_sep: '-' + item_sep: '__' + exclude_keys: + - run + - task.data + - task.label_dir + - model.w2v_path + - dataset.train_subset + - dataset.valid_subset + - criterion.wer_kenlm_model + - criterion.wer_lexicon + run: + dir: ??? + sweep: + dir: ??? 
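+ # values marked ??? (e.g. task.data, task.label_dir, model.w2v_path, the hydra dirs) are required and must be supplied at launch time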
+ subdir: ${hydra.job.config_name}__${hydra.job.override_dirname} diff --git a/Speech2S/speech2s/stpretrain_scripts/config/finetune_asr/large_960h.yaml b/Speech2S/speech2s/stpretrain_scripts/config/finetune_asr/large_960h.yaml new file mode 100644 index 0000000000000000000000000000000000000000..360182329dd245e1d2f8d10f412654fc5ba2afb3 --- /dev/null +++ b/Speech2S/speech2s/stpretrain_scripts/config/finetune_asr/large_960h.yaml @@ -0,0 +1,98 @@ +# @package _group_ + +common: + fp16: true + log_format: json + log_interval: 200 + tensorboard_logdir: tblog + +checkpoint: + save_interval: 1 + keep_last_epochs: 10 + keep_best_checkpoints: 5 + best_checkpoint_metric: wer + restore_file: checkpoint_last.pt + +distributed_training: + ddp_backend: c10d + find_unused_parameters: true + distributed_world_size: 24 + distributed_port: -1 + nprocs_per_node: 8 + +task: + _name: hubert_pretraining + data: ??? + fine_tuning: true + label_dir: ??? + normalize: true # must be consistent with pre-training + labels: ["ltr"] + single_target: true + add_decoder: false + pad_audio: false + random_crop: true + tokenizer: "none" + sp_path: None + +dataset: + num_workers: 0 + max_tokens: 1280000 + skip_invalid_size_inputs_valid_test: true + valid_subset: dev_other + required_batch_size_multiple: 1 + +criterion: + _name: ctc + zero_infinity: true + +optimization: + max_update: 200000 + lr: [0.00003] + sentence_avg: true + update_freq: [1] + +optimizer: + _name: adam + adam_betas: (0.9,0.98) + adam_eps: 1e-08 + weight_decay: 0.0 + +lr_scheduler: + _name: tri_stage + phase_ratio: [0.1, 0.4, 0.5] + final_lr_scale: 0.05 + +model: + _name: hubert_ctc + w2v_path: ??? + apply_mask: true + mask_prob: 0.5 + mask_channel_prob: 0.25 + mask_channel_length: 64 + layerdrop: 0.0 + decoder_layerdrop: 0.1 + activation_dropout: 0.1 + feature_grad_mult: 0.0 + freeze_finetune_updates: 0 + add_decoder: false + +hydra: + job: + config: + override_dirname: + kv_sep: '-' + item_sep: '__' + exclude_keys: + - run + - task.data + - task.label_dir + - model.w2v_path + - dataset.train_subset + - dataset.valid_subset + - criterion.wer_kenlm_model + - criterion.wer_lexicon + run: + dir: ??? + sweep: + dir: ??? + subdir: ${hydra.job.config_name}__${hydra.job.override_dirname} diff --git a/Speech2S/speech2s/stpretrain_scripts/config/pretrain/mbart.yaml b/Speech2S/speech2s/stpretrain_scripts/config/pretrain/mbart.yaml new file mode 100644 index 0000000000000000000000000000000000000000..51025f2f8ec584a888a4e07c8c246829351af948 --- /dev/null +++ b/Speech2S/speech2s/stpretrain_scripts/config/pretrain/mbart.yaml @@ -0,0 +1,120 @@ +# @package _group_ + +common: + fp16: true + log_format: json + log_interval: 200 + seed: 1337 + tensorboard_logdir: tblog + +checkpoint: + save_dir: ??? + save_interval: 4 + keep_last_epochs: 4 + save_interval_updates: 20000 + keep_interval_updates: -1 + keep_interval_updates_pattern: 50000 + # no_epoch_checkpoints: true + +distributed_training: + ddp_backend: no_c10d + distributed_backend: 'nccl' + distributed_world_size: 8 + nprocs_per_node: 8 + find_unused_parameters: true + +task: + _name: denoising + data: ??? 
+ mask: 0.15 + +dataset: + num_workers: 6 + max_tokens: 1400000 + skip_invalid_size_inputs_valid_test: true + validate_interval: ${checkpoint.save_interval} + validate_interval_updates: ${checkpoint.save_interval_updates} + required_batch_size_multiple: 1 + +criterion: + _name: sc2t + pred_masked_weight: 1.0 + pred_nomask_weight: 0.0 + loss_weights: [10,] + label_smoothing: 0.1 + text_weight: 0.1 + +optimization: + max_update: 400000 + lr: [0.0005] + clip_norm: 10.0 + +optimizer: + _name: adam + adam_betas: (0.9,0.98) + adam_eps: 1e-06 + weight_decay: 0.01 + +lr_scheduler: + _name: polynomial_decay + warmup_updates: 32000 + +model: + _name: stbert + label_rate: ??? + skip_masked: false + skip_nomask: false + mask_prob: 0.80 + extractor_mode: default + conv_feature_layers: '[(512,10,5)] + [(512,3,2)] * 4 + [(512,2,2)] * 2' + final_dim: 256 + encoder_layers: 6 + encoder_attention_heads: 8 + decoder_layerdrop: 0.05 + dropout_input: 0.1 + dropout_features: 0.1 + dropout: 0.1 + attention_dropout: 0.1 + feature_grad_mult: 0.1 + untie_final_proj: true + activation_dropout: 0.0 + use_rel_pos_enc: true + add_code_encoder: true + add_adaptor: false + text_transformer: + activation_fn: ${model.activation_fn} + dropout: ${model.dropout} + attention_dropout: ${model.attention_dropout} + activation_dropout: ${model.activation_dropout} + adaptive_input: ${model.adaptive_input} + max_source_positions: 3000 + checkpoint_activations: ${model.checkpoint_activations} + no_scale_embedding: false + layernorm_embedding: false + quant_noise: + pq: ${model.quant_noise_pq} + encoder: + embed_dim: 768 + ffn_embed_dim: 3072 + layers: 6 + attention_heads: 8 + normalize_before: false + learned_pos: true + layerdrop: ${model.encoder_layerdrop} + + +hydra: + job: + config: + override_dirname: + kv_sep: '-' + item_sep: '__' + exclude_keys: + - run + - task.data + - task.label_dir + run: + dir: ??? + sweep: + dir: ??? + subdir: ${hydra.job.config_name}__${hydra.job.override_dirname} diff --git a/Speech2S/speech2s/stpretrain_scripts/config/pretrain/sc2t_base_librispeech.yaml b/Speech2S/speech2s/stpretrain_scripts/config/pretrain/sc2t_base_librispeech.yaml new file mode 100644 index 0000000000000000000000000000000000000000..0cd16561c9d4715d21824cbbc7271940d3ceeda7 --- /dev/null +++ b/Speech2S/speech2s/stpretrain_scripts/config/pretrain/sc2t_base_librispeech.yaml @@ -0,0 +1,137 @@ +# @package _group_ + +common: + fp16: true + log_format: json + log_interval: 200 + seed: 1337 + tensorboard_logdir: tblog + +checkpoint: + save_dir: ??? + save_interval: 4 + keep_last_epochs: 4 + save_interval_updates: 20000 + keep_interval_updates: -1 + keep_interval_updates_pattern: 50000 + # no_epoch_checkpoints: true + +distributed_training: + ddp_backend: no_c10d + distributed_backend: 'nccl' + distributed_world_size: 8 + nprocs_per_node: 8 + find_unused_parameters: true + +task: + _name: joint_sc2t_pretraining + data: ??? + label_dir: ??? + labels: ??? + label_rate: ${model.label_rate} + sample_rate: 16000 + max_sample_size: 250000 + min_sample_size: 32000 + pad_audio: false + random_crop: true + normalize: false # must be consistent with extractor + add_decoder: true + text_cfg: + seed: ${common.seed} + text_data: ??? 
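+    # text_cfg fields correspond to TextPretrainingConfig in speech2s/tasks/joint_sc2t_pretrain.py;
+    # text_data is the directory with the binarized text corpus (see data_process/txt2idx.sh).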
+ sample_break_mode: eos + tokens_per_sample: 1024 + shorten_method: "random_crop" + text_maxtokens_ratio: 1.0 + + +dataset: + num_workers: 6 + max_tokens: 1400000 + skip_invalid_size_inputs_valid_test: true + validate_interval: ${checkpoint.save_interval} + validate_interval_updates: ${checkpoint.save_interval_updates} + required_batch_size_multiple: 1 + +criterion: + _name: sc2t + pred_masked_weight: 1.0 + pred_nomask_weight: 0.0 + loss_weights: [10,] + label_smoothing: 0.1 + text_weight: 0.1 + +optimization: + max_update: 400000 + lr: [0.0005] + clip_norm: 10.0 + +optimizer: + _name: adam + adam_betas: (0.9,0.98) + adam_eps: 1e-06 + weight_decay: 0.01 + +lr_scheduler: + _name: polynomial_decay + warmup_updates: 32000 + +model: + _name: stbert + label_rate: ??? + skip_masked: false + skip_nomask: false + mask_prob: 0.80 + extractor_mode: default + conv_feature_layers: '[(512,10,5)] + [(512,3,2)] * 4 + [(512,2,2)] * 2' + final_dim: 256 + encoder_layers: 6 + encoder_attention_heads: 8 + decoder_layerdrop: 0.05 + dropout_input: 0.1 + dropout_features: 0.1 + dropout: 0.1 + attention_dropout: 0.1 + feature_grad_mult: 0.1 + untie_final_proj: true + activation_dropout: 0.0 + use_rel_pos_enc: true + add_code_encoder: true + add_adaptor: false + text_transformer: + activation_fn: ${model.activation_fn} + dropout: ${model.dropout} + attention_dropout: ${model.attention_dropout} + activation_dropout: ${model.activation_dropout} + adaptive_input: ${model.adaptive_input} + max_source_positions: 3000 + checkpoint_activations: ${model.checkpoint_activations} + no_scale_embedding: false + layernorm_embedding: false + quant_noise: + pq: ${model.quant_noise_pq} + encoder: + embed_dim: 768 + ffn_embed_dim: 3072 + layers: 6 + attention_heads: 8 + normalize_before: false + learned_pos: true + layerdrop: ${model.encoder_layerdrop} + + +hydra: + job: + config: + override_dirname: + kv_sep: '-' + item_sep: '__' + exclude_keys: + - run + - task.data + - task.label_dir + run: + dir: ??? + sweep: + dir: ??? 
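+    # override_dirname (built from the non-excluded overrides listed above) names the sweep subdir;
+    # path-like keys are excluded so the generated directory names stay short.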
+ subdir: ${hydra.job.config_name}__${hydra.job.override_dirname} diff --git a/Speech2S/speech2s/stpretrain_scripts/config/translation/text2code.yaml b/Speech2S/speech2s/stpretrain_scripts/config/translation/text2code.yaml new file mode 100644 index 0000000000000000000000000000000000000000..bed25135e0da21c20d33475ad33437c63e6703d7 --- /dev/null +++ b/Speech2S/speech2s/stpretrain_scripts/config/translation/text2code.yaml @@ -0,0 +1,81 @@ +# @package _group_ + +common: + fp16: true + log_format: json + log_interval: 200 + tensorboard_logdir: tblog + seed: 1337 + +checkpoint: + save_interval: 1000000 + keep_last_epochs: 5 + save_interval_updates: 1000 + keep_interval_updates_pattern: 10000 + keep_interval_updates: 5 + best_checkpoint_metric: accuracy + maximize_best_checkpoint_metric: true + +distributed_training: + ddp_backend: c10d + find_unused_parameters: true + distributed_world_size: 1 + nprocs_per_node: 8 + + +criterion: + _name: "label_smoothed_cross_entropy" + + +task: + _name: "translation_from_jst" + +dataset: + num_workers: 0 + max_tokens: 4096 + skip_invalid_size_inputs_valid_test: true + validate_after_updates: ${model.freeze_finetune_updates} + validate_interval: ${checkpoint.save_interval} + validate_interval_updates: ${checkpoint.save_interval_updates} + train_subset: train_clean_100 + valid_subset: dev_clean + required_batch_size_multiple: 1 + +optimizer: + _name: adam + adam_betas: (0.9,0.98) + adam_eps: 1e-06 + weight_decay: 0.0 + +lr_scheduler: + _name: tri_stage + phase_ratio: [0.1, 0.4, 0.5] + final_lr_scale: 0.05 + +model: + _name: hubert_t2c + w2v_path: ??? + layerdrop: 0.1 + decoder_layerdrop: 0.1 + activation_dropout: 0.1 + feature_grad_mult: 0.0 + freeze_finetune_updates: 0 + +hydra: + job: + config: + override_dirname: + kv_sep: '-' + item_sep: '__' + exclude_keys: + - run + - task.data + - task.label_dir + - model.w2v_path + - dataset.train_subset + - dataset.valid_subset + run: + dir: ??? + sweep: + dir: ??? + subdir: ${hydra.job.config_name}__${hydra.job.override_dirname} diff --git a/Speech2S/speech2s/stpretrain_scripts/config_mbart.yaml b/Speech2S/speech2s/stpretrain_scripts/config_mbart.yaml new file mode 100644 index 0000000000000000000000000000000000000000..51025f2f8ec584a888a4e07c8c246829351af948 --- /dev/null +++ b/Speech2S/speech2s/stpretrain_scripts/config_mbart.yaml @@ -0,0 +1,120 @@ +# @package _group_ + +common: + fp16: true + log_format: json + log_interval: 200 + seed: 1337 + tensorboard_logdir: tblog + +checkpoint: + save_dir: ??? + save_interval: 4 + keep_last_epochs: 4 + save_interval_updates: 20000 + keep_interval_updates: -1 + keep_interval_updates_pattern: 50000 + # no_epoch_checkpoints: true + +distributed_training: + ddp_backend: no_c10d + distributed_backend: 'nccl' + distributed_world_size: 8 + nprocs_per_node: 8 + find_unused_parameters: true + +task: + _name: denoising + data: ??? 
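+  # NOTE: this config is identical to stpretrain_scripts/config/pretrain/mbart.yaml; keep the two copies in sync.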
+ mask: 0.15 + +dataset: + num_workers: 6 + max_tokens: 1400000 + skip_invalid_size_inputs_valid_test: true + validate_interval: ${checkpoint.save_interval} + validate_interval_updates: ${checkpoint.save_interval_updates} + required_batch_size_multiple: 1 + +criterion: + _name: sc2t + pred_masked_weight: 1.0 + pred_nomask_weight: 0.0 + loss_weights: [10,] + label_smoothing: 0.1 + text_weight: 0.1 + +optimization: + max_update: 400000 + lr: [0.0005] + clip_norm: 10.0 + +optimizer: + _name: adam + adam_betas: (0.9,0.98) + adam_eps: 1e-06 + weight_decay: 0.01 + +lr_scheduler: + _name: polynomial_decay + warmup_updates: 32000 + +model: + _name: stbert + label_rate: ??? + skip_masked: false + skip_nomask: false + mask_prob: 0.80 + extractor_mode: default + conv_feature_layers: '[(512,10,5)] + [(512,3,2)] * 4 + [(512,2,2)] * 2' + final_dim: 256 + encoder_layers: 6 + encoder_attention_heads: 8 + decoder_layerdrop: 0.05 + dropout_input: 0.1 + dropout_features: 0.1 + dropout: 0.1 + attention_dropout: 0.1 + feature_grad_mult: 0.1 + untie_final_proj: true + activation_dropout: 0.0 + use_rel_pos_enc: true + add_code_encoder: true + add_adaptor: false + text_transformer: + activation_fn: ${model.activation_fn} + dropout: ${model.dropout} + attention_dropout: ${model.attention_dropout} + activation_dropout: ${model.activation_dropout} + adaptive_input: ${model.adaptive_input} + max_source_positions: 3000 + checkpoint_activations: ${model.checkpoint_activations} + no_scale_embedding: false + layernorm_embedding: false + quant_noise: + pq: ${model.quant_noise_pq} + encoder: + embed_dim: 768 + ffn_embed_dim: 3072 + layers: 6 + attention_heads: 8 + normalize_before: false + learned_pos: true + layerdrop: ${model.encoder_layerdrop} + + +hydra: + job: + config: + override_dirname: + kv_sep: '-' + item_sep: '__' + exclude_keys: + - run + - task.data + - task.label_dir + run: + dir: ??? + sweep: + dir: ??? + subdir: ${hydra.job.config_name}__${hydra.job.override_dirname} diff --git a/Speech2S/speech2s/stpretrain_scripts/data_process/extract_hubert_feature_itp.sh b/Speech2S/speech2s/stpretrain_scripts/data_process/extract_hubert_feature_itp.sh new file mode 100644 index 0000000000000000000000000000000000000000..52929896c612957d7fc8df452015411b0e6038bc --- /dev/null +++ b/Speech2S/speech2s/stpretrain_scripts/data_process/extract_hubert_feature_itp.sh @@ -0,0 +1,41 @@ + +if [ ! 
-d ${HOME}/azcopy_linux_amd64_10.11.0 ]; then + CURRENT_DIR=`pwd` + cd ${HOME} && wget https://azcopyvnext.azureedge.net/release20210616/azcopy_linux_amd64_10.11.0.tar.gz && tar -zxvf azcopy_linux_amd64_10.11.0.tar.gz && rm -f azcopy_linux_amd64_10.11.0.tar.gz && cd ${CURRENT_DIR} +fi +export PATH=$PATH:${HOME}/azcopy_linux_amd64_10.11.0/:${HOME}/.local/bin +export PYTHONPATH=$PYTHONPATH:/mnt/output/users/v-kunwei/code/fairseq + +rank=$1 +nshard=$2 +split=$3 +[ -z $rank ] && echo "please specify rank" +[ -z $nshard ] && nshard=1 +[ -z $split ] && split="train" + + +FAIRSEQ_ROOT=/mnt/output/users/v-kunwei/code/fairseq +ckpt_path=/mnt/output/users/v-kunwei/code/fairseq/examples/speech_to_speech/mhubert_base_vp_en_es_fr_it3.pt +tsv_dir=/home/v-kunwei + +feat_dir=${HOME}/$split +python $FAIRSEQ_ROOT/examples/hubert/simple_kmeans/dump_hubert_feature.py ${tsv_dir} ${split} ${ckpt_path} 9 ${nshard} ${rank} ${feat_dir} || exit 1 + + +echo "-------------------------------------------------------------------------------------------" +echo "---------------------------------- done ---------------------------------------------" +echo "-------------------------------------------------------------------------------------------" + +km_path=/mnt/output/users/v-kunwei/code/fairseq/examples/speech_to_speech/mhubert_base_vp_en_es_fr_it3_L11_km1000.bin +lab_dir=${HOME}/${split} +python $FAIRSEQ_ROOT/examples/hubert/simple_kmeans/dump_km_label.py ${feat_dir} ${split} ${km_path} ${nshard} ${rank} ${lab_dir} + + +# sas="?sv=2020-08-04&st=2022-01-02T04%3A58%3A15Z&se=2022-06-01T04%3A58%3A00Z&sr=c&sp=racwdl&sig=NyZKOHivgesEoZ8yvLsVT6aZMYQZMevLLmXNOTaWyvU%3D" +# blob="https://msranlcmtteamdrive.blob.core.windows.net/teamdrive/v-ziqzhang/data/stbert/data/librispeech/libri_960/hubert_release_iter2_layer9_kmeans/${split}" +# azcopy copy $feat_dir/${split}_${rank}_${nshard}.len "$blob/$sas" +# azcopy copy $feat_dir/${split}_${rank}_${nshard}.npy "$blob/$sas" +# azcopy copy $lab_dir "$blob/$sas" --recursive + + + diff --git a/Speech2S/speech2s/stpretrain_scripts/data_process/merge_code.py b/Speech2S/speech2s/stpretrain_scripts/data_process/merge_code.py new file mode 100644 index 0000000000000000000000000000000000000000..a02ba3e3058b75e2e603d7470e9ef93beebabcfa --- /dev/null +++ b/Speech2S/speech2s/stpretrain_scripts/data_process/merge_code.py @@ -0,0 +1,14 @@ +import sys +import torch + + +def main(): + for line in sys.stdin: + line = line.rstrip() + codes = list(map(int, line.split())) + merged_codes = torch.unique_consecutive(torch.tensor(codes)).numpy() + merged_codes = map(str, merged_codes) + print(" ".join(merged_codes)) + +if __name__ == "__main__": + main() diff --git a/Speech2S/speech2s/stpretrain_scripts/data_process/txt2idx.sh b/Speech2S/speech2s/stpretrain_scripts/data_process/txt2idx.sh new file mode 100644 index 0000000000000000000000000000000000000000..466f8a3ef8debba9c9f5a76cfb02d1e25217c6b4 --- /dev/null +++ b/Speech2S/speech2s/stpretrain_scripts/data_process/txt2idx.sh @@ -0,0 +1,43 @@ +[ $# -lt 3 ] && echo "Usage: $0 " && exit 0 + +if [ ! -d ${HOME}/sentencepiece ]; then + CURRENT_DIR=`pwd` + cd ${HOME} + git clone https://github.com/google/sentencepiece.git + cd sentencepiece + mkdir build && cd build + cmake .. 
&& make -j 16 + sudo make install + sudo ldconfig -v + cd ${HOME} + cd ${CURRENT_DIR} +fi + +input=$1 +outdir=$2 +DICT=$3 +suffix=$4 +outname=${input##*/} +outname=${outname%.txt*} +[ -z $input ] && echo "You must specify a source file" && exit 1 + +[ -z $DICT ] && echo "No dict was specified!" && exit 1 +[ -z $outdir ] && outdir=${input%/*} +[ -z $outdir ] && outdir="." +[ ! -d $outdir ] && mkdir -p $outdir + +echo "Dict : $DICT" +echo "------------------------------- creating idx/bin--------------------------------------------" +echo "$input --> $outdir/${outname}${suffix}.idx" +fairseq-preprocess \ + --only-source \ + --trainpref $input \ + --destdir $outdir \ + --thresholdsrc 0 \ + --srcdict ${DICT} \ + --workers 40 + +mv $outdir/train.idx $outdir/${outname}${suffix}.idx +mv $outdir/train.bin $outdir/${outname}${suffix}.bin +echo "----------------------------------- done --------------------------------------------" + diff --git a/Speech2S/speech2s/stpretrain_scripts/data_process/txt2spm.sh b/Speech2S/speech2s/stpretrain_scripts/data_process/txt2spm.sh new file mode 100644 index 0000000000000000000000000000000000000000..6baf72227b4013512af8a6724d2bff2156a47078 --- /dev/null +++ b/Speech2S/speech2s/stpretrain_scripts/data_process/txt2spm.sh @@ -0,0 +1,33 @@ +[ $# -lt 2 ] && echo "Usage: $0 " && exit 0 + +if [ ! -d ${HOME}/sentencepiece ]; then + CURRENT_DIR=`pwd` + cd ${HOME} + git clone https://github.com/google/sentencepiece.git + cd sentencepiece + mkdir build && cd build + cmake .. && make -j 16 + sudo make install + sudo ldconfig -v + cd ${HOME} + cd ${CURRENT_DIR} +fi + +input=$1 +outdir=$2 +MODEL=$3 +suffix=$4 +outname=${input##*/} +outname=${outname%.wrd*} +[ -z $input ] && echo "You must specify a source file" && exit 1 + +[ -z $MODEL ] && MODEL=/mnt/default/v-ziqzhang/data/stbert/data/librispeech/hubert_release_iter2_layer9_kmeans/spm_unigram_10000.model && echo "No spm model was specified!, set default to $MODEL" +[ -z $outdir ] && outdir=${input%/*} +[ -z $outdir ] && outdir="." +[ ! 
-d $outdir ] && mkdir -p $outdir + +echo "Output: $outdir/$outname.spm" + +echo "------------------------------- tokenize text...--------------------------------------------" +spm_encode --model=$MODEL < ${input} > $outdir/$outname.spm || exit 1 +echo "----------------------------------- done --------------------------------------------" diff --git a/Speech2S/speech2s/stpretrain_scripts/data_process/wmt/normalize_en_text.py b/Speech2S/speech2s/stpretrain_scripts/data_process/wmt/normalize_en_text.py new file mode 100644 index 0000000000000000000000000000000000000000..83e332575ba317ded70c4095eeebbc5ec588b965 --- /dev/null +++ b/Speech2S/speech2s/stpretrain_scripts/data_process/wmt/normalize_en_text.py @@ -0,0 +1,46 @@ +import re +import sys +import regex +import argparse +from tqdm import tqdm +from num2words import num2words + +def writefile(filename, lines): + with open(filename, 'w', encoding='utf-8') as f: + f.writelines(lines) + +def main(): + parser = argparse.ArgumentParser() + parser.add_argument("--input", "-i", required=True, type=str) + parser.add_argument("--output", "-o", required=True, type=str) + args = parser.parse_args() + outlines = [] + + with open(f"{args.input}", 'r') as f: + inputs = f.readlines() + + for line in tqdm(inputs): + line = line.strip().upper() + line = re.sub(u"([^\u0041-\u005a\u0061-\u007a\u0030-\u0039\'])", " ", line) + items = [] + for item in line.split(): + if item.isdigit(): + try: + item = num2words(item) + except Exception as e: + print(line) + raise(e) + items.append(item) + line = " ".join(items) + line = line.replace("-", " ") + line = line.upper() + line = line.replace("' S", "'S") + line = line.replace(" ", "|") + line = " ".join(line) + " |" + outlines.append(line + '\n') + # print(line) + + writefile(args.output, outlines) + +if __name__ == "__main__": + main() diff --git a/Speech2S/speech2s/stpretrain_scripts/data_process/wmt/normalize_es_text.py b/Speech2S/speech2s/stpretrain_scripts/data_process/wmt/normalize_es_text.py new file mode 100644 index 0000000000000000000000000000000000000000..0136b534be0bf4fef1c84b51c83a7ac9ad437700 --- /dev/null +++ b/Speech2S/speech2s/stpretrain_scripts/data_process/wmt/normalize_es_text.py @@ -0,0 +1,49 @@ +import re +import sys +import regex +import argparse +import re,string +from tqdm import tqdm +from num2words import num2words + +def writefile(filename, lines): + with open(filename, 'w', encoding='utf-8') as f: + f.writelines(lines) + +def main(): + parser = argparse.ArgumentParser() + parser.add_argument("--input", "-i", required=True, type=str) + parser.add_argument("--output", "-o", required=True, type=str) + args = parser.parse_args() + outlines = [] + + with open(f"{args.input}", 'r') as f: + inputs = f.readlines() + + for line in tqdm(inputs): + line = line.strip() + line = re.sub(u"([^\u0041-\u005a\u0061-\u007a\u0030-\u0039\u00d1\u00f1\'])", " ", line) + items = [] + punc='~`!#$%^&*()_+-=|\';":/.,?><~.' 
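+        # digits are spelled out with num2words(lang='es'), the characters in `punc` are stripped,
+        # and the text is emitted letter-by-letter with '|' as the word separator (fairseq "ltr" style).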
+ for item in line.split(): + if item.isdigit(): + try: + item = num2words(item, lang='es') + except Exception as e: + print(line) + raise(e) + items.append(item) + line = " ".join(items) + line = (re.sub(r"[%s]+" %punc, "",line)) + line = line.replace("-", " ") + line = line.lower() + line = line.replace("' S", "'S") + line = line.replace(" ", "|") + line = " ".join(line) + " |" + outlines.append(line + '\n') + # print(line) + + writefile(args.output, outlines) + +if __name__ == "__main__": + main() diff --git a/Speech2S/speech2s/stpretrain_scripts/decode_text2code_beam2.sh b/Speech2S/speech2s/stpretrain_scripts/decode_text2code_beam2.sh new file mode 100644 index 0000000000000000000000000000000000000000..c9dcc10425a3a519ec456c73d15f3339de2a0eba --- /dev/null +++ b/Speech2S/speech2s/stpretrain_scripts/decode_text2code_beam2.sh @@ -0,0 +1,50 @@ + +##################################### +# Hubert ED model # +##################################### +[ $# -lt 1 ] && echo "Usage: $0 " && exit 0 +#source /mnt/default/v-ziqzhang/.bashrc_sing + +model_path=$1 +gen_set=$2 +tgt=$3 +src="ltr" +max_tokens=$4 +word_size=$5 +rank=$6 +outdir=$7 + +[ -z $tgt ] && tgt="kmu" +[ -z $gen_set ] && gen_set="dev_clean" +[ -z $word_size ] && word_size=1 +[ -z $rank ] && rank=0 +[ -z $max_tokens ] && max_tokens=16000 + +FAIRSEQ_ROOT=/mnt/output/users/v-kunwei/code/fairseq_mlstku +DATA_DIR=/home/v-kunwei/ +[ $gen_set == "test" ] && DATA_DIR=/mnt/output/users/v-kunwei/code/fairseq_mlstku +[ -z $outdir ] && outdir=$DATA_DIR + + +results_path=$outdir/pseudo_${gen_set}_${rank} +[ ! -d $results_path ] && mkdir -p $results_path + +for subset in $gen_set; do + python $FAIRSEQ_ROOT/fairseq_cli/generate_mt_label.py $DATA_DIR \ + --path ${model_path} \ + --task "translation_from_jst" \ + --max-target-positions 18000 \ + --gen-subset $subset \ + -t $tgt -s "ltr" \ + --dataset-impl "raw" \ + --max-tokens ${max_tokens} \ + --beam 2 \ + --max-len-a 3 --max-len-b 100 \ + --results-path $results_path \ + --distributed-world-size $word_size --distributed-rank $rank \ + + echo "$model" > $results_path/model.record + sleep 1s +done | tee $results_path/decode.log + +sleep 2s diff --git a/Speech2S/speech2s/stpretrain_scripts/eval2.sh b/Speech2S/speech2s/stpretrain_scripts/eval2.sh new file mode 100644 index 0000000000000000000000000000000000000000..0736ef4e338c9837cafc61d3c903d4683d684ea9 --- /dev/null +++ b/Speech2S/speech2s/stpretrain_scripts/eval2.sh @@ -0,0 +1,12 @@ +lmweight=0 +num_gpus=8 +python examples/speech_recognition/new/infer.py --config-dir /mnt/output/users/v-kunwei/code/fairseq/examples/speech_recognition/new/conf \ +--config-name infer task=audio_finetuning task.data=/home/v-kunwei common.user_dir=/mnt/output/users/v-kunwei/code/fairseq/examples/data2vec \ +task.labels=ltr decoding.type=viterbi \ +decoding.lexicon=models/es_eval/espeak_dict.txt \ +decoding.unique_wer_file=True \ +dataset.gen_subset=test \ +common_eval.path=/mnt/output/users/v-kunwei/code/fairseq/models/es_eval/espeak_26lang_m10.pt decoding.beam=1500 distributed_training.distributed_world_size=${num_gpus} \ +decoding.results_path=/home/v-kunwei + +#sclite -h "/home/v-kunwei/hypo.units" -r "/home/v-kunwei/ref.units" -i rm -o all stdout > "./result.txt" diff --git a/Speech2S/speech2s/stpretrain_scripts/eval3.sh b/Speech2S/speech2s/stpretrain_scripts/eval3.sh new file mode 100644 index 0000000000000000000000000000000000000000..4a2354319ddc7a672506e92e7577d3dc978b47a8 --- /dev/null +++ b/Speech2S/speech2s/stpretrain_scripts/eval3.sh @@ -0,0 +1,4 @@ 
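+# CTC decoding with the legacy examples/speech_recognition/infer.py entry point; eval2.sh runs the
+# same espeak_26lang checkpoint through the hydra-based infer.py (viterbi decoding) instead.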
+#$subset=test +python examples/speech_recognition/infer.py /home/v-kunwei --task audio_finetuning \ +--nbest 1 --path /mnt/output/users/v-kunwei/code/fairseq/models/es_eval/espeak_26lang_m10.pt --gen-subset test --results-path /home/v-kunwei --criterion ctc --labels ltr --max-tokens 4000000 \ +--post-process letter diff --git a/Speech2S/speech2s/stpretrain_scripts/finetune_enes.sh b/Speech2S/speech2s/stpretrain_scripts/finetune_enes.sh new file mode 100644 index 0000000000000000000000000000000000000000..eaae1476bc5f80640abee6a85bdd1f453c15d97a --- /dev/null +++ b/Speech2S/speech2s/stpretrain_scripts/finetune_enes.sh @@ -0,0 +1,85 @@ +# #################################### +# Hubert ED model # +# #################################### +#source /mnt/default/v-ziqzhang/.bashrc_sing + +[ $# -lt 4 ] && echo "Usage: $0 " && exit 0 +world_size=$1 +update_freq=$2 +w2v_path=$3 +cpt=$4 +Mount=$5 + +[ -z $world_size ] && world_size=8 +[ -z $update_freq ] && update_freq=3 +[ -z $w2v_path ] && echo "you must specify a wav_path !" && exit 1 +[ -z $cpt ] && cpt=030.pt +[ -z $Mount ] && Mount=/mnt/default + + +FAIRSEQ_ROOT=/mnt/output/users/v-kunwei/code/fairseq_mlstku +CONFIG_DIR=/mnt/output/users/v-kunwei/code/stpretrain_scripts/config +DATA_DIR="/mnt/output/users/v-kunwei/data/s2s_data/fin_enes100" + +exp_name=${w2v_path%/*} +exp_name=${exp_name##*/} +MODEL_DIR="/mnt/output/users/v-kunwei/data/s2s_data/finetune/tune_ST_from_eneshu" +exp_name="tune_enes_lr5e-5_from_$cpt" +MODEL_DIR=$MODEL_DIR/$exp_name +[ -d $MODEL_DIR ] || mkdir -p $MODEL_DIR + +max_tokens=490000 + +python $FAIRSEQ_ROOT/fairseq_cli/hydra_train.py \ + --config-dir $CONFIG_DIR/finetune_asr \ + --config-name base_100h \ + \ + +task.store_labels=true \ + task.labels='["spm"]' \ + task.data=$DATA_DIR \ + task.label_dir=$DATA_DIR \ + task.add_decoder=true \ + +task.max_keep_size=490000 \ + \ + +model.reuse_text_emb=true \ + model._name="stbert_st" \ + model.w2v_path=${w2v_path} \ + model.add_decoder=true \ + \ + criterion._name="label_smoothed_cross_entropy" \ + +criterion.label_smoothing=0.2 \ + +criterion.report_accuracy=true \ + \ + lr_scheduler._name="polynomial_decay" \ + +lr_scheduler.warmup_updates=20000 \ + \ + optimization.lr=[0.0003] \ + optimization.max_update=100000 \ + checkpoint.best_checkpoint_metric="accuracy" \ + checkpoint.maximize_best_checkpoint_metric=true \ + checkpoint.save_interval=1 \ + \ + dataset.train_subset="train" \ + dataset.valid_subset="valid" \ + dataset.max_tokens=$max_tokens \ + optimization.update_freq=[${update_freq}] \ + \ + distributed_training.distributed_world_size=${world_size} \ + distributed_training.distributed_port=-1 \ + \ + common.log_interval=100 \ + common.tensorboard_logdir=$MODEL_DIR \ + checkpoint.save_dir=$MODEL_DIR \ + hydra.run.dir=$MODEL_DIR \ + hydra.job.name=${exp_name} + + + +sleep 20s + + # \ + # lr_scheduler._name="polynomial_decay" \ + # +lr_scheduler.warmup_updates=5000 \ + + +# /mnt/default/v-ziqzhang/data/stbert-ed/exp/ST_enes/sc2t_base_ende_32gpu_1accum/checkpoint_204_400000.pt diff --git a/Speech2S/speech2s/stpretrain_scripts/finetune_esen.sh b/Speech2S/speech2s/stpretrain_scripts/finetune_esen.sh new file mode 100644 index 0000000000000000000000000000000000000000..a9051f67008817d200c797b67ee4919ed5e2797a --- /dev/null +++ b/Speech2S/speech2s/stpretrain_scripts/finetune_esen.sh @@ -0,0 +1,85 @@ +# #################################### +# Hubert ED model # +# #################################### +#source /mnt/default/v-ziqzhang/.bashrc_sing + +[ $# -lt 4 ] && echo "Usage: $0 " 
&& exit 0 +world_size=$1 +update_freq=$2 +w2v_path=$3 +cpt=$4 +Mount=$5 + +[ -z $world_size ] && world_size=1 +[ -z $update_freq ] && update_freq=1 +[ -z $w2v_path ] && echo "you must specify a wav_path !" && exit 1 +[ -z $cpt ] && cpt=030.pt +[ -z $Mount ] && Mount=/mnt/default + + +FAIRSEQ_ROOT=/mnt/output/users/v-kunwei/code/fairseq_mlstku +CONFIG_DIR=/mnt/output/users/v-kunwei/code/stpretrain_scripts/config +DATA_DIR="/mnt/output/users/v-kunwei/data/s2s_data/fin_esen" + +exp_name=${w2v_path%/*} +exp_name=${exp_name##*/} +MODEL_DIR="/mnt/output/users/v-kunwei/data/s2s_data/finetune/tune_ST_from_esen" +exp_name="tune_esen_lr5e-5_from_$cpt" +MODEL_DIR=$MODEL_DIR/$exp_name +[ -d $MODEL_DIR ] || mkdir -p $MODEL_DIR + +max_tokens=4900 + +python $FAIRSEQ_ROOT/fairseq_cli/hydra_train.py \ + --config-dir $CONFIG_DIR/finetune_asr \ + --config-name base_100h \ + \ + +task.store_labels=true \ + task.labels='["spm"]' \ + task.data=$DATA_DIR \ + task.label_dir=$DATA_DIR \ + task.add_decoder=true \ + +task.max_keep_size=4900 \ + \ + +model.reuse_text_emb=true \ + model._name="stbert_st" \ + model.w2v_path=${w2v_path} \ + model.add_decoder=true \ + \ + criterion._name="label_smoothed_cross_entropy" \ + +criterion.label_smoothing=0.2 \ + +criterion.report_accuracy=true \ + \ + lr_scheduler._name="polynomial_decay" \ + +lr_scheduler.warmup_updates=20000 \ + \ + optimization.lr=[0.0002] \ + optimization.max_update=100000 \ + checkpoint.best_checkpoint_metric="accuracy" \ + checkpoint.maximize_best_checkpoint_metric=true \ + checkpoint.save_interval=1 \ + \ + dataset.train_subset="train" \ + dataset.valid_subset="valid" \ + dataset.max_tokens=$max_tokens \ + optimization.update_freq=[${update_freq}] \ + \ + distributed_training.distributed_world_size=${world_size} \ + distributed_training.distributed_port=-1 \ + \ + common.log_interval=100 \ + common.tensorboard_logdir=$MODEL_DIR \ + checkpoint.save_dir=$MODEL_DIR \ + hydra.run.dir=$MODEL_DIR \ + hydra.job.name=${exp_name} + + + +sleep 20s + + # \ + # lr_scheduler._name="polynomial_decay" \ + # +lr_scheduler.warmup_updates=5000 \ + + +# /mnt/default/v-ziqzhang/data/stbert-ed/exp/ST_enes/sc2t_base_ende_32gpu_1accum/checkpoint_204_400000.pt diff --git a/Speech2S/speech2s/stpretrain_scripts/inference_ed.sh b/Speech2S/speech2s/stpretrain_scripts/inference_ed.sh new file mode 100644 index 0000000000000000000000000000000000000000..3fd9ef1231c827d980077a30b278b8986d31c4d7 --- /dev/null +++ b/Speech2S/speech2s/stpretrain_scripts/inference_ed.sh @@ -0,0 +1,38 @@ +##################################### +# Hubert base model # +##################################### +[ $# -lt 1 ] && echo "Usage: $0 " && exit 0 + +model_path=$1 +src_dir=${model_path%/*} +cpt=${model_path##*/} +cpt=${cpt%.*} + +#beam_size=$2 +gen_set=$2 +#lang=$4 +[ -z $gen_set ] && gen_set="test_et" +[ -z $beam_size ] && beam_size=2 +[ -z $lang ] && lang="fr" + + +#DATA_DIR=/mnt/output/users/v-kunwei/data/s2s_data/fin_enes +DATA_DIR=/home/v-kunwei +FAIRSEQ_ROOT=/mnt/output/users/v-kunwei/code/fairseq_mlstku + +for subset in $gen_set; do + results_path=$src_dir/decode_${cpt}_beam${beam_size}/${subset} + [ ! 
-d $results_path ] && mkdir -p $results_path + + python $FAIRSEQ_ROOT/fairseq_cli/generate.py \ + $DATA_DIR --label-dir ${DATA_DIR} \ + --labels '["spm"]' --gen-subset ${subset} \ + --max-tokens 9000000 --task hubert_pretraining \ + --add-decoder --fine-tuning --random-crop \ + --path ${model_path} --results-path /home/v-kunwei --scoring sacrebleu \ + --max-len-a 0 --max-len-b 900 \ + --beam 10 --single-target + + tail -n 1 /home/v-kunwei/generate-*.txt + sleep 1s +done diff --git a/Speech2S/speech2s/stpretrain_scripts/train_text2code/base_ReleaseIter2_text2unicode_from400k.sh b/Speech2S/speech2s/stpretrain_scripts/train_text2code/base_ReleaseIter2_text2unicode_from400k.sh new file mode 100644 index 0000000000000000000000000000000000000000..34d1594d8fda2954b8a70dbdfc059402571d70ee --- /dev/null +++ b/Speech2S/speech2s/stpretrain_scripts/train_text2code/base_ReleaseIter2_text2unicode_from400k.sh @@ -0,0 +1,70 @@ +##################################### +# Hubert mt model # +##################################### +[ $# -gt 3 ] && echo "Usage: $0 " && exit 0 +world_size=$1 +update_freq=$2 +w2v_path=$3 +Mount="" + +[ -z $world_size ] && world_size=8 +[ -z $update_freq ] && update_freq=1 +[ -z $w2v_path ] && w2v_path="/mnt/output/users/v-kunwei/data/s2s_data/model_wo_emb_32_1004.pt" + + +langs="ltr,kmu" +FAIRSEQ_ROOT=/mnt/output/users/v-kunwei/code/fairseq_mlstku +CONFIG_ROOT=/mnt/output/users/v-kunwei/code/stpretrain_scripts/config/translation +DATA_DIR=/mnt/output/users/v-kunwei/data/s2s_data/en_asr_data/ + +### set save-dir +MODEL_DIR="/mnt/output/users/v-kunwei/data/s2s_data/exp/text2unicode_en" +exp_name="base_pt400k_releaseiter2_${world_size}gpu_${update_freq}accum_lr1e-4_alll" +MODEL_DIR=$MODEL_DIR/$exp_name +[ -d $MODEL_DIR ] || mkdir -p $MODEL_DIR + + +python $FAIRSEQ_ROOT/fairseq_cli/hydra_train.py \ + --config-dir $CONFIG_ROOT \ + --config-name text2code \ + +task.data=$DATA_DIR \ + dataset.dataset_impl="raw" \ + +task.source_lang="ltr" +task.target_lang="kmu" \ + +task.normalize=false \ + \ + +criterion.label_smoothing=0.1 \ + +criterion.report_accuracy=true \ + optimizer.weight_decay=0.00001 \ + +lr_scheduler.lr="[0.0001]" \ + optimization.max_update=500000 \ + \ + +model.dropout=0.1 \ + +model.attention_dropout=0.1 \ + model.activation_dropout=0.1 \ + model.decoder_layerdrop=0 \ + model.layerdrop=0 \ + model.w2v_path=$w2v_path \ + +model.text_transformer_encoder_layers=6 \ + \ + dataset.train_subset="en_train" \ + dataset.valid_subset="en_dev" \ + optimization.update_freq=[${update_freq}] \ + optimization.clip_norm=5 \ + \ + common.seed=222 \ + common.log_interval=100 \ + common.log_format="json" \ + \ + distributed_training.distributed_world_size=${world_size} \ + distributed_training.nprocs_per_node=8 \ + distributed_training.ddp_backend="legacy_ddp" \ + \ + common.tensorboard_logdir=$MODEL_DIR \ + checkpoint.save_dir=$MODEL_DIR \ + hydra.run.dir=$MODEL_DIR \ + hydra.job.name=${exp_name} \ + +sleep 10s + # sleep infinity + + diff --git a/Speech2S/speech2s/stpretrain_scripts/train_text2code/base_ReleaseIter2_text2unicode_from400k_es.sh b/Speech2S/speech2s/stpretrain_scripts/train_text2code/base_ReleaseIter2_text2unicode_from400k_es.sh new file mode 100644 index 0000000000000000000000000000000000000000..1caf2f97f4b01def88b91d8a8422588f4f7a26d5 --- /dev/null +++ b/Speech2S/speech2s/stpretrain_scripts/train_text2code/base_ReleaseIter2_text2unicode_from400k_es.sh @@ -0,0 +1,70 @@ +##################################### +# Hubert mt model # +##################################### +[ $# -gt 
3 ] && echo "Usage: $0 " && exit 0 +world_size=$1 +update_freq=$2 +w2v_path=$3 +Mount="" + +[ -z $world_size ] && world_size=8 +[ -z $update_freq ] && update_freq=1 +[ -z $w2v_path ] && w2v_path="/mnt/output/users/v-kunwei/data/s2s_data/model_es_emb_90_1004.pt" + + +langs="ltr,kmu" +FAIRSEQ_ROOT=/mnt/output/users/v-kunwei/code/fairseq_mlstku +CONFIG_ROOT=/mnt/output/users/v-kunwei/code/stpretrain_scripts/config/translation +DATA_DIR=/mnt/output/users/v-kunwei/data/s2s_data/es_no_data/ + +### set save-dir +MODEL_DIR="/mnt/output/users/v-kunwei/data/s2s_data/exp/text2unicode_es" +exp_name="base_pt400k_releaseiter2_${world_size}gpu_${update_freq}accum_lr1e-4_no" +MODEL_DIR=$MODEL_DIR/$exp_name +[ -d $MODEL_DIR ] || mkdir -p $MODEL_DIR + + +python $FAIRSEQ_ROOT/fairseq_cli/hydra_train.py \ + --config-dir $CONFIG_ROOT \ + --config-name text2code \ + +task.data=$DATA_DIR \ + dataset.dataset_impl="raw" \ + +task.source_lang="ltr" +task.target_lang="kmu" \ + +task.normalize=false \ + \ + +criterion.label_smoothing=0.1 \ + +criterion.report_accuracy=true \ + optimizer.weight_decay=0.00001 \ + +lr_scheduler.lr="[0.0001]" \ + optimization.max_update=500000 \ + \ + +model.dropout=0.1 \ + +model.attention_dropout=0.1 \ + model.activation_dropout=0.1 \ + model.decoder_layerdrop=0 \ + model.layerdrop=0 \ + model.w2v_path=$w2v_path \ + +model.text_transformer_encoder_layers=6 \ + \ + dataset.train_subset="es_train" \ + dataset.valid_subset="es_dev" \ + optimization.update_freq=[${update_freq}] \ + optimization.clip_norm=5 \ + \ + common.seed=222 \ + common.log_interval=100 \ + common.log_format="json" \ + \ + distributed_training.distributed_world_size=${world_size} \ + distributed_training.nprocs_per_node=8 \ + distributed_training.ddp_backend="legacy_ddp" \ + \ + common.tensorboard_logdir=$MODEL_DIR \ + checkpoint.save_dir=$MODEL_DIR \ + hydra.run.dir=$MODEL_DIR \ + hydra.job.name=${exp_name} \ + +sleep 10s + # sleep infinity + + diff --git a/Speech2S/speech2s/stpretrain_scripts/train_text2code/base_ReleaseIter2_text2unicode_from400k_es2.sh b/Speech2S/speech2s/stpretrain_scripts/train_text2code/base_ReleaseIter2_text2unicode_from400k_es2.sh new file mode 100644 index 0000000000000000000000000000000000000000..910a6f35e43a0451b241a2033236039f009f0f75 --- /dev/null +++ b/Speech2S/speech2s/stpretrain_scripts/train_text2code/base_ReleaseIter2_text2unicode_from400k_es2.sh @@ -0,0 +1,70 @@ +##################################### +# Hubert mt model # +##################################### +[ $# -gt 3 ] && echo "Usage: $0 " && exit 0 +world_size=$1 +update_freq=$2 +w2v_path=$3 +Mount="" + +[ -z $world_size ] && world_size=8 +[ -z $update_freq ] && update_freq=1 +[ -z $w2v_path ] && w2v_path="/mnt/output/users/v-kunwei/data/s2s_data/model_es_emb_81_1004.pt" + + +langs="ltr,kmu" +FAIRSEQ_ROOT=/mnt/output/users/v-kunwei/code/fairseq_mlstku +CONFIG_ROOT=/mnt/output/users/v-kunwei/code/stpretrain_scripts/config/translation +DATA_DIR=/mnt/output/users/v-kunwei/data/s2s_data/es_asrl_data/ + +### set save-dir +MODEL_DIR="/mnt/output/users/v-kunwei/data/s2s_data/exp/text2unicode_es" +exp_name="base_pt400k_releaseiter2_${world_size}gpu_${update_freq}accum_lr1e-4_ll" +MODEL_DIR=$MODEL_DIR/$exp_name +[ -d $MODEL_DIR ] || mkdir -p $MODEL_DIR + + +python $FAIRSEQ_ROOT/fairseq_cli/hydra_train.py \ + --config-dir $CONFIG_ROOT \ + --config-name text2code \ + +task.data=$DATA_DIR \ + dataset.dataset_impl="raw" \ + +task.source_lang="ltr" +task.target_lang="kmu" \ + +task.normalize=false \ + \ + +criterion.label_smoothing=0.1 \ + 
+criterion.report_accuracy=true \ + optimizer.weight_decay=0.00001 \ + +lr_scheduler.lr="[0.0001]" \ + optimization.max_update=500000 \ + \ + +model.dropout=0.1 \ + +model.attention_dropout=0.1 \ + model.activation_dropout=0.1 \ + model.decoder_layerdrop=0 \ + model.layerdrop=0 \ + model.w2v_path=$w2v_path \ + +model.text_transformer_encoder_layers=6 \ + \ + dataset.train_subset="es_train" \ + dataset.valid_subset="es_dev" \ + optimization.update_freq=[${update_freq}] \ + optimization.clip_norm=5 \ + \ + common.seed=222 \ + common.log_interval=100 \ + common.log_format="json" \ + \ + distributed_training.distributed_world_size=${world_size} \ + distributed_training.nprocs_per_node=8 \ + distributed_training.ddp_backend="legacy_ddp" \ + \ + common.tensorboard_logdir=$MODEL_DIR \ + checkpoint.save_dir=$MODEL_DIR \ + hydra.run.dir=$MODEL_DIR \ + hydra.job.name=${exp_name} \ + +sleep 10s + # sleep infinity + + diff --git a/Speech2S/speech2s/stpretrain_scripts/train_text2code/decode_text2code.sh b/Speech2S/speech2s/stpretrain_scripts/train_text2code/decode_text2code.sh new file mode 100644 index 0000000000000000000000000000000000000000..866146d4a26cea23c4dc51d5f53c90f58bfadc21 --- /dev/null +++ b/Speech2S/speech2s/stpretrain_scripts/train_text2code/decode_text2code.sh @@ -0,0 +1,51 @@ + +##################################### +# Hubert ED model # +##################################### +[ $# -lt 1 ] && echo "Usage: $0 " && exit 0 +#source /mnt/default/v-ziqzhang/.bashrc_sing + +model_path=$1 +gen_set=$2 +tgt=$3 +src="ltr" +max_tokens=$4 +word_size=$5 +rank=$6 +outdir=$7 + +[ -z $tgt ] && tgt="kmu" +[ -z $gen_set ] && gen_set="dev_clean" +[ -z $word_size ] && word_size=1 +[ -z $rank ] && rank=0 +[ -z $max_tokens ] && max_tokens=2000 + +FAIRSEQ_ROOT=/mnt/output/users/v-kunwei/code/fairseq_mlst +DATA_DIR=${gen_set%/*} +gen_set=${gen_set##*/} +[ $gen_set == "test" ] && DATA_DIR=/mnt/output/users/v-kunwei/data/s2s_data/en_asr_data +[ -z $outdir ] && outdir=$DATA_DIR + + +results_path=$outdir/pseudo_${gen_set}_${rank} +[ ! 
-d $results_path ] && mkdir -p $results_path + +for subset in $gen_set; do + python $FAIRSEQ_ROOT/fairseq_cli/generate_mt_label.py $DATA_DIR \ + --path ${model_path} \ + --task "translation_from_jst" \ + --max-target-positions 3000 \ + --gen-subset $subset \ + -t $tgt -s "ltr" \ + --max-tokens ${max_tokens} \ + --dataset-impl "raw" \ + --max-len-a 2 --max-len-b 100 \ + --results-path $results_path \ + --skip-invalid-size-inputs-valid-test \ + --distributed-world-size $word_size --distributed-rank $rank \ + + echo "$model" > $results_path/model.record + sleep 1s +done | tee $results_path/decode.log + +sleep 2s diff --git a/Speech2S/speech2s/stpretrain_scripts/train_text2code/decode_text2code_beam2.sh b/Speech2S/speech2s/stpretrain_scripts/train_text2code/decode_text2code_beam2.sh new file mode 100644 index 0000000000000000000000000000000000000000..9cad721b3dfcf0bbca8d82b57290dacb616b74b2 --- /dev/null +++ b/Speech2S/speech2s/stpretrain_scripts/train_text2code/decode_text2code_beam2.sh @@ -0,0 +1,52 @@ + +##################################### +# Hubert ED model # +##################################### +[ $# -lt 1 ] && echo "Usage: $0 " && exit 0 +#source /mnt/default/v-ziqzhang/.bashrc_sing + +model_path=$1 +gen_set=$2 +tgt=$3 +src="ltr" +max_tokens=$4 +word_size=$5 +rank=$6 +outdir=$7 + +[ -z $tgt ] && tgt="kmu" +[ -z $gen_set ] && gen_set="dev_clean" +[ -z $word_size ] && word_size=1 +[ -z $rank ] && rank=0 +[ -z $max_tokens ] && max_tokens=2000 + +FAIRSEQ_ROOT=/mnt/output/users/v-kunwei/code/fairseq_mlstku +DATA_DIR=${gen_set%/*} +gen_set=${gen_set##*/} +[ $gen_set == "test" ] && DATA_DIR=/mnt/output/users/v-kunwei/code/fairseq_mlstku +[ -z $outdir ] && outdir=$DATA_DIR + + +results_path=$outdir/pseudo_${gen_set}_${rank} +[ ! -d $results_path ] && mkdir -p $results_path + +for subset in $gen_set; do + python $FAIRSEQ_ROOT/fairseq_cli/generate_mt_label.py $DATA_DIR \ + --path ${model_path} \ + --task "translation_from_jst" \ + --max-target-positions 3000 \ + --gen-subset $subset \ + -t $tgt -s "ltr" \ + --dataset-impl "raw" \ + --max-tokens ${max_tokens} \ + --beam 2 \ + --max-len-a 2 --max-len-b 100 \ + --results-path $results_path \ + --skip-invalid-size-inputs-valid-test \ + --distributed-world-size $word_size --distributed-rank $rank \ + + echo "$model" > $results_path/model.record + sleep 1s +done | tee $results_path/decode.log + +sleep 2s diff --git a/Speech2S/speech2s/stpretrain_scripts/train_text2code/inference_code_bleu.sh b/Speech2S/speech2s/stpretrain_scripts/train_text2code/inference_code_bleu.sh new file mode 100644 index 0000000000000000000000000000000000000000..240d4874c02fb1b06c18af32382ae4aee3297113 --- /dev/null +++ b/Speech2S/speech2s/stpretrain_scripts/train_text2code/inference_code_bleu.sh @@ -0,0 +1,52 @@ + +##################################### +# Hubert ED model # +##################################### +[ $# -lt 1 ] && echo "Usage: $0 " && exit 0 + +model_path=$1 +src_dir=${model_path%/*} +cpt=${model_path##*/} +cpt=${cpt%.*} + +gen_set=$2 +tgt=$3 +outdir=$4 +src="ltr" +[ -z $tgt ] && tgt="kmu" +[ -z $gen_set ] && gen_set="es_dev" +[ -z $outdir ] && outdir=$src_dir/decode_${cpt} + +DATA_DIR=/mnt/output/users/v-kunwei/data/s2s_data/es_asr_data/ +# DATA_DIR=/mnt/default/v-ziqzhang/data/stbert/data/librispeech/speech2c_joint_splitenc_400k/ltr-$tgt +# DATA_DIR=/mnt/default/v-ziqzhang/data/stbert/data/librispeech/speech2c_400k/ltr-$tgt +FAIRSEQ_ROOT=/mnt/output/users/v-kunwei/code/fairseq_mlst + +langs="ltr,$tgt" + +for subset in $gen_set; do + 
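+    # decode letters ("ltr") into k-means units ("kmu") with the text2code model and score the generated
+    # unit sequences with sacrebleu; inference_code_wer.sh is the same recipe scored with WER.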
results_path=$outdir/${subset} + [ ! -d $results_path ] && mkdir -p $results_path + + python $FAIRSEQ_ROOT/fairseq_cli/generate.py $DATA_DIR \ + --path ${model_path} \ + --task "translation_from_jst" \ + --max-target-positions 3000 \ + --gen-subset $subset \ + -t $tgt -s "ltr" --dataset-impl "raw" \ + --batch-size 16 \ + --max-len-a 2 --max-len-b 400 \ + --results-path $results_path \ + --scoring sacrebleu $extra + + echo $results_path + tail -n 1 $results_path/generate-*.txt + sleep 1s +done + +# --distributed-world-size 1000 --distributed-rank 0 \ + +sleep 2s + +# cat generate-newstest2020_enja.txt | grep "^D-" | cut -d'-' -f 2- | sort -n -k1 | cut -f3 > decode-newstest2020_enja.txt +# sacrebleu -t wmt20 -l en-ja -i decode-newstest2020_enja.txt --tokenize char diff --git a/Speech2S/speech2s/stpretrain_scripts/train_text2code/inference_code_wer.sh b/Speech2S/speech2s/stpretrain_scripts/train_text2code/inference_code_wer.sh new file mode 100644 index 0000000000000000000000000000000000000000..8fa9670ff8629ccc857d55c7c07983cc3d2c700b --- /dev/null +++ b/Speech2S/speech2s/stpretrain_scripts/train_text2code/inference_code_wer.sh @@ -0,0 +1,53 @@ + +##################################### +# Hubert ED model # +##################################### +[ $# -lt 1 ] && echo "Usage: $0 " && exit 0 + +model_path=$1 +src_dir=${model_path%/*} +cpt=${model_path##*/} +cpt=${cpt%.*} + +gen_set=$2 +tgt=$3 +outdir=$4 +src="ltr" +[ -z $tgt ] && tgt="kmu" +[ -z $gen_set ] && gen_set="en_dev" +[ -z $outdir ] && outdir=$src_dir/decode_${cpt} + +# DATA_DIR=/mnt/default/v-ziqzhang/data/stbert/data/librispeech/hubert_release_iter2_layer9_kmeans/ltr-$tgt +# DATA_DIR=/mnt/default/v-ziqzhang/data/stbert/data/librispeech/speech2c_joint_splitenc_400k/ltr-$tgt +#DATA_DIR=/mnt/default/v-ziqzhang/data/stbert/data/librispeech/speech2c_400k/ltr-$tgt +DATA_DIR=/mnt/output/users/v-kunwei/data/s2s_data/es_asr_data/ +FAIRSEQ_ROOT=/mnt/output/users/v-kunwei/code/fairseq_mlst + +langs="ltr,$tgt" + +for subset in $gen_set; do + results_path=$outdir/${subset} + [ ! 
-d $results_path ] && mkdir -p $results_path + + python $FAIRSEQ_ROOT/fairseq_cli/generate.py $DATA_DIR \ + --path ${model_path} \ + --task "translation_from_jst" \ + --max-target-positions 3000 \ + --gen-subset $subset \ + -t $tgt -s "ltr" --dataset-impl "raw" \ + --batch-size 16 \ + --max-len-a 2 --max-len-b 400 \ + --results-path $results_path \ + --scoring wer + + echo $results_path + tail -n 1 $results_path/generate-*.txt + sleep 1s +done + +# --distributed-world-size 1000 --distributed-rank 0 \ + +sleep 2s + +# cat generate-newstest2020_enja.txt | grep "^D-" | cut -d'-' -f 2- | sort -n -k1 | cut -f3 > decode-newstest2020_enja.txt +# sacrebleu -t wmt20 -l en-ja -i decode-newstest2020_enja.txt --tokenize char diff --git a/Speech2S/speech2s/tasks/joint_sc2t_pretrain.py b/Speech2S/speech2s/tasks/joint_sc2t_pretrain.py new file mode 100644 index 0000000000000000000000000000000000000000..db6e4e611f01d58f53ede5fd529fb9ceca44bcc8 --- /dev/null +++ b/Speech2S/speech2s/tasks/joint_sc2t_pretrain.py @@ -0,0 +1,1004 @@ +# ---------------------------------------------------------------------------- +# SpeechUT: Bridging Speech and Text with Hidden-Unit for Encoder-Decoder Based Speech-Text Pre-training (https://arxiv.org/abs/2210.03730) +# Github source: https://github.com/microsoft/SpeechT5/tree/main/SpeechUT +# Code based on fairseq: https://github.com/facebookresearch/fairseq/tree/272c4c5197250997148fb12c0db6306035f166a4 +# +# Copyright (c) 2022 Microsoft +# Licensed under The MIT License [see LICENSE for details] +# ---------------------------------------------------------------------------- + +import logging +import os +import sys +from typing import Dict, List, Optional, Tuple +from pathlib import Path + +import numpy as np +from argparse import Namespace +from collections import OrderedDict + +import torch +from dataclasses import dataclass, field +from fairseq.data import ( + Dictionary, + encoders, + data_utils, + StripTokenDataset, + PrependTokenDataset, + AppendTokenDataset, + DenoisingDataset, + ConcatDataset, + FairseqDataset, + iterators, + ResamplingDataset, + MaskTokensDataset, + LanguagePairDataset, +) +from fairseq.data.audio.speech_to_text_joint_dataset import S2TJointDataConfig +from fairseq.data.shorten_dataset import maybe_shorten_dataset +# from fairseq.data.encoders.utils import get_whole_word_mask +from fairseq.dataclass.configs import FairseqDataclass +from fairseq.tasks import register_task +from fairseq.tasks.fairseq_task import FairseqTask +from fairseq.dataclass.constants import ChoiceEnum +from omegaconf import MISSING + +from speechut.data.multimodal_corpus_dataset import MultiCorpusDataset +from speechut.data.load_langpair_dataset import load_langpair_dataset +from speechut.data.language_trible_dataset import LanguageTripleDataset, load_langtriple_dataset +from speechut.data.hubert_dataset import HubertDataset + +logger = logging.getLogger(__name__) + +TOKENIZER_CHOICES = ChoiceEnum(["sentencepiece", "hubert_letters", "none"]) + +def _lang_token(lang: str): + return "".format(lang) + +def _lang_token_index(dic: Dictionary, lang: str): + """Return language token index.""" + idx = dic.index(_lang_token(lang)) + assert idx != dic.unk_index, "cannot find language token for lang {}".format(lang) + return idx + + +class LabelEncoder(object): + def __init__(self, dictionary: Dictionary) -> None: + self.dictionary = dictionary + + def __call__(self, label: str) -> List[str]: + return self.dictionary.encode_line( + label, append_eos=False, add_if_not_exist=False, + ) + + 
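+# LabelEncoder maps a label line (e.g. a space-separated unit/letter sequence) to a tensor of
+# dictionary indices without appending EOS; instances are passed to HubertDataset below as
+# `label_processors`.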
+### wrap the initial get_whole_word_mask which needs bpe_tokenizer, +### here we just assume words are splited by "|" or "" +def get_whole_word_mask(args, dictionary): + def is_beginning_of_word(i): + if i < dictionary.nspecial: + # special elements are always considered beginnings + return True + tok = dictionary[i] + if tok.startswith("madeupword"): + return True + elif tok in ["", "", "", "", "|", ""]: + return True + else: + return False + + mask_whole_words = torch.ByteTensor( + list(map(is_beginning_of_word, range(len(dictionary)))) + ) + return mask_whole_words + +def get_repeative_start(tokens): + """ + tokens: torch.Tensor with repeative tokens + """ + length = len(tokens) + rep_start_id = tokens[:-1] != tokens[1:] + return torch.cat([torch.tensor([True]), rep_start_id]) + +@dataclass +class TextPretrainingConfig(FairseqDataclass): + ### added for joint pretraining + text_data: Optional[str] = field( + default=None, + metadata={ + "help": "if set, path to text data directory", + }, + ) + seed: Optional[int] = field( + default=1, + metadata={ + "help": "for ordered_indices in MulticorpusDataset", + }, + ) + tokens_per_sample: Optional[int] = field( + default=512, + metadata={ + "help": "max number of total tokens over all segments per sample for dataset", + }, + ) + tokens_per_sample_tgt: Optional[int] = field( + default=512, + metadata={ + "help": "max number of total tokens over all segments per target sample for dataset", + }, + ) + sample_break_mode: Optional[str] = field( + default="eos", + metadata={ + "help": "mode for breaking sentence", + }, + ) + mask: Optional[float] = field( + default=0.3, + metadata={ + "help": "fraction of words/subwords that will be masked", + }, + ) + leave_unmasked_prob: float = field( + default=0.1, + metadata={"help": "probability that a masked token is unmasked"}, + ) + mask_random: Optional[float] = field( + default=0.1, + metadata={ + "help": "instead of using [MASK], use random token this often", + }, + ) + freq_weighted_replacement: bool = field( + default=False, + metadata={"help": "sample random replacement words based on word frequencies"}, + ) + mask_whole_words: bool = field( + default=True, + metadata={"help": "mask whole words; you may also want to set --bpe"}, + ) + mask_repeative_tokens: bool = field( + default=True, + metadata={"help": "mask repeative_tokens; if mask_whole_words=False"}, + ) + mask_multiple_length: int = field( + default=1, + metadata={"help": "repeat the mask indices multiple times"}, + ) + mask_stdev: float = field( + default=0.0, + metadata={"help": "stdev of the mask length"}, + ) + shorten_method: Optional[str] = field( + default="none", + metadata={ + "help": "if not none, shorten sequences that exceed tokens_per_sample", + "choices": "none/truncate/random_crop" + }, + ) + shorten_data_split_list: Optional[str] = field( + default="", + metadata={ + "help": "comma_separated list of dataset splits to apply shortening to, e.g., train,valid (default: all dataset splits)", + }, + ) + + ### below hypra-parameters is used in bart + insert: Optional[float] = field( + default=0.0, + metadata={ + "help": "insert this percentage of additional random tokens", + }, + ) + permute: Optional[float] = field( + default=0.0, + metadata={ + "help": "take this proportion of subwords and permute them", + }, + ) + rotate: Optional[float] = field( + default=0.0, + metadata={ + "help": "rotate this proportion of inputs", + }, + ) + poisson_lambda: Optional[float] = field( + default=3.5, + metadata={ + "help": "randomly shuffle 
sentences for this proportion of inputs", + }, + ) + permute_sentences: Optional[float] = field( + default=0.0, + metadata={ + "help": "shuffle this proportion of sentences in all inputs", + }, + ) + mask_length: Optional[str] = field( + default="span-poisson", + metadata={ + "help": "mask length to choose", + "choice": "subword/word/span-poisson" + }, + ) + replace_length: Optional[int] = field( + default=1, + metadata={ + "help": "when masking N tokens, replace with 0, 1, or N tokens (use -1 for N)", + }, + ) + shuffle_instance: Optional[bool] = field( + default=False, + metadata={"help": "shuffle instance"}, + ) + max_source_positions: Optional[int] = field( + default=1024, + metadata={"help": "max number of tokens in the source sequence"}, + ) + max_target_positions: Optional[int] = field( + default=1024, + metadata={"help": "max number of tokens in the target sequence"}, + ) + bpe: Optional[str] = field( + default="", + metadata={ + "help": "will wrapped by the text_data_config yaml", + }, + ) + data_config: Optional[str] = field( + default=None, + metadata={ + "help": "a config yaml specify the bpe model of text data", + }, + ) + text_maxtokens_ratio: Optional[float] = field( + default=1.0, + metadata={ + "help": "for text, max_tokens = max_tokens * text_maxtokens_ratio / 320 ", + }, + ) + prepend_tgt_lang_tag: bool = field( + default=False, + metadata={"help": "prepend tgt_lang_tag to replace "}, + ) + mask_text_ratio: Optional[float] = field( + default=0.0, + metadata={ + "help": "mask_text_ratio, for paired data", + }, + ) + truncate_mono_source: bool = field( + default=True, + metadata={"help": "truncate mono source-side examples that exceed max-positions"}, + ) + + +@dataclass +class JointPretrainingConfig(FairseqDataclass): + data: str = field( + default=MISSING, metadata={"help": "path to speech data directory"} + ) + fine_tuning: bool = field( + default=False, metadata={"help": "set to true if fine-tuning Hubert"} + ) + labels: List[str] = field( + default_factory=lambda: ["ltr"], + metadata={ + "help": ( + "extension of the label files to load, frame-level labels for" + " pre-training, and sequence-level label for fine-tuning" + ) + }, + ) + label_dir: Optional[str] = field( + default=None, + metadata={ + "help": "if set, looks for labels in this directory instead", + }, + ) + label_rate: int = field( + default=-1, + metadata={"help": "label frame rate. -1 for sequence label"}, + ) + sample_rate: int = field( + default=16_000, + metadata={ + "help": "target sample rate. 
audio files will be up/down " + "sampled to this rate" + }, + ) + normalize: bool = field( + default=False, + metadata={ + "help": "if set, normalizes input to have 0 mean and unit variance" + }, + ) + enable_padding: bool = field( + default=False, + metadata={"help": "pad shorter samples instead of cropping"}, + ) + max_keep_size: Optional[int] = field( + default=None, + metadata={"help": "exclude sample longer than this"}, + ) + max_sample_size: Optional[int] = field( + default=None, + metadata={"help": "max sample size to crop to for batching"}, + ) + min_sample_size: Optional[int] = field( + default=None, + metadata={"help": "min sample size to crop to for batching"}, + ) + single_target: Optional[bool] = field( + default=False, + metadata={ + "help": "if set, AddTargetDatasets outputs same keys " + "as AddTargetDataset" + }, + ) + random_crop: Optional[bool] = field( + default=True, + metadata={"help": "always crop from the beginning if false"}, + ) + pad_audio: Optional[bool] = field( + default=False, + metadata={"help": "pad audio to the longest one in the batch if true"}, + ) + store_labels: Optional[bool] = field( + default=True, + metadata={"help": "store spm labels in memory, should be true when fine-tune with bpe"}, + ) + add_decoder_target: bool = field( + default=False, + metadata={"help": "contral the model architecture, if set True, load reduced unit as target"}, + ) + split_modality_batch: bool = field( + default=False, + metadata={"help": "whether create all samples of different modalities in a batch"}, + ) + speech_tgt_lang: str = field( + default="", + metadata={"help": "prepend to prev_output_tokens to replace , only used for decoder"}, + ) + speech_sampling_alpha: float = field( + default=0.2, + metadata={ + "help": "Hyper-parameter alpha = 1/T for temperature-based speech resampling." + "(alpha = 1 for no resampling)" + }, + ) + text_sampling_alpha: float = field( + default=0.2, + metadata={ + "help": "Hyper-parameter alpha = 1/T for temperature-based text resampling." 
+ "(alpha = 1 for no resampling)" + }, + ) + hubert_tokenizer: Optional[TOKENIZER_CHOICES] = field( + default="none", + metadata={"help": "which tokenizer for processing text"}, + ) + sp_path: Optional[str] = field( + default=None, + metadata={"help": "sentencepiece model path if using bpe tokenizer"}, + ) + text_cfg: TextPretrainingConfig = TextPretrainingConfig() + # For inference + ctc_weight: float = field( + default=0.0, + metadata={"help": "ctc weight during inference"}, + ) + lm_dict: Optional[str] = field( + default="dict.txt", + metadata={"help": "dict used for decoding with language model, should be in cfg.data/"}, + ) + +@register_task("joint_sc2t_pretraining", dataclass=JointPretrainingConfig) +class Jsc2tPretrainingTask(FairseqTask): + + cfg: JointPretrainingConfig + + def __init__( + self, + cfg: JointPretrainingConfig, + load_local_states: True, + ) -> None: + super().__init__(cfg) + + logger.info(f"current directory is {os.getcwd()}") + logger.info(f"JSTPretrainingTask Config {cfg}") + + self.cfg = cfg + self.fine_tuning = cfg.fine_tuning + self.blank_symbol = "" + + if load_local_states: + self.state.add_factory("hubert_tokenizer", self.build_tokenizer) + if self.cfg.text_cfg.text_data is not None and os.path.exists(self.cfg.text_cfg.text_data): + self.state.add_factory("text_dictionary", self.load_text_dictionary) + self.state.add_factory("text_src_dictionary", self.load_text_src_dictionary) + if cfg.fine_tuning: + self.state.add_factory("target_dictionary", self.load_dictionaries) + else: + self.state.add_factory("dictionaries", self.load_dictionaries) + + if cfg.text_cfg.data_config is not None: + self.text_data_cfg = S2TJointDataConfig(Path(f"{cfg.text_cfg.text_data}/{cfg.text_cfg.data_config}")) + self.cfg.text_cfg.bpe = self.text_data_cfg.bpe_tokenizer["bpe"] + else: + self.text_data_cfg = None + + @property + def source_dictionary(self) -> Optional[Dictionary]: + return None + + @property + def target_dictionary(self) -> Optional[Dictionary]: + return self.state.target_dictionary + + @property + def dictionaries(self) -> List[Dictionary]: + return self.state.dictionaries + + @property + def text_dictionary(self) -> Optional[Dictionary]: + return self.state.text_dictionary + + @property + def text_src_dictionary(self) -> Optional[Dictionary]: + return self.state.text_src_dictionary + + @property + def hubert_tokenizer(self): + return self.state.hubert_tokenizer + + def load_dictionaries(self): + label_dir = self.cfg.data if self.cfg.label_dir is None else self.cfg.label_dir + dictionaries = [Dictionary.load(f"{label_dir}/dict.{label}.txt") for label in self.cfg.labels] + if not self.cfg.fine_tuning: + for dictionary in dictionaries: + dictionary.add_symbol("") + return dictionaries[0] if self.cfg.fine_tuning else dictionaries + + def load_text_dictionary(self): + tgt_dict_path = f"{self.cfg.text_cfg.text_data}/{self.text_data_cfg.vocab_filename if self.text_data_cfg is not None else 'dict.txt'}" + if not os.path.isfile(tgt_dict_path): + raise FileNotFoundError(f"Dict not found: {tgt_dict_path}") + text_dictionary = Dictionary.load(tgt_dict_path) + self.mask_idx = text_dictionary.add_symbol("") + return text_dictionary + + def load_text_src_dictionary(self): + src_dict_path = f"{self.cfg.text_cfg.text_data}/{self.text_data_cfg.src_vocab_filename if self.text_data_cfg is not None else 'dict.txt'}" + if not os.path.isfile(src_dict_path): + raise FileNotFoundError(f"Dict not found: {src_dict_path}") + src_text_dictionary = Dictionary.load(src_dict_path) + self.mask_idx = 
src_text_dictionary.add_symbol("") + return src_text_dictionary + + @classmethod + def setup_task( + cls, cfg: JointPretrainingConfig, **kwargs + ) -> "Jsc2tPretrainingTask": + load_local_states = kwargs.get("load_local_states", True) + return cls(cfg, load_local_states) + + def get_label_dir(self) -> str: + if self.cfg.label_dir is None: + return self.cfg.data + return self.cfg.label_dir + + def load_paired_dataset(self, text_split, truncate_source=False): + text_split, lp = text_split.rsplit('.', 1) # e.g. "libritext.ltr-ltr" + if len(lp.split("-")) == 2: + src, tgt = lp.split("-") + if src == tgt: + logger.warn(f"| trying to load monolingual dataset {text_split}.{lp}, please check your task is right.") + paired_dataset = self.load_char_bart_dataset(f"{text_split}.{lp}.{tgt}") + return paired_dataset + paired_dataset = load_langpair_dataset( + self.cfg.text_cfg.text_data, + text_split, + src, + self.text_src_dictionary, + tgt, + self.text_dictionary, + combine=True, + dataset_impl=None, + upsample_primary=1, + left_pad_source=False, + left_pad_target=False, + max_source_positions=self.cfg.text_cfg.tokens_per_sample, + max_target_positions=self.cfg.text_cfg.tokens_per_sample, + truncate_source=truncate_source, + prepend_bos=False, + load_alignments=False, + append_source_id=True if self.cfg.text_cfg.prepend_tgt_lang_tag else False, + lang_format="" if self.cfg.text_cfg.prepend_tgt_lang_tag else "[{}]", + input_feeding=self.cfg.add_decoder_target, + ) + if self.cfg.text_cfg.mask_text_ratio > 0: + # add mask + self.mask_idx = self.text_src_dictionary.index("") + mask_whole_words = None + if self.cfg.text_cfg.mask_whole_words: + mask_whole_words = get_whole_word_mask(self.cfg.text_cfg, self.text_src_dictionary) + elif self.cfg.text_cfg.mask_repeative_tokens: + mask_whole_words = get_repeative_start + + src_dataset, src_unmasked_dataset = MaskTokensDataset.apply_mask( + paired_dataset.src, + self.text_src_dictionary, + pad_idx=self.text_src_dictionary.pad(), + mask_idx=self.mask_idx, + seed=self.cfg.text_cfg.seed, + mask_prob=self.cfg.text_cfg.mask_text_ratio, + leave_unmasked_prob=self.cfg.text_cfg.leave_unmasked_prob, + random_token_prob=self.cfg.text_cfg.mask_random, + freq_weighted_replacement=self.cfg.text_cfg.freq_weighted_replacement, + mask_whole_words=mask_whole_words, + mask_multiple_length=self.cfg.text_cfg.mask_multiple_length, + mask_stdev=self.cfg.text_cfg.mask_stdev, + ) + tgt_dataset = paired_dataset.tgt if paired_dataset.tgt is not None else src_unmasked_dataset + paired_dataset = LanguageTripleDataset( + src_dataset, + src_dataset.sizes, + self.text_src_dictionary, + src_unmasked_dataset, + src_unmasked_dataset.sizes, + self.text_src_dictionary, + tgt_dataset, + tgt_dataset.sizes, + self.text_dictionary, + left_pad_source=False, + left_pad_target=False, + align_dataset=None, + eos=None, + num_buckets=0, + shuffle=True, + pad_to_multiple=1, + ) + else: + src, ref, tgt = lp.split("-") + paired_dataset = load_langtriple_dataset( + self.cfg.text_cfg.text_data, + text_split, + src, + self.text_src_dictionary, + ref, + self.dictionaries[-1], + tgt, + self.text_dictionary, + combine=True, + dataset_impl=None, + upsample_primary=1, + left_pad_source=False, + left_pad_target=False, + max_source_positions=self.cfg.text_cfg.tokens_per_sample, + max_target_positions=self.cfg.text_cfg.tokens_per_sample, + truncate_source=truncate_source, + prepend_bos=False, + load_alignments=False, + append_source_id=True if self.cfg.text_cfg.prepend_tgt_lang_tag else False, + lang_format="" if 
self.cfg.text_cfg.prepend_tgt_lang_tag else "[{}]", + ) + return paired_dataset + + def load_dataset(self, split: str, epoch=1, **kwargs) -> None: + """ + Create Wav dataset for audio, and Index dataset for phonemized text, + then concatenate them to by fairseq.data.multi_corpus_dataset.MultiCorpusDataset. + """ + speech_splits = split.split('+')[0].split(',') + ### 1st, create a speech dataset using STSpeechDataset (modified from HubertDataset) + dicts = [self.target_dictionary] if self.cfg.fine_tuning else self.dictionaries + pad_list = [dict.pad() for dict in dicts] + eos_list = [dict.eos() for dict in dicts] + procs = [LabelEncoder(dict) for dict in dicts] + if self.cfg.speech_tgt_lang != "": + tgt_lang_idx = _lang_token_index(dicts[0], self.cfg.speech_tgt_lang) + logger.info(f"Will prepend <{tgt_lang_idx}> at the beginning of prev_output_tokens to replace ") + else: + tgt_lang_idx = None + + + # hubert v1: pad_audio=True, random_crop=False; + speech_datasets = [] + for speech_split in speech_splits: + paths = [ + f"{self.get_label_dir()}/{speech_split}.{l}" for l in self.cfg.labels + ] + speech_datasets.append( + HubertDataset( + f"{self.cfg.data}/{speech_split}.tsv", + sample_rate=self.cfg.sample_rate, + label_paths=paths, + label_rates=self.cfg.label_rate, + pad_list=pad_list, + eos_list=eos_list, + label_processors=procs, + max_keep_sample_size=self.cfg.max_keep_size, + min_keep_sample_size=self.cfg.min_sample_size, + max_sample_size=self.cfg.max_sample_size, + pad_audio=self.cfg.pad_audio, + normalize=self.cfg.normalize, + store_labels=self.cfg.store_labels, + random_crop=self.cfg.random_crop, + single_target=self.cfg.single_target, + tgt_dict=dicts[0], + add_decoder_target=self.cfg.add_decoder_target, + fine_tuning=self.cfg.fine_tuning, + tgt_lang_idx=tgt_lang_idx, + tokenizer=self.hubert_tokenizer, + ) + ) + if len(speech_datasets) > 1: + speech_dataset = ConcatDataset(speech_datasets) + else: + speech_dataset = speech_datasets[0] + + has_text = len(split.split('+')) > 1 + if not has_text: + assert speech_dataset is not None + self.datasets[split] = speech_dataset + return + + ### 2nd, create paired/mono text datasets using Langpairdataset + if split.split('+')[1] != '': + paired_splits = [paired_split for paired_split in split.split('+')[1].split(',') if paired_split != ''] + paired_datasets = [self.load_paired_dataset(paired_split) for paired_split in paired_splits] + else: + paired_splits, paired_datasets = [], [] + + if len(split.split('+')) > 2 and split.split('+')[2] != '': + mono_splits = [mono_split for mono_split in split.split('+')[2].split(',') if mono_split != ''] + mono_datasets = [self.load_paired_dataset(mono_split, truncate_source=self.cfg.text_cfg.truncate_mono_source) for mono_split in mono_splits] + else: + mono_splits, mono_datasets = [], [] + + assert len(mono_datasets + paired_datasets) > 0, f"split {split} has no text! you should check out for that" + + ### 3rd, if provided, create a supervised dataset with labeled data + if len(split.split('+')) > 3 and split.split('+')[3] != '': + assert len(paired_splits) > 0, f"supervised dataset can not be loaded without text paired dataset!" 
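+            # Note on the split naming convention (derived from the parsing above): a split
+            # string packs up to four '+'-separated fields,
+            #   <speech_splits>+<paired_text_splits>+<mono_text_splits>+<supervised_split>
+            # the first three of which may hold several comma-separated sub-splits, e.g.
+            # (hypothetical names) "train_960+libritext.phn-ltr+monotext.ltr-ltr+train_100h".
+            # The supervised split reuses the target-side suffix of the first paired split
+            # (e.g. "ltr") to locate its label file under get_label_dir().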
+ tgt = paired_splits[0].rsplit('.', 1)[1].split("-")[1] + sup_split = split.split('+')[3] + + sup_dataset = HubertDataset( + f"{self.cfg.data}/{sup_split}.tsv", + sample_rate=self.cfg.sample_rate, + label_paths=[f"{self.get_label_dir()}/{sup_split}.{tgt}"], + label_rates=[-1], + pad_list=[self.text_dictionary.pad()], + eos_list=[self.text_dictionary.eos()], + label_processors=[LabelEncoder(self.text_dictionary)], + max_keep_sample_size=self.cfg.max_keep_size, + min_keep_sample_size=None, + max_sample_size=None, + pad_audio=True, + normalize=self.cfg.normalize, + store_labels=self.cfg.store_labels, + random_crop=False, + single_target=True, + tgt_dict=self.text_dictionary, + add_decoder_target=self.cfg.add_decoder_target, + fine_tuning=True, + tgt_lang_idx=None, + tokenizer=None, + ) + else: + sup_dataset = None + + ### 4th, compose a MultiCorpusDataset + dataset_dict, max_positions_dict, distributions, max_tokens_ratios = self.resample_multi_modality_dataset( + speech_dataset, sup_dataset, mono_datasets, paired_datasets, mono_splits, paired_splits, epoch=epoch, + ) + self.datasets[split] = MultiCorpusDataset( + dataset_dict, + max_positions=max_positions_dict, + distribution=distributions, + max_tokens_ratio=max_tokens_ratios, + seed=self.cfg.text_cfg.seed, + sort_indices=True, + ) + + + def max_positions(self) -> Tuple[int, int]: + return (sys.maxsize, sys.maxsize) + + def filter_indices_by_size( + self, indices: np.array, *args, **kwargs + ) -> np.array: + return indices + + def get_batch_iterator( + self, + dataset, + max_tokens=None, + max_sentences=None, + max_positions=None, + ignore_invalid_inputs=False, + required_batch_size_multiple=1, + seed=1, + num_shards=1, + shard_id=0, + num_workers=0, + epoch=1, + data_buffer_size=0, + disable_iterator_cache=False, + skip_remainder_batch=False, + grouped_shuffling=False, + update_epoch_batch_itr=False, + ): + """ + Get an iterator that yields batches of data from the given dataset. + Args: + dataset (~fairseq.data.FairseqDataset): dataset to batch + max_tokens (int, optional): max number of tokens in each batch + (default: None). + max_sentences (int, optional): max number of sentences in each + batch (default: None). + max_positions (optional): max sentence length supported by the + model (default: None). + ignore_invalid_inputs (bool, optional): don't raise Exception for + sentences that are too long (default: False). + required_batch_size_multiple (int, optional): require batch size to + be a multiple of N (default: 1). + seed (int, optional): seed for random number generator for + reproducibility (default: 1). + num_shards (int, optional): shard the data iterator into N + shards (default: 1). + shard_id (int, optional): which shard of the data iterator to + return (default: 0). + num_workers (int, optional): how many subprocesses to use for data + loading. 0 means the data will be loaded in the main process + (default: 0). + epoch (int, optional): the epoch to start the iterator from + (default: 1). + data_buffer_size (int, optional): number of batches to + preload (default: 0). + disable_iterator_cache (bool, optional): don't cache the + EpochBatchIterator (ignores `FairseqTask::can_reuse_epoch_itr`) + (default: False). + skip_remainder_batch (bool, optional): if set, discard the last + batch in each training epoch, as the last batch is often smaller than + local_batch_size * distributed_word_size (default: ``True``). 
+ grouped_shuffling (bool, optional): group batches with each groups + containing num_shards batches and shuffle groups. Reduces difference + between sequence lengths among workers for batches sorted by length. + update_epoch_batch_itr (bool optional): if true then donot use the cached + batch iterator for the epoch + + Returns: + ~fairseq.iterators.EpochBatchIterator: a batched iterator over the + given dataset split + """ + if self.fine_tuning or not isinstance(dataset, MultiCorpusDataset): + return super().get_batch_iterator( + dataset, + max_tokens=max_tokens, + max_sentences=max_sentences, + max_positions=max_positions, + ignore_invalid_inputs=ignore_invalid_inputs, + required_batch_size_multiple=required_batch_size_multiple, + seed=seed, + num_shards=num_shards, + shard_id=shard_id, + num_workers=num_workers, + epoch=epoch, + data_buffer_size=data_buffer_size, + disable_iterator_cache=disable_iterator_cache, + skip_remainder_batch=skip_remainder_batch, + grouped_shuffling=grouped_shuffling, + update_epoch_batch_itr=update_epoch_batch_itr, + ) + + can_reuse_epoch_itr = ( + not disable_iterator_cache + and not update_epoch_batch_itr + and self.can_reuse_epoch_itr(dataset) + ) + if can_reuse_epoch_itr and dataset in self.dataset_to_epoch_iter: + logger.debug("reusing EpochBatchIterator for epoch {}".format(epoch)) + return self.dataset_to_epoch_iter[dataset] + + assert isinstance(dataset, FairseqDataset) + + # initialize the dataset with the correct starting epoch + dataset.set_epoch(epoch) + + # get indices ordered by example size + with data_utils.numpy_seed(seed): + indices = dataset.ordered_indices() + + # filter examples that are too large + if max_positions is not None: + indices = self.filter_indices_by_size( + indices, dataset, max_positions, ignore_invalid_inputs + ) + + # create mini-batches with given size constraints + batch_sampler = dataset.get_batch_sampler( + indices, + num_shards, + seed, + max_tokens=max_tokens, + max_sentences=max_sentences, + required_batch_size_multiple=required_batch_size_multiple, + split_modality_batch=self.cfg.split_modality_batch, + ) + + # return a reusable, sharded iterator + epoch_iter = iterators.EpochBatchIterator( + dataset=dataset, + collate_fn=dataset.collater, + batch_sampler=batch_sampler, + seed=seed, + num_shards=num_shards, + shard_id=shard_id, + num_workers=num_workers, + epoch=epoch, + buffer_size=data_buffer_size, + skip_remainder_batch=skip_remainder_batch, + disable_shuffling=True, + grouped_shuffling=grouped_shuffling, + ) + + if can_reuse_epoch_itr: + self.dataset_to_epoch_iter[dataset] = epoch_iter + + return epoch_iter + + def build_generator( + self, + models, + args, + seq_gen_cls=None, + extra_gen_cls_kwargs=None, + ): + """Build ED-CTC generator for finet-tuned ASR model""" + from speechut.squence_generator import SequenceGenerator + extra_gen_cls_kwargs = { + "ctc_weight": self.cfg.ctc_weight, + "lm_dict": Dictionary.load(os.path.join(self.cfg.data, self.cfg.lm_dict)), + **extra_gen_cls_kwargs + } + return super().build_generator( + models, args, seq_gen_cls=SequenceGenerator, extra_gen_cls_kwargs=extra_gen_cls_kwargs + ) + + @classmethod + def _get_size_ratios(cls, ids: List[str], sizes: List[int], alpha: float = 1.0): + """Size ratios for temperature-based sampling + (https://arxiv.org/abs/1907.05019)""" + _sizes = np.array(sizes) + prob = _sizes / _sizes.sum() + smoothed_prob = prob ** alpha + smoothed_prob = smoothed_prob / smoothed_prob.sum() + size_ratio = (smoothed_prob * _sizes.sum()) / _sizes + + o_str = 
str({_i: f"{prob[i]:.3f}" for i, _i in enumerate(ids)}) + logger.info(f"original sampling probability: {o_str}") + p_str = str({_i: f"{smoothed_prob[i]:.3f}" for i, _i in enumerate(ids)}) + logger.info(f"balanced sampling probability: {p_str}") + sr_str = str({_id: f"{size_ratio[i]:.3f}" for i, _id in enumerate(ids)}) + logger.info(f"balanced sampling size ratio: {sr_str}") + return size_ratio.tolist() + + def resample_multi_modality_dataset(self, speech_dataset, sup_dataset, mono_datasets, paired_datasets, mono_splits, paired_splits, epoch=1, train=True): + assert len(mono_datasets+paired_datasets) > 0, f"No text data loaded!" + + if len(mono_datasets) > 1 and self.cfg.text_sampling_alpha != 1.0: + size_ratios = self._get_size_ratios( + mono_splits, [len(s) for s in mono_datasets], alpha=self.cfg.text_sampling_alpha + ) + mono_datasets = [ + ResamplingDataset( + d, size_ratio=r, seed=0, epoch=epoch, replace=(r >= 1.0) + ) for d, r in zip(mono_datasets, size_ratios) + ] + + if len(paired_datasets) > 1 and self.cfg.text_sampling_alpha != 1.0: + size_ratios = self._get_size_ratios( + paired_splits, [len(s) for s in paired_datasets], alpha=self.cfg.text_sampling_alpha + ) + paired_datasets = [ + ResamplingDataset( + d, size_ratio=r, seed=0, epoch=epoch, replace=(r >= 1.0) + ) for d, r in zip(paired_datasets, size_ratios) + ] + + dataset_list = [speech_dataset, sup_dataset] + for datasets in [mono_datasets, paired_datasets]: + if len(datasets) > 1: + dataset_list.append(ConcatDataset(datasets)) + elif len(datasets) == 1: + dataset_list.append(datasets[0]) + else: + dataset_list.append(None) + + ### match speech/text datasets according to modality + dataset_dict = OrderedDict((name, d) for name, d in zip(["speech", "speech_sup", "text_mono", "text_paired"], dataset_list) if d is not None) + max_positions_dict = { + "speech": None, + "speech_sup": None, + "text_mono": (self.cfg.text_cfg.tokens_per_sample, self.cfg.text_cfg.tokens_per_sample), + "text_paired": (self.cfg.text_cfg.tokens_per_sample, self.cfg.text_cfg.tokens_per_sample), + } + max_positions_dict = OrderedDict((name, max_positions_dict[name]) for name in dataset_dict.keys()) + max_tokens_ratios_dict = { + "speech": 1.0, + "speech_sup": 1.0, + "text_mono": 1.0 / 320 / self.cfg.text_cfg.text_maxtokens_ratio, + "text_paired": 1.0 / 320 / self.cfg.text_cfg.text_maxtokens_ratio, + } + max_tokens_ratios = [max_tokens_ratios_dict[name] for name in dataset_dict.keys()] + dataset_lens = np.array([len(dataset) for dataset in dataset_dict.values()]) + dataset_avg_sample_lens = np.array([ + sum([dataset.num_tokens(i) for i in np.random.randint(low=0, high=len(dataset), size=10000)]) / 10000.0 + for dataset in dataset_dict.values() + ]) + + if not "speech" in dataset_dict: + distributions = [l / sum(dataset_lens) for l in dataset_lens] + else: + ## we just keep the batches of speech and non-speech the same, expand_coef is to ensure speech batches is less than others + first_ratio = dataset_lens[0] / sum(dataset_lens) + expand_coef = 1.2 if sup_dataset is None else 1.1 * sum(dataset_lens[0:2]) / dataset_lens[0] + distributions = [expand_coef * max_tokens_ratios[i] * dataset_avg_sample_lens[0] / l for (i, l) in enumerate(dataset_avg_sample_lens)] + distributions[0] = 1.0 + if sup_dataset is not None: + distributions[1] = dataset_lens[1] / dataset_lens[0] + distributions = [first_ratio * d for d in distributions] + + logging.info(f"Number samples of datasets is {dataset_lens}") + logging.info(f"Avg sample length of datasets is 
{dataset_avg_sample_lens}") + logging.info(f"Sampling distributions is {distributions}") + logging.info(f"Maxtokens ratio is {max_tokens_ratios}") + return dataset_dict, max_positions_dict, distributions, max_tokens_ratios + + def build_tokenizer(self, cfg=None): + logger.info(f"tokenizer: {self.cfg.hubert_tokenizer}") + if self.cfg.hubert_tokenizer != "none": + return encoders.build_bpe(Namespace(**{"bpe": self.cfg.hubert_tokenizer, "sentencepiece_model": self.cfg.sp_path})) + else: + return None + + def load_char_bart_dataset(self, split): + mono_dataset = data_utils.load_indexed_dataset( + f"{self.cfg.text_cfg.text_data}/{split}", + self.text_dictionary, + ) + mono_dataset = StripTokenDataset(mono_dataset, self.text_dictionary.eos()) + mono_dataset = maybe_shorten_dataset( + mono_dataset, + split, + self.cfg.text_cfg.shorten_data_split_list, + self.cfg.text_cfg.shorten_method, + self.cfg.text_cfg.tokens_per_sample - 2, + self.cfg.text_cfg.seed, + ) + logger.info("loaded {} samples from: {}".format(len(mono_dataset), mono_dataset)) + ### prepend bos and eos to dataset + mono_dataset = PrependTokenDataset(mono_dataset, self.text_dictionary.bos()) + mono_dataset = AppendTokenDataset(mono_dataset, self.text_dictionary.eos()) + mask_whole_words = ( + get_whole_word_mask(None, self.text_dictionary) + if self.cfg.text_cfg.mask_whole_words + else None + ) + lang=self.cfg.speech_tgt_lang + mono_dataset = DenoisingDataset( + mono_dataset, + mono_dataset.sizes, + self.text_dictionary, + self.mask_idx, + mask_whole_words, + shuffle=self.cfg.text_cfg.shuffle_instance, + seed=self.cfg.text_cfg.seed, + args=self.cfg.text_cfg, + tgt_lang_idx=_lang_token_index(self.text_dictionary, lang) if self.cfg.text_cfg.prepend_tgt_lang_tag else None, + ) + + return mono_dataset diff --git a/SpeechLM/README.md b/SpeechLM/README.md new file mode 100644 index 0000000000000000000000000000000000000000..71ea069ba766a163e5a1876fa5eace6d1e7f6efa --- /dev/null +++ b/SpeechLM/README.md @@ -0,0 +1,268 @@ +# SpeechLM + + + + [**SpeechLM: Enhanced Speech Pre-Training with Unpaired Textual Data**](https://arxiv.org/abs/2209.15329) + +- June 2023: We have corrected the errors in the pre-training data for SpeechLM-P Base models, and new results are updated. + +- April 2023: We discovered some errors about the data in the pre-training experiments, which will affect all the results about SpeechLM-P Base models. We are re-conducting the related experiments and will update the paper with the new results. 
+ +- (Done) Oct 2022: release the code and models +- Oct 2022: release preprint in [arXiv](https://arxiv.org/abs/2209.15329) + +## Pre-Trained and Fine-tuned Models + +| Model | Pre-training Dataset | Fine-tuning Dataset | Model | +| :------: | :----------------------------------------------: | :-----------------: | :-----: | +| SpeechLM-P Base | [960 hrs LibriSpeech](http://www.openslr.org/12) + [40M Text](http://www.openslr.org/11) | - | [Azure Storage] | +| SpeechLM-P Base | [960 hrs LibriSpeech](http://www.openslr.org/12) + [40M Text](http://www.openslr.org/11) | [100 hrs LibriSpeech](http://www.openslr.org/12) | [Azure Storage] | +| SpeechLM-H Base | [960 hrs LibriSpeech](http://www.openslr.org/12) + [40M Text](http://www.openslr.org/11) | - | [Google drive](https://drive.google.com/file/d/1eblW8U8f9t-NTuCNRrNHwr-8BeLAUAmQ/view?usp=sharing) | +| SpeechLM-H Base | [960 hrs LibriSpeech](http://www.openslr.org/12) + [40M Text](http://www.openslr.org/11) | [100 hrs LibriSpeech](http://www.openslr.org/12) | [Google drive](https://drive.google.com/file/d/1vXyO5DolbiWiTYZ6pkkKQsu2wJetaPlv/view?usp=sharing) | +| SpeechLM-P Base | [960 hrs LibriSpeech](http://www.openslr.org/12) + [40M Text](http://www.openslr.org/11) | [En-De CoVoST-2](https://github.com/facebookresearch/covost) | [Azure Storage] | +| SpeechLM-P Base | [960 hrs LibriSpeech](http://www.openslr.org/12) + [40M Text](http://www.openslr.org/11) | [En-Ca CoVoST-2](https://github.com/facebookresearch/covost) | [Azure Storage] | +| SpeechLM-P Base | [960 hrs LibriSpeech](http://www.openslr.org/12) + [40M Text](http://www.openslr.org/11) | [En-Ar CoVoST-2](https://github.com/facebookresearch/covost) | [Azure Storage] | +| SpeechLM-P Base | [960 hrs LibriSpeech](http://www.openslr.org/12) + [40M Text](http://www.openslr.org/11) | [En-Tr CoVoST-2](https://github.com/facebookresearch/covost) | [Azure Storage] | +| SpeechLM-P Large | [60k hrs LibriLight](https://github.com/facebookresearch/libri-light) + [40M Text](http://www.openslr.org/11) | - | [Google drive](https://drive.google.com/file/d/1QjLIgTJKIylVIp5hUkfSjGPtz8Xo7Lky/view?usp=sharing) | +| SpeechLM-P Large | [60k hrs LibriLight](https://github.com/facebookresearch/libri-light) + [40M Text](http://www.openslr.org/11) | [960 hrs LibriSpeech](http://www.openslr.org/12) | [Google drive](https://drive.google.com/file/d/1YZQDVv096o8Opt0RBnkRiZXYPRDqKZnP/view?usp=sharing) | +| SpeechLM-P Large | [60k hrs LibriLight](https://github.com/facebookresearch/libri-light) + [40M Text](http://www.openslr.org/11) | [En-De CoVoST-2](https://github.com/facebookresearch/covost) | [Google drive](https://drive.google.com/file/d/1qYygNWSc11TQbBI1OzC4ChlR-dNh8t9S/view?usp=sharing) | +| SpeechLM-P Large | [60k hrs LibriLight](https://github.com/facebookresearch/libri-light) + [40M Text](http://www.openslr.org/11) | [En-Ca CoVoST-2](https://github.com/facebookresearch/covost) | [Google drive](https://drive.google.com/file/d/162U88mwso2aVfzzPkEM2nP_vwTpcb57T/view?usp=sharing) | +| SpeechLM-P Large | [60k hrs LibriLight](https://github.com/facebookresearch/libri-light) + [40M Text](http://www.openslr.org/11) | [En-Ar CoVoST-2](https://github.com/facebookresearch/covost) | [Google drive](https://drive.google.com/file/d/1lbTSRXewEeb2t45URunD6EiJcbniyjWW/view?usp=sharing) | +| SpeechLM-P Large | [60k hrs LibriLight](https://github.com/facebookresearch/libri-light) + [40M Text](http://www.openslr.org/11) | [En-Tr CoVoST-2](https://github.com/facebookresearch/covost) | [Google 
drive](https://drive.google.com/file/d/1Er4I_jHS175pQQph223yKtiiLQ378VvH/view?usp=sharing) | + + +## Extract features using pre-trained models +For easier use of our pre-trained models, we merge all inference-related code to [`SpeechLM.py`](SpeechLM.py) and make cleaned checkpoints [~~`SpeechLM-P Base`~~] [`SpeechLM-H Base`] [`SpeechLM-P Large`] by removing non-required modules. Now you can directly use the following script to extract your speech features: +```python +import torch +import torch.nn.functional as F +from SpeechLM import SpeechLMConfig, SpeechLM + +checkpoint = torch.load('path/to/the/cleaned/checkpoint.pt') +cfg = SpeechLMConfig(checkpoint['cfg']['model']) +model = SpeechLM(cfg) +model.load_state_dict(checkpoint['model']) +model.eval() + +wav_input_16khz = torch.randn(1,10000) +normalize = checkpoint['cfg']['task']['normalize'] # False for base model, True for large model +if normalize: + wav_input_16khz = F.layer_norm(wav_input_16khz[0], wav_input_16khz[0].shape).unsqueeze(0) + +# extract the representation of last layer +rep = model.extract_features(wav_input_16khz)[0] + +# extract the representation of each layer +output_layer = model.cfg.encoder_layers + model.cfg.text_transformer.encoder.layers +rep, layer_results = model.extract_features(wav_input_16khz, output_layer=output_layer, ret_layer_results=True)[0] +layer_reps = [x.transpose(0, 1) for x in layer_results] +``` + + +## Setup +To fine-tune or pre-train more models, please follow the instructions below. + +```bash +git submodule update --init SpeechLM/fairseq +cd SpeechLM/ +pip install --editable fairseq/ +pip install sacrebleu==1.5.1 +``` + +## ASR on LibriSpeech +### Data preparation +Please follow the steps of wav2vec 2.0 manifest [here](https://github.com/pytorch/fairseq/tree/main/examples/wav2vec#prepare-training-data-manifest) to prepare `train.tsv` and `train.ltr`. You should make sure the vocabulary [`dict.ltr.txt`](dataset/LibriSpeech/asr/dict.ltr.txt) is the same as that used for the pre-trained model. + +Put yout prepared data into `$data_dir`, we provided eamples in [`dataset/LibriSpeech/asr`](dataset/LibriSpeech/asr/). + +### Fine-tune a CTC model +- Fine-tune the base model + ```bash + # Usage: speechlm/scripts/tune_speechlm_asr/finetune_base_ctc.sh [mount=$PWD] [world_size=8] [update_freq=1] + model_path=path/to/your/pre-trained/model + data_dir=dataset/LibriSpeech/asr + bash speechlm/scripts/tune_speechlm_asr/finetune_base_ctc.sh $model_path $data_dir 'tag400k' + ``` +- Fine-tune the large model + ```bash + # Usage: speechlm/scripts/tune_speechlm_asr/finetune_large_ctc.sh [mount=$PWD] [world_size=8] [update_freq=4] + model_path=path/to/your/pre-trained/model + data_dir=dataset/LibriSpeech/asr + bash speechlm/scripts/tune_speechlm_asr/finetune_large_ctc.sh $model_path $data_dir 'tag400k' + ``` +### Decode +- Directly decode a CTC model. + ```bash + # Usage: speechlm/scripts/tune_speechlm_asr/inference_ctc.sh [gen-set=dev_clean,dev_other,test_clean,test_other] + model_path=path/to/your/fine-tuned/model + data_dir=dataset/LibriSpeech/asr + bash speechlm/scripts/tune_speechlm_asr/inference_ctc.sh $model_path $data_dir + # for large models + # bash speechlm/scripts/tune_speechlm_asr/inference_ctc_large.sh $model_path $data_dir + ``` +- Decode with 4-gram language model using [flashlight](https://github.com/flashlight/flashlight/tree/main/bindings/python) and [kenlm](https://github.com/kpu/kenlm). 
+ > Please put [4-gram.arpa](https://www.openslr.org/resources/11/4-gram.arpa.gz) and the word-to-letter lexicon [librispeech_lexicon.lst](https://drive.google.com/file/d/1q7IbNGqtwXnctjvuvpviQ4ZmepFHQmTO/view?usp=sharing) into `$data_dir`. + ```bash + # Usage: speechlm/scripts/tune_speechlm_asr/inference_ctc_kenlm.sh [gen-set=dev_clean,dev_other,test_clean,test_other] + model_path=path/to/your/fine-tuned/model + data_dir=dataset/LibriSpeech/asr + bash speechlm/scripts/tune_speechlm_asr/inference_ctc_kenlm.sh $model_path $data_dir + ``` +- Decode large models with fairseq-lm using [flashlight](https://github.com/flashlight/flashlight/tree/main/bindings/python). + > Please put [lm_librispeech_word_transformer.pt](https://dl.fbaipublicfiles.com/wav2letter/sota/2019/lm/lm_librispeech_word_transformer.pt) and its vocabulary [`dict.txt`](https://dl.fbaipublicfiles.com/wav2letter/sota/2019/lm/lm_librispeech_word_transformer.dict) into `$data_dir/fairseq_word_lm`, and the word-to-letter lexicon [librispeech_lexicon.lst](https://drive.google.com/file/d/1q7IbNGqtwXnctjvuvpviQ4ZmepFHQmTO/view?usp=sharing) into `$data_dir`. Capitalize the `dict.txt` to amke it compatible with the word-to-letter lexicon. + ```bash + # Usage: speechlm/scripts/tune_speechlm_asr/inference_ctc_large_fsqlm.sh [gen-set=dev_clean,dev_other,test_clean,test_other] + model_path=path/to/your/fine-tuned/model + data_dir=dataset/LibriSpeech/asr + bash speechlm/scripts/tune_speechlm_asr/inference_ctc_large_fsqlm.sh $model_path $data_dir dev_other + ``` + +## ST on CoVoST-2 +### Data Preparation +1. Download [Common Voice audio clips](https://commonvoice.mozilla.org/en/datasets) (version 4) for English into `$cv_root/en`. +2. Get data manifest. The following script will convert mp3 files to waveform, create tsv file containing speech/translation paires, create data config files. + ```bash + lang=de # ca,ar,tr + cv_root=dataset/CommonVoice/v4 + bash speechlm/data_process/prepare_covost2_enxx.sh $lang $cv_root + ``` + We provided examples in [`dataset/CommonVoice/v4/en/en-de`](dataset/CommonVoice/v4/en/en-de). + +### Fine-tune a encoder-decoder model +- Fine-tune the Base model (fine-tuned models will be stored in `$mount/exp/finetune_covost`). + + ```bash + model_path=path/to/your/pre-trained/model + lang=de # ca,ar,tr + data_dir=dataset/CommonVoice/v4/en/en-${lang} + # Usage (Base model): speechlm/scripts/tune_speechlm_st/ft_base_covost_enxx.sh [mount=$PWD] [world_size=8] [update_freq=2] + bash speechlm/scripts/tune_speechlm_st/ft_base_covost_enxx.sh $model_path $data_dir $lang 'tag400k' + ``` +- Fine-tune the Large model (fine-tuned models will be stored in `$mount/exp/finetune_covost`). 
+  ```bash
+  # Usage (Large model): speechlm/scripts/tune_speechlm_st/ft_large_covost_enxx.sh [mount=$PWD] [world_size=8] [update_freq=4]
+  bash speechlm/scripts/tune_speechlm_st/ft_large_covost_enxx.sh $model_path $data_dir $lang 'tag400k'
+  ```
+
+### Decode
+- Decode the base model
+  ```bash
+  # Usage: speechlm/scripts/tune_speechlm_st/inference_base.sh [gen-set=dev] [beam_size=5]
+  model_path=path/to/your/fine-tuned/model
+  lang=de # ca,ar,tr
+  data_dir=dataset/CommonVoice/v4/en/en-${lang}
+  bash speechlm/scripts/tune_speechlm_st/inference_base.sh $model_path $data_dir $lang dev
+  ```
+- Decode the large model
+  ```bash
+  # Usage: speechlm/scripts/tune_speechlm_st/inference_large.sh [gen-set=dev] [beam_size=5]
+  bash speechlm/scripts/tune_speechlm_st/inference_large.sh $model_path $data_dir $lang dev
+  ```
+
+## Universal Representation Evaluation on SUPERB
+
+Please refer to [**SUPERB**](https://superbbenchmark.org/) for the downstream tasks.
+
+## Pre-train
+Please follow the instructions in [Tokenizers](README.md#Tokenizers) to prepare the pre-training data. We provide examples in [`dataset`](dataset).
+- SpeechLM-P Base model
+
+  Models will be stored in `$mount/pretrain`.
+  ```bash
+  data_dir=dataset/LibriSpeech/phone_unit # should contain train_960.{tsv,phn}
+  text_data_dir=dataset/LibriLM/phone_unit/bin-idx # should contain train_text.phn-ltr.{phn,ltr}.{bin,idx}
+  # Usage: speechlm/scripts/pretrain_speechlm/base_speechlmp.sh [mount=$PWD] [world_size=32] [update_freq=1]
+  bash speechlm/scripts/pretrain_speechlm/base_speechlmp.sh $data_dir $text_data_dir
+  ```
+- SpeechLM-H Base model
+  ```bash
+  data_dir=dataset/LibriSpeech/hidden_unit # should contain train_960.{tsv,phn}
+  text_data_dir=dataset/LibriLM/km-ltr/bin-idx # should contain train_text.km-ltr.{km,ltr}.{bin,idx}
+  # Usage: speechlm/scripts/pretrain_speechlm/base_speechlmh.sh [mount=$PWD] [world_size=32] [update_freq=1]
+  bash speechlm/scripts/pretrain_speechlm/base_speechlmh.sh $data_dir $text_data_dir
+  ```
+- SpeechLM-P Large model
+  ```bash
+  data_dir=dataset/LibriSpeech/phone_unit # should contain train_960.{tsv,phn}
+  text_data_dir=dataset/LibriLM/phone_unit/bin-idx # should contain train_text.phn-ltr.{phn,ltr}.{bin,idx}
+  # Usage: speechlm/scripts/pretrain_speechlm/large_speechlmp.sh [mount=$PWD] [world_size=32] [update_freq=1]
+  bash speechlm/scripts/pretrain_speechlm/large_speechlmp.sh $data_dir $text_data_dir
+  ```
+
+
+## Tokenizers
+### Phoneme-unit Tokenizer for Speech
+This tokenizer, which is actually a hybrid HMM ASR model, is used to produce the frame-aligned phonemes for unlabeled speech.
+
+In the Base setting, we use the 100h LibriSpeech labeled data to train the HMM model with a Kaldi recipe, then decode the unpaired speech and get the aligned phonemes from the lattice.
+We provide the processed phonemes of the 960h speech here: [`train_960.tsv`](https://drive.google.com/file/d/1rxlikMglL2kEsF4NfqekZRoA02klY7CE/view?usp=sharing), [`train_960.phn`](), [`dev_clean.tsv`](https://drive.google.com/file/d/1NuVwe687jLBFkDLRy1EV2A2uXyV_kBo2/view?usp=sharing), [`dev_clean.phn`](https://drive.google.com/file/d/1cq_gbS-UgCALOoaE5QmhWrhkTdXuc_Uc/view?usp=sharing). Note that the label-rate is 100 (10ms).
+
+> The phoneme inventory is 300+ word-position-dependent phones including silence phones.
+
+### Phoneme-unit Tokenizer for Text
+This tokenizer is used to phonemize the unpaired text data into (phonemes, letters) paired data, following a `words -> phonemes -> upsampled phones` pipeline.
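+Conceptually, each word is mapped to its phonemes via a lexicon, and each phoneme is then repeated several times so that the phone sequence roughly matches the frame rate of speech. The sketch below only illustrates this idea; the lexicon entries and the fixed repeat factor are placeholder assumptions, not the values used by the data-preparation script that follows:
+
+```python
+# Illustrative sketch of the words -> phonemes -> upsampled phones pipeline.
+# LEXICON and the repeat factor are made-up placeholders.
+LEXICON = {"hello": ["HH", "AH", "L", "OW"], "world": ["W", "ER", "L", "D"]}
+
+def phonemize(sentence, repeat=3):
+    phones = []
+    for word in sentence.lower().split():
+        phones.extend(LEXICON.get(word, ["<unk>"]))
+    # Upsample: repeat each phone to approximate frame-level durations.
+    return [p for p in phones for _ in range(repeat)]
+
+print(phonemize("hello world"))  # ['HH', 'HH', 'HH', 'AH', ...]
+```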
+
+The following script will download the LibriSpeech LM corpus and produce the required data: `train_text.phn-ltr.phn.{idx,bin}` and `train_text.phn-ltr.ltr.{idx,bin}`.
+> Before running it, make sure you have our provided [`dict.phn.txt`](dataset/LibriLM/phone_unit/bin-idx/dict.phn.txt) and [`dict.ltr.txt`](dataset/LibriLM/phone_unit/bin-idx/dict.ltr.txt) in the output directory `dataset/LibriLM/phone_unit/bin-idx/`.
+
+> The phoneme inventory is 300+ word-position-dependent phones including silence phones.
+
+```bash
+# data will be in dataset/LibriLM/phone_unit/
+bash speechlm/data_process/prepare_phn2ltr_librilm.sh
+```
+### Hidden-unit Tokenizer for Speech
+Please follow the steps of data preparation for HuBERT [here](https://github.com/facebookresearch/fairseq/tree/main/examples/hubert#data-preparation) to prepare 1) the wav recordings [`train.tsv`](dataset/LibriSpeech/hidden_unit/train_sample100.tsv), 2) the corresponding hidden units [`train.km`](dataset/LibriSpeech/hidden_unit/train_sample100.km), and 3) the unit vocabulary [`dict.km.txt`](dataset/LibriSpeech/hidden_unit/dict.km.txt).
+
+### Hidden-unit Tokenizer for Text
+This tokenizer is used to produce the speech-style hidden units from unpaired text.
+We train a [FastSpeech](https://arxiv.org/abs/2006.04558)-like model as the tokenizer on a small amount of ASR data ([100 hrs LibriSpeech](http://www.openslr.org/12)); instead of generating a continuous spectrogram as in the original paper, it generates discrete units.
+
+Train:
+1. Convert the ASR transcripts to phoneme sequences with duration information.
+2. Extract hidden units from speech, using the [Hidden-unit Tokenizer for Speech](#hidden-unit-tokenizer-for-speech).
+3. Train the [model](speechlm/models/fasttext2unit.py) on the paired data:
+   ```bash
+   data_dir=dataset/LibriSpeech/fast_phone2unit
+   bash speechlm/scripts/tokenizer_fastT2U/train_s_5e-4.sh $data_dir
+   ```
+> The phoneme inventory is 41 mono phones including silence phones.
+
+Inference:
+
+4. Convert text data to phoneme sequences with the [`lexicon`](https://drive.google.com/file/d/1dh9NEx_cCF9_Aa0UcKyl9j00GXs6LmLQ/view?usp=sharing).
+5. [Generate](speechlm/scripts/tokenizer_fastT2U/generate.sh) hidden units for a large text corpus:
+   ```bash
+   gen_set=dataset/LibriSpeech/fast_phone2unit/genset_examples
+   bash speechlm/scripts/tokenizer_fastT2U/generate.sh $model_path $gen_set
+   ```
+We provide training/generation data examples in [`dataset/LibriSpeech/fast_phone2unit`](dataset/LibriSpeech/fast_phone2unit), and the model checkpoint [here](https://drive.google.com/file/d/1e-aYf8hPXuly8DEvNg5SISOlcUxsgED0/view?usp=sharing).
+
+## License
+
+This project is licensed under the license found in the LICENSE file in the root directory of this source tree.
+Portions of the source code are based on [FAIRSEQ](https://github.com/pytorch/fairseq).
+
+[Microsoft Open Source Code of Conduct](https://opensource.microsoft.com/codeofconduct)
+
+## Reference
+
+If you find our work useful in your research, please cite the following paper:
+
+```bibtex
+@article{zhang2022speechlm,
+  title = {SpeechLM: Enhanced Speech Pre-Training with Unpaired Textual Data},
+  author = {Zhang, Ziqiang and Chen, Sanyuan and Zhou, Long and Wu, Yu and Ren, Shuo and Liu, Shujie and Yao, Zhuoyuan and Gong, Xun and Dai, Lirong and Li, Jinyu and Wei, Furu},
+  eprint={2209.15329},
+  archivePrefix={arXiv},
+  primaryClass={cs.CL},
+  year={2022}
+}
+```
+
+### Contact Information
+
+For help or issues using SpeechLM models, please submit a GitHub issue.
+ +For other communications related to SpeechLM, please contact Long Zhou (`lozhou@microsoft.com`). + diff --git a/SpeechLM/SpeechLM.py b/SpeechLM/SpeechLM.py new file mode 100644 index 0000000000000000000000000000000000000000..b242dde083e272f96e80791f13803c44b438991d --- /dev/null +++ b/SpeechLM/SpeechLM.py @@ -0,0 +1,667 @@ +# ---------------------------------------------------------------------------- +# SpeechLM: Enhanced Speech Pre-Training with Unpaired Textual Data (https://arxiv.org/abs/2209.15329) +# Github source: https://github.com/microsoft/SpeechT5/tree/main/SpeechLM +# Code based on fairseq: https://github.com/facebookresearch/fairseq +# +# Copyright (c) 2022 Microsoft +# Licensed under The MIT License [see LICENSE for details] +# ---------------------------------------------------------------------------- + +import copy +import logging +from typing import Dict, List, Optional, Tuple + +import numpy as np +import torch +import torch.nn as nn +import torch.nn.functional as F +from torch import Tensor + +from modules import ( + compute_mask_indices, + LayerNorm, + ConvFeatureExtractionModel, + GradMultiply, + TransformerEncoder, + TransformerEncoderBase, + +) + +# from fairseq.models.transformer import TransformerConfig + +logger = logging.getLogger(__name__) + +class DictConfig: + def __init__(self, cfg=None): + if cfg is not None: + self.update(cfg) + + def update(self, cfg: dict): + self.__dict__.update(cfg) + + +class TransformerConfig: + def __init__(self, cfg=None): + if cfg is not None: + self.update(cfg) + + def update(self, cfg: dict): + if 'encoder' in cfg: + self.encoder = DictConfig(cfg['encoder']) + del cfg['encoder'] + if 'quant_noise' in cfg: + self.quant_noise = DictConfig(cfg['quant_noise']) + del cfg['quant_noise'] + if 'decoder' in cfg: + del cfg['decoder'] + self.__dict__.update(cfg) + + +class SpeechLMConfig: + def __init__(self, cfg=None): + self.label_rate: int = 50 + self.extractor_mode: str = "default" # mode for feature extractor. default has a single group norm with d groups in the first conv block, whereas layer_norm has layer norms in every block (meant to use with normalize=True) + self.encoder_layers: int = 12 # num encoder layers in the transformer + self.encoder_embed_dim: int = 768 # encoder embedding dimension + self.encoder_embed_dim: int = 768 # encoder embedding dimension + self.encoder_ffn_embed_dim: int = 3072 # encoder embedding dimension for FFN + self.encoder_attention_heads: int = 12 # num encoder attention heads + self.activation_fn: str = "gelu" # activation function to use + self.layer_type: str = "transformer" # layer type in encoder + + # dropouts + self.dropout: float = 0.1 # dropout probability for the transformer + self.attention_dropout: float = 0.1 # dropout probability for attention weights + self.activation_dropout: float = 0.0 # dropout probability after activation in FFN + self.encoder_layerdrop: float = 0.0 # probability of dropping a tarnsformer layer + self.dropout_input: float = 0.0 # dropout to apply to the input (after feat extr) + self.dropout_features: float = 0.0 # dropout to apply to the features (after feat extr) + + self.final_dim: int = 256 # project final representations and targets to this many dimensions + self.layer_norm_first: bool = False # apply layernorm first in the transformer + self.conv_feature_layers: str = "[(512,10,5)] + [(512,3,2)] * 4 + [(512,2,2)] * 2" # string describing convolutional feature extraction layers in form of a python list that contains [(dim, kernel_size, stride), ...] 
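+        # Note: with the default conv stack above, the cumulative stride is 5*2*2*2*2*2*2 = 320
+        # samples, i.e. one frame every 20 ms of 16 kHz audio (a 50 Hz frame rate), matching the
+        # default label_rate of 50 used for the frame-aligned units/phonemes.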
+ self.conv_bias: bool = False # include bias in conv encoder + self.feature_grad_mult: float = 1.0 # multiply feature extractor var grads by this + + # masking + self.mask_length: int = 10 # mask length + self.mask_prob: float = 0.65 # probability of replacing a token with mask + self.mask_selection: str = "static" # how to choose mask length + self.mask_other: float = 0 # secondary mask argument (used for more complex distributions), see help in compute_mask_indicesh + self.no_mask_overlap: bool = False # whether to allow masks to overlap + self.mask_min_space: int = 1 # min space between spans (if no overlap is enabled) + + + # channel masking + self.mask_channel_length: int = 10 # length of the mask for features (channels) + self.mask_channel_prob: float = 0.0 # probability of replacing a feature with 0 + self.mask_channel_selection: str = "static" # how to choose mask length for channel masking + self.mask_channel_other: float = 0 # secondary mask argument (used for more complex distributions), see help in compute_mask_indices + self.no_mask_channel_overlap: bool = False # whether to allow channel masks to overlap + self.mask_channel_min_space: int = 1 # min space between spans (if no overlap is enabled) + + # positional embeddings + self.conv_pos: int = 128 # number of filters for convolutional positional embeddings + self.conv_pos_groups: int = 16 # number of groups for convolutional positional embedding + + # loss computation + self.skip_masked: bool = False # skip computing losses over masked frames + self.skip_nomask: bool = False # skip computing losses over unmasked frames + self.checkpoint_activations: bool = False # recompute activations and save memory for extra compute + + # FP16 optimization + self.required_seq_len_multiple: int = 2 # pad the input to encoder such that the sequence length is divisible by multiple + + # Custom + self.use_rel_pos_enc: bool = False # whether to use relative positional encoding + self.scaling_for_att: float = 1.0 # scaling for attention weights to prevent overflow issue (for large model) + + # unit encoder-decoder + self.add_unit_encoder: bool = False # add unit encoder + + # embedding mixing + self.mix_with_unit: bool = True # mix with the unit embeddings + self.use_pred_unit: bool = False # use the embeddings of predicted units + self.l2_embedding: bool = False # compute l2 loss between unit embedding and unit hidden state + + if cfg is not None: + self.update(cfg) + + def update(self, cfg: dict): + model_cfg = copy.deepcopy(cfg) + self.text_transformer = TransformerConfig(model_cfg['text_transformer']) + del model_cfg['text_transformer'] + self.__dict__.update(model_cfg) + +class SpeechLM(nn.Module): + def __init__( + self, + cfg: SpeechLMConfig, + ) -> None: + super().__init__() + self.cfg = cfg + + feature_enc_layers = eval(cfg.conv_feature_layers) # noqa + self.embed = feature_enc_layers[-1][0] + + self.feature_extractor = ConvFeatureExtractionModel( + conv_layers=feature_enc_layers, + dropout=0.0, + mode=cfg.extractor_mode, + conv_bias=cfg.conv_bias, + ) + sample_rate = 16000 + feature_ds_rate = np.prod([s for _, _, s in feature_enc_layers]) + self.feat2tar_ratio = cfg.label_rate * feature_ds_rate / sample_rate + + self.post_extract_proj = ( + nn.Linear(self.embed, cfg.encoder_embed_dim) + if self.embed != cfg.encoder_embed_dim + else None + ) + + self.mask_prob = cfg.mask_prob + self.mask_selection = cfg.mask_selection + self.mask_other = cfg.mask_other + self.mask_length = cfg.mask_length + self.no_mask_overlap = cfg.no_mask_overlap + 
self.mask_min_space = cfg.mask_min_space + + self.mask_channel_prob = cfg.mask_channel_prob + self.mask_channel_selection = cfg.mask_channel_selection + self.mask_channel_other = cfg.mask_channel_other + self.mask_channel_length = cfg.mask_channel_length + self.no_mask_channel_overlap = cfg.no_mask_channel_overlap + self.mask_channel_min_space = cfg.mask_channel_min_space + + self.dropout_input = nn.Dropout(cfg.dropout_input) + self.dropout_features = nn.Dropout(cfg.dropout_features) + + self.feature_grad_mult = cfg.feature_grad_mult + self.logit_temp = cfg.logit_temp + self.skip_masked = cfg.skip_masked + self.skip_nomask = cfg.skip_nomask + + self.final_dim = cfg.final_dim if cfg.final_dim > 0 else cfg.encoder_embed_dim + self.final_proj_list = nn.ModuleList([ + nn.Linear(cfg.encoder_embed_dim, self.final_dim) for _ in range(2) + ]) + + self.mask_emb = nn.Parameter( + torch.FloatTensor(cfg.encoder_embed_dim).uniform_() + ) + + self.encoder = TransformerEncoder(cfg) + self.layer_norm = LayerNorm(self.embed) + + ### build unit encoder: + self.mask_u2t = cfg.mask_u2t + self.compute_mum = cfg.compute_mum + self.add_text_ctc = cfg.add_text_ctc + self.text_ctc_conv_kernel = cfg.text_ctc_conv_kernel + self.padding_idx = 1 + + self.add_unit_encoder = cfg.add_unit_encoder + self.mix_with_unit = cfg.mix_with_unit + self.use_pred_unit = cfg.use_pred_unit + self.l2_embedding = cfg.l2_embedding + if self.add_unit_encoder: + self.unit_embed_tokens = None + ### build unit encoder + self.unit_encoder = TransformerEncoderBase( + cfg.text_transformer, + dictionary=None, + embed_tokens=self.unit_embed_tokens, + use_rel_pos_enc=cfg.use_rel_pos_enc, + scaling_for_att=cfg.scaling_for_att, + ) + + ### build unit2text decoder, not available for now + self.add_decoder = cfg.add_decoder + + def upgrade_state_dict_named(self, state_dict, name): + """Upgrade a (possibly old) state dict for new versions.""" + + super().upgrade_state_dict_named(state_dict, name) + return state_dict + + def apply_mask(self, x, padding_mask, target_list): + B, T, C = x.shape + if self.mask_prob > 0: + mask_indices = compute_mask_indices( + (B, T), + padding_mask, + self.mask_prob, + self.mask_length, + self.mask_selection, + self.mask_other, + min_masks=2, + no_overlap=self.no_mask_overlap, + min_space=self.mask_min_space, + ) + mask_indices = torch.from_numpy(mask_indices).to(x.device) + x[mask_indices] = self.mask_emb + else: + mask_indices = None + + if self.mask_channel_prob > 0: + mask_channel_indices = compute_mask_indices( + (B, C), + None, + self.mask_channel_prob, + self.mask_channel_length, + self.mask_channel_selection, + self.mask_channel_other, + no_overlap=self.no_mask_channel_overlap, + min_space=self.mask_channel_min_space, + ) + mask_channel_indices = ( + torch.from_numpy(mask_channel_indices) + .to(x.device) + .unsqueeze(1) + .expand(-1, T, -1) + ) + x[mask_channel_indices] = 0 + + return x, mask_indices + + def forward_features(self, source: torch.Tensor) -> torch.Tensor: + if self.feature_grad_mult > 0: + features = self.feature_extractor(source) + if self.feature_grad_mult != 1.0: + features = GradMultiply.apply(features, self.feature_grad_mult) + else: + with torch.no_grad(): + features = self.feature_extractor(source) + return features + + def forward_targets( + self, + features: torch.Tensor, + target_list: List[torch.Tensor], + ) -> Tuple[torch.Tensor, torch.Tensor]: + # Trim features to ensure labels exist and then get aligned labels + feat_tsz = features.size(2) + targ_tsz = min([t.size(1) for t in 
target_list]) + if self.feat2tar_ratio * feat_tsz > targ_tsz: + feat_tsz = int(targ_tsz / self.feat2tar_ratio) + features = features[..., :feat_tsz] + target_inds = torch.arange(feat_tsz).float() * self.feat2tar_ratio + target_inds += np.random.choice(int(self.feat2tar_ratio)) + target_list = [t[:, target_inds.long()] for t in target_list] + return features, target_list + + def forward_padding_mask( + self, + features: torch.Tensor, + padding_mask: torch.Tensor, + ) -> torch.Tensor: + extra = padding_mask.size(1) % features.size(1) + if extra > 0: + padding_mask = padding_mask[:, :-extra] + padding_mask = padding_mask.view(padding_mask.size(0), features.size(1), -1) + padding_mask = padding_mask.all(-1) + return padding_mask + + def get_normalized_probs( + self, + net_output: Tuple[Tensor, Optional[Dict[str, List[Optional[Tensor]]]]], + log_probs: bool, + sample: Optional[Dict[str, Tensor]] = None, + ): + lprobs = self.get_normalized_probs_scriptable(net_output, log_probs, sample) + lprobs.batch_first = True + return lprobs + + def downsample_ctc_padding_mask(self, padding_mask): + """ + padding_mask: (B, T) + """ + stride = self.text_ctc_conv_kernel // 2 + return padding_mask[:, ::stride] + + def compute_pred(self, proj_x, label_embs): + if self.target_glu: + label_embs = self.target_glu(label_embs) + x = F.normalize(proj_x.float(), dim=-1) # (S, D) + label_embs = F.normalize(label_embs.float(), dim=-1) # (C, D) + logits = torch.matmul(x, label_embs.T).type_as(proj_x) # (S, C) + logits /= self.logit_temp + return logits + + def compute_hubert_logits(self, x, target, proj, label_embs, padding_mask, mask_indices): + if not self.skip_masked: + masked_indices = torch.logical_and(~padding_mask, mask_indices) + proj_x_m = proj(x[masked_indices]) + logit_m_list = [(self.compute_pred(proj_x_m, label_embs), target[masked_indices])] + else: + logit_m_list = [None] + + if not self.skip_nomask: + nomask_indices = torch.logical_and(~padding_mask, ~mask_indices) + proj_x_u = proj(x[nomask_indices]) + logit_u_list = [(self.compute_pred(proj_x_u, label_embs), target[nomask_indices])] + else: + logit_u_list = [None] + + return logit_m_list, logit_u_list + + def convert_embeddings(self, + x, + padding_mask, + target=None, + mask_indices=None, + mix_with_unit=False, + use_pred_unit=False, + l2_embedding=False, + remask=False + ): + """ + 1. Mix with units if needed (default: True) + 2. 
Prepare for unit_encoder inputs + Inputs: + x, (B, T, D) + Return: + src_tokens, (B, T) + soft_embeddings, (B, T, D) + l2_loss, a loss + """ + soft_embeddings = self.final_proj_list[0](x) if x.size(-1) == self.final_dim else x + if padding_mask is None: + padding_mask = soft_embeddings.new_zeros(soft_embeddings.size(0), soft_embeddings.size(1), dtype=torch.long) + if use_pred_unit: + src_tokens = self.compute_pred(self.final_proj_list[0](x), self.label_embs_list[0]).argmax(dim=-1) + src_tokens[padding_mask] = self.padding_idx + elif target is not None: + src_tokens = target + else: + src_tokens = padding_mask.long() + + if l2_embedding | mix_with_unit: + unit_embeddings = self.unit_embed_tokens(src_tokens) # (B, T, D) + + l2_loss = 0 + if l2_embedding: + if mask_indices is not None: + l2_loss = (soft_embeddings - unit_embeddings)[mask_indices].float().pow(2).mean(dim=-1) + scale = unit_embeddings[mask_indices].float().pow(2).sum(dim=-1) + else: + l2_loss = (soft_embeddings - unit_embeddings).float().pow(2).mean(dim=-1) + scale = unit_embeddings.float().pow(2).sum(dim=-1) + l2_loss = (l2_loss / scale).mean() + + if mix_with_unit: + B, T, D = x.shape + selected_indices = compute_mask_indices( + (B, T), + padding_mask, + self.mask_prob / 2, + self.mask_length // 2, + self.mask_selection, + self.mask_other, + min_masks=2, + no_overlap=self.no_mask_overlap, + min_space=self.mask_min_space, + ) + selected_indices = torch.from_numpy(selected_indices).to(x.device) + if mask_indices is not None: + if remask: + remask_indices = torch.logical_and(selected_indices, mask_indices) + soft_embeddings[remask_indices] = self.mask_emb + swap_indices = torch.logical_and(selected_indices, ~mask_indices) + else: + swap_indices = selected_indices + soft_embeddings[swap_indices] = unit_embeddings[swap_indices] + + soft_embeddings = soft_embeddings * (1 - padding_mask.unsqueeze(-1).type_as(x)) + return src_tokens, soft_embeddings, l2_loss + + def forward( + self, + source: torch.Tensor = None, + src_tokens: torch.Tensor = None, + src_lengths: torch.Tensor = None, + target_list: Optional[List[torch.Tensor]] = None, + padding_mask: Optional[torch.Tensor] = None, + mask: bool = True, + features_only: bool = False, + output_layer: Optional[int] = None, + ) -> Dict[str, torch.Tensor]: + assert source is not None or src_tokens is not None + if source is not None: + return self.forward_speech( + source=source, + target_list=target_list, + padding_mask=padding_mask, + mask=mask, + features_only=features_only, + output_layer=output_layer, + ) + else: + return self.forward_text( + src_tokens=src_tokens, + src_lengths=src_lengths, + mask=self.mask_u2t, + output_layer=output_layer, + ) + + def forward_speech( + self, + source: torch.Tensor = None, + target_list: Optional[List[torch.Tensor]] = None, + padding_mask: Optional[torch.Tensor] = None, + mask: bool = True, + features_only: bool = False, + output_layer: Optional[int] = None, + ) -> Dict[str, torch.Tensor]: + """output layer is 1-based""" + features = self.forward_features(source) + if target_list is not None: + features, target_list = self.forward_targets(features, target_list) + + features_pen = features.float().pow(2).mean() + + features = features.transpose(1, 2) + features = self.layer_norm(features) + unmasked_features = features.clone() + + if padding_mask is not None: + padding_mask = self.forward_padding_mask(features, padding_mask) + + if self.post_extract_proj is not None: + features = self.post_extract_proj(features) + + features = 
self.dropout_input(features) + unmasked_features = self.dropout_features(unmasked_features) + + if mask: + x, mask_indices = self.apply_mask(features, padding_mask, target_list) + else: + x = features + mask_indices = None + + # feature: (B, T, D), float + # target: (B, T), long + # x: (B, T, D), float + # padding_mask: (B, T), bool + # mask_indices: (B, T), bool + x, layer_results = self.encoder( + x, + padding_mask=padding_mask, + layer=None if output_layer is None else output_layer - 1, + ) + + if features_only: + return {"x": x, "padding_mask": padding_mask, "features": features, "layer_results": layer_results} + + logit_m_list, logit_u_list = self.compute_hubert_logits( + x, + target_list[0], + self.final_proj_list[0], + self.label_embs_list[0], + padding_mask, + mask_indices, + ) + + result = { + "logit_m_list": logit_m_list, + "logit_u_list": logit_u_list, + "padding_mask": padding_mask, + "features_pen": features_pen, + } + + if self.add_unit_encoder: + src_tokens, x_emb, l2_loss = self.convert_embeddings( + x, + padding_mask, target_list[0], + mask_indices=mask_indices, + mix_with_unit=self.mix_with_unit, + use_pred_unit=self.use_pred_unit, + l2_embedding=self.l2_embedding, + ) + encoder_out = self.unit_encoder(src_tokens, token_embeddings=x_emb) + + result['encoder_out'] = encoder_out['encoder_out'] # [(T, B, D)] + result['encoder_padding_mask'] = encoder_out['encoder_padding_mask'] # [(B, T)] + if self.l2_embedding: + result['embedding_l2_loss'] = l2_loss + + code_logit_m_list, code_logit_u_list = self.compute_hubert_logits( + encoder_out['encoder_out'][0].transpose(0, 1), + target_list[-1], + self.final_proj_list[-1], + self.label_embs_list[-1], + padding_mask, + mask_indices, + ) + result['logit_m_list'] += code_logit_m_list + result['logit_u_list'] += code_logit_u_list + return result + + def forward_text( + self, + src_tokens: torch.Tensor = None, + src_lengths: torch.Tensor = None, + target_list: Optional[List[torch.Tensor]] = None, + mask: bool = True, + output_layer: Optional[int] = None, + ) -> Dict[str, torch.Tensor]: + assert self.add_unit_encoder, f"Can not forward unit-text branch without unit_encoder!" 
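+        # Text/unit branch: src_tokens are discrete unit ids (hidden units or phonemes).
+        # They are embedded with unit_embed_tokens, optionally masked with the shared
+        # mask_emb, and passed through the shared unit_encoder; padding positions are
+        # derived from padding_idx below.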
+ + padding_mask = src_tokens == self.padding_idx + unit_embeddings = self.unit_embed_tokens(src_tokens) + if mask: + unit_embeddings, mask_indices = self.apply_mask(unit_embeddings, padding_mask, [src_tokens]) + else: + ### If already applied mask on src_tokens, then the target_list should contains many padding_idx + mask_indices = target_list[-1] != self.padding_idx + unit_embeddings[mask_indices] = self.mask_emb + + encoder_out = self.unit_encoder( + src_tokens, + token_embeddings=unit_embeddings, + return_all_hiddens=output_layer is not None, + ) + + result = {} + result["encoder_out"] = encoder_out["encoder_out"] + result["encoder_states"] = encoder_out["encoder_states"] + result["padding_mask"] = padding_mask + + if self.compute_mum: + code_logit_m_list, code_logit_u_list = self.compute_hubert_logits( + encoder_out["encoder_out"].transpose(0, 1), + target_list[-1], + self.final_proj_list[-1], + self.label_embs_list[-1], + padding_mask, + mask_indices, + ) + result["logit_m_list"] = code_logit_m_list + result["logit_u_list"] = code_logit_u_list + + if self.add_text_ctc: + result["encoder_out_ctc"] = [self.unit_encoder_ctc_head(x) for x in encoder_out['encoder_out']] + result["encoder_padding_mask"] = [ + self.downsample_ctc_padding_mask(padding_mask) for padding_mask in encoder_out['encoder_padding_mask'] + ] + return result + + def extract_features( + self, + source: torch.Tensor, + padding_mask: Optional[torch.Tensor] = None, + mask: bool = False, + ret_conv: bool = False, + output_layer: Optional[int] = None, + ret_layer_results: bool = False, + ) -> Tuple[torch.Tensor, torch.Tensor]: + """Extract features for only speech input""" + with torch.no_grad(): + res = self.forward( + source, + padding_mask=padding_mask, + mask=mask, + features_only=True, + output_layer=output_layer, + ) + # {"x": x, "padding_mask": padding_mask, "features": features, "layer_results": layer_results} + + x = res["x"] # B x T x D + padding_mask = res["padding_mask"] + if self.add_unit_encoder and (output_layer is None or output_layer > self.cfg.encoder_layers): + src_tokens, x, _ = self.convert_embeddings( + x, + padding_mask, + mix_with_unit=False, + use_pred_unit=False, + ) + return_all_hiddens=output_layer is not None and output_layer > self.cfg.encoder_layers + encoder_out = self.unit_encoder( + src_tokens, + token_embeddings=x, + return_all_hiddens=return_all_hiddens, + ) + res["x"] = encoder_out['encoder_out'][0].transpose(0, 1) # (B, T, D) + if return_all_hiddens: + res["layer_results"] += encoder_out['encoder_states'][1:1+output_layer-len(res["layer_results"])] + + feature = res["features"] if ret_conv else res["x"] + if ret_layer_results: + feature = (feature, res["layer_results"]) + + return feature, padding_mask + + def get_logits(self, net_output, is_masked=True): + if is_masked: + logits_list = net_output["logit_m_list"] + else: + logits_list = net_output["logit_u_list"] + logits_list = [x[0].float() for x in logits_list if x is not None] + return logits_list + + def get_targets(self, net_output, is_masked=True): + if is_masked: + logits_list = net_output["logit_m_list"] + else: + logits_list = net_output["logit_u_list"] + targets_list = [x[1].long() for x in logits_list if x is not None] + return targets_list + + def get_extra_losses(self, net_output): + extra_losses = [] + names = [] + + if "features_pen" in net_output: + extra_losses.append(net_output["features_pen"]) + names.append("features_pen") + + if "embedding_l2_loss" in net_output: + 
extra_losses.append(net_output["embedding_l2_loss"]) + names.append("embedding_l2_loss") + + return extra_losses, names + + def remove_pretraining_modules(self, step2=False): + self.target_glu = None + diff --git a/SpeechLM/dataset/CommonVoice/v4/en/en-de/config_base_ende.yaml b/SpeechLM/dataset/CommonVoice/v4/en/en-de/config_base_ende.yaml new file mode 100644 index 0000000000000000000000000000000000000000..50733b2740c6f02f3adfc1d536a3a4005ffa7d6a --- /dev/null +++ b/SpeechLM/dataset/CommonVoice/v4/en/en-de/config_base_ende.yaml @@ -0,0 +1,14 @@ +bpe_tokenizer: + bpe: sentencepiece + sentencepiece_model: spm_char_st_en_de.model + +shuffle: false +use_audio_input: true +use_sample_rate: 16000 +standardize_audio: false +vocab_filename: spm_char_st_en_de.txt + +# required by speech_to_text task but never used +input_channels: 1 +input_feat_per_channel: 1 + diff --git a/SpeechLM/dataset/CommonVoice/v4/en/en-de/config_large_ende.yaml b/SpeechLM/dataset/CommonVoice/v4/en/en-de/config_large_ende.yaml new file mode 100644 index 0000000000000000000000000000000000000000..d3424a3c55f0e48e8197d98cd3e724baa08c834f --- /dev/null +++ b/SpeechLM/dataset/CommonVoice/v4/en/en-de/config_large_ende.yaml @@ -0,0 +1,14 @@ +bpe_tokenizer: + bpe: sentencepiece + sentencepiece_model: spm_char_st_en_de.model + +shuffle: false +use_audio_input: true +use_sample_rate: 16000 +standardize_audio: true +vocab_filename: spm_char_st_en_de.txt + +# required by speech_to_text task but never used +input_channels: 1 +input_feat_per_channel: 1 + diff --git a/SpeechLM/dataset/CommonVoice/v4/en/en-de/dev-sample100_st_en_de_local.tsv b/SpeechLM/dataset/CommonVoice/v4/en/en-de/dev-sample100_st_en_de_local.tsv new file mode 100644 index 0000000000000000000000000000000000000000..c4251fa8a24f33e2ebd44ad90899c0778e24aaf8 --- /dev/null +++ b/SpeechLM/dataset/CommonVoice/v4/en/en-de/dev-sample100_st_en_de_local.tsv @@ -0,0 +1,100 @@ +id audio n_frames tgt_text +common_voice_en_18540003 /LocalData/dataset/CommonVoice/v4/en/wav/common_voice_en_18540003.wav 90624 Wenn Wasser knapp ist, verschwenden Sie es nicht. +common_voice_en_18540005 /LocalData/dataset/CommonVoice/v4/en/wav/common_voice_en_18540005.wav 57984 Du fährst mit ihr bis zu ihrer Tür. +common_voice_en_18540006 /LocalData/dataset/CommonVoice/v4/en/wav/common_voice_en_18540006.wav 63744 Celia schreckte zurück und zitterte. +common_voice_en_65557 /LocalData/dataset/CommonVoice/v4/en/wav/common_voice_en_65557.wav 40704 Haben Sie einen Ring? +common_voice_en_65559 /LocalData/dataset/CommonVoice/v4/en/wav/common_voice_en_65559.wav 44160 Ich habe ihn nicht einmal gefragt. +common_voice_en_19594267 /LocalData/dataset/CommonVoice/v4/en/wav/common_voice_en_19594267.wav 110208 Der größte See nach Fläche in der Mongolei, der Uvs-See, ist in der Great Lakes Depression. +common_voice_en_19594268 /LocalData/dataset/CommonVoice/v4/en/wav/common_voice_en_19594268.wav 91392 Die darauffolgende Wiedervereinigung mit Rom hat bis heute ununterbrochen angedauert. +common_voice_en_19594269 /LocalData/dataset/CommonVoice/v4/en/wav/common_voice_en_19594269.wav 64128 Die Saiten könnten aus Messing oder Stahl sein. +common_voice_en_18282099 /LocalData/dataset/CommonVoice/v4/en/wav/common_voice_en_18282099.wav 67584 Andrew rollte sich in der Box zusammen. +common_voice_en_2518264 /LocalData/dataset/CommonVoice/v4/en/wav/common_voice_en_2518264.wav 61824 Säure ätzt Locher in Wollstoff. 
+common_voice_en_18909686 /LocalData/dataset/CommonVoice/v4/en/wav/common_voice_en_18909686.wav 147072 Dies wurde später von Herny Seebohm beschrieben und Riesen-Fischuhu genannt. +common_voice_en_18909688 /LocalData/dataset/CommonVoice/v4/en/wav/common_voice_en_18909688.wav 114048 Er ist auch dazu in der Lage, über kurze Distanzen zu schweben. +common_voice_en_18909689 /LocalData/dataset/CommonVoice/v4/en/wav/common_voice_en_18909689.wav 85248 So konnte Letta seine große Koalition fortsetzen. +common_voice_en_18460666 /LocalData/dataset/CommonVoice/v4/en/wav/common_voice_en_18460666.wav 56064 Es nicht gekostet wegschieben? +common_voice_en_18460690 /LocalData/dataset/CommonVoice/v4/en/wav/common_voice_en_18460690.wav 68736 Ich bin verzweifelt, und damit hatte es sich. +common_voice_en_18460692 /LocalData/dataset/CommonVoice/v4/en/wav/common_voice_en_18460692.wav 54912 Ich folge dir nicht, Jeeves. +common_voice_en_485640 /LocalData/dataset/CommonVoice/v4/en/wav/common_voice_en_485640.wav 70272 Ordentliche Pläne scheitern ohne Glück. +common_voice_en_89833 /LocalData/dataset/CommonVoice/v4/en/wav/common_voice_en_89833.wav 128256 Das ist ein super Armband, das du trägst. +common_voice_en_19001715 /LocalData/dataset/CommonVoice/v4/en/wav/common_voice_en_19001715.wav 148224 Der Buddhismus in Afghanistan wurde von den Saffariden, Ghaznawiden und Ghuriden erfolgreich beseitigt. +common_voice_en_19001716 /LocalData/dataset/CommonVoice/v4/en/wav/common_voice_en_19001716.wav 80256 Das System sieht einen frei schwebenden Lauf vor. +common_voice_en_19001719 /LocalData/dataset/CommonVoice/v4/en/wav/common_voice_en_19001719.wav 109056 Diese bekannten Murderabilia-Händler finden Sie auf den folgenden Websites. +common_voice_en_9774 /LocalData/dataset/CommonVoice/v4/en/wav/common_voice_en_9774.wav 119040 Sie liest unheimlich gern, weiß jedoch nicht so genau, wie das Lesen zu einer Steigerung der Kreativität beitragen kann. +common_voice_en_26370 /LocalData/dataset/CommonVoice/v4/en/wav/common_voice_en_26370.wav 62208 Danke, dass Sie uns an Ihrer Geschichte haben teilhaben lassen. Alles Gute für die Hochzeitsreise. +common_voice_en_26372 /LocalData/dataset/CommonVoice/v4/en/wav/common_voice_en_26372.wav 59904 Sie kennen die Uhrzeit doch. Warum fragen Sie mich danach? +common_voice_en_17260994 /LocalData/dataset/CommonVoice/v4/en/wav/common_voice_en_17260994.wav 155520 Der Fuchs sprang über den Rand der Farm. Dort fand er einen Safari-Reisenden vor, der eine Vivaldi Opera zum Besten gab. +common_voice_en_18881599 /LocalData/dataset/CommonVoice/v4/en/wav/common_voice_en_18881599.wav 108672 "Express" sollte das Gebiet untersuchen und fand dort nichts. +common_voice_en_18881604 /LocalData/dataset/CommonVoice/v4/en/wav/common_voice_en_18881604.wav 92544 Dadurch werden die Probleme gemildert, die durch einen Mangel an Hämoglobin verursacht werden. +common_voice_en_18881605 /LocalData/dataset/CommonVoice/v4/en/wav/common_voice_en_18881605.wav 109056 Diese Behauptungen werden von der Mainstream-Archäologie kategorisch zurückgewiesen. +common_voice_en_180278 /LocalData/dataset/CommonVoice/v4/en/wav/common_voice_en_180278.wav 48768 Sie sollte eigentlich herunterkommen und Sie abholen. +common_voice_en_180279 /LocalData/dataset/CommonVoice/v4/en/wav/common_voice_en_180279.wav 54912 Ich werde dort nicht als Geist leben. +common_voice_en_696251 /LocalData/dataset/CommonVoice/v4/en/wav/common_voice_en_696251.wav 98304 Der Junge hat bemerkt, dass der Engländer nervös war und seine Bücher vergessen hat. 
+common_voice_en_19049974 /LocalData/dataset/CommonVoice/v4/en/wav/common_voice_en_19049974.wav 73344 Durch eine Augenverletzung fand seine Karriere ein vorzeitiges Ende. +common_voice_en_19049975 /LocalData/dataset/CommonVoice/v4/en/wav/common_voice_en_19049975.wav 126336 Supermatrixes ähnlicher Größe können genauso wie normale Matrixes hinzugefügt und vervielfacht werden. +common_voice_en_19049976 /LocalData/dataset/CommonVoice/v4/en/wav/common_voice_en_19049976.wav 94464 Es liegt annäherungsweise südlich von Denali, der höchsten Erhebung in Nordamerika. +common_voice_en_19765134 /LocalData/dataset/CommonVoice/v4/en/wav/common_voice_en_19765134.wav 110208 Kleinstädte in Vietnam unterstehen der regionalen Regierung. +common_voice_en_19765136 /LocalData/dataset/CommonVoice/v4/en/wav/common_voice_en_19765136.wav 61440 Fünf Jahre später nahm er ihn nach Dresden mit. +common_voice_en_19765138 /LocalData/dataset/CommonVoice/v4/en/wav/common_voice_en_19765138.wav 130176 Der Croma ist standardmäßig mit Anti-Blockier-System (ABS) und Elektronischer Bremskraftverteilung (EBD) ausgestattet. +common_voice_en_19688061 /LocalData/dataset/CommonVoice/v4/en/wav/common_voice_en_19688061.wav 104448 Carter hat zwei Kinder, die Tochter Taleya und den Sohn Tamere. +common_voice_en_19688062 /LocalData/dataset/CommonVoice/v4/en/wav/common_voice_en_19688062.wav 148992 Wenn der Gehalt an gelöstem Sauerstoff zu hypoxischen Bedingungen übergeht, ersticken Fische und andere Meerestiere. +common_voice_en_19688064 /LocalData/dataset/CommonVoice/v4/en/wav/common_voice_en_19688064.wav 48768 Adams hatte ein Leben mit vielen Tiefen. +common_voice_en_19690060 /LocalData/dataset/CommonVoice/v4/en/wav/common_voice_en_19690060.wav 136320 Er hat die Dudley Middle Comprehensive School besucht. +common_voice_en_19690063 /LocalData/dataset/CommonVoice/v4/en/wav/common_voice_en_19690063.wav 116352 Der ursprüngliche Name der Schule lautet "School of Commerce and Domestic Science" (Handels- und Hauswirtschaftsschule). +common_voice_en_19690064 /LocalData/dataset/CommonVoice/v4/en/wav/common_voice_en_19690064.wav 124032 Bei dem Unfall, bei dem er am Steuer saß, befand sich auch Anna, seine Tochter, im Auto. Sie hat den Unfall überlebt. +common_voice_en_18260377 /LocalData/dataset/CommonVoice/v4/en/wav/common_voice_en_18260377.wav 98304 Jeder möchte gemocht werden. Das liegt in der Natur des Menschen. +common_voice_en_18260378 /LocalData/dataset/CommonVoice/v4/en/wav/common_voice_en_18260378.wav 85248 Jeder sollte Zugang zu medizinischer Grundversorgung haben. +common_voice_en_18260379 /LocalData/dataset/CommonVoice/v4/en/wav/common_voice_en_18260379.wav 77184 Während wir älter werden, sind wir in unserem Leben gefangen. +common_voice_en_100764 /LocalData/dataset/CommonVoice/v4/en/wav/common_voice_en_100764.wav 73344 Sie sollten das in einem Wahrnehmungsexperiment untersuchen. +common_voice_en_100765 /LocalData/dataset/CommonVoice/v4/en/wav/common_voice_en_100765.wav 70656 Sie haben mich vom ersten Moment an abgelehnt. +common_voice_en_626029 /LocalData/dataset/CommonVoice/v4/en/wav/common_voice_en_626029.wav 104448 Hanf ist ein Gras, das in Teilen der Tropen vorgefunden wird. +common_voice_en_19703984 /LocalData/dataset/CommonVoice/v4/en/wav/common_voice_en_19703984.wav 142848 Sowohl Federation als auch Neo-Zeon Forces sehen dabei zu als die Axis beim Wiedereintritt von der Bahn abkommen. 
+common_voice_en_19703985 /LocalData/dataset/CommonVoice/v4/en/wav/common_voice_en_19703985.wav 165120 Das Mutterhaus in Loretto befindet sich in Nerinx, Marion County, Kentucky. +common_voice_en_19703987 /LocalData/dataset/CommonVoice/v4/en/wav/common_voice_en_19703987.wav 114048 Der Umfang von Matildas militärischer Ausbildung wird diskutiert. +common_voice_en_19676540 /LocalData/dataset/CommonVoice/v4/en/wav/common_voice_en_19676540.wav 105984 Zu Lernzwecken wurden Stifte nach und nach durch Schreibtafeln ersetzt. +common_voice_en_19676541 /LocalData/dataset/CommonVoice/v4/en/wav/common_voice_en_19676541.wav 131712 Die extrem hügelige Landschaft zeichnet sich durch eine Art Erhabenheit aus und bietet einen atemberaubenden Ausblick. +common_voice_en_19676542 /LocalData/dataset/CommonVoice/v4/en/wav/common_voice_en_19676542.wav 93696 Die beiden Tierbilder wurden zu einem Bild kombiniert. +common_voice_en_19678470 /LocalData/dataset/CommonVoice/v4/en/wav/common_voice_en_19678470.wav 145920 Sie und Gatti-Casazza haben sich im darauffolgenden Jahr getrennt und sich dann scheiden lassen. +common_voice_en_19678471 /LocalData/dataset/CommonVoice/v4/en/wav/common_voice_en_19678471.wav 74112 Es zeigt allerdings niemand Interesse. +common_voice_en_19678476 /LocalData/dataset/CommonVoice/v4/en/wav/common_voice_en_19678476.wav 98688 Er hat keine sinnvollen Aussagen gemacht. Es war nur Kauderwelsch. +common_voice_en_17730605 /LocalData/dataset/CommonVoice/v4/en/wav/common_voice_en_17730605.wav 57984 Wer im Glashaus sitzt, sollte nicht mit Steinen werfen. +common_voice_en_19768089 /LocalData/dataset/CommonVoice/v4/en/wav/common_voice_en_19768089.wav 66432 Der Rahmen kippt den Motor leicht nach hinten. +common_voice_en_19768197 /LocalData/dataset/CommonVoice/v4/en/wav/common_voice_en_19768197.wav 58752 Bevor er hauptberuflich Politiker wurde, war er Landwirt. +common_voice_en_19768200 /LocalData/dataset/CommonVoice/v4/en/wav/common_voice_en_19768200.wav 73344 Er hat auch als Karikaturist und Comiczeichner gearbeitet. +common_voice_en_19699188 /LocalData/dataset/CommonVoice/v4/en/wav/common_voice_en_19699188.wav 106368 Das Schiff war zwei von vier Lebensjahren aufgelegt. +common_voice_en_19699189 /LocalData/dataset/CommonVoice/v4/en/wav/common_voice_en_19699189.wav 133632 Boucher hat sich von Künstlern wie Peter Pauls Rubens und Antoine Watteau inspirieren lassen. +common_voice_en_19699190 /LocalData/dataset/CommonVoice/v4/en/wav/common_voice_en_19699190.wav 108288 Zwei Tracks wurden als Auszüge auf einer Single herausgebracht. +common_voice_en_512711 /LocalData/dataset/CommonVoice/v4/en/wav/common_voice_en_512711.wav 84096 Gruppe von Menschen, von sanftem Licht einer Öllaterne angestrahlt. +common_voice_en_512712 /LocalData/dataset/CommonVoice/v4/en/wav/common_voice_en_512712.wav 103296 Frau mit hellem Haar und Mann mit einem Lächeln, die nebeneinander sitzen. +common_voice_en_512713 /LocalData/dataset/CommonVoice/v4/en/wav/common_voice_en_512713.wav 98304 Ein Mann fährt die Straße entlang und passiert Blumenkübel. Er hält dabei ein zweites Fahrrad. +common_voice_en_19678686 /LocalData/dataset/CommonVoice/v4/en/wav/common_voice_en_19678686.wav 114816 Computertische werden normalerweise in der Massenproduktion gefertigt und müssen teilweise in Selbstmontage montiert werden. +common_voice_en_19678689 /LocalData/dataset/CommonVoice/v4/en/wav/common_voice_en_19678689.wav 97536 Aufgrund der geringen Auflage gilt es jetzt als Sammlerstück. 
+common_voice_en_19678692 /LocalData/dataset/CommonVoice/v4/en/wav/common_voice_en_19678692.wav 139776 Die Songs von Thrussel haben regelmäßig Themen zum Gegenstand, in denen er sich gegen Konsum und Überwachung durch den Staat ausspricht. +common_voice_en_648128 /LocalData/dataset/CommonVoice/v4/en/wav/common_voice_en_648128.wav 77184 Ein Mann und ein Kind auf einem Campingplatz, die ein Frühstück zubereiten. +common_voice_en_648129 /LocalData/dataset/CommonVoice/v4/en/wav/common_voice_en_648129.wav 72576 Militärangehörige bereiten sich auf ihren Dienst vor. +common_voice_en_648130 /LocalData/dataset/CommonVoice/v4/en/wav/common_voice_en_648130.wav 109824 Ein Baseballspieler, der ein blaues T-Shirt trägt, läuft auf eine Base zu. +common_voice_en_34182 /LocalData/dataset/CommonVoice/v4/en/wav/common_voice_en_34182.wav 82560 Ihr Büro hat mich angerufen, um ihn zurückzuhalten. +common_voice_en_34184 /LocalData/dataset/CommonVoice/v4/en/wav/common_voice_en_34184.wav 67968 Dieser Satz macht überhaupt keinen Sinn. +common_voice_en_92676 /LocalData/dataset/CommonVoice/v4/en/wav/common_voice_en_92676.wav 62976 Eine Gruppe von Leuten läuft durch eine Karnevalsgruppe. +common_voice_en_92677 /LocalData/dataset/CommonVoice/v4/en/wav/common_voice_en_92677.wav 62976 Ein älteres Paar, das singt und Gitarre spielt. +common_voice_en_92678 /LocalData/dataset/CommonVoice/v4/en/wav/common_voice_en_92678.wav 86016 Zwei Männer in roten Hosen vollführen akrobatische Kunststücke mit einer Leiter. +common_voice_en_570502 /LocalData/dataset/CommonVoice/v4/en/wav/common_voice_en_570502.wav 82944 Künstliche neuronale Netzwerke können etwas ganz ähnliches ausführen. +common_voice_en_141246 /LocalData/dataset/CommonVoice/v4/en/wav/common_voice_en_141246.wav 63744 Schalte die Laterne aus, die uns Licht spendet. +common_voice_en_141247 /LocalData/dataset/CommonVoice/v4/en/wav/common_voice_en_141247.wav 62592 Brian reist heute ab. +common_voice_en_19047441 /LocalData/dataset/CommonVoice/v4/en/wav/common_voice_en_19047441.wav 84096 Die Bewohner haben im Namen des Dauphin eine Silbermine betrieben. +common_voice_en_19047442 /LocalData/dataset/CommonVoice/v4/en/wav/common_voice_en_19047442.wav 74496 Die Statue wurde durch den Millenium Lottery Fund teilfinanziert. +common_voice_en_19047443 /LocalData/dataset/CommonVoice/v4/en/wav/common_voice_en_19047443.wav 117504 Das Henderson House in Elmhurst, Illinois, USA; hat einen ähnlichen Grundriss. +common_voice_en_567705 /LocalData/dataset/CommonVoice/v4/en/wav/common_voice_en_567705.wav 54144 Hängen Sie an beide Zweige Lametta. +common_voice_en_17283658 /LocalData/dataset/CommonVoice/v4/en/wav/common_voice_en_17283658.wav 59520 Unter den Linden. +common_voice_en_17283659 /LocalData/dataset/CommonVoice/v4/en/wav/common_voice_en_17283659.wav 119040 Das höchste Gebäude der Welt ist 829,8 m hoch. +common_voice_en_18707930 /LocalData/dataset/CommonVoice/v4/en/wav/common_voice_en_18707930.wav 107136 Die Stadt liegt in Harris County, in Südost Texas. +common_voice_en_18707931 /LocalData/dataset/CommonVoice/v4/en/wav/common_voice_en_18707931.wav 155904 Das steht im Gegensatz zum Potential des Pacemakers oder dem Strom, der die rhythmische Modulierung der Impulsfrequenz antreibt. +common_voice_en_18707933 /LocalData/dataset/CommonVoice/v4/en/wav/common_voice_en_18707933.wav 90624 Die Stadt wird durch eine Stadtverwaltung regiert. 
+common_voice_en_18524588 /LocalData/dataset/CommonVoice/v4/en/wav/common_voice_en_18524588.wav 88704 Genehmigen Sie den Ausdruck meiner vorzüglichen Hochachtung. +common_voice_en_18524590 /LocalData/dataset/CommonVoice/v4/en/wav/common_voice_en_18524590.wav 67584 Anhand der Laufzeit kann man ablesen, dass dieser Computer nie neu gestartet wurde. +common_voice_en_18524592 /LocalData/dataset/CommonVoice/v4/en/wav/common_voice_en_18524592.wav 89856 Celia stand dort, war offenbar nicht betroffen und konnte den Vorkommnissen nicht folgen. +common_voice_en_19254317 /LocalData/dataset/CommonVoice/v4/en/wav/common_voice_en_19254317.wav 134784 Der Unterricht wird extra abends abgehalten, damit die Studenten von High Schools daran teilnehmen können. +common_voice_en_19254318 /LocalData/dataset/CommonVoice/v4/en/wav/common_voice_en_19254318.wav 119424 Dieses Fett ist Halacha und wird auch Chelev oder Talg genannt. +common_voice_en_19254320 /LocalData/dataset/CommonVoice/v4/en/wav/common_voice_en_19254320.wav 97536 Die Patienten und das Krankenhauspersonal haben sie für den Preis nominiert. +common_voice_en_542826 /LocalData/dataset/CommonVoice/v4/en/wav/common_voice_en_542826.wav 117120 Jeder Teilbereich des Bildschirms gehört zu einer bestimmten Reihe und Spalte. +common_voice_en_542828 /LocalData/dataset/CommonVoice/v4/en/wav/common_voice_en_542828.wav 108672 Die internationale Raumstation ist ein großartiges Projekt. diff --git a/SpeechLM/dataset/CommonVoice/v4/en/en-de/spm_char_st_en_de.model b/SpeechLM/dataset/CommonVoice/v4/en/en-de/spm_char_st_en_de.model new file mode 100644 index 0000000000000000000000000000000000000000..9dc0c3ca4c5a2ab48f2fdb344005b1169fe55a2e Binary files /dev/null and b/SpeechLM/dataset/CommonVoice/v4/en/en-de/spm_char_st_en_de.model differ diff --git a/SpeechLM/dataset/CommonVoice/v4/en/en-de/spm_char_st_en_de.txt b/SpeechLM/dataset/CommonVoice/v4/en/en-de/spm_char_st_en_de.txt new file mode 100644 index 0000000000000000000000000000000000000000..1a1a2f6420331fb0efef9fe87631b10fa493dba7 --- /dev/null +++ b/SpeechLM/dataset/CommonVoice/v4/en/en-de/spm_char_st_en_de.txt @@ -0,0 +1,164 @@ +▁ 1 +e 1 +n 1 +i 1 +r 1 +t 1 +s 1 +a 1 +d 1 +h 1 +u 1 +l 1 +o 1 +c 1 +g 1 +m 1 +. 1 +b 1 +f 1 +w 1 +k 1 +z 1 +S 1 +v 1 +p 1 +, 1 +D 1 +ü 1 +E 1 +ä 1 +A 1 +B 1 +M 1 +G 1 +" 1 +F 1 +K 1 +P 1 +W 1 +T 1 +y 1 +H 1 +ö 1 +I 1 +R 1 +L 1 +- 1 +C 1 +V 1 +N 1 +ß 1 +Z 1 +J 1 +U 1 +j 1 +O 1 +x 1 +? 1 +! 1 +' 1 +q 1 +Y 1 +Ü 1 +: 1 +Q 1 +Ä 1 +Ö 1 +; 1 +( 1 +) 1 +X 1 +0 1 +1 1 +[ 1 +] 1 +é 1 +2 1 +& 1 +3 1 +5 1 +4 1 +7 1 +9 1 +8 1 +6 1 +/ 1 +á 1 +ō 1 +ó 1 +ñ 1 +ú 1 +í 1 +ā 1 +è 1 +* 1 +ć 1 +à 1 +ê 1 +ë 1 +¡ 1 +ç 1 +ð 1 +ã 1 +č 1 +ū 1 +% 1 +É 1 +â 1 +ø 1 +š 1 +å 1 +ô 1 +ł 1 +œ 1 +ş 1 +Š 1 +_ 1 +Î 1 +Ó 1 +æ 1 +ï 1 +ă 1 +ě 1 +ī 1 +ı 1 +ʻ 1 +ʿ 1 +π 1 +и 1 +к 1 += 1 +à 1 +Ø 1 +î 1 +û 1 +þ 1 +ċ 1 +Č 1 +ę 1 +ğ 1 +ń 1 +Ō 1 +ő 1 +ř 1 +ž 1 +ǎ 1 +α 1 +В 1 +е 1 +з 1 +й 1 +л 1 +н 1 +ь 1 +я 1 +ṃ 1 +ạ 1 +ụ 1 +→ 1 +≡ 1 +京 1 +大 1 +都 1 +阪 1 diff --git a/SpeechLM/dataset/CommonVoice/v4/en/en-de/spm_char_st_en_de.vocab b/SpeechLM/dataset/CommonVoice/v4/en/en-de/spm_char_st_en_de.vocab new file mode 100644 index 0000000000000000000000000000000000000000..dcaf02c4610abddbef943bb81b8df7807ca6d7ca --- /dev/null +++ b/SpeechLM/dataset/CommonVoice/v4/en/en-de/spm_char_st_en_de.vocab @@ -0,0 +1,168 @@ + 0 + 0 + 0 + 0 +▁ -1.94346 +e -2.0247 +n -2.52771 +i -2.69095 +r -2.81179 +t -2.99429 +s -3.07457 +a -3.08727 +d -3.37853 +h -3.41543 +u -3.52845 +l -3.53925 +o -3.76429 +c -3.83672 +g -3.89086 +m -4.03425 +. 
-4.27171 +b -4.34078 +f -4.45167 +w -4.51255 +k -4.68054 +z -4.81542 +S -4.96966 +v -5.01738 +p -5.09819 +, -5.11371 +D -5.22687 +ü -5.34517 +E -5.43072 +ä -5.43483 +A -5.61389 +B -5.67037 +M -5.68285 +G -5.93387 +" -5.94796 +F -5.95252 +K -5.99114 +P -6.03568 +W -6.0592 +T -6.08128 +y -6.08834 +H -6.14664 +ö -6.17763 +I -6.18576 +R -6.22513 +L -6.30172 +- -6.34074 +C -6.41901 +V -6.44441 +N -6.48507 +ß -6.60475 +Z -6.78851 +J -6.81489 +U -7.04154 +j -7.07161 +O -7.13538 +x -7.50985 +? -7.66957 +! -8.34983 +' -8.62779 +q -8.7511 +Y -8.80869 +Ü -9.0344 +: -9.03696 +Q -9.11993 +Ä -9.61997 +Ö -9.9612 +; -10.0729 +( -10.0826 +) -10.0839 +X -10.6277 +0 -11.1096 +1 -11.1164 +[ -11.296 +] -11.296 +é -11.3293 +2 -11.4413 +& -12.1488 +3 -12.188 +5 -12.3864 +4 -12.4237 +7 -12.4891 +9 -12.6035 +8 -12.6343 +6 -12.666 +/ -12.9645 +á -13.1043 +ō -13.392 +ó -13.5351 +ñ -13.6151 +ú -13.9028 +í -14.1541 +ā -14.1541 +è -14.2282 +* -14.3953 +ć -14.7137 +à -14.8472 +ê -14.8472 +ë -14.8472 +¡ -15.0014 +ç -15.0014 +ð -15.0014 +ã -15.1837 +č -15.1837 +ū -15.1837 +% -15.4069 +É -15.4069 +â -15.4069 +ø -15.4069 +š -15.4069 +å -15.6945 +ô -15.6945 +ł -15.6945 +œ -15.6945 +ş -15.6945 +Š -15.6945 +_ -16.1 +Î -16.1 +Ó -16.1 +æ -16.1 +ï -16.1 +ă -16.1 +ě -16.1 +ī -16.1 +ı -16.1 +ʻ -16.1 +ʿ -16.1 +π -16.1 +и -16.1 +к -16.1 += -16.7932 +à -16.7932 +Ø -16.7932 +î -16.7932 +û -16.7932 +þ -16.7932 +ċ -16.7932 +Č -16.7932 +ę -16.7932 +ğ -16.7932 +ń -16.7932 +Ō -16.7932 +ő -16.7932 +ř -16.7932 +ž -16.7932 +ǎ -16.7932 +α -16.7932 +В -16.7932 +е -16.7932 +з -16.7932 +й -16.7932 +л -16.7932 +н -16.7932 +ь -16.7932 +я -16.7932 +ṃ -16.7932 +ạ -16.7932 +ụ -16.7932 +→ -16.7932 +≡ -16.7932 +京 -16.7932 +大 -16.7932 +都 -16.7932 +阪 -16.7932 diff --git a/SpeechLM/dataset/LibriLM/hidden_unit/bin-idx/config.yaml b/SpeechLM/dataset/LibriLM/hidden_unit/bin-idx/config.yaml new file mode 100644 index 0000000000000000000000000000000000000000..97f25d9780d99813e322fbbf24c5b916525ede94 --- /dev/null +++ b/SpeechLM/dataset/LibriLM/hidden_unit/bin-idx/config.yaml @@ -0,0 +1,3 @@ +vocab_filename: dict.ltr.txt +src_vocab_filename: dict.km.txt + diff --git a/SpeechLM/dataset/LibriLM/hidden_unit/bin-idx/dict.km.txt b/SpeechLM/dataset/LibriLM/hidden_unit/bin-idx/dict.km.txt new file mode 100644 index 0000000000000000000000000000000000000000..bbfe59e554d6234f3631d8d09d9281c2160f4675 --- /dev/null +++ b/SpeechLM/dataset/LibriLM/hidden_unit/bin-idx/dict.km.txt @@ -0,0 +1,500 @@ +0 0 +1 1 +2 2 +3 3 +4 4 +5 5 +6 6 +7 7 +8 8 +9 9 +10 10 +11 11 +12 12 +13 13 +14 14 +15 15 +16 16 +17 17 +18 18 +19 19 +20 20 +21 21 +22 22 +23 23 +24 24 +25 25 +26 26 +27 27 +28 28 +29 29 +30 30 +31 31 +32 32 +33 33 +34 34 +35 35 +36 36 +37 37 +38 38 +39 39 +40 40 +41 41 +42 42 +43 43 +44 44 +45 45 +46 46 +47 47 +48 48 +49 49 +50 50 +51 51 +52 52 +53 53 +54 54 +55 55 +56 56 +57 57 +58 58 +59 59 +60 60 +61 61 +62 62 +63 63 +64 64 +65 65 +66 66 +67 67 +68 68 +69 69 +70 70 +71 71 +72 72 +73 73 +74 74 +75 75 +76 76 +77 77 +78 78 +79 79 +80 80 +81 81 +82 82 +83 83 +84 84 +85 85 +86 86 +87 87 +88 88 +89 89 +90 90 +91 91 +92 92 +93 93 +94 94 +95 95 +96 96 +97 97 +98 98 +99 99 +100 100 +101 101 +102 102 +103 103 +104 104 +105 105 +106 106 +107 107 +108 108 +109 109 +110 110 +111 111 +112 112 +113 113 +114 114 +115 115 +116 116 +117 117 +118 118 +119 119 +120 120 +121 121 +122 122 +123 123 +124 124 +125 125 +126 126 +127 127 +128 128 +129 129 +130 130 +131 131 +132 132 +133 133 +134 134 +135 135 +136 136 +137 137 +138 138 +139 139 +140 140 +141 141 +142 142 +143 143 +144 144 +145 145 +146 
146 +147 147 +148 148 +149 149 +150 150 +151 151 +152 152 +153 153 +154 154 +155 155 +156 156 +157 157 +158 158 +159 159 +160 160 +161 161 +162 162 +163 163 +164 164 +165 165 +166 166 +167 167 +168 168 +169 169 +170 170 +171 171 +172 172 +173 173 +174 174 +175 175 +176 176 +177 177 +178 178 +179 179 +180 180 +181 181 +182 182 +183 183 +184 184 +185 185 +186 186 +187 187 +188 188 +189 189 +190 190 +191 191 +192 192 +193 193 +194 194 +195 195 +196 196 +197 197 +198 198 +199 199 +200 200 +201 201 +202 202 +203 203 +204 204 +205 205 +206 206 +207 207 +208 208 +209 209 +210 210 +211 211 +212 212 +213 213 +214 214 +215 215 +216 216 +217 217 +218 218 +219 219 +220 220 +221 221 +222 222 +223 223 +224 224 +225 225 +226 226 +227 227 +228 228 +229 229 +230 230 +231 231 +232 232 +233 233 +234 234 +235 235 +236 236 +237 237 +238 238 +239 239 +240 240 +241 241 +242 242 +243 243 +244 244 +245 245 +246 246 +247 247 +248 248 +249 249 +250 250 +251 251 +252 252 +253 253 +254 254 +255 255 +256 256 +257 257 +258 258 +259 259 +260 260 +261 261 +262 262 +263 263 +264 264 +265 265 +266 266 +267 267 +268 268 +269 269 +270 270 +271 271 +272 272 +273 273 +274 274 +275 275 +276 276 +277 277 +278 278 +279 279 +280 280 +281 281 +282 282 +283 283 +284 284 +285 285 +286 286 +287 287 +288 288 +289 289 +290 290 +291 291 +292 292 +293 293 +294 294 +295 295 +296 296 +297 297 +298 298 +299 299 +300 300 +301 301 +302 302 +303 303 +304 304 +305 305 +306 306 +307 307 +308 308 +309 309 +310 310 +311 311 +312 312 +313 313 +314 314 +315 315 +316 316 +317 317 +318 318 +319 319 +320 320 +321 321 +322 322 +323 323 +324 324 +325 325 +326 326 +327 327 +328 328 +329 329 +330 330 +331 331 +332 332 +333 333 +334 334 +335 335 +336 336 +337 337 +338 338 +339 339 +340 340 +341 341 +342 342 +343 343 +344 344 +345 345 +346 346 +347 347 +348 348 +349 349 +350 350 +351 351 +352 352 +353 353 +354 354 +355 355 +356 356 +357 357 +358 358 +359 359 +360 360 +361 361 +362 362 +363 363 +364 364 +365 365 +366 366 +367 367 +368 368 +369 369 +370 370 +371 371 +372 372 +373 373 +374 374 +375 375 +376 376 +377 377 +378 378 +379 379 +380 380 +381 381 +382 382 +383 383 +384 384 +385 385 +386 386 +387 387 +388 388 +389 389 +390 390 +391 391 +392 392 +393 393 +394 394 +395 395 +396 396 +397 397 +398 398 +399 399 +400 400 +401 401 +402 402 +403 403 +404 404 +405 405 +406 406 +407 407 +408 408 +409 409 +410 410 +411 411 +412 412 +413 413 +414 414 +415 415 +416 416 +417 417 +418 418 +419 419 +420 420 +421 421 +422 422 +423 423 +424 424 +425 425 +426 426 +427 427 +428 428 +429 429 +430 430 +431 431 +432 432 +433 433 +434 434 +435 435 +436 436 +437 437 +438 438 +439 439 +440 440 +441 441 +442 442 +443 443 +444 444 +445 445 +446 446 +447 447 +448 448 +449 449 +450 450 +451 451 +452 452 +453 453 +454 454 +455 455 +456 456 +457 457 +458 458 +459 459 +460 460 +461 461 +462 462 +463 463 +464 464 +465 465 +466 466 +467 467 +468 468 +469 469 +470 470 +471 471 +472 472 +473 473 +474 474 +475 475 +476 476 +477 477 +478 478 +479 479 +480 480 +481 481 +482 482 +483 483 +484 484 +485 485 +486 486 +487 487 +488 488 +489 489 +490 490 +491 491 +492 492 +493 493 +494 494 +495 495 +496 496 +497 497 +498 498 +499 499 diff --git a/SpeechLM/dataset/LibriLM/hidden_unit/bin-idx/dict.ltr.txt b/SpeechLM/dataset/LibriLM/hidden_unit/bin-idx/dict.ltr.txt new file mode 100644 index 0000000000000000000000000000000000000000..26a7e6ba309998c3868db7ecab5d7afa52a68e52 --- /dev/null +++ b/SpeechLM/dataset/LibriLM/hidden_unit/bin-idx/dict.ltr.txt @@ -0,0 +1,29 @@ +| 803288730 +E 439294199 +T 
319071758 +A 277306732 +O 263784364 +N 239361162 +I 237353011 +H 223346762 +S 220175453 +R 203352500 +D 152198685 +L 141597450 +U 98913389 +M 87138757 +C 84680142 +W 81375101 +F 80240665 +G 70642902 +Y 68388038 +P 58436929 +B 52538531 +V 33250231 +K 26906609 +' 9162896 +X 5075632 +J 4746771 +Q 3401794 +Z 2186971 + 1 diff --git a/SpeechLM/dataset/LibriLM/phone_unit/bin-idx/config.yaml b/SpeechLM/dataset/LibriLM/phone_unit/bin-idx/config.yaml new file mode 100644 index 0000000000000000000000000000000000000000..d6fd3d8c13f92f3ef5796e4c93adb4fe3161a38b --- /dev/null +++ b/SpeechLM/dataset/LibriLM/phone_unit/bin-idx/config.yaml @@ -0,0 +1,3 @@ +vocab_filename: dict.ltr.txt +src_vocab_filename: dict.phn.txt + diff --git a/SpeechLM/dataset/LibriLM/phone_unit/bin-idx/dict.ltr.txt b/SpeechLM/dataset/LibriLM/phone_unit/bin-idx/dict.ltr.txt new file mode 100644 index 0000000000000000000000000000000000000000..26a7e6ba309998c3868db7ecab5d7afa52a68e52 --- /dev/null +++ b/SpeechLM/dataset/LibriLM/phone_unit/bin-idx/dict.ltr.txt @@ -0,0 +1,29 @@ +| 803288730 +E 439294199 +T 319071758 +A 277306732 +O 263784364 +N 239361162 +I 237353011 +H 223346762 +S 220175453 +R 203352500 +D 152198685 +L 141597450 +U 98913389 +M 87138757 +C 84680142 +W 81375101 +F 80240665 +G 70642902 +Y 68388038 +P 58436929 +B 52538531 +V 33250231 +K 26906609 +' 9162896 +X 5075632 +J 4746771 +Q 3401794 +Z 2186971 + 1 diff --git a/SpeechLM/dataset/LibriLM/phone_unit/bin-idx/dict.phn.txt b/SpeechLM/dataset/LibriLM/phone_unit/bin-idx/dict.phn.txt new file mode 100644 index 0000000000000000000000000000000000000000..812e4b06e13b30fda420034927f6f877e2d54f56 --- /dev/null +++ b/SpeechLM/dataset/LibriLM/phone_unit/bin-idx/dict.phn.txt @@ -0,0 +1,364 @@ + 0 +SIL 1 +SIL_B 2 +SIL_E 3 +SIL_I 4 +SIL_S 5 +SPN 6 +SPN_B 7 +SPN_E 8 +SPN_I 9 +SPN_S 10 +AA_B 11 +AA_E 12 +AA_I 13 +AA_S 14 +AA0_B 15 +AA0_E 16 +AA0_I 17 +AA0_S 18 +AA1_B 19 +AA1_E 20 +AA1_I 21 +AA1_S 22 +AA2_B 23 +AA2_E 24 +AA2_I 25 +AA2_S 26 +AE_B 27 +AE_E 28 +AE_I 29 +AE_S 30 +AE0_B 31 +AE0_E 32 +AE0_I 33 +AE0_S 34 +AE1_B 35 +AE1_E 36 +AE1_I 37 +AE1_S 38 +AE2_B 39 +AE2_E 40 +AE2_I 41 +AE2_S 42 +AH_B 43 +AH_E 44 +AH_I 45 +AH_S 46 +AH0_B 47 +AH0_E 48 +AH0_I 49 +AH0_S 50 +AH1_B 51 +AH1_E 52 +AH1_I 53 +AH1_S 54 +AH2_B 55 +AH2_E 56 +AH2_I 57 +AH2_S 58 +AO_B 59 +AO_E 60 +AO_I 61 +AO_S 62 +AO0_B 63 +AO0_E 64 +AO0_I 65 +AO0_S 66 +AO1_B 67 +AO1_E 68 +AO1_I 69 +AO1_S 70 +AO2_B 71 +AO2_E 72 +AO2_I 73 +AO2_S 74 +AW_B 75 +AW_E 76 +AW_I 77 +AW_S 78 +AW0_B 79 +AW0_E 80 +AW0_I 81 +AW0_S 82 +AW1_B 83 +AW1_E 84 +AW1_I 85 +AW1_S 86 +AW2_B 87 +AW2_E 88 +AW2_I 89 +AW2_S 90 +AY_B 91 +AY_E 92 +AY_I 93 +AY_S 94 +AY0_B 95 +AY0_E 96 +AY0_I 97 +AY0_S 98 +AY1_B 99 +AY1_E 100 +AY1_I 101 +AY1_S 102 +AY2_B 103 +AY2_E 104 +AY2_I 105 +AY2_S 106 +B_B 107 +B_E 108 +B_I 109 +B_S 110 +CH_B 111 +CH_E 112 +CH_I 113 +CH_S 114 +D_B 115 +D_E 116 +D_I 117 +D_S 118 +DH_B 119 +DH_E 120 +DH_I 121 +DH_S 122 +EH_B 123 +EH_E 124 +EH_I 125 +EH_S 126 +EH0_B 127 +EH0_E 128 +EH0_I 129 +EH0_S 130 +EH1_B 131 +EH1_E 132 +EH1_I 133 +EH1_S 134 +EH2_B 135 +EH2_E 136 +EH2_I 137 +EH2_S 138 +ER_B 139 +ER_E 140 +ER_I 141 +ER_S 142 +ER0_B 143 +ER0_E 144 +ER0_I 145 +ER0_S 146 +ER1_B 147 +ER1_E 148 +ER1_I 149 +ER1_S 150 +ER2_B 151 +ER2_E 152 +ER2_I 153 +ER2_S 154 +EY_B 155 +EY_E 156 +EY_I 157 +EY_S 158 +EY0_B 159 +EY0_E 160 +EY0_I 161 +EY0_S 162 +EY1_B 163 +EY1_E 164 +EY1_I 165 +EY1_S 166 +EY2_B 167 +EY2_E 168 +EY2_I 169 +EY2_S 170 +F_B 171 +F_E 172 +F_I 173 +F_S 174 +G_B 175 +G_E 176 +G_I 177 +G_S 178 +HH_B 179 +HH_E 180 +HH_I 181 +HH_S 182 +IH_B 183 
+IH_E 184 +IH_I 185 +IH_S 186 +IH0_B 187 +IH0_E 188 +IH0_I 189 +IH0_S 190 +IH1_B 191 +IH1_E 192 +IH1_I 193 +IH1_S 194 +IH2_B 195 +IH2_E 196 +IH2_I 197 +IH2_S 198 +IY_B 199 +IY_E 200 +IY_I 201 +IY_S 202 +IY0_B 203 +IY0_E 204 +IY0_I 205 +IY0_S 206 +IY1_B 207 +IY1_E 208 +IY1_I 209 +IY1_S 210 +IY2_B 211 +IY2_E 212 +IY2_I 213 +IY2_S 214 +JH_B 215 +JH_E 216 +JH_I 217 +JH_S 218 +K_B 219 +K_E 220 +K_I 221 +K_S 222 +L_B 223 +L_E 224 +L_I 225 +L_S 226 +M_B 227 +M_E 228 +M_I 229 +M_S 230 +N_B 231 +N_E 232 +N_I 233 +N_S 234 +NG_B 235 +NG_E 236 +NG_I 237 +NG_S 238 +OW_B 239 +OW_E 240 +OW_I 241 +OW_S 242 +OW0_B 243 +OW0_E 244 +OW0_I 245 +OW0_S 246 +OW1_B 247 +OW1_E 248 +OW1_I 249 +OW1_S 250 +OW2_B 251 +OW2_E 252 +OW2_I 253 +OW2_S 254 +OY_B 255 +OY_E 256 +OY_I 257 +OY_S 258 +OY0_B 259 +OY0_E 260 +OY0_I 261 +OY0_S 262 +OY1_B 263 +OY1_E 264 +OY1_I 265 +OY1_S 266 +OY2_B 267 +OY2_E 268 +OY2_I 269 +OY2_S 270 +P_B 271 +P_E 272 +P_I 273 +P_S 274 +R_B 275 +R_E 276 +R_I 277 +R_S 278 +S_B 279 +S_E 280 +S_I 281 +S_S 282 +SH_B 283 +SH_E 284 +SH_I 285 +SH_S 286 +T_B 287 +T_E 288 +T_I 289 +T_S 290 +TH_B 291 +TH_E 292 +TH_I 293 +TH_S 294 +UH_B 295 +UH_E 296 +UH_I 297 +UH_S 298 +UH0_B 299 +UH0_E 300 +UH0_I 301 +UH0_S 302 +UH1_B 303 +UH1_E 304 +UH1_I 305 +UH1_S 306 +UH2_B 307 +UH2_E 308 +UH2_I 309 +UH2_S 310 +UW_B 311 +UW_E 312 +UW_I 313 +UW_S 314 +UW0_B 315 +UW0_E 316 +UW0_I 317 +UW0_S 318 +UW1_B 319 +UW1_E 320 +UW1_I 321 +UW1_S 322 +UW2_B 323 +UW2_E 324 +UW2_I 325 +UW2_S 326 +V_B 327 +V_E 328 +V_I 329 +V_S 330 +W_B 331 +W_E 332 +W_I 333 +W_S 334 +Y_B 335 +Y_E 336 +Y_I 337 +Y_S 338 +Z_B 339 +Z_E 340 +Z_I 341 +Z_S 342 +ZH_B 343 +ZH_E 344 +ZH_I 345 +ZH_S 346 +#0 347 +#1 348 +#2 349 +#3 350 +#4 351 +#5 352 +#6 353 +#7 354 +#8 355 +#9 356 +#10 357 +#11 358 +#12 359 +#13 360 +#14 361 +#15 362 +#16 363 diff --git a/SpeechLM/dataset/LibriSpeech/asr/dict.ltr.txt b/SpeechLM/dataset/LibriSpeech/asr/dict.ltr.txt new file mode 100644 index 0000000000000000000000000000000000000000..26a7e6ba309998c3868db7ecab5d7afa52a68e52 --- /dev/null +++ b/SpeechLM/dataset/LibriSpeech/asr/dict.ltr.txt @@ -0,0 +1,29 @@ +| 803288730 +E 439294199 +T 319071758 +A 277306732 +O 263784364 +N 239361162 +I 237353011 +H 223346762 +S 220175453 +R 203352500 +D 152198685 +L 141597450 +U 98913389 +M 87138757 +C 84680142 +W 81375101 +F 80240665 +G 70642902 +Y 68388038 +P 58436929 +B 52538531 +V 33250231 +K 26906609 +' 9162896 +X 5075632 +J 4746771 +Q 3401794 +Z 2186971 + 1 diff --git a/SpeechLM/dataset/LibriSpeech/asr/train_sample100.ltr b/SpeechLM/dataset/LibriSpeech/asr/train_sample100.ltr new file mode 100644 index 0000000000000000000000000000000000000000..ab9ab39e823eba89897e7763155c77d6f2be38a4 --- /dev/null +++ b/SpeechLM/dataset/LibriSpeech/asr/train_sample100.ltr @@ -0,0 +1,100 @@ +C H A P T E R | O N E | M I S S U S | R A C H E L | L Y N D E | I S | S U R P R I S E D | M I S S U S | R A C H E L | L Y N D E | L I V E D | J U S T | W H E R E | T H E | A V O N L E A | M A I N | R O A D | D I P P E D | D O W N | I N T O | A | L I T T L E | H O L L O W | F R I N G E D | W I T H | A L D E R S | A N D | L A D I E S | E A R D R O P S | A N D | T R A V E R S E D | B Y | A | B R O O K | +T H A T | H A D | I T S | S O U R C E | A W A Y | B A C K | I N | T H E | W O O D S | O F | T H E | O L D | C U T H B E R T | P L A C E | I T | W A S | R E P U T E D | T O | B E | A N | I N T R I C A T E | H E A D L O N G | B R O O K | I N | I T S | E A R L I E R | C O U R S E | T H R O U G H | T H O S E | W O O D S | W I T H | D A R K | S E C R E T S | O F | P O O L | A N D | C 
A S C A D E | B U T | B Y | T H E | T I M E | I T | R E A C H E D | L Y N D E ' S | H O L L O W | I T | W A S | A | Q U I E T | W E L L | C O N D U C T E D | L I T T L E | S T R E A M | +F O R | N O T | E V E N | A | B R O O K | C O U L D | R U N | P A S T | M I S S U S | R A C H E L | L Y N D E ' S | D O O R | W I T H O U T | D U E | R E G A R D | F O R | D E C E N C Y | A N D | D E C O R U M | I T | P R O B A B L Y | W A S | C O N S C I O U S | T H A T | M I S S U S | R A C H E L | W A S | S I T T I N G | A T | H E R | W I N D O W | K E E P I N G | A | S H A R P | E Y E | O N | E V E R Y T H I N G | T H A T | P A S S E D | F R O M | B R O O K S | A N D | C H I L D R E N | U P | +A N D | T H A T | I F | S H E | N O T I C E D | A N Y T H I N G | O D D | O R | O U T | O F | P L A C E | S H E | W O U L D | N E V E R | R E S T | U N T I L | S H E | H A D | F E R R E T E D | O U T | T H E | W H Y S | A N D | W H E R E F O R E S | T H E R E O F | T H E R E | A R E | P L E N T Y | O F | P E O P L E | I N | A V O N L E A | A N D | O U T | O F | I T | W H O | C A N | A T T E N D | C L O S E L Y | T O | T H E I R | N E I G H B O R ' S | B U S I N E S S | B Y | D I N T | O F | N E G L E C T I N G | T H E I R | O W N | +B U T | M I S S U S | R A C H E L | L Y N D E | W A S | O N E | O F | T H O S E | C A P A B L E | C R E A T U R E S | W H O | C A N | M A N A G E | T H E I R | O W N | C O N C E R N S | A N D | T H O S E | O F | O T H E R | F O L K S | I N T O | T H E | B A R G A I N | S H E | W A S | A | N O T A B L E | H O U S E W I F E | H E R | W O R K | W A S | A L W A Y S | D O N E | A N D | W E L L | D O N E | S H E | R A N | T H E | S E W I N G | C I R C L E | +H E L P E D | R U N | T H E | S U N D A Y | S C H O O L | A N D | W A S | T H E | S T R O N G E S T | P R O P | O F | T H E | C H U R C H | A I D | S O C I E T Y | A N D | F O R E I G N | M I S S I O N S | A U X I L I A R Y | Y E T | W I T H | A L L | T H I S | M I S S U S | R A C H E L | F O U N D | A B U N D A N T | T I M E | T O | S I T | F O R | H O U R S | A T | H E R | K I T C H E N | W I N D O W | K N I T T I N G | C O T T O N | W A R P | Q U I L T S | S H E | H A D | K N I T T E D | S I X T E E N | O F | T H E M | +A S | A V O N L E A | H O U S E K E E P E R S | W E R E | W O N T | T O | T E L L | I N | A W E D | V O I C E S | A N D | K E E P I N G | A | S H A R P | E Y E | O N | T H E | M A I N | R O A D | T H A T | C R O S S E D | T H E | H O L L O W | A N D | W O U N D | U P | T H E | S T E E P | R E D | H I L L | B E Y O N D | +A N Y B O D Y | W H O | W E N T | O U T | O F | I T | O R | I N T O | I T | H A D | T O | P A S S | O V E R | T H A T | H I L L | R O A D | A N D | S O | R U N | T H E | U N S E E N | G A U N T L E T | O F | M I S S U S | R A C H E L ' S | A L L | S E E I N G | E Y E | S H E | W A S | S I T T I N G | T H E R E | O N E | A F T E R N O O N | I N | E A R L Y | J U N E | T H E | S U N | W A S | C O M I N G | I N | A T | T H E | W I N D O W | W A R M | A N D | B R I G H T | +T H E | O R C H A R D | O N | T H E | S L O P E | B E L O W | T H E | H O U S E | W A S | I N | A | B R I D A L | F L U S H | O F | P I N K Y | W H I T E | B L O O M | H U M M E D | O V E R | B Y | A | M Y R I A D | O F | B E E S | T H O M A S | L Y N D E | A | M E E K | L I T T L E | M A N | W H O M | A V O N L E A | P E O P L E | C A L L E D | R A C H E L | L Y N D E ' S | H U S B A N D | W A S | S O W I N G | H I S | L A T E | T U R N I P | S E E D | O N | T H E | H I L L | F I E L D | B E Y O N D | T H E | B A R N | +M I S S U S | 
R A C H E L | K N E W | T H A T | H E | O U G H T | B E C A U S E | S H E | H A D | H E A R D | H I M | T E L L | P E T E R | M O R R I S O N | T H E | E V E N I N G | B E F O R E | I N | W I L L I A M | J | B L A I R ' S | S T O R E | O V E R | A T | C A R M O D Y | T H A T | H E | M E A N T | T O | S O W | H I S | T U R N I P | S E E D | T H E | N E X T | A F T E R N O O N | +P E T E R | H A D | A S K E D | H I M | O F | C O U R S E | F O R | M A T T H E W | C U T H B E R T | H A D | N E V E R | B E E N | K N O W N | T O | V O L U N T E E R | I N F O R M A T I O N | A B O U T | A N Y T H I N G | I N | H I S | W H O L E | L I F E | A N D | Y E T | H E R E | W A S | M A T T H E W | C U T H B E R T | A T | H A L F | P A S T | T H R E E | O N | T H E | A F T E R N O O N | O F | A | B U S Y | D A Y | P L A C I D L Y | D R I V I N G | O V E R | T H E | H O L L O W | A N D | U P | T H E | H I L L | +A N D | H I S | B E S T | S U I T | O F | C L O T H E S | W H I C H | W A S | P L A I N | P R O O F | T H A T | H E | W A S | G O I N G | O U T | O F | A V O N L E A | A N D | H E | H A D | T H E | B U G G Y | A N D | T H E | S O R R E L | M A R E | W H I C H | B E T O K E N E D | T H A T | H E | W A S | G O I N G | A | C O N S I D E R A B L E | D I S T A N C E | N O W | W H E R E | W A S | M A T T H E W | C U T H B E R T | G O I N G | A N D | W H Y | W A S | H E | G O I N G | T H E R E | +H A D | I T | B E E N | A N Y | O T H E R | M A N | I N | A V O N L E A | M I S S U S | R A C H E L | D E F T L Y | P U T T I N G | T H I S | A N D | T H A T | T O G E T H E R | M I G H T | H A V E | G I V E N | A | P R E T T Y | G O O D | G U E S S | A S | T O | B O T H | Q U E S T I O N S | B U T | M A T T H E W | S O | R A R E L Y | W E N T | F R O M | H O M E | T H A T | I T | M U S T | B E | S O M E T H I N G | P R E S S I N G | A N D | U N U S U A L | W H I C H | W A S | T A K I N G | H I M | +H E | W A S | T H E | S H Y E S T | M A N | A L I V E | A N D | H A T E D | T O | H A V E | T O | G O | A M O N G | S T R A N G E R S | O R | T O | A N Y | P L A C E | W H E R E | H E | M I G H T | H A V E | T O | T A L K | M A T T H E W | D R E S S E D | U P | W I T H | A | W H I T E | C O L L A R | A N D | D R I V I N G | I N | A | B U G G Y | W A S | S O M E T H I N G | T H A T | D I D N ' T | H A P P E N | O F T E N | M I S S U S | R A C H E L | P O N D E R | A S | S H E | M I G H T | C O U L D | M A K E | N O T H I N G | O F | I T | +A N D | H E R | A F T E R N O O N ' S | E N J O Y M E N T | W A S | S P O I L E D | I ' L L | J U S T | S T E P | O V E R | T O | G R E E N | G A B L E S | A F T E R | T E A | A N D | F I N D | O U T | F R O M | M A R I L L A | W H E R E | H E ' S | G O N E | A N D | W H Y | T H E | W O R T H Y | W O M A N | F I N A L L Y | C O N C L U D E D | H E | D O E S N ' T | G E N E R A L L Y | G O | T O | T O W N | T H I S | T I M E | O F | Y E A R | A N D | H E | N E V E R | V I S I T S | +I F | H E ' D | R U N | O U T | O F | T U R N I P | S E E D | H E | W O U L D N ' T | D R E S S | U P | A N D | T A K E | T H E | B U G G Y | T O | G O | F O R | M O R E | +Y E T | S O M E T H I N G | M U S T | H A V E | H A P P E N E D | S I N C E | L A S T | N I G H T | T O | S T A R T | H I M | O F F | I ' M | C L E A N | P U Z Z L E D | T H A T ' S | W H A T | A N D | I | W O N ' T | K N O W | A | M I N U T E ' S | P E A C E | O F | M I N D | O R | C O N S C I E N C E | U N T I L | I | K N O W | W H A T | H A S | T A K E N | M A T T H E W | C U T H B E R T | O U T | O F | A V O N L E A | T O D A Y | A C 
C O R D I N G L Y | A F T E R | T E A | M I S S U S | R A C H E L | S E T | O U T | S H E | H A D | N O T | F A R | T O | G O | +T H E | B I G | R A M B L I N G | O R C H A R D | E M B O W E R E D | H O U S E | W H E R E | T H E | C U T H B E R T S | L I V E D | W A S | A | S C A N T | Q U A R T E R | O F | A | M I L E | U P | T H E | R O A D | F R O M | L Y N D E ' S | H O L L O W | T O | B E | S U R E | T H E | L O N G | L A N E | M A D E | I T | A | G O O D | D E A L | F U R T H E R | M A T T H E W | C U T H B E R T ' S | F A T H E R | A S | S H Y | A N D | S I L E N T | A S | H I S | S O N | A F T E R | H I M | +H A D | G O T | A S | F A R | A W A Y | A S | H E | P O S S I B L Y | C O U L D | F R O M | H I S | F E L L O W | M E N | W I T H O U T | A C T U A L L Y | R E T R E A T I N G | I N T O | T H E | W O O D S | W H E N | H E | F O U N D E D | H I S | H O M E S T E A D | G R E E N | G A B L E S | W A S | B U I L T | A T | T H E | F U R T H E S T | E D G E | O F | H I S | C L E A R E D | L A N D | A N D | T H E R E | I T | W A S | T O | T H I S | D A Y | +B A R E L Y | V I S I B L E | F R O M | T H E | M A I N | R O A D | A L O N G | W H I C H | A L L | T H E | O T H E R | A V O N L E A | H O U S E S | W E R E | S O | S O C I A B L Y | S I T U A T E D | M I S S U S | R A C H E L | L Y N D E | D I D | N O T | C A L L | L I V I N G | I N | S U C H | A | P L A C E | L I V I N G | A T | A L L | I T ' S | J U S T | S T A Y I N G | T H A T ' S | W H A T | S H E | S A I D | A S | S H E | S T E P P E D | A L O N G | T H E | D E E P | R U T T E D | G R A S S Y | L A N E | +B O R D E R E D | W I T H | W I L D | R O S E | B U S H E S | I T ' S | N O | W O N D E R | M A T T H E W | A N D | M A R I L L A | A R E | B O T H | A | L I T T L E | O D D | L I V I N G | A W A Y | B A C K | H E R E | B Y | T H E M S E L V E S | T R E E S | A R E N ' T | M U C H | C O M P A N Y | T H O U G H | D E A R | K N O W S | I F | T H E Y | W E R E | T H E R E ' D | B E | E N O U G H | O F | T H E M | I ' D | R U T H E R | L O O K | A T | P E O P L E | T O | B E | S U R E | +T H E Y | S E E M | C O N T E N T E D | E N O U G H | B U T | T H E N | I | S U P P O S E | T H E Y ' R E | U S E D | T O | I T | A | B O D Y | C A N | G E T | U S E D | T O | A N Y T H I N G | E V E N | T O | B E I N G | H A N G E D | A S | T H E | I R I S H M A N | S A I D | W I T H | T H I S | M I S S U S | R A C H E L | S T E P P E D | O U T | O F | T H E | L A N E | I N T O | T H E | B A C K Y A R D | O F | G R E E N | G A B L E S | V E R Y | G R E E N | A N D | N E A T | A N D | P R E C I S E | W A S | T H A T | Y A R D | +S E T | A B O U T | O N | O N E | S I D E | W I T H | G R E A T | P A T R I A R C H A L | W I L L O W S | A N D | T H E | O T H E R | W I T H | P R I M | L O M B A R D I E S | N O T | A | S T R A Y | S T I C K | N O R | S T O N E | W A S | T O | B E | S E E N | F O R | M I S S U S | R A C H E L | W O U L D | H A V E | S E E N | I T | I F | T H E R E | H A D | B E E N | P R I V A T E L Y | S H E | W A S | O F | T H E | O P I N I O N | T H A T | M A R I L L A | C U T H B E R T | S W E P T | T H A T | Y A R D | O V E R | A S | O F T E N | A S | S H E | S W E P T | H E R | H O U S E | +O N E | C O U L D | H A V E | E A T E N | A | M E A L | O F F | T H E | G R O U N D | W I T H O U T | O V E R B R I M M I N G | T H E | P R O V E R B I A L | P E C K | O F | D I R T | M I S S U S | R A C H E L | R A P P E D | S M A R T L Y | A T | T H E | K I T C H E N | D O O R | A N D | S T E P P E D | I N | W H E N | B I D D E N | T O | D 
O | S O | T H E | K I T C H E N | A T | G R E E N | G A B L E S | W A S | A | C H E E R F U L | A P A R T M E N T | +O R | W O U L D | H A V E | B E E N | C H E E R F U L | I F | I T | H A D | N O T | B E E N | S O | P A I N F U L L Y | C L E A N | A S | T O | G I V E | I T | S O M E T H I N G | O F | T H E | A P P E A R A N C E | O F | A N | U N U S E D | P A R L O R | I T S | W I N D O W S | L O O K E D | E A S T | A N D | W E S T | T H R O U G H | T H E | W E S T | O N E | L O O K I N G | O U T | O N | T H E | B A C K | Y A R D | C A M E | A | F L O O D | O F | M E L L O W | J U N E | S U N L I G H T | B U T | T H E | E A S T | O N E | +W H E N C E | Y O U | G O T | A | G L I M P S E | O F | T H E | B L O O M | W H I T E | C H E R R Y | T R E E S | I N | T H E | L E F T | O R C H A R D | A N D | N O D D I N G | S L E N D E R | B I R C H E S | D O W N | I N | T H E | H O L L O W | B Y | T H E | B R O O K | W A S | G R E E N E D | O V E R | B Y | A | T A N G L E | O F | V I N E S | H E R E | S A T | M A R I L L A | C U T H B E R T | W H E N | S H E | S A T | A T | A L L | A L W A Y S | S L I G H T L Y | D I S T R U S T F U L | O F | S U N S H I N E | +A N D | H E R E | S H E | S A T | N O W | K N I T T I N G | A N D | T H E | T A B L E | B E H I N D | H E R | W A S | L A I D | F O R | S U P P E R | M I S S U S | R A C H E L | B E F O R E | S H E | H A D | F A I R L Y | C L O S E D | T H E | D O O R | +T H E R E | W E R E | T H R E E | P L A T E S | L A I D | S O | T H A T | M A R I L L A | M U S T | B E | E X P E C T I N G | S O M E | O N E | H O M E | W I T H | M A T T H E W | T O | T E A | B U T | T H E | D I S H E S | W E R E | E V E R Y D A Y | D I S H E S | A N D | T H E R E | W A S | O N L Y | C R A B | A P P L E | P R E S E R V E S | A N D | O N E | K I N D | O F | C A K E | S O | T H A T | T H E | E X P E C T E D | C O M P A N Y | C O U L D | N O T | B E | A N Y | P A R T I C U L A R | C O M P A N Y | +Y E T | W H A T | O F | M A T T H E W ' S | W H I T E | C O L L A R | A N D | T H E | S O R R E L | M A R E | M I S S U S | R A C H E L | W A S | G E T T I N G | F A I R L Y | D I Z Z Y | W I T H | T H I S | U N U S U A L | M Y S T E R Y | A B O U T | Q U I E T | U N M Y S T E R I O U S | G R E E N | G A B L E S | G O O D | E V E N I N G | R A C H E L | M A R I L L A | S A I D | B R I S K L Y | T H I S | I S | A | R E A L | F I N E | E V E N I N G | I S N ' T | I T | W O N ' T | Y O U | S I T | D O W N | +H O W | A R E | A L L | Y O U R | F O L K S | S O M E T H I N G | T H A T | F O R | L A C K | O F | A N Y | O T H E R | N A M E | M I G H T | B E | C A L L E D | F R I E N D S H I P | E X I S T E D | A N D | A L W A Y S | H A D | E X I S T E D | B E T W E E N | M A R I L L A | C U T H B E R T | A N D | M I S S U S | R A C H E L | I N | S P I T E | O F | O R | P E R H A P S | B E C A U S E | O F | T H E I R | D I S S I M I L A R I T Y | M A R I L L A | W A S | A | T A L L | +T H I N | W O M A N | W I T H | A N G L E S | A N D | W I T H O U T | C U R V E S | H E R | D A R K | H A I R | S H O W E D | S O M E | G R A Y | S T R E A K S | A N D | W A S | A L W A Y S | T W I S T E D | U P | I N | A | H A R D | L I T T L E | K N O T | B E H I N D | W I T H | T W O | W I R E | H A I R P I N S | S T U C K | A G G R E S S I V E L Y | T H R O U G H | I T | S H E | L O O K E D | L I K E | A | W O M A N | O F | N A R R O W | E X P E R I E N C E | A N D | R I G I D | C O N S C I E N C E | W H I C H | S H E | W A S | +B U T | T H E R E | W A S | A | S A V I N G | S O M E T H I N G | A B O U T | H 
E R | M O U T H | W H I C H | I F | I T | H A D | B E E N | E V E R | S O | S L I G H T L Y | D E V E L O P E D | M I G H T | H A V E | B E E N | C O N S I D E R E D | I N D I C A T I V E | O F | A | S E N S E | O F | H U M O R | W E ' R E | A L L | P R E T T Y | W E L L | S A I D | M I S S U S | R A C H E L | I | W A S | K I N D | O F | A F R A I D | Y O U | W E R E N ' T | T H O U G H | W H E N | I | S A W | M A T T H E W | S T A R T I N G | O F F | T O D A Y | I | T H O U G H T | M A Y B E | H E | W A S | G O I N G | T O | T H E | D O C T O R ' S | +M A R I L L A ' S | L I P S | T W I T C H E D | U N D E R S T A N D I N G L Y | S H E | H A D | E X P E C T E D | M I S S U S | R A C H E L | U P | S H E | H A D | K N O W N | T H A T | T H E | S I G H T | O F | M A T T H E W | J A U N T I N G | O F F | S O | U N A C C O U N T A B L Y | W O U L D | B E | T O O | M U C H | F O R | H E R | N E I G H B O R ' S | C U R I O S I T Y | O H | N O | I ' M | Q U I T E | W E L L | A L T H O U G H | I | H A D | A | B A D | H E A D A C H E | Y E S T E R D A Y | S H E | S A I D | +M A T T H E W | W E N T | T O | B R I G H T | R I V E R | W E ' R E | G E T T I N G | A | L I T T L E | B O Y | F R O M | A N | O R P H A N | A S Y L U M | I N | N O V A | S C O T I A | A N D | H E ' S | C O M I N G | O N | T H E | T R A I N | T O N I G H T | I F | M A R I L L A | H A D | S A I D | T H A T | M A T T H E W | H A D | G O N E | T O | B R I G H T | R I V E R | T O | M E E T | A | K A N G A R O O | F R O M | A U S T R A L I A | M I S S U S | R A C H E L | C O U L D | N O T | H A V E | B E E N | M O R E | A S T O N I S H E D | +S H E | W A S | A C T U A L L Y | S T R I C K E N | D U M B | F O R | F I V E | S E C O N D S | I T | W A S | U N S U P P O S A B L E | T H A T | M A R I L L A | W A S | M A K I N G | F U N | O F | H E R | B U T | M I S S U S | R A C H E L | W A S | A L M O S T | F O R C E D | T O | S U P P O S E | I T | A R E | Y O U | I N | E A R N E S T | M A R I L L A | S H E | D E M A N D E D | W H E N | V O I C E | R E T U R N E D | T O | H E R | Y E S | O F | C O U R S E | +S A I D | M A R I L L A | A S | I F | G E T T I N G | B O Y S | F R O M | O R P H A N | A S Y L U M S | I N | N O V A | S C O T I A | W E R E | P A R T | O F | T H E | U S U A L | S P R I N G | W O R K | O N | A N Y | W E L L | R E G U L A T E D | A V O N L E A | F A R M | I N S T E A D | O F | B E I N G | A N | U N H E A R D | O F | I N N O V A T I O N | M I S S U S | R A C H E L | F E L T | T H A T | S H E | H A D | R E C E I V E D | A | S E V E R E | M E N T A L | J O L T | S H E | T H O U G H T | I N | E X C L A M A T I O N | P O I N T S | +M A R I L L A | A N D | M A T T H E W | C U T H B E R T | O F | A L L | P E O P L E | A D O P T I N G | A | B O Y | F R O M | A N | O R P H A N | A S Y L U M | W E L L | T H E | W O R L D | W A S | C E R T A I N L Y | T U R N I N G | U P S I D E | D O W N | S H E | W O U L D | B E | S U R P R I S E D | A T | N O T H I N G | A F T E R | T H I S | N O T H I N G | +W H A T | O N | E A R T H | P U T | S U C H | A | N O T I O N | I N T O | Y O U R | H E A D | S H E | D E M A N D E D | D I S A P P R O V I N G L Y | T H I S | H A D | B E E N | D O N E | W I T H O U T | H E R | A D V I C E | B E I N G | A S K E D | A N D | M U S T | P E R F O R C E | B E | D I S A P P R O V E D | W E L L | W E ' V E | B E E N | T H I N K I N G | A B O U T | I T | F O R | S O M E | T I M E | A L L | W I N T E R | I N | F A C T | R E T U R N E D | M A R I L L A | +M I S S U S | A L E X A N D E R | S P E N C E R | W A S | 
U P | H E R E | O N E | D A Y | B E F O R E | C H R I S T M A S | A N D | S H E | S A I D | S H E | W A S | G O I N G | T O | G E T | A | L I T T L E | G I R L | F R O M | T H E | A S Y L U M | O V E R | I N | H O P E T O N | I N | T H E | S P R I N G | +S O | M A T T H E W | A N D | I | H A V E | T A L K E D | I T | O V E R | O F F | A N D | O N | E V E R | S I N C E | W E | T H O U G H T | W E ' D | G E T | A | B O Y | M A T T H E W | I S | G E T T I N G | U P | I N | Y E A R S | Y O U | K N O W | H E ' S | S I X T Y | A N D | H E | I S N ' T | S O | S P R Y | A S | H E | O N C E | W A S | H I S | H E A R T | T R O U B L E S | H I M | A | G O O D | D E A L | A N D | Y O U | K N O W | H O W | D E S P E R A T E | H A R D | I T ' S | G O T | T O | B E | T O | G E T | H I R E D | H E L P | +T H E R E ' S | N E V E R | A N Y B O D Y | T O | B E | H A D | B U T | T H O S E | S T U P I D | H A L F | G R O W N | L I T T L E | F R E N C H | B O Y S | A N D | A S | S O O N | A S | Y O U | D O | G E T | O N E | B R O K E | I N T O | Y O U R | W A Y S | A N D | T A U G H T | S O M E T H I N G | H E ' S | U P | A N D | O F F | T O | T H E | L O B S T E R | C A N N E R I E S | O R | T H E | S T A T E S | A T | F I R S T | M A T T H E W | S U G G E S T E D | G E T T I N G | A | H O M E | B O Y | B U T | I | S A I D | N O | F L A T | T O | T H A T | +T H E Y | M A Y | B E | A L L | R I G H T | I ' M | N O T | S A Y I N G | T H E Y ' R E | N O T | B U T | N O | L O N D O N | S T R E E T | A R A B S | F O R | M E | I | S A I D | G I V E | M E | A | N A T I V E | B O R N | A T | L E A S T | T H E R E ' L L | B E | A | R I S K | N O | M A T T E R | W H O | W E | G E T | B U T | I ' L L | F E E L | E A S I E R | I N | M Y | M I N D | A N D | S L E E P | S O U N D E R | A T | N I G H T S | I F | W E | G E T | A | B O R N | C A N A D I A N | +S O | I N | T H E | E N D | W E | D E C I D E D | T O | A S K | M I S S U S | S P E N C E R | T O | P I C K | U S | O U T | O N E | W H E N | S H E | W E N T | O V E R | T O | G E T | H E R | L I T T L E | G I R L | W E | H E A R D | L A S T | W E E K | S H E | W A S | G O I N G | S O | W E | S E N T | H E R | W O R D | B Y | R I C H A R D | S P E N C E R ' S | F O L K S | A T | C A R M O D Y | T O | B R I N G | U S | A | S M A R T | L I K E L Y | B O Y | O F | A B O U T | T E N | O R | E L E V E N | W E | D E C I D E D | T H A T | W O U L D | B E | T H E | B E S T | A G E | +O L D | E N O U G H | T O | B E | O F | S O M E | U S E | I N | D O I N G | C H O R E S | R I G H T | O F F | A N D | Y O U N G | E N O U G H | T O | B E | T R A I N E D | U P | P R O P E R | W E | M E A N | T O | G I V E | H I M | A | G O O D | H O M E | A N D | S C H O O L I N G | W E | H A D | A | T E L E G R A M | F R O M | M I S S U S | A L E X A N D E R | S P E N C E R | T O D A Y | T H E | M A I L | M A N | B R O U G H T | I T | F R O M | T H E | S T A T I O N | S A Y I N G | T H E Y | W E R E | C O M I N G | O N | T H E | F I V E | T H I R T Y | T R A I N | T O N I G H T | +S O | M A T T H E W | W E N T | T O | B R I G H T | R I V E R | T O | M E E T | H I M | M I S S U S | S P E N C E R | W I L L | D R O P | H I M | O F F | T H E R E | O F | C O U R S E | S H E | G O E S | O N | T O | W H I T E | S A N D S | S T A T I O N | H E R S E L F | M I S S U S | R A C H E L | P R I D E D | H E R S E L F | O N | A L W A Y S | S P E A K I N G | H E R | M I N D | S H E | P R O C E E D E D | T O | S P E A K | I T | N O W | H A V I N G | A D J U S T E D | H E R | M E N T A L | A T T I T U D E | T O | T H 
I S | A M A Z I N G | P I E C E | O F | N E W S | +W E L L | M A R I L L A | I ' L L | J U S T | T E L L | Y O U | P L A I N | T H A T | I | T H I N K | Y O U ' R E | D O I N G | A | M I G H T Y | F O O L I S H | T H I N G | A | R I S K Y | T H I N G | T H A T ' S | W H A T | Y O U | D O N ' T | K N O W | W H A T | Y O U ' R E | G E T T I N G | Y O U ' R E | B R I N G I N G | A | S T R A N G E | C H I L D | I N T O | Y O U R | H O U S E | A N D | H O M E | A N D | Y O U | D O N ' T | K N O W | A | S I N G L E | T H I N G | A B O U T | H I M | N O R | W H A T | H I S | D I S P O S I T I O N | I S | L I K E | N O R | W H A T | S O R T | O F | P A R E N T S | H E | H A D | +N O R | H O W | H E ' S | L I K E L Y | T O | T U R N | O U T | W H Y | I T | W A S | O N L Y | L A S T | W E E K | I | R E A D | I N | T H E | P A P E R | H O W | A | M A N | A N D | H I S | W I F E | U P | W E S T | O F | T H E | I S L A N D | T O O K | A | B O Y | O U T | O F | A N | O R P H A N | A S Y L U M | A N D | H E | S E T | F I R E | T O | T H E | H O U S E | A T | N I G H T | S E T | I T | O N | P U R P O S E | M A R I L L A | A N D | N E A R L Y | B U R N T | T H E M | T O | A | C R I S P | I N | T H E I R | B E D S | +A N D | I | K N O W | A N O T H E R | C A S E | W H E R E | A N | A D O P T E D | B O Y | U S E D | T O | S U C K | T H E | E G G S | T H E Y | C O U L D N ' T | B R E A K | H I M | O F | I T | I F | Y O U | H A D | A S K E D | M Y | A D V I C E | I N | T H E | M A T T E R | W H I C H | Y O U | D I D N ' T | D O | M A R I L L A | I ' D | H A V E | S A I D | F O R | M E R C Y ' S | S A K E | N O T | T O | T H I N K | O F | S U C H | A | T H I N G | T H A T ' S | W H A T | +T H I S | J O B ' S | C O M F O R T I N G | S E E M E D | N E I T H E R | T O | O F F E N D | N O R | T O | A L A R M | M A R I L L A | S H E | K N I T T E D | S T E A D I L Y | O N | I | D O N ' T | D E N Y | T H E R E ' S | S O M E T H I N G | I N | W H A T | Y O U | S A Y | R A C H E L | I ' V E | H A D | S O M E | Q U A L M S | M Y S E L F | B U T | M A T T H E W | W A S | T E R R I B L E | S E T | O N | I T | I | C O U L D | S E E | T H A T | S O | I | G A V E | I N | +I T ' S | S O | S E L D O M | M A T T H E W | S E T S | H I S | M I N D | O N | A N Y T H I N G | T H A T | W H E N | H E | D O E S | I | A L W A Y S | F E E L | I T ' S | M Y | D U T Y | T O | G I V E | I N | A N D | A S | F O R | T H E | R I S K | T H E R E ' S | R I S K S | I N | P R E T T Y | N E A R | E V E R Y T H I N G | A | B O D Y | D O E S | I N | T H I S | W O R L D | T H E R E ' S | R I S K S | I N | P E O P L E ' S | H A V I N G | C H I L D R E N | O F | T H E I R | O W N | I F | I T | C O M E S | T O | T H A T | T H E Y | D O N ' T | A L W A Y S | T U R N | O U T | W E L L | +A N D | T H E N | N O V A | S C O T I A | I S | R I G H T | C L O S E | T O | T H E | I S L A N D | I T | I S N ' T | A S | I F | W E | W E R E | G E T T I N G | H I M | F R O M | E N G L A N D | O R | T H E | S T A T E S | H E | C A N ' T | B E | M U C H | D I F F E R E N T | F R O M | O U R S E L V E S | W E L L | I | H O P E | I T | W I L L | T U R N | O U T | A L L | R I G H T | S A I D | M I S S U S | R A C H E L | I N | A | T O N E | T H A T | P L A I N L Y | I N D I C A T E D | H E R | P A I N F U L | D O U B T S | +O N L Y | D O N ' T | S A Y | I | D I D N ' T | W A R N | Y O U | I F | H E | B U R N S | G R E E N | G A B L E S | D O W N | O R | P U T S | S T R Y C H N I N E | I N | T H E | W E L L | I | H E A R D | O F | A | C A S E | O V E R | I N | N E W | B R U 
N S W I C K | W H E R E | A N | O R P H A N | A S Y L U M | C H I L D | D I D | T H A T | A N D | T H E | W H O L E | F A M I L Y | D I E D | I N | F E A R F U L | A G O N I E S | O N L Y | I T | W A S | A | G I R L | I N | T H A T | I N S T A N C E | W E L L | W E ' R E | N O T | G E T T I N G | A | G I R L | S A I D | M A R I L L A | +A S | I F | P O I S O N I N G | W E L L S | W E R E | A | P U R E L Y | F E M I N I N E | A C C O M P L I S H M E N T | A N D | N O T | T O | B E | D R E A D E D | I N | T H E | C A S E | O F | A | B O Y | I ' D | N E V E R | D R E A M | O F | T A K I N G | A | G I R L | T O | B R I N G | U P | I | W O N D E R | A T | M I S S U S | A L E X A N D E R | S P E N C E R | F O R | D O I N G | I T | B U T | T H E R E | S H E | W O U L D N ' T | S H R I N K | F R O M | A D O P T I N G | A | W H O L E | O R P H A N | A S Y L U M | I F | S H E | T O O K | I T | I N T O | H E R | H E A D | +M I S S U S | R A C H E L | W O U L D | H A V E | L I K E D | T O | S T A Y | U N T I L | M A T T H E W | C A M E | H O M E | W I T H | H I S | I M P O R T E D | O R P H A N | B U T | R E F L E C T I N G | T H A T | I T | W O U L D | B E | A | G O O D | T W O | H O U R S | A T | L E A S T | B E F O R E | H I S | A R R I V A L | S H E | C O N C L U D E D | T O | G O | U P | T H E | R O A D | T O | R O B E R T | B E L L ' S | A N D | T E L L | T H E | N E W S | I T | W O U L D | C E R T A I N L Y | M A K E | A | S E N S A T I O N | S E C O N D | T O | N O N E | +A N D | M I S S U S | R A C H E L | D E A R L Y | L O V E D | T O | M A K E | A | S E N S A T I O N | S O | S H E | T O O K | H E R S E L F | A W A Y | S O M E W H A T | T O | M A R I L L A ' S | R E L I E F | F O R | T H E | L A T T E R | F E L T | H E R | D O U B T S | A N D | F E A R S | R E V I V I N G | U N D E R | T H E | I N F L U E N C E | O F | M I S S U S | R A C H E L ' S | P E S S I M I S M | W E L L | O F | A L L | T H I N G S | T H A T | E V E R | W E R E | O R | W I L L | B E | E J A C U L A T E D | M I S S U S | R A C H E L | W H E N | S H E | W A S | S A F E L Y | O U T | I N | T H E | L A N E | +I T | D O E S | R E A L L Y | S E E M | A S | I F | I | M U S T | B E | D R E A M I N G | W E L L | I ' M | S O R R Y | F O R | T H A T | P O O R | Y O U N G | O N E | A N D | N O | M I S T A K E | M A T T H E W | A N D | M A R I L L A | D O N ' T | K N O W | A N Y T H I N G | A B O U T | C H I L D R E N | A N D | T H E Y ' L L | E X P E C T | H I M | T O | B E | W I S E R | A N D | S T E A D I E R | T H A T | H I S | O W N | G R A N D F A T H E R | +I T | S E E M S | U N C A N N Y | T O | T H I N K | O F | A | C H I L D | A T | G R E E N | G A B L E S | S O M E H O W | T H E R E ' S | N E V E R | B E E N | O N E | T H E R E | F O R | M A T T H E W | A N D | M A R I L L A | W E R E | G R O W N | U P | W H E N | T H E | N E W | H O U S E | W A S | B U I L T | I F | T H E Y | E V E R | W E R E | C H I L D R E N | W H I C H | I S | H A R D | T O | B E L I E V E | W H E N | O N E | L O O K S | A T | T H E M | I | W O U L D N ' T | B E | I N | T H A T | O R P H A N ' S | S H O E S | F O R | A N Y T H I N G | +M Y | B U T | I | P I T Y | H I M | T H A T ' S | W H A T | S O | S A I D | M I S S U S | R A C H E L | T O | T H E | W I L D | R O S E | B U S H E S | O U T | O F | T H E | F U L N E S S | O F | H E R | H E A R T | +C H A P T E R | T W O | M A T T H E W | C U T H B E R T | I S | S U R P R I S E D | M A T T H E W | C U T H B E R T | A N D | T H E | S O R R E L | M A R E | J O G G E D | C O M F O R T A B L Y | O V E R 
| T H E | E I G H T | M I L E S | T O | B R I G H T | R I V E R | I T | W A S | A | P R E T T Y | R O A D | R U N N I N G | A L O N G | B E T W E E N | S N U G | F A R M S T E A D S | W I T H | N O W | A N D | A G A I N | A | B I T | O F | B A L S A M Y | F I R | W O O D | T O | D R I V E | T H R O U G H | +O R | A | H O L L O W | W H E R E | W I L D | P L U M S | H U N G | O U T | T H E I R | F I L M Y | B L O O M | T H E | A I R | W A S | S W E E T | W I T H | T H E | B R E A T H | O F | M A N Y | A P P L E | O R C H A R D S | A N D | T H E | M E A D O W S | S L O P E D | A W A Y | I N | T H E | D I S T A N C E | T O | H O R I Z O N | M I S T S | O F | P E A R L | A N D | P U R P L E | W H I L E | T H E | L I T T L E | B I R D S | S A N G | A S | I F | I T | W E R E | T H E | O N E | D A Y | O F | S U M M E R | I N | A L L | T H E | Y E A R | +M A T T H E W | E N J O Y E D | T H E | D R I V E | A F T E R | H I S | O W N | F A S H I O N | E X C E P T | D U R I N G | T H E | M O M E N T S | W H E N | H E | M E T | W O M E N | A N D | H A D | T O | N O D | T O | T H E M | F O R | I N | P R I N C E | E D W A R D | I S L A N D | Y O U | A R E | S U P P O S E D | T O | N O D | T O | A L L | A N D | S U N D R Y | Y O U | M E E T | O N | T H E | R O A D | W H E T H E R | Y O U | K N O W | T H E M | O R | N O T | M A T T H E W | D R E A D E D | A L L | W O M E N | E X C E P T | M A R I L L A | A N D | M I S S U S | R A C H E L | +H E | H A D | A N | U N C O M F O R T A B L E | F E E L I N G | T H A T | T H E | M Y S T E R I O U S | C R E A T U R E S | W E R E | S E C R E T L Y | L A U G H I N G | A T | H I M | H E | M A Y | H A V E | B E E N | Q U I T E | R I G H T | I N | T H I N K I N G | S O | F O R | H E | W A S | A N | O D D | L O O K I N G | P E R S O N A G E | W I T H | A N | U N G A I N L Y | F I G U R E | A N D | L O N G | I R O N | G R A Y | H A I R | T H A T | T O U C H E D | H I S | S T O O P I N G | S H O U L D E R S | +A N D | A | F U L L | S O F T | B R O W N | B E A R D | W H I C H | H E | H A D | W O R N | E V E R | S I N C E | H E | W A S | T W E N T Y | I N | F A C T | H E | H A D | L O O K E D | A T | T W E N T Y | V E R Y | M U C H | A S | H E | L O O K E D | A T | S I X T Y | L A C K I N G | A | L I T T L E | O F | T H E | G R A Y N E S S | W H E N | H E | R E A C H E D | B R I G H T | R I V E R | T H E R E | W A S | N O | S I G N | O F | A N Y | T R A I N | +H E | T H O U G H T | H E | W A S | T O O | E A R L Y | S O | H E | T I E D | H I S | H O R S E | I N | T H E | Y A R D | O F | T H E | S M A L L | B R I G H T | R I V E R | H O T E L | A N D | W E N T | O V E R | T O | T H E | S T A T I O N | H O U S E | T H E | L O N G | P L A T F O R M | W A S | A L M O S T | D E S E R T E D | T H E | O N L Y | L I V I N G | C R E A T U R E | I N | S I G H T | B E I N G | A | G I R L | W H O | W A S | S I T T I N G | O N | A | P I L E | O F | S H I N G L E S | A T | T H E | E X T R E M E | E N D | +M A T T H E W | B A R E L Y | N O T I N G | T H A T | I T | W A S | A | G I R L | S I D L E D | P A S T | H E R | A S | Q U I C K L Y | A S | P O S S I B L E | W I T H O U T | L O O K I N G | A T | H E R | H A D | H E | L O O K E D | H E | C O U L D | H A R D L Y | H A V E | F A I L E D | T O | N O T I C E | T H E | T E N S E | R I G I D I T Y | A N D | E X P E C T A T I O N | O F | H E R | A T T I T U D E | A N D | E X P R E S S I O N | S H E | W A S | S I T T I N G | T H E R E | W A I T I N G | F O R | S O M E T H I N G | O R | S O M E B O D Y | +A N D | S I N C E | S I T T I N G | A N D 
| W A I T I N G | W A S | T H E | O N L Y | T H I N G | T O | D O | J U S T | T H E N | S H E | S A T | A N D | W A I T E D | W I T H | A L L | H E R | M I G H T | A N D | M A I N | M A T T H E W | E N C O U N T E R E D | T H E | S T A T I O N M A S T E R | L O C K I N G | U P | T H E | T I C K E T | O F F I C E | P R E P A R A T O R Y | T O | G O I N G | H O M E | F O R | S U P P E R | A N D | A S K E D | H I M | I F | T H E | F I V E | T H I R T Y | T R A I N | W O U L D | S O O N | B E | A L O N G | +T H E | F I V E | T H I R T Y | T R A I N | H A S | B E E N | I N | A N D | G O N E | H A L F | A N | H O U R | A G O | A N S W E R E D | T H A T | B R I S K | O F F I C I A L | B U T | T H E R E | W A S | A | P A S S E N G E R | D R O P P E D | O F F | F O R | Y O U | A | L I T T L E | G I R L | S H E ' S | S I T T I N G | O U T | T H E R E | O N | T H E | S H I N G L E S | I | A S K E D | H E R | T O | G O | I N T O | T H E | L A D I E S | W A I T I N G | R O O M | B U T | S H E | I N F O R M E D | M E | G R A V E L Y | T H A T | S H E | P R E F E R R E D | T O | S T A Y | O U T S I D E | +S H E ' S | A | C A S E | I | S H O U L D | S A Y | I ' M | N O T | E X P E C T I N G | A | G I R L | S A I D | M A T T H E W | B L A N K L Y | I T ' S | A | B O Y | I ' V E | C O M E | F O R | H E | S H O U L D | B E | H E R E | M I S S U S | A L E X A N D E R | S P E N C E R | W A S | T O | B R I N G | H I M | O V E R | F R O M | N O V A | S C O T I A | F O R | M E | T H E | S T A T I O N M A S T E R | W H I S T L E D | +G U E S S | T H E R E ' S | S O M E | M I S T A K E | H E | S A I D | M I S S U S | S P E N C E R | C A M E | O F F | T H E | T R A I N | W I T H | T H A T | G I R L | A N D | G A V E | H E R | I N T O | M Y | C H A R G E | S A I D | Y O U | A N D | Y O U R | S I S T E R | W E R E | A D O P T I N G | H E R | F R O M | A N | O R P H A N | A S Y L U M | A N D | T H A T | Y O U | W O U L D | B E | A L O N G | F O R | H E R | P R E S E N T L Y | T H A T ' S | A L L | I | K N O W | A B O U T | I T | A N D | I | H A V E N ' T | G O T | A N Y | M O R E | O R P H A N S | C O N C E A L E D | H E R E A B O U T S | +I | D O N ' T | U N D E R S T A N D | S A I D | M A T T H E W | H E L P L E S S L Y | W I S H I N G | T H A T | M A R I L L A | W A S | A T | H A N D | T O | C O P E | W I T H | T H E | S I T U A T I O N | W E L L | Y O U ' D | B E T T E R | Q U E S T I O N | T H E | G I R L | S A I D | T H E | S T A T I O N | M A S T E R | C A R E L E S S L Y | I | D A R E | S A Y | S H E ' L L | B E | A B L E | T O | E X P L A I N | S H E ' S | G O T | A | T O N G U E | O F | H E R | O W N | T H A T ' S | C E R T A I N | +M A Y B E | T H E Y | W E R E | O U T | O F | B O Y S | O F | T H E | B R A N D | Y O U | W A N T E D | H E | W A L K E D | J A U N T I L Y | A W A Y | B E I N G | H U N G R Y | A N D | T H E | U N F O R T U N A T E | M A T T H E W | W A S | L E F T | T O | D O | T H A T | W H I C H | W A S | H A R D E R | F O R | H I M | T H A N | B E A R D I N G | A | L I O N | I N | I T S | D E N | W A L K | U P | T O | A | G I R L | A | S T R A N G E | G I R L | A N | O R P H A N | G I R L | +A N D | D E M A N D | O F | H E R | W H Y | S H E | W A S N ' T | A | B O Y | M A T T H E W | G R O A N E D | I N | S P I R I T | A S | H E | T U R N E D | A B O U T | A N D | S H U F F L E D | G E N T L Y | D O W N | T H E | P L A T F O R M | T O W A R D S | H E R | S H E | H A D | B E E N | W A T C H I N G | H I M | E V E R | S I N C E | H E | H A D | P A S S E D | H E R | A N D | S H E | H A D | H E R 
| E Y E S | O N | H I M | N O W | M A T T H E W | W A S | N O T | L O O K I N G | A T | H E R | +A | C H I L D | O F | A B O U T | E L E V E N | G A R B E D | I N | A | V E R Y | S H O R T | V E R Y | T I G H T | V E R Y | U G L Y | D R E S S | O F | Y E L L O W I S H | G R A Y | W I N C E Y | S H E | W O R E | A | F A D E D | B R O W N | S A I L O R | H A T | A N D | B E N E A T H | T H E | H A T | E X T E N D I N G | D O W N | H E R | B A C K | W E R E | T W O | B R A I D S | O F | V E R Y | T H I C K | D E C I D E D L Y | R E D | H A I R | +H E R | F A C E | W A S | S M A L L | W H I T E | A N D | T H I N | A L S O | M U C H | F R E C K L E D | H E R | M O U T H | W A S | L A R G E | A N D | S O | W E R E | H E R | E Y E S | W H I C H | L O O K E D | G R E E N | I N | S O M E | L I G H T S | A N D | M O O D S | A N D | G R A Y | I N | O T H E R S | S O | F A R | T H E | O R D I N A R Y | O B S E R V E R | A N | E X T R A O R D I N A R Y | O B S E R V E R | +M I G H T | H A V E | S E E N | T H A T | T H E | C H I N | W A S | V E R Y | P O I N T E D | A N D | P R O N O U N C E D | T H A T | T H E | B I G | E Y E S | W E R E | F U L L | O F | S P I R I T | A N D | V I V A C I T Y | T H A T | T H E | M O U T H | W A S | S W E E T | L I P P E D | A N D | E X P R E S S I V E | T H A T | T H E | F O R E H E A D | W A S | B R O A D | A N D | F U L L | I N | S H O R T | O U R | D I S C E R N I N G | E X T R A O R D I N A R Y | O B S E R V E R | M I G H T | H A V E | C O N C L U D E D | +W A S | S O | L U D I C R O U S L Y | A F R A I D | M A T T H E W | H O W E V E R | W A S | S P A R E D | T H E | O R D E A L | O F | S P E A K I N G | F I R S T | F O R | A S | S O O N | A S | S H E | C O N C L U D E D | T H A T | H E | W A S | C O M I N G | T O | H E R | S H E | S T O O D | U P | G R A S P I N G | W I T H | O N E | T H I N | B R O W N | H A N D | T H E | H A N D L E | O F | A | S H A B B Y | O L D | F A S H I O N E D | C A R P E T | B A G | T H E | O T H E R | S H E | H E L D | O U T | T O | H I M | +I | S U P P O S E | Y O U | A R E | M I S T E R | M A T T H E W | C U T H B E R T | O F | G R E E N | G A B L E S | S H E | S A I D | I N | A | P E C U L I A R L Y | C L E A R | S W E E T | V O I C E | I ' M | V E R Y | G L A D | T O | S E E | Y O U | I | W A S | B E G I N N I N G | T O | B E | A F R A I D | Y O U | W E R E N ' T | C O M I N G | F O R | M E | +I | H A D | M A D E | U P | M Y | M I N D | T H A T | I F | Y O U | D I D N ' T | C O M E | F O R | M E | T O | N I G H T | +I | W O U L D N ' T | B E | A | B I T | A F R A I D | A N D | I T | W O U L D | B E | L O V E L Y | T O | S L E E P | I N | A | W I L D | C H E R R Y | T R E E | A L L | W H I T E | W I T H | B L O O M | I N | T H E | M O O N S H I N E | D O N ' T | Y O U | T H I N K | Y O U | C O U L D | I M A G I N E | Y O U | W E R E | D W E L L I N G | I N | M A R B L E | H A L L S | C O U L D N ' T | Y O U | +M A T T H E W | H A D | T A K E N | T H E | S C R A W N Y | L I T T L E | H A N D | A W K W A R D L Y | I N | H I S | T H E N | A N D | T H E R E | H E | D E C I D E D | W H A T | T O | D O | H E | C O U L D | N O T | T E L L | T H I S | C H I L D | W I T H | T H E | G L O W I N G | E Y E S | T H A T | T H E R E | H A D | B E E N | A | M I S T A K E | H E | W O U L D | T A K E | H E R | H O M E | A N D | L E T | M A R I L L A | D O | T H A T | S H E | C O U L D N ' T | B E | L E F T | A T | B R I G H T | R I V E R | A N Y H O W | +N O | M A T T E R | W H A T | M I S T A K E | H A D | B E E N | M A D E | S O | A L L | Q U E S T 
I O N S | A N D | E X P L A N A T I O N S | M I G H T | A S | W E L L | B E | D E F E R R E D | U N T I L | H E | W A S | S A F E L Y | B A C K | A T | G R E E N | G A B L E S | I ' M | S O R R Y | I | W A S | L A T E | H E | S A I D | S H Y L Y | C O M E | A L O N G | T H E | H O R S E | I S | O V E R | I N | T H E | Y A R D | G I V E | M E | Y O U R | B A G | O H | I | C A N | C A R R Y | I T | T H E | C H I L D | R E S P O N D E D | C H E E R F U L L Y | +I T | I S N ' T | H E A V Y | I ' V E | G O T | A L L | M Y | W O R L D L Y | G O O D S | I N | I T | B U T | I T | I S N ' T | H E A V Y | A N D | I F | I T | I S N ' T | C A R R I E D | I N | J U S T | A | C E R T A I N | W A Y | T H E | H A N D L E | P U L L S | O U T | S O | I ' D | B E T T E R | K E E P | I T | B E C A U S E | I | K N O W | T H E | E X A C T | K N A C K | O F | I T | I T ' S | A N | E X T R E M E L Y | O L D | C A R P E T | B A G | O H | I ' M | V E R Y | G L A D | Y O U ' V E | C O M E | E V E N | I F | I T | W O U L D | H A V E | B E E N | N I C E | T O | S L E E P | I N | A | W I L D | C H E R R Y | T R E E | +W E ' V E | G O T | T O | D R I V E | A | L O N G | P I E C E | H A V E N ' T | W E | M I S S U S | S P E N C E R | S A I D | I T | W A S | E I G H T | M I L E S | I ' M | G L A D | B E C A U S E | I | L O V E | D R I V I N G | O H | I T | S E E M S | S O | W O N D E R F U L | T H A T | I ' M | G O I N G | T O | L I V E | W I T H | Y O U | A N D | B E L O N G | T O | Y O U | I ' V E | N E V E R | B E L O N G E D | T O | A N Y B O D Y | N O T | R E A L L Y | B U T | T H E | A S Y L U M | W A S | T H E | W O R S T | I ' V E | O N L Y | B E E N | I N | I T | F O U R | M O N T H S | B U T | T H A T | W A S | E N O U G H | +I T ' S | W O R S E | T H A N | A N Y T H I N G | Y O U | C O U L D | I M A G I N E | M I S S U S | S P E N C E R | S A I D | I T | W A S | W I C K E D | O F | M E | T O | T A L K | L I K E | T H A T | +T H E Y | W E R E | G O O D | Y O U | K N O W | T H E | A S Y L U M | P E O P L E | B U T | T H E R E | I S | S O | L I T T L E | S C O P E | F O R | T H E | I M A G I N A T I O N | I N | A N | A S Y L U M | O N L Y | J U S T | I N | T H E | O T H E R | O R P H A N S | I T | W A S | P R E T T Y | I N T E R E S T I N G | T O | I M A G I N E | T H I N G S | A B O U T | T H E M | +W H O | H A D | B E E N | S T O L E N | A W A Y | F R O M | H E R | P A R E N T S | I N | H E R | I N F A N C Y | B Y | A | C R U E L | N U R S E | W H O | D I E D | B E F O R E | S H E | C O U L D | C O N F E S S | I | U S E D | T O | L I E | A W A K E | A T | N I G H T S | A N D | I M A G I N E | T H I N G S | L I K E | T H A T | B E C A U S E | I | D I D N ' T | H A V E | T I M E | I N | T H E | D A Y | I | G U E S S | T H A T ' S | W H Y | I ' M | S O | T H I N | I | A M | D R E A D F U L | T H I N | A I N ' T | I | T H E R E | I S N ' T | A | P I C K | O N | M Y | B O N E S | +I | D O | L O V E | T O | I M A G I N E | I ' M | N I C E | A N D | P L U M P | W I T H | D I M P L E S | I N | M Y | E L B O W S | W I T H | T H I S | M A T T H E W ' S | C O M P A N I O N | S T O P P E D | T A L K I N G | P A R T L Y | B E C A U S E | S H E | W A S | O U T | O F | B R E A T H | A N D | P A R T L Y | B E C A U S E | T H E Y | H A D | R E A C H E D | T H E | B U G G Y | N O T | A N O T H E R | W O R D | D I D | S H E | S A Y | U N T I L | T H E Y | H A D | L E F T | T H E | V I L L A G E | A N D | W E R E | D R I V I N G | D O W N | A | S T E E P | L I T T L E | H I L L | +T H E | R O A D | P A R T | O F | W H I C H | H A D 
| B E E N | C U T | S O | D E E P L Y | I N T O | T H E | S O F T | S O I L | T H A T | T H E | B A N K S | F R I N G E D | W I T H | B L O O M I N G | W I L D | C H E R R Y | T R E E S | A N D | S L I M | W H I T E | B I R C H E S | W E R E | S E V E R A L | F E E T | A B O V E | T H E I R | H E A D S | T H E | C H I L D | P U T | O U T | H E R | H A N D | A N D | B R O K E | O F F | A | B R A N C H | O F | W I L D | P L U M | T H A T | B R U S H E D | A G A I N S T | T H E | S I D E | O F | T H E | B U G G Y | +I S N ' T | T H A T | B E A U T I F U L | W H A T | D I D | T H A T | T R E E | L E A N I N G | O U T | F R O M | T H E | B A N K | A L L | W H I T E | A N D | L A C Y | M A K E | Y O U | T H I N K | O F | S H E | A S K E D | W E L L | N O W | I | D U N N O | S A I D | M A T T H E W | W H Y | A | B R I D E | O F | C O U R S E | A | B R I D E | A L L | I N | W H I T E | W I T H | A | L O V E L Y | M I S T Y | V E I L | +I ' V E | N E V E R | S E E N | O N E | B U T | I | C A N | I M A G I N E | W H A T | S H E | W O U L D | L O O K | L I K E | I | D O N ' T | E V E R | E X P E C T | T O | B E | A | B R I D E | M Y S E L F | I ' M | S O | H O M E L Y | N O B O D Y | W I L L | E V E R | W A N T | T O | M A R R Y | M E | U N L E S S | I T | M I G H T | B E | A | F O R E I G N | M I S S I O N A R Y | I | S U P P O S E | A | F O R E I G N | M I S S I O N A R Y | M I G H T N ' T | B E | V E R Y | P A R T I C U L A R | +B U T | I | D O | H O P E | T H A T | S O M E | D A Y | I | S H A L L | H A V E | A | W H I T E | D R E S S | T H A T | I S | M Y | H I G H E S T | I D E A L | O F | E A R T H L Y | B L I S S | I | J U S T | L O V E | P R E T T Y | C L O T H E S | A N D | I ' V E | N E V E R | H A D | A | P R E T T Y | D R E S S | I N | M Y | L I F E | T H A T | I | C A N | R E M E M B E R | B U T | O F | C O U R S E | I T ' S | A L L | T H E | M O R E | T O | L O O K | F O R W A R D | T O | I S N ' T | I T | A N D | T H E N | +I | C A N | I M A G I N E | T H A T | I ' M | D R E S S E D | G O R G E O U S L Y | T H I S | M O R N I N G | W H E N | I | L E F T | T H E | A S Y L U M | I | F E L T | S O | A S H A M E D | B E C A U S E | I | H A D | T O | W E A R | T H I S | H O R R I D | O L D | W I N C E Y | D R E S S | A L L | T H E | O R P H A N S | H A D | T O | W E A R | T H E M | Y O U | K N O W | A | M E R C H A N T | I N | H O P E T O N | L A S T | W I N T E R | D O N A T E D | T H R E E | H U N D R E D | Y A R D S | O F | W I N C E Y | T O | T H E | A S Y L U M | S O M E | P E O P L E | S A I D | I T | W A S | B E C A U S E | H E | C O U L D N ' T | S E L L | I T | +B U T | I ' D | R A T H E R | B E L I E V E | T H A T | I T | W A S | O U T | O F | T H E | K I N D N E S S | O F | H I S | H E A R T | W O U L D N ' T | Y O U | W H E N | W E | G O T | O N | T H E | T R A I N | I | F E L T | A S | I F | E V E R Y B O D Y | M U S T | B E | L O O K I N G | A T | M E | A N D | P I T Y I N G | M E | B U T | I | J U S T | W E N T | T O | W O R K | A N D | I M A G I N E D | T H A T | I | H A D | O N | T H E | M O S T | B E A U T I F U L | P A L E | B L U E | S I L K | D R E S S | B E C A U S E | W H E N | Y O U | A R E | I M A G I N I N G | Y O U | M I G H T | A S | W E L L | I M A G I N E | S O M E T H I N G | W O R T H | W H I L E | +A N D | A | B I G | H A T | A L L | F L O W E R S | A N D | N O D D I N G | P L U M E S | A N D | A | G O L D | W A T C H | A N D | K I D | G L O V E S | A N D | B O O T S | I | F E L T | C H E E R E D | U P | R I G H T | A W A Y | A N D | I | E N J O Y E D | M Y 
| T R I P | T O | T H E | I S L A N D | W I T H | A L L | M Y | M I G H T | I | W A S N ' T | A | B I T | S I C K | C O M I N G | O V E R | I N | T H E | B O A T | N E I T H E R | W A S | M I S S U S | S P E N C E R | A L T H O U G H | S H E | G E N E R A L L Y | I S | +S H E | S A I D | S H E | H A D N ' T | T I M E | T O | G E T | S I C K | W A T C H I N G | T O | S E E | T H A T | I | D I D N ' T | F A L L | O V E R B O A R D | S H E | S A I D | S H E | N E V E R | S A W | T H E | B E A T | O F | M E | F O R | P R O W L I N G | A B O U T | B U T | I F | I T | K E P T | H E R | F R O M | B E I N G | S E A S I C K | I T ' S | A | M E R C Y | I | D I D | P R O W L | I S N ' T | I T | A N D | I | W A N T E D | T O | S E E | E V E R Y T H I N G | T H A T | W A S | T O | B E | S E E N | O N | T H A T | B O A T | B E C A U S E | I | D I D N ' T | K N O W | W H E T H E R | I ' D | E V E R | H A V E | A N O T H E R | O P P O R T U N I T Y | +O H | T H E R E | A R E | A | L O T | M O R E | C H E R R Y | T R E E S | A L L | I N | B L O O M | T H I S | I S L A N D | I S | T H E | B L O O M I E S T | P L A C E | I | J U S T | L O V E | I T | A L R E A D Y | A N D | I ' M | S O | G L A D | I ' M | G O I N G | T O | L I V E | H E R E | I ' V E | A L W A Y S | H E A R D | T H A T | P R I N C E | E D W A R D | I S L A N D | W A S | T H E | P R E T T I E S T | P L A C E | I N | T H E | W O R L D | +A N D | I | U S E D | T O | I M A G I N E | I | W A S | L I V I N G | H E R E | B U T | I | N E V E R | R E A L L Y | E X P E C T E D | I | W O U L D | I T ' S | D E L I G H T F U L | W H E N | Y O U R | I M A G I N A T I O N S | C O M E | T R U E | I S N ' T | I T | B U T | T H O S E | R E D | R O A D S | A R E | S O | F U N N Y | W H E N | W E | G O T | I N T O | T H E | T R A I N | A T | C H A R L O T T E T O W N | A N D | T H E | R E D | R O A D S | B E G A N | T O | F L A S H | P A S T | I | A S K E D | M I S S U S | S P E N C E R | W H A T | M A D E | T H E M | R E D | +A N D | S H E | S A I D | S H E | D I D N ' T | K N O W | A N D | F O R | P I T Y ' S | S A K E | N O T | T O | A S K | H E R | A N Y | M O R E | Q U E S T I O N S | S H E | S A I D | I | M U S T | H A V E | A S K E D | H E R | A | T H O U S A N D | A L R E A D Y | I | S U P P O S E | I | H A D | T O O | B U T | H O W | Y O U | G O I N G | T O | F I N D | O U T | A B O U T | T H I N G S | I F | Y O U | D O N ' T | A S K | Q U E S T I O N S | A N D | W H A T | D O E S | M A K E | T H E | R O A D S | R E D | W E L L | N O W | I | D U N N O | S A I D | M A T T H E W | +T H E R E ' D | B E | N O | S C O P E | F O R | I M A G I N A T I O N | T H E N | W O U L D | T H E R E | B U T | A M | I | T A L K I N G | T O O | M U C H | P E O P L E | A R E | A L W A Y S | T E L L I N G | M E | I | D O | W O U L D | Y O U | R A T H E R | I | D I D N ' T | T A L K | I F | Y O U | S A Y | S O | I ' L L | S T O P | I | C A N | S T O P | W H E N | I | M A K E | U P | M Y | M I N D | T O | I T | A L T H O U G H | I T ' S | D I F F I C U L T | M A T T H E W | +W A S | E N J O Y I N G | H I M S E L F | L I K E | M O S T | Q U I E T | F O L K S | H E | L I K E D | T A L K A T I V E | P E O P L E | W H E N | T H E Y | W E R E | W I L L I N G | T O | D O | T H E | T A L K I N G | T H E M S E L V E S | A N D | D I D | N O T | E X P E C T | H I M | T O | K E E P | U P | H I S | E N D | O F | I T | B U T | H E | H A D | N E V E R | E X P E C T E D | T O | E N J O Y | T H E | S O C I E T Y | O F | A | L I T T L E | G I R L | W O M E N | W E R E | B A D | E N O U G H | 
I N | A L L | C O N S C I E N C E | B U T | L I T T L E | G I R L S | W E R E | W O R S E | diff --git a/SpeechLM/dataset/LibriSpeech/asr/train_sample100.tsv b/SpeechLM/dataset/LibriSpeech/asr/train_sample100.tsv new file mode 100644 index 0000000000000000000000000000000000000000..8b50a1d2f1e06553881ec3352bee2e6360814635 --- /dev/null +++ b/SpeechLM/dataset/LibriSpeech/asr/train_sample100.tsv @@ -0,0 +1,101 @@ +/LocalData/dataset/LibriSpeech/train-clean-100 +103/1240/103-1240-0000.flac 225360 +103/1240/103-1240-0001.flac 255120 +103/1240/103-1240-0002.flac 223120 +103/1240/103-1240-0003.flac 235360 +103/1240/103-1240-0004.flac 200240 +103/1240/103-1240-0005.flac 242800 +103/1240/103-1240-0006.flac 153280 +103/1240/103-1240-0007.flac 240560 +103/1240/103-1240-0008.flac 246960 +103/1240/103-1240-0009.flac 160480 +103/1240/103-1240-0010.flac 236880 +103/1240/103-1240-0011.flac 234480 +103/1240/103-1240-0012.flac 243040 +103/1240/103-1240-0013.flac 244160 +103/1240/103-1240-0014.flac 223360 +103/1240/103-1240-0015.flac 60960 +103/1240/103-1240-0016.flac 250640 +103/1240/103-1240-0017.flac 229040 +103/1240/103-1240-0018.flac 185760 +103/1240/103-1240-0019.flac 246480 +103/1240/103-1240-0020.flac 214640 +103/1240/103-1240-0021.flac 236960 +103/1240/103-1240-0022.flac 262000 +103/1240/103-1240-0023.flac 194400 +103/1240/103-1240-0024.flac 244320 +103/1240/103-1240-0025.flac 241920 +103/1240/103-1240-0026.flac 133360 +103/1240/103-1240-0027.flac 223440 +103/1240/103-1240-0028.flac 250400 +103/1240/103-1240-0029.flac 244320 +103/1240/103-1240-0030.flac 232320 +103/1240/103-1240-0031.flac 269760 +103/1240/103-1240-0032.flac 236400 +103/1240/103-1240-0033.flac 230640 +103/1240/103-1240-0034.flac 246480 +103/1240/103-1240-0035.flac 256720 +103/1240/103-1240-0036.flac 200320 +103/1240/103-1240-0037.flac 237040 +103/1240/103-1240-0038.flac 114480 +103/1240/103-1240-0039.flac 230800 +103/1240/103-1240-0040.flac 234720 +103/1240/103-1240-0041.flac 216160 +103/1240/103-1240-0042.flac 249680 +103/1240/103-1240-0043.flac 236160 +103/1240/103-1240-0044.flac 262240 +103/1240/103-1240-0045.flac 250800 +103/1240/103-1240-0046.flac 222800 +103/1240/103-1240-0047.flac 206320 +103/1240/103-1240-0048.flac 236320 +103/1240/103-1240-0049.flac 244560 +103/1240/103-1240-0050.flac 224400 +103/1240/103-1240-0051.flac 245760 +103/1240/103-1240-0052.flac 236640 +103/1240/103-1240-0053.flac 218640 +103/1240/103-1240-0054.flac 261360 +103/1240/103-1240-0055.flac 179920 +103/1240/103-1240-0056.flac 229040 +103/1240/103-1240-0057.flac 109680 +103/1241/103-1241-0000.flac 255440 +103/1241/103-1241-0001.flac 248800 +103/1241/103-1241-0002.flac 249040 +103/1241/103-1241-0003.flac 222160 +103/1241/103-1241-0004.flac 236080 +103/1241/103-1241-0005.flac 224400 +103/1241/103-1241-0006.flac 243760 +103/1241/103-1241-0007.flac 242320 +103/1241/103-1241-0008.flac 242160 +103/1241/103-1241-0009.flac 222400 +103/1241/103-1241-0010.flac 253920 +103/1241/103-1241-0011.flac 231760 +103/1241/103-1241-0012.flac 239680 +103/1241/103-1241-0013.flac 236960 +103/1241/103-1241-0014.flac 242080 +103/1241/103-1241-0015.flac 224160 +103/1241/103-1241-0016.flac 234640 +103/1241/103-1241-0017.flac 254240 +103/1241/103-1241-0018.flac 150960 +103/1241/103-1241-0019.flac 48400 +103/1241/103-1241-0020.flac 155360 +103/1241/103-1241-0021.flac 242880 +103/1241/103-1241-0022.flac 261600 +103/1241/103-1241-0023.flac 266720 +103/1241/103-1241-0024.flac 254240 +103/1241/103-1241-0025.flac 77280 +103/1241/103-1241-0026.flac 176080 +103/1241/103-1241-0027.flac 238080 
+103/1241/103-1241-0028.flac 248880 +103/1241/103-1241-0029.flac 244960 +103/1241/103-1241-0030.flac 247520 +103/1241/103-1241-0031.flac 209600 +103/1241/103-1241-0032.flac 224080 +103/1241/103-1241-0033.flac 251920 +103/1241/103-1241-0034.flac 270560 +103/1241/103-1241-0035.flac 248800 +103/1241/103-1241-0036.flac 249040 +103/1241/103-1241-0037.flac 204400 +103/1241/103-1241-0038.flac 238960 +103/1241/103-1241-0039.flac 258160 +103/1241/103-1241-0040.flac 220560 +103/1241/103-1241-0041.flac 252240 diff --git a/SpeechLM/dataset/LibriSpeech/fast_phone2unit/config.yaml b/SpeechLM/dataset/LibriSpeech/fast_phone2unit/config.yaml new file mode 100644 index 0000000000000000000000000000000000000000..eaec2ce8655ebfa043cf73a2ee2d85ac5bcdfb21 --- /dev/null +++ b/SpeechLM/dataset/LibriSpeech/fast_phone2unit/config.yaml @@ -0,0 +1,13 @@ +audio_root: /home/v-ziqzhang/dataset/librispeech_phone2unit +features: + energy_max: 5.733445167541504 + energy_min: 1.0e-08 + eps: 1.0e-05 + hop_length: 256 + pitch_max: 6.608609099713706 + pitch_min: 1.0e-08 + sample_rate: 16000 +sample_rate: 16000 +vocab_filename: dict.km.txt +src_vocab_filename: dict.phn.txt + diff --git a/SpeechLM/dataset/LibriSpeech/fast_phone2unit/config_generate.yaml b/SpeechLM/dataset/LibriSpeech/fast_phone2unit/config_generate.yaml new file mode 100644 index 0000000000000000000000000000000000000000..1d9fa74529728fe81f41edd55689f43f6ae2da83 --- /dev/null +++ b/SpeechLM/dataset/LibriSpeech/fast_phone2unit/config_generate.yaml @@ -0,0 +1,13 @@ +audio_root: /home/v-ziqzhang/dataset/librispeech_phone2unit +features: + energy_max: 5.733445167541504 + energy_min: 1.0e-08 + eps: 1.0e-05 + hop_length: 256 + pitch_max: 6.608609099713706 + pitch_min: 1.0e-08 + sample_rate: 16000 +sample_rate: 16000 +vocab_filename: dict.km.txt +src_vocab_filename: dict.PHN.txt + diff --git a/SpeechLM/dataset/LibriSpeech/fast_phone2unit/dict.PHN.txt b/SpeechLM/dataset/LibriSpeech/fast_phone2unit/dict.PHN.txt new file mode 100644 index 0000000000000000000000000000000000000000..60232ecf55c10e9ab673168262af28951ecbec2f --- /dev/null +++ b/SpeechLM/dataset/LibriSpeech/fast_phone2unit/dict.PHN.txt @@ -0,0 +1,42 @@ +| 0 + 1 +' 2 +AA 3 +AE 4 +AH 5 +AO 6 +AW 7 +AY 8 +B 9 +CH 10 +D 11 +DH 12 +EH 13 +ER 14 +EY 15 +F 16 +G 17 +HH 18 +IH 19 +IY 20 +JH 21 +K 22 +L 23 +M 24 +N 25 +NG 26 +OW 27 +OY 28 +P 29 +R 30 +S 31 +SH 32 +T 33 +TH 34 +UH 35 +UW 36 +V 37 +W 38 +Y 39 +Z 40 +ZH 41 diff --git a/SpeechLM/dataset/LibriSpeech/fast_phone2unit/dict.km.txt b/SpeechLM/dataset/LibriSpeech/fast_phone2unit/dict.km.txt new file mode 100644 index 0000000000000000000000000000000000000000..bbfe59e554d6234f3631d8d09d9281c2160f4675 --- /dev/null +++ b/SpeechLM/dataset/LibriSpeech/fast_phone2unit/dict.km.txt @@ -0,0 +1,500 @@ +0 0 +1 1 +2 2 +3 3 +4 4 +5 5 +6 6 +7 7 +8 8 +9 9 +10 10 +11 11 +12 12 +13 13 +14 14 +15 15 +16 16 +17 17 +18 18 +19 19 +20 20 +21 21 +22 22 +23 23 +24 24 +25 25 +26 26 +27 27 +28 28 +29 29 +30 30 +31 31 +32 32 +33 33 +34 34 +35 35 +36 36 +37 37 +38 38 +39 39 +40 40 +41 41 +42 42 +43 43 +44 44 +45 45 +46 46 +47 47 +48 48 +49 49 +50 50 +51 51 +52 52 +53 53 +54 54 +55 55 +56 56 +57 57 +58 58 +59 59 +60 60 +61 61 +62 62 +63 63 +64 64 +65 65 +66 66 +67 67 +68 68 +69 69 +70 70 +71 71 +72 72 +73 73 +74 74 +75 75 +76 76 +77 77 +78 78 +79 79 +80 80 +81 81 +82 82 +83 83 +84 84 +85 85 +86 86 +87 87 +88 88 +89 89 +90 90 +91 91 +92 92 +93 93 +94 94 +95 95 +96 96 +97 97 +98 98 +99 99 +100 100 +101 101 +102 102 +103 103 +104 104 +105 105 +106 106 +107 107 +108 108 +109 109 +110 110 +111 111 
+112 112 +113 113 +114 114 +115 115 +116 116 +117 117 +118 118 +119 119 +120 120 +121 121 +122 122 +123 123 +124 124 +125 125 +126 126 +127 127 +128 128 +129 129 +130 130 +131 131 +132 132 +133 133 +134 134 +135 135 +136 136 +137 137 +138 138 +139 139 +140 140 +141 141 +142 142 +143 143 +144 144 +145 145 +146 146 +147 147 +148 148 +149 149 +150 150 +151 151 +152 152 +153 153 +154 154 +155 155 +156 156 +157 157 +158 158 +159 159 +160 160 +161 161 +162 162 +163 163 +164 164 +165 165 +166 166 +167 167 +168 168 +169 169 +170 170 +171 171 +172 172 +173 173 +174 174 +175 175 +176 176 +177 177 +178 178 +179 179 +180 180 +181 181 +182 182 +183 183 +184 184 +185 185 +186 186 +187 187 +188 188 +189 189 +190 190 +191 191 +192 192 +193 193 +194 194 +195 195 +196 196 +197 197 +198 198 +199 199 +200 200 +201 201 +202 202 +203 203 +204 204 +205 205 +206 206 +207 207 +208 208 +209 209 +210 210 +211 211 +212 212 +213 213 +214 214 +215 215 +216 216 +217 217 +218 218 +219 219 +220 220 +221 221 +222 222 +223 223 +224 224 +225 225 +226 226 +227 227 +228 228 +229 229 +230 230 +231 231 +232 232 +233 233 +234 234 +235 235 +236 236 +237 237 +238 238 +239 239 +240 240 +241 241 +242 242 +243 243 +244 244 +245 245 +246 246 +247 247 +248 248 +249 249 +250 250 +251 251 +252 252 +253 253 +254 254 +255 255 +256 256 +257 257 +258 258 +259 259 +260 260 +261 261 +262 262 +263 263 +264 264 +265 265 +266 266 +267 267 +268 268 +269 269 +270 270 +271 271 +272 272 +273 273 +274 274 +275 275 +276 276 +277 277 +278 278 +279 279 +280 280 +281 281 +282 282 +283 283 +284 284 +285 285 +286 286 +287 287 +288 288 +289 289 +290 290 +291 291 +292 292 +293 293 +294 294 +295 295 +296 296 +297 297 +298 298 +299 299 +300 300 +301 301 +302 302 +303 303 +304 304 +305 305 +306 306 +307 307 +308 308 +309 309 +310 310 +311 311 +312 312 +313 313 +314 314 +315 315 +316 316 +317 317 +318 318 +319 319 +320 320 +321 321 +322 322 +323 323 +324 324 +325 325 +326 326 +327 327 +328 328 +329 329 +330 330 +331 331 +332 332 +333 333 +334 334 +335 335 +336 336 +337 337 +338 338 +339 339 +340 340 +341 341 +342 342 +343 343 +344 344 +345 345 +346 346 +347 347 +348 348 +349 349 +350 350 +351 351 +352 352 +353 353 +354 354 +355 355 +356 356 +357 357 +358 358 +359 359 +360 360 +361 361 +362 362 +363 363 +364 364 +365 365 +366 366 +367 367 +368 368 +369 369 +370 370 +371 371 +372 372 +373 373 +374 374 +375 375 +376 376 +377 377 +378 378 +379 379 +380 380 +381 381 +382 382 +383 383 +384 384 +385 385 +386 386 +387 387 +388 388 +389 389 +390 390 +391 391 +392 392 +393 393 +394 394 +395 395 +396 396 +397 397 +398 398 +399 399 +400 400 +401 401 +402 402 +403 403 +404 404 +405 405 +406 406 +407 407 +408 408 +409 409 +410 410 +411 411 +412 412 +413 413 +414 414 +415 415 +416 416 +417 417 +418 418 +419 419 +420 420 +421 421 +422 422 +423 423 +424 424 +425 425 +426 426 +427 427 +428 428 +429 429 +430 430 +431 431 +432 432 +433 433 +434 434 +435 435 +436 436 +437 437 +438 438 +439 439 +440 440 +441 441 +442 442 +443 443 +444 444 +445 445 +446 446 +447 447 +448 448 +449 449 +450 450 +451 451 +452 452 +453 453 +454 454 +455 455 +456 456 +457 457 +458 458 +459 459 +460 460 +461 461 +462 462 +463 463 +464 464 +465 465 +466 466 +467 467 +468 468 +469 469 +470 470 +471 471 +472 472 +473 473 +474 474 +475 475 +476 476 +477 477 +478 478 +479 479 +480 480 +481 481 +482 482 +483 483 +484 484 +485 485 +486 486 +487 487 +488 488 +489 489 +490 490 +491 491 +492 492 +493 493 +494 494 +495 495 +496 496 +497 497 +498 498 +499 499 diff --git 
a/SpeechLM/dataset/LibriSpeech/fast_phone2unit/genset_examples.tsv b/SpeechLM/dataset/LibriSpeech/fast_phone2unit/genset_examples.tsv new file mode 100644 index 0000000000000000000000000000000000000000..fe4a9a1b21a77835afaacc936f59963e8ed0090c --- /dev/null +++ b/SpeechLM/dataset/LibriSpeech/fast_phone2unit/genset_examples.tsv @@ -0,0 +1,101 @@ +id speaker n_frames tgt_text unit +librilm-9899 librilm 323 AH B EH T R OW TH AH L AH W EH D IH NG AO R AH K R IH S AH N IH NG AO R DH AH M IH R P R AA K S IH M AH T IY AH V AH G IH T AA R IH Z S AH F IH SH AH N T AH K EY ZH AH N AH N D IH F DH AH AH K EY ZH AH N L AE K S S EH N D F AO R DH AH G IH T AA R AH N D D AE N S EH N IY W EY 0 +librilm-9900 librilm 449 AH B EH T R OW TH AH L B R OW K AH N AO F W AA Z AH TH IH NG DH AE T HH AE D HH AE P AH N D B IH F AO R AH N D M AY T HH AE P AH N AH G EH N IH T W AA Z AH T R AY F AH L IY V IH N AH M IH R N AH TH IH NG IH F OW N L IY HH AH N ER W ER AH N T AH CH T B AY IH T IH F OW N L IY AA T AH M AA R K UH D S T EY K HH IH Z L AY F AH P AA N HH IH Z AH N IH M P IY CH AH B AH L HH AH N ER 0 +librilm-9901 librilm 211 AH B EH T R OW TH AH L S EH R AH M OW N IY AH K AO R D IH NG L IY IH Z DH AH IH M IY D IY AH T AH K EY ZH AH N AH V DH AH K AH M IH NG T AH G EH DH ER AH V AW ER AH K W EY N T AH N S IH Z 0 +librilm-9902 librilm 45 AH B EH T R OW TH AH L D EY 0 +librilm-9903 librilm 141 AH B EH T R OW TH AH L HH IY R IH Z AO L M OW S T AE Z B AY N D IH NG AH N D K W AY T AE Z S AA L AH M AE Z AH M EH R IH JH 0 +librilm-9904 librilm 59 AH B EH T R OW TH AH L IH Z S EY K R AH D 0 +librilm-9905 librilm 79 AH B EH T R OW TH AH L IH Z S AH M TH IH NG HH EH V AH N L IY 0 +librilm-9906 librilm 225 AH B EH T R OW TH AH L R IH NG W AA Z P ER CH AH S T AH N D DH EH N HH ER K AA N SH AH N S B IY IH NG AH P IY Z D SH IY G EY V HH ER S EH L F K AH M P L IY T L IY T UW HH ER L AH V ER 0 +librilm-9907 librilm 288 AH B EH T R OW TH AH L T UH K P L EY S AO L W AA Z HH AA R M AH N IY AH N D F AO R AH T AY M N OW M AO R W AA Z S EH D AH V D IH S IH N ER SH IH T IH NG M AE D AH M D IY L AA P EH L T R IY AO R P AH T IH NG HH ER IH N W AO R D SH IH P 0 +librilm-9908 librilm 139 AH B EH T R OW TH AH L W IY HH AE V B IH N T UW AH B AO L AH V W IH CH AY M AH S T G IH V Y UW AH D IH S K R IH P SH AH N 0 +librilm-9909 librilm 491 AH B EH T R OW TH AH L W IH CH HH AE D T EY K AH N P L EY S AA N DH AH P R IY V IY AH S IY V N IH NG G EY V K AA Z F AO R P L EH N T AH F AH L SH R AH G IH NG AH V SH OW L D ER Z B IH K AO Z DH AH JH EH N T AH L M AH N AE Z Y EH T HH EH L D N OW R IH S P EH K T AH B AH L P AH Z IH SH AH N IH N L AY F AH N D DH AH F IY AE N S IY AE Z S EH V R AH L F IY M EY L F R EH N D Z AH S ER T AH D AH V EH R IY AH N S ER T AH N W AH N 0 +librilm-9910 librilm 269 AH B EH T R OW TH AH L W IH TH AW T AH N D IY V IH N AH G EH N S T DH AH K AH N S EH N T AH V P EH R AH N T S W AA Z S AH M TH IH NG K W AY T AW T S AY D AH V DH AH Y AH NG L EY D IY Z P AW ER AH V K AA M P R IY HH EH N SH AH N 0 +librilm-9911 librilm 29 AH B EH T R AH TH 0 +librilm-9912 librilm 127 AH B EH T R AH TH B R AY D AO T T UW B IY HH AE P IY Y UW AA R AO L W EY Z T EH L IH NG M IY S OW 0 +librilm-9913 librilm 108 AH B EH T R AH TH G ER L IH Z W AH N TH IH NG AH W AY F K W AY T AH N AH DH ER 0 +librilm-9914 librilm 168 AH B EH T R AH TH L AH V ER K AE N AA T T AA L ER EY T EH N IY AH S P ER ZH AH N K AE S T AH P AA N DH AH F EH R S EH K S S EH D JH AO R JH AH Z 0 +librilm-9915 librilm 61 AH B EH T R AH TH L AH V ER Z F EH R W EH L 0 +librilm-9916 librilm 
335 AH B EH T R AH TH Y AH NG M AE N AO R HH IH Z F IY M EY L R EH L AH T IH V Z AH S IH S T IH NG HH IH M W AA Z AH K AH S T AH M D T UW M EY K AH P R EH Z AH N T AH V W AH N AO R M AO R P EH T IY K OW T S T UW HH IH Z S W IY T HH AA R T T UW IH N K R IY S HH ER W AO R D R OW B 0 +librilm-9917 librilm 24 AH B EH T ER 0 +librilm-9918 librilm 182 AH B EH T ER AH F AY N ER AH N OW B L ER F EH L OW DH AE N HH IY N EH V ER L AY V D AH N D DH AE T S W AH T AY W AA N T Y UW F EH L OW Z T UW N OW 0 +librilm-9919 librilm 254 AH B EH T ER AH K AW N T AH V AH M IH L AH T EH R IY AE K SH AH N DH AE N DH AE T W IH CH S M OW L IH T G IH V Z AH V DH AH B AE T AH L AH V DH AH B OY N IH T W UH D B IY HH AA R D T UW F AY N D 0 +librilm-9920 librilm 117 AH B EH T ER AH K AW N T AH V DH AH SH UH G ER DH AE N AY K UH D HH AE V IH K S P EH K T AH D 0 +librilm-9921 librilm 135 AH B EH T ER AH K AW N T AH V DH IH S IH N S AH D AH N T W AA Z W AY D L IY P R IH N T AH D AE T DH AE T T AY M 0 +librilm-9922 librilm 251 AH B EH T ER AH K AW N T T UW DH AE N DH AH N UW Z P EY P ER W AH N F UH L ER IH G Z AE K T ER M AO R D IH T EY L D B AE K T AH P B AY F IH G Y ER Z D AW N TH R IY L AO NG SH IY T S AH N D HH AE F W EY D AW N AH F AO R TH 0 +librilm-9923 librilm 84 AH B EH T ER AH K W EY N T AH N S W IH L S UW N D IH S K AH V ER DH IH S 0 +librilm-9924 librilm 258 AH B EH T ER AH K W EY N T AH N S W IH DH DH AH AA P ER EY SH AH N N OW N T UW M AA D ER N Z AE Z S P AY K IH NG AH P IY S W UH D HH AE V EH N EY B AH L D HH IH M T UW M EY K DH AH B L OW IH R EH P ER AH B AH L 0 +librilm-9925 librilm 595 AH B EH T ER AH K W EY N T AH N S W IH DH DH IH S L AE N D AH V F R IY D AH M W IH L SH OW Y UW DH AE T F AH D EH L AH T IY AH N D HH AH N ER B IH T W IY N HH AH Z B AH N D AH N D W AY F AA R HH IY R N OW R EH R IH K S EH P SH AH N Z B AH T DH AH Y UW N AH V ER S AH L R UW L B AH T Y UW M AH S T N OW AE T W AH N S DH AE T W IY D UW N AA T DH EH R F AO R EH K S ER S AY Z EH N IY S UW P ER HH Y UW M AH N V ER CH UW B AH T S IH M P L IY AE K T IH N K AH N F AO R M AH T IY W IH DH DH AH R IY L N EY CH ER AH V M AE N 0 +librilm-9926 librilm 176 AH B EH T ER AE D M AY R ER DH AE N M IY SH IY W IH L N AA T F AY N D IH N HH ER T AW N K AW N S AH L N AO R IH N HH AY ER S AH S AY AH T IY 0 +librilm-9927 librilm 143 AH B EH T ER AH D V ER T AH Z M AH N T DH AE N DH AH K AO R N EH T S OW L OW K UH D N AA T HH AE V B IH N D IH V AY Z D 0 +librilm-9928 librilm 213 AH B EH T ER AO L R AW N D M AE N DH AE N S IY S AH L AY SH UH D HH OW P S EH D D EY V IH D L IH N T AH N W IH DH AH S AW N D L AY K AH S N AO R T AH V R AE TH 0 +librilm-9929 librilm 140 AH B EH T ER AO L T ER K UH D N AA T HH AE V B IH N S AH L EH K T AH D IH N AO L DH AE T V AE S T R IY JH AH N 0 +librilm-9930 librilm 423 AH B EH T ER AE N K AY N D ER W UH M AH N DH AE N M IH S IH Z HH Y UW IH T Y UW W UH D AH N T F AY N D N AA T IH F Y UW W AA Z T UW W IH DH D IH F AH K AH L T IY DH AH S T R EY N JH ER AH B T EY N D AH F Y UW D IH T EY L Z AH V DH AH AO R AH JH AH N AH N D K AO R S AH V DH AH IH L N AH S D IH T EY L Z HH OW L IY M IH S L IY D IH NG B AH T D IH V AY Z D T UW R IY AH SH UH R 0 +librilm-9931 librilm 211 AH B EH T ER AE NG K ER AH JH DH AE N DH IH S P AA R T AH V DH AH K OW S T AH F AO R D AH D HH AE V IH NG B IH N F AW N D DH AH SH IH P B R AO T AH P HH IY R 0 +librilm-9932 librilm 147 AH B EH T ER AH N D B OW L D ER K AA R D P L EY ER DH AE N L AO R D B EH L IH NG ER N EH V ER HH EH L D AH T R AH M P 0 +librilm-9933 librilm 164 AH B EH T ER AH N D M AO R K R 
IH S CH AH N M AE N S K EH R S L IY EH V ER B R IY DH D DH AE N JH OW S AH F AE D AH S AH N 0 +librilm-9934 librilm 98 AH B EH T ER AH N D M AO R K AH N S IH S T AH N T W UH M AH N N EH V ER L AY V D 0 +librilm-9935 librilm 442 AH B EH T ER AH N D M AO R AA N ER AH B AH L AO F ER IH NG IH Z M EY D T UW AW ER M AE S T ER IH N M IH N AH S T R IY T UW DH AH P UH R IH N IH K S T EH N D IH NG DH AH N AA L AH JH AH V HH IH Z N EY M IH N DH AH P R AE K T AH S AH V DH AH V ER CH UW Z B AY W IH CH DH AE T N EY M IH Z HH AE L OW D DH AE N IH N M AH T IH R IY AH L P R EH Z AH N T S T UW HH IH Z T EH M P AH L 0 +librilm-9936 librilm 555 AH B EH T ER AH N D M AO R S P IY D IY P L AE N W UH D P ER HH AE P S HH AE V B IH N T UW S IY K AW T W AH N AH V Z UW M AH L AA K S W AA R OW Z EY D Z D IY K AE M P R IH L EY T T UW HH IH M HH IH Z R IY S AH N T AE D V EH N CH ER Z P R OW D UW S R IY T AH Z L EH T ER IH N K ER AO B ER EY SH AH N AH V HH IH Z V ER AE S IH T IY AH N D R IH K W EH S T HH IH M T UW F AO R W ER D IH T AO R P R AH V AY D HH IH M W IH DH AH HH AO R S T UW T EY K IH T HH IH M S EH L F 0 +librilm-9937 librilm 142 AH B EH T ER AH N D W AY Z ER P R IH N S SH AE L ER AY Z HH UW SH AE L R IH S T AO R P R AA S P EH R AH T IY T UW JH UW D AH 0 +librilm-9938 librilm 453 AH B EH T ER AE N S ER IH Z DH AE T DH AH K AO R T HH AE Z B IH F AO R HH AE N D S T R AO NG P R IY Z AH M P T IH V EH V AH D AH N S AH V DH AH K R AY M AH N D DH AE T AH P R IH Z AH N ER IH Z N AA T P UH T T UW DH AH T AO R CH ER AH N T IH L IH T HH AE Z B IH N W EH L AE S ER T EY N D B AY T EH S T AH M OW N IY AH B T EY N D EH L S W EH R DH AE T HH IY IH Z AH G R EY T AH F EH N D ER 0 +librilm-9939 librilm 151 AH B EH T ER AE N T IH D OW T T UW DH AH S T OW N W IH DH IH N P EH R IH S IH Z T UW B IY F AW N D IH N DH AH S T OW N ER AW N D IH T 0 +librilm-9940 librilm 139 AH B EH T ER AH P AA L AH JH IY L AY Z IH N DH AH T EH K S T AH P R EH SH AH N M AE K IH TH AH W AY Z M AE N M AE D 0 +librilm-9941 librilm 355 AH B EH T ER AH P AA L AH JH IY M EY B IY F AW N D IH N DH AH IH M AH T EY T IH NG DH AH K AH N F EH SH AH N AH V AA N AH S T B EH N AH D IH K T DH AE T W EH N HH IY S EH D HH IY W UH D D AY AH B AE CH AH L ER HH IY D IH D N AA T TH IH NG K HH IY SH UH D L AY V T UW B IY M EH R IY D 0 +librilm-9942 librilm 303 AH B EH T ER AH P OY N T AH D AA R M IY K AH N S IH S T IH NG AH V DH AH V EH R IY F L AW ER AH V SH IH V AH L R IY AH V Y UH R AH P HH AE D IH N DH AH M IY N T AY M AH S EH M B AH L D T UW F AA L OW DH AH S EY M P AE TH DH OW IH N AH D IH F ER AH N T M AE N ER 0 +librilm-9943 librilm 163 AH B EH T ER AA R G Y AH M AH N T IH N F EY V ER AH V HH AA R T F ER D IH Z DH AE T TH R IY R EY L R OW D Z S EH N T ER DH EH R 0 +librilm-9944 librilm 106 AH B EH T ER AA R M ER ER N EH V ER L EY D HH AE M ER AA N AE N V AH L 0 +librilm-9945 librilm 316 AH B EH T ER AA R M IY M AE N F AO R M AE N P R AA B AH B L IY N EH V ER F EY S T AE N EH N AH M IY DH AE N DH AH W AH N K AH M AE N D AH D B AY JH EH N ER AH L T EY L ER IH N DH AH ER L IY AH S T T UW EH N G EY JH M AH N T S AH V DH AH M EH K S AH K AH N W AO R 0 +librilm-9946 librilm 144 AH B EH T ER AA R T DH AE N DH AE T AH V IY JH AH P T HH AE Z T EY K AH N F IH R AH N D K ER AH P SH AH N AW T AH V IH T 0 +librilm-9947 librilm 116 AH B EH T ER AH S AO R T IH D K AH P AH L Y UW K UH D F AY N D N OW W EH R 0 +librilm-9948 librilm 235 AH B EH T ER AH T ER N IY F AO R DH AH P ER P AH S AH Z T UW W IH CH HH IH Z L AY F W AA Z D IH V OW T AH D D IH D N AA T IH G Z IH S T IH N L AH N D AH 
N DH AE N M IH S T ER K AE M P D ER AW N 0 +librilm-9949 librilm 133 AH B EH T ER B AA R G AH N W AA Z D R IH V AH N IH N DH AH W IH R IY S K W EH R DH AE N EH N IY W EH R EH L S 0 +librilm-9950 librilm 93 AH B EH T ER B EY S AH S F AO R F R EH N D SH IH P K UH D N AA T B IY 0 +librilm-9951 librilm 112 AH B EH T ER B AE TH HH IY R R IH T ER N D R OW L Z AH N D N AH TH IH NG T UW P EY 0 +librilm-9952 librilm 64 AH B EH T ER B EH D AY N EH V ER HH AE D 0 +librilm-9953 librilm 128 AH B EH T ER B EH D R AA K P R IH N S AH P AH L K AE N HH AA R D L IY B IY IH M AE JH AH N D 0 +librilm-9954 librilm 331 AH B EH T ER B IH G IH N IH NG K UH D N AA T B IY M EY D DH AE N W IH DH DH AH HH IH R OW Z AH V HH OW M S T EH D AH N D IH T IH Z AH S P EH SH L IY F IH T IH NG DH AE T DH AH F ER S T IH M P AH T AH S SH UH D B IY G IH V AH N IH N K AH N EH K SH AH N W IH DH DH IH S HH IH S T ER IY 0 +librilm-9955 librilm 75 AH B EH T ER B IH HH EY V D L AE D D AH Z AH N T S T EH P 0 +librilm-9956 librilm 110 AH B EH T ER B AA D IY AH V HH EH L P ER Z K UH D S K EH R S L IY B IY G AA T AH N T AH G EH DH ER 0 +librilm-9957 librilm 94 AH B EH T ER B UH K F AO R B OY Z HH AE Z N EH V ER B IH N R IH T AH N 0 +librilm-9958 librilm 86 AH B EH T ER B UH K DH AE N AY SH AE L EH V ER R AY T W AA Z DH EH R 0 +librilm-9959 librilm 97 AH B EH T ER B UH K DH AE N DH AH P R IH Z AH N ER AH V Z EH N D AH 0 +librilm-9960 librilm 131 AH B EH T ER B UH K DH AE N DH IH S AA N P EH R IH S IH N T AY M HH AE Z N AA T CH AE N S T IH N AW ER W EY 0 +librilm-9961 librilm 65 AH B EH T ER B AA T AH L DH AE N DH AH F ER S T 0 +librilm-9962 librilm 93 AH B EH T ER B AW N D ER IY W UH D B IY DH AH R IH V ER IH T S EH L F 0 +librilm-9963 librilm 214 AH B EH T ER B OY T UW AE N AW L D F AA DH ER DH AE T S G UH D F AO R N AH TH IH NG N AW IH N DH IH S W ER L D N EH V ER W AA Z P L EY Z Y AO R HH AH N ER 0 +librilm-9964 librilm 249 AH B EH T ER B R EY V ER S OW L JH ER AO R AH M AO R F EY TH F AH L F R EH N D N OW M AE N EH V ER N UW DH AE N CH AA R L Z D IY N T R UW P AH S IH K S TH M IH SH IH G AH N K AE V AH L R IY 0 +librilm-9965 librilm 33 AH B EH T ER B R EY K 0 +librilm-9966 librilm 255 AH B EH T ER B R AY T ER B OY N EH V ER D R UW B R EH TH HH IY S ER V D Y UW F EY TH F AH L AE Z DH AH D EY W AA Z L AO NG AH N D Y UW T R IY T AH D HH IH M SH EY M F AH L W ER S AH N AH S L EY V 0 +librilm-9967 librilm 197 AH B EH T ER B R AH DH ER N EH V ER L AY V D B AH T HH IY M EY HH AE V B IH N T UW R EH D IY T UW F AO L IH N W IH DH AH DH ER P IY P AH L Z V Y UW Z 0 +librilm-9968 librilm 187 AH B EH T ER B R AO T AH P B EH T ER D IH S P OW Z D Y UW TH DH AE N Y UW W ER W IH DH AH HH AY ER S EH N S AH V HH AH N ER K UH D N AA T B IY F AW N D 0 +librilm-9969 librilm 97 AH B EH T ER K AE P T AH N D OW N T W AO K DH AH D EH K Y AO R HH AH N ER 0 +librilm-9970 librilm 157 AH B EH T ER K AE P T AH N T UW L EH D AH B EH T ER S OW L JH ER T UW S T R AY K W IH DH DH AH S AO R D AY N EH V ER S AO 0 +librilm-9971 librilm 40 AH B EH T ER CH AE N S 0 +librilm-9972 librilm 144 AH B EH T ER CH AE N S AH W EY T S DH IY SH IY M IY T S DH AH F OW M IY T S W EH N SH AE L SH IY R IH T ER N 0 +librilm-9973 librilm 141 AH B EH T ER CH AE N S F AO R AH SH AA T K UH D HH AA R D L IY HH AE V B IH N AE S K T F AO R 0 +librilm-9974 librilm 561 AH B EH T ER CH AE N S F AO R HH IH Z P AW ER Z AH K ER D IH N DH AH AH S EH M B L IY AH V DH AE T Y IH R IH N K AH N EH K SH AH N W IH DH AH P L ER AE L IH T IY K EY S W EH R DH AH W AH N D ER F AH L D IH S P L EY AH V HH IH Z T AE L AH 
N T S K AH N T R IH B Y UW T IH D M AH CH T UW DH AH P AE S IH NG AH V AE N EH N AE K T M AH N T DH AE T N OW P R AH F EH S ER SH IH P IH N AH Y UW N AH V ER S AH T IY SH UH D B IY HH EH L D IH N K AH N EH K SH AH N W IH DH AH K AH N T R IY CH AA R JH 0 +librilm-9975 librilm 76 AH B EH T ER CH AE N S F AO R Y UW IH Z K AH M IH NG 0 +librilm-9976 librilm 56 AH B EH T ER CH AE N S M EH R IY 0 +librilm-9977 librilm 154 AH B EH T ER CH AE N S S EH D M IH S T ER HH Y UW M G R IH M L IY DH AE N W IY HH AE V AH V K AE CH IH NG DH AH OW K AA P IY 0 +librilm-9978 librilm 316 AH B EH T ER CH AE N S DH AH B AE L AH W EY N K AH N T IH N Y UW D AH V DH AH F Y UW P L EY S AH Z OW P AH N IH N DH AH AY L AH N D DH AE N IH F HH IY W ER B R AO T AH P AE T DH AH M AE NG K S B AA R OW N L IY W IH CH W UH D K AA S T M IY L EH S DH AE N HH AE F AE Z M AH CH 0 +librilm-9979 librilm 286 AH B EH T ER K EH R IH K T ER HH AE Z B IH N G IH V AH N T UW DH AH R EH G Y AH L ER T R UW P S F AO R DH EH R IH N D EH V ER Z T UW D IH S P ER S DH AH P IY P AH L W IH TH AW T W UW N D IH NG AO R AH DH ER W AY Z IH N JH ER IH NG DH EH M 0 +librilm-9980 librilm 129 AH B EH T ER CH AY L D N EH V ER B R IY DH D S EH D D AO L T AH N D R IH NG K IH NG AO F HH IH Z G L AE S 0 +librilm-9981 librilm 145 AH B EH T ER S IH T AH Z AH N D AH Z N AA T IH G Z IH S T AH N D AW ER F R EH N D SH IH P HH AE Z N EH V ER F AA L T ER D 0 +librilm-9982 librilm 306 AH B EH T ER K L EY M M AY T B IY R EY Z D AH P IH N Y AO R G R EY S IH Z OW N P ER S AH N S EH D DH AH ER L AH V AA K S F ER D IH F Y UW W IH L AH F AO R D M AA R G ER IH T AH V AE N JH UW DH AH S UW K AO R SH IY R IY K W AY ER Z B AY M IY 0 +librilm-9983 librilm 222 AH B EH T ER K L AE S AH V M EH N S IY M T UW B IY JH OY N IH NG DH AH K AH L ER Z DH IY Z D EY Z AH N D DH EY AA R K AO L IH NG DH EH R D IH F AE M ER Z T UW AH S T R IH K T AH K AW N T IH NG 0 +librilm-9984 librilm 341 AH B EH T ER K L AE S AH F AH K EY SH AH N B EY S T AA N M AO R K EH R F AH L S T AH D IY AH V DH AH HH IH S T ER IY AH V DH AH IH NG G L IH SH V ER B D IH V AY D Z V ER B Z IH N T UW DH OW Z AH V DH AH W IY K AH N D DH OW Z AH V DH AH S T R AO NG K AA N JH AH G EY SH AH N Z 0 +librilm-9985 librilm 454 AH B EH T ER K L AE S AH F AH K EY SH AH N IH Z IH N T UW DH AH S OW SH AH L IH N K L UW D IH NG G UH D W IH L L AH V AH V R EH P Y AH T EY SH AH N D IH Z AY ER AH V AE M IH T IY R IH L IH JH AH N D IH S OW SH AH L D IH S P L EH ZH ER S EH L F R AH G AA R D IH NG F IH Z IH K AH L D IH Z AY ER P EH K Y UW N IY EH R IY IH N T R AH S T L AH V AH V P AW ER S EH L F P R EH Z ER V EY SH AH N 0 +librilm-9986 librilm 269 AH B EH T ER K AH M IY D IY AH N Y UW M EY B IY B AH T HH IY HH AE Z N AA T Y AO R S K R UW P AH L Z Y AO R S EH N S AH T IH V N AH S AH N D IH Z DH EH R F AO R M AO R D EH K S T ER AH S AE T D R AO IH NG DH AH K R AW D Z AH T EH N SH AH N 0 +librilm-9987 librilm 237 AH B EH T ER K AH M AE N D ER AY D N EH V ER D IH Z AY ER T UW S ER V AH N D HH UW N OW Z B AH T AY M EY HH EH L P T UW S EH T AH P DH AY S T AE N D IH NG R IH G IH NG IH N AH N AH DH ER W ER L D 0 +librilm-9988 librilm 789 AH B EH T ER K AA M EH N T K UH D N AA T B IY M EY D AA N W AH T IH Z R IY K W AY ER D T UW P ER F IH K T M AE N AH N D P L EY S HH IH M IH N DH AE T S UW P IH R IY ER P AH Z IH SH AH N F AO R W IH CH HH IY W AA Z D IH Z AY N D DH AE N B AY DH AH IH N T ER P R IH T EY SH AH N AH V B EY K AH N AH P AA N DH AH L EH JH AH N D Z AH V DH AH S IH R AH N K OW S T W EH N DH AH W AY Z Y UW L IH S IY Z P AE S T S EH Z HH IY HH IY 
K AA Z D HH IH Z M EH R AH N ER Z T UW S T AA P DH EH R IH R Z W IH DH W AE K S N OW IH NG DH EH R W AA Z IH N DH EH M N OW P AW ER T UW R IH Z IH S T DH AH L UH R AH V DH AE T V AH L AH P CH AH W AH S S AO NG 0 +librilm-9989 librilm 315 AH B EH T ER K AH M P AE N Y AH N DH AE N HH ER W AY T K IH T AH N AO R HH ER F EY V ER IH T N IH R OW AO R IY V IH N HH ER F EY TH F AH L F R EH N D P IY EH R DH AH S EY N T B ER N AA R D AA K Y AH P AY D DH AH AH DH ER V EH L V AH T R AA K IH NG CH EH R 0 +librilm-9990 librilm 155 AH B EH T ER K AH M P EH R AH S AH N IH Z DH AE T W IH CH M IH S T ER G AA S HH AE Z M EY D W IH DH S IH D N IY D OW B AH L Z B AO L D ER 0 +librilm-9991 librilm 227 AH B EH T ER K AH N S EH P SH AH N AH V L AE NG G W AH JH K UH D N AA T HH AE V B IH N F AO R M D IH N P L EY T OW Z EY JH DH AE N DH AE T W IH CH HH IY AE T R IH B Y UW T S T UW S AA K R AH T IY Z 0 +librilm-9992 librilm 130 AH B EH T ER K AH N D IH SH AH N AH V TH IH NG Z N AW P R IY Z EH N T AH D IH T S EH L F 0 +librilm-9993 librilm 99 AH B EH T ER K AH N D AH K T ER SH IY K UH D N AA T HH AE V W IH SH T 0 +librilm-9994 librilm 168 AH B EH T ER K AH N D AH K T ER W UH D B IY DH AH M EH T AH L K AH V ER IH NG AH V DH AH R UW F W EH N S AH CH M AH T IH R IY AH L IH Z Y UW Z D 0 +librilm-9995 librilm 463 AH B EH T ER K AA N S T AH T UW T AH D B OY W UH D S ER T AH N L IY HH AE V P R AA F AH T AH D AH N D ER M AY IH N T EH L AH JH AH N T T UW T ER Z W IH DH DH EH R S AY AH N T IH F IH K AE P ER AE T AH S AH N D W UH D D AW T L AH S HH AE V F AW N D DH AH F AH N AA M AH N AH AH V IH L EH K T R IH S AH T IY AH N D M AE G N AH T IH Z AH M AE Z F AE S AH N EY T IH NG AE Z AY W AA Z EH V ER IY TH ER Z D EY AH SH UH R D DH EY W ER 0 +librilm-9996 librilm 81 AH B EH T ER K AH N T R IY F AO R HH IH M DH AE N DH IH S 0 +librilm-9997 librilm 415 AH B EH T ER K AH N T R IY W AA Z R IY CH T AE Z W IY N IH R D DH AH R IH V ER AH N D IH T W AA Z AH P L EH Z AH N T S AY T T UW S IY DH AH T AH M B AH L IH NG S T R IY M AH V DH AH L EH S ER T UW G EY L AH AH N D T UW F AY N D IH N W AH N V AE L IY DH AH P R IY T AH N S AH V AH G AA R D AH N AH N D AH HH AW S AH M AH NG T R IY Z 0 +librilm-9998 librilm 370 AH B EH T ER K AO R S AE T DH AH R IH NG K UH D N AA T B IY R AH N DH AE N S ER JH AO S L IH N HH AE TH P ER F AO R M D N AO R K UH D G R EY T ER V AE N T AH JH B IY G EY N D IH N DH AH JH AW S T S DH AE N HH IY HH AE TH AH B T EY N D OW V ER DH AH M AA R K IY AH V B AH K IH NG HH AE M 0 diff --git a/SpeechLM/dataset/LibriSpeech/fast_phone2unit/train_exmples.tsv b/SpeechLM/dataset/LibriSpeech/fast_phone2unit/train_exmples.tsv new file mode 100644 index 0000000000000000000000000000000000000000..bbdef25c7cf1c806740740956a25ce2aff41d007 --- /dev/null +++ b/SpeechLM/dataset/LibriSpeech/fast_phone2unit/train_exmples.tsv @@ -0,0 +1,100 @@ +id speaker n_frames tgt_text duration unit +103-1240-0000 103 704 1 10 4 29 33 14 38 5 25 1 24 19 31 19 40 30 15 10 5 23 19 25 11 1 19 40 31 14 29 30 8 40 11 1 24 19 31 19 40 30 15 10 5 23 19 25 11 1 23 19 37 11 1 21 5 31 33 38 13 30 12 20 4 37 5 25 23 20 24 15 25 30 27 11 1 11 19 29 33 1 11 7 25 19 25 33 36 5 23 19 33 5 23 18 3 23 27 1 16 30 19 25 21 11 38 19 34 6 23 11 14 40 5 25 11 23 15 11 20 40 19 30 11 30 3 29 31 5 25 11 33 30 5 37 14 31 33 9 8 5 9 30 35 22 1 22 6 3 4 2 4 5 4 9 14 5 2 4 2 5 4 4 3 4 6 5 5 4 1 5 3 4 2 4 2 9 6 4 84 4 2 4 4 5 4 3 4 4 6 6 6 7 3 5 3 3 2 4 6 4 5 4 2 3 3 2 5 8 3 2 5 2 7 8 7 7 4 9 5 8 3 3 4 4 5 4 8 3 1 3 4 2 3 5 2 2 2 4 5 4 5 8 13 8 2 3 4 4 2 2 2 2 7 4 4 4 5 1 2 4 3 6 2 
5 4 3 3 4 3 4 4 4 2 1 3 2 2 2 2 7 3 4 2 5 4 5 3 5 6 6 17 17 17 296 296 317 317 317 317 317 491 461 461 461 461 461 461 491 491 184 184 184 289 310 107 107 395 351 486 486 460 215 215 35 96 272 300 382 382 245 43 364 276 174 174 174 174 319 282 282 388 303 303 117 404 404 439 439 225 225 225 225 225 225 225 491 391 391 47 491 73 80 289 7 7 217 473 258 258 258 31 342 224 494 494 494 368 281 9 142 142 147 147 329 329 329 329 329 329 36 310 107 395 395 302 302 497 497 251 251 241 241 431 329 432 330 330 388 195 195 64 212 212 131 483 226 226 226 209 356 356 356 356 31 162 68 224 224 494 494 215 129 74 190 190 499 499 499 265 265 85 85 85 85 207 318 185 185 433 433 86 6 6 227 419 417 417 417 237 237 237 237 237 237 237 237 237 237 237 237 362 491 362 305 40 491 305 40 40 362 362 40 40 40 40 40 40 40 40 218 491 218 218 218 491 305 218 491 218 218 218 218 218 218 491 218 435 491 218 491 218 218 218 491 218 218 491 369 491 369 369 369 369 369 21 21 21 21 21 21 21 21 408 408 408 149 228 228 491 289 320 7 473 258 258 258 258 342 342 224 494 494 494 494 31 9 9 142 397 147 147 329 329 329 329 329 143 36 107 107 395 302 302 497 497 251 251 251 241 241 431 278 278 278 278 330 388 388 195 195 195 243 212 131 419 439 225 225 225 80 491 80 7 7 251 241 431 278 278 278 173 173 402 402 401 401 401 401 401 491 310 107 395 395 180 151 151 151 169 150 150 86 86 238 6 272 397 133 345 109 109 109 264 264 313 216 216 22 448 448 448 14 14 14 145 145 145 486 460 460 460 173 280 29 242 242 116 33 250 250 251 241 81 444 324 324 324 324 324 301 339 217 217 217 217 217 473 65 290 290 290 290 290 434 434 339 339 33 250 250 42 42 147 147 380 288 84 496 496 496 496 496 274 274 37 24 131 404 439 78 414 80 80 80 80 80 80 80 401 384 371 278 278 278 215 35 35 96 401 401 401 401 401 401 401 401 401 239 384 371 180 315 315 315 315 315 450 450 413 413 94 199 340 340 33 76 465 377 123 123 123 88 88 44 44 44 251 251 241 431 278 278 285 285 302 302 497 497 497 58 72 72 72 437 481 481 481 481 481 481 175 175 81 84 84 84 496 274 98 98 229 247 247 126 126 126 326 326 326 326 326 101 101 149 228 491 373 393 234 234 155 190 190 487 288 288 278 330 339 64 64 212 310 447 447 6 272 472 345 333 333 220 220 164 14 14 411 411 284 481 481 481 293 293 122 122 384 300 334 334 304 304 304 49 269 342 168 89 89 89 446 33 33 250 251 251 241 431 470 171 171 171 252 252 325 34 41 324 324 318 368 368 342 9 219 485 286 286 382 382 313 236 239 161 161 79 499 499 405 405 206 215 215 233 270 270 433 342 224 89 89 322 67 394 76 465 161 161 492 492 492 8 8 280 498 498 498 498 498 396 186 39 54 238 6 272 472 336 336 62 62 62 62 62 146 464 44 44 44 8 32 401 354 190 190 380 380 499 496 496 496 178 233 233 458 192 419 427 247 247 15 193 193 17 +103-1240-0001 103 797 1 12 5 33 18 4 11 19 33 31 6 30 31 5 38 15 9 4 22 19 25 12 5 38 35 11 40 5 37 12 20 27 23 11 22 5 34 9 14 33 29 23 15 31 1 19 33 38 5 40 30 19 29 39 36 33 19 11 33 19 9 20 5 25 19 25 33 30 5 22 5 33 18 13 11 23 6 26 9 30 35 22 19 25 19 33 31 1 14 23 20 14 22 6 30 31 34 30 36 12 27 40 38 35 11 40 1 38 19 34 11 3 30 22 31 20 22 30 19 33 31 5 37 29 36 23 5 25 11 22 4 31 22 15 11 1 9 5 33 9 8 12 5 33 8 24 19 33 30 20 10 33 23 19 25 11 40 18 3 23 27 19 33 38 5 40 5 22 38 8 5 33 1 38 13 23 22 5 25 11 5 22 33 5 11 23 19 33 5 23 31 33 30 20 24 1 8 2 2 2 7 3 2 3 4 8 4 5 4 4 5 9 6 4 4 3 3 1 3 7 3 3 3 1 2 3 4 4 2 4 5 3 6 3 3 4 4 2 5 12 29 6 3 3 2 3 2 3 3 2 4 2 3 3 1 2 3 5 1 3 5 4 4 2 1 3 4 3 9 3 4 3 6 6 3 4 2 5 2 2 2 3 2 2 6 3 3 3 6 4 3 4 3 2 2 3 5 5 5 4 4 5 14 4 2 7 4 4 5 4 9 4 4 2 2 3 3 2 3 7 9 3 3 2 2 6 4 6 3 
9 5 27 2 6 4 3 5 2 2 6 5 3 2 3 4 4 5 2 4 4 4 2 2 5 4 5 9 5 3 3 2 3 2 6 3 8 3 4 2 5 3 4 3 2 4 3 3 4 2 3 3 3 2 2 2 3 3 4 2 6 6 6 17 17 363 363 51 51 228 320 127 45 45 45 385 131 58 72 72 110 110 110 110 486 460 240 240 325 34 154 154 154 457 478 478 232 232 482 482 172 115 273 273 153 153 153 372 372 396 396 186 186 54 54 172 224 273 255 255 43 364 364 276 109 109 403 403 403 403 403 207 246 324 301 301 129 401 354 354 180 376 376 376 460 178 178 458 192 192 242 340 116 466 466 22 283 455 43 364 364 276 276 153 153 496 496 37 37 24 77 270 342 224 69 69 130 130 198 22 448 448 448 464 180 424 424 424 424 424 274 122 131 472 221 401 82 144 27 437 151 151 169 169 164 164 472 221 401 259 29 380 382 396 313 385 35 472 401 259 74 425 425 386 343 343 343 343 343 358 358 39 39 433 433 160 160 160 112 427 56 56 491 312 312 341 341 341 341 341 341 12 12 12 21 21 21 21 21 21 21 21 21 408 408 408 408 391 391 228 491 491 412 177 177 177 177 177 131 133 345 141 141 141 281 453 142 397 456 456 456 456 129 259 74 485 485 485 485 374 374 325 449 449 191 191 191 314 314 36 377 87 87 8 8 420 420 420 324 464 44 44 44 94 335 335 411 411 188 121 121 33 64 76 465 465 161 161 487 469 469 143 458 192 192 278 278 278 37 314 131 472 72 72 72 72 72 72 110 110 443 120 240 314 314 26 26 26 251 241 431 235 235 235 235 235 413 200 200 248 248 248 212 354 190 380 380 499 496 496 496 178 233 458 192 192 340 340 340 94 199 154 154 77 342 342 142 14 411 498 498 498 498 498 134 175 81 166 324 324 464 382 382 245 129 458 208 208 441 441 441 153 153 372 372 396 186 186 323 323 238 6 272 377 487 487 374 313 216 216 114 124 124 124 274 274 368 269 9 142 397 336 276 109 109 496 496 496 37 37 37 24 270 270 433 160 427 229 247 247 126 126 326 326 326 326 326 101 101 149 149 228 228 491 345 333 333 333 220 220 164 402 221 401 401 401 491 384 371 180 106 306 306 306 306 396 396 178 178 35 458 96 96 66 66 68 68 68 68 115 115 444 213 213 213 143 458 208 208 487 487 288 277 385 143 270 270 342 224 69 462 462 130 402 402 401 401 491 74 190 441 441 441 153 153 182 182 182 182 182 497 175 175 81 89 89 446 116 33 131 472 221 458 445 445 351 351 486 486 460 460 169 150 342 342 86 105 336 445 445 470 403 403 171 171 171 246 246 252 24 131 404 439 78 170 305 491 28 28 28 491 491 491 2 201 305 305 491 305 305 2 316 316 316 316 316 491 491 289 289 289 320 354 159 159 159 159 159 240 35 131 472 221 336 354 62 62 62 62 62 438 216 22 283 455 236 108 119 119 103 103 103 103 103 85 299 203 53 473 177 177 143 131 133 133 147 380 288 213 213 213 252 143 310 447 447 447 26 26 251 251 241 81 329 329 329 330 388 195 195 471 471 49 453 142 58 72 72 437 437 481 481 481 481 293 175 175 81 84 84 84 84 84 16 274 274 98 483 483 440 188 177 177 177 131 133 133 345 141 141 141 281 9 168 44 44 143 458 208 208 441 441 441 346 346 265 265 85 85 85 146 146 277 277 277 385 385 227 419 225 225 226 197 7 364 276 109 109 139 139 293 293 122 143 458 144 27 27 121 116 33 33 212 239 371 180 151 151 151 178 35 96 96 36 272 191 191 191 37 314 26 251 241 431 431 278 285 285 302 302 497 497 186 162 482 482 338 238 161 79 487 288 288 360 360 434 434 434 203 381 381 404 13 491 247 15 193 193 193 17 +103-1240-0002 103 697 1 16 6 30 25 3 33 20 37 19 25 5 9 30 35 22 1 22 35 11 30 5 25 29 4 31 33 24 19 31 19 40 30 15 10 5 23 19 25 11 40 11 6 30 38 19 12 7 33 11 39 36 30 19 17 3 30 11 16 14 11 20 31 5 25 31 20 4 25 11 19 22 6 30 5 24 1 19 33 29 30 3 9 5 9 23 20 38 5 40 22 3 25 32 5 31 12 5 33 24 19 31 19 40 30 15 10 5 23 38 5 40 31 19 33 19 26 4 33 18 14 38 19 25 11 27 1 22 20 29 19 26 5 
32 3 30 29 8 3 25 13 37 30 20 34 19 26 12 5 33 29 4 31 33 1 16 14 24 9 30 35 22 31 5 25 11 10 19 23 11 30 5 25 5 29 1 8 6 2 3 2 6 2 6 3 1 3 3 5 5 3 10 4 4 2 4 4 3 5 5 7 4 2 2 3 3 3 4 3 3 3 3 7 3 4 3 3 3 3 3 3 2 3 6 2 2 3 4 4 2 3 6 4 3 3 4 4 5 4 3 5 5 7 3 2 4 2 6 4 5 2 6 32 6 3 3 3 3 2 3 3 3 4 3 2 3 6 4 4 4 5 5 2 2 4 2 3 4 3 5 4 4 4 4 5 3 2 2 5 1 3 2 5 2 3 4 3 4 2 4 2 9 11 4 4 3 2 4 2 9 4 3 3 11 5 4 4 5 3 3 4 2 4 2 2 2 7 9 6 4 9 3 4 3 2 2 3 3 3 2 2 2 3 2 3 3 2 1 4 8 6 9 17 17 17 363 363 51 51 228 491 373 155 155 155 148 148 387 372 313 10 479 479 307 307 307 307 61 167 449 449 34 357 357 357 357 357 173 280 29 242 116 94 199 44 44 44 8 129 401 259 354 190 190 380 380 499 496 496 496 167 233 233 144 192 419 419 439 225 225 225 80 80 491 491 144 389 389 389 389 389 133 133 42 147 147 380 499 319 319 319 348 348 195 394 90 76 74 74 437 311 311 311 311 311 311 460 169 150 342 86 6 6 196 217 473 258 258 258 31 342 224 494 494 368 281 9 142 397 147 329 329 329 329 329 36 310 107 302 302 302 497 497 251 251 251 241 431 329 329 330 116 33 195 195 471 471 49 269 142 238 6 272 106 153 153 372 372 372 245 43 345 333 333 220 220 216 180 113 113 113 113 167 167 236 239 401 384 219 485 485 374 374 132 132 42 147 456 456 456 456 416 144 27 106 306 306 306 306 306 306 396 313 24 24 131 472 393 155 332 332 332 313 236 239 239 384 371 213 213 213 252 186 39 342 342 11 11 11 379 379 379 394 76 478 66 68 68 115 267 41 41 41 246 3 464 464 89 194 446 446 446 64 212 239 384 490 490 143 458 144 208 441 441 153 153 153 372 372 372 467 467 467 275 203 381 381 48 404 13 491 491 312 312 312 292 292 292 292 292 21 21 21 21 21 21 21 21 21 21 21 21 21 21 21 408 408 408 408 149 149 228 491 289 412 177 177 177 177 177 35 401 259 74 190 190 488 488 488 488 488 8 29 134 134 134 134 8 8 359 359 166 166 166 324 301 378 345 141 141 141 281 9 142 221 144 27 27 437 370 370 370 370 370 370 348 64 76 310 107 395 395 459 459 459 271 271 39 86 86 238 198 45 45 45 45 45 35 401 196 217 473 258 258 258 31 342 342 224 494 494 494 368 453 9 142 397 147 147 329 329 329 329 329 329 143 310 107 107 395 302 302 302 375 497 497 497 43 364 345 141 141 141 31 162 232 68 172 115 470 278 278 325 34 176 135 135 200 200 464 415 415 415 415 415 131 183 156 156 156 156 245 43 364 364 109 109 278 278 116 33 64 212 212 34 84 84 84 84 274 274 98 229 229 247 126 126 326 326 101 408 408 228 491 491 491 445 445 213 213 213 213 252 215 458 176 176 135 135 200 200 44 44 44 44 99 338 338 338 338 338 395 106 306 306 306 306 396 396 215 35 35 335 145 284 265 265 265 85 85 85 146 464 464 106 125 125 125 125 348 94 335 14 411 204 204 204 204 204 204 204 29 29 337 337 324 422 422 164 164 164 214 214 214 328 200 248 466 114 114 45 45 385 90 401 82 74 119 311 311 311 311 311 311 311 311 311 282 169 169 150 39 433 86 238 6 75 227 419 225 225 225 225 225 491 373 305 80 289 491 155 165 165 165 165 165 203 53 212 239 190 380 380 496 496 178 35 96 270 342 342 224 89 89 446 33 394 310 107 395 395 106 139 424 387 122 122 122 300 300 242 242 116 94 335 335 411 230 230 230 230 230 230 215 215 233 233 419 427 229 247 247 126 126 193 193 193 193 17 +103-1240-0003 103 735 1 4 25 11 12 4 33 19 16 32 20 25 27 33 5 31 33 13 25 20 34 19 26 1 3 11 1 6 30 7 33 5 37 29 23 15 31 1 32 20 38 35 11 25 13 37 14 30 13 31 33 5 25 33 19 23 32 20 18 4 11 16 13 30 5 33 19 11 7 33 12 5 38 8 40 5 25 11 38 13 30 16 6 30 40 12 13 30 5 37 1 12 13 30 3 30 29 23 13 25 33 20 5 37 29 20 29 5 23 19 25 4 37 5 25 23 20 4 25 11 7 33 5 37 19 33 1 18 36 22 5 25 5 33 13 25 11 22 23 27 31 23 20 33 19 12 13 30 25 
15 9 14 40 9 19 40 25 19 31 9 8 11 19 25 33 5 37 25 19 17 23 13 22 33 19 26 12 13 30 27 25 1 8 7 2 1 2 4 2 4 3 3 2 5 4 2 3 4 4 3 4 3 3 2 3 2 11 2 2 4 4 6 3 2 2 7 2 9 7 14 7 3 3 2 3 4 3 3 4 5 4 8 3 3 3 2 3 2 5 3 2 1 4 4 2 4 3 2 2 3 7 4 2 3 6 11 5 2 2 4 3 3 4 6 3 3 4 3 2 8 5 9 61 3 2 1 2 3 5 2 2 3 2 3 1 3 3 5 2 2 4 5 4 5 2 3 5 3 9 5 2 4 7 2 2 3 4 5 13 4 2 3 3 2 3 5 4 3 4 5 3 4 6 2 2 4 1 2 2 3 3 5 4 2 5 3 2 5 3 4 3 4 6 4 3 3 3 2 2 3 2 3 2 3 3 3 1 3 3 2 5 6 5 12 17 17 17 363 363 51 149 228 228 209 83 194 194 194 322 322 67 212 127 45 45 45 45 240 240 325 118 118 118 118 118 402 338 400 400 400 30 301 301 10 479 331 84 84 496 274 252 36 449 459 459 459 31 342 86 86 6 272 483 483 411 475 475 475 475 475 475 475 475 349 164 164 214 214 214 214 200 248 14 14 411 287 284 284 284 426 426 426 206 206 206 24 335 335 226 157 157 157 157 157 245 14 14 411 145 113 113 113 113 285 285 34 462 462 130 402 401 401 491 74 425 425 386 386 431 343 343 343 343 358 358 358 358 358 39 433 433 433 160 427 247 247 247 126 126 292 326 326 326 326 326 408 408 149 228 491 373 338 338 400 400 400 400 301 378 43 345 389 389 389 314 314 196 309 309 479 331 463 463 463 463 280 29 382 245 245 42 42 147 380 380 288 443 443 120 169 169 150 39 433 86 86 86 6 6 272 34 89 319 319 348 394 76 108 377 139 139 139 139 293 186 99 338 400 400 400 30 3 58 254 254 254 314 131 393 234 234 261 25 470 264 264 468 468 468 396 313 143 449 449 191 191 191 325 180 180 113 113 113 113 113 167 314 314 401 401 198 22 283 455 455 43 364 364 276 346 346 346 265 265 265 265 85 85 85 146 146 318 318 368 453 342 168 89 89 446 116 212 131 133 43 364 276 109 109 264 264 264 468 245 245 349 234 234 155 25 148 148 148 372 372 304 304 49 9 9 221 198 127 114 114 264 264 468 406 406 467 467 106 284 284 426 426 206 206 37 173 352 352 352 352 419 439 439 237 237 237 491 491 491 28 491 491 491 491 341 341 341 341 341 341 369 369 369 369 369 369 369 369 369 369 369 369 369 369 369 369 369 369 369 369 369 369 369 369 369 369 369 369 369 260 491 163 163 163 163 163 163 316 491 316 316 73 289 289 289 127 114 0 222 468 353 353 353 353 215 35 259 74 425 425 386 431 432 330 330 348 76 449 41 324 464 462 462 462 402 221 259 74 351 213 213 213 252 252 129 354 100 100 100 497 497 335 335 440 188 188 340 116 94 199 145 145 145 486 460 460 173 280 29 242 242 116 33 250 250 359 359 474 474 474 324 246 19 3 3 14 14 209 145 194 194 446 446 388 64 212 335 14 411 145 145 113 113 113 450 413 285 34 223 223 223 280 280 277 277 277 277 277 233 75 227 427 229 247 312 292 292 326 326 101 101 149 391 228 491 491 373 489 489 489 489 143 458 144 389 389 389 94 199 255 255 236 36 108 119 119 351 432 432 432 330 388 195 64 131 472 472 221 458 144 208 208 425 386 386 496 496 496 274 186 186 54 54 86 26 26 474 474 166 301 143 36 377 123 123 123 114 222 222 222 222 313 10 10 479 398 290 171 171 252 215 458 29 382 382 304 368 269 342 142 221 336 354 278 278 278 368 453 342 86 196 196 94 459 459 459 271 31 342 342 221 221 336 354 62 62 62 62 62 438 438 143 36 384 371 371 278 278 330 33 64 76 108 449 69 223 130 402 196 479 331 255 154 416 458 208 386 431 151 151 151 178 458 96 36 272 176 135 135 200 248 248 127 114 222 222 222 406 406 467 467 350 350 350 350 350 350 413 413 303 48 404 13 229 491 491 312 15 15 15 193 193 193 17 +103-1240-0004 103 625 1 9 5 33 24 19 31 19 40 30 15 10 5 23 19 25 11 38 5 40 38 5 25 5 37 12 27 40 22 15 29 5 9 5 23 22 30 20 10 14 40 18 36 22 5 25 24 4 25 19 21 12 13 30 27 25 22 5 25 31 14 25 40 4 25 11 12 27 40 5 37 5 12 14 16 27 22 31 19 25 33 5 12 5 9 3 30 17 5 25 
1 32 20 38 5 40 5 25 27 33 5 9 5 23 18 7 31 38 8 16 1 18 14 38 14 22 38 5 40 6 23 38 20 40 11 5 25 4 25 11 38 13 23 11 5 25 1 32 20 30 4 25 1 12 5 31 27 19 26 31 14 22 5 23 1 13 3 3 3 2 3 4 2 5 3 3 4 4 6 5 3 3 3 2 3 3 2 2 3 2 2 5 3 8 4 2 2 4 2 4 6 3 4 6 4 4 2 2 3 3 3 3 3 2 4 5 2 2 4 7 2 4 2 3 6 5 4 3 3 3 1 3 5 4 2 4 3 3 2 5 6 3 3 2 2 2 2 1 2 3 3 4 3 2 6 37 6 3 3 1 3 3 3 3 2 2 1 2 3 4 4 6 3 5 9 7 4 4 6 4 4 2 2 3 4 4 4 3 4 3 6 7 8 2 3 4 4 5 4 4 11 7 8 9 7 10 7 1 2 4 5 4 3 4 4 4 4 2 6 7 17 17 17 296 363 363 51 51 51 491 491 491 491 320 320 159 159 159 159 314 35 196 196 473 258 258 258 31 342 224 494 494 494 368 453 142 142 397 147 380 329 329 329 329 143 36 310 107 395 134 302 302 497 497 251 251 251 241 431 278 278 278 330 388 195 64 212 131 133 133 141 141 141 281 453 142 221 336 174 174 174 174 348 199 223 223 223 130 198 198 124 124 124 124 124 368 31 342 86 221 221 336 445 445 445 351 351 171 171 171 252 215 29 134 134 134 8 259 354 100 100 497 497 497 122 129 259 144 208 208 190 487 487 213 213 213 252 143 36 310 107 395 334 334 334 304 304 185 49 269 342 224 224 489 489 489 143 144 27 389 389 116 33 250 217 217 473 365 365 365 330 94 199 469 469 469 24 36 310 447 447 447 6 127 222 222 222 245 245 14 411 411 350 350 350 350 413 64 394 465 465 27 27 121 116 33 394 478 478 232 172 224 273 470 498 308 308 467 299 388 379 471 471 49 342 168 89 194 194 446 322 64 212 198 114 114 84 496 496 274 318 49 269 342 224 69 462 130 129 402 106 493 493 493 216 300 300 382 245 349 205 261 261 25 496 496 496 496 274 274 233 96 270 433 342 168 340 340 116 33 36 377 123 123 216 283 455 8 354 106 306 306 306 306 396 396 416 416 192 192 275 275 116 303 303 48 48 229 170 491 491 312 312 312 292 292 292 292 292 292 21 21 21 21 21 21 21 21 21 21 21 21 21 21 21 21 21 260 408 408 408 391 391 391 228 491 491 373 338 400 400 400 30 378 378 141 141 141 281 342 342 44 44 44 94 331 84 84 496 496 285 449 134 134 8 100 100 497 497 58 72 72 268 268 268 268 268 169 186 39 54 142 397 397 276 346 346 346 428 85 146 146 358 352 352 352 352 352 352 417 417 417 417 237 491 435 225 225 225 72 156 156 156 156 245 245 43 364 276 276 109 109 498 396 396 178 143 259 458 208 345 141 141 281 281 9 168 106 297 297 297 297 297 297 43 43 345 109 109 171 171 368 368 342 342 221 336 371 180 180 319 319 319 319 282 388 195 195 195 117 404 335 440 209 83 194 194 194 194 446 446 64 212 131 133 364 364 276 109 109 443 139 139 139 293 293 497 122 239 36 371 180 319 319 319 319 282 303 303 303 303 117 404 439 439 439 78 237 47 47 47 80 491 80 491 373 373 338 338 338 400 400 400 30 30 246 246 246 3 3 197 197 7 42 147 147 147 380 210 210 210 210 486 365 282 282 282 388 388 195 195 199 404 404 197 197 216 22 283 283 455 38 162 482 115 273 273 84 496 88 88 176 176 176 328 200 248 478 66 172 115 273 498 498 498 245 143 458 458 302 302 302 302 375 98 98 229 247 247 15 15 193 193 193 17 +103-1240-0005 103 758 1 18 13 23 29 33 30 5 25 12 5 31 5 25 11 15 31 22 36 23 4 25 11 38 5 40 12 5 31 33 30 6 26 17 5 31 33 29 30 3 29 5 37 1 12 5 10 14 10 1 15 11 31 5 31 8 5 33 20 4 25 11 16 6 30 5 25 24 19 32 5 25 40 3 17 40 19 23 39 14 20 1 39 13 33 38 19 12 6 23 12 19 31 1 24 19 31 19 40 30 15 10 5 23 16 7 25 11 1 5 9 5 25 11 5 25 33 8 24 33 5 31 19 33 16 14 7 14 40 4 33 18 14 22 19 10 5 25 38 19 25 11 27 1 25 19 33 19 26 22 3 33 5 25 38 6 30 29 22 38 19 23 33 31 1 32 20 18 4 11 25 19 33 19 11 31 19 22 31 33 20 25 5 37 12 13 24 1 9 5 2 4 3 3 5 4 4 2 2 6 3 3 3 5 6 2 7 5 6 2 3 3 2 3 2 3 4 4 2 3 3 3 2 4 3 4 4 4 4 1 2 2 1 3 7 5 5 2 7 4 3 2 6 7 3 3 6 5 2 2 6 5 3 2 2 
3 2 6 2 5 2 4 4 4 2 4 2 5 9 30 4 2 4 2 2 3 5 6 3 7 11 2 5 2 5 2 5 4 3 3 3 3 5 6 2 2 1 4 6 3 7 2 4 2 10 7 2 3 2 5 2 3 4 6 10 4 4 2 2 3 3 4 2 6 2 3 4 2 4 2 9 14 4 3 3 3 7 9 4 3 1 5 4 3 3 3 8 3 2 6 2 9 12 7 3 2 2 2 2 4 2 3 4 4 2 3 4 4 3 2 3 3 1 4 5 7 17 17 17 363 363 363 51 51 228 491 373 72 110 110 139 139 139 293 293 215 35 96 96 6 472 472 133 42 147 380 499 499 319 319 319 348 195 195 466 22 283 283 38 162 68 68 68 273 273 319 319 319 348 33 64 212 212 93 93 93 93 171 422 186 39 86 86 105 105 336 208 153 153 153 153 182 182 375 375 497 98 98 483 440 83 83 55 55 55 322 67 212 131 133 345 141 141 141 141 281 9 198 198 22 283 455 38 162 482 482 482 238 6 161 161 499 499 235 235 235 235 348 64 212 459 459 459 459 31 54 86 6 272 472 221 336 259 190 190 190 488 499 499 405 405 206 215 215 35 29 69 69 223 130 198 198 22 283 455 236 129 36 310 107 395 395 487 498 498 498 396 178 36 310 107 447 483 226 226 209 411 171 171 171 171 252 252 143 77 478 342 224 494 494 494 31 342 342 115 273 470 265 265 265 85 85 85 146 469 469 469 36 449 41 41 41 324 324 3 335 440 145 194 194 446 446 67 76 90 393 393 234 261 25 148 148 148 148 372 372 467 467 467 242 116 33 250 217 217 473 473 278 278 99 436 436 60 60 298 379 379 195 471 471 49 49 168 106 106 405 167 215 35 458 96 368 453 453 371 278 278 139 175 81 324 324 219 495 495 495 495 467 41 41 41 41 19 454 454 229 491 491 312 312 312 292 292 292 292 21 21 21 21 21 21 21 21 21 21 21 21 21 21 21 408 408 408 149 228 491 491 320 219 357 357 357 240 240 131 133 133 345 333 333 220 220 216 106 106 297 297 297 297 297 297 293 293 293 122 4 127 114 258 258 258 258 258 271 271 39 433 433 433 433 160 160 160 97 97 225 225 225 225 7 217 473 258 258 258 258 342 342 224 494 494 494 368 9 142 142 397 147 380 329 329 329 329 329 310 107 107 395 302 302 497 497 349 349 205 261 261 25 315 315 315 450 450 413 413 212 131 335 226 209 44 44 44 236 8 32 259 354 180 319 319 319 348 348 64 64 64 64 212 384 34 11 11 11 116 33 243 243 401 401 491 108 119 119 437 103 103 103 103 103 85 85 299 299 339 64 76 36 87 87 66 66 68 68 115 278 278 278 36 131 393 393 155 155 332 332 332 14 14 14 411 145 145 284 315 315 450 450 153 88 372 372 304 304 185 49 453 168 415 415 415 415 58 183 156 156 156 313 143 458 458 445 351 278 278 278 36 310 107 107 395 242 116 116 250 250 250 276 109 109 278 330 116 33 64 212 212 371 84 84 84 84 274 274 263 229 247 247 126 126 326 326 326 326 326 101 101 149 228 228 289 320 309 309 479 278 278 278 325 449 449 176 176 135 328 200 200 195 248 197 197 197 401 491 144 27 27 437 437 405 405 405 206 167 35 35 242 242 242 33 33 33 250 250 364 276 276 153 153 372 372 372 215 215 35 472 472 221 401 491 208 208 441 441 441 441 109 278 139 139 139 375 375 375 233 233 270 270 433 433 160 160 18 112 112 439 439 439 225 237 237 237 47 491 47 491 491 73 491 373 373 338 400 400 400 30 30 58 110 254 254 254 314 196 196 479 331 278 278 278 325 449 191 191 191 314 314 478 478 68 68 115 273 278 278 143 96 96 232 68 68 68 6 272 371 444 360 360 252 339 199 199 223 223 130 402 198 198 114 57 57 57 203 381 381 48 229 491 247 15 193 193 193 17 +103-1240-0006 103 478 1 4 40 4 37 5 25 23 20 18 7 31 22 20 29 14 40 38 14 38 27 25 33 19 33 13 23 19 25 1 6 11 37 28 31 5 40 1 4 25 11 22 20 29 19 26 5 32 3 30 29 8 3 25 12 5 24 15 25 30 27 11 12 5 33 22 30 6 31 33 12 5 18 3 23 27 1 5 25 11 38 7 25 11 5 29 12 5 31 33 20 29 30 13 11 18 19 23 9 20 6 25 11 1 8 7 4 5 2 3 2 3 4 5 4 4 4 4 3 4 2 4 2 4 5 2 3 3 6 4 5 3 4 1 13 2 5 6 6 5 7 16 5 2 2 5 5 2 3 4 2 8 4 3 3 12 4 3 2 1 5 8 6 6 7 4 1 2 2 5 3 3 6 4 
1 2 6 2 5 8 12 6 1 4 5 10 2 3 3 5 2 3 6 4 4 3 7 3 3 7 3 5 4 4 8 4 7 20 17 17 17 363 51 51 228 491 412 83 145 253 253 253 253 368 342 168 168 145 145 486 460 460 173 280 29 242 242 242 359 359 359 81 324 324 324 3 58 72 268 268 268 268 268 268 274 186 39 54 86 105 336 445 485 485 213 485 215 129 354 29 334 304 304 185 131 397 397 345 347 347 347 347 43 43 364 276 174 174 426 426 206 167 457 76 36 377 87 87 87 236 259 108 119 119 351 351 443 139 139 139 293 175 175 81 89 340 340 116 33 335 14 14 491 411 411 284 284 284 405 405 405 206 206 206 37 24 131 133 4 4 280 153 153 343 343 343 343 358 358 39 342 342 224 50 50 50 50 50 50 185 269 433 160 112 427 82 247 312 126 292 292 292 326 326 326 326 326 101 408 149 149 491 412 412 55 55 55 322 67 131 472 221 458 445 445 213 213 213 213 252 215 458 176 176 135 135 135 200 200 44 44 44 44 99 338 338 338 338 395 273 106 306 306 306 396 396 215 35 35 335 14 14 411 284 265 265 265 265 85 85 146 464 464 125 125 125 125 466 466 22 283 455 399 217 217 217 473 290 290 290 290 290 434 434 434 339 339 33 90 42 42 147 147 380 380 288 496 496 496 496 274 274 274 24 131 472 198 198 127 45 45 385 90 221 458 208 208 190 499 499 499 405 405 206 150 150 54 86 238 6 6 472 472 198 22 283 455 38 72 72 437 437 481 481 481 481 175 175 81 84 84 84 84 274 274 98 229 247 247 126 126 326 326 326 326 326 101 149 149 228 491 83 83 55 55 322 67 67 131 133 133 364 276 276 346 346 486 315 315 315 315 450 450 450 413 413 348 64 212 131 230 230 230 230 230 35 35 401 198 198 22 283 455 38 162 232 232 232 68 68 6 371 371 213 213 213 252 215 129 259 29 29 42 42 42 147 380 288 443 443 443 240 314 131 183 183 183 183 183 278 278 278 139 139 139 497 497 497 497 122 259 259 354 420 420 324 464 180 180 426 426 426 426 426 282 388 303 303 64 212 465 227 419 439 78 491 305 421 491 491 491 421 491 421 491 491 491 128 128 128 491 128 193 193 193 17 +103-1240-0007 103 751 1 13 25 20 9 5 11 20 18 36 38 13 25 33 7 33 5 37 19 33 1 6 30 19 25 33 36 19 33 1 18 4 11 33 19 29 4 31 27 37 14 12 4 33 18 19 23 30 27 11 1 4 25 11 31 27 1 30 5 25 12 20 5 25 31 20 25 17 6 25 33 23 5 33 5 37 24 19 31 19 40 30 15 10 5 23 40 6 23 31 20 19 26 8 1 32 20 38 5 40 31 19 33 19 26 12 13 30 38 5 25 4 16 33 14 25 36 25 19 25 1 14 23 20 21 36 25 1 12 5 31 5 25 38 5 40 22 5 24 19 26 19 25 4 33 12 5 38 19 25 11 27 1 38 6 30 24 5 25 11 9 30 8 33 1 11 6 3 4 2 4 2 4 4 3 5 3 3 3 6 3 2 3 6 7 2 6 5 3 5 4 4 5 4 11 6 1 3 2 3 5 7 5 5 2 4 2 4 3 6 2 8 5 9 6 2 4 3 3 6 14 15 5 3 4 3 3 6 6 7 5 5 3 5 4 3 2 4 2 2 4 4 2 4 3 4 5 4 4 4 3 5 5 7 7 7 2 4 14 50 6 3 2 2 3 3 2 2 3 4 2 3 4 4 2 4 3 4 2 3 3 5 2 3 3 1 7 3 5 6 7 8 22 4 3 6 5 3 2 2 3 5 3 2 2 5 3 2 1 3 2 3 3 3 4 3 9 1 4 4 4 4 1 2 3 3 3 7 6 10 17 17 17 363 363 363 363 51 149 228 491 491 411 145 475 475 475 475 94 475 475 475 324 301 8 354 106 493 151 240 325 41 41 324 324 3 183 183 489 489 489 489 489 43 43 276 109 109 443 330 330 348 64 76 465 449 483 145 113 113 113 113 113 240 285 285 34 223 223 130 280 277 277 277 277 277 385 36 36 227 419 225 225 226 226 226 491 209 157 157 157 157 157 372 335 14 14 411 188 340 340 116 33 64 394 465 108 377 123 123 123 88 88 277 277 277 277 385 24 131 427 229 247 126 126 126 326 326 326 101 408 149 491 228 373 110 110 110 254 254 240 314 35 108 377 87 87 87 129 259 74 311 311 311 311 311 311 311 311 169 150 342 342 342 168 106 410 410 410 410 410 29 29 382 313 216 216 114 92 92 92 92 92 385 131 472 183 183 183 351 278 278 139 139 139 497 497 497 497 42 42 8 147 380 380 499 84 496 496 496 496 274 274 274 37 24 131 419 419 225 225 225 225 82 83 55 55 
55 322 67 394 478 478 232 232 172 172 115 273 84 84 84 84 16 16 16 274 274 274 98 13 229 247 312 126 126 23 23 23 101 101 101 149 149 228 491 289 289 7 147 147 380 499 319 319 319 348 466 466 466 212 22 448 448 448 14 14 145 319 319 319 319 348 195 195 195 394 478 478 232 68 68 68 267 267 267 267 267 434 339 339 33 90 90 32 465 144 27 180 284 405 426 426 413 348 64 76 26 26 26 359 81 81 277 277 385 325 34 69 223 130 130 402 196 196 217 473 473 258 258 31 342 224 494 494 494 494 368 9 142 142 42 42 147 380 329 329 329 329 252 143 36 107 107 395 302 302 302 497 497 185 269 9 9 483 14 411 411 297 297 297 297 297 297 293 293 497 186 162 68 68 172 115 267 267 267 267 360 360 176 176 176 135 328 328 200 199 106 106 265 265 265 265 85 85 85 85 207 207 19 454 13 417 417 417 237 237 170 28 28 28 28 28 362 491 491 362 362 362 362 491 491 362 211 491 491 369 369 369 369 21 21 21 21 21 21 21 21 21 21 21 21 21 21 408 408 408 408 391 391 73 491 289 373 338 338 400 400 400 301 378 378 141 141 141 281 162 68 68 115 470 278 278 278 449 176 176 135 135 200 248 248 127 114 264 264 264 468 245 245 43 364 276 174 174 174 174 348 94 199 145 145 460 460 460 402 402 6 272 300 469 313 313 10 479 398 398 374 374 132 413 339 199 199 340 116 116 199 335 14 411 411 498 498 498 498 134 175 359 81 166 324 324 422 236 36 310 107 395 395 485 374 374 374 374 132 132 413 303 303 303 117 404 439 439 78 237 237 237 47 47 491 47 2 491 2 2 2 2 316 316 491 316 316 491 491 73 435 289 7 127 5 5 5 38 162 68 68 68 115 273 319 319 319 319 348 348 195 250 133 141 141 141 281 342 86 221 458 144 27 351 319 319 319 53 53 176 135 135 200 200 200 464 340 340 340 94 199 415 415 415 35 198 22 283 455 455 364 345 109 278 278 330 116 33 64 212 212 371 84 84 84 84 274 274 274 274 43 43 401 364 276 276 153 153 153 387 387 372 372 396 396 203 53 473 89 446 446 67 131 472 221 401 354 190 380 380 499 499 428 428 146 146 358 358 233 233 227 419 419 427 56 491 421 15 15 15 193 193 193 17 +103-1240-0008 103 771 1 12 20 6 30 10 14 11 3 25 12 5 31 23 27 29 9 19 23 27 12 5 18 7 31 38 5 40 19 25 5 9 30 8 11 5 23 16 23 5 32 5 37 29 19 26 22 20 38 8 33 9 23 36 24 1 18 5 24 11 27 37 14 9 8 5 24 19 30 20 5 11 5 37 1 9 20 40 1 33 3 24 5 31 23 19 25 11 1 5 24 20 22 23 19 33 5 23 24 4 25 18 36 24 4 37 5 25 23 20 29 20 29 5 23 22 6 23 11 30 15 10 5 23 19 25 11 40 18 5 40 9 5 25 11 1 38 5 40 31 27 19 26 18 19 40 23 15 33 14 25 5 29 31 20 11 3 25 12 5 18 19 23 16 20 23 11 1 9 20 6 25 11 12 5 9 3 30 25 1 11 4 3 6 3 5 2 3 3 3 2 2 7 2 5 3 2 3 6 7 1 3 6 9 6 2 2 3 2 2 2 4 4 5 2 2 3 9 2 5 7 2 4 6 2 5 4 4 5 5 6 2 4 7 6 14 6 2 8 3 5 4 3 4 6 2 5 3 5 4 2 2 2 4 1 3 11 10 25 5 4 3 3 5 3 5 5 10 9 6 5 4 4 4 2 2 3 3 5 7 3 4 2 3 7 2 3 2 2 5 5 5 3 3 3 5 3 5 5 3 3 4 3 8 4 6 2 2 4 4 4 3 2 4 5 19 3 1 4 4 5 3 3 3 3 3 5 4 9 3 3 3 3 6 4 3 3 3 1 2 5 2 5 6 4 5 3 1 3 4 6 4 2 1 2 4 5 4 6 9 17 17 17 363 363 363 363 51 51 51 228 491 320 127 448 448 448 14 14 411 153 153 387 372 372 396 313 35 310 107 395 382 382 313 313 285 34 125 125 125 125 348 466 22 283 283 38 162 232 232 232 26 26 26 431 431 84 496 496 274 274 457 457 401 401 354 354 255 255 251 251 251 241 431 84 84 84 16 16 274 274 216 216 283 283 455 58 72 72 72 268 268 268 268 268 268 450 450 274 271 271 39 39 86 142 397 336 345 141 141 281 281 342 168 340 340 116 199 44 44 44 129 259 190 190 380 380 499 499 428 85 146 146 285 34 302 302 497 497 349 349 234 234 234 234 234 261 425 425 386 431 151 151 151 169 169 169 99 436 338 338 447 395 69 462 462 402 402 221 401 259 491 74 351 351 360 360 360 200 200 248 76 76 465 445 485 324 324 
324 301 378 364 364 346 346 346 428 428 146 146 143 36 472 221 401 401 259 354 425 425 241 431 374 374 374 374 132 132 132 203 381 381 404 13 491 247 312 126 126 326 326 326 326 101 101 101 149 228 491 491 373 72 72 437 284 319 319 319 203 53 53 53 53 469 212 212 131 34 410 410 410 410 410 173 280 29 29 382 245 245 8 259 354 62 62 62 62 146 464 464 44 44 399 217 217 217 473 286 286 286 468 468 406 337 337 337 324 464 464 277 277 325 34 462 462 462 402 402 221 401 401 354 213 213 213 213 213 246 246 246 246 318 318 185 185 433 433 433 160 160 112 112 78 56 491 491 28 28 491 491 341 341 341 341 12 12 12 12 12 260 260 260 260 391 391 391 73 289 491 289 108 119 437 437 284 284 426 426 203 53 473 459 271 31 39 342 342 26 26 251 251 241 81 329 120 120 330 388 195 195 195 64 212 131 419 439 439 439 439 225 225 225 237 47 491 47 80 80 491 80 197 225 287 287 44 44 44 399 217 217 473 398 213 213 213 143 143 458 144 26 26 251 241 431 278 278 285 449 302 302 497 497 399 399 217 217 473 136 136 136 136 136 136 136 282 282 388 195 404 58 489 489 489 489 489 399 53 335 14 145 145 145 486 460 460 173 280 29 242 242 116 250 359 359 81 324 324 324 422 129 259 74 485 213 213 213 213 252 215 129 259 354 100 100 100 497 497 122 143 458 144 27 437 481 481 481 481 481 293 293 122 122 472 133 42 147 147 380 329 329 171 252 143 36 107 395 302 302 302 497 497 497 251 251 251 241 81 431 278 278 330 388 379 195 195 471 471 77 269 342 142 72 72 72 437 151 151 151 368 453 342 142 221 336 354 275 275 275 275 303 303 195 243 131 419 427 491 247 126 126 126 292 326 326 326 326 326 326 326 326 326 101 101 149 149 228 320 345 141 141 281 162 232 232 172 172 115 273 84 496 88 88 88 176 176 135 135 200 248 183 183 257 257 257 257 453 342 26 26 251 241 241 431 171 171 171 252 457 457 401 259 108 119 119 351 308 308 308 313 313 94 199 469 469 215 35 96 66 68 68 68 115 115 444 444 213 246 252 252 325 34 125 125 125 125 466 466 22 283 455 58 72 72 351 278 278 139 139 293 497 497 349 349 234 234 261 25 485 485 485 464 139 139 375 497 497 122 122 36 472 221 336 354 420 420 324 464 464 180 106 426 426 426 426 413 348 64 212 212 198 22 283 455 8 354 354 106 284 306 306 306 306 306 396 396 396 37 303 303 48 404 78 229 491 491 15 15 193 193 193 17 +103-1240-0009 103 501 1 24 19 31 19 40 30 15 10 5 23 25 39 36 12 5 33 18 20 6 33 1 9 19 22 5 40 32 20 18 4 11 18 14 11 18 19 24 33 13 23 29 20 33 14 24 6 30 19 31 5 25 12 20 37 25 19 26 9 19 16 6 30 19 25 1 38 19 23 39 5 24 21 15 9 23 13 30 40 31 33 6 30 27 37 14 4 33 22 3 30 24 5 11 20 1 12 4 33 18 20 24 13 25 33 19 31 27 18 19 40 33 14 25 5 29 31 20 11 12 5 25 13 22 31 33 4 16 33 14 25 36 25 1 10 4 2 4 2 4 3 2 3 3 3 3 4 3 2 2 2 4 4 9 4 2 2 3 5 4 2 6 4 3 2 2 5 5 2 2 2 4 5 3 4 3 4 3 3 4 4 2 3 5 1 3 2 9 2 2 2 3 2 3 6 6 8 6 7 5 7 2 2 4 3 3 5 6 4 3 3 4 3 4 3 6 7 7 3 3 2 2 5 3 3 2 2 2 7 15 3 2 3 5 3 3 2 4 5 2 6 5 3 1 4 5 3 2 3 3 4 4 2 1 2 3 3 3 2 3 3 4 2 2 3 5 6 15 17 17 17 363 363 363 51 51 51 228 491 7 217 473 258 258 31 342 342 494 494 494 281 9 142 397 147 329 329 329 329 143 310 107 302 302 302 497 122 10 10 309 398 398 398 398 398 374 132 216 216 127 45 45 45 325 183 451 30 30 30 3 14 14 411 284 284 405 405 405 206 206 167 24 227 227 472 221 401 491 354 420 420 422 143 458 144 27 351 351 151 253 368 368 99 338 338 338 400 400 400 400 30 3 58 58 110 254 254 254 254 58 58 72 72 110 498 498 498 498 396 313 325 183 183 57 57 57 203 53 394 90 76 108 108 119 351 139 139 139 139 293 293 215 35 74 74 329 329 213 329 252 325 300 382 382 245 399 217 70 65 65 153 329 372 406 406 467 313 186 39 342 
342 224 242 242 116 466 466 22 283 448 448 14 411 213 213 213 213 173 173 402 196 196 176 328 328 248 248 8 354 255 255 38 349 205 234 261 148 148 148 148 148 148 372 372 372 59 452 335 197 226 226 209 188 188 340 340 340 340 33 195 117 117 117 197 197 197 80 491 80 491 491 7 7 7 364 345 109 329 139 329 81 219 219 485 464 464 203 203 33 394 212 465 107 395 329 329 329 171 171 171 301 301 8 129 354 425 175 175 431 329 329 264 468 468 304 313 186 162 323 482 482 482 238 6 272 106 153 153 153 182 372 372 372 372 59 245 335 14 209 411 410 410 410 410 410 410 173 29 29 495 406 467 415 415 131 90 259 144 27 437 437 306 306 306 306 396 203 53 469 469 469 325 325 41 41 41 19 19 454 229 247 126 126 126 326 326 326 326 326 326 101 149 149 228 289 491 127 45 45 45 45 240 183 183 183 451 30 30 30 301 399 217 473 432 432 432 330 348 64 457 401 82 108 377 87 87 38 162 323 323 115 273 84 84 496 274 274 58 58 183 257 257 257 31 9 238 6 119 161 308 308 308 396 313 94 199 199 459 215 215 96 66 342 172 224 41 41 324 3 301 314 198 22 283 455 116 199 331 443 443 178 178 458 96 86 238 6 272 145 145 460 460 460 402 402 6 272 300 469 313 10 94 398 398 374 374 374 132 413 303 303 48 404 13 170 491 491 491 312 15 15 292 292 292 193 193 193 193 17 +103-1240-0010 103 740 1 29 20 33 14 18 4 11 4 31 22 33 19 24 5 37 22 6 30 31 16 14 24 4 34 39 36 22 5 34 9 14 33 18 4 11 25 13 37 14 9 19 25 27 25 33 36 37 3 23 5 25 33 19 30 19 25 16 14 24 15 32 5 25 5 9 7 33 13 25 20 34 19 26 19 25 18 19 40 18 27 23 8 16 1 4 25 11 39 13 33 1 18 20 30 38 5 40 24 4 34 39 36 22 5 34 9 14 33 1 4 33 18 4 16 29 4 31 33 34 30 20 6 25 12 20 4 16 33 14 25 36 25 5 37 5 9 19 40 20 11 15 1 29 23 4 31 19 11 23 20 11 30 8 37 19 26 27 37 14 12 5 18 3 23 27 5 25 11 5 29 12 5 18 19 23 1 17 3 4 3 2 3 2 2 7 4 2 1 3 2 2 4 3 4 5 5 4 3 3 5 3 2 3 4 3 5 3 5 1 5 1 3 5 2 3 3 4 2 7 5 3 4 2 4 3 3 3 2 5 2 4 2 2 4 1 4 4 4 2 1 2 2 5 5 4 3 4 4 3 6 4 1 3 2 3 3 3 8 8 11 37 5 1 2 3 8 10 11 9 3 4 3 2 3 3 6 4 1 3 7 3 5 3 6 3 8 6 2 8 5 6 5 5 5 2 5 3 7 3 3 3 3 4 4 3 1 3 6 2 2 3 2 6 3 4 4 4 12 15 4 3 4 7 1 2 3 4 5 3 6 2 3 4 8 2 3 2 2 6 4 5 5 2 2 3 3 3 5 1 5 3 9 9 17 17 17 17 363 363 363 363 363 363 363 408 51 51 228 491 289 320 74 329 329 329 329 329 325 34 334 382 382 467 110 254 254 254 285 34 145 145 145 376 460 460 169 150 342 86 105 96 96 272 57 57 57 203 53 255 255 255 130 402 221 259 208 441 441 153 153 372 372 372 59 271 271 269 54 54 9 97 336 155 155 332 332 332 245 399 473 65 329 329 329 460 169 164 164 485 485 485 374 132 143 259 144 27 437 329 329 329 169 164 164 142 221 336 29 495 334 59 59 313 24 131 58 72 110 254 254 254 254 35 35 196 309 309 479 331 463 463 463 463 29 382 382 245 8 129 354 137 137 137 137 33 10 10 309 331 331 84 84 350 350 413 413 33 394 465 377 377 87 123 132 8 354 354 106 284 481 481 481 175 175 81 242 116 33 394 465 465 108 119 485 485 286 286 468 406 467 467 121 53 394 155 155 25 469 469 203 217 473 418 418 418 418 418 99 436 436 60 60 298 199 255 255 8 180 113 113 113 113 240 285 131 335 14 401 209 411 475 475 475 475 475 475 475 475 422 164 164 164 214 214 214 214 328 328 200 200 248 335 188 188 340 340 94 199 199 257 257 257 257 342 9 142 437 424 424 424 424 424 497 497 122 251 241 431 431 265 265 428 428 85 146 146 358 358 352 352 352 352 352 352 352 112 427 56 491 312 312 312 292 292 292 292 292 292 21 21 21 21 21 21 21 21 21 21 21 21 21 21 21 21 21 21 21 408 408 408 408 149 149 228 289 209 83 55 55 322 67 212 90 219 357 357 357 357 357 357 120 120 240 385 385 35 227 227 227 419 419 439 225 197 47 47 491 491 47 491 47 47 80 491 491 
491 289 7 373 451 451 451 286 286 286 286 286 468 468 245 43 43 345 141 141 281 281 9 142 196 217 473 65 486 486 460 460 169 164 164 164 485 485 485 374 132 143 259 144 144 27 437 329 329 329 329 150 164 164 105 221 336 354 29 498 498 59 59 313 385 227 419 427 229 247 408 149 149 228 226 491 209 83 415 415 415 240 314 335 58 72 72 72 72 72 110 110 486 486 460 460 460 169 352 352 402 221 401 259 74 311 311 311 311 311 311 311 169 150 86 86 238 6 272 472 472 234 164 164 487 487 487 288 288 213 213 246 246 3 335 440 125 125 125 125 466 466 448 448 448 448 464 145 145 460 460 460 349 402 96 272 469 469 469 236 94 398 374 374 374 132 132 339 94 199 69 223 130 280 44 44 44 44 32 401 401 354 354 278 278 278 368 342 342 41 41 324 324 301 239 239 384 93 93 93 93 93 207 207 207 246 19 454 229 247 247 126 126 326 326 326 326 101 101 149 228 491 80 80 491 491 74 425 425 386 386 431 486 486 460 460 169 150 342 342 342 224 494 494 416 26 359 359 166 166 166 324 301 236 401 259 161 161 79 499 499 499 265 85 85 146 146 173 173 176 176 135 135 200 248 14 14 411 410 410 410 410 410 410 173 29 29 382 313 216 283 283 455 58 72 72 72 437 481 481 481 481 481 293 175 175 81 84 84 16 88 88 89 89 446 116 64 212 384 180 230 230 230 215 35 35 96 198 198 22 283 455 455 58 183 278 278 278 278 139 139 139 375 375 375 375 98 13 229 491 170 491 15 15 15 193 193 17 +103-1240-0011 103 732 1 4 25 11 18 19 40 9 13 31 33 31 36 33 5 37 22 23 27 12 40 1 38 19 10 38 5 40 29 23 15 25 29 30 36 16 12 5 33 18 20 38 5 40 17 27 19 26 7 33 5 37 4 37 5 25 23 20 1 4 25 11 18 20 18 4 11 12 5 9 5 17 20 4 25 11 12 5 31 6 30 5 23 24 13 30 38 19 10 9 19 33 27 22 5 25 11 12 5 33 18 20 38 5 40 17 27 19 26 5 22 5 25 31 19 11 14 5 9 5 23 11 19 31 33 5 25 31 1 25 7 1 38 13 30 38 5 40 24 4 34 39 36 22 5 34 9 14 33 1 17 27 19 26 1 4 25 11 38 8 38 5 40 18 20 17 27 19 25 12 13 30 1 5 6 1 2 2 2 4 4 4 5 3 5 4 2 1 4 5 3 7 2 7 13 4 2 4 2 2 4 7 3 6 6 4 3 6 5 2 2 2 4 2 3 2 3 3 4 3 4 8 3 2 3 5 3 2 4 2 11 33 3 2 1 2 4 4 3 4 1 3 7 5 2 7 1 2 1 2 2 6 4 2 3 2 5 7 5 3 2 4 2 3 6 4 3 3 1 2 1 2 1 2 2 3 1 4 3 4 3 3 2 4 2 2 7 1 3 3 3 2 2 2 4 3 5 2 2 5 9 28 5 13 6 10 8 3 6 2 4 6 7 4 2 3 8 3 6 3 3 4 4 4 7 3 11 2 5 3 3 6 6 3 2 3 2 3 6 4 6 5 2 2 11 21 17 17 363 51 228 412 412 83 194 194 446 67 67 131 183 257 257 257 257 453 342 221 221 336 354 354 443 443 443 169 150 342 86 86 6 6 272 472 66 482 482 115 485 374 374 132 252 36 449 462 462 402 402 221 336 144 208 425 386 386 431 496 496 496 496 496 274 274 37 233 185 185 269 323 18 427 427 247 247 126 126 292 23 23 408 408 391 391 228 228 289 491 320 407 407 407 407 310 107 397 397 141 141 141 281 281 9 142 221 221 336 491 74 74 425 425 386 386 431 290 290 290 290 434 434 339 339 195 33 394 76 465 74 190 190 190 487 487 374 374 374 132 132 358 352 352 352 402 198 198 45 45 45 45 131 183 451 30 30 30 301 378 345 141 141 281 453 9 221 336 144 180 84 496 88 88 176 176 135 328 200 335 14 14 145 145 113 113 113 113 206 285 449 34 69 223 130 280 180 145 145 486 460 460 173 280 280 242 242 116 33 250 251 241 81 256 444 213 246 246 246 19 19 454 454 78 170 170 491 28 491 491 312 312 187 292 292 12 12 21 21 21 21 21 21 21 21 21 21 408 408 408 408 149 149 491 491 289 491 209 83 55 322 67 325 30 30 30 30 3 58 58 72 110 110 254 254 254 254 314 35 198 127 283 455 455 236 129 401 401 401 354 354 431 151 151 240 416 416 192 41 41 41 324 3 464 89 89 446 348 466 22 283 455 38 162 232 482 172 115 106 106 153 372 372 372 406 467 302 302 497 497 399 399 217 473 473 264 264 264 264 264 468 468 59 59 59 245 43 364 345 407 407 407 310 107 447 221 
336 354 420 420 236 129 36 108 119 119 351 496 496 496 274 143 458 192 242 242 116 116 466 212 45 45 45 325 183 30 30 301 378 141 141 141 281 342 9 221 336 144 180 84 88 88 88 176 135 135 200 200 464 44 44 143 458 27 27 121 121 33 478 478 232 68 172 115 273 278 278 278 285 495 495 495 134 134 134 134 8 100 100 100 497 122 401 401 401 371 278 278 278 31 39 86 86 6 272 11 11 11 11 379 379 471 471 270 433 433 433 18 112 56 56 491 312 312 312 187 292 292 292 292 292 292 21 21 21 21 21 21 21 21 21 21 408 408 408 408 149 228 228 289 491 7 309 479 331 315 315 315 315 450 450 16 293 293 335 197 197 197 197 197 197 197 197 491 491 7 7 364 364 364 364 276 181 181 181 181 181 264 264 264 264 468 468 468 245 245 43 364 364 430 430 430 430 430 342 342 221 196 217 217 217 217 473 486 486 486 486 460 169 169 164 164 164 485 219 485 485 132 143 143 129 82 144 27 27 437 329 329 151 169 164 164 164 164 221 401 259 29 29 382 313 313 35 131 472 221 401 401 401 401 491 144 180 180 84 84 350 88 88 88 176 176 176 176 328 328 200 200 200 117 454 404 483 226 226 226 226 491 83 55 55 55 322 67 67 212 131 133 364 364 276 181 346 346 181 265 85 85 146 378 378 345 430 430 430 430 342 342 451 30 30 324 422 143 401 401 144 27 180 84 84 496 88 88 88 176 176 135 135 200 248 248 248 216 127 114 114 264 264 264 264 264 59 59 452 452 263 263 417 417 237 491 237 237 421 421 421 491 491 491 128 128 128 491 128 128 193 193 193 17 +103-1240-0012 103 759 1 18 4 11 19 33 9 19 25 13 25 20 5 12 14 24 4 25 19 25 4 37 5 25 23 20 1 24 19 31 19 40 30 15 10 5 23 1 11 13 16 33 23 20 29 35 33 19 26 12 19 31 4 25 11 12 4 33 5 17 13 12 14 1 24 8 33 18 4 37 17 19 37 5 25 5 29 30 19 33 20 17 35 11 17 13 31 4 40 33 5 9 27 34 22 38 13 31 10 5 25 40 1 9 5 33 24 4 34 39 36 31 27 30 13 30 23 20 38 13 25 33 16 14 24 18 27 24 1 12 5 33 19 33 24 5 31 33 9 20 31 5 24 34 19 26 29 30 13 31 19 26 4 25 11 5 25 39 36 41 38 5 23 38 19 10 38 5 40 33 15 22 19 26 18 19 24 1 24 9 3 2 3 3 3 4 3 8 3 5 4 3 3 5 6 2 3 3 5 3 2 5 2 11 18 4 3 5 2 5 4 4 3 4 8 3 2 5 4 3 2 3 4 3 3 3 7 2 4 7 5 2 2 3 5 6 2 4 3 4 7 13 5 5 2 3 2 3 3 3 2 4 2 3 5 3 1 2 4 4 3 5 3 8 6 2 4 3 2 3 5 3 3 2 3 3 5 3 6 6 40 2 2 3 5 7 5 3 3 7 9 5 4 4 3 4 5 2 4 2 4 2 3 7 9 5 8 3 2 3 2 2 5 3 4 5 2 3 7 2 4 3 2 5 8 2 2 7 3 5 1 2 2 4 3 5 5 5 1 2 3 3 2 4 2 2 4 5 4 3 2 3 3 4 5 7 17 17 17 296 296 317 317 491 491 317 305 305 461 491 461 491 491 461 491 491 435 435 435 435 435 435 7 373 72 72 430 430 430 430 430 430 430 34 177 177 177 236 35 401 259 354 137 137 137 137 137 94 199 335 14 14 411 411 475 475 475 475 475 475 475 475 324 324 464 464 493 493 493 493 493 216 300 300 382 245 399 217 217 473 136 136 136 136 136 136 282 94 199 340 340 340 94 199 145 145 486 486 460 460 173 280 29 242 242 116 379 33 250 251 241 81 444 444 213 246 246 246 19 19 454 229 247 247 126 126 292 326 326 326 326 326 326 326 326 326 101 101 149 149 228 289 7 217 473 258 258 258 258 342 342 342 494 494 494 368 453 9 142 397 147 380 329 329 329 329 329 329 36 310 107 395 302 302 302 375 497 98 98 98 225 225 225 225 80 80 259 384 371 180 443 443 169 169 352 352 402 6 6 26 359 166 166 166 301 129 259 259 74 189 189 189 285 449 449 176 176 135 328 200 200 248 248 32 32 127 114 114 258 258 258 31 39 86 68 68 68 483 483 440 89 194 446 446 33 212 212 198 127 114 92 92 92 92 167 167 457 457 36 108 377 123 123 416 458 445 180 180 443 493 493 216 300 300 334 59 59 452 263 229 247 247 126 126 326 326 326 326 101 101 149 149 228 228 491 7 70 70 65 65 428 428 428 146 438 325 449 34 202 202 202 202 402 221 259 144 445 278 278 173 173 280 29 242 242 
116 94 199 44 44 44 129 129 259 74 190 190 104 104 104 325 325 41 324 324 301 416 239 144 144 484 484 484 236 314 131 221 401 259 445 445 180 443 443 443 443 120 120 271 271 39 342 342 224 253 253 253 253 31 86 238 6 272 123 123 123 8 354 106 496 496 496 274 368 342 142 221 336 208 208 441 151 151 151 169 150 99 238 6 6 310 107 60 298 298 298 275 303 303 471 471 471 269 433 18 112 427 491 491 312 312 312 292 292 292 292 292 292 292 21 21 21 21 21 21 21 21 21 21 21 408 408 391 163 491 316 491 316 316 316 316 491 316 316 316 73 289 289 320 159 159 159 159 35 35 196 196 217 473 473 329 329 329 329 329 329 169 169 164 164 164 485 485 485 485 374 132 422 186 162 232 482 172 115 344 344 344 344 344 274 274 274 42 42 364 147 147 380 288 264 264 264 468 468 468 313 134 359 359 166 166 166 301 301 43 364 276 109 109 189 330 330 33 64 76 131 472 393 155 155 165 165 165 165 53 58 58 72 72 72 72 437 350 350 350 350 350 350 350 182 413 413 381 381 404 404 225 225 225 225 225 80 80 491 491 320 127 45 45 45 45 325 177 177 177 177 457 217 217 217 70 65 65 319 169 150 150 86 86 6 272 472 472 336 354 420 420 420 422 162 232 232 68 68 115 273 231 231 231 53 53 76 465 198 214 214 214 328 200 200 248 76 129 401 491 74 190 190 190 488 488 488 151 151 169 150 342 342 68 224 176 176 176 328 328 200 200 464 89 89 446 67 212 34 106 319 319 319 348 33 33 219 219 219 219 485 374 374 374 374 374 368 368 107 161 134 134 100 100 100 497 43 43 345 407 407 407 36 310 447 397 397 141 141 141 281 86 86 238 6 119 119 295 295 295 295 295 252 143 192 192 135 135 328 200 200 183 183 57 57 57 57 57 203 381 48 48 13 13 78 491 128 491 193 17 +103-1240-0013 103 762 1 18 20 38 5 40 12 5 32 8 5 31 33 24 4 25 5 23 8 37 5 25 11 18 15 33 19 11 33 5 18 4 37 33 19 17 27 5 24 5 26 31 33 30 15 25 21 14 40 14 33 36 13 25 20 29 23 15 31 38 13 30 18 20 24 8 33 18 4 37 33 19 33 6 22 1 24 4 34 39 36 11 30 13 31 33 5 29 38 19 12 5 38 8 33 22 3 23 14 5 25 11 30 8 37 19 26 19 25 5 9 5 17 20 1 38 5 40 31 5 24 34 19 26 12 5 33 11 19 11 25 33 18 4 29 5 25 6 16 5 25 1 24 19 31 19 40 30 15 10 5 23 1 29 3 25 11 14 13 40 32 20 24 8 33 1 22 35 11 24 15 22 25 5 34 19 26 5 37 19 33 1 8 5 3 3 2 3 2 2 7 9 3 3 3 3 5 2 3 5 8 3 1 2 1 5 5 3 2 2 3 2 3 4 4 3 2 4 4 3 3 2 5 3 3 3 4 3 3 4 4 4 3 6 3 2 4 5 3 5 5 2 2 1 6 2 5 5 2 5 2 3 3 3 4 6 10 24 5 6 5 3 6 5 4 3 6 3 5 3 3 1 2 2 5 5 3 6 5 4 5 3 1 4 3 6 2 2 3 1 2 3 4 5 3 8 13 3 1 3 7 2 5 3 2 5 1 4 3 3 3 2 4 2 6 4 4 2 3 4 6 2 5 33 5 3 3 3 4 4 4 3 5 6 13 5 5 3 2 4 3 2 5 3 4 8 6 11 3 3 2 4 4 5 7 5 4 2 4 7 3 5 5 8 17 17 17 296 296 317 184 184 491 373 451 451 451 30 301 378 364 345 141 141 141 281 342 342 198 22 283 455 38 338 338 338 395 395 106 480 480 480 85 85 146 146 464 459 459 459 31 31 86 238 6 472 196 196 473 136 136 136 136 136 282 388 199 199 255 255 251 251 241 241 431 265 265 265 85 85 85 146 299 173 352 89 89 322 67 199 58 72 72 72 110 171 171 171 171 252 143 36 449 191 191 236 314 36 108 377 87 58 72 110 110 202 202 202 460 169 352 402 402 6 272 87 87 87 416 144 180 84 496 88 88 88 255 255 399 70 70 65 319 319 319 348 200 248 478 66 482 482 238 6 161 79 288 290 290 290 290 434 339 339 212 310 395 334 334 304 304 304 49 269 168 168 157 157 313 313 36 377 377 123 123 88 88 14 411 475 475 475 475 475 475 475 324 301 129 259 74 425 425 386 386 343 343 343 343 358 318 39 342 9 142 397 345 109 109 498 245 313 183 451 451 30 30 30 301 399 217 70 65 65 428 428 428 146 146 325 131 72 72 110 110 486 486 460 460 402 402 96 272 87 87 87 236 259 108 119 119 351 405 405 405 405 405 405 206 206 169 233 458 192 419 427 491 491 
247 312 126 292 292 292 292 292 292 21 21 21 21 21 21 21 408 408 408 149 149 228 82 320 7 217 473 65 486 486 486 460 329 169 164 164 164 164 485 485 485 485 374 374 132 132 132 236 32 401 259 161 161 79 79 380 288 443 151 169 150 150 86 86 86 238 6 272 180 230 230 230 230 215 215 35 29 345 333 333 220 220 44 44 44 43 364 276 346 346 346 428 428 146 146 385 131 472 221 458 144 27 437 437 481 481 481 481 481 175 175 81 300 300 382 406 467 467 89 89 446 116 394 212 161 161 79 499 499 499 428 85 85 146 173 173 280 176 135 135 200 464 340 340 199 44 44 44 8 32 259 354 431 151 151 151 416 416 192 41 41 41 41 41 19 454 229 247 247 126 126 326 326 326 101 101 149 149 228 289 320 320 345 141 141 281 31 342 232 232 68 68 172 115 231 231 231 231 231 53 394 76 164 164 214 214 214 214 200 200 248 212 127 45 45 45 45 236 401 401 259 384 371 278 278 278 314 196 242 242 33 64 212 131 472 72 72 72 110 486 486 460 460 460 215 35 29 29 242 242 94 199 199 106 426 426 426 169 349 352 352 352 242 275 275 303 303 303 48 48 417 417 417 237 237 491 28 28 491 305 305 491 491 362 305 491 491 491 491 362 366 491 366 366 316 491 491 435 316 435 491 491 73 289 7 7 217 473 258 258 258 342 342 224 494 494 494 281 9 142 397 147 147 329 329 329 329 329 252 36 310 107 395 302 302 302 302 497 98 98 13 229 82 247 312 126 126 326 326 101 101 101 149 391 228 491 289 491 74 437 437 284 405 426 426 206 348 64 64 212 300 382 495 406 467 253 253 253 99 99 338 338 400 400 400 400 30 301 399 217 70 65 65 265 428 428 85 146 146 358 385 36 227 427 427 229 247 126 126 126 326 408 408 391 228 228 289 491 144 27 389 389 389 314 196 196 217 473 476 476 476 476 476 143 458 96 196 32 196 309 309 309 309 479 331 231 231 231 231 231 349 164 214 214 214 214 328 200 200 335 14 411 287 284 223 223 223 223 130 280 277 277 277 277 277 385 24 227 419 439 439 439 439 225 128 193 193 17 +103-1240-0014 103 697 1 4 25 11 18 14 4 16 33 14 25 36 25 40 19 25 21 28 24 5 25 33 38 5 40 31 29 28 23 11 1 8 23 21 19 31 33 31 33 13 29 27 37 14 33 19 17 30 20 25 17 15 9 5 23 40 4 16 33 14 33 20 5 25 11 16 8 25 11 7 33 16 14 24 3 30 19 23 5 38 13 30 18 20 40 17 6 25 5 25 11 38 8 1 12 5 38 14 12 20 38 35 24 5 25 16 8 25 5 23 20 22 5 25 22 23 36 11 19 11 1 18 20 11 5 40 5 25 21 13 25 14 5 23 20 17 27 33 19 33 7 25 12 19 31 33 8 24 5 37 39 19 30 1 5 25 11 18 20 25 13 37 14 37 19 40 19 33 31 1 7 7 1 2 4 2 7 3 3 2 2 6 4 2 3 3 4 5 4 2 3 5 2 1 3 4 3 7 5 6 52 7 5 3 4 4 5 3 4 3 4 6 2 3 2 2 3 3 5 3 2 7 3 2 2 4 6 2 3 2 7 8 1 2 3 5 6 2 2 6 4 3 2 5 2 3 3 4 3 3 4 2 4 2 4 3 5 3 2 1 3 5 11 17 2 4 6 3 3 3 4 3 2 2 4 5 5 2 2 2 3 3 2 2 4 2 4 2 2 6 31 5 2 3 3 2 3 4 4 4 2 4 2 3 3 5 3 2 3 8 6 2 2 3 3 5 4 3 2 3 3 3 6 2 4 2 1 5 2 7 3 3 3 5 2 5 4 4 8 6 17 17 17 296 296 184 184 184 412 83 194 194 194 55 322 67 131 183 156 156 156 156 335 14 145 145 460 460 460 460 349 402 96 272 272 469 469 313 94 398 398 374 374 374 132 339 339 33 471 77 342 168 121 121 121 33 33 394 310 395 395 153 153 387 387 146 146 203 217 291 291 291 291 291 64 243 36 227 472 397 397 345 141 141 281 31 162 232 232 105 105 336 354 153 153 387 387 387 387 139 139 302 302 375 375 122 122 131 227 419 439 417 417 170 491 28 28 28 28 28 491 491 362 362 491 362 491 305 362 362 491 362 491 362 435 491 211 369 491 369 369 369 369 369 21 21 21 21 21 21 21 260 260 260 260 260 260 391 391 391 491 73 491 491 412 287 111 111 111 111 139 139 293 293 122 239 36 384 395 470 459 271 271 150 342 86 238 6 6 491 478 66 68 232 238 6 272 470 470 443 443 215 215 35 354 29 410 410 410 410 410 410 280 29 29 313 236 36 377 123 123 129 259 208 79 
79 288 360 360 360 434 434 200 248 248 212 445 180 171 171 171 171 171 252 215 8 354 100 302 497 497 49 342 168 180 145 486 460 460 460 169 402 402 6 272 300 382 313 236 36 108 108 119 351 213 213 213 213 246 246 246 3 464 89 446 116 394 76 90 393 234 261 25 25 480 480 480 480 299 299 339 64 212 34 180 113 113 113 113 167 167 35 401 393 155 155 165 165 165 165 53 217 217 65 329 329 495 406 467 467 134 139 175 175 423 423 423 423 43 43 364 345 109 109 264 468 468 396 58 183 451 30 30 30 368 342 342 221 336 144 180 106 426 426 426 206 388 94 199 89 446 446 212 131 133 133 276 276 346 346 346 265 265 85 85 85 207 207 19 454 417 417 417 417 417 47 491 47 47 491 435 435 80 491 491 80 80 289 320 127 5 5 455 43 43 364 276 109 498 498 498 396 313 216 216 41 324 324 301 43 364 276 174 174 174 203 53 473 242 116 195 33 90 393 349 234 261 25 106 480 480 480 146 146 299 339 250 359 359 166 166 166 143 458 144 27 121 121 121 76 458 458 208 386 386 444 374 374 374 252 325 34 191 191 191 24 131 404 427 229 247 247 126 126 292 292 292 292 292 326 326 326 326 326 326 326 326 326 326 326 326 326 326 326 101 408 408 408 408 391 491 491 373 451 451 30 30 422 325 371 71 71 71 71 71 453 242 242 348 64 394 76 401 310 107 395 395 395 432 432 330 94 199 495 495 495 134 134 134 359 81 166 166 324 422 416 458 144 180 84 496 496 274 285 449 123 123 236 236 36 108 119 351 351 315 315 315 315 450 450 413 413 413 466 198 114 258 258 258 31 86 86 6 272 119 103 103 103 103 85 299 203 53 29 462 462 462 280 280 219 219 286 286 286 286 334 59 59 452 452 263 225 225 83 55 55 55 322 67 131 183 183 451 451 30 30 434 339 10 10 10 309 479 331 463 463 463 463 29 29 382 382 245 245 349 280 280 278 278 278 368 368 342 168 168 277 277 277 37 385 233 270 270 433 390 160 112 112 439 439 78 56 128 128 193 193 17 +103-1240-0015 103 190 1 19 16 18 20 11 30 5 25 7 33 5 37 33 14 25 5 29 31 20 11 18 20 38 35 11 5 25 33 11 30 13 31 5 29 5 25 11 33 15 22 12 5 9 5 17 20 33 19 17 27 16 14 24 6 30 1 9 4 1 3 2 3 3 4 2 5 2 2 3 6 3 2 3 2 7 4 2 3 3 4 2 2 2 2 1 2 2 3 5 4 3 2 1 2 3 4 4 2 1 5 4 3 4 3 2 4 5 4 2 4 4 9 8 17 17 17 363 51 51 228 491 491 412 118 118 118 118 402 451 30 30 422 314 90 133 147 147 380 499 319 319 319 94 199 145 113 113 113 113 240 285 34 462 130 402 221 36 108 119 119 308 308 308 396 313 94 199 199 230 215 35 478 232 232 232 172 115 444 444 444 213 213 252 24 131 183 451 30 30 30 378 378 345 389 389 389 389 314 196 242 33 33 394 212 161 161 79 380 288 443 443 169 39 342 342 168 230 230 230 230 215 35 29 89 89 116 394 465 108 119 295 295 295 295 295 143 458 96 198 198 283 455 455 8 32 354 354 431 151 151 240 416 416 192 41 41 324 422 36 36 377 87 87 416 458 27 180 84 496 496 274 349 205 155 155 332 332 245 399 70 70 138 138 138 138 138 372 372 372 59 452 263 263 229 491 491 312 15 15 15 193 193 193 +103-1240-0016 103 783 1 39 13 33 31 5 24 34 19 26 24 5 31 33 18 4 37 18 4 29 5 25 11 31 19 25 31 23 4 31 25 8 33 19 31 33 3 30 33 18 19 24 6 16 1 8 24 22 23 20 25 29 5 40 5 23 11 12 4 33 31 38 5 33 1 5 25 11 8 38 27 25 33 25 27 5 24 19 25 5 33 31 29 20 31 5 37 24 8 25 11 14 22 3 25 32 5 25 31 1 5 25 33 19 23 8 25 27 38 5 33 18 4 40 33 15 22 5 25 24 4 34 39 36 22 5 34 9 14 33 7 33 5 37 4 37 5 25 23 20 33 5 11 15 1 5 22 6 30 11 19 26 23 20 1 4 16 33 14 33 20 24 19 31 19 40 30 15 10 5 23 31 13 33 7 33 1 32 20 18 4 11 25 3 33 16 3 30 33 19 17 27 1 12 4 2 4 3 2 3 3 2 2 6 4 3 2 1 2 2 3 4 3 3 1 3 4 2 3 3 4 5 6 2 7 3 3 3 3 3 2 2 4 2 3 9 10 16 5 3 4 3 4 3 3 4 3 2 4 2 2 2 4 3 2 4 7 2 4 2 2 4 5 5 1 2 3 5 3 7 2 2 3 3 3 4 4 3 1 3 4 5 2 2 3 7 3 3 4 
2 4 5 12 4 2 4 3 3 4 7 6 3 2 1 2 1 5 4 3 3 2 3 4 4 4 2 2 7 2 4 3 3 6 4 3 2 3 5 2 2 3 3 3 4 1 2 8 61 4 5 2 3 1 2 4 3 7 16 8 2 3 2 6 9 4 2 4 2 4 3 3 3 2 1 6 3 3 4 9 18 7 4 2 2 2 2 4 3 4 2 3 3 2 4 11 5 17 17 17 296 296 317 317 317 184 184 184 184 491 219 357 357 357 240 385 35 35 478 68 115 273 231 231 231 53 76 465 214 214 214 214 200 248 248 217 217 217 70 65 65 319 151 169 150 342 86 6 272 34 494 202 402 58 72 110 110 486 486 460 460 215 35 29 242 242 116 379 471 478 478 68 68 115 273 278 278 379 379 77 342 342 26 26 241 431 431 376 376 376 376 169 169 150 342 86 238 196 309 331 331 428 428 428 428 146 252 143 36 377 87 87 38 162 86 482 238 272 180 499 306 306 396 396 285 183 183 183 57 57 57 57 399 70 65 106 426 426 426 426 426 426 206 169 352 352 352 352 352 352 352 112 439 417 237 170 47 491 47 47 491 491 2 47 316 316 316 73 73 73 491 287 111 319 203 203 90 76 465 144 208 208 386 386 360 360 360 360 339 339 394 76 465 74 437 437 151 151 368 368 342 168 302 302 302 375 375 122 122 239 127 114 92 92 92 169 35 77 9 142 397 345 181 181 181 167 385 35 131 419 439 225 225 305 491 412 412 83 55 55 322 67 199 111 111 111 111 438 378 43 364 276 109 109 496 496 274 274 274 457 196 309 479 331 84 84 88 88 44 44 44 44 217 217 217 217 217 217 473 278 278 116 199 199 278 240 143 77 342 86 142 221 336 336 74 213 213 213 252 39 342 224 462 462 462 402 196 196 70 65 480 480 480 85 299 299 339 212 131 157 157 157 245 129 129 259 27 27 437 370 370 370 370 348 64 76 310 107 395 60 298 379 379 471 471 77 269 433 112 427 247 247 126 126 326 23 23 101 101 149 149 228 289 412 287 55 446 322 67 76 465 449 351 139 139 139 251 175 111 111 111 111 438 438 10 10 10 309 479 331 84 84 16 16 274 43 43 364 181 181 181 181 325 34 356 281 281 342 86 221 336 108 119 295 295 295 295 143 458 192 242 242 116 33 250 217 217 473 486 486 486 460 460 169 164 164 485 485 485 374 132 143 129 259 144 27 27 437 151 151 164 164 402 401 401 354 29 382 382 313 24 335 14 14 209 287 113 113 113 113 113 285 34 255 223 130 280 280 145 486 486 460 173 173 280 242 242 379 250 359 359 81 474 324 324 252 143 36 377 87 87 236 93 93 93 93 207 207 207 454 454 417 417 417 417 237 170 491 28 28 28 491 491 362 362 305 362 491 362 491 362 491 362 369 369 369 369 369 369 369 369 369 369 369 369 21 21 21 21 21 21 21 21 21 21 21 21 21 21 21 21 21 21 260 260 391 391 391 391 491 289 289 412 287 255 255 143 458 458 208 441 153 153 387 396 313 325 176 176 328 200 248 250 359 474 474 474 324 19 19 454 454 417 417 417 417 417 237 237 47 491 491 491 435 435 80 289 435 209 209 287 145 486 460 460 460 169 402 35 36 272 382 382 313 143 36 108 119 351 213 213 213 213 246 246 246 246 3 301 196 217 473 258 258 258 31 342 224 494 494 494 31 9 142 397 147 329 329 329 329 143 36 449 395 302 302 497 38 162 68 68 115 273 470 443 240 325 449 180 113 113 113 113 113 450 233 233 227 419 419 439 417 417 417 237 237 237 237 47 491 47 2 491 47 316 316 316 316 491 491 435 373 373 338 400 400 400 400 30 3 58 110 254 254 254 314 196 479 479 307 307 61 61 167 35 393 205 261 25 106 306 306 396 396 313 36 449 87 87 87 416 144 180 180 84 84 84 16 16 16 274 98 98 13 417 417 225 225 193 17 +103-1240-0017 103 715 1 12 5 9 19 17 1 30 4 24 9 23 19 26 6 30 10 14 11 13 24 9 7 14 11 18 7 31 38 13 30 12 5 2 23 19 37 11 1 38 5 40 5 31 22 4 25 33 22 38 6 30 33 14 5 37 5 24 8 23 5 29 12 5 30 27 11 16 14 24 23 19 25 11 40 18 3 23 27 1 33 19 9 20 32 35 30 1 12 5 23 6 26 23 15 25 24 15 11 19 33 5 17 35 11 20 23 16 14 12 14 1 24 4 34 39 36 22 5 34 9 14 33 31 16 3 12 14 1 4 40 32 8 5 25 11 31 8 23 5 
25 33 13 40 18 19 40 31 5 25 4 16 33 14 18 19 24 1 7 3 2 5 5 10 3 4 5 5 1 3 4 4 6 4 5 2 1 2 4 3 8 3 2 7 9 6 3 3 2 2 3 25 4 5 4 3 17 3 2 3 3 5 3 7 2 4 5 2 3 2 2 4 2 2 3 4 9 3 3 3 2 3 4 5 2 5 2 3 2 4 3 2 2 4 4 4 9 37 3 2 2 3 5 3 8 17 2 3 5 5 7 3 8 4 3 4 2 2 3 2 4 4 6 3 3 6 6 3 8 15 5 5 4 2 3 4 4 5 3 3 3 4 4 6 4 6 13 6 3 8 12 3 1 3 4 6 4 3 2 2 2 3 3 1 3 5 5 3 6 3 2 3 2 5 5 5 17 17 17 296 296 51 184 320 320 127 5 455 236 129 259 354 278 278 278 278 278 252 416 416 416 192 472 225 397 225 225 80 80 197 147 147 147 380 499 486 486 486 365 460 203 53 53 53 212 354 302 175 81 176 176 135 328 200 200 248 335 14 14 411 411 153 153 372 372 396 313 143 310 107 395 382 382 313 325 34 121 121 53 53 394 212 212 180 180 486 486 315 450 450 88 372 372 396 313 24 34 58 72 72 72 72 268 268 268 268 268 268 450 450 274 274 271 186 39 323 9 142 397 336 345 109 109 109 264 313 216 216 22 5 455 236 458 27 27 351 151 151 169 169 164 164 472 221 336 354 29 498 498 313 313 143 77 270 342 342 26 26 251 241 431 431 278 278 120 120 173 173 352 352 272 419 229 247 247 126 126 292 292 326 326 326 326 101 101 101 149 149 228 289 320 345 141 141 141 281 453 168 44 44 38 38 232 232 232 105 105 445 445 470 365 365 365 365 460 330 388 64 131 472 472 221 458 208 208 441 441 441 153 153 153 387 387 285 285 300 382 382 467 69 69 130 130 280 44 44 44 399 70 65 65 265 265 265 85 85 85 85 139 293 175 175 175 230 230 230 215 402 198 198 22 283 455 42 42 147 380 380 496 496 496 496 274 24 131 393 205 155 165 165 165 165 53 250 251 251 241 431 329 278 330 388 379 33 471 49 9 142 58 72 437 481 481 481 481 481 293 175 175 81 84 84 84 496 274 98 13 13 417 170 170 491 170 491 28 491 28 491 28 362 362 362 362 362 362 362 491 491 491 40 305 491 305 305 305 316 435 491 435 435 435 491 491 435 435 7 465 108 377 87 87 8 420 420 420 422 186 338 338 395 395 487 498 498 498 498 59 59 59 263 229 229 247 126 126 126 326 326 326 326 326 326 101 101 101 149 149 228 289 320 127 5 5 455 251 251 251 241 235 235 235 235 235 235 235 348 200 248 248 248 251 251 241 431 290 290 290 290 290 434 434 434 339 195 33 250 217 473 476 476 476 476 252 325 325 191 191 191 325 34 44 44 416 129 259 144 484 484 484 236 314 401 401 259 371 485 485 139 139 302 497 497 349 234 234 261 25 498 498 498 498 498 493 216 216 300 300 334 334 59 452 452 263 263 417 417 237 237 237 237 47 47 47 491 47 491 491 73 491 73 289 7 217 473 65 486 486 329 460 169 169 164 164 219 485 485 374 422 143 458 144 27 351 329 329 329 169 169 164 164 472 221 336 354 495 498 498 313 385 35 77 342 86 142 393 393 261 25 91 91 91 91 91 206 493 216 216 300 334 334 59 59 452 263 229 247 247 126 126 326 326 326 326 408 408 149 149 491 412 83 83 253 253 253 253 253 99 338 338 338 338 338 338 395 180 499 499 265 265 265 265 85 85 85 146 146 464 89 89 446 446 394 478 66 68 68 115 273 265 265 265 85 146 146 146 175 175 81 11 11 11 64 76 465 34 253 253 253 253 342 168 257 257 257 257 31 162 68 68 68 115 273 273 319 319 319 388 94 199 145 145 145 460 460 460 460 402 402 96 272 300 382 382 58 58 57 57 57 57 57 57 203 381 117 404 229 247 15 193 193 17 +103-1240-0018 103 580 1 18 4 11 17 3 33 13 40 16 3 30 5 38 15 13 40 18 20 29 3 31 5 9 23 20 22 35 11 16 14 24 18 19 40 16 13 23 27 24 13 25 38 19 12 7 33 4 22 10 36 5 23 20 30 20 33 30 20 33 19 26 19 25 33 5 12 5 38 35 11 40 1 38 19 25 18 20 16 7 25 11 19 11 18 19 40 18 27 24 31 33 13 11 1 17 30 20 25 17 15 9 5 23 40 38 5 40 9 19 23 33 4 33 12 5 16 14 34 5 31 33 13 21 5 37 18 19 40 22 23 19 30 11 23 4 25 11 1 4 25 11 12 13 30 19 33 38 5 40 33 19 12 19 31 11 15 1 6 
(Data file fragment: the diff at this point adds a manifest-style text file consisting of whitespace-separated integer sequences for LibriSpeech-style utterances 103-1240-0019 through 103-1240-0038. Each record appears to start with an utterance ID, a speaker ID, and a length field, followed by long runs of integer tokens, presumably discretized speech units and/or per-token durations. The raw token sequences themselves carry no readable prose.)
24 19 31 19 40 4 23 19 17 40 4 25 11 14 31 29 13 25 31 14 38 5 40 5 29 18 20 30 38 5 25 11 15 9 19 16 6 30 22 30 19 31 24 5 31 5 25 11 32 20 31 13 11 32 20 38 5 40 17 27 19 26 33 19 17 13 33 15 23 19 33 5 23 17 14 23 16 14 24 12 20 5 31 8 23 5 24 27 37 14 19 25 1 2 19 25 12 5 31 29 30 19 26 1 14 4 2 3 2 3 4 2 3 3 3 4 2 2 3 5 2 3 4 3 4 3 1 4 3 3 2 2 4 4 2 3 2 5 2 3 4 2 3 4 2 2 6 3 3 5 3 1 3 5 5 5 3 3 4 3 2 2 2 4 2 3 2 2 2 3 4 1 3 4 1 2 2 3 5 5 7 3 2 3 1 2 2 6 5 4 2 4 3 4 2 2 2 2 17 5 2 2 1 5 4 2 2 8 12 17 17 17 363 363 363 363 363 408 51 149 228 491 491 320 7 473 258 258 258 31 342 224 494 494 368 453 168 180 145 329 329 175 175 81 81 469 416 416 96 453 168 470 365 365 365 330 348 212 300 300 382 313 186 162 232 232 105 105 336 470 432 432 330 330 64 64 77 449 224 300 156 382 245 43 43 345 141 141 281 453 342 168 230 230 230 215 35 74 183 485 286 286 382 245 245 43 364 276 174 174 174 348 348 64 64 212 93 93 93 93 93 301 8 8 255 255 349 349 155 155 148 148 372 372 245 245 458 458 208 190 487 487 278 258 31 342 342 86 105 196 217 473 459 459 271 271 39 433 433 160 168 89 55 322 67 394 76 310 338 338 400 400 400 400 30 324 422 186 162 68 68 115 470 470 120 240 240 314 310 338 400 400 400 400 301 378 345 141 141 281 281 9 221 336 144 106 496 88 319 146 135 339 76 36 377 87 87 416 259 445 180 443 443 240 325 34 44 44 44 251 251 241 431 278 26 302 302 497 497 122 32 401 82 144 498 498 498 498 498 139 302 293 497 497 497 122 393 155 155 165 165 165 165 466 466 448 448 464 255 38 38 162 68 115 273 273 265 265 85 85 146 146 175 81 81 242 203 399 70 65 410 410 410 410 173 280 29 29 406 467 89 446 67 58 72 72 72 496 496 496 215 215 35 96 96 272 449 242 275 275 275 303 195 199 335 188 188 340 340 466 22 283 455 38 162 232 105 105 336 491 336 190 380 288 288 288 328 328 200 303 303 117 48 78 491 491 491 491 421 421 491 201 193 193 193 193 +103-1240-0039 103 721 1 31 27 24 4 34 39 36 5 25 11 8 18 4 37 33 6 22 33 19 33 27 37 14 6 16 5 25 11 6 25 13 37 14 31 19 25 31 1 38 20 34 6 33 38 20 11 17 13 33 5 9 28 1 24 4 34 39 36 19 40 17 19 33 19 26 5 29 19 25 39 14 40 39 36 25 27 1 18 20 40 31 19 22 31 33 20 1 4 25 11 18 20 19 40 5 25 31 27 31 29 30 8 13 40 18 20 38 5 25 31 38 3 40 1 18 19 40 18 3 30 33 30 5 9 5 23 40 18 19 24 5 17 35 11 20 23 1 4 25 11 39 36 25 27 18 7 11 13 31 29 30 19 33 18 3 30 11 19 33 31 17 3 33 19 9 20 33 19 17 13 33 18 8 14 11 18 13 23 29 1 9 9 5 3 6 3 2 4 1 2 2 7 2 2 2 5 3 3 2 2 2 7 3 6 6 4 3 1 2 7 5 4 3 3 5 3 6 7 19 3 3 3 5 2 2 2 2 3 2 2 2 4 10 24 5 5 3 2 2 2 3 3 2 2 2 5 3 4 1 4 6 7 3 3 2 3 10 4 6 2 3 5 2 4 4 3 7 1 6 1 2 1 5 2 4 1 3 5 4 5 3 2 7 3 3 2 3 4 2 3 4 5 7 7 17 4 2 3 3 4 4 6 2 3 2 2 2 3 2 2 3 2 4 2 7 7 6 19 5 2 1 4 1 5 5 4 5 7 3 5 2 2 1 2 4 3 3 3 2 2 3 3 4 5 2 3 5 2 1 3 3 2 5 5 4 2 3 3 5 5 15 17 17 17 363 363 51 51 491 184 373 373 66 232 68 172 115 273 84 84 16 16 274 274 399 70 473 65 329 329 460 169 169 164 164 164 485 485 485 374 88 89 89 446 212 131 106 111 111 111 111 85 438 58 110 110 202 202 202 402 221 36 108 119 437 405 405 206 178 35 96 96 272 277 191 325 34 180 410 410 410 410 410 410 173 280 29 334 382 59 59 245 335 14 287 284 405 405 206 169 349 352 25 242 242 242 116 94 199 106 426 426 426 426 282 282 388 195 117 335 335 440 145 463 463 463 463 29 382 313 186 162 342 172 115 273 432 432 330 379 379 243 243 243 77 433 433 433 160 112 56 247 247 312 126 292 292 292 292 1 326 326 326 326 101 101 149 149 228 491 491 320 345 152 152 422 422 164 164 106 106 405 405 206 167 35 35 397 152 152 152 314 90 458 445 180 443 443 240 285 44 44 8 259 354 153 153 153 153 387 387 
207 207 19 454 417 417 417 417 417 237 237 47 491 47 2 491 2 491 491 2 316 316 316 491 316 316 73 491 289 491 7 217 217 473 329 329 329 329 329 329 164 485 485 485 485 374 141 281 281 9 221 336 445 180 443 240 325 449 176 135 135 200 200 180 230 230 230 230 215 35 192 340 340 340 116 33 33 219 219 219 219 286 286 286 286 334 304 304 304 185 49 323 219 219 152 152 152 236 94 331 84 84 84 84 84 16 274 98 263 13 417 417 417 417 435 225 373 451 451 451 30 30 422 186 162 232 232 68 68 115 273 278 278 178 143 96 96 86 86 86 238 272 41 41 41 41 19 19 454 454 225 225 225 83 83 55 55 55 322 212 34 30 30 30 324 356 356 356 356 281 453 342 242 242 379 379 478 478 68 172 344 344 344 344 344 186 162 54 482 142 105 336 336 190 499 499 499 499 85 85 146 146 464 253 253 253 368 342 342 451 30 30 301 378 43 276 174 174 319 319 348 379 77 77 9 142 397 336 276 276 346 346 346 387 355 355 355 37 185 185 269 433 433 112 427 82 247 312 126 292 292 326 326 326 326 326 326 326 326 101 101 149 149 228 491 451 257 257 257 31 342 142 72 72 437 306 306 306 306 306 396 167 167 457 32 401 259 161 161 487 499 151 481 215 215 29 302 497 497 71 71 342 342 57 57 57 57 203 53 44 44 44 416 239 458 484 484 484 484 314 32 32 259 384 371 213 213 213 286 286 139 139 302 375 375 98 98 13 417 229 491 247 126 126 126 292 326 326 326 408 408 408 149 149 228 491 289 491 83 55 55 55 322 67 212 219 219 152 152 152 236 10 309 331 331 84 84 16 274 88 58 72 268 268 268 268 268 268 274 32 32 401 401 401 384 371 443 443 443 150 150 86 105 105 336 29 29 288 313 285 131 72 72 437 306 306 306 306 306 396 396 24 325 34 177 177 143 77 342 142 221 336 144 180 405 405 206 167 167 35 36 377 377 87 87 8 354 420 420 420 246 246 252 325 87 87 416 458 445 180 443 443 240 325 58 72 72 72 437 265 265 85 85 468 468 468 396 313 24 131 58 72 110 351 139 139 139 139 293 375 375 233 233 233 419 229 491 247 312 15 15 15 15 15 15 193 193 193 193 193 17 +103-1240-0040 103 733 1 12 13 30 40 25 13 37 14 1 13 25 20 9 5 11 20 33 5 9 20 18 4 11 9 5 33 12 27 40 31 33 36 29 19 11 18 4 16 17 30 27 25 23 19 33 5 23 16 30 13 25 10 9 28 40 1 4 25 11 13 40 31 36 25 13 40 39 36 11 36 17 19 33 18 38 5 25 9 30 27 22 19 25 33 36 39 35 30 38 15 40 5 25 11 33 6 33 31 5 24 34 19 26 1 18 20 40 5 29 5 25 11 6 16 33 5 12 5 23 3 9 31 33 14 22 4 25 14 20 40 14 12 5 31 33 15 33 31 1 4 33 16 14 31 33 24 4 34 39 36 31 5 17 21 13 31 33 19 11 17 19 33 19 26 5 18 27 24 9 28 1 9 5 33 8 31 13 11 25 27 16 23 4 33 5 12 4 33 1 17 3 1 3 3 3 3 2 9 1 4 4 3 2 3 2 3 3 2 2 4 6 5 2 2 2 2 3 3 3 6 3 4 3 4 3 6 5 6 2 2 4 2 2 2 2 2 2 6 2 1 4 5 3 9 6 18 5 1 2 2 3 5 3 3 3 5 2 2 4 5 3 4 2 3 3 2 4 3 2 4 3 2 2 3 1 2 3 3 4 7 3 2 2 1 4 4 5 3 2 4 3 2 6 10 4 3 4 6 5 3 1 3 6 3 3 2 2 2 4 2 4 2 3 2 5 4 3 4 4 5 4 3 2 4 3 6 4 9 14 4 2 5 7 4 3 5 4 4 2 3 4 2 1 4 3 3 3 2 2 3 2 2 2 4 2 5 7 2 3 12 22 3 2 2 6 6 2 3 5 8 5 3 3 4 2 3 7 8 15 17 17 17 296 296 363 52 52 52 52 52 51 51 51 51 184 491 491 412 0 0 222 356 356 281 9 196 196 479 331 463 463 463 463 29 29 382 245 335 14 14 226 226 226 226 209 209 475 475 475 475 475 475 475 324 301 8 354 180 151 240 240 325 41 324 324 422 36 377 87 87 354 420 420 420 324 3 58 72 72 110 110 486 486 486 460 282 37 24 35 259 159 159 159 236 35 198 127 114 84 496 496 274 186 162 482 482 482 482 482 238 6 272 371 485 374 374 374 8 354 29 191 191 191 37 24 131 472 225 72 72 72 110 110 486 486 460 460 460 169 352 352 402 221 401 336 79 79 288 84 496 496 413 348 250 250 81 278 278 285 285 302 302 497 497 349 234 234 234 261 190 380 288 288 330 64 64 76 310 107 447 447 221 336 354 153 153 153 153 
387 387 304 304 185 185 269 433 160 112 427 491 247 312 126 292 292 292 292 292 326 326 326 408 408 408 149 149 228 491 412 83 55 55 55 322 212 34 34 253 253 31 162 482 482 115 485 485 374 374 374 339 94 199 253 253 253 253 453 9 219 219 152 152 152 301 236 239 384 371 371 374 374 132 132 416 416 445 445 180 443 240 385 131 133 133 364 276 174 174 174 319 348 33 250 394 212 465 190 380 380 496 496 274 143 458 192 192 340 340 33 394 108 377 123 123 219 222 222 222 222 245 245 43 43 276 109 109 403 403 403 207 318 318 318 49 342 168 89 89 116 33 76 108 119 437 437 405 405 405 206 167 167 457 35 77 68 342 273 231 231 231 203 53 76 198 164 214 214 214 328 200 200 117 229 229 247 126 126 326 326 408 408 391 228 491 491 373 451 30 30 356 356 368 453 168 180 230 230 230 230 230 215 35 35 35 401 89 89 89 446 67 212 131 106 106 284 405 405 206 169 349 402 402 6 272 123 123 216 216 22 283 455 251 241 431 405 405 405 215 215 169 270 86 238 6 300 382 382 245 458 445 445 351 351 365 365 365 365 388 94 199 495 495 406 337 41 41 318 318 49 9 168 157 157 157 467 313 313 216 22 22 283 38 162 232 232 238 6 272 470 470 171 171 171 358 358 358 233 270 270 433 433 433 160 18 112 439 439 439 439 237 237 237 237 237 237 237 237 80 80 491 435 435 412 83 415 415 415 131 393 234 261 261 25 498 498 498 498 498 396 186 39 54 86 238 6 472 472 196 217 217 473 65 486 486 460 460 169 164 164 485 485 485 152 152 422 162 323 224 224 494 494 236 36 310 395 470 470 151 151 150 39 86 342 272 191 191 191 314 90 401 259 445 180 443 240 325 176 135 200 200 199 44 44 44 58 72 72 72 350 350 350 350 350 413 413 53 250 250 212 354 354 153 153 153 387 387 387 207 207 19 454 229 82 247 126 126 126 126 326 326 326 326 326 326 326 326 326 101 408 408 149 391 491 289 289 289 320 159 159 159 240 285 111 111 111 111 438 438 186 162 342 68 273 470 120 240 240 314 196 196 309 479 331 331 84 84 84 16 16 274 274 349 349 234 234 261 425 386 431 376 376 460 167 167 36 108 377 123 123 216 127 114 92 92 92 92 92 92 282 385 385 233 227 419 439 417 417 237 237 237 421 491 421 421 491 491 128 491 128 193 193 17 +103-1240-0041 103 675 1 12 15 24 15 9 20 6 23 30 8 33 8 24 25 3 33 31 15 19 26 12 13 30 25 3 33 1 9 5 33 25 27 23 5 25 11 5 25 31 33 30 20 33 4 30 5 9 40 16 14 24 20 1 8 31 13 11 1 17 19 37 24 20 5 25 15 33 19 37 9 6 30 25 4 33 23 20 31 33 1 12 13 30 5 23 9 20 5 30 19 31 22 25 27 24 4 33 14 18 36 38 20 17 13 33 1 9 5 33 8 23 16 20 23 1 20 40 20 14 19 25 24 8 24 8 25 11 5 25 11 31 23 20 29 31 7 25 11 14 4 33 25 8 33 31 19 16 38 20 17 13 33 5 9 6 30 25 22 5 25 15 11 20 5 25 1 17 3 4 3 5 2 4 6 4 4 10 3 3 2 3 5 3 5 5 2 2 2 2 4 4 9 7 8 2 3 3 4 5 6 2 3 2 2 3 4 3 2 2 3 5 5 2 2 5 2 3 4 10 4 8 6 4 6 19 3 3 2 4 4 2 6 5 2 2 3 3 4 2 2 2 3 4 6 6 6 19 2 2 1 2 1 3 3 5 5 5 4 4 3 3 4 2 3 2 5 3 2 4 4 5 6 22 3 1 3 4 2 5 4 3 1 9 4 2 4 3 2 3 5 5 9 3 2 1 2 3 4 2 4 3 6 5 3 2 3 2 3 2 6 4 2 2 2 2 2 3 2 2 1 5 4 3 2 4 2 2 6 2 4 1 5 22 17 17 17 296 211 211 52 52 52 363 363 363 408 51 51 228 491 491 320 114 0 0 0 301 399 473 65 476 476 476 171 252 8 420 420 420 324 464 106 297 297 297 297 297 293 293 42 42 42 147 380 499 499 499 428 428 85 85 85 207 207 358 24 34 34 111 111 319 203 53 10 479 307 307 307 307 61 167 167 478 478 68 68 172 115 470 403 403 403 403 135 135 135 200 248 212 127 114 222 222 468 313 313 10 10 479 331 307 307 307 307 426 426 206 206 206 385 24 227 419 419 439 439 225 225 225 225 225 80 80 289 320 159 159 159 159 236 35 196 196 479 331 231 231 231 16 274 274 274 251 251 241 431 431 319 319 348 64 64 212 34 242 242 116 394 478 162 482 482 238 6 161 
487 487 213 213 252 252 335 14 411 145 145 486 486 468 468 467 467 467 134 215 8 270 270 86 9 142 393 155 332 332 332 245 399 217 429 429 429 429 246 246 246 19 19 454 454 225 225 417 417 80 80 412 287 111 111 111 438 438 186 162 342 68 115 273 470 120 120 120 37 24 24 404 427 229 491 247 312 126 292 292 292 326 326 326 408 408 408 391 228 228 289 289 289 320 209 445 278 278 278 314 314 196 217 429 429 429 429 464 464 44 44 44 10 10 10 309 479 331 171 171 171 252 325 34 494 173 402 402 221 259 354 354 153 153 153 387 372 396 285 415 415 415 415 457 26 251 241 431 444 444 213 213 246 358 358 39 433 433 86 86 6 6 227 419 439 417 417 237 237 237 237 491 491 491 47 47 491 491 491 316 316 316 73 73 80 435 412 114 0 139 139 139 293 293 8 420 420 420 420 464 44 44 44 44 42 42 147 147 380 288 278 278 278 271 271 39 342 86 105 105 144 472 196 196 331 331 231 231 274 399 217 473 65 486 460 240 285 300 382 245 58 72 72 489 489 374 132 132 8 152 152 152 152 324 416 458 445 445 180 120 120 120 37 385 233 227 419 427 229 491 247 312 126 292 292 292 292 292 326 23 23 326 326 326 101 101 101 149 149 228 491 289 320 159 159 285 285 106 111 111 284 481 293 169 349 205 205 25 485 485 485 139 139 293 497 335 14 411 411 213 213 213 213 213 318 368 368 453 168 41 324 485 382 406 467 467 340 340 340 116 33 250 70 46 46 46 46 438 438 399 217 70 65 480 480 480 480 480 480 85 299 299 299 299 339 64 212 89 89 322 116 394 478 478 232 232 68 26 26 81 444 213 252 215 129 401 478 232 68 68 115 273 470 315 315 315 450 450 413 64 64 131 300 382 382 467 415 415 236 35 196 196 479 331 265 428 428 85 146 358 233 270 342 224 118 118 118 118 402 345 152 152 152 458 445 180 443 443 285 34 44 44 8 32 259 354 153 153 153 372 372 467 467 299 394 76 465 445 351 351 116 94 199 331 171 171 171 171 252 325 34 324 324 464 275 275 275 303 48 48 417 417 417 491 237 237 237 491 421 421 421 491 128 128 491 128 491 305 128 128 193 193 17 +103-1240-0042 103 780 1 31 27 19 25 12 20 13 25 11 38 20 11 19 31 8 11 19 11 33 36 4 31 22 24 19 31 19 40 31 29 13 25 31 14 33 19 29 19 22 5 31 7 33 38 5 25 38 13 25 32 20 38 13 25 33 27 37 14 33 19 17 13 33 18 14 23 19 33 5 23 17 14 23 1 38 20 18 14 11 23 4 31 33 38 20 22 32 20 38 5 40 17 27 19 26 1 31 27 38 20 31 13 25 33 18 14 38 14 11 9 8 30 19 10 14 11 31 29 13 25 31 14 40 16 27 22 31 4 33 22 3 30 24 5 11 20 33 19 9 30 19 26 5 31 5 31 24 3 30 33 1 23 8 22 23 20 9 28 5 37 5 9 7 33 13 25 14 19 23 13 37 5 25 1 38 20 11 19 31 8 11 19 11 12 4 33 38 35 11 9 20 12 5 9 13 31 33 15 21 1 23 7 4 3 3 3 5 3 3 3 2 3 3 2 6 6 2 4 1 3 4 6 5 3 2 2 4 2 2 3 3 3 3 5 4 3 2 3 3 3 4 3 7 4 5 3 5 2 2 3 4 3 3 3 1 4 4 3 2 3 2 3 3 2 4 3 3 2 2 3 1 3 7 6 31 4 3 3 5 3 3 4 4 3 2 4 3 4 2 3 2 3 4 4 4 9 12 7 3 3 3 6 2 2 2 5 4 7 4 3 3 6 2 3 4 2 4 2 2 3 4 4 3 3 4 6 3 2 2 2 5 2 3 3 1 3 4 3 2 3 1 2 4 1 4 3 6 3 4 3 3 4 5 5 5 4 2 4 11 1 3 3 2 9 10 2 3 3 3 4 3 3 2 6 27 4 2 2 3 5 7 2 4 3 3 2 4 1 2 2 3 4 1 2 3 5 6 3 5 8 7 17 17 17 296 305 317 317 491 317 491 317 461 491 461 461 435 435 491 435 435 491 491 435 289 373 66 68 115 273 273 84 16 88 88 109 340 340 340 466 466 22 283 448 448 448 464 464 432 432 432 330 330 388 195 64 131 133 345 152 152 152 422 314 239 371 490 490 38 342 68 115 273 106 265 265 265 85 146 146 325 34 191 191 191 37 314 36 377 87 87 87 14 14 145 145 376 376 460 460 150 150 342 105 221 336 96 196 217 473 258 258 31 342 224 494 494 494 31 162 232 86 105 336 354 470 432 432 330 379 64 77 77 224 224 300 334 334 59 313 313 36 377 87 87 87 129 74 74 351 278 416 416 144 180 180 151 240 368 453 342 168 180 113 113 113 113 450 167 
35 131 133 133 364 364 276 174 174 174 174 348 348 195 195 250 250 345 409 409 409 116 64 76 310 338 400 400 30 301 378 43 345 109 109 330 330 64 76 449 449 180 410 410 410 410 8 29 29 382 313 236 36 377 87 87 416 458 445 180 443 240 385 131 58 156 156 156 156 313 313 251 251 81 431 278 285 26 302 302 497 497 416 458 144 498 498 498 498 498 134 302 375 375 98 98 13 229 229 491 312 312 126 292 292 292 326 326 326 326 326 326 326 326 326 326 326 326 326 326 326 101 101 101 101 149 149 228 491 491 320 152 152 152 422 58 58 72 498 498 498 498 396 313 314 35 26 241 241 376 376 376 460 169 150 86 86 6 272 472 397 354 109 213 213 213 358 143 458 96 99 338 400 400 400 301 378 8 141 141 281 281 9 221 221 144 180 84 84 496 88 88 176 176 176 328 328 200 117 117 454 454 439 78 491 491 312 126 126 326 326 326 101 408 408 149 228 491 373 66 66 115 273 84 84 16 43 43 345 152 152 152 422 162 68 68 115 273 273 432 330 330 64 131 131 183 156 156 156 156 156 156 245 43 43 364 276 276 109 498 498 498 59 396 313 24 131 472 259 354 62 62 62 62 438 438 42 147 380 288 329 329 36 107 395 300 382 313 314 478 478 478 172 105 336 470 432 432 330 330 33 394 77 54 107 395 382 382 313 186 31 54 142 393 336 25 25 496 496 496 496 274 215 233 270 270 342 224 415 415 325 472 458 144 27 437 437 306 306 306 396 396 53 53 469 469 24 325 41 41 41 324 422 36 108 377 87 87 8 239 190 380 288 360 360 200 200 464 459 271 31 342 224 44 44 38 162 232 232 482 105 105 196 70 65 65 306 306 306 396 396 385 131 472 225 225 225 225 225 225 7 251 241 431 266 266 266 266 146 178 35 35 401 26 359 359 166 166 324 301 8 129 354 354 153 153 153 153 387 387 387 207 464 464 464 69 130 130 280 255 255 236 8 354 180 113 113 113 113 113 450 167 167 457 401 401 401 75 108 119 351 351 351 432 330 388 199 199 495 495 406 467 134 302 251 251 241 431 443 443 443 173 280 29 275 275 275 303 303 303 48 13 229 491 491 312 312 312 292 292 292 292 292 21 21 21 21 21 21 21 21 21 408 408 149 149 149 491 491 491 320 152 152 152 422 143 384 490 490 490 31 342 68 115 273 470 265 265 428 85 146 146 325 325 191 191 191 191 314 314 198 127 114 114 92 92 92 167 457 364 345 389 389 314 129 259 354 420 420 420 301 216 22 283 455 236 259 354 180 443 443 443 169 150 150 39 342 86 238 272 371 470 93 171 171 171 358 358 233 310 107 107 112 439 417 417 237 237 128 193 193 193 +103-1240-0043 103 737 1 27 23 11 19 25 5 16 33 19 9 20 5 37 31 5 24 39 36 31 19 25 11 36 19 26 10 6 30 40 30 8 33 6 16 1 5 25 11 39 5 26 19 25 5 16 33 5 9 20 33 30 15 25 11 5 29 30 3 29 14 1 38 20 24 20 25 33 19 17 19 37 19 24 5 17 35 11 18 27 24 4 25 11 31 22 36 23 19 26 1 38 20 18 4 11 5 33 13 23 5 17 30 4 24 16 14 24 19 31 19 40 4 23 19 17 40 4 25 11 14 31 29 13 25 31 14 33 5 11 15 1 12 5 24 15 23 24 4 25 9 30 6 33 19 33 16 14 24 12 5 31 33 15 32 5 25 1 31 15 19 26 12 15 38 14 22 5 24 19 26 3 25 12 5 16 8 37 34 14 11 20 33 30 15 25 33 5 25 8 33 1 7 6 5 2 2 3 4 3 3 1 4 6 2 3 6 4 4 5 5 5 2 2 3 4 3 4 6 7 5 7 3 5 3 9 9 2 6 1 2 4 4 3 1 3 2 3 2 2 1 4 5 3 4 2 3 5 7 2 3 4 8 27 3 3 4 3 3 2 2 3 1 2 2 3 3 5 2 2 5 6 5 4 2 3 5 2 6 2 2 6 11 5 2 3 2 2 2 6 2 3 3 1 3 5 4 3 2 3 3 3 3 3 3 3 2 2 3 4 2 2 3 4 3 1 5 3 3 3 2 2 13 18 3 2 4 5 4 3 4 3 4 2 4 3 3 2 3 2 2 2 2 3 3 4 6 2 5 12 7 7 1 5 2 4 3 3 6 3 3 2 4 3 2 2 2 6 7 2 6 3 2 4 6 3 5 2 3 2 3 9 6 6 17 17 296 296 296 184 184 412 209 287 424 424 424 424 424 274 274 122 285 34 34 242 116 479 331 230 230 230 169 349 402 96 36 377 87 87 87 129 354 420 420 420 420 246 3 464 223 223 130 402 478 232 232 232 172 115 273 231 231 231 231 203 53 53 219 219 219 219 219 485 374 374 132 
132 186 39 54 342 224 89 340 116 33 394 212 384 371 374 374 88 88 176 176 135 200 200 248 76 465 310 107 395 395 441 441 153 153 153 182 372 372 372 372 304 304 185 185 269 269 9 142 97 397 336 147 380 499 499 428 85 146 146 325 34 106 106 106 426 426 426 426 206 169 169 352 352 352 352 352 352 97 97 225 225 225 83 55 55 55 322 67 64 212 219 219 219 464 180 180 319 319 348 200 464 242 116 94 331 230 169 169 402 402 6 377 87 87 420 420 420 422 422 129 310 161 161 487 487 288 288 290 290 434 434 339 64 212 131 180 230 230 230 167 167 457 401 401 491 190 190 190 488 488 488 405 206 215 215 35 29 334 334 59 59 452 452 263 229 491 247 312 126 292 292 292 1 1 1 1 21 21 21 21 21 21 21 260 260 260 260 391 391 391 491 491 320 345 152 152 152 301 399 217 473 360 360 360 434 339 64 64 108 377 87 87 416 445 485 278 173 280 57 57 57 53 473 44 44 44 416 129 259 144 484 484 484 285 131 58 72 72 72 437 350 350 350 350 350 413 203 381 335 335 14 440 145 194 446 446 33 394 478 478 482 482 482 482 105 336 208 441 153 153 153 182 182 175 81 176 176 328 328 303 117 48 417 417 417 417 237 237 237 491 47 80 491 80 491 7 7 152 152 152 58 58 110 110 254 254 240 34 44 44 236 36 108 119 119 351 486 139 175 175 81 81 469 416 8 79 380 288 288 365 365 282 203 203 53 394 393 155 155 332 332 165 399 217 473 258 258 258 31 342 224 494 494 368 453 168 168 145 329 329 329 175 81 81 469 416 416 453 453 470 365 365 365 365 388 64 212 300 382 382 313 186 54 54 105 336 354 470 432 330 379 379 77 77 54 224 300 334 313 236 36 377 377 87 236 236 93 93 93 93 93 93 207 207 207 207 19 454 229 247 247 126 126 126 326 326 326 326 326 326 326 326 101 101 149 149 491 289 491 127 5 5 455 399 217 473 65 290 290 171 139 139 139 293 293 399 217 65 136 136 136 136 282 388 33 394 32 259 354 190 380 499 405 405 206 206 285 449 34 277 277 24 314 393 155 155 165 165 165 165 466 22 22 283 38 162 342 238 6 272 470 470 171 171 171 358 99 436 436 60 60 298 298 303 303 117 48 229 491 247 126 126 326 326 326 408 408 408 149 228 491 373 66 68 68 68 273 470 403 403 403 403 207 135 135 135 200 200 248 212 127 0 0 0 0 378 378 347 347 347 347 245 143 458 144 27 437 437 319 319 319 53 53 176 176 135 328 200 200 199 125 125 125 125 348 466 283 455 38 349 234 234 261 25 346 265 265 85 85 146 146 438 349 349 234 234 261 164 273 498 498 498 313 285 34 41 324 324 422 143 259 161 161 161 487 487 288 290 290 290 434 434 434 339 394 36 377 377 87 236 10 479 331 331 428 428 428 428 207 207 358 358 233 465 227 419 439 78 421 491 491 193 193 17 +103-1240-0044 103 819 1 31 27 24 4 34 39 36 38 13 25 33 5 9 30 8 33 30 19 37 14 33 5 24 20 33 18 19 24 1 24 19 31 19 40 31 29 13 25 31 14 38 5 23 11 30 3 29 19 24 6 16 12 13 30 1 5 37 22 6 30 31 1 32 20 17 27 40 3 25 33 5 38 8 33 31 4 25 11 40 31 33 15 32 5 25 18 14 31 13 23 16 1 24 19 31 19 40 30 15 10 5 23 29 30 8 11 19 11 18 14 31 13 23 16 3 25 6 23 38 20 40 31 29 20 22 19 26 18 14 24 8 25 11 1 32 20 29 30 5 31 20 11 5 11 33 19 31 29 20 22 19 33 25 7 1 18 4 37 19 26 5 21 5 31 33 19 11 18 14 24 13 25 33 5 23 4 33 5 33 36 11 33 19 12 19 31 5 24 15 40 19 26 29 20 31 5 37 25 39 36 40 1 8 7 4 4 6 3 2 4 3 2 2 4 2 4 1 6 4 2 3 2 3 3 3 3 3 2 3 2 5 23 3 2 5 3 3 5 2 3 4 4 3 2 2 2 3 2 3 4 2 3 5 5 2 2 9 24 4 2 3 3 5 6 11 5 5 3 7 4 5 2 2 3 4 5 4 3 5 2 2 1 2 3 3 4 2 1 4 3 5 3 5 9 53 3 3 4 2 4 3 3 3 3 3 5 3 4 2 3 2 4 4 4 3 4 4 1 3 6 4 3 3 2 5 3 3 2 2 3 4 3 3 6 3 4 18 7 3 2 3 2 5 3 2 2 3 2 3 3 3 3 2 3 3 2 11 5 3 4 2 2 4 3 5 3 4 3 2 2 4 2 6 2 2 2 2 4 6 2 3 5 5 1 3 3 2 2 4 2 4 7 4 2 4 4 4 4 2 3 2 4 3 12 12 17 17 17 296 296 184 184 184 435 435 
66 172 115 273 273 84 344 16 274 399 399 473 65 486 486 486 460 460 169 164 164 164 485 485 485 301 378 43 364 109 109 189 330 330 64 76 465 377 123 123 236 32 259 354 190 380 499 428 428 85 146 146 35 35 133 133 147 288 288 278 173 280 29 29 382 313 236 108 377 87 87 399 217 473 213 213 213 252 325 325 183 57 57 57 57 203 381 117 404 13 229 491 247 312 126 292 292 292 292 292 292 21 326 326 326 408 408 408 408 149 149 228 491 320 217 473 258 258 258 342 342 224 494 494 494 258 31 162 232 232 68 68 105 105 336 470 329 329 330 330 379 64 77 342 224 300 300 382 245 245 43 345 109 389 497 497 122 239 161 79 499 499 405 206 215 35 29 57 57 57 203 70 106 426 426 426 426 169 169 352 352 402 198 127 114 114 264 264 264 264 59 59 59 452 263 263 417 417 417 417 170 170 47 47 491 491 2 2 47 2 316 2 491 491 316 316 73 73 289 435 435 83 255 255 130 402 458 144 441 441 153 153 372 372 396 271 186 39 342 323 97 427 247 247 126 126 326 326 326 326 101 101 149 149 491 373 338 400 400 400 400 30 301 416 416 180 180 84 84 496 496 274 71 368 368 453 168 106 426 426 426 426 413 348 64 465 377 123 123 123 43 276 346 346 346 428 85 146 146 252 36 478 66 68 115 470 486 365 365 365 330 388 33 77 77 342 68 238 6 272 470 171 171 252 99 99 436 60 60 116 94 58 58 156 156 156 313 186 186 162 68 115 273 273 279 279 279 279 279 375 375 352 352 352 352 352 352 112 112 417 417 237 237 237 237 237 237 491 237 237 491 237 237 237 362 362 362 362 491 491 362 362 362 491 218 491 218 491 491 211 218 218 366 366 491 491 366 366 366 366 491 366 366 366 366 163 316 316 316 316 73 73 491 320 7 473 258 258 258 31 342 224 494 494 494 31 9 142 397 147 380 329 329 329 329 329 310 107 395 302 302 497 497 122 129 259 190 190 190 488 499 265 265 85 146 146 325 325 34 382 313 285 325 183 156 156 156 156 396 313 186 162 172 115 273 279 279 279 279 279 293 169 352 352 155 125 125 322 94 335 14 14 411 297 297 297 297 297 293 43 345 109 109 109 171 422 186 162 68 68 68 105 105 336 354 213 213 213 143 192 192 135 135 135 200 248 58 156 156 156 156 245 245 399 70 65 480 480 480 480 85 299 299 299 303 243 227 419 427 56 491 247 312 126 292 292 326 326 326 326 326 326 326 101 101 408 228 228 373 338 338 400 400 400 400 301 143 129 74 190 492 492 492 186 162 342 172 444 444 444 444 252 325 325 191 191 191 314 36 108 87 87 87 38 342 86 105 336 470 213 213 213 143 458 192 277 277 277 314 196 196 479 331 331 315 315 315 315 450 450 450 98 263 417 417 417 225 225 225 72 72 110 202 202 202 202 202 280 135 135 135 200 464 464 255 255 236 239 259 107 395 180 151 151 169 150 150 86 238 6 272 191 191 191 240 58 183 156 156 156 156 245 399 399 217 217 473 432 432 330 348 64 212 449 302 302 497 497 14 14 411 145 145 486 460 240 325 449 469 469 469 236 259 108 449 485 485 374 374 374 37 24 259 377 377 123 123 216 22 283 283 38 162 68 342 224 494 494 399 217 217 473 290 290 171 171 171 252 318 368 342 342 176 176 176 328 200 248 248 76 74 485 213 213 213 213 186 39 342 224 462 462 462 462 402 196 196 398 398 398 398 398 374 374 132 132 185 185 185 323 390 18 112 427 56 56 491 491 15 15 15 15 15 193 193 193 193 17 17 +103-1240-0045 103 783 1 38 13 23 24 3 30 19 23 5 1 8 23 21 5 31 33 13 23 39 36 29 23 15 25 12 5 33 8 34 19 26 22 39 35 30 11 36 19 26 5 24 8 33 20 16 36 23 19 32 34 19 26 1 5 30 19 31 22 20 34 19 26 12 4 33 31 38 5 33 1 39 36 11 27 25 33 25 27 38 5 33 39 35 30 17 13 33 19 26 1 39 35 30 9 30 19 26 19 26 5 31 33 30 15 25 21 10 8 23 11 19 25 33 36 39 6 30 18 7 31 5 25 11 18 27 24 1 4 25 11 39 36 11 27 25 33 25 27 5 31 19 26 17 5 23 34 19 26 5 9 7 33 
19 24 25 6 30 38 5 33 18 19 40 11 19 31 29 5 40 19 32 5 25 19 40 23 8 22 25 6 30 38 5 33 31 6 30 33 5 37 29 13 30 5 25 33 31 18 20 18 4 11 1 15 8 5 5 3 2 3 3 3 7 12 9 5 3 3 5 5 3 3 4 4 5 3 8 2 2 3 2 8 5 2 4 3 3 1 4 3 5 3 3 2 4 4 4 2 7 5 2 2 7 3 1 11 12 10 5 2 5 3 4 5 3 8 3 4 3 3 3 5 7 21 4 3 3 4 1 2 3 5 3 3 2 2 2 3 4 3 2 3 6 11 6 2 4 4 3 2 2 3 4 3 5 3 2 4 4 4 5 6 3 3 2 3 3 2 1 4 2 4 5 4 1 2 1 3 6 4 12 7 2 2 2 4 4 5 1 2 2 6 2 7 2 5 1 2 2 4 2 4 2 3 6 3 4 3 2 4 4 4 2 1 2 1 3 3 2 3 3 2 3 3 5 1 2 3 3 3 6 5 4 4 4 4 3 4 3 2 2 2 2 3 6 3 6 2 2 2 2 2 4 4 5 4 6 17 17 17 17 296 363 52 51 51 51 51 491 184 184 491 184 7 7 7 364 276 109 109 109 443 443 139 139 293 293 293 497 399 217 70 473 65 329 495 406 406 467 134 139 139 175 423 423 423 423 423 263 263 417 417 417 237 237 237 237 237 201 237 80 435 435 435 440 287 111 111 111 111 139 139 293 293 293 122 35 310 107 395 395 151 151 31 342 342 86 238 6 108 119 119 351 443 151 139 240 240 219 219 477 477 477 477 477 132 8 259 74 74 425 425 386 386 290 290 290 290 434 434 434 434 434 339 466 212 127 45 45 45 325 34 111 111 111 111 111 438 438 438 422 349 164 164 214 214 214 360 360 200 248 76 465 219 219 152 152 222 498 353 353 313 236 239 371 371 374 374 88 88 176 135 135 135 200 44 44 44 399 70 65 65 428 428 146 146 252 449 449 324 324 422 349 349 234 234 261 25 441 153 153 153 153 132 81 81 459 459 469 99 447 447 447 447 238 336 214 214 214 214 214 328 328 200 303 303 404 404 229 491 247 312 126 326 326 326 101 101 101 149 149 228 491 287 287 44 44 44 44 44 42 42 42 147 147 380 288 278 278 31 342 86 86 105 105 336 485 41 324 324 422 349 164 164 164 214 214 214 214 328 328 200 200 200 200 248 248 248 127 114 92 92 92 92 169 35 77 66 142 397 397 276 346 346 346 355 355 37 37 24 227 419 419 439 78 491 491 312 312 312 312 292 292 1 21 21 21 21 21 408 408 408 408 149 228 491 289 219 152 152 152 152 236 325 371 180 84 84 496 350 167 457 457 479 331 84 84 84 16 274 43 43 276 181 181 181 181 35 449 449 485 152 222 353 353 245 416 458 445 180 443 443 240 325 449 176 176 135 328 200 117 117 48 414 414 47 47 47 47 491 491 47 491 491 80 491 7 7 219 219 152 152 222 353 372 245 245 245 129 259 354 190 380 288 288 360 360 200 135 135 135 135 200 200 44 44 44 44 162 232 482 482 482 238 336 161 487 288 290 290 290 434 339 339 64 76 107 447 447 6 6 119 351 437 91 91 265 85 85 85 139 139 293 122 122 131 34 340 340 116 33 394 465 377 123 123 219 477 222 222 222 372 372 245 58 72 268 268 268 268 268 268 169 186 269 323 224 242 116 33 58 58 72 350 350 350 350 274 274 203 381 117 229 247 247 126 126 326 326 326 326 326 101 149 149 228 491 412 83 55 55 55 55 322 67 212 219 152 152 152 152 132 236 239 239 371 180 84 84 496 274 274 274 457 196 479 331 84 84 274 88 88 44 44 44 38 232 232 68 68 115 273 278 360 360 200 200 64 212 302 302 302 497 497 349 205 259 214 214 214 214 200 200 464 255 255 8 354 180 113 113 113 113 113 167 285 449 57 57 57 57 203 203 195 10 309 331 157 157 157 157 372 245 245 43 364 364 181 181 181 181 285 449 34 356 281 453 342 6 272 490 490 490 31 9 105 336 336 494 494 494 368 453 168 418 418 418 99 436 436 60 60 298 116 199 356 356 281 31 9 26 26 241 266 266 266 266 266 146 146 358 143 458 192 472 196 309 479 331 157 157 157 157 372 372 245 245 43 364 276 181 181 181 167 167 35 478 478 68 224 273 153 153 153 396 285 285 462 462 462 402 129 259 74 74 351 351 351 351 264 468 468 468 468 467 467 467 11 275 379 379 77 77 342 342 451 30 30 30 30 58 58 110 110 110 486 486 460 460 240 24 131 404 229 247 126 326 193 193 17 +103-1240-0046 103 696 1 25 6 30 18 7 
18 20 40 23 8 22 23 20 33 19 33 14 25 7 33 1 38 8 19 33 38 5 40 27 25 23 20 23 4 31 33 38 20 22 8 30 13 11 19 25 12 5 29 15 29 14 18 7 5 24 4 25 4 25 11 18 19 40 38 8 16 5 29 38 13 31 33 5 37 12 20 8 23 5 25 11 1 33 35 22 5 9 28 7 33 5 37 5 25 6 30 16 5 25 5 31 8 23 5 24 4 25 11 18 20 31 13 33 16 8 30 33 5 12 5 18 7 31 4 33 25 8 33 1 31 13 33 19 33 3 25 29 14 29 5 31 24 3 30 19 23 5 1 4 25 11 25 19 30 23 20 9 14 25 33 12 5 24 33 36 5 22 30 19 31 29 19 25 12 13 30 9 13 11 40 1 6 5 6 2 4 6 3 3 4 3 3 4 2 2 2 3 4 5 2 5 8 19 7 6 2 2 2 2 3 4 3 2 4 3 5 4 2 3 4 3 6 4 3 2 3 2 1 2 4 4 5 5 3 6 3 5 7 3 1 2 1 2 3 4 4 5 4 4 3 6 4 5 2 2 1 2 4 5 4 3 2 3 13 3 4 2 3 4 10 4 3 2 2 2 2 6 3 3 3 2 3 4 5 4 3 3 3 2 1 3 5 5 3 3 7 7 4 2 2 1 3 4 5 5 1 4 2 6 5 16 8 3 3 2 6 5 5 5 4 2 4 5 2 1 4 3 3 7 8 6 1 3 2 4 3 3 4 6 4 2 3 1 3 2 4 3 3 5 1 2 5 2 3 1 2 2 3 3 4 6 8 14 17 17 363 51 51 228 491 7 309 479 331 157 157 157 387 372 372 396 313 58 72 110 268 268 268 268 268 274 274 183 451 30 30 30 356 368 342 9 26 251 241 266 266 266 266 178 458 96 26 359 474 474 301 236 87 87 87 87 36 108 119 308 308 308 308 396 313 94 199 180 113 113 113 113 450 450 413 233 227 419 439 78 170 491 312 187 187 187 187 12 12 12 12 260 260 260 391 391 149 491 491 491 7 7 276 346 346 346 265 85 85 146 464 177 177 177 177 133 133 141 141 141 281 342 168 106 350 350 350 350 348 250 359 166 166 324 301 251 251 241 376 376 376 376 460 169 150 342 86 238 272 397 397 109 109 213 213 213 143 458 144 180 106 111 111 111 438 438 42 147 147 380 288 443 240 240 325 34 340 340 116 466 22 283 455 129 74 351 351 351 171 171 252 215 215 259 29 334 334 59 59 245 58 72 72 268 268 268 268 88 88 88 44 44 399 217 217 473 65 136 136 136 136 136 136 136 282 388 94 34 89 340 116 131 183 257 257 257 257 281 9 142 221 336 364 276 346 346 428 428 146 146 349 205 352 106 230 230 230 215 215 35 35 133 364 364 276 109 109 443 443 443 169 150 39 86 86 238 6 272 69 223 130 198 22 448 448 464 106 106 265 85 85 85 146 175 175 81 81 275 275 116 64 131 427 229 247 126 326 326 326 326 326 101 101 149 228 289 289 491 108 377 295 295 295 295 35 192 44 44 44 8 8 8 354 153 153 153 387 387 387 146 464 464 113 113 113 113 206 285 449 34 69 223 130 44 44 44 94 335 14 411 411 153 372 372 372 396 349 349 234 261 25 242 116 94 199 255 38 31 342 68 115 273 106 265 265 85 85 85 175 175 81 81 203 203 381 404 335 440 55 55 322 67 131 183 451 451 30 30 30 422 186 162 68 68 115 273 189 443 240 385 131 472 393 393 234 234 261 261 25 265 265 265 85 146 146 300 382 382 313 143 36 377 123 123 216 283 283 455 72 72 268 268 268 268 268 169 169 39 342 342 224 415 415 415 314 401 196 479 331 428 428 428 428 358 358 233 36 227 427 427 247 247 312 126 292 292 292 292 292 23 408 408 408 408 391 491 491 373 66 66 68 68 115 273 470 443 240 325 449 277 277 277 277 325 335 14 14 287 284 125 125 125 348 348 195 33 394 76 401 82 74 492 492 492 492 396 215 35 354 459 459 459 271 39 342 86 142 196 70 65 65 495 406 406 467 288 139 139 175 175 423 423 423 423 263 229 229 247 126 126 326 101 408 149 149 491 412 83 83 55 55 322 322 67 10 10 309 479 398 398 398 398 468 468 313 359 359 166 166 166 324 301 301 32 32 32 354 354 498 498 308 313 348 64 76 198 198 114 57 57 203 53 76 465 377 377 123 123 123 88 44 44 44 129 458 208 208 190 487 278 278 31 342 86 105 336 336 354 340 340 116 466 466 114 222 222 222 468 245 8 8 354 470 120 120 330 240 379 243 233 270 270 433 433 160 112 112 56 56 421 491 421 491 491 491 421 421 128 491 128 128 193 17 17 +103-1240-0047 103 644 1 4 25 11 8 25 27 5 25 5 12 14 22 15 31 38 13 30 5 25 5 
11 3 29 33 5 11 9 28 39 36 40 11 33 5 31 5 22 12 20 13 17 40 1 12 15 22 35 11 5 25 33 9 30 15 22 18 19 24 5 37 19 33 1 19 16 39 36 18 4 11 4 31 33 24 8 5 11 37 8 31 19 25 12 5 24 4 33 14 1 38 19 10 39 36 11 19 11 5 25 33 1 11 36 24 3 30 19 23 5 1 8 11 18 4 37 31 13 11 16 6 30 24 14 31 20 40 31 15 22 25 3 33 5 34 19 26 22 5 37 31 5 10 5 34 19 26 12 4 33 31 38 5 33 1 15 7 2 2 5 4 6 1 4 4 3 2 6 5 4 3 2 3 2 2 3 3 5 3 2 2 3 4 6 6 3 4 1 2 3 7 3 6 3 5 4 7 6 17 3 5 5 3 2 4 2 2 2 3 2 3 2 2 2 4 2 5 5 16 8 4 3 4 5 1 3 5 4 2 2 6 2 1 3 7 5 2 2 1 2 2 4 4 5 16 4 2 7 3 3 4 2 3 1 4 5 2 2 7 3 2 3 3 3 8 14 10 1 2 1 3 7 4 4 6 3 3 8 3 6 3 4 6 5 4 3 4 6 2 7 1 3 3 2 2 3 3 3 3 4 4 6 2 5 3 5 3 2 11 19 17 17 17 296 296 317 305 305 317 461 491 491 435 435 435 435 287 83 194 194 194 194 322 67 212 34 111 111 111 111 438 438 10 479 331 84 84 88 88 88 44 44 348 10 10 479 331 493 493 493 216 300 300 382 245 143 465 445 351 351 343 343 343 171 358 368 342 9 142 397 336 345 347 347 347 406 467 467 467 340 116 199 255 255 236 236 384 180 106 405 405 405 215 215 96 272 449 191 191 191 314 314 32 401 259 354 153 153 153 387 387 387 146 146 219 219 219 219 485 374 374 368 186 323 323 238 6 272 87 87 87 38 162 68 68 68 68 115 273 151 151 178 178 35 458 96 472 164 198 22 448 448 448 448 464 180 443 443 120 120 120 416 416 233 233 270 49 433 433 433 160 427 247 247 126 126 126 292 326 326 326 326 326 326 408 408 149 149 228 289 491 127 114 0 0 0 0 422 143 458 144 27 389 389 389 389 314 35 196 242 242 33 33 76 465 401 259 354 190 380 288 288 295 143 458 192 183 57 57 57 203 88 69 223 223 223 130 280 277 277 277 277 385 24 227 419 419 439 439 439 439 237 237 237 237 47 47 47 491 491 316 491 80 373 412 412 188 188 118 118 118 118 118 118 402 219 219 152 152 152 132 132 58 58 72 110 110 254 254 240 325 34 145 145 460 460 169 150 342 86 6 472 221 70 46 46 46 46 464 464 255 255 240 314 4 280 106 265 265 265 85 85 146 358 39 39 342 342 224 340 340 466 466 22 283 399 473 65 486 486 460 460 285 449 334 334 382 59 452 229 229 247 126 126 326 326 326 326 326 101 101 101 149 149 228 491 289 320 345 407 407 407 407 35 36 310 107 447 219 219 219 152 152 152 132 236 32 239 384 371 371 278 278 325 242 242 242 379 243 243 36 227 472 472 221 336 336 384 371 374 374 374 374 132 132 132 399 70 473 65 329 329 42 406 467 134 139 139 175 423 423 423 423 423 452 263 229 259 247 312 126 292 292 23 23 23 408 101 149 149 149 228 491 412 287 111 111 111 111 111 438 438 325 34 202 202 202 402 402 162 232 68 68 172 115 470 470 120 120 240 240 314 314 131 393 393 393 155 155 332 332 332 372 372 245 399 399 217 217 217 70 65 65 498 498 498 186 186 54 54 172 224 41 324 324 422 186 162 232 232 68 68 68 68 115 470 470 403 171 171 252 416 458 401 196 196 309 331 307 307 307 61 167 35 35 108 377 87 87 38 164 164 164 164 164 214 214 214 360 200 200 76 458 192 69 223 223 402 66 342 224 344 344 344 449 449 44 44 44 38 164 164 164 214 214 214 214 214 328 200 200 248 248 212 127 114 92 92 92 92 169 35 77 77 66 86 142 397 397 336 276 346 346 265 355 37 37 24 227 419 419 439 439 439 78 237 170 491 421 421 491 491 491 491 341 15 15 15 15 193 193 193 17 17 +103-1240-0048 103 738 1 12 19 31 21 27 9 40 22 5 24 16 14 33 19 26 31 20 24 11 25 8 12 14 33 36 5 16 13 25 11 25 6 30 33 36 5 23 3 30 24 3 30 19 23 5 1 32 20 25 19 33 19 11 31 33 13 11 5 23 20 6 25 1 8 11 27 25 11 19 25 8 12 13 30 40 31 5 24 34 19 26 19 25 38 5 33 39 36 31 15 30 15 10 5 23 1 8 37 18 4 11 31 5 24 22 38 3 23 24 40 24 8 31 13 23 16 1 9 5 33 24 4 34 39 36 38 5 40 33 13 30 5 9 5 23 31 13 33 3 25 19 33 1 8 22 
35 11 31 20 12 4 33 31 27 8 17 15 37 19 25 1 21 6 5 8 7 9 3 3 4 3 3 3 2 2 3 6 6 6 3 4 3 6 2 3 3 4 3 7 4 8 5 3 3 3 4 4 2 5 4 4 5 2 2 4 3 7 14 7 2 3 3 2 3 4 4 3 2 3 2 3 5 7 6 52 9 3 5 2 2 4 2 8 2 2 2 3 3 2 3 3 2 3 2 2 3 3 3 2 3 5 7 4 4 4 3 5 14 9 2 5 2 4 3 1 5 3 4 3 3 4 4 2 5 5 2 6 6 18 3 2 3 3 6 3 3 4 2 2 4 5 2 4 3 1 2 3 5 3 3 4 3 4 5 11 8 4 2 3 5 5 2 7 5 3 5 6 3 6 4 4 6 12 17 17 17 17 296 52 52 52 52 52 52 52 52 52 461 51 51 51 184 491 184 184 7 7 7 127 127 258 258 258 258 258 39 342 342 86 86 238 221 336 336 401 310 107 395 395 329 84 84 496 496 496 496 274 274 215 8 96 270 342 86 221 221 144 27 437 437 319 319 53 53 76 205 29 29 469 469 24 325 176 176 328 328 200 200 195 248 49 49 68 68 68 444 444 444 444 444 434 434 434 339 394 212 131 472 472 196 309 479 331 331 265 265 428 146 146 216 300 300 382 236 36 377 87 87 87 88 44 44 44 349 349 234 234 261 261 25 432 432 432 432 330 330 388 195 195 195 195 195 64 212 131 472 472 221 309 479 157 157 157 157 372 313 236 108 377 123 123 123 88 88 88 255 255 251 251 241 431 431 306 306 306 306 306 306 396 203 53 381 217 70 65 65 329 495 406 406 134 134 139 175 175 423 423 423 423 423 263 263 417 417 417 417 237 237 237 47 47 47 491 491 435 435 435 435 373 338 338 338 400 400 400 30 422 94 199 398 278 278 325 449 191 191 191 314 314 478 478 68 68 68 238 6 371 470 443 443 240 325 26 134 359 359 359 474 474 324 324 464 464 426 426 426 426 426 426 282 388 303 303 48 48 417 170 491 170 491 28 28 28 491 28 362 491 362 362 491 362 491 362 362 40 491 491 211 211 369 369 369 369 369 369 21 21 21 21 21 21 21 21 21 21 260 260 260 260 260 260 391 391 391 491 73 289 412 412 287 287 111 111 111 111 438 438 24 384 371 180 84 84 350 274 167 314 36 384 490 490 490 116 479 331 331 265 265 265 85 85 146 146 216 127 300 382 382 313 186 162 232 172 115 231 231 231 231 53 76 76 164 214 214 214 328 200 200 340 340 116 250 250 345 181 181 181 181 35 131 219 152 152 152 422 186 162 232 172 115 273 470 403 403 403 207 301 301 42 147 147 329 329 329 329 252 143 310 107 395 302 302 302 497 98 98 13 417 417 417 417 237 237 237 237 237 237 237 80 80 491 491 412 412 287 287 111 111 111 438 202 202 402 58 72 110 110 110 460 240 240 35 77 478 68 224 231 231 231 53 90 90 76 465 208 208 441 441 106 481 481 426 426 426 426 203 53 381 471 49 49 342 142 221 196 46 46 46 46 438 186 162 68 68 115 273 279 279 279 279 279 279 375 169 352 352 352 352 427 229 491 247 126 126 292 326 326 326 326 326 326 101 101 149 149 228 228 289 320 159 159 159 159 159 35 196 196 217 473 329 329 329 329 460 329 164 164 485 485 485 485 423 132 378 43 345 141 141 141 281 9 86 86 6 108 119 119 351 351 264 468 468 468 467 134 134 134 8 100 100 497 497 186 162 68 68 115 273 470 443 240 285 449 34 125 125 125 125 348 199 199 277 277 277 385 227 227 419 439 417 417 237 237 237 237 237 237 80 80 80 491 412 412 287 287 111 111 111 438 438 143 144 389 389 389 314 478 478 68 68 172 267 267 267 267 267 301 216 127 114 92 92 92 92 92 240 240 143 35 36 478 66 172 224 273 84 84 16 88 88 111 111 111 111 438 438 416 416 445 445 210 210 210 171 252 173 173 280 34 120 120 120 120 388 303 303 303 48 417 417 417 170 491 491 491 421 128 128 193 193 17 +103-1240-0049 103 764 1 19 33 31 27 31 13 23 11 5 24 4 34 39 36 31 13 33 31 18 19 40 24 8 25 11 3 25 13 25 20 34 19 26 12 5 33 18 38 13 25 18 20 11 5 40 8 6 23 38 20 40 16 20 23 19 33 31 24 8 11 39 36 33 20 33 19 17 19 37 19 25 1 5 25 11 4 40 16 14 12 5 30 19 31 22 1 12 13 30 40 30 19 31 22 31 19 25 29 30 19 33 20 25 19 30 13 37 30 20 34 19 26 5 9 3 11 20 11 5 40 19 25 12 19 31 
38 14 23 11 1 12 13 30 40 30 19 31 22 31 19 25 29 20 29 5 23 40 18 4 37 19 26 10 19 23 11 30 5 25 5 37 12 13 30 27 25 19 16 19 33 22 5 24 40 33 5 12 4 33 1 12 15 11 27 25 33 6 23 38 20 40 33 14 25 7 33 38 13 23 1 13 5 3 6 5 6 3 4 3 3 5 5 3 3 2 6 3 3 1 2 1 4 4 5 2 2 5 3 2 3 3 4 3 6 2 3 2 3 2 2 2 5 3 3 8 5 6 3 3 3 2 6 2 5 3 2 2 3 3 4 3 3 3 3 4 3 3 3 2 2 5 7 24 7 1 3 4 6 2 4 2 3 3 3 6 7 18 3 2 3 6 3 2 5 3 2 2 4 3 3 1 2 2 3 4 4 5 3 3 4 3 3 4 3 4 4 3 4 3 5 4 2 1 2 3 4 4 3 5 4 19 3 2 4 5 4 3 6 2 3 1 3 4 4 2 2 2 3 2 4 2 2 4 4 3 3 2 2 1 2 3 2 1 4 4 9 3 3 3 2 2 2 3 5 3 2 3 2 8 6 17 3 7 2 6 1 3 3 3 3 2 3 5 3 3 5 2 3 3 7 15 17 17 17 17 296 363 363 363 363 51 51 228 491 412 412 177 177 177 177 356 478 66 68 68 172 115 273 344 344 344 274 274 186 162 232 232 172 115 273 273 139 293 293 293 122 122 272 34 242 319 203 53 381 217 217 473 65 486 486 460 169 169 164 164 485 485 485 374 422 186 162 68 68 273 470 443 443 240 71 71 342 224 257 257 257 257 453 9 196 217 70 65 480 480 480 480 299 299 299 339 212 34 106 125 125 125 388 94 199 475 475 475 475 475 475 475 475 422 349 164 164 214 214 214 214 328 328 200 200 248 212 127 45 45 45 45 385 131 133 133 364 409 409 409 409 348 94 183 183 451 451 30 30 301 236 36 384 71 71 71 71 71 71 71 71 71 368 453 342 168 106 111 111 111 111 438 464 106 297 297 297 297 297 43 109 109 109 469 186 39 342 86 142 393 261 25 444 213 213 139 139 251 241 81 177 356 236 71 71 142 221 196 70 46 46 46 438 438 236 239 384 485 485 485 374 374 252 449 449 41 41 41 324 3 143 36 377 87 87 87 416 445 445 278 278 173 280 34 120 120 120 275 388 303 303 303 117 404 78 491 491 312 312 312 292 292 292 292 292 292 21 21 21 21 21 21 21 21 408 408 408 149 149 491 412 412 83 55 55 322 67 212 131 34 253 253 253 253 31 342 342 86 142 221 155 332 332 332 332 332 216 283 455 42 42 147 380 288 278 278 278 271 39 342 433 433 105 105 458 192 419 439 439 417 417 237 237 237 237 237 237 237 237 237 237 491 237 80 80 435 80 491 491 127 114 0 0 222 468 356 281 453 9 142 221 336 147 380 288 278 278 31 342 86 86 105 336 270 270 342 224 494 121 203 53 394 394 465 259 190 190 487 104 104 325 41 324 324 301 10 309 398 398 398 398 468 245 335 14 411 411 204 204 204 204 204 204 29 337 337 337 324 422 164 164 164 214 214 214 214 214 200 200 464 415 415 236 129 259 354 354 91 91 206 206 285 285 41 41 324 301 236 239 384 71 71 71 71 71 71 368 342 168 340 340 340 466 466 114 258 258 258 31 142 142 397 397 364 109 109 498 498 134 139 375 375 375 122 122 131 427 229 491 247 126 126 292 292 292 292 326 326 326 326 326 101 101 408 149 228 491 491 127 114 222 222 468 468 356 356 281 9 9 142 42 147 147 380 288 278 278 31 31 342 86 105 105 336 458 270 270 224 89 89 203 53 394 465 465 74 485 213 213 252 215 129 354 100 302 497 497 49 342 58 72 110 202 202 202 202 202 280 176 135 135 200 248 248 310 107 107 395 395 106 153 153 387 122 122 161 300 242 242 116 94 199 223 223 130 198 198 222 222 222 222 406 406 467 467 350 350 350 350 350 350 350 413 413 413 195 199 118 118 118 118 402 177 177 177 177 458 144 351 351 319 319 319 71 71 71 71 49 86 238 6 123 123 123 216 216 114 92 92 92 92 92 92 92 282 385 385 131 419 427 229 247 247 126 126 292 326 326 326 326 326 101 101 149 149 228 289 289 491 127 114 0 0 0 0 0 252 252 325 180 106 350 350 350 350 413 413 465 131 106 106 297 297 297 297 43 109 109 109 109 318 31 39 142 6 272 119 308 308 308 308 313 116 94 199 331 486 113 113 167 167 457 364 276 109 109 139 139 139 375 375 98 98 13 417 417 417 417 237 491 237 421 421 491 491 491 491 193 193 17 +103-1240-0050 103 701 1 4 25 11 12 
13 25 27 37 5 31 22 27 32 5 19 40 30 8 33 22 23 27 31 33 5 12 20 8 23 5 25 11 1 19 33 19 40 5 25 33 13 40 19 16 38 20 38 14 17 13 33 19 26 18 19 24 16 14 24 19 26 17 23 5 25 11 6 30 12 5 31 33 15 33 31 1 18 20 22 4 25 33 9 20 24 5 10 11 19 16 30 5 25 33 16 14 24 3 30 31 13 23 37 40 1 38 13 23 8 18 27 29 19 33 38 5 23 33 14 25 7 33 6 23 30 8 33 1 31 13 11 24 19 31 19 40 30 15 10 5 23 19 25 5 33 27 25 12 5 33 29 23 15 25 23 20 19 25 11 5 22 15 33 19 11 18 14 29 15 25 16 5 23 11 7 33 31 1 16 9 2 5 2 4 7 4 2 3 4 4 4 5 1 2 4 3 4 2 5 2 4 3 3 2 2 4 6 4 2 2 7 4 5 2 2 4 2 2 2 2 4 4 4 1 3 3 3 3 3 2 3 3 2 3 3 4 2 2 5 5 2 3 3 1 3 2 3 2 3 4 3 6 5 5 17 4 2 4 4 2 1 2 4 5 3 6 2 2 3 2 1 2 1 3 2 2 2 4 4 4 4 3 11 52 4 3 4 7 8 5 3 2 3 1 2 2 6 3 2 5 2 4 3 4 5 7 16 4 3 2 3 2 4 2 5 2 4 3 3 4 2 2 2 6 6 2 2 1 3 6 3 4 3 2 5 3 2 2 3 4 4 2 2 2 4 3 7 4 6 4 1 5 4 10 3 7 5 17 17 17 296 296 363 363 363 363 363 51 51 149 228 228 289 412 83 83 194 194 194 194 194 194 388 388 64 64 131 472 198 127 361 361 361 361 361 388 67 10 10 479 331 331 84 496 496 173 173 280 29 255 38 162 54 482 105 105 336 144 496 496 496 496 274 99 99 436 395 395 50 50 50 31 9 142 397 147 380 499 428 428 146 167 457 35 401 259 208 208 386 386 496 496 496 496 186 39 86 238 6 377 123 123 123 22 448 448 448 464 180 106 265 265 85 85 146 134 175 81 81 275 275 388 303 243 131 419 439 439 225 417 80 80 491 491 209 188 177 177 177 325 356 356 356 281 342 342 168 242 242 116 64 212 34 253 253 368 453 342 168 118 118 118 118 118 205 402 152 152 152 152 378 378 345 347 347 347 313 416 458 445 180 443 240 325 325 176 135 135 200 200 248 183 57 57 57 57 203 53 76 205 155 165 165 165 165 165 335 14 411 411 360 360 360 200 200 248 248 248 441 302 81 81 275 275 116 64 212 131 157 157 157 157 157 216 216 22 283 455 38 162 68 68 6 272 470 470 171 171 171 171 358 358 358 358 270 270 270 433 160 112 427 229 247 247 126 292 292 326 326 326 326 326 326 101 408 408 391 228 491 491 373 451 451 30 143 458 445 445 351 365 365 365 460 460 76 465 465 420 420 420 324 301 399 217 217 383 383 383 383 383 383 310 310 447 447 6 336 371 278 278 349 349 155 29 242 275 116 394 90 393 155 332 165 165 53 65 353 353 353 353 396 186 162 342 115 273 279 279 279 279 279 279 293 169 169 352 270 433 390 390 18 112 112 439 439 78 56 56 491 491 28 491 491 312 491 341 341 12 12 12 12 292 21 21 21 21 21 21 21 21 21 21 21 21 21 369 260 260 260 260 260 260 40 40 40 40 40 40 163 163 491 491 305 305 491 316 316 316 73 491 491 320 345 109 109 139 139 139 175 175 81 111 111 111 111 438 438 438 58 72 72 72 72 72 437 496 496 496 496 496 215 35 35 354 177 177 177 131 133 345 389 389 497 497 36 108 108 308 308 308 308 308 313 94 199 113 113 113 113 206 285 285 106 106 297 297 293 293 42 42 147 380 499 428 428 428 146 358 358 233 227 419 419 439 417 417 417 237 237 237 237 237 237 201 237 237 201 201 201 80 373 435 435 108 179 179 179 179 179 314 196 217 473 258 258 31 342 224 494 494 494 281 9 142 397 147 147 329 329 329 329 252 310 107 395 302 302 302 497 175 175 81 89 340 94 199 255 255 236 36 119 119 351 351 496 350 350 350 413 413 466 466 22 45 45 236 129 129 82 74 74 425 425 386 386 386 290 290 290 290 434 339 33 359 359 166 166 166 3 14 411 188 121 121 116 64 212 384 469 469 416 143 458 445 158 158 158 158 158 325 34 191 191 314 131 58 156 156 156 156 245 8 129 259 74 74 351 351 290 290 290 290 434 339 339 195 248 90 393 393 155 262 262 100 100 497 497 497 122 239 384 371 180 180 486 315 315 113 450 450 450 167 37 233 270 270 433 390 390 18 112 112 439 439 193 193 193 17 +103-1240-0051 103 767 1 27 
25 23 20 11 27 25 33 31 15 8 11 19 11 5 25 38 6 30 25 39 36 19 16 18 20 9 14 25 40 17 30 20 25 17 15 9 5 23 40 11 7 25 14 29 35 33 31 33 30 19 22 25 8 25 19 25 12 5 38 13 23 1 8 18 14 11 5 37 5 22 15 31 27 37 14 19 25 39 36 9 30 5 25 40 38 19 22 38 13 30 5 25 6 30 16 5 25 5 31 8 23 5 24 10 8 23 11 19 11 12 4 33 1 5 25 11 12 5 18 27 23 16 4 24 23 20 11 8 11 19 25 16 19 30 16 5 23 4 17 5 25 20 40 1 27 25 23 20 19 33 38 5 40 5 17 14 23 19 25 12 4 33 19 25 31 33 5 25 31 1 38 13 23 38 14 25 3 33 17 13 33 19 26 5 17 14 23 1 31 13 11 24 3 30 19 23 5 1 5 6 5 3 4 5 5 3 2 7 6 6 2 3 1 2 3 4 2 3 2 3 3 2 2 3 2 5 8 4 3 3 3 4 5 3 4 3 2 4 2 4 8 3 4 4 2 4 5 3 2 3 3 3 3 3 3 2 1 2 3 4 9 19 8 3 4 2 1 2 3 4 6 5 5 5 2 2 4 3 2 5 2 2 5 2 2 3 5 1 2 1 2 2 4 3 3 1 3 2 5 4 3 2 3 5 5 3 3 3 4 3 4 5 13 4 2 1 2 2 7 5 4 5 5 2 3 3 4 6 2 2 3 7 4 3 3 2 4 5 3 1 2 6 8 23 8 2 3 2 2 2 3 1 3 2 3 6 3 3 1 3 2 2 3 3 4 2 2 4 7 25 6 4 3 4 3 2 4 4 3 2 2 2 4 2 4 5 7 18 5 2 2 2 2 3 3 3 7 5 17 17 296 296 184 491 209 287 350 350 350 350 350 350 250 359 359 81 166 166 324 324 422 314 32 239 384 371 180 84 350 350 350 413 413 243 131 472 232 232 232 68 115 273 470 470 403 403 171 464 464 111 111 111 111 438 438 239 371 371 278 242 314 242 242 242 394 133 364 276 276 153 153 387 387 396 348 339 219 219 477 477 477 88 118 118 118 118 402 183 451 30 30 301 32 129 259 354 354 498 498 498 498 498 396 396 242 116 195 471 368 453 9 142 221 144 208 79 288 288 360 360 360 434 339 200 33 248 248 212 445 180 171 171 171 252 252 8 354 100 302 497 497 497 49 453 9 6 384 371 180 315 315 315 315 450 450 450 413 413 94 199 157 157 245 245 129 259 74 190 189 189 236 35 478 478 482 482 482 482 482 238 6 161 487 288 178 178 458 458 192 196 196 479 398 360 360 434 434 339 199 34 340 340 116 466 22 283 455 43 276 109 109 139 139 139 139 375 375 375 375 98 13 229 491 247 126 126 126 326 326 326 326 326 326 326 326 326 101 101 101 149 149 228 289 320 287 111 111 111 438 438 58 110 498 498 498 498 396 285 34 223 223 280 44 44 44 458 445 445 351 351 343 171 171 358 358 39 342 342 224 224 106 410 410 410 410 173 173 29 29 495 467 467 44 44 116 10 10 398 398 398 398 374 132 236 32 259 354 190 380 499 319 319 348 348 471 471 49 9 142 397 109 109 288 178 178 143 458 208 397 347 347 467 467 44 44 94 14 14 411 153 372 372 396 349 349 352 29 242 116 199 44 44 38 342 342 115 273 106 265 265 85 85 146 175 81 282 203 53 394 90 310 107 395 351 91 91 91 85 85 139 450 293 122 35 401 384 371 278 278 314 314 401 401 127 114 92 92 92 92 167 385 35 227 427 229 247 126 126 326 326 326 326 101 408 149 149 228 491 412 55 55 55 322 67 466 198 5 5 455 38 72 72 72 72 72 437 424 424 424 424 424 497 497 122 349 401 205 261 25 25 365 365 365 365 460 203 53 359 81 81 41 324 324 422 36 371 180 265 265 265 85 146 146 325 34 340 340 116 33 76 76 205 234 234 234 261 25 485 286 286 286 468 245 349 349 205 262 262 100 497 497 14 14 145 145 486 460 460 416 458 242 242 116 199 41 41 41 19 318 185 433 433 433 160 112 427 56 170 491 312 312 312 187 12 12 292 12 12 12 12 12 408 408 260 391 491 491 316 491 491 491 491 412 287 287 350 350 350 350 350 359 359 81 166 166 464 177 177 177 133 133 141 141 141 281 453 168 44 44 416 208 79 498 498 498 498 134 302 497 175 81 340 340 340 466 466 114 92 92 92 240 325 34 121 121 121 379 77 77 342 86 238 6 272 11 11 379 379 243 471 49 433 390 390 18 18 112 439 439 237 237 237 237 237 237 237 305 305 12 260 260 260 260 260 260 260 163 163 316 316 316 316 491 7 7 7 364 109 109 139 139 139 293 293 43 43 345 347 347 347 313 313 94 479 307 307 307 61 167 131 472 401 259 
144 445 443 443 240 325 176 135 135 200 464 44 44 44 416 458 144 79 498 498 498 499 499 302 375 375 98 98 13 417 417 417 237 237 237 237 237 237 237 237 491 491 80 316 491 80 491 289 435 66 66 179 179 179 179 314 196 217 70 65 329 329 406 406 467 134 139 139 175 423 423 423 423 423 263 263 229 247 15 193 193 17 +103-1240-0052 103 739 1 4 40 19 16 29 28 40 5 25 19 26 38 13 23 40 38 14 5 29 39 35 30 23 20 16 13 24 5 25 5 25 5 22 3 24 29 23 19 32 24 5 25 33 1 5 25 11 25 3 33 19 9 20 11 30 13 11 19 11 19 25 12 20 22 15 31 5 37 5 9 28 1 8 11 25 13 37 14 11 30 20 24 5 37 33 15 22 19 26 5 17 14 23 33 5 9 30 19 26 5 29 1 8 38 5 25 11 14 4 33 24 19 31 19 40 4 23 5 17 40 4 25 11 14 31 29 13 25 31 14 16 14 11 36 19 26 19 33 1 9 5 33 12 13 30 1 32 20 38 35 11 5 25 33 32 30 19 26 22 16 14 24 5 11 3 29 33 19 26 5 18 27 23 6 30 16 5 25 5 31 8 23 5 24 19 16 32 20 33 35 22 19 33 19 25 33 36 18 14 18 13 11 1 7 6 3 3 4 4 6 3 2 1 2 5 3 4 5 4 4 3 3 4 3 2 4 2 5 8 2 3 2 1 3 2 2 5 2 2 2 2 2 5 2 2 2 3 12 7 2 1 3 4 4 2 2 3 4 3 2 3 2 3 2 2 2 1 5 5 5 1 3 2 4 12 38 7 2 2 3 2 3 5 4 4 2 2 2 5 3 3 2 4 2 7 5 4 2 2 2 2 3 3 6 7 8 7 6 2 2 3 2 2 2 1 3 3 2 2 4 2 3 2 3 3 3 3 2 4 4 2 4 4 2 3 2 4 2 4 3 3 6 10 3 2 3 1 2 8 7 12 8 3 2 2 2 2 1 6 1 2 4 3 2 2 2 2 3 5 2 2 1 4 3 4 2 9 5 3 4 2 1 3 5 5 3 2 1 3 2 3 2 3 3 3 2 2 3 2 4 2 4 4 3 5 6 22 17 17 363 363 51 51 228 209 83 145 253 253 253 453 342 342 118 118 118 118 349 402 221 259 74 74 441 153 153 387 387 146 368 453 342 242 196 309 199 176 135 135 200 200 248 250 364 276 109 109 443 139 139 139 293 293 293 185 49 9 142 397 345 347 347 347 347 406 467 255 255 129 129 74 74 485 485 485 286 286 468 468 468 134 359 359 359 166 166 324 3 422 349 234 234 234 261 25 25 443 443 330 203 53 473 242 242 199 199 89 446 94 199 255 255 143 259 144 27 437 319 319 53 53 76 465 81 81 469 469 99 447 447 221 196 291 291 291 291 291 243 227 419 427 247 247 126 126 326 326 326 326 326 101 149 149 491 412 83 55 194 194 194 388 67 10 10 479 331 307 307 61 167 167 36 108 87 87 87 8 420 420 420 422 32 239 161 161 79 79 288 443 443 240 325 34 191 191 24 36 34 340 340 340 466 22 283 455 143 458 445 445 351 343 343 171 358 358 39 342 342 224 69 69 130 280 29 44 44 236 8 354 153 153 153 153 387 387 387 207 207 207 454 13 229 82 247 312 312 312 292 292 292 292 292 292 292 292 1 21 21 21 21 21 21 21 21 21 21 21 21 21 21 21 21 21 408 408 408 408 391 391 289 491 209 287 111 111 438 438 314 196 309 479 463 463 463 463 29 29 382 313 236 32 32 239 161 79 79 288 360 360 360 434 434 203 53 69 223 223 402 221 259 108 119 295 295 295 295 295 458 135 135 135 135 200 200 44 44 44 44 129 401 491 144 79 498 498 498 498 139 302 302 497 122 122 449 87 87 87 8 354 380 288 288 360 328 200 464 230 230 230 230 230 230 215 35 29 419 419 225 225 225 225 225 225 225 225 225 225 225 225 287 287 111 111 111 438 378 43 364 276 174 174 319 348 348 64 212 161 300 382 382 277 415 457 196 217 258 258 258 342 342 224 494 494 453 168 145 329 329 329 175 81 81 469 416 416 96 342 168 470 365 365 365 365 348 64 212 300 382 382 186 186 54 54 105 336 336 74 470 432 330 379 64 64 77 77 224 224 334 382 245 349 155 332 332 332 236 239 384 371 374 374 88 176 176 135 135 200 200 277 277 277 277 233 227 419 439 439 439 225 225 225 391 491 80 491 491 73 491 320 159 159 159 314 35 259 127 114 264 264 264 468 59 59 452 263 417 417 417 417 47 47 491 491 491 435 197 373 338 338 338 338 338 400 400 400 400 95 95 246 246 246 301 378 378 345 389 389 389 314 196 242 242 33 33 310 107 338 161 161 161 487 288 360 360 360 200 200 243 96 96 393 155 165 165 165 53 44 
255 236 239 384 180 180 405 405 206 206 35 96 272 176 135 135 200 200 44 44 44 44 72 72 72 424 424 424 424 424 424 497 497 497 335 14 226 82 411 411 157 372 372 396 349 349 234 261 25 242 242 94 199 459 38 31 342 342 273 106 265 265 85 85 85 175 175 81 81 203 53 118 118 118 118 402 338 400 400 400 422 36 108 377 295 295 295 416 458 277 277 277 325 34 340 340 340 116 64 76 108 377 123 123 123 88 88 156 156 156 156 245 58 58 110 110 120 120 120 120 120 37 24 24 404 439 78 229 491 312 312 15 292 292 292 292 292 292 292 21 21 21 15 193 193 193 193 17 17 +103-1240-0053 103 683 1 24 19 31 19 40 30 15 10 5 23 38 35 11 18 4 37 23 8 22 33 19 31 33 15 5 25 33 19 23 24 4 34 39 36 22 15 24 18 27 24 38 19 12 18 19 40 19 24 29 6 30 33 19 11 6 30 16 5 25 1 9 5 33 30 5 16 23 13 22 33 19 26 12 5 33 19 33 38 35 11 9 20 5 17 35 11 33 36 7 14 40 4 33 23 20 31 33 9 19 16 6 30 18 19 40 14 8 37 5 23 1 32 20 22 5 25 22 23 36 11 19 11 33 19 17 27 5 29 12 5 30 27 11 33 36 30 3 9 14 33 9 13 23 40 5 25 11 33 13 23 12 5 25 39 36 40 1 19 33 38 35 11 31 14 33 5 25 23 20 24 15 22 5 31 13 25 31 15 32 5 25 31 13 22 5 25 33 5 25 5 25 1 23 5 2 3 3 4 3 3 3 3 3 2 2 1 2 1 2 3 4 2 3 2 4 3 4 2 2 3 2 3 3 6 3 2 2 5 4 4 1 5 2 2 2 1 2 1 3 2 2 4 2 3 2 1 3 7 3 4 2 6 23 3 3 3 2 4 3 1 3 3 3 3 4 1 3 1 2 3 2 1 2 3 5 3 6 3 5 5 7 8 3 4 2 4 3 4 3 2 2 3 5 2 3 2 1 4 6 6 2 2 5 14 6 3 2 3 2 3 2 4 2 3 2 3 2 3 5 5 4 2 3 5 8 2 8 3 4 4 2 2 4 3 4 7 4 1 2 2 3 3 3 2 2 3 4 7 10 14 5 4 2 2 3 6 4 2 2 1 2 2 4 4 3 3 4 2 4 4 4 5 2 3 3 3 3 2 1 3 2 4 5 6 8 17 17 17 17 296 52 52 52 52 52 52 52 52 461 461 491 461 461 184 491 491 305 305 289 7 217 473 258 258 258 342 342 224 494 494 494 453 9 142 397 147 329 329 329 329 329 143 36 449 302 302 302 497 43 43 345 389 389 389 285 34 202 202 402 402 251 241 266 266 266 266 146 178 35 35 272 87 87 38 162 342 86 238 6 272 470 403 403 464 464 464 330 348 76 76 108 377 139 139 139 497 399 217 217 473 486 486 486 460 460 169 164 164 485 485 485 374 422 143 144 445 210 210 210 210 210 210 203 53 58 58 350 350 350 350 350 203 250 250 345 333 333 220 216 22 257 281 453 9 168 121 121 53 76 465 74 74 441 153 153 372 372 313 449 449 191 191 24 335 14 14 411 153 153 372 372 372 396 349 352 352 352 275 275 275 116 303 303 48 229 491 247 312 126 126 292 292 292 292 326 326 326 23 23 23 101 408 408 408 149 149 228 491 289 491 354 159 159 159 159 314 133 133 456 456 456 456 349 349 234 261 386 386 151 151 151 178 35 96 36 449 176 176 135 135 200 200 248 212 127 45 45 45 325 177 177 177 177 345 345 389 389 389 129 129 259 420 420 420 420 464 464 44 44 44 416 129 401 401 144 484 484 484 484 314 314 32 401 401 259 108 377 351 374 374 374 132 88 88 106 145 284 315 315 315 450 450 450 372 304 304 304 49 342 342 168 415 415 415 26 26 26 241 241 444 444 213 213 358 39 39 342 142 221 336 354 354 255 38 349 205 155 148 148 148 148 372 245 58 183 257 257 257 453 168 255 255 42 42 147 380 499 499 265 85 85 146 173 173 280 302 302 375 497 98 229 82 247 126 126 126 326 326 326 326 326 326 101 101 149 228 491 373 338 400 400 400 400 30 143 144 27 121 121 121 33 394 76 208 208 386 444 444 374 374 252 325 34 191 191 191 314 36 36 377 87 87 87 416 416 180 84 84 496 88 88 230 230 230 230 215 215 35 401 198 198 283 283 455 42 42 147 380 380 288 496 496 496 496 274 274 274 37 24 24 36 377 377 377 123 123 272 123 123 123 42 42 147 147 499 499 405 405 206 215 29 469 313 314 32 401 401 401 354 180 180 443 139 139 139 139 375 375 375 375 185 49 342 342 168 89 116 33 394 76 108 119 351 351 139 139 293 293 122 216 283 283 455 116 10 398 398 398 398 398 374 374 
132 132 132 185 185 269 390 390 390 18 18 112 439 237 237 237 237 237 237 491 47 491 47 491 491 435 435 435 289 491 209 177 177 177 177 131 133 133 345 389 389 389 314 129 478 66 68 68 115 273 498 498 498 240 240 35 35 359 359 359 166 166 166 301 301 217 217 473 476 476 476 476 143 458 192 44 44 38 342 342 115 273 432 432 379 379 394 77 68 68 115 418 418 418 418 418 99 99 436 436 60 298 379 379 471 478 66 342 115 273 151 178 416 458 192 242 116 64 76 108 377 123 123 116 10 479 331 331 319 319 319 282 388 303 303 117 48 229 247 15 15 15 193 193 193 17 +103-1240-0054 103 816 1 5 25 11 24 19 31 19 40 30 15 10 5 23 1 11 19 30 23 20 23 5 37 11 33 5 24 15 22 5 31 13 25 31 15 32 5 25 1 31 27 32 20 33 35 22 18 14 31 13 23 16 5 38 15 31 5 24 38 5 33 5 24 14 19 23 5 40 30 19 23 20 16 1 16 6 30 12 5 23 4 33 14 16 13 23 33 18 14 11 7 33 31 5 25 11 16 19 30 40 30 19 37 8 37 19 26 5 25 11 14 12 20 19 25 16 23 36 5 25 31 5 37 24 19 31 19 40 30 15 10 5 23 40 29 13 31 5 24 19 40 5 24 1 38 13 23 5 37 6 23 34 19 26 40 12 5 33 1 13 37 14 38 14 6 30 38 19 23 9 20 1 19 21 4 22 39 36 23 15 33 19 11 24 19 31 19 40 30 15 10 5 23 38 19 25 32 20 38 5 40 31 15 16 23 20 7 33 19 25 12 5 23 15 25 1 7 6 2 3 2 2 4 2 5 3 3 4 4 5 3 3 4 3 3 5 5 4 4 1 2 2 2 3 3 2 4 2 4 4 4 6 2 4 20 6 3 3 3 4 4 2 1 3 4 2 3 3 4 4 10 8 3 4 2 3 5 2 2 6 3 4 3 5 2 2 4 7 7 4 6 2 3 1 4 4 4 3 3 5 3 3 2 4 3 5 6 5 2 2 2 3 3 5 3 4 3 2 3 8 2 3 4 3 2 3 1 3 3 3 4 4 3 1 2 3 3 2 3 3 2 5 2 4 2 3 3 3 2 3 4 2 5 3 2 2 4 2 5 43 3 1 2 2 4 6 3 8 2 7 2 2 3 3 1 6 3 4 9 9 4 4 5 3 4 3 8 10 5 5 3 4 2 1 3 3 3 2 3 2 2 4 2 5 2 4 3 4 2 3 3 2 3 3 2 2 2 5 5 4 2 5 4 5 2 2 1 2 5 6 6 9 17 17 17 363 363 51 51 491 412 412 55 55 55 322 67 33 250 217 473 258 258 258 31 342 224 494 494 494 281 9 142 397 147 147 329 329 329 329 329 329 36 310 107 395 302 302 302 497 497 122 122 401 401 401 401 401 384 371 371 286 286 286 286 313 134 359 359 166 166 166 166 301 301 251 251 251 241 266 266 266 266 266 173 173 402 402 6 108 377 87 87 217 473 476 476 476 143 458 192 44 38 68 68 115 273 432 432 330 379 394 77 342 342 115 470 418 418 418 418 99 99 436 436 60 60 298 303 303 48 48 417 417 237 237 237 491 237 2 491 2 2 491 491 491 435 435 491 491 435 435 491 289 373 66 66 115 273 344 496 186 99 400 400 400 30 422 143 36 108 295 295 295 295 295 458 192 156 156 156 186 186 54 172 115 279 279 279 279 279 349 352 29 44 255 255 43 43 276 109 109 403 403 403 207 207 207 207 19 3 454 225 66 66 68 68 68 115 273 231 231 231 231 53 250 250 345 346 426 206 167 167 457 36 108 377 123 399 70 65 65 329 42 42 147 380 288 256 139 175 175 423 423 423 423 271 368 269 142 142 397 147 456 456 456 251 251 241 444 444 444 213 246 246 358 358 173 352 352 352 112 225 225 225 225 225 225 225 373 393 155 155 155 332 332 332 313 216 216 5 5 455 455 251 241 431 486 376 376 460 460 449 449 300 382 382 245 349 349 205 261 25 180 189 139 139 293 122 122 131 183 156 156 156 382 313 313 236 239 239 384 180 180 486 315 113 113 450 450 167 35 270 270 342 224 340 340 340 33 394 76 393 205 261 25 485 286 286 286 286 304 304 304 49 447 142 397 147 456 456 456 456 173 280 106 265 265 265 265 85 85 146 146 173 280 176 176 135 328 200 200 199 89 319 319 348 33 64 212 300 123 216 198 448 448 448 448 464 121 121 121 53 394 76 155 425 425 386 134 88 88 11 11 11 379 471 49 342 168 69 69 223 130 129 196 196 217 473 258 258 258 31 224 224 494 494 494 281 142 142 397 147 329 329 329 329 329 36 449 302 302 302 497 497 31 142 221 336 74 351 351 443 443 150 150 342 342 224 494 494 203 53 459 459 459 368 453 342 168 275 203 203 381 48 13 13 491 
491 247 312 312 292 292 292 292 292 292 292 21 21 21 21 21 21 21 21 21 21 21 21 21 260 260 260 260 260 260 260 260 260 260 260 391 391 391 491 491 73 491 320 345 109 139 175 175 81 223 130 280 280 106 297 297 297 297 297 297 293 293 349 349 352 164 164 164 164 164 214 214 214 214 200 200 200 200 471 471 49 453 198 114 45 45 385 457 14 401 226 82 209 463 463 463 463 463 280 29 382 382 382 245 245 43 43 364 364 276 276 109 347 347 498 498 59 59 59 245 14 14 411 157 157 157 372 372 245 245 43 43 364 276 109 109 329 139 139 293 497 122 8 354 420 41 41 41 19 19 454 454 417 417 417 417 237 47 80 491 80 491 435 435 209 188 177 177 236 36 107 395 180 486 486 460 178 178 458 192 485 469 134 175 158 158 158 158 158 325 449 191 191 191 314 314 196 217 473 258 258 258 342 342 494 494 494 281 9 142 397 397 147 329 329 329 329 329 143 310 107 395 302 302 497 497 43 364 345 409 409 409 116 314 76 465 400 400 400 301 378 345 141 141 141 31 232 232 68 68 115 273 470 171 171 171 252 349 349 402 26 359 166 166 166 324 464 464 113 113 113 113 169 167 36 449 34 340 340 466 466 22 283 455 497 251 251 241 431 431 290 290 290 434 434 339 339 117 404 13 229 491 247 15 15 15 193 193 193 17 +103-1240-0055 103 562 1 19 33 11 5 40 30 20 23 20 31 20 24 13 40 19 16 8 24 5 31 33 9 20 11 30 20 24 19 26 1 38 13 23 8 24 31 3 30 20 16 6 30 12 4 33 29 36 30 39 5 26 38 5 25 4 25 11 25 27 24 19 31 33 15 22 1 24 4 34 39 36 5 25 11 24 3 30 19 23 5 11 27 25 27 13 25 20 34 19 26 5 9 7 33 10 19 23 11 30 5 25 1 5 25 11 12 15 23 19 22 31 29 13 22 33 19 24 33 19 9 20 38 8 40 14 5 25 11 31 33 13 11 20 14 12 5 33 18 19 40 27 25 17 30 4 25 16 3 12 14 1 12 6 2 3 3 6 3 2 4 3 7 4 3 1 4 2 4 3 3 2 3 1 2 2 4 2 4 2 2 7 47 3 2 4 4 3 6 2 4 3 4 1 2 2 4 3 6 4 2 5 4 5 3 2 2 2 2 2 4 3 2 3 5 3 5 7 15 5 5 4 2 5 1 2 1 2 1 5 2 3 3 5 4 5 8 1 4 3 3 2 3 2 2 7 2 4 2 3 3 2 2 5 5 6 1 2 1 3 3 3 2 3 3 2 4 3 2 3 3 1 2 4 6 4 6 3 2 2 2 4 4 1 3 3 3 3 2 1 2 1 3 7 5 3 2 4 5 4 4 2 7 9 17 17 17 363 363 363 51 51 228 184 491 491 320 188 177 177 177 177 143 401 82 384 71 71 71 71 71 453 9 142 221 336 155 487 288 485 278 26 359 166 166 166 166 422 162 232 232 68 68 444 444 444 360 360 339 53 53 473 253 253 453 342 168 118 118 349 402 25 111 111 111 438 399 70 65 319 169 150 342 105 221 336 420 420 422 236 239 161 79 79 288 288 360 360 360 203 53 176 176 328 328 200 303 48 13 491 491 312 312 312 312 312 292 292 292 292 292 292 292 292 292 292 292 21 21 21 21 21 21 21 369 21 21 21 21 21 21 21 21 21 21 21 260 260 260 391 391 391 391 491 491 491 320 320 346 346 84 139 139 175 175 81 111 111 111 438 203 53 478 478 232 232 172 115 106 106 153 153 372 372 337 337 337 301 349 155 155 332 332 332 240 216 114 92 92 92 167 457 35 401 259 74 74 441 441 441 153 153 153 372 372 396 313 219 219 219 219 180 180 319 319 348 348 248 250 250 276 174 174 174 388 94 199 89 89 446 116 10 10 309 479 331 84 84 496 399 399 473 65 459 31 31 342 86 86 238 6 470 470 470 171 171 171 358 24 458 192 419 419 439 439 417 237 237 237 237 237 237 237 237 237 237 237 237 237 80 7 7 217 473 65 329 486 460 460 169 169 164 164 25 485 485 485 378 378 88 89 89 116 33 250 70 65 329 329 245 42 147 134 134 139 175 175 423 423 423 423 314 314 239 384 371 180 84 350 350 167 457 309 479 331 84 84 88 88 14 14 411 475 475 475 475 475 475 475 475 422 349 164 214 214 214 214 200 200 255 255 8 354 113 113 113 113 450 167 167 457 36 310 107 107 395 395 106 153 153 122 122 285 300 300 300 275 275 94 117 404 13 414 80 80 491 491 412 412 83 55 55 55 322 67 64 212 114 0 0 139 139 175 81 154 154 154 458 96 66 68 105 105 336 470 
151 151 178 35 96 401 36 272 57 57 57 203 64 394 76 377 87 87 87 420 420 420 420 301 43 364 276 346 346 265 265 265 85 146 146 368 453 9 300 300 382 406 467 89 89 446 33 394 478 68 68 68 238 6 272 470 470 443 240 325 41 324 324 286 459 459 469 216 198 114 242 446 94 199 257 257 257 453 168 106 350 350 350 350 350 413 195 195 33 90 32 465 208 79 380 288 365 365 365 365 388 348 64 76 90 393 261 25 91 91 91 91 493 216 300 334 334 59 452 263 229 247 126 126 326 326 326 326 193 193 17 +103-1240-0056 103 715 1 19 33 31 20 24 40 5 25 22 4 25 20 33 19 34 19 26 22 5 37 5 10 8 23 11 4 33 17 30 20 25 17 15 9 5 23 40 31 5 24 18 7 1 12 13 30 40 25 13 37 14 9 19 25 38 5 25 12 13 30 16 6 30 24 4 34 39 36 5 25 11 24 3 30 19 23 5 38 14 17 30 27 25 5 29 38 13 25 12 5 25 39 36 18 7 31 38 5 40 9 19 23 33 1 19 16 12 15 13 37 14 38 14 10 19 23 11 30 5 25 1 38 19 10 19 40 18 3 30 11 33 5 9 19 23 20 37 38 13 25 38 5 25 23 35 22 31 4 33 12 5 24 1 8 38 35 11 5 25 33 9 20 19 25 12 4 33 6 30 16 5 25 40 32 36 40 16 14 13 25 20 34 19 26 1 11 5 5 5 5 4 2 6 7 13 2 2 5 3 2 2 2 1 2 2 2 2 8 4 3 3 1 3 3 2 4 3 2 5 2 2 3 2 4 3 3 2 9 21 3 1 2 3 2 1 4 2 2 4 3 4 3 3 2 4 5 6 2 3 4 5 4 2 4 2 2 1 2 1 5 3 4 6 3 2 4 3 5 4 3 5 2 3 2 2 2 2 3 4 5 5 4 2 2 3 3 2 6 8 16 6 3 2 6 3 3 3 7 7 6 2 3 3 3 2 3 16 4 2 4 2 3 2 3 3 2 2 2 2 1 4 5 4 1 2 2 4 2 3 2 2 4 3 2 2 3 3 5 38 11 4 3 2 2 2 2 2 8 2 3 3 3 2 4 4 3 2 4 2 6 7 5 4 4 3 3 4 3 2 7 7 17 17 17 296 296 317 491 491 184 184 184 412 177 177 177 177 177 177 401 478 66 66 68 68 115 444 444 444 444 360 339 339 53 471 71 342 342 483 440 287 319 319 319 388 348 195 195 90 90 143 401 491 445 445 445 351 351 72 72 351 365 365 365 365 330 94 199 41 41 324 324 143 36 377 87 87 87 164 214 214 214 200 200 192 69 223 130 29 44 44 236 36 310 107 395 351 437 91 91 91 85 139 139 293 122 35 198 45 191 236 131 90 401 82 208 79 288 360 360 360 434 339 248 248 212 445 180 171 171 171 252 215 8 100 100 497 497 497 269 342 68 68 115 273 231 231 231 203 94 58 58 268 268 315 268 450 450 98 98 229 82 247 312 312 126 292 292 292 292 23 23 23 23 408 408 408 391 391 491 491 491 289 289 127 114 0 0 313 186 447 196 479 463 463 463 463 29 29 382 245 8 354 137 137 137 137 116 250 250 250 276 174 174 174 319 319 348 466 466 212 127 114 264 264 264 264 59 59 452 245 349 205 155 155 332 332 332 332 372 245 245 217 217 473 65 486 486 460 460 169 164 164 164 219 219 485 485 132 88 88 89 89 446 33 250 70 65 65 329 495 42 147 380 288 139 139 175 175 423 423 423 423 423 355 245 43 345 347 347 347 245 416 32 239 208 79 380 499 84 496 496 274 274 413 94 479 230 230 230 230 215 35 401 491 354 345 409 409 409 409 466 466 466 22 283 455 116 10 398 398 398 398 132 132 58 58 72 72 268 268 268 268 268 169 169 39 54 142 397 397 141 141 281 54 9 221 336 336 354 180 139 139 139 375 375 274 122 122 227 227 419 439 439 439 417 237 237 237 237 237 47 47 491 47 316 316 491 316 73 491 289 435 188 118 118 118 118 118 402 198 198 0 0 0 0 464 464 464 463 463 463 463 280 29 382 382 245 245 43 364 276 347 347 347 498 498 396 396 313 313 24 36 310 107 107 395 395 106 153 153 387 122 122 161 161 487 334 275 275 116 117 48 229 229 247 126 126 126 326 326 326 326 326 101 101 149 149 228 491 320 320 345 407 407 407 143 107 395 356 257 257 281 9 142 72 437 306 306 306 306 396 313 186 36 377 87 87 87 8 354 425 251 251 241 444 444 444 444 246 246 173 402 402 397 409 409 409 116 250 250 276 174 174 174 319 348 466 250 241 367 367 367 367 35 458 270 270 342 224 415 415 415 457 259 127 114 57 57 203 203 381 48 48 13 13 491 247 312 126 126 292 292 292 292 292 292 292 
292 21 21 21 21 21 21 21 21 21 21 21 21 21 260 260 260 260 260 391 391 391 391 491 491 289 412 287 287 111 111 111 111 111 438 378 378 364 389 389 389 389 389 314 242 242 242 33 76 76 465 259 354 420 420 420 324 246 3 464 340 340 340 116 466 466 466 114 92 92 92 92 285 34 106 106 153 372 372 396 245 349 349 352 25 242 242 116 33 471 49 49 9 482 338 338 338 395 485 374 374 374 132 132 318 318 49 269 9 142 393 155 332 332 332 332 332 467 467 475 475 475 475 475 475 475 475 422 349 164 214 214 214 214 214 200 200 117 404 404 404 225 225 225 225 225 193 193 +103-1240-0057 103 342 1 24 8 9 5 33 8 29 19 33 20 18 19 24 1 12 4 33 31 38 5 33 1 31 27 31 13 11 24 19 31 19 40 30 15 10 5 23 33 5 12 5 38 8 23 11 30 27 40 9 35 32 5 40 7 33 5 37 12 5 16 5 23 25 5 31 5 37 18 14 18 3 30 33 1 9 6 12 2 2 3 5 7 2 2 6 2 2 8 6 3 4 3 4 3 3 8 46 8 7 7 4 3 4 3 3 3 5 2 4 4 2 3 4 2 2 3 5 8 3 4 4 4 6 3 2 5 4 4 5 3 2 2 1 3 5 2 4 2 3 4 2 2 2 3 5 3 4 6 6 17 17 296 363 363 363 225 225 289 7 7 7 70 70 65 65 284 284 284 265 265 265 85 85 146 146 438 8 354 159 159 285 285 111 111 111 111 438 438 143 129 259 74 74 351 278 278 278 325 183 41 41 324 324 3 183 183 57 57 57 57 203 381 381 117 117 417 417 417 417 417 417 80 80 320 127 114 92 92 92 92 240 35 77 342 9 142 397 336 181 181 181 181 181 385 385 36 227 419 439 439 417 417 237 237 237 237 237 237 237 237 237 237 491 362 491 491 362 491 491 362 362 362 362 491 435 211 211 491 369 369 21 21 21 21 21 21 21 21 21 260 408 408 391 391 391 228 491 491 373 66 66 68 115 273 273 344 344 344 16 274 274 186 186 162 232 68 172 115 470 179 443 120 240 240 314 35 196 217 217 473 258 258 258 342 342 224 494 494 494 281 9 142 397 147 147 329 329 329 329 252 143 36 449 395 302 302 302 497 122 36 36 377 123 123 123 216 22 283 455 43 364 364 276 346 346 346 346 265 85 85 85 139 139 293 293 122 122 131 472 133 147 147 380 288 496 496 496 274 368 31 342 86 142 336 336 354 109 496 278 99 99 436 395 50 50 50 50 185 453 342 168 180 113 113 113 113 169 285 449 34 69 223 130 198 22 283 283 455 349 234 234 261 25 424 424 424 497 497 122 466 81 459 459 271 31 342 224 69 69 130 130 280 156 156 156 156 245 245 72 72 437 306 306 306 306 306 396 396 37 24 227 419 419 427 229 491 247 15 193 193 193 +103-1241-0000 103 798 1 10 4 29 33 14 33 36 1 24 4 34 39 36 22 5 34 9 14 33 1 19 40 31 5 29 30 8 40 11 1 24 4 34 39 36 22 5 34 9 14 33 4 25 11 12 5 31 6 30 5 23 24 13 30 21 3 17 11 22 5 24 16 14 33 5 9 23 20 27 37 14 12 20 15 33 24 8 23 40 33 5 9 30 8 33 30 19 37 14 1 19 33 38 5 40 5 29 30 19 33 20 30 27 11 1 30 5 25 19 26 5 23 6 26 9 19 33 38 20 25 31 25 5 17 16 3 30 24 31 33 13 11 40 38 19 34 25 7 5 25 11 5 17 13 25 1 5 9 19 33 5 37 1 2 16 14 38 35 11 33 19 11 30 8 37 34 30 36 1 10 5 3 3 3 3 6 10 31 6 4 4 2 3 4 5 4 3 5 2 1 4 3 2 3 3 3 8 6 5 71 5 6 3 3 3 6 5 4 3 6 2 3 2 1 2 2 6 2 3 4 3 5 5 5 7 6 3 5 5 2 3 2 1 2 2 3 2 5 6 2 3 2 5 6 4 5 4 4 4 2 2 3 3 4 5 3 3 3 8 23 5 1 2 2 3 3 4 2 2 2 4 5 8 5 3 7 2 3 3 5 3 4 6 4 2 2 5 2 3 5 6 2 5 4 7 3 3 3 3 4 4 4 3 2 4 5 4 7 1 2 1 2 5 8 5 12 6 3 2 2 2 4 2 24 9 8 5 4 3 2 3 3 3 6 2 7 3 7 6 17 17 296 296 317 184 184 184 184 289 320 108 119 351 351 486 460 460 215 96 35 272 300 382 382 313 236 129 75 108 119 351 351 374 374 374 374 132 132 98 98 13 417 417 170 170 170 170 442 491 442 442 312 187 442 12 102 102 12 442 12 12 23 260 260 260 260 260 391 391 316 289 289 289 320 7 217 70 473 486 486 486 460 169 169 35 164 219 485 485 374 132 143 129 321 144 27 351 329 329 329 169 352 164 221 221 321 354 29 382 382 396 313 24 131 483 226 321 188 356 356 31 162 232 172 224 494 494 494 129 
74 190 190 499 499 499 265 265 265 85 85 299 299 185 185 433 433 86 238 6 419 439 56 56 237 237 237 237 237 28 28 28 491 28 28 362 362 491 362 491 305 362 362 362 362 491 491 218 40 362 218 218 218 218 491 218 491 218 218 218 218 218 218 218 218 491 491 218 218 218 491 491 369 369 491 369 369 369 369 369 369 369 21 21 21 21 21 101 101 101 149 391 228 321 321 320 7 217 217 473 486 486 486 460 460 169 169 164 164 485 485 485 374 132 143 129 458 144 27 27 351 329 329 151 169 169 164 352 221 221 321 354 29 334 334 59 59 313 24 131 483 440 89 55 446 322 466 22 5 455 38 162 482 482 172 115 273 106 499 372 406 406 467 302 302 497 497 497 497 399 217 217 473 65 264 264 264 264 468 468 468 467 37 236 314 401 401 310 107 395 180 106 499 405 405 206 206 178 96 96 272 472 472 472 401 321 144 27 27 437 319 319 319 53 53 76 205 155 29 134 134 134 134 8 359 359 474 474 474 3 335 14 411 410 410 410 410 410 173 29 29 313 313 216 22 448 448 448 14 411 411 171 171 171 171 252 252 131 472 196 196 70 70 65 65 265 265 85 85 139 450 293 497 49 54 86 238 272 377 123 236 129 259 190 380 499 499 428 428 146 146 358 457 457 133 42 147 380 380 288 288 173 173 29 334 334 59 59 452 452 263 229 491 312 312 312 312 292 292 292 1 1 1 1 1 21 21 408 408 408 408 149 149 289 321 321 412 177 177 177 177 131 133 141 141 141 281 453 168 44 44 44 129 259 190 190 487 288 278 240 325 34 324 324 301 378 42 147 147 380 84 84 496 496 496 496 274 274 37 24 419 439 225 417 417 80 321 320 7 7 147 147 380 499 319 319 348 94 199 176 176 135 135 135 200 200 199 255 255 255 251 251 241 241 431 235 235 235 235 235 235 200 248 248 212 354 255 236 36 108 397 397 487 360 360 360 360 339 339 33 33 394 478 478 232 68 172 115 196 479 331 331 151 319 151 240 416 314 96 393 393 234 234 234 261 25 106 306 306 306 306 396 203 53 394 478 162 86 86 6 272 470 470 120 120 120 37 37 24 77 270 9 142 397 345 333 333 333 220 220 173 164 164 402 472 196 309 479 331 331 315 315 315 450 88 88 242 116 94 199 255 255 416 416 458 445 445 361 361 361 120 120 282 282 282 388 195 117 117 229 247 247 126 126 326 326 326 326 101 101 149 149 321 412 287 44 44 44 215 35 354 278 278 325 34 462 462 462 402 402 401 401 321 259 354 106 106 481 481 481 293 293 186 39 342 224 224 494 242 203 217 473 41 324 324 422 422 349 234 234 234 234 234 261 261 25 498 498 498 498 396 245 245 43 43 364 345 109 109 496 496 496 37 24 314 36 108 377 87 87 236 239 161 161 79 499 499 499 265 85 85 146 146 173 402 402 205 205 234 161 161 487 487 487 374 374 374 132 132 132 98 229 247 15 15 193 193 17 +103-1241-0001 103 777 1 6 30 5 18 3 23 27 38 13 30 38 8 23 11 29 23 5 24 40 18 5 26 7 33 12 13 30 16 19 23 24 20 9 23 36 24 1 12 20 13 30 38 5 40 31 38 20 33 38 19 34 12 5 9 30 13 34 5 37 24 13 25 20 1 4 29 5 23 6 30 10 14 11 40 1 5 25 11 12 5 24 13 11 27 40 31 23 27 29 33 5 38 15 19 25 12 5 11 19 31 33 5 25 31 33 5 18 14 8 40 5 25 24 19 31 33 31 5 37 29 14 23 5 25 11 1 29 14 29 5 23 38 8 23 12 5 23 19 33 5 23 9 14 11 40 31 4 26 13 40 19 16 19 33 38 14 12 5 38 5 25 11 15 5 37 31 5 24 14 19 25 6 23 12 20 39 19 30 1 7 5 3 3 6 3 5 7 2 4 4 5 7 3 4 6 2 3 7 5 4 2 5 5 4 3 3 3 6 3 3 3 4 4 3 8 8 29 5 4 7 5 3 1 3 6 3 4 5 2 1 3 2 2 5 3 5 5 1 5 3 2 3 6 4 6 4 2 4 5 3 7 4 3 5 11 6 2 2 1 2 5 3 3 6 3 9 3 3 4 2 4 4 5 2 3 1 2 4 3 5 3 2 4 3 3 2 4 7 8 6 2 7 3 3 5 6 3 2 3 9 9 5 3 2 4 1 4 4 5 2 7 7 8 4 3 2 4 2 2 3 2 4 7 5 2 5 6 6 3 4 2 4 2 6 4 6 3 4 6 3 7 2 6 2 3 7 2 5 2 3 4 5 7 3 2 6 3 11 17 17 17 363 51 51 51 228 321 320 157 157 157 157 372 467 44 44 44 58 72 72 72 437 437 481 481 481 481 175 175 81 84 84 84 16 274 274 
274 43 345 345 109 109 264 468 245 245 245 43 364 276 276 346 346 284 265 85 85 85 139 139 293 293 122 122 472 221 129 321 75 74 425 425 386 431 319 319 319 319 203 203 381 381 381 471 185 49 342 342 342 142 72 437 189 189 189 319 189 200 200 180 180 113 113 113 113 167 167 457 401 321 75 127 114 222 222 222 468 245 349 349 234 205 205 261 25 278 139 139 139 293 203 399 70 429 324 324 324 301 32 239 259 354 425 425 241 374 374 374 374 374 132 132 132 381 381 381 381 404 13 13 78 170 170 491 491 491 491 28 491 341 211 341 12 292 292 21 21 21 21 21 21 21 21 408 408 408 408 149 228 321 321 320 7 127 5 448 448 14 14 411 411 264 264 264 264 264 468 468 468 245 43 43 345 141 141 281 162 54 232 482 482 105 397 397 109 109 213 213 213 358 358 36 36 472 397 397 333 333 220 220 314 198 127 22 283 455 236 129 321 354 190 79 380 288 443 443 443 169 169 164 164 164 164 69 69 130 130 402 402 196 217 217 473 432 330 116 94 337 324 324 324 3 3 197 197 226 226 209 209 145 145 486 460 460 215 215 35 29 100 302 497 497 335 14 411 153 153 153 372 372 396 396 36 36 107 107 395 334 334 334 59 37 37 24 471 270 269 433 427 427 247 247 126 126 326 326 326 326 408 149 149 149 321 412 83 55 55 55 322 466 466 22 5 5 455 399 217 217 473 65 443 443 443 240 325 34 84 84 84 496 274 274 186 162 54 482 482 482 482 482 482 26 26 26 241 431 84 496 496 496 215 35 96 96 36 272 255 255 255 43 364 109 109 403 403 403 171 464 464 340 340 116 466 466 22 283 455 236 239 384 371 278 278 278 31 342 86 86 238 6 272 11 11 11 379 379 471 471 49 9 238 6 272 87 87 87 58 72 156 156 255 42 42 147 147 380 499 499 265 265 85 85 146 146 368 368 368 342 342 224 242 242 116 116 33 33 33 90 250 217 217 473 473 278 278 278 31 39 86 86 238 238 401 491 270 270 270 342 168 69 462 462 130 402 221 401 321 321 74 190 190 437 498 498 498 498 498 498 134 16 302 182 302 302 497 175 175 81 89 89 446 446 67 212 131 472 221 401 321 74 190 492 492 498 498 498 215 215 35 259 74 100 100 100 100 375 375 375 375 98 43 7 7 7 276 346 346 346 346 315 85 85 85 139 139 293 293 293 122 35 198 22 5 5 251 251 241 431 278 278 285 449 302 302 497 497 497 8 8 259 354 29 498 498 498 498 498 396 37 37 314 77 478 232 232 232 172 115 115 273 470 486 486 365 365 365 365 328 200 200 200 248 253 253 253 31 342 342 168 118 118 118 118 280 29 177 177 177 314 131 133 364 364 276 347 347 347 347 498 498 467 313 313 216 216 22 283 283 455 43 43 364 276 174 174 174 174 319 319 348 348 195 195 195 64 212 212 93 93 93 93 93 464 464 69 462 130 402 402 162 232 68 172 115 273 273 319 319 203 53 53 29 29 495 467 467 89 340 116 94 335 14 14 411 411 297 297 297 297 297 182 182 497 122 216 216 22 283 448 219 219 219 219 286 286 286 286 286 286 286 59 59 59 452 452 263 417 417 417 491 491 421 421 491 421 128 128 491 128 491 128 128 193 193 193 17 +103-1241-0002 103 778 1 24 4 34 39 36 19 25 21 28 11 12 5 11 30 8 37 4 16 33 14 18 19 40 27 25 16 4 32 5 25 1 19 22 31 13 29 33 11 14 19 26 12 5 24 27 24 5 25 33 31 38 19 25 18 20 24 13 33 38 19 24 5 25 5 25 11 18 4 11 33 5 25 3 11 33 5 12 5 24 1 16 6 30 19 25 29 30 19 25 31 13 11 38 14 11 8 23 5 25 11 39 36 14 31 5 29 27 40 11 33 19 25 3 11 33 36 6 23 5 25 11 31 5 25 11 30 20 39 36 24 20 33 3 25 12 5 30 27 11 1 38 13 12 14 39 36 25 27 12 5 24 14 25 3 33 1 24 4 34 39 36 11 30 13 11 19 11 6 23 38 19 24 5 25 19 22 31 13 29 33 24 3 30 19 23 5 4 25 11 24 19 31 19 40 30 15 10 5 23 1 20 6 5 3 2 3 3 4 3 6 3 2 1 3 4 6 3 4 3 3 1 2 1 4 6 3 5 5 7 2 7 17 4 3 3 4 1 2 3 2 3 2 2 3 3 4 4 2 1 2 2 2 2 1 5 2 5 3 5 3 3 3 3 3 2 2 1 5 2 2 4 2 5 7 3 4 2 3 2 8 17 6 4 3 3 3 3 2 
2 4 3 3 3 2 2 2 7 4 3 2 2 2 5 3 3 2 3 5 5 1 2 2 3 6 2 2 3 8 5 3 2 3 6 2 3 2 2 2 4 3 3 3 3 3 2 2 3 4 7 4 13 4 3 2 2 3 3 4 5 2 2 3 4 3 8 6 39 6 7 3 4 3 3 4 2 2 3 8 9 8 3 3 3 4 4 4 4 4 2 4 3 5 1 2 3 4 3 3 2 4 1 3 4 2 5 3 4 4 4 6 10 17 17 17 296 296 52 52 52 52 52 52 52 52 52 408 101 51 149 149 321 321 7 7 217 473 65 329 329 460 460 169 164 164 485 485 485 485 378 88 121 121 121 116 33 394 239 107 395 470 153 153 387 387 146 146 314 35 259 22 283 455 236 239 161 79 79 499 499 265 85 85 85 146 173 173 280 145 145 460 460 460 169 402 36 272 495 495 467 257 257 257 257 342 168 180 84 350 350 350 350 413 33 394 90 393 234 261 261 25 486 486 486 460 460 169 99 436 436 436 60 298 298 298 275 303 303 117 404 229 491 247 126 126 126 326 326 326 326 326 326 408 408 408 149 228 321 321 412 188 154 154 154 96 96 172 172 273 470 151 151 215 215 96 36 272 161 495 495 467 467 135 135 200 248 466 22 283 455 399 399 70 65 65 84 496 496 203 53 53 291 291 379 379 49 9 142 397 345 409 409 409 409 58 183 451 30 30 30 301 399 217 217 473 443 443 443 240 36 449 472 133 133 364 276 109 278 278 278 399 217 473 136 275 275 116 195 199 89 89 322 67 199 58 110 110 110 110 254 254 314 35 401 75 377 87 87 87 10 10 309 479 331 331 284 284 405 206 206 206 314 314 401 75 108 377 123 123 123 216 114 114 57 57 57 381 381 381 48 48 229 414 491 312 312 126 292 292 23 23 23 23 23 101 260 391 391 228 289 321 321 373 155 155 332 148 148 387 387 372 406 467 467 242 121 203 53 394 76 74 190 190 487 288 330 379 33 394 77 342 342 273 470 443 240 240 133 133 133 345 382 313 285 14 411 284 265 85 85 146 146 175 175 175 81 275 275 275 116 64 212 131 219 152 152 152 378 353 353 353 313 186 54 54 224 494 236 259 74 437 496 496 496 496 274 186 186 323 238 238 6 272 87 87 116 10 479 331 106 284 405 206 206 167 35 75 377 377 123 123 14 14 411 411 411 297 424 297 182 182 293 293 175 175 89 89 446 446 33 394 478 478 232 232 172 172 273 273 319 319 348 64 64 212 161 300 337 41 219 219 219 219 152 152 152 399 217 473 213 213 213 252 449 449 106 125 125 125 125 466 22 283 455 42 42 147 380 380 84 496 496 496 496 274 274 37 24 404 427 321 247 126 126 326 326 326 101 101 149 149 149 321 321 320 7 345 109 409 181 240 216 300 300 300 219 219 152 152 152 152 10 10 479 331 84 84 84 274 216 216 114 57 203 399 70 157 157 157 157 313 10 479 331 331 307 307 307 307 61 167 167 233 227 227 419 439 417 170 170 170 170 28 491 28 28 491 491 28 362 491 491 40 305 305 305 40 40 40 40 40 40 163 491 491 366 366 366 163 491 316 316 491 435 289 321 321 320 7 217 217 473 65 486 486 486 460 460 169 169 164 164 485 219 219 485 152 152 301 422 239 36 161 79 79 288 288 443 240 325 34 191 191 191 314 131 472 14 14 226 321 321 411 297 297 297 297 297 297 297 297 297 297 293 497 497 497 122 43 364 276 109 109 278 278 399 217 473 136 275 275 275 195 195 195 335 440 440 154 154 154 458 96 96 68 172 273 470 151 151 215 35 96 272 472 472 472 196 70 70 70 65 495 495 380 467 256 139 175 251 241 423 423 423 423 355 89 89 446 116 33 250 250 217 473 258 258 258 342 342 224 494 494 494 31 9 142 142 397 147 147 329 329 329 329 329 143 36 310 107 395 302 302 302 375 98 98 13 229 491 247 312 15 15 15 15 193 193 193 17 +103-1241-0003 103 694 1 18 20 18 4 11 5 25 5 25 22 5 24 16 14 33 5 9 5 23 16 20 23 19 26 1 12 5 33 12 5 24 19 31 33 19 30 20 5 31 22 30 20 10 14 40 38 14 31 20 22 30 19 33 23 20 23 4 16 19 26 4 33 18 19 24 1 18 20 24 15 18 4 37 9 19 25 22 38 8 33 30 8 33 19 25 34 19 26 22 19 26 31 27 1 16 6 30 18 20 38 5 40 5 25 1 3 11 23 35 22 19 26 29 14 31 19 25 19 21 1 38 19 12 5 25 5 25 
17 15 25 23 20 16 19 17 39 14 1 4 25 11 23 6 26 1 8 14 25 17 30 15 18 13 30 12 5 33 5 10 33 18 19 40 31 33 36 29 19 26 32 27 23 11 14 40 1 12 4 2 6 1 2 2 3 3 4 5 3 3 2 2 1 2 2 2 3 8 5 2 3 7 11 2 2 1 2 2 2 2 6 3 2 3 3 4 4 5 3 3 5 3 4 2 2 7 3 3 2 3 2 2 4 7 6 5 2 5 3 3 3 2 8 19 4 3 4 3 1 2 2 1 2 3 3 3 5 3 3 4 3 2 3 3 1 4 3 2 5 4 9 14 5 2 2 3 2 2 2 3 2 5 1 9 2 3 2 4 2 4 5 4 5 2 1 4 12 1 3 2 2 2 2 4 5 3 8 4 1 4 9 3 3 2 6 14 5 2 6 1 9 7 1 13 2 7 3 3 8 8 6 4 3 1 8 4 7 2 1 2 3 8 3 4 3 2 4 7 4 3 3 5 6 7 17 17 17 363 363 363 363 363 408 51 51 491 321 451 451 30 30 30 58 72 110 110 110 254 254 254 285 44 44 44 94 199 331 319 319 319 348 394 76 465 144 27 27 437 319 319 319 53 53 77 205 155 29 6 134 134 134 8 354 100 100 100 497 349 349 234 234 234 261 261 25 485 213 485 286 139 139 175 175 81 176 176 328 328 200 200 117 229 321 247 126 126 326 326 326 101 408 149 228 321 321 45 45 45 45 198 22 5 455 399 473 473 494 38 162 232 232 86 238 6 272 485 485 286 468 468 337 337 485 324 459 459 271 31 54 9 142 221 336 259 208 208 190 487 288 213 213 213 252 36 310 107 395 334 334 304 304 49 269 142 397 397 141 141 281 162 232 232 172 115 273 444 444 213 252 143 458 208 79 487 313 236 143 36 26 359 166 166 166 324 301 251 251 251 251 241 241 431 376 376 376 460 460 169 169 352 164 164 25 176 135 328 200 464 464 415 415 415 415 415 36 131 183 57 57 57 57 57 381 381 48 48 417 417 170 170 170 28 491 28 491 341 491 341 341 491 163 491 491 316 316 316 321 435 289 373 451 451 30 30 301 399 217 473 65 476 476 464 464 202 202 202 8 137 137 137 116 90 90 76 208 441 441 346 346 428 428 146 146 24 35 133 133 147 380 499 428 428 428 146 143 449 449 89 340 116 394 212 164 164 214 214 360 360 200 76 465 192 192 176 135 200 200 248 248 478 342 172 115 273 84 84 84 84 274 98 13 229 247 312 126 126 292 326 326 23 23 101 101 101 149 391 228 321 321 155 155 155 148 148 372 58 183 451 30 30 378 378 141 141 141 281 342 168 44 44 44 94 199 335 14 14 411 284 284 284 405 405 206 206 240 314 26 26 241 241 367 367 367 458 321 192 176 135 200 200 248 248 465 259 74 74 492 492 492 492 271 150 342 342 224 224 242 116 199 459 459 469 469 37 24 75 310 107 107 447 97 225 225 80 321 321 320 345 333 333 220 220 22 44 44 94 199 331 319 319 319 348 348 33 394 394 212 239 445 180 290 290 290 290 434 434 434 339 33 359 359 81 166 324 324 422 422 349 234 234 234 234 261 25 25 278 278 278 416 458 192 300 334 334 355 355 452 263 229 247 247 126 126 326 326 326 326 101 101 149 149 228 321 412 83 55 55 446 322 67 33 33 250 251 251 241 241 431 235 235 235 235 235 235 235 235 235 348 200 200 200 248 335 14 14 14 226 209 287 265 265 265 85 85 146 146 146 299 242 242 339 195 195 248 248 212 239 208 79 79 288 288 403 403 403 171 171 171 3 58 58 72 72 72 72 110 264 264 264 264 264 264 264 468 468 59 313 216 216 127 45 45 45 236 129 75 108 119 351 351 151 151 151 169 178 36 310 447 447 447 6 272 257 257 257 257 257 99 338 338 338 447 482 238 238 336 321 75 371 374 374 374 374 215 129 354 176 176 176 328 200 200 248 248 186 338 338 338 395 470 106 424 424 424 424 497 122 122 131 300 334 334 304 304 185 185 269 9 427 229 247 247 15 15 193 193 193 17 +103-1241-0004 103 737 1 5 25 11 5 16 35 23 31 6 16 33 9 30 7 25 9 19 30 11 1 38 19 10 18 20 18 4 11 38 6 30 25 13 37 14 31 19 25 31 18 20 38 5 40 33 38 13 25 33 20 1 19 25 16 4 22 33 1 18 20 18 4 11 23 35 22 33 4 33 38 13 25 33 20 1 37 13 30 20 24 5 10 13 40 18 20 23 35 22 33 4 33 31 19 22 31 33 20 1 23 4 22 19 26 5 23 19 33 5 23 5 37 12 5 17 30 15 25 5 31 1 18 38 13 25 18 20 30 20 10 33 9 30 8 33 30 19 37 14 1 12 
13 30 38 5 40 25 27 31 8 25 5 37 13 25 20 33 30 15 25 1 8 4 2 2 3 9 11 4 9 6 5 4 4 2 9 7 3 7 5 7 2 3 3 3 2 1 2 1 5 3 3 3 2 2 4 2 4 2 3 2 1 2 2 1 4 3 2 2 3 3 9 40 5 4 4 9 3 4 17 5 3 2 3 3 4 3 4 2 3 7 2 2 3 3 10 5 6 3 2 4 4 3 4 2 2 2 3 2 3 3 2 2 3 4 2 3 3 3 8 14 7 4 4 3 4 2 3 2 2 3 2 2 3 1 2 3 3 4 3 5 13 72 5 2 2 2 3 3 4 4 4 5 2 3 5 6 3 3 3 9 6 4 2 3 3 2 5 4 6 7 6 3 2 3 2 3 3 6 2 5 9 4 17 17 17 363 363 51 51 228 321 412 55 55 322 67 212 34 44 44 462 349 234 234 234 234 234 261 261 25 424 424 424 424 424 182 182 182 497 497 497 497 497 497 186 186 162 482 482 482 482 115 115 273 106 405 405 405 405 206 169 349 352 352 402 6 272 472 221 336 336 354 190 380 288 315 315 315 315 450 450 450 413 413 348 33 33 33 394 394 32 239 75 354 485 213 286 286 286 286 286 286 334 59 59 37 37 24 131 404 225 225 225 225 225 320 345 407 407 407 407 36 107 107 400 30 30 464 254 254 254 314 133 133 364 276 276 153 153 153 387 372 396 388 94 199 145 463 463 173 280 29 313 186 162 342 342 224 494 379 379 379 77 342 224 30 30 378 345 141 141 281 31 9 6 272 119 397 441 109 432 330 348 64 64 76 449 41 41 41 41 19 19 454 13 417 170 170 170 491 28 28 28 28 491 491 2 2 2 491 362 362 102 362 362 362 491 362 362 491 305 305 366 366 366 366 366 435 435 316 316 435 435 435 289 321 321 321 209 188 340 340 33 33 90 349 205 234 261 25 470 486 376 376 376 376 460 460 178 178 233 96 75 419 427 321 247 126 126 126 326 326 326 326 326 326 326 101 408 408 408 149 321 321 373 451 451 30 30 30 58 58 110 254 254 254 254 314 26 26 251 241 367 367 367 367 367 35 96 321 75 34 415 415 415 385 314 259 108 119 397 441 441 432 330 330 379 64 76 36 449 41 41 41 41 19 19 454 454 454 414 47 47 80 80 80 321 321 7 7 32 280 104 104 104 104 104 337 337 337 324 301 399 217 383 383 383 383 383 383 310 107 34 253 253 253 342 224 30 30 30 301 26 241 367 367 367 35 96 321 272 34 415 415 143 478 478 68 68 115 273 278 278 178 96 96 86 86 238 6 272 41 41 41 19 19 454 417 417 417 47 47 47 47 491 47 47 47 80 249 435 435 321 321 321 7 7 251 241 241 431 376 376 460 178 178 458 192 192 176 176 135 200 200 200 464 255 251 251 241 431 278 278 278 26 26 302 302 175 175 69 223 130 130 198 22 283 455 416 144 208 79 380 288 288 290 290 434 339 339 199 459 459 459 271 271 39 39 433 433 160 160 112 112 56 56 56 56 28 28 491 28 28 28 28 362 362 491 362 362 362 362 362 362 491 491 362 362 491 362 218 40 40 211 369 369 369 369 369 369 369 369 369 21 21 21 21 21 21 21 21 21 21 21 21 21 260 260 260 260 260 260 260 260 260 260 260 260 260 260 163 163 163 366 491 316 316 316 491 316 73 73 289 321 321 7 409 409 409 409 94 58 183 451 30 30 301 378 42 147 147 288 213 213 213 252 36 107 447 447 6 472 472 221 336 321 354 190 380 499 428 428 428 146 146 143 35 472 133 133 147 147 147 288 288 278 173 173 280 29 334 334 355 355 452 452 263 263 417 417 417 442 80 80 435 321 435 127 127 0 222 222 245 378 8 345 141 141 281 281 9 9 238 196 309 479 331 231 231 231 231 231 274 274 186 162 232 172 172 115 273 480 480 480 85 85 299 299 94 199 69 223 130 280 34 475 475 475 475 475 475 324 422 36 310 161 161 161 487 487 288 290 290 290 434 434 339 303 303 48 48 48 417 417 417 417 193 193 17 +103-1241-0005 103 701 1 18 20 34 6 33 18 20 38 5 40 33 36 14 23 20 31 27 18 20 33 8 11 18 19 40 18 6 30 31 19 25 12 20 39 3 30 11 5 37 12 5 31 24 6 23 1 9 30 8 33 30 19 37 14 18 27 33 13 23 1 5 25 11 38 13 25 33 27 37 14 33 19 12 5 31 33 15 32 5 25 18 7 31 1 12 5 23 6 26 29 23 4 33 16 6 30 24 38 5 40 1 6 23 24 27 31 33 11 19 40 14 33 19 11 1 12 20 27 25 23 20 23 19 37 19 26 22 30 20 10 14 19 25 31 8 33 1 9 
20 19 26 5 17 14 23 18 36 38 5 40 31 19 33 19 26 6 25 5 29 8 23 5 37 32 19 26 17 5 23 40 4 33 12 20 13 22 31 33 30 20 24 13 25 11 1 11 6 3 4 4 2 2 2 2 1 4 5 6 6 3 5 5 2 5 3 6 6 2 2 2 4 2 3 3 5 1 2 2 1 4 6 2 2 2 1 2 3 6 4 7 5 1 3 3 5 6 2 2 3 2 4 5 6 4 8 15 6 1 2 3 2 3 3 5 2 3 3 3 1 3 3 3 4 6 1 2 4 7 11 21 2 3 4 6 7 4 3 4 3 5 2 3 2 2 2 4 1 5 3 3 3 2 2 2 2 5 5 2 3 5 8 3 4 7 2 2 4 3 3 2 2 4 3 3 3 5 3 2 3 7 7 4 1 2 7 1 4 3 5 7 6 3 3 3 2 2 4 3 2 5 3 3 2 3 6 7 2 2 3 6 2 5 2 2 3 3 2 2 3 2 2 5 2 4 3 6 4 5 5 6 13 17 17 296 296 296 317 491 491 184 184 184 321 435 435 451 451 30 30 30 422 164 164 106 106 106 153 387 387 285 449 451 30 301 378 378 141 141 281 281 9 238 6 108 377 344 344 374 132 132 88 147 109 498 498 498 498 134 134 175 81 474 41 41 19 19 186 162 68 172 115 273 84 496 274 58 451 451 30 30 422 422 36 108 119 119 437 265 265 265 85 85 146 146 24 131 257 257 257 257 31 54 9 142 397 441 153 153 153 372 396 313 186 342 342 224 340 340 340 466 22 283 448 448 219 219 464 180 180 306 306 306 306 396 396 24 285 69 223 130 198 22 283 455 38 162 232 482 482 482 105 196 70 70 65 65 481 481 481 481 182 293 293 497 122 129 401 321 321 190 190 380 380 499 428 428 428 146 146 385 36 472 133 42 147 147 288 278 278 173 280 29 29 245 58 58 72 496 496 496 274 274 143 75 108 119 119 351 351 256 139 139 139 139 139 375 375 375 98 13 321 247 126 126 126 326 326 326 326 326 326 326 101 149 149 228 321 412 83 55 55 322 67 67 133 364 109 109 189 330 330 348 64 36 34 180 410 410 410 410 410 280 29 29 313 236 36 377 377 123 123 216 22 283 283 38 162 232 68 238 272 470 470 171 171 171 99 99 436 436 60 298 116 33 199 58 58 268 268 268 268 268 268 268 169 169 186 39 323 390 390 18 18 112 112 56 491 56 312 491 491 312 491 12 12 12 12 21 23 23 260 260 260 391 149 228 491 321 321 127 5 5 455 251 251 241 431 235 235 235 235 235 235 235 235 348 200 248 248 90 465 259 74 425 386 386 431 486 486 460 240 35 35 393 393 155 155 148 148 148 148 387 387 203 53 250 345 141 141 141 281 9 483 14 226 226 209 411 297 297 297 297 203 53 53 65 496 496 368 31 54 6 6 272 490 490 490 368 453 9 168 498 498 498 498 396 313 325 449 191 191 191 37 24 24 404 13 417 47 47 491 491 321 80 321 321 435 5 448 448 14 14 411 411 350 350 350 350 350 350 466 81 166 166 324 301 301 251 251 241 431 278 278 173 280 176 176 328 200 248 394 465 208 208 487 487 213 213 213 422 36 310 107 395 334 495 406 340 340 340 340 33 394 478 66 68 68 115 273 273 265 265 265 428 85 146 358 24 36 472 472 336 259 354 420 420 420 360 360 135 135 135 135 200 200 44 44 44 44 416 321 144 208 498 498 498 498 498 498 134 302 302 375 375 293 497 98 225 489 489 489 489 378 43 345 141 141 281 162 232 68 68 115 115 470 278 278 325 325 176 176 176 135 135 200 200 199 125 125 125 125 199 44 44 44 129 259 321 74 437 437 265 85 85 85 85 85 139 175 175 81 462 462 130 402 99 338 338 338 338 395 395 360 360 200 200 200 248 212 302 302 302 302 497 185 49 342 168 415 415 415 198 22 448 448 448 448 154 154 154 458 321 96 96 482 482 447 238 6 161 487 487 288 360 360 360 434 434 301 399 217 473 65 432 432 432 120 330 388 388 303 303 303 243 243 131 419 427 491 229 247 15 15 15 15 15 193 193 193 193 193 17 +103-1241-0006 103 761 1 24 4 34 39 36 9 13 30 23 20 25 27 33 19 26 12 5 33 19 33 38 5 40 5 17 14 23 1 31 8 11 5 23 11 29 4 31 33 18 14 13 40 22 38 19 22 23 20 13 40 29 3 31 5 9 5 23 38 19 12 7 33 23 35 22 19 26 4 33 18 14 1 18 4 11 18 20 23 35 22 33 1 18 20 22 35 11 18 3 30 11 23 20 18 4 37 16 15 23 11 33 5 25 27 33 5 31 12 5 33 13 25 31 30 19 21 19 11 5 33 20 4 25 11 13 22 31 29 13 22 33 15 
32 5 25 5 37 18 14 4 33 5 33 36 11 4 25 11 19 22 31 29 30 13 32 5 25 1 32 20 38 5 40 31 19 33 19 26 12 13 30 38 15 33 19 26 16 14 31 5 24 34 19 26 6 30 31 5 24 9 3 11 20 1 15 5 6 4 4 5 4 3 5 3 2 5 2 10 2 4 1 2 1 2 4 6 3 3 3 5 6 7 3 9 7 2 2 3 4 5 4 4 2 3 2 3 6 3 3 2 4 2 4 2 4 3 4 4 2 2 2 1 3 2 2 6 2 3 2 4 2 4 3 2 4 6 26 6 3 2 4 3 4 4 5 4 14 6 2 4 1 2 6 3 3 3 2 1 2 1 2 6 4 4 1 3 2 6 4 3 3 6 2 2 8 2 5 6 2 2 6 2 2 3 4 8 1 2 2 3 4 3 2 2 5 4 4 6 1 2 2 2 3 5 6 3 2 6 4 3 2 2 2 2 3 3 2 2 2 6 3 6 26 7 4 4 1 5 3 2 2 4 5 2 3 6 4 5 3 2 5 4 3 8 2 5 4 3 4 7 3 8 2 4 2 2 2 9 7 17 17 17 296 363 363 363 363 363 101 51 51 51 321 321 320 7 217 473 65 329 329 329 460 460 329 329 164 164 485 485 219 485 485 477 374 132 132 132 32 321 321 354 180 264 264 264 468 468 313 134 175 359 166 166 166 301 10 479 331 84 84 496 496 285 459 459 459 31 342 342 224 176 176 328 200 200 248 212 45 45 45 325 177 177 177 177 43 43 364 364 276 346 346 141 141 141 368 453 342 168 44 44 416 416 321 144 27 498 498 498 498 467 302 302 375 375 98 98 98 13 225 321 435 225 7 373 66 66 68 68 172 115 273 265 265 265 85 85 146 146 285 285 302 302 302 497 122 122 472 472 259 74 74 437 437 311 311 311 460 169 150 39 86 238 6 272 300 334 334 406 406 467 356 281 281 9 9 142 221 336 321 208 208 441 109 278 278 178 143 458 192 26 359 359 474 166 464 464 253 253 253 342 142 221 321 74 437 437 405 405 206 150 342 342 224 494 134 8 100 100 100 100 497 345 333 333 220 216 180 113 113 113 113 167 167 457 251 251 241 367 367 367 367 458 192 192 135 135 200 200 464 415 415 415 415 415 285 156 156 156 156 59 59 452 263 13 229 491 247 312 312 312 292 292 292 292 292 1 21 21 21 21 21 21 21 260 408 408 408 149 149 321 321 321 373 72 110 430 430 430 430 430 430 325 183 183 451 30 30 301 301 251 251 241 367 367 367 367 367 367 233 96 96 6 227 419 427 56 170 442 442 201 201 201 201 201 201 491 47 491 491 491 435 435 435 320 451 451 30 30 422 458 144 389 389 389 131 58 72 72 72 437 306 306 306 306 306 396 313 285 26 359 166 166 166 166 464 202 202 349 205 234 234 261 261 25 470 443 139 139 139 293 497 122 35 75 377 87 87 87 116 10 10 479 331 84 84 496 496 274 285 325 459 459 459 271 31 342 342 86 198 198 127 283 5 236 129 129 321 108 119 119 351 432 432 330 330 33 195 471 77 269 238 272 447 397 336 147 456 456 236 236 239 310 107 395 278 278 278 325 34 469 469 236 36 108 449 41 41 41 324 324 246 3 464 89 89 446 446 212 131 145 443 178 178 458 96 96 86 105 105 336 470 470 151 178 143 96 401 321 75 108 119 418 418 418 418 418 186 99 436 436 60 60 298 116 199 69 223 130 402 402 156 156 156 245 14 14 411 411 145 145 145 460 460 240 325 34 469 469 143 321 108 449 485 485 485 374 132 132 325 325 34 89 446 446 67 131 34 154 154 154 96 96 54 142 105 336 190 380 288 151 151 169 99 436 436 60 60 298 298 298 303 303 303 48 404 229 491 491 312 312 312 292 292 292 292 292 292 21 21 21 21 21 21 21 21 408 408 408 408 391 491 491 321 373 373 338 338 400 400 400 30 301 378 378 345 141 141 141 281 162 162 232 68 68 115 470 278 278 278 325 176 176 176 135 135 200 248 248 212 127 114 264 264 264 264 468 245 245 43 43 364 364 109 109 109 171 171 171 252 449 449 176 176 135 328 200 200 248 248 393 155 155 332 332 332 186 162 232 68 68 68 115 273 231 231 231 231 53 53 394 76 164 164 214 214 214 214 214 328 200 200 117 335 14 209 157 157 157 157 157 313 186 186 162 232 68 68 68 115 273 231 231 231 53 53 212 212 65 493 493 240 325 41 41 41 19 19 19 454 229 491 247 15 15 193 193 193 17 +103-1241-0007 103 757 1 4 25 11 1 31 19 25 31 19 33 19 26 4 25 11 38 15 33 19 26 38 5 40 12 20 27 25 
23 20 34 19 26 33 19 11 36 21 5 31 33 12 13 25 1 32 20 31 4 33 1 4 25 11 38 15 33 19 11 38 19 34 1 6 23 18 14 24 8 33 5 25 11 24 15 25 1 24 4 34 39 36 19 25 22 7 25 33 14 11 12 5 31 33 15 32 5 25 24 4 31 33 14 23 3 22 19 26 5 29 12 5 33 19 22 5 33 6 16 5 31 29 30 13 29 30 5 33 6 30 20 33 19 17 27 19 26 18 27 24 16 14 31 5 29 14 1 5 25 11 4 31 33 19 24 19 16 12 5 16 8 37 34 14 11 20 33 30 15 25 38 35 11 1 31 36 25 9 20 5 23 6 26 1 10 12 3 5 1 8 3 3 7 3 2 4 4 1 2 2 3 4 3 3 3 2 2 2 3 4 6 1 3 3 4 2 10 3 2 4 8 3 5 5 1 2 6 6 11 6 4 7 6 4 1 2 3 3 4 4 3 2 5 2 2 4 3 9 4 6 2 6 5 4 2 1 5 2 7 9 42 6 4 4 2 4 2 3 4 6 2 3 2 2 1 3 3 3 3 6 2 2 3 5 3 4 2 6 3 4 2 5 3 4 2 2 5 2 3 2 2 5 6 3 3 4 2 3 2 2 3 3 3 3 3 2 2 5 4 5 3 4 5 4 3 3 5 2 5 7 16 6 1 3 7 3 3 2 3 2 2 2 4 5 7 2 5 3 3 3 6 2 5 3 2 2 5 4 7 6 2 2 2 4 3 6 8 20 17 17 17 363 51 51 228 184 321 321 209 83 194 194 194 194 194 194 194 194 194 282 388 195 195 212 212 131 483 197 197 197 197 66 66 68 68 115 273 494 278 330 379 33 394 478 478 68 68 68 115 115 470 278 278 325 449 176 176 176 328 328 200 200 464 89 446 116 131 133 364 109 109 403 171 171 171 252 449 449 176 176 328 200 200 250 345 141 141 281 281 9 198 22 448 448 448 14 411 350 350 350 350 350 350 348 359 81 324 324 324 422 164 164 164 214 214 214 214 214 200 200 200 195 248 248 394 76 75 377 87 87 87 236 321 384 371 374 374 374 374 132 132 236 36 310 107 395 395 151 151 151 169 150 39 86 86 238 6 127 114 361 361 361 361 361 388 388 303 117 48 229 321 247 126 126 292 292 408 408 408 408 391 321 321 373 373 400 400 400 400 30 422 422 162 232 68 68 115 273 470 486 486 486 460 460 169 169 36 227 483 226 440 89 446 322 67 394 133 364 321 364 109 109 171 171 171 252 252 449 191 191 191 191 131 133 133 321 345 333 220 220 220 164 483 14 226 321 321 209 411 297 297 297 297 297 297 297 293 497 175 81 58 58 156 156 156 156 156 245 399 217 217 217 70 65 65 428 428 146 146 358 449 449 449 242 116 116 33 33 217 217 217 217 473 290 290 290 290 290 434 434 434 339 303 303 48 48 48 417 491 170 170 28 28 28 28 491 362 362 362 491 491 362 362 491 362 491 491 211 211 491 341 369 369 369 369 21 21 21 21 21 21 21 21 21 101 101 101 149 149 228 321 321 7 217 473 473 329 329 329 460 169 169 164 164 219 219 485 485 378 88 88 242 446 348 90 90 465 445 445 351 351 486 315 319 450 413 413 76 449 449 300 191 313 314 198 22 283 455 38 232 232 238 6 272 470 171 171 171 252 99 436 436 60 60 298 116 33 250 217 473 65 486 486 460 460 169 150 54 238 6 272 300 334 382 313 251 251 251 241 431 405 405 405 206 178 458 192 176 135 135 200 200 200 199 230 230 230 230 215 35 96 198 22 283 455 236 129 321 108 119 351 278 278 143 458 192 192 277 385 325 180 106 405 405 405 206 169 352 352 25 459 459 271 271 31 9 142 221 321 259 190 488 488 488 488 215 35 29 29 382 313 236 36 36 119 351 153 153 153 372 467 337 337 301 236 108 377 123 123 416 458 144 180 180 84 84 496 88 88 176 176 135 328 200 200 248 58 72 437 437 350 350 350 350 203 53 381 394 394 155 155 332 332 332 186 162 342 115 273 273 151 151 215 215 354 29 334 334 334 59 452 452 229 321 247 126 126 126 326 326 326 326 326 326 326 326 101 149 149 149 228 321 83 55 55 322 67 212 34 145 145 486 376 460 460 169 150 150 86 238 6 272 57 57 57 203 473 118 118 118 118 402 198 22 5 5 455 349 234 234 261 25 106 265 265 265 85 85 146 438 173 349 234 393 198 164 470 498 498 313 285 325 41 324 324 422 36 310 161 161 487 487 288 290 290 290 290 434 434 339 250 250 345 389 389 389 314 131 472 401 401 401 80 321 80 321 478 66 68 482 115 273 374 374 374 132 413 33 250 212 354 420 420 420 464 464 
255 255 251 251 241 431 235 235 235 235 235 235 413 303 303 303 48 48 48 417 417 170 170 421 421 491 421 421 491 491 128 128 491 128 128 128 193 193 17 +103-1241-0008 103 756 1 12 5 16 8 37 34 14 11 20 33 30 15 25 18 4 40 9 19 25 19 25 4 25 11 17 6 25 18 4 16 5 25 7 30 5 17 27 1 4 25 31 14 11 12 4 33 9 30 19 31 22 5 16 19 32 5 23 1 9 5 33 12 13 30 38 5 40 5 29 4 31 5 25 21 14 11 30 3 29 33 6 16 14 39 36 1 5 23 19 33 5 23 17 14 23 1 32 20 40 31 19 33 19 26 7 33 12 13 30 3 25 12 5 32 19 26 17 5 23 40 1 8 4 31 33 18 14 33 5 17 27 19 25 33 5 12 5 23 15 11 20 40 38 15 33 19 26 30 36 24 9 5 33 32 20 19 25 16 6 30 24 11 24 20 17 30 15 37 23 20 1 12 4 33 32 20 29 30 5 16 14 11 33 19 31 33 15 7 33 31 8 11 1 23 2 3 4 7 3 4 3 2 3 5 2 4 2 2 2 4 3 3 3 11 4 4 3 3 3 9 4 5 5 4 2 2 6 5 2 3 10 7 8 3 4 2 3 2 4 4 4 2 2 4 3 2 4 2 4 4 6 23 3 5 3 1 2 3 3 2 3 2 4 4 5 2 3 2 3 5 2 4 3 2 4 6 2 5 9 10 6 3 1 2 1 2 4 6 9 14 5 3 3 3 2 2 4 5 4 3 2 2 3 3 3 1 2 5 2 4 2 2 3 6 20 7 7 4 2 2 2 2 2 4 6 3 3 3 1 2 3 4 4 2 4 4 3 3 3 2 4 4 2 3 3 4 2 5 3 2 4 4 2 3 3 1 2 4 4 3 5 4 2 8 13 2 3 2 3 3 2 2 3 3 3 2 2 2 2 3 2 6 4 2 8 6 5 17 17 17 296 296 296 317 52 52 52 52 52 52 52 52 52 51 51 51 184 184 321 321 320 127 5 5 38 349 261 261 25 106 265 265 85 85 146 438 173 349 221 401 127 114 498 498 498 313 325 34 324 324 422 236 36 161 161 487 487 288 290 290 290 434 339 199 58 254 254 71 281 342 142 221 336 354 137 137 137 137 116 335 335 14 14 411 188 188 340 340 340 340 330 330 388 94 199 199 89 89 89 446 446 67 64 131 472 472 458 144 180 106 426 426 426 426 426 426 282 282 388 195 117 404 404 225 225 72 110 486 486 486 460 460 169 352 352 29 44 44 94 199 145 145 315 315 315 468 468 406 467 467 467 469 416 416 192 180 84 84 84 84 375 98 98 13 417 417 47 491 47 80 80 321 321 320 83 83 145 145 365 365 330 330 379 77 77 224 179 179 179 313 314 198 164 127 114 92 92 92 92 167 457 401 401 401 321 354 190 380 288 278 278 31 342 86 105 105 458 192 255 255 349 205 261 25 278 278 278 99 436 436 395 302 302 302 375 98 98 13 229 321 247 312 312 292 1 1 292 326 1 326 23 23 23 23 101 101 101 149 391 491 289 289 320 7 354 159 159 159 159 159 167 35 35 198 127 0 0 222 468 245 378 345 141 141 281 453 453 44 44 44 44 259 74 437 311 311 311 311 150 150 342 224 494 494 469 116 64 212 310 300 334 382 313 314 314 239 161 161 380 499 405 405 206 215 215 96 96 272 34 106 405 405 206 169 349 352 234 155 332 332 332 313 219 219 477 477 477 477 132 132 98 98 417 417 417 417 47 47 491 491 47 80 80 80 321 320 412 287 44 44 44 175 81 431 278 278 285 302 497 497 122 416 144 180 498 498 498 498 498 499 302 375 375 98 98 263 13 417 417 417 417 170 491 491 47 491 491 491 47 47 435 435 321 321 373 310 400 400 400 30 422 422 162 232 68 172 115 470 278 278 325 176 176 135 135 200 200 464 180 113 113 113 113 167 167 35 35 127 114 264 264 468 406 467 467 125 125 125 348 466 22 283 455 99 338 338 395 395 360 360 360 200 200 248 212 302 302 302 302 375 375 185 269 433 112 427 491 247 312 312 126 292 292 292 23 23 23 23 23 101 101 101 149 149 149 289 321 412 287 287 287 111 111 111 438 145 376 376 460 460 169 150 86 86 238 6 272 272 156 382 313 325 34 87 87 416 144 27 180 84 84 496 88 88 88 340 340 340 116 33 394 465 377 123 123 198 22 283 283 455 251 251 241 431 171 171 171 252 325 41 41 41 318 318 49 9 142 397 364 109 109 403 171 171 252 449 449 176 135 328 200 248 248 345 380 288 288 496 203 203 53 53 212 354 159 159 159 159 167 167 310 107 338 400 400 400 30 324 464 121 121 33 394 90 393 155 155 25 148 148 387 387 387 203 53 53 53 64 10 429 429 429 301 416 32 321 208 79 79 
380 288 288 171 171 171 252 173 173 402 26 359 474 474 474 474 19 19 19 229 321 247 126 126 326 326 326 326 101 149 149 228 321 321 412 45 45 45 169 143 310 107 400 400 30 301 422 129 74 492 492 245 349 349 261 25 498 498 498 313 313 36 377 87 87 38 54 86 6 272 470 171 464 464 464 113 113 113 113 167 77 478 172 172 273 265 265 265 265 85 85 299 299 24 131 419 439 439 417 417 237 128 193 17 +103-1241-0009 103 694 1 32 20 40 5 22 15 31 8 32 35 11 31 15 1 8 24 25 3 33 19 22 31 29 13 22 33 19 26 5 17 14 23 1 31 13 11 24 4 34 39 36 9 23 4 26 22 23 20 1 19 33 31 5 9 28 8 37 22 5 24 16 6 30 1 18 20 32 35 11 9 20 18 20 30 1 24 19 31 19 40 4 23 5 17 40 4 25 11 14 31 29 13 25 31 14 38 5 40 33 19 9 30 19 26 18 19 24 27 37 14 16 14 24 25 27 37 5 31 22 27 32 5 16 6 30 24 20 1 12 5 31 33 15 32 5 25 24 4 31 33 14 38 19 31 5 23 11 1 10 7 4 4 4 4 8 5 4 4 2 3 2 12 53 9 1 3 2 3 2 3 2 3 2 3 3 1 3 3 4 6 8 14 7 2 3 2 6 2 3 2 4 3 3 3 3 3 6 24 4 4 2 2 8 11 4 4 4 2 6 5 3 8 13 5 2 5 2 2 3 6 4 3 7 19 3 2 3 2 4 6 3 3 3 4 4 2 2 4 6 3 2 4 5 6 2 2 4 2 2 4 2 3 2 3 2 4 3 4 3 4 2 1 5 3 2 2 5 3 4 5 3 5 3 2 4 8 27 3 3 4 3 3 5 3 2 3 6 3 3 4 4 2 4 3 3 6 11 17 17 17 363 363 363 51 51 184 184 321 373 373 338 400 400 400 400 213 356 356 368 453 342 168 44 44 44 44 458 144 445 351 343 343 343 343 171 358 358 358 39 342 224 168 111 111 111 111 438 186 99 395 395 389 389 236 36 478 224 273 470 403 403 403 207 207 207 454 263 48 417 417 417 417 417 237 237 442 28 28 28 491 28 442 442 442 442 362 362 362 362 362 362 362 362 362 362 305 218 218 218 218 366 218 491 491 366 366 366 366 366 366 491 366 366 366 366 316 316 316 316 435 321 289 321 209 287 287 111 111 111 111 438 438 203 10 479 307 307 307 61 285 34 154 154 458 96 342 86 105 336 470 470 151 151 178 178 96 36 449 176 135 135 200 464 44 44 44 416 321 208 79 498 498 498 498 499 302 375 98 98 98 13 13 414 170 170 442 47 47 47 491 47 47 47 491 316 316 80 80 321 289 66 66 66 179 179 179 179 314 314 196 217 473 486 486 329 460 169 169 352 183 485 485 485 374 301 129 321 259 425 425 386 431 376 365 365 299 76 76 465 26 26 359 359 474 474 474 19 19 229 321 247 126 126 126 126 326 326 326 326 101 408 408 408 391 391 391 391 316 316 80 80 289 321 321 321 188 177 177 177 356 356 356 342 168 44 44 44 8 32 32 32 321 354 354 153 153 153 153 387 387 387 146 464 464 111 111 111 146 438 202 402 402 402 75 144 27 437 319 319 203 53 53 394 90 393 234 205 155 332 148 148 148 372 372 372 59 452 263 263 78 414 47 47 47 47 47 491 47 47 80 491 321 289 289 373 451 451 451 30 30 30 99 338 338 389 389 389 389 314 32 239 354 420 420 420 213 213 252 422 183 183 451 286 286 286 286 334 59 59 59 452 263 321 247 247 126 126 292 292 326 326 326 326 326 326 101 101 101 149 228 289 289 320 217 473 258 258 31 342 224 494 494 368 453 342 483 14 145 145 284 329 329 175 175 81 81 469 416 458 458 96 342 68 115 273 365 365 365 330 348 64 212 300 382 382 313 186 162 162 482 482 105 105 336 470 470 432 330 330 64 64 77 77 342 224 300 334 334 334 59 245 43 43 345 141 141 141 281 9 9 6 6 87 87 87 87 8 321 354 190 288 288 360 360 200 200 183 183 57 57 57 57 53 473 106 410 410 410 410 173 280 29 29 382 245 349 155 155 165 165 165 53 53 10 479 331 84 84 496 274 173 280 29 38 38 162 482 482 105 336 144 180 496 496 496 274 99 99 436 107 60 423 423 349 349 205 155 155 332 332 332 372 372 245 203 473 429 429 429 429 19 19 454 454 417 414 170 170 170 28 491 28 2 2 2 2 491 2 491 491 2 2 2 316 316 316 491 316 316 73 321 321 321 321 7 127 5 5 38 162 342 86 238 6 371 470 171 171 171 252 99 436 436 60 298 275 116 33 250 217 473 65 
486 486 486 460 169 169 150 86 238 6 272 300 334 382 245 43 43 364 276 109 278 278 31 342 342 224 302 302 302 302 375 122 122 122 352 419 427 229 247 312 15 15 15 15 15 193 193 193 17 +103-1241-0010 103 793 1 17 13 31 12 13 30 40 31 5 24 19 31 33 15 22 1 18 20 31 13 11 1 24 19 31 19 40 31 29 13 25 31 14 22 15 24 6 16 12 5 33 30 15 25 38 19 34 12 4 33 17 14 23 5 25 11 17 15 37 18 14 19 25 33 5 24 8 10 3 30 21 1 31 13 11 39 36 5 25 11 39 6 30 31 19 31 33 14 38 14 5 11 3 29 33 19 26 18 14 16 14 24 5 25 6 30 16 5 25 5 31 8 23 5 24 1 5 25 11 12 5 33 39 36 38 35 11 9 20 5 23 6 26 16 14 18 14 29 30 13 40 5 25 33 23 20 1 12 4 33 31 6 23 8 25 27 5 9 7 33 19 33 1 4 25 11 8 18 4 37 5 25 17 3 33 13 25 20 24 6 30 6 30 16 5 25 40 22 5 25 31 20 23 11 18 19 30 5 9 7 33 31 1 14 5 5 5 2 3 3 2 4 4 5 2 5 3 6 8 8 5 3 5 5 5 19 5 3 4 2 2 2 3 3 4 3 3 4 4 4 4 4 2 2 4 3 3 3 2 2 2 2 6 3 3 8 5 3 1 2 3 5 3 3 4 2 2 3 1 4 5 5 5 5 8 14 6 4 2 6 6 3 1 2 1 4 3 5 3 4 3 4 4 4 3 3 5 3 1 2 3 4 4 4 2 3 1 3 5 3 4 1 2 3 5 6 3 2 5 13 5 1 2 1 3 3 3 3 2 2 2 2 3 3 4 4 4 4 1 3 4 5 2 2 3 2 3 3 2 6 15 5 5 3 2 6 4 5 4 6 2 3 5 2 4 5 8 9 1 3 6 3 4 2 2 2 3 3 3 1 3 3 4 3 3 6 3 3 3 3 3 2 2 3 5 5 4 2 2 2 5 1 2 9 3 9 21 17 17 17 296 363 363 363 51 51 51 184 491 184 321 7 7 320 127 357 357 443 443 240 271 150 39 86 238 198 198 114 0 222 468 468 313 186 186 162 68 68 115 273 231 231 231 231 53 53 217 473 65 258 38 31 162 68 68 238 6 470 470 470 171 171 171 171 358 358 233 233 321 192 419 439 417 417 417 237 237 47 80 321 321 435 373 451 451 451 30 30 422 162 68 68 115 470 470 120 120 120 37 24 24 404 13 229 491 247 312 126 292 292 292 292 292 21 21 21 408 408 408 149 149 228 321 321 320 7 217 473 258 258 258 31 342 224 494 494 494 31 162 232 105 105 336 470 329 432 330 330 64 77 77 224 300 334 382 245 245 458 144 445 210 210 210 210 210 203 203 53 106 230 426 426 206 169 349 402 198 198 22 283 455 236 161 161 487 487 288 290 290 290 434 434 250 250 345 333 333 220 220 129 127 114 92 92 92 92 167 167 457 32 32 32 259 208 498 498 498 498 498 134 302 302 302 375 175 175 81 89 89 322 67 394 32 239 144 445 210 210 210 210 210 173 349 402 156 156 156 156 156 467 467 340 340 116 394 465 377 123 123 399 70 46 46 46 46 46 438 236 36 310 107 395 180 499 499 306 306 306 306 306 59 37 37 243 233 75 227 419 427 78 56 491 491 312 292 292 292 23 23 23 408 408 408 391 321 321 373 66 68 115 273 470 120 120 240 314 314 219 219 219 219 152 152 152 374 132 132 88 88 89 89 446 116 212 131 219 222 222 222 387 387 186 186 162 232 172 115 273 278 278 31 31 54 86 238 6 272 300 300 355 132 43 345 347 347 347 347 347 467 467 313 236 236 239 384 180 180 405 405 206 215 215 35 96 272 176 135 135 200 248 58 156 156 156 156 156 59 245 349 155 155 165 165 165 165 53 44 44 44 335 14 14 411 411 153 372 372 372 349 349 352 261 242 242 94 199 459 271 38 162 342 68 115 273 265 265 265 85 146 146 175 175 81 459 203 203 117 404 229 247 126 126 126 326 326 326 326 101 101 149 149 228 321 412 83 55 55 55 322 67 466 45 45 45 45 36 36 107 219 152 152 152 132 378 345 389 389 389 314 239 239 420 420 420 464 255 255 255 251 251 241 431 235 235 235 235 348 200 248 76 393 155 332 332 332 245 156 156 156 156 245 245 129 129 321 74 190 488 488 151 368 453 453 168 11 11 379 64 243 465 26 359 474 474 474 474 19 19 48 417 417 417 417 170 47 491 47 491 47 491 491 47 316 80 289 321 7 7 127 114 92 92 92 92 240 167 77 77 342 168 106 297 297 297 297 297 297 293 175 111 111 111 111 438 438 438 10 479 331 84 84 84 88 88 255 255 255 8 354 180 113 113 113 113 450 285 285 277 277 277 277 24 131 439 417 417 
417 417 491 491 47 491 80 80 321 412 83 83 83 194 194 55 55 322 212 34 111 111 111 111 111 438 438 58 72 110 202 202 202 202 202 29 242 116 90 394 239 27 180 405 405 206 240 285 34 475 475 475 475 475 475 301 399 70 138 138 138 138 138 372 245 245 14 411 411 153 153 372 372 372 349 349 155 29 242 275 379 379 471 471 49 142 221 336 144 121 121 379 394 478 68 342 115 444 444 444 213 464 139 139 302 302 497 122 122 131 183 286 286 286 286 406 406 467 467 255 8 354 180 180 113 113 113 450 450 413 37 243 270 270 433 390 390 18 112 56 56 56 312 312 312 15 15 15 15 15 15 15 15 15 15 15 260 260 260 193 193 193 193 17 +103-1241-0011 103 724 1 8 11 27 25 33 5 25 11 14 31 33 4 25 11 1 31 13 11 24 4 34 39 36 18 13 23 29 23 5 31 23 20 1 38 19 32 19 26 12 5 33 24 3 30 19 23 5 38 5 40 4 33 18 4 25 11 33 19 22 27 29 38 19 34 12 5 31 19 10 36 15 32 5 25 1 38 13 23 39 36 11 9 13 33 14 22 38 13 31 10 5 25 12 5 17 14 23 1 31 13 11 12 5 31 33 15 32 5 25 24 4 31 33 14 22 13 30 23 5 31 23 20 1 8 11 13 30 31 15 32 20 23 9 20 15 9 5 23 33 36 19 22 31 29 23 15 25 1 32 20 40 17 3 33 5 33 5 26 5 37 18 14 27 25 12 4 33 31 14 33 5 25 1 23 8 4 5 1 3 3 2 2 3 6 3 9 2 9 15 6 1 4 3 5 3 2 4 5 2 3 4 2 3 4 2 6 14 4 3 4 3 5 2 2 4 2 2 4 3 4 3 3 2 3 3 2 7 5 3 2 2 2 5 6 3 2 1 2 1 2 4 2 5 4 4 6 2 6 33 6 6 4 4 3 2 3 3 3 2 3 2 2 3 4 1 2 1 2 2 7 7 16 6 2 3 1 3 3 3 3 5 2 2 3 4 3 2 2 5 2 4 2 3 4 2 8 25 8 4 3 4 5 5 4 3 3 2 4 4 2 1 2 3 2 3 3 3 2 3 7 5 6 4 3 4 4 1 3 3 5 3 3 2 2 3 6 7 4 2 5 4 6 6 1 2 6 16 17 17 296 296 296 317 317 317 317 317 461 491 461 461 461 461 461 461 461 184 184 289 321 321 209 287 111 111 111 438 438 10 239 384 371 84 84 350 350 413 64 212 131 34 145 319 348 348 212 212 300 494 469 186 162 232 232 232 482 238 6 272 470 470 294 294 294 294 294 294 282 388 388 303 243 75 131 419 439 439 439 78 78 47 47 47 47 491 47 47 47 491 47 47 491 491 80 442 289 66 66 68 179 179 179 179 314 314 196 196 70 65 329 486 329 460 169 164 164 485 485 485 485 132 274 58 58 72 72 72 437 268 139 293 293 215 215 35 26 26 262 262 262 262 262 342 342 26 26 359 474 474 474 474 19 454 229 247 247 126 126 326 326 326 326 101 101 101 149 149 228 321 7 345 109 109 278 278 99 447 447 107 176 135 135 135 200 200 248 212 212 45 45 45 45 35 196 196 217 70 65 65 329 42 42 380 288 256 256 139 175 175 423 423 423 43 43 345 141 141 281 281 453 168 415 415 415 36 131 119 72 72 72 110 294 294 294 294 294 282 388 388 195 394 76 75 377 87 87 87 129 321 144 27 351 496 496 496 274 215 215 401 401 321 354 333 220 220 198 22 283 38 162 342 342 224 494 494 236 36 107 485 485 485 134 88 418 418 418 418 418 252 99 436 436 60 298 298 298 303 117 48 13 229 321 247 312 312 187 187 12 12 12 12 12 12 260 260 260 260 491 163 163 366 366 491 366 491 366 366 316 316 316 316 321 321 321 435 435 7 7 364 276 109 109 443 443 443 139 139 293 293 293 122 219 219 152 152 152 152 314 314 472 401 259 354 180 443 443 285 285 382 382 245 143 458 208 441 151 151 151 169 150 238 238 272 60 60 242 116 466 22 283 416 144 79 498 498 498 498 499 355 302 375 375 98 98 263 13 417 417 417 417 237 237 237 47 47 47 491 47 47 47 80 80 80 321 435 435 66 115 179 179 179 179 314 198 22 283 455 38 162 86 238 6 470 470 171 171 171 99 99 436 436 60 298 116 33 250 217 473 65 486 460 460 169 150 86 6 272 300 382 313 458 445 445 445 351 351 264 468 468 134 134 175 262 262 262 262 39 342 26 26 359 474 474 474 19 19 454 229 321 321 247 126 126 126 292 326 326 326 326 326 326 21 326 21 326 101 101 101 101 149 149 149 228 321 412 412 287 287 111 111 111 438 438 236 239 384 371 470 264 264 468 468 313 186 162 
342 68 115 470 403 403 171 171 422 186 99 338 338 395 494 139 139 497 122 8 420 420 420 420 464 171 171 171 134 8 29 100 497 122 36 377 87 87 87 154 154 154 458 96 96 232 105 105 336 425 386 386 431 290 290 290 290 434 434 434 339 195 117 117 417 417 417 417 225 225 435 435 338 400 400 400 30 422 281 342 342 105 221 144 180 106 189 240 285 34 44 44 236 36 108 119 119 351 319 319 319 348 200 200 69 223 223 130 402 156 156 156 156 406 467 467 350 350 350 350 350 350 413 413 413 195 33 212 198 114 92 92 92 92 92 167 35 478 478 68 68 172 115 273 498 498 498 396 396 385 233 242 242 275 303 303 303 303 48 305 417 78 170 421 421 491 491 128 128 491 128 128 128 193 193 17 +103-1241-0012 103 748 1 24 15 9 20 12 15 38 14 7 33 5 37 9 28 40 5 37 12 5 9 30 4 25 11 39 36 38 3 25 33 5 11 1 18 20 38 6 22 33 1 21 6 25 33 5 23 20 5 38 15 1 9 20 19 26 18 5 26 17 30 20 1 4 25 11 12 20 5 25 16 6 30 10 5 25 5 33 24 4 34 39 36 38 5 40 23 13 16 33 19 11 36 12 4 33 18 38 19 10 38 5 40 18 3 30 11 14 16 14 18 19 24 12 5 25 1 9 19 30 11 19 26 5 23 8 5 25 19 25 19 33 31 11 13 25 1 38 6 22 5 29 33 36 5 17 14 23 1 5 31 33 30 15 25 21 17 14 23 1 5 25 6 30 16 5 25 17 14 23 1 20 4 4 2 3 3 4 4 3 4 3 1 4 4 9 3 2 1 2 2 2 3 5 1 2 2 4 5 2 2 3 3 6 28 5 4 3 7 3 4 2 4 4 3 2 2 2 5 4 4 11 4 3 7 1 4 5 2 4 2 3 6 22 7 2 2 1 3 3 3 4 2 3 3 2 1 3 2 4 6 4 3 4 3 2 4 3 4 2 3 2 5 14 5 7 4 4 2 2 5 2 1 4 5 2 3 2 3 3 2 6 3 7 3 3 7 2 2 4 4 2 2 4 4 6 11 2 2 2 1 3 4 4 3 7 10 23 7 7 5 4 5 4 4 3 4 8 10 15 6 6 3 2 6 2 3 4 6 6 1 4 4 7 3 5 1 6 3 7 6 8 17 17 17 296 296 52 52 52 52 52 52 52 363 101 101 51 51 228 491 289 321 7 473 473 329 476 171 252 378 337 337 324 301 216 216 0 0 0 301 378 43 345 347 347 347 406 467 145 113 113 113 113 285 34 223 462 402 402 221 401 259 354 153 153 153 153 387 387 387 318 318 185 453 9 168 69 223 198 198 22 283 455 129 354 190 380 499 288 365 365 282 299 64 212 131 219 219 152 152 152 378 43 364 276 174 174 319 319 348 64 76 449 191 191 191 191 24 131 404 417 417 417 417 417 237 237 237 491 237 28 491 28 362 491 102 362 102 362 362 362 491 366 491 316 491 491 316 316 435 435 321 321 373 451 451 30 30 301 378 364 276 276 346 346 405 405 405 405 206 178 96 96 227 472 472 472 472 401 75 310 107 395 180 329 426 426 206 348 76 465 26 26 359 359 359 474 474 324 464 464 255 255 255 43 364 276 109 109 403 403 403 207 207 207 19 19 454 197 197 80 80 321 321 320 7 354 420 420 420 360 360 360 135 135 135 200 200 248 58 58 72 437 319 319 319 348 348 64 248 212 79 495 334 41 41 41 19 454 229 247 247 126 126 126 326 326 326 326 326 326 326 326 326 326 101 101 101 149 149 149 228 321 412 83 194 194 194 194 322 388 67 466 127 448 448 448 464 319 319 319 348 90 90 205 261 25 148 148 148 387 396 186 310 107 60 60 298 94 11 11 11 457 457 217 217 473 65 486 486 486 460 460 169 169 164 164 485 485 485 485 485 374 132 43 43 345 141 141 141 281 342 26 26 251 241 431 443 443 169 169 402 402 6 272 377 87 87 236 239 239 384 371 371 374 374 374 374 132 132 132 132 132 132 132 132 197 197 321 127 114 114 92 92 92 92 92 460 167 385 35 75 227 472 397 397 345 407 407 407 407 407 310 447 397 397 141 141 141 281 54 9 142 72 72 72 437 306 306 306 306 396 396 285 300 382 382 349 205 155 332 332 332 58 58 183 183 183 57 57 57 57 203 53 381 381 195 394 212 198 127 114 89 446 446 67 394 394 32 32 32 401 401 321 75 354 485 485 286 286 286 468 396 313 325 325 176 135 135 200 200 199 44 44 44 251 251 251 251 241 241 431 265 480 480 480 85 85 85 146 464 464 275 275 388 94 199 340 340 199 199 154 154 154 36 77 342 342 86 221 336 321 384 384 371 93 120 120 120 120 
330 388 303 195 195 303 117 404 404 78 78 491 491 312 312 292 292 292 12 12 12 23 23 260 260 260 260 260 391 391 391 491 289 289 321 7 7 7 364 276 276 346 346 405 405 405 206 178 35 35 458 192 180 230 230 230 230 215 215 35 96 401 75 108 377 123 123 123 88 44 44 44 416 416 239 144 79 498 498 498 498 498 498 134 302 375 375 375 375 98 98 98 13 417 417 417 417 237 237 47 47 47 491 491 80 80 491 289 321 321 287 287 44 44 44 38 162 232 482 482 482 238 6 161 161 79 487 288 290 290 290 434 434 339 339 310 107 447 221 144 79 498 498 498 498 498 302 302 375 375 98 98 225 483 226 209 44 44 44 44 33 335 14 14 411 411 153 153 372 372 372 396 349 349 234 261 261 242 242 116 116 33 90 90 212 239 79 79 498 498 498 498 134 302 375 375 375 98 13 229 321 247 15 15 15 193 193 193 17 +103-1241-0013 103 740 1 4 25 11 19 24 4 25 11 5 37 18 14 38 8 32 20 38 5 40 5 25 33 5 9 28 1 24 4 34 39 36 17 30 27 25 11 19 25 31 29 19 30 19 33 13 40 18 20 33 14 25 11 5 9 7 33 5 25 11 32 5 16 5 23 11 1 21 13 25 33 23 20 11 7 25 12 5 29 23 4 33 16 6 30 24 33 6 30 11 40 18 14 1 32 20 18 4 11 9 19 25 38 3 10 19 26 18 19 24 13 37 14 31 19 25 31 18 20 18 4 11 29 4 31 33 18 14 1 4 25 11 32 20 18 4 11 18 14 8 40 3 25 18 19 24 25 7 1 24 4 34 39 36 38 5 40 25 3 33 23 35 22 19 26 4 33 18 14 1 10 8 2 3 3 4 5 2 3 2 2 4 6 7 7 4 3 4 2 3 2 1 2 2 3 15 32 7 8 6 3 5 4 6 9 3 2 3 4 8 3 1 8 2 3 2 2 2 3 6 7 2 1 2 2 9 4 1 3 3 6 4 4 2 5 3 2 6 2 4 3 2 4 4 5 3 1 3 3 3 4 4 3 2 2 3 4 4 3 4 2 2 8 40 5 4 2 2 3 2 2 3 4 4 4 2 2 4 3 2 6 4 3 5 3 4 2 3 1 2 1 3 6 5 5 1 2 8 14 6 1 2 3 3 5 3 2 4 6 8 5 4 2 3 3 2 4 15 24 6 6 2 3 3 2 2 4 3 4 2 5 2 3 3 4 5 2 2 7 10 17 17 17 363 363 363 51 149 228 228 321 83 83 194 194 194 194 322 67 64 212 212 34 44 44 44 217 217 473 65 486 365 365 460 330 388 64 212 131 34 223 223 130 402 402 156 156 156 156 59 59 59 245 245 43 43 364 276 346 346 346 346 346 265 85 85 146 438 186 338 338 400 400 30 378 378 345 141 141 141 141 281 453 242 242 116 64 131 34 44 44 8 354 354 153 153 153 153 387 387 387 387 207 207 207 98 48 417 417 417 417 170 170 491 28 28 28 28 2 2 2 491 2 2 491 2 491 2 2 2 366 366 491 316 316 316 316 491 73 491 289 321 7 7 217 217 473 65 486 486 486 486 460 460 169 169 169 164 164 164 219 219 219 485 485 374 374 132 132 32 32 321 208 208 79 79 380 380 288 84 496 496 496 496 274 274 413 413 413 413 64 212 212 34 34 340 340 116 33 394 478 478 232 232 232 232 232 105 105 336 354 470 286 278 498 468 468 468 468 468 467 277 385 325 449 34 253 253 253 453 168 30 30 30 422 129 75 108 119 308 308 308 308 308 308 308 396 313 64 212 131 255 255 8 354 180 113 113 113 113 113 450 450 413 24 36 449 89 89 116 33 33 394 90 338 338 338 338 338 338 395 189 151 151 169 349 349 352 29 302 302 302 302 497 122 122 122 314 401 401 401 321 310 107 107 395 432 432 432 330 379 64 76 36 26 26 359 474 474 474 324 301 239 384 371 180 315 315 315 450 450 413 466 466 22 283 455 455 259 74 425 386 431 486 486 460 460 167 35 393 205 321 155 148 148 148 387 387 203 53 90 90 75 119 441 441 153 153 153 153 372 372 37 314 77 77 342 9 224 156 156 156 156 59 452 452 263 229 247 247 312 312 312 292 292 292 292 292 292 292 1 21 21 21 21 21 21 21 21 21 21 21 21 21 21 21 21 21 21 21 260 260 260 260 408 408 391 391 321 321 373 373 400 400 400 30 30 58 110 254 254 254 254 314 35 259 137 137 137 137 399 250 250 276 346 346 346 206 206 240 310 310 107 395 176 135 135 200 248 183 57 57 57 57 57 53 473 335 14 411 145 463 463 463 463 29 29 382 313 186 186 162 68 68 273 278 278 278 330 379 394 77 342 342 451 30 30 30 464 254 254 254 131 129 321 74 437 311 311 311 
311 311 311 460 169 150 86 86 238 6 272 272 334 334 334 59 59 452 452 229 321 247 126 126 126 292 292 292 326 408 408 149 149 228 321 321 209 83 55 55 322 322 67 310 400 400 400 30 30 3 58 72 72 110 110 254 254 254 240 131 58 183 156 156 156 156 245 335 14 411 265 265 265 265 265 85 85 146 318 368 342 168 168 125 125 125 125 348 199 183 183 57 57 57 57 53 53 10 10 479 331 331 315 315 315 315 315 450 450 450 450 98 98 13 13 78 170 170 170 170 28 28 2 491 2 2 491 2 2 2 2 316 491 316 316 316 316 289 289 289 321 7 7 217 473 329 329 329 329 329 329 329 164 164 485 485 485 301 378 378 345 141 141 141 281 9 238 221 196 479 307 307 307 61 167 167 457 457 251 251 241 367 367 367 367 458 192 176 135 135 200 200 464 464 415 415 415 415 415 240 285 156 156 156 156 59 452 452 229 321 247 312 15 15 15 15 193 193 193 17 +103-1241-0014 103 756 1 5 10 8 23 11 5 37 5 9 7 33 19 23 13 37 5 25 1 17 3 30 9 11 19 25 5 37 13 30 20 32 6 30 33 1 37 13 30 20 33 8 33 1 37 13 30 20 5 17 23 20 11 30 13 31 5 37 39 13 23 27 19 32 17 30 15 38 19 25 31 20 1 32 20 38 6 30 5 16 15 11 19 11 9 30 7 25 31 15 23 14 18 4 33 1 5 25 11 9 19 25 20 34 12 5 18 4 33 1 19 22 31 33 13 25 11 19 26 11 7 25 18 14 9 4 22 38 14 33 36 9 30 15 11 40 5 37 13 30 20 34 19 22 1 11 19 31 8 11 5 11 23 20 30 13 11 1 18 13 30 1 14 5 6 7 3 2 1 3 2 3 4 2 2 6 2 3 3 8 23 4 6 3 2 3 2 2 2 4 3 4 3 9 4 3 6 5 5 3 4 4 7 6 8 4 4 3 4 7 4 4 5 4 3 2 4 7 2 2 5 4 3 5 2 7 3 3 8 5 2 4 6 8 26 6 3 4 4 3 3 6 7 2 3 4 4 3 8 3 6 4 5 3 4 6 8 2 5 2 1 3 3 3 5 4 2 1 7 5 7 12 4 3 4 3 3 4 2 2 4 3 6 2 5 2 4 8 8 2 3 7 7 3 3 8 4 3 2 5 3 3 5 7 5 8 6 2 3 8 5 3 3 3 2 5 7 6 8 1 7 2 11 17 17 17 17 296 491 317 317 491 491 184 184 184 321 320 412 44 44 44 44 36 310 107 107 395 437 91 91 91 91 85 85 139 139 293 122 122 34 69 223 130 280 44 44 8 8 354 180 113 113 113 285 285 34 44 251 251 251 241 241 431 443 443 173 173 280 242 275 275 116 195 117 117 48 417 414 170 491 170 491 321 211 211 312 312 292 292 326 326 326 326 23 101 101 101 149 391 491 289 289 321 144 144 106 499 306 306 306 306 306 396 396 215 215 96 36 272 272 340 340 94 199 44 44 44 4 280 104 104 104 104 104 468 337 337 337 337 422 422 99 338 338 338 338 395 395 153 153 153 153 372 372 396 385 385 36 227 419 225 225 80 80 491 321 321 7 7 32 4 104 104 104 104 104 468 337 337 337 324 422 143 36 108 119 119 437 437 265 265 428 428 428 146 358 358 233 75 227 419 225 225 225 80 80 80 80 7 7 4 280 104 104 104 468 468 337 337 337 324 3 14 14 411 411 284 319 319 240 416 416 96 134 134 359 359 81 166 166 324 301 236 239 161 161 79 288 288 151 151 271 271 39 342 342 342 224 462 462 462 462 402 402 219 219 219 219 180 180 443 139 175 175 81 84 496 88 88 109 459 459 459 99 99 447 447 447 221 336 144 208 79 380 288 403 403 403 171 324 3 301 301 43 364 364 345 109 278 278 116 33 394 77 77 342 68 342 224 41 41 41 41 19 19 454 229 321 247 247 312 126 292 292 292 292 292 292 23 23 23 23 23 23 23 260 260 260 260 260 260 391 391 228 321 373 373 400 400 400 400 301 378 43 364 276 210 210 210 372 372 372 467 44 44 44 349 349 234 234 261 261 25 470 171 171 171 171 252 252 325 34 191 191 191 314 131 472 401 401 321 259 190 190 380 380 315 315 315 315 450 450 450 413 348 394 478 478 232 232 172 115 273 470 171 171 171 252 175 81 300 300 382 245 58 58 72 110 110 486 486 486 460 460 460 169 385 233 227 227 419 225 225 225 225 412 412 83 55 55 322 67 212 401 321 354 255 255 116 94 94 398 213 213 213 213 252 164 164 164 164 164 283 283 455 72 72 72 110 486 486 486 486 460 282 282 385 385 227 227 419 427 321 247 126 126 326 326 326 326 101 408 149 149 321 
412 154 154 154 143 96 96 66 232 68 238 6 272 470 432 432 432 330 64 64 64 212 176 135 135 200 200 248 248 212 384 180 315 315 315 315 450 413 348 199 58 156 156 156 156 156 245 8 8 354 180 376 376 376 376 376 376 460 460 169 178 233 321 208 133 397 345 347 347 313 313 236 36 36 108 119 119 485 374 374 374 374 132 132 132 8 259 259 190 190 380 288 288 403 171 171 171 246 246 318 24 24 270 270 342 168 168 462 462 4 4 4 4 104 104 104 104 104 468 337 337 337 324 324 422 349 164 164 164 164 164 25 278 278 278 278 278 178 143 321 192 192 419 225 47 491 80 80 80 80 321 75 371 490 490 490 162 232 232 68 68 115 273 273 265 265 265 428 146 146 325 34 191 191 191 314 26 26 359 166 166 166 324 301 42 42 42 147 147 147 380 288 288 443 120 120 120 37 37 24 24 131 404 225 225 225 80 321 373 72 72 72 110 264 264 264 264 264 264 59 59 59 59 452 263 13 78 170 170 491 421 491 491 211 491 421 491 15 15 15 193 193 193 17 +103-1241-0015 103 700 1 18 14 16 15 31 38 5 40 31 24 6 23 1 38 8 33 1 5 25 11 34 19 25 1 6 23 31 27 24 5 10 16 30 13 22 5 23 11 1 18 14 24 7 34 38 5 40 23 3 30 21 1 5 25 11 31 27 38 14 18 14 8 40 1 38 19 10 23 35 22 33 17 30 20 25 19 25 31 5 24 23 8 33 31 5 25 11 24 36 11 40 1 4 25 11 17 30 15 19 25 5 12 14 40 1 31 27 16 3 30 12 20 6 30 11 5 25 13 30 20 5 9 40 14 37 14 1 4 25 19 22 31 33 30 6 30 11 5 25 13 30 20 5 9 40 14 37 14 1 18 5 2 5 6 7 1 2 2 8 3 1 14 5 8 7 9 1 2 4 2 7 4 7 11 6 4 6 4 5 4 3 6 2 1 5 2 5 5 18 5 2 6 10 7 2 1 5 4 6 5 10 1 4 2 2 5 4 3 2 4 6 13 7 1 2 2 6 3 3 3 6 3 3 6 2 3 3 6 3 4 3 6 4 2 1 2 1 3 8 3 6 9 7 2 3 3 5 6 3 3 4 4 5 11 40 7 7 3 4 3 3 3 5 3 1 2 1 2 2 4 2 3 5 3 3 8 21 5 2 2 3 2 4 2 2 2 2 1 2 4 3 4 2 3 5 4 3 7 5 17 17 17 296 363 363 363 52 52 52 52 52 408 51 51 51 184 289 321 320 156 156 156 156 245 349 205 205 261 343 343 343 343 343 343 252 186 39 342 86 86 142 397 141 141 141 281 162 232 232 232 482 482 105 105 196 70 65 65 481 481 481 481 182 182 182 375 375 375 98 98 98 225 225 80 80 491 80 80 321 7 7 364 276 276 346 346 428 428 428 428 146 146 358 358 233 321 227 227 419 225 89 483 321 188 89 446 446 33 394 394 76 164 164 164 164 164 278 278 278 278 120 330 303 303 303 303 117 48 48 417 47 47 491 491 80 80 80 80 289 321 320 287 287 297 297 297 297 293 293 186 162 54 68 115 224 273 84 84 84 274 399 399 70 70 383 383 383 383 167 310 107 447 447 447 393 234 261 25 380 288 151 178 178 458 458 208 302 302 302 302 375 375 122 122 227 419 419 427 82 321 312 312 126 292 292 292 292 23 23 23 101 101 101 149 228 228 321 320 373 156 156 156 245 399 217 217 70 473 65 315 315 315 315 315 450 450 293 169 352 352 352 352 97 397 397 345 345 141 141 281 453 9 26 26 251 241 241 431 284 306 306 306 306 306 306 396 396 396 37 233 36 310 107 107 447 18 97 97 225 321 412 83 55 55 446 67 394 478 66 172 115 273 273 84 410 410 410 43 29 347 347 245 245 58 156 156 156 245 14 14 14 411 287 265 265 265 265 265 265 85 85 85 207 207 318 185 269 433 433 160 97 397 397 345 407 407 407 407 310 107 447 447 26 251 241 367 367 367 367 458 96 96 272 472 472 221 401 321 321 208 79 79 288 288 360 360 360 360 434 434 339 199 340 340 340 116 33 394 478 68 68 172 115 273 319 319 319 203 53 53 251 251 241 431 428 428 428 428 146 146 385 77 77 342 224 89 89 116 33 250 217 217 473 65 374 374 374 132 132 37 37 24 321 270 433 160 112 427 229 247 312 126 326 326 326 101 101 149 149 321 412 412 83 55 55 446 446 67 131 472 472 458 208 208 79 79 380 288 403 403 403 403 207 324 464 464 464 446 116 94 199 493 493 493 493 493 216 300 334 334 304 304 304 185 185 269 323 18 112 112 56 56 56 170 170 28 28 28 491 491 28 
28 362 491 362 362 362 491 491 362 362 362 362 40 40 362 218 491 491 305 305 366 366 366 305 366 435 435 435 435 435 435 321 435 435 373 373 66 68 172 115 344 344 344 344 344 274 274 349 205 261 25 106 306 306 306 353 396 313 216 22 448 448 448 14 411 411 153 153 153 387 387 313 314 196 196 398 134 134 468 337 337 337 464 464 255 255 215 96 368 453 453 168 273 498 498 498 396 173 29 29 334 334 59 452 452 263 13 229 491 442 312 312 312 292 292 292 292 292 326 326 326 326 326 326 101 408 149 228 321 305 209 287 44 44 94 199 154 154 154 96 96 482 482 238 6 161 79 153 153 153 153 387 396 240 314 196 309 199 264 264 264 468 468 468 337 337 464 464 464 255 255 215 96 478 342 9 168 470 498 498 498 396 173 173 280 29 334 334 59 452 452 229 321 247 15 193 193 17 +103-1241-0016 103 733 1 24 8 33 18 4 37 31 20 25 12 5 33 12 5 10 19 25 38 5 40 37 13 30 20 29 28 25 33 19 11 5 25 11 29 30 5 25 7 25 31 33 1 12 4 33 12 5 9 19 17 8 40 38 14 16 35 23 5 37 31 29 19 30 19 33 5 25 11 37 19 37 4 31 19 33 20 1 12 5 33 12 5 24 7 34 38 5 40 31 38 20 33 23 19 29 33 5 25 11 19 22 31 29 30 13 31 19 37 1 12 5 33 12 5 16 6 30 18 13 11 38 5 40 9 30 6 11 5 25 11 16 35 23 1 19 25 32 6 30 33 1 3 30 11 19 31 14 25 19 26 19 22 31 33 30 6 30 11 5 25 13 30 20 5 9 40 14 37 14 1 24 8 33 18 4 37 22 5 25 22 23 36 11 19 11 1 6 5 5 2 1 2 3 5 5 3 2 2 2 1 2 8 3 4 2 2 5 3 2 4 4 9 5 3 3 3 2 2 2 2 3 2 2 2 7 5 5 5 15 3 2 3 2 2 4 4 5 8 5 3 2 6 2 3 3 3 7 3 2 4 3 2 2 1 2 2 7 2 5 5 2 3 11 13 3 3 1 2 2 4 10 4 2 2 2 7 3 4 3 4 3 5 3 1 2 2 1 4 3 3 2 2 7 5 6 16 3 2 3 2 2 7 4 3 5 5 4 4 3 4 4 4 11 2 2 2 2 8 3 11 14 7 2 7 5 4 7 10 7 5 3 2 6 5 2 3 4 2 6 1 3 2 2 3 1 2 1 3 2 1 2 3 5 4 4 7 19 5 4 2 1 2 3 2 2 3 5 3 4 2 4 4 4 17 17 51 51 228 289 321 7 70 70 65 428 428 428 146 146 325 449 202 202 202 202 402 162 232 172 172 267 267 267 267 267 434 434 339 248 248 212 45 45 45 45 198 22 5 455 236 129 321 310 107 107 395 395 278 278 330 116 195 250 250 345 141 141 281 281 453 9 142 4 4 4 104 104 104 104 104 468 337 337 324 301 143 129 401 321 74 74 441 441 441 441 441 387 360 360 360 252 339 76 465 449 191 191 191 191 24 325 89 89 446 33 394 76 74 190 492 492 492 313 94 94 331 331 315 315 315 315 450 450 450 413 413 243 243 77 433 86 238 6 6 227 427 427 491 247 126 126 292 326 326 326 326 101 408 149 228 228 321 320 321 127 45 45 45 45 35 35 127 5 5 455 236 129 259 354 278 278 278 278 416 416 192 180 106 265 265 265 265 85 85 85 146 318 49 9 142 397 347 347 347 347 245 349 349 205 261 261 25 424 424 424 175 81 462 462 462 130 402 478 162 232 232 232 232 105 105 336 336 354 485 278 498 468 468 467 277 277 469 325 449 89 89 446 53 394 212 280 106 265 428 428 146 438 173 280 280 486 486 486 460 169 150 150 342 342 224 469 469 325 449 41 41 41 41 19 19 19 454 454 454 78 491 170 491 312 187 187 292 12 408 408 408 149 228 321 321 127 45 45 45 45 35 198 22 5 455 399 217 473 65 315 315 315 315 450 450 450 169 169 164 164 397 397 397 141 141 141 281 31 162 232 232 68 68 482 397 397 397 109 213 213 213 252 252 36 26 26 26 251 241 431 278 278 278 215 215 35 96 96 465 272 89 89 446 67 131 34 154 154 458 96 96 54 142 105 336 336 190 380 288 151 151 169 150 342 342 224 224 494 459 459 459 459 37 173 352 352 352 427 491 247 126 126 126 326 326 326 326 101 101 101 149 149 228 321 321 320 127 45 45 45 45 35 35 198 22 5 455 349 205 234 234 261 25 148 148 148 148 372 372 372 396 58 72 72 110 110 120 120 120 120 240 24 24 133 133 364 345 141 141 141 141 281 281 9 142 221 336 336 259 190 190 380 380 499 499 499 405 426 426 426 426 206 206 206 37 24 34 89 89 446 116 394 90 393 234 
234 234 234 261 25 441 424 424 424 182 182 375 375 375 98 98 13 13 417 170 170 47 491 47 491 491 491 2 491 491 316 73 289 289 320 412 188 188 340 340 116 33 394 478 338 338 338 338 395 470 153 153 153 153 387 372 396 396 385 233 227 227 419 439 417 78 47 47 491 47 47 491 80 321 321 80 321 435 209 287 287 353 353 353 353 396 313 236 36 384 490 490 490 31 162 68 115 115 273 308 308 308 396 313 94 94 176 176 135 328 200 200 199 255 154 154 129 401 321 96 66 482 238 272 79 153 153 153 387 387 396 314 196 196 479 398 398 264 264 468 467 467 255 255 215 96 478 342 68 115 273 498 498 498 498 396 173 173 29 29 334 334 59 59 452 263 229 247 247 126 126 326 326 326 326 326 326 326 101 101 101 149 149 228 321 289 320 7 70 65 65 389 428 428 146 240 325 34 202 202 202 402 221 458 27 121 121 121 33 394 76 259 208 208 386 386 386 444 374 374 374 252 325 34 191 191 191 37 24 404 427 229 247 193 193 17 +103-1241-0017 103 794 1 38 5 40 31 27 23 36 11 5 22 30 5 31 23 20 5 16 30 15 11 1 24 4 34 39 36 18 7 13 37 14 1 38 5 40 31 29 13 30 11 12 20 6 30 11 20 23 5 37 31 29 20 22 19 26 16 14 31 33 1 16 6 30 13 40 31 36 25 13 40 32 20 22 5 25 22 23 36 11 19 11 12 5 33 18 20 38 5 40 22 5 24 19 26 33 36 18 14 1 32 20 31 33 35 11 5 29 1 17 30 4 31 29 19 26 38 19 34 38 5 25 34 19 25 9 30 7 25 18 4 25 11 1 12 5 18 4 25 11 5 23 5 37 5 32 4 9 20 27 23 11 16 4 32 5 25 11 22 3 30 29 5 33 9 4 17 1 12 20 5 12 14 32 20 18 13 23 11 7 33 19 18 19 24 1 8 4 1 4 5 2 5 4 2 2 3 2 2 5 2 4 2 5 3 7 5 45 5 6 4 4 4 5 4 4 4 8 4 5 2 3 4 3 4 4 3 3 3 3 4 4 6 4 1 3 4 4 3 3 2 6 4 7 4 4 13 6 3 3 2 3 4 3 3 2 2 3 3 3 2 3 4 3 3 2 3 2 2 1 2 3 3 2 1 4 5 3 2 2 4 5 3 3 7 12 6 3 6 2 3 2 6 8 6 3 2 5 4 3 2 4 2 1 4 4 5 5 9 6 8 4 2 8 3 8 7 3 4 12 2 2 6 3 5 2 3 2 2 2 2 9 5 3 8 6 3 2 7 4 6 2 2 3 6 2 3 2 3 2 4 8 5 11 3 4 5 5 3 6 3 4 4 3 2 6 4 2 5 4 7 14 17 17 17 363 51 51 184 320 320 345 333 333 220 220 402 66 66 68 115 344 344 344 344 274 274 251 251 241 431 374 374 374 374 285 34 469 469 143 458 208 79 459 459 271 31 342 86 26 26 166 166 166 464 464 464 255 255 349 234 234 261 190 380 288 288 403 403 403 207 207 207 37 24 24 404 439 417 417 417 170 170 28 28 491 28 491 362 491 491 362 491 362 491 491 491 362 362 491 362 362 491 40 211 369 369 369 369 21 21 21 21 21 21 21 408 408 408 149 149 228 321 321 320 7 217 70 65 486 486 486 460 460 169 169 164 164 164 485 485 485 485 374 132 132 58 58 72 268 268 268 268 268 88 88 109 84 463 463 463 173 280 29 334 59 59 452 263 263 417 417 417 417 80 321 321 7 7 345 141 141 141 281 162 232 232 232 482 105 336 336 470 470 264 264 264 264 468 468 313 313 314 314 198 22 448 448 448 464 106 106 372 372 372 313 236 236 36 371 485 213 286 286 286 139 302 302 175 175 69 223 223 130 478 232 232 105 105 336 321 354 470 213 213 252 143 192 176 135 135 135 200 248 248 248 393 205 261 25 498 498 498 498 498 396 271 271 39 54 86 238 6 427 427 247 247 126 126 326 326 326 326 101 101 101 149 228 321 321 373 155 155 155 332 332 332 372 372 372 467 253 253 38 162 232 172 172 115 485 374 374 374 348 94 199 253 253 253 99 338 400 400 400 30 422 143 144 27 121 121 121 394 76 76 208 208 386 386 444 444 374 374 252 325 191 191 191 37 314 198 198 45 45 45 183 183 451 30 30 301 378 345 141 141 281 342 342 221 336 144 27 27 351 319 319 319 53 53 176 135 135 200 248 76 75 108 377 123 123 123 123 132 58 156 156 156 156 59 59 452 229 229 247 126 126 326 326 326 326 101 101 408 149 228 321 321 373 400 400 400 30 422 422 162 482 482 482 482 238 272 189 189 189 189 285 34 230 230 230 230 230 215 215 35 74 419 439 439 78 78 47 47 80 80 80 289 289 320 
208 79 499 486 486 460 460 169 150 342 105 105 336 354 176 176 135 135 200 248 248 333 333 220 220 220 142 133 364 276 174 174 174 174 319 319 348 348 195 195 90 90 90 393 234 234 234 234 261 261 25 470 278 278 278 330 330 388 195 195 195 250 250 394 32 32 259 354 190 380 380 315 315 315 315 450 450 450 413 413 33 58 58 72 72 72 294 294 294 294 294 294 294 294 294 282 282 388 195 64 212 131 427 321 247 126 126 326 326 326 326 101 149 149 228 321 320 22 5 455 455 72 72 72 294 294 294 294 294 388 348 64 64 212 212 26 302 302 302 175 69 69 69 130 280 44 44 44 99 338 338 338 338 338 395 470 486 486 486 460 460 215 354 41 324 324 324 3 335 14 226 411 411 424 424 424 424 424 424 274 122 122 131 472 393 234 234 261 25 486 486 486 460 460 169 99 436 436 436 60 242 116 116 33 212 131 472 221 321 144 27 437 437 306 306 306 460 215 35 29 469 277 277 314 401 401 321 354 180 376 376 376 376 376 282 207 37 24 192 192 427 321 247 126 126 23 408 408 408 149 228 321 321 320 127 448 448 448 14 14 411 493 493 493 493 493 216 127 300 334 334 59 452 186 99 338 338 400 400 400 30 422 58 58 72 110 110 139 139 139 293 293 122 122 34 180 113 113 113 113 167 167 36 449 123 123 123 123 183 183 57 57 57 57 57 203 381 381 381 48 48 417 417 417 170 421 421 421 421 491 128 491 128 128 193 193 17 +103-1241-0018 103 471 1 8 31 5 29 27 40 39 36 3 30 24 19 31 33 14 24 4 34 39 36 22 5 34 9 14 33 5 37 17 30 20 25 17 15 9 5 23 40 1 32 20 31 13 11 19 25 5 29 19 22 39 36 23 39 14 23 20 22 23 19 30 31 38 20 33 37 28 31 1 8 24 37 13 30 20 17 23 4 11 33 19 31 20 39 36 1 8 38 5 40 9 19 17 19 25 19 26 33 19 9 20 5 16 30 15 11 39 36 38 14 25 33 22 5 24 19 26 16 14 24 20 1 18 7 3 2 4 6 3 1 2 3 3 2 2 3 2 2 4 3 3 2 2 5 3 4 4 3 2 2 2 3 2 3 3 2 5 2 3 3 10 20 6 3 5 3 2 2 2 3 3 2 7 2 4 2 3 2 3 4 9 3 6 6 8 3 3 3 4 7 8 16 8 2 3 2 3 3 3 3 3 1 2 2 6 2 6 5 16 7 2 2 3 1 3 4 2 2 2 3 2 2 3 5 2 5 2 4 2 2 3 4 2 2 2 5 2 2 2 4 4 3 3 6 5 17 17 17 296 317 317 317 491 491 491 491 491 461 184 321 435 435 321 435 287 287 111 111 111 438 162 342 224 494 494 236 74 470 496 496 496 496 496 274 368 368 9 219 152 152 152 88 353 353 353 245 399 70 473 258 31 54 86 238 272 272 300 245 399 217 473 65 486 460 460 169 164 485 485 485 382 422 458 458 144 27 437 437 151 169 169 164 402 221 401 321 354 29 498 313 313 325 34 462 462 130 402 321 259 79 79 288 360 360 360 200 200 248 445 445 180 171 171 171 252 215 8 354 100 302 375 497 98 185 269 433 390 160 112 112 56 491 312 312 312 187 187 12 12 12 12 12 12 12 23 260 260 260 260 391 391 391 491 321 373 338 400 400 400 400 30 422 162 68 68 115 470 470 120 120 240 314 196 340 116 199 44 44 44 129 259 74 492 236 129 321 445 445 485 485 485 485 485 485 374 374 132 359 81 485 485 134 382 134 359 359 81 166 166 324 422 143 401 401 321 321 144 208 208 208 386 386 386 286 286 286 286 286 286 334 382 59 304 313 186 162 66 482 482 482 482 105 397 336 109 213 213 213 252 143 131 472 393 393 261 343 343 343 343 343 343 343 358 39 39 433 433 160 427 56 247 247 312 126 292 292 326 326 326 326 326 101 101 101 149 149 321 412 287 287 111 111 111 356 356 53 394 212 4 104 104 104 104 104 337 337 337 301 143 144 208 386 431 376 376 376 240 24 36 87 87 87 162 232 172 115 267 267 267 267 267 219 219 477 477 477 477 477 132 13 229 491 247 312 126 292 292 292 23 23 23 101 101 101 149 228 321 412 412 287 111 111 111 378 378 141 141 281 453 142 221 336 420 420 420 416 458 445 485 360 360 360 94 176 135 135 248 76 108 377 87 87 129 354 420 420 420 464 464 44 255 38 349 205 205 261 487 288 288 288 171 171 252 24 131 219 152 152 152 378 43 345 347 
347 372 396 313 457 131 221 458 144 27 351 351 319 319 203 53 176 135 328 200 248 393 205 155 332 332 332 332 245 399 429 429 429 429 19 19 229 247 247 126 193 193 193 +103-1241-0019 103 151 1 8 18 4 11 24 15 11 5 29 24 8 24 8 25 11 1 12 5 33 19 16 39 36 11 19 11 5 25 22 5 24 16 14 24 20 33 19 25 8 33 1 14 6 4 2 2 4 3 1 4 3 2 5 5 6 2 3 9 3 2 1 3 2 2 2 4 2 3 2 2 5 2 4 3 3 3 2 4 2 3 9 4 4 17 491 211 491 296 296 363 363 326 101 101 149 149 228 321 321 287 111 111 111 438 58 110 254 254 254 314 196 196 217 473 476 476 476 252 325 34 230 230 230 215 35 196 196 46 46 46 46 438 399 399 217 70 65 480 480 480 480 85 299 299 339 212 131 427 229 247 126 326 326 326 101 149 149 321 321 320 45 45 45 325 118 118 118 118 402 219 152 152 422 236 239 384 371 278 278 314 196 196 242 242 33 90 465 144 27 351 351 319 319 203 53 394 76 205 155 332 332 332 399 399 429 429 429 422 143 108 377 377 87 236 10 479 331 331 428 265 428 428 428 146 207 358 233 131 419 321 247 15 193 193 +103-1241-0020 103 485 1 8 38 35 11 5 25 33 9 20 5 9 19 33 5 16 30 15 11 1 4 25 11 19 33 38 35 11 9 20 23 5 37 23 20 33 19 31 23 20 29 19 25 5 38 8 23 11 1 10 13 30 20 33 30 20 6 23 38 8 33 38 19 34 9 23 36 24 19 25 12 5 24 36 25 32 8 25 11 27 25 33 39 36 34 19 26 22 1 39 36 22 35 11 19 24 4 21 5 25 39 36 38 14 11 38 13 23 19 26 19 25 24 3 30 9 5 23 18 6 23 40 22 35 11 5 25 33 39 36 1 11 8 3 2 1 2 1 2 2 4 2 4 2 2 3 6 2 6 4 12 7 1 2 2 2 1 2 2 2 4 9 3 6 2 3 3 2 5 2 3 3 1 2 3 5 7 4 2 2 8 3 4 3 4 3 5 7 7 4 6 5 1 2 5 4 3 8 3 2 2 1 2 4 6 3 6 6 3 3 2 1 4 2 1 4 3 5 6 26 4 2 4 1 2 2 4 5 4 2 2 2 3 2 2 2 4 2 2 3 2 2 3 6 4 2 2 2 3 6 9 4 4 3 3 1 2 2 2 4 6 11 17 17 17 363 363 363 51 149 228 228 321 321 287 287 111 111 111 111 378 378 43 389 389 389 314 242 242 394 394 32 259 420 420 420 420 464 44 44 236 129 354 354 278 278 325 34 300 255 349 349 234 234 261 190 487 288 288 288 403 171 207 207 37 24 131 427 491 247 126 126 326 326 326 326 101 149 149 149 228 321 209 83 55 55 55 55 322 94 199 177 177 177 457 389 389 389 314 259 259 420 420 420 301 301 251 251 251 251 251 251 251 241 266 266 266 266 266 173 402 402 26 359 359 81 166 324 422 36 377 87 87 38 162 232 172 26 26 359 444 444 213 252 8 354 89 340 116 199 44 44 43 43 364 276 346 346 346 265 85 85 85 139 139 293 293 122 122 35 401 401 401 75 310 107 107 107 395 395 351 264 264 264 468 468 406 337 337 324 252 143 36 161 487 487 487 41 324 3 335 14 14 411 297 297 297 297 297 297 297 293 293 497 497 43 43 364 364 276 346 346 428 428 428 146 358 76 449 472 397 397 333 333 220 220 164 142 221 401 321 321 321 354 425 425 431 374 374 374 374 374 132 132 203 203 53 473 340 340 116 466 22 283 455 399 217 70 473 65 65 350 350 413 413 33 33 394 478 338 338 338 395 470 480 480 480 85 299 299 339 64 212 465 384 430 430 430 430 430 465 449 152 152 152 152 349 164 164 214 214 214 214 360 328 328 200 243 233 192 192 419 229 491 312 312 491 187 187 187 201 201 201 201 201 201 201 201 491 201 491 201 201 435 211 211 408 149 321 321 321 219 152 152 152 152 143 458 144 389 389 389 325 34 255 399 217 217 65 486 486 486 460 460 240 310 107 395 242 116 116 219 219 152 152 378 378 347 347 347 236 239 161 397 133 276 109 189 139 139 175 81 176 135 200 464 464 340 116 33 33 250 217 70 70 70 65 306 306 306 306 396 134 215 35 29 100 497 497 497 58 72 72 72 72 437 481 481 481 481 481 481 182 182 182 375 375 375 185 269 342 86 221 336 144 430 430 430 430 430 430 430 430 430 131 449 449 485 152 477 477 374 132 132 13 229 491 247 312 15 15 15 15 193 193 193 17 +103-1241-0021 103 758 1 24 4 34 39 36 18 4 11 33 15 22 5 25 12 5 
31 22 30 6 25 20 23 19 33 5 23 18 4 25 11 6 22 38 14 11 23 20 19 25 18 19 40 1 12 13 25 5 25 11 12 13 30 1 18 20 11 19 31 8 11 19 11 18 38 5 33 19 11 36 1 18 20 22 35 11 25 3 33 13 23 12 19 31 10 8 23 11 38 19 12 5 17 23 27 19 26 8 40 1 12 4 33 12 13 30 18 4 11 9 5 25 5 24 19 31 33 15 22 1 18 20 38 35 11 33 15 22 18 14 18 27 24 5 25 11 23 13 33 24 3 30 19 23 5 11 36 12 4 33 1 32 20 22 35 11 5 25 9 20 23 13 16 33 4 33 9 30 8 33 30 19 37 14 13 25 20 18 7 1 24 7 6 4 2 5 5 2 3 4 4 3 3 2 1 3 6 2 2 4 3 3 3 2 1 2 2 6 5 3 5 7 4 2 2 3 2 5 2 2 5 8 9 17 6 6 3 3 1 3 2 5 9 5 6 2 1 3 4 5 2 2 1 3 2 3 5 1 2 10 37 5 3 3 3 2 3 5 7 2 5 1 3 6 6 6 5 3 1 2 3 2 4 4 5 3 5 10 7 16 4 2 2 2 2 2 3 1 2 4 1 2 3 2 3 6 4 6 10 25 6 3 2 2 2 4 4 3 2 4 4 6 3 1 2 1 2 2 3 2 2 5 2 4 2 3 5 3 6 6 22 7 3 4 2 2 2 1 4 4 3 3 3 2 2 3 3 2 5 2 3 3 3 4 3 2 4 4 10 4 17 17 17 296 317 317 317 317 317 491 317 317 461 461 461 461 461 461 461 184 184 184 184 321 320 7 217 217 217 473 329 329 329 329 329 460 169 164 164 164 219 485 485 485 374 132 132 274 58 58 72 110 254 254 254 254 314 401 75 108 119 295 295 295 295 295 143 458 192 242 242 116 466 466 22 283 455 38 162 54 482 482 105 221 336 79 79 499 499 405 206 206 348 199 41 324 324 301 251 241 431 278 278 285 302 497 497 497 58 58 72 110 294 294 294 294 294 294 282 388 64 212 131 335 14 14 411 411 284 405 405 405 206 178 35 35 441 441 109 109 134 313 24 26 26 359 359 474 474 324 464 340 340 340 116 33 58 183 183 257 257 257 257 257 120 50 50 185 185 185 269 433 433 390 18 427 56 247 312 312 126 292 292 326 326 326 326 326 101 101 101 408 149 149 321 289 7 7 7 4 127 361 361 361 361 361 330 388 94 199 89 89 446 116 33 212 212 127 114 361 361 361 264 264 264 264 468 59 452 452 263 263 417 417 414 47 80 321 321 435 373 451 451 30 30 236 325 490 490 38 162 342 115 273 265 265 428 146 146 325 34 191 191 325 133 133 259 181 181 181 181 167 457 75 108 377 87 87 236 325 371 374 374 374 374 132 98 98 48 48 417 417 170 170 102 102 28 28 40 40 40 40 40 40 40 40 40 40 40 491 362 491 218 366 305 305 491 366 366 40 40 40 40 435 435 435 435 435 373 451 451 30 30 30 422 458 144 27 389 389 389 389 196 196 479 331 307 307 61 61 167 167 457 75 108 119 351 351 139 139 139 293 293 293 216 216 114 258 258 31 54 54 238 238 221 321 310 107 395 395 437 91 91 91 85 85 85 85 450 293 293 122 122 131 133 333 333 220 220 198 22 44 236 129 321 208 208 425 386 241 431 84 496 496 88 88 176 176 176 328 200 200 464 106 265 265 265 265 265 85 85 85 207 318 318 39 433 433 160 427 247 247 126 126 126 326 326 326 326 326 326 326 101 101 149 149 228 321 320 127 45 45 45 45 35 198 114 0 0 222 58 58 110 254 254 254 314 401 321 354 137 137 137 94 44 44 44 217 473 65 258 31 31 342 68 68 68 238 6 272 470 470 470 171 171 171 358 358 358 233 321 192 192 419 419 439 439 78 78 78 491 28 491 28 28 28 2 491 491 2 341 341 341 12 12 12 21 21 21 408 408 149 228 321 321 373 451 451 30 30 30 378 378 389 389 389 389 129 259 108 119 295 295 295 295 295 143 458 192 156 156 156 245 245 58 72 72 350 350 350 350 350 350 413 203 381 53 89 89 322 67 466 241 431 443 167 167 457 196 217 65 329 329 42 42 147 147 380 256 139 175 175 423 423 423 423 236 75 371 371 374 374 132 132 216 127 114 92 92 92 92 92 167 385 243 227 419 439 78 78 170 170 170 47 491 491 2 2 491 2 2 491 491 2 491 316 435 435 316 316 435 321 435 373 338 338 400 400 400 30 422 143 458 144 389 389 389 389 314 242 242 394 76 76 259 420 420 420 301 26 251 241 431 443 443 169 169 402 402 6 272 415 415 385 129 401 321 259 190 380 499 499 428 428 146 146 457 457 147 147 380 288 288 173 173 29 
29 495 495 406 467 467 365 330 94 475 475 324 324 58 58 72 110 268 315 315 315 268 450 450 98 98 229 247 15 15 193 17 +103-1241-0022 103 817 1 25 27 24 4 33 14 38 5 33 24 19 31 33 15 22 18 4 11 9 19 25 24 15 11 1 31 27 6 23 22 38 13 31 10 5 25 40 5 25 11 13 22 31 29 23 5 25 15 32 5 25 40 24 8 33 13 40 38 13 23 9 20 11 19 16 14 11 5 25 33 19 23 18 20 38 5 40 31 15 16 23 20 9 4 22 4 33 17 30 20 25 17 15 9 5 23 40 1 8 24 31 3 30 20 8 38 5 40 23 15 33 1 18 20 31 13 11 32 8 23 20 1 22 5 24 5 23 6 26 1 12 5 18 6 30 31 19 40 27 37 14 19 25 12 20 39 3 30 11 1 17 19 37 24 20 39 35 30 9 4 17 1 27 8 22 5 25 22 13 30 20 19 33 1 12 5 10 8 23 11 30 19 31 29 3 25 11 19 11 10 19 30 16 5 23 20 1 8 4 2 4 3 3 3 4 3 3 2 2 5 3 3 2 1 2 1 2 2 2 4 7 5 14 6 5 7 5 3 2 3 4 4 3 4 2 1 2 1 3 3 3 2 2 2 2 5 7 2 5 4 3 4 2 3 4 3 5 3 3 3 3 2 6 5 3 1 3 3 2 2 3 2 2 2 2 5 5 2 3 2 3 4 4 2 2 3 3 3 3 3 5 2 2 4 10 35 9 4 5 4 2 4 3 2 2 4 3 5 12 6 6 2 5 3 3 5 6 3 8 23 4 2 2 1 4 4 5 24 2 2 4 3 3 4 3 3 5 5 2 2 2 2 2 3 6 5 5 25 3 2 2 2 2 2 2 2 3 8 7 21 10 4 3 1 4 6 3 4 5 2 7 12 2 2 6 4 3 2 1 2 3 4 2 2 2 1 3 7 1 3 4 2 2 8 6 17 17 17 363 51 51 228 321 320 309 331 331 231 231 399 399 473 65 486 486 460 240 285 300 382 245 43 364 276 181 181 181 181 167 35 35 196 196 473 258 258 31 342 86 86 6 272 470 470 171 171 252 458 458 192 389 314 314 321 354 137 137 137 399 217 217 473 476 476 476 476 476 476 207 37 24 131 427 229 321 247 312 126 292 292 23 23 23 23 408 408 391 228 228 321 373 66 172 115 273 344 84 274 88 14 14 411 297 297 297 297 297 297 293 293 122 35 458 208 208 441 109 151 151 151 169 150 54 238 238 310 107 60 298 298 298 379 471 471 49 342 89 89 446 67 34 145 443 154 178 96 96 342 105 105 321 354 386 386 386 469 116 94 418 418 418 418 418 418 99 436 436 436 60 298 298 298 379 379 471 471 471 49 9 142 221 196 70 65 428 428 428 146 325 325 34 253 253 453 9 142 133 364 276 109 109 139 139 139 293 293 293 122 35 354 420 420 420 422 36 384 490 490 490 349 349 234 261 25 487 498 498 498 396 313 285 131 34 89 116 33 394 465 377 351 139 139 139 175 58 451 30 30 378 43 141 141 141 31 162 232 68 68 115 273 470 171 171 252 173 402 402 26 359 166 166 301 8 354 354 180 376 376 460 178 178 458 192 415 415 314 472 221 458 208 79 288 288 360 360 434 200 248 248 212 445 445 171 171 171 171 252 215 354 100 100 302 375 375 185 185 269 390 390 18 112 427 56 56 312 312 312 312 292 292 292 292 292 12 12 12 12 12 12 12 12 260 260 260 260 163 163 163 163 163 163 491 316 316 491 316 316 73 289 321 320 287 287 287 111 111 111 85 438 203 53 394 478 162 232 68 115 273 106 499 499 306 396 337 337 464 464 464 111 111 378 88 345 141 281 31 342 26 251 251 241 431 403 171 171 171 358 358 233 321 227 227 419 419 439 439 225 225 47 47 47 491 47 80 80 80 289 451 451 451 30 30 422 162 232 172 115 179 179 120 120 314 457 310 310 338 338 395 499 499 265 265 85 146 146 37 359 359 474 474 474 19 454 454 417 414 170 170 170 47 28 28 2 2 2 491 491 2 2 2 2 491 316 491 316 435 435 289 435 321 144 27 351 319 319 53 255 255 255 251 241 431 235 235 235 235 413 413 98 48 13 13 13 170 321 170 312 187 187 292 292 292 292 23 23 23 23 23 101 101 149 149 228 289 321 321 127 5 5 455 72 72 441 153 153 153 372 396 313 186 54 54 224 50 356 281 281 9 168 106 410 410 410 410 410 173 402 29 495 406 467 340 340 340 466 22 283 455 448 219 219 219 180 180 306 306 306 306 306 306 59 37 37 404 439 439 439 78 78 170 170 28 28 28 491 2 2 491 2 2 2 491 491 2 316 491 316 316 316 73 73 289 321 321 445 445 278 278 173 196 196 429 429 429 219 464 222 222 245 245 245 8 354 180 376 376 376 376 376 282 
37 37 233 192 419 419 439 78 170 170 442 442 187 442 187 187 12 12 12 12 260 260 260 149 149 289 289 321 289 209 287 16 16 16 16 16 88 88 111 111 111 111 438 143 35 389 389 389 33 394 76 465 445 445 445 351 351 264 264 264 468 468 468 337 337 324 324 464 277 277 277 385 36 227 419 439 78 78 170 491 47 187 47 47 47 442 442 442 442 442 127 22 5 236 36 36 107 395 351 91 91 91 91 206 206 122 122 35 29 456 456 31 162 9 105 336 74 106 426 426 206 348 64 212 191 191 191 314 401 321 108 107 107 395 485 286 286 286 468 245 349 349 155 262 262 359 359 474 474 474 474 19 454 229 321 247 15 15 15 193 193 17 +103-1241-0023 103 833 1 19 33 19 40 5 25 33 18 13 37 20 1 8 37 17 3 33 1 6 23 24 8 38 14 23 11 23 20 17 35 11 40 19 25 19 33 1 9 5 33 19 33 19 40 5 25 33 18 13 37 20 1 4 25 11 19 16 19 33 19 40 5 25 33 22 13 30 20 11 19 25 21 5 31 33 5 31 14 33 5 25 38 15 12 5 18 4 25 11 5 23 29 35 23 40 7 33 1 31 27 8 11 9 13 33 14 22 20 29 19 33 9 19 22 5 40 8 25 27 12 20 19 17 40 4 22 33 25 4 22 5 37 19 33 1 19 33 31 5 25 13 22 31 33 30 20 24 23 20 27 23 11 22 3 30 29 5 33 9 4 17 1 27 8 24 37 13 30 20 17 23 4 11 39 36 37 22 5 24 1 20 37 19 25 19 16 19 33 38 35 11 18 4 37 9 19 25 8 31 33 19 31 23 20 29 19 25 5 38 8 23 11 10 13 30 20 33 30 20 1 8 4 2 2 4 1 2 1 4 3 4 8 14 7 1 4 4 3 1 7 3 4 6 4 3 3 2 2 2 4 4 3 3 2 2 4 3 15 3 1 3 1 3 1 4 2 1 2 4 2 4 10 23 6 1 2 4 3 2 2 2 3 2 2 2 6 3 4 2 3 2 4 3 3 3 3 2 6 5 1 2 3 4 7 3 1 8 2 2 2 1 3 4 3 3 4 7 4 12 7 4 6 2 2 2 3 3 6 3 3 3 2 2 2 3 3 4 5 5 3 3 3 2 5 4 5 6 2 4 3 4 3 2 3 8 18 4 2 1 2 2 1 3 3 2 2 4 2 1 6 3 6 3 4 3 3 2 2 2 4 6 6 27 8 3 3 3 2 3 4 3 2 5 2 3 2 3 6 3 9 6 7 2 2 1 3 4 2 4 2 2 1 2 1 3 3 2 6 5 4 2 3 3 2 3 3 1 2 4 5 6 3 2 6 4 3 3 5 2 8 10 17 17 17 363 51 51 228 289 321 188 177 177 177 325 356 356 356 342 342 224 242 242 116 131 131 72 72 110 443 443 240 173 280 41 41 41 41 19 454 417 417 417 417 170 47 491 47 491 491 491 47 47 80 321 321 435 435 435 209 111 111 111 202 202 402 402 458 27 180 405 405 206 167 457 14 14 14 209 411 297 297 297 297 297 297 297 293 399 70 70 46 46 46 46 46 438 378 43 364 109 109 498 498 134 387 122 122 26 26 359 81 166 324 416 239 458 144 180 484 278 240 314 77 270 342 224 340 340 340 94 199 277 277 277 277 227 419 229 247 247 126 126 326 326 101 101 149 391 80 80 80 80 289 321 354 159 159 159 325 34 177 177 325 356 356 356 31 342 224 242 242 379 131 131 72 72 110 443 443 443 173 173 280 41 41 41 41 19 19 454 454 454 78 170 170 491 312 312 292 292 292 292 292 21 21 21 21 21 21 408 408 149 149 228 228 289 321 209 209 83 55 55 322 322 94 199 118 118 118 118 118 205 177 177 177 177 325 356 356 356 342 342 242 242 116 64 131 472 221 144 445 445 351 351 264 486 468 468 468 337 337 337 324 252 325 34 89 340 116 33 394 212 465 395 395 151 151 169 150 86 86 6 272 34 44 38 162 68 172 115 273 498 498 498 396 240 35 242 242 242 116 250 250 364 364 109 109 403 403 403 207 171 3 252 216 198 22 5 455 72 72 72 72 294 294 294 294 330 64 64 212 302 302 302 497 122 129 259 74 441 441 424 424 497 497 497 49 342 168 180 180 113 113 113 113 450 167 167 131 427 321 247 126 126 326 326 326 101 408 408 149 391 491 321 373 66 68 68 115 273 84 16 88 88 111 111 111 111 438 438 438 35 259 354 180 443 443 285 300 382 313 313 143 458 458 445 445 213 213 213 252 215 354 277 277 277 277 143 259 259 354 420 420 143 458 144 351 494 253 368 453 168 106 111 111 111 438 438 10 10 479 331 84 84 496 274 216 198 448 448 448 464 154 154 154 416 32 96 368 453 453 115 470 470 486 486 376 460 460 178 35 96 96 401 196 196 309 309 479 331 486 486 460 460 178 458 192 192 69 223 130 
280 277 277 277 277 385 385 75 227 419 439 78 170 47 47 47 491 491 491 2 2 491 491 316 316 316 73 289 321 321 209 177 177 177 356 356 342 168 44 116 199 154 154 96 96 54 482 238 161 161 487 288 360 360 360 339 53 359 166 166 166 324 14 14 411 411 424 424 424 424 424 424 122 122 122 131 472 221 144 27 437 306 306 306 306 396 215 35 29 277 277 314 401 321 259 354 180 376 376 376 376 120 282 37 233 192 419 427 78 170 491 312 312 312 341 341 341 341 341 12 12 12 12 21 21 326 326 326 326 101 101 149 149 228 289 321 321 209 287 16 16 16 88 88 111 319 319 203 53 394 212 4 104 104 104 104 406 337 337 337 324 422 143 458 208 386 431 376 376 376 460 240 24 36 107 152 152 152 202 402 402 402 259 144 27 27 351 319 319 319 319 203 381 381 117 48 417 417 417 417 197 491 435 80 289 321 209 188 357 357 357 357 357 173 280 242 116 94 118 118 118 118 118 280 177 177 177 177 457 457 364 345 389 389 389 285 34 202 202 202 402 401 259 354 137 137 137 137 33 10 10 479 331 265 265 428 146 146 146 39 86 6 272 87 87 87 162 54 86 26 26 444 444 213 252 215 354 340 340 340 199 44 44 44 43 43 364 276 346 346 265 85 85 85 139 139 293 122 122 314 401 401 75 107 107 395 351 351 264 264 468 468 406 337 337 324 422 36 36 161 161 487 487 487 41 41 19 19 19 454 417 417 421 421 491 421 128 128 128 193 193 17 +103-1241-0024 103 794 1 38 20 37 17 3 33 19 11 30 8 37 5 23 6 26 29 20 31 18 4 37 5 25 33 38 20 1 24 19 31 19 40 31 29 13 25 31 14 31 13 11 19 33 38 5 40 15 33 24 8 23 40 1 8 24 17 23 4 11 9 19 22 5 40 8 23 5 37 11 30 8 37 19 26 1 27 19 33 31 20 24 40 31 27 38 5 25 11 14 16 5 23 12 5 33 8 24 17 27 19 26 33 5 23 19 37 38 19 34 39 36 5 25 11 9 19 23 6 26 33 19 39 36 1 8 37 25 13 37 14 9 19 23 6 26 11 33 36 13 25 20 9 5 11 20 1 25 3 33 30 20 23 20 1 9 5 33 12 20 5 31 8 23 5 24 38 5 40 12 5 38 14 31 33 1 8 37 27 25 23 20 9 19 25 19 25 19 33 16 6 30 24 5 25 34 31 1 9 5 33 12 4 33 38 5 40 19 25 5 16 1 13 5 2 2 2 3 4 1 2 3 4 3 2 2 4 4 5 5 4 3 3 2 2 2 2 2 8 19 3 2 3 2 2 2 3 3 3 3 3 5 2 2 2 2 2 1 4 7 3 3 8 3 5 16 7 3 3 3 5 2 3 2 3 2 3 4 6 3 4 2 3 6 1 2 9 13 11 3 4 5 3 3 2 5 7 4 2 6 2 2 4 2 2 1 2 1 3 3 3 1 3 1 3 2 6 2 3 2 1 3 1 3 2 1 2 1 2 6 4 4 4 2 5 5 16 6 3 2 1 3 2 2 2 3 4 2 2 3 3 2 2 3 2 2 3 8 10 4 2 4 3 5 4 9 16 3 3 2 2 2 3 4 6 2 2 2 2 1 4 2 2 5 6 5 5 22 7 4 2 2 2 3 3 2 2 3 2 3 2 5 3 3 4 4 3 3 4 10 2 2 2 3 3 3 1 2 4 2 4 6 11 10 17 17 17 296 317 491 317 317 491 184 184 184 184 320 7 345 152 152 152 152 402 221 144 180 189 405 206 167 36 377 87 87 236 161 79 499 499 499 428 146 173 173 280 29 255 251 251 241 235 235 235 235 235 348 248 76 259 74 74 351 213 213 213 213 213 186 39 342 342 224 110 110 110 202 202 202 430 430 430 430 430 243 133 259 345 109 41 41 19 19 454 229 82 229 312 312 126 292 292 292 292 292 292 21 21 21 21 408 408 408 149 149 321 321 320 473 258 258 31 342 224 494 494 31 162 232 105 105 336 470 470 432 330 330 379 77 342 224 300 300 382 186 186 54 172 273 470 470 120 240 325 177 177 177 378 345 141 141 281 342 168 470 411 171 171 171 171 252 314 401 196 196 217 70 65 265 265 265 85 85 85 139 139 375 375 185 269 433 427 427 247 247 126 126 126 326 326 23 23 23 23 23 101 149 149 149 321 321 287 111 111 111 438 356 203 64 90 212 144 208 386 431 376 376 376 376 85 37 24 35 259 354 420 420 422 143 144 27 351 368 453 168 106 111 111 111 438 438 251 251 251 241 266 266 266 266 173 402 402 221 75 161 161 79 499 499 428 85 146 146 173 173 176 176 176 328 200 200 117 404 404 439 439 225 237 237 260 260 260 260 260 391 391 289 321 321 321 209 287 287 16 16 16 16 16 88 88 177 177 177 177 35 478 478 68 172 172 444 444 
444 360 339 339 394 478 478 232 232 68 172 344 344 344 344 344 274 43 43 43 364 364 276 174 319 319 348 348 348 64 64 212 212 300 469 134 349 155 262 262 100 100 497 122 45 45 45 325 111 111 111 203 53 90 212 144 106 88 319 135 135 248 465 377 87 87 87 251 251 251 251 241 278 278 278 173 402 402 345 333 220 220 164 219 477 477 477 88 89 89 446 53 212 354 354 255 251 251 251 251 241 431 235 235 235 235 235 348 248 248 465 449 377 123 123 123 219 219 477 477 477 477 477 132 13 321 247 312 126 126 326 326 326 326 326 326 101 101 149 149 228 321 412 287 111 111 111 438 202 402 6 479 463 463 463 280 29 382 245 8 354 354 134 497 251 241 431 235 235 235 235 348 76 465 108 123 123 123 88 109 475 475 94 475 475 475 301 8 354 106 493 493 240 325 41 41 41 19 454 454 229 491 247 312 126 126 23 408 149 149 228 321 320 7 331 307 307 307 167 457 457 42 147 380 485 213 213 286 286 139 139 175 359 474 474 41 41 19 19 454 454 13 414 170 47 47 47 491 47 491 491 47 491 102 435 80 80 289 321 7 7 354 159 159 159 314 35 22 448 448 464 464 255 38 162 68 115 273 106 265 265 85 85 146 175 81 242 203 250 250 345 141 141 281 453 9 198 22 283 455 43 364 364 276 109 109 498 498 498 396 271 271 39 39 86 86 238 6 227 419 439 78 56 56 28 491 28 491 2 491 2 341 341 12 12 21 21 23 101 101 101 149 149 228 321 287 287 111 111 202 202 202 280 29 106 350 350 350 175 466 166 166 166 301 8 137 137 137 137 94 199 340 340 340 94 199 277 277 385 457 393 205 155 155 332 148 148 148 372 372 245 399 399 217 70 65 319 319 319 319 379 379 243 77 270 433 433 112 427 247 247 126 126 23 408 408 391 228 321 320 320 159 159 159 159 129 259 127 114 92 92 92 92 457 457 141 141 141 281 342 168 168 340 340 116 10 479 331 331 230 230 230 169 169 169 352 352 352 352 352 352 112 112 78 56 421 421 491 15 15 15 193 193 193 17 +103-1241-0025 103 241 1 19 33 31 38 14 31 12 5 25 1 13 25 20 34 19 26 39 36 22 35 11 19 24 4 21 5 25 1 24 19 31 19 40 31 29 13 25 31 14 31 13 11 19 33 38 5 40 38 19 22 5 11 5 37 24 20 33 19 33 6 22 23 8 22 12 4 33 1 13 4 3 3 4 4 3 3 2 3 3 5 2 4 3 2 3 2 3 2 1 2 3 6 6 5 3 7 19 4 2 3 2 2 4 3 1 5 2 3 4 1 2 1 2 2 1 4 3 2 3 2 2 1 3 2 2 3 3 5 5 4 3 3 4 3 5 3 9 17 17 17 363 363 363 51 51 51 228 491 321 321 209 177 177 177 177 356 77 342 142 397 336 345 109 498 498 498 313 186 39 342 68 198 114 114 242 446 116 457 335 401 321 226 321 209 475 475 475 475 475 475 475 475 422 349 164 214 214 214 214 200 248 219 152 152 152 143 458 192 389 389 34 121 121 399 217 217 217 217 473 65 486 486 486 460 460 460 24 310 107 107 242 275 275 275 303 303 117 404 13 78 170 170 491 491 312 187 187 187 187 12 12 12 12 12 408 408 149 149 228 321 320 7 473 258 258 258 31 342 224 494 494 31 162 232 232 105 105 336 470 432 432 330 379 64 77 77 224 300 300 382 186 186 54 273 470 470 240 34 177 177 378 345 141 141 281 9 142 397 364 364 109 109 278 143 458 192 192 469 325 34 223 130 402 196 70 429 429 429 422 108 377 87 87 236 259 108 119 119 437 405 405 405 405 206 178 35 321 26 386 266 266 266 266 266 178 458 96 321 127 114 92 92 92 92 92 167 385 427 82 247 126 126 326 326 326 193 193 193 +103-1241-0026 103 550 1 12 15 1 38 14 17 35 11 1 39 36 25 27 1 12 20 5 31 8 23 5 24 29 20 29 5 23 1 9 5 33 12 13 30 19 40 31 27 23 19 33 5 23 31 22 27 29 16 14 12 20 19 24 4 21 5 25 15 32 5 25 19 25 5 25 5 31 8 23 5 24 1 27 25 23 20 21 5 31 33 19 25 12 20 5 12 14 6 30 16 5 25 40 1 19 33 38 5 40 29 30 19 33 20 19 25 33 30 19 31 33 19 26 33 5 19 24 4 21 5 25 34 19 26 40 5 9 7 33 12 13 24 1 12 2 4 2 2 2 4 7 6 12 4 3 3 8 10 4 1 3 3 6 2 3 2 2 5 3 2 8 27 2 2 2 2 2 2 3 1 6 4 3 2 1 2 
3 7 3 4 2 4 2 2 3 3 4 3 5 2 6 6 7 1 2 4 1 2 3 3 7 5 5 2 6 28 6 3 2 3 4 2 4 2 2 2 2 3 2 3 3 6 3 6 2 5 12 40 4 2 3 4 3 3 2 1 2 3 3 2 2 1 2 2 3 2 2 4 2 1 4 4 4 1 3 2 2 3 4 1 2 6 3 2 6 7 5 17 17 17 296 317 491 184 184 184 184 289 321 320 127 0 0 0 0 378 354 347 347 347 245 416 129 321 144 180 484 484 484 484 120 37 37 37 24 24 404 414 414 414 47 47 47 47 491 47 80 80 321 321 289 7 219 152 152 152 116 94 331 84 84 84 84 16 274 98 229 247 247 126 326 326 326 326 101 149 228 321 321 320 22 448 464 255 38 162 342 115 273 106 265 265 85 85 146 175 175 81 242 203 394 76 259 74 485 213 213 213 252 215 259 354 100 100 100 497 98 98 98 13 417 417 170 170 170 170 28 491 28 2 2 2 2 2 2 2 2 2 2 2 2 491 316 316 316 73 289 321 289 321 159 159 159 159 35 127 114 0 222 406 467 356 356 281 162 232 232 68 172 115 344 344 344 344 274 274 251 241 431 278 285 285 302 497 497 186 162 162 232 482 482 482 105 336 144 180 496 496 274 215 457 96 393 155 155 332 332 216 216 448 448 448 464 121 121 399 217 217 65 65 486 460 240 240 310 449 107 242 242 116 33 10 10 10 309 331 418 418 418 418 418 252 99 99 436 436 60 60 298 298 116 199 199 340 340 116 94 199 242 466 94 199 459 44 38 31 162 68 68 115 273 273 265 265 85 85 146 146 175 175 81 81 275 203 203 381 381 48 13 229 321 247 312 126 292 292 292 292 292 292 21 21 23 23 23 23 23 23 260 260 260 260 260 391 391 228 321 321 412 287 287 350 350 350 350 350 250 81 166 166 166 422 36 310 395 395 151 151 150 39 86 238 272 34 340 340 116 466 22 448 448 464 464 493 493 493 300 300 382 245 14 14 411 411 153 153 372 372 372 396 349 349 234 234 25 242 275 275 379 379 471 471 49 269 433 390 390 112 112 56 56 56 305 170 28 28 28 491 28 491 28 362 491 362 491 362 362 362 362 40 40 362 362 362 305 362 362 491 218 218 40 40 40 40 435 435 211 21 326 326 408 408 408 149 228 321 177 177 177 177 378 364 345 141 141 141 141 281 453 9 142 336 74 190 487 104 278 325 34 324 324 464 464 121 121 121 64 161 161 487 469 186 54 86 6 272 176 176 328 200 248 76 465 377 87 123 255 255 399 217 473 65 486 486 460 460 240 310 449 242 242 116 394 76 465 214 214 214 328 200 248 49 453 342 168 255 8 354 180 113 113 113 113 167 167 35 198 198 114 114 114 57 57 120 282 203 381 381 381 117 48 229 321 247 193 193 17 +103-1241-0027 103 743 1 18 36 18 4 11 9 19 25 31 33 27 23 5 25 5 38 15 16 14 24 18 14 29 13 30 5 25 33 31 19 25 18 14 19 25 16 5 25 31 20 9 8 5 22 30 36 23 25 14 31 18 36 11 8 11 9 19 16 6 30 32 20 22 35 11 22 5 25 16 13 31 1 8 39 36 40 11 33 5 23 8 5 38 15 22 4 33 25 8 33 31 5 25 11 19 24 4 21 5 25 34 19 26 40 23 8 22 12 4 33 1 9 19 22 5 40 8 11 19 11 5 25 18 4 37 33 8 24 19 25 12 5 11 15 1 8 17 13 31 12 4 33 31 38 8 5 24 31 27 34 19 25 1 8 4 24 11 30 13 11 16 5 23 34 19 25 15 25 33 8 1 12 13 30 19 40 5 25 5 29 19 22 3 25 24 8 9 27 25 40 1 5 5 3 1 2 2 2 1 4 4 4 3 4 1 2 2 5 3 4 2 2 3 2 8 3 2 2 2 2 2 2 2 2 6 4 3 3 2 3 4 3 4 6 2 8 2 5 5 5 6 5 3 3 4 8 2 2 3 3 2 1 5 2 3 1 2 3 1 2 6 5 10 30 7 4 2 4 2 1 3 3 5 4 3 3 3 2 3 2 5 3 2 1 2 1 2 4 4 4 2 3 3 1 5 2 3 5 3 2 7 4 17 2 3 2 3 3 2 3 2 2 1 2 4 1 2 6 5 3 1 2 2 2 2 11 27 6 4 3 3 3 1 2 3 2 5 2 2 5 4 4 4 7 17 8 4 3 2 2 2 3 2 1 2 4 3 2 6 2 2 10 34 3 2 1 3 2 2 1 3 8 4 3 2 2 3 6 4 8 4 8 6 17 17 363 363 51 228 373 489 489 489 489 88 88 254 254 254 314 8 354 137 137 137 33 394 478 478 482 482 482 6 272 371 189 189 424 424 497 122 34 34 242 116 285 199 255 43 43 109 109 403 403 171 301 349 205 155 165 165 165 53 58 156 156 156 156 245 129 129 321 74 74 351 351 351 264 264 468 468 406 11 11 379 379 77 77 342 224 340 340 94 199 156 156 156 245 14 14 411 411 188 121 
121 121 53 394 76 205 261 25 469 11 379 379 77 342 342 224 41 41 41 301 143 259 354 62 62 62 62 464 464 44 44 44 129 321 458 208 208 190 190 441 487 487 153 424 424 182 182 497 497 497 497 122 10 10 479 331 498 498 498 498 498 396 271 186 39 323 323 142 489 489 489 489 422 32 239 321 384 371 180 265 265 265 265 85 85 146 24 35 259 354 255 255 349 155 155 148 148 148 387 186 99 400 400 400 30 143 458 144 389 389 314 90 458 144 121 121 203 394 76 4 205 261 25 470 443 443 443 169 271 150 39 433 433 433 160 112 427 56 247 312 312 312 292 292 292 292 292 292 292 292 21 21 21 21 21 21 21 23 23 260 260 260 260 260 260 391 149 228 321 321 412 287 287 111 111 438 219 219 219 485 485 374 186 162 54 86 238 272 272 494 139 175 251 241 431 265 265 85 146 146 464 464 255 43 43 109 109 403 171 171 143 192 192 469 314 314 196 196 479 331 428 428 428 428 146 385 35 75 342 224 89 446 94 199 255 255 217 217 473 65 486 486 460 460 368 310 449 60 242 116 116 394 76 259 214 214 214 214 200 200 471 49 453 26 26 241 266 266 266 266 266 266 416 96 198 198 114 92 92 92 92 92 92 167 385 233 131 229 247 247 126 126 326 326 326 326 326 408 408 149 149 228 491 289 321 321 354 420 420 143 458 192 485 494 368 342 168 111 111 111 240 325 371 371 278 278 116 33 33 58 72 110 110 202 202 202 402 402 36 119 119 103 103 103 103 85 299 299 203 53 473 340 340 466 22 283 455 236 384 371 93 93 93 93 207 207 207 19 454 263 417 417 417 417 417 170 170 28 491 28 491 491 2 491 2 491 2 2 2 163 316 491 435 435 435 435 321 321 321 435 287 111 111 438 438 458 445 357 357 443 271 31 342 342 198 114 92 92 169 77 342 142 397 345 346 181 428 438 464 464 365 330 203 394 478 172 115 273 344 344 344 274 349 164 164 164 470 278 278 120 330 388 195 195 117 48 417 417 417 170 47 47 47 47 47 47 491 491 47 491 80 80 80 321 435 435 287 287 111 111 111 438 464 365 365 365 330 203 53 64 212 161 79 288 151 240 314 131 393 262 262 100 497 497 349 164 224 470 432 365 330 94 199 331 145 290 290 434 434 339 212 131 180 284 265 265 85 85 207 207 454 454 229 321 247 312 312 126 292 292 292 292 292 292 292 292 292 292 21 21 21 21 21 21 21 21 21 21 21 21 101 101 149 149 228 321 321 320 127 0 0 222 468 356 356 356 453 342 242 116 199 44 44 44 129 35 401 401 401 321 74 351 351 278 278 178 458 192 180 125 125 125 348 250 70 46 46 46 46 46 438 301 8 239 354 106 106 84 496 496 496 496 413 413 413 471 471 49 269 433 390 160 112 56 417 201 201 201 201 193 193 17 +103-1241-0028 103 777 1 8 11 36 23 5 37 33 36 19 24 4 21 5 25 8 24 25 8 31 5 25 11 29 23 5 24 29 38 19 34 11 19 24 29 5 23 40 19 25 24 8 13 23 9 27 40 1 38 19 12 19 31 1 24 4 34 39 36 40 22 5 24 29 4 25 39 5 25 31 33 3 29 33 6 22 19 26 1 29 3 30 33 23 20 1 9 19 22 5 40 32 20 38 5 40 7 33 5 37 9 30 13 34 5 25 11 29 3 30 33 23 20 9 19 22 5 40 12 15 18 4 11 30 20 10 33 12 5 9 5 17 20 1 25 3 33 5 25 5 12 14 38 14 11 19 11 32 20 31 15 5 25 33 19 23 12 15 18 4 11 23 13 16 33 12 5 37 19 23 19 21 5 25 11 38 14 11 30 8 37 19 26 11 7 25 5 31 33 20 29 23 19 33 5 23 18 19 23 1 10 8 5 6 3 4 3 4 2 2 4 5 5 1 2 5 3 4 7 7 1 2 2 6 3 2 5 3 1 2 4 3 2 3 3 2 3 3 2 1 3 6 3 3 3 5 14 53 3 4 6 6 16 3 3 5 3 2 2 3 3 1 3 5 2 4 2 3 2 4 3 4 2 5 5 4 2 7 8 3 3 4 2 3 8 14 3 2 3 2 2 2 2 2 2 3 4 3 1 4 2 2 6 9 2 1 3 4 4 3 3 2 3 3 3 3 4 3 3 4 2 1 4 2 3 6 1 2 2 3 4 2 9 21 3 2 3 3 2 3 3 3 7 5 4 1 4 4 2 6 6 3 3 3 3 2 2 3 2 2 2 3 3 4 3 2 1 5 2 4 3 6 1 2 2 2 2 3 2 6 3 2 4 3 5 3 3 5 4 5 4 2 2 2 2 2 5 3 9 8 17 17 17 296 317 317 184 184 184 289 209 287 287 111 111 111 438 438 314 32 239 384 371 371 374 374 132 274 251 251 241 431 266 266 266 266 173 402 
402 36 108 87 87 88 88 255 255 399 217 217 65 65 486 486 460 460 240 310 107 395 242 275 116 199 199 111 111 85 438 203 203 53 10 10 309 331 331 265 265 428 428 146 146 186 39 342 68 68 224 224 11 116 33 394 472 401 401 321 74 425 425 386 431 319 319 319 203 53 53 53 76 401 259 345 333 333 220 220 402 472 221 239 384 371 278 278 53 53 394 76 259 74 302 302 497 497 49 453 342 168 340 340 116 250 70 46 46 46 46 438 464 464 145 139 139 293 293 122 8 354 354 84 84 496 496 274 185 39 433 433 390 160 112 112 56 56 56 56 28 28 491 491 28 28 491 491 362 491 362 362 362 491 362 362 362 362 362 362 491 362 362 211 211 362 491 369 369 369 369 369 369 369 369 21 21 21 21 21 21 21 260 260 260 260 260 260 391 391 391 491 289 321 321 7 7 345 333 333 220 220 314 32 4 4 127 114 258 258 258 258 258 31 39 342 433 390 390 390 160 160 160 160 97 97 225 225 80 80 80 321 321 7 217 473 329 329 329 329 329 164 164 485 485 485 485 374 368 31 142 221 336 27 121 399 53 76 465 74 351 351 365 365 365 330 388 64 219 398 398 275 275 116 471 471 478 66 482 238 6 272 106 405 405 405 167 215 96 96 75 108 119 437 405 405 405 405 206 206 178 458 192 176 176 328 328 200 303 48 48 417 225 80 80 491 321 80 289 289 320 74 437 437 306 306 396 396 396 35 35 26 359 359 474 474 474 474 19 19 454 229 321 247 126 126 326 326 326 326 101 101 408 391 228 321 289 321 320 354 420 422 143 144 27 494 278 186 99 400 400 400 378 378 141 141 141 281 168 106 113 113 113 206 240 285 34 462 462 402 401 259 259 354 380 380 288 443 120 120 169 169 169 169 352 352 352 352 352 352 97 89 55 322 67 90 90 259 74 74 437 437 306 306 306 396 396 385 35 26 359 359 474 474 324 301 301 354 420 420 420 143 321 144 27 351 253 253 368 453 342 198 198 127 0 0 0 0 58 110 254 254 254 131 133 147 147 288 213 213 143 233 310 107 447 447 447 198 198 22 283 455 8 354 354 329 151 151 416 416 41 41 41 41 19 19 454 454 170 170 491 491 312 491 312 312 292 292 292 326 326 326 326 326 101 101 101 149 149 228 321 320 479 331 307 307 61 285 34 44 44 44 94 199 493 493 493 493 216 300 300 382 245 43 364 364 276 109 498 498 498 498 396 313 313 314 36 430 430 430 430 36 36 310 107 400 400 400 30 422 162 162 68 68 115 273 470 403 403 207 207 464 464 89 319 348 64 76 465 108 139 139 139 139 497 122 216 0 0 0 0 58 254 254 254 314 26 251 241 431 443 443 443 169 402 402 96 75 472 198 198 22 283 455 4 4 4 280 278 278 278 175 175 81 459 469 37 37 24 310 107 395 89 89 322 322 250 250 347 347 347 236 129 161 79 79 499 499 265 85 85 146 146 173 173 176 176 135 135 200 248 248 384 371 180 315 315 315 450 450 413 94 199 44 44 38 342 342 68 482 6 336 384 371 213 213 213 213 252 215 129 321 26 26 26 81 278 278 285 26 302 497 497 58 58 183 72 351 278 278 278 139 139 375 375 375 375 98 98 13 229 321 247 15 15 193 193 193 17 +103-1241-0029 103 765 1 12 5 30 27 11 29 3 30 33 5 37 38 19 10 18 4 11 9 19 25 22 5 33 31 27 11 20 29 23 20 19 25 33 5 12 5 31 6 16 33 31 28 23 1 12 5 33 12 5 9 4 26 22 31 1 16 30 19 25 21 11 38 19 12 9 23 36 24 19 26 38 8 23 11 10 13 30 20 33 30 20 40 5 25 11 31 23 19 24 38 8 33 9 14 10 19 40 1 38 14 31 13 37 30 5 23 16 20 33 5 9 5 37 12 13 30 18 13 11 40 1 12 5 10 8 23 11 29 35 33 7 33 18 14 18 4 25 11 5 25 11 9 30 27 22 6 16 5 9 30 4 25 10 5 37 38 8 23 11 29 23 5 24 12 5 33 9 30 5 32 33 5 17 13 25 31 33 12 5 31 8 11 5 37 12 5 9 5 17 20 1 10 3 3 4 7 2 3 4 3 2 2 5 2 4 4 1 2 1 2 2 4 4 4 4 6 6 4 5 3 2 3 2 2 3 2 2 3 6 6 4 3 7 8 7 12 3 2 1 3 2 5 5 6 3 9 2 2 3 2 2 4 2 2 2 3 4 3 4 3 3 7 3 7 3 3 7 3 3 4 5 2 5 3 2 3 2 6 2 4 7 3 4 4 6 6 5 5 7 13 3 2 7 3 3 2 2 3 6 4 2 2 3 4 3 3 3 2 4 7 
5 9 38 2 3 6 9 4 3 4 2 3 5 3 3 3 5 6 4 1 2 1 3 2 3 3 3 5 3 2 6 2 5 3 5 2 4 3 7 2 5 6 3 4 8 2 3 3 4 2 3 6 2 1 3 3 3 2 1 2 2 4 5 3 1 2 2 2 3 5 3 8 15 17 17 17 363 363 363 51 51 228 321 321 320 5 5 455 42 42 147 380 380 496 496 496 274 274 122 24 131 472 221 259 74 437 306 306 306 306 169 167 36 449 69 223 130 130 402 402 345 345 109 407 407 407 385 36 75 310 395 254 254 254 314 259 354 137 137 137 33 394 394 465 144 27 351 189 189 151 167 167 457 478 478 66 68 68 115 344 344 344 344 344 344 274 236 32 239 384 371 213 213 213 213 215 129 321 354 359 359 474 474 464 340 340 116 394 465 377 377 123 123 198 22 283 455 38 162 232 482 172 115 273 106 405 405 405 405 206 169 402 402 402 6 272 472 472 482 482 482 115 273 106 153 153 387 387 387 139 139 302 302 375 375 98 13 321 247 247 126 126 326 326 326 326 101 149 149 228 321 321 127 45 45 45 35 259 127 5 5 455 129 129 259 354 354 180 486 365 365 365 365 360 200 200 243 243 233 270 270 270 390 390 390 390 97 97 225 225 225 373 155 261 487 288 288 278 330 339 64 310 447 447 238 272 397 345 333 220 220 402 221 129 401 321 354 425 425 241 431 374 374 374 203 53 53 473 176 176 135 328 200 200 248 248 248 364 276 276 346 346 265 428 85 85 139 293 293 293 122 122 401 401 401 310 310 107 107 395 351 470 264 264 468 468 468 337 337 324 422 36 36 161 161 487 487 288 41 41 246 318 49 453 342 168 89 116 116 33 394 394 478 66 68 68 68 26 26 251 241 81 278 278 278 203 53 250 250 250 250 250 276 346 346 428 428 428 146 252 36 472 221 401 401 321 321 354 354 498 498 498 498 396 143 36 310 107 107 395 50 50 50 50 185 185 433 433 160 112 427 247 247 126 126 292 326 326 101 101 149 149 228 289 321 320 347 347 347 186 162 232 232 68 172 115 273 204 204 204 204 280 29 495 134 302 302 497 497 349 349 234 234 261 25 213 213 213 252 252 449 34 255 255 8 259 354 180 230 230 319 173 402 402 198 127 222 222 222 222 222 313 58 72 72 110 110 120 120 120 120 120 37 37 24 471 270 270 433 433 160 18 112 112 56 56 491 28 28 28 491 28 491 28 491 362 362 491 362 362 362 491 491 362 491 491 211 211 102 491 369 369 369 369 21 21 21 21 21 101 101 149 149 321 321 320 127 5 5 236 129 36 310 107 395 351 91 91 91 91 85 85 85 85 139 293 293 122 122 131 472 221 401 321 74 441 189 189 240 285 34 180 113 113 113 113 167 285 449 449 156 156 156 313 58 58 72 72 294 294 294 294 294 294 294 282 388 64 64 212 34 89 89 322 53 212 32 259 354 380 380 189 496 496 274 143 458 192 180 230 230 230 169 169 352 29 44 44 245 8 32 321 354 190 380 288 365 365 365 365 330 388 64 76 76 310 107 107 395 462 462 462 402 402 133 276 276 346 346 486 315 315 315 139 450 293 122 122 131 472 221 401 321 75 74 425 425 386 386 386 431 319 319 319 203 203 381 381 381 381 381 117 198 198 127 45 45 45 236 401 401 401 321 354 190 380 499 151 151 169 169 99 447 447 238 6 272 34 255 416 192 180 432 432 330 379 77 77 342 342 198 22 283 283 38 162 342 115 273 265 265 265 85 146 146 325 34 69 130 130 198 22 283 455 8 259 354 106 151 151 151 416 416 192 41 41 41 41 19 19 454 454 229 321 247 312 15 15 15 15 15 15 15 193 193 193 193 17 +103-1241-0030 103 773 1 19 40 5 25 12 4 33 1 9 39 36 33 5 16 5 23 1 18 38 5 33 11 19 11 12 4 33 30 20 23 20 25 19 26 7 33 16 14 24 12 5 9 4 26 22 6 23 38 8 33 5 25 11 23 15 31 20 24 15 22 39 36 34 19 26 22 5 37 32 20 4 31 22 33 1 38 13 23 1 25 7 1 8 11 5 25 27 1 31 13 11 24 4 34 39 36 1 18 38 8 5 9 30 8 11 5 37 22 6 30 31 1 5 9 30 8 11 1 6 23 19 25 38 8 33 38 19 12 5 23 5 37 23 20 24 19 31 33 20 37 15 23 1 16 6 4 1 4 2 5 2 2 2 5 3 2 2 6 2 10 37 3 1 2 2 2 2 1 3 4 9 2 8 4 5 2 4 5 4 2 3 2 2 2 2 5 4 4 
4 8 7 5 4 5 2 1 2 4 5 7 4 3 3 3 2 3 3 2 5 4 5 6 5 4 6 5 2 4 34 7 2 5 1 2 16 23 11 1 2 4 17 5 7 2 3 3 5 3 3 5 37 5 3 7 3 6 3 8 3 2 5 7 3 5 9 16 5 7 4 11 5 2 10 5 3 6 5 7 5 2 1 2 3 9 4 7 3 5 8 3 7 4 3 5 9 10 7 17 17 17 296 363 363 363 363 51 51 51 184 184 321 184 321 321 209 188 430 430 430 430 342 430 430 430 430 33 64 212 127 114 92 92 92 92 167 457 401 401 401 321 354 354 219 219 485 485 485 374 374 285 285 469 469 349 393 234 155 262 262 100 100 100 100 375 98 98 98 13 13 13 442 442 442 491 442 442 442 442 442 491 102 2 2 2 2 201 40 305 305 305 305 305 40 366 366 102 316 316 316 491 102 491 305 102 102 289 289 321 321 320 181 181 181 181 181 35 449 430 430 430 430 198 114 114 92 92 92 167 457 35 401 75 161 161 161 161 487 487 487 213 213 246 246 246 246 301 26 251 251 241 444 444 444 444 360 360 339 199 176 135 135 135 135 200 200 464 113 113 113 113 167 167 349 155 155 165 165 165 165 466 22 22 283 455 455 259 354 354 180 376 376 365 365 365 328 200 243 76 458 192 483 14 411 411 297 297 297 297 297 297 293 293 497 497 43 364 364 276 346 346 346 428 428 428 146 252 143 36 449 89 89 446 446 33 251 251 251 241 241 431 171 171 171 252 186 39 342 342 342 224 41 41 324 324 301 399 473 476 476 476 476 143 458 192 219 152 152 422 349 164 164 164 214 214 360 360 200 200 248 321 144 192 69 223 223 223 223 223 37 173 352 352 352 402 99 338 400 400 400 400 464 464 464 145 376 376 376 460 169 169 342 342 86 105 6 96 96 272 427 56 247 247 312 126 126 292 292 292 292 292 23 23 23 23 23 260 260 260 260 260 260 391 391 163 491 316 316 316 316 316 316 316 73 289 7 7 7 364 276 276 109 109 139 139 293 293 293 413 309 479 331 331 315 315 315 315 315 450 450 16 98 98 98 13 13 13 229 247 312 312 126 126 126 23 23 23 23 23 260 260 391 391 47 491 491 316 316 80 321 373 412 412 287 287 287 284 306 306 85 438 240 325 34 242 242 94 199 331 84 84 84 16 16 16 16 98 98 98 98 263 13 225 225 225 225 225 225 80 80 80 321 373 66 66 172 179 179 179 179 179 314 196 196 473 65 329 329 329 329 329 164 164 485 485 485 485 485 374 132 98 417 417 417 417 417 417 170 170 170 28 28 28 491 28 491 491 2 491 491 2 491 2 435 2 2 2 366 321 305 305 40 435 40 201 435 435 435 435 435 289 289 321 320 364 276 346 346 265 428 85 146 464 464 44 44 44 8 401 401 321 354 190 190 380 380 499 265 265 265 85 85 146 146 252 325 449 34 255 130 130 402 402 401 321 321 208 208 441 441 441 153 153 153 372 372 396 396 271 186 54 433 433 390 112 427 56 247 247 126 126 326 326 326 326 326 326 326 101 101 101 149 149 228 321 412 44 44 44 8 129 129 259 190 190 190 190 79 380 380 499 499 265 265 265 85 85 85 146 146 146 24 131 335 14 14 226 321 209 411 287 297 297 297 297 297 297 297 293 293 175 175 81 81 340 340 116 33 33 90 250 250 364 364 276 346 346 346 428 428 428 146 358 358 36 131 472 397 345 333 333 220 220 216 44 44 44 251 251 251 251 251 251 241 241 431 266 266 266 266 173 173 173 402 402 402 26 359 359 81 81 324 324 324 301 339 399 217 217 217 217 217 473 65 278 278 31 162 342 86 86 86 6 6 272 41 324 324 324 301 4 4 4 280 470 470 403 171 171 171 171 464 139 139 302 375 375 375 98 263 13 78 78 170 491 421 15 15 193 193 17 +103-1241-0031 103 654 1 8 37 25 13 37 14 31 20 25 38 5 25 1 9 5 33 8 22 5 25 19 24 4 21 5 25 38 5 33 32 20 38 35 11 23 35 22 23 8 22 1 8 11 27 25 33 13 37 14 19 22 31 29 13 22 33 19 9 20 5 9 30 8 11 24 8 31 13 23 16 1 8 24 31 27 18 27 24 23 20 25 27 9 5 11 20 38 5 23 13 37 14 38 6 25 33 5 24 13 30 20 1 24 20 1 5 25 23 13 31 19 33 24 8 33 9 20 5 16 6 30 5 25 24 19 32 5 25 13 30 20 1 8 31 5 29 27 40 5 16 6 30 5 25 24 19 32 5 25 13 
30 20 24 8 33 5 25 33 9 20 37 13 30 20 29 14 33 19 22 39 5 23 14 1 11 8 1 3 2 3 3 6 4 4 2 3 6 13 2 1 3 4 5 2 1 3 5 5 5 2 2 2 2 2 4 2 3 1 2 5 2 5 2 7 6 24 7 2 3 2 2 2 5 3 2 3 3 2 3 2 3 2 3 4 2 4 3 4 3 2 5 7 3 6 5 14 8 3 5 3 5 5 3 3 4 3 4 2 2 1 3 2 3 7 3 3 3 3 2 3 3 2 4 5 2 4 2 2 11 18 6 2 2 2 4 2 1 4 4 2 3 3 2 6 3 2 3 2 2 1 6 2 2 4 2 7 7 9 3 2 5 3 4 2 6 3 3 3 2 2 2 5 2 2 3 2 4 4 5 2 3 2 2 2 6 2 2 3 5 2 2 3 2 4 2 2 3 6 12 17 17 17 296 363 363 363 51 51 51 228 321 412 287 111 111 111 202 202 202 196 309 479 463 463 463 463 29 382 313 186 162 232 68 68 267 267 267 267 267 339 339 250 250 250 276 174 174 174 319 388 388 303 117 229 229 247 126 126 326 326 101 101 149 149 228 321 289 321 159 159 159 159 325 34 111 111 111 438 438 143 458 445 351 242 116 199 199 255 255 399 217 217 473 65 486 486 486 460 460 240 310 310 107 395 242 242 203 250 250 181 181 181 181 99 338 338 400 400 152 378 378 345 389 389 314 26 26 251 241 241 367 367 367 367 35 458 96 26 386 266 266 266 266 266 266 146 358 143 458 192 419 439 78 170 170 491 28 491 28 28 491 491 2 491 2 102 2 102 102 102 491 491 435 305 289 289 321 209 287 111 111 111 438 438 325 34 180 84 350 413 348 131 34 463 463 463 463 402 29 29 495 467 467 154 154 458 96 96 232 68 105 336 470 470 151 151 178 35 401 75 272 87 87 8 354 420 420 420 464 464 44 255 8 259 259 190 380 380 499 499 428 85 146 146 35 196 196 473 46 46 46 46 438 186 162 68 68 68 115 273 279 279 279 279 279 279 279 375 375 352 352 352 427 491 247 491 312 126 292 23 23 23 101 101 149 149 228 321 321 287 287 111 111 111 438 203 53 394 478 232 172 115 344 344 344 344 274 58 72 72 437 350 350 350 350 350 203 53 250 250 359 359 166 166 324 324 301 10 479 331 231 231 231 274 8 354 29 469 325 41 324 301 378 345 345 389 139 497 175 335 14 14 209 145 463 463 463 463 280 29 382 245 245 43 364 174 174 174 330 348 76 465 75 377 87 87 399 217 473 65 65 264 264 264 468 468 337 337 337 324 324 301 217 473 429 429 429 429 429 246 246 246 19 454 229 321 247 126 126 126 326 326 326 326 326 23 408 408 408 149 149 391 491 289 321 412 287 287 319 319 348 175 81 431 443 443 31 342 342 177 177 177 177 457 70 70 65 65 428 428 146 215 35 259 420 420 420 464 464 44 44 349 234 234 261 25 148 148 148 372 372 467 467 242 348 250 217 473 473 278 278 99 99 436 436 60 60 116 94 199 470 264 264 468 468 468 337 41 41 19 454 454 417 417 417 417 237 237 80 80 321 412 287 287 287 111 111 438 438 31 342 224 494 494 129 74 74 437 496 496 496 274 368 368 9 168 494 44 349 234 234 205 261 148 148 148 372 372 372 467 467 446 116 250 250 473 473 278 278 99 99 436 60 60 242 116 94 199 264 264 468 468 337 337 41 324 301 399 70 473 65 428 428 146 146 457 35 401 196 242 33 33 394 32 32 259 420 420 420 420 420 324 301 173 280 104 104 104 104 337 337 337 337 301 129 321 74 492 492 236 384 371 278 278 278 143 321 192 485 134 134 134 175 81 300 334 59 452 263 229 229 491 312 15 15 15 15 15 193 193 193 193 17 +103-1241-0032 103 700 1 9 5 33 8 11 36 18 27 29 12 5 33 31 5 24 11 15 8 32 4 23 1 18 4 37 5 38 8 33 11 30 13 31 1 12 4 33 1 19 40 24 8 18 8 5 31 33 8 11 20 23 5 37 14 34 23 20 9 23 19 31 1 8 21 19 31 33 23 5 37 29 30 19 33 20 22 23 27 12 40 1 4 25 11 8 37 25 13 37 14 18 4 11 5 29 30 19 33 20 11 30 13 31 19 25 24 8 23 8 16 12 5 33 8 22 5 25 30 19 24 13 24 9 14 1 9 5 33 5 37 22 6 30 31 1 19 33 31 1 6 23 12 5 24 6 30 33 5 23 35 22 16 6 30 38 14 11 33 36 19 40 5 25 33 19 33 1 4 25 11 12 13 25 1 14 3 1 3 6 3 6 5 2 3 2 2 3 6 3 4 2 5 5 5 2 3 2 6 2 2 3 6 6 3 5 2 5 13 14 3 7 3 1 4 4 3 4 13 6 3 3 3 5 4 4 3 2 2 6 4 2 2 3 2 5 9 23 7 3 3 2 3 2 
3 2 6 2 1 2 2 6 3 6 3 9 14 5 1 3 5 3 2 2 2 2 5 2 2 2 5 1 2 2 3 3 3 2 5 1 2 3 4 6 5 4 2 1 2 3 4 2 4 2 2 3 2 4 3 8 24 2 1 2 1 3 4 3 4 9 14 5 2 3 1 6 4 2 2 5 2 2 2 3 3 2 5 5 2 3 2 2 2 4 3 3 3 2 1 3 3 6 17 5 2 2 2 3 6 5 17 17 17 296 363 363 51 51 51 184 184 184 289 321 320 354 159 159 240 285 34 111 111 111 111 438 236 35 75 371 371 374 374 132 132 88 58 72 72 496 496 496 496 215 35 96 26 34 45 45 45 31 478 478 68 68 68 115 273 231 231 231 203 53 64 212 384 93 93 93 93 464 464 111 111 111 111 438 99 99 338 395 389 389 389 497 129 401 259 74 483 58 72 110 202 202 202 202 173 402 44 44 44 43 43 364 276 276 346 346 428 428 146 146 358 457 401 401 321 75 161 161 79 487 288 443 443 120 271 271 150 39 433 433 433 160 160 160 112 439 56 56 47 47 491 47 491 47 491 491 316 491 73 73 289 289 321 320 127 114 92 92 92 92 92 240 385 35 131 335 226 188 188 356 356 281 342 9 196 70 70 46 46 46 46 438 58 58 72 72 72 72 72 72 72 72 437 265 428 428 146 146 146 464 459 459 459 31 39 86 86 6 272 106 486 428 85 146 438 239 36 371 485 485 286 139 139 175 175 69 462 462 130 29 498 498 498 498 169 164 164 26 26 359 81 324 324 301 8 259 354 425 386 81 459 271 271 271 39 39 433 390 390 160 112 427 491 247 312 126 126 292 292 292 326 326 326 326 326 326 326 326 326 101 101 149 149 149 228 321 321 287 287 111 111 438 438 143 36 107 395 494 31 342 342 26 26 26 241 266 266 266 266 173 402 401 401 401 259 74 190 487 278 278 325 34 324 324 143 458 321 208 208 208 386 386 431 496 496 496 496 274 274 216 164 270 270 433 160 18 427 56 56 47 47 491 47 47 491 2 2 491 316 316 321 73 289 321 435 209 83 55 55 322 322 212 34 111 111 111 111 438 202 202 402 196 479 331 463 463 463 29 29 382 58 58 72 110 110 254 240 325 34 44 44 129 129 259 74 190 487 278 278 325 324 324 324 236 239 259 161 79 487 288 443 443 169 342 342 224 340 340 340 250 217 70 46 46 46 46 46 438 251 251 241 431 431 428 428 428 146 146 186 402 352 342 224 45 45 325 325 111 111 111 178 458 192 242 242 116 33 250 456 456 456 456 456 399 217 473 65 432 432 330 203 53 212 212 29 334 334 59 59 452 263 229 321 321 312 312 126 292 292 1 292 292 1 1 1 1 1 23 260 260 408 408 391 391 391 289 289 321 320 159 159 159 285 255 255 402 402 458 144 441 441 441 153 153 372 372 396 186 186 54 54 86 112 427 56 56 201 201 201 201 201 201 201 201 201 201 201 321 435 435 435 320 209 177 177 177 356 356 342 483 14 226 321 411 297 297 297 297 297 297 293 122 216 22 283 455 399 399 138 138 138 138 372 396 313 449 377 87 87 87 251 241 367 367 367 367 458 96 393 393 234 234 261 25 148 148 148 372 245 43 345 109 109 313 236 36 75 108 377 485 489 378 88 356 356 356 281 342 430 242 242 116 212 131 277 277 277 277 277 385 233 75 419 427 56 56 170 170 312 312 292 292 292 1 1 1 408 408 408 408 305 321 209 83 55 55 322 67 466 127 361 361 361 361 361 388 195 117 229 229 247 126 126 193 193 17 +103-1241-0033 103 787 1 8 22 5 25 19 24 4 21 5 25 12 4 33 8 24 11 30 13 31 33 17 6 30 21 5 31 23 20 1 12 19 31 24 6 30 25 19 26 38 13 25 8 23 13 16 33 12 20 5 31 8 23 5 24 8 16 13 23 33 31 27 5 32 15 24 11 9 19 22 5 40 8 18 4 11 33 5 38 13 30 12 19 31 18 6 30 5 11 27 23 11 1 38 19 25 31 20 11 30 13 31 1 6 23 12 20 6 30 16 5 25 40 18 4 11 33 5 38 13 30 12 5 24 39 36 25 27 1 5 24 14 10 5 25 33 19 25 2 23 4 31 38 19 25 33 14 11 27 25 15 33 19 11 34 30 20 18 5 25 11 14 11 39 3 30 11 40 5 37 38 19 25 31 20 33 19 12 20 5 31 8 23 5 24 1 31 5 24 29 20 29 5 23 31 13 11 19 33 38 5 40 9 19 22 5 40 18 20 22 35 11 5 25 31 13 23 19 33 1 7 4 3 2 2 2 4 4 5 2 2 1 2 3 3 2 3 2 2 6 2 4 4 3 3 2 7 2 8 21 3 2 5 4 2 2 2 3 3 2 2 2 4 3 
3 3 1 2 2 4 6 6 4 2 3 4 5 2 3 2 6 4 2 9 6 3 3 2 2 3 3 3 3 4 2 2 3 3 2 2 2 3 2 4 8 4 4 1 3 4 4 2 1 3 2 4 5 2 4 3 4 10 27 7 3 3 3 3 3 4 2 2 2 3 2 2 2 2 2 3 2 3 2 2 2 3 3 10 15 5 5 4 4 2 1 2 2 2 18 5 4 6 3 2 1 4 2 4 4 2 4 3 2 2 5 3 6 4 3 2 2 3 2 4 7 3 4 1 2 4 2 2 4 4 1 3 2 2 4 2 5 5 4 2 6 21 6 4 2 2 3 3 2 2 4 2 2 1 3 1 2 3 2 2 4 2 2 2 2 4 2 2 2 2 5 3 4 3 5 9 17 17 363 51 51 228 321 209 111 111 111 438 458 192 192 242 116 199 255 255 399 217 473 65 486 486 460 240 240 35 310 107 242 298 116 379 466 45 45 45 285 34 111 111 365 203 203 394 212 161 79 487 288 443 169 150 39 86 86 238 6 336 90 221 321 144 208 153 153 153 387 372 396 313 24 310 107 459 459 271 39 433 68 68 68 359 474 474 474 474 19 19 454 454 417 442 442 170 170 28 28 491 2 491 491 2 491 2 491 2 2 491 316 316 73 289 321 321 7 127 258 258 31 162 342 142 142 196 217 70 65 153 387 387 396 348 94 176 176 328 200 200 248 345 409 409 409 94 199 111 111 111 438 251 241 431 443 443 169 169 352 402 198 198 448 448 464 464 255 38 38 162 232 68 68 115 273 265 265 265 85 85 146 175 175 81 81 242 203 203 53 65 111 111 111 438 349 205 205 261 25 189 139 139 293 122 478 478 66 68 172 115 344 344 344 344 88 88 255 255 186 99 338 338 338 338 338 395 470 290 290 290 290 290 434 339 53 394 212 401 221 321 354 420 420 143 458 192 278 253 368 453 342 168 111 111 111 438 72 110 110 254 254 240 35 321 377 87 87 87 43 364 109 109 264 264 313 216 216 114 258 258 31 31 342 142 142 72 72 72 72 72 72 437 153 481 306 372 406 467 467 469 240 285 34 106 106 424 424 424 424 497 122 122 133 401 321 364 276 109 278 330 348 33 394 77 77 342 224 41 324 324 301 236 321 75 161 79 79 288 288 443 120 271 271 39 39 433 390 160 160 112 427 491 247 312 126 292 292 292 292 326 326 326 326 326 23 23 23 101 101 101 101 101 149 149 228 289 289 321 289 209 209 287 297 297 297 297 297 293 293 216 22 448 448 448 378 106 153 372 372 372 349 349 205 261 25 242 379 379 471 77 342 110 110 110 460 240 314 35 384 87 87 43 43 276 109 109 468 468 240 216 216 57 57 203 217 473 219 219 152 374 116 94 331 84 84 84 84 16 274 98 98 13 13 414 491 170 491 170 187 491 187 187 23 23 101 101 149 149 149 321 209 44 44 44 399 217 70 473 65 498 498 396 313 35 310 107 107 242 116 116 199 34 89 446 116 33 58 58 72 437 496 496 496 496 215 35 35 96 270 342 224 242 242 116 33 466 466 241 431 376 376 376 169 150 150 86 238 272 397 397 109 109 278 278 64 76 449 300 382 313 236 239 259 384 371 84 496 496 413 94 199 158 158 158 252 325 449 191 191 191 314 36 164 119 161 161 487 487 487 337 213 213 324 324 3 3 58 72 72 437 319 319 319 348 64 212 300 382 313 313 314 314 219 219 219 180 180 106 306 306 306 306 306 396 396 37 37 24 77 270 168 168 462 462 402 402 402 345 109 109 330 116 33 394 77 77 224 41 41 324 236 108 377 123 123 216 283 448 448 464 464 255 38 162 68 115 115 106 265 265 265 85 146 299 175 175 81 275 203 203 381 117 48 13 491 491 312 312 126 292 292 292 292 292 21 21 21 23 23 23 260 408 391 391 391 321 321 373 66 68 115 273 231 231 231 319 53 76 76 74 485 213 213 301 8 354 100 497 497 186 162 68 115 273 470 443 240 325 177 177 177 457 345 141 141 281 9 221 336 354 420 420 143 259 144 27 351 368 368 342 224 30 30 422 143 144 27 389 389 389 314 196 242 242 33 394 478 232 68 172 115 273 443 443 139 175 175 175 81 277 277 37 385 131 404 321 247 247 126 15 15 193 193 193 17 +103-1241-0034 103 845 1 9 5 33 8 11 30 4 12 14 9 19 23 20 37 12 5 33 19 33 38 5 40 7 33 5 37 12 5 22 8 25 11 25 5 31 5 37 18 19 40 18 3 30 33 38 35 11 5 25 33 39 36 1 38 13 25 38 20 17 3 33 3 25 12 5 33 30 15 25 1 8 16 13 23 33 13 
40 19 16 1 13 37 30 20 9 3 11 20 24 5 31 33 9 20 23 35 22 19 26 4 33 24 20 5 25 11 29 19 33 20 19 26 24 20 1 9 5 33 8 21 5 31 33 38 13 25 33 5 38 14 22 5 25 11 19 24 4 21 5 25 11 12 5 33 8 18 4 11 3 25 12 5 24 27 31 33 1 9 39 36 33 5 16 5 23 29 15 23 9 23 36 31 19 23 22 11 30 13 31 1 9 19 22 5 40 38 19 25 39 36 3 30 19 24 4 21 5 25 19 26 39 36 24 8 33 13 40 38 13 23 19 24 4 21 5 25 31 5 24 34 19 26 38 14 34 18 38 8 23 1 9 3 1 2 6 4 2 3 3 2 3 2 5 5 2 2 1 2 1 2 2 1 4 4 2 2 2 1 2 7 4 1 2 1 2 4 2 2 1 2 3 5 4 3 4 2 1 2 3 2 2 4 7 29 4 2 1 2 3 4 3 3 2 3 1 3 6 2 7 5 10 5 4 3 2 3 2 4 2 5 1 6 3 3 2 3 3 2 3 3 3 2 2 3 3 4 3 4 2 4 2 3 2 4 1 2 1 6 2 4 5 2 5 2 9 23 3 2 2 8 4 3 2 2 2 3 2 3 3 4 3 3 1 2 1 3 3 5 5 2 2 1 2 1 3 5 9 2 3 3 3 2 1 3 4 3 4 10 2 8 6 2 4 4 2 4 12 10 5 8 5 9 10 3 3 7 2 3 4 9 23 2 4 2 3 3 1 3 2 3 5 7 4 3 4 4 5 2 1 3 3 3 2 3 4 3 2 3 3 2 3 2 4 3 5 2 2 3 2 2 3 2 4 3 4 2 3 3 8 4 13 17 17 296 51 51 184 184 184 289 321 320 159 159 240 199 111 111 111 111 111 438 314 133 133 147 380 180 486 443 240 240 216 300 300 382 245 8 354 255 255 251 251 251 81 444 444 444 444 246 252 173 198 164 45 45 45 34 177 177 177 345 141 141 281 453 342 168 180 113 113 113 285 285 69 223 130 198 22 283 455 455 129 259 144 27 437 480 480 480 146 299 339 64 10 459 459 459 271 342 342 224 69 223 130 280 257 257 257 31 9 142 142 72 72 437 306 306 306 306 306 396 396 385 233 131 133 133 430 430 430 430 430 430 430 430 430 212 131 219 219 219 477 477 477 374 132 132 98 48 13 170 170 170 491 312 312 28 341 341 341 341 341 12 12 12 21 21 21 21 23 23 101 101 149 391 391 73 289 289 321 7 70 409 409 409 399 53 473 429 30 422 143 458 144 180 189 405 405 206 285 34 125 125 125 348 466 22 283 455 236 36 161 161 161 161 487 487 288 290 290 290 290 434 434 434 339 195 404 229 82 247 126 126 326 23 101 101 149 149 321 321 287 111 111 111 349 205 261 25 189 189 139 293 122 35 449 34 253 253 453 453 342 118 118 118 118 402 402 14 226 321 209 411 145 204 204 204 204 204 204 29 337 337 337 301 8 259 354 109 151 240 325 34 41 324 301 399 217 70 65 151 169 150 342 105 221 259 354 420 420 301 301 251 251 241 367 367 367 367 367 458 192 192 176 135 200 200 464 415 415 415 415 457 457 217 429 429 429 464 464 89 203 394 129 401 321 75 74 351 278 278 278 325 449 41 41 324 324 324 464 434 135 328 328 200 248 248 248 429 429 429 429 429 19 19 454 417 417 170 170 170 170 28 491 28 2 491 491 2 491 2 2 491 2 316 491 491 316 73 289 289 321 321 354 159 159 159 285 34 111 111 111 111 111 111 438 438 239 384 371 180 151 151 31 54 54 142 397 397 109 109 189 330 457 394 465 108 377 87 87 43 364 276 109 372 498 396 396 178 458 192 89 340 94 199 255 255 399 217 473 65 486 486 486 460 240 240 36 310 107 242 275 275 116 195 466 45 45 45 45 325 34 111 111 111 111 438 438 58 72 72 72 110 110 110 110 254 254 240 285 34 106 125 125 125 125 466 22 283 455 399 70 65 496 496 496 186 186 238 6 6 472 221 401 401 47 47 491 491 47 80 491 80 401 321 354 354 485 485 219 219 219 485 485 374 374 374 132 132 132 285 449 469 469 469 349 349 155 262 262 100 100 100 497 497 122 129 401 401 401 401 321 75 74 74 437 351 351 290 290 171 171 171 171 171 139 139 139 139 497 497 497 497 122 32 32 32 401 401 321 354 425 425 425 241 431 431 374 374 374 374 132 132 132 132 186 162 232 232 232 68 68 172 115 273 278 278 139 139 293 122 458 458 96 472 221 401 401 75 161 79 487 288 443 443 169 271 150 39 433 433 433 433 160 427 247 247 247 126 126 126 326 326 326 326 326 326 326 326 326 326 101 101 101 149 228 228 321 321 321 354 420 420 422 143 458 192 485 278 368 453 9 397 345 409 409 
409 67 219 219 152 152 152 152 14 14 411 411 284 284 284 353 353 353 396 406 467 467 255 255 255 399 217 473 65 486 486 365 460 240 310 107 107 447 242 94 199 176 176 328 200 248 248 219 152 152 152 378 399 70 65 428 428 428 146 143 449 34 253 253 9 142 397 336 109 109 139 139 175 175 81 255 255 217 217 65 486 486 460 240 240 36 310 107 242 242 116 394 478 66 342 224 231 231 231 76 76 198 214 214 214 328 200 248 250 364 276 109 498 498 396 169 164 164 133 364 276 276 346 346 346 265 85 85 85 355 355 375 375 375 98 229 229 321 247 15 15 15 15 15 193 193 193 193 17 +103-1241-0035 103 777 1 4 25 11 5 1 9 19 17 18 4 33 1 6 23 16 23 7 14 40 5 25 11 25 3 11 19 26 29 23 36 24 40 1 5 25 11 5 17 27 23 11 38 3 10 5 25 11 1 22 19 11 17 23 5 37 40 5 25 11 9 36 33 31 1 8 16 13 23 33 10 19 30 11 5 29 1 30 8 33 5 38 15 1 4 25 11 8 19 25 21 28 11 24 8 33 30 19 29 33 19 12 20 8 23 5 25 11 38 19 34 1 6 23 24 8 24 8 33 1 8 38 5 40 5 25 33 5 9 19 33 31 19 22 5 24 19 26 27 37 14 19 25 12 5 9 27 33 1 25 8 12 14 38 5 40 24 19 31 19 40 31 29 13 25 31 14 1 6 23 12 27 32 20 21 13 25 14 5 23 20 19 40 1 14 4 2 2 6 4 3 5 4 9 5 5 1 9 5 9 2 9 3 5 2 2 2 3 6 1 2 5 7 3 6 4 6 7 5 1 2 2 6 4 6 4 4 6 6 2 1 4 3 5 3 4 3 4 4 3 3 2 2 3 6 5 5 11 8 10 5 2 3 3 6 2 2 3 4 3 2 3 4 3 2 4 8 11 6 1 2 5 2 2 4 5 1 3 4 4 2 2 2 3 2 3 4 6 2 2 1 3 2 1 5 1 7 4 3 6 4 8 9 44 7 2 4 3 1 2 1 2 7 3 4 6 3 8 2 2 2 4 4 4 2 2 2 1 3 3 9 5 15 3 5 2 2 3 1 4 1 2 3 2 2 3 3 1 4 5 6 11 4 2 2 3 4 2 5 2 2 4 2 3 6 5 13 8 17 17 17 296 363 363 363 363 51 51 51 51 228 321 321 83 55 55 322 67 131 44 44 44 236 32 401 401 401 401 401 401 321 354 278 278 278 278 278 360 252 416 458 192 445 183 72 72 72 72 110 110 486 486 486 460 460 240 35 131 483 226 226 226 321 411 287 297 297 297 297 297 297 297 293 293 293 122 349 349 234 234 234 234 261 425 425 386 386 431 486 315 315 315 450 88 372 372 304 304 304 368 269 342 342 89 89 446 446 33 10 10 309 479 331 331 284 405 405 206 240 325 176 176 328 200 200 248 248 76 259 74 74 425 425 425 386 386 431 374 374 374 374 374 434 203 381 471 471 49 433 433 97 427 247 247 126 326 326 326 101 149 228 321 83 55 55 322 67 34 44 44 236 129 259 144 27 424 424 424 424 424 424 424 424 424 497 122 122 131 133 133 364 276 346 346 346 405 405 206 206 169 35 36 107 107 395 89 89 446 33 394 90 90 401 401 401 321 75 445 445 445 351 278 278 240 314 90 401 401 321 144 208 425 386 431 431 431 266 266 173 173 402 270 270 342 224 89 89 446 33 33 394 32 32 401 401 401 354 354 374 374 374 374 132 233 385 233 270 270 270 390 390 390 18 112 56 56 56 47 47 491 47 47 47 491 491 435 435 321 435 435 435 287 287 111 111 111 438 349 205 205 261 25 189 139 139 293 167 457 401 321 75 310 107 107 395 286 286 468 468 313 313 285 34 230 230 230 230 215 35 402 133 147 147 147 499 499 428 428 146 146 325 34 255 255 43 364 109 109 403 403 403 207 19 454 229 321 247 126 126 326 326 326 326 101 101 149 149 228 412 83 55 55 55 322 212 34 111 111 111 111 438 121 121 339 394 212 107 180 106 153 387 387 146 252 314 196 217 46 46 46 438 438 129 36 161 161 487 288 278 173 402 96 36 272 377 123 123 216 22 448 448 448 464 464 145 265 265 85 146 146 175 175 81 242 116 212 133 133 333 333 220 220 335 14 14 226 226 321 209 297 297 297 297 297 297 297 293 399 399 70 46 46 46 46 46 438 438 399 217 70 65 265 265 428 428 428 146 358 358 233 36 227 419 419 439 417 417 170 170 170 491 28 28 491 28 442 28 442 362 491 362 102 491 362 491 362 362 102 362 362 491 362 491 362 218 218 491 218 102 369 369 369 369 369 21 21 21 101 101 149 149 228 289 412 287 111 111 111 378 345 141 141 141 141 281 
453 342 242 242 116 212 131 44 44 44 32 32 32 321 354 354 278 278 278 385 457 478 478 232 68 68 172 115 470 470 278 120 385 143 458 458 144 27 351 319 319 319 53 176 176 135 200 200 464 106 410 410 410 410 173 280 29 29 495 467 340 340 116 466 22 283 455 8 354 354 180 496 496 496 496 496 274 274 37 24 227 419 427 78 78 170 491 187 187 292 23 23 23 23 101 149 149 228 289 321 320 479 331 265 265 428 146 240 216 300 300 378 43 345 141 141 281 9 221 196 473 258 258 258 342 224 494 494 31 232 232 105 105 336 470 432 432 330 379 64 77 342 224 224 334 334 334 59 452 452 263 321 247 126 126 23 23 101 101 101 149 149 321 321 287 297 297 293 216 216 114 84 84 186 186 338 400 400 400 422 239 310 107 395 395 432 330 94 199 495 495 495 467 134 134 359 359 166 166 166 324 324 464 464 356 356 120 120 271 185 433 433 433 433 160 160 112 417 417 417 237 421 421 491 421 128 128 128 193 17 +103-1241-0036 103 778 1 32 20 31 13 11 32 20 18 4 11 5 25 33 8 24 33 19 17 19 33 31 19 22 1 38 3 10 19 26 33 19 31 20 12 5 33 8 11 19 11 5 25 16 3 23 27 37 14 9 6 30 11 1 32 20 31 13 11 32 20 25 13 37 14 31 6 12 5 1 9 20 33 5 37 24 20 16 14 29 30 7 23 19 26 5 9 7 33 1 9 5 33 19 16 19 33 22 4 29 33 18 14 16 14 24 9 20 19 26 31 20 31 19 22 19 33 31 5 24 14 31 20 8 11 19 11 29 30 7 23 19 40 5 25 33 19 33 1 4 25 11 8 38 6 25 33 19 11 33 19 31 20 13 37 30 20 34 19 26 12 5 33 38 5 40 33 19 9 20 31 20 25 3 25 12 4 33 9 27 33 1 9 19 22 5 40 8 11 19 11 5 25 27 38 13 12 14 8 11 13 37 14 18 4 37 5 25 5 12 14 3 29 14 33 36 25 5 33 20 1 10 6 3 4 3 3 2 3 4 2 3 1 3 5 4 3 2 1 4 3 3 6 5 4 2 4 3 3 2 3 2 1 6 5 1 2 2 4 3 1 2 1 3 5 3 5 4 3 3 3 4 3 5 18 6 3 4 3 3 4 2 6 2 3 4 4 5 3 3 2 2 4 2 2 3 2 3 3 2 5 3 3 3 3 3 1 3 8 6 26 2 2 2 2 4 2 2 4 4 4 1 2 1 4 2 4 2 5 1 6 7 4 6 4 3 2 2 2 2 6 4 5 4 3 4 3 2 4 3 4 4 2 3 2 2 2 5 6 19 4 2 1 3 4 1 2 2 1 2 1 2 6 4 3 3 2 2 2 1 3 1 2 1 2 1 3 3 1 3 3 8 5 2 3 3 2 4 2 4 6 3 14 2 4 1 3 3 4 3 3 1 4 6 5 3 1 4 3 5 3 2 3 3 4 2 2 2 3 3 3 5 2 3 2 6 4 2 3 2 8 17 17 17 17 296 296 363 363 51 51 51 321 321 373 338 400 400 400 422 422 162 68 115 470 470 443 240 314 35 310 107 400 400 30 422 58 110 110 254 254 254 240 35 242 242 242 33 457 465 108 119 437 103 103 103 146 299 203 53 64 212 377 87 416 416 445 180 278 443 385 385 77 478 66 232 68 172 115 273 470 278 278 120 178 458 458 192 225 225 225 7 276 346 346 405 206 206 35 310 107 135 135 135 248 212 384 87 87 38 342 68 172 267 267 267 267 301 301 216 45 45 45 325 111 111 111 438 438 239 384 371 278 278 116 242 33 90 393 393 234 261 25 106 481 481 481 293 293 175 14 14 410 410 410 410 410 280 29 29 245 245 8 354 153 153 153 153 372 372 372 37 24 404 439 229 491 247 312 312 187 292 292 292 292 292 21 21 21 21 23 101 408 408 149 321 321 373 400 400 400 30 422 162 162 68 115 470 470 120 240 240 314 310 338 338 400 400 400 30 301 10 10 309 479 331 463 463 463 463 29 382 313 186 186 162 54 115 273 106 481 405 481 293 216 216 283 283 455 236 401 401 321 354 213 213 213 252 325 34 69 223 130 402 196 429 429 429 429 422 393 155 332 332 245 129 321 74 190 190 380 499 499 486 481 481 293 175 175 81 176 328 200 200 255 255 8 8 180 113 113 113 113 113 450 450 167 385 75 227 419 439 78 78 170 491 28 491 491 341 341 12 12 12 21 21 21 21 21 21 101 101 149 391 228 491 289 321 321 320 159 159 159 285 34 118 118 118 118 261 177 177 177 131 90 259 144 445 351 351 443 443 240 215 35 96 96 272 156 156 382 349 205 155 165 165 165 165 53 394 212 212 354 420 420 360 360 360 135 135 200 200 248 248 478 66 68 68 68 115 267 267 267 213 422 186 162 68 68 68 115 273 470 278 278 178 143 458 192 177 
177 77 77 342 168 44 44 399 217 217 70 65 498 498 498 396 186 162 54 172 224 41 41 324 464 464 111 111 438 438 239 75 371 371 278 278 314 35 401 259 74 190 190 499 499 499 405 450 293 293 175 175 81 356 356 281 453 430 430 430 430 64 465 34 277 277 277 277 385 385 233 321 419 427 491 491 312 312 312 187 187 12 12 12 12 12 21 21 326 408 408 149 149 321 321 209 55 55 322 322 199 111 111 111 378 43 364 174 174 319 348 325 34 191 191 36 87 87 87 162 68 68 172 267 267 267 267 464 464 464 204 204 204 204 29 29 337 469 164 164 214 214 214 200 248 114 45 177 43 345 141 141 281 86 238 6 377 87 87 8 354 420 420 420 422 162 162 68 68 68 68 267 267 267 267 267 434 339 339 199 125 125 125 125 348 466 114 114 92 92 92 167 457 401 321 354 354 496 496 496 496 274 37 24 131 427 321 247 126 326 326 326 326 326 326 326 101 149 228 289 321 321 354 420 420 422 143 144 27 180 253 368 453 168 111 111 111 438 236 325 371 278 278 278 314 242 242 242 457 10 10 10 309 331 331 84 84 84 274 43 43 109 109 181 216 216 300 300 300 406 467 111 111 111 111 438 240 325 34 463 463 463 280 29 382 382 58 72 72 110 202 202 202 202 402 44 44 116 479 331 493 493 493 493 216 300 495 406 467 467 499 405 206 215 29 29 469 469 236 36 108 119 485 485 485 374 374 330 94 199 469 469 469 325 41 41 41 19 19 19 454 454 229 170 491 491 15 15 15 15 15 15 15 15 193 193 193 193 193 17 +103-1241-0037 103 638 1 27 1 12 13 30 14 5 23 3 33 24 6 30 1 10 13 30 20 33 30 20 40 6 23 19 25 9 23 36 24 1 12 19 31 1 8 23 5 25 11 19 40 12 5 1 2 29 23 15 31 8 21 19 31 33 23 5 37 19 33 6 23 30 13 11 20 4 25 11 8 24 31 27 17 23 4 11 8 24 17 27 19 26 33 5 23 19 37 18 20 30 1 8 37 6 23 38 20 40 18 14 11 12 5 33 29 30 19 25 31 13 11 38 14 11 8 23 5 25 11 38 5 40 12 5 29 30 19 33 20 5 31 33 29 23 15 31 19 25 12 5 38 14 23 11 1 18 43 5 3 4 3 5 4 7 8 3 5 3 4 2 10 2 3 4 5 2 3 6 6 4 3 4 4 7 10 6 26 3 5 5 1 12 4 3 1 3 1 5 1 5 2 28 5 3 6 8 4 4 2 2 4 3 2 3 2 2 4 5 4 3 3 5 1 2 2 3 3 8 4 7 4 4 2 3 4 4 2 3 1 3 2 4 2 4 2 4 9 19 5 4 2 2 3 2 4 4 2 3 2 2 3 3 2 1 4 3 3 3 2 2 5 6 2 3 2 2 1 2 3 1 2 6 2 2 1 4 3 3 1 3 3 4 5 1 2 1 3 5 3 5 4 6 17 17 17 296 363 363 363 51 51 51 51 491 491 184 491 184 289 321 321 287 287 287 284 284 284 16 16 16 16 16 16 16 16 16 16 16 16 16 16 16 16 16 16 16 16 16 16 16 16 16 16 98 98 98 225 98 98 225 225 225 373 164 164 289 321 164 127 127 114 0 0 264 468 468 406 406 467 467 406 467 467 467 44 44 251 251 251 251 251 241 241 431 431 284 284 405 405 206 206 167 457 457 196 70 70 70 138 138 138 138 372 313 313 236 401 401 401 401 75 310 107 107 107 395 395 395 395 264 468 468 406 337 337 324 324 422 143 36 161 161 487 487 41 41 324 318 49 9 9 483 14 321 411 297 297 297 297 297 297 297 175 175 81 242 340 203 53 53 394 76 401 321 354 425 425 425 241 431 431 374 374 374 374 374 374 132 132 132 413 203 381 381 381 404 229 229 247 312 312 126 292 292 292 292 292 292 292 23 23 23 23 23 23 23 101 408 391 491 491 491 289 289 321 127 114 258 258 258 258 31 342 342 342 483 14 14 321 411 287 284 265 265 85 85 85 146 146 139 175 175 175 81 242 275 116 64 212 131 356 356 356 281 342 198 198 22 283 455 455 129 129 401 321 354 425 425 386 431 374 374 374 374 132 399 53 473 324 324 324 464 459 459 459 31 39 86 86 6 272 472 336 321 74 74 425 425 386 386 343 343 343 343 358 358 358 39 39 433 68 172 115 97 225 287 111 111 438 143 36 107 395 494 31 31 342 26 26 26 251 241 241 266 266 266 173 173 29 277 277 325 285 106 106 297 297 297 293 293 42 42 147 147 380 288 443 443 240 325 325 41 41 324 324 3 464 89 322 94 199 111 111 111 356 203 53 478 478 232 232 68 68 115 344 344 
344 344 344 274 274 32 401 401 321 208 208 386 386 431 431 376 376 376 240 285 34 111 111 203 203 53 394 90 465 144 180 84 496 88 176 135 135 248 36 377 87 87 87 251 251 241 431 278 278 173 173 402 183 183 286 286 286 286 264 59 59 452 263 263 417 414 170 170 491 170 28 491 28 28 491 2 491 2 2 491 491 435 435 289 289 321 320 305 287 287 111 111 202 202 280 106 297 297 297 297 88 109 109 469 368 31 342 142 142 72 110 498 498 498 498 313 240 314 198 45 45 45 457 129 259 259 74 190 487 432 330 330 64 64 77 342 342 168 145 329 329 240 131 133 345 345 109 313 285 335 14 14 145 284 265 85 146 146 139 175 81 81 275 275 116 133 133 345 141 141 281 453 342 198 22 283 455 129 129 321 74 190 190 487 487 278 278 325 324 324 464 459 459 459 31 86 86 221 336 321 74 425 386 343 343 343 343 252 186 39 342 342 340 340 340 340 466 22 283 455 43 43 276 109 109 498 498 498 139 139 375 375 375 122 122 131 427 229 247 15 15 193 193 17 +103-1241-0038 103 746 1 5 25 11 8 39 36 40 11 33 5 19 24 4 21 5 25 8 38 5 40 23 19 37 19 26 18 20 30 1 9 5 33 8 25 13 37 14 30 20 23 20 19 22 31 29 13 22 33 5 11 8 38 35 11 1 19 33 31 11 19 23 8 33 16 5 23 38 19 25 39 35 30 19 24 4 21 5 25 15 32 5 25 40 22 5 24 33 30 36 19 40 5 25 33 19 33 1 9 5 33 12 27 40 30 13 11 30 27 11 40 14 31 27 16 5 25 20 1 18 38 13 25 38 20 17 3 33 19 25 33 5 12 5 33 30 15 25 4 33 32 3 30 23 5 33 7 25 5 25 11 12 5 30 13 11 30 27 11 40 9 20 17 4 25 33 5 16 23 4 32 29 4 31 33 1 8 4 31 33 24 19 31 19 40 31 29 13 25 31 14 38 5 33 24 15 11 12 5 24 30 13 11 1 9 3 2 2 3 5 2 4 1 2 2 2 4 4 4 2 2 3 3 1 5 3 2 3 2 2 6 2 6 20 3 2 2 4 5 2 4 3 6 2 4 3 2 3 4 3 2 2 3 2 2 6 6 4 7 26 5 3 3 3 2 6 4 3 4 1 3 2 3 2 2 2 3 3 3 4 4 3 2 3 6 1 2 3 4 2 4 5 2 4 2 3 2 3 2 5 5 22 2 1 3 3 4 5 4 3 5 4 3 3 2 3 6 5 7 4 3 9 20 1 2 1 2 1 4 3 3 2 2 4 2 3 2 3 6 3 3 3 2 2 5 2 3 2 4 5 7 2 2 1 2 1 4 3 3 6 3 4 2 3 2 2 3 4 3 2 2 4 3 5 6 5 8 4 3 12 8 8 3 2 2 3 3 2 2 3 3 2 3 4 2 3 2 2 3 4 2 1 2 5 1 4 4 5 17 17 17 363 363 363 51 51 228 321 83 55 55 322 131 111 111 111 111 438 219 219 219 485 485 374 186 162 54 238 6 161 87 87 464 255 255 399 217 473 65 486 486 460 240 240 310 107 395 242 116 94 199 111 111 378 378 345 141 141 281 342 9 26 26 251 241 431 278 278 173 173 280 176 135 328 200 248 183 183 286 286 286 286 264 59 59 452 263 321 247 247 126 126 292 292 292 23 23 23 23 260 260 391 391 391 316 73 73 289 321 321 354 159 159 159 285 34 111 111 111 111 438 10 10 479 463 463 463 463 463 29 29 382 245 245 42 42 147 147 380 485 485 278 278 359 359 166 166 166 464 464 154 154 458 96 66 86 105 105 336 470 470 151 178 178 96 96 272 191 191 191 325 34 111 111 111 111 438 438 43 364 364 109 109 389 389 120 120 120 37 37 75 419 419 439 78 170 170 170 170 28 491 28 491 28 362 362 491 362 2 362 362 491 491 491 366 491 491 316 316 435 289 321 321 177 177 177 143 36 77 86 86 221 336 384 490 490 490 251 251 251 241 431 428 428 428 428 146 35 35 393 393 155 262 262 100 100 497 43 364 409 409 409 409 33 219 219 152 222 222 406 467 467 467 255 203 217 473 65 486 486 460 460 240 310 107 395 395 469 116 94 418 418 418 418 418 99 436 436 60 298 298 379 471 49 9 142 336 144 27 351 319 319 203 53 90 76 321 161 161 161 487 487 374 374 132 88 88 356 356 356 453 342 430 430 116 64 212 131 277 277 277 277 277 385 385 75 419 427 229 491 312 312 126 292 292 292 1 1 1 1 21 21 21 21 101 408 149 149 228 321 320 320 159 159 159 35 35 198 127 124 124 124 124 124 368 9 142 397 42 147 147 380 288 443 443 240 240 314 131 133 133 147 147 380 496 496 496 274 368 77 270 9 353 353 353 186 186 232 482 172 115 344 344 344 344 274 349 349 
234 234 234 234 261 25 319 319 319 240 94 199 41 41 41 41 19 19 19 454 454 414 170 170 312 312 187 187 187 292 292 292 23 23 23 101 101 149 149 228 321 321 320 345 409 409 409 399 473 429 30 301 143 465 144 180 189 189 240 285 94 34 340 340 116 64 76 377 377 123 123 216 216 22 283 455 236 401 321 75 161 161 487 487 288 290 290 290 434 339 199 199 415 457 457 186 338 338 338 395 499 499 306 306 206 293 175 81 81 469 457 457 36 75 108 119 351 315 315 315 315 450 450 413 413 94 199 340 340 466 22 283 455 455 42 42 147 380 288 443 443 240 314 131 133 133 364 147 380 380 496 496 496 274 274 24 77 270 142 221 336 420 420 420 416 445 445 210 210 210 460 330 388 76 384 87 87 87 349 234 261 261 386 431 431 376 376 460 460 169 169 99 436 447 447 221 336 74 74 311 311 311 311 311 311 311 460 169 150 150 342 86 6 272 427 82 247 126 326 326 326 326 101 101 101 149 149 228 321 287 287 111 111 111 438 438 145 145 376 376 460 460 169 150 150 86 238 6 196 196 217 473 258 258 342 342 224 494 494 31 162 68 105 105 336 354 470 432 330 379 64 77 77 224 300 382 382 245 43 345 181 181 181 167 457 217 217 473 476 476 476 252 314 259 22 57 57 203 53 250 250 147 380 288 288 120 120 240 385 131 229 247 126 193 193 17 +103-1241-0039 103 806 1 4 25 11 32 20 31 13 11 32 20 11 19 11 25 33 25 27 5 25 11 16 14 29 19 33 20 40 31 15 22 25 3 33 5 4 31 22 18 14 13 25 20 24 6 30 22 38 13 31 10 5 25 40 1 32 20 31 13 11 8 24 5 31 33 18 4 37 4 31 33 18 14 5 34 7 40 5 25 11 6 23 30 13 11 20 1 8 31 5 29 27 40 8 18 4 11 33 36 9 5 33 1 18 7 39 36 17 27 19 26 33 19 16 8 25 11 7 33 5 9 7 33 34 19 26 40 19 16 39 36 11 27 25 33 4 31 22 38 13 31 10 5 25 40 1 4 25 11 38 5 33 11 5 40 24 15 22 12 5 30 27 11 40 30 13 11 1 38 13 23 25 7 1 8 11 5 25 27 1 31 13 11 24 4 34 39 36 1 6 3 2 3 4 3 6 2 3 3 2 3 2 3 2 1 3 4 2 2 1 2 3 4 2 3 3 2 5 4 3 2 4 3 2 7 3 2 2 1 2 3 2 4 2 3 4 2 2 3 5 2 3 4 15 6 2 4 2 1 5 3 3 2 1 2 1 2 6 3 1 2 1 3 6 6 4 1 2 2 4 4 3 3 2 8 16 7 4 1 4 4 5 6 5 5 4 5 7 4 2 3 17 7 6 5 1 3 2 3 1 2 1 5 5 2 3 4 2 2 2 6 3 2 2 3 3 2 2 2 2 3 3 2 2 4 4 3 2 2 4 4 3 4 10 22 3 2 1 2 4 5 4 5 6 2 3 5 2 3 4 4 3 4 3 4 7 37 18 6 3 4 20 22 11 2 3 3 17 8 7 1 3 3 5 3 3 10 19 17 17 296 296 51 321 412 55 55 322 322 67 478 338 338 400 400 400 30 30 422 186 232 232 172 115 470 470 240 314 36 310 400 400 400 30 422 236 384 371 371 278 278 314 196 242 242 457 309 479 331 84 84 496 88 88 89 446 203 393 155 332 332 332 245 129 259 74 74 278 278 278 325 41 324 324 324 186 162 232 68 68 115 470 470 171 171 252 143 96 196 479 331 307 307 61 167 457 36 377 87 87 14 145 145 376 460 460 169 150 86 105 221 458 208 495 467 467 475 475 475 475 475 301 399 70 138 138 138 138 372 245 245 129 321 208 441 151 151 151 151 169 99 238 238 310 107 60 298 298 298 379 471 471 270 160 427 247 247 126 126 292 292 292 292 23 23 408 408 408 408 391 321 321 373 373 400 400 400 400 422 162 342 115 470 470 240 285 34 111 111 111 438 399 70 65 65 151 150 150 86 6 34 202 202 202 280 145 145 486 460 460 169 150 86 238 272 161 382 467 44 44 44 38 164 401 321 164 180 180 486 315 315 450 450 169 269 9 168 242 116 64 131 34 106 297 297 297 293 293 42 42 147 380 288 443 443 240 325 325 41 41 19 19 454 454 414 321 247 312 126 126 292 292 292 1 23 23 408 408 149 149 228 321 321 209 287 111 111 111 438 31 342 342 494 494 494 129 74 496 496 496 496 496 368 368 453 168 180 111 111 111 111 111 438 58 72 72 110 110 486 486 486 460 460 388 64 314 401 321 108 108 119 374 374 374 374 374 132 132 132 8 8 354 159 159 159 159 314 229 247 247 126 126 326 326 326 326 326 326 326 326 326 101 101 101 149 228 321 
373 72 72 268 268 268 268 268 88 430 430 430 430 219 219 152 152 416 144 27 106 88 350 360 135 339 212 87 87 349 349 261 25 480 480 480 85 299 299 299 64 212 384 180 180 486 113 240 285 285 255 255 8 354 180 113 113 113 113 167 167 164 164 164 214 214 214 200 200 471 49 342 9 118 118 118 280 30 30 30 422 239 371 180 84 350 350 413 285 34 145 145 460 460 169 150 86 142 221 336 208 441 151 151 151 151 169 99 447 238 6 310 60 60 298 298 275 379 471 471 471 49 269 433 160 160 112 112 56 56 170 28 28 491 2 2 2 2 289 321 373 373 326 326 326 326 326 326 101 149 149 228 321 321 83 55 322 322 399 250 181 181 181 181 181 35 401 401 401 401 321 384 371 180 71 71 71 71 368 368 453 342 86 221 196 196 473 476 476 476 143 143 401 401 198 22 283 455 455 42 147 380 380 288 496 496 496 274 37 77 77 323 142 397 336 147 147 380 288 120 120 120 37 24 24 419 439 439 78 417 417 170 170 170 170 28 28 28 28 491 362 491 491 362 362 491 362 491 362 491 362 362 491 40 40 40 40 40 40 40 366 366 366 366 316 316 249 7 7 7 7 7 7 7 7 7 7 7 7 364 364 276 276 109 109 84 443 139 139 139 293 293 413 122 309 479 331 331 315 315 315 315 315 450 450 450 16 16 293 98 98 13 13 13 13 13 78 491 170 312 312 126 23 23 23 23 260 260 391 163 316 491 316 289 373 225 225 225 442 287 287 287 287 287 287 111 111 111 438 438 24 325 34 84 242 116 479 331 84 84 84 84 16 16 16 16 375 98 98 98 263 13 13 13 78 47 47 47 491 47 80 80 321 80 373 66 66 172 179 179 179 179 314 196 196 70 65 329 329 460 460 169 349 352 25 485 485 485 485 485 374 132 98 98 13 417 417 417 170 421 421 491 421 421 491 128 128 128 491 128 128 128 128 128 193 193 17 +103-1241-0040 103 689 1 12 13 30 11 9 20 25 27 31 22 27 29 16 14 19 24 4 21 5 25 15 32 5 25 12 13 25 1 38 35 11 12 13 30 1 9 5 33 15 13 24 8 1 33 6 22 19 26 33 36 24 5 10 1 29 20 29 5 23 14 6 23 38 20 40 33 13 23 19 26 24 20 8 11 36 1 38 35 11 39 36 30 4 12 14 8 11 19 11 5 25 33 6 22 1 19 16 39 36 31 15 31 27 8 23 31 33 3 29 1 8 22 4 25 31 33 3 29 38 13 25 8 24 15 22 5 29 24 8 24 8 25 11 33 36 19 33 6 23 12 27 19 33 31 11 19 16 5 22 5 23 33 1 24 4 34 39 36 1 9 3 2 1 2 2 2 5 4 4 3 3 2 3 2 6 3 3 4 2 2 5 6 3 2 3 4 5 3 5 2 4 2 2 8 31 3 1 2 1 2 3 4 1 3 5 3 2 2 3 3 3 5 9 8 3 3 2 2 3 6 3 2 3 2 2 4 2 2 3 2 3 4 6 3 12 18 3 2 2 2 3 4 4 2 4 4 3 3 1 2 2 5 7 7 19 5 4 2 2 6 5 5 4 2 4 4 3 8 5 22 6 8 5 4 4 3 4 3 2 1 2 4 4 3 4 3 2 3 4 5 4 3 1 3 3 4 3 3 2 3 5 2 3 4 2 3 3 2 3 3 5 6 40 6 7 6 2 10 9 17 17 17 363 363 363 149 149 228 321 127 114 0 0 313 313 35 354 420 420 420 301 10 479 331 231 231 231 274 186 162 482 482 105 6 144 496 496 496 215 457 393 205 155 332 332 332 216 448 448 448 464 255 399 473 65 486 486 460 240 24 310 395 469 242 116 94 418 418 418 418 418 418 99 436 436 60 60 298 298 116 33 394 466 127 114 361 361 361 282 388 303 117 48 13 80 80 80 321 7 364 345 430 430 430 314 35 401 198 127 114 114 264 264 264 59 59 452 452 13 229 247 247 312 312 312 292 292 292 292 292 12 21 1 21 21 21 21 21 23 101 101 149 149 391 316 316 316 73 73 289 289 321 320 159 159 285 34 430 430 430 399 70 65 111 438 438 143 36 108 119 351 405 405 405 206 178 192 192 176 135 135 248 248 465 377 377 374 374 132 399 70 383 383 383 383 383 385 35 310 310 107 447 97 427 56 56 47 491 187 80 491 80 289 321 320 74 485 213 213 252 215 354 29 302 302 175 175 81 353 353 353 353 467 467 297 297 297 293 43 345 109 469 281 342 342 6 36 119 351 351 139 139 175 81 176 135 200 248 248 429 429 429 429 464 464 111 111 111 111 438 438 239 75 371 371 374 374 374 374 132 132 132 98 98 13 414 247 312 312 126 187 292 12 12 12 12 23 23 23 101 149 149 228 321 321 320 
345 430 389 236 310 107 152 152 152 378 42 147 380 180 486 486 460 240 216 300 300 495 406 467 111 111 111 438 314 36 371 371 278 278 314 242 242 242 457 457 108 108 119 437 437 405 405 405 405 206 206 215 35 458 192 419 427 229 247 312 126 292 292 23 1 408 408 408 149 228 228 316 316 80 80 289 321 209 188 118 118 118 118 118 261 219 152 152 152 152 186 162 232 172 115 470 470 403 171 171 422 162 342 342 273 273 84 16 88 106 284 481 293 293 293 150 162 232 86 238 272 371 180 106 284 405 405 405 206 206 215 215 233 352 419 13 229 82 312 187 187 187 187 47 47 47 47 491 491 491 316 491 316 80 80 435 435 435 435 209 111 111 111 438 143 458 445 445 445 351 351 351 365 365 365 365 330 388 64 64 77 77 342 68 238 6 272 180 405 405 405 215 215 35 402 345 409 409 94 199 111 111 111 438 399 217 473 473 476 476 476 143 458 192 180 230 230 215 35 35 70 70 46 46 46 438 438 399 217 70 65 480 480 480 480 299 299 339 394 465 108 377 123 123 123 88 277 277 277 277 385 131 34 106 297 297 297 293 216 114 114 84 84 88 88 177 177 177 143 36 77 342 86 238 6 336 371 490 490 490 349 349 261 469 469 469 458 144 27 100 100 100 375 375 122 122 227 227 419 427 56 56 491 312 312 312 12 12 12 12 12 12 12 12 12 12 260 260 260 260 260 491 163 163 163 366 491 366 491 316 491 366 366 491 40 40 40 40 316 289 321 321 289 7 7 217 217 473 486 486 486 460 460 460 169 164 164 164 164 219 219 219 485 477 477 374 374 132 132 98 13 417 417 417 417 237 421 421 128 128 193 17 diff --git a/SpeechLM/dataset/LibriSpeech/hidden_unit/dict.km.txt b/SpeechLM/dataset/LibriSpeech/hidden_unit/dict.km.txt new file mode 100644 index 0000000000000000000000000000000000000000..bbfe59e554d6234f3631d8d09d9281c2160f4675 --- /dev/null +++ b/SpeechLM/dataset/LibriSpeech/hidden_unit/dict.km.txt @@ -0,0 +1,500 @@ +0 0 +1 1 +2 2 +3 3 +4 4 +5 5 +6 6 +7 7 +8 8 +9 9 +10 10 +11 11 +12 12 +13 13 +14 14 +15 15 +16 16 +17 17 +18 18 +19 19 +20 20 +21 21 +22 22 +23 23 +24 24 +25 25 +26 26 +27 27 +28 28 +29 29 +30 30 +31 31 +32 32 +33 33 +34 34 +35 35 +36 36 +37 37 +38 38 +39 39 +40 40 +41 41 +42 42 +43 43 +44 44 +45 45 +46 46 +47 47 +48 48 +49 49 +50 50 +51 51 +52 52 +53 53 +54 54 +55 55 +56 56 +57 57 +58 58 +59 59 +60 60 +61 61 +62 62 +63 63 +64 64 +65 65 +66 66 +67 67 +68 68 +69 69 +70 70 +71 71 +72 72 +73 73 +74 74 +75 75 +76 76 +77 77 +78 78 +79 79 +80 80 +81 81 +82 82 +83 83 +84 84 +85 85 +86 86 +87 87 +88 88 +89 89 +90 90 +91 91 +92 92 +93 93 +94 94 +95 95 +96 96 +97 97 +98 98 +99 99 +100 100 +101 101 +102 102 +103 103 +104 104 +105 105 +106 106 +107 107 +108 108 +109 109 +110 110 +111 111 +112 112 +113 113 +114 114 +115 115 +116 116 +117 117 +118 118 +119 119 +120 120 +121 121 +122 122 +123 123 +124 124 +125 125 +126 126 +127 127 +128 128 +129 129 +130 130 +131 131 +132 132 +133 133 +134 134 +135 135 +136 136 +137 137 +138 138 +139 139 +140 140 +141 141 +142 142 +143 143 +144 144 +145 145 +146 146 +147 147 +148 148 +149 149 +150 150 +151 151 +152 152 +153 153 +154 154 +155 155 +156 156 +157 157 +158 158 +159 159 +160 160 +161 161 +162 162 +163 163 +164 164 +165 165 +166 166 +167 167 +168 168 +169 169 +170 170 +171 171 +172 172 +173 173 +174 174 +175 175 +176 176 +177 177 +178 178 +179 179 +180 180 +181 181 +182 182 +183 183 +184 184 +185 185 +186 186 +187 187 +188 188 +189 189 +190 190 +191 191 +192 192 +193 193 +194 194 +195 195 +196 196 +197 197 +198 198 +199 199 +200 200 +201 201 +202 202 +203 203 +204 204 +205 205 +206 206 +207 207 +208 208 +209 209 +210 210 +211 211 +212 212 +213 213 +214 214 +215 215 +216 216 +217 217 +218 218 +219 219 +220 220 +221 221 
+222 222 +223 223 +224 224 +225 225 +226 226 +227 227 +228 228 +229 229 +230 230 +231 231 +232 232 +233 233 +234 234 +235 235 +236 236 +237 237 +238 238 +239 239 +240 240 +241 241 +242 242 +243 243 +244 244 +245 245 +246 246 +247 247 +248 248 +249 249 +250 250 +251 251 +252 252 +253 253 +254 254 +255 255 +256 256 +257 257 +258 258 +259 259 +260 260 +261 261 +262 262 +263 263 +264 264 +265 265 +266 266 +267 267 +268 268 +269 269 +270 270 +271 271 +272 272 +273 273 +274 274 +275 275 +276 276 +277 277 +278 278 +279 279 +280 280 +281 281 +282 282 +283 283 +284 284 +285 285 +286 286 +287 287 +288 288 +289 289 +290 290 +291 291 +292 292 +293 293 +294 294 +295 295 +296 296 +297 297 +298 298 +299 299 +300 300 +301 301 +302 302 +303 303 +304 304 +305 305 +306 306 +307 307 +308 308 +309 309 +310 310 +311 311 +312 312 +313 313 +314 314 +315 315 +316 316 +317 317 +318 318 +319 319 +320 320 +321 321 +322 322 +323 323 +324 324 +325 325 +326 326 +327 327 +328 328 +329 329 +330 330 +331 331 +332 332 +333 333 +334 334 +335 335 +336 336 +337 337 +338 338 +339 339 +340 340 +341 341 +342 342 +343 343 +344 344 +345 345 +346 346 +347 347 +348 348 +349 349 +350 350 +351 351 +352 352 +353 353 +354 354 +355 355 +356 356 +357 357 +358 358 +359 359 +360 360 +361 361 +362 362 +363 363 +364 364 +365 365 +366 366 +367 367 +368 368 +369 369 +370 370 +371 371 +372 372 +373 373 +374 374 +375 375 +376 376 +377 377 +378 378 +379 379 +380 380 +381 381 +382 382 +383 383 +384 384 +385 385 +386 386 +387 387 +388 388 +389 389 +390 390 +391 391 +392 392 +393 393 +394 394 +395 395 +396 396 +397 397 +398 398 +399 399 +400 400 +401 401 +402 402 +403 403 +404 404 +405 405 +406 406 +407 407 +408 408 +409 409 +410 410 +411 411 +412 412 +413 413 +414 414 +415 415 +416 416 +417 417 +418 418 +419 419 +420 420 +421 421 +422 422 +423 423 +424 424 +425 425 +426 426 +427 427 +428 428 +429 429 +430 430 +431 431 +432 432 +433 433 +434 434 +435 435 +436 436 +437 437 +438 438 +439 439 +440 440 +441 441 +442 442 +443 443 +444 444 +445 445 +446 446 +447 447 +448 448 +449 449 +450 450 +451 451 +452 452 +453 453 +454 454 +455 455 +456 456 +457 457 +458 458 +459 459 +460 460 +461 461 +462 462 +463 463 +464 464 +465 465 +466 466 +467 467 +468 468 +469 469 +470 470 +471 471 +472 472 +473 473 +474 474 +475 475 +476 476 +477 477 +478 478 +479 479 +480 480 +481 481 +482 482 +483 483 +484 484 +485 485 +486 486 +487 487 +488 488 +489 489 +490 490 +491 491 +492 492 +493 493 +494 494 +495 495 +496 496 +497 497 +498 498 +499 499 diff --git a/SpeechLM/dataset/LibriSpeech/hidden_unit/train_sample100.km b/SpeechLM/dataset/LibriSpeech/hidden_unit/train_sample100.km new file mode 100644 index 0000000000000000000000000000000000000000..dc3336034c88c053523a9e5076515ce4d56c19b6 --- /dev/null +++ b/SpeechLM/dataset/LibriSpeech/hidden_unit/train_sample100.km @@ -0,0 +1,100 @@ +17 17 17 296 296 317 317 317 317 317 491 461 461 461 461 461 461 491 491 184 184 184 289 310 107 107 395 351 486 486 460 215 215 35 96 272 300 382 382 245 43 364 276 174 174 174 174 319 282 282 388 303 303 117 404 404 439 439 225 225 225 225 225 225 225 491 391 391 47 491 73 80 289 7 7 217 473 258 258 258 31 342 224 494 494 494 368 281 9 142 142 147 147 329 329 329 329 329 329 36 310 107 395 395 302 302 497 497 251 251 241 241 431 329 432 330 330 388 195 195 64 212 212 131 483 226 226 226 209 356 356 356 356 31 162 68 224 224 494 494 215 129 74 190 190 499 499 499 265 265 85 85 85 85 207 318 185 185 433 433 86 6 6 227 419 417 417 417 237 237 237 237 237 237 237 237 237 237 237 237 362 491 362 305 40 
491 305 40 40 362 362 40 40 40 40 40 40 40 40 218 491 218 218 218 491 305 218 491 218 218 218 218 218 218 491 218 435 491 218 491 218 218 218 491 218 218 491 369 491 369 369 369 369 369 21 21 21 21 21 21 21 21 408 408 408 149 228 228 491 289 320 7 473 258 258 258 258 342 342 224 494 494 494 494 31 9 9 142 397 147 147 329 329 329 329 329 143 36 107 107 395 302 302 497 497 251 251 251 241 241 431 278 278 278 278 330 388 388 195 195 195 243 212 131 419 439 225 225 225 80 491 80 7 7 251 241 431 278 278 278 173 173 402 402 401 401 401 401 401 491 310 107 395 395 180 151 151 151 169 150 150 86 86 238 6 272 397 133 345 109 109 109 264 264 313 216 216 22 448 448 448 14 14 14 145 145 145 486 460 460 460 173 280 29 242 242 116 33 250 250 251 241 81 444 324 324 324 324 324 301 339 217 217 217 217 217 473 65 290 290 290 290 290 434 434 339 339 33 250 250 42 42 147 147 380 288 84 496 496 496 496 496 274 274 37 24 131 404 439 78 414 80 80 80 80 80 80 80 401 384 371 278 278 278 215 35 35 96 401 401 401 401 401 401 401 401 401 239 384 371 180 315 315 315 315 315 450 450 413 413 94 199 340 340 33 76 465 377 123 123 123 88 88 44 44 44 251 251 241 431 278 278 285 285 302 302 497 497 497 58 72 72 72 437 481 481 481 481 481 481 175 175 81 84 84 84 496 274 98 98 229 247 247 126 126 126 326 326 326 326 326 101 101 149 228 491 373 393 234 234 155 190 190 487 288 288 278 330 339 64 64 212 310 447 447 6 272 472 345 333 333 220 220 164 14 14 411 411 284 481 481 481 293 293 122 122 384 300 334 334 304 304 304 49 269 342 168 89 89 89 446 33 33 250 251 251 241 431 470 171 171 171 252 252 325 34 41 324 324 318 368 368 342 9 219 485 286 286 382 382 313 236 239 161 161 79 499 499 405 405 206 215 215 233 270 270 433 342 224 89 89 322 67 394 76 465 161 161 492 492 492 8 8 280 498 498 498 498 498 396 186 39 54 238 6 272 472 336 336 62 62 62 62 62 146 464 44 44 44 8 32 401 354 190 190 380 380 499 496 496 496 178 233 233 458 192 419 427 247 247 15 193 193 17 +17 17 363 363 51 51 228 320 127 45 45 45 385 131 58 72 72 110 110 110 110 486 460 240 240 325 34 154 154 154 457 478 478 232 232 482 482 172 115 273 273 153 153 153 372 372 396 396 186 186 54 54 172 224 273 255 255 43 364 364 276 109 109 403 403 403 403 403 207 246 324 301 301 129 401 354 354 180 376 376 376 460 178 178 458 192 192 242 340 116 466 466 22 283 455 43 364 364 276 276 153 153 496 496 37 37 24 77 270 342 224 69 69 130 130 198 22 448 448 448 464 180 424 424 424 424 424 274 122 131 472 221 401 82 144 27 437 151 151 169 169 164 164 472 221 401 259 29 380 382 396 313 385 35 472 401 259 74 425 425 386 343 343 343 343 343 358 358 39 39 433 433 160 160 160 112 427 56 56 491 312 312 341 341 341 341 341 341 12 12 12 21 21 21 21 21 21 21 21 21 408 408 408 408 391 391 228 491 491 412 177 177 177 177 177 131 133 345 141 141 141 281 453 142 397 456 456 456 456 129 259 74 485 485 485 485 374 374 325 449 449 191 191 191 314 314 36 377 87 87 8 8 420 420 420 324 464 44 44 44 94 335 335 411 411 188 121 121 33 64 76 465 465 161 161 487 469 469 143 458 192 192 278 278 278 37 314 131 472 72 72 72 72 72 72 110 110 443 120 240 314 314 26 26 26 251 241 431 235 235 235 235 235 413 200 200 248 248 248 212 354 190 380 380 499 496 496 496 178 233 458 192 192 340 340 340 94 199 154 154 77 342 342 142 14 411 498 498 498 498 498 134 175 81 166 324 324 464 382 382 245 129 458 208 208 441 441 441 153 153 372 372 396 186 186 323 323 238 6 272 377 487 487 374 313 216 216 114 124 124 124 274 274 368 269 9 142 397 336 276 109 109 496 496 496 37 37 37 24 270 270 433 160 427 229 247 247 126 126 326 
326 326 326 326 101 101 149 149 228 228 491 345 333 333 333 220 220 164 402 221 401 401 401 491 384 371 180 106 306 306 306 306 396 396 178 178 35 458 96 96 66 66 68 68 68 68 115 115 444 213 213 213 143 458 208 208 487 487 288 277 385 143 270 270 342 224 69 462 462 130 402 402 401 401 491 74 190 441 441 441 153 153 182 182 182 182 182 497 175 175 81 89 89 446 116 33 131 472 221 458 445 445 351 351 486 486 460 460 169 150 342 342 86 105 336 445 445 470 403 403 171 171 171 246 246 252 24 131 404 439 78 170 305 491 28 28 28 491 491 491 2 201 305 305 491 305 305 2 316 316 316 316 316 491 491 289 289 289 320 354 159 159 159 159 159 240 35 131 472 221 336 354 62 62 62 62 62 438 216 22 283 455 236 108 119 119 103 103 103 103 103 85 299 203 53 473 177 177 143 131 133 133 147 380 288 213 213 213 252 143 310 447 447 447 26 26 251 251 241 81 329 329 329 330 388 195 195 471 471 49 453 142 58 72 72 437 437 481 481 481 481 293 175 175 81 84 84 84 84 84 16 274 274 98 483 483 440 188 177 177 177 131 133 133 345 141 141 141 281 9 168 44 44 143 458 208 208 441 441 441 346 346 265 265 85 85 85 146 146 277 277 277 385 385 227 419 225 225 226 197 7 364 276 109 109 139 139 293 293 122 143 458 144 27 27 121 116 33 33 212 239 371 180 151 151 151 178 35 96 96 36 272 191 191 191 37 314 26 251 241 431 431 278 285 285 302 302 497 497 186 162 482 482 338 238 161 79 487 288 288 360 360 434 434 434 203 381 381 404 13 491 247 15 193 193 193 17 +17 17 17 363 363 51 51 228 491 373 155 155 155 148 148 387 372 313 10 479 479 307 307 307 307 61 167 449 449 34 357 357 357 357 357 173 280 29 242 116 94 199 44 44 44 8 129 401 259 354 190 190 380 380 499 496 496 496 167 233 233 144 192 419 419 439 225 225 225 80 80 491 491 144 389 389 389 389 389 133 133 42 147 147 380 499 319 319 319 348 348 195 394 90 76 74 74 437 311 311 311 311 311 311 460 169 150 342 86 6 6 196 217 473 258 258 258 31 342 224 494 494 368 281 9 142 397 147 329 329 329 329 329 36 310 107 302 302 302 497 497 251 251 251 241 431 329 329 330 116 33 195 195 471 471 49 269 142 238 6 272 106 153 153 372 372 372 245 43 345 333 333 220 220 216 180 113 113 113 113 167 167 236 239 401 384 219 485 485 374 374 132 132 42 147 456 456 456 456 416 144 27 106 306 306 306 306 306 306 396 313 24 24 131 472 393 155 332 332 332 313 236 239 239 384 371 213 213 213 252 186 39 342 342 11 11 11 379 379 379 394 76 478 66 68 68 115 267 41 41 41 246 3 464 464 89 194 446 446 446 64 212 239 384 490 490 143 458 144 208 441 441 153 153 153 372 372 372 467 467 467 275 203 381 381 48 404 13 491 491 312 312 312 292 292 292 292 292 21 21 21 21 21 21 21 21 21 21 21 21 21 21 21 408 408 408 408 149 149 228 491 289 412 177 177 177 177 177 35 401 259 74 190 190 488 488 488 488 488 8 29 134 134 134 134 8 8 359 359 166 166 166 324 301 378 345 141 141 141 281 9 142 221 144 27 27 437 370 370 370 370 370 370 348 64 76 310 107 395 395 459 459 459 271 271 39 86 86 238 198 45 45 45 45 45 35 401 196 217 473 258 258 258 31 342 342 224 494 494 494 368 453 9 142 397 147 147 329 329 329 329 329 329 143 310 107 107 395 302 302 302 375 497 497 497 43 364 345 141 141 141 31 162 232 68 172 115 470 278 278 325 34 176 135 135 200 200 464 415 415 415 415 415 131 183 156 156 156 156 245 43 364 364 109 109 278 278 116 33 64 212 212 34 84 84 84 84 274 274 98 229 229 247 126 126 326 326 101 408 408 228 491 491 491 445 445 213 213 213 213 252 215 458 176 176 135 135 200 200 44 44 44 44 99 338 338 338 338 338 395 106 306 306 306 306 396 396 215 35 35 335 145 284 265 265 265 85 85 85 146 464 464 106 125 125 125 125 348 94 335 
[Raw numeric payload of an added data file: long "+" lines of integer tokens in the range 0–499, apparently discrete speech-unit ID sequences. The payload contains no readable text and is not reproduced here; refer to the corresponding file in the repository for the full data.]
41 324 301 43 276 109 109 139 139 139 293 293 293 98 98 13 13 417 417 417 417 417 417 80 80 80 80 435 66 66 179 179 179 179 179 314 314 196 217 473 258 258 31 342 224 494 494 281 142 142 397 147 380 329 329 329 329 143 310 107 395 395 302 302 375 98 98 13 417 417 417 170 249 20 28 305 491 442 305 305 2 491 491 2 2 249 491 491 305 305 366 491 491 366 316 491 435 435 435 435 435 435 289 412 287 287 111 111 111 438 378 345 141 141 281 9 142 336 144 27 480 480 480 146 299 64 34 34 462 130 29 44 255 38 349 205 205 261 190 380 288 288 171 171 252 252 314 239 219 219 219 152 152 374 374 132 132 43 364 276 109 372 372 59 59 396 339 243 227 472 472 198 216 127 114 84 84 84 16 16 16 274 98 98 13 225 225 225 345 409 409 409 409 94 199 111 111 111 438 438 162 342 172 115 273 106 481 481 426 426 206 399 217 217 473 486 486 486 460 460 169 164 164 485 485 485 485 374 318 186 162 54 238 6 272 499 499 499 206 240 285 34 176 135 200 200 106 284 405 206 169 169 402 96 36 272 469 469 236 325 93 93 93 207 207 19 454 454 229 82 247 126 126 126 326 326 326 326 326 326 326 326 326 101 101 101 149 149 228 412 412 287 111 111 111 438 349 164 164 106 106 405 405 206 167 457 217 473 473 476 476 171 252 252 8 420 420 324 324 183 451 30 30 30 378 141 141 141 281 342 142 221 144 180 84 496 88 88 176 176 135 135 248 248 108 377 123 123 216 22 283 455 236 384 180 91 91 91 91 91 178 458 96 6 272 334 334 334 304 304 304 185 185 323 390 390 18 112 439 417 237 237 237 237 237 237 237 237 237 128 128 128 128 128 128 193 193 17 +17 17 17 17 296 317 317 317 491 317 461 461 461 461 184 184 184 184 491 7 7 70 65 329 329 42 406 406 288 134 139 139 175 175 423 423 423 423 31 342 26 26 251 241 431 278 278 278 215 233 233 270 270 433 433 86 238 336 336 82 108 397 397 441 441 109 278 278 385 233 36 310 447 447 238 6 272 34 319 319 348 64 212 300 300 313 186 162 232 238 6 470 470 294 294 294 294 330 94 94 176 176 328 200 200 248 359 359 474 474 474 19 19 454 454 229 170 491 247 312 126 292 292 292 292 292 21 21 21 408 408 408 408 408 391 391 228 491 373 373 400 400 400 30 30 58 58 110 254 254 254 254 325 34 154 154 458 96 66 342 105 105 336 470 151 151 178 178 35 96 401 75 272 191 191 191 314 196 217 473 258 258 258 31 342 224 494 494 368 9 142 397 147 380 329 329 329 329 252 36 107 107 302 302 302 302 175 175 431 230 230 230 230 215 35 227 419 439 439 439 225 225 47 47 491 80 491 373 373 338 338 400 400 400 30 30 58 58 110 254 254 254 254 254 35 196 196 309 479 331 84 84 84 496 274 274 413 413 466 466 45 45 45 45 216 198 22 283 455 38 162 68 68 115 273 273 265 428 428 146 146 146 35 449 34 69 69 130 402 402 196 217 473 486 486 486 460 460 169 169 164 164 485 485 485 374 132 301 236 129 310 310 107 395 180 106 426 426 426 206 348 76 465 449 449 176 135 200 200 199 106 426 405 426 206 402 402 478 232 68 68 115 344 344 344 344 88 14 14 411 319 319 319 94 199 154 154 458 445 445 351 351 351 315 315 450 413 413 76 449 449 134 134 134 8 26 359 359 474 474 474 324 19 454 229 247 247 126 126 326 326 326 326 326 326 326 101 101 101 149 149 228 320 7 345 389 389 389 389 314 401 259 420 420 420 420 422 236 129 36 108 119 344 344 344 374 374 132 132 399 70 383 383 383 383 383 383 36 107 447 393 393 155 332 332 332 245 58 156 156 156 313 10 10 479 331 290 171 171 252 173 8 29 334 304 304 304 186 54 142 221 336 445 445 485 485 485 468 468 468 337 337 485 464 180 180 405 405 169 169 150 342 342 224 469 469 325 325 41 41 41 19 19 454 13 439 78 78 491 491 28 491 312 341 491 341 12 12 341 491 12 12 260 260 260 260 260 391 391 228 491 491 289 289 209 
287 287 16 16 16 274 413 348 479 331 84 84 84 16 16 16 16 203 381 48 417 417 417 417 417 237 491 47 47 47 491 491 47 491 80 80 491 209 287 287 111 111 111 111 203 203 90 90 76 458 208 441 441 346 428 428 428 146 146 131 133 133 364 109 109 139 139 139 139 375 375 98 98 417 417 417 225 225 225 287 287 297 297 297 297 293 293 216 216 114 84 84 84 16 88 88 111 111 111 111 438 58 58 110 110 110 254 254 240 34 44 44 44 8 354 180 180 376 376 376 376 460 282 240 24 131 183 72 110 110 443 443 240 325 34 207 207 207 324 416 416 239 219 357 357 443 443 169 150 150 86 238 6 469 469 469 469 325 93 93 93 207 207 19 454 417 417 417 417 80 491 435 373 338 400 400 400 30 30 422 162 342 115 273 470 120 120 120 37 24 404 229 229 247 312 15 15 15 193 193 193 193 17 +17 17 17 363 363 363 363 51 51 228 491 289 7 217 70 65 329 329 329 329 169 164 164 164 485 485 485 301 378 378 43 345 109 189 432 330 348 64 76 36 377 377 123 123 8 259 190 380 499 499 428 428 146 146 252 35 131 133 147 147 380 288 288 173 173 280 29 334 59 59 452 263 417 417 417 417 417 237 237 237 435 237 237 80 491 435 435 435 7 7 364 345 152 152 468 468 313 416 416 445 180 443 443 240 325 34 135 135 135 200 200 44 44 44 251 241 81 278 278 26 34 302 497 122 8 32 32 32 354 153 153 153 153 387 387 387 207 207 146 301 393 155 165 165 165 165 53 44 44 94 335 14 411 411 153 387 372 372 396 349 349 352 25 242 242 94 199 255 271 38 342 342 115 273 106 265 265 85 85 146 175 175 81 242 203 217 473 89 340 116 195 195 195 195 195 197 309 309 479 331 84 84 496 173 280 29 469 38 162 54 482 105 336 144 180 496 496 496 274 186 99 436 436 395 423 423 423 423 355 263 263 229 82 247 126 126 326 326 326 326 101 101 149 149 228 412 412 83 194 194 194 322 212 131 451 30 356 281 342 342 221 144 27 351 319 319 319 53 176 135 135 200 248 248 248 479 331 106 426 426 125 348 466 466 22 283 455 236 259 161 161 161 487 288 290 290 290 434 434 339 248 76 108 377 351 494 116 94 479 331 428 428 428 428 299 358 358 24 227 419 439 439 417 417 237 237 28 28 491 28 491 362 491 491 362 362 491 362 491 362 491 362 491 491 362 218 218 40 305 366 491 366 366 366 366 366 491 435 435 316 73 73 289 289 209 188 118 118 118 118 402 221 196 217 473 65 329 329 245 42 147 380 288 134 139 175 175 423 423 423 423 58 110 254 254 254 254 36 478 232 232 68 172 115 273 470 120 120 120 120 24 314 314 472 198 127 45 45 45 45 457 196 217 217 473 65 329 486 486 460 460 169 164 164 485 485 485 485 485 374 132 58 254 254 254 254 314 416 458 144 180 106 426 426 426 413 348 64 76 465 108 377 123 44 236 129 259 190 190 380 499 499 428 85 146 146 143 131 472 133 133 147 147 380 288 288 173 173 280 29 334 59 59 59 313 143 36 377 377 87 87 217 473 65 213 213 213 252 325 34 44 44 44 129 259 445 445 445 351 351 351 486 365 365 360 200 200 200 248 212 192 180 255 495 42 147 380 380 374 374 132 132 132 349 155 155 165 165 165 165 70 65 284 405 206 169 150 162 482 482 482 482 482 238 6 161 487 487 288 288 288 139 139 139 81 337 324 423 423 423 423 423 423 452 263 229 247 312 126 126 292 326 326 326 326 326 326 326 408 408 149 149 228 491 320 7 217 258 258 258 31 54 224 494 494 368 453 142 142 397 147 380 329 329 329 329 252 36 107 107 395 302 302 302 375 375 98 143 401 259 144 389 389 389 389 314 196 309 479 307 307 307 61 61 285 449 202 202 202 130 129 259 354 137 137 137 116 33 250 217 217 70 70 138 138 138 138 138 372 467 467 255 255 38 54 86 238 6 272 106 499 405 426 206 348 199 199 459 469 271 99 447 447 6 6 6 419 439 78 491 305 421 491 421 128 491 193 193 193 17 +17 17 17 17 296 317 317 317 435 435 184 184 
184 373 373 338 400 400 400 30 378 345 141 141 281 453 168 145 145 145 460 460 460 178 96 96 436 447 134 134 134 134 134 359 166 166 166 166 324 186 162 482 482 482 238 6 336 161 487 278 278 178 458 192 192 242 116 116 195 195 394 394 212 401 401 401 401 384 371 180 180 319 319 319 203 53 381 381 381 381 381 381 76 393 155 332 332 332 332 245 349 205 205 261 25 106 265 265 265 265 85 85 85 146 146 173 402 402 66 68 68 115 273 470 151 151 178 458 458 192 242 275 275 379 303 471 471 471 49 433 160 112 427 247 247 312 126 292 292 292 292 292 292 292 292 292 21 21 21 326 21 21 408 408 408 408 149 149 228 228 316 316 73 491 289 289 209 177 177 177 177 131 133 133 141 141 141 141 281 453 342 483 483 226 226 209 287 319 319 319 319 348 348 394 478 478 66 68 68 115 494 494 494 215 129 259 74 74 437 72 72 437 437 496 496 496 496 274 274 368 368 368 9 168 494 134 134 8 100 100 100 100 100 375 375 497 216 198 45 45 45 45 35 196 196 70 65 329 329 329 406 406 467 288 139 175 175 423 423 423 345 141 141 281 342 142 196 217 473 476 476 476 143 458 192 176 135 135 328 200 248 248 393 234 234 234 261 25 319 319 319 348 94 199 223 223 130 402 58 156 156 156 156 59 59 59 452 263 229 247 247 126 126 126 292 326 326 326 326 1 1 1 1 408 408 260 260 391 391 391 391 491 73 73 73 289 491 320 159 159 159 159 159 385 35 196 196 217 473 258 258 258 342 342 224 494 494 494 368 9 142 397 147 147 329 329 329 329 329 329 143 310 107 107 395 302 302 302 497 497 43 345 141 141 141 281 453 168 483 14 226 209 297 297 297 297 297 399 70 65 65 496 169 150 54 238 6 6 472 393 234 261 261 148 148 148 387 372 396 186 186 54 86 238 6 6 472 472 472 482 224 224 494 494 38 162 323 323 224 494 494 129 259 74 437 496 496 496 496 274 274 368 368 9 168 277 277 277 37 24 131 227 419 439 439 439 439 417 237 237 237 28 28 491 491 28 362 491 362 362 362 491 491 491 491 362 362 491 218 362 491 491 218 491 218 218 218 218 218 435 218 366 491 491 305 366 491 366 435 366 491 366 491 366 316 316 491 316 491 316 316 491 73 73 289 289 209 287 430 430 430 430 430 430 219 219 477 477 378 88 109 44 116 116 199 335 14 226 226 226 209 209 411 498 498 498 308 396 313 94 459 459 459 459 271 31 342 86 86 6 272 472 221 196 70 473 329 329 329 406 467 134 134 134 175 175 423 423 423 423 423 263 263 225 225 225 225 225 80 373 373 338 338 400 400 400 30 422 239 384 490 490 399 217 473 365 365 365 365 365 388 64 212 191 191 191 314 133 259 409 409 409 409 33 33 250 32 280 280 153 153 343 387 387 146 358 39 39 86 142 142 397 456 456 456 236 36 108 119 308 308 308 308 308 308 308 388 339 33 394 212 108 123 123 123 123 58 156 156 156 156 59 59 452 263 229 229 247 312 312 126 292 292 292 292 292 1 1 1 1 23 23 23 408 408 408 391 391 316 73 491 289 289 7 357 357 357 271 31 342 168 494 255 402 402 458 208 441 441 153 153 153 387 372 396 396 271 186 39 39 390 390 390 390 390 390 18 18 112 439 439 439 439 237 78 421 128 193 193 17 +17 17 17 296 296 296 184 184 373 66 172 179 179 179 179 314 196 196 70 65 329 329 495 406 467 288 134 139 139 175 423 423 423 423 263 229 82 247 126 126 326 326 326 326 101 101 101 149 149 228 412 83 253 253 253 453 342 224 118 118 118 118 402 402 221 259 144 445 180 443 240 449 449 176 135 135 200 200 248 248 32 32 354 354 153 153 153 153 387 387 387 85 207 318 185 269 9 142 393 155 165 165 165 165 70 14 14 411 153 387 372 372 349 349 205 352 29 242 116 94 199 255 38 31 342 68 115 273 265 265 85 85 85 175 175 81 203 203 471 471 49 453 168 89 340 116 116 10 10 479 331 84 84 496 274 8 29 459 313 31 162 54 105 105 336 27 496 496 496 496 274 99 99 
436 395 423 423 423 43 43 345 347 347 245 245 129 259 74 437 437 306 306 306 206 240 285 449 69 223 130 198 198 283 455 219 219 219 219 219 485 374 374 374 132 132 99 99 161 161 397 134 100 100 100 497 497 186 162 482 142 105 336 336 336 190 380 288 288 360 328 200 200 195 195 248 248 364 364 276 276 109 498 498 498 396 396 178 35 458 192 125 125 125 125 348 199 335 14 411 411 475 475 475 94 475 475 324 324 301 378 43 364 276 109 109 443 443 139 139 139 293 293 497 497 42 42 147 147 380 288 443 443 416 416 458 445 485 134 134 175 175 158 158 158 158 325 449 191 191 191 325 335 14 145 145 486 460 460 173 280 280 242 242 116 379 250 359 81 41 324 324 324 422 349 234 234 261 261 25 106 306 306 306 306 306 306 282 203 203 117 404 229 247 247 126 126 326 326 326 326 326 326 326 326 326 101 408 408 149 228 491 491 412 188 340 340 67 77 478 232 86 68 272 470 470 443 443 240 34 223 223 130 129 259 354 420 420 420 360 135 135 135 200 44 44 44 44 199 335 145 319 319 319 348 348 33 90 72 72 72 72 498 498 498 498 498 396 396 285 285 180 106 284 353 206 206 173 280 280 121 121 116 199 469 469 173 280 418 418 418 418 418 418 99 99 436 436 60 60 298 298 303 303 303 117 404 13 229 491 247 312 126 292 292 292 292 292 292 12 12 12 21 260 305 201 201 201 201 201 201 201 201 201 491 491 316 316 491 316 491 289 289 7 7 473 258 258 258 342 224 494 494 494 281 9 142 397 147 329 329 329 329 329 143 36 107 107 395 302 302 497 497 349 349 234 261 25 180 189 139 139 139 293 167 35 35 198 45 45 45 45 310 338 400 400 400 400 30 30 3 58 110 110 254 254 254 254 254 314 131 133 364 147 456 456 456 38 162 68 68 172 115 444 444 444 444 444 246 246 318 173 402 6 272 34 44 44 44 8 8 401 401 401 197 491 80 491 80 491 80 80 197 66 66 68 172 115 273 494 278 173 8 4 280 485 485 286 286 286 286 468 382 245 245 399 399 217 217 217 217 473 65 432 432 330 348 64 64 465 449 449 302 302 302 497 497 122 129 401 401 401 401 491 310 107 395 395 106 481 424 424 182 182 375 375 122 233 75 227 227 419 419 439 439 439 439 439 237 439 78 78 47 491 47 491 491 316 316 491 491 316 316 73 373 373 373 338 400 400 400 30 422 422 164 164 25 106 106 405 405 405 206 167 449 449 34 340 340 116 94 199 145 154 178 458 96 342 342 224 105 27 386 386 386 386 399 473 418 418 418 418 99 436 436 60 60 298 116 33 250 53 394 76 259 74 441 441 153 387 387 299 299 299 358 243 270 270 433 160 112 427 491 247 247 126 15 15 193 193 193 193 17 +17 17 17 17 296 296 52 52 52 51 51 51 51 51 184 184 491 320 7 217 70 473 65 65 329 42 42 147 147 147 380 288 134 139 175 175 423 423 423 423 423 335 440 89 89 446 446 212 131 472 196 196 473 65 486 486 460 460 169 169 164 164 485 485 485 132 143 129 401 144 144 27 27 437 151 151 169 164 164 164 401 401 259 29 382 313 285 34 69 223 130 280 106 297 297 297 297 297 293 215 35 35 259 74 74 213 213 213 213 213 252 215 259 29 100 302 175 175 81 255 255 236 384 180 405 405 405 206 215 96 449 135 135 135 200 44 44 44 8 32 32 401 401 401 401 401 354 354 153 153 153 153 153 153 153 387 387 387 207 207 207 207 19 19 454 454 454 229 229 247 312 312 312 126 292 292 292 292 292 12 12 12 12 12 12 260 260 260 260 391 391 391 228 491 373 373 155 165 165 165 165 53 44 44 199 106 106 284 387 372 372 396 349 349 234 261 29 242 116 94 199 255 38 162 232 232 172 115 273 265 265 265 85 146 146 146 175 175 81 459 203 203 203 381 48 404 13 439 78 170 170 170 28 491 187 187 341 2 2 2 491 2 2 362 362 362 362 40 491 366 366 491 366 435 366 366 491 366 491 316 316 316 491 491 435 435 491 289 7 7 364 276 109 109 139 139 139 139 293 293 375 98 98 98 225 225 
225 225 225 225 225 465 198 127 5 5 455 43 43 364 276 276 109 109 498 498 498 498 134 134 139 302 293 497 122 122 131 133 345 141 141 141 281 162 232 232 232 232 68 68 115 273 498 498 498 313 240 35 26 26 359 359 166 166 166 422 143 36 108 108 119 308 308 308 308 308 313 94 176 135 200 200 200 230 230 230 215 35 478 232 68 68 273 273 265 428 146 146 416 416 401 401 259 371 180 315 315 315 315 315 315 450 450 450 413 413 303 117 48 404 229 229 491 312 312 126 126 292 292 292 292 292 292 292 292 292 1 21 21 21 21 408 408 408 408 408 391 391 391 228 491 373 373 338 400 400 400 400 378 378 345 389 389 389 314 8 354 420 420 420 422 342 342 224 494 494 129 129 74 190 487 499 499 265 85 85 146 368 453 238 6 272 34 494 236 314 196 196 309 309 309 479 331 231 231 231 231 349 164 164 214 214 214 328 200 200 464 145 460 460 460 169 402 96 272 300 382 313 216 216 114 258 258 258 271 271 39 433 433 433 390 160 18 112 427 56 56 170 312 312 312 187 187 292 12 12 12 12 12 12 12 12 12 408 163 163 163 491 316 491 316 491 491 289 289 289 7 7 309 309 309 479 331 231 231 231 231 169 164 164 164 214 214 214 214 328 328 200 303 117 404 404 439 439 439 439 439 237 237 237 421 421 421 421 491 421 128 491 128 128 128 128 193 193 17 +17 17 17 296 211 317 317 317 317 52 52 52 52 52 52 52 52 51 51 51 51 184 184 184 491 320 320 181 181 181 285 449 34 125 125 125 348 348 457 14 226 226 226 226 209 411 498 498 498 498 498 169 169 164 164 472 221 259 354 181 181 236 35 478 54 224 344 344 344 36 449 44 44 44 10 10 10 479 331 84 496 496 274 99 99 436 60 60 298 116 94 199 340 340 116 76 377 123 123 123 219 477 222 222 222 372 372 245 58 72 72 110 110 120 120 120 120 37 24 24 131 404 439 225 225 225 225 80 80 80 373 373 338 400 400 400 400 422 143 384 490 490 490 399 217 473 365 365 365 365 365 365 330 388 212 384 191 191 191 314 401 401 75 384 490 490 31 342 342 224 494 494 129 259 74 190 487 487 374 374 374 173 173 176 176 135 200 200 248 359 359 474 474 474 474 19 454 229 229 491 247 312 312 126 292 292 292 292 292 292 292 292 21 21 21 21 21 21 21 21 21 21 21 260 408 408 408 408 391 391 491 316 73 73 73 73 491 127 127 258 258 258 258 31 342 342 342 97 72 72 110 254 254 254 254 314 35 259 137 137 137 137 137 33 33 394 32 465 384 180 180 319 319 319 348 195 195 250 250 364 333 333 220 220 216 114 180 113 113 113 113 167 167 131 449 183 156 156 156 156 406 467 255 255 236 314 90 4 280 280 265 265 265 85 146 146 186 39 342 142 221 336 354 420 420 360 360 135 135 200 200 464 145 376 376 376 376 460 169 150 150 342 86 105 6 96 96 227 419 419 439 78 47 47 47 47 491 491 80 80 289 289 209 83 194 194 194 194 194 194 282 388 195 195 131 472 472 196 217 70 65 65 319 169 150 150 86 238 6 472 221 336 74 190 492 492 492 492 245 349 205 205 261 25 148 148 148 372 372 396 396 271 186 39 86 86 142 221 336 354 420 420 420 422 143 36 371 490 490 31 9 142 221 336 336 74 190 487 487 288 374 374 374 132 132 132 173 173 96 6 227 419 439 439 78 170 305 170 28 28 491 28 491 28 28 28 491 362 362 362 362 362 362 491 362 362 362 491 362 362 305 305 218 218 491 218 218 218 491 218 218 218 218 435 435 211 211 369 21 21 21 21 21 21 21 21 23 23 260 260 260 260 391 391 391 491 73 73 289 289 320 320 109 109 84 84 139 139 139 139 139 16 16 293 293 293 43 364 345 152 152 152 152 422 402 221 221 354 137 137 137 137 33 394 76 465 164 214 214 214 360 360 76 458 192 176 176 135 200 200 464 255 255 8 354 180 113 113 113 113 206 240 285 34 277 277 457 173 155 155 332 332 38 162 232 68 115 273 231 231 231 231 53 53 90 76 108 119 103 103 103 103 103 103 103 85 299 
299 203 381 381 117 404 263 439 439 417 417 237 237 47 491 80 80 80 80 435 435 209 287 297 297 297 297 293 293 293 43 364 109 109 278 278 348 64 465 449 300 382 495 467 467 242 116 33 394 393 205 261 25 470 376 376 376 376 460 178 178 233 96 96 227 419 419 439 225 80 80 80 80 80 320 456 456 456 456 236 108 119 308 308 308 308 308 179 313 64 212 472 196 217 473 65 329 495 406 467 134 134 139 175 175 423 423 423 423 263 13 229 229 491 312 15 15 15 193 193 193 193 17 +17 17 17 363 363 363 363 363 408 51 149 228 491 491 320 7 473 258 258 258 31 342 224 494 494 368 453 168 180 145 329 329 175 175 81 81 469 416 416 96 453 168 470 365 365 365 330 348 212 300 300 382 313 186 162 232 232 105 105 336 470 432 432 330 330 64 64 77 449 224 300 156 382 245 43 43 345 141 141 281 453 342 168 230 230 230 215 35 74 183 485 286 286 382 245 245 43 364 276 174 174 174 348 348 64 64 212 93 93 93 93 93 301 8 8 255 255 349 349 155 155 148 148 372 372 245 245 458 458 208 190 487 487 278 258 31 342 342 86 105 196 217 473 459 459 271 271 39 433 433 160 168 89 55 322 67 394 76 310 338 338 400 400 400 400 30 324 422 186 162 68 68 115 470 470 120 240 240 314 310 338 400 400 400 400 301 378 345 141 141 281 281 9 221 336 144 106 496 88 319 146 135 339 76 36 377 87 87 416 259 445 180 443 443 240 325 34 44 44 44 251 251 241 431 278 26 302 302 497 497 122 32 401 82 144 498 498 498 498 498 139 302 293 497 497 497 122 393 155 155 165 165 165 165 466 466 448 448 464 255 38 38 162 68 115 273 273 265 265 85 85 146 146 175 81 81 242 203 399 70 65 410 410 410 410 173 280 29 29 406 467 89 446 67 58 72 72 72 496 496 496 215 215 35 96 96 272 449 242 275 275 275 303 195 199 335 188 188 340 340 466 22 283 455 38 162 232 105 105 336 491 336 190 380 288 288 288 328 328 200 303 303 117 48 78 491 491 491 491 421 421 491 201 193 193 193 193 +17 17 17 363 363 51 51 491 184 373 373 66 232 68 172 115 273 84 84 16 16 274 274 399 70 473 65 329 329 460 169 169 164 164 164 485 485 485 374 88 89 89 446 212 131 106 111 111 111 111 85 438 58 110 110 202 202 202 402 221 36 108 119 437 405 405 206 178 35 96 96 272 277 191 325 34 180 410 410 410 410 410 410 173 280 29 334 382 59 59 245 335 14 287 284 405 405 206 169 349 352 25 242 242 242 116 94 199 106 426 426 426 426 282 282 388 195 117 335 335 440 145 463 463 463 463 29 382 313 186 162 342 172 115 273 432 432 330 379 379 243 243 243 77 433 433 433 160 112 56 247 247 312 126 292 292 292 292 1 326 326 326 326 101 101 149 149 228 491 491 320 345 152 152 422 422 164 164 106 106 405 405 206 167 35 35 397 152 152 152 314 90 458 445 180 443 443 240 285 44 44 8 259 354 153 153 153 153 387 387 207 207 19 454 417 417 417 417 417 237 237 47 491 47 2 491 2 491 491 2 316 316 316 491 316 316 73 491 289 491 7 217 217 473 329 329 329 329 329 329 164 485 485 485 485 374 141 281 281 9 221 336 445 180 443 240 325 449 176 135 135 200 200 180 230 230 230 230 215 35 192 340 340 340 116 33 33 219 219 219 219 286 286 286 286 334 304 304 304 185 49 323 219 219 152 152 152 236 94 331 84 84 84 84 84 16 274 98 263 13 417 417 417 417 435 225 373 451 451 451 30 30 422 186 162 232 232 68 68 115 273 278 278 178 143 96 96 86 86 86 238 272 41 41 41 41 19 19 454 454 225 225 225 83 83 55 55 55 322 212 34 30 30 30 324 356 356 356 356 281 453 342 242 242 379 379 478 478 68 172 344 344 344 344 344 186 162 54 482 142 105 336 336 190 499 499 499 499 85 85 146 146 464 253 253 253 368 342 342 451 30 30 301 378 43 276 174 174 319 319 348 379 77 77 9 142 397 336 276 276 346 346 346 387 355 355 355 37 185 185 269 433 433 112 427 82 247 312 126 
292 292 326 326 326 326 326 326 326 326 101 101 149 149 228 491 451 257 257 257 31 342 142 72 72 437 306 306 306 306 306 396 167 167 457 32 401 259 161 161 487 499 151 481 215 215 29 302 497 497 71 71 342 342 57 57 57 57 203 53 44 44 44 416 239 458 484 484 484 484 314 32 32 259 384 371 213 213 213 286 286 139 139 302 375 375 98 98 13 417 229 491 247 126 126 126 292 326 326 326 408 408 408 149 149 228 491 289 491 83 55 55 55 322 67 212 219 219 152 152 152 236 10 309 331 331 84 84 16 274 88 58 72 268 268 268 268 268 268 274 32 32 401 401 401 384 371 443 443 443 150 150 86 105 105 336 29 29 288 313 285 131 72 72 437 306 306 306 306 306 396 396 24 325 34 177 177 143 77 342 142 221 336 144 180 405 405 206 167 167 35 36 377 377 87 87 8 354 420 420 420 246 246 252 325 87 87 416 458 445 180 443 443 240 325 58 72 72 72 437 265 265 85 85 468 468 468 396 313 24 131 58 72 110 351 139 139 139 139 293 375 375 233 233 233 419 229 491 247 312 15 15 15 15 15 15 193 193 193 193 193 17 +17 17 17 296 296 363 52 52 52 52 52 51 51 51 51 184 491 491 412 0 0 222 356 356 281 9 196 196 479 331 463 463 463 463 29 29 382 245 335 14 14 226 226 226 226 209 209 475 475 475 475 475 475 475 324 301 8 354 180 151 240 240 325 41 324 324 422 36 377 87 87 354 420 420 420 324 3 58 72 72 110 110 486 486 486 460 282 37 24 35 259 159 159 159 236 35 198 127 114 84 496 496 274 186 162 482 482 482 482 482 238 6 272 371 485 374 374 374 8 354 29 191 191 191 37 24 131 472 225 72 72 72 110 110 486 486 460 460 460 169 352 352 402 221 401 336 79 79 288 84 496 496 413 348 250 250 81 278 278 285 285 302 302 497 497 349 234 234 234 261 190 380 288 288 330 64 64 76 310 107 447 447 221 336 354 153 153 153 153 387 387 304 304 185 185 269 433 160 112 427 491 247 312 126 292 292 292 292 292 326 326 326 408 408 408 149 149 228 491 412 83 55 55 55 322 212 34 34 253 253 31 162 482 482 115 485 485 374 374 374 339 94 199 253 253 253 253 453 9 219 219 152 152 152 301 236 239 384 371 371 374 374 132 132 416 416 445 445 180 443 240 385 131 133 133 364 276 174 174 174 319 348 33 250 394 212 465 190 380 380 496 496 274 143 458 192 192 340 340 33 394 108 377 123 123 219 222 222 222 222 245 245 43 43 276 109 109 403 403 403 207 318 318 318 49 342 168 89 89 116 33 76 108 119 437 437 405 405 405 206 167 167 457 35 77 68 342 273 231 231 231 203 53 76 198 164 214 214 214 328 200 200 117 229 229 247 126 126 326 326 408 408 391 228 491 491 373 451 30 30 356 356 368 453 168 180 230 230 230 230 230 215 35 35 35 401 89 89 89 446 67 212 131 106 106 284 405 405 206 169 349 402 402 6 272 123 123 216 216 22 283 455 251 241 431 405 405 405 215 215 169 270 86 238 6 300 382 382 245 458 445 445 351 351 365 365 365 365 388 94 199 495 495 406 337 41 41 318 318 49 9 168 157 157 157 467 313 313 216 22 22 283 38 162 232 232 238 6 272 470 470 171 171 171 358 358 358 233 270 270 433 433 433 160 18 112 439 439 439 439 237 237 237 237 237 237 237 237 80 80 491 435 435 412 83 415 415 415 131 393 234 261 261 25 498 498 498 498 498 396 186 39 54 86 238 6 472 472 196 217 217 473 65 486 486 460 460 169 164 164 485 485 485 152 152 422 162 323 224 224 494 494 236 36 310 395 470 470 151 151 150 39 86 342 272 191 191 191 314 90 401 259 445 180 443 240 325 176 135 200 200 199 44 44 44 58 72 72 72 350 350 350 350 350 413 413 53 250 250 212 354 354 153 153 153 387 387 387 207 207 19 454 229 82 247 126 126 126 126 326 326 326 326 326 326 326 326 326 101 408 408 149 391 491 289 289 289 320 159 159 159 240 285 111 111 111 111 438 438 186 162 342 68 273 470 120 240 240 314 196 196 309 479 331 331 84 
84 84 16 16 274 274 349 349 234 234 261 425 386 431 376 376 460 167 167 36 108 377 123 123 216 127 114 92 92 92 92 92 92 282 385 385 233 227 419 439 417 417 237 237 237 421 491 421 421 491 491 128 491 128 193 193 17 +17 17 17 296 211 211 52 52 52 363 363 363 408 51 51 228 491 491 320 114 0 0 0 301 399 473 65 476 476 476 171 252 8 420 420 420 324 464 106 297 297 297 297 297 293 293 42 42 42 147 380 499 499 499 428 428 85 85 85 207 207 358 24 34 34 111 111 319 203 53 10 479 307 307 307 307 61 167 167 478 478 68 68 172 115 470 403 403 403 403 135 135 135 200 248 212 127 114 222 222 468 313 313 10 10 479 331 307 307 307 307 426 426 206 206 206 385 24 227 419 419 439 439 225 225 225 225 225 80 80 289 320 159 159 159 159 236 35 196 196 479 331 231 231 231 16 274 274 274 251 251 241 431 431 319 319 348 64 64 212 34 242 242 116 394 478 162 482 482 238 6 161 487 487 213 213 252 252 335 14 411 145 145 486 486 468 468 467 467 467 134 215 8 270 270 86 9 142 393 155 332 332 332 245 399 217 429 429 429 429 246 246 246 19 19 454 454 225 225 417 417 80 80 412 287 111 111 111 438 438 186 162 342 68 115 273 470 120 120 120 37 24 24 404 427 229 491 247 312 126 292 292 292 326 326 326 408 408 408 391 228 228 289 289 289 320 209 445 278 278 278 314 314 196 217 429 429 429 429 464 464 44 44 44 10 10 10 309 479 331 171 171 171 252 325 34 494 173 402 402 221 259 354 354 153 153 153 387 372 396 285 415 415 415 415 457 26 251 241 431 444 444 213 213 246 358 358 39 433 433 86 86 6 6 227 419 439 417 417 237 237 237 237 491 491 491 47 47 491 491 491 316 316 316 73 73 80 435 412 114 0 139 139 139 293 293 8 420 420 420 420 464 44 44 44 44 42 42 147 147 380 288 278 278 278 271 271 39 342 86 105 105 144 472 196 196 331 331 231 231 274 399 217 473 65 486 460 240 285 300 382 245 58 72 72 489 489 374 132 132 8 152 152 152 152 324 416 458 445 445 180 120 120 120 37 385 233 227 419 427 229 491 247 312 126 292 292 292 292 292 326 23 23 326 326 326 101 101 101 149 149 228 491 289 320 159 159 285 285 106 111 111 284 481 293 169 349 205 205 25 485 485 485 139 139 293 497 335 14 411 411 213 213 213 213 213 318 368 368 453 168 41 324 485 382 406 467 467 340 340 340 116 33 250 70 46 46 46 46 438 438 399 217 70 65 480 480 480 480 480 480 85 299 299 299 299 339 64 212 89 89 322 116 394 478 478 232 232 68 26 26 81 444 213 252 215 129 401 478 232 68 68 115 273 470 315 315 315 450 450 413 64 64 131 300 382 382 467 415 415 236 35 196 196 479 331 265 428 428 85 146 358 233 270 342 224 118 118 118 118 402 345 152 152 152 458 445 180 443 443 285 34 44 44 8 32 259 354 153 153 153 372 372 467 467 299 394 76 465 445 351 351 116 94 199 331 171 171 171 171 252 325 34 324 324 464 275 275 275 303 48 48 417 417 417 491 237 237 237 491 421 421 421 491 128 128 491 128 491 305 128 128 193 193 17 +17 17 17 296 305 317 317 491 317 491 317 461 491 461 461 435 435 491 435 435 491 491 435 289 373 66 68 115 273 273 84 16 88 88 109 340 340 340 466 466 22 283 448 448 448 464 464 432 432 432 330 330 388 195 64 131 133 345 152 152 152 422 314 239 371 490 490 38 342 68 115 273 106 265 265 265 85 146 146 325 34 191 191 191 37 314 36 377 87 87 87 14 14 145 145 376 376 460 460 150 150 342 105 221 336 96 196 217 473 258 258 31 342 224 494 494 494 31 162 232 86 105 336 354 470 432 432 330 379 64 77 77 224 224 300 334 334 59 313 313 36 377 87 87 87 129 74 74 351 278 416 416 144 180 180 151 240 368 453 342 168 180 113 113 113 113 450 167 35 131 133 133 364 364 276 174 174 174 174 348 348 195 195 250 250 345 409 409 409 116 64 76 310 338 400 400 30 301 378 43 345 109 109 
330 330 64 76 449 449 180 410 410 410 410 8 29 29 382 313 236 36 377 87 87 416 458 445 180 443 240 385 131 58 156 156 156 156 313 313 251 251 81 431 278 285 26 302 302 497 497 416 458 144 498 498 498 498 498 134 302 375 375 98 98 13 229 229 491 312 312 126 292 292 292 326 326 326 326 326 326 326 326 326 326 326 326 326 326 326 101 101 101 101 149 149 228 491 491 320 152 152 152 422 58 58 72 498 498 498 498 396 313 314 35 26 241 241 376 376 376 460 169 150 86 86 6 272 472 397 354 109 213 213 213 358 143 458 96 99 338 400 400 400 301 378 8 141 141 281 281 9 221 221 144 180 84 84 496 88 88 176 176 176 328 328 200 117 117 454 454 439 78 491 491 312 126 126 326 326 326 101 408 408 149 228 491 373 66 66 115 273 84 84 16 43 43 345 152 152 152 422 162 68 68 115 273 273 432 330 330 64 131 131 183 156 156 156 156 156 156 245 43 43 364 276 276 109 498 498 498 59 396 313 24 131 472 259 354 62 62 62 62 438 438 42 147 380 288 329 329 36 107 395 300 382 313 314 478 478 478 172 105 336 470 432 432 330 330 33 394 77 54 107 395 382 382 313 186 31 54 142 393 336 25 25 496 496 496 496 274 215 233 270 270 342 224 415 415 325 472 458 144 27 437 437 306 306 306 396 396 53 53 469 469 24 325 41 41 41 324 422 36 108 377 87 87 8 239 190 380 288 360 360 200 200 464 459 271 31 342 224 44 44 38 162 232 232 482 105 105 196 70 65 65 306 306 306 396 396 385 131 472 225 225 225 225 225 225 7 251 241 431 266 266 266 266 146 178 35 35 401 26 359 359 166 166 324 301 8 129 354 354 153 153 153 153 387 387 387 207 464 464 464 69 130 130 280 255 255 236 8 354 180 113 113 113 113 113 450 167 167 457 401 401 401 75 108 119 351 351 351 432 330 388 199 199 495 495 406 467 134 302 251 251 241 431 443 443 443 173 280 29 275 275 275 303 303 303 48 13 229 491 491 312 312 312 292 292 292 292 292 21 21 21 21 21 21 21 21 21 408 408 149 149 149 491 491 491 320 152 152 152 422 143 384 490 490 490 31 342 68 115 273 470 265 265 428 85 146 146 325 325 191 191 191 191 314 314 198 127 114 114 92 92 92 167 457 364 345 389 389 314 129 259 354 420 420 420 301 216 22 283 455 236 259 354 180 443 443 443 169 150 150 39 342 86 238 272 371 470 93 171 171 171 358 358 233 310 107 107 112 439 417 417 237 237 128 193 193 193 +17 17 296 296 296 184 184 412 209 287 424 424 424 424 424 274 274 122 285 34 34 242 116 479 331 230 230 230 169 349 402 96 36 377 87 87 87 129 354 420 420 420 420 246 3 464 223 223 130 402 478 232 232 232 172 115 273 231 231 231 231 203 53 53 219 219 219 219 219 485 374 374 132 132 186 39 54 342 224 89 340 116 33 394 212 384 371 374 374 88 88 176 176 135 200 200 248 76 465 310 107 395 395 441 441 153 153 153 182 372 372 372 372 304 304 185 185 269 269 9 142 97 397 336 147 380 499 499 428 85 146 146 325 34 106 106 106 426 426 426 426 206 169 169 352 352 352 352 352 352 97 97 225 225 225 83 55 55 55 322 67 64 212 219 219 219 464 180 180 319 319 348 200 464 242 116 94 331 230 169 169 402 402 6 377 87 87 420 420 420 422 422 129 310 161 161 487 487 288 288 290 290 434 434 339 64 212 131 180 230 230 230 167 167 457 401 401 491 190 190 190 488 488 488 405 206 215 215 35 29 334 334 59 59 452 452 263 229 491 247 312 126 292 292 292 1 1 1 1 21 21 21 21 21 21 21 260 260 260 260 391 391 391 491 491 320 345 152 152 152 301 399 217 473 360 360 360 434 339 64 64 108 377 87 87 416 445 485 278 173 280 57 57 57 53 473 44 44 44 416 129 259 144 484 484 484 285 131 58 72 72 72 437 350 350 350 350 350 413 203 381 335 335 14 440 145 194 446 446 33 394 478 478 482 482 482 482 105 336 208 441 153 153 153 182 182 175 81 176 176 328 328 303 117 48 417 417 417 417 
237 237 237 491 47 80 491 80 491 7 7 152 152 152 58 58 110 110 254 254 240 34 44 44 236 36 108 119 119 351 486 139 175 175 81 81 469 416 8 79 380 288 288 365 365 282 203 203 53 394 393 155 155 332 332 165 399 217 473 258 258 258 31 342 224 494 494 368 453 168 168 145 329 329 329 175 81 81 469 416 416 453 453 470 365 365 365 365 388 64 212 300 382 382 313 186 54 54 105 336 354 470 432 330 379 379 77 77 54 224 300 334 313 236 36 377 377 87 236 236 93 93 93 93 93 93 207 207 207 207 19 454 229 247 247 126 126 126 326 326 326 326 326 326 326 326 101 101 149 149 491 289 491 127 5 5 455 399 217 473 65 290 290 171 139 139 139 293 293 399 217 65 136 136 136 136 282 388 33 394 32 259 354 190 380 499 405 405 206 206 285 449 34 277 277 24 314 393 155 155 165 165 165 165 466 22 22 283 38 162 342 238 6 272 470 470 171 171 171 358 99 436 436 60 60 298 298 303 303 117 48 229 491 247 126 126 326 326 326 408 408 408 149 228 491 373 66 68 68 68 273 470 403 403 403 403 207 135 135 135 200 200 248 212 127 0 0 0 0 378 378 347 347 347 347 245 143 458 144 27 437 437 319 319 319 53 53 176 176 135 328 200 200 199 125 125 125 125 348 466 283 455 38 349 234 234 261 25 346 265 265 85 85 146 146 438 349 349 234 234 261 164 273 498 498 498 313 285 34 41 324 324 422 143 259 161 161 161 487 487 288 290 290 290 434 434 434 339 394 36 377 377 87 236 10 479 331 331 428 428 428 428 207 207 358 358 233 465 227 419 439 78 421 491 491 193 193 17 +17 17 17 296 296 184 184 184 435 435 66 172 115 273 273 84 344 16 274 399 399 473 65 486 486 486 460 460 169 164 164 164 485 485 485 301 378 43 364 109 109 189 330 330 64 76 465 377 123 123 236 32 259 354 190 380 499 428 428 85 146 146 35 35 133 133 147 288 288 278 173 280 29 29 382 313 236 108 377 87 87 399 217 473 213 213 213 252 325 325 183 57 57 57 57 203 381 117 404 13 229 491 247 312 126 292 292 292 292 292 292 21 326 326 326 408 408 408 408 149 149 228 491 320 217 473 258 258 258 342 342 224 494 494 494 258 31 162 232 232 68 68 105 105 336 470 329 329 330 330 379 64 77 342 224 300 300 382 245 245 43 345 109 389 497 497 122 239 161 79 499 499 405 206 215 35 29 57 57 57 203 70 106 426 426 426 426 169 169 352 352 402 198 127 114 114 264 264 264 264 59 59 59 452 263 263 417 417 417 417 170 170 47 47 491 491 2 2 47 2 316 2 491 491 316 316 73 73 289 435 435 83 255 255 130 402 458 144 441 441 153 153 372 372 396 271 186 39 342 323 97 427 247 247 126 126 326 326 326 326 101 101 149 149 491 373 338 400 400 400 400 30 301 416 416 180 180 84 84 496 496 274 71 368 368 453 168 106 426 426 426 426 413 348 64 465 377 123 123 123 43 276 346 346 346 428 85 146 146 252 36 478 66 68 115 470 486 365 365 365 330 388 33 77 77 342 68 238 6 272 470 171 171 252 99 99 436 60 60 116 94 58 58 156 156 156 313 186 186 162 68 115 273 273 279 279 279 279 279 375 375 352 352 352 352 352 352 112 112 417 417 237 237 237 237 237 237 491 237 237 491 237 237 237 362 362 362 362 491 491 362 362 362 491 218 491 218 491 491 211 218 218 366 366 491 491 366 366 366 366 491 366 366 366 366 163 316 316 316 316 73 73 491 320 7 473 258 258 258 31 342 224 494 494 494 31 9 142 397 147 380 329 329 329 329 329 310 107 395 302 302 497 497 122 129 259 190 190 190 488 499 265 265 85 146 146 325 325 34 382 313 285 325 183 156 156 156 156 396 313 186 162 172 115 273 279 279 279 279 279 293 169 352 352 155 125 125 322 94 335 14 14 411 297 297 297 297 297 293 43 345 109 109 109 171 422 186 162 68 68 68 105 105 336 354 213 213 213 143 192 192 135 135 135 200 248 58 156 156 156 156 245 245 399 70 65 480 480 480 480 85 299 299 299 303 243 
227 419 427 56 491 247 312 126 292 292 326 326 326 326 326 326 326 101 101 408 228 228 373 338 338 400 400 400 400 301 143 129 74 190 492 492 492 186 162 342 172 444 444 444 444 252 325 325 191 191 191 314 36 108 87 87 87 38 342 86 105 336 470 213 213 213 143 458 192 277 277 277 314 196 196 479 331 331 315 315 315 315 450 450 450 98 263 417 417 417 225 225 225 72 72 110 202 202 202 202 202 280 135 135 135 200 464 464 255 255 236 239 259 107 395 180 151 151 169 150 150 86 238 6 272 191 191 191 240 58 183 156 156 156 156 245 399 399 217 217 473 432 432 330 348 64 212 449 302 302 497 497 14 14 411 145 145 486 460 240 325 449 469 469 469 236 259 108 449 485 485 374 374 374 37 24 259 377 377 123 123 216 22 283 283 38 162 68 342 224 494 494 399 217 217 473 290 290 171 171 171 252 318 368 342 342 176 176 176 328 200 248 248 76 74 485 213 213 213 213 186 39 342 224 462 462 462 462 402 196 196 398 398 398 398 398 374 374 132 132 185 185 185 323 390 18 112 427 56 56 491 491 15 15 15 15 15 193 193 193 193 17 17 +17 17 17 17 296 363 52 51 51 51 51 491 184 184 491 184 7 7 7 364 276 109 109 109 443 443 139 139 293 293 293 497 399 217 70 473 65 329 495 406 406 467 134 139 139 175 423 423 423 423 423 263 263 417 417 417 237 237 237 237 237 201 237 80 435 435 435 440 287 111 111 111 111 139 139 293 293 293 122 35 310 107 395 395 151 151 31 342 342 86 238 6 108 119 119 351 443 151 139 240 240 219 219 477 477 477 477 477 132 8 259 74 74 425 425 386 386 290 290 290 290 434 434 434 434 434 339 466 212 127 45 45 45 325 34 111 111 111 111 111 438 438 438 422 349 164 164 214 214 214 360 360 200 248 76 465 219 219 152 152 222 498 353 353 313 236 239 371 371 374 374 88 88 176 135 135 135 200 44 44 44 399 70 65 65 428 428 146 146 252 449 449 324 324 422 349 349 234 234 261 25 441 153 153 153 153 132 81 81 459 459 469 99 447 447 447 447 238 336 214 214 214 214 214 328 328 200 303 303 404 404 229 491 247 312 126 326 326 326 101 101 101 149 149 228 491 287 287 44 44 44 44 44 42 42 42 147 147 380 288 278 278 31 342 86 86 105 105 336 485 41 324 324 422 349 164 164 164 214 214 214 214 328 328 200 200 200 200 248 248 248 127 114 92 92 92 92 169 35 77 66 142 397 397 276 346 346 346 355 355 37 37 24 227 419 419 439 78 491 491 312 312 312 312 292 292 1 21 21 21 21 21 408 408 408 408 149 228 491 289 219 152 152 152 152 236 325 371 180 84 84 496 350 167 457 457 479 331 84 84 84 16 274 43 43 276 181 181 181 181 35 449 449 485 152 222 353 353 245 416 458 445 180 443 443 240 325 449 176 176 135 328 200 117 117 48 414 414 47 47 47 47 491 491 47 491 491 80 491 7 7 219 219 152 152 222 353 372 245 245 245 129 259 354 190 380 288 288 360 360 200 135 135 135 135 200 200 44 44 44 44 162 232 482 482 482 238 336 161 487 288 290 290 290 434 339 339 64 76 107 447 447 6 6 119 351 437 91 91 265 85 85 85 139 139 293 122 122 131 34 340 340 116 33 394 465 377 123 123 219 477 222 222 222 372 372 245 58 72 268 268 268 268 268 268 169 186 269 323 224 242 116 33 58 58 72 350 350 350 350 274 274 203 381 117 229 247 247 126 126 326 326 326 326 326 101 149 149 228 491 412 83 55 55 55 55 322 67 212 219 152 152 152 152 132 236 239 239 371 180 84 84 496 274 274 274 457 196 479 331 84 84 274 88 88 44 44 44 38 232 232 68 68 115 273 278 360 360 200 200 64 212 302 302 302 497 497 349 205 259 214 214 214 214 200 200 464 255 255 8 354 180 113 113 113 113 113 167 285 449 57 57 57 57 203 203 195 10 309 331 157 157 157 157 372 245 245 43 364 364 181 181 181 181 285 449 34 356 281 453 342 6 272 490 490 490 31 9 105 336 336 494 494 494 368 453 168 418 418 418 99 436 
436 60 60 298 116 199 356 356 281 31 9 26 26 241 266 266 266 266 266 146 146 358 143 458 192 472 196 309 479 331 157 157 157 157 372 372 245 245 43 364 276 181 181 181 167 167 35 478 478 68 224 273 153 153 153 396 285 285 462 462 462 402 129 259 74 74 351 351 351 351 264 468 468 468 468 467 467 467 11 275 379 379 77 77 342 342 451 30 30 30 30 58 58 110 110 110 486 486 460 460 240 24 131 404 229 247 126 326 193 193 17 +17 17 363 51 51 228 491 7 309 479 331 157 157 157 387 372 372 396 313 58 72 110 268 268 268 268 268 274 274 183 451 30 30 30 356 368 342 9 26 251 241 266 266 266 266 178 458 96 26 359 474 474 301 236 87 87 87 87 36 108 119 308 308 308 308 396 313 94 199 180 113 113 113 113 450 450 413 233 227 419 439 78 170 491 312 187 187 187 187 12 12 12 12 260 260 260 391 391 149 491 491 491 7 7 276 346 346 346 265 85 85 146 464 177 177 177 177 133 133 141 141 141 281 342 168 106 350 350 350 350 348 250 359 166 166 324 301 251 251 241 376 376 376 376 460 169 150 342 86 238 272 397 397 109 109 213 213 213 143 458 144 180 106 111 111 111 438 438 42 147 147 380 288 443 240 240 325 34 340 340 116 466 22 283 455 129 74 351 351 351 171 171 252 215 215 259 29 334 334 59 59 245 58 72 72 268 268 268 268 88 88 88 44 44 399 217 217 473 65 136 136 136 136 136 136 136 282 388 94 34 89 340 116 131 183 257 257 257 257 281 9 142 221 336 364 276 346 346 428 428 146 146 349 205 352 106 230 230 230 215 215 35 35 133 364 364 276 109 109 443 443 443 169 150 39 86 86 238 6 272 69 223 130 198 22 448 448 464 106 106 265 85 85 85 146 175 175 81 81 275 275 116 64 131 427 229 247 126 326 326 326 326 326 101 101 149 228 289 289 491 108 377 295 295 295 295 35 192 44 44 44 8 8 8 354 153 153 153 387 387 387 146 464 464 113 113 113 113 206 285 449 34 69 223 130 44 44 44 94 335 14 411 411 153 372 372 372 396 349 349 234 261 25 242 116 94 199 255 38 31 342 68 115 273 106 265 265 85 85 85 175 175 81 81 203 203 381 404 335 440 55 55 322 67 131 183 451 451 30 30 30 422 186 162 68 68 115 273 189 443 240 385 131 472 393 393 234 234 261 261 25 265 265 265 85 146 146 300 382 382 313 143 36 377 123 123 216 283 283 455 72 72 268 268 268 268 268 169 169 39 342 342 224 415 415 415 314 401 196 479 331 428 428 428 428 358 358 233 36 227 427 427 247 247 312 126 292 292 292 292 292 23 408 408 408 408 391 491 491 373 66 66 68 68 115 273 470 443 240 325 449 277 277 277 277 325 335 14 14 287 284 125 125 125 348 348 195 33 394 76 401 82 74 492 492 492 492 396 215 35 354 459 459 459 271 39 342 86 142 196 70 65 65 495 406 406 467 288 139 139 175 175 423 423 423 423 263 229 229 247 126 126 326 101 408 149 149 491 412 83 83 55 55 322 322 67 10 10 309 479 398 398 398 398 468 468 313 359 359 166 166 166 324 301 301 32 32 32 354 354 498 498 308 313 348 64 76 198 198 114 57 57 203 53 76 465 377 377 123 123 123 88 44 44 44 129 458 208 208 190 487 278 278 31 342 86 105 336 336 354 340 340 116 466 466 114 222 222 222 468 245 8 8 354 470 120 120 330 240 379 243 233 270 270 433 433 160 112 112 56 56 421 491 421 491 491 491 421 421 128 491 128 128 193 17 17 +17 17 17 296 296 317 305 305 317 461 491 491 435 435 435 435 287 83 194 194 194 194 322 67 212 34 111 111 111 111 438 438 10 479 331 84 84 88 88 88 44 44 348 10 10 479 331 493 493 493 216 300 300 382 245 143 465 445 351 351 343 343 343 171 358 368 342 9 142 397 336 345 347 347 347 406 467 467 467 340 116 199 255 255 236 236 384 180 106 405 405 405 215 215 96 272 449 191 191 191 314 314 32 401 259 354 153 153 153 387 387 387 146 146 219 219 219 219 485 374 374 368 186 323 323 238 6 272 87 87 87 38 162 
68 68 68 68 115 273 151 151 178 178 35 458 96 472 164 198 22 448 448 448 448 464 180 443 443 120 120 120 416 416 233 233 270 49 433 433 433 160 427 247 247 126 126 126 292 326 326 326 326 326 326 408 408 149 149 228 289 491 127 114 0 0 0 0 422 143 458 144 27 389 389 389 389 314 35 196 242 242 33 33 76 465 401 259 354 190 380 288 288 295 143 458 192 183 57 57 57 203 88 69 223 223 223 130 280 277 277 277 277 385 24 227 419 419 439 439 439 439 237 237 237 237 47 47 47 491 491 316 491 80 373 412 412 188 188 118 118 118 118 118 118 402 219 219 152 152 152 132 132 58 58 72 110 110 254 254 240 325 34 145 145 460 460 169 150 342 86 6 472 221 70 46 46 46 46 464 464 255 255 240 314 4 280 106 265 265 265 85 85 146 358 39 39 342 342 224 340 340 466 466 22 283 399 473 65 486 486 460 460 285 449 334 334 382 59 452 229 229 247 126 126 326 326 326 326 326 101 101 101 149 149 228 491 289 320 345 407 407 407 407 35 36 310 107 447 219 219 219 152 152 152 132 236 32 239 384 371 371 278 278 325 242 242 242 379 243 243 36 227 472 472 221 336 336 384 371 374 374 374 374 132 132 132 399 70 473 65 329 329 42 406 467 134 139 139 175 423 423 423 423 423 452 263 229 259 247 312 126 292 292 23 23 23 408 101 149 149 149 228 491 412 287 111 111 111 111 111 438 438 325 34 202 202 202 402 402 162 232 68 68 172 115 470 470 120 120 240 240 314 314 131 393 393 393 155 155 332 332 332 372 372 245 399 399 217 217 217 70 65 65 498 498 498 186 186 54 54 172 224 41 324 324 422 186 162 232 232 68 68 68 68 115 470 470 403 171 171 252 416 458 401 196 196 309 331 307 307 307 61 167 35 35 108 377 87 87 38 164 164 164 164 164 214 214 214 360 200 200 76 458 192 69 223 223 402 66 342 224 344 344 344 449 449 44 44 44 38 164 164 164 214 214 214 214 214 328 200 200 248 248 212 127 114 92 92 92 92 169 35 77 77 66 86 142 397 397 336 276 346 346 265 355 37 37 24 227 419 419 439 439 439 78 237 170 491 421 421 491 491 491 491 341 15 15 15 15 193 193 193 17 17 +17 17 17 17 296 52 52 52 52 52 52 52 52 52 461 51 51 51 184 491 184 184 7 7 7 127 127 258 258 258 258 258 39 342 342 86 86 238 221 336 336 401 310 107 395 395 329 84 84 496 496 496 496 274 274 215 8 96 270 342 86 221 221 144 27 437 437 319 319 53 53 76 205 29 29 469 469 24 325 176 176 328 328 200 200 195 248 49 49 68 68 68 444 444 444 444 444 434 434 434 339 394 212 131 472 472 196 309 479 331 331 265 265 428 146 146 216 300 300 382 236 36 377 87 87 87 88 44 44 44 349 349 234 234 261 261 25 432 432 432 432 330 330 388 195 195 195 195 195 64 212 131 472 472 221 309 479 157 157 157 157 372 313 236 108 377 123 123 123 88 88 88 255 255 251 251 241 431 431 306 306 306 306 306 306 396 203 53 381 217 70 65 65 329 495 406 406 134 134 139 175 175 423 423 423 423 423 263 263 417 417 417 417 237 237 237 47 47 47 491 491 435 435 435 435 373 338 338 338 400 400 400 30 422 94 199 398 278 278 325 449 191 191 191 314 314 478 478 68 68 68 238 6 371 470 443 443 240 325 26 134 359 359 359 474 474 324 324 464 464 426 426 426 426 426 426 282 388 303 303 48 48 417 170 491 170 491 28 28 28 491 28 362 491 362 362 491 362 491 362 362 40 491 491 211 211 369 369 369 369 369 369 21 21 21 21 21 21 21 21 21 21 260 260 260 260 260 260 391 391 391 491 73 289 412 412 287 287 111 111 111 111 438 438 24 384 371 180 84 84 350 274 167 314 36 384 490 490 490 116 479 331 331 265 265 265 85 85 146 146 216 127 300 382 382 313 186 162 232 172 115 231 231 231 231 53 76 76 164 214 214 214 328 200 200 340 340 116 250 250 345 181 181 181 181 35 131 219 152 152 152 422 186 162 232 172 115 273 470 403 403 403 207 301 301 42 147 147 329 
329 329 329 252 143 310 107 395 302 302 302 497 98 98 13 417 417 417 417 237 237 237 237 237 237 237 80 80 491 491 412 412 287 287 111 111 111 438 202 202 402 58 72 110 110 110 460 240 240 35 77 478 68 224 231 231 231 53 90 90 76 465 208 208 441 441 106 481 481 426 426 426 426 203 53 381 471 49 49 342 142 221 196 46 46 46 46 438 186 162 68 68 115 273 279 279 279 279 279 279 375 169 352 352 352 352 427 229 491 247 126 126 292 326 326 326 326 326 326 101 101 149 149 228 228 289 320 159 159 159 159 159 35 196 196 217 473 329 329 329 329 460 329 164 164 485 485 485 485 423 132 378 43 345 141 141 141 281 9 86 86 6 108 119 119 351 351 264 468 468 468 467 134 134 134 8 100 100 497 497 186 162 68 68 115 273 470 443 240 285 449 34 125 125 125 125 348 199 199 277 277 277 385 227 227 419 439 417 417 237 237 237 237 237 237 80 80 80 491 412 412 287 287 111 111 111 438 438 143 144 389 389 389 314 478 478 68 68 172 267 267 267 267 267 301 216 127 114 92 92 92 92 92 240 240 143 35 36 478 66 172 224 273 84 84 16 88 88 111 111 111 111 438 438 416 416 445 445 210 210 210 171 252 173 173 280 34 120 120 120 120 388 303 303 303 48 417 417 417 170 491 491 491 421 128 128 193 193 17 +17 17 17 17 296 363 363 363 363 51 51 228 491 412 412 177 177 177 177 356 478 66 68 68 172 115 273 344 344 344 274 274 186 162 232 232 172 115 273 273 139 293 293 293 122 122 272 34 242 319 203 53 381 217 217 473 65 486 486 460 169 169 164 164 485 485 485 374 422 186 162 68 68 273 470 443 443 240 71 71 342 224 257 257 257 257 453 9 196 217 70 65 480 480 480 480 299 299 299 339 212 34 106 125 125 125 388 94 199 475 475 475 475 475 475 475 475 422 349 164 164 214 214 214 214 328 328 200 200 248 212 127 45 45 45 45 385 131 133 133 364 409 409 409 409 348 94 183 183 451 451 30 30 301 236 36 384 71 71 71 71 71 71 71 71 71 368 453 342 168 106 111 111 111 111 438 464 106 297 297 297 297 297 43 109 109 109 469 186 39 342 86 142 393 261 25 444 213 213 139 139 251 241 81 177 356 236 71 71 142 221 196 70 46 46 46 438 438 236 239 384 485 485 485 374 374 252 449 449 41 41 41 324 3 143 36 377 87 87 87 416 445 445 278 278 173 280 34 120 120 120 275 388 303 303 303 117 404 78 491 491 312 312 312 292 292 292 292 292 292 21 21 21 21 21 21 21 21 408 408 408 149 149 491 412 412 83 55 55 322 67 212 131 34 253 253 253 253 31 342 342 86 142 221 155 332 332 332 332 332 216 283 455 42 42 147 380 288 278 278 278 271 39 342 433 433 105 105 458 192 419 439 439 417 417 237 237 237 237 237 237 237 237 237 237 491 237 80 80 435 80 491 491 127 114 0 0 222 468 356 281 453 9 142 221 336 147 380 288 278 278 31 342 86 86 105 336 270 270 342 224 494 121 203 53 394 394 465 259 190 190 487 104 104 325 41 324 324 301 10 309 398 398 398 398 468 245 335 14 411 411 204 204 204 204 204 204 29 337 337 337 324 422 164 164 164 214 214 214 214 214 200 200 464 415 415 236 129 259 354 354 91 91 206 206 285 285 41 41 324 301 236 239 384 71 71 71 71 71 71 368 342 168 340 340 340 466 466 114 258 258 258 31 142 142 397 397 364 109 109 498 498 134 139 375 375 375 122 122 131 427 229 491 247 126 126 292 292 292 292 326 326 326 326 326 101 101 408 149 228 491 491 127 114 222 222 468 468 356 356 281 9 9 142 42 147 147 380 288 278 278 31 31 342 86 105 105 336 458 270 270 224 89 89 203 53 394 465 465 74 485 213 213 252 215 129 354 100 302 497 497 49 342 58 72 110 202 202 202 202 202 280 176 135 135 200 248 248 310 107 107 395 395 106 153 153 387 122 122 161 300 242 242 116 94 199 223 223 130 198 198 222 222 222 222 406 406 467 467 350 350 350 350 350 350 350 413 413 413 195 199 118 118 118 
+[added data file content: lines of space-separated integer token sequences (values 0–499); original line breaks and `+` diff markers were collapsed during extraction, and the numeric payload is not reproduced here]
426 426 426 426 206 206 206 37 24 34 89 89 446 116 394 90 393 234 234 234 234 261 25 441 424 424 424 182 182 375 375 375 98 98 13 13 417 170 170 47 491 47 491 491 491 2 491 491 316 73 289 289 320 412 188 188 340 340 116 33 394 478 338 338 338 338 395 470 153 153 153 153 387 372 396 396 385 233 227 227 419 439 417 78 47 47 491 47 47 491 80 321 321 80 321 435 209 287 287 353 353 353 353 396 313 236 36 384 490 490 490 31 162 68 115 115 273 308 308 308 396 313 94 94 176 176 135 328 200 200 199 255 154 154 129 401 321 96 66 482 238 272 79 153 153 153 387 387 396 314 196 196 479 398 398 264 264 468 467 467 255 255 215 96 478 342 68 115 273 498 498 498 498 396 173 173 29 29 334 334 59 59 452 263 229 247 247 126 126 326 326 326 326 326 326 326 101 101 101 149 149 228 321 289 320 7 70 65 65 389 428 428 146 240 325 34 202 202 202 402 221 458 27 121 121 121 33 394 76 259 208 208 386 386 386 444 374 374 374 252 325 34 191 191 191 37 24 404 427 229 247 193 193 17 +17 17 17 363 51 51 184 320 320 345 333 333 220 220 402 66 66 68 115 344 344 344 344 274 274 251 251 241 431 374 374 374 374 285 34 469 469 143 458 208 79 459 459 271 31 342 86 26 26 166 166 166 464 464 464 255 255 349 234 234 261 190 380 288 288 403 403 403 207 207 207 37 24 24 404 439 417 417 417 170 170 28 28 491 28 491 362 491 491 362 491 362 491 491 491 362 362 491 362 362 491 40 211 369 369 369 369 21 21 21 21 21 21 21 408 408 408 149 149 228 321 321 320 7 217 70 65 486 486 486 460 460 169 169 164 164 164 485 485 485 485 374 132 132 58 58 72 268 268 268 268 268 88 88 109 84 463 463 463 173 280 29 334 59 59 452 263 263 417 417 417 417 80 321 321 7 7 345 141 141 141 281 162 232 232 232 482 105 336 336 470 470 264 264 264 264 468 468 313 313 314 314 198 22 448 448 448 464 106 106 372 372 372 313 236 236 36 371 485 213 286 286 286 139 302 302 175 175 69 223 223 130 478 232 232 105 105 336 321 354 470 213 213 252 143 192 176 135 135 135 200 248 248 248 393 205 261 25 498 498 498 498 498 396 271 271 39 54 86 238 6 427 427 247 247 126 126 326 326 326 326 101 101 101 149 228 321 321 373 155 155 155 332 332 332 372 372 372 467 253 253 38 162 232 172 172 115 485 374 374 374 348 94 199 253 253 253 99 338 400 400 400 30 422 143 144 27 121 121 121 394 76 76 208 208 386 386 444 444 374 374 252 325 191 191 191 37 314 198 198 45 45 45 183 183 451 30 30 301 378 345 141 141 281 342 342 221 336 144 27 27 351 319 319 319 53 53 176 135 135 200 248 76 75 108 377 123 123 123 123 132 58 156 156 156 156 59 59 452 229 229 247 126 126 326 326 326 326 101 101 408 149 228 321 321 373 400 400 400 30 422 422 162 482 482 482 482 238 272 189 189 189 189 285 34 230 230 230 230 230 215 215 35 74 419 439 439 78 78 47 47 80 80 80 289 289 320 208 79 499 486 486 460 460 169 150 342 105 105 336 354 176 176 135 135 200 248 248 333 333 220 220 220 142 133 364 276 174 174 174 174 319 319 348 348 195 195 90 90 90 393 234 234 234 234 261 261 25 470 278 278 278 330 330 388 195 195 195 250 250 394 32 32 259 354 190 380 380 315 315 315 315 450 450 450 413 413 33 58 58 72 72 72 294 294 294 294 294 294 294 294 294 282 282 388 195 64 212 131 427 321 247 126 126 326 326 326 326 101 149 149 228 321 320 22 5 455 455 72 72 72 294 294 294 294 294 388 348 64 64 212 212 26 302 302 302 175 69 69 69 130 280 44 44 44 99 338 338 338 338 338 395 470 486 486 486 460 460 215 354 41 324 324 324 3 335 14 226 411 411 424 424 424 424 424 424 274 122 122 131 472 393 234 234 261 25 486 486 486 460 460 169 99 436 436 436 60 242 116 116 33 212 131 472 221 321 144 27 437 437 306 306 306 460 215 35 29 469 277 
277 314 401 401 321 354 180 376 376 376 376 376 282 207 37 24 192 192 427 321 247 126 126 23 408 408 408 149 228 321 321 320 127 448 448 448 14 14 411 493 493 493 493 493 216 127 300 334 334 59 452 186 99 338 338 400 400 400 30 422 58 58 72 110 110 139 139 139 293 293 122 122 34 180 113 113 113 113 167 167 36 449 123 123 123 123 183 183 57 57 57 57 57 203 381 381 381 48 48 417 417 417 170 421 421 421 421 491 128 491 128 128 193 193 17 +17 17 17 296 317 317 317 491 491 491 491 491 461 184 321 435 435 321 435 287 287 111 111 111 438 162 342 224 494 494 236 74 470 496 496 496 496 496 274 368 368 9 219 152 152 152 88 353 353 353 245 399 70 473 258 31 54 86 238 272 272 300 245 399 217 473 65 486 460 460 169 164 485 485 485 382 422 458 458 144 27 437 437 151 169 169 164 402 221 401 321 354 29 498 313 313 325 34 462 462 130 402 321 259 79 79 288 360 360 360 200 200 248 445 445 180 171 171 171 252 215 8 354 100 302 375 497 98 185 269 433 390 160 112 112 56 491 312 312 312 187 187 12 12 12 12 12 12 12 23 260 260 260 260 391 391 391 491 321 373 338 400 400 400 400 30 422 162 68 68 115 470 470 120 120 240 314 196 340 116 199 44 44 44 129 259 74 492 236 129 321 445 445 485 485 485 485 485 485 374 374 132 359 81 485 485 134 382 134 359 359 81 166 166 324 422 143 401 401 321 321 144 208 208 208 386 386 386 286 286 286 286 286 286 334 382 59 304 313 186 162 66 482 482 482 482 105 397 336 109 213 213 213 252 143 131 472 393 393 261 343 343 343 343 343 343 343 358 39 39 433 433 160 427 56 247 247 312 126 292 292 326 326 326 326 326 101 101 101 149 149 321 412 287 287 111 111 111 356 356 53 394 212 4 104 104 104 104 104 337 337 337 301 143 144 208 386 431 376 376 376 240 24 36 87 87 87 162 232 172 115 267 267 267 267 267 219 219 477 477 477 477 477 132 13 229 491 247 312 126 292 292 292 23 23 23 101 101 101 149 228 321 412 412 287 111 111 111 378 378 141 141 281 453 142 221 336 420 420 420 416 458 445 485 360 360 360 94 176 135 135 248 76 108 377 87 87 129 354 420 420 420 464 464 44 255 38 349 205 205 261 487 288 288 288 171 171 252 24 131 219 152 152 152 378 43 345 347 347 372 396 313 457 131 221 458 144 27 351 351 319 319 203 53 176 135 328 200 248 393 205 155 332 332 332 332 245 399 429 429 429 429 19 19 229 247 247 126 193 193 193 +17 491 211 491 296 296 363 363 326 101 101 149 149 228 321 321 287 111 111 111 438 58 110 254 254 254 314 196 196 217 473 476 476 476 252 325 34 230 230 230 215 35 196 196 46 46 46 46 438 399 399 217 70 65 480 480 480 480 85 299 299 339 212 131 427 229 247 126 326 326 326 101 149 149 321 321 320 45 45 45 325 118 118 118 118 402 219 152 152 422 236 239 384 371 278 278 314 196 196 242 242 33 90 465 144 27 351 351 319 319 203 53 394 76 205 155 332 332 332 399 399 429 429 429 422 143 108 377 377 87 236 10 479 331 331 428 265 428 428 428 146 207 358 233 131 419 321 247 15 193 193 +17 17 17 363 363 363 51 149 228 228 321 321 287 287 111 111 111 111 378 378 43 389 389 389 314 242 242 394 394 32 259 420 420 420 420 464 44 44 236 129 354 354 278 278 325 34 300 255 349 349 234 234 261 190 487 288 288 288 403 171 207 207 37 24 131 427 491 247 126 126 326 326 326 326 101 149 149 149 228 321 209 83 55 55 55 55 322 94 199 177 177 177 457 389 389 389 314 259 259 420 420 420 301 301 251 251 251 251 251 251 251 241 266 266 266 266 266 173 402 402 26 359 359 81 166 324 422 36 377 87 87 38 162 232 172 26 26 359 444 444 213 252 8 354 89 340 116 199 44 44 43 43 364 276 346 346 346 265 85 85 85 139 139 293 293 122 122 35 401 401 401 75 310 107 107 107 395 395 351 264 264 264 468 468 406 337 337 
324 252 143 36 161 487 487 487 41 324 3 335 14 14 411 297 297 297 297 297 297 297 293 293 497 497 43 43 364 364 276 346 346 428 428 428 146 358 76 449 472 397 397 333 333 220 220 164 142 221 401 321 321 321 354 425 425 431 374 374 374 374 374 132 132 203 203 53 473 340 340 116 466 22 283 455 399 217 70 473 65 65 350 350 413 413 33 33 394 478 338 338 338 395 470 480 480 480 85 299 299 339 64 212 465 384 430 430 430 430 430 465 449 152 152 152 152 349 164 164 214 214 214 214 360 328 328 200 243 233 192 192 419 229 491 312 312 491 187 187 187 201 201 201 201 201 201 201 201 491 201 491 201 201 435 211 211 408 149 321 321 321 219 152 152 152 152 143 458 144 389 389 389 325 34 255 399 217 217 65 486 486 486 460 460 240 310 107 395 242 116 116 219 219 152 152 378 378 347 347 347 236 239 161 397 133 276 109 189 139 139 175 81 176 135 200 464 464 340 116 33 33 250 217 70 70 70 65 306 306 306 306 396 134 215 35 29 100 497 497 497 58 72 72 72 72 437 481 481 481 481 481 481 182 182 182 375 375 375 185 269 342 86 221 336 144 430 430 430 430 430 430 430 430 430 131 449 449 485 152 477 477 374 132 132 13 229 491 247 312 15 15 15 15 193 193 193 17 +17 17 17 296 317 317 317 317 317 491 317 317 461 461 461 461 461 461 461 184 184 184 184 321 320 7 217 217 217 473 329 329 329 329 329 460 169 164 164 164 219 485 485 485 374 132 132 274 58 58 72 110 254 254 254 254 314 401 75 108 119 295 295 295 295 295 143 458 192 242 242 116 466 466 22 283 455 38 162 54 482 482 105 221 336 79 79 499 499 405 206 206 348 199 41 324 324 301 251 241 431 278 278 285 302 497 497 497 58 58 72 110 294 294 294 294 294 294 282 388 64 212 131 335 14 14 411 411 284 405 405 405 206 178 35 35 441 441 109 109 134 313 24 26 26 359 359 474 474 324 464 340 340 340 116 33 58 183 183 257 257 257 257 257 120 50 50 185 185 185 269 433 433 390 18 427 56 247 312 312 126 292 292 326 326 326 326 326 101 101 101 408 149 149 321 289 7 7 7 4 127 361 361 361 361 361 330 388 94 199 89 89 446 116 33 212 212 127 114 361 361 361 264 264 264 264 468 59 452 452 263 263 417 417 414 47 80 321 321 435 373 451 451 30 30 236 325 490 490 38 162 342 115 273 265 265 428 146 146 325 34 191 191 325 133 133 259 181 181 181 181 167 457 75 108 377 87 87 236 325 371 374 374 374 374 132 98 98 48 48 417 417 170 170 102 102 28 28 40 40 40 40 40 40 40 40 40 40 40 491 362 491 218 366 305 305 491 366 366 40 40 40 40 435 435 435 435 435 373 451 451 30 30 30 422 458 144 27 389 389 389 389 196 196 479 331 307 307 61 61 167 167 457 75 108 119 351 351 139 139 139 293 293 293 216 216 114 258 258 31 54 54 238 238 221 321 310 107 395 395 437 91 91 91 85 85 85 85 450 293 293 122 122 131 133 333 333 220 220 198 22 44 236 129 321 208 208 425 386 241 431 84 496 496 88 88 176 176 176 328 200 200 464 106 265 265 265 265 265 85 85 85 207 318 318 39 433 433 160 427 247 247 126 126 126 326 326 326 326 326 326 326 101 101 149 149 228 321 320 127 45 45 45 45 35 198 114 0 0 222 58 58 110 254 254 254 314 401 321 354 137 137 137 94 44 44 44 217 473 65 258 31 31 342 68 68 68 238 6 272 470 470 470 171 171 171 358 358 358 233 321 192 192 419 419 439 439 78 78 78 491 28 491 28 28 28 2 491 491 2 341 341 341 12 12 12 21 21 21 408 408 149 228 321 321 373 451 451 30 30 30 378 378 389 389 389 389 129 259 108 119 295 295 295 295 295 143 458 192 156 156 156 245 245 58 72 72 350 350 350 350 350 350 413 203 381 53 89 89 322 67 466 241 431 443 167 167 457 196 217 65 329 329 42 42 147 147 380 256 139 175 175 423 423 423 423 236 75 371 371 374 374 132 132 216 127 114 92 92 92 92 92 167 385 243 227 419 439 78 78 170 
170 170 47 491 491 2 2 491 2 2 491 491 2 491 316 435 435 316 316 435 321 435 373 338 338 400 400 400 30 422 143 458 144 389 389 389 389 314 242 242 394 76 76 259 420 420 420 301 26 251 241 431 443 443 169 169 402 402 6 272 415 415 385 129 401 321 259 190 380 499 499 428 428 146 146 457 457 147 147 380 288 288 173 173 29 29 495 495 406 467 467 365 330 94 475 475 324 324 58 58 72 110 268 315 315 315 268 450 450 98 98 229 247 15 15 193 17 +17 17 17 363 51 51 228 321 320 309 331 331 231 231 399 399 473 65 486 486 460 240 285 300 382 245 43 364 276 181 181 181 181 167 35 35 196 196 473 258 258 31 342 86 86 6 272 470 470 171 171 252 458 458 192 389 314 314 321 354 137 137 137 399 217 217 473 476 476 476 476 476 476 207 37 24 131 427 229 321 247 312 126 292 292 23 23 23 23 408 408 391 228 228 321 373 66 172 115 273 344 84 274 88 14 14 411 297 297 297 297 297 297 293 293 122 35 458 208 208 441 109 151 151 151 169 150 54 238 238 310 107 60 298 298 298 379 471 471 49 342 89 89 446 67 34 145 443 154 178 96 96 342 105 105 321 354 386 386 386 469 116 94 418 418 418 418 418 418 99 436 436 436 60 298 298 298 379 379 471 471 471 49 9 142 221 196 70 65 428 428 428 146 325 325 34 253 253 453 9 142 133 364 276 109 109 139 139 139 293 293 293 122 35 354 420 420 420 422 36 384 490 490 490 349 349 234 261 25 487 498 498 498 396 313 285 131 34 89 116 33 394 465 377 351 139 139 139 175 58 451 30 30 378 43 141 141 141 31 162 232 68 68 115 273 470 171 171 252 173 402 402 26 359 166 166 301 8 354 354 180 376 376 460 178 178 458 192 415 415 314 472 221 458 208 79 288 288 360 360 434 200 248 248 212 445 445 171 171 171 171 252 215 354 100 100 302 375 375 185 185 269 390 390 18 112 427 56 56 312 312 312 312 292 292 292 292 292 12 12 12 12 12 12 12 12 260 260 260 260 163 163 163 163 163 163 491 316 316 491 316 316 73 289 321 320 287 287 287 111 111 111 85 438 203 53 394 478 162 232 68 115 273 106 499 499 306 396 337 337 464 464 464 111 111 378 88 345 141 281 31 342 26 251 251 241 431 403 171 171 171 358 358 233 321 227 227 419 419 439 439 225 225 47 47 47 491 47 80 80 80 289 451 451 451 30 30 422 162 232 172 115 179 179 120 120 314 457 310 310 338 338 395 499 499 265 265 85 146 146 37 359 359 474 474 474 19 454 454 417 414 170 170 170 47 28 28 2 2 2 491 491 2 2 2 2 491 316 491 316 435 435 289 435 321 144 27 351 319 319 53 255 255 255 251 241 431 235 235 235 235 413 413 98 48 13 13 13 170 321 170 312 187 187 292 292 292 292 23 23 23 23 23 101 101 149 149 228 289 321 321 127 5 5 455 72 72 441 153 153 153 372 396 313 186 54 54 224 50 356 281 281 9 168 106 410 410 410 410 410 173 402 29 495 406 467 340 340 340 466 22 283 455 448 219 219 219 180 180 306 306 306 306 306 306 59 37 37 404 439 439 439 78 78 170 170 28 28 28 491 2 2 491 2 2 2 491 491 2 316 491 316 316 316 73 73 289 321 321 445 445 278 278 173 196 196 429 429 429 219 464 222 222 245 245 245 8 354 180 376 376 376 376 376 282 37 37 233 192 419 419 439 78 170 170 442 442 187 442 187 187 12 12 12 12 260 260 260 149 149 289 289 321 289 209 287 16 16 16 16 16 88 88 111 111 111 111 438 143 35 389 389 389 33 394 76 465 445 445 445 351 351 264 264 264 468 468 468 337 337 324 324 464 277 277 277 385 36 227 419 439 78 78 170 491 47 187 47 47 47 442 442 442 442 442 127 22 5 236 36 36 107 395 351 91 91 91 91 206 206 122 122 35 29 456 456 31 162 9 105 336 74 106 426 426 206 348 64 212 191 191 191 314 401 321 108 107 107 395 485 286 286 286 468 245 349 349 155 262 262 359 359 474 474 474 474 19 454 229 321 247 15 15 15 193 193 17 +17 17 17 363 51 51 228 289 321 188 177 177 177 
325 356 356 356 342 342 224 242 242 116 131 131 72 72 110 443 443 240 173 280 41 41 41 41 19 454 417 417 417 417 170 47 491 47 491 491 491 47 47 80 321 321 435 435 435 209 111 111 111 202 202 402 402 458 27 180 405 405 206 167 457 14 14 14 209 411 297 297 297 297 297 297 297 293 399 70 70 46 46 46 46 46 438 378 43 364 109 109 498 498 134 387 122 122 26 26 359 81 166 324 416 239 458 144 180 484 278 240 314 77 270 342 224 340 340 340 94 199 277 277 277 277 227 419 229 247 247 126 126 326 326 101 101 149 391 80 80 80 80 289 321 354 159 159 159 325 34 177 177 325 356 356 356 31 342 224 242 242 379 131 131 72 72 110 443 443 443 173 173 280 41 41 41 41 19 19 454 454 454 78 170 170 491 312 312 292 292 292 292 292 21 21 21 21 21 21 408 408 149 149 228 228 289 321 209 209 83 55 55 322 322 94 199 118 118 118 118 118 205 177 177 177 177 325 356 356 356 342 342 242 242 116 64 131 472 221 144 445 445 351 351 264 486 468 468 468 337 337 337 324 252 325 34 89 340 116 33 394 212 465 395 395 151 151 169 150 86 86 6 272 34 44 38 162 68 172 115 273 498 498 498 396 240 35 242 242 242 116 250 250 364 364 109 109 403 403 403 207 171 3 252 216 198 22 5 455 72 72 72 72 294 294 294 294 330 64 64 212 302 302 302 497 122 129 259 74 441 441 424 424 497 497 497 49 342 168 180 180 113 113 113 113 450 167 167 131 427 321 247 126 126 326 326 326 101 408 408 149 391 491 321 373 66 68 68 115 273 84 16 88 88 111 111 111 111 438 438 438 35 259 354 180 443 443 285 300 382 313 313 143 458 458 445 445 213 213 213 252 215 354 277 277 277 277 143 259 259 354 420 420 143 458 144 351 494 253 368 453 168 106 111 111 111 438 438 10 10 479 331 84 84 496 274 216 198 448 448 448 464 154 154 154 416 32 96 368 453 453 115 470 470 486 486 376 460 460 178 35 96 96 401 196 196 309 309 479 331 486 486 460 460 178 458 192 192 69 223 130 280 277 277 277 277 385 385 75 227 419 439 78 170 47 47 47 491 491 491 2 2 491 491 316 316 316 73 289 321 321 209 177 177 177 356 356 342 168 44 116 199 154 154 96 96 54 482 238 161 161 487 288 360 360 360 339 53 359 166 166 166 324 14 14 411 411 424 424 424 424 424 424 122 122 122 131 472 221 144 27 437 306 306 306 306 396 215 35 29 277 277 314 401 321 259 354 180 376 376 376 376 120 282 37 233 192 419 427 78 170 491 312 312 312 341 341 341 341 341 12 12 12 12 21 21 326 326 326 326 101 101 149 149 228 289 321 321 209 287 16 16 16 88 88 111 319 319 203 53 394 212 4 104 104 104 104 406 337 337 337 324 422 143 458 208 386 431 376 376 376 460 240 24 36 107 152 152 152 202 402 402 402 259 144 27 27 351 319 319 319 319 203 381 381 117 48 417 417 417 417 197 491 435 80 289 321 209 188 357 357 357 357 357 173 280 242 116 94 118 118 118 118 118 280 177 177 177 177 457 457 364 345 389 389 389 285 34 202 202 202 402 401 259 354 137 137 137 137 33 10 10 479 331 265 265 428 146 146 146 39 86 6 272 87 87 87 162 54 86 26 26 444 444 213 252 215 354 340 340 340 199 44 44 44 43 43 364 276 346 346 265 85 85 85 139 139 293 122 122 314 401 401 75 107 107 395 351 351 264 264 468 468 406 337 337 324 422 36 36 161 161 487 487 487 41 41 19 19 19 454 417 417 421 421 491 421 128 128 128 193 193 17 +17 17 17 296 317 491 317 317 491 184 184 184 184 320 7 345 152 152 152 152 402 221 144 180 189 405 206 167 36 377 87 87 236 161 79 499 499 499 428 146 173 173 280 29 255 251 251 241 235 235 235 235 235 348 248 76 259 74 74 351 213 213 213 213 213 186 39 342 342 224 110 110 110 202 202 202 430 430 430 430 430 243 133 259 345 109 41 41 19 19 454 229 82 229 312 312 126 292 292 292 292 292 292 21 21 21 21 408 408 408 149 149 321 321 320 473 258 
258 31 342 224 494 494 31 162 232 105 105 336 470 470 432 330 330 379 77 342 224 300 300 382 186 186 54 172 273 470 470 120 240 325 177 177 177 378 345 141 141 281 342 168 470 411 171 171 171 171 252 314 401 196 196 217 70 65 265 265 265 85 85 85 139 139 375 375 185 269 433 427 427 247 247 126 126 126 326 326 23 23 23 23 23 101 149 149 149 321 321 287 111 111 111 438 356 203 64 90 212 144 208 386 431 376 376 376 376 85 37 24 35 259 354 420 420 422 143 144 27 351 368 453 168 106 111 111 111 438 438 251 251 251 241 266 266 266 266 173 402 402 221 75 161 161 79 499 499 428 85 146 146 173 173 176 176 176 328 200 200 117 404 404 439 439 225 237 237 260 260 260 260 260 391 391 289 321 321 321 209 287 287 16 16 16 16 16 88 88 177 177 177 177 35 478 478 68 172 172 444 444 444 360 339 339 394 478 478 232 232 68 172 344 344 344 344 344 274 43 43 43 364 364 276 174 319 319 348 348 348 64 64 212 212 300 469 134 349 155 262 262 100 100 497 122 45 45 45 325 111 111 111 203 53 90 212 144 106 88 319 135 135 248 465 377 87 87 87 251 251 251 251 241 278 278 278 173 402 402 345 333 220 220 164 219 477 477 477 88 89 89 446 53 212 354 354 255 251 251 251 251 241 431 235 235 235 235 235 348 248 248 465 449 377 123 123 123 219 219 477 477 477 477 477 132 13 321 247 312 126 126 326 326 326 326 326 326 101 101 149 149 228 321 412 287 111 111 111 438 202 402 6 479 463 463 463 280 29 382 245 8 354 354 134 497 251 241 431 235 235 235 235 348 76 465 108 123 123 123 88 109 475 475 94 475 475 475 301 8 354 106 493 493 240 325 41 41 41 19 454 454 229 491 247 312 126 126 23 408 149 149 228 321 320 7 331 307 307 307 167 457 457 42 147 380 485 213 213 286 286 139 139 175 359 474 474 41 41 19 19 454 454 13 414 170 47 47 47 491 47 491 491 47 491 102 435 80 80 289 321 7 7 354 159 159 159 314 35 22 448 448 464 464 255 38 162 68 115 273 106 265 265 85 85 146 175 81 242 203 250 250 345 141 141 281 453 9 198 22 283 455 43 364 364 276 109 109 498 498 498 396 271 271 39 39 86 86 238 6 227 419 439 78 56 56 28 491 28 491 2 491 2 341 341 12 12 21 21 23 101 101 101 149 149 228 321 287 287 111 111 202 202 202 280 29 106 350 350 350 175 466 166 166 166 301 8 137 137 137 137 94 199 340 340 340 94 199 277 277 385 457 393 205 155 155 332 148 148 148 372 372 245 399 399 217 70 65 319 319 319 319 379 379 243 77 270 433 433 112 427 247 247 126 126 23 408 408 391 228 321 320 320 159 159 159 159 129 259 127 114 92 92 92 92 457 457 141 141 141 281 342 168 168 340 340 116 10 479 331 331 230 230 230 169 169 169 352 352 352 352 352 352 112 112 78 56 421 421 491 15 15 15 193 193 193 17 +17 17 17 363 363 363 51 51 51 228 491 321 321 209 177 177 177 177 356 77 342 142 397 336 345 109 498 498 498 313 186 39 342 68 198 114 114 242 446 116 457 335 401 321 226 321 209 475 475 475 475 475 475 475 475 422 349 164 214 214 214 214 200 248 219 152 152 152 143 458 192 389 389 34 121 121 399 217 217 217 217 473 65 486 486 486 460 460 460 24 310 107 107 242 275 275 275 303 303 117 404 13 78 170 170 491 491 312 187 187 187 187 12 12 12 12 12 408 408 149 149 228 321 320 7 473 258 258 258 31 342 224 494 494 31 162 232 232 105 105 336 470 432 432 330 379 64 77 77 224 300 300 382 186 186 54 273 470 470 240 34 177 177 378 345 141 141 281 9 142 397 364 364 109 109 278 143 458 192 192 469 325 34 223 130 402 196 70 429 429 429 422 108 377 87 87 236 259 108 119 119 437 405 405 405 405 206 178 35 321 26 386 266 266 266 266 266 178 458 96 321 127 114 92 92 92 92 92 167 385 427 82 247 126 126 326 326 326 193 193 193 +17 17 17 296 317 491 184 184 184 184 289 321 320 127 0 0 0 0 
378 354 347 347 347 245 416 129 321 144 180 484 484 484 484 120 37 37 37 24 24 404 414 414 414 47 47 47 47 491 47 80 80 321 321 289 7 219 152 152 152 116 94 331 84 84 84 84 16 274 98 229 247 247 126 326 326 326 326 101 149 228 321 321 320 22 448 464 255 38 162 342 115 273 106 265 265 85 85 146 175 175 81 242 203 394 76 259 74 485 213 213 213 252 215 259 354 100 100 100 497 98 98 98 13 417 417 170 170 170 170 28 491 28 2 2 2 2 2 2 2 2 2 2 2 2 491 316 316 316 73 289 321 289 321 159 159 159 159 35 127 114 0 222 406 467 356 356 281 162 232 232 68 172 115 344 344 344 344 274 274 251 241 431 278 285 285 302 497 497 186 162 162 232 482 482 482 105 336 144 180 496 496 274 215 457 96 393 155 155 332 332 216 216 448 448 448 464 121 121 399 217 217 65 65 486 460 240 240 310 449 107 242 242 116 33 10 10 10 309 331 418 418 418 418 418 252 99 99 436 436 60 60 298 298 116 199 199 340 340 116 94 199 242 466 94 199 459 44 38 31 162 68 68 115 273 273 265 265 85 85 146 146 175 175 81 81 275 203 203 381 381 48 13 229 321 247 312 126 292 292 292 292 292 292 21 21 23 23 23 23 23 23 260 260 260 260 260 391 391 228 321 321 412 287 287 350 350 350 350 350 250 81 166 166 166 422 36 310 395 395 151 151 150 39 86 238 272 34 340 340 116 466 22 448 448 464 464 493 493 493 300 300 382 245 14 14 411 411 153 153 372 372 372 396 349 349 234 234 25 242 275 275 379 379 471 471 49 269 433 390 390 112 112 56 56 56 305 170 28 28 28 491 28 491 28 362 491 362 491 362 362 362 362 40 40 362 362 362 305 362 362 491 218 218 40 40 40 40 435 435 211 21 326 326 408 408 408 149 228 321 177 177 177 177 378 364 345 141 141 141 141 281 453 9 142 336 74 190 487 104 278 325 34 324 324 464 464 121 121 121 64 161 161 487 469 186 54 86 6 272 176 176 328 200 248 76 465 377 87 123 255 255 399 217 473 65 486 486 460 460 240 310 449 242 242 116 394 76 465 214 214 214 328 200 248 49 453 342 168 255 8 354 180 113 113 113 113 167 167 35 198 198 114 114 114 57 57 120 282 203 381 381 381 117 48 229 321 247 193 193 17 +17 17 363 363 51 228 373 489 489 489 489 88 88 254 254 254 314 8 354 137 137 137 33 394 478 478 482 482 482 6 272 371 189 189 424 424 497 122 34 34 242 116 285 199 255 43 43 109 109 403 403 171 301 349 205 155 165 165 165 53 58 156 156 156 156 245 129 129 321 74 74 351 351 351 264 264 468 468 406 11 11 379 379 77 77 342 224 340 340 94 199 156 156 156 245 14 14 411 411 188 121 121 121 53 394 76 205 261 25 469 11 379 379 77 342 342 224 41 41 41 301 143 259 354 62 62 62 62 464 464 44 44 44 129 321 458 208 208 190 190 441 487 487 153 424 424 182 182 497 497 497 497 122 10 10 479 331 498 498 498 498 498 396 271 186 39 323 323 142 489 489 489 489 422 32 239 321 384 371 180 265 265 265 265 85 85 146 24 35 259 354 255 255 349 155 155 148 148 148 387 186 99 400 400 400 30 143 458 144 389 389 314 90 458 144 121 121 203 394 76 4 205 261 25 470 443 443 443 169 271 150 39 433 433 433 160 112 427 56 247 312 312 312 292 292 292 292 292 292 292 292 21 21 21 21 21 21 21 23 23 260 260 260 260 260 260 391 149 228 321 321 412 287 287 111 111 438 219 219 219 485 485 374 186 162 54 86 238 272 272 494 139 175 251 241 431 265 265 85 146 146 464 464 255 43 43 109 109 403 171 171 143 192 192 469 314 314 196 196 479 331 428 428 428 428 146 385 35 75 342 224 89 446 94 199 255 255 217 217 473 65 486 486 460 460 368 310 449 60 242 116 116 394 76 259 214 214 214 214 200 200 471 49 453 26 26 241 266 266 266 266 266 266 416 96 198 198 114 92 92 92 92 92 92 167 385 233 131 229 247 247 126 126 326 326 326 326 326 408 408 149 149 228 491 289 321 321 354 420 420 143 458 192 
485 494 368 342 168 111 111 111 240 325 371 371 278 278 116 33 33 58 72 110 110 202 202 202 402 402 36 119 119 103 103 103 103 85 299 299 203 53 473 340 340 466 22 283 455 236 384 371 93 93 93 93 207 207 207 19 454 263 417 417 417 417 417 170 170 28 491 28 491 491 2 491 2 491 2 2 2 163 316 491 435 435 435 435 321 321 321 435 287 111 111 438 438 458 445 357 357 443 271 31 342 342 198 114 92 92 169 77 342 142 397 345 346 181 428 438 464 464 365 330 203 394 478 172 115 273 344 344 344 274 349 164 164 164 470 278 278 120 330 388 195 195 117 48 417 417 417 170 47 47 47 47 47 47 491 491 47 491 80 80 80 321 435 435 287 287 111 111 111 438 464 365 365 365 330 203 53 64 212 161 79 288 151 240 314 131 393 262 262 100 497 497 349 164 224 470 432 365 330 94 199 331 145 290 290 434 434 339 212 131 180 284 265 265 85 85 207 207 454 454 229 321 247 312 312 126 292 292 292 292 292 292 292 292 292 292 21 21 21 21 21 21 21 21 21 21 21 21 101 101 149 149 228 321 321 320 127 0 0 222 468 356 356 356 453 342 242 116 199 44 44 44 129 35 401 401 401 321 74 351 351 278 278 178 458 192 180 125 125 125 348 250 70 46 46 46 46 46 438 301 8 239 354 106 106 84 496 496 496 496 413 413 413 471 471 49 269 433 390 160 112 56 417 201 201 201 201 193 193 17 +17 17 17 296 317 317 184 184 184 289 209 287 287 111 111 111 438 438 314 32 239 384 371 371 374 374 132 274 251 251 241 431 266 266 266 266 173 402 402 36 108 87 87 88 88 255 255 399 217 217 65 65 486 486 460 460 240 310 107 395 242 275 116 199 199 111 111 85 438 203 203 53 10 10 309 331 331 265 265 428 428 146 146 186 39 342 68 68 224 224 11 116 33 394 472 401 401 321 74 425 425 386 431 319 319 319 203 53 53 53 76 401 259 345 333 333 220 220 402 472 221 239 384 371 278 278 53 53 394 76 259 74 302 302 497 497 49 453 342 168 340 340 116 250 70 46 46 46 46 438 464 464 145 139 139 293 293 122 8 354 354 84 84 496 496 274 185 39 433 433 390 160 112 112 56 56 56 56 28 28 491 491 28 28 491 491 362 491 362 362 362 491 362 362 362 362 362 362 491 362 362 211 211 362 491 369 369 369 369 369 369 369 369 21 21 21 21 21 21 21 260 260 260 260 260 260 391 391 391 491 289 321 321 7 7 345 333 333 220 220 314 32 4 4 127 114 258 258 258 258 258 31 39 342 433 390 390 390 160 160 160 160 97 97 225 225 80 80 80 321 321 7 217 473 329 329 329 329 329 164 164 485 485 485 485 374 368 31 142 221 336 27 121 399 53 76 465 74 351 351 365 365 365 330 388 64 219 398 398 275 275 116 471 471 478 66 482 238 6 272 106 405 405 405 167 215 96 96 75 108 119 437 405 405 405 405 206 206 178 458 192 176 176 328 328 200 303 48 48 417 225 80 80 491 321 80 289 289 320 74 437 437 306 306 396 396 396 35 35 26 359 359 474 474 474 474 19 19 454 229 321 247 126 126 326 326 326 326 101 101 408 391 228 321 289 321 320 354 420 422 143 144 27 494 278 186 99 400 400 400 378 378 141 141 141 281 168 106 113 113 113 206 240 285 34 462 462 402 401 259 259 354 380 380 288 443 120 120 169 169 169 169 352 352 352 352 352 352 97 89 55 322 67 90 90 259 74 74 437 437 306 306 306 396 396 385 35 26 359 359 474 474 324 301 301 354 420 420 420 143 321 144 27 351 253 253 368 453 342 198 198 127 0 0 0 0 58 110 254 254 254 131 133 147 147 288 213 213 143 233 310 107 447 447 447 198 198 22 283 455 8 354 354 329 151 151 416 416 41 41 41 41 19 19 454 454 170 170 491 491 312 491 312 312 292 292 292 326 326 326 326 326 101 101 101 149 149 228 321 320 479 331 307 307 61 285 34 44 44 44 94 199 493 493 493 493 216 300 300 382 245 43 364 364 276 109 498 498 498 498 396 313 313 314 36 430 430 430 430 36 36 310 107 400 400 400 30 422 162 162 68 68 115 
273 470 403 403 207 207 464 464 89 319 348 64 76 465 108 139 139 139 139 497 122 216 0 0 0 0 58 254 254 254 314 26 251 241 431 443 443 443 169 402 402 96 75 472 198 198 22 283 455 4 4 4 280 278 278 278 175 175 81 459 469 37 37 24 310 107 395 89 89 322 322 250 250 347 347 347 236 129 161 79 79 499 499 265 85 85 146 146 173 173 176 176 135 135 200 248 248 384 371 180 315 315 315 450 450 413 94 199 44 44 38 342 342 68 482 6 336 384 371 213 213 213 213 252 215 129 321 26 26 26 81 278 278 285 26 302 497 497 58 58 183 72 351 278 278 278 139 139 375 375 375 375 98 98 13 229 321 247 15 15 193 193 193 17 +17 17 17 363 363 363 51 51 228 321 321 320 5 5 455 42 42 147 380 380 496 496 496 274 274 122 24 131 472 221 259 74 437 306 306 306 306 169 167 36 449 69 223 130 130 402 402 345 345 109 407 407 407 385 36 75 310 395 254 254 254 314 259 354 137 137 137 33 394 394 465 144 27 351 189 189 151 167 167 457 478 478 66 68 68 115 344 344 344 344 344 344 274 236 32 239 384 371 213 213 213 213 215 129 321 354 359 359 474 474 464 340 340 116 394 465 377 377 123 123 198 22 283 455 38 162 232 482 172 115 273 106 405 405 405 405 206 169 402 402 402 6 272 472 472 482 482 482 115 273 106 153 153 387 387 387 139 139 302 302 375 375 98 13 321 247 247 126 126 326 326 326 326 101 149 149 228 321 321 127 45 45 45 35 259 127 5 5 455 129 129 259 354 354 180 486 365 365 365 365 360 200 200 243 243 233 270 270 270 390 390 390 390 97 97 225 225 225 373 155 261 487 288 288 278 330 339 64 310 447 447 238 272 397 345 333 220 220 402 221 129 401 321 354 425 425 241 431 374 374 374 203 53 53 473 176 176 135 328 200 200 248 248 248 364 276 276 346 346 265 428 85 85 139 293 293 293 122 122 401 401 401 310 310 107 107 395 351 470 264 264 468 468 468 337 337 324 422 36 36 161 161 487 487 288 41 41 246 318 49 453 342 168 89 116 116 33 394 394 478 66 68 68 68 26 26 251 241 81 278 278 278 203 53 250 250 250 250 250 276 346 346 428 428 428 146 252 36 472 221 401 401 321 321 354 354 498 498 498 498 396 143 36 310 107 107 395 50 50 50 50 185 185 433 433 160 112 427 247 247 126 126 292 326 326 101 101 149 149 228 289 321 320 347 347 347 186 162 232 232 68 172 115 273 204 204 204 204 280 29 495 134 302 302 497 497 349 349 234 234 261 25 213 213 213 252 252 449 34 255 255 8 259 354 180 230 230 319 173 402 402 198 127 222 222 222 222 222 313 58 72 72 110 110 120 120 120 120 120 37 37 24 471 270 270 433 433 160 18 112 112 56 56 491 28 28 28 491 28 491 28 491 362 362 491 362 362 362 491 491 362 491 491 211 211 102 491 369 369 369 369 21 21 21 21 21 101 101 149 149 321 321 320 127 5 5 236 129 36 310 107 395 351 91 91 91 91 85 85 85 85 139 293 293 122 122 131 472 221 401 321 74 441 189 189 240 285 34 180 113 113 113 113 167 285 449 449 156 156 156 313 58 58 72 72 294 294 294 294 294 294 294 282 388 64 64 212 34 89 89 322 53 212 32 259 354 380 380 189 496 496 274 143 458 192 180 230 230 230 169 169 352 29 44 44 245 8 32 321 354 190 380 288 365 365 365 365 330 388 64 76 76 310 107 107 395 462 462 462 402 402 133 276 276 346 346 486 315 315 315 139 450 293 122 122 131 472 221 401 321 75 74 425 425 386 386 386 431 319 319 319 203 203 381 381 381 381 381 117 198 198 127 45 45 45 236 401 401 401 321 354 190 380 499 151 151 169 169 99 447 447 238 6 272 34 255 416 192 180 432 432 330 379 77 77 342 342 198 22 283 283 38 162 342 115 273 265 265 265 85 146 146 325 34 69 130 130 198 22 283 455 8 259 354 106 151 151 151 416 416 192 41 41 41 41 19 19 454 454 229 321 247 312 15 15 15 15 15 15 15 193 193 193 193 17 +17 17 17 296 363 363 363 363 51 51 51 184 184 
321 184 321 321 209 188 430 430 430 430 342 430 430 430 430 33 64 212 127 114 92 92 92 92 167 457 401 401 401 321 354 354 219 219 485 485 485 374 374 285 285 469 469 349 393 234 155 262 262 100 100 100 100 375 98 98 98 13 13 13 442 442 442 491 442 442 442 442 442 491 102 2 2 2 2 201 40 305 305 305 305 305 40 366 366 102 316 316 316 491 102 491 305 102 102 289 289 321 321 320 181 181 181 181 181 35 449 430 430 430 430 198 114 114 92 92 92 167 457 35 401 75 161 161 161 161 487 487 487 213 213 246 246 246 246 301 26 251 251 241 444 444 444 444 360 360 339 199 176 135 135 135 135 200 200 464 113 113 113 113 167 167 349 155 155 165 165 165 165 466 22 22 283 455 455 259 354 354 180 376 376 365 365 365 328 200 243 76 458 192 483 14 411 411 297 297 297 297 297 297 293 293 497 497 43 364 364 276 346 346 346 428 428 428 146 252 143 36 449 89 89 446 446 33 251 251 251 241 241 431 171 171 171 252 186 39 342 342 342 224 41 41 324 324 301 399 473 476 476 476 476 143 458 192 219 152 152 422 349 164 164 164 214 214 360 360 200 200 248 321 144 192 69 223 223 223 223 223 37 173 352 352 352 402 99 338 400 400 400 400 464 464 464 145 376 376 376 460 169 169 342 342 86 105 6 96 96 272 427 56 247 247 312 126 126 292 292 292 292 292 23 23 23 23 23 260 260 260 260 260 260 391 391 163 491 316 316 316 316 316 316 316 73 289 7 7 7 364 276 276 109 109 139 139 293 293 293 413 309 479 331 331 315 315 315 315 315 450 450 16 98 98 98 13 13 13 229 247 312 312 126 126 126 23 23 23 23 23 260 260 391 391 47 491 491 316 316 80 321 373 412 412 287 287 287 284 306 306 85 438 240 325 34 242 242 94 199 331 84 84 84 16 16 16 16 98 98 98 98 263 13 225 225 225 225 225 225 80 80 80 321 373 66 66 172 179 179 179 179 179 314 196 196 473 65 329 329 329 329 329 164 164 485 485 485 485 485 374 132 98 417 417 417 417 417 417 170 170 170 28 28 28 491 28 491 491 2 491 491 2 491 2 435 2 2 2 366 321 305 305 40 435 40 201 435 435 435 435 435 289 289 321 320 364 276 346 346 265 428 85 146 464 464 44 44 44 8 401 401 321 354 190 190 380 380 499 265 265 265 85 85 146 146 252 325 449 34 255 130 130 402 402 401 321 321 208 208 441 441 441 153 153 153 372 372 396 396 271 186 54 433 433 390 112 427 56 247 247 126 126 326 326 326 326 326 326 326 101 101 101 149 149 228 321 412 44 44 44 8 129 129 259 190 190 190 190 79 380 380 499 499 265 265 265 85 85 85 146 146 146 24 131 335 14 14 226 321 209 411 287 297 297 297 297 297 297 297 293 293 175 175 81 81 340 340 116 33 33 90 250 250 364 364 276 346 346 346 428 428 428 146 358 358 36 131 472 397 345 333 333 220 220 216 44 44 44 251 251 251 251 251 251 241 241 431 266 266 266 266 173 173 173 402 402 402 26 359 359 81 81 324 324 324 301 339 399 217 217 217 217 217 473 65 278 278 31 162 342 86 86 86 6 6 272 41 324 324 324 301 4 4 4 280 470 470 403 171 171 171 171 464 139 139 302 375 375 375 98 263 13 78 78 170 491 421 15 15 193 193 17 +17 17 17 296 363 363 363 51 51 51 228 321 412 287 111 111 111 202 202 202 196 309 479 463 463 463 463 29 382 313 186 162 232 68 68 267 267 267 267 267 339 339 250 250 250 276 174 174 174 319 388 388 303 117 229 229 247 126 126 326 326 101 101 149 149 228 321 289 321 159 159 159 159 325 34 111 111 111 438 438 143 458 445 351 242 116 199 199 255 255 399 217 217 473 65 486 486 486 460 460 240 310 310 107 395 242 242 203 250 250 181 181 181 181 99 338 338 400 400 152 378 378 345 389 389 314 26 26 251 241 241 367 367 367 367 35 458 96 26 386 266 266 266 266 266 266 146 358 143 458 192 419 439 78 170 170 491 28 491 28 28 491 491 2 491 2 102 2 102 102 102 491 491 435 305 289 289 321 
209 287 111 111 111 438 438 325 34 180 84 350 413 348 131 34 463 463 463 463 402 29 29 495 467 467 154 154 458 96 96 232 68 105 336 470 470 151 151 178 35 401 75 272 87 87 8 354 420 420 420 464 464 44 255 8 259 259 190 380 380 499 499 428 85 146 146 35 196 196 473 46 46 46 46 438 186 162 68 68 68 115 273 279 279 279 279 279 279 279 375 375 352 352 352 427 491 247 491 312 126 292 23 23 23 101 101 149 149 228 321 321 287 287 111 111 111 438 203 53 394 478 232 172 115 344 344 344 344 274 58 72 72 437 350 350 350 350 350 203 53 250 250 359 359 166 166 324 324 301 10 479 331 231 231 231 274 8 354 29 469 325 41 324 301 378 345 345 389 139 497 175 335 14 14 209 145 463 463 463 463 280 29 382 245 245 43 364 174 174 174 330 348 76 465 75 377 87 87 399 217 473 65 65 264 264 264 468 468 337 337 337 324 324 301 217 473 429 429 429 429 429 246 246 246 19 454 229 321 247 126 126 126 326 326 326 326 326 23 408 408 408 149 149 391 491 289 321 412 287 287 319 319 348 175 81 431 443 443 31 342 342 177 177 177 177 457 70 70 65 65 428 428 146 215 35 259 420 420 420 464 464 44 44 349 234 234 261 25 148 148 148 372 372 467 467 242 348 250 217 473 473 278 278 99 99 436 436 60 60 116 94 199 470 264 264 468 468 468 337 41 41 19 454 454 417 417 417 417 237 237 80 80 321 412 287 287 287 111 111 438 438 31 342 224 494 494 129 74 74 437 496 496 496 274 368 368 9 168 494 44 349 234 234 205 261 148 148 148 372 372 372 467 467 446 116 250 250 473 473 278 278 99 99 436 60 60 242 116 94 199 264 264 468 468 337 337 41 324 301 399 70 473 65 428 428 146 146 457 35 401 196 242 33 33 394 32 32 259 420 420 420 420 420 324 301 173 280 104 104 104 104 337 337 337 337 301 129 321 74 492 492 236 384 371 278 278 278 143 321 192 485 134 134 134 175 81 300 334 59 452 263 229 229 491 312 15 15 15 15 15 193 193 193 193 17 +17 17 17 296 363 363 51 51 51 184 184 184 289 321 320 354 159 159 240 285 34 111 111 111 111 438 236 35 75 371 371 374 374 132 132 88 58 72 72 496 496 496 496 215 35 96 26 34 45 45 45 31 478 478 68 68 68 115 273 231 231 231 203 53 64 212 384 93 93 93 93 464 464 111 111 111 111 438 99 99 338 395 389 389 389 497 129 401 259 74 483 58 72 110 202 202 202 202 173 402 44 44 44 43 43 364 276 276 346 346 428 428 146 146 358 457 401 401 321 75 161 161 79 487 288 443 443 120 271 271 150 39 433 433 433 160 160 160 112 439 56 56 47 47 491 47 491 47 491 491 316 491 73 73 289 289 321 320 127 114 92 92 92 92 92 240 385 35 131 335 226 188 188 356 356 281 342 9 196 70 70 46 46 46 46 438 58 58 72 72 72 72 72 72 72 72 437 265 428 428 146 146 146 464 459 459 459 31 39 86 86 6 272 106 486 428 85 146 438 239 36 371 485 485 286 139 139 175 175 69 462 462 130 29 498 498 498 498 169 164 164 26 26 359 81 324 324 301 8 259 354 425 386 81 459 271 271 271 39 39 433 390 390 160 112 427 491 247 312 126 126 292 292 292 326 326 326 326 326 326 326 326 326 101 101 149 149 149 228 321 321 287 287 111 111 438 438 143 36 107 395 494 31 342 342 26 26 26 241 266 266 266 266 173 402 401 401 401 259 74 190 487 278 278 325 34 324 324 143 458 321 208 208 208 386 386 431 496 496 496 496 274 274 216 164 270 270 433 160 18 427 56 56 47 47 491 47 47 491 2 2 491 316 316 321 73 289 321 435 209 83 55 55 322 322 212 34 111 111 111 111 438 202 202 402 196 479 331 463 463 463 29 29 382 58 58 72 110 110 254 240 325 34 44 44 129 129 259 74 190 487 278 278 325 324 324 324 236 239 259 161 79 487 288 443 443 169 342 342 224 340 340 340 250 217 70 46 46 46 46 46 438 251 251 241 431 431 428 428 428 146 146 186 402 352 342 224 45 45 325 325 111 111 111 178 458 192 242 242 116 33 
250 456 456 456 456 456 399 217 473 65 432 432 330 203 53 212 212 29 334 334 59 59 452 263 229 321 321 312 312 126 292 292 1 292 292 1 1 1 1 1 23 260 260 408 408 391 391 391 289 289 321 320 159 159 159 285 255 255 402 402 458 144 441 441 441 153 153 372 372 396 186 186 54 54 86 112 427 56 56 201 201 201 201 201 201 201 201 201 201 201 321 435 435 435 320 209 177 177 177 356 356 342 483 14 226 321 411 297 297 297 297 297 297 293 122 216 22 283 455 399 399 138 138 138 138 372 396 313 449 377 87 87 87 251 241 367 367 367 367 458 96 393 393 234 234 261 25 148 148 148 372 245 43 345 109 109 313 236 36 75 108 377 485 489 378 88 356 356 356 281 342 430 242 242 116 212 131 277 277 277 277 277 385 233 75 419 427 56 56 170 170 312 312 292 292 292 1 1 1 408 408 408 408 305 321 209 83 55 55 322 67 466 127 361 361 361 361 361 388 195 117 229 229 247 126 126 193 193 17 +17 17 363 51 51 228 321 209 111 111 111 438 458 192 192 242 116 199 255 255 399 217 473 65 486 486 460 240 240 35 310 107 242 298 116 379 466 45 45 45 285 34 111 111 365 203 203 394 212 161 79 487 288 443 169 150 39 86 86 238 6 336 90 221 321 144 208 153 153 153 387 372 396 313 24 310 107 459 459 271 39 433 68 68 68 359 474 474 474 474 19 19 454 454 417 442 442 170 170 28 28 491 2 491 491 2 491 2 491 2 2 491 316 316 73 289 321 321 7 127 258 258 31 162 342 142 142 196 217 70 65 153 387 387 396 348 94 176 176 328 200 200 248 345 409 409 409 94 199 111 111 111 438 251 241 431 443 443 169 169 352 402 198 198 448 448 464 464 255 38 38 162 232 68 68 115 273 265 265 265 85 85 146 175 175 81 81 242 203 203 53 65 111 111 111 438 349 205 205 261 25 189 139 139 293 122 478 478 66 68 172 115 344 344 344 344 88 88 255 255 186 99 338 338 338 338 338 395 470 290 290 290 290 290 434 339 53 394 212 401 221 321 354 420 420 143 458 192 278 253 368 453 342 168 111 111 111 438 72 110 110 254 254 240 35 321 377 87 87 87 43 364 109 109 264 264 313 216 216 114 258 258 31 31 342 142 142 72 72 72 72 72 72 437 153 481 306 372 406 467 467 469 240 285 34 106 106 424 424 424 424 497 122 122 133 401 321 364 276 109 278 330 348 33 394 77 77 342 224 41 324 324 301 236 321 75 161 79 79 288 288 443 120 271 271 39 39 433 390 160 160 112 427 491 247 312 126 292 292 292 292 326 326 326 326 326 23 23 23 101 101 101 101 101 149 149 228 289 289 321 289 209 209 287 297 297 297 297 297 293 293 216 22 448 448 448 378 106 153 372 372 372 349 349 205 261 25 242 379 379 471 77 342 110 110 110 460 240 314 35 384 87 87 43 43 276 109 109 468 468 240 216 216 57 57 203 217 473 219 219 152 374 116 94 331 84 84 84 84 16 274 98 98 13 13 414 491 170 491 170 187 491 187 187 23 23 101 101 149 149 149 321 209 44 44 44 399 217 70 473 65 498 498 396 313 35 310 107 107 242 116 116 199 34 89 446 116 33 58 58 72 437 496 496 496 496 215 35 35 96 270 342 224 242 242 116 33 466 466 241 431 376 376 376 169 150 150 86 238 272 397 397 109 109 278 278 64 76 449 300 382 313 236 239 259 384 371 84 496 496 413 94 199 158 158 158 252 325 449 191 191 191 314 36 164 119 161 161 487 487 487 337 213 213 324 324 3 3 58 72 72 437 319 319 319 348 64 212 300 382 313 313 314 314 219 219 219 180 180 106 306 306 306 306 306 396 396 37 37 24 77 270 168 168 462 462 402 402 402 345 109 109 330 116 33 394 77 77 224 41 41 324 236 108 377 123 123 216 283 448 448 464 464 255 38 162 68 115 115 106 265 265 265 85 146 299 175 175 81 275 203 203 381 117 48 13 491 491 312 312 126 292 292 292 292 292 21 21 21 23 23 23 260 408 391 391 391 321 321 373 66 68 115 273 231 231 231 319 53 76 76 74 485 213 213 301 8 354 100 497 497 186 162 68 
115 273 470 443 240 325 177 177 177 457 345 141 141 281 9 221 336 354 420 420 143 259 144 27 351 368 368 342 224 30 30 422 143 144 27 389 389 389 314 196 242 242 33 394 478 232 68 172 115 273 443 443 139 175 175 175 81 277 277 37 385 131 404 321 247 247 126 15 15 193 193 193 17 +17 17 296 51 51 184 184 184 289 321 320 159 159 240 199 111 111 111 111 111 438 314 133 133 147 380 180 486 443 240 240 216 300 300 382 245 8 354 255 255 251 251 251 81 444 444 444 444 246 252 173 198 164 45 45 45 34 177 177 177 345 141 141 281 453 342 168 180 113 113 113 285 285 69 223 130 198 22 283 455 455 129 259 144 27 437 480 480 480 146 299 339 64 10 459 459 459 271 342 342 224 69 223 130 280 257 257 257 31 9 142 142 72 72 437 306 306 306 306 306 396 396 385 233 131 133 133 430 430 430 430 430 430 430 430 430 212 131 219 219 219 477 477 477 374 132 132 98 48 13 170 170 170 491 312 312 28 341 341 341 341 341 12 12 12 21 21 21 21 23 23 101 101 149 391 391 73 289 289 321 7 70 409 409 409 399 53 473 429 30 422 143 458 144 180 189 405 405 206 285 34 125 125 125 348 466 22 283 455 236 36 161 161 161 161 487 487 288 290 290 290 290 434 434 434 339 195 404 229 82 247 126 126 326 23 101 101 149 149 321 321 287 111 111 111 349 205 261 25 189 189 139 293 122 35 449 34 253 253 453 453 342 118 118 118 118 402 402 14 226 321 209 411 145 204 204 204 204 204 204 29 337 337 337 301 8 259 354 109 151 240 325 34 41 324 301 399 217 70 65 151 169 150 342 105 221 259 354 420 420 301 301 251 251 241 367 367 367 367 367 458 192 192 176 135 200 200 464 415 415 415 415 457 457 217 429 429 429 464 464 89 203 394 129 401 321 75 74 351 278 278 278 325 449 41 41 324 324 324 464 434 135 328 328 200 248 248 248 429 429 429 429 429 19 19 454 417 417 170 170 170 170 28 491 28 2 491 491 2 491 2 2 491 2 316 491 491 316 73 289 289 321 321 354 159 159 159 285 34 111 111 111 111 111 111 438 438 239 384 371 180 151 151 31 54 54 142 397 397 109 109 189 330 457 394 465 108 377 87 87 43 364 276 109 372 498 396 396 178 458 192 89 340 94 199 255 255 399 217 473 65 486 486 486 460 240 240 36 310 107 242 275 275 116 195 466 45 45 45 45 325 34 111 111 111 111 438 438 58 72 72 72 110 110 110 110 254 254 240 285 34 106 125 125 125 125 466 22 283 455 399 70 65 496 496 496 186 186 238 6 6 472 221 401 401 47 47 491 491 47 80 491 80 401 321 354 354 485 485 219 219 219 485 485 374 374 374 132 132 132 285 449 469 469 469 349 349 155 262 262 100 100 100 497 497 122 129 401 401 401 401 321 75 74 74 437 351 351 290 290 171 171 171 171 171 139 139 139 139 497 497 497 497 122 32 32 32 401 401 321 354 425 425 425 241 431 431 374 374 374 374 132 132 132 132 186 162 232 232 232 68 68 172 115 273 278 278 139 139 293 122 458 458 96 472 221 401 401 75 161 79 487 288 443 443 169 271 150 39 433 433 433 433 160 427 247 247 247 126 126 126 326 326 326 326 326 326 326 326 326 326 101 101 101 149 228 228 321 321 321 354 420 420 422 143 458 192 485 278 368 453 9 397 345 409 409 409 67 219 219 152 152 152 152 14 14 411 411 284 284 284 353 353 353 396 406 467 467 255 255 255 399 217 473 65 486 486 365 460 240 310 107 107 447 242 94 199 176 176 328 200 248 248 219 152 152 152 378 399 70 65 428 428 428 146 143 449 34 253 253 9 142 397 336 109 109 139 139 175 175 81 255 255 217 217 65 486 486 460 240 240 36 310 107 242 242 116 394 478 66 342 224 231 231 231 76 76 198 214 214 214 328 200 248 250 364 276 109 498 498 396 169 164 164 133 364 276 276 346 346 346 265 85 85 85 355 355 375 375 375 98 229 229 321 247 15 15 15 15 15 193 193 193 193 17 +17 17 17 296 363 363 363 363 51 51 51 51 
228 321 321 83 55 55 322 67 131 44 44 44 236 32 401 401 401 401 401 401 321 354 278 278 278 278 278 360 252 416 458 192 445 183 72 72 72 72 110 110 486 486 486 460 460 240 35 131 483 226 226 226 321 411 287 297 297 297 297 297 297 297 293 293 293 122 349 349 234 234 234 234 261 425 425 386 386 431 486 315 315 315 450 88 372 372 304 304 304 368 269 342 342 89 89 446 446 33 10 10 309 479 331 331 284 405 405 206 240 325 176 176 328 200 200 248 248 76 259 74 74 425 425 425 386 386 431 374 374 374 374 374 434 203 381 471 471 49 433 433 97 427 247 247 126 326 326 326 101 149 228 321 83 55 55 322 67 34 44 44 236 129 259 144 27 424 424 424 424 424 424 424 424 424 497 122 122 131 133 133 364 276 346 346 346 405 405 206 206 169 35 36 107 107 395 89 89 446 33 394 90 90 401 401 401 321 75 445 445 445 351 278 278 240 314 90 401 401 321 144 208 425 386 431 431 431 266 266 173 173 402 270 270 342 224 89 89 446 33 33 394 32 32 401 401 401 354 354 374 374 374 374 132 233 385 233 270 270 270 390 390 390 18 112 56 56 56 47 47 491 47 47 47 491 491 435 435 321 435 435 435 287 287 111 111 111 438 349 205 205 261 25 189 139 139 293 167 457 401 321 75 310 107 107 395 286 286 468 468 313 313 285 34 230 230 230 230 215 35 402 133 147 147 147 499 499 428 428 146 146 325 34 255 255 43 364 109 109 403 403 403 207 19 454 229 321 247 126 126 326 326 326 326 101 101 149 149 228 412 83 55 55 55 322 212 34 111 111 111 111 438 121 121 339 394 212 107 180 106 153 387 387 146 252 314 196 217 46 46 46 438 438 129 36 161 161 487 288 278 173 402 96 36 272 377 123 123 216 22 448 448 448 464 464 145 265 265 85 146 146 175 175 81 242 116 212 133 133 333 333 220 220 335 14 14 226 226 321 209 297 297 297 297 297 297 297 293 399 399 70 46 46 46 46 46 438 438 399 217 70 65 265 265 428 428 428 146 358 358 233 36 227 419 419 439 417 417 170 170 170 491 28 28 491 28 442 28 442 362 491 362 102 491 362 491 362 362 102 362 362 491 362 491 362 218 218 491 218 102 369 369 369 369 369 21 21 21 101 101 149 149 228 289 412 287 111 111 111 378 345 141 141 141 141 281 453 342 242 242 116 212 131 44 44 44 32 32 32 321 354 354 278 278 278 385 457 478 478 232 68 68 172 115 470 470 278 120 385 143 458 458 144 27 351 319 319 319 53 176 176 135 200 200 464 106 410 410 410 410 173 280 29 29 495 467 340 340 116 466 22 283 455 8 354 354 180 496 496 496 496 496 274 274 37 24 227 419 427 78 78 170 491 187 187 292 23 23 23 23 101 149 149 228 289 321 320 479 331 265 265 428 146 240 216 300 300 378 43 345 141 141 281 9 221 196 473 258 258 258 342 224 494 494 31 232 232 105 105 336 470 432 432 330 379 64 77 342 224 224 334 334 334 59 452 452 263 321 247 126 126 23 23 101 101 101 149 149 321 321 287 297 297 293 216 216 114 84 84 186 186 338 400 400 400 422 239 310 107 395 395 432 330 94 199 495 495 495 467 134 134 359 359 166 166 166 324 324 464 464 356 356 120 120 271 185 433 433 433 433 160 160 112 417 417 417 237 421 421 491 421 128 128 128 193 17 +17 17 17 296 296 363 363 51 51 51 321 321 373 338 400 400 400 422 422 162 68 115 470 470 443 240 314 35 310 107 400 400 30 422 58 110 110 254 254 254 240 35 242 242 242 33 457 465 108 119 437 103 103 103 146 299 203 53 64 212 377 87 416 416 445 180 278 443 385 385 77 478 66 232 68 172 115 273 470 278 278 120 178 458 458 192 225 225 225 7 276 346 346 405 206 206 35 310 107 135 135 135 248 212 384 87 87 38 342 68 172 267 267 267 267 301 301 216 45 45 45 325 111 111 111 438 438 239 384 371 278 278 116 242 33 90 393 393 234 261 25 106 481 481 481 293 293 175 14 14 410 410 410 410 410 280 29 29 245 245 8 354 153 153 153 
153 372 372 372 37 24 404 439 229 491 247 312 312 187 292 292 292 292 292 21 21 21 21 23 101 408 408 149 321 321 373 400 400 400 30 422 162 162 68 115 470 470 120 240 240 314 310 338 338 400 400 400 30 301 10 10 309 479 331 463 463 463 463 29 382 313 186 186 162 54 115 273 106 481 405 481 293 216 216 283 283 455 236 401 401 321 354 213 213 213 252 325 34 69 223 130 402 196 429 429 429 429 422 393 155 332 332 245 129 321 74 190 190 380 499 499 486 481 481 293 175 175 81 176 328 200 200 255 255 8 8 180 113 113 113 113 113 450 450 167 385 75 227 419 439 78 78 170 491 28 491 491 341 341 12 12 12 21 21 21 21 21 21 101 101 149 391 228 491 289 321 321 320 159 159 159 285 34 118 118 118 118 261 177 177 177 131 90 259 144 445 351 351 443 443 240 215 35 96 96 272 156 156 382 349 205 155 165 165 165 165 53 394 212 212 354 420 420 360 360 360 135 135 200 200 248 248 478 66 68 68 68 115 267 267 267 213 422 186 162 68 68 68 115 273 470 278 278 178 143 458 192 177 177 77 77 342 168 44 44 399 217 217 70 65 498 498 498 396 186 162 54 172 224 41 41 324 464 464 111 111 438 438 239 75 371 371 278 278 314 35 401 259 74 190 190 499 499 499 405 450 293 293 175 175 81 356 356 281 453 430 430 430 430 64 465 34 277 277 277 277 385 385 233 321 419 427 491 491 312 312 312 187 187 12 12 12 12 12 21 21 326 408 408 149 149 321 321 209 55 55 322 322 199 111 111 111 378 43 364 174 174 319 348 325 34 191 191 36 87 87 87 162 68 68 172 267 267 267 267 464 464 464 204 204 204 204 29 29 337 469 164 164 214 214 214 200 248 114 45 177 43 345 141 141 281 86 238 6 377 87 87 8 354 420 420 420 422 162 162 68 68 68 68 267 267 267 267 267 434 339 339 199 125 125 125 125 348 466 114 114 92 92 92 167 457 401 321 354 354 496 496 496 496 274 37 24 131 427 321 247 126 326 326 326 326 326 326 326 101 149 228 289 321 321 354 420 420 422 143 144 27 180 253 368 453 168 111 111 111 438 236 325 371 278 278 278 314 242 242 242 457 10 10 10 309 331 331 84 84 84 274 43 43 109 109 181 216 216 300 300 300 406 467 111 111 111 111 438 240 325 34 463 463 463 280 29 382 382 58 72 72 110 202 202 202 202 402 44 44 116 479 331 493 493 493 493 216 300 495 406 467 467 499 405 206 215 29 29 469 469 236 36 108 119 485 485 485 374 374 330 94 199 469 469 469 325 41 41 41 19 19 19 454 454 229 170 491 491 15 15 15 15 15 15 15 15 193 193 193 193 193 17 +17 17 17 296 363 363 363 51 51 51 51 491 491 184 491 184 289 321 321 287 287 287 284 284 284 16 16 16 16 16 16 16 16 16 16 16 16 16 16 16 16 16 16 16 16 16 16 16 16 16 16 98 98 98 225 98 98 225 225 225 373 164 164 289 321 164 127 127 114 0 0 264 468 468 406 406 467 467 406 467 467 467 44 44 251 251 251 251 251 241 241 431 431 284 284 405 405 206 206 167 457 457 196 70 70 70 138 138 138 138 372 313 313 236 401 401 401 401 75 310 107 107 107 395 395 395 395 264 468 468 406 337 337 324 324 422 143 36 161 161 487 487 41 41 324 318 49 9 9 483 14 321 411 297 297 297 297 297 297 297 175 175 81 242 340 203 53 53 394 76 401 321 354 425 425 425 241 431 431 374 374 374 374 374 374 132 132 132 413 203 381 381 381 404 229 229 247 312 312 126 292 292 292 292 292 292 292 23 23 23 23 23 23 23 101 408 391 491 491 491 289 289 321 127 114 258 258 258 258 31 342 342 342 483 14 14 321 411 287 284 265 265 85 85 85 146 146 139 175 175 175 81 242 275 116 64 212 131 356 356 356 281 342 198 198 22 283 455 455 129 129 401 321 354 425 425 386 431 374 374 374 374 132 399 53 473 324 324 324 464 459 459 459 31 39 86 86 6 272 472 336 321 74 74 425 425 386 386 343 343 343 343 358 358 358 39 39 433 68 172 115 97 225 287 111 111 438 143 36 107 395 494 
31 31 342 26 26 26 251 241 241 266 266 266 173 173 29 277 277 325 285 106 106 297 297 297 293 293 42 42 147 147 380 288 443 443 240 325 325 41 41 324 324 3 464 89 322 94 199 111 111 111 356 203 53 478 478 232 232 68 68 115 344 344 344 344 344 274 274 32 401 401 321 208 208 386 386 431 431 376 376 376 240 285 34 111 111 203 203 53 394 90 465 144 180 84 496 88 176 135 135 248 36 377 87 87 87 251 251 241 431 278 278 173 173 402 183 183 286 286 286 286 264 59 59 452 263 263 417 414 170 170 491 170 28 491 28 28 491 2 491 2 2 491 491 435 435 289 289 321 320 305 287 287 111 111 202 202 280 106 297 297 297 297 88 109 109 469 368 31 342 142 142 72 110 498 498 498 498 313 240 314 198 45 45 45 457 129 259 259 74 190 487 432 330 330 64 64 77 342 342 168 145 329 329 240 131 133 345 345 109 313 285 335 14 14 145 284 265 85 146 146 139 175 81 81 275 275 116 133 133 345 141 141 281 453 342 198 22 283 455 129 129 321 74 190 190 487 487 278 278 325 324 324 464 459 459 459 31 86 86 221 336 321 74 425 386 343 343 343 343 252 186 39 342 342 340 340 340 340 466 22 283 455 43 43 276 109 109 498 498 498 139 139 375 375 375 122 122 131 427 229 247 15 15 193 193 17 +17 17 17 363 363 363 51 51 228 321 83 55 55 322 131 111 111 111 111 438 219 219 219 485 485 374 186 162 54 238 6 161 87 87 464 255 255 399 217 473 65 486 486 460 240 240 310 107 395 242 116 94 199 111 111 378 378 345 141 141 281 342 9 26 26 251 241 431 278 278 173 173 280 176 135 328 200 248 183 183 286 286 286 286 264 59 59 452 263 321 247 247 126 126 292 292 292 23 23 23 23 260 260 391 391 391 316 73 73 289 321 321 354 159 159 159 285 34 111 111 111 111 438 10 10 479 463 463 463 463 463 29 29 382 245 245 42 42 147 147 380 485 485 278 278 359 359 166 166 166 464 464 154 154 458 96 66 86 105 105 336 470 470 151 178 178 96 96 272 191 191 191 325 34 111 111 111 111 438 438 43 364 364 109 109 389 389 120 120 120 37 37 75 419 419 439 78 170 170 170 170 28 491 28 491 28 362 362 491 362 2 362 362 491 491 491 366 491 491 316 316 435 289 321 321 177 177 177 143 36 77 86 86 221 336 384 490 490 490 251 251 251 241 431 428 428 428 428 146 35 35 393 393 155 262 262 100 100 497 43 364 409 409 409 409 33 219 219 152 222 222 406 467 467 467 255 203 217 473 65 486 486 460 460 240 310 107 395 395 469 116 94 418 418 418 418 418 99 436 436 60 298 298 379 471 49 9 142 336 144 27 351 319 319 203 53 90 76 321 161 161 161 487 487 374 374 132 88 88 356 356 356 453 342 430 430 116 64 212 131 277 277 277 277 277 385 385 75 419 427 229 491 312 312 126 292 292 292 1 1 1 1 21 21 21 21 101 408 149 149 228 321 320 320 159 159 159 35 35 198 127 124 124 124 124 124 368 9 142 397 42 147 147 380 288 443 443 240 240 314 131 133 133 147 147 380 496 496 496 274 368 77 270 9 353 353 353 186 186 232 482 172 115 344 344 344 344 274 349 349 234 234 234 234 261 25 319 319 319 240 94 199 41 41 41 41 19 19 19 454 454 414 170 170 312 312 187 187 187 292 292 292 23 23 23 101 101 149 149 228 321 321 320 345 409 409 409 399 473 429 30 301 143 465 144 180 189 189 240 285 94 34 340 340 116 64 76 377 377 123 123 216 216 22 283 455 236 401 321 75 161 161 487 487 288 290 290 290 434 339 199 199 415 457 457 186 338 338 338 395 499 499 306 306 206 293 175 81 81 469 457 457 36 75 108 119 351 315 315 315 315 450 450 413 413 94 199 340 340 466 22 283 455 455 42 42 147 380 288 443 443 240 314 131 133 133 364 147 380 380 496 496 496 274 274 24 77 270 142 221 336 420 420 420 416 445 445 210 210 210 460 330 388 76 384 87 87 87 349 234 261 261 386 431 431 376 376 460 460 169 169 99 436 447 447 221 336 74 74 311 311 
311 311 311 311 311 460 169 150 150 342 86 6 272 427 82 247 126 326 326 326 326 101 101 101 149 149 228 321 287 287 111 111 111 438 438 145 145 376 376 460 460 169 150 150 86 238 6 196 196 217 473 258 258 342 342 224 494 494 31 162 68 105 105 336 354 470 432 330 379 64 77 77 224 300 382 382 245 43 345 181 181 181 167 457 217 217 473 476 476 476 252 314 259 22 57 57 203 53 250 250 147 380 288 288 120 120 240 385 131 229 247 126 193 193 17 +17 17 296 296 51 321 412 55 55 322 322 67 478 338 338 400 400 400 30 30 422 186 232 232 172 115 470 470 240 314 36 310 400 400 400 30 422 236 384 371 371 278 278 314 196 242 242 457 309 479 331 84 84 496 88 88 89 446 203 393 155 332 332 332 245 129 259 74 74 278 278 278 325 41 324 324 324 186 162 232 68 68 115 470 470 171 171 252 143 96 196 479 331 307 307 61 167 457 36 377 87 87 14 145 145 376 460 460 169 150 86 105 221 458 208 495 467 467 475 475 475 475 475 301 399 70 138 138 138 138 372 245 245 129 321 208 441 151 151 151 151 169 99 238 238 310 107 60 298 298 298 379 471 471 270 160 427 247 247 126 126 292 292 292 292 23 23 408 408 408 408 391 321 321 373 373 400 400 400 400 422 162 342 115 470 470 240 285 34 111 111 111 438 399 70 65 65 151 150 150 86 6 34 202 202 202 280 145 145 486 460 460 169 150 86 238 272 161 382 467 44 44 44 38 164 401 321 164 180 180 486 315 315 450 450 169 269 9 168 242 116 64 131 34 106 297 297 297 293 293 42 42 147 380 288 443 443 240 325 325 41 41 19 19 454 454 414 321 247 312 126 126 292 292 292 1 23 23 408 408 149 149 228 321 321 209 287 111 111 111 438 31 342 342 494 494 494 129 74 496 496 496 496 496 368 368 453 168 180 111 111 111 111 111 438 58 72 72 110 110 486 486 486 460 460 388 64 314 401 321 108 108 119 374 374 374 374 374 132 132 132 8 8 354 159 159 159 159 314 229 247 247 126 126 326 326 326 326 326 326 326 326 326 101 101 101 149 228 321 373 72 72 268 268 268 268 268 88 430 430 430 430 219 219 152 152 416 144 27 106 88 350 360 135 339 212 87 87 349 349 261 25 480 480 480 85 299 299 299 64 212 384 180 180 486 113 240 285 285 255 255 8 354 180 113 113 113 113 167 167 164 164 164 214 214 214 200 200 471 49 342 9 118 118 118 280 30 30 30 422 239 371 180 84 350 350 413 285 34 145 145 460 460 169 150 86 142 221 336 208 441 151 151 151 151 169 99 447 238 6 310 60 60 298 298 275 379 471 471 471 49 269 433 160 160 112 112 56 56 170 28 28 491 2 2 2 2 289 321 373 373 326 326 326 326 326 326 101 149 149 228 321 321 83 55 322 322 399 250 181 181 181 181 181 35 401 401 401 401 321 384 371 180 71 71 71 71 368 368 453 342 86 221 196 196 473 476 476 476 143 143 401 401 198 22 283 455 455 42 147 380 380 288 496 496 496 274 37 77 77 323 142 397 336 147 147 380 288 120 120 120 37 24 24 419 439 439 78 417 417 170 170 170 170 28 28 28 28 491 362 491 491 362 362 491 362 491 362 491 362 362 491 40 40 40 40 40 40 40 366 366 366 366 316 316 249 7 7 7 7 7 7 7 7 7 7 7 7 364 364 276 276 109 109 84 443 139 139 139 293 293 413 122 309 479 331 331 315 315 315 315 315 450 450 450 16 16 293 98 98 13 13 13 13 13 78 491 170 312 312 126 23 23 23 23 260 260 391 163 316 491 316 289 373 225 225 225 442 287 287 287 287 287 287 111 111 111 438 438 24 325 34 84 242 116 479 331 84 84 84 84 16 16 16 16 375 98 98 98 263 13 13 13 78 47 47 47 491 47 80 80 321 80 373 66 66 172 179 179 179 179 314 196 196 70 65 329 329 460 460 169 349 352 25 485 485 485 485 485 374 132 98 98 13 417 417 417 170 421 421 491 421 421 491 128 128 128 491 128 128 128 128 128 193 193 17 +17 17 17 363 363 363 149 149 228 321 127 114 0 0 313 313 35 354 420 420 420 301 10 479 331 231 
231 231 274 186 162 482 482 105 6 144 496 496 496 215 457 393 205 155 332 332 332 216 448 448 448 464 255 399 473 65 486 486 460 240 24 310 395 469 242 116 94 418 418 418 418 418 418 99 436 436 60 60 298 298 116 33 394 466 127 114 361 361 361 282 388 303 117 48 13 80 80 80 321 7 364 345 430 430 430 314 35 401 198 127 114 114 264 264 264 59 59 452 452 13 229 247 247 312 312 312 292 292 292 292 292 12 21 1 21 21 21 21 21 23 101 101 149 149 391 316 316 316 73 73 289 289 321 320 159 159 285 34 430 430 430 399 70 65 111 438 438 143 36 108 119 351 405 405 405 206 178 192 192 176 135 135 248 248 465 377 377 374 374 132 399 70 383 383 383 383 383 385 35 310 310 107 447 97 427 56 56 47 491 187 80 491 80 289 321 320 74 485 213 213 252 215 354 29 302 302 175 175 81 353 353 353 353 467 467 297 297 297 293 43 345 109 469 281 342 342 6 36 119 351 351 139 139 175 81 176 135 200 248 248 429 429 429 429 464 464 111 111 111 111 438 438 239 75 371 371 374 374 374 374 132 132 132 98 98 13 414 247 312 312 126 187 292 12 12 12 12 23 23 23 101 149 149 228 321 321 320 345 430 389 236 310 107 152 152 152 378 42 147 380 180 486 486 460 240 216 300 300 495 406 467 111 111 111 438 314 36 371 371 278 278 314 242 242 242 457 457 108 108 119 437 437 405 405 405 405 206 206 215 35 458 192 419 427 229 247 312 126 292 292 23 1 408 408 408 149 228 228 316 316 80 80 289 321 209 188 118 118 118 118 118 261 219 152 152 152 152 186 162 232 172 115 470 470 403 171 171 422 162 342 342 273 273 84 16 88 106 284 481 293 293 293 150 162 232 86 238 272 371 180 106 284 405 405 405 206 206 215 215 233 352 419 13 229 82 312 187 187 187 187 47 47 47 47 491 491 491 316 491 316 80 80 435 435 435 435 209 111 111 111 438 143 458 445 445 445 351 351 351 365 365 365 365 330 388 64 64 77 77 342 68 238 6 272 180 405 405 405 215 215 35 402 345 409 409 94 199 111 111 111 438 399 217 473 473 476 476 476 143 458 192 180 230 230 215 35 35 70 70 46 46 46 438 438 399 217 70 65 480 480 480 480 299 299 339 394 465 108 377 123 123 123 88 277 277 277 277 385 131 34 106 297 297 297 293 216 114 114 84 84 88 88 177 177 177 143 36 77 342 86 238 6 336 371 490 490 490 349 349 261 469 469 469 458 144 27 100 100 100 375 375 122 122 227 227 419 427 56 56 491 312 312 312 12 12 12 12 12 12 12 12 12 12 260 260 260 260 260 491 163 163 163 366 491 366 491 316 491 366 366 491 40 40 40 40 316 289 321 321 289 7 7 217 217 473 486 486 486 460 460 460 169 164 164 164 164 219 219 219 485 477 477 374 374 132 132 98 13 417 417 417 417 237 421 421 128 128 193 17 +17 17 296 296 321 320 345 141 141 281 453 168 121 121 121 33 212 310 395 395 153 153 387 387 387 146 135 135 135 200 248 183 57 57 203 203 243 478 342 172 115 273 279 279 279 279 375 375 375 169 352 352 352 352 112 427 170 491 491 312 312 312 292 292 292 292 292 21 21 21 21 21 21 21 21 21 260 101 101 149 149 228 321 321 320 251 241 266 266 266 266 146 178 35 96 196 196 217 70 65 496 496 496 496 274 186 39 86 238 6 472 221 336 208 208 441 441 346 346 265 428 85 146 146 277 277 385 131 393 393 234 261 261 25 496 496 496 496 274 274 274 274 186 39 433 342 97 451 451 30 30 301 251 251 251 241 266 266 266 266 146 178 35 96 401 36 108 119 437 405 405 405 206 206 178 458 192 469 469 325 34 459 462 130 402 401 75 74 485 213 213 213 252 252 215 259 74 100 100 100 497 497 43 364 345 409 409 409 409 466 466 127 0 0 0 378 43 345 347 347 347 347 245 43 364 345 109 278 139 175 175 81 176 135 135 200 248 248 465 377 87 87 87 239 384 371 374 374 374 216 216 22 283 455 236 108 119 437 405 405 405 405 206 178 192 192 192 135 135 200 248 
248 466 114 57 57 203 394 478 478 232 68 68 115 273 279 279 279 279 279 279 279 375 352 270 433 160 427 247 247 126 326 326 326 326 326 326 101 149 149 228 321 83 55 55 55 322 67 212 384 371 191 191 314 196 479 331 307 307 61 61 285 34 154 154 458 96 66 86 105 105 336 470 151 151 178 178 96 36 272 449 57 57 203 53 394 212 377 87 87 87 458 445 445 445 213 213 213 252 215 129 74 230 230 230 230 230 215 35 259 354 257 257 257 453 453 342 168 432 432 330 330 64 64 64 131 34 223 223 280 277 277 277 277 277 75 227 419 439 427 56 170 170 28 491 28 201 201 201 201 201 491 201 201 201 201 201 201 491 491 491 316 316 491 435 321 321 321 354 159 159 159 159 449 183 451 30 30 30 464 254 254 196 196 309 309 479 463 463 463 463 463 29 29 29 406 467 467 154 154 154 259 96 66 68 68 105 105 336 470 470 151 151 178 35 321 75 272 191 191 236 36 377 87 87 87 88 121 121 121 33 394 212 107 395 153 153 387 387 146 173 216 22 283 455 38 162 342 224 494 494 494 162 232 68 172 115 273 265 265 265 265 85 85 469 469 469 449 449 41 41 324 464 69 223 130 280 44 44 44 251 241 431 278 278 134 26 302 497 497 416 259 144 79 498 498 498 498 498 134 302 302 375 375 375 98 13 13 13 491 170 312 312 312 312 341 341 12 12 12 12 12 12 260 260 260 260 260 260 391 391 391 321 321 289 320 7 364 109 109 278 278 399 217 473 136 136 136 116 33 133 250 347 347 347 347 347 8 8 354 180 486 376 376 460 240 285 34 255 340 116 94 331 331 230 230 230 169 349 352 352 340 340 340 94 199 106 297 297 297 297 293 122 458 465 144 27 370 370 370 370 370 348 64 76 310 436 60 60 298 275 379 471 471 77 433 97 427 82 247 126 126 326 326 326 326 326 101 101 149 149 228 321 321 321 320 159 159 159 159 457 457 251 251 241 431 278 278 285 302 302 497 497 497 122 129 259 144 498 498 498 498 498 498 134 302 302 375 375 375 375 375 185 269 323 323 97 97 225 397 345 347 347 347 347 245 245 43 364 276 276 109 498 498 498 498 498 59 396 271 186 39 54 390 390 390 390 18 112 439 56 56 421 128 491 193 193 17 diff --git a/SpeechLM/dataset/LibriSpeech/hidden_unit/train_sample100.tsv b/SpeechLM/dataset/LibriSpeech/hidden_unit/train_sample100.tsv new file mode 100644 index 0000000000000000000000000000000000000000..77da1d382251d74e02d73cce8e2e35ec6e1f8f0c --- /dev/null +++ b/SpeechLM/dataset/LibriSpeech/hidden_unit/train_sample100.tsv @@ -0,0 +1,101 @@ +/LocalData/dataset/LibriSpeech +train-clean-100/103/1240/103-1240-0000.flac 225360 +train-clean-100/103/1240/103-1240-0001.flac 255120 +train-clean-100/103/1240/103-1240-0002.flac 223120 +train-clean-100/103/1240/103-1240-0003.flac 235360 +train-clean-100/103/1240/103-1240-0004.flac 200240 +train-clean-100/103/1240/103-1240-0005.flac 242800 +train-clean-100/103/1240/103-1240-0006.flac 153280 +train-clean-100/103/1240/103-1240-0007.flac 240560 +train-clean-100/103/1240/103-1240-0008.flac 246960 +train-clean-100/103/1240/103-1240-0009.flac 160480 +train-clean-100/103/1240/103-1240-0010.flac 236880 +train-clean-100/103/1240/103-1240-0011.flac 234480 +train-clean-100/103/1240/103-1240-0012.flac 243040 +train-clean-100/103/1240/103-1240-0013.flac 244160 +train-clean-100/103/1240/103-1240-0014.flac 223360 +train-clean-100/103/1240/103-1240-0015.flac 60960 +train-clean-100/103/1240/103-1240-0016.flac 250640 +train-clean-100/103/1240/103-1240-0017.flac 229040 +train-clean-100/103/1240/103-1240-0018.flac 185760 +train-clean-100/103/1240/103-1240-0019.flac 246480 +train-clean-100/103/1240/103-1240-0020.flac 214640 +train-clean-100/103/1240/103-1240-0021.flac 236960 +train-clean-100/103/1240/103-1240-0022.flac 262000 
+train-clean-100/103/1240/103-1240-0023.flac 194400 +train-clean-100/103/1240/103-1240-0024.flac 244320 +train-clean-100/103/1240/103-1240-0025.flac 241920 +train-clean-100/103/1240/103-1240-0026.flac 133360 +train-clean-100/103/1240/103-1240-0027.flac 223440 +train-clean-100/103/1240/103-1240-0028.flac 250400 +train-clean-100/103/1240/103-1240-0029.flac 244320 +train-clean-100/103/1240/103-1240-0030.flac 232320 +train-clean-100/103/1240/103-1240-0031.flac 269760 +train-clean-100/103/1240/103-1240-0032.flac 236400 +train-clean-100/103/1240/103-1240-0033.flac 230640 +train-clean-100/103/1240/103-1240-0034.flac 246480 +train-clean-100/103/1240/103-1240-0035.flac 256720 +train-clean-100/103/1240/103-1240-0036.flac 200320 +train-clean-100/103/1240/103-1240-0037.flac 237040 +train-clean-100/103/1240/103-1240-0038.flac 114480 +train-clean-100/103/1240/103-1240-0039.flac 230800 +train-clean-100/103/1240/103-1240-0040.flac 234720 +train-clean-100/103/1240/103-1240-0041.flac 216160 +train-clean-100/103/1240/103-1240-0042.flac 249680 +train-clean-100/103/1240/103-1240-0043.flac 236160 +train-clean-100/103/1240/103-1240-0044.flac 262240 +train-clean-100/103/1240/103-1240-0045.flac 250800 +train-clean-100/103/1240/103-1240-0046.flac 222800 +train-clean-100/103/1240/103-1240-0047.flac 206320 +train-clean-100/103/1240/103-1240-0048.flac 236320 +train-clean-100/103/1240/103-1240-0049.flac 244560 +train-clean-100/103/1240/103-1240-0050.flac 224400 +train-clean-100/103/1240/103-1240-0051.flac 245760 +train-clean-100/103/1240/103-1240-0052.flac 236640 +train-clean-100/103/1240/103-1240-0053.flac 218640 +train-clean-100/103/1240/103-1240-0054.flac 261360 +train-clean-100/103/1240/103-1240-0055.flac 179920 +train-clean-100/103/1240/103-1240-0056.flac 229040 +train-clean-100/103/1240/103-1240-0057.flac 109680 +train-clean-100/103/1241/103-1241-0000.flac 255440 +train-clean-100/103/1241/103-1241-0001.flac 248800 +train-clean-100/103/1241/103-1241-0002.flac 249040 +train-clean-100/103/1241/103-1241-0003.flac 222160 +train-clean-100/103/1241/103-1241-0004.flac 236080 +train-clean-100/103/1241/103-1241-0005.flac 224400 +train-clean-100/103/1241/103-1241-0006.flac 243760 +train-clean-100/103/1241/103-1241-0007.flac 242320 +train-clean-100/103/1241/103-1241-0008.flac 242160 +train-clean-100/103/1241/103-1241-0009.flac 222400 +train-clean-100/103/1241/103-1241-0010.flac 253920 +train-clean-100/103/1241/103-1241-0011.flac 231760 +train-clean-100/103/1241/103-1241-0012.flac 239680 +train-clean-100/103/1241/103-1241-0013.flac 236960 +train-clean-100/103/1241/103-1241-0014.flac 242080 +train-clean-100/103/1241/103-1241-0015.flac 224160 +train-clean-100/103/1241/103-1241-0016.flac 234640 +train-clean-100/103/1241/103-1241-0017.flac 254240 +train-clean-100/103/1241/103-1241-0018.flac 150960 +train-clean-100/103/1241/103-1241-0019.flac 48400 +train-clean-100/103/1241/103-1241-0020.flac 155360 +train-clean-100/103/1241/103-1241-0021.flac 242880 +train-clean-100/103/1241/103-1241-0022.flac 261600 +train-clean-100/103/1241/103-1241-0023.flac 266720 +train-clean-100/103/1241/103-1241-0024.flac 254240 +train-clean-100/103/1241/103-1241-0025.flac 77280 +train-clean-100/103/1241/103-1241-0026.flac 176080 +train-clean-100/103/1241/103-1241-0027.flac 238080 +train-clean-100/103/1241/103-1241-0028.flac 248880 +train-clean-100/103/1241/103-1241-0029.flac 244960 +train-clean-100/103/1241/103-1241-0030.flac 247520 +train-clean-100/103/1241/103-1241-0031.flac 209600 +train-clean-100/103/1241/103-1241-0032.flac 224080 
+train-clean-100/103/1241/103-1241-0033.flac 251920 +train-clean-100/103/1241/103-1241-0034.flac 270560 +train-clean-100/103/1241/103-1241-0035.flac 248800 +train-clean-100/103/1241/103-1241-0036.flac 249040 +train-clean-100/103/1241/103-1241-0037.flac 204400 +train-clean-100/103/1241/103-1241-0038.flac 238960 +train-clean-100/103/1241/103-1241-0039.flac 258160 +train-clean-100/103/1241/103-1241-0040.flac 220560 +train-clean-100/103/1241/103-1241-0041.flac 252240 diff --git a/SpeechLM/dataset/LibriSpeech/phone_unit/dict.phn.txt b/SpeechLM/dataset/LibriSpeech/phone_unit/dict.phn.txt new file mode 100644 index 0000000000000000000000000000000000000000..47b7a03cc4b736752fd0ee578a56fafcb0e242b3 --- /dev/null +++ b/SpeechLM/dataset/LibriSpeech/phone_unit/dict.phn.txt @@ -0,0 +1,364 @@ +0 0 +1 1 +2 2 +3 3 +4 4 +5 5 +6 6 +7 7 +8 8 +9 9 +10 10 +11 11 +12 12 +13 13 +14 14 +15 15 +16 16 +17 17 +18 18 +19 19 +20 20 +21 21 +22 22 +23 23 +24 24 +25 25 +26 26 +27 27 +28 28 +29 29 +30 30 +31 31 +32 32 +33 33 +34 34 +35 35 +36 36 +37 37 +38 38 +39 39 +40 40 +41 41 +42 42 +43 43 +44 44 +45 45 +46 46 +47 47 +48 48 +49 49 +50 50 +51 51 +52 52 +53 53 +54 54 +55 55 +56 56 +57 57 +58 58 +59 59 +60 60 +61 61 +62 62 +63 63 +64 64 +65 65 +66 66 +67 67 +68 68 +69 69 +70 70 +71 71 +72 72 +73 73 +74 74 +75 75 +76 76 +77 77 +78 78 +79 79 +80 80 +81 81 +82 82 +83 83 +84 84 +85 85 +86 86 +87 87 +88 88 +89 89 +90 90 +91 91 +92 92 +93 93 +94 94 +95 95 +96 96 +97 97 +98 98 +99 99 +100 100 +101 101 +102 102 +103 103 +104 104 +105 105 +106 106 +107 107 +108 108 +109 109 +110 110 +111 111 +112 112 +113 113 +114 114 +115 115 +116 116 +117 117 +118 118 +119 119 +120 120 +121 121 +122 122 +123 123 +124 124 +125 125 +126 126 +127 127 +128 128 +129 129 +130 130 +131 131 +132 132 +133 133 +134 134 +135 135 +136 136 +137 137 +138 138 +139 139 +140 140 +141 141 +142 142 +143 143 +144 144 +145 145 +146 146 +147 147 +148 148 +149 149 +150 150 +151 151 +152 152 +153 153 +154 154 +155 155 +156 156 +157 157 +158 158 +159 159 +160 160 +161 161 +162 162 +163 163 +164 164 +165 165 +166 166 +167 167 +168 168 +169 169 +170 170 +171 171 +172 172 +173 173 +174 174 +175 175 +176 176 +177 177 +178 178 +179 179 +180 180 +181 181 +182 182 +183 183 +184 184 +185 185 +186 186 +187 187 +188 188 +189 189 +190 190 +191 191 +192 192 +193 193 +194 194 +195 195 +196 196 +197 197 +198 198 +199 199 +200 200 +201 201 +202 202 +203 203 +204 204 +205 205 +206 206 +207 207 +208 208 +209 209 +210 210 +211 211 +212 212 +213 213 +214 214 +215 215 +216 216 +217 217 +218 218 +219 219 +220 220 +221 221 +222 222 +223 223 +224 224 +225 225 +226 226 +227 227 +228 228 +229 229 +230 230 +231 231 +232 232 +233 233 +234 234 +235 235 +236 236 +237 237 +238 238 +239 239 +240 240 +241 241 +242 242 +243 243 +244 244 +245 245 +246 246 +247 247 +248 248 +249 249 +250 250 +251 251 +252 252 +253 253 +254 254 +255 255 +256 256 +257 257 +258 258 +259 259 +260 260 +261 261 +262 262 +263 263 +264 264 +265 265 +266 266 +267 267 +268 268 +269 269 +270 270 +271 271 +272 272 +273 273 +274 274 +275 275 +276 276 +277 277 +278 278 +279 279 +280 280 +281 281 +282 282 +283 283 +284 284 +285 285 +286 286 +287 287 +288 288 +289 289 +290 290 +291 291 +292 292 +293 293 +294 294 +295 295 +296 296 +297 297 +298 298 +299 299 +300 300 +301 301 +302 302 +303 303 +304 304 +305 305 +306 306 +307 307 +308 308 +309 309 +310 310 +311 311 +312 312 +313 313 +314 314 +315 315 +316 316 +317 317 +318 318 +319 319 +320 320 +321 321 +322 322 +323 323 +324 324 +325 325 +326 326 +327 327 +328 328 +329 329 +330 330 +331 
331 +332 332 +333 333 +334 334 +335 335 +336 336 +337 337 +338 338 +339 339 +340 340 +341 341 +342 342 +343 343 +344 344 +345 345 +346 346 +347 347 +348 348 +349 349 +350 350 +351 351 +352 352 +353 353 +354 354 +355 355 +356 356 +357 357 +358 358 +359 359 +360 360 +361 361 +362 362 +363 363 diff --git a/SpeechLM/dataset/LibriSpeech/phone_unit/train_sample100.phn b/SpeechLM/dataset/LibriSpeech/phone_unit/train_sample100.phn new file mode 100644 index 0000000000000000000000000000000000000000..6550e52f92d0b35fa7bdc3b0676046893840b7ef --- /dev/null +++ b/SpeechLM/dataset/LibriSpeech/phone_unit/train_sample100.phn @@ -0,0 +1,100 @@ +1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 111 111 111 111 111 111 111 111 111 111 111 111 37 37 37 37 37 37 37 273 273 273 273 273 273 289 289 289 289 289 144 144 144 144 144 144 144 331 331 331 331 331 331 331 331 331 331 53 53 53 53 53 53 53 53 53 232 232 232 232 232 232 232 232 232 232 232 232 232 232 232 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 227 227 227 227 227 227 227 227 227 193 193 193 193 193 281 281 281 281 281 281 281 281 189 189 189 189 340 340 340 340 340 340 340 340 340 340 340 275 275 275 275 275 275 165 165 165 165 165 165 165 165 165 113 113 113 113 113 113 113 113 49 49 49 49 49 49 224 224 224 223 223 223 223 223 223 223 223 223 223 193 193 193 193 193 193 193 193 193 233 233 233 233 233 233 233 233 233 116 116 116 116 116 116 116 116 116 1 1 1 187 187 187 187 187 187 187 187 187 340 340 340 340 340 279 279 279 279 279 279 279 49 49 49 49 49 273 273 273 273 273 273 273 273 277 277 277 277 277 101 101 101 101 101 101 101 101 101 101 101 101 101 101 101 101 101 101 101 341 341 341 341 341 341 341 341 341 341 341 116 116 116 116 116 116 116 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 227 227 227 227 227 227 227 227 227 193 193 193 193 193 281 281 281 281 281 281 281 281 189 189 189 189 189 189 189 189 340 340 340 340 340 340 340 340 340 340 275 275 275 275 275 275 165 165 165 165 165 165 165 165 113 113 113 113 113 113 113 49 49 49 49 49 49 224 224 224 223 223 223 223 223 223 223 223 223 223 223 223 193 193 193 193 193 193 193 193 193 193 193 233 233 233 233 233 233 233 233 233 233 233 233 116 116 116 116 116 116 116 116 116 116 116 116 116 1 1 1 1 1 1 1 223 223 223 223 223 223 223 223 223 223 193 193 193 193 193 193 329 329 329 329 329 329 116 116 116 1 1 1 1 1 1 1 1 215 215 215 215 215 215 215 215 215 215 215 215 53 53 53 53 53 53 53 53 53 281 281 281 281 281 281 281 281 281 281 288 288 288 288 288 288 288 331 331 331 331 133 133 133 133 133 133 276 276 276 276 276 276 119 119 119 119 119 204 204 204 204 204 204 204 204 204 204 35 35 35 35 35 35 35 35 35 35 35 35 35 35 35 35 329 329 329 329 329 49 49 49 49 233 233 233 233 233 233 233 233 233 233 225 225 225 225 225 212 212 212 212 212 212 212 212 212 212 212 212 212 227 227 227 227 227 227 227 227 227 227 227 227 227 227 227 227 165 165 165 165 165 165 165 165 165 165 165 165 165 165 165 232 232 232 232 232 232 232 232 232 232 232 232 232 275 275 275 275 275 275 275 275 275 249 249 249 249 249 249 249 249 249 249 249 249 249 249 249 249 249 249 249 116 116 116 116 116 116 116 116 116 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 115 115 115 115 115 115 115 
193 193 193 193 193 193 193 273 273 273 273 273 273 288 288 288 288 288 288 288 1 1 1 1 1 1 1 1 1 1 115 115 115 115 115 115 115 85 85 85 85 85 85 85 85 85 85 85 85 85 85 85 85 85 85 232 232 232 232 232 232 187 187 187 233 233 233 233 289 289 289 289 289 289 289 289 320 320 320 320 320 320 50 50 50 50 50 50 50 223 223 223 223 223 223 193 193 193 193 193 289 289 289 289 289 49 49 49 49 224 224 224 224 224 224 224 179 179 179 179 179 179 179 179 179 179 179 179 21 21 21 21 21 21 225 225 225 225 225 225 225 225 225 225 244 244 244 244 244 244 244 244 244 244 244 244 244 244 244 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 171 171 171 171 171 171 171 171 171 171 171 171 171 171 277 277 277 277 277 193 193 193 193 193 233 233 233 233 233 233 233 233 217 217 217 217 217 217 217 217 217 116 116 116 331 331 331 331 331 189 189 189 189 292 292 292 292 292 292 67 67 67 67 67 67 67 67 67 67 67 67 67 67 225 225 225 225 225 225 117 117 117 117 117 117 117 145 145 145 145 145 145 145 145 145 145 340 340 340 340 340 340 340 340 47 47 47 47 47 47 233 233 233 233 233 116 116 116 116 116 223 223 223 223 165 165 165 165 165 165 165 165 165 165 165 165 117 117 117 117 117 205 205 205 205 205 205 205 205 205 340 340 340 340 340 340 340 340 340 191 191 191 191 191 191 191 277 277 277 277 277 277 277 117 117 117 117 117 117 277 277 277 277 277 25 25 25 25 25 25 25 25 273 273 273 273 273 273 273 273 280 280 280 280 280 280 280 280 47 47 47 47 233 233 233 233 116 116 116 287 287 287 287 287 277 277 277 277 49 49 49 49 329 329 329 329 329 149 149 149 149 149 149 149 149 149 149 149 149 149 149 281 281 281 281 281 281 288 288 288 288 288 288 288 107 107 107 107 107 107 100 100 100 100 100 100 100 100 100 100 50 50 50 50 50 50 107 107 107 107 107 107 107 107 107 107 107 107 277 277 277 277 277 305 305 305 305 305 305 305 305 305 220 220 220 220 220 220 220 220 220 220 220 220 220 1 1 1 1 1 1 1 1 1 1 1 +1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 119 119 119 119 49 49 49 288 288 288 288 179 179 179 179 179 179 179 179 179 179 179 179 179 179 179 37 37 37 37 37 37 116 116 116 116 116 191 191 191 191 191 191 289 289 289 289 289 289 280 280 280 280 279 279 279 279 279 279 279 279 279 279 279 279 69 69 69 69 69 69 69 69 69 277 277 277 277 277 277 277 277 277 280 280 280 280 280 280 280 280 280 47 47 47 47 47 47 333 333 333 333 333 333 333 333 333 333 333 164 164 164 164 164 164 164 164 164 164 164 164 164 164 164 164 164 164 164 164 107 107 107 107 107 107 107 107 107 107 37 37 37 37 37 37 37 37 37 220 220 220 220 220 220 220 220 220 220 187 187 187 187 232 232 232 232 232 119 119 119 52 52 52 52 331 331 331 331 331 331 331 331 331 331 331 331 331 331 331 331 331 305 305 305 305 305 117 117 117 117 117 117 340 340 340 340 340 47 47 47 328 328 328 328 119 119 119 119 119 204 204 204 204 204 204 204 204 247 247 247 247 247 247 247 247 247 225 225 225 225 225 116 116 116 116 116 116 116 219 219 219 219 219 219 219 219 219 219 53 53 53 53 53 53 53 293 293 293 293 293 293 293 293 293 293 109 109 109 109 109 145 145 145 145 145 145 145 145 145 288 288 288 288 288 288 271 271 271 271 271 271 271 271 271 225 225 225 225 165 165 165 165 165 165 165 165 165 165 165 280 280 280 280 280 280 280 280 280 280 280 280 280 280 280 280 280 280 280 280 280 280 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 187 187 187 187 187 187 187 187 187 187 187 187 288 288 288 288 288 288 331 331 331 331 49 49 49 49 49 340 340 340 340 340 340 340 275 275 275 275 189 189 189 189 273 
273 273 273 273 273 273 273 337 337 337 337 321 321 321 321 321 321 321 289 289 289 289 289 189 189 189 189 189 189 189 116 116 116 116 287 287 287 188 188 188 188 107 107 107 107 107 107 208 208 208 208 208 208 208 208 208 47 47 47 232 232 232 232 232 232 232 232 232 191 191 191 191 191 191 191 233 233 233 233 233 233 233 233 289 289 289 289 289 289 289 277 277 277 49 49 49 49 221 221 221 221 221 221 221 49 49 49 49 49 49 49 49 49 288 288 288 288 179 179 179 179 179 179 179 179 179 179 179 179 179 179 179 179 179 179 133 133 133 133 133 133 117 117 117 117 117 117 117 117 117 117 225 225 225 225 225 225 73 73 73 73 73 73 73 73 73 73 236 236 236 236 236 236 236 236 236 236 236 236 236 107 107 107 107 107 277 277 277 277 277 277 305 305 305 305 305 305 305 220 220 220 220 220 220 220 220 220 220 187 187 187 187 232 232 232 232 187 187 187 289 289 289 289 289 280 280 280 280 280 1 1 1 147 147 147 147 147 147 147 147 147 147 147 147 147 225 225 225 225 225 205 205 205 205 205 144 144 144 144 144 144 144 144 219 219 219 219 219 219 219 219 219 219 219 219 69 69 69 69 69 69 69 69 277 277 277 277 277 277 280 280 280 280 280 280 280 280 291 291 291 291 291 291 277 277 277 277 320 320 320 320 119 119 119 119 119 249 249 249 249 249 249 249 249 249 249 249 249 340 340 340 340 340 340 340 340 340 340 340 331 331 331 331 331 331 331 331 305 305 305 305 305 305 305 117 117 117 117 117 117 117 117 340 340 340 340 340 340 340 340 340 340 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 331 331 331 331 331 331 331 331 193 193 193 193 292 292 292 292 292 292 292 292 292 292 292 292 292 292 115 115 115 115 115 115 115 115 21 21 21 21 21 21 21 21 21 277 277 277 277 277 277 277 277 220 220 220 220 220 220 220 220 220 279 279 279 279 279 279 279 279 279 279 279 279 279 279 279 279 279 279 209 209 209 209 209 209 209 221 221 221 221 221 221 221 277 277 277 277 277 189 189 189 189 289 289 289 289 289 289 280 280 280 280 280 280 47 47 47 47 328 328 328 328 328 328 328 328 271 271 271 271 271 271 271 271 271 271 271 271 271 321 321 321 321 321 321 321 321 321 321 321 224 224 224 224 224 224 224 224 224 224 224 224 224 47 47 47 47 47 233 233 233 233 116 116 116 116 116 219 219 219 219 219 219 219 219 219 219 219 33 33 33 33 33 33 33 33 33 281 281 281 281 281 281 281 281 281 281 281 221 221 221 221 221 221 221 165 165 165 165 165 165 165 165 165 165 165 165 165 165 165 165 165 165 116 116 116 116 116 116 116 116 116 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 107 107 107 107 107 107 53 53 53 53 53 53 53 53 53 53 53 288 288 288 288 288 288 288 1 1 1 107 107 107 107 100 100 100 100 100 100 100 100 100 119 119 119 119 48 48 48 48 287 287 287 287 287 287 287 287 287 287 287 287 287 101 101 101 101 101 101 101 101 101 228 228 228 228 228 228 228 187 187 187 288 288 288 288 288 288 288 275 275 275 275 275 275 275 275 275 209 209 209 209 209 209 209 113 113 113 113 113 113 113 113 113 288 288 288 288 288 223 223 223 223 223 223 223 223 193 193 193 193 193 193 193 193 233 233 233 233 233 233 233 117 117 117 117 340 340 340 340 340 179 179 179 179 179 179 179 179 179 179 179 21 21 21 21 21 21 225 225 225 225 225 225 225 225 225 225 244 244 244 244 244 244 244 244 244 244 244 244 244 244 244 244 244 244 191 191 191 191 191 191 191 191 191 191 288 288 288 288 288 288 288 288 331 331 331 331 331 49 49 49 49 340 340 340 340 340 50 50 50 50 50 219 219 219 219 219 219 219 219 219 219 219 219 333 333 333 333 333 333 101 101 101 101 101 101 101 101 101 101 101 
101 101 101 101 49 49 49 49 49 49 49 288 288 288 288 288 288 288 1 1 1 331 331 331 331 331 331 331 331 331 331 331 133 133 133 133 133 133 224 224 224 224 224 224 224 224 219 219 219 219 219 219 219 49 49 49 49 233 233 233 233 233 233 233 117 117 117 117 117 117 53 53 53 53 53 53 53 53 221 221 221 221 221 221 289 289 289 289 289 289 49 49 49 49 49 49 116 116 116 116 116 116 223 223 223 223 223 193 193 193 289 289 289 289 289 49 49 49 49 224 224 224 224 224 279 279 279 279 279 279 279 289 289 289 289 289 289 289 277 277 277 277 277 209 209 209 209 209 209 209 209 209 209 209 209 228 228 228 228 228 228 228 228 228 228 1 1 1 1 1 1 1 1 1 1 1 1 1 +1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 171 171 171 171 171 171 171 171 171 171 171 171 69 69 69 69 276 276 276 276 276 276 231 231 231 231 231 231 21 21 21 21 21 21 21 21 21 21 288 288 288 288 288 288 207 207 207 207 207 207 207 207 207 207 329 329 329 329 329 329 189 189 189 189 232 232 232 232 232 50 50 50 50 50 50 107 107 107 107 107 107 107 107 107 107 107 277 277 277 277 277 277 277 277 277 305 305 305 305 305 305 220 220 220 220 220 220 220 220 220 220 220 220 220 220 220 220 220 220 220 220 220 220 220 1 1 1 1 219 219 219 219 219 219 219 219 219 305 305 305 305 116 116 116 116 116 116 275 275 275 275 275 275 275 275 275 275 53 53 53 53 53 53 53 232 232 232 232 232 232 232 232 232 271 271 271 271 271 271 271 271 271 271 271 37 37 37 37 37 37 37 37 37 37 37 37 281 281 281 281 281 281 281 281 288 288 288 288 227 227 227 227 227 193 193 193 193 193 193 281 281 281 281 281 281 281 189 189 189 189 189 340 340 340 340 340 340 340 275 275 275 275 275 275 165 165 165 165 165 165 165 113 113 113 113 113 113 49 49 49 49 49 49 224 224 224 223 223 223 223 223 223 223 223 223 223 223 193 193 193 193 193 233 233 233 233 233 233 233 233 233 233 117 117 117 117 340 340 340 340 340 340 340 115 115 115 115 115 115 69 69 69 69 69 69 276 276 276 276 276 276 331 331 331 331 331 331 331 189 189 189 121 121 121 121 121 85 85 85 85 85 85 85 85 85 85 85 85 85 288 288 288 288 288 288 115 115 115 115 115 320 320 320 320 320 320 320 320 320 320 320 275 275 275 275 275 275 275 189 189 189 189 189 177 177 177 177 177 177 177 21 21 21 21 21 21 21 21 21 21 21 277 277 277 277 277 277 277 116 116 116 116 116 116 171 171 171 171 171 171 171 171 144 144 144 144 144 144 115 115 115 115 115 115 115 115 115 115 209 209 209 209 209 209 209 209 209 281 281 281 281 281 281 281 281 49 49 49 49 233 233 233 233 233 233 233 233 233 233 233 281 281 281 281 281 281 281 281 281 281 281 204 204 204 204 204 204 204 204 204 204 204 204 204 204 35 35 35 35 35 35 35 233 233 233 233 116 116 116 116 115 115 115 189 189 189 189 221 221 221 221 221 221 221 221 221 221 221 69 69 69 69 69 69 69 69 69 69 277 277 277 277 277 277 277 277 277 49 49 49 228 228 228 228 228 228 228 228 228 228 228 228 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 187 187 187 187 187 187 187 187 187 187 187 187 288 288 288 288 288 288 271 271 271 271 271 271 271 277 277 277 277 277 277 21 21 21 21 109 109 109 109 109 109 49 49 49 49 49 49 109 109 109 109 109 225 225 225 225 225 225 204 204 204 204 204 204 204 204 331 331 331 331 331 331 49 49 49 340 340 340 340 340 340 340 219 219 219 219 219 219 219 219 219 219 219 219 21 21 21 21 21 21 21 21 233 233 233 233 233 233 233 233 285 285 285 285 285 285 285 285 49 49 49 49 49 49 49 49 49 49 280 280 280 280 280 280 280 280 280 119 119 119 119 119 49 49 49 49 49 288 288 288 288 288 288 227 227 227 227 227 227 
193 193 193 193 193 281 281 281 281 281 281 281 281 189 189 189 189 189 189 340 340 340 340 340 340 340 340 340 275 275 275 275 275 275 275 275 165 165 165 165 165 165 165 165 165 165 113 113 113 113 113 113 113 49 49 49 49 49 49 49 224 224 224 224 224 224 224 224 224 224 224 331 331 331 331 331 331 49 49 49 340 340 340 340 340 340 279 279 279 279 279 279 279 279 193 193 193 193 289 289 289 289 189 189 189 189 189 189 236 236 236 236 236 236 236 236 236 35 35 35 35 288 288 288 288 288 179 179 179 179 179 179 179 179 144 144 144 144 144 144 144 331 331 331 331 331 331 331 193 193 193 193 233 233 233 233 233 233 233 233 117 117 117 117 117 244 244 244 244 244 244 244 244 244 244 244 244 244 244 244 244 244 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 219 219 219 219 219 219 219 219 219 219 209 209 209 209 209 209 209 273 273 273 273 273 273 273 189 189 189 236 236 236 236 236 236 236 236 50 50 50 50 50 283 283 283 283 283 283 283 283 283 283 283 283 283 283 283 283 21 21 21 21 21 21 21 21 277 277 277 277 277 277 272 272 272 272 272 272 102 102 102 102 102 102 102 102 102 102 102 102 102 102 102 102 102 102 102 102 102 102 102 19 19 19 19 19 19 19 19 19 19 232 232 232 232 232 232 232 232 232 131 131 131 131 131 131 131 131 329 329 329 329 329 329 329 329 329 277 277 277 277 277 205 205 205 205 205 205 293 293 293 293 293 293 293 293 293 197 197 197 197 236 236 236 236 236 236 236 236 119 119 119 49 49 49 49 288 288 288 288 288 271 271 271 271 271 271 271 271 271 271 271 271 271 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 281 281 281 281 281 281 281 281 281 281 281 281 288 288 288 288 288 288 288 288 288 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 171 171 171 171 171 171 171 171 145 145 145 145 145 145 145 228 228 228 228 228 107 107 107 107 277 277 277 277 305 305 305 305 305 305 221 221 221 221 221 221 221 280 280 280 280 280 280 47 47 47 47 47 233 233 233 116 116 116 116 111 111 111 111 111 193 193 193 193 225 225 225 225 225 225 225 225 117 117 117 117 117 277 277 277 49 49 49 232 232 232 232 232 232 51 51 51 51 51 51 51 51 51 51 51 51 51 51 51 51 51 51 272 272 272 272 272 272 272 272 272 272 272 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 +1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 35 35 35 35 35 35 35 35 35 35 35 35 233 233 233 233 116 116 116 119 119 119 37 37 37 37 37 37 37 37 288 288 288 288 288 187 187 187 187 187 187 172 172 172 172 172 283 283 283 283 283 283 283 208 208 208 208 208 208 231 231 231 231 231 231 231 231 249 249 249 249 249 249 249 249 249 289 289 289 289 49 49 49 49 49 49 281 281 281 281 281 281 281 281 288 288 288 288 288 288 288 131 131 131 131 131 131 131 233 233 233 233 233 233 233 205 205 205 205 205 205 205 293 293 293 293 293 293 197 197 197 236 236 236 236 236 236 236 1 1 1 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 116 116 116 1 1 1 67 67 67 67 67 67 67 67 276 276 276 276 276 276 276 276 276 276 83 83 83 83 83 83 83 83 83 83 288 288 288 288 288 288 47 47 47 47 328 328 328 328 271 271 271 271 271 271 271 271 271 271 271 271 271 271 225 225 225 225 225 165 165 165 165 165 165 165 165 165 165 165 165 165 165 165 165 165 280 280 280 280 280 280 280 280 280 280 280 280 280 280 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 283 283 283 283 283 283 283 283 283 283 283 283 283 208 208 208 208 208 208 331 331 331 331 331 331 331 305 305 305 305 116 116 116 116 116 231 231 231 231 231 231 231 231 231 133 133 133 133 133 133 329 329 329 329 329 144 144 144 144 144 144 144 144 144 275 275 275 275 275 275 275 275 275 133 133 133 133 133 133 133 
133 133 133 281 281 281 281 281 281 281 281 281 281 281 281 281 281 281 288 288 288 288 288 288 47 47 47 47 233 233 233 233 233 233 233 289 289 289 289 289 289 193 193 193 193 193 224 224 224 224 283 283 283 283 283 283 283 283 283 283 283 208 208 208 208 208 179 179 179 37 37 37 37 116 116 116 116 116 116 171 171 171 171 171 171 171 171 133 133 133 133 133 277 277 277 277 277 277 277 277 49 49 49 49 49 289 289 289 289 289 289 189 189 189 116 116 116 116 83 83 83 83 83 83 83 83 83 83 83 83 83 83 83 83 288 288 288 288 288 288 288 288 119 119 119 52 52 52 52 52 52 331 331 331 331 331 331 331 331 331 331 331 331 331 331 101 101 101 101 101 101 101 101 101 101 101 101 101 101 101 101 101 101 101 101 101 101 101 340 340 340 340 340 340 340 340 47 47 47 47 233 233 233 233 116 116 116 116 116 116 116 331 331 331 331 331 331 331 133 133 133 133 133 133 277 277 277 277 277 277 277 277 173 173 173 173 173 173 173 173 173 173 173 173 73 73 73 73 73 73 277 277 277 277 277 277 340 340 340 340 340 340 340 340 119 119 119 119 119 137 137 137 137 137 277 277 277 277 277 277 277 277 277 277 277 277 277 277 277 53 53 53 53 53 53 53 53 53 53 53 53 328 328 328 328 328 328 328 328 328 328 328 328 328 328 328 328 328 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 119 119 119 119 119 119 119 133 133 133 276 276 276 19 19 19 276 276 276 276 276 276 271 271 271 271 271 271 271 271 271 271 225 225 225 225 133 133 133 133 133 133 233 233 233 233 233 289 289 289 289 204 204 204 204 204 47 47 47 328 328 328 328 328 271 271 271 271 271 271 271 271 209 209 209 209 209 209 209 209 273 273 273 273 273 49 49 49 224 224 224 224 224 224 224 191 191 191 191 191 191 191 191 191 191 191 191 232 232 232 232 232 232 232 35 35 35 35 35 35 35 35 35 35 35 329 329 329 329 329 49 49 49 49 49 49 233 233 233 233 233 233 233 233 225 225 225 225 225 212 212 212 212 212 212 212 212 212 212 212 212 212 212 212 212 212 212 212 35 35 35 35 35 35 35 35 35 35 233 233 233 233 116 116 116 116 116 116 116 116 116 116 83 83 83 83 83 83 83 83 83 83 83 83 288 288 288 288 288 47 47 47 47 328 328 328 328 328 328 328 187 187 187 187 187 187 187 187 288 288 288 288 288 288 288 288 288 288 288 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 179 179 179 179 179 179 179 179 179 320 320 320 320 219 219 219 219 219 219 219 49 49 49 49 232 232 232 232 232 47 47 47 47 47 289 289 289 289 289 289 289 289 289 289 289 289 133 133 133 133 133 133 133 233 233 233 233 233 116 116 116 116 116 116 116 116 116 219 219 219 219 219 219 219 219 219 219 225 225 225 225 225 249 249 249 249 249 249 249 249 281 281 281 281 281 281 281 281 281 281 281 281 225 225 225 225 204 204 204 204 204 204 287 287 287 287 287 287 188 188 188 188 119 119 119 133 133 133 276 276 276 276 276 276 276 231 231 231 231 231 231 165 165 165 165 165 165 165 165 165 165 109 109 109 109 109 109 109 145 145 145 145 145 340 340 340 340 340 340 340 340 340 340 107 107 107 107 107 107 193 193 193 193 193 341 341 341 341 341 341 341 341 341 233 233 233 233 233 49 49 49 49 49 49 49 49 280 280 280 280 280 280 280 280 280 107 107 107 107 107 107 107 100 100 100 100 100 100 100 100 100 100 100 115 115 115 115 115 115 115 115 115 193 193 193 193 193 233 233 233 233 233 233 233 288 288 288 288 288 47 47 47 47 328 328 328 328 328 231 231 231 231 189 189 189 189 177 177 177 177 177 177 177 225 225 225 225 133 133 133 133 133 
133 221 221 221 221 221 221 221 289 289 289 289 189 189 189 236 236 236 236 236 236 236 119 119 119 119 133 133 133 133 276 276 276 276 276 276 276 276 276 276 276 276 276 247 247 247 247 247 247 247 247 247 247 247 247 232 232 232 232 232 232 232 232 232 232 232 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 +1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 107 107 107 107 107 107 53 53 53 53 53 53 53 288 288 288 288 288 227 227 227 227 227 193 193 193 193 193 281 281 281 281 281 281 281 281 189 189 189 189 189 340 340 340 340 340 340 340 340 340 275 275 275 275 275 275 165 165 165 165 165 165 165 165 113 113 113 113 113 113 49 49 49 49 49 49 49 224 224 224 223 223 223 223 223 223 223 223 223 223 223 223 193 193 193 193 193 193 193 193 233 233 233 233 233 233 233 116 116 116 116 116 331 331 331 331 331 331 49 49 49 49 340 340 340 340 340 340 331 331 331 331 331 331 53 53 53 53 232 232 232 232 47 47 47 47 47 328 328 328 119 119 119 119 119 249 249 249 249 249 249 249 249 249 249 340 340 340 340 340 340 340 340 340 219 219 219 219 219 219 219 219 219 219 219 219 165 165 165 165 165 165 165 165 165 165 273 273 273 49 49 49 49 49 109 109 109 109 109 109 109 49 49 49 49 224 224 224 224 224 224 224 224 219 219 219 219 219 219 219 219 219 219 219 219 219 277 277 277 277 277 277 209 209 209 209 209 209 209 113 113 113 113 113 113 113 113 113 113 113 113 113 145 145 145 145 145 145 145 145 340 340 340 340 340 340 340 340 179 179 179 179 320 320 320 219 219 219 219 219 219 219 49 49 49 49 49 232 232 232 227 227 227 227 227 227 227 227 227 37 37 37 37 37 37 37 233 233 233 233 233 49 49 49 49 49 49 216 216 216 216 216 216 216 216 216 216 216 216 119 119 119 133 133 133 133 276 276 276 276 276 276 276 276 247 247 247 247 247 247 247 247 247 247 247 247 247 232 232 232 232 232 219 219 219 219 219 219 219 49 49 49 49 233 233 233 233 233 233 281 281 281 281 281 281 281 281 281 281 281 281 281 149 149 149 149 149 149 149 149 149 149 149 233 233 233 233 233 233 233 340 340 340 340 340 340 340 35 35 35 35 35 233 233 233 233 233 233 116 116 116 119 119 119 119 249 249 249 249 249 249 249 249 249 249 249 340 340 340 340 340 340 340 340 47 47 47 47 47 328 328 328 328 328 328 328 51 51 51 51 51 121 121 121 121 121 121 144 144 144 144 171 171 171 171 171 171 171 171 171 171 249 249 249 249 249 249 249 249 249 249 249 249 221 221 221 221 221 221 221 280 280 280 280 280 280 187 187 187 187 233 233 233 233 289 289 289 289 48 48 48 119 119 119 119 52 52 52 107 107 107 107 107 21 21 21 21 21 21 21 21 277 277 277 277 277 277 177 177 177 177 177 177 177 49 49 49 49 232 232 232 232 232 232 232 232 232 232 232 232 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 283 283 283 283 283 283 283 283 283 283 283 283 208 208 208 208 208 331 331 331 331 331 331 49 49 49 49 340 340 340 340 340 340 50 50 50 50 231 231 231 231 231 231 249 249 249 249 249 249 249 289 289 289 289 49 49 49 109 109 109 49 49 49 224 224 224 224 224 224 224 179 179 179 179 179 179 179 85 85 85 85 85 85 85 85 85 281 281 281 281 281 281 281 281 281 281 281 333 333 333 333 333 333 333 105 105 105 105 105 105 105 105 105 105 105 105 172 172 172 172 172 172 172 172 172 172 172 172 172 172 172 172 1 1 1 1 1 1 1 1 1 1 1 1 1 179 179 179 179 179 179 179 179 179 144 144 144 144 144 144 331 331 331 331 331 331 331 331 331 331 331 331 331 331 149 149 149 149 149 149 149 220 220 220 220 220 220 220 220 220 331 331 331 331 49 49 49 340 340 340 340 340 340 340 340 67 67 67 67 
67 67 67 67 225 225 225 225 225 225 225 333 333 333 333 333 333 333 333 333 205 205 205 205 205 340 340 340 340 340 340 340 115 115 115 115 115 115 115 115 53 53 53 53 53 53 53 53 53 53 53 232 232 232 232 232 232 232 232 232 232 232 232 232 232 232 232 232 35 35 35 35 35 35 35 35 35 35 35 35 35 233 233 233 116 116 116 116 116 116 116 331 331 331 331 331 331 331 331 133 133 133 133 133 133 133 224 224 224 224 224 224 224 224 224 224 224 115 115 115 115 115 115 115 115 53 53 53 53 53 53 53 53 53 232 232 232 232 232 232 232 232 232 232 232 232 232 232 232 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 283 283 283 283 283 283 283 283 283 283 283 283 283 283 208 208 208 208 208 208 208 208 208 208 208 208 208 208 208 208 208 208 208 275 275 275 275 275 275 275 275 275 275 275 275 275 275 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 232 232 232 232 232 232 232 232 232 232 232 232 232 232 1 1 1 119 119 119 119 119 48 48 48 48 48 48 279 279 279 279 279 279 279 279 279 279 279 249 249 249 249 249 249 249 249 189 189 189 189 189 189 236 236 236 236 236 236 236 236 279 279 279 279 279 279 279 279 149 149 149 149 149 149 149 149 221 221 221 221 221 221 221 49 49 49 49 224 224 224 224 224 224 224 224 224 224 224 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 +1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 179 179 179 179 179 179 179 179 179 179 133 133 133 133 133 225 225 225 225 225 225 225 273 273 273 273 273 273 273 288 288 288 288 288 288 275 275 275 275 275 275 275 275 275 275 53 53 53 53 53 53 53 53 232 232 232 232 232 232 232 232 119 119 119 48 48 48 48 279 279 279 279 279 279 279 279 279 279 279 279 279 53 53 53 53 53 53 233 233 233 233 233 233 233 117 117 117 117 117 168 168 168 168 168 168 168 168 168 168 279 279 279 279 279 279 279 279 279 279 221 221 221 221 221 221 221 321 321 321 321 321 321 321 321 321 321 321 321 321 321 321 321 224 224 224 224 224 224 224 224 224 224 224 224 35 35 35 35 35 35 35 233 233 233 233 116 116 116 116 116 331 331 331 331 331 331 49 49 49 49 49 340 340 340 340 340 340 340 119 119 119 48 48 48 48 48 48 279 279 279 279 279 279 279 279 289 289 289 289 289 289 289 289 277 277 277 69 69 69 69 69 69 69 237 237 237 237 237 237 237 177 177 177 177 49 49 49 49 49 281 281 281 281 281 281 281 288 288 288 288 288 288 288 271 271 271 271 271 271 271 271 277 277 277 277 277 277 277 21 21 21 21 21 21 21 21 272 272 272 272 272 272 272 272 47 47 47 47 328 328 328 1 1 1 119 119 119 119 52 52 52 52 52 111 111 111 111 111 111 111 111 111 111 111 111 111 111 149 149 149 149 149 149 149 149 149 112 112 112 112 112 112 112 112 112 112 112 1 1 1 163 163 163 163 163 163 163 163 163 163 163 163 163 163 163 163 163 116 116 116 116 116 279 279 279 279 279 279 49 49 49 49 49 281 281 281 281 281 281 281 281 281 281 281 101 101 101 101 101 101 101 101 101 101 101 101 101 101 101 49 49 49 49 49 289 289 289 289 289 289 289 289 204 204 204 204 204 204 204 204 204 204 204 204 35 35 35 35 35 35 35 35 35 233 233 233 233 116 116 116 116 116 171 171 171 171 171 171 171 171 171 171 171 69 69 69 69 69 69 69 69 69 277 277 277 277 277 277 49 49 49 49 232 232 232 227 227 227 227 227 227 227 227 193 193 193 193 285 285 285 285 285 285 285 285 285 285 285 49 49 49 49 49 49 233 233 233 233 233 233 233 233 233 340 340 340 340 340 15 15 15 15 15 15 15 177 177 177 177 177 177 177 177 177 341 341 341 341 341 341 341 193 193 193 193 225 225 225 225 225 225 225 337 337 337 337 337 145 145 145 145 145 145 145 145 145 145 145 145 204 204 204 204 204 204 204 204 204 204 204 204 204 204 204 204 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 335 335 335 335 335 335 335 133 133 133 133 133 133 288 288 288 288 288 288 288 331 331 331 331 331 189 189 189 189 189 120 120 120 120 120 67 67 67 67 67 67 67 67 67 67 67 67 67 67 67 224 224 224 224 224 224 224 224 119 119 119 119 119 193 193 193 193 193 193 193 193 193 193 193 193 193 193 280 280 280 280 280 280 280 280 280 280 280 280 280 280 280 280 280 280 280 280 280 280 280 1 1 1 227 227 227 227 227 227 227 227 227 193 193 193 193 193 281 281 281 281 281 281 281 281 281 189 189 189 189 189 340 340 340 340 340 340 340 340 340 340 275 275 275 275 275 275 275 165 165 165 165 165 165 113 113 113 113 113 113 113 49 49 49 49 49 49 224 224 224 224 224 171 171 171 171 171 171 171 171 171 171 85 85 85 85 85 85 85 85 85 85 85 85 233 233 233 233 116 116 116 116 116 1 1 1 47 47 47 47 47 47 47 47 109 109 109 109 109 109 109 109 109 109 53 53 53 53 53 53 53 233 233 233 233 233 233 233 233 233 233 233 233 233 117 117 117 117 117 49 49 49 49 49 49 233 233 233 233 233 233 233 288 288 288 288 287 287 287 287 287 287 287 287 287 287 287 287 287 287 287 101 101 101 101 101 101 101 101 101 101 101 101 101 228 228 228 228 228 228 287 287 287 287 287 48 48 48 48 279 279 279 279 279 279 279 279 279 193 193 193 193 288 288 288 288 288 288 288 171 171 171 171 171 171 171 144 144 144 144 144 144 144 144 144 144 144 144 144 83 83 83 83 83 83 83 83 83 83 83 83 83 83 83 83 83 83 83 277 277 277 277 277 277 277 277 340 340 340 340 340 340 340 340 340 35 35 35 35 288 288 288 288 179 179 179 179 179 179 144 144 144 144 144 219 219 219 219 219 219 219 219 219 193 193 193 193 193 113 113 113 113 113 113 113 113 113 113 113 49 49 49 49 232 232 232 232 232 232 232 232 331 331 331 331 331 331 193 193 193 193 233 233 233 233 233 233 233 117 117 117 117 117 244 244 244 244 244 244 244 244 244 244 244 244 244 244 244 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 231 231 231 231 231 231 231 231 231 231 193 193 193 193 193 289 289 289 289 289 189 189 189 189 189 189 189 236 236 236 236 236 236 236 236 236 236 236 236 236 236 219 219 219 219 219 219 219 219 219 219 219 219 219 219 219 219 219 21 21 21 21 21 21 21 21 289 289 289 289 289 289 49 49 49 232 232 232 232 232 232 232 232 232 232 331 331 331 331 331 331 331 331 69 69 69 69 69 69 69 277 277 277 277 277 272 272 272 272 272 272 272 272 272 272 272 219 219 219 219 219 219 219 219 219 219 219 219 333 333 333 333 333 193 193 193 193 225 225 225 225 225 225 225 225 225 225 225 225 289 289 289 289 280 280 280 280 280 280 280 280 280 280 280 280 280 280 280 280 280 280 280 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 283 283 283 283 283 283 283 283 283 283 283 283 283 208 208 208 208 208 208 179 179 179 179 179 37 37 37 37 116 116 116 116 231 231 231 231 231 193 193 193 193 193 193 193 289 289 289 289 289 189 189 189 189 189 189 116 116 116 116 116 116 279 279 279 279 279 279 279 279 279 193 193 193 193 221 221 221 221 221 281 281 281 281 281 281 281 281 281 281 289 289 289 289 289 289 209 209 209 209 209 209 209 232 232 232 232 232 47 47 47 47 328 328 328 328 328 119 119 119 119 119 119 49 49 49 49 49 49 228 228 228 228 228 228 228 228 228 228 1 1 1 1 1 1 1 1 1 1 1 1 1 1 +1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 35 35 35 35 35 35 35 35 35 35 35 35 35 35 35 340 340 340 340 340 340 340 340 35 35 35 35 35 35 35 35 35 35 35 329 329 329 329 329 49 49 49 49 49 233 233 233 225 225 225 225 225 225 212 212 212 212 212 212 212 179 179 179 179 179 179 179 179 179 179 179 85 85 85 85 85 85 85 85 281 281 281 281 
281 281 281 281 221 221 221 221 221 221 221 213 213 213 213 213 213 213 213 213 273 273 273 273 273 273 145 145 145 145 145 145 145 340 340 340 340 340 340 340 331 331 331 331 331 144 144 144 144 144 331 331 331 331 331 331 331 331 331 331 331 331 331 331 249 249 249 233 233 233 233 233 288 288 288 287 287 287 287 188 188 188 188 188 287 287 287 287 287 287 287 287 287 287 287 287 287 133 133 133 133 133 133 224 224 224 224 224 224 224 224 224 224 224 224 187 187 187 187 187 232 232 232 232 232 232 232 1 1 1 67 67 67 67 67 67 67 67 67 67 67 67 67 67 67 67 67 67 67 67 67 67 67 67 67 116 116 116 116 116 327 327 327 327 327 327 327 327 327 327 265 265 265 265 265 265 265 265 265 265 265 265 281 281 281 281 281 281 281 281 281 281 281 281 281 189 189 189 189 189 189 189 189 189 340 340 340 340 340 340 340 340 340 340 340 340 340 340 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 35 35 35 35 35 35 35 35 35 233 233 233 233 116 116 116 116 116 219 219 219 219 219 219 219 219 219 219 209 209 209 209 209 209 209 209 209 273 273 273 273 273 189 189 189 189 189 189 236 236 236 236 236 236 236 236 50 50 50 50 283 283 283 283 283 283 283 283 283 283 283 283 283 283 283 283 21 21 21 21 21 21 21 277 277 277 277 277 277 272 272 272 272 272 272 272 1 1 1 102 102 102 102 102 102 102 102 102 102 102 102 102 102 102 102 102 102 102 102 102 102 19 19 19 19 19 19 19 232 232 232 232 232 232 119 119 119 52 52 52 227 227 227 227 227 227 227 227 227 227 165 165 165 165 165 165 165 165 165 165 165 165 165 165 165 165 165 232 232 232 232 232 232 232 232 232 232 232 275 275 275 275 275 275 275 275 275 275 249 249 249 249 249 249 249 249 249 249 249 249 249 249 249 249 116 116 116 116 116 116 119 119 119 119 119 49 49 49 288 288 288 288 219 219 219 219 219 219 219 219 219 219 277 277 277 277 277 69 69 69 69 69 69 69 281 281 281 281 281 281 281 281 281 281 281 288 288 288 288 288 288 288 288 119 119 119 52 52 52 52 179 179 179 179 179 179 179 179 179 179 179 179 21 21 21 21 225 225 225 225 225 225 225 225 225 244 244 244 244 244 244 244 244 244 244 244 244 244 244 244 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 47 47 47 47 47 47 47 47 47 47 47 233 233 233 233 116 116 116 116 116 116 116 331 331 331 331 331 331 331 331 331 85 85 85 85 85 85 85 85 85 85 85 85 85 85 85 85 85 85 85 85 233 233 233 233 233 116 116 116 116 116 51 51 51 51 51 51 51 51 272 272 272 272 272 272 272 272 272 119 119 119 119 52 52 52 52 52 279 279 279 279 279 279 279 279 279 279 279 279 279 289 289 289 289 289 289 209 209 209 209 209 209 209 209 209 272 272 272 272 272 272 272 275 275 275 275 275 275 275 275 275 275 275 275 275 275 133 133 133 133 133 133 116 116 116 116 116 116 179 179 179 179 179 179 179 179 179 179 179 179 179 179 193 193 193 193 193 193 224 224 224 224 224 224 224 224 224 224 224 107 107 107 107 107 107 107 213 213 213 213 213 213 213 213 213 213 213 69 69 69 69 69 69 69 69 69 69 69 69 69 233 233 233 233 233 233 233 233 116 116 116 116 116 116 116 116 116 116 116 116 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 +1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 131 131 131 131 131 131 131 131 131 131 131 233 233 233 233 233 233 233 205 205 205 205 205 205 205 109 109 109 109 109 109 49 49 49 49 49 49 49 117 117 117 117 204 204 204 204 204 204 204 204 179 179 179 179 179 179 179 179 179 320 320 320 320 320 320 331 331 331 331 331 331 331 331 133 133 133 133 133 133 233 233 233 233 233 233 233 233 288 288 288 288 83 83 83 83 83 83 83 83 83 83 83 83 83 288 288 288 288 288 288 
133 133 277 277 277 277 277 277 277 277 204 204 204 204 204 204 204 204 204 204 175 175 175 175 175 175 175 175 175 277 277 277 277 277 277 277 277 277 209 209 209 209 209 209 209 209 209 209 209 209 209 209 209 209 209 232 232 232 232 232 232 232 35 35 35 35 35 35 35 35 233 233 233 233 233 116 116 116 116 1 1 1 231 231 231 231 231 231 209 209 209 209 209 209 209 209 209 209 209 209 209 288 288 288 288 288 288 288 35 35 35 35 35 35 35 233 233 233 116 116 116 271 271 271 271 271 277 277 277 277 189 189 189 189 189 281 281 281 281 281 281 281 281 281 281 101 101 101 101 101 101 101 101 101 101 101 101 101 101 101 280 280 280 280 280 280 280 280 280 280 280 331 331 331 331 49 49 49 49 340 340 340 340 340 340 119 119 119 119 119 37 37 37 37 37 37 37 288 288 288 288 335 335 335 335 335 335 335 21 21 21 21 21 21 21 21 21 21 21 21 277 277 277 277 277 277 277 116 116 116 116 116 116 116 116 1 1 1 1 1 1 1 1 1 1 1 1 1 1 +1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 279 279 279 279 279 279 279 279 279 279 279 279 133 133 133 133 133 133 133 288 288 288 288 288 47 47 47 47 109 109 109 109 109 109 85 85 85 85 85 85 85 85 85 85 85 85 85 85 85 85 85 85 85 288 288 288 288 288 288 288 288 19 19 19 19 19 19 19 19 19 232 232 232 232 232 232 331 331 331 331 331 331 331 331 53 53 53 53 53 232 232 232 232 232 232 232 232 232 279 279 279 279 279 279 279 279 279 279 279 101 101 101 101 101 101 101 101 101 101 101 101 101 101 116 116 116 116 116 116 116 331 331 331 331 189 189 189 292 292 292 292 292 292 292 175 175 175 175 175 175 175 277 277 277 277 277 165 165 165 165 165 165 165 165 165 165 165 288 288 288 288 288 288 288 288 1 1 1 271 271 271 271 271 271 271 271 169 169 169 169 169 169 169 169 289 289 289 289 289 289 289 289 289 277 277 277 277 205 205 205 205 21 21 21 21 21 21 21 21 277 277 277 277 277 277 277 221 221 221 221 221 221 221 49 49 49 49 224 224 224 224 224 224 224 224 331 331 331 331 331 331 331 331 193 193 193 193 225 225 225 225 225 225 225 225 225 253 253 253 253 253 253 253 253 253 253 253 253 253 340 340 340 340 340 340 340 340 340 340 340 340 340 340 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 47 47 47 47 47 47 47 47 47 47 47 47 47 47 233 233 233 116 116 116 119 119 119 204 204 204 204 204 204 204 204 204 204 204 204 51 51 51 51 51 51 51 121 121 121 121 121 121 144 144 144 144 144 144 144 144 144 144 331 331 331 331 331 189 189 189 292 292 292 292 292 292 292 292 271 271 271 271 271 271 271 271 271 271 277 277 277 277 277 193 193 193 193 228 228 228 228 228 228 228 228 228 228 228 228 228 228 228 228 228 228 228 223 223 223 223 223 21 21 21 21 21 21 21 21 229 229 229 229 229 109 109 109 109 109 17 17 17 277 277 277 277 117 117 117 117 205 205 205 205 205 205 205 205 205 205 205 205 205 340 340 340 340 340 340 340 340 340 340 340 340 340 340 340 340 340 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 231 231 231 231 231 231 231 231 21 21 21 21 21 21 288 288 288 288 288 50 50 50 50 50 50 50 279 279 279 279 279 279 279 279 279 279 279 279 289 289 289 289 289 289 289 277 277 277 277 277 164 164 164 164 164 164 164 164 164 164 164 164 164 164 164 164 164 279 279 279 279 279 279 279 279 279 279 279 279 279 279 289 289 289 289 289 289 289 289 193 193 193 193 193 220 220 220 220 220 220 220 220 220 220 220 220 220 231 231 231 231 231 231 69 69 69 69 69 69 276 276 276 276 276 276 276 279 279 279 279 279 279 279 279 279 279 279 289 289 289 289 289 289 289 249 249 249 249 249 249 249 249 249 249 249 249 249 249 232 232 232 232 
232 232 331 331 331 331 331 331 49 49 49 49 340 340 340 340 340 340 287 287 287 287 48 48 48 107 107 107 107 208 208 208 208 208 208 208 208 279 279 279 279 279 279 279 279 279 279 279 279 209 209 209 209 209 209 209 209 209 209 209 209 209 209 209 232 232 232 232 232 232 232 232 232 232 232 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 171 171 171 171 171 171 171 171 171 69 69 69 69 276 276 276 276 227 227 227 227 227 227 193 193 193 193 193 281 281 281 281 281 281 281 189 189 189 189 340 340 340 340 340 340 340 340 340 275 275 275 275 275 275 165 165 165 165 165 165 165 165 165 113 113 113 113 113 113 49 49 49 49 49 49 224 224 224 224 224 224 224 224 224 224 224 224 224 331 331 331 331 331 331 305 305 305 305 116 116 116 179 179 179 179 179 179 179 37 37 37 37 37 328 328 328 328 279 279 279 279 279 279 279 279 279 279 279 279 209 209 209 209 209 209 209 232 232 232 232 191 191 191 191 191 191 191 191 288 288 288 288 288 288 191 191 191 191 191 172 172 172 172 172 172 172 172 172 119 119 119 119 119 119 119 133 133 133 133 133 276 276 276 276 179 179 179 179 179 179 179 179 179 179 179 37 37 37 37 37 37 37 116 116 116 116 116 116 116 116 107 107 107 107 107 107 49 49 49 49 49 49 49 232 232 232 232 232 232 232 232 232 232 232 232 232 232 232 232 232 232 232 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 271 271 271 271 271 271 271 277 277 277 277 277 101 101 101 101 101 101 101 101 101 101 101 329 329 329 329 329 49 49 49 49 289 289 289 289 289 289 225 225 225 225 225 225 225 225 204 204 204 204 204 204 204 283 283 283 283 283 283 283 283 283 283 208 208 208 208 208 331 331 331 331 331 49 49 49 49 340 340 340 340 340 340 340 340 47 47 47 47 47 328 328 328 328 328 119 119 119 119 204 204 204 204 204 204 204 47 47 47 47 47 273 273 273 273 273 273 273 273 273 273 273 273 193 193 193 193 233 233 233 233 233 337 337 337 337 337 337 337 49 49 49 49 232 232 232 232 232 232 119 119 119 49 49 49 49 288 288 288 288 288 288 227 227 227 227 17 17 17 277 277 277 277 277 277 277 193 193 193 193 193 193 225 225 225 225 225 225 225 48 48 48 48 48 219 219 219 219 219 219 219 219 219 219 53 53 53 53 53 53 53 293 293 293 293 293 293 293 109 109 109 109 109 145 145 145 145 145 145 145 145 145 145 288 288 288 288 279 279 279 279 279 279 279 279 279 279 279 279 333 333 333 333 333 133 133 133 133 273 273 273 273 273 273 288 288 288 288 288 288 288 288 288 119 119 119 119 119 37 37 37 37 37 37 37 37 37 37 288 288 288 288 288 288 335 335 335 335 335 21 21 21 21 21 21 21 21 21 21 21 277 277 277 277 116 116 116 116 247 247 247 247 247 247 247 247 247 247 247 247 329 329 329 329 329 329 329 329 144 144 144 144 144 144 144 144 144 144 131 131 131 131 131 340 340 340 340 340 340 340 340 340 67 67 67 67 67 67 67 67 67 67 67 67 67 173 173 173 173 173 173 173 173 173 173 49 49 49 49 232 232 232 232 232 131 131 131 131 340 340 340 340 283 283 283 283 283 283 283 283 208 208 208 208 208 208 279 279 279 279 279 279 279 279 333 333 333 333 133 133 133 133 133 273 273 273 273 273 273 288 288 288 179 179 179 179 179 179 179 144 144 144 144 144 179 179 179 179 179 179 179 179 179 179 85 85 85 85 85 85 85 85 85 85 85 85 85 85 85 280 280 280 280 280 280 280 280 280 280 280 280 280 280 280 280 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 +1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 331 331 331 331 331 331 331 331 331 53 53 53 53 53 53 53 53 232 232 232 232 232 232 219 219 219 219 219 219 305 305 305 116 116 116 179 179 179 37 37 37 328 328 328 328 328 328 328 207 207 
207 207 207 207 207 207 207 207 207 289 289 289 49 49 49 232 232 232 232 232 50 50 50 50 50 227 227 227 227 227 227 227 227 227 227 227 209 209 209 209 209 209 209 209 209 209 209 209 209 224 224 224 224 224 224 224 224 224 224 224 224 224 224 224 224 67 67 67 67 67 67 67 67 67 67 67 172 172 172 172 172 172 172 172 119 119 119 119 52 52 52 175 175 175 175 175 277 277 277 277 277 277 85 85 85 85 85 85 85 85 85 85 85 85 85 233 233 233 233 116 116 116 116 116 331 331 331 331 331 331 189 189 189 121 121 121 121 121 85 85 85 85 85 85 85 85 85 85 288 288 288 288 288 288 288 288 288 247 247 247 247 247 247 247 247 247 329 329 329 329 329 329 145 145 145 145 145 109 109 109 109 109 109 109 109 277 277 277 277 193 193 193 229 229 229 229 229 189 189 189 189 189 236 236 236 236 236 236 236 236 236 236 236 236 119 119 119 52 52 52 52 271 271 271 271 271 271 271 277 277 277 277 49 49 49 49 329 329 329 329 329 149 149 149 149 149 149 149 149 149 149 149 149 149 149 109 109 109 109 109 205 205 205 205 49 49 49 49 49 49 49 49 224 224 224 224 224 271 271 271 271 271 271 271 271 271 271 271 271 271 271 133 133 133 133 133 220 220 220 220 220 220 220 220 47 47 47 47 47 328 328 328 328 328 328 328 115 115 115 115 115 115 149 149 149 149 149 149 149 149 149 149 149 149 149 149 149 149 288 288 288 288 288 288 288 288 288 288 288 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 227 227 227 227 227 227 227 227 227 193 193 193 193 193 193 281 281 281 281 281 281 189 189 189 189 189 189 340 340 340 340 340 340 340 340 340 275 275 275 275 275 275 165 165 165 165 165 165 165 165 165 113 113 113 113 113 49 49 49 49 49 49 224 224 224 224 224 224 224 224 224 224 224 275 275 275 275 275 37 37 37 37 37 37 37 37 37 273 273 273 273 273 273 288 288 288 288 279 279 279 279 279 279 279 279 279 279 279 279 229 229 229 229 229 229 229 21 21 21 21 277 277 277 277 277 277 277 289 289 289 289 289 289 225 225 225 225 225 204 204 204 204 204 204 35 35 35 35 35 288 288 288 119 119 119 119 204 204 204 204 219 219 219 219 219 219 219 219 193 193 193 193 193 113 113 113 113 113 113 113 113 113 49 49 49 49 232 232 232 232 232 232 232 232 115 115 115 115 115 69 69 69 69 69 69 69 69 69 69 69 276 276 276 276 276 276 276 276 276 276 276 276 276 1 1 1 47 47 47 47 47 47 47 47 47 47 233 233 233 116 116 116 116 116 279 279 279 279 279 279 279 279 279 279 289 289 289 289 289 289 133 133 133 133 133 133 133 133 273 273 273 273 273 273 288 288 288 288 288 191 191 191 191 232 232 232 232 232 232 232 232 232 331 331 331 331 193 193 193 193 193 232 232 232 232 232 232 232 232 107 107 107 107 107 193 193 193 193 193 117 117 117 117 189 189 189 189 232 232 232 232 232 232 232 287 287 287 287 287 287 188 188 188 188 115 115 115 115 115 115 115 320 320 320 320 320 320 320 320 279 279 279 279 279 279 279 279 279 279 279 279 248 248 248 248 248 248 248 248 248 248 248 248 248 248 248 248 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 119 119 119 119 119 52 52 52 52 52 219 219 219 219 219 219 219 219 219 219 193 193 193 193 193 193 113 113 113 113 113 113 113 113 113 49 49 49 49 49 49 232 232 232 232 232 232 35 35 35 35 288 288 288 288 288 288 288 175 175 175 175 175 175 277 277 277 277 209 209 209 209 209 209 209 209 209 232 232 232 232 232 232 175 175 175 175 175 175 165 165 165 165 165 165 165 165 165 165 165 165 165 109 109 109 109 49 49 49 49 225 225 225 225 225 225 225 225 225 
225 225 340 340 340 340 340 340 340 331 331 331 49 49 49 340 340 340 340 340 340 340 340 50 50 50 50 50 111 111 111 111 111 111 111 111 111 111 111 193 193 193 193 277 277 277 277 277 173 173 173 173 173 173 49 49 49 49 224 224 224 224 224 47 47 47 47 273 273 273 273 273 273 273 21 21 21 21 21 21 21 277 277 277 277 277 277 277 289 289 289 289 229 229 229 49 49 49 233 233 233 233 233 288 288 288 288 288 288 288 288 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 +1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 67 67 67 67 67 67 67 67 67 67 67 67 67 67 67 67 67 276 276 276 276 276 276 276 276 276 276 331 331 331 331 331 331 331 331 331 331 305 305 305 305 116 116 116 116 179 179 179 37 37 37 328 328 328 328 328 328 107 107 107 107 107 107 193 193 193 193 193 232 232 232 232 232 232 232 232 232 232 232 111 111 111 111 111 111 111 111 111 111 193 193 193 193 277 277 277 277 277 277 277 277 173 173 173 173 173 173 173 173 173 49 49 49 49 224 224 224 224 224 224 224 224 224 224 224 224 224 1 1 1 1 1 1 1 1 187 187 187 187 187 187 187 187 187 187 187 172 172 172 172 172 172 172 191 191 191 191 288 288 288 288 288 179 179 179 37 37 37 116 116 116 116 231 231 231 231 231 21 21 21 21 21 21 21 21 21 21 21 21 288 288 288 288 288 288 107 107 107 107 193 193 193 193 232 232 232 232 232 232 232 232 279 279 279 279 279 279 279 279 279 279 279 248 248 248 248 248 248 271 271 271 271 271 271 271 271 271 271 271 271 271 271 271 165 165 165 165 165 165 233 233 233 233 233 233 233 233 233 173 173 173 173 173 173 49 49 49 49 225 225 225 225 225 204 204 204 204 204 204 204 204 204 219 219 219 219 219 219 219 219 219 219 219 219 219 219 219 225 225 225 225 225 225 209 209 209 209 209 209 209 209 209 209 209 209 209 209 209 232 232 232 232 232 232 131 131 131 131 131 340 340 340 340 340 340 340 340 340 340 287 287 287 287 287 188 188 188 188 175 175 175 175 175 175 175 175 193 193 193 193 193 193 328 328 328 328 328 187 187 187 187 187 187 288 288 288 288 288 288 279 279 279 279 279 279 53 53 53 53 229 229 229 229 229 229 229 293 293 293 293 293 189 189 189 189 236 236 236 236 236 236 236 236 236 47 47 47 47 328 328 328 328 328 119 119 119 119 204 204 204 204 204 204 204 47 47 47 47 47 273 273 273 273 273 273 273 273 273 273 273 273 273 273 193 193 193 193 193 277 277 277 277 277 277 277 277 277 277 49 49 49 49 49 49 49 49 49 49 233 233 233 233 233 233 280 280 280 280 280 280 280 280 47 47 47 47 47 328 328 328 328 47 47 47 47 232 232 232 232 232 47 47 47 47 47 47 47 233 233 233 233 233 233 233 337 337 337 337 337 337 337 337 337 337 337 337 337 337 321 321 321 321 321 321 321 321 321 321 321 341 341 341 341 341 341 341 341 116 116 116 116 116 271 271 271 271 271 271 271 271 271 271 21 21 21 21 21 21 21 21 21 277 277 277 277 277 277 277 277 225 225 225 225 225 225 225 225 144 144 144 144 144 144 144 144 144 144 144 144 144 144 144 144 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 191 191 191 191 191 191 191 191 191 289 289 289 289 280 280 280 280 280 280 331 331 331 331 331 331 193 193 193 193 193 233 233 233 233 233 233 233 233 117 117 117 117 245 245 245 245 245 245 245 245 245 245 245 245 340 340 340 340 340 340 340 340 223 223 223 223 223 223 305 305 305 305 305 221 221 221 221 221 288 288 288 288 288 1 1 1 207 207 207 207 207 207 207 207 207 207 207 207 207 207 207 207 207 207 207 207 281 281 281 281 281 281 281 281 281 281 281 288 288 288 288 288 288 35 35 35 35 35 35 35 35 35 233 233 233 233 233 116 116 116 116 116 331 331 331 331 331 331 331 
133 133 133 133 133 133 133 133 133 133 133 281 281 281 281 281 281 281 281 281 281 281 281 281 288 288 288 288 288 288 288 288 288 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 291 291 291 291 291 291 291 291 291 277 277 277 277 277 320 320 320 320 119 119 119 119 52 52 52 52 52 52 331 331 331 331 331 331 331 133 133 133 133 133 133 281 281 281 281 281 281 281 281 281 288 288 288 288 288 288 288 331 331 331 331 331 331 331 53 53 53 53 53 232 232 232 232 232 232 232 232 232 232 232 232 232 232 232 232 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 223 223 223 223 223 223 223 305 305 305 305 305 221 221 221 221 221 221 189 189 189 189 189 189 189 236 236 236 236 236 236 83 83 83 83 83 83 83 83 83 83 288 288 288 288 288 288 288 19 19 19 19 232 232 232 232 232 119 119 119 52 52 52 107 107 107 107 107 107 107 37 37 37 37 37 37 37 37 37 37 37 37 37 220 220 220 220 220 220 220 335 335 335 335 335 335 335 21 21 21 21 21 21 21 21 21 21 21 21 21 21 21 277 277 277 277 277 277 277 116 116 116 116 116 116 116 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 219 219 219 219 219 219 219 219 165 165 165 165 165 165 165 165 228 228 228 228 228 228 228 50 50 50 50 50 50 50 171 171 171 171 171 171 171 171 171 171 171 171 225 225 225 225 225 53 53 53 53 53 53 53 53 116 116 116 116 47 47 47 47 47 328 328 328 328 328 328 328 227 227 227 227 227 227 227 227 227 227 227 227 227 133 133 133 133 225 225 225 225 225 225 225 225 225 225 225 225 244 244 244 244 244 244 244 244 244 244 244 244 244 244 215 215 215 215 215 215 215 215 215 215 215 215 215 215 321 321 321 321 321 321 321 321 321 321 321 321 321 321 321 321 321 232 232 232 232 232 232 232 232 232 232 232 232 279 279 279 279 279 279 279 279 279 279 279 279 53 53 53 53 53 53 233 233 233 233 233 233 233 225 225 225 225 225 225 105 105 105 105 105 105 105 105 105 105 105 105 288 288 288 288 288 288 288 288 288 288 288 288 288 288 288 288 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 107 107 107 107 107 107 53 53 53 288 288 288 288 288 119 119 119 119 119 204 204 204 204 204 204 204 204 207 207 207 207 207 207 207 207 207 207 207 207 207 207 207 281 281 281 281 281 281 281 281 281 281 281 288 288 288 288 288 288 288 288 288 331 331 331 331 331 331 331 53 53 53 53 53 232 232 232 232 232 232 232 232 232 232 232 232 232 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 +1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 331 331 331 331 331 331 133 133 133 133 133 233 233 233 233 233 233 233 280 280 280 280 280 280 280 335 335 335 335 320 320 320 175 175 175 175 175 175 175 175 175 175 175 21 21 21 21 21 21 21 21 21 21 21 21 21 21 21 288 288 288 288 288 288 288 288 288 288 288 288 288 288 1 1 1 50 50 50 50 50 50 50 50 175 175 175 175 175 175 175 175 175 225 225 225 225 225 225 225 225 225 193 193 193 229 229 229 229 229 229 229 229 273 273 273 273 280 280 280 280 280 280 280 280 280 47 47 47 47 328 328 328 328 328 328 328 119 119 119 52 52 52 52 107 107 107 107 107 107 107 107 107 225 225 225 225 225 225 321 321 321 321 321 321 321 321 321 321 321 321 321 321 321 321 321 228 228 228 228 228 228 228 228 228 228 228 228 228 228 228 228 228 228 228 1 1 1 1 1 1 331 331 331 331 331 331 331 331 331 331 331 331 331 331 331 331 101 101 101 101 101 101 101 101 101 101 101 101 288 288 288 288 288 288 288 288 288 288 288 288 288 288 1 1 1 1 1 111 111 111 111 111 111 111 111 111 111 111 111 133 133 133 133 133 277 277 277 277 277 277 277 277 204 204 204 204 204 204 204 204 204 287 287 287 287 287 287 287 287 277 277 277 277 209 209 209 209 209 209 
209 209 209 340 340 340 340 340 340 340 187 187 187 232 232 232 232 232 119 119 119 52 52 52 52 52 223 223 223 223 223 223 133 133 133 133 133 133 133 133 173 173 173 173 173 173 173 288 288 288 288 288 288 67 67 67 67 67 67 67 67 67 67 277 277 277 277 277 277 277 113 113 113 113 113 113 113 113 113 113 145 145 145 145 145 145 145 116 116 116 116 116 116 116 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 47 47 47 47 47 47 47 47 47 47 47 47 233 233 233 116 116 116 116 231 231 231 231 231 231 231 21 21 21 21 21 21 21 21 21 21 21 21 21 117 117 117 117 189 189 189 189 189 189 189 189 236 236 236 236 236 236 236 236 236 236 236 236 236 236 279 279 279 279 279 279 279 279 279 279 279 279 279 279 279 225 225 225 225 225 133 133 133 133 133 233 233 233 233 233 233 233 117 117 117 117 117 144 144 144 144 144 144 144 144 144 107 107 107 107 107 107 107 107 107 149 149 149 149 149 149 149 149 149 149 149 149 113 113 113 113 113 113 113 113 113 113 113 113 189 189 189 189 189 189 189 340 340 340 340 340 340 340 340 340 340 115 115 115 115 115 115 115 115 85 85 85 85 85 85 85 85 85 85 85 232 232 232 232 232 187 187 187 232 232 232 119 119 119 52 52 52 52 52 52 179 179 179 179 179 179 179 179 179 179 179 179 21 21 21 21 225 225 225 225 225 225 225 225 225 225 244 244 244 244 244 244 244 107 107 107 107 107 107 107 107 100 100 100 100 100 100 100 119 119 119 119 119 52 52 52 52 52 107 107 107 107 107 107 107 107 277 277 277 277 277 277 305 305 305 305 305 305 305 305 305 220 220 220 220 220 220 220 220 220 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 331 331 331 331 331 331 331 49 49 49 340 340 340 340 340 340 340 340 340 175 175 175 175 175 175 277 277 277 277 277 277 209 209 209 209 209 209 209 209 209 209 209 209 233 233 233 116 116 116 116 116 247 247 247 247 247 247 247 247 247 247 329 329 329 329 329 329 144 144 144 144 144 144 144 107 107 107 107 107 107 107 100 100 100 100 100 100 100 100 100 100 100 100 50 50 50 50 50 287 287 287 287 287 287 287 287 287 287 287 287 287 287 287 287 37 37 37 37 237 237 237 237 237 237 237 237 177 177 177 49 49 49 49 224 224 224 224 224 224 47 47 47 47 47 47 328 328 328 328 327 327 327 327 327 327 101 101 101 101 101 101 101 101 101 101 101 101 101 101 101 101 101 101 101 101 101 101 101 101 233 233 233 233 233 233 233 233 233 233 233 340 340 340 340 340 340 340 340 340 340 340 340 340 340 340 340 340 340 340 340 340 340 340 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 179 179 179 179 179 179 179 179 179 209 209 209 209 209 209 276 276 276 276 276 276 276 276 279 279 279 279 279 279 279 279 279 279 279 279 37 37 37 37 37 37 37 37 37 37 288 288 288 288 288 288 227 227 227 227 227 227 17 17 17 17 277 277 277 193 193 193 193 193 193 225 225 225 225 225 225 48 48 48 48 48 219 219 219 219 219 219 219 219 219 219 219 53 53 53 53 53 53 53 293 293 293 293 293 293 293 293 293 109 109 109 109 109 109 145 145 145 145 145 145 145 145 145 288 288 288 288 288 288 288 288 288 288 288 288 288 288 1 1 1 1 1 1 331 331 331 331 331 331 331 331 331 133 133 133 133 232 232 232 232 283 283 283 283 283 283 283 283 208 208 208 208 208 208 208 208 279 279 279 279 279 279 279 37 37 37 37 37 37 37 37 288 288 288 288 288 35 35 35 35 288 288 288 288 288 288 288 67 67 67 67 67 67 67 67 67 67 224 224 224 224 224 224 224 224 224 224 224 224 224 224 224 224 1 1 1 1 1 1 1 1 1 1 1 1 1 67 67 67 67 67 67 67 67 67 67 67 225 225 225 225 225 225 225 225 225 333 
333 333 333 333 333 333 333 333 333 333 205 205 205 205 205 340 340 340 340 279 279 279 279 279 279 279 279 279 279 279 225 225 225 225 225 101 101 101 101 101 101 101 289 289 289 289 289 289 225 225 225 225 225 204 204 204 204 204 204 115 115 115 115 115 115 189 189 189 189 189 281 281 281 281 281 281 281 289 289 289 289 289 289 277 277 277 53 53 53 53 53 53 281 281 281 281 281 281 289 289 289 173 173 173 49 49 49 49 224 224 224 224 224 47 47 47 328 328 328 279 279 279 279 279 279 279 279 279 53 53 53 53 53 233 233 233 233 233 233 233 233 285 285 285 285 285 285 285 285 105 105 105 105 105 105 105 105 105 105 105 232 232 232 232 232 232 232 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 +1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 35 35 35 35 35 35 35 35 35 35 35 35 35 35 35 35 233 233 233 116 116 116 116 179 179 179 179 179 179 179 179 179 209 209 209 209 209 209 209 276 276 276 276 276 276 283 283 283 283 283 283 283 283 283 283 283 283 208 208 208 208 208 208 208 279 279 279 279 279 279 279 279 279 279 279 279 37 37 37 37 37 37 37 37 37 37 37 37 37 288 288 288 288 288 288 231 231 231 231 231 84 84 84 84 84 84 84 84 84 84 84 84 84 84 84 84 84 84 84 84 84 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 231 231 231 231 231 231 231 231 231 231 231 231 231 231 193 193 193 193 193 289 289 289 289 289 289 289 189 189 189 189 236 236 236 236 236 236 236 236 236 236 236 236 236 236 236 236 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 35 35 35 35 35 35 35 35 35 35 35 35 233 233 233 116 116 116 119 119 119 52 52 52 52 287 287 287 287 287 287 287 287 165 165 165 165 165 165 165 165 165 165 165 109 109 109 49 49 49 224 224 224 224 224 107 107 107 107 107 189 189 189 189 181 181 181 181 181 181 181 181 181 181 181 181 101 101 101 101 101 101 101 233 233 233 116 116 116 179 179 179 179 179 179 144 144 144 144 144 144 144 331 331 331 331 331 331 49 49 49 340 340 340 340 340 340 340 223 223 223 223 223 165 165 165 165 165 165 165 165 116 116 116 116 116 171 171 171 171 171 171 144 144 144 144 144 279 279 279 279 279 279 279 279 279 279 53 53 53 53 53 53 273 273 273 273 273 273 144 144 144 144 144 144 144 144 144 144 144 144 144 144 144 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 227 227 227 227 227 227 227 227 193 193 193 193 193 281 281 281 281 281 281 281 189 189 189 189 189 340 340 340 340 340 340 340 340 275 275 275 275 275 275 275 165 165 165 165 165 165 165 165 165 113 113 113 113 113 113 49 49 49 49 49 49 49 224 224 224 224 224 224 224 224 224 224 224 224 224 224 224 224 1 1 1 1 1 1 1 1 1 1 1 1 1 107 107 107 107 107 189 189 189 189 189 189 189 173 173 173 173 173 173 173 173 173 69 69 69 69 69 276 276 276 276 283 283 283 283 283 283 283 283 283 283 208 208 208 208 179 179 179 179 37 37 37 37 116 116 116 116 116 171 171 171 171 171 171 133 133 133 133 133 277 277 277 277 277 277 277 225 225 225 225 225 225 204 204 204 204 204 204 219 219 219 219 219 219 219 219 225 225 225 225 225 249 249 249 249 249 249 249 249 249 341 341 341 341 341 341 341 116 116 116 119 119 119 52 52 52 52 115 115 115 115 115 69 69 69 69 69 69 69 69 69 69 69 276 276 276 276 276 276 276 276 276 276 276 276 276 1 1 1 1 1 1 1 1 1 1 1 1 +1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 119 119 119 119 119 119 133 133 133 133 133 276 276 276 276 276 276 331 331 331 331 331 144 
144 144 144 144 144 144 144 291 291 291 291 291 291 291 291 291 291 291 291 277 277 277 277 277 208 208 208 208 208 208 208 208 271 271 271 271 271 271 271 271 271 271 225 225 225 225 225 165 165 165 165 165 165 165 165 165 289 289 289 289 289 289 280 280 280 280 280 280 280 223 223 223 223 223 223 223 165 165 165 165 165 165 165 165 165 165 165 165 165 116 116 116 116 116 116 116 116 116 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 279 279 279 279 279 279 279 279 279 279 279 248 248 248 248 248 248 248 248 248 119 119 119 119 49 49 49 49 288 288 288 288 288 288 288 227 227 227 227 17 17 17 17 277 277 277 277 277 277 193 193 193 193 193 193 225 225 225 225 225 225 225 225 225 48 48 48 48 48 227 227 227 227 227 227 227 227 227 53 53 53 53 53 53 281 281 281 281 281 281 288 288 288 107 107 107 107 208 208 208 208 208 187 187 187 187 221 221 221 221 221 221 221 221 281 281 281 281 273 273 273 273 273 273 133 133 133 133 133 221 221 221 221 221 221 221 289 289 289 289 289 189 189 189 236 236 236 236 236 236 236 236 236 236 236 279 279 279 279 279 279 279 279 279 279 279 53 53 53 53 53 228 228 228 228 228 228 331 331 331 331 331 53 53 53 53 232 232 232 232 232 179 179 179 179 179 179 179 179 179 249 249 249 249 249 249 249 249 249 249 249 228 228 228 228 228 228 331 331 331 331 331 189 189 189 292 292 292 292 292 292 292 292 227 227 227 227 227 227 227 227 37 37 37 37 37 37 37 293 293 293 293 293 293 337 337 337 337 337 316 316 316 316 316 287 287 287 287 287 287 287 188 188 188 188 188 287 287 287 287 287 287 287 287 287 287 208 208 208 208 208 208 208 208 208 208 208 208 208 208 208 208 208 208 208 208 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 107 107 107 107 107 107 53 53 53 53 53 53 53 288 288 288 288 288 288 119 119 119 119 52 52 52 52 52 115 115 115 115 115 115 115 115 115 115 193 193 193 193 285 285 285 285 285 285 285 285 285 285 285 189 189 189 189 189 189 189 189 189 340 340 340 340 340 340 340 340 340 331 331 331 331 144 144 144 144 144 144 144 131 131 131 131 131 131 131 131 329 329 329 329 329 329 277 277 277 277 277 205 205 205 205 205 117 117 117 117 117 117 164 164 164 164 164 164 164 164 164 164 164 164 164 164 164 115 115 115 115 115 115 115 115 193 193 193 193 193 285 285 285 285 285 285 285 285 285 285 285 189 189 189 189 189 189 189 189 189 189 340 340 340 340 340 340 340 340 340 340 340 340 340 340 340 340 340 340 340 340 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 35 35 35 35 35 35 35 233 233 233 116 116 116 119 119 119 133 133 133 276 276 276 276 276 331 331 331 331 49 49 49 49 340 340 340 340 340 340 247 247 247 247 247 247 247 233 233 233 233 233 233 225 225 225 225 225 204 204 204 204 204 204 204 219 219 219 219 219 219 219 219 219 219 219 219 219 219 277 277 277 277 37 37 37 37 37 37 37 37 37 108 108 108 108 108 35 35 35 35 35 35 35 273 273 273 273 273 273 49 49 49 49 224 224 224 224 224 271 271 271 271 271 277 277 277 277 189 189 189 189 341 341 341 341 341 341 341 341 341 149 149 149 149 149 149 149 149 149 149 149 149 149 149 149 149 149 149 149 149 149 329 329 329 329 329 340 340 340 340 340 340 47 47 47 47 47 47 47 233 233 233 233 233 233 233 233 233 116 116 116 116 116 116 331 331 331 331 331 331 331 331 331 53 53 53 53 232 232 232 232 232 232 232 232 232 232 232 232 219 219 219 219 219 219 219 219 219 219 101 101 101 101 101 101 101 101 101 101 101 101 233 233 233 116 116 116 47 47 47 328 328 328 328 328 219 219 219 219 219 219 219 219 219 165 165 165 165 165 165 165 165 165 
165 165 165 165 165 220 220 220 220 220 220 220 220 220 220 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 279 279 279 279 279 279 279 279 279 279 279 279 279 248 248 248 248 248 119 119 119 119 119 49 49 49 49 49 49 288 288 288 288 119 119 119 119 119 204 204 204 204 204 204 187 187 187 187 187 221 221 221 221 221 221 221 281 281 281 273 273 273 273 273 133 133 133 133 133 133 221 221 221 221 221 221 289 289 289 289 289 49 49 49 116 116 116 116 116 219 219 219 219 219 219 219 219 219 219 219 53 53 53 53 229 229 229 229 229 229 229 273 273 273 273 273 49 49 49 49 233 233 233 233 233 204 204 204 204 204 204 204 204 219 219 219 219 219 219 219 219 305 305 305 305 305 116 116 116 116 231 231 231 231 231 231 21 21 21 21 21 21 21 21 288 288 288 288 288 107 107 107 107 208 208 208 208 208 208 208 208 208 208 131 131 131 131 131 233 233 233 233 233 233 204 204 204 204 204 204 271 271 271 271 271 271 145 145 145 145 145 289 289 289 289 289 289 289 289 289 289 193 193 193 193 221 221 221 221 221 221 221 337 337 337 337 49 49 49 225 225 225 225 225 144 144 144 144 144 219 219 219 219 219 219 219 219 219 219 53 53 53 53 229 229 229 229 229 229 229 273 273 273 273 273 49 49 49 49 233 233 233 204 204 204 204 204 204 204 204 204 204 204 204 204 204 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 +1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 335 335 335 335 335 335 335 335 335 335 133 133 133 133 133 133 288 288 288 288 288 288 1 1 1 1 1 1 1 1 1 1 1 1 1 331 331 331 331 331 331 331 331 331 53 53 53 53 288 288 288 288 47 47 47 47 47 328 328 328 328 328 328 227 227 227 227 227 227 227 37 37 37 37 37 37 37 37 37 37 293 293 293 293 293 293 337 337 337 337 337 317 317 317 317 317 340 340 340 340 340 340 340 340 340 340 340 340 340 340 331 331 331 331 331 331 331 331 331 101 101 101 101 101 101 101 101 101 101 288 288 288 288 288 288 288 288 288 219 219 219 219 219 219 219 219 219 219 219 219 219 219 219 219 219 219 219 21 21 21 21 21 21 225 225 225 225 225 225 225 225 225 225 225 225 225 225 144 144 144 144 144 144 144 144 144 144 144 47 47 47 233 233 233 116 116 116 119 119 119 52 52 52 52 279 279 279 279 279 279 279 279 279 279 279 279 279 279 279 69 69 69 277 277 277 277 277 277 277 277 277 49 49 49 49 49 224 224 224 224 224 224 224 227 227 227 227 227 227 227 227 227 227 227 227 133 133 133 133 133 133 133 133 133 133 133 133 133 276 276 276 276 276 276 276 276 276 276 276 276 276 276 276 276 276 276 276 276 276 276 276 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 227 227 227 227 227 227 227 227 193 193 193 193 193 281 281 281 281 281 281 281 189 189 189 189 340 340 340 340 340 340 340 340 340 275 275 275 275 275 275 275 165 165 165 165 165 165 165 165 165 113 113 113 113 113 113 49 49 49 49 49 49 49 224 224 224 224 224 224 331 331 331 331 49 49 49 49 340 340 340 340 340 175 175 175 175 175 175 193 193 193 193 289 289 289 289 189 189 189 189 236 236 236 236 236 236 236 236 236 236 171 171 171 171 171 171 171 171 171 171 171 171 171 171 171 133 133 133 133 133 277 277 277 277 277 277 277 277 277 225 225 225 225 225 225 204 204 204 204 204 204 204 204 204 204 204 204 204 204 1 1 1 1 1 115 115 115 115 115 115 115 115 193 193 193 193 193 341 341 341 341 341 341 341 341 341 341 341 204 204 204 204 204 204 204 204 204 331 331 331 331 331 331 331 193 193 193 193 120 120 120 119 119 119 119 189 189 189 189 189 280 280 280 280 280 280 280 280 280 47 47 47 47 233 233 233 233 233 233 233 233 233 
337 337 337 337 337 337 337 337 337 337 321 321 321 321 321 321 321 321 321 345 345 345 345 345 345 345 345 345 333 333 333 333 49 49 49 224 224 224 224 224 224 224 227 227 227 227 227 227 227 227 227 227 227 227 193 193 193 193 193 281 281 281 281 281 281 281 281 281 289 289 289 289 289 145 145 145 145 145 145 204 204 204 204 204 47 47 47 47 109 109 109 109 109 85 85 85 85 85 85 85 85 288 288 288 288 288 288 288 288 288 219 219 219 219 219 219 219 219 219 219 219 219 219 333 333 333 333 333 101 101 101 101 101 101 101 101 101 101 101 101 101 101 49 49 49 49 49 49 288 288 288 288 288 288 288 288 288 288 1 1 1 1 1 1 1 1 47 47 47 47 47 47 47 47 47 233 233 233 233 233 233 233 229 229 229 229 229 229 229 229 229 197 197 197 197 197 281 281 281 281 281 281 281 281 281 281 281 281 281 289 289 289 289 289 193 193 193 277 277 277 277 277 205 205 205 205 205 205 205 205 205 205 205 49 49 49 49 49 49 280 280 280 280 280 280 280 280 175 175 175 175 175 277 277 277 277 277 209 209 209 209 209 209 209 209 209 209 232 232 232 232 232 232 232 232 175 175 175 175 175 175 165 165 165 165 165 165 165 165 165 165 165 165 165 109 109 109 109 49 49 49 49 225 225 225 225 225 225 225 225 225 225 225 340 340 340 340 340 340 340 340 340 340 340 340 340 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 175 175 175 175 175 175 305 305 305 305 116 116 116 116 116 207 207 207 207 207 207 207 329 329 329 233 233 233 233 189 189 189 189 236 236 236 236 236 275 275 275 275 275 275 275 165 165 165 165 165 113 113 113 113 49 49 49 49 49 49 49 224 224 224 224 224 224 224 224 224 224 224 224 224 224 224 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 227 227 227 227 227 227 227 17 17 17 17 277 277 277 193 193 193 193 193 193 225 225 225 225 225 225 225 48 48 48 48 48 48 279 279 279 279 279 279 279 279 279 279 133 133 133 133 133 133 133 116 116 116 116 116 116 116 116 1 1 1 107 107 107 107 107 107 277 277 277 193 193 193 193 193 193 281 281 281 281 281 281 281 221 221 221 221 221 225 225 225 225 225 225 204 204 204 204 204 204 204 204 204 204 204 204 204 204 204 204 204 204 204 204 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 119 119 119 119 119 119 189 189 189 189 189 189 280 280 280 280 280 187 187 187 187 340 340 340 340 340 340 340 50 50 50 50 50 50 50 275 275 275 275 275 275 275 275 275 209 209 209 209 209 209 209 209 224 224 224 224 224 224 224 171 171 171 171 171 171 171 171 171 171 171 101 101 101 101 101 101 101 101 101 101 101 101 101 101 101 101 101 101 101 232 232 232 232 232 232 232 207 207 207 207 207 207 207 207 329 329 329 329 329 329 233 233 233 233 233 189 189 189 189 189 236 236 236 236 236 236 236 236 236 191 191 191 191 341 341 341 49 49 49 233 233 233 288 288 288 187 187 187 187 187 187 187 288 288 288 288 288 288 288 288 288 288 288 288 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 331 331 331 331 331 331 331 249 249 249 233 233 233 233 288 288 288 288 335 335 335 320 320 320 279 279 279 279 279 279 279 279 193 193 193 193 193 288 288 288 115 115 115 115 115 115 85 85 85 85 85 85 85 85 85 85 85 85 232 232 232 232 232 232 232 232 232 232 232 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 +1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 179 179 179 179 179 179 179 179 179 179 179 84 84 84 84 84 84 84 84 84 84 84 146 146 146 146 146 146 146 146 67 67 67 67 67 67 67 67 67 224 224 224 224 224 224 224 335 335 335 69 69 69 69 69 69 69 69 276 276 276 276 276 276 
171 171 171 171 171 171 171 171 171 249 249 249 249 221 221 221 221 221 221 221 221 221 221 221 221 221 221 221 280 280 280 280 280 280 280 280 280 280 280 280 280 280 280 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 279 279 279 279 279 279 279 279 279 279 279 279 279 53 53 53 53 53 53 229 229 229 229 229 229 229 229 293 293 293 293 293 293 293 293 189 189 189 189 189 189 236 236 236 236 236 236 236 236 236 236 236 236 236 119 119 119 119 37 37 37 37 37 37 37 37 37 288 288 288 288 288 288 288 171 171 171 171 171 171 171 171 171 69 69 69 276 276 276 276 276 276 223 223 223 223 223 223 223 223 223 37 37 37 37 37 37 37 37 37 37 37 37 220 220 220 220 220 220 220 220 220 220 220 220 220 220 51 51 51 51 51 51 51 51 51 51 51 328 328 328 328 328 328 328 328 328 328 328 328 328 328 131 131 131 131 131 131 131 233 233 233 233 233 233 204 204 204 204 204 204 204 204 204 204 204 204 51 51 51 51 51 51 51 121 121 121 121 121 144 144 144 144 144 144 144 231 231 231 231 231 231 165 165 165 165 165 165 165 165 165 165 165 165 165 165 165 165 165 228 228 228 228 228 227 227 227 227 227 227 227 227 101 101 101 101 101 101 101 101 101 101 101 288 288 288 288 288 288 288 288 107 107 107 107 107 208 208 208 208 208 208 208 219 219 219 219 219 219 219 219 219 219 219 219 69 69 69 69 69 69 69 69 225 225 225 225 225 225 225 116 116 116 116 116 171 171 171 171 171 171 171 171 171 171 171 171 171 277 277 277 133 133 133 133 233 233 233 233 233 233 233 233 285 285 285 285 285 285 285 285 285 285 189 189 189 189 189 189 189 189 189 272 272 272 272 272 272 272 272 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 187 187 187 187 187 187 187 187 187 187 177 177 177 177 177 177 177 177 341 341 341 341 341 341 341 341 193 193 193 193 193 193 193 281 281 281 281 281 281 281 281 281 281 289 289 289 289 289 49 49 49 49 49 49 49 49 116 116 116 116 116 35 35 35 35 35 35 35 35 35 35 35 35 233 233 233 233 116 116 116 116 116 116 67 67 67 67 67 67 225 225 225 225 225 333 333 333 333 333 333 333 169 169 169 169 169 169 340 340 340 340 340 340 340 340 179 179 179 179 179 179 179 179 179 179 37 37 37 37 37 37 116 116 116 116 116 187 187 187 187 187 177 177 177 177 177 177 177 341 341 341 341 341 193 193 193 193 193 193 193 281 281 281 281 281 281 289 289 289 289 289 289 49 49 49 49 116 116 116 116 107 107 107 107 107 189 189 189 189 289 289 289 289 289 289 289 289 333 333 333 333 209 209 209 209 209 209 232 232 232 232 227 227 227 227 227 227 17 17 17 277 277 277 277 193 193 193 193 193 193 225 225 225 225 225 225 225 48 48 48 48 48 219 219 219 219 219 219 219 219 219 219 219 53 53 53 53 53 53 53 293 293 293 293 293 293 293 109 109 109 109 109 109 109 109 145 145 145 145 145 145 145 145 288 288 288 288 47 47 47 47 47 233 233 233 116 116 116 227 227 227 193 193 193 193 281 281 281 281 281 281 281 281 189 189 189 189 340 340 340 340 340 340 340 275 275 275 275 275 275 165 165 165 165 165 165 165 165 113 113 113 113 113 113 113 113 49 49 49 49 49 49 224 224 224 224 224 224 224 224 224 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 191 191 191 191 191 191 191 191 191 232 232 232 232 232 232 279 279 279 279 279 279 279 273 273 273 273 273 273 101 101 101 101 101 101 101 101 101 101 101 101 101 101 288 288 288 288 288 288 51 51 51 51 51 51 51 51 51 51 51 51 51 51 328 328 328 328 328 328 328 328 328 328 328 328 328 328 328 328 328 328 328 328 328 328 328 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 67 67 67 67 67 
67 67 67 67 67 276 276 276 276 276 276 271 271 271 271 271 271 271 271 145 145 145 145 145 145 181 181 181 181 181 181 37 37 37 37 37 37 37 273 273 273 273 273 273 273 280 280 280 280 280 280 280 280 280 280 280 107 107 107 107 107 107 189 189 189 189 189 221 221 221 221 221 221 221 221 221 221 221 221 53 53 53 53 53 53 53 53 340 340 340 340 340 340 340 51 51 51 51 51 51 51 51 51 51 51 51 51 51 328 328 328 328 328 328 328 328 328 328 328 328 328 328 328 1 1 1 1 1 1 1 1 1 1 119 119 119 119 119 133 133 133 133 276 276 276 276 276 276 276 115 115 115 197 197 197 197 197 281 281 281 281 281 281 281 281 281 197 197 197 197 229 229 229 229 49 49 49 49 49 225 225 225 225 225 225 225 225 37 37 37 37 37 277 277 277 277 277 277 277 277 49 49 49 49 49 289 289 289 289 289 289 289 204 204 204 204 204 204 204 204 204 204 204 204 204 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 227 227 227 227 227 227 227 17 17 17 277 277 277 277 277 277 277 277 193 193 193 193 193 225 225 225 225 225 225 225 48 48 48 48 331 331 331 331 331 49 49 49 340 340 340 340 340 340 340 50 50 50 50 50 50 50 287 287 287 287 287 287 287 287 287 287 287 287 287 287 287 287 287 287 69 69 69 69 69 69 69 69 69 69 69 69 69 69 69 69 69 69 69 69 69 69 224 224 224 224 224 224 224 224 224 224 224 224 224 224 224 224 224 224 1 1 1 1 +291 291 291 291 291 291 291 291 291 291 291 291 291 291 291 291 193 193 193 193 193 193 193 193 232 232 232 232 232 232 232 232 232 232 232 232 232 232 331 331 331 331 331 331 305 305 305 305 305 305 229 229 229 229 229 229 229 229 49 49 49 49 49 232 232 232 232 232 232 232 331 331 331 331 331 331 331 331 189 189 189 292 292 292 292 292 35 35 35 35 35 35 35 35 35 35 35 35 35 35 35 35 237 237 237 237 237 237 237 237 237 177 177 177 177 49 49 49 49 225 225 225 225 225 225 225 225 225 225 225 340 340 340 340 340 340 340 340 340 340 340 340 35 35 35 35 35 233 233 233 233 116 116 116 116 331 331 331 331 331 189 189 189 121 121 121 121 121 85 85 85 85 85 85 85 85 85 85 85 85 85 85 288 288 288 288 288 288 288 219 219 219 219 219 219 219 219 219 219 219 219 219 219 149 149 149 149 149 149 149 149 149 149 149 149 149 149 149 149 149 149 149 329 329 329 329 329 340 340 340 340 340 340 340 340 340 340 340 340 340 340 340 340 340 340 340 340 340 340 1 1 1 1 1 1 1 1 1 1 1 1 179 179 179 179 179 179 179 179 179 179 179 179 144 144 144 144 144 115 115 115 115 115 115 115 115 115 115 115 21 21 21 21 21 21 21 21 277 277 277 277 277 277 277 220 220 220 220 220 220 220 220 220 179 179 179 179 179 133 133 133 133 133 276 276 276 276 276 276 276 276 276 276 276 283 283 283 283 283 283 283 283 283 283 283 283 283 283 283 283 249 249 249 249 249 249 249 249 249 249 249 249 116 116 116 116 116 116 279 279 279 279 279 279 279 279 279 53 53 53 228 228 228 228 228 228 228 228 228 228 228 175 175 175 175 175 175 175 175 277 277 277 277 277 277 277 164 164 164 164 164 164 164 164 164 164 279 279 279 279 279 279 279 279 279 279 279 279 279 289 289 289 289 289 289 277 277 277 277 277 209 209 209 209 209 209 209 209 209 221 221 221 221 221 221 221 280 280 280 280 280 280 280 47 47 47 233 233 233 116 116 116 116 331 331 331 49 49 49 340 340 340 340 340 340 340 340 67 67 67 67 67 67 225 225 225 225 225 333 333 333 333 333 333 333 205 205 205 205 205 340 340 340 340 340 340 340 340 287 287 287 287 287 287 287 287 333 333 333 193 193 193 193 281 281 281 281 281 281 281 289 289 289 289 49 49 49 49 
116 116 116 116 116 51 51 51 51 51 51 51 51 51 51 272 272 272 272 272 272 272 187 187 187 232 232 232 50 50 50 50 50 50 50 179 179 179 179 179 179 179 179 179 21 21 21 21 21 21 21 21 277 277 277 277 277 277 277 116 116 116 116 223 223 223 223 193 193 193 193 289 289 289 289 49 49 49 49 224 224 224 224 224 231 231 231 231 231 231 231 231 231 21 21 21 21 21 21 21 21 288 288 288 288 288 288 288 1 1 1 107 107 107 189 189 189 181 181 181 181 181 181 181 181 181 181 181 181 181 181 181 101 101 101 101 101 101 101 101 101 101 101 101 101 101 233 233 233 233 233 233 233 233 116 116 116 116 116 116 1 1 1 1 1 1 331 331 331 331 331 331 193 193 193 193 292 292 292 292 292 292 292 292 292 292 292 287 287 287 287 287 287 287 287 287 287 287 287 320 320 320 320 320 320 320 320 320 320 320 320 320 320 320 320 320 320 320 320 331 331 331 331 331 331 331 331 331 331 331 101 101 101 101 101 101 101 101 101 101 101 101 144 144 144 144 144 144 144 144 179 179 179 179 179 179 179 179 179 179 179 133 133 133 133 277 277 277 277 277 277 277 277 273 273 273 273 273 273 273 273 273 273 189 189 189 189 189 233 233 233 233 233 233 233 233 340 340 340 340 279 279 279 279 279 279 289 289 289 289 289 289 53 53 53 53 53 53 220 220 220 220 220 220 47 47 47 47 47 177 177 177 177 177 177 177 177 177 177 277 277 277 277 133 133 133 133 281 281 281 281 281 281 281 281 281 281 281 281 189 189 189 189 329 329 329 329 329 225 225 225 225 225 225 204 204 204 204 204 204 291 291 291 291 291 291 291 291 291 291 277 277 277 277 277 277 320 320 320 320 320 320 320 187 187 187 187 187 187 187 187 288 288 288 288 288 288 288 288 288 288 288 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 283 283 283 283 283 283 283 283 283 283 283 283 283 283 208 208 208 208 208 208 208 208 208 223 223 223 223 223 223 223 223 305 305 305 305 305 221 221 221 221 221 221 221 288 288 288 288 223 223 223 223 223 223 101 101 101 101 101 101 220 220 220 220 220 220 220 50 50 50 50 50 50 331 331 331 331 331 331 331 331 305 305 305 305 305 229 229 229 229 229 229 229 49 49 49 232 232 232 232 232 47 47 47 47 328 328 328 328 328 231 231 231 231 231 231 231 231 37 37 37 37 37 37 37 37 37 37 277 277 277 277 277 277 277 244 244 244 244 244 244 244 244 244 244 244 187 187 187 187 221 221 221 221 221 221 281 281 281 281 281 281 281 273 273 273 273 273 273 193 193 193 193 193 193 193 277 277 277 277 277 205 205 205 205 205 205 205 205 49 49 49 49 49 233 233 233 233 233 233 280 280 280 280 280 280 47 47 47 233 233 233 116 116 116 116 116 275 275 275 275 275 275 193 193 193 193 217 217 217 217 217 217 217 189 189 189 189 116 116 116 116 219 219 219 219 219 219 219 219 219 219 219 21 21 21 21 21 21 21 21 233 233 233 233 233 233 233 285 285 285 285 285 285 285 285 49 49 49 49 49 49 49 233 233 233 233 233 233 233 233 280 280 280 280 280 280 280 280 280 280 280 280 280 280 280 280 280 280 280 1 1 1 1 1 1 1 1 1 1 1 1 331 331 331 331 331 331 331 193 193 193 193 193 193 112 112 112 112 112 112 112 112 112 283 283 283 283 283 283 283 283 208 208 208 208 208 208 331 331 331 331 331 331 331 331 69 69 69 69 69 69 69 69 69 69 69 69 69 69 340 340 340 340 340 340 340 340 340 340 340 340 340 340 340 340 340 340 340 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 +1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 107 107 107 107 107 107 53 53 53 53 53 288 288 288 288 119 119 119 133 133 133 276 276 276 331 331 331 331 331 49 49 49 49 340 340 340 340 340 340 340 50 50 50 50 50 50 279 279 279 279 279 279 279 
279 279 279 279 279 165 165 165 165 165 165 165 165 165 165 165 165 165 329 329 329 189 189 189 189 189 236 236 236 236 236 236 236 236 236 236 236 279 279 279 279 279 279 279 279 279 279 279 279 279 279 53 53 53 53 229 229 229 229 229 229 229 293 293 293 293 293 293 293 293 189 189 189 189 236 236 236 236 236 236 47 47 47 47 109 109 109 109 109 109 85 85 85 85 85 85 85 85 85 288 288 288 288 288 179 179 179 179 179 179 179 179 179 179 144 144 144 144 144 144 227 227 227 227 227 227 227 227 85 85 85 85 85 85 85 85 85 85 85 85 85 85 85 85 85 85 292 292 292 292 292 292 292 292 292 292 292 292 292 292 292 292 331 331 331 331 193 193 193 193 193 193 193 112 112 112 112 112 112 112 112 112 112 112 112 112 112 112 112 112 112 112 112 112 112 112 112 1 1 1 1 1 187 187 187 187 187 187 187 187 187 187 187 187 172 172 172 172 172 172 172 172 172 172 191 191 191 288 288 288 179 179 179 37 37 37 37 116 116 116 116 107 107 107 107 107 107 193 193 193 193 193 193 193 193 232 232 232 232 232 232 232 232 232 232 131 131 131 131 131 131 131 131 131 131 131 329 329 329 329 329 329 329 144 144 144 144 144 144 144 279 279 279 279 279 279 279 279 279 279 279 279 279 248 248 248 248 248 248 279 279 279 279 279 279 279 279 279 279 279 279 279 225 225 225 225 225 101 101 101 101 101 101 101 101 289 289 289 289 289 289 225 225 225 225 204 204 204 204 204 115 115 115 115 115 189 189 189 189 189 329 329 329 329 329 329 133 133 133 133 133 225 225 225 225 225 225 225 225 225 225 225 49 49 49 49 49 49 49 273 273 273 273 273 273 288 288 288 288 288 288 288 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 227 227 227 227 227 227 227 227 227 227 227 101 101 101 101 101 101 101 101 101 288 288 288 288 179 179 179 37 37 37 328 328 328 107 107 107 107 107 189 189 189 189 232 232 232 232 232 232 219 219 219 219 219 219 49 49 49 49 233 233 233 233 233 233 281 281 281 281 281 281 281 281 281 193 193 193 193 117 117 117 117 117 145 145 145 145 116 116 116 116 116 187 187 187 187 233 233 233 233 233 233 233 233 117 117 117 117 117 117 193 193 193 193 193 221 221 221 221 221 221 221 221 49 49 49 49 289 289 289 289 189 189 189 189 189 189 189 328 328 328 328 328 328 328 47 47 47 47 328 328 328 328 328 50 50 50 50 50 50 279 279 279 279 279 279 279 279 279 133 133 133 133 133 233 233 233 233 233 233 233 280 280 280 280 280 47 47 47 328 328 328 179 179 179 179 179 179 179 179 179 337 337 337 337 321 321 321 321 321 321 229 229 229 229 229 144 144 144 144 144 144 144 144 144 144 144 144 144 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 331 331 331 331 331 331 331 331 148 148 148 148 148 148 148 148 148 148 148 148 67 67 67 67 67 67 67 224 224 224 224 224 271 271 271 271 271 271 271 277 277 277 277 193 193 193 289 289 289 289 204 204 204 204 204 204 331 331 331 331 331 133 133 133 133 133 133 224 224 224 224 224 224 224 224 224 224 224 224 224 224 224 224 224 224 224 224 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 279 279 279 279 279 279 279 279 279 279 279 279 133 133 133 133 133 116 116 116 116 116 116 116 227 227 227 193 193 193 193 281 281 281 281 281 281 281 189 189 189 340 340 340 340 340 340 340 340 340 275 275 275 275 275 165 165 165 165 165 165 165 165 165 113 113 113 113 113 113 113 49 49 49 49 49 224 224 224 224 224 224 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 102 102 102 102 102 102 102 
35 35 35 35 35 35 35 35 35 35 233 233 233 233 233 116 116 116 335 335 335 335 335 320 320 320 320 320 320 320 115 115 115 115 115 115 115 115 249 249 249 249 249 249 249 249 249 249 249 249 232 232 232 232 231 231 231 231 248 248 248 248 248 248 248 248 248 248 248 50 50 50 50 50 50 279 279 279 279 279 279 279 279 279 279 279 279 279 279 193 193 193 237 237 237 237 237 237 237 237 237 237 177 177 177 49 49 49 49 224 224 224 224 224 224 224 291 291 291 291 291 291 193 193 193 193 236 236 236 236 236 236 236 47 47 47 47 109 109 109 109 109 85 85 85 85 85 85 85 85 85 85 85 85 288 288 288 288 179 179 179 179 179 193 193 193 193 228 228 228 228 228 231 231 231 231 231 231 231 231 69 69 69 69 69 69 276 276 276 276 276 276 276 331 331 331 331 331 331 331 331 331 331 53 53 53 53 288 288 288 179 179 179 189 189 189 340 340 340 340 340 340 115 115 115 115 115 197 197 197 197 281 281 281 281 281 281 273 273 273 273 273 49 49 49 49 49 49 341 341 341 341 341 341 193 193 193 193 285 285 285 285 285 285 285 285 285 285 49 49 49 232 232 232 232 187 187 187 187 187 340 340 340 340 340 340 340 340 223 223 223 223 223 223 101 101 101 101 101 101 101 101 101 101 101 101 220 220 220 220 220 220 220 220 220 231 231 231 231 231 231 231 231 69 69 69 69 69 69 69 69 276 276 276 276 276 276 276 276 331 331 331 331 331 331 331 331 53 53 53 53 53 53 53 288 288 288 288 288 288 288 279 279 279 279 279 279 69 69 69 69 277 277 277 277 277 288 288 288 47 47 47 47 328 328 328 328 328 328 271 271 271 271 271 271 271 271 271 271 271 271 133 133 133 133 133 277 277 277 277 277 277 277 277 277 277 277 277 49 49 49 49 49 233 233 233 233 289 289 289 289 280 280 280 179 179 179 179 179 208 208 208 208 208 208 208 208 179 179 179 179 179 179 179 179 37 37 37 37 37 37 37 37 37 37 116 116 116 116 116 116 116 1 1 1 1 1 1 1 1 1 1 1 1 1 +1 1 1 1 1 1 1 1 1 1 1 1 231 231 231 231 231 231 231 231 231 231 69 69 69 69 69 69 69 69 69 69 69 276 276 276 276 276 179 179 179 179 179 179 179 179 84 84 84 84 84 84 84 84 84 84 84 179 179 179 179 179 179 179 179 209 209 209 209 209 209 340 340 340 340 340 340 223 223 223 223 223 223 101 101 101 101 101 101 101 221 221 221 221 221 221 221 225 225 225 225 225 204 204 204 204 204 287 287 287 188 188 188 188 188 287 287 287 287 287 287 287 287 287 149 149 149 149 149 149 149 149 149 232 232 232 232 232 83 83 83 83 83 83 83 83 83 83 83 288 288 288 288 288 288 288 288 288 288 288 288 288 288 288 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 331 331 331 331 331 331 331 331 331 331 331 331 331 331 100 100 100 100 100 100 100 100 100 100 100 100 187 187 187 288 288 288 288 288 331 331 331 331 49 49 49 340 340 340 340 340 340 340 247 247 247 247 247 247 247 247 233 233 233 233 233 233 225 225 225 225 204 204 204 204 204 204 204 204 223 223 223 223 223 223 37 37 37 37 37 37 37 37 37 281 281 281 281 281 281 281 281 281 288 288 288 288 288 331 331 331 331 331 209 209 209 209 209 209 209 220 220 220 220 220 220 220 220 102 102 102 102 102 102 102 102 102 102 102 102 102 275 275 275 275 275 275 275 133 133 133 133 116 116 116 116 116 116 187 187 187 187 232 232 232 232 232 119 119 119 52 52 52 271 271 271 271 271 271 271 165 165 165 165 165 165 165 165 165 165 273 273 273 273 273 273 273 273 144 144 144 144 144 144 144 144 144 144 179 179 179 179 179 179 179 179 84 84 84 84 84 84 84 84 84 84 84 84 50 50 50 50 227 227 227 227 227 227 227 227 227 227 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 232 232 232 232 232 35 35 35 233 233 233 116 116 116 179 179 179 179 179 193 193 193 193 193 
193 340 340 340 340 340 340 340 340 340 331 331 331 331 331 331 331 101 101 101 101 101 101 101 101 101 172 172 172 172 172 172 172 172 51 51 51 51 51 51 51 51 51 51 272 272 272 272 331 331 331 331 331 331 331 331 331 331 331 133 133 133 133 133 133 133 133 281 281 281 281 281 281 281 281 281 281 281 288 288 288 288 47 47 47 328 328 328 119 119 119 119 204 204 204 204 204 204 204 99 99 99 99 99 99 99 99 99 99 99 99 225 225 225 225 225 225 225 49 49 49 49 49 233 233 233 233 116 116 116 116 116 116 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 287 287 287 287 287 287 287 287 305 305 305 305 305 305 305 220 220 220 220 50 50 50 50 50 50 50 107 107 107 107 107 107 107 264 264 264 264 264 264 264 264 264 264 264 264 264 264 264 264 83 83 83 83 83 83 83 83 83 83 288 288 288 288 288 288 288 47 47 47 47 328 328 328 328 47 47 47 47 232 232 232 232 232 232 67 67 67 67 67 67 67 67 67 67 277 277 277 277 277 277 173 173 173 173 173 173 173 173 49 49 49 49 232 232 232 232 47 47 47 47 47 47 281 281 281 281 281 281 281 281 281 101 101 101 101 101 101 101 101 101 101 101 225 225 225 225 225 225 225 49 49 49 49 228 228 228 228 228 228 35 35 35 35 35 35 35 35 233 233 233 116 116 116 179 179 179 179 179 179 208 208 208 208 208 208 208 208 208 279 279 279 279 279 279 279 279 279 279 279 133 133 133 133 133 133 288 288 288 288 288 288 171 171 171 171 171 171 171 171 171 171 171 171 171 171 101 101 101 101 101 101 101 101 101 101 101 101 101 276 276 276 276 276 276 276 276 287 287 287 287 287 188 188 188 119 119 119 52 52 52 52 52 179 179 179 179 179 179 179 179 179 179 85 85 85 85 85 85 85 85 280 280 280 280 280 280 280 280 280 280 35 35 35 35 288 288 288 288 288 288 231 231 231 231 231 101 101 101 101 101 101 101 101 101 101 101 288 288 288 288 288 288 288 288 288 288 288 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 279 279 279 279 279 279 279 279 279 279 279 279 279 279 279 279 279 133 133 133 133 133 288 288 288 288 288 288 187 187 187 187 187 187 288 288 288 288 288 288 288 288 288 288 19 19 19 19 19 19 19 19 19 19 232 232 232 232 232 232 232 232 232 232 271 271 271 271 271 271 271 271 271 271 149 149 149 149 149 149 149 149 273 273 273 273 273 49 49 49 49 49 49 49 49 280 280 280 280 280 280 280 280 280 227 227 227 227 17 17 17 277 277 277 277 277 277 277 193 193 193 193 193 193 225 225 225 225 225 225 48 48 48 48 48 48 48 48 48 48 48 48 48 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 35 35 35 35 35 35 35 35 35 35 35 233 233 233 233 116 116 116 116 116 231 231 231 231 231 193 193 193 193 193 193 277 277 277 277 277 277 277 225 225 225 225 225 225 204 204 204 204 204 204 204 204 204 107 107 107 107 107 107 107 107 107 107 149 149 149 149 149 149 149 149 149 233 233 233 233 233 288 288 288 288 119 119 119 49 49 49 49 49 228 228 228 228 228 287 287 287 287 287 287 287 287 320 320 320 320 320 320 50 50 50 50 50 50 219 219 219 219 219 219 219 219 219 277 277 277 193 193 193 193 281 281 281 281 281 281 281 281 281 272 272 272 272 272 272 187 187 187 187 232 232 232 232 119 119 119 133 133 133 133 276 276 276 276 276 276 276 107 107 107 107 107 133 133 133 133 133 133 133 133 133 117 117 117 117 117 117 117 117 117 117 340 340 340 340 340 340 340 340 340 340 340 340 340 340 340 340 340 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 +1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 35 35 35 35 35 35 35 35 35 35 35 35 233 233 233 116 116 116 116 116 102 102 102 102 102 102 102 102 102 102 102 231 231 231 231 231 231 231 248 248 248 248 248 248 248 248 248 248 248 47 47 47 47 233 
233 233 233 233 233 233 233 233 53 53 53 53 53 121 121 121 121 121 121 144 144 144 144 144 219 219 219 219 219 219 219 219 219 219 165 165 165 165 165 165 165 165 165 165 165 280 280 280 280 280 280 280 280 280 280 331 331 331 331 133 133 133 133 276 276 276 276 276 276 276 47 47 47 47 232 232 232 232 47 47 47 47 47 47 117 117 117 117 117 21 21 21 21 21 21 21 21 21 21 21 273 273 273 273 289 289 289 289 289 49 49 49 49 49 49 116 116 116 116 116 116 107 107 107 107 107 107 264 264 264 264 264 264 264 264 264 264 264 264 335 335 335 335 335 335 335 335 335 335 335 335 321 321 321 321 321 321 321 341 341 341 341 341 341 341 116 116 116 287 287 287 287 48 48 48 48 48 279 279 279 279 279 279 279 279 279 279 279 279 279 279 53 53 53 53 53 53 220 220 220 220 220 220 220 220 220 220 220 220 220 119 119 119 119 119 119 204 204 204 204 204 204 204 204 204 131 131 131 131 131 131 131 131 131 177 177 177 177 177 177 177 177 177 177 177 177 177 177 340 340 340 340 340 340 340 340 340 340 340 340 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 119 119 119 119 119 119 164 164 164 164 164 164 164 164 164 219 219 219 219 219 219 219 219 219 219 219 305 305 305 305 305 305 117 117 117 117 117 49 49 49 49 49 49 233 233 233 233 233 288 288 288 288 107 107 107 107 107 277 277 277 277 165 165 165 165 220 220 220 220 220 220 220 220 179 179 179 193 193 193 193 228 228 228 228 51 51 51 51 51 51 51 328 328 328 328 328 328 187 187 187 187 187 187 187 187 288 288 288 288 288 288 288 288 288 288 288 288 288 288 288 288 288 288 288 288 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 191 191 191 191 191 191 191 191 191 191 191 191 191 191 191 191 172 172 172 172 172 172 172 335 335 335 335 335 320 320 320 320 320 320 320 320 179 179 179 179 179 179 179 179 179 179 37 37 37 116 116 116 116 116 35 35 35 35 35 35 35 35 35 35 281 281 281 281 281 281 281 281 288 288 288 288 288 227 227 227 227 100 100 100 100 100 100 100 100 100 31 31 31 31 31 31 117 117 117 329 329 329 329 329 329 101 101 101 101 101 101 101 101 101 101 101 101 101 101 280 280 280 280 280 280 280 280 280 280 280 187 187 187 232 232 232 119 119 119 52 52 52 227 227 227 227 227 37 37 37 37 37 37 289 289 289 289 289 289 289 289 289 144 144 144 144 144 144 144 144 144 144 144 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 331 331 331 331 331 331 331 331 193 193 193 193 193 112 112 112 112 112 112 112 112 112 112 112 112 112 335 335 335 335 335 320 320 320 320 320 320 320 115 115 115 115 115 115 115 115 115 193 193 193 193 193 117 117 117 117 49 49 49 49 233 233 233 233 233 233 288 288 288 288 288 288 288 288 288 288 1 1 1 115 115 115 115 115 115 320 320 320 320 320 320 320 320 320 320 320 320 320 227 227 227 227 227 227 227 17 17 17 17 277 277 277 277 277 277 277 193 193 193 193 193 225 225 225 225 225 225 225 225 225 48 48 48 48 48 48 48 48 48 48 48 48 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 99 99 99 99 99 99 99 99 99 99 99 99 99 99 99 99 99 99 99 116 116 116 179 179 179 37 37 37 328 328 328 328 328 328 279 279 279 279 279 279 279 279 279 279 279 279 279 133 133 133 133 133 133 133 133 133 133 116 116 116 116 116 116 116 116 171 171 171 171 171 171 171 171 171 171 69 69 69 69 69 69 69 276 276 276 276 276 227 227 227 227 227 227 227 227 227 227 227 227 227 227 227 227 149 149 149 149 149 149 281 281 281 281 281 281 281 281 281 281 281 281 281 205 205 205 205 205 340 340 340 340 340 340 340 340 340 279 279 279 279 279 279 279 279 279 279 279 165 165 165 165 165 165 165 165 165 165 165 220 220 220 220 220 220 
220 231 231 231 231 231 231 231 231 21 21 21 21 21 21 21 288 288 288 288 288 287 287 287 287 287 287 48 48 48 48 291 291 291 291 291 291 291 291 291 291 291 291 291 291 193 193 193 237 237 237 237 237 237 237 220 220 220 220 47 47 47 47 328 328 328 328 328 279 279 279 279 279 279 53 53 53 53 53 53 112 112 112 112 112 50 50 50 50 50 50 50 291 291 291 291 291 291 291 291 291 193 193 193 193 193 193 193 236 236 236 236 236 236 236 236 236 236 236 119 119 119 119 119 37 37 37 37 37 37 37 37 37 37 289 289 289 289 289 280 280 280 280 280 280 280 280 280 280 280 331 331 331 331 331 53 53 53 53 53 53 53 53 53 53 288 288 288 288 288 288 288 288 288 288 288 288 288 288 288 288 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 +1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 119 119 119 119 119 119 119 119 119 119 119 119 119 193 193 193 193 193 193 193 193 193 280 280 280 280 280 280 280 280 280 280 280 280 280 280 280 215 215 215 215 215 215 215 215 215 215 215 215 215 215 249 249 249 249 249 249 249 249 249 249 249 249 249 249 249 249 249 249 109 109 109 109 109 109 340 340 340 340 340 340 340 219 219 219 219 219 219 219 219 219 53 53 53 53 229 229 229 229 229 229 229 229 173 173 173 173 145 145 145 145 145 289 289 289 189 189 189 189 189 189 236 236 236 236 236 236 236 236 236 236 236 236 236 279 279 279 279 279 279 279 279 279 279 279 209 209 209 209 209 209 209 209 209 209 209 209 229 229 229 229 229 229 116 116 116 116 116 116 116 116 231 231 231 231 231 231 231 101 101 101 101 101 101 101 101 101 101 101 101 121 121 121 121 144 144 144 144 287 287 287 287 287 287 287 287 320 320 320 320 320 320 320 47 47 47 47 47 47 173 173 173 173 173 173 173 173 173 173 173 173 173 173 133 133 133 133 133 133 133 133 133 233 233 233 233 233 233 233 233 233 233 233 233 233 233 233 233 116 116 116 116 116 116 116 116 116 116 231 231 231 231 231 231 69 69 69 69 69 69 276 276 276 276 276 276 287 287 287 287 287 287 287 287 320 320 320 320 320 320 320 47 47 47 47 47 47 47 225 225 225 225 225 225 225 225 21 21 21 21 21 21 21 21 21 21 277 277 277 277 277 228 228 228 228 228 228 227 227 227 227 227 17 17 17 277 277 277 277 277 277 193 193 193 193 193 193 225 225 225 225 225 225 225 225 225 48 48 48 48 48 48 48 48 48 48 48 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 283 283 283 283 283 283 283 283 283 283 283 283 283 283 283 208 208 208 208 208 231 231 231 231 231 193 193 193 193 193 193 289 289 289 289 289 189 189 189 189 189 189 116 116 116 116 116 279 279 279 279 279 279 279 279 279 289 289 289 289 289 289 289 133 133 133 133 133 117 117 117 117 49 49 49 49 225 225 225 225 225 225 204 204 204 204 204 204 204 204 204 204 204 204 204 67 67 67 67 67 67 67 67 67 67 67 232 232 232 232 232 232 232 232 232 232 232 232 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 102 102 102 102 102 102 102 102 102 102 102 102 102 102 102 102 102 102 102 115 115 115 115 115 115 115 249 249 249 249 249 249 249 249 233 233 233 288 288 288 115 115 115 115 189 189 189 189 189 189 233 233 233 233 233 233 100 100 100 100 100 100 100 100 100 100 100 100 100 100 100 119 119 119 119 133 133 133 277 277 277 277 277 340 340 340 340 340 279 279 279 279 279 279 53 53 53 53 53 229 229 229 229 229 229 293 293 293 293 293 293 189 189 189 189 236 236 236 236 236 236 187 187 187 187 232 232 232 232 331 331 331 331 331 
331 53 53 53 53 53 288 288 288 288 288 288 288 335 335 335 335 320 320 320 320 320 279 279 279 279 279 279 279 279 279 279 279 164 164 164 164 164 164 164 164 164 164 164 164 164 164 164 275 275 275 275 275 275 275 165 165 165 165 165 165 165 165 165 113 113 113 113 113 113 49 49 49 49 49 49 49 224 224 224 224 224 224 224 224 224 224 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 99 99 99 99 99 99 99 99 99 99 99 99 99 99 99 99 99 328 328 328 328 328 328 179 179 179 179 179 179 179 179 37 37 37 37 37 37 116 116 116 116 116 116 279 279 279 279 279 279 53 53 53 53 228 228 228 228 228 228 228 228 219 219 219 219 219 219 219 219 333 333 333 333 333 333 21 21 21 21 21 21 21 21 225 225 225 225 229 229 229 229 229 229 229 229 340 340 340 340 340 340 340 340 227 227 227 227 227 227 105 105 105 105 105 105 105 105 281 281 281 281 281 281 281 281 281 281 133 133 133 133 133 225 225 225 225 225 225 225 225 225 225 225 172 172 172 172 172 172 172 172 172 172 172 172 172 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 107 107 107 107 107 107 53 53 53 53 53 288 288 288 288 288 227 227 227 227 227 227 227 37 37 37 37 37 37 37 37 37 37 293 293 293 293 293 293 337 337 337 337 337 337 316 316 316 316 316 316 316 316 331 331 331 331 331 49 49 49 49 340 340 340 340 340 340 340 340 287 287 287 287 287 287 287 287 287 287 133 133 133 133 277 277 277 277 277 277 277 277 49 49 49 49 49 109 109 109 49 49 49 224 224 224 224 224 224 279 279 279 279 279 279 279 279 279 279 279 133 133 133 133 133 288 288 288 288 288 288 288 19 19 19 19 19 19 19 19 232 232 232 232 232 187 187 187 187 187 187 187 288 288 288 288 288 288 288 288 288 288 288 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 102 102 102 102 102 102 102 102 102 102 102 102 102 102 102 102 102 219 219 219 219 219 219 219 305 305 305 116 116 116 116 116 116 279 279 279 279 279 279 279 279 279 279 208 208 208 208 208 208 208 208 208 208 119 119 119 119 119 37 37 37 37 37 37 37 37 37 37 37 37 37 37 288 288 288 288 288 288 288 288 288 279 279 279 279 279 279 279 248 248 248 248 248 248 248 248 102 102 102 102 102 102 102 102 102 102 102 102 102 102 102 175 175 175 175 175 175 165 165 165 165 165 165 165 165 165 165 165 328 328 328 328 328 328 328 191 191 191 191 191 191 191 191 191 191 232 232 232 232 232 232 232 232 232 232 232 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 +1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 187 187 187 187 187 187 187 187 187 187 187 289 289 289 289 289 280 280 280 279 279 279 279 279 279 279 279 248 248 248 248 248 248 248 248 248 248 248 279 279 279 279 279 279 279 279 279 279 279 279 133 133 133 133 133 225 225 225 225 225 225 225 225 225 117 117 117 117 117 117 49 49 49 49 49 228 228 228 228 228 227 227 227 227 227 227 227 37 37 37 37 37 37 37 37 37 293 293 293 293 293 293 337 337 337 337 337 316 316 316 316 316 279 279 279 279 279 279 279 279 279 279 279 133 133 133 133 133 133 289 289 289 289 289 289 280 280 280 179 179 179 189 189 189 340 340 340 340 340 340 340 227 227 227 227 227 227 227 227 101 101 101 101 101 101 101 101 101 101 101 101 233 233 233 116 116 116 116 19 19 19 19 19 19 19 19 19 232 232 232 232 232 232 131 131 131 131 233 233 233 233 233 233 205 205 205 205 205 293 293 293 293 293 293 293 293 293 293 293 197 197 197 197 197 236 236 236 236 236 236 236 236 236 236 236 236 119 119 119 119 49 49 49 49 49 49 288 288 288 288 288 288 288 288 288 331 331 331 331 133 133 133 133 133 232 232 232 179 179 179 179 179 179 179 179 179 179 208 208 208 208 208 208 115 115 115 115 115 115 
115 115 53 53 53 53 53 53 53 53 53 53 53 53 53 53 53 53 340 340 340 340 340 340 340 340 102 102 102 102 102 102 102 102 102 102 102 102 102 67 67 67 67 67 67 225 225 225 225 225 225 333 333 333 333 333 333 205 205 205 205 340 340 340 340 340 340 340 340 340 340 340 171 171 171 171 171 209 209 209 209 209 209 209 209 209 209 224 224 224 224 224 224 187 187 187 187 289 289 289 289 280 280 280 280 280 227 227 227 227 227 227 100 100 100 100 100 100 100 100 100 115 115 115 115 115 115 337 337 337 337 337 321 321 321 321 321 321 321 289 289 289 289 289 289 204 204 204 204 204 204 204 204 204 287 287 287 287 287 287 188 188 188 188 175 175 175 175 175 175 175 193 193 193 193 328 328 328 328 191 191 191 191 191 191 191 191 191 191 191 232 232 232 232 232 232 232 232 232 232 232 232 232 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 35 35 35 35 35 35 35 35 35 233 233 233 233 116 116 116 116 116 116 35 35 35 35 35 35 35 340 340 340 340 340 340 340 340 340 340 340 340 171 171 171 171 171 144 144 144 144 144 144 144 119 119 119 119 52 52 52 52 52 275 275 275 275 275 275 275 275 275 193 193 193 193 193 193 281 281 281 281 281 281 281 281 281 281 281 281 220 220 220 220 220 220 220 220 220 220 220 220 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 119 119 119 119 119 133 133 133 133 133 277 277 277 277 277 277 277 340 340 340 340 340 340 340 340 340 340 340 275 275 275 275 275 193 193 193 193 193 281 281 281 281 281 281 281 281 281 221 221 221 221 221 221 280 280 280 280 280 187 187 187 187 232 232 232 232 232 232 232 232 271 271 271 271 271 271 271 277 277 277 277 193 193 193 289 289 289 204 204 204 204 204 204 231 231 231 231 231 193 193 193 193 193 193 193 193 276 276 276 276 276 276 276 276 131 131 131 131 131 131 131 131 131 131 329 329 329 329 329 329 329 277 277 277 277 205 205 205 205 205 205 205 205 293 293 293 293 293 293 293 293 197 197 197 197 197 197 236 236 236 236 236 236 236 50 50 50 50 50 107 107 107 107 107 107 107 107 107 107 21 21 21 21 21 21 21 21 117 117 117 117 204 204 204 204 204 204 204 204 115 115 115 115 115 115 115 115 53 53 53 53 53 53 53 53 53 53 340 340 340 340 340 340 187 187 187 187 232 232 232 119 119 119 119 189 189 189 189 189 189 280 280 280 280 280 280 280 280 280 280 280 331 331 331 331 331 331 149 149 149 149 149 225 225 225 225 225 225 225 225 225 116 116 116 116 116 116 116 116 116 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 119 119 119 119 119 119 133 133 133 133 133 277 277 277 277 277 277 277 340 340 340 340 340 340 340 340 340 340 340 340 275 275 275 275 275 275 275 193 193 193 193 193 281 281 281 281 281 281 281 281 281 281 281 221 221 221 221 221 221 280 280 280 280 280 187 187 187 232 232 232 232 232 232 271 271 271 271 271 271 271 271 209 209 209 209 209 209 209 273 273 273 273 273 49 49 49 225 225 225 225 225 340 340 340 340 340 340 179 179 179 179 179 37 37 37 37 37 37 37 329 329 329 329 189 189 189 189 236 236 236 236 236 236 236 236 111 111 111 111 111 111 111 111 193 193 193 193 193 225 225 225 225 225 225 117 117 117 117 117 277 277 277 49 49 49 232 232 232 232 232 47 47 47 47 47 328 328 328 119 119 119 119 133 133 133 133 133 276 276 276 276 276 276 276 276 276 276 276 247 247 247 247 247 247 247 247 247 247 247 247 247 247 247 247 232 232 232 232 232 232 232 232 191 191 191 191 172 172 172 172 172 172 172 172 191 191 191 288 288 288 219 219 219 219 219 219 53 53 53 53 53 229 229 229 229 229 229 229 229 229 229 340 340 340 340 340 340 
287 287 287 287 48 48 48 48 48 119 119 119 119 119 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 288 288 288 288 288 288 288 288 288 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 119 119 119 119 119 119 164 164 164 164 164 164 164 164 164 164 164 164 164 115 115 115 115 115 249 249 249 249 249 249 249 249 249 249 249 249 233 233 233 288 288 288 288 288 67 67 67 67 67 67 67 225 225 225 225 333 333 333 333 333 333 333 205 205 205 205 205 340 340 340 340 340 287 287 287 287 287 287 287 287 287 149 149 149 149 149 149 149 149 232 232 232 232 232 83 83 83 83 83 83 83 83 83 83 83 288 288 288 288 331 331 331 331 331 133 133 133 133 133 224 224 224 224 224 224 224 224 224 224 224 224 224 224 224 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 +1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 35 35 35 35 35 35 35 35 35 35 35 35 35 35 35 35 35 35 233 233 233 233 233 233 233 116 116 116 116 116 116 119 119 119 119 133 133 133 133 133 133 133 133 133 232 232 232 232 231 231 231 231 231 231 231 231 231 249 249 249 249 249 249 249 249 249 329 329 329 329 329 48 48 48 48 48 279 279 279 279 279 279 279 279 279 221 221 221 221 221 249 249 249 249 249 249 249 249 249 249 285 285 285 285 285 285 285 285 285 48 48 48 187 187 187 187 340 340 340 340 340 340 340 340 275 275 275 275 275 101 101 101 101 101 101 101 101 101 101 288 288 288 219 219 219 219 219 219 219 219 219 219 225 225 225 225 249 249 249 249 249 249 280 280 280 280 280 280 280 287 287 287 287 287 287 287 48 48 48 119 119 119 119 204 204 204 204 204 204 204 204 204 99 99 99 99 99 99 99 99 99 99 99 99 225 225 225 225 225 225 225 49 49 49 49 49 233 233 233 233 116 116 116 116 116 116 116 116 116 116 116 116 116 1 1 1 1 1 1 1 1 191 191 191 191 191 191 191 191 191 191 288 288 288 288 288 191 191 191 191 191 341 341 341 341 341 341 341 49 49 49 49 49 233 233 233 288 288 288 288 131 131 131 131 340 340 340 340 340 340 340 340 340 187 187 187 187 187 187 187 172 172 172 172 172 172 172 331 331 331 208 208 208 208 208 331 331 331 331 331 331 331 144 144 144 144 144 144 175 175 175 175 175 175 175 133 133 133 133 289 289 289 289 289 189 189 189 189 189 236 236 236 236 236 236 236 179 179 179 179 179 193 193 193 193 193 228 228 228 228 228 228 171 171 171 171 171 171 171 145 145 145 145 145 228 228 228 228 191 191 191 191 191 191 191 191 191 237 237 237 237 237 237 237 237 237 237 237 177 177 177 177 177 225 225 225 225 225 49 49 49 49 49 233 233 233 116 116 116 116 116 67 67 67 67 67 67 276 276 276 276 276 119 119 119 119 52 52 52 52 279 279 279 279 279 279 279 279 279 289 289 289 289 289 289 165 165 165 165 165 165 165 165 165 165 165 165 165 165 165 289 289 289 289 289 289 289 280 280 280 280 280 280 280 280 280 280 280 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 179 179 179 179 179 179 179 179 179 208 208 208 208 219 219 219 219 219 219 219 219 219 37 37 37 37 37 37 37 37 233 233 233 288 288 288 107 107 107 107 204 204 204 204 204 204 204 227 227 227 227 227 227 227 227 227 227 53 53 53 53 53 53 112 112 112 112 112 112 112 112 112 112 112 112 112 112 115 115 115 193 193 193 193 173 173 173 173 173 173 277 277 277 49 49 49 233 233 233 288 288 288 288 171 171 171 171 171 145 145 145 145 228 228 228 228 15 15 15 15 277 277 277 277 277 277 277 277 281 281 281 281 281 281 281 133 133 133 133 133 133 133 225 225 225 225 225 225 225 225 225 329 329 329 329 329 329 329 340 340 340 340 340 340 340 340 340 340 340 340 340 340 340 340 340 340 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 331 331 331 331 331 331 331 133 133 133 133 133 133 224 224 224 224 224 224 224 224 102 102 102 102 102 102 102 102 102 102 102 102 102 179 179 179 179 179 179 179 179 179 179 179 179 179 179 179 179 179 249 249 249 249 249 249 249 249 249 249 272 272 272 272 272 272 272 187 187 187 288 288 288 288 288 331 331 331 49 49 49 224 224 224 224 224 224 224 287 287 287 287 287 287 287 287 287 287 149 149 149 149 149 232 232 232 232 232 83 83 83 83 83 83 83 83 288 288 288 288 288 288 288 67 67 67 67 67 67 67 67 224 224 224 224 224 224 224 275 275 275 275 275 101 101 101 101 101 101 101 101 101 101 101 101 288 288 288 288 288 288 288 288 288 288 288 288 288 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 279 279 279 279 279 279 279 279 279 279 133 133 133 133 133 133 116 116 116 116 227 227 227 227 227 227 193 193 193 193 281 281 281 281 281 281 281 189 189 189 189 189 340 340 340 340 340 340 340 340 340 275 275 275 275 275 165 165 165 165 165 165 165 165 113 113 113 113 113 49 49 49 49 49 49 49 224 224 224 224 224 224 224 224 187 187 187 187 232 232 232 232 50 50 50 50 287 287 287 287 287 287 287 287 287 287 287 287 287 249 249 249 249 249 249 249 249 249 249 232 232 232 232 232 119 119 119 49 49 49 288 288 288 288 288 288 271 271 271 271 271 271 271 271 271 271 271 271 225 225 225 225 225 165 165 165 165 165 165 165 165 165 233 233 233 233 233 225 225 225 225 204 204 204 204 204 204 204 204 204 191 191 191 191 191 191 191 191 233 233 233 233 117 117 117 117 117 49 49 49 49 221 221 221 221 221 221 221 221 169 169 169 169 169 169 169 169 289 289 289 289 289 189 189 189 189 116 116 116 116 179 179 179 179 179 179 179 144 144 144 144 144 144 271 271 271 271 271 271 271 271 271 271 271 271 271 271 271 165 165 165 165 165 165 165 165 233 233 233 233 233 233 233 233 233 233 233 173 173 173 173 173 173 173 49 49 49 49 49 224 224 224 224 224 224 224 224 224 115 115 115 115 115 115 115 85 85 85 85 85 85 85 85 85 85 85 85 85 85 85 85 85 85 85 85 289 289 289 289 289 280 280 280 280 280 280 280 280 280 280 280 280 280 280 280 1 1 1 1 1 1 1 1 1 1 +1 1 1 1 1 1 1 1 1 1 247 247 247 247 247 247 247 247 247 247 247 233 233 233 233 233 233 233 233 233 233 233 233 225 225 225 225 204 204 204 204 204 204 204 204 204 204 115 115 115 115 115 115 115 115 115 249 249 249 249 249 249 249 249 249 249 249 249 233 233 233 288 288 288 288 288 279 279 279 279 279 279 279 279 279 279 279 279 164 164 164 164 164 164 164 164 164 164 164 164 164 164 164 102 102 102 102 102 102 102 102 102 102 115 115 115 115 115 115 193 193 193 193 193 117 117 117 49 49 49 232 232 232 232 232 331 331 331 331 331 331 331 331 69 69 69 69 69 277 277 277 277 277 277 232 232 232 335 335 335 335 335 335 320 320 320 320 320 320 187 187 187 187 187 172 172 172 172 179 179 179 179 179 208 208 208 208 208 208 107 107 107 107 107 107 107 107 107 149 149 149 149 149 149 149 149 149 149 149 149 149 149 149 233 233 233 233 233 233 233 340 340 340 340 340 340 340 340 175 175 175 175 175 175 277 277 277 277 277 277 209 209 209 209 209 209 209 209 209 232 232 232 232 232 232 232 232 232 175 175 175 175 175 165 165 165 165 165 165 165 165 165 109 109 109 109 109 49 49 49 49 225 225 225 225 225 225 225 225 225 340 340 340 340 115 115 115 115 115 115 85 85 85 85 85 85 85 85 85 85 85 85 85 85 85 85 85 85 232 232 232 232 232 232 146 146 146 146 146 146 146 146 271 271 271 271 271 271 271 
271 305 305 305 305 305 289 289 289 289 289 289 280 280 280 279 279 279 279 279 279 279 279 279 289 289 289 289 289 277 277 277 193 193 193 193 193 221 221 221 221 221 221 221 221 233 233 233 233 233 105 105 105 105 105 105 105 105 105 232 232 232 232 232 187 187 187 187 232 232 232 232 119 119 119 52 52 52 52 331 331 331 331 331 331 133 133 133 133 133 133 133 224 224 224 224 224 224 224 224 224 224 224 224 224 224 224 224 224 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 102 102 102 102 102 102 102 102 102 102 102 102 102 102 102 102 102 179 179 179 179 179 149 149 149 149 149 149 149 149 116 116 116 47 47 47 328 328 328 328 50 50 50 50 50 219 219 219 219 219 219 219 219 165 165 165 165 165 165 165 165 165 165 165 165 165 280 280 280 280 280 280 280 280 280 280 280 247 247 247 247 247 247 247 247 247 329 329 329 329 329 329 329 329 329 144 144 144 144 144 187 187 187 187 232 232 232 231 231 231 231 337 337 337 337 337 337 320 320 320 320 320 320 320 320 107 107 107 107 107 107 277 277 277 277 53 53 53 53 233 233 233 233 233 233 233 233 233 341 341 341 341 341 341 333 333 333 333 333 189 189 189 189 189 189 189 220 220 220 220 220 220 220 331 331 331 133 133 133 276 276 276 47 47 47 232 232 232 232 232 232 232 67 67 67 67 67 67 277 277 277 277 277 173 173 173 173 173 173 49 49 49 49 232 232 232 232 47 47 47 47 47 281 281 281 281 281 281 281 281 281 101 101 101 101 101 101 101 101 101 225 225 225 225 225 49 49 49 49 228 228 228 228 228 228 111 111 111 111 111 111 111 111 111 111 101 101 101 101 101 101 101 101 101 101 225 225 225 225 225 225 225 116 116 116 115 115 115 115 115 193 193 193 193 116 116 116 116 116 116 116 116 119 119 119 119 119 37 37 37 37 37 37 37 37 37 288 288 288 288 288 288 288 288 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 47 47 47 47 47 47 47 47 47 47 233 233 233 116 116 116 119 119 119 52 52 52 179 179 179 179 179 179 179 179 179 179 179 179 179 179 179 179 249 249 249 249 249 249 249 249 249 249 224 224 224 224 224 171 171 171 171 171 171 171 171 171 171 171 171 37 37 37 37 37 37 37 37 37 37 229 229 229 229 225 225 225 225 225 225 204 204 204 204 204 204 204 115 115 115 115 115 115 115 101 101 101 101 101 101 101 101 101 101 116 116 116 116 116 116 187 187 187 232 232 232 232 232 232 232 171 171 171 171 171 171 171 171 171 171 171 171 171 171 171 171 193 193 193 193 193 277 277 277 277 277 277 173 173 173 173 173 173 49 49 49 49 224 224 224 224 224 224 224 224 224 224 35 35 35 35 35 35 35 35 35 177 177 177 177 177 49 49 49 233 233 233 233 205 205 205 205 205 205 205 205 205 205 340 340 340 340 340 340 340 340 340 340 340 340 340 340 340 340 340 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 247 247 247 247 247 247 247 247 247 247 247 247 247 247 247 247 233 233 233 233 233 225 225 225 225 204 204 204 204 191 191 191 191 288 288 288 288 288 288 331 331 331 331 49 49 49 340 340 340 340 340 50 50 50 50 175 175 175 175 175 175 149 149 149 149 149 149 149 149 149 149 149 224 224 224 224 224 224 224 224 224 187 187 187 187 232 232 232 119 119 119 119 37 37 37 37 37 37 288 288 288 288 288 191 191 191 191 191 191 233 233 233 233 233 233 281 281 281 281 281 281 281 289 289 289 289 289 49 49 49 233 233 233 233 233 233 233 233 233 280 280 280 280 280 280 280 280 280 280 280 280 280 280 280 280 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 331 331 331 331 331 331 331 331 331 331 331 331 331 133 133 133 133 133 224 224 224 224 224 224 224 
224 331 331 331 331 331 331 331 148 148 148 148 148 148 231 231 231 231 231 231 21 21 21 21 21 21 21 288 288 288 288 288 288 288 288 175 175 175 175 175 175 175 133 133 133 133 289 289 289 189 189 189 189 236 236 236 236 236 236 236 50 50 50 50 50 175 175 175 175 175 175 175 175 149 149 149 149 149 149 149 149 149 149 224 224 224 224 224 224 224 224 224 224 224 224 224 224 224 224 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 279 279 279 279 279 279 279 279 279 279 133 133 133 133 116 116 116 116 227 227 227 227 227 17 17 17 277 277 277 277 277 277 193 193 193 193 193 193 225 225 225 225 225 225 225 225 225 48 48 48 48 48 48 48 48 48 48 1 1 1 1 1 1 1 1 1 1 1 +1 1 1 1 1 1 1 1 1 1 1 1 1 35 35 35 35 35 35 35 35 35 35 35 35 340 340 340 340 340 340 340 340 340 191 191 191 191 191 191 172 172 172 172 172 271 271 271 271 271 271 271 271 265 265 265 265 265 265 265 265 265 265 265 265 265 341 341 341 341 341 341 49 49 49 233 233 233 189 189 189 189 236 236 236 236 236 236 236 236 236 236 331 331 331 331 331 331 133 133 133 133 133 133 133 225 225 225 225 225 225 225 225 225 225 225 340 340 340 340 340 340 340 340 340 331 331 331 331 331 331 331 331 331 144 144 144 50 50 50 50 50 50 271 271 271 271 271 271 271 271 337 337 337 337 337 337 337 305 305 305 305 277 277 277 277 277 277 277 277 225 225 225 225 225 204 204 204 204 204 204 204 204 171 171 171 171 171 171 171 171 171 171 171 171 171 171 171 133 133 133 133 133 229 229 229 229 229 229 229 49 49 49 233 233 233 49 49 49 49 49 232 232 232 232 232 47 47 47 47 221 221 221 221 221 221 221 221 221 221 21 21 21 229 229 229 229 229 229 273 273 273 225 225 225 189 189 189 189 285 285 285 285 285 285 285 285 285 285 229 229 229 229 229 49 49 49 49 233 233 233 233 288 288 288 288 288 288 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 47 47 47 47 47 47 47 47 47 47 47 47 47 233 233 233 116 116 116 231 231 231 231 231 231 21 21 21 21 21 21 21 21 288 288 288 287 287 287 287 287 188 188 188 107 107 107 107 204 204 204 204 204 204 204 204 115 115 115 115 115 115 115 115 277 277 277 277 277 133 133 133 133 133 117 117 117 117 189 189 189 189 189 116 116 116 116 116 116 116 187 187 187 232 232 232 119 119 119 52 52 52 52 219 219 219 219 219 219 219 219 219 219 165 165 165 165 165 165 165 165 165 280 280 280 280 280 280 280 280 280 280 47 47 47 47 328 328 328 328 328 50 50 50 50 50 107 107 107 107 107 107 107 264 264 264 264 264 264 264 264 264 264 264 264 264 264 264 264 264 264 264 264 264 264 264 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 99 99 99 99 99 99 99 99 99 99 99 99 99 99 116 116 116 231 231 231 231 231 133 133 133 133 133 133 329 329 329 329 329 144 144 144 144 144 144 144 115 115 115 115 115 115 115 115 277 277 277 277 277 277 209 209 209 209 209 209 209 209 209 228 228 228 228 228 47 47 47 328 328 328 328 328 287 287 287 287 287 287 287 287 287 287 165 165 165 165 165 165 221 221 221 221 221 221 189 189 189 236 236 236 236 236 236 236 236 50 50 50 50 50 175 175 175 175 175 175 175 175 175 175 175 175 149 149 149 149 149 149 149 149 149 149 224 224 224 224 224 224 224 224 224 224 287 287 287 48 48 48 48 107 107 107 107 107 277 277 277 193 193 193 193 193 193 236 236 236 236 236 236 51 51 51 51 51 51 51 51 51 51 51 51 272 272 272 272 272 272 272 272 272 272 272 272 272 272 272 272 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 102 102 102 102 102 102 102 102 102 102 102 102 331 331 331 331 331 331 331 331 331 331 331 53 53 53 53 
233 233 233 233 233 233 117 117 117 117 117 144 144 144 144 144 35 35 35 288 288 288 227 227 227 227 193 193 193 193 193 281 281 281 281 281 281 281 189 189 189 340 340 340 340 39 39 39 39 39 39 225 225 225 225 225 49 49 49 49 49 49 177 177 177 177 177 341 341 341 341 341 37 37 37 37 37 37 37 37 233 233 233 233 233 117 117 117 117 117 144 144 144 144 144 279 279 279 279 279 279 279 279 279 273 273 273 273 273 273 133 133 133 133 233 233 233 233 233 233 233 233 233 281 281 281 281 281 281 281 281 144 144 144 144 144 171 171 171 171 171 144 144 144 144 144 115 115 115 115 115 115 321 321 321 321 321 321 321 189 189 189 189 189 189 189 236 236 236 236 236 236 236 191 191 191 191 191 288 288 288 288 288 288 288 288 288 288 288 288 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 107 107 107 107 107 53 53 53 53 288 288 288 288 288 119 119 119 133 133 133 133 276 276 276 276 276 276 276 276 276 276 276 276 276 276 276 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 283 283 283 283 283 283 283 283 283 283 283 283 283 283 283 283 283 283 283 283 283 283 208 208 208 208 208 208 208 208 208 208 208 208 208 208 208 208 208 331 331 331 331 331 305 305 305 305 305 117 117 117 49 49 49 49 49 233 233 233 288 288 288 283 283 283 283 283 283 283 283 283 283 277 277 277 277 193 193 193 193 237 237 237 237 237 237 237 237 220 220 220 220 220 220 171 171 171 171 145 145 145 228 228 228 228 47 47 47 47 117 117 117 117 117 117 117 21 21 21 21 21 21 21 21 21 273 273 273 273 273 289 289 289 189 189 189 189 236 236 236 236 236 236 236 50 50 50 50 50 179 179 179 179 179 179 179 179 249 249 249 249 249 249 249 249 249 249 249 249 249 224 224 224 224 224 224 224 1 1 1 67 67 67 67 67 67 67 67 67 67 277 277 277 277 277 173 173 173 173 173 173 173 173 173 49 49 49 49 232 232 232 47 47 47 47 47 281 281 281 281 281 281 281 281 281 101 101 101 101 101 101 101 101 101 101 225 225 225 225 225 225 225 49 49 49 228 228 228 187 187 187 187 187 187 172 172 172 172 283 283 283 283 283 208 208 208 208 208 287 287 287 287 287 287 305 305 305 305 305 305 220 220 220 220 220 220 191 191 191 288 288 288 288 288 187 187 187 187 187 233 233 233 233 233 233 289 289 289 289 289 289 289 289 320 320 320 320 179 179 179 179 179 179 179 144 144 144 144 144 144 144 179 179 179 179 179 179 179 179 133 133 133 133 133 133 133 133 133 133 116 116 116 116 116 116 116 116 116 116 116 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 +1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 227 227 227 227 227 227 227 227 227 227 193 193 193 193 281 281 281 281 281 281 281 189 189 189 189 189 340 340 340 340 340 340 340 340 275 275 275 275 275 275 165 165 165 165 165 165 165 165 113 113 113 113 113 49 49 49 49 49 49 224 224 224 224 331 331 331 331 331 331 305 305 305 116 116 116 179 179 179 37 37 37 328 328 328 328 223 223 223 223 223 101 101 101 101 101 101 101 101 221 221 221 221 288 288 288 287 287 287 188 188 188 188 188 279 279 279 279 279 279 279 279 289 289 289 289 289 289 164 164 164 164 164 164 164 164 164 47 47 47 233 233 233 233 289 289 289 289 289 289 289 193 193 193 224 224 224 224 224 227 227 227 227 227 227 227 227 37 37 37 37 37 37 37 37 37 37 37 293 293 293 293 293 337 337 337 337 316 316 316 316 316 219 219 219 219 219 219 219 219 219 165 165 165 165 165 165 165 165 165 228 228 228 228 228 228 228 228 179 179 179 249 249 249 249 249 249 249 228 228 228 228 228 228 331 331 331 331 189 189 189 120 120 120 179 179 179 189 189 189 340 340 340 340 340 187 187 187 187 229 229 229 229 229 273 
273 273 273 273 273 273 273 69 69 69 69 69 277 277 277 277 277 289 289 289 189 189 189 189 116 116 116 116 116 116 116 116 116 116 67 67 67 67 67 67 67 67 277 277 277 277 277 277 173 173 173 173 173 173 173 173 49 49 49 49 232 232 232 232 232 232 232 232 232 232 232 232 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 107 107 107 107 107 107 53 53 53 53 53 53 288 288 288 288 288 288 275 275 275 275 49 49 49 49 49 49 173 173 173 173 173 173 173 225 225 225 133 133 133 133 133 221 221 221 221 221 221 221 221 289 289 289 289 289 189 189 189 189 189 236 236 236 236 236 236 236 236 119 119 119 119 49 49 49 49 288 288 288 288 191 191 191 288 288 288 288 288 331 331 331 331 305 305 305 305 116 116 116 116 107 107 107 107 107 208 208 208 208 208 208 208 208 208 208 208 50 50 50 50 50 175 175 175 175 175 175 175 175 175 175 175 305 305 305 305 305 305 305 116 116 116 116 116 116 116 116 116 287 287 287 287 287 287 287 287 287 287 287 320 320 320 320 320 320 320 320 320 320 320 320 83 83 83 83 83 83 83 83 83 83 83 83 83 83 83 83 277 277 277 277 277 277 277 277 340 340 340 340 340 340 340 340 35 35 35 35 35 288 288 288 288 288 288 223 223 223 223 223 223 209 209 209 209 209 209 209 209 209 281 281 281 281 281 281 281 288 288 288 107 107 107 107 107 189 189 189 189 189 173 173 173 173 173 173 173 173 69 69 69 69 69 69 276 276 276 276 179 179 179 179 179 179 189 189 189 340 340 340 340 340 340 340 143 143 143 143 143 143 143 143 143 143 143 143 101 101 101 101 101 101 101 101 101 101 101 329 329 329 329 49 49 49 49 49 224 224 224 224 224 224 224 224 224 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 283 283 283 283 283 283 283 283 283 283 283 283 208 208 208 208 208 219 219 219 219 219 49 49 49 49 49 233 233 233 233 221 221 221 221 221 221 221 225 225 225 225 225 321 321 321 321 321 321 117 117 117 117 117 189 189 189 189 189 189 189 116 116 116 116 287 287 287 287 188 188 188 188 188 188 175 175 175 175 248 248 248 248 248 248 248 248 248 248 51 51 51 51 51 51 51 51 51 51 51 51 272 272 272 272 272 272 119 119 119 119 48 48 48 48 48 48 48 275 275 275 275 275 275 275 275 275 275 249 249 249 249 249 249 249 249 249 249 249 249 249 249 249 249 249 116 116 116 287 287 287 287 287 287 287 287 287 287 287 287 287 287 287 287 320 320 320 320 320 320 275 275 275 275 275 275 275 275 275 21 21 21 21 21 21 21 109 109 109 145 145 145 145 145 145 145 288 288 288 288 288 288 107 107 107 107 107 107 133 133 133 133 133 133 133 133 225 225 225 225 225 225 225 225 225 225 225 225 225 225 340 340 340 340 340 340 340 47 47 47 47 233 233 233 116 116 116 287 287 287 287 287 287 287 133 133 133 133 133 224 224 224 224 224 224 119 119 119 119 119 48 48 48 231 231 231 231 231 231 337 337 337 337 337 337 337 337 321 321 321 321 321 321 321 321 321 321 321 321 321 340 340 340 340 340 340 340 340 340 340 340 340 340 340 340 340 340 340 340 340 340 340 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 187 187 187 187 187 187 187 187 187 187 187 288 288 288 288 288 288 288 331 331 331 305 305 305 305 116 116 116 116 116 116 116 116 279 279 279 279 279 279 279 279 279 279 149 149 149 149 149 149 149 149 149 289 289 289 289 49 49 49 233 233 233 225 225 225 204 204 204 204 204 227 227 227 227 227 227 227 227 165 165 165 165 165 165 165 220 220 220 220 220 220 220 220 50 50 50 50 279 279 279 279 279 279 279 279 129 129 129 129 233 233 233 233 233 233 233 233 281 281 281 281 281 281 281 281 281 165 165 165 165 165 165 165 165 285 285 285 285 285 285 285 285 285 49 49 49 49 
49 49 232 232 232 232 232 279 279 279 279 279 279 279 133 133 133 133 133 221 221 221 221 221 49 49 49 49 232 232 232 232 287 287 287 287 287 287 48 48 48 231 231 231 231 231 231 231 231 231 53 53 53 53 53 53 53 53 53 232 232 232 232 232 232 232 232 232 232 232 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 +1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 35 35 35 35 35 35 35 35 35 35 35 233 233 233 233 116 116 116 116 227 227 227 227 227 193 193 193 193 193 281 281 281 281 281 281 281 281 189 189 189 189 340 340 340 340 340 340 340 340 340 275 275 275 275 275 165 165 165 165 165 165 165 165 165 165 113 113 113 113 113 113 49 49 49 49 49 49 49 224 224 224 224 224 224 224 224 224 224 1 1 1 1 1 1 1 115 115 115 115 115 115 115 193 193 193 193 193 193 277 277 277 277 277 277 277 225 225 225 225 225 225 204 204 204 204 204 204 204 204 223 223 223 223 223 223 223 223 223 223 223 223 53 53 53 53 53 53 53 53 53 329 329 329 329 329 329 329 116 116 116 287 287 287 48 48 48 48 227 227 227 227 165 165 165 165 165 165 220 220 220 220 220 220 50 50 50 50 279 279 279 279 279 279 279 279 129 129 129 129 129 233 233 233 233 233 233 233 281 281 281 281 281 281 281 165 165 165 165 165 165 165 165 165 285 285 285 285 285 285 285 285 285 285 285 49 49 49 49 232 232 232 232 232 232 232 232 232 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 279 279 279 279 279 279 279 279 279 279 279 279 279 248 248 248 248 248 283 283 283 283 283 283 283 208 208 208 208 208 208 287 287 287 287 287 287 287 305 305 305 305 305 305 305 305 220 220 220 179 179 179 179 145 145 145 145 145 145 281 281 281 281 281 281 281 133 133 133 133 133 225 225 225 225 225 225 172 172 172 172 172 172 47 47 47 47 47 47 333 333 333 333 333 333 333 333 333 333 164 164 164 164 164 164 164 164 164 164 164 164 164 164 164 164 164 164 164 164 279 279 279 279 279 279 279 279 279 279 279 279 279 279 279 279 53 53 53 53 53 53 229 229 229 229 229 229 229 229 333 333 333 333 53 53 53 53 53 53 53 53 288 288 288 287 287 287 287 287 48 48 48 227 227 227 227 227 227 145 145 145 145 145 145 145 145 145 145 145 193 193 193 193 193 225 225 225 225 225 225 225 225 225 225 225 49 49 49 49 340 340 340 340 340 340 340 340 340 340 275 275 275 275 189 189 189 189 225 225 225 225 225 225 225 225 209 209 209 209 209 209 209 209 209 209 209 209 209 209 209 172 172 172 172 172 172 172 172 172 172 172 172 172 1 1 1 1 1 1 1 1 1 171 171 171 171 171 171 171 171 171 171 171 171 69 69 69 276 276 276 276 276 119 119 119 119 48 48 48 48 48 48 223 223 223 223 223 223 223 223 223 37 37 37 37 37 37 37 37 289 289 289 289 289 289 144 144 144 144 144 144 144 171 171 171 171 171 171 171 171 171 133 133 133 133 133 225 225 225 225 225 225 225 288 288 288 179 179 179 179 179 179 179 179 144 144 144 144 144 144 144 115 115 115 115 115 115 115 115 115 85 85 85 85 85 85 85 85 85 85 85 85 85 85 85 289 289 289 289 289 289 289 289 280 280 280 280 280 47 47 47 233 233 233 233 116 116 116 116 116 171 171 171 171 171 171 171 171 193 193 193 193 193 193 193 193 193 277 277 277 277 277 277 340 340 340 340 340 340 340 340 275 275 275 275 275 275 189 189 189 189 329 329 329 329 329 329 101 101 101 101 101 101 101 101 101 101 101 101 101 101 101 101 329 329 329 329 329 189 189 189 189 189 189 236 236 236 236 236 236 236 236 51 51 51 51 233 233 233 233 233 233 117 117 117 117 144 144 144 119 119 119 119 119 119 204 204 204 204 204 204 191 191 191 191 191 191 233 233 233 233 233 233 233 173 173 173 173 173 173 173 173 173 173 225 225 225 225 317 317 317 49 49 49 49 233 233 233 233 233 280 280 280 280 280 280 
47 47 47 47 47 328 328 328 328 328 227 227 227 227 227 227 227 193 193 193 193 193 281 281 281 281 281 281 281 281 189 189 189 189 340 340 340 340 340 340 340 340 275 275 275 275 275 165 165 165 165 165 165 165 113 113 113 113 113 49 49 49 49 49 225 225 225 225 340 340 340 340 340 340 271 271 271 271 271 271 271 271 133 133 133 133 133 281 281 281 281 281 281 281 281 281 281 281 49 49 49 49 229 229 229 229 197 197 197 197 197 197 341 341 341 341 341 341 49 49 49 49 49 228 228 228 228 228 228 228 228 228 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 331 331 331 331 331 331 133 133 133 224 224 224 47 47 47 47 47 328 328 328 328 328 328 67 67 67 67 67 67 67 67 67 67 67 67 67 224 224 224 224 224 224 224 291 291 291 291 291 291 291 291 291 291 291 291 291 291 291 291 193 193 193 193 237 237 237 237 237 237 237 237 237 237 237 237 237 340 340 340 340 119 119 119 119 49 49 49 49 49 288 288 288 288 288 288 1 1 1 131 131 131 131 131 131 131 131 131 131 131 329 329 329 329 329 329 329 144 144 144 144 144 144 144 144 331 331 331 331 331 331 331 331 331 331 331 331 331 331 331 331 331 331 331 331 148 148 148 148 148 148 148 148 148 148 148 148 148 148 148 148 148 148 67 67 67 67 67 67 67 276 276 276 276 276 276 276 331 331 331 331 331 331 331 331 331 331 193 193 193 193 193 193 224 224 224 224 224 224 224 224 107 107 107 107 107 107 204 204 204 204 204 204 204 204 204 204 204 204 204 204 204 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 187 187 187 187 187 187 187 187 187 217 217 217 217 217 217 217 37 37 37 37 37 37 37 37 37 221 221 221 221 221 221 337 337 337 337 317 317 317 225 225 225 225 161 161 161 161 161 161 161 161 161 289 289 289 289 289 189 189 189 189 116 116 116 116 116 227 227 227 227 227 193 193 193 193 193 281 281 281 281 281 281 281 189 189 189 189 340 340 340 340 340 340 340 340 340 275 275 275 275 275 165 165 165 165 165 165 165 165 165 113 113 113 113 113 113 49 49 49 49 49 49 224 224 224 224 331 331 331 331 331 331 331 193 193 193 193 193 232 232 232 232 232 283 283 283 283 283 283 208 208 208 208 208 331 331 331 331 331 49 49 49 340 340 340 340 279 279 279 279 279 279 279 279 279 279 279 165 165 165 165 165 165 165 165 165 173 173 173 173 173 173 173 173 173 225 225 225 225 204 204 204 204 204 204 204 204 204 204 83 83 83 83 83 83 83 288 288 288 288 288 288 288 288 288 288 288 187 187 187 232 232 232 232 119 119 119 52 52 52 52 223 223 223 223 223 223 223 223 223 223 165 165 165 165 165 165 165 165 165 165 165 165 165 232 232 232 232 232 232 232 232 232 232 232 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 +1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 187 187 187 187 187 187 187 187 187 187 187 187 288 288 288 288 288 115 115 115 115 115 115 115 53 53 53 53 53 53 340 340 340 340 340 340 340 340 340 340 340 340 275 275 275 275 275 209 209 209 209 225 225 225 225 225 225 225 204 204 204 204 204 204 204 279 279 279 279 279 279 279 279 279 279 279 279 279 279 209 209 209 209 209 209 209 209 209 228 228 228 228 131 131 131 340 340 340 340 340 340 340 340 191 191 191 191 172 172 172 172 172 172 172 172 102 102 102 102 102 227 227 227 227 227 227 53 53 53 53 53 281 281 281 281 281 288 288 288 107 107 107 107 204 204 204 204 115 115 115 115 115 115 115 277 277 277 277 277 209 209 209 209 209 209 209 229 229 229 229 229 189 189 189 189 189 236 236 236 236 236 236 236 236 236 236 236 236 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 
[Added data file content: long lines of space-separated integer IDs, one sequence per line (apparently discrete unit label sequences); the repetitive numeric content is not reproduced here.]
288 288 288 47 47 47 328 328 328 328 328 328 328 328 328 107 107 107 107 107 107 107 265 265 265 265 265 265 265 265 265 265 265 265 265 265 265 265 340 340 340 340 340 340 340 47 47 47 47 328 328 328 119 119 119 119 52 52 52 107 107 107 107 107 277 277 277 277 277 277 37 37 37 37 37 37 37 37 37 233 233 233 233 116 116 116 335 335 335 335 335 320 320 320 320 320 320 320 331 331 331 331 331 331 331 331 331 21 21 21 21 233 233 233 233 233 289 289 289 289 289 49 49 49 49 49 49 49 116 116 116 116 116 116 116 116 116 116 116 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 179 179 179 179 179 179 179 179 179 208 208 208 208 208 208 208 331 331 331 331 331 331 331 331 331 331 69 69 69 69 69 69 69 69 69 69 69 221 221 221 221 221 288 288 288 288 288 288 288 288 288 1 1 1 215 215 215 215 215 215 215 215 215 215 69 69 69 69 69 69 69 69 233 233 233 233 233 289 289 289 289 289 49 49 49 225 225 225 225 225 204 204 204 204 204 204 204 204 47 47 47 47 47 47 47 333 333 333 333 333 333 333 333 333 333 164 164 164 164 164 164 164 164 164 164 164 164 164 164 164 164 164 164 164 164 1 1 1 1 1 1 1 1 1 107 107 107 107 107 107 107 107 209 209 209 209 209 209 209 209 189 189 189 189 189 189 189 236 236 236 236 236 236 236 236 179 179 179 179 179 179 179 179 179 53 53 53 53 237 237 237 237 237 237 237 237 237 177 177 177 177 277 277 277 277 277 277 204 204 204 204 204 204 204 204 204 204 204 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 35 35 35 35 35 35 35 35 35 35 35 35 35 35 233 233 233 233 116 116 116 116 119 119 119 204 204 204 204 204 204 204 47 47 47 47 233 233 233 233 233 233 173 173 173 173 173 173 173 173 173 69 69 69 69 277 277 277 277 277 277 113 113 113 113 113 113 49 49 49 233 233 233 49 49 49 49 49 49 288 288 288 288 288 227 227 227 227 227 227 227 37 37 37 37 37 37 37 37 37 37 37 37 293 293 293 293 293 293 293 293 337 337 337 337 337 337 316 316 316 316 316 316 316 331 331 331 331 331 331 331 49 49 49 49 340 340 340 340 340 340 223 223 223 223 223 223 223 133 133 133 133 133 133 133 173 173 173 173 173 288 288 288 287 287 287 287 188 188 188 188 188 115 115 115 115 115 115 115 115 115 115 320 320 320 320 320 320 320 320 320 320 320 320 320 320 320 320 320 320 320 320 320 320 320 320 320 320 320 320 1 1 1 119 119 119 119 119 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 288 288 288 288 288 288 288 288 288 288 288 288 288 288 288 331 331 331 331 193 193 193 193 112 112 112 112 112 112 112 112 112 112 331 331 331 49 49 49 340 340 340 340 340 340 340 179 179 179 179 179 179 179 179 179 179 21 21 21 21 21 21 277 277 277 277 277 117 117 117 144 144 144 144 144 144 171 171 171 171 171 171 171 144 144 144 144 179 179 179 179 179 179 179 179 179 179 179 179 179 193 193 193 193 193 193 228 228 228 228 228 228 228 228 228 228 228 228 228 119 119 119 119 119 49 49 49 49 49 49 49 232 232 232 232 232 232 232 232 232 232 232 232 232 1 1 1 1 107 107 107 107 107 107 193 193 193 193 193 193 193 193 277 277 277 277 277 277 277 117 117 117 117 189 189 189 189 236 236 236 236 236 236 236 236 236 50 50 50 50 50 50 223 223 223 223 223 223 223 223 223 223 223 223 223 223 101 101 101 101 101 101 101 101 101 101 101 101 101 101 101 101 101 101 101 101 101 49 49 49 232 232 232 232 187 187 187 187 232 232 232 187 187 187 187 187 187 187 289 289 289 289 289 289 280 280 280 280 280 280 280 280 280 115 115 115 115 115 115 115 115 133 133 133 133 133 133 133 133 133 133 133 133 232 232 232 232 232 232 232 232 232 232 232 
232 232 232 232 232 232 232 232 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 331 331 331 331 331 331 331 331 331 331 331 331 331 331 331 331 331 69 69 69 69 69 69 69 69 69 69 220 220 220 220 220 220 220 220 220 220 220 51 51 51 51 51 51 51 51 51 272 272 272 272 272 272 272 287 287 287 287 287 287 287 287 287 320 320 320 320 320 320 320 320 50 50 50 50 50 50 175 175 175 175 175 175 175 149 149 149 149 149 149 149 149 149 149 149 149 149 149 149 149 149 224 224 224 224 224 224 224 224 224 224 224 224 224 224 224 224 224 224 224 224 224 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 50 50 50 50 50 50 50 50 50 50 50 50 50 279 279 279 279 279 279 279 279 279 279 279 289 289 289 289 289 289 289 289 277 277 277 277 165 165 165 165 165 165 165 165 165 165 165 233 233 233 233 216 216 216 216 216 216 216 175 175 175 175 175 175 149 149 149 149 149 149 149 149 149 149 149 149 149 149 224 224 224 224 224 224 224 224 224 224 1 1 1 47 47 47 47 47 47 47 232 232 232 232 232 232 232 232 232 232 232 232 67 67 67 67 67 67 67 67 67 67 277 277 277 277 277 277 173 173 173 173 173 173 173 173 173 173 49 49 49 49 232 232 232 232 232 232 232 232 232 232 232 175 175 175 175 175 175 149 149 149 149 149 149 149 149 149 149 149 149 224 224 224 224 224 224 224 224 224 224 224 224 224 224 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 +1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 35 35 35 35 35 35 35 35 35 35 35 35 35 35 35 233 233 233 233 233 116 116 116 115 115 115 189 189 189 189 189 229 229 229 229 229 229 229 229 37 37 37 37 37 37 37 37 37 37 37 233 233 233 233 233 116 116 116 116 47 47 47 47 328 328 328 328 328 179 179 179 179 179 179 179 179 144 144 144 144 144 144 144 144 144 144 144 331 331 331 331 331 331 331 331 331 331 331 331 331 331 331 331 100 100 100 100 100 100 100 100 100 100 100 100 283 283 283 283 283 283 283 283 283 283 208 208 208 208 208 331 331 331 331 331 331 331 53 53 53 53 53 53 341 341 341 341 49 49 49 49 233 233 233 288 288 288 50 50 50 50 50 107 107 107 107 107 107 264 264 264 264 264 264 264 264 264 264 264 264 264 264 264 264 264 264 264 264 264 264 264 264 264 264 264 264 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 227 227 227 227 227 227 227 227 227 227 227 227 227 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 293 293 293 293 293 293 293 293 293 293 293 337 337 337 337 337 337 316 316 316 316 316 316 316 316 316 316 316 175 175 175 175 175 175 175 175 175 277 277 277 277 277 277 277 277 277 277 249 249 249 249 249 249 249 249 249 249 249 249 249 249 249 249 249 249 249 233 233 233 233 233 116 116 116 116 116 116 187 187 187 187 232 232 232 232 232 232 232 232 232 232 279 279 279 279 279 279 279 279 279 279 279 279 279 279 279 273 273 273 273 273 193 193 193 277 277 277 277 277 277 277 277 277 277 277 277 277 277 277 189 189 189 189 189 288 288 288 288 131 131 131 131 131 131 340 340 340 340 179 179 179 179 208 208 208 208 208 208 287 287 287 287 287 287 287 287 287 287 287 149 149 149 149 149 149 149 149 149 149 149 149 149 149 233 233 233 233 116 116 116 116 47 47 47 109 109 109 109 85 85 85 85 85 85 85 85 85 85 85 85 85 85 85 85 85 85 288 288 288 288 288 288 288 288 47 47 47 233 233 233 233 233 116 116 116 116 283 283 283 283 283 283 283 283 283 283 283 283 283 283 283 283 53 53 53 53 53 53 53 173 173 173 173 173 173 173 49 49 49 49 225 225 225 225 225 225 225 225 225 225 225 116 116 116 1 1 1 1 1 215 215 215 215 215 215 215 215 215 215 215 215 133 133 
133 133 133 133 233 233 233 233 233 233 233 289 289 289 289 289 289 225 225 225 225 204 204 204 204 204 204 204 204 115 115 115 115 115 115 115 85 85 85 85 85 85 85 85 85 85 85 232 232 232 232 232 119 119 119 119 52 52 52 271 271 271 271 271 271 271 271 225 225 225 225 225 225 37 37 37 37 37 37 37 289 289 289 289 289 289 289 173 173 173 173 173 173 173 73 73 73 73 277 277 277 277 277 228 228 228 228 228 228 228 287 287 287 287 287 287 287 287 69 69 69 69 69 69 69 69 277 277 277 277 277 117 117 117 117 117 117 340 340 340 340 340 179 179 179 179 179 144 144 144 144 144 144 144 144 144 144 144 144 144 144 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 283 283 283 283 283 283 283 283 283 283 208 208 208 208 208 208 208 179 179 179 179 179 179 37 37 37 37 37 116 116 116 107 107 107 107 107 193 193 193 193 193 232 232 232 232 232 232 331 331 331 331 331 331 21 21 21 21 21 21 21 21 21 113 113 113 113 113 113 113 189 189 189 189 189 236 236 236 236 236 236 179 179 179 179 179 179 179 193 193 193 193 228 228 228 228 228 131 131 131 131 131 131 131 131 131 131 131 131 329 329 329 329 329 329 329 144 144 144 144 144 144 144 279 279 279 279 279 279 279 279 279 279 279 193 193 193 193 233 233 233 233 233 233 233 233 280 280 280 280 280 179 179 179 179 179 208 208 208 179 179 179 37 37 37 116 116 116 116 116 271 271 271 271 271 271 271 271 271 271 271 271 37 37 37 37 37 37 37 37 37 37 37 281 281 281 281 281 281 281 281 281 288 288 288 288 179 179 179 144 144 144 144 144 144 144 144 144 144 144 144 144 144 144 144 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 35 35 35 35 35 35 35 35 35 35 35 233 233 233 116 116 116 283 283 283 283 283 283 208 208 208 208 208 208 208 179 179 179 179 179 179 179 179 179 179 179 37 37 37 37 37 37 116 116 116 179 179 179 179 179 179 179 179 148 148 148 148 148 148 148 148 148 148 148 99 99 99 99 99 99 99 99 99 99 99 99 99 99 99 99 99 99 340 340 340 340 340 340 340 340 340 19 19 19 19 19 19 19 232 232 232 232 232 179 179 179 179 179 179 193 193 193 193 193 228 228 228 228 228 228 231 231 231 231 231 231 231 84 84 84 84 84 84 84 84 84 84 84 84 84 84 84 84 84 84 84 84 84 84 84 84 84 84 84 84 84 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 227 227 227 227 227 227 227 227 227 227 227 37 37 37 37 37 37 37 37 37 37 37 37 293 293 293 293 293 293 337 337 337 337 316 316 316 316 316 331 331 331 331 331 49 49 49 49 49 340 340 340 340 340 340 340 340 340 231 231 231 231 231 21 21 21 21 21 21 21 21 288 288 288 288 288 288 288 223 223 223 223 223 223 223 305 305 305 305 221 221 221 221 221 221 221 189 189 189 189 236 236 236 236 236 236 236 236 236 236 35 35 35 35 35 35 35 35 35 288 288 288 288 179 179 179 179 144 144 144 144 144 144 144 144 144 144 144 144 144 144 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 +1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 50 50 50 50 50 50 50 50 50 50 111 111 111 111 111 111 111 111 111 111 111 111 111 101 101 101 101 101 101 101 101 101 101 101 101 101 101 225 225 225 225 225 225 116 116 116 47 47 47 47 328 328 328 328 328 47 47 47 47 109 109 109 109 109 109 109 85 85 85 85 85 85 288 288 288 288 288 187 187 187 225 225 225 225 225 225 225 225 225 225 225 225 133 133 133 133 329 329 329 329 329 329 329 329 49 49 49 49 49 232 232 232 232 232 232 232 232 232 232 232 232 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 175 175 
175 175 175 175 175 175 21 21 21 21 21 21 21 21 21 21 21 277 277 277 277 277 277 277 109 109 109 116 116 116 116 116 116 116 187 187 187 232 232 232 232 232 50 50 50 50 327 327 327 327 327 327 133 133 133 133 133 133 277 277 277 277 277 277 277 277 277 204 204 204 204 204 204 283 283 283 283 283 283 283 283 283 283 283 283 283 283 283 283 283 283 69 69 69 69 69 69 69 69 69 277 277 277 277 277 277 277 288 288 288 288 288 288 288 288 288 288 288 1 1 1 1 1 1 1 1 1 327 327 327 327 327 327 327 327 327 327 327 133 133 133 133 133 277 277 277 277 277 277 277 277 277 204 204 204 204 204 204 204 287 287 287 287 287 287 287 287 287 287 287 287 287 287 287 101 101 101 101 101 101 101 101 101 101 101 101 101 288 288 288 288 288 288 288 288 288 288 288 288 288 288 288 1 1 1 1 1 1 1 327 327 327 327 327 327 327 327 133 133 133 133 133 133 133 277 277 277 277 277 277 277 204 204 204 204 204 204 204 204 204 204 204 204 204 204 51 51 51 51 51 51 51 51 51 51 51 177 177 177 177 177 177 177 225 225 225 225 225 225 225 225 204 204 204 204 204 204 204 204 115 115 115 115 115 115 115 277 277 277 277 277 133 133 133 133 133 133 133 280 280 280 280 280 280 280 280 280 280 280 280 280 47 47 47 47 328 328 328 328 328 328 328 335 335 335 335 335 335 335 335 335 133 133 133 133 133 133 225 225 225 225 225 225 245 245 245 245 245 245 245 245 245 245 245 189 189 189 189 189 284 284 284 284 284 284 284 284 284 284 284 284 284 175 175 175 175 175 175 277 277 277 277 277 277 164 164 164 164 164 164 164 164 164 164 164 164 164 164 164 331 331 331 331 331 331 331 331 331 331 331 193 193 193 233 233 233 233 233 233 233 233 233 281 281 281 281 281 281 281 281 281 281 281 281 204 204 204 204 204 204 204 204 204 204 204 204 204 204 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 283 283 283 283 283 283 283 283 283 283 283 208 208 208 208 208 208 331 331 331 331 331 331 331 331 331 331 69 69 69 69 69 69 276 276 276 276 276 276 50 50 50 50 50 50 171 171 171 171 171 171 171 171 171 171 171 171 171 165 165 165 165 165 165 165 165 165 165 165 165 165 117 117 117 117 117 189 189 189 189 189 189 116 116 116 116 116 116 116 107 107 107 107 107 107 107 107 107 277 277 277 277 277 277 277 85 85 85 85 85 85 85 85 85 85 85 85 85 85 232 232 232 232 232 232 232 279 279 279 279 279 279 279 279 279 279 279 279 165 165 165 165 165 165 165 165 165 225 225 225 225 225 225 225 225 144 144 144 144 144 144 179 179 179 179 179 179 179 179 37 37 37 37 37 37 37 37 37 37 37 37 37 288 288 288 288 288 288 288 288 288 288 288 288 288 288 288 288 1 1 1 47 47 47 47 47 47 47 47 47 47 47 233 233 233 233 116 116 116 107 107 107 107 107 107 189 189 189 189 233 233 233 233 233 233 209 209 209 209 209 209 209 209 209 209 209 292 292 292 292 292 292 292 292 119 119 119 52 52 52 179 179 179 179 179 179 179 179 179 179 179 179 37 37 37 37 37 37 37 37 37 37 37 37 288 288 288 288 288 288 288 288 288 288 288 288 288 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 187 187 187 187 187 187 187 187 221 221 221 221 221 221 221 281 281 281 281 281 281 281 289 289 289 289 289 289 289 133 133 133 133 133 133 133 233 233 233 233 233 233 117 117 117 117 117 189 189 189 236 236 236 236 236 236 236 236 236 115 115 115 115 115 115 85 85 85 85 85 85 85 85 85 85 85 232 232 232 232 232 179 179 179 179 179 179 179 179 179 179 144 144 144 144 144 107 107 107 107 107 107 107 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 220 220 220 220 220 220 220 220 220 220 220 220 220 220 220 331 331 331 331 144 144 144 144 144 144 144 287 287 
287 287 287 287 287 287 287 287 287 287 287 320 320 320 320 320 320 320 320 320 320 320 320 320 320 320 107 107 107 107 107 107 107 277 277 277 277 277 165 165 165 165 165 165 165 165 165 165 165 165 165 165 165 165 165 117 117 117 117 117 117 340 340 340 340 340 340 47 47 47 47 47 328 328 328 327 327 327 327 327 327 327 133 133 133 133 133 277 277 277 277 277 277 277 277 204 204 204 204 204 204 204 204 204 291 291 291 291 291 291 291 291 291 291 291 291 291 291 291 193 193 193 193 193 193 193 193 193 220 220 220 220 220 220 220 220 220 220 220 220 220 220 220 220 220 1 1 1 1 1 1 1 1 1 1 115 115 115 115 115 115 197 197 197 197 281 281 281 281 281 281 281 281 281 281 281 281 281 281 281 281 281 101 101 101 101 101 101 101 101 101 101 101 117 117 117 117 117 49 49 49 49 49 117 117 117 117 117 117 225 225 225 225 225 204 204 204 204 204 204 204 204 204 204 275 275 275 275 275 275 275 275 275 275 275 275 275 275 133 133 133 133 133 133 133 133 133 133 133 133 116 116 116 116 116 116 116 116 116 116 116 1 1 1 1 1 1 1 179 179 179 179 179 179 179 179 179 179 179 179 179 133 133 133 133 133 276 276 276 276 276 276 276 276 276 276 276 276 276 276 276 276 276 276 276 276 276 276 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 +1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 179 179 179 179 179 179 179 179 179 144 144 144 144 144 144 171 171 171 171 171 171 171 171 171 165 165 165 165 165 165 165 165 165 165 165 165 280 280 280 280 280 280 280 280 280 280 280 280 331 331 331 331 331 49 49 49 340 340 340 340 279 279 279 279 279 279 279 279 279 279 279 279 279 279 279 229 229 229 229 229 229 69 69 69 69 69 69 69 69 69 69 69 69 69 69 69 69 69 69 224 224 224 224 224 224 224 224 224 224 224 224 224 1 1 1 1 1 1 1 1 1 1 331 331 331 331 331 331 331 331 331 331 331 331 331 331 331 331 101 101 101 101 101 101 101 101 101 101 101 101 101 101 101 288 288 288 288 288 288 288 288 288 288 288 288 288 288 288 288 288 1 1 1 47 47 47 233 233 233 233 233 233 233 116 116 116 116 116 116 291 291 291 291 291 291 291 291 291 291 291 291 291 193 193 193 193 193 193 193 232 232 232 232 232 232 232 232 232 232 232 232 232 232 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 67 67 67 67 67 67 67 67 67 67 67 67 67 67 225 225 225 225 225 225 225 281 281 281 281 281 281 281 281 281 281 281 244 244 244 244 244 244 244 244 244 244 227 227 227 227 227 227 227 227 227 53 53 53 53 53 53 53 112 112 112 112 112 112 112 112 112 171 171 171 171 171 171 171 171 171 171 277 277 277 133 133 133 133 221 221 221 221 221 221 221 221 49 49 49 49 49 225 225 225 225 225 225 225 225 225 225 116 116 116 116 116 116 116 116 116 116 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 179 179 179 179 179 179 179 179 179 179 179 144 144 144 227 227 227 227 227 227 227 227 227 227 227 227 85 85 85 85 85 85 85 85 85 85 85 85 85 85 85 85 85 292 292 292 292 292 292 292 292 292 292 292 292 292 292 292 292 292 292 331 331 331 49 49 49 49 340 340 340 340 340 340 340 223 223 223 223 223 223 223 223 223 21 21 21 21 21 21 21 21 21 21 21 21 21 21 277 277 277 277 277 277 277 277 277 277 216 216 216 216 216 216 216 216 216 216 216 216 216 216 216 216 216 216 216 1 1 1 47 47 47 47 47 47 47 47 233 233 233 116 116 116 116 279 279 279 279 279 279 279 279 279 248 248 248 248 248 248 248 248 248 331 331 331 331 331 331 331 148 148 148 179 179 179 179 179 179 179 179 144 144 144 144 144 144 144 144 144 144 99 99 99 99 99 99 99 99 99 99 99 99 99 99 99 99 99 99 99 99 99 99 99 99 99 99 99 99 340 340 340 340 340 340 340 
340 340 340 340 340 340 1 1 1 331 331 331 331 193 193 193 193 193 112 112 112 112 112 112 112 112 112 112 223 223 223 223 223 223 223 305 305 305 305 305 221 221 221 221 221 221 288 288 288 288 288 288 288 288 288 288 288 288 175 175 175 175 175 175 277 277 277 277 277 277 209 209 209 209 209 209 209 209 209 209 209 209 209 232 232 232 232 232 187 187 187 187 232 232 232 232 232 232 232 232 279 279 279 279 279 279 279 279 279 279 279 53 53 53 53 53 53 228 228 228 228 228 228 228 228 223 223 223 223 223 223 223 101 101 101 101 101 101 101 101 101 101 101 101 289 289 289 289 289 289 280 280 280 280 47 47 47 233 233 233 116 116 116 227 227 227 227 227 227 227 321 321 321 321 321 321 321 321 321 321 321 321 321 321 117 117 117 117 117 117 340 340 340 340 340 340 340 340 340 340 340 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 35 35 35 35 35 35 35 35 35 35 35 233 233 233 233 233 116 116 116 116 116 116 175 175 175 175 175 175 277 277 277 277 277 277 277 277 277 164 164 164 164 164 164 164 164 164 164 164 164 191 191 191 191 191 191 191 232 232 232 232 232 232 51 51 51 51 51 51 51 51 121 121 121 121 121 121 121 121 145 145 145 145 145 145 145 145 145 145 340 340 340 340 340 340 340 340 340 340 340 340 340 340 340 340 340 340 340 340 340 340 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 279 279 279 279 279 279 279 279 279 279 279 279 279 279 248 248 248 248 248 248 248 248 248 248 248 248 248 171 171 171 171 171 171 171 171 21 21 21 21 21 21 21 276 276 276 276 276 276 276 119 119 119 119 204 204 204 204 204 204 204 67 67 67 67 67 67 67 67 67 67 277 277 277 277 277 117 117 117 49 49 49 233 233 233 137 137 137 277 277 277 277 277 204 204 204 204 204 204 204 47 47 47 47 47 109 109 109 109 109 109 341 341 341 341 341 341 341 341 149 149 149 149 149 149 149 149 329 329 329 329 329 144 144 144 144 144 144 144 144 144 144 144 144 144 144 144 144 144 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 35 35 35 35 35 35 35 35 35 232 232 232 232 187 187 187 187 187 221 221 221 221 221 281 281 281 281 289 289 289 289 289 289 289 277 277 277 277 277 69 69 69 69 69 277 277 277 277 277 117 117 117 49 49 49 233 233 233 137 137 137 137 137 137 137 277 277 277 277 277 277 277 204 204 204 204 204 204 204 47 47 47 47 109 109 109 109 109 109 341 341 341 341 341 341 341 341 149 149 149 149 149 149 149 149 149 149 329 329 329 329 329 329 329 144 144 144 144 144 144 144 144 144 144 144 144 144 144 1 1 1 1 1 1 1 1 1 +1 1 1 1 1 1 1 1 1 1 1 1 227 227 227 227 227 227 227 227 227 227 101 101 101 101 101 101 101 101 101 288 288 288 288 179 179 179 179 37 37 37 328 328 328 328 279 279 279 279 279 279 279 279 279 279 279 279 209 209 209 209 209 209 209 209 209 209 232 232 232 232 232 232 119 119 119 119 49 49 49 288 288 288 288 119 119 119 52 52 52 52 52 52 52 111 111 111 111 111 111 111 111 111 111 111 111 111 111 193 193 193 193 193 232 232 232 232 232 232 232 232 232 331 331 331 49 49 49 49 340 340 340 340 340 340 340 340 340 340 327 327 327 327 327 327 133 133 133 133 277 277 277 277 277 277 277 204 204 204 204 204 204 204 204 204 271 271 271 271 271 271 271 271 271 271 271 271 271 271 271 271 271 271 271 265 265 265 265 265 265 265 265 265 233 233 233 233 233 289 289 289 289 289 289 289 289 289 189 189 189 189 116 116 116 116 116 47 47 47 233 233 233 233 116 116 116 116 271 271 271 271 271 277 277 277 277 49 49 49 49 233 233 233 233 233 85 85 85 85 85 85 85 85 85 85 85 85 85 85 
85 85 233 233 233 233 233 233 281 281 281 281 281 281 281 281 281 281 281 281 288 288 288 288 288 288 288 288 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 119 119 119 119 119 119 119 49 49 49 49 49 288 288 288 288 288 119 119 119 119 48 48 48 48 48 107 107 107 107 107 107 107 107 107 193 193 193 193 193 193 193 176 176 176 176 176 176 176 99 99 99 99 99 99 99 99 99 99 99 99 99 99 99 99 99 99 99 99 340 340 340 340 340 340 340 340 331 331 331 331 331 144 144 144 144 144 171 171 171 171 171 171 171 171 171 171 171 305 305 305 305 305 224 224 224 224 224 224 224 47 47 47 47 328 328 328 328 328 328 328 279 279 279 279 279 279 279 279 279 279 279 279 279 279 279 273 273 273 273 273 193 193 193 277 277 277 277 277 277 277 277 277 277 189 189 189 189 189 288 288 288 288 288 47 47 47 233 233 233 233 116 116 116 327 327 327 189 189 189 189 189 189 189 189 189 189 189 189 329 329 329 329 329 329 37 37 37 37 37 37 37 37 37 37 281 281 281 281 281 281 281 281 281 281 189 189 189 189 289 289 289 289 289 289 204 204 204 204 204 204 204 204 204 204 204 204 204 204 204 204 204 204 204 204 204 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 119 119 119 119 119 119 119 49 49 49 49 288 288 288 288 119 119 119 52 52 52 52 227 227 227 227 227 227 227 85 85 85 85 85 85 85 85 85 85 85 85 85 85 85 85 85 85 292 292 292 292 292 292 292 292 292 292 331 331 331 331 49 49 49 49 340 340 340 340 340 279 279 279 279 279 279 279 279 279 279 279 279 279 279 333 333 333 333 333 209 209 209 209 209 209 209 209 209 288 288 288 288 288 288 223 223 223 223 223 223 223 223 193 193 193 193 193 193 193 193 273 273 273 273 273 273 273 273 288 288 288 288 288 288 47 47 47 233 233 233 116 116 116 187 187 187 187 221 221 221 221 221 221 221 281 281 281 281 281 281 281 273 273 273 273 273 277 277 277 133 133 133 133 133 133 281 281 281 281 281 281 281 281 281 281 281 281 189 189 189 189 189 189 189 189 189 189 189 328 328 328 328 328 328 328 328 328 328 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 119 119 119 119 119 119 49 49 49 49 288 288 288 288 288 288 119 119 119 52 52 52 52 52 171 171 171 171 171 171 171 171 171 171 171 171 171 171 69 69 69 69 69 69 69 69 69 277 277 277 277 277 181 181 181 181 181 181 181 181 181 129 129 129 129 129 129 129 129 129 129 129 116 116 116 116 116 116 116 116 116 116 331 331 331 331 331 331 49 49 49 49 49 340 340 340 340 340 340 340 340 340 340 107 107 107 107 107 107 107 107 277 277 277 277 277 277 277 277 69 69 69 69 69 69 69 69 69 69 69 69 69 69 69 69 69 69 69 69 116 116 116 116 116 47 47 47 47 233 233 233 233 116 116 116 116 116 116 171 171 171 171 171 171 171 171 171 171 171 171 305 305 305 305 305 305 305 305 305 305 224 224 224 224 224 224 224 224 224 224 224 224 224 224 224 224 224 224 224 224 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 187 187 187 187 187 187 187 187 187 187 187 187 187 232 232 232 232 232 283 283 283 283 283 283 283 283 283 283 283 283 283 283 69 69 69 69 69 69 69 69 69 69 277 277 277 277 277 277 277 277 288 288 288 288 288 288 288 288 288 288 288 288 288 288 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 19 19 19 19 19 19 19 19 19 19 19 276 276 276 276 276 276 276 276 115 115 115 115 115 115 189 189 189 189 189 281 281 281 281 281 281 281 281 281 281 281 281 149 149 149 149 149 149 149 149 149 149 233 233 233 233 189 189 189 189 189 189 236 236 236 236 236 236 236 236 187 187 187 187 187 221 221 221 221 221 221 221 221 221 221 281 281 281 289 289 289 289 289 277 277 277 277 277 69 69 69 277 277 277 277 277 277 117 117 
117 49 49 49 233 233 233 137 137 137 137 137 137 277 277 277 204 204 204 47 47 47 47 109 109 109 109 109 341 341 341 341 341 341 341 341 341 149 149 149 149 149 149 149 149 149 149 149 329 329 329 329 329 329 144 144 144 144 144 144 144 144 144 144 144 144 144 144 144 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 227 227 227 227 227 227 227 227 227 227 227 101 101 101 101 101 101 101 101 288 288 288 179 179 179 37 37 37 328 328 328 328 328 328 219 219 219 219 49 49 49 49 49 49 233 233 233 233 233 221 221 221 221 221 221 221 221 221 221 221 225 225 225 225 321 321 321 321 321 321 321 321 117 117 117 117 117 189 189 189 189 189 189 189 189 116 116 116 116 116 116 116 1 1 1 1 1 1 1 1 +1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 331 331 331 331 331 331 331 49 49 49 49 340 340 340 340 340 340 279 279 279 279 279 279 279 279 279 248 248 248 248 248 248 248 223 223 223 223 223 223 223 223 321 321 321 321 321 321 321 321 321 117 117 117 117 49 49 49 49 221 221 221 221 221 277 277 277 277 277 49 49 49 49 281 281 281 281 281 281 281 281 281 225 225 225 225 225 204 204 204 204 204 204 204 47 47 47 47 47 173 173 173 173 173 173 173 173 173 173 277 277 277 277 277 165 165 165 165 165 165 165 165 165 165 165 165 165 165 165 116 116 116 116 116 116 116 116 116 116 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 227 227 227 227 227 227 227 227 227 227 37 37 37 37 37 37 37 37 37 37 37 37 293 293 293 293 293 293 293 293 293 337 337 337 337 337 337 316 316 316 316 316 316 316 316 179 179 179 179 179 179 179 179 179 179 89 89 89 89 89 89 89 89 89 133 133 133 133 133 133 133 133 133 329 329 329 329 329 329 329 144 144 144 144 144 144 144 144 144 144 144 144 144 144 1 1 1 1 1 1 1 1 1 1 1 331 331 331 331 331 331 331 331 331 331 49 49 49 340 340 340 340 340 340 340 279 279 279 279 279 279 279 273 273 273 273 273 273 133 133 133 133 133 133 133 277 277 277 277 277 277 277 277 277 277 277 277 116 116 116 119 119 119 119 119 204 204 204 204 204 63 63 63 63 63 63 63 63 277 277 277 277 277 277 277 277 117 117 117 117 117 117 117 117 209 209 209 209 209 209 209 209 209 209 209 224 224 224 224 224 224 224 224 47 47 47 47 328 328 328 328 328 279 279 279 279 279 279 279 279 273 273 273 273 273 273 273 273 209 209 209 209 209 209 221 221 221 221 221 189 189 189 189 189 236 236 236 236 236 236 236 236 171 171 171 171 171 171 171 171 171 171 171 149 149 149 149 149 149 149 149 149 149 149 149 149 149 149 281 281 281 281 281 281 281 281 288 288 288 288 288 288 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 171 171 171 171 171 171 171 171 171 171 171 171 171 69 69 69 69 69 276 276 276 276 276 276 276 131 131 131 131 340 340 340 340 340 340 279 279 279 279 279 279 279 279 321 321 321 321 321 321 321 232 232 232 232 232 131 131 131 340 340 340 340 283 283 283 283 283 283 283 208 208 208 208 208 208 219 219 219 219 219 219 49 49 49 49 233 233 233 233 233 221 221 221 221 221 221 221 221 221 225 225 225 225 225 321 321 321 321 321 321 117 117 117 117 117 189 189 189 189 189 116 116 116 116 116 119 119 119 49 49 49 288 288 288 179 179 179 179 179 179 208 208 208 208 208 208 331 331 331 331 331 49 49 49 340 340 340 340 340 340 340 219 219 219 219 219 219 219 219 219 219 53 53 53 53 53 229 229 229 229 229 189 189 189 189 236 236 236 236 236 236 236 287 287 287 287 287 287 287 287 287 287 320 320 320 320 320 320 179 179 179 179 179 179 179 148 148 148 148 148 148 148 148 148 148 148 
148 148 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 283 283 283 283 283 283 283 283 283 283 283 208 208 208 208 208 208 279 279 279 279 279 279 279 279 279 279 279 279 279 289 289 289 289 289 305 305 305 305 305 116 116 116 116 116 51 51 51 51 51 51 51 51 51 51 51 272 272 272 272 272 272 272 272 272 272 272 272 272 272 272 1 1 1 1 1 1 1 1 1 1 1 1 1 175 175 175 175 175 277 277 277 277 37 37 37 37 37 37 37 37 37 37 281 281 281 281 281 281 281 281 273 273 273 273 273 273 189 189 189 189 189 236 236 236 236 236 236 236 331 331 331 331 189 189 189 189 292 292 292 292 292 292 292 292 331 331 331 331 331 331 331 331 53 53 53 53 53 53 53 53 232 232 232 232 232 232 232 232 232 232 232 291 291 291 291 291 291 291 291 291 291 291 291 291 291 291 291 291 291 291 291 193 193 193 193 193 193 193 193 193 232 232 232 232 232 232 232 232 232 232 232 232 232 232 232 232 232 232 107 107 107 107 107 107 277 277 277 277 277 85 85 85 85 85 85 85 85 85 85 85 85 85 85 85 85 232 232 232 232 232 179 179 179 179 179 179 179 179 179 179 179 179 179 179 179 179 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 233 233 233 233 233 116 116 116 116 116 116 116 116 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 119 119 119 119 52 52 52 52 179 179 179 179 179 179 179 179 179 179 179 179 37 37 37 37 37 37 37 233 233 233 233 233 233 233 233 233 117 117 117 117 49 49 49 49 49 224 224 224 224 224 47 47 47 47 328 328 328 328 328 50 50 50 283 283 283 283 283 283 283 283 283 283 283 283 283 283 283 283 283 283 37 37 37 37 37 37 37 37 37 109 109 109 109 109 109 204 204 204 204 204 204 204 204 204 204 204 204 204 204 204 204 204 247 247 247 247 247 247 225 225 225 225 225 225 225 225 225 225 225 116 116 116 116 116 116 171 171 171 171 171 171 171 171 171 171 171 171 171 37 37 37 37 37 37 37 37 37 37 285 285 285 285 285 285 285 285 285 285 49 49 49 49 233 233 233 233 233 116 116 116 116 116 116 219 219 219 219 219 219 219 219 219 219 219 21 21 21 21 21 277 277 277 277 273 273 273 273 273 49 49 49 49 49 49 288 288 288 288 107 107 107 107 107 107 107 107 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 176 176 176 176 176 176 176 176 176 176 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 119 119 119 119 119 119 204 204 204 204 204 204 204 204 51 51 51 51 51 51 51 51 51 51 51 51 121 121 121 121 121 121 121 121 144 144 144 144 144 144 283 283 283 283 283 283 283 283 283 283 283 283 283 208 208 208 208 208 208 208 179 179 179 179 179 179 179 133 133 133 133 133 133 225 225 225 225 225 225 225 225 116 116 116 116 83 83 83 83 83 83 83 83 83 83 83 83 288 288 288 287 287 287 287 287 188 188 188 188 179 179 179 179 179 179 179 179 179 179 193 193 193 193 193 193 228 228 228 228 228 228 228 228 228 228 228 228 228 228 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 +1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 102 102 102 102 102 102 102 102 102 102 102 102 102 102 102 279 279 279 279 279 279 279 49 49 49 49 273 273 273 273 273 273 273 249 249 249 249 249 249 249 249 249 249 249 249 340 340 340 340 340 340 335 335 335 320 320 320 320 320 320 320 19 19 19 276 276 276 276 227 227 227 227 193 193 193 193 193 281 281 281 281 281 281 289 289 289 289 289 144 144 144 144 144 227 227 227 227 227 227 227 37 37 37 37 37 37 293 293 293 293 293 337 337 337 337 316 316 316 316 316 219 219 219 219 219 219 219 219 219 219 219 53 53 53 53 53 53 293 293 293 293 293 293 293 293 109 109 109 109 109 109 109 145 145 145 145 145 145 288 288 288 288 47 47 47 328 328 328 328 328 175 175 175 175 175 175 277 277 277 209 209 209 209 209 209 
232 232 232 232 232 232 175 175 175 175 175 165 165 165 165 165 165 165 165 165 165 109 109 109 109 109 49 49 49 49 225 225 225 225 225 225 225 340 340 340 340 340 340 340 340 340 340 340 340 340 340 340 340 340 340 340 340 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 283 283 283 283 283 283 283 283 283 283 283 283 208 208 208 208 208 208 279 279 279 279 279 279 279 279 279 133 133 133 133 133 133 133 116 116 116 116 116 187 187 187 232 232 232 232 50 50 50 50 50 271 271 271 271 271 271 189 189 189 189 221 221 221 221 221 221 221 221 221 221 221 221 221 221 337 337 337 337 337 337 321 321 321 321 321 321 225 225 225 225 337 337 337 337 337 337 337 145 145 145 145 145 225 225 225 225 225 225 204 204 204 204 204 204 204 219 219 219 219 219 219 219 219 219 219 219 219 219 219 219 219 219 219 225 225 225 225 225 193 193 193 193 193 193 193 193 193 193 193 193 193 193 276 276 276 276 276 276 276 276 276 276 279 279 279 279 279 279 279 279 279 279 279 279 279 279 279 279 279 279 333 333 333 333 333 209 209 209 209 209 209 209 288 288 288 288 288 288 327 327 327 327 327 327 327 327 265 265 265 265 265 265 265 265 265 265 265 265 265 280 280 280 280 280 280 280 280 280 280 280 280 280 280 280 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 99 99 99 99 99 99 99 99 99 99 99 99 99 99 228 228 228 228 228 228 228 327 327 327 327 327 133 133 133 277 277 277 277 277 277 277 204 204 204 204 204 204 175 175 175 175 175 225 225 225 225 225 225 37 37 37 37 37 37 116 116 116 287 287 287 287 188 188 188 188 279 279 279 279 279 279 279 279 279 279 279 208 208 208 208 208 208 335 335 335 335 335 335 335 335 335 335 335 320 320 320 320 320 320 320 320 320 320 320 320 320 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 102 102 102 102 102 102 102 102 102 102 102 102 102 102 331 331 331 331 49 49 49 49 340 340 340 340 340 340 340 107 107 107 189 189 189 189 189 189 177 177 177 177 177 177 177 193 193 193 193 193 233 233 233 189 189 189 236 236 236 236 236 236 287 287 287 287 188 188 188 188 107 107 107 107 107 107 208 208 208 208 208 208 208 208 208 208 47 47 47 47 47 173 173 173 173 173 173 173 173 173 277 277 277 277 165 165 165 165 165 165 165 165 165 116 116 116 116 335 335 335 335 320 320 320 320 320 320 331 331 331 331 331 331 331 149 149 149 149 149 233 233 233 288 288 288 288 288 219 219 219 219 219 219 219 219 219 53 53 53 53 53 229 229 229 229 229 189 189 189 236 236 236 236 236 236 171 171 171 171 171 171 171 171 171 69 69 69 276 276 276 276 227 227 227 227 227 227 227 208 208 208 208 208 208 208 208 208 208 1 1 1 1 1 1 1 1 1 1 1 +1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 102 102 102 102 102 102 102 102 102 102 102 102 102 179 179 179 179 179 179 179 37 37 37 116 116 116 116 116 227 227 227 227 227 227 227 165 165 165 165 165 165 116 116 116 116 51 51 51 51 51 51 51 51 272 272 272 272 272 227 227 227 227 100 100 100 100 100 100 100 100 227 227 227 227 227 227 227 227 227 227 227 101 101 101 101 101 101 101 101 101 101 101 101 101 233 233 233 116 116 116 116 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 119 119 119 119 119 119 49 49 49 288 288 288 288 187 187 187 187 172 172 172 172 172 335 335 335 335 320 320 320 320 320 115 115 115 115 115 115 115 193 193 193 193 193 117 117 117 117 117 117 49 49 49 49 232 232 232 219 219 219 219 219 219 219 219 219 219 53 53 53 53 228 228 228 228 228 228 228 228 171 171 171 171 171 171 144 144 144 144 144 227 227 227 227 227 227 208 208 208 208 208 287 287 287 287 287 287 287 188 188 188 188 231 231 
231 231 231 231 231 101 101 101 101 101 101 101 101 101 101 101 101 101 101 101 101 101 101 288 288 288 288 288 288 288 1 1 1 1 1 1 1 1 +1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 102 102 102 102 102 102 102 102 102 102 102 102 102 102 102 102 102 331 331 331 331 331 331 305 305 305 117 117 117 49 49 49 233 233 233 233 288 288 288 107 107 107 107 107 208 208 208 208 208 208 208 50 50 50 50 50 107 107 107 107 107 107 107 193 193 193 288 288 288 288 288 47 47 47 47 47 47 173 173 173 173 173 173 173 173 173 173 173 277 277 277 277 165 165 165 165 165 165 165 165 165 165 165 165 165 116 116 116 116 116 116 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 35 35 35 35 35 35 35 35 35 35 35 35 35 35 233 233 233 116 116 116 191 191 191 288 288 288 288 331 331 331 305 305 305 305 116 116 116 107 107 107 107 107 208 208 208 208 208 208 208 208 208 223 223 223 223 223 223 223 223 223 223 223 223 223 223 223 223 223 223 53 53 53 53 53 53 329 329 329 329 329 329 329 329 329 329 329 225 225 225 225 204 204 204 204 204 204 287 287 287 287 287 188 188 188 188 188 279 279 279 279 279 279 279 279 279 225 225 225 225 209 209 209 209 209 209 272 272 272 272 272 272 187 187 187 232 232 232 50 50 50 50 50 50 331 331 331 331 331 331 331 331 331 331 331 331 101 101 101 101 101 101 101 101 101 101 101 101 225 225 225 225 225 225 225 225 116 116 116 116 116 116 116 111 111 111 111 111 111 111 111 111 111 111 111 111 111 111 111 111 111 111 133 133 133 133 277 277 277 277 277 277 277 277 277 204 204 204 204 204 204 287 287 287 287 287 287 287 287 277 277 277 277 277 208 208 208 208 208 208 208 208 208 208 208 67 67 67 67 67 67 67 67 67 67 67 67 67 67 67 67 67 67 67 224 224 224 224 224 224 224 224 224 224 331 331 331 331 331 331 331 331 101 101 101 101 101 101 101 101 101 101 101 288 288 288 288 288 288 288 288 288 331 331 331 331 189 189 189 292 292 292 292 292 292 292 292 292 107 107 107 107 107 107 107 107 107 225 225 225 225 225 225 321 321 321 321 321 321 321 321 321 321 321 321 321 321 321 228 228 228 228 228 228 228 187 187 187 232 232 232 232 119 119 119 119 52 52 52 227 227 227 227 227 227 227 227 227 321 321 321 321 321 321 321 321 321 321 321 233 233 233 233 233 233 285 285 285 285 285 285 285 285 285 285 285 285 105 105 105 105 105 105 105 105 105 105 232 232 232 232 232 232 232 115 115 115 115 115 115 115 249 249 249 233 233 233 233 233 288 288 288 288 288 335 335 335 335 320 320 320 291 291 291 291 291 291 291 291 291 193 193 193 193 193 193 193 237 237 237 237 237 237 237 237 220 220 220 220 220 220 220 220 220 220 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 335 335 335 335 335 335 335 320 320 320 320 320 219 219 219 219 219 219 219 305 305 305 305 116 116 116 187 187 187 229 229 229 229 229 229 229 229 229 37 37 37 37 37 37 37 37 37 37 217 217 217 217 217 217 217 217 49 49 49 232 232 232 232 335 335 335 335 335 320 320 320 320 331 331 331 331 331 144 144 144 144 115 115 115 115 115 115 115 115 115 333 333 333 333 133 133 133 225 225 225 225 225 189 189 189 189 189 236 236 236 236 236 236 187 187 187 232 232 232 232 232 227 227 227 227 227 227 227 227 227 227 227 227 21 21 21 21 21 21 21 21 277 277 277 277 277 109 109 109 109 109 49 49 49 224 224 224 224 224 224 224 179 179 179 179 179 179 179 179 179 179 179 179 69 69 69 69 69 69 69 69 69 69 69 69 69 69 69 69 69 69 225 225 225 225 225 225 340 340 340 340 340 340 340 340 340 219 219 219 219 219 219 219 305 305 305 305 117 117 117 117 49 49 49 233 233 233 233 288 288 288 288 288 335 335 
335 335 335 335 335 320 320 320 320 320 320 320 320 320 320 320 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 +1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 227 227 227 227 227 227 227 227 227 227 227 227 227 227 37 37 37 37 37 37 37 37 37 37 37 37 37 293 293 293 293 293 293 293 337 337 337 337 337 316 316 316 316 316 316 316 316 316 316 316 179 179 179 179 179 179 179 179 179 37 37 37 37 37 116 116 116 116 287 287 287 287 287 287 287 287 165 165 165 165 165 165 165 165 221 221 221 221 221 221 221 221 49 49 49 49 232 232 232 232 119 119 119 48 48 48 48 48 48 279 279 279 279 279 279 279 279 279 279 279 221 221 221 221 221 277 277 277 277 69 69 69 69 69 69 69 69 233 233 233 233 233 204 204 204 204 204 204 223 223 223 223 223 193 193 193 193 193 289 289 289 49 49 49 49 224 224 224 224 224 224 179 179 179 179 179 179 179 179 179 37 37 37 37 37 37 37 37 37 37 37 233 233 233 233 233 116 116 116 116 116 116 116 116 116 116 116 116 67 67 67 67 67 67 67 67 67 67 67 67 67 221 221 221 221 221 221 221 221 333 333 333 333 145 145 145 145 117 117 117 117 117 225 225 225 225 225 225 204 204 204 204 204 204 204 187 187 187 187 187 232 232 232 232 232 232 179 179 179 179 179 179 179 179 179 179 179 193 193 193 193 193 193 193 193 193 193 193 193 193 193 340 340 340 340 340 340 340 340 340 340 340 340 340 340 340 340 340 340 340 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 119 119 119 119 119 119 119 119 119 119 119 119 119 133 133 133 133 133 133 133 133 133 133 133 133 232 232 232 232 232 232 35 35 35 35 233 233 233 233 233 233 116 116 116 116 119 119 119 133 133 133 133 133 133 133 133 133 133 276 276 276 276 276 276 276 276 276 276 276 276 276 276 276 276 276 276 276 1 1 1 1 1 1 1 1 1 179 179 179 179 179 179 179 179 179 179 179 208 208 208 208 115 115 115 115 197 197 197 197 281 281 281 281 281 281 281 281 281 281 101 101 101 101 101 101 101 117 117 117 117 117 117 189 189 189 189 116 116 116 116 116 116 116 116 331 331 331 331 53 53 53 53 53 53 288 288 288 287 287 287 287 287 287 188 188 188 188 115 115 115 115 320 320 320 320 320 320 320 320 320 320 320 320 320 320 320 320 320 320 320 320 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 179 179 179 179 179 179 179 179 179 208 208 208 208 208 219 219 219 219 219 219 219 219 305 305 305 305 305 116 116 116 116 231 231 231 231 231 231 21 21 21 21 21 21 21 21 21 21 21 288 288 288 287 287 287 287 287 287 287 287 287 133 133 133 133 133 224 224 224 224 224 224 224 224 224 119 119 119 119 189 189 189 189 189 189 280 280 280 280 280 280 280 280 280 280 111 111 111 111 111 111 111 111 111 111 111 111 111 111 101 101 101 101 101 101 101 101 101 101 101 225 225 225 225 225 225 225 225 225 225 116 116 116 116 116 331 331 331 331 189 189 189 120 120 120 119 119 119 52 52 52 175 175 175 175 175 175 175 175 175 225 225 225 225 225 225 225 225 249 249 249 249 249 249 249 249 249 189 189 189 189 189 189 189 236 236 236 236 236 236 236 236 99 99 99 99 99 99 99 99 99 99 99 99 99 99 99 99 99 99 99 99 99 99 99 340 340 340 340 340 340 340 340 340 340 340 340 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 119 119 119 119 119 119 119 119 49 49 49 49 49 288 288 288 119 119 119 119 133 133 133 276 276 276 276 179 179 179 179 179 179 37 37 37 116 116 116 116 116 107 107 107 107 107 107 49 49 49 232 232 232 232 232 50 50 50 50 227 227 227 227 227 189 189 189 189 189 281 281 281 281 281 281 281 281 
281 281 281 281 281 289 289 289 289 289 289 289 165 165 165 165 165 165 165 165 165 165 165 165 220 220 220 220 220 220 220 220 220 220 220 220 220 220 220 220 220 220 220 220 220 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 179 179 179 179 179 179 179 179 179 179 179 179 208 208 208 208 208 331 331 331 331 331 331 305 305 305 305 116 116 116 116 287 287 287 287 287 287 287 165 165 165 165 165 165 165 165 220 220 220 220 220 179 179 179 144 144 144 144 144 144 144 144 144 179 179 179 179 179 179 179 179 249 249 249 249 249 249 249 249 249 249 249 228 228 228 228 228 228 228 47 47 47 233 233 233 116 116 116 223 223 223 133 133 133 133 133 288 288 288 288 288 227 227 227 227 227 17 17 17 277 277 277 277 277 277 277 277 277 277 277 193 193 193 193 225 225 225 225 225 225 225 48 48 48 48 48 48 115 115 115 115 115 115 320 320 320 320 320 320 320 320 320 119 119 119 119 119 37 37 37 37 37 37 37 37 37 37 37 37 37 288 288 288 288 288 288 288 288 288 288 288 288 288 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 283 283 283 283 283 283 283 283 283 283 283 283 283 283 208 208 208 208 208 208 219 219 219 219 219 219 219 305 305 305 305 305 305 117 117 117 49 49 49 232 232 232 232 107 107 107 107 107 107 107 204 204 204 204 204 204 204 204 223 223 223 223 223 133 133 133 133 133 133 173 173 173 173 173 173 288 288 288 288 288 288 35 35 35 35 288 288 288 288 107 107 107 107 107 107 107 277 277 277 277 277 101 101 101 101 101 101 101 101 101 288 288 288 288 275 275 275 275 275 275 193 193 193 193 193 193 329 329 329 329 329 329 144 144 144 144 144 144 144 144 144 131 131 131 131 233 233 233 233 233 205 205 205 205 205 205 205 181 181 181 181 181 181 181 181 181 181 88 88 88 88 88 88 88 88 88 88 88 88 88 88 88 88 88 1 1 1 1 1 1 1 1 1 1 +1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 231 231 231 231 231 231 231 231 231 248 248 248 248 227 227 227 227 227 227 227 227 37 37 37 37 37 37 289 289 289 289 289 144 144 144 144 144 144 331 331 331 331 331 331 331 331 53 53 53 53 53 53 53 288 288 288 288 288 288 227 227 227 227 189 189 189 189 281 281 281 281 281 281 281 281 281 289 289 289 289 289 289 165 165 165 165 165 165 220 220 220 220 179 179 179 37 37 37 116 116 116 107 107 107 107 189 189 189 189 232 232 232 227 227 227 227 227 227 227 227 165 165 165 165 165 165 165 165 165 165 165 165 165 165 116 116 116 116 116 116 116 116 116 116 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 279 279 279 279 279 279 279 279 279 279 279 279 248 248 248 248 248 248 248 248 248 248 67 67 67 67 67 67 67 67 67 67 67 67 67 67 67 224 224 224 224 224 224 224 224 219 219 219 219 219 219 219 333 333 333 333 333 133 133 133 133 133 281 281 281 281 281 281 281 281 281 113 113 113 113 113 113 113 49 49 49 49 49 233 233 233 233 233 233 233 233 340 340 340 340 47 47 47 233 233 233 116 116 116 116 135 135 135 135 221 221 221 221 221 221 221 281 281 281 281 281 273 273 273 273 273 225 225 225 225 225 49 49 49 49 233 233 233 233 165 165 165 165 165 165 165 165 165 285 285 285 285 285 285 285 285 285 285 285 285 285 285 49 49 49 49 233 233 233 233 233 233 233 233 233 233 340 340 340 340 340 340 340 340 227 227 227 227 227 227 101 101 101 101 101 101 101 288 288 288 288 288 288 131 131 131 131 340 340 340 340 340 340 340 340 331 331 331 331 331 331 331 133 133 133 133 133 133 133 133 133 224 224 224 224 224 224 224 107 107 107 107 107 107 204 204 204 204 204 204 115 115 115 115 115 189 189 189 189 173 173 173 173 173 173 173 173 173 173 173 173 173 149 
[Omitted: the original lines here are long frame-level sequences of small integer IDs with heavy consecutive repetition (e.g. `149 149 149 … 116 116 … 1 1 1 …`), which appear to be discrete speech-unit cluster labels from a data file swept into the diff rather than document prose; no readable text is recoverable from this span.]
321 321 233 233 233 233 49 49 49 49 49 289 289 289 289 204 204 204 204 204 204 204 204 204 204 204 204 204 204 204 204 204 204 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 +1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 250 250 250 250 250 250 250 250 250 250 250 250 250 250 250 250 250 250 250 250 250 250 250 250 250 250 250 250 250 250 250 250 250 250 250 250 250 250 250 250 250 250 250 250 250 250 250 250 250 250 250 250 250 250 250 250 250 250 250 250 250 250 250 250 250 250 250 250 250 250 250 250 250 250 250 250 250 250 250 250 250 250 250 250 250 250 1 1 1 1 1 1 1 1 1 1 119 119 119 119 119 119 133 133 133 133 133 133 133 133 133 276 276 276 276 276 146 146 146 146 146 146 146 146 146 146 146 50 50 50 50 50 50 50 50 50 50 50 223 223 223 223 223 223 223 223 223 223 223 223 69 69 69 69 69 69 69 69 69 69 69 69 69 69 69 288 288 288 288 288 288 227 227 227 227 227 227 227 227 69 69 69 69 69 69 69 276 276 276 276 276 276 276 276 276 1 1 1 1 1 111 111 111 111 111 111 111 111 111 111 111 111 111 111 111 111 111 111 133 133 133 133 133 277 277 277 277 277 204 204 204 204 204 204 204 204 204 204 287 287 287 287 287 287 287 287 287 277 277 277 277 209 209 209 209 209 209 340 340 340 340 340 340 340 340 340 340 340 340 67 67 67 67 67 67 67 67 67 67 67 67 67 224 224 224 224 224 224 224 224 187 187 187 187 187 232 232 232 232 232 232 232 232 232 107 107 107 107 107 107 107 225 225 225 225 225 225 225 225 225 225 225 225 225 321 321 321 321 321 321 321 321 321 321 321 321 321 321 321 321 321 321 321 321 228 228 228 228 228 228 228 228 228 228 228 228 228 228 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 119 119 119 119 119 119 193 193 193 193 193 193 193 193 193 193 193 280 280 280 280 280 280 280 280 280 1 1 1 99 99 99 99 99 99 99 99 99 99 99 99 99 99 99 99 99 99 99 99 99 99 99 225 225 225 225 225 225 225 49 49 49 49 49 49 49 233 233 233 116 116 116 116 116 187 187 187 340 340 340 340 340 340 340 340 119 119 119 119 119 52 52 52 52 52 52 52 52 52 1 1 1 1 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 271 271 271 271 271 271 271 271 271 271 271 271 225 225 225 225 165 165 165 165 165 165 165 165 165 165 165 280 280 280 280 280 280 280 280 280 280 280 280 280 280 280 280 280 102 102 102 102 102 102 102 102 102 215 215 215 215 215 215 215 189 189 189 189 189 281 281 281 281 288 288 288 288 288 288 288 288 223 223 223 223 223 223 53 53 53 53 53 328 328 328 328 328 191 191 191 288 288 288 288 288 288 63 63 63 63 63 63 63 63 63 225 225 225 225 225 225 225 225 277 277 277 277 277 277 277 133 133 133 133 133 133 133 117 117 117 117 117 204 204 204 204 204 204 204 204 204 204 204 35 35 35 233 233 233 116 116 116 99 99 99 99 99 99 228 228 228 228 228 228 228 279 279 279 279 279 279 279 279 279 279 279 279 279 279 279 248 248 248 248 248 248 248 248 248 248 248 175 175 175 175 175 175 175 175 175 175 175 225 225 225 225 225 225 225 225 225 225 225 37 37 37 37 37 116 116 116 116 99 99 99 99 99 99 228 228 228 228 228 228 228 228 228 175 175 175 175 175 175 249 249 249 249 249 189 189 189 189 189 189 236 236 236 287 287 287 287 287 48 48 48 48 48 48 223 223 223 223 223 223 223 193 193 193 193 193 328 328 328 328 328 179 179 179 179 179 179 179 209 209 209 209 209 209 276 276 276 276 276 276 276 276 276 276 276 276 276 276 276 276 276 276 276 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 
1 1 1 1 1 1 1 99 99 99 99 99 99 99 99 99 99 99 328 328 328 328 328 328 328 328 67 67 67 225 225 225 225 333 333 333 333 333 333 333 205 205 205 205 340 340 340 340 340 340 340 179 179 179 179 179 179 179 179 179 149 149 149 149 149 149 116 116 116 116 119 119 119 49 49 49 49 49 288 288 288 288 288 271 271 271 271 271 271 277 277 277 277 193 193 193 233 233 233 233 233 233 233 233 280 280 280 280 280 280 280 131 131 131 131 131 117 117 117 117 117 117 333 333 333 333 145 145 145 145 116 116 116 116 116 116 116 116 116 116 99 99 99 99 99 99 99 99 99 99 99 225 225 225 225 225 49 49 49 49 49 233 233 233 116 116 116 331 331 331 331 331 49 49 49 340 340 340 340 340 340 119 119 119 52 52 52 52 271 271 271 271 271 271 271 271 271 271 271 271 277 277 277 277 193 193 193 289 289 289 205 205 205 205 205 205 205 49 49 49 49 49 49 281 281 281 281 281 281 288 288 288 271 271 271 271 271 271 225 225 225 225 225 165 165 165 165 165 165 165 165 280 280 280 280 280 280 280 280 280 280 280 187 187 187 232 232 232 119 119 119 52 52 52 52 52 331 331 331 331 331 331 331 331 331 331 149 149 149 149 149 225 225 225 225 225 225 225 225 225 225 225 116 116 116 116 116 116 116 1 1 1 1 1 1 1 1 1 1 1 1 1 +1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 47 47 47 47 47 47 47 233 233 233 116 116 116 102 102 102 102 102 102 335 335 335 335 335 335 335 335 335 335 335 321 321 321 321 341 341 341 341 341 341 341 116 116 116 287 287 287 48 48 48 48 48 187 187 187 229 229 229 229 229 229 229 229 37 37 37 37 37 37 37 37 217 217 217 217 217 217 217 217 217 49 49 49 232 232 232 232 232 102 102 102 102 102 331 331 331 331 331 331 49 49 49 340 340 340 340 340 340 340 340 340 223 223 223 223 223 223 223 193 193 193 193 193 329 329 329 329 189 189 189 189 189 236 236 236 236 179 179 179 179 179 179 179 179 179 179 179 209 209 209 209 276 276 276 276 276 276 276 276 276 276 276 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 107 107 107 107 107 107 53 53 53 288 288 288 288 102 102 102 102 102 102 102 102 102 102 231 231 231 231 231 231 231 231 133 133 133 133 133 329 329 329 329 329 329 329 144 144 144 144 144 144 144 144 275 275 275 275 275 275 275 275 275 275 275 209 209 209 209 209 209 225 225 225 225 225 225 204 204 204 204 204 204 187 187 187 187 221 221 221 221 221 221 281 281 281 281 281 281 281 273 273 273 273 273 273 133 133 133 133 221 221 221 221 221 289 289 289 289 289 289 49 49 49 116 116 116 116 102 102 102 102 102 102 102 102 102 102 102 102 331 331 331 331 331 331 331 331 331 331 331 331 305 305 305 305 305 305 305 305 305 305 305 116 116 116 116 116 116 116 116 116 116 116 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 187 187 187 187 187 187 187 187 187 289 289 289 289 289 289 280 280 280 280 280 280 280 115 115 115 115 115 189 189 189 189 189 189 225 225 225 225 225 225 225 225 101 101 101 101 101 101 101 101 101 101 101 289 289 289 289 289 289 173 173 173 173 173 49 49 49 49 224 224 224 224 224 224 331 331 331 331 331 193 193 193 193 232 232 232 232 335 335 335 335 305 305 305 276 276 276 276 276 276 276 276 187 187 187 187 187 229 229 229 229 229 229 229 41 41 41 41 41 41 41 41 217 217 217 217 217 217 217 217 217 49 49 49 233 233 233 233 233 165 165 165 165 165 165 165 285 285 285 285 285 285 285 285 285 285 285 49 49 49 233 233 233 340 340 340 340 340 340 219 219 219 219 219 219 219 219 219 219 53 53 53 53 228 228 228 228 228 228 228 287 287 287 287 287 287 287 287 287 277 277 277 277 320 320 320 320 320 320 320 320 320 320 191 
191 191 191 191 341 341 341 341 49 49 49 49 233 233 233 233 233 288 288 288 288 187 187 187 187 187 187 187 187 187 187 187 187 288 288 288 288 288 288 288 288 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 107 107 107 107 53 53 53 53 288 288 288 288 288 119 119 119 119 249 249 249 249 249 249 249 249 249 340 340 340 340 340 340 340 340 340 340 340 340 340 340 275 275 275 275 275 133 133 133 133 133 133 133 116 116 116 116 116 116 116 116 275 275 275 275 275 275 275 275 249 249 249 249 249 249 249 117 117 117 117 117 340 340 340 340 340 146 146 146 146 146 146 279 279 279 279 279 279 279 279 279 279 279 279 248 248 248 248 248 248 248 248 248 171 171 171 171 171 171 171 171 171 171 171 171 171 171 171 53 53 53 53 53 53 53 233 233 233 233 233 233 204 204 204 204 204 204 204 204 204 204 204 204 204 204 204 204 204 204 204 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 331 331 331 331 331 331 133 133 133 232 232 232 331 331 331 208 208 208 208 208 208 208 175 175 175 175 175 175 175 21 21 21 21 21 21 288 288 288 187 187 187 187 187 233 233 233 233 233 233 233 289 289 289 289 289 289 48 48 48 48 119 119 119 119 119 52 52 52 52 52 52 52 52 287 287 287 287 287 287 287 287 287 277 277 277 277 277 165 165 165 165 165 165 165 165 232 232 232 232 232 232 232 35 35 35 288 288 288 283 283 283 283 283 283 283 283 283 283 21 21 21 21 277 277 277 277 277 277 277 225 225 225 225 49 49 49 49 49 49 289 289 289 289 289 289 289 289 289 289 289 289 289 289 89 89 89 89 89 89 89 89 89 89 89 89 232 232 232 232 47 47 47 233 233 233 116 116 116 119 119 119 52 52 52 52 52 52 52 275 275 275 275 275 275 275 133 133 133 133 133 133 116 116 116 116 116 116 116 116 116 116 116 275 275 275 275 275 275 249 249 249 249 249 249 249 249 249 117 117 117 117 117 340 340 340 340 340 107 107 107 107 205 205 205 205 177 177 177 177 177 177 177 37 37 37 37 37 37 37 37 37 232 232 232 232 287 287 287 48 48 48 48 48 171 171 171 171 171 171 171 171 171 225 225 225 225 37 37 37 37 37 37 37 37 37 37 37 37 284 284 284 284 284 284 284 284 284 284 271 271 271 271 271 271 271 271 271 271 271 271 37 37 37 37 37 37 37 37 37 37 37 37 37 37 281 281 281 281 281 281 281 281 288 288 288 288 288 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 102 102 102 102 102 102 102 102 102 102 102 102 102 102 102 102 35 35 35 35 35 35 35 35 35 35 35 35 35 35 35 35 281 281 281 281 281 281 281 288 288 288 288 227 227 227 227 193 193 193 193 193 193 281 281 281 281 281 281 189 189 189 189 340 340 340 340 279 279 279 279 279 273 273 273 273 273 273 133 133 133 133 233 233 233 233 233 233 233 281 281 281 281 281 281 281 281 144 144 144 144 331 331 331 331 331 331 53 53 53 53 288 288 288 288 288 227 227 227 227 227 165 165 165 165 165 165 165 116 116 116 116 119 119 119 49 49 49 49 228 228 228 228 228 228 228 228 228 275 275 275 275 133 133 133 133 133 133 133 116 116 116 116 116 116 116 1 1 1 1 1 1 1 1 1 1 1 +1 1 1 1 1 1 1 1 1 1 1 35 35 35 35 35 35 35 35 233 233 233 116 116 116 116 283 283 283 283 283 283 283 283 283 208 208 208 208 208 208 208 208 279 279 279 279 279 279 279 279 279 279 133 133 133 133 116 116 116 116 116 116 283 283 283 283 283 283 283 208 208 208 208 208 115 115 115 115 115 115 193 193 193 193 193 117 117 117 117 49 49 49 232 232 232 231 231 231 231 231 231 231 248 248 248 248 248 248 47 47 47 47 47 233 233 233 116 116 116 171 171 171 171 144 144 144 144 144 144 271 271 271 271 271 271 271 271 271 271 193 193 193 289 289 289 289 289 205 205 205 205 205 340 340 340 340 340 
340 279 279 279 279 279 279 279 279 279 165 165 165 165 165 165 165 220 220 220 220 220 220 231 231 231 231 231 21 21 21 21 21 21 21 21 288 288 288 287 287 287 48 48 48 48 35 35 35 35 35 35 35 35 35 35 35 35 35 281 281 281 281 281 281 281 220 220 220 220 179 179 179 144 144 144 144 131 131 131 131 233 233 233 233 204 204 204 204 227 227 227 227 227 227 227 227 69 69 69 69 276 276 276 276 276 276 219 219 219 219 219 219 219 219 219 333 333 333 133 133 133 133 133 281 281 281 281 281 281 113 113 113 113 113 113 113 113 113 49 49 49 49 49 233 233 233 233 233 233 233 340 340 340 340 340 340 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 283 283 283 283 283 283 283 283 283 283 283 208 208 208 208 208 279 279 279 279 279 279 279 133 133 133 133 116 116 116 102 102 102 102 102 102 102 102 102 227 227 227 227 227 227 227 53 53 53 53 53 53 281 281 281 288 288 288 179 179 179 37 37 37 328 328 328 328 35 35 35 35 35 35 35 35 35 35 35 281 281 281 281 281 288 288 288 288 288 179 179 179 144 144 144 50 50 50 50 50 50 291 291 291 291 291 291 291 291 291 291 291 85 85 85 85 85 85 85 85 85 85 85 85 341 341 341 341 341 341 341 341 49 49 49 233 233 233 116 116 116 116 63 63 63 63 63 63 63 63 225 225 225 225 225 225 225 225 277 277 277 277 277 133 133 133 133 133 133 117 117 117 117 117 117 204 204 204 204 204 204 204 204 204 204 204 204 204 204 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 102 102 102 102 102 102 102 102 102 102 102 102 102 279 279 279 279 279 279 279 279 49 49 49 49 273 273 273 273 273 273 273 249 249 249 249 249 249 249 249 249 340 340 340 340 340 340 340 340 102 102 102 102 102 102 102 102 102 102 102 102 102 179 179 179 179 179 179 179 179 179 179 179 37 37 37 37 37 37 37 37 37 37 116 116 116 116 116 116 116 287 287 287 287 287 287 287 287 287 320 320 320 320 320 320 320 320 320 320 320 320 320 320 320 320 320 107 107 107 107 107 53 53 53 53 53 53 53 288 288 288 288 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 179 179 179 179 179 179 179 179 179 179 179 179 84 84 84 84 84 84 84 84 84 84 335 335 335 335 335 335 335 335 335 335 320 320 320 175 175 175 175 175 175 175 249 249 249 189 189 189 189 189 189 236 236 236 287 287 287 287 188 188 188 171 171 171 171 171 171 171 171 101 101 101 101 101 101 101 101 101 101 101 101 233 233 233 233 116 116 116 116 116 83 83 83 83 83 83 83 288 288 288 288 47 47 47 47 109 109 109 109 109 109 85 85 85 85 85 85 85 85 85 85 85 288 288 288 288 288 291 291 291 291 291 193 193 193 237 237 237 237 237 237 237 340 340 340 340 340 340 340 191 191 191 172 172 172 335 335 335 335 320 320 320 115 115 115 115 115 115 115 249 249 249 249 249 249 249 233 233 233 288 288 288 288 35 35 35 35 35 35 35 35 281 281 281 281 281 281 281 281 220 220 220 219 219 219 333 333 333 333 133 133 133 133 133 281 281 281 281 281 281 281 281 113 113 113 113 113 113 113 113 49 49 49 49 49 49 233 233 233 233 233 233 233 233 340 340 340 340 340 340 340 340 340 340 340 340 340 340 340 340 340 340 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 35 35 35 35 35 35 35 233 233 233 116 116 116 331 331 331 53 53 53 53 53 53 53 53 288 288 288 288 288 288 288 288 288 288 115 115 115 115 115 115 115 115 53 53 53 53 53 53 53 53 53 53 340 340 340 340 340 340 340 340 340 340 340 340 227 227 227 227 227 165 165 165 165 165 165 220 220 220 220 220 220 220 220 119 119 119 119 48 48 48 48 48 48 275 275 275 275 275 275 275 275 275 249 249 249 249 249 249 249 249 117 117 117 117 117 117 340 340 340 340 340 
340 340 340 340 275 275 275 275 275 275 133 133 133 133 133 133 133 133 116 116 116 116 116 116 116 116 116 116 116 116 116 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 331 331 331 331 331 331 331 331 331 331 331 331 331 331 331 331 331 331 331 331 331 331 331 331 331 331 331 331 331 331 331 331 331 331 331 331 133 133 133 133 133 133 133 133 133 224 224 224 224 224 224 224 231 231 231 231 231 231 231 231 231 84 84 84 84 84 84 84 84 84 84 84 84 84 84 84 84 84 84 84 84 84 84 84 84 84 84 84 84 84 84 84 84 84 84 84 84 84 84 84 84 84 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 102 102 102 102 102 102 102 102 102 102 102 102 102 102 102 102 102 102 102 102 102 102 102 115 115 115 115 115 49 49 49 49 49 233 233 233 233 233 233 248 248 248 248 248 248 248 248 248 248 248 248 248 248 248 248 248 248 248 248 248 248 248 248 248 248 248 248 248 248 248 248 248 248 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 279 279 279 279 279 279 279 279 279 279 279 279 279 279 133 133 133 133 116 116 116 116 116 227 227 227 227 227 227 37 37 37 37 37 37 37 37 37 37 293 293 293 293 293 293 337 337 337 337 337 337 337 316 316 316 316 316 316 316 316 316 316 316 316 316 316 316 316 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 +1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 119 119 119 119 119 119 119 133 133 133 277 277 277 116 116 116 107 107 107 107 208 208 208 208 208 208 231 231 231 231 231 231 231 231 248 248 248 248 248 248 248 279 279 279 279 279 279 279 279 279 221 221 221 221 221 221 221 249 249 249 249 249 249 272 272 272 171 171 171 171 171 171 171 171 144 144 144 144 187 187 187 187 187 187 187 187 187 187 229 229 229 229 229 229 41 41 41 41 41 41 41 41 217 217 217 217 217 217 217 217 49 49 49 233 233 233 233 233 165 165 165 165 165 165 165 165 285 285 285 285 285 285 285 285 285 285 285 285 285 49 49 49 49 49 232 232 232 232 232 232 119 119 119 119 133 133 133 133 133 133 133 133 133 232 232 232 232 232 232 232 232 232 1 1 1 1 1 1 1 1 331 331 331 331 331 331 331 331 305 305 305 305 116 116 116 116 116 116 116 116 116 119 119 119 133 133 133 133 276 276 276 276 276 276 276 276 276 276 276 276 276 276 276 276 276 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 107 107 107 107 107 53 53 53 288 288 288 288 35 35 35 35 35 35 228 228 228 228 228 102 102 102 102 102 102 102 102 1 1 1 287 287 287 287 287 287 287 287 69 69 69 69 69 69 69 221 221 221 221 221 221 189 189 189 189 236 236 236 236 287 287 287 287 287 287 287 287 320 320 320 320 320 227 227 227 227 227 227 53 53 53 53 53 53 53 53 53 53 53 112 112 112 112 112 112 112 112 112 112 112 112 112 112 112 112 112 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 271 271 271 271 271 271 271 209 209 209 209 209 209 209 273 273 273 49 49 49 224 224 224 224 224 224 224 19 19 19 19 19 19 19 276 276 276 276 276 276 276 67 67 67 67 67 225 225 225 333 333 333 333 333 205 205 205 205 340 340 340 340 340 287 287 287 287 287 287 287 287 133 133 133 225 225 225 225 225 189 189 189 189 189 189 236 236 236 236 227 227 227 227 227 227 208 208 208 208 208 208 208 208 208 102 102 102 102 102 102 102 102 102 102 102 115 115 115 115 115 115 115 320 320 320 320 320 320 320 320 320 320 320 320 320 320 320 320 320 320 320 320 320 320 320 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 331 331 331 331 331 331 331 305 305 305 116 116 116 116 335 335 335 
335 335 320 320 320 320 320 275 275 275 275 275 275 37 37 37 37 37 37 37 37 37 37 121 121 121 121 121 144 144 144 144 144 144 144 102 102 102 102 102 102 102 102 115 115 115 115 115 115 115 193 193 193 193 193 117 117 117 49 49 49 232 232 232 232 287 287 287 287 287 287 287 287 287 287 69 69 69 69 69 69 69 69 69 69 69 69 69 69 69 220 220 220 220 220 220 220 220 220 220 220 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 187 187 187 187 187 187 187 187 187 187 172 172 172 172 172 172 172 172 335 335 335 335 320 320 320 320 279 279 279 279 279 279 279 279 279 279 279 279 164 164 164 164 164 164 164 164 164 279 279 279 279 279 279 279 279 279 279 279 248 248 248 248 248 248 248 99 99 99 99 99 224 224 224 224 224 224 224 279 279 279 279 279 279 279 279 279 289 289 289 289 289 21 21 21 21 21 21 21 21 21 21 21 21 21 21 21 21 21 272 272 272 272 272 272 272 272 272 272 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 102 102 102 102 102 102 102 102 102 102 219 219 219 219 219 219 219 219 219 219 219 219 219 219 219 219 219 219 37 37 37 37 37 37 37 37 37 232 232 232 232 232 232 232 232 232 279 279 279 279 279 279 279 289 289 289 289 289 21 21 21 21 21 21 21 21 21 21 272 272 272 272 272 331 331 331 133 133 133 232 232 232 232 102 102 102 102 102 102 102 227 227 227 227 227 227 227 227 227 165 165 165 165 165 165 220 220 220 220 220 220 220 220 51 51 51 51 51 51 272 272 272 272 227 227 227 227 227 100 100 100 100 100 100 100 100 227 227 227 227 227 227 227 227 227 227 101 101 101 101 101 101 101 101 101 101 233 233 233 116 116 116 287 287 287 287 287 287 287 287 320 320 320 320 320 320 320 187 187 187 187 187 187 288 288 288 288 288 288 71 71 71 71 71 71 225 225 225 225 121 121 121 121 248 248 248 248 248 248 248 248 248 248 248 187 187 187 187 187 187 289 289 289 289 289 280 280 280 280 280 280 280 115 115 115 115 115 115 193 193 193 193 193 173 173 173 173 173 173 173 49 49 49 49 221 221 221 221 221 49 49 49 49 49 225 225 225 225 225 225 225 225 225 225 225 288 288 288 288 288 288 288 288 288 288 288 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 227 227 227 227 227 227 227 227 227 227 227 227 227 37 37 37 37 37 37 37 37 37 37 37 37 37 293 293 293 293 293 293 293 293 293 293 293 337 337 337 337 337 337 337 316 316 316 316 316 316 316 316 316 316 316 316 316 316 316 316 316 316 316 316 316 316 1 1 1 1 1 1 1 1 1 1 1 1 1 +1 1 1 1 1 1 1 1 1 331 331 331 331 331 331 49 49 49 340 340 340 340 340 340 340 187 187 187 233 233 233 233 233 233 217 217 217 217 217 217 217 265 265 265 265 265 265 265 189 189 189 189 189 189 189 189 236 236 236 236 236 236 179 179 179 179 189 189 189 229 229 229 229 229 229 281 281 281 281 281 281 281 281 133 133 133 133 225 225 225 225 225 225 225 225 225 225 225 225 225 172 172 172 172 172 172 172 172 172 172 172 172 172 172 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 223 223 223 223 223 223 223 223 101 101 101 101 101 101 101 101 101 101 220 220 220 220 220 220 227 227 227 227 227 227 227 227 227 249 249 249 249 249 249 249 249 249 249 281 281 281 281 281 281 281 288 288 288 288 288 288 288 219 219 219 219 219 219 219 219 333 333 333 333 333 333 101 101 101 101 101 101 101 101 101 101 49 49 49 49 288 288 288 288 288 171 171 171 171 171 171 171 171 171 249 249 249 249 249 249 249 249 249 249 249 249 221 221 221 221 221 221 221 221 
280 280 280 280 280 280 280 179 179 179 179 179 179 208 208 208 208 208 208 208 208 223 223 223 223 223 223 223 101 101 101 101 101 101 101 101 221 221 221 221 221 221 288 288 288 287 287 287 287 287 287 287 69 69 69 69 69 69 69 69 69 69 69 221 221 221 221 221 49 49 49 289 289 289 289 189 189 189 189 328 328 328 328 271 271 271 271 271 271 271 271 271 271 209 209 209 209 209 209 209 209 209 273 273 273 273 273 273 49 49 49 49 49 224 224 224 224 224 224 224 331 331 331 331 331 133 133 133 133 133 232 232 232 232 232 232 119 119 119 164 164 164 164 164 164 331 331 331 331 331 331 331 331 331 144 144 144 144 144 144 331 331 331 331 331 331 331 193 193 193 193 225 225 225 225 225 225 189 189 189 189 189 189 236 236 236 236 236 236 236 287 287 287 287 287 287 188 188 188 115 115 115 115 115 115 115 320 320 320 320 320 320 320 320 119 119 119 119 48 48 48 48 287 287 287 287 287 287 287 287 69 69 69 69 69 69 69 69 69 69 69 221 221 221 221 221 221 221 221 189 189 189 236 236 236 236 236 236 119 119 119 119 49 49 49 49 229 229 229 229 229 281 281 281 281 281 281 281 281 281 281 281 281 133 133 133 133 133 133 225 225 225 225 225 225 225 225 225 225 329 329 329 329 329 340 340 340 340 340 340 340 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 35 35 35 35 35 35 35 35 35 233 233 233 116 116 116 115 115 115 115 193 193 193 116 116 116 231 231 231 231 231 231 21 21 21 21 21 21 288 288 288 288 288 187 187 187 221 221 221 221 221 221 281 281 281 281 281 281 273 273 273 273 273 133 133 133 133 133 133 221 221 221 221 221 221 288 288 288 288 288 288 187 187 187 228 228 228 228 228 228 287 287 287 287 287 188 188 188 219 219 219 219 219 219 219 219 219 219 219 219 209 209 209 209 209 209 209 272 272 272 272 272 272 272 51 51 51 51 51 51 51 51 51 51 272 272 272 272 179 179 179 189 189 189 189 189 340 340 340 340 340 340 340 340 340 131 131 131 131 131 233 233 233 233 233 233 116 116 116 116 116 47 47 47 328 328 328 328 328 328 328 191 191 191 191 288 288 288 288 288 288 288 288 288 288 288 288 288 288 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 107 107 107 107 107 107 53 53 53 53 288 288 288 288 179 179 179 179 179 179 208 208 208 179 179 179 37 37 37 116 116 116 231 231 231 231 231 231 231 231 231 133 133 133 133 133 133 329 329 329 329 329 329 329 329 144 144 144 144 144 187 187 187 187 187 221 221 221 221 221 221 221 281 281 281 281 281 281 281 281 281 273 273 273 273 273 133 133 133 133 133 133 221 221 221 221 221 289 289 289 289 289 289 189 189 189 116 116 116 287 287 287 287 287 320 320 320 320 320 320 320 320 187 187 187 233 233 233 233 233 233 217 217 217 217 217 217 264 264 264 264 264 264 264 264 264 264 264 119 119 119 119 119 52 52 52 52 52 279 279 279 279 279 279 279 49 49 49 49 281 281 281 281 281 281 281 281 281 281 281 281 281 101 101 101 101 101 101 101 101 101 101 101 101 101 49 49 49 289 289 289 289 289 289 289 289 289 204 204 204 204 47 47 47 47 328 328 328 328 328 328 328 50 50 50 50 50 223 223 223 223 193 193 193 289 289 289 289 289 49 49 49 49 224 224 224 224 175 175 175 175 175 175 175 175 149 149 149 149 149 149 149 149 149 149 149 224 224 224 224 224 224 224 224 224 224 224 224 224 224 224 224 224 224 224 224 224 224 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 331 331 331 331 331 331 331 331 331 331 331 193 193 193 193 229 229 229 229 229 229 229 49 49 49 49 49 232 232 232 232 232 232 232 232 232 331 331 331 331 144 144 144 144 144 107 107 107 107 107 107 107 37 37 37 
37 37 37 37 37 37 37 116 116 116 116 116 187 187 187 187 233 233 233 233 233 233 233 53 53 53 53 53 53 53 53 53 172 172 172 172 172 172 172 187 187 187 232 232 232 232 232 232 67 67 67 67 67 67 67 67 224 224 224 224 224 224 219 219 219 219 219 219 219 219 219 21 21 21 21 21 21 21 21 233 233 233 233 233 233 285 285 285 285 285 285 285 285 49 49 49 49 49 233 233 233 233 233 233 233 280 280 280 280 280 280 280 280 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 107 107 107 107 107 107 53 53 53 53 53 53 288 288 288 288 288 223 223 223 223 223 223 193 193 193 289 289 289 49 49 49 49 49 224 224 224 224 224 224 224 175 175 175 175 175 175 175 149 149 149 149 149 149 149 149 149 149 149 149 149 149 225 225 225 225 225 225 225 225 225 225 225 225 225 225 340 340 340 340 340 340 340 340 340 340 340 340 340 340 340 331 331 331 331 331 331 144 144 144 144 144 144 144 331 331 331 331 331 331 331 331 331 331 331 331 149 149 149 149 149 149 149 149 149 149 149 149 149 149 149 149 280 280 280 280 280 280 280 280 280 280 280 280 280 280 280 280 280 280 280 280 280 280 280 280 280 280 280 280 280 1 1 1 1 1 diff --git a/SpeechLM/dataset/LibriSpeech/phone_unit/train_sample100.tsv b/SpeechLM/dataset/LibriSpeech/phone_unit/train_sample100.tsv new file mode 100644 index 0000000000000000000000000000000000000000..77da1d382251d74e02d73cce8e2e35ec6e1f8f0c --- /dev/null +++ b/SpeechLM/dataset/LibriSpeech/phone_unit/train_sample100.tsv @@ -0,0 +1,101 @@ +/LocalData/dataset/LibriSpeech +train-clean-100/103/1240/103-1240-0000.flac 225360 +train-clean-100/103/1240/103-1240-0001.flac 255120 +train-clean-100/103/1240/103-1240-0002.flac 223120 +train-clean-100/103/1240/103-1240-0003.flac 235360 +train-clean-100/103/1240/103-1240-0004.flac 200240 +train-clean-100/103/1240/103-1240-0005.flac 242800 +train-clean-100/103/1240/103-1240-0006.flac 153280 +train-clean-100/103/1240/103-1240-0007.flac 240560 +train-clean-100/103/1240/103-1240-0008.flac 246960 +train-clean-100/103/1240/103-1240-0009.flac 160480 +train-clean-100/103/1240/103-1240-0010.flac 236880 +train-clean-100/103/1240/103-1240-0011.flac 234480 +train-clean-100/103/1240/103-1240-0012.flac 243040 +train-clean-100/103/1240/103-1240-0013.flac 244160 +train-clean-100/103/1240/103-1240-0014.flac 223360 +train-clean-100/103/1240/103-1240-0015.flac 60960 +train-clean-100/103/1240/103-1240-0016.flac 250640 +train-clean-100/103/1240/103-1240-0017.flac 229040 +train-clean-100/103/1240/103-1240-0018.flac 185760 +train-clean-100/103/1240/103-1240-0019.flac 246480 +train-clean-100/103/1240/103-1240-0020.flac 214640 +train-clean-100/103/1240/103-1240-0021.flac 236960 +train-clean-100/103/1240/103-1240-0022.flac 262000 +train-clean-100/103/1240/103-1240-0023.flac 194400 +train-clean-100/103/1240/103-1240-0024.flac 244320 +train-clean-100/103/1240/103-1240-0025.flac 241920 +train-clean-100/103/1240/103-1240-0026.flac 133360 +train-clean-100/103/1240/103-1240-0027.flac 223440 +train-clean-100/103/1240/103-1240-0028.flac 250400 +train-clean-100/103/1240/103-1240-0029.flac 244320 +train-clean-100/103/1240/103-1240-0030.flac 232320 +train-clean-100/103/1240/103-1240-0031.flac 269760 +train-clean-100/103/1240/103-1240-0032.flac 236400 +train-clean-100/103/1240/103-1240-0033.flac 230640 +train-clean-100/103/1240/103-1240-0034.flac 246480 +train-clean-100/103/1240/103-1240-0035.flac 256720 +train-clean-100/103/1240/103-1240-0036.flac 200320 +train-clean-100/103/1240/103-1240-0037.flac 237040 +train-clean-100/103/1240/103-1240-0038.flac 114480 
+train-clean-100/103/1240/103-1240-0039.flac 230800 +train-clean-100/103/1240/103-1240-0040.flac 234720 +train-clean-100/103/1240/103-1240-0041.flac 216160 +train-clean-100/103/1240/103-1240-0042.flac 249680 +train-clean-100/103/1240/103-1240-0043.flac 236160 +train-clean-100/103/1240/103-1240-0044.flac 262240 +train-clean-100/103/1240/103-1240-0045.flac 250800 +train-clean-100/103/1240/103-1240-0046.flac 222800 +train-clean-100/103/1240/103-1240-0047.flac 206320 +train-clean-100/103/1240/103-1240-0048.flac 236320 +train-clean-100/103/1240/103-1240-0049.flac 244560 +train-clean-100/103/1240/103-1240-0050.flac 224400 +train-clean-100/103/1240/103-1240-0051.flac 245760 +train-clean-100/103/1240/103-1240-0052.flac 236640 +train-clean-100/103/1240/103-1240-0053.flac 218640 +train-clean-100/103/1240/103-1240-0054.flac 261360 +train-clean-100/103/1240/103-1240-0055.flac 179920 +train-clean-100/103/1240/103-1240-0056.flac 229040 +train-clean-100/103/1240/103-1240-0057.flac 109680 +train-clean-100/103/1241/103-1241-0000.flac 255440 +train-clean-100/103/1241/103-1241-0001.flac 248800 +train-clean-100/103/1241/103-1241-0002.flac 249040 +train-clean-100/103/1241/103-1241-0003.flac 222160 +train-clean-100/103/1241/103-1241-0004.flac 236080 +train-clean-100/103/1241/103-1241-0005.flac 224400 +train-clean-100/103/1241/103-1241-0006.flac 243760 +train-clean-100/103/1241/103-1241-0007.flac 242320 +train-clean-100/103/1241/103-1241-0008.flac 242160 +train-clean-100/103/1241/103-1241-0009.flac 222400 +train-clean-100/103/1241/103-1241-0010.flac 253920 +train-clean-100/103/1241/103-1241-0011.flac 231760 +train-clean-100/103/1241/103-1241-0012.flac 239680 +train-clean-100/103/1241/103-1241-0013.flac 236960 +train-clean-100/103/1241/103-1241-0014.flac 242080 +train-clean-100/103/1241/103-1241-0015.flac 224160 +train-clean-100/103/1241/103-1241-0016.flac 234640 +train-clean-100/103/1241/103-1241-0017.flac 254240 +train-clean-100/103/1241/103-1241-0018.flac 150960 +train-clean-100/103/1241/103-1241-0019.flac 48400 +train-clean-100/103/1241/103-1241-0020.flac 155360 +train-clean-100/103/1241/103-1241-0021.flac 242880 +train-clean-100/103/1241/103-1241-0022.flac 261600 +train-clean-100/103/1241/103-1241-0023.flac 266720 +train-clean-100/103/1241/103-1241-0024.flac 254240 +train-clean-100/103/1241/103-1241-0025.flac 77280 +train-clean-100/103/1241/103-1241-0026.flac 176080 +train-clean-100/103/1241/103-1241-0027.flac 238080 +train-clean-100/103/1241/103-1241-0028.flac 248880 +train-clean-100/103/1241/103-1241-0029.flac 244960 +train-clean-100/103/1241/103-1241-0030.flac 247520 +train-clean-100/103/1241/103-1241-0031.flac 209600 +train-clean-100/103/1241/103-1241-0032.flac 224080 +train-clean-100/103/1241/103-1241-0033.flac 251920 +train-clean-100/103/1241/103-1241-0034.flac 270560 +train-clean-100/103/1241/103-1241-0035.flac 248800 +train-clean-100/103/1241/103-1241-0036.flac 249040 +train-clean-100/103/1241/103-1241-0037.flac 204400 +train-clean-100/103/1241/103-1241-0038.flac 238960 +train-clean-100/103/1241/103-1241-0039.flac 258160 +train-clean-100/103/1241/103-1241-0040.flac 220560 +train-clean-100/103/1241/103-1241-0041.flac 252240 diff --git a/SpeechLM/modules.py b/SpeechLM/modules.py new file mode 100644 index 0000000000000000000000000000000000000000..2841868b315cee2e3f8d7c072488d840bbaa8ab7 --- /dev/null +++ b/SpeechLM/modules.py @@ -0,0 +1,2130 @@ +# ---------------------------------------------------------------------------- +# SpeechLM: Enhanced Speech Pre-Training with Unpaired Textual Data 
(https://arxiv.org/abs/2209.15329) +# Github source: https://github.com/microsoft/SpeechT5/tree/main/SpeechLM +# Code based on fairseq: https://github.com/facebookresearch/fairseq +# +# Copyright (c) 2022 Microsoft +# Licensed under The MIT License [see LICENSE for details] +# ---------------------------------------------------------------------------- +""" +We just merge all the required modules and functions into one python file. +It is for easily use the pre-trained model to extract features. +""" +import math +import numpy as np +import logging +import torch +import torch.nn as nn +import torch.nn.functional as F +from torch.nn import Parameter +from torch import Tensor +from typing import Any, Dict, List, Tuple, Callable, Optional + +logger = logging.getLogger(__name__) + +# rewrite name for backward compatibility in `make_generation_fast_` +def module_name_fordropout(module_name: str) -> str: + if module_name == "TransformerEncoderBase": + return "TransformerEncoder" + else: + return module_name + +def utils_make_positions(tensor, padding_idx: int, onnx_trace: bool = False): + """Replace non-padding symbols with their position numbers. + + Position numbers begin at padding_idx+1. Padding symbols are ignored. + """ + # The series of casts and type-conversions here are carefully + # balanced to both work with ONNX export and XLA. In particular XLA + # prefers ints, cumsum defaults to output longs, and ONNX doesn't know + # how to handle the dtype kwarg in cumsum. + mask = tensor.ne(padding_idx).int() + return (torch.cumsum(mask, dim=1).type_as(mask) * mask).long() + padding_idx + +def utils_item(tensor): + # tpu-comment: making this a no-op for xla devices. + if torch.is_tensor(tensor) and tensor.device.type == "xla": + return tensor.detach() + if hasattr(tensor, "item"): + return tensor.item() + if hasattr(tensor, "__getitem__"): + return tensor[0] + return tensor + +def fsdp_wrap(module, min_num_params: Optional[int] = None, **kwargs): + """ + Helper to wrap layers/modules in FSDP. This falls back to a no-op if + fairscale is not available. 
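+
+    Example (a minimal sketch, not part of the original code; ``nn.Linear``
+    merely stands in for a real transformer layer and the threshold is
+    arbitrary)::
+
+        import torch.nn as nn
+
+        layer = nn.Linear(768, 3072)  # ~2.4M parameters
+        # Returns ``layer`` unchanged if fairscale is not installed or the
+        # parameter count is below ``min_num_params``; otherwise hands the
+        # module to fairscale's ``wrap()``.
+        layer = fsdp_wrap(layer, min_num_params=1_000_000)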
+ + Args: + module (nn.Module): module to (maybe) wrap + min_num_params (int, Optional): minimum number of layer params to wrap + """ + try: + from fairscale.nn import wrap + + if min_num_params is not None: + num_params = sum(p.numel() for p in module.parameters()) + if num_params >= min_num_params: + return wrap(module, **kwargs) + else: + return module + else: + return wrap(module, **kwargs) + except ImportError: + return module + +def quant_noise(module, p, block_size): + """ + Wraps modules and applies quantization noise to the weights for + subsequent quantization with Iterative Product Quantization as + described in "Training with Quantization Noise for Extreme Model Compression" + + Args: + - module: nn.Module + - p: amount of Quantization Noise + - block_size: size of the blocks for subsequent quantization with iPQ + + Remarks: + - Module weights must have the right sizes wrt the block size + - Only Linear, Embedding and Conv2d modules are supported for the moment + - For more detail on how to quantize by blocks with convolutional weights, + see "And the Bit Goes Down: Revisiting the Quantization of Neural Networks" + - We implement the simplest form of noise here as stated in the paper + which consists in randomly dropping blocks + """ + + # if no quantization noise, don't register hook + if p <= 0: + return module + + # supported modules + assert isinstance(module, (nn.Linear, nn.Embedding, nn.Conv2d)) + + # test whether module.weight has the right sizes wrt block_size + is_conv = module.weight.ndim == 4 + + # 2D matrix + if not is_conv: + assert ( + module.weight.size(1) % block_size == 0 + ), "Input features must be a multiple of block sizes" + + # 4D matrix + else: + # 1x1 convolutions + if module.kernel_size == (1, 1): + assert ( + module.in_channels % block_size == 0 + ), "Input channels must be a multiple of block sizes" + # regular convolutions + else: + k = module.kernel_size[0] * module.kernel_size[1] + assert k % block_size == 0, "Kernel size must be a multiple of block size" + + def _forward_pre_hook(mod, input): + # no noise for evaluation + if mod.training: + if not is_conv: + # gather weight and sizes + weight = mod.weight + in_features = weight.size(1) + out_features = weight.size(0) + + # split weight matrix into blocks and randomly drop selected blocks + mask = torch.zeros( + in_features // block_size * out_features, device=weight.device + ) + mask.bernoulli_(p) + mask = mask.repeat_interleave(block_size, -1).view(-1, in_features) + + else: + # gather weight and sizes + weight = mod.weight + in_channels = mod.in_channels + out_channels = mod.out_channels + + # split weight matrix into blocks and randomly drop selected blocks + if mod.kernel_size == (1, 1): + mask = torch.zeros( + int(in_channels // block_size * out_channels), + device=weight.device, + ) + mask.bernoulli_(p) + mask = mask.repeat_interleave(block_size, -1).view(-1, in_channels) + else: + mask = torch.zeros( + weight.size(0), weight.size(1), device=weight.device + ) + mask.bernoulli_(p) + mask = ( + mask.unsqueeze(2) + .unsqueeze(3) + .repeat(1, 1, mod.kernel_size[0], mod.kernel_size[1]) + ) + + # scale weights and apply mask + mask = mask.to( + torch.bool + ) # x.bool() is not currently supported in TorchScript + s = 1 / (1 - p) + mod.weight.data = s * weight.masked_fill(mask, 0) + + module.register_forward_pre_hook(_forward_pre_hook) + return module + +def relu_squared(x: torch.Tensor): + return F.relu(x).pow(2) + +def gelu(x: torch.Tensor) -> torch.Tensor: + return 
torch.nn.functional.gelu(x.float()).type_as(x) + +def gelu_accurate(x): + if not hasattr(gelu_accurate, "_a"): + gelu_accurate._a = math.sqrt(2 / math.pi) + return ( + 0.5 * x * (1 + torch.tanh(gelu_accurate._a * (x + 0.044715 * torch.pow(x, 3)))) + ) + +def get_activation_fn(activation: str) -> Callable: + """Returns the activation function corresponding to `activation`""" + if activation == "relu": + return F.relu + elif activation == "relu_squared": + return relu_squared + elif activation == "gelu": + return gelu + elif activation == "gelu_fast": + logger.warn( + "--activation-fn=gelu_fast has been renamed to gelu_accurate" + ) + return gelu_accurate + elif activation == "gelu_accurate": + return gelu_accurate + elif activation == "tanh": + return torch.tanh + elif activation == "linear": + return lambda x: x + elif activation == "swish": + return torch.nn.SiLU + else: + raise RuntimeError("--activation-fn {} not supported".format(activation)) + +def softmax(x, dim: int, onnx_trace: bool = False): + if onnx_trace: + return F.softmax(x.float(), dim=dim) + else: + return F.softmax(x, dim=dim, dtype=torch.float32) + +def compute_mask_indices( + shape: Tuple[int, int], + padding_mask: Optional[torch.Tensor], + mask_prob: float, + mask_length: int, + mask_type: str = "static", + mask_other: float = 0.0, + min_masks: int = 0, + no_overlap: bool = False, + min_space: int = 0, + require_same_masks: bool = True, + mask_dropout: float = 0.0, +) -> np.ndarray: + """ + Computes random mask spans for a given shape + + Args: + shape: the the shape for which to compute masks. + should be of size 2 where first element is batch size and 2nd is timesteps + padding_mask: optional padding mask of the same size as shape, which will prevent masking padded elements + mask_prob: probability for each token to be chosen as start of the span to be masked. this will be multiplied by + number of timesteps divided by length of mask span to mask approximately this percentage of all elements. + however due to overlaps, the actual number will be smaller (unless no_overlap is True) + mask_type: how to compute mask lengths + static = fixed size + uniform = sample from uniform distribution [mask_other, mask_length*2] + normal = sample from normal distribution with mean mask_length and stdev mask_other. 
mask is min 1 element + poisson = sample from possion distribution with lambda = mask length + min_masks: minimum number of masked spans + no_overlap: if false, will switch to an alternative recursive algorithm that prevents spans from overlapping + min_space: only used if no_overlap is True, this is how many elements to keep unmasked between spans + require_same_masks: if true, will randomly drop out masks until same amount of masks remains in each sample + mask_dropout: randomly dropout this percentage of masks in each example + """ + + bsz, all_sz = shape + mask = np.full((bsz, all_sz), False) + + all_num_mask = int( + # add a random number for probabilistic rounding + mask_prob * all_sz / float(mask_length) + + np.random.rand() + ) + + all_num_mask = max(min_masks, all_num_mask) + + mask_idcs = [] + for i in range(bsz): + if padding_mask is not None: + sz = all_sz - padding_mask[i].long().sum().item() + num_mask = int( + # add a random number for probabilistic rounding + mask_prob * sz / float(mask_length) + + np.random.rand() + ) + num_mask = max(min_masks, num_mask) + else: + sz = all_sz + num_mask = all_num_mask + + if mask_type == "static": + lengths = np.full(num_mask, mask_length) + elif mask_type == "uniform": + lengths = np.random.randint(mask_other, mask_length * 2 + 1, size=num_mask) + elif mask_type == "normal": + lengths = np.random.normal(mask_length, mask_other, size=num_mask) + lengths = [max(1, int(round(x))) for x in lengths] + elif mask_type == "poisson": + lengths = np.random.poisson(mask_length, size=num_mask) + lengths = [int(round(x)) for x in lengths] + else: + raise Exception("unknown mask selection " + mask_type) + + if sum(lengths) == 0: + lengths[0] = min(mask_length, sz - 1) + + if no_overlap: + mask_idc = [] + + def arrange(s, e, length, keep_length): + span_start = np.random.randint(s, e - length) + mask_idc.extend(span_start + i for i in range(length)) + + new_parts = [] + if span_start - s - min_space >= keep_length: + new_parts.append((s, span_start - min_space + 1)) + if e - span_start - keep_length - min_space > keep_length: + new_parts.append((span_start + length + min_space, e)) + return new_parts + + parts = [(0, sz)] + min_length = min(lengths) + for length in sorted(lengths, reverse=True): + lens = np.fromiter( + (e - s if e - s >= length + min_space else 0 for s, e in parts), + np.int, + ) + l_sum = np.sum(lens) + if l_sum == 0: + break + probs = lens / np.sum(lens) + c = np.random.choice(len(parts), p=probs) + s, e = parts.pop(c) + parts.extend(arrange(s, e, length, min_length)) + mask_idc = np.asarray(mask_idc) + else: + min_len = min(lengths) + if sz - min_len <= num_mask: + min_len = sz - num_mask - 1 + + mask_idc = np.random.choice(sz - min_len, num_mask, replace=False) + + mask_idc = np.asarray( + [ + mask_idc[j] + offset + for j in range(len(mask_idc)) + for offset in range(lengths[j]) + ] + ) + + mask_idcs.append(np.unique(mask_idc[mask_idc < sz])) + + min_len = min([len(m) for m in mask_idcs]) + for i, mask_idc in enumerate(mask_idcs): + if len(mask_idc) > min_len and require_same_masks: + mask_idc = np.random.choice(mask_idc, min_len, replace=False) + if mask_dropout > 0: + num_holes = np.rint(len(mask_idc) * mask_dropout).astype(int) + mask_idc = np.random.choice( + mask_idc, len(mask_idc) - num_holes, replace=False + ) + + mask[i, mask_idc] = True + + return mask + +def init_bert_params(module): + """ + Initialize the weights specific to the BERT Model. 
+ This overrides the default initializations depending on the specified arguments. + 1. If normal_init_linear_weights is set then weights of linear + layer will be initialized using the normal distribution and + bais will be set to the specified value. + 2. If normal_init_embed_weights is set then weights of embedding + layer will be initialized using the normal distribution. + 3. If normal_init_proj_weights is set then weights of + in_project_weight for MultiHeadAttention initialized using + the normal distribution (to be validated). + """ + + def normal_(data): + # with FSDP, module params will be on CUDA, so we cast them back to CPU + # so that the RNG is consistent with and without FSDP + data.copy_(data.cpu().normal_(mean=0.0, std=0.02).to(data.device)) + + if isinstance(module, nn.Linear): + normal_(module.weight.data) + if module.bias is not None: + module.bias.data.zero_() + if isinstance(module, nn.Embedding): + normal_(module.weight.data) + if module.padding_idx is not None: + module.weight.data[module.padding_idx].zero_() + if isinstance(module, MultiheadAttention): + normal_(module.q_proj.weight.data) + normal_(module.k_proj.weight.data) + normal_(module.v_proj.weight.data) + +def pad_to_multiple(x, multiple, dim=-1, value=0): + # Inspired from https://github.com/lucidrains/local-attention/blob/master/local_attention/local_attention.py#L41 + if x is None: + return None, 0 + tsz = x.size(dim) + m = tsz / multiple + remainder = math.ceil(m) * multiple - tsz + if m.is_integer(): + return x, 0 + pad_offset = (0,) * (-1 - dim) * 2 + + return F.pad(x, (*pad_offset, 0, remainder), value=value), remainder + +def is_xla_tensor(tensor): + return torch.is_tensor(tensor) and tensor.device.type == "xla" + +def index_put(tensor, indices, value): + if is_xla_tensor(tensor): + for _ in range(indices.dim(), tensor.dim()): + indices = indices.unsqueeze(-1) + if indices.size(-1) < tensor.size(-1): + indices = indices.expand_as(tensor) + tensor = torch.mul(tensor, ~indices) + torch.mul(value, indices) + else: + tensor[indices] = value + return tensor + +def PositionalEmbedding( + num_embeddings: int, + embedding_dim: int, + padding_idx: int, + learned: bool = False, +): + if learned: + # if padding_idx is specified then offset the embedding ids by + # this index and adjust num_embeddings appropriately + # TODO: The right place for this offset would be inside + # LearnedPositionalEmbedding. Move this there for a cleaner implementation. + if padding_idx is not None: + num_embeddings = num_embeddings + padding_idx + 1 + m = LearnedPositionalEmbedding(num_embeddings, embedding_dim, padding_idx) + nn.init.normal_(m.weight, mean=0, std=embedding_dim**-0.5) + if padding_idx is not None: + nn.init.constant_(m.weight[padding_idx], 0) + else: + m = SinusoidalPositionalEmbedding( + embedding_dim, + padding_idx, + init_size=num_embeddings + padding_idx + 1, + ) + return m + +def LayerNorm(normalized_shape, eps=1e-5, elementwise_affine=True, export=False): + if torch.jit.is_scripting() or torch.jit.is_tracing(): + export = True + if not export and torch.cuda.is_available() and has_fused_layernorm: + return FusedLayerNorm(normalized_shape, eps, elementwise_affine) + return torch.nn.LayerNorm(normalized_shape, eps, elementwise_affine) + + +class TransformerEncoderBase(nn.Module): + """ + Transformer encoder consisting of *cfg.encoder.layers* layers. Each layer + is a :class:`TransformerEncoderLayer`. 
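+
+    Example (a minimal construction sketch, not from the original file;
+    assumes ``cfg`` is a fully populated fairseq transformer config and
+    ``dictionary`` is a fairseq ``Dictionary`` built elsewhere)::
+
+        import torch.nn as nn
+
+        embed_tokens = nn.Embedding(
+            len(dictionary), cfg.encoder.embed_dim, padding_idx=dictionary.pad()
+        )
+        encoder = TransformerEncoderBase(cfg, dictionary, embed_tokens,
+                                         use_rel_pos_enc=False)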
+ + Args: + args (argparse.Namespace): parsed command-line arguments + dictionary: deprecated(None) + embed_tokens (torch.nn.Embedding): input embedding + """ + + def __init__(self, cfg, dictionary, embed_tokens, use_rel_pos_enc=False, scaling_for_att=1.0): + self.cfg = cfg + super().__init__() + self.register_buffer("version", torch.Tensor([3])) + + self.dropout_module = FairseqDropout( + cfg.dropout, module_name=module_name_fordropout(self.__class__.__name__) + ) + self.encoder_layerdrop = cfg.encoder.layerdrop + + embed_dim = embed_tokens.embedding_dim if embed_tokens is not None else cfg.encoder.embed_dim + self.padding_idx = embed_tokens.padding_idx if embed_tokens is not None else 1 + self.max_source_positions = cfg.max_source_positions + + self.embed_tokens = embed_tokens + + self.embed_scale = 1.0 if cfg.no_scale_embedding else math.sqrt(embed_dim) + + self.embed_positions = ( + PositionalEmbedding( + cfg.max_source_positions, + embed_dim, + self.padding_idx, + learned=cfg.encoder.learned_pos, + ) + if not cfg.no_token_positional_embeddings + else None + ) + if cfg.layernorm_embedding: + self.layernorm_embedding = LayerNorm(embed_dim, export=cfg.export) + else: + self.layernorm_embedding = None + + if not cfg.adaptive_input and cfg.quant_noise.pq > 0: + self.quant_noise = quant_noise( + nn.Linear(embed_dim, embed_dim, bias=False), + cfg.quant_noise.pq, + cfg.quant_noise.pq_block_size, + ) + else: + self.quant_noise = None + + if self.encoder_layerdrop > 0.0: + self.layers = LayerDropModuleList(p=self.encoder_layerdrop) + else: + self.layers = nn.ModuleList([]) + self.use_rel_pos_enc = use_rel_pos_enc + self.scaling_for_att = scaling_for_att + self.layers.extend( + [self.build_encoder_layer(cfg) for i in range(cfg.encoder.layers)] + ) + self.num_layers = len(self.layers) + + if cfg.encoder.normalize_before: + self.layer_norm = LayerNorm(embed_dim, export=cfg.export) + else: + self.layer_norm = None + if self.use_rel_pos_enc: + self.pos_emb = RelativePositionalEncoding(embed_dim // cfg.encoder.attention_heads, 160) + + def build_encoder_layer(self, cfg): + layer = TransformerEncoderLayerBase(cfg, has_relative_attention_bias=self.use_rel_pos_enc, scaling_for_att=self.scaling_for_att) + checkpoint = cfg.checkpoint_activations + if checkpoint: + raise ValueError("We don't support checkpoint_activations for now! 
Please set cfg.checkpoint_activations=False.") + min_params_to_wrap = cfg.min_params_to_wrap if not checkpoint else 0 + layer = fsdp_wrap(layer, min_num_params=min_params_to_wrap) + return layer + + def forward_embedding( + self, src_tokens, token_embedding: Optional[torch.Tensor] = None + ): + # embed tokens and positions + if token_embedding is None: + token_embedding = self.embed_tokens(src_tokens) + x = embed = self.embed_scale * token_embedding + if self.embed_positions is not None: + x = embed + self.embed_positions(src_tokens) + if self.layernorm_embedding is not None: + x = self.layernorm_embedding(x) + x = self.dropout_module(x) + if self.quant_noise is not None: + x = self.quant_noise(x) + return x, embed + + def forward( + self, + src_tokens, + src_lengths: Optional[torch.Tensor] = None, + return_all_hiddens: bool = False, + token_embeddings: Optional[torch.Tensor] = None, + uniformity_layers: Optional[List[int]] = None, + ): + """ + Args: + src_tokens (LongTensor): tokens in the source language of shape + `(batch, src_len)` + src_lengths (torch.LongTensor): lengths of each source sentence of + shape `(batch)` + return_all_hiddens (bool, optional): also return all of the + intermediate hidden states (default: False). + token_embeddings (torch.Tensor, optional): precomputed embeddings + default `None` will recompute embeddings + + Returns: + dict: + - **encoder_out** (Tensor): the last encoder layer's output of + shape `(src_len, batch, embed_dim)` + - **encoder_padding_mask** (ByteTensor): the positions of + padding elements of shape `(batch, src_len)` + - **encoder_embedding** (Tensor): the (scaled) embedding lookup + of shape `(batch, src_len, embed_dim)` + - **encoder_states** (List[Tensor]): all intermediate + hidden states of shape `(src_len, batch, embed_dim)`. + Only populated if *return_all_hiddens* is True. + """ + return self.forward_scriptable( + src_tokens, src_lengths, return_all_hiddens, token_embeddings, uniformity_layers + ) + + # TorchScript doesn't support super() method so that the scriptable Subclass + # can't access the base class model in Torchscript. + # Current workaround is to add a helper function with different name and + # call the helper function from scriptable Subclass. + def forward_scriptable( + self, + src_tokens, + src_lengths: Optional[torch.Tensor] = None, + return_all_hiddens: bool = False, + token_embeddings: Optional[torch.Tensor] = None, + uniformity_layers: Optional[List[int]] = None, + ): + """ + Args: + src_tokens (LongTensor): tokens in the source language of shape + `(batch, src_len)` + src_lengths (torch.LongTensor): lengths of each source sentence of + shape `(batch)` + return_all_hiddens (bool, optional): also return all of the + intermediate hidden states (default: False). + token_embeddings (torch.Tensor, optional): precomputed embeddings + default `None` will recompute embeddings + + Returns: + dict: + - **encoder_out** (Tensor): the last encoder layer's output of + shape `(src_len, batch, embed_dim)` + - **encoder_padding_mask** (ByteTensor): the positions of + padding elements of shape `(batch, src_len)` + - **encoder_embedding** (Tensor): the (scaled) embedding lookup + of shape `(batch, src_len, embed_dim)` + - **encoder_states** (List[Tensor]): all intermediate + hidden states of shape `(src_len, batch, embed_dim)`. + Only populated if *return_all_hiddens* is True. 
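+
+        Example (illustrative only; assumes ``encoder`` is an already
+        constructed instance, ``torch`` is imported, and ``5`` is an
+        arbitrary non-padding token id)::
+
+            src_tokens = torch.full((2, 10), 5, dtype=torch.long)  # (batch, src_len)
+            out = encoder(src_tokens, return_all_hiddens=True)
+            x = out["encoder_out"][0]                 # (src_len, batch, embed_dim)
+            padding = out["encoder_padding_mask"][0]  # (batch, src_len)
+            states = out["encoder_states"]            # embedding output + one tensor per layer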
+ """ + # compute padding mask + encoder_padding_mask = src_tokens.eq(self.padding_idx) + has_pads = src_tokens.device.type == "xla" or encoder_padding_mask.any() + + x, encoder_embedding = self.forward_embedding(src_tokens, token_embeddings) + + # account for padding while computing the representation + if has_pads: + x = x * (1 - encoder_padding_mask.unsqueeze(-1).type_as(x)) + + # B x T x C -> T x B x C + x = x.transpose(0, 1) + if self.use_rel_pos_enc: + x_len = x.shape[0] + pos_seq = torch.arange(0, x_len).long().to(x.device) + pos_seq = pos_seq[:, None] - pos_seq[None, :] + pos_k, pos_v = self.pos_emb(pos_seq) + else: + pos_k = None + + encoder_states = [] + uniformity_hiddens = [] + + if return_all_hiddens: + encoder_states.append(x) + + if uniformity_layers is not None and 0 in uniformity_layers: + x = F.normalize(x.float(), dim=-1).type_as(x) + uniformity_hiddens.append(x) + + # encoder layers + for i, layer in enumerate(self.layers): + x = layer( + x, encoder_padding_mask=encoder_padding_mask if has_pads else None, + pos_bias=pos_k, + ) + if uniformity_layers is not None and i+1 in uniformity_layers: + x = F.normalize(x.float(), dim=-1).type_as(x) + uniformity_hiddens.append(x) + if return_all_hiddens: + assert encoder_states is not None + encoder_states.append(x) + + if self.layer_norm is not None: + x = self.layer_norm(x) + + # The Pytorch Mobile lite interpreter does not supports returning NamedTuple in + # `forward` so we use a dictionary instead. + # TorchScript does not support mixed values so the values are all lists. + # The empty list is equivalent to None. + src_lengths = ( + src_tokens.ne(self.padding_idx) + .sum(dim=1, dtype=torch.int32) + .reshape(-1, 1) + .contiguous() + ) + return { + "encoder_out": [x], # T x B x C + "encoder_padding_mask": [encoder_padding_mask], # B x T + "encoder_embedding": [encoder_embedding], # B x T x C + "encoder_states": encoder_states, # List[T x B x C] + "uniformity_hiddens": uniformity_hiddens, # List[T x B x C] + "src_tokens": [], + "src_lengths": [src_lengths], + } + + def forward_torchscript(self, net_input: Dict[str, Tensor]): + """A TorchScript-compatible version of forward. + + Encoders which use additional arguments may want to override + this method for TorchScript compatibility. + """ + if torch.jit.is_scripting(): + return self.forward( + src_tokens=net_input["src_tokens"], + src_lengths=net_input["src_lengths"], + ) + else: + return self.forward_non_torchscript(net_input) + + @torch.jit.unused + def forward_non_torchscript(self, net_input: Dict[str, Tensor]): + encoder_input = { + k: v for k, v in net_input.items() if k != "prev_output_tokens" + } + return self.forward(**encoder_input) + + @torch.jit.export + def reorder_encoder_out(self, encoder_out: Dict[str, List[Tensor]], new_order): + """ + Reorder encoder output according to *new_order*. 
+ + Args: + encoder_out: output from the ``forward()`` method + new_order (LongTensor): desired order + + Returns: + *encoder_out* rearranged according to *new_order* + """ + if len(encoder_out["encoder_out"]) == 0: + new_encoder_out = [] + else: + new_encoder_out = [encoder_out["encoder_out"][0].index_select(1, new_order)] + if len(encoder_out["encoder_padding_mask"]) == 0: + new_encoder_padding_mask = [] + else: + new_encoder_padding_mask = [ + encoder_out["encoder_padding_mask"][0].index_select(0, new_order) + ] + if len(encoder_out["encoder_embedding"]) == 0: + new_encoder_embedding = [] + else: + new_encoder_embedding = [ + encoder_out["encoder_embedding"][0].index_select(0, new_order) + ] + + if len(encoder_out["src_tokens"]) == 0: + src_tokens = [] + else: + src_tokens = [(encoder_out["src_tokens"][0]).index_select(0, new_order)] + + if len(encoder_out["src_lengths"]) == 0: + src_lengths = [] + else: + src_lengths = [(encoder_out["src_lengths"][0]).index_select(0, new_order)] + + encoder_states = encoder_out["encoder_states"] + if len(encoder_states) > 0: + for idx, state in enumerate(encoder_states): + encoder_states[idx] = state.index_select(1, new_order) + + return { + "encoder_out": new_encoder_out, # T x B x C + "encoder_padding_mask": new_encoder_padding_mask, # B x T + "encoder_embedding": new_encoder_embedding, # B x T x C + "encoder_states": encoder_states, # List[T x B x C] + "src_tokens": src_tokens, # B x T + "src_lengths": src_lengths, # B x 1 + } + + def max_positions(self): + """Maximum input length supported by the encoder.""" + if self.embed_positions is None: + return self.max_source_positions + return min(self.max_source_positions, self.embed_positions.max_positions) + + def upgrade_state_dict_named(self, state_dict, name): + """Upgrade a (possibly old) state dict for new versions.""" + if isinstance(self.embed_positions, SinusoidalPositionalEmbedding): + weights_key = "{}.embed_positions.weights".format(name) + if weights_key in state_dict: + print("deleting {0}".format(weights_key)) + del state_dict[weights_key] + state_dict[ + "{}.embed_positions._float_tensor".format(name) + ] = torch.FloatTensor(1) + for i in range(self.num_layers): + # update layer norms + self.layers[i].upgrade_state_dict_named( + state_dict, "{}.layers.{}".format(name, i) + ) + + version_key = "{}.version".format(name) + if utils_item(state_dict.get(version_key, torch.Tensor([1]))[0]) < 2: + # earlier checkpoints did not normalize after the stack of layers + self.layer_norm = None + self.normalize = False + state_dict[version_key] = torch.Tensor([1]) + return state_dict + + def set_num_updates(self, num_updates): + """State from trainer to pass along to model at every update.""" + + def _apply(m): + if hasattr(m, "set_num_updates") and m != self: + m.set_num_updates(num_updates) + + self.apply(_apply) + + +class TransformerEncoderLayerBase(nn.Module): + """Encoder layer block. + + In the original paper each operation (multi-head attention or FFN) is + postprocessed with: `dropout -> add residual -> layernorm`. In the + tensor2tensor code they suggest that learning is more robust when + preprocessing each layer with layernorm and postprocessing with: + `dropout -> add residual`. We default to the approach in the paper, but the + tensor2tensor approach can be enabled by setting + *cfg.encoder.normalize_before* to ``True``. 
+ + Args: + args (argparse.Namespace): parsed command-line arguments + """ + + def __init__(self, cfg, has_relative_attention_bias=False, scaling_for_att=1.0): + super().__init__() + self.cfg = cfg + self.embed_dim = cfg.encoder.embed_dim + self.quant_noise = cfg.quant_noise.pq + self.quant_noise_block_size = cfg.quant_noise.pq_block_size + self.self_attn = self.build_self_attention(self.embed_dim, cfg, has_relative_attention_bias=has_relative_attention_bias, scaling_for_att=scaling_for_att) + self.self_attn_layer_norm = LayerNorm(self.embed_dim, export=cfg.export) + self.dropout_module = FairseqDropout( + cfg.dropout, module_name=self.__class__.__name__ + ) + self.activation_fn = get_activation_fn(activation=cfg.activation_fn) + activation_dropout_p = cfg.activation_dropout + if activation_dropout_p == 0: + # for backwards compatibility with models that use cfg.relu_dropout + activation_dropout_p = cfg.relu_dropout or 0 + self.activation_dropout_module = FairseqDropout( + float(activation_dropout_p), module_name=self.__class__.__name__ + ) + self.normalize_before = cfg.encoder.normalize_before + self.fc1 = self.build_fc1( + self.embed_dim, + cfg.encoder.ffn_embed_dim, + self.quant_noise, + self.quant_noise_block_size, + ) + self.fc2 = self.build_fc2( + cfg.encoder.ffn_embed_dim, + self.embed_dim, + self.quant_noise, + self.quant_noise_block_size, + ) + + self.final_layer_norm = LayerNorm(self.embed_dim, export=cfg.export) + if has_relative_attention_bias: + self.norm_k = LayerNorm(self.embed_dim // cfg.encoder.attention_heads) + + def build_fc1(self, input_dim, output_dim, q_noise, qn_block_size): + return quant_noise( + nn.Linear(input_dim, output_dim), p=q_noise, block_size=qn_block_size + ) + + def build_fc2(self, input_dim, output_dim, q_noise, qn_block_size): + return quant_noise( + nn.Linear(input_dim, output_dim), p=q_noise, block_size=qn_block_size + ) + + def build_self_attention(self, embed_dim, cfg, has_relative_attention_bias=False, scaling_for_att=1.0): + return MultiheadAttention( + embed_dim, + cfg.encoder.attention_heads, + dropout=cfg.attention_dropout, + self_attention=True, + q_noise=self.quant_noise, + qn_block_size=self.quant_noise_block_size, + has_relative_attention_bias=has_relative_attention_bias, + scaling_for_att=scaling_for_att, + ) + + def residual_connection(self, x, residual): + return residual + x + + def upgrade_state_dict_named(self, state_dict, name): + """ + Rename layer norm states from `...layer_norms.0.weight` to + `...self_attn_layer_norm.weight` and `...layer_norms.1.weight` to + `...final_layer_norm.weight` + """ + layer_norm_map = {"0": "self_attn_layer_norm", "1": "final_layer_norm"} + for old, new in layer_norm_map.items(): + for m in ("weight", "bias"): + k = "{}.layer_norms.{}.{}".format(name, old, m) + if k in state_dict: + state_dict["{}.{}.{}".format(name, new, m)] = state_dict[k] + del state_dict[k] + + def forward( + self, + x, + encoder_padding_mask: Optional[Tensor], + attn_mask: Optional[Tensor] = None, + pos_bias=None, + ): + """ + Args: + x (Tensor): input to the layer of shape `(seq_len, batch, embed_dim)` + encoder_padding_mask (ByteTensor): binary ByteTensor of shape + `(batch, seq_len)` where padding elements are indicated by ``1``. + attn_mask (ByteTensor): binary tensor of shape `(tgt_len, src_len)`, + where `tgt_len` is the length of output and `src_len` is the + length of input, though here both are equal to `seq_len`. 
+ `attn_mask[tgt_i, src_j] = 1` means that when calculating the + embedding for `tgt_i`, we exclude (mask out) `src_j`. This is + useful for strided self-attention. + + Returns: + encoded output of shape `(seq_len, batch, embed_dim)` + """ + # anything in original attn_mask = 1, becomes -1e8 + # anything in original attn_mask = 0, becomes 0 + # Note that we cannot use -inf here, because at some edge cases, + # the attention weight (before softmax) for some padded element in query + # will become -inf, which results in NaN in model parameters + if attn_mask is not None: + attn_mask = attn_mask.masked_fill( + attn_mask.to(torch.bool), -1e8 if x.dtype == torch.float32 else -1e4 + ) + + residual = x + if self.normalize_before: + x = self.self_attn_layer_norm(x) + if pos_bias is not None: + pos_bias = self.norm_k(pos_bias) + x, _ = self.self_attn( + query=x, + key=x, + value=x, + key_padding_mask=encoder_padding_mask, + need_weights=False, + attn_mask=attn_mask, + position_bias=pos_bias, + ) + x = self.dropout_module(x) + x = self.residual_connection(x, residual) + if not self.normalize_before: + x = self.self_attn_layer_norm(x) + + residual = x + if self.normalize_before: + x = self.final_layer_norm(x) + x = self.activation_fn(self.fc1(x)) + x = self.activation_dropout_module(x) + x = self.fc2(x) + x = self.dropout_module(x) + x = self.residual_connection(x, residual) + if not self.normalize_before: + x = self.final_layer_norm(x) + return x + + +class TransformerEncoder(nn.Module): + """ + wav2vec-style transformer encoder. + """ + def __init__(self, args): + super().__init__() + + self.dropout = args.dropout + self.embedding_dim = args.encoder_embed_dim + self.required_seq_len_multiple = args.required_seq_len_multiple + + self.pos_conv = nn.Conv1d( + self.embedding_dim, + self.embedding_dim, + kernel_size=args.conv_pos, + padding=args.conv_pos // 2, + groups=args.conv_pos_groups, + ) + dropout = 0 + std = math.sqrt((4 * (1.0 - dropout)) / (args.conv_pos * self.embedding_dim)) + nn.init.normal_(self.pos_conv.weight, mean=0, std=std) + nn.init.constant_(self.pos_conv.bias, 0) + + self.pos_conv = nn.utils.weight_norm(self.pos_conv, name="weight", dim=2) + self.pos_conv = nn.Sequential(self.pos_conv, SamePad(args.conv_pos), nn.GELU()) + + layers = [] + self.use_rel_pos_enc = getattr(args, "use_rel_pos_enc", False) + for _ in range(args.encoder_layers): + layer = TransformerSentenceEncoderLayer( + embedding_dim=self.embedding_dim, + ffn_embedding_dim=args.encoder_ffn_embed_dim, + num_attention_heads=args.encoder_attention_heads, + dropout=self.dropout, + attention_dropout=args.attention_dropout, + activation_dropout=args.activation_dropout, + activation_fn=args.activation_fn, + layer_norm_first=args.layer_norm_first, + has_relative_attention_bias=self.use_rel_pos_enc, + scaling_for_att=getattr(args, "scaling_for_att", 1.0) + ) + if args.checkpoint_activations: + raise ValueError("We don't support checkpoint_activations for now! 
Please set checkpoint_activations=False.") + layers.append(layer) + self.layers = nn.ModuleList(layers) + + self.layer_norm_first = args.layer_norm_first + self.layer_norm = LayerNorm(self.embedding_dim) + self.layerdrop = args.encoder_layerdrop + + if self.use_rel_pos_enc: + self.pos_emb = RelativePositionalEncoding(args.encoder_embed_dim // args.encoder_attention_heads, 160) + + self.apply(init_bert_params) + + def forward(self, x, padding_mask=None, layer=None, conv_pos=True): + x, layer_results = self.extract_features(x, padding_mask, layer, conv_pos) + + if self.layer_norm_first and (layer is None or layer >= len(self.layers) - 1): + x = self.layer_norm(x) + + return x, layer_results + + def extract_features(self, x, padding_mask=None, tgt_layer=None, conv_pos=True): + + if padding_mask is not None: + x = index_put(x, padding_mask, 0) + + if conv_pos: + x_conv = self.pos_conv(x.transpose(1, 2)) + x_conv = x_conv.transpose(1, 2) + x = x + x_conv + + if not self.layer_norm_first: + x = self.layer_norm(x) + + # pad to the sequence length dimension + x, pad_length = pad_to_multiple( + x, self.required_seq_len_multiple, dim=-2, value=0 + ) + if pad_length > 0 and padding_mask is None: + padding_mask = x.new_zeros((x.size(0), x.size(1)), dtype=torch.bool) + padding_mask[:, -pad_length:] = True + else: + padding_mask, _ = pad_to_multiple( + padding_mask, self.required_seq_len_multiple, dim=-1, value=True + ) + x = F.dropout(x, p=self.dropout, training=self.training) + + # B x T x C -> T x B x C + x = x.transpose(0, 1) + + if self.use_rel_pos_enc: + x_len = x.shape[0] + pos_seq = torch.arange(0, x_len).long().to(x.device) + pos_seq = pos_seq[:, None] - pos_seq[None, :] + pos_k, pos_v = self.pos_emb(pos_seq) + else: + pos_k = None + + layer_results = [] + r = None + for i, layer in enumerate(self.layers): + dropout_probability = np.random.random() + if not self.training or (dropout_probability > self.layerdrop): + x, z = layer(x, self_attn_padding_mask=padding_mask, need_weights=False, pos_bias=pos_k) + if tgt_layer is not None: + # unpad if needed + if pad_length > 0: + layer_results.append( + x[:-pad_length] + # ( + # x[:-pad_length], + # z[:, :-pad_length, :-pad_length] + # if z is not None + # else z, + # ) + ) + else: + # layer_results.append((x, z)) + layer_results.append(x) + if i == tgt_layer: + r = x + break + + if r is not None: + x = r + + # T x B x C -> B x T x C + x = x.transpose(0, 1) + # undo paddding + if pad_length > 0: + x = x[:, :-pad_length] + + return x, layer_results + + def max_positions(self): + """Maximum output length supported by the encoder.""" + return self.args.max_positions + + def upgrade_state_dict_named(self, state_dict, name): + """Upgrade a (possibly old) state dict for new versions of fairseq.""" + return state_dict + + +class TransformerSentenceEncoderLayer(nn.Module): + """ + wav2vec-style transformer layer + """ + + def __init__( + self, + embedding_dim: float = 768, + ffn_embedding_dim: float = 3072, + num_attention_heads: float = 8, + dropout: float = 0.1, + attention_dropout: float = 0.1, + activation_dropout: float = 0.1, + activation_fn: str = "relu", + layer_norm_first: bool = False, + has_relative_attention_bias: bool = False, + scaling_for_att: float = 1.0, + ) -> None: + + super().__init__() + # Initialize parameters + self.embedding_dim = embedding_dim + self.dropout = dropout + self.activation_dropout = activation_dropout + + # Initialize blocks + self.activation_fn = get_activation_fn(activation_fn) + self.self_attn = MultiheadAttention( + 
self.embedding_dim, + num_attention_heads, + dropout=attention_dropout, + self_attention=True, + has_relative_attention_bias=has_relative_attention_bias, + scaling_for_att=scaling_for_att + ) + + self.dropout1 = nn.Dropout(dropout) + self.dropout2 = nn.Dropout(self.activation_dropout) + self.dropout3 = nn.Dropout(dropout) + + self.layer_norm_first = layer_norm_first + + # layer norm associated with the self attention layer + self.self_attn_layer_norm = LayerNorm(self.embedding_dim) + self.fc1 = nn.Linear(self.embedding_dim, ffn_embedding_dim) + self.fc2 = nn.Linear(ffn_embedding_dim, self.embedding_dim) + + # layer norm associated with the position wise feed-forward NN + self.final_layer_norm = LayerNorm(self.embedding_dim) + + if has_relative_attention_bias: + self.norm_k = LayerNorm(self.embedding_dim//num_attention_heads) + + def forward( + self, + x: torch.Tensor, + self_attn_mask: torch.Tensor = None, + self_attn_padding_mask: torch.Tensor = None, + need_weights: bool = False, + att_args=None, + pos_bias=None, + ): + """ + LayerNorm is applied either before or after the self-attention/ffn + modules similar to the original Transformer imlementation. + """ + residual = x + + if self.layer_norm_first: + x = self.self_attn_layer_norm(x) + if pos_bias is not None: + pos_bias = self.norm_k(pos_bias) + x, attn = self.self_attn( + query=x, + key=x, + value=x, + key_padding_mask=self_attn_padding_mask, + attn_mask=self_attn_mask, + position_bias=pos_bias, + ) + x = self.dropout1(x) + x = residual + x + + residual = x + x = self.final_layer_norm(x) + x = self.activation_fn(self.fc1(x)) + x = self.dropout2(x) + x = self.fc2(x) + x = self.dropout3(x) + x = residual + x + else: + x, attn = self.self_attn( + query=x, + key=x, + value=x, + key_padding_mask=self_attn_padding_mask, + position_bias=pos_bias, + ) + + x = self.dropout1(x) + x = residual + x + + x = self.self_attn_layer_norm(x) + + residual = x + x = self.activation_fn(self.fc1(x)) + x = self.dropout2(x) + x = self.fc2(x) + x = self.dropout3(x) + x = residual + x + x = self.final_layer_norm(x) + + return x, attn + + +class FairseqDropout(nn.Module): + def __init__(self, p, module_name=None): + super().__init__() + self.p = p + self.module_name = module_name + self.apply_during_inference = False + + def forward(self, x, inplace: bool = False): + if self.p > 0 and (self.training or self.apply_during_inference): + return F.dropout(x, p=self.p, training=True, inplace=inplace) + else: + return x + + def make_generation_fast_( + self, + name: str, + retain_dropout: bool = False, + retain_dropout_modules: Optional[List[str]] = None, + **kwargs + ): + if retain_dropout: + if retain_dropout_modules is not None and self.module_name is None: + logger.warning( + "Cannot enable dropout during inference for module {} " + "because module_name was not set".format(name) + ) + elif ( + retain_dropout_modules is None # if None, apply to all modules + or self.module_name in retain_dropout_modules + ): + logger.info( + "Enabling dropout during inference for module: {}".format(name) + ) + self.apply_during_inference = True + else: + logger.info("Disabling dropout for module: {}".format(name)) + + +class LearnedPositionalEmbedding(nn.Embedding): + """ + This module learns positional embeddings up to a fixed maximum size. + Padding ids are ignored by either offsetting based on padding_idx + or by setting padding_idx to None and ensuring that the appropriate + position ids are passed to the forward function. 
+ """ + + def __init__(self, num_embeddings: int, embedding_dim: int, padding_idx: int): + super().__init__(num_embeddings, embedding_dim, padding_idx) + self.onnx_trace = False + if self.padding_idx is not None: + self.max_positions = self.num_embeddings - self.padding_idx - 1 + else: + self.max_positions = self.num_embeddings + + def forward( + self, + input: Tensor, + incremental_state: Optional[Dict[str, Dict[str, Optional[Tensor]]]] = None, + positions: Optional[Tensor] = None, + ): + """Input is expected to be of size [bsz x seqlen].""" + assert (positions is None) or ( + self.padding_idx is None + ), "If positions is pre-computed then padding_idx should not be set." + + if positions is None: + if incremental_state is not None: + # positions is the same for every token when decoding a single step + # Without the int() cast, it doesn't work in some cases when exporting to ONNX + positions = torch.zeros( + (1, 1), device=input.device, dtype=input.dtype + ).fill_(int(self.padding_idx + input.size(1))) + else: + positions = utils_make_positions( + input, self.padding_idx, onnx_trace=self.onnx_trace + ) + positions = torch.clamp(positions, max=self.padding_idx + self.max_positions) + return F.embedding( + positions, + self.weight, + self.padding_idx, + self.max_norm, + self.norm_type, + self.scale_grad_by_freq, + self.sparse, + ) + + +class SinusoidalPositionalEmbedding(nn.Module): + """This module produces sinusoidal positional embeddings of any length. + + Padding symbols are ignored. + """ + + def __init__(self, embedding_dim, padding_idx, init_size=1024): + super().__init__() + self.embedding_dim = embedding_dim + self.padding_idx = padding_idx if padding_idx is not None else 0 + self.weights = SinusoidalPositionalEmbedding.get_embedding( + init_size, embedding_dim, padding_idx + ) + self.onnx_trace = False + self.register_buffer("_float_tensor", torch.FloatTensor(1)) + self.max_positions = int(1e5) + + def prepare_for_onnx_export_(self): + self.onnx_trace = True + + @staticmethod + def get_embedding( + num_embeddings: int, embedding_dim: int, padding_idx: Optional[int] = None + ): + """Build sinusoidal embeddings. + + This matches the implementation in tensor2tensor, but differs slightly + from the description in Section 3.5 of "Attention Is All You Need". 
+ """ + half_dim = embedding_dim // 2 + emb = math.log(10000) / (half_dim - 1) + emb = torch.exp(torch.arange(half_dim, dtype=torch.float) * -emb) + emb = torch.arange(num_embeddings, dtype=torch.float).unsqueeze( + 1 + ) * emb.unsqueeze(0) + emb = torch.cat([torch.sin(emb), torch.cos(emb)], dim=1).view( + num_embeddings, -1 + ) + if embedding_dim % 2 == 1: + # zero pad + emb = torch.cat([emb, torch.zeros(num_embeddings, 1)], dim=1) + if padding_idx is not None: + emb[padding_idx, :] = 0 + return emb + + def forward( + self, + input, + incremental_state: Optional[Any] = None, + timestep: Optional[Tensor] = None, + positions: Optional[Any] = None, + ): + """Input is expected to be of size [bsz x seqlen].""" + bspair = torch.onnx.operators.shape_as_tensor(input) + bsz, seq_len = bspair[0], bspair[1] + max_pos = self.padding_idx + 1 + seq_len + if self.weights is None or max_pos > self.weights.size(0): + # recompute/expand embeddings if needed + self.weights = SinusoidalPositionalEmbedding.get_embedding( + max_pos, self.embedding_dim, self.padding_idx + ) + self.weights = self.weights.to(self._float_tensor) + + if incremental_state is not None: + # positions is the same for every token when decoding a single step + pos = timestep.view(-1)[0] + 1 if timestep is not None else seq_len + if self.onnx_trace: + return ( + self.weights.index_select(index=self.padding_idx + pos, dim=0) + .unsqueeze(1) + .repeat(bsz, 1, 1) + ) + return self.weights[self.padding_idx + pos, :].expand(bsz, 1, -1) + + positions = utils_make_positions( + input, self.padding_idx, onnx_trace=self.onnx_trace + ) + if self.onnx_trace: + flat_embeddings = self.weights.detach().index_select(0, positions.view(-1)) + embedding_shape = torch.cat( + (bsz.view(1), seq_len.view(1), torch.tensor([-1], dtype=torch.long)) + ) + embeddings = torch.onnx.operators.reshape_from_tensor_shape( + flat_embeddings, embedding_shape + ) + return embeddings + return ( + self.weights.index_select(0, positions.view(-1)) + .view(bsz, seq_len, -1) + .detach() + ) + + +try: + from apex.normalization import FusedLayerNorm as _FusedLayerNorm + + has_fused_layernorm = True + + class FusedLayerNorm(_FusedLayerNorm): + @torch.jit.unused + def forward(self, x): + if not x.is_cuda: + return super().forward(x) + else: + with torch.cuda.device(x.device): + return super().forward(x) + +except ImportError: + has_fused_layernorm = False + + +class Fp32LayerNorm(nn.LayerNorm): + def __init__(self, *args, **kwargs): + super().__init__(*args, **kwargs) + + def forward(self, input): + output = F.layer_norm( + input.float(), + self.normalized_shape, + self.weight.float() if self.weight is not None else None, + self.bias.float() if self.bias is not None else None, + self.eps, + ) + return output.type_as(input) + + +class LayerDropModuleList(nn.ModuleList): + """ + A LayerDrop implementation based on :class:`torch.nn.ModuleList`. + + We refresh the choice of which layers to drop every time we iterate + over the LayerDropModuleList instance. During evaluation we always + iterate over all layers. 
+ + Usage:: + + layers = LayerDropList(p=0.5, modules=[layer1, layer2, layer3]) + for layer in layers: # this might iterate over layers 1 and 3 + x = layer(x) + for layer in layers: # this might iterate over all layers + x = layer(x) + for layer in layers: # this might not iterate over any layers + x = layer(x) + + Args: + p (float): probability of dropping out each layer + modules (iterable, optional): an iterable of modules to add + """ + + def __init__(self, p, modules=None): + super().__init__(modules) + self.p = p + + def __iter__(self): + dropout_probs = torch.empty(len(self)).uniform_() + for i, m in enumerate(super().__iter__()): + if not self.training or (dropout_probs[i] > self.p): + yield m + + +class RelativePositionalEncoding(torch.nn.Module): + def __init__(self, d_model, maxlen=1000, embed_v=False): + super(RelativePositionalEncoding, self).__init__() + + self.d_model = d_model + self.maxlen = maxlen + self.pe_k = torch.nn.Embedding(2*maxlen, d_model) + if embed_v: + self.pe_v = torch.nn.Embedding(2*maxlen, d_model) + self.embed_v = embed_v + + + def forward(self, pos_seq, incremental_state=None): + pos_seq[pos_seq < -self.maxlen] = -self.maxlen + pos_seq[pos_seq >= self.maxlen] = self.maxlen - 1 + pos_seq = pos_seq + self.maxlen + + if incremental_state is not None: + pos_seq = pos_seq[-1:] + + if self.embed_v: + return self.pe_k(pos_seq), self.pe_v(pos_seq) + else: + return self.pe_k(pos_seq), None + + +class MultiheadAttention(nn.Module): + """Multi-headed attention. + + See "Attention Is All You Need" for more details. + """ + + def __init__( + self, + embed_dim, + num_heads, + kdim=None, + vdim=None, + dropout=0.0, + bias=True, + add_bias_kv=False, + add_zero_attn=False, + self_attention=False, + encoder_decoder_attention=False, + q_noise=0.0, + qn_block_size=8, + has_relative_attention_bias=False, + scaling_for_att=1.0 + ): + super().__init__() + self.embed_dim = embed_dim + self.kdim = kdim if kdim is not None else embed_dim + self.vdim = vdim if vdim is not None else embed_dim + self.qkv_same_dim = self.kdim == embed_dim and self.vdim == embed_dim + + self.num_heads = num_heads + self.dropout_module = FairseqDropout( + dropout, module_name=self.__class__.__name__ + ) + + self.has_relative_attention_bias = has_relative_attention_bias + + self.head_dim = embed_dim // num_heads + assert ( + self.head_dim * num_heads == self.embed_dim + ), "embed_dim must be divisible by num_heads" + self.scaling = self.head_dim ** -0.5 + self.scaling_for_att = scaling_for_att + + self.self_attention = self_attention + self.encoder_decoder_attention = encoder_decoder_attention + + assert not self.self_attention or self.qkv_same_dim, ( + "Self-attention requires query, key and " "value to be of the same size" + ) + + self.k_proj = quant_noise( + nn.Linear(self.kdim, embed_dim, bias=bias), q_noise, qn_block_size + ) + self.v_proj = quant_noise( + nn.Linear(self.vdim, embed_dim, bias=bias), q_noise, qn_block_size + ) + self.q_proj = quant_noise( + nn.Linear(embed_dim, embed_dim, bias=bias), q_noise, qn_block_size + ) + + self.out_proj = quant_noise( + nn.Linear(embed_dim, embed_dim, bias=bias), q_noise, qn_block_size + ) + + if add_bias_kv: + self.bias_k = Parameter(torch.Tensor(1, 1, embed_dim)) + self.bias_v = Parameter(torch.Tensor(1, 1, embed_dim)) + else: + self.bias_k = self.bias_v = None + + self.add_zero_attn = add_zero_attn + + self.reset_parameters() + + self.onnx_trace = False + + def prepare_for_onnx_export_(self): + self.onnx_trace = True + + def reset_parameters(self): + if 
self.qkv_same_dim: + # Empirically observed the convergence to be much better with + # the scaled initialization + nn.init.xavier_uniform_(self.k_proj.weight, gain=1 / math.sqrt(2)) + nn.init.xavier_uniform_(self.v_proj.weight, gain=1 / math.sqrt(2)) + nn.init.xavier_uniform_(self.q_proj.weight, gain=1 / math.sqrt(2)) + else: + nn.init.xavier_uniform_(self.k_proj.weight) + nn.init.xavier_uniform_(self.v_proj.weight) + nn.init.xavier_uniform_(self.q_proj.weight) + + nn.init.xavier_uniform_(self.out_proj.weight) + if self.out_proj.bias is not None: + nn.init.constant_(self.out_proj.bias, 0.0) + if self.bias_k is not None: + nn.init.xavier_normal_(self.bias_k) + if self.bias_v is not None: + nn.init.xavier_normal_(self.bias_v) + + def forward( + self, + query, + key: Optional[Tensor], + value: Optional[Tensor], + key_padding_mask: Optional[Tensor] = None, + incremental_state: Optional[Dict[str, Dict[str, Optional[Tensor]]]] = None, + need_weights: bool = True, + static_kv: bool = False, + attn_mask: Optional[Tensor] = None, + before_softmax: bool = False, + need_head_weights: bool = False, + position_bias: Optional[Tensor] = None + ) -> Tuple[Tensor, Optional[Tensor]]: + """Input shape: Time x Batch x Channel + + Args: + key_padding_mask (ByteTensor, optional): mask to exclude + keys that are pads, of shape `(batch, src_len)`, where + padding elements are indicated by 1s. + need_weights (bool, optional): return the attention weights, + averaged over heads (default: False). + attn_mask (ByteTensor, optional): typically used to + implement causal attention, where the mask prevents the + attention from looking forward in time (default: None). + before_softmax (bool, optional): return the raw attention + weights and values before the attention softmax. + need_head_weights (bool, optional): return the attention + weights for each head. Implies *need_weights*. Default: + return the average attention weights over all heads. + """ + if need_head_weights: + need_weights = True + + is_tpu = query.device.type == "xla" + + tgt_len, bsz, embed_dim = query.size() + src_len = tgt_len + assert embed_dim == self.embed_dim, f"query dim {embed_dim} != {self.embed_dim}" + assert list(query.size()) == [tgt_len, bsz, embed_dim] + if key is not None: + src_len, key_bsz, _ = key.size() + if not torch.jit.is_scripting(): + assert key_bsz == bsz + assert value is not None + assert src_len, bsz == value.shape[:2] + + if ( + not self.onnx_trace + and not is_tpu # don't use PyTorch version on TPUs + and incremental_state is None + and not static_kv + # A workaround for quantization to work. Otherwise JIT compilation + # treats bias in linear module as method. 
+ and not torch.jit.is_scripting() + and not self.has_relative_attention_bias + ): + assert key is not None and value is not None + return F.multi_head_attention_forward( + query, + key, + value, + self.embed_dim, + self.num_heads, + torch.empty([0]), + torch.cat((self.q_proj.bias, self.k_proj.bias, self.v_proj.bias)), + self.bias_k, + self.bias_v, + self.add_zero_attn, + self.dropout_module.p, + self.out_proj.weight, + self.out_proj.bias, + self.training or self.dropout_module.apply_during_inference, + key_padding_mask, + need_weights, + attn_mask, + use_separate_proj_weight=True, + q_proj_weight=self.q_proj.weight, + k_proj_weight=self.k_proj.weight, + v_proj_weight=self.v_proj.weight, + ) + + if incremental_state is not None: + saved_state = self._get_input_buffer(incremental_state) + if saved_state is not None and "prev_key" in saved_state: + # previous time steps are cached - no need to recompute + # key and value if they are static + if static_kv: + assert self.encoder_decoder_attention and not self.self_attention + key = value = None + else: + saved_state = None + + if self.self_attention: + q = self.q_proj(query) + k = self.k_proj(query) + v = self.v_proj(query) + elif self.encoder_decoder_attention: + # encoder-decoder attention + q = self.q_proj(query) + if key is None: + assert value is None + k = v = None + else: + k = self.k_proj(key) + v = self.v_proj(key) + + else: + assert key is not None and value is not None + q = self.q_proj(query) + k = self.k_proj(key) + v = self.v_proj(value) + q *= self.scaling + q *= (1 / self.scaling_for_att) + + if self.bias_k is not None: + assert self.bias_v is not None + k = torch.cat([k, self.bias_k.repeat(1, bsz, 1)]) + v = torch.cat([v, self.bias_v.repeat(1, bsz, 1)]) + if attn_mask is not None: + attn_mask = torch.cat( + [attn_mask, attn_mask.new_zeros(attn_mask.size(0), 1)], dim=1 + ) + if key_padding_mask is not None: + key_padding_mask = torch.cat( + [ + key_padding_mask, + key_padding_mask.new_zeros(key_padding_mask.size(0), 1), + ], + dim=1, + ) + + q = ( + q.contiguous() + .view(tgt_len, bsz * self.num_heads, self.head_dim) + .transpose(0, 1) + ) + if k is not None: + k = ( + k.contiguous() + .view(-1, bsz * self.num_heads, self.head_dim) + .transpose(0, 1) + ) + if v is not None: + v = ( + v.contiguous() + .view(-1, bsz * self.num_heads, self.head_dim) + .transpose(0, 1) + ) + + if saved_state is not None: + # saved states are stored with shape (bsz, num_heads, seq_len, head_dim) + if "prev_key" in saved_state: + _prev_key = saved_state["prev_key"] + assert _prev_key is not None + prev_key = _prev_key.view(bsz * self.num_heads, -1, self.head_dim) + if static_kv: + k = prev_key + else: + assert k is not None + k = torch.cat([prev_key, k], dim=1) + src_len = k.size(1) + if "prev_value" in saved_state: + _prev_value = saved_state["prev_value"] + assert _prev_value is not None + prev_value = _prev_value.view(bsz * self.num_heads, -1, self.head_dim) + if static_kv: + v = prev_value + else: + assert v is not None + v = torch.cat([prev_value, v], dim=1) + prev_key_padding_mask: Optional[Tensor] = None + if "prev_key_padding_mask" in saved_state: + prev_key_padding_mask = saved_state["prev_key_padding_mask"] + assert k is not None and v is not None + key_padding_mask = MultiheadAttention._append_prev_key_padding_mask( + key_padding_mask=key_padding_mask, + prev_key_padding_mask=prev_key_padding_mask, + batch_size=bsz, + src_len=k.size(1), + static_kv=static_kv, + ) + + saved_state["prev_key"] = k.view(bsz, self.num_heads, -1, self.head_dim) 
+ saved_state["prev_value"] = v.view(bsz, self.num_heads, -1, self.head_dim) + saved_state["prev_key_padding_mask"] = key_padding_mask + # In this branch incremental_state is never None + assert incremental_state is not None + incremental_state = self._set_input_buffer(incremental_state, saved_state) + assert k is not None + assert k.size(1) == src_len + + # This is part of a workaround to get around fork/join parallelism + # not supporting Optional types. + if key_padding_mask is not None and key_padding_mask.dim() == 0: + key_padding_mask = None + + if key_padding_mask is not None: + assert key_padding_mask.size(0) == bsz + assert key_padding_mask.size(1) == src_len + + if self.add_zero_attn: + assert v is not None + src_len += 1 + k = torch.cat([k, k.new_zeros((k.size(0), 1) + k.size()[2:])], dim=1) + v = torch.cat([v, v.new_zeros((v.size(0), 1) + v.size()[2:])], dim=1) + if attn_mask is not None: + attn_mask = torch.cat( + [attn_mask, attn_mask.new_zeros(attn_mask.size(0), 1)], dim=1 + ) + if key_padding_mask is not None: + key_padding_mask = torch.cat( + [ + key_padding_mask, + torch.zeros(key_padding_mask.size(0), 1).type_as( + key_padding_mask + ), + ], + dim=1, + ) + + attn_weights = torch.bmm(q, k.transpose(1, 2)) + attn_weights = self.apply_sparse_mask(attn_weights, tgt_len, src_len, bsz) + + if position_bias is not None: ## first order + ## position_bias: [241, 241, 64] + #print ("attn_weights: ", attn_weights.size()) # [492, 241, 241] + reshape_q = q.contiguous().view(bsz * self.num_heads, -1, self.head_dim).transpose(0,1) #[241, 492, 64] + #print ("reshape_q: ", reshape_q.size()) + B = torch.matmul(reshape_q, position_bias.transpose(-2, -1)) + #print ("B: ", B.size()) ## [241, 492, 241] + #B = B.transpose(0, 1).view(bsz, self.num_heads, position_bias.size(0), position_bias.size(1)) + B = B.transpose(0, 1).view(bsz*self.num_heads, position_bias.size(0), position_bias.size(1)) + #print ("B 2: ", B.size()) + attn_weights += B + + attn_weights *= self.scaling_for_att + assert list(attn_weights.size()) == [bsz * self.num_heads, tgt_len, src_len] + + if attn_mask is not None: + attn_mask = attn_mask.unsqueeze(0) + if self.onnx_trace: + attn_mask = attn_mask.repeat(attn_weights.size(0), 1, 1) + attn_weights += attn_mask + + if key_padding_mask is not None: + # don't attend to padding symbols + attn_weights = attn_weights.view(bsz, self.num_heads, tgt_len, src_len) + if not is_tpu: + attn_weights = attn_weights.masked_fill( + key_padding_mask.unsqueeze(1).unsqueeze(2).to(torch.bool), + float("-inf"), + ) + else: + attn_weights = attn_weights.transpose(0, 2) + attn_weights = attn_weights.masked_fill(key_padding_mask, float("-inf")) + attn_weights = attn_weights.transpose(0, 2) + attn_weights = attn_weights.view(bsz * self.num_heads, tgt_len, src_len) + + if self.scaling_for_att > 1.0: + attn_weights = attn_weights - attn_weights.detach().max(dim=-1, keepdim=True)[0] + + if before_softmax: + return attn_weights, v + + attn_weights_float = softmax( + attn_weights, dim=-1, onnx_trace=self.onnx_trace + ) + attn_weights = attn_weights_float.type_as(attn_weights) + attn_probs = self.dropout_module(attn_weights) + + assert v is not None + attn = torch.bmm(attn_probs, v) + assert list(attn.size()) == [bsz * self.num_heads, tgt_len, self.head_dim] + if self.onnx_trace and attn.size(1) == 1: + # when ONNX tracing a single decoder step (sequence length == 1) + # the transpose is a no-op copy before view, thus unnecessary + attn = attn.contiguous().view(tgt_len, bsz, embed_dim) + else: + attn = 
attn.transpose(0, 1).contiguous().view(tgt_len, bsz, embed_dim) + attn = self.out_proj(attn) + attn_weights: Optional[Tensor] = None + if need_weights: + attn_weights = attn_weights_float.view( + bsz, self.num_heads, tgt_len, src_len + ).transpose(1, 0) + if not need_head_weights: + # average attention weights over heads + attn_weights = attn_weights.mean(dim=0) + + return attn, attn_weights + + @staticmethod + def _append_prev_key_padding_mask( + key_padding_mask: Optional[Tensor], + prev_key_padding_mask: Optional[Tensor], + batch_size: int, + src_len: int, + static_kv: bool, + ) -> Optional[Tensor]: + # saved key padding masks have shape (bsz, seq_len) + if prev_key_padding_mask is not None and static_kv: + new_key_padding_mask = prev_key_padding_mask + elif prev_key_padding_mask is not None and key_padding_mask is not None: + new_key_padding_mask = torch.cat( + [prev_key_padding_mask.float(), key_padding_mask.float()], dim=1 + ) + # During incremental decoding, as the padding token enters and + # leaves the frame, there will be a time when prev or current + # is None + elif prev_key_padding_mask is not None: + if src_len > prev_key_padding_mask.size(1): + filler = torch.zeros( + (batch_size, src_len - prev_key_padding_mask.size(1)), + device=prev_key_padding_mask.device, + ) + new_key_padding_mask = torch.cat( + [prev_key_padding_mask.float(), filler.float()], dim=1 + ) + else: + new_key_padding_mask = prev_key_padding_mask.float() + elif key_padding_mask is not None: + if src_len > key_padding_mask.size(1): + filler = torch.zeros( + (batch_size, src_len - key_padding_mask.size(1)), + device=key_padding_mask.device, + ) + new_key_padding_mask = torch.cat( + [filler.float(), key_padding_mask.float()], dim=1 + ) + else: + new_key_padding_mask = key_padding_mask.float() + else: + new_key_padding_mask = prev_key_padding_mask + return new_key_padding_mask + + @torch.jit.export + def reorder_incremental_state( + self, + incremental_state: Dict[str, Dict[str, Optional[Tensor]]], + new_order: Tensor, + ): + """Reorder buffered internal state (for incremental generation).""" + input_buffer = self._get_input_buffer(incremental_state) + if input_buffer is not None: + for k in input_buffer.keys(): + input_buffer_k = input_buffer[k] + if input_buffer_k is not None: + if self.encoder_decoder_attention and input_buffer_k.size( + 0 + ) == new_order.size(0): + break + input_buffer[k] = input_buffer_k.index_select(0, new_order) + incremental_state = self._set_input_buffer(incremental_state, input_buffer) + return incremental_state + + def _get_input_buffer( + self, incremental_state: Optional[Dict[str, Dict[str, Optional[Tensor]]]] + ) -> Dict[str, Optional[Tensor]]: + result = self.get_incremental_state(incremental_state, "attn_state") + if result is not None: + return result + else: + empty_result: Dict[str, Optional[Tensor]] = {} + return empty_result + + def _set_input_buffer( + self, + incremental_state: Dict[str, Dict[str, Optional[Tensor]]], + buffer: Dict[str, Optional[Tensor]], + ): + return self.set_incremental_state(incremental_state, "attn_state", buffer) + + def apply_sparse_mask(self, attn_weights, tgt_len: int, src_len: int, bsz: int): + return attn_weights + + def upgrade_state_dict_named(self, state_dict, name): + prefix = name + "." 
if name != "" else "" + items_to_add = {} + keys_to_remove = [] + for k in state_dict.keys(): + if k.endswith(prefix + "in_proj_weight"): + # in_proj_weight used to be q + k + v with same dimensions + dim = int(state_dict[k].shape[0] / 3) + items_to_add[prefix + "q_proj.weight"] = state_dict[k][:dim] + items_to_add[prefix + "k_proj.weight"] = state_dict[k][dim : 2 * dim] + items_to_add[prefix + "v_proj.weight"] = state_dict[k][2 * dim :] + + keys_to_remove.append(k) + + k_bias = prefix + "in_proj_bias" + if k_bias in state_dict.keys(): + dim = int(state_dict[k].shape[0] / 3) + items_to_add[prefix + "q_proj.bias"] = state_dict[k_bias][:dim] + items_to_add[prefix + "k_proj.bias"] = state_dict[k_bias][ + dim : 2 * dim + ] + items_to_add[prefix + "v_proj.bias"] = state_dict[k_bias][2 * dim :] + + keys_to_remove.append(prefix + "in_proj_bias") + + for k in keys_to_remove: + del state_dict[k] + + for key, value in items_to_add.items(): + state_dict[key] = value + + +class ConvFeatureExtractionModel(nn.Module): + def __init__( + self, + conv_layers: List[Tuple[int, int, int]], + dropout: float = 0.0, + mode: str = "default", + conv_bias: bool = False, + ): + super().__init__() + + assert mode in {"default", "layer_norm"} + + def block( + n_in, + n_out, + k, + stride, + is_layer_norm=False, + is_group_norm=False, + conv_bias=False, + ): + def make_conv(): + conv = nn.Conv1d(n_in, n_out, k, stride=stride, bias=conv_bias) + nn.init.kaiming_normal_(conv.weight) + return conv + + assert ( + is_layer_norm and is_group_norm + ) == False, "layer norm and group norm are exclusive" + + if is_layer_norm: + return nn.Sequential( + make_conv(), + nn.Dropout(p=dropout), + nn.Sequential( + TransposeLast(), + Fp32LayerNorm(dim, elementwise_affine=True), + TransposeLast(), + ), + nn.GELU(), + ) + elif is_group_norm: + return nn.Sequential( + make_conv(), + nn.Dropout(p=dropout), + Fp32GroupNorm(dim, dim, affine=True), + nn.GELU(), + ) + else: + return nn.Sequential(make_conv(), nn.Dropout(p=dropout), nn.GELU()) + + in_d = 1 + self.conv_layers = nn.ModuleList() + for i, cl in enumerate(conv_layers): + assert len(cl) == 3, "invalid conv definition: " + str(cl) + (dim, k, stride) = cl + + self.conv_layers.append( + block( + in_d, + dim, + k, + stride, + is_layer_norm=mode == "layer_norm", + is_group_norm=mode == "default" and i == 0, + conv_bias=conv_bias, + ) + ) + in_d = dim + + def forward(self, x): + + # BxT -> BxCxT + x = x.unsqueeze(1) + + for conv in self.conv_layers: + x = conv(x) + + return x + + +class TransposeLast(nn.Module): + def __init__(self, deconstruct_idx=None): + super().__init__() + self.deconstruct_idx = deconstruct_idx + + def forward(self, x): + if self.deconstruct_idx is not None: + x = x[self.deconstruct_idx] + return x.transpose(-2, -1) + + +class Fp32GroupNorm(nn.GroupNorm): + def __init__(self, *args, **kwargs): + super().__init__(*args, **kwargs) + + def forward(self, input): + output = F.group_norm( + input.float(), + self.num_groups, + self.weight.float() if self.weight is not None else None, + self.bias.float() if self.bias is not None else None, + self.eps, + ) + return output.type_as(input) + + +class GradMultiply(torch.autograd.Function): + @staticmethod + def forward(ctx, x, scale): + ctx.scale = scale + res = x.new(x) + return res + + @staticmethod + def backward(ctx, grad): + return grad * ctx.scale, None + + +class Rotate3D(nn.Module): + """ + (T, B, D) --> (B, D, T) --> (D, T, B) --> (T, B, D) + """ + def __init__(self): + super().__init__() + + def forward(self, x): + return 
x.permute(1, 2, 0) + + +class SamePad(nn.Module): + def __init__(self, kernel_size, causal=False): + super().__init__() + if causal: + self.remove = kernel_size - 1 + else: + self.remove = 1 if kernel_size % 2 == 0 else 0 + + def forward(self, x): + if self.remove > 0: + x = x[:, :, : -self.remove] + return x diff --git a/SpeechLM/speechlm/__init__.py b/SpeechLM/speechlm/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..97327d269e93a13cd135f6c1a187fd820a8decb8 --- /dev/null +++ b/SpeechLM/speechlm/__init__.py @@ -0,0 +1 @@ +from . import data, tasks, criterions, models diff --git a/SpeechLM/speechlm/config/decode/infer_fsqlm.yaml b/SpeechLM/speechlm/config/decode/infer_fsqlm.yaml new file mode 100644 index 0000000000000000000000000000000000000000..005524facca0996a9a299cbe9bdd21570bd65c2e --- /dev/null +++ b/SpeechLM/speechlm/config/decode/infer_fsqlm.yaml @@ -0,0 +1,44 @@ +# @package _group_ + +defaults: + - model: null + +hydra: + run: + dir: ${common_eval.results_path}/beam${decoding.beam}_th${decoding.beamthreshold}_lmw${decoding.lmweight}_wrd${decoding.wordscore}_sil${decoding.silweight} + sweep: + dir: ${common_eval.results_path} + subdir: beam${decoding.beam}_th${decoding.beamthreshold}_lmw${decoding.lmweight}_wrd${decoding.wordscore}_sil${decoding.silweight} + +task: + _name: joint_sc2t_pretraining + data: ??? + label_dir: ??? + labels: ["ltr"] + store_labels: true + single_target: true + fine_tuning: true + normalize: ??? # must be consistent with pre-training + add_decoder_target: false + pad_audio: false + random_crop: true + hubert_tokenizer: "none" + sp_path: None + +decoding: + type: fairseqlm + lexicon: ??? + lmpath: ??? + beamthreshold: 25 + beam: 500 + lmweight: 2 + wordscore: -1 + silweight: 0 + unique_wer_file: true +common_eval: + results_path: ??? + path: ??? + post_process: letter +dataset: + max_tokens: 1100000 + gen_subset: ??? diff --git a/SpeechLM/speechlm/config/decode/infer_kenlm.yaml b/SpeechLM/speechlm/config/decode/infer_kenlm.yaml new file mode 100644 index 0000000000000000000000000000000000000000..98a8b7567d77fbee0eb38c3554d332a127f830f4 --- /dev/null +++ b/SpeechLM/speechlm/config/decode/infer_kenlm.yaml @@ -0,0 +1,44 @@ +# @package _group_ + +defaults: + - model: null + +hydra: + run: + dir: ${common_eval.results_path}/beam${decoding.beam}_th${decoding.beamthreshold}_lmw${decoding.lmweight}_wrd${decoding.wordscore}_sil${decoding.silweight} + sweep: + dir: ${common_eval.results_path} + subdir: beam${decoding.beam}_th${decoding.beamthreshold}_lmw${decoding.lmweight}_wrd${decoding.wordscore}_sil${decoding.silweight} + +task: + _name: joint_sc2t_pretraining + data: ??? + label_dir: ??? + labels: ["ltr"] + store_labels: true + single_target: true + fine_tuning: true + normalize: ??? # must be consistent with pre-training + add_decoder_target: false + pad_audio: false + random_crop: true + hubert_tokenizer: "none" + sp_path: None + +decoding: + type: kenlm + lexicon: ??? + lmpath: ??? + beamthreshold: 100 + beam: 500 + lmweight: 2 + wordscore: -1 + silweight: 0 + unique_wer_file: true +common_eval: + results_path: ??? + path: ??? + post_process: letter +dataset: + max_tokens: 1100000 + gen_subset: ??? 
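The encoder stacks defined in the module file above (`TransformerEncoderBase`, the wav2vec-style `TransformerEncoder`, and `TransformerSentenceEncoderLayer`) share one relative-position mechanism: pairwise token offsets are clipped and embedded by `RelativePositionalEncoding`, layer-normed per layer via `norm_k`, and added to the attention logits inside `MultiheadAttention` as a first-order term `B = q @ pos_k^T`. Below is a minimal, self-contained sketch of that computation with toy sizes; it re-creates the embedding table directly instead of importing the class, and it omits the query scaling and padding masks applied in the real forward pass, so treat it as an illustration rather than the repository's code.

```python
import torch
import torch.nn as nn

T, H, d_head, maxlen = 6, 2, 8, 160         # toy sizes; the modules above use maxlen=160
pe_k = nn.Embedding(2 * maxlen, d_head)     # stands in for RelativePositionalEncoding.pe_k

# Pairwise offsets i - j, clipped to [-maxlen, maxlen) and shifted to valid indices,
# mirroring RelativePositionalEncoding.forward.
pos_seq = torch.arange(T)
pos_seq = pos_seq[:, None] - pos_seq[None, :]
pos_seq = pos_seq.clamp(-maxlen, maxlen - 1) + maxlen
pos_k = pe_k(pos_seq)                       # (T, T, d_head)

# Content-content logits plus the content-position term that
# MultiheadAttention.forward adds before the softmax.
q = torch.randn(T, H, d_head)               # per-head queries for one sequence
k = torch.randn(T, H, d_head)               # per-head keys
content = torch.einsum("thd,shd->hts", q, k)    # (H, T, T)
B = torch.einsum("thd,tsd->hts", q, pos_k)      # (H, T, T)
attn_logits = content + B                   # softmax over the last dim follows
```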
diff --git a/SpeechLM/speechlm/config/decode/infer_viterbi.yaml b/SpeechLM/speechlm/config/decode/infer_viterbi.yaml new file mode 100644 index 0000000000000000000000000000000000000000..969fc23df233d1785fb28bd340c15614877b0272 --- /dev/null +++ b/SpeechLM/speechlm/config/decode/infer_viterbi.yaml @@ -0,0 +1,37 @@ +# @package _group_ + +defaults: + - model: null + +hydra: + run: + dir: ${common_eval.results_path}/viterbi + sweep: + dir: ${common_eval.results_path} + subdir: viterbi + +task: + _name: joint_sc2t_pretraining + data: ??? + label_dir: ??? + labels: ["ltr"] + store_labels: true + single_target: true + fine_tuning: true + normalize: ??? # must be consistent with pre-training + add_decoder_target: false + pad_audio: false + random_crop: true + hubert_tokenizer: "none" + sp_path: None + +decoding: + type: viterbi + unique_wer_file: true +common_eval: + results_path: ??? + path: ??? + post_process: letter +dataset: + batch_size: 1 + gen_subset: ??? diff --git a/SpeechLM/speechlm/config/finetune/speechlm_base_100h.yaml b/SpeechLM/speechlm/config/finetune/speechlm_base_100h.yaml new file mode 100644 index 0000000000000000000000000000000000000000..c48b856da69f75205d8c0e47752cef173194558c --- /dev/null +++ b/SpeechLM/speechlm/config/finetune/speechlm_base_100h.yaml @@ -0,0 +1,99 @@ +# @package _group_ + +common: + fp16: true + log_format: json + log_interval: 200 + tensorboard_logdir: tblog + seed: 1337 + +checkpoint: + save_interval: 1 + keep_last_epochs: 1 + keep_best_checkpoints: -1 + best_checkpoint_metric: wer + restore_file: checkpoint_last.pt + +distributed_training: + ddp_backend: legacy_ddp + find_unused_parameters: true + distributed_world_size: 1 + distributed_port: -1 + nprocs_per_node: 8 + +task: + _name: joint_sc2t_pretraining + data: ??? + fine_tuning: true + label_dir: ??? + normalize: false # must be consistent with pre-training + labels: ["ltr"] + store_labels: true + single_target: true + add_decoder_target: false + pad_audio: false + random_crop: true + hubert_tokenizer: "none" + sp_path: None + +dataset: + num_workers: 0 + max_tokens: 1600000 + skip_invalid_size_inputs_valid_test: true + train_subset: train_100 + valid_subset: dev_other + required_batch_size_multiple: 1 + +criterion: + _name: ctc + zero_infinity: true + +optimization: + max_update: 30000 + lr: [0.00001] + sentence_avg: true + update_freq: [1] + +optimizer: + _name: adam + adam_betas: (0.9,0.98) + adam_eps: 1e-08 + weight_decay: 0.0 + +lr_scheduler: + _name: tri_stage + phase_ratio: [0.1, 0.4, 0.5] + final_lr_scale: 0.05 + +model: + _name: speechlm_ctc + w2v_path: ??? + apply_mask: true + mask_prob: 0.65 + mask_channel_prob: 0.5 + mask_channel_length: 64 + layerdrop: 0.1 + activation_dropout: 0.1 + feature_grad_mult: 0.0 + freeze_finetune_updates: 0 + +hydra: + job: + config: + override_dirname: + kv_sep: '-' + item_sep: '__' + exclude_keys: + - run + - task.data + - task.label_dir + - model.w2v_path + - dataset.train_subset + - dataset.valid_subset + - criterion.wer_kenlm_model + - criterion.wer_lexicon + run: + dir: ??? + sweep: + dir: ??? 
+ subdir: ${hydra.job.config_name}__${hydra.job.override_dirname} diff --git a/SpeechLM/speechlm/config/finetune/speechlm_large_960h.yaml b/SpeechLM/speechlm/config/finetune/speechlm_large_960h.yaml new file mode 100644 index 0000000000000000000000000000000000000000..dc86b6f284dc126dbc614639130e57898d05d2e1 --- /dev/null +++ b/SpeechLM/speechlm/config/finetune/speechlm_large_960h.yaml @@ -0,0 +1,98 @@ +# @package _group_ + +common: + fp16: true + log_format: json + log_interval: 200 + tensorboard_logdir: tblog + +checkpoint: + save_interval: 1 + keep_last_epochs: 5 + keep_best_checkpoints: 5 + best_checkpoint_metric: wer + restore_file: checkpoint_last.pt + +distributed_training: + ddp_backend: legacy_ddp + find_unused_parameters: true + distributed_world_size: 32 + distributed_port: -1 + nprocs_per_node: 8 + +task: + _name: joint_sc2t_pretraining + data: ??? + fine_tuning: true + label_dir: ??? + normalize: true # must be consistent with pre-training + labels: ["ltr"] + store_labels: true + single_target: true + add_decoder_target: false + pad_audio: false + random_crop: true + hubert_tokenizer: "none" + sp_path: None + +dataset: + num_workers: 0 + max_tokens: 900000 + skip_invalid_size_inputs_valid_test: true + train_subset: train_960 + valid_subset: dev_other + required_batch_size_multiple: 1 + +criterion: + _name: ctc + zero_infinity: true + +optimization: + max_update: 200000 + lr: [0.00001] + sentence_avg: true + update_freq: [1] + +optimizer: + _name: adam + adam_betas: (0.9,0.98) + adam_eps: 1e-08 + weight_decay: 0.0 + +lr_scheduler: + _name: tri_stage + phase_ratio: [0.1, 0.4, 0.5] + final_lr_scale: 0.05 + +model: + _name: speechlm_ctc + w2v_path: ??? + apply_mask: true + mask_prob: 0.5 + mask_channel_prob: 0.25 + mask_channel_length: 64 + layerdrop: 0.0 + activation_dropout: 0.1 + feature_grad_mult: 0.0 + freeze_finetune_updates: 0 + +hydra: + job: + config: + override_dirname: + kv_sep: '-' + item_sep: '__' + exclude_keys: + - run + - task.data + - task.label_dir + - model.w2v_path + - dataset.train_subset + - dataset.valid_subset + - criterion.wer_kenlm_model + - criterion.wer_lexicon + run: + dir: ??? + sweep: + dir: ??? + subdir: ${hydra.job.config_name}__${hydra.job.override_dirname} diff --git a/SpeechLM/speechlm/config/pretrain/speechlm_base_librispeech.yaml b/SpeechLM/speechlm/config/pretrain/speechlm_base_librispeech.yaml new file mode 100644 index 0000000000000000000000000000000000000000..454f5e2a8b6b67f8b7bf906e7e38f395b9ca9e96 --- /dev/null +++ b/SpeechLM/speechlm/config/pretrain/speechlm_base_librispeech.yaml @@ -0,0 +1,139 @@ +# @package _group_ + +common: + fp16: true + log_format: json + log_interval: 200 + seed: 1337 + tensorboard_logdir: tblog + +checkpoint: + save_dir: ??? + save_interval: 4 + keep_last_epochs: 4 + save_interval_updates: 50000 + keep_interval_updates: -1 + keep_interval_updates_pattern: 50000 + # no_epoch_checkpoints: true + +distributed_training: + ddp_backend: legacy_ddp + distributed_backend: 'nccl' + distributed_port: -1 + distributed_world_size: 32 + nprocs_per_node: 8 + find_unused_parameters: true + +task: + _name: joint_sc2t_pretraining + data: ??? + label_dir: ??? + labels: ??? + label_rate: ${model.label_rate} + store_labels: true + sample_rate: 16000 + max_sample_size: 250000 + min_sample_size: 32000 + pad_audio: false + random_crop: true + normalize: false # must be consistent with extractor + add_decoder_target: false + text_cfg: + seed: ${common.seed} + text_data: ??? 
+ data_config: config.yaml + sample_break_mode: eos + tokens_per_sample: 1024 + shorten_method: "random_crop" + text_maxtokens_ratio: 1.0 + +dataset: + num_workers: 6 + max_tokens: 1400000 + skip_invalid_size_inputs_valid_test: true + validate_interval: ${checkpoint.save_interval} + validate_interval_updates: ${checkpoint.save_interval_updates} + required_batch_size_multiple: 1 + +criterion: + _name: speechlm_criterion + pred_masked_weight: 1.0 + pred_nomask_weight: 0.0 + loss_weights: [10,] + text_ctc_weight: 0.1 + text_mum_weight: 0.0 + +optimization: + max_update: 400000 + lr: [0.0005] + clip_norm: 10.0 + +optimizer: + _name: adam + adam_betas: (0.9,0.98) + adam_eps: 1e-06 + weight_decay: 0.01 + +lr_scheduler: + _name: polynomial_decay + warmup_updates: 32000 + +model: + _name: speechlm + label_rate: ??? + skip_masked: false + skip_nomask: false + mask_prob: 0.80 + extractor_mode: default + conv_feature_layers: '[(512,10,5)] + [(512,3,2)] * 4 + [(512,2,2)] * 2' + final_dim: 256 + activation_fn: "gelu" + encoder_layers: 6 + encoder_attention_heads: 8 + encoder_layerdrop: 0.1 + dropout_input: 0.1 + dropout_features: 0.1 + dropout: 0.1 + attention_dropout: 0.1 + feature_grad_mult: 0.1 + untie_final_proj: true + activation_dropout: 0.0 + use_rel_pos_enc: true + add_unit_encoder: true + add_text_ctc: true + mask_u2t: true + compute_mum: false + mix_with_unit: true + text_transformer: + activation_fn: ${model.activation_fn} + dropout: ${model.dropout} + attention_dropout: ${model.attention_dropout} + activation_dropout: ${model.activation_dropout} + max_source_positions: 3000 + no_scale_embedding: true + layernorm_embedding: true + no_token_positional_embeddings: false + encoder: + embed_dim: 768 + ffn_embed_dim: 3072 + layers: 6 + attention_heads: 8 + normalize_before: false + learned_pos: true + layerdrop: ${model.encoder_layerdrop} + +hydra: + job: + config: + override_dirname: + kv_sep: '-' + item_sep: '__' + exclude_keys: + - run + - task.data + - task.label_dir + run: + dir: ??? + sweep: + dir: ??? + subdir: ${hydra.job.config_name}__${hydra.job.override_dirname} diff --git a/SpeechLM/speechlm/config/pretrain/speechlm_large_librilight.yaml b/SpeechLM/speechlm/config/pretrain/speechlm_large_librilight.yaml new file mode 100644 index 0000000000000000000000000000000000000000..74d593e3cf099f2116f541d87a644ea48f5642d6 --- /dev/null +++ b/SpeechLM/speechlm/config/pretrain/speechlm_large_librilight.yaml @@ -0,0 +1,144 @@ +# @package _group_ + +common: + fp16: true + log_format: json + log_interval: 200 + seed: 1234 + tensorboard_logdir: tblog + +checkpoint: + save_dir: ??? + save_interval: 1 + keep_last_epochs: 4 + save_interval_updates: 10000 + keep_interval_updates: 40 + keep_interval_updates_pattern: 10000 + # no_epoch_checkpoints: true + +distributed_training: + ddp_backend: legacy_ddp + distributed_backend: 'nccl' + distributed_port: -1 + distributed_world_size: 32 + nprocs_per_node: 8 + find_unused_parameters: true + +task: + _name: joint_sc2t_pretraining + data: ??? + label_dir: ??? + labels: ??? + label_rate: ${model.label_rate} + store_labels: true + sample_rate: 16000 + max_sample_size: 250000 + min_sample_size: 32000 + pad_audio: false + random_crop: true + normalize: true # must be consistent with extractor + add_decoder_target: false + text_cfg: + seed: ${common.seed} + text_data: ??? 
+ data_config: config.yaml + sample_break_mode: eos + tokens_per_sample: 1024 + shorten_method: "random_crop" + text_maxtokens_ratio: 1.0 + +dataset: + num_workers: 1 + max_tokens: 900000 + skip_invalid_size_inputs_valid_test: true + validate_interval: ${checkpoint.save_interval} + validate_interval_updates: ${checkpoint.save_interval_updates} + required_batch_size_multiple: 2 + +criterion: + _name: speechlm_criterion + pred_masked_weight: 1.0 + pred_nomask_weight: 0.0 + loss_weights: [10,] + text_ctc_weight: 0.1 + text_mum_weight: 0.0 + +optimization: + max_update: 400000 + lr: [0.001] + clip_norm: 1.0 + +optimizer: + _name: adam + adam_betas: (0.9,0.98) + adam_eps: 1e-06 + weight_decay: 0.01 + +lr_scheduler: + _name: polynomial_decay + warmup_updates: 32000 + +model: + _name: speechlm + label_rate: ??? + activation_fn: "gelu" + encoder_layers: 12 + encoder_embed_dim: 1024 + encoder_ffn_embed_dim: 4096 + encoder_attention_heads: 16 + final_dim: 256 + skip_masked: false + skip_nomask: false + mask_prob: 0.80 + extractor_mode: layer_norm + conv_feature_layers: '[(512,10,5)] + [(512,3,2)] * 4 + [(512,2,2)] * 2' + encoder_layerdrop: 0.0 + dropout_input: 0.0 + dropout_features: 0.0 + dropout: 0.0 + attention_dropout: 0.0 + layer_norm_first: true + feature_grad_mult: 1.0 + untie_final_proj: true + activation_dropout: 0.0 + use_rel_pos_enc: true + add_unit_encoder: true + add_text_ctc: true + mask_u2t: true + compute_mum: false + mix_with_unit: true + scaling_for_att: 32 + text_transformer: + activation_fn: ${model.activation_fn} + dropout: ${model.dropout} + attention_dropout: ${model.attention_dropout} + activation_dropout: ${model.activation_dropout} + max_source_positions: 3000 + no_scale_embedding: true + layernorm_embedding: true + no_token_positional_embeddings: false + encoder: + embed_dim: 1024 + ffn_embed_dim: 4096 + layers: 12 + attention_heads: 16 + normalize_before: ${model.layer_norm_first} + learned_pos: true + layerdrop: ${model.encoder_layerdrop} + +hydra: + job: + config: + override_dirname: + kv_sep: '-' + item_sep: '__' + exclude_keys: + - run + - task.data + - task.label_dir + run: + dir: ??? + sweep: + dir: ??? + subdir: ${hydra.job.config_name}__${hydra.job.override_dirname} + diff --git a/SpeechLM/speechlm/config/pretrain/speechlmp_base_cfg.pt b/SpeechLM/speechlm/config/pretrain/speechlmp_base_cfg.pt new file mode 100644 index 0000000000000000000000000000000000000000..5ba9faf27b9bbdcadde4452ff6c2f9d020a36fc4 Binary files /dev/null and b/SpeechLM/speechlm/config/pretrain/speechlmp_base_cfg.pt differ diff --git a/SpeechLM/speechlm/criterions/__init__.py b/SpeechLM/speechlm/criterions/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..285826b653331f6c1a326de4ed234605e35876df --- /dev/null +++ b/SpeechLM/speechlm/criterions/__init__.py @@ -0,0 +1,9 @@ +import importlib +import os + +for file in os.listdir(os.path.dirname(__file__)): + if file.endswith(".py") and not file.startswith("_"): + criterion_name = file[: file.find(".py")] + importlib.import_module( + "speechlm.criterions." 
+ criterion_name + ) diff --git a/SpeechLM/speechlm/criterions/fasttext2unit_loss.py b/SpeechLM/speechlm/criterions/fasttext2unit_loss.py new file mode 100644 index 0000000000000000000000000000000000000000..2cceed3f102d5ab5ff355db9960c74dbe31a8d93 --- /dev/null +++ b/SpeechLM/speechlm/criterions/fasttext2unit_loss.py @@ -0,0 +1,181 @@ +# ---------------------------------------------------------------------------- +# SpeechLM: Enhanced Speech Pre-Training with Unpaired Textual Data (https://arxiv.org/abs/2209.15329) +# Github source: https://github.com/microsoft/SpeechT5/tree/main/SpeechLM +# Code based on fairseq: https://github.com/facebookresearch/fairseq/tree/272c4c5197250997148fb12c0db6306035f166a4 +# +# Copyright (c) 2022 Microsoft +# Licensed under The MIT License [see LICENSE for details] +# ---------------------------------------------------------------------------- + +from typing import List, Dict, Any +from dataclasses import dataclass, field + +import torch +import torch.nn.functional as F + +from fairseq import metrics, utils +from fairseq.criterions import FairseqCriterion, register_criterion +from fairseq.dataclass import FairseqDataclass +from fairseq.data.data_utils import lengths_to_mask +from fairseq.models.fairseq_model import FairseqEncoderModel + +def label_smoothed_nll_loss(lprobs, target, epsilon, ignore_index=None, reduce=True): + if target.dim() == lprobs.dim() - 1: + target = target.unsqueeze(-1) + nll_loss = -lprobs.gather(dim=-1, index=target) + smooth_loss = -lprobs.sum(dim=-1, keepdim=True) + if ignore_index is not None: + pad_mask = target.eq(ignore_index) + nll_loss.masked_fill_(pad_mask, 0.0) + smooth_loss.masked_fill_(pad_mask, 0.0) + else: + nll_loss = nll_loss.squeeze(-1) + smooth_loss = smooth_loss.squeeze(-1) + if reduce: + ntokens = (~pad_mask).sum() + nll_loss = nll_loss.sum() / ntokens + smooth_loss = smooth_loss.sum() / ntokens + eps_i = epsilon / (lprobs.size(-1) - 1) + loss = (1.0 - epsilon - eps_i) * nll_loss + eps_i * smooth_loss + return loss, nll_loss + +@dataclass +class FastText2UnitCriterionConfig(FairseqDataclass): + label_smoothing: float = field( + default=0.0, + metadata={"help": "epsilon for label smoothing, 0 means no label smoothing"}, + ) + dur_loss_weight: float = field( + default=1.0, + metadata={"help": "scale of duration loss"}, + ) + report_accuracy: bool = field( + default=True, + metadata={"help": "report decoder accuracy metric"}, + ) + +@register_criterion("fasttext2unit_criterion", dataclass=FastText2UnitCriterionConfig) +class FastText2UnitLoss(FairseqCriterion): + def __init__(self, + task, + label_smoothing=0, + dur_loss_weight=1.0, + report_accuracy=False, + ): + super().__init__(task) + self.eps = label_smoothing + self.dur_loss_weight = dur_loss_weight + self.pad_idx = task.tgt_dict.pad() + self.report_accuracy = report_accuracy + + def forward(self, model: FairseqEncoderModel, sample, reduction="mean"): + src_tokens = sample["net_input"]["src_tokens"] + src_lens = sample["net_input"]["src_lengths"] + tgt_lens = sample["target_lengths"] + + _feat_out, _feat_out_post, out_lens, log_dur_out, pitch_out, energy_out = model( + src_tokens=src_tokens, + src_lengths=src_lens, + prev_output_tokens=sample["net_input"]["prev_output_tokens"], + incremental_state=None, + target_lengths=tgt_lens, + speaker=sample["speaker"], + durations=sample["durations"], + pitches=sample["pitches"], + energies=sample["energies"], + ) + + src_mask = lengths_to_mask(sample["net_input"]["src_lengths"]) + tgt_mask = 
lengths_to_mask(sample["target_lengths"]) + + lprobs = model.get_normalized_probs((_feat_out,), log_probs=True) + target = sample["target"].long() + ce_loss, nll_loss = label_smoothed_nll_loss(lprobs, target, self.eps, self.padding_idx, reduce=True) + + pitches, energies = sample["pitches"], sample["energies"] + if pitches is not None: + pitch_out, pitches = pitch_out[src_mask], pitches[src_mask] + pitch_loss = F.mse_loss(pitch_out, pitches, reduction=reduction) + else: + pitch_loss = 0 + if energies is not None: + energy_out, energies = energy_out[src_mask], energies[src_mask] + energy_loss = F.mse_loss(energy_out, energies, reduction=reduction) + else: + energy_loss = 0 + + log_dur_out = log_dur_out[src_mask] + dur = sample["durations"].float() + dur = dur.half() if log_dur_out.type().endswith(".HalfTensor") else dur + log_dur = torch.log(dur + 1)[src_mask] + dur_loss = F.mse_loss(log_dur_out, log_dur, reduction=reduction) + dur_loss = self.dur_loss_weight * dur_loss + + loss = ce_loss + dur_loss + pitch_loss + energy_loss + + sample_size = sample["nsentences"] + logging_output = { + "loss": utils.item(loss.data), + "ntokens": sample["ntokens"], + "nsentences": sample["nsentences"], + "sample_size": sample_size, + "ce_loss": utils.item(ce_loss.data), + "dur_loss": utils.item(dur_loss.data), + "pitch_loss": utils.item(pitch_loss), + "energy_loss": utils.item(energy_loss), + } + if self.report_accuracy: + n_correct = lprobs.argmax(-1).masked_select(tgt_mask).eq(target.masked_select(tgt_mask)).sum() + logging_output["n_correct"] = utils.item(n_correct.data) + logging_output["total"] = tgt_mask.sum() + return loss, 1, logging_output + + @classmethod + def reduce_metrics(cls, logging_outputs: List[Dict[str, Any]]) -> None: + ns = [log.get("sample_size", 0) for log in logging_outputs] + ntot = sum(ns) + ws = [n / (ntot + 1e-8) for n in ns] + for key in [ + "loss", + "ce_loss", + "dur_loss", + "pitch_loss", + "energy_loss", + ]: + vals = [log.get(key, 0) for log in logging_outputs] + val = sum(val * w for val, w in zip(vals, ws)) + metrics.log_scalar(key, val, ntot, round=3) + metrics.log_scalar("sample_size", ntot, len(logging_outputs)) + + total = utils.item(sum(log.get("total", 0) for log in logging_outputs)) + if total > 0: + metrics.log_scalar("total", total) + n_correct = utils.item( + sum(log.get("n_correct", 0) for log in logging_outputs) + ) + metrics.log_scalar("n_correct", n_correct) + metrics.log_derived( + "accuracy", + lambda meters: round( + meters["n_correct"].sum * 100.0 / meters["total"].sum, 3 + ) + if meters["total"].sum > 0 + else float("nan"), + ) + + # inference metrics + if "targ_frames" not in logging_outputs[0]: + return + n = sum(log.get("targ_frames", 0) for log in logging_outputs) + for key, new_key in [ + ("mcd_loss", "mcd_loss"), + ("pred_frames", "pred_ratio"), + ("nins", "ins_rate"), + ("ndel", "del_rate"), + ]: + val = sum(log.get(key, 0) for log in logging_outputs) + metrics.log_scalar(new_key, val / n, n, round=3) + + @staticmethod + def logging_outputs_can_be_summed() -> bool: + return False diff --git a/SpeechLM/speechlm/criterions/speechlm_criterion.py b/SpeechLM/speechlm/criterions/speechlm_criterion.py new file mode 100644 index 0000000000000000000000000000000000000000..ca13298d7c30b31c2d083137090679e36a7ffd39 --- /dev/null +++ b/SpeechLM/speechlm/criterions/speechlm_criterion.py @@ -0,0 +1,352 @@ +# ---------------------------------------------------------------------------- +# SpeechLM: Enhanced Speech Pre-Training with Unpaired Textual Data 
(https://arxiv.org/abs/2209.15329) +# Github source: https://github.com/microsoft/SpeechT5/tree/main/SpeechLM +# Code based on fairseq: https://github.com/facebookresearch/fairseq/tree/272c4c5197250997148fb12c0db6306035f166a4 +# +# Copyright (c) 2022 Microsoft +# Licensed under The MIT License [see LICENSE for details] +# ---------------------------------------------------------------------------- + +import logging +import math +import re +from dataclasses import dataclass, field +from typing import List, Optional + +import numpy as np +import torch +import torch.nn.functional as F +from fairseq import metrics, utils +from fairseq.criterions import FairseqCriterion, register_criterion +from fairseq.criterions.label_smoothed_cross_entropy import label_smoothed_nll_loss +from fairseq.dataclass import FairseqDataclass + +logger = logging.getLogger(__name__) + +@dataclass +class HSTCriterionConfig(FairseqDataclass): + pred_masked_weight: float = field( + default=1.0, + metadata={"help": "weight for predictive loss for masked frames"}, + ) + pred_nomask_weight: float = field( + default=0.0, + metadata={"help": "weight for predictive loss for unmasked frames"}, + ) + loss_weights: Optional[List[float]] = field( + default=None, + metadata={"help": "weights for additional loss terms (not first one)"}, + ) + log_keys: List[str] = field( + default_factory=lambda: [], + metadata={"help": "output keys to log"}, + ) + text_ctc_weight: float = field( + default=0.1, + metadata={"help": "weights for text CTC Loss, loss will be (hubert_loss + dec_weight * CE_Loss + text_weight * (CE_Loss + CTC_loss))"}, + ) + text_mum_weight: float = field( + default=0.0, + metadata={"help": "masked unit modeling weight from the text end"}, + ) + report_accuracy: bool = field( + default=True, + metadata={"help": "report decoder accuracy metric"}, + ) + ignore_prefix_size: int = field( + default=0, + metadata={"help": "Ignore first N tokens"}, + ) + no_ctc_blank: bool = field( + default=False, + metadata={"help": "mask out the blank of ctc, only when dec_loss_type=ctc"}, + ) + +@register_criterion("speechlm_criterion", dataclass=HSTCriterionConfig) +class SpeechLMCriterion(FairseqCriterion): + def __init__( + self, + task, + pred_masked_weight, + pred_nomask_weight, + loss_weights=None, + log_keys=None, + text_ctc_weight=0.1, + text_mum_weight=0, + report_accuracy=False, + ignore_prefix_size=0, + no_ctc_blank=False, + ): + super().__init__(task) + self.pred_masked_weight = pred_masked_weight + self.pred_nomask_weight = pred_nomask_weight + self.loss_weights = loss_weights + self.log_keys = [] if log_keys is None else log_keys + self.text_ctc_weight = text_ctc_weight + self.text_mum_weight = text_mum_weight + self.report_accuracy = report_accuracy + self.ignore_prefix_size = ignore_prefix_size + self.no_ctc_blank = no_ctc_blank + self.padding_idx = task.dictionaries[0].pad() + self.eos_idx = task.dictionaries[0].eos() + self.blank_idx = task.dictionaries[0].bos() + + def compute_hubert_loss(self, model, net_output, reduction, suffix=''): + loss = 0 + sample_size = [] + logging_output = {} + loss_m_list = [] + logp_m_list = model.get_logits(net_output, True) + targ_m_list = model.get_targets(net_output, True) + assert self.pred_masked_weight == 0 or len(logp_m_list) > 0 + for i, (logp_m, targ_m) in enumerate(zip(logp_m_list, targ_m_list)): + loss_m = F.cross_entropy(logp_m, targ_m, reduction=reduction) + loss_m_list.append(loss_m) + logging_output[f"loss_m_{i}{suffix}"] = loss_m.detach().item() + if self.pred_masked_weight > 
0: + loss += self.pred_masked_weight * sum(loss_m_list) + sample_size.append(targ_m_list[0].numel()) + + loss_u_list = [] + logp_u_list = model.get_logits(net_output, False) + targ_u_list = model.get_targets(net_output, False) + assert self.pred_nomask_weight == 0 or len(logp_u_list) > 0 + for i, (logp_u, targ_u) in enumerate(zip(logp_u_list, targ_u_list)): + loss_u = F.cross_entropy(logp_u, targ_u, reduction=reduction) + loss_u_list.append(loss_u) + logging_output[f"loss_u_{i}{suffix}"] = loss_u.detach().item() + if self.pred_nomask_weight > 0: + loss += self.pred_nomask_weight * sum(loss_u_list) + sample_size.append(targ_u_list[0].numel()) + + sample_size = np.mean(sample_size) + + def compute_correct(logits, targets): + if logits.numel() == 0: + return 0, 0 + else: + assert logits.dim() > 1, logits.shape + max = logits.argmax(-1) == targets + min = logits.argmin(-1) == targets + both = max & min + corr = max.long().sum().item() - both.long().sum().item() + count = max.numel() + return corr, count + + with torch.no_grad(): + for i, (logp_m, targ_m) in enumerate(zip(logp_m_list, targ_m_list)): + corr_m, count_m = compute_correct(logp_m, targ_m) + logging_output[f"correct_m_{i}{suffix}"] = corr_m + logging_output[f"count_m_{i}{suffix}"] = count_m + + for i, (logp_u, targ_u) in enumerate(zip(logp_u_list, targ_u_list)): + corr_u, count_u = compute_correct(logp_u, targ_u) + logging_output[f"correct_u_{i}{suffix}"] = corr_u + logging_output[f"count_u_{i}{suffix}"] = count_u + + return loss, sample_size, logging_output + + + def forward(self, model, sample, reduce=True, log_pred=False): + """Compute the loss for the given sample. + Returns a tuple with three elements: + 1) the loss + 2) the sample size, which is used as the denominator for the gradient + 3) logging outputs to display while training + """ + reduction = "sum" if reduce else "none" + + if "net_input" in sample: + text_sample = None + else: + text_sample = sample.get("text_paired") + sample = sample.get("speech") + + ### 1. L_UMLM: do hubert forward and loss computation + sample["modality"] = "speech" + net_output = model(target_list=sample["target_list"], **sample["net_input"]) + loss, sample_size, logging_output = self.compute_hubert_loss( + model, + net_output, + reduction, + ) + if self.loss_weights is not None: + assert hasattr(model, "get_extra_losses") + extra_losses, names = model.get_extra_losses(net_output) + if torch.is_tensor(extra_losses): + extra_losses = [extra_losses] + names = [names] + if len(self.loss_weights) == 1 and len(extra_losses) != 1: + self.loss_weights = [self.loss_weights[0]] * len(extra_losses) + assert len(extra_losses) == len( + self.loss_weights + ), f"{len(extra_losses)}, {len(self.loss_weights)}" + for p, n, coef in zip(extra_losses, names, self.loss_weights): + if coef != 0 and p is not None: + p = coef * p.float() * sample_size + loss += p + logging_output[f"loss_{n}"] = p.item() + for lk in self.log_keys: + if lk in net_output: + logging_output[lk] = float((net_output[lk])) + + ### 2. 
do text forward and loss computation + if text_sample is not None: + text_sample["modality"] = "text" + ## 2.1 re-loading "target_list", in default case, target_list = [src_tokens], + ## while in case of using "unit-phone-char" structure, target_list will be [ref_tokens] + text_sample["net_input"]["target_list"] = [ + text_sample.get("ref_tokens", text_sample["net_input"]["src_tokens"].clone()), + ] + text_net_output = model(**text_sample["net_input"]) + + ### 2.2 L_UMLM (text-end, not applied by default) + if self.text_mum_weight > 0: + loss_u2t, sample_size_u2t, logging_output_u2t = self.compute_hubert_loss( + model, + text_net_output, + reduction, + suffix="_u2t", + ) + loss += self.text_mum_weight * loss_u2t * sample_size / sample_size_u2t + logging_output.update(logging_output_u2t) + + ### 2.3 L_UCTC + text_sample_size = text_sample["ntokens"] + if self.text_ctc_weight > 0: + text_ctc_loss = self.compute_ctc_loss(model, text_net_output, text_sample["target"], reduction=reduction) + loss += self.text_ctc_weight * text_ctc_loss * sample_size / text_sample_size + logging_output["text_ctc_loss"] = utils.item(text_ctc_loss) + logging_output["text_sample_size"] = text_sample_size + + logging_output = { + "loss": utils.item(loss) if reduce else loss, + "ntokens": sample_size, + "nsentences": sample["id"].numel() + (text_sample["id"].numel() if text_sample is not None else 0), + "sample_size": sample_size, + **logging_output, + } + + return loss, sample_size, logging_output + + def compute_ctc_loss(self, model, net_output, target, reduction): + logits = net_output["encoder_out_ctc"][0] # (T, B, C) from the code-encoder + if self.no_ctc_blank: + ## set prob of to -inf + logits = logits.float() + logits[:, :, self.blank_idx] = -1000000.0 + + lprobs = F.log_softmax(logits.float(), dim=-1) + + encoder_padding_mask = net_output["encoder_padding_mask"][0] + non_padding_mask = ~encoder_padding_mask + input_lengths = non_padding_mask.long().sum(-1) + pad_mask = (target != self.padding_idx) & (target != self.eos_idx) + targets_flat = target.masked_select(pad_mask) + target_lengths = pad_mask.sum(-1) + + with torch.backends.cudnn.flags(enabled=False): + loss = F.ctc_loss( + lprobs, + targets_flat, + input_lengths, + target_lengths, + blank=self.blank_idx, + reduction=reduction, + zero_infinity=True, + ) + return loss + + def compute_ce_loss(self, model, net_output, sample, reduce=True): + lprobs, target = self.get_lprobs_and_target(model, net_output, sample) + loss, nll_loss = label_smoothed_nll_loss( + lprobs, + target, + self.eps, + ignore_index=self.padding_idx, + reduce=reduce, + ) + return loss, nll_loss + + def compute_accuracy(self, model, net_output, sample): + lprobs, target = self.get_lprobs_and_target(model, net_output, sample) + mask = target.ne(self.padding_idx) + n_correct = torch.sum( + lprobs.argmax(1).masked_select(mask).eq(target.masked_select(mask)) + ) + total = torch.sum(mask) + return n_correct, total + + def get_lprobs_and_target(self, model, net_output, sample): + lprobs = model.get_normalized_probs(net_output, log_probs=True) + if sample["modality"] == "speech": + target = sample["decoder_target"] + if self.ignore_prefix_size > 0: + if getattr(lprobs, "batch_first", False): + lprobs = lprobs[:, self.ignore_prefix_size :, :].contiguous() + target = target[:, self.ignore_prefix_size :].contiguous() + else: + lprobs = lprobs[self.ignore_prefix_size :, :, :].contiguous() + target = target[self.ignore_prefix_size :, :].contiguous() + else: + target = sample["target"] + + return 
lprobs.view(-1, lprobs.size(-1)), target.view(-1) + + @staticmethod + def reduce_metrics(logging_outputs) -> None: + """Aggregate logging outputs from data parallel training (copied from normal cross entropy).""" + loss_sum = sum(log.get("loss", 0) for log in logging_outputs) + ntokens = sum(log.get("ntokens", 0) for log in logging_outputs) + sample_size = sum(log.get("sample_size", 0) for log in logging_outputs) + + metrics.log_scalar( + "loss", loss_sum / sample_size / math.log(2), sample_size, round=3 + ) + if sample_size != ntokens: + metrics.log_scalar( + "nll_loss", loss_sum / ntokens / math.log(2), ntokens, round=3 + ) + metrics.log_derived( + "ppl", lambda meters: utils.get_perplexity(meters["nll_loss"].avg) + ) + else: + metrics.log_derived( + "ppl", lambda meters: utils.get_perplexity(meters["loss"].avg) + ) + + counts = {} + for lk in logging_outputs[0].keys(): + if lk.startswith("count_"): + val = sum(log.get(lk, 0) for log in logging_outputs) + metrics.log_scalar(lk, val) + counts[lk] = val + + for lk in logging_outputs[0].keys(): + if lk.startswith("loss_"): + val = sum(log.get(lk, 0) for log in logging_outputs) + metrics.log_scalar(lk, val / sample_size / math.log(2), round=3) + elif lk.startswith("correct_"): + val = sum(log.get(lk, 0) for log in logging_outputs) + metrics.log_scalar(lk, val / counts[re.sub("correct", "count", lk)]) + + if "text_sample_size" in logging_outputs[0]: + text_sample_size = sum(log.get("text_sample_size", 0) for log in logging_outputs) + for lk in logging_outputs[0].keys(): + if lk.startswith("text_") and lk.endswith("_loss"): + val = sum(log.get(lk, 0) for log in logging_outputs) + metrics.log_scalar(lk, val / text_sample_size / math.log(2), round=3) + + @staticmethod + def aggregate_logging_outputs(logging_outputs): + """Aggregate logging outputs from data parallel training.""" + raise NotImplementedError() + + @staticmethod + def logging_outputs_can_be_summed() -> bool: + """ + Whether the logging outputs returned by `forward` can be summed + across workers prior to calling `reduce_metrics`. Setting this + to True will improves distributed training speed. 
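+        For this criterion it returns ``False``: the per-key logging outputs
+        (masked/unmasked correct counts, extra losses and the optional text losses)
+        are gathered across workers and reduced in ``reduce_metrics`` instead of
+        being summed element-wise.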
+ """ + return False diff --git a/SpeechLM/speechlm/data/concat_dataset.py b/SpeechLM/speechlm/data/concat_dataset.py new file mode 100644 index 0000000000000000000000000000000000000000..9c3c22bc806089a0dd92fd2eff44a78528c975ed --- /dev/null +++ b/SpeechLM/speechlm/data/concat_dataset.py @@ -0,0 +1,131 @@ +# ---------------------------------------------------------------------------- +# SpeechLM: Enhanced Speech Pre-Training with Unpaired Textual Data (https://arxiv.org/abs/2209.15329) +# Github source: https://github.com/microsoft/SpeechT5/tree/main/SpeechLM +# Code based on fairseq: https://github.com/facebookresearch/fairseq/tree/272c4c5197250997148fb12c0db6306035f166a4 +# +# Copyright (c) 2022 Microsoft +# Licensed under The MIT License [see LICENSE for details] +# ---------------------------------------------------------------------------- + +import bisect + +import numpy as np +from torch.utils.data.dataloader import default_collate + +from fairseq.data import FairseqDataset + + +class ConcatDataset(FairseqDataset): + @staticmethod + def cumsum(sequence, sample_ratios): + r, s = [], 0 + for e, ratio in zip(sequence, sample_ratios): + curr_len = int(ratio * len(e)) + r.append(curr_len + s) + s += curr_len + return r + + def __init__(self, datasets, sample_ratios=1): + super(ConcatDataset, self).__init__() + assert len(datasets) > 0, "datasets should not be an empty iterable" + self.datasets = list(datasets) + if isinstance(sample_ratios, int): + sample_ratios = [sample_ratios] * len(self.datasets) + self.sample_ratios = sample_ratios + self.cumulative_sizes = self.cumsum(self.datasets, sample_ratios) + self.real_sizes = [len(d) for d in self.datasets] + + def __len__(self): + return self.cumulative_sizes[-1] + + def __getitem__(self, idx): + dataset_idx, sample_idx = self._get_dataset_and_sample_index(idx) + return self.datasets[dataset_idx][sample_idx] + + def _get_dataset_and_sample_index(self, idx: int): + dataset_idx = bisect.bisect_right(self.cumulative_sizes, idx) + if dataset_idx == 0: + sample_idx = idx + else: + sample_idx = idx - self.cumulative_sizes[dataset_idx - 1] + sample_idx = sample_idx % self.real_sizes[dataset_idx] + return dataset_idx, sample_idx + + def collater(self, samples, **extra_args): + # For now only supports datasets with same underlying collater implementations + if hasattr(self.datasets[0], "collater"): + return self.datasets[0].collater(samples, **extra_args) + else: + return default_collate(samples, **extra_args) + + def size(self, idx: int): + """ + Return an example's size as a float or tuple. + """ + dataset_idx, sample_idx = self._get_dataset_and_sample_index(idx) + return self.datasets[dataset_idx].size(sample_idx) + + def num_tokens(self, index: int): + return np.max(self.size(index)) + + def attr(self, attr: str, index: int): + dataset_idx = bisect.bisect_right(self.cumulative_sizes, index) + return getattr(self.datasets[dataset_idx], attr, None) + + @property + def sizes(self): + _dataset_sizes = [] + for ds, sr in zip(self.datasets, self.sample_ratios): + if isinstance(ds.sizes, np.ndarray): + _dataset_sizes.append(np.tile(ds.sizes, sr)) + else: + # Only support underlying dataset with single size array. + assert isinstance(ds.sizes, list) + _dataset_sizes.append(np.tile(ds.sizes[0], sr)) + return np.concatenate(_dataset_sizes) + + @property + def supports_prefetch(self): + return all(d.supports_prefetch for d in self.datasets) + + def ordered_indices(self): + """ + Returns indices sorted by length. So less padding is needed. 
+ """ + if isinstance(self.sizes, np.ndarray) and len(self.sizes.shape) > 1: + # special handling for concatenating lang_pair_datasets + if getattr(self.datasets[0], "shuffle", False): + indices = np.random.permutation(len(self)).astype(np.int64) + else: + indices = np.arange(len(self), dtype=np.int64) + sizes = self.sizes + tgt_sizes = ( + sizes[:, 1] if len(sizes.shape) > 0 and sizes.shape[1] > 1 else None + ) + src_sizes = ( + sizes[:, 0] if len(sizes.shape) > 0 and sizes.shape[1] > 1 else sizes + ) + # sort by target length, then source length + if tgt_sizes is not None: + indices = indices[np.argsort(tgt_sizes[indices], kind="mergesort")] + return indices[np.argsort(src_sizes[indices], kind="mergesort")] + else: + return np.argsort(self.sizes) + + def prefetch(self, indices): + frm = 0 + for to, ds in zip(self.cumulative_sizes, self.datasets): + real_size = len(ds) + if getattr(ds, "supports_prefetch", False): + ds.prefetch([(i - frm) % real_size for i in indices if frm <= i < to]) + frm = to + + @property + def can_reuse_epoch_itr_across_epochs(self): + return all(d.can_reuse_epoch_itr_across_epochs for d in self.datasets) + + def set_epoch(self, epoch): + super().set_epoch(epoch) + for ds in self.datasets: + if hasattr(ds, "set_epoch"): + ds.set_epoch(epoch) diff --git a/SpeechLM/speechlm/data/hubert_dataset.py b/SpeechLM/speechlm/data/hubert_dataset.py new file mode 100644 index 0000000000000000000000000000000000000000..cef948f8ecc38deb766ef9031578e1b8d8741170 --- /dev/null +++ b/SpeechLM/speechlm/data/hubert_dataset.py @@ -0,0 +1,599 @@ +# ---------------------------------------------------------------------------- +# SpeechLM: Enhanced Speech Pre-Training with Unpaired Textual Data (https://arxiv.org/abs/2209.15329) +# Github source: https://github.com/microsoft/SpeechT5/tree/main/SpeechLM +# Code based on fairseq: https://github.com/facebookresearch/fairseq/tree/272c4c5197250997148fb12c0db6306035f166a4 +# +# Copyright (c) 2022 Microsoft +# Licensed under The MIT License [see LICENSE for details] +# ---------------------------------------------------------------------------- + +import itertools +import logging +import io +import os +import sys +import time +from pathlib import Path +from typing import Any, List, Optional, Union, Tuple + +import numpy as np + +import torch +import torch.nn.functional as F +from fairseq.data import data_utils, Dictionary +from fairseq.data.fairseq_dataset import FairseqDataset +from fairseq.data.audio.audio_utils import ( + read_from_stored_zip, + is_sf_audio_data, +) + +FEATURE_OR_SF_AUDIO_FILE_EXTENSIONS = {".npy", ".wav", ".flac", ".ogg"} + +logger = logging.getLogger(__name__) + +def parse_path(path: str) -> Tuple[str, List[int]]: + """Parse data path which is either a path to + 1. a .npy/.wav/.flac/.ogg file + 2. 
a stored ZIP file with slicing info: "[zip_path]:[offset]:[length]" + + Args: + path (str): the data path to parse + + Returns: + file_path (str): the file path + slice_ptr (list of int): empty in case 1; + byte offset and length for the slice in case 2 + """ + + if Path(path).suffix in FEATURE_OR_SF_AUDIO_FILE_EXTENSIONS: + _path, slice_ptr = path, [] + else: + _path, *slice_ptr = path.split(":") + if not Path(_path).is_file(): + raise FileNotFoundError(f"File not found: {_path}") + assert len(slice_ptr) in {0, 1, 2}, f"Invalid path: {path}" + slice_ptr = [int(i) for i in slice_ptr] + return _path, slice_ptr + +def load_audio(manifest_path, max_keep, min_keep, retry_times=5): + n_long, n_short = 0, 0 + names, inds, sizes, chunk_names, chunk_indices = [], [], [], [], [] + for i in range(retry_times): + with open(manifest_path) as f: + root = f.readline().strip() + for ind, line in enumerate(f): + items = line.strip().split("\t") + assert len(items) == 2, line + sz = int(items[1]) + if min_keep is not None and sz < min_keep: + n_short += 1 + elif max_keep is not None and sz > max_keep: + n_long += 1 + else: + fname = items[0].split(":") + if len(fname) > 2: + if len(chunk_names) == 0 or fname[0] != chunk_names[-1]: + chunk_names.append(fname[0]) + chunk_indices.append(len(names)) + names.append(items[0]) + inds.append(ind) + sizes.append(sz) + if len(names) == 0: + logger.warn(f"Fail to load manifest for the {i} time") + time.sleep(1) + continue + else: + break + tot = ind + 1 + logger.info( + ( + f"max_keep={max_keep}, min_keep={min_keep}, " + f"loaded {len(names)}, skipped {n_short} short and {n_long} long, " + f"longest-loaded={max(sizes)}, shortest-loaded={min(sizes)}" + ) + ) + return root, names, inds, tot, sizes, chunk_names, chunk_indices + + +def load_label(label_path, inds, tot, retry_times=5): + for i in range(retry_times): + with open(label_path) as f: + labels = [line.rstrip() for line in f] + if len(labels) == 0: + logger.warn(f"Fail to load label for the {i} time") + time.sleep(1) + continue + else: + break + assert ( + len(labels) == tot + ), f"number of labels does not match ({len(labels)} != {tot})" + labels = [labels[i] for i in inds] + return labels + + +def load_label_offset(label_path, inds, tot, retry_times=5): + for i in range(retry_times): + with open(label_path) as f: + code_lengths = [len(line.encode("utf-8")) for line in f] + if len(code_lengths) == 0: + logger.warn(f"Fail to load label for the {i} time") + time.sleep(1) + continue + else: + break + assert ( + len(code_lengths) == tot + ), f"number of labels does not match ({len(code_lengths)} != {tot})" + offsets = list(itertools.accumulate([0] + code_lengths)) + offsets = [(offsets[i], offsets[i + 1]) for i in inds] + return offsets + + +def verify_label_lengths( + audio_sizes, + audio_rate, + label_path, + label_rate, + inds, + tot, + tol=0.1, # tolerance in seconds +): + if label_rate < 0: + logger.info(f"{label_path} is sequence label. skipped") + return + + with open(label_path) as f: + lengths = [len(line.rstrip().split()) for line in f] + assert len(lengths) == tot + lengths = [lengths[i] for i in inds] + num_invalid = 0 + for i, ind in enumerate(inds): + dur_from_audio = audio_sizes[i] / audio_rate + dur_from_label = lengths[i] / label_rate + if abs(dur_from_audio - dur_from_label) > tol: + logger.warning( + ( + f"audio and label duration differ too much " + f"(|{dur_from_audio} - {dur_from_label}| > {tol}) " + f"in line {ind+1} of {label_path}. 
Check if `label_rate` " + f"is correctly set (currently {label_rate}). " + f"num. of samples = {audio_sizes[i]}; " + f"label length = {lengths[i]}" + ) + ) + num_invalid += 1 + if num_invalid > 0: + logger.warning( + f"total {num_invalid} (audio, label) pairs with mismatched lengths" + ) + + +class HubertDataset(FairseqDataset): + def __init__( + self, + manifest_path: str, + sample_rate: float, + label_paths: List[str], + label_rates: Union[List[float], float], # -1 for sequence labels + pad_list: List[str], + eos_list: List[str], + label_processors: Optional[List[Any]] = None, + max_keep_sample_size: Optional[int] = None, + min_keep_sample_size: Optional[int] = None, + max_sample_size: Optional[int] = None, + shuffle: bool = True, + pad_audio: bool = False, + normalize: bool = False, + store_labels: bool = True, + random_crop: bool = False, + single_target: bool = False, + tgt_dict: Optional[Dictionary] = None, + add_decoder_target: bool = False, + fine_tuning: bool = False, + tgt_lang_idx: int = None, + tokenizer = None, + mbart_style_lang_id: bool = False, + retry_times: int = 5, + reduce_label_for_dec: bool = True, + ): + self.audio_root, self.audio_names, inds, tot, self.wav_sizes, self.chunk_names, self.chunk_indices = load_audio( + manifest_path, max_keep_sample_size, min_keep_sample_size, retry_times + ) + self.sample_rate = sample_rate + self.shuffle = shuffle + self.random_crop = random_crop + self.tgt_dict = tgt_dict + self.add_decoder_target = add_decoder_target + self.fine_tuning = fine_tuning + + self.num_labels = len(label_paths) + self.pad_list = pad_list + self.eos_list = eos_list + self.label_processors = label_processors + self.single_target = single_target + self.epoch = 0 + + self.label_rates = ( + [label_rates for _ in range(len(label_paths))] + if isinstance(label_rates, int) + else label_rates + ) + self.store_labels = store_labels + if store_labels: + self.label_list = [load_label(p, inds, tot, retry_times) for p in label_paths] + else: + self.label_paths = label_paths + self.label_offsets_list = [ + load_label_offset(p, inds, tot, retry_times) for p in label_paths + ] + assert label_processors is None or len(label_processors) == self.num_labels + for label_path, label_rate in zip(label_paths, self.label_rates): + verify_label_lengths( + self.wav_sizes, sample_rate, label_path, label_rate, inds, tot + ) + + self.max_sample_size = ( + max_sample_size if max_sample_size is not None else sys.maxsize + ) + self.pad_audio = pad_audio + self.normalize = normalize + self.tgt_lang_idx = tgt_lang_idx + self.tokenizer = tokenizer + self.mbart_style_lang_id = mbart_style_lang_id + self.retry_times = retry_times + self.reduce_label_for_dec = reduce_label_for_dec + logger.info( + f"pad_audio={pad_audio}, random_crop={random_crop}, tgt_lang_idx={self.tgt_lang_idx}, reduce_label_for_dec={reduce_label_for_dec}, " + f"mbart_style_lang_id={mbart_style_lang_id}, normalize={normalize}, max_sample_size={self.max_sample_size}" + ) + + def set_epoch(self, epoch): + self.epoch = epoch + + def batch_by_size(self, indices, max_tokens=None, max_sentences=None, required_batch_size_multiple=1): + self.max_tokens = max_tokens + self.max_sentences = max_sentences + self.required_batch_size_multiple = required_batch_size_multiple + if isinstance(indices[0], np.ndarray): + batch_list = [] + for indice in indices: + batch = super(HubertDataset, self).batch_by_size(indice, max_tokens, max_sentences, required_batch_size_multiple) + batch_list.append(batch) + return batch_list + else: + return 
super(HubertDataset, self).batch_by_size(indices, max_tokens, max_sentences, required_batch_size_multiple) + def shuffle_batches(self, batches, seed): + if isinstance(batches[0], list): + new_batches = [] + with data_utils.numpy_seed(seed): + np.random.shuffle(batches) + for batch in batches: + np.random.shuffle(batch) + new_batches.extend(batch) + return new_batches + else: + with data_utils.numpy_seed(seed): + np.random.shuffle(batches) + return batches + + def get_audio(self, index): + import soundfile as sf + + wav_path = os.path.join(self.audio_root, self.audio_names[index]) + _path, slice_ptr = parse_path(wav_path) + if len(slice_ptr) == 1: + import kaldiio + feat = kaldiio.load_mat(wav_path) + feat = torch.from_numpy(feat).float() + if self.normalize: + with torch.no_grad(): + feat = F.layer_norm(feat, feat.shape[-1]) + return feat + else: + if len(slice_ptr) == 2: + byte_data = read_from_stored_zip(_path, slice_ptr[0], slice_ptr[1]) + assert is_sf_audio_data(byte_data) + wav_path = io.BytesIO(byte_data) + for i in range(self.retry_times): + if i < self.retry_times - 1: + try: + wav, cur_sample_rate = sf.read(wav_path) + break + except Exception as e: + logger.warn(f"Fail to load wav for the {i} time") + logger.warn(e) + time.sleep(1) + continue + else: + wav, cur_sample_rate = sf.read(wav_path) + + wav = torch.from_numpy(wav).float() + wav = self.postprocess(wav, cur_sample_rate) + return wav + + def get_label(self, index, label_idx): + if self.store_labels: + label = self.label_list[label_idx][index] + else: + with open(self.label_paths[label_idx]) as f: + offset_s, offset_e = self.label_offsets_list[label_idx][index] + f.seek(offset_s) + label = f.read(offset_e - offset_s) + + if self.tokenizer is not None and self.fine_tuning: + label = self.tokenizer.encode(label) + + if self.label_processors is not None: + label = self.label_processors[label_idx](label) + return label + + def get_labels(self, index): + return [self.get_label(index, i) for i in range(self.num_labels)] + + def __getitem__(self, index): + wav = self.get_audio(index) + labels = self.get_labels(index) + return {"id": index, "source": wav, "label_list": labels} + + def __len__(self): + return len(self.wav_sizes) + + def crop_to_max_size(self, wav, target_size): + size = len(wav) + diff = size - target_size + if diff <= 0: + return wav, 0 + + start, end = 0, target_size + if self.random_crop: + start = np.random.randint(0, diff + 1) + end = size - diff + start + return wav[start:end], start + + def collater(self, samples): + # target = max(sizes) -> random_crop not used + # target = max_sample_size -> random_crop used for long + samples = [s for s in samples if s["source"] is not None] + if len(samples) == 0: + return {} + + audios = [s["source"] for s in samples] + audio_sizes = [len(s) for s in audios] + if self.pad_audio: + audio_size = min(max(audio_sizes), self.max_sample_size) + else: + audio_size = min(min(audio_sizes), self.max_sample_size) + feat_dim = audios[0].size(-1) if audios[0].dim() > 1 else 1 + collated_audios, padding_mask, audio_starts = self.collater_audio( + audios, audio_size, feat_dim, + ) + + targets_by_label = [ + [s["label_list"][i] for s in samples] for i in range(self.num_labels) + ] + targets_list, lengths_list, ntokens_list = self.collater_label( + targets_by_label, audio_size, audio_starts + ) + + if self.add_decoder_target: + if self.fine_tuning: + decoder_label = [ + torch.cat((targets_list[0][i, :lengths_list[0][i]], torch.tensor([self.tgt_dict.eos()])), 0).long() + for i in 
range(targets_list[0].size(0)) + ] + else: + if self.tokenizer is not None: + decoder_label = [ + # Set 48 for translate int to char and avoid \n + torch.cat( + ( + torch.tensor( + self.tokenizer.sp.Encode( + "".join( + [chr(j + 48) for j in ( + targets_list[0][i, :lengths_list[0][i]].unique_consecutive() if self.reduce_label_for_dec else targets_list[0][i, :lengths_list[0][i]] + ).tolist()] + ), out_type=int + ) + ), + torch.tensor([self.tgt_dict.eos()]) + ), dim=0 + ).long() + for i in range(targets_list[0].size(0)) + ] + else: + decoder_label = [ + torch.cat((targets_list[0][i, :lengths_list[0][i]].unique_consecutive() if self.reduce_label_for_dec else targets_list[0][i, :lengths_list[0][i]], torch.tensor([self.tgt_dict.eos()])), 0).long() + for i in range(targets_list[0].size(0)) + ] + + if self.mbart_style_lang_id: + decoder_label = [ + torch.cat((decoder_label[i], torch.tensor([self.tgt_lang_idx])), 0).long() + for i in range(targets_list[0].size(0)) + ] + + dec_ntokens = sum(x.size(0) for x in decoder_label) + decoder_target = data_utils.collate_tokens( + decoder_label, + self.tgt_dict.pad(), + self.tgt_dict.eos() if not self.mbart_style_lang_id else self.tgt_lang_idx, + left_pad=False, + move_eos_to_beginning=False, + ) + decoder_target_lengths = torch.tensor( + [x.size(0) for x in decoder_label], dtype=torch.long + ) + prev_output_tokens = data_utils.collate_tokens( + decoder_label, + self.tgt_dict.pad(), + self.tgt_dict.eos() if not self.mbart_style_lang_id else self.tgt_lang_idx, + left_pad=False, + move_eos_to_beginning=True, + ) + + if self.tgt_lang_idx is not None and not self.mbart_style_lang_id: + assert (prev_output_tokens[:, 0] != self.tgt_dict.eos()).sum() == 0 + prev_output_tokens[:, 0] = self.tgt_lang_idx + + net_input = { + "source": collated_audios, + "padding_mask": padding_mask, + "prev_output_tokens": prev_output_tokens, + } + batch = { + "id": torch.LongTensor([s["id"] for s in samples]), + "net_input": net_input, + "decoder_target": decoder_target, + "decoder_target_lengths": decoder_target_lengths, + "dec_ntokens": dec_ntokens, + "lang_idx": self.tgt_lang_idx, + } + else: + net_input = {"source": collated_audios, "padding_mask": padding_mask} + batch = { + "id": torch.LongTensor([s["id"] for s in samples]), + "net_input": net_input, + } + + if self.single_target: + batch["target_lengths"] = lengths_list[0] + batch["ntokens"] = ntokens_list[0] + batch["target"] = targets_list[0] + else: + batch["target_lengths_list"] = lengths_list + batch["ntokens_list"] = ntokens_list + batch["target_list"] = targets_list + return batch + + def collater_audio(self, audios, audio_size, feat_dim=1): + collated_audios = audios[0].new_zeros(len(audios), audio_size, feat_dim) + padding_mask = ( + torch.BoolTensor(collated_audios.shape[0:2]).fill_(False) + # if self.pad_audio else None + ) + audio_starts = [0 for _ in audios] + for i, audio in enumerate(audios): + audio = audio.view(-1, feat_dim) + diff = len(audio) - audio_size + if diff == 0: + collated_audios[i] = audio + elif diff < 0: + assert self.pad_audio + collated_audios[i] = torch.cat([audio, audio.new_full((-diff, feat_dim), 0.0)]) + padding_mask[i, diff:] = True + else: + collated_audios[i], audio_starts[i] = self.crop_to_max_size( + audio, audio_size + ) + return collated_audios.squeeze(-1), padding_mask, audio_starts + + def collater_frm_label(self, targets, audio_size, audio_starts, label_rate, pad): + assert label_rate > 0 + s2f = label_rate / self.sample_rate + frm_starts = [int(round(s * s2f)) for s in 
audio_starts] + frm_size = int(round(audio_size * s2f)) + if not self.pad_audio: + rem_size = [len(t) - s for t, s in zip(targets, frm_starts)] + frm_size = min(frm_size, *rem_size) + targets = [t[s : s + frm_size] for t, s in zip(targets, frm_starts)] + logger.debug(f"audio_starts={audio_starts}") + logger.debug(f"frame_starts={frm_starts}") + logger.debug(f"frame_size={frm_size}") + + lengths = torch.LongTensor([len(t) for t in targets]) + ntokens = lengths.sum().item() + targets = data_utils.collate_tokens(targets, pad_idx=pad, left_pad=False) + return targets, lengths, ntokens + + def collater_seq_label(self, targets, pad): + lengths = torch.LongTensor([len(t) for t in targets]) + ntokens = lengths.sum().item() + targets = data_utils.collate_tokens(targets, pad_idx=pad, left_pad=False) + return targets, lengths, ntokens + + def collater_label(self, targets_by_label, audio_size, audio_starts): + targets_list, lengths_list, ntokens_list = [], [], [] + itr = zip(targets_by_label, self.label_rates, self.pad_list) + for targets, label_rate, pad in itr: + if label_rate == -1: + targets, lengths, ntokens = self.collater_seq_label(targets, pad) + else: + targets, lengths, ntokens = self.collater_frm_label( + targets, audio_size, audio_starts, label_rate, pad + ) + targets_list.append(targets) + lengths_list.append(lengths) + ntokens_list.append(ntokens) + return targets_list, lengths_list, ntokens_list + + def num_tokens(self, index): + return self.size(index) + + def size(self, index): + if self.pad_audio: + return self.wav_sizes[index] + return min(self.wav_sizes[index], self.max_sample_size) + + @property + def sizes(self): + return np.array(self.wav_sizes) + + def ordered_indices(self): + """Return an ordered list of indices. Batches will be constructed based + on this order.""" + + if self.shuffle: + if len(self.chunk_names) > 0: + logger.info(f"ordered indices for epoch {self.epoch}") + with data_utils.numpy_seed(self.epoch): + self.chunk_order = np.random.permutation(len(self.chunk_names)) + chunk_count = 0 + tmp_sizes = [] + tmp_indices = [] + indice = [] + for i in self.chunk_order: + chunk_count += 1 + start = self.chunk_indices[i] + end = self.chunk_indices[i+1] if i < len(self.chunk_names) - 1 else len(self) + size = list(self.sizes[start:end]) + tmp_indices.extend(list(np.arange(start, end))) + tmp_sizes.extend(size) + if chunk_count % 10 == 0 or i == self.chunk_order[0]: + order = [np.random.permutation(len(tmp_indices))] + order.append( + np.minimum( + np.array(tmp_sizes), + self.max_sample_size, + ) + ) + sort_idx = np.lexsort(order)[::-1] + indice.append(np.array([tmp_indices[k] for k in sort_idx])) + tmp_indices = [] + tmp_sizes =[] + return indice + else: + order = [np.random.permutation(len(self))] + order.append( + np.minimum( + np.array(self.sizes), + self.max_sample_size, + ) + ) + return np.lexsort(order)[::-1] + else: + return np.arange(len(self)) + + def postprocess(self, wav, cur_sample_rate): + if wav.dim() == 2: + wav = wav.mean(-1) + assert wav.dim() == 1, wav.dim() + + if cur_sample_rate != self.sample_rate: + raise Exception(f"sr {cur_sample_rate} != {self.sample_rate}") + + if self.normalize: + with torch.no_grad(): + wav = F.layer_norm(wav, wav.shape) + return wav diff --git a/SpeechLM/speechlm/data/language_trible_dataset.py b/SpeechLM/speechlm/data/language_trible_dataset.py new file mode 100644 index 0000000000000000000000000000000000000000..587a0450e36460416ec6b0882d5b1eda65ad3f13 --- /dev/null +++ b/SpeechLM/speechlm/data/language_trible_dataset.py @@ 
-0,0 +1,671 @@ +# ---------------------------------------------------------------------------- +# SpeechLM: Enhanced Speech Pre-Training with Unpaired Textual Data (https://arxiv.org/abs/2209.15329) +# Github source: https://github.com/microsoft/SpeechT5/tree/main/SpeechLM +# Code based on fairseq: https://github.com/facebookresearch/fairseq/tree/272c4c5197250997148fb12c0db6306035f166a4 +# +# Copyright (c) 2022 Microsoft +# Licensed under The MIT License [see LICENSE for details] +# ---------------------------------------------------------------------------- + +import logging +import numpy as np +import torch +import os +import itertools + +from fairseq.data import FairseqDataset, data_utils +from fairseq.data import ( + AppendTokenDataset, + ConcatDataset, + PrependTokenDataset, + data_utils, + indexed_dataset, +) + +logger = logging.getLogger(__name__) + +def load_langtriple_dataset( + data_path, + split, + src, + src_dict, + ref, + ref_dict, + tgt, + tgt_dict, + combine, + dataset_impl, + upsample_primary, + left_pad_source, + left_pad_target, + max_source_positions, + max_target_positions, + prepend_bos=False, + load_alignments=False, + truncate_source=False, + append_source_id=False, + num_buckets=0, + shuffle=True, + pad_to_multiple=1, + prepend_bos_src=None, + lang_format="[{}]", +): + assert not truncate_source + def split_exists(split, src, ref, tgt, lang, data_path): + filename = os.path.join(data_path, "{}.{}-{}-{}.{}".format(split, src, ref, tgt, lang)) + return indexed_dataset.dataset_exists(filename, impl=dataset_impl) + + src_datasets = [] + ref_datasets = [] + tgt_datasets = [] + + for k in itertools.count(): + split_k = split + (str(k) if k > 0 else "") + + # infer langcode + if split_exists(split_k, src, ref, tgt, src, data_path): + prefix = os.path.join(data_path, "{}.{}-{}-{}.".format(split_k, src, ref, tgt)) + elif split_exists(split_k, tgt, ref, src, src, data_path): + prefix = os.path.join(data_path, "{}.{}-{}-{}.".format(split_k, tgt, ref, src)) + else: + if k > 0: + break + else: + raise FileNotFoundError( + "Dataset not found: {} ({})".format(split, data_path) + ) + + src_dataset = data_utils.load_indexed_dataset( + prefix + src, src_dict, dataset_impl + ) + src_datasets.append(src_dataset) + + ref_dataset = data_utils.load_indexed_dataset( + prefix + ref, ref_dict, dataset_impl + ) + ref_datasets.append(ref_dataset) + + tgt_dataset = data_utils.load_indexed_dataset( + prefix + tgt, tgt_dict, dataset_impl + ) + if tgt_dataset is not None: + tgt_datasets.append(tgt_dataset) + + logger.info( + "{} {} {}-{}-{} {} examples".format( + data_path, split_k, src, ref, tgt, len(src_datasets[-1]) + ) + ) + + if not combine: + break + + assert len(src_datasets) == len(ref_datasets) + assert len(src_datasets) == len(tgt_datasets) or len(tgt_datasets) == 0 + + if len(src_datasets) == 1: + src_dataset = src_datasets[0] + ref_dataset = ref_datasets[0] + tgt_dataset = tgt_datasets[0] if len(tgt_datasets) > 0 else None + else: + sample_ratios = [1] * len(src_datasets) + sample_ratios[0] = upsample_primary + src_dataset = ConcatDataset(src_datasets, sample_ratios) + ref_dataset = ConcatDataset(ref_datasets, sample_ratios) + if len(tgt_datasets) > 0: + tgt_dataset = ConcatDataset(tgt_datasets, sample_ratios) + else: + tgt_dataset = None + + if prepend_bos: + assert hasattr(src_dict, "bos_index") and hasattr(ref_dict, "bos_index") and hasattr(tgt_dict, "bos_index") + src_dataset = PrependTokenDataset(src_dataset, src_dict.bos()) + ref_dataset = PrependTokenDataset(ref_dataset, 
ref_dict.bos()) + if tgt_dataset is not None: + tgt_dataset = PrependTokenDataset(tgt_dataset, tgt_dict.bos()) + elif prepend_bos_src is not None: + logger.info(f"prepending src bos: {prepend_bos_src}") + src_dataset = PrependTokenDataset(src_dataset, prepend_bos_src) + ref_dataset = PrependTokenDataset(ref_dataset, prepend_bos_src) + + eos = None + if append_source_id: + src_dataset = AppendTokenDataset( + src_dataset, src_dict.index(lang_format.format(src)) + ) + ref_dataset = AppendTokenDataset( + ref_dataset, ref_dict.index(lang_format.format(ref)) + ) + if tgt_dataset is not None: + tgt_dataset = AppendTokenDataset( + tgt_dataset, tgt_dict.index(lang_format.format(tgt)) + ) + eos = tgt_dict.index(lang_format.format(tgt)) + + align_dataset = None + if load_alignments: + align_path = os.path.join(data_path, "{}.align.{}-{}".format(split, src, tgt)) + if indexed_dataset.dataset_exists(align_path, impl=dataset_impl): + align_dataset = data_utils.load_indexed_dataset( + align_path, None, dataset_impl + ) + + tgt_dataset_sizes = tgt_dataset.sizes if tgt_dataset is not None else None + return LanguageTripleDataset( + src_dataset, + src_dataset.sizes, + src_dict, + ref_dataset, + ref_dataset.sizes, + ref_dict, + tgt_dataset, + tgt_dataset_sizes, + tgt_dict, + left_pad_source=left_pad_source, + left_pad_target=left_pad_target, + align_dataset=align_dataset, + eos=eos, + num_buckets=num_buckets, + shuffle=shuffle, + pad_to_multiple=pad_to_multiple, + ) + + +def collate( + samples, + pad_idx, + eos_idx, + left_pad_source=True, + left_pad_target=False, + input_feeding=True, + pad_to_length=None, + pad_to_multiple=1, +): + if len(samples) == 0: + return {} + + def merge(key, left_pad, move_eos_to_beginning=False, pad_to_length=None): + return data_utils.collate_tokens( + [s[key] for s in samples], + pad_idx, + None, + left_pad, + move_eos_to_beginning, + pad_to_length=pad_to_length, + pad_to_multiple=pad_to_multiple, + ) + + def check_alignment(alignment, src_len, tgt_len): + if alignment is None or len(alignment) == 0: + return False + if ( + alignment[:, 0].max().item() >= src_len - 1 + or alignment[:, 1].max().item() >= tgt_len - 1 + ): + logger.warning("alignment size mismatch found, skipping alignment!") + return False + return True + + def compute_alignment_weights(alignments): + """ + Given a tensor of shape [:, 2] containing the source-target indices + corresponding to the alignments, a weight vector containing the + inverse frequency of each target index is computed. + For e.g. 
if alignments = [[5, 7], [2, 3], [1, 3], [4, 2]], then + a tensor containing [1., 0.5, 0.5, 1] should be returned (since target + index 3 is repeated twice) + """ + align_tgt = alignments[:, 1] + _, align_tgt_i, align_tgt_c = torch.unique( + align_tgt, return_inverse=True, return_counts=True + ) + align_weights = align_tgt_c[align_tgt_i[np.arange(len(align_tgt))]] + return 1.0 / align_weights.float() + + id = torch.LongTensor([s["id"] for s in samples]) + src_tokens = merge( + "source", + left_pad=left_pad_source, + pad_to_length=pad_to_length["source"] if pad_to_length is not None else None, + ) + ref_tokens = merge( + "reference", + left_pad=left_pad_source, + pad_to_length=pad_to_length["source"] if pad_to_length is not None else None, + ) + # sort by descending source length + src_lengths = torch.LongTensor( + [s["source"].ne(pad_idx).long().sum() for s in samples] + ) + ref_lengths = torch.LongTensor( + [s["reference"].ne(pad_idx).long().sum() for s in samples] + ) + src_lengths, sort_order = src_lengths.sort(descending=True) + id = id.index_select(0, sort_order) + src_tokens = src_tokens.index_select(0, sort_order) + ref_lengths = ref_lengths.index_select(0, sort_order) + ref_tokens = ref_tokens.index_select(0, sort_order) + + prev_output_tokens = None + target = None + if samples[0].get("target", None) is not None: + target = merge( + "target", + left_pad=left_pad_target, + pad_to_length=pad_to_length["target"] + if pad_to_length is not None + else None, + ) + target = target.index_select(0, sort_order) + tgt_lengths = torch.LongTensor( + [s["target"].ne(pad_idx).long().sum() for s in samples] + ).index_select(0, sort_order) + ntokens = tgt_lengths.sum().item() + + if samples[0].get("prev_output_tokens", None) is not None: + prev_output_tokens = merge("prev_output_tokens", left_pad=left_pad_target) + elif input_feeding: + # we create a shifted version of targets for feeding the + # previous output token(s) into the next decoder step + prev_output_tokens = merge( + "target", + left_pad=left_pad_target, + move_eos_to_beginning=True, + pad_to_length=pad_to_length["target"] + if pad_to_length is not None + else None, + ) + else: + ntokens = src_lengths.sum().item() + + batch = { + "id": id, + "nsentences": len(samples), + "ntokens": ntokens, + "net_input": { + "src_tokens": src_tokens, + "src_lengths": src_lengths, + }, + "target": target, + "ref_tokens": ref_tokens, + "ref_lengths": ref_lengths, + } + if prev_output_tokens is not None: + batch["net_input"]["prev_output_tokens"] = prev_output_tokens.index_select( + 0, sort_order + ) + + if samples[0].get("alignment", None) is not None: + bsz, tgt_sz = batch["target"].shape + src_sz = batch["net_input"]["src_tokens"].shape[1] + + offsets = torch.zeros((len(sort_order), 2), dtype=torch.long) + offsets[:, 1] += torch.arange(len(sort_order), dtype=torch.long) * tgt_sz + if left_pad_source: + offsets[:, 0] += src_sz - src_lengths + if left_pad_target: + offsets[:, 1] += tgt_sz - tgt_lengths + + alignments = [ + alignment + offset + for align_idx, offset, src_len, tgt_len in zip( + sort_order, offsets, src_lengths, tgt_lengths + ) + for alignment in [samples[align_idx]["alignment"].view(-1, 2)] + if check_alignment(alignment, src_len, tgt_len) + ] + + if len(alignments) > 0: + alignments = torch.cat(alignments, dim=0) + align_weights = compute_alignment_weights(alignments) + + batch["alignments"] = alignments + batch["align_weights"] = align_weights + + if samples[0].get("constraints", None) is not None: + # Collate the packed constraints 
across the samples, padding to + # the length of the longest sample. + lens = [sample.get("constraints").size(0) for sample in samples] + max_len = max(lens) + constraints = torch.zeros((len(samples), max(lens))).long() + for i, sample in enumerate(samples): + constraints[i, 0 : lens[i]] = samples[i].get("constraints") + batch["constraints"] = constraints.index_select(0, sort_order) + + return batch + + +class LanguageTripleDataset(FairseqDataset): + """ + A pair of torch.utils.data.Datasets. + + Args: + src (torch.utils.data.Dataset): source dataset to wrap + src_sizes (List[int]): source sentence lengths + src_dict (~fairseq.data.Dictionary): source vocabulary + tgt (torch.utils.data.Dataset, optional): target dataset to wrap + tgt_sizes (List[int], optional): target sentence lengths + tgt_dict (~fairseq.data.Dictionary, optional): target vocabulary + left_pad_source (bool, optional): pad source tensors on the left side + (default: True). + left_pad_target (bool, optional): pad target tensors on the left side + (default: False). + shuffle (bool, optional): shuffle dataset elements before batching + (default: True). + input_feeding (bool, optional): create a shifted version of the targets + to be passed into the model for teacher forcing (default: True). + remove_eos_from_source (bool, optional): if set, removes eos from end + of source if it's present (default: False). + append_eos_to_target (bool, optional): if set, appends eos to end of + target if it's absent (default: False). + align_dataset (torch.utils.data.Dataset, optional): dataset + containing alignments. + constraints (Tensor, optional): 2d tensor with a concatenated, zero- + delimited list of constraints for each sentence. + append_bos (bool, optional): if set, appends bos to the beginning of + source/target sentence. + num_buckets (int, optional): if set to a value greater than 0, then + batches will be bucketed into the given number of batch shapes. + src_lang_id (int, optional): source language ID, if set, the collated batch + will contain a field 'src_lang_id' in 'net_input' which indicates the + source language of the samples. + tgt_lang_id (int, optional): target language ID, if set, the collated batch + will contain a field 'tgt_lang_id' which indicates the target language + of the samples. 
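+        ref (torch.utils.data.Dataset): reference dataset to wrap; it is collated
+            into ``ref_tokens``/``ref_lengths`` alongside the source tokens.
+        ref_sizes (List[int]): reference sentence lengths
+        ref_dict (~fairseq.data.Dictionary): reference vocabulary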
+ """ + + def __init__( + self, + src, + src_sizes, + src_dict, + ref, + ref_sizes, + ref_dict, + tgt=None, + tgt_sizes=None, + tgt_dict=None, + left_pad_source=True, + left_pad_target=False, + shuffle=True, + input_feeding=True, + remove_eos_from_source=False, + append_eos_to_target=False, + align_dataset=None, + constraints=None, + append_bos=False, + eos=None, + num_buckets=0, + src_lang_id=None, + tgt_lang_id=None, + pad_to_multiple=1, + ): + if tgt_dict is not None: + assert src_dict.pad() == tgt_dict.pad() + assert src_dict.eos() == tgt_dict.eos() + assert src_dict.unk() == tgt_dict.unk() + if tgt is not None: + assert len(src) == len( + tgt + ), "Source and target must contain the same number of examples" + assert len(src) == len( + ref + ), "Source and reference must contain the same number of examples" + self.src = src + self.ref = ref + self.tgt = tgt + self.src_sizes = np.array(src_sizes) + self.ref_sizes = np.array(ref_sizes) + self.tgt_sizes = np.array(tgt_sizes) if tgt_sizes is not None else None + self.sizes = ( + np.vstack((self.src_sizes, self.tgt_sizes)).T + if self.tgt_sizes is not None + else self.src_sizes + ) + self.src_dict = src_dict + self.ref_dict = ref_dict + self.tgt_dict = tgt_dict + self.left_pad_source = left_pad_source + self.left_pad_target = left_pad_target + self.shuffle = shuffle + self.input_feeding = input_feeding + self.remove_eos_from_source = remove_eos_from_source + self.append_eos_to_target = append_eos_to_target + self.align_dataset = align_dataset + if self.align_dataset is not None: + assert ( + self.tgt_sizes is not None + ), "Both source and target needed when alignments are provided" + self.constraints = constraints + self.append_bos = append_bos + self.eos = eos if eos is not None else src_dict.eos() + self.src_lang_id = src_lang_id + self.tgt_lang_id = tgt_lang_id + if num_buckets > 0: + from fairseq.data import BucketPadLengthDataset + + self.src = BucketPadLengthDataset( + self.src, + sizes=self.src_sizes, + num_buckets=num_buckets, + pad_idx=self.src_dict.pad(), + left_pad=self.left_pad_source, + ) + self.src_sizes = self.src.sizes + logger.info("bucketing source lengths: {}".format(list(self.src.buckets))) + self.ref = BucketPadLengthDataset( + self.ref, + sizes=self.ref_sizes, + num_buckets=num_buckets, + pad_idx=self.ref_dict.pad(), + left_pad=self.left_pad_source, + ) + self.ref_sizes = self.ref.sizes + logger.info("bucketing reference lengths: {}".format(list(self.src.buckets))) + if self.tgt is not None: + self.tgt = BucketPadLengthDataset( + self.tgt, + sizes=self.tgt_sizes, + num_buckets=num_buckets, + pad_idx=self.tgt_dict.pad(), + left_pad=self.left_pad_target, + ) + self.tgt_sizes = self.tgt.sizes + logger.info( + "bucketing target lengths: {}".format(list(self.tgt.buckets)) + ) + + # determine bucket sizes using self.num_tokens, which will return + # the padded lengths (thanks to BucketPadLengthDataset) + num_tokens = np.vectorize(self.num_tokens, otypes=[np.compat.long]) + self.bucketed_num_tokens = num_tokens(np.arange(len(self.src))) + self.buckets = [ + (None, num_tokens) for num_tokens in np.unique(self.bucketed_num_tokens) + ] + else: + self.buckets = None + self.pad_to_multiple = pad_to_multiple + + def get_batch_shapes(self): + return self.buckets + + def __getitem__(self, index): + tgt_item = self.tgt[index] if self.tgt is not None else None + src_item = self.src[index] + ref_item = self.ref[index] + # Append EOS to end of tgt sentence if it does not have an EOS and remove + # EOS from end of src sentence if it 
exists. This is useful when we use + # use existing datasets for opposite directions i.e., when we want to + # use tgt_dataset as src_dataset and vice versa + if self.append_eos_to_target: + eos = self.tgt_dict.eos() if self.tgt_dict else self.src_dict.eos() + if self.tgt and self.tgt[index][-1] != eos: + tgt_item = torch.cat([self.tgt[index], torch.LongTensor([eos])]) + + if self.append_bos: + bos = self.tgt_dict.bos() if self.tgt_dict else self.src_dict.bos() + if self.tgt and self.tgt[index][0] != bos: + tgt_item = torch.cat([torch.LongTensor([bos]), self.tgt[index]]) + + bos = self.src_dict.bos() + if self.src[index][0] != bos: + src_item = torch.cat([torch.LongTensor([bos]), self.src[index]]) + if self.ref[index][0] != bos: + ref_item = torch.cat([torch.LongTensor([bos]), self.ref[index]]) + + if self.remove_eos_from_source: + eos = self.src_dict.eos() + if self.src[index][-1] == eos: + src_item = self.src[index][:-1] + if self.ref[index][-1] == eos: + ref_item = self.ref[index][:-1] + + example = { + "id": index, + "source": src_item, + "reference": ref_item, + "target": tgt_item, + } + if self.align_dataset is not None: + example["alignment"] = self.align_dataset[index] + if self.constraints is not None: + example["constraints"] = self.constraints[index] + return example + + def __len__(self): + return len(self.src) + + def collater(self, samples, pad_to_length=None): + """Merge a list of samples to form a mini-batch. + + Args: + samples (List[dict]): samples to collate + pad_to_length (dict, optional): a dictionary of + {'source': source_pad_to_length, 'target': target_pad_to_length} + to indicate the max length to pad to in source and target respectively. + + Returns: + dict: a mini-batch with the following keys: + + - `id` (LongTensor): example IDs in the original input order + - `ntokens` (int): total number of tokens in the batch + - `net_input` (dict): the input to the Model, containing keys: + + - `src_tokens` (LongTensor): a padded 2D Tensor of tokens in + the source sentence of shape `(bsz, src_len)`. Padding will + appear on the left if *left_pad_source* is ``True``. + - `src_lengths` (LongTensor): 1D Tensor of the unpadded + lengths of each source sentence of shape `(bsz)` + - `prev_output_tokens` (LongTensor): a padded 2D Tensor of + tokens in the target sentence, shifted right by one + position for teacher forcing, of shape `(bsz, tgt_len)`. + This key will not be present if *input_feeding* is + ``False``. Padding will appear on the left if + *left_pad_target* is ``True``. + - `src_lang_id` (LongTensor): a long Tensor which contains source + language IDs of each sample in the batch + + - `target` (LongTensor): a padded 2D Tensor of tokens in the + target sentence of shape `(bsz, tgt_len)`. Padding will appear + on the left if *left_pad_target* is ``True``. 
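+ - `ref_tokens` (LongTensor): a padded 2D Tensor of tokens in the
+ reference sentence of shape `(bsz, ref_len)`. Padding will appear
+ on the left if *left_pad_source* is ``True``.
+ - `ref_lengths` (LongTensor): 1D Tensor of the unpadded lengths of
+ each reference sentence of shape `(bsz)`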
+ - `tgt_lang_id` (LongTensor): a long Tensor which contains target language + IDs of each sample in the batch + """ + res = collate( + samples, + pad_idx=self.src_dict.pad(), + eos_idx=self.eos, + left_pad_source=self.left_pad_source, + left_pad_target=self.left_pad_target, + input_feeding=self.input_feeding, + pad_to_length=pad_to_length, + pad_to_multiple=self.pad_to_multiple, + ) + if self.src_lang_id is not None or self.tgt_lang_id is not None: + src_tokens = res["net_input"]["src_tokens"] + bsz = src_tokens.size(0) + if self.src_lang_id is not None: + res["net_input"]["src_lang_id"] = ( + torch.LongTensor([[self.src_lang_id]]).expand(bsz, 1).to(src_tokens) + ) + if self.tgt_lang_id is not None: + res["tgt_lang_id"] = ( + torch.LongTensor([[self.tgt_lang_id]]).expand(bsz, 1).to(src_tokens) + ) + return res + + def num_tokens(self, index): + """Return the number of tokens in a sample. This value is used to + enforce ``--max-tokens`` during batching.""" + return max( + self.src_sizes[index], + self.tgt_sizes[index] if self.tgt_sizes is not None else 0, + ) + + def num_tokens_vec(self, indices): + """Return the number of tokens for a set of positions defined by indices. + This value is used to enforce ``--max-tokens`` during batching.""" + sizes = self.src_sizes[indices] + if self.tgt_sizes is not None: + sizes = np.maximum(sizes, self.tgt_sizes[indices]) + return sizes + + def size(self, index): + """Return an example's size as a float or tuple. This value is used when + filtering a dataset with ``--max-positions``.""" + return ( + self.src_sizes[index], + self.tgt_sizes[index] if self.tgt_sizes is not None else 0, + ) + + def ordered_indices(self): + """Return an ordered list of indices. Batches will be constructed based + on this order.""" + if self.shuffle: + indices = np.random.permutation(len(self)).astype(np.int64) + else: + indices = np.arange(len(self), dtype=np.int64) + if self.buckets is None: + # sort by target length, then source length + if self.tgt_sizes is not None: + indices = indices[np.argsort(self.tgt_sizes[indices], kind="mergesort")] + return indices[np.argsort(self.src_sizes[indices], kind="mergesort")] + else: + # sort by bucketed_num_tokens, which is: + # max(padded_src_len, padded_tgt_len) + return indices[ + np.argsort(self.bucketed_num_tokens[indices], kind="mergesort") + ] + + @property + def supports_prefetch(self): + return getattr(self.src, "supports_prefetch", False) and ( + getattr(self.tgt, "supports_prefetch", False) or self.tgt is None + ) + + def prefetch(self, indices): + self.src.prefetch(indices) + if self.tgt is not None: + self.tgt.prefetch(indices) + if self.align_dataset is not None: + self.align_dataset.prefetch(indices) + + def filter_indices_by_size(self, indices, max_sizes): + """Filter a list of sample indices. Remove those that are longer + than specified in max_sizes. 
+ + Args: + indices (np.array): original array of sample indices + max_sizes (int or list[int] or tuple[int]): max sample size, + can be defined separately for src and tgt (then list or tuple) + + Returns: + np.array: filtered sample array + list: list of removed indices + """ + return data_utils.filter_paired_dataset_indices_by_size( + self.src_sizes, + self.tgt_sizes, + indices, + max_sizes, + ) diff --git a/SpeechLM/speechlm/data/load_langpair_dataset.py b/SpeechLM/speechlm/data/load_langpair_dataset.py new file mode 100644 index 0000000000000000000000000000000000000000..1622a79db99515836e7eb135e0ebd51e6a2d2dc0 --- /dev/null +++ b/SpeechLM/speechlm/data/load_langpair_dataset.py @@ -0,0 +1,174 @@ +# ---------------------------------------------------------------------------- +# SpeechLM: Enhanced Speech Pre-Training with Unpaired Textual Data (https://arxiv.org/abs/2209.15329) +# Github source: https://github.com/microsoft/SpeechT5/tree/main/SpeechLM +# Code based on fairseq: https://github.com/facebookresearch/fairseq/tree/272c4c5197250997148fb12c0db6306035f166a4 +# +# Copyright (c) 2022 Microsoft +# Licensed under The MIT License [see LICENSE for details] +# ---------------------------------------------------------------------------- + +""" + Modified from https://github.com/facebookresearch/fairseq/blob/272c4c5197250997148fb12c0db6306035f166a4/fairseq/tasks/translation.py + 1. Add custom lang_format in function load_langpair_dataset + 2. If truncate_source (default no), use RandomCropDataset instead of TruncateDataset +""" + +import itertools +import logging +import os + +from fairseq.data import ( + AppendTokenDataset, + LanguagePairDataset, + PrependTokenDataset, + StripTokenDataset, + TruncateDataset, + RandomCropDataset, + data_utils, + indexed_dataset, +) + +from speechlm.data.concat_dataset import ConcatDataset + + +EVAL_BLEU_ORDER = 4 + + +logger = logging.getLogger(__name__) + + +def load_langpair_dataset( + data_path, + split, + src, + src_dict, + tgt, + tgt_dict, + combine, + dataset_impl, + upsample_primary, + left_pad_source, + left_pad_target, + max_source_positions, + max_target_positions, + prepend_bos=False, + load_alignments=False, + truncate_source=False, + append_source_id=False, + num_buckets=0, + shuffle=True, + pad_to_multiple=1, + prepend_bos_src=None, + lang_format="[{}]", + input_feeding=True, +): + def split_exists(split, src, tgt, lang, data_path): + filename = os.path.join(data_path, "{}.{}-{}.{}".format(split, src, tgt, lang)) + return indexed_dataset.dataset_exists(filename, impl=dataset_impl) + + src_datasets = [] + tgt_datasets = [] + + for k in itertools.count(): + split_k = split + (str(k) if k > 0 else "") + + # infer langcode + if split_exists(split_k, src, tgt, src, data_path): + prefix = os.path.join(data_path, "{}.{}-{}.".format(split_k, src, tgt)) + elif split_exists(split_k, tgt, src, src, data_path): + prefix = os.path.join(data_path, "{}.{}-{}.".format(split_k, tgt, src)) + else: + if k > 0: + break + else: + raise FileNotFoundError( + "Dataset not found: {} ({})".format(split, data_path) + ) + + src_dataset = data_utils.load_indexed_dataset( + prefix + src, src_dict, dataset_impl + ) + if truncate_source: + src_dataset = AppendTokenDataset( + RandomCropDataset( + StripTokenDataset(src_dataset, src_dict.eos()), + max_source_positions - 1, + ), + src_dict.eos(), + ) + src_datasets.append(src_dataset) + + tgt_dataset = data_utils.load_indexed_dataset( + prefix + tgt, tgt_dict, dataset_impl + ) + if tgt_dataset is not None: + 
tgt_datasets.append(tgt_dataset) + + logger.info( + "{} {} {}-{} {} examples".format( + data_path, split_k, src, tgt, len(src_datasets[-1]) + ) + ) + + if not combine: + break + + assert len(src_datasets) == len(tgt_datasets) or len(tgt_datasets) == 0 + + if len(src_datasets) == 1: + src_dataset = src_datasets[0] + tgt_dataset = tgt_datasets[0] if len(tgt_datasets) > 0 else None + else: + sample_ratios = [1] * len(src_datasets) + sample_ratios[0] = upsample_primary + src_dataset = ConcatDataset(src_datasets, sample_ratios) + if len(tgt_datasets) > 0: + tgt_dataset = ConcatDataset(tgt_datasets, sample_ratios) + else: + tgt_dataset = None + + if prepend_bos: + assert hasattr(src_dict, "bos_index") and hasattr(tgt_dict, "bos_index") + src_dataset = PrependTokenDataset(src_dataset, src_dict.bos()) + if tgt_dataset is not None: + tgt_dataset = PrependTokenDataset(tgt_dataset, tgt_dict.bos()) + elif prepend_bos_src is not None: + logger.info(f"prepending src bos: {prepend_bos_src}") + src_dataset = PrependTokenDataset(src_dataset, prepend_bos_src) + + eos = None + if append_source_id: + src_dataset = AppendTokenDataset( + src_dataset, src_dict.index(lang_format.format(src)) + ) + if tgt_dataset is not None: + tgt_dataset = AppendTokenDataset( + tgt_dataset, tgt_dict.index(lang_format.format(tgt)) + ) + eos = tgt_dict.index(lang_format.format(tgt)) + + align_dataset = None + if load_alignments: + align_path = os.path.join(data_path, "{}.align.{}-{}".format(split, src, tgt)) + if indexed_dataset.dataset_exists(align_path, impl=dataset_impl): + align_dataset = data_utils.load_indexed_dataset( + align_path, None, dataset_impl + ) + + tgt_dataset_sizes = tgt_dataset.sizes if tgt_dataset is not None else None + return LanguagePairDataset( + src_dataset, + src_dataset.sizes, + src_dict, + tgt_dataset, + tgt_dataset_sizes, + tgt_dict, + left_pad_source=left_pad_source, + left_pad_target=left_pad_target, + align_dataset=align_dataset, + eos=eos, + num_buckets=num_buckets, + shuffle=shuffle, + pad_to_multiple=pad_to_multiple, + input_feeding=input_feeding, + ) diff --git a/SpeechLM/speechlm/data/multimodal_corpus_dataset.py b/SpeechLM/speechlm/data/multimodal_corpus_dataset.py new file mode 100644 index 0000000000000000000000000000000000000000..3e954a332d869535cbc335d58447a9b5a854735b --- /dev/null +++ b/SpeechLM/speechlm/data/multimodal_corpus_dataset.py @@ -0,0 +1,370 @@ +# ---------------------------------------------------------------------------- +# SpeechLM: Enhanced Speech Pre-Training with Unpaired Textual Data (https://arxiv.org/abs/2209.15329) +# Github source: https://github.com/microsoft/SpeechT5/tree/main/SpeechLM +# Code based on fairseq: https://github.com/facebookresearch/fairseq/tree/272c4c5197250997148fb12c0db6306035f166a4 +# +# Copyright (c) 2022 Microsoft +# Licensed under The MIT License [see LICENSE for details] +# ---------------------------------------------------------------------------- + +import logging +from os import replace +import time +from collections import OrderedDict +from typing import Any, Dict, List, Optional + +import numpy as np +from fairseq.data import data_utils + +from fairseq.data import FairseqDataset + +logger = logging.getLogger(__name__) + + +class MultiCorpusDataset(FairseqDataset): + """ + see fairseq/fairseq/data/multi_corpus_dataset.__doc__ + + Args: + datasets: a OrderedDict of FairseqDataset instances. 
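+ max_positions: a Dict mapping each dataset key to its max sample size;
+ longer samples are filtered out in ordered_indices() when
+ check_length is True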
+ distribution: a List containing the probability of getting an utterance from + corresponding dataset + seed: random seed for sampling the datsets + sort_indices: if true, will sort the ordered indices by size + batch_sample: if true, will ensure each batch is from a single dataset + """ + + def __init__( + self, + datasets: Dict[str, FairseqDataset], + max_positions: Dict, + distribution: List[float], + max_tokens_ratio: List[float], + seed: int = 1234, + sort_indices: bool = False, + check_length: bool = False, + ): + super().__init__() + assert isinstance(datasets, OrderedDict) + assert len(datasets) == len(distribution) + # assert sum(distribution) == 1 + self.datasets = datasets + self.distribution = distribution + self.max_tokens_ratio = max_tokens_ratio + self.seed = seed + self.sort_indices = sort_indices + self.max_positions = max_positions + self.check_length = check_length + + # Avoid repeated conversions to list later + self.dataset_list = list(datasets.values()) + self.total_num_instances = 0 + + # first_dataset = self.dataset_list[0] + + self.num_instances_per_dataset = [] + self.dataset_offsets = [] + for i, dataset in enumerate(self.dataset_list): + assert isinstance(dataset, FairseqDataset) + # assert type(dataset) is type(first_dataset) + self.num_instances_per_dataset.append( + 0 if self.distribution[i] == 0 else len(dataset) + ) + self.dataset_offsets.append(self.total_num_instances) + self.total_num_instances += self.num_instances_per_dataset[i] + + def ordered_indices(self): + start = time.time() + with data_utils.numpy_seed(self.seed, self.epoch): + logger.info(f"sampling new dataset with seed {self.seed} epoch {self.epoch}") + sampled_indices = {} + + # For each dataset i, sample self.distribution[i] * self.total_num_instances + for i, key in enumerate(self.datasets): + tp = time.time() + if self.distribution[i] == 0: + # skip dataset if sampling probability is 0 + continue + + if i < len(self.datasets) - 1: + num_instances = int(self.distribution[i] * self.total_num_instances) + high = self.dataset_offsets[i + 1] + else: + num_instances = int(self.distribution[i] * self.total_num_instances) + high = self.total_num_instances + + logger.info(f"sampling {num_instances} from {key} dataset") + + # First, add k copies of the dataset where k = num_instances // len(dataset). + # This ensures an equal distribution of the data points as much as possible. 
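+ # (e.g. num_instances=250 with len(dataset)=100 gives 2 full copies of
+ # every index, and the remaining 50 indices are sampled randomly below)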
+ # For the remaining entries randomly sample them + dataset_size = len(self.datasets[key]) + num_copies = num_instances // dataset_size + dataset_indices = np.random.permutation(high - self.dataset_offsets[i])[: num_instances - num_copies * dataset_size] + if num_copies > 0: + dataset_indices = np.concatenate( + ( + np.repeat( + np.arange(high - self.dataset_offsets[i]), num_copies + ), + dataset_indices, + ) + ) + # filter by size, we should ignore it by setting check_length=False + # , as it is very time-consuming on large dadaset + if self.max_positions[key] is not None and self.check_length: + dataset_indices, ignored = self.datasets[key].filter_indices_by_size( + dataset_indices, + self.max_positions[key], + ) + if len(ignored) > 0: + logger.warning( + ( + "{:,} samples have invalid sizes and will be skipped, " + "max_positions={}, first few sample ids={}" + ).format(len(ignored), self.max_positions[key], ignored[:10]) + ) + + if self.sort_indices: + logger.info(" - sampled indices took {}s".format(time.time() - tp)) + tp = time.time() + dataset_indices = np.sort(dataset_indices) + ordered_indices = self.datasets[key].ordered_indices() + if isinstance(ordered_indices[0], np.ndarray): # chunked audio data + dataset_indices = [order_idx + self.dataset_offsets[i] for order_idx in ordered_indices] + assert self.dataset_offsets[i] == 0 + # TODO for chunked audio data, now assume len(dataset_indices) == len(dataset). Don't filter any data. + else: + dataset_indices = ordered_indices[dataset_indices] + self.dataset_offsets[i] + logger.info(" - ordered_indices took {}s".format(time.time() - tp)) + else: + np.random.shuffle(dataset_indices) + + sampled_indices[key] = dataset_indices + + logger.info( + "multi_corpus_dataset ordered_indices took {}s".format( + time.time() - start + ) + ) + return sampled_indices + + def _map_index(self, index: int): + """ + If dataset A has length N and dataset B has length M + then index 1 maps to index 1 of dataset A, and index N + 1 + maps to index 1 of B. + """ + counter = 0 + for num_instances, key in zip(self.num_instances_per_dataset, self.datasets): + if index < counter + num_instances: + return index - counter, key + counter += num_instances + raise ValueError( + "Invalid index: {}, max: {}".format(index, self.total_num_instances) + ) + + def __len__(self): + """ + Length of this dataset is the sum of individual datasets + """ + return self.total_num_instances + + def __getitem__(self, index): + new_index, key = self._map_index(index) + try: + item = self.datasets[key][new_index] + item["full_id"] = index + return item + except Exception as e: + e.args = (f"Error from {key} dataset", *e.args) + raise + + def collater(self, samples): + """ + If we are doing batch sampling, then pick the right collater to use. + + Otherwise we assume all collaters are the same. 
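+ Returns a dict keyed by dataset name: each value is the mini-batch
+ produced by that dataset's own collater; keys with no samples are skipped.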
+ """ + if len(samples) == 0: + return None + + samples_dict = {key: [] for key in self.datasets} + for s in samples: + _, key = self._map_index(s["full_id"]) + samples_dict[key].append(s) + + batch = {} + for key in samples_dict: + if len(samples_dict[key]) == 0: + continue + batch[key] = self.datasets[key].collater(samples_dict[key]) + + return batch + + + def num_tokens(self, index: int): + index, key = self._map_index(index) + return self.datasets[key].num_tokens(index) + + def size(self, index: int): + index, key = self._map_index(index) + return self.datasets[key].size(index) + + @property + def can_reuse_epoch_itr_across_epochs(self): + return False + + def set_epoch(self, epoch, **unused): + super().set_epoch(epoch) + logger.info(f"setting epoch of multi_corpus_dataset to {epoch}") + for ds in self.dataset_list: + if hasattr(ds, "set_epoch"): + ds.set_epoch(epoch) + self.epoch = epoch + + @property + def supports_prefetch(self): + return False + + @property + def supports_fetch_outside_dataloader(self): + return all( + self.datasets[key].supports_fetch_outside_dataloader + for key in self.datasets + ) + + + def batch_by_size( + self, + indices, + max_tokens=None, + max_sentences=None, + required_batch_size_multiple=1, + ): + dataset_indices = indices + batches_dict = {} + for n, key in enumerate(dataset_indices): + max_tokens_ratio = self.max_tokens_ratio[n] + if isinstance(dataset_indices[key][0], np.ndarray): # chunked audio data + cur_batches = self.datasets[key].batch_by_size( + dataset_indices[key], + round(max_tokens * max_tokens_ratio), + max_sentences, + required_batch_size_multiple, + ) + logger.info(f"Created {sum([len(b) for b in cur_batches])} [{len(cur_batches)}] batches for dataset {key}") + else: + cur_batches = super().batch_by_size( + np.array(dataset_indices[key], dtype=np.int64), + round(max_tokens * max_tokens_ratio), + max_sentences, + required_batch_size_multiple, + ) + logger.info(f"Created {len(cur_batches)} batches for dataset {key}") + batches_dict[key] = cur_batches + + return batches_dict + + + def get_batch_sampler( + self, + indices, + num_shards, + seed, + max_tokens=None, + max_sentences=None, + required_batch_size_multiple=1, + split_modality_batch=False, + ): + + def batch_sampler(dataset, epoch): + start = time.time() + batches_dict = dataset.batch_by_size( + indices, + max_tokens=max_tokens, + max_sentences=max_sentences, + required_batch_size_multiple=required_batch_size_multiple, + ) + logger.info(f"multi_corpus_dataset, batch_by_size took {time.time() - start}s") + start = time.time() + new_batches = [] + + ### shuffle inner group size, split into speech/text batches + shuffled_batches_list = [] + speech_batches = [] + ### we should specify the speech_batches because: we need concatenate different speech datasets + # (e.g. ltr or km) instead of loading them parellelly. 
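+ # Group batches by modality: all datasets whose key starts with "speech"
+ # are pooled into one speech stream, while every other (text) dataset
+ # keeps its own stream of batches.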
+ for name, batches in batches_dict.items(): + if name.startswith("speech"): + if isinstance(batches[0], list): # chunked audio data + batches = self.datasets[name].shuffle_batches(list(batches), seed + epoch) + shuffled_batches_list.append(batches) + else: + batches = inner_bucket_shuffle(batches, seed+epoch, num_shards*10) + batches = batches[: (len(batches) // num_shards) * num_shards] + if len(batches) == 0: + logger.warning(f"Sample 0 batch for {name}, you should ensure that no {name} data provided.") + else: + speech_batches += batches + else: + batches = inner_bucket_shuffle(batches, seed+epoch, num_shards*10) + batches = batches[: (len(batches) // num_shards) * num_shards] + if len(batches) == 0: + logger.warning(f"Sample 0 batch for {name}, you should ensure that no {name} data provided.") + else: + batches = shuffle_buckets(batches, seed=seed+epoch, inner_shuf=False) + shuffled_batches_list.append(batches) + if len(speech_batches) > 0: + speech_batches = shuffle_buckets(speech_batches, seed=seed+epoch, inner_shuf=False) + shuffled_batches_list.append(speech_batches) + + ### create the final new_batches + num_batch = min(len(batches) for batches in shuffled_batches_list) + if split_modality_batch: + for i in range(0, num_batch, num_shards): + for batches in shuffled_batches_list: + new_batches += batches[i: i + num_shards] + else: + for i in range(num_batch): + new_batches.append(np.concatenate([batches[i] for batches in shuffled_batches_list])) + + logger.info(f"multi_corpus_dataset sample {len(new_batches)} batches, took {time.time() - start}s") + return new_batches + + def inner_bucket_shuffle(batches, seed, bucket_size=10, thr=0): + """we assert batches is sorted form long to short. + shuffle samples in a buctet(e.g. 10 batches). + batches: a list of numpy array""" + num_batch = len(batches) + new_batches = [] + num_buckets = len(batches) // bucket_size + i = 0 + while i < num_batch: + if (i < bucket_size * thr or + i >= bucket_size * (num_buckets - thr) + ): + new_batches.append(batches[i]) + i += 1 + else: + group = np.concatenate(batches[i: i+bucket_size]) + with data_utils.numpy_seed(seed): + np.random.shuffle(group) + new_batches += np.array_split(group, bucket_size) + i += bucket_size + assert all([len(batch) > 0 for batch in new_batches]) + return new_batches + + def shuffle_buckets(batches, seed, inner_shuf=True): + if inner_shuf: + batches = inner_bucket_shuffle(batches, seed, num_shards*10) + batches = [batches[i: i + num_shards] for i in range(0, len(batches)-num_shards+1, num_shards)] + assert len(batches[-1]) == num_shards + new_batches = [] + with data_utils.numpy_seed(seed): + np.random.shuffle(batches) + for group in batches: + new_batches += group + return new_batches + + return batch_sampler diff --git a/SpeechLM/speechlm/data/text_to_unit_dataset.py b/SpeechLM/speechlm/data/text_to_unit_dataset.py new file mode 100644 index 0000000000000000000000000000000000000000..f7671d0fe48e695dc008742842f1550f48d741bc --- /dev/null +++ b/SpeechLM/speechlm/data/text_to_unit_dataset.py @@ -0,0 +1,293 @@ +# ---------------------------------------------------------------------------- +# SpeechLM: Enhanced Speech Pre-Training with Unpaired Textual Data (https://arxiv.org/abs/2209.15329) +# Github source: https://github.com/microsoft/SpeechT5/tree/main/SpeechLM +# Code based on fairseq: https://github.com/facebookresearch/fairseq/tree/272c4c5197250997148fb12c0db6306035f166a4 +# +# Copyright (c) 2022 Microsoft +# Licensed under The MIT License [see LICENSE for details] +# 
---------------------------------------------------------------------------- + +from pathlib import Path +from typing import List, Dict, Optional, Any +from dataclasses import dataclass + +import numpy as np +import torch + +from fairseq.data.audio.speech_to_text_dataset import ( + SpeechToTextDataset, + SpeechToTextDatasetCreator, + S2TDataConfig, + _collate_frames, + get_features_or_waveform, +) +from fairseq.data import Dictionary, data_utils as fairseq_data_utils + + +@dataclass +class TextToUnitDatasetItem(object): + index: int + source: torch.Tensor + target: Optional[torch.Tensor] = None + speaker_id: Optional[int] = None + speaker_emb: Optional[torch.Tensor] = None + duration: Optional[torch.Tensor] = None + pitch: Optional[torch.Tensor] = None + energy: Optional[torch.Tensor] = None + + +class Text2UnitDataset(SpeechToTextDataset): + def __init__( + self, + split: str, + is_train_split: bool, + cfg: S2TDataConfig, + unit_labels: List[str], + n_frames: List[int], + src_texts: Optional[List[str]] = None, + tgt_texts: Optional[List[str]] = None, + speakers: Optional[List[str]] = None, + src_langs: Optional[List[str]] = None, + tgt_langs: Optional[List[str]] = None, + ids: Optional[List[str]] = None, + tgt_dict: Optional[Dictionary] = None, + pre_tokenizer=None, + bpe_tokenizer=None, + n_frames_per_step=1, + speaker_to_id=None, + durations: Optional[List[List[int]]] = None, + pitches: Optional[List[str]] = None, + energies: Optional[List[str]] = None, + ): + super(Text2UnitDataset, self).__init__( + split, + is_train_split, + cfg, + unit_labels, + n_frames, + src_texts=src_texts, + tgt_texts=tgt_texts, + speakers=speakers, + src_langs=src_langs, + tgt_langs=tgt_langs, + ids=ids, + tgt_dict=tgt_dict, + pre_tokenizer=pre_tokenizer, + bpe_tokenizer=bpe_tokenizer, + n_frames_per_step=n_frames_per_step, + speaker_to_id=speaker_to_id, + ) + self.durations = durations + self.pitches = pitches + self.energies = energies + self.unit_labels = unit_labels + self.feature_root = Path(cfg.audio_root) + self.spk_emb_type = cfg.config.get("speaker_embedding_type", None) + self.random_spk = cfg.config.get("random_speaker", False) + if self.spk_emb_type is not None: + self.spk_emb_choices = [i for i in (self.feature_root / self.spk_emb_type).glob("*.npy")] + self.spk_emb_num = len(self.spk_emb_choices) + + def __getitem__(self, index: int) -> TextToUnitDatasetItem: + # s2t_item = super().__getitem__(index) + source = torch.LongTensor(self.unit_labels[index]) + target = None + if self.tgt_texts is not None: + tokenized = self.get_tokenized_tgt_text(index) + target = self.tgt_dict.encode_line( + tokenized, add_if_not_exist=False, append_eos=self.append_eos + ).long() + if self.cfg.prepend_tgt_lang_tag: + lang_tag_idx = self.get_lang_tag_idx( + self.tgt_langs[index], self.tgt_dict + ) + target = torch.cat((torch.LongTensor([lang_tag_idx]), target), 0) + + speaker_id = None + if self.speaker_to_id is not None: + speaker_id = self.speaker_to_id[self.speakers[index]] + + speaker_emb = None + if self.spk_emb_type is not None: + if self.random_spk: + spk_emb_path = self.spk_emb_choices[np.random.choice(self.spk_emb_num)] + else: + spk_emb_path = self.feature_root / self.spk_emb_type / f"{self.ids[index]}.npy" + speaker_emb = get_features_or_waveform(spk_emb_path) + speaker_emb = torch.from_numpy(speaker_emb).float() + + duration, pitch, energy = None, None, None + if self.durations is not None: + duration = torch.tensor( + self.durations[index] + [0], dtype=torch.long # pad 0 for EOS + ) + if self.pitches is 
not None: + pitch = get_features_or_waveform(self.pitches[index]) + pitch = torch.from_numpy( + np.concatenate((pitch, [0])) # pad 0 for EOS + ).float() + if self.energies is not None: + energy = get_features_or_waveform(self.energies[index]) + energy = torch.from_numpy( + np.concatenate((energy, [0])) # pad 0 for EOS + ).float() + return TextToUnitDatasetItem( + index=index, + source=source, + target=target, + speaker_id=speaker_id, + speaker_emb=speaker_emb, + duration=duration, + pitch=pitch, + energy=energy, + ) + + def collater(self, samples: List[TextToUnitDatasetItem]) -> Dict[str, Any]: + if len(samples) == 0: + return {} + + src_lengths, order = torch.tensor( + [s.target.shape[0] for s in samples], dtype=torch.long + ).sort(descending=True) + id_ = torch.tensor([s.index for s in samples], dtype=torch.long).index_select( + 0, order + ) + traget = fairseq_data_utils.collate_tokens( + [s.source for s in samples], + self.tgt_dict.pad(), + ).index_select(0, order) + + target_lengths = torch.tensor( + [s.source.shape[0] for s in samples], dtype=torch.long + ).index_select(0, order) + + src_tokens = fairseq_data_utils.collate_tokens( + [s.target for s in samples], + self.tgt_dict.pad(), + self.tgt_dict.eos(), + left_pad=False, + move_eos_to_beginning=False, + ).index_select(0, order) + + speaker = None + if self.speaker_to_id is not None: + speaker = ( + torch.tensor([s.speaker_id for s in samples], dtype=torch.long) + .index_select(0, order) + .view(-1, 1) + ) + if self.spk_emb_type is not None: + speaker = torch.stack([s.speaker_emb for s in samples], dim=0).index_select(0, order) + + bsz, _ = traget.size() + prev_output_tokens = torch.cat( + (traget.new_zeros((bsz, self.tgt_dict.bos())), traget[:, :-1]), dim=1 + ) + + durations, pitches, energies = None, None, None + if self.durations is not None: + durations = fairseq_data_utils.collate_tokens( + [s.duration for s in samples], 0 + ).index_select(0, order) + assert src_tokens.shape[1] == durations.shape[1] + if self.pitches is not None: + pitches = _collate_frames([s.pitch for s in samples], True) + pitches = pitches.index_select(0, order) + assert src_tokens.shape[1] == pitches.shape[1] + if self.energies is not None: + energies = _collate_frames([s.energy for s in samples], True) + energies = energies.index_select(0, order) + assert src_tokens.shape[1] == energies.shape[1] + src_texts = [self.tgt_dict.string(samples[i].target) for i in order] + + return { + "id": id_, + "net_input": { + "src_tokens": src_tokens, + "src_lengths": src_lengths, + "prev_output_tokens": prev_output_tokens, + }, + "speaker": speaker, + "target": traget, + "durations": durations, + "pitches": pitches, + "energies": energies, + "target_lengths": target_lengths, + "ntokens": sum(target_lengths).item(), + "nsentences": len(samples), + "src_texts": src_texts, + } + + +class Text2UnitDatasetCreator(SpeechToTextDatasetCreator): + KEY_DURATION = "duration" + KEY_PITCH = "pitch" + KEY_ENERGY = "energy" + KEY_UNIT = "unit" + + @classmethod + def _from_list( + cls, + split_name: str, + is_train_split, + samples: List[Dict], + cfg: S2TDataConfig, + tgt_dict, + pre_tokenizer, + bpe_tokenizer, + n_frames_per_step, + speaker_to_id, + ) -> Text2UnitDataset: + audio_root = Path(cfg.audio_root) + ids = [s[cls.KEY_ID] for s in samples] + # audio_paths = [(audio_root / s[cls.KEY_AUDIO]).as_posix() for s in samples] + unit_labels = [s[cls.KEY_UNIT] for s in samples] + unit_labels = [ + None if dd is None else [int(d) for d in dd.split(" ")] for dd in unit_labels + ] + 
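+ # the "unit" column holds space-separated discrete unit IDs; the optional
+ # duration/pitch/energy targets read below are kept only if every sample
+ # provides them.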
n_frames = [int(s[cls.KEY_N_FRAMES]) for s in samples] + tgt_texts = [s[cls.KEY_TGT_TEXT] for s in samples] + src_texts = [s.get(cls.KEY_SRC_TEXT, cls.DEFAULT_SRC_TEXT) for s in samples] + speakers = [s.get(cls.KEY_SPEAKER, cls.DEFAULT_SPEAKER) for s in samples] + src_langs = [s.get(cls.KEY_SRC_LANG, cls.DEFAULT_LANG) for s in samples] + tgt_langs = [s.get(cls.KEY_TGT_LANG, cls.DEFAULT_LANG) for s in samples] + + durations = [s.get(cls.KEY_DURATION, None) for s in samples] + durations = [ + None if dd is None else [int(d) for d in dd.split(" ")] for dd in durations + ] + durations = None if any(dd is None for dd in durations) else durations + + pitches = [s.get(cls.KEY_PITCH, None) for s in samples] + pitches = [ + None if pp is None else (audio_root / pp).as_posix() for pp in pitches + ] + pitches = None if any(pp is None for pp in pitches) else pitches + + energies = [s.get(cls.KEY_ENERGY, None) for s in samples] + energies = [ + None if ee is None else (audio_root / ee).as_posix() for ee in energies + ] + energies = None if any(ee is None for ee in energies) else energies + + return Text2UnitDataset( + split_name, + is_train_split, + cfg, + unit_labels, + n_frames, + src_texts, + tgt_texts, + speakers, + src_langs, + tgt_langs, + ids, + tgt_dict, + pre_tokenizer, + bpe_tokenizer, + n_frames_per_step, + speaker_to_id, + durations, + pitches, + energies, + ) diff --git a/SpeechLM/speechlm/data_process/covost2/mp3_to_wav.py b/SpeechLM/speechlm/data_process/covost2/mp3_to_wav.py new file mode 100644 index 0000000000000000000000000000000000000000..7d8056879637947467aa5e8a3c466129c590eecf --- /dev/null +++ b/SpeechLM/speechlm/data_process/covost2/mp3_to_wav.py @@ -0,0 +1,42 @@ +import argparse +from tqdm import tqdm +from pydub import AudioSegment +import torchaudio +import os + +def mp3_convert_wav(mp3_file, wav_file): + try: + sound = AudioSegment.from_mp3(mp3_file) + sound=sound.set_frame_rate(16000) + sound=sound.set_channels(1) + sound=sound.set_sample_width(2) + sound.export(wav_file, format="wav") + except Exception as e: + print(e) + +def main(): + parser = argparse.ArgumentParser() + parser.add_argument("--input", "-i", required=True, type=str) + parser.add_argument("--shard", "-n", required=True, type=int) + parser.add_argument("--rank", "-r", required=True, type=int) + args = parser.parse_args() + + assert args.rank < args.shard, f"rank: {args.rank} >= shard: {args.shard}" + + with open(args.input, 'r') as f: + files = [line.strip() for line in f ] + + mp3_files = files[args.rank::args.shard] + for mp3_file in tqdm(mp3_files): + wav_file = mp3_file.replace("/clips/", "/wav/").replace(".mp3", ".wav") + if os.path.exists(wav_file): + try: + torchaudio.info(wav_file) + except Exception as e: + print(e) + mp3_convert_wav(mp3_file, wav_file) + else: + mp3_convert_wav(mp3_file, wav_file) + +if __name__ == "__main__": + main() diff --git a/SpeechLM/speechlm/data_process/covost2/prepare_covost_data.py b/SpeechLM/speechlm/data_process/covost2/prepare_covost_data.py new file mode 100644 index 0000000000000000000000000000000000000000..687bc9f81922745e8872d3a998a04bc8d9c589ca --- /dev/null +++ b/SpeechLM/speechlm/data_process/covost2/prepare_covost_data.py @@ -0,0 +1,295 @@ +# ---------------------------------------------------------------------------- +# SpeechLM: Enhanced Speech Pre-Training with Unpaired Textual Data (https://arxiv.org/abs/2209.15329) +# Github source: https://github.com/microsoft/SpeechT5/tree/main/SpeechLM +# Code based on fairseq: 
https://github.com/facebookresearch/fairseq/tree/272c4c5197250997148fb12c0db6306035f166a4 +# +# Copyright (c) 2022 Microsoft +# Licensed under The MIT License [see LICENSE for details] +# ---------------------------------------------------------------------------- +""" +Modified from: https://github.com/facebookresearch/fairseq/blob/272c4c5197250997148fb12c0db6306035f166a4/examples/speech_to_text/prep_covost_data.py +1. normalize the punctuation +2. instead of extract fbank features, we direcly use 16k-Hz waveform +""" +import argparse +import logging +from pathlib import Path +from tempfile import NamedTemporaryFile +from typing import Optional, Tuple + +import pandas as pd +import torchaudio +from examples.speech_to_text.data_utils import ( + filter_manifest_df, + gen_config_yaml, + gen_vocab, + load_df_from_tsv, + save_df_to_tsv, +) +from torch import Tensor +from torch.utils.data import Dataset +from torchaudio.datasets.utils import download_url, extract_archive +from tqdm import tqdm +from pydub import AudioSegment +import soundfile as sf +import sacremoses + +log = logging.getLogger(__name__) + + +MANIFEST_COLUMNS = ["id", "audio", "n_frames", "tgt_text"] + + +def mp3_convert_wav(mp3_file, wav_file): + sound = AudioSegment.from_mp3(mp3_file) + sound=sound.set_frame_rate(16000) + sound=sound.set_channels(1) + sound=sound.set_sample_width(2) + sound.export(wav_file, format="wav") + +class CoVoST(Dataset): + """Create a Dataset for CoVoST (https://github.com/facebookresearch/covost). + + Args: + root (str): root path to the dataset and generated manifests/features + source_language (str): source (audio) language + target_language (str, optional): target (text) language, + None for no translation (default: None) + version (int, optional): CoVoST version. (default: 2) + download (bool, optional): Whether to download the dataset if it is not + found at root path. (default: ``False``). + """ + + COVOST_URL_TEMPLATE = ( + "https://dl.fbaipublicfiles.com/covost/" + "covost_v2.{src_lang}_{tgt_lang}.tsv.tar.gz" + ) + + VERSIONS = {2} + SPLITS = ["train", "dev", "test"] + + XX_EN_LANGUAGES = { + 1: ["fr", "de", "nl", "ru", "es", "it", "tr", "fa", "sv-SE", "mn", "zh-CN"], + 2: [ + "fr", + "de", + "es", + "ca", + "it", + "ru", + "zh-CN", + "pt", + "fa", + "et", + "mn", + "nl", + "tr", + "ar", + "sv-SE", + "lv", + "sl", + "ta", + "ja", + "id", + "cy", + ], + } + EN_XX_LANGUAGES = { + 1: [], + 2: [ + "de", + "tr", + "fa", + "sv-SE", + "mn", + "zh-CN", + "cy", + "ca", + "sl", + "et", + "id", + "ar", + "ta", + "lv", + "ja", + ], + } + + def __init__( + self, + root: str, + split: str, + source_language: str, + target_language: Optional[str] = None, + version: int = 2, + ) -> None: + assert version in self.VERSIONS and split in self.SPLITS + assert source_language is not None + self.no_translation = target_language is None + if not self.no_translation: + assert "en" in {source_language, target_language} + if source_language == "en": + assert target_language in self.EN_XX_LANGUAGES[version] + else: + assert source_language in self.XX_EN_LANGUAGES[version] + else: + # Hack here so that we can get "split" column from CoVoST TSV. + # Note that we use CoVoST train split for ASR which is an extension + # to Common Voice train split. 
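+ # The paired language chosen here is arbitrary: it is only needed to
+ # locate and load the CoVoST TSV, and the translation column it provides
+ # is ignored for ASR.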
+ target_language = "de" if source_language == "en" else "en" + + self.root: Path = Path(root) + + cv_tsv_path = self.root / "validated.tsv" + assert cv_tsv_path.is_file() + + covost_url = self.COVOST_URL_TEMPLATE.format( + src_lang=source_language, tgt_lang=target_language + ) + covost_archive = self.root / Path(covost_url).name + if not covost_archive.is_file(): + download_url(covost_url, self.root.as_posix(), hash_value=None) + extract_archive(covost_archive.as_posix()) + + cv_tsv = load_df_from_tsv(cv_tsv_path) + covost_tsv = load_df_from_tsv( + self.root / Path(covost_url).name.replace(".tar.gz", "") + ) + df = pd.merge( + left=cv_tsv[["path", "sentence", "client_id"]], + right=covost_tsv[["path", "translation", "split"]], + how="inner", + on="path", + ) + if split == "train": + df = df[(df["split"] == split) | (df["split"] == f"{split}_covost")] + else: + df = df[df["split"] == split] + data = df.to_dict(orient="index").items() + data = [v for k, v in sorted(data, key=lambda x: x[0])] + self.data = [] + for e in data: + try: + path = self.root / "clips" / e["path"] + _ = torchaudio.info(path.as_posix()) + self.data.append(e) + except RuntimeError: + pass + + self.normalizer = sacremoses.MosesPunctNormalizer( + lang=target_language, + pre_replace_unicode_punct=True, + post_remove_control_chars=True, + ) + + def __getitem__( + self, n: int + ) -> Tuple[Tensor, int, str, str, Optional[str], str, str]: + """Load the n-th sample from the dataset. + + Args: + n (int): The index of the sample to be loaded + + Returns: + tuple: ``(waveform, sample_rate, sentence, translation, speaker_id, + sample_id)`` + """ + data = self.data[n] + path = self.root / "clips" / data["path"] + # waveform, sample_rate = torchaudio.load(path) + sentence = data["sentence"] + translation = None if self.no_translation else data["translation"] + translation = self.normalizer.normalize(translation) + speaker_id = data["client_id"] + _id = data["path"].replace(".mp3", "") + return path, -1, sentence, translation, speaker_id, _id + + def __len__(self) -> int: + return len(self.data) + + +def process(args): + root = Path(args.data_root).absolute() / args.src_lang + outroot = root / f"{args.src_lang}-{args.tgt_lang}" + if args.vocab_type != "char": + outroot = root / f"{args.src_lang}-{args.tgt_lang}-{args.vocab_type}" + if not root.is_dir(): + raise NotADirectoryError(f"{root} does not exist") + #1. Extract featuress + # mp3-to-wav can take long long time, better run it externally with multi threads. + feature_root = root / "wav" + # feature_root.mkdir(exist_ok=True) + # for split in CoVoST.SPLITS: + # print(f"Fetching split {split}...") + # dataset = CoVoST(root, split, args.src_lang, args.tgt_lang) + # print("Converting mp3 to wav...") + # handle = open(root / f"{split}.id", "w") + # for waveform, _, _, _, _, utt_id in tqdm(dataset): + # wav_file = feature_root / f"{utt_id}.wav" + # print(waveform, file=handle) + # mp3_convert_wav(waveform, wav_file) + + #2. 
Generate TSV manifest + print("Generating manifest...") + train_text = [] + task = f"asr_{args.src_lang}" + if args.tgt_lang is not None: + task = f"st_{args.src_lang}_{args.tgt_lang}" + for split in CoVoST.SPLITS: + manifest = {c: [] for c in MANIFEST_COLUMNS} + dataset = CoVoST(root, split, args.src_lang, args.tgt_lang) + for waveform, _, src_utt, tgt_utt, speaker_id, utt_id in tqdm(dataset): + wav_file = feature_root / f"{utt_id}.wav" + manifest["id"].append(utt_id) + manifest["audio"].append(wav_file.as_posix().replace("/data/", "/mnt/default/")) + manifest["n_frames"].append(sf.info(wav_file).frames) + manifest["tgt_text"].append(src_utt if args.tgt_lang is None else tgt_utt) + is_train_split = split.startswith("train") + if is_train_split: + train_text.extend(manifest["tgt_text"]) + df = pd.DataFrame.from_dict(manifest) + df = filter_manifest_df(df, is_train_split=is_train_split, min_n_frames=320, max_n_frames=480000) + save_df_to_tsv(df, outroot / f"{split}_{task}.tsv") + # Generate vocab + vocab_size_str = "" if args.vocab_type == "char" else str(args.vocab_size) + spm_filename_prefix = f"spm_{args.vocab_type}{vocab_size_str}_{task}" + with NamedTemporaryFile(mode="w") as f: + for t in train_text: + f.write(t + "\n") + gen_vocab( + Path(f.name), + outroot / spm_filename_prefix, + args.vocab_type, + args.vocab_size + ) + # Generate config YAML + # gen_config_yaml( + # outroot, + # spm_filename=spm_filename_prefix + ".model", + # yaml_filename=f"config_{task}.yaml", + # specaugment_policy="lb", + # ) + +def main(): + parser = argparse.ArgumentParser() + parser.add_argument( + "--data-root", "-d", required=True, type=str, + help="data root with sub-folders for each language /" + ) + parser.add_argument( + "--vocab-type", + default="unigram", + required=True, + type=str, + choices=["bpe", "unigram", "char"], + ), + parser.add_argument("--vocab-size", default=1000, type=int) + parser.add_argument("--src-lang", "-s", required=True, type=str) + parser.add_argument("--tgt-lang", "-t", type=str) + args = parser.parse_args() + + process(args) + + +if __name__ == "__main__": + main() diff --git a/SpeechLM/speechlm/data_process/filter_paireddata_by_len.py b/SpeechLM/speechlm/data_process/filter_paireddata_by_len.py new file mode 100644 index 0000000000000000000000000000000000000000..ce09af333ae076f3d8088dd47bd0b57c3d720b3a --- /dev/null +++ b/SpeechLM/speechlm/data_process/filter_paireddata_by_len.py @@ -0,0 +1,48 @@ +# ---------------------------------------------------------------------------- +# SpeechLM: Enhanced Speech Pre-Training with Unpaired Textual Data (https://arxiv.org/abs/2209.15329) +# Github source: https://github.com/microsoft/SpeechT5/tree/main/SpeechLM +# Code based on fairseq: https://github.com/facebookresearch/fairseq/tree/272c4c5197250997148fb12c0db6306035f166a4 +# +# Copyright (c) 2022 Microsoft +# Licensed under The MIT License [see LICENSE for details] +# ---------------------------------------------------------------------------- + +import os +import argparse +from tqdm import tqdm +import numpy as np + + +lg_label = "__label__{}" + +def writefile(filename, lines): + with open(filename, 'w', encoding='utf-8') as f: + f.writelines(lines) + + +def main(): + parser = argparse.ArgumentParser() + parser.add_argument("--input", "-i", required=True, type=str) + parser.add_argument("--output", "-o", required=True, type=str) + parser.add_argument("--src", "-s", required=True, type=str) + parser.add_argument("--tgt", "-t", required=True, type=str) + 
parser.add_argument("--max-len", "-m", default=2998, type=int) + args = parser.parse_args() + + src_lines, tgt_lines = [], [] + with open(f"{args.input}.{args.src}", 'r') as f1, open(f"{args.input}.{args.tgt}", 'r') as f2: + for src_line, tgt_line in tqdm(zip(f1, f2)): + src_len = len(src_line.strip().split()) + tgt_len = len(tgt_line.strip().split()) + if src_len < args.max_len and src_len > 0 and tgt_len < args.max_len and tgt_len > 0: + src_lines.append(src_line) + tgt_lines.append(tgt_line) + + writefile(f"{args.output}.{args.src}", src_lines) + writefile(f"{args.output}.{args.tgt}", tgt_lines) + +if __name__ == "__main__": + main() + + + diff --git a/SpeechLM/speechlm/data_process/get_t2u_manifest.py b/SpeechLM/speechlm/data_process/get_t2u_manifest.py new file mode 100644 index 0000000000000000000000000000000000000000..f60b34a1587df779ed17311ee9d257933ae9453e --- /dev/null +++ b/SpeechLM/speechlm/data_process/get_t2u_manifest.py @@ -0,0 +1,119 @@ +# ---------------------------------------------------------------------------- +# SpeechLM: Enhanced Speech Pre-Training with Unpaired Textual Data (https://arxiv.org/abs/2209.15329) +# Github source: https://github.com/microsoft/SpeechT5/tree/main/SpeechLM +# Code based on fairseq: https://github.com/facebookresearch/fairseq/tree/272c4c5197250997148fb12c0db6306035f166a4 +# +# Copyright (c) 2022 Microsoft +# Licensed under The MIT License [see LICENSE for details] +# ---------------------------------------------------------------------------- + +import argparse +import logging +from pathlib import Path +from collections import defaultdict + +import pandas as pd +import torchaudio +from tqdm import tqdm +import numpy as np +import torch + +from fairseq.data.audio.audio_utils import convert_waveform +from examples.speech_to_text.data_utils import save_df_to_tsv +from examples.speech_synthesis.data_utils import extract_pitch + + +log = logging.getLogger(__name__) + +def get_duration(fa_phone): + """fa_phone: force-aligned phone, 1-D numpy""" + same = np.concatenate(([True], fa_phone[:-1] != fa_phone[1:], [True])) + index = np.where(same)[0] + count = np.diff(index) + return count + + + +def process(args): + # assert "train" in args.splits + out_root = Path(args.output_root).absolute() + out_root.mkdir(exist_ok=True) + + print("Fetching data...") + audio_manifest_root = Path(args.audio_manifest_root).absolute() + for s in args.splits: + if args.add_pitch: + pitch_root = out_root / "pitch" / s + pitch_root.mkdir(exist_ok=True) + manifest = defaultdict(list) + with open(audio_manifest_root / f"{s}.audio.tsv") as f1, \ + open(audio_manifest_root / f"{s}.phn") as f2, \ + open(audio_manifest_root / f"{s}.km") as f3: + audio_root = f1.readline().strip() + audio_root = Path(audio_root) + for audio_path, fa_phone, fa_unit in tqdm(zip(f1, f2, f3)): + record = True + audio_path, n_frames = audio_path.strip().split("\t") + fa_phone = fa_phone.strip().split() + fa_unit = fa_unit.strip() + uttid = audio_path.split("/")[-1].split(".")[0] + speaker = uttid.split("-")[0] + + if args.add_duration: + assert len(fa_phone) == len(fa_unit.split()) + fa_phone = np.array(list(map(int, fa_phone))) + duration = get_duration(fa_phone) + reduced_phone = torch.LongTensor(fa_phone).unique_consecutive().numpy() + if args.add_pitch: + pitch_path = pitch_root / f"{uttid}.npy" + if not pitch_path.is_file(): + waveform, sample_rate = torchaudio.load(audio_root / audio_path) + waveform, sample_rate = convert_waveform( + waveform, sample_rate, 
normalize_volume=args.normalize_volume, + ) + pitch = extract_pitch( + waveform, sample_rate, None, + hop_length=args.hop_length, log_scale=True, + phoneme_durations=duration + ) + if pitch is not None: + np.save(pitch_path.as_posix(), pitch) + else: + record = False + else: + reduced_phone = fa_phone + + if record: + manifest["id"].append(uttid) + manifest["speaker"].append(speaker) + manifest["n_frames"].append(len(fa_unit.split())) + manifest["tgt_text"].append(" ".join(map(str, reduced_phone))) + manifest["unit"].append(fa_unit) + if args.add_duration: + manifest["duration"].append(" ".join(map(str, duration))) + if args.add_pitch: + manifest["pitch"].append(f"pitch/{s}/{uttid}.npy") + save_df_to_tsv( + pd.DataFrame.from_dict(manifest), + out_root / f"{s}.tsv" + ) + + + +def main(): + parser = argparse.ArgumentParser() + parser.add_argument("--audio-manifest-root", "-m", type=str) + parser.add_argument("--output-root", "-o", required=True, type=str) + parser.add_argument("--splits", "-s", type=str, nargs="+", + default=["train", "dev", "test"]) + parser.add_argument("--normalize-volume", "-n", action="store_true") + parser.add_argument("--hop-length", type=int, default=256) + parser.add_argument("--add-duration", action="store_true") + parser.add_argument("--add-pitch", action="store_true") + args = parser.parse_args() + + process(args) + + +if __name__ == "__main__": + main() diff --git a/SpeechLM/speechlm/data_process/get_t2u_manifest_textonly.py b/SpeechLM/speechlm/data_process/get_t2u_manifest_textonly.py new file mode 100644 index 0000000000000000000000000000000000000000..b332ecc9a8ca8f5899a7f4d3627aeb2611a768c2 --- /dev/null +++ b/SpeechLM/speechlm/data_process/get_t2u_manifest_textonly.py @@ -0,0 +1,67 @@ +# ---------------------------------------------------------------------------- +# SpeechLM: Enhanced Speech Pre-Training with Unpaired Textual Data (https://arxiv.org/abs/2209.15329) +# Github source: https://github.com/microsoft/SpeechT5/tree/main/SpeechLM +# Code based on fairseq: https://github.com/facebookresearch/fairseq/tree/272c4c5197250997148fb12c0db6306035f166a4 +# +# Copyright (c) 2022 Microsoft +# Licensed under The MIT License [see LICENSE for details] +# ---------------------------------------------------------------------------- + +import argparse +import logging +from pathlib import Path +from collections import defaultdict + +import pandas as pd +from tqdm import tqdm +import numpy as np +from examples.speech_to_text.data_utils import save_df_to_tsv + + +log = logging.getLogger(__name__) + +def get_duration(fa_phone): + """fa_phone: force-aligned phone, 1-D numpy""" + same = np.concatenate(([True], fa_phone[:-1] != fa_phone[1:], [True])) + index = np.where(same)[0] + count = np.diff(index) + return count + +def process(args): + # assert "train" in args.splits + out_root = Path(args.output_root).absolute() + out_root.mkdir(exist_ok=True) + + print("Fetching data...") + audio_manifest_root = Path(args.audio_manifest_root).absolute() + for s in args.splits: + manifest = defaultdict(list) + with open(audio_manifest_root / f"{s}.phn") as f1: + for i, reduced_phone in tqdm(enumerate(f1)): + reduced_phone = reduced_phone.strip() + uttid = f"librilm-{i}" + speaker = uttid.split("-")[0] + + manifest["id"].append(uttid) + manifest["speaker"].append(speaker) + manifest["n_frames"].append(len(reduced_phone)) + manifest["tgt_text"].append(reduced_phone) + manifest["unit"].append(0) + save_df_to_tsv( + pd.DataFrame.from_dict(manifest), + out_root / f"{s}.tsv" + ) + +def 
main(): + parser = argparse.ArgumentParser() + parser.add_argument("--audio-manifest-root", "-m", type=str) + parser.add_argument("--output-root", "-o", required=True, type=str) + parser.add_argument("--splits", "-s", type=str, nargs="+", + default=["train", "dev", "test"]) + parser.add_argument("--add-fastspeech-targets", action="store_true") + args = parser.parse_args() + + process(args) + +if __name__ == "__main__": + main() diff --git a/SpeechLM/speechlm/data_process/phoneize_with_sil.py b/SpeechLM/speechlm/data_process/phoneize_with_sil.py new file mode 100644 index 0000000000000000000000000000000000000000..6fcdd6c18c80b0171965b804e3b4bb9a7ead18e2 --- /dev/null +++ b/SpeechLM/speechlm/data_process/phoneize_with_sil.py @@ -0,0 +1,132 @@ +# ---------------------------------------------------------------------------- +# SpeechLM: Enhanced Speech Pre-Training with Unpaired Textual Data (https://arxiv.org/abs/2209.15329) +# Github source: https://github.com/microsoft/SpeechT5/tree/main/SpeechLM +# Code based on fairseq: https://github.com/facebookresearch/fairseq/tree/272c4c5197250997148fb12c0db6306035f166a4 +# +# Copyright (c) 2022 Microsoft +# Licensed under The MIT License [see LICENSE for details] +# ---------------------------------------------------------------------------- + +""" +Modified from https://github.com/facebookresearch/fairseq/tree/272c4c5197250997148fb12c0db6306035f166a4/examples/wav2vec/unsupervised/scripts/phonemize_with_sil.py +""" + +import argparse +import numpy as np +import sys +from g2p_en import G2p +from tqdm import tqdm +import logging +logging.basicConfig( + format="%(asctime)s | %(levelname)s | %(name)s | %(message)s", + datefmt="%Y-%m-%d %H:%M:%S", + level=logging.INFO, +) +logger = logging.getLogger(__name__) + +def get_parser(): + parser = argparse.ArgumentParser( + description="converts words to phones adding optional silences around in between words" + ) + parser.add_argument( + "--sil-prob", + "-s", + type=float, + default=0, + help="probability of inserting silence between each word", + ) + parser.add_argument( + "--surround", + action="store_true", + help="if set, surrounds each example with silence", + ) + parser.add_argument( + "--lexicon", + help="lexicon to convert to phones", + required=True, + ) + parser.add_argument( + "--strict", + action="store_true", + help="if set, OOV words will raise a error (for train/valid set)", + ) + parser.add_argument( + "--input", + "-i", + help="input text file", + required=True, + ) + parser.add_argument( + "--output", + "-o", + help="input text file", + required=True, + ) + + + return parser + + +def normalize_phn(phons): + """ + convert g2p style phone to 39-phone set + """ + return [p.rstrip('0123456789') for p in phons] + + +def main(): + parser = get_parser() + args = parser.parse_args() + + sil_prob = args.sil_prob + surround = args.surround + sil = "" + + wrd_to_phn = {} + g2p = G2p() + + with open(args.lexicon, "r") as lf: + for line in lf: + items = line.rstrip().split() + assert len(items) > 1, line + assert items[0] not in wrd_to_phn, items + wrd_to_phn[items[0]] = items[1:] + + with open(args.input, "r") as fin, open(args.output, "w", encoding="utf-8") as fout: + for line in tqdm(fin): + words = line.strip().upper().split() + + if not all(w in wrd_to_phn for w in words): + if args.strict: + # logger.warning(f"| Warning: OOV words found: {line}") + pass + else: + continue + + phones = [] + if surround: + phones.append(sil) + + sample_sil_probs = None + if sil_prob > 0 and len(words) > 1: + 
sample_sil_probs = np.random.random(len(words) - 1) + + for i, w in enumerate(words): + if w in wrd_to_phn: + phones.extend(wrd_to_phn[w]) + else: + phones.extend(normalize_phn(g2p(w))) + if ( + sample_sil_probs is not None + and i < len(sample_sil_probs) + and sample_sil_probs[i] < sil_prob + ): + phones.append(sil) + + if surround: + phones.append(sil) + print(" ".join(phones), file=fout) + + +if __name__ == "__main__": + main() diff --git a/SpeechLM/speechlm/data_process/phoneme_tokenizer/ltr2kaldi_phn_sil025.py b/SpeechLM/speechlm/data_process/phoneme_tokenizer/ltr2kaldi_phn_sil025.py new file mode 100644 index 0000000000000000000000000000000000000000..014d0a29c1da416fead72a9235961143892d0326 --- /dev/null +++ b/SpeechLM/speechlm/data_process/phoneme_tokenizer/ltr2kaldi_phn_sil025.py @@ -0,0 +1,77 @@ +# ---------------------------------------------------------------------------- +# SpeechLM: Enhanced Speech Pre-Training with Unpaired Textual Data (https://arxiv.org/abs/2209.15329) +# Github source: https://github.com/microsoft/SpeechT5/tree/main/SpeechLM +# Code based on fairseq: https://github.com/facebookresearch/fairseq/tree/272c4c5197250997148fb12c0db6306035f166a4 +# +# Copyright (c) 2022 Microsoft +# Licensed under The MIT License [see LICENSE for details] +# ---------------------------------------------------------------------------- + +import os +import tqdm +import argparse +import numpy as np + +parser = argparse.ArgumentParser() +parser.add_argument("--input", "-i", required=True, type=str) +parser.add_argument("--output", "-o", required=True, type=str) +parser.add_argument("--lexicon", default='align_lexicon.txt', type=str) +args = parser.parse_args() + +sil_prob = 0.25 + +if not os.path.exists(args.lexicon): + print(f"| Warning: lexicon {args.lexicon} not found, downloading ...") + try: + os.system(f"wget --no-check-certificate 'https://drive.google.com/uc?export=download&id=1QVeyCpLXLnujBUAickpo-jaSVY-vKLnT' -O {args.lexicon}") + except Exception as e: + print(e) + print(f"| Error downloading {args.lexicon}, please download it from https://drive.google.com/file/d/1QVeyCpLXLnujBUAickpo-jaSVY-vKLnT/view?usp=sharing") + exit(1) +dict = {} +f = open(args.lexicon) +for l in f: + dict[l.split()[0]] = l.split()[2:] + assert l.split()[0] == l.split()[1] + +f = open(args.input, 'r') +w_f = open(f'{args.output}.kaldi_phn_sil025', 'w') +w_oov = open(f'{args.output}.kaldi_phn_sil025.oov', 'w') + +oov_nums = 0 +total_nums = 0 +for l in tqdm.tqdm(f): + words = l.strip().replace(" ", "").split("|") + # words = l.strip().upper().split() + words = [w for w in words if w != ''] + + phones = [] + phones.extend(dict['!SIL']) + + sample_sil_probs = None + if sil_prob > 0 and len(words) > 1: + sample_sil_probs = np.random.random(len(words) - 1) + + for i, w in enumerate(words): + total_nums += 1 + if w not in dict: + w = '' + oov_nums += 1 + w_oov.write(w + '\n') + + phones.extend(dict[w]) + + if ( + sample_sil_probs is not None + and i < len(sample_sil_probs) + and sample_sil_probs[i] < sil_prob + ): + phones.extend(dict['!SIL']) + + phones.extend(dict['!SIL']) + w_f.write(' '.join(phones) + '\n') +w_oov.write(f'{oov_nums}\n') +print(f"OOV rate: {oov_nums}/{total_nums}") + +# !!! 
After processing, use this comand to adjust the SIL +### sed -i 's/SIL_S/SIL/g' your_file diff --git a/SpeechLM/speechlm/data_process/phoneme_tokenizer/mean5_and_std25_sil14_spn32.dict b/SpeechLM/speechlm/data_process/phoneme_tokenizer/mean5_and_std25_sil14_spn32.dict new file mode 100644 index 0000000000000000000000000000000000000000..b1957bee8f121f26709dd6d45a657698fd7c1f15 --- /dev/null +++ b/SpeechLM/speechlm/data_process/phoneme_tokenizer/mean5_and_std25_sil14_spn32.dict @@ -0,0 +1 @@ +{"SIL": [14, 7], "AE1_I": [5, 2.5], "P_I": [5, 2.5], "T_I": [5, 2.5], "ER0_E": [5, 2.5], "W_B": [5, 2.5], "AH1_I": [5, 2.5], "N_E": [5, 2.5], "M_B": [5, 2.5], "IH1_I": [5, 2.5], "S_I": [5, 2.5], "IH0_I": [5, 2.5], "Z_E": [5, 2.5], "R_B": [5, 2.5], "EY1_I": [5, 2.5], "CH_I": [5, 2.5], "AH0_I": [5, 2.5], "L_E": [5, 2.5], "L_B": [5, 2.5], "N_I": [5, 2.5], "D_E": [5, 2.5], "IH0_B": [5, 2.5], "S_B": [5, 2.5], "R_I": [5, 2.5], "AY1_I": [5, 2.5], "Z_I": [5, 2.5], "V_I": [5, 2.5], "JH_B": [5, 2.5], "T_E": [5, 2.5], "EH1_I": [5, 2.5], "R_E": [5, 2.5], "DH_B": [5, 2.5], "IY0_E": [5, 2.5], "AE1_B": [5, 2.5], "L_I": [5, 2.5], "IY2_E": [5, 2.5], "OW1_I": [5, 2.5], "D_B": [5, 2.5], "AW1_I": [5, 2.5], "UW1_E": [5, 2.5], "AH0_S": [5, 2.5], "HH_B": [5, 2.5], "AA1_I": [5, 2.5], "OW0_E": [5, 2.5], "F_B": [5, 2.5], "JH_I": [5, 2.5], "TH_E": [5, 2.5], "AO1_B": [5, 2.5], "D_I": [5, 2.5], "ER0_I": [5, 2.5], "AH0_B": [5, 2.5], "IY0_I": [5, 2.5], "IH1_B": [5, 2.5], "AA2_I": [5, 2.5], "S_E": [5, 2.5], "T_B": [5, 2.5], "ER1_I": [5, 2.5], "B_B": [5, 2.5], "AY1_E": [5, 2.5], "UH1_I": [5, 2.5], "K_E": [5, 2.5], "AO1_I": [5, 2.5], "W_I": [5, 2.5], "EY1_E": [5, 2.5], "AH1_E": [5, 2.5], "V_E": [5, 2.5], "OW1_B": [5, 2.5], "K_B": [5, 2.5], "TH_I": [5, 2.5], "B_I": [5, 2.5], "P_B": [5, 2.5], "Y_I": [5, 2.5], "UW1_I": [5, 2.5], "IH0_E": [5, 2.5], "IY1_E": [5, 2.5], "K_I": [5, 2.5], "AO2_I": [5, 2.5], "NG_E": [5, 2.5], "ER1_B": [5, 2.5], "TH_B": [5, 2.5], "IY1_I": [5, 2.5], "AE0_I": [5, 2.5], "AH0_E": [5, 2.5], "M_E": [5, 2.5], "N_B": [5, 2.5], "IY1_B": [5, 2.5], "DH_I": [5, 2.5], "G_I": [5, 2.5], "SH_I": [5, 2.5], "SH_B": [5, 2.5], "P_E": [5, 2.5], "AY1_S": [5, 2.5], "AA1_B": [5, 2.5], "EH1_B": [5, 2.5], "IH2_I": [5, 2.5], "AH1_B": [5, 2.5], "F_E": [5, 2.5], "AW1_B": [5, 2.5], "F_I": [5, 2.5], "EH2_I": [5, 2.5], "JH_E": [5, 2.5], "AY2_I": [5, 2.5], "EY2_E": [5, 2.5], "NG_I": [5, 2.5], "CH_E": [5, 2.5], "EY1_B": [5, 2.5], "AA0_B": [5, 2.5], "Y_B": [5, 2.5], "DH_E": [5, 2.5], "IY2_I": [5, 2.5], "V_B": [5, 2.5], "OY1_I": [5, 2.5], "UW0_E": [5, 2.5], "OW1_E": [5, 2.5], "G_B": [5, 2.5], "AE2_B": [5, 2.5], "M_I": [5, 2.5], "SH_E": [5, 2.5], "IH2_B": [5, 2.5], "AW1_E": [5, 2.5], "ZH_I": [5, 2.5], "ER0_S": [5, 2.5], "AY1_B": [5, 2.5], "AA0_I": [5, 2.5], "G_E": [5, 2.5], "EH0_B": [5, 2.5], "SPN_S": [32, 11], "UW2_I": [5, 2.5], "UW0_I": [5, 2.5], "EY2_I": [5, 2.5], "ER1_E": [5, 2.5], "OW2_I": [5, 2.5], "OW0_I": [5, 2.5], "HH_I": [5, 2.5], "B_E": [5, 2.5], "AO1_E": [5, 2.5], "AH2_B": [5, 2.5], "UH2_I": [5, 2.5], "OW1_S": [5, 2.5], "AO2_B": [5, 2.5], "OY1_E": [5, 2.5], "AE2_I": [5, 2.5], "AO0_B": [5, 2.5], "EH2_B": [5, 2.5], "EY1_S": [5, 2.5], "AE0_B": [5, 2.5], "ER0_B": [5, 2.5], "EH0_I": [5, 2.5], "EY0_I": [5, 2.5], "AW2_E": [5, 2.5], "AW2_I": [5, 2.5], "AY0_B": [5, 2.5], "AA2_B": [5, 2.5], "EY0_E": [5, 2.5], "AO0_I": [5, 2.5], "AY0_I": [5, 2.5], "AH2_I": [5, 2.5], "OW2_E": [5, 2.5], "ZH_E": [5, 2.5], "AY2_E": [5, 2.5], "ER2_I": [5, 2.5], "IY2_B": [5, 2.5], "AA1_S": [5, 2.5], "AA1_E": [5, 2.5], "OY0_I": [5, 2.5], "IY0_B": [5, 2.5], "OY2_E": [5, 
2.5], "OW2_B": [5, 2.5], "AY0_E": [5, 2.5], "OY2_I": [5, 2.5], "UW1_B": [5, 2.5], "OY0_E": [5, 2.5], "UH0_I": [5, 2.5], "OY1_B": [5, 2.5], "AW0_B": [5, 2.5], "AO1_S": [5, 2.5], "OW0_B": [5, 2.5], "EH1_S": [5, 2.5], "AW0_I": [5, 2.5], "UW0_B": [5, 2.5], "AO2_E": [5, 2.5], "UW2_E": [5, 2.5], "L_S": [5, 2.5], "Z_B": [5, 2.5], "AA2_E": [5, 2.5], "EY0_B": [5, 2.5], "AY2_B": [5, 2.5], "AW0_E": [5, 2.5], "IY1_S": [5, 2.5], "EY2_B": [5, 2.5], "AH1_S": [5, 2.5], "IH2_E": [5, 2.5], "AW2_B": [5, 2.5], "AA0_E": [5, 2.5], "ER2_E": [5, 2.5], "ZH_B": [5, 2.5], "UH1_E": [5, 2.5], "EH1_E": [5, 2.5], "IH1_E": [5, 2.5], "ER1_S": [5, 2.5], "EH2_E": [5, 2.5], "AO0_E": [5, 2.5], "OY1_S": [5, 2.5], "AA_B": [5, 2.5], "AA_E": [5, 2.5], "AA_I": [5, 2.5], "AA_S": [5, 2.5], "AA0_S": [5, 2.5], "AA2_S": [5, 2.5], "AE_B": [5, 2.5], "AE_E": [5, 2.5], "AE_I": [5, 2.5], "AE_S": [5, 2.5], "AE0_E": [5, 2.5], "AE0_S": [5, 2.5], "AE1_E": [5, 2.5], "AE1_S": [5, 2.5], "AE2_E": [5, 2.5], "AE2_S": [5, 2.5], "AH_B": [5, 2.5], "AH_E": [5, 2.5], "AH_I": [5, 2.5], "AH_S": [5, 2.5], "AH2_E": [5, 2.5], "AH2_S": [5, 2.5], "AO_B": [5, 2.5], "AO_E": [5, 2.5], "AO_I": [5, 2.5], "AO_S": [5, 2.5], "AO0_S": [5, 2.5], "AO2_S": [5, 2.5], "AW_B": [5, 2.5], "AW_E": [5, 2.5], "AW_I": [5, 2.5], "AW_S": [5, 2.5], "AW0_S": [5, 2.5], "AW1_S": [5, 2.5], "AW2_S": [5, 2.5], "AY_B": [5, 2.5], "AY_E": [5, 2.5], "AY_I": [5, 2.5], "AY_S": [5, 2.5], "AY0_S": [5, 2.5], "AY2_S": [5, 2.5], "B_S": [5, 2.5], "CH_S": [5, 2.5], "D_S": [5, 2.5], "DH_S": [5, 2.5], "EH_B": [5, 2.5], "EH_E": [5, 2.5], "EH_I": [5, 2.5], "EH_S": [5, 2.5], "EH0_E": [5, 2.5], "EH0_S": [5, 2.5], "EH2_S": [5, 2.5], "ER_B": [5, 2.5], "ER_E": [5, 2.5], "ER_I": [5, 2.5], "ER_S": [5, 2.5], "ER2_B": [5, 2.5], "ER2_S": [5, 2.5], "EY_B": [5, 2.5], "EY_E": [5, 2.5], "EY_I": [5, 2.5], "EY_S": [5, 2.5], "EY0_S": [5, 2.5], "EY2_S": [5, 2.5], "F_S": [5, 2.5], "G_S": [5, 2.5], "HH_E": [5, 2.5], "HH_S": [5, 2.5], "IH_B": [5, 2.5], "IH_E": [5, 2.5], "IH_I": [5, 2.5], "IH_S": [5, 2.5], "IH0_S": [5, 2.5], "IH1_S": [5, 2.5], "IH2_S": [5, 2.5], "IY_B": [5, 2.5], "IY_E": [5, 2.5], "IY_I": [5, 2.5], "IY_S": [5, 2.5], "IY0_S": [5, 2.5], "IY2_S": [5, 2.5], "JH_S": [5, 2.5], "K_S": [5, 2.5], "M_S": [5, 2.5], "N_S": [5, 2.5], "NG_B": [5, 2.5], "NG_S": [5, 2.5], "OW_B": [5, 2.5], "OW_E": [5, 2.5], "OW_I": [5, 2.5], "OW_S": [5, 2.5], "OW0_S": [5, 2.5], "OW2_S": [5, 2.5], "OY_B": [5, 2.5], "OY_E": [5, 2.5], "OY_I": [5, 2.5], "OY_S": [5, 2.5], "OY0_B": [5, 2.5], "OY0_S": [5, 2.5], "OY2_B": [5, 2.5], "OY2_S": [5, 2.5], "P_S": [5, 2.5], "R_S": [5, 2.5], "S_S": [5, 2.5], "SH_S": [5, 2.5], "T_S": [5, 2.5], "TH_S": [5, 2.5], "UH_B": [5, 2.5], "UH_E": [5, 2.5], "UH_I": [5, 2.5], "UH_S": [5, 2.5], "UH0_B": [5, 2.5], "UH0_E": [5, 2.5], "UH0_S": [5, 2.5], "UH1_B": [5, 2.5], "UH1_S": [5, 2.5], "UH2_B": [5, 2.5], "UH2_E": [5, 2.5], "UH2_S": [5, 2.5], "UW_B": [5, 2.5], "UW_E": [5, 2.5], "UW_I": [5, 2.5], "UW_S": [5, 2.5], "UW0_S": [5, 2.5], "UW1_S": [5, 2.5], "UW2_B": [5, 2.5], "UW2_S": [5, 2.5], "V_S": [5, 2.5], "W_E": [5, 2.5], "W_S": [5, 2.5], "Y_E": [5, 2.5], "Y_S": [5, 2.5], "Z_S": [5, 2.5], "ZH_S": [5, 2.5]} diff --git a/SpeechLM/speechlm/data_process/phoneme_tokenizer/repeat_withou_insert_sil_less_4375.py b/SpeechLM/speechlm/data_process/phoneme_tokenizer/repeat_withou_insert_sil_less_4375.py new file mode 100644 index 0000000000000000000000000000000000000000..8c443611002a37318fea7c3d6ac42599c3ce3568 --- /dev/null +++ b/SpeechLM/speechlm/data_process/phoneme_tokenizer/repeat_withou_insert_sil_less_4375.py @@ -0,0 +1,41 @@ +# 
---------------------------------------------------------------------------- +# SpeechLM: Enhanced Speech Pre-Training with Unpaired Textual Data (https://arxiv.org/abs/2209.15329) +# Github source: https://github.com/microsoft/SpeechT5/tree/main/SpeechLM +# Code based on fairseq: https://github.com/facebookresearch/fairseq/tree/272c4c5197250997148fb12c0db6306035f166a4 +# +# Copyright (c) 2022 Microsoft +# Licensed under The MIT License [see LICENSE for details] +# ---------------------------------------------------------------------------- + +import sys, json, tqdm +import numpy as np + +input_file = sys.argv[1] +mean_and_std_file = sys.argv[2] +out_file = sys.argv[3] + +mean_and_std = json.load(open(mean_and_std_file, 'r')) + +with open(input_file, 'r') as f, open(out_file, 'w') as w: + for line in tqdm.tqdm(f): + l = line.split() + + new_l = [] + for phn in l: + if phn not in mean_and_std: + mean_and_std[phn] = [5, 2.5] + print(f'unk phone {phn}') + n = max(1, round(np.random.normal(loc=mean_and_std[phn][0], scale=mean_and_std[phn][1]))) + new_l.extend([phn] * int(n)) + + minus = 0 + while len(new_l) >= 4375: + minus += 1 + new_l = [] + for phn in l: + n = max(1, round(mean_and_std[phn][0] - minus)) + new_l.extend([phn] * n) + print(f"too long line try minus {minus}") + + w.write(' '.join(new_l)+'\n') + diff --git a/SpeechLM/speechlm/data_process/prepare_covost2_enxx.sh b/SpeechLM/speechlm/data_process/prepare_covost2_enxx.sh new file mode 100644 index 0000000000000000000000000000000000000000..4d316a453582c8da003234385c709840f8dde855 --- /dev/null +++ b/SpeechLM/speechlm/data_process/prepare_covost2_enxx.sh @@ -0,0 +1,45 @@ + +#!/bin/bash +[ ${PWD##*/} != SpeechLM ] && echo "Error: dir not match! Switch to SpeechLM/ and run it again!" && exit 1 +[ $# -lt 1 ] && echo "Usage: $0 [root=${PWD}/dataset/CommonVoice/v4]" && exit 0 +cwd=${PWD} +src=${PWD}/speechlm/data_process +lang=$1 +root=$2 +[ -z $root ] && root="${PWD}/dataset/CommonVoice/v4" +set -e -o pipefail -u + + +### step1, convert mp3 to wav +cd $root/en && mkdir -p wav +cut -f2 validated.tsv | sed '1d' | sed "s|^|${root}/en/clips/|" > validated.id +for i in $(seq 0 39); do + echo extracting $i; + python $src/covost2/mp3_to_wav.py -i validated.id -n 40 -r $i & +done +wait +cd $cwd + + +### step2, manifest +datadir="$root/en/en-$lang" && mkdir -p $datadir && cd $datadir +python /mnt/default/v-ziqzhang/code/stpretrain_scripts/data_process/covost2/prepare_covost_data.py --data-root $root --src-lang en --tgt-lang $lang --vocab-type char +mv ../*en_${lang}.* ./ + +# adjust config_base_en${lang}.yaml +echo "bpe_tokenizer:" > config_base_en${lang}.yaml +echo " bpe: sentencepiece" >> config_base_en${lang}.yaml +echo " sentencepiece_model: spm_char_st_en_de.model" >> config_base_en${lang}.yaml +echo "" >> config_base_en${lang}.yaml +echo "shuffle: false" >> config_base_en${lang}.yaml +echo "use_audio_input: true" >> config_base_en${lang}.yaml +echo "use_sample_rate: 16000" >> config_base_en${lang}.yaml +echo "standardize_audio: false" >> config_base_en${lang}.yaml +echo "vocab_filename: spm_char_st_en_de.txt" >> config_base_en${lang}.yaml +echo "" >> config_base_en${lang}.yaml +echo "# required by speech_to_text task but never used" >> config_base_en${lang}.yaml +echo "input_channels: 1" >> config_base_en${lang}.yaml +echo "input_feat_per_channel: 1" >> config_base_en${lang}.yaml +echo "" >> config_base_en${lang}.yaml +# adjust config_large_en${lang}.yaml +cat config_base_en${lang}.yaml | sed "s|standardize_audio: false|standardize_audio: 
true|" > config_large_en${lang}.yaml diff --git a/SpeechLM/speechlm/data_process/prepare_phn2ltr_librilm.sh b/SpeechLM/speechlm/data_process/prepare_phn2ltr_librilm.sh new file mode 100644 index 0000000000000000000000000000000000000000..9ffdf81adf2eef3e4548b7ed61cb02fce41bb205 --- /dev/null +++ b/SpeechLM/speechlm/data_process/prepare_phn2ltr_librilm.sh @@ -0,0 +1,57 @@ +#!/bin/bash +[ ${PWD##*/} != SpeechLM ] && echo "Error: dir not match! Switch to SpeechLM/ and run it again!" && exit 1 +cwd=${PWD} +src=${PWD}/speechlm/data_process + +set -e +mkdir -p dataset/LibriLM/phone_unit/tmp && cd dataset/LibriLM + +if [ ! -f librispeech-lm-norm.txt ]; then + echo "--------------------------------------------------------------------------------------" + echo "--------Downloading and unpacking librispeech-lm-norm.txt ..." + echo "--------------------------------------------------------------------------------------" + wget -c https://www.openslr.org/resources/11/librispeech-lm-norm.txt.gz + gzip -d librispeech-lm-norm.txt.gz +fi + +# head -1000000 librispeech-lm-norm.txt > phone_unit/tmp/librispeech-lm-norm.txt +cd phone_unit/ + +echo "--------------------------------------------------------------------------------------" +echo "--------Tokenize the text..." +echo "--------------------------------------------------------------------------------------" +cat ../librispeech-lm-norm.txt | sed '1d' | python $src/wrd2ltr.py > tmp/librilm.ltr + +echo "--------------------------------------------------------------------------------------" +echo "--------Tokenize the text to the kaldi-style phonemes ..." +echo "--------------------------------------------------------------------------------------" +python $src/phoneme_tokenizer/ltr2kaldi_phn_sil025.py -i tmp/librilm.ltr -o tmp/librilm +cat tmp/librilm.kaldi_phn_sil025 | sed 's/SIL_S/SIL/g' > tmp/librilm.phn + +echo "--------------------------------------------------------------------------------------" +echo "--------Filter too long samples and up-sample phonemes ..." +echo "--------------------------------------------------------------------------------------" +python $src/filter_paireddata_by_len.py -i tmp/librilm -o tmp/librilm_l2k -s phn -t ltr -m 2000 +python $src/phoneme_tokenizer/repeat_withou_insert_sil_less_4375.py \ + tmp/librilm_l2k.phn \ + $src/phoneme_tokenizer/mean5_and_std25_sil14_spn32.dict \ + tmp/librilm_l2k_upsample.phn + +mv tmp/librilm_l2k.ltr tmp/librilm_l2k_upsample.ltr +python $src/filter_paireddata_by_len.py -i tmp/librilm_l2k_upsample -o train_text.phn-ltr -s phn -t ltr -m 2800 +### the max-length is set to filter the data, considering the batch size (in Large setting, 900,000/320 = 2812 tokens in a batch). + + +echo "--------------------------------------------------------------------------------------" +echo "--------Create binary files ..." +echo "--------------------------------------------------------------------------------------" +[ ! -f bin-idx/dict.phn.txt ] && echo "dict ${cwd}/dataset/LibriLM/bin-idx/dict.phn.txt not found!" && exit 1 +[ ! -f bin-idx/dict.ltr.txt ] && echo "dict ${cwd}/dataset/LibriLM/bin-idx/dict.ltr.txt not found!" && exit 1 +bash $src/txt2idx.sh train_text.phn-ltr.phn bin-idx bin-idx/dict.phn.txt +bash $src/txt2idx.sh train_text.phn-ltr.ltr bin-idx bin-idx/dict.ltr.txt + +rm -r tmp +cd - +echo "--------------------------------------------------------------------------------------" +echo "--------Done! 
files are in ${PWD}/dataset/LibriLM" +echo "--------------------------------------------------------------------------------------" diff --git a/SpeechLM/speechlm/data_process/txt2idx.sh b/SpeechLM/speechlm/data_process/txt2idx.sh new file mode 100644 index 0000000000000000000000000000000000000000..4442bf94cc481cc67694289c9d8b128a398d84c4 --- /dev/null +++ b/SpeechLM/speechlm/data_process/txt2idx.sh @@ -0,0 +1,30 @@ +#!/bin/bash +[ $# -lt 3 ] && echo "Usage: $0 " && exit 0 + +input=$1 +outdir=$2 +DICT=$3 +suffix=$4 +outname=${input##*/} +outname=${outname%.txt*} +[ -z $input ] && echo "You must specify a source file" && exit 1 + +[ -z $DICT ] && echo "No dict was specified!" && exit 1 +[ -z $outdir ] && outdir=${input%/*} +[ -z $outdir ] && outdir="." +[ ! -d $outdir ] && mkdir -p $outdir + +echo "------------------------------- creating idx/bin--------------------------------------------" +echo "$input --> $outdir/${outname}${suffix}.idx" +fairseq-preprocess \ + --only-source \ + --trainpref $input \ + --destdir $outdir \ + --thresholdsrc 0 \ + --srcdict ${DICT} \ + --workers 40 + +mv $outdir/train.idx $outdir/${outname}${suffix}.idx +mv $outdir/train.bin $outdir/${outname}${suffix}.bin +echo "----------------------------------- done --------------------------------------------" + diff --git a/SpeechLM/speechlm/data_process/wrd2ltr.py b/SpeechLM/speechlm/data_process/wrd2ltr.py new file mode 100644 index 0000000000000000000000000000000000000000..8aa48e62e6c4b302a73a3fc3656b9c78b7e06ea9 --- /dev/null +++ b/SpeechLM/speechlm/data_process/wrd2ltr.py @@ -0,0 +1,12 @@ +import sys + +def main(): + for line in sys.stdin: + line = line.replace("", "") + line = " ".join(line.strip().split()) + line = line.replace(" ", "|").upper() + "|" + print(" ".join(line)) + +if __name__ == "__main__": + main() + diff --git a/SpeechLM/speechlm/generate_unit.py b/SpeechLM/speechlm/generate_unit.py new file mode 100644 index 0000000000000000000000000000000000000000..690ea28c41021c278c927a79f4bf508229111e22 --- /dev/null +++ b/SpeechLM/speechlm/generate_unit.py @@ -0,0 +1,412 @@ +# ---------------------------------------------------------------------------- +# SpeechLM: Enhanced Speech Pre-Training with Unpaired Textual Data (https://arxiv.org/abs/2209.15329) +# Github source: https://github.com/microsoft/SpeechT5/tree/main/SpeechLM +# Code based on fairseq: https://github.com/facebookresearch/fairseq/tree/272c4c5197250997148fb12c0db6306035f166a4 +# +# Copyright (c) 2022 Microsoft +# Licensed under The MIT License [see LICENSE for details] +# ---------------------------------------------------------------------------- + +""" +Modified form: https://github.com/facebookresearch/fairseq/blob/272c4c5197250997148fb12c0db6306035f166a4/fairseq_cli/generate.py +""" + +import ast +import logging +import math +import os +import sys +from argparse import Namespace +from itertools import chain + +import numpy as np +import torch +from omegaconf import DictConfig + +from fairseq import checkpoint_utils, options, scoring, tasks, utils +from fairseq.dataclass.utils import convert_namespace_to_omegaconf +from fairseq.logging import progress_bar +from fairseq.logging.meters import StopwatchMeter, TimeMeter + + +def main(cfg: DictConfig): + + if isinstance(cfg, Namespace): + cfg = convert_namespace_to_omegaconf(cfg) + + assert cfg.common_eval.path is not None, "--path required for generation!" 
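+    # The remaining sanity checks mirror fairseq's generate.py: sampling only
+    # makes sense when --nbest equals --beam, and --replace-unk requires a raw
+    # text dataset so the original source strings can be retrieved later.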
+ assert ( + not cfg.generation.sampling or cfg.generation.nbest == cfg.generation.beam + ), "--sampling requires --nbest to be equal to --beam" + assert ( + cfg.generation.replace_unk is None or cfg.dataset.dataset_impl == "raw" + ), "--replace-unk requires a raw text dataset (--dataset-impl=raw)" + + if cfg.common_eval.results_path is not None: + os.makedirs(cfg.common_eval.results_path, exist_ok=True) + output_path = os.path.join( + cfg.common_eval.results_path, + "generate-{}.txt".format(cfg.dataset.gen_subset), + ) + with open(output_path, "w", buffering=1, encoding="utf-8") as h: + return _main(cfg, h) + else: + return _main(cfg, sys.stdout) + + +def get_symbols_to_strip_from_output(generator): + if hasattr(generator, "symbols_to_strip_from_output"): + return generator.symbols_to_strip_from_output + else: + return {generator.eos} + + +def _main(cfg: DictConfig, output_file): + logging.basicConfig( + format="%(asctime)s | %(levelname)s | %(name)s | %(message)s", + datefmt="%Y-%m-%d %H:%M:%S", + level=os.environ.get("LOGLEVEL", "INFO").upper(), + stream=output_file, + ) + logger = logging.getLogger("fairseq_cli.generate") + + utils.import_user_module(cfg.common) + + if cfg.dataset.max_tokens is None and cfg.dataset.batch_size is None: + cfg.dataset.max_tokens = 12000 + logger.info(cfg) + + # Fix seed for stochastic decoding + if cfg.common.seed is not None and not cfg.generation.no_seed_provided: + np.random.seed(cfg.common.seed) + utils.set_torch_seed(cfg.common.seed) + + use_cuda = torch.cuda.is_available() and not cfg.common.cpu + + # Load dataset splits + task = tasks.setup_task(cfg.task) + + # Set dictionaries + try: + src_dict = getattr(task, "source_dictionary", None) + except NotImplementedError: + src_dict = None + tgt_dict = task.target_dictionary + + overrides = ast.literal_eval(cfg.common_eval.model_overrides) + + # Load ensemble + logger.info("loading model(s) from {}".format(cfg.common_eval.path)) + models, saved_cfg = checkpoint_utils.load_model_ensemble( + utils.split_paths(cfg.common_eval.path), + arg_overrides=overrides, + task=task, + suffix=cfg.checkpoint.checkpoint_suffix, + strict=(cfg.checkpoint.checkpoint_shard_count == 1), + num_shards=cfg.checkpoint.checkpoint_shard_count, + ) + + # loading the dataset should happen after the checkpoint has been loaded so we can give it the saved task config + task.load_dataset(cfg.dataset.gen_subset, task_cfg=saved_cfg.task) + + if cfg.generation.lm_path is not None: + overrides["data"] = cfg.task.data + + try: + lms, _ = checkpoint_utils.load_model_ensemble( + [cfg.generation.lm_path], arg_overrides=overrides, task=None + ) + except: + logger.warning( + f"Failed to load language model! 
Please make sure that the language model dict is the same " + f"as target dict and is located in the data dir ({cfg.task.data})" + ) + raise + + assert len(lms) == 1 + else: + lms = [None] + + # Optimize ensemble for generation + for model in chain(models, lms): + if model is None: + continue + if cfg.common.fp16: + model.half() + if use_cuda and not cfg.distributed_training.pipeline_model_parallel: + model.cuda() + model.prepare_for_inference_(cfg) + + def _fp_convert_sample(sample): + def apply_half(t): + if t.dtype is torch.float32: + return t.to(dtype=torch.half) + return t + + def apply_bfloat16(t): + if t.dtype is torch.float32: + return t.to(dtype=torch.bfloat16) + return t + + if cfg.common.fp16: + sample = utils.apply_to_sample(apply_half, sample) + + if cfg.common.bf16: + sample = utils.apply_to_sample(apply_bfloat16, sample) + + return sample + + # Load alignment dictionary for unknown word replacement + # (None if no unknown word replacement, empty if no path to align dictionary) + align_dict = utils.load_align_dict(cfg.generation.replace_unk) + + # Load dataset (possibly sharded) + itr = task.get_batch_iterator( + dataset=task.dataset(cfg.dataset.gen_subset), + max_tokens=cfg.dataset.max_tokens, + max_sentences=cfg.dataset.batch_size, + max_positions=utils.resolve_max_positions( + task.max_positions(), *[m.max_positions() for m in models] + ), + ignore_invalid_inputs=cfg.dataset.skip_invalid_size_inputs_valid_test, + required_batch_size_multiple=cfg.dataset.required_batch_size_multiple, + seed=cfg.common.seed, + num_shards=cfg.distributed_training.distributed_world_size, + shard_id=cfg.distributed_training.distributed_rank, + num_workers=cfg.dataset.num_workers, + data_buffer_size=cfg.dataset.data_buffer_size, + ).next_epoch_itr(shuffle=False) + progress = progress_bar.progress_bar( + itr, + log_format=cfg.common.log_format, + log_interval=cfg.common.log_interval, + default_log_format=("tqdm" if not cfg.common.no_progress_bar else "simple"), + ) + + # Initialize generator + gen_timer = StopwatchMeter() + + extra_gen_cls_kwargs = {"lm_model": lms[0], "lm_weight": cfg.generation.lm_weight} + generator = task.build_generator( + models, cfg.generation, extra_gen_cls_kwargs=extra_gen_cls_kwargs + ) + + # Handle tokenization and BPE + tokenizer = task.build_tokenizer(cfg.tokenizer) + bpe = task.build_bpe(cfg.bpe) + + def decode_fn(x): + if bpe is not None: + x = bpe.decode(x) + if tokenizer is not None: + x = tokenizer.decode(x) + return x + + scorer = scoring.build_scorer(cfg.scoring, None) + + num_sentences = 0 + has_target = True + wps_meter = TimeMeter() + for sample in progress: + sample = utils.move_to_cuda(sample) if use_cuda else sample + sample = _fp_convert_sample(sample) + if "net_input" not in sample: + continue + + prefix_tokens = None + if cfg.generation.prefix_size > 0: + prefix_tokens = sample["target"][:, : cfg.generation.prefix_size] + + constraints = None + if "constraints" in sample: + constraints = sample["constraints"] + + gen_timer.start() + hypos = task.inference_step( + generator, + models[0], + sample, + prefix_tokens=prefix_tokens, + constraints=constraints, + ) + num_generated_tokens = sum(len(h["unit"]) for h in hypos) + gen_timer.stop(num_generated_tokens) + + for i, sample_id in enumerate(sample["id"].tolist()): + has_target = sample["target"] is not None + + # Remove padding + if "src_tokens" in sample["net_input"]: + src_tokens = utils.strip_pad( + sample["net_input"]["src_tokens"][i, :], tgt_dict.pad() + ).cpu() + else: + src_tokens = None + + 
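+            # Strip padding from the reference tokens so they can be printed as
+            # the T-* line and scored against the detokenized hypothesis below.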
target_tokens = None + if has_target: + target_tokens = ( + utils.strip_pad(sample["target"][i, :], tgt_dict.pad()).cpu() + ) + + # Either retrieve the original sentences or regenerate them from tokens. + if align_dict is not None: + src_str = task.dataset(cfg.dataset.gen_subset).src.get_original_text( + sample_id + ) + target_str = task.dataset(cfg.dataset.gen_subset).tgt.get_original_text( + sample_id + ) + else: + if src_dict is not None: + src_str = src_dict.string(src_tokens, cfg.common_eval.post_process) + else: + src_str = "" + if has_target: + target_str = " ".join(map(str, target_tokens.numpy().tolist())) + + src_str = decode_fn(src_str) + + if not cfg.common_eval.quiet: + if src_dict is not None: + print("S-{}\t{}".format(sample_id, src_str), file=output_file) + if has_target: + print("T-{}\t{}".format(sample_id, target_str), file=output_file) + + # Process top predictions + j = 0 + hypo = hypos[i] + hypo_tokens = hypo["unit"].int().cpu() + hypo_str = " ".join(map(str, hypo_tokens.numpy().tolist())) + alignment = None + detok_hypo_str = hypo_str + # add duration prediction + hypo_duration = " ".join(map(str, hypo["duration"].int().cpu().numpy().tolist())) + hypo_fa_src_str = src_dict.string(hypo["fa_src"].cpu().numpy(), cfg.common_eval.post_process) + # hypo_fa_src_str = " ".join(map(str, hypo["fa_src"].int().cpu().numpy() - 4)) + + if not cfg.common_eval.quiet: + # score = hypo["score"] / math.log(2) # convert to base 2 + score = 0.00 + # original hypothesis (after tokenization and BPE) + # print( + # "H-{}\t{}\t{}".format(sample_id, score, hypo_str), + # file=output_file, + # ) + # detokenized hypothesis + print( + "D-{}\t{}\t{}".format(sample_id, score, detok_hypo_str), + file=output_file, + ) + # duration prediction + print( + "L-{}\t{}\t{}".format(sample_id, score, hypo_duration), + file=output_file, + ) + # force-aligned upsampled src-tokens + print( + "U-{}\t{}\t{}".format(sample_id, score, hypo_fa_src_str), + file=output_file, + ) + # print( + # "P-{}\t{}".format( + # sample_id, + # " ".join( + # map( + # lambda x: "{:.4f}".format(x), + # # convert from base e to base 2 + # hypo["positional_scores"] + # .div_(math.log(2)) + # .tolist(), + # ) + # ), + # ), + # file=output_file, + # ) + + if cfg.generation.print_alignment == "hard": + print( + "A-{}\t{}".format( + sample_id, + " ".join( + [ + "{}-{}".format(src_idx, tgt_idx) + for src_idx, tgt_idx in alignment + ] + ), + ), + file=output_file, + ) + if cfg.generation.print_alignment == "soft": + print( + "A-{}\t{}".format( + sample_id, + " ".join( + [",".join(src_probs) for src_probs in alignment] + ), + ), + file=output_file, + ) + + + # Score only the top hypothesis + if has_target and j == 0: + if hasattr(scorer, "add_string"): + scorer.add_string(target_str, detok_hypo_str) + else: + scorer.add(target_tokens, hypo_tokens) + + wps_meter.update(num_generated_tokens) + progress.log({"wps": round(wps_meter.avg)}) + num_sentences += ( + sample["nsentences"] if "nsentences" in sample else sample["id"].numel() + ) + + logger.info("NOTE: hypothesis and token scores are output in base 2") + logger.info( + "Translated {:,} sentences ({:,} tokens) in {:.1f}s ({:.2f} sentences/s, {:.2f} tokens/s)".format( + num_sentences, + gen_timer.n, + gen_timer.sum, + num_sentences / gen_timer.sum, + 1.0 / gen_timer.avg, + ) + ) + if has_target: + if cfg.bpe and not cfg.generation.sacrebleu: + if cfg.common_eval.post_process: + logger.warning( + "BLEU score is being computed by splitting detokenized string on spaces, this is probably not what 
you want. Use --sacrebleu for standard 13a BLEU tokenization" + ) + else: + logger.warning( + "If you are using BPE on the target side, the BLEU score is computed on BPE tokens, not on proper words. Use --sacrebleu for standard 13a BLEU tokenization" + ) + # use print to be consistent with other main outputs: S-, H-, T-, D- and so on + print( + "Generate {} with beam={}: {}".format( + cfg.dataset.gen_subset, cfg.generation.beam, scorer.result_string() + ), + file=output_file, + ) + + return scorer + + +def cli_main(): + parser = options.get_generation_parser() + # TODO: replace this workaround with refactoring of `AudioPretraining` + parser.add_argument( + "--arch", + "-a", + metavar="ARCH", + default="wav2vec2", + help="Model architecture. For constructing tasks that rely on " + "model args (e.g. `AudioPretraining`)", + ) + args = options.parse_args_and_arch(parser) + main(args) + + +if __name__ == "__main__": + cli_main() diff --git a/SpeechLM/speechlm/infer.py b/SpeechLM/speechlm/infer.py new file mode 100644 index 0000000000000000000000000000000000000000..ab80c15f8986233d004f3eae270e90c0cf1d5709 --- /dev/null +++ b/SpeechLM/speechlm/infer.py @@ -0,0 +1,486 @@ +# ---------------------------------------------------------------------------- +# SpeechLM: Enhanced Speech Pre-Training with Unpaired Textual Data (https://arxiv.org/abs/2209.15329) +# Github source: https://github.com/microsoft/SpeechT5/tree/main/SpeechLM +# Code based on fairseq: https://github.com/facebookresearch/fairseq/tree/272c4c5197250997148fb12c0db6306035f166a4 +# +# Copyright (c) 2022 Microsoft +# Licensed under The MIT License [see LICENSE for details] +# ---------------------------------------------------------------------------- + +""" +Modified form: https://github.com/facebookresearch/fairseq/blob/272c4c5197250997148fb12c0db6306035f166a4/examples/speech_recognition/new/infer.py +1. 
add "utils.import_user_module(cfg.common)" so that usr-dir can be loaded +""" + +import ast +import hashlib +import logging +import os +import shutil +import sys +from dataclasses import dataclass, field, is_dataclass +from pathlib import Path +from typing import Any, Dict, List, Optional, Tuple, Union + +import editdistance +import torch +import torch.distributed as dist +import examples +from examples.speech_recognition.new.decoders.decoder_config import ( + DecoderConfig, + FlashlightDecoderConfig, +) +from examples.speech_recognition.new.decoders.decoder import Decoder +from fairseq import checkpoint_utils, distributed_utils, progress_bar, tasks, utils +from fairseq.data.data_utils import post_process +from fairseq.dataclass.configs import ( + CheckpointConfig, + CommonConfig, + CommonEvalConfig, + DatasetConfig, + DistributedTrainingConfig, + FairseqDataclass, +) +from fairseq.logging.meters import StopwatchMeter, TimeMeter +from fairseq.logging.progress_bar import BaseProgressBar +from fairseq.models.fairseq_model import FairseqModel +from omegaconf import OmegaConf + +import hydra +from hydra.core.config_store import ConfigStore + +logging.root.setLevel(logging.INFO) +logging.basicConfig(level=logging.INFO) +logger = logging.getLogger(__name__) + +config_path = Path(examples.speech_recognition.new.__path__[0]).resolve() / "conf" + + +@dataclass +class DecodingConfig(DecoderConfig, FlashlightDecoderConfig): + unique_wer_file: bool = field( + default=False, + metadata={"help": "If set, use a unique file for storing WER"}, + ) + results_path: Optional[str] = field( + default=None, + metadata={ + "help": "If set, write hypothesis and reference sentences into this directory" + }, + ) + + +@dataclass +class InferConfig(FairseqDataclass): + task: Any = None + decoding: DecodingConfig = DecodingConfig() + common: CommonConfig = CommonConfig() + common_eval: CommonEvalConfig = CommonEvalConfig() + checkpoint: CheckpointConfig = CheckpointConfig() + distributed_training: DistributedTrainingConfig = DistributedTrainingConfig() + dataset: DatasetConfig = DatasetConfig() + is_ax: bool = field( + default=False, + metadata={ + "help": "if true, assumes we are using ax for tuning and returns a tuple for ax to consume" + }, + ) + + +def reset_logging(): + root = logging.getLogger() + for handler in root.handlers: + root.removeHandler(handler) + root.setLevel(os.environ.get("LOGLEVEL", "INFO").upper()) + handler = logging.StreamHandler(sys.stdout) + handler.setFormatter( + logging.Formatter( + fmt="%(asctime)s | %(levelname)s | %(name)s | %(message)s", + datefmt="%Y-%m-%d %H:%M:%S", + ) + ) + root.addHandler(handler) + + +class InferenceProcessor: + cfg: InferConfig + + def __init__(self, cfg: InferConfig) -> None: + self.cfg = cfg + self.task = tasks.setup_task(cfg.task) + + models, saved_cfg = self.load_model_ensemble() + self.models = models + self.saved_cfg = saved_cfg + self.tgt_dict = self.task.target_dictionary + + self.task.load_dataset( + self.cfg.dataset.gen_subset, + task_cfg=saved_cfg.task, + ) + self.generator = Decoder(cfg.decoding, self.tgt_dict) + self.gen_timer = StopwatchMeter() + self.wps_meter = TimeMeter() + self.num_sentences = 0 + self.total_errors = 0 + self.total_length = 0 + + self.hypo_words_file = None + self.hypo_units_file = None + self.ref_words_file = None + self.ref_units_file = None + + self.progress_bar = self.build_progress_bar() + + def __enter__(self) -> "InferenceProcessor": + if self.cfg.decoding.results_path is not None: + self.hypo_words_file = 
self.get_res_file("hypo.word") + self.hypo_units_file = self.get_res_file("hypo.units") + self.ref_words_file = self.get_res_file("ref.word") + self.ref_units_file = self.get_res_file("ref.units") + return self + + def __exit__(self, *exc) -> bool: + if self.cfg.decoding.results_path is not None: + self.hypo_words_file.close() + self.hypo_units_file.close() + self.ref_words_file.close() + self.ref_units_file.close() + return False + + def __iter__(self) -> Any: + for sample in self.progress_bar: + if not self.cfg.common.cpu: + sample = utils.move_to_cuda(sample) + + # Happens on the last batch. + if "net_input" not in sample: + continue + yield sample + + def log(self, *args, **kwargs): + self.progress_bar.log(*args, **kwargs) + + def print(self, *args, **kwargs): + self.progress_bar.print(*args, **kwargs) + + def get_res_file(self, fname: str) -> None: + fname = os.path.join(self.cfg.decoding.results_path, fname) + if self.data_parallel_world_size > 1: + fname = f"{fname}.{self.data_parallel_rank}" + return open(fname, "w", buffering=1) + + def merge_shards(self) -> None: + """Merges all shard files into shard 0, then removes shard suffix.""" + + shard_id = self.data_parallel_rank + num_shards = self.data_parallel_world_size + + if self.data_parallel_world_size > 1: + + def merge_shards_with_root(fname: str) -> None: + fname = os.path.join(self.cfg.decoding.results_path, fname) + logger.info("Merging %s on shard %d", fname, shard_id) + base_fpath = Path(f"{fname}.0") + with open(base_fpath, "a") as out_file: + for s in range(1, num_shards): + shard_fpath = Path(f"{fname}.{s}") + with open(shard_fpath, "r") as in_file: + for line in in_file: + out_file.write(line) + shard_fpath.unlink() + shutil.move(f"{fname}.0", fname) + + dist.barrier() # ensure all shards finished writing + if shard_id == (0 % num_shards): + merge_shards_with_root("hypo.word") + if shard_id == (1 % num_shards): + merge_shards_with_root("hypo.units") + if shard_id == (2 % num_shards): + merge_shards_with_root("ref.word") + if shard_id == (3 % num_shards): + merge_shards_with_root("ref.units") + dist.barrier() + + def optimize_model(self, model: FairseqModel) -> None: + model.make_generation_fast_() + if self.cfg.common.fp16: + model.half() + if not self.cfg.common.cpu: + model.cuda() + + def load_model_ensemble(self) -> Tuple[List[FairseqModel], FairseqDataclass]: + arg_overrides = ast.literal_eval(self.cfg.common_eval.model_overrides) + models, saved_cfg = checkpoint_utils.load_model_ensemble( + utils.split_paths(self.cfg.common_eval.path, separator="\\"), + arg_overrides=arg_overrides, + task=self.task, + suffix=self.cfg.checkpoint.checkpoint_suffix, + strict=(self.cfg.checkpoint.checkpoint_shard_count == 1), + num_shards=self.cfg.checkpoint.checkpoint_shard_count, + ) + for model in models: + self.optimize_model(model) + return models, saved_cfg + + def get_dataset_itr(self, disable_iterator_cache: bool = False) -> None: + return self.task.get_batch_iterator( + dataset=self.task.dataset(self.cfg.dataset.gen_subset), + max_tokens=self.cfg.dataset.max_tokens, + max_sentences=self.cfg.dataset.batch_size, + max_positions=(sys.maxsize, sys.maxsize), + ignore_invalid_inputs=self.cfg.dataset.skip_invalid_size_inputs_valid_test, + required_batch_size_multiple=self.cfg.dataset.required_batch_size_multiple, + seed=self.cfg.common.seed, + num_shards=self.data_parallel_world_size, + shard_id=self.data_parallel_rank, + num_workers=self.cfg.dataset.num_workers, + data_buffer_size=self.cfg.dataset.data_buffer_size, + 
disable_iterator_cache=disable_iterator_cache, + ).next_epoch_itr(shuffle=False) + + def build_progress_bar( + self, + epoch: Optional[int] = None, + prefix: Optional[str] = None, + default_log_format: str = "tqdm", + ) -> BaseProgressBar: + return progress_bar.progress_bar( + iterator=self.get_dataset_itr(), + log_format=self.cfg.common.log_format, + log_interval=self.cfg.common.log_interval, + epoch=epoch, + prefix=prefix, + tensorboard_logdir=self.cfg.common.tensorboard_logdir, + default_log_format=default_log_format, + ) + + @property + def data_parallel_world_size(self): + if self.cfg.distributed_training.distributed_world_size == 1: + return 1 + return distributed_utils.get_data_parallel_world_size() + + @property + def data_parallel_rank(self): + if self.cfg.distributed_training.distributed_world_size == 1: + return 0 + return distributed_utils.get_data_parallel_rank() + + def process_sentence( + self, + sample: Dict[str, Any], + hypo: Dict[str, Any], + sid: int, + batch_id: int, + ) -> Tuple[int, int]: + speaker = None # Speaker can't be parsed from dataset. + + if "target_label" in sample: + toks = sample["target_label"] + else: + toks = sample["target"] + toks = toks[batch_id, :] + + # Processes hypothesis. + hyp_pieces = self.tgt_dict.string(hypo["tokens"].int().cpu()) + if "words" in hypo: + hyp_words = " ".join(hypo["words"]) + else: + hyp_words = post_process(hyp_pieces, self.cfg.common_eval.post_process) + + # Processes target. + target_tokens = utils.strip_pad(toks, self.tgt_dict.pad()) + tgt_pieces = self.tgt_dict.string(target_tokens.int().cpu()) + tgt_words = post_process(tgt_pieces, self.cfg.common_eval.post_process) + + if self.cfg.decoding.results_path is not None: + print(f"{hyp_pieces} ({speaker}-{sid})", file=self.hypo_units_file) + print(f"{hyp_words} ({speaker}-{sid})", file=self.hypo_words_file) + print(f"{tgt_pieces} ({speaker}-{sid})", file=self.ref_units_file) + print(f"{tgt_words} ({speaker}-{sid})", file=self.ref_words_file) + + if not self.cfg.common_eval.quiet: + logger.info(f"HYPO: {hyp_words}") + logger.info(f"REF: {tgt_words}") + logger.info("---------------------") + + hyp_words, tgt_words = hyp_words.split(), tgt_words.split() + + return editdistance.eval(hyp_words, tgt_words), len(tgt_words) + + def process_sample(self, sample: Dict[str, Any]) -> None: + self.gen_timer.start() + hypos = self.task.inference_step( + generator=self.generator, + models=self.models, + sample=sample, + ) + num_generated_tokens = sum(len(h[0]["tokens"]) for h in hypos) + self.gen_timer.stop(num_generated_tokens) + self.wps_meter.update(num_generated_tokens) + + for batch_id, sample_id in enumerate(sample["id"].tolist()): + errs, length = self.process_sentence( + sample=sample, + sid=sample_id, + batch_id=batch_id, + hypo=hypos[batch_id][0], + ) + self.total_errors += errs + self.total_length += length + + self.log({"wps": round(self.wps_meter.avg)}) + if "nsentences" in sample: + self.num_sentences += sample["nsentences"] + else: + self.num_sentences += sample["id"].numel() + + def log_generation_time(self) -> None: + logger.info( + "Processed %d sentences (%d tokens) in %.1fs %.2f " + "sentences per second, %.2f tokens per second)", + self.num_sentences, + self.gen_timer.n, + self.gen_timer.sum, + self.num_sentences / (self.gen_timer.sum + 1e-6), + 1.0 / (self.gen_timer.avg + 1e-6), + ) + + +def parse_wer(wer_file: Path) -> float: + with open(wer_file, "r") as f: + return float(f.readline().strip().split(" ")[1]) + + +def get_wer_file(cfg: InferConfig) -> Path: + 
"""Hashes the decoding parameters to a unique file ID.""" + base_path = "wer" + if cfg.decoding.results_path is not None: + base_path = os.path.join(cfg.decoding.results_path, base_path) + + if cfg.decoding.unique_wer_file: + yaml_str = OmegaConf.to_yaml(cfg.decoding) + fid = int(hashlib.md5(yaml_str.encode("utf-8")).hexdigest(), 16) + return Path(f"{base_path}.{fid % 1000000}") + else: + return Path(base_path) + + +def main(cfg: InferConfig) -> float: + """Entry point for main processing logic. + + Args: + cfg: The inferance configuration to use. + wer: Optional shared memory pointer for returning the WER. If not None, + the final WER value will be written here instead of being returned. + + Returns: + The final WER if `wer` is None, otherwise None. + """ + + utils.import_user_module(cfg.common) + + yaml_str, wer_file = OmegaConf.to_yaml(cfg.decoding), get_wer_file(cfg) + + # Validates the provided configuration. + if cfg.dataset.max_tokens is None and cfg.dataset.batch_size is None: + cfg.dataset.max_tokens = 4000000 + if not cfg.common.cpu and not torch.cuda.is_available(): + raise ValueError("CUDA not found; set `cpu=True` to run without CUDA") + + logger.info(cfg.common_eval.path) + + with InferenceProcessor(cfg) as processor: + for sample in processor: + processor.process_sample(sample) + + processor.log_generation_time() + + if cfg.decoding.results_path is not None: + processor.merge_shards() + + errs_t, leng_t = processor.total_errors, processor.total_length + + if cfg.common.cpu: + logger.warning("Merging WER requires CUDA.") + elif processor.data_parallel_world_size > 1: + stats = torch.LongTensor([errs_t, leng_t]).cuda() + dist.all_reduce(stats, op=dist.ReduceOp.SUM) + errs_t, leng_t = stats[0].item(), stats[1].item() + + wer = errs_t * 100.0 / leng_t + + if distributed_utils.is_master(cfg.distributed_training): + with open(wer_file, "w") as f: + f.write( + ( + f"WER: {wer}\n" + f"err / num_ref_words = {errs_t} / {leng_t}\n\n" + f"{yaml_str}" + ) + ) + + return wer + + +@hydra.main(config_path=config_path, config_name="infer") +def hydra_main(cfg: InferConfig) -> Union[float, Tuple[float, Optional[float]]]: + container = OmegaConf.to_container(cfg, resolve=True, enum_to_str=True) + cfg = OmegaConf.create(container) + OmegaConf.set_struct(cfg, True) + + if cfg.common.reset_logging: + reset_logging() + + utils.import_user_module(cfg.common) + + # logger.info("Config:\n%s", OmegaConf.to_yaml(cfg)) + wer = float("inf") + + try: + if cfg.common.profile: + with torch.cuda.profiler.profile(): + with torch.autograd.profiler.emit_nvtx(): + distributed_utils.call_main(cfg, main) + else: + distributed_utils.call_main(cfg, main) + + wer = parse_wer(get_wer_file(cfg)) + except BaseException as e: # pylint: disable=broad-except + if not cfg.common.suppress_crashes: + raise + else: + logger.error("Crashed! 
%s", str(e)) + + logger.info("Word error rate: %.4f", wer) + if cfg.is_ax: + return wer, None + + return wer + + +def cli_main() -> None: + try: + from hydra._internal.utils import ( + get_args, + ) # pylint: disable=import-outside-toplevel + + cfg_name = get_args().config_name or "infer" + except ImportError: + logger.warning("Failed to get config name from hydra args") + cfg_name = "infer" + + cs = ConfigStore.instance() + cs.store(name=cfg_name, node=InferConfig) + + for k in InferConfig.__dataclass_fields__: + if is_dataclass(InferConfig.__dataclass_fields__[k].type): + v = InferConfig.__dataclass_fields__[k].default + cs.store(name=k, node=v) + + hydra_main() # pylint: disable=no-value-for-parameter + + +if __name__ == "__main__": + cli_main() diff --git a/SpeechLM/speechlm/models/__init__.py b/SpeechLM/speechlm/models/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 diff --git a/SpeechLM/speechlm/models/fasttext2unit.py b/SpeechLM/speechlm/models/fasttext2unit.py new file mode 100644 index 0000000000000000000000000000000000000000..14c27b5ea21faff956cafdf96f0cc6fdb16e3a96 --- /dev/null +++ b/SpeechLM/speechlm/models/fasttext2unit.py @@ -0,0 +1,226 @@ +# ---------------------------------------------------------------------------- +# SpeechLM: Enhanced Speech Pre-Training with Unpaired Textual Data (https://arxiv.org/abs/2209.15329) +# Github source: https://github.com/microsoft/SpeechT5/tree/main/SpeechLM +# Code based on fairseq: https://github.com/facebookresearch/fairseq/tree/272c4c5197250997148fb12c0db6306035f166a4 +# +# Copyright (c) 2022 Microsoft +# Licensed under The MIT License [see LICENSE for details] +# ---------------------------------------------------------------------------- + +import logging +import torch + +from fairseq import utils +from fairseq.models import ( + FairseqEncoderModel, + register_model, + register_model_architecture, +) +from fairseq.models.text_to_speech import fastspeech2 + +logger = logging.getLogger(__name__) + +class VarianceAdaptor(fastspeech2.VarianceAdaptor): + def __init__(self, args): + super().__init__(args) + self.use_pitch = args.use_pitch + self.use_energe = args.use_energe + + def forward( + self, + x, + padding_mask, + durations=None, + pitches=None, + energies=None, + d_factor=1.0, + p_factor=1.0, + e_factor=1.0, + ): + # x: B x T x C + log_dur_out = self.duration_predictor(x) + dur_out = torch.clamp( + torch.round((torch.exp(log_dur_out) - 1) * d_factor).long(), min=0 + ) + dur_out.masked_fill_(padding_mask, 0) + + if self.use_pitch: + pitch_out, pitch_emb = self.get_pitch_emb(x, pitches, p_factor) + x = x + pitch_emb + else: + pitch_out = None + + if self.use_energe: + energy_out, energy_emb = self.get_energy_emb(x, energies, e_factor) + x = x + energy_emb + else: + energy_out = None + + x, out_lens = self.length_regulator( + x, dur_out if durations is None else durations + ) + + return x, out_lens, log_dur_out, pitch_out, energy_out + + +class FastSpeech2Encoder(fastspeech2.FastSpeech2Encoder): + def __init__(self, args, src_dict, embed_speaker): + super().__init__(args, src_dict, embed_speaker) + self.var_adaptor = VarianceAdaptor(args) + self.apply(fastspeech2.model_init) + +@register_model("fasttext2unit") +class FastText2UnitModel(FairseqEncoderModel): + """ + Implementation for https://arxiv.org/abs/2006.04558 + """ + + NON_AUTOREGRESSIVE = True + + + @staticmethod + def add_args(parser): + parser.add_argument("--dropout", type=float) + 
parser.add_argument("--output-frame-dim", type=int) + parser.add_argument("--speaker-embed-dim", type=int) + # FFT blocks + parser.add_argument("--fft-hidden-dim", type=int) + parser.add_argument("--fft-kernel-size", type=int) + parser.add_argument("--attention-dropout", type=float) + parser.add_argument("--encoder-layers", type=int) + parser.add_argument("--encoder-embed-dim", type=int) + parser.add_argument("--encoder-attention-heads", type=int) + parser.add_argument("--decoder-layers", type=int) + parser.add_argument("--decoder-embed-dim", type=int) + parser.add_argument("--decoder-attention-heads", type=int) + # variance predictor + parser.add_argument("--var-pred-n-bins", type=int) + parser.add_argument("--var-pred-hidden-dim", type=int) + parser.add_argument("--var-pred-kernel-size", type=int) + parser.add_argument("--var-pred-dropout", type=float) + # postnet + parser.add_argument("--add-postnet", action="store_true") + parser.add_argument("--postnet-dropout", type=float) + parser.add_argument("--postnet-layers", type=int) + parser.add_argument("--postnet-conv-dim", type=int) + parser.add_argument("--postnet-conv-kernel-size", type=int) + # pitch & energe + parser.add_argument("--use-pitch", action="store_true") + parser.add_argument("--use-energe", action="store_true") + + + def __init__(self, encoder, args, src_dict): + super().__init__(encoder) + self._num_updates = 0 + + @classmethod + def build_model(cls, args, task): + embed_speaker = task.get_speaker_embeddings(args) + if args.output_frame_dim == -1: + args.output_frame_dim = len(task.tgt_dict) + encoder = FastSpeech2Encoder(args, task.src_dict, embed_speaker) + return cls(encoder, args, task.src_dict) + + def set_num_updates(self, num_updates): + super().set_num_updates(num_updates) + self._num_updates = num_updates + + def get_normalized_probs(self, net_output, log_probs, sample=None): + logits = net_output[0] + if log_probs: + return utils.log_softmax(logits.float(), dim=-1) + else: + return utils.softmax(logits.float(), dim=-1) + + +@register_model_architecture("fasttext2unit", "fasttext2unit_s") +def base_architecture(args): + args.dropout = getattr(args, "dropout", 0.2) + args.output_frame_dim = getattr(args, "output_frame_dim", -1) + args.speaker_embed_dim = getattr(args, "speaker_embed_dim", 256) + # FFT blocks + args.fft_hidden_dim = getattr(args, "fft_hidden_dim", 1024) + args.fft_kernel_size = getattr(args, "fft_kernel_size", 9) + args.attention_dropout = getattr(args, "attention_dropout", 0.0) + args.encoder_layers = getattr(args, "encoder_layers", 4) + args.encoder_embed_dim = getattr(args, "encoder_embed_dim", 256) + args.encoder_attention_heads = getattr(args, "encoder_attention_heads", 2) + args.decoder_layers = getattr(args, "decoder_layers", 4) + args.decoder_embed_dim = getattr(args, "decoder_embed_dim", 256) + args.decoder_attention_heads = getattr(args, "decoder_attention_heads", 2) + # variance predictor + args.var_pred_n_bins = getattr(args, "var_pred_n_bins", 256) + args.var_pred_hidden_dim = getattr(args, "var_pred_hidden_dim", 256) + args.var_pred_kernel_size = getattr(args, "var_pred_kernel_size", 3) + args.var_pred_dropout = getattr(args, "var_pred_dropout", 0.5) + # postnet + args.add_postnet = getattr(args, "add_postnet", False) + args.postnet_dropout = getattr(args, "postnet_dropout", 0.5) + args.postnet_layers = getattr(args, "postnet_layers", 5) + args.postnet_conv_dim = getattr(args, "postnet_conv_dim", 512) + args.postnet_conv_kernel_size = getattr(args, "postnet_conv_kernel_size", 5) + # 
pitch & energe + args.use_pitch = getattr(args, "use_pitch", False) + args.use_energe = getattr(args, "use_energe", False) + + +@register_model_architecture("fasttext2unit", "fasttext2unit_m") +def base_architecture(args): + args.dropout = getattr(args, "dropout", 0.2) + args.output_frame_dim = getattr(args, "output_frame_dim", -1) + args.speaker_embed_dim = getattr(args, "speaker_embed_dim", 256) + # FFT blocks + args.fft_hidden_dim = getattr(args, "fft_hidden_dim", 1024) + args.fft_kernel_size = getattr(args, "fft_kernel_size", 9) + args.attention_dropout = getattr(args, "attention_dropout", 0.0) + args.encoder_layers = getattr(args, "encoder_layers", 6) + args.encoder_embed_dim = getattr(args, "encoder_embed_dim", 256) + args.encoder_attention_heads = getattr(args, "encoder_attention_heads", 2) + args.decoder_layers = getattr(args, "decoder_layers", 6) + args.decoder_embed_dim = getattr(args, "decoder_embed_dim", 256) + args.decoder_attention_heads = getattr(args, "decoder_attention_heads", 2) + # variance predictor + args.var_pred_n_bins = getattr(args, "var_pred_n_bins", 256) + args.var_pred_hidden_dim = getattr(args, "var_pred_hidden_dim", 256) + args.var_pred_kernel_size = getattr(args, "var_pred_kernel_size", 3) + args.var_pred_dropout = getattr(args, "var_pred_dropout", 0.5) + # postnet + args.add_postnet = getattr(args, "add_postnet", False) + args.postnet_dropout = getattr(args, "postnet_dropout", 0.5) + args.postnet_layers = getattr(args, "postnet_layers", 5) + args.postnet_conv_dim = getattr(args, "postnet_conv_dim", 512) + args.postnet_conv_kernel_size = getattr(args, "postnet_conv_kernel_size", 5) + # pitch & energe + args.use_pitch = getattr(args, "use_pitch", False) + args.use_energe = getattr(args, "use_energe", False) + + +@register_model_architecture("fasttext2unit", "fasttext2unit_l") +def base_architecture(args): + args.dropout = getattr(args, "dropout", 0.2) + args.output_frame_dim = getattr(args, "output_frame_dim", -1) + args.speaker_embed_dim = getattr(args, "speaker_embed_dim", 256) + # FFT blocks + args.fft_hidden_dim = getattr(args, "fft_hidden_dim", 1536) + args.fft_kernel_size = getattr(args, "fft_kernel_size", 9) + args.attention_dropout = getattr(args, "attention_dropout", 0.1) + args.encoder_layers = getattr(args, "encoder_layers", 6) + args.encoder_embed_dim = getattr(args, "encoder_embed_dim", 384) + args.encoder_attention_heads = getattr(args, "encoder_attention_heads", 6) + args.decoder_layers = getattr(args, "decoder_layers", 6) + args.decoder_embed_dim = getattr(args, "decoder_embed_dim", 384) + args.decoder_attention_heads = getattr(args, "decoder_attention_heads", 6) + # variance predictor + args.var_pred_n_bins = getattr(args, "var_pred_n_bins", 256) + args.var_pred_hidden_dim = getattr(args, "var_pred_hidden_dim", 256) + args.var_pred_kernel_size = getattr(args, "var_pred_kernel_size", 3) + args.var_pred_dropout = getattr(args, "var_pred_dropout", 0.5) + # postnet + args.add_postnet = getattr(args, "add_postnet", False) + args.postnet_dropout = getattr(args, "postnet_dropout", 0.5) + args.postnet_layers = getattr(args, "postnet_layers", 5) + args.postnet_conv_dim = getattr(args, "postnet_conv_dim", 512) + args.postnet_conv_kernel_size = getattr(args, "postnet_conv_kernel_size", 5) + # pitch & energe + args.use_pitch = getattr(args, "use_pitch", False) + args.use_energe = getattr(args, "use_energe", False) diff --git a/SpeechLM/speechlm/models/speechlm.py b/SpeechLM/speechlm/models/speechlm.py new file mode 100644 index 
0000000000000000000000000000000000000000..038fe83875c04a890e21926477882c9207fa7db1 --- /dev/null +++ b/SpeechLM/speechlm/models/speechlm.py @@ -0,0 +1,720 @@ +# ---------------------------------------------------------------------------- +# SpeechLM: Enhanced Speech Pre-Training with Unpaired Textual Data (https://arxiv.org/abs/2209.15329) +# Github source: https://github.com/microsoft/SpeechT5/tree/main/SpeechLM +# Code based on fairseq: https://github.com/facebookresearch/fairseq/tree/272c4c5197250997148fb12c0db6306035f166a4 +# +# Copyright (c) 2022 Microsoft +# Licensed under The MIT License [see LICENSE for details] +# ---------------------------------------------------------------------------- + +import logging +from dataclasses import dataclass, field +from typing import Dict, List, Optional, Tuple + +import numpy as np +import torch +import torch.nn as nn +import torch.nn.functional as F + +from fairseq import utils, checkpoint_utils +from fairseq.data.data_utils import compute_mask_indices +from fairseq.data.dictionary import Dictionary +from fairseq.dataclass import ChoiceEnum +from fairseq.models import BaseFairseqModel, register_model +from fairseq.models.transformer import Embedding +from fairseq.file_io import PathManager +from torch import Tensor +from fairseq.models.wav2vec.wav2vec2 import ConvFeatureExtractionModel +from fairseq.modules import GradMultiply, LayerNorm +from fairseq.tasks.hubert_pretraining import ( + HubertPretrainingConfig, + HubertPretrainingTask, +) +from fairseq.models.hubert import HubertConfig +from fairseq.models.transformer import TransformerConfig +from speechlm.modules.w2v_encoder import TransformerEncoder +from speechlm.modules.transformer_encoder import TransformerEncoderBase + +logger = logging.getLogger(__name__) + +EXTRACTOR_MODE_CHOICES = ChoiceEnum(["default", "layer_norm"]) +MASKING_DISTRIBUTION_CHOICES = ChoiceEnum(["static", "uniform", "normal", "poisson"]) + + +@dataclass + +class SpeechlmConfig(HubertConfig): + use_rel_pos_enc: bool = field( + default=False, + metadata={"help": "whether to use relative positional encoding"}, + ) + scaling_for_att: float = field( + default=1.0, + metadata={"help": "scaling for attention weights to prevent overflow issue (for large model)"}, + ) + + # unit encoder-decoder + text_transformer: TransformerConfig = TransformerConfig() + add_unit_encoder: bool = field( + default=False, + metadata={"help": "add unit encoder"}, + ) + add_decoder: bool = field( + default=False, + metadata={"help": "add decoder"}, + ) + add_text_ctc: bool = field( + default=False, + metadata={"help": "add_text_ctc head"}, + ) + text_ctc_conv_kernel: int = field( + default=2, + metadata={"help": "text_ctc_conv kernel size"}, + ) + mask_u2t: bool = field( + default=True, + metadata={"help": "mask the unit input in unit-to-text task"}, + ) + compute_mum: bool = field( + default=False, + metadata={"help": "compute MLM loss in unit-to-text task"}, + ) + + # embedding mixing + mix_with_unit: bool = field( + default=True, + metadata={"help": "mix with the unit embeddings"}, + ) + use_pred_unit: bool = field( + default=False, + metadata={"help": "use the embeddings of predicted units"}, + ) + l2_embedding: bool = field( + default=False, + metadata={"help": "compute l2 loss between unit embedding and unit hidden state"}, + ) + + # Finetune related + encoder_dict_size: int = field( + default=-1, + metadata={"help": "text encoder dictionary dimension"}, + ) + + decoder_dict_size: int = field( + default=-1, + metadata={"help": "decoder 
dictionary dimension"}, + ) + + +@register_model("speechlm", dataclass=SpeechlmConfig) +class SpeechlmModel(BaseFairseqModel): + def __init__( + self, + cfg: SpeechlmConfig, + task_cfg: HubertPretrainingConfig, + dictionaries: List[Dictionary], + unit_dictionary: Dictionary = None, + text_tgt_dictionary: Dictionary = None, + ) -> None: + super().__init__() + logger.info(f"SpeechlmModel Config: {cfg}") + + feature_enc_layers = eval(cfg.conv_feature_layers) # noqa + self.embed = feature_enc_layers[-1][0] + + self.feature_extractor = ConvFeatureExtractionModel( + conv_layers=feature_enc_layers, + dropout=0.0, + mode=cfg.extractor_mode, + conv_bias=cfg.conv_bias, + ) + feature_ds_rate = np.prod([s for _, _, s in feature_enc_layers]) + self.feat2tar_ratio = cfg.label_rate * feature_ds_rate / task_cfg.sample_rate + + self.post_extract_proj = ( + nn.Linear(self.embed, cfg.encoder_embed_dim) + if self.embed != cfg.encoder_embed_dim + else None + ) + + self.mask_prob = cfg.mask_prob + self.mask_selection = cfg.mask_selection + self.mask_other = cfg.mask_other + self.mask_length = cfg.mask_length + self.no_mask_overlap = cfg.no_mask_overlap + self.mask_min_space = cfg.mask_min_space + + self.mask_channel_prob = cfg.mask_channel_prob + self.mask_channel_selection = cfg.mask_channel_selection + self.mask_channel_other = cfg.mask_channel_other + self.mask_channel_length = cfg.mask_channel_length + self.no_mask_channel_overlap = cfg.no_mask_channel_overlap + self.mask_channel_min_space = cfg.mask_channel_min_space + + self.dropout_input = nn.Dropout(cfg.dropout_input) + self.dropout_features = nn.Dropout(cfg.dropout_features) + + self.feature_grad_mult = cfg.feature_grad_mult + self.logit_temp = cfg.logit_temp + self.skip_masked = cfg.skip_masked + self.skip_nomask = cfg.skip_nomask + + final_dim = cfg.final_dim if cfg.final_dim > 0 else cfg.encoder_embed_dim + + self.mask_emb = nn.Parameter( + torch.FloatTensor(cfg.encoder_embed_dim).uniform_() + ) + + self.encoder = TransformerEncoder(cfg) + self.layer_norm = LayerNorm(self.embed) + + self.target_glu = None + if cfg.target_glu: + self.target_glu = nn.Sequential( + nn.Linear(final_dim, final_dim * 2), nn.GLU() + ) + + self.final_dim = final_dim + assert len(dictionaries) <= 2, f"Only support <=2 kinds of targets, get {len(dictionaries)} dictionaries" + if len(dictionaries) == 1: + dictionaries = [dictionaries[0], dictionaries[0]] + + self.final_proj_list = nn.ModuleList([ + nn.Linear(cfg.encoder_embed_dim, final_dim) for _ in dictionaries + ]) + + self.num_classes = [len(d) for d in dictionaries] + self.label_embs_list = nn.ParameterList([ + nn.Parameter(torch.FloatTensor(n, final_dim)) for n in self.num_classes + ]) + for i in range(len(self.num_classes)): + nn.init.uniform_(self.label_embs_list[i]) + + ### build unit encoder: + self.mask_u2t = cfg.mask_u2t + self.compute_mum = cfg.compute_mum + self.add_text_ctc = cfg.add_text_ctc + self.text_ctc_conv_kernel = cfg.text_ctc_conv_kernel + self.padding_idx = unit_dictionary.pad() + self.unit_mask_idx = unit_dictionary.index("") + + self.add_unit_encoder = cfg.add_unit_encoder + self.mix_with_unit = cfg.mix_with_unit + self.use_pred_unit = cfg.use_pred_unit + self.l2_embedding = cfg.l2_embedding + if self.add_unit_encoder: + assert len(unit_dictionary) == self.num_classes[0], f"unit_dictionary: {len(unit_dictionary)}, self.num_classes[0]: {self.num_classes[0]}" + ### build unit pre-net, and shared with hubert label_embs if needed (default: False) + self.unit_embed_tokens = self.build_embedding( + 
unit_dictionary, + cfg.text_transformer.encoder.embed_dim, + ) + if self.final_dim == cfg.text_transformer.encoder.embed_dim: + logger.info("Share label_embs[0] with unit_embed_tokens ...") + nn.init.uniform_(self.unit_embed_tokens.weight) + self.label_embs_list[0] = self.unit_embed_tokens.weight + + ### build unit encoder + self.unit_encoder = TransformerEncoderBase( + cfg.text_transformer, + unit_dictionary, + self.unit_embed_tokens, + use_rel_pos_enc=cfg.use_rel_pos_enc, + scaling_for_att=cfg.scaling_for_att, + ) + + ### build text ctc head + if self.add_text_ctc: + conv = nn.Conv1d( + cfg.text_transformer.encoder.embed_dim, cfg.text_transformer.encoder.embed_dim, + self.text_ctc_conv_kernel, + stride=self.text_ctc_conv_kernel // 2, + bias=False, + padding=self.text_ctc_conv_kernel // 2, + ) + nn.init.kaiming_normal_(conv.weight) + self.unit_encoder_ctc_head = nn.Sequential( + Rotate3D(), + conv, + nn.Dropout(p=0.1), + nn.Sequential( + Rotate3D(), + Rotate3D(), + LayerNorm(cfg.text_transformer.encoder.embed_dim), + ), + nn.GELU(), + nn.Linear(cfg.text_transformer.encoder.embed_dim, len(text_tgt_dictionary)), + ) + + ### build unit2text decoder, not available for now + self.add_decoder = cfg.add_decoder + + def build_embedding(self, dictionary, embed_dim): + num_embeddings = len(dictionary) + padding_idx = dictionary.pad() + return Embedding(num_embeddings, embed_dim, padding_idx) + + def upgrade_state_dict_named(self, state_dict, name): + """Upgrade a (possibly old) state dict for new versions of fairseq.""" + + super().upgrade_state_dict_named(state_dict, name) + return state_dict + + @classmethod + def build_model(cls, cfg: SpeechlmConfig, task: HubertPretrainingTask): + """Build a new model instance.""" + unit_dictionary = getattr(task, "text_src_dictionary", None) + text_tgt_dictionary = getattr(task, "text_dictionary", None) + model = SpeechlmModel(cfg, task.cfg, task.dictionaries, unit_dictionary, text_tgt_dictionary) + return model + + def apply_mask(self, x, padding_mask, target_list): + B, T, C = x.shape + if self.mask_prob > 0: + mask_indices = compute_mask_indices( + (B, T), + padding_mask, + self.mask_prob, + self.mask_length, + self.mask_selection, + self.mask_other, + min_masks=2, + no_overlap=self.no_mask_overlap, + min_space=self.mask_min_space, + ) + mask_indices = torch.from_numpy(mask_indices).to(x.device) + x[mask_indices] = self.mask_emb + else: + mask_indices = None + + if self.mask_channel_prob > 0: + mask_channel_indices = compute_mask_indices( + (B, C), + None, + self.mask_channel_prob, + self.mask_channel_length, + self.mask_channel_selection, + self.mask_channel_other, + no_overlap=self.no_mask_channel_overlap, + min_space=self.mask_channel_min_space, + ) + mask_channel_indices = ( + torch.from_numpy(mask_channel_indices) + .to(x.device) + .unsqueeze(1) + .expand(-1, T, -1) + ) + x[mask_channel_indices] = 0 + + return x, mask_indices + + def forward_features(self, source: torch.Tensor) -> torch.Tensor: + if self.feature_grad_mult > 0: + features = self.feature_extractor(source) + if self.feature_grad_mult != 1.0: + features = GradMultiply.apply(features, self.feature_grad_mult) + else: + with torch.no_grad(): + features = self.feature_extractor(source) + return features + + def forward_targets( + self, + features: torch.Tensor, + target_list: List[torch.Tensor], + ) -> Tuple[torch.Tensor, torch.Tensor]: + # Trim features to ensure labels exist and then get aligned labels + feat_tsz = features.size(2) + targ_tsz = min([t.size(1) for t in target_list]) + if 
self.feat2tar_ratio * feat_tsz > targ_tsz: + feat_tsz = int(targ_tsz / self.feat2tar_ratio) + features = features[..., :feat_tsz] + target_inds = torch.arange(feat_tsz).float() * self.feat2tar_ratio + target_inds += np.random.choice(int(self.feat2tar_ratio)) + target_list = [t[:, target_inds.long()] for t in target_list] + return features, target_list + + def forward_padding_mask( + self, + features: torch.Tensor, + padding_mask: torch.Tensor, + ) -> torch.Tensor: + extra = padding_mask.size(1) % features.size(1) + if extra > 0: + padding_mask = padding_mask[:, :-extra] + padding_mask = padding_mask.view(padding_mask.size(0), features.size(1), -1) + padding_mask = padding_mask.all(-1) + return padding_mask + + def get_normalized_probs( + self, + net_output: Tuple[Tensor, Optional[Dict[str, List[Optional[Tensor]]]]], + log_probs: bool, + sample: Optional[Dict[str, Tensor]] = None, + ): + lprobs = self.get_normalized_probs_scriptable(net_output, log_probs, sample) + lprobs.batch_first = True + return lprobs + + def downsample_ctc_padding_mask(self, padding_mask): + """ + padding_mask: (B, T) + """ + stride = self.text_ctc_conv_kernel // 2 + return padding_mask[:, ::stride] + + def compute_pred(self, proj_x, label_embs): + if self.target_glu: + label_embs = self.target_glu(label_embs) + x = F.normalize(proj_x.float(), dim=-1) # (S, D) + label_embs = F.normalize(label_embs.float(), dim=-1) # (C, D) + logits = torch.matmul(x, label_embs.T).type_as(proj_x) # (S, C) + logits /= self.logit_temp + return logits + + def compute_hubert_logits(self, x, target, proj, label_embs, padding_mask, mask_indices): + if not self.skip_masked: + masked_indices = torch.logical_and(~padding_mask, mask_indices) + proj_x_m = proj(x[masked_indices]) + logit_m_list = [(self.compute_pred(proj_x_m, label_embs), target[masked_indices])] + else: + logit_m_list = [None] + + if not self.skip_nomask: + nomask_indices = torch.logical_and(~padding_mask, ~mask_indices) + proj_x_u = proj(x[nomask_indices]) + logit_u_list = [(self.compute_pred(proj_x_u, label_embs), target[nomask_indices])] + else: + logit_u_list = [None] + + return logit_m_list, logit_u_list + + def convert_embeddings(self, + x, + padding_mask, + target=None, + mask_indices=None, + mix_with_unit=False, + use_pred_unit=False, + l2_embedding=False, + remask=False + ): + """ + 1. Mix with units if needed (default: True) + 2. 
Prepare for unit_encoder inputs + Inputs: + x, (B, T, D) + Return: + src_tokens, (B, T) + soft_embeddings, (B, T, D) + l2_loss, a loss + """ + soft_embeddings = self.final_proj_list[0](x) if x.size(-1) == self.final_dim else x + if padding_mask is None: + padding_mask = soft_embeddings.new_zeros(soft_embeddings.size(0), soft_embeddings.size(1), dtype=torch.long) + if use_pred_unit: + src_tokens = self.compute_pred(self.final_proj_list[0](x), self.label_embs_list[0]).argmax(dim=-1) + src_tokens[padding_mask] = self.padding_idx + elif target is not None: + src_tokens = target + else: + src_tokens = padding_mask.long() + + if l2_embedding | mix_with_unit: + unit_embeddings = self.unit_embed_tokens(src_tokens) # (B, T, D) + + l2_loss = 0 + if l2_embedding: + if mask_indices is not None: + l2_loss = (soft_embeddings - unit_embeddings)[mask_indices].float().pow(2).mean(dim=-1) + scale = unit_embeddings[mask_indices].float().pow(2).sum(dim=-1) + else: + l2_loss = (soft_embeddings - unit_embeddings).float().pow(2).mean(dim=-1) + scale = unit_embeddings.float().pow(2).sum(dim=-1) + l2_loss = (l2_loss / scale).mean() + + if mix_with_unit: + B, T, D = x.shape + selected_indices = compute_mask_indices( + (B, T), + padding_mask, + self.mask_prob / 2, + self.mask_length // 2, + self.mask_selection, + self.mask_other, + min_masks=2, + no_overlap=self.no_mask_overlap, + min_space=self.mask_min_space, + ) + selected_indices = torch.from_numpy(selected_indices).to(x.device) + if mask_indices is not None: + if remask: + remask_indices = torch.logical_and(selected_indices, mask_indices) + soft_embeddings[remask_indices] = self.mask_emb + swap_indices = torch.logical_and(selected_indices, ~mask_indices) + else: + swap_indices = selected_indices + soft_embeddings[swap_indices] = unit_embeddings[swap_indices] + + soft_embeddings = soft_embeddings * (1 - padding_mask.unsqueeze(-1).type_as(x)) + return src_tokens, soft_embeddings, l2_loss + + def forward( + self, + source: torch.Tensor = None, + src_tokens: torch.Tensor = None, + src_lengths: torch.Tensor = None, + target_list: Optional[List[torch.Tensor]] = None, + padding_mask: Optional[torch.Tensor] = None, + mask: bool = True, + features_only: bool = False, + output_layer: Optional[int] = None, + ) -> Dict[str, torch.Tensor]: + assert source is not None or src_tokens is not None + if source is not None: + return self.forward_speech( + source=source, + target_list=target_list, + padding_mask=padding_mask, + mask=mask, + features_only=features_only, + output_layer=output_layer, + ) + else: + return self.forward_text( + src_tokens=src_tokens, + src_lengths=src_lengths, + mask=self.mask_u2t, + output_layer=output_layer, + ) + + def forward_speech( + self, + source: torch.Tensor = None, + target_list: Optional[List[torch.Tensor]] = None, + padding_mask: Optional[torch.Tensor] = None, + mask: bool = True, + features_only: bool = False, + output_layer: Optional[int] = None, + ) -> Dict[str, torch.Tensor]: + """output layer is 1-based""" + features = self.forward_features(source) + if target_list is not None: + features, target_list = self.forward_targets(features, target_list) + + features_pen = features.float().pow(2).mean() + + features = features.transpose(1, 2) + features = self.layer_norm(features) + unmasked_features = features.clone() + + if padding_mask is not None: + padding_mask = self.forward_padding_mask(features, padding_mask) + + if self.post_extract_proj is not None: + features = self.post_extract_proj(features) + + features = 
self.dropout_input(features) + unmasked_features = self.dropout_features(unmasked_features) + + if mask: + x, mask_indices = self.apply_mask(features, padding_mask, target_list) + else: + x = features + mask_indices = None + + # feature: (B, T, D), float + # target: (B, T), long + # x: (B, T, D), float + # padding_mask: (B, T), bool + # mask_indices: (B, T), bool + x, _ = self.encoder( + x, + padding_mask=padding_mask, + layer=None if output_layer is None else output_layer - 1, + ) + + if features_only: + return {"x": x, "padding_mask": padding_mask, "features": features} + + logit_m_list, logit_u_list = self.compute_hubert_logits( + x, + target_list[0], + self.final_proj_list[0], + self.label_embs_list[0], + padding_mask, + mask_indices, + ) + + result = { + "logit_m_list": logit_m_list, + "logit_u_list": logit_u_list, + "padding_mask": padding_mask, + "features_pen": features_pen, + } + + if self.add_unit_encoder: + src_tokens, x_emb, l2_loss = self.convert_embeddings( + x, + padding_mask, target_list[0], + mask_indices=mask_indices, + mix_with_unit=self.mix_with_unit, + use_pred_unit=self.use_pred_unit, + l2_embedding=self.l2_embedding, + ) + encoder_out = self.unit_encoder(src_tokens, token_embeddings=x_emb) + + result['encoder_out'] = encoder_out['encoder_out'] # [(T, B, D)] + result['encoder_padding_mask'] = encoder_out['encoder_padding_mask'] # [(B, T)] + if self.l2_embedding: + result['embedding_l2_loss'] = l2_loss + + code_logit_m_list, code_logit_u_list = self.compute_hubert_logits( + encoder_out['encoder_out'][0].transpose(0, 1), + target_list[-1], + self.final_proj_list[-1], + self.label_embs_list[-1], + padding_mask, + mask_indices, + ) + result['logit_m_list'] += code_logit_m_list + result['logit_u_list'] += code_logit_u_list + return result + + def forward_text( + self, + src_tokens: torch.Tensor = None, + src_lengths: torch.Tensor = None, + target_list: Optional[List[torch.Tensor]] = None, + mask: bool = True, + output_layer: Optional[int] = None, + ) -> Dict[str, torch.Tensor]: + assert self.add_unit_encoder, f"Can not forward unit-text branch without unit_encoder!" 
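+        # Unit-to-text branch (descriptive note): embed the discrete unit tokens, optionally mask
+        # them with the shared mask_emb, run them through the shared unit encoder, and, when
+        # configured, add masked-unit prediction logits (compute_mum) and a text CTC head on top.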
+ + padding_mask = src_tokens == self.padding_idx + unit_embeddings = self.unit_embed_tokens(src_tokens) + if mask: + unit_embeddings, mask_indices = self.apply_mask(unit_embeddings, padding_mask, [src_tokens]) + else: + ### If already applied mask on src_tokens, then the target_list should contains many padding_idx + mask_indices = target_list[-1] != self.padding_idx + unit_embeddings[mask_indices] = self.mask_emb + + encoder_out = self.unit_encoder( + src_tokens, + token_embeddings=unit_embeddings, + return_all_hiddens=output_layer is not None, + ) + + result = {} + result["encoder_out"] = encoder_out["encoder_out"] + result["encoder_states"] = encoder_out["encoder_states"] + result["padding_mask"] = padding_mask + + if self.compute_mum: + code_logit_m_list, code_logit_u_list = self.compute_hubert_logits( + encoder_out["encoder_out"].transpose(0, 1), + target_list[-1], + self.final_proj_list[-1], + self.label_embs_list[-1], + padding_mask, + mask_indices, + ) + result["logit_m_list"] = code_logit_m_list + result["logit_u_list"] = code_logit_u_list + + if self.add_text_ctc: + result["encoder_out_ctc"] = [self.unit_encoder_ctc_head(x) for x in encoder_out['encoder_out']] + result["encoder_padding_mask"] = [ + self.downsample_ctc_padding_mask(padding_mask) for padding_mask in encoder_out['encoder_padding_mask'] + ] + return result + + def extract_features( + self, + source: torch.Tensor, + padding_mask: Optional[torch.Tensor] = None, + mask: bool = False, + ret_conv: bool = False, + output_layer: Optional[int] = None, + **kwargs, + ) -> Tuple[torch.Tensor, torch.Tensor]: + """Extract features for only speech input""" + res = self.forward( + source, + padding_mask=padding_mask, + mask=mask, + features_only=True, + output_layer=output_layer, + ) + x = res["x"] # B x T x D + padding_mask = res["padding_mask"] + + if self.add_unit_encoder: + src_tokens, x, _ = self.convert_embeddings( + x, + padding_mask, + mix_with_unit=False, + use_pred_unit=False, + ) + encoder_out = self.unit_encoder( + src_tokens, + token_embeddings=x, + return_all_hiddens=output_layer is not None + ) + res["x"] = encoder_out['encoder_out'][0].transpose(0, 1) # (B, T, D) + + feature = res["features"] if ret_conv else res["x"] + if output_layer is not None: + feature = encoder_out['encoder_states'] + + return feature, padding_mask + + def get_logits(self, net_output, is_masked=True): + if is_masked: + logits_list = net_output["logit_m_list"] + else: + logits_list = net_output["logit_u_list"] + logits_list = [x[0].float() for x in logits_list if x is not None] + return logits_list + + def get_targets(self, net_output, is_masked=True): + if is_masked: + logits_list = net_output["logit_m_list"] + else: + logits_list = net_output["logit_u_list"] + targets_list = [x[1].long() for x in logits_list if x is not None] + return targets_list + + def get_extra_losses(self, net_output): + extra_losses = [] + names = [] + + if "features_pen" in net_output: + extra_losses.append(net_output["features_pen"]) + names.append("features_pen") + + if "embedding_l2_loss" in net_output: + extra_losses.append(net_output["embedding_l2_loss"]) + names.append("embedding_l2_loss") + + return extra_losses, names + + def remove_pretraining_modules(self, step2=False): + self.target_glu = None + + def load_checkpoint(self, checkpoint: str): + if not PathManager.exists(checkpoint): + raise IOError("Model file not found: {}".format(checkpoint)) + state = checkpoint_utils.load_checkpoint_to_cpu(checkpoint) + return state + +class Rotate3D(nn.Module): + """ + 
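+    Cyclically rotate the first three dimensions of a tensor (one rotation per forward call):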
(T, B, D) --> (B, D, T) --> (D, T, B) --> (T, B, D) + """ + def __init__(self): + super().__init__() + + def forward(self, x): + return x.permute(1, 2, 0) diff --git a/SpeechLM/speechlm/models/speechlm_ctcasr.py b/SpeechLM/speechlm/models/speechlm_ctcasr.py new file mode 100644 index 0000000000000000000000000000000000000000..642a51d83de671f067417202a74af29d1466653c --- /dev/null +++ b/SpeechLM/speechlm/models/speechlm_ctcasr.py @@ -0,0 +1,56 @@ +# ---------------------------------------------------------------------------- +# SpeechLM: Enhanced Speech Pre-Training with Unpaired Textual Data (https://arxiv.org/abs/2209.15329) +# Github source: https://github.com/microsoft/SpeechT5/tree/main/SpeechLM +# Code based on fairseq: https://github.com/facebookresearch/fairseq/tree/272c4c5197250997148fb12c0db6306035f166a4 +# +# Copyright (c) 2022 Microsoft +# Licensed under The MIT License [see LICENSE for details] +# ---------------------------------------------------------------------------- + +from dataclasses import dataclass +from fairseq.models import BaseFairseqModel, register_model +from fairseq.tasks import FairseqTask + +from fairseq.models.hubert import HubertAsrConfig, HubertCtc, HubertEncoder + +@dataclass +class SpeechLMCtcConfig(HubertAsrConfig): + pass + + +@register_model("speechlm_ctc", dataclass=SpeechLMCtcConfig) +class SpeechLMCtc(HubertCtc): + def __init__(self, cfg: SpeechLMCtcConfig, w2v_encoder: BaseFairseqModel): + super().__init__(cfg, w2v_encoder) + + @classmethod + def build_model(cls, cfg: SpeechLMCtcConfig, task: FairseqTask): + """Build a new model instance.""" + w2v_encoder = SpeechLMEncoder(cfg, task) + return cls(cfg, w2v_encoder) + + +class SpeechLMEncoder(HubertEncoder): + def __init__(self, cfg: HubertAsrConfig, task): + super().__init__(cfg, task) + + if (task.target_dictionary is not None) and ( + hasattr(self.w2v_model, "unit_encoder_ctc_head") + ): + self.proj = self.w2v_model.unit_encoder_ctc_head + self.conv_ctc_proj = True + else: + self.conv_ctc_proj = False + + def forward(self, source, padding_mask, tbc=True, **kwargs): + results = super().forward( + source, + padding_mask, + tbc, + **kwargs, + ) + if self.conv_ctc_proj: + padding_mask = self.w2v_model.downsample_ctc_padding_mask(results["padding_mask"]) + results["encoder_padding_mask"] = padding_mask + results["padding_mask"] = padding_mask + return results diff --git a/SpeechLM/speechlm/models/speechlm_st.py b/SpeechLM/speechlm/models/speechlm_st.py new file mode 100644 index 0000000000000000000000000000000000000000..6f70c549f9200c043149acedbb09385c2aaca4d3 --- /dev/null +++ b/SpeechLM/speechlm/models/speechlm_st.py @@ -0,0 +1,268 @@ +# ---------------------------------------------------------------------------- +# SpeechLM: Enhanced Speech Pre-Training with Unpaired Textual Data (https://arxiv.org/abs/2209.15329) +# Github source: https://github.com/microsoft/SpeechT5/tree/main/SpeechLM +# Code based on fairseq: https://github.com/facebookresearch/fairseq/tree/272c4c5197250997148fb12c0db6306035f166a4 +# +# Copyright (c) 2022 Microsoft +# Licensed under The MIT License [see LICENSE for details] +# ---------------------------------------------------------------------------- + +import contextlib +import torch +import torch.nn as nn +from argparse import Namespace +from dataclasses import dataclass, field +from typing import Any +from fairseq import checkpoint_utils, tasks, utils +from fairseq.models import FairseqEncoderDecoderModel, register_model +from fairseq.models.fairseq_decoder import 
FairseqDecoder +from fairseq.models.fairseq_encoder import FairseqEncoder +from fairseq.tasks import FairseqTask +from fairseq.dataclass import ChoiceEnum +from fairseq.dataclass.utils import convert_namespace_to_omegaconf +from fairseq.data.data_utils import lengths_to_padding_mask + +from fairseq.models.hubert import HubertAsrConfig +from speechlm.modules.transformer_decoder import TransformerDecoderScriptable + +@dataclass +class SpeechLMS2TConfig(HubertAsrConfig): + activation_fn: ChoiceEnum(utils.get_available_activation_fns()) = field( + default="gelu", metadata={"help": "activation function to use"} + ) + use_rel_pos_enc: bool = field( + default=True, + metadata={"help": "whether to use relative positional encoding for decoder"}, + ) + encoder_embed_dim: int = field( + default=768, metadata={"help": "encoder embedding dimension, used for enc-dec att"} + ) + decoder_embed_dim: int = field( + default=768, metadata={"help": "decoder embedding dimension"} + ) + decoder_output_dim: int = field( + default=768, metadata={"help": "decoder output dimension"} + ) + decoder_ffn_embed_dim: int = field( + default=3072, metadata={"help": "decoder embedding dimension for FFN"} + ) + decoder_layers: int = field(default=6, metadata={"help": "num of decoder layers"}) + decoder_layerdrop: float = field( + default=0.0, metadata={"help": "decoder layerdrop chance"} + ) + decoder_attention_heads: int = field( + default=12, metadata={"help": "num decoder attention heads"} + ) + decoder_learned_pos: bool = field( + default=False, + metadata={"help": "use learned positional embeddings in the decoder"}, + ) + decoder_normalize_before: bool = field( + default=False, metadata={"help": "apply layernorm before each decoder block"} + ) + no_token_positional_embeddings: bool = field( + default=False, + metadata={ + "help": "if set, disables positional embeddings (outside self attention)" + }, + ) + decoder_dropout: float = field( + default=0.0, metadata={"help": "dropout probability in the decoder"} + ) + decoder_attention_dropout: float = field( + default=0.0, + metadata={ + "help": "dropout probability for attention weights inside the decoder" + }, + ) + decoder_activation_dropout: float = field( + default=0.0, + metadata={ + "help": "dropout probability after activation in FFN inside the decoder" + }, + ) + share_decoder_input_output_embed: bool = field( + default=False, metadata={"help": "share decoder input and output embeddings"} + ) + ### the following config is only for the compatibility to fairseq speech_to_text task + input_feat_per_channel: Any = None + input_channels: Any = None + speaker_to_id: Any = None + +@register_model("speechlm_st_legacy", dataclass=SpeechLMS2TConfig) +class SpeechLMS2T(FairseqEncoderDecoderModel): + def __init__(self, cfg: SpeechLMS2TConfig, encoder: FairseqEncoder, decoder: FairseqDecoder): + super().__init__(encoder, decoder) + self.cfg = cfg + + def upgrade_state_dict_named(self, state_dict, name): + super().upgrade_state_dict_named(state_dict, name) + return state_dict + + @classmethod + def build_model(cls, cfg: SpeechLMS2TConfig, task: FairseqTask): + """Build a new model instance.""" + def build_embedding(dictionary, embed_dim): + num_embeddings = len(dictionary) + padding_idx = dictionary.pad() + return Embedding(num_embeddings, embed_dim, padding_idx) + + src_dict, tgt_dict = task.source_dictionary, task.target_dictionary + encoder = SpeechLMEncoder(cfg, task) + assert cfg.encoder_embed_dim == encoder.w2v_model.encoder.embedding_dim + decoder_embed_tokens = 
build_embedding(tgt_dict, cfg.decoder_embed_dim) + decoder = TransformerDecoderScriptable(cfg, tgt_dict, decoder_embed_tokens) + return cls(cfg, encoder, decoder) + + +class SpeechLMEncoder(FairseqEncoder): + """ + Modified from fairseq.models.hubert.hubert_asr.HubertEncoder + 1. make it compatible with fairseq speech_to_text task + 2. make it compatible with encoder-decoder model + """ + def __init__(self, cfg: HubertAsrConfig, task): + self.apply_mask = cfg.apply_mask + + arg_overrides = { + "dropout": cfg.dropout, + "activation_dropout": cfg.activation_dropout, + "dropout_input": cfg.dropout_input, + "attention_dropout": cfg.attention_dropout, + "mask_length": cfg.mask_length, + "mask_prob": cfg.mask_prob, + "mask_selection": cfg.mask_selection, + "mask_other": cfg.mask_other, + "no_mask_overlap": cfg.no_mask_overlap, + "mask_channel_length": cfg.mask_channel_length, + "mask_channel_prob": cfg.mask_channel_prob, + "mask_channel_selection": cfg.mask_channel_selection, + "mask_channel_other": cfg.mask_channel_other, + "no_mask_channel_overlap": cfg.no_mask_channel_overlap, + "encoder_layerdrop": cfg.layerdrop, + "feature_grad_mult": cfg.feature_grad_mult, + } + + if cfg.w2v_args is None: + state = checkpoint_utils.load_checkpoint_to_cpu(cfg.w2v_path, arg_overrides) + w2v_args = state.get("cfg", None) + if w2v_args is None: + w2v_args = convert_namespace_to_omegaconf(state["args"]) + cfg.w2v_args = w2v_args + else: + state = None + w2v_args = cfg.w2v_args + if isinstance(w2v_args, Namespace): + cfg.w2v_args = w2v_args = convert_namespace_to_omegaconf(w2v_args) + + assert task.data_cfg.standardize_audio() == w2v_args.task.normalize, ( + "Fine-tuning works best when data normalization is the same. " + "Please check that --normalize is set or unset for " + "both pre-training and here" + ) + + w2v_args.task.data = cfg.data + pretrain_task = tasks.setup_task(w2v_args.task) + if state is not None and "task_state" in state: + # This will load the stored "dictionaries" object + pretrain_task.load_state_dict(state["task_state"]) + else: + pretrain_task.load_state_dict(task.state_dict()) + + model = pretrain_task.build_model(w2v_args.model, from_checkpoint=True) + if state is not None and not cfg.no_pretrained_weights: + # set strict=False because we omit some modules + model.load_state_dict(state["model"], strict=False) + + model.remove_pretraining_modules() + + super().__init__(pretrain_task.source_dictionary) + + d = w2v_args.model.encoder_embed_dim + + self.w2v_model = model + + self.final_dropout = nn.Dropout(cfg.final_dropout) + self.freeze_finetune_updates = cfg.freeze_finetune_updates + self.num_updates = 0 + + def set_num_updates(self, num_updates): + """Set the number of parameters updates.""" + super().set_num_updates(num_updates) + self.num_updates = num_updates + + def forward(self, src_tokens=None, src_lengths=None, **kwargs): + + w2v_args = { + "source": src_tokens, + "padding_mask": lengths_to_padding_mask(src_lengths), + "mask": self.apply_mask and self.training, + } + + ft = self.freeze_finetune_updates <= self.num_updates + + with torch.no_grad() if not ft else contextlib.ExitStack(): + x, padding_mask = self.w2v_model.extract_features(**w2v_args) + # B x T x C -> T x B x C + x = x.transpose(0, 1) + + x = self.final_dropout(x) + + return { + "encoder_out": [x], # T x B x C + "encoder_padding_mask": [padding_mask], # B x T + "padding_mask": [padding_mask], + } + + def forward_torchscript(self, net_input): + """A TorchScript-compatible version of forward. 
+ + Encoders which use additional arguments may want to override + this method for TorchScript compatibility. + """ + _net_input = { + "source": net_input["src_tokens"], + "padding_mask": lengths_to_padding_mask(net_input["src_lengths"]), + "mask": False, + } + + x, padding_mask = self.w2v_model.extract_features(**_net_input) + # B x T x C -> T x B x C + x = x.transpose(0, 1) + + encoder_out = { + "encoder_out" : [x], + "encoder_padding_mask" : [padding_mask], + } + return encoder_out + + def reorder_encoder_out(self, encoder_out, new_order): + if encoder_out["encoder_out"] is not None: + encoder_out["encoder_out"] = [ + x.index_select(1, new_order) for x in encoder_out["encoder_out"] + ] + if encoder_out["encoder_padding_mask"] is not None: + encoder_out["encoder_padding_mask"] = [ + x.index_select(0, new_order) for x in encoder_out["encoder_padding_mask"] + ] + return encoder_out + + def max_positions(self): + """Maximum input length supported by the encoder.""" + return None + + def upgrade_state_dict_named(self, state_dict, name): + return state_dict + +def Embedding(num_embeddings, embedding_dim, padding_idx): + m = nn.Embedding(num_embeddings, embedding_dim, padding_idx=padding_idx) + nn.init.normal_(m.weight, mean=0, std=embedding_dim**-0.5) + nn.init.constant_(m.weight[padding_idx], 0) + return m + +def Linear(in_features, out_features, bias=True): + m = nn.Linear(in_features, out_features, bias) + nn.init.xavier_uniform_(m.weight) + if bias: + nn.init.constant_(m.bias, 0.0) + return m diff --git a/SpeechLM/speechlm/modules/__init__.py b/SpeechLM/speechlm/modules/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..7cc082498ee85c984cbbf12776ced6f16e6a0bbf --- /dev/null +++ b/SpeechLM/speechlm/modules/__init__.py @@ -0,0 +1,23 @@ +# -------------------------------------------------------- +# The YiTrans End-to-End Speech Translation System for IWSLT 2022 Offline Shared Task (https://arxiv.org/abs/2206.05777) +# Github source: https://github.com/microsoft/SpeechT5/tree/main/YiTrans +# Copyright (c) 2022 Microsoft +# Licensed under The MIT License [see LICENSE for details] +# Based on fairseq code bases +# https://github.com/facebookresearch/fairseq +# -------------------------------------------------------- + +from .multihead_attention import MultiheadAttention +from .relative_pos_enc import RelativePositionalEncoding +from .transformer_layer import TransformerEncoderLayerBase, TransformerDecoderLayerBase +from .w2v_encoder import TransformerEncoder, TransformerSentenceEncoderLayer +from .learned_positional_embedding import LearnedPositionalEmbedding + +__all__ = [ + "MultiheadAttention", + "RelativePositionalEncoding", + "TransformerEncoderLayerBase", + "TransformerDecoderLayerBase", + "TransformerEncoder", + "TransformerSentenceEncoderLayer" +] diff --git a/SpeechLM/speechlm/modules/learned_positional_embedding.py b/SpeechLM/speechlm/modules/learned_positional_embedding.py new file mode 100644 index 0000000000000000000000000000000000000000..9a6d55a37d456715e50da0d23b48005af1aec248 --- /dev/null +++ b/SpeechLM/speechlm/modules/learned_positional_embedding.py @@ -0,0 +1,68 @@ +# -------------------------------------------------------- +# Copyright (c) 2022 Microsoft +# Licensed under The MIT License [see LICENSE for details] +# Based on fairseq code bases +# https://github.com/facebookresearch/fairseq +# -------------------------------------------------------- +""" + Modified from 
https://github.com/facebookresearch/fairseq/blob/main/fairseq/modules/learned_positional_embedding.py + 1. Add clamping if the input length exceeds the max-source-tokens +""" + +from typing import Dict, Optional + +import torch +import torch.nn as nn +import torch.nn.functional as F +from fairseq import utils +from torch import Tensor + + +class LearnedPositionalEmbedding(nn.Embedding): + """ + This module learns positional embeddings up to a fixed maximum size. + Padding ids are ignored by either offsetting based on padding_idx + or by setting padding_idx to None and ensuring that the appropriate + position ids are passed to the forward function. + """ + + def __init__(self, num_embeddings: int, embedding_dim: int, padding_idx: int): + super().__init__(num_embeddings, embedding_dim, padding_idx) + self.onnx_trace = False + if self.padding_idx is not None: + self.max_positions = self.num_embeddings - self.padding_idx - 1 + else: + self.max_positions = self.num_embeddings + + def forward( + self, + input: Tensor, + incremental_state: Optional[Dict[str, Dict[str, Optional[Tensor]]]] = None, + positions: Optional[Tensor] = None, + ): + """Input is expected to be of size [bsz x seqlen].""" + assert (positions is None) or ( + self.padding_idx is None + ), "If positions is pre-computed then padding_idx should not be set." + + if positions is None: + if incremental_state is not None: + # positions is the same for every token when decoding a single step + # Without the int() cast, it doesn't work in some cases when exporting to ONNX + positions = torch.zeros( + (1, 1), device=input.device, dtype=input.dtype + ).fill_(int(self.padding_idx + input.size(1))) + else: + positions = utils.make_positions( + input, self.padding_idx, onnx_trace=self.onnx_trace + ) + positions = torch.clamp(positions, max=self.padding_idx + self.max_positions) + return F.embedding( + positions, + self.weight, + self.padding_idx, + self.max_norm, + self.norm_type, + self.scale_grad_by_freq, + self.sparse, + ) diff --git a/SpeechLM/speechlm/modules/multihead_attention.py b/SpeechLM/speechlm/modules/multihead_attention.py new file mode 100644 index 0000000000000000000000000000000000000000..a6ac408c623bea27aef3e77db30d91ad6fb904bc --- /dev/null +++ b/SpeechLM/speechlm/modules/multihead_attention.py @@ -0,0 +1,348 @@ +# -------------------------------------------------------- +# Pre-Training Transformer Decoder for End-to-End ASR Model with Unpaired Speech Data (https://arxiv.org/abs/2203.17113) +# Github source: https://github.com/microsoft/SpeechT5/tree/main/Speech2C +# Copyright (c) 2022 Microsoft +# Licensed under The MIT License [see LICENSE for details] +# Based on fairseq code bases +# https://github.com/pytorch/fairseq +# -------------------------------------------------------- + +from typing import Dict, Optional, Tuple + +import torch +import torch.nn.functional as F +from fairseq import utils +from torch import Tensor + +from fairseq.modules import MultiheadAttention as FairseqMultiheadAttention + + +class MultiheadAttention(FairseqMultiheadAttention): + """Multi-headed attention. + + See "Attention Is All You Need" for more details. 
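+
+    This subclass extends fairseq's MultiheadAttention with an optional attention-weight
+    scaling factor (scaling_for_att, used to prevent overflow issues in large models) and
+    an optional relative-position bias passed to forward via the position_bias argument.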
+ """ + + def __init__( + self, + embed_dim, + num_heads, + kdim=None, + vdim=None, + dropout=0.0, + bias=True, + add_bias_kv=False, + add_zero_attn=False, + self_attention=False, + encoder_decoder_attention=False, + q_noise=0.0, + qn_block_size=8, + scaling_for_att=1.0 + ): + super().__init__( + embed_dim, + num_heads, + kdim, + vdim, + dropout, + bias, + add_bias_kv, + add_zero_attn, + self_attention, + encoder_decoder_attention, + q_noise, + qn_block_size, + ) + self.scaling_for_att = scaling_for_att + + def forward( + self, + query, + key: Optional[Tensor], + value: Optional[Tensor], + key_padding_mask: Optional[Tensor] = None, + incremental_state: Optional[Dict[str, Dict[str, Optional[Tensor]]]] = None, + need_weights: bool = True, + static_kv: bool = False, + attn_mask: Optional[Tensor] = None, + before_softmax: bool = False, + need_head_weights: bool = False, + position_bias: Optional[Tensor] = None, + ) -> Tuple[Tensor, Optional[Tensor]]: + """Input shape: Time x Batch x Channel + + Args: + key_padding_mask (ByteTensor, optional): mask to exclude + keys that are pads, of shape `(batch, src_len)`, where + padding elements are indicated by 1s. + need_weights (bool, optional): return the attention weights, + averaged over heads (default: False). + attn_mask (ByteTensor, optional): typically used to + implement causal attention, where the mask prevents the + attention from looking forward in time (default: None). + before_softmax (bool, optional): return the raw attention + weights and values before the attention softmax. + need_head_weights (bool, optional): return the attention + weights for each head. Implies *need_weights*. Default: + return the average attention weights over all heads. + """ + if need_head_weights: + need_weights = True + + is_tpu = query.device.type == "xla" + + tgt_len, bsz, embed_dim = query.size() + src_len = tgt_len + assert embed_dim == self.embed_dim, f"query dim {embed_dim} != {self.embed_dim}" + assert list(query.size()) == [tgt_len, bsz, embed_dim] + if key is not None: + src_len, key_bsz, _ = key.size() + if not torch.jit.is_scripting(): + assert key_bsz == bsz + assert value is not None + assert src_len, bsz == value.shape[:2] + + if ( + not self.onnx_trace + and not is_tpu # don't use PyTorch version on TPUs + and incremental_state is None + and not static_kv + # A workaround for quantization to work. Otherwise JIT compilation + # treats bias in linear module as method. 
+ and not torch.jit.is_scripting() + and position_bias is None + ): + assert key is not None and value is not None + return F.multi_head_attention_forward( + query, + key, + value, + self.embed_dim, + self.num_heads, + torch.empty([0]), + torch.cat((self.q_proj.bias, self.k_proj.bias, self.v_proj.bias)), + self.bias_k, + self.bias_v, + self.add_zero_attn, + self.dropout_module.p, + self.out_proj.weight, + self.out_proj.bias, + self.training or self.dropout_module.apply_during_inference, + key_padding_mask, + need_weights, + attn_mask, + use_separate_proj_weight=True, + q_proj_weight=self.q_proj.weight, + k_proj_weight=self.k_proj.weight, + v_proj_weight=self.v_proj.weight, + ) + + if incremental_state is not None: + saved_state = self._get_input_buffer(incremental_state) + if saved_state is not None and "prev_key" in saved_state: + # previous time steps are cached - no need to recompute + # key and value if they are static + if static_kv: + assert self.encoder_decoder_attention and not self.self_attention + key = value = None + else: + saved_state = None + + if self.self_attention: + q = self.q_proj(query) + k = self.k_proj(query) + v = self.v_proj(query) + elif self.encoder_decoder_attention: + # encoder-decoder attention + q = self.q_proj(query) + if key is None: + assert value is None + k = v = None + else: + k = self.k_proj(key) + v = self.v_proj(key) + + else: + assert key is not None and value is not None + q = self.q_proj(query) + k = self.k_proj(key) + v = self.v_proj(value) + q *= self.scaling + q *= (1 / self.scaling_for_att) + + if self.bias_k is not None: + assert self.bias_v is not None + k = torch.cat([k, self.bias_k.repeat(1, bsz, 1)]) + v = torch.cat([v, self.bias_v.repeat(1, bsz, 1)]) + if attn_mask is not None: + attn_mask = torch.cat( + [attn_mask, attn_mask.new_zeros(attn_mask.size(0), 1)], dim=1 + ) + if key_padding_mask is not None: + key_padding_mask = torch.cat( + [ + key_padding_mask, + key_padding_mask.new_zeros(key_padding_mask.size(0), 1), + ], + dim=1, + ) + + q = ( + q.contiguous() + .view(tgt_len, bsz * self.num_heads, self.head_dim) + .transpose(0, 1) + ) + if k is not None: + k = ( + k.contiguous() + .view(-1, bsz * self.num_heads, self.head_dim) + .transpose(0, 1) + ) + if v is not None: + v = ( + v.contiguous() + .view(-1, bsz * self.num_heads, self.head_dim) + .transpose(0, 1) + ) + + if saved_state is not None: + # saved states are stored with shape (bsz, num_heads, seq_len, head_dim) + if "prev_key" in saved_state: + _prev_key = saved_state["prev_key"] + assert _prev_key is not None + prev_key = _prev_key.view(bsz * self.num_heads, -1, self.head_dim) + if static_kv: + k = prev_key + else: + assert k is not None + k = torch.cat([prev_key, k], dim=1) + src_len = k.size(1) + if "prev_value" in saved_state: + _prev_value = saved_state["prev_value"] + assert _prev_value is not None + prev_value = _prev_value.view(bsz * self.num_heads, -1, self.head_dim) + if static_kv: + v = prev_value + else: + assert v is not None + v = torch.cat([prev_value, v], dim=1) + prev_key_padding_mask: Optional[Tensor] = None + if "prev_key_padding_mask" in saved_state: + prev_key_padding_mask = saved_state["prev_key_padding_mask"] + assert k is not None and v is not None + key_padding_mask = MultiheadAttention._append_prev_key_padding_mask( + key_padding_mask=key_padding_mask, + prev_key_padding_mask=prev_key_padding_mask, + batch_size=bsz, + src_len=k.size(1), + static_kv=static_kv, + ) + + saved_state["prev_key"] = k.view(bsz, self.num_heads, -1, self.head_dim) + 
saved_state["prev_value"] = v.view(bsz, self.num_heads, -1, self.head_dim) + saved_state["prev_key_padding_mask"] = key_padding_mask + # In this branch incremental_state is never None + assert incremental_state is not None + incremental_state = self._set_input_buffer(incremental_state, saved_state) + assert k is not None + assert k.size(1) == src_len + + # This is part of a workaround to get around fork/join parallelism + # not supporting Optional types. + if key_padding_mask is not None and key_padding_mask.dim() == 0: + key_padding_mask = None + + if key_padding_mask is not None: + assert key_padding_mask.size(0) == bsz + assert key_padding_mask.size(1) == src_len + + if self.add_zero_attn: + assert v is not None + src_len += 1 + k = torch.cat([k, k.new_zeros((k.size(0), 1) + k.size()[2:])], dim=1) + v = torch.cat([v, v.new_zeros((v.size(0), 1) + v.size()[2:])], dim=1) + if attn_mask is not None: + attn_mask = torch.cat( + [attn_mask, attn_mask.new_zeros(attn_mask.size(0), 1)], dim=1 + ) + if key_padding_mask is not None: + key_padding_mask = torch.cat( + [ + key_padding_mask, + torch.zeros(key_padding_mask.size(0), 1).type_as( + key_padding_mask + ), + ], + dim=1, + ) + + attn_weights = torch.bmm(q, k.transpose(1, 2)) + attn_weights = self.apply_sparse_mask(attn_weights, tgt_len, src_len, bsz) + + if position_bias is not None: ## first order + ## position_bias: [241, 241, 64] + #print ("attn_weights: ", attn_weights.size()) # [492, 241, 241] + reshape_q = q.contiguous().view(bsz * self.num_heads, -1, self.head_dim).transpose(0,1) #[241, 492, 64] + #print ("reshape_q: ", reshape_q.size()) + B = torch.matmul(reshape_q, position_bias.transpose(-2, -1)) + #print ("B: ", B.size()) ## [241, 492, 241] + #B = B.transpose(0, 1).view(bsz, self.num_heads, position_bias.size(0), position_bias.size(1)) + B = B.transpose(0, 1).view(bsz*self.num_heads, position_bias.size(0), position_bias.size(1)) + #print ("B 2: ", B.size()) + attn_weights += B + + attn_weights *= self.scaling_for_att + assert list(attn_weights.size()) == [bsz * self.num_heads, tgt_len, src_len] + + if attn_mask is not None: + attn_mask = attn_mask.unsqueeze(0) + if self.onnx_trace: + attn_mask = attn_mask.repeat(attn_weights.size(0), 1, 1) + attn_weights += attn_mask + + if key_padding_mask is not None: + # don't attend to padding symbols + attn_weights = attn_weights.view(bsz, self.num_heads, tgt_len, src_len) + if not is_tpu: + attn_weights = attn_weights.masked_fill( + key_padding_mask.unsqueeze(1).unsqueeze(2).to(torch.bool), + float("-inf"), + ) + else: + attn_weights = attn_weights.transpose(0, 2) + attn_weights = attn_weights.masked_fill(key_padding_mask, float("-inf")) + attn_weights = attn_weights.transpose(0, 2) + attn_weights = attn_weights.view(bsz * self.num_heads, tgt_len, src_len) + + if self.scaling_for_att > 1.0: + attn_weights = attn_weights - attn_weights.detach().max(dim=-1, keepdim=True)[0] + + if before_softmax: + return attn_weights, v + + attn_weights_float = utils.softmax( + attn_weights, dim=-1, onnx_trace=self.onnx_trace + ) + attn_weights = attn_weights_float.type_as(attn_weights) + attn_probs = self.dropout_module(attn_weights) + + assert v is not None + attn = torch.bmm(attn_probs, v) + assert list(attn.size()) == [bsz * self.num_heads, tgt_len, self.head_dim] + if self.onnx_trace and attn.size(1) == 1: + # when ONNX tracing a single decoder step (sequence length == 1) + # the transpose is a no-op copy before view, thus unnecessary + attn = attn.contiguous().view(tgt_len, bsz, embed_dim) + else: + attn 
= attn.transpose(0, 1).contiguous().view(tgt_len, bsz, embed_dim) + attn = self.out_proj(attn) + attn_weights: Optional[Tensor] = None + if need_weights: + attn_weights = attn_weights_float.view( + bsz, self.num_heads, tgt_len, src_len + ).transpose(1, 0) + if not need_head_weights: + # average attention weights over heads + attn_weights = attn_weights.mean(dim=0) + + return attn, attn_weights diff --git a/SpeechLM/speechlm/modules/relative_pos_enc.py b/SpeechLM/speechlm/modules/relative_pos_enc.py new file mode 100644 index 0000000000000000000000000000000000000000..2a073ebf2893e9e9b092aa520bdaf927e9388c2b --- /dev/null +++ b/SpeechLM/speechlm/modules/relative_pos_enc.py @@ -0,0 +1,35 @@ +# -------------------------------------------------------- +# Pre-Training Transformer Decoder for End-to-End ASR Model with Unpaired Speech Data (https://arxiv.org/abs/2203.17113) +# Github source: https://github.com/microsoft/SpeechT5/tree/main/Speech2C +# Copyright (c) 2022 Microsoft +# Licensed under The MIT License [see LICENSE for details] +# Based on fairseq code bases +# https://github.com/pytorch/fairseq +# -------------------------------------------------------- + +import torch + +class RelativePositionalEncoding(torch.nn.Module): + def __init__(self, d_model, maxlen=1000, embed_v=False): + super(RelativePositionalEncoding, self).__init__() + + self.d_model = d_model + self.maxlen = maxlen + self.pe_k = torch.nn.Embedding(2*maxlen, d_model) + if embed_v: + self.pe_v = torch.nn.Embedding(2*maxlen, d_model) + self.embed_v = embed_v + + + def forward(self, pos_seq, incremental_state=None): + pos_seq[pos_seq < -self.maxlen] = -self.maxlen + pos_seq[pos_seq >= self.maxlen] = self.maxlen - 1 + pos_seq = pos_seq + self.maxlen + + if incremental_state is not None: + pos_seq = pos_seq[-1:] + + if self.embed_v: + return self.pe_k(pos_seq), self.pe_v(pos_seq) + else: + return self.pe_k(pos_seq), None diff --git a/SpeechLM/speechlm/modules/transformer_decoder.py b/SpeechLM/speechlm/modules/transformer_decoder.py new file mode 100644 index 0000000000000000000000000000000000000000..83e91fab234aaa9dfa00ed9f3fab63b98f97375a --- /dev/null +++ b/SpeechLM/speechlm/modules/transformer_decoder.py @@ -0,0 +1,544 @@ +# -------------------------------------------------------- +# The YiTrans End-to-End Speech Translation System for IWSLT 2022 Offline Shared Task (https://arxiv.org/abs/2206.05777) +# Github source: https://github.com/microsoft/SpeechT5/tree/main/YiTrans +# Copyright (c) 2022 Microsoft +# Licensed under The MIT License [see LICENSE for details] +# Based on fairseq code bases +# https://github.com/facebookresearch/fairseq +# -------------------------------------------------------- +""" + Modified from https://github.com/facebookresearch/fairseq/blob/main/fairseq/models/transformer/transformer_decoder.py +""" + +import math +from typing import Any, Dict, List, Optional + +import torch +import torch.nn as nn +from fairseq import utils +from fairseq.distributed import fsdp_wrap +from fairseq.models import FairseqIncrementalDecoder +from fairseq.models.transformer import TransformerConfig +from fairseq.modules import ( + AdaptiveSoftmax, + BaseLayer, + FairseqDropout, + LayerDropModuleList, + LayerNorm, + PositionalEmbedding, + SinusoidalPositionalEmbedding, +) +from fairseq.modules.checkpoint_activations import checkpoint_wrapper +from fairseq.modules.quant_noise import quant_noise as apply_quant_noise_ +from torch import Tensor + +from speechlm.modules import transformer_layer +from 
speechlm.modules.relative_pos_enc import RelativePositionalEncoding + +# rewrite name for backward compatibility in `make_generation_fast_` +def module_name_fordropout(module_name: str) -> str: + if module_name == "TransformerDecoderBase": + return "TransformerDecoder" + else: + return module_name + + +class TransformerDecoderBase(FairseqIncrementalDecoder): + """ + Transformer decoder consisting of *cfg.decoder.layers* layers. Each layer + is a :class:`TransformerDecoderLayer`. + + Args: + args (argparse.Namespace): parsed command-line arguments + dictionary (~fairseq.data.Dictionary): decoding dictionary + embed_tokens (torch.nn.Embedding): output embedding + no_encoder_attn (bool, optional): whether to attend to encoder outputs + (default: False). + """ + + def __init__( + self, + cfg, + dictionary, + embed_tokens, + no_encoder_attn=False, + output_projection=None, + use_rel_pos_enc=False, + ): + self.cfg = cfg + super().__init__(dictionary) + self.register_buffer("version", torch.Tensor([3])) + self._future_mask = torch.empty(0) + + self.dropout_module = FairseqDropout( + cfg.dropout, module_name=module_name_fordropout(self.__class__.__name__) + ) + self.decoder_layerdrop = cfg.decoder.layerdrop + self.share_input_output_embed = cfg.share_decoder_input_output_embed + + input_embed_dim = embed_tokens.embedding_dim + embed_dim = cfg.decoder.embed_dim + self.embed_dim = embed_dim + self.output_embed_dim = cfg.decoder.output_dim + + self.padding_idx = embed_tokens.padding_idx + self.max_target_positions = cfg.max_target_positions + + self.embed_tokens = embed_tokens + + self.embed_scale = 1.0 if cfg.no_scale_embedding else math.sqrt(embed_dim) + + if not cfg.adaptive_input and cfg.quant_noise.pq > 0: + self.quant_noise = apply_quant_noise_( + nn.Linear(embed_dim, embed_dim, bias=False), + cfg.quant_noise.pq, + cfg.quant_noise.pq_block_size, + ) + else: + self.quant_noise = None + + self.project_in_dim = ( + Linear(input_embed_dim, embed_dim, bias=False) + if embed_dim != input_embed_dim + else None + ) + self.embed_positions = ( + PositionalEmbedding( + self.max_target_positions, + embed_dim, + self.padding_idx, + learned=cfg.decoder.learned_pos, + ) + if not cfg.no_token_positional_embeddings + else None + ) + if cfg.layernorm_embedding: + self.layernorm_embedding = LayerNorm(embed_dim, export=cfg.export) + else: + self.layernorm_embedding = None + + self.cross_self_attention = cfg.cross_self_attention + + if self.decoder_layerdrop > 0.0: + self.layers = LayerDropModuleList(p=self.decoder_layerdrop) + else: + self.layers = nn.ModuleList([]) + self.use_rel_pos_enc = use_rel_pos_enc + self.layers.extend( + [ + self.build_decoder_layer(cfg, no_encoder_attn) + for _ in range(cfg.decoder.layers) + ] + ) + self.num_layers = len(self.layers) + + if cfg.decoder.normalize_before and not cfg.no_decoder_final_norm: + self.layer_norm = LayerNorm(embed_dim, export=cfg.export) + else: + self.layer_norm = None + + self.project_out_dim = ( + Linear(embed_dim, self.output_embed_dim, bias=False) + if embed_dim != self.output_embed_dim and not cfg.tie_adaptive_weights + else None + ) + + self.adaptive_softmax = None + self.output_projection = output_projection + if self.output_projection is None: + self.build_output_projection(cfg, dictionary, embed_tokens) + if self.use_rel_pos_enc: + self.pos_emb = RelativePositionalEncoding(embed_dim // cfg.decoder.attention_heads, 24) + + def build_output_projection(self, cfg, dictionary, embed_tokens): + if cfg.adaptive_softmax_cutoff is not None: + 
self.adaptive_softmax = AdaptiveSoftmax( + len(dictionary), + self.output_embed_dim, + utils.eval_str_list(cfg.adaptive_softmax_cutoff, type=int), + dropout=cfg.adaptive_softmax_dropout, + adaptive_inputs=embed_tokens if cfg.tie_adaptive_weights else None, + factor=cfg.adaptive_softmax_factor, + tie_proj=cfg.tie_adaptive_proj, + ) + elif self.share_input_output_embed: + self.output_projection = nn.Linear( + self.embed_tokens.weight.shape[1], + self.embed_tokens.weight.shape[0], + bias=False, + ) + self.output_projection.weight = self.embed_tokens.weight + else: + self.output_projection = nn.Linear( + self.output_embed_dim, len(dictionary), bias=False + ) + nn.init.normal_( + self.output_projection.weight, mean=0, std=self.output_embed_dim ** -0.5 + ) + num_base_layers = cfg.base_layers + for i in range(num_base_layers): + self.layers.insert( + ((i + 1) * cfg.decoder.layers) // (num_base_layers + 1), + BaseLayer(cfg), + ) + + def build_decoder_layer(self, cfg, no_encoder_attn=False): + layer = transformer_layer.TransformerDecoderLayerBase(cfg, no_encoder_attn, has_relative_attention_bias=self.use_rel_pos_enc) + checkpoint = cfg.checkpoint_activations + if checkpoint: + offload_to_cpu = cfg.offload_activations + layer = checkpoint_wrapper(layer, offload_to_cpu=offload_to_cpu) + # if we are checkpointing, enforce that FSDP always wraps the + # checkpointed layer, regardless of layer size + min_params_to_wrap = cfg.min_params_to_wrap if not checkpoint else 0 + layer = fsdp_wrap(layer, min_num_params=min_params_to_wrap) + return layer + + def forward( + self, + prev_output_tokens, + encoder_out: Optional[Dict[str, List[Tensor]]] = None, + incremental_state: Optional[Dict[str, Dict[str, Optional[Tensor]]]] = None, + features_only: bool = False, + full_context_alignment: bool = False, + alignment_layer: Optional[int] = None, + alignment_heads: Optional[int] = None, + src_lengths: Optional[Any] = None, + return_all_hiddens: bool = False, + ): + """ + Args: + prev_output_tokens (LongTensor): previous decoder outputs of shape + `(batch, tgt_len)`, for teacher forcing + encoder_out (optional): output from the encoder, used for + encoder-side attention, should be of size T x B x C + incremental_state (dict): dictionary used for storing state during + :ref:`Incremental decoding` + features_only (bool, optional): only return features without + applying output layer (default: False). + full_context_alignment (bool, optional): don't apply + auto-regressive mask to self-attention (default: False). 
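+            alignment_layer (int, optional): return mean alignment over
+                heads at this layer (default: last layer).
+            alignment_heads (int, optional): only average alignment over
+                this many heads (default: all heads).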
+ + Returns: + tuple: + - the decoder's output of shape `(batch, tgt_len, vocab)` + - a dictionary with any model-specific outputs + """ + + x, extra = self.extract_features( + prev_output_tokens, + encoder_out=encoder_out, + incremental_state=incremental_state, + full_context_alignment=full_context_alignment, + alignment_layer=alignment_layer, + alignment_heads=alignment_heads, + ) + + if not features_only: + x = self.output_layer(x) + return x, extra + + def extract_features( + self, + prev_output_tokens, + encoder_out: Optional[Dict[str, List[Tensor]]], + incremental_state: Optional[Dict[str, Dict[str, Optional[Tensor]]]] = None, + full_context_alignment: bool = False, + alignment_layer: Optional[int] = None, + alignment_heads: Optional[int] = None, + ): + return self.extract_features_scriptable( + prev_output_tokens, + encoder_out, + incremental_state, + full_context_alignment, + alignment_layer, + alignment_heads, + ) + + """ + A scriptable subclass of this class has an extract_features method and calls + super().extract_features, but super() is not supported in torchscript. A copy of + this function is made to be used in the subclass instead. + """ + + def extract_features_scriptable( + self, + prev_output_tokens, + encoder_out: Optional[Dict[str, List[Tensor]]], + incremental_state: Optional[Dict[str, Dict[str, Optional[Tensor]]]] = None, + full_context_alignment: bool = False, + alignment_layer: Optional[int] = None, + alignment_heads: Optional[int] = None, + ): + """ + Similar to *forward* but only return features. + + Includes several features from "Jointly Learning to Align and + Translate with Transformer Models" (Garg et al., EMNLP 2019). + + Args: + full_context_alignment (bool, optional): don't apply + auto-regressive mask to self-attention (default: False). + alignment_layer (int, optional): return mean alignment over + heads at this layer (default: last layer). + alignment_heads (int, optional): only average alignment over + this many heads (default: all heads). 
+ + Returns: + tuple: + - the decoder's features of shape `(batch, tgt_len, embed_dim)` + - a dictionary with any model-specific outputs + """ + bs, slen = prev_output_tokens.size() + if alignment_layer is None: + alignment_layer = self.num_layers - 1 + + enc: Optional[Tensor] = None + padding_mask: Optional[Tensor] = None + if encoder_out is not None and len(encoder_out["encoder_out"]) > 0: + enc = encoder_out["encoder_out"][0] + assert ( + enc.size()[1] == bs + ), f"Expected enc.shape == (t, {bs}, c) got {enc.shape}" + if encoder_out is not None and len(encoder_out["encoder_padding_mask"]) > 0: + padding_mask = encoder_out["encoder_padding_mask"][0] + + # embed positions + positions = None + if self.embed_positions is not None: + positions = self.embed_positions( + prev_output_tokens, incremental_state=incremental_state + ) + + if incremental_state is not None: + prev_output_tokens = prev_output_tokens[:, -1:] + if positions is not None: + positions = positions[:, -1:] + + # embed tokens and positions + x = self.embed_scale * self.embed_tokens(prev_output_tokens) + + if self.quant_noise is not None: + x = self.quant_noise(x) + + if self.project_in_dim is not None: + x = self.project_in_dim(x) + + if positions is not None: + x += positions + + if self.layernorm_embedding is not None: + x = self.layernorm_embedding(x) + + x = self.dropout_module(x) + + # B x T x C -> T x B x C + x = x.transpose(0, 1) + if self.use_rel_pos_enc: + pos_seq = torch.arange(0, slen).long().to(x.device) + pos_seq = pos_seq[:, None] - pos_seq[None, :] + pos_k, _ = self.pos_emb(pos_seq, incremental_state) + else: + pos_k = None + + self_attn_padding_mask: Optional[Tensor] = None + if self.cross_self_attention or prev_output_tokens.eq(self.padding_idx).any(): + self_attn_padding_mask = prev_output_tokens.eq(self.padding_idx) + + # decoder layers + attn: Optional[Tensor] = None + inner_states: List[Optional[Tensor]] = [x] + for idx, layer in enumerate(self.layers): + if incremental_state is None and not full_context_alignment: + self_attn_mask = self.buffered_future_mask(x) + else: + self_attn_mask = None + + x, layer_attn, _ = layer( + x, + enc, + padding_mask, + incremental_state, + self_attn_mask=self_attn_mask, + self_attn_padding_mask=self_attn_padding_mask, + need_attn=bool((idx == alignment_layer)), + need_head_weights=bool((idx == alignment_layer)), + pos_bias=pos_k, + ) + inner_states.append(x) + if layer_attn is not None and idx == alignment_layer: + attn = layer_attn.float().to(x) + + if attn is not None: + if alignment_heads is not None: + attn = attn[:alignment_heads] + + # average probabilities over heads + attn = attn.mean(dim=0) + + if self.layer_norm is not None: + x = self.layer_norm(x) + + # T x B x C -> B x T x C + x = x.transpose(0, 1) + + if self.project_out_dim is not None: + x = self.project_out_dim(x) + + return x, {"attn": [attn], "inner_states": inner_states} + + def output_layer(self, features): + """Project features to the vocabulary size.""" + if self.adaptive_softmax is None: + # project back to size of vocabulary + return self.output_projection(features) + else: + return features + + def max_positions(self): + """Maximum output length supported by the decoder.""" + if self.embed_positions is None: + return self.max_target_positions + return min(self.max_target_positions, self.embed_positions.max_positions) + + def buffered_future_mask(self, tensor): + dim = tensor.size(0) + # self._future_mask.device != tensor.device is not working in TorchScript. This is a workaround. 
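+ # Illustration: for dim = 4 the buffered mask is
+ #   [[0., -inf, -inf, -inf],
+ #    [0.,   0., -inf, -inf],
+ #    [0.,   0.,   0., -inf],
+ #    [0.,   0.,   0.,   0.]]
+ # i.e. -inf strictly above the diagonal, so position i may only attend to
+ # positions <= i. The mask is cached and re-sliced to [:dim, :dim] on later calls.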
+ if ( + self._future_mask.size(0) == 0 + or (not self._future_mask.device == tensor.device) + or self._future_mask.size(0) < dim + ): + self._future_mask = torch.triu( + utils.fill_with_neg_inf(torch.zeros([dim, dim])), 1 + ) + self._future_mask = self._future_mask.to(tensor) + return self._future_mask[:dim, :dim] + + def upgrade_state_dict_named(self, state_dict, name): + """Upgrade a (possibly old) state dict for new versions of fairseq.""" + if isinstance(self.embed_positions, SinusoidalPositionalEmbedding): + weights_key = "{}.embed_positions.weights".format(name) + if weights_key in state_dict: + del state_dict[weights_key] + state_dict[ + "{}.embed_positions._float_tensor".format(name) + ] = torch.FloatTensor(1) + + if f"{name}.output_projection.weight" not in state_dict: + if self.share_input_output_embed: + embed_out_key = f"{name}.embed_tokens.weight" + else: + embed_out_key = f"{name}.embed_out" + if embed_out_key in state_dict: + state_dict[f"{name}.output_projection.weight"] = state_dict[ + embed_out_key + ] + if not self.share_input_output_embed: + del state_dict[embed_out_key] + + for i in range(self.num_layers): + # update layer norms + layer_norm_map = { + "0": "self_attn_layer_norm", + "1": "encoder_attn_layer_norm", + "2": "final_layer_norm", + } + for old, new in layer_norm_map.items(): + for m in ("weight", "bias"): + k = "{}.layers.{}.layer_norms.{}.{}".format(name, i, old, m) + if k in state_dict: + state_dict[ + "{}.layers.{}.{}.{}".format(name, i, new, m) + ] = state_dict[k] + del state_dict[k] + + version_key = "{}.version".format(name) + if utils.item(state_dict.get(version_key, torch.Tensor([1]))[0]) <= 2: + # earlier checkpoints did not normalize after the stack of layers + self.layer_norm = None + self.normalize = False + state_dict[version_key] = torch.Tensor([1]) + + return state_dict + + +def Linear(in_features, out_features, bias=True): + m = nn.Linear(in_features, out_features, bias) + nn.init.xavier_uniform_(m.weight) + if bias: + nn.init.constant_(m.bias, 0.0) + return m + +class TransformerDecoderBaseScriptable(TransformerDecoderBase): + def extract_features( + self, + prev_output_tokens, + encoder_out: Optional[Dict[str, List[Tensor]]] = None, + incremental_state: Optional[Dict[str, Dict[str, Optional[Tensor]]]] = None, + full_context_alignment: bool = False, + alignment_layer: Optional[int] = None, + alignment_heads: Optional[int] = None, + ): + # call scriptable method from parent class + x, _ = self.extract_features_scriptable( + prev_output_tokens, + encoder_out, + incremental_state, + full_context_alignment, + alignment_layer, + alignment_heads, + ) + return x, None + + +class TransformerDecoder(TransformerDecoderBase): + def __init__( + self, + args, + dictionary, + embed_tokens, + no_encoder_attn=False, + output_projection=None, + ): + self.args = args + super().__init__( + TransformerConfig.from_namespace(args), + dictionary, + embed_tokens, + no_encoder_attn=no_encoder_attn, + output_projection=output_projection, + use_rel_pos_enc=getattr(args, "use_rel_pos_enc", False), + ) + + def build_output_projection(self, args, dictionary, embed_tokens): + super().build_output_projection( + TransformerConfig.from_namespace(args), dictionary, embed_tokens + ) + + def build_decoder_layer(self, args, no_encoder_attn=False): + return super().build_decoder_layer( + TransformerConfig.from_namespace(args), no_encoder_attn=no_encoder_attn + ) + +class TransformerDecoderScriptable(TransformerDecoder): + def extract_features( + self, + prev_output_tokens, + 
encoder_out: Optional[Dict[str, List[Tensor]]] = None, + incremental_state: Optional[Dict[str, Dict[str, Optional[Tensor]]]] = None, + full_context_alignment: bool = False, + alignment_layer: Optional[int] = None, + alignment_heads: Optional[int] = None, + ): + # call scriptable method from parent class + x, _ = self.extract_features_scriptable( + prev_output_tokens, + encoder_out, + incremental_state, + full_context_alignment, + alignment_layer, + alignment_heads, + ) + return x, None diff --git a/SpeechLM/speechlm/modules/transformer_encoder.py b/SpeechLM/speechlm/modules/transformer_encoder.py new file mode 100644 index 0000000000000000000000000000000000000000..43dc7f82708de434d383b751084e04e2f89d0bd9 --- /dev/null +++ b/SpeechLM/speechlm/modules/transformer_encoder.py @@ -0,0 +1,403 @@ +# ---------------------------------------------------------------------------- +# SpeechLM: Enhanced Speech Pre-Training with Unpaired Textual Data (https://arxiv.org/abs/2209.15329) +# Github source: https://github.com/microsoft/SpeechT5/tree/main/SpeechLM +# Code based on fairseq: https://github.com/facebookresearch/fairseq/tree/272c4c5197250997148fb12c0db6306035f166a4 +# +# Copyright (c) 2022 Microsoft +# Licensed under The MIT License [see LICENSE for details] +# ---------------------------------------------------------------------------- + +import math +from typing import Dict, List, Optional + +import torch +import torch.nn as nn +import torch.nn.functional as F +from fairseq import utils +from fairseq.distributed import fsdp_wrap +from fairseq.models import FairseqEncoder +from fairseq.modules import ( + FairseqDropout, + LayerDropModuleList, + LayerNorm, + SinusoidalPositionalEmbedding, +) +from fairseq.modules.checkpoint_activations import checkpoint_wrapper +from fairseq.modules.quant_noise import quant_noise as apply_quant_noise_ +from torch import Tensor +from fairseq.models.transformer import ( + TransformerConfig, +) + + +from speechlm.modules import transformer_layer, LearnedPositionalEmbedding +from speechlm.modules.relative_pos_enc import RelativePositionalEncoding + +# rewrite name for backward compatibility in `make_generation_fast_` +def module_name_fordropout(module_name: str) -> str: + if module_name == "TransformerEncoderBase": + return "TransformerEncoder" + else: + return module_name + + +class TransformerEncoderBase(FairseqEncoder): + """ + Transformer encoder consisting of *cfg.encoder.layers* layers. Each layer + is a :class:`TransformerEncoderLayer`. 
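+ Compared to the fairseq implementation it is based on, this version can add
+ a relative position bias computed once per forward pass (*use_rel_pos_enc*,
+ via RelativePositionalEncoding) and passed to every layer's self-attention,
+ and forwards an attention scaling factor (*scaling_for_att*) to each layer.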
+ + Args: + args (argparse.Namespace): parsed command-line arguments + dictionary (~fairseq.data.Dictionary): encoding dictionary + embed_tokens (torch.nn.Embedding): input embedding + """ + + def __init__(self, cfg, dictionary, embed_tokens, use_rel_pos_enc=False, scaling_for_att=1.0): + self.cfg = cfg + super().__init__(dictionary) + self.register_buffer("version", torch.Tensor([3])) + + self.dropout_module = FairseqDropout( + cfg.dropout, module_name=module_name_fordropout(self.__class__.__name__) + ) + self.encoder_layerdrop = cfg.encoder.layerdrop + + embed_dim = embed_tokens.embedding_dim + self.padding_idx = embed_tokens.padding_idx + self.max_source_positions = cfg.max_source_positions + + self.embed_tokens = embed_tokens + + self.embed_scale = 1.0 if cfg.no_scale_embedding else math.sqrt(embed_dim) + + self.embed_positions = ( + PositionalEmbedding( + cfg.max_source_positions, + embed_dim, + self.padding_idx, + learned=cfg.encoder.learned_pos, + ) + if not cfg.no_token_positional_embeddings + else None + ) + if cfg.layernorm_embedding: + self.layernorm_embedding = LayerNorm(embed_dim, export=cfg.export) + else: + self.layernorm_embedding = None + + if not cfg.adaptive_input and cfg.quant_noise.pq > 0: + self.quant_noise = apply_quant_noise_( + nn.Linear(embed_dim, embed_dim, bias=False), + cfg.quant_noise.pq, + cfg.quant_noise.pq_block_size, + ) + else: + self.quant_noise = None + + if self.encoder_layerdrop > 0.0: + self.layers = LayerDropModuleList(p=self.encoder_layerdrop) + else: + self.layers = nn.ModuleList([]) + self.use_rel_pos_enc = use_rel_pos_enc + self.scaling_for_att = scaling_for_att + self.layers.extend( + [self.build_encoder_layer(cfg) for i in range(cfg.encoder.layers)] + ) + self.num_layers = len(self.layers) + + if cfg.encoder.normalize_before: + self.layer_norm = LayerNorm(embed_dim, export=cfg.export) + else: + self.layer_norm = None + if self.use_rel_pos_enc: + self.pos_emb = RelativePositionalEncoding(embed_dim // cfg.encoder.attention_heads, 160) + + def build_encoder_layer(self, cfg): + layer = transformer_layer.TransformerEncoderLayerBase(cfg, has_relative_attention_bias=self.use_rel_pos_enc, scaling_for_att=self.scaling_for_att) + checkpoint = cfg.checkpoint_activations + if checkpoint: + offload_to_cpu = cfg.offload_activations + layer = checkpoint_wrapper(layer, offload_to_cpu=offload_to_cpu) + # if we are checkpointing, enforce that FSDP always wraps the + # checkpointed layer, regardless of layer size + min_params_to_wrap = cfg.min_params_to_wrap if not checkpoint else 0 + layer = fsdp_wrap(layer, min_num_params=min_params_to_wrap) + return layer + + def forward_embedding( + self, src_tokens, token_embedding: Optional[torch.Tensor] = None + ): + # embed tokens and positions + if token_embedding is None: + token_embedding = self.embed_tokens(src_tokens) + x = embed = self.embed_scale * token_embedding + if self.embed_positions is not None: + x = embed + self.embed_positions(src_tokens) + if self.layernorm_embedding is not None: + x = self.layernorm_embedding(x) + x = self.dropout_module(x) + if self.quant_noise is not None: + x = self.quant_noise(x) + return x, embed + + def forward( + self, + src_tokens, + src_lengths: Optional[torch.Tensor] = None, + return_all_hiddens: bool = False, + token_embeddings: Optional[torch.Tensor] = None, + uniformity_layers: Optional[List[int]] = None, + ): + """ + Args: + src_tokens (LongTensor): tokens in the source language of shape + `(batch, src_len)` + src_lengths (torch.LongTensor): lengths of each source 
sentence of + shape `(batch)` + return_all_hiddens (bool, optional): also return all of the + intermediate hidden states (default: False). + token_embeddings (torch.Tensor, optional): precomputed embeddings + default `None` will recompute embeddings + + Returns: + dict: + - **encoder_out** (Tensor): the last encoder layer's output of + shape `(src_len, batch, embed_dim)` + - **encoder_padding_mask** (ByteTensor): the positions of + padding elements of shape `(batch, src_len)` + - **encoder_embedding** (Tensor): the (scaled) embedding lookup + of shape `(batch, src_len, embed_dim)` + - **encoder_states** (List[Tensor]): all intermediate + hidden states of shape `(src_len, batch, embed_dim)`. + Only populated if *return_all_hiddens* is True. + """ + return self.forward_scriptable( + src_tokens, src_lengths, return_all_hiddens, token_embeddings, uniformity_layers + ) + + # TorchScript doesn't support super() method so that the scriptable Subclass + # can't access the base class model in Torchscript. + # Current workaround is to add a helper function with different name and + # call the helper function from scriptable Subclass. + def forward_scriptable( + self, + src_tokens, + src_lengths: Optional[torch.Tensor] = None, + return_all_hiddens: bool = False, + token_embeddings: Optional[torch.Tensor] = None, + uniformity_layers: Optional[List[int]] = None, + ): + """ + Args: + src_tokens (LongTensor): tokens in the source language of shape + `(batch, src_len)` + src_lengths (torch.LongTensor): lengths of each source sentence of + shape `(batch)` + return_all_hiddens (bool, optional): also return all of the + intermediate hidden states (default: False). + token_embeddings (torch.Tensor, optional): precomputed embeddings + default `None` will recompute embeddings + + Returns: + dict: + - **encoder_out** (Tensor): the last encoder layer's output of + shape `(src_len, batch, embed_dim)` + - **encoder_padding_mask** (ByteTensor): the positions of + padding elements of shape `(batch, src_len)` + - **encoder_embedding** (Tensor): the (scaled) embedding lookup + of shape `(batch, src_len, embed_dim)` + - **encoder_states** (List[Tensor]): all intermediate + hidden states of shape `(src_len, batch, embed_dim)`. + Only populated if *return_all_hiddens* is True. 
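+ Note:
+ uniformity_layers (List[int], optional): if given, hidden states are
+ L2-normalized after each listed layer (0 refers to the embedding output)
+ and collected in **uniformity_hiddens**, presumably for a uniformity /
+ representation regularizer computed by the caller; the normalized tensor
+ also becomes the input to the following layer.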
+ """ + # compute padding mask + encoder_padding_mask = src_tokens.eq(self.padding_idx) + has_pads = src_tokens.device.type == "xla" or encoder_padding_mask.any() + + x, encoder_embedding = self.forward_embedding(src_tokens, token_embeddings) + + # account for padding while computing the representation + if has_pads: + x = x * (1 - encoder_padding_mask.unsqueeze(-1).type_as(x)) + + # B x T x C -> T x B x C + x = x.transpose(0, 1) + if self.use_rel_pos_enc: + x_len = x.shape[0] + pos_seq = torch.arange(0, x_len).long().to(x.device) + pos_seq = pos_seq[:, None] - pos_seq[None, :] + pos_k, pos_v = self.pos_emb(pos_seq) + else: + pos_k = None + + encoder_states = [] + uniformity_hiddens = [] + + if return_all_hiddens: + encoder_states.append(x) + + if uniformity_layers is not None and 0 in uniformity_layers: + x = F.normalize(x.float(), dim=-1).type_as(x) + uniformity_hiddens.append(x) + + # encoder layers + for i, layer in enumerate(self.layers): + x = layer( + x, encoder_padding_mask=encoder_padding_mask if has_pads else None, + pos_bias=pos_k, + ) + if uniformity_layers is not None and i+1 in uniformity_layers: + x = F.normalize(x.float(), dim=-1).type_as(x) + uniformity_hiddens.append(x) + if return_all_hiddens: + assert encoder_states is not None + encoder_states.append(x) + + if self.layer_norm is not None: + x = self.layer_norm(x) + + # The Pytorch Mobile lite interpreter does not supports returning NamedTuple in + # `forward` so we use a dictionary instead. + # TorchScript does not support mixed values so the values are all lists. + # The empty list is equivalent to None. + src_lengths = ( + src_tokens.ne(self.padding_idx) + .sum(dim=1, dtype=torch.int32) + .reshape(-1, 1) + .contiguous() + ) + return { + "encoder_out": [x], # T x B x C + "encoder_padding_mask": [encoder_padding_mask], # B x T + "encoder_embedding": [encoder_embedding], # B x T x C + "encoder_states": encoder_states, # List[T x B x C] + "uniformity_hiddens": uniformity_hiddens, # List[T x B x C] + "src_tokens": [], + "src_lengths": [src_lengths], + } + + @torch.jit.export + def reorder_encoder_out(self, encoder_out: Dict[str, List[Tensor]], new_order): + """ + Reorder encoder output according to *new_order*. 
+ + Args: + encoder_out: output from the ``forward()`` method + new_order (LongTensor): desired order + + Returns: + *encoder_out* rearranged according to *new_order* + """ + if len(encoder_out["encoder_out"]) == 0: + new_encoder_out = [] + else: + new_encoder_out = [encoder_out["encoder_out"][0].index_select(1, new_order)] + if len(encoder_out["encoder_padding_mask"]) == 0: + new_encoder_padding_mask = [] + else: + new_encoder_padding_mask = [ + encoder_out["encoder_padding_mask"][0].index_select(0, new_order) + ] + if len(encoder_out["encoder_embedding"]) == 0: + new_encoder_embedding = [] + else: + new_encoder_embedding = [ + encoder_out["encoder_embedding"][0].index_select(0, new_order) + ] + + if len(encoder_out["src_tokens"]) == 0: + src_tokens = [] + else: + src_tokens = [(encoder_out["src_tokens"][0]).index_select(0, new_order)] + + if len(encoder_out["src_lengths"]) == 0: + src_lengths = [] + else: + src_lengths = [(encoder_out["src_lengths"][0]).index_select(0, new_order)] + + encoder_states = encoder_out["encoder_states"] + if len(encoder_states) > 0: + for idx, state in enumerate(encoder_states): + encoder_states[idx] = state.index_select(1, new_order) + + return { + "encoder_out": new_encoder_out, # T x B x C + "encoder_padding_mask": new_encoder_padding_mask, # B x T + "encoder_embedding": new_encoder_embedding, # B x T x C + "encoder_states": encoder_states, # List[T x B x C] + "src_tokens": src_tokens, # B x T + "src_lengths": src_lengths, # B x 1 + } + + def max_positions(self): + """Maximum input length supported by the encoder.""" + if self.embed_positions is None: + return self.max_source_positions + return min(self.max_source_positions, self.embed_positions.max_positions) + + def upgrade_state_dict_named(self, state_dict, name): + """Upgrade a (possibly old) state dict for new versions of fairseq.""" + if isinstance(self.embed_positions, SinusoidalPositionalEmbedding): + weights_key = "{}.embed_positions.weights".format(name) + if weights_key in state_dict: + print("deleting {0}".format(weights_key)) + del state_dict[weights_key] + state_dict[ + "{}.embed_positions._float_tensor".format(name) + ] = torch.FloatTensor(1) + for i in range(self.num_layers): + # update layer norms + self.layers[i].upgrade_state_dict_named( + state_dict, "{}.layers.{}".format(name, i) + ) + + version_key = "{}.version".format(name) + if utils.item(state_dict.get(version_key, torch.Tensor([1]))[0]) < 2: + # earlier checkpoints did not normalize after the stack of layers + self.layer_norm = None + self.normalize = False + state_dict[version_key] = torch.Tensor([1]) + return state_dict + + +class TransformerEncoder(TransformerEncoderBase): + def __init__(self, args, dictionary, embed_tokens): + self.args = args + super().__init__( + TransformerConfig.from_namespace(args), + dictionary, + embed_tokens, + use_rel_pos_enc=getattr(args, "use_rel_pos_enc", False), + scaling_for_att=getattr(args, "scaling_for_att", 1.0), + ) + + def build_encoder_layer(self, args): + return super().build_encoder_layer( + TransformerConfig.from_namespace(args), + ) + + +def PositionalEmbedding( + num_embeddings: int, + embedding_dim: int, + padding_idx: int, + learned: bool = False, +): + if learned: + # if padding_idx is specified then offset the embedding ids by + # this index and adjust num_embeddings appropriately + # TODO: The right place for this offset would be inside + # LearnedPositionalEmbedding. Move this there for a cleaner implementation. 
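+ # Why the offset: LearnedPositionalEmbedding (mirroring fairseq's version)
+ # gives non-pad tokens position ids starting at padding_idx + 1 and keeps
+ # padding_idx for pads, so e.g. with padding_idx = 1 and 1024 max positions
+ # the table needs 1024 + 1 + 1 = 1026 rows, which the adjustment below provides.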
+ if padding_idx is not None: + num_embeddings = num_embeddings + padding_idx + 1 + m = LearnedPositionalEmbedding(num_embeddings, embedding_dim, padding_idx) + nn.init.normal_(m.weight, mean=0, std=embedding_dim**-0.5) + if padding_idx is not None: + nn.init.constant_(m.weight[padding_idx], 0) + else: + m = SinusoidalPositionalEmbedding( + embedding_dim, + padding_idx, + init_size=num_embeddings + padding_idx + 1, + ) + return m diff --git a/SpeechLM/speechlm/modules/transformer_layer.py b/SpeechLM/speechlm/modules/transformer_layer.py new file mode 100644 index 0000000000000000000000000000000000000000..1e3fa96e71426ab77e52bcee6ce052673b0875db --- /dev/null +++ b/SpeechLM/speechlm/modules/transformer_layer.py @@ -0,0 +1,329 @@ +# -------------------------------------------------------- +# Copyright (c) 2022 Microsoft +# Licensed under The MIT License [see LICENSE for details] +# Based on fairseq code bases +# https://github.com/facebookresearch/fairseq +# -------------------------------------------------------- +""" + Modified from https://github.com/facebookresearch/fairseq/blob/main/fairseq/modules/transformer_layer.py + https://github.com/microsoft/SpeechT5/blob/main/Speech2C/speech2c/models/modules/transformer_decoder_layer.py +""" + +from typing import Dict, List, Optional + +import torch +from torch import Tensor +from fairseq.modules import LayerNorm +from speechlm.modules.multihead_attention import MultiheadAttention +from fairseq.modules.transformer_layer import TransformerEncoderLayerBase as FairseqTransformerEncoderLayerBase +from fairseq.modules.transformer_layer import TransformerDecoderLayerBase as FairseqTransformerDecoderLayerBase + + +class TransformerEncoderLayerBase(FairseqTransformerEncoderLayerBase): + """Encoder layer block. + + In the original paper each operation (multi-head attention or FFN) is + postprocessed with: `dropout -> add residual -> layernorm`. In the + tensor2tensor code they suggest that learning is more robust when + preprocessing each layer with layernorm and postprocessing with: + `dropout -> add residual`. We default to the approach in the paper, but the + tensor2tensor approach can be enabled by setting + *cfg.encoder.normalize_before* to ``True``. + + Args: + args (argparse.Namespace): parsed command-line arguments + """ + + def __init__(self, cfg, has_relative_attention_bias=False, scaling_for_att=1.0): + self.scaling_for_att = scaling_for_att + super().__init__(cfg) + if has_relative_attention_bias: + self.norm_k = LayerNorm(self.embed_dim // cfg.encoder.attention_heads) + + def build_self_attention(self, embed_dim, cfg, scaling_for_att=1.0): + return MultiheadAttention( + embed_dim, + cfg.encoder.attention_heads, + dropout=cfg.attention_dropout, + self_attention=True, + q_noise=self.quant_noise, + qn_block_size=self.quant_noise_block_size, + scaling_for_att=self.scaling_for_att, + ) + + def forward( + self, + x, + encoder_padding_mask: Optional[Tensor], + attn_mask: Optional[Tensor] = None, + pos_bias=None, + ): + """ + Args: + x (Tensor): input to the layer of shape `(seq_len, batch, embed_dim)` + encoder_padding_mask (ByteTensor): binary ByteTensor of shape + `(batch, seq_len)` where padding elements are indicated by ``1``. + attn_mask (ByteTensor): binary tensor of shape `(tgt_len, src_len)`, + where `tgt_len` is the length of output and `src_len` is the + length of input, though here both are equal to `seq_len`. + `attn_mask[tgt_i, src_j] = 1` means that when calculating the + embedding for `tgt_i`, we exclude (mask out) `src_j`. 
This is + useful for strided self-attention. + + Returns: + encoded output of shape `(seq_len, batch, embed_dim)` + """ + # anything in original attn_mask = 1, becomes -1e8 + # anything in original attn_mask = 0, becomes 0 + # Note that we cannot use -inf here, because at some edge cases, + # the attention weight (before softmax) for some padded element in query + # will become -inf, which results in NaN in model parameters + if attn_mask is not None: + attn_mask = attn_mask.masked_fill( + attn_mask.to(torch.bool), -1e8 if x.dtype == torch.float32 else -1e4 + ) + + residual = x + if self.normalize_before: + x = self.self_attn_layer_norm(x) + if pos_bias is not None: + pos_bias = self.norm_k(pos_bias) + x, _ = self.self_attn( + query=x, + key=x, + value=x, + key_padding_mask=encoder_padding_mask, + need_weights=False, + attn_mask=attn_mask, + position_bias=pos_bias, + ) + x = self.dropout_module(x) + x = self.residual_connection(x, residual) + if not self.normalize_before: + x = self.self_attn_layer_norm(x) + + residual = x + if self.normalize_before: + x = self.final_layer_norm(x) + x = self.activation_fn(self.fc1(x)) + x = self.activation_dropout_module(x) + x = self.fc2(x) + x = self.dropout_module(x) + x = self.residual_connection(x, residual) + if not self.normalize_before: + x = self.final_layer_norm(x) + return x + + + +class TransformerDecoderLayerBase(FairseqTransformerDecoderLayerBase): + """Decoder layer block. + + In the original paper each operation (multi-head attention, encoder + attention or FFN) is postprocessed with: `dropout -> add residual -> + layernorm`. In the tensor2tensor code they suggest that learning is more + robust when preprocessing each layer with layernorm and postprocessing with: + `dropout -> add residual`. We default to the approach in the paper, but the + tensor2tensor approach can be enabled by setting + *cfg.decoder.normalize_before* to ``True``. + + Args: + args (argparse.Namespace): parsed command-line arguments + no_encoder_attn (bool, optional): whether to attend to encoder outputs + (default: False). 
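+ has_relative_attention_bias (bool, optional): if True, add a LayerNorm
+ (``norm_k``) for the relative position embeddings passed to self-attention
+ through the *pos_bias* argument of :func:`forward` (default: False).
+ scaling_for_att (float, optional): extra scaling factor handed to the custom
+ :class:`MultiheadAttention`, apparently used to keep attention logits in a
+ numerically safe range (default: 1.0).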
+ """ + + def __init__( + self, cfg, no_encoder_attn=False, add_bias_kv=False, add_zero_attn=False, has_relative_attention_bias=False, scaling_for_att=1.0, + ): + self.scaling_for_att = scaling_for_att + super().__init__(cfg, + no_encoder_attn, + add_bias_kv, + add_zero_attn, + ) + + if has_relative_attention_bias: + self.norm_k = LayerNorm(self.embed_dim // cfg.decoder.attention_heads) + + def build_self_attention( + self, embed_dim, cfg, add_bias_kv=False, add_zero_attn=False + ): + return MultiheadAttention( + embed_dim, + cfg.decoder.attention_heads, + dropout=cfg.attention_dropout, + add_bias_kv=add_bias_kv, + add_zero_attn=add_zero_attn, + self_attention=not cfg.cross_self_attention, + q_noise=self.quant_noise, + qn_block_size=self.quant_noise_block_size, + scaling_for_att=self.scaling_for_att, + ) + + def build_encoder_attention(self, embed_dim, cfg): + return MultiheadAttention( + embed_dim, + cfg.decoder.attention_heads, + kdim=cfg.encoder.embed_dim, + vdim=cfg.encoder.embed_dim, + dropout=cfg.attention_dropout, + encoder_decoder_attention=True, + q_noise=self.quant_noise, + qn_block_size=self.quant_noise_block_size, + scaling_for_att=self.scaling_for_att, + ) + + def forward( + self, + x, + encoder_out: Optional[torch.Tensor] = None, + encoder_padding_mask: Optional[torch.Tensor] = None, + incremental_state: Optional[Dict[str, Dict[str, Optional[Tensor]]]] = None, + prev_self_attn_state: Optional[List[torch.Tensor]] = None, + prev_attn_state: Optional[List[torch.Tensor]] = None, + self_attn_mask: Optional[torch.Tensor] = None, + self_attn_padding_mask: Optional[torch.Tensor] = None, + need_attn: bool = False, + need_head_weights: bool = False, + pos_bias=None, + ): + """ + Args: + x (Tensor): input to the layer of shape `(seq_len, batch, embed_dim)` + encoder_padding_mask (ByteTensor, optional): binary + ByteTensor of shape `(batch, src_len)` where padding + elements are indicated by ``1``. + need_attn (bool, optional): return attention weights + need_head_weights (bool, optional): return attention weights + for each head (default: return average over heads). 
+ + Returns: + encoded output of shape `(seq_len, batch, embed_dim)` + """ + if need_head_weights: + need_attn = True + + residual = x + if self.normalize_before: + x = self.self_attn_layer_norm(x) + if pos_bias is not None: + pos_bias = self.norm_k(pos_bias) + if prev_self_attn_state is not None: + prev_key, prev_value = prev_self_attn_state[:2] + saved_state: Dict[str, Optional[Tensor]] = { + "prev_key": prev_key, + "prev_value": prev_value, + } + if len(prev_self_attn_state) >= 3: + saved_state["prev_key_padding_mask"] = prev_self_attn_state[2] + assert incremental_state is not None + self.self_attn._set_input_buffer(incremental_state, saved_state) + _self_attn_input_buffer = self.self_attn._get_input_buffer(incremental_state) + if self.cross_self_attention and not ( + incremental_state is not None + and _self_attn_input_buffer is not None + and "prev_key" in _self_attn_input_buffer + ): + if self_attn_mask is not None: + assert encoder_out is not None + self_attn_mask = torch.cat( + (x.new_zeros(x.size(0), encoder_out.size(0)), self_attn_mask), dim=1 + ) + if self_attn_padding_mask is not None: + if encoder_padding_mask is None: + assert encoder_out is not None + encoder_padding_mask = self_attn_padding_mask.new_zeros( + encoder_out.size(1), encoder_out.size(0) + ) + self_attn_padding_mask = torch.cat( + (encoder_padding_mask, self_attn_padding_mask), dim=1 + ) + assert encoder_out is not None + y = torch.cat((encoder_out, x), dim=0) + else: + y = x + + x, attn = self.self_attn( + query=x, + key=y, + value=y, + key_padding_mask=self_attn_padding_mask, + incremental_state=incremental_state, + need_weights=False, + attn_mask=self_attn_mask, + position_bias=pos_bias, + ) + if self.c_attn is not None: + tgt_len, bsz = x.size(0), x.size(1) + x = x.view(tgt_len, bsz, self.nh, self.head_dim) + x = torch.einsum("tbhd,h->tbhd", x, self.c_attn) + x = x.reshape(tgt_len, bsz, self.embed_dim) + if self.attn_ln is not None: + x = self.attn_ln(x) + x = self.dropout_module(x) + x = self.residual_connection(x, residual) + if not self.normalize_before: + x = self.self_attn_layer_norm(x) + + if self.encoder_attn is not None and encoder_out is not None: + residual = x + if self.normalize_before: + x = self.encoder_attn_layer_norm(x) + if prev_attn_state is not None: + prev_key, prev_value = prev_attn_state[:2] + saved_state: Dict[str, Optional[Tensor]] = { + "prev_key": prev_key, + "prev_value": prev_value, + } + if len(prev_attn_state) >= 3: + saved_state["prev_key_padding_mask"] = prev_attn_state[2] + assert incremental_state is not None + self.encoder_attn._set_input_buffer(incremental_state, saved_state) + + x, attn = self.encoder_attn( + query=x, + key=encoder_out, + value=encoder_out, + key_padding_mask=encoder_padding_mask, + incremental_state=incremental_state, + static_kv=True, + need_weights=need_attn or (not self.training and self.need_attn), + need_head_weights=need_head_weights, + ) + x = self.dropout_module(x) + x = self.residual_connection(x, residual) + if not self.normalize_before: + x = self.encoder_attn_layer_norm(x) + + residual = x + if self.normalize_before: + x = self.final_layer_norm(x) + + x = self.activation_fn(self.fc1(x)) + x = self.activation_dropout_module(x) + if self.ffn_layernorm is not None: + x = self.ffn_layernorm(x) + x = self.fc2(x) + x = self.dropout_module(x) + if self.w_resid is not None: + residual = torch.mul(self.w_resid, residual) + x = self.residual_connection(x, residual) + if not self.normalize_before: + x = self.final_layer_norm(x) + if self.onnx_trace and 
incremental_state is not None: + saved_state = self.self_attn._get_input_buffer(incremental_state) + assert saved_state is not None + if self_attn_padding_mask is not None: + self_attn_state = [ + saved_state["prev_key"], + saved_state["prev_value"], + saved_state["prev_key_padding_mask"], + ] + else: + self_attn_state = [saved_state["prev_key"], saved_state["prev_value"]] + return x, attn, self_attn_state + return x, attn, None + + def make_generation_fast_(self, need_attn: bool = False, **kwargs): + self.need_attn = need_attn diff --git a/SpeechLM/speechlm/modules/w2v_encoder.py b/SpeechLM/speechlm/modules/w2v_encoder.py new file mode 100644 index 0000000000000000000000000000000000000000..9b8c15f1289d49581dbaae321fe569daeffb5242 --- /dev/null +++ b/SpeechLM/speechlm/modules/w2v_encoder.py @@ -0,0 +1,283 @@ +# -------------------------------------------------------- +# The YiTrans End-to-End Speech Translation System for IWSLT 2022 Offline Shared Task (https://arxiv.org/abs/2206.05777) +# Github source: https://github.com/microsoft/SpeechT5/tree/main/YiTrans +# Copyright (c) 2022 Microsoft +# Licensed under The MIT License [see LICENSE for details] +# Based on fairseq code bases +# https://github.com/facebookresearch/fairseq +# -------------------------------------------------------- + +""" + wav2vec encoder adding relitive position bias, modified from + https://github.com/microsoft/SpeechT5/blob/main/Speech2C/speech2c/models/modules/transformer_encoder.py + https://github.com/facebookresearch/fairseq/blob/main/fairseq/models/wav2vec/wav2vec2.py +""" + +import math +import numpy as np +import torch +import torch.nn as nn +import torch.nn.functional as F +from fairseq import utils +from fairseq.dataclass import ChoiceEnum +from fairseq.modules import ( + LayerNorm, + SamePad, +) +from fairseq.modules.checkpoint_activations import checkpoint_wrapper +from fairseq.modules.transformer_sentence_encoder import init_bert_params +from fairseq.utils import index_put +from fairseq.distributed import fsdp_wrap +from fairseq.models.wav2vec.utils import pad_to_multiple + +## reload multi-head attition with rel-pos-bias +from fairseq.models.wav2vec.wav2vec2 import TransformerEncoder as W2vTransformerEncoder +from speechlm.modules.relative_pos_enc import RelativePositionalEncoding +from speechlm.modules.multihead_attention import MultiheadAttention + +EXTRACTOR_MODE_CHOICES = ChoiceEnum(["default", "layer_norm"]) +MASKING_DISTRIBUTION_CHOICES = ChoiceEnum(["static", "uniform", "normal", "poisson"]) + + +class TransformerEncoder(W2vTransformerEncoder): + def __init__(self, args): + super().__init__(args) + + self.dropout = args.dropout + self.embedding_dim = args.encoder_embed_dim + self.required_seq_len_multiple = args.required_seq_len_multiple + self.use_rel_pos_enc = getattr(args, "use_rel_pos_enc", False) + + self.pos_conv = nn.Conv1d( + self.embedding_dim, + self.embedding_dim, + kernel_size=args.conv_pos, + padding=args.conv_pos // 2, + groups=args.conv_pos_groups, + ) + dropout = 0 + std = math.sqrt((4 * (1.0 - dropout)) / (args.conv_pos * self.embedding_dim)) + nn.init.normal_(self.pos_conv.weight, mean=0, std=std) + nn.init.constant_(self.pos_conv.bias, 0) + + self.pos_conv = nn.utils.weight_norm(self.pos_conv, name="weight", dim=2) + self.pos_conv = nn.Sequential(self.pos_conv, SamePad(args.conv_pos), nn.GELU()) + + layers = [] + for _ in range(args.encoder_layers): + layer = TransformerSentenceEncoderLayer( + embedding_dim=self.embedding_dim, + ffn_embedding_dim=args.encoder_ffn_embed_dim, + 
num_attention_heads=args.encoder_attention_heads, + dropout=self.dropout, + attention_dropout=args.attention_dropout, + activation_dropout=args.activation_dropout, + activation_fn=args.activation_fn, + layer_norm_first=args.layer_norm_first, + has_relative_attention_bias=self.use_rel_pos_enc, + ) + if args.checkpoint_activations: + layer = fsdp_wrap(layer) + layer = checkpoint_wrapper(layer) + layers.append(layer) + self.layers = nn.ModuleList(layers) + + self.layer_norm_first = args.layer_norm_first + self.layer_norm = LayerNorm(self.embedding_dim) + self.layerdrop = args.encoder_layerdrop + if self.use_rel_pos_enc: + self.pos_emb = RelativePositionalEncoding(args.encoder_embed_dim // args.encoder_attention_heads, 160) + + + self.apply(init_bert_params) + + def forward(self, x, padding_mask=None, layer=None): + x, layer_results = self.extract_features(x, padding_mask, layer) + + if self.layer_norm_first and layer is None: + x = self.layer_norm(x) + + return x, layer_results + + def extract_features(self, x, padding_mask=None, tgt_layer=None): + + if padding_mask is not None: + x = index_put(x, padding_mask, 0) + + x_conv = self.pos_conv(x.transpose(1, 2)) + x_conv = x_conv.transpose(1, 2) + x = x + x_conv + + if not self.layer_norm_first: + x = self.layer_norm(x) + + # pad to the sequence length dimension + x, pad_length = pad_to_multiple( + x, self.required_seq_len_multiple, dim=-2, value=0 + ) + if pad_length > 0 and padding_mask is None: + padding_mask = x.new_zeros((x.size(0), x.size(1)), dtype=torch.bool) + padding_mask[:, -pad_length:] = True + else: + padding_mask, _ = pad_to_multiple( + padding_mask, self.required_seq_len_multiple, dim=-1, value=True + ) + x = F.dropout(x, p=self.dropout, training=self.training) + + # B x T x C -> T x B x C + x = x.transpose(0, 1) + + if self.use_rel_pos_enc: + x_len = x.shape[0] + pos_seq = torch.arange(0, x_len).long().to(x.device) + pos_seq = pos_seq[:, None] - pos_seq[None, :] + pos_k, pos_v = self.pos_emb(pos_seq) + else: + pos_k = None + + layer_results = [] + r = None + for i, layer in enumerate(self.layers): + dropout_probability = np.random.random() + if not self.training or (dropout_probability > self.layerdrop): + x, z = layer(x, self_attn_padding_mask=padding_mask, need_weights=False, pos_bias=pos_k) + if tgt_layer is not None: + # unpad if needed + if pad_length > 0: + layer_results.append( + ( + x[:-pad_length], + z[:, :-pad_length, :-pad_length] + if z is not None + else z, + ) + ) + else: + layer_results.append((x, z)) + if i == tgt_layer: + r = x + break + + if r is not None: + x = r + + # T x B x C -> B x T x C + x = x.transpose(0, 1) + # undo paddding + if pad_length > 0: + x = x[:, :-pad_length] + + return x, layer_results + + +class TransformerSentenceEncoderLayer(nn.Module): + """ + Implements a Transformer Encoder Layer used in BERT/XLM style pre-trained + models. 
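+ When *has_relative_attention_bias* is True, :func:`forward` additionally
+ accepts a *pos_bias* tensor (relative position embeddings produced by
+ RelativePositionalEncoding) that is forwarded to the self-attention as
+ ``position_bias``; in the pre-norm (*layer_norm_first*) configuration it is
+ first normalized by ``norm_k``.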
+ """ + + def __init__( + self, + embedding_dim: float = 768, + ffn_embedding_dim: float = 3072, + num_attention_heads: float = 8, + dropout: float = 0.1, + attention_dropout: float = 0.1, + activation_dropout: float = 0.1, + activation_fn: str = "relu", + layer_norm_first: bool = False, + has_relative_attention_bias: bool = False, + ) -> None: + + super().__init__() + # Initialize parameters + self.embedding_dim = embedding_dim + self.dropout = dropout + self.activation_dropout = activation_dropout + + # Initialize blocks + self.activation_fn = utils.get_activation_fn(activation_fn) + self.self_attn = MultiheadAttention( + self.embedding_dim, + num_attention_heads, + dropout=attention_dropout, + self_attention=True, + ) + + self.dropout1 = nn.Dropout(dropout) + self.dropout2 = nn.Dropout(self.activation_dropout) + self.dropout3 = nn.Dropout(dropout) + + self.layer_norm_first = layer_norm_first + + # layer norm associated with the self attention layer + self.self_attn_layer_norm = LayerNorm(self.embedding_dim) + self.fc1 = nn.Linear(self.embedding_dim, ffn_embedding_dim) + self.fc2 = nn.Linear(ffn_embedding_dim, self.embedding_dim) + + # layer norm associated with the position wise feed-forward NN + self.final_layer_norm = LayerNorm(self.embedding_dim) + + if has_relative_attention_bias: + self.norm_k = LayerNorm(self.embedding_dim//num_attention_heads) + + def forward( + self, + x: torch.Tensor, + self_attn_mask: torch.Tensor = None, + self_attn_padding_mask: torch.Tensor = None, + need_weights: bool = False, + att_args=None, + pos_bias=None, + ): + """ + LayerNorm is applied either before or after the self-attention/ffn + modules similar to the original Transformer imlementation. + """ + residual = x + + if self.layer_norm_first: + x = self.self_attn_layer_norm(x) + if pos_bias is not None: + pos_bias = self.norm_k(pos_bias) + x, attn = self.self_attn( + query=x, + key=x, + value=x, + key_padding_mask=self_attn_padding_mask, + attn_mask=self_attn_mask, + position_bias=pos_bias, + ) + x = self.dropout1(x) + x = residual + x + + residual = x + x = self.final_layer_norm(x) + x = self.activation_fn(self.fc1(x)) + x = self.dropout2(x) + x = self.fc2(x) + x = self.dropout3(x) + x = residual + x + else: + x, attn = self.self_attn( + query=x, + key=x, + value=x, + key_padding_mask=self_attn_padding_mask, + position_bias=pos_bias, + ) + + x = self.dropout1(x) + x = residual + x + + x = self.self_attn_layer_norm(x) + + residual = x + x = self.activation_fn(self.fc1(x)) + x = self.dropout2(x) + x = self.fc2(x) + x = self.dropout3(x) + x = residual + x + x = self.final_layer_norm(x) + + return x, attn diff --git a/SpeechLM/speechlm/scripts/pretrain_speechlm/base_speechlmh.sh b/SpeechLM/speechlm/scripts/pretrain_speechlm/base_speechlmh.sh new file mode 100644 index 0000000000000000000000000000000000000000..650f7dc43175f33c518d3fa294e32683aa77518c --- /dev/null +++ b/SpeechLM/speechlm/scripts/pretrain_speechlm/base_speechlmh.sh @@ -0,0 +1,43 @@ +# #################################### +# SpeechLM-H Base model # +# #################################### +[ $# -lt 2 ] && echo "Usage: $0 [mount=${PWD}] [world_size=32] [update_freq=1]" && exit 1 +[ ${PWD##*/} != SpeechLM ] && echo "Error: dir not match! Switch to SpeechLM/ and run it again!" 
&& exit 1 +DATA_DIR=$1 +TEXT_DATA_DIR=$2 +mount=$3 +world_size=$4 +update_freq=$5 +[ -z $mount ] && mount=${PWD} +[ -z $world_size ] && world_size=32 +[ -z $update_freq ] && update_freq=1 + +CODE_ROOT=${PWD} +MODEL_DIR="${mount}/exp/pretrain/base_speechlmh_${world_size}gpu_${update_freq}accum" +[ -d $MODEL_DIR ] || mkdir -p $MODEL_DIR + +python $CODE_ROOT/fairseq/fairseq_cli/hydra_train.py \ + --config-dir $CODE_ROOT/speechlm/config/pretrain \ + --config-name speechlm_base_librispeech \ + common.user_dir=$CODE_ROOT/speechlm \ + \ + task.labels='["km"]' \ + model.label_rate=50 \ + task.data=$DATA_DIR \ + task.label_dir=$DATA_DIR \ + task.text_cfg.text_data=$TEXT_DATA_DIR \ + \ + dataset.train_subset=\"train_960+train_text.km-ltr\" \ + dataset.valid_subset=\"dev_clean+dev_clean.km-ltr\" \ + dataset.num_workers=0 \ + dataset.max_tokens=1400000 \ + distributed_training.distributed_world_size=${world_size} \ + optimization.update_freq=[${update_freq}] \ + \ + common.tensorboard_logdir=$MODEL_DIR \ + checkpoint.save_dir=$MODEL_DIR \ + hydra.run.dir=$MODEL_DIR \ + hydra.job.name=pretrain + +# data_dir="/mnt/default/v-ziqzhang/data/stbert/data/librispeech/hubert_release_iter2_layer9_kmeans/local" +# text_data_dir="/mnt/default/v-ziqzhang/dataset/LibriLM/from_fastT2U/bin-idx" diff --git a/SpeechLM/speechlm/scripts/pretrain_speechlm/base_speechlmp.sh b/SpeechLM/speechlm/scripts/pretrain_speechlm/base_speechlmp.sh new file mode 100644 index 0000000000000000000000000000000000000000..2e0a81290f5416876f5f3dbf0884d5dd1af0fb16 --- /dev/null +++ b/SpeechLM/speechlm/scripts/pretrain_speechlm/base_speechlmp.sh @@ -0,0 +1,43 @@ +# #################################### +# SpeechLM-P Base model # +# #################################### +[ $# -lt 2 ] && echo "Usage: $0 [mount=${PWD}] [world_size=32] [update_freq=1]" && exit 1 +[ ${PWD##*/} != SpeechLM ] && echo "Error: dir not match! Switch to SpeechLM/ and run it again!" 
&& exit 1 +DATA_DIR=$1 +TEXT_DATA_DIR=$2 +mount=$3 +world_size=$4 +update_freq=$5 +[ -z $mount ] && mount=${PWD} +[ -z $world_size ] && world_size=32 +[ -z $update_freq ] && update_freq=1 + +CODE_ROOT=${PWD} +MODEL_DIR="${mount}/exp/pretrain/base_speechlmp_${world_size}gpu_${update_freq}accum" +[ -d $MODEL_DIR ] || mkdir -p $MODEL_DIR + +python $CODE_ROOT/fairseq/fairseq_cli/hydra_train.py \ + --config-dir $CODE_ROOT/speechlm/config/pretrain \ + --config-name speechlm_base_librispeech \ + common.user_dir=$CODE_ROOT/speechlm \ + \ + task.labels='["phn"]' \ + model.label_rate=100 \ + task.data=$DATA_DIR \ + task.label_dir=$DATA_DIR \ + task.text_cfg.text_data=$TEXT_DATA_DIR \ + \ + dataset.train_subset=\"train_960+train_text.phn-ltr\" \ + dataset.valid_subset=\"dev_clean+dev_clean.phn-ltr\" \ + dataset.num_workers=0 \ + dataset.max_tokens=1400000 \ + distributed_training.distributed_world_size=${world_size} \ + optimization.update_freq=[${update_freq}] \ + \ + common.tensorboard_logdir=$MODEL_DIR \ + checkpoint.save_dir=$MODEL_DIR \ + hydra.run.dir=$MODEL_DIR \ + hydra.job.name=pretrain + +# data_dir="/stdblob/users/v-ziqzhang/dataset/LibriLM/phn2char_sanych/tri4b_mono_label" +# text_data_dir="/stdblob/users/v-ziqzhang/dataset/LibriLM/phn2char_sanych/filt2k_sil025_m5std25_sil14_spn32/bin-idx" diff --git a/SpeechLM/speechlm/scripts/pretrain_speechlm/large_speechlmp.sh b/SpeechLM/speechlm/scripts/pretrain_speechlm/large_speechlmp.sh new file mode 100644 index 0000000000000000000000000000000000000000..75fc15fc7802217f4a0a8cab80ebefc7ddb8148a --- /dev/null +++ b/SpeechLM/speechlm/scripts/pretrain_speechlm/large_speechlmp.sh @@ -0,0 +1,44 @@ +# #################################### +# SpeechLM-P Large model # +# #################################### +[ $# -lt 2 ] && echo "Usage: $0 [mount=${PWD}] [world_size=32] [update_freq=4]" && exit 1 +[ ${PWD##*/} != SpeechLM ] && echo "Error: dir not match! Switch to SpeechLM/ and run it again!" 
&& exit 1 +DATA_DIR=$1 +TEXT_DATA_DIR=$2 +mount=$3 +world_size=$4 +update_freq=$5 +[ -z $mount ] && mount=${PWD} +[ -z $world_size ] && world_size=32 +[ -z $update_freq ] && update_freq=4 + +CODE_ROOT=${PWD} +MODEL_DIR="${mount}/exp/pretrain/large_speechlmp_${world_size}gpu_${update_freq}accum" +[ -d $MODEL_DIR ] || mkdir -p $MODEL_DIR + +python $CODE_ROOT/fairseq/fairseq_cli/hydra_train.py \ + --config-dir $CODE_ROOT/speechlm/config/pretrain \ + --config-name speechlm_large_librilight \ + common.user_dir=$CODE_ROOT/speechlm \ + \ + task.labels='["phn"]' \ + model.label_rate=50 \ + task.data=$DATA_DIR \ + task.label_dir=$DATA_DIR \ + task.text_cfg.text_data=$TEXT_DATA_DIR \ + \ + dataset.train_subset=\"train_60k+train_text.phn-ltr\" \ + dataset.valid_subset=\"dev_clean+dev_clean.phn-ltr\" \ + dataset.num_workers=1 \ + dataset.max_tokens=900000 \ + distributed_training.distributed_world_size=${world_size} \ + optimization.update_freq=[${update_freq}] \ + \ + common.fp16_scale_tolerance=0.1 \ + common.tensorboard_logdir=$MODEL_DIR \ + checkpoint.save_dir=$MODEL_DIR \ + hydra.run.dir=$MODEL_DIR \ + hydra.job.name=pretrain + +# data_dir="/stdblob/users/v-ziqzhang/dataset/librilight/chunkdata" +# text_data_dir="/stdblob/users/v-ziqzhang/dataset/LibriLM/phn2char_sanych/filt2k_sil025_m5std25_sil14_spn32/bin-idx" diff --git a/SpeechLM/speechlm/scripts/tokenizer_fastT2U/generate.sh b/SpeechLM/speechlm/scripts/tokenizer_fastT2U/generate.sh new file mode 100644 index 0000000000000000000000000000000000000000..17d07646076376e68c681dc31427166b36b1faef --- /dev/null +++ b/SpeechLM/speechlm/scripts/tokenizer_fastT2U/generate.sh @@ -0,0 +1,42 @@ +##################################### +# Fast Text2Unit Model # +##################################### +[ $# -lt 2 ] && echo "Usage: $0 [outdir={gen_set%/*}]" && exit 0 +[ ${PWD##*/} != SpeechLM ] && echo "Error: dir not match! Switch to SpeechLM/ and run it again!" && exit 1 + +model_path=$1 +src_dir=${model_path%/*} +cpt=${model_path##*/} +cpt=${cpt%.*} + +gen_set=$2 +outdir=$3 + +DATA_DIR=${gen_set%/*} +gen_set=${gen_set##*/} +[ -z $outdir ] && outdir=${DATA_DIR} + +CODE_ROOT=${PWD} + +nj=4 +for rank in $(seq 0 $((nj-1))); do + results_path=$outdir/pseudo_${gen_set}/${rank} + [ ! -d $results_path ] && mkdir -p $results_path + echo "$model_path" > $results_path/model.record + + python $CODE_ROOT/speechlm/generate_unit.py $DATA_DIR \ + --user-dir $CODE_ROOT/speechlm \ + --config-yaml config_generate.yaml \ + --path ${model_path} \ + --task fast_text_to_unit \ + --gen-subset $gen_set \ + \ + --beam 1 \ + --max-tokens 10000 \ + --results-path $results_path \ + --scoring sacrebleu \ + --skip-invalid-size-inputs-valid-test \ + --distributed-world-size $nj --distributed-rank ${rank} \ + & +done +wait diff --git a/SpeechLM/speechlm/scripts/tokenizer_fastT2U/infer.sh b/SpeechLM/speechlm/scripts/tokenizer_fastT2U/infer.sh new file mode 100644 index 0000000000000000000000000000000000000000..306ee866d46c7e827ebe62a9151c070e969b953f --- /dev/null +++ b/SpeechLM/speechlm/scripts/tokenizer_fastT2U/infer.sh @@ -0,0 +1,41 @@ +##################################### +# Fast Text2Unit Model # +##################################### +[ $# -lt 2 ] && echo "Usage: $0 " && exit 0 +[ ${PWD##*/} != SpeechLM ] && echo "Error: dir not match! Switch to SpeechLM/ and run it again!" 
&& exit 1 + +model_path=$1 +src_dir=${model_path%/*} +cpt=${model_path##*/} +cpt=${cpt%.*} + +gen_set=$2 + +DATA_DIR=${gen_set%/*} +gen_set=${gen_set##*/} +outdir=$src_dir/decode_${cpt} + +CODE_ROOT=${PWD} + +for subset in ${gen_set//,/ }; do + results_path=$outdir/phone2unit_${subset} + [ ! -d $results_path ] && mkdir -p $results_path + + python $CODE_ROOT/speechlm/generate_unit.py $DATA_DIR \ + --user-dir $CODE_ROOT/speechlm \ + --config-yaml config.yaml \ + --path ${model_path} \ + --task fast_text_to_unit \ + --gen-subset $subset \ + \ + --beam 1 \ + --max-tokens 10000 \ + --results-path $results_path \ + --scoring sacrebleu + + echo $results_path + tail -n 1 $results_path/generate-*.txt + sleep 1s +done + +# --distributed-world-size 1000 --distributed-rank 0 \ diff --git a/SpeechLM/speechlm/scripts/tokenizer_fastT2U/train_s_5e-4.sh b/SpeechLM/speechlm/scripts/tokenizer_fastT2U/train_s_5e-4.sh new file mode 100644 index 0000000000000000000000000000000000000000..6fec89b8a7cb15b334ad2d5dd5cf76ce44f811b4 --- /dev/null +++ b/SpeechLM/speechlm/scripts/tokenizer_fastT2U/train_s_5e-4.sh @@ -0,0 +1,39 @@ +##################################### +# Fast Text2Unit Model # +##################################### +[ $# -lt 1 ] && echo "Usage: $0 [mount] [world_size=4] [update_freq=1]" && exit 0 +[ ${PWD##*/} != SpeechLM ] && echo "Error: dir not match! Switch to SpeechLM/ and run it again!" && exit 1 + +DATA_DIR=$1 +mount=$2 +world_size=$3 +update_freq=$4 +[ -z $mount ] && mount=${PWD} +[ -z $world_size ] && world_size=4 +[ -z $update_freq ] && update_freq=1 + +CODE_ROOT=${PWD} +MODEL_DIR="$mount/exp/fast_text2unit/small_lr5e-4_tristage_ls0.1_${world_size}gpu_${update_freq}accum" +[ -d $MODEL_DIR ] || mkdir -p $MODEL_DIR + +fairseq-train ${DATA_DIR} --save-dir ${MODEL_DIR} \ + --config-yaml config.yaml \ + --user-dir $CODE_ROOT/speechlm \ + --train-subset train_100 --valid-subset dev_clean \ + --num-workers 4 --max-tokens 20000 \ + --distributed-world-size ${world_size} --update-freq ${update_freq} \ + \ + --task fast_text_to_unit --criterion fasttext2unit_criterion --arch fasttext2unit_s \ + --label-smoothing 0.1 \ + \ + --clip-norm 5.0 --n-frames-per-step 1 \ + --dropout 0.1 --attention-dropout 0.1 \ + --optimizer adam --lr 5e-4 --lr-scheduler tri_stage --phase-ratio [0.3,0.0,0.7] --max-update 10000 \ + --seed 1 --best-checkpoint-metric accuracy --maximize-best-checkpoint-metric \ + \ + --save-interval 2 \ + --tensorboard-logdir ${MODEL_DIR} \ + --fp16 --find-unused-parameters \ + | tee ${MODEL_DIR}/train.log + +# DATA_DIR=/mnt/default/v-ziqzhang/dataset/librispeech_phone2unit/phone2unit diff --git a/SpeechLM/speechlm/scripts/tune_speechlm_asr/finetune_base_ctc.sh b/SpeechLM/speechlm/scripts/tune_speechlm_asr/finetune_base_ctc.sh new file mode 100644 index 0000000000000000000000000000000000000000..4b7c542fe9e8688a599eb52058b442edac613317 --- /dev/null +++ b/SpeechLM/speechlm/scripts/tune_speechlm_asr/finetune_base_ctc.sh @@ -0,0 +1,48 @@ +# #################################### +# SpeechLM Base model # +# #################################### +[ $# -lt 3 ] && echo "Usage: $0 [mount=${PWD}] [world_size=8] [update_freq=1]" && exit 1 +[ ${PWD##*/} != SpeechLM ] && echo "Error: dir not match! Switch to SpeechLM/ and run it again!" 
&& exit 1 + +w2v_path=$1 +DATA_DIR=$2 +cpt=$3 +mount=$4 +world_size=$5 +update_freq=$6 +[ -z $mount ] && mount=${PWD} +[ -z $world_size ] && world_size=8 +[ -z $update_freq ] && update_freq=1 + +CODE_ROOT=${PWD} + +exp_name=${w2v_path%/*} +exp_name=${exp_name##*/} +MODEL_DIR="${mount}/exp/finetune_asr/$exp_name/ctc30k_from_${cpt}_bz1.6m_lr1e-5" +[ -d $MODEL_DIR ] || mkdir -p $MODEL_DIR + +python $CODE_ROOT/fairseq/fairseq_cli/hydra_train.py \ + --config-dir $CODE_ROOT/speechlm/config/finetune \ + --config-name speechlm_base_100h \ + common.user_dir=$CODE_ROOT/speechlm \ + \ + task.data=$DATA_DIR \ + task.label_dir=$DATA_DIR \ + model.w2v_path=${w2v_path} \ + \ + optimization.lr=[0.00001] \ + optimization.max_update=30000 \ + dataset.max_tokens=1600000 \ + optimization.update_freq=[${update_freq}] \ + distributed_training.distributed_world_size=${world_size} \ + \ + dataset.train_subset="train_clean_100" \ + dataset.valid_subset="dev_other" \ + \ + common.tensorboard_logdir=$MODEL_DIR \ + checkpoint.save_dir=$MODEL_DIR \ + hydra.run.dir=$MODEL_DIR \ + hydra.job.name=${exp_name} + +# model_path=/mnt/default/v-ziqzhang/data/speechulm/exp/base/base_speechlmp_32gpu_1accum/checkpoint_298_400000.pt +# data_dir=/home/v-ziqzhang/dataset/LibriSpeech/asr diff --git a/SpeechLM/speechlm/scripts/tune_speechlm_asr/finetune_large_ctc.sh b/SpeechLM/speechlm/scripts/tune_speechlm_asr/finetune_large_ctc.sh new file mode 100644 index 0000000000000000000000000000000000000000..c07919000a3209cc75501e97ffb3559a907c1f1e --- /dev/null +++ b/SpeechLM/speechlm/scripts/tune_speechlm_asr/finetune_large_ctc.sh @@ -0,0 +1,48 @@ +# #################################### +# SpeechLM Large model # +# #################################### +[ $# -lt 3 ] && echo "Usage: $0 [mount=${PWD}] [world_size=8] [update_freq=4]" && exit 1 +[ ${PWD##*/} != SpeechLM ] && echo "Error: dir not match! Switch to SpeechLM/ and run it again!" 
&& exit 1 + +w2v_path=$1 +DATA_DIR=$2 +cpt=$3 +mount=$4 +world_size=$5 +update_freq=$6 +[ -z $mount ] && mount=${PWD} +[ -z $world_size ] && world_size=8 +[ -z $update_freq ] && update_freq=4 + +CODE_ROOT=${PWD} + +exp_name=${w2v_path%/*} +exp_name=${exp_name##*/} +MODEL_DIR="${mount}/exp/finetune_asr/$exp_name/ctc200k_from_${cpt}_bz3.6m_lr1e-5" +[ -d $MODEL_DIR ] || mkdir -p $MODEL_DIR + +python $CODE_ROOT/fairseq/fairseq_cli/hydra_train.py \ + --config-dir $CODE_ROOT/speechlm/config/finetune \ + --config-name speechlm_large_960h \ + common.user_dir=$CODE_ROOT/speechlm \ + \ + task.data=$DATA_DIR \ + task.label_dir=$DATA_DIR \ + model.w2v_path=${w2v_path} \ + \ + optimization.lr=[0.00001] \ + optimization.max_update=200000 \ + dataset.max_tokens=900000 \ + optimization.update_freq=[${update_freq}] \ + distributed_training.distributed_world_size=${world_size} \ + \ + dataset.train_subset="train_960" \ + dataset.valid_subset="dev_other" \ + \ + common.tensorboard_logdir=$MODEL_DIR \ + checkpoint.save_dir=$MODEL_DIR \ + hydra.run.dir=$MODEL_DIR \ + hydra.job.name=${exp_name} + +# model_path=/mnt/default/v-ziqzhang/data/speechulm/exp/large/large_speechlmp_32gpu_4accum/checkpoint_31_400000.pt +# data_dir=/home/v-ziqzhang/dataset/LibriSpeech/asr diff --git a/SpeechLM/speechlm/scripts/tune_speechlm_asr/inference_ctc.sh b/SpeechLM/speechlm/scripts/tune_speechlm_asr/inference_ctc.sh new file mode 100644 index 0000000000000000000000000000000000000000..4c331d603168099b2a643fa631f74ea647f11173 --- /dev/null +++ b/SpeechLM/speechlm/scripts/tune_speechlm_asr/inference_ctc.sh @@ -0,0 +1,40 @@ +##################################### +# SpeechLM Base model # +##################################### +[ $# -lt 2 ] && echo "Usage: $0 [gen-set=dev_clean,dev_other,test_clean,test_other]" && exit 1 +[ ${PWD##*/} != SpeechLM ] && echo "Error: dir not match! Switch to SpeechLM/ and run it again!" && exit 1 + +model_path=$1 +DATA_DIR=$2 +gen_set=$3 +[ -z $gen_set ] && gen_set="dev_clean,dev_other,test_clean,test_other" +src_dir=${model_path%/*} +cpt=${model_path##*/} +cpt=${cpt%.*} + +CODE_ROOT=${PWD} + +for subset in ${gen_set//,/ }; do + results_path=$src_dir/decode_${cpt}_ctc/${subset} + [ ! -d $results_path ] && mkdir -p $results_path + + python $CODE_ROOT/speechlm/infer.py \ + --config-dir $CODE_ROOT/speechlm/config/decode \ + --config-name infer_viterbi \ + common.user_dir=$CODE_ROOT/speechlm \ + \ + dataset.gen_subset=${subset} \ + task.data=$DATA_DIR task.label_dir=$DATA_DIR task.normalize=false \ + common_eval.results_path=${results_path} common_eval.path=${model_path} \ + \ + common_eval.quiet=true \ + & +done +wait + +### important to know +# When loading the fine-tuned model for decoding, fairseq also loads the pre-trained model to use its states['model'] to build the model instance. +# To prevent the error about the w2v_path (if you don't have the pre-trained model at w2v_path), we set common_eval.model_overrides to override +# the w2v_path by speechlmp_base_cfg.pt. speechlmp_base_cfg.pt is just a pre-trained model checkpoint without parameters (only contains config). +# So, if you have trained a model with different model config (e.g. different encoder layers), you should modify the common_eval.model_overrides to your own. 
+ # common_eval.model_overrides=\"{\'w2v_path\':\'$CODE_ROOT/speechlm/config/pretrain/speechlmp_base_cfg.pt\'}\" \ diff --git a/SpeechLM/speechlm/scripts/tune_speechlm_asr/inference_ctc_kenlm.sh b/SpeechLM/speechlm/scripts/tune_speechlm_asr/inference_ctc_kenlm.sh new file mode 100644 index 0000000000000000000000000000000000000000..3dfce021bb4948de5f37ede7d67c9386fa874d8a --- /dev/null +++ b/SpeechLM/speechlm/scripts/tune_speechlm_asr/inference_ctc_kenlm.sh @@ -0,0 +1,48 @@ +##################################### +# SpeechLM Base model # +##################################### +[ $# -lt 2 ] && echo "Usage: $0 [gen-set=dev_clean,dev_other,test_clean,test_other]" && exit 1 +[ ${PWD##*/} != SpeechLM ] && echo "Error: dir not match! Switch to SpeechLM/ and run it again!" && exit 1 + +model_path=$1 +DATA_DIR=$2 +gen_set=$3 +[ -z $gen_set ] && gen_set="dev_clean,dev_other,test_clean,test_other" +src_dir=${model_path%/*} +cpt=${model_path##*/} +cpt=${cpt%.*} + +CODE_ROOT=${PWD} +path_to_lexicon=${DATA_DIR}/librispeech_lexicon.lst +path_to_lm=${DATA_DIR}/4-gram.arpa +[ ! -f $path_to_lexicon ] && echo "Error: $path_to_lexicon not found !" && exit 1 +[ ! -f $path_to_lm ] && echo "Error: $path_to_lm not found !" && exit 1 + +for subset in ${gen_set//,/ }; do + results_path=$src_dir/decode_${cpt}_ctc/${subset} + [ ! -d $results_path ] && mkdir -p $results_path + + python $CODE_ROOT/speechlm/infer.py \ + --config-dir $CODE_ROOT/speechlm/config/decode \ + --config-name infer_kenlm \ + common.user_dir=$CODE_ROOT/speechlm \ + \ + dataset.gen_subset=${subset} \ + task.data=$DATA_DIR task.label_dir=$DATA_DIR task.normalize=false \ + common_eval.results_path=${results_path} common_eval.path=${model_path} \ + \ + decoding.lexicon=$path_to_lexicon \ + decoding.lmpath=$path_to_lm \ + decoding.beam=1500 \ + \ + common_eval.quiet=false \ + & +done +wait + +### important to know +# When loading the fine-tuned model for decoding, fairseq also loads the pre-trained model to use its states['model'] to build the model instance. +# To prevent the error about the w2v_path (if you don't have the pre-trained model at w2v_path), we set common_eval.model_overrides to override +# the w2v_path by speechlmp_base_cfg.pt. speechlmp_base_cfg.pt is just a pre-trained model checkpoint without parameters (only contains config). +# So, if you have trained a model with different model config (e.g. different encoder layers), you should modify the common_eval.model_overrides to your own. + # common_eval.model_overrides=\"{\'w2v_path\':\'$CODE_ROOT/speechlm/config/pretrain/speechlmp_base_cfg.pt\'}\" \ diff --git a/SpeechLM/speechlm/scripts/tune_speechlm_asr/inference_ctc_large.sh b/SpeechLM/speechlm/scripts/tune_speechlm_asr/inference_ctc_large.sh new file mode 100644 index 0000000000000000000000000000000000000000..265476a05a3225feafe00bad14b1615402c432b3 --- /dev/null +++ b/SpeechLM/speechlm/scripts/tune_speechlm_asr/inference_ctc_large.sh @@ -0,0 +1,36 @@ +##################################### +# SpeechLM Large model # +##################################### +[ $# -lt 2 ] && echo "Usage: $0 [gen-set=dev_clean,dev_other,test_clean,test_other]" && exit 1 +[ ${PWD##*/} != SpeechLM ] && echo "Error: dir not match! Switch to SpeechLM/ and run it again!" 
&& exit 1 + +model_path=$1 +DATA_DIR=$2 +gen_set=$3 +[ -z $gen_set ] && gen_set="dev_clean,dev_other,test_clean,test_other" +src_dir=${model_path%/*} +cpt=${model_path##*/} +cpt=${cpt%.*} + +CODE_ROOT=${PWD} + +for subset in ${gen_set//,/ }; do + results_path=$src_dir/decode_${cpt}_ctc/${subset} + [ ! -d $results_path ] && mkdir -p $results_path + + python $CODE_ROOT/speechlm/infer.py \ + --config-dir $CODE_ROOT/speechlm/config/decode \ + --config-name infer_viterbi \ + common.user_dir=$CODE_ROOT/speechlm \ + \ + dataset.gen_subset=${subset} \ + task.data=$DATA_DIR task.label_dir=$DATA_DIR task.normalize=true \ + common_eval.results_path=${results_path} common_eval.path=${model_path} \ + \ + common_eval.quiet=true \ + & +done +wait + +# model_path=/mnt/default/v-ziqzhang/data/speechulm/finetune_asr/large_speechlmp_32gpu_4accum/ctc200k_from_400k_bz3.6m_lr1e-5/checkpoint_convert.pt +# data_dir=/home/v-ziqzhang/dataset/LibriSpeech/asr diff --git a/SpeechLM/speechlm/scripts/tune_speechlm_asr/inference_ctc_large_fsqlm.sh b/SpeechLM/speechlm/scripts/tune_speechlm_asr/inference_ctc_large_fsqlm.sh new file mode 100644 index 0000000000000000000000000000000000000000..165dd29ee7efcf78c6efb568432049be9b7512f7 --- /dev/null +++ b/SpeechLM/speechlm/scripts/tune_speechlm_asr/inference_ctc_large_fsqlm.sh @@ -0,0 +1,46 @@ +##################################### +# SpeechLM Large model # +##################################### +[ $# -lt 2 ] && echo "Usage: $0 [gen-set=dev_clean,dev_other,test_clean,test_other]" && exit 1 +[ ${PWD##*/} != SpeechLM ] && echo "Error: dir not match! Switch to SpeechLM/ and run it again!" && exit 1 + +model_path=$1 +DATA_DIR=$2 +gen_set=$3 +[ -z $gen_set ] && gen_set="dev_clean,dev_other,test_clean,test_other" +src_dir=${model_path%/*} +cpt=${model_path##*/} +cpt=${cpt%.*} + +CODE_ROOT=${PWD} +path_to_lexicon=${DATA_DIR}/librispeech_lexicon.lst +path_to_lm=${DATA_DIR}/fairseq_word_lm/lm_librispeech_word_transformer.pt +[ ! -f $path_to_lexicon ] && echo "Error: $path_to_lexicon not found !" && exit 1 +[ ! -f $path_to_lm ] && echo "Error: $path_to_lm not found !" && exit 1 + +for subset in ${gen_set//,/ }; do + results_path=$src_dir/decode_${cpt}_ctc/${subset} + [ ! 
-d $results_path ] && mkdir -p $results_path + + python $CODE_ROOT/speechlm/infer.py \ + --config-dir $CODE_ROOT/speechlm/config/decode \ + --config-name infer_fsqlm \ + common.user_dir=$CODE_ROOT/speechlm \ + \ + dataset.gen_subset=${subset} \ + task.data=$DATA_DIR task.label_dir=$DATA_DIR task.normalize=true \ + common_eval.results_path=${results_path} common_eval.path=${model_path} \ + \ + decoding.lexicon=$path_to_lexicon \ + decoding.lmpath=$path_to_lm \ + decoding.lmweight=0.90 \ + decoding.wordscore=-0.31 \ + decoding.beam=500 \ + \ + common_eval.quiet=false \ + & +done +wait + +# model_path=/mnt/default/v-ziqzhang/data/speechulm/finetune_asr/large_speechlmp_32gpu_4accum/ctc200k_from_400k_bz3.6m_lr1e-5/checkpoint_convert.pt +# data_dir=/home/v-ziqzhang/dataset/LibriSpeech/asr diff --git a/SpeechLM/speechlm/scripts/tune_speechlm_st/ft_base_covost_enxx.sh b/SpeechLM/speechlm/scripts/tune_speechlm_st/ft_base_covost_enxx.sh new file mode 100644 index 0000000000000000000000000000000000000000..3b8c12a822549d780622c006247463a845217e50 --- /dev/null +++ b/SpeechLM/speechlm/scripts/tune_speechlm_st/ft_base_covost_enxx.sh @@ -0,0 +1,80 @@ +# #################################### +# SpeechLM Base model # +# #################################### +[ $# -lt 4 ] && echo "Usage: $0 [mount=${PWD}] [world_size=8] [update_freq=2]" && exit 0 +[ ${PWD##*/} != SpeechLM ] && echo "Error: dir not match! Switch to SpeechLM/ and run it again!" && exit 1 + +w2v_path=$1 +DATA_DIR=$2 +lang=$3 +cpt=$4 +mount=$5 +world_size=$6 +update_freq=$7 +[ -z $mount ] && mount=${PWD} +[ -z $world_size ] && world_size=8 +[ -z $update_freq ] && update_freq=2 + +CODE_ROOT=${PWD} + +exp_name=${w2v_path%/*} +exp_name=${exp_name##*/} +MODEL_DIR="$mount/exp/finetune_covost/$exp_name/legacy_en${lang}_from_${cpt}_bz3.2m_lr1e-4" +[ -d $MODEL_DIR ] || mkdir -p $MODEL_DIR + +max_tokens=1600000 +python $CODE_ROOT/fairseq/fairseq_cli/train.py ${DATA_DIR} \ + --save-dir ${MODEL_DIR} \ + --user-dir $CODE_ROOT/speechlm \ + --task speech_to_text \ + --config-yaml config_base_en${lang}.yaml \ + --train-subset "train_st_en_${lang}_local" \ + --valid-subset "dev_st_en_${lang}_local" \ + --fp16 \ + --seed 1 \ + \ + --ddp-backend no_c10d \ + --distributed-world-size ${world_size} \ + --tensorboard-logdir ${MODEL_DIR} \ + \ + --criterion label_smoothed_cross_entropy --report-accuracy \ + --label-smoothing 0.1 \ + \ + --optimizer adam \ + --clip-norm 1.0 \ + --lr 1e-04 \ + --lr-scheduler polynomial_decay --warmup-updates 5000 \ + --max-update 50000 \ + --total-num-update 50000 \ + --update-freq ${update_freq} \ + \ + --max-tokens ${max_tokens} \ + --max-sentences 16 \ + --max-tokens-valid ${max_tokens} \ + --grouped-shuffling \ + --max-source-positions ${max_tokens} \ + --skip-invalid-size-inputs-valid-test \ + --num-workers 0 \ + --best-checkpoint-metric "accuracy" \ + --maximize-best-checkpoint-metric \ + \ + --arch "speechlm_st_legacy" \ + --w2v-path ${w2v_path} \ + --layerdrop 0.1 \ + --decoder-layerdrop 0.1 \ + --activation-dropout 0.0 \ + --attention-dropout 0.1 \ + --feature-grad-mult 1.0 \ + \ + --apply-mask --mask-prob 0.5 \ + \ + --log-format json \ + --log-interval 100 \ + --save-interval 1 \ + --keep-last-epochs 5 \ + --keep-best-checkpoints 5 \ + \ + 2>&1 | tee ${MODEL_DIR}/train.log + +# model_path=/mnt/default/v-ziqzhang/data/speechulm/exp/base/base_speechlmp_32gpu_1accum/checkpoint_298_400000.pt +# data_dir=${HOME}/dataset/CommonVoice/v4/en/en-de diff --git a/SpeechLM/speechlm/scripts/tune_speechlm_st/ft_large_covost_enxx.sh 
b/SpeechLM/speechlm/scripts/tune_speechlm_st/ft_large_covost_enxx.sh new file mode 100644 index 0000000000000000000000000000000000000000..4e79bec834dec3ad4828954b3d0100adeaa80909 --- /dev/null +++ b/SpeechLM/speechlm/scripts/tune_speechlm_st/ft_large_covost_enxx.sh @@ -0,0 +1,80 @@ +# #################################### +# SpeechLM Large model # +# #################################### +[ $# -lt 4 ] && echo "Usage: $0 [mount=${PWD}] [world_size=8] [update_freq=4]" && exit 0 +[ ${PWD##*/} != SpeechLM ] && echo "Error: dir not match! Switch to SpeechLM/ and run it again!" && exit 1 + +w2v_path=$1 +DATA_DIR=$2 +lang=$3 +cpt=$4 +mount=$5 +world_size=$6 +update_freq=$7 +[ -z $mount ] && mount=${PWD} +[ -z $world_size ] && world_size=8 +[ -z $update_freq ] && update_freq=4 + +CODE_ROOT=${PWD} + +exp_name=${w2v_path%/*} +exp_name=${exp_name##*/} +MODEL_DIR="$mount/exp/finetune_covost/$exp_name/legacy_en${lang}_from_${cpt}_bz3.6m_lr1e-4" +[ -d $MODEL_DIR ] || mkdir -p $MODEL_DIR + +max_tokens=900000 +python $CODE_ROOT/fairseq/fairseq_cli/train.py ${DATA_DIR} \ + --save-dir ${MODEL_DIR} \ + --user-dir $CODE_ROOT/speechlm \ + --task speech_to_text \ + --config-yaml config_large_en${lang}.yaml \ + --train-subset "train_st_en_${lang}_local" \ + --valid-subset "dev_st_en_${lang}_local" \ + --fp16 \ + --seed 1 \ + \ + --ddp-backend no_c10d \ + --distributed-world-size ${world_size} \ + --tensorboard-logdir ${MODEL_DIR} \ + \ + --criterion label_smoothed_cross_entropy --report-accuracy \ + --label-smoothing 0.1 \ + \ + --optimizer adam \ + --clip-norm 1.0 \ + --lr 1e-04 \ + --lr-scheduler polynomial_decay --warmup-updates 5000 \ + --max-update 50000 \ + --total-num-update 50000 \ + --update-freq ${update_freq} \ + \ + --max-tokens ${max_tokens} \ + --max-sentences 16 \ + --max-tokens-valid ${max_tokens} \ + --grouped-shuffling \ + --max-source-positions ${max_tokens} \ + --skip-invalid-size-inputs-valid-test \ + --num-workers 0 \ + --best-checkpoint-metric "accuracy" \ + --maximize-best-checkpoint-metric \ + \ + --arch "speechlm_st_legacy" \ + --w2v-path ${w2v_path} --encoder-embed-dim 1024 \ + --layerdrop 0.1 \ + --decoder-layerdrop 0.1 \ + --activation-dropout 0.0 \ + --attention-dropout 0.1 \ + --feature-grad-mult 1.0 \ + \ + --apply-mask --mask-prob 0.5 \ + \ + --log-format json \ + --log-interval 100 \ + --save-interval 1 \ + --keep-last-epochs 5 \ + --keep-best-checkpoints 5 \ + \ + 2>&1 | tee ${MODEL_DIR}/train.log + +# model_path=/mnt/default/v-ziqzhang/data/speechulm/exp/large/large_speechlmp_32gpu_4accum/checkpoint_31_400000.pt +# data_dir=${HOME}/dataset/CommonVoice/v4/en/en-de diff --git a/SpeechLM/speechlm/scripts/tune_speechlm_st/inference_base.sh b/SpeechLM/speechlm/scripts/tune_speechlm_st/inference_base.sh new file mode 100644 index 0000000000000000000000000000000000000000..513f99fdf897ff84d339e3a1be8407a3c51e8fe7 --- /dev/null +++ b/SpeechLM/speechlm/scripts/tune_speechlm_st/inference_base.sh @@ -0,0 +1,46 @@ +# #################################### +# SpeechLM Base model # +# #################################### +[ $# -lt 3 ] && echo "Usage: $0 [gen-set=dev] [beam_size=5] [lenpen=1.0]" && exit 0 +[ ${PWD##*/} != SpeechLM ] && echo "Error: dir not match! Switch to SpeechLM/ and run it again!" 
&& exit 1 + +model_path=$1 +DATA_DIR=$2 +lang=$3 +gen_set=$4 +beam_size=$5 +lenpen=$6 +[ -z $gen_set ] && gen_set="dev" +[ -z $beam_size ] && beam_size=5 +[ -z $lenpen ] && lenpen=1 +src_dir=${model_path%/*} +cpt=${model_path##*/} +cpt=${cpt%.*} + +CODE_ROOT=${PWD} +results_path=$src_dir/decode_${cpt}_beam${beam_size}/${gen_set} +[ ! -d $results_path ] && mkdir -p $results_path + +python $CODE_ROOT/fairseq/fairseq_cli/generate.py $DATA_DIR \ + --gen-subset ${gen_set}_st_en_${lang}_local \ + --max-tokens 2300000 \ + --max-source-positions 2300000 \ + --num-workers 0 \ + \ + --user-dir $CODE_ROOT/speechlm \ + --task speech_to_text \ + --config-yaml config_base_en${lang}.yaml \ + \ + --path ${model_path} \ + --results-path $results_path \ + \ + --scoring sacrebleu --max-len-a 0 --max-len-b 512 \ + --beam ${beam_size} \ + --lenpen $lenpen \ + + echo $results_path + tail -n 1 $results_path/generate-*.txt + sleep 1s + +# model_path=/mnt/default/v-ziqzhang/data/speechulm/finetune_covost/base_speechlmp_32gpu_1accum/legacy_ende_from_400k_bz3.2m_lr1e-4/checkpoint_best_convert.pt +# data_dir=dataset/CommonVoice/v4/en/en-de diff --git a/SpeechLM/speechlm/scripts/tune_speechlm_st/inference_large.sh b/SpeechLM/speechlm/scripts/tune_speechlm_st/inference_large.sh new file mode 100644 index 0000000000000000000000000000000000000000..6957ad58c487e403bb38c05b7315f71cd63cfa8e --- /dev/null +++ b/SpeechLM/speechlm/scripts/tune_speechlm_st/inference_large.sh @@ -0,0 +1,46 @@ +# #################################### +# SpeechLM Base model # +# #################################### +[ $# -lt 3 ] && echo "Usage: $0 [gen-set=dev] [beam_size=5] [lenpen=1.0]" && exit 0 +[ ${PWD##*/} != SpeechLM ] && echo "Error: dir not match! Switch to SpeechLM/ and run it again!" && exit 1 + +model_path=$1 +DATA_DIR=$2 +lang=$3 +gen_set=$4 +beam_size=$5 +lenpen=$6 +[ -z $gen_set ] && gen_set="dev" +[ -z $beam_size ] && beam_size=5 +[ -z $lenpen ] && lenpen=1 +src_dir=${model_path%/*} +cpt=${model_path##*/} +cpt=${cpt%.*} + +CODE_ROOT=${PWD} +results_path=$src_dir/decode_${cpt}_beam${beam_size}/${gen_set} +[ ! 
-d $results_path ] && mkdir -p $results_path + +python $CODE_ROOT/fairseq/fairseq_cli/generate.py $DATA_DIR \ + --gen-subset ${gen_set}_st_en_${lang}_local \ + --max-tokens 2300000 \ + --max-source-positions 2300000 \ + --num-workers 0 \ + \ + --user-dir $CODE_ROOT/speechlm \ + --task speech_to_text \ + --config-yaml config_large_en${lang}.yaml \ + \ + --path ${model_path} \ + --results-path $results_path \ + \ + --scoring sacrebleu --max-len-a 0 --max-len-b 512 \ + --beam ${beam_size} \ + --lenpen $lenpen \ + + echo $results_path + tail -n 1 $results_path/generate-*.txt + sleep 1s + +# model_path=/mnt/default/v-ziqzhang/data/speechulm/finetune_covost/large_speechlmp_32gpu_4accum/legacy_ende_from_400k_bz3.6m_lr1e-4/checkpoint.avgnbest_convert.pt +# data_dir=dataset/CommonVoice/v4/en/en-de diff --git a/SpeechLM/speechlm/tasks/fast_text_to_unit.py b/SpeechLM/speechlm/tasks/fast_text_to_unit.py new file mode 100644 index 0000000000000000000000000000000000000000..b05324803e3359837832148ff2e5dad3ab1ba367 --- /dev/null +++ b/SpeechLM/speechlm/tasks/fast_text_to_unit.py @@ -0,0 +1,174 @@ +# ---------------------------------------------------------------------------- +# SpeechLM: Enhanced Speech Pre-Training with Unpaired Textual Data (https://arxiv.org/abs/2209.15329) +# Github source: https://github.com/microsoft/SpeechT5/tree/main/SpeechLM +# Code based on fairseq: https://github.com/facebookresearch/fairseq/tree/272c4c5197250997148fb12c0db6306035f166a4 +# +# Copyright (c) 2022 Microsoft +# Licensed under The MIT License [see LICENSE for details] +# ---------------------------------------------------------------------------- + +import torch +import numpy as np +import logging +from pathlib import Path +from argparse import Namespace + +from fairseq.tasks import LegacyFairseqTask, register_task +from fairseq.data import Dictionary, encoders +from fairseq.data.audio.speech_to_text_joint_dataset import S2TJointDataConfig + +from speechlm.unit_generator import NonAutoregressiveUnitGenerator +from speechlm.data.text_to_unit_dataset import Text2UnitDatasetCreator + +logging.basicConfig( + format="%(asctime)s | %(levelname)s | %(name)s | %(message)s", + datefmt="%Y-%m-%d %H:%M:%S", + level=logging.INFO, +) +logger = logging.getLogger(__name__) + + +@register_task("fast_text_to_unit") +class FastTextToUnitTask(LegacyFairseqTask): + @staticmethod + def add_args(parser): + parser.add_argument("data", help="manifest root path") + parser.add_argument( + "--config-yaml", + type=str, + default="config.yaml", + help="Configuration YAML filename (under manifest root)", + ) + parser.add_argument( + "--max-source-positions", + default=2048, + type=int, + metavar="N", + help="max number of tokens in the source sequence", + ) + parser.add_argument( + "--max-target-positions", + default=1024, + type=int, + metavar="N", + help="max number of tokens in the target sequence", + ) + parser.add_argument("--n-frames-per-step", type=int, default=1) + parser.add_argument("--eos-prob-threshold", type=float, default=0.5) + parser.add_argument("--eval-inference", action="store_true") + parser.add_argument("--eval-tb-nsample", type=int, default=8) + parser.add_argument("--vocoder", type=str, default="griffin_lim") + parser.add_argument("--spec-bwd-max-iter", type=int, default=8) + + def __init__(self, args, src_dict, tgt_dict): + super().__init__(args) + self.src_dict = src_dict + self.tgt_dict = tgt_dict + self.data_cfg = S2TJointDataConfig(Path(args.data) / args.config_yaml) + self.speaker_to_id = self._get_speaker_to_id() + 
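A minimal usage sketch for this task (paths are placeholders and the Namespace only mirrors the defaults declared in `add_args`; fairseq-train normally builds the task from these same arguments):

```python
# Hedged sketch: constructing FastTextToUnitTask directly instead of via fairseq's CLI.
from argparse import Namespace
from speechlm.tasks.fast_text_to_unit import FastTextToUnitTask

args = Namespace(
    data="/path/to/text2unit_manifest",  # placeholder manifest root containing config.yaml
    config_yaml="config.yaml",
    max_source_positions=2048,
    max_target_positions=1024,
    n_frames_per_step=1,
    eos_prob_threshold=0.5,
    eval_inference=False,
    eval_tb_nsample=8,
    vocoder="griffin_lim",
    spec_bwd_max_iter=8,
    train_subset="train",
    seed=1,
)
task = FastTextToUnitTask.setup_task(args)
task.load_dataset("train")  # populates task.datasets["train"]
```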
+ @classmethod + def setup_task(cls, args, **kwargs): + data_cfg = S2TJointDataConfig(Path(args.data) / args.config_yaml) + src_dict_path = Path(args.data) / data_cfg.src_vocab_filename + if not src_dict_path.is_file(): + raise FileNotFoundError(f"Dict not found: {src_dict_path.as_posix()}") + src_dict = Dictionary.load(src_dict_path.as_posix()) + logger.info( + f"Source dictionary size ({data_cfg.src_vocab_filename}): " f"{len(src_dict):,}" + ) + tgt_dict_path = Path(args.data) / data_cfg.vocab_filename + if not tgt_dict_path.is_file(): + raise FileNotFoundError(f"Dict not found: {tgt_dict_path.as_posix()}") + tgt_dict = Dictionary.load(tgt_dict_path.as_posix()) + logger.info( + f"Target dictionary size ({data_cfg.vocab_filename}): " f"{len(tgt_dict):,}" + ) + + if getattr(args, "train_subset", None) is not None: + if not all(s.startswith("train") for s in args.train_subset.split(",")): + raise ValueError('Train splits should be named like "train*".') + return cls(args, src_dict, tgt_dict) + + def load_dataset(self, split, epoch=1, combine=False, **kwargs): + is_train_split = split.startswith("train") + pre_tokenizer = self.build_tokenizer(self.args) + bpe_tokenizer = self.build_bpe(self.args) + self.datasets[split] = Text2UnitDatasetCreator.from_tsv( + self.args.data, + self.data_cfg, + split, + self.src_dict, + pre_tokenizer, + bpe_tokenizer, + is_train_split=is_train_split, + epoch=epoch, + seed=self.args.seed, + n_frames_per_step=self.args.n_frames_per_step, + speaker_to_id=self.speaker_to_id, + ) + + @property + def target_dictionary(self): + return self.tgt_dict + + @property + def source_dictionary(self): + return self.src_dict + + def max_positions(self): + return self.args.max_source_positions, self.args.max_target_positions + + def _get_speaker_to_id(self): + speaker_to_id = None + speaker_set_filename = self.data_cfg.config.get("speaker_set_filename") + if speaker_set_filename is not None: + speaker_set_path = Path(self.args.data) / speaker_set_filename + with open(speaker_set_path) as f: + speaker_to_id = {r.strip(): i for i, r in enumerate(f)} + return speaker_to_id + + @classmethod + def get_speaker_embeddings(cls, args): + # It Will be used in FastText2UnitModel model, insdead of nn.Embedding on speaker-id, we default to use x-vectors extracted ahead. + # This is for varying the speaker information when generating units from text. 
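# Summary of the branches below: with a speaker-id table we learn an nn.Embedding
# over speaker ids; for "x-vector" the pre-extracted embedding is passed through and
# only given a time axis, i.e. (B, D) -> (B, 1, D); for "i-vector" it is returned
# unchanged; otherwise no speaker embedding is used.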
+ if args.speaker_to_id is not None: + embed_speaker = torch.nn.Embedding( + len(args.speaker_to_id), args.speaker_embed_dim + ) + elif args.speaker_embedding_type == "x-vector": + # return LayerNorm(args.speaker_embed_dim) + return lambda x: x.unsqueeze(1) + elif args.speaker_embedding_type == "i-vector": + # return LayerNorm(args.speaker_embed_dim) + return lambda x: x + else: + embed_speaker = None + return embed_speaker + + def build_model(self, cfg): + cfg.pitch_min = self.data_cfg.config["features"].get("pitch_min", None) + cfg.pitch_max = self.data_cfg.config["features"].get("pitch_max", None) + cfg.energy_min = self.data_cfg.config["features"].get("energy_min", None) + cfg.energy_max = self.data_cfg.config["features"].get("energy_max", None) + cfg.speaker_to_id = self.speaker_to_id + cfg.speaker_embedding_type = self.data_cfg.config.get("speaker_embedding_type", None) + model = super().build_model(cfg) + self.generator = None + if getattr(cfg, "eval_inference", False): + self.generator = self.build_generator([model], cfg) + return model + + def build_generator(self, models, cfg, vocoder=None, **unused): + model = models[0] + assert getattr(model, "NON_AUTOREGRESSIVE") is True + return NonAutoregressiveUnitGenerator(model, vocoder, self.data_cfg) + + + def build_tokenizer(self, args): + logger.info(f"pre-tokenizer: {self.data_cfg.pre_tokenizer}") + return encoders.build_tokenizer(Namespace(**self.data_cfg.pre_tokenizer)) + + def build_bpe(self, args): + logger.info(f"tokenizer: {self.data_cfg.bpe_tokenizer}") + return encoders.build_bpe(Namespace(**self.data_cfg.bpe_tokenizer)) diff --git a/SpeechLM/speechlm/tasks/joint_sc2t_pretrain.py b/SpeechLM/speechlm/tasks/joint_sc2t_pretrain.py new file mode 100644 index 0000000000000000000000000000000000000000..86af617670c5dcbb2aa3274057755ebb04b66547 --- /dev/null +++ b/SpeechLM/speechlm/tasks/joint_sc2t_pretrain.py @@ -0,0 +1,976 @@ +# ---------------------------------------------------------------------------- +# SpeechLM: Enhanced Speech Pre-Training with Unpaired Textual Data (https://arxiv.org/abs/2209.15329) +# Github source: https://github.com/microsoft/SpeechT5/tree/main/SpeechLM +# Code based on fairseq: https://github.com/facebookresearch/fairseq/tree/272c4c5197250997148fb12c0db6306035f166a4 +# +# Copyright (c) 2022 Microsoft +# Licensed under The MIT License [see LICENSE for details] +# ---------------------------------------------------------------------------- + +import logging +import os +import sys +from typing import Dict, List, Optional, Tuple +from pathlib import Path + +import numpy as np +from argparse import Namespace +from collections import OrderedDict + +import torch +from dataclasses import dataclass, field +from fairseq.data import ( + Dictionary, + encoders, + data_utils, + StripTokenDataset, + PrependTokenDataset, + AppendTokenDataset, + DenoisingDataset, + ConcatDataset, + FairseqDataset, + iterators, + ResamplingDataset, + MaskTokensDataset, + LanguagePairDataset, +) +from fairseq.data.audio.speech_to_text_joint_dataset import S2TJointDataConfig +from fairseq.data.shorten_dataset import maybe_shorten_dataset +# from fairseq.data.encoders.utils import get_whole_word_mask +from fairseq.dataclass.configs import FairseqDataclass +from fairseq.tasks import register_task +from fairseq.tasks.fairseq_task import FairseqTask +from fairseq.dataclass.constants import ChoiceEnum +from omegaconf import MISSING + +from speechlm.data.multimodal_corpus_dataset import MultiCorpusDataset +from speechlm.data.load_langpair_dataset 
import load_langpair_dataset +from speechlm.data.language_trible_dataset import LanguageTripleDataset, load_langtriple_dataset +from speechlm.data.hubert_dataset import HubertDataset + +logger = logging.getLogger(__name__) + +TOKENIZER_CHOICES = ChoiceEnum(["sentencepiece", "hubert_letters", "none"]) + +def _lang_token(lang: str): + return "".format(lang) + +def _lang_token_index(dic: Dictionary, lang: str): + """Return language token index.""" + idx = dic.index(_lang_token(lang)) + assert idx != dic.unk_index, "cannot find language token for lang {}".format(lang) + return idx + + +class LabelEncoder(object): + def __init__(self, dictionary: Dictionary) -> None: + self.dictionary = dictionary + + def __call__(self, label: str) -> List[str]: + return self.dictionary.encode_line( + label, append_eos=False, add_if_not_exist=False, + ) + + +### wrap the initial get_whole_word_mask which needs bpe_tokenizer, +### here we just assume words are splited by "|" or "" +def get_whole_word_mask(args, dictionary): + def is_beginning_of_word(i): + if i < dictionary.nspecial: + # special elements are always considered beginnings + return True + tok = dictionary[i] + if tok.startswith("madeupword"): + return True + elif tok in ["", "", "", "", "|", ""]: + return True + else: + return False + + mask_whole_words = torch.ByteTensor( + list(map(is_beginning_of_word, range(len(dictionary)))) + ) + return mask_whole_words + +def get_repeative_start(tokens): + """ + tokens: torch.Tensor with repeative tokens + """ + length = len(tokens) + rep_start_id = tokens[:-1] != tokens[1:] + return torch.cat([torch.tensor([True]), rep_start_id]) + +@dataclass +class TextPretrainingConfig(FairseqDataclass): + ### added for joint pretraining + text_data: Optional[str] = field( + default=None, + metadata={ + "help": "if set, path to text data directory", + }, + ) + seed: Optional[int] = field( + default=1, + metadata={ + "help": "for ordered_indices in MulticorpusDataset", + }, + ) + tokens_per_sample: Optional[int] = field( + default=512, + metadata={ + "help": "max number of total tokens over all segments per sample for dataset", + }, + ) + tokens_per_sample_tgt: Optional[int] = field( + default=512, + metadata={ + "help": "max number of total tokens over all segments per target sample for dataset", + }, + ) + sample_break_mode: Optional[str] = field( + default="eos", + metadata={ + "help": "mode for breaking sentence", + }, + ) + mask: Optional[float] = field( + default=0.3, + metadata={ + "help": "fraction of words/subwords that will be masked", + }, + ) + leave_unmasked_prob: float = field( + default=0.1, + metadata={"help": "probability that a masked token is unmasked"}, + ) + mask_random: Optional[float] = field( + default=0.1, + metadata={ + "help": "instead of using [MASK], use random token this often", + }, + ) + freq_weighted_replacement: bool = field( + default=False, + metadata={"help": "sample random replacement words based on word frequencies"}, + ) + mask_whole_words: bool = field( + default=True, + metadata={"help": "mask whole words; you may also want to set --bpe"}, + ) + mask_repeative_tokens: bool = field( + default=True, + metadata={"help": "mask repeative_tokens; if mask_whole_words=False"}, + ) + mask_multiple_length: int = field( + default=1, + metadata={"help": "repeat the mask indices multiple times"}, + ) + mask_stdev: float = field( + default=0.0, + metadata={"help": "stdev of the mask length"}, + ) + shorten_method: Optional[str] = field( + default="none", + metadata={ + "help": "if not none, 
shorten sequences that exceed tokens_per_sample", + "choices": "none/truncate/random_crop" + }, + ) + shorten_data_split_list: Optional[str] = field( + default="", + metadata={ + "help": "comma_separated list of dataset splits to apply shortening to, e.g., train,valid (default: all dataset splits)", + }, + ) + + ### below hypra-parameters is used in bart + insert: Optional[float] = field( + default=0.0, + metadata={ + "help": "insert this percentage of additional random tokens", + }, + ) + permute: Optional[float] = field( + default=0.0, + metadata={ + "help": "take this proportion of subwords and permute them", + }, + ) + rotate: Optional[float] = field( + default=0.0, + metadata={ + "help": "rotate this proportion of inputs", + }, + ) + poisson_lambda: Optional[float] = field( + default=3.5, + metadata={ + "help": "randomly shuffle sentences for this proportion of inputs", + }, + ) + permute_sentences: Optional[float] = field( + default=0.0, + metadata={ + "help": "shuffle this proportion of sentences in all inputs", + }, + ) + mask_length: Optional[str] = field( + default="span-poisson", + metadata={ + "help": "mask length to choose", + "choice": "subword/word/span-poisson" + }, + ) + replace_length: Optional[int] = field( + default=1, + metadata={ + "help": "when masking N tokens, replace with 0, 1, or N tokens (use -1 for N)", + }, + ) + shuffle_instance: Optional[bool] = field( + default=False, + metadata={"help": "shuffle instance"}, + ) + max_source_positions: Optional[int] = field( + default=1024, + metadata={"help": "max number of tokens in the source sequence"}, + ) + max_target_positions: Optional[int] = field( + default=1024, + metadata={"help": "max number of tokens in the target sequence"}, + ) + bpe: Optional[str] = field( + default="", + metadata={ + "help": "will wrapped by the text_data_config yaml", + }, + ) + data_config: Optional[str] = field( + default=None, + metadata={ + "help": "a config yaml specify the bpe model of text data", + }, + ) + text_maxtokens_ratio: Optional[float] = field( + default=1.0, + metadata={ + "help": "for text, max_tokens = max_tokens * text_maxtokens_ratio / 320 ", + }, + ) + prepend_tgt_lang_tag: bool = field( + default=False, + metadata={"help": "prepend tgt_lang_tag to replace "}, + ) + mask_text_ratio: Optional[float] = field( + default=0.0, + metadata={ + "help": "mask_text_ratio, for paired data", + }, + ) + truncate_mono_source: bool = field( + default=True, + metadata={"help": "truncate mono source-side examples that exceed max-positions"}, + ) + + +@dataclass +class JointPretrainingConfig(FairseqDataclass): + data: str = field( + default=MISSING, metadata={"help": "path to speech data directory"} + ) + fine_tuning: bool = field( + default=False, metadata={"help": "set to true if fine-tuning Hubert"} + ) + labels: List[str] = field( + default_factory=lambda: ["ltr"], + metadata={ + "help": ( + "extension of the label files to load, frame-level labels for" + " pre-training, and sequence-level label for fine-tuning" + ) + }, + ) + label_dir: Optional[str] = field( + default=None, + metadata={ + "help": "if set, looks for labels in this directory instead", + }, + ) + label_rate: int = field( + default=-1, + metadata={"help": "label frame rate. -1 for sequence label"}, + ) + sample_rate: int = field( + default=16_000, + metadata={ + "help": "target sample rate. 
audio files will be up/down " + "sampled to this rate" + }, + ) + normalize: bool = field( + default=False, + metadata={ + "help": "if set, normalizes input to have 0 mean and unit variance" + }, + ) + enable_padding: bool = field( + default=False, + metadata={"help": "pad shorter samples instead of cropping"}, + ) + max_keep_size: Optional[int] = field( + default=None, + metadata={"help": "exclude sample longer than this"}, + ) + max_sample_size: Optional[int] = field( + default=None, + metadata={"help": "max sample size to crop to for batching"}, + ) + min_sample_size: Optional[int] = field( + default=None, + metadata={"help": "min sample size to crop to for batching"}, + ) + single_target: Optional[bool] = field( + default=False, + metadata={ + "help": "if set, AddTargetDatasets outputs same keys " + "as AddTargetDataset" + }, + ) + random_crop: Optional[bool] = field( + default=True, + metadata={"help": "always crop from the beginning if false"}, + ) + pad_audio: Optional[bool] = field( + default=False, + metadata={"help": "pad audio to the longest one in the batch if true"}, + ) + store_labels: Optional[bool] = field( + default=True, + metadata={"help": "store spm labels in memory, should be true when fine-tune with bpe"}, + ) + add_decoder_target: bool = field( + default=False, + metadata={"help": "contral the model architecture, if set True, load reduced unit as target"}, + ) + split_modality_batch: bool = field( + default=False, + metadata={"help": "whether create all samples of different modalities in a batch"}, + ) + speech_tgt_lang: str = field( + default="", + metadata={"help": "prepend to prev_output_tokens to replace , only used for decoder"}, + ) + speech_sampling_alpha: float = field( + default=0.2, + metadata={ + "help": "Hyper-parameter alpha = 1/T for temperature-based speech resampling." + "(alpha = 1 for no resampling)" + }, + ) + text_sampling_alpha: float = field( + default=0.2, + metadata={ + "help": "Hyper-parameter alpha = 1/T for temperature-based text resampling." 
+ "(alpha = 1 for no resampling)" + }, + ) + hubert_tokenizer: Optional[TOKENIZER_CHOICES] = field( + default="none", + metadata={"help": "which tokenizer for processing text"}, + ) + sp_path: Optional[str] = field( + default=None, + metadata={"help": "sentencepiece model path if using bpe tokenizer"}, + ) + + text_cfg: TextPretrainingConfig = TextPretrainingConfig() + + +@register_task("joint_sc2t_pretraining", dataclass=JointPretrainingConfig) +class Jsc2tPretrainingTask(FairseqTask): + + cfg: JointPretrainingConfig + + def __init__( + self, + cfg: JointPretrainingConfig, + ) -> None: + super().__init__(cfg) + + logger.info(f"current directory is {os.getcwd()}") + logger.info(f"JSTPretrainingTask Config {cfg}") + + self.cfg = cfg + self.fine_tuning = cfg.fine_tuning + self.blank_symbol = "" + + self.state.add_factory("hubert_tokenizer", self.build_tokenizer) + if self.cfg.text_cfg.text_data is not None and os.path.exists(self.cfg.text_cfg.text_data): + self.state.add_factory("text_dictionary", self.load_text_dictionary) + self.state.add_factory("text_src_dictionary", self.load_text_src_dictionary) + if cfg.fine_tuning: + self.state.add_factory("target_dictionary", self.load_dictionaries) + else: + self.state.add_factory("dictionaries", self.load_dictionaries) + + if cfg.text_cfg.data_config is not None: + self.text_data_cfg = S2TJointDataConfig(Path(f"{cfg.text_cfg.text_data}/{cfg.text_cfg.data_config}")) + self.cfg.text_cfg.bpe = self.text_data_cfg.bpe_tokenizer["bpe"] + else: + self.text_data_cfg = None + + @property + def source_dictionary(self) -> Optional[Dictionary]: + return None + + @property + def target_dictionary(self) -> Optional[Dictionary]: + return self.state.target_dictionary + + @property + def dictionaries(self) -> List[Dictionary]: + return self.state.dictionaries + + @property + def text_dictionary(self) -> Optional[Dictionary]: + return self.state.text_dictionary + + @property + def text_src_dictionary(self) -> Optional[Dictionary]: + return self.state.text_src_dictionary + + @property + def hubert_tokenizer(self): + return self.state.hubert_tokenizer + + def load_dictionaries(self): + label_dir = self.cfg.data if self.cfg.label_dir is None else self.cfg.label_dir + dictionaries = [Dictionary.load(f"{label_dir}/dict.{label}.txt") for label in self.cfg.labels] + if not self.cfg.fine_tuning: + for dictionary in dictionaries: + dictionary.add_symbol("") + return dictionaries[0] if self.cfg.fine_tuning else dictionaries + + def load_text_dictionary(self): + tgt_dict_path = f"{self.cfg.text_cfg.text_data}/{self.text_data_cfg.vocab_filename if self.text_data_cfg is not None else 'dict.txt'}" + if not os.path.isfile(tgt_dict_path): + raise FileNotFoundError(f"Dict not found: {tgt_dict_path}") + text_dictionary = Dictionary.load(tgt_dict_path) + self.mask_idx = text_dictionary.add_symbol("") + return text_dictionary + + def load_text_src_dictionary(self): + src_dict_path = f"{self.cfg.text_cfg.text_data}/{self.text_data_cfg.src_vocab_filename if self.text_data_cfg is not None else 'dict.txt'}" + if not os.path.isfile(src_dict_path): + raise FileNotFoundError(f"Dict not found: {src_dict_path}") + src_text_dictionary = Dictionary.load(src_dict_path) + self.mask_idx = src_text_dictionary.add_symbol("") + return src_text_dictionary + + @classmethod + def setup_task( + cls, cfg: JointPretrainingConfig, **kwargs + ) -> "Jsc2tPretrainingTask": + return cls(cfg) + + def get_label_dir(self) -> str: + if self.cfg.label_dir is None: + return self.cfg.data + return self.cfg.label_dir + 
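# Worked example of the temperature-based sampling controlled by speech_sampling_alpha /
# text_sampling_alpha and implemented in _get_size_ratios below
# (https://arxiv.org/abs/1907.05019). Assuming two text corpora of 900k and 100k
# sentences and alpha = 0.5:
#   original probabilities:   [0.9, 0.1]
#   prob ** alpha:            [0.949, 0.316]  -> normalized: [0.75, 0.25]
#   size ratios:              [0.75 * 1000k / 900k, 0.25 * 1000k / 100k] = [0.833, 2.5]
# i.e. the smaller corpus is up-sampled 2.5x while the larger one is slightly
# down-sampled; alpha = 1 reproduces the original proportions (no resampling).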
+ def load_paired_dataset(self, text_split, truncate_source=False): + text_split, lp = text_split.rsplit('.', 1) # e.g. "libritext.ltr-ltr" + if len(lp.split("-")) == 2: + src, tgt = lp.split("-") + if src == tgt: + logger.warn(f"| trying to load monolingual dataset {text_split}.{lp}, please check your task is right.") + paired_dataset = self.load_char_bart_dataset(f"{text_split}.{lp}.{tgt}") + return paired_dataset + paired_dataset = load_langpair_dataset( + self.cfg.text_cfg.text_data, + text_split, + src, + self.text_src_dictionary, + tgt, + self.text_dictionary, + combine=True, + dataset_impl=None, + upsample_primary=1, + left_pad_source=False, + left_pad_target=False, + max_source_positions=self.cfg.text_cfg.tokens_per_sample, + max_target_positions=self.cfg.text_cfg.tokens_per_sample, + truncate_source=truncate_source, + prepend_bos=False, + load_alignments=False, + append_source_id=True if self.cfg.text_cfg.prepend_tgt_lang_tag else False, + lang_format="" if self.cfg.text_cfg.prepend_tgt_lang_tag else "[{}]", + input_feeding=self.cfg.add_decoder_target, + ) + if self.cfg.text_cfg.mask_text_ratio > 0: + # add mask + self.mask_idx = self.text_src_dictionary.index("") + mask_whole_words = None + if self.cfg.text_cfg.mask_whole_words: + mask_whole_words = get_whole_word_mask(self.cfg.text_cfg, self.text_src_dictionary) + elif self.cfg.text_cfg.mask_repeative_tokens: + mask_whole_words = get_repeative_start + + src_dataset, src_unmasked_dataset = MaskTokensDataset.apply_mask( + paired_dataset.src, + self.text_src_dictionary, + pad_idx=self.text_src_dictionary.pad(), + mask_idx=self.mask_idx, + seed=self.cfg.text_cfg.seed, + mask_prob=self.cfg.text_cfg.mask_text_ratio, + leave_unmasked_prob=self.cfg.text_cfg.leave_unmasked_prob, + random_token_prob=self.cfg.text_cfg.mask_random, + freq_weighted_replacement=self.cfg.text_cfg.freq_weighted_replacement, + mask_whole_words=mask_whole_words, + mask_multiple_length=self.cfg.text_cfg.mask_multiple_length, + mask_stdev=self.cfg.text_cfg.mask_stdev, + ) + tgt_dataset = paired_dataset.tgt if paired_dataset.tgt is not None else src_unmasked_dataset + paired_dataset = LanguageTripleDataset( + src_dataset, + src_dataset.sizes, + self.text_src_dictionary, + src_unmasked_dataset, + src_unmasked_dataset.sizes, + self.text_src_dictionary, + tgt_dataset, + tgt_dataset.sizes, + self.text_dictionary, + left_pad_source=False, + left_pad_target=False, + align_dataset=None, + eos=None, + num_buckets=0, + shuffle=True, + pad_to_multiple=1, + ) + else: + src, ref, tgt = lp.split("-") + paired_dataset = load_langtriple_dataset( + self.cfg.text_cfg.text_data, + text_split, + src, + self.text_src_dictionary, + ref, + self.dictionaries[-1], + tgt, + self.text_dictionary, + combine=True, + dataset_impl=None, + upsample_primary=1, + left_pad_source=False, + left_pad_target=False, + max_source_positions=self.cfg.text_cfg.tokens_per_sample, + max_target_positions=self.cfg.text_cfg.tokens_per_sample, + truncate_source=truncate_source, + prepend_bos=False, + load_alignments=False, + append_source_id=True if self.cfg.text_cfg.prepend_tgt_lang_tag else False, + lang_format="" if self.cfg.text_cfg.prepend_tgt_lang_tag else "[{}]", + ) + return paired_dataset + + def load_dataset(self, split: str, epoch=1, **kwargs) -> None: + """ + Create Wav dataset for audio, and Index dataset for phonemized text, + then concatenate them to by fairseq.data.multi_corpus_dataset.MultiCorpusDataset. 
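        The split name packs several sub-splits separated by '+':
            "speech_splits+paired_text_splits+mono_text_splits+supervised_split"
        where speech_splits is a comma-separated list of speech manifests, each text
        split is named like "corpus.src-tgt" (src == tgt selects the monolingual/BART
        path, and "corpus.src-ref-tgt" selects the triple loader), and the optional
        4th field is a labeled speech split. For example (illustrative names only),
        "train_960+textcorpus.phn-ltr++" loads one speech split plus one paired text
        split and no mono/supervised data.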
+ """ + speech_splits = split.split('+')[0].split(',') + ### 1st, create a speech dataset using STSpeechDataset (modified from HubertDataset) + dicts = [self.target_dictionary] if self.cfg.fine_tuning else self.dictionaries + pad_list = [dict.pad() for dict in dicts] + eos_list = [dict.eos() for dict in dicts] + procs = [LabelEncoder(dict) for dict in dicts] + if self.cfg.speech_tgt_lang != "": + tgt_lang_idx = _lang_token_index(dicts[0], self.cfg.speech_tgt_lang) + logger.info(f"Will prepend <{tgt_lang_idx}> at the beginning of prev_output_tokens to replace ") + else: + tgt_lang_idx = None + + + # hubert v1: pad_audio=True, random_crop=False; + speech_datasets = [] + for speech_split in speech_splits: + paths = [ + f"{self.get_label_dir()}/{speech_split}.{l}" for l in self.cfg.labels + ] + speech_datasets.append( + HubertDataset( + f"{self.cfg.data}/{speech_split}.tsv", + sample_rate=self.cfg.sample_rate, + label_paths=paths, + label_rates=self.cfg.label_rate, + pad_list=pad_list, + eos_list=eos_list, + label_processors=procs, + max_keep_sample_size=self.cfg.max_keep_size, + min_keep_sample_size=self.cfg.min_sample_size, + max_sample_size=self.cfg.max_sample_size, + pad_audio=self.cfg.pad_audio, + normalize=self.cfg.normalize, + store_labels=self.cfg.store_labels, + random_crop=self.cfg.random_crop, + single_target=self.cfg.single_target, + tgt_dict=dicts[0], + add_decoder_target=self.cfg.add_decoder_target, + fine_tuning=self.cfg.fine_tuning, + tgt_lang_idx=tgt_lang_idx, + tokenizer=self.hubert_tokenizer, + ) + ) + if len(speech_datasets) > 1: + speech_dataset = ConcatDataset(speech_datasets) + else: + speech_dataset = speech_datasets[0] + + has_text = len(split.split('+')) > 1 + if not has_text: + assert speech_dataset is not None + self.datasets[split] = speech_dataset + return + + ### 2nd, create paired/mono text datasets using Langpairdataset + if split.split('+')[1] != '': + paired_splits = [paired_split for paired_split in split.split('+')[1].split(',') if paired_split != ''] + paired_datasets = [self.load_paired_dataset(paired_split) for paired_split in paired_splits] + else: + paired_splits, paired_datasets = [], [] + + if len(split.split('+')) > 2 and split.split('+')[2] != '': + mono_splits = [mono_split for mono_split in split.split('+')[2].split(',') if mono_split != ''] + mono_datasets = [self.load_paired_dataset(mono_split, truncate_source=self.cfg.text_cfg.truncate_mono_source) for mono_split in mono_splits] + else: + mono_splits, mono_datasets = [], [] + + assert len(mono_datasets + paired_datasets) > 0, f"split {split} has no text! you should check out for that" + + ### 3rd, if provided, create a supervised dataset with labeled data + if len(split.split('+')) > 3 and split.split('+')[3] != '': + assert len(paired_splits) > 0, f"supervised dataset can not be loaded without text paired dataset!" 
+ tgt = paired_splits[0].rsplit('.', 1)[1].split("-")[1] + sup_split = split.split('+')[3] + + sup_dataset = HubertDataset( + f"{self.cfg.data}/{sup_split}.tsv", + sample_rate=self.cfg.sample_rate, + label_paths=[f"{self.get_label_dir()}/{sup_split}.{tgt}"], + label_rates=[-1], + pad_list=[self.text_dictionary.pad()], + eos_list=[self.text_dictionary.eos()], + label_processors=[LabelEncoder(self.text_dictionary)], + max_keep_sample_size=self.cfg.max_keep_size, + min_keep_sample_size=None, + max_sample_size=None, + pad_audio=True, + normalize=self.cfg.normalize, + store_labels=self.cfg.store_labels, + random_crop=False, + single_target=True, + tgt_dict=self.text_dictionary, + add_decoder_target=self.cfg.add_decoder_target, + fine_tuning=True, + tgt_lang_idx=None, + tokenizer=None, + ) + else: + sup_dataset = None + + ### 4th, compose a MultiCorpusDataset + dataset_dict, max_positions_dict, distributions, max_tokens_ratios = self.resample_multi_modality_dataset( + speech_dataset, sup_dataset, mono_datasets, paired_datasets, mono_splits, paired_splits, epoch=epoch, + ) + self.datasets[split] = MultiCorpusDataset( + dataset_dict, + max_positions=max_positions_dict, + distribution=distributions, + max_tokens_ratio=max_tokens_ratios, + seed=self.cfg.text_cfg.seed, + sort_indices=True, + ) + + + def max_positions(self) -> Tuple[int, int]: + return (sys.maxsize, sys.maxsize) + + def filter_indices_by_size( + self, indices: np.array, *args, **kwargs + ) -> np.array: + return indices + + def get_batch_iterator( + self, + dataset, + max_tokens=None, + max_sentences=None, + max_positions=None, + ignore_invalid_inputs=False, + required_batch_size_multiple=1, + seed=1, + num_shards=1, + shard_id=0, + num_workers=0, + epoch=1, + data_buffer_size=0, + disable_iterator_cache=False, + skip_remainder_batch=False, + grouped_shuffling=False, + update_epoch_batch_itr=False, + ): + """ + Get an iterator that yields batches of data from the given dataset. + Args: + dataset (~fairseq.data.FairseqDataset): dataset to batch + max_tokens (int, optional): max number of tokens in each batch + (default: None). + max_sentences (int, optional): max number of sentences in each + batch (default: None). + max_positions (optional): max sentence length supported by the + model (default: None). + ignore_invalid_inputs (bool, optional): don't raise Exception for + sentences that are too long (default: False). + required_batch_size_multiple (int, optional): require batch size to + be a multiple of N (default: 1). + seed (int, optional): seed for random number generator for + reproducibility (default: 1). + num_shards (int, optional): shard the data iterator into N + shards (default: 1). + shard_id (int, optional): which shard of the data iterator to + return (default: 0). + num_workers (int, optional): how many subprocesses to use for data + loading. 0 means the data will be loaded in the main process + (default: 0). + epoch (int, optional): the epoch to start the iterator from + (default: 1). + data_buffer_size (int, optional): number of batches to + preload (default: 0). + disable_iterator_cache (bool, optional): don't cache the + EpochBatchIterator (ignores `FairseqTask::can_reuse_epoch_itr`) + (default: False). + skip_remainder_batch (bool, optional): if set, discard the last + batch in each training epoch, as the last batch is often smaller than + local_batch_size * distributed_word_size (default: ``True``). 
+ grouped_shuffling (bool, optional): group batches with each groups + containing num_shards batches and shuffle groups. Reduces difference + between sequence lengths among workers for batches sorted by length. + update_epoch_batch_itr (bool optional): if true then donot use the cached + batch iterator for the epoch + + Returns: + ~fairseq.iterators.EpochBatchIterator: a batched iterator over the + given dataset split + """ + if self.fine_tuning or not isinstance(dataset, MultiCorpusDataset): + return super().get_batch_iterator( + dataset, + max_tokens=max_tokens, + max_sentences=max_sentences, + max_positions=max_positions, + ignore_invalid_inputs=ignore_invalid_inputs, + required_batch_size_multiple=required_batch_size_multiple, + seed=seed, + num_shards=num_shards, + shard_id=shard_id, + num_workers=num_workers, + epoch=epoch, + data_buffer_size=data_buffer_size, + disable_iterator_cache=disable_iterator_cache, + skip_remainder_batch=skip_remainder_batch, + grouped_shuffling=grouped_shuffling, + update_epoch_batch_itr=update_epoch_batch_itr, + ) + + can_reuse_epoch_itr = ( + not disable_iterator_cache + and not update_epoch_batch_itr + and self.can_reuse_epoch_itr(dataset) + ) + if can_reuse_epoch_itr and dataset in self.dataset_to_epoch_iter: + logger.debug("reusing EpochBatchIterator for epoch {}".format(epoch)) + return self.dataset_to_epoch_iter[dataset] + + assert isinstance(dataset, FairseqDataset) + + # initialize the dataset with the correct starting epoch + dataset.set_epoch(epoch) + + # get indices ordered by example size + with data_utils.numpy_seed(seed): + indices = dataset.ordered_indices() + + # filter examples that are too large + if max_positions is not None: + indices = self.filter_indices_by_size( + indices, dataset, max_positions, ignore_invalid_inputs + ) + + # create mini-batches with given size constraints + batch_sampler = dataset.get_batch_sampler( + indices, + num_shards, + seed, + max_tokens=max_tokens, + max_sentences=max_sentences, + required_batch_size_multiple=required_batch_size_multiple, + split_modality_batch=self.cfg.split_modality_batch, + ) + + # return a reusable, sharded iterator + epoch_iter = iterators.EpochBatchIterator( + dataset=dataset, + collate_fn=dataset.collater, + batch_sampler=batch_sampler, + seed=seed, + num_shards=num_shards, + shard_id=shard_id, + num_workers=num_workers, + epoch=epoch, + buffer_size=data_buffer_size, + skip_remainder_batch=skip_remainder_batch, + disable_shuffling=True, + grouped_shuffling=grouped_shuffling, + ) + + if can_reuse_epoch_itr: + self.dataset_to_epoch_iter[dataset] = epoch_iter + + return epoch_iter + + @classmethod + def _get_size_ratios(cls, ids: List[str], sizes: List[int], alpha: float = 1.0): + """Size ratios for temperature-based sampling + (https://arxiv.org/abs/1907.05019)""" + _sizes = np.array(sizes) + prob = _sizes / _sizes.sum() + smoothed_prob = prob ** alpha + smoothed_prob = smoothed_prob / smoothed_prob.sum() + size_ratio = (smoothed_prob * _sizes.sum()) / _sizes + + o_str = str({_i: f"{prob[i]:.3f}" for i, _i in enumerate(ids)}) + logger.info(f"original sampling probability: {o_str}") + p_str = str({_i: f"{smoothed_prob[i]:.3f}" for i, _i in enumerate(ids)}) + logger.info(f"balanced sampling probability: {p_str}") + sr_str = str({_id: f"{size_ratio[i]:.3f}" for i, _id in enumerate(ids)}) + logger.info(f"balanced sampling size ratio: {sr_str}") + return size_ratio.tolist() + + def resample_multi_modality_dataset(self, speech_dataset, sup_dataset, mono_datasets, paired_datasets, 
mono_splits, paired_splits, epoch=1, train=True): + assert len(mono_datasets+paired_datasets) > 0, f"No text data loaded!" + + if len(mono_datasets) > 1 and self.cfg.text_sampling_alpha != 1.0: + size_ratios = self._get_size_ratios( + mono_splits, [len(s) for s in mono_datasets], alpha=self.cfg.text_sampling_alpha + ) + mono_datasets = [ + ResamplingDataset( + d, size_ratio=r, seed=0, epoch=epoch, replace=(r >= 1.0) + ) for d, r in zip(mono_datasets, size_ratios) + ] + + if len(paired_datasets) > 1 and self.cfg.text_sampling_alpha != 1.0: + size_ratios = self._get_size_ratios( + paired_splits, [len(s) for s in paired_datasets], alpha=self.cfg.text_sampling_alpha + ) + paired_datasets = [ + ResamplingDataset( + d, size_ratio=r, seed=0, epoch=epoch, replace=(r >= 1.0) + ) for d, r in zip(paired_datasets, size_ratios) + ] + + dataset_list = [speech_dataset, sup_dataset] + for datasets in [mono_datasets, paired_datasets]: + if len(datasets) > 1: + dataset_list.append(ConcatDataset(datasets)) + elif len(datasets) == 1: + dataset_list.append(datasets[0]) + else: + dataset_list.append(None) + + ### match speech/text datasets according to modality + dataset_dict = OrderedDict((name, d) for name, d in zip(["speech", "speech_sup", "text_mono", "text_paired"], dataset_list) if d is not None) + max_positions_dict = { + "speech": None, + "speech_sup": None, + "text_mono": (self.cfg.text_cfg.tokens_per_sample, self.cfg.text_cfg.tokens_per_sample), + "text_paired": (self.cfg.text_cfg.tokens_per_sample, self.cfg.text_cfg.tokens_per_sample), + } + max_positions_dict = OrderedDict((name, max_positions_dict[name]) for name in dataset_dict.keys()) + max_tokens_ratios_dict = { + "speech": 1.0, + "speech_sup": 1.0, + "text_mono": 1.0 / 320 / self.cfg.text_cfg.text_maxtokens_ratio, + "text_paired": 1.0 / 320 / self.cfg.text_cfg.text_maxtokens_ratio, + } + max_tokens_ratios = [max_tokens_ratios_dict[name] for name in dataset_dict.keys()] + dataset_lens = np.array([len(dataset) for dataset in dataset_dict.values()]) + dataset_avg_sample_lens = np.array([ + sum([dataset.num_tokens(i) for i in np.random.randint(low=0, high=len(dataset), size=10000)]) / 10000.0 + for dataset in dataset_dict.values() + ]) + + if not "speech" in dataset_dict: + distributions = [l / sum(dataset_lens) for l in dataset_lens] + else: + ## we just keep the batches of speech and non-speech the same, expand_coef is to ensure speech batches is less than others + first_ratio = dataset_lens[0] / sum(dataset_lens) + expand_coef = 1.8 if sup_dataset is None else 1.1 * sum(dataset_lens[0:2]) / dataset_lens[0] + distributions = [expand_coef * max_tokens_ratios[i] * dataset_avg_sample_lens[0] / l for (i, l) in enumerate(dataset_avg_sample_lens)] + distributions[0] = 1.0 + if sup_dataset is not None: + distributions[1] = dataset_lens[1] / dataset_lens[0] + distributions = [first_ratio * d for d in distributions] + + logging.info(f"Number samples of datasets is {dataset_lens}") + logging.info(f"Avg sample length of datasets is {dataset_avg_sample_lens}") + logging.info(f"Sampling distributions is {distributions}") + logging.info(f"Maxtokens ratio is {max_tokens_ratios}") + return dataset_dict, max_positions_dict, distributions, max_tokens_ratios + + def build_tokenizer(self, cfg=None): + logger.info(f"tokenizer: {self.cfg.hubert_tokenizer}") + if self.cfg.hubert_tokenizer != "none": + return encoders.build_bpe(Namespace(**{"bpe": self.cfg.hubert_tokenizer, "sentencepiece_model": self.cfg.sp_path})) + else: + return None + + def 
load_char_bart_dataset(self, split): + mono_dataset = data_utils.load_indexed_dataset( + f"{self.cfg.text_cfg.text_data}/{split}", + self.text_dictionary, + ) + mono_dataset = StripTokenDataset(mono_dataset, self.text_dictionary.eos()) + mono_dataset = maybe_shorten_dataset( + mono_dataset, + split, + self.cfg.text_cfg.shorten_data_split_list, + self.cfg.text_cfg.shorten_method, + self.cfg.text_cfg.tokens_per_sample - 2, + self.cfg.text_cfg.seed, + ) + logger.info("loaded {} samples from: {}".format(len(mono_dataset), mono_dataset)) + ### prepend bos and eos to dataset + mono_dataset = PrependTokenDataset(mono_dataset, self.text_dictionary.bos()) + mono_dataset = AppendTokenDataset(mono_dataset, self.text_dictionary.eos()) + mask_whole_words = ( + get_whole_word_mask(None, self.text_dictionary) + if self.cfg.text_cfg.mask_whole_words + else None + ) + lang=self.cfg.speech_tgt_lang + mono_dataset = DenoisingDataset( + mono_dataset, + mono_dataset.sizes, + self.text_dictionary, + self.mask_idx, + mask_whole_words, + shuffle=self.cfg.text_cfg.shuffle_instance, + seed=self.cfg.text_cfg.seed, + args=self.cfg.text_cfg, + tgt_lang_idx=_lang_token_index(self.text_dictionary, lang) if self.cfg.text_cfg.prepend_tgt_lang_tag else None, + ) + + return mono_dataset diff --git a/SpeechLM/speechlm/unit_generator.py b/SpeechLM/speechlm/unit_generator.py new file mode 100644 index 0000000000000000000000000000000000000000..93f9b98b473f099a39191b630b181ade2231857d --- /dev/null +++ b/SpeechLM/speechlm/unit_generator.py @@ -0,0 +1,66 @@ +# ---------------------------------------------------------------------------- +# SpeechLM: Enhanced Speech Pre-Training with Unpaired Textual Data (https://arxiv.org/abs/2209.15329) +# Github source: https://github.com/microsoft/SpeechT5/tree/main/SpeechLM +# Code based on fairseq: https://github.com/facebookresearch/fairseq/tree/272c4c5197250997148fb12c0db6306035f166a4 +# +# Copyright (c) 2022 Microsoft +# Licensed under The MIT License [see LICENSE for details] +# ---------------------------------------------------------------------------- + +""" +Modified form: https://github.com/facebookresearch/fairseq/blob/272c4c5197250997148fb12c0db6306035f166a4/fairseq/sequence_generator.py +""" + +import torch +import numpy as np + +from fairseq.data.audio.speech_to_text_dataset import S2TDataConfig +from fairseq.speech_generator import SpeechGenerator + +class NonAutoregressiveUnitGenerator(SpeechGenerator): + @torch.no_grad() + def generate(self, model, sample, has_targ=False, **kwargs): + model.eval() + + bsz, max_src_len = sample["net_input"]["src_tokens"].size() + n_frames_per_step = model.encoder.n_frames_per_step + out_dim = model.encoder.out_dim + raw_dim = out_dim // n_frames_per_step + + logit, logit_post, out_lens, log_dur_out, _, _ = model( + src_tokens=sample["net_input"]["src_tokens"], + src_lengths=sample["net_input"]["src_lengths"], + speaker=sample["speaker"], + durations=sample["durations"], + pitches=sample["pitches"], + energies=sample["energies"], + ) + if logit_post is not None: + logit = logit_post + + logit = logit.view(bsz, -1, raw_dim) + pred = logit.argmax(dim=-1) + + ## get duration prediction + src_tokens = sample["net_input"]["src_tokens"] + src_lengths = sample["net_input"]["src_lengths"] + padding_mask = src_tokens.eq(model.encoder.padding_idx) + d_factor = 1.0 ## set by model + dur_out = torch.clamp( + torch.round((torch.exp(log_dur_out) - 1) * d_factor).long(), min=0 + ) + dur_out.masked_fill_(padding_mask, 0) + x = src_tokens.unsqueeze(-1) + x, 
src_out_lens = model.encoder.var_adaptor.length_regulator(x, dur_out) + fa_src_tokens = x.view(bsz, -1) + + finalized = [ + { + "unit": pred[b, :l], + "fa_src": fa_src_tokens[b, :l], + "duration": dur_out[b, :L], + } + for b, l, L in zip(range(bsz), out_lens, src_lengths) + ] + + return finalized diff --git a/SpeechT5/README.md b/SpeechT5/README.md new file mode 100644 index 0000000000000000000000000000000000000000..d4b0b2bf4c65aad20e47c2f6a7d3626d5bbe1c99 --- /dev/null +++ b/SpeechT5/README.md @@ -0,0 +1,706 @@ +# SpeechT5 + + + + [**SpeechT5**](https://arxiv.org/abs/2110.07205): **Unified-Modal Encoder-Decoder Pre-training for Spoken Language Processing** + +Official PyTorch implementation and pretrained models of SpeechT5 + +- Oct 2021: release preprint in [arXiv](https://arxiv.org/abs/2110.07205) +- Feb 2022: accepted by [ACL 2022](https://www.2022.aclweb.org/) + + +## Pre-Trained Models + +| Model | Pre-training Dataset | Fine-tuning Dataset | Model | +| :------: | :----------------------------------------------: | :-----------------: | :-----: | +| SpeechT5 Base | [960 hrs LibriSpeech](http://www.openslr.org/12) + [LibriSpeech LM Dataset](https://www.openslr.org/11/) | - | [HuggingFace](https://huggingface.co/ajyy/SpeechT5/resolve/main/speecht5_base.pt)
[Google Drive](https://drive.google.com/file/d/1Sq00uZ1pw6Z4OUaqhOWzQEJxIVWgAO5U/view?usp=sharing) | +| SpeechT5 Base | [960 hrs LibriSpeech](http://www.openslr.org/12) + [LibriSpeech LM Dataset](https://www.openslr.org/11/) | [100 hrs LibriSpeech](http://www.openslr.org/12) | [HuggingFace](https://huggingface.co/ajyy/SpeechT5/resolve/main/speecht5_base_asr.pt)
[Google Drive](https://drive.google.com/file/d/1qLKJ81JPWOGf1MHfjSmgtZyqqTqgI6kT/view?usp=sharing) | +| SpeechT5 Large | [60k hrs Libri-Light](https://github.com/facebookresearch/libri-light) + [LibriSpeech LM Dataset](https://www.openslr.org/11/) | - | [Google Drive](https://drive.google.com/file/d/1M79b1jetSPOVxWVMIX-y0URvDjNskZKp/view?usp=sharing) | + +## Language Model and Vocabulary +| Model | Dataset | Model | Vocabulary | SPM Model | +| :------: | :------: | :---: | :--------: | :-------: | +| LM | [LibriSpeech LM Dataset](https://www.openslr.org/11/) | [LM Model](https://drive.google.com/uc?export=download&id=1y0TGnKAMKUW5C8l8yrvGjh9RRZETPdv7) | [Vocabulary](https://drive.google.com/uc?export=download&id=19hcQ58RHZ6CssxF8Qp6yEF1NW_AXxObK) | [SPM Model](https://drive.google.com/uc?export=download&id=1wClgQjXXoU2lmpbaEa1v2SqMbg7cAutq) | + + +## Setup +``` +git submodule update --init SpeechT5/fairseq +cd SpeechT5/ +pip install --editable fairseq/ +pip install espnet +``` + + + +## Load Pre-Trained Models + +```python +import torch +from speecht5.tasks.speecht5 import SpeechT5Task +from speecht5.models.speecht5 import T5TransformerModel + +checkpoint = torch.load('/path/to/speecht5_checkpoint') + +checkpoint['cfg']['task'].t5_task = 'pretrain' +checkpoint['cfg']['task'].hubert_label_dir = "/path/to/hubert_label" +checkpoint['cfg']['task'].data = "/path/to/tsv_file" + +task = SpeechT5Task.setup_task(checkpoint['cfg']['task']) +model = T5TransformerModel.build_model(checkpoint['cfg']['model'], task) +model.load_state_dict(checkpoint['model']) +``` + +## Data Preparation + +### Speech data and S2T Data +Please follow the steps for preparing wav2vec 2.0 manifest in [here](https://github.com/pytorch/fairseq/tree/main/examples/wav2vec#prepare-training-data-manifest) and preparing HuBERT label in [here](https://github.com/facebookresearch/fairseq/tree/main/examples/hubert/simple_kmeans). + +We add a third column for the speaker embedding, which is provided in [here](https://drive.google.com/uc?export=download&id=16QOUURZBrW7-GYbVG_gXt3mTMlZmQoH0). +It includes the speaker embeddings for 960hr training data and dev-other data of LibriSpeech. + +We also provide example manifests for your reference in [here](https://drive.google.com/drive/folders/1Ja08XjOHe6vP8lZtLVrJM8173aPQCR_y?usp=sharing). + +### Text Data +Please use [fairseq-preprocess](https://fairseq.readthedocs.io/en/latest/command_line_tools.html#fairseq-preprocess) to generate the index and bin files of the text data. Note that we use sentencepiece to pre-process the text, so please refer to [here](https://github.com/microsoft/SpeechT5/tree/main/SpeechT5#language-model-and-vocabulary) to download the SPM model and dictionary for preparing text data. This means you firstly need to use the SPM model to process the text and then use [fairseq-preprocess](https://fairseq.readthedocs.io/en/latest/command_line_tools.html#fairseq-preprocess) with the provided dictionary to get the index and bin files. 
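+
+For reference, a minimal sketch of the SPM step (file names such as `spm_char.model`, `train.txt`, and `train.spm.txt` are placeholders for the downloaded SPM model and your own text files):
+
+```python
+import sentencepiece as spm
+
+sp = spm.SentencePieceProcessor()
+sp.load("spm_char.model")  # the SPM model downloaded from the table above
+
+with open("train.txt") as fin, open("train.spm.txt", "w") as fout:
+    for line in fin:
+        # Encode each sentence into SPM pieces separated by spaces.
+        fout.write(" ".join(sp.encode_as_pieces(line.strip())) + "\n")
+```
+
+The resulting `train.spm.txt` is then binarized with `fairseq-preprocess` (pointing `--srcdict` at the provided dictionary) to produce the index and bin files.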
+ +## Pre-Training + +### 960hr LibriSpeech + LibriSpeech-LM + +``` +DATA_ROOT= +SAVE_DIR= +LABEL_DIR= +TRAIN_SET="speech_train|text_train" +VALID_SET="speech_valid|text_valid" + + +fairseq-train ${DATA_ROOT} \ + --save-dir ${SAVE_DIR} \ + --tensorboard-logdir ${SAVE_DIR} \ + --train-subset ${TRAIN_SET} \ + --valid-subset ${VALID_SET} \ + --hubert-label-dir ${LABEL_DIR} \ + --distributed-world-size 32 \ + --distributed-port 0 \ + --ddp-backend legacy_ddp \ + --user-dir SpeechT5/speecht5 \ + --log-format json \ + --seed 1337 \ + --fp16 \ + \ + --task speecht5 \ + --t5-task pretrain \ + --label-rates 50 \ + --sample-rate 16000 \ + --random-crop \ + \ + --num-workers 0 \ + --max-tokens 1400000 \ + --max-speech-sample-size 250000 \ + --update-freq 2 \ + --batch-ratio "[1,0.0086]" \ + \ + --criterion speecht5 \ + --optimizer adam \ + --reset-optimizer \ + --adam-betas "(0.9, 0.98)" \ + --adam-eps 1e-06 \ + --weight-decay 0.01 \ + --power 1 \ + --clip-norm 5.0 \ + --lr 0.0002 \ + --lr-scheduler polynomial_decay \ + \ + --max-update 800000 \ + --warmup-updates 64000 \ + --total-num-update 800000 \ + --save-interval-updates 3000 \ + --skip-invalid-size-inputs-valid-test \ + --required-batch-size-multiple 1 \ + \ + --arch t5_transformer_base \ + --share-input-output-embed \ + --find-unused-parameters \ + --bert-init \ + --relative-position-embedding \ + --use-codebook \ + --codebook-prob 0.1 \ + --loss-weights="[10,0.1]" \ + --max-text-positions 600 \ +``` + +## Finetune + +### ASR + +The fine-tuned ASR model can be used directly using Hugging Face Transformers. The checkpoint is available at [hf.co/microsoft/speecht5_asr](https://huggingface.co/microsoft/speecht5_asr). An interactive demo is [available here](https://huggingface.co/spaces/Matthijs/speecht5-asr-demo). + +#### Training + +``` +DATA_ROOT= +SAVE_DIR= +TRAIN_SET= +VALID_SET= +LABEL_DIR= +BPE_TOKENIZER= +USER_DIR= +PT_CHECKPOINT_PATH= + +mkdir -p ${SAVE_DIR} +fairseq-train ${DATA_ROOT} \ + --save-dir ${SAVE_DIR} \ + --tensorboard-logdir ${SAVE_DIR} \ + --train-subset ${TRAIN_SET} \ + --valid-subset ${VALID_SET} \ + --hubert-label-dir ${LABEL_DIR} \ + --distributed-world-size 8 \ + --distributed-port 0 \ + --ddp-backend legacy_ddp \ + --user-dir ${USER_DIR} \ + --log-format json \ + --seed 1 \ + --fp16 \ + \ + --task speecht5 \ + --t5-task s2t \ + --sample-rate 16000 \ + --num-workers 0 \ + --max-tokens 1600000 \ + --update-freq 2 \ + --bpe-tokenizer ${BPE_TOKENIZER} \ + \ + --criterion speecht5 \ + --report-accuracy \ + --zero-infinity \ + --ce-weight 0.5 \ + --ctc-weight 0.5 \ + --sentence-avg \ + \ + --optimizer adam \ + --adam-betas "(0.9, 0.98)" \ + --adam-eps 1e-08 \ + --weight-decay 0.1 \ + --clip-norm 25.0 \ + --lr 0.00006 \ + --lr-scheduler tri_stage \ + --phase-ratio "[0.1, 0.4, 0.5]" \ + --final-lr-scale 0.05 \ + \ + --max-update 80000 \ + --max-text-positions 600 \ + --required-batch-size-multiple 1 \ + --save-interval-updates 3000 \ + --skip-invalid-size-inputs-valid-test \ + \ + --arch t5_transformer_base_asr \ + --share-input-output-embed \ + --find-unused-parameters \ + --bert-init \ + --relative-position-embedding \ + --freeze-encoder-updates 13000 \ + \ + --keep-last-epochs 10 \ + --feature-grad-mult 1.0 \ + --best-checkpoint-metric s2t_accuracy \ + --maximize-best-checkpoint-metric \ + --finetune-from-model ${PT_CHECKPOINT_PATH} +``` + +#### Inference +Note that joint CTC/Decoder inference is only supported when batch size is 1. 
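+
+If you only need transcriptions from the released Hugging Face checkpoint mentioned above, a minimal Transformers sketch is shown here (the silent waveform is just a stand-in for your own 16 kHz audio); the fairseq-based joint CTC/decoder command follows below.
+
+```python
+import numpy as np
+from transformers import SpeechT5Processor, SpeechT5ForSpeechToText
+
+processor = SpeechT5Processor.from_pretrained("microsoft/speecht5_asr")
+model = SpeechT5ForSpeechToText.from_pretrained("microsoft/speecht5_asr")
+
+# Replace this placeholder with a real 16 kHz mono waveform (1-D float array).
+waveform = np.zeros(16000, dtype=np.float32)
+inputs = processor(audio=waveform, sampling_rate=16000, return_tensors="pt")
+
+predicted_ids = model.generate(**inputs, max_length=100)
+print(processor.batch_decode(predicted_ids, skip_special_tokens=True)[0])
+```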
+ +``` +CHECKPOINT_PATH= +DATA_ROOT= +SUBSET= +BPE_TOKENIZER= +LABEL_DIR= +USER_DIR= +BEAM= +MAX_TOKENS= +CTC_WEIGHT= +LM_WEIGHT= +LM_PATH= + +fairseq-generate ${DATA_ROOT} \ + --gen-subset ${SUBSET} \ + --bpe-tokenizer ${BPE_TOKENIZER} \ + --user-dir ${USER_DIR} \ + --task speecht5 \ + --t5-task s2t \ + --path ${CHECKPOINT_PATH} \ + --hubert-label-dir ${LABEL_DIR} \ + --ctc-weight ${CTC_WEIGHT} \ + --lm-weight ${LM_WEIGHT} \ + --lm-path ${LM_PATH} \ + --max-tokens ${MAX_TOKENS} \ + --beam ${BEAM} \ + --scoring wer \ + --max-len-a 0 \ + --max-len-b 620 \ + --sample-rate 16000 +``` + +### TTS + +The manifest and pre-trained vocoder can be found in [huggingface](https://huggingface.co/mechanicalsea/speecht5-tts), which may be helpful to reproduce the results of SpeechT5 TTS model. + +We also provide re-implementation of TTS fine-tuned model [speecht5_tts.pt](https://huggingface.co/mechanicalsea/speecht5-tts/blob/main/speecht5_tts.pt), but with a smaller batch size or max updates, which can be helpful. + +This fine-tuned TTS model can also be used directly using Hugging Face Transformers. The checkpoint is available at [hf.co/microsoft/speecht5_tts](https://huggingface.co/microsoft/speecht5_tts). An interactive demo is [available here](https://huggingface.co/spaces/Matthijs/speecht5-tts-demo). Also see [this Colab notebook](https://colab.research.google.com/drive/1i7I5pzBcU3WDFarDnzweIj4-sVVoIUFJ) on how to fine-tune SpeechT5 for TTS using Hugging Face. + +#### Training + +``` +DATA_ROOT= +SAVE_DIR= +TRAIN_SET= +VALID_SET= +LABEL_DIR= +BPE_TOKENIZER= +USER_DIR= +PT_CHECKPOINT_PATH= + +fairseq-train ${DATA_ROOT} \ + --save-dir ${SAVE_DIR} \ + --tensorboard-logdir ${SAVE_DIR} \ + --train-subset ${TRAIN_SET} \ + --valid-subset ${VALID_SET} \ + --hubert-label-dir ${LABEL_DIR} \ + --distributed-world-size 8 \ + --distributed-port 0 \ + --ddp-backend legacy_ddp \ + --user-dir ${USER_DIR} \ + --log-format json \ + --seed 1 \ + --fp16 \ + \ + --task speecht5 \ + --t5-task t2s \ + --sample-rate 16000 \ + --num-workers 4 \ + --max-tokens 3200000 \ + --update-freq 1 \ + --bpe-tokenizer ${BPE_TOKENIZER} \ + --max-tokens-valid 3200000 \ + \ + --criterion speecht5 \ + --use-guided-attn-loss \ + --report-accuracy \ + --sentence-avg \ + \ + --optimizer adam \ + --adam-betas "(0.9, 0.98)" \ + --dropout 0.15 \ + --activation-dropout 0.15 \ + --attention-dropout 0.15 \ + --encoder-layerdrop 0.0 \ + --decoder-layerdrop 0.0 \ + --weight-decay 0.0 \ + --clip-norm 25.0 \ + --lr 0.0001 \ + --lr-scheduler inverse_sqrt \ + --warmup-updates 10000 \ + --feature-grad-mult 1.0 \ + \ + --max-update 120000 \ + --max-text-positions 600 \ + --min-speech-sample-size 1056 \ + --max-speech-sample-size 480256 \ + --max-speech-positions 1876 \ + --required-batch-size-multiple 1 \ + --skip-invalid-size-inputs-valid-test \ + --keep-last-epochs 10 \ + --validate-after-updates 20000 \ + --validate-interval 50 \ + --log-interval 10 \ + \ + --arch t5_transformer_base_asr \ + --share-input-output-embed \ + --find-unused-parameters \ + --bert-init \ + --relative-position-embedding \ + --freeze-encoder-updates 20000 \ + \ + --finetune-from-model ${PT_CHECKPOINT_PATH} +``` + +#### Inference + +Generating speech is available only if batch size is 1. 
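+
+If you only want to synthesize with the Hugging Face checkpoint mentioned above, a minimal Transformers sketch is shown here (`microsoft/speecht5_hifigan` is the vocoder published alongside the Transformers port, and the zero speaker embedding is a placeholder for a real 512-dimensional x-vector); the fairseq-based generation command follows below.
+
+```python
+import torch
+import soundfile as sf
+from transformers import SpeechT5Processor, SpeechT5ForTextToSpeech, SpeechT5HifiGan
+
+processor = SpeechT5Processor.from_pretrained("microsoft/speecht5_tts")
+model = SpeechT5ForTextToSpeech.from_pretrained("microsoft/speecht5_tts")
+vocoder = SpeechT5HifiGan.from_pretrained("microsoft/speecht5_hifigan")
+
+inputs = processor(text="SpeechT5 turns text into sixteen kilohertz speech.", return_tensors="pt")
+
+# Placeholder speaker embedding; load a real x-vector for natural-sounding speech.
+speaker_embeddings = torch.zeros((1, 512))
+
+speech = model.generate_speech(inputs["input_ids"], speaker_embeddings, vocoder=vocoder)
+sf.write("speech.wav", speech.numpy(), samplerate=16000)
+```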
+ +``` +SPEECHT5_CODE_DIR= +CHECKPOINT_PATH= +DATA_ROOT= +SUBSET= +BPE_TOKENIZER= +LABEL_DIR= +USER_DIR= +RESULTS_PATH= + +python3 ${SPEECHT5_CODE_DIR}/SpeechT5/scripts/generate_speech.py ${DATA_ROOT} \ + --gen-subset ${SUBSET} \ + --bpe-tokenizer ${BPE_TOKENIZER} \ + --user-dir ${USER_DIR} \ + --task speecht5 \ + --t5-task t2s \ + --path ${CHECKPOINT_PATH} \ + --hubert-label-dir ${LABEL_DIR} \ + --batch-size 1 \ + --results-path ${RESULTS_PATH} \ + --sample-rate 16000 +``` + +### ST + +Here we follow [fairseq/speech_to_text/mustc](https://github.com/facebookresearch/fairseq/blob/main/examples/speech_to_text/docs/mustc_example.md#data-preparation) to generate vocabulary, which is different from the pre-trained models. So we randomly initilize the embedding table of the pre-trained models during fine-tuning. + +#### Training + +``` +DATA_ROOT= +SAVE_DIR= +TRAIN_SET= +VALID_SET= +LABEL_DIR= +BPE_TOKENIZER= +USER_DIR= +PT_CHECKPOINT_PATH= + +fairseq-train ${DATA_ROOT} \ + --save-dir ${SAVE_DIR} \ + --tensorboard-logdir ${SAVE_DIR} \ + --train-subset ${TRAIN_SET} \ + --valid-subset ${VALID_SET} \ + --hubert-label-dir ${LABEL_DIR} \ + --distributed-world-size 8 \ + --distributed-port 0 \ + --ddp-backend legacy_ddp \ + --user-dir ${USER_DIR} \ + --log-format json \ + --seed 1 \ + --fp16 \ + \ + --task speecht5 \ + --t5-task s2t \ + --sample-rate 16000 \ + --num-workers 6 \ + --max-tokens 480256 \ + --update-freq 4 \ + --bpe-tokenizer ${BPE_TOKENIZER} \ + --max-tokens-valid 3200000 \ + \ + --criterion speecht5 \ + --label-smoothing 0.1 \ + --report-accuracy \ + --sentence-avg \ + \ + --optimizer adam \ + --adam-betas "(0.9, 0.98)" \ + --weight-decay 0.0 \ + --clip-norm 10.0 \ + --lr 0.0002 \ + --lr-scheduler inverse_sqrt \ + --warmup-updates 25000 \ + --feature-grad-mult 1.0 \ + \ + --max-update 80000 \ + --max-text-positions 600 \ + --min-speech-sample-size 1056 \ + --max-speech-sample-size 480256 \ + --max-speech-positions 1876 \ + --required-batch-size-multiple 1 \ + --skip-invalid-size-inputs-valid-test \ + --keep-last-epochs 10 \ + \ + --arch t5_transformer_base_asr \ + --share-input-output-embed \ + --find-unused-parameters \ + --bert-init \ + --relative-position-embedding \ + --freeze-encoder-updates 0 \ + --mask-prob 0.5 \ + --mask-channel-prob 0.5 \ + \ + --finetune-from-model ${PT_CHECKPOINT_PATH} +``` + +#### Inference + +``` +FAIRSEQ_DIR= +CHECKPOINT_PATH= +DATA_ROOT= +BPE_TOKENIZER= +LABEL_DIR= +USER_DIR= +MAX_TOKENS= + +python3 ${FAIRSEQ_DIR}/scripts/average_checkpoints.py \ + --inputs ${CHECKPOINT_PATH} \ + --num-epoch-checkpoints 10 \ + --output ${CHECKPOINT_PATH}/avg_last_10_checkpoint.pt + +fairseq-generate ${DATA_ROOT} \ + --gen-subset tst-COMMON \ + --bpe-tokenizer ${BPE_TOKENIZER} \ + --user-dir ${USER_DIR} \ + --task speecht5 \ + --t5-task s2t \ + --path ${CHECKPOINT_PATH}/avg_last_10_checkpoint.pt \ + --hubert-label-dir ${LABEL_DIR} \ + --max-tokens ${MAX_TOKENS} \ + --min-speech-sample-size 1056 \ + --beam 5 \ + --scoring sacrebleu \ + --max-len-a 0 \ + --max-len-b 620 \ + --sample-rate 16000 +``` + +### VC + +The manifest and pre-trained vocoder can be found in [huggingface](https://huggingface.co/mechanicalsea/speecht5-vc), which may be helpful to reproduce the results of SpeechT5 VC model. + +We also provide re-implementation of VC fine-tuned model [speecht5_vc.pt](https://huggingface.co/mechanicalsea/speecht5-vc/blob/main/speecht5_vc.pt), but with a smaller batch size or max updates, which can be helpful. 
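+
+As with ASR and TTS, conversion can also be scripted with Transformers using the Hub checkpoint referenced in the next paragraph; a minimal sketch (the silent source utterance and the zero target-speaker x-vector are placeholders, and `microsoft/speecht5_hifigan` is the vocoder published alongside the Transformers port):
+
+```python
+import numpy as np
+import torch
+import soundfile as sf
+from transformers import SpeechT5Processor, SpeechT5ForSpeechToSpeech, SpeechT5HifiGan
+
+processor = SpeechT5Processor.from_pretrained("microsoft/speecht5_vc")
+model = SpeechT5ForSpeechToSpeech.from_pretrained("microsoft/speecht5_vc")
+vocoder = SpeechT5HifiGan.from_pretrained("microsoft/speecht5_hifigan")
+
+# Replace these placeholders with a real 16 kHz source utterance and a 512-dim x-vector.
+source = np.zeros(16000, dtype=np.float32)
+speaker_embeddings = torch.zeros((1, 512))
+
+inputs = processor(audio=source, sampling_rate=16000, return_tensors="pt")
+converted = model.generate_speech(inputs["input_values"], speaker_embeddings, vocoder=vocoder)
+sf.write("converted.wav", converted.numpy(), samplerate=16000)
+```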
+ +This fine-tuned VC model can also be used directly using Hugging Face Transformers. The checkpoint is available at [hf.co/microsoft/speecht5_vc](https://huggingface.co/microsoft/speecht5_vc). An interactive demo is [available here](https://huggingface.co/spaces/Matthijs/speecht5-vc-demo). + +#### Training + + +``` +DATA_ROOT= +SAVE_DIR= +TRAIN_SET= +VALID_SET= +LABEL_DIR= +BPE_TOKENIZER= +USER_DIR= +PT_CHECKPOINT_PATH= + +fairseq-train ${DATA_ROOT} \ + --save-dir ${SAVE_DIR} \ + --tensorboard-logdir ${SAVE_DIR} \ + --train-subset ${TRAIN_SET} \ + --valid-subset ${VALID_SET} \ + --hubert-label-dir ${LABEL_DIR} \ + --distributed-world-size 8 \ + --distributed-port 0 \ + --ddp-backend legacy_ddp \ + --user-dir ${USER_DIR} \ + --log-format json \ + --seed 1 \ + --fp16 \ + \ + --task speecht5 \ + --t5-task s2s \ + --sample-rate 16000 \ + --num-workers 4 \ + --max-tokens 1280000 \ + --update-freq 3 \ + --max-tokens-valid 1280000 \ + \ + --criterion speecht5 \ + --use-guided-attn-loss \ + --report-accuracy \ + --sentence-avg \ + \ + --optimizer adam \ + --dropout 0.2 \ + --activation-dropout 0.2 \ + --attention-dropout 0.2 \ + --encoder-layerdrop 0.05 \ + --decoder-layerdrop 0.0 \ + --clip-norm 1.0 \ + --lr 0.0001 \ + --lr-scheduler inverse_sqrt \ + --warmup-updates 6000 \ + --feature-grad-mult 1.0 \ + \ + --max-update 60000 \ + --max-text-positions 600 \ + --min-speech-sample-size 1056 \ + --max-speech-sample-size 480256 \ + --max-speech-positions 1876 \ + --required-batch-size-multiple 1 \ + --skip-invalid-size-inputs-valid-test \ + --keep-last-epochs 10 \ + --save-interval-updates 10000 \ + --disable-validation \ + --log-interval 10 \ + \ + --arch t5_transformer_base_asr \ + --share-input-output-embed \ + --find-unused-parameters \ + --bert-init \ + --relative-position-embedding \ + --mask-prob 0.0 \ + --mask-channel-prob 0.0 \ + \ + --finetune-from-model ${PT_CHECKPOINT_PATH} +``` + +#### Inference + +Generating speech is available only if batch size is 1. + +``` +SPEECHT5_CODE_DIR= +CHECKPOINT_PATH= +DATA_ROOT= +SUBSET= +LABEL_DIR= +USER_DIR= +RESULTS_PATH= + +python3 ${SPEECHT5_CODE_DIR}/SpeechT5/scripts/generate_speech.py ${DATA_ROOT} \ + --gen-subset test \ + --user-dir ${USER_DIR} \ + --task speecht5 \ + --t5-task s2s \ + --path ${CHECKPOINT_PATH} \ + --hubert-label-dir ${LABEL_DIR} \ + --batch-size 1 \ + --results-path ${RESULTS_PATH} \ + --sample-rate 16000 +``` + +### SID + +The manifest can be found in [huggingface](https://huggingface.co/mechanicalsea/speecht5-sid), which may be helpful to reproduce the results of SpeechT5 SID model. + +We also provide re-implementation of SID fine-tuned model [speecht5_sid.pt](https://huggingface.co/mechanicalsea/speecht5-sid/blob/main/speecht5_sid.pt) with training log and results, **but in a smaller batch size**, which can be helpful. 
+ +#### Training + + +``` +DATA_ROOT= +SAVE_DIR= +TRAIN_SET= +VALID_SET= +USER_DIR= +PT_CHECKPOINT_PATH= + +mkdir -p ${SAVE_DIR} + +fairseq-train ${DATA_ROOT} \ + --save-dir ${SAVE_DIR} \ + --tensorboard-logdir ${SAVE_DIR} \ + --train-subset ${TRAIN_SET} \ + --valid-subset ${VALID_SET} \ + --user-dir ${USER_DIR} \ + --distributed-world-size 8 \ + --distributed-port 0 \ + --ddp-backend legacy_ddp \ + --log-format json \ + --seed 1 \ + --fp16 \ + \ + --task speecht5 \ + --t5-task s2c \ + --sample-rate 16000 \ + --num-workers 4 \ + --batch-size 8 \ + --update-freq 2 \ + --data-buffer-size 0 \ + \ + --criterion speecht5 \ + --report-accuracy \ + --best-checkpoint-metric "s2c_accuracy" \ + --maximize-best-checkpoint-metric \ + \ + --optimizer adam \ + --dropout 0.1 \ + --activation-dropout 0.1 \ + --attention-dropout 0.1 \ + --encoder-layerdrop 0.05 \ + --lr-scheduler triangular \ + --max-lr 2e-4 \ + --lr-period-updates 60000 \ + --lr-shrink 0.5 \ + --lr 1e-8 \ + --feature-grad-mult 1.0 \ + --weight-decay 0.1 \ + \ + --max-update 60000 \ + --max-text-positions 600 \ + --max-speech-positions 8000 \ + --required-batch-size-multiple 1 \ + --skip-invalid-size-inputs-valid-test \ + --save-interval-updates 10000 \ + --validate-after-updates 20000 \ + --no-epoch-checkpoints \ + --log-interval 10 \ + \ + --arch t5_transformer_base_asr \ + --share-input-output-embed \ + --find-unused-parameters \ + --bert-init \ + --relative-position-embedding \ + --mask-prob 0.0 \ + --mask-channel-prob 0.0 \ + --sid-no-pooling-bn \ + --sid-no-embed-postnet \ + \ + --finetune-from-model ${PT_CHECKPOINT_PATH} +``` + +#### Inference + + +``` +CHECKPOINT_PATH= +DATA_ROOT= +SUBSET= +USER_DIR= +RESULTS_PATH= + +mkdir -p ${RESULTS_PATH} + +python scripts/generate_class.py ${DATA_ROOT} \ + --gen-subset ${SUBSET} \ + --user-dir ${USER_DIR} \ + --log-format json \ + --task speecht5 \ + --t5-task s2c \ + --path ${CHECKPOINT_PATH} \ + --results-path ${RESULTS_PATH} \ + --batch-size 1 \ + --max-speech-positions 8000 \ + --sample-rate 16000 +``` + +## License + +This project is licensed under the license found in the LICENSE file in the root directory of this source tree. +Portions of the source code are based on the [FAIRSEQ](https://github.com/pytorch/fairseq) and [ESPnet](https://github.com/espnet/espnet) projects. + +[Microsoft Open Source Code of Conduct](https://opensource.microsoft.com/codeofconduct) + +### Reference + +If you find our work is useful in your research, please cite the following paper: + +```bibtex +@article{Ao2021SpeechT5, + title = {SpeechT5: Unified-Modal Encoder-Decoder Pre-training for Spoken Language Processing}, + author = {Junyi Ao and Rui Wang and Long Zhou and Chengyi Wang and Shuo Ren and Yu Wu and Shujie Liu and Tom Ko and Qing Li and Yu Zhang and Zhihua Wei and Yao Qian and Jinyu Li and Furu Wei}, + eprint={2110.07205}, + archivePrefix={arXiv}, + primaryClass={cs.CL}, + year={2021} +} +``` + +### Contact Information + +For help or issues using SpeechT5 models, please submit a GitHub issue. + +For other communications related to SpeechT5, please contact Long Zhou (`lozhou@microsoft.com`). 
diff --git a/SpeechT5/asr_train/train.ltr b/SpeechT5/asr_train/train.ltr new file mode 100644 index 0000000000000000000000000000000000000000..cd4f278c7df89504e81f018cbca63cbb45741cde --- /dev/null +++ b/SpeechT5/asr_train/train.ltr @@ -0,0 +1,2674 @@ +E V E N | O N | T H I S | L E D G E | O F | H U M A N | S O C I E T Y | T H E R E | W A S | A | S T U N T E D | G R O W T H | O F | S H O P L E T S | W H I C H | H A D | T A K E N | R O O T | A N D | V E G E T A T E D | S O M E H O W | T H O U G H | A S | I N | A N | A I R | M E R C A N T I L E | O F | T H E | B L E A K E S T | +S H O R T L Y | A F T E R | P A S S I N G | O N E | O F | T H E S E | C H A P E L S | W E | C A M E | S U D D E N L Y | U P O N | A | V I L L A G E | W H I C H | S T A R T E D | U P | O U T | O F | T H E | M I S T | A N D | I | W A S | A L A R M E D | L E S T | I | S H O U L D | B E | M A D E | A N | O B J E C T | O F | C U R I O S I T Y | O R | D I S L I K E | +T H E | S T R E E T S | W E R E | N A R R O W | A N D | U N P A V E D | B U T | V E R Y | F A I R L Y | C L E A N | +I N | A B O U T | F O U R | H O U R S | O F | W A L K I N G | F R O M | T H E | T I M E | W E | S T A R T E D | A N D | A F T E R | P A S S I N G | T W O | O R | T H R E E | M O R E | V I L L A G E S | W E | C A M E | U P O N | A | C O N S I D E R A B L E | T O W N | A N D | M Y | G U I D E S | M A D E | M A N Y | A T T E M P T S | T O | M A K E | M E | U N D E R S T A N D | S O M E T H I N G | B U T | I | G A T H E R E D | N O | I N K L I N G | O F | T H E I R | M E A N I N G | E X C E P T | T H A T | I | N E E D | B E | U N D E R | N O | A P P R E H E N S I O N | O F | D A N G E R | +T H E | V I N E | G R E W | O U T S I D E | M A N Y | O F | T H E | H O U S E S | A N D | T H E R E | W E R E | S O M E | W I T H | S I G N | B O A R D S | O N | W H I C H | W A S | P A I N T E D | A | B O T T L E | A N D | A | G L A S S | T H A T | M A D E | M E | F E E L | M U C H | A T | H O M E | +E V E N | I N | M I D D L E | A G E | T H E Y | W E R E | S T I L L | C O M E L Y | A N D | T H E | O L D | G R E Y | H A I R E D | W O M E N | A T | T H E I R | C O T T A G E | D O O R S | H A D | A | D I G N I T Y | N O T | T O | S A Y | M A J E S T Y | O F | T H E I R | O W N | +I | H A V E | A L W A Y S | D E L I G H T E D | I N | A N D | R E V E R E N C E D | B E A U T Y | B U T | I | F E L T | S I M P L Y | A B A S H E D | I N | T H E | P R E S E N C E | O F | S U C H | A | S P L E N D I D | T Y P E | A | C O M P O U N D | O F | A L L | T H A T | I S | B E S T | I N | E G Y P T I A N | G R E E K | A N D | I T A L I A N | +T H E | C H I L D R E N | W E R E | I N F I N I T E | I N | N U M B E R | A N D | E X C E E D I N G L Y | M E R R Y | I | N E E D | H A R D L Y | S A Y | T H A T | T H E Y | C A M E | I N | F O R | T H E I R | F U L L | S H A R E | O F | T H E | P R E V A I L I N G | B E A U T Y | +T H E | D E S I G N | W A S | D I F F E R E N T | B U T | T H E | T H I N G | W A S | C L E A R L Y | T H E | S A M E | +T H E | O T H E R | L O O K E D | P A L E | A N D | I L L | B U T | H E | W A S | M A R V E L L O U S L Y | S E L F | C O N T A I N E D | A N D | I T | W A S | I M P O S S I B L E | T O | S A Y | W H A T | W A S | T H E | M A T T E R | W I T H | H I M | +W E | P A S S E D | M A N Y | C A S E S | A N D | A T | L A S T | C A M E | T O | O N E | I N | W H I C H | T H E R E | W E R E | S E V E R A L | C L O C K S | A N D | T W O | O R | T H R E E | O L D | W A T C H E S | +M Y | G U I D E S | H O W E V E R | W E R E | W E L L | K N O W N | A N D | T H 
E | N A T U R A L | P O L I T E N E S S | O F | T H E | P E O P L E | P R E V E N T E D | T H E M | F R O M | P U T T I N G | M E | T O | A N Y | I N C O N V E N I E N C E | B U T | T H E Y | C O U L D | N O T | H E L P | E Y E I N G | M E | N O R | I | T H E M | +I | S A W | A | F E W | S H E E P | W I T H | R O U N D E D | N O S E S | A N D | E N O R M O U S | T A I L S | +T H I S | H A D | S O M E | E F F E C T | I N | C A L M I N G | H I M | +E A C H | F E A T U R E | W A S | F I N I S H E D | E Y E L I D S | E Y E L A S H E S | A N D | E A R S | B E I N G | A L M O S T | I N V A R I A B L Y | P E R F E C T | +B U T | B Y | A N D | B Y | T H E Y | C A M E | T O | M Y | W A T C H | W H I C H | I | H A D | H I D D E N | A W A Y | I N | T H E | I N M O S T | P O C K E T | T H A T | I | H A D | A N D | H A D | F O R G O T T E N | W H E N | T H E Y | B E G A N | T H E I R | S E A R C H | +T H E | C O U N T R Y | W A S | H I G H L Y | C U L T I V A T E D | E V E R Y | L E D G E | B E I N G | P L A N T E D | W I T H | C H E S T N U T S | W A L N U T S | A N D | A P P L E | T R E E S | F R O M | W H I C H | T H E | A P P L E S | W E R E | N O W | G A T H E R I N G | +T H E I R | E X P R E S S I O N | W A S | D I V I N E | A N D | A S | T H E Y | G L A N C E D | A T | M E | T I M I D L Y | B U T | W I T H | P A R T E D | L I P S | I N | G R E A T | B E W I L D E R M E N T | I | F O R G O T | A L L | T H O U G H T S | O F | T H E I R | C O N V E R S I O N | I N | F E E L I N G S | T H A T | W E R E | F A R | M O R E | E A R T H L Y | +T H E Y | F E L T | M Y | P U L S E | T H E Y | L O O K E D | A T | M Y | T O N G U E | T H E Y | L I S T E N E D | A T | M Y | C H E S T | T H E Y | F E L T | A L L | M Y | M U S C L E S | A N D | A T | T H E | E N D | O F | E A C H | O P E R A T I O N | T H E Y | L O O K E D | A T | T H E | C H I E F | A N D | N O D D E D | A N D | S A I D | S O M E T H I N G | I N | A | T O N E | Q U I T E | P L E A S A N T | A S | T H O U G H | I | W E R E | A L L | R I G H T | +A G A I N | T H E R E | W A S | A | V E R Y | O L D | C A R R I A G E | W H O S E | W H E E L S | I N | S P I T E | O F | R U S T | A N D | D E C A Y | I | C O U L D | S E E | H A D | B E E N | D E S I G N E D | O R I G I N A L L Y | F O R | I R O N | R A I L S | +I | E X P R E S S E D | B Y | S I G N S | M Y | A D M I R A T I O N | A N D | P L E A S U R E | T O | M Y | G U I D E S | A N D | T H E Y | W E R E | G R E A T L Y | P L E A S E D | +H E | B E G A N | P R E S E N T L Y | T O | R E L E N T | A N D | S P O K E | T O | M E | I N | A | K I N D E R | M A N N E R | +I N | F A C T | O N E | O F | T H E M | W A S | P L A I N L Y | V E R Y | M U C H | O U T | O F | H E A L T H | A N D | C O U G H E D | V I O L E N T L Y | F R O M | T I M E | T O | T I M E | I N | S P I T E | O F | M A N I F E S T | E F F O R T S | T O | S U P P R E S S | I T | +S U F F I C E | I T | T H A T | I | F O U N D | M Y S E L F | T A K E N | B E F O R E | T H E | C H I E F | M A G I S T R A T E | A N D | B Y | H I S | O R D E R S | W A S | P L A C E D | I N | A N | A P A R T M E N T | W I T H | T W O | O T H E R | P E O P L E | W H O | W E R E | T H E | F I R S T | I | H A D | S E E N | L O O K I N G | A N Y T H I N G | B U T | W E L L | A N D | H A N D S O M E | +I T | I S | S U R P R I S I N G | H O W | S O O N | T H E | E Y E | B E C O M E S | A C C U S T O M E D | T O | M I S S I N G | T W E N T Y | S H E E P | O U T | O F | T W O | O R | T H R E E | H U N D R E D | +I | W A S | D E L I G H T E D | W I T H | T H E | C O U 
N T R Y | A N D | T H E | M A N N E R | O F | L I F E | +I | R E A C H E D | M Y | D E S T I N A T I O N | I N | O N E | O F | T H E | L A S T | M O N T H S | O F | E I G H T E E N | S I X T Y | E I G H T | B U T | I | D A R E | N O T | M E N T I O N | T H E | S E A S O N | L E S T | T H E | R E A D E R | S H O U L D | G A T H E R | I N | W H I C H | H E M I S P H E R E | I | W A S | +E A C H | M U S T | C R Y | L O U D E R | A N D | W A N D E R | F A R T H E R | Y E T | M A Y | L U C K | B E | W I T H | T H E M | B O T H | T H A T | T H E Y | M A Y | F I N D | T H E I R | O W N | A T | N I G H T F A L L | +N O | O N E | W H O | I S | H I M S E L F | H O N E S T | W I L L | D O U B T | M Y | B E I N G | S O | +I | H A D | N O | M O N E Y | B U T | I F | I | C O U L D | O N L Y | F I N D | W O R K A B L E | C O U N T R Y | I | M I G H T | S T O C K | I T | W I T H | B O R R O W E D | C A P I T A L | A N D | C O N S I D E R | M Y S E L F | A | M A D E | M A N | +I F | T H E | R E A D E R | W I L L | E X C U S E | M E | I | W I L L | S A Y | N O T H I N G | O F | M Y | A N T E C E D E N T S | N O R | O F | T H E | C I R C U M S T A N C E S | W H I C H | L E D | M E | T O | L E A V E | M Y | N A T I V E | C O U N T R Y | T H E | N A R R A T I V E | W O U L D | B E | T E D I O U S | T O | H I M | A N D | P A I N F U L | T O | M Y S E L F | +I | W A S | T O | S E E | T H E | S H E E P | N O T | N E C E S S A R I L Y | C L O S E | A T | H A N D | N O R | T O | G E T | T H E M | I N | A | S I N G L E | M O B | B U T | T O | S E E | E N O U G H | O F | T H E M | H E R E | A N D | T H E R E | T O | F E E L | E A S Y | T H A T | N O T H I N G | H A D | G O N E | W R O N G | T H I S | W A S | N O | D I F F I C U L T | M A T T E R | F O R | T H E R E | W E R E | N O T | A B O V E | E I G H T | H U N D R E D | O F | T H E M | A N D | B E I N G | A L L | B R E E D I N G | E W E S | T H E Y | W E R E | P R E T T Y | Q U I E T | +I | W O U L D | T R Y | T H E | N E A R E R | R A N G E | A N D | S E E | H O W | F A R | I | C O U L D | G O | +S H E E P | A N D | C A T T L E | W E R E | I N T R O D U C E D | A N D | B R E D | W I T H | E X T R E M E | R A P I D I T Y | M E N | T O O K | U P | T H E I R | F I F T Y | T H O U S A N D | O R | O N E | H U N D R E D | T H O U S A N D | A C R E S | O F | C O U N T R Y | G O I N G | I N L A N D | O N E | B E H I N D | T H E | O T H E R | T I L L | I N | A | F E W | Y E A R S | T H E R E | W A S | N O T | A N | A C R E | B E T W E E N | T H E | S E A | A N D | T H E | F R O N T | R A N G E S | W H I C H | W A S | N O T | T A K E N | U P | A N D | S T A T I O N S | E I T H E R | F O R | S H E E P | O R | C A T T L E | W E R E | S P O T T E D | A B O U T | A T | I N T E R V A L S | O F | S O M E | T W E N T Y | O R | T H I R T Y | M I L E S | O V E R | T H E | W H O L E | C O U N T R Y | +T H E R E | W E R E | A | G O O D | M A N Y | S H E E P | W H I C H | I | K N E W | A S | T W O | O R | T H R E E | B L A C K | E W E S | A N D | A | B L A C K | L A M B | O R | T W O | A N D | S E V E R A L | O T H E R S | W H I C H | H A D | S O M E | D I S T I N G U I S H I N G | M A R K | W H E R E B Y | I | C O U L D | T E L L | T H E M | +I T | W A S | A | M O N O T O N O U S | L I F E | B U T | I T | W A S | V E R Y | H E A L T H Y | A N D | O N E | D O E S | N O T | M U C H | M I N D | A N Y T H I N G | W H E N | O N E | I S | W E L L | +I T | W I L L | B E | S E E N | T H A T | I | D I D | N O T | S U C C E E D | I N | M Y | D E S I G N | A N D | T H A T | H O W E V E R | M U 
C H | I | M A Y | H A V E | M E T | W I T H | T H A T | W A S | N E W | A N D | S T R A N G E | I | H A V E | B E E N | U N A B L E | T O | R E A P | A N Y | P E C U N I A R Y | A D V A N T A G E | +T H E | C O U N T R Y | W A S | T H E | G R A N D E S T | T H A T | C A N | B E | I M A G I N E D | +T H E R E | W A S | N O | O N E | I N | T H E | W H O L E | W O R L D | W H O | H A D | T H E | S M A L L E S T | I D E A | S A V E | T H O S E | W H O | W E R E | T H E M S E L V E S | O N | T H E | O T H E R | S I D E | O F | I T | I F | I N D E E D | T H E R E | W A S | A N Y | O N E | A T | A L L | C O U L D | I | H O P E | T O | C R O S S | I T | +S O | L O N E L Y | A N D | S O | S O L E M N | W I T H | T H E | S A D | G R E Y | C L O U D S | A B O V E | A N D | N O | S O U N D | S A V E | A | L O S T | L A M B | B L E A T I N G | U P O N | T H E | M O U N T A I N | S I D E | A S | T H O U G H | I T S | L I T T L E | H E A R T | W E R E | B R E A K I N G | +P R E F A C E | T O | S E C O N D | E D I T I O N | +I | M U S T | N O T | C O N C L U D E | W I T H O U T | E X P R E S S I N G | M Y | M O S T | S I N C E R E | T H A N K S | T O | M Y | C R I T I C S | A N D | T O | T H E | P U B L I C | F O R | T H E | L E N I E N C Y | A N D | C O N S I D E R A T I O N | W I T H | W H I C H | T H E Y | H A V E | T R E A T E D | M Y | A D V E N T U R E S | +T H I S | I S | A | M I S T A K E | T H O U G H | A | P E R F E C T L Y | N A T U R A L | O N E | +I T | W A S | W R I T T E N | I N | T H E | U P P E R | R A N G I T A T A | D I S T R I C T | O F | T H E | C A N T E R B U R Y | P R O V I N C E | A S | I T | T H E N | W A S | O F | N E W | Z E A L A N D | A N D | A P P E A R E D | A T | C H R I S T C H U R C H | I N | T H E | P R E S S | N E W S P A P E R | J U N E | T H I R T E E N T H | E I G H T E E N | S I X T Y | T H R E E | +I | A T T R I B U T E | I T S | U N L O O K E D | F O R | S U C C E S S | M A I N L Y | T O | T W O | E A R L Y | F A V O U R A B L E | R E V I E W S | T H E | F I R S T | I N | T H E | P A L L | M A L L | G A Z E T T E | O F | A P R I L | T W E L F T H | A N D | T H E | S E C O N D | I N | T H E | S P E C T A T O R | O F | A P R I L | T W E N T I E T H | +O N | M Y | R E T U R N | I | P U R P O S E L Y | A V O I D E D | L O O K I N G | I N T O | I T | U N T I L | I | H A D | S E N T | B A C K | M Y | L A S T | R E V I S E S | T O | T H E | P R I N T E R | +T H E | F I R S T | E D I T I O N | O F | E R E W H O N | S O L D | I N | A B O U T | T H R E E | W E E K S | I | H A D | N O T | T A K E N | M O U L D S | A N D | A S | T H E | D E M A N D | W A S | S T R O N G | I T | W A S | S E T | U P | A G A I N | I M M E D I A T E L Y | +B U T | T H I S | H A D | A N | E F F E C T | O F | W H I C H | I | H A V E | L I T T L E | R E A S O N | T O | C O M P L A I N | F O R | I | W A S | A L L O W E D | A L M O S T | T O | C A L L | T H E M | L I F E | L O N G | S E L F | D E C E I V E R S | T O | T H E I R | F A C E S | A N D | T H E Y | S A I D | I T | W A S | Q U I T E | T R U E | B U T | T H A T | I T | D I D | N O T | M A T T E R | +T H I S | H O W E V E R | M A Y | N O T | B E | F O R | T H E | C O P Y R I G H T | W I L L | P R O B A B L Y | E X P I R E | I N | A | L I T T L E | O V E R | T W E L V E | Y E A R S | +I | A M | S U R P R I S E D | H O W E V E R | T H A T | T H E | B O O K | A T | W H I C H | S U C H | A N | E X A M P L E | O F | T H E | S P E C I O U S | M I S U S E | O F | A N A L O G Y | W O U L D | S E E M | M O S T | N A T U R A L L Y | L E V E L L E D | S 
H O U L D | H A V E | O C C U R R E D | T O | N O | R E V I E W E R | N E I T H E R | S H A L L | I | M E N T I O N | T H E | N A M E | O F | T H E | B O O K | H E R E | T H O U G H | I | S H O U L D | F A N C Y | T H A T | T H E | H I N T | G I V E N | W I L L | S U F F I C E | +I | A L S O | W R O T E | A B O U T | T H I S | T I M E | T H E | S U B S T A N C E | O F | W H A T | U L T I M A T E L Y | B E C A M E | T H E | M U S I C A L | B A N K S | A N D | T H E | T R I A L | O F | A | M A N | F O R | B E I N G | I N | A | C O N S U M P T I O N | +I | R E G R E T | T H A T | R E V I E W E R S | H A V E | I N | S O M E | C A S E S | B E E N | I N C L I N E D | T O | T R E A T | T H E | C H A P T E R S | O N | M A C H I N E S | A S | A N | A T T E M P T | T O | R E D U C E | M I S T E R | D A R W I N ' S | T H E O R Y | T O | A N | A B S U R D I T Y | +T H E R E | W A S | A L S O | A N O T H E R | C A U S E | +I | M A D E | A | F E W | F U R T H E R | V E R Y | T R I F L I N G | A L T E R A T I O N S | B E F O R E | M O U L D S | W E R E | T A K E N | B U T | S I N C E | T H E | S U M M E R | O F | E I G H T E E N | S E V E N T Y | T W O | A S | N E W | E D I T I O N S | W E R E | F R O M | T I M E | T O | T I M E | W A N T E D | T H E Y | H A V E | B E E N | P R I N T E D | F R O M | S T E R E O S | T H E N | M A D E | +I | S E E | F R O M | M Y | S E C O N D | P R E F A C E | T H A T | I | T O O K | T H E | B O O K | T O | M E S S R S | C H A P M A N | A N D | H A L L | M A Y | F I R S T | E I G H T E E N | S E V E N T Y | O N E | A N D | O N | T H E I R | R E J E C T I O N | O F | I T | U N D E R | T H E | A D V I C E | O F | O N E | W H O | H A S | A T T A I N E D | T H E | H I G H E S T | R A N K | A M O N G | L I V I N G | W R I T E R S | I | L E T | I T | S L E E P | T I L L | I | T O O K | I T | T O | M I S T E R | T R U B N E R | E A R L Y | I N | E I G H T E E N | S E V E N T Y | T W O | +I | A M | S T I L L | F A I R L Y | W E L L | S A T I S F I E D | W I T H | T H O S E | P A R T S | O F | E R E W H O N | T H A T | W E R E | R E P E A T E D L Y | R E W R I T T E N | B U T | F R O M | T H O S E | T H A T | H A D | O N L Y | A | S I N G L E | W R I T I N G | I | W O U L D | G L A D L Y | C U T | O U T | S O M E | F O R T Y | O R | F I F T Y | P A G E S | I F | I | C O U L D | +T H E N | I | H A D | M U C H | P L E A S U R E | I N | R E A D I N G | I T | B U T | W A S | I N D E E D | S U R P R I S E D | A T | T H E | M A N Y | L I T T L E | P O I N T S | O F | S I M I L A R I T Y | B E T W E E N | T H E | T W O | B O O K S | I N | S P I T E | O F | T H E I R | E N T I R E | I N D E P E N D E N C E | T O | O N E | A N O T H E R | +T H I S | L A D Y ' S | R I G H T | N A M E | W A S | J O A N | B U T | B E C A U S E | O F | H E R | C O M E L I N E S S | O R | A T | L E A S T | I T | W A S | S O | I M A G I N E D | S H E | W A S | C A L L E D | O F | M A N Y | P R I M A V E R A | S P R I N G | A N D | W E N T | B Y | T H A T | N A M E | A M O N G | T H E M | +A N D | T H E R E W I T H A L | S U C H | A | B E W I L D E R M E N T | P O S S E S S ' D | M E | T H A T | I | S H U T | M I N E | E Y E S | F O R | P E A C E | A N D | I N | M Y | B R A I N | D I D | C E A S E | O R D E R | O F | T H O U G H T | A N D | E V E R Y | H E A L T H F U L | T H I N G | +T H E N | S A W | I | M A N Y | B R O K E N | H I N T E D | S I G H T S | I N | T H E | U N C E R T A I N | S T A T E | I | S T E P P ' D | I N T O | +T H E | S E C O N D | P A R T | B E G I N S | H E R E | S A Y I N G | B E | N O W 
| T H E | T H I R D | H E R E | T H E N | W H I L E | I T | W A S | H I S | P L E A S U R E | +T H E S E | W I L D E R I N G | P H A N T A S I E S | T H E N | C A R R I E D | M E | T O | S E E | M Y | L A D Y | D E A D | +T H E | S E C O N D | P A R T | B E G I N S | H E R E | I | W A S | A | T H I N K I N G | T H E | F I R S T | P A R T | D I V I D E S | I N T O | T W O | +W H E R E B Y | O T H E R | L A D I E S | W H O | W E R E | A B O U T | T H E | R O O M | B E C O M I N G | A W A R E | O F | M Y | D I S C O M F O R T | B Y | R E A S O N | O F | T H E | M O A N | T H A T | S H E | M A D E | W H O | I N D E E D | W A S | O F | M Y | V E R Y | N E A R | K I N D R E D | L E D | H E R | A W A Y | F R O M | W H E R E | I | W A S | A N D | T H E N | S E T | T H E M S E L V E S | T O | A W A K E N | M E | T H I N K I N G | T H A T | I | D R E A M E D | A N D | S A Y I N G | S L E E P | N O | L O N G E R | A N D | B E | N O T | D I S Q U I E T E D | +A N D | S O | S T R O N G | W A S | M Y | P H A N T A S Y | T H A T | I | W E P T | A G A I N | I N | V E R Y | T R U T H | A N D | S A I D | W I T H | M Y | T R U E | V O I C E | O | E X C E L L E N T | S O U L | H O W | B L E S S E D | I S | H E | T H A T | N O W | L O O K E T H | U P O N | T H E E | +A N D | I N | H I S | S P E E C H | H E | L A U G H ' D | A N D | L A U G H ' D | A G A I N | T H E N | W H I L E | I T | W A S | H I S | P L E A S U R E | T O | R E M A I N | I | C H A N C E D | T O | L O O K | T H E | W A Y | H E | H A D | D R A W N | N E A R | A N D | S A W | T H E | L A D I E S | J O A N | A N D | B E A T R I C E | A P P R O A C H | M E | T H I S | T H E | O T H E R | F O L L O W I N G | O N E | A N D | A | S E C O N D | M A R V E L | I N S T A N T L Y | +W H E N | B E I N G | A R O U S E D | I | O P E N E D | M I N E | E Y E S | A N D | K N E W | T H A T | I T | H A D | B E E N | A | D E C E P T I O N | +A N D | M Y | H U E | W A S | S U C H | T H A T | T H E Y | L O O K ' D | A T | E A C H | O T H E R | A N D | T H O U G H T | O F | D E A T H | S A Y I N G | U N D E R | T H E I R | B R E A T H | M O S T | T E N D E R L Y | O | L E T | U S | C O M F O R T | H I M | T H E N | U N T O | M E | W H A T | D R E A M | W A S | T H I N E | T H A T | I T | H A T H | S H A K E N | T H E E | S O | M U C H | +A N D | A T | T H E | F I R S T | I T | S E E M E D | T O | M E | T H A T | I | S A W | C E R T A I N | F A C E S | O F | W O M E N | W I T H | T H E I R | H A I R | L O O S E N E D | W H I C H | C A L L E D | O U T | T O | M E | T H O U | S H A L T | S U R E L Y | D I E | A F T E R | T H E | W H I C H | O T H E R | T E R R I B L E | A N D | U N K N O W N | A P P E A R A N C E S | S A I D | U N T O | M E | T H O U | A R T | D E A D | +T H I S | H A S | I N D U C E D | S O M E | E D I T O R S | O F | T H E | V I T A | N U O V A | T O | E X P L A I N | T H E | T I T L E | A S | M E A N I N G | E A R L Y | L I F E | +T H R O U G H O U T | T H E | V I T A | N U O V A | T H E R E | I S | A | S T R A I N | L I K E | T H E | F I R S T | F A L L I N G | M U R M U R | W H I C H | R E A C H E S | T H E | E A R | I N | S O M E | R E M O T E | M E A D O W | A N D | P R E P A R E S | U S | T O | L O O K | U P O N | T H E | S E A | +O N | J A N U A R Y | T W E N T Y | F I F T H | H E | W R O T E | M A N Y | A N D | M A N Y | T H A N K S | F O R | A | M O S T | E S S E N T I A L | S E R V I C E | M O S T | T H O R O U G H L Y | P E R F O R M E D | +T H I S | B O O K | I N | I T S | O R I G I N A L | F O R M | W A S | R E C E I V E D | W I T H | F 
A V O U R | A N D | S E T T L E D | T H E | C L A I M | O F | R O S S E T T I | T O | R A N K | A S | A | P O E T I C | T R A N S L A T O R | O R | I N D E E D | A S | A | P O E T | I N | H I S | O W N | R I G H T | +P O E T R Y | N O T | B E I N G | A N | E X A C T | S C I E N C E | L I T E R A L I T Y | O F | R E N D E R I N G | I S | A L T O G E T H E R | S E C O N D A R Y | T O | T H I S | C H I E F | L A W | +T H E | O N L Y | T R U E | M O T I V E | F O R | P U T T I N G | P O E T R Y | I N T O | A | F R E S H | L A N G U A G E | M U S T | B E | T O | E N D O W | A | F R E S H | N A T I O N | A S | F A R | A S | P O S S I B L E | W I T H | O N E | M O R E | P O S S E S S I O N | O F | B E A U T Y | +I T | I S | T H E R E F O R E | A N D | O N | A L L | A C C O U N T S | U N N E C E S S A R Y | T O | S A Y | M U C H | M O R E | O F | T H E | W O R K | H E R E | T H A N | I T | S A Y S | F O R | I T S E L F | +H E | T R A N S L A T E D | A T | A N | E A R L Y | A G E | C H I E F L Y | B E T W E E N | E I G H T E E N | F O R T Y | F I V E | A N D | E I G H T E E N | F O R T Y | N I N E | A | G R E A T | N U M B E R | O F | P O E M S | B Y | T H E | I T A L I A N S | C O N T E M P O R A R Y | W I T H | D A N T E | O R | P R E C E D I N G | H I M | A N D | A M O N G | O T H E R | T H I N G S | H E | M A D E | A | V E R S I O N | O F | T H E | W H O L E | V I T A | N U O V A | P R O S E | A N D | V E R S E | +M Y | N O T E S | W H I C H | Y O U | H A V E | T A K E N | T H E | T R O U B L E | O F | R E V I S I N G | A R E | O F | C O U R S E | Q U I T E | P A L T R Y | A N D | U S E L E S S | +A N D | I F | Y O U | H A V E | T I M E | I T | W O U L D | B E | A | G R E A T | S E R V I C E | T O | T R A N S L A T E | T H E | A N A L Y S E S | O F | T H E | P O E M S | W H I C H | I | O M I T T E D | +T H E | L I F E | B L O O D | O F | R H Y T H M I C A L | T R A N S L A T I O N | I S | T H I S | C O M M A N D M E N T | T H A T | A | G O O D | P O E M | S H A L L | N O T | B E | T U R N E D | I N T O | A | B A D | O N E | +O F T E N | W O U L D | H E | A V A I L | H I M S E L F | O F | A N Y | S P E C I A L | G R A C E | O F | H I S | O W N | I D I O M | A N D | E P O C H | I F | O N L Y | H I S | W I L L | B E L O N G E D | T O | H I M | O F T E N | W O U L D | S O M E | C A D E N C E | S E R V E | H I M | B U T | F O R | H I S | A U T H O R ' S | S T R U C T U R E | S O M E | S T R U C T U R E | B U T | F O R | H I S | A U T H O R ' S | C A D E N C E | O F T E N | T H E | B E A U T I F U L | T U R N | O F | A | S T A N Z A | M U S T | B E | W E A K E N E D | T O | A D O P T | S O M E | R H Y M E | W H I C H | W I L L | T A L L Y | A N D | H E | S E E S | T H E | P O E T | R E V E L L I N G | I N | A B U N D A N C E | O F | L A N G U A G E | W H E R E | H I M S E L F | I S | S C A N T I L Y | S U P P L I E D | +A | W O R D | S H O U L D | B E | S A I D | H E R E | O F | T H E | T I T L E | O F | D A N T E ' S | A U T O B I O G R A P H Y | +W H A T E V E R | H E R | S W E E T | E Y E S | A R E | T U R N E D | U P O N | S P I R I T S | O F | L O V E | D O | I S S U E | T H E N C E | I N | F L A M E | W H I C H | T H R O U G H | T H E I R | E Y E S | W H O | T H E N | M A Y | L O O K | O N | T H E M | P I E R C E | T O | T H E | H E A R T ' S | D E E P | C H A M B E R | E V E R Y | O N E | +S O | T O | T H E | R O A D | T H O U | S H A L T | B E | R E C O N C I L E D | A N D | F I N D | T H E | L A D Y | A N D | W I T H | T H E | L A D Y | L O V E | +T H E | F I R S T | P A R T | I S | D I V I 
D E D | I N T O | F O U R | +A N D | N O W | T H A T | I T | H A T H | P L E A S E D | H E R | T O | D E N Y | M E | T H I S | L O V E | M Y | M A S T E R | O F | H I S | G R E A T | G O O D N E S S | H A T H | P L A C E D | A L L | M Y | B E A T I T U D E | T H E R E | W H E R E | M Y | H O P E | W I L L | N O T | F A I L | M E | +I N | T H E | F O U R T H | R E P E A T I N G | T O | W H O M | I | P U R P O S E | S P E A K I N G | I | T E L L | T H E | R E A S O N | W H Y | I | S P E A K | T O | T H E M | +T H I S | S E C O N D | P A R T | I S | D I V I D E D | I N T O | T W O | F O R | I N | T H E | O N E | I | S P E A K | O F | T H E | E Y E S | W H I C H | A R E | T H E | B E G I N N I N G | O F | L O V E | I N | T H E | S E C O N D | I | S P E A K | O F | T H E | M O U T H | W H I C H | I S | T H E | E N D | O F | L O V E | +A N D | I | D E C L A R E | T H A T | W H E N | I | S P E A K | T H E R E O F | L O V E | S H E D S | S U C H | P E R F E C T | S W E E T N E S S | O V E R | M E | T H A T | I F | M Y | C O U R A G E | F A I L E D | N O T | C E R T A I N L Y | T O | H I M | M Y | L I S T E N E R S | M U S T | B E | A L L | R E S I G N ' D | +T O | H E R | I | W E N D | A L O N G | I N | W H O S E | M U C H | S T R E N G T H | M Y | W E A K N E S S | I S | M A D E | S T R O N G | +I N | T H E | S E C O N D | I | T E L L | W H A T | I S | U N D E R S T O O D | O F | H E R | O N | E A R T H | H E R E | M Y | L A D Y | I S | D E S I R E D | +T H E R E A F T E R | T H I S | S O N N E T | B R E D | I N | M E | D E S I R E | T O | W R I T E | D O W N | I N | V E R S E | F O U R | O T H E R | T H I N G S | T O U C H I N G | M Y | C O N D I T I O N | T H E | W H I C H | T H I N G S | I T | S E E M E D | T O | M E | T H A T | I | H A D | N O T | Y E T | M A D E | M A N I F E S T | +T H E | S E C O N D | B E G I N S | H E R E | A N | A N G E L | T H E | T H I R D | H E R E | D E A R | S O N G | I | K N O W | +B U T | W H E N | I | S T I L L | S P A K E | N O T | O N E | O F | T H E M | W H O | B E F O R E | H A D | B E E N | T A L K I N G | W I T H | A N O T H E R | A D D R E S S E D | M E | B Y | M Y | N A M E | S A Y I N G | T O | W H A T | E N D | L O V E S T | T H O U | T H I S | L A D Y | S E E I N G | T H A T | T H O U | C A N S T | N O T | S U P P O R T | H E R | P R E S E N C E | +T H E N | T H O S E | L A D I E S | B E G A N | T O | T A L K | C L O S E L Y | T O G E T H E R | A N D | A S | I | H A V E | S E E N | S N O W | F A L L | A M O N G | T H E | R A I N | S O | W A S | T H E I R | T A L K | M I N G L E D | W I T H | S I G H S | +W H I C H | T H I N G | B E I N G | T H U S | T H E R E | C A M E | A | D A Y | W H E N | C E R T A I N | L A D I E S | T O | W H O M | I T | W A S | W E L L | K N O W N | T H E Y | H A V I N G | B E E N | W I T H | M E | A T | D I V E R S | T I M E S | I N | M Y | T R O U B L E | W E R E | M E T | T O G E T H E R | F O R | T H E | P L E A S U R E | O F | G E N T L E | C O M P A N Y | +I N | T H E | T H I R D | I | S A Y | W H A T | I T | I S | I | P U R P O S E | T O | S P E A K | S O | A S | N O T | T O | B E | I M P E D E D | B Y | F A I N T H E A R T E D N E S S | +T H I S | S E C O N D | P A R T | I S | D I V I D E D | I N T O | T W O | F O R | I N | T H E | F I R S T | I | S P E A K | O F | H E R | A S | R E G A R D S | T H E | N O B L E N E S S | O F | H E R | S O U L | R E L A T I N G | S O M E | O F | H E R | V I R T U E S | P R O C E E D I N G | F R O M | H E R | S O U L | I N | T H E | S E C O N D | I | S P E A K | O F | H E R | A S | R E G 
A R D S | T H E | N O B L E N E S S | O F | H E R | B O D Y | N A R R A T I N G | S O M E | O F | H E R | B E A U T I E S | H E R E | L O V E | S A I T H | C O N C E R N I N G | H E R | +T H E | P E O P L E | I S | A | B E A S T | O F | M U D D Y | B R A I N | T H A T | K N O W S | N O T | I T S | O W N | F O R C E | A N D | T H E R E F O R E | S T A N D S | L O A D E D | W I T H | W O O D | A N D | S T O N E | T H E | P O W E R L E S S | H A N D S | O F | A | M E R E | C H I L D | G U I D E | I T | W I T H | B I T | A N D | R E I N | O N E | K I C K | W O U L D | B E | E N O U G H | T O | B R E A K | T H E | C H A I N | B U T | T H E | B E A S T | F E A R S | A N D | W H A T | T H E | C H I L D | D E M A N D S | I T | D O E S | N O R | I T S | O W N | T E R R O R | U N D E R S T A N D S | C O N F U S E D | A N D | S T U P E F I E D | B Y | B U G B E A R S | V A I N | +M O S T | W O N D E R F U L | +D U E | T O | T H E E | T H E I R | P R A I S E | O F | M A I D E N | P U R E | O F | T E E M I N G | M O T H E R H O O D | +I N | F L E S H | W A S | R A I M E N T E D | H O W | H E | W A S | K I L L E D | A N D | B U R I E D | F R O M | T H E | D E A D | H O W | H E | A R O S E | T O | L I F E | W I T H | V I C T O R Y | A N D | R E I G N E D | I N | H E A V E N | H O W | A L L | O F | U S | S H A L L | B E | G L O R I O U S | L I K E | H I M | W H O S E | H E A R T S | T O | H I S | A R E | W E D | H O W | T H E Y | W H O | D I E | F O R | L O V E | O F | R E A S O N | G I V E | H Y P O C R I T E S | T Y R A N T S | S O P H I S T S | A L L | W H O | S E L L | T H E I R | N E I G H B O U R S | I L L | F O R | H O L I N E S S | T O | H E L L | H O W | T H E | D E A D | S A I N T | C O N D E M N S | T H E | B A D | W H O | L I V E | H O W | A L L | H E | D O E S | B E C O M E S | A | L A W | F O R | M E N | H O W | H E | A T | L A S T | T O | J U D G E | S H A L L | C O M E | A G A I N | +Q U I N C I | I M P A R A | A | S T U P I R T I | +H E A V E N | H E L P | T H A T | B O D Y | W H I C H | A | L I T T L E | M I N D | H O U S E D | I N | A | H E A D | L A C K I N G | E A R S | T O N G U E | A N D | E Y E S | A N D | S E N S E L E S S | B U T | F O R | S M E L L | C A N | T Y R A N N I S E | +W E L L | T O O | I F | H E | L I K E | L O V E | W O U L D | F I L C H | O U R | H O A R D | W I T H | P L E A S U R E | T O | O U R S E L V E S | S L U I C I N G | O U R | V E I N | A N D | V I G O U R | T O | P E R P E T U A T E | T H E | S T R A I N | O F | L I F E | B Y | S P I L T H | O F | L I F E | W I T H I N | U S | S T O R E D | +H E | L I V E S | T H Y | L O S S | H E | D I E S | F R O M | E V E R Y | L I M B | M A N G L E D | B Y | T H E E | L I G H T N I N G S | O F | G O D H E A D | S H I N E | F R O M | W H I C H | T H Y | D A R K N E S S | H A T H | N O T | W H E R E | T O | H I D E | +T H O U | L I K E | A R C T U R U S | S T E A D F A S T | I N | T H E | S K I E S | W I T H | T A R D Y | S E N S E | G U I D E S T | T H Y | K I N G D O M | F A I R | B E A R I N G | A L O N E | T H E | L O A D | O F | L I B E R T Y | +T H A T | P E N A N C E | H A T H | N O | B L A M E | W H I C H | M A G D A L E N | F O U N D | S W E E T | P U R G I N G | O U R | S H A M E | S E L F | P U N I S H M E N T | I S | V I R T U E | A L L | M E N | K N O W | +I L | P O P O L O | E | U N A | B E S T I A | +O R G A N | O F | R U T | N O T | R E A S O N | I S | T H E | L O R D | W H O | F R O M | T H E | B O D Y | P O L I T I C | D O T H | D R A I N | L U S T | F O R | H I M S E L F | I N S T E A D | O F | T O I 
L | A N D | P A I N | L E A V I N G | U S | L E A N | A S | C R I C K E T S | O N | D R Y | S W A R D | +M O N E Y | I S | F A L S E | A N D | L I G H T | U N L E S S | I T | B E | B O U G H T | B Y | A | M A N ' S | O W N | W O R T H Y | Q U A L I T I E S | A N D | B L O O D | I S | S U C H | T H A T | I T S | C O R R U P T | D I S E A S E | A N D | I G N O R A N T | P R E T E N C E | A R E | F O U L | T O | S E E | +T H I S | W O R L D ' S | T H I C K | V A P O U R S | W H E L M | Y O U R | E Y E S | U N W O R T H Y | O F | T H A T | G L O R I O U S | S H O W | B L I N D | T O | H I S | S P L E N D O U R | B E N T | U P O N | H I S | S H A M E | +A R E | Y O U | R E A L L Y | G O I N G | T O | T H R O W | M E | O V E R | F O R | A | T H I N G | L I K E | T H I S | +I T | W A S | B I T T E R L Y | C O L D | B U T | T H E | E M B A N K M E N T | W A S | M O R E | R O M A N T I C | T H A N | A | R A I L W A Y | C A R R I A G E | +H E | H A D | B E E N | L A T E | H E | H A D | O F F E R E D | N O | E X C U S E | N O | E X P L A N A T I O N | +S H E | W O U L D | H A V E | S H A R E D | H I S | S O R R O W | A N D | S H O W N | H E R S E L F | H A L F | W I F E | H A L F | A N G E L | F R O M | H E A V E N | I N | T H I S | D A R K | H O U R | +H E R | H A N D S | S H O U L D | H A V E | B E E N | F U L L | O F | B L U E B E L L S | A N D | S H E | S H O U L D | H A V E | H E L D | T H E M | U P | T O | H I S | F A C E | I N | M A I D E N L Y | D E F E N C E | A S | H E | S P R A N G | F O R W A R D | T O | T A K E | H E R | I N | H I S | A R M S | +A N D | Y E S T E R D A Y | I | H A D | A | L E T T E R | F R O M | H E R | A N D | S H E | S E E M S | T O | E X P E C T | T O | T H I N K | A N D | I | T H O U G H T | I | O U G H T | T O | T E L L | Y O U | D A R L I N G | +C O U L D N ' T | H E L P | I T | T H E N | H O W | C A N | I | E V E R | T R U S T | Y O U | +H E | C H E C K E D | T H E | S I L L Y | I M P U L S E | +I | D I D N ' T | T H I N K | A | D E C E N T | M A N | C O U L D | D O | S U C H | T H I N G S | S H E | W A S | P U L L I N G | O N | H E R | G L O V E S | G O | H O M E | A N D | G L O A T | O V E R | I T | A L L | +A N D | C U R I O U S L Y | E N O U G H | S H E | W A S | H A R D L Y | C U R I O U S | A T | A L L | A B O U T | W H A T | H E | M I G H T | H A V E | T O | S A Y | +A N D | H E | S T R O D E | D O W N | B E T W E E N | T H E | M A R B L E | T A B L E S | A N D | O U T | B Y | T H E | S W I N G | D O O R | I T | W A S | A | V E R Y | G O O D | E X I T | +Y O U | S E E | T H A T | S H E | K N E W | E X A C T L Y | H O W | A | T R Y S T | I S | C O N D U C T E D | I N | T H E | P A G E S | O F | T H E | S T A N D A R D | P O E T S | A N D | O F | T H E | C H E A P E R | W E E K L Y | J O U R N A L S | +T H E | K E E N | W I N D | T H R U S T | I T S E L F | E V E N | I N S I D E | T H E | H I G H | C O L L A R | O F | H E R | J A C K E T | +W H A T | O P I N I O N | W O U L D | H E | F O R M | O F | T H E | P U R I T Y | O F | H E R | M I N D | T H E | I N N O C E N C E | O F | H E R | S O U L | I F | A N | I N C I D E N T | L I K E | T H I S | F A I L E D | T O | S H O C K | H E R | D E E P L Y | +T H E | S E T T I N G | O F | T H E | S C E N E | S E E M E D | T O | H E R | A L L | I M P O R T A N T | +H E | S T O O D | U P | S U D D E N L Y | D O | Y O U | M E A N | I T | +S H E | O N L Y | W I S H E D | F O R | M A Y | A N D | T H E | O R C H A R D | I N S T E A D | O F | J A N U A R Y | A N D | T H E | D I N G Y | D U S T Y | W A I T I N G | R O O M | T H 
E | P L A I N | F A C E D | P R E O C C U P I E D | T R A V E L L E R S | T H E | D I M | D E S O L A T E | W E A T H E R | +D O | Y O U | T H I N K | I ' M | N O T | S O R R Y | N O W | +S H E | H A D | T O | T H E | F U L L | L I M I T | A L L O W E D | O F | H E R | R E A D I N G | A N D | H E R | E N V I R O N M E N T | T H E | L I T E R A R Y | S E N S E | +N O | I T ' S | O N L Y | P A I N F U L | F O R | B O T H | O F | U S | +S O | H E | E N L I S T E D | A N D | W E N T | T O | S O U T H | A F R I C A | A N D | H E | N E V E R | C A M E | H O M E | C O V E R E D | W I T H | M E D A L S | A N D | G L O R Y | W H I C H | W A S | R A T H E R | H I S | I D E A | T O | T H E | F E W | S I M P L E | W O R D S | O F | E X P L A N A T I O N | T H A T | W O U L D | H A V E | M A D E | A L L | S T R A I G H T | A N D | R E P A I D | H E R | A N D | H I M | F O R | A L L | T H E | P A S T | +S H E | H E R S E L F | S H O U L D | H A V E | B E E N | A | P O E M | A | L Y R I C | I N | A | W H I T E | G O W N | A N D | G R E E N | S C A R F | C O M I N G | T O | H I M | T H R O U G H | T H E | L O N G | G R A S S | U N D E R | T H E | B L O S S O M E D | B O U G H S | +A | S H O C K | O F | U N B E L I E V A B L E | R E L I E F | T I N G L E D | T H R O U G H | H E R | S O | T H A T | W A S | A L L | W H A T | W A S | I T | C O M P A R E D | W I T H | H E R | F E A R S | +S H E | S A I D | H O W | F R I G H T F U L L Y | C O L D | I T | I S | +B U T | H E R E | T H E | O N L Y | T H I N G | T H A T | O C C U R R E D | T O | H E R | W A S | T O | S T O P | A N D | L O O K | I N | O N E | O F | T H E | S H O P S | T I L L | H E | S H O U L D | A S K | H E R | W H A T | S H E | W A S | L O O K I N G | A T | +H E R | H A N D S | A N D | F E E T | W E R E | A C H I N G | W I T H | C O L D | +A T | T H E | C O R N E R | H E | R E M E M B E R E D | T H A T | H E | H A D | G O N E | A W A Y | W I T H O U T | P A Y I N G | F O R | T H E | T E A | A N D | H I S | N A T U R A L | I M P U L S E | W A S | T O | G O | B A C K | A N D | R E M E D Y | T H A T | E R R O R | +T H O S E | F O U R | T R U E | W O R D S | W O U N D E D | H E R | M O R E | T H A N | A L L | T H E | R E S T | +F O L L O W I N G | T H E | T I N G L E | O F | R E L I E F | C A M E | A | S H A R P | S I C K E N I N G | P I N C H | O F | J E A L O U S Y | A N D | M O R T I F I C A T I O N | T H E S E | I N S P I R E D | H E R | +S H A L L | I | P O U R | O U T | M Y | S O U L | I N T O | T H E | E A R | O F | A | M I S T | A | F U M E | F R O M | M Y | O W N | B R A I N | +W I T H | T H A T | C A M E | A | P A N G | O F | I N T E N S E | P A I N | +B U T | L I V I N G | S O U L | T H E R E | C O U L D | B E | N O N E | T O | M E E T | +B U T | S H E | K N E W | N O B O D Y | A N D | W A N D E R E D | A L O N E | I N | T H E | G A R D E N | O P P R E S S E D | W I T H | S O M E T H I N G | S H E | D I D | N O T | U N D E R S T A N D | +T H E | O L D | T I M E | W A S | B U T | A | T H I C K E R | D R E A M | A N D | T H I S | I S | T R U E R | B E C A U S E | M O R E | S H A D O W Y | +S H E | H A D | L O S T | H I M | Y E A R S | A N D | Y E A R S | B E F O R E | A N D | N O W | S H E | S A W | H I M | H E | W A S | T H E R E | A N D | S H E | K N E W | H I M | +H E | C A M E | T O | H E R | S I D E | A N D | S H E | G A V E | H I M | N O | G R E E T I N G | +A T | T H E | E N D | O F | I T | S H E | W A S | I N | A | P L A C E | O F | T O M B S | +T H U S | W A S | S H E | B O R N E | A W A Y | C A P T I V E | O F | H E R | D E A D | N 
E I T H E R | W I L L I N G | N O R | U N W I L L I N G | O F | L I F E | A N D | D E A T H | E Q U A L L Y | C A R E L E S S | +T H I S | W A S | H E R | D R E A M | A S | N E A R L Y | A S | S H E | C O U L D | R E C A L L | I T | W H E N | S H E | C A M E | T O | H E R S E L F | A F T E R | W A K I N G | F R O M | I T | W I T H | A | C R Y | +S H E | W A S | L O S T | L O S T | U T T E R L Y | W I T H | A N | E T E R N A L | L O S S | +W H E N | S H E | O P E N E D | T H E | D O O R | O F | I T | T H E | B R I G H T | F I R E | W H I C H | B E E N I E | U N D E S I R E D | H A D | K I N D L E D | T H E R E | S T A R T L E D | H E R | T H E | R O O M | L O O K E D | U N N A T U R A L | U N C A N N Y | B E C A U S E | I T | W A S | C H E E R F U L | +A T | T H E | T I M E | M A R Y | H A D | N O T E D | N O T H I N G | O F | T H E S E | T H I N G S | N O W | S H E | S A W | T H E M | A L L | A S | F O R | T H E | F I R S T | T I M E | I N | M I N U T E | D E T A I L | W H I L E | S L O W L Y | S H E | W E N T | U P | T H E | S T A I R | A N D | T H R O U G H | T H E | N A R R O W E D | W A Y S | A N D | H E A R D | T H E | S A M E | W I N D | T H A T | R A V E D | A L I K E | A B O U T | T H E | N E W | G R A V E | A N D | T H E | O L D | H O U S E | I N T O | W H I C H | L A T T E R | F O R | A L L | T H E | B A L E S | B A N K E D | A G A I N S T | T H E | W A L L S | I T | F O U N D | M A N Y | A | C H I N K | O F | E N T R A N C E | +S H E | W A S | O N E | O F | A | L A R G E | C O M P A N Y | A T | A | H O U S E | W H E R E | S H E | H A D | N E V E R | B E E N | B E F O R E | A | B E A U T I F U L | H O U S E | W I T H | A | L A R G E | G A R D E N | B E H I N D | +S H E | S T O O D | F O R | A | M O M E N T | O N | T H E | H E A R T H | A N D | I N | S A D | D R E A M Y | M O O D | L I S T E N E D | T O | T H E | H O W L I N G | S W O O P S | O F | T H E | W I N D | M A K I N G | T H E | H O U S E | Q U I V E R | A N D | S H A K E | +S H E | K N E W | N O T H I N G | O F | T H E | P L A C E | H A D | N O W H E R E | T O | G O | N O W H E R E | S H E | W A N T E D | T O | G O | H A D | N O T | A | T H O U G H T | T O | T E L L | H E R | W H A T | Q U E S T I O N | T O | A S K | I F | S H E | M E T | A | L I V I N G | S O U L | +S H E | E N T E R E D | A N D | T H E | S E R V A N T S | S O F T | F O O T E D | A N D | S I L E N T | W E R E | B U S Y | C A R R Y I N G | A W A Y | T H E | V E S S E L S | O F | H O S P I T A L I T Y | A N D | R E S T O R I N G | O R D E R | A S | I F | A L R E A D Y | T H E Y | P R E P A R E D | F O R | A N O T H E R | C O M P A N Y | O N | T H E | M O R R O W | N O | O N E | H E E D E D | H E R | +W H E N | S H E | S A I D | G O O D | N I G H T | T O | B E E N I E | A N D | W E N T | T O | H E R | C H A M B E R | O V E R | T H A T | W H E R E | T H E | L O V E D | P A R E N T | A N D | F R I E N D | W O U L D | F A L L | A S L E E P | N O | M O R E | S H E | F E L T | A S | I F | S H E | W E N T | W A L K I N G | A L O N G | T O | H E R | T O M B | +I T | W A S | A | S U M M E R | N I G H T | A N D | T H E | G U E S T S | W E R E | W A N D E R I N G | I N | A N D | O U T | A T | W I L L | A N D | T H R O U G H | H O U S E | A N D | G A R D E N | A M I D | L O V E L Y | T H I N G S | O F | A L L | C O L O R S | A N D | O D O R S | +H E R | O N L Y | L I F E | W A S | T H A T | S H E | W A S | L O S T | +I | K N O W | I T | A N D | T H E R E | I S | N O | W A K I N G | +I T | W A S N ' T | I | W H O | S A I D | T H A T | S A I D | T H E | G I R L | S M I 
L I N G | B U T | T H A T ' S | S O | A N Y H O W | A N D | T H E N | S H E | S I G H E D | +I | S H A L L | L O C K | U P | A L L | T H E | D O O R S | A N D | W I N D O W S | I N | T H E | H O U S E | A N D | T H E N | I | S H A L L | G I V E | Y O U | M Y | L A T C H | K E Y | A N D | Y O U | C A N | L E T | Y O U R S E L F | I N | A N D | S T A Y | T H E | N I G H T | H E R E | T H E R E | I S | N O | O N E | I N | T H E | H O U S E | +T H E R E | I S | A | S E A T | I N | T H E | G A R D E N | A T | T H E | S I D E | O F | T H E | H O U S E | A G A I N | S H E | H E S I T A T E D | +I S | I T | O N L Y | T H A T | Y O U ' R E | P O O R | W H Y | T H A T ' S | N O T H I N G | I ' M | P O O R | T O O | S H E | L A U G H E D | +L E T | M E | T H I N K | H E | S A I D | O H | H O W | G L A D | I | A M | T H A T | Y O U | H A P P E N E D | T O | C O M E | T H I S | W A Y | +H E | H U R R I E D L Y | C U T | C A K E | A N D | P R E S S E D | I T | U P O N | H E R | +D O | D R I N K | T H I S | A N D | T H E N | T E L L | M E | P E R H A P S | I | C A N | H E L P | Y O U | +H E | T O L D | M E | T O | S T A Y | O N | A T | T H E | H O T E L | A N D | I | D I D | A N D | T H E N | O N E | N I G H T | W H E N | I | W A S | A T | T H E | T H E A T R E | M Y | M A I D | A | H O R R I D | F R E N C H | T H I N G | W E | G O T | I N | P A R I S | P A C K E D | U P | A L L | M Y | T R U N K S | A N D | T O O K | A L L | M Y | M O N E Y | A N D | P A I D | T H E | B I L L | A N D | W E N T | +A L L | T H E | S A M E | H E | A D D E D | I R R E L E V A N T L Y | Y O U | S H A L L | H A V E | T H E | L A T C H | K E Y | +I | W I L L | C A T C H | T H E | N I G H T | T R A I N | A N D | B R I N G | M Y | M O T H E R | U P | T O | M O R R O W | T H E N | W E | W I L L | S E E | W H A T | C A N | B E | D O N E | +T H E | L A D Y | A N D | T H E | G U I T A R | C E R T A I N L Y | P A S S E D | T H E | N I G H T | A T | H I L L | V I E W | V I L L A | B U T | W H E N | H I S | M O T H E R | V E R Y | A N G R Y | A N D | V E R Y | F R I G H T E N E D | C A M E | U P | W I T H | H I M | A T | A B O U T | N O O N | T H E | H O U S E | L O O K E D | J U S T | A S | U S U A L | A N D | N O | O N E | W A S | T H E R E | B U T | T H E | C H A R W O M A N | +T H E | Y O U N G | M A N | D R E W | A | D E E P | B R E A T H | O F | R E L I E F | A N D | L I G H T E D | T H E | W A X | C A N D L E S | I N | T H E | S O L I D | S I L V E R | C A N D L E S T I C K S | O N | H I S | W R I T I N G | T A B L E | F O R | N O W | T H E | L A T E | S U M M E R | D U S K | W A S | F A L L I N G | A N D | T H A T | O R G A N | P L E A S E | H E A V E N | M A D E | F U L L | T H E | M E A S U R E | O F | T H E | D A Y ' S | A P P O I N T E D | T O R T U R E | +I T | W A S | P L A I N | T H A T | H I S | C A S T A N E T | G I R L | H I S | M O T H E R | A N D | S I S T E R | T O O K | A | P L E A S U R E | I N | C R E D I T I N G | H E R | D A I L Y | W I T H | S O M E | F R E S H | A N D | U N P L E A S I N G | I N S T R U M E N T | C O U L D | H A V E | H A D | N E I T H E R | T A S T E | M O N E Y | N O R | H O N E S T Y | T O | S U C H | A | P O I N T | A S | T H I S | +T H E | L A S T | S T R A I N S | O F | T H E | I L L | T R E A T E D | I L L | F A T E D | I N T E R M E Z Z O | H A D | D I E D | A W A Y | A N D | A F T E R | T H E M | H A D | D I E D | A W A Y | A L S O | T H E | R U M B L I N G | O F | T H E | W H E E L S | O F | T H E | M U R D E R O U S | B A R R E L | O R G A N | T H A T | H A D | S O | G A I L Y | E 
X E C U T E D | T H A T | A L O N G | W I T H | T H E | N I N E | O T H E R | T U N E S | O F | I T S | R E P E R T O R Y | T O | T H E | A D M I R A T I O N | O F | T H E | H O U S E M A I D | A T | T H E | W I N D O W | O F | T H E | H O U S E | O P P O S I T E | A N D | T H E | C R O W I N G | D E L I G H T | O F | T H E | T W O | B A B I E S | N E X T | D O O R | +S H E | S A I D | A G A I N | Y O U | A R E | K I N D | +N E V E R | H A D | A N Y | A C T | S E E M E D | S O | I M P O S S I B L E | +L O O K | H E R E | H E | S A I D | T H I S | I S | A L L | N O N S E N S E | Y O U | K N O W | Y O U | A R E | T I R E D | O U T | A N D | T H E R E ' S | S O M E T H I N G | W R O N G | W H A T | I S | I T | +T H E | S I L V E R | I S | A L L | R I G H T | T H A N K | G O O D N E S S | S H E | S A I D | B U T | Y O U R | B A N J O | G I R L | H A S | T A K E N | A | P A I R | O F | Y O U R | S I S T E R ' S | S I L K | S T O C K I N G S | A N D | T H O S E | N E W | S H O E S | O F | H E R S | W I T H | T H E | S I L V E R | B U C K L E S | A N D | S H E ' S | L E F T | T H E S E | +W E L L | T H E N | I | W E N T | I N T O | L O D G I N G S | T H A T | W I C K E D | W O M A N | H A D | L E F T | M E | O N E | S T R E E T | S U I T | A N D | T O | D A Y | T H E Y | T U R N E D | M E | O U T | B E C A U S E | M Y | M O N E Y | W A S | A L L | G O N E | +Y O U | S E E | P A P A ' S | S O | V E R Y | R I C H | A N D | A T | H O M E | T H E Y | E X P E C T | M E | T O | T O | G E T | A C Q U A I N T E D | W I T H | D U K E S | A N D | T H I N G S | A N D | S H E | S T O P P E D | +H E | H A D | N O | T I M E | T O | T H I N K | B U T | H E | W A S | A W A R E | T H A T | T H I S | W A S | T H E | M O S T | E X C I T I N G | A D V E N T U R E | T H A T | H A D | E V E R | H A P P E N E D | T O | H I M | +T H E N | S H E | T U R N E D | T O W A R D S | T H E | Q U A R T E R | I N D I C A T E D | A N D | D I S A P P E A R E D | R O U N D | T H E | L A U R E L | B U S H E S | +T H E N | T H E R E | W A S | S I L E N C E | T H E N | A | S I G H | A N D | T H E | S O U N D | O F | L I G H T | M O V I N G | F E E T | O N | T H E | G R A V E L | +A N D | A G A I N | H E | L I S T E N E D | W I T H | A | Q U I E T | P L E A S U R E | +Y O U | A R E | K I N D | S H E | S A I D | F O R | T H E | T H I R D | T I M E | A N D | R E A C H E D | H E R | H A N D | O U T | T O | H I M | H E | D I D | N O T | K I S S | I T | T H E N | O N L Y | T O O K | I T | I N | H I S | A N D | F E L T | H O W | S M A L L | A N D | C O L D | I T | W A S | T H E N | I T | W A S | T A K E N | A W A Y | +H E R | L I T T L E | F O O T | T A P P E D | T H E | G R A V E L | I M P A T I E N T L Y | +M I C H A E L I S | T H E | T I C K E T | O F | L E A V E | A P O S T L E | W A S | S P E A K I N G | I N | A N | E V E N | V O I C E | A | V O I C E | T H A T | W H E E Z E D | A S | I F | D E A D E N E D | A N D | O P P R E S S E D | B Y | T H E | L A Y E R | O F | F A T | O N | H I S | C H E S T | +H I S | V I S I O N | O F | T R U T H | H A D | G R O W N | S O | I N T E N S E | T H A T | T H E | S O U N D | O F | A | S T R A N G E | V O I C E | F A I L E D | T O | R O U T | I T | T H I S | T I M E | +H E | S T O O D | S T I L L | I N | T H E | M I D D L E | O F | T H E | P A R L O U R | A N D | L O O K E D | I N T O | T H E | K I T C H E N | I N | S I L E N C E | +T H E | O T H E R | D A Y | S T E V I E | G O T | H O L D | O F | O N E | A N D | T H E R E | W A S | A | S T O R Y | I N | I T | O F | A | G E R M A N | S O L D I E R | O 
F F I C E R | T E A R I N G | H A L F | O F F | T H E | E A R | O F | A | R E C R U I T | A N D | N O T H I N G | W A S | D O N E | T O | H I M | F O R | I T | T H E | B R U T E | +M I S T E R | V E R L O C ' S | A N X I E T I E S | H A D | P R E V E N T E D | H I M | F R O M | A T T A C H I N G | A N Y | S E N S E | T O | W H A T | H I S | W I F E | W A S | S A Y I N G | +I | W I S H | H E | H A D | N E V E R | B E E N | T O | S C H O O L | M I S S U S | V E R L O C | B E G A N | A G A I N | B R U S Q U E L Y | +A | H A R S H | L A U G H | F R O M | C O M R A D E | O S S I P O N | C U T | T H E | T I R A D E | D E A D | S H O R T | I N | A | S U D D E N | F A L T E R I N G | O F | T H E | T O N G U E | A N D | A | B E W I L D E R E D | U N S T E A D I N E S S | O F | T H E | A P O S T L E ' S | M I L D L Y | E X A L T E D | E Y E S | +I | W O U L D N ' T | G I V E | A | H A L F P E N N Y | F O R | T H E | W H O L E | L O T | +V E R Y | C H A R A C T E R I S T I C | P E R F E C T L Y | T Y P I C A L | +H E | C L O S E D | T H E | D O O R | B E H I N D | T H E I R | B A C K S | W I T H | R E S T R A I N E D | V I O L E N C E | T U R N E D | T H E | K E Y | S H O T | T H E | B O L T | +A H | H E | D I D | N O T | D E P E N D | U P O N | E M O T I O N A L | E X C I T E M E N T | T O | K E E P | U P | H I S | B E L I E F | N O | D E C L A M A T I O N S | N O | A N G E R | N O | V I S I O N S | O F | B L O O D | R E D | F L A G S | W A V I N G | O R | M E T A P H O R I C A L | L U R I D | S U N S | O F | V E N G E A N C E | R I S I N G | A B O V E | T H E | H O R I Z O N | O F | A | D O O M E D | S O C I E T Y | N O T | H E | +A L L | I D E A L I S A T I O N | M A K E S | L I F E | P O O R E R | +S T E V I E | S W A L L O W E D | T H E | T E R R I F Y I N G | S T A T E M E N T | W I T H | A N | A U D I B L E | G U L P | A N D | A T | O N C E | A S | T H O U G H | I T | H A D | B E E N | S W I F T | P O I S O N | S A N K | L I M P L Y | I N | A | S I T T I N G | P O S T U R E | O N | T H E | S T E P S | O F | T H E | K I T C H E N | D O O R | +A L E X A N D E R | O S S I P O N | G O T | U P | T A L L | I N | H I S | T H R E A D B A R E | B L U E | S E R G E | S U I T | U N D E R | T H E | L O W | C E I L I N G | S H O O K | O F F | T H E | S T I F F N E S S | O F | L O N G | I M M O B I L I T Y | A N D | S T R O L L E D | A W A Y | I N T O | T H E | K I T C H E N | D O W N | T W O | S T E P S | T O | L O O K | O V E R | S T E V I E ' S | S H O U L D E R | +H E R | B A R E | F E E T | A S | I F | P O K E D | T H R O U G H | T H E | B O T T O M | O F | A N | U N A D O R N E D | S L E E V E D | C A L I C O | S A C K | B U T T O N E D | T I G H T L Y | A T | N E C K | A N D | W R I S T S | F E L T | O V E R | T H E | R U G | F O R | T H E | S L I P P E R S | W H I L E | S H E | L O O K E D | U P W A R D | I N T O | H E R | H U S B A N D ' S | F A C E | +A N D | E V E R | S I N C E | H E | H A D | N E V E R | M A N A G E D | T O | G E T | H I S | W E I G H T | D O W N | A S | M U C H | A S | A N | O U N C E | +H E | T O O K | T H E | C A S H | B O X | O U T | O F | T H E | D R A W E R | A N D | T U R N I N G | T O | L E A V E | T H E | S H O P | B E C A M E | A W A R E | T H A T | S T E V I E | W A S | S T I L L | D O W N S T A I R S | +D O N ' T | Y O U | T H I N K | T H A T | I F | I | H A D | N O T | B E E N | T H E | O P T I M I S T | I | A M | I | C O U L D | N O T | H A V E | F O U N D | I N | F I F T E E N | Y E A R S | S O M E | M E A N S | T O | C U T | M Y | T H R O A T | +T H E R E | W A S 
| A N | E X T R A O R D I N A R Y | F O R C E | O F | S U G G E S T I O N | I N | T H I S | P O S T U R I N G | +T H I S | S U R V E Y | W A S | U N F A V O U R A B L E | +T H E R E | W A S | N O | Y O U N G | M A N | O F | H I S | A G E | I N | L O N D O N | M O R E | W I L L I N G | A N D | D O C I L E | T H A N | S T E P H E N | S H E | A F F I R M E D | N O N E | M O R E | A F F E C T I O N A T E | A N D | R E A D Y | T O | P L E A S E | A N D | E V E N | U S E F U L | A S | L O N G | A S | P E O P L E | D I D | N O T | U P S E T | H I S | P O O R | H E A D | +T H A T | P O O R | B O Y | I S | I N | A | V E R Y | E X C I T E D | S T A T E | T O | N I G H T | S H E | M U R M U R E D | A F T E R | A | P A U S E | W H I C H | L A S T E D | F O R | T H R E E | T I C K S | O F | T H E | C L O C K | +T H I S | D R E A D | L E D | H I M | T O | M A K E | T H E | R E M A R K | T H A T | S T E V I E | H A D | D I S R E G A R D E D | H I S | S U G G E S T I O N | T O | G O | T O | B E D | +T H E | D I S D A I N F U L | P O U T | O F | C O M R A D E | O S S I P O N ' S | T H I C K | L I P S | A C C E N T U A T E D | T H E | N E G R O | T Y P E | O F | H I S | F A C E | +T H E Y | A R E | N O U R I S H I N G | T H E I R | G R E E D | O N | T H E | Q U I V E R I N G | F L E S H | A N D | T H E | W A R M | B L O O D | O F | T H E | P E O P L E | N O T H I N G | E L S E | +T H E R E | A R E | N A T U R E S | T O O | T O | W H O S E | S E N S E | O F | J U S T I C E | T H E | P R I C E | E X A C T E D | L O O M S | U P | M O N S T R O U S L Y | E N O R M O U S | O D I O U S | O P P R E S S I V E | W O R R Y I N G | H U M I L I A T I N G | E X T O R T I O N A T E | I N T O L E R A B L E | T H O S E | A R E | T H E | F A N A T I C S | +T H E | S H E E T | O F | P A P E R | C O V E R E D | W I T H | C I R C L E S | D R O P P E D | O U T | O F | H I S | F I N G E R S | A N D | H E | R E M A I N E D | S T A R I N G | A T | T H E | O L D | T E R R O R I S T | A S | I F | R O O T E D | S U D D E N L Y | T O | T H E | S P O T | B Y | H I S | M O R B I D | H O R R O R | A N D | D R E A D | O F | P H Y S I C A L | P A I N | +H I S | S C A R E D | E Y E S | B L A Z E D | W I T H | I N D I G N A T I O N | I T | W O U L D | H U R T | T E R R I B L Y | H I S | M O U T H | D R O P P E D | O P E N | +M I S T E R | V E R L O C | P E R C E I V E D | W I T H | S O M E | S U R P R I S E | T H A T | H E | D I D | N O T | K N O W | R E A L L Y | W H A T | T O | S A Y | T O | S T E V I E | +D O W N | B E L O W | I N | T H E | Q U I E T | N A R R O W | S T R E E T | M E A S U R E D | F O O T S T E P S | A P P R O A C H E D | T H E | H O U S E | T H E N | D I E D | A W A Y | U N H U R R I E D | A N D | F I R M | A S | I F | T H E | P A S S E R | B Y | H A D | S T A R T E D | T O | P A C E | O U T | A L L | E T E R N I T Y | F R O M | G A S | L A M P | T O | G A S | L A M P | I N | A | N I G H T | W I T H O U T | E N D | A N D | T H E | D R O W S Y | T I C K I N G | O F | T H E | O L D | C L O C K | O N | T H E | L A N D I N G | B E C A M E | D I S T I N C T L Y | A U D I B L E | I N | T H E | B E D R O O M | +H E | K N O W S | N O | B E T T E R | H E | G E T S | I N T O | H I S | P A S S I O N S | O V E R | I T | +Y E S | N O T | A T | A L L | W E L L | +T H E | C O M P A R I S O N | O C C U R R E D | T O | M I S T E R | V E R L O C | B E C A U S E | H E | H A D | S A T | A S T R I D E | V A R I O U S | A R M Y | H O R S E S | I N | H I S | T I M E | A N D | H A D | N O W | T H E | S E N S A T I O N | O F | A N | I N C I P I 
E N T | F A L L | +S T E V I E | A C C U S T O M E D | T O | M O V E | A B O U T | D I S R E G A R D E D | H A D | G O T | U P | F R O M | T H E | K I T C H E N | T A B L E | C A R R Y I N G | O F F | H I S | D R A W I N G | T O | B E D | W I T H | H I M | +M I C H A E L I S | T H E | T I C K E T | O F | L E A V E | A P O S T L E | S M I L E D | V A G U E L Y | W I T H | H I S | G L U E D | L I P S | H I S | P A S T Y | M O O N | F A C E | D R O O P E D | U N D E R | T H E | W E I G H T | O F | M E L A N C H O L Y | A S S E N T | +T H E | O L D | T E R R O R I S T | T U R N E D | S L O W L Y | H I S | H E A D | O N | H I S | S K I N N Y | N E C K | F R O M | S I D E | T O | S I D E | +Y E S | I | H A D | T H E | T I M E | T O | T H I N K | T H I N G S | O U T | A | L I T T L E | H E | A D D E D | W I T H O U T | E M P H A S I S | +T H E | C O A L S | I N | T H E | G R A T E | S E T T L E D | D O W N | W I T H | A | S L I G H T | C R A S H | A N D | M I C H A E L I S | T H E | H E R M I T | O F | V I S I O N S | I N | T H E | D E S E R T | O F | A | P E N I T E N T I A R Y | G O T | U P | I M P E T U O U S L Y | +W I T H | H I S | E L B O W | P R E S E N T I N G | N O | A P P E A R A N C E | O F | A | J O I N T | B U T | M O R E | L I K E | A | B E N D | I N | A | D U M M Y ' S | L I M B | T H R O W N | O V E R | T H E | B A C K | O F | A | C H A I R | H E | L E A N E D | F O R W A R D | S L I G H T L Y | O V E R | H I S | S H O R T | A N D | E N O R M O U S | T H I G H S | T O | S P I T | I N T O | T H E | G R A T E | +H E | G L A R E D | A T | M E | A S | I F | H E | D I D N ' T | K N O W | W H O | I | W A S | W H E N | I | W E N T | D O W N S T A I R S | +T H E | P R O S P E C T | W A S | A S | B L A C K | A S | T H E | W I N D O W | P A N E | A G A I N S T | W H I C H | H E | W A S | L E A N I N G | H I S | F O R E H E A D | +S T E V I E | P R O W L E D | R O U N D | T H E | T A B L E | L I K E | A N | E X C I T E D | A N I M A L | I N | A | C A G E | +I F | I | H A D | K N O W N | T H E Y | W E R E | C O M I N G | T O | N I G H T | I | W O U L D | H A V E | S E E N | T O | I T | T H A T | H E | W E N T | T O | B E D | A T | T H E | S A M E | T I M E | I | D I D | +H I S | O W N | S K I N | H A D | S I Z Z L E D | U N D E R | T H E | R E D | H O T | B R A N D | H E | M U R M U R E D | S O F T L Y | +H E | H A D | B E E N | A | P R I S O N E R | H I M S E L F | +I | W O U L D | C A L L | I T | C A N N I B A L I S T I C | T H A T ' S | W H A T | I T | I S | +T H E | L I G H T | T H R O W N | D O W N | B Y | T H E | S H A D E | F E L L | D A Z Z L I N G L Y | O N | T H E | W H I T E | P I L L O W | S U N K | B Y | T H E | W E I G H T | O F | H E R | H E A D | R E P O S I N G | W I T H | C L O S E D | E Y E S | A N D | D A R K | H A I R | D O N E | U P | I N | S E V E R A L | P L A I T S | F O R | T H E | N I G H T | +I N | A N Y | C A S E | H E | H A D | N O T | T H E | T I M E | +I T ' S | L I K E | Y O U R | H O R S E | S U D D E N L Y | F A L L I N G | D E A D | U N D E R | Y O U | I N | T H E | M I D S T | O F | A N | U N I N H A B I T E D | A N D | T H I R S T Y | P L A I N | +L O A F I N G | W A S | A L L | V E R Y | W E L L | F O R | T H E S E | F E L L O W S | W H O | K N E W | N O T | M I S T E R | V L A D I M I R | A N D | H A D | W O M E N | T O | F A L L | B A C K | U P O N | W H E R E A S | H E | H A D | A | W O M A N | T O | P R O V I D E | F O R | +L O M B R O S O | I S | A N | A S S | +F O R | H I M | T H E | C R I M I N A L | I S | T H E | P R I S O N E R | S I M P L E | I 
S | I T | N O T | +S T R U G G L E | W A R F A R E | W A S | T H E | C O N D I T I O N | O F | P R I V A T E | O W N E R S H I P | I T | W A S | F A T A L | +T H E | P O S S E S S O R S | O F | P R O P E R T Y | H A D | N O T | O N L Y | T O | F A C E | T H E | A W A K E N E D | P R O L E T A R I A T | B U T | T H E Y | H A D | A L S O | T O | F I G H T | A M O N G S T | T H E M S E L V E S | Y E S | +H I S | E N U N C I A T I O N | W O U L D | H A V E | B E E N | A L M O S T | T O T A L L Y | U N I N T E L L I G I B L E | T O | A | S T R A N G E R | +H E | L O O K E D | D U B I O U S L Y | A T | H I S | B R O T H E R | I N | L A W | B U T | H E | D I D | N O T | A S K | H I M | F O R | I N F O R M A T I O N | +Y O U | D O N ' T | U N D E R S T A N D | H E | B E G A N | D I S D A I N F U L L Y | B U T | S T O P P E D | S H O R T | I N T I M I D A T E D | B Y | T H E | D E A D | B L A C K N E S S | O F | T H E | C A V E R N O U S | E Y E S | I N | T H E | F A C E | T U R N E D | S L O W L Y | T O W A R D S | H I M | W I T H | A | B L I N D | S T A R E | A S | I F | G U I D E D | O N L Y | B Y | T H E | S O U N D | +Y O U | W O U L D | C A L L | T H A T | L A D | A | D E G E N E R A T E | W O U L D | Y O U | M U M B L E D | M I S T E R | V E R L O C | +A S K | K A R L | Y U N D T | H E | G R O W L E D | S A V A G E L Y | +I | D O N ' T | S A Y | T H A T | P R O T E S T E D | M I C H A E L I S | G E N T L Y | +T H E R E | I S | N O | O C C U P A T I O N | T H A T | F A I L S | A | M A N | M O R E | C O M P L E T E L Y | T H A N | T H A T | O F | A | S E C R E T | A G E N T | O F | P O L I C E | +A N D | I | C O U L D | N E V E R | G E T | A S | M A N Y | A S | T H R E E | S U C H | M E N | T O G E T H E R | +H E | W A T C H E D | H I M | G E S T I C U L A T I N G | A N D | M U R M U R I N G | I N | T H E | K I T C H E N | +T H E S E | W E R E | B U T | F E W | A N D | F O R | T H E | F I R S T | T I M E | S I N C E | H E | O P E N E D | H I S | S H O P | H E | T O O K | A | C O M M E R C I A L | S U R V E Y | O F | I T S | V A L U E | +C O M R A D E | O S S I P O N ' S | F A C E | T W I T C H E D | W I T H | E X A S P E R A T I O N | +T H A T | B O Y | H E A R S | T O O | M U C H | O F | W H A T | I S | T A L K E D | A B O U T | H E R E | +H E | G A V E | T H E | D I S C U S S I O N | U P | W I T H | A | S L I G H T | S H R U G | O F | T H E | S H O U L D E R S | +C O M F O R T A B L E | D E A R | +H E | G E T S | A | R E D | F A C E | P O R I N G | O V E R | T H E M | +H E | W A S | N O T | S A T I S F I E D | W I T H | H I S | F R I E N D S | +W H E N | H E | R O S E | P A I N F U L L Y | T H E | T H R U S T I N G | F O R W A R D | O F | A | S K I N N Y | G R O P I N G | H A N D | D E F O R M E D | B Y | G O U T Y | S W E L L I N G S | S U G G E S T E D | T H E | E F F O R T | O F | A | M O R I B U N D | M U R D E R E R | S U M M O N I N G | A L L | H I S | R E M A I N I N G | S T R E N G T H | F O R | A | L A S T | S T A B | +T H E | S H A D O W | O F | H I S | E V I L | G I F T | C L U N G | T O | H I M | Y E T | L I K E | T H E | S M E L L | O F | A | D E A D L Y | D R U G | I N | A N | O L D | V I A L | O F | P O I S O N | E M P T I E D | N O W | U S E L E S S | R E A D Y | T O | B E | T H R O W N | A W A Y | U P O N | T H E | R U B B I S H | H E A P | O F | T H I N G S | T H A T | H A D | S E R V E D | T H E I R | T I M E | +I T | W A S | K A R L | Y U N D T | W H O | W A S | H E A R D | I M P L A C A B L E | T O | H I S | L A S T | B R E A T H | +H E | I S N ' T | F I T | T O | H E A 
R | W H A T ' S | S A I D | H E R E | H E | B E L I E V E S | I T ' S | A L L | T R U E | +A T | B E S T | T H E Y | C A N | O N L Y | I N T E R P R E T | T H E | M I N D | O F | T H E | P R O P H E T | A N D | C A N | H A V E | N O | O B J E C T I V E | V A L U E | +H E | P A U S E D | T H E N | A D D E D | W I T H | M O D E S T | F I R M N E S S | +T H E N | W H Y | I N D U L G E | I N | P R O P H E T I C | P H A N T A S I E S | +M I S T E R | V E R L O C | W A S | F U L L Y | R E S P O N S I V E | N O W | +H E | W A S | O U T | O F | H I S | M I N D | W I T H | S O M E T H I N G | H E | O V E R H E A R D | A B O U T | E A T I N G | P E O P L E ' S | F L E S H | A N D | D R I N K I N G | B L O O D | W H A T ' S | T H E | G O O D | O F | T A L K I N G | L I K E | T H A T | +H E | C A N ' T | S T A N D | T H E | N O T I O N | O F | A N Y | C R U E L T Y | +T H E | F A M O U S | T E R R O R I S T | H A D | N E V E R | I N | H I S | L I F E | R A I S E D | P E R S O N A L L Y | A S | M U C H | A S | H I S | L I T T L E | F I N G E R | A G A I N S T | T H E | S O C I A L | E D I F I C E | +A N D | T H E | I N C O N S I S T E N T | W O M A N | F E L L | U P O N | H I S | B U T T O N Y | B R E A S T | W E E P I N G | C O P I O U S L Y | +I | C O U L D | N O T | L O V E | T H E E | D E A R | S O | M U C H | L O V E D | I | N O T | H O N O R | M O R E | +T H E | B O Y S | B L E S S | T H E I R | B R A V E | H E A R T S | H A V E | D O N E | N O B L Y | B U T | O L D E R | M E N | A R E | N E E D E D | N O W | W E | C A N N O T | S A C R I F I C E | A L L | T H E | G A L L A N T | L A D S | A N D | W E | W H O | H A V E | M O R E | T O | L O S E | T H A N | T H E Y | M U S T | T A K E | O U R | T U R N | A N D | T R Y | T O | D O | A S | W E L L | +B U T | E V E N | W H I L E | S H E | E N J O Y E D | E V E R Y | H O U R | O F | L I F E | A N D | B E G R U D G E D | T H E | T I M E | G I V E N | T O | S L E E P | S H E | F E L T | A S | I F | T H E | D R E A M | W A S | T O O | B E A U T I F U L | T O | L A S T | A N D | O F T E N | S A I D | +H I S | W I F E | F E D | H I M | W I T H | T H E | F A T | O F | T H E | L A N D | R E G A R D L E S S | O F | C O N S E Q U E N C E S | H I S | C H I L D R E N | R E V O L V E D | A B O U T | H I M | W I T H | T I R E L E S S | C U R I O S I T Y | A N D | W O N D E R | H I S | N E I G H B O R S | F L O C K E D | I N | T O | A P P L A U D | A D V I S E | A N D | A D M I R E | E V E R Y | O N E | T R E A T E D | H I M | W I T H | A | R E S P E C T | M O S T | G R A T E F U L | T O | H I S | F E E L I N G S | H E | W A S | A N | O B J E C T | O F | I N T E R E S T | A N D | W I T H | E V E R Y | H O U R | H I S | I M P O R T A N C E | I N C R E A S E D | S O | T H A T | B Y | N I G H T | H E | F E L T | L I K E | A | C O M M A N D E R | I N | C H I E F | A N D | B O R E | H I M S E L F | A C C O R D I N G L Y | +N O W | C Y N T H Y | B E | Y O U | S A T I S F I E D | +S O | C H R I S T I E | T U R N E D | A | D E A F | E A R | T O | H E R | P R O P H E T I C | S O U L | A N D | G A V E | H E R S E L F | U P | T O | T H E | B L I S S F U L | H O L I D A Y | T H A T | H A D | C O M E | A T | L A S T | +I | H O P E | Y O U ' L L | L I K E | H I M | B E T T E R | T H A N | T H E | F R O S T | B I T T E N | O L D | D A V I D | Y O U | F I R S T | K N E W | A N D | W E R E | K I N D | E N O U G H | T O | L O V E | +T H E N | S H E | S A W | D A V I D | A N D | T H E | R E G I M E N T | B E C A M E | O N E | M A N | T O | H E R | +I ' M | N O T | A | T A L K E R | Y 
O U | K N O W | A N D | A S | T H E | L A W S | O F | G R A V I T A T I O N | F O R B I D | M Y | S O A R I N G | A L O F T | A N Y W H E R E | I | C A N | O N L Y | E X P R E S S | M Y | J O Y F U L L Y | U P L I F T E D | S T A T E | O F | M I N D | B Y | P R A N C I N G | A S | Y O U | C A L L | I T | +I | S H A L L | W A I T | F O R | T I M E | T O | S H O W | +C A N | Y O U | R E M E M B E R | W H A T | H E P S E Y | T O L D | U S | A N D | C A L L | T H E M | P O O R | L O N G | S U F F E R I N | C R E E T E R S | N A M E S | +D A V I D | A N D | C H R I S T I E | W E N T | S M I L I N G | A W A Y | T O G E T H E R | A N D | I F | T H E Y | S H E D | A N Y | T E A R S | O V E R | T H E | B R I E F | H A P P I N E S S | N O | O N E | S A W | T H E M | B U T | T H E | F L O W E R S | A N D | T H E Y | L O Y A L L Y | K E P T | T H E | S E C R E T | F O L D E D | U P | I N | T H E I R | T E N D E R | H E A R T S | +M I S T E R | P O W E R | I S | W A I T I N G | A R E | Y O U | R E A D Y | L O V E | Q U I T E | R E A D Y | +A S | A | M A R R I E D | W O M A N | Y O U | W I L L | G E T | O N | B E T T E R | A S | M Y | W I F E | Y O U | W I L L | B E | A L L O W E D | T O | C O M E | T O | M E | I F | I | N E E D | Y O U | A N D | A S | M Y | H E | S T O P P E D | T H E R E | F O R | H E | C O U L D | N O T | A D D | A S | M Y | W I D O W | Y O U | W I L L | H A V E | M Y | P E N S I O N | T O | S U P P O R T | Y O U | +B E N N E T | W I L L | T A K E | T H E | G A R D E N | A N D | G R E E N | H O U S E | O F F | M Y | H A N D S | T H I S | A U T U M N | F O R | A | Y E A R | O R | L O N G E R | I F | I | L I K E | +T H E N | T H E Y | W E N T | B A C K | T O | T H E I R | W O R K | L I T T L E | D R E A M I N G | A S | T H E Y | T I E D | R O S E S | A N D | T W I N E D | S M I L A X | W R E A T H S | H O W | N E A R | T H A T | O T H E R | C H A N C E | W A S | H O W | S O O N | T H E Y | W E R E | T O | B E | C A L L E D | U P O N | T O | K E E P | T H E I R | P R O M I S E | A N D | H O W | W E L L | E A C H | W A S | T O | P E R F O R M | T H E | P A R T | G I V E N | T H E M | I N | L I F E | A N D | D E A T H | +H E | W A S | N O T | A S | U N M O V E D | A S | H E | S E E M E D | B Y | T H E | G E N E R A L | E X C I T E M E N T | A N D | H A D | F E L T | S U N D R Y | M A N L Y | I M P U L S E S | T O | U P | A N D | A T | E M | W H E N | H I S | C O M R A D E S | I N | T H E | S H O P | D I S C U S S E D | T H E | C R I S I S | W I T H | I R E F U L | B R A N D I S H I N G | O F | A W L S | A N D | V E N G E F U L | P O U N D I N G | O F | S O L E | L E A T H E R | A S | I F | T H E | R E B E L S | W E R E | U N D E R | T H E | H A M M E R | +V E R Y | W E L L | S A I D | M I S S U S | W I L K I N S | R E S O L U T E L Y | T O | H E R S E L F | E F | I | C A N ' T | M A K E | N O | I M P R E S S I O N | O N | H I S | S O U L | I | W I L L | O N | H I S | S T O M M I C K | A N D | S E E | H O W | T H A T ' L L | W O R K | +Y O U | Y O U N G | F O L K S | T A K E | A | W E D D I N G | T R I P | T O | T H E | G R E E N | H O U S E | W H I L E | W E | S E E | H O W | W E L L | W E | C A N | G E T | O N | W I T H O U T | Y O U | +A L L | W A T C H E D | W I T H | Q U I C K E N E D | B R E A T H | A N D | P R O U D | S O U L S | T H A T | L I V I N G | W A V E | B L U E | B E L O W | A N D | B R I G H T | W I T H | A | S T E E L Y | G L I T T E R | A B O V E | A S | I T | F L O W E D | D O W N | T H E | S T R E E T | A N D | A W A Y | T O | J O I N | T H E | S E A | O F | D A U N T 
L E S S | H E A R T S | T H A T | F O R | M O N T H S | H A D | R O L L E D | U P | A G A I N S T | T H E | S O U T H | A N D | E B B E D | B A C K | R E D D E N E D | W I T H | T H E | B L O O D | O F | M E N | L I K E | T H E S E | +I T | I S | T E R R I B L E | A N D | Y E T | G L O R I O U S | +H E ' S | A | K I N D | N E I G H B O R L Y | M A N | A N D | H I S | B O Y | W I L L | T A K E | M Y | P L A C E | A B O U T | T H E | H O U S E | A N D | P R O T E C T | Y O U | F A I T H F U L L Y | +A | V E R Y | S I M P L E | L I T T L E | M A R R I A G E | F E A S T | B U T | M O R E | L O V E | G O O D | W I L L | A N D | T E N D E R | W I S H E S | A D O R N E D | T H E | P L A I N | T A B L E | T H A N | I S | O F T E N | F O U N D | A T | W E D D I N G | B R E A K F A S T S | A N D | B E T T E R | T H A N | A N Y | S P E E C H | O R | S O N G | W A S | L E T T Y ' S | B R O K E N | W H I S P E R | A S | S H E | F O L D E D | H E R | A R M S | R O U N D | D A V I D ' S | E M P T Y | C H A I R | W H E N | N O | O N E | S A W | H E R | H E A V E N | B L E S S | A N D | K E E P | A N D | B R I N G | H I M | B A C K | T O | U S | +T H E | R O S E S | A R E | F O R | T H E Y | R E M I N D | M E | O F | P O O R | H E L E N | A N D | T H E | F I R S T | W O R K | I | D I D | W I T H | D A V I D | W A S | A R R A N G I N G | F L O W E R S | L I K E | T H E S E | F O R | A | D E A D | B A B Y ' S | L I T T L E | C O F F I N | +T O | N O | H O M E | I N | T H E | L A N D | D I D | T H E | G R E A T | T R O U B L E | B R I N G | A | M O R E | S U D D E N | C H A N G E | T H A N | T H E | L I T T L E | C O T T A G E | I N | T H E | L A N E | +Y E S | D A V I D | S I S T E R | A N D | S W E E T H E A R T | A N S W E R E D | B R A V E L Y | F O R G E T T I N G | I N | T H E | F E R V O R | O F | T H E | M O M E N T | W H A T | H E A V Y | C O N S E Q U E N C E S | G O D | M I G H T | S E E | F I T | T O | S E N D | G O O D | +I | K N E W | Y O U | W O U L D | G O | I | S A W | Y O U | G E T T I N G | R E A D Y | A N D | I | M A D E | U P | M Y | M I N D | T O | F O L L O W | +T O | S A Y | T H A T | T H E | F I S H | R O S E | A T | O N C E | A N D | S W A L L O W E D | T H E | B A I T | H O O K | A N D | A L L | B U T | F E E B L Y | E X P R E S S E S | T H E | J U S T I C E | D O N E | T O | T H E | C A K E S | B Y | T H A T | L O N G | S U F F E R I N G | M A N | +W E | C A N ' T | A F F O R D | N O | N I C E | V I T T L E S | N O W | W H E N | O U R | M E N | A R E | S U F F E R I N | S O | +T H E Y | K N E W | W H A T | I T | W A S | W I T H O U T | A | W O R D | M I S S U S | S T E R L I N G | C L A S P E D | H E R | H A N D S | A N D | B O W E D | H E R | H E A D | +I N | T H A T | C A S E | Y O U | W I L L | F I N D | M E | A | P R O U D | I M P E T U O U S | A M B I T I O U S | F E L L O W | C H R I S T I E | A N D | H O W | W I L L | T H A T | S U I T | +S U R E L Y | I | S H A L L | I F | I | G I V E | Y O U | A N D | M Y S E L F | T O | T H E | C A U S E | A N D | I | D O | I T | G L A D L Y | T H O U G H | I | K N O W | T H A T | M Y | H E A R T | H A S | G O T | T O | A C H E | A S | I T | N E V E R | H A S | A C H E D | Y E T | W H E N | M Y | C O U R A G E | F A I L S | A S | I T | W I L L | B Y | A N D | B Y | A N D | M Y | S E L F I S H | S O U L | C O U N T S | T H E | C O S T | O F | M Y | O F F E R I N G | A F T E R | T H E | E X C I T E M E N T | I S | O V E R | +D A V I D | C A U G H T | T H E | E X A L T A T I O N | A N D | G A V E | N O | F U R T H E R | T H O U G H T | T O | 
A N Y | T H I N G | B U T | T H E | D U T Y | O F | T H E | H O U R | F I N D I N G | H I M S E L F | S T R O N G E R | A N D | B R A V E R | F O R | T H A T | L O N G | L O O K | I N T O | T H E | I L L U M I N A T E D | F A C E | O F | T H E | W O M A N | H E | L O V E D | +F I N D I N G | T H A T | L I S H A | S H O W E D | L I T T L E | E N T H U S I A S M | O N | T H E | S U B J E C T | S H E | T R I E D | T O | R O U S E | H I M | B Y | P A T R I O T I C | A P P E A L S | O F | V A R I O U S | S O R T S | +N O T | O N E | D A V I D | T H A T ' S | T R U E | L O V E | C H R I S T I E | +Y O U | W I L L | L E T | M E | D O | I T | A N D | I N | R E T U R N | I | W I L L | M A R R Y | Y O U | W H E N E V E R | Y O U | A S K | M E | A N S W E R E D | C H R I S T I E | S E A L I N G | T H E | P R O M I S E | W I T H | A | K I S S | T H A T | S I L E N C E D | H I M | +W E | W I L L | W H A T | C A N | I | D O | F O R | Y O U | D A V Y | A S K E D | C H R I S T I E | W O N D E R F U L L Y | S U P P O R T E D | B Y | T H E | T H O U G H T | T H A T | S H E | W A S | G O I N G | T O O | +D A V I D | W A S | S O B E R | E N O U G H | N O W | A N D | W E N T | A B O U T | H I S | W O R K | W I T H | A | G R I M | S E T | T O | H I S | L I P S | A N D | A | S P A R K | I N | H I S | E Y E S | T H A T | M A D E | T H E | T H R E E | W O M E N | L O O K | A T | O N E | A N O T H E R | P A L E | W I T H | U N S P O K E N | A P P R E H E N S I O N | +M O T H E R | S A Y S | I ' V E | G O N E | B A C K | T O | T H E | T I M E | B E F O R E | W E | L O S T | L E T T Y | A N D | I | S O M E T I M E S | F E E L | A S | I F | I | H A D | +D A V I D | H E L D | I T | C L O S E | I N | B O T H | O F | H I S | S A Y I N G | G R A T E F U L L Y | T H A N K | Y O U | M O T H E R | T H E N | F I X I N G | H I S | E Y E S | O N | T H E | Y O U N G E R | Y E T | N O T | D E A R E R | W O M E N | H E | A D D E D | W I T H | A | R I N G | I N | H I S | V O I C E | T H A T | M A D E | T H E I R | H E A R T S | A N S W E R | W I T H | A | P R O M P T | A Y | A Y | I N | S P I T E | O F | L O V E | O R | F E A R | +N O T H I N G | C A N | S U R P R I S E | M E | N O W | I ' M | P R E P A R E D | F O R | A N Y | T H I N G | E V E N | T H E | S I G H T | O F | M Y | Q U A K E R I S H | L O V E R | D A N C I N G | A | J I G | +T H E | W O M E N | D R O P P E D | T H E I R | W O R K | T O | L O O K | A N D | L I S T E N | F O R | H I S | V I S I T S | W E R E | F E W | A N D | S H O R T | A N D | E V E R Y | I N S T A N T | W A S | P R E C I O U S | +Y O U ' V E | S O M E T H I N G | T O | T E L L | M E | I | S E E | I T | I N | Y O U R | F A C E | D E A R | I | M U S T | G O | +B U T | I | T H I N K | F E W | B R I D E S | D R E S S | W I T H | A | B R A V E R | H A P P I E R | H E A R T | T H A N | M I N E | T H O U G H | I | D O | C H O O S E | A | S O B E R | W E D D I N G | G O W N | A N S W E R E D | C H R I S T I E | S M I L I N G | A G A I N | A S | S H E | T O O K | F R O M | A | H A L F | P A C K E D | T R U N K | H E R | N E W | H O S P I T A L | S U I T | O F | S O F T | G R A Y | W O O L L E N | S T U F F | +I T | W O U L D | H A V E | T A K E N | M A N Y | K N A P S A C K S | T O | H O L D | A L L | T H E | G I F T S | S H O W E R E D | U P O N | H I M | B Y | H I S | F R I E N D S | A N D | N E I G H B O R S | +T H E N | T H E Y | S T O O D | Q U I T E | S T I L L | F O R | A | T I M E | A N D | I N | T H E | S I L E N C E | T H E | T W O | H E A R T S | T A L K E D | T O G E T H E R | I N | T H E | S W E 
E T | L A N G U A G E | N O | T O N G U E | C A N | U T T E R | +I | D O N ' T | W A N T | Y O U | T O | I | L O V E | T O | S E E | Y O U | S O | Y O U N G | A N D | H A P P Y | O N L Y | Y O U | A R E | N O T | T H E | O L D | D A V I D | A N D | I ' V E | G O T | T O | G E T | A C Q U A I N T E D | W I T H | T H E | N E W | O N E | +E X C E L L E N T L Y | I | L I K E | P R I D E | O F | Y O U R | S O R T | I M P E T U O S I T Y | B E C O M E S | Y O U | F O R | Y O U | H A V E | L E A R N E D | T O | C O N T R O L | I T | I F | N E E D | B E | A N D | T H E | A M B I T I O N | I S | B E S T | O F | A L L | +N O | H E | A I N ' T | I T ' S | A | T R A I N E R | A D D E D | A N N | L I Z Y | +H E R | M E E T I N G | W I T H | L E T T Y | W A S | I N D E S C R I B A B L Y | T E N D E R | A N D | T H E | D A Y S | T H A T | F O L L O W E D | W E R E | P R E T T Y | E Q U A L L Y | D I V I D E D | B E T W E E N | H E R | A N D | H E R | B R O T H E R | I N | N U R S I N G | T H E | O N E | A N D | L O V I N G | T H E | O T H E R | +T H E N | T H E | G O O D | S O U L | O P E N L Y | S H O U L D E R E D | T H E | B U R D E N | S H E | H A D | B O R N E | S O | L O N G | I N | S E C R E T | A N D | B R A V E L Y | T R U D G E D | O N | A L O N E | +I | F E E L | L I K E | A | B O Y | O U T | O F | S C H O O L | O R | R A T H E R | A | M A N | O U T | O F | P R I S O N | A N D | M U S T | E N J O Y | M Y | L I B E R T Y | I N | S O M E | W A Y | +N E X T | E V E N I N G | A S | M I S S U S | S T E R L I N G | S A T | A L O N E | I N | T H E | T W I L I G H T | A | T A L L | M A N | I N | A R M Y | B L U E | E N T E R E D | Q U I E T L Y | S T O O D | W A T C H I N G | T H E | T R A N Q U I L | F I G U R E | F O R | A | M O M E N T | T H E N | W E N T | A N D | K N E L T | D O W N | B E S I D E | I T | S A Y I N G | W I T H | A | M O S T | U N S O L D I E R L Y | C H O K E | I N | T H E | V O I C E | +N O T H I N G | C A N | P A R T | U S | A N Y | M O R E | N O T | E V E N | D E A T H | F O R | L O V E | L I K E | O U R S | W I L L | L A S T | F O R | E V E R | +I | W I S H | I | H A D N ' T | T A K E N | T H A T | B R A N D Y | H E | S A I D | F O O L | T H A T | I | A M | +T H A T | A T | A L L | T I M E S | D E B A S I N G | A T | T H I S | P A R T I C U L A R | T I M E | I T | W A S | I N F A M O U S | T H A T | A | V I C E | U N W O R T H Y | O F | A N Y | M A N | W A S | D O U B L Y | S I N F U L | I N | A | M A N | O F | E D U C A T I O N | A N D | A | M I N I S T E R | O F | G O D | I N | V A I N | +I N | T H E | V A L L E Y | O F | T H E | S H A D O W | O F | D E A T H | H E | I S | W I T H | U S | +I | H A D | T H E | P L E A S U R E | O F | M E E T I N G | H I M | I N | S O C I E T Y | +D E A D | S A I D | D O C T O R | M A C K L E W A I N | +I ' L L | R E P O R T | T H I S | T O | T H E | G O V E R N M E N T | +Y E S | O N E | O U G H T N ' T | T O | L E A V E | T H E | C O L O N Y | W I T H O U T | S E E I N G | I T | S A Y S | B U R G E S S | I T ' S | W O R T H | S E E I N G | +T H E | D E V I L | H E | I S | I | H E A R D | S O M E T H I N G | A B O U T | I T | T O O | +I N | F A C T | T H E | R I N G L E A D E R | J O H N | R E X | G A V E | M E | H I S | C O N F E S S I O N | A N D | I | S E N T | I T | T O | T H E | B I S H O P | +T H E | G O V E R N M E N T | M A Y | G O | T O | A N D | Y O U | T O O | R O A R E D | B U R G E S S | G E T | O U T | +A N | I M P U L S I V E | G E N T L E M A N | S A I D | M E E K I N | T O | M A C K L E W A I N | A S | T H E | S O U N D | O F 
| M I S T E R | N O R T H ' S | F O O T S T E P S | D I E D | A W A Y | I N | T H E | D I S T A N C E | +S H O W | M I S T E R | N O R T H | O U T | H E | S A I D | A N D | G O | D O W N | T O | T H E | B A R R A C K S | A N D | T E L L | T R O K E | T H A T | K I R K L A N D | I S | T O | H A V E | A | H U N D R E D | L A S H E S | T O | M O R R O W | +I ' L L | S H O W | Y O U | W H O ' S | M A S T E R | H E R E | M Y | G O O D | S I R | +I | A M | A | M I N I S T E R | O F | G O D | S I R | A N D | I | F O R B I D | Y O U | T O | C O M M I T | T H I S | C R I M E | +W E L L | N O W | S A I D | M E E K I N | W I T H | A S P E R I T Y | I | D O N ' T | A G R E E | W I T H | Y O U | +B Y | A N D | B Y | A | S H O R T | F I G U R E | S M O K I N G | A | C H E R O O T | C A M E | U P | O U T | O F | T H E | D A R K | A N D | P R O V E D | T O | B E | D O C T O R | M A C K L E W A I N | W H O | H A D | B E E N | P R E V E N T E D | F R O M | A T T E N D I N G | T H E | D I N N E R | B Y | R E A S O N | O F | A N | A C C I D E N T | T O | A | C O N S T A B L E | A T | N O R F O L K | B A Y | W H I C H | H A D | C L A I M E D | H I S | P R O F E S S I O N A L | A T T E N T I O N | +W H O M | I S | H E | G O I N G | T O | F L O G | N O W | +B E F O R E | T H E | T W O | C L E R G Y M E N | H A D | G O T | H A L F | W A Y | D O W N | T H E | S T E E P | P A T H | T H A T | L E D | F R O M | T H E | C O M M A N D A N T ' S | H O U S E | T O | T H E | F L A T | O N | W H I C H | T H E | C O T T A G E S | O F | T H E | D O C T O R | A N D | C H A P L A I N | W E R E | B U I L T | M A C K L E W A I N | R E J O I N E D | T H E M | +M A C K L E W A I N | S H O O K | H I S | H E A D | S E R I O U S L Y | +T H I S | I S | M U R D E R O U S | +P R A Y | H E L P | Y O U R S E L F | T O | W I N E | +M I S T E R | M E E K I N | E X P R E S S E D | S O M E | A L A R M | B U T | D O C T O R | M A C K L E W A I N | R E | A S S U R E D | H I M | +O H | G O D | G I V E | M E | S T R E N G T H | A I D | M E | H E L P | M E | +N O R T H | K N E W | W E L L | T H A T | H E | W O U L D | N E V E R | D A R E | T O | A T T E M P T | A N Y | S U C H | A C T | O F | V I O L E N C E | B U T | T H E | I N S U L T | S T U N G | H I M | L I K E | T H E | C U T | O F | A | W H I P | +H E | S E E M S | T O | M E | T O | B E | T R U L Y | P E N I T E N T | F O R | H I S | O F F E N C E S | A | M I S G U I D E D | B U T | N O T | A | H Y P O C R I T I C A L | M A N | I F | M Y | K N O W L E D G E | O F | H U M A N | N A T U R E | G O E S | F O R | A N Y T H I N G | I | H O P E | H E | I S | S A I D | N O R T H | +D A M N | Y O U R | I M P E R T I N E N C E | S I R | B U R S T | O U T | B U R G E S S | +T W I C E | H E | P A U S E D | O N | T H E | W A Y | T O | T H E | S I T T I N G | R O O M | A N D | T W I C E | W A S | H E | D R I V E N | O N | B Y | A | P O W E R | S T R O N G E R | T H A N | H I S | W I L L | +Y O U ' R E | A | D I S M I S S E D | O F F I C E R | O F | T H E | G O V E R N M E N T | S I R | +H A V E | Y O U | M A N Y | V I S I T O R S | C A P T A I N | B U R G E S S | V E R Y | F E W | +I | H A V E | T H E S E | A T T A C K S | A T | T I M E S | +I N | T H E | M I D S T | O F | H I S | A R G U M E N T S | H E | F O U N D | H I M S E L F | A T | T H E | C U P B O A R D | W I T H | T H E | B O T T L E | A T | H I S | L I P S | I N | A N | A T T I T U D E | T H A T | W A S | A T | O N C E | L U D I C R O U S | A N D | H O R R I B L E | +H E | M I X E D | A | T E A S P O O N F U L | O F | T H I S | I N | A | P A 
N N I K I N | O F | W A T E R | A N D | D R A N K | I T | +Y O U | D O N ' T | M E A N | T O | S A Y | H E ' S | G O I N G | T O | F L O G | K I R K L A N D | +H E | I S | J U S T | M A R R I E D | Y O U | K N O W | I S | H E | S A I D | B U R G E S S | +A N O T H E R | F L O G G I N G | T O | M O R R O W | S A I D | H E | G R U M B L I N G L Y | +T H I S | O F | C O U R S E | W A S | M E R E | B R A V A D O | O N | T H E | P A R T | O F | T H E | C O M M A N D A N T | +I ' L L | T E A C H | M Y | P R I S O N E R S | T O | A T T E M P T | S U I C I D E | +I | W A S | Q U A R T E R E D | W I T H | H I M | A T | S A R A H | I S L A N D | +T H E R E ' S | N O | F E A R | O F | H I M | S A I D | B U R G E S S | C H E E R I L Y | I F | H E | G R O W S | U P R O A R I O U S | W E ' L L | S O O N | G I V E | H I M | A | T O U C H | O F | T H E | C A T | +I ' V E | G I V E N | M Y | O R D E R S | S I R | +W E | C A N ' T | D O | A N Y T H I N G | W I T H O U T | E V I D E N C E | C O M P L A I N | +I | S H A L L | F I N D | M Y | P O R T M A N T E A U | I N | M Y | R O O M | Y O U | S A I D | Y E S | Y E S | +H E | S M E L T | T H E | N U T T Y | A R O M A | O F | T H E | S P I R I T | +D E L I G H T E D | T O | S E E | Y O U | M I S T E R | M E E K I N | +T H E N | C A P T A I N | B U R G E S S | C R I E D | N O R T H | H I S | P A L E | F A C E | F L U S H I N G | I | T E L L | Y O U | T H E | B O Y ' S | B L O O D | W I L L | B E | O N | Y O U R | H E A D | +P E R H A P S | Y O U ' L L | H A V E | T H E | G O O D N E S S | T O | A L L O W | M E | T O | B E | T H E | B E S T | J U D G E | O F | T H A T | R E T U R N E D | M A C K L E W A I N | D R A W I N G | U P | H I S | L I T T L E | B O D Y | T O | I T S | L E A S T | I N S I G N I F I C A N T | S T A T U R E | +A B A N D O N E D | I N D E E D | B Y | G O D | A N D | M A N | A L M O S T | +G O O D | N I G H T | S I R | I | H O P E | Y O U | W I L L | B E | C O M F O R T A B L E | +D O C T O R | W E | A L L | H A V E | O U R | C R O S S E S | H A V E | W E | N O T | +H O W | D E L I G H T F U L | T H E | G R A S S | S M E L L S | +I T | W A S | A S | T H O U G H | H E | H A D | R E A C H E D | T H E | C R I S I S | O F | A | D I S E A S E | W H I C H | H A D | B E E N | F O R | D A Y S | G A T H E R I N G | F O R C E | +T H E Y | S H A L L | N O T | F L O G | T H A T | B O Y | H E | S A I D | +T H E | R E V E R E N D | M E E K I N | E Y E D | H I S | C L E R I C A L | B R O T H E R | W I T H | H O R R O R | T H E | R E V E R E N D | M E E K I N | W A S | N O T | A C C U S T O M E D | T O | C L E R G Y M E N | W H O | W O R E | B L A C K | N E C K T I E S | S M O K E D | C L A Y | P I P E S | C H E W E D | T O B A C C O | A N D | D R A N K | N E A T | B R A N D Y | O U T | O F | T U M B L E R S | +I | S U P P O S E | S E V E R I T Y | I S | N E C E S S A R Y | R E T U R N E D | M E E K I N | T H O U G H | T O | M Y | E A R S | A | F L O G G I N G | S O U N D S | A | L I T T L E | D I S T A S T E F U L | +C A P T A I N | F R E R E | S A Y S | T H A T | T H E | S C E N E R Y | I S | D E L I G H T F U L | +H E | S L E E P S | A T | T H E | B A C K | A N D | N O R T H | H U R R I E D | O F F | +O | L O R D | L O O K | D O W N | U P O N | M E | +I F | Y O U | P L E A S E | S A I D | M E E K I N | G R A V E L Y | +H E | H A S | T H E | S T R A N G E S T | F I T S | A T | T I M E S | +S O | T H E Y | W E N T | O N | T O | T H E | V E R A N D A H | A N D | L O O K E D | D O W N | U P O N | T H E | L I G H T S | O F | T H E | P R I S O N | A N D | L I 
S T E N E D | T O | T H E | S E A | L A P P I N G | T H E | S H O R E | +O U R | R O A D S | L I E | T O G E T H E R | D O C T O R | +I | M U S T | H A V E | A | T E A S P O O N F U L | H E | S A I D | T O | A L L A Y | T H E | C R A V I N G | +A | G R E A T | R A S C A L | P U T | I N | N O R T H | +Y O U | H A V E | N O T | B E E N | L O N G | I N | T H E | C O L O N Y | M I S T E R | M E E K I N | +U N L E S S | I T ' S | A | C A N C E R | I N | T H E | S T O M A C H | I | D O N ' T | K N O W | W H A T | I T | C A N | B E | C A N C E R | I N | T H E | S T O M A C H | +T H E N | D O N ' T | Y O U | I N T E R F E R E | W I T H | M E | S I R | +T H A T ' S | M A C K L E W A I N ' S | B U S I N E S S | +B U T | M A C K L E W A I N | W A S | T I R E D | A N D | W A N T E D | T O | G E T | H O M E | +I M P E R T I N E N T | Y O U N G | B E G G A R | S A I D | B U R G E S S | +T H A T | L A S T | F E L L O W | Y O U | H A D | O U G H T | T O | H A V E | B E E N | T I E D | U P | H I M S E L F | +A N D | B E A T | O N | T H E | B A R S | W I T H | W H I T E | A N D | S W E A T I N G | H A N D S | +B U T | K I R K L A N D | K E P T | S T E A D I L Y | O N | F O R | T H E | R I V E R | +W H A T | I S | H E | M O R E | T H A N | A N Y B O D Y | E L S E | S A I D | T H E | W R E T C H E D | M A N | T O | H I M S E L F | A S | H E | H U G G E D | H I S | M I S E R Y | C L O S E | +W H A T | D O E S | H E | C A R E | C A R E | +W H E N | T H E | M U S T E R | B E L L | R A N G | A N D | T H E | G A N G | B R O K E | U P | R U F U S | D A W E S | O N | H I S | S I L E N T | W A Y | T O | H I S | S E P A R A T E | C E L L | O B S E R V E D | A | N O T A B L E | C H A N G E | O F | C U S T O M | I N | T H E | D I S P O S I T I O N | O F | T H E | N E W | C O N V I C T | +T H E R E ' S | M O R E | T R O U B L E | W I T H | Y O U | B L O O D Y | A R I S T O C R A T S | T H A N | E N O U G H | L I E | Q U I E T | +V E R Y | G O O D | Y O U R | H O N O U R | S A Y S | T R O K E | +M U S T | S T O P | T H A T | F I F T Y | L A S H E S | T R O K E | +O H | M I S T E R | N O R T H | S A Y S | K I R K L A N D | W H Y | D I D | Y O U | S T O P | M E | +H A V E | Y O U | E V E R | B E E N | I N | T H A T | T H A T | P L A C E | I | W A S | I N | L A S T | N I G H T | A S K E D | K I R K L A N D | +A | P R I S O N E R | R E F R A C T O R Y | Y O U R | R E V E R E N C E | S A I D | T H E | W A T C H M A N | +H O L D | O N | T O | M E | M I S S | N A N C Y | S A I D | T H E | G I A N T | I ' M | B I G | E N O U G H | T O | C A R R Y | D O U B L E | +I T ' S | H A R D | F O R | S U C H | Y O U N G | U N S | +K I R K L A N D | J U M P E D | F O R | T H E | J E T T Y | M I S S E D | H I S | F O O T I N G | A N D | F E L L | I N T O | T H E | A R M S | O F | T H E | C H A P L A I N | +L E T | H I M | O U T | C R I E D | N O R T H | A G A I N | S T A M P I N G | H I S | F O O T | +W A N T S | T O | C O M E | O U T | M I S T E R | N O R T H | +D O | H I M | G O O D | C U R S E | H I M | +A B O U T | D A W N | T H E | N E X T | M O R N I N G | M I S T E R | N O R T H | W H O | A M O N G S T | O T H E R | V A G A R I E S | N O T | A P P R O V E D | O F | B Y | H I S | B I S H O P | H A D | A | H A B I T | O F | P R O W L I N G | A B O U T | T H E | P R I S O N | A T | U N O F F I C I A L | H O U R S | W A S | A T T R A C T E D | B Y | A | D I S P U T E | A T | T H E | D O O R | O F | T H E | D O R M I T O R Y | +I | W O N ' T | H A V E | M Y | M E N | K N O C K E D | U P | W I T H | F L O G G I N G | T H E S E | R A S C A L 
S | +J U S T | A S | H E | R E A C H E D | I T | H O W E V E R | T H E | F I G U R E | O F | M I S T E R | N O R T H | R O S E | F R O M | B E H I N D | A | P I L E | O F | S T O N E S | +I F | Y O U | F A L L | W E | M U S T | F A L L | O V E R | Y O U | A N D | T H E N | Y O U ' R E | D O N E | F O R | +I | O R D E R | Y O U | S I R | N O R T H | C R I E D | I N D I G N A N T | +I ' M | N O T | T O | G O | I N | T H E R E | S A Y S | T H E | E X | B A N K | C L E R K | D R A W I N G | B A C K | I N | D I S M A Y | F R O M | T H E | C L O U D | O F | F O U L | F A C E S | W H I C H | L O W E R E D | U P O N | H I M | +Y O U | C A N | G U E S S | W H A T | T H A T | U N H A P P Y | B O Y | H A S | S U F F E R E D | +H E | H A D | B E E N | A | C L E R K | I N | A | B A N K I N G | H O U S E | A N D | W A S | T R A N S P O R T E D | F O R | E M B E Z Z L E M E N T | T H O U G H | B Y | S O M E | G R A V E | D O U B T S | A S | T O | H I S | G U I L T | W E R E | E N T E R T A I N E D | +H E | H A D | H A R D L Y | U T T E R E D | T H E | W O R D S | W H E N | T H E | B O Y | F L U N G | H I M S E L F | B E N E A T H | T H E | L O G | +K I R K L A N D | G H A S T L Y | P A L E | B L E E D I N G | W I T H | H I S | W O O L L E N | S H I R T | T O R N | A N D | H I S | B L U E | E Y E S | W I D E | O P E N | W I T H | T E R R O R | W A S | C L I N G I N G | T O | T H E | B A R S | +V E R Y | S O R R Y | Y O U R | R E V E R E N C E | B U T | Y O U R | R E V E R E N C E | K N O W S | T H A T | I | D A R E N ' T | D O | S U C H | A | T H I N G | +Y O U R | C O U S I N | T H E | W I L D | C O N V O L V U L U S | W H O M | I | L E F T | I N | T H E | F I E L D S | T H I S | M O R N I N G | D O E S | N O | S U C H | T H I N G | I | A S S U R E | Y O U | +D I D | T H A T | L O V E L Y | C R E A T U R E | S U P P O S E | T H A T | N A T U R E | W H O | H A D | D O N E | S O | M U C H | F O R | H E R | T H A T | T H E | F A M E | O F | H E R | B E A U T Y | E X T E N D E D | T H R O U G H O U T | T H E | W O R L D | H A D | Y E T | L E F T | H E R | S O | W E A K | A N D | F E E B L E | T H A T | S H E | C O U L D | N O T | S U P P O R T | H E R S E L F | I N | T H E | P O S I T I O N | M O S T | C A L C U L A T E D | T O | G I V E | H E R | E A S E | A N D | P L E A S U R E | +M Y | Y O U N G | P L A N T S | R E Q U I R E | H E A T | O R | T H E Y | W O U L D | N O T | L I V E | A N D | T H E | P O T S | W E | A R E | K E P T | I N | P R O T E C T | U S | F R O M | T H O S E | C R U E L | W I R E | W O R M S | W H O | D E L I G H T | T O | D E S T R O Y | O U R | R O O T S | +T H E N | T H E | W I N D | T O O K | A N O T H E R | F R O L I C | R O U N D | T H E | G A R D E N | A N D | M A D E | U P | T O | T H E | L A R G E | W H I T E | L I L Y | I N T O | W H O S E | R E F I N E D | E A R | H E | W H I S P E R E D | A | D O U B T | A S | T O | T H E | N E C E S S I T Y | O R | A D V A N T A G E | O F | H E R | T H I C K | P O W E R F U L | S T E M | B E I N G | P R O P P E D | U P | A G A I N S T | A | S T U P I D | U G L Y | S T I C K | +H E | R E A L L Y | G R I E V E D | T O | S E E | I T | +Y O U | S U R E L Y | C A N N O T | S U P P O S E | T H A T | I N | A | N A T U R A L | S T A T E | Y O U | W O U L D | B E | F O R C E D | T O | C L I M B | R E G U L A R L Y | U P | O N E | T A L L | B A R E | S T I C K | S U C H | A S | I | S E E | Y O U | U P O N | N O W | +S T I L L | T H E | R O S E | T R E E | S T O O D | O U T | T H A T | T H E R E | M U S T | B E | S O M E | G R E A T | A D V A N T A 
G E S | I N | A | G A R D E N E R ' S | C A R E | F O R | S H E | C O U L D | N O T | P R E T E N D | T O | B E | I G N O R A N T | O F | H E R | O W N | S U P E R I O R I T Y | T O | A L L | H E R | W I L D | R E L A T I O N S | I N | T H E | W O O D S | +M A K I N G | A | S O R T | O F | E D D Y I N G | C I R C U I T | R O U N D | T H E | G A R D E N | H E | K N O C K E D | O V E R | T H E | C O N V O L V U L U S | P O L E | T O R E | T H E | S T R I P S | F R O M | T H E | S T I C K | T H A T | H E L D | U P | T H E | W H I T E | L I L Y | L O O S E D | A L L | T H E | C A R N A T I O N | F L O W E R S | F R O M | T H E I R | F A S T E N I N G S | B R O K E | T H E | R O S E | T R E E | D O W N | A N D | L E V E L L E D | T H E | S W E E T | P E A S | T O | T H E | G R O U N D | +I | A M | N O T | T H I N K I N G | A B O U T | T H E | G A R D E N | M A M M A | R E P L I E D | T H E | Y O U N G | G I R L | W I T H O U T | L I F T I N G | U P | H E R | F A C E | W E | C A N | P L A N T | N E W | F L O W E R S | A N D | T I E | U P | E V E N | S O M E | O F | T H E S E | A F R E S H | +B U T | F O R | T H E | S I G H T | T H A T | A W A I T E D | H I M | H E | W A S | N O T | P R E P A R E D | A T | A L L | +W H Y | N O T | A L L O W | Y O U R | S I L V E R | T U F T S | T O | L U X U R I A T E | I N | A | N A T U R A L | M A N N E R | +W H A T | A | F U S S | I S | M A D E | A B O U T | Y O U | M Y | D E A R | L I T T L E | F R I E N D S | +I N D E E D | N O T | A | F L O W E R | E S C A P E D | H I S | M I S C H I E V O U S | S U G G E S T I O N S | +M E A N W H I L E | H O W | F A R E D | I T | W I T H | T H E | F L O W E R S | +I N | T H I S | P O S I T I O N | S H E | R E M A I N E D | U N T I L | A | G E N T L E | H A N D | W A S | L A I D | U P O N | H E R | S H O U L D E R | +T H E | M I S T R E S S | H A D | R E T U R N E D | A N D | T H E | Y O U N G | L A D Y | W A S | W I T H | H E R | A N D | H U R R I E D | A T | O N C E | T O | H E R | F A V O U R I T E | G A R D E N | +E C H O E D | T H E | F L O W E R S | T R E M U L O U S L Y | A S | W I T H | A | S O R T | O F | F E A R F U L | P L E A S U R E | T H E Y | A W A I T E D | H I S | A P P R O A C H | +W E E D S | M E A N W H I L E | S P R A N G | U P | A N D | A | D R E A R Y | C O N F U S I O N | R E I G N E D | I N | T H E | O N C E | O R D E R L Y | A N D | B R I L L I A N T | L I T T L E | G A R D E N | +B E F O R E | T H E | D A Y | C L O S E D | T H E | G A R D E N E R | C A M E | W H I S T L I N G | F R O M | H I S | F A R M | W O R K | T O | L O O K | O V E R | H I S | P R E T T Y | C H A R G E S | +O H | T H A T | S H E | W E R E | O N C E | M O R E | C L I M B I N G | U P | T H E | F R I E N D L Y | F I R | P O L E | +T H E | H O N E Y S U C K L E | E S C A P E D | N O | B E T T E R | A N D | T H E | C A R N A T I O N | W A S | R E A D Y | T O | D I E | O F | V E X A T I O N | A T | F I N D I N G | T H A T | H E R | C O V E T E D | F R E E D O M | H A D | L E V E L L E D | H E R | T O | T H E | D I R T | +G O | O N | D O W N | T H E | M O U N T A I N | S A I D | M E R C U R Y | A N D | A S | Y O U | G O | C A S T | T H E | B O N E S | O F | Y O U R | M O T H E R | O V E R | Y O U R | S H O U L D E R S | B E H I N D | Y O U | A N D | W I T H | T H E S E | W O R D S | H E | L E A P E D | I N T O | T H E | A I R | A N D | W A S | S E E N | N O | M O R E | +W E | S H O U L D | L I K E | A B O V E | A L L | T H I N G S | S A I D | D E U C A L I O N | T O | S E E | T H I S | L A N D | F U L L | O F | P E O P L E | O N C 
E | M O R E | F O R | W I T H O U T | N E I G H B O R S | A N D | F R I E N D S | T H E | W O R L D | I S | A | V E R Y | L O N E L Y | P L A C E | I N D E E D | +S U R E L Y | I | D O | N O T | K N O W | S A I D | D E U C A L I O N | +T H E | D A Y | I S | C O M I N G | S A I D | P R O M E T H E U S | W H E N | J U P I T E R | W I L L | S E N D | A | F L O O D | T O | D E S T R O Y | M A N K I N D | F R O M | T H E | E A R T H | +B U T | M E N | K E P T | O N | F I G H T I N G | A N D | R O B B I N G | E V E N | W H I L E | T H E | R A I N | W A S | P O U R I N G | D O W N | A N D | T H E | S E A | W A S | C O M I N G | U P | O V E R | T H E | L A N D | +A F T E R | J U P I T E R | H A D | B O U N D | P R O M E T H E U S | O N | M O U N T | C A U C A S U S | A N D | H A D | S E N T | D I S E A S E S | A N D | C A R E S | I N T O | T H E | W O R L D | M E N | B E C A M E | V E R Y | V E R Y | W I C K E D | +T H E S E | M E N | H E | S A I D | T O | H I S | M I G H T Y | C O M P A N Y | A R E | N O T H I N G | B U T | A | S O U R C E | O F | T R O U B L E | +W H A T | D I D | H E | M E A N | A S K E D | P Y R R H A | +I S | T H E R E | A N Y T H I N G | T H A T | Y O U | W I S H | H E | A S K E D | +B U T | D E U C A L I O N | A N D | P Y R R H A | W E R E | V E R Y | S A D | F O R | T H E Y | K N E W | T H A T | T H E Y | W E R E | T H E | O N L Y | P E R S O N S | W H O | W E R E | L E F T | A L I V E | I N | A L L | T H E | L A N D | +N O | O N E | B U T | D E U C A L I O N | T H E | S O N | O F | P R O M E T H E U S | W A S | R E A D Y | F O R | S U C H | A | S T O R M | +W H E N | A T | L A S T | T H E Y | R E A C H E D | T H E | P L A I N | T H E Y | F O U N D | T H E M S E L V E S | A T | T H E | H E A D | O F | A | N O B L E | C O M P A N Y | O F | H U M A N | B E I N G S | A L L | E A G E R | T O | S E R V E | T H E M | +I N | T H O S E | V E R Y | E A R L Y | T I M E S | T H E R E | W A S | A | M A N | N A M E D | D E U C A L I O N | A N D | H E | W A S | T H E | S O N | O F | P R O M E T H E U S | +O N E | O F | T H E | Y O U N G | F A I R I E S | O V E R H E A R I N G | H E R | A N D | F A N C Y I N G | S H E | M I G H T | W O R K | S O M E | M I S C H I E F | T O | T H E | L I T T L E | B A B Y | W E N T | A N D | H I D | H E R S E L F | B E H I N D | T H E | H A N G I N G S | I N | T H E | H A L L | S O | A S | T O | B E | A B L E | T O | H A V E | T H E | L A S T | W O R D | A N D | U N D O | A N Y | H A R M | T H E | O L D | F A I R Y | M I G H T | W I S H | T O | W O R K | +H E | T U R N E D | T O | S H O W | T H E M | T H E | C A S T L E | B U T | B E H O L D | +H E | P A S S E D | T H R O U G H | O N E | A P A R T M E N T | A F T E R | A N O T H E R | W H E R E | W E R E | L A D I E S | A N D | G E N T L E M E N | A S L E E P | I N | T H E I R | C H A I R S | O R | S T A N D I N G | +T H E Y | T A L K E D | F O R | F O U R | H O U R S | A N D | H A D | N O T | T H E N | S A I D | H A L F | T H A T | W A S | I N | T H E I R | H E A D S | T O | S A Y | +M E A N W H I L E | A L L | T H E | R E S T | O F | T H E | P E O P L E | I N | T H E | C A S T L E | H A D | B E E N | W A K E N E D | A T | T H E | S A M E | M O M E N T | A S | T H E | P R I N C E S S | A N D | T H E Y | W E R E | N O W | E X T R E M E L Y | H U N G R Y | +H E | E N T E R E D | T H E | G U A R D | R O O M | T H E R E | T H E | G U A R D S | S T O O D | D R A W N | U P | I N | L I N E | W I T H | C A R B I N E S | A T | T H E I R | S H O U L D E R S | B U T | T H E Y | W E R E | S O U N D | A S L E E P | 
+N O W | F I F T E E N | Y E A R S | A F T E R | T H E | P R I N C E S S | W A S | B O R N | S H E | W A S | W I T H | T H E | K I N G | A N D | Q U E E N | A T | O N E | O F | T H E I R | C A S T L E S | A N D | A S | S H E | W A S | R U N N I N G | A B O U T | B Y | H E R S E L F | S H E | C A M E | T O | A | L I T T L E | C H A M B E R | A T | T H E | T O P | O F | A | T O W E R | A N D | T H E R E | S A T | A N | H O N E S T | O L D | W O M A N | S P I N N I N G | F O R | S H E | H A D | N E V E R | H E A R D | O F | T H E | K I N G ' S | E D I C T | +T H E | Y O U N G | P R I N C E | A T | T H E S E | W O R D S | F E L T | H I M S E L F | O N | F I R E | +T H E | T U R N | O F | T H E | O L D | F A I R Y | H A D | N O W | C O M E | A N D | S H E | D E C L A R E D | W H I L E | H E R | H E A D | S H O O K | W I T H | M A L I C E | T H A T | T H E | P R I N C E S S | S H O U L D | P I E R C E | H E R | H A N D | W I T H | A | S P I N D L E | A N D | D I E | O F | T H E | W O U N D | +H E | E N T E R E D | A | L A R G E | F O R E C O U R T | A N D | S T O O D | S T I L L | W I T H | A M A Z E M E N T | A N D | A W E | +S H E | H A D | N O | S O O N E R | T A K E N | U P | T H E | S P I N D L E | T H A N | B E I N G | H A S T Y | A N D | C A R E L E S S | S H E | P I E R C E D | H E R | H A N D | W I T H | T H E | P O I N T | O F | I T | A N D | F A I N T E D | A W A Y | +T H E | V I O L I N S | A N D | H A U T | B O Y S | P L A Y E D | O L D | B U T | E X C E L L E N T | P I E C E S | O F | M U S I C | A N D | A F T E R | S U P P E R | T O | L O S E | N O | T I M E | T H E | G R A N D | A L M O N E R | M A R R I E D | T H E | R O Y A L | L O V E R S | I N | T H E | C H A P E L | O F | T H E | C A S T L E | +W H E N | A T | L A S T | T H E | Q U E E N | G A V E | B I R T H | T O | A | D A U G H T E R | T H E | K I N G | W A S | S O | O V E R J O Y E D | T H A T | H E | G A V E | A | G R E A T | C H R I S T E N I N G | F E A S T | T H E | L I K E | O F | W H I C H | H A D | N E V E R | B E F O R E | B E E N | K N O W N | +S C A R C E L Y | H A D | H E | C O M E | T O | T H E | W O O D | W H E N | A L L | T H E | T R E E S | A N D | T H O R N S | W H I C H | H A D | M A D E | S U C H | A N | I M P E N E T R A B L E | T H I C K E T | O P E N E D | O N | O N E | S I D E | A N D | T H E | O T H E R | T O | O F F E R | H I M | A | P A T H | +T H E | L A D Y | I N | W A I T I N G | B E C A M E | V E R Y | I M P A T I E N T | A N D | A T | L E N G T H | A N N O U N C E D | T O | T H E | P R I N C E S S | T H A T | T H E Y | A L L | W A I T E D | F O R | H E R | +T H E N | T H E | P R I N C E | T O O K | T H E | P R I N C E S S | B Y | T H E | H A N D | S H E | W A S | D R E S S E D | I N | G R E A T | S P L E N D O U R | B U T | H E | D I D | N O T | H I N T | T H A T | S H E | L O O K E D | A S | H E | H A D | S E E N | P I C T U R E S | O F | H I S | G R E A T | G R A N D M O T H E R | L O O K | H E | T H O U G H T | H E R | A L L | T H E | M O R E | C H A R M I N G | F O R | T H A T | +H E | K N E W | T H A T | S H E | W O U L D | N O T | A W A K E | F O R | A | H U N D R E D | Y E A R S | +O N E | S A I D | I T | W A S | A N | E N C H A N T E D | C A S T L E | A N O T H E R | T H A T | W I T C H E S | L I V E D | T H E R E | B U T | M O S T | B E L I E V E D | T H A T | I T | W A S | O C C U P I E D | B Y | A | G R E A T | O G R E | W H I C H | C A R R I E D | T H I T H E R | A L L | T H E | C H I L D R E N | H E | C O U L D | C A T C H | A N D | A T E | T H E M | U P | O N E | A T | A | T I 
M E | F O R | N O B O D Y | C O U L D | G E T | A T | H I M | T H R O U G H | T H E | W O O D | +B U T | T H E | F A C E S | O F | T H E | M E N | W E R E | R O S Y | A N D | T H E | G O B L E T S | B Y | T H E M | H A D | A | F E W | D R O P S | O F | W I N E | L E F T | +I T | I S | T R U E | I | C A N N O T | E N T I R E L Y | U N D O | W H A T | M Y | E L D E R | H A S | D O N E | +I N | A P P R O A C H I N G | I T | I T S | S U S P I C I O U S | L O O K I N G | Y E L L O W | S P O T T E D | H O O D | A N D | W A T C H F U L | A T T I T U D E | W I L L | B E | L I K E L Y | T O | M A K E | Y O U | G O | C A U T I O U S L Y | T H R O U G H | T H E | B O G | W H E R E | I T | S T A N D S | A S | I F | Y O U | W E R E | A P P R O A C H I N G | A | D A N G E R O U S | S N A K E | +Y E T | S T R A N G E | T O | S A Y | T H E R E | A R E | D A Y S | E V E N | H E R E | S O M E W H A T | D U L L | L O O K I N G | W H E N | T H E | M O U N T A I N | S E E M S | U N C O M M U N I C A T I V E | S E N D I N G | O U T | N O | A P P R E C I A B L E | I N V I T A T I O N | A S | I F | N O T | A T | H O M E | +A S P L E N I U M | E P I L O B I U M | H E U C H E R A | H A Z E L | D O G W O O D | A N D | A L D E R | M A K E | A | L U X U R I O U S | F R I N G E | A N D | S E T T I N G | A N D | T H E | F O R E S T S | O F | D O U G L A S | S P R U C E | A L O N G | T H E | B A N K S | A R E | T H E | F I N E S T | I | H A V E | E V E R | S E E N | I N | T H E | S I E R R A | +P E R H A P S | T H E | P R O F E S S I O N | O F | D O I N G | G O O D | M A Y | B E | F U L L | B U T | E V E R Y | B O D Y | S H O U L D | B E | K I N D | A T | L E A S T | T O | H I M S E L F | +T H E I R | L O N G | M A S S I V E | E A R S | G I V E | T H E M | A | V E R Y | S T R I K I N G | A P P E A R A N C E | +E V E R Y | C R Y S T A L | D A N C E S | R E S P O N S I V E | T O | T H E | T O U C H E S | O F | T H E | S U N | A N D | C U R R E N T S | O F | S A P | I N | T H E | G R O W I N G | C E L L S | O F | A L L | T H E | V E G E T A T I O N | A R E | E V E R | I N | A | V I T A L | W H I R L | A N D | R U S H | A N D | T H O U G H | M A N Y | F E E T | A N D | W I N G S | A R E | F O L D E D | H O W | M A N Y | A R E | A S T I R | +T H E | V I V I D | G R E E N | O F | T H E | B O U L D E R S | B E N E A T H | T H E | W A T E R | I S | V E R Y | S T R I K I N G | A N D | C O L O R S | T H E | E N T I R E | S T R E A M | W I T H | T H E | E X C E P T I O N | O F | T H E | P O R T I O N S | B R O K E N | I N T O | F O A M | +T H E | G R E A T | W I L D S | O F | O U R | C O U N T R Y | O N C E | H E L D | T O | B E | B O U N D L E S S | A N D | I N E X H A U S T I B L E | A R E | B E I N G | R A P I D L Y | I N V A D E D | A N D | O V E R R U N | I N | E V E R Y | D I R E C T I O N | A N D | E V E R Y T H I N G | D E S T R U C T I B L E | I N | T H E M | I S | B E I N G | D E S T R O Y E D | +B U T | I T | I S | F A R | B E T T E R | T O | G O | A F O O T | +G O | Q U I E T L Y | A L O N E | N O | H A R M | W I L L | B E F A L L | Y O U | +A S | T H E | L I F E | B L O O D | O F | T H E | L A N D S C A P E S | T H E | B E S T | O F | T H E | W I L D E R N E S S | C O M E S | T O | T H E I R | B A N K S | A N D | N O T | O N E | D U L L | P A S S A G E | I S | F O U N D | I N | A L L | T H E I R | E V E N T F U L | H I S T O R I E S | +T H U S | T H E | S H A S T A | R I V E R | I S S U E S | F R O M | A | L A R G E | L A K E | L I K E | S P R I N G | I N | S H A S T A | V A L L E Y | A N D | A B O U T | T W O | T H I 
R D S | O F | T H E | V O L U M E | O F | T H E | M C | C L O U D | G U S H E S | F O R T H | I N | A | G R A N D | S P R I N G | O N | T H E | E A S T | S I D E | O F | T H E | M O U N T A I N | A | F E W | M I L E S | B A C K | F R O M | I T S | I M M E D I A T E | B A S E | +W H I L E | T R A V E L I N G | W I T H | A | C O M P A N Y | O F | H U N T E R S | I | S A W | A B O U T | F I F T Y | I N | O N E | F L O C K | +S H O U L D | T H E | V O L U M E | O F | T H E | S T R E A M | W H E R E | Y O U | S T R I K E | I T | S E E M | S M A L L | T H E N | Y O U | W I L L | K N O W | T H A T | Y O U | A R E | A B O V E | T H E | S P R I N G | I F | L A R G E | N E A R L Y | E Q U A L | T O | I T S | V O L U M E | A T | I T S | C O N F L U E N C E | W I T H | T H E | P I T T | R I V E R | T H E N | Y O U | A R E | B E L O W | I T | A N D | I N | E I T H E R | C A S E | H A V E | O N L Y | T O | F O L L O W | T H E | R I V E R | U P | O R | D O W N | U N T I L | Y O U | C O M E | T O | I T | +B U T | N E I T H E R | T H E | G L O R I F I E D | W O O D S | O N | T H E | O N E | H A N D | N O R | T H E | L A K E | O N | T H E | O T H E R | C O U L D | A T | F I R S T | H O L D | T H E | E Y E | +T H E Y | A R E | B R O A D | R U G G E D | C R E V A S S E D | C L O U D L I K E | M A S S E S | O F | D O W N | G R I N D I N G | I C E | P O U R I N G | F O R T H | S T R E A M S | O F | M U D D Y | W A T E R | A S | M E A S U R E S | O F | T H E | W O R K | T H E Y | A R E | D O I N G | I N | S C U L P T U R I N G | T H E | R O C K S | B E N E A T H | T H E M | V E R Y | U N L I K E | T H E | L O N G | M A J E S T I C | G L A C I E R S | O F | A L A S K A | T H A T | R I V E R L I K E | G O | W I N D I N G | D O W N | T H E | V A L L E Y S | T H R O U G H | T H E | F O R E S T S | T O | T H E | S E A | +M O U N T | B R E M E R | I S | T H E | M O S T | N O T E D | S T R O N G H O L D | O F | T H E | S H E E P | I N | T H E | W H O L E | S H A S T A | R E G I O N | +T R A C I N G | T H I S | W I L D | C H A N G I N G | C H A N N E L | G O R G E | G U L L Y | O R | C A N Y O N | T H E | S E C T I O N S | W I L L | S H O W | M O U N T | S H A S T A | A S | A | H U G E | P A L I M P S E S T | C O N T A I N I N G | T H E | R E C O R D S | L A Y E R | U P O N | L A Y E R | O F | S T R A N G E L Y | C O N T R A S T E D | E V E N T S | I N | I T S | F I E R Y | I C Y | H I S T O R Y | +S H A S T A | R A M B L E S | A N D | M O D O C | M E M O R I E S | +O N E | B L A N K E T | W I L L | B E | E N O U G H | T O | C A R R Y | O R | Y O U | M A Y | F O R E G O | T H E | P L E A S U R E | A N D | B U R D E N | A L T O G E T H E R | A S | W O O D | F O R | F I R E S | I S | E V E R Y W H E R E | A B U N D A N T | +U N D E R | C E R T A I N | C O N D I T I O N S | Y O U | M A Y | H E A R | T H E | R O A R | O F | T H E | W A T E R | R U S H I N G | F R O M | T H E | R O C K | A T | A | D I S T A N C E | O F | H A L F | A | M I L E | O R | E V E N | M O R E | O R | Y O U | M A Y | N O T | H E A R | I T | U N T I L | W I T H I N | A | F E W | R O D S | +T H E | B I G | M E A D O W S | L I E | N E A R | T H E | F O O T | O F | L A S S E N ' S | B U T T E | A | B E A U T I F U L | S P A C I O U S | B A S I N | S E T | I N | T H E | H E A R T | O F | T H E | R I C H L Y | F O R E S T E D | M O U N T A I N S | S C A R C E L Y | S U R P A S S E D | I N | T H E | G R A N D E U R | O F | I T S | S U R R O U N D I N G S | B Y | T A H O E | +T H E N | D A R K N E S S | L I K E | D E A T H | +A T | S U C H | T I M E | I T S 
| H E I G H T | S E E M S | M U C H | L E S S | A S | I F | C R O U C H I N G | A N D | W E A R Y | I T | W E R E | T A K I N G | R E S T | +T H E | A S C E N T | O F | L A S S E N ' S | B U T T E | I S | A N | E A S Y | W A L K | A N D | T H E | V I E W S | F R O M | T H E | S U M M I T | A R E | E X T R E M E L Y | T E L L I N G | +S L I G H T | R A I N S T O R M S | A R E | L I K E L Y | T O | B E | E N C O U N T E R E D | I N | A | T R I P | R O U N D | T H E | M O U N T A I N | B U T | O N E | M A Y | E A S I L Y | F I N D | S H E L T E R | B E N E A T H | W E L L | T H A T C H E D | T R E E S | T H A T | S H E D | T H E | R A I N | L I K E | A | R O O F | +O N L Y | A | L I T T L E | F O O D | W I L L | B E | R E Q U I R E D | +A R C T I C | B E A U T Y | A N D | D E S O L A T I O N | W I T H | T H E I R | B L E S S I N G S | A N D | D A N G E R S | A L L | M A Y | B E | F O U N D | H E R E | T O | T E S T | T H E | E N D U R A N C E | A N D | S K I L L | O F | A D V E N T U R O U S | C L I M B E R S | B U T | F A R | B E T T E R | T H A N | C L I M B I N G | T H E | M O U N T A I N | I S | G O I N G | A R O U N D | I T S | W A R M | F E R T I L E | B A S E | E N J O Y I N G | I T S | B O U N T I E S | L I K E | A | B E E | C I R C L I N G | A R O U N D | A | B A N K | O F | F L O W E R S | +T H E | L O N G | G R A Y | S L O P E S | L E A D I N G | U P | T O | T H E | G L A C I E R | S E E M | R E M A R K A B L Y | S M O O T H | A N D | U N B R O K E N | +T W O | O R | T H R E E | M I L E S | F A R T H E R | O N | I S | T H E | M A I N | S T R O N G H O L D | O F | T H E | M O D O C S | H E L D | B Y | T H E M | S O | L O N G | A N D | D E F I A N T L Y | A G A I N S T | A L L | T H E | S O L D I E R S | T H A T | C O U L D | B E | B R O U G H T | T O | T H E | A T T A C K | +T H U S | O N E | S A U N T E R S | O N | A N D | O N | I N | T H E | G L O R I O U S | R A D I A N C E | I N | U T T E R | P E A C E | A N D | F O R G E T F U L N E S S | O F | T I M E | +H E R E | Y O U | S T R I K E | T H E | O L D | E M I G R A N T | R O A D | W H I C H | L E A D S | O V E R | T H E | L O W | D I V I D E | T O | T H E | E A S T E R N | S L O P E S | O F | T H E | M O U N T A I N | +T H E | M U L E | D E E R | A R E | N E A R L Y | A S | H E A V Y | +E V E R Y | L A N D S C A P E | L O W | A N D | H I G H | S E E M S | D O O M E D | T O | B E | T R A M P L E D | A N D | H A R R I E D | +M O S T | O F | T H E | D R A I N A G E | O F | T H E | G L A C I E R | V A N I S H E S | A T | O N C E | I N | T H E | P O R O U S | R O C K S | T O | R E A P P E A R | I N | S P R I N G S | I N | T H E | D I S T A N T | V A L L E Y | A N D | I T | I S | O N L Y | I N | T I M E | O F | F L O O D | T H A T | T H E | C H A N N E L | C A R R I E S | M U C H | W A T E R | T H E N | T H E R E | A R E | S E V E R A L | F I N E | F A L L S | I N | T H E | G O R G E | S I X | H U N D R E D | F E E T | O R | M O R E | I N | H E I G H T | +L A R G E | F L O C K S | D W E L L | H E R E | F R O M | Y E A R | T O | Y E A R | W I N T E R | A N D | S U M M E R | D E S C E N D I N G | O C C A S I O N A L L Y | I N T O | T H E | A D J A C E N T | S A G E | P L A I N S | A N D | L A V A | B E D S | T O | F E E D | B U T | E V E R | R E A D Y | T O | T A K E | R E F U G E | I N | T H E | J A G G E D | C R A G S | O F | T H E I R | M O U N T A I N | A T | E V E R Y | A L A R M | +R E G A I N I N G | T H E | L O W | G R O U N D | A T | T H E | B A S E | O F | T H E | M O U N T A I N | A N D | H O L D I N G | O N | I N | Y O U R | 
G R A N D | O R B I T | Y O U | P A S S | T H R O U G H | A | B E L T | O F | J U N I P E R | W O O D S | C A L L E D | T H E | C E D A R S | T O | S H E E P | R O C K | A T | T H E | F O O T | O F | T H E | S H A S T A | P A S S | +I T | I S | L I N E D | W I T H | E M E R A L D | A L G A E | A N D | M O S S E S | A N D | S H A D E D | W I T H | A L D E R | W I L L O W | A N D | T H O R N | B U S H E S | W H I C H | G I V E | I T | A | F I N E | S E T T I N G | +T R A C I N G | R I V E R S | T O | T H E I R | F O U N T A I N S | M A K E S | T H E | M O S T | C H A R M I N G | O F | T R A V E L S | +A | T H O U S A N D | T H O U S A N D | V O I C E S | A R E | H E A R D | B U T | S O | F I N E L Y | B L E N D E D | T H E Y | S E E M | A | P A R T | O F | T H E | N I G H T | I T S E L F | A N D | M A K E | A | D E E P E R | S I L E N C E | +T H E | L O F T Y | I C Y | S H A S T A | T O W E R I N G | H I G H | A B O V E | A L L | S E E M S | B U T | A N | H O U R ' S | W A L K | F R O M | Y O U | T H O U G H | T H E | D I S T A N C E | I N | A N | A I R | L I N E | I S | A B O U T | S I X T Y | M I L E S | +T H E N | T H E | S H I N I N G | O F | T H E | W E T | L E A V E S | I S | D E L I G H T F U L | A N D | T H E | S T E A M Y | F R A G R A N C E | A N D | T H E | B U R S T | O F | B I R D | S O N G | F R O M | A | M U L T I T U D E | O F | T H R U S H E S | A N D | F I N C H E S | A N D | W A R B L E R S | T H A T | H A V E | N E S T S | I N | T H E | C H A P A R R A L | +T H E N | F E L L | T H E | G L O A M I N G | M A K I N G | E V E R Y T H I N G | S T I L L | M O R E | F O R B I D D I N G | A N D | M Y S T E R I O U S | +T R A C I N G | T H E | M C | C L O U D | T O | I T S | H I G H E S T | S P R I N G S | A N D | O V E R | T H E | D I V I D E | T O | T H E | F O U N T A I N S | O F | F A L L | R I V E R | N E A R | F O R T | C R O O K | T H E N C E | D O W N | T H A T | R I V E R | T O | I T S | C O N F L U E N C E | W I T H | T H E | P I T T | O N | F R O M | T H E R E | T O | T H E | V O L C A N I C | R E G I O N | A B O U T | L A S S E N ' S | B U T T E | T H R O U G H | T H E | B I G | M E A D O W S | A M O N G | T H E | S O U R C E S | O F | T H E | F E A T H E R | R I V E R | A N D | D O W N | T H R O U G H | F O R E S T S | O F | S U G A R | P I N E | T O | T H E | F E R T I L E | P L A I N S | O F | C H I C O | T H I S | I S | A | G L O R I O U S | S A U N T E R | A N D | I M P O S E S | N O | H A R D S H I P | +I T | I S | T H R E E | O R | F O U R | M I L E S | L O N G | A N D | T E R M I N A T E S | A T | A N | E L E V A T I O N | O F | A B O U T | N I N E | T H O U S A N D | F I V E | H U N D R E D | F E E T | A B O V E | S E A | L E V E L | I N | M O R A I N E | S P R I N K L E D | I C E | C L I F F S | S I X T Y | F E E T | H I G H | +I N | S E T T I N G | O U T | F R O M | S T R A W B E R R Y | V A L L E Y | B Y | B E A R I N G | O F F | T O | T H E | N O R T H W E S T W A R D | A | F E W | M I L E S | Y O U | M A Y | S E E | +T H E | D U C K S | L E S S | W A R Y | K E P T | T H E I R | P L A C E S | M E R E L Y | S W I M M I N G | I N | A N D | O U T | T H R O U G H | O P E N I N G S | I N | T H E | R U S H E S | R I P P L I N G | T H E | G L A S S Y | W A T E R | A N D | R A I S I N G | S P A N G L E S | I N | T H E I R | W A K E | +B U T | D O N | Q U I X O T E | W H O M | H I S | T H O U G H T S | F A R | M O R E | T H A N | H U N G E R | K E P T | A W A K E | C O U L D | N O T | C L O S E | A N | E Y E | A N D | R O A M E D | I N | F A N C Y | T O | A N D | F 
R O | T H R O U G H | A L L | S O R T S | O F | P L A C E S | +G I V E | M E | M Y | H O R S E | A N D | A R M S | A N D | W A I T | F O R | M E | H E R E | I | W I L L | G O | I N | Q U E S T | O F | T H I S | K N I G H T | A N D | D E A D | O R | A L I V E | I | W I L L | M A K E | H I M | K E E P | H I S | W O R D | P L I G H T E D | T O | S O | G R E A T | B E A U T Y | +D O N | Q U I X O T E | W A S | O N | F O O T | W I T H | H I S | H O R S E | U N B R I D L E D | A N D | H I S | L A N C E | L E A N I N G | A G A I N S T | A | T R E E | A N D | I N | S H O R T | C O M P L E T E L Y | D E F E N C E L E S S | H E | T H O U G H T | I T | B E S T | T H E R E F O R E | T O | F O L D | H I S | A R M S | A N D | B O W | H I S | H E A D | A N D | R E S E R V E | H I M S E L F | F O R | A | M O R E | F A V O U R A B L E | O C C A S I O N | A N D | O P P O R T U N I T Y | +S E E I N G | T H I S | S A N C H O | G O T | U P | A N D | G R A P P L I N G | W I T H | H I S | M A S T E R | H E | G R I P P E D | H I M | W I T H | A L L | H I S | M I G H T | I N | H I S | A R M S | G I V I N G | H I M | A | T R I P | W I T H | T H E | H E E L | S T R E T C H E D | H I M | O N | T H E | G R O U N D | O N | H I S | B A C K | A N D | P R E S S I N G | H I S | R I G H T | K N E E | O N | H I S | C H E S T | H E L D | H I S | H A N D S | I N | H I S | O W N | S O | T H A T | H E | C O U L D | N E I T H E R | M O V E | N O R | B R E A T H E | +N O B O D Y | N E E D | H A V E | A N Y | D O U B T | A B O U T | T H A T | S A I D | S A N C H O | F O R | M Y | M A S T E R | H A S | A | V E R Y | H A P P Y | K N A C K | O F | M A T C H M A K I N G | I T ' S | N O T | M A N Y | D A Y S | S I N C E | H E | F O R C E D | A N O T H E R | M A N | T O | M A R R Y | W H O | I N | T H E | S A M E | W A Y | B A C K E D | O U T | O F | H I S | P R O M I S E | T O | A N O T H E R | M A I D E N | A N D | I F | I T | H A D | N O T | B E E N | F O R | H I S | P E R S E C U T O R S | T H E | E N C H A N T E R S | C H A N G I N G | T H E | M A N ' S | P R O P E R | S H A P E | I N T O | A | L A C Q U E Y ' S | T H E | S A I D | M A I D E N | W O U L D | N O T | B E | O N E | T H I S | M I N U T E | +W H A T | A R E | Y O U | T A L K I N G | A B O U T | M A N | +T H E | W O U N D E D | G E N T L E M A N | O P E N E D | H I S | A L L | B U T | C L O S E D | E Y E S | A N D | R E C O G N I S I N G | C L A U D I A | S A I D | I | S E E | C L E A R L Y | F A I R | A N D | M I S T A K E N | L A D Y | T H A T | I T | I S | T H O U | T H A T | H A S T | S L A I N | M E | A | P U N I S H M E N T | N O T | M E R I T E D | O R | D E S E R V E D | B Y | M Y | F E E L I N G S | T O W A R D S | T H E E | F O R | N E V E R | D I D | I | M E A N | T O | N O R | C O U L D | I | W R O N G | T H E E | I N | T H O U G H T | O R | D E E D | +O N | P E R C E I V I N G | T H I S | C L A U D I A | W H E N | S H E | H A D | C O N V I N C E D | H E R S E L F | T H A T | H E R | B E L O V E D | H U S B A N D | W A S | N O | M O R E | R E N T | T H E | A I R | W I T H | H E R | S I G H S | A N D | M A D E | T H E | H E A V E N S | R I N G | W I T H | H E R | L A M E N T A T I O N S | S H E | T O R E | H E R | H A I R | A N D | S C A T T E R E D | I T | T O | T H E | W I N D S | S H E | B E A T | H E R | F A C E | W I T H | H E R | H A N D S | A N D | S H O W E D | A L L | T H E | S I G N S | O F | G R I E F | A N D | S O R R O W | T H A T | C O U L D | B E | C O N C E I V E D | T O | C O M E | F R O M | A N | A F F L I C T E D | H E A R T | +D O S T | T H 
O U | R E V O L T | A G A I N S T | T H Y | M A S T E R | A N D | N A T U R A L | L O R D | +D O N | Q U I X O T E | D I D | S O | A N D | A S K E D | H I M | W H A T | H A D | H A P P E N E D | T O | H I M | A N D | W H A T | H E | W A S | A F R A I D | O F | +O N E | O F | T H E | S Q U I R E S | O B S E R V E D | I N | H I S | M I X T U R E | O F | G A S C O N | A N D | C A T A L A N | T H I S | C A P T A I N | O F | O U R S | W O U L D | M A K E | A | B E T T E R | F R I A R | T H A N | H I G H W A Y M A N | I F | H E | W A N T S | T O | B E | S O | G E N E R O U S | A N O T H E R | T I M E | L E T | I T | B E | W I T H | H I S | O W N | P R O P E R T Y | A N D | N O T | O U R S | +S A N C H O | R E P L I E D | T H A T | A L L | T H E | T R E E S | W E R E | F U L L | O F | M E N ' S | F E E T | A N D | L E G S | +W H A T | L E D | M E | I N T O | I T | W A S | A | C E R T A I N | T H I R S T | F O R | V E N G E A N C E | W H I C H | I S | S T R O N G | E N O U G H | T O | D I S T U R B | T H E | Q U I E T E S T | H E A R T S | +D O S T | T H O U | R I S E | A G A I N S T | H I M | W H O | G I V E S | T H E E | H I S | B R E A D | +T H E | C A P T A I N S | S H O W E D | P L A I N L Y | T H E | C O N C E R N | T H E Y | F E L T | T H E | R E G E N T ' S | L A D Y | W A S | D O W N C A S T | A N D | T H E | P I L G R I M S | D I D | N O T | A T | A L L | E N J O Y | S E E I N G | T H E I R | P R O P E R T Y | C O N F I S C A T E D | +O | H U S B A N D | W H O S E | U N H A P P Y | F A T E | I N | B E I N G | M I N E | H A T H | B O R N E | T H E E | F R O M | T H E | M A R R I A G E | B E D | T O | T H E | G R A V E | +M A S T E R | A N D | M A N | D I S M O U N T E D | F R O M | T H E I R | B E A S T S | A N D | A S | S O O N | A S | T H E Y | H A D | S E T T L E D | T H E M S E L V E S | A T | T H E | F O O T | O F | T H E | T R E E S | S A N C H O | W H O | H A D | H A D | A | G O O D | N O O N T I D E | M E A L | T H A T | D A Y | L E T | H I M S E L F | W I T H O U T | M O R E | A D O | P A S S | T H E | G A T E S | O F | S L E E P | +C R U E L | R E C K L E S S | W O M A N | S H E | C R I E D | H O W | E A S I L Y | W E R T | T H O U | M O V E D | T O | C A R R Y | O U T | A | T H O U G H T | S O | W I C K E D | +C L A U D I A | W O U L D | N O T | O N | A N Y | A C C O U N T | A L L O W | H I M | T O | A C C O M P A N Y | H E R | A N D | T H A N K I N G | H I M | F O R | H I S | O F F E R S | A S | W E L L | A S | S H E | C O U L D | T O O K | L E A V E | O F | H I M | I N | T E A R S | +T H E | S E R V A N T S | W E P T | C L A U D I A | S W O O N E D | A W A Y | A G A I N | A N D | A G A I N | A N D | T H E | W H O L E | P L A C E | S E E M E D | A | F I E L D | O F | S O R R O W | A N D | A N | A B O D E | O F | M I S F O R T U N E | +A N D | I F | Y O U | H A V E | A N Y | D E S I R E | T O | S H O R T E N | T H E | J O U R N E Y | A N D | P U T | Y O U R S E L F | E A S I L Y | I N | T H E | W A Y | O F | S A L V A T I O N | C O M E | W I T H | M E | A N D | I | W I L L | S H O W | Y O U | H O W | T O | B E C O M E | A | K N I G H T | E R R A N T | A | C A L L I N G | W H E R E I N | S O | M A N Y | H A R D S H I P S | A N D | M I S H A P S | A R E | E N C O U N T E R E D | T H A T | I F | T H E Y | B E | T A K E N | A S | P E N A N C E S | T H E Y | W I L L | L O D G E | Y O U | I N | H E A V E N | I N | A | T R I C E | +H E | S A W | T H A T | H I S | S Q U I R E S | F O R | S O | T H E Y | C A L L | T H O S E | W H O | F O L L O W | T H A T | T R A D E | W E R E | A B O U 
T | T O | R I F L E | S A N C H O | P A N Z A | B U T | H E | O R D E R E D | T H E M | T O | D E S I S T | A N D | W A S | A T | O N C E | O B E Y E D | S O | T H E | G I R D L E | E S C A P E D | +T H E | R E G E N T ' S | L A D Y | O R D E R E D | O N E | O F | H E R | S E R V A N T S | T O | G I V E | T H E | E I G H T Y | C R O W N S | T H A T | H A D | B E E N | A S S E S S E D | A S | H E R | S H A R E | A T | O N C E | F O R | T H E | C A P T A I N S | H A D | A L R E A D Y | P A I D | D O W N | T H E I R | S I X T Y | +S A N C H O | S A I D | T H E Y | H A D | B U T | T H A T | T H R E E | K E R C H I E F S | T H A T | W E R E | W O R T H | T H R E E | C I T I E S | W E R E | M I S S I N G | +H O W | N O W | T R A I T O R | E X C L A I M E D | D O N | Q U I X O T E | +H E | T R E M B L E D | W I T H | F E A R | A N D | M A D E | F O R | A N O T H E R | T R E E | W H E R E | T H E | V E R Y | S A M E | T H I N G | H A P P E N E D | T O | H I M | A N D | H E | F E L L | A | S H O U T I N G | C A L L I N G | U P O N | D O N | Q U I X O T E | T O | C O M E | A N D | P R O T E C T | H I M | +T H E Y | W E R E | A L L | T A K E N | A B A C K | A N D | N O T | O N E | O F | T H E M | D A R E D | T O | U T T E R | A | W O R D | S U C H | D E F E R E N C E | D I D | T H E Y | P A Y | H I M | +T H E Y | M A D E | H A S T E | T O | O V E R T A K E | T H E M | W H I C H | A S | T H E | P A R T Y | M O V E D | S L O W L Y | T H E Y | W E R E | A B L E | T O | D O | W I T H | E A S E | +D U L C I N E A | I S | P E R I S H I N G | T H O U | A R T | L I V I N G | O N | R E G A R D L E S S | I | A M | D Y I N G | O F | H O P E | D E F E R R E D | T H E R E F O R E | U N T R U S S | T H Y S E L F | W I T H | A | G O O D | W I L L | F O R | M I N E | I T | I S | H E R E | I N | T H I S | R E T I R E D | S P O T | T O | G I V E | T H E E | A T | L E A S T | T W O | T H O U S A N D | L A S H E S | +A T | T H I S | I N S T A N T | O N E | O R | T W O | O F | T H O S E | S Q U I R E S | W H O | W E R E | P O S T E D | A S | S E N T I N E L S | O N | T H E | R O A D S | T O | W A T C H | W H O | C A M E | A L O N G | T H E M | A N D | R E P O R T | W H A T | P A S S E D | T O | T H E I R | C H I E F | C A M E | U P | A N D | S A I D | S E N O R | T H E R E | I S | A | G R E A T | T R O O P | O F | P E O P L E | N O T | F A R | O F F | C O M I N G | A L O N G | T H E | R O A D | T O | B A R C E L O N A | +A N D | N O W | T H E | S Q U I R E S | D E S P A T C H E D | T O | M A K E | T H E | P R I Z E | C A M E | U P | B R I N G I N G | W I T H | T H E M | T W O | G E N T L E M E N | O N | H O R S E B A C K | T W O | P I L G R I M S | O N | F O O T | A N D | A | C O A C H | F U L L | O F | W O M E N | W I T H | S O M E | S I X | S E R V A N T S | O N | F O O T | A N D | O N | H O R S E B A C K | I N | A T T E N D A N C E | O N | T H E M | A N D | A | C O U P L E | O F | M U L E T E E R S | W H O M | T H E | G E N T L E M E N | H A D | W I T H | T H E M | +A T | O N E | M O M E N T | I T | S E E M E D | T O | H I M | T H A T | H E | W A S | I N | T H E | C A V E | O F | M O N T E S I N O S | A N D | S A W | D U L C I N E A | T R A N S F O R M E D | I N T O | A | C O U N T R Y | W E N C H | S K I P P I N G | A N D | M O U N T I N G | U P O N | H E R | S H E | A S S | A G A I N | T H A T | T H E | W O R D S | O F | T H E | S A G E | M E R L I N | W E R E | S O U N D I N G | I N | H I S | E A R S | S E T T I N G | F O R T H | T H E | C O N D I T I O N S | T O | B E | O B S E R V E D | A N D | T H E | E X E R T 
I O N S | T O | B E | M A D E | F O R | T H E | D I S E N C H A N T M E N T | O F | D U L C I N E A | +W H O | I S | T O U C H I N G | M E | A N D | U N T R U S S I N G | M E | +H E | W A S | M O U N T E D | U P O N | A | P O W E R F U L | H O R S E | A N D | H A D | O N | A | C O A T | O F | M A I L | W I T H | F O U R | O F | T H E | P I S T O L S | T H E Y | C A L L | P E T R O N E L S | I N | T H A T | C O U N T R Y | A T | H I S | W A I S T | +S A N C H O | R O S E | A N D | R E M O V E D | S O M E | D I S T A N C E | F R O M | T H E | S P O T | B U T | A S | H E | W A S | A B O U T | T O | P L A C E | H I M S E L F | L E A N I N G | A G A I N S T | A N O T H E R | T R E E | H E | F E L T | S O M E T H I N G | T O U C H | H I S | H E A D | A N D | P U T T I N G | U P | H I S | H A N D S | E N C O U N T E R E D | S O M E B O D Y ' S | T W O | F E E T | W I T H | S H O E S | A N D | S T O C K I N G S | O N | T H E M | +D O N | Q U I X O T E | G A V E | H I S | P R O M I S E | A N D | S W O R E | B Y | T H E | L I F E | O F | H I S | T H O U G H T S | N O T | T O | T O U C H | S O | M U C H | A S | A | H A I R | O F | H I S | G A R M E N T S | A N D | T O | L E A V E | H I M | E N T I R E L Y | F R E E | A N D | T O | H I S | O W N | D I S C R E T I O N | T O | W H I P | H I M S E L F | W H E N E V E R | H E | P L E A S E D | +I T | I S | N O T | T R U E | T H E N | S A I D | C L A U D I A | T H A T | T H O U | W E R T | G O I N G | T H I S | M O R N I N G | T O | M A R R Y | L E O N O R A | T H E | D A U G H T E R | O F | T H E | R I C H | B A L V A S T R O | +H E | S A W | M E | H E | P A I D | C O U R T | T O | M E | I | L I S T E N E D | T O | H I M | A N D | U N K N O W N | T O | M Y | F A T H E R | I | L O V E D | H I M | F O R | T H E R E | I S | N O | W O M A N | H O W E V E R | S E C L U D E D | S H E | M A Y | L I V E | O R | C L O S E | S H E | M A Y | B E | K E P T | W H O | W I L L | N O T | H A V E | O P P O R T U N I T I E S | A N D | T O | S P A R E | F O R | F O L L O W I N G | H E R | H E A D L O N G | I M P U L S E S | +C L A U D I A | T O L D | H I M | S H E | M E A N T | T O | G O | T O | A | M O N A S T E R Y | O F | W H I C H | A N | A U N T | O F | H E R S | W A S | A B B E S S | W H E R E | S H E | I N T E N D E D | T O | P A S S | H E R | L I F E | W I T H | A | B E T T E R | A N D | E V E R L A S T I N G | S P O U S E | +S A I D | O N E | O F | T H E | B Y S T A N D E R S | I | H A V E | G O T | T H E M | A N D | T H E Y | A R E | N O T | W O R T H | T H R E E | R E A L S | +I N | A | W O R D | H E | P L E D G E D | H I M S E L F | T O | B E | M I N E | A N D | I | P R O M I S E D | T O | B E | H I S | W I T H O U T | C A R R Y I N G | M A T T E R S | A N Y | F U R T H E R | +O N E | O F | H I S | W A I T E R S | P H I L | T Y S O N | W A S | O N E | O F | T H E | E A R L I E R | O N E S | T O | G O | B A C K | I N T O | T H E | B U R N E D | D I S T R I C T | T O | B E G I N | B U S I N E S S | A N D | H E | O P E N E D | A | R E S T A U R A N T | C A L L E D | T H E | D E L | M O N T E | I N | P O W E L L | S T R E E T | N E A R | M A R K E T | B U T | I T | W A S | T O O | E A R L Y | F O R | S U C C E S S | A N D | C L O S E D | A F T E R | A | S H O R T | C A R E E R | +H E R E | T H E R E | I S | A L W A Y S | G O O D | M U S I C | A N D | F O O D | W E L L | C O O K E D | A N D | W E L L | S E R V E D | A N D | A L W A Y S | A | L I V E L Y | C R O W D | D U R I N G | T H E | L U N C H E O N | D I N N E R | A N D | A F T E R | T H E A T R E | H O U R S | 
T H E | R O O M | I S | N O T | L A R G E | B U T | I T S | D I M E N S I O N S | A R E | G R E A T L Y | M A G N I F I E D | O W I N G | T O | T H E | C O V E R I N G | O F | M I R R O R S | W H I C H | L I N E | T H E | W A L L S | +O F | C O U R S E | T H E S E | A R E | N O T | T H E | E N T I R E | M E N U S | B U T | O F | A L L | T H E | W E L L | P R E P A R E D | D I S H E S | T H E S E | A R E | T H E I R | B E S T | +T O | T H E | P I C K L E | A D D | T W O | L A R G E | O N I O N S | C U T | I N | Q U A R T E R S | T W O | F R E S H | C A R R O T S | A N D | A B O U T | O N E | O U N C E | O F | M I X E D | W H O L E | A L L S P I C E | B L A C K | P E P P E R S | C L O V E S | A N D | B A Y | L E A V E S | +S T R A I N | T H E | S A U C E | T H R O U G H | A | F I N E | C O L L A N D E R | A N D | A D D | A | F E W | R A I S I N S | A | P I E C E | O F | H O N E Y | C A K E | O R | G I N G E R | S N A P S | A N D | T H E | M E A T | O F | O N E | F R E S H | T O M A T O | +T H E | P O O D L E | D O G | H A S | A | H O T E L | A T T A C H M E N T | W H E R E | O N E | M A Y | G E T | R O O M S | O R | F U L L | A P A R T M E N T S | +P U T | I N T O | T H E | O V E N | A G A I N | A N D | C O O K | F O R | H A L F | A N | H O U R | B A S T I N G | F R E Q U E N T L Y | W I T H | T H E | O R I G I N A L | B R I N E | +W E | F I N A L L Y | G O T | H I M | T O | S E L E C T | T H E | O N E | P R I Z E D | A B O V E | A L L | O T H E R S | A N D | T H I S | I S | W H A T | C H E F | S C H E I L E R | G A V E | U S | +T H E | S P E C I A L T Y | O F | T H E | H O F | B R A U | I S | A B A L O N E ' S | A N D | T H E Y | H A V E | A S | A | F E A T U R E | T H I S | S H E L L | F I S H | C O O K E D | I N | S E V E R A L | W A Y S | +T H E | R E S T A U R A N T S | O F | T H E | P R E S E N T | D A Y | T H A T | A P P R O A C H | N E A R E S T | T H E | O L D | B O H E M I A N | R E S T A U R A N T S | O F | P R E | F I R E | D A Y S | O F | T H E | F R E N C H | C L A S S | A R E | J A C K ' S | I N | S A C R A M E N T O | S T R E E T | B E T W E E N | M O N T G O M E R Y | A N D | K E A R N Y | F E L I X | I N | M O N T G O M E R Y | S T R E E T | B E T W E E N | C L A Y | A N D | W A S H I N G T O N | A N D | T H E | P O O D L E | D O G | B E R G E Z | F R A N K S | I N | B U S H | S T R E E T | B E T W E E N | K E A R N Y | A N D | G R A N T | A V E N U E | +U N D E R | O R D I N A R Y | C I R C U M S T A N C E S | T H E | A B A L O N E | I S | T O U G H | A N D | U N P A L A T A B L E | B U T | A F T E R | T H E | D E F T | M A N I P U L A T I O N | O F | H E R B E R T | T H E Y | A R E | T E N D E R | A N D | M A K E | A | F I N E | D I S H | E I T H E R | F R I E D | A S | C H O W D E R | O R | A | L A | N E W B E R G | +W H E N | D O N E | T A K E | T H E | M E A T | O U T | O F | T H E | S A U C E | +I N | E I T H E R | O F | T H E S E | R E S T A U R A N T S | Y O U | W I L L | B E | S E R V E D | W I T H | T H E | B E S T | T H E | M A R K E T | A F F O R D S | C O O K E D | T H E | R I G H T | W A Y | +T H O M P S O N | O P E N E D | A | L A R G E | R E S T A U R A N T | I N | O ' F A R R E L L | S T R E E T | J U S T | A B O V E | F I L L M O R E | A N D | F O R | T W O | Y E A R S | O R | M O R E | D I D | A | T H R I V I N G | B U S I N E S S | H I S | P L A C E | B E I N G | N O T E D | F O R | I T S | G O O D | C O O K I N G | A N D | I T S | S P L E N D I D | S E R V I C E | +I F | Y O U | K N O W | H O W | T O | O R D E R | A N D | D O | N O T | C A R E | T O | C 
O U N T | T H E | C O S T | W H E N | Y O U | O R D E R | P R O B A B L Y | T H E | B E S T | D I N N E R | A T | T H E S E | R E S T A U R A N T S | C A N | B E | H A D | A T | E I T H E R | B L A N C O ' S | O R | T H E | P O O D L E | D O G | +P U T | I N | T H E | O V E N | A N D | B R O W N | T O | A | G O L D E N | C O L O R | +A T | T H E | C O R N E R | O F | M A R K E T | A N D | E D D Y | S T R E E T S | I S | T H E | O D E O N | D O W N | I N | A | B A S E M E N T | W I T H | D E C O R A T I O N S | O F | M O S T | G A R I S H | O R D E R | +I N | A D D I T I O N | T O | A B A L O N E ' S | T H E | H O F | B R A U | M A K E S | A | S P E C I A L T Y | O F | L I T T L E | O R E G O N | C R A W F I S H | +T H E Y | A L S O | H A V E | A S | T H E | C H E F | I N | C H A R G E | O F | T H E | A B A L O N E | D I S H E S | H E R B E R T | F O R M E R L Y | C H E F | F O R | O N E | O F | T H E | Y A C H T | C L U B S | O F | T H E | C O A S T | W H O | C L A I M S | T O | H A V E | T H E | O N L Y | P R O P E R | R E C I P E | F O R | M A K I N G | A B A L O N E ' S | T E N D E R | +H I S | P R I C E S | A R E | M O D E R A T E | A N D | H I S | C O O K I N G | A N D | V I A N D S | O F | T H E | B E S T | A N D | W I L L | S A T I S F Y | T H E | M O S T | C R I T I C A L | O F | T H E | G O U R M E T S | +T H E | F L Y | T R A P | A N D | C H A R L I E ' S | F A S H I O N | T H E | F I R S T | I N | S U T T E R | S T R E E T | N E A R | K E A R N Y | A N D | T H E | O T H E R | I N | M A R K E T | N E A R | S U T T E R | S E R V E | W E L L | C O O K E D | F O O D S | E S P E C I A L L Y | S O U P | S A L A D S | A N D | F I S H | +T H E | C U I S I N E | I S | O F | T H E | B E S T | A N D | T H E | C H E F S | R A N K | A T | T H E | T O P | O F | T H E I R | A R T | +S A N | F R A N C I S C O ' S | C A R E | F R E E | S P I R I T | W A S | F U L L Y | E X E M P L I F I E D | B E F O R E | T H E | A S H E S | O F | T H E | G R E A T | F I R E | O F | N I N E T E E N | O | S I X | W E R E | C O L D | +I T | I S | A N | I D E A | T H A T | I S | W O R T H | W H I L E | B U T | U N F O R T U N A T E L Y | T H E | P R O P R I E T O R S | D E P E N D | T O O | M U C H | O N | T H E | D E C O R A T I V E | F E A T U R E | A N D | T O O | L I T T L E | O N | T H E | F O O D | A N D | H O W | T H E Y | S E R V E | I T | +T H I S | G A R I S H | D I S P L A Y | O F | M I R R O R S | A N D | E L A B O R A T E | D E C O R A T I O N | O F | C E I L I N G | A N D | P I L L A R S | G I V E S | I T | T H E | A P P E A R A N C E | O F | T H E | A B O D E | O F | S A T U R N A L I A | B U T | D E C O R U M | I S | T H E | R U L E | A M O N G | T H E | P A T R O N S | +I T | H A S | C H A N G E D | F R O M | W H A T | I T | W A S | I N | T H E | O L D | D A Y S | B U T | I S | S T I L L | A N | E X C E L L E N T | P L A C E | T O | D I N E | +B U T | I F | Y O U | R E A L L Y | L O V E | G O O D | M U S I C | M U S I C | T H A T | H A S | M E L O D Y | A N D | R H Y T H M | A N D | S O O T H I N G | C A D E N C E S | G O | T O | T H E | H E I D E L B E R G | I N N | A N D | L I S T E N | T O | T H E | C O N C E R T | W H I C H | I S | A | F E A T U R E | O F | T H E | P L A C E | E V E R Y | E V E N I N G | +T H E N | T A K E | I T | O U T | O F | T H E | R O A S T I N G | P A N | A N D | P U T | I T | I N T O | A | C A S S E R O L E | A F T E R | S P R I N K L I N G | I T | W I T H | T W O | O U N C E S | O F | F L O U R | +T H E | H O F | B R A U | H O W E V E R | I S | L E S S | D I S T I N C 
T I V E L Y | G E R M A N | A S | T H E | G R E A T E R | N U M B E R | O F | I T S | P A T R O N S | A R E | A M E R I C A N S | +O N E | C A N | A L M O S T | I M A G I N E | H I M S E L F | I N | O N E | O F | T H E | F A M O U S | R A T H S K E L L E R S | O F | O L D | H E I D E L B E R G | N O T | A T | T H E | S C H L O S S | O F | C O U R S E | F O R | H E R E | Y O U | C A N N O T | L O O K | D O W N | O N | T H E | W E I S E R | A S | I T | F L O W S | B E N E A T H | T H E | W I N D O W S | O F | T H E | G R E A T | W I N E | S T U B E | O N | T H E | H I L L | +A T | T H E | T W O | M E N T I O N E D | O N E | P A Y S | F O R | T H E | S U R R O U N D I N G S | A S | W E L L | A S | F O R | T H E | F O O D | A N D | S O M E T I M E S | T H I S | I S | W O R T H | P A Y I N G | F O R | +B O T H | S E R V E | G O O D | S P A N I S H | D I N N E R S | A T | R E A S O N A B L E | P R I C E S | +S E A S O N | W I T H | S A L T | A N D | P E P P E R | A N D | A | L I T T L E | S U G A R | T O | T A S T E | +H E R E | A S | W E L L | A S | I N | A | N U M B E R | O F | O T H E R | P L A C E S | O N E | C A N | W E L L | A P P R E C I A T E | T H E | C O L L O Q U I A L | D E F I N I T I O N | O F | C A B A R E T | +J O H N | T A I T | I S | T H E | P R E S I D I N G | S P I R I T | H E R E | H E | H A V I N G | M A D E | R E P U T A T I O N | A S | C L U B | M A N A G E R | A N D | T H E N | A S | M A N A G E R | O F | T H E | C L I F F | H O U S E | +I N | T H I S | S A M E | D I S T R I C T | I S | T H E | M I N T | I N | C O M M E R C I A L | S T R E E T | B E T W E E N | M O N T G O M E R Y | A N D | K E A R N Y | S T R E E T S | +C L A R E T S | A R E | V A L U E D | F O R | T H E I R | F L A V O R | A N D | F O R | T H E I R | T O N I C | P R O P E R T I E S | +N E V E R | D R I N K | A N Y | H A R D | L I Q U O R S | S U C H | A S | W H I S K Y | B R A N D Y | G I N | O R | C O C K T A I L S | W I T H | O Y S T E R S | O R | C L A M S | A S | I T | I S | L I A B L E | T O | U P S E T | Y O U | F O R | T H E | R E S T | O F | T H E | E V E N I N G | +C H A B L I S | A | W H I T E | B U R G U N D Y | D R Y | A N D | O F | A G R E E A B L E | A R O M A | +H O C H H E I M E R | A | L I G H T | P L E A S I N G | A N D | W H O L E S O M E | W I N E | +D R Y | A N D | O F | M A G N I F I C E N T | B O U Q U E T | +G E R M A N | W I N E S | A R E | O F | L I G H T E R | C H A R A C T E R | A N D | A R E | G E N E R A L L Y | T E R M E D | R H E I N | W I N E S | +C L A R E T | E I G H T E E N | N I N E T Y | E I G H T | A N D | N I N E T E E N | O | F O U R | +R H E I N | A N D | M O S E L L E | E I G H T E E N | N I N E T Y | T H R E E | +A U S T R I A N | B U R G U N D Y | I S | O N E | O F | T H E | F I N E S T | W I N E S | P O S S E S S I N G | R I C H | F L A V O R | A N D | F I N E | P E R F U M E | O T H E R | B U R G U N D I E S | A R E | +S A U T E R N E | I S | A | W H I T E | B O R D E A U X | A | S T R O N G | L U S C I O U S | W I N E | T H E | B E S T | K N O W N | V A R I E T I E S | B E I N G | +L A C R I M A | C H R I S T I | A | S T I L L | W I N E | O F | E X C E L L E N T | F L A V O R | A N D | B O U Q U E T | +V I N T A G E | Y E A R S | H A V E | M U C H | T O | D O | W I T H | T H E | Q U A L I T Y | O F | W I N E S | +W I T H | S O U P | A N D | F I S H | S E R V E | W H I T E | W I N E S | S U C H | A S | R H E I N | W I N E | S A U T E R N E | O R | W H I T E | B U R G U N D Y | +W I T H | E N T R E E S | S E R V E | C L A R E T S | O R | O T H E R | R E D | 
W I N E S | S U C H | A S | S W I S S | B O R D E A U X | H U N G A R I A N | O R | I T A L I A N | W I N E S | +P O U R | M A Y O N N A I S E | O V E R | A L L | C H I L L | A N D | S E R V E | +P U T | T H E | P U L P | I N T O | A | B A S I N | W I T H | T W O | O U N C E S | O F | M E L T E D | B U T T E R | T W O | T A B L E S P O O N F U L S | O F | L E M O N | J U I C E | H A L F | A | P O U N D | O F | C H E S T N U T S | B O I L E D | A N D | G R A T E D | A N D | S E A S O N I N G | O F | S A L T | A N D | W H I T E | P E P P E R | T O | T A S T E | +A S P A R A G U S | S A L A D | C O O K | T H E | A S P A R A G U S | I N | S A L T E D | W A T E R | D R A I N | A N D | C H I L L | +T H I S | D R E S S I N G | S H O U L D | S T A N D | I N | T H E | I C E | B O X | F O U R | O R | F I V E | H O U R S | T O | B E C O M E | S E A S O N E D | +S T I R | T H E | S O A K E D | G E L A T I N | I N | W H I L E | T H E | C U C U M B E R | I S | H O T | +W H E N | T H I C K E N E D | S T R A I N | A N D | C O O L | +T O M A T O | B A S K E T S | T O M A T O | B A S K E T S | A R E | C H A R M I N G | A C C E S S O R I E S | F O R | H O L D I N G | V E G E T A B L E | S A L A D | C H I C K E N | S H R I M P S | C O L D | B E A N S | A S P A R A G U S | T I P S | S H R E D D E D | C E L E R Y | C U C U M B E R S | C U T | I N | C U B E S | A N D | M I N C E D | P E P P E R S | +G A R N I S H | D I S H | T H A T | D R E S S I N G | I S | M A D E | I N | W I T H | A | L I T T L E | G A R L I C | +B I R D S | N E S T | S A L A D | H A V E | R E A D Y | A S | M A N Y | C R I S P | L E A V E S | O F | L E T T U C E | A S | M A Y | B E | R E Q U I R E D | T O | M A K E | A | D A I N T Y | L I T T L E | N E S T | F O R | E A C H | P E R S O N | +C A U L I F L O W E R | M A Y O N N A I S E | T A K E | C O L D | B O I L E D | C A U L I F L O W E R | B R E A K | I N T O | B R A N C H E S | A D D I N G | S A L T | P E P P E R | A N D | V I N E G A R | T O | S E A S O N | +H A N D L E S | O F | W A T E R C R E S S | M A Y | B E | A T T A C H E D | T O | T H E S E | B A S K E T S | +C E L E R Y | A N D | N U T | S A L A D | C U T | E N O U G H | C E L E R Y | F I N E | T O | M E A S U R E | T W O | C U P S | A D D | O N E | C U P | O F | F I N E L Y | S H R E D D E D | O R | S H A V E D | C A B B A G E | A N D | O N E | A N D | O N E | H A L F | C U P S | O F | W A L N U T | M E A T S | B R O K E N | I N | S M A L L | P I E C E S | B U T | N O T | C H O P P E D | +S E T | I N T O | A | C O L D | P L A C E | T O | C H I L L | A N D | B E C O M E | F I R M | +S A L A D | T W O | C U P S | O F | A P P L E S | C U T | I N T O | S M A L L | P I E C E S | O N E | C U P | C E L E R Y | C U T | I N T O | S M A L L | P I E C E S | O N E | C U P | E N G L I S H | W A L N U T S | +S U R R O U N D | W I T H | A | G A R N I S H | O F | C O O K E D | A N D | D I C E D | C A R R O T S | T U R N I P S | G R E E N | P E A S | +S E R V E | O N | A | L E T T U C E | L E A F | W I T H | M A Y O N N A I S E | D R E S S I N G | M A D E | W I T H O U T | M U S T A R D | A N D | T H I N N E D | W I T H | C R E A M | +A D D | T W O | T A B L E S P O O N S | T H I C K | S O U R | C R E A M | T W O | T A B L E S P O O N S | S U G A R | A | S P R I N K L E | O F | M U S T A R D | A N D | H A L F | C U P | O F | V I N E G A R | +S E R V E | W I T H | F R E N C H | D R E S S I N G | H I D D E N | U N D E R | T H E | L E A V E S | O F | T H E | N E S T | +C A B B A G E | S A L A D | C H O P | O R | S H A V E | F I N E | H A L F | A | 
M E D I U M | S I Z E | H E A D | O F | C A B B A G E | T H A T | H A S | B E E N | L E F T | I N | C O L D | W A T E R | U N T I L | C R I S P | T H E N | D R A I N | +B E A T | U N T I L | T H O R O U G H L Y | M I X E D | P O U R | O V E R | T H E | C A B B A G E | A N D | T O S S | L I G H T L Y | U N T I L | U N I F O R M L Y | S E A S O N E D | +S T R A I N | A N D | B O T T L E | A N D | P U T | I N | I C E | B O X | S H A K E | B E F O R E | U S I N G | E A C H | T I M E | +H O W | A S K E D | T A D | +C H A P T E R | F O U R | T H E | F I R S T | N I G H T | I N | C A M P | +W H O | I S | T H E | W R A N G L E R | T H I S | M O R N I N G | A S K E D | T H E | F O R E M A N | G L A N C I N G | A B O U T | A T | H I S | M E N | +H I | T H E R E | H I S S E D | L U M P Y | F I L L E D | W I T H | I N D I G N A T I O N | T H A T | A N Y O N E | S H O U L D | A T T E M P T | T O | M O U N T | A | P O N Y | F R O M | T H E | R I G H T | S I D E | +I N | S P I T E | O F | T H E I R | H A R D | C O U C H E S | T H E | P O N Y | R I D E R S | S L E P T | S O U N D L Y | E V E N | P R O F E S S O R | Z E P P L I N | H I M S E L F | N E V E R | W A K I N G | T H E | W H O L E | N I G H T | T H R O U G H | +S T A C Y | G R U M B L E D | T U R N E D | O V E R | A N D | W E N T | T O | S L E E P | A G A I N | +E V E N | I F | I | C A N ' T | S I N G | I | C A N | B E A T | T H A T | +N O N E | O F | Y O U | W I L L | B E | F I T | F O R | D U T Y | T O | M O R R O W | +H U M P H | G R U N T E D | C U R L E Y | A D A M S | +A | W R A N G L E R ' S | A | W R A N G L E R | A N S W E R E D | B I G | F O O T | S T O L I D L Y | +O H | N O | T H I S | K I N D | O F | A | W R A N G L E R | I S N ' T | L A U G H E D | T H E | F O R E M A N | +G R U B | P I | L E | G R U B | P I | L E | +N O T | O N | T H E | R A N G E | W H Y | N O T | D E M A N D E D | T H E | B O Y | +W A L T E R | H A D | G O N E | O U T | W I T H | T H E | S E C O N D | G U A R D | A N D | T H E | O T H E R S | H A D | G A T H E R E D | A R O U N D | T H E | C A M P | F I R E | F O R | T H E I R | N I G H T L Y | S T O R Y | T E L L I N G | +W E | H A D | B E T T E R | S T A R T | T H E | D R I V E | T H I S | M O R N I N G | +T H E | L A D S | F O U N D | T H A T | A | P A I R | O F | B L A N K E T S | H A D | B E E N | A S S I G N E D | T O | E A C H | O F | T H E M | W I T H | A N | O R D I N A R Y | W A G O N | S H E E T | D O U B L E D | F O R | A | T A R P A U L I N | +T H E | C O W B O Y | D I D | T H I S | V E R Y | T H I N G | B U T | W I T H I N | A N | H O U R | H E | F O U N D | H I M S E L F | A L O N E | T H E | O T H E R S | H A V I N G | T U R N E D | I N | O N E | B Y | O N E | +S T A C Y | B R O W N | P R O V E D | T H E | O N L Y | G R U M B L E R | I N | T H E | L O T | D E C L A R I N G | T H A T | H E | C O U L D | N O T | S L E E P | A | W I N K | O N | S U C H | A | B E D | A S | T H A T | +T H E | P O N Y | D I D | M O S T | O F | I T | A D M I T T E D | T H E | L A D | I | J U S T | G A V E | H I M | H I S | H E A D | A N D | T H A T ' S | A L L | T H E R E | W A S | T O | I T | +K E E P | A | G O I N G | A N D | I F | Y O U ' R E | L U C K Y | Y O U ' L L | R U N | P L U M B | I N T O | T H E M | W A S | T H E | J E E R I N G | A N S W E R | A S | T H E | S L E E P Y | C O W M E N | S P U R R E D | T H E I R | P O N I E S | O N | T O W A R D | C A M P | M U T T E R I N G | T H E I R | D I S A P P R O V A L | O F | T A K I N G | A L O N G | A | B U N C H | O F | B O Y S | O N | A | C A T T L E | D R I V E 
| +A L M O S T | B E F O R E | T H E | E C H O E S | O F | H I S | V O I C E | H A D | D I E D | A W A Y | A | S H R I L L | V O I C E | P I P E D | U P | F R O M | T H E | T A I L | E N D | O F | T H E | C H U C K | W A G O N | +A | L O U D | L A U G H | F O L L O W E D | A T | C H U N K Y ' S | E X P E N S E | +L U M P Y | B A T E S | C A M E | R U N N I N G | T O W A R D | H I M | N O T | D A R I N G | T O | C A L L | O U T | F O R | F E A R | O F | W A K I N G | T H E | C A M P | +Y O U | W O N ' T | B E | S O | F A S T | T O | W A K E | U P | H A R D | W O R K I N G | C O W B O Y S | A F T E R | T H A T | I | R E C K O N | +T H E S E | T H E Y | S P R E A D | O U T | O N | T H E | G R O U N D | U S I N G | B O O T S | W R A P P E D | I N | C O A T S | F O R | P I L L O W S | +P O N G | T E L L | T H E | Y O U N G | G E N T L E M E N | W H A T | W O U L D | B E C O M E | O F | Y O U | I F | Y O U | W E R E | T O | S E R V E | B A D | M E A L S | T O | T H I S | O U T F I T | O F | C O W P U N C H E R S | +T H E | H O R S E S | O F | T H E | O U T F I T | S A V E | T H O S E | T H A T | W E R E | O N | N I G H T | D U T Y | A N D | T W O | O R | T H R E E | O T H E R S | T H A T | H A D | D E V E L O P E D | A | H A B I T | O F | S T R A Y I N G | H A D | B E E N | T U R N E D | L O O S E | E A R L Y | I N | T H E | E V E N I N G | F O R | A N I M A L S | O N | T H E | T R A I L | A R E | S E L D O M | S T A K E D | D O W N | +W E ' V E | G O T | A | H A R D | D R I V E | B E F O R E | U S | A N D | E V E R Y | M A N | M U S T | B E | F I T | A S | A | F I D D L E | +S T A C Y | B R O W N ' S | L E F T | L E G | S W U N G | O V E R | T H E | S A D D L E | +W H E R E | A R E | T H E Y | A S K E D | T H E | B O Y | +H E ' S | A | F E L L O W | W H O ' S | A L L | T H E | T I M E | M A K I N G | T R O U B L E | I S N ' T | H E | A S K E D | S T A C Y | I N N O C E N T L Y | +H E ' S | A | T R O U B L E | C U R E R | N O T | A | T R O U B L E M A K E R | E X C E P T | F O R | H I M S E L F | +T H E N | B I D D I N G | T H E | B O Y S | R I D E | U P | N E A R | T H E | S P O T | T O | W A T C H | H I M | H E | D R E W | O F F | S O M E | T E N | R O D S | A N D | W H E E L I N G | S P U R R E D | H I S | P O N Y | T O | A | R U N | +A | T E M P O R A R Y | C A M P | W A S | Q U I C K L Y | P I T C H E D | +T H E | L I T T L E | A N I M A L S | W E R E | B E C O M I N G | M O R E | S U R E | F O O T E D | E V E R Y | D A Y | A N D | N E D | S A I D | T H A T | B E F O R E | T H E | T R I P | W A S | F I N I S H E D | J I M M I E | W O U L D | B E | A B L E | T O | W A L K | A | S L A C K | R O P E | +T H E Y | S A W | H I S | L O N G | H A I R | A L M O S T | B R U S H | T H E | G R A S S | O N E | O F | H I S | H A N D S | S W E P T | D O W N | A N D | U P | A N D | O N C E | M O R E | T A D | B U T L E R | R O S E | S T A N D I N G | I N | H I S | S T I R R U P S | U T T E R I N G | A | C O W B O Y | Y E L L | A S | H E | W A V E D | T H E | S O M B R E R O | O N | H I G H | +O N C E | M O R E | S T A C Y | A P P R O A C H E D | T H E | S O M B R E R O | H I S | P O N Y | R U N N I N G | W E L L | A N D | A S | H E | D R E W | N E A R | I T | T H E Y | S A W | H I M | R I S E | I N | H I S | S A D D L E | J U S T | A S | T A D | B U T L E R | H A D | D O N E | A | F E W | M I N U T E S | B E F O R E | +I | R E C K O N | T H E R E | A R E | S M I L E D | T H E | G U I D E | W E | A R E | I N | T H E | B E A R | C O U N T R Y | N O W | +W H A T ' S | T H A T | F O R | D E M A N D E D | N E D 
| W O N D E R I N G L Y | +T H I S | A N N O U N C E M E N T | F I L L E D | T H E | B O Y S | W I T H | E X C I T E M E N T | +A N D | W E ' V E | G O T | A | S U R P R I S E | F O R | Y O U | A N N O U N C E D | S T A C Y | S W E L L I N G | W I T H | P R I D E | +H A T | T O O | C L O S E | T O | M E | I | C O U L D N ' T | G E T | I T | E X P L A I N E D | C H U N K Y | T H E | B O Y S | R O A R E D | +G R A S P I N G | T H E | P O M M E L | W I T H | T H E | L E F T | H A N D | H E | A P P E A R E D | T O | D I V E | H E A D | F I R S T | T O W A R D | T H E | G R O U N D | +T H E | B O Y S | H O W L E D | W I T H | D E L I G H T | T H A T | I S | A L L | D I D | S A V E | S T A C Y | B R O W N | +C O L D | W A T E R | I S | T H E | M O S T | N O U R I S H I N G | T H I N G | W E ' V E | T O U C H E D | S I N C E | L A S T | N I G H T | +A N | E A R L Y | S T A R T | W A S | M A D E | S O | T H A T | T H E | P A R T Y | R E A C H E D | T H E | P R O M I S E D | T A B L E | L A N D S | S H O R T L Y | B E F O R E | T E N | O ' C L O C K | I N | T H E | F O R E N O O N | +G A L L O P I N G | I N T O | C A M P | T H E | B O Y | F E T C H E D | H I S | S O M B R E R O | W H I C H | H E | C A R R I E D | W E L L | O U T | I N T O | T H E | F I E L D | A N D | T O S S E D | A W A Y | +H E | B U R I E D | H I S | B I S C U I T | U N D E R | A | L A Y E R | O F | J A M | O V E R | W H I C H | H E | S P R E A D | A | T H I C K | C O A T I N G | O F | H O N E Y | +W E | D I D | N O T | I T | M U S T | H A V E | C O M E | T O | L I F E | S O M E | T I M E | D U R I N G | T H E | N I G H T | A N D | D U G | I T S | W A Y | O U T | L A U G H E D | T A D | +A S | Y E T | T H E Y | H A D | B E E N | U N A B L E | T O | A T T E M P T | A N Y | F A N C Y | R I D I N G | W I T H | T H E I R | P O N I E S | O W I N G | T O | T H E | R U G G E D | N A T U R E | O F | T H E | C O U N T R Y | T H R O U G H | W H I C H | T H E Y | H A D | B E E N | J O U R N E Y I N G | +J A M | E X C L A I M E D | C H U N K Y | S T R E T C H I N G | H I S | N E C K | A N D | E Y E I N G | T H E | D I S H | L O N G I N G L Y | +W H Y | D O N ' T | Y O U | M O V E | T H E | P O N Y | +S U P P E R | H A V I N G | B E E N | F I N I S H E D | T H E | P A R T Y | G A T H E R E D | A B O U T | T H E | C A M P | F I R E | F O R | T H E I R | E V E N I N G | C H A T | A F T E R | W H I C H | A D M O N I S H I N G | S T A C Y | T O | K E E P | W I T H I N | H I S | T E N T | A N D | N O T | T O | G O | B O R R O W I N G | T R O U B L E | T H E | B O Y S | T U R N E D | I N | F O R | A | S O U N D | S L E E P | +I T | W A S | A | B E A U T I F U L | R A C E | T H E | L I T T L E | I N D I A N | P O N I E S | S E E M I N G | T O | E N T E R | T H O R O U G H L Y | I N T O | T H E | S P I R I T | O F | T H E | C O N T E S T | S T R E T C H I N G | T H E M S E L V E S | O U T | T O | T H E I R | F U L L | L E N G T H S | A N D | W I T H | H E A D S | O N | A | L E V E L | W I T H | T H E I R | B A C K S | F A I R L Y | F L E W | A C R O S S | T H E | G R E A T | P L O T | O F | G R E E N | +H E | N O | D O U B T | W O U L D | B R I N G | F O O D | O F | S O M E | K I N D | W I T H | H I M | +W I T H | A | S H O U T | T H E | B O Y S | D A S H E D | P E L L | M E L L | T O | M E E T | T H E | P A C K | T R A I N | A N D | F A L L I N G | I N | B E H I N D | T H E | S L O W | M O V I N G | B U R R O S | U R G E D | T H E M | O N | W I T H | D E R I S I V E | S H O U T S | A N D | S U N D R Y | R E S O U N D I N G | S L A P S | O N | T H E 
| A N I M A L S | F L A N K S | +P R E S I D E N T | B R O W N | I | W I T H D R A W | M Y | C R I T I C I S M | I | O F F E R | Y O U | M Y | H U M B L E | A P O L O G I E S | +Y E S | T H E | C O U N T R Y | I S | F U L L | O F | C A V E S | +A L L | A G R E E D | T H A T | T A D ' S | S U P E R I O R | H O R S E M A N S H I P | A L O N E | H A D | W O N | T H E | R A C E | F O R | H I M | +T H E | G R E A T | G R E E N | F I E L D | S U R R O U N D E D | O N | A L L | S I D E S | B Y | T A L L | T R E E S | M A D E | T H E | P L A C E | A N | I D E A L | O N E | F O R | T H E I R | P U R P O S E | +I | A M | F R E E | T O | A D M I T | T H A T | I | A M | H U N G R Y | T O O | +T H E | F I R S T | T I M E | H E | R O D E | S W I F T L Y | B Y | I T | L E A N I N G | O V E R | T O | L O O K | A T | T H E | H A T | A S | H E | P A S S E D | H O L D I N G | T O | T H E | P O M M E L | F I R M L Y | W I T H | H I S | L E F T | H A N D | +B U T | I | K N O W | A N | O L D | S E T T L E R | W H O | W I L L | L E N D | U S | H I S | D O G | I F | I T | I S | N O T | O U T | +N O W | F A L L | T O | Y O U N G | G E N T L E M E N | D I R E C T E D | T H E | P R O F E S S O R | +A S | A | R E S U L T | I N S T E A D | O F | S T O P P I N G | W H E N | H E | R E A C H E D | T H E | H A T | T H E | B O Y | K E P T | O N | G O I N G | +A T | T H E | M O M E N T | W H E N | H E | F R E E D | H I S | L E F T | F O O T | F R O M | T H E | S T I R R U P | H E | T H R E W | H I S | B O D Y | S H A R P L Y | T O | T H E | R I G H T | R E A C H I N G | F O R | T H E | H A T | W I T H O U T | T A K I N G | T H E | P R E C A U T I O N | T O | G R A S P | T H E | P O M M E L | +T A D | I S | A N | E X P E R I E N C E D | R I D E R | +L I G E | L E A N I N G | O V E R | T H E | B R I N K | W A S | A B L E | T O | F O L L O W | T H E | B O Y ' S | M O V E M E N T S | B Y | T H E | A I D | O F | T H E | T H I N | A R C | O F | L I G H T | M A D E | B Y | T H E | T O R C H | I N | T A D ' S | H A N D | +A R E | Y O U | R E A D Y | Y E S | +L I G E | Q U I C K L Y | M A D E | F A S T | T H E | L I N E | T O | A | T R E E | +S L O W L Y | B U T | S T E A D I L Y | T H E | S L E N D E R | L I N E | W A S | P A I D | O U T | A M I D | A | T E N S E | S I L E N C E | O N | T H E | P A R T | O F | T H E | L I T T L E | G R O U P | A T | T H E | T O P | O F | T H E | C A N Y O U | +L O D G E D | I N | T H E | B R A N C H E S | O F | A | P I N Y O N | T R E E | I | T H I N K | I T | I S | B U T | H E | D O E S N ' T | A N S W E R | M E | +A F T E R | W H A T | S E E M E D | T O | T H E M | H O U R S | A | S H A R P | C A L L | F R O M | T H E | D E P T H S | R E A C H E D | T H E I R | E A R S | +I | P R O T E S T | S H O U T E D | T H E | P R O F E S S O R | +L O O K S | L I K E | A | C L U M P | O F | B U S H E S | D O W N | T H E R E | B U T | I | A I N ' T | S U R E | C A N | Y O U | M A K E | I T | O U T | +B E | S U R E | T O | F A S T E N | H I M | S E C U R E L Y | T O | T H E | L O O P | B E F O R E | Y O U | G I V E | T H E | S I G N A L | T O | H A U L | U P | W A R N E D | T H E | G U I D E | +Y O U ' L L | A L L | B E | O V E R | I F | Y O U | D O N ' T | H A V E | A | C A R E | +I | C O U L D | N O T | T H I N K | O F | A L L O W I N G | A N Y | O F | M Y | C H A R G E S | T O | T A K E | S O | T E R R I B L E | A | R I S K | A N D | +A N D | T H A T | T U M B L E ' S | E N O U G H | T O | K N O C K | T H E | S E N S E | O U T | O F | A | F U L L | G R O W N | M A N | +I | A M | T H E | O N E | T O | G O | 
A F T E R | W A L T | I F | A N Y O N E | H A S | T O | I ' L L | G O | D O W N | M I S T E R | T H O M A S | +M E B B Y | Y O U | T H I N K | H E ' S | H A V I N G | S O M E | S O R T | O F | A | P I C N I C | D O W N | T H E R E | E H | G L A R E D | L I G E | +T H E | M O V E M E N T | S E N T | H I S | B O D Y | S W A Y I N G | G I D D I L Y | F R O M | S I D E | T O | S I D E | +S H A L L | W E | H A U L | U P | A S K E D | L I G E | M A K I N G | A | M E G A P H O N E | O F | H I S | H A N D S | Y E S | H A U L | A W A Y | +Y E S | A G R E E D | T A D | T H A T | D O E S | L O O K | L I K E | B U S H E S | +M A S T E R | T A D | I S | R I G H T | D E C I D E D | T H E | G U I D E | G A Z I N G | A T | T H E | T W O | B O Y S | A P P R O V I N G L Y | +B U T | F R O M | T H E | C A U T I O U S | M O V E M E N T S | O F | T H E | L I G H T | F A R | B E L O W | T H E M | T H E | G U I D E | U N D E R S T O O D | T H A T | T H E | L A D | W A S | A T | W O R K | C A R R Y I N G | O U T | H I S | P A R T | O F | T H E | T A S K | O F | R E S C U E | T O | T H E | B E S T | O F | H I S | A B I L I T Y | +H E | T I L T E D | H I S | H E A D | T O | L O O K | U P | +D O N ' T | M O V E | A R O U N D | L I E | P E R F E C T L Y | S T I L L | W A R N E D | T H E | G U I D E | A R E | Y O U | H U R T | +C A U T I O U S L Y | P L A C I N G | A | H A N D | A G A I N S T | T H E | R O C K S | T O | S T E A D Y | H I M S E L F | T A D | W I S E L Y | C O N C L U D E D | T H A T | H E R E A F T E R | I T | W O U L D | N O T | P A Y | T O | B E | T O O | C U R I O U S | +S U R E | T H I N G | A N S W E R E D | T H E | B O Y | +H E ' S | S O | F A R | T O | T H E | R I G H T | O F | M E | T H A T | I | C A N ' T | R E A C H | H I M | +Y O U ' D | H A V E | B O T H | O F | U S | A T | T H E | B O T T O M | I F | I | L E F T | I T | T O | Y O U | T O | T A K E | C A R E | O F | T H I S | E N D | +N O R | W A S | H I S | S E N S E | O F | S E C U R I T Y | I N C R E A S E D | W H E N | I N | S H I F T I N G | H I S | P O S I T I O N | T H E | T O R C H | F E L L | F R O M | H I S | G R A S P | T H E | F A G O T S | S C A T T E R I N G | A S | T H E Y | S L I P P E D | D O W N | B E T W E E N | T H E | L I M B S | O F | T H E | T R E E | A N D | W H I R L I N G | I N | E V E R | D I M I N I S H I N G | C I R C L E S | U N T I L | F I N A L L Y | H E | H E A R D | T H E M | C L A T T E R | O N | T H E | R O C K S | B E L O W | +I | S E E | H I M | C A L L E D | T A D | H I S | V O I C E | S O U N D I N G | H O L L O W | A N D | U N N A T U R A L | T O | T H O S E | A B O V E | +N O | I | A M | T H E | L I G H T E R | O F | T H E | T W O | U R G E D | T A D | +T O | M A K E | D R Y | T O A S T | +M A I Z E | N E X T | T O | W H E A T | A N D | R I C E | M A I Z E | I S | T H E | G R A I N | M O S T | U S E D | I N | T H E | N O U R I S H M E N T | O F | M A N | +L O O K | A T | I T | F R O M | T I M E | T O | T I M E | W H E N | I T | H A S | B E E N | L A I D | F O R | N E A R L Y | A N | H O U R | A N D | W H E N | T H E | Y E A S T | H A S | R I S E N | A N D | B R O K E N | T H R O U G H | T H E | F L O U R | S O | T H A T | B U B B L E S | A P P E A R | I N | I T | Y O U | W I L L | K N O W | T H A T | I T | I S | R E A D Y | T O | B E | M A D E | U P | I N T O | D O U G H | +S O Y E R | R E C O M M E N D S | T H A T | E A C H | S L I C E | S H O U L D | B E | C U T | I N T O | P I E C E S | A S | S O O N | A S | I T | I S | B U T T E R E D | A N D | W H E N | A L L | A R E | R E A D Y | T H A T | T H 
E Y | S H O U L D | B E | P I L E D | L I G H T L Y | O N | T H E | D I S H | T H E Y | A R E | I N T E N D E D | T O | B E | S E R V E D | O N | +I T A L I A N | R U S K S | +M O D E | P U T | T H E | F L O U R | I N T O | A | B A S I N | M I X | T H E | S U G A R | W E L L | W I T H | I T | M A K E | A | H O L E | I N | T H E | C E N T R E | A N D | S T I R | I N | T H E | Y E A S T | A N D | M I L K | W H I C H | S H O U L D | B E | L U K E W A R M | W I T H | E N O U G H | O F | T H E | F L O U R | T O | M A K E | I T | T H E | T H I C K N E S S | O F | C R E A M | +M O D E | P U T | T H E | M I L K | A N D | B U T T E R | I N T O | A | S A U C E P A N | A N D | K E E P | S H A K I N G | I T | R O U N D | U N T I L | T H E | L A T T E R | I S | M E L T E D | +T H E S E | B U N S | M A Y | B E | V A R I E D | B Y | A D D I N G | A | F E W | C U R R A N T S | C A N D I E D | P E E L | O R | C A R A W A Y | S E E D S | T O | T H E | O T H E R | I N G R E D I E N T S | A N D | T H E | A B O V E | M I X T U R E | A N S W E R S | F O R | H O T | C R O S S | B U N S | B Y | P U T T I N G | I N | A | L I T T L E | G R O U N D | A L L S P I C E | A N D | B Y | P R E S S I N G | A | T I N | M O U L D | I N | T H E | F O R M | O F | A | C R O S S | I N | T H E | C E N T R E | O F | T H E | B U N | +A | L O A F | O F | H O U S E H O L D | B R E A D | A B O U T | T W O | D A Y S | O L D | A N S W E R S | F O R | M A K I N G | T O A S T | B E T T E R | T H A N | C O T T A G E | B R E A D | T H E | L A T T E R | N O T | B E I N G | A | G O O D | S H A P E | A N D | T O O | C R U S T Y | F O R | T H E | P U R P O S E | +W H E N | W E | T A K E | I N T O | A C C O U N T | T H A T | T H E | A R A B I A N S | A R E | F O N D | O F | L I Z A R D S | A N D | L O C U S T S | A S | A R T I C L E S | O F | F O O D | T H E I R | C U I S I N E | A L T O G E T H E R | I S | S C A R C E L Y | A | T E M P T I N G | O N E | +I T A L I A N | M I L L E T | O R | G R E A T | I N D I A N | M I L L E T | I S | C U L T I V A T E D | I N | E G Y P T | A N D | N U B I A | W H E R E | I T | I S | C A L L E D | D H O U R R A | A N D | I S | U S E D | A S | H U M A N | F O O D | A S | W E L L | A S | F O R | T H E | F E R M E N T A T I O N | O F | B E E R | +M O D E | P U T | T H E | F L O U R | I N T O | A | L A R G E | E A R T H E N W A R E | B O W L | O R | D E E P | P A N | T H E N | W I T H | A | S T R O N G | M E T A L | O R | W O O D E N | S P O O N | H O L L O W | O U T | T H E | M I D D L E | B U T | D O | N O T | C L E A R | I T | E N T I R E L Y | A W A Y | F R O M | T H E | B O T T O M | O F | T H E | P A N | A S | I N | T H A T | C A S E | T H E | S P O N G E | O R | L E A V E N | A S | I T | W A S | F O R M E R L Y | T E R M E D | W O U L D | S T I C K | T O | I T | W H I C H | I T | O U G H T | N O T | T O | D O | +W H E N | T H E Y | A R E | Q U I T E | H O T | D I V I D E | T H E M | L E N G T H W I S E | I N T O | T H R E E | P U T | S O M E | T H I N | F L A K E S | O F | G O O D | B U T T E R | B E T W E E N | T H E | S L I C E S | P R E S S | T H E | R O L L S | T O G E T H E R | A N D | P U T | T H E M | I N | T H E | O V E N | F O R | A | M I N U T E | O R | T W O | B U T | N O T | L O N G E R | O R | T H E | B U T T E R | W O U L D | O I L | T A K E | T H E M | O U T | O F | T H E | O V E N | S P R E A D | T H E | B U T T E R | E Q U A L L Y | O V E R | D I V I D E | T H E | R O L L S | I N | H A L F | A N D | P U T | T H E M | O N | T O | A | V E R Y | H O T | C L E A N | D I S H | A N D | S E N D | T 
H E M | I N S T A N T L Y | T O | T A B L E | +I L L U S T R A T I O N | M A I Z E | P L A N T | +S E V E N T E E N | T W E N T Y | F O U R | +M O D E | W H I S K | T H E | E G G | S T I R | I N | T H E | S U G A R | A N D | B E A T | T H E S E | I N G R E D I E N T S | W E L L | T O G E T H E R | B E A T | T H E | B U T T E R | T O | A | C R E A M | S T I R | I N | T H E | G R O U N D | R I C E | C U R R A N T S | A N D | C A N D I E D | P E E L | A N D | A S | M U C H | F L O U R | A S | W I L L | M A K E | I T | O F | S U C H | A | C O N S I S T E N C Y | T H A T | I T | M A Y | B E | R O L L E D | I N T O | S E V E N | O R | E I G H T | B A L L S | +I L L U S T R A T I O N | R U S K S | +S E V E N T E E N | T H I R T Y | F O U R | +C U T | A S | M A N Y | N I C E | E V E N | S L I C E S | A S | M A Y | B E | R E Q U I R E D | R A T H E R | M O R E | T H A N | O N E | Q U A R T E R | I N C H | I N | T H I C K N E S S | A N D | T O A S T | T H E M | B E F O R E | A | V E R Y | B R I G H T | F I R E | W I T H O U T | A L L O W I N G | T H E | B R E A D | T O | B L A C K E N | W H I C H | S P O I L S | T H E | A P P E A R A N C E | A N D | F L A V O U R | O F | A L L | T O A S T | +T H E N | P L A C E | T H E | P A N | O N | A | S T R O N G | C H A I R | O R | D R E S S E R | O R | T A B L E | O F | C O N V E N I E N T | H E I G H T | P O U R | I N T O | T H E | S P O N G E | T H E | R E M A I N D E R | O F | T H E | W A R M | M I L K | A N D | W A T E R | S T I R | I N T O | I T | A S | M U C H | O F | T H E | F L O U R | A S | Y O U | C A N | W I T H | T H E | S P O O N | T H E N | W I P E | I T | O U T | C L E A N | W I T H | Y O U R | F I N G E R S | A N D | L A Y | I T | A S I D E | +T U R N | I T | T H E N | O N | T O | A | P A S T E | B O A R D | O R | V E R Y | C L E A N | D R E S S E R | A N D | W I T H | A | L A R G E | S H A R P | K N I F E | D I V I D E | I T | I N | T W O | M A K E | I T | U P | Q U I C K L Y | I N T O | L O A V E S | A N D | D I S P A T C H | I T | T O | T H E | O V E N | M A K E | O N E | O R | T W O | I N C I S I O N S | A C R O S S | T H E | T O P S | O F | T H E | L O A V E S | A S | T H E Y | W I L L | R I S E | M O R E | E A S I L Y | I F | T H I S | B E | D O N E | +H O T | R O L L S | +S E V E N T E E N | S E V E N T E E N | +T O | M A K E | H O T | B U T T E R E D | T O A S T | S E V E N T E E N | T W E N T Y | S I X | +K I R K L E A T H A M | Y E A S T | +I L L U S T R A T I O N | I T A L I A N | M I L L E T | +M O D E | L E T | T H E | T A R T A R I C | A C I D | A N D | S A L T | B E | R E D U C E D | T O | T H E | F I N E S T | P O S S I B L E | P O W D E R | T H E N | M I X | T H E M | W E L L | W I T H | T H E | F L O U R | +S U F F I C I E N T | T O | M A K E | T W E L V E | B U N S | S E A S O N A B L E | A T | A N Y | T I M E | L I G H T | B U N S | +M O V E | I T | B A C K W A R D S | A N D | F O R W A R D S | U N T I L | T H E | B R E A D | I S | N I C E L Y | C O L O U R E D | T H E N | T U R N | I T | A N D | T O A S T | T H E | O T H E R | S I D E | A N D | D O | N O T | P L A C E | I T | S O | N E A R | T H E | F I R E | T H A T | I T | B L A C K E N S | +T H E Y | S H O U L D | B E | K E P T | I N | A | C L O S E D | T I N | C A N I S T E R | I N | A | D R Y | P L A C E | T O | P R E S E R V E | T H E I R | C R I S P N E S S | +N E V E R | U S E | N E W | B R E A D | F O R | M A K I N G | A N Y | K I N D | O F | T O A S T | A S | I T | E A T S | H E A V Y | A N D | B E S I D E S | I S | V E R Y | E X T R A V A G A N T | +E X C E L L E 
N T | R O L L S | +S O M E | O F | T H E | P R E P A R A T I O N S | O F | M A I Z E | F L O U R | A R E | V E R Y | G O O D | A N D | W H E N | P A R T A K E N | I N | M O D E R A T I O N | S U I T A B L E | F O O D | F O R | A L M O S T | E V E R Y B O D Y | +T O | M A K E | G O O D | H O M E | M A D E | B R E A D | +I T | I S | N O T | C U L T I V A T E D | I N | E N G L A N D | B E I N G | P R I N C I P A L L Y | C O N F I N E D | T O | T H E | E A S T | +N E X T | T A K E | E I T H E R | A | L A R G E | T A B L E S P O O N F U L | O F | B R E W E R ' S | Y E A S T | W H I C H | H A S | B E E N | R E N D E R E D | S O L I D | B Y | M I X I N G | I T | W I T H | P L E N T Y | O F | C O L D | W A T E R | A N D | L E T T I N G | I T | A F T E R W A R D S | S T A N D | T O | S E T T L E | F O R | A | D A Y | A N D | N I G H T | O R | N E A R L Y | A N | O U N C E | O F | G E R M A N | Y E A S T | P U T | I T | I N T O | A | L A R G E | B A S I N | A N D | P R O C E E D | T O | M I X | I T | S O | T H A T | I T | S H A L L | B E | A S | S M O O T H | A S | C R E A M | W I T H | T H R E E | Q U A R T E R S | P I N T | O F | W A R M | M I L K | A N D | W A T E R | O R | W I T H | W A T E R | O N L Y | T H O U G H | E V E N | A | V E R Y | L I T T L E | M I L K | W I L L | M U C H | I M P R O V E | T H E | B R E A D | +F R O M | F I F T E E N | T O | T W E N T Y | M I N U T E S | W I L L | B E | R E Q U I R E D | T O | B A K E | T H E M | N I C E L Y | +W H E N | C O L D | T H E Y | S H O U L D | B E | P U T | I N T O | T I N | C A N I S T E R S | T O | K E E P | T H E M | D R Y | A N D | I F | I N T E N D E D | F O R | T H E | C H E E S E | C O U R S E | T H E | S I F T E D | S U G A R | S H O U L D | B E | O M I T T E D | +I T | H A S | B E E N | I N T R O D U C E D | I N T O | I T A L Y | W H E R E | T H E Y | M A K E | A | C O A R S E | B R E A D | F R O M | I T | A N D | I T | I S | A L S O | E M P L O Y E D | I N | P A S T R Y | A N D | P U D D I N G S | T H E Y | A L S O | U S E | I T | F O R | F E E D I N G | H O R S E S | A N D | D O M E S T I C | F O W L S | +V I C T O R I A | B U N S | S E V E N T E E N | T H I R T Y | T W O | +S U F F I C I E N T | A L L O W | T W O | C R U M P E T S | T O | E A C H | P E R S O N | +M U F F I N S | A N D | C R U M P E T S | S H O U L D | A L W A Y S | B E | S E R V E D | O N | S E P A R A T E | D I S H E S | A N D | B O T H | T O A S T E D | A N D | S E R V E D | A S | E X P E D I T I O U S L Y | A S | P O S S I B L E | +I L L U S T R A T I O N | B U N S | +P L A I N | B U N S | S E V E N T E E N | T W E N T Y | N I N E | +M O D E | B O I L | T H E | R I C E | I N | W A T E R | U N T I L | I T | I S | Q U I T E | T E N D E R | P O U R | O F F | T H E | W A T E R | A N D | P U T | T H E | R I C E | B E F O R E | I T | I S | C O L D | T O | T H E | F L O U R | +S E V E N T E E N | E I G H T E E N | +I T | W I L L | G R O W | O N | P O O R | S O I L S | A N D | I S | E X T R E M E L Y | P R O D U C T I V E | +S O U R | M I L K | O R | B U T T E R M I L K | M A Y | B E | U S E D | B U T | T H E N | A | L I T T L E | L E S S | A C I D | W I L L | B E | N E E D E D | +A N O T H E R | A D V A N T A G E | T H E | R E D | W H E A T S | P O S S E S S | I S | T H E I R | C O M P A R A T I V E | I M M U N I T Y | F R O M | T H E | A T T A C K S | O F | M I L D E W | A N D | F L Y | +H E | S A Y S | T H A T | B Y | C U T T I N G | T H R O U G H | F O U R | O R | F I V E | S L I C E S | A T | A | T I M E | A L L | T H E | B U T T E R | I S | S Q U E E Z E D | O 
U T | O F | T H E | U P P E R | O N E S | W H I L E | T H E | B O T T O M | O N E | I S | S W I M M I N G | I N | F A T | L I Q U I D | +I F | C A R R I E D | A N Y | D I S T A N C E | I T | S H O U L D | B E | S T O R E D | A W A Y | I N | A I R | T I G H T | V E S S E L S | +A | Y E L L O W | V A R I E T Y | C A L L E D | G O L D E N | M I L L E T | I S | S O L D | I N | T H E | G R O C E R S | S H O P S | F O R | M A K I N G | P U D D I N G S | A N D | I S | V E R Y | D E L I C A T E | A N D | W H O L E S O M E | +W H E N | S H E | H E A R D | O F | M Y | E N G A G E M E N T | W I T H | M A R Y | A N N | S H E | W R O T E | A N D | S U G G E S T E D | T H A T | W E | S H O U L D | S P E N D | O U R | H O N E Y M O O N | I N | H E R | C O T T A G E | O R | P I G S T Y E | A N D | T H A T | I | S H O U L D | P A Y | H E R | R E N T | F O R | I T | +T H E | F A C T | O F | H A V I N G | G I V E N | M A R Y | A N N | A | W E D D I N G | P R E S E N T | S E E M S | T O | F I L L | T H E M | W I T H | A | F E E L I N G | O F | R A N C O R O U S | A C I D I T Y | W H I C H | T O | M E | I S | I N E X P L I C A B L E | +A N D | O F | C O U R S E | I | H A D | M Y | E X P E C T A T I O N S | A N D | S H E | H A D | H E R S | +F R O M | A | C O U S I N | O F | O U R S | W H O ' S | I N | T H A T | L I N E | +T H A T | W A S | W H A T | M I S S U S | M A C P H E R S O N | S A I D | T O | M E | O N L Y | T H E | O T H E R | D A Y | +T H E | A C C I D E N T | I N | Q U E S T I O N | O C C U R R E D | U P O N | T H E | S U N D A Y | E V E N I N G | +I | F O U N D | T H A T | A S | A | W O M A N | O F | B U S I N E S S | S H E | W A S | B E Y O N D | A L L | M Y | E X P E C T A T I O N S | +T H E | G I R L | I S | F R E T T I N G | B U T | Y O U | D O N ' T | S E E M | T O | N O T I C E | I T | +S U C H | I S | T H E | S E L F I S H N E S S | O F | H U M A N | N A T U R E | +I | C A N N O T | P R E T E N D | T O | E X P L A I N | W H Y | E X C E P T | O N | T H E | S U P P O S I T I O N | T H A T | R O M A N C E | I S | D E A D | A T | L E A S T | I N | T H A T | C I R C L E | O F | S O C I E T Y | I N | W H I C H | T H E | S N E L L I N G S | M O V E | +I T | I S | M O S T | D E L I G H T F U L | +H E R | F A T H E R | I S | A | M O S T | R E M A R K A B L E | P E R S O N | T O | S A Y | T H E | L E A S T | +A N D | I | W I L L | S E E | T H A T | T H E R E | I S | N O | S H I R K I N G | A B O U T | T H E | B O Y S | O R | A B O U T | T H E | G I R L S | E I T H E R | +I | D O | N O T | K N O W | W H E N | I T | I S | G O I N G | T O | B E | B U T | I T | W I L L | B E | E I T H E R | N E X T | W E E K | O R | T H E | W E E K | A F T E R | C E R T A I N L Y | A T | T H E | E A R L I E S T | P O S S I B L E | M O M E N T | A N D | I | S H O U L D N ' T | B E | A T | A L L | S U R P R I S E D | T O | L E A R N | T H A T | A L L | M A R Y | A N N ' S | T H I N G S | H A D | B E E N | A L R E A D Y | B O U G H T | A N D | P E R H A P S | S O M E | O F | T H E M | M A R K E D | +A | S E C O N D | C O U S I N | O F | M A R Y | A N N ' S | I S | I N | T H E | C O O K ' S | T O U R S | L I N E | +I | S H A L L | M A K E | P A P A | G I V E | M E | F I V E | H U N D R E D | P O U N D S | A T | L E A S T | +I T | T U R N E D | O U T | T H A T | S H E | H A D | A | L I T T L E | M O N E Y | O F | H E R | O W N | A B O U T | A | H U N D R E D | A N D | T H I R T Y | P O U N D S | A | Y E A R | +S O M E O N E | S N I G G E R E D | +H E R S | H A S | B E E N | P R O D I G I O U S | +B U T | I T | I S | Q U I T E 
| P L A I N | T O | M E | T H A T | A L L | T H E | A R R A N G E M E N T S | F O R | M Y | W E D D I N G | A R E | G O I N G | T O | B E | M A D E | B Y | T H E | S N E L L I N G S | +P | S | T H E | C A R D S | A R E | O U T | F O R | T H E | W E D D I N G | +I | H A V E | D R A W N | U P | A | L I S T | O F | A L L | T H E | P E O P L E | W H O | O U G H T | T O | G I V E | U S | A | P R E S E N T | A N D | I | S H A L L | T E L L | T H E M | W H A T | T H E Y | O U G H T | T O | G I V E | I T | W O N ' T | B E | M Y | F A U L T | I F | I | D O N ' T | G E T | I T | +S H E | H A S | A | K N A C K | O F | G E T T I N G | P E O P L E | T O | D O | W H A T | S H E | W I S H E S | A N D | T O | G I V E | H E R | W H A T | S H E | W A N T S | W H I C H | I S | A | L I T T L E | S H O R T | O F | M I R A C U L O U S | +I T | W A S | P L A I N | T H A T | T O G E T H E R | W E | S H O U L D | M A N A G E | M O S T | C O M F O R T A B L Y | D E L I G H T F U L L Y | I N | F A C T | +W E | H A V E | A L L | B E E N | G I V I N G | M A R Y | A N N | P R E S E N T S | A N D | I | S U P P O S E | Y O U | M I S T E R | W H I T I N G | H A V E | B E E N | G I V I N G | H E R | S O M E T H I N G | T O O | +O F | C O U R S E | T H E R E | A R E | S O M E | P E O P L E | W I T H | W H O M | Y O U | C A N ' T | B E | P E R F E C T L Y | P L A I N | B U T | I | S H A L L | B E | A S | P L A I N | A S | I | C A N | T H E R E ' S | A | W A Y | A N D | A | M A N N E R | O F | D O I N G | T H A T | K I N D | O F | T H I N G | +H E | H A S | G I V E N | U S | F R E E | P A S S E S | A L L | T H E | W A Y | T O | T H E | E N D | O F | O U R | J O U R N E Y | A N D | A L L | T H E | W A Y | B A C K | A G A I N | A N D | C O U P O N S | F O R | F R E E | B O A R D | A N D | L O D G I N G | A T | T H E | H O T E L | I T ' S | A | W E D D I N G | P R E S E N T | +A S | I T | I S | U N L E S S | I | A M | M I S T A K E N | S O M E | O F | T H E | R E N D I N G | W I L L | B E | O N | O U R | S I D E | A N D | T H E Y | K N O W | I T | +F O R | I N S T A N C E | L O O K | A T | T H E I R | B E H A V I O U R | I N | T H E | M A T T E R | O F | T H E | R I N G | +I | G A S P E D | P O S I T I V E L Y | G A S P E D | +I | N E V E R | S A W | P E O P L E | L I K E | T H E | S N E L L I N G S | F O R | P O S S E S S I N G | R E L A T I V E S | I N | A L L | S O R T S | O F | L I N E S | +I T | I S | F R O M | H E R | A C T I O N | I N | T H A T | M A T T E R | T H A T | M Y | S U S P I C I O N | S P R I N G S | +I T | M I G H T | J U S T | A S | W E L L | B E | S O M E | O N E | E L S E ' S | W E D D I N G | S O | U N I M P O R T A N T | I S | T H E | P A R T | W H I C H | I | A M | S E T | T O | P L A Y | I N | I T | +I | K N O W | W H A T | M A M M A | C A N | A F F O R D | T O | G I V E | A N D | I | W I L L | S E E | S H E | G I V E S | I T | +W E | A R E | G O I N G | F O R | O U R | H O N E Y M O O N | T O | I T A L Y | A N D | T H E | S O U T H | O F | F R A N C E | +B U T | W H Y | O N | T H A T | A C C O U N T | T H E Y | S H O U L D | P I T Y | M E | I | A L T O G E T H E R | F A I L | T O | U N D E R S T A N D | +I | N O T I C E | T H A T | T H E Y | A R E | G E N E R A L L Y | P E R S O N S | W H O | H A V E | A L R E A D Y | T E N D E R E D | T H E I R | O F F E R I N G S | +T H E R E | W E R E | N O | S I G N S | O F | F A L T E R I N G | A B O U T | H E R | F L O W | O F | L A N G U A G E | +A | B I R D | I N | T H E | H A N D | I S | W O R T H | T W O | I N | A | B U S H ' | A N D | I T | W I L L | B E | S 
O M E T H I N G | T O | H A V E | B Y | U S | +T H A T ' S | I T | O N | Y O U R | A C C O U N T | +A N D | W H A T | I N Q U I R E D | M I S S U S | M A C P H E R S O N | H A S | M A R Y | A N N | G I V E N | Y O U | H E R | L O V E | +T H E R E | S H E | O W N S | A | C O T T A G E | O R | I T | M A Y | B E | A | P I G S T Y E | F O R | A L L | I | K N O W | +B E S I D E S | W H I C H | W E | C A N | A L W A Y S | S E L L | T H E | C O U P O N S | A N D | R A I L W A Y | P A S S E S | W H I C H | W E | D O N ' T | U S E | +I | W A S | P E R S U A D E D | T H A T | S O M E B O D Y | B E S I D E S | T H A T | C O U S I N | G O T | A | P R O F I T | O U T | O F | M A R Y | A N N ' S | E N G A G E M E N T | R I N G | B U T | I | H A N D E D | O V E R | T H E | A M O U N T | +W H E N | A T | L A S T | I | R E A C H E D | C R O F T O N | M Y | J O U R N E Y ' S | E N D | I T | T U R N E D | O U T | T H A T | T H E | S T A T I O N | S T A F F | C O N S I S T E D | O F | A | H A L F | W I T T E D | I N D I V I D U A L | W H O | W A S | S T A T I O N M A S T E R | P O R T E R | A N D | C L E R K | C O M B I N E D | A N D | A | H U L K I N G | L A D | W H O | D I D | W H A T E V E R | E L S E | T H E R E | W A S | T O | D O | +H E R E | W E | B E | T H A T | M I G H T | B E | S O | +N O | A N S W E R | T H O U G H | I | A L L O W E D | A | M O R E | T H A N | D E C E N T | I N T E R V A L | +T H E R E | A P P E A R E D | T O | B E | N O | K N O C K E R | T H O U G H | W H E T H E R | I T | H A D | B E E N | T W I S T E D | O F F | W A S | M O R E | T H A N | I | C O U L D | S A Y | +W E ' V E | L O S T | T H E | K E Y | O F | T H E | C E L L A R | A N D | T H E R E ' S | N O T H I N G | O U T | E X C E P T | W A T E R | A N D | I | D O N ' T | T H I N K | Y O U ' D | C A R E | F O R | T H A T | +I N | I T | I | W A S | D E P O S I T E D | W I T H | M Y | L U G G A G E | +I | E V E N | B O U G H T | S O M E T H I N G | F O R | M A D G E | I | M E A N | M I S S U S | W I L S O N | +A | V O I C E | I N Q U I R E D | W H O ' S | T H E R E | +T H E | R E P L Y | W A S | W R I T T E N | I N | A | S P R A W L I N G | F E M I N I N E | H A N D | I T | W A S | A | L I T T L E | V A G U E | +B E T T E R | R I N G | A G A I N | S U G G E S T E D | T H E | D R I V E R | H A R D | +T H E | W H O L E | T H I N G | W A S | A | T R I F L E | O D D | +I | D I D | N O T | A S K | I | W A S | B E Y O N D | I T | +I | H A D | N O | I L L U S I O N S | +W H O | L I V E S | H E R E | A R E | T H E | P E O P L E | M A D | +I | I M A G I N E | T H E R E | W E R E | S E V E R A L | K I N D S | O F | O L D | F A S H I O N E D | C H R I S T M A S E S | B U T | I T | C O U L D | H A R D L Y | B E | W O R S E | T H A N | A | C H O P | I N | M Y | C H A M B E R S | O R | H O R R O R | O F | H O R R O R S | A T | T H E | C L U B | O R | M Y | C O U S I N | L U C Y ' S | N O T I O N | O F | W H A T | S H E | C A L L S | T H E | F E S T I V E | S E A S O N | +N O W | I T | I S | A | R E M A R K A B L E | T H I N G | T H A T | I | H A V E | A L W A Y S | H A D | A N | E X T R A O R D I N A R Y | P R E D I L E C T I O N | F O R | T H E | N A M E | M A D G E | I | D O | N O T | K N O W | W H Y | +I T | I S | S O M E | S A T I S F A C T I O N | F O R | M E | T O | B E | A B L E | T O | R E F L E C T | T H A T | I | M A D E | I T | W A R M | F O R | T H E | O F F I C I A L S | H O W E V E R | C O L D | I | M I G H T | H A V E | B E E N | M Y S E L F | +H E | W A S | I M P E R V I O U S | T O | R E A S O N | +I | W A S | C H I L L E D 
| T O | T H E | B O N E | W E T | T I R E D | H U N G R Y | +I | F E L T | Q U I T E | L I V E L Y | M Y S E L F | A S | I | M I N G L E D | W I T H | T H E | C H R I S T M A S | C R O W D | L O O K I N G | F O R | T H I N G S | W H I C H | M I G H T | N O T | T U R N | O U T | T O | B E | A B S O L U T E L Y | P R E P O S T E R O U S | +W H E N | T H E | T R A P | D I D | A P P E A R | I T | L O O K E D | T O | M E | U N C O M M O N L Y | L I K E | A N | O P E N | S P R I N G | C A R T | +T H E R E | W A S | A | T R A P | A T | T H E | B O Y | A N D | B L U N D E R B U S S | B U T | T H A T | R E Q U I R E D | F E T C H I N G | +N O | O N E | H A D | C O M E | T O | M E E T | M E | T H E | V I L L A G E | W A S | A B O U T | H A L F | A | M I L E | A N D | H A N G A R | D E N E | T H E | H O U S E | F O R | W H I C H | M Y | S T E P S | W E R E | B E N T | A B O U T | F O U R | M I L E S | B Y | T H E | R O A D | H O W | F A R | I T | W A S | A C R O S S | P L O U G H E D | F I E L D S | M Y | I N F O R M A N T | D I D | N O T | M E N T I O N | +T H E R E | W A S | N O T H I N G | S A I D | A B O U T | T H E | S O R T | O F | A C C O M M O D A T I O N | W H I C H | W O U L D | B E | P R O V I D E D | N O T H I N G | A B O U T | T H E | K I N D | O F | E S T A B L I S H M E N T | W H I C H | W A S | M A I N T A I N E D | O R | T H E | T A B L E | W H I C H | W A S | K E P T | +M A Y B E | T H E Y ' R E | U P | T O | S O M E | O F | T H E I R | G A M E S | A N D | W A N T S | R O U S I N G | +T H E R E | W A S | A | R U S H | O F | R E T R E A T I N G | F E E T | A N | E X P O S T U L A T I N G | V O I C E | T H E N | D A R K N E S S | A G A I N | A N D | S I L E N C E | +T H E | B E L L | R E V E R B E R A T E D | T H R O U G H | W H A T | S E E M E D | L I K E | A N | E M P T Y | H O U S E | +A L L | N I G H T | I T | H A D | B E E N | B L O W I N G | A N D | R A I N I N G | +F E S T I V E | Y E S | +I T | W A S | A | H O R R I B L E | J O U R N E Y | +A F T E R | A | V A S T | A M O U N T | O F | U N F A S T E N I N G | T H E | D O O R | W A S | O P E N E D | A N D | O N | T H E | T H R E S H O L D | T H E R E | S T O O D | A | G I R L | W I T H | A | L I G H T E D | C A N D L E | I N | H E R | H A N D | +I | D I D | N O T | E X P E C T | A | P R I N C E L Y | E N T E R T A I N M E N T | +P R E S E N T L Y | F E E T | W E R E | H E A R D | A D V A N C I N G | A L O N G | T H E | P A S S A G E | S E V E R A L | P A I R S | I T | S E E M E D | A N D | A | L I G H T | G L E A M E D | T H R O U G H | T H E | W I N D O W | O V E R | T H E | D O O R | +I | D I D | N O T | K N O W | W H A T | H E | M E A N T | +I T | A P P E A R E D | T H A T | T H E | T E R M S | W O U L D | B E | F I V E | G U I N E A S | B U T | T H E R E | W A S | N O | M E N T I O N | O F | T H E | L E N G T H | O F | T I M E | W H I C H | T H A T | F E E | W O U L D | C O V E R | +T H E | I N F O R M A T I O N | W A S | G R E E T E D | W I T H | W H A T | S O U N D E D | U N C O M M O N L Y | L I K E | A | C H O R U S | O F | L A U G H T E R | +U N D E R | S U C H | C I R C U M S T A N C E S | S H E | W A S | H A R D L Y | L I K E L Y | T O | B E | L I V E L Y | H E R S E L F | B U T | H E R | N A M E | W A S | M A D G E | A N D | I T | W A S | T H E | A C C I D E N T | O F | H E R | C H R I S T I A N | N A M E | W H I C H | D E C I D E D | M E | T O | G O | +I | T O L L E D | T H E | B E L L | A G A I N | +I | H A V E | N E V E R | K N O W N | A | M A D G E | A N D | Y E T | F R O M | M Y | B O Y H O O D | U P W A R D | 
I | H A V E | D E S I R E D | T O | M E E T | O N E | +I | H A D | L O N G | B E E N | W I S H I N G | T H A T | A N | O L D | F A S H I O N E D | C H R I S T M A S | H A D | B E E N | C O M P L E T E L Y | E X T I N C T | B E F O R E | I | H A D | T H O U G H T | O F | A D V E N T U R I N G | I N | Q U E S T | O F | O N E | +T H E R E | B E | T H E | D O O R | I N | F R O N T | O F | Y O U | Y O U | G O | U P | T H R E E | S T E P S | I F | Y O U | C A N | F I N D | E M | +I ' M | M I S T E R | C H R I S T O P H E R | F R O M | L O N D O N | +T H E R E ' S | A | K N O C K E R | I F | N O N E | O F | E M | H A V E N ' T | T W I S T E D | I T | O F F | +E A R L Y | I N | T H E | M O R N I N G | T H E | S T E P M O T H E R | C A M E | A N D | P U L L E D | T H E M | O U T | O F | B E D | A N D | G A V E | T H E M | E A C H | A | S L I C E | O F | B R E A D | W H I C H | W A S | S T I L L | S M A L L E R | T H A N | T H E | F O R M E R | P I E C E | +B U T | S H E | L E F T | H I M | N O | P E A C E | T I L L | H E | C O N S E N T E D | S A Y I N G | A H | B U T | I | S H A L L | M I S S | T H E | P O O R | C H I L D R E N | +A N D | S H E | G O T | U P | A N D | P U T | H E R | H E A D | I N T O | T H E | O V E N | +T H E N | S H E | T O O K | U P | H A N S E L | W I T H | H E R | R O U G H | H A N D | A N D | S H U T | H I M | U P | I N | A | L I T T L E | C A G E | W I T H | A | L A T T I C E | D O O R | A N D | A L T H O U G H | H E | S C R E A M E D | L O U D L Y | I T | W A S | O F | N O | U S E | +H A N S E L | T H O U G H T | T H E | R O O F | T A S T E D | V E R Y | N I C E | A N D | S O | H E | T O R E | O F F | A | G R E A T | P I E C E | W H I L E | G R E T H E L | B R O K E | A | L A R G E | R O U N D | P A N E | O U T | O F | T H E | W I N D O W | A N D | S A T | D O W N | Q U I T E | C O N T E N T E D L Y | +W E | A R E | G O I N G | I N T O | T H E | F O R E S T | T O | H E W | W O O D | A N D | I N | T H E | E V E N I N G | W H E N | W E | A R E | R E A D Y | W E | W I L L | C O M E | A N D | F E T C H | Y O U | A G A I N | +C R E E P | I N | S A I D | T H E | W I T C H | A N D | S E E | I F | I T | I S | H O T | E N O U G H | A N D | T H E N | W E | W I L L | P U T | I N | T H E | B R E A D | B U T | S H E | I N T E N D E D | W H E N | G R E T H E L | G O T | I N | T O | S H U T | U P | T H E | O V E N | A N D | L E T | H E R | B A K E | S O | T H A T | S H E | M I G H T | E A T | H E R | A S | W E L L | A S | H A N S E L | +A H | F A T H E R | S A I D | H A N S E L | I | A M | L O O K I N G | A T | M Y | W H I T E | C A T | S I T T I N G | U P O N | T H E | R O O F | O F | T H E | H O U S E | A N D | T R Y I N G | T O | S A Y | G O O D | B Y E | +D E A R | G O O D | G O D | H E L P | U S | N O W | S H E | P R A Y E D | +O H | Y O U | S I M P L E T O N | S A I D | S H E | T H E N | W E | M U S T | A L L | F O U R | D I E | O F | H U N G E R | Y O U | H A D | B E T T E R | P L A N E | T H E | C O F F I N S | F O R | U S | +T H E N | T H E Y | B E G A N | T O | R U N | A N D | R U S H I N G | I N T O | T H E | H O U S E | T H E Y | F E L L | U P O N | T H E I R | F A T H E R ' S | N E C K | +G R E T H E L | S H E | C R I E D | I N | A | P A S S I O N | G E T | S O M E | W A T E R | Q U I C K L Y | B E | H A N S E L | F A T | O R | L E A N | T H I S | M O R N I N G | I | W I L L | K I L L | A N D | C O O K | H I M | +H E | H A D | L I T T L E | E N O U G H | T O | B R E A K | O R | B I T E | A N D | O N C E | W H E N | T H E R E | W A S | A | G R E A T | F A M I N E | I N | T H 
E | L A N D | H E | C O U L D | H A R D L Y | P R O C U R E | E V E N | H I S | D A I L Y | B R E A D | A N D | A S | H E | L A Y | T H I N K I N G | I N | H I S | B E D | O N E | N I G H T | H E | S I G H E D | A N D | S A I D | T O | H I S | W I F E | W H A T | W I L L | B E C O M E | O F | U S | +H O W | C A N | W E | F E E D | O U R | C H I L D R E N | W H E N | W E | H A V E | N O | M O R E | T H A N | W E | C A N | E A T | O U R S E L V E S | +C O M E | I N | A N D | S T O P | W I T H | M E | A N D | N O | H A R M | S H A L L | C O M E | T O | Y O U | A N D | S O | S A Y I N G | S H E | T O O K | T H E M | B O T H | B Y | T H E | H A N D | A N D | L E D | T H E M | I N T O | H E R | C O T T A G E | +S E E | I | C O U L D | E V E N | G E T | I N | M Y S E L F | +A N D | N O W | A S | T H E R E | W A S | N O T H I N G | T O | F E A R | T H E Y | W E N T | B A C K | T O | T H E | W I T C H ' S | H O U S E | W H E R E | I N | E V E R Y | C O R N E R | W E R E | C A S K E T S | F U L L | O F | P E A R L S | A N D | P R E C I O U S | S T O N E S | +B U T | H E R | H U S B A N D | F E L T | H E A V Y | A T | H E A R T | A N D | T H O U G H T | I T | W E R E | B E T T E R | T O | S H A R E | T H E | L A S T | C R U S T | W I T H | T H E | C H I L D R E N | +B U T | I N | R E A L I T Y | H A N S E L | W A S | N O T | L O O K I N G | A T | A | C A T | B U T | E V E R Y | T I M E | H E | S T O P P E D | H E | D R O P P E D | A | P E B B L E | O U T | O F | H I S | P O C K E T | U P O N | T H E | P A T H | +T H E | O L D | W O M A N | B E H A V E D | V E R Y | K I N D L Y | T O | T H E M | B U T | I N | R E A L I T Y | S H E | W A S | A | W I C K E D | O L D | W I T C H | W H O | W A Y | L A I D | C H I L D R E N | A N D | B U I L T | T H E | B R E A D H O U S E | I N | O R D E R | T O | E N T I C E | T H E M | I N | B U T | A S | S O O N | A S | T H E Y | W E R E | I N | H E R | P O W E R | S H E | K I L L E D | T H E M | C O O K E D | A N D | A T E | T H E M | A N D | M A D E | A | G R E A T | F E S T I V A L | O F | T H E | D A Y | +G R E T H E L | B E G A N | T O | C R Y | B U T | I T | W A S | A L L | U S E L E S S | F O R | T H E | O L D | W I T C H | M A D E | H E R | D O | A S | S H E | W A N T E D | +T H A T | N I G H T | T H E | E N E M Y | S L I P P E D | A W A Y | L E A V I N G | H U N D R E D S | A N D | H U N D R E D S | O F | H I S | D E A D | A N D | W O U N D E D | O N | T H E | F I E L D | +T H E R E | W E R E | N O | B R E A S T W O R K S | Y E T | T H A T | O N E | L I T T L E | B R I G A D E | O F | H A M I L T O N ' S | D I V I S I O N | S T O O D | T H E R E | I N | T H E | O P E N | A N D | R E P U L S E D | A S S A U L T | A F T E R | A S S A U L T | +I T | R E Q U I R E D | M O N T H S | A N D | G R E A T | E V E N T S | T O | M A K E | G R A N T | T H E | H E R O | O F | T H E | A R M Y | W H I C H | H E | A F T E R W A R D | B E C A M E | +F O R | S O M E | R E A S O N | T H E | D E A D | A T | H A T C H I E | B R I D G E | W E R E | N O T | B U R I E D | +I | C O U L D | N O T | H E L P | M Y | F R I E N D | +O N C E | I N | T H E | N I G H T | I | S L I P P E D | A W A Y | F R O M | T H E | B I V O U A C | A N D | H U R R I E D | T O | T H E | O L D | T I S H I M I N G O | H O T E L | T O | S E E | A | L I E U T E N A N T | O F | M Y | C O M P A N Y | W H O | H A D | B E E N | S H O T | T H R O U G H | T H E | B R E A S T | +O U R | B R I G A D E | W A S | F E A R F U L L Y | O U T N U M B E R E D | +I | H A S T E N E D | B A C K | T O | T H E | L I N E S | +F I F T 
E E N | O F F I C E R S | O F | O U R | L I T T L E | H A L F | R E G I M E N T | W E R E | D E A D | O R | W O U N D E D | +T H A T | N I G H T | I | S T O O D | G U A R D | U N D E R | A N | O A K | T R E E | O N | T H E | B A T T L E F I E L D | A M O N G | T H E | U N B U R I E D | D E A D | +W H E N | M O R N I N G | C A M E | T H E | F I R I N G | O P E N E D | A N D | F O R | A L L | T H A T | D A Y | T H E | B A T T L E | R A G E D | F I E R C E L Y | A T | T H E | L E F T | A N D | C E N T E R | L E F T | W E | G E T T I N G | T H E | W O R S T | O F | I T | T O O | +A | P E R F E C T | B L A Z E | O F | C L O S E | R A N G E | M U S K E T R Y | T O O | M O W E D | T H E M | D O W N | L I K E | G R A S S | +G O | B A C K | T O | T H E | R E G I M E N T | H E | S A I D | S M I L I N G | A L L | W I L L | B E | N E E D E D | +G R A N T | W A S | O N L Y | A | F E W | M I L E S | A W A Y | B U T | A L T H O U G H | C O M M A N D E R | I N | C H I E F | H E | K N E W | N O T H I N G | O F | T H E | H A R D E S T | F O U G H T | B A T T L E | O F | T H E | C I V I L | W A R | U N T I L | I T | W A S | O V E R | +R O S E C R A N S | P R O T E S T E D | I T | W A S | I N | V A I N | +T H E | C L O U D | O F | R E B E L S | W E | H A D | S E E N | D I V I D E D | I T S E L F | I N T O | T H R E E | C O L U M N S | +N O | B A T T E R Y | I N | T H E | W H O L E | F O U R | Y E A R S | W A R | L O S T | S O | M A N Y | M E N | I N | S O | S H O R T | A | T I M E | +I T | W A S | N O T | A | Q U E S T I O N | W H O | W A S | D E A D | O R | W O U N D E D | B U T | W H O | W A S | N O T | +A | W E E K | A F T E R | T H E | B A T T L E | M Y | B R O T H E R | R O D E | B Y | T H E R E | O N | A | C A V A L R Y | E X P E D I T I O N | A N D | M A D E | T H E | H O R R I B L E | D I S C O V E R Y | T H A T | H O G S | W E R E | E A T I N G | U P | T H E | B O D I E S | O F | O U R | D E A D | H E R O E S | T H A T | T O O | W A S | W A R | +N O T | B A L A K L A V A | N O R | T H E | A L M A | S A W | S U C H | F I G H T I N G | I T | W A S | A | D U E L | T O | T H E | D E A T H | +H E | S U R V I V E D | T H E | W A R | O N L Y | T O | B E | M U R D E R E D | L A T E R | O N | A | P L A N T A T I O N | I N | M I S S I S S I P P I | +M Y | O W N | R E G I M E N T | W A S | I N | T H E | A D V A N C E | +M Y | F R I E N D | W I T H | M A N Y | O T H E R S | W A S | B E I N G | C A R R I E D | O U T | T O | D I E | E L S E W H E R E | +O N E | D A R I N G | R E B E L | W A S | S H O T | D O W N | A N D | B A Y O N E T E D | C L E A R | B E H I N D | T H E | L I N E | O F | C O M P A N Y | B | W H E R E | H E | H A D | B R O K E N | T H R O U G H | T O | S E I Z E | T H E | F L A G | O F | M Y | R E G I M E N T | +I N D E E D | W E | O F | T H E | R A N K | A N D | F I L E | H A D | L I T T L E | C O N F I D E N C E | I N | G R A N T | I N | T H O S E | D A Y S | +U N D E R | T H E | S A M E | Q U I E T | M O O N L I G H T | A N D | O N L Y | S I X | H U N D R E D | Y A R D S | A W A Y | F R O M | U S | A L S O | L A Y | T H E | V I C T O R I O U S | R E B E L | A R M Y | +W I T H | A | F E W | L A N T E R N S | O U R | M E N | T H E N | W E N T | A B O U T | A N D | T R I E D | T O | G A T H E R | U P | T H E | W O U N D E D | T H E | D E A D | W E R E | L E F T | T I L L | M O R N I N G | +T H E Y | L A Y | I N | H E A P S | O F | D O Z E N S | E V E N | C L O S E | U P | T O | T H E | W O R K S | +I | R E M A I N E D | A W A K E | A L L | N I G H T | T A L K I N G | W I T H | A | C O M R A D E 
| W H O | S H A R E D | M Y | B L A N K E T | W I T H | M E | P O O R | J I M M Y | K I N G | +T H A T | E V E N I N G | A N | O R D E R | C A M E | F O R | U S | H A M I L T O N ' S | D I V I S I O N | T O | A S S A U L T | T H E | E N E M Y ' S | L E F T | F L A N K | A T | M I D N I G H T | +H E | I S | O L D | A S | W E L L | A S | P O O R | S H E | S A I D | +A L L | M Y | W O R R I E S | A B O U T | Y O U | W E R E | F O O L I S H | +T H E | C R I E S | A N D | C U R S E S | O F | T H E | R O B B E R S | F I L L E D | T H E | A I R | +P E R H A P S | Y O U | C A N | O U T W I T | H E R | Y E T | C R I E D | A N O T H E R | +H O W E V E R | I N | S P I T E | O F | A L L | S H E | C O U L D | S A Y | T H E | E L D E R | S I S T E R S | O P E N E D | T H E | D O O R | A N D | A D M I T T E D | T H E | B E G G A R | +T H E | T W O | E L D E S T | A T E | T H E I R | A P P L E S | B U T | T H E | Y O U N G E S T | C O U L D | N O T | E A T | T H A T | N I G H T | S H E | T H R E W | T H E | A P P L E | A W A Y | +E V E R Y | Y E A R | A T | A | C E R T A I N | D A Y | O F | A | C E R T A I N | M O N T H | H E | W E N T | A W A Y | T O | A | D I S T A N T | C I T Y | T O | C O L L E C T | M O N E Y | O N | A N | A C C O U N T | +Y O U | S H A L L | N O T | C O M E | I N T O | M Y | F A T H E R ' S | H O U S E | +I T | I S | A | F E A R F U L | N I G H T | T O | S E N D | A W A Y | A | B E G G A R | S A I D | T H E | E L D E S T | S I S T E R | W H I L E | T H E Y | W E R E | E A T I N G | +L O N G | A G O | T H E R E | L I V E D | A | M E R C H A N T | W H O | H A D | T H R E E | D A U G H T E R S | +T H E | M E R C H A N T ' S | D A U G H T E R | A T | F I R S T | D I D | N O T | A N S W E R | B U T | A S | H E | K E P T | O N | C A L L I N G | T O | H E R | S H E | F I N A L L Y | A S K E D | H I M | W H A T | I T | W A S | T H A T | H E | W A N T E D | +I F | W E | D E C I D E | T O | S H O W | M E R C Y | T O | T H I S | P O O R | B E G G A R | I T | I S | N O T | F O R | Y O U | T O | O P P O S E | I T | +I T ' S | S U R E L Y | A | T E R R I B L E | S T O R M | O U T S I D E | S A I D | T H E | M E R C H A N T ' S | E L D E S T | D A U G H T E R | A S | T H E | W I N D | R A T T L E D | T H E | T I L E S | O F | T H E | R O O F | A N D | T H E | R A I N | B E A T | I N | T O R R E N T S | A G A I N S T | T H E | D O O R S | A N D | W I N D O W S | +W H I L E | T H E Y | W E R E | T A L K I N G | T H E | B E G G A R | H A D | T A K E N | T H E | A P P L E S | W H I C H | T H E | G I R L S | W E R E | T O | E A T | F O R | D E S S E R T | A N D | H A D | S P R I N K L E D | A | S L E E P I N G | P O W D E R | O V E R | T H E M | +B U I | W E | S H O U L D | N O T | F O R G E T | O U R | P R O M I S E | T O | O U R | F A T H E R | C R I E D | T H E | Y O U N G E S T | D A U G H T E R | +T H E Y | T R I E D | I N | V A I N | T O | B R E A K | D O W N | T H E | G R E A T | D O O R S | +I | P R O M I S E | Y O U | I | W I L L | D O | Y O U | N O | H A R M | +I T | W A S | T H E | Y O U N G E S T | O N E | W H O | D E C E I V E D | M E | C R I E D | T H E | R O B B E R | C H I E F T A I N | +H A V E | P I T Y | U P O N | A | P O O R | U N F O R T U N A T E | O N E | H E | C A L L E D | O U T | +T H E N | S H E | H E A R D | H I M | G O | D O W N | T H E | S T A I R W A Y | A N D | U N B O L T | T H E | H E A V Y | D O O R S | W H I C H | L E D | I N T O | T H E | S T O R E | +W H E N | I T | W A S | E V E N I N G | H E | L E D | H I S | B A N D | I N T O | A | N E A R B Y | S T R E E T | A 
N D | I N | H I S | D I S G U I S E | A P P R O A C H E D | T H E | M E R C H A N T ' S | H O U S E | H E | K N O C K E D | A T | T H E | D O O R | +P A S S | T H E | C H A R M | O U T | T O | M E | T H E N | S A I D | T H E | R O B B E R | +W H E N | S H E | R E T U R N E D | H I S | H A N D | W A S | S T I C K I N G | T H R O U G H | T H E | H O L E | I N | T H E | D O O R | +S H E | D I D | N O T | S T I R | A N D | H E | K N E W | T H A T | T H E | S L E E P I N G | P O W D E R | H A D | T H O R O U G H L Y | D O N E | I T S | W O R K | +L E T | M E | E N T E R | I | P R A Y | Y O U | T O | P A S S | T H E | N I G H T | U N D E R | Y O U R | R O O F | +H O W | D O | Y O U | K N O W | A S K E D | T H E I R | F A T H E R | I | A M | O L D E R | A N D | W I S E R | T H A N | Y O U | A R E | A N D | I | K N O W | T H A T | T H E R E | A R E | M A N Y | E V I L S | W H I C H | M I G H T | C O M E | U P O N | Y O U | +W H E N | A N | A R I S T O C R A C Y | C A R R I E S | O N | T H E | P U B L I C | A F F A I R S | I T S | N A T I O N A L | P R I D E | N A T U R A L L Y | A S S U M E S | T H I S | R E S E R V E D | I N D I F F E R E N T | A N D | H A U G H T Y | F O R M | W H I C H | I S | I M I T A T E D | B Y | A L L | T H E | O T H E R | C L A S S E S | O F | T H E | N A T I O N | +T H E Y | T H E R E F O R E | E N T E R T A I N | A | C A L M | S E N S E | O F | T H E I R | S U P E R I O R I T Y | T H E Y | D O | N O T | D R E A M | O F | V A U N T I N G | P R I V I L E G E S | W H I C H | E V E R Y O N E | P E R C E I V E S | A N D | N O | O N E | C O N T E S T S | A N D | T H E S E | T H I N G S | A R E | N O T | S U F F I C I E N T L Y | N E W | T O | T H E M | T O | B E | M A D E | T O P I C S | O F | C O N V E R S A T I O N | +T H E N | T H E R E | A R E | I N | A L L | C L A S S E S | A | V E R Y | L A R G E | N U M B E R | O F | M E N | C O N S T A N T L Y | O C C U P I E D | W I T H | T H E | S E R I O U S | A F F A I R S | O F | T H E | G O V E R N M E N T | A N D | T H O S E | W H O S E | T H O U G H T S | A R E | N O T | E N G A G E D | I N | T H E | D I R E C T I O N | O F | T H E | C O M M O N W E A L T H | A R E | W H O L L Y | E N G R O S S E D | B Y | T H E | A C Q U I S I T I O N | O F | A | P R I V A T E | F O R T U N E | +I | D O | N O T | B E L I E V E | I N | S U C H | R E P U B L I C S | A N Y | M O R E | T H A N | I N | T H A T | O F | P L A T O | O R | I F | T H E | T H I N G S | W E | R E A D | O F | R E A L L Y | H A P P E N E D | I | D O | N O T | H E S I T A T E | T O | A F F I R M | T H A T | T H E S E | S U P P O S E D | D E M O C R A C I E S | W E R E | C O M P O S E D | O F | V E R Y | D I F F E R E N T | E L E M E N T S | F R O M | O U R S | A N D | T H A T | T H E Y | H A D | N O T H I N G | I N | C O M M O N | W I T H | T H E | L A T T E R | E X C E P T | T H E I R | N A M E | +I F | I | A P P L A U D | T H E | F R E E D O M | W H I C H | I T S | I N H A B I T A N T S | E N J O Y | H E | A N S W E R S | F R E E D O M | I S | A | F I N E | T H I N G | B U T | F E W | N A T I O N S | A R E | W O R T H Y | T O | E N J O Y | I T | +I F | I | S A Y | T O | A N | A M E R I C A N | T H A T | T H E | C O U N T R Y | H E | L I V E S | I N | I S | A | F I N E | O N E | A Y | H E | R E P L I E S | T H E R E | I S | N O T | I T S | F E L L O W | I N | T H E | W O R L D | +I N | A R I S T O C R A C I E S | E V E R Y | M A N | H A S | O N E | S O L E | O B J E C T | W H I C H | H E | U N C E A S I N G L Y | P U R S U E S | B U T | A M O N G S T | D E M O C R A T I C 
| N A T I O N S | T H E | E X I S T E N C E | O F | M A N | I S | M O R E | C O M P L E X | T H E | S A M E | M I N D | W I L L | A L M O S T | A L W A Y S | E M B R A C E | S E V E R A L | O B J E C T S | A T | T H E | S A M E | T I M E | A N D | T H E S E | O B J E C T S | A R E | F R E Q U E N T L Y | W H O L L Y | F O R E I G N | T O | E A C H | O T H E R | A S | I T | C A N N O T | K N O W | T H E M | A L L | W E L L | T H E | M I N D | I S | R E A D I L Y | S A T I S F I E D | W I T H | I M P E R F E C T | N O T I O N S | O F | E A C H | +I N | A R I S T O C R A T I C | C O M M U N I T I E S | T H E | P E O P L E | R E A D I L Y | G I V E | T H E M S E L V E S | U P | T O | B U R S T S | O F | T U M U L T U O U S | A N D | B O I S T E R O U S | G A Y E T Y | W H I C H | S H A K E | O F F | A T | O N C E | T H E | R E C O L L E C T I O N | O F | T H E I R | P R I V A T I O N S | T H E | N A T I V E S | O F | D E M O C R A C I E S | A R E | N O T | F O N D | O F | B E I N G | T H U S | V I O L E N T L Y | B R O K E N | I N | U P O N | A N D | T H E Y | N E V E R | L O S E | S I G H T | O F | T H E I R | O W N | S E L V E S | W I T H O U T | R E G R E T | +T H E S E | P E R S O N S | T H E N | D I S P L A Y E D | T O W A R D S | E A C H | O T H E R | P R E C I S E L Y | T H E | S A M E | P U E R I L E | J E A L O U S I E S | W H I C H | A N I M A T E | T H E | M E N | O F | D E M O C R A C I E S | T H E | S A M E | E A G E R N E S S | T O | S N A T C H | T H E | S M A L L E S T | A D V A N T A G E S | W H I C H | T H E I R | E Q U A L S | C O N T E S T E D | A N D | T H E | S A M E | D E S I R E | T O | P A R A D E | O S T E N T A T I O U S L Y | T H O S E | O F | W H I C H | T H E Y | W E R E | I N | P O S S E S S I O N | +T H E Y | S T A N D | U N M O V E D | I N | T H E I R | S O L I T A R Y | G R E A T N E S S | W E L L | A S S U R E D | T H A T | T H E Y | A R E | S E E N | O F | A L L | T H E | W O R L D | W I T H O U T | A N Y | E F F O R T | T O | S H O W | T H E M S E L V E S | O F F | A N D | T H A T | N O | O N E | W I L L | A T T E M P T | T O | D R I V E | T H E M | F R O M | T H A T | P O S I T I O N | +T H E | A M E R I C A N S | I N | T H E I R | I N T E R C O U R S E | W I T H | S T R A N G E R S | A P P E A R | I M P A T I E N T | O F | T H E | S M A L L E S T | C E N S U R E | A N D | I N S A T I A B L E | O F | P R A I S E | +I N | A R I S T O C R A T I C | C O U N T R I E S | T H E | G R E A T | P O S S E S S | I M M E N S E | P R I V I L E G E S | U P O N | W H I C H | T H E I R | P R I D E | R E S T S | W I T H O U T | S E E K I N G | T O | R E L Y | U P O N | T H E | L E S S E R | A D V A N T A G E S | W H I C H | A C C R U E | T O | T H E M | +A N | A M E R I C A N | I N S T E A D | O F | G O I N G | I N | A | L E I S U R E | H O U R | T O | D A N C E | M E R R I L Y | A T | S O M E | P L A C E | O F | P U B L I C | R E S O R T | A S | T H E | F E L L O W S | O F | H I S | C A L L I N G | C O N T I N U E | T O | D O | T H R O U G H O U T | T H E | G R E A T E R | P A R T | O F | E U R O P E | S H U T S | H I M S E L F | U P | A T | H O M E | T O | D R I N K | +C H A P T E R | S I X T E E N | W H Y | T H E | N A T I O N A L | V A N I T Y | O F | T H E | A M E R I C A N S | I S | M O R E | R E S T L E S S | A N D | C A P T I O U S | T H A N | T H A T | O F | T H E | E N G L I S H | +I | B E L I E V E | T H E | S E R I O U S N E S S | O F | T H E | A M E R I C A N S | A R I S E S | P A R T L Y | F R O M | T H E I R | P R I D E | +T H I S | I S | M O R E | E S P E C 
I A L L Y | T H E | C A S E | A M O N G S T | T H O S E | F R E E | N A T I O N S | W H I C H | F O R M | D E M O C R A T I C | C O M M U N I T I E S | +I | T O L D | T O M | W E | S H O U L D N ' T | C O M E | S O | L A T E | S A Y S | H I L D A | +R I G H T | A W A Y | W H E N | I | B R I N G | H O M E | M Y | N E W | P R O G R A M | H E | S A Y S | H O W | C O M E | Y O U ' R E | T A K I N G | O N E | L E S S | C O U R S E | T H I S | H A L F | +I T ' S | Y O U R | F A U L T | M O P | I T | U P | Y O U R S E L F | +Y O U ' R E | G E T T I N G | A L T O G E T H E R | T O O | U P S E T | A B O U T | T H E S E | P R O G R A M S | S T O P | I T | A N D | B E H A V E | Y O U R S E L F | +T O M | S A Y S | T H A N K S | A N D | L O O K S | A T | H I L D A | A N D | S H E | B L U S H E S | R E A L L Y | +B E S I D E S | S A Y S | T O M | H A L F | T H E | R E A S O N | Y O U | A N D | Y O U R | F A T H E R | A R E | A L W A Y S | B I C K E R I N G | I S | T H A T | Y O U ' R E | S O | M U C H | A L I K E | M E | L I K E | H I M | S U R E | +P O P | I T ' S | A | C O U R S E | +S O M E H O W | O R | O T H E R | C A T | H A S | T A U G H T | T H E M | T H A T | H E ' S | I N | C H A R G E | H E R E | A N D | H E | J U S T | C H A S E S | T H E M | F O R | F U N | N O W | A N D | A G A I N | W H E N | H E ' S | N O T | B U S Y | S L E E P I N G | +I ' L L | B E | L U C K Y | I F | I | H A V E | T I M E | T O | B R E A T H E | +P O P | G O E S | R I G H T | O N | T U N I N G | H I S | C H A N N E L | +T O M | D R I N K S | A | L I T T L E | M O R E | C O F F E E | A N D | T H E N | H E | G O E S | O N | T H E | T R O U B L E | I S | I | C A N ' T | G E T | M A R R I E D | O N | T H I S | F L O W E R | S H O P | J O B | +S O M E T I M E S | S C H O O L S | D O | L E T | K I D S | T A K E | A | L O T | O F | S O F T | C O U R S E S | A N D | T H E N | T H E Y ' R E | O U T | O N | A | L I M B | L A T E R | H U H | +H E | D O E S | A N D | F O R | O N C E | I | W I N | A | R O U N D | I | K E E P | M U S I C | F O R | T H I S | S E M E S T E R | +I | E X P L A I N | T H A T | I ' M | T A K I N G | M U S I C | A N D | A L S O | B I O L O G Y | A L G E B R A | E N G L I S H | A N D | F R E N C H | M U S I C | H E | S N O R T S | +W H E N | A R E | Y O U | G E T T I N G | R I D | O F | T H E S E | C A T S | I ' M | N O T | F I X I N G | T O | S T A R T | A N | A N N E X | T O | K A T E ' S | C A T | H O M E | +I T ' S | T H E | F I R S T | T I M E | H I L D A | H A S | B E E N | T O | O U R | H O U S E | A N D | T O M | I N T R O D U C E S | H E R | A R O U N D | +I | H E A R | T H E | T | V | G O I N G | F O R | A | F E W | M I N U T E S | T H E N | P O P | T U R N S | I T | O F F | A N D | G O E S | I N | T H E | K I T C H E N | T O | T A L K | T O | M O M | +I | L O O K | A T | M Y | W A T C H | I T ' S | A | Q U A R T E R | T O | E L E V E N | +I | G E T | T H E | P I L L O W S | C O M F O R T A B L Y | A R R A N G E D | O N | T H E | F L O O R | W I T H | A | B I G | B O T T L E | O F | S O D A | A N D | A | B A G | O F | P O P C O R N | W I T H I N | E A S Y | R E A C H | +A S | L O N G | A S | T H E R E ' S | A | B O N E | O N | T H E | F L O O R | T H E | T W O | O F | Y O U | W O R R Y | I T | +I | T U R N | O F F | T H E | T E L E V I S I O N | S E T | I ' V E | L O S T | T R A C K | O F | W H A T ' S | H A P P E N I N G | A N D | I T | D O E S N ' T | S E E M | T O | B E | T H E | G R A N D F A T H E R | W H O ' S | T H E | S P O O K | A F T E R | A L L | +W E L L | I | D O N ' T | T H I N K 
| Y O U | S H O U L D | T U R N | A | G U Y ' S | T | V | P R O G R A M | O F F | I N | T H E | M I D D L E | W I T H O U T | E V E N | F I N D I N G | O U T | A B O U T | I T | +H E R E ' S | T O | Y O U | A | L O N G | H A P P Y | L I F E | +I ' L L | H A V E | T O | C H E C K | S O M E | M O R E | S A Y S | T O M | +Y O U | K N O W | I ' D | G E T | D R A F T E D | I N | A | Y E A R | O R | T W O | A N Y W A Y | +T H E | T W O | S T R A Y | K I T T E N S | G R A D U A L L Y | M A K E | T H E M S E L V E S | A T | H O M E | +S H E | D O E S N ' T | P I C K | T H E M | U P | B U T | J U S T | H A V I N G | T H E M | I N | T H E | R O O M | S U R E | D O E S N ' T | G I V E | H E R | A S T H M A | +S O | H E | C A R E S | H U H | +I ' V E | D E C I D E D | T O | E N L I S T | I N | T H E | A R M Y | +T H E | W I N D | W A S | S O | S T R O N G | T H A T | I | H A D | T O | H O L D | M Y | H A T | O N | A N D | T H E | G I R L S | S K I R T S | W E R E | B L O W N | O U T | B E F O R E | T H E M | +H E | W A S | B O R N | L I K E | T H A T | T H E | O T H E R S | A R E | S M A R T | +S H E | L O O K E D | A T | M E | H E R | E Y E S | F A I R L Y | B L A Z I N G | W I T H | T H I N G S | S H E | C O U L D | N O T | S A Y | +A T | T H A T | M O M E N T | T H E | F A T H E R | C A M E | O U T | O F | T H E | H O L E | I N | T H E | B A N K | +M Y | G R A N D M O T H E R | A L W A Y S | S P O K E | I N | A | V E R Y | L O U D | T O N E | T O | F O R E I G N E R S | A S | I F | T H E Y | W E R E | D E A F | +P R E S E N T L Y | A G A I N S T | O N E | O F | T H O S E | B A N K S | I | S A W | A | S O R T | O F | S H E D | T H A T C H E D | W I T H | T H E | S A M E | W I N E | C O L O R E D | G R A S S | T H A T | G R E W | E V E R Y W H E R E | +I | N O T I C E D | H O W | W H I T E | A N D | W E L L | S H A P E D | H I S | O W N | H A N D S | W E R E | +I | R E M E M B E R E D | W H A T | T H E | C O N D U C T O R | H A D | S A I D | A B O U T | H E R | E Y E S | +S H E | W A S | Q U I C K | A N D | V E R Y | E A G E R | +A F T E R | A N T O N I A | H A D | S A I D | T H E | N E W | W O R D S | O V E R | A N D | O V E R | S H E | W A N T E D | T O | G I V E | M E | A | L I T T L E | C H A S E D | S I L V E R | R I N G | S H E | W O R E | O N | H E R | M I D D L E | F I N G E R | +V E R Y | G L A D | V E R Y | G L A D | S H E | E J A C U L A T E D | +W E | S T O O D | P A N T I N G | O N | T H E | E D G E | O F | T H E | R A V I N E | L O O K I N G | D O W N | A T | T H E | T R E E S | A N D | B U S H E S | T H A T | G R E W | B E L O W | U S | +A M B R O S C H | H E | M A K E | G O O D | F A R M E R | +S H E | P O I N T E D | I N T O | T H E | G O L D | C O T T O N W O O D | T R E E | B E H I N D | W H O S E | T O P | W E | S T O O D | A N D | S A I D | A G A I N | W H A T | N A M E | +W E | W E R E | S O | D E E P | I N | T H E | G R A S S | T H A T | W E | C O U L D | S E E | N O T H I N G | B U T | T H E | B L U E | S K Y | O V E R | U S | A N D | T H E | G O L D | T R E E | I N | F R O N T | O F | U S | +I T | W A S | S O | L O N G | T H A T | I T | B U S H E D | O U T | B E H I N D | H I S | E A R S | A N D | M A D E | H I M | L O O K | L I K E | T H E | O L D | P O R T R A I T S | I | R E M E M B E R E D | I N | V I R G I N I A | +S H E | M A D E | M I S S U S | S H I M E R D A | U N D E R S T A N D | T H E | F R I E N D L Y | I N T E N T I O N | O F | O U R | V I S I T | A N D | T H E | B O H E M I A N | W O M A N | H A N D L E D | T H E | L O A V E S | O F | B R E A D | A N D | E V E 
N | S M E L L E D | T H E M | A N D | E X A M I N E D | T H E | P I E S | W I T H | L I V E L Y | C U R I O S I T Y | E X C L A I M I N G | M U C H | G O O D | M U C H | T H A N K | +E V E N | F R O M | A | D I S T A N C E | O N E | C O U L D | S E E | T H A T | T H E R E | W A S | S O M E T H I N G | S T R A N G E | A B O U T | T H I S | B O Y | +S H E | G O T | U P | O N | H E R | K N E E S | A N D | W R U N G | H E R | H A N D S | +Y O U ' L L | G E T | F I X E D | U P | C O M F O R T A B L E | A F T E R | W H I L E | M I S S U S | S H I M E R D A | M A K E | G O O D | H O U S E | +W H E N | I | C A M E | U P | H E | T O U C H E D | M Y | S H O U L D E R | A N D | L O O K E D | S E A R C H I N G L Y | D O W N | I N T O | M Y | F A C E | F O R | S E V E R A L | S E C O N D S | +A N T O N I A | P O I N T E D | U P | T O | T H E | S K Y | A N D | Q U E S T I O N E D | M E | W I T H | H E R | G L A N C E | +O C C A S I O N A L L Y | O N E | O F | T H E | H O R S E S | W O U L D | T E A R | O F F | W I T H | H I S | T E E T H | A | P L A N T | F U L L | O F | B L O S S O M S | A N D | W A L K | A L O N G | M U N C H I N G | I T | T H E | F L O W E R S | N O D D I N G | I N | T I M E | T O | H I S | B I T E S | A S | H E | A T E | D O W N | T O W A R D | T H E M | +N O W | W H Y | I S | T H A T | O T T O | +T H E | F A M I L Y | H A D | B E E N | L I V I N G | O N | C O R N C A K E S | A N D | S O R G H U M | M O L A S S E S | F O R | T H R E E | D A Y S | +H E | S T R U C K | A M B R O S C H | O N | T H E | B A C K | A N D | T H E | B O Y | S M I L E D | K N O W I N G L Y | +I | B E C A M E | S O M E W H A T | E M B A R R A S S E D | F O R | I | W A S | U S E D | T O | B E I N G | T A K E N | F O R | G R A N T E D | B Y | M Y | E L D E R S | +F U C H S | B R O U G H T | U P | A | S A C K | O F | P O T A T O E S | A N D | A | P I E C E | O F | C U R E D | P O R K | F R O M | T H E | C E L L A R | A N D | G R A N D M O T H E R | P A C K E D | S O M E | L O A V E S | O F | S A T U R D A Y ' S | B R E A D | A | J A R | O F | B U T T E R | A N D | S E V E R A L | P U M P K I N | P I E S | I N | T H E | S T R A W | O F | T H E | W A G O N | B O X | +I T ' S | N O | B E T T E R | T H A N | A | B A D G E R | H O L E | N O | P R O P E R | D U G O U T | A T | A L L | +W E E K | F O L L O W E D | W E E K | T H E S E | T W O | B E I N G S | L E D | A | H A P P Y | L I F E | I N | T H A T | H O V E L | +H E | P A S S E D | H O U R S | I N | W A T C H I N G | H E R | D R E S S I N G | A N D | U N D R E S S I N G | H E R | D O L L | A N D | I N | L I S T E N I N G | T O | H E R | P R A T T L E | +O N L Y | A S | H E | W A S | F I V E | A N D | F I F T Y | A N D | C O S E T T E | E I G H T | Y E A R S | O F | A G E | A L L | T H A T | M I G H T | H A V E | B E E N | L O V E | I N | T H E | W H O L E | C O U R S E | O F | H I S | L I F E | F L O W E D | T O G E T H E R | I N T O | A | S O R T | O F | I N E F F A B L E | L I G H T | +M O R E O V E R | J E A N | V A L J E A N | H A D | C H O S E N | H I S | R E F U G E | W E L L | +H E | H A D | R E T U R N E D | T O | P R I S O N | T H I S | T I M E | F O R | H A V I N G | D O N E | R I G H T | H E | H A D | Q U A F F E D | F R E S H | B I T T E R N E S S | D I S G U S T | A N D | L A S S I T U D E | W E R E | O V E R P O W E R I N G | H I M | E V E N | T H E | M E M O R Y | O F | T H E | B I S H O P | P R O B A B L Y | S U F F E R E D | A | T E M P O R A R Y | E C L I P S E | T H O U G H | S U R E | T O | R E A P P E A R | L A T E R | O N | L U M I N O U S | 
A N D | T R I U M P H A N T | B U T | A F T E R | A L L | T H A T | S A C R E D | M E M O R Y | W A S | G R O W I N G | D I M | +C O S E T T E | W A S | N O | L O N G E R | I N | R A G S | S H E | W A S | I N | M O U R N I N G | +W H E N | T H E S E | T W O | S O U L S | P E R C E I V E D | E A C H | O T H E R | T H E Y | R E C O G N I Z E D | E A C H | O T H E R | A S | N E C E S S A R Y | T O | E A C H | O T H E R | A N D | E M B R A C E D | E A C H | O T H E R | C L O S E L Y | +T H E | M A N | N O | L O N G E R | P R O D U C E D | O N | H E R | T H E | E F F E C T | O F | B E I N G | O L D | O R | P O O R | S H E | T H O U G H T | J E A N | V A L J E A N | H A N D S O M E | J U S T | A S | S H E | T H O U G H T | T H E | H O V E L | P R E T T Y | +H E | H A D | N E V E R | B E E N | F A T H E R | L O V E R | H U S B A N D | F R I E N D | +W H O | K N O W S | W H E T H E R | J E A N | V A L J E A N | H A D | N O T | B E E N | O N | T H E | E V E | O F | G R O W I N G | D I S C O U R A G E D | A N D | O F | F A L L I N G | O N C E | M O R E | +A L A S | H E | W A L K E D | W I T H | N O | L E S S | I N D E C I S I O N | T H A N | C O S E T T E | +T H E | B E S T | O F | U S | A R E | N O T | E X E M P T | F R O M | E G O T I S T I C A L | T H O U G H T S | +S H E | F E L T | T H A T | W H I C H | S H E | H A D | N E V E R | F E L T | B E F O R E | A | S E N S A T I O N | O F | E X P A N S I O N | +H I S | S I S T E R | A N D | H I S | S I S T E R ' S | C H I L D R E N | H A D | L E F T | H I M | O N L Y | A | V A G U E | A N D | F A R | O F F | M E M O R Y | W H I C H | H A D | F I N A L L Y | A L M O S T | C O M P L E T E L Y | V A N I S H E D | H E | H A D | M A D E | E V E R Y | E F F O R T | T O | F I N D | T H E M | A N D | N O T | H A V I N G | B E E N | A B L E | T O | F I N D | T H E M | H E | H A D | F O R G O T T E N | T H E M | +T H E | H E A R T | O F | T H A T | E X | C O N V I C T | W A S | F U L L | O F | V I R G I N I T Y | +N A T U R E | A | D I F F E R E N C E | O F | F I F T Y | Y E A R S | H A D | S E T | A | P R O F O U N D | G U L F | B E T W E E N | J E A N | V A L J E A N | A N D | C O S E T T E | D E S T I N Y | F I L L E D | I N | T H I S | G U L F | +C O S E T T E | O N | H E R | S I D E | H A D | A L S O | U N K N O W N | T O | H E R S E L F | B E C O M E | A N O T H E R | B E I N G | P O O R | L I T T L E | T H I N G | +T O | M E E T | W A S | T O | F I N D | E A C H | O T H E R | +H E | H A D | P A I D | H E R | S I X | M O N T H S | I N | A D V A N C E | A N D | H A D | C O M M I S S I O N E D | T H E | O L D | W O M A N | T O | F U R N I S H | T H E | C H A M B E R | A N D | D R E S S I N G | R O O M | A S | W E | H A V E | S E E N | +H E | W A S | T H A T | C H I L D ' S | S T A Y | A N D | S H E | W A S | H I S | P R O P | +A N D | T H E N | H E | T A L K E D | O F | H E R | M O T H E R | A N D | H E | M A D E | H E R | P R A Y | +H E | S U F F E R E D | A L L | T H E | P A N G S | O F | A | M O T H E R | A N D | H E | K N E W | N O T | W H A T | I T | M E A N T | F O R | T H A T | G R E A T | A N D | S I N G U L A R | M O V E M E N T | O F | A | H E A R T | W H I C H | B E G I N S | T O | L O V E | I S | A | V E R Y | O B S C U R E | A N D | A | V E R Y | S W E E T | T H I N G | +H E | P R O T E C T E D | H E R | A N D | S H E | S T R E N G T H E N E D | H I M | +W H A T | A L T E R N A T I V E | W A S | T H E R E | F O R | H E R | +A N D | A L R E A D Y | T H I S | A S T O U N D I N G | B L O W | B E G I N S | T O | T A K E | I T S | P L A C E | A M O 
N G | O T H E R | E V E N T S | A S | A | T H I N G | S T R A N G E | A N D | T E R R I B L E | I N D E E D | B U T | R E L A T E D | T O | A L L | T H E | S T R A N G E N E S S | A N D | M Y S T E R Y | O F | L I F E | P A R T | O F | T H E | U N I V E R S A L | M Y S T E R I E S | O F | D E S P A I R | A N D | F U T I L I T Y | A N D | D E A T H | T H A T | H A V E | T R O U B L E D | M Y | C O N S C I O U S N E S S | S I N C E | C H I L D H O O D | +I | B E C A M E | G R O T E S Q U E L Y | A N X I O U S | T O | A S S U R E | H I M | T H A T | I N D E E D | S H E | A N D | I | H A D | B E E N | A S | T H E Y | S A Y | I N N O C E N T | T H R O U G H O U T | O U R | L A S T | D A Y | T O G E T H E R | +S H E | W A S | D E S T R O Y E D | N O T | M E R E L Y | B Y | T H E | U N C O N S I D E R E D | U N D I S C I P L I N E D | P A S S I O N S | O F | H E R | H U S B A N D | A N D | H E R | L O V E R | B U T | B Y | T H E | V A S T | T R A D I T I O N | T H A T | S U S T A I N S | A N D | E N F O R C E S | T H E | S U B J U G A T I O N | O F | H E R | S E X | +I T | W A S | T H A T | I D E A | O F | W A S T E | T H A T | D O M I N A T E D | M Y | M I N D | I N | A | S T R A N G E | I N T E R V I E W | I | H A D | W I T H | J U S T I N | +I T | S E E M S | T O | M E | M O R E | A N D | M O R E | A S | I | L I V E | L O N G E R | T H A T | M O S T | P O E T R Y | A N D | M O S T | L I T E R A T U R E | A N D | P A R T I C U L A R L Y | T H E | L I T E R A T U R E | O F | T H E | P A S T | I S | D I S C O R D A N T | W I T H | T H E | V A S T N E S S | A N D | V A R I E T Y | T H E | R E S E R V E S | A N D | R E S O U R C E S | A N D | R E C U P E R A T I O N S | O F | L I F E | A S | W E | L I V E | I T | T O | D A Y | +W E | R A N G E | W I D E R | L A S T | L O N G E R | A N D | E S C A P E | M O R E | A N D | M O R E | F R O M | I N T E N S I T Y | T O W A R D S | U N D E R S T A N D I N G | +F O R | A | T I M E | T H E | D E A T H | O F | M A R Y | O B S C U R E D | H E R | L I F E | F O R | M E | B U T | N O W | H E R | L I V I N G | P R E S E N C E | I S | M O R E | I N | M Y | M I N D | A G A I N | +I F | W E | H A D | B E E N | B R O T H E R | A N D | S I S T E R | I N D E E D | T H E R E | W A S | N O T H I N G | +Y O U | W E R E | W R O N G | I N | A L L | T H A T | I | S A I D | S H E | K E P T | H E R | F A I T H | W I T H | Y O U | +H O W | W E | M U S T | S I M P L I F Y | +W E | N E V E R | P L A N N E D | T O | M E E T | A N D | W H E N | W E | M E T | +A N D | I T | I S | U P O N | T H I S | E F F E C T | O F | S W E E T | A N D | B E A U T I F U L | P O S S I B I L I T I E S | C A U G H T | I N | T H E | N E T | O F | A N I M A L | J E A L O U S I E S | A N D | T H O U G H T L E S S | M O T I V E S | A N D | A N C I E N T | R I G I D | I N S T I T U T I O N S | T H A T | I | W O U L D | E N D | T H I S | W R I T I N G | +B U T | N O W | I T | D O E S N ' T | S E E M | T O | M A T T E R | V E R Y | M U C H | +I T | I S | T H E | E X P R E S S I O N | O F | L I F E | U N D E R | C R U D E R | A N D | M O R E | R I G I D | C O N D I T I O N S | T H A N | O U R S | L I V E D | B Y | P E O P L E | W H O | L O V E D | A N D | H A T E D | M O R E | N A I V E L Y | A G E D | S O O N E R | A N D | D I E D | Y O U N G E R | T H A N | W E | D O | +I N | M A R Y | I T | S E E M S | T O | M E | I | F O U N D | B O T H | W O M A N H O O D | A N D | F E L L O W S H I P | I | F O U N D | W H A T | M A N Y | H A V E | D R E A M T | O F | L O V E | A N D | F R I E N D S H I P | F R E E L Y 
| G I V E N | A N D | I | C O U L D | D O | N O T H I N G | B U T | C L U T C H | A T | H E R | T O | M A K E | H E R | M Y | P O S S E S S I O N | +Y O U | S E E | T H E | T R E A T M E N T | I S | A | T R I F L E | F A N C I F U L | +B U T | I F | I | P L A Y | Y O U | A | R O U N D E L | L A D Y | G E T | M E | A | G I F T | F R O M | T H E | E M P E R O R ' S | D A U G H T E R | H E R | F I N G E R | R I N G | F O R | M Y | F I N G E R | B R I N G | T H O U G H | S H E ' S | P L E D G E D | A | T H O U S A N D | L E A G U E S | O V E R | T H E | W A T E R | L A D Y | L A D Y | M Y | F A I R | L A D Y | O | M Y | R O S E | W H I T E | L A D Y | +T H E | E M P E R O R ' S | D A U G H T E R | +T H E | L A D I E S | +L A D Y | L A D Y | M Y | R O S E | W H I T E | L A D Y | B U T | W I L L | Y O U | N O T | H E A R | A | R O U N D E L | L A D Y | +I ' L L | P L A Y | F O R | Y O U | N O W | N E A T H | T H E | A P P L E | B O U G H | A N D | Y O U | S H A L L | D R E A M | O N | T H E | L A W N | S O | S H A D Y | L A D Y | L A D Y | M Y | F A I R | L A D Y | O | M Y | A P P L E | G O L D | L A D Y | +O N C E | M O R E | T H E | S I N G E R | P L A Y S | A N D | T H E | L A D I E S | D A N C E | B U T | O N E | B Y | O N E | T H E Y | F A L L | A S L E E P | T O | T H E | D R O W S Y | M U S I C | A N D | T H E N | T H E | S I N G E R | S T E P S | I N T O | T H E | R I N G | A N D | U N L O C K S | T H E | T O W E R | A N D | K I S S E S | T H E | E M P E R O R ' S | D A U G H T E R | +B E D | T I M E | C H I L D R E N | +T H E | L A D I E S | I N | Y E L L O W | D R E S S E S | S T A N D | A G A I N | I N | A | R I N G | A B O U T | T H E | E M P E R O R ' S | D A U G H T E R | A N D | A R E | F O R | T H E | L A S T | T I M E | A C C O S T E D | B Y | T H E | S I N G E R | W I T H | H I S | L U T E | +T H E | W A N D E R I N G | S I N G E R | +S H E | W O U L D | N O T | S P E A K | T H O U G H | W E | D A N C E D | A | W E E K | W I T H | H E R | T H O U G H T S | A | T H O U S A N D | L E A G U E S | O V E R | T H E | W A T E R | S I N G E R | S I N G E R | W A N D E R I N G | S I N G E R | O | M Y | H O N E Y | S W E E T | S I N G E R | +I | D O N ' T | K N O W | W H A T | B E C O M E S | O F | T H E | L A D I E S | +B U T | T H I S | I S | A | F A L L A C Y | +B U T | I | D I D | O N C E | H A V E | T H E | L U C K | T O | H E A R | A N D | S E E | T H E | L A D Y | P L A Y E D | I N | E N T I R E T Y | T H E | C H I L D R E N | H A D | B E E N | G R A N T E D | L E A V E | T O | P L A Y | J U S T | O N E | M O R E | G A M E | B E F O R E | B E D | T I M E | A N D | O F | C O U R S E | T H E Y | C H O S E | T H E | L O N G E S T | A N D | P L A Y E D | I T | W I T H O U T | M I S S I N G | A | S Y L L A B L E | +F O R G O T T E N | T O O | T H E | N A M E | O F | G I L L I A N | T H E | L O V E L Y | C A P T I V E | +T H E | W A N D E R I N G | S I N G E R | A P P R O A C H E S | T H E M | W I T H | H I S | L U T E | +O | I F | Y O U | P L A Y | U S | A | R O U N D E L | S I N G E R | H O W | C A N | T H A T | H A R M | T H E | E M P E R O R ' S | D A U G H T E R | +N O W | Y O U | M A Y | P L A Y | A | S E R E N A | S I N G E R | A | D R E A M | O F | N I G H T | F O R | A N | A P P L E | G O L D | L A D Y | F O R | T H E | F R U I T | I S | N O W | O N | T H E | A P P L E | B O U G H | A N D | T H E | M O O N | I S | U P | A N D | T H E | L A W N | I S | S H A D Y | S I N G E R | S I N G E R | W A N D E R I N G | S I N G E R | O | M Y | H O N E Y | S W E E T | S I N G E R | +T H E 
| W A N D E R I N G | S I N G E R | +W O R S E | A N D | W O R S E | H E | I S | E V E N | P R E S U M E D | T O | B E | T H E | C A P T I V E ' S | S W E E T H E A R T | W H O | W H E E D L E S | T H E | F L O W E R | T H E | R I N G | A N D | T H E | P R I S O N | K E Y | O U T | O F | T H E | S T R I C T | V I R G I N S | F O R | H I S | O W N | P U R P O S E S | A N D | F L I E S | W I T H | H E R | A T | L A S T | I N | H I S | S H A L L O P | A C R O S S | T H E | S E A | T O | L I V E | W I T H | H E R | H A P P I L Y | E V E R | A F T E R | +I T | W A S | L O C K E D | F R O M | T H E | I N S I D E | A N D | W E | H A D | T O | B U R N | I T | D O W N | W I T H | A | T O R C H | T H A T ' S | W H E R E | T H E Y | A R E | +Y E S | C H A R C O A L | +T H E | D A I L Y | N E W S C A S T S | F R O M | T E R R A | S H O W E D | A | C O R R E S P O N D I N G | S H I F T | I N | I N T E R E S T | A T | H O M E | +W E L L | O F | C O U R S E | T H E Y ' R E | D E A D | W H A T | A | Q U E S T I O N | +B R I N G | I T | T O | T H E | P U B L I C | A T T E N T I O N | D R A M A T I Z E | I T | +T O N Y | L A T T I M E R | T H E | D I S C O V E R E R | W A S | B E G I N N I N G | T O | C A S H | I N | O N | H I S | A T T E N T I O N S | T O | G L O R I A | A N D | H I S | I N G R A T I A T I O N | W I T H | S I D | H E | W A S | A L W A Y S | E I T H E R | M A K I N G | V O I C E | A N D | I M A G E | T A L K S | F O R | T E L E C A S T | O R | L I S T E N I N G | T O | T H E | N E W S | F R O M | T H E | H O M E | P L A N E T | +T H A T | T O O K | T H E | C E N T E R | O F | I N T E R E S T | A W A Y | F R O M | A R C H A E O L O G Y | A N D | S T A R T E D | A | N E W | B U R S T | O F | A C T I V I T Y | +N O W | I T | W A S | B U R N E D | A W A Y | A T | B O T H | S I D E S | A N D | L A Y | S T I L L | H O T | A L O N G | T H E | E D G E S | O N | T H E | F L O O R | O F | T H E | B I G | O F F I C E | R O O M | I N | F R O N T | +T H E | T E R R A N | P U B L I C | W A N T E D | T O | H E A R | A B O U T | M A R T I A N S | A N D | I F | L I V E | M A R T I A N S | C O U L D N ' T | B E | F O U N D | A | R O O M | F U L L | O F | D E A D | O N E S | W A S | T H E | N E X T | B E S T | T H I N G | +A | F L O O D L I G H T | W A S | O N | I N | T H E | R O O M | I N S I D E | A N D | L A T T I M E R | W A S | G O I N G | A R O U N D | L O O K I N G | A T | T H I N G S | W H I L E | A | S P A C E | F O R C E | O F F I C E R | S T O O D | B Y | T H E | D O O R | +T H E Y | H A D | F O U R | O R | F I V E | S P E C I E S | O F | W H A T | M I G H T | L O O S E L Y | B E | C A L L E D | B I R D S | A N D | S O M E T H I N G | T H A T | C O U L D | E A S I L Y | B E | C L A S S E D | A S | A | R E P T I L E | A N D | A | C A R N I V O R O U S | M A M M A L | T H E | S I Z E | O F | A | C A T | W I T H | B I R D L I K E | C L A W S | A N D | A | H E R B I V O R E | A L M O S T | I D E N T I C A L | W I T H | T H E | P I G L I K E | T H I N G | I N | T H E | B I G | D A R F H U L V A | M U R A L | A N D | A N O T H E R | L I K E | A | G A Z E L L E | W I T H | A | S I N G L E | H O R N | I N | T H E | M I D D L E | O F | I T S | F O R E H E A D | +T H E | O R G A N I Z A T I O N | O F | A | S O C I E T Y | O F | M A R T I A N | A R C H A E O L O G Y | W I T H | A N T H O N Y | L A T T I M E R | P H | D | T H E | L O G I C A L | C A N D I D A T E | F O R | T H E | C H A I R | +B I L L | C H A N D L E R | T H E | Z O O L O G I S T | H A D | B E E N | G O I N G | D E E P E R | A N D | D E E P E 
R | I N T O | T H E | O L D | S E A | B O T T O M | O F | S Y R T I S | +M A R T H A | R E M E M B E R E D | T H E | C L O S E D | D O O R | O N | T H E | F I R S T | S U R V E Y | T H E Y | H A D N ' T | A T T E M P T E D | O P E N I N G | I T | +T H E | C I V I L I A N | S P E C I A L I S T S | I N | O T H E R | F I E L D S | A N D | T H E | S P A C E | F O R C E | P E O P L E | W H O | H A D | B E E N | H O L D I N G | T A P E | L I N E S | A N D | M A K I N G | S K E T C H E S | A N D | S N A P P I N G | C A M E R A S | W E R E | A L L | F L Y I N G | T O | L O W E R | S Y R T I S | T O | F I N D | O U T | H O W | M U C H | O X Y G E N | T H E R E | W A S | A N D | W H A T | K I N D | O F | L I F E | I T | S U P P O R T E D | +W I T H O U T | Q U E S T I O N | H E | H A D | B E C O M E | O V E R N I G H T | T H E | M O S T | W I D E L Y | K N O W N | A R C H A E O L O G I S T | I N | H I S T O R Y | +S O | T H E Y | J U S T | C A M E | I N | H E R E | A N D | L I T | T H E | C H A R C O A L | A N D | S A T | D R I N K I N G | T O G E T H E R | T I L L | T H E Y | A L L | F E L L | A S L E E P | +N O T | T H A T | I ' M | I N T E R E S T E D | I N | A L L | T H I S | F O R | M Y S E L F | H E | D I S C L A I M E D | A F T E R | L I S T E N I N G | T O | T H E | T E L E C A S T | F R O M | T E R R A | T W O | D A Y S | A F T E R | H I S | D I S C O V E R Y | +T O N Y ' S | F O U N D | T H E | M A R T I A N S | +S O | I | B E L I E V E | I | S H A L L | G O | B A C K | A T | L E A S T | F O R | A | W H I L E | A N D | S E E | W H A T | I | C A N | D O | +L E C T U R E S | +M A S S | S U I C I D E | T H A T ' S | W H A T | I T | W A S | N O T I C E | W H A T ' S | I N | T H E | C O R N E R S | +G L O R I A | S T A N D I S H | W H O | H A D | D R O P P E D | I N | F O R | L U N C H | W A S | O N | T H E | M E Z Z A N I N E | F A I R L Y | S C R E A M I N G | I N T O | A | R A D I O P H O N E | E X T E N S I O N | D O Z E N | A N D | A | H A L F | O F | T H E M | +T H E Y | A L S O | F O U N D | A | M A R T I A N | C A L E N D A R | T H E | Y E A R | H A D | B E E N | D I V I D E D | I N T O | T E N | M O R E | O R | L E S S | E Q U A L | M O N T H S | A N D | O N E | O F | T H E M | H A D | B E E N | D O M A | +H E | S M I L E D | G U I L T I L Y | A S | H E | A D D E D | B U T | I | M U S T | A D M I T | I | W A S | M O R E | T H A N | A | L I T T L E | C O N C E R N E D | M Y S E L F | +A N D | T H E | O N L Y | T R U C K | W E | H A D | A V A I L A B L E | W A S | I N | T H A T | B U R N I N G | S H E D | T H E | S U P E R I N T E N D E N T | A D D E D | B I T T E R L Y | +T H E | T W O | G I R L S | W E R E | A S | M U C H | U P S E T | A S | T O M ' S | M O T H E R | T O M | L A U G H E D | +T H E | T E L E P H O N E | L I N E | W A S | S O O N | R E P A I R E D | A N D | A | S T E A D Y | S T R E A M | O F | R E S C U E | V E H I C L E S | B E G A N | A R R I V I N G | F R O M | H A R K N E S S | F I R E | T R U C K S | T H R E E | A M B U L A N C E S | A N D | P R I V A T E | C A R S | D R I V E N | B Y | V O L U N T E E R S | +I ' L L | B E | G L A D | T O | T R Y | S I R | H E | R E P L I E D | +M I S T E R | S W I F T ' S | E Y E S | T W I N K L E D | +T H E | Y O U N G | I N V E N T O R | H A D | J U S T | N O T I C E D | H I S | F R I E N D | L Y I N G | P I N N E D | B E N E A T H | A | H E A V Y | B E A M | N E A R B Y | +H E ' S | A | G R E A T | S C I E N T I S T | +B U D | T H R E W | U P | H I S | A R M S | T O | P R O T E C T | H I M S E L F | B U T | T O O | L A T E | +T 
H E Y | P I C K E D | T H E I R | W A Y | T H R O U G H | T H E | W R E C K A G E | A N D | E M E R G E D | O N | A | S C E N E | O F | F R I G H T F U L | D E S T R U C T I O N | +H I S | F R I E N D ' S | E Y E L I D S | F L I C K E R E D | +M A L E | O R | F E M A L E | H U M A N | O R | A N I M A L | +T H E | S K Y | W A S | V I S I B L E | T H R O U G H | S E V E R A L | G A P I N G | H O L E S | I N | T H E | R O O F | W H I C H | W A S | S A G G I N G | D A N G E R O U S L Y | O N | I T S | S U P P O R T I N G | T R U S S E S | +T H E N | T O M | W H O | H A D | B E E N | S T U N N E D | B Y | S O M E | F A L L I N G | D E B R I S | R A I S E D | H I M S E L F | T O | A | S I T T I N G | P O S I T I O N | +L E T ' S | S E E | A B O U T | G E T T I N G | H E L P | F O R | M I S T E R | F A B E R | +A N O T H E R | E N G I N E E R | R U S H E D | T O W A R D | T H E | D O O R | T O | S E E | W H A T | W A S | H A P P E N I N G | O U T S I D E | +E L E C T R O N I C | E Q U I P M E N T | C A S C A D E D | F R O M | T H E | W A L L | S H E L V E S | A N D | A | H E A V Y | D U T Y | C H A I N | H O I S T | C A M E | L O O S E | F R O M | I T S | O V E R H E A D | T R A C K | P L U N G I N G | T O | T H E | F L O O R | W I T H | A | T E R R I F Y I N G | C R A S H | +A N | I N S T A N T | L A T E R | I T | C R A S H E D | O V E R | P I N N I N G | M A R K | F A B E R | B E N E A T H | I T | +T H I S | I S N ' T | P A R T | O F | Y O U R | T E S T I N G | R O U T I N E | I S | I T | +T O M | N O D D E D | U N H A P P I L Y | +F O R | M I N U T E S | N O | O N E | S T I R R E D | A M O N G | T H E | W R E C K A G E | +A N Y H O W | W E | W A N T | T O | H E L P | G O T | A | J O B | F O R | U S | +W E ' D | B E T T E R | N O T | T R Y | T O | M O V E | H I M | T O M | D E C I D E D | W E ' L L | G E T | A N | A M B U L A N C E | +W I T H I N | M I N U T E S | T O M | W A S | I N | C H A R G E | O F | C L E A R I N G | A W A Y | R U B B L E | A N D | E X T R I C A T I N G | A N Y O N E | W H O | M I G H T | B E | T R A P P E D | I N S I D E | T H E | B U I L D I N G S | +I N S I D E | A | S E C R E T | R O C K E T | T E L E M E T E R I N G | D E V I C E | W A S | M O U N T E D | O N | I T S | T E S T | S T A N D | +M I S T E R | S W I F T | C A M E | I N T O | T H E | L I V I N G | R O O M | J U S T | T H E N | A N D | T O L D | T O M | H O W | W O R R I E D | M I S S U S | S W I F T | A N D | S A N D Y | H A D | B E E N | +T H E Y | C L U S T E R | A R O U N D | M E | T H E I R | H A N D S | A R E | T A L O N E D | T H E I R | E Y E S | A R E | R E D | L I K E | F L A M E | B U R N I N G | I N | D A R K N E S S | +S I N C E | H I S | B I R T H | H E | H A S | B E E N | G U A R D E D | S O | C L O S E L Y | T H A T | T H E | C L E V E R E S T | P O I S O N E R S | O F | T H E | E A S T | C O U L D | N O T | R E A C H | H I M | +B Y | W H I C H | A | S O U L | I S | D R A W N | F R O M | I T S | B O D Y | A N D | A C R O S S | G U L F S | O F | E C H O I N G | S P A C E | R E T U R N E D | T H E | M A N | O N | T H E | M A T | +B U T | A T | T H E | U R G E N T | E N T R E A T Y | O F | T H E | P R I N C E S S | O F | K H O S A L A | W H O | L O V E D | B H U N D A | C H A N D | V A I N L Y | H E | G A V E | H E R | A | L O C K | O F | H I S | L O N G | B L A C K | H A I R | A S | A | T O K E N | O F | R E M E M B R A N C E | +T H E | M A N | S H R U G G E D | H I S | B R O A D | S H O U L D E R S | A N D | T U R N E D | B A C K | I N T O | T H E | A R A B E S Q U E | C H A M B E R | +A | L 
O W | C O N F U S E D | M O A N | W A N E D | F R O M | H I S | M O U T H | +I | T E L L | Y O U | I T | I S | N O T | P O I S O N | S H E | C R I E D | +T H E | S L A N T | O F | T H E | M O O N | P R E S A G E D | E V I L | F O R | T H E | K I N G | O F | V E N D H Y A | T H E | S T A R S | A R E | I N | T U R M O I L | T H E | S E R P E N T | I N | T H E | H O U S E | O F | T H E | E L E P H A N T | +T H E R E | T H E Y | S T R O V E | T O | B R E A K | T H E | S I L V E R | C O R D | O F | L I F E | A N D | T H R U S T | M Y | S O U L | I N T O | T H E | B O D Y | O F | A | F O U L | N I G H T | W E I R D | T H E I R | S O R C E R Y | S U M M O N E D | U P | F R O M | H E L L | A H | +T H E R E | W A S | T H E | O L D | I M P E R I O U S | N O T E | I N | H I S | F A I L I N G | W H I S P E R | +O N | T H E | D A I S | U N D E R | T H E | G O L D E N | D O M E | T H E | K I N G | C R I E D | O U T | A G A I N | R A C K E D | B Y | A W F U L | P A R O X Y S M S | +T H E Y | S E E K | T O | S N A P | T H E | S I L V E R | C O R D | T H A T | B I N D S | M E | T O | M Y | D Y I N G | B O D Y | +A S | Y O U | W E L L | K N O W | T H E R E | A R E | T E N | M E N | A N D | T E N | W O M E N | W H O S E | S O L E | D U T Y | I S | T O | T A S T E | H I S | F O O D | A N D | W I N E | A N D | F I F T Y | A R M E D | W A R R I O R S | G U A R D | H I S | C H A M B E R | A S | T H E Y | G U A R D | I T | N O W | +P O I N T | O F | C O N T A C T | I N Q U I R E D | T H E | O T H E R | +I | K N O W | N O W | W H A T | B R I N G S | M E | T O | T H E | P Y R E | +T H E I R | F I N G E R S | S E A R | M E | L I K E | F I R E | +N O T | U N T I L | T H E | H E A V E N S | W E R E | I N | T H E | P R O P E R | O R D E R | C O U L D | T H E Y | P E R F O R M | T H I S | N E C R O M A N C Y | +H E | W A S | Y O U N G | N O | S P E A R | H A D | T O U C H E D | H I M | N O | P O I S O N | L U R K E D | I N | H I S | W I N E | +S E N D | M Y | S O U L | C L E A N | T O | A S U R A | +A L L | D I S C A R D E D | P O R T I O N S | O F | T H E | H U M A N | B O D Y | S T I L L | R E M A I N | P A R T | O F | I T | A T T A C H E D | T O | I T | B Y | I N T A N G I B L E | C O N N E C T I O N S | +W I T H | A | L O N G | S T A I N E D | F I N G E R N A I L | H E | M A P P E D | T H E | C O N S T E L L A T I O N S | O N | T H E | M A R B L E | T I L E D | F L O O R | +T H I S | M A N | W A S | C L A D | I N | A | B R O W N | C A M E L | H A I R | R O B E | A N D | S A N D A L S | A N D | A | G R E E N | T U R B A N | W A S | O N | H I S | H E A D | +Y O U | H A V E | N E V E R | D I S O B E Y E D | M E | O B E Y | M Y | L A S T | C O M M A N D | +Y O U R | C R Y | A N D | T H E | G R I P | O F | Y O U R | F I N G E R S | B R O U G H T | M E | B A C K | B U T | I | A M | G O I N G | F A S T | +S U M M E R | S Q U A S H E S | A L M O S T | I N | T H E I R | G O L D E N | B L O S S O M | C U C U M B E R S | N O W | E V I N C I N G | A | T E N D E N C Y | T O | S P R E A D | A W A Y | F R O M | T H E | M A I N | S T O C K | A N D | R A M B L E | F A R | A N D | W I D E | T W O | O R | T H R E E | R O W S | O F | S T R I N G | B E A N S | A N D | A S | M A N Y | M O R E | T H A T | W E R E | A B O U T | T O | F E S T O O N | T H E M S E L V E S | O N | P O L E S | T O M A T O E S | O C C U P Y I N G | A | S I T E | S O | S H E L T E R E D | A N D | S U N N Y | T H A T | T H E | P L A N T S | W E R E | A L R E A D Y | G I G A N T I C | A N D | P R O M I S E D | A N | E A R L Y | A N D | A B U N D A N T | H A R V E S 
T | +W H A T | A N | I N S T R U M E N T | I S | T H E | H U M A N | V O I C E | +P H O E B E | W O N D E R E D | W H O S E | C A R E | A N D | T O I L | I T | C O U L D | H A V E | B E E N | T H A T | H A D | P L A N T E D | T H E S E | V E G E T A B L E S | A N D | K E P T | T H E | S O I L | S O | C L E A N | A N D | O R D E R L Y | +F E W E R | W O R D S | T H A N | B E F O R E | B U T | W I T H | T H E | S A M E | M Y S T E R I O U S | M U S I C | I N | T H E M | +A N D | Y E T | I F | Y O U | C O U L D | O N L Y | S E E | T H E | B E N I G N | S M I L E | O F | T H E | O R I G I N A L | +T H E R E | I S | A | W O N D E R F U L | I N S I G H T | I N | H E A V E N ' S | B R O A D | A N D | S I M P L E | S U N S H I N E | +O H | R E J O I N E D | T H E | D A G U E R R E O T Y P I S T | B E C A U S E | L I K E | A N | O L D | L A D Y ' S | C U P | O F | T E A | I T | I S | W A T E R | B E W I T C H E D | +T H E | E N C L O S U R E | H A D | F O R M E R L Y | B E E N | V E R Y | E X T E N S I V E | B U T | W A S | N O W | C O N T R A C T E D | W I T H I N | S M A L L | C O M P A S S | A N D | H E M M E D | A B O U T | P A R T L Y | B Y | H I G H | W O O D E N | F E N C E S | A N D | P A R T L Y | B Y | T H E | O U T B U I L D I N G S | O F | H O U S E S | T H A T | S T O O D | O N | A N O T H E R | S T R E E T | +I T | N O W | C O N T A I N E D | O N L Y | C H A N T I C L E E R | H I S | T W O | W I V E S | A N D | A | S O L I T A R Y | C H I C K E N | +S I N C E | Y O U | A R E | A | F R I E N D | O F | M Y | C O U S I N | H E P Z I B A H ' S | Y O U | S H O U L D | A S K | H E R | T O | S H O W | Y O U | T H E | P I C T U R E | +H E | E X H I B I T E D | A | D A G U E R R E O T Y P E | M I N I A T U R E | I N | A | M O R O C C O | C A S E | +T H I S | W A S | A | F O U N T A I N | S E T | R O U N D | W I T H | A | R I M | O F | O L D | M O S S Y | S T O N E S | A N D | P A V E D | I N | I T S | B E D | W I T H | W H A T | A P P E A R E D | T O | B E | A | S O R T | O F | M O S A I C | W O R K | O F | V A R I O U S L Y | C O L O R E D | P E B B L E S | +W H I L E | W E | G I V E | I T | C R E D I T | O N L Y | F O R | D E P I C T I N G | T H E | M E R E S T | S U R F A C E | I T | A C T U A L L Y | B R I N G S | O U T | T H E | S E C R E T | C H A R A C T E R | W I T H | A | T R U T H | T H A T | N O | P A I N T E R | W O U L D | E V E R | V E N T U R E | U P O N | E V E N | C O U L D | H E | D E T E C T | I T | +I | T U R N | U P | T H E | E A R T H | B Y | W A Y | O F | P A S T I M E | +T H E | W H I T E | D O U B L E | R O S E B U S H | H A D | E V I D E N T L Y | B E E N | P R O P P E D | U P | A N E W | A G A I N S T | T H E | H O U S E | S I N C E | T H E | C O M M E N C E M E N T | O F | T H E | S E A S O N | A N D | A | P E A R | T R E E | A N D | T H R E E | D A M S O N | T R E E S | W H I C H | E X C E P T | A | R O W | O F | C U R R A N T | B U S H E S | C O N S T I T U T E D | T H E | O N L Y | V A R I E T I E S | O F | F R U I T | B O R E | M A R K S | O F | T H E | R E C E N T | A M P U T A T I O N | O F | S E V E R A L | S U P E R F L U O U S | O R | D E F E C T I V E | L I M B S | +H E R E | W E | H A V E | T H E | M A N | S L Y | S U B T L E | H A R D | I M P E R I O U S | A N D | W I T H A L | C O L D | A S | I C E | L O O K | A T | T H A T | E Y E | +H O W | W O N D E R F U L L Y | R E S P O N S I V E | T O | E V E R Y | E M O T I O N | O F | T H E | H U M A N | S O U L | +B E E S | T O O | S T R A N G E | T O | S A Y | H A D | T H O U G H T | I T | W O R T H | T H E I 
R | W H I L E | T O | C O M E | H I T H E R | P O S S I B L Y | F R O M | T H E | R A N G E | O F | H I V E S | B E S I D E | S O M E | F A R M | H O U S E | M I L E S | A W A Y | +Y E T | T H E | O R I G I N A L | W E A R S | T O | C O M M O N | E Y E S | A | V E R Y | D I F F E R E N T | E X P R E S S I O N | +M O S T | O F | M Y | L I K E N E S S E S | D O | L O O K | U N A M I A B L E | B U T | T H E | V E R Y | S U F F I C I E N T | R E A S O N | I | F A N C Y | I S | B E C A U S E | T H E | O R I G I N A L S | A R E | S O | +I S | T H E R E | N O T H I N G | W I L D | I N | T H E | E Y E | C O N T I N U E D | H O L G R A V E | S O | E A R N E S T L Y | T H A T | I T | E M B A R R A S S E D | P H O E B E | A S | D I D | A L S O | T H E | Q U I E T | F R E E D O M | W I T H | W H I C H | H E | P R E S U M E D | O N | T H E I R | S O | R E C E N T | A C Q U A I N T A N C E | +P H O E B E | M E R E L Y | G L A N C E D | A T | I T | A N D | G A V E | I T | B A C K | +I T | I S | L I K E | A | B A N D A G E | O V E R | O N E ' S | E Y E S | T O | C O M E | I N T O | I T | +W H I L E | T H U S | D I S M I S S I N G | H E R | T H E | M A I D E N | L A D Y | S T E P T | F O R W A R D | K I S S E D | P H O E B E | A N D | P R E S S E D | H E R | T O | H E R | H E A R T | W H I C H | B E A T | A G A I N S T | T H E | G I R L ' S | B O S O M | W I T H | A | S T R O N G | H I G H | A N D | T U M U L T U O U S | S W E L L | +T H E Y | K E P T | T H E M S E L V E S | A L I V E | U N Q U E S T I O N A B L Y | A N D | L A I D | N O W | A N D | T H E N | A N | E G G | A N D | H A T C H E D | A | C H I C K E N | N O T | F O R | A N Y | P L E A S U R E | O F | T H E I R | O W N | B U T | T H A T | T H E | W O R L D | M I G H T | N O T | A B S O L U T E L Y | L O S E | W H A T | H A D | O N C E | B E E N | S O | A D M I R A B L E | A | B R E E D | O F | F O W L S | +T H E | C H I C K E N | C R E P T | T H R O U G H | T H E | P A L E S | O F | T H E | C O O P | A N D | R A N | W I T H | S O M E | S H O W | O F | L I V E L I N E S S | T O | H E R | F E E T | W H I L E | C H A N T I C L E E R | A N D | T H E | L A D I E S | O F | H I S | H O U S E H O L D | R E G A R D E D | H E R | W I T H | Q U E E R | S I D E L O N G | G L A N C E S | A N D | T H E N | C R O A K E D | O N E | T O | A N O T H E R | A S | I F | C O M M U N I C A T I N G | T H E I R | S A G E | O P I N I O N S | O F | H E R | C H A R A C T E R | +A H | B U T | T H E S E | H E N S | A N S W E R E D | T H E | Y O U N G | M A N | T H E S E | H E N S | O F | A R I S T O C R A T I C | L I N E A G E | W O U L D | S C O R N | T O | U N D E R S T A N D | T H E | V U L G A R | L A N G U A G E | O F | A | B A R N | Y A R D | F O W L | +S O | W E | W I L L | B E | F E L L O W | L A B O R E R S | S O M E W H A T | O N | T H E | C O M M U N I T Y | S Y S T E M | +T H E Y | H A V E | K N O W N | M E | M U C H | L O N G E R | B U T | N E V E R | H O N O R | M E | W I T H | A N Y | F A M I L I A R I T Y | T H O U G H | H A R D L Y | A | D A Y | P A S S E S | W I T H O U T | M Y | B R I N G I N G | T H E M | F O O D | +H E | H E L D | A | H O E | I N | H I S | H A N D | A N D | W H I L E | P H O E B E | W A S | G O N E | I N | Q U E S T | O F | T H E | C R U M B S | H A D | B E G U N | T O | B U S Y | H I M S E L F | W I T H | D R A W I N G | U P | F R E S H | E A R T H | A B O U T | T H E | R O O T S | O F | T H E | T O M A T O E S | +I | P R E F E R | T O | T H I N K | A N D | S O | W O U L D | M I S S | H E P Z I B A H | T H A T | T H E Y | R E C O G N I 
Z E | T H E | F A M I L Y | T O N E | F O R | Y O U | A R E | A | P Y N C H E O N | +I T | I S | N O N S E N S E | S A I D | P H O E B E | A | L I T T L E | I M P A T I E N T L Y | F O R | U S | T O | T A L K | A B O U T | A | P I C T U R E | W H I C H | Y O U | H A V E | N E V E R | S E E N | +T H E S E | F E A T H E R E D | P E O P L E | H A D | E X I S T E D | T O O | L O N G | I N | T H E I R | D I S T I N C T | V A R I E T Y | A | F A C T | O F | W H I C H | T H E | P R E S E N T | R E P R E S E N T A T I V E S | J U D G I N G | B Y | T H E I R | L U G U B R I O U S | D E P O R T M E N T | S E E M E D | T O | B E | A W A R E | +W E L L | I | D O N ' T | W I S H | T O | S E E | I T | A N Y | M O R E | O B S E R V E D | P H O E B E | T U R N I N G | A W A Y | H E R | E Y E S | I T | I S | C E R T A I N L Y | V E R Y | L I K E | T H E | O L D | P O R T R A I T | +P R A Y | G O | T O | B E D | F O R | I | A M | S U R E | Y O U | M U S T | N E E D | R E S T | +T H E | S U N | A S | Y O U | S E E | T E L L S | Q U I T E | A N O T H E R | S T O R Y | A N D | W I L L | N O T | B E | C O A X E D | O U T | O F | I T | A F T E R | H A L F | A | D O Z E N | P A T I E N T | A T T E M P T S | O N | M Y | P A R T | +S H E | W A S | I N D I S T I N C T L Y | A W A R E | H O W E V E R | T H A T | T H E | G A U N T | F I G U R E | O F | T H E | O L D | G E N T L E W O M A N | W A S | S I T T I N G | I N | O N E | O F | T H E | S T R A I G H T | B A C K E D | C H A I R S | A | L I T T L E | W I T H D R A W N | F R O M | T H E | W I N D O W | T H E | F A I N T | G L E A M | O F | W H I C H | S H O W E D | T H E | B L A N C H E D | P A L E N E S S | O F | H E R | C H E E K | T U R N E D | S I D E W A Y S | T O W A R D S | A | C O R N E R | +T H E | D I S T I N G U I S H I N G | M A R K | O F | T H E | H E N S | W A S | A | C R E S T | O F | L A M E N T A B L Y | S C A N T Y | G R O W T H | I N | T H E S E | L A T T E R | D A Y S | B U T | S O | O D D L Y | A N D | W I C K E D L Y | A N A L O G O U S | T O | H E P Z I B A H ' S | T U R B A N | T H A T | P H O E B E | T O | T H E | P O I G N A N T | D I S T R E S S | O F | H E R | C O N S C I E N C E | B U T | I N E V I T A B L Y | W A S | L E D | T O | F A N C Y | A | G E N E R A L | R E S E M B L A N C E | B E T W I X T | T H E S E | F O R L O R N | B I P E D S | A N D | H E R | R E S P E C T A B L E | R E L A T I V E | +I | W I L L | S I T | I N | T H E | P A R L O R | A W H I L E | A N D | C O L L E C T | M Y | T H O U G H T S | +S O | W I S E | A S | W E L L | A S | A N T I Q U E | W A S | T H E I R | A S P E C T | A S | T O | G I V E | C O L O R | T O | T H E | I D E A | N O T | M E R E L Y | T H A T | T H E Y | W E R E | T H E | D E S C E N D A N T S | O F | A | T I M E | H O N O R E D | R A C E | B U T | T H A T | T H E Y | H A D | E X I S T E D | I N | T H E I R | I N D I V I D U A L | C A P A C I T Y | E V E R | S I N C E | T H E | H O U S E | O F | T H E | S E V E N | G A B L E S | W A S | F O U N D E D | A N D | W E R E | S O M E H O W | M I X E D | U P | W I T H | I T S | D E S T I N Y | +A T | S O M E | U N C E R T A I N | P E R I O D | I N | T H E | D E P T H S | O F | N I G H T | A N D | A S | I T | W E R E | T H R O U G H | T H E | T H I N | V E I L | O F | A | D R E A M | S H E | W A S | C O N S C I O U S | O F | A | F O O T S T E P | M O U N T I N G | T H E | S T A I R S | H E A V I L Y | B U T | N O T | W I T H | F O R C E | A N D | D E C I S I O N | +I T | W A S | E V I D E N T | T H A T | T H E | R A C E | H A D | D E G E N E R A T E D 
| L I K E | M A N Y | A | N O B L E | R A C E | B E S I D E S | I N | C O N S E Q U E N C E | O F | T O O | S T R I C T | A | W A T C H F U L N E S S | T O | K E E P | I T | P U R E | +B U T | P U T | I T | O N | T H E | T A B L E | I N | T H E | C O R N E R | O F | T H E | P A S S A G E | +S H E | D I D | N O T | A L T O G E T H E R | L I K E | H I M | +T H E R E | W E R E | A L S O | A | F E W | S P E C I E S | O F | A N T I Q U E | A N D | H E R E D I T A R Y | F L O W E R S | I N | N O | V E R Y | F L O U R I S H I N G | C O N D I T I O N | B U T | S C R U P U L O U S L Y | W E E D E D | A S | I F | S O M E | P E R S O N | E I T H E R | O U T | O F | L O V E | O R | C U R I O S I T Y | H A D | B E E N | A N X I O U S | T O | B R I N G | T H E M | T O | S U C H | P E R F E C T I O N | A S | T H E Y | W E R E | C A P A B L E | O F | A T T A I N I N G | +I F | Y O U | W O U L D | P E R M I T | M E | S A I D | T H E | A R T I S T | L O O K I N G | A T | P H O E B E | I | S H O U L D | L I K E | T O | T R Y | W H E T H E R | T H E | D A G U E R R E O T Y P E | C A N | B R I N G | O U T | D I S A G R E E A B L E | T R A I T S | O N | A | P E R F E C T L Y | A M I A B L E | F A C E | +I F | T H E | O R I G I N A L | I S | S T I L L | I N | T H E | W O R L D | I | T H I N K | H E | M I G H T | D E F Y | T H E | S U N | T O | M A K E | H I M | L O O K | S T E R N | A N D | H A R D | +M Y | N A M E | I S | P H O E B E | P Y N C H E O N | S A I D | T H E | G I R L | W I T H | A | M A N N E R | O F | S O M E | R E S E R V E | F O R | S H E | W A S | A W A R E | T H A T | H E R | N E W | A C Q U A I N T A N C E | C O U L D | B E | N O | O T H E R | T H A N | T H E | D A G U E R R E O T Y P I S T | O F | W H O S E | L A W L E S S | P R O P E N S I T I E S | T H E | O L D | M A I D | H A D | G I V E N | H E R | A | D I S A G R E E A B L E | I D E A | +I N | G O O D | F A I T H | H O W E V E R | H E | I S | N O T | S U F F I C I E N T L Y | I M A G I N A T I V E | T O | F L A T T E R | H I M S E L F | W I T H | T H E | S L I G H T E S T | H O P E | O F | T H I S | K I N D | +H E | T R U S T S | N O T | T O | B E | C O N S I D E R E D | A S | U N P A R D O N A B L Y | O F F E N D I N G | B Y | L A Y I N G | O U T | A | S T R E E T | T H A T | I N F R I N G E S | U P O N | N O B O D Y ' S | P R I V A T E | R I G H T S | A N D | A P P R O P R I A T I N G | A | L O T | O F | L A N D | W H I C H | H A D | N O | V I S I B L E | O W N E R | A N D | B U I L D I N G | A | H O U S E | O F | M A T E R I A L S | L O N G | I N | U S E | F O R | C O N S T R U C T I N G | C A S T L E S | I N | T H E | A I R | +T H E | A U T H O R | H A S | C O N S I D E R E D | I T | H A R D L Y | W O R T H | H I S | W H I L E | T H E R E F O R E | R E L E N T L E S S L Y | T O | I M P A L E | T H E | S T O R Y | W I T H | I T S | M O R A L | A S | W I T H | A N | I R O N | R O D | O R | R A T H E R | A S | B Y | S T I C K I N G | A | P I N | T H R O U G H | A | B U T T E R F L Y | T H U S | A T | O N C E | D E P R I V I N G | I T | O F | L I F E | A N D | C A U S I N G | I T | T O | S T I F F E N | I N | A N | U N G A I N L Y | A N D | U N N A T U R A L | A T T I T U D E | +T H E | N A R R A T I V E | I T | M A Y | B E | I S | W O V E N | O F | S O | H U M B L E | A | T E X T U R E | A S | T O | R E Q U I R E | T H I S | A D V A N T A G E | A N D | A T | T H E | S A M E | T I M E | T O | R E N D E R | I T | T H E | M O R E | D I F F I C U L T | O F | A T T A I N M E N T | +I F | P E R M I T T E D | B Y | T H E | H I S T O R I C A L 
| C O N N E C T I O N | W H I C H | T H O U G H | S L I G H T | W A S | E S S E N T I A L | T O | H I S | P L A N | T H E | A U T H O R | W O U L D | V E R Y | W I L L I N G L Y | H A V E | A V O I D E D | A N Y T H I N G | O F | T H I S | N A T U R E | +T H E R E | A P P E A R E D | T O | B E | A N | I M M E D I A T E | A S S O C I A T I O N | W I T H | T H E | D E A T H | T R A U M A | A S | I F | T H E | T W O | W E R E | I N E X T R I C A B L Y | L I N K E D | I N T O | O N E | +A | M I N U T E | I S | N O T | A | V E R Y | L A R G E | M E A S U R E | O F | T I M E | A N D | H I S | B O D Y | N E E D E D | E V E R Y | F R A C T I O N | O F | I T | +P A R T I C U L A R L Y | S O | O N | T H I S | L A S T | N I G H T | W H E N | O N L Y | T W O | O F | T H E | L I T T L E | C U B I C L E S | W E R E | O C C U P I E D | T H E | T H O U S A N D S | O F | O T H E R S | S T A N D I N G | W I T H | D A R K | E M P T Y | D O O R S | +T H E | C O N T E S T A N T S | I N | T H E | T W E N T I E S | N E E D E D | U N D I S T U R B E D | R E S T | T H E R E F O R E | N I G H T S | I N | T H E | D O R M I T O R I E S | W E R E | A S | Q U I E T | A S | D E A T H | +T H E R E | C O U L D | B E | L I T T L E | A R T | I N | T H I S | L A S T | A N D | F I N A L | R O U N D | O F | F E N C I N G | +B R I O N | S A W | S O M E T H I N G | C L O S E | T O | P A N I C | O N | H I S | O P P O N E N T ' S | F A C E | W H E N | T H E | M A N | F I N A L L Y | R E C O G N I Z E D | H I S | E R R O R | +S W E A T | C O V E R E D | B R I O N ' S | B O D Y | T R I C K L I N G | I N T O | T H E | T I G H T | L O I N C L O T H | T H A T | W A S | T H E | O N L Y | G A R M E N T | H E | W O R E | +T E N | S E C O N D S | +T H I S | I S | P H Y S I C A L L Y | I M P O S S I B L E | W H E N | C O N S C I O U S | +I R O L G | L O O K E D | A M A Z E D | A T | T H E | S U D D E N | F U R Y | O F | T H E | A T T A C K | T H E N | S M I L E D | +I ' M | H E R E | B E C A U S E | T H E | M A T T E R | I S | O F | U T M O S T | I M P O R T A N C E | A N D | B R A N D D | I S | T H E | O N E | I | M U S T | S E E | N O W | S T A N D | A S I D E | +A | W A V E | O F | D E S P A I R | R O L L E D | O U T | F R O M | I R O L G | B R I O N | S E N S E D | I T | A N D | K N E W | T H E | F I F T H | P O I N T | W A S | H I S | +B R E A T H I N G | D E E P L Y | B R I O N | S O F T L Y | S P O K E | T H E | A U T O | H Y P N O T I C | P H R A S E S | T H A T | T R I G G E R E D | T H E | P R O C E S S | +T H E R E | W A S | S I L E N C E | T H E N | A N D | S T I L L | W O N D E R I N G | B R I O N | W A S | O N C E | M O R E | A S L E E P | +A | M A N | S A I D | T O | T H E | U N I V E R S E | S I R | I | E X I S T | +W H E N | T H E | B U Z Z E R | S O U N D E D | H E | P U L L E D | H I S | F O I L | F R O M | H I S | S E C O N D ' S | S T A R T L E D | G R A S P | A N D | R A N | F O R W A R D | +J U S T | T H R U S T | A N D | P A R R Y | A N D | V I C T O R Y | T O | T H E | S T R O N G E R | +T H E | S T R E N G T H | T H A T | E N A B L E S | S O M E O N E | I N | A | T R A N C E | T O | H O L D | H I S | B O D Y | S T I F F | A N D | U N S U P P O R T E D | E X C E P T | A T | T W O | P O I N T S | T H E | H E A D | A N D | H E E L S | +T H E | C U T | O N | H I S | C H E S T | S T I L L | D R I P P I N G | B L O O D | T H E | A C H E | O F | H I S | O V E R S T R A I N E D | E Y E S | E V E N | T H E | S O A R I N G | A R E N A | A R O U N D | H I M | W I T H | T H E | T H O U S A N D S | O F | S P E C T A T O R S | W 
E R E | T R I V I A L I T I E S | N O T | W O R T H | T H I N K I N G | A B O U T | +T H E | B U Z Z E R ' S | W H I R R | T R I G G E R E D | H I S | M U S C L E S | I N T O | C O M P L E T E | R E L A X A T I O N | +O T H E R S | H A D | D I E D | B E F O R E | D U R I N G | T H E | T W E N T I E S | A N D | D E A T H | D U R I N G | T H E | L A S T | R O U N D | W A S | I N | S O M E | W A Y S | E A S I E R | T H A N | D E F E A T | +O N E | M I N U T E | A | V O I C E | S A I D | A N D | T H E | T I M E | B U Z Z E R | S O U N D E D | +E V E R Y | M A N | W H O | E N T E R E D | T H E | T W E N T I E S | H A D | H I S | O W N | T R A I N I N G | T R I C K S | +H E | W A S | I N | R E V E R I E | S L I D I N G | A L O N G | T H E | B O R D E R S | O F | C O N S C I O U S N E S S | +T H E | O T H E R | V O I C E | S N A P P E D | W I T H | A | H A R S H | U R G E N C Y | C L E A R L Y | U S E D | T O | C O M M A N D | +T H E | T W E N T I E S | +T H E N | T H E | P O W E R F U L | T W I S T | T H A T | T H R U S T | I T | A S I D E | I N | A N D | U N D E R | T H E | G U A R D | +H E | A S K E D | T H E | H A N D L E R | W H O | W A S | K N E A D I N G | H I S | A C H I N G | M U S C L E S | +H I S | I N S T A N T | O F | P A N I C | W A S | F O L L O W E D | B Y | A | S M A L L | S H A R P | B L O W | H I G H | O N | H I S | C H E S T | +H E | T H O U G H T | I T | W A S | A | L A S T | B U R S T | O F | E N E R G Y | H E | K N E W | H O W | C L O S E | T H E Y | B O T H | W E R E | T O | E X H A U S T I O N | +O N L Y | H I S | H E A R T | A N D | L U N G S | W O R K E D | O N | A T | A | S T R O N G | M E A S U R E D | R A T E | +H E | M U S T | H A V E | D R A W N | H I S | G U N | B E C A U S E | T H E | I N T R U D E R | S A I D | Q U I C K L Y | P U T | T H A T | A W A Y | Y O U ' R E | B E I N G | A | F O O L | O U T | +A | R E D | H A I R E D | M O U N T A I N | O F | A | M A N | W I T H | A N | A P P A R E N T L Y | I N E X H A U S T I B L E | S T O R E | O F | E N E R G Y | +T R U E | A G R E E D | K A L I K O | +I | H A V E | R E M A I N E D | A | P R I S O N E R | O N L Y | B E C A U S E | I | W I S H E D | T O | B E | O N E | A N D | W I T H | T H I S | H E | S T E P P E D | F O R W A R D | A N D | B U R S T | T H E | S T O U T | C H A I N S | A S | E A S I L Y | A S | I F | T H E Y | H A D | B E E N | T H R E A D S | +T H E | M E T A L | F O R E S T | I S | I N | T H E | G R E A T | D O M E D | C A V E R N | T H E | L A R G E S T | I N | A L L | O U R | D O M I N I O N S | R E P L I E D | K A L I K O | +W H E R E | I S | M Y | B R O T H E R | N O W | +O H | N O | I ' M | Q U I T E | S U R E | H E | D I D N ' T | +H A V I N G | R E T U R N E D | T O | T H E | R O Y A L | C A V E R N | K A L I K O | F I R S T | P O U N D E D | T H E | G O N G | A N D | T H E N | S A T | I N | T H E | T H R O N E | W E A R I N G | R U G G E D O ' S | D I S C A R D E D | R U B Y | C R O W N | A N D | H O L D I N G | I N | H I S | H A N D | T H E | S C E P T R E | W H I C H | R U G G E D O | H A D | S O | O F T E N | T H R O W N | A T | H I S | H E A D | +T H A T ' S | F U N N Y | R E M A R K E D | B E T S Y | T H O U G H T F U L L Y | +I | D O N ' T | B E L I E V E | A N N | K N E W | A N Y | M A G I C | O R | S H E ' D | H A V E | W O R K E D | I T | B E F O R E | +K A L I K O | W E N T | T O | T H E | B I G | G O N G | A N D | P O U N D E D | O N | I T | J U S T | A S | R U G G E D O | U S E D | T O | D O | B U T | N O | O N E | A N S W E R E D | T H E | S U M M O N S | +I N | F A C T | T H 
E R E | I S | N O T H I N G | H E | C A N | D O | I N | T H E S E | D O M I N I O N S | A S | W E L L | A S | O U R | N O M E S | W H O S E | N U M B E R S | A R E | S O | G R E A T | T H A T | I T | W O R R I E S | U S | T O | K E E P | T H E M | A L L | B U S Y | +H O W E V E R | I F | W E | L O O K | S H A R P | W E | M A Y | B E | A B L E | T O | D I S C O V E R | O N E | O F | T H E S E | S E C R E T | W A Y S | +I | D O | N O T | K N O W | C O N F E S S E D | S H A G G Y | +I | H O P E | H E | D O E S N ' T | W O R K | T O O | H A R D | S A I D | S H A G G Y | +T H E | L I T T L E | G I R L | H A D | B E E N | A S L E E P | B U T | S H E | H E A R D | T H E | R A P S | A N D | O P E N E D | T H E | D O O R | +I | A L S O | O F F E R E D | T O | H E L P | Y O U R | B R O T H E R | T O | E S C A P E | B U T | H E | W O U L D | N O T | G O | +I | B E G G E D | R U G G E D O | L O N G | A G O | T O | S E N D | H I M | A W A Y | B U T | H E | W O U L D | N O T | D O | S O | +T H E | K I N G | H A S | F L E D | I N | D I S G R A C E | A N D | Y O U R | F R I E N D S | A R E | A S K I N G | F O R | Y O U | +W H E R E | I S | T H A T | +N O T | E X A C T L Y | R E T U R N E D | K A L I K O | +K A L I K O | H E S I T A T E D | +H E | E A T S | A N D | S L E E P S | V E R Y | S T E A D I L Y | R E P L I E D | T H E | N E W | K I N G | +I N Q U I R E D | S H A G G Y | I N | T H E | M E T A L | F O R E S T | +B E C A U S E | Y O U | W E R E | S L E E P I N G | I N S T E A D | O F | C O N Q U E R I N G | T H E | L O V E L Y | R O S E | P R I N C E S S | H A S | B E C O M E | A | F I D D L E | W I T H O U T | A | B O W | W H I L E | P O O R | S H A G G Y | S I T S | T H E R E | A | C O O I N G | D O V E | +H E | D O E S N ' T | W O R K | A T | A L L | +H E | H A S | G O N E | A N D | G O N E | F O R | G O O D | A N S W E R E D | P O L Y C H R O M E | W H O | H A D | M A N A G E D | T O | S Q U E E Z E | I N T O | T H E | R O O M | B E S I D E | T H E | D R A G O N | A N D | H A D | W I T N E S S E D | T H E | O C C U R R E N C E S | W I T H | M U C H | I N T E R E S T | +P A I N T I N G | H E | T E L L S | U S | I S | O F | A | D I F F E R E N T | Q U A L I T Y | T O | M A T H E M A T I C S | A N D | F I N I S H | I N | A R T | I S | A D D I N G | M O R E | F A C T | +I N | F A C T | H E | I S | Q U I T E | S E V E R E | O N | M I S T E R | R U S K I N | F O R | N O T | R E C O G N I S I N G | T H A T | A | P I C T U R E | S H O U L D | D E N O T E | T H E | F R A I L T Y | O F | M A N | A N D | R E M A R K S | W I T H | P L E A S I N G | C O U R T E S Y | A N D | F E L I C I T O U S | G R A C E | T H A T | M A N Y | P H A S E S | O F | F E E L I N G | +L I N N E L L ' S | P I C T U R E S | A R E | A | S O R T | O F | U P | G U A R D S | A N D | A T | E M | P A I N T I N G S | A N D | M A S O N ' S | E X Q U I S I T E | I D Y L L S | A R E | A S | N A T I O N A L | A S | A | J I N G O | P O E M | M I S T E R | B I R K E T | F O S T E R ' S | L A N D S C A P E S | S M I L E | A T | O N E | M U C H | I N | T H E | S A M E | W A Y | T H A T | M I S T E R | C A R K E R | U S E D | T O | F L A S H | H I S | T E E T H | A N D | M I S T E R | J O H N | C O L L I E R | G I V E S | H I S | S I T T E R | A | C H E E R F U L | S L A P | O N | T H E | B A C K | B E F O R E | H E | S A Y S | L I K E | A | S H A M P O O E R | I N | A | T U R K I S H | B A T H | N E X T | M A N | +O N L Y | U N F O R T U N A T E L Y | H I S | O W N | W O R K | N E V E R | D O E S | G E T | G O O D | +O N | T H E | G E N E R A L 
| P R I N C I P L E S | O F | A R T | M I S T E R | Q U I L T E R | W R I T E S | W I T H | E Q U A L | L U C I D I T Y | +M I S T E R | Q U I L T E R | I S | T H E | A P O S T L E | O F | T H E | M I D D L E | C L A S S E S | A N D | W E | A R E | G L A D | T O | W E L C O M E | H I S | G O S P E L | +H E | T E L L S | U S | T H A T | A T | T H I S | F E S T I V E | S E A S O N | O F | T H E | Y E A R | W I T H | C H R I S T M A S | A N D | R O A S T | B E E F | L O O M I N G | B E F O R E | U S | S I M I L E S | D R A W N | F R O M | E A T I N G | A N D | I T S | R E S U L T S | O C C U R | M O S T | R E A D I L Y | T O | T H E | M I N D | +A S | F O R | E T C H I N G S | T H E Y | A R E | O F | T W O | K I N D S | B R I T I S H | A N D | F O R E I G N | +B Y | H A R R Y | Q U I L T E R | M | A | +H E | H A S | G R A V E | D O U B T S | W H E T H E R | S I R | F R E D E R I C K | L E I G H T O N ' S | W O R K | I S | R E A L L Y | G R E E K | A F T E R | A L L | A N D | C A N | D I S C O V E R | I N | I T | B U T | L I T T L E | O F | R O C K Y | I T H A C A | +N O R | I S | M I S T E R | Q U I L T E R ' S | M A N N E R | L E S S | I N T E R E S T I N G | T H A N | H I S | M A T T E R | +I T | I S | O B V I O U S L Y | U N N E C E S S A R Y | F O R | U S | T O | P O I N T | O U T | H O W | L U M I N O U S | T H E S E | C R I T I C I S M S | A R E | H O W | D E L I C A T E | I N | E X P R E S S I O N | +H E | L A M E N T S | M O S T | B I T T E R L Y | T H E | D I V O R C E | T H A T | H A S | B E E N | M A D E | B E T W E E N | D E C O R A T I V E | A R T | A N D | W H A T | W E | U S U A L L Y | C A L L | P I C T U R E S | M A K E S | T H E | C U S T O M A R Y | A P P E A L | T O | T H E | L A S T | J U D G M E N T | A N D | R E M I N D S | U S | T H A T | I N | T H E | G R E A T | D A Y S | O F | A R T | M I C H A E L | A N G E L O | W A S | T H E | F U R N I S H I N G | U P H O L S T E R E R | +M I S T E R | Q U I L T E R | H A S | M I S S E D | H I S | C H A N C E | F O R | H E | H A S | F A I L E D | E V E N | T O | M A K E | H I M S E L F | T H E | T U P P E R | O F | P A I N T I N G | +N E A R | T H E | F I R E | A N D | T H E | O R N A M E N T S | F R E D | B R O U G H T | H O M E | F R O M | I N D I A | O N | T H E | M A N T E L | B O A R D | +I T | W A S | N O T | F R O M | A N Y | R E A L | C A U S E | O F | G R I E F | T H A T | S H E | W E P T | B U T | T H E R E | W A S | A | M A G N E T I C | Q U A L I T Y | I N | T E A R S | W H I C H | A L W A Y S | A T T R A C T E D | H E R ' S | +N I E C E | I | C O M M A N D | Y O U | N O T | T O | S T I R | O U T | O F | T H I S | R O O M | T H I S | E V E N I N G | +I T | I S | O F T E N | T H E | U N G R A T E F U L | T A S K | O F | A | F R I E N D | T O | B E | T R O U B L E S O M E | S O M E T I M E S | U N M A N N E R L Y | +Y E S | I N D E E D | A N D | I | B E L I E V E | I T | I S | R I G H T | T H A T | I | S H O U L D | K E E P | M Y | F I R S T | P R O M I S E | I S | I T | N O T | +K E E P | Y O U R | A P P O I N T M E N T | A N D | B E | A S S U R E D | T H A T | I | S H A L L | I S S U E | M Y | C O M M A N D S | W I T H | M O R E | C I R C U M S P E C T I O N | F O R | T H E | F U T U R E | A S | I | F I N D | H O W | S T R I C T L Y | T H E Y | A R E | C O M P L I E D | W I T H | +I F | Y O U | T H I N K | S O | M A D A M | I | S E E | N O T H I N G | T H A T | S H O U L D | P R E V E N T | M E | N O W | +N I G H T | A F T E R | N I G H T | H I S | S L E E P | H A D | B E E N | D I S T U R B E D | B Y | F E A R S | F 
O R | H E R | W H E N | A B R O A D | M O R N I N G | A F T E R | M O R N I N G | I T | H A D | B E E N | B R O K E N | B Y | T H E | C L A M O U R | O F | H E R | R E T U R N | +N O R | H A D | T H I S | G O O D | W O M A N ' S | O F F I C I O U S | L A B O U R S | T A K E N | T H E | L E A S T | F R O M | T H E | A W K W A R D N E S S | O F | T H E | S I L E N C E | W H I C H | A S | S O O N | A S | T H E | B U S T L E | S H E | H A D | M A D E | W A S | O V E R | R E T U R N E D | I N | I T S | F U L L | F O R C E | +S I R | E D W A R D | N O T | W H O L L Y | D I S C O U R A G E D | B Y | T H E | D E N I A L | W I T H | W H I C H | D O R R I F O R T H | H A D | W I T H | D E L I C A C Y | A C Q U A I N T E D | H I M | S T I L L | H O P E D | F O R | A | K I N D | R E C E P T I O N | A N D | W A S | S O | O F T E N | A T | T H E | H O U S E | O F | M I S S U S | H O R T O N | T H A T | L O R D | F R E D E R I C K ' S | J E A L O U S Y | W A S | E X C I T E D | A N D | T H E | T O R T U R E S | H E | S U F F E R E D | I N | C O N S E Q U E N C E | C O N V I N C E D | H I M | B E Y O N D | A | D O U B T | O F | T H E | S I N C E R I T Y | O F | H I S | A F F E C T I O N | +M I S S U S | H O R T O N | T O O | I N | T H E | S E L F | A P P R O V I N G | R E F L E C T I O N | T H A T | S H E | W A S | N O T | I N | A | Q U A R R E L | O R | A L T E R C A T I O N | O F | A N Y | K I N D | F E L T | H E R S E L F | A T | T H I S | M O M E N T | R E M A R K A B L Y | P E A C E F U L | A N D | C H A R I T A B L E | +I | H O P E | M I S S | M I L N E R | Y O U | P A S S | T H I S | E V E N I N G | A T | H O M E | +A T | T H E | U S U A L | H O U R | M I S T E R | D O R R I F O R T H | A N D | H I S | W A R D | W E R E | S U M M O N E D | T O | T E A | H E | E N T E R E D | W I T H | A | C O U N T E N A N C E | W H I C H | E V I N C E D | T H E | R E M A I N S | O F | A N G E R | H I S | E Y E | G A V E | T E S T I M O N Y | O F | H I S | A B S E N T | T H O U G H T S | A N D | T H O U G H | H E | T O O K | U P | A | P A M P H L E T | A F F E C T I N G | T O | R E A D | I T | W A S | P L A I N | T O | D I S C E R N | T H A T | H E | S C A R C E L Y | K N E W | H E | H E L D | I T | I N | H I S | H A N D | +M I S S | W O O D L E Y | T H O U G H T | I T | H E R | D U T Y | T O | B E | M U T E | A N D | N O W | T H E | G I N G L E | O F | A | T E A | S P O O N | W A S | L I K E | A | D E E P | T O N E D | B E L L | A L L | W A S | S O | Q U I E T | +D O R R I F O R T H | T H E N | L A I D | T H E | B O O K | O U T | O F | H I S | H A N D | A N D | B Y | T H E | T I M E | T H E | S E R V A N T | H A D | L E F T | T H E | R O O M | T H U S | B E G A N | +Y E T | D I D | T H E | W A T C H F U L | M I S S | W O O D L E Y | O F T E N T I M E S | H E A R | A | S I G H | E S C A P E | F R O M | H E R | U N K N O W N | T O | H E R S E L F | T I L L | S H E | W A S | R E M I N D E D | O F | I T | A N D | T H E N | A | S U D D E N | B L U S H | W O U L D | I N S T A N T L Y | O V E R S P R E A D | H E R | F A C E | +D O R R I F O R T H | R E A D | O N | A N D | S E E M E D | A F R A I D | O F | L O O K I N G | U P | L E S T | H E | S H O U L D | S E E | W H A T | H E | C O U L D | N O T | H A V E | P A R D O N E D | +I | T H O U G H T | M I S S | M I L N E R | Y O U | G A V E | M E | Y O U R | W O R D | T H A T | Y O U | W O U L D | P A S S | T H I S | E V E N I N G | A T | H O M E | +O N | T H I S | H E | R O S E | F R O M | H I S | C H A I R | A N D | G O I N G | T O | H E R | S A I D | O N C E | M O R E | S 
H E W | Y O U R | S U B M I S S I O N | B Y | O B E Y I N G | M E | A | S E C O N D | T I M E | T O | D A Y | +W H A T | H E | S A I D | H E | L O O K E D | W I T H | S O | M U C H | S I N C E R I T Y | T H A T | H A D | S H E | B E E N | B U R N I N G | W I T H | R A G E | A T | H I S | L A T E | B E H A V I O U R | S H E | M U S T | H A V E | F O R G I V E N | H I M | F O R | T H E | R E G R E T | W H I C H | H E | S O | F O R C I B L Y | E X P R E S T | +M I S S U S | H O R T O N | R O S E | F R O M | H E R | S E A T | M O V E D | T H E | D E C A N T E R S | A N D | F R U I T | R O U N D | T H E | T A B L E | S T I R R E D | T H E | F I R E | A N D | C A M E | B A C K | T O | H E R | S E A T | A G A I N | B E F O R E | A N O T H E R | W O R D | W A S | U T T E R E D | +M I S S | W O O D L E Y | D I D | N O T | R E C O L L E C T | H E R S E L F | S O | B U T | W A S | S O | I N | R E A L I T Y | I N | H E R | P E A C E | A N D | C H A R I T Y | W E R E | I N S T I N C T I V E | V I R T U E S | A C C I D E N T | C O U L D | N O T | I N C R E A S E | T H E M | +M I S S | M I L N E R | Y O U | S H A L L | N O T | L E A V E | T H E | H O U S E | T H I S | E V E N I N G | S I R | +H E R | H A N D | F E L L | M O T I O N L E S S | F R O M | T H A T | W H I C H | S H E | H E L D | S H E | A P P E A R E D | M O T I O N L E S S | H E R S E L F | T I L L | M I S S U S | H O R T O N | B E S E E C H I N G | H E R | N O T | T O | B E | U N E A S Y | A T | T H E | T R E A T M E N T | S H E | H A D | R E C E I V E D | M A D E | H E R | T E A R S | F L O W | A S | I F | H E R | H E A R T | W A S | B R E A K I N G | +S H E | W A S | G O I N G | T O | R E P L Y | B U T | F O U N D | S H E | C O U L D | N O T | W I T H O U T | A C C O M P A N Y I N G | H E R | W O R D S | W I T H | T E A R S | T H E R E F O R E | A F T E R | T H E | F I R S T | A T T E M P T | S H E | D E S I S T E D | +E V E R Y | T I M E | H E | B E H E L D | T H E | O B J E C T | O F | H I S | P A S S I O N | F O R | H E | S T I L L | C O N T I N U E D | H I S | V I S I T S | T H O U G H | N O T | S O | F R E Q U E N T L Y | A S | H E R E T O F O R E | H E | P L E A D E D | H I S | C A U S E | W I T H | S U C H | A R D O U R | T H A T | M I S S | W O O D L E Y | W H O | W A S | S O M E T I M E S | P R E S E N T | A N D | E V E R | C O M P A S S I O N A T E | C O U L D | N O T | R E S I S T | W I S H I N G | H I M | S U C C E S S | +H E | C O U G H E D | D R A N K | H I S | T E A | E N D E A V O U R E D | T O | T A L K | B U T | F O U N D | I T | D I F F I C U L T | S O M E T I M E S | R E A D | A N D | I N | T H I S | M A N N E R | N E A R | T W O | H O U R S | W E R E | P A S S E D | A W A Y | W H E N | M I S S | M I L N E R | C A M E | I N T O | T H E | R O O M | N O T | D R E S S E D | F O R | A | B A L L | B U T | A S | S H E | H A D | R I S E N | F R O M | D I N N E R | +A N D | H E | W A L K E D | I M M E D I A T E L Y | O U T | O F | T H E | A P A R T M E N T | B Y | A N O T H E R | D O O R | +D O | Y O U | T H I N K | I | W O U L D | G O | A N S W E R E D | M I S S | M I L N E R | W I T H | A N | E A G E R N E S S | T H A T | F O R | A | T I M E | S U P P R E S S E D | H E R | T E A R S | I N | C O N T R A D I C T I O N | T O | H I S | W I L L | +A F T E R | A | F E W | M I N U T E S | P A U S E | A N D | S O M E | L I T T L E | E M B A R R A S S M E N T | O N | T H E | P A R T | O F | M I S S U S | H O R T O N | A T | T H E | D I S A P P O I N T M E N T | S H E | H A D | T O | E N C O U N T E R | F R O M | T H I S | U N E X P E 
C T E D | D U T I F U L | C O N D U C T | S H E | A S K E D | M I S S | M I L N E R | I F | S H E | W O U L D | N O W | H A V E | A N Y | T E A | +M I S S | W O O D L E Y | O B E D I E N T L Y | S A T | D O W N | A N D | T H O U G H | H E R | T H O U G H T S | A N D | H E A R T | W E R E | I N | T H E | C H A M B E R | O F | H E R | F R I E N D | S H E | N E V E R | M A R K E D | B Y | O N E | I M P E R T I N E N T | W O R D | O R | B Y | O N E | L I N E | O F | H E R | F A C E | T H E | R E S T R A I N T | S H E | S U F F E R E D | +F O R G I V E | T H E | D U T I E S | O F | M Y | O F F I C E | A N D | B E L I E V E | T H A T | N O | O N E | I S | H A L F | S O | M U C H | C O N C E R N E D | I F | I T | R O B S | Y O U | O F | A N Y | D E G R E E | O F | H A P P I N E S S | A S | I | M Y S E L F | A M | +W H E N | A | M A R R I E D | W O M A N | H A S | F O L L O W E R S | A N D | T H E | H U S B A N D | D O N ' T | G O | T H E | W R O N G | S I D E | O F | T H E | P O S T | T O O | O R | I T | A I N ' T | P R O V E D | A G A I N | H I M | T H A T | H E | D O | T H E Y ' L L | N E V E R | L E T | H E R | H A V E | N O T H I N G | T O | D O | W I T H | T H E | C H I L D R E N | +M I S S U S | B O Z Z L E | W A S | D I S P O S E D | T O | T H I N K | T H A T | L A D I E S | O F | Q U A L I T Y | A M O N G | W H O M | M A D A M E | T | W A S | E N T I T L E D | I N | H E R | E S T I M A T I O N | T O | T A K E | R A N K | W E R E | S E L D O M | B E T T E R | T H A N | T H E Y | O U G H T | T O | B E | A N D | S H E | W A S | Q U I T E | W I L L I N G | T H A T | H E R | H U S B A N D | S H O U L D | E A R N | H I S | B R E A D | B Y | W A T C H I N G | T H E | L A D Y | O R | T H E | L A D Y ' S | L O V E R | +H E | C A N ' T | S U C K L E | E M | C A N | H E | +B U T | A S | F O R | T H I S | H E R E | C H I L D | B | +A N D | N O W | I T | H A D | C O M E | T O | P A S S | T H A T | H I S | S O L E | R E M A I N I N G | A L L Y | M I S T E R | S A M U E L | B O Z Z L E | T H E | E X | P O L I C E M A N | W A S | B E C O M I N G | W E A R Y | O F | H I S | S E R V I C E | +A T | L A S T | H E | S E N T | W O R D | T O | S A Y | T H A T | H E | H I M S E L F | W O U L D | B E | I N | E N G L A N D | B E F O R E | T H E | E N D | O F | M A R C H | A N D | W O U L D | S E E | T H A T | T H E | M A J E S T Y | O F | T H E | L A W | S H O U L D | B E | V I N D I C A T E D | I N | H I S | F A V O U R | +B O Z Z L E | A W A Y | F R O M | H I S | O W N | H O M E | O U T | O N | B U S I N E S S | W I T H | H I S | C O A T | B U T T O N E D | O V E R | H I S | B R E A S T | A N D | H I S | B E S T | H A T | I N | H I S | H A N D | W A S | A W A R E | T H A T | H E | C O M M A N D E D | R E S P E C T | A N D | H E | C O U L D | C A R R Y | H I M S E L F | A C C O R D I N G L Y | +B U T | I F | Y O U | A S K | M E | M Y | O P I N I O N | W H Y | I N | C O U R S E | T H E Y ' V E | B E E N | T O G E T H E R | S O M E W H E R E | +I N | T H E | L A S T | C O M M U N I C A T I O N | W H I C H | H E | H A D | R E C E I V E D | F R O M | L A D Y | M I L B O R O U G H | S H E | H A D | S C O L D E D | H I M | I N | T E R M S | T H A T | W E R E | F O R | H E R | S E V E R E | B E C A U S E | H E | H A D | N O T | R E T U R N E D | T O | H I S | W I F E | A N D | T A K E N | H E R | O F F | W I T H | H I M | T O | N A P L E S | +I F | Y O U | W O U L D | H A V E | G O N E | T O | M I S T E R | S K I N T | S I R | S U G G E S T E D | B O Z Z L E | +A N D | H A D | T H E | C A S E | B E E N | B R O U G H T | 
B E F O R E | T H E | J U D G E | O R D I N A R Y | B Y | M E A N S | O F | H E R | H U S B A N D ' S | E X E R T I O N S | S H E | W O U L D | H A V E | T A K E N | P L E A S U R E | I N | R E A D I N G | E V E R Y | W O R D | O F | T H E | E V I D E N C E | E V E N | T H O U G H | H E R | H U S B A N D | S H O U L D | H A V E | B E E N | E V E R | S O | R O U G H L Y | H A N D L E D | B Y | T H E | L A W Y E R S | +D R A T | E M | A L L | W H A T | I S | I T | T H E Y | W A N T S | T H E Y | D O N ' T | K N O W | W H A T | T H E Y | W A N T S | +B U T | T R E V E L Y A N | W A S | O F | A | D I F F E R E N T | O P I N I O N | A N D | H E | W A S | D I S G U S T E D | A N D | R E V O L T E D | M O S T | U N R E A S O N A B L Y | B Y | T H E | A P P E A R A N C E | O F | H I S | M I N I S T E R ' S | D O M E S T I C | A R R A N G E M E N T S | +I T ' S | T H A T | A S | M A K E S | E M | I | W O N ' T | S A Y | W H A T | +P E R H A P S | Y O U | C O U L D | P U T | O N | Y O U R | C O A T | A N D | W A L K | O U T | W I T H | M E | F O R | A | F E W | M I N U T E S | S A I D | T R E V E L Y A N | +I ' L L | T E L L | Y O U | W H A T | I T | I S | B | E X C L A I M E D | M I S S U S | B O Z Z L E | I T ' S | M Y | B E L I E F | A S | H E | A I N ' T | Q U I T E | R I G H T | U P | H E R E | A N D | M I S S U S | B O Z Z L E | T O U C H E D | H E R | F O R E H E A D | +A | D I S T I N C T | P R O M I S E | O F | A | H U N D R E D | P O U N D S | W A S | M A D E | T O | H I M | I F | H E | W O U L D | H A V E | T H E | C H I L D | R E A D Y | T O | H A N D | O V E R | T O | T R E V E L Y A N | O N | T R E V E L Y A N ' S | A R R I V A L | I N | E N G L A N D | +T R E V E L Y A N | H A D | F O L L O W E D | H I S | L E T T E R | Q U I C K E R | T H A N | H E | H A D | I N T E N D E D | W H E N | I T | W A S | W R I T T E N | A N D | W A S | N O W | W I T H | H I S | P R I M E | M I N I S T E R | B E F O R E | H I S | P R I M E | M I N I S T E R | H A D | B E E N | A B L E | T O | T A K E | A N Y | A C T I O N | O N | T H E | L A S T | I N S T R U C T I O N | R E C E I V E D | +T H E N | B O Z Z L E | C A M E | F O R W A R D | A N D | I N T R O D U C E D | H I S | W I F E | +T H E | P A T E R N A L | P A R E N T | H A S | A | R I G H T | T O | H I S | I N F A N T S | N O | D O U B T | T H A T | W A S | B O Z Z L E ' S | L A W | +I | D O | N O T | S U P P O S E | T H A T | A N Y B O D Y | W I L L | Q U E S T I O N | M Y | R I G H T | T O | H A V E | T H E | C A R E | O F | M Y | O W N | C H I L D | S A I D | T R E V E L Y A N | +I ' V E | W A T C H E D | A S | S H A R P | A S | W A T C H I N G | C A N | G O | P R E T T Y | N E A R | +O F | C O U R S E | I T | A I N ' T | S A I D | M I S S U S | B O Z Z L E | +A S | H E | W E N T | A B O U T | H I S | E Y E S | W E R E | E V E R | C A S T | D O W N W A R D S | A N D | H E | W A L K E D | W I T H | A | Q U I C K | S H U F F L I N G | G A I T | A N D | H E | S U S P E C T E D | O T H E R S | F E E L I N G | T H A T | H E | H I M S E L F | W A S | S U S P E C T E D | +I T | I S | V E R Y | M U C H | E A S I E R | F O R | S U C H | M E N | A S | M I S T E R | B O Z Z L E | T O | C A R R Y | D E C E N C Y | O F | A P P E A R A N C E | A B O U T | W I T H | T H E M | T H A N | T O | K E E P | I T | A T | H O M E | +A N D | H E | D I D | G O | A W A Y | L E A V I N G | B O Z Z L E | S T A N D I N G | I N | T H E | M I D D L E | O F | S T O N Y | W A L K | +A N D | A L L | W O R K | H A D | C E A S E D | W I T H | H I M | +M I S S U S | B O Z Z L E | W H 
O | W E L L | U N D E R S T O O D | T H A T | B U S I N E S S | W A S | B U S I N E S S | A N D | T H A T | W I V E S | W E R E | N O T | B U S I N E S S | F E L T | N O | A N G E R | A T | T H I S | A N D | H A N D E D | H E R | H U S B A N D | H I S | B E S T | C O A T | +A N D | B O Z Z L E | A S | H E | S A I D | T H I S | S M I L E D | A L M O S T | A L O U D | +D O E S | O N E | M I S T E R | S A M U E L | B O Z Z L E | L I V E | H E R E | A S K E D | T R E V E L Y A N | +H E ' S | U P | I N | T O W N | S I R | A | M I N D I N G | O F | H I S | P A R L I A M E N T A R Y | D U T I E S | +B O Z Z L E | H A D | A L W A Y S | W A I T E D | U P O N | H I M | W I T H | A | D E C E N T | C O A T | A N D | A | W E L L | B R U S H E D | H A T | A N D | C L E A N | S H O E S | +I N | M A K I N G | T H I S | H E | H A D | E X P E C T E D | N O | S U C C E S S | T H O U G H | F R O M | T H E | E N E R G E T I C | N A T U R E | O F | H I S | D I S P O S I T I O N | H E | H A D | M A D E | T H E | A T T E M P T | W I T H | S O M E | Z E A L | +W E | H A V E | B O T H | S E E N | T H E | S A M E | N E W S P A P E R | O F | C O U R S E | A N D | Y O U | H A V E | B E E N | T H E | F I R S T | T O | C L E A R | T H E | T H I N G | U P | T H A T ' S | I T | I S N ' T | I T | +W H A T | A R E | Y O U | D O I N G | H E R E | H E | A S K E D | +E V E N | T H E | C H A N C E | O F | S U C C E S S F U L L Y | C O N F I D I N G | H E R | T O | B E N N Y D E C K ' S | P R O T E C T I O N | H A D | L O S T | S O M E T H I N G | O F | I T S | F A I R | P R O M I S E | S I N C E | R A N D A L ' S | V I S I T | T O | S Y D E N H A M | +S H A L L | I | S A Y | T H A T | S H E | M A Y | E X P E C T | A N | E A R L Y | V I S I T | F R O M | Y O U | W H E N | I | S E E | H E R | T O | M O R R O W | +B U T | I T | M I G H T | P E R H A P S | B E | E X C U S A B L E | T O | I N F E R | T H A T | T H E | M A R R I A G E | H A D | N O T | Y E T | B E E N | D E C I D E D | O N | A N D | T H A T | T H E | C A P T A I N ' S | P R O P O S A L S | W E R E | S T I L L | W A I T I N G | F O R | C A T H E R I N E ' S | R E P L Y | +S U P P O S I N G | T H E | R E P O R T | H A D | B E E N | T R U E | +I N | T H E | M E A N T I M E | A F T E R | W H A T | M I S S U S | P R E S T Y | H A D | C O N F E S S E D | T H E | C R U E L | F A L S E H O O D | W H I C H | H A D | C H E C K E D | P O O R | K I T T Y ' S | N A T U R A L | I N Q U I R I E S | R A I S E D | A N | I N S U P E R A B L E | O B S T A C L E | T O | A | M E E T I N G | B E T W E E N | F A T H E R | A N D | C H I L D | +B E | T H E | R E S U L T S | H O W E V E R | W H A T | T H E Y | M I G H T | R A N D A L | C O U L D | S E E | B U T | O N E | P L A I N | C O U R S E | B E F O R E | H I M | N O W | +H E | H A D | P R O M I S E D | T O | D O | H I S | B E S T | T O W A R D | P E R S U A D I N G | C A T H E R I N E | T O | G R A N T | S Y D N E Y | A N | I N T E R V I E W | +W H A T | H A P P I E R | F U T U R E | C O U L D | A W A I T | H E R | E S P E C I A L L Y | I F | S H E | J U S T I F I E D | R A N D A L ' S | P A S T | E X P E R I E N C E | O F | A L L | T H A T | W A S | C A N D I D | A N D | T R U T H F U L | I N | H E R | C H A R A C T E R | T H A N | T O | B E C O M E | H I S | F R I E N D ' S | W I F E | +Y O U | H A V E | B E E N | T O | T H E | H O T E L | H E | B U R S T | O U T | Y O U | H A V E | S E E N | C A T H E R I N E | +N O T | S A T I S F I E D | W I T H | G O S S I P | I N | P R I V A T E | T H E | G R E E D Y | P U B L I C | A P P E T I T 
E | D E V O U R S | G O S S I P | I N | P R I N T | A N D | W A N T S | M O R E | O F | I T | T H A N | A N Y | O N E | E D I T O R | C A N | S U P P L Y | +H E | A D D E D | S Y D N E Y ' S | A D D R E S S | I N | A | P O S T S C R I P T | A N D | D I S P A T C H E D | H I S | L E T T E R | T H A T | E V E N I N G | +C O N S I D E R A T I O N S | O F | D E L I C A C Y | S E E M E D | T O | F O R B I D | T A K I N G | T H I S | L I B E R T Y | E V E N | W I T H | A N | I N T I M A T E | F R I E N D | +R A N D A L | H E | S A I D | Y O U | K N O W | W H E R E | S Y D N E Y | I S | +I | H A V E N ' T | C O U R A G E | E N O U G H | T O | D O | I T | F O R | M Y S E L F | +R A N D A L | W R O T E | T O | A C C E P T | T H E | I N V I T A T I O N | D E T E R M I N I N G | T O | P R E S E N T | H I M S E L F | B E F O R E | T H E | A P P O I N T E D | H O U R | A N D | T O | Q U E S T I O N | C A T H E R I N E | P R I V A T E L Y | W I T H O U T | G I V I N G | H E R | T H E | A D V A N T A G E | O V E R | H I M | O F | P R E P A R I N G | H E R S E L F | F O R | T H E | I N T E R V I E W | +H E | P A U S E D | A N D | P U T | H I S | H A N D | T O | H I S | F E V E R E D | H E A D | +I ' M | A F R A I D | H E | S A I D | +H E | P U T | D O W N | T H E | E M P T Y | G L A S S | T A K I N G | N O | N O T I C E | O F | H I S | B R O T H E R ' S | Q U E S T I O N | +I ' M | A L O N E | D O | Y O U | H E A R | T H A T | A L O N E | +T H I S | A L T E R N A T I V E | I N | T H E | C A P T A I N ' S | P L A N S | T E R M I N A T I N G | T H E | V O Y A G E | A | M O N T H | E A R L I E R | T H A N | H I S | A R R A N G E M E N T S | H A D | C O N T E M P L A T E D | P U Z Z L E D | R A N D A L | +I | W I L L | D O | N E I T H E R | T H E | O N E | N O R | T H E | O T H E R | +T H E | S A I L I N G | M A S T E R | A N N O U N C E D | T H A T | H E | H A D | O R D E R S | T O | T A K E | T H E | V E S S E L | B A C K | T O | H E R | P O R T | W I T H | N O | O T H E R | E X P L A N A T I O N | T H A N | T H A T | T H E | C R U I S E | W A S | O V E R | +S H E | S H A L L | H A V E | Y O U R | M E S S A G E | A L L | T H A T | I | C A N | D O | T O | P E R S U A D E | H E R | S H A L L | B E | D O N E | +Y O U | D I S T R E S S | M E | H E R B E R T | M O R E | T H A N | W O R D S | C A N | S A Y | +H E | I S | S T A Y I N G | A T | T H I S | H O T E L | T O | T R Y | T H E | A I R | O F | S Y D E N H A M | A N D | H E | F I N D S | T H A T | I T | A G R E E S | W I T H | H I M | +Y O U | D O N ' T | K N O W | W H A T | I T | I S | T O | B E | U S E D | T O | S E E I N G | A | P R E T T Y | C R E A T U R E | A L W A Y S | N I C E L Y | D R E S S E D | A L W A Y S | A B O U T | T H E | R O O M | T H I N K I N G | S O | M U C H | O F | Y O U | A N D | S O | L I T T L E | O F | H E R S E L F | A N D | T H E N | T O | B E | L E F T | A L O N E | A S | I | A M | L E F T | O U T | I N | T H E | D A R K | +H E | D R A N K | T H E | W I N E | G R E E D I L Y | +R A N D A L | W A I T E D | A | W H I L E | I N | L O N D O N | O N | T H E | C H A N C E | T H A T | B E N N Y D E C K | M I G H T | P A Y | H I M | A | V I S I T | +A F T E R | M O N T H S | O F | S E P A R A T I O N | H E | R E C E I V E D | A | V I S I T | F R O M | H E R B E R T | +L E T | M E | H E A R | W H A T | I T | I S | F I R S T | +N O T | H A V I N G | H E A R D | F R O M | C A P T A I N | B E N N Y D E C K | F O R | S O M E | L I T T L E | T I M E | R A N D A L | T H O U G H T | I T | D E S I R A B L E | I N | S Y D N E Y ' S | I N T E R E 
S T S | T O | M A K E | I N Q U I R I E S | A T | H I S | C L U B | +W A S | H I S | M I N D | W A N D E R I N G | I N T O | S O M E | O T H E R | T R A I N | O F | T H O U G H T | +O H | W H Y | D I D | I | E N G A G E | T H A T | G O V E R N E S S | +H A D | H E R | B E A U T Y | F A S C I N A T E D | H I M | +H E | M E N T I O N E D | T H E | N A M E | O F | O N E | O F | T H E | O L D | S E R V A N T S | A T | M O U N T | M O R V E N | W H O | H A D | A T T A C H E D | H I M S E L F | T O | R A N D A L | A F T E R | T H E | B R E A K U P | O F | T H E | F A M I L Y | +Y O U | C A N ' T | D O | I T | +I | F E E L | F O R | Y O U | H E R B E R T | H E | S A I D | W A R M L Y | +L E T | M E | R E S T | A | L I T T L E | H E | P L E A D E D | I F | I ' M | N O T | I N | T H E | W A Y | +I | T R I E D | I T | Y E S T E R D A Y | I T | S E T | M Y | B R A I N S | O N | F I R E | I ' M | F E E L I N G | T H A T | G L A S S | I | T O O K | J U S T | N O W | +H O N E S T L Y | +B U T | I F | T H E S E | N E W S P A P E R | P E O P L E | W A I T E D | T O | F I N D | O U T | W H E T H E R | A | R E P O R T | I S | T R U E | O R | F A L S E | H O W | M U C H | G O S S I P | W O U L D | S O C I E T Y | G E T | I N | I T S | F A V O R I T E | N E W S P A P E R S | +W H I L E | H E | W A S | W A L K I N G | U P | A N D | D O W N | T H E | P L A T F O R M | W I T H | A | M I N D | D O U B L Y | D I S T R E S S E D | B Y | A N X I E T Y | A B O U T | H I S | B R O T H E R | A N D | A N X I E T Y | A B O U T | S Y D N E Y | T H E | T R A I N | F R O M | L O N D O N | C A M E | I N | +A F T E R | T H A T | I | H A D | H E R | M O T H E R ' S | A U T H O R I T Y | F O R | T E L L I N G | K I T T Y | T H A T | S H E | W O U L D | N E V E R | S E E | H E R | F A T H E R | A G A I N | +H A V E | Y O U | A N Y | M E S S A G E | F O R | C A P T A I N | B E N N Y D E C K | +S I T | D O W N | S A I D | M I S S U S | P R E S T Y | +W I T H | H I S | O W N | S U S P I C I O N S | S T E A D I L Y | C O N T R A D I C T I N G | H I M | H E | A R R I V E D | A T | T H E | H O T E L | O B S T I N A T E L Y | B E L I E V I N G | T H A T | T H E | C H A R M I N G | W I D O W | W O U L D | P R O V E | T O | B E | A | S T R A N G E R | +T H E | N E W | N U M B E R | O F | A | P O P U L A R | W E E K L Y | J O U R N A L | H A D | T H A T | D A Y | B E E N | P U B L I S H E D | R A N D A L | B O U G H T | I T | +W H E N | H E | A N D | I | H A P P E N E D | T O | B E | L E F T | T O G E T H E R | H E | N A T U R A L L Y | W O N D E R E D | A F T E R | H A V I N G | S E E N | T H E | B E A U T I F U L | W I F E | W H E R E | T H E | L U C K Y | H U S B A N D | M I G H T | B E | +Y O U | W O U L D | H A V E | S E E N | H E R | P I N I N G | F O R | T H E | C O M P A N Y | O F | O T H E R | C H I L D R E N | A N D | W O U L D | H A V E | H A D | N O | M E R C Y | O N | H E R | +W O R S E | S T O R I E S | H A V E | B E E N | P R I N T E D | I | D O | A S S U R E | Y O U | W O R S E | S T O R I E S | H A V E | B E E N | P R I N T E D | +A R R I V E D | A T | T H E | S T A T I O N | R A N D A L | F O U N D | T H A T | H E | M U S T | W A I T | F O R | T H E | T R A I N | +M I S S U S | P R E S T Y | W A S | A T | H O M E | S H E | W A S | R E P O R T E D | T O | B E | I N | T H E | G A R D E N | O F | T H E | H O T E L | +T H E | R E P O R T | I S | P R E M A T U R E | M Y | G O O D | F R I E N D | +G O O D | B Y | D E A R | R A N D A L | +M I S S U S | N O R M A N | A N D | H E R | L I T T L E | D A U G H T E R | W E R E | O U 
T | D R I V I N G | W I T H | A | F R I E N D | A N D | W E R E | E X P E C T E D | T O | R E T U R N | I N | G O O D | T I M E | F O R | D I N N E R | +H E | W A S | I N T R O D U C E D | T O | M I S S U S | N O R M A N | A N D | T O | M I S S U S | N O R M A N ' S | L I T T L E | G I R L | A N D | W E | W E R E | A L L | C H A R M E D | W I T H | H I M | +B U T | Y O U | O U G H T | T O | H A V E | K N O W N | T H A T | W E | A R E | O N L Y | H A L F | A N | H O U R | B E H I N D | Y O U | A T | S Y D E N H A M | I N | T H E | M A T T E R | O F | N E W S | +Y O U | S H A L L | H E A R | H O W | M Y | D I V O R C E D | D A U G H T E R | A N D | M Y | P O O R | L I T T L E | G R A N D C H I L D | W E R E | T R E A T E D | A T | S A N D Y S E A L | A F T E R | Y O U | L E F T | U S | +S H E | A S K E D | D I R E C T L Y | I F | H E R | F A T H E R | W A S | D E A D | +R A N D A L | P A S S E D | T H I S | O V E R | W I T H O U T | N O T I C E | +A F T E R | R E A D I N G | O N E | O R | T W O | O F | T H E | P O L I T I C A L | A R T I C L E S | H E | A R R I V E D | A T | T H E | C O L U M N S | S P E C I A L L Y | D E V O T E D | T O | F A S H I O N A B L E | I N T E L L I G E N C E | +I T | W A S | A | R E L I E F | T O | R A N D A L | I N | T H E | P R E S E N T | S T A T E | O F | C A T H E R I N E ' S | R E L A T I O N S | T O W A R D | B E N N Y D E C K | T O | R E T U R N | T O | L O N D O N | W I T H O U T | H A V I N G | S E E N | H I S | F R I E N D | +N O T | A T | T H E | H O T E L | J U S T | N O W | +Y O U | A R E | T O | U N D E R S T A N D | T H A T | C A T H E R I N E | I S | A | W I D O W | +O N | T H E | N E X T | D A Y | B U T | O N E | R A N D A L | A R R A N G E D | H I S | D E P A R T U R E | F O R | S Y D E N H A M | S O | A S | T O | A R R I V E | A T | T H E | H O T E L | A N | H O U R | B E F O R E | T H E | T I M E | A P P O I N T E D | F O R | T H E | D I N N E R | +S H E | A D D E D | L O O K I N G | A T | H I M | S U S P I C I O U S L Y | +R A N D A L | L O O K E D | A G A I N | A T | T H E | F I R S T | W O R D S | I N | T H E | P A R A G R A P H | +H O W | N I C E | O F | Y O U | T O | C O M E | S O | S O O N | S H E | B E G A N | +A N D | T H E | C A P T A I N | O F | C O U R S E | C O N C L U D E D | A F T E R | H A V I N G | B E E N | I N T R O D U C E D | T O | K I T T Y | T H A T | M I S S U S | N O R M A N | W A S | A | W I D O W | +T H A T | W I L L | D O | M I S S U S | P R E S T Y | Y O U R | D E F E N S E | I S | T H O R O U G H L Y | W O R T H Y | O F | Y O U R | C O N D U C T | I N | A L L | O T H E R | R E S P E C T S | +A | V E R Y | W I S E | D E C I S I O N | S H E | R E M A R K E D | +B E F O R E | I | C O N S E N T E D | T O | A N S W E R | T H E | C H I L D ' S | I N Q U I R I E S | I | C A M E | T O | A N | U N D E R S T A N D I N G | W I T H | H E R | M O T H E R | +W H I C H | O P P R E S S E D | T H E | M E T R O P O L I T A N S | O F | E U R O P E | A N D | A S I A | I N V A D E D | T H E | P R O V I N C E S | O F | A N T I O C H | A N D | A L E X A N D R I A | A N D | M E A S U R E D | T H E I R | D I O C E S E | B Y | T H E | L I M I T S | O F | T H E | E M P I R E | +T H E | Z E A L | O F | C Y R I L | E X P O S E D | H I M | T O | T H E | P E N A L T I E S | O F | T H E | J U L I A N | L A W | B U T | I N | A | F E E B L E | G O V E R N M E N T | A N D | A | S U P E R S T I T I O U S | A G E | H E | W A S | S E C U R E | O F | I M P U N I T Y | A N D | E V E N | O F | P R A I S E | +A T | T H E | S A M E | T I M E | E V E R Y | 
A V E N U E | O F | T H E | T H R O N E | W A S | A S S A U L T E D | W I T H | G O L D | +A R D E N T | I N | T H E | P R O S E C U T I O N | O F | H E R E S Y | C Y R I L | A U S P I C I O U S L Y | O P E N E D | H I S | R E I G N | B Y | O P P R E S S I N G | T H E | N O V A T I A N S | T H E | M O S T | I N N O C E N T | A N D | H A R M L E S S | O F | T H E | S E C T A R I E S | +A | W A N D E R I N G | T R I B E | O F | T H E | B L E M M Y E S | O R | N U B I A N S | I N V A D E D | H I S | S O L I T A R Y | P R I S O N | I N | T H E I R | R E T R E A T | T H E Y | D I S M I S S E D | A | C R O W D | O F | U S E L E S S | C A P T I V E S | B U T | N O | S O O N E R | H A D | N E S T O R I U S | R E A C H E D | T H E | B A N K S | O F | T H E | N I L E | T H A N | H E | W O U L D | G L A D L Y | H A V E | E S C A P E D | F R O M | A | R O M A N | A N D | O R T H O D O X | C I T Y | T O | T H E | M I L D E R | S E R V I T U D E | O F | T H E | S A V A G E S | +N E S T O R I U S | W H O | D E P E N D E D | O N | T H E | N E A R | A P P R O A C H | O F | H I S | E A S T E R N | F R I E N D S | P E R S I S T E D | L I K E | H I S | P R E D E C E S S O R | C H R Y S O S T O M | T O | D I S C L A I M | T H E | J U R I S D I C T I O N | A N D | T O | D I S O B E Y | T H E | S U M M O N S | O F | H I S | E N E M I E S | T H E Y | H A S T E N E D | H I S | T R I A L | A N D | H I S | A C C U S E R | P R E S I D E D | I N | T H E | S E A T | O F | J U D G M E N T | +B Y | T H E | V I G I L A N C E | O F | M E M N O N | T H E | C H U R C H E S | W E R E | S H U T | A G A I N S T | T H E M | A N D | A | S T R O N G | G A R R I S O N | W A S | T H R O W N | I N T O | T H E | C A T H E D R A L | +T H E | V A N I T Y | O F | C E L E S T I N E | W A S | F L A T T E R E D | B Y | T H E | A P P E A L | A N D | T H E | P A R T I A L | V E R S I O N | O F | A | M O N K | D E C I D E D | T H E | F A I T H | O F | T H E | P O P E | W H O | W I T H | H I S | L A T I N | C L E R G Y | W A S | I G N O R A N T | O F | T H E | L A N G U A G E | T H E | A R T S | A N D | T H E | T H E O L O G Y | O F | T H E | G R E E K S | +S I X T Y | E I G H T | B I S H O P S | T W E N T Y | T W O | O F | M E T R O P O L I T A N | R A N K | D E F E N D E D | H I S | C A U S E | B Y | A | M O D E S T | A N D | T E M P E R A T E | P R O T E S T | T H E Y | W E R E | E X C L U D E D | F R O M | T H E | C O U N C I L S | O F | T H E I R | B R E T H R E N | +A T | T H E S E | B L A S P H E M O U S | S O U N D S | T H E | P I L L A R S | O F | T H E | S A N C T U A R Y | W E R E | S H A K E N | +S U C H | C R I M E S | W O U L D | H A V E | D E S E R V E D | T H E | A N I M A D V E R S I O N | O F | T H E | M A G I S T R A T E | B U T | I N | T H I S | P R O M I S C U O U S | O U T R A G E | T H E | I N N O C E N T | W E R E | C O N F O U N D E D | W I T H | T H E | G U I L T Y | A N D | A L E X A N D R I A | W A S | I M P O V E R I S H E D | B Y | T H E | L O S S | O F | A | W E A L T H Y | A N D | I N D U S T R I O U S | C O L O N Y | +T H E | F E E B L E | S O N | O F | A R C A D I U S | W A S | A L T E R N A T E L Y | S W A Y E D | B Y | H I S | W I F E | A N D | S I S T E R | B Y | T H E | E U N U C H S | A N D | W O M E N | O F | T H E | P A L A C E | S U P E R S T I T I O N | A N D | A V A R I C E | W E R E | T H E I R | R U L I N G | P A S S I O N S | A N D | T H E | O R T H O D O X | C H I E F S | W E R E | A S S I D U O U S | I N | T H E I R | E N D E A V O R S | T O | A L A R M | T H E | F O R M E R | A N D | T O | G R A 
T I F Y | T H E | L A T T E R | +R E T U R N | T O | Y O U R | P R O V I N C E S | A N D | M A Y | Y O U R | P R I V A T E | V I R T U E S | R E P A I R | T H E | M I S C H I E F | A N D | S C A N D A L | O F | Y O U R | M E E T I N G | +A | R U M O R | W A S | S P R E A D | A M O N G | T H E | C H R I S T I A N S | T H A T | T H E | D A U G H T E R | O F | T H E O N | W A S | T H E | O N L Y | O B S T A C L E | T O | T H E | R E C O N C I L I A T I O N | O F | T H E | P R A E F E C T | A N D | T H E | A R C H B I S H O P | A N D | T H A T | O B S T A C L E | W A S | S P E E D I L Y | R E M O V E D | +O R E S T E S | C O M P L A I N E D | B U T | H I S | J U S T | C O M P L A I N T S | W E R E | T O O | Q U I C K L Y | F O R G O T T E N | B Y | T H E | M I N I S T E R S | O F | T H E O D O S I U S | A N D | T O O | D E E P L Y | R E M E M B E R E D | B Y | A | P R I E S T | W H O | A F F E C T E D | T O | P A R D O N | A N D | C O N T I N U E D | T O | H A T E | T H E | P R A E F E C T | O F | E G Y P T | +D U R I N G | A | B U S Y | P E R I O D | O F | T H R E E | M O N T H S | T H E | E M P E R O R | T R I E D | E V E R Y | M E T H O D | E X C E P T | T H E | M O S T | E F F E C T U A L | M E A N S | O F | I N D I F F E R E N C E | A N D | C O N T E M P T | T O | R E C O N C I L E | T H I S | T H E O L O G I C A L | Q U A R R E L | +W I T H O U T | A N Y | L E G A L | S E N T E N C E | W I T H O U T | A N Y | R O Y A L | M A N D A T E | T H E | P A T R I A R C H | A T | T H E | D A W N | O F | D A Y | L E D | A | S E D I T I O U S | M U L T I T U D E | T O | T H E | A T T A C K | O F | T H E | S Y N A G O G U E S | +T H E | P A S T | H E | R E G R E T T E D | H E | W A S | D I S C O N T E N T E D | W I T H | T H E | P R E S E N T | A N D | T H E | F U T U R E | H E | H A D | R E A S O N | T O | D R E A D | T H E | O R I E N T A L | B I S H O P S | S U C C E S S I V E L Y | D I S E N G A G E D | T H E I R | C A U S E | F R O M | H I S | U N P O P U L A R | N A M E | A N D | E A C H | D A Y | D E C R E A S E D | T H E | N U M B E R | O F | T H E | S C H I S M A T I C S | W H O | R E V E R E D | N E S T O R I U S | A S | T H E | C O N F E S S O R | O F | T H E | F A I T H | +B U T | I N | T H I S | A W F U L | M O M E N T | O F | T H E | D A N G E R | O F | T H E | C H U R C H | T H E I R | V O W | W A S | S U P E R S E D E D | B Y | A | M O R E | S U B L I M E | A N D | I N D I S P E N S A B L E | D U T Y | +E X T E R M I N A T E | W I T H | M E | T H E | H E R E T I C S | A N D | W I T H | Y O U | I | W I L L | E X T E R M I N A T E | T H E | P E R S I A N S | +B U T | T H E | V A T I C A N | R E C E I V E D | W I T H | O P E N | A R M S | T H E | M E S S E N G E R S | O F | E G Y P T | +B U T | T H E | P R E V A I L I N G | D O C T R I N E | O F | T H E | E T E R N I T Y | A N D | I N H E R E N T | P R A V I T Y | O F | M A T T E R | I N F E C T E D | T H E | P R I M I T I V E | C H U R C H E S | O F | T H E | E A S T | +H E | F I R S T | A P P E A R E D | O N | T H E | B A N K S | O F | T H E | J O R D A N | I N | T H E | F O R M | O F | P E R F E C T | M A N H O O D | B U T | I T | W A S | A | F O R M | O N L Y | A N D | N O T | A | S U B S T A N C E | A | H U M A N | F I G U R E | C R E A T E D | B Y | T H E | H A N D | O F | O M N I P O T E N C E | T O | I M I T A T E | T H E | F A C U L T I E S | A N D | A C T I O N S | O F | A | M A N | A N D | T O | I M P O S E | A | P E R P E T U A L | I L L U S I O N | O N | T H E | S E N S E S | O F | H I S | F R I E N D S | A N D | E N E M I E 
S | +A | L A U D A B L E | R E G A R D | F O R | T H E | H O N O R | O F | T H E | F I R S T | P R O S E L Y T E | H A S | C O U N T E N A N C E D | T H E | B E L I E F | T H E | H O P E | T H E | W I S H | T H A T | T H E | E B I O N I T E S | O R | A T | L E A S T | T H E | N A Z A R E N E S | W E R E | D I S T I N G U I S H E D | O N L Y | B Y | T H E I R | O B S T I N A T E | P E R S E V E R A N C E | I N | T H E | P R A C T I C E | O F | T H E | M O S A I C | R I T E S | +B U T | T H E | R A S H N E S S | O F | T H E S E | C O N C E S S I O N S | H A S | E N C O U R A G E D | A | M I L D E R | S E N T I M E N T | O F | T H O S E | O F | T H E | D O C E T E S | W H O | T A U G H T | N O T | T H A T | C H R I S T | W A S | A | P H A N T O M | B U T | T H A T | H E | W A S | C L O T H E D | W I T H | A N | I M P A S S I B L E | A N D | I N C O R R U P T I B L E | B O D Y | +H E | L I V E D | A N D | D I E D | F O R | T H E | S E R V I C E | O F | M A N K I N D | B U T | T H E | L I F E | A N D | D E A T H | O F | S O C R A T E S | H A D | L I K E W I S E | B E E N | D E V O T E D | T O | T H E | C A U S E | O F | R E L I G I O N | A N D | J U S T I C E | A N D | A L T H O U G H | T H E | S T O I C | O R | T H E | H E R O | M A Y | D I S D A I N | T H E | H U M B L E | V I R T U E S | O F | J E S U S | T H E | T E A R S | W H I C H | H E | S H E D | O V E R | H I S | F R I E N D | A N D | C O U N T R Y | M A Y | B E | E S T E E M E D | T H E | P U R E S T | E V I D E N C E | O F | H I S | H U M A N I T Y | +A | F O E T U S | T H A T | C O U L D | I N C R E A S E | F R O M | A N | I N V I S I B L E | P O I N T | T O | I T S | F U L L | M A T U R I T Y | A | C H I L D | T H A T | C O U L D | A T T A I N | T H E | S T A T U R E | O F | P E R F E C T | M A N H O O D | W I T H O U T | D E R I V I N G | A N Y | N O U R I S H M E N T | F R O M | T H E | O R D I N A R Y | S O U R C E S | M I G H T | C O N T I N U E | T O | E X I S T | W I T H O U T | R E P A I R I N G | A | D A I L Y | W A S T E | B Y | A | D A I L Y | S U P P L Y | O F | E X T E R N A L | M A T T E R | +Y E T | T H E | M O S T | C H A R I T A B L E | C R I T I C I S M | M U S T | R E F U S E | T H E S E | S E C T A R I E S | A N Y | K N O W L E D G E | O F | T H E | P U R E | A N D | P R O P E R | D I V I N I T Y | O F | C H R I S T | +H I S | P R O G R E S S | F R O M | I N F A N C Y | T O | Y O U T H | A N D | M A N H O O D | W A S | M A R K E D | B Y | A | R E G U L A R | I N C R E A S E | I N | S T A T U R E | A N D | W I S D O M | A N D | A F T E R | A | P A I N F U L | A G O N Y | O F | M I N D | A N D | B O D Y | H E | E X P I R E D | O N | T H E | C R O S S | +T H E I R | M U R M U R S | W E R E | V A R I O U S L Y | S I L E N C E D | B Y | T H E | S E C T A R I E S | W H O | E S P O U S E D | A N D | M O D I F I E D | T H E | D O U B L E | S Y S T E M | O F | C E R I N T H U S | +W H E N | T H E | M E S S I A H | W A S | D E L I V E R E D | I N T O | T H E | H A N D S | O F | T H E | J E W S | T H E | C H R I S T | A N | I M M O R T A L | A N D | I M P A S S I B L E | B E I N G | F O R S O O K | H I S | E A R T H L Y | T A B E R N A C L E | F L E W | B A C K | T O | T H E | P L E R O M A | O R | W O R L D | O F | S P I R I T S | A N D | L E F T | T H E | S O L I T A R Y | J E S U S | T O | S U F F E R | T O | C O M P L A I N | A N D | T O | E X P I R E | +H E | A C Q U I E S C E D | I N | T H E | O L D | D I S T I N C T I O N | O F | T H E | G R E E K | P H I L O S O P H E R S | B E T W E E N | T H E | R A T I O N A L | 
A N D | S E N S I T I V E | S O U L | O F | M A N | T H A T | H E | M I G H T | R E S E R V E | T H E | L O G O S | F O R | I N T E L L E C T U A L | F U N C T I O N S | A N D | E M P L O Y | T H E | S U B O R D I N A T E | H U M A N | P R I N C I P L E | I N | T H E | M E A N E R | A C T I O N S | O F | A N I M A L | L I F E | +M A N Y | A M O N G | T H E | G E N T I L E | P R O S E L Y T E S | R E F U S E D | T O | B E L I E V E | T H A T | A | C E L E S T I A L | S P I R I T | A N | U N D I V I D E D | P O R T I O N | O F | T H E | F I R S T | E S S E N C E | H A D | B E E N | P E R S O N A L L Y | U N I T E D | W I T H | A | M A S S | O F | I M P U R E | A N D | C O N T A M I N A T E D | F L E S H | A N D | I N | T H E I R | Z E A L | F O R | T H E | D I V I N I T Y | T H E Y | P I O U S L Y | A B J U R E D | T H E | H U M A N I T Y | O F | C H R I S T | +N O R | C O U L D | I T | S E E M | S T R A N G E | O R | I N C R E D I B L E | T H A T | T H E | F I R S T | O F | T H E S E | A E O N S | T H E | L O G O S | O R | W O R D | O F | G O D | O F | T H E | S A M E | S U B S T A N C E | W I T H | T H E | F A T H E R | S H O U L D | D E S C E N D | U P O N | E A R T H | T O | D E L I V E R | T H E | H U M A N | R A C E | F R O M | V I C E | A N D | E R R O R | A N D | T O | C O N D U C T | T H E M | I N | T H E | P A T H S | O F | L I F E | A N D | I M M O R T A L I T Y | +I N | T H E I R | E Y E S | J E S U S | O F | N A Z A R E T H | W A S | A | M E R E | M O R T A L | T H E | L E G I T I M A T E | S O N | O F | J O S E P H | A N D | M A R Y | B U T | H E | W A S | T H E | B E S T | A N D | W I S E S T | O F | T H E | H U M A N | R A C E | S E L E C T E D | A S | T H E | W O R T H Y | I N S T R U M E N T | T O | R E S T O R E | U P O N | E A R T H | T H E | W O R S H I P | O F | T H E | T R U E | A N D | S U P R E M E | D E I T Y | +T H E | W O R T H Y | F R I E N D | O F | A T H A N A S I U S | T H E | W O R T H Y | A N T A G O N I S T | O F | J U L I A N | H E | B R A V E L Y | W R E S T L E D | W I T H | T H E | A R I A N S | A N D | P O L Y T H E I S T S | A N D | T H O U G H | H E | A F F E C T E D | T H E | R I G O R | O F | G E O M E T R I C A L | D E M O N S T R A T I O N | H I S | C O M M E N T A R I E S | R E V E A L E D | T H E | L I T E R A L | A N D | A L L E G O R I C A L | S E N S E | O F | T H E | S C R I P T U R E S | +B U T | T H E | J U S T I C E | A N D | G E N E R O S I T Y | O F | S U C H | A | D E S E R T I O N | A R E | S T R O N G L Y | Q U E S T I O N A B L E | A N D | T H E | F A T E | O F | A N | I N N O C E N T | M A R T Y R | A T | F I R S T | I M P E L L E D | A N D | A T | L E N G T H | A B A N D O N E D | B Y | H I S | D I V I N E | C O M P A N I O N | M I G H T | P R O V O K E | T H E | P I T Y | A N D | I N D I G N A T I O N | O F | T H E | P R O F A N E | +U N D E R | T H E | T U I T I O N | O F | T H E | A B B O T | S E R A P I O N | H E | A P P L I E D | H I M S E L F | T O | E C C L E S I A S T I C A L | S T U D I E S | W I T H | S U C H | I N D E F A T I G A B L E | A R D O R | T H A T | I N | T H E | C O U R S E | O F | O N E | S L E E P L E S S | N I G H T | H E | H A S | P E R U S E D | T H E | F O U R | G O S P E L S | T H E | C A T H O L I C | E P I S T L E S | A N D | T H E | E P I S T L E | T O | T H E | R O M A N S | +T H E | S O N | O F | A | V I R G I N | G E N E R A T E D | B Y | T H E | I N E F F A B L E | O P E R A T I O N | O F | T H E | H O L Y | S P I R I T | W A S | A | C R E A T U R E | W I T H O U T | E X A M P L E | O R | R E S 
E M B L A N C E | S U P E R I O R | I N | E V E R Y | A T T R I B U T E | O F | M I N D | A N D | B O D Y | T O | T H E | C H I L D R E N | O F | A D A M | +T H E I R | C H U R C H E S | H A V E | D I S A P P E A R E D | T H E I R | B O O K S | A R E | O B L I T E R A T E D | T H E I R | O B S C U R E | F R E E D O M | M I G H T | A L L O W | A | L A T I T U D E | O F | F A I T H | A N D | T H E | S O F T N E S S | O F | T H E I R | I N F A N T | C R E E D | W O U L D | B E | V A R I O U S L Y | M O U L D E D | B Y | T H E | Z E A L | O R | P R U D E N C E | O F | T H R E E | H U N D R E D | Y E A R S | +Y E T | A S | T H E | P R O F O U N D | D O C T O R | H A D | B E E N | T E R R I F I E D | A T | H I S | O W N | R A S H N E S S | A P O L L I N A R I S | W A S | H E A R D | T O | M U T T E R | S O M E | F A I N T | A C C E N T S | O F | E X C U S E | A N D | E X P L A N A T I O N | +B U T | I N S T E A D | O F | A | T E M P O R A R Y | A N D | O C C A S I O N A L | A L L I A N C E | T H E Y | E S T A B L I S H E D | A N D | W E | S T I L L | E M B R A C E | T H E | S U B S T A N T I A L | I N D I S S O L U B L E | A N D | E V E R L A S T I N G | U N I O N | O F | A | P E R F E C T | G O D | W I T H | A | P E R F E C T | M A N | O F | T H E | S E C O N D | P E R S O N | O F | T H E | T R I N I T Y | W I T H | A | R E A S O N A B L E | S O U L | A N D | H U M A N | F L E S H | +Y O U ' V E | G O T | A | B I R T H D A Y | P R E S E N T | T H I S | T I M E | J I M | A N D | N O | M I S T A K E | +T H E | W E E K | F O L L O W I N G | C H R I S T M A S | B R O U G H T | I N | A | T H A W | A N D | B Y | N E W | Y E A R ' S | D A Y | A L L | T H E | W O R L D | A B O U T | U S | W A S | A | B R O T H | O F | G R A Y | S L U S H | A N D | T H E | G U T T E R E D | S L O P E | B E T W E E N | T H E | W I N D M I L L | A N D | T H E | B A R N | W A S | R U N N I N G | B L A C K | W A T E R | +I T | W A S | T H E | F I R S T | T I M E | M I S S U S | S H I M E R D A | H A D | B E E N | T O | O U R | H O U S E | A N D | S H E | R A N | A B O U T | E X A M I N I N G | O U R | C A R P E T S | A N D | C U R T A I N S | A N D | F U R N I T U R E | A L L | T H E | W H I L E | C O M M E N T I N G | U P O N | T H E M | T O | H E R | D A U G H T E R | I N | A N | E N V I O U S | C O M P L A I N I N G | T O N E | +T H E Y | B E G A N | T O | L A U G H | B O I S T E R O U S L Y | W H E N | T H E Y | S A W | M E | C A L L I N G | +B U T | Y O U | S E E | A | B O D Y | N E V E R | K N O W S | W H A T | T R A I T S | P O V E R T Y | M I G H T | B R I N G | O U T | I N | E M | +Y O U R | M A M A | I | S A I D | A N G R I L Y | W A N T S | O T H E R | P E O P L E ' S | T H I N G S | +F O R | A M B R O S C H | M Y | M A M A | C O M E | H E R E | +T H E | B O A R D | N O T | S O | F O R M I D A B L E | A S | S H E | H A D | I M A G I N E D | H A D | I N Q U I R E D | I N T O | H E R | C A S E | A N D | I N S T E A D | O F | S E N D I N G | H E R | T O | S T O K E | C L A Y P O L E | H E R | H U S B A N D ' S | B U C K I N G H A M S H I R E | P A R I S H | A S | S H E | H A D | D R E A D E D | H A D | A G R E E D | T O | P A Y | H E R | R E N T | +T H E | G H O U L | L I K E | F E V E R | W A S | N O T | T O | B E | B R A V E D | W I T H | I M P U N I T Y | A N D | B A U L K E D | O F | I T S | P R E Y | +A F O R E | C H R I S T M A S | T I M E | I | W A S | A S | F U L L | A S | F U L L | C O U L D | B E | O F | G O I N G | H O M E | F O R | G O O D | A N D | A L L | Y O | H A N | H E A R D | H O W | I ' V E | W I 
S H E D | I T | T H I S | T E R R I B L E | L O N G | T I M E | +B U T | T H E | B E S T | O F | H E R | P L A N S | T H E | H O L I E S T | T H A T | W H I C H | I N | S O M E | M E A S U R E | R E D E E M E D | T H E | V A N I T Y | O F | T H E | R E S T | W E R E | T H O S E | R E L A T I N G | T O | H E R | F A T H E R | H E R | D E A R | F A T H E R | N O W | O P P R E S S E D | W I T H | C A R E | A N D | A L W A Y S | A | D I S H E A R T E N E D | G L O O M Y | P E R S O N | +T H E N | A L I C E | B R O K E | T H E | S I L E N C E | B Y | S A Y I N G | +H E R | C R I E S | B R O U G H T | H E R | H U S B A N D | D O W N | T O | T R Y | W I T H | H I S | A C H I N G | H E A R T | T O | C O M F O R T | H E R S | +I S | T H E R E | A N Y | C H A N C E | F O R | T H E | O T H E R | O N E | T H I N K | Y O U | +B U T | H E | S T A Y E D | L O N G | T H E R E | A N D | A T | L A S T | H I S | S T U R D Y | F R A M E | S H O O K | W I T H | H I S | S T R O N G | A G O N Y | +H E R | T H O U G H T S | R A N | O N | J E M ' S | M A N N E R | A N D | W O R D S | N O T | B U T | W H A T | S H E | H A D | K N O W N | T H E | T A L E | T H E Y | T O L D | F O R | M A N Y | A | D A Y | B U T | S T I L L | S H E | W I S H E D | H E | H A D | N O T | P U T | I T | S O | P L A I N L Y | +M A R Y | A N D | A L I C E | D R E W | N E A R | T H E | F I R E | A N D | S T O O D | I N | Q U I E T | S O R R O W | F O R | S O M E | T I M E | +T H E R E | W A S | S O M E T H I N G | O F | K E E N | P R A C T I C A L | S H R E W D N E S S | A B O U T | H E R | W H I C H | C O N T R A S T E D | V E R Y | B E W I T C H I N G L Y | W I T H | T H E | S I M P L E | F O O L I S H | U N W O R L D L Y | I D E A S | S H E | H A D | P I C K E D | U P | F R O M | T H E | R O M A N C E S | W H I C H | M I S S | S I M M O N D S | Y O U N G | L A D I E S | W E R E | I N | T H E | H A B I T | O F | R E C O M M E N D I N G | T O | E A C H | O T H E R | Y E S | +T H E | O L D | L E A V E N | I N F U S E D | Y E A R S | A G O | B Y | H E R | A U N T | E S T H E R | F E R M E N T E D | I N | H E R | L I T T L E | B O S O M | A N D | P E R H A P S | A L L | T H E | M O R E | F O R | H E R | F A T H E R ' S | A V E R S I O N | T O | T H E | R I C H | A N D | T H E | G E N T L E | +S H E | O P E N E D | T H E | D O O R | S O F T L Y | T H E R E | S A T | M I S S U S | W I L S O N | I N | T H E | O L D | R O C K I N G | C H A I R | W I T H | O N E | S I C K | D E A T H | L I K E | B O Y | L Y I N G | O N | H E R | K N E E | C R Y I N G | W I T H O U T | L E T | O R | P A U S E | B U T | S O F T L Y | G E N T L Y | A S | F E A R I N G | T O | D I S T U R B | T H E | T R O U B L E D | G A S P I N G | C H I L D | W H I L E | B E H I N D | H E R | O L D | A L I C E | L E T | H E R | F A S T | D R O P P I N G | T E A R S | F A L L | D O W N | O N | T H E | D E A D | B O D Y | O F | T H E | O T H E R | T W I N | W H I C H | S H E | W A S | L A Y I N G | O U T | O N | A | B O A R D | P L A C E D | O N | A | S O R T | O F | S O F A | S E T T E E | I N | A | C O R N E R | O F | T H E | R O O M | +B U T | E A R N E S T | A S | T H E | F A T H E R | W A S | I N | W A T C H I N G | T H E | Y E T | L I V I N G | H E | H A D | E Y E S | A N D | E A R S | F O R | A L L | T H A T | C O N C E R N E D | T H E | D E A D | A N D | S P R A N G | G E N T L Y | U P | A N D | T O O K | H I S | D E A D | S O N | O N | H I S | H A R D | C O U C H | I N | H I S | A R M S | W I T H | T E N D E R | S T R E N G T H | A N D | C A R R I E D | H I M | U P S T A I R 
S | A S | I F | A F R A I D | O F | W A K E N I N G | H I M | +M A R Y | I | A L M O S T | L O A T H E | M Y S E L F | W H E N | I | F E E L | I | W O U L D | N O T | G I V E | U P | T H I S | M I N U T E | W H E N | M Y | B R O T H E R S | L I E | D E A D | A N D | F A T H E R | A N D | M O T H E R | A R E | I N | S U C H | T R O U B L E | F O R | A L L | M Y | L I F E | T H A T ' S | P A S T | A N D | G O N E | A N D | M A R Y | A S | S H E | T R I E D | T O | R E L E A S E | H E R | H A N D | Y O U | K N O W | W H A T | M A K E S | M E | F E E L | S O | B L E S S E D | +D O N ' T | J E M | P L E A S E | D O N ' T | W H I S P E R E D | S H E | A G A I N | B E L I E V I N G | T H A T | H I S | S I L E N C E | W A S | O N L Y | A N O T H E R | F O R M | O F | G R I E F | +H E | R E M A I N E D | U P | S T A I R S | U N T I L | A F T E R | T H E | E A R L Y | D A W N | S H O W E D | M A R Y | T H A T | S H E | N E E D | H A V E | N O | F E A R | O F | G O I N G | H O M E | T H R O U G H | T H E | D E S E R T E D | A N D | Q U I E T | S T R E E T S | T O | T R Y | A N D | G E T | A | L I T T L E | S L E E P | B E F O R E | W O R K | H O U R | +O H | J E M | D O N ' T | G I V E | W A Y | S O | I | C A N N O T | B E A R | T O | S E E | Y O U | +S H E | S T O P P E D | W I T H | H E R | H A N D | O N | T H E | L A T C H | O F | T H E | W I L S O N S | D O O R | T O | S T I L L | H E R | B E A T I N G | H E A R T | A N D | L I S T E N E D | T O | T H E | H U S H E D | Q U I E T | W I T H I N | +M A R G A R E T | M E T | J E M | W I L S O N | S E V E R A L | D A Y S | A F T E R | H I S | B R O T H E R S | W E R E | S E R I O U S L Y | I L L | A N D | H E A R D | F R O M | H I M | T H E | S T A T E | O F | T H I N G S | A T | H I S | H O M E | +O V E R | T H E | C H I L D | W H I C H | Y E T | B R E A T H E D | T H E | F A T H E R | B E N T | W A T C H I N G | A N X I O U S L Y | F O R | S O M E | G R O U N D | O F | H O P E | W H E R E | H O P E | T H E R E | W A S | N O N E | +S O | L E A V I N G | K I N D | M E S S A G E S | T O | G E O R G E | A N D | J A N E | W I L S O N | A N D | H E S I T A T I N G | W H E T H E R | S H E | M I G H T | D A R E | T O | S E N D | A | F E W | K I N D | W O R D S | T O | J E M | A N D | D E C I D I N G | T H A T | S H E | H A D | B E T T E R | N O T | S H E | S T E P P E D | O U T | I N T O | T H E | B R I G H T | M O R N I N G | L I G H T | S O | F R E S H | A | C O N T R A S T | T O | T H E | D A R K E N E D | R O O M | W H E R E | D E A T H | H A D | B E E N | +W H A T | B A N K R U P T S | I N | T H E | W O R L D | W E | F E E L | W H E N | D E A T H | L I K E | S O M E | R E M O R S E L E S S | C R E D I T O R | S E I Z E S | O N | A L L | W E | F O N D L Y | T H O U G H T | O U R | O W N | T H E | T W I N S | +T H E N | T H E | M O T H E R | L I F T E D | U P | H E R | V O I C E | A N D | W E P T | +I T | W A S | A | C O M F O R T | T O | H E R | W H E N | S C O L D E D | B Y | M I S S | S I M M O N D S | T O | T H I N K | O F | T H E | D A Y | W H E N | S H E | W O U L D | D R I V E | U P | T O | T H E | D O O R | I N | H E R | O W N | C A R R I A G E | T O | O R D E R | H E R | G O W N S | F R O M | T H E | H A S T Y | T E M P E R E D | Y E T | K I N D | D R E S S M A K E R | +H E | D I D | N O T | S P E A K | A S | T H O U G H | F E A R I N G | T O | D E S T R O Y | B Y | S O U N D | O R | M O T I O N | T H E | H A P P I N E S S | O F | T H A T | M O M E N T | W H E N | H E R | S O F T | H A N D ' S | T O U C H | T H R I L L E D | T H R O U G H | H I 
S | F R A M E | A N D | H E R | S I L V E R Y | V O I C E | W A S | W H I S P E R I N G | T E N D E R N E S S | I N | H I S | E A R | +I | T H I N K | I | C A N N O T | G O | R I G H T | F O R | I | E I T H E R | C H E C K | M Y S E L F | T I L L | I ' M | D O W N R I G H T | C R O S S | T O | H I M | O R | E L S E | I | S P E A K | J U S T | N A T U R A L | A N D | T H A T ' S | T O O | K I N D | A N D | T E N D E R | B Y | H A L F | +H O W | I N F I N I T E | T H E | W E A L T H | O F | L O V E | A N D | H O P E | G A R N E R E D | I N | T H E S E | S A M E | T I N Y | T R E A S U R E | H O U S E S | A N D | O H | +W I S H I N G | H I M | S A I D | M A R Y | I N | A | T O N E | O F | I N Q U I R Y | +I | C A N N O T | T H I N K | W H A T | P O S S E S S E S | M E | T H A T | I | M U S T | A L W A Y S | B E | W A N T I N G | T O | C O M F O R T | H I M | W H E N | H E ' S | D O W N C A S T | A N D | T H A T | I | M U S T | G O | M E D D L I N G | W I | H I M | T O | N I G H T | W H E N | S U R E | E N O U G H | I T | W A S | H I S | A U N T ' S | P L A C E | T O | S P E A K | T O | H I M | +B U T | W I L L | H E | T H A N K | M E | F O R | I T | +I | P U T | O N | M Y | C A P | A N D | R A N | O U T | T O | M E E T | J A K E | +O N | T H E | W H I T E | P A G E S | I | G R O U P E D | S U N D A Y | S C H O O L | C A R D S | A N D | A D V E R T I S I N G | C A R D S | W H I C H | I | H A D | B R O U G H T | F R O M | M Y | O L D | C O U N T R Y | +A N Y W A Y | H E | W O U L D | N E V E R | A L L O W | O N E | O F | H I S | H O R S E S | T O | B E | P U T | T O | S U C H | A | S T R A I N | +T H E Y | S A T | A B O U T | T H E | H O U S E | M O S T | O F | T H E | D A Y | A S | I F | I T | W E R E | S U N D A Y | G R E A S I N G | T H E I R | B O O T S | M E N D I N G | T H E I R | S U S P E N D E R S | P L A I T I N G | W H I P L A S H E S | +I | C A N | S E E | T H E M | N O W | E X A C T L Y | A S | T H E Y | L O O K E D | W O R K I N G | A B O U T | T H E | T A B L E | I N | T H E | L A M P L I G H T | J A K E | W I T H | H I S | H E A V Y | F E A T U R E S | S O | R U D E L Y | M O U L D E D | T H A T | H I S | F A C E | S E E M E D | S O M E H O W | U N F I N I S H E D | O T T O | W I T H | H I S | H A L F | E A R | A N D | T H E | S A V A G E | S C A R | T H A T | M A D E | H I S | U P P E R | L I P | C U R L | S O | F E R O C I O U S L Y | U N D E R | H I S | T W I S T E D | M U S T A C H E | +S H E | C U T | S Q U A R E S | O F | C O T T O N | C L O T H | A N D | W E | S E W E D | T H E M | T O G E T H E R | I N T O | A | B O O K | +B Y | T H E | T I M E | W E | H A D | P L A C E D | T H E | C O L D | F R E S H | S M E L L I N G | L I T T L E | T R E E | I N | A | C O R N E R | O F | T H E | S I T T I N G | R O O M | I T | W A S | A L R E A D Y | C H R I S T M A S | E V E | +I | H A D | W A N T E D | T O | G E T | S O M E | P I C T U R E | B O O K S | F O R | Y U L K A | A N D | A N T O N I A | E V E N | Y U L K A | W A S | A B L E | T O | R E A D | A | L I T T L E | N O W | +H E | U S E D | T O | H E L P | M Y | F A T H E R | C U T | C H R I S T M A S | T R E E S | F O R | M E | I N | V I R G I N I A | A N D | H E | H A D | N O T | F O R G O T T E N | H O W | M U C H | I | L I K E D | T H E M | +W H E N | H E | M O U N T E D | H I S | H O R S E | A T | T H E | D O O R | I | S A W | T H A T | H E | H A D | A | H A T C H E T | S L U N G | T O | H I S | B E L T | A N D | H E | G A V E | G R A N D M O T H E R | A | M E A N I N G | L O O K | W H I C H | T O L D | M E | H E | W A S | P 
L A N N I N G | A | S U R P R I S E | F O R | M E | +F R O M | U N D E R | T H E | L I N I N G | H E | N O W | P R O D U C E D | A | C O L L E C T I O N | O F | B R I L L I A N T L Y | C O L O R E D | P A P E R | F I G U R E S | S E V E R A L | I N C H E S | H I G H | A N D | S T I F F | E N O U G H | T O | S T A N D | A L O N E | +B E C A U S E | H E | T A L K E D | S O | L I T T L E | H I S | W O R D S | H A D | A | P E C U L I A R | F O R C E | T H E Y | W E R E | N O T | W O R N | D U L L | F R O M | C O N S T A N T | U S E | +H I S | F A C E | H A D | A | L O O K | O F | W E A R I N E S S | A N D | P L E A S U R E | L I K E | T H A T | O F | S I C K | P E O P L E | W H E N | T H E Y | F E E L | R E L I E F | F R O M | P A I N | +H E | M A D E | T H E | S I G N | O F | T H E | C R O S S | O V E R | M E | P U T | O N | H I S | C A P | A N D | W E N T | O F F | I N | T H E | D A R K | +H E | G A V E | T H A N K S | F O R | O U R | F O O D | A N D | C O M F O R T | A N D | P R A Y E D | F O R | T H E | P O O R | A N D | D E S T I T U T E | I N | G R E A T | C I T I E S | W H E R E | T H E | S T R U G G L E | F O R | L I F E | W A S | H A R D E R | T H A N | I T | W A S | H E R E | W I T H | U S | +G R A N D F A T H E R | C A M E | D O W N | W E A R I N G | A | W H I T E | S H I R T | A N D | H I S | S U N D A Y | C O A T | +M O R N I N G | P R A Y E R S | W E R E | L O N G E R | T H A N | U S U A L | +H E | S A T | S T I L L | A N D | P A S S I V E | H I S | H E A D | R E S T I N G | A G A I N S T | T H E | B A C K | O F | T H E | W O O D E N | R O C K I N G | C H A I R | H I S | H A N D S | R E L A X E D | U P O N | T H E | A R M S | +A T | A B O U T | F O U R | O ' C L O C K | A | V I S I T O R | A P P E A R E D | M I S T E R | S H I M E R D A | W E A R I N G | H I S | R A B B I T | S K I N | C A P | A N D | C O L L A R | A N D | N E W | M I T T E N S | H I S | W I F E | H A D | K N I T T E D | +A L L | A F T E R N O O N | H E | S A T | I N | T H E | D I N I N G | R O O M | +I | W I S H | W E | H A D | N O T | L E F T | S O | S O O N | +T H E | S U G G E S T I O N | H E | H A D | L A U G H E D | A T | W A S | N O T | S O | E N T I R E L Y | F O O L I S H | A S | H E | H A D | B E E N | P L E A S E D | T O | C O N S I D E R | I T | +T H E | B O Y S | L O O K | W I D E | A W A K E | E N O U G H | I F | T H E | F A T H E R | D O E S | N O T | +I | K N O W | I T | S O U N D S | F O O L I S H | B U T | T H E | A L T E R N A T I V E | I S | S O | I M P R O B A B L E | +B E E N | O V E R | T H E | G R O U N D | S T U D I E D | T H E | A F F A I R | C A R E F U L L Y | +A T | L E A S T | T H A T | I S | W H A T | W E | H O P E | +H E | A P P E A R E D | T O | K N O W | F O R | H E | T O L D | M E | A T | O N C E | T H A T | H E | W A S | D E T E C T I V E | G R Y C E | A | M A N | W H O | H A D | G R O W N | O L D | I N | S O L V I N G | J U S T | S U C H | B A F F L I N G | P R O B L E M S | A S | T H E S E | +T H I S | S W E E T W A T E R | A S | T H E Y | C A L L E D | H I M | W A S | I | H A V E | S I N C E | U N D E R S T O O D | O N E | O F | H I S | P R O T E G E S | A N D | M O R E | O R | L E S S | O F | A | F A V O U R I T E | +T H E R E | W A S | N O | P O N I A R D | I N | T H E | W O U N D | +I | E M P H A S I S E D | C O M P L A C E N T L Y | +I T | H A S | N O T | B E E N | R U N N I N G | S I N C E | L A S T | N I G H T | O R | I T | W O U L D | B E | F U L L | O F | C U R I O U S | P E O P L E | A L L | T H E | T I M E | H U S T L I N G | T O | G E T | A | G L I M P S E | O F | 
T H I S | P L A C E | +G E O R G E | +B U T | V I G O U R | R E T U R N E D | T O | H I M | B E F O R E | H E | H A D | W E L L | R E A C H E D | T H E | D O O R | A N D | H E | S H O W E D | S O M E | O F | H I S | O L D | S P I R I T | A S | H E | T H A N K E D | M I S S | C L A R K E | A N D | T U R N E D | T O | T A K E | T H E | E L E V A T O R | +T H I S | I S | V E R Y | G O O D | O F | Y O U | H E | B E G A N | G L A N C I N G | D O W N | A T | T H E | A G E D | D E T E C T I V E ' S | B U N D L E D | U P | L E G S | A N D | G E N T L Y | P U S H I N G | A | C H A I R | T O W A R D S | H I M | +T H E | N E X T | M I N U T E | S H E | W A S | I N | T H I S | L A D Y ' S | A R M S | +T H E | O L D | M A N ' S | E Y E S | S H O T | F I R E | A N D | U N C O N S C I O U S L Y | O N E | F O O T | S L I P P E D | T O | T H E | F L O O R | +I ' L L | W A I T | H E R E | T I L L | Y O U ' R E | R E A D Y | E X P L A I N | Y O U R S E L F | T O | T H E | L A D Y | T E L L | H E R | I ' M | A N | O L D | A N D | R H E U M A T I C | I N V A L I D | W H O | H A S | B E E N | U S E D | T O | A S K I N G | H I S | O W N | Q U E S T I O N S | +V E R Y | G O O D | M A N A G E | I T | A S | Y O U | W I L L | +H A V E | Y O U | E V E R | T H O U G H T | T H A T | S H E | M I G H T | H A V E | B E E N | A | S U I C I D E | +I | K N O W | S H E | S A I D | W H A T | Y O U | A R E | G O I N G | T O | A S K | M E | N O W | +Y E S | M A N Y | T I M E S | +G E O R G E | N O D D E D | +I N | O L D E N | D A Y S | T H E Y | W O U L D | H A V E | S A I D | S T R U C K | B Y | A | B O L T | F R O M | H E A V E N | +I | S U P P O S E | S H E | H A S | B E E N | C A R E F U L L Y | Q U E S T I O N E D | V E R Y | I | S H O U L D | S A Y | +I | T O O K | Q U I T E | A | F A N C Y | T O | H I M | W H Y | +O N E | O R | T W O | O F | T H E | M U S I C I A N S | F R O M | T H E | E N D | O F | T H E | H A L L | +N O T | A L T O G E T H E R | B Y | M E | +N O T | T I L L | T H E | D O C T O R | C A M E | H E R | D O C T O R | W H O | W A S | H A P P I L Y | I N | H I S | O F F I C E | I N | T H I S | V E R Y | B U I L D I N G | +S W E E T W A T E R | S O M E O N E | D R E W | T H A T | W E A P O N | O U T | +D O | T H E Y | S T I L L | I N S I S T | T H A T | M I S S | C H A L L O N E R | W A S | T H E | O N L Y | P E R S O N | I N | T H E | R O O M | W I T H | T H E M | A T | T H I S | T I M E | +I | A S K E D | A S | S O O N | A S | G E O R G E | H A D | R E T U R N E D | T O | M Y | S I D E | +B U T | C L E W S | T H E R E | A R E | A B S O L U T E L Y | N O N E | +W H E N | W E | T O O K | O U R | S E A T S | A T | T H E | B R E A K F A S T | T A B L E | I T | W A S | W I T H | T H E | F E E L I N G | O F | B E I N G | N O | L O N G E R | L O O K E D | U P O N | A S | C O N N E C T E D | I N | A N Y | W A Y | W I T H | T H I S | C A S E | +A N D | S H E | S P E A K S | O F | N O | W E A P O N | +W H O | C A M E | N E X T | O N | T H E | S C E N E | S O M E | P E O P L E | F R O M | T H E | L O B B Y | +I | I N Q U I R E D | O F | G E O R G E | W I T H | M Y | E Y E S | S T I L L | O N | T H I S | F U R T I V E | W A T C H E R | +I | S H O U L D | L I K E | T O | S E E | T H E | D E S K | Y O U | S P E A K | O F | A N D | T H E | S P O T | W H E R E | S H E | F E L L | +M I S S | C L A R K E | S T A R T E D | A N D | H E R | S W E E T | F A C E | S H O W E D | A | M O M E N T ' S | P E R P L E X I T Y | D I D | I | S H E | Q U E R I E D | M U S I N G L Y | +W E L L | W E L L | T H A T ' S | H O N E S T | A T | A 
L L | E V E N T S | +H E | W A S | T H E | P L A I N | F A C E D | D E T E C T I V E | W H O | H A D | S P O K E N | T O | G E O R G E | +I | A M | L O O K I N G | A T | H I M | N O W | +T H E | B O Y S | B E L O N G | T O | T H E | G E N T L E M A N | W H O | I S | A | W I D O W E R | +T H E | T R A I L | H E R E | M U S T | B E | A | V E R Y | B L I N D | O N E | F O R | T H E M | T O | C A L L | H I M | I N | +T H E R E | W A S | N O | D O U B T I N G | T H E M | I N | T H I S | I N S T A N C E | +I T ' S | T H E | M O S T | I N E X P L I C A B L E | T H E R E | +A | Y O U N G | F E L L O W | W H O | H A D | B E E N | H O V E R I N G | I N | T H E | B A C K G R O U N D | A T | O N C E | S T E P P E D | F O R W A R D | +W H E T H E R | H E | G O T | A N Y T H I N G | E L S E | I T | W O U L D | B E | I M P O S S I B L E | T O | S A Y | F R O M | H I S | M A N N E R | A S | H E | F I N A L L Y | S A N K | I N T O | A | C H A I R | B Y | O N E | O F | T H E | O P E N I N G S | A N D | L O O K E D | D O W N | O N | T H E | L O B B Y | B E L O W | +O L D | D A Y S | D O N ' T | R E T U R N | F O R | T H E | A S K I N G | +N O | D O U B T | +N O | W E A P O N | P R O T R U D E D | F R O M | T H E | W O U N D | N O R | W A S | A N Y | F O U N D | O N | O R | N E A R | H E R | I N | T H E | M E Z Z A N I N E | W H A T | F O L L O W S | +T H E | B O Y S | L O O K | W I D E | A W A K E | E N O U G H | B U T | W H O | C A N | T E L L | I | W O U L D | S O O N E R | B E L I E V E | T H A T | +B U T | T H E Y ' L L | P U T | A | M A N | O N | F O R | Y O U | +N O | A | V E R Y | N A T U R A L | O N E | I | S H O U L D | S A Y | +Y E S | +M A R K | S O W E R B Y | A N D | C L A U S | H E N N E R B E R G | +I T | W A S | O N E | W H I C H | G A V E | M E | A | S M A L L | T R I U M P H | O V E R | G E O R G E | +T H E | T I M E | I S | N A R R O W E D | D O W N | T O | O N E | A N D | I N | T H A T | O N E | M I S S | C L A R K E | W A S | T H E | O N L Y | P E R S O N | T O | T O U C H | H E R | +W H E R E V E R | S H E | P L E A S E S | O N L Y | I | C A N ' T | W A L K | F A R | +A | M A N | W A S | L O O K I N G | I N | F R O M | T H E | C O R R I D O R | B E H I N D | A T | T H E | F O U R | P E R S O N S | W E | W E R E | J U S T | D I S C U S S I N G | +W H A T | D O | Y O U | M A K E | O F | I T | G R Y C E | +W H A T | M A D E | T H E | D I F F E R E N C E | +J U S T | A N | E V E R Y D A Y | D E T E C T I V E | B U T | A M B I T I O U S | I | S U P P O S E | A N D | Q U I T E | A L I V E | T O | T H E | I M P O R T A N C E | O F | B E I N G | T H O R O U G H | +N O | W O R D | N O | C R Y | J U S T | A | C O L L A P S E | A N D | S U D D E N | F A L L | +S W E E T W A T E R | H E L P | M E | O U T | O F | T H I S | +Y E S | M I S S | C L A R K E | T H E | M I D D L E | A G E D | L A D Y | W I T H | T H E | P A R R I S H E S | +I T ' S | A | C A S E | I N | A | T H O U S A N D | G R Y C E | +Y E S | M I S T E R | S L A T E R | T H E | A S S I S T A N T | M A N A G E R | W H O | W A S | I N | T H E | L O B B Y | A T | T H E | T I M E | S A Y S | T H A T | T E N | M I N U T E S | A T | L E A S T | M U S T | H A V E | E L A P S E D | +H O N E S T | G E R M A N S | M E N | W H O | H A V E | P L A Y E D | H E R E | F O R | Y E A R S | +H E | W A N T S | M E | T O | S T A N D | R E A D Y | T O | O B E Y | A N Y | S U M M O N S | T H E | P O L I C E | M A Y | S E N D | M E | +H O W E V E R | A | L I T T L E | L A T E R | W E | H A D | A | C O M F O R T A B L E | C H A T | +T H A T | I S | W E | H A 
V E | N O T | B E E N | A B L E | T O | F I N D | A N Y | P E R H A P S | Y O U | C A N | +A N Y B O D Y | B E F O R E | T H E | F A T H E R | C A M E | I N | +T H E I R | G R E E T I N G | W A S | C O R D I A L | A N D | T H E | L I N E S | O N | T H E | L A T T E R ' S | F A C E | R E L A X E D | A | L I T T L E | A S | H E | M E T | T H E | S T I L L | B R I G H T | E Y E | O F | T H E | M A N | U P O N | W H O S E | I N S T I N C T | A N D | J U D G M E N T | S O | M U C H | R E L I A N C E | H A D | A L W A Y S | B E E N | P L A C E D | +I N S T A N T L Y | T H E Y | A B S O R B E D | A L L | M Y | A T T E N T I O N | T H O U G H | I | D A R E D | N O T | G I V E | T H E M | A | D I R E C T | L O O K | A N D | C O N T I N U E D | T O | O B S E R V E | T H E M | O N L Y | I N | T H E | G L A S S | +F O R | S O M E | L I T T L E | T I M E | T H A T | I S | I T | S E E M E D | L O N G | T H O U G H | I | B E L I E V E | I T | W A S | N O T | M O R E | T H A N | A | M I N U T E | B E F O R E | T W O | M E N | C A M E | R U N N I N G | F R O M | T H E | M U S I C I A N S | G A L L E R Y | +A N D | T H E | G L A N C E | S H E | C A S T | H I M | W H I L E | N O T | M E E T I N G | H I S | E Y E | S H O W E D | T H A T | S H E | U N D E R S T O O D | T H E | I M P O R T A N C E | O F | T H E | A D M I S S I O N | +I T | D I D | N O T | F A L L | U P O N | T H E | F L O O R | A R O U N D | H E R | T H E R E F O R E | I T | F L E W | T H R O U G H | O N E | O F | T H O S E | O P E N I N G S | I N T O | T H E | L O B B Y | A N D | T H E R E | I T | E I T H E R | W I L L | B E | O R | H A S | B E E N | F O U N D | +S H E | H A D | N O | C O M P A N I O N | N E A R | H E R | +W H A T | D O E S | H E | W A N T | +T H E | L A D Y | I S | N O T | T H E | M O T H E R | O F | T H E | B O Y S | B U T | T H E I R | A U N T | +H E | W A S | L A T E | O F | C O U R S E | B U T | W H E N | H E | D I D | A P P E A R | I | A L M O S T | F O R G O T | O U R | U S U A L | G R E E T I N G | I N | M Y | H U R R Y | T O | A S K | H I M | I F | H E | H A D | S E E N | T H E | E V E N I N G | P A P E R S | +I | W I L L | T R O U B L E | Y O U | N O | F U R T H E R | +A S | H E R | Q U I E T | F I G U R E | A P P E A R E D | I N | T H E | D O O R W A Y | S W E E T W A T E R | S T O L E | A | G L A N C E | A T | M I S T E R | G R Y C E | +N A T U R A L L Y | T H E Y | R E A C H E D | H E R | F I R S T | +B U T | I ' M | I N | N O | P O S I T I O N | T O | M A K E | P R O M I S E S | +H E | G A V E | U P | W O R K | S O M E | T I M E | A G O | I | H A V E | B E E N | T O L D | M Y | H U S B A N D | W E N T | O N | B U T | E V I D E N T L Y | A | G R E A T | C A S E | S T I L L | H A S | I T S | A L L U R E M E N T | F O R | H I M | +Y E S | A N D | A | V E R Y | R E S P E C T A B L E | O N E | +S H E | S T R U C K | T H E | B L O W | H E R S E L F | A N D | T H E | S T R E N G T H | O F | P U R P O S E | W H I C H | L E D | H E R | T O | D O | T H I S | G A V E | H E R | T H E | A D D I T I O N A L | F O R C E | T O | P U L L | T H E | W E A P O N | O U T | A N D | F L I N G | I T | F R O M | H E R | +V E R Y | W E L L | T H E N | Y O U ' R E | I N | A | P O S I T I O N | T O | P I O N E E R | M E | +Y E S | H E ' S | M E R C U R I A L | I N | A L L | H I S | M O V E M E N T S | +I T | W I L L | B E | W E L L | T O | S T A T E | I N | T H E | B E G I N N I N G | O F | T H I S | R E C I P E | T H A T | F R E N C H | F O R C E M E A T | O R | Q U E N E L L E S | C O N S I S T | O F | T H E | B L E N D I N G | O F | T H 
R E E | S E P A R A T E | P R O C E S S E S | N A M E L Y | P A N A D A | U D D E R | A N D | W H A T E V E R | M E A T | Y O U | I N T E N D | U S I N G | P A N A D A | +M O D E | M I X | A L L | T H E | I N G R E D I E N T S | W E L L | T O G E T H E R | C A R E F U L L Y | M I N C I N G | T H E M | V E R Y | F I N E L Y | B E A T | U P | T H E | E G G | M O I S T E N | W I T H | I T | A N D | W O R K | T H E | W H O L E | V E R Y | S M O O T H L Y | T O G E T H E R | +S I M M E R | F O R | A | M I N U T E | O R | T W O | A N D | S E R V E | I N | A | T U R E E N | +L E M O N | J U I C E | M A Y | B E | A D D E D | A T | P L E A S U R E | +T H E | G I N G E R | P L A N T | K N O W N | T O | N A T U R A L I S T S | A S | Z I N G I B E R | O F F I C I N A L E | I S | A | N A T I V E | O F | T H E | E A S T | A N D | W E S T | I N D I E S | +I L L U S T R A T I O N | S A G E | +M O D E | C U T | U P | T H E | O N I O N | A N D | C A R R O T | I N T O | S M A L L | R I N G S | A N D | P U T | T H E M | I N T O | A | S T E W P A N | W I T H | T H E | H E R B S | M U S H R O O M S | B A Y | L E A F | C L O V E S | A N D | M A C E | A D D | T H E | B U T T E R | A N D | S I M M E R | T H E | W H O L E | V E R Y | G E N T L Y | O V E R | A | S L O W | F I R E | U N T I L | T H E | O N I O N | I S | Q U I T E | T E N D E R | +T H E N | A D D | T H E | Y O L K S | O F | T H E | E G G S | W E L L | B E A T E N | S T I R | T H E M | T O | T H E | S A U C E | B U T | D O | N O T | A L L O W | I T | T O | B O I L | A N D | S E R V E | V E R Y | H O T | +I F | T H E | Q U E N E L L E S | A R E | N O T | F I R M | E N O U G H | A D D | T H E | Y O L K | O F | A N O T H E R | E G G | B U T | O M I T | T H E | W H I T E | W H I C H | O N L Y | M A K E S | T H E M | H O L L O W | A N D | P U F F Y | I N S I D E | +I L L U S T R A T I O N | T H E | L E M O N | +M O D E | P A R E | A N D | S L I C E | T H E | C U C U M B E R S | A S | F O R | T H E | T A B L E | S P R I N K L E | W E L L | W I T H | S A L T | A N D | L E T | T H E M | R E M A I N | F O R | T W E N T Y | F O U R | H O U R S | S T R A I N | O F F | T H E | L I Q U O R | P A C K | I N | J A R S | A | T H I C K | L A Y E R | O F | C U C U M B E R S | A N D | S A L T | A L T E R N A T E L Y | T I E | D O W N | C L O S E L Y | A N D | W H E N | W A N T E D | F O R | U S E | T A K E | O U T | T H E | Q U A N T I T Y | R E Q U I R E D | +I L L U S T R A T I O N | L O N G | P E P P E R | +I F | I T | S H O U L D | B E | I N | A N | U N S O U N D | S T A T E | I T | M U S T | B E | O N | N O | A C C O U N T | M A D E | U S E | O F | +L O N G | P E P P E R | T H I S | I S | T H E | P R O D U C E | O F | A | D I F F E R E N T | P L A N T | F R O M | T H A T | W H I C H | P R O D U C E S | T H E | B L A C K | I T | C O N S I S T I N G | O F | T H E | H A L F | R I P E | F L O W E R | H E A D S | O F | W H A T | N A T U R A L I S T S | C A L L | P I P E R | L O N G U M | A N D | C H A B A | +I L L U S T R A T I O N | G I N G E R | +W H E N | I T | I S | S U F F I C I E N T L Y | T H I C K | T A K E | I T | O F F | A S | I T | S H O U L D | N O T | B O I L | +T H E | F A T | T H E Y | A R E | F R I E D | I N | S H O U L D | B E | C L E A R | A N D | T H E | C R U M B S | S H O U L D | N O T | H A V E | T H E | S L I G H T E S T | A P P E A R A N C E | O R | T A S T E | O F | H A V I N G | B E E N | I N | T H E | L E A S T | D E G R E E | B U R N T | +O T H E R | S W E E T | H E R B S | A R E | C U L T I V A T E D | F O R | P U R P O S E S | O F | M E D I C 
I N E | A N D | P E R F U M E R Y | T H E Y | A R E | M O S T | G R A T E F U L | B O T H | T O | T H E | O R G A N S | O F | T A S T E | A N D | S M E L L I N G | A N D | T O | T H E | A R O M A | D E R I V E D | F R O M | T H E M | I S | D U E | I N | A | G R E A T | M E A S U R E | T H E | S W E E T | A N D | E X H I L A R A T I N G | F R A G R A N C E | O F | O U R | F L O W E R Y | M E A D S | +B O I L | F O R | F I V E | M I N U T E S | M I N C E | I T | V E R Y | S M A L L | A N D | M I X | I T | W I T H | T H E | O T H E R | I N G R E D I E N T S | +N O T E | T H E | W I N E | I N | T H I S | S A U C E | M A Y | B E | O M I T T E D | A N D | A N | O N I O N | S L I C E D | A N D | F R I E D | O F | A | N I C E | B R O W N | S U B S T I T U T E D | F O R | I T | +I L L U S T R A T I O N | T H E | C U C U M B E R | +S O M E | S P R I N G S | A R E | S O | H I G H L Y | I M P R E G N A T E D | W I T H | S A L T | A S | T O | H A V E | R E C E I V E D | T H E | N A M E | O F | B R I N E | S P R I N G S | A N D | A R E | S U P P O S E D | T O | H A V E | B E C O M E | S O | B Y | P A S S I N G | T H R O U G H | T H E | S A L T | R O C K S | B E L O W | G R O U N D | A N D | T H U S | D I S S O L V I N G | A | P O R T I O N | O F | T H I S | M I N E R A L | S U B S T A N C E | +C O N T I N U E | I N | T H I S | M A N N E R | T I L L | T H E | B O R D E R | I S | C O M P L E T E D | A R R A N G I N G | T H E | S I P P E T S | A | P A L E | A N D | A | D A R K | O N E | A L T E R N A T E L Y | +S E A S O N A B L E | T H I S | R E C I P E | S H O U L D | B E | U S E D | I N | J U N E | J U L Y | O R | A U G U S T | +I L L U S T R A T I O N | B A S I L | +M O D E | C H O O S E | T H E | G R E E N E S T | C U C U M B E R S | A N D | T H O S E | T H A T | A R E | M O S T | F R E E | F R O M | S E E D S | P U T | T H E M | I N | S T R O N G | S A L T | A N D | W A T E R | W I T H | A | C A B B A G E | L E A F | T O | K E E P | T H E M | D O W N | T I E | A | P A P E R | O V E R | T H E M | A N D | P U T | T H E M | I N | A | W A R M | P L A C E | T I L L | T H E Y | A R E | Y E L L O W | T H E N | W A S H | T H E M | A N D | S E T | T H E M | O V E R | T H E | F I R E | I N | F R E S H | W A T E R | W I T H | A | V E R Y | L I T T L E | S A L T | A N D | A N O T H E R | C A B B A G E | L E A F | O V E R | T H E M | C O V E R | V E R Y | C L O S E L Y | B U T | T A K E | C A R E | T H E Y | D O | N O T | B O I L | +O R I G I N A L L Y | T H E | M O S T | V A L U A B L E | O F | T H E S E | W E R E | F O U N D | I N | T H E | S P I C E | I S L A N D S | O R | M O L U C C A S | O F | T H E | I N D I A N | O C E A N | A N D | W E R E | H I G H L Y | P R I Z E D | B Y | T H E | N A T I O N S | O F | A N T I Q U I T Y | +S U F F I C I E N T | T O | S E R V E | W I T H | F I V E | O R | S I X | M A C K E R E L | +P U T | T H E | S U G A R | W I T H | O N E | Q U A R T E R | P I N T | O F | W A T E R | I N | A | S A U C E P A N | O V E R | T H E | F I R E | R E M O V E | T H E | S C U M | A S | I T | R I S E S | A N D | A D D | T H E | L E M O N | P E E L | A N D | G I N G E R | W I T H | T H E | O U T S I D E | S C R A P E D | O F F | W H E N | T H E | S Y R U P | I S | T O L E R A B L Y | T H I C K | T A K E | I T | O F F | T H E | F I R E | A N D | W H E N | C O L D | W I P E | T H E | C U C U M B E R S | D R Y | A N D | P U T | T H E M | I N | +B E A T | T H E | Y O L K S | O F | T H E | O T H E R | T W O | E G G S | A D D | T H E M | W I T H | A | L I T T L E | F L O U R | A N D | S A L T | T O | 
T H O S E | P O U N D E D | M I X | A L L | W E L L | T O G E T H E R | A N D | R O L L | I N T O | B A L L S | +M O D E | P U T | T H E | M I L K | I N | A | V E R Y | C L E A N | S A U C E P A N | A N D | L E T | I T | B O I L | +W H E N | Q U I T E | C R I S P | D I P | O N E | S I D E | O F | T H E | S I P P E T | I N T O | T H E | B E A T E N | W H I T E | O F | A N | E G G | M I X E D | W I T H | A | L I T T L E | F L O U R | A N D | P L A C E | I T | O N | T H E | E D G E | O F | T H E | D I S H | +I T | I S | A | N A T I V E | O F | P O R T U G A L | A N D | W H E N | I T S | L E A V E S | A R E | U S E D | A S | A | S E A S O N I N G | H E R B | T H E Y | H A V E | A N | A G R E E A B L E | A R O M A T I C | F L A V O U R | +T H E | L E M O N | T H I S | F R U I T | I S | A | N A T I V E | O F | A S I A | A N D | I S | M E N T I O N E D | B Y | V I R G I L | A S | A N | A N T I D O T E | T O | P O I S O N | +S U F F I C I E N T | F O R | A | M O D E R A T E | S I Z E D | H A D D O C K | O R | P I K E | +S O L I D | R O C K S | O F | S A L T | A R E | A L S O | F O U N D | I N | V A R I O U S | P A R T S | O F | T H E | W O R L D | A N D | T H E | C O U N T Y | O F | C H E S T E R | C O N T A I N S | M A N Y | O F | T H E S E | M I N E S | A N D | I T | I S | F R O M | T H E R E | T H A T | M U C H | O F | O U R | S A L T | C O M E S | +F O R C E M E A T | F O R | C O L D | S A V O U R Y | P I E S | +P U T | T H E | U D D E R | I N T O | A | S T E W P A N | W I T H | S U F F I C I E N T | W A T E R | T O | C O V E R | I T | L E T | I T | S T E W | G E N T L Y | T I L L | Q U I T E | D O N E | W H E N | T A K E | I T | O U T | T O | C O O L | +F R I E D | B R E A D | F O R | B O R D E R S | +M O D E | P U T | T H E | W H O L E | O F | T H E | I N G R E D I E N T S | I N T O | A | B O T T L E | A N D | L E T | I T | R E M A I N | F O R | A | F O R T N I G H T | I N | A | W A R M | P L A C E | O C C A S I O N A L L Y | S H A K I N G | U P | T H E | C O N T E N T S | +V A R I O U S | D I S H E S | A R E | F R E Q U E N T L Y | O R N A M E N T E D | A N D | G A R N I S H E D | W I T H | I T S | G R A C E F U L | L E A V E S | A N D | T H E S E | A R E | S O M E T I M E S | B O I L E D | I N | S O U P S | A L T H O U G H | I T | I S | M O R E | U S U A L L Y | C O N F I N E D | I N | E N G L I S H | C O O K E R Y | T O | T H E | M A C K E R E L | S A U C E | A S | H E R E | G I V E N | +T H E Y | O U G H T | T O | B E | T A K E N | U P | I N | T H E | A U T U M N | A N D | W H E N | D R I E D | I N | T H E | H O U S E | W I L L | K E E P | T I L L | S P R I N G | +T H I S | J U I C E | W H I C H | I S | C A L L E D | C I T R I C | A C I D | M A Y | B E | P R E S E R V E D | I N | B O T T L E S | F O R | A | C O N S I D E R A B L E | T I M E | B Y | C O V E R I N G | I T | W I T H | A | T H I N | S T R A T U M | O F | O I L | +F R I E D | B R E A D | C R U M B S | +N O W | B E A T | A N D | S T R A I N | T H E | E G G S | W O R K | T H E S E | U P | W I T H | T H E | O T H E R | I N G R E D I E N T S | A N D | T H E | F O R C E M E A T | W I L L | B E | R E A D Y | F O R | U S E | +T O | P I C K L E | E G G S | +P L A C E | T H E | J U G | I N | A | S A U C E P A N | O F | B O I L I N G | W A T E R | K E E P | S T I R R I N G | W E L L | U N T I L | I T | T H I C K E N S | B U T | D O | N O T | A L L O W | I T | T O | B O I L | O R | I T | W I L L | C U R D L E | +P O U N D | W E L L | A N D | B I N D | W I T H | O N E | O R | T W O | E G G S | W H I C H | H A V E | B E E N | P R E V I O 
U S L Y | B E A T E N | A N D | S T R A I N E D | +I T | I S | H A R D I E R | T H A N | T H E | O R A N G E | A N D | A S | O N E | O F | T H E | C I T R O N | T R I B E | W A S | B R O U G H T | I N T O | E U R O P E | B Y | T H E | A R A B I A N S | +T H E | L E M O N | W A S | F I R S T | C U L T I V A T E D | I N | E N G L A N D | I N | T H E | B E G I N N I N G | O F | T H E | S E V E N T E E N T H | C E N T U R Y | A N D | I S | N O W | O F T E N | T O | B E | F O U N D | I N | O U R | G R E E N | H O U S E S | +F R E N C H | F O R C E M E A T | +A | S T O R E | O F | P I C K L E D | E G G S | W I L L | B E | F O U N D | V E R Y | U S E F U L | A N D | O R N A M E N T A L | I N | S E R V I N G | W I T H | M A N Y | F I R S T | A N D | S E C O N D | C O U R S E | D I S H E S | +A D D | T H E | W I N E | A N D | I F | N E C E S S A R Y | A | S E A S O N I N G | O F | C A Y E N N E | W H E N | I T | W I L L | B E | R E A D Y | T O | S E R V E | +P L A C E | I T | O V E R | T H E | F I R E | K E E P | C O N S T A N T L Y | S T I R R I N G | T O | P R E V E N T | I T S | B U R N I N G | A N D | W H E N | Q U I T E | D R Y | P U T | I N | A | S M A L L | P I E C E | O F | B U T T E R | +A N Y | O N E | W I T H | T H E | S L I G H T E S T | P R E T E N S I O N S | T O | R E F I N E D | C O O K E R Y | M U S T | I N | T H I S | P A R T I C U L A R | I M P L I C I T L Y | F O L L O W | T H E | E X A M P L E | O F | O U R | F R I E N D S | A C R O S S | T H E | C H A N N E L | +I L L U S T R A T I O N | P E S T L E | A N D | M O R T A R | +W H E N | T H E | T H R E E | I N G R E D I E N T S | A R E | P R O P E R L Y | P R E P A R E D | P O U N D | T H E M | A L T O G E T H E R | I N | A | M O R T A R | F O R | S O M E | T I M E | F O R | T H E | M O R E | Q U E N E L L E S | A R E | P O U N D E D | T H E | M O R E | D E L I C A T E | T H E Y | A R E | +I N | J A M A I C A | I T | F L O W E R S | A B O U T | A U G U S T | O R | S E P T E M B E R | F A D I N G | A B O U T | T H E | E N D | O F | T H E | Y E A R | +T H E | L O N G | P E P P E R | I S | L E S S | A R O M A T I C | T H A N | T H E | B L A C K | B U T | I T S | O I L | I S | M O R E | P U N G E N T | +S U F F I C I E N T | H A L F | T H I S | Q U A N T I T Y | F O R | T W O | S L I C E S | O F | S A L M O N | +S E A S O N A B L E | T H I S | S H O U L D | B E | M A D E | A B O U T | E A S T E R | A S | A T | T H I S | T I M E | E G G S | A R E | P L E N T I F U L | A N D | C H E A P | +B O I L | T H E M | B E F O R E | T H E Y | A R E | P U T | I N T O | T H E | S O U P | O R | O T H E R | D I S H | T H E Y | M A Y | B E | I N T E N D E D | F O R | +B E A T | T H E | E G G S | S T I R | T O | T H E M | T H E | M I L K | A N D | P O U N D E D | S U G A R | A N D | P U T | T H E | M I X T U R E | I N T O | A | J U G | +I L L U S T R A T I O N | M A R J O R A M | +S O M E T H I N G | O F | T H E I R | T R O U B L I N G | S W E E T N E S S | C A M E | B A C K | T O | A L E X A N D E R | T O O | +D O N ' T | C R Y | D O N ' T | C R Y | H E | W H I S P E R E D | +H I L D A ' S | F A C E | Q U I V E R E D | B U T | S H E | W H I S P E R E D | Y E S | I | T H I N K | I T | M U S T | H A V E | B E E N | +S H E | L O O K E D | A T | H I S | H E A V Y | S H O U L D E R S | A N D | B I G | D E T E R M I N E D | H E A D | T H R U S T | F O R W A R D | L I K E | A | C A T A P U L T | I N | L E A S H | +S H E | B I T | H E R | L I P | A N D | L O O K E D | D O W N | A T | H E R | H A N D S | W H I C H | W E R E | C L A S P E D | T I G H T L Y | I 
N | F R O N T | O F | H E R | +Y O U | A S K | M E | T O | S T A Y | A W A Y | F R O M | Y O U | B E C A U S E | Y O U | W A N T | M E | +H I L D A | W A T C H E D | H I M | F R O M | H E R | C O R N E R | T R E M B L I N G | A N D | S C A R C E L Y | B R E A T H I N G | D A R K | S H A D O W S | G R O W I N G | A B O U T | H E R | E Y E S | +S H E | L I S T E N E D | I N T E N T L Y | B U T | S H E | H E A R D | N O T H I N G | B U T | T H E | C R E A K I N G | O F | H I S | C H A I R | +I | U N D E R S T A N D | B A R T L E Y | I | W A S | W R O N G | +A T | T H A T | W O R D | D E C E P T I O N | S P O K E N | W I T H | S U C H | S E L F | C O N T E M P T | T H E | C O L O R | F L A S H E D | B A C K | I N T O | H I L D A ' S | F A C E | A S | S U D D E N L Y | A S | I F | S H E | H A D | B E E N | S T R U C K | B Y | A | W H I P L A S H | +A L W A Y S | B U T | I T ' S | W O R S E | N O W | +S H E | B L U S H E D | A N D | S M I L E D | A N D | F U M B L E D | H I S | C A R D | I N | H E R | C O N F U S I O N | B E F O R E | S H E | R A N | U P S T A I R S | +S H E | M E R E L Y | B R U S H E D | H I S | C H E E K | W I T H | H E R | L I P S | A N D | P U T | A | H A N D | L I G H T L Y | A N D | J O Y O U S L Y | O N | E I T H E R | S H O U L D E R | +T H E | L A S T | T W O | D A Y S | O F | T H E | V O Y A G E | B A R T L E Y | F O U N D | A L M O S T | I N T O L E R A B L E | +T H E R E | I S | T H I S | D E C E P T I O N | B E T W E E N | M E | A N D | E V E R Y T H I N G | +Y E S | H I L D A | I | K N O W | T H A T | H E | S A I D | S I M P L Y | +H I L D A | S A T | O N | T H E | A R M | O F | I T | A N D | P U T | H E R | H A N D S | L I G H T L Y | O N | H I S | S H O U L D E R S | +O H | B A R T L E Y | W H A T | A M | I | T O | D O | +I | W I L L | A S K | T H E | L E A S T | I M A G I N A B L E | B U T | I | M U S T | H A V E | S O M E T H I N G | +W H E N | D I D | Y O U | C O M E | B A R T L E Y | A N D | H O W | D I D | I T | H A P P E N | Y O U | H A V E N ' T | S P O K E N | A | W O R D | +B A R T L E Y | L E A N E D | H I S | H E A D | I N | H I S | H A N D S | A N D | S P O K E | T H R O U G H | H I S | T E E T H | +Y O U | W A N T | M E | T O | S A Y | I T | S H E | W H I S P E R E D | +A N D | T H E N | Y O U | C A M E | B A C K | N O T | C A R I N G | V E R Y | M U C H | B U T | I T | M A D E | N O | D I F F E R E N C E | +A | C O A L | F I R E | W A S | C R A C K L I N G | I N | T H E | G R A T E | A N D | T H E | L A M P S | W E R E | L I T | F O R | I T | W A S | A L R E A D Y | B E G I N N I N G | T O | G R O W | D A R K | O U T S I D E | +A F T E R | T H E | V E R Y | F I R S T | +S H E | C A L L E D | H I S | N A M E | O N | T H E | T H R E S H O L D | B U T | I N | H E R | S W I F T | F L I G H T | A C R O S S | T H E | R O O M | S H E | F E L T | A | C H A N G E | I N | H I M | A N D | C A U G H T | H E R S E L F | U P | S O | D E F T L Y | T H A T | H E | C O U L D | N O T | T E L L | J U S T | W H E N | S H E | D I D | I T | +H E | P U L L E D | U P | A | W I N D O W | A S | I F | T H E | A I R | W E R E | H E A V Y | +I T ' S | U N B E A R A B L E | I T | T O R T U R E S | M E | E V E R Y | M I N U T E | +P R E S E N T L Y | I T | S T O L E | B A C K | T O | H I S | C O A T | S L E E V E | +I ' L L | D O | A N Y T H I N G | Y O U | W I S H | M E | T O | B A R T L E Y | S H E | S A I D | T R E M U L O U S L Y | +I | H A V E | T H O U G H T | A B O U T | I T | U N T I L | I | A M | W O R N | O U T | +T H E | R O O M | W A S | E M P T Y | W H E N | H E | E 
N T E R E D | +S H E | P R E S S E D | H I S | H A N D | G E N T L Y | I N | G R A T I T U D E | W E R E N ' T | Y O U | H A P P Y | T H E N | A T | A L L | +E M E R G I N G | A T | E U S T O N | A T | H A L F | P A S T | T H R E E | O ' C L O C K | I N | T H E | A F T E R N O O N | A L E X A N D E R | H A D | H I S | L U G G A G E | S E N T | T O | T H E | S A V O Y | A N D | D R O V E | A T | O N C E | T O | B E D F O R D | S Q U A R E | +I T ' S | G O T | T O | B E | A | C L E A N | B R E A K | H I L D A | +I T | I T | H A S N ' T | A L W A Y S | M A D E | Y O U | M I S E R A B L E | H A S | I T | +Y O U | S E E | L O V I N G | S O M E | O N E | A S | I | L O V E | Y O U | M A K E S | T H E | W H O L E | W O R L D | D I F F E R E N T | +C O U L D | Y O U | C O U L D | Y O U | S I T | D O W N | A N D | T A L K | A B O U T | I T | Q U I E T L Y | B A R T L E Y | A S | I F | I | W E R E | A | F R I E N D | A N D | N O T | S O M E | O N E | W H O | H A D | T O | B E | D E F I E D | +I | N E V E R | D R E A M E D | I T | W O U L D | B E | Y O U | B A R T L E Y | +I | G E T | N O T H I N G | B U T | M I S E R Y | O U T | O F | E I T H E R | +H E | D R O P P E D | B A C K | H E A V I L Y | I N T O | H I S | C H A I R | B Y | T H E | F I R E | +I | A M | N O T | A | M A N | W H O | C A N | L I V E | T W O | L I V E S | H E | W E N T | O N | F E V E R I S H L Y | E A C H | L I F E | S P O I L S | T H E | O T H E R | +S H E | S L I D | T O | T H E | F L O O R | B E S I D E | H I M | A S | I F | S H E | W E R E | T O O | T I R E D | T O | S I T | U P | A N Y | L O N G E R | +W H E N | S H E | B E G A N | T O | D A N C E | B Y | W A Y | O F | S H O W I N G | T H E | G O S S O O N S | W H A T | S H E | H A D | S E E N | I N | T H E | F A I R Y | R I N G S | A T | N I G H T | T H E | H O U S E | B R O K E | I N T O | A | P R O L O N G E D | U P R O A R | +I ' M | G L A D | S H E ' S | H E L D | H E R | O W N | S I N C E | +I N | A | M O M E N T | P E G G Y | W A S | O N | T H E | S T A G E | A G A I N | A N D | A L E X A N D E R | A P P L A U D E D | V I G O R O U S L Y | W I T H | T H E | R E S T | +A L E X A N D E R | E X C L A I M E D | M I L D L Y | +I | H A P P E N | T O | H A V E | M A C | C O N N E L L ' S | B O X | F O R | T O N I G H T | O R | T H E R E ' D | B E | N O | C H A N C E | O F | O U R | G E T T I N G | P L A C E S | +A | L I T T L E | A T T A C K | O F | N E R V E S | P O S S I B L Y | +M A C | C O N N E L L | L E T | M E | I N T R O D U C E | M I S T E R | B A R T L E Y | A L E X A N D E R | +I N | T H E | H A L F | L I G H T | H E | L O O K E D | A B O U T | A T | T H E | S T A L L S | A N D | B O X E S | A N D | S M I L E D | A | L I T T L E | C O N S C I O U S L Y | R E C A L L I N G | W I T H | A M U S E M E N T | S I R | H A R R Y ' S | J U D I C I A L | F R O W N | +H U G H ' S | W R I T T E N | A | D E L I G H T F U L | P A R T | F O R | H E R | A N D | S H E ' S | Q U I T E | I N E X P R E S S I B L E | +M Y S E L F | I | A L W A Y S | K N E W | S H E | H A D | I T | I N | H E R | +H E ' S | A N O T H E R | W H O ' S | A W F U L L Y | K E E N | A B O U T | H E R | L E T | M E | I N T R O D U C E | Y O U | +I T ' S | D E L I G H T F U L | T O | H E A R | I T | I N | A | L O N D O N | T H E A T R E | +I | S A Y | S I R | H A R R Y | T H E | L I T T L E | G I R L ' S | G O I N G | F A M O U S L Y | T O | N I G H T | I S N ' T | S H E | +H E | L E A N E D | F O R W A R D | A N D | B E A M E D | F E L I C I T A T I O N S | A S | W A R M L Y | A S | M A I N H A L L | H I M S E 
L F | W H E N | A T | T H E | E N D | O F | T H E | P L A Y | S H E | C A M E | A G A I N | A N D | A G A I N | B E F O R E | T H E | C U R T A I N | P A N T I N G | A | L I T T L E | A N D | F L U S H E D | H E R | E Y E S | D A N C I N G | A N D | H E R | E A G E R | N E R V O U S | L I T T L E | M O U T H | T R E M U L O U S | W I T H | E X C I T E M E N T | +O F | C O U R S E | H I L D A | I S | I R I S H | T H E | B U R G O Y N E S | H A V E | B E E N | S T A G E | P E O P L E | F O R | G E N E R A T I O N S | A N D | S H E | H A S | T H E | I R I S H | V O I C E | +W H E N | T H E Y | E N T E R E D | T H E | S T A G E | B O X | O N | T H E | L E F T | T H E | F I R S T | A C T | W A S | W E L L | U N D E R | W A Y | T H E | S C E N E | B E I N G | T H E | I N T E R I O R | O F | A | C A B I N | I N | T H E | S O U T H | O F | I R E L A N D | +H E | N O D D E D | C U R T L Y | A N D | M A D E | F O R | T H E | D O O R | D O D G I N G | A C Q U A I N T A N C E S | A S | H E | W E N T | +A F T E R | H E R | D A N C E | S H E | W I T H D R E W | F R O M | T H E | D I A L O G U E | A N D | R E T R E A T E D | T O | T H E | D I T C H | W A L L | B A C K | O F | P H I L L Y ' S | B U R R O W | W H E R E | S H E | S A T | S I N G I N G | T H E | R I S I N G | O F | T H E | M O O N | A N D | M A K I N G | A | W R E A T H | O F | P R I M R O S E S | F O R | H E R | D O N K E Y | +A S | T H E Y | S A T | D O W N | A | B U R S T | O F | A P P L A U S E | D R E W | A L E X A N D E R ' S | A T T E N T I O N | T O | T H E | S T A G E | +A L L | T H E | S A M E | H E | L I F T E D | H I S | G L A S S | H E R E ' S | T O | Y O U | L I T T L E | H I L D A | +D O | Y O U | K N O W | A L E X A N D E R | M A I N H A L L | L O O K E D | W I T H | P E R P L E X I T Y | U P | I N T O | T H E | T O P | O F | T H E | H A N S O M | A N D | R U B B E D | H I S | P I N K | C H E E K | W I T H | H I S | G L O V E D | F I N G E R | D O | Y O U | K N O W | I | S O M E T I M E S | T H I N K | O F | T A K I N G | T O | C R I T I C I S M | S E R I O U S L Y | M Y S E L F | +S I R | H A R R Y | T O W N E | B O W E D | A N D | S A I D | T H A T | H E | H A D | M E T | M I S T E R | A L E X A N D E R | A N D | H I S | W I F E | I N | T O K Y O | +T H E | F A C T | I S | S H E ' S | F E E L I N G | R A T H E R | S E E D Y | P O O R | C H I L D | +H E | B O W E D | A S | T H E | W A R N I N G | B E L L | R A N G | A N D | M A I N H A L L | W H I S P E R E D | Y O U | K N O W | L O R D | W E S T M E R E | O F | C O U R S E | T H E | S T O O P E D | M A N | W I T H | T H E | L O N G | G R A Y | M U S T A C H E | T A L K I N G | T O | L A D Y | D O W L E | +I | D A R E | S A Y | I T ' S | Q U I T E | T R U E | T H A T | T H E R E ' S | N E V E R | B E E N | A N Y | O N E | E L S E | +T H E | P L A Y W R I G H T | G A V E | M A I N H A L L | A | C U R I O U S | L O O K | O U T | O F | H I S | D E E P | S E T | F A D E D | E Y E S | A N D | M A D E | A | W R Y | F A C E | +I T | W A S | Y O U T H | A N D | P O V E R T Y | A N D | P R O X I M I T Y | A N D | E V E R Y T H I N G | W A S | Y O U N G | A N D | K I N D L Y | +H E | H A D | W R I T T E N | A | N U M B E R | O F | B O O K S | H I M S E L F | A M O N G | T H E M | A | H I S T O R Y | O F | D A N C I N G | A | H I S T O R Y | O F | C O S T U M E | A | K E Y | T O | S H A K E S P E A R E ' S | S O N N E T S | A | S T U D Y | O F | T H E | P O E T R Y | O F | E R N E S T | D O W S O N | E T | C E T E R A | +O H | B A R T L E Y | D I D | Y O U | W R I T E | T O | M E | +I 
F | Y O U ' D | S E N T | M E | A | N O T E | O R | T E L E P H O N E D | M E | O R | A N Y T H I N G | +H E | B E N T | H I S | F A C E | O V E R | H E R | H A I R | +A L E X A N D E R | S L I P P E D | H I S | A R M | A B O U T | H E R | +O F | C O U R S E | I | K N O W | B A R T L E Y | S H E | S A I D | A T | L A S T | T H A T | A F T E R | T H I S | Y O U | W O N ' T | O W E | M E | T H E | L E A S T | C O N S I D E R A T I O N | B U T | W E | S A I L | O N | T U E S D A Y | +I | S A W | T H A T | I N T E R V I E W | I N | T H E | P A P E R | Y E S T E R D A Y | T E L L I N G | W H E R E | Y O U | W E R E | A N D | I | T H O U G H T | I | H A D | T O | S E E | Y O U | T H A T ' S | A L L | G O O D | N I G H T | I ' M | G O I N G | N O W | +I | D O N ' T | K N O W | W H A T | I | O U G H T | T O | S A Y | B U T | I | D O N ' T | B E L I E V E | Y O U ' D | B E | H A P P Y | T R U L Y | I | D O N ' T | A R E N ' T | Y O U | T R Y I N G | T O | F R I G H T E N | M E | +O N L Y | I ' L L | D O | I T | M O R E | C O M P L E T E L Y | +I ' V E | B E E N | U P | I N | C A N A D A | W I T H | M Y | B R I D G E | A N D | I | A R R A N G E D | N O T | T O | C O M E | T O | N E W | Y O R K | U N T I L | A F T E R | Y O U | H A D | G O N E | +T H E N | Y O U | D O N ' T | K N O W | W H A T | Y O U ' R E | T A L K I N G | A B O U T | +O V E R | T H E | F I R E P L A C E | T H E R E | W A S | A | L A R G E | O L D | F A S H I O N E D | G I L T | M I R R O R | +A L E X A N D E R | F L U S H E D | A N G R I L Y | +I | T H I N K | I | H A V E | F E L T | T H A T | Y O U | W E R E | C O M I N G | +H E | P A U S E D | T H E Y | N E V E R | D I D | T O | M E | +I | T O L D | M Y S E L F | T H A T | I F | I | W E R E | R E A L L Y | T H I N K I N G | O F | Y O U | A N D | N O T | O F | M Y S E L F | A | L E T T E R | W O U L D | B E | B E T T E R | T H A N | N O T H I N G | +T H E N | W H E N | Y O U R | M A N A G E R | A D D E D | T W O | M O R E | W E E K S | I | W A S | A L R E A D Y | C O M M I T T E D | +I ' M | G O I N G | T O | D O | W H A T | Y O U | A S K E D | M E | T O | D O | W H E N | Y O U | W E R E | I N | L O N D O N | +O N | T H E | L A S T | S A T U R D A Y | I N | A P R I L | T H E | N E W | Y O R K | T I M E S | P U B L I S H E D | A N | A C C O U N T | O F | T H E | S T R I K E | C O M P L I C A T I O N S | W H I C H | W E R E | D E L A Y I N G | A L E X A N D E R ' S | N E W | J E R S E Y | B R I D G E | A N D | S T A T E D | T H A T | T H E | E N G I N E E R | H I M S E L F | W A S | I N | T O W N | A N D | A T | H I S | O F F I C E | O N | W E S T | T E N T H | S T R E E T | +L E T | M E | T A K E | O F F | Y O U R | C O A T | A N D | Y O U R | B O O T S | T H E Y ' R E | O O Z I N G | W A T E R | +B U T | W H E N | I | C A M E | I | T H O U G H T | I | H A D | B E E N | M I S T A K E N | +Y E S | I | K N O W | V E R Y | W E L L | +H E | R O S E | A N D | C R O S S E D | T H E | R O O M | Q U I C K L Y | +A N D | I | S H E | W H I S P E R E D | I | F E L T | T H A T | Y O U | W E R E | F E E L I N G | T H A T | +I T | S E E M E D | A S | I F | H I S | F A M I L Y | T R O U B L E S | W E R E | J U S T | B E G I N N I N G | +H E | S A W | A | B U S Y | S A T U R D A Y | U S H E R E D | O U T | T H E | S A B B A T H | I N | A N D | N O T H I N G | D O N E | +H E | S A W | T H A T | I N | T H E | E X C I T E M E N T | O F | R E C E N T | E V E N T S | H E | H A D | N O T | F O R M U L A T E D | A | P L A N | U P O N | T H A T | S C O R E | +S O | H E R E | I T | W A S | S P R E A D | O 
U T | C L E A R | B E F O R E | H I M | A N D | N O W | H E | K N E W | W H A T | T O | E X P E C T | +Y O U | T A K E | T H I S | T O | T H I S | A D D R E S S | H E | S A I D | H A N D I N G | H I M | T H E | E N V E L O P E | A N D | G I V E | I T | T O | M I S S U S | H U R S T W O O D | Y E S | S I R | S A I D | T H E | B O Y | +D E A R | S I R | W E | B E G | T O | I N F O R M | Y O U | T H A T | W E | A R E | I N S T R U C T E D | T O | W A I T | U N T I L | T O | M O R R O W | T H U R S D A Y | A T | O N E | O ' C L O C K | B E F O R E | F I L I N G | S U I T | A G A I N S T | Y O U | O N | B E H A L F | O F | M I S S U S | J U L I A | H U R S T W O O D | F O R | D I V O R C E | A N D | A L I M O N Y | +H E | W A S | G E T T I N G | S O M E | V A G U E | C O M F O R T | O U T | O F | A | G O O D | C I G A R | B U T | I T | W A S | N O | P A N A C E A | F O R | T H E | I L L | W H I C H | A F F E C T E D | H I M | +A N Y | A N S W E R | I | G U E S S | N O T | +H E | W A S | Q U I T E | C E R T A I N | N O W | T H A T | S H E | K N E W | H E | W A S | M A R R I E D | A N D | W A S | A N G E R E D | A T | H I S | P E R F I D Y | +H E | F A N C I E D | A S | H E | S A T | A T | H I S | D E S K | T H A T | N O T H I N G | W O U L D | B E | D O N E | F O R | A | W E E K | O R | T W O | M E A N W H I L E | H E | W O U L D | H A V E | T I M E | T O | T H I N K | +O N | W E D N E S D A Y | H E | R E C E I V E D | A N O T H E R | P O L I T E | N O T E | F R O M | M C | G R E G O R | J A M E S | A N D | H A Y | I T | R E A D | +H E | S T A Y E D | A T | H I S | D E S K | L O N G | A F T E R | A L L | O T H E R S | H A D | G O N E | A N D | O N L Y | Q U I T T E D | I T | W H E N | T H E | N I G H T | W A T C H M A N | O N | H I S | R O U N D | P U L L E D | A T | T H E | F R O N T | D O O R | T O | S E E | I F | I T | W A S | S A F E L Y | L O C K E D | +H O W | A B O U T | T H A T | N O W | H I S | P A I N | A T | H E R | F A I L U R E | T O | M E E T | O R | W R I T E | H I M | R A P I D L Y | I N C R E A S E D | A S | H E | D E V O T E D | H I M S E L F | T O | T H I S | S U B J E C T | +V E R Y | T R U L Y | Y O U R S | E T | C E T E R A | C O M P R O M I S E | +H E | T R O U B L E D | O V E R | M A N Y | L I T T L E | D E T A I L S | A N D | T A L K E D | P E R F U N C T O R I L Y | T O | E V E R Y B O D Y | +H E | W O U L D | G O | T O | H E R | A N D | T E L L | H E R | A L L | H I S | F A M I L Y | C O M P L I C A T I O N S | +I F | H E | D I D N ' T | G O | A N D | S E E | T H E M | T H E Y | W O U L D | S U E | H I M | P R O M P T L Y | +H E | W A S | B E A T E N | F O R | T O | N I G H T | A N D | H E | M I G H T | J U S T | A S | W E L L | M A K E | T H E | B E S T | O F | I T | +H E | C O U L D | H A R D L Y | R E A L I S E | H O W | I T | H A D | A L L | C O M E | A B O U T | +H E | W O U L D | E X P L A I N | T O | H E R | J U S T | W H E R E | H E | S T O O D | A N D | H O W | M U C H | H E | N E E D E D | H E R | +T H R E E | O ' C L O C K | C A M E | F O U R | F I V E | S I X | A N D | N O | L E T T E R | +S H E | W O U L D | T A K E | T H E | E N V E L O P E | A N D | K N O W | T H A T | S H E | H A D | T R I U M P H E D | +I F | H E | O N L Y | H A D | T H A T | L E T T E R | B A C K | H E | W O U L D N ' T | S E N D | I T | +I N | A B O U T | A N | H O U R | A N D | T H R E E | Q U A R T E R S | T H E | B O Y | R E T U R N E D | +N O | L E T T E R | H A D | C O M E | N O | W O R D | O F | A N Y | K I N D | A N D | Y E T | H E R E | I T | W A S | L A T E | I N | T H E | E 
V E N I N G | A N D | S H E | H A D | A G R E E D | T O | M E E T | H I M | T H A T | M O R N I N G | +H E | D I D | N O T | G O | W I T H I N | A | B L O C K | O F | T H E | H O U S E | +A L L | D A Y | T H E | B A R | B E I N G | C L O S E D | H E | B R O O D E D | A L O N E | S H U T | O U T | F R O M | H O M E | F R O M | T H E | E X C I T E M E N T | O F | H I S | R E S O R T | F R O M | C A R R I E | A N D | W I T H O U T | T H E | A B I L I T Y | T O | A L T E R | H I S | C O N D I T I O N | O N E | I O T A | +I T | W A S | W I T H | G R E A T | O P P O S I T I O N | A F T E R | T W O | O R | T H R E E | H O U R S | O F | T H E | M O S T | U R G E N T | M E N T A L | A F F I R M A T I O N | A N D | D E N I A L | T H A T | A T | L A S T | H E | G O T | A N | E N V E L O P E | P L A C E D | I N | I T | T H E | R E Q U E S T E D | A M O U N T | A N D | S L O W L Y | S E A L E D | I T | U P | +T H E | H E L P L E S S | M A N A G E R | P A C E D | T H E | F L O O R | A N D | G R I M L Y | E N D U R E D | T H E | G L O O M | O F | D E F E A T | +W H E N | H U R S T W O O D | G O T | B A C K | T O | H I S | O F F I C E | A G A I N | H E | W A S | I N | A | G R E A T E R | Q U A N D A R Y | T H A N | E V E R | +A L L | T H E | T I M E | H I S | T H O U G H T S | W O U L D | R U N | O U T | T O | H I S | H O M E | A N D | S E E | T H E | S C E N E | B E I N G | T H E R E I N | E N A C T E D | +H E | H A D | L O V E D | H E R | E A R N E S T L Y | E N O U G H | B U T | N O W | T H A T | T H E | P O S S I B I L I T Y | O F | L O S I N G | H E R | S T A R E D | H I M | I N | T H E | F A C E | S H E | S E E M E D | M U C H | M O R E | A T T R A C T I V E | +H E | D I D | M A N A G E | T O | B R I N G | H I M S E L F | I N T O | T H E | M O O D | T O | G O | O U T | T O | C A R R I E | B U T | W H E N | H E | G O T | I N | O G D E N | P L A C E | H E | T H O U G H T | H E | S A W | A | M A N | W A T C H I N G | H I M | A N D | W E N T | A W A Y | +T H E | B O Y | H A S T E N E D | A W A Y | A N D | T H E | M A N A G E R | F E L L | T O | H I S | M U S I N G S | +F O R | R E L I E F | H E | A R O S E | A N D | J O I N E D | I N | C O N V E R S A T I O N | W I T H | A | F E W | F R I E N D S | W H O | W E R E | D R I N K I N G | +H E | D E C I D E D | T O | W R I T E | H E R | C A R E | O F | T H E | W E S T | S I D E | P O S T | O F F I C E | A N D | A S K | F O R | A N | E X P L A N A T I O N | A S | W E L L | A S | T O | H A V E | H E R | M E E T | H I M | +T H E N | H E | C A L L E D | H A R R Y | T H E | B O Y | O F | A L L | W O R K | A R O U N D | T H E | P L A C E | +B U T | H E | C O M P R O M I S E D | B Y | T E L L I N G | T H E | B O Y | T H A T | T H E R E | W O U L D | B E | N O | R E P L Y | +H E | W E N T | I N | A N D | E X A M I N E D | H I S | L E T T E R S | B U T | T H E R E | W A S | N O T H I N G | F R O M | C A R R I E | +H E | C O U L D | A R R A N G E | T H A T | S A T I S F A C T O R I L Y | F O R | C A R R I E | W O U L D | B E | G L A D | T O | W A I T | I F | N E C E S S A R Y | +T H E N | H E | R A N G | T H E | B E L L | N O | A N S W E R | +H U R S T W O O D | A L M O S T | E X C L A I M E D | O U T | L O U D | A T | T H E | I N S I S T E N C Y | O F | T H I S | T H I N G | +A T | O N E | T H I R T Y | H E | W E N T | T O | R E C T O R ' S | F O R | L U N C H | A N D | W H E N | H E | R E T U R N E D | A | M E S S E N G E R | W A S | W A I T I N G | F O R | H I M | +H E | W O U L D | G E T | O N E | T O | D A Y | I T | W O U L D | P R O B A B L Y | B E | O N | H I S 
| D E S K | W H E N | H E | G O T | B A C K | H E | W O U L D | L O O K | F O R | I T | A T | O N C E | +W H I L E | T H E | D A N G E R | H A D | N O T | L E S S E N E D | I T | H A D | N O T | A S | Y E T | M A T E R I A L I S E D | A N D | W I T H | H I M | N O | N E W S | W A S | G O O D | N E W S | +H E | W O U L D | S E E | H O W | T H I N G S | T U R N E D | O U T | T O | M O R R O W | A N D | T H E N | H E | W O U L D | T A L K | T O | H E R | T H E Y | W E R E | G O I N G | T O | M E E T | A S | U S U A L | +H E | K N E W | H E R | W E L L | E N O U G H | T O | K N O W | T H A T | W H E N | S H E | H A D | D E C I D E D | U P O N | A | P L A N | S H E | W O U L D | F O L L O W | I T | U P | +F O R | S O M E | R E A S O N | H E | F E L T | A S | I F | S O M E T H I N G | M I G H T | C O M E | T H A T | W A Y | A N D | W A S | R E L I E V E D | W H E N | A L L | T H E | E N V E L O P E S | H A D | B E E N | S C A N N E D | A N D | N O T H I N G | S U S P I C I O U S | N O T I C E D | +S O | L I T T L E | D I D | H E | C O N S I D E R | D R O U E T | T H A T | I T | N E V E R | O N C E | O C C U R R E D | T O | H I M | T O | W O R R Y | A B O U T | H I S | F I N D I N G | O U T | +H E | R A N G | A G A I N | T H I S | T I M E | H A R D E R | S T I L L | N O | A N S W E R | +H E | G R E W | R E S T L E S S | A S | H E | R U M I N A T E D | A N D | T H E N | D E C I D E D | T H A T | P E R H A P S | I T | W A S | N O T H I N G | +L A T E R | H O W E V E R | H I S | O L D | D I S C R E T I O N | A S S E R T E D | I T S E L F | +F O R T U N A T E L Y | T H E R E | W A S | N O T H I N G | F R O M | H I S | W I F E | E I T H E R | +T H E N | H E | S A T | D O W N | I N | H I S | C H A I R | A N D | G A Z E D | W I T H O U T | S E E I N G | C O N T E M P L A T I N G | T H E | R E S U L T | O F | H I S | W O R K | +H E | B E G A N | T O | W I S H | T H A T | H E | H A D | C O M P R O M I S E D | I N | S O M E | W A Y | O R | O T H E R | T H A T | H E | H A D | S E N T | T H E | M O N E Y | P E R H A P S | H E | C O U L D | D O | I T | U P | H E R E | +H E | P U T | O N | H I S | H A T | A N D | L O O K E D | A R O U N D | F O R | H I S | U M B R E L L A | +H E | W A S | I N | A | F E V E R E D | S T A T E | O F | M I N D | O W I N G | T O | T H E | B L I G H T | H I S | W I F E ' S | A C T I O N | T H R E A T E N E D | T O | C A S T | U P O N | H I S | E N T I R E | F U T U R E | +H O W | W O U L D | T H E | P A P E R S | T A L K | A B O U T | I T | +A F T E R | A | T I M E | H E | G A V E | U P | W A I T I N G | A N D | D R E A R I L Y | H E A D E D | F O R | T H E | M A D I S O N | C A R | +S O M E T H I N G | H A D | T O | B E | D O N E | A | C L I M A X | W A S | N E A R | A N D | S H E | W O U L D | N O T | S I T | I D L E | +S H E | H A D | N O T | B E E N | A B L E | T O | G E T | A W A Y | T H I S | M O R N I N G | +H E | A R O S E | F R O M | H I S | C H A I R | A N D | W E N T | A N D | L O O K E D | O U T | I N T O | T H E | S T R E E T | +H E | W O U L D | G O | I N | A N D | S E E | A N Y H O W | H E | W O U L D | H A V E | N O | R O W | +W H A T | W O U L D | S H E | D O | A B O U T | T H A T | T H E | C O N F O U N D E D | W R E T C H | +H I S | F I R S T | I M P U L S E | W A S | T O | W R I T E | B U T | F O U R | W O R D S | I N | R E P L Y | G O | T O | T H E | D E V I L | +T H E | L O N G | D R I Z Z L E | H A D | B E G U N | P E D E S T R I A N S | H A D | T U R N E D | U P | C O L L A R S | A N D | T R O U S E R S | A T | T H E | B O T T O M | +H E | W O U L D | H A V E 
| S O M E | A R R A N G E M E N T | O F | T H I S | T H I N G | +H U R S T W O O D | W A L K E D | T H E | F L O O R | M E N T A L L Y | A R R A N G I N G | T H E | C H I E F | P O I N T S | O F | H I S | S I T U A T I O N | +H E | W O U L D | H A V E | T O | P A Y | H E R | T H E | M O N E Y | W H I C H | S H E | W O U L D | N O W | R E G U L A R L Y | D E M A N D | O R | T H E R E | W O U L D | B E | T R O U B L E | I T | D I D | N O T | M A T T E R | W H A T | H E | D I D | +B Y | T H E | T I M E | H E | R E A C H E D | H I S | O W N | S T R E E T | H E | W A S | K E E N L Y | A L I V E | T O | T H E | D I F F I C U L T I E S | O F | H I S | S I T U A T I O N | A N D | W I S H E D | O V E R | A N D | O V E R | T H A T | S O M E | S O L U T I O N | W O U L D | O F F E R | I T S E L F | T H A T | H E | C O U L D | S E E | H I S | W A Y | O U T | +H E | A L S O | T H O U G H T | O F | H I S | M A N A G E R I A L | P O S I T I O N | +T H E | W A L L S | O F | T H E | R O O M S | W E R E | D I S C O R D A N T L Y | P A P E R E D | +Y O U | C O U L D | G E T | H O M E | E A S Y | T O O | I T | I S N ' T | V E R Y | F A R | +H I S | A M B I T I O N | W A S | S O M E | D A Y | T O | B U I L D | A | H O U S E | O N | T H E M | +H E | W A S | O F | A | C L E A N | S A V I N G | D I S P O S I T I O N | A N D | H A D | A L R E A D Y | P A I D | A | N U M B E R | O F | M O N T H L Y | I N S T A L M E N T S | O N | T W O | L O T S | F A R | O U T | O N | T H E | W E S T | S I D E | +M I N N I E ' S | F L A T | A S | T H E | O N E | F L O O R | R E S I D E N T | A P A R T M E N T S | W E R E | T H E N | B E I N G | C A L L E D | W A S | I N | A | P A R T | O F | W E S T | V A N | B U R E N | S T R E E T | I N H A B I T E D | B Y | F A M I L I E S | O F | L A B O U R E R S | A N D | C L E R K S | M E N | W H O | H A D | C O M E | A N D | W E R E | S T I L L | C O M I N G | W I T H | T H E | R U S H | O F | P O P U L A T I O N | P O U R I N G | I N | A T | T H E | R A T E | O F | F I F T Y | T H O U S A N D | A | Y E A R | +H E | S E E M E D | T O | B E | T H I N K I N G | O F | S O M E T H I N G | E L S E | +T H E S E | V A S T | B U I L D I N G S | W H A T | W E R E | T H E Y | +T O | C A R R I E | T H E | S O U N D | O F | T H E | L I T T L E | B E L L S | U P O N | T H E | H O R S E | C A R S | A S | T H E Y | T I N K L E D | I N | A N D | O U T | O F | H E A R I N G | W A S | A S | P L E A S I N G | A S | I T | W A S | N O V E L | +S H E | A S K E D | M I N N I E | F O R | I N K | A N D | P A P E R | W H I C H | W E R E | U P O N | T H E | M A N T E L | I N | T H E | D I N I N G | R O O M | A N D | W H E N | T H E | L A T T E R | H A D | G O N E | T O | B E D | A T | T E N | G O T | O U T | D R O U E T ' S | C A R D | A N D | W R O T E | H I M | +O N E | C O U L D | S E E | T H A T | H E | W A S | V E R Y | M U C H | W R A P P E D | U P | I N | H I S | O F F S P R I N G | +A | S H O P | G I R L | W A S | T H E | D E S T I N Y | P R E F I G U R E D | F O R | T H E | N E W C O M E R | +T H E | F L O O R S | W E R E | C O V E R E D | W I T H | M A T T I N G | A N D | T H E | H A L L | L A I D | W I T H | A | T H I N | R A G | C A R P E T | +S H E | H A D | S O M E | S L I G H T | G I F T | O F | O B S E R V A T I O N | A N D | T H A T | S E N S E | S O | R I C H | I N | E V E R Y | W O M A N | I N T U I T I O N | +M I N N I E | B E G A N | T O | E X P L A I N | B U T | H E R | H U S B A N D | T O O K | T H I S | P A R T | O F | T H E | C O N V E R S A T I O N | T O | H I M S E L F | +N O W | N O W | H 
E | S A I D | W A L K I N G | T H E R E | T H E R E | A N D | T H E R E | W A S | A | C E R T A I N | S W E D I S H | A C C E N T | N O T I C E A B L E | I N | H I S | V O I C E | +I T | G A V E | A N | I M P O S I N G | A P P E A R A N C E | T O | M O S T | O F | T H E | W H O L E S A L E | H O U S E S | W H O S E | O F F I C E S | W E R E | U P O N | T H E | G R O U N D | F L O O R | A N D | I N | P L A I N | V I E W | O F | T H E | S T R E E T | +S H E | W A N T E D | T O | M A K E | S O M E | R E F E R E N C E | T O | T H E I R | R E L A T I O N S | U P O N | T H E | T R A I N | B U T | W A S | T O O | T I M I D | +A N Y T H I N G | W A S | G O O D | E N O U G H | S O | L O N G | A S | I T | P A I D | S A Y | F I V E | D O L L A R S | A | W E E K | T O | B E G I N | W I T H | +T H E N | S H E | W A L K E D | A N D | S A N G | T O | I T | U N T I L | H A N S O N | D I S T U R B E D | I N | H I S | R E A D I N G | C A M E | A N D | T O O K | I T | +I T | W A S | U N D E R | S U C H | A U S P I C I O U S | C I R C U M S T A N C E S | T H A T | S H E | S T A R T E D | O U T | T H I S | M O R N I N G | T O | L O O K | F O R | W O R K | +N A R R O W | B O A R D | W A L K S | E X T E N D E D | O U T | P A S S I N G | H E R E | A | H O U S E | A N D | T H E R E | A | S T O R E | A T | F A R | I N T E R V A L S | E V E N T U A L L Y | E N D I N G | O N | T H E | O P E N | P R A I R I E | +T O | H I M | T H E | P R E S E N C E | O R | A B S E N C E | O F | H I S | W I F E ' S | S I S T E R | W A S | A | M A T T E R | O F | I N D I F F E R E N C E | +O H | W H A T | S H A L L | W E | D O | F O R | A | H O M E | +Y O U | S A Y | Y O U | K N O W | A L L | A B O U T | I T | T H E N | G O | O N | A N D | F I N I S H | Y O U R | N E S T S | B Y | Y O U R S E L V E S | +S H E | W A S | I N D E E D | A | C L E V E R | B I R D | +A N D | S H E | S A W | T H E | O T H E R | B I R D S | H O P P I N G | A B O U T | A N D | T W I T T E R I N G | H E L P L E S S L Y | +M U C H | L U C K | M A Y | Y O U | H A V E | +C E R T A I N L Y | O F | C O U R S E | S C R E A M E D | T H E | J A C K D A W | +A N D | A W A Y | S H E | F L E W | T O | H E R | O W N | C O S Y | N E S T | I N | T H E | E L M | T R E E | W H E R E | S H E | W A S | S O O N | F A S T | A S L E E P | F O R G E T T I N G | A L L | A B O U T | T H E | M A T T E R | +A N D | S O M E | O F | T H E | B I R D S | W H O | W E R E | A T T E N T I V E | A N D | C A R E F U L | S O O N | S A W | H O W | I T | W A S | D O N E | A N D | S T A R T E D | N I C E | H O M E S | F O R | T H E M S E L V E S | +I N D E E D | I T | I S | N O T | A | N E S T | A T | A L L | O N L Y | T H E | B E G I N N I N G | O F | O N E | +B U T | T H E | W O O D | P I G E O N | W A S | I N | T H E | W O R S T | C A S E | O F | T H E M | A L L | +A N D | W H E R E | E A C H | B I R D | P E R C H E D | T H E R E | I T | W A S | T O | B U I L D | I T S | N E S T | +A N D | T H E | P O O R | S I L L Y | T H I N G S | R U F F L E D | U P | T H E I R | F E A T H E R S | A N D | L O O K E D | M I S E R A B L E | A S | O N L Y | A | L I T T L E | B I R D | C A N | L O O K | W H E N | I T | I S | U N H A P P Y | +C R I S S | C R O S S | C R I S S | C R O S S | S O | I N T E R R U P T E D | T H E | W O O D | P I G E O N | +I | T H O U G H T | T H A T | W A S | T H E | W A Y | T O | B E G I N | +A N D | T H E R E | I S | A N | O L D | S T O R Y | A B O U T | T H I S | W H I C H | I | S H A L L | T E L L | Y O U | +T H E | M A G P I E | S A I D | S H E | W O U L D | T E A C H | T H E M | I 
F | T H E Y | W O U L D | B E | A | P A T I E N T | D I L I G E N T | O B E D I E N T | C L A S S | O F | L I T T L E | B I R D S | +S O M E | A R E | W O N D E R F U L L Y | W R O U G H T | P R E T T Y | L I T T L E | H O M E S | F O R | B I R D I K I N S | +F O R | S H E | H A D | O N L Y | T H E | F O U N D A T I O N | L A I D | C R I S S | C R O S S | A S | T H E | M A G P I E | H A D | S H O W N | H E R | +T H E N | A L L | T H E | O T H E R | B I R D S | C H I R P E D | E A G E R L Y | Y E S | Y E S | L E T | U S | A S K | H E R | T O | T E A C H | U S | +H E R E | W O O D | P I G E O N | S A I D | M O T H E R | M A G P I E | Y O U | M U S T | P L A C E | T H O S E | S T I C K S | T H R O U G H | A N D | A C R O S S | C R I S S | C R O S S | C R I S S | C R O S S | S O | +O | W I S E | M O T H E R | M A G P I E | D E A R | M O T H E R | M A G P I E | T H E Y | C R I E D | T E A C H | U S | H O W | T O | B U I L D | O U R | N E S T S | L I K E | Y O U R S | F O R | I T | I S | G R O W I N G | N I G H T | A N D | W E | A R E | T I R E D | A N D | S L E E P Y | +S H E | B E G A N | T O | S H O W | T H E M | H O W | T O | W E A V E | T H E | B I T S | O F | T H I N G S | T O G E T H E R | I N T O | N E S T S | A S | T H E Y | S H O U L D | B E | M A D E | +S H E | P O P P E D | I N T O | H E R | N E W | H O U S E | A N D | S A T | T H E R E | C O M F O R T A B L Y | P E E R I N G | O U T | T H R O U G H | T H E | W I N D O W | S L I T S | W I T H | H E R | S H A R P | L I T T L E | E Y E S | +S O | I N | A | G R E A T | C O M P A N Y | T H E Y | C A M E | F L U T T E R I N G | H O P P I N G | T W I T T E R I N G | U P | T O | T H E | E L M | T R E E | W H E R E | M O T H E R | M A G P I E | N E S T L E D | C O M F O R T A B L Y | I N | H E R | N E W | H O U S E | +I T | W A S | I N D E E D | D A N C I N G | O N | A | V O L C A N O | +H E | M I G H T | B E | E N C H A N T E D | B U T | T H A T | W A S | T H E | T A L I S M A N | +H O W | M A N Y | I N C I D E N T S | H O W | M A N Y | C H A R A C T E R S | H O W | M A N Y | F E E L I N G S | F L I T T E D | O V E R | H I S | M E M O R Y | O F | W H A T | S W E E T | A N D | B I T T E R | E X P E R I E N C E | D I D | H E | N O T | C H E W | T H E | C U D | +I T | W A S | I N D E E D | H E R | H A N D W R I T I N G | +F O U R | A N D | T W E N T Y | H O U R S | A G O | A N D | H E | D E E M E D | H I M S E L F | T H E | M O S T | M I S E R A B L E | A N D | F O R L O R N | O F | H U M A N | B E I N G S | A N D | N O W | A L L | T H E | B L E S S I N G S | O F | T H E | W O R L D | S E E M E D | S H O W E R E D | A T | H I S | F E E T | +T H E | M O S T | G I F T E D | I N D I V I D U A L S | I N | T H E | L A N D | E M U L A T E D | E A C H | O T H E R | I N | P R O V I N G | W H I C H | E N T E R T A I N E D | F O R | H I M | T H E | M O S T | S I N C E R E | A F F E C T I O N | +A N D | N O W | A L L | H A D | E N D E D | S O | H A P P I L Y | +A S | F O R | H I S | F R I E N D S | T H E | F U T U R E | M U S T | P R O V E | H I S | G R A T I T U D E | T O | T H E M | +I N | E X A C T L Y | T E N | M I N U T E S | I T | I S | I N | T H E | P O W E R | O F | E V E R Y | M A N | T O | F R E E | H I M S E L F | F R O M | A L L | T H E | T U M U L T | O F | T H E | W O R L D | T H E | P A N G S | O F | L O V E | T H E | T H R O B S | O F | A M B I T I O N | T H E | W E A R | A N D | T E A R | O F | P L A Y | T H E | R E C R I M I N A T I N G | B O U D O I R | T H E | C O N S P I R I N G | C L U B | T H E | R A T T L I N G | H E L L | A N D | 
F I N D | H I M S E L F | I N | A | S U B L I M E | S Y L V A N | S O L I T U D E | S U P E R I O R | T O | T H E | C E D A R S | O F | L E B A N O N | A N D | I N F E R I O R | O N L Y | I N | E X T E N T | T O | T H E | C H E S T N U T | F O R E S T S | O F | A N A T O L I A | +T H I S | V I O L E N T | A N D | T R I U M P H A N T | R E V O L U T I O N | I N | H I S | P R O S P E C T S | A N D | H I S | F O R T U N E S | W A S | H A R D L Y | Y E T | C O M P L E T E L Y | C O M P R E H E N D E D | B Y | O U R | F R I E N D | F E R D I N A N D | A R M I N E | A N D | W H E N | H E | H A D | L E F T | A | N O T E | F O R | T H E | G E N E R O U S | M I R A B E L | W H O S E | S L U M B E R S | H E | W O U L D | N O T | D I S T U R B | A T | T H I S | E A R L Y | H O U R | E V E N | W I T H | G O O D | N E W S | H E | S T R O L L E D | A L O N G | U P | C H A R L E S | S T R E E T | A N D | T O | T H E | P A R K | I N | O N E | O F | T H O S E | W I L D | A N D | J O Y O U S | R E V E R I E S | I N | W H I C H | W E | B R O O D | O V E R | C O M I N G | B L I S S | A N D | C R E A T E | A | T H O U S A N D | G L O R I O U S | C O N S E Q U E N C E S | +I T | R E Q U I R E S | S O M E | S E L F | C O M M U N I O N | T O | P R E P A R E | O U R S E L V E S | F O R | G O O D | F O R T U N E | A S | W E L L | A S | T O | E N C O U N T E R | D I F F I C U L T Y | A N D | D A N G E R | A N D | D I S G R A C E | +H I S | C O N S T A N C Y | T O | H E R | W A S | N O W | R E W A R D E D | +F E R D I N A N D | F E L T | H I S | F R E E D O M | A S | W E L L | A S | H I S | H A P P I N E S S | +F E R D I N A N D | M E D I T A T E S | O V E R | H I S | G O O D | F O R T U N E | +I N | M O M E N T S | O F | D E E P | F E E L I N G | A L I K E | S U D D E N | B U R S T S | O F | P R O S P E R I T Y | A S | I N | D A R K E R | H O U R S | M A N | M U S T | B E | A L O N E | +I N | T H E | P R E S E N T | U N S E T T L E D | T H O U G H | H O P E F U L | S T A T E | O F | A F F A I R S | F E R D I N A N D | W O U L D | N O T | G O | H O M E | +W A S | I T | N O T | A L L | A | D R E A M | O F | H I S | O W N | C R E A T I O N | W H I L E | H I S | E Y E | H A D | B E E N | F I X E D | I N | A B S T R A C T I O N | O N | T H A T | B R I G H T | A N D | F L O W I N G | R I V E R | +R E S T L E S S | W I T H | I M P E N D I N G | J O Y | H E | S A U N T E R E D | T O | T H E | B R I D G E | A N D | L E A N T | O V E R | T H E | B A L U S T R A D E | G A Z I N G | O N | T H E | W A T E R S | I N | C H A R M E D | A N D | C H A R M I N G | V A C A N C Y | +I S | P A P A | A L O N E | E N Q U I R E D | M I S S | T E M P L E | +H E | C O U L D | N O T | F L A T T E R | H I M S E L F | T H A T | H E | I N D E E D | M E R I T E D | S U C H | S I N G U L A R | B L E S S I N G S | A N D | Y E T | W I T H | A L L | H I S | F A U L T S | W H I C H | W I T H | H I M | W E R E | B U T | T H E | C O N S E Q U E N C E S | O F | H I S | F I E R Y | Y O U T H | F E R D I N A N D | H A D | B E E N | F A I T H F U L | T O | H E N R I E T T A | +T I M E | W O R E | A W A Y | A N D | O N | T H E | N I N T H | O F | A P R I L | E I G H T E E N | S I X T Y | F I V E | G R A N T | C A P T U R E D | T H E | C O N F E D E R A T E | A R M Y | U N D E R | L E E | T H U S | V I R T U A L L Y | E N D I N G | T H E | W A R | +I N D E E D | I F | E V E R | A | G E N E R A L | D E S E R V E D | H O N O R | G R A N T | H A D | W O N | I T | H E | H A D | O P E N E D | T H E | M I S S I S S I P P I | T O | N A V I G A T I O N | A N D | H A D 
| C A P T U R E D | N E A R L Y | O N E | H U N D R E D | T H O U S A N D | P R I S O N E R S | A N D | A R M S | +H E | W A S | N O W | C O M M A N D E R | O F | A L L | T H E | F E D E R A L | F O R C E S | +W H E N | H I S | P U B L I C | S E R V I C E S | W E R E | F I N I S H E D | H E | S T A R T E D | I N | C O M P A N Y | W I T H | H I S | W I F E | S O N | J E S S E | A N D | A | F E W | F R I E N D S | +H I S | S U C C E S S | S E E M S | T O | H A V E | B E E N | T H E | O U T G R O W T H | O F | H A R D | S T U D Y | A N D | A B I L I T Y | T O | P E R F O R M | T H E | M O S T | E X H A U S T I V E | L A B O R | W I T H O U T | F A T I G U E | +T H E | C A P T U R E | O F | L E E | W A S | A | F A R | M O R E | D I F F I C U L T | U N D E R T A K I N G | +T H R O U G H | T H E | I N F L U E N C E | O F | H O N | T H O M A S | L | H A M E R | H E | W A S | A D M I T T E D | A T | W E S T | P O I N T | I N | E I G H T E E N | T H I R T Y | N I N E | +A T | T H I S | T I M E | G R A N T | W A S | N O T | T A K E N | W I T H | W A R | A N D | P R O B A B L Y | E V I N C E D | L I T T L E | I N T E R E S T | I N | A R M Y | T A C T I C S | +G R A N T | A C T E D | A S | M U S T E R I N G | O F F I C E R | U N T I L | B E I N G | C O M M I S S I O N E D | C O L O N E L | O F | T H E | T W E N T Y | F I R S T | I L L I N O I S | V O L U N T E E R S | H E | T O O K | T H E | F I E L D | +G E N E R A L | H A L L E C K | I N | S P E A K I N G | O F | T H I S | B A T T L E | S A I D | +T H E Y | D I D | N O T | B R E A T H E | I T | I N T O | T H E I R | M O U T H S | O R | T H R O U G H | G I L L S | B U T | T O O K | I T | I N | T H R O U G H | S O M E | O P E N I N G S | I N | T H E | B A C K | P A R T | O F | T H E I R | B O D I E S | +T H E Y | K N E W | T H A T | W H E N E V E R | T H E Y | S T U C K | O U T | T H E I R | L O W E R | L I P S | A T | T H E | S M A L L | F I S H E S | A N D | B U G S | T H E Y | S W A M | A W A Y | A S | F A S T | A S | T H E Y | C O U L D | +A | P E R S O N | W O U L D | T H I N K | T H A T | A F T E R | A | F A M I L Y | H A D | L I V E D | S O | L O N G | I N | A | P L A C E | A L L | T H E | N E I G H B O R S | W O U L D | B E | F O N D | O F | T H E M | Y E T | I T | I S | N O T | S O | +T H E Y | A L W A Y S | A T E | P L A I N | F O O D | A N D | P L E N T Y | O F | I T | A N D | T H E Y | N E V E R | A T E | B E T W E E N | M E A L S | +S U R E | E N O U G H | T H E R E | H E | C A M E | T H R O U G H | T H E | S H A L L O W | W A T E R | H I S | W E T | B A C K | S H E L L | P A R T L Y | O U T | O F | I T | A N D | S H I N I N G | I N | T H E | S U N L I G H T | +Y O U | W O U L D | T H I N K | T H A T | W I T H | S I X | L E G S | A P I E C E | A N D | T H R E E | J O I N T S | I N | E A C H | L E G | T H E Y | M I G H T | W A L K | Q U I T E | F A S T | Y E T | T H E Y | N E V E R | D I D | +H E | B E G A N | T O | D R A W | I N | H I S | L E G S | V E R Y | V E R Y | S L O W L Y | A N D | J U S T | A S | H I S | G R E A T | H A R D | L O W E R | S H E L L | T O U C H E D | T H E | M U D | T H E | L A S T | L A R V A | C R A W L E D | O U T | U N D E R | H I S | T A I L | +B U T | S O M E T I M E S | H E | S T R A I G H T E N S | T H E | J O I N T | A N D | H O L D S | H I S | L I P | O U T | B E F O R E | H I M | A N D | T H E N | I T S | P I N C E R S | C A T C H | H O L D | O F | T H I N G S | H E | D O E S | T H I S | W H E N | H E | I S | H U N G R Y | +I T | I S | D I S G R A C E F U L | +T H E | N Y M P H S | H A D | A L R E A D Y 
| G O T T E N | A W A Y | +O U R | U P P E R | L I P S | A R E | S O | S M A L L | T H E Y | D O N ' T | M A T T E R | +T H E Y | T H O U G H T | T H E | T R O U B L E | C A M E | F R O M | B A D | B R I N G I N G | U P | O R | N O | B R I N G I N G | U P | A T | A L L | +B O T H | L I P S | A S K E D | T H E | L A R V A E | +S C A R E D | D A H | W H O ' S | A F R A I D | A N S W E R E D | H E | +T H E Y | T H O U G H T | H E | M I G H T | B E | G O I N G | T O | T A K E | A | N A P | A F T E R | H I S | D I N N E R | +H E R E | C O M E S | T H E | S N A P P I N G | T U R T L E | +I N D E E D | T H E | L O W E R | L I P | O F | A | D R A G O N | F L Y | C H I L D | M I G H T | W E L L | F R I G H T E N | P E O P L E | F O R | I T | I S | F A S T E N E D | O N | A | L O N G | J O I N T E D | A R M | L I K E | T H I N G | A N D | H A S | P I N C E R S | O N | I T | W I T H | W H I C H | I T | C A T C H E S | A N D | H O L D S | I T S | F O O D | +W E L L | O U R | L O W E R | L I P S | A N Y W A Y | A N S W E R E D | T H E | N Y M P H | +O N | T H I S | A C C O U N T | T H E | P E O P L E | O F | O N E | N A T I O N | U N D E R S T A N D | O N E | A N O T H E R | B E T T E R | T H A N | T H O S E | B E L O N G I N G | T O | D I F F E R E N T | N A T I O N S | E V E N | W H E N | T H E Y | U S E | T H E | S A M E | L A N G U A G E | O R | R A T H E R | W H E N | P E O P L E | H A V E | L I V E D | L O N G | T O G E T H E R | U N D E R | S I M I L A R | C O N D I T I O N S | O F | C L I M A T E | S O I L | D A N G E R | R E Q U I R E M E N T | T O I L | T H E R E | O R I G I N A T E S | T H E R E F R O M | A N | E N T I T Y | T H A T | U N D E R S T A N D S | I T S E L F | N A M E L Y | A | N A T I O N | +E V E R Y W H E R E | T H A T | S L A V E | M O R A L I T Y | G A I N S | T H E | A S C E N D A N C Y | L A N G U A G E | S H O W S | A | T E N D E N C Y | T O | A P P R O X I M A T E | T H E | S I G N I F I C A T I O N S | O F | T H E | W O R D S | G O O D | A N D | S T U P I D | +D A N G E R | I S | A G A I N | P R E S E N T | T H E | M O T H E R | O F | M O R A L I T Y | G R E A T | D A N G E R | T H I S | T I M E | S H I F T E D | I N T O | T H E | I N D I V I D U A L | I N T O | T H E | N E I G H B O U R | A N D | F R I E N D | I N T O | T H E | S T R E E T | I N T O | T H E I R | O W N | C H I L D | I N T O | T H E I R | O W N | H E A R T | I N T O | A L L | T H E | M O S T | P E R S O N A L | A N D | S E C R E T | R E C E S S E S | O F | T H E I R | D E S I R E S | A N D | V O L I T I O N S | +W E | T R U T H F U L | O N E S | T H E | N O B I L I T Y | I N | A N C I E N T | G R E E C E | C A L L E D | T H E M S E L V E S | +P R O B A B L Y | A | P E S S I M I S T I C | S U S P I C I O N | W I T H | R E G A R D | T O | T H E | E N T I R E | S I T U A T I O N | O F | M A N | W I L L | F I N D | E X P R E S S I O N | P E R H A P S | A | C O N D E M N A T I O N | O F | M A N | T O G E T H E R | W I T H | H I S | S I T U A T I O N | +V A R I A T I O N S | W H E T H E R | T H E Y | B E | D E V I A T I O N S | I N T O | T H E | H I G H E R | F I N E R | A N D | R A R E R | O R | D E T E R I O R A T I O N S | A N D | M O N S T R O S I T I E S | A P P E A R | S U D D E N L Y | O N | T H E | S C E N E | I N | T H E | G R E A T E S T | E X U B E R A N C E | A N D | S P L E N D O U R | T H E | I N D I V I D U A L | D A R E S | T O | B E | I N D I V I D U A L | A N D | D E T A C H | H I M S E L F | +I N | F A C T | C O N F O R M A B L Y | T O | T H E | S L O W | R I S E | O F | T H E | D E M O C R 
A T I C | S O C I A L | O R D E R | A N D | I T S | C A U S E | T H E | B L E N D I N G | O F | T H E | B L O O D | O F | M A S T E R S | A N D | S L A V E S | T H E | O R I G I N A L L Y | N O B L E | A N D | R A R E | I M P U L S E | O F | T H E | M A S T E R S | T O | A S S I G N | A | V A L U E | T O | T H E M S E L V E S | A N D | T O | T H I N K | W E L L | O F | T H E M S E L V E S | W I L L | N O W | B E | M O R E | A N D | M O R E | E N C O U R A G E D | A N D | E X T E N D E D | B U T | I T | H A S | A T | A L L | T I M E S | A N | O L D E R | A M P L E R | A N D | M O R E | R A D I C A L L Y | I N G R A I N E D | P R O P E N S I T Y | O P P O S E D | T O | I T | A N D | I N | T H E | P H E N O M E N O N | O F | V A N I T Y | T H I S | O L D E R | P R O P E N S I T Y | O V E R M A S T E R S | T H E | Y O U N G E R | +O C C A S I O N A L L Y | T O O | T H E | W A K I N G | C A L L | C O M E S | T O O | L A T E | T H E | C H A N C E | W H I C H | G I V E S | P E R M I S S I O N | T O | T A K E | A C T I O N | W H E N | T H E I R | B E S T | Y O U T H | A N D | S T R E N G T H | F O R | A C T I O N | H A V E | B E E N | U S E D | U P | I N | S I T T I N G | S T I L L | A N D | H O W | M A N Y | A | O N E | J U S T | A S | H E | S P R A N G | U P | H A S | F O U N D | W I T H | H O R R O R | T H A T | H I S | L I M B S | A R E | B E N U M B E D | A N D | H I S | S P I R I T S | A R E | N O W | T O O | H E A V Y | +W H A T | W I L L | T H E | M O R A L | P H I L O S O P H E R S | W H O | A P P E A R | A T | T H I S | T I M E | H A V E | T O | P R E A C H | +I T | I S | O B V I O U S | T H A T | E V E R Y W H E R E | T H E | D E S I G N A T I O N S | O F | M O R A L | V A L U E | W E R E | A T | F I R S T | A P P L I E D | T O | M E N | A N D | W E R E | O N L Y | D E R I V A T I V E L Y | A N D | A T | A | L A T E R | P E R I O D | A P P L I E D | T O | A C T I O N S | I T | I S | A | G R O S S | M I S T A K E | T H E R E F O R E | W H E N | H I S T O R I A N S | O F | M O R A L S | S T A R T | W I T H | Q U E S T I O N S | L I K E | W H Y | H A V E | S Y M P A T H E T I C | A C T I O N S | B E E N | P R A I S E D | +W H I C H E V E R | G R O U P S | O F | S E N S A T I O N S | W I T H I N | A | S O U L | A W A K E N | M O S T | R E A D I L Y | B E G I N | T O | S P E A K | A N D | G I V E | T H E | W O R D | O F | C O M M A N D | T H E S E | D E C I D E | A S | T O | T H E | G E N E R A L | O R D E R | O F | R A N K | O F | I T S | V A L U E S | A N D | D E T E R M I N E | U L T I M A T E L Y | I T S | L I S T | O F | D E S I R A B L E | T H I N G S | +H E | H O N O U R S | W H A T E V E R | H E | R E C O G N I Z E S | I N | H I M S E L F | S U C H | M O R A L I T Y | E Q U A L S | S E L F | G L O R I F I C A T I O N | +A N D | T O | C H O O S E | F O R | C O M P A N Y | T H A T | R O G U I S H | A N D | C H E E R F U L | V I C E | P O L I T E N E S S | +A C C O R D I N G | T O | S L A V E | M O R A L I T Y | T H E R E F O R E | T H E | E V I L | M A N | A R O U S E S | F E A R | A C C O R D I N G | T O | M A S T E R | M O R A L I T Y | I T | I S | P R E C I S E L Y | T H E | G O O D | M A N | W H O | A R O U S E S | F E A R | A N D | S E E K S | T O | A R O U S E | I T | W H I L E | T H E | B A D | M A N | I S | R E G A R D E D | A S | T H E | D E S P I C A B L E | B E I N G | +T H E | D I S T I N C T I O N S | O F | M O R A L | V A L U E S | H A V E | E I T H E R | O R I G I N A T E D | I N | A | R U L I N G | C A S T E | P L E A S A N T L Y | C O N S C I O U S | O F | B E I N G | 
D I F F E R E N T | F R O M | T H E | R U L E D | O R | A M O N G | T H E | R U L E D | C L A S S | T H E | S L A V E S | A N D | D E P E N D E N T S | O F | A L L | S O R T S | +T H E | H I G H E S T | I N S T I N C T | F O R | P U R I T Y | P L A C E S | H I M | W H O | I S | A F F E C T E D | W I T H | I T | I N | T H E | M O S T | E X T R A O R D I N A R Y | A N D | D A N G E R O U S | I S O L A T I O N | A S | A | S A I N T | F O R | I T | I S | J U S T | H O L I N E S S | T H E | H I G H E S T | S P I R I T U A L I Z A T I O N | O F | T H E | I N S T I N C T | I N | Q U E S T I O N | +T H I S | I S | T H E | P R O B L E M | O F | R A C E | +N O T H I N G | B U T | N E W | W H Y S | N O T H I N G | B U T | N E W | H O W S | N O | C O M M O N | F O R M U L A S | A N Y | L O N G E R | M I S U N D E R S T A N D I N G | A N D | D I S R E G A R D | I N | L E A G U E | W I T H | E A C H | O T H E R | D E C A Y | D E T E R I O R A T I O N | A N D | T H E | L O F T I E S T | D E S I R E S | F R I G H T F U L L Y | E N T A N G L E D | T H E | G E N I U S | O F | T H E | R A C E | O V E R F L O W I N G | F R O M | A L L | T H E | C O R N U C O P I A S | O F | G O O D | A N D | B A D | A | P O R T E N T O U S | S I M U L T A N E O U S N E S S | O F | S P R I N G | A N D | A U T U M N | F U L L | O F | N E W | C H A R M S | A N D | M Y S T E R I E S | P E C U L I A R | T O | T H E | F R E S H | S T I L L | I N E X H A U S T E D | S T I L L | U N W E A R I E D | C O R R U P T I O N | +E V E R Y | E L E V A T I O N | O F | T H E | T Y P E | M A N | H A S | H I T H E R T O | B E E N | T H E | W O R K | O F | A N | A R I S T O C R A T I C | S O C I E T Y | A N D | S O | I T | W I L L | A L W A Y S | B E | A | S O C I E T Y | B E L I E V I N G | I N | A | L O N G | S C A L E | O F | G R A D A T I O N S | O F | R A N K | A N D | D I F F E R E N C E S | O F | W O R T H | A M O N G | H U M A N | B E I N G S | A N D | R E Q U I R I N G | S L A V E R Y | I N | S O M E | F O R M | O R | O T H E R | +H E R E | I S | T H E | S E A T | O F | T H E | O R I G I N | O F | T H E | F A M O U S | A N T I T H E S I S | G O O D | A N D | E V I L | P O W E R | A N D | D A N G E R O U S N E S S | A R E | A S S U M E D | T O | R E S I D E | I N | T H E | E V I L | A | C E R T A I N | D R E A D F U L N E S S | S U B T L E T Y | A N D | S T R E N G T H | W H I C H | D O | N O T | A D M I T | O F | B E I N G | D E S P I S E D | +A N D | W H O E V E R | T H O U | A R T | W H A T | I S | I T | T H A T | N O W | P L E A S E S | T H E E | +T H E | G R E A T E R | T H E | D A N G E R | T H E | G R E A T E R | I S | T H E | N E E D | O F | A G R E E I N G | Q U I C K L Y | A N D | R E A D I L Y | A B O U T | W H A T | I S | N E C E S S A R Y | N O T | T O | M I S U N D E R S T A N D | O N E | A N O T H E R | I N | D A N G E R | T H A T | I S | W H A T | C A N N O T | A T | A L L | B E | D I S P E N S E D | W I T H | I N | I N T E R C O U R S E | +I | D O | N O T | K N O W | H E | S A I D | H E S I T A T I N G L Y | P E R H A P S | T H E | H A R P I E S | H A V E | F L O W N | O V E R | M Y | T A B L E | +I N | O U R | V E R Y | D E M O C R A T I C | O R | R A T H E R | V E R Y | P L E B E I A N | A G E | E D U C A T I O N | A N D | C U L T U R E | M U S T | B E | E S S E N T I A L L Y | T H E | A R T | O F | D E C E I V I N G | D E C E I V I N G | W I T H | R E G A R D | T O | O R I G I N | W I T H | R E G A R D | T O | T H E | I N H E R I T E D | P L E B E I A N I S M | I N | B O D Y | A N D | S O U L | +O R | H E | W I L L | E 
V E N | S A Y | F O R | M A N Y | R E A S O N S | I | C A N | D E L I G H T | I N | T H E | G O O D | O P I N I O N | O F | O T H E R S | P E R H A P S | B E C A U S E | I | L O V E | A N D | H O N O U R | T H E M | A N D | R E J O I C E | I N | A L L | T H E I R | J O Y S | P E R H A P S | A L S O | B E C A U S E | T H E I R | G O O D | O P I N I O N | E N D O R S E S | A N D | S T R E N G T H E N S | M Y | B E L I E F | I N | M Y | O W N | G O O D | O P I N I O N | P E R H A P S | B E C A U S E | T H E | G O O D | O P I N I O N | O F | O T H E R S | E V E N | I N | C A S E S | W H E R E | I | D O | N O T | S H A R E | I T | I S | U S E F U L | T O | M E | O R | G I V E S | P R O M I S E | O F | U S E F U L N E S S | A L L | T H I S | H O W E V E R | I S | N O T | V A N I T Y | +A | M A N ' S | E S T I M A T E S | O F | V A L U E | B E T R A Y | S O M E T H I N G | O F | T H E | S T R U C T U R E | O F | H I S | S O U L | A N D | W H E R E I N | I T | S E E S | I T S | C O N D I T I O N S | O F | L I F E | I T S | I N T R I N S I C | N E E D S | +O N L Y | N A M E | I T | W H A T E V E R | I | H A V E | I | O F F E R | T H E E | +A L S O | I N | A L L | L O V E S | A N D | F R I E N D S H I P S | O N E | H A S | T H E | E X P E R I E N C E | T H A T | N O T H I N G | O F | T H E | K I N D | C O N T I N U E S | W H E N | T H E | D I S C O V E R Y | H A S | B E E N | M A D E | T H A T | I N | U S I N G | T H E | S A M E | W O R D S | O N E | O F | T H E | T W O | P A R T I E S | H A S | F E E L I N G S | T H O U G H T S | I N T U I T I O N S | W I S H E S | O R | F E A R S | D I F F E R E N T | F R O M | T H O S E | O F | T H E | O T H E R | +T O | S U F F O C A T E | W I T H | H I S | M E M O R I E S | T O | H I M | W H O | H A S | T H E | D E S I R E S | O F | A | L O F T Y | A N D | D A I N T Y | S O U L | A N D | O N L Y | S E L D O M | F I N D S | H I S | T A B L E | L A I D | A N D | H I S | F O O D | P R E P A R E D | T H E | D A N G E R | W I L L | A L W A Y S | B E | G R E A T | N O W A D A Y S | H O W E V E R | I T | I S | E X T R A O R D I N A R I L Y | S O | +T H E | N O B L E | S O U L | A C C E P T S | T H E | F A C T | O F | H I S | E G O I S M | W I T H O U T | Q U E S T I O N | A N D | A L S O | W I T H O U T | C O N S C I O U S N E S S | O F | H A R S H N E S S | C O N S T R A I N T | O R | A R B I T R A R I N E S S | T H E R E I N | B U T | R A T H E R | A S | S O M E T H I N G | T H A T | M A Y | H A V E | I T S | B A S I S | I N | T H E | P R I M A R Y | L A W | O F | T H I N G S | I F | H E | S O U G H T | A | D E S I G N A T I O N | F O R | I T | H E | W O U L D | S A Y | I T | I S | J U S T I C E | I T S E L F | +B U T | Y O U | M I S U N D E R S T A N D | H I M | W H E N | Y O U | C O M P L A I N | A B O U T | I T | +A T | T H I S | T U R N I N G | P O I N T | O F | H I S T O R Y | T H E R E | M A N I F E S T | T H E M S E L V E S | S I D E | B Y | S I D E | A N D | O F T E N | M I X E D | A N D | E N T A N G L E D | T O G E T H E R | A | M A G N I F I C E N T | M A N I F O L D | V I R G I N | F O R E S T | L I K E | U P | G R O W T H | A N D | U P | S T R I V I N G | A | K I N D | O F | T R O P I C A L | T E M P O | I N | T H E | R I V A L R Y | O F | G R O W T H | A N D | A N | E X T R A O R D I N A R Y | D E C A Y | A N D | S E L F | D E S T R U C T I O N | O W I N G | T O | T H E | S A V A G E L Y | O P P O S I N G | A N D | S E E M I N G L Y | E X P L O D I N G | E G O I S M S | W H I C H | S T R I V E | W I T H | O N E | A N O T H E R | F O R | S U N | A N D | 
L I G H T | A N D | C A N | N O | L O N G E R | A S S I G N | A N Y | L I M I T | R E S T R A I N T | O R | F O R B E A R A N C E | F O R | T H E M S E L V E S | B Y | M E A N S | O F | T H E | H I T H E R T O | E X I S T I N G | M O R A L I T Y | +P R O B A B L Y | B U T | F O R T U N A T E L Y | N O T H I N G | F O R | M Y | O W N | T E E T H | P E R H A P S | I T | B E T R A Y S | T H E | S P E C I E S | T O | W H I C H | I | B E L O N G | B U T | N O T | T O | M Y S E L F | A S | I S | S U F F I C I E N T L Y | A G R E E A B L E | T O | M E | B U T | W H A T | H A S | H A P P E N E D | T O | Y O U | +T H E R E | M U S T | B E | A | S O R T | O F | R E P U G N A N C E | I N | M E | T O | B E L I E V E | A N Y T H I N G | D E F I N I T E | A B O U T | M Y S E L F | I S | T H E R E | P E R H A P S | S O M E | E N I G M A | T H E R E I N | +T H E | M O S T | V A R I E D | E X P E R I E N C E | T E A C H E S | I T | W H A T | A R E | T H E | Q U A L I T I E S | T O | W H I C H | I T | P R I N C I P A L L Y | O W E S | T H E | F A C T | T H A T | I T | S T I L L | E X I S T S | I N | S P I T E | O F | A L L | G O D S | A N D | M E N | A N D | H A S | H I T H E R T O | B E E N | V I C T O R I O U S | T H E S E | Q U A L I T I E S | I T | C A L L S | V I R T U E S | A N D | T H E S E | V I R T U E S | A L O N E | I T | D E V E L O P S | T O | M A T U R I T Y | +B O A T S | P U T | O U T | B O T H | F R O M | T H E | F O R T | A N D | T H E | S H O R E | +B E A U R E G A R D | A T | O N C E | W R O T E | A N | O R D E R | +C H A P T E R | S E V E N | T H E | H O M E C O M I N G | +I F | Y O U ' V E | G O T | P I S T O L S | J U S T | Y O U | T H I N K | O N C E | B E F O R E | Y O U | S H O O T | S A I D | C O L L I N S | +A N | E X T R A O R D I N A R Y | W A V E | O F | E M O T I O N | S W E P T | O V E R | T H E | S O U T H | C A R R Y I N G | E V E R Y B O D Y | W I T H | I T | +H E | I N T E N D E D | T O | L E A V E | E A R L Y | I N | T H E | M O R N I N G | B U T | F I R S T | H E | S O U G H T | H I S | F R I E N D S | A N D | T O L D | T H E M | G O O D | B Y E | +B I L L | S K E L L Y | A N | H I S | G A N G | T H E M | M O U N T A I N E E R S | A R E | U P | +T H A T | W H I T E | F L A G | A N D | T H O S E | B O A T S | G O I N G | O U T | M E A N | T H A T | S U M T E R | I S | O U R S | +W H E T H E R | T H E I R | M A N N E R | W A S | G R A V E | O R | F R I V O L O U S | H E | K N E W | T H A T | T H E S E | W E R E | G O O D | F R I E N D S | O F | H I S | A N D | H E | S I N C E R E L Y | H O P E D | T H A T | H E | W O U L D | M E E T | T H E M | A G A I N | +B U T | T H E | N E G O T I A T I O N S | W E R E | S O O N | C O M P L E T E D | +H E | W A S | G O I N G | H O M E | A F T E R | V I C T O R Y | +H E | F E L T | T H E | D I F F E R E N C E | A S | S O O N | A S | H E | R E A C H E D | T H E | H I L L S | O F | H I S | N A T I V E | S T A T E | +H E | S O O N | L E F T | C H A R L E S T O N | O U T | O F | S I G H T | +T H E R E | W E R E | N E V E R | B E F O R E | S U C H | T I M E S | I N | O L D | K E N T U C K Y | +E U R O P E | W H I C H | M U S T | H A V E | I T S | C O T T O N | W O U L D | F A V O R | T H E | S U C C E S S | O F | T H E | S O U T H | +P E O P L E | W E R E | C O O L E R | H E R E | A N D | T H E Y | W E R E | M O R E | P R O N E | T O | L O O K | A T | T H E | T W O | S I D E S | O F | A | Q U E S T I O N | +T H E | G R E A T | S T A T E | O F | V I R G I N I A | M O T H E R | O F | P R E S I D E N T S | W E N T | O U T | O F | T H E | U N 
I O N | A T | L A S T | A N D | N O R T H | C A R O L I N A | T E N N E S S E E | A N D | A R K A N S A S | F O L L O W E D | H E R | B U T | M A R Y L A N D | K E N T U C K Y | A N D | M I S S O U R I | S T I L L | H U N G | I N | T H E | B A L A N C E | +T H E | A I R | T O O | W A S | U N L I K E | T H A T | O F | S O U T H | C A R O L I N A | T H E R E | W A S | A | S H A R P E R | T A N G | T O | I T | +H A R R Y | F E E L I N G | P R I D E | B U T | N O T | S H O W I N G | I T | S A L U T E D | A N D | L E F T | T H E | R O O M | G O I N G | A T | O N C E | T O | M A D A M E | D E L A U N A Y ' S | W H E R E | H E | H A D | L E F T | H I S | B A G G A G E | +H E | D I D | N O T | S A Y | T H E | L A S T | A S | A | B O A S T | B U T | M E R E L Y | A S | A N | A S S U R A N C E | T O | T H E | L I V E R Y M A N | W H O | H E | S A W | W A S | A N X I O U S | O N | H I S | A C C O U N T | +T H E R E | W A S | N O T | A | S I N G L E | N O T E | O F | G L O O M | +B U T | H E | L O O K E D | B A C K | A T | C H A R L E S T O N | T H E | G A Y | T H E | V O L A T I L E | A N D | T H E | B E A U T I F U L | W I T H | R E A L | A F F E C T I O N | +T H E | S M O K E | I T S E L F | W H I C H | H A D | F O R M E D | A | V A S T | C L O U D | O V E R | H A R B O R | F O R T S | A N D | C I T Y | W A S | N O W | D R I F T I N G | O U T | T O | S E A | L E A V I N G | A L L | T H I N G S | E T C H E D | S H A R P L Y | I N | T H E | D A Z Z L I N G | S U N L I G H T | O F | A | S O U T H E R N | S P R I N G | D A Y | +H A R R Y | T H A N K E D | H I M | T H R E W | H I S | S A D D L E | B A G S | A C R O S S | T H E | H O R S E | A | P O W E R F U L | B A Y | A N D | G I V I N G | A | F I N A L | W A V E | O F | H I S | H A N D | T O | T H E | S Y M P A T H E T I C | L I V E R Y M A N | R O D E | A W A Y | +H E | G A Z E D | U P O N | A | W O R L D | F U L L | O F | R E S P O N S I B I L I T I E S | A N D | P E R I L S | +I T | W A S | A F T E R N O O N | W H E N | H E | R E A C H E D | T H E | L I T T L E | S T A T I O N | O F | W I N T O N | A N D | L E F T | T H E | T R A I N | A | T A L L | S T U R D Y | B O Y | T H E | S U P E R I O R | O F | M A N Y | A | M A N | I N | S I Z E | S T R E N G T H | A N D | A G I L I T Y | +H E | H A D | S E E N | G R E A T | T H I N G S | A N D | H E | H A D | D O N E | H I S | S H A R E | O F | T H E M | +I T | W A S | A | D I F F E R E N T | H A R R Y | W H O | S T A R T E D | H O M E | L A T E | I N | A P R I L | +L I N C O L N | H A D | C A L L E D | F O R | V O L U N T E E R S | T O | P U T | D O W N | A | R E B E L L I O N | B U T | H A R R Y | H E A R D | E V E R Y W H E R E | I N | C H A R L E S T O N | T H A T | T H E | C O N F E D E R A C Y | W A S | N O W | S E C U R E | +I T | W A S | A L M O S T | B U R I E D | N O W | I N | F L O W E R S | A N D | F O L I A G E | +C O L O N E L | K E N T O N | W R I T E S | W I S E L Y | +T H E | P R O G R E S S | O F | P R E S I D E N T | D A V I S | T O | T H E | N E W | C A P I T A L | S E T | I N | T H E | V E R Y | F A C E | O F | T H E | F O E | W A S | T O | B E | O N E | H U G E | T R I U M P H | O F | F A I T H | A N D | L O Y A L T Y | +B U T | H E | S A W | N O T H I N G | T H A T | M O V E D | T H E R E | N O | S I G N A L | L I G H T S | T W I N K L E D | +I T | W H I P P E D | H I S | B L O O D | A S | I T | B L E W | D O W N | F R O M | T H E | S L O P E S | A N D | C R E S T S | +W E | N E E D | K E N T U C K Y | A N D | I | U N D E R S T A N D | T H A T | A | V E R Y | L I T T L E | M O R E 
| M A Y | B R I N G | T H E | S T A T E | T O | U S | G O | W I T H | Y O U R | F A T H E R | I | U N D E R S T A N D | T H A T | Y O U | H A V E | B E E N | A | B R A V E | Y O U N G | S O L D I E R | H E R E | A N D | M A Y | Y O U | D O | A S | W E L L | U P | T H E R E | +F O U R | M O N T H S | H A D | M A D E | G R E A T | C H A N G E S | H E | B O R E | H I M S E L F | M O R E | L I K E | A | M A N | H I S | M A N N E R | W A S | M U C H | M O R E | C O N S I D E R E D | A N D | G R A V E | +H A R R Y | G A V E | H I S | F A R E W E L L S | W I T H | D E E P | A N D | G E N U I N E | R E G R E T | +T H I S | W A S | N O T | T H E | F A S H I O N | O F | A | Y E A R | A G O | W H E N | T H E Y | E X C H A N G E D | A | F R I E N D L Y | W O R D | O R | T W O | B U T | H A R R Y | K N E W | I T S | C A U S E | N O W | N O B O D Y | C O U L D | T R U S T | A N Y B O D Y | E L S E | +B U T | T H E | E M O T I O N S | O F | H A R R Y | A N D | H I S | C O M R A D E S | W E R E | F O R | T H E | M O M E N T | T H O S E | O F | V I C T O R Y | O N L Y | +C O L O N E L | L E O N I D A S | T A L B O T | R E G A R D E D | T H E | W H I T E | F L A G | W I T H | F E E L I N G S | I N | W H I C H | T R I U M P H | A N D | S A D N E S S | W E R E | M I N G L E D | S T R A N G E L Y | +A L L | T H E | A M E N I T I E S | W E R E | P R E S E R V E D | B E T W E E N | T H E | C A P T U R E D | G A R R I S O N | A N D | T H E I R | C A P T O R S | +H I S | T R E A S U R E | T A K E N | T Y P E | O F | H I S | S E L F | A N D | A | W O M A N | G I V E N | H I M | I N S T E A D | +L E T | B U T | A | M O O D | B E | S T R O N G | E N O U G H | A N D | T H E | S O U L | C L O T H I N G | I T S E L F | I N | T H A T | M O O D | A S | W I T H | A | G A R M E N T | C A N | W A L K | A B R O A D | A N D | H A U N T | T H E | W O R L D | +T H A T | E N C H A N T M E N T | H A D | P O S S E S S E D | H I M | U S U R P I N G | A S | I T | W E R E | T H E | T H R O N E | O F | H I S | L I F E | A N D | D I S P L A C I N G | I T | W H E N | I T | C E A S E D | H E | W A S | N O T | H I S | O W N | M A S T E R | +H E | S T A R T E D | T O | C O N S C I O U S | C O N F U S I O N | O N L Y | N E I T H E R | K N O W I N G | W H E R E | H E | W A S | N O R | W H A T | H E | D I D | +H O W | I T | H A P P E N E D | H E | N E V E R | C O U L D | T E L L | B U T | H E | B R O U G H T | D O W N | H I S | V I O L I N | W I T H | A | C R A S H | A G A I N S T | T H E | P I A N O | T H E N | S O M E H O W | S T U M B L E D | A N D | A L L | B U T | F E L L | +B U T | I N | H I S | H A N D S | S O L I T U D E | A N D | A | V I O L I N | W E R E | S U R E | T O | M A R R Y | I N | M U S I C | +W A S | T H E R E | E V E R | A | H A P P I E R | M A N | T H A N | J O S E P H | T H A T | N I G H T | A S | H E | S T R O D E | A L O N G | T H E | F O O T P A T H | +I T | C R I E D | A L O U D | T H A T | E T E R N I T Y | W A S | V E R Y | L O N G | A N D | L I K E | A | G R E A T | P A L A C E | W I T H O U T | A | Q U I E T | R O O M | +T H E Y | S A T | D O W N | A N D | L I S T E N E D | I N | S I L E N C E | +I N | T H E | A C T | O F | R E C O V E R I N G | H I M S E L F | H E | H E A R D | T H E | N E C K | O F | H I S | I N S T R U M E N T | P A R T | F R O M | T H E | B O D Y | W I T H | A | T E A R I N G | D I S C O R D A N T | C R Y | L I K E | T H E | S O U N D | O F | T H E | R U I N | O F | A | L I V I N G | W O R L D | +H A S T | T H O U | Y E T | T O | L E A R N | T H A T | T H E | L O V E | O F | T H E | H U M A N 
| I S | L O V E | I S | D I V I N E | I S | B U T | A | L O W E R | F O R M | O F | A | P A R T | O F | T H E | L O V E | O F | G O D | +H E | P R E S S E D | H I S | V I O L I N | C A S E | T O | H I S | H E A R T | A S | I F | I T | W E R E | A | L I V I N G | T H I N G | T H A T | C O U L D | K N O W | T H A T | H E | L O V E D | I T | +H E R | H E A R T | S E E M E D | T O | S W E L L | U P | I N T O | H E R | T H R O A T | A N D | I T | W A S | A L L | S H E | C O U L D | D O | T O | K E E P | F R O M | W E E P I N G | +A | S O B | L I K E | A | B I R D | N E W | B O R N | B U R S T | F R O M | M A R Y ' S | B O S O M | +H E | T H A T | L O V E T H | N O T | H I S | B R O T H E R | W H O M | H E | H A T H | S E E N | H O W | S H A L L | H E | L O V E | G O D | W H O M | H E | H A T H | N O T | S E E N | +T H E | M U S I C | W A S | B R O K E N | A N D | J O S E P H | L E F T | A L O N E | W I T H | T H E | D U M B | I N S T R U M E N T S | +W H E N | H E | R E A C H E D | T H E | S U B U R B S | T H E | L I G H T | O F | H O M E S | W A S | S H I N I N G | T H R O U G H | C U R T A I N S | O F | A L L | C O L O R S | +I T ' S | J U S T | L I K E | H I M | H E | M U R M U R E D | +H E | W A S | I N | A | M O O D | F O R | M U S I C | W A S | H E | N O T | +I | L O V E | T H E E | I | L O V E | T H E E | C R I E D | T H E | V I O L I N | A N D | T H E | W O R S H I P | W A S | E N T R E A T Y | T H A T | K N E W | N O T | I T S E L F | +L E T T Y | F I N D I N G | H E R S E L F | N O T | Q U I T E | E Q U A L | T O | T H E | E M E R G E N C Y | C A M E | I N | H E R | T U R N | T O | C A L L | M A R Y | S H E | W E N T | A S | Q U I E T L Y | A S | I F | S H E | W E R E | L E A V I N G | A | T I R E S O M E | V I S I T O R | +B L E S S E D | A M | I | H E R E | N O W | M Y | G O D | A N D | B L E S S E D | S H A L L | I | B E | T H E R E | T H E N | +J U S T | T H E N | H E | W A S | I N | N O | M O O D | T O | T H I N K | O F | T H E | S O R R O W S | +A | L I T T L E | L O N G E R | A N D | S H E | W A S | C O M P E L L E D | T O | Y I E L D | A N D | T H E | S I L E N T | T E A R S | F L O W E D | F R E E L Y | +I T | W A S | T H E | A F T E R N O O N | O F | A | H O L I D A Y | A N D | S H E | H A D | C L O S E D | E A R L Y | +H E | L A I D | D O W N | H I S | V I O L I N | A N D | S E A T E D | H I M S E L F | W H E R E | M A R Y | T O L D | H I M | I N | H E R | F A T H E R ' S | A R M | C H A I R | B Y | T H E | F I R E | +H I S | V I O L I N | W A S | B R O K E N | B U T | H I S | B E I N G | W A S | M A D E | W H O L E | +E A R T H | W A S | G O N E | A N D | H E A V E N | W A S | A L L | +O N E | W I N T E R | E V E N I N G | A S | S O O N | A S | H I S | W O R K | W A S | O V E R | F O R | T H E | D A Y | J O S E P H | L O C K E D | T H E | D O O R | O F | H I S | S M I T H Y | W A S H E D | H I M S E L F | W E L L | P U T | O N | C L E A N | C L O T H E S | A N D | T A K I N G | H I S | V I O L I N | S E T | O U T | F O R | T E S T B R I D G E | M A R Y | W A S | E X P E C T I N G | H I M | T O | T E A | +W H E N | T H O U | L O V E S T | M A N | O R | W O M A N | O R | C H I L D | Y E A | O R | E V E N | D O G | A R I G H T | T H E N | W I L T | T H O U | N O | L O N G E R | N E E D | T H A T | I | T E L L | T H E E | H O W | G O D | A N D | H I S | C H R I S T | W O U L D | N O T | B E | C O N T E N T | W I T H | E A C H | O T H E R | A L O N E | I N | T H E | G L O R I E S | E V E N | O F | T H E | E T E R N A L | O R I G I N A L | L O V E | B E C A U S E | T H E Y | C O 
U L D | C R E A T E | M O R E | L O V E | +N O R | W A S | T H I S | E X A C T L Y | T H E | S H A P E | T H E | T H I N G | T O O K | T O | T H E | C O N S C I O U S N E S S | O F | T H E | M U S I C I A N | +L E T T Y | T O O | W A S | O V E R C O M E | M O R E | T H A N | E V E R | S H E | H A D | B E E N | B Y | M U S I C | +T H E | N E T T L E | A N D | T H E | D O C K | S A I D | J O S E P H | +B U T | M Y | U N C L E | W A S | I N | N O | H U M O R | T O | W A I T | +A C C U S T O M E D | A S | I | H A D | B E E N | T O | T H E | S T E A M | F E R R Y | B O A T S | O F | T H E | E L B E | I | F O U N D | T H E | L O N G | O A R S | O F | T H E | B O A T M E N | B U T | S O R R Y | M E A N S | O F | L O C O M O T I O N | +I | C O U L D | N O T | H E L P | S M I L I N G | T O | S E E | H I M | L O O K | S O | B I G | O N | H I S | L I T T L E | H O R S E | H I S | L O N G | L E G S | N O W | A N D | T H E N | T O U C H I N G | T H E | G R O U N D | M A D E | H I M | L O O K | L I K E | A | S I X | F O O T E D | C E N T A U R | +L I T T L E | D I D | I | E X P E C T | H O W E V E R | T H E | S P E C T A C L E | W H I C H | A W A I T E D | U S | W H E N | W E | R E A C H E D | T H E | P E N I N S U L A | O F | S N E F F E L S | W H E R E | A G G L O M E R A T I O N S | O F | N A T U R E ' S | R U I N S | F O R M | A | K I N D | O F | T E R R I B L E | C H A O S | +H A N S | O U R | E X T R A O R D I N A R Y | G U I D E | W E N T | F I R S T | W A L K I N G | W I T H | A | S T E A D Y | R A P I D | U N V A R Y I N G | S T E P | +G E O G R A P H E R S | H A V E | D I V I D E D | I T | I N T O | F O U R | P A R T S | A N D | W E | H A D | T O | C R O S S | T H E | S O U T H W E S T | Q U A R T E R | W H I C H | I N | T H E | V E R N A C U L A R | I S | C A L L E D | S U D V E S T R | F J O R D U N G R | +I T | C O N S I S T S | S I M P L Y | O F | A | F E W | H O U S E S | N O T | W H A T | I N | E N G L A N D | O R | G E R M A N Y | W E | S H O U L D | C A L L | A | H A M L E T | +O U R | T W O | H O R S E S | W I T H | T H E | L U G G A G E | F O L L O W E D | O F | T H E I R | O W N | A C C O R D | W I T H O U T | R E Q U I R I N G | W H I P | O R | S P U R | +H E | S A Y S | T I D E | R E P L I E D | M Y | U N C L E | T R A N S L A T I N G | T H E | D A N I S H | W O R D | F O R | M Y | I N F O R M A T I O N | +T H E S E | S A C R E D | E D I F I C E S | A R E | H O W E V E R | V E R Y | M U C H | L I K E | T H E S E | P E O P L E | W H O | D O | W I T H O U T | W A T C H E S | A N D | N E V E R | M I S S | T H E M | +W E | M A Y | D O | S O | W A S | M Y | R E P L Y | B U T | W H A T | A B O U T | O U R | W O R T H Y | G U I D E | +T H E Y | V E R Y | R A R E L Y | S U C C E E D | I N | A | G O O D | S H O W | O F | Y E L L O W | +S N O W | T E M P E S T | I M P R A C T I C A B L E | R O A D S | R O C K S | I C E B E R G S | N O T H I N G | S T O P S | H I M | +A T | L E N G T H | T H E | S T U R D Y | L I T T L E | P O N Y | S P R E A D I N G | O U T | H I S | L E G S | I N | A | S T I F F | A N D | L U D I C R O U S | A T T I T U D E | G O T | F R O M | U N D E R | T H E | P R O F E S S O R ' S | L E G S | A N D | L E F T | H I M | S T A N D I N G | W I T H | B O T H | F E E T | O N | A | S E P A R A T E | S T O N E | L I K E | T H E | C O L O S S U S | O F | R H O D E S | +M Y | A R M S | A R E | R I G H T | B U T | M Y | L E G S | A R E | G E T T I N G | A | L I T T L E | S T I F F | +W E | T O O K | O U R | W A Y | T H R O U G H | P O O R | A N D | S P A R S E | M E A D O W S | W 
H I C H | M A D E | A | D E S P E R A T E | E F F O R T | E V E R Y | Y E A R | T O | S H O W | A | L I T T L E | G R E E N | +A | F E W | S T R A Y | C O W S | A N D | S H E E P | W E R E | O N L Y | S E E N | O C C A S I O N A L L Y | +I | S H O U L D | H A V E | A | V I O L E N T | A T T A C K | O F | T H E | C R A M P | I F | I | W E R E | N O T | T O | H A V E | S O M E | S O R T | O F | E X E R C I S E | +I | T H O R O U G H L Y | U N D E R S T O O D | A N D | A P P R E C I A T E D | T H E | N E C E S S I T Y | F O R | W A I T I N G | B E F O R E | C R O S S I N G | T H E | F J O R D | F O R | T H A T | M O M E N T | W H E N | T H E | S E A | A T | I T S | H I G H E S T | P O I N T | I S | I N | A | S T A T E | O F | S L A C K | W A T E R | +T O | R I D E | O V E R | S A L T | W A T E R | U P O N | T H E | B A C K | O F | A | L I T T L E | H O R S E | S E E M E D | T O | M E | A B S U R D | +H E R E | A N D | T H E R E | C O U L D | B E | S E E N | A N | I S O L A T E D | F A R M | S O M E | S O L I T A R Y | B U R | O R | I C E L A N D I C | H O U S E | B U I L T | O F | W O O D | E A R T H | F R A G M E N T S | O F | L A V A | L O O K I N G | L I K E | B E G G A R S | O N | T H E | H I G H W A Y | O F | L I F E | +I | B E G A N | T O | E N J O Y | T H E | E X H I L A R A T I N G | D E L I G H T | O F | T R A V E L I N G | A | L I F E | O F | D E S I R E | G R A T I F I C A T I O N | A N D | L I B E R T Y | +I | T O O K | O C C A S I O N | T O | C O N S U L T | T H E | M A P | T O | S E E | W H E R E | G A R D A R | W A S | T O | B E | F O U N D | +I N | A N Y | C A S E | I | S H A L L | T R U S T | R A T H E R | T O | M Y | O W N | I N T E L L I G E N C E | T H A N | T H E I R S | +C U R I O U S L Y | E N O U G H | T H E | B L O O D | O F | W A B I | R A N | A L M O S T | P U R E | T O | H I S | I N D I A N | F O R E F A T H E R S | W H I L E | M I N N E T A K I | A S | S H E | B E C A M E | O L D E R | D E V E L O P E D | L E S S | O F | T H E | W I L D | B E A U T Y | O F | H E R | M O T H E R | A N D | M O R E | O F | T H E | S O F T E R | L O V E L I N E S S | O F | T H E | W H I T E | R A C E | H E R | W E A L T H | O F | S O F T | J E T | B L A C K | H A I R | A N D | H E R | G R E A T | D A R K | E Y E S | C O N T R A S T I N G | W I T H | T H E | L I G H T E R | S K I N | O F | H E R | F A T H E R ' S | B L O O D | +B U T | I N | T I M E | T H E | E N D | O F | I T | A L L | C A M E | A N D | W A B I | W E N T | B A C K | T O | T H E | P R I N C E S S | M O T H E R | T O | M I N N E T A K I | A N D | T O | H I S | F O R E S T S | +W H I L E | T H E | A T T A C K | W A S | S U C C E S S F U L | I N | A | W A Y | I T S | M A I N | P U R P O S E | F A I L E D | +A T | L A S T | S O | D A R I N G | D I D | H E | B E C O M E | T H A T | T H E | P R O V I N C I A L | G O V E R N M E N T | P L A C E D | A | P R I C E | U P O N | H I S | H E A D | A N D | U P O N | T H O S E | O F | A | N U M B E R | O F | H I S | M O S T | N O T O R I O U S | F O L L O W E R S | +O N E | O F | N E W S O M E ' S | C H I E F | P L E A S U R E S | I N | L I F E | H A D | B E E N | T H E | E D U C A T I N G | O F | H I S | W O O D L A N D | B R I D E | A N D | I T | W A S | T H E | A M B I T I O N | O F | B O T H | T H A T | T H E | L I T T L E | M I N N E T A K I | A N D | H E R | B R O T H E R | B E | R E A R E D | I N | T H E | W A Y S | O F | W H I T E | C H I L D R E N | +I T | W A S | A T | A B O U T | T H I S | T I M E | I N | T H E I R | L I V E S | T H A T | T H E | W O O N G A S | B E C A M 
E | E S P E C I A L L Y | D A R I N G | I N | T H E I R | D E P R E D A T I O N S | +W A B I | O N | T H E | O T H E R | H A N D | W A S | A N | I N D I A N | I N | A P P E A R A N C E | F R O M | H I S | M O C C A S I N S | T O | T H E | C R O W N | O F | H I S | H E A D | S W A R T H Y | S I N E W Y | A S | A G I L E | A S | A | L Y N X | A N D | W I T H | E V E R Y | I N S T I N C T | I N | H I M | C R Y I N G | F O R | T H E | L I F E | O F | T H E | W I L D | +M E A N W H I L E | T W O | C H I L D R E N | C A M E | T O | B L E S S | T H E | H A P P Y | U N I O N | O F | N E W S O M E | A N D | H I S | L O V E L Y | I N D I A N | W I F E | +T H R E E | D A Y S | L A T E R | M I N N E T A K I | B E C A M E | N E W S O M E ' S | W I F E | A T | T H E | H U D S O N | B A Y | P O S T | +B U T | E A C H | W E E K | A D D E D | T O | H I S | L O N E L I N E S S | A N D | H I S | L O N G I N G S | F O R | M I N N E T A K I | A N D | H I S | F O R E S T S | +T H E R E | W E R E | T E A R S | I N | T H E | B O Y S | E Y E S | W H E N | T H E Y | P A R T E D | A N D | T H E | M O T H E R | C R I E D | F O R | T H E | I N D I A N | B O Y | W H O | W A S | R E T U R N I N G | T O | H I S | P E O P L E | +T H E | O T H E R | W A S | A | G I R L | T H R E E | Y E A R S | Y O U N G E R | A N D | N E W S O M E | I N S I S T E D | T H A T | S H E | B E | C A L L E D | M I N N E T A K I | +O N E | D A R K | N I G H T | A T | T H E | H E A D | O F | A | S C O R E | O F | H I S | T R I B E | H E | F E L L | U P O N | W A B I G O O N ' S | C A M P | H I S | O B J E C T | B E I N G | T H E | A B D U C T I O N | O F | T H E | P R I N C E S S | +T H E R E | W A S | L I T T L E | T I M E | T O | L O S E | I N | M A K I N G | P R E P A R A T I O N S | A N D | T H E | F O U R T H | D A Y | F O L L O W I N G | T H E | R E C E I P T | O F | W A B I ' S | L E T T E R | F O U N D | R O D | A N D | H I S | M O T H E R | W A I T I N G | F O R | T H E | T R A I N | W H I C H | W A S | T O | W H I R L | T H E | B O Y | I N T O | H I S | N E W | L I F E | +T H E | C H I L D R E N | P R O V E D | T H E M S E L V E S | U N U S U A L L Y | B R I G H T | P U P I L S | A N D | B Y | T H E | T I M E | W A B I | W A S | S I X T E E N | A N D | M I N N E T A K I | T W E L V E | O N E | W O U L D | N O T | H A V E | K N O W N | F R O M | T H E I R | M A N N E R | O F | S P E E C H | T H A T | I N D I A N | B L O O D | R A N | I N | T H E I R | V E I N S | +A | C O U N T E R | A T T A C K | W A S | M A D E | U P O N | W O O N G A | A N D | H E | W A S | D R I V E N | D E E P | I N T O | T H E | W I L D E R N E S S | W I T H | G R E A T | L O S S | +F R O M | T H A T | H O U R | D A T E D | O N E | O F | T H E | M O S T | S A N G U I N A R Y | F E U D S | I N | T H E | H I S T O R Y | O F | T H E | G R E A T | T R A D I N G | C O M P A N Y | A | F E U D | W H I C H | A S | W E | S H A L L | S E E | W A S | D E S T I N E D | T O | L I V E | E V E N | U N T O | T H E | S E C O N D | G E N E R A T I O N | +B U T | T H I S | P O W E R | O F | D I S C E R N M E N T | W A S | D E N I E D | T H E M | A N D | O N L Y | I N | A F T E R | Y E A R S | W I T H | T H E | L O V E D | O N E S | O F | T H E I R | O W N | F I R E S I D E S | C L O S E | A B O U T | T H E M | W A S | T H E | W H O L E | P I C T U R E | R E V E A L E D | +A | T H O U S A N D | P L A N S | W E R E | M A D E | A | T H O U S A N D | A D V E N T U R E S | P I C T U R E D | A N D | T H E | M O T H E R | W O U L D | S M I L E | A N D | L A U G H | A N D | P L A N | W I T H 
| T H E M | +W E | S H A L L | M A K E | M O R E | M O N E Y | U P | H E R E | T H I S | W I N T E R | T H A N | Y O U | C O U L D | E A R N | I N | D E T R O I T | I N | T H R E E | Y E A R S | +O N | T H E | T E N T H | O F | O C T O B E R | H E | W O U L D | M E E T | R O D | A T | S P R U C E W O O D | O N | T H E | B L A C K | S T U R G E O N | R I V E R | +S P R I N G | C A M E | A N D | P A S S E D | A N D | T H E N | S U M M E R | +W E | W I L L | H U N T | W O L V E S | T H E | C O U N T R Y | I S | A L I V E | W I T H | T H E M | A N D | T H E | G O V E R N M E N T | G I V E S | A | B O U N T Y | O F | F I F T E E N | D O L L A R S | F O R | E V E R Y | S C A L P | T A K E N | +T H R E E | W E E K S | L A T E R | C A M E | W A B I G O O N ' S | R E P L Y | +C O N S E Q U E N T L Y | B O T H | M O T H E R | A N D | F A T H E R | B E G A N | T H E I R | E D U C A T I O N | A T | T H E | P O S T | T H E Y | W E R E | S E N T | T O | T H E | F A C T O R ' S | S C H O O L | A N D | T W O | W I N T E R S | W E R E | P A S S E D | I N | P O R T | A R T H U R | T H A T | T H E Y | M I G H T | H A V E | T H E | A D V A N T A G E | O F | T H O R O U G H L Y | E Q U I P P E D | S C H O O L S | +N E C E S S I T Y | H A D | B E C O M E | H I S | G R I M | M A S T E R | A N D | T H E | F O L L O W I N G | W E E K | H E | W A S | G O I N G | T O | W O R K | +O N | T H E | S E C O N D | O F | T H E | M O N T H | A T | T W O | I N | T H E | M O R N I N G | O U R | P R E C I O U S | C A R G O | O F | L U G G A G E | W A S | T A K E N | O N | B O A R D | T H E | G O O D | S H I P | V A L K Y R I E | +T H E | M E N | A P P E A R E D | R O B U S T | B U T | H E A V Y | F A I R | H A I R E D | L I K E | G E R M A N S | B U T | O F | P E N S I V E | M I E N | E X I L E S | O F | A | H I G H E R | S C A L E | I N | T H E | L A D D E R | O F | H U M A N I T Y | T H A N | T H E | E S K I M O S | B U T | I | T H O U G H T | M U C H | M O R E | U N H A P P Y | S I N C E | W I T H | S U P E R I O R | P E R C E P T I O N S | T H E Y | A R E | C O M P E L L E D | T O | L I V E | W I T H I N | T H E | L I M I T S | O F | T H E | P O L A R | C I R C L E | +T H E | P R O F E S S O R | K N E W | W H O M | H E | H A D | T O | D E A L | W I T H | +N E A R L Y | T H E | W H O L E | P O P U L A T I O N | O F | T H E | T O W N | W A S | O N | F O O T | T O | S E E | U S | L A N D | +V E R Y | L I K E L Y | I | M A Y | F I N D | T H E R E | S O M E | M A N U S C R I P T S | F R O M | T H E | H A N D | O F | S A K N U S S E M M | +I N | T H E | M E A N T I M E | T H E R E | I S | N O T | A N | H O U R | T O | L O S E | +H E | W A S | H O W E V E R | B U T | A | C I V I L | S E R V A N T | A | M A G I S T R A T E | T H E | G O V E R N O R | O F | T H E | I S L A N D | B A R O N | T R A M P E | +A T | A L L | E V E N T S | W E | S H A L L | G E T | T H E R E | S O M E | D A Y | +O N | T H E | E L E V E N T H | D A Y | W E | S I G H T E D | C A P E | P O R T L A N D | O V E R | W H I C H | T O W E R E D | M O U N T | M Y R D A L S | Y O K U L | W H I C H | T H E | W E A T H E R | B E I N G | C L E A R | W E | M A D E | O U T | V E R Y | R E A D I L Y | +T H E | F A C T | W A S | T H A T | S C A R C E L Y | A N Y | O N E | O F | T H E M | B U T | E X P E C T E D | S O M E | G O O D S | B Y | T H E | P E R I O D I C A L | V E S S E L | +T H O U G H | N O T | V E R Y | L A R G E | I T | A P P E A R E D | N O T | L I K E L Y | T O | B E | F I L L E D | F O R | C E N T U R I E S | +M Y | U N C L E | W A S | D E L I G H T E D | F 
O R | M Y S E L F | M O O D Y | A N D | D I S S A T I S F I E D | I | A P P E A R E D | A L M O S T | T O | E X P E C T | A | G L I M P S E | O F | T H E | G H O S T | O F | H A M L E T | +N O | M I S T E R | H A R D W I G G | S A I D | T H E | C A P T A I N | N O | F E A R | O F | T H A T | +T H E Y | W E R E | N O W | H O W E V E R | A B S E N T | O N | D U T Y | +I | S A W | B U T | F E W | I N H A B I T A N T S | D U R I N G | M Y | E X C U R S I O N | B U T | I | M E T | A | C R O W D | O N | T H E | B E A C H | D R Y I N G | S A L T I N G | A N D | L O A D I N G | C O D F I S H | T H E | P R I N C I P A L | A R T I C L E | O F | E X P O R T A T I O N | +W E L L | A N D | H A V E | W E | A | F A I R | W I N D | +O N | A L L | S I D E S | W E R E | T O | B E | S E E N | W H O L E | S C H O O L S | O F | W H A L E S | A N D | S H A R K S | +B U T | N O | G H O S T | O R | A N Y T H I N G | E L S E | A P P E A R E D | U P O N | T H E | A N C I E N T | W A L L S | +N O W | H A R R Y | S A I D | M Y | U N C L E | R U B B I N G | H I S | H A N D S | A N | G O E S | W E L L | T H E | W O R S E | D I F F I C U L T Y | I S | N O W | O V E R | +W H E N | T H E R E F O R E | H E | A D D R E S S E D | H I M S E L F | T O | M E | I N | T H E | L A N G U A G E | O F | H O R A C E | W E | A T | O N C E | C A M E | T O | U N D E R S T A N D | O N E | A N O T H E R | +T H A N K S | T O | T H E | H E A T | O F | T H E S E | R E S I D E N C E S | G R A S S | G R O W S | O N | T H E | R O O F | W H I C H | G R A S S | I S | C A R E F U L L Y | C U T | F O R | H A Y | +I | S H A L L | B E | G L A D | T O | C O N S U L T | T H E M | +T H E | V A L K Y R I E | K E P T | O F F | T H E | C O A S T | S T E E R I N G | T O | T H E | W E S T W A R D | +T H E N | W I T H O U T | F U R T H E R | R E M A R K | H E | P U T | H I S | F I N G E R | T O | H I S | L I P S | F R O W N E D | D A R K L Y | A N D | D E S C E N D E D | I N T O | T H E | S M A L L | B O A T | W H I C H | A W A I T E D | U S | +T H I S | M O D E S T | S C H O L A R | S P O K E | N O | L A N G U A G E S | S A V E | I C E L A N D I C | A N D | L A T I N | +T H E | F A C T | I S | T H E | C A S T L E | I S | M U C H | L A T E R | T H A N | T H E | T I M E | O F | T H E | H E R O I C | P R I N C E | O F | D E N M A R K | +N O T | B E | I T | E V E R | R E M E M B E R E D | T H A T | T H E | S L I G H T E S T | S U S P I C I O N | O F | I M M O R A L I T Y | A T T A C H E S | E I T H E R | T O | T H E | H E R O I N E | O F | T H I S | B O O K | O R | T O | T H E | L E A D I N G | P H I L O S O P H E R S | O F | H E R | S C H O O L | F O R | S E V E R A L | C E N T U R I E S | +T O | S Y N E S I U S ' S | M O S T | C H A R M I N G | L E T T E R S | A S | W E L L | A S | T O | T H O S E | O F | I S I D O R E | T H E | G O O D | A B B O T | O F | P E L U S I U M | I | B E G | L E A V E | T O | R E F E R | T H O S E | R E A D E R S | W H O | W I S H | F O R | F U R T H E R | I N F O R M A T I O N | A B O U T | T H E | P R I V A T E | L I F E | O F | T H E | F I F T H | C E N T U R Y | +T H A T | W O N D E R F U L | M E T A P H Y S I C | S U B T L E T Y | W H I C H | I N | P H R A S E S | A N D | D E F I N I T I O N S | T O O | O F T E N | U N M E A N I N G | T O | O U R | G R O S S E R | I N T E L L E C T | S A W | T H E | S Y M B O L S | O F | T H E | M O S T | I M P O R T A N T | S P I R I T U A L | R E A L I T I E S | A N D | F E L T | T H A T | O N | T H E | D I S T I N C T I O N | B E T W E E N | H O M O O U S I O S | A N D | H O M O I O U S I O S 
| M I G H T | H A N G | T H E | S O L U T I O N | O F | T H E | W H O L E | P R O B L E M | O F | H U M A N I T Y | W A S | S E T | T O | B A T T L E | I N | A L E X A N D R I A | T H E | A N C I E N T | S T R O N G H O L D | O F | G R E E K | P H I L O S O P H Y | W I T H | T H E | E F F E T E | R E M A I N S | O F | T H E | V E R Y | S C I E N T I F I C | T H O U G H T | T O | W H I C H | I T | O W E D | I T S | E X T R A O R D I N A R Y | C U L T U R E | +T H E | H U N S | S I N G L Y | T H E I R | I N F E R I O R S | P R E S S E D | T H E M | F R O M | B E H I N D | W I T H | T H E | I R R E S I S T I B L E | W E I G H T | O F | N U M B E R S | I T A L Y | W I T H | H E R | R I C H | C I T I E S | A N D | F E R T I L E | L O W L A N D S | B E C K O N E D | T H E M | O N | T O | P L U N D E R | A S | A U X I L I A R I E S | T H E Y | H A D | L E A R N E D | T H E I R | O W N | S T R E N G T H | A N D | R O M A N | W E A K N E S S | A | C A S U S | B E L L I | W A S | S O O N | F O U N D | +T H E | V E R Y | E M P E R O R S | H A D | A R R A Y E D | T H E M S E L V E S | O N | H E R | S I D E | +I | C A N N O T | H O P E | T H A T | T H E S E | P A G E S | W I L L | B E | A L T O G E T H E R | F R E E | F R O M | A N A C H R O N I S M S | A N D | E R R O R S | +H O W | I N I Q U I T O U S | W A S | T H E | C O N D U C T | O F | T H E | S O N S | O F | T H E O D O S I U S | I N | R E F U S I N G | T H E | U S U A L | B O U N T Y | B Y | W H I C H | T H E | G O T H S | W E R E | B R I B E D | N O T | T O | A T T A C K | T H E | E M P I R E | T H E | W H O L E | P E N T | U P | D E L U G E | B U R S T | O V E R | T H E | P L A I N S | O F | I T A L Y | A N D | T H E | W E S T E R N | E M P I R E | B E C A M E | F R O M | T H A T | D A Y | F O R T H | A | D Y I N G | I D I O T | W H I L E | T H E | N E W | I N V A D E R S | D I V I D E D | E U R O P E | A M O N G | T H E M S E L V E S | +T H E | C O U N T L E S S | T R E A S U R E S | W H I C H | F I V E | C E N T U R I E S | O F | R A P I N E | H A D | A C C U M U L A T E D | R O U N D | T H E | C A P I T O L | H A D | B E C O M E | T H E | P R E Y | O F | M E N | C L O T H E D | I N | S H E E P S K I N S | A N D | H O R S E | H I D E | A N D | T H E | S I S T E R | O F | A N | E M P E R O R | H A D | F O U N D | H E R | B E A U T Y | V I R T U E | A N D | P R I D E | O F | R A C E | W O R T H I L Y | M A T C H E D | B Y | T H O S E | O F | T H E | H A R D | H A N D E D | N O R T H E R N | H E R O | W H O | L E D | H E R | A W A Y | F R O M | I T A L Y | A S | H I S | C A P T I V E | A N D | H I S | B R I D E | T O | F O U N D | N E W | K I N G D O M S | I N | S O U T H | F R A N C E | A N D | S P A I N | A N D | T O | D R I V E | T H E | N E W L Y | A R R I V E D | V A N D A L S | A C R O S S | T H E | S T R A I T S | O F | G I B R A L T A R | I N T O | T H E | T H E N | B L O O M I N G | C O A S T | L A N D | O F | N O R T H E R N | A F R I C A | +O N E | W H O | W R I T E S | O F | S U C H | A N | E R A | L A B O U R S | U N D E R | A | T R O U B L E S O M E | D I S A D V A N T A G E | +I N | T H E | P R E S E N T | C A S E | T H A T | D I S A D V A N T A G E | I S | D O U B L E D | F O R | W H I L E | T H E | S I N S | O F | T H E | C H U R C H | H O W E V E R | H E I N O U S | W E R E | S T I L L | S U C H | A S | A D M I T | O F | B E I N G | E X P R E S S E D | I N | W O R D S | T H E | S I N S | O F | T H E | H E A T H E N | W O R L D | A G A I N S T | W H I C H | S H E | F O U G H T | W E R E | U T T E R L Y | I N D E S C R I B A B 
L E | A N D | T H E | C H R I S T I A N | A P O L O G I S T | I S | T H U S | C O M P E L L E D | F O R | T H E | S A K E | O F | D E C E N C Y | T O | S T A T E | T H E | C H U R C H ' S | C A S E | F A R | M O R E | W E A K L Y | T H A N | T H E | F A C T S | D E S E R V E | +A N D | T H E | N E W | B L O O D | A T | T H E | E R A | O F | T H I S | S T O R Y | W A S | A T | H A N D | +T H A T | E X T R A O R D I N A R Y | R E F O R M | I N | M O R A L S | W H I C H | A C C O R D I N G | T O | S A L V I A N | A N D | H I S | C O N T E M P O R A R I E S | T H E | V A N D A L | C O N Q U E R O R S | W O R K E D | I N | N O R T H | A F R I C A | A V A I L E D | T H E M | N O T H I N G | T H E Y | L O S T | M O R E | T H A N | T H E Y | G A V E | +B U T | I F | T H E | E M P E R O R S | H A D | B E C O M E | C H R I S T I A N | T H E | E M P I R E | H A D | N O T | +T R I B E | A F T E R | T R I B E | W A S | C R O W D I N G | D O W N | T O | T H E | A L P S | A N D | T R A M P L I N G | U P O N | E A C H | O T H E R | O N | T H E | F R O N T I E R S | O F | T H E | E M P I R E | +T H E | M E N S | S A N A | M U S T | H A V E | A | C O R P U S | S A N U M | T O | I N H A B I T | +I N | T H E | M E A N W H I L E | T H E | M I N D S | O F | M E N | C U T | A D R I F T | F R O M | T H E I R | A N C I E N T | M O O R I N G S | W A N D E R E D | W I L D L Y | O V E R | P A T H L E S S | S E A S | O F | S P E C U L A T I V E | D O U B T | A N D | E S P E C I A L L Y | I N | T H E | M O R E | M E T A P H Y S I C A L | A N D | C O N T E M P L A T I V E | E A S T | A T T E M P T E D | T O | S O L V E | F O R | T H E M S E L V E S | T H E | Q U E S T I O N S | O F | M A N ' S | R E L A T I O N | T O | T H E | U N S E E N | B Y | T H O S E | T H O U S A N D | S C H I S M S | H E R E S I E S | A N D | T H E O S O P H I E S | I T | I S | A | D I S G R A C E | T O | T H E | W O R D | P H I L O S O P H Y | T O | C A L L | T H E M | B Y | I T | O N | T H E | R E C O R D S | O F | W H I C H | T H E | S T U D E N T | N O W | G A Z E S | B E W I L D E R E D | U N A B L E | A L I K E | T O | C O U N T | O R | T O | E X P L A I N | T H E I R | F A N T A S I E S | +T H A T | D I V I N E | W O R D | W H O | I S | T H E | L I G H T | W H O | L I G H T E T H | E V E R Y | M A N | W H I C H | C O M E T H | I N T O | T H E | W O R L D | H A D | A W A K E N E D | I N | T H E | H E A R T | O F | M A N K I N D | A | M O R A L | C R A V I N G | N E V E R | B E F O R E | F E L T | I N | A N Y | S T R E N G T H | E X C E P T | B Y | A | F E W | I S O L A T E D | P H I L O S O P H E R S | O R | P R O P H E T S | +B U T | T H E | H E A L T H | O F | A | C H U R C H | D E P E N D S | N O T | M E R E L Y | O N | T H E | C R E E D | W H I C H | I T | P R O F E S S E S | N O T | E V E N | O N | T H E | W I S D O M | A N D | H O L I N E S S | O F | A | F E W | G R E A T | E C C L E S I A S T I C S | B U T | O N | T H E | F A I T H | A N D | V I R T U E | O F | I T S | I N D I V I D U A L | M E M B E R S | +J U L I A N ' S | L A S T | A T T E M P T | T O | R E S T O R E | P A G A N I S M | B Y | I M P E R I A L | I N F L U E N C E | H A D | O N L Y | P R O V E D | T H A T | T H E | O L D | F A I T H | H A D | L O S T | A L L | H O L D | U P O N | T H E | H E A R T S | O F | T H E | M A S S E S | A T | H I S | D E A T H | T H E | G R E A T | T I D E | W A V E | O F | N E W | O P I N I O N | R O L L E D | O N | U N C H E C K E D | A N D | T H E | R U L E R S | O F | E A R T H | W E R E | F A I N | T O | S W I M | W I T H | T H E | S T 
R E A M | T O | A C C E P T | I N | W O R D S | A T | L E A S T | T H E | C H U R C H ' S | L A W S | A S | T H E I R S | T O | A C K N O W L E D G E | A | K I N G | O F | K I N G S | T O | W H O M | E V E N | T H E Y | O W E D | H O M A G E | A N D | O B E D I E N C E | A N D | T O | C A L L | T H E I R | O W N | S L A V E S | T H E I R | P O O R E R | B R E T H R E N | A N D | O F T E N | T O O | T H E I R | S P I R I T U A L | S U P E R I O R S | +T H E Y | B R O U G H T | B E F O R E | T H E | M I N D S | O F | C H U R C H M E N | A | T H O U S A N D | N E W | Q U E S T I O N S | W H I C H | M U S T | B E | S O L V E D | U N L E S S | T H E | C H U R C H | W A S | T O | R E L I N Q U I S H | F O R | E V E R | H E R | C L A I M S | A S | T H E | G R E A T | T E A C H E R | A N D | S A T I S F I E R | O F | T H E | H U M A N | S O U L | +A Y | S H E | A N S W E R E D | H A L F | B I T T E R L Y | A N D | W O U L D | T H A T | W E | C O U L D | L I V E | W I T H O U T | F O O D | A N D | I M I T A T E | P E R F E C T L Y | T H E | I M M O R T A L | G O D S | +T H E R E | I S | F R U I T | W I T H | L E N T I L S | A N D | R I C E | W A I T I N G | F O R | Y O U | I N | T H E | N E X T | R O O M | A N D | B R E A D | U N L E S S | Y O U | D E S P I S E | I T | T O O | M U C H | +N O T | T H A T | S U C H | A | C R E A T U R E | A S | T H A T | D I S T U R B S | M E | N O | C R E A T E D | T H I N G | I | H O P E | C A N | M O V E | M Y | E Q U A N I M I T Y | B U T | I F | I | C O U L D | S T O O P | T O | H A T E | I | S H O U L D | H A T E | H E R | H A T E | H E R | +H I S | E X C E L L E N C Y | M A D A M | T H E | P R E F E C T | +A N D | W H Y | S H O U L D | T H A T | D I S T U R B | M E | L E T | H I M | E N T E R | +S H E | H A S | L I F T E D | H E R | E Y E S | O F F | H E R | M A N U S C R I P T | S H E | I S | L O O K I N G | O U T | W I T H | K I N D L I N G | C O U N T E N A N C E | O V E R | T H E | G A R D E N S | O F | T H E | M U S E U M | H E R | R I P E | C U R L I N G | G R E E K | L I P S | S U C H | A S | W E | N E V E R | S E E | N O W | E V E N | A M O N G | H E R | O W N | W I V E S | A N D | S I S T E R S | O P E N | +I F | T H E Y | H A V E | C A S T | O F F | T H E | V U L G A R | H E R D | T H E Y | H A V E | N O T | C A S T | O F F | H Y P A T I A | +W H A T | D O | I | C A R E | F O R | F O O D | +A N D | H E R | V O I C E | T O O K | A | T O N E | W H I C H | M A D E | I T | S O M E W H A T | U N C E R T A I N | W H E T H E R | I N | S P I T E | O F | A L L | T H E | L O F T Y | I M P A S S I B I L I T Y | W H I C H | S H E | F E L T | B O U N D | T O | P O S S E S S | S H E | D I D | N O T | H A T E | P E L A G I A | W I T H | A | M O S T | H U M A N | A N D | M U N D A N E | H A T R E D | +I F | T H E Y | H A V E | C E A S E D | T O | G U I D E | N A T I O N S | T H E Y | H A V E | N O T | C E A S E D | T O | S P E A K | T O | T H E I R | O W N | E L E C T | +S T R A N G E | T H A T | M E N | S H O U L D | B E | C O N T E N T | T O | G R O V E L | A N D | B E | M E N | W H E N | T H E Y | M I G H T | R I S E | T O | T H E | R A N K | O F | G O D S | +I | T O | B E L I E V E | A G A I N S T | T H E | A U T H O R I T Y | O F | P O R P H Y R Y | H I M S E L F | T O O | I N | E V I L | E Y E S | A N D | M A G I C | +T H E | P L A C E | S E E M E D | F R A G R A N T | W I T H | A L L | T H E | R I C H E S | O F | G R E E K | T H O U G H T | A N D | S O N G | S I N C E | T H E | D A Y S | W H E N | P T O L E M Y | P H I L A D E L P H U S | W A L K E D | T H E 
R E | W I T H | E U C L I D | A N D | T H E O C R I T U S | C A L L I M A C H U S | A N D | L Y C O P H R O N | +B U T | M O S T | P R O B A B L Y | H A D | A N Y | O F | U S | E N T E R E D | T H A T | R O O M | T H A T | M O R N I N G | W E | S H O U L D | N O T | H A V E | B E E N | A B L E | T O | S P A R E | A | L O O K | E I T H E R | F O R | T H E | F U R N I T U R E | O R | T H E | G E N E R A L | E F F E C T | O R | T H E | M U S E U M | G A R D E N S | O R | T H E | S P A R K L I N G | M E D I T E R R A N E A N | B E Y O N D | B U T | W E | S H O U L D | H A V E | A G R E E D | T H A T | T H E | R O O M | W A S | Q U I T E | R I C H | E N O U G H | F O R | H U M A N | E Y E S | F O R | T H E | S A K E | O F | O N E | T R E A S U R E | W H I C H | I T | P O S S E S S E D | A N D | B E S I D E | W H I C H | N O T H I N G | W A S | W O R T H | A | M O M E N T ' S | G L A N C E | +T H E | R O O M | H A D | N E I T H E R | C A R P E T | N O R | F I R E P L A C E | A N D | T H E | O N L Y | M O V A B L E S | I N | I T | W E R E | A | S O F A | B E D | A | T A B L E | A N D | A N | A R M | C H A I R | A L L | O F | S U C H | D E L I C A T E | A N D | G R A C E F U L | F O R M S | A S | M A Y | B E | S E E N | O N | A N C I E N T | V A S E S | O F | A | F A R | E A R L I E R | P E R I O D | T H A N | T H A T | W H E R E O F | W E | W R I T E | +T O | B E | W E L C O M E D | I N T O | T H E | C E L E S T I A L | R A N K S | O F | T H E | H E R O I C | T O | R I S E | T O | T H E | I M M O R T A L | G O D S | T O | T H E | I N E F F A B L E | P O W E R S | O N W A R D | U P W A R D | E V E R | T H R O U G H | A G E S | A N D | T H R O U G H | E T E R N I T I E S | T I L L | I | F I N D | M Y | H O M E | A T | L A S T | A N D | V A N I S H | I N | T H E | G L O R Y | O F | T H E | N A M E L E S S | A N D | T H E | A B S O L U T E | O N E | +H O W | C A N | H E | W H O S E | S P H E R E | L I E S | A B O V E | T H E | S T A R S | S T O O P | E V E R Y | M O M E N T | T O | E A R T H | +T H E | O P E R A T I O N | W H A T E V E R | I T | H A D | B E E N | W H I C H | H A D | D E P R I V E D | H I S | F E A T U R E S | O F | H A R M O N Y | A N D | P U T | A L L | T H E I R | F L E S H | I N T O | D I S O R D E R | H A D | H A D | N O | E F F E C T | O N | T H E | B O N Y | S T R U C T U R E | O F | H I S | H E A D | +B E S I D E S | W E | M U S T | R E M E M B E R | T H A T | T H E Y | H A D | I N | T H O S E | T I M E S | M E A N S | O F | P U T T I N G | P A T I E N T S | T O | S L E E P | A N D | O F | S U P P R E S S I N G | A L L | S U F F E R I N G | O N L Y | T H E N | I T | W A S | C A L L E D | M A G I C | W H I L E | N O W | I T | I S | C A L L E D | A N A E S T H E S I A | +H I S | H A I R | H A V I N G | P R O B A B L Y | B E E N | D Y E D | W I T H | S O M E | C O R R O S I V E | P R E P A R A T I O N | H A D | L E F T | I T | W O O L L Y | A N D | R O U G H | T O | T H E | T O U C H | +B E S I D E S | T H I S | F A C E | T H O S E | W H O | H A D | B R O U G H T | H I M | U P | H A D | G I V E N | H I M | T H E | R E S O U R C E S | O F | A | G Y M N A S T | A N D | A N | A T H L E T E | +G W Y N P L A I N E | H A D | Y E L L O W | H A I R | +T H E | J O Y O U S | C O N V U L S I O N | O F | L A U G H T E R | W A S | A S | A | T R I B U T E | P A I D | T H E Y | S U B M I T T E D | T O | I T | G L A D L Y | B U T | A L M O S T | M E C H A N I C A L L Y | +T H E | O U T S I D E | D I D | N O T | D E P E N D | O N | T H E | I N T E R I O R | +A L L | H I S | E M O T I O N S | W H 
A T E V E R | T H E Y | M I G H T | H A V E | B E E N | A U G M E N T E D | H I S | S T R A N G E | F A C E | O F | J O Y | O R | T O | S P E A K | M O R E | C O R R E C T L Y | A G G R A V A T E D | I T | +W I T H | T H I S | E X C E P T I O N | G W Y N P L A I N E ' S | L A U G H | W A S | E V E R L A S T I N G | +I T | S E E M E D | E V I D E N T | T H A T | A | M Y S T E R I O U S | A N D | P R O B A B L Y | O C C U L T | S C I E N C E | W H I C H | W A S | T O | S U R G E R Y | W H A T | A L C H E M Y | W A S | T O | C H E M I S T R Y | H A D | C H I S E L L E D | H I S | F L E S H | E V I D E N T L Y | A T | A | V E R Y | T E N D E R | A G E | A N D | M A N U F A C T U R E D | H I S | C O U N T E N A N C E | W I T H | P R E M E D I T A T I O N | +H A D | G W Y N P L A I N E | W H E N | A | C H I L D | B E E N | S O | W O R T H Y | O F | A T T E N T I O N | T H A T | H I S | F A C E | H A D | B E E N | S U B J E C T E D | T O | T R A N S M U T A T I O N | W H Y | N O T | +A C C O R D I N G | T O | A L L | A P P E A R A N C E | I N D U S T R I O U S | M A N I P U L A T O R S | O F | C H I L D R E N | H A D | W O R K E D | U P O N | H I S | F A C E | +G W Y N P L A I N E | W A S | A | M O U N T E B A N K | +B U T | I S | L A U G H T E R | A | S Y N O N Y M | O F | J O Y | +S U C H | P E R F E C T | C O M P L E T E N E S S | I S | N O T | I N | N A T U R E | +A N | E V E R L A S T I N G | L A U G H | +H E | S H O W E D | H I M S E L F | O N | T H E | P L A T F O R M | +T H E | W H O L E | O F | E X I S T E N C E | R E S E M B L E S | A | L E T T E R | M O D I F I E D | I N | T H E | P O S T S C R I P T | +T H E | M A N I C H A E A N S | B E L I E V E D | T H E | A B S O L U T E | O C C A S I O N A L L Y | G I V E S | W A Y | A N D | T H A T | G O D | H I M S E L F | S O M E T I M E S | A B D I C A T E S | F O R | A | T I M E | S O | A L S O | O F | T H E | W I L L | +N O | O N E | C O U L D | E S C A P E | F R O M | T H I S | R I C T U S | +I T | W A S | G W Y N P L A I N E ' S | L A U G H | W H I C H | C R E A T E D | T H E | L A U G H T E R | O F | O T H E R S | Y E T | H E | D I D | N O T | L A U G H | H I M S E L F | +I T S | Y E L L O W | B R I S T L E S | R A T H E R | A | M A N E | T H A N | A | H E A D | O F | H A I R | C O V E R E D | A N D | C O N C E A L E D | A | L O F T Y | B R O W | E V I D E N T L Y | M A D E | T O | C O N T A I N | T H O U G H T | +P H O E B E | C O O K E D | V E N U S | S C R U B B E D | T H E | T E M P L E | +W H A T | T R U E | T H I N G S | A R E | T O L D | I N | S T O R I E S | +U N K N O W N | P E O P L E | H A D | W O R K E D | U P O N | H I S | F A C E | H E | O N | T H E | O T H E R | H A N D | H A D | W O R K E D | O N | H I S | M I N D | A N D | B E H I N D | T H I S | W E L L | E X E C U T E D | M A S K | H E | H A D | P L A C E D | A L L | T H A T | H E | C O U L D | O F | T H O U G H T | +U R S U S | W A S | I N | E V E R Y T H I N G | I N | T H E | P I E C E | I N | T H E | C O M P A N Y | I N | T H E | K I T C H E N | I N | T H E | O R C H E S T R A | +T H E | C A R A V A N | W A S | D I V I D E D | I N T O | T H R E E | C O M P A R T M E N T S | P A R T I T I O N E D | F R O M | E A C H | O T H E R | +T H I S | W A S | T H E | O L D | E S T A B L I S H M E N T | O F | U R S U S | I T S | P R O P O R T I O N S | A U G M E N T E D | B Y | S U C C E S S | A N D | I M P R O V E D | F R O M | A | W R E T C H E D | B O O T H | I N T O | A | T H E A T R E | +T H I S | G R E E N | C O L O U R | H A D | S U C C E E D E D | I N | D R A W I N G | A 
T T E N T I O N | T O | T H E | C A R R I A G E | W H I C H | W A S | K N O W N | I N | A L L | T H E | F A I R | G R O U N D S | A S | T H E | G R E E N | B O X | +U R S U S | W A S | T H E | P O E T | O F | T H E S E | M A G I C A L | R E P R E S E N T A T I O N S | H E | W R O T E | T H E | P I E C E S | +T H I S | H U T | I N | A | C O R N E R | A T | T H E | B A C K | T O | T H E | R I G H T | O F | T H E | D O O R | S E R V E D | A S | B E D C H A M B E R | A N D | D R E S S I N G | R O O M | T O | U R S U S | A N D | G W Y N P L A I N E | +T H E | A S T O N I S H M E N T | W I T H | W H I C H | T H E | V I L L A G E R S | R E G A R D E D | T H I S | M A C H I N E | W A S | O V E R W H E L M I N G | +O N | T H E | R O O F | F R O M | A | T U B E | P A I N T E D | G R E E N | L I K E | T H E | R E S T | S M O K E | A R O S E | +I N | G W Y N P L A I N E | E V I L | T H O U G H T S | N E V E R | R I P E N E D | A N D | H E | H A D | T H E R E F O R E | N O | R E M O R S E | +T H E | C U R I O S I T Y | O F | O N E | P L A C E | E X H A U S T E D | T H E Y | P A S S E D | O N | T O | A N O T H E R | +S O M E | B E L I E V E D | I T | T O | B E | N A T U R A L | O T H E R S | D E C L A R E D | I T | T O | B E | A R T I F I C I A L | A N D | A S | C O N J E C T U R E | W A S | A D D E D | T O | R E A L I T Y | E V E R Y W H E R E | A T | E V E R Y | C R O S S | R O A D | O N | T H E | J O U R N E Y | I N | A L L | T H E | G R O U N D S | O F | F A I R S | A N D | F E T E S | T H E | C R O W D | R A N | A F T E R | G W Y N P L A I N E | +T H E N | I | L O O K | P E R H A P S | L I K E | W H A T | I | A M | +F O R | T H E S E | R E A D | F I B I | A N D | V I N O S | T H A T | W E | M A Y | C O N F O R M | T O | E N G L I S H | P R O N U N C I A T I O N | +F R O M | S I X T E E N | E I G H T Y | T O | S E V E N T E E N | O | F O U R | A | G R E A T | C H A N G E | H A D | T A K E N | P L A C E | +T H E | W H E E L S | W E R E | A L L | O F | T H E | S A M E | S I Z E | A N D | H I G H | A S | W A G O N | W H E E L S | +A | L O F T | U N D E R | T H E | A R C H | O F | T H E | R O O F | C O N T A I N E D | T H E | S C E N E S | A N D | O N | O P E N I N G | A | T R A P | D O O R | L A M P S | A P P E A R E D | P R O D U C I N G | W O N D E R S | O F | L I G H T | +T H I S | O P E N I N G | L O O K E D | F O R | A L L | T H E | W O R L D | L I K E | A | M O U T H | O F | H E L L | I N | T H E | W O R D S | O F | T H E | I T I N E R A N T | P U R I T A N | P R E A C H E R S | W H O | T U R N E D | A W A Y | F R O M | I T | W I T H | H O R R O R | +T H E | E F F E C T | O F | H I S | A P P E A R A N C E | H A D | B E E N | S U R P R I S I N G | +U R S U S | A N D | H O M O | T O O K | C H A R G E | O F | E A C H | O T H E R | +W H A T | W A S | T H I S | N O T H I N G | +T H I S | F O R T U N E | H A D | A L L O W E D | U R S U S | W H O | W A S | T H E | A D M I N I S T R A T O R | O F | G W Y N P L A I N E ' S | S U C C E S S | T O | H A V E | T H E | C H A R I O T | O F | H I S | D R E A M S | C O N S T R U C T E D | T H A T | I S | T O | S A Y | A | C A R A V A N | L A R G E | E N O U G H | T O | C A R R Y | A | T H E A T R E | A N D | T O | S O W | S C I E N C E | A N D | A R T | I N | T H E | H I G H W A Y S | +I T | W A S | E S T A B L I S H E D | A T | S O U T H W A R K | +B Y | T H E | S I D E | O F | T H E | D O O R | W A S | C O N S T R U C T E D | O F F | H A N D | B Y | M E A N S | O F | A N | E M P T Y | B A R R E L | A | B O X | F O R | T H E | M O N E Y | T A K E R | W H O | W A 
S | S O M E T I M E S | F I B I | A N D | S O M E T I M E S | V I N O S | +H E | D I D | N O T | C O M E | E V E R Y | E V E N I N G | B U T | W H E N | H E | C A M E | H E | L E D | T H E | P U B L I C | A P P L A U S E | G R E W | I N T O | A C C L A M A T I O N | S U C C E S S | R O S E | N O T | T O | T H E | R O O F | F O R | T H E R E | W A S | N O N E | B U T | T O | T H E | C L O U D S | F O R | T H E R E | W E R E | P L E N T Y | O F | T H E M | +O N E | E V E N I N G | U R S U S | W A S | I N | T H E | S I D E | S C E N E | W H I C H | W A S | T H E | K I T C H E N | D O O R | O F | T H E | G R E E N | B O X | S E E I N G | M A S T E R | N I C L E S S | S T A N D I N G | B Y | H I M | S H O W E D | H I M | T H I S | M A N | I N | T H E | C R O W D | A N D | A S K E D | H I M | +H E | E N T E R E D | H E A V E N | O N L Y | B Y | T H E | A R T I S T S | D O O R | +W H I C H | C L O U D S | S E E I N G | T H A T | T H E R E | W A S | N O | R O O F | S O M E T I M E S | W E P T | O V E R | T H E | M A S T E R P I E C E | O F | U R S U S | +A L L | S O U T H W A R K | R A N | I N | C R O W D S | T O | A D M I R E | T H E | L A U G H I N G | M A N | +E V E N | T H I S | C O M E D I A N | O F | J A W S | A N D | C L A W S | W A S | E C L I P S E D | I N | S U C C E S S | +W H A T | A | P I T Y | T H A T | H E | S H O U L D | N O T | B E | A | L O R D | +T H A T | S U C C E S S | W A S | P R O D I G I O U S | S T I L L | I T | R E M A I N E D | L O C A L | +A T | E V E R Y | P E R F O R M A N C E | T H E | Y A R D | O F | T H E | I N N | T R A N S F O R M E D | I N T O | A | P I T | W A S | F I L L E D | W I T H | A | R A G G E D | A N D | E N T H U S I A S T I C | A U D I E N C E | +I T | T O O K | A | H U N D R E D | A N D | T H I R T Y | Y E A R S | F O R | T H E | N A M E | O F | S H A K E S P E A R E | T O | P E N E T R A T E | F R O M | E N G L A N D | I N T O | F R A N C E | +W E | A R E | I N | L O N D O N | S A I D | U R S U S | W E | M U S T | B E | P R E P A R E D | F O R | T H E | G E N T R Y | +T H E | M E R R Y | A N D R E W S | A N D | M O U N T E B A N K S | O F | T A R R I N Z E A U | F I E L D | W E R E | A G H A S T | A T | G W Y N P L A I N E | +T H E | E M P T Y I N G | O F | T A N K A R D S | D I D | N O T | D E C R E A S E | T H E I R | S U C C E S S | +S A I N T | P A U L | I S | A | S A I N T | O N L Y | W I T H | E X T E N U A T I N G | C I R C U M S T A N C E S | +H I S | E N T H U S I A S M | C A U S E D | U R S U S | T O | R E M A R K | T H I S | M A N | A N D | G W Y N P L A I N E | T O | O B S E R V E | H I M | +I T | M I G H T | H A V E | B E E N | O R D E R E D | F O R | T H E | G R E E N | B O X | +T H E | P L A C A R D | G W Y N P L A I N E | T H E | L A U G H I N G | M A N | T A K E N | F R O M | I T S | N A I L | I N | T H E | G R E E N | B O X | W A S | H U N G | U P | C L O S E | T O | T H E | S I G N | O F | T H E | I N N | +H E | W O U L D | M A K E | A | F A M O U S | S C O U N D R E L | +G W Y N P L A I N E | A T E | U P | T H E I R | P U B L I C | +W I T H | T H A T | E X C E P T I O N | T H E I R | S U C C E S S | B E C A M E | S O | G R E A T | T H A T | N O | M O U N T E B A N K | M E M O R Y | C O U L D | R E C A L L | I T S | P A R A L L E L | +I T | W A S | A | T H E A T R E | R E A D Y | M A D E | +A T | T H A T | H O U R | T H E R E | W A S | N O | O N E | I N | T H E | F A I R | G R O U N D | E X C E P T | P E R H A P S | S O M E | R E E L I N G | D R U N K A R D | M A K I N G | S T A G G E R I N G | S H A D O W S | I N | D A R K | C 
O R N E R S | +A G A I N S T | T H I S | W A L L | W A S | P L A C E D | T H E | G R E E N | B O X | W H I C H | T H E Y | W E R E | A B L E | T O | D R A W | I N T O | T H E | Y A R D | O W I N G | T O | T H E | H E I G H T | O F | T H E | G A T E | +T H E Y | B E G A N | T H E I R | P E R F O R M A N C E S | +T H E | G L O R Y | O F | G W Y N P L A I N E | H A D | N O T | P A S S E D | L O N D O N | B R I D G E | +T H E S E | W E R E | R E M A R K A B L E | T A L E N T S | +T H I S | C O N N O I S S E U R | W A S | S U D D E N L Y | F A S C I N A T E D | A N D | H A D | A D O P T E D | T H E | L A U G H I N G | M A N | +T H E Y | H A D | A | G R E A T | F R I E N D | I N | T H I S | U N K N O W N | V I S I T O R | +B E S I D E S | T H E | S M A L L | F R Y | T H E | S W A L L O W E R S | O F | S W O R D S | A N D | T H E | G R I M A C E | M A K E R S | R E A L | P E R F O R M A N C E S | T O O K | P L A C E | O N | T H E | G R E E N | +B E S I D E S | T H I S | H E | H A R A N G U E D | L I K E | C I C E R O | A S | W E | H A V E | J U S T | S E E N | S O L D | H I S | D R U G S | A T T E N D E D | S I C K N E S S | A N D | E V E N | H E A L E D | T H E | S I C K | +T H E | D O M E | O F | S A I N T | P A U L ' S | W A S | A | D E L I G H T | T O | U R S U S | +Y E T | T H E R E | A R E | A | F E W | P R I V A T E | R O O M S | W H I C H | C O N T A I N | A | T A B L E | S U R R O U N D E D | W I T H | B E N C H E S | I N | W H I C H | A | R E S P E C T A B L E | F A M I L Y | O R | A | F E W | F R I E N D S | C A N | E N J O Y | T H E M S E L V E S | I N | A | D E C E N T | W A Y | +N E V E R | F E A R | Y O U | S H A L L | S E E | H I M | A G A I N | T O | M O R R O W | +N O T | V E R Y | L O N G | I | A N S W E R E D | A N D | I | W I L L | T E A C H | Y O U | A S | Y O U | W I S H | A L T H O U G H | T H E | H E R M I T | A S S U R E D | M E | T H A T | I | W O U L D | D I E | S U D D E N L Y | W I T H I N | T H R E E | D A Y S | I F | I | C O M M U N I C A T E D | M Y | S C I E N C E | T O | A N Y O N E | B U T | I | H A V E | N O | F A I T H | W H A T E V E R | I N | T H A T | P R E D I C T I O N | +T H E Y | A L L | A S K E D | M E | H O W | L O N G | I | W O U L D | R E Q U I R E | T O | T E A C H | T H E M | T H E | R U L E S | O F | M Y | S U B L I M E | C A L C U L U S | +U N W I L L I N G | T O | H U R T | H I S | V A N I T Y | B Y | T E L L I N G | H I M | T H A T | H E | W A S | M I S T A K E N | I | T O O K | T H E | W I L D | R E S O L U T I O N | O F | I N F O R M I N G | H I M | I N | T H E | P R E S E N C E | O F | H I S | T W O | F R I E N D S | T H A T | I | P O S S E S S E D | A | C E R T A I N | N U M E R A L | C A L C U L U S | W H I C H | G A V E | A N S W E R S | A L S O | I N | N U M B E R S | T O | A N Y | Q U E S T I O N S | I | L I K E D | T O | P U T | +T H E Y | D I D | N O T | K N O W | W H O | I | W A S | A N D | D I D | N O T | L I K E | T O | A S K | M E | W H I L S T | I | T H O U G H T | I T | B E T T E R | T O | P R E S E R V E | A | M O D E S T | S I L E N C E | +W E | W O U L D | V E R Y | O F T E N | S P E N D | T H E | W H O L E | N I G H T | R A M B L I N G | A B O U T | T H E | C I T Y | I N V E N T I N G | A N D | C A R R Y I N G | I N T O | E X E C U T I O N | T H E | M O S T | I M P E R T I N E N T | P R A C T I C A L | J O K E S | +I F | T H E | Q U E S T I O N | W A S | S O | O B S C U R E | T H A T | I | C O U L D | N O T | M A K E | O U T | T H E | S E N S E | O F | I T | I T | W A S | N A T U R A L | T H A T | I | S H O U L D | N O T | 
U N D E R S T A N D | T H E | A N S W E R | +W H E R E | I S | M Y | H U S B A N D | +I | D E C L A R E D | M Y S E L F | Q U I T E | W I L L I N G | F O R | I T | W A S | N E C E S S A R Y | T O | B R A Z E N | I T | O U T | A F T E R | H A V I N G | V E N T U R E D | A S | F A R | A S | I | H A D | D O N E | +I | F E L T | T H A T | I N | M Y | F I R S T | P R O F E S S I O N | A S | I | W A S | N O T | B L E S S E D | W I T H | T H E | V O C A T I O N | N E C E S S A R Y | T O | I T | I | S H O U L D | H A V E | S U C C E E D E D | O N L Y | B Y | D I N T | O F | H Y P O C R I S Y | A N D | I | S H O U L D | H A V E | B E E N | D E S P I C A B L E | I N | M Y | O W N | E S T I M A T I O N | E V E N | I F | I | H A D | S E E N | T H E | P U R P L E | M A N T L E | O N | M Y | S H O U L D E R S | F O R | T H E | G R E A T E S T | D I G N I T I E S | C A N N O T | S I L E N C E | A | M A N ' S | O W N | C O N S C I E N C E | +B U T | A L T H O U G H | B E L I E V I N G | F U L L Y | I N | M Y | O R A C L E S | T H E Y | W E R E | T O O | K I N D | H E A R T E D | T O | T H I N K | T H E M | T H E | W O R K | O F | T H E | D E V I L | A N D | I T | S U I T E D | T H E I R | N A T U R A L | G O O D N E S S | B E T T E R | T O | B E L I E V E | M Y | A N S W E R S | I N S P I R E D | B Y | S O M E | H E A V E N L Y | S P I R I T | +O U R | S C A N D A L O U S | P R O C E E D I N G S | O F T E N | E X P O S E D | U S | T O | T H E | G R E A T E S T | D A N G E R | +T H E Y | B E L I E V E D | T H A T | T H R O U G H | M E | T H E Y | P O S S E S S E D | T H E | P H I L O S O P H E R ' S | S T O N E | T H E | U N I V E R S A L | P A N A C E A | T H E | I N T E R C O U R S E | W I T H | A L L | T H E | E L E M E N T A R Y | H E A V E N L Y | A N D | I N F E R N A L | S P I R I T S | T H E Y | H A D | N O | D O U B T | W H A T E V E R | T H A T | T H A N K S | T O | M Y | S U B L I M E | S C I E N C E | T H E Y | C O U L D | F I N D | O U T | T H E | S E C R E T S | O F | E V E R Y | G O V E R N M E N T | I N | E U R O P E | +Y O U R | A P A R T M E N T | I S | R E A D Y | Y O U | M A Y | S E N D | Y O U R | C L O T H E S | Y O U | S H A L L | H A V E | A | S E R V A N T | A | G O N D O L A | A T | Y O U R | O R D E R S | M Y | O W N | T A B L E | A N D | T E N | S E Q U I N S | A | M O N T H | +I | T O L D | H I M | A N D | H E | I N S I S T E D | U P O N | M Y | C O M I N G | W I T H | H I M | I N | T H E | G O N D O L A | S A Y I N G | T H A T | H E | W O U L D | L E A V E | M E | A T | M Y | H O U S E | +W H E N E V E R | W E | C O U L D | C O N T R I V E | T O | G E T | I N T O | A | C H U R C H | T O W E R | W E | T H O U G H T | I T | G R E A T | F U N | T O | F R I G H T E N | A L L | T H E | P A R I S H | B Y | R I N G I N G | T H E | A L A R M | B E L L | A S | I F | S O M E | F I R E | H A D | B R O K E N | O U T | B U T | T H A T | W A S | N O T | A L L | W E | A L W A Y S | C U T | T H E | B E L L | R O P E S | S O | T H A T | I N | T H E | M O R N I N G | T H E | C H U R C H W A R D E N S | H A D | N O | M E A N S | O F | S U M M O N I N G | T H E | F A I T H F U L | T O | E A R L Y | M A S S | +I | J U M P E D | O U T | O F | T H E | G O N D O L A | A N D | F O U N D | M Y S E L F | O N | T H E | V E R Y | S P O T | W H E R E | T H R E E | Y E A R S | B E F O R E | I | H A D | T A U G H T | R A Z E T T A | S U C H | A | F O R C I B L E | L E S S O N | I | E N Q U I R E D | F O R | A | S U R G E O N | A T | T H E | F I R S T | C O F F E E | H O U S E | A N D | R A N | T O | T H E | 
H O U S E | T H A T | W A S | P O I N T E D | O U T | T O | M E | +W E | D I D | T H E | S A M E | W I T H | P H Y S I C I A N S | W H O M | W E | O F T E N | S E N T | H A L F | D R E S S E D | T O | S O M E | N O B L E M A N | W H O | W A S | E N J O Y I N G | E X C E L L E N T | H E A L T H | +B E S I D E S | I | F O U N D | I T | V E R Y | F L A T T E R I N G | T O | M Y | V A N I T Y | T O | B E C O M E | T H E | S U B J E C T | O F | T H E | S P E C U L A T I V E | C H A T T E R I N G | O F | E M P T Y | F O O L S | W H O | H A V I N G | N O T H I N G | E L S E | T O | D O | A R E | A L W A Y S | T R Y I N G | T O | F I N D | O U T | T H E | C A U S E | O F | E V E R Y | M O R A L | P H E N O M E N O N | T H E Y | M E E T | W I T H | W H I C H | T H E I R | N A R R O W | I N T E L L E C T | C A N N O T | U N D E R S T A N D | +H E | E N T R E A T E D | M E | T O | T E L L | H I M | T H E | T R U T H | +W H A T | E X T R A O R D I N A R Y | T H I N G S | W I L L | S O M E T I M E S | O C C U R | F R O M | M E R E | C H A N C E | O R | F R O M | T H E | F O R C E | O F | C I R C U M S T A N C E S | +I | P I C K E D | I T | U P | A N D | C O M I N G | U P | T O | H I M | J U S T | A S | H E | W A S | G O I N G | D O W N | T H E | S T E P S | I | H A N D E D | I T | T O | H I M | +T H I N K I N G | I | H A D | A | R I G H T | T O | W A T C H | T H E | S I C K | M A N | I | S E T T L E D | M Y S E L F | N E A R | H I S | B E D | T O | G I V E | H I M | E V E R Y | C A R E | H E | R E Q U I R E D | +W E | T O O K | O U R | T H R E E | P R I S O N E R S | T O | A | L A R G E | B O A T | +T A K I N G | E V E R Y T H I N G | U P O N | M Y S E L F | I | O R D E R E D | A | S E R V A N T | T O | H U R R Y | O U T | F O R | A | P H Y S I C I A N | W H O | C A M E | I N | A | S H O R T | T I M E | A N D | O R D E R E D | T H E | P A T I E N T | T O | B E | B L E D | A G A I N | T H U S | A P P R O V I N G | T H E | F I R S T | B L E E D I N G | P R E S C R I B E D | B Y | M E | +W I T H | A N | E D U C A T I O N | W H I C H | O U G H T | T O | H A V E | E N S U R E D | M E | A N | H O N O U R A B L E | S T A N D I N G | I N | T H E | W O R L D | W I T H | S O M E | I N T E L L I G E N C E | W I T | G O O D | L I T E R A R Y | A N D | S C I E N T I F I C | K N O W L E D G E | A N D | E N D O W E D | W I T H | T H O S E | A C C I D E N T A L | P H Y S I C A L | Q U A L I T I E S | W H I C H | A R E | S U C H | A | G O O D | P A S S P O R T | I N T O | S O C I E T Y | I | F O U N D | M Y S E L F | A T | T H E | A G E | O F | T W E N T Y | T H E | M E A N | F O L L O W E R | O F | A | S U B L I M E | A R T | I N | W H I C H | I F | G R E A T | T A L E N T | I S | R I G H T L Y | A D M I R E D | M E D I O C R I T Y | I S | A S | R I G H T L Y | D E S P I S E D | +H E | H A D | G A M B L E D | A N D | L O S T | A | G R E A T | D E A L | A N D | H I S | B R O T H E R | W A S | H I S | M O S T | B I T T E R | E N E M Y | B E C A U S E | H E | W A S | I N F A T U A T E D | W I T H | T H E | I D E A | T H A T | H E | H A D | T R I E D | T O | P O I S O N | H I M | +I | W A S | C O M P E L L E D | B Y | P O V E R T Y | T O | B E C O M E | A | M E M B E R | O F | A | M U S I C A L | B A N D | I N | W H I C H | I | C O U L D | E X P E C T | N E I T H E R | E S T E E M | N O R | C O N S I D E R A T I O N | A N D | I | W A S | W E L L | A W A R E | T H A T | I | S H O U L D | B E | T H E | L A U G H I N G | S T O C K | O F | T H E | P E R S O N S | W H O | H A D | K N O W N | M E | A S | A | D O C T O R | I N | 
D I V I N I T Y | A S | A N | E C C L E S I A S T I C | A N D | A S | A N | O F F I C E R | I N | T H E | A R M Y | A N D | H A D | W E L C O M E D | M E | I N | T H E | H I G H E S T | S O C I E T Y | +I | T H R E W | M Y S E L F | A T | H I S | F E E T | T O | A S S U R E | H I M | O F | M Y | G R A T I T U D E | A N D | E M B R A C E D | H I M | C A L L I N G | H I M | M Y | F A T H E R | +I T | W E N T | O N | T O | S A Y | T H A T | T H E | T W O | M E N | W H O | H A D | C A R R I E D | H E R | O F F | H A D | T A K E N | H E R | T O | S U C H | A | P L A C E | W H E R E | T H E Y | H A D | A N | H O U R | L A T E R | B E E N | M E T | B Y | T H E | O T H E R | S I X | A N D | T H A T | T H E Y | H A D | A L L | R E P A I R E D | T O | T H E | T W O | S W O R D S | W H E R E | T H E Y | H A D | S P E N T | A N | H O U R | I N | D R I N K I N G | +T H E | T H R E E | F R I E N D S | W E R E | A S T O U N D E D | +M Y | R E A D E R S | M A Y | I M A G I N E | W H E T H E R | W E | F E L T | I N C L I N E D | T O | L A U G H | W H E N | T H E | C H A R M I N G | C R E A T U R E | B A D E | U S | G O O D | N I G H T | T H A N K I N G | U S | A L L | W I T H | P E R F E C T | G O O D | F A I T H | +W H O E V E R | Y O U | M A Y | B E | I | A M | I N D E B T E D | T O | Y O U | F O R | M Y | L I F E | +T H E R E | W A S | N O | C O W A R D L Y | T R A I T O R | A M O N G S T | U S | A L T H O U G H | W E | W E R E | A L L | P O O R | B U T | F E A R | H A D | I T S | E F F E C T | A N D | O U R | N O C T U R N A L | P R A N K S | W E R E | N O T | R E N E W E D | +B E S I D E S | I | W A S | O F | O P I N I O N | T H A T | A | M A N ' S | P R O F E S S I O N | W H A T E V E R | I T | M I G H T | B E | O U G H T | T O | S U P P L Y | H I M | W I T H | E N O U G H | M O N E Y | T O | S A T I S F Y | A L L | H I S | W A N T S | A N D | T H E | V E R Y | P O O R | P A Y | O F | A N | O F F I C E R | W O U L D | N E V E R | H A V E | B E E N | S U F F I C I E N T | T O | C O V E R | M Y | E X P E N S E S | B E C A U S E | M Y | E D U C A T I O N | H A D | G I V E N | M E | G R E A T E R | W A N T S | T H A N | T H O S E | O F | O F F I C E R S | I N | G E N E R A L | +Y O U | N E E D | N O T | T H I N K | O F | T H E | F U T U R E | T H I N K | O N L Y | O F | E N J O Y I N G | Y O U R S E L F | A N D | T A K E | M E | A S | Y O U R | A D V I S E R | I N | E V E R Y T H I N G | T H A T | M A Y | H A P P E N | T O | Y O U | I N | E V E R Y T H I N G | Y O U | M A Y | W I S H | T O | U N D E R T A K E | A N D | Y O U | M A Y | B E | C E R T A I N | O F | A L W A Y S | F I N D I N G | M E | Y O U R | F R I E N D | +T H I S | I S | T H E | A M U S I N G | A D V E N T U R E | W H I C H | C L O S E D | O U R | E X P L O I T S | +I | O B E Y E D | I M P L I C I T L Y | A N D | M E T | Y O U R | E X C E L L E N C Y | +I | R U B B E D | I T | W I T H | A L L | M Y | S T R E N G T H | B U T | H E | T O L D | M E | I N | A | S O R T | O F | I N D I S T I N C T | W H I S P E R | T H A T | T H E | N U M B N E S S | W A S | S P R E A D I N G | A L L | A L O N G | T H E | L E F T | S I D E | A N D | T H A T | H E | W A S | D Y I N G | +I N | E V E R Y | O N E | O F | T H E | S E V E N T Y | T W O | P A R I S H E S | O F | T H E | C I T Y | O F | V E N I C E | T H E R E | I S | A | L A R G E | P U B L I C | H O U S E | C A L L E D | M A G A Z Z I N O | +T H E | W A I T E R | O F | T H E | M A G A Z Z I N O | C A M E | T O | B E | P A I D | A N D | O U R | C H I E F | G A V E | H I M | W H A T | W A S | D U E | 
E N J O I N I N G | S I L E N C E | U N D E R | P E N A L T Y | O F | D E A T H | +A S | F O R | T H E | E U C H A R I S T | T R A N S U B S T A N T I A T I O N | T H E | R E A L | P R E S E N C E | I T | W A S | A L L | N O | M Y S T E R Y | T O | T H E M | B U T | P A L P A B L E | E V I D E N C E | A N D | Y E T | T H E Y | W E R E | N O T | J E S U I T S | +T H E Y | W E R E | N O T | O N L Y | G O O D | C H R I S T I A N S | A N D | F A I T H F U L | T O | T H E | C H U R C H | B U T | E V E N | R E A L | D E V O T E E S | A N D | F U L L | O F | S C R U P L E S | +D E L I G H T E D | W I T H | S U C H | A | F O R T U N A T E | R E S U L T | W E | L A Y | D O W N | A G A I N | +T H E | P H Y S I C I A N | W H O | A T T E N D E D | H I M | W A S | N A M E D | T E R R O | H E | T H O U G H T | B Y | S O M E | P E C U L I A R | T R A I N | O F | R E A S O N I N G | T H A T | H E | C O U L D | C U R E | H I M | B Y | A P P L Y I N G | A | M E R C U R I A L | O I N T M E N T | T O | T H E | C H E S T | T O | W H I C H | N O | O N E | R A I S E D | A N Y | O B J E C T I O N | +T W O | D A Y S | A F T E R W A R D S | O U R | N O C T U R N A L | O R G Y | B E G A N | T O | B E | T A L K E D | O F | +H E | W R O T E | T H E | Q U E S T I O N | A N D | G A V E | I T | T O | M E | I | R E A D | I T | I | C O U L D | N O T | U N D E R S T A N D | E I T H E R | T H E | S U B J E C T | O R | T H E | M E A N I N G | O F | T H E | W O R D S | B U T | I T | D I D | N O T | M A T T E R | I | H A D | T O | G I V E | A N | A N S W E R | +I | M I G H T | B E | T O L D | T H A T | I F | I | H A D | W I S H E D | T O | F O L L O W | T H E | R U L E S | O F | P U R E | M O R A L I T Y | I | O U G H T | E I T H E R | T O | H A V E | D E C L I N E D | I N T I M A T E | I N T E R C O U R S E | W I T H | T H E M | O R | T O | H A V E | U N D E C E I V E D | T H E M | +B U T | T O | H A V E | A | F R I E N D | A N D | T O | B E | T R U E | U N D E R | A N Y | A N D | A L L | T R I A L S | I S | T H E | M A R K | O F | A | M A N | +B E F O R E | T H I S | C A L A M I T Y | C A M E | U P O N | U S | Y O U | C O U L D | N O T | F I N D | A N Y W H E R E | A | H A P P I E R | H O M E | T H A N | T H A T | C R E A T E D | B Y | T H E | I N D I A N | W O M A N | +T H I S | W I L D | M O T H E R | H A S | N O T | O N L Y | T H E | E X P E R I E N C E | O F | H E R | M O T H E R | A N D | G R A N D M O T H E R | A N D | T H E | A C C E P T E D | R U L E S | O F | H E R | P E O P L E | F O R | A | G U I D E | B U T | S H E | H U M B L Y | S E E K S | T O | L E A R N | A | L E S S O N | F R O M | A N T S | B E E S | S P I D E R S | B E A V E R S | A N D | B A D G E R S | +H E | H A D | N E I T H E R | A | N A T I O N A L | A R M Y | N O R | A N | O R G A N I Z E D | C H U R C H | +H E R | A T T I T U D E | A N D | S E C R E T | M E D I T A T I O N S | M U S T | B E | S U C H | A S | T O | I N S T I L L | I N T O | T H E | R E C E P T I V E | S O U L | O F | T H E | U N B O R N | C H I L D | T H E | L O V E | O F | T H E | G R E A T | M Y S T E R Y | A N D | A | S E N S E | O F | B R O T H E R H O O D | W I T H | A L L | C R E A T I O N | +I N D E E D | T H E | D I S T I N C T I V E | W O R K | O F | B O T H | G R A N D P A R E N T S | I S | T H A T | O F | A C Q U A I N T I N G | T H E | Y O U T H | W I T H | T H E | N A T I O N A L | T R A D I T I O N S | A N D | B E L I E F S | +O U R | H O N O R | I S | T H E | G U A R A N T E E | F O R | H I S | S A F E T Y | S O | L O N G | A S | H E | I S | W I T H I N | T H E | C A 
M P | +W H E N | S H E | F E L L | T H E | W H O L E | R A C E | F E L L | W I T H | H E R | +H E | C U T S | O F F | T H E | C H O I C E S T | M O R S E L | O F | T H E | M E A T | A N D | C A S T S | I T | I N T O | T H E | F I R E | T H E | P U R E S T | A N D | M O S T | E T H E R E A L | E L E M E N T | +T H E | F A M I L Y | W A S | N O T | O N L Y | T H E | S O C I A L | U N I T | B U T | A L S O | T H E | U N I T | O F | G O V E R N M E N T | +L O V E | B E T W E E N | M A N | A N D | W O M A N | I S | F O U N D E D | O N | T H E | M A T I N G | I N S T I N C T | A N D | I S | N O T | F R E E | F R O M | D E S I R E | A N D | S E L F | S E E K I N G | +W H E N | H E | B E C O M E S | A N | O L D | M A N | H E | L O V E S | T O | M A K E | A | N O T A B L E | E F F O R T | T O | P R O V E | H I S | G R A T I T U D E | +I N | D U E | T I M E | T H E | C H I L D | T A K E S | O F | H I S | O W N | A C C O R D | T H E | A T T I T U D E | O F | P R A Y E R | A N D | S P E A K S | R E V E R E N T L Y | O F | T H E | P O W E R S | +T H E | O R D E A L | I S | B E S T | M E T | A L O N E | W H E R E | N O | C U R I O U S | O R | P I T Y I N G | E Y E S | E M B A R R A S S | H E R | W H E R E | A L L | N A T U R E | S A Y S | T O | H E R | S P I R I T | T I S | L O V E | T I S | L O V E | T H E | F U L F I L L I N G | O F | L I F E | +H I S | D A I L Y | D E V O T I O N S | W E R E | M O R E | N E C E S S A R Y | T O | H I M | T H A N | D A I L Y | F O O D | +W H E N E V E R | I N | T H E | C O U R S E | O F | T H E | D A I L Y | H U N T | T H E | R E D | H U N T E R | C O M E S | U P O N | A | S C E N E | T H A T | I S | S T R I K I N G L Y | B E A U T I F U L | O R | S U B L I M E | A | B L A C K | T H U N D E R C L O U D | W I T H | T H E | R A I N B O W ' S | G L O W I N G | A R C H | A B O V E | T H E | M O U N T A I N | A | W H I T E | W A T E R F A L L | I N | T H E | H E A R T | O F | A | G R E E N | G O R G E | A | V A S T | P R A I R I E | T I N G E D | W I T H | T H E | B L O O D | R E D | O F | S U N S E T | H E | P A U S E S | F O R | A N | I N S T A N T | I N | T H E | A T T I T U D E | O F | W O R S H I P | +T H I S | B O N D | I S | B E T W E E N | M A N | A N D | M A N | I S | U S U A L L Y | F O R M E D | I N | E A R L Y | Y O U T H | A N D | C A N | O N L Y | B E | B R O K E N | B Y | D E A T H | +T H E | R E M O T E R | D E G R E E S | O F | K I N S H I P | W E R E | F U L L Y | R E C O G N I Z E D | A N D | T H A T | N O T | A S | A | M A T T E R | O F | F O R M | O N L Y | F I R S T | C O U S I N S | W E R E | K N O W N | A S | B R O T H E R S | A N D | S I S T E R S | T H E | N A M E | O F | C O U S I N | C O N S T I T U T E D | A | B I N D I N G | C L A I M | A N D | O U R | R I G I D | M O R A L I T Y | F O R B A D E | M A R R I A G E | B E T W E E N | C O U S I N S | I N | A N Y | K N O W N | D E G R E E | O R | I N | O T H E R | W O R D S | W I T H I N | T H E | C L A N | +T H E | H O S P I T A L I T Y | O F | T H E | W I G W A M | I S | O N L Y | L I M I T E D | B Y | T H E | I N S T I T U T I O N | O F | W A R | +A T | A N O T H E R | T I M E | W H E N | I | W A S | F O U R T E E N | Y E A R S | O L D | W E | H A D | J U S T | L E F T | F O R T | E L L I S | O N | T H E | A S S I N I B O I N E | R I V E R | A N D | M Y | Y O U N G E S T | U N C L E | H A D | S E L E C T E D | A | F I N E | S P O T | F O R | O U R | N I G H T | C A M P | +A S | A | S P E C I A L | M A R K | O F | R E S P E C T | T H E | B O D Y | O F | A | Y O U N G | W O M A N | O R | A | W A R 
R I O R | W A S | S O M E T I M E S | L A I D | O U T | I N | S T A T E | I N | A | N E W | T E E P E E | W I T H | T H E | U S U A L | H O U S E H O L D | A R T I C L E S | A N D | E V E N | W I T H | A | D I S H | O F | F O O D | L E F T | B E S I D E | I T | N O T | T H A T | T H E Y | S U P P O S E D | T H E | S P I R I T | C O U L D | U S E | T H E | I M P L E M E N T S | O R | E A T | T H E | F O O D | B U T | M E R E L Y | A S | A | L A S T | T R I B U T E | +M A N Y | O F | T H E | I N D I A N S | B E L I E V E D | T H A T | O N E | M A Y | B E | B O R N | M O R E | T H A N | O N C E | A N D | T H E R E | W E R E | S O M E | W H O | C L A I M E D | T O | H A V E | F U L L | K N O W L E D G E | O F | A | F O R M E R | I N C A R N A T I O N | +G I V I N G | T H E M S E L V E S | U P | W H O L L Y | T O | T H E I R | G R I E F | T H E Y | A R E | N O | L O N G E R | C O N C E R N E D | A B O U T | A N Y | E A R T H L Y | P O S S E S S I O N | A N D | O F T E N | G I V E | A W A Y | A L L | T H A T | T H E Y | H A V E | T O | T H E | F I R S T | C O M E R S | E V E N | T O | T H E I R | B E D S | A N D | T H E I R | H O M E | +T H I S | W A S | C A R R I E D | O U T | T O | T H E | L E T T E R | +I F | A | M A N | W E R E | S L A I N | I N | B A T T L E | I T | W A S | A N | O L D | C U S T O M | T O | P L A C E | H I S | B O D Y | A G A I N S T | A | T R E E | O R | R O C K | I N | A | S I T T I N G | P O S I T I O N | A L W A Y S | F A C I N G | T H E | E N E M Y | T O | I N D I C A T E | H I S | U N D A U N T E D | D E F I A N C E | A N D | B R A V E R Y | E V E N | I N | D E A T H | +T H E R E | A R E | M A N Y | T R U S T W O R T H Y | M E N | A N D | M E N | O F | C H R I S T I A N | F A I T H | T O | V O U C H | F O R | T H E S E | A N D | S I M I L A R | E V E N T S | O C C U R R I N G | A S | F O R E T O L D | +F I V E | Y E A R S | L A T E R | H E | R E P E A T E D | T H E | S E R V I C E | A N D | A G A I N | S A V E D | H I S | P E O P L E | F R O M | A W F U L | S L A U G H T E R | +T H I S | W A S | O N L Y | O N E | O F | H I S | R E M A R K A B L E | P R O P H E C I E S | +A T | T H E | A G E | O F | A B O U T | S E V E N T Y | F I V E | Y E A R S | H E | S A V E D | H I S | B A N D | F R O M | U T T E R | D E S T R U C T I O N | A T | T H E | H A N D S | O F | T H E I R | A N C E S T R A L | E N E M I E S | B Y | S U D D E N L Y | G I V I N G | W A R N I N G | R E C E I V E D | I N | A | D R E A M | O F | T H E | A P P R O A C H | O F | A | L A R G E | W A R | P A R T Y | +R E I N C A R N A T I O N | A N D | T H E | C O N V E R S E | O F | S P I R I T S | +A T | E V E R Y | M E A L | T I M E | A | D I S H | O F | F O O D | W A S | P L A C E D | U N D E R | I T | A N D | S O M E | P E R S O N | O F | T H E | S A M E | S E X | A N D | A G E | A S | T H E | O N E | W H O | W A S | G O N E | M U S T | A F T E R W A R D | B E | I N V I T E D | I N | T O | P A R T A K E | O F | T H E | F O O D | +A N O T H E R | F A M O U S | M E D I C I N E | M A N | W A S | B O R N | O N | T H E | R U M | R I V E R | A B O U T | O N E | H U N D R E D | A N D | F I F T Y | Y E A R S | A G O | A N D | L I V E D | T O | B E | O V E R | A | C E N T U R Y | O L D | +I T | W A S | P R E P A R E D | B Y | D R E S S I N G | I N | T H E | F I N E S T | C L O T H E S | T O G E T H E R | W I T H | S O M E | P E R S O N A L | P O S S E S S I O N S | A N D | O R N A M E N T S | W R A P P E D | I N | S E V E R A L | R O B E S | A N D | F I N A L L Y | I N | A | S E C U R E | C O V E R I N G | O F | R A 
W | H I D E | +A T | T H E | E N D | O F | A | Y E A R | F R O M | T H E | T I M E | O F | D E A T H | T H E | R E L A T I V E S | M A D E | A | P U B L I C | F E A S T | A N D | G A V E | A W A Y | T H E | C L O T H I N G | A N D | O T H E R | G I F T S | W H I L E | T H E | L O C K | O F | H A I R | W A S | I N T E R R E D | W I T H | A P P R O P R I A T E | C E R E M O N I E S | +T H E R E F O R E | H E | C O U R T S | D E A T H | I N | B A T T L E | O N | T H E | O T H E R | H A N D | H E | W O U L D | R E G A R D | I T | A S | D I S G R A C E F U L | T O | B E | K I L L E D | I N | A | P R I V A T E | Q U A R R E L | +T H E R E | W A S | A | W E L L | K N O W N | S I O U X | W A R | P R O P H E T | W H O | L I V E D | I N | T H E | M I D D L E | O F | T H E | L A S T | C E N T U R Y | S O | T H A T | H E | I S | S T I L L | R E M E M B E R E D | B Y | T H E | O L D | M E N | O F | H I S | B A N D | +N O | D O U B T | M A N Y | P R E D I C T I O N S | H A V E | B E E N | C O L O R E D | T O | S U I T | T H E | N E W | A G E | A N D | U N Q U E S T I O N A B L Y | F A L S E | P R O P H E T S | F A K I R S | A N D | C O N J U R E R S | H A V E | B E C O M E | T H E | P E S T | O F | T H E | T R I B E S | D U R I N G | T H E | T R A N S I T I O N | P E R I O D | +I T | I S | W E L L | K N O W N | T H A T | T H E | A M E R I C A N | I N D I A N | H A D | S O M E H O W | D E V E L O P E D | O C C U L T | P O W E R | A N D | A L T H O U G H | I N | T H E | L A T T E R | D A Y S | T H E R E | H A V E | B E E N | M A N Y | I M P O S T O R S | A N D | A L L O W I N G | F O R | T H E | V A N I T Y | A N D | W E A K N E S S | O F | H U M A N | N A T U R E | I T | I S | F A I R | T O | A S S U M E | T H A T | T H E R E | M U S T | H A V E | B E E N | S O M E | E V E N | I N | T H E | O L D | D A Y S | Y E T | T H E R E | A R E | W E L L | A T T E S T E D | I N S T A N C E S | O F | R E M A R K A B L E | P R O P H E C I E S | A N D | O T H E R | M Y S T I C | P R A C T I C E | +T H E | M E N | B L A C K E N | T H E I R | F A C E S | A N D | W I D O W S | O R | B E R E A V E D | P A R E N T S | S O M E T I M E S | G A S H | T H E I R | A R M S | A N D | L E G S | T I L L | T H E Y | A R E | C O V E R E D | W I T H | B L O O D | +T O | T H E | U N T U T O R E D | S A G E | T H E | C O N C E N T R A T I O N | O F | P O P U L A T I O N | W A S | T H E | P R O L I F I C | M O T H E R | O F | A L L | E V I L S | M O R A L | N O | L E S S | T H A N | P H Y S I C A L | +I N | H I S | O W N | T H O U G H T | H E | R O S E | S U P E R I O R | T O | T H E M | H E | S C O R N E D | T H E M | E V E N | A S | A | L O F T Y | S P I R I T | A B S O R B E D | I N | I T S | S T E R N | T A S K | R E J E C T S | T H E | S O F T | B E D S | T H E | L U X U R I O U S | F O O D | T H E | P L E A S U R E | W O R S H I P I N G | D A L L I A N C E | O F | A | R I C H | N E I G H B O R | +W H E N | H E | R E T U R N E D | T O | T H E | C A M P | H E | M U S T | R E M A I N | A T | A | D I S T A N C E | U N T I L | H E | H A D | A G A I N | E N T E R E D | T H E | V A P O R | B A T H | A N D | P R E P A R E D | H I M S E L F | F O R | I N T E R C O U R S E | W I T H | H I S | F E L L O W S | +I T | W A S | S I L E N T | B E C A U S E | A L L | S P E E C H | I S | O F | N E C E S S I T Y | F E E B L E | A N D | I M P E R F E C T | T H E R E F O R E | T H E | S O U L S | O F | M Y | A N C E S T O R S | A S C E N D E D | T O | G O D | I N | W O R D L E S S | A D O R A T I O N | +T H E | O R I G I N A L | A T T I T U D E | O F | 
T H E | A M E R I C A N | I N D I A N | T O W A R D | T H E | E T E R N A L | T H E | G R E A T | M Y S T E R Y | T H A T | S U R R O U N D S | A N D | E M B R A C E S | U S | W A S | A S | S I M P L E | A S | I T | W A S | E X A L T E D | +T H E | F I R S T | B A M B E D A Y | O R | R E L I G I O U S | R E T R E A T | M A R K E D | A N | E P O C H | I N | T H E | L I F E | O F | T H E | Y O U T H | W H I C H | M A Y | B E | C O M P A R E D | T O | T H A T | O F | C O N F I R M A T I O N | O R | C O N V E R S I O N | I N | C H R I S T I A N | E X P E R I E N C E | +I T | W A S | N O T | T H E N | W H O L L Y | F R O M | I G N O R A N C E | O R | I M P R O V I D E N C E | T H A T | H E | F A I L E D | T O | E S T A B L I S H | P E R M A N E N T | T O W N S | A N D | T O | D E V E L O P | A | M A T E R I A L | C I V I L I Z A T I O N | +F R O M | T H E | S U N | A S | T H E | U N I V E R S A L | F A T H E R | P R O C E E D S | T H E | Q U I C K E N I N G | P R I N C I P L E | I N | N A T U R E | A N D | I N | T H E | P A T I E N T | A N D | F R U I T F U L | W O M B | O F | O U R | M O T H E R | T H E | E A R T H | A R E | H I D D E N | E M B R Y O S | O F | P L A N T S | A N D | M E N | +N O T H I N G | O F | T H E | M A R V E L O U S | C O U L D | A S T O N I S H | H I M | A S | T H A T | A | B E A S T | S H O U L D | S P E A K | O R | T H E | S U N | S T A N D | S T I L L | +K N O W I N G | T H A T | G O D | S E T S | N O | V A L U E | U P O N | M A T E R I A L | T H I N G S | H E | T O O K | W I T H | H I M | N O | O F F E R I N G S | O R | S A C R I F I C E S | O T H E R | T H A N | S Y M B O L I C | O B J E C T S | S U C H | A S | P A I N T S | A N D | T O B A C C O | +I N | T H I S | T Y P E | O F | P R A Y E R | T H E R E | W A S | N O | B E S E E C H I N G | O F | F A V O R | O R | H E L P | +T H I S | I S | T H E | M A T E R I A L | O R | P H Y S I C A L | P R A Y E R | +T H A T | S O L I T A R Y | C O M M U N I O N | W I T H | T H E | U N S E E N | W H I C H | W A S | T H E | H I G H E S T | E X P R E S S I O N | O F | O U R | R E L I G I O U S | L I F E | I S | P A R T L Y | D E S C R I B E D | I N | T H E | W O R D | B A M B E D A Y | L I T E R A L L Y | M Y S T E R I O U S | F E E L I N G | W H I C H | H A S | B E E N | V A R I O U S L Y | T R A N S L A T E D | F A S T I N G | A N D | D R E A M I N G | +T H E | H I S T O R I A N S | O F | T H E | W H I T E | R A C E | A D M I T | T H A T | T H E | I N D I A N | W A S | N E V E R | T H E | F I R S T | T O | R E P U D I A T E | H I S | O A T H | +A T | T H E | S O L E M N | H O U R | O F | S U N R I S E | O R | S U N S E T | H E | T O O K | U P | H I S | P O S I T I O N | O V E R L O O K I N G | T H E | G L O R I E S | O F | E A R T H | A N D | F A C I N G | T H E | G R E A T | M Y S T E R Y | A N D | T H E R E | H E | R E M A I N E D | N A K E D | E R E C T | S I L E N T | A N D | M O T I O N L E S S | E X P O S E D | T O | T H E | E L E M E N T S | A N D | F O R C E S | O F | H I S | A R M I N G | F O R | A | N I G H T | A N D | A | D A Y | T O | T W O | D A Y S | A N D | N I G H T S | B U T | R A R E L Y | L O N G E R | +N O N E | M I G H T | E X H O R T | O R | C O N F E S S | O R | I N | A N Y | W A Y | M E D D L E | W I T H | T H E | R E L I G I O U S | E X P E R I E N C E | O F | A N O T H E R | +T H E | S A V A G E | P H I L O S O P H E R | T H E | D U A L | M I N D | +A M O N G | U S | A L L | M E N | W E R E | C R E A T E D | S O N S | O F | G O D | A N D | S T O O D | E R E C T | A S | C O N S C I O U S | O F | T H E I 
R | D I V I N I T Y | +H E R E | I S | T H E | S U P R E M E | M Y S T E R Y | T H A T | I S | T H E | E S S E N C E | O F | W O R S H I P | W I T H O U T | W H I C H | T H E R E | C A N | B E | N O | R E L I G I O N | A N D | I N | T H E | P R E S E N C E | O F | T H I S | M Y S T E R Y | O U R | A T T I T U D E | C A N N O T | B E | V E R Y | U N L I K E | T H A T | O F | T H E | N A T U R A L | P H I L O S O P H E R | W H O | B E H O L D S | W I T H | A W E | T H E | D I V I N E | I N | A L L | C R E A T I O N | +W H O | M A Y | C O N D E M N | H I S | S U P E R S T I T I O N | +T H E I R | F R I E N D S | D I D | T H E I R | B E S T | T O | A M U S E | T H E M | +T H E I R | M I N D S | W E R E | S O | D I S T R A C T E D | A T | T H I S | C H A N G E | O F | R O U T E | A S | T O | B E | Q U I T E | U N H I N G E D | +W I L L | H A L L E Y | I S | A | B R U T E | B U T | I | A M | K E E P I N G | M Y | E Y E S | O P E N | A N D | I F | T H E | C O A S T | L O O K S | D A N G E R O U S | I | W I L L | P U T | T H E | S H I P ' S | H E A D | T O | S E A | A G A I N | +S O | T H A T | O N | T H A T | S C O R E | T H E R E | I S | L I T T L E | O R | N O | D A N G E R | +F O R T U N A T E L Y | W I L L | H A L L E Y | W A S | N O T | A | M A N | I N | A | H U R R Y | A N D | D I D | N O T | U S E | A | P R E S S | O F | C A N V A S | O R | H I S | M A S T S | W O U L D | I N E V I T A B L Y | H A V E | C O M E | D O W N | +T H I N K | O F | L A D Y | G L E N A R V A N | T H I N K | O F | M A R Y | G R A N T | +Y E S | M Y | L O R D | W E | S H O U L D | T R Y | I N | V A I N | +B U T | A S | T O | G E T T I N G | A L O N G S I D E | T H E | D U N C A N | G O D | F O R B I D | +W H A T | T H E N | M Y | L O R D | +W E | W O U L D | F I G H T | T O | T H E | D E A T H | O F | C O U R S E | B U T | A F T E R | T H A T | +J O H N | M A N G L E S | T H E R E F O R E | H O P E D | T H A T | T H E | W R E T C H E D | H U L L | W O U L D | R E A C H | P O R T | W I T H O U T | A C C I D E N T | B U T | I T | G R I E V E D | H I M | T H A T | H I S | C O M P A N I O N S | S H O U L D | H A V E | T O | S U F F E R | S O | M U C H | D I S C O M F O R T | F R O M | T H E | D E F E C T I V E | A R R A N G E M E N T S | O F | T H E | B R I G | +H I S | E Y E S | W A N D E R E D | C E A S E L E S S L Y | O V E R | T H E | B L A N K | H O R I Z O N | +W E | C O U L D | N O T | E V E N | F L Y | F L Y | J O H N | +G O D | K E E P | U S | F R O M | S U C H | A | M E E T I N G | W H Y | J O H N | +M U C H | A S | T H E Y | H A D | B E E N | I N T E R E S T E D | I N | H I S | D I S S E R T A T I O N | O N | T H E | P A M P A S | O R | A U S T R A L I A | H I S | L E C T U R E S | O N | N E W | Z E A L A N D | F E L L | O N | C O L D | A N D | I N D I F F E R E N T | E A R S | +D O N ' T | Y O U | S E E | H O W | M A N Y | U S E S | W E | H A V E | F O U N D | F O R | T H I S | R E F U S E | C O A L | T A R | +L O O K | A | L I T T L E | C L O S E R | W H I L E | O U R | G U I D E | L E T S | T H E | L I G H T | O F | H I S | L A M P | F A L L | U P O N | T H E | B L A C K | W A L L | A T | Y O U R | S I D E | +W H E N | Y O U R | H A N D S | O R | L I P S | A R E | C R A C K E D | A N D | R O U G H | F R O M | T H E | C O L D | D O E S | Y O U R | M O T H E R | E V E R | P U T | O N | G L Y C E R I N | T O | H E A L | T H E M | +F E R N S | A N D | P A L M S | M O S S E S | A N D | T R E E S | A N D | A N I M A L S | A L L | P E R F E C T | A L L | B E A U T I F U L | A N D | Y E T | A L L | H I D D E N | A 
W A Y | U N D E R | T H I S | H I L L | A N D | T U R N E D | I N T O | S H I N I N G | B L A C K | C O A L | N O W | I | C A N | V E R Y | W E L L | R E M E M B E R | W H E N | I | F I R S T | S A W | A | C O A L | F I R E | A N D | H O W | O D D | I T | L O O K E D | T O | S E E | W H A T | S E E M E D | T O | B E | B U R N I N G | S T O N E S | +W H Y | D I D | H E | G I V E | T H A T | S O | O D D | A | S H A P E | O R | S O | S T R A N G E | A | C O V E R I N G | +O N C E | T H E R E | W A S | A | F A T H E R | W H O | T H O U G H T | H E | W O U L D | B U I L D | F O R | H I S | C H I L D R E N | A | B E A U T I F U L | H O M E | P U T T I N G | I N T O | I T | E V E R Y | T H I N G | T H E Y | C O U L D | N E E D | O R | D E S I R E | T H R O U G H O U T | T H E I R | L I V E S | +F O R | W H E N | I | W A S | A | L I T T L E | G I R L | W E | A L W A Y S | H A D | L O G S | O F | W O O D | B L A Z I N G | I N | A N | O P E N | F I R E P L A C E | A N D | S O | D I D | M A N Y | O T H E R | P E O P L E | A N D | C O A L | W A S | J U S T | C O M I N G | I N T O | U S E | F O R | F U E L | +S E E | B E N E A T H | Y O U R | F E E T | I S | T H E | M A R K I N G | O F | G R E A T | T R E E | T R U N K S | L Y I N G | A S L A N T | A C R O S S | T H E | F L O O R | A N D | T H E | F O R M S | O F | G I G A N T I C | P A L M | L E A V E S | S T R E W E D | A M O N G | T H E M | +T H E | S W E E T E S T | P E R F U M E S | F L O A T E D | T H R O U G H | T H E | A I R | W H I L E | T H O U S A N D S | O F | B I R D S | A N S W E R E D | T H E | M U S I C | O F | F O U N T A I N S | W I T H | T H E I R | S O N G S | +T H E N | T H E | H I L L S | W E R E | P I L E D | U P | O N | T O P | O F | I T | A L L | B U T | H E R E | A N D | T H E R E | S O M E | E D G E | O F | A | C O A L | B E D | W A S | T I L T E D | U P | A N D | A P P E A R E D | A B O V E | T H E | G R O U N D | +T H E S E | F O R E S T S | W E R E | O F | T R E E S | D I F F E R E N T | I N | S O M E | W A Y S | F R O M | T H O S E | W E | H A V E | N O W | G R E A T | F E R N S | A S | T A L L | A S | T H I S | H O U S E | A N D | M O S S E S | A S | H I G H | A S | L I T T L E | T R E E S | A N D | P A L M | L E A V E S | O F | E N O R M O U S | S I Z E | +I T | W A S | O N L Y | A | T R O U B L E | T O | T H E | G A S | M A K E R S | W H O | H A D | N O | U S E | F O R | I T | A N D | E V E N | T H R E W | I T | A W A Y | U N T I L | S O M E | O N E | M O R E | T H O U G H T F U L | T H A N | T H E | O T H E R S | F O U N D | O U T | T H A T | W A T E R | W O U L D | N O T | P A S S | T H R O U G H | I T | +A N D | S O | T H R O U G H | M A N Y | Q U E S T I O N S | A N D | M A N Y | E X P E R I M E N T S | T H E Y | L E A R N | A T | L A S T | H O W | T O | U S E | T H E | C O N T E N T S | O F | T H I S | O N E | S T O R E H O U S E | +T H E | E N T R A N C E | I S | L I G H T | B E C A U S E | I T | O P E N S | S O | W I D E | B U T | W E | C A N | S E E | T H A T | T H E | F L O O R | S L O P E S | D O W N W A R D | A N D | T H E | W A Y | L O O K S | D A R K | A N D | N A R R O W | B E F O R E | U S | +H E R E | I S | S O M E T H I N G | D I F F E R E N T | R O U N D E D | L I K E | A | N U T | S H E L L | Y O U | C A N | S P L I T | O F F | O N E | S I D E | A N D | B E H O L D | T H E R E | I S | T H E | N U T | L Y I N G | S N U G L Y | A S | D O E S | A N Y | C H E S T N U T | I N | I T S | B U R | +W H A T | S H O U L D | W E | H A V E | D O N E | I F | E V E R Y B O D Y | H A D | K E P T | O N | B U R N I N 
G | W O O D | T O | T H I S | D A Y | +W A L K | D O W N | T H E | S L O P I N G | F O O T | P A T H | N O W | A N D | B E | C A R E F U L | T O | K E E P | O U T | O F | T H E | W A Y | O F | T H E | L I T T L E | C A R S | T H A T | A R E | C O M I N G | A N D | G O I N G | O N | E A C H | S I D E | O F | Y O U | L O A D E D | O N | O N E | S I D E | A N D | E M P T Y | O N | T H E | O T H E R | A N D | S E E M I N G | T O | R U N | U P | A N D | D O W N | B Y | T H E M S E L V E S | +B U T | B Y | A N D | B Y | T H E | W I S E | M E N | T H O U G H T | A B O U T | I T | A N D | S A I D | T O | T H E M S E L V E S | W E | M U S T | F I N D | O U T | W H A T | U S E F U L | P U R P O S E | G O D | M A D E | T H E | G A S | F O R | W E | K N O W | T H A T | H E | D O E S | N O T | M A K E | A N Y | T H I N G | F O R | H A R M | O N L Y | +O N | T H A T | S I D E | D E S C E N T | W A S | I M P O S S I B L E | A N D | H A D | I T | B E E N | P O S S I B L E | T H E | B O T T O M | W A S | S H U T | I N | B Y | T H E | E N O R M O U S | R O C K | +W H A T | C O U L D | B E | T H E | O B J E C T | +W A T C H | T H E | S A V A G E S | O U T S I D E | S A I D | R O B E R T | +T H E | M E A L | E N D E D | +L I S T E N | S A I D | H E | M O T I O N I N G | T H E M | T O | S T O O P | +I F | I T | I S | D E C R E E D | T H A T | W E | D I E | T O | M O R R O W | L E T | U S | D I E | B R A V E L Y | L I K E | C H R I S T I A N | M E N | R E A D Y | T O | A P P E A R | W I T H O U T | T E R R O R | B E F O R E | T H E | S U P R E M E | J U D G E | +J O H N | Y O U | H A V E | P R O M I S E D | M A R Y | W H A T | I | P R O M I S E D | L A D Y | H E L E N A | W H A T | I S | Y O U R | P L A N | +S L E E P | W H I C H | K E E P S | A L L | S O R R O W | I N | A B E Y A N C E | S O O N | W E I G H E D | D O W N | T H E I R | E Y E L I D S | T H E Y | S L E P T | I N | E A C H | O T H E R ' S | A R M S | O V E R C O M E | B Y | E X H A U S T I O N | A N D | P R O L O N G E D | W A T C H I N G | +J O H N | M A N G L E S | I N S E R T I N G | T H E | B L A D E | O F | H I S | P O N I A R D | A V O I D E D | T H E | K N I F E | W H I C H | N O W | P R O T R U D E D | A B O V E | T H E | S O I L | B U T | S E I Z E D | T H E | H A N D | T H A T | W I E L D E D | I T | +T H E Y | H A D | O N E | N I G H T | I N | W H I C H | T O | P R E P A R E | F O R | D E A T H | +T H E I R | F I N G E R S | B L E D | B U T | S T I L L | T H E Y | W O R K E D | O N | A F T E R | H A L F | A N | H O U R | T H E Y | H A D | G O N E | T H R E E | F E E T | D E E P | T H E Y | P E R C E I V E D | B Y | T H E | I N C R E A S E D | S H A R P N E S S | O F | T H E | S O U N D S | T H A T | O N L Y | A | T H I N | L A Y E R | O F | E A R T H | P R E V E N T E D | I M M E D I A T E | C O M M U N I C A T I O N | +G O D | W H O | R E A D S | O U R | H E A R T S | K N O W S | T H A T | W E | H A D | A | N O B L E | E N D | I N | V I E W | +D I D | T H E Y | K N O W | O F | T H E | E X I S T E N C E | O F | T H E | P R I S O N E R S | O R | W A S | I T | S O M E | P R I V A T E | E N T E R P R I S E | T H A T | L E D | T O | T H E | U N D E R T A K I N G | +T H E Y | W E R E | N O T | T O | L E A V E | I T | A G A I N | T I L L | T H E | T O P S | O F | T H E | W A H I T I | R A N G E S | W E R E | L I T | W I T H | T H E | F I R S T | F I R E S | O F | D A Y | +M Y | C H I L D | M Y | C H I L D | M U R M U R E D | L A D Y | H E L E N A | T H E | S A V A G E S | D I D | N O T | K I L L | Y O U | +I | B E L I E V E | S A I D | J O H 
N | T H A T | I N | T H E | S I G H T | O F | G O D | I | H A V E | A | R I G H T | T O | F U L F I L L | T H A T | P R O M I S E | +B U T | S O F T L Y | A S | T H E | N A M E | W A S | B R E A T H E D | M A R Y | G R A N T | A L R E A D Y | A W A K E N E D | B Y | T H E | S O U N D S | I N | T H E | H U T | S L I P P E D | O V E R | T O W A R D | G L E N A R V A N | A N D | S E I Z I N G | T H E | H A N D | A L L | S T A I N E D | W I T H | E A R T H | S H E | C O V E R E D | I T | W I T H | K I S S E S | +W I L S O N | A N D | O L B I N E T T | J O I N E D | T H E I R | C O M P A N I O N S | A N D | A L L | U N I T E D | T O | D I G | T H R O U G H | T H E | W A L L | J O H N | W I T H | H I S | D A G G E R | T H E | O T H E R S | W I T H | S T O N E S | T A K E N | F R O M | T H E | G R O U N D | O R | W I T H | T H E I R | N A I L S | W H I L E | M U L R A D Y | S T R E T C H E D | A L O N G | T H E | G R O U N D | W A T C H E D | T H E | N A T I V E | G U A R D | T H R O U G H | A | C R E V I C E | O F | T H E | M A T T I N G | +A T | L A S T | T H E | M A J O R | S A I D | M Y | F R I E N D S | K E E P | T H A T | T O | T H E | L A S T | M O M E N T | +M Y | L O R D | W H I C H E V E R | O F | U S | S U R V I V E S | T H E | O T H E R | W I L L | F U L F I L L | T H E | W I S H | O F | L A D Y | H E L E N A | A N D | M A R Y | G R A N T | +G L E N A R V A N ' S | V O I C E | F I R M | T I L L | N O W | F A L T E R E D | +A N I M A L | O R | M A N | A N S W E R E D | T H E | M A J O R | I | W I L L | S O O N | F I N D | O U T | +R O U N D | H I S | B O D Y | W A S | R O L L E D | A | L O N G | C O I L | O F | F L A X | R O P E | +T H E | J A I L E R | M A Y | F O R G E T | T H A T | H E | I S | O N | G U A R D | T H E | P R I S O N E R | N E V E R | F O R G E T S | T H A T | H E | I S | G U A R D E D | +C H A P T E R | T H I R T Y | T H R E E | A | C O N F I D A N T | +M I S T E R | M O R T O N | T H E N | M A D E | A | C A R E F U L | M E M O R A N D U M | O F | T H E | V A R I O U S | P A R T I C U L A R S | O F | W A V E R L E Y ' S | I N T E R V I E W | W I T H | D O N A L D | B E A N | L E A N | A N D | T H E | O T H E R | C I R C U M S T A N C E S | W H I C H | H E | H A D | C O M M U N I C A T E D | +W H E N | I | W A S | A | Y O U N G | M A N | L I K E | Y O U | M I S T E R | W A V E R L E Y | A N Y | S U C H | H A I R | B R A I N E D | E X P E D I T I O N | I | B E G | Y O U R | P A R D O N | F O R | T H E | E X P R E S S I O N | W O U L D | H A V E | H A D | I N E X P R E S S I B L E | C H A R M S | F O R | M E | +E V I L | T O | H I M | T H A T | T H I N K S | O T H E R W I S E | S A I D | M I S T E R | M O R T O N | O R | W H O | H O L D S | C H U R C H | G O V E R N M E N T | A N D | C E R E M O N I E S | A S | T H E | E X C L U S I V E | G A G E | O F | C H R I S T I A N | F A I T H | O R | M O R A L | V I R T U E | +S I N C E | T H A T | T I M E | T H E I R | N U M B E R S | H A V E | G R A D U A L L Y | D I M I N I S H E D | B U T | A | G O O D | M A N Y | A R E | S T I L L | T O | B E | F O U N D | I N | T H E | W E S T E R N | C O U N T I E S | A N D | S E V E R A L | W I T H | A | B E T T E R | T E M P E R | T H A N | I N | S E V E N T E E N | O | S E V E N | H A V E | N O W | T A K E N | A R M S | F O R | G O V E R N M E N T | +H E | H A D | N E I T H E R | S Y M P A T H Y | W I T H | M Y | I N N O C E N C E | N O R | W I T H | M Y | W R E T C H E D N E S S | A N D | T H E | P E T R I F Y I N G | A C C U R A C Y | W I T H | W H I C H | H E | A T T E N D E D | T O 
| E V E R Y | F O R M | O F | C I V I L I T Y | W H I L E | H E | T O R T U R E D | M E | B Y | H I S | Q U E S T I O N S | H I S | S U S P I C I O N S | A N D | H I S | I N F E R E N C E S | W A S | A S | T O R M E N T I N G | A S | T H E | R A C K S | O F | T H E | I N Q U I S I T I O N | +H E | C E R T A I N L Y | P O S S E S S E S | T A L E N T S | B E Y O N D | T H E | R U D E | S P H E R E | I N | W H I C H | H E | M O V E S | A N D | B E I N G | N E I T H E R | D E S T I T U T E | O F | A M B I T I O N | N O R | E N C U M B E R E D | W I T H | S C R U P L E S | H E | W I L L | P R O B A B L Y | A T T E M P T | B Y | E V E R Y | M E A N S | T O | D I S T I N G U I S H | H I M S E L F | D U R I N G | T H E | P E R I O D | O F | T H E S E | U N H A P P Y | C O M M O T I O N S | +M I S T E R | M O R T O N | R E P L I E D | T H A T | F A R | F R O M | M A K I N G | A N Y | C L A I M | U P O N | H I S | G O O D | O P I N I O N | H I S | O N L Y | W I S H | A N D | T H E | S O L E | P U R P O S E | O F | H I S | V I S I T | W A S | T O | F I N D | O U T | T H E | M E A N S | O F | D E S E R V I N G | I T | +T H E Y | H E L D | C O N V E N T I C L E S | I N | T H E | O P E N | F I E L D S | A N D | B E I N G | T R E A T E D | W I T H | G R E A T | V I O L E N C E | A N D | C R U E L T Y | B Y | T H E | S C O T T I S H | G O V E R N M E N T | M O R E | T H A N | O N C E | T O O K | A R M S | D U R I N G | T H O S E | R E I G N S | +I T | W A S | O N E | O F | T H O S E | E F F E C T S | W H I C H | A | P A I N T E R | L O V E S | T O | R E P R E S E N T | A N D | M I N G L E D | W E L L | W I T H | T H E | S T R U G G L I N G | L I G H T | W H I C H | F O U N D | I T S | W A Y | B E T W E E N | T H E | B O U G H S | O F | T H E | S H A D Y | A R C H | T H A T | V A U L T E D | T H E | B R O A D | G R E E N | A L L E Y | +T H E | H O U S E | W H I C H | S E E M E D | T O | C O N S I S T | O F | T W O | O R | T H R E E | H I G H | N A R R O W | A N D | S T E E P | R O O F E D | B U I L D I N G S | P R O J E C T I N G | F R O M | E A C H | O T H E R | A T | R I G H T | A N G L E S | F O R M E D | O N E | S I D E | O F | T H E | I N C L O S U R E | +T H E | E V I L | A N D | R E M E D Y | S U C H | A S | I T | I S | S T I L L | E X I S T | B U T | T H I S | I S | R E M O T E | F R O M | O U R | P R E S E N T | P U R P O S E | A N D | I S | O N L Y | T H R O W N | O U T | F O R | C O N S I D E R A T I O N | O F | T H E | C O L L E C T O R S | U N D E R | M I S T E R | D E N T ' S | D O G | B I L L | +I T | W A S | A B O U T | N O O N | W H E N | C A P T A I N | W A V E R L E Y | E N T E R E D | T H E | S T R A G G L I N G | V I L L A G E | O R | R A T H E R | H A M L E T | O F | T U L L Y | V E O L A N | C L O S E | T O | W H I C H | W A S | S I T U A T E D | T H E | M A N S I O N | O F | T H E | P R O P R I E T O R | +E V E R Y T H I N G | A R O U N D | A P P E A R E D | S O L I T A R Y | A N D | W O U L D | H A V E | B E E N | S I L E N T | B U T | F O R | T H E | C O N T I N U E D | P L A S H I N G | O F | T H E | F O U N T A I N | A N D | T H E | W H O L E | S C E N E | S T I L L | M A I N T A I N E D | T H E | M O N A S T I C | I L L U S I O N | W H I C H | T H E | F A N C Y | O F | W A V E R L E Y | H A D | C O N J U R E D | U P | +I T | H A D | B E E N | B U I L T | A T | A | P E R I O D | W H E N | C A S T L E S | W E R E | N O | L O N G E R | N E C E S S A R Y | A N D | W H E N | T H E | S C O T T I S H | A R C H I T E C T S | H A D | N O T | Y E T | A C Q U I R E D | T H E | A R T | O F 
| D E S I G N I N G | A | D O M E S T I C | R E S I D E N C E | +N E I T H E R | D I D | T H E | F R O N T | I N D I C A T E | A B S O L U T E | S E C U R I T Y | F R O M | D A N G E R | +T H E | H O U S E S | S E E M E D | M I S E R A B L E | I N | T H E | E X T R E M E | E S P E C I A L L Y | T O | A N | E Y E | A C C U S T O M E D | T O | T H E | S M I L I N G | N E A T N E S S | O F | E N G L I S H | C O T T A G E S | +Y E T | T H E | P H Y S I O G N O M Y | O F | T H E | P E O P L E | W H E N | M O R E | C L O S E L Y | E X A M I N E D | W A S | F A R | F R O M | E X H I B I T I N G | T H E | I N D I F F E R E N C E | O F | S T U P I D I T Y | T H E I R | F E A T U R E S | W E R E | R O U G H | B U T | R E M A R K A B L Y | I N T E L L I G E N T | G R A V E | B U T | T H E | V E R Y | R E V E R S E | O F | S T U P I D | A N D | F R O M | A M O N G | T H E | Y O U N G | W O M E N | A N | A R T I S T | M I G H T | H A V E | C H O S E N | M O R E | T H A N | O N E | M O D E L | W H O S E | F E A T U R E S | A N D | F O R M | R E S E M B L E D | T H O S E | O F | M I N E R V A | +T H I S | W O R K | O F | A R T | W A S | T H E | W O N D E R | O F | T H E | C O U N T R Y | T E N | M I L E S | R O U N D | +T H E | C O U R T | W A S | S P A C I O U S | W E L L | P A V E D | A N D | P E R F E C T L Y | C L E A N | T H E R E | B E I N G | P R O B A B L Y | A N O T H E R | E N T R A N C E | B E H I N D | T H E | S T A B L E S | F O R | R E M O V I N G | T H E | L I T T E R | +T H I S | A V E N U E | W A S | S T R A I G H T | A N D | O F | M O D E R A T E | L E N G T H | R U N N I N G | B E T W E E N | A | D O U B L E | R O W | O F | V E R Y | A N C I E N T | H O R S E | C H E S T N U T S | P L A N T E D | A L T E R N A T E L Y | W I T H | S Y C A M O R E S | W H I C H | R O S E | T O | S U C H | H U G E | H E I G H T | A N D | N O U R I S H E D | S O | L U X U R I A N T L Y | T H A T | T H E I R | B O U G H S | C O M P L E T E L Y | O V E R | A R C H E D | T H E | B R O A D | R O A D | B E N E A T H | +O C C A S I O N A L L Y | I N D E E D | W H E N | S U C H | A | C O N S U M M A T I O N | S E E M E D | I N E V I T A B L E | A | W A T C H F U L | O L D | G R A N D A M | W I T H | H E R | C L O S E | C A P | D I S T A F F | A N D | S P I N D L E | R U S H E D | L I K E | A | S I B Y L | I N | F R E N Z Y | O U T | O F | O N E | O F | T H E S E | M I S E R A B L E | C E L L S | D A S H E D | I N T O | T H E | M I D D L E | O F | T H E | P A T H | A N D | S N A T C H I N G | U P | H E R | O W N | C H A R G E | F R O M | A M O N G | T H E | S U N B U R N T | L O I T E R E R S | S A L U T E D | H I M | W I T H | A | S O U N D | C U F F | A N D | T R A N S P O R T E D | H I M | B A C K | T O | H I S | D U N G E O N | T H E | L I T T L E | W H I T E | H E A D E D | V A R L E T | S C R E A M I N G | A L L | T H E | W H I L E | F R O M | T H E | V E R Y | T O P | O F | H I S | L U N G S | A | S H R I L L Y | T R E B L E | T O | T H E | G R O W L I N G | R E M O N S T R A N C E S | O F | T H E | E N R A G E D | M A T R O N | +S T A B L E S | A N D | O T H E R | O F F I C E S | O C C U P I E D | A N O T H E R | S I D E | O F | T H E | S Q U A R E | +T W O | B A T T L E M E N T E D | W A L L S | O N E | O F | W H I C H | F A C E D | T H E | A V E N U E | A N D | T H E | O T H E R | D I V I D E D | T H E | C O U R T | F R O M | T H E | G A R D E N | C O M P L E T E D | T H E | I N C L O S U R E | +H E R | C O M P L E X I O N | W A S | N O T | A | D E C I D E D | P I N K | B U T | A | S O F T | R O S Y | T 
I N T | N O T | M U C H | D E E P E R | T H A N | T H A T | O F | T R O T ' S | S K I N | +W H A T | I S | I T | C O R A L I E | S H E | A S K E D | T H E | W O M A N | +W E | D O | N O T | H A T E | Y O U | A S | Y O U | S A Y | T H E | B L U E S K I N S | D O | N O R | A R E | W E | S A V A G E | O R | C R U E L | B U T | W E | D O | N O T | W A N T | Y O U | H E R E | A N D | I | A M | R E A L L Y | P U Z Z L E D | W H A T | T O | D O | W I T H | Y O U | +T H E S E | I N T R U D E R S | A R E | V E R Y | P E C U L I A R | P E O P L E | R E M A R K E D | A | M A N | I N | T H E | C R O W D | +Y O U | A R E | N O T | L I K E | M Y | P E O P L E | T H E | P I N K I E S | A N D | T H E R E | I S | N O | P L A C E | F O R | Y O U | I N | O U R | C O U N T R Y | +W H A T | T H A T | L I T T L E | C A B I N | +A | M I S F O R T U N E | O F | B I R T H | P L A C E D | M E | H E R E | A N D | I | C A N N O T | E S C A P E | M Y | F A T E | +I N | A L L | O U R | H I S T O R Y | Y O U | A R E | T H E | F I R S T | P E O P L E | F R O M | O U T S I D E | O U R | B O R D E R S | W H O | H A V E | E V E R | S T E P P E D | A | F O O T | I N | O U R | L A N D | +I F | I | L I V E D | A S | L U X U R I O U S L Y | A S | M Y | P E O P L E | D O | A N D | H A D | S E R V A N T S | A N D | C O S T L Y | G O W N S | T H E | G O O D | P I N K I E S | W O U L D | S A Y | T H A T | T H E I R | Q U E E N | H A D | M O R E | T H A N | T H E Y | T H E M S E L V E S | A N D | I T | W O U L D | B E | T R U E | +N O | O U R | W A Y | I S | B E S T | +S H E | S M I L E D | A | L I T T L E | S A D L Y | A T | T R O T | S E E M E D | T O | A P P R O V E | B U T T O N | B R I G H T ' S | O P E N | F R A N K | F A C E | A N D | W A S | Q U I T E | S U R P R I S E D | B E C A U S E | C A P ' N | B I L L | W A S | S O | M U C H | B I G G E R | T H A N | H E R | O W N | P E O P L E | +E V E N | I N | A M E R I C A | E V E R ' B O D Y | B O W S | L O W | T O | O U R | P R E S I D E N T | A N | T H E | B L U E S K I N S | A R E | S O | F R A I D | O | T H E I R | B O O L O O R O O | T H A T | T H E Y | T R E M B L E | W H E N E V E R | T H E Y | G O | N E A R | H I M | +T H E | P E O P L E | M U S T | W A I T | O U T S I D E | F O R | T H E R E | I S | N O | R O O M | F O R | T H E M | I N | T H E | P A L A C E | +I N | T H A T | C A S E | S A I D | B U T T O N | B R I G H T | Y O U ' R E | E N T I T L E D | T O | T H E | B E S T | T H E R E | I S | T O | P A Y | F O R | Y O U R | T R O U B L E | +B U T | S U R E L Y | T H A T | I S | A L L | W R O N G | S A I D | T O U R M A L I N E | G R A V E L Y | +Y E S | I T | W A S | W E T | A N | S T I C K Y | A L L | R I G H T | A G R E E D | T H E | S A I L O R | B U T | T H E | B I G | F R O G | H E L P E D | U S | A N | W E | G O T | T H R O U G H | A L L | R I G H T | +I ' L L | L O O K | I N | T H E | G R E A T | B O O K | F I R S T | +I | H A V E | O N E | G R E A T | P R I V I L E G E | +E X C L A I M E D | T R O T | O F | C O U R S E | +T H E R E | I S | N O T H I N G | M A J E S T I C | A B O U T | M E | A S | Y O U | K N O W | V E R Y | W E L L | +P E R H A P S | Y O U | A R E | T R Y I N G | T O | R I D I C U L E | M E | S H E | C O N T I N U E D | R E G A R D I N G | T H E | S A I L O R ' S | F A C E | C L O S E L Y | +T H E | Q U E E N | H A S | N O T H I N G | B U T | T H E | P O W E R | T O | E X E C U T E | T H E | L A W S | T O | A D J U S T | G R I E V A N C E S | A N D | T O | C O M P E L | O R D E R | +I T | H A D | N O | O R N A M E N T A T I O N | B E I N G | 
E X C E E D I N G L Y | P L A I N | I N | A P P E A R A N C E | +A R E | Y O U | A | G I A N T | +T H E Y | S E E M | V E R Y | I G N O R A N T | P O O R | T H I N G S | S A I D | A N O T H E R | I N | R E P L Y | +I T | I S | M U C H | M O R E | D E S I R A B L E | T O | B E | A | P R I V A T E | C I T I Z E N | H A P P Y | A N D | C A R E | F R E E | +H E R E | S A I D | O N E | O F | T H E I R | G U I D E S | A S | T H E | P R O C E S S I O N | H A L T E D | B E F O R E | T H E | L I T T L E | S T O N E | B U I L D I N G | I S | T H E | P A L A C E | O F | T O U R M A L I N E | W H O | I S | O U R | Q U E E N | +S H E | W A S | A | B E A U T I F U L | G I R L | O F | A B O U T | S E V E N T E E N | Y E A R S | O F | A G E | N O T | F A T | L I K E | A L L | T H E | R E S T | O F | T H E | P I N K I E S | B U T | S L E N D E R | A N D | W E L L | F O R M E D | A C C O R D I N G | T O | O U R | O W N | I D E A S | O F | B E A U T Y | +T H E R E F O R E | I | A M | A | M E R E | A G E N T | T O | D I R E C T | T H E | L A W S | W H I C H | A R E | T H E | W I L L | O F | T H E | P E O P L E | A N D | A M | O N L Y | A | P U B L I C | S E R V A N T | O B L I G E D | C O N S T A N T L Y | T O | G U A R D | T H E | W E L F A R E | O F | M Y | S U B J E C T S | +D I D | Y O U | S U P P O S E | A | P A L A C E | W O U L D | B E | L I K E | O N E | O F | O U R | H A N D S O M E | R E S I D E N C E S | A S K E D | T H E | W O M A N | E V I D E N T L Y | S U R P R I S E D | +S O | T H E Y | F O L L O W E D | H E R | T H R O U G H | T H E | L O W | A R C H W A Y | A N D | I N | A | R O O M | B E Y O N D | V E R Y | S I M P L Y | F U R N I S H E D | S A T | A | Y O U N G | G I R L | E N G A G E D | I N | D A R N I N G | A | P A I R | O F | P I N K | S T O C K I N G S | +T H E | Q U E E N | G A Z E D | U P O N | O U R | F R I E N D S | W I T H | E V I D E N T | I N T E R E S T | +C O R A L I E | D O | Y O U | C O N S I D E R | M A J E S T Y | A | P R O P E R | W O R D | T O | U S E | W H E N | A D D R E S S I N G | A | Q U E E N | +A F T E R | M Y | D E A T H | A | P I N K | M A R B L E | S T A T U E | O F | M E | W I L L | B E | S E T | U P | I N | T H E | G R A N D | C O U R T | W I T H | T H E | S T A T U E S | O F | T H E | O T H E R | K I N G S | A N D | Q U E E N S | W H O | H A V E | R U L E D | T H I S | L A N D | A N D | A L L | T H E | P I N K I E S | I N | A G E S | T O | C O M E | W I L L | T H E N | H O N O R | M E | A S | H A V I N G | B E E N | A | J U S T | A N D | U P R I G H T | Q U E E N | T H A T | I S | M Y | R E W A R D | +H E | W A S | W I S E | I N | H I S | O W N | C O N C E I T | +O N | S U N D A Y | M O R N I N G | A | C L E A R | B E A U T I F U L | A N D | S T I L L | D A Y | T H E | O R D E R | W A S | G I V E N | F O R | T H E | W H O L E | A R M Y | T O | A D V A N C E | A N D | T O | A T T A C K | I M M E D I A T E L Y | +O N | M O N D A Y | T H E | T I D E | W A S | R E V E R S E D | +T H I S | W A S | T H E | F I R S T | B I G | B A T T L E | I N | W H I C H | O U R | R E G I M E N T | H A D | E V E R | B E E N | E N G A G E D | +T H E | R O P E | H O W E V E R | W A S | S T R O N G E R | T H A N | T H E | M U L E ' S | N O | A N D | H E | W A S | F I N A L L Y | P R E V A I L E D | U P O N | B Y | T H E | S T R E N G T H | O F | T H E | R O P E | T O | C R O S S | T H E | C R E E K | +A S | G L A D D E N | R O D E | B Y | U S | A | C O U R I E R | R O D E | U P | A N D | T O L D | H I M | S O M E T H I N G | +I | H A D | H E A R D | A N D | R E A D | O F | B A T T 
L E F I E L D S | S E E N | P I C T U R E S | O F | B A T T L E F I E L D S | O F | H O R S E S | A N D | M E N | O F | C A N N O N | A N D | W A G O N S | A L L | J U M B L E D | T O G E T H E R | W H I L E | T H E | G R O U N D | W A S | S T R E W N | W I T H | D E A D | A N D | D Y I N G | A N D | W O U N D E D | B U T | I | M U S T | C O N F E S S | T H A T | I | N E V E R | R E A L I Z E D | T H E | P O M P | A N D | C I R C U M S T A N C E | O F | T H E | T H I N G | C A L L E D | G L O R I O U S | W A R | U N T I L | I | S A W | T H I S | +B U T | A S | I | S A I D | B E F O R E | R E A D E R | A | P R I V A T E | S O L D I E R | I S | B U T | A N | A U T O M A T O N | A N D | K N O W S | N O T H I N G | O F | W H A T | I S | G O I N G | O N | A M O N G | T H E | G E N E R A L S | A N D | I | A M | O N L Y | G I V I N G | T H E | C H R O N I C L E S | O F | L I T T L E | T H I N G S | A N D | E V E N T S | T H A T | C A M E | U N D E R | M Y | O W N | O B S E R V A T I O N | A S | I | S A W | T H E M | T H E N | A N D | R E M E M B E R | T H E M | N O W | +T H E | F A C T | W A S | K E P T | F R O M | T H E | T R O O P S | +A B O U T | D A Y L I G H T | O N | S U N D A Y | M O R N I N G | C H A L M E R S | B R I G A D E | R E L I E V E D | G L A D D E N ' S | +O N | M Y | T A K I N G | T H E | R O P E | O F F | H E | S H O O K | H I M S E L F | A N D | S E E M E D | T O | S A Y | Y O U | T H I N K | T H A T | Y O U | A R E | M I G H T Y | S M A R T | F O L K S | B U T | Y O U | A R E | A | L E E T L E | T O O | S M A R T | +O F F I C E R S | C O U L D | N O T | C U R B | T H E | M E N | T O | K E E P | I N | L I N E | +A B O U T | T H E | T I M E | H E | P U L L E D | T R I G G E R | A | S T R A Y | B A L L | F R O M | S O M E | D I R E C T I O N | S T R U C K | H I M | I N | T H E | S I D E | A N D | H E | F E L L | O F F | D E A D | A N D | H I S | H O R S E | B E C O M I N G | F R I G H T E N E D | G A L L O P E D | O F F | D R A G G I N G | H I M | T H R O U G H | T H E | C O N F E D E R A T E | L I N E S | +M U L E | D I D | N O T | D E S I R E | T O | C R O S S | W H I L E | I | W A S | T R Y I N G | T O | P E R S U A D E | H I M | W I T H | A | B I G | S T I C K | A | R O C K | I N | H I S | E A R | A N D | A | T W I S T E R | O N | H I S | N O S E | +S H O U L D | Y O U | D E S I R E | T O | F I N D | O U T | M O R E | A B O U T | T H E | B A T T L E | I | R E F E R | Y O U | T O | H I S T O R Y | +S O | H E | G O T | A | L A R G E | T W O | I N C H | R O P E | T I E D | O N E | E N D | A R O U N D | T H E | M U L E ' S | N E C K | A N D | T H E | O T H E R | T O | T H E | C A I S S O N | A N D | O R D E R E D | T H E | D R I V E R | T O | W H I P | U P | +I | H A D | B E E N | F E E L I N G | M E A N | A L L | T H E | M O R N I N G | A S | I F | I | H A D | S T O L E N | A | S H E E P | B U T | W H E N | T H E | O R D E R | T O | C H A R G E | W A S | G I V E N | I | G O T | H A P P Y | +T H A T ' S | R I G H T | M Y | B R A V E | F I R S T | T E N N E S S E E | G I V E | E M | H A I L | C O L U M B I A | +W E | H A D | T O | P A S S | O V E R | T H E | G R O U N D | W H E R E | T R O O P S | H A D | B E E N | F I G H T I N G | A L L | D A Y | +S H I L O H | +I | F R E Q U E N T L Y | T H O U G H T | I T | W O U L D | B E | P L E A S A N T | T O | S P L I T | T H E | D I F F E R E N C E | W I T H | T H A T | M U L E | A N D | I | W O U L D | G L A D L Y | H A V E | D O N E | S O | I F | I | C O U L D | H A V E | G O T T E N | O N E | H A L F | O F | H I S | N O | +I | D O | 
N O T | P R E T E N D | T O | T E L L | O F | W H A T | C O M M A N D | D I S T I N G U I S H E D | I T S E L F | O F | H E R O E S | O F | B L O O D | A N D | W O U N D S | O F | S H R I E K S | A N D | G R O A N S | O F | B R I L L I A N T | C H A R G E S | O F | C A N N O N | C A P T U R E D | E T | C E T E R A | +W E | W E R E | S U P P O R T I N G | A N | A L A B A M A | B R I G A D E | +O N | M O N D A Y | M O R N I N G | I | T O O | C A P T U R E D | M E | A | M U L E | +T H E | V O I C E | A P P E A R E D | T O | B E | O V E R H E A D | +B U T | H O W | T O | G E T | H I M | O U T | W A S | T H E | U N S O L V E D | P R O B L E M | +I | D O N ' T | T H I N K | H I S | G U N | W A S | L O A D E D | T H O U G H | B E C A U S E | W E | D I D | N O T | H E A R | T H E | B A L L | W H I S T L E | +T H E | P O O R | F E L L O W | S T A Y E D | I N | T H A T | W E L L | A L L | N I G H T | +T H O S E | O L D | S O L D I E R S | H A D | L O N G | L O N G | A G O | F O R G O T T E N | A B O U T | T H A T | O L D | L A W | O F | T H E | L O N G | G O N E | P A S T | B U T | J I M | H A D | T R E A S U R E D | I T | U P | I N | H I S | M E M O R Y | L O | T H E S E | M A N Y | Y E A R S | A N D | H E | T H O U G H T | I T | W O U L D | S E R V E | H I M | N O W | A S | I T | H A D | N O | D O U B T | F R E Q U E N T L Y | D O N E | I N | T H E | P A S T | +H E | W A L K E D | U P | A N D | S A Y S | H E L L O | B O Y S | W H A T | I S | I T | B O S S | +W E | W E R E | I N U R E D | T O | P R I V A T I O N S | A N D | H A R D S H I P S | H A D | B E E N | U P O N | E V E R Y | M A R C H | I N | E V E R Y | B A T T L E | I N | E V E R Y | S K I R M I S H | I N | E V E R Y | A D V A N C E | I N | E V E R Y | R E T R E A T | I N | E V E R Y | V I C T O R Y | I N | E V E R Y | D E F E A T | +W E | W A L K E D | O V E R | T H I S | F L O A T I N G | B R I D G E | A N D | S O O N | F O U N D | O U R S E L V E S | O N | T H E | T E N N E S S E E | S I D E | O F | T E N N E S S E E | R I V E R | +W E | P A S S E D | A R O U N D | A T L A N T A | C R O S S E D | T H E | C H A T T A H O O C H E E | A N D | T R A V E L E D | B A C K | O V E R | T H E | S A M E | R O U T E | O N | W H I C H | W E | H A D | M A D E | T H E | A R D U O U S | C A M P A I G N | U N D E R | J O E | J O H N S T O N | +T H E | T H I R D | D A Y | I T | W A S | R E P O R T E D | T H A T | T H E | Y A N K E E S | H A D | T A K E N | P O S I T I O N | O N | T H E | M U R F R E E S B O R O | P I K E | +W E | L O O K E D | A L L | A R O U N D | A N D | T H O U G H T | T H A T | T H E | C O A S T | W A S | C L E A R | +A | R E G I M E N T | W A S | S E N T | T O | T H E | A T T A C K | I T | W A S | J I M ' S | R E G I M E N T | +H O W | E V E R Y | P U L S E | D I D | B E A T | A N D | L E A P | A N D | H O W | E V E R Y | H E A R T | D I D | T H R O B | W I T H | E M O T I O N S | O F | J O Y | W H I C H | S E E M E D | N E A R L Y | A K I N | T O | H E A V E N | W H E N | W E | R E C E I V E D | T H E | G L A D | I N T E L L I G E N C E | O F | O U R | O N W A R D | M A R C H | T O W A R D | T H E | L A N D | O F | P R O M I S E | A N D | O F | O U R | L O V E D | O N E S | +T H E Y | P E R S U A D E D | E L O Q U E N T L Y | +B U T | A F T E R | A W H I L E | J I M | S A Y S | G E N T L E M E N | A Y | G A N N Y | T H E | L A W | +R I G H T | B E F O R E | M E | I | S A W | T H E | L O N G | D R Y | G R A S S | A L L | B E N D I N G | T O W A R D | A | C O M M O N | C E N T E R | A N D | I | K N E W | T H A T | I T | W A S | A N 
| O L D | W E L L | A N D | T H A T | M Y | C O M R A D E | H A D | F A L L E N | I N | I T | +H E | H A D N ' T | S E E N | A N Y T H I N G | T O | S H O O T | A T | B U T | H E | B L A Z E D | A W A Y | H E | L O A D E D | A N D | F I R E D | T H E | S E C O N D | T I M E | W H E N | T H E Y | W E R E | O R D E R E D | T O | R E T R E A T | +H E | W A N T E D | T O | G O | B Y | H O M E | A N D | T E L L | H I S | W I F E | A N D | C H I L D R E N | G O O D | B Y E | A N D | T O | G E T | H I S | C L O T H E S | I T | W A S | N O | G O | +W E | H A D | B E E F | F O R | S U P P E R | T H A T | N I G H T | +A | Y A N K E E | A L W A Y S | S A Y S | N A G E R | +Y A N K | S A Y S | W H A T | Y O U | D O I N G | J O H N N Y | +A D V A N C E | I N T O | T E N N E S S E E | +A | M A N | I N | T H E | W E L L | +O U T S I D E | O F | T H E S E | O C C A S I O N A L | R E M I N D E R S | W E | C O U L D | S E E | N O | E V I D E N C E | O F | T H E | D E S O L A T I O N | O F | T H E | T R A C K | O F | A N | I N V A D I N G | A R M Y | +Y O U | S E E | J I M | K N O W E D | T H E | L A W | +W E | W E R E | N O T | T W E N T Y | Y A R D S | O F F | F R O M | T H E | Y A N K E E S | A N D | T H E Y | W E R E | P O U R I N G | T H E | H O T | S H O T | A N D | S H E L L S | R I G H T | I N T O | O U R | R A N K S | A N D | E V E R Y | M A N | W A S | Y E L L I N G | A T | T H E | T O P | O F | H I S | V O I C E | C E A S E | F I R I N G | Y O U | A R E | F I R I N G | O N | Y O U R | O W N | M E N | C E A S E | F I R I N G | Y O U | A R E | F I R I N G | O N | Y O U R | O W N | M E N | +I | S O O N | F O U N D | O U T | T H A T | H E | H A D | C A U G H T | S I G H T | O F | T H E | R E L I E F | O N | T H E | R O A D | A N D | W A S | A F R A I D | T O | S H O O T | I | Q U I C K L Y | M A D E | U P | M Y | M I N D | +W E | K E P T | F A L L I N G | B A C K | A N D | F I R I N G | A L L | D A Y | A N D | W E R E | R E L I E V E D | B Y | A N O T H E R | R E G I M E N T | A B O U T | D A R K | W E | R E J O I N E D | O U R | R E G I M E N T | +I | T H I N K | W E | M U S T | H A V E | K I L L E D | A | G O O D | M A N Y | I N | T H E | O L D | F I E L D | B E C A U S E | W E | W E R E | F I R I N G | A L L | T H E | T I M E | A T | T H E | S O L I D | L I N E | A S | T H E Y | A D V A N C E D | U P O N | U S | +M Y | G U N | W A S | A T | M Y | F E E T | A N D | O N E | S T E P | W O U L D | G E T | I T | +T H E | P R I V A T E | C O U L D | B U T | H E | W A S | N O | G E N E R A L | Y O U | S E E | +W E | R E M A I N E D | S E V E R A L | M O N T H S | B U T | S O O N | W E | W E R E | O N | T H E | T R A M P | A G A I N | +I | T H O U G H T | I T | H A D | B E E N | T O R N | F R O M | M Y | S H O U L D E R | +I | M A D E | A | Q U I C K | G L A N C E | O V E R | M Y | S H O U L D E R | A N D | G R A B B E D | A T | M Y | G U N | +F R O M | T I M E | T O | T I M E | D I F F E R E N T | R E G I M E N T S | W E R E | S E N T | F O R W A R D | T O | D O | P I C K E T | D U T Y | +H E | D I V I N E D | M Y | M O T I V E | A N D | F I R E D | T H E | B A L L | M I S S E D | I T S | A I M | +T H E | Y A N K E E | P I C K E T | L I N E S | W E R E | N O T | A | H A L F | M I L E | O F F | +I | A M | A | V I D E T | Y O U | K N O W | T H E | R E S P O N S I B I L I T Y | R E S T I N G | O N | M E | +W E | W E R E | O R D E R E D | F O R W A R D | T O | T H E | A T T A C K | +B U T | I | C O U L D | N O T | B E A R | T H E | T H O U G H T | O F | W E A R I N G | D E A D | M E N ' S | S H O E S | +I | L 
O O K E D | A T | I T | P R E T T Y | C L O S E | A N D | I | S A I D | G R E A T | G O D | +H E | W A S | W A L K I N G | A L O N G | W H E N | A L L | A T | O N C E | H E | D R O P P E D | D O W N | A N D | D I E D | W I T H O U T | A | S T R U G G L E | O R | A | G R O A N | +S A Y S | H E | I | W O U L D | N O T | T R U S T | A | S E C E S H | O N | H I S | W O R D | O A T H | O R | B O N D | M A R C H | I | S A Y | +I | S A W | A N D | F E L T | T H A T | H E | W A S | N O T | F I G H T I N G | F O R | G L O R Y | B U T | T H A T | H E | W A S | F I G H T I N G | F O R | H I S | C O U N T R Y | B E C A U S E | H E | L O V E D | T H A T | C O U N T R Y | A N D | H E | W A S | W I L L I N G | T O | G I V E | H I S | L I F E | F O R | H I S | C O U N T R Y | A N D | T H E | S U C C E S S | O F | O U R | C A U S E | +B A D | G E N E R A L S H I P | I | T H O U G H T | I T | W A S | C H R I S T M A S | +O A K L E Y | C O L O R | B E A R E R | O F | T H E | F O U R T H | T E N N E S S E E | R E G I M E N T | R A N | R I G H T | U P | I N | T H E | M I D S T | O F | T H E | Y A N K E E | L I N E | W I T H | H I S | C O L O R S | B E G G I N G | H I S | M E N | T O | F O L L O W | +O U R | A R M Y | S T O P P E D | A T | M U R F R E E S B O R O | +W E | W E R E | A T | T H A T | T I M E | A T | L E A S T | A | H U N D R E D | Y A R D S | I N | A D V A N C E | O F | T H E | B R I G A D E | C H E A T H A M | A L L | T H E | T I M E | C A L L I N G | U P O N | T H E | M E N | T O | C O M E | O N | +W E | W E R E | R I G H T | U P O N | T H E | Y A N K E E | L I N E | O N | T H E | W I L K E R S O N | T U R N P I K E | +H E | W A S | S T O N E | D E A D | B U T | I | D R O P P E D | T H A T | F O O T | Q U I C K | +O U R | P I C K E T S | H A D | R U N | I N | A N D | R E P O R T E D | A | N I G H T | A T T A C K | +T H E | F E D E R A L | A R M Y | W A S | C O N C E N T R A T I N G | A T | N A S H V I L L E | T H E R E | W A S | N O | R E S T | F O R | T H E | W E A R Y | +T H E | L E A D E N | H A I L | S T O R M | S W E P T | T H E M | O F F | T H E | F I E L D | T H E Y | F E L L | B A C K | A N D | R E | F O R M E D | +B E F O R E | W E | A R R I V E D | A T | T H E | H O U S E | W E | S A W | A | B O D Y | O F | Y A N K E E S | A P P R O A C H I N G | A N D | A S | W E | S T A R T E D | T O | R U N | B A C K | T H E Y | F I R E D | U P O N | U S | +L I N E | O F | B A T T L E | W A S | F O R M E D | O N | T H E | N O R T H | B A N K | O F | S T O N E ' S | R I V E R | O N | T H E | Y A N K E E | S I D E | +T H E | Y A N K E E S | M A R C H E D | O V E R | T H E | H I L L | O U T | O F | S I G H T | +I | C A L L E D | L I E U T E N A N T | C O L O N E L | F R I E R S O N ' S | A T T E N T I O N | T O | T H E | Y A N K E E S | A N D | H E | R E M A R K E D | W E L L | I | D O N ' T | K N O W | W H E T H E R | T H E Y | A R E | Y A N K E E S | O R | N O T | B U T | I F | T H E Y | A R E | T H E Y | W I L L | C O M E | O U T | O F | T H E R E | M I G H T Y | Q U I C K | +A S | S O O N | A S | E V E R | O F | M Y | S E C O N D | A G E | I | W A S | U P O N | T H E | T H R E S H O L D | A N D | C H A N G E D | L I F E | H I M S E L F | F R O M | M E | H E | T O O K | A N D | G A V E | T O | O T H E R S | +B U T | W I T H | F U L L | R A V I S H M E N T | T H E | H O U R S | O F | P R I M E | S I N G I N G | R E C E I V E D | T H E Y | I N | T H E | M I D S T | O F | L E A V E S | T H A T | E V E R | B O R E | A | B U R D E N | T O | T H E I R | R H Y M E S | +B E T W E E N | H E R | S T E P S | A 
N D | M I N E | W E R E | N O T | A | H U N D R E D | W H E N | E Q U A L L Y | T H E | M A R G I N S | G A V E | A | T U R N | I N | S U C H | A | W A Y | T H A T | T O | T H E | E A S T | I | F A C E D | +T H E S E | S T A N D A R D S | T O | T H E | R E A R W A R D | L O N G E R | W E R E | T H A N | W A S | M Y | S I G H T | A N D | A S | I T | S E E M E D | T O | M E | T E N | P A C E S | W E R E | T H E | O U T E R M O S T | A P A R T | +I | S A W | T H E | L A D Y | W H O | E R E W H I L E | A P P E A R E D | V E I L E D | U N D E R N E A T H | T H E | A N G E L I C | F E S T I V A L | D I R E C T | H E R | E Y E S | T O | M E | A C R O S S | T H E | R I V E R | +B Y | H I S | D E F A U L T | S H O R T | W H I L E | H E | S O J O U R N E D | H E R E | B Y | H I S | D E F A U L T | T O | W E E P I N G | A N D | T O | T O I L | H E | C H A N G E D | H I S | I N N O C E N T | L A U G H T E R | A N D | S W E E T | P L A Y | +T H I S | E V E R Y | O T H E R | S A V O U R | D O T H | T R A N S C E N D | A N D | N O T W I T H S T A N D I N G | S L A K E D | S O | F A R | M A Y | B E | T H Y | T H I R S T | T H A T | I | R E V E A L | T O | T H E E | N O | M O R E | +A S | S O O N | A S | O N | M Y | V I S I O N | S M O T E | T H E | P O W E R | S U B L I M E | T H A T | H A D | A L R E A D Y | P I E R C E D | M E | T H R O U G H | E R E | F R O M | M Y | B O Y H O O D | I | H A D | Y E T | C O M E | F O R T H | +Y E | K E E P | Y O U R | W A T C H | I N | T H E | E T E R N A L | D A Y | S O | T H A T | N O R | N I G H T | N O R | S L E E P | C A N | S T E A L | F R O M | Y O U | O N E | S T E P | T H E | A G E S | M A K E | U P O N | T H E I R | P A T H | +T O | T H E | L E F T | H A N D | I | T U R N E D | W I T H | T H A T | R E L I A N C E | W I T H | W H I C H | T H E | L I T T L E | C H I L D | R U N S | T O | H I S | M O T H E R | W H E N | H E | H A S | F E A R | O R | W H E N | H E | I S | A F F L I C T E D | +N O T | O N L Y | R O M E | W I T H | N O | S U C H | S P L E N D I D | C A R | E ' E R | G L A D D E N E D | A F R I C A N U S | O R | A U G U S T U S | B U T | P O O R | T O | I T | T H A T | O F | T H E | S U N | W O U L D | B E | +A N D | W H E N | T H E | C A R | W A S | O P P O S I T E | T O | M E | T H U N D E R | W A S | H E A R D | A N D | A L L | T H A T | F O L K | A U G U S T | S E E M E D | T O | H A V E | F U R T H E R | P R O G R E S S | I N T E R D I C T E D | +C O N F U S I O N | A N D | D I S M A Y | T O G E T H E R | M I N G L E D | F O R C E D | S U C H | A | Y E S | F R O M | O U T | M Y | M O U T H | T H A T | S I G H T | W A S | N E E D F U L | T O | T H E | U N D E R S T A N D I N G | O F | I T | +T O | S A Y | U N T O | V I R G I L I U S | N O T | A | D R A C H M | O F | B L O O D | R E M A I N S | I N | M E | T H A T | D O E S | N O T | T R E M B L E | I | K N O W | T H E | T R A C E S | O F | T H E | A N C I E N T | F L A M E | +T H E | G O O D | S U P R E M E | S O L E | I N | I T S E L F | D E L I G H T I N G | C R E A T E D | M A N | G O O D | A N D | T H I S | G O O D L Y | P L A C E | G A V E | H I M | A S | H A N S E L | O F | E T E R N A L | P E A C E | +T H O U | M A K E S T | M E | R E M E M B E R | W H E R E | A N D | W H A T | P R O S E R P I N A | T H A T | M O M E N T | W A S | W H E N | L O S T | H E R | M O T H E R | H E R | A N D | S H E | H E R S E L F | T H E | S P R I N G | +I | D O | N O T | T H I N K | T H E R E | S H O N E | S O | G R E A T | A | L I G H T | U N D E R | T H E | L I D S | O F | V E N U S | W H E N | T R A N S F 
I X E D | B Y | H E R | O W N | S O N | B E Y O N D | H I S | U S U A L | C U S T O M | +B U T | B Y | T H E | L A R G E S S | O F | C E L E S T I A L | G R A C E S | W H I C H | H A V E | S U C H | L O F T Y | V A P O U R S | F O R | T H E I R | R A I N | T H A T | N E A R | T O | T H E M | O U R | S I G H T | A P P R O A C H E S | N O T | +D A N T E | B E C A U S E | V I R G I L I U S | H A S | D E P A R T E D | D O | N O T | W E E P | Y E T | D O | N O T | W E E P | Y E T | A W H I L E | F O R | B Y | A N O T H E R | S W O R D | T H O U | N E E D ' S T | M U S T | W E E P | +A N D | O N E | O F | T H E M | A S | I F | B Y | H E A V E N | C O M M I S S I O N E D | S I N G I N G | V E N I | S P O N S A | D E | L I B A N O | S H O U T E D | T H R E E | T I M E S | A N D | A L L | T H E | O T H E R S | A F T E R | +Y E | A R E | N E W | C O M E R S | A N D | B E C A U S E | I | S M I L E | B E G A N | S H E | P E R A D V E N T U R E | I N | T H I S | P L A C E | E L E C T | T O | H U M A N | N A T U R E | F O R | I T S | N E S T | +T H R E E | M A I D E N S | A T | T H E | R I G H T | W H E E L | I N | A | C I R C L E | C A M E | O N W A R D | D A N C I N G | O N E | S O | V E R Y | R E D | T H A T | I N | T H E | F I R E | S H E | H A R D L Y | H A D | B E E N | N O T E D | +L O O K | A T | M E | W E L L | I N | S O O T H | I ' M | B E A T R I C E | +A L L | W A T E R S | T H A T | O N | E A R T H | M O S T | L I M P I D | A R E | W O U L D | S E E M | T O | H A V E | W I T H I N | T H E M S E L V E S | S O M E | M I X T U R E | C O M P A R E D | W I T H | T H A T | W H I C H | N O T H I N G | D O T H | C O N C E A L | +W H E N C E | S H E | T O | M E | I N | T H O S E | D E S I R E S | O F | M I N E | W H I C H | L E D | T H E E | T O | T H E | L O V I N G | O F | T H A T | G O O D | B E Y O N D | W H I C H | T H E R E | I S | N O T H I N G | T O | A S P I R E | T O | +N O W | H E L I C O N | M U S T | N E E D S | P O U R | F O R T H | F O R | M E | A N D | W I T H | H E R | C H O I R | U R A N I A | M U S T | A S S I S T | M E | T O | P U T | I N | V E R S E | T H I N G S | D I F F I C U L T | T O | T H I N K | +T H E | I N T E R V A L | B E T W E E N | T H E S E | F O U R | C O N T A I N E D | A | C H A R I O T | T R I U M P H A L | O N | T W O | W H E E L S | W H I C H | B Y | A | G R I F F I N ' S | N E C K | C A M E | D R A W N | A L O N G | +N O R | E V E N | T H U S | O U R | W A Y | C O N T I N U E D | F A R | B E F O R E | T H E | L A D Y | W H O L L Y | T U R N E D | H E R S E L F | U N T O | M E | S A Y I N G | B R O T H E R | L O O K | A N D | L I S T E N | +A N D | I | B E H E L D | T H E | F L A M E L E T S | O N W A R D | G O | L E A V I N G | B E H I N D | T H E M S E L V E S | T H E | A I R | D E P I C T E D | A N D | T H E Y | O F | T R A I L I N G | P E N N O N S | H A D | T H E | S E M B L A N C E | S O | T H A T | I T | O V E R H E A D | R E M A I N E D | D I S T I N C T | W I T H | S E V E N F O L D | L I S T S | A L L | O F | T H E M | O F | T H E | C O L O U R S | W H E N C E | T H E | S U N ' S | B O W | I S | M A D E | A N D | D E L I A ' S | G I R D L E | +T H E R E F O R E | M Y | A N S W E R | I S | W I T H | G R E A T E R | C A R E | T H A T | H E | M A Y | H E A R | M E | W H O | I S | W E E P I N G | Y O N D E R | S O | T H A T | T H E | S I N | A N D | D O L E | B E | O F | O N E | M E A S U R E | +I N | R E A R | O F | A L L | T H E | G R O U P | H E R E | T R E A T E D | O F | T W O | O L D | M E N | I | B E H E L D | U N L I K E | I N | H A B I T | B U 
T | L I K E | I N | G A I T | E A C H | D I G N I F I E D | A N D | G R A V E | +A N D | W H A T | A L L U R E M E N T S | O R | W H A T | V A N T A G E S | U P O N | T H E | F O R E H E A D | O F | T H E | O T H E R S | S H O W E D | T H A T | T H O U | S H O U L D S T | T U R N | T H Y | F O O T S T E P S | U N T O | T H E M | +T H E N | B A C K | I | T U R N E D | M Y | F A C E | T O | T H O S E | H I G H | T H I N G S | W H I C H | M O V E D | T H E M S E L V E S | T O W A R D S | U S | S O | S E D A T E L Y | T H E Y | H A D | B E E N | D I S T A N C E D | B Y | N E W | W E D D E D | B R I D E S | +M U S T | I | L E A V E | A L O N E | N O | +M Y | F A T H E R | H A S | R E V E A L E D | T H E | C U L P R I T ' S | N A M E | M Y | F A T H E R | T H I R S T S | F O R | R E V E N G E | A S | M U C H | A S | Y O U | D O | Y E T | E V E N | H E | C O N J U R E S | Y O U | A S | I | D O | T O | K E E P | T H I S | S E C R E T | D O | Y O U | N O T | F A T H E R | +A N D | T H E | C R Y | I S S U E D | F R O M | H I S | P O R E S | I F | W E | M A Y | T H U S | S P E A K | A | C R Y | F R I G H T F U L | I N | I T S | S I L E N C E | +T H E | O L D | M A N ' S | E Y E S | R E M A I N E D | F I X E D | O N | T H E | D O O R | +B U T | I N | L E S S | T H A N | F I V E | M I N U T E S | T H E | S T A I R C A S E | G R O A N E D | B E N E A T H | A N | E X T R A O R D I N A R Y | W E I G H T | +T H E | T W O | D O C T O R S | T H E R E F O R E | E N T E R E D | T H E | R O O M | A L O N E | +I | A M | G O I N G | S I R | A N D | I | D O | N O T | H E S I T A T E | T O | S A Y | T H A T | N O | P R A Y E R S | W I L L | B E | M O R E | F E R V E N T | T H A N | M I N E | +D ' A V R I G N Y | U N A B L E | T O | B E A R | T H E | S I G H T | O F | T H I S | T O U C H I N G | E M O T I O N | T U R N E D | A W A Y | A N D | V I L L E F O R T | W I T H O U T | S E E K I N G | A N Y | F U R T H E R | E X P L A N A T I O N | A N D | A T T R A C T E D | T O W A R D S | H I M | B Y | T H E | I R R E S I S T I B L E | M A G N E T I S M | W H I C H | D R A W S | U S | T O W A R D S | T H O S E | W H O | H A V E | L O V E D | T H E | P E O P L E | F O R | W H O M | W E | M O U R N | E X T E N D E D | H I S | H A N D | T O W A R D S | T H E | Y O U N G | M A N | +G E N T L E M E N | H E | S A I D | I N | A | H O A R S E | V O I C E | G I V E | M E | Y O U R | W O R D | O F | H O N O R | T H A T | T H I S | H O R R I B L E | S E C R E T | S H A L L | F O R E V E R | R E M A I N | B U R I E D | A M O N G S T | O U R S E L V E S | T H E | T W O | M E N | D R E W | B A C K | +D ' A V R I G N Y | R U S H E D | T O W A R D S | T H E | O L D | M A N | A N D | M A D E | H I M | I N H A L E | A | P O W E R F U L | R E S T O R A T I V E | +M O R R E L | S U F F E R E D | A N | E X C L A M A T I O N | O F | H O R R O R | A N D | S U R P R I S E | T O | E S C A P E | H I M | +A S K E D | M O R R E L | Y E S | +I T | W A S | S O M E T H I N G | T E R R I B L E | T O | W I T N E S S | T H E | S I L E N T | A G O N Y | T H E | M U T E | D E S P A I R | O F | N O I R T I E R | W H O S E | T E A R S | S I L E N T L Y | R O L L E D | D O W N | H I S | C H E E K S | +G O | D O | Y O U | H E A R | +T H E | D I S T R I C T | D O C T O R | A P P R O A C H E D | W I T H | T H E | I N D I F F E R E N C E | O F | A | M A N | A C C U S T O M E D | T O | S P E N D | H A L F | H I S | T I M E | A M O N G S T | T H E | D E A D | H E | T H E N | L I F T E D | T H E | S H E E T | W H I C H | W A S | P L A C E D | O V E R | T H E | F A C 
E | A N D | J U S T | U N C L O S E D | T H E | L I P S | +S A I D | M O R R E L | S A D L Y | Y E S | R E P L I E D | N O I R T I E R | +O H | Y O U | R A V E | S I R | E X C L A I M E D | V I L L E F O R T | I N | V A I N | E N D E A V O R I N G | T O | E S C A P E | T H E | N E T | I N | W H I C H | H E | W A S | T A K E N | I | R A V E | +D O | Y O U | K N O W | T H E | A S S A S S I N | A S K E D | M O R R E L | +T H E | O L D | M A N | M A D E | A | S I G N | I N | T H E | A F F I R M A T I V E | +N O I R T I E R | W A S | N E A R | T H E | B E D | P A L E | M O T I O N L E S S | A N D | S I L E N T | A S | T H E | C O R P S E | +T H E | N E A R E S T | S A I D | T H E | D I S T R I C T | D O C T O R | I S | A | G O O D | I T A L I A N | A B B E | W H O | L I V E S | N E X T | D O O R | T O | Y O U | S H A L L | I | C A L L | O N | H I M | A S | I | P A S S | +A T | T H I S | M O M E N T | T H E | W H O L E | S O U L | O F | T H E | O L D | M A N | S E E M E D | C E N T R E D | I N | H I S | E Y E S | W H I C H | B E C A M E | B L O O D S H O T | T H E | V E I N S | O F | T H E | T H R O A T | S W E L L E D | H I S | C H E E K S | A N D | T E M P L E S | B E C A M E | P U R P L E | A S | T H O U G H | H E | W A S | S T R U C K | W I T H | E P I L E P S Y | N O T H I N G | W A S | W A N T I N G | T O | C O M P L E T E | T H I S | B U T | T H E | U T T E R A N C E | O F | A | C R Y | +W H A T | D O | Y O U | M E A N | S I R | +B U T | C A N | H E | U N D E R S T A N D | Y O U | Y E S | +D ' A V R I G N Y | S A I D | V I L L E F O R T | B E | S O | K I N D | I | B E S E E C H | Y O U | A S | T O | A C C O M P A N Y | T H I S | G E N T L E M A N | H E R E | I S | T H E | K E Y | O F | T H E | D O O R | S O | T H A T | Y O U | C A N | G O | I N | A N D | O U T | A S | Y O U | P L E A S E | Y O U | W I L L | B R I N G | T H E | P R I E S T | W I T H | Y O U | A N D | W I L L | O B L I G E | M E | B Y | I N T R O D U C I N G | H I M | I N T O | M Y | C H I L D ' S | R O O M | D O | Y O U | W I S H | T O | S E E | H I M | +B U T | H E | S T O P P E D | O N | T H E | L A N D I N G | H E | H A D | N O T | T H E | C O U R A G E | T O | A G A I N | V I S I T | T H E | D E A T H | C H A M B E R | +I | O N L Y | W I S H | T O | B E | A L O N E | Y O U | W I L L | E X C U S E | M E | W I L L | Y O U | N O T | +N O I R T I E R | L O O K E D | U P O N | M O R R E L | W I T H | O N E | O F | T H O S E | M E L A N C H O L Y | S M I L E S | W H I C H | H A D | S O | O F T E N | M A D E | V A L E N T I N E | H A P P Y | A N D | T H U S | F I X E D | H I S | A T T E N T I O N | +F O R | S O M E | T I M E | N O T H I N G | W A S | H E A R D | I N | T H A T | C H A M B E R | B U T | S O B S | E X C L A M A T I O N S | A N D | P R A Y E R S | +I | C O U L D | N O T | T A K E | M Y | E Y E S | O F F | T H E | M A N | I N | T H E | B E D | +T H E | L I T T L E | H O U S E | O N | T H E | H I L L S I D E | W A S | S O | M U C H | T H E | C O L O R | O F | T H E | N I G H T | T H A T | W E | C O U L D | N O T | S E E | I T | A S | W E | C A M E | U P | T H E | D R A W | +S H E | A S K E D | P E T E R | T O | W A I T | A | M O M E N T | A N D | W H E N | S H E | C A M E | B A C K | F R O M | T H E | K I T C H E N | S H E | B R O U G H T | A | B A G | O F | S A N D W I C H E S | A N D | D O U G H N U T S | F O R | U S | +H E | L A Y | P A T I E N T L Y | F I G H T I N G | F O R | B R E A T H | L I K E | A | C H I L D | W I T H | C R O U P | +W H E N | H E | W A S | O U T | H U N T I N G | H E | U S E D | T O | G O 
| I N T O | T H E | E M P T Y | L O G | H O U S E | A N D | S I T | T H E R E | B R O O D I N G | +P E T E R | C R O U C H I N G | I N | T H E | F R O N T | S E A T | S A W | N O T H I N G | +W I T H O U T | A | W O R D | P E T E R | G O T | U P | A N D | L I T | H I S | L A N T E R N | +Q U I C K L Y | I T | W A S | C O V E R E D | W I T H | B R I G H T | R E D | S P O T S | I | T H O U G H T | I | H A D | N E V E R | S E E N | A N Y | B L O O D | S O | B R I G H T | +T H E | R O A D | W A S | C L E A R | A N D | W H I T E | A N D | T H E | G R O O M ' S | T H R E E | B L A C K S | W E N T | L I K E | T H E | W I N D | +T H E Y | M A D E | M E | T H I N K | O F | D E F E A T E D | A R M I E S | R E T R E A T I N G | O R | O F | G H O S T S | W H O | W E R E | T R Y I N G | D E S P E R A T E L Y | T O | G E T | I N | F O R | S H E L T E R | A N D | T H E N | W E N T | M O A N I N G | O N | +T H E | S H A R P | S M E L L | O F | S P I R I T S | W E N T | T H R O U G H | T H E | R O O M | +Y E S | H O W | M A N Y | +W H E R E V E R | T H E Y | W E N T | T H E | S T O R Y | F O L L O W E D | T H E M | +T H I S | C A B I N | W A S | H I S | H E R M I T A G E | U N T I L | T H E | W I N T E R | S N O W S | P E N N E D | H I M | I N | H I S | C A V E | +T H E | S H R I E K S | T H A T | F O L L O W E D | M A D E | E V E R Y B O D Y | S O B E R | +A N D | T H E | W O L V E S | P A V E L | A S K E D | E N O U G H | E N O U G H | F O R | A L L | O F | U S | +S O M E T H I N G | H A P P E N E D | T O | T H E | H I N D M O S T | S L E D G E | T H E | D R I V E R | L O S T | C O N T R O L | H E | W A S | P R O B A B L Y | V E R Y | D R U N K | T H E | H O R S E S | L E F T | T H E | R O A D | T H E | S L E D G E | W A S | C A U G H T | I N | A | C L U M P | O F | T R E E S | A N D | O V E R T U R N E D | +F R O M | O U R | B E N C H | W E | C O U L D | S E E | W H A T | A | H O L L O W | C A S E | H I S | B O D Y | W A S | +D U R I N G | T H E | A U C T I O N | H E | W E N T | A B O U T | W I T H | H I S | H E A D | D O W N | A N D | N E V E R | L I F T E D | H I S | E Y E S | +G R A D U A L L Y | R E L I E F | C A M E | T O | A L L | O F | U S | +T H E | S I C K | M A N | R A G E D | A N D | S H O O K | H I S | F I S T | +T H E | L O S S | O F | H I S | T W O | F R I E N D S | H A D | A | D E P R E S S I N G | E F F E C T | U P O N | O L D | M I S T E R | S H I M E R D A | +M I S T E R | S H I M E R D A | W E N T | W I T H | H I M | +I T | S E E M E D | T O | M E | T H A T | H E | D E S P I S E D | H I M | F O R | B E I N G | S O | S I M P L E | A N D | D O C I L E | +A N T O N I A ' S | F A T H E R | U N C O V E R E D | O N E | O F | H I S | L O N G | B O N Y | L E G S | A N D | R U B B E D | I T | R H Y T H M I C A L L Y | +T H E | F I R S T | H O W L S | W E R E | T A K E N | U P | A N D | E C H O E D | A N D | W I T H | Q U I C K E N I N G | R E P E T I T I O N S | +P E T E R | T O L D | H I S | T R O U B L E S | T O | M I S T E R | S H I M E R D A | H E | W A S | U N A B L E | T O | M E E T | A | N O T E | W H I C H | F E L L | D U E | O N | T H E | F I R S T | O F | N O V E M B E R | H A D | T O | P A Y | A N | E X O R B I T A N T | B O N U S | O N | R E N E W I N G | I T | A N D | T O | G I V E | A | M O R T G A G E | O N | H I S | P I G S | A N D | H O R S E S | A N D | E V E N | H I S | M I L K | C O W | +A | B L A C K | D R O V E | C A M E | U P | O V E R | T H E | H I L L | B E H I N D | T H E | W E D D I N G | P A R T Y | +T W E N T Y | T H I R T Y | E N O U G H | +W E | L A Y | S T I L L 
| A N D | D I D | N O T | T A L K | +T H E Y | W E R E | R U N | O U T | O F | T H E I R | V I L L A G E | +T H E Y | W O R K E D | I N | C H I C A G O | D E S | M O I N E S | F O R T | W A Y N E | B U T | T H E Y | W E R E | A L W A Y S | U N F O R T U N A T E | +T H E | F I R S T | T H I N G | E I T H E R | O F | T H E M | N O T I C E D | W A S | A | N E W | S O U N D | T H A T | B R O K E | I N T O | T H E | C L E A R | A I R | L O U D E R | T H A N | T H E Y | H A D | E V E R | H E A R D | I T | B E F O R E | T H E | B E L L | O F | T H E | M O N A S T E R Y | O F | T H E I R | O W N | V I L L A G E | R I N G I N G | F O R | E A R L Y | P R A Y E R S | +P E T E R | C O U L D | G I V E | N O | V E R Y | C L E A R | A C C O U N T | O F | H I S | T R A N S A C T I O N S | W I T H | C U T T E R | +E V E R Y | O N E | S A I D | P E T E R | K I S S E D | T H E | C O W | B E F O R E | S H E | W A S | L E D | A W A Y | B Y | H E R | N E W | O W N E R | +H E | S E E M E D | T O | B E | C U R S I N G | P E O P L E | W H O | H A D | W R O N G E D | H I M | +T H E Y | W E R E | W I T H I N | A | F E W | M I L E S | O F | T H E I R | V I L L A G E | N O W | +T H E R E | A R E | O N L Y | T H R E E | S L E D G E S | L E F T | H E | W H I S P E R E D | +P A V E L | K N O C K E D | H I M | O V E R | T H E | S I D E | O F | T H E | S L E D G E | A N D | T H R E W | T H E | G I R L | A F T E R | H I M | +A F T E R | T H E | C E R E M O N Y | A T | T H E | C H U R C H | T H E | P A R T Y | W E N T | T O | A | D I N N E R | G I V E N | B Y | T H E | P A R E N T S | O F | T H E | B R I D E | +N O W | H I S | M I D D L E | H O R S E | W A S | B E I N G | A L M O S T | D R A G G E D | B Y | T H E | O T H E R | T W O | +I | E X P L A I N E D | T O | A N T O N I A | H O W | T H I S | M E A N T | T H A T | H E | W A S | T W E N T Y | F O U R | Y E A R S | O L D | T H A T | H E | M U S T | H A V E | B E E N | T H E R E | W H E N | W H I T E | M E N | F I R S T | C A M E | L E F T | O N | F R O M | B U F F A L O | A N D | I N D I A N | T I M E S | +I | F O L L O W E D | W I T H | T H E | S P A D E | O V E R | M Y | S H O U L D E R | D R A G G I N G | M Y | S N A K E | +I | N E V E R | K N O W | Y O U | W A S | S O | B R A V E | J I M | S H E | W E N T | O N | C O M F O R T I N G L Y | +I | W H I R L E D | R O U N D | A N D | T H E R E | O N | O N E | O F | T H O S E | D R Y | G R A V E L | B E D S | W A S | T H E | B I G G E S T | S N A K E | I | H A D | E V E R | S E E N | +H E | C O U L D | S T A N D | R I G H T | U P | A N D | T A L K | T O | Y O U | H E | C O U L D | D I D | H E | F I G H T | H A R D | +O T T O | W I N K E D | A T | M E | +O N E | D A Y | W H E N | I | R O D E | O V E R | T O | T H E | S H I M E R D A S | I | F O U N D | A N T O N I A | S T A R T I N G | O F F | O N | F O O T | F O R | R U S S I A N | P E T E R ' S | H O U S E | T O | B O R R O W | A | S P A D E | A M B R O S C H | N E E D E D | +T H E R E | H A D | B E E N | A N O T H E R | B L A C K | F R O S T | T H E | N I G H T | B E F O R E | A N D | T H E | A I R | W A S | C L E A R | A N D | H E A D Y | A S | W I N E | +W E | D E C I D E D | T H A T | A N T O N I A | S H O U L D | R I D E | D U D E | H O M E | A N D | I | W O U L D | W A L K | +A | S N A K E | O F | H I S | S I Z E | I N | F I G H T I N G | T R I M | W O U L D | B E | M O R E | T H A N | A N Y | B O Y | C O U L D | H A N D L E | +O T T O | F U C H S | W A S | T H E | F I R S T | O N E | W E | M E T | +L O O K | T O N Y | T H A T ' S | H I S | P O I S O N | I | S A I D 
| +T H I S | C H A N G E | C A M E | A B O U T | F R O M | A N | A D V E N T U R E | W E | H A D | T O G E T H E R | +A | F A I N T | F E T I D | S M E L L | C A M E | F R O M | H I M | A N D | A | T H R E A D | O F | G R E E N | L I Q U I D | O O Z E D | F R O M | H I S | C R U S H E D | H E A D | +S H E | W A S | F O U R | Y E A R S | O L D E R | T H A N | I | T O | B E | S U R E | A N D | H A D | S E E N | M O R E | O F | T H E | W O R L D | B U T | I | W A S | A | B O Y | A N D | S H E | W A S | A | G I R L | A N D | I | R E S E N T E D | H E R | P R O T E C T I N G | M A N N E R | +I | K N O W | I | A M | J U S T | A W F U L | J I M | I | W A S | S O | S C A R E D | +I T | W A S | O N | O N E | O F | T H E S E | G R A V E L | B E D S | T H A T | I | M E T | M Y | A D V E N T U R E | +I T | W A S | T H E | S E A S O N | W H E N | T H E | A N C I E N T | S U N | G O D | H A D | B E E N | A C C U S T O M E D | T O | R E C E I V E | H I S | A N N U A L | O B L A T I O N S | A N D | W E | C A N | W E L L | B E L I E V E | T H A T | T H O S E | W H O S E | H E A R T S | S T I L L | T R E M B L E D | A T | T H E | N A M E | O F | B E L | M U S T | H A V E | C O N N E C T E D | T H E | E C L I P S E | A N D | T H E | P L A G U E | W I T H | T H E | R E V O L U T I O N | I N | T H E | N A T I O N A L | W O R S H I P | A N D | T H E | O V E R T H R O W | O F | T H E | A N C I E N T | G O D S | O N | T H A T | P L A I N | O F | P R O S T R A T I O N | W H E R E | T H E Y | H A D | S O | L O N G | R E C E I V E D | T H E | H O M A G E | O F | A N | E N T I R E | P E O P L E | +S O | S L O W | A N D | P A T I E N T | I S | T H E | P R O C E S S | B Y | W H I C H | C H R I S T I A N I T Y | I N F U S E S | I T S E L F | I N T O | T H E | S O C I A L | L I F E | O F | A | C O N V E R T E D | P E O P L E | +N O T H I N G | C O U L D | B E | M O R E | N A T U R A L | T H A N | S U C H | A N | A S S E M B L Y | I N | S U C H | A | P L A C E | A T | S U C H | A | P E R I O D | +L A S T L Y | T H E | R O Y A L | B R O T H E R S | F E L L | T H E M S E L V E S | V I C T I M S | T O | T H E | E P I D E M I C | W H I C H | S O | S A D L Y | S I G N A L I Z E S | T H E I R | R E I G N | +T H E | T R I B U T E | W A S | A T | T H I S | P E R I O D | E N O R M O U S | F I F T E E N | T H O U S A N D | H E A D | O F | C A T T L E | A N N U A L L Y | +T H E | K I N G D O M | O F | N O R T H U M B R I A | A S | T H E | N A M E | I M P L I E S | E M B R A C E D | N E A R L Y | A L L | T H E | C O U N T R Y | F R O M | T H E | H U M B E R | T O | T H E | P I C T I S H | B O R D E R | +H E R E | T H E | H O L Y | P R E L A T E | O F | F E R N S | M E T | H I M | A N D | R E L A T E D | A | V I S I O N | I N | W H I C H | H E | H A D | B E E N | I N S T R U C T E D | T O | D E M A N D | T H E | A B O L I T I O N | O F | T H E | I M P O S T | +T H E | P O E T S | O F | S U C C E E D I N G | A G E S | H A V E | D W E L T | M U C H | I N | D E T A I L | O N | T H E | O C C U R R E N C E S | O F | T H I S | M E M O R A B L E | D A Y | +T H E | S A X O N S | O F | K E N T | A N D | T H E | S O U T H E R N | K I N G D O M S | G E N E R A L L Y | W E R E | C O N V E R T E D | B Y | M I S S I O N A R I E S | F R O M | F R A N C E | O R | R O M E | O R | N A T I V E | P R E A C H E R S | O F | T H E | F I R S T | O R | S E C O N D | C H R I S T I A N | G E N E R A T I O N | T H O S E | O F | N O R T H U M B R I A | R E C O G N I S E | A S | T H E I R | A P O S T L E S | S A I N T | A I D A N | A N D | S A I N T | C U T H B E 
R T | T W O | F A T H E R S | F R O M | I O N A | +I T | I S | P R E T T Y | C L E A R | A L S O | T H A T | T H E | L A S T | R A L L Y | O F | D R U I D I S M | A G A I N S T | C H R I S T I A N I T Y | T O O K | P L A C E | B E H I N D | H I S | B A N N E R | O N | T H E | P L A I N | O F | M O I R A | +T H R O U G H O U T | T H I S | C E N T U R Y | T H E | P O W E R | O F | T H E | C H U R C H | W A S | C O N S T A N T L Y | O N | T H E | I N C R E A S E | A N D | I S | V I S I B L E | I N | M A N Y | I M P O R T A N T | C H A N G E S | +T H E | A N C E S T O R S | O F | T H E | P R E S E N T | P R E T E N D E R | C O N G A L | S U R N A M E D | T H E | S Q U I N T | E Y E D | H A D | T W I C E | R E C E I V E D | A N D | C H E R I S H E D | T H E | L I C E N T I O U S | B A R D S | W H E N | U N D E R | T H E | B A N | O F | T A R A | A N D | H I S | P O P U L A R I T Y | W I T H | T H A T | S T I L L | P O W E R F U L | O R D E R | W A S | O N E | P R O P | O F | H I S | A M B I T I O N | +T H E | B A R R E N | R O C K | A B O U T | T H R E E | M I L E S | I N | L E N G T H | W A S | C O V E R E D | W I T H | M O N A S T I C | B U I L D I N G S | A N D | I T S | C E M E T E R Y | W A S | A L R E A D Y | A D O R N E D | W I T H | T H E | T O M B S | O F | S A I N T S | A N D | K I N G S | +W H I L E | T H E | L I B E R A T E D | E X I L E S | R E J O I C E D | O N | T H E | P L A I N | O F | M E A T H | T H E | T E N T | O F | T H E | A B B O T | O F | I O N A | W A S | P I T C H E D | O N | T H E | R A T H | O F | T A R A | A | F A C T | W H I C H | W O U L D | S E E M | T O | I N D I C A T E | T H A T | A L R E A D Y | I N | L I T T L E | M O R E | T H A N | A | C E N T U R Y | S I N C E | T H E | I N T E R D I C T | H A D | F A L L E N | O N | I T | T H E | E D I F I C E S | W H I C H | M A D E | S O | F I N E | A | S H O W | I N | T H E | D A Y S | O F | P A T R I C K | W E R E | R U I N E D | A N D | U N I N H A B I T A B L E | +T H E | O N L Y | C O N F L I C T S | T H A T | O C C U R R E D | O N | I R I S H | S O I L | W I T H | A | P I C T I S H | O R | A N | A N G L O | S A X O N | F O R C E | I F | W E | E X C E P T | T H O S E | W H O | F O R M E D | A | C O N T I N G E N T | O F | C O N G A L ' S | A R M Y | A T | M O I R A | O C C U R R E D | I N | T H E | T I M E | O F | T H E | H O S P I T A B L E | F I N N A C T A | +S A I N T | M O L I N G | S U R V I V E D | H I M | T H R E E | Y E A R S | A N D | S A I N T | A D A M N A N | S O | I N T I M A T E L Y | C O N N E C T E D | W I T H | H I S | R E I G N | T E N | Y E A R S | +N O W | E V E R Y | M I S S I O N A R Y | T H A T | E V E R | W E N T | O U T | F R O M | I O N A | H A D | T A U G H T | T H A T | T O | R E D U C E | C H R I S T I A N S | T O | S L A V E R Y | W A S | W H O L L Y | I N C O N S I S T E N T | W I T H | A | B E L I E F | I N | T H E | D O C T R I N E S | O F | T H E | G O S P E L | +A S | L E A D I N G | T O | T H E | M E N T I O N | O F | O T H E R | I N T E R E S T I N G | E V E N T S | W E | M U S T | S E T | T H I S | I N R O A D | C L E A R L Y | B E F O R E | T H E | R E A D E R | +L I K E | T H E | T W O | K I N G S | O F | S P A R T A | T H E Y | R E I G N E D | J O I N T L Y | D I V I D I N G | B E T W E E N | T H E M | T H E | L A B O U R S | A N D | C A R E S | O F | S T A T E | diff --git a/SpeechT5/asr_train/train.tsv b/SpeechT5/asr_train/train.tsv new file mode 100644 index 0000000000000000000000000000000000000000..9e7dd56569928fbd2602534660120a071e56b123 --- /dev/null +++ 
b/SpeechT5/asr_train/train.tsv @@ -0,0 +1,2675 @@ +/public/home/changhl/dataset/LibriSpeech/dev-clean +2412/153954/2412-153954-0004.flac 166560 +2412/153954/2412-153954-0000.flac 158640 +2412/153954/2412-153954-0002.flac 64880 +2412/153954/2412-153954-0014.flac 281840 +2412/153954/2412-153954-0003.flac 130400 +2412/153954/2412-153954-0007.flac 134880 +2412/153954/2412-153954-0009.flac 166720 +2412/153954/2412-153954-0010.flac 127040 +2412/153954/2412-153954-0022.flac 56640 +2412/153954/2412-153954-0017.flac 111840 +2412/153954/2412-153954-0021.flac 97760 +2412/153954/2412-153954-0001.flac 156000 +2412/153954/2412-153954-0013.flac 62640 +2412/153954/2412-153954-0023.flac 42400 +2412/153954/2412-153954-0005.flac 93520 +2412/153954/2412-153954-0019.flac 119760 +2412/153954/2412-153954-0012.flac 129920 +2412/153954/2412-153954-0006.flac 163520 +2412/153954/2412-153954-0018.flac 193680 +2412/153954/2412-153954-0020.flac 127600 +2412/153954/2412-153954-0011.flac 94960 +2412/153954/2412-153954-0024.flac 67040 +2412/153954/2412-153954-0016.flac 135040 +2412/153954/2412-153954-0015.flac 182240 +2412/153948/2412-153948-0008.flac 93040 +2412/153948/2412-153948-0005.flac 50240 +2412/153948/2412-153948-0003.flac 136240 +2412/153948/2412-153948-0012.flac 98720 +2412/153948/2412-153948-0002.flac 59520 +2412/153948/2412-153948-0013.flac 136640 +2412/153948/2412-153948-0000.flac 186560 +2412/153948/2412-153948-0006.flac 265840 +2412/153948/2412-153948-0015.flac 55440 +2412/153948/2412-153948-0004.flac 362880 +2412/153948/2412-153948-0007.flac 149920 +2412/153948/2412-153948-0009.flac 95920 +2412/153948/2412-153948-0001.flac 168160 +2412/153948/2412-153948-0010.flac 46640 +2412/153948/2412-153948-0014.flac 155280 +2412/153948/2412-153948-0011.flac 155921 +2412/153947/2412-153947-0000.flac 40800 +2412/153947/2412-153947-0007.flac 139760 +2412/153947/2412-153947-0001.flac 51600 +2412/153947/2412-153947-0008.flac 173440 +2412/153947/2412-153947-0011.flac 162080 +2412/153947/2412-153947-0002.flac 105520 +2412/153947/2412-153947-0013.flac 134720 +2412/153947/2412-153947-0006.flac 186880 +2412/153947/2412-153947-0016.flac 82960 +2412/153947/2412-153947-0005.flac 218880 +2412/153947/2412-153947-0009.flac 139520 +2412/153947/2412-153947-0004.flac 139520 +2412/153947/2412-153947-0012.flac 31680 +2412/153947/2412-153947-0014.flac 204720 +2412/153947/2412-153947-0010.flac 275121 +2412/153947/2412-153947-0015.flac 176480 +2412/153947/2412-153947-0003.flac 152240 +8842/302203/8842-302203-0009.flac 193520 +8842/302203/8842-302203-0005.flac 132320 +8842/302203/8842-302203-0006.flac 84480 +8842/302203/8842-302203-0011.flac 107520 +8842/302203/8842-302203-0007.flac 59200 +8842/302203/8842-302203-0008.flac 70640 +8842/302203/8842-302203-0002.flac 252320 +8842/302203/8842-302203-0001.flac 151760 +8842/302203/8842-302203-0010.flac 230320 +8842/302203/8842-302203-0003.flac 69360 +8842/302203/8842-302203-0004.flac 208720 +8842/302203/8842-302203-0000.flac 179840 +8842/302196/8842-302196-0012.flac 90080 +8842/302196/8842-302196-0010.flac 133360 +8842/302196/8842-302196-0007.flac 99440 +8842/302196/8842-302196-0001.flac 149600 +8842/302196/8842-302196-0004.flac 91840 +8842/302196/8842-302196-0003.flac 141760 +8842/302196/8842-302196-0009.flac 112880 +8842/302196/8842-302196-0000.flac 234400 +8842/302196/8842-302196-0008.flac 84960 +8842/302196/8842-302196-0006.flac 85840 +8842/302196/8842-302196-0002.flac 106480 +8842/302196/8842-302196-0005.flac 375120 +8842/302196/8842-302196-0011.flac 59280 +8842/302201/8842-302201-0006.flac 
157840 +8842/302201/8842-302201-0008.flac 95520 +8842/302201/8842-302201-0010.flac 39600 +8842/302201/8842-302201-0003.flac 132240 +8842/302201/8842-302201-0012.flac 76880 +8842/302201/8842-302201-0015.flac 143760 +8842/302201/8842-302201-0005.flac 161520 +8842/302201/8842-302201-0007.flac 77680 +8842/302201/8842-302201-0013.flac 88560 +8842/302201/8842-302201-0000.flac 152000 +8842/302201/8842-302201-0009.flac 78640 +8842/302201/8842-302201-0002.flac 165360 +8842/302201/8842-302201-0004.flac 121440 +8842/302201/8842-302201-0001.flac 151040 +8842/302201/8842-302201-0011.flac 83520 +8842/302201/8842-302201-0014.flac 249920 +8842/304647/8842-304647-0006.flac 367360 +8842/304647/8842-304647-0007.flac 27680 +8842/304647/8842-304647-0012.flac 69200 +8842/304647/8842-304647-0002.flac 507200 +8842/304647/8842-304647-0001.flac 43200 +8842/304647/8842-304647-0011.flac 140080 +8842/304647/8842-304647-0010.flac 195120 +8842/304647/8842-304647-0000.flac 155360 +8842/304647/8842-304647-0013.flac 142000 +8842/304647/8842-304647-0008.flac 132480 +8842/304647/8842-304647-0005.flac 35520 +8842/304647/8842-304647-0009.flac 191280 +8842/304647/8842-304647-0004.flac 162480 +8842/304647/8842-304647-0003.flac 125520 +6345/93302/6345-93302-0025.flac 46960 +6345/93302/6345-93302-0007.flac 69920 +6345/93302/6345-93302-0008.flac 64480 +6345/93302/6345-93302-0012.flac 94800 +6345/93302/6345-93302-0001.flac 138000 +6345/93302/6345-93302-0014.flac 97520 +6345/93302/6345-93302-0020.flac 50560 +6345/93302/6345-93302-0028.flac 31040 +6345/93302/6345-93302-0023.flac 88480 +6345/93302/6345-93302-0004.flac 70640 +6345/93302/6345-93302-0026.flac 83040 +6345/93302/6345-93302-0002.flac 120800 +6345/93302/6345-93302-0010.flac 71200 +6345/93302/6345-93302-0016.flac 107520 +6345/93302/6345-93302-0006.flac 49040 +6345/93302/6345-93302-0024.flac 40560 +6345/93302/6345-93302-0005.flac 165760 +6345/93302/6345-93302-0021.flac 34640 +6345/93302/6345-93302-0003.flac 88560 +6345/93302/6345-93302-0022.flac 41120 +6345/93302/6345-93302-0029.flac 210160 +6345/93302/6345-93302-0000.flac 158080 +6345/93302/6345-93302-0015.flac 96400 +6345/93302/6345-93302-0013.flac 41200 +6345/93302/6345-93302-0009.flac 98000 +6345/93302/6345-93302-0011.flac 34080 +6345/93302/6345-93302-0027.flac 102800 +6345/93302/6345-93302-0019.flac 61040 +6345/93302/6345-93302-0017.flac 105360 +6345/64257/6345-64257-0018.flac 76560 +6345/64257/6345-64257-0020.flac 45760 +6345/64257/6345-64257-0012.flac 45040 +6345/64257/6345-64257-0007.flac 93440 +6345/64257/6345-64257-0016.flac 92480 +6345/64257/6345-64257-0013.flac 100720 +6345/64257/6345-64257-0014.flac 49360 +6345/64257/6345-64257-0008.flac 51280 +6345/64257/6345-64257-0019.flac 123520 +6345/64257/6345-64257-0004.flac 96800 +6345/64257/6345-64257-0010.flac 63760 +6345/64257/6345-64257-0002.flac 160720 +6345/64257/6345-64257-0001.flac 350880 +6345/64257/6345-64257-0005.flac 122640 +6345/64257/6345-64257-0003.flac 140480 +6345/64257/6345-64257-0011.flac 141280 +6345/64257/6345-64257-0009.flac 198160 +6345/64257/6345-64257-0000.flac 176880 +6345/64257/6345-64257-0006.flac 146000 +6345/64257/6345-64257-0017.flac 45280 +6345/64257/6345-64257-0015.flac 36640 +6345/93306/6345-93306-0020.flac 107760 +6345/93306/6345-93306-0017.flac 119600 +6345/93306/6345-93306-0005.flac 72240 +6345/93306/6345-93306-0011.flac 69200 +6345/93306/6345-93306-0016.flac 67440 +6345/93306/6345-93306-0009.flac 41120 +6345/93306/6345-93306-0008.flac 47360 +6345/93306/6345-93306-0013.flac 173040 +6345/93306/6345-93306-0021.flac 60560 
+6345/93306/6345-93306-0018.flac 70480 +6345/93306/6345-93306-0023.flac 189600 +6345/93306/6345-93306-0001.flac 231360 +6345/93306/6345-93306-0025.flac 184800 +6345/93306/6345-93306-0000.flac 335920 +6345/93306/6345-93306-0014.flac 35840 +6345/93306/6345-93306-0004.flac 50720 +6345/93306/6345-93306-0007.flac 82000 +6345/93306/6345-93306-0024.flac 149840 +6345/93306/6345-93306-0015.flac 133120 +6345/93306/6345-93306-0019.flac 122720 +6345/93306/6345-93306-0010.flac 88560 +6345/93306/6345-93306-0006.flac 73600 +6345/93306/6345-93306-0002.flac 82080 +6345/93306/6345-93306-0003.flac 39840 +6345/93306/6345-93306-0022.flac 161760 +6345/93306/6345-93306-0012.flac 41040 +777/126732/777-126732-0003.flac 151200 +777/126732/777-126732-0035.flac 89360 +777/126732/777-126732-0053.flac 74560 +777/126732/777-126732-0079.flac 160640 +777/126732/777-126732-0067.flac 83920 +777/126732/777-126732-0076.flac 68160 +777/126732/777-126732-0015.flac 152160 +777/126732/777-126732-0078.flac 36320 +777/126732/777-126732-0017.flac 43680 +777/126732/777-126732-0042.flac 89120 +777/126732/777-126732-0013.flac 242400 +777/126732/777-126732-0000.flac 43840 +777/126732/777-126732-0041.flac 161440 +777/126732/777-126732-0016.flac 200560 +777/126732/777-126732-0057.flac 175600 +777/126732/777-126732-0004.flac 68080 +777/126732/777-126732-0050.flac 108960 +777/126732/777-126732-0014.flac 118720 +777/126732/777-126732-0022.flac 48320 +777/126732/777-126732-0049.flac 36000 +777/126732/777-126732-0066.flac 185520 +777/126732/777-126732-0064.flac 93759 +777/126732/777-126732-0065.flac 83600 +777/126732/777-126732-0037.flac 87040 +777/126732/777-126732-0040.flac 79760 +777/126732/777-126732-0045.flac 191600 +777/126732/777-126732-0031.flac 177040 +777/126732/777-126732-0032.flac 93600 +777/126732/777-126732-0052.flac 79200 +777/126732/777-126732-0063.flac 290720 +777/126732/777-126732-0074.flac 55680 +777/126732/777-126732-0062.flac 38000 +777/126732/777-126732-0060.flac 126320 +777/126732/777-126732-0030.flac 103440 +777/126732/777-126732-0025.flac 146880 +777/126732/777-126732-0009.flac 77040 +777/126732/777-126732-0006.flac 73200 +777/126732/777-126732-0036.flac 145120 +777/126732/777-126732-0005.flac 177920 +777/126732/777-126732-0075.flac 71920 +777/126732/777-126732-0061.flac 73040 +777/126732/777-126732-0055.flac 61680 +777/126732/777-126732-0069.flac 79120 +777/126732/777-126732-0027.flac 69280 +777/126732/777-126732-0026.flac 31520 +777/126732/777-126732-0039.flac 45360 +777/126732/777-126732-0056.flac 148160 +777/126732/777-126732-0046.flac 35920 +777/126732/777-126732-0059.flac 98560 +777/126732/777-126732-0047.flac 130000 +777/126732/777-126732-0020.flac 35280 +777/126732/777-126732-0021.flac 58160 +777/126732/777-126732-0012.flac 77360 +777/126732/777-126732-0011.flac 119200 +777/126732/777-126732-0008.flac 70800 +777/126732/777-126732-0051.flac 69360 +777/126732/777-126732-0028.flac 204320 +777/126732/777-126732-0018.flac 64800 +777/126732/777-126732-0072.flac 43360 +777/126732/777-126732-0034.flac 55680 +777/126732/777-126732-0058.flac 83600 +777/126732/777-126732-0010.flac 54720 +777/126732/777-126732-0054.flac 48480 +777/126732/777-126732-0048.flac 95600 +777/126732/777-126732-0033.flac 58480 +777/126732/777-126732-0068.flac 44240 +777/126732/777-126732-0029.flac 57360 +777/126732/777-126732-0081.flac 24080 +777/126732/777-126732-0077.flac 38480 +777/126732/777-126732-0043.flac 40560 +777/126732/777-126732-0007.flac 196080 +777/126732/777-126732-0024.flac 217600 +777/126732/777-126732-0019.flac 79040 
+777/126732/777-126732-0073.flac 60800 +777/126732/777-126732-0002.flac 80960 +777/126732/777-126732-0038.flac 49600 +777/126732/777-126732-0001.flac 36800 +777/126732/777-126732-0071.flac 36880 +777/126732/777-126732-0070.flac 117040 +777/126732/777-126732-0080.flac 40560 +777/126732/777-126732-0023.flac 112320 +3853/163249/3853-163249-0035.flac 76080 +3853/163249/3853-163249-0056.flac 66480 +3853/163249/3853-163249-0017.flac 190320 +3853/163249/3853-163249-0001.flac 138560 +3853/163249/3853-163249-0036.flac 386000 +3853/163249/3853-163249-0034.flac 44240 +3853/163249/3853-163249-0002.flac 122640 +3853/163249/3853-163249-0007.flac 87999 +3853/163249/3853-163249-0055.flac 62320 +3853/163249/3853-163249-0005.flac 170720 +3853/163249/3853-163249-0011.flac 36160 +3853/163249/3853-163249-0032.flac 85440 +3853/163249/3853-163249-0052.flac 191520 +3853/163249/3853-163249-0050.flac 69360 +3853/163249/3853-163249-0042.flac 210960 +3853/163249/3853-163249-0019.flac 92800 +3853/163249/3853-163249-0012.flac 236320 +3853/163249/3853-163249-0030.flac 263120 +3853/163249/3853-163249-0028.flac 153120 +3853/163249/3853-163249-0051.flac 92160 +3853/163249/3853-163249-0054.flac 296721 +3853/163249/3853-163249-0015.flac 36800 +3853/163249/3853-163249-0020.flac 97120 +3853/163249/3853-163249-0053.flac 330240 +3853/163249/3853-163249-0048.flac 137040 +3853/163249/3853-163249-0013.flac 97840 +3853/163249/3853-163249-0018.flac 164880 +3853/163249/3853-163249-0021.flac 72000 +3853/163249/3853-163249-0031.flac 141280 +3853/163249/3853-163249-0029.flac 63120 +3853/163249/3853-163249-0039.flac 76640 +3853/163249/3853-163249-0009.flac 92640 +3853/163249/3853-163249-0046.flac 278800 +3853/163249/3853-163249-0047.flac 190960 +3853/163249/3853-163249-0027.flac 137280 +3853/163249/3853-163249-0044.flac 56080 +3853/163249/3853-163249-0022.flac 136080 +3853/163249/3853-163249-0041.flac 114240 +3853/163249/3853-163249-0014.flac 175040 +3853/163249/3853-163249-0008.flac 93601 +3853/163249/3853-163249-0016.flac 247520 +3853/163249/3853-163249-0003.flac 109600 +3853/163249/3853-163249-0038.flac 99840 +3853/163249/3853-163249-0023.flac 79680 +3853/163249/3853-163249-0049.flac 217040 +3853/163249/3853-163249-0026.flac 93360 +3853/163249/3853-163249-0045.flac 116640 +3853/163249/3853-163249-0006.flac 145520 +3853/163249/3853-163249-0010.flac 162880 +3853/163249/3853-163249-0033.flac 62000 +3853/163249/3853-163249-0000.flac 151680 +3853/163249/3853-163249-0037.flac 106640 +3853/163249/3853-163249-0004.flac 93600 +3853/163249/3853-163249-0024.flac 234880 +3853/163249/3853-163249-0043.flac 90560 +3752/4944/3752-4944-0059.flac 73920 +3752/4944/3752-4944-0068.flac 236080 +3752/4944/3752-4944-0002.flac 51600 +3752/4944/3752-4944-0009.flac 41760 +3752/4944/3752-4944-0023.flac 40000 +3752/4944/3752-4944-0058.flac 32800 +3752/4944/3752-4944-0012.flac 88720 +3752/4944/3752-4944-0011.flac 54321 +3752/4944/3752-4944-0013.flac 95120 +3752/4944/3752-4944-0056.flac 72480 +3752/4944/3752-4944-0036.flac 109600 +3752/4944/3752-4944-0053.flac 140720 +3752/4944/3752-4944-0054.flac 50880 +3752/4944/3752-4944-0048.flac 77280 +3752/4944/3752-4944-0015.flac 66160 +3752/4944/3752-4944-0022.flac 262400 +3752/4944/3752-4944-0029.flac 35600 +3752/4944/3752-4944-0027.flac 182240 +3752/4944/3752-4944-0037.flac 41200 +3752/4944/3752-4944-0055.flac 34720 +3752/4944/3752-4944-0005.flac 38720 +3752/4944/3752-4944-0032.flac 75840 +3752/4944/3752-4944-0060.flac 64400 +3752/4944/3752-4944-0052.flac 126800 +3752/4944/3752-4944-0016.flac 174240 
+3752/4944/3752-4944-0049.flac 44240 +3752/4944/3752-4944-0066.flac 114000 +3752/4944/3752-4944-0050.flac 43520 +3752/4944/3752-4944-0006.flac 62160 +3752/4944/3752-4944-0019.flac 46880 +3752/4944/3752-4944-0069.flac 163360 +3752/4944/3752-4944-0062.flac 67840 +3752/4944/3752-4944-0030.flac 51280 +3752/4944/3752-4944-0010.flac 57599 +3752/4944/3752-4944-0028.flac 51360 +3752/4944/3752-4944-0051.flac 56560 +3752/4944/3752-4944-0043.flac 52800 +3752/4944/3752-4944-0007.flac 44800 +3752/4944/3752-4944-0017.flac 112720 +3752/4944/3752-4944-0046.flac 29760 +3752/4944/3752-4944-0033.flac 67600 +3752/4944/3752-4944-0034.flac 70720 +3752/4944/3752-4944-0067.flac 43121 +3752/4944/3752-4944-0024.flac 32480 +3752/4944/3752-4944-0047.flac 111600 +3752/4944/3752-4944-0031.flac 138400 +3752/4944/3752-4944-0001.flac 70640 +3752/4944/3752-4944-0042.flac 51360 +3752/4944/3752-4944-0040.flac 44560 +3752/4944/3752-4944-0041.flac 39600 +3752/4944/3752-4944-0064.flac 108880 +3752/4944/3752-4944-0057.flac 40560 +3752/4944/3752-4944-0020.flac 215760 +3752/4944/3752-4944-0018.flac 118000 +3752/4944/3752-4944-0000.flac 53360 +3752/4944/3752-4944-0035.flac 52560 +3752/4944/3752-4944-0061.flac 39760 +3752/4944/3752-4944-0004.flac 38800 +3752/4944/3752-4944-0038.flac 45840 +3752/4944/3752-4944-0021.flac 116320 +3752/4944/3752-4944-0026.flac 40800 +3752/4944/3752-4944-0065.flac 67200 +3752/4944/3752-4944-0014.flac 36880 +3752/4944/3752-4944-0003.flac 44160 +3752/4944/3752-4944-0039.flac 90960 +3752/4944/3752-4944-0045.flac 44240 +3752/4944/3752-4944-0044.flac 37840 +3752/4944/3752-4944-0025.flac 59760 +3752/4943/3752-4943-0015.flac 47360 +3752/4943/3752-4943-0028.flac 60160 +3752/4943/3752-4943-0008.flac 60160 +3752/4943/3752-4943-0023.flac 48880 +3752/4943/3752-4943-0003.flac 100000 +3752/4943/3752-4943-0019.flac 39920 +3752/4943/3752-4943-0001.flac 177280 +3752/4943/3752-4943-0013.flac 88560 +3752/4943/3752-4943-0030.flac 38320 +3752/4943/3752-4943-0027.flac 44880 +3752/4943/3752-4943-0026.flac 64800 +3752/4943/3752-4943-0018.flac 84240 +3752/4943/3752-4943-0005.flac 49680 +3752/4943/3752-4943-0022.flac 83120 +3752/4943/3752-4943-0017.flac 36400 +3752/4943/3752-4943-0025.flac 83920 +3752/4943/3752-4943-0012.flac 54400 +3752/4943/3752-4943-0006.flac 42480 +3752/4943/3752-4943-0016.flac 39760 +3752/4943/3752-4943-0004.flac 224160 +3752/4943/3752-4943-0029.flac 61360 +3752/4943/3752-4943-0024.flac 90480 +3752/4943/3752-4943-0020.flac 78000 +3752/4943/3752-4943-0009.flac 57520 +3752/4943/3752-4943-0002.flac 115360 +3752/4943/3752-4943-0014.flac 47840 +3752/4943/3752-4943-0000.flac 141280 +3752/4943/3752-4943-0021.flac 72240 +3752/4943/3752-4943-0007.flac 160000 +3752/4943/3752-4943-0010.flac 81280 +6319/275224/6319-275224-0002.flac 119280 +6319/275224/6319-275224-0008.flac 295120 +6319/275224/6319-275224-0003.flac 185760 +6319/275224/6319-275224-0006.flac 273440 +6319/275224/6319-275224-0007.flac 44240 +6319/275224/6319-275224-0001.flac 194800 +6319/275224/6319-275224-0005.flac 204000 +6319/275224/6319-275224-0011.flac 293040 +6319/275224/6319-275224-0020.flac 191040 +6319/275224/6319-275224-0016.flac 79680 +6319/275224/6319-275224-0004.flac 100720 +6319/275224/6319-275224-0000.flac 77200 +6319/275224/6319-275224-0009.flac 72720 +6319/275224/6319-275224-0012.flac 65840 +6319/275224/6319-275224-0019.flac 92480 +6319/275224/6319-275224-0018.flac 108080 +6319/275224/6319-275224-0010.flac 111040 +6319/275224/6319-275224-0017.flac 134560 +6319/275224/6319-275224-0015.flac 119360 +6319/275224/6319-275224-0013.flac 77920 
+6319/275224/6319-275224-0014.flac 171680 +6319/57405/6319-57405-0009.flac 246880 +6319/57405/6319-57405-0008.flac 199360 +6319/57405/6319-57405-0011.flac 59360 +6319/57405/6319-57405-0005.flac 138000 +6319/57405/6319-57405-0003.flac 133920 +6319/57405/6319-57405-0001.flac 162320 +6319/57405/6319-57405-0002.flac 104320 +6319/57405/6319-57405-0010.flac 53440 +6319/57405/6319-57405-0007.flac 56080 +6319/57405/6319-57405-0006.flac 142240 +6319/57405/6319-57405-0004.flac 91360 +6319/57405/6319-57405-0012.flac 135040 +6319/57405/6319-57405-0000.flac 116720 +6319/64726/6319-64726-0001.flac 255519 +6319/64726/6319-64726-0020.flac 58800 +6319/64726/6319-64726-0013.flac 124800 +6319/64726/6319-64726-0015.flac 88960 +6319/64726/6319-64726-0016.flac 141360 +6319/64726/6319-64726-0012.flac 147360 +6319/64726/6319-64726-0004.flac 305600 +6319/64726/6319-64726-0008.flac 69760 +6319/64726/6319-64726-0002.flac 195200 +6319/64726/6319-64726-0010.flac 79360 +6319/64726/6319-64726-0005.flac 154880 +6319/64726/6319-64726-0019.flac 204960 +6319/64726/6319-64726-0000.flac 163920 +6319/64726/6319-64726-0009.flac 163840 +6319/64726/6319-64726-0017.flac 108720 +6319/64726/6319-64726-0018.flac 225760 +6319/64726/6319-64726-0006.flac 60160 +6319/64726/6319-64726-0007.flac 267600 +6319/64726/6319-64726-0011.flac 106480 +6319/64726/6319-64726-0003.flac 72480 +3000/15664/3000-15664-0015.flac 212160 +3000/15664/3000-15664-0008.flac 207760 +3000/15664/3000-15664-0038.flac 252480 +3000/15664/3000-15664-0002.flac 114720 +3000/15664/3000-15664-0027.flac 64720 +3000/15664/3000-15664-0010.flac 248720 +3000/15664/3000-15664-0037.flac 161120 +3000/15664/3000-15664-0045.flac 205440 +3000/15664/3000-15664-0004.flac 46160 +3000/15664/3000-15664-0003.flac 74720 +3000/15664/3000-15664-0040.flac 164240 +3000/15664/3000-15664-0034.flac 253840 +3000/15664/3000-15664-0025.flac 79520 +3000/15664/3000-15664-0035.flac 314960 +3000/15664/3000-15664-0028.flac 103680 +3000/15664/3000-15664-0033.flac 357920 +3000/15664/3000-15664-0023.flac 82481 +3000/15664/3000-15664-0020.flac 256320 +3000/15664/3000-15664-0000.flac 50080 +3000/15664/3000-15664-0005.flac 135360 +3000/15664/3000-15664-0036.flac 195600 +3000/15664/3000-15664-0044.flac 222960 +3000/15664/3000-15664-0030.flac 49120 +3000/15664/3000-15664-0009.flac 127600 +3000/15664/3000-15664-0042.flac 102560 +3000/15664/3000-15664-0011.flac 171680 +3000/15664/3000-15664-0006.flac 40800 +3000/15664/3000-15664-0001.flac 333760 +3000/15664/3000-15664-0018.flac 90400 +3000/15664/3000-15664-0031.flac 160320 +3000/15664/3000-15664-0007.flac 111440 +3000/15664/3000-15664-0022.flac 103520 +3000/15664/3000-15664-0026.flac 40640 +3000/15664/3000-15664-0046.flac 85760 +3000/15664/3000-15664-0019.flac 281920 +3000/15664/3000-15664-0024.flac 226400 +3000/15664/3000-15664-0021.flac 204240 +3000/15664/3000-15664-0016.flac 140000 +3000/15664/3000-15664-0039.flac 76080 +3000/15664/3000-15664-0013.flac 145920 +3000/15664/3000-15664-0043.flac 158720 +3000/15664/3000-15664-0012.flac 193440 +3000/15664/3000-15664-0029.flac 91680 +3000/15664/3000-15664-0041.flac 457120 +3000/15664/3000-15664-0017.flac 225360 +3000/15664/3000-15664-0014.flac 115280 +3000/15664/3000-15664-0032.flac 174400 +3576/138058/3576-138058-0001.flac 172480 +3576/138058/3576-138058-0019.flac 158960 +3576/138058/3576-138058-0014.flac 290080 +3576/138058/3576-138058-0005.flac 266480 +3576/138058/3576-138058-0020.flac 392320 +3576/138058/3576-138058-0031.flac 45039 +3576/138058/3576-138058-0022.flac 430400 +3576/138058/3576-138058-0024.flac 367840 
+3576/138058/3576-138058-0007.flac 56800 +3576/138058/3576-138058-0012.flac 87680 +3576/138058/3576-138058-0039.flac 226400 +3576/138058/3576-138058-0013.flac 75440 +3576/138058/3576-138058-0034.flac 129040 +3576/138058/3576-138058-0008.flac 61760 +3576/138058/3576-138058-0037.flac 165760 +3576/138058/3576-138058-0026.flac 130080 +3576/138058/3576-138058-0000.flac 226240 +3576/138058/3576-138058-0025.flac 120400 +3576/138058/3576-138058-0029.flac 160960 +3576/138058/3576-138058-0027.flac 166320 +3576/138058/3576-138058-0035.flac 331600 +3576/138058/3576-138058-0016.flac 191040 +3576/138058/3576-138058-0038.flac 154400 +3576/138058/3576-138058-0030.flac 110480 +3576/138058/3576-138058-0006.flac 63120 +3576/138058/3576-138058-0011.flac 180480 +3576/138058/3576-138058-0040.flac 107040 +3576/138058/3576-138058-0021.flac 98880 +3576/138058/3576-138058-0004.flac 264960 +3576/138058/3576-138058-0033.flac 284480 +3576/138058/3576-138058-0036.flac 308640 +3576/138058/3576-138058-0002.flac 357280 +3576/138058/3576-138058-0003.flac 40160 +3576/138058/3576-138058-0015.flac 146960 +3576/138058/3576-138058-0010.flac 251760 +3576/138058/3576-138058-0009.flac 231760 +3576/138058/3576-138058-0023.flac 177120 +3576/138058/3576-138058-0017.flac 277200 +3576/138058/3576-138058-0028.flac 167360 +3576/138058/3576-138058-0032.flac 89361 +3576/138058/3576-138058-0018.flac 117120 +652/130726/652-130726-0002.flac 222160 +652/130726/652-130726-0004.flac 241360 +652/130726/652-130726-0017.flac 93680 +652/130726/652-130726-0024.flac 171760 +652/130726/652-130726-0029.flac 124400 +652/130726/652-130726-0007.flac 77600 +652/130726/652-130726-0027.flac 83760 +652/130726/652-130726-0023.flac 108000 +652/130726/652-130726-0032.flac 95840 +652/130726/652-130726-0011.flac 302800 +652/130726/652-130726-0034.flac 178480 +652/130726/652-130726-0028.flac 48320 +652/130726/652-130726-0012.flac 92800 +652/130726/652-130726-0001.flac 173680 +652/130726/652-130726-0008.flac 148880 +652/130726/652-130726-0025.flac 42080 +652/130726/652-130726-0020.flac 106880 +652/130726/652-130726-0035.flac 81440 +652/130726/652-130726-0033.flac 172800 +652/130726/652-130726-0019.flac 97680 +652/130726/652-130726-0016.flac 166560 +652/130726/652-130726-0009.flac 61360 +652/130726/652-130726-0000.flac 115120 +652/130726/652-130726-0015.flac 149600 +652/130726/652-130726-0005.flac 167120 +652/130726/652-130726-0014.flac 84800 +652/130726/652-130726-0022.flac 197840 +652/130726/652-130726-0026.flac 98160 +652/130726/652-130726-0031.flac 91920 +652/130726/652-130726-0021.flac 182160 +652/130726/652-130726-0010.flac 109600 +652/130726/652-130726-0018.flac 52160 +652/130726/652-130726-0030.flac 45520 +652/130726/652-130726-0003.flac 101840 +652/130726/652-130726-0006.flac 118480 +652/130726/652-130726-0013.flac 85360 +652/130737/652-130737-0005.flac 66160 +652/130737/652-130737-0000.flac 159840 +652/130737/652-130737-0004.flac 79840 +652/130737/652-130737-0007.flac 64800 +652/130737/652-130737-0008.flac 38160 +652/130737/652-130737-0006.flac 77760 +652/130737/652-130737-0013.flac 74720 +652/130737/652-130737-0012.flac 60800 +652/130737/652-130737-0003.flac 127520 +652/130737/652-130737-0010.flac 110800 +652/130737/652-130737-0009.flac 75840 +652/130737/652-130737-0011.flac 63440 +652/130737/652-130737-0001.flac 99840 +652/130737/652-130737-0002.flac 121120 +652/129742/652-129742-0008.flac 54880 +652/129742/652-129742-0015.flac 188880 +652/129742/652-129742-0000.flac 96400 +652/129742/652-129742-0018.flac 87680 +652/129742/652-129742-0010.flac 54400 
+652/129742/652-129742-0019.flac 44560 +652/129742/652-129742-0016.flac 255840 +652/129742/652-129742-0014.flac 63120 +652/129742/652-129742-0001.flac 138560 +652/129742/652-129742-0006.flac 145840 +652/129742/652-129742-0017.flac 54480 +652/129742/652-129742-0009.flac 223520 +652/129742/652-129742-0011.flac 52480 +652/129742/652-129742-0012.flac 148560 +652/129742/652-129742-0007.flac 78160 +652/129742/652-129742-0013.flac 89600 +652/129742/652-129742-0004.flac 113920 +652/129742/652-129742-0002.flac 59200 +652/129742/652-129742-0003.flac 146880 +652/129742/652-129742-0005.flac 96240 +652/129742/652-129742-0020.flac 72080 +6313/76958/6313-76958-0030.flac 36240 +6313/76958/6313-76958-0000.flac 46240 +6313/76958/6313-76958-0024.flac 70000 +6313/76958/6313-76958-0018.flac 99120 +6313/76958/6313-76958-0014.flac 120240 +6313/76958/6313-76958-0015.flac 54080 +6313/76958/6313-76958-0001.flac 47520 +6313/76958/6313-76958-0006.flac 39120 +6313/76958/6313-76958-0008.flac 37520 +6313/76958/6313-76958-0025.flac 60720 +6313/76958/6313-76958-0027.flac 56080 +6313/76958/6313-76958-0023.flac 52080 +6313/76958/6313-76958-0002.flac 48080 +6313/76958/6313-76958-0005.flac 92960 +6313/76958/6313-76958-0031.flac 35280 +6313/76958/6313-76958-0010.flac 106080 +6313/76958/6313-76958-0009.flac 102400 +6313/76958/6313-76958-0012.flac 105440 +6313/76958/6313-76958-0004.flac 89040 +6313/76958/6313-76958-0021.flac 190800 +6313/76958/6313-76958-0022.flac 100720 +6313/76958/6313-76958-0003.flac 47600 +6313/76958/6313-76958-0017.flac 80880 +6313/76958/6313-76958-0016.flac 69600 +6313/76958/6313-76958-0011.flac 64400 +6313/76958/6313-76958-0029.flac 107120 +6313/76958/6313-76958-0013.flac 188960 +6313/76958/6313-76958-0007.flac 59200 +6313/76958/6313-76958-0019.flac 49520 +6313/76958/6313-76958-0020.flac 37040 +6313/76958/6313-76958-0026.flac 70880 +6313/76958/6313-76958-0028.flac 60720 +6313/66129/6313-66129-0024.flac 126800 +6313/66129/6313-66129-0019.flac 39360 +6313/66129/6313-66129-0017.flac 138960 +6313/66129/6313-66129-0026.flac 187040 +6313/66129/6313-66129-0033.flac 145040 +6313/66129/6313-66129-0010.flac 67680 +6313/66129/6313-66129-0030.flac 54960 +6313/66129/6313-66129-0012.flac 49440 +6313/66129/6313-66129-0004.flac 62400 +6313/66129/6313-66129-0031.flac 86800 +6313/66129/6313-66129-0025.flac 74720 +6313/66129/6313-66129-0027.flac 82560 +6313/66129/6313-66129-0002.flac 58720 +6313/66129/6313-66129-0018.flac 91280 +6313/66129/6313-66129-0023.flac 84000 +6313/66129/6313-66129-0008.flac 81120 +6313/66129/6313-66129-0003.flac 87200 +6313/66129/6313-66129-0015.flac 120320 +6313/66129/6313-66129-0005.flac 73280 +6313/66129/6313-66129-0032.flac 29760 +6313/66129/6313-66129-0014.flac 181360 +6313/66129/6313-66129-0021.flac 206480 +6313/66129/6313-66129-0000.flac 45040 +6313/66129/6313-66129-0001.flac 188800 +6313/66129/6313-66129-0009.flac 76480 +6313/66129/6313-66129-0011.flac 35440 +6313/66129/6313-66129-0022.flac 68640 +6313/66129/6313-66129-0020.flac 111680 +6313/66129/6313-66129-0007.flac 39280 +6313/66129/6313-66129-0029.flac 113120 +6313/66129/6313-66129-0013.flac 57680 +6313/66129/6313-66129-0006.flac 47360 +6313/66129/6313-66129-0035.flac 67200 +6313/66129/6313-66129-0034.flac 127360 +6313/66129/6313-66129-0028.flac 37120 +6313/66125/6313-66125-0022.flac 124480 +6313/66125/6313-66125-0012.flac 36640 +6313/66125/6313-66125-0018.flac 47280 +6313/66125/6313-66125-0016.flac 119120 +6313/66125/6313-66125-0021.flac 79600 +6313/66125/6313-66125-0017.flac 81040 +6313/66125/6313-66125-0009.flac 43360 
+6313/66125/6313-66125-0001.flac 72000 +6313/66125/6313-66125-0011.flac 81920 +6313/66125/6313-66125-0000.flac 37680 +6313/66125/6313-66125-0005.flac 74240 +6313/66125/6313-66125-0004.flac 52160 +6313/66125/6313-66125-0007.flac 78880 +6313/66125/6313-66125-0024.flac 70560 +6313/66125/6313-66125-0014.flac 60560 +6313/66125/6313-66125-0025.flac 85120 +6313/66125/6313-66125-0002.flac 49200 +6313/66125/6313-66125-0008.flac 71840 +6313/66125/6313-66125-0023.flac 149600 +6313/66125/6313-66125-0013.flac 32960 +6313/66125/6313-66125-0003.flac 73280 +6313/66125/6313-66125-0015.flac 131680 +6313/66125/6313-66125-0026.flac 36320 +6313/66125/6313-66125-0020.flac 47840 +6313/66125/6313-66125-0010.flac 60880 +6313/66125/6313-66125-0027.flac 231520 +6313/66125/6313-66125-0019.flac 75040 +6313/66125/6313-66125-0006.flac 48080 +2078/142845/2078-142845-0026.flac 39600 +2078/142845/2078-142845-0017.flac 103760 +2078/142845/2078-142845-0006.flac 234320 +2078/142845/2078-142845-0032.flac 207040 +2078/142845/2078-142845-0044.flac 37200 +2078/142845/2078-142845-0037.flac 253920 +2078/142845/2078-142845-0050.flac 123600 +2078/142845/2078-142845-0039.flac 335200 +2078/142845/2078-142845-0030.flac 192640 +2078/142845/2078-142845-0047.flac 183360 +2078/142845/2078-142845-0010.flac 230560 +2078/142845/2078-142845-0004.flac 370800 +2078/142845/2078-142845-0025.flac 464960 +2078/142845/2078-142845-0016.flac 56480 +2078/142845/2078-142845-0024.flac 35360 +2078/142845/2078-142845-0043.flac 336079 +2078/142845/2078-142845-0049.flac 45920 +2078/142845/2078-142845-0048.flac 35200 +2078/142845/2078-142845-0031.flac 267840 +2078/142845/2078-142845-0007.flac 331840 +2078/142845/2078-142845-0008.flac 310080 +2078/142845/2078-142845-0023.flac 28000 +2078/142845/2078-142845-0001.flac 34160 +2078/142845/2078-142845-0029.flac 77920 +2078/142845/2078-142845-0000.flac 35120 +2078/142845/2078-142845-0009.flac 54960 +2078/142845/2078-142845-0020.flac 150720 +2078/142845/2078-142845-0040.flac 112240 +2078/142845/2078-142845-0028.flac 182800 +2078/142845/2078-142845-0045.flac 114160 +2078/142845/2078-142845-0027.flac 139680 +2078/142845/2078-142845-0022.flac 36800 +2078/142845/2078-142845-0019.flac 150480 +2078/142845/2078-142845-0002.flac 46640 +2078/142845/2078-142845-0046.flac 85680 +2078/142845/2078-142845-0005.flac 502000 +2078/142845/2078-142845-0038.flac 72320 +2078/142845/2078-142845-0051.flac 165760 +2078/142845/2078-142845-0012.flac 211840 +2078/142845/2078-142845-0042.flac 67680 +2078/142845/2078-142845-0035.flac 67520 +2078/142845/2078-142845-0034.flac 156080 +2078/142845/2078-142845-0041.flac 42560 +2078/142845/2078-142845-0036.flac 59520 +2078/142845/2078-142845-0015.flac 157440 +2078/142845/2078-142845-0003.flac 33200 +2078/142845/2078-142845-0011.flac 72480 +2078/142845/2078-142845-0021.flac 98240 +2078/142845/2078-142845-0014.flac 131200 +2078/142845/2078-142845-0033.flac 182160 +2078/142845/2078-142845-0018.flac 85280 +2078/142845/2078-142845-0013.flac 157680 +2428/83705/2428-83705-0015.flac 175760 +2428/83705/2428-83705-0030.flac 147360 +2428/83705/2428-83705-0019.flac 81200 +2428/83705/2428-83705-0010.flac 42080 +2428/83705/2428-83705-0034.flac 87200 +2428/83705/2428-83705-0006.flac 58640 +2428/83705/2428-83705-0017.flac 79840 +2428/83705/2428-83705-0007.flac 53280 +2428/83705/2428-83705-0031.flac 43280 +2428/83705/2428-83705-0037.flac 149840 +2428/83705/2428-83705-0003.flac 34320 +2428/83705/2428-83705-0000.flac 61120 +2428/83705/2428-83705-0024.flac 79040 +2428/83705/2428-83705-0002.flac 263200 
+2428/83705/2428-83705-0041.flac 63360 +2428/83705/2428-83705-0021.flac 62000 +2428/83705/2428-83705-0018.flac 94160 +2428/83705/2428-83705-0036.flac 31360 +2428/83705/2428-83705-0027.flac 37120 +2428/83705/2428-83705-0001.flac 109520 +2428/83705/2428-83705-0039.flac 51200 +2428/83705/2428-83705-0025.flac 157520 +2428/83705/2428-83705-0028.flac 148000 +2428/83705/2428-83705-0020.flac 88400 +2428/83705/2428-83705-0033.flac 125120 +2428/83705/2428-83705-0026.flac 163280 +2428/83705/2428-83705-0042.flac 178800 +2428/83705/2428-83705-0038.flac 116880 +2428/83705/2428-83705-0005.flac 68240 +2428/83705/2428-83705-0008.flac 57840 +2428/83705/2428-83705-0011.flac 97840 +2428/83705/2428-83705-0013.flac 77440 +2428/83705/2428-83705-0004.flac 119520 +2428/83705/2428-83705-0023.flac 61520 +2428/83705/2428-83705-0040.flac 69840 +2428/83705/2428-83705-0032.flac 104000 +2428/83705/2428-83705-0029.flac 93280 +2428/83705/2428-83705-0016.flac 58240 +2428/83705/2428-83705-0022.flac 88400 +2428/83705/2428-83705-0009.flac 33920 +2428/83705/2428-83705-0035.flac 93840 +2428/83705/2428-83705-0014.flac 66720 +2428/83705/2428-83705-0043.flac 96160 +2428/83705/2428-83705-0012.flac 142240 +2428/83699/2428-83699-0017.flac 212560 +2428/83699/2428-83699-0026.flac 50240 +2428/83699/2428-83699-0030.flac 58000 +2428/83699/2428-83699-0029.flac 100720 +2428/83699/2428-83699-0042.flac 102880 +2428/83699/2428-83699-0021.flac 45040 +2428/83699/2428-83699-0013.flac 52640 +2428/83699/2428-83699-0035.flac 43600 +2428/83699/2428-83699-0002.flac 68160 +2428/83699/2428-83699-0031.flac 52960 +2428/83699/2428-83699-0004.flac 30080 +2428/83699/2428-83699-0023.flac 41600 +2428/83699/2428-83699-0009.flac 29600 +2428/83699/2428-83699-0038.flac 42720 +2428/83699/2428-83699-0000.flac 212880 +2428/83699/2428-83699-0006.flac 109200 +2428/83699/2428-83699-0016.flac 136240 +2428/83699/2428-83699-0015.flac 35120 +2428/83699/2428-83699-0024.flac 68000 +2428/83699/2428-83699-0012.flac 130000 +2428/83699/2428-83699-0020.flac 78240 +2428/83699/2428-83699-0019.flac 78560 +2428/83699/2428-83699-0018.flac 229520 +2428/83699/2428-83699-0005.flac 145440 +2428/83699/2428-83699-0032.flac 50800 +2428/83699/2428-83699-0037.flac 102320 +2428/83699/2428-83699-0033.flac 62720 +2428/83699/2428-83699-0011.flac 41200 +2428/83699/2428-83699-0001.flac 33120 +2428/83699/2428-83699-0014.flac 35440 +2428/83699/2428-83699-0040.flac 125200 +2428/83699/2428-83699-0010.flac 49920 +2428/83699/2428-83699-0034.flac 129920 +2428/83699/2428-83699-0022.flac 31600 +2428/83699/2428-83699-0003.flac 100240 +2428/83699/2428-83699-0036.flac 73600 +2428/83699/2428-83699-0008.flac 166080 +2428/83699/2428-83699-0039.flac 31680 +2428/83699/2428-83699-0007.flac 91360 +2428/83699/2428-83699-0025.flac 128080 +2428/83699/2428-83699-0027.flac 79440 +2428/83699/2428-83699-0041.flac 33120 +2428/83699/2428-83699-0028.flac 64000 +7976/110523/7976-110523-0008.flac 136880 +7976/110523/7976-110523-0003.flac 97600 +7976/110523/7976-110523-0019.flac 45200 +7976/110523/7976-110523-0013.flac 149120 +7976/110523/7976-110523-0010.flac 173760 +7976/110523/7976-110523-0009.flac 107600 +7976/110523/7976-110523-0017.flac 212160 +7976/110523/7976-110523-0005.flac 124720 +7976/110523/7976-110523-0016.flac 50160 +7976/110523/7976-110523-0002.flac 113920 +7976/110523/7976-110523-0021.flac 80560 +7976/110523/7976-110523-0015.flac 124000 +7976/110523/7976-110523-0000.flac 243520 +7976/110523/7976-110523-0001.flac 75520 +7976/110523/7976-110523-0011.flac 141120 +7976/110523/7976-110523-0018.flac 39360 
+7976/110523/7976-110523-0020.flac 136560 +7976/110523/7976-110523-0007.flac 107280 +7976/110523/7976-110523-0006.flac 125040 +7976/110523/7976-110523-0012.flac 260480 +7976/110523/7976-110523-0014.flac 100720 +7976/105575/7976-105575-0007.flac 103680 +7976/105575/7976-105575-0003.flac 146800 +7976/105575/7976-105575-0027.flac 106800 +7976/105575/7976-105575-0028.flac 61920 +7976/105575/7976-105575-0017.flac 30960 +7976/105575/7976-105575-0016.flac 163280 +7976/105575/7976-105575-0002.flac 48720 +7976/105575/7976-105575-0020.flac 33440 +7976/105575/7976-105575-0010.flac 72400 +7976/105575/7976-105575-0024.flac 98160 +7976/105575/7976-105575-0013.flac 160400 +7976/105575/7976-105575-0022.flac 86080 +7976/105575/7976-105575-0018.flac 68640 +7976/105575/7976-105575-0000.flac 146560 +7976/105575/7976-105575-0026.flac 56800 +7976/105575/7976-105575-0021.flac 74480 +7976/105575/7976-105575-0005.flac 88960 +7976/105575/7976-105575-0009.flac 64720 +7976/105575/7976-105575-0029.flac 188640 +7976/105575/7976-105575-0004.flac 96240 +7976/105575/7976-105575-0012.flac 82480 +7976/105575/7976-105575-0001.flac 44800 +7976/105575/7976-105575-0019.flac 83600 +7976/105575/7976-105575-0006.flac 150400 +7976/105575/7976-105575-0025.flac 83760 +7976/105575/7976-105575-0015.flac 117680 +7976/105575/7976-105575-0008.flac 115040 +7976/105575/7976-105575-0023.flac 76480 +7976/105575/7976-105575-0011.flac 102800 +7976/105575/7976-105575-0014.flac 119840 +7976/110124/7976-110124-0007.flac 43840 +7976/110124/7976-110124-0025.flac 41440 +7976/110124/7976-110124-0023.flac 55040 +7976/110124/7976-110124-0017.flac 42960 +7976/110124/7976-110124-0010.flac 105760 +7976/110124/7976-110124-0013.flac 104880 +7976/110124/7976-110124-0001.flac 117601 +7976/110124/7976-110124-0020.flac 40640 +7976/110124/7976-110124-0011.flac 87520 +7976/110124/7976-110124-0000.flac 56799 +7976/110124/7976-110124-0018.flac 129200 +7976/110124/7976-110124-0008.flac 84720 +7976/110124/7976-110124-0006.flac 169920 +7976/110124/7976-110124-0012.flac 132720 +7976/110124/7976-110124-0009.flac 76960 +7976/110124/7976-110124-0024.flac 50400 +7976/110124/7976-110124-0019.flac 43920 +7976/110124/7976-110124-0016.flac 66880 +7976/110124/7976-110124-0004.flac 75760 +7976/110124/7976-110124-0015.flac 85600 +7976/110124/7976-110124-0003.flac 147680 +7976/110124/7976-110124-0021.flac 53840 +7976/110124/7976-110124-0022.flac 64320 +7976/110124/7976-110124-0014.flac 83360 +7976/110124/7976-110124-0005.flac 83360 +7976/110124/7976-110124-0002.flac 125520 +1988/148538/1988-148538-0014.flac 169840 +1988/148538/1988-148538-0012.flac 218480 +1988/148538/1988-148538-0004.flac 219200 +1988/148538/1988-148538-0005.flac 261600 +1988/148538/1988-148538-0010.flac 138960 +1988/148538/1988-148538-0009.flac 118400 +1988/148538/1988-148538-0006.flac 365840 +1988/148538/1988-148538-0000.flac 269200 +1988/148538/1988-148538-0015.flac 277360 +1988/148538/1988-148538-0013.flac 166480 +1988/148538/1988-148538-0008.flac 109040 +1988/148538/1988-148538-0011.flac 152800 +1988/148538/1988-148538-0001.flac 176320 +1988/148538/1988-148538-0007.flac 119040 +1988/148538/1988-148538-0002.flac 71200 +1988/148538/1988-148538-0003.flac 89280 +1988/24833/1988-24833-0022.flac 44240 +1988/24833/1988-24833-0004.flac 88000 +1988/24833/1988-24833-0016.flac 39280 +1988/24833/1988-24833-0015.flac 75920 +1988/24833/1988-24833-0023.flac 69120 +1988/24833/1988-24833-0011.flac 155040 +1988/24833/1988-24833-0006.flac 31680 +1988/24833/1988-24833-0001.flac 126320 +1988/24833/1988-24833-0008.flac 39520 
+1988/24833/1988-24833-0014.flac 40480 +1988/24833/1988-24833-0024.flac 96080 +1988/24833/1988-24833-0009.flac 104320 +1988/24833/1988-24833-0007.flac 84480 +1988/24833/1988-24833-0005.flac 103280 +1988/24833/1988-24833-0003.flac 82560 +1988/24833/1988-24833-0021.flac 68480 +1988/24833/1988-24833-0017.flac 97920 +1988/24833/1988-24833-0019.flac 43440 +1988/24833/1988-24833-0013.flac 88560 +1988/24833/1988-24833-0012.flac 58240 +1988/24833/1988-24833-0020.flac 102400 +1988/24833/1988-24833-0018.flac 85600 +1988/24833/1988-24833-0028.flac 47200 +1988/24833/1988-24833-0027.flac 40720 +1988/24833/1988-24833-0025.flac 46560 +1988/24833/1988-24833-0000.flac 53120 +1988/24833/1988-24833-0002.flac 72880 +1988/24833/1988-24833-0010.flac 48480 +1988/24833/1988-24833-0026.flac 41520 +1988/147956/1988-147956-0020.flac 108480 +1988/147956/1988-147956-0013.flac 46240 +1988/147956/1988-147956-0021.flac 78960 +1988/147956/1988-147956-0016.flac 50720 +1988/147956/1988-147956-0007.flac 93840 +1988/147956/1988-147956-0004.flac 144400 +1988/147956/1988-147956-0018.flac 56560 +1988/147956/1988-147956-0010.flac 61680 +1988/147956/1988-147956-0025.flac 38560 +1988/147956/1988-147956-0027.flac 123680 +1988/147956/1988-147956-0005.flac 55520 +1988/147956/1988-147956-0019.flac 98160 +1988/147956/1988-147956-0014.flac 34960 +1988/147956/1988-147956-0022.flac 120640 +1988/147956/1988-147956-0026.flac 110640 +1988/147956/1988-147956-0017.flac 100800 +1988/147956/1988-147956-0008.flac 238720 +1988/147956/1988-147956-0012.flac 73120 +1988/147956/1988-147956-0024.flac 49360 +1988/147956/1988-147956-0006.flac 86080 +1988/147956/1988-147956-0028.flac 94080 +1988/147956/1988-147956-0023.flac 66240 +1988/147956/1988-147956-0001.flac 227360 +1988/147956/1988-147956-0003.flac 39920 +1988/147956/1988-147956-0009.flac 82560 +1988/147956/1988-147956-0015.flac 66800 +1988/147956/1988-147956-0029.flac 90400 +1988/147956/1988-147956-0000.flac 239200 +1988/147956/1988-147956-0002.flac 70320 +174/168635/174-168635-0013.flac 93920 +174/168635/174-168635-0016.flac 112960 +174/168635/174-168635-0004.flac 204320 +174/168635/174-168635-0011.flac 74320 +174/168635/174-168635-0018.flac 433760 +174/168635/174-168635-0014.flac 77201 +174/168635/174-168635-0010.flac 160960 +174/168635/174-168635-0007.flac 163360 +174/168635/174-168635-0000.flac 72480 +174/168635/174-168635-0019.flac 128000 +174/168635/174-168635-0020.flac 73920 +174/168635/174-168635-0017.flac 73120 +174/168635/174-168635-0006.flac 105120 +174/168635/174-168635-0002.flac 253760 +174/168635/174-168635-0001.flac 74400 +174/168635/174-168635-0008.flac 163760 +174/168635/174-168635-0005.flac 126880 +174/168635/174-168635-0009.flac 52480 +174/168635/174-168635-0012.flac 133760 +174/168635/174-168635-0022.flac 69360 +174/168635/174-168635-0015.flac 67040 +174/168635/174-168635-0003.flac 210880 +174/168635/174-168635-0021.flac 66720 +174/84280/174-84280-0014.flac 48161 +174/84280/174-84280-0004.flac 328480 +174/84280/174-84280-0007.flac 155680 +174/84280/174-84280-0015.flac 219680 +174/84280/174-84280-0006.flac 112480 +174/84280/174-84280-0001.flac 293280 +174/84280/174-84280-0003.flac 119120 +174/84280/174-84280-0005.flac 137760 +174/84280/174-84280-0010.flac 69760 +174/84280/174-84280-0008.flac 72320 +174/84280/174-84280-0000.flac 38400 +174/84280/174-84280-0009.flac 49600 +174/84280/174-84280-0012.flac 230880 +174/84280/174-84280-0011.flac 53680 +174/84280/174-84280-0002.flac 195200 +174/84280/174-84280-0013.flac 269760 +174/50561/174-50561-0019.flac 68160 
+174/50561/174-50561-0008.flac 260960 +174/50561/174-50561-0004.flac 32960 +174/50561/174-50561-0014.flac 28800 +174/50561/174-50561-0005.flac 91280 +174/50561/174-50561-0013.flac 169840 +174/50561/174-50561-0016.flac 215280 +174/50561/174-50561-0018.flac 33920 +174/50561/174-50561-0011.flac 152320 +174/50561/174-50561-0012.flac 33120 +174/50561/174-50561-0007.flac 173680 +174/50561/174-50561-0017.flac 45200 +174/50561/174-50561-0002.flac 34320 +174/50561/174-50561-0010.flac 238560 +174/50561/174-50561-0000.flac 64320 +174/50561/174-50561-0003.flac 53440 +174/50561/174-50561-0006.flac 83440 +174/50561/174-50561-0015.flac 268160 +174/50561/174-50561-0009.flac 32640 +174/50561/174-50561-0001.flac 253760 +251/136532/251-136532-0007.flac 89680 +251/136532/251-136532-0014.flac 36240 +251/136532/251-136532-0005.flac 92080 +251/136532/251-136532-0009.flac 58000 +251/136532/251-136532-0020.flac 47600 +251/136532/251-136532-0017.flac 245840 +251/136532/251-136532-0002.flac 104800 +251/136532/251-136532-0011.flac 135120 +251/136532/251-136532-0016.flac 148080 +251/136532/251-136532-0012.flac 140640 +251/136532/251-136532-0004.flac 396800 +251/136532/251-136532-0023.flac 134320 +251/136532/251-136532-0001.flac 109680 +251/136532/251-136532-0010.flac 104960 +251/136532/251-136532-0003.flac 261760 +251/136532/251-136532-0018.flac 110000 +251/136532/251-136532-0015.flac 122080 +251/136532/251-136532-0019.flac 149520 +251/136532/251-136532-0006.flac 41040 +251/136532/251-136532-0021.flac 104800 +251/136532/251-136532-0022.flac 24560 +251/136532/251-136532-0013.flac 74880 +251/136532/251-136532-0008.flac 156640 +251/136532/251-136532-0000.flac 156960 +251/137823/251-137823-0022.flac 114800 +251/137823/251-137823-0016.flac 108320 +251/137823/251-137823-0020.flac 86800 +251/137823/251-137823-0019.flac 196640 +251/137823/251-137823-0000.flac 54960 +251/137823/251-137823-0026.flac 44560 +251/137823/251-137823-0011.flac 90480 +251/137823/251-137823-0023.flac 37360 +251/137823/251-137823-0006.flac 63600 +251/137823/251-137823-0014.flac 79680 +251/137823/251-137823-0012.flac 39760 +251/137823/251-137823-0025.flac 58560 +251/137823/251-137823-0010.flac 127280 +251/137823/251-137823-0008.flac 104400 +251/137823/251-137823-0015.flac 41840 +251/137823/251-137823-0003.flac 70400 +251/137823/251-137823-0004.flac 175761 +251/137823/251-137823-0005.flac 87120 +251/137823/251-137823-0002.flac 50080 +251/137823/251-137823-0024.flac 39360 +251/137823/251-137823-0007.flac 56640 +251/137823/251-137823-0017.flac 61680 +251/137823/251-137823-0013.flac 79360 +251/137823/251-137823-0018.flac 120080 +251/137823/251-137823-0001.flac 87680 +251/137823/251-137823-0021.flac 109120 +251/118436/251-118436-0016.flac 143520 +251/118436/251-118436-0002.flac 111440 +251/118436/251-118436-0013.flac 131600 +251/118436/251-118436-0012.flac 177680 +251/118436/251-118436-0005.flac 84960 +251/118436/251-118436-0004.flac 53200 +251/118436/251-118436-0001.flac 59600 +251/118436/251-118436-0009.flac 143280 +251/118436/251-118436-0019.flac 183600 +251/118436/251-118436-0021.flac 63680 +251/118436/251-118436-0014.flac 106960 +251/118436/251-118436-0015.flac 86080 +251/118436/251-118436-0003.flac 176320 +251/118436/251-118436-0010.flac 52000 +251/118436/251-118436-0018.flac 45280 +251/118436/251-118436-0017.flac 46160 +251/118436/251-118436-0007.flac 87200 +251/118436/251-118436-0000.flac 100160 +251/118436/251-118436-0023.flac 44320 +251/118436/251-118436-0011.flac 119520 +251/118436/251-118436-0008.flac 99680 +251/118436/251-118436-0006.flac 111440 
+251/118436/251-118436-0022.flac 72640 +251/118436/251-118436-0020.flac 88880 +2086/149220/2086-149220-0003.flac 364960 +2086/149220/2086-149220-0043.flac 44320 +2086/149220/2086-149220-0004.flac 115520 +2086/149220/2086-149220-0045.flac 72720 +2086/149220/2086-149220-0032.flac 60160 +2086/149220/2086-149220-0024.flac 60880 +2086/149220/2086-149220-0040.flac 107200 +2086/149220/2086-149220-0000.flac 193520 +2086/149220/2086-149220-0007.flac 83840 +2086/149220/2086-149220-0037.flac 77360 +2086/149220/2086-149220-0027.flac 67920 +2086/149220/2086-149220-0006.flac 165600 +2086/149220/2086-149220-0025.flac 164080 +2086/149220/2086-149220-0020.flac 46880 +2086/149220/2086-149220-0001.flac 316480 +2086/149220/2086-149220-0031.flac 137280 +2086/149220/2086-149220-0044.flac 63200 +2086/149220/2086-149220-0005.flac 145440 +2086/149220/2086-149220-0026.flac 76960 +2086/149220/2086-149220-0023.flac 117840 +2086/149220/2086-149220-0035.flac 168799 +2086/149220/2086-149220-0028.flac 45680 +2086/149220/2086-149220-0021.flac 53360 +2086/149220/2086-149220-0048.flac 180960 +2086/149220/2086-149220-0010.flac 223200 +2086/149220/2086-149220-0012.flac 239920 +2086/149220/2086-149220-0017.flac 139200 +2086/149220/2086-149220-0038.flac 69600 +2086/149220/2086-149220-0015.flac 131040 +2086/149220/2086-149220-0014.flac 153280 +2086/149220/2086-149220-0018.flac 109280 +2086/149220/2086-149220-0036.flac 101120 +2086/149220/2086-149220-0009.flac 169600 +2086/149220/2086-149220-0033.flac 118960 +2086/149220/2086-149220-0046.flac 54640 +2086/149220/2086-149220-0030.flac 129600 +2086/149220/2086-149220-0041.flac 236880 +2086/149220/2086-149220-0011.flac 310880 +2086/149220/2086-149220-0047.flac 62720 +2086/149220/2086-149220-0013.flac 281120 +2086/149220/2086-149220-0049.flac 185680 +2086/149220/2086-149220-0008.flac 142240 +2086/149220/2086-149220-0042.flac 45440 +2086/149220/2086-149220-0039.flac 39680 +2086/149220/2086-149220-0002.flac 256160 +2086/149220/2086-149220-0022.flac 148240 +2086/149220/2086-149220-0034.flac 94560 +2086/149220/2086-149220-0019.flac 210880 +2086/149214/2086-149214-0001.flac 111520 +2086/149214/2086-149214-0004.flac 243520 +2086/149214/2086-149214-0002.flac 267920 +2086/149214/2086-149214-0000.flac 156960 +2086/149214/2086-149214-0003.flac 143919 +1272/141231/1272-141231-0022.flac 133600 +1272/141231/1272-141231-0005.flac 92960 +1272/141231/1272-141231-0010.flac 172160 +1272/141231/1272-141231-0009.flac 137920 +1272/141231/1272-141231-0019.flac 75200 +1272/141231/1272-141231-0030.flac 114080 +1272/141231/1272-141231-0001.flac 104560 +1272/141231/1272-141231-0016.flac 29600 +1272/141231/1272-141231-0024.flac 58240 +1272/141231/1272-141231-0028.flac 90400 +1272/141231/1272-141231-0012.flac 123840 +1272/141231/1272-141231-0031.flac 113840 +1272/141231/1272-141231-0026.flac 106160 +1272/141231/1272-141231-0015.flac 90880 +1272/141231/1272-141231-0000.flac 74400 +1272/141231/1272-141231-0027.flac 105120 +1272/141231/1272-141231-0020.flac 69120 +1272/141231/1272-141231-0023.flac 138560 +1272/141231/1272-141231-0002.flac 213360 +1272/141231/1272-141231-0006.flac 78880 +1272/141231/1272-141231-0025.flac 139600 +1272/141231/1272-141231-0004.flac 81440 +1272/141231/1272-141231-0021.flac 74400 +1272/141231/1272-141231-0008.flac 73520 +1272/141231/1272-141231-0011.flac 80240 +1272/141231/1272-141231-0013.flac 26239 +1272/141231/1272-141231-0032.flac 71680 +1272/141231/1272-141231-0017.flac 59760 +1272/141231/1272-141231-0003.flac 86720 +1272/141231/1272-141231-0029.flac 110080 
+1272/141231/1272-141231-0007.flac 75760 +1272/141231/1272-141231-0014.flac 116880 +1272/141231/1272-141231-0018.flac 96000 +1272/135031/1272-135031-0022.flac 43120 +1272/135031/1272-135031-0002.flac 183600 +1272/135031/1272-135031-0015.flac 118000 +1272/135031/1272-135031-0012.flac 32640 +1272/135031/1272-135031-0018.flac 39119 +1272/135031/1272-135031-0024.flac 231520 +1272/135031/1272-135031-0019.flac 56400 +1272/135031/1272-135031-0020.flac 74400 +1272/135031/1272-135031-0023.flac 120320 +1272/135031/1272-135031-0010.flac 143920 +1272/135031/1272-135031-0017.flac 109760 +1272/135031/1272-135031-0021.flac 41520 +1272/135031/1272-135031-0008.flac 58720 +1272/135031/1272-135031-0003.flac 76080 +1272/135031/1272-135031-0006.flac 65440 +1272/135031/1272-135031-0005.flac 74000 +1272/135031/1272-135031-0004.flac 67600 +1272/135031/1272-135031-0014.flac 27840 +1272/135031/1272-135031-0011.flac 50880 +1272/135031/1272-135031-0016.flac 36480 +1272/135031/1272-135031-0007.flac 64880 +1272/135031/1272-135031-0013.flac 59200 +1272/135031/1272-135031-0000.flac 174160 +1272/135031/1272-135031-0009.flac 30560 +1272/135031/1272-135031-0001.flac 178080 +1272/128104/1272-128104-0007.flac 147840 +1272/128104/1272-128104-0011.flac 241840 +1272/128104/1272-128104-0004.flac 470400 +1272/128104/1272-128104-0012.flac 86080 +1272/128104/1272-128104-0006.flac 90240 +1272/128104/1272-128104-0000.flac 93680 +1272/128104/1272-128104-0002.flac 199760 +1272/128104/1272-128104-0008.flac 81920 +1272/128104/1272-128104-0014.flac 35920 +1272/128104/1272-128104-0003.flac 158400 +1272/128104/1272-128104-0001.flac 77040 +1272/128104/1272-128104-0005.flac 144160 +1272/128104/1272-128104-0009.flac 292640 +1272/128104/1272-128104-0013.flac 113600 +1272/128104/1272-128104-0010.flac 89600 +3536/23268/3536-23268-0012.flac 116080 +3536/23268/3536-23268-0015.flac 63200 +3536/23268/3536-23268-0025.flac 80240 +3536/23268/3536-23268-0006.flac 84800 +3536/23268/3536-23268-0030.flac 135600 +3536/23268/3536-23268-0014.flac 56000 +3536/23268/3536-23268-0003.flac 127920 +3536/23268/3536-23268-0008.flac 148640 +3536/23268/3536-23268-0000.flac 318240 +3536/23268/3536-23268-0019.flac 162640 +3536/23268/3536-23268-0004.flac 50080 +3536/23268/3536-23268-0017.flac 271680 +3536/23268/3536-23268-0018.flac 121360 +3536/23268/3536-23268-0024.flac 101600 +3536/23268/3536-23268-0002.flac 167920 +3536/23268/3536-23268-0022.flac 97840 +3536/23268/3536-23268-0005.flac 78880 +3536/23268/3536-23268-0029.flac 119280 +3536/23268/3536-23268-0027.flac 165040 +3536/23268/3536-23268-0007.flac 147360 +3536/23268/3536-23268-0020.flac 164320 +3536/23268/3536-23268-0009.flac 69040 +3536/23268/3536-23268-0011.flac 204720 +3536/23268/3536-23268-0028.flac 133520 +3536/23268/3536-23268-0001.flac 254480 +3536/23268/3536-23268-0021.flac 222400 +3536/23268/3536-23268-0010.flac 62480 +3536/23268/3536-23268-0013.flac 128400 +3536/23268/3536-23268-0023.flac 200720 +3536/23268/3536-23268-0016.flac 181120 +3536/23268/3536-23268-0026.flac 141200 +3536/8226/3536-8226-0010.flac 152000 +3536/8226/3536-8226-0004.flac 234000 +3536/8226/3536-8226-0009.flac 32080 +3536/8226/3536-8226-0014.flac 36320 +3536/8226/3536-8226-0001.flac 142640 +3536/8226/3536-8226-0002.flac 151200 +3536/8226/3536-8226-0023.flac 182321 +3536/8226/3536-8226-0030.flac 64800 +3536/8226/3536-8226-0000.flac 172080 +3536/8226/3536-8226-0025.flac 61280 +3536/8226/3536-8226-0005.flac 193840 +3536/8226/3536-8226-0012.flac 60560 +3536/8226/3536-8226-0018.flac 150160 +3536/8226/3536-8226-0013.flac 45920 
+3536/8226/3536-8226-0021.flac 79760 +3536/8226/3536-8226-0011.flac 139760 +3536/8226/3536-8226-0006.flac 140720 +3536/8226/3536-8226-0015.flac 178400 +3536/8226/3536-8226-0017.flac 52080 +3536/8226/3536-8226-0008.flac 80160 +3536/8226/3536-8226-0024.flac 102800 +3536/8226/3536-8226-0029.flac 53680 +3536/8226/3536-8226-0007.flac 37360 +3536/8226/3536-8226-0026.flac 155200 +3536/8226/3536-8226-0020.flac 105840 +3536/8226/3536-8226-0032.flac 66560 +3536/8226/3536-8226-0027.flac 36320 +3536/8226/3536-8226-0022.flac 145600 +3536/8226/3536-8226-0031.flac 52000 +3536/8226/3536-8226-0016.flac 75840 +3536/8226/3536-8226-0028.flac 52240 +3536/8226/3536-8226-0019.flac 85280 +3536/8226/3536-8226-0003.flac 127040 +8297/275156/8297-275156-0002.flac 127040 +8297/275156/8297-275156-0000.flac 57280 +8297/275156/8297-275156-0007.flac 157120 +8297/275156/8297-275156-0011.flac 82720 +8297/275156/8297-275156-0013.flac 195040 +8297/275156/8297-275156-0004.flac 52480 +8297/275156/8297-275156-0005.flac 242880 +8297/275156/8297-275156-0008.flac 125440 +8297/275156/8297-275156-0006.flac 106880 +8297/275156/8297-275156-0010.flac 218880 +8297/275156/8297-275156-0001.flac 74240 +8297/275156/8297-275156-0003.flac 179920 +8297/275156/8297-275156-0012.flac 91200 +8297/275156/8297-275156-0009.flac 132000 +8297/275154/8297-275154-0012.flac 79520 +8297/275154/8297-275154-0016.flac 45920 +8297/275154/8297-275154-0006.flac 224000 +8297/275154/8297-275154-0020.flac 71521 +8297/275154/8297-275154-0018.flac 35280 +8297/275154/8297-275154-0011.flac 76160 +8297/275154/8297-275154-0015.flac 51520 +8297/275154/8297-275154-0002.flac 156160 +8297/275154/8297-275154-0013.flac 50160 +8297/275154/8297-275154-0001.flac 165440 +8297/275154/8297-275154-0025.flac 78080 +8297/275154/8297-275154-0008.flac 64640 +8297/275154/8297-275154-0004.flac 106480 +8297/275154/8297-275154-0014.flac 289360 +8297/275154/8297-275154-0009.flac 39680 +8297/275154/8297-275154-0003.flac 83840 +8297/275154/8297-275154-0007.flac 71840 +8297/275154/8297-275154-0023.flac 41600 +8297/275154/8297-275154-0000.flac 163520 +8297/275154/8297-275154-0021.flac 76961 +8297/275154/8297-275154-0017.flac 59200 +8297/275154/8297-275154-0005.flac 44480 +8297/275154/8297-275154-0027.flac 140000 +8297/275154/8297-275154-0022.flac 36080 +8297/275154/8297-275154-0024.flac 58640 +8297/275154/8297-275154-0026.flac 78240 +8297/275154/8297-275154-0019.flac 126080 +8297/275155/8297-275155-0021.flac 28400 +8297/275155/8297-275155-0011.flac 149280 +8297/275155/8297-275155-0032.flac 164880 +8297/275155/8297-275155-0022.flac 107040 +8297/275155/8297-275155-0027.flac 50400 +8297/275155/8297-275155-0013.flac 53360 +8297/275155/8297-275155-0004.flac 186400 +8297/275155/8297-275155-0001.flac 106000 +8297/275155/8297-275155-0017.flac 142560 +8297/275155/8297-275155-0015.flac 108960 +8297/275155/8297-275155-0019.flac 99200 +8297/275155/8297-275155-0031.flac 77360 +8297/275155/8297-275155-0006.flac 110640 +8297/275155/8297-275155-0010.flac 49680 +8297/275155/8297-275155-0030.flac 37440 +8297/275155/8297-275155-0005.flac 130240 +8297/275155/8297-275155-0016.flac 136320 +8297/275155/8297-275155-0009.flac 98320 +8297/275155/8297-275155-0014.flac 135040 +8297/275155/8297-275155-0023.flac 52480 +8297/275155/8297-275155-0025.flac 50880 +8297/275155/8297-275155-0002.flac 141600 +8297/275155/8297-275155-0029.flac 167360 +8297/275155/8297-275155-0028.flac 58400 +8297/275155/8297-275155-0012.flac 46800 +8297/275155/8297-275155-0000.flac 167520 +8297/275155/8297-275155-0008.flac 51520 
+8297/275155/8297-275155-0003.flac 56800 +8297/275155/8297-275155-0007.flac 60720 +8297/275155/8297-275155-0018.flac 125520 +8297/275155/8297-275155-0024.flac 124480 +8297/275155/8297-275155-0026.flac 55280 +8297/275155/8297-275155-0020.flac 110880 +1673/143397/1673-143397-0006.flac 136560 +1673/143397/1673-143397-0003.flac 178880 +1673/143397/1673-143397-0018.flac 68240 +1673/143397/1673-143397-0000.flac 149600 +1673/143397/1673-143397-0020.flac 261120 +1673/143397/1673-143397-0011.flac 227120 +1673/143397/1673-143397-0013.flac 106800 +1673/143397/1673-143397-0010.flac 192000 +1673/143397/1673-143397-0012.flac 149280 +1673/143397/1673-143397-0008.flac 68400 +1673/143397/1673-143397-0002.flac 208480 +1673/143397/1673-143397-0016.flac 244560 +1673/143397/1673-143397-0015.flac 101520 +1673/143397/1673-143397-0005.flac 158320 +1673/143397/1673-143397-0004.flac 193120 +1673/143397/1673-143397-0014.flac 150320 +1673/143397/1673-143397-0001.flac 142320 +1673/143397/1673-143397-0019.flac 251120 +1673/143397/1673-143397-0017.flac 112880 +1673/143397/1673-143397-0007.flac 82560 +1673/143397/1673-143397-0009.flac 57600 +1673/143396/1673-143396-0007.flac 113520 +1673/143396/1673-143396-0009.flac 263360 +1673/143396/1673-143396-0000.flac 234800 +1673/143396/1673-143396-0010.flac 191920 +1673/143396/1673-143396-0004.flac 318640 +1673/143396/1673-143396-0011.flac 262640 +1673/143396/1673-143396-0002.flac 125600 +1673/143396/1673-143396-0003.flac 181680 +1673/143396/1673-143396-0015.flac 100880 +1673/143396/1673-143396-0013.flac 248320 +1673/143396/1673-143396-0018.flac 202480 +1673/143396/1673-143396-0008.flac 282400 +1673/143396/1673-143396-0006.flac 254720 +1673/143396/1673-143396-0012.flac 231520 +1673/143396/1673-143396-0016.flac 231280 +1673/143396/1673-143396-0014.flac 223760 +1673/143396/1673-143396-0020.flac 240320 +1673/143396/1673-143396-0005.flac 187841 +1673/143396/1673-143396-0001.flac 224800 +1673/143396/1673-143396-0017.flac 134880 +1673/143396/1673-143396-0019.flac 247760 +1993/147966/1993-147966-0006.flac 52640 +1993/147966/1993-147966-0000.flac 178160 +1993/147966/1993-147966-0001.flac 201040 +1993/147966/1993-147966-0005.flac 53520 +1993/147966/1993-147966-0004.flac 77840 +1993/147966/1993-147966-0002.flac 66960 +1993/147966/1993-147966-0003.flac 41520 +1993/147149/1993-147149-0003.flac 228800 +1993/147149/1993-147149-0002.flac 92160 +1993/147149/1993-147149-0015.flac 132161 +1993/147149/1993-147149-0030.flac 248320 +1993/147149/1993-147149-0014.flac 48880 +1993/147149/1993-147149-0012.flac 84480 +1993/147149/1993-147149-0008.flac 54240 +1993/147149/1993-147149-0016.flac 93520 +1993/147149/1993-147149-0023.flac 171200 +1993/147149/1993-147149-0013.flac 81200 +1993/147149/1993-147149-0027.flac 270400 +1993/147149/1993-147149-0028.flac 162720 +1993/147149/1993-147149-0006.flac 446080 +1993/147149/1993-147149-0009.flac 276480 +1993/147149/1993-147149-0020.flac 271680 +1993/147149/1993-147149-0019.flac 111360 +1993/147149/1993-147149-0021.flac 167840 +1993/147149/1993-147149-0017.flac 60480 +1993/147149/1993-147149-0005.flac 127680 +1993/147149/1993-147149-0004.flac 118400 +1993/147149/1993-147149-0007.flac 126560 +1993/147149/1993-147149-0022.flac 264640 +1993/147149/1993-147149-0001.flac 152560 +1993/147149/1993-147149-0011.flac 49920 +1993/147149/1993-147149-0029.flac 186640 +1993/147149/1993-147149-0018.flac 212640 +1993/147149/1993-147149-0025.flac 154320 +1993/147149/1993-147149-0000.flac 107440 +1993/147149/1993-147149-0010.flac 52480 +1993/147149/1993-147149-0024.flac 187040 
+1993/147149/1993-147149-0026.flac 36080 +1993/147964/1993-147964-0006.flac 55520 +1993/147964/1993-147964-0004.flac 116800 +1993/147964/1993-147964-0001.flac 76000 +1993/147964/1993-147964-0000.flac 134720 +1993/147964/1993-147964-0010.flac 329040 +1993/147964/1993-147964-0003.flac 74960 +1993/147964/1993-147964-0008.flac 112640 +1993/147964/1993-147964-0002.flac 101280 +1993/147964/1993-147964-0007.flac 109840 +1993/147964/1993-147964-0005.flac 168240 +1993/147964/1993-147964-0009.flac 143520 +1993/147965/1993-147965-0003.flac 118000 +1993/147965/1993-147965-0007.flac 121120 +1993/147965/1993-147965-0008.flac 88000 +1993/147965/1993-147965-0002.flac 149760 +1993/147965/1993-147965-0000.flac 69920 +1993/147965/1993-147965-0001.flac 42960 +1993/147965/1993-147965-0006.flac 141280 +1993/147965/1993-147965-0005.flac 138560 +1993/147965/1993-147965-0004.flac 46640 +3081/166546/3081-166546-0025.flac 41840 +3081/166546/3081-166546-0029.flac 110160 +3081/166546/3081-166546-0006.flac 60400 +3081/166546/3081-166546-0010.flac 83280 +3081/166546/3081-166546-0046.flac 63680 +3081/166546/3081-166546-0035.flac 37920 +3081/166546/3081-166546-0022.flac 169520 +3081/166546/3081-166546-0045.flac 151760 +3081/166546/3081-166546-0081.flac 48880 +3081/166546/3081-166546-0021.flac 46560 +3081/166546/3081-166546-0067.flac 127280 +3081/166546/3081-166546-0008.flac 28320 +3081/166546/3081-166546-0085.flac 180560 +3081/166546/3081-166546-0031.flac 168800 +3081/166546/3081-166546-0086.flac 58880 +3081/166546/3081-166546-0037.flac 104480 +3081/166546/3081-166546-0070.flac 187760 +3081/166546/3081-166546-0069.flac 59360 +3081/166546/3081-166546-0009.flac 60480 +3081/166546/3081-166546-0080.flac 69040 +3081/166546/3081-166546-0075.flac 39600 +3081/166546/3081-166546-0005.flac 32320 +3081/166546/3081-166546-0040.flac 78320 +3081/166546/3081-166546-0059.flac 94720 +3081/166546/3081-166546-0014.flac 56321 +3081/166546/3081-166546-0052.flac 54560 +3081/166546/3081-166546-0065.flac 52480 +3081/166546/3081-166546-0061.flac 84560 +3081/166546/3081-166546-0064.flac 62880 +3081/166546/3081-166546-0007.flac 90480 +3081/166546/3081-166546-0019.flac 62400 +3081/166546/3081-166546-0033.flac 43680 +3081/166546/3081-166546-0000.flac 168000 +3081/166546/3081-166546-0060.flac 43840 +3081/166546/3081-166546-0056.flac 75680 +3081/166546/3081-166546-0013.flac 83520 +3081/166546/3081-166546-0042.flac 83120 +3081/166546/3081-166546-0077.flac 166800 +3081/166546/3081-166546-0048.flac 76960 +3081/166546/3081-166546-0044.flac 71920 +3081/166546/3081-166546-0015.flac 43520 +3081/166546/3081-166546-0004.flac 57200 +3081/166546/3081-166546-0024.flac 69360 +3081/166546/3081-166546-0072.flac 51680 +3081/166546/3081-166546-0032.flac 59040 +3081/166546/3081-166546-0043.flac 90000 +3081/166546/3081-166546-0051.flac 179680 +3081/166546/3081-166546-0050.flac 60160 +3081/166546/3081-166546-0063.flac 27840 +3081/166546/3081-166546-0087.flac 154000 +3081/166546/3081-166546-0011.flac 99360 +3081/166546/3081-166546-0068.flac 40480 +3081/166546/3081-166546-0078.flac 69280 +3081/166546/3081-166546-0073.flac 23120 +3081/166546/3081-166546-0054.flac 55121 +3081/166546/3081-166546-0028.flac 63680 +3081/166546/3081-166546-0082.flac 122560 +3081/166546/3081-166546-0066.flac 75200 +3081/166546/3081-166546-0012.flac 107520 +3081/166546/3081-166546-0041.flac 43760 +3081/166546/3081-166546-0076.flac 31040 +3081/166546/3081-166546-0016.flac 122960 +3081/166546/3081-166546-0039.flac 67200 +3081/166546/3081-166546-0084.flac 53600 +3081/166546/3081-166546-0058.flac 70960 
+3081/166546/3081-166546-0036.flac 43840 +3081/166546/3081-166546-0062.flac 129120 +3081/166546/3081-166546-0055.flac 62000 +3081/166546/3081-166546-0020.flac 85440 +3081/166546/3081-166546-0027.flac 63600 +3081/166546/3081-166546-0034.flac 79440 +3081/166546/3081-166546-0057.flac 42720 +3081/166546/3081-166546-0030.flac 223760 +3081/166546/3081-166546-0001.flac 153920 +3081/166546/3081-166546-0074.flac 177920 +3081/166546/3081-166546-0079.flac 132080 +3081/166546/3081-166546-0089.flac 200320 +3081/166546/3081-166546-0038.flac 49360 +3081/166546/3081-166546-0018.flac 28400 +3081/166546/3081-166546-0003.flac 66000 +3081/166546/3081-166546-0026.flac 156960 +3081/166546/3081-166546-0083.flac 45840 +3081/166546/3081-166546-0071.flac 101520 +3081/166546/3081-166546-0053.flac 42800 +3081/166546/3081-166546-0049.flac 55920 +3081/166546/3081-166546-0023.flac 164560 +3081/166546/3081-166546-0002.flac 45280 +3081/166546/3081-166546-0088.flac 172960 +3081/166546/3081-166546-0047.flac 76160 +3081/166546/3081-166546-0017.flac 64080 +1919/142785/1919-142785-0049.flac 253360 +1919/142785/1919-142785-0041.flac 194400 +1919/142785/1919-142785-0034.flac 58880 +1919/142785/1919-142785-0029.flac 42000 +1919/142785/1919-142785-0025.flac 128080 +1919/142785/1919-142785-0063.flac 44800 +1919/142785/1919-142785-0061.flac 278880 +1919/142785/1919-142785-0004.flac 130720 +1919/142785/1919-142785-0054.flac 123600 +1919/142785/1919-142785-0016.flac 44480 +1919/142785/1919-142785-0005.flac 301760 +1919/142785/1919-142785-0000.flac 42560 +1919/142785/1919-142785-0045.flac 97760 +1919/142785/1919-142785-0001.flac 176800 +1919/142785/1919-142785-0024.flac 45760 +1919/142785/1919-142785-0015.flac 68640 +1919/142785/1919-142785-0057.flac 132960 +1919/142785/1919-142785-0047.flac 282640 +1919/142785/1919-142785-0044.flac 92320 +1919/142785/1919-142785-0033.flac 123360 +1919/142785/1919-142785-0006.flac 44800 +1919/142785/1919-142785-0011.flac 224000 +1919/142785/1919-142785-0060.flac 118480 +1919/142785/1919-142785-0009.flac 93440 +1919/142785/1919-142785-0046.flac 42160 +1919/142785/1919-142785-0007.flac 426000 +1919/142785/1919-142785-0002.flac 164160 +1919/142785/1919-142785-0035.flac 57760 +1919/142785/1919-142785-0008.flac 324560 +1919/142785/1919-142785-0027.flac 152560 +1919/142785/1919-142785-0012.flac 86480 +1919/142785/1919-142785-0059.flac 139920 +1919/142785/1919-142785-0040.flac 126720 +1919/142785/1919-142785-0017.flac 113680 +1919/142785/1919-142785-0042.flac 61920 +1919/142785/1919-142785-0010.flac 184080 +1919/142785/1919-142785-0037.flac 47520 +1919/142785/1919-142785-0051.flac 148640 +1919/142785/1919-142785-0058.flac 37760 +1919/142785/1919-142785-0030.flac 149920 +1919/142785/1919-142785-0036.flac 222000 +1919/142785/1919-142785-0031.flac 82480 +1919/142785/1919-142785-0020.flac 149440 +1919/142785/1919-142785-0056.flac 33600 +1919/142785/1919-142785-0043.flac 113360 +1919/142785/1919-142785-0021.flac 27040 +1919/142785/1919-142785-0014.flac 130080 +1919/142785/1919-142785-0038.flac 94480 +1919/142785/1919-142785-0018.flac 112960 +1919/142785/1919-142785-0019.flac 134080 +1919/142785/1919-142785-0048.flac 29440 +1919/142785/1919-142785-0023.flac 115840 +1919/142785/1919-142785-0032.flac 94720 +1919/142785/1919-142785-0050.flac 140640 +1919/142785/1919-142785-0055.flac 152160 +1919/142785/1919-142785-0052.flac 49600 +1919/142785/1919-142785-0053.flac 176960 +1919/142785/1919-142785-0026.flac 90480 +1919/142785/1919-142785-0003.flac 86480 +1919/142785/1919-142785-0062.flac 66240 
+1919/142785/1919-142785-0022.flac 108320 +1919/142785/1919-142785-0028.flac 82560 +1919/142785/1919-142785-0013.flac 96160 +1919/142785/1919-142785-0039.flac 46080 +1462/170142/1462-170142-0027.flac 67040 +1462/170142/1462-170142-0042.flac 51840 +1462/170142/1462-170142-0025.flac 88800 +1462/170142/1462-170142-0009.flac 101360 +1462/170142/1462-170142-0020.flac 79120 +1462/170142/1462-170142-0036.flac 49120 +1462/170142/1462-170142-0012.flac 106720 +1462/170142/1462-170142-0031.flac 77200 +1462/170142/1462-170142-0030.flac 43280 +1462/170142/1462-170142-0019.flac 163920 +1462/170142/1462-170142-0014.flac 40800 +1462/170142/1462-170142-0002.flac 103200 +1462/170142/1462-170142-0006.flac 99120 +1462/170142/1462-170142-0000.flac 75440 +1462/170142/1462-170142-0018.flac 52240 +1462/170142/1462-170142-0029.flac 63600 +1462/170142/1462-170142-0038.flac 77760 +1462/170142/1462-170142-0035.flac 42720 +1462/170142/1462-170142-0037.flac 69200 +1462/170142/1462-170142-0008.flac 76960 +1462/170142/1462-170142-0033.flac 66880 +1462/170142/1462-170142-0032.flac 44320 +1462/170142/1462-170142-0040.flac 73600 +1462/170142/1462-170142-0004.flac 116080 +1462/170142/1462-170142-0024.flac 34480 +1462/170142/1462-170142-0005.flac 177280 +1462/170142/1462-170142-0011.flac 52160 +1462/170142/1462-170142-0015.flac 48160 +1462/170142/1462-170142-0028.flac 48080 +1462/170142/1462-170142-0010.flac 73280 +1462/170142/1462-170142-0023.flac 44960 +1462/170142/1462-170142-0003.flac 37920 +1462/170142/1462-170142-0026.flac 77360 +1462/170142/1462-170142-0001.flac 153360 +1462/170142/1462-170142-0034.flac 40320 +1462/170142/1462-170142-0013.flac 62400 +1462/170142/1462-170142-0039.flac 75440 +1462/170142/1462-170142-0021.flac 130720 +1462/170142/1462-170142-0007.flac 39040 +1462/170142/1462-170142-0017.flac 37040 +1462/170142/1462-170142-0022.flac 60160 +1462/170142/1462-170142-0016.flac 99840 +1462/170142/1462-170142-0041.flac 89200 +1462/170138/1462-170138-0010.flac 137840 +1462/170138/1462-170138-0026.flac 38160 +1462/170138/1462-170138-0022.flac 98960 +1462/170138/1462-170138-0003.flac 37120 +1462/170138/1462-170138-0002.flac 74320 +1462/170138/1462-170138-0020.flac 37600 +1462/170138/1462-170138-0012.flac 61760 +1462/170138/1462-170138-0023.flac 152000 +1462/170138/1462-170138-0001.flac 63760 +1462/170138/1462-170138-0004.flac 40160 +1462/170138/1462-170138-0016.flac 54880 +1462/170138/1462-170138-0009.flac 40080 +1462/170138/1462-170138-0018.flac 70479 +1462/170138/1462-170138-0024.flac 261600 +1462/170138/1462-170138-0008.flac 101120 +1462/170138/1462-170138-0006.flac 137280 +1462/170138/1462-170138-0014.flac 78720 +1462/170138/1462-170138-0011.flac 182720 +1462/170138/1462-170138-0007.flac 84080 +1462/170138/1462-170138-0025.flac 71040 +1462/170138/1462-170138-0005.flac 210640 +1462/170138/1462-170138-0017.flac 99360 +1462/170138/1462-170138-0019.flac 56880 +1462/170138/1462-170138-0021.flac 154320 +1462/170138/1462-170138-0015.flac 54080 +1462/170138/1462-170138-0013.flac 112320 +1462/170138/1462-170138-0027.flac 80080 +1462/170138/1462-170138-0000.flac 232800 +1462/170145/1462-170145-0009.flac 46800 +1462/170145/1462-170145-0006.flac 55360 +1462/170145/1462-170145-0012.flac 43200 +1462/170145/1462-170145-0010.flac 47120 +1462/170145/1462-170145-0003.flac 139760 +1462/170145/1462-170145-0004.flac 132400 +1462/170145/1462-170145-0022.flac 104160 +1462/170145/1462-170145-0018.flac 32400 +1462/170145/1462-170145-0015.flac 96880 +1462/170145/1462-170145-0019.flac 43040 +1462/170145/1462-170145-0001.flac 77201 
+1462/170145/1462-170145-0021.flac 40880 +1462/170145/1462-170145-0011.flac 39440 +1462/170145/1462-170145-0008.flac 50400 +1462/170145/1462-170145-0007.flac 101120 +1462/170145/1462-170145-0016.flac 66800 +1462/170145/1462-170145-0017.flac 52080 +1462/170145/1462-170145-0000.flac 246480 +1462/170145/1462-170145-0005.flac 59520 +1462/170145/1462-170145-0014.flac 47520 +1462/170145/1462-170145-0020.flac 34400 +1462/170145/1462-170145-0002.flac 45520 +1462/170145/1462-170145-0013.flac 76800 +2277/149897/2277-149897-0024.flac 46960 +2277/149897/2277-149897-0021.flac 74960 +2277/149897/2277-149897-0003.flac 77440 +2277/149897/2277-149897-0036.flac 70720 +2277/149897/2277-149897-0007.flac 125680 +2277/149897/2277-149897-0034.flac 194560 +2277/149897/2277-149897-0004.flac 91760 +2277/149897/2277-149897-0008.flac 46080 +2277/149897/2277-149897-0025.flac 68560 +2277/149897/2277-149897-0016.flac 104641 +2277/149897/2277-149897-0033.flac 81040 +2277/149897/2277-149897-0032.flac 139040 +2277/149897/2277-149897-0017.flac 116800 +2277/149897/2277-149897-0035.flac 55200 +2277/149897/2277-149897-0031.flac 70480 +2277/149897/2277-149897-0027.flac 56240 +2277/149897/2277-149897-0037.flac 48160 +2277/149897/2277-149897-0010.flac 58000 +2277/149897/2277-149897-0001.flac 44080 +2277/149897/2277-149897-0028.flac 58000 +2277/149897/2277-149897-0019.flac 77600 +2277/149897/2277-149897-0011.flac 52800 +2277/149897/2277-149897-0012.flac 49440 +2277/149897/2277-149897-0015.flac 48640 +2277/149897/2277-149897-0002.flac 104640 +2277/149897/2277-149897-0030.flac 40800 +2277/149897/2277-149897-0022.flac 160800 +2277/149897/2277-149897-0005.flac 183440 +2277/149897/2277-149897-0020.flac 73920 +2277/149897/2277-149897-0000.flac 70320 +2277/149897/2277-149897-0014.flac 73920 +2277/149897/2277-149897-0026.flac 109040 +2277/149897/2277-149897-0029.flac 122320 +2277/149897/2277-149897-0009.flac 59600 +2277/149897/2277-149897-0013.flac 70640 +2277/149897/2277-149897-0018.flac 107920 +2277/149897/2277-149897-0006.flac 54160 +2277/149896/2277-149896-0019.flac 56960 +2277/149896/2277-149896-0015.flac 58800 +2277/149896/2277-149896-0006.flac 79520 +2277/149896/2277-149896-0033.flac 45520 +2277/149896/2277-149896-0027.flac 57520 +2277/149896/2277-149896-0017.flac 79760 +2277/149896/2277-149896-0013.flac 97440 +2277/149896/2277-149896-0009.flac 93040 +2277/149896/2277-149896-0007.flac 86000 +2277/149896/2277-149896-0024.flac 72000 +2277/149896/2277-149896-0008.flac 111040 +2277/149896/2277-149896-0010.flac 84400 +2277/149896/2277-149896-0034.flac 64080 +2277/149896/2277-149896-0011.flac 71440 +2277/149896/2277-149896-0022.flac 54080 +2277/149896/2277-149896-0016.flac 44560 +2277/149896/2277-149896-0020.flac 84560 +2277/149896/2277-149896-0030.flac 100800 +2277/149896/2277-149896-0028.flac 46400 +2277/149896/2277-149896-0000.flac 105440 +2277/149896/2277-149896-0004.flac 31280 +2277/149896/2277-149896-0014.flac 65120 +2277/149896/2277-149896-0023.flac 62240 +2277/149896/2277-149896-0012.flac 39920 +2277/149896/2277-149896-0025.flac 54400 +2277/149896/2277-149896-0031.flac 57760 +2277/149896/2277-149896-0021.flac 53920 +2277/149896/2277-149896-0018.flac 72720 +2277/149896/2277-149896-0026.flac 78720 +2277/149896/2277-149896-0029.flac 41600 +2277/149896/2277-149896-0002.flac 77360 +2277/149896/2277-149896-0001.flac 114320 +2277/149896/2277-149896-0032.flac 172320 +2277/149896/2277-149896-0003.flac 45440 +2277/149874/2277-149874-0006.flac 43920 +2277/149874/2277-149874-0013.flac 50560 +2277/149874/2277-149874-0004.flac 51440 
+2277/149874/2277-149874-0003.flac 123200 +2277/149874/2277-149874-0000.flac 248080 +2277/149874/2277-149874-0011.flac 39440 +2277/149874/2277-149874-0021.flac 47760 +2277/149874/2277-149874-0001.flac 112640 +2277/149874/2277-149874-0014.flac 137600 +2277/149874/2277-149874-0009.flac 53600 +2277/149874/2277-149874-0017.flac 53760 +2277/149874/2277-149874-0007.flac 70400 +2277/149874/2277-149874-0005.flac 101920 +2277/149874/2277-149874-0012.flac 81280 +2277/149874/2277-149874-0010.flac 118720 +2277/149874/2277-149874-0020.flac 118080 +2277/149874/2277-149874-0015.flac 75680 +2277/149874/2277-149874-0016.flac 85200 +2277/149874/2277-149874-0008.flac 86160 +2277/149874/2277-149874-0018.flac 80320 +2277/149874/2277-149874-0019.flac 135200 +2277/149874/2277-149874-0002.flac 81520 +7850/281318/7850-281318-0003.flac 51280 +7850/281318/7850-281318-0019.flac 87760 +7850/281318/7850-281318-0005.flac 36320 +7850/281318/7850-281318-0007.flac 68960 +7850/281318/7850-281318-0020.flac 34799 +7850/281318/7850-281318-0016.flac 52880 +7850/281318/7850-281318-0021.flac 125841 +7850/281318/7850-281318-0014.flac 127440 +7850/281318/7850-281318-0001.flac 69360 +7850/281318/7850-281318-0022.flac 61600 +7850/281318/7850-281318-0012.flac 68400 +7850/281318/7850-281318-0004.flac 133520 +7850/281318/7850-281318-0018.flac 75520 +7850/281318/7850-281318-0015.flac 46080 +7850/281318/7850-281318-0002.flac 68640 +7850/281318/7850-281318-0011.flac 120560 +7850/281318/7850-281318-0000.flac 66800 +7850/281318/7850-281318-0023.flac 83360 +7850/281318/7850-281318-0008.flac 97760 +7850/281318/7850-281318-0017.flac 139360 +7850/281318/7850-281318-0010.flac 192960 +7850/281318/7850-281318-0013.flac 96640 +7850/281318/7850-281318-0006.flac 120640 +7850/281318/7850-281318-0009.flac 139280 +7850/73752/7850-73752-0013.flac 41280 +7850/73752/7850-73752-0016.flac 57440 +7850/73752/7850-73752-0007.flac 164400 +7850/73752/7850-73752-0005.flac 34240 +7850/73752/7850-73752-0008.flac 169760 +7850/73752/7850-73752-0009.flac 117520 +7850/73752/7850-73752-0014.flac 39600 +7850/73752/7850-73752-0012.flac 70080 +7850/73752/7850-73752-0018.flac 454400 +7850/73752/7850-73752-0003.flac 463120 +7850/73752/7850-73752-0002.flac 145760 +7850/73752/7850-73752-0011.flac 47360 +7850/73752/7850-73752-0004.flac 61840 +7850/73752/7850-73752-0000.flac 50480 +7850/73752/7850-73752-0001.flac 129120 +7850/73752/7850-73752-0017.flac 95520 +7850/73752/7850-73752-0015.flac 122800 +7850/73752/7850-73752-0006.flac 161200 +7850/73752/7850-73752-0019.flac 49200 +7850/73752/7850-73752-0010.flac 224640 +7850/111771/7850-111771-0007.flac 175600 +7850/111771/7850-111771-0004.flac 176081 +7850/111771/7850-111771-0005.flac 52720 +7850/111771/7850-111771-0008.flac 110640 +7850/111771/7850-111771-0009.flac 132800 +7850/111771/7850-111771-0006.flac 61280 +7850/111771/7850-111771-0000.flac 120960 +7850/111771/7850-111771-0001.flac 106400 +7850/111771/7850-111771-0002.flac 133120 +7850/111771/7850-111771-0003.flac 54800 +7850/286674/7850-286674-0005.flac 123200 +7850/286674/7850-286674-0009.flac 115040 +7850/286674/7850-286674-0000.flac 135280 +7850/286674/7850-286674-0003.flac 99520 +7850/286674/7850-286674-0014.flac 124480 +7850/286674/7850-286674-0004.flac 128320 +7850/286674/7850-286674-0016.flac 178320 +7850/286674/7850-286674-0011.flac 146400 +7850/286674/7850-286674-0001.flac 45600 +7850/286674/7850-286674-0017.flac 43920 +7850/286674/7850-286674-0008.flac 44000 +7850/286674/7850-286674-0002.flac 91360 +7850/286674/7850-286674-0006.flac 52800 
+7850/286674/7850-286674-0012.flac 64320 +7850/286674/7850-286674-0015.flac 56480 +7850/286674/7850-286674-0013.flac 37520 +7850/286674/7850-286674-0010.flac 205680 +7850/286674/7850-286674-0007.flac 55280 +422/122949/422-122949-0020.flac 374560 +422/122949/422-122949-0008.flac 148800 +422/122949/422-122949-0015.flac 279440 +422/122949/422-122949-0002.flac 71600 +422/122949/422-122949-0005.flac 156240 +422/122949/422-122949-0012.flac 251200 +422/122949/422-122949-0010.flac 519040 +422/122949/422-122949-0027.flac 305920 +422/122949/422-122949-0016.flac 80240 +422/122949/422-122949-0003.flac 303280 +422/122949/422-122949-0023.flac 239040 +422/122949/422-122949-0004.flac 111760 +422/122949/422-122949-0035.flac 75120 +422/122949/422-122949-0007.flac 238880 +422/122949/422-122949-0001.flac 199280 +422/122949/422-122949-0026.flac 209760 +422/122949/422-122949-0017.flac 35040 +422/122949/422-122949-0014.flac 460720 +422/122949/422-122949-0000.flac 323520 +422/122949/422-122949-0006.flac 228720 +422/122949/422-122949-0028.flac 58640 +422/122949/422-122949-0021.flac 203040 +422/122949/422-122949-0033.flac 88880 +422/122949/422-122949-0018.flac 233040 +422/122949/422-122949-0009.flac 389840 +422/122949/422-122949-0024.flac 155840 +422/122949/422-122949-0029.flac 46640 +422/122949/422-122949-0022.flac 260560 +422/122949/422-122949-0034.flac 239760 +422/122949/422-122949-0019.flac 305840 +422/122949/422-122949-0030.flac 47040 +422/122949/422-122949-0013.flac 522320 +422/122949/422-122949-0032.flac 213760 +422/122949/422-122949-0031.flac 109760 +422/122949/422-122949-0011.flac 275600 +6295/244435/6295-244435-0003.flac 51920 +6295/244435/6295-244435-0014.flac 32320 +6295/244435/6295-244435-0000.flac 49680 +6295/244435/6295-244435-0037.flac 82880 +6295/244435/6295-244435-0013.flac 107360 +6295/244435/6295-244435-0018.flac 85200 +6295/244435/6295-244435-0035.flac 51120 +6295/244435/6295-244435-0005.flac 70400 +6295/244435/6295-244435-0020.flac 119040 +6295/244435/6295-244435-0006.flac 45440 +6295/244435/6295-244435-0027.flac 38720 +6295/244435/6295-244435-0029.flac 59360 +6295/244435/6295-244435-0028.flac 40960 +6295/244435/6295-244435-0034.flac 54720 +6295/244435/6295-244435-0012.flac 73280 +6295/244435/6295-244435-0030.flac 88720 +6295/244435/6295-244435-0008.flac 214400 +6295/244435/6295-244435-0031.flac 76000 +6295/244435/6295-244435-0017.flac 136800 +6295/244435/6295-244435-0036.flac 115440 +6295/244435/6295-244435-0011.flac 40400 +6295/244435/6295-244435-0025.flac 95840 +6295/244435/6295-244435-0004.flac 197040 +6295/244435/6295-244435-0038.flac 149280 +6295/244435/6295-244435-0024.flac 67120 +6295/244435/6295-244435-0033.flac 164000 +6295/244435/6295-244435-0023.flac 54560 +6295/244435/6295-244435-0021.flac 47520 +6295/244435/6295-244435-0009.flac 132240 +6295/244435/6295-244435-0026.flac 53440 +6295/244435/6295-244435-0015.flac 35760 +6295/244435/6295-244435-0010.flac 161440 +6295/244435/6295-244435-0040.flac 62480 +6295/244435/6295-244435-0032.flac 63040 +6295/244435/6295-244435-0016.flac 184160 +6295/244435/6295-244435-0022.flac 122800 +6295/244435/6295-244435-0019.flac 64240 +6295/244435/6295-244435-0039.flac 146320 +6295/244435/6295-244435-0002.flac 97680 +6295/244435/6295-244435-0001.flac 126160 +6295/244435/6295-244435-0007.flac 73440 +6295/64301/6295-64301-0031.flac 108160 +6295/64301/6295-64301-0018.flac 159120 +6295/64301/6295-64301-0026.flac 166880 +6295/64301/6295-64301-0027.flac 102400 +6295/64301/6295-64301-0028.flac 170400 +6295/64301/6295-64301-0013.flac 96000 
+6295/64301/6295-64301-0002.flac 107520 +6295/64301/6295-64301-0019.flac 157840 +6295/64301/6295-64301-0014.flac 73120 +6295/64301/6295-64301-0029.flac 184080 +6295/64301/6295-64301-0022.flac 170560 +6295/64301/6295-64301-0003.flac 108320 +6295/64301/6295-64301-0015.flac 116720 +6295/64301/6295-64301-0025.flac 91280 +6295/64301/6295-64301-0024.flac 103440 +6295/64301/6295-64301-0012.flac 104240 +6295/64301/6295-64301-0006.flac 104640 +6295/64301/6295-64301-0032.flac 41440 +6295/64301/6295-64301-0009.flac 48080 +6295/64301/6295-64301-0021.flac 130720 +6295/64301/6295-64301-0011.flac 161280 +6295/64301/6295-64301-0005.flac 102880 +6295/64301/6295-64301-0007.flac 70320 +6295/64301/6295-64301-0016.flac 109600 +6295/64301/6295-64301-0001.flac 75680 +6295/64301/6295-64301-0010.flac 128800 +6295/64301/6295-64301-0030.flac 87280 +6295/64301/6295-64301-0004.flac 60320 +6295/64301/6295-64301-0000.flac 282400 +6295/64301/6295-64301-0023.flac 333120 +6295/64301/6295-64301-0020.flac 95040 +6295/64301/6295-64301-0017.flac 87760 +6295/64301/6295-64301-0008.flac 44800 +6241/61946/6241-61946-0019.flac 36320 +6241/61946/6241-61946-0023.flac 105360 +6241/61946/6241-61946-0006.flac 126880 +6241/61946/6241-61946-0013.flac 153920 +6241/61946/6241-61946-0001.flac 96800 +6241/61946/6241-61946-0003.flac 135040 +6241/61946/6241-61946-0014.flac 85520 +6241/61946/6241-61946-0002.flac 87760 +6241/61946/6241-61946-0021.flac 71280 +6241/61946/6241-61946-0016.flac 101280 +6241/61946/6241-61946-0008.flac 68560 +6241/61946/6241-61946-0005.flac 47200 +6241/61946/6241-61946-0007.flac 96320 +6241/61946/6241-61946-0020.flac 191280 +6241/61946/6241-61946-0010.flac 51360 +6241/61946/6241-61946-0004.flac 88240 +6241/61946/6241-61946-0012.flac 47600 +6241/61946/6241-61946-0009.flac 68640 +6241/61946/6241-61946-0022.flac 132400 +6241/61946/6241-61946-0017.flac 71280 +6241/61946/6241-61946-0011.flac 161681 +6241/61946/6241-61946-0000.flac 99760 +6241/61946/6241-61946-0015.flac 63520 +6241/61946/6241-61946-0018.flac 62480 +6241/66616/6241-66616-0008.flac 284800 +6241/66616/6241-66616-0018.flac 117200 +6241/66616/6241-66616-0002.flac 67120 +6241/66616/6241-66616-0014.flac 139120 +6241/66616/6241-66616-0010.flac 150400 +6241/66616/6241-66616-0013.flac 82880 +6241/66616/6241-66616-0009.flac 178320 +6241/66616/6241-66616-0006.flac 92720 +6241/66616/6241-66616-0004.flac 77520 +6241/66616/6241-66616-0015.flac 73920 +6241/66616/6241-66616-0019.flac 101040 +6241/66616/6241-66616-0007.flac 81840 +6241/66616/6241-66616-0001.flac 129680 +6241/66616/6241-66616-0025.flac 164880 +6241/66616/6241-66616-0012.flac 170080 +6241/66616/6241-66616-0003.flac 87760 +6241/66616/6241-66616-0005.flac 172960 +6241/66616/6241-66616-0000.flac 148320 +6241/66616/6241-66616-0017.flac 104880 +6241/66616/6241-66616-0021.flac 66560 +6241/66616/6241-66616-0024.flac 66800 +6241/66616/6241-66616-0020.flac 46480 +6241/66616/6241-66616-0022.flac 101120 +6241/66616/6241-66616-0023.flac 42640 +6241/66616/6241-66616-0011.flac 171600 +6241/66616/6241-66616-0016.flac 80800 +6241/61943/6241-61943-0000.flac 111200 +6241/61943/6241-61943-0027.flac 285920 +6241/61943/6241-61943-0015.flac 39840 +6241/61943/6241-61943-0011.flac 62400 +6241/61943/6241-61943-0020.flac 70320 +6241/61943/6241-61943-0019.flac 46480 +6241/61943/6241-61943-0014.flac 107840 +6241/61943/6241-61943-0007.flac 42240 +6241/61943/6241-61943-0008.flac 129761 +6241/61943/6241-61943-0012.flac 93760 +6241/61943/6241-61943-0023.flac 69680 +6241/61943/6241-61943-0003.flac 125200 +6241/61943/6241-61943-0006.flac 61280 
+6241/61943/6241-61943-0022.flac 48320 +6241/61943/6241-61943-0026.flac 145680 +6241/61943/6241-61943-0002.flac 43760 +6241/61943/6241-61943-0010.flac 72480 +6241/61943/6241-61943-0004.flac 54560 +6241/61943/6241-61943-0018.flac 115040 +6241/61943/6241-61943-0017.flac 97440 +6241/61943/6241-61943-0025.flac 100160 +6241/61943/6241-61943-0021.flac 40320 +6241/61943/6241-61943-0009.flac 61520 +6241/61943/6241-61943-0013.flac 127520 +6241/61943/6241-61943-0016.flac 74640 +6241/61943/6241-61943-0005.flac 78400 +2902/9006/2902-9006-0002.flac 190000 +2902/9006/2902-9006-0019.flac 227680 +2902/9006/2902-9006-0018.flac 512800 +2902/9006/2902-9006-0013.flac 284320 +2902/9006/2902-9006-0004.flac 60800 +2902/9006/2902-9006-0020.flac 86800 +2902/9006/2902-9006-0014.flac 326560 +2902/9006/2902-9006-0015.flac 519760 +2902/9006/2902-9006-0000.flac 76800 +2902/9006/2902-9006-0001.flac 369120 +2902/9006/2902-9006-0011.flac 69200 +2902/9006/2902-9006-0016.flac 192320 +2902/9006/2902-9006-0006.flac 64000 +2902/9006/2902-9006-0012.flac 102720 +2902/9006/2902-9006-0010.flac 66720 +2902/9006/2902-9006-0007.flac 506240 +2902/9006/2902-9006-0003.flac 236800 +2902/9006/2902-9006-0009.flac 191520 +2902/9006/2902-9006-0005.flac 516960 +2902/9006/2902-9006-0008.flac 177280 +2902/9008/2902-9008-0010.flac 128320 +2902/9008/2902-9008-0011.flac 110560 +2902/9008/2902-9008-0013.flac 188400 +2902/9008/2902-9008-0015.flac 43520 +2902/9008/2902-9008-0016.flac 53840 +2902/9008/2902-9008-0003.flac 231600 +2902/9008/2902-9008-0005.flac 83360 +2902/9008/2902-9008-0008.flac 34480 +2902/9008/2902-9008-0014.flac 215840 +2902/9008/2902-9008-0004.flac 90080 +2902/9008/2902-9008-0012.flac 109200 +2902/9008/2902-9008-0007.flac 95520 +2902/9008/2902-9008-0000.flac 187520 +2902/9008/2902-9008-0002.flac 390320 +2902/9008/2902-9008-0001.flac 270640 +2902/9008/2902-9008-0006.flac 293520 +2902/9008/2902-9008-0009.flac 86640 +5895/34615/5895-34615-0021.flac 194480 +5895/34615/5895-34615-0016.flac 234880 +5895/34615/5895-34615-0019.flac 109680 +5895/34615/5895-34615-0017.flac 117760 +5895/34615/5895-34615-0018.flac 36400 +5895/34615/5895-34615-0015.flac 139920 +5895/34615/5895-34615-0008.flac 48880 +5895/34615/5895-34615-0010.flac 124320 +5895/34615/5895-34615-0014.flac 82561 +5895/34615/5895-34615-0004.flac 255200 +5895/34615/5895-34615-0002.flac 139680 +5895/34615/5895-34615-0003.flac 105680 +5895/34615/5895-34615-0005.flac 39920 +5895/34615/5895-34615-0000.flac 53360 +5895/34615/5895-34615-0001.flac 52880 +5895/34615/5895-34615-0011.flac 44160 +5895/34615/5895-34615-0006.flac 40400 +5895/34615/5895-34615-0013.flac 75920 +5895/34615/5895-34615-0012.flac 165360 +5895/34615/5895-34615-0009.flac 44480 +5895/34615/5895-34615-0007.flac 91280 +5895/34615/5895-34615-0020.flac 152480 +5895/34622/5895-34622-0015.flac 48800 +5895/34622/5895-34622-0000.flac 53920 +5895/34622/5895-34622-0009.flac 174240 +5895/34622/5895-34622-0023.flac 93520 +5895/34622/5895-34622-0018.flac 79680 +5895/34622/5895-34622-0008.flac 141120 +5895/34622/5895-34622-0005.flac 117360 +5895/34622/5895-34622-0020.flac 91440 +5895/34622/5895-34622-0017.flac 134160 +5895/34622/5895-34622-0007.flac 76160 +5895/34622/5895-34622-0006.flac 69760 +5895/34622/5895-34622-0001.flac 94480 +5895/34622/5895-34622-0012.flac 72960 +5895/34622/5895-34622-0011.flac 248640 +5895/34622/5895-34622-0021.flac 45760 +5895/34622/5895-34622-0014.flac 94000 +5895/34622/5895-34622-0003.flac 81280 +5895/34622/5895-34622-0004.flac 68640 +5895/34622/5895-34622-0019.flac 135280 +5895/34622/5895-34622-0022.flac 
163600 +5895/34622/5895-34622-0010.flac 52160 +5895/34622/5895-34622-0016.flac 41280 +5895/34622/5895-34622-0002.flac 46560 +5895/34622/5895-34622-0013.flac 229200 +5895/34629/5895-34629-0000.flac 36160 +5895/34629/5895-34629-0008.flac 165120 +5895/34629/5895-34629-0026.flac 216240 +5895/34629/5895-34629-0030.flac 162560 +5895/34629/5895-34629-0003.flac 50320 +5895/34629/5895-34629-0027.flac 113600 +5895/34629/5895-34629-0012.flac 58000 +5895/34629/5895-34629-0016.flac 76880 +5895/34629/5895-34629-0031.flac 57120 +5895/34629/5895-34629-0017.flac 75040 +5895/34629/5895-34629-0023.flac 126320 +5895/34629/5895-34629-0018.flac 94720 +5895/34629/5895-34629-0009.flac 75120 +5895/34629/5895-34629-0013.flac 80320 +5895/34629/5895-34629-0024.flac 65680 +5895/34629/5895-34629-0002.flac 62800 +5895/34629/5895-34629-0028.flac 86400 +5895/34629/5895-34629-0004.flac 41440 +5895/34629/5895-34629-0007.flac 131120 +5895/34629/5895-34629-0032.flac 44880 +5895/34629/5895-34629-0014.flac 38800 +5895/34629/5895-34629-0011.flac 110880 +5895/34629/5895-34629-0005.flac 35840 +5895/34629/5895-34629-0033.flac 123680 +5895/34629/5895-34629-0006.flac 118960 +5895/34629/5895-34629-0010.flac 34400 +5895/34629/5895-34629-0019.flac 55680 +5895/34629/5895-34629-0020.flac 36080 +5895/34629/5895-34629-0025.flac 77120 +5895/34629/5895-34629-0029.flac 49280 +5895/34629/5895-34629-0015.flac 114720 +5895/34629/5895-34629-0021.flac 133920 +5895/34629/5895-34629-0001.flac 51200 +3170/137482/3170-137482-0010.flac 160160 +3170/137482/3170-137482-0014.flac 56320 +3170/137482/3170-137482-0038.flac 223280 +3170/137482/3170-137482-0037.flac 91760 +3170/137482/3170-137482-0031.flac 274320 +3170/137482/3170-137482-0025.flac 119120 +3170/137482/3170-137482-0005.flac 139520 +3170/137482/3170-137482-0036.flac 109120 +3170/137482/3170-137482-0013.flac 39520 +3170/137482/3170-137482-0034.flac 125840 +3170/137482/3170-137482-0002.flac 316080 +3170/137482/3170-137482-0040.flac 212080 +3170/137482/3170-137482-0004.flac 66640 +3170/137482/3170-137482-0039.flac 324960 +3170/137482/3170-137482-0046.flac 148000 +3170/137482/3170-137482-0020.flac 117280 +3170/137482/3170-137482-0007.flac 285680 +3170/137482/3170-137482-0022.flac 224320 +3170/137482/3170-137482-0006.flac 114320 +3170/137482/3170-137482-0044.flac 267280 +3170/137482/3170-137482-0029.flac 46480 +3170/137482/3170-137482-0030.flac 102000 +3170/137482/3170-137482-0019.flac 85280 +3170/137482/3170-137482-0024.flac 108160 +3170/137482/3170-137482-0012.flac 47760 +3170/137482/3170-137482-0023.flac 196640 +3170/137482/3170-137482-0000.flac 447840 +3170/137482/3170-137482-0026.flac 145120 +3170/137482/3170-137482-0001.flac 338720 +3170/137482/3170-137482-0048.flac 115920 +3170/137482/3170-137482-0017.flac 223760 +3170/137482/3170-137482-0033.flac 39440 +3170/137482/3170-137482-0015.flac 163360 +3170/137482/3170-137482-0045.flac 74800 +3170/137482/3170-137482-0018.flac 136160 +3170/137482/3170-137482-0003.flac 309600 +3170/137482/3170-137482-0047.flac 268080 +3170/137482/3170-137482-0008.flac 61280 +3170/137482/3170-137482-0032.flac 60320 +3170/137482/3170-137482-0021.flac 159120 +3170/137482/3170-137482-0009.flac 121920 +3170/137482/3170-137482-0011.flac 138880 +3170/137482/3170-137482-0042.flac 178480 +3170/137482/3170-137482-0041.flac 112400 +3170/137482/3170-137482-0028.flac 62400 +3170/137482/3170-137482-0027.flac 211040 +3170/137482/3170-137482-0016.flac 74000 +3170/137482/3170-137482-0035.flac 158400 +3170/137482/3170-137482-0043.flac 169840 +5536/43359/5536-43359-0017.flac 88160 
+5536/43359/5536-43359-0009.flac 106320 +5536/43359/5536-43359-0003.flac 196240 +5536/43359/5536-43359-0000.flac 53600 +5536/43359/5536-43359-0001.flac 165760 +5536/43359/5536-43359-0005.flac 108080 +5536/43359/5536-43359-0015.flac 74000 +5536/43359/5536-43359-0008.flac 43040 +5536/43359/5536-43359-0013.flac 106320 +5536/43359/5536-43359-0006.flac 69440 +5536/43359/5536-43359-0016.flac 96800 +5536/43359/5536-43359-0012.flac 77040 +5536/43359/5536-43359-0004.flac 101120 +5536/43359/5536-43359-0002.flac 176640 +5536/43359/5536-43359-0010.flac 60400 +5536/43359/5536-43359-0011.flac 308400 +5536/43359/5536-43359-0018.flac 95680 +5536/43359/5536-43359-0007.flac 274800 +5536/43359/5536-43359-0014.flac 72720 +5536/43363/5536-43363-0017.flac 147120 +5536/43363/5536-43363-0005.flac 264960 +5536/43363/5536-43363-0018.flac 126240 +5536/43363/5536-43363-0003.flac 163360 +5536/43363/5536-43363-0011.flac 33440 +5536/43363/5536-43363-0006.flac 191760 +5536/43363/5536-43363-0016.flac 109200 +5536/43363/5536-43363-0015.flac 75840 +5536/43363/5536-43363-0012.flac 52240 +5536/43363/5536-43363-0014.flac 181520 +5536/43363/5536-43363-0000.flac 45120 +5536/43363/5536-43363-0007.flac 158400 +5536/43363/5536-43363-0013.flac 118800 +5536/43363/5536-43363-0004.flac 150000 +5536/43363/5536-43363-0008.flac 164640 +5536/43363/5536-43363-0001.flac 114240 +5536/43363/5536-43363-0019.flac 116000 +5536/43363/5536-43363-0010.flac 164160 +5536/43363/5536-43363-0009.flac 334080 +5536/43363/5536-43363-0002.flac 112320 +5536/43358/5536-43358-0011.flac 121440 +5536/43358/5536-43358-0018.flac 193040 +5536/43358/5536-43358-0009.flac 129040 +5536/43358/5536-43358-0002.flac 161760 +5536/43358/5536-43358-0001.flac 158000 +5536/43358/5536-43358-0006.flac 166560 +5536/43358/5536-43358-0010.flac 130240 +5536/43358/5536-43358-0013.flac 166400 +5536/43358/5536-43358-0015.flac 95360 +5536/43358/5536-43358-0007.flac 150720 +5536/43358/5536-43358-0012.flac 54160 +5536/43358/5536-43358-0014.flac 45280 +5536/43358/5536-43358-0005.flac 225040 +5536/43358/5536-43358-0019.flac 84160 +5536/43358/5536-43358-0008.flac 326240 +5536/43358/5536-43358-0003.flac 96640 +5536/43358/5536-43358-0000.flac 50800 +5536/43358/5536-43358-0004.flac 93200 +5536/43358/5536-43358-0017.flac 220880 +5536/43358/5536-43358-0016.flac 38800 +2803/154320/2803-154320-0002.flac 56480 +2803/154320/2803-154320-0003.flac 103600 +2803/154320/2803-154320-0012.flac 156960 +2803/154320/2803-154320-0013.flac 69840 +2803/154320/2803-154320-0000.flac 183680 +2803/154320/2803-154320-0011.flac 68480 +2803/154320/2803-154320-0009.flac 44240 +2803/154320/2803-154320-0014.flac 84640 +2803/154320/2803-154320-0006.flac 34880 +2803/154320/2803-154320-0010.flac 64160 +2803/154320/2803-154320-0001.flac 248960 +2803/154320/2803-154320-0005.flac 76320 +2803/154320/2803-154320-0008.flac 57440 +2803/154320/2803-154320-0007.flac 72160 +2803/154320/2803-154320-0004.flac 192080 +2803/161169/2803-161169-0016.flac 92560 +2803/161169/2803-161169-0006.flac 135520 +2803/161169/2803-161169-0017.flac 128160 +2803/161169/2803-161169-0009.flac 395120 +2803/161169/2803-161169-0002.flac 78560 +2803/161169/2803-161169-0000.flac 187200 +2803/161169/2803-161169-0010.flac 221040 +2803/161169/2803-161169-0007.flac 203440 +2803/161169/2803-161169-0001.flac 151920 +2803/161169/2803-161169-0013.flac 184160 +2803/161169/2803-161169-0012.flac 252560 +2803/161169/2803-161169-0015.flac 209840 +2803/161169/2803-161169-0003.flac 162240 +2803/161169/2803-161169-0004.flac 179600 +2803/161169/2803-161169-0008.flac 226240 
+2803/161169/2803-161169-0011.flac 87920 +2803/161169/2803-161169-0005.flac 268720 +2803/161169/2803-161169-0014.flac 217600 +2803/154328/2803-154328-0012.flac 153120 +2803/154328/2803-154328-0016.flac 34880 +2803/154328/2803-154328-0021.flac 48800 +2803/154328/2803-154328-0002.flac 32960 +2803/154328/2803-154328-0013.flac 63600 +2803/154328/2803-154328-0004.flac 179040 +2803/154328/2803-154328-0007.flac 108320 +2803/154328/2803-154328-0003.flac 204800 +2803/154328/2803-154328-0019.flac 162880 +2803/154328/2803-154328-0001.flac 61200 +2803/154328/2803-154328-0018.flac 246080 +2803/154328/2803-154328-0005.flac 97600 +2803/154328/2803-154328-0017.flac 118960 +2803/154328/2803-154328-0000.flac 126880 +2803/154328/2803-154328-0023.flac 105440 +2803/154328/2803-154328-0008.flac 117920 +2803/154328/2803-154328-0020.flac 222720 +2803/154328/2803-154328-0015.flac 328799 +2803/154328/2803-154328-0010.flac 101440 +2803/154328/2803-154328-0009.flac 132800 +2803/154328/2803-154328-0006.flac 64080 +2803/154328/2803-154328-0014.flac 78800 +2803/154328/2803-154328-0022.flac 76880 +2803/154328/2803-154328-0011.flac 105520 +5338/24640/5338-24640-0000.flac 55200 +5338/24640/5338-24640-0006.flac 174880 +5338/24640/5338-24640-0004.flac 154560 +5338/24640/5338-24640-0002.flac 157280 +5338/24640/5338-24640-0009.flac 194000 +5338/24640/5338-24640-0007.flac 280800 +5338/24640/5338-24640-0005.flac 243360 +5338/24640/5338-24640-0001.flac 180160 +5338/24640/5338-24640-0008.flac 161280 +5338/24615/5338-24615-0006.flac 185360 +5338/24615/5338-24615-0007.flac 161280 +5338/24615/5338-24615-0003.flac 185200 +5338/24615/5338-24615-0000.flac 160400 +5338/24615/5338-24615-0014.flac 196720 +5338/24615/5338-24615-0008.flac 154880 +5338/24615/5338-24615-0009.flac 68080 +5338/24615/5338-24615-0001.flac 121680 +5338/24615/5338-24615-0004.flac 376160 +5338/24615/5338-24615-0012.flac 68800 +5338/24615/5338-24615-0013.flac 129360 +5338/24615/5338-24615-0005.flac 284320 +5338/24615/5338-24615-0002.flac 514320 +5338/24615/5338-24615-0010.flac 62240 +5338/24615/5338-24615-0011.flac 110880 +5338/284437/5338-284437-0010.flac 106000 +5338/284437/5338-284437-0011.flac 49680 +5338/284437/5338-284437-0032.flac 148400 +5338/284437/5338-284437-0005.flac 83360 +5338/284437/5338-284437-0030.flac 69280 +5338/284437/5338-284437-0002.flac 44560 +5338/284437/5338-284437-0027.flac 63520 +5338/284437/5338-284437-0031.flac 124000 +5338/284437/5338-284437-0022.flac 161280 +5338/284437/5338-284437-0023.flac 44720 +5338/284437/5338-284437-0013.flac 180000 +5338/284437/5338-284437-0018.flac 127360 +5338/284437/5338-284437-0007.flac 59200 +5338/284437/5338-284437-0021.flac 120240 +5338/284437/5338-284437-0019.flac 49840 +5338/284437/5338-284437-0029.flac 121280 +5338/284437/5338-284437-0033.flac 33200 +5338/284437/5338-284437-0025.flac 31040 +5338/284437/5338-284437-0003.flac 43680 +5338/284437/5338-284437-0016.flac 53920 +5338/284437/5338-284437-0015.flac 85280 +5338/284437/5338-284437-0024.flac 95760 +5338/284437/5338-284437-0000.flac 72800 +5338/284437/5338-284437-0014.flac 29520 +5338/284437/5338-284437-0006.flac 78880 +5338/284437/5338-284437-0028.flac 73440 +5338/284437/5338-284437-0001.flac 144800 +5338/284437/5338-284437-0009.flac 164720 +5338/284437/5338-284437-0020.flac 170800 +5338/284437/5338-284437-0004.flac 99120 +5338/284437/5338-284437-0008.flac 148000 +5338/284437/5338-284437-0012.flac 48400 +5338/284437/5338-284437-0017.flac 80160 +5338/284437/5338-284437-0026.flac 254800 +5694/64025/5694-64025-0018.flac 42960 
+5694/64025/5694-64025-0005.flac 130721 +5694/64025/5694-64025-0013.flac 55040 +5694/64025/5694-64025-0001.flac 69600 +5694/64025/5694-64025-0022.flac 123680 +5694/64025/5694-64025-0004.flac 59120 +5694/64025/5694-64025-0010.flac 345920 +5694/64025/5694-64025-0014.flac 254000 +5694/64025/5694-64025-0008.flac 40879 +5694/64025/5694-64025-0003.flac 78720 +5694/64025/5694-64025-0023.flac 136320 +5694/64025/5694-64025-0012.flac 46000 +5694/64025/5694-64025-0016.flac 171520 +5694/64025/5694-64025-0020.flac 117600 +5694/64025/5694-64025-0015.flac 81040 +5694/64025/5694-64025-0021.flac 121760 +5694/64025/5694-64025-0011.flac 112480 +5694/64025/5694-64025-0007.flac 62640 +5694/64025/5694-64025-0009.flac 64320 +5694/64025/5694-64025-0000.flac 26720 +5694/64025/5694-64025-0019.flac 141920 +5694/64025/5694-64025-0002.flac 175360 +5694/64025/5694-64025-0006.flac 51520 +5694/64025/5694-64025-0017.flac 48240 +5694/64038/5694-64038-0009.flac 38160 +5694/64038/5694-64038-0011.flac 47920 +5694/64038/5694-64038-0014.flac 68640 +5694/64038/5694-64038-0012.flac 44880 +5694/64038/5694-64038-0022.flac 252320 +5694/64038/5694-64038-0005.flac 75200 +5694/64038/5694-64038-0018.flac 188720 +5694/64038/5694-64038-0015.flac 108320 +5694/64038/5694-64038-0002.flac 140320 +5694/64038/5694-64038-0023.flac 76720 +5694/64038/5694-64038-0013.flac 52880 +5694/64038/5694-64038-0024.flac 56320 +5694/64038/5694-64038-0017.flac 245280 +5694/64038/5694-64038-0007.flac 37920 +5694/64038/5694-64038-0020.flac 77440 +5694/64038/5694-64038-0010.flac 150080 +5694/64038/5694-64038-0025.flac 103680 +5694/64038/5694-64038-0019.flac 100080 +5694/64038/5694-64038-0016.flac 41360 +5694/64038/5694-64038-0006.flac 40400 +5694/64038/5694-64038-0001.flac 58400 +5694/64038/5694-64038-0000.flac 41520 +5694/64038/5694-64038-0008.flac 31200 +5694/64038/5694-64038-0003.flac 109120 +5694/64038/5694-64038-0021.flac 53280 +5694/64029/5694-64029-0020.flac 213600 +5694/64029/5694-64029-0007.flac 105360 +5694/64029/5694-64029-0012.flac 118160 +5694/64029/5694-64029-0011.flac 118400 +5694/64029/5694-64029-0008.flac 48320 +5694/64029/5694-64029-0015.flac 47920 +5694/64029/5694-64029-0000.flac 64080 +5694/64029/5694-64029-0025.flac 39440 +5694/64029/5694-64029-0009.flac 50560 +5694/64029/5694-64029-0003.flac 63760 +5694/64029/5694-64029-0010.flac 61680 +5694/64029/5694-64029-0004.flac 53680 +5694/64029/5694-64029-0005.flac 73040 +5694/64029/5694-64029-0018.flac 39760 +5694/64029/5694-64029-0029.flac 65120 +5694/64029/5694-64029-0027.flac 66720 +5694/64029/5694-64029-0028.flac 96960 +5694/64029/5694-64029-0006.flac 116320 +5694/64029/5694-64029-0024.flac 172240 +5694/64029/5694-64029-0014.flac 61120 +5694/64029/5694-64029-0021.flac 113120 +5694/64029/5694-64029-0002.flac 42320 +5694/64029/5694-64029-0023.flac 115200 +5694/64029/5694-64029-0019.flac 54640 +5694/64029/5694-64029-0030.flac 54160 +5694/64029/5694-64029-0032.flac 50000 +5694/64029/5694-64029-0001.flac 78880 +5694/64029/5694-64029-0022.flac 78400 +5694/64029/5694-64029-0031.flac 102160 +5694/64029/5694-64029-0013.flac 82480 +5694/64029/5694-64029-0017.flac 45840 +5694/64029/5694-64029-0016.flac 167280 +84/121550/84-121550-0031.flac 138480 +84/121550/84-121550-0000.flac 134960 +84/121550/84-121550-0009.flac 137840 +84/121550/84-121550-0014.flac 130960 +84/121550/84-121550-0026.flac 149600 +84/121550/84-121550-0007.flac 152400 +84/121550/84-121550-0008.flac 135120 +84/121550/84-121550-0022.flac 122400 +84/121550/84-121550-0028.flac 122240 +84/121550/84-121550-0023.flac 115440 
+84/121550/84-121550-0016.flac 143280 +84/121550/84-121550-0020.flac 134240 +84/121550/84-121550-0033.flac 128320 +84/121550/84-121550-0024.flac 138080 +84/121550/84-121550-0006.flac 141280 +84/121550/84-121550-0002.flac 132240 +84/121550/84-121550-0003.flac 130560 +84/121550/84-121550-0030.flac 142800 +84/121550/84-121550-0025.flac 157440 +84/121550/84-121550-0021.flac 139360 +84/121550/84-121550-0004.flac 127440 +84/121550/84-121550-0017.flac 138320 +84/121550/84-121550-0027.flac 63920 +84/121550/84-121550-0001.flac 127600 +84/121550/84-121550-0034.flac 143120 +84/121550/84-121550-0011.flac 149680 +84/121550/84-121550-0015.flac 130480 +84/121550/84-121550-0010.flac 133280 +84/121550/84-121550-0013.flac 305760 +84/121550/84-121550-0029.flac 134160 +84/121550/84-121550-0019.flac 148479 +84/121550/84-121550-0035.flac 131520 +84/121550/84-121550-0012.flac 147840 +84/121123/84-121123-0014.flac 46160 +84/121123/84-121123-0017.flac 150000 +84/121123/84-121123-0003.flac 108800 +84/121123/84-121123-0012.flac 40960 +84/121123/84-121123-0001.flac 63840 +84/121123/84-121123-0022.flac 41760 +84/121123/84-121123-0028.flac 76160 +84/121123/84-121123-0005.flac 255360 +84/121123/84-121123-0016.flac 134560 +84/121123/84-121123-0004.flac 70400 +84/121123/84-121123-0018.flac 56800 +84/121123/84-121123-0013.flac 38400 +84/121123/84-121123-0020.flac 110720 +84/121123/84-121123-0000.flac 33440 +84/121123/84-121123-0024.flac 166080 +84/121123/84-121123-0011.flac 52000 +84/121123/84-121123-0008.flac 112240 +84/121123/84-121123-0009.flac 43121 +84/121123/84-121123-0019.flac 38960 +84/121123/84-121123-0023.flac 74960 +84/121123/84-121123-0025.flac 100080 +84/121123/84-121123-0002.flac 219040 +84/121123/84-121123-0007.flac 32000 +84/121123/84-121123-0015.flac 48560 +84/121123/84-121123-0026.flac 223040 +84/121123/84-121123-0021.flac 77040 +84/121123/84-121123-0027.flac 49120 +84/121123/84-121123-0010.flac 115040 +84/121123/84-121123-0006.flac 89920 +2035/147961/2035-147961-0006.flac 44720 +2035/147961/2035-147961-0004.flac 95440 +2035/147961/2035-147961-0002.flac 111920 +2035/147961/2035-147961-0012.flac 64400 +2035/147961/2035-147961-0039.flac 85600 +2035/147961/2035-147961-0031.flac 56800 +2035/147961/2035-147961-0016.flac 52240 +2035/147961/2035-147961-0011.flac 91520 +2035/147961/2035-147961-0023.flac 68960 +2035/147961/2035-147961-0005.flac 150720 +2035/147961/2035-147961-0007.flac 49600 +2035/147961/2035-147961-0027.flac 46000 +2035/147961/2035-147961-0034.flac 44880 +2035/147961/2035-147961-0040.flac 79040 +2035/147961/2035-147961-0022.flac 54640 +2035/147961/2035-147961-0025.flac 88400 +2035/147961/2035-147961-0021.flac 174320 +2035/147961/2035-147961-0014.flac 62720 +2035/147961/2035-147961-0036.flac 71680 +2035/147961/2035-147961-0015.flac 44560 +2035/147961/2035-147961-0009.flac 46000 +2035/147961/2035-147961-0038.flac 75440 +2035/147961/2035-147961-0017.flac 32160 +2035/147961/2035-147961-0008.flac 68000 +2035/147961/2035-147961-0013.flac 79680 +2035/147961/2035-147961-0019.flac 76320 +2035/147961/2035-147961-0000.flac 241120 +2035/147961/2035-147961-0020.flac 58080 +2035/147961/2035-147961-0028.flac 43280 +2035/147961/2035-147961-0003.flac 43120 +2035/147961/2035-147961-0033.flac 31120 +2035/147961/2035-147961-0035.flac 87120 +2035/147961/2035-147961-0032.flac 221760 +2035/147961/2035-147961-0001.flac 70640 +2035/147961/2035-147961-0037.flac 77360 +2035/147961/2035-147961-0010.flac 46560 +2035/147961/2035-147961-0026.flac 46000 +2035/147961/2035-147961-0024.flac 60960 +2035/147961/2035-147961-0030.flac 
61600 +2035/147961/2035-147961-0018.flac 89440 +2035/147961/2035-147961-0029.flac 59680 +2035/147960/2035-147960-0010.flac 153360 +2035/147960/2035-147960-0012.flac 68080 +2035/147960/2035-147960-0007.flac 73600 +2035/147960/2035-147960-0005.flac 105360 +2035/147960/2035-147960-0014.flac 71441 +2035/147960/2035-147960-0015.flac 24960 +2035/147960/2035-147960-0002.flac 141440 +2035/147960/2035-147960-0003.flac 93440 +2035/147960/2035-147960-0011.flac 70080 +2035/147960/2035-147960-0016.flac 78320 +2035/147960/2035-147960-0013.flac 42800 +2035/147960/2035-147960-0009.flac 56000 +2035/147960/2035-147960-0001.flac 62800 +2035/147960/2035-147960-0008.flac 99360 +2035/147960/2035-147960-0000.flac 144320 +2035/147960/2035-147960-0006.flac 53040 +2035/147960/2035-147960-0004.flac 66720 +2035/152373/2035-152373-0005.flac 384480 +2035/152373/2035-152373-0014.flac 117600 +2035/152373/2035-152373-0018.flac 96240 +2035/152373/2035-152373-0006.flac 123360 +2035/152373/2035-152373-0016.flac 105440 +2035/152373/2035-152373-0010.flac 121600 +2035/152373/2035-152373-0015.flac 136160 +2035/152373/2035-152373-0003.flac 102640 +2035/152373/2035-152373-0009.flac 277120 +2035/152373/2035-152373-0002.flac 149440 +2035/152373/2035-152373-0000.flac 126000 +2035/152373/2035-152373-0001.flac 266720 +2035/152373/2035-152373-0011.flac 167440 +2035/152373/2035-152373-0013.flac 327520 +2035/152373/2035-152373-0007.flac 207040 +2035/152373/2035-152373-0017.flac 125680 +2035/152373/2035-152373-0012.flac 163680 +2035/152373/2035-152373-0008.flac 106640 +2035/152373/2035-152373-0004.flac 117440 diff --git a/SpeechT5/asr_train/train.txt b/SpeechT5/asr_train/train.txt new file mode 100644 index 0000000000000000000000000000000000000000..9a61273bae064588b4a1ad220d7e3cc27f4519d9 --- /dev/null +++ b/SpeechT5/asr_train/train.txt @@ -0,0 +1,2674 @@ +EVEN ON THIS LEDGE OF HUMAN SOCIETY THERE WAS A STUNTED GROWTH OF SHOPLETS WHICH HAD TAKEN ROOT AND VEGETATED SOMEHOW THOUGH AS IN AN AIR MERCANTILE OF THE BLEAKEST +SHORTLY AFTER PASSING ONE OF THESE CHAPELS WE CAME SUDDENLY UPON A VILLAGE WHICH STARTED UP OUT OF THE MIST AND I WAS ALARMED LEST I SHOULD BE MADE AN OBJECT OF CURIOSITY OR DISLIKE +THE STREETS WERE NARROW AND UNPAVED BUT VERY FAIRLY CLEAN +IN ABOUT FOUR HOURS OF WALKING FROM THE TIME WE STARTED AND AFTER PASSING TWO OR THREE MORE VILLAGES WE CAME UPON A CONSIDERABLE TOWN AND MY GUIDES MADE MANY ATTEMPTS TO MAKE ME UNDERSTAND SOMETHING BUT I GATHERED NO INKLING OF THEIR MEANING EXCEPT THAT I NEED BE UNDER NO APPREHENSION OF DANGER +THE VINE GREW OUTSIDE MANY OF THE HOUSES AND THERE WERE SOME WITH SIGN BOARDS ON WHICH WAS PAINTED A BOTTLE AND A GLASS THAT MADE ME FEEL MUCH AT HOME +EVEN IN MIDDLE AGE THEY WERE STILL COMELY AND THE OLD GREY HAIRED WOMEN AT THEIR COTTAGE DOORS HAD A DIGNITY NOT TO SAY MAJESTY OF THEIR OWN +I HAVE ALWAYS DELIGHTED IN AND REVERENCED BEAUTY BUT I FELT SIMPLY ABASHED IN THE PRESENCE OF SUCH A SPLENDID TYPE A COMPOUND OF ALL THAT IS BEST IN EGYPTIAN GREEK AND ITALIAN +THE CHILDREN WERE INFINITE IN NUMBER AND EXCEEDINGLY MERRY I NEED HARDLY SAY THAT THEY CAME IN FOR THEIR FULL SHARE OF THE PREVAILING BEAUTY +THE DESIGN WAS DIFFERENT BUT THE THING WAS CLEARLY THE SAME +THE OTHER LOOKED PALE AND ILL BUT HE WAS MARVELLOUSLY SELF CONTAINED AND IT WAS IMPOSSIBLE TO SAY WHAT WAS THE MATTER WITH HIM +WE PASSED MANY CASES AND AT LAST CAME TO ONE IN WHICH THERE WERE SEVERAL CLOCKS AND TWO OR THREE OLD WATCHES +MY GUIDES HOWEVER WERE WELL KNOWN AND THE NATURAL POLITENESS OF THE PEOPLE PREVENTED THEM FROM 
PUTTING ME TO ANY INCONVENIENCE BUT THEY COULD NOT HELP EYEING ME NOR I THEM +I SAW A FEW SHEEP WITH ROUNDED NOSES AND ENORMOUS TAILS +THIS HAD SOME EFFECT IN CALMING HIM +EACH FEATURE WAS FINISHED EYELIDS EYELASHES AND EARS BEING ALMOST INVARIABLY PERFECT +BUT BY AND BY THEY CAME TO MY WATCH WHICH I HAD HIDDEN AWAY IN THE INMOST POCKET THAT I HAD AND HAD FORGOTTEN WHEN THEY BEGAN THEIR SEARCH +THE COUNTRY WAS HIGHLY CULTIVATED EVERY LEDGE BEING PLANTED WITH CHESTNUTS WALNUTS AND APPLE TREES FROM WHICH THE APPLES WERE NOW GATHERING +THEIR EXPRESSION WAS DIVINE AND AS THEY GLANCED AT ME TIMIDLY BUT WITH PARTED LIPS IN GREAT BEWILDERMENT I FORGOT ALL THOUGHTS OF THEIR CONVERSION IN FEELINGS THAT WERE FAR MORE EARTHLY +THEY FELT MY PULSE THEY LOOKED AT MY TONGUE THEY LISTENED AT MY CHEST THEY FELT ALL MY MUSCLES AND AT THE END OF EACH OPERATION THEY LOOKED AT THE CHIEF AND NODDED AND SAID SOMETHING IN A TONE QUITE PLEASANT AS THOUGH I WERE ALL RIGHT +AGAIN THERE WAS A VERY OLD CARRIAGE WHOSE WHEELS IN SPITE OF RUST AND DECAY I COULD SEE HAD BEEN DESIGNED ORIGINALLY FOR IRON RAILS +I EXPRESSED BY SIGNS MY ADMIRATION AND PLEASURE TO MY GUIDES AND THEY WERE GREATLY PLEASED +HE BEGAN PRESENTLY TO RELENT AND SPOKE TO ME IN A KINDER MANNER +IN FACT ONE OF THEM WAS PLAINLY VERY MUCH OUT OF HEALTH AND COUGHED VIOLENTLY FROM TIME TO TIME IN SPITE OF MANIFEST EFFORTS TO SUPPRESS IT +SUFFICE IT THAT I FOUND MYSELF TAKEN BEFORE THE CHIEF MAGISTRATE AND BY HIS ORDERS WAS PLACED IN AN APARTMENT WITH TWO OTHER PEOPLE WHO WERE THE FIRST I HAD SEEN LOOKING ANYTHING BUT WELL AND HANDSOME +IT IS SURPRISING HOW SOON THE EYE BECOMES ACCUSTOMED TO MISSING TWENTY SHEEP OUT OF TWO OR THREE HUNDRED +I WAS DELIGHTED WITH THE COUNTRY AND THE MANNER OF LIFE +I REACHED MY DESTINATION IN ONE OF THE LAST MONTHS OF EIGHTEEN SIXTY EIGHT BUT I DARE NOT MENTION THE SEASON LEST THE READER SHOULD GATHER IN WHICH HEMISPHERE I WAS +EACH MUST CRY LOUDER AND WANDER FARTHER YET MAY LUCK BE WITH THEM BOTH THAT THEY MAY FIND THEIR OWN AT NIGHTFALL +NO ONE WHO IS HIMSELF HONEST WILL DOUBT MY BEING SO +I HAD NO MONEY BUT IF I COULD ONLY FIND WORKABLE COUNTRY I MIGHT STOCK IT WITH BORROWED CAPITAL AND CONSIDER MYSELF A MADE MAN +IF THE READER WILL EXCUSE ME I WILL SAY NOTHING OF MY ANTECEDENTS NOR OF THE CIRCUMSTANCES WHICH LED ME TO LEAVE MY NATIVE COUNTRY THE NARRATIVE WOULD BE TEDIOUS TO HIM AND PAINFUL TO MYSELF +I WAS TO SEE THE SHEEP NOT NECESSARILY CLOSE AT HAND NOR TO GET THEM IN A SINGLE MOB BUT TO SEE ENOUGH OF THEM HERE AND THERE TO FEEL EASY THAT NOTHING HAD GONE WRONG THIS WAS NO DIFFICULT MATTER FOR THERE WERE NOT ABOVE EIGHT HUNDRED OF THEM AND BEING ALL BREEDING EWES THEY WERE PRETTY QUIET +I WOULD TRY THE NEARER RANGE AND SEE HOW FAR I COULD GO +SHEEP AND CATTLE WERE INTRODUCED AND BRED WITH EXTREME RAPIDITY MEN TOOK UP THEIR FIFTY THOUSAND OR ONE HUNDRED THOUSAND ACRES OF COUNTRY GOING INLAND ONE BEHIND THE OTHER TILL IN A FEW YEARS THERE WAS NOT AN ACRE BETWEEN THE SEA AND THE FRONT RANGES WHICH WAS NOT TAKEN UP AND STATIONS EITHER FOR SHEEP OR CATTLE WERE SPOTTED ABOUT AT INTERVALS OF SOME TWENTY OR THIRTY MILES OVER THE WHOLE COUNTRY +THERE WERE A GOOD MANY SHEEP WHICH I KNEW AS TWO OR THREE BLACK EWES AND A BLACK LAMB OR TWO AND SEVERAL OTHERS WHICH HAD SOME DISTINGUISHING MARK WHEREBY I COULD TELL THEM +IT WAS A MONOTONOUS LIFE BUT IT WAS VERY HEALTHY AND ONE DOES NOT MUCH MIND ANYTHING WHEN ONE IS WELL +IT WILL BE SEEN THAT I DID NOT SUCCEED IN MY DESIGN AND THAT HOWEVER MUCH I MAY HAVE MET WITH THAT WAS NEW AND 
STRANGE I HAVE BEEN UNABLE TO REAP ANY PECUNIARY ADVANTAGE +THE COUNTRY WAS THE GRANDEST THAT CAN BE IMAGINED +THERE WAS NO ONE IN THE WHOLE WORLD WHO HAD THE SMALLEST IDEA SAVE THOSE WHO WERE THEMSELVES ON THE OTHER SIDE OF IT IF INDEED THERE WAS ANY ONE AT ALL COULD I HOPE TO CROSS IT +SO LONELY AND SO SOLEMN WITH THE SAD GREY CLOUDS ABOVE AND NO SOUND SAVE A LOST LAMB BLEATING UPON THE MOUNTAIN SIDE AS THOUGH ITS LITTLE HEART WERE BREAKING +PREFACE TO SECOND EDITION +I MUST NOT CONCLUDE WITHOUT EXPRESSING MY MOST SINCERE THANKS TO MY CRITICS AND TO THE PUBLIC FOR THE LENIENCY AND CONSIDERATION WITH WHICH THEY HAVE TREATED MY ADVENTURES +THIS IS A MISTAKE THOUGH A PERFECTLY NATURAL ONE +IT WAS WRITTEN IN THE UPPER RANGITATA DISTRICT OF THE CANTERBURY PROVINCE AS IT THEN WAS OF NEW ZEALAND AND APPEARED AT CHRISTCHURCH IN THE PRESS NEWSPAPER JUNE THIRTEENTH EIGHTEEN SIXTY THREE +I ATTRIBUTE ITS UNLOOKED FOR SUCCESS MAINLY TO TWO EARLY FAVOURABLE REVIEWS THE FIRST IN THE PALL MALL GAZETTE OF APRIL TWELFTH AND THE SECOND IN THE SPECTATOR OF APRIL TWENTIETH +ON MY RETURN I PURPOSELY AVOIDED LOOKING INTO IT UNTIL I HAD SENT BACK MY LAST REVISES TO THE PRINTER +THE FIRST EDITION OF EREWHON SOLD IN ABOUT THREE WEEKS I HAD NOT TAKEN MOULDS AND AS THE DEMAND WAS STRONG IT WAS SET UP AGAIN IMMEDIATELY +BUT THIS HAD AN EFFECT OF WHICH I HAVE LITTLE REASON TO COMPLAIN FOR I WAS ALLOWED ALMOST TO CALL THEM LIFE LONG SELF DECEIVERS TO THEIR FACES AND THEY SAID IT WAS QUITE TRUE BUT THAT IT DID NOT MATTER +THIS HOWEVER MAY NOT BE FOR THE COPYRIGHT WILL PROBABLY EXPIRE IN A LITTLE OVER TWELVE YEARS +I AM SURPRISED HOWEVER THAT THE BOOK AT WHICH SUCH AN EXAMPLE OF THE SPECIOUS MISUSE OF ANALOGY WOULD SEEM MOST NATURALLY LEVELLED SHOULD HAVE OCCURRED TO NO REVIEWER NEITHER SHALL I MENTION THE NAME OF THE BOOK HERE THOUGH I SHOULD FANCY THAT THE HINT GIVEN WILL SUFFICE +I ALSO WROTE ABOUT THIS TIME THE SUBSTANCE OF WHAT ULTIMATELY BECAME THE MUSICAL BANKS AND THE TRIAL OF A MAN FOR BEING IN A CONSUMPTION +I REGRET THAT REVIEWERS HAVE IN SOME CASES BEEN INCLINED TO TREAT THE CHAPTERS ON MACHINES AS AN ATTEMPT TO REDUCE MISTER DARWIN'S THEORY TO AN ABSURDITY +THERE WAS ALSO ANOTHER CAUSE +I MADE A FEW FURTHER VERY TRIFLING ALTERATIONS BEFORE MOULDS WERE TAKEN BUT SINCE THE SUMMER OF EIGHTEEN SEVENTY TWO AS NEW EDITIONS WERE FROM TIME TO TIME WANTED THEY HAVE BEEN PRINTED FROM STEREOS THEN MADE +I SEE FROM MY SECOND PREFACE THAT I TOOK THE BOOK TO MESSRS CHAPMAN AND HALL MAY FIRST EIGHTEEN SEVENTY ONE AND ON THEIR REJECTION OF IT UNDER THE ADVICE OF ONE WHO HAS ATTAINED THE HIGHEST RANK AMONG LIVING WRITERS I LET IT SLEEP TILL I TOOK IT TO MISTER TRUBNER EARLY IN EIGHTEEN SEVENTY TWO +I AM STILL FAIRLY WELL SATISFIED WITH THOSE PARTS OF EREWHON THAT WERE REPEATEDLY REWRITTEN BUT FROM THOSE THAT HAD ONLY A SINGLE WRITING I WOULD GLADLY CUT OUT SOME FORTY OR FIFTY PAGES IF I COULD +THEN I HAD MUCH PLEASURE IN READING IT BUT WAS INDEED SURPRISED AT THE MANY LITTLE POINTS OF SIMILARITY BETWEEN THE TWO BOOKS IN SPITE OF THEIR ENTIRE INDEPENDENCE TO ONE ANOTHER +THIS LADY'S RIGHT NAME WAS JOAN BUT BECAUSE OF HER COMELINESS OR AT LEAST IT WAS SO IMAGINED SHE WAS CALLED OF MANY PRIMAVERA SPRING AND WENT BY THAT NAME AMONG THEM +AND THEREWITHAL SUCH A BEWILDERMENT POSSESS'D ME THAT I SHUT MINE EYES FOR PEACE AND IN MY BRAIN DID CEASE ORDER OF THOUGHT AND EVERY HEALTHFUL THING +THEN SAW I MANY BROKEN HINTED SIGHTS IN THE UNCERTAIN STATE I STEPP'D INTO +THE SECOND PART BEGINS HERE SAYING BE NOW THE THIRD HERE THEN WHILE 
IT WAS HIS PLEASURE +THESE WILDERING PHANTASIES THEN CARRIED ME TO SEE MY LADY DEAD +THE SECOND PART BEGINS HERE I WAS A THINKING THE FIRST PART DIVIDES INTO TWO +WHEREBY OTHER LADIES WHO WERE ABOUT THE ROOM BECOMING AWARE OF MY DISCOMFORT BY REASON OF THE MOAN THAT SHE MADE WHO INDEED WAS OF MY VERY NEAR KINDRED LED HER AWAY FROM WHERE I WAS AND THEN SET THEMSELVES TO AWAKEN ME THINKING THAT I DREAMED AND SAYING SLEEP NO LONGER AND BE NOT DISQUIETED +AND SO STRONG WAS MY PHANTASY THAT I WEPT AGAIN IN VERY TRUTH AND SAID WITH MY TRUE VOICE O EXCELLENT SOUL HOW BLESSED IS HE THAT NOW LOOKETH UPON THEE +AND IN HIS SPEECH HE LAUGH'D AND LAUGH'D AGAIN THEN WHILE IT WAS HIS PLEASURE TO REMAIN I CHANCED TO LOOK THE WAY HE HAD DRAWN NEAR AND SAW THE LADIES JOAN AND BEATRICE APPROACH ME THIS THE OTHER FOLLOWING ONE AND A SECOND MARVEL INSTANTLY +WHEN BEING AROUSED I OPENED MINE EYES AND KNEW THAT IT HAD BEEN A DECEPTION +AND MY HUE WAS SUCH THAT THEY LOOK'D AT EACH OTHER AND THOUGHT OF DEATH SAYING UNDER THEIR BREATH MOST TENDERLY O LET US COMFORT HIM THEN UNTO ME WHAT DREAM WAS THINE THAT IT HATH SHAKEN THEE SO MUCH +AND AT THE FIRST IT SEEMED TO ME THAT I SAW CERTAIN FACES OF WOMEN WITH THEIR HAIR LOOSENED WHICH CALLED OUT TO ME THOU SHALT SURELY DIE AFTER THE WHICH OTHER TERRIBLE AND UNKNOWN APPEARANCES SAID UNTO ME THOU ART DEAD +THIS HAS INDUCED SOME EDITORS OF THE VITA NUOVA TO EXPLAIN THE TITLE AS MEANING EARLY LIFE +THROUGHOUT THE VITA NUOVA THERE IS A STRAIN LIKE THE FIRST FALLING MURMUR WHICH REACHES THE EAR IN SOME REMOTE MEADOW AND PREPARES US TO LOOK UPON THE SEA +ON JANUARY TWENTY FIFTH HE WROTE MANY AND MANY THANKS FOR A MOST ESSENTIAL SERVICE MOST THOROUGHLY PERFORMED +THIS BOOK IN ITS ORIGINAL FORM WAS RECEIVED WITH FAVOUR AND SETTLED THE CLAIM OF ROSSETTI TO RANK AS A POETIC TRANSLATOR OR INDEED AS A POET IN HIS OWN RIGHT +POETRY NOT BEING AN EXACT SCIENCE LITERALITY OF RENDERING IS ALTOGETHER SECONDARY TO THIS CHIEF LAW +THE ONLY TRUE MOTIVE FOR PUTTING POETRY INTO A FRESH LANGUAGE MUST BE TO ENDOW A FRESH NATION AS FAR AS POSSIBLE WITH ONE MORE POSSESSION OF BEAUTY +IT IS THEREFORE AND ON ALL ACCOUNTS UNNECESSARY TO SAY MUCH MORE OF THE WORK HERE THAN IT SAYS FOR ITSELF +HE TRANSLATED AT AN EARLY AGE CHIEFLY BETWEEN EIGHTEEN FORTY FIVE AND EIGHTEEN FORTY NINE A GREAT NUMBER OF POEMS BY THE ITALIANS CONTEMPORARY WITH DANTE OR PRECEDING HIM AND AMONG OTHER THINGS HE MADE A VERSION OF THE WHOLE VITA NUOVA PROSE AND VERSE +MY NOTES WHICH YOU HAVE TAKEN THE TROUBLE OF REVISING ARE OF COURSE QUITE PALTRY AND USELESS +AND IF YOU HAVE TIME IT WOULD BE A GREAT SERVICE TO TRANSLATE THE ANALYSES OF THE POEMS WHICH I OMITTED +THE LIFE BLOOD OF RHYTHMICAL TRANSLATION IS THIS COMMANDMENT THAT A GOOD POEM SHALL NOT BE TURNED INTO A BAD ONE +OFTEN WOULD HE AVAIL HIMSELF OF ANY SPECIAL GRACE OF HIS OWN IDIOM AND EPOCH IF ONLY HIS WILL BELONGED TO HIM OFTEN WOULD SOME CADENCE SERVE HIM BUT FOR HIS AUTHOR'S STRUCTURE SOME STRUCTURE BUT FOR HIS AUTHOR'S CADENCE OFTEN THE BEAUTIFUL TURN OF A STANZA MUST BE WEAKENED TO ADOPT SOME RHYME WHICH WILL TALLY AND HE SEES THE POET REVELLING IN ABUNDANCE OF LANGUAGE WHERE HIMSELF IS SCANTILY SUPPLIED +A WORD SHOULD BE SAID HERE OF THE TITLE OF DANTE'S AUTOBIOGRAPHY +WHATEVER HER SWEET EYES ARE TURNED UPON SPIRITS OF LOVE DO ISSUE THENCE IN FLAME WHICH THROUGH THEIR EYES WHO THEN MAY LOOK ON THEM PIERCE TO THE HEART'S DEEP CHAMBER EVERY ONE +SO TO THE ROAD THOU SHALT BE RECONCILED AND FIND THE LADY AND WITH THE LADY LOVE +THE FIRST PART IS DIVIDED INTO FOUR 
+AND NOW THAT IT HATH PLEASED HER TO DENY ME THIS LOVE MY MASTER OF HIS GREAT GOODNESS HATH PLACED ALL MY BEATITUDE THERE WHERE MY HOPE WILL NOT FAIL ME +IN THE FOURTH REPEATING TO WHOM I PURPOSE SPEAKING I TELL THE REASON WHY I SPEAK TO THEM +THIS SECOND PART IS DIVIDED INTO TWO FOR IN THE ONE I SPEAK OF THE EYES WHICH ARE THE BEGINNING OF LOVE IN THE SECOND I SPEAK OF THE MOUTH WHICH IS THE END OF LOVE +AND I DECLARE THAT WHEN I SPEAK THEREOF LOVE SHEDS SUCH PERFECT SWEETNESS OVER ME THAT IF MY COURAGE FAILED NOT CERTAINLY TO HIM MY LISTENERS MUST BE ALL RESIGN'D +TO HER I WEND ALONG IN WHOSE MUCH STRENGTH MY WEAKNESS IS MADE STRONG +IN THE SECOND I TELL WHAT IS UNDERSTOOD OF HER ON EARTH HERE MY LADY IS DESIRED +THEREAFTER THIS SONNET BRED IN ME DESIRE TO WRITE DOWN IN VERSE FOUR OTHER THINGS TOUCHING MY CONDITION THE WHICH THINGS IT SEEMED TO ME THAT I HAD NOT YET MADE MANIFEST +THE SECOND BEGINS HERE AN ANGEL THE THIRD HERE DEAR SONG I KNOW +BUT WHEN I STILL SPAKE NOT ONE OF THEM WHO BEFORE HAD BEEN TALKING WITH ANOTHER ADDRESSED ME BY MY NAME SAYING TO WHAT END LOVEST THOU THIS LADY SEEING THAT THOU CANST NOT SUPPORT HER PRESENCE +THEN THOSE LADIES BEGAN TO TALK CLOSELY TOGETHER AND AS I HAVE SEEN SNOW FALL AMONG THE RAIN SO WAS THEIR TALK MINGLED WITH SIGHS +WHICH THING BEING THUS THERE CAME A DAY WHEN CERTAIN LADIES TO WHOM IT WAS WELL KNOWN THEY HAVING BEEN WITH ME AT DIVERS TIMES IN MY TROUBLE WERE MET TOGETHER FOR THE PLEASURE OF GENTLE COMPANY +IN THE THIRD I SAY WHAT IT IS I PURPOSE TO SPEAK SO AS NOT TO BE IMPEDED BY FAINTHEARTEDNESS +THIS SECOND PART IS DIVIDED INTO TWO FOR IN THE FIRST I SPEAK OF HER AS REGARDS THE NOBLENESS OF HER SOUL RELATING SOME OF HER VIRTUES PROCEEDING FROM HER SOUL IN THE SECOND I SPEAK OF HER AS REGARDS THE NOBLENESS OF HER BODY NARRATING SOME OF HER BEAUTIES HERE LOVE SAITH CONCERNING HER +THE PEOPLE IS A BEAST OF MUDDY BRAIN THAT KNOWS NOT ITS OWN FORCE AND THEREFORE STANDS LOADED WITH WOOD AND STONE THE POWERLESS HANDS OF A MERE CHILD GUIDE IT WITH BIT AND REIN ONE KICK WOULD BE ENOUGH TO BREAK THE CHAIN BUT THE BEAST FEARS AND WHAT THE CHILD DEMANDS IT DOES NOR ITS OWN TERROR UNDERSTANDS CONFUSED AND STUPEFIED BY BUGBEARS VAIN +MOST WONDERFUL +DUE TO THEE THEIR PRAISE OF MAIDEN PURE OF TEEMING MOTHERHOOD +IN FLESH WAS RAIMENTED HOW HE WAS KILLED AND BURIED FROM THE DEAD HOW HE AROSE TO LIFE WITH VICTORY AND REIGNED IN HEAVEN HOW ALL OF US SHALL BE GLORIOUS LIKE HIM WHOSE HEARTS TO HIS ARE WED HOW THEY WHO DIE FOR LOVE OF REASON GIVE HYPOCRITES TYRANTS SOPHISTS ALL WHO SELL THEIR NEIGHBOURS ILL FOR HOLINESS TO HELL HOW THE DEAD SAINT CONDEMNS THE BAD WHO LIVE HOW ALL HE DOES BECOMES A LAW FOR MEN HOW HE AT LAST TO JUDGE SHALL COME AGAIN +QUINCI IMPARA A STUPIRTI +HEAVEN HELP THAT BODY WHICH A LITTLE MIND HOUSED IN A HEAD LACKING EARS TONGUE AND EYES AND SENSELESS BUT FOR SMELL CAN TYRANNISE +WELL TOO IF HE LIKE LOVE WOULD FILCH OUR HOARD WITH PLEASURE TO OURSELVES SLUICING OUR VEIN AND VIGOUR TO PERPETUATE THE STRAIN OF LIFE BY SPILTH OF LIFE WITHIN US STORED +HE LIVES THY LOSS HE DIES FROM EVERY LIMB MANGLED BY THEE LIGHTNINGS OF GODHEAD SHINE FROM WHICH THY DARKNESS HATH NOT WHERE TO HIDE +THOU LIKE ARCTURUS STEADFAST IN THE SKIES WITH TARDY SENSE GUIDEST THY KINGDOM FAIR BEARING ALONE THE LOAD OF LIBERTY +THAT PENANCE HATH NO BLAME WHICH MAGDALEN FOUND SWEET PURGING OUR SHAME SELF PUNISHMENT IS VIRTUE ALL MEN KNOW +IL POPOLO E UNA BESTIA +ORGAN OF RUT NOT REASON IS THE LORD WHO FROM THE BODY POLITIC DOTH DRAIN LUST FOR HIMSELF INSTEAD OF TOIL 
AND PAIN LEAVING US LEAN AS CRICKETS ON DRY SWARD +MONEY IS FALSE AND LIGHT UNLESS IT BE BOUGHT BY A MAN'S OWN WORTHY QUALITIES AND BLOOD IS SUCH THAT ITS CORRUPT DISEASE AND IGNORANT PRETENCE ARE FOUL TO SEE +THIS WORLD'S THICK VAPOURS WHELM YOUR EYES UNWORTHY OF THAT GLORIOUS SHOW BLIND TO HIS SPLENDOUR BENT UPON HIS SHAME +ARE YOU REALLY GOING TO THROW ME OVER FOR A THING LIKE THIS +IT WAS BITTERLY COLD BUT THE EMBANKMENT WAS MORE ROMANTIC THAN A RAILWAY CARRIAGE +HE HAD BEEN LATE HE HAD OFFERED NO EXCUSE NO EXPLANATION +SHE WOULD HAVE SHARED HIS SORROW AND SHOWN HERSELF HALF WIFE HALF ANGEL FROM HEAVEN IN THIS DARK HOUR +HER HANDS SHOULD HAVE BEEN FULL OF BLUEBELLS AND SHE SHOULD HAVE HELD THEM UP TO HIS FACE IN MAIDENLY DEFENCE AS HE SPRANG FORWARD TO TAKE HER IN HIS ARMS +AND YESTERDAY I HAD A LETTER FROM HER AND SHE SEEMS TO EXPECT TO THINK AND I THOUGHT I OUGHT TO TELL YOU DARLING +COULDN'T HELP IT THEN HOW CAN I EVER TRUST YOU +HE CHECKED THE SILLY IMPULSE +I DIDN'T THINK A DECENT MAN COULD DO SUCH THINGS SHE WAS PULLING ON HER GLOVES GO HOME AND GLOAT OVER IT ALL +AND CURIOUSLY ENOUGH SHE WAS HARDLY CURIOUS AT ALL ABOUT WHAT HE MIGHT HAVE TO SAY +AND HE STRODE DOWN BETWEEN THE MARBLE TABLES AND OUT BY THE SWING DOOR IT WAS A VERY GOOD EXIT +YOU SEE THAT SHE KNEW EXACTLY HOW A TRYST IS CONDUCTED IN THE PAGES OF THE STANDARD POETS AND OF THE CHEAPER WEEKLY JOURNALS +THE KEEN WIND THRUST ITSELF EVEN INSIDE THE HIGH COLLAR OF HER JACKET +WHAT OPINION WOULD HE FORM OF THE PURITY OF HER MIND THE INNOCENCE OF HER SOUL IF AN INCIDENT LIKE THIS FAILED TO SHOCK HER DEEPLY +THE SETTING OF THE SCENE SEEMED TO HER ALL IMPORTANT +HE STOOD UP SUDDENLY DO YOU MEAN IT +SHE ONLY WISHED FOR MAY AND THE ORCHARD INSTEAD OF JANUARY AND THE DINGY DUSTY WAITING ROOM THE PLAIN FACED PREOCCUPIED TRAVELLERS THE DIM DESOLATE WEATHER +DO YOU THINK I'M NOT SORRY NOW +SHE HAD TO THE FULL LIMIT ALLOWED OF HER READING AND HER ENVIRONMENT THE LITERARY SENSE +NO IT'S ONLY PAINFUL FOR BOTH OF US +SO HE ENLISTED AND WENT TO SOUTH AFRICA AND HE NEVER CAME HOME COVERED WITH MEDALS AND GLORY WHICH WAS RATHER HIS IDEA TO THE FEW SIMPLE WORDS OF EXPLANATION THAT WOULD HAVE MADE ALL STRAIGHT AND REPAID HER AND HIM FOR ALL THE PAST +SHE HERSELF SHOULD HAVE BEEN A POEM A LYRIC IN A WHITE GOWN AND GREEN SCARF COMING TO HIM THROUGH THE LONG GRASS UNDER THE BLOSSOMED BOUGHS +A SHOCK OF UNBELIEVABLE RELIEF TINGLED THROUGH HER SO THAT WAS ALL WHAT WAS IT COMPARED WITH HER FEARS +SHE SAID HOW FRIGHTFULLY COLD IT IS +BUT HERE THE ONLY THING THAT OCCURRED TO HER WAS TO STOP AND LOOK IN ONE OF THE SHOPS TILL HE SHOULD ASK HER WHAT SHE WAS LOOKING AT +HER HANDS AND FEET WERE ACHING WITH COLD +AT THE CORNER HE REMEMBERED THAT HE HAD GONE AWAY WITHOUT PAYING FOR THE TEA AND HIS NATURAL IMPULSE WAS TO GO BACK AND REMEDY THAT ERROR +THOSE FOUR TRUE WORDS WOUNDED HER MORE THAN ALL THE REST +FOLLOWING THE TINGLE OF RELIEF CAME A SHARP SICKENING PINCH OF JEALOUSY AND MORTIFICATION THESE INSPIRED HER +SHALL I POUR OUT MY SOUL INTO THE EAR OF A MIST A FUME FROM MY OWN BRAIN +WITH THAT CAME A PANG OF INTENSE PAIN +BUT LIVING SOUL THERE COULD BE NONE TO MEET +BUT SHE KNEW NOBODY AND WANDERED ALONE IN THE GARDEN OPPRESSED WITH SOMETHING SHE DID NOT UNDERSTAND +THE OLD TIME WAS BUT A THICKER DREAM AND THIS IS TRUER BECAUSE MORE SHADOWY +SHE HAD LOST HIM YEARS AND YEARS BEFORE AND NOW SHE SAW HIM HE WAS THERE AND SHE KNEW HIM +HE CAME TO HER SIDE AND SHE GAVE HIM NO GREETING +AT THE END OF IT SHE WAS IN A PLACE OF TOMBS +THUS WAS SHE BORNE AWAY 
CAPTIVE OF HER DEAD NEITHER WILLING NOR UNWILLING OF LIFE AND DEATH EQUALLY CARELESS +THIS WAS HER DREAM AS NEARLY AS SHE COULD RECALL IT WHEN SHE CAME TO HERSELF AFTER WAKING FROM IT WITH A CRY +SHE WAS LOST LOST UTTERLY WITH AN ETERNAL LOSS +WHEN SHE OPENED THE DOOR OF IT THE BRIGHT FIRE WHICH BEENIE UNDESIRED HAD KINDLED THERE STARTLED HER THE ROOM LOOKED UNNATURAL UNCANNY BECAUSE IT WAS CHEERFUL +AT THE TIME MARY HAD NOTED NOTHING OF THESE THINGS NOW SHE SAW THEM ALL AS FOR THE FIRST TIME IN MINUTE DETAIL WHILE SLOWLY SHE WENT UP THE STAIR AND THROUGH THE NARROWED WAYS AND HEARD THE SAME WIND THAT RAVED ALIKE ABOUT THE NEW GRAVE AND THE OLD HOUSE INTO WHICH LATTER FOR ALL THE BALES BANKED AGAINST THE WALLS IT FOUND MANY A CHINK OF ENTRANCE +SHE WAS ONE OF A LARGE COMPANY AT A HOUSE WHERE SHE HAD NEVER BEEN BEFORE A BEAUTIFUL HOUSE WITH A LARGE GARDEN BEHIND +SHE STOOD FOR A MOMENT ON THE HEARTH AND IN SAD DREAMY MOOD LISTENED TO THE HOWLING SWOOPS OF THE WIND MAKING THE HOUSE QUIVER AND SHAKE +SHE KNEW NOTHING OF THE PLACE HAD NOWHERE TO GO NOWHERE SHE WANTED TO GO HAD NOT A THOUGHT TO TELL HER WHAT QUESTION TO ASK IF SHE MET A LIVING SOUL +SHE ENTERED AND THE SERVANTS SOFT FOOTED AND SILENT WERE BUSY CARRYING AWAY THE VESSELS OF HOSPITALITY AND RESTORING ORDER AS IF ALREADY THEY PREPARED FOR ANOTHER COMPANY ON THE MORROW NO ONE HEEDED HER +WHEN SHE SAID GOOD NIGHT TO BEENIE AND WENT TO HER CHAMBER OVER THAT WHERE THE LOVED PARENT AND FRIEND WOULD FALL ASLEEP NO MORE SHE FELT AS IF SHE WENT WALKING ALONG TO HER TOMB +IT WAS A SUMMER NIGHT AND THE GUESTS WERE WANDERING IN AND OUT AT WILL AND THROUGH HOUSE AND GARDEN AMID LOVELY THINGS OF ALL COLORS AND ODORS +HER ONLY LIFE WAS THAT SHE WAS LOST +I KNOW IT AND THERE IS NO WAKING +IT WASN'T I WHO SAID THAT SAID THE GIRL SMILING BUT THAT'S SO ANYHOW AND THEN SHE SIGHED +I SHALL LOCK UP ALL THE DOORS AND WINDOWS IN THE HOUSE AND THEN I SHALL GIVE YOU MY LATCH KEY AND YOU CAN LET YOURSELF IN AND STAY THE NIGHT HERE THERE IS NO ONE IN THE HOUSE +THERE IS A SEAT IN THE GARDEN AT THE SIDE OF THE HOUSE AGAIN SHE HESITATED +IS IT ONLY THAT YOU'RE POOR WHY THAT'S NOTHING I'M POOR TOO SHE LAUGHED +LET ME THINK HE SAID OH HOW GLAD I AM THAT YOU HAPPENED TO COME THIS WAY +HE HURRIEDLY CUT CAKE AND PRESSED IT UPON HER +DO DRINK THIS AND THEN TELL ME PERHAPS I CAN HELP YOU +HE TOLD ME TO STAY ON AT THE HOTEL AND I DID AND THEN ONE NIGHT WHEN I WAS AT THE THEATRE MY MAID A HORRID FRENCH THING WE GOT IN PARIS PACKED UP ALL MY TRUNKS AND TOOK ALL MY MONEY AND PAID THE BILL AND WENT +ALL THE SAME HE ADDED IRRELEVANTLY YOU SHALL HAVE THE LATCH KEY +I WILL CATCH THE NIGHT TRAIN AND BRING MY MOTHER UP TO MORROW THEN WE WILL SEE WHAT CAN BE DONE +THE LADY AND THE GUITAR CERTAINLY PASSED THE NIGHT AT HILL VIEW VILLA BUT WHEN HIS MOTHER VERY ANGRY AND VERY FRIGHTENED CAME UP WITH HIM AT ABOUT NOON THE HOUSE LOOKED JUST AS USUAL AND NO ONE WAS THERE BUT THE CHARWOMAN +THE YOUNG MAN DREW A DEEP BREATH OF RELIEF AND LIGHTED THE WAX CANDLES IN THE SOLID SILVER CANDLESTICKS ON HIS WRITING TABLE FOR NOW THE LATE SUMMER DUSK WAS FALLING AND THAT ORGAN PLEASE HEAVEN MADE FULL THE MEASURE OF THE DAY'S APPOINTED TORTURE +IT WAS PLAIN THAT HIS CASTANET GIRL HIS MOTHER AND SISTER TOOK A PLEASURE IN CREDITING HER DAILY WITH SOME FRESH AND UNPLEASING INSTRUMENT COULD HAVE HAD NEITHER TASTE MONEY NOR HONESTY TO SUCH A POINT AS THIS +THE LAST STRAINS OF THE ILL TREATED ILL FATED INTERMEZZO HAD DIED AWAY AND AFTER THEM HAD DIED AWAY ALSO THE RUMBLING OF THE WHEELS OF THE 
MURDEROUS BARREL ORGAN THAT HAD SO GAILY EXECUTED THAT ALONG WITH THE NINE OTHER TUNES OF ITS REPERTORY TO THE ADMIRATION OF THE HOUSEMAID AT THE WINDOW OF THE HOUSE OPPOSITE AND THE CROWING DELIGHT OF THE TWO BABIES NEXT DOOR +SHE SAID AGAIN YOU ARE KIND +NEVER HAD ANY ACT SEEMED SO IMPOSSIBLE +LOOK HERE HE SAID THIS IS ALL NONSENSE YOU KNOW YOU ARE TIRED OUT AND THERE'S SOMETHING WRONG WHAT IS IT +THE SILVER IS ALL RIGHT THANK GOODNESS SHE SAID BUT YOUR BANJO GIRL HAS TAKEN A PAIR OF YOUR SISTER'S SILK STOCKINGS AND THOSE NEW SHOES OF HERS WITH THE SILVER BUCKLES AND SHE'S LEFT THESE +WELL THEN I WENT INTO LODGINGS THAT WICKED WOMAN HAD LEFT ME ONE STREET SUIT AND TO DAY THEY TURNED ME OUT BECAUSE MY MONEY WAS ALL GONE +YOU SEE PAPA'S SO VERY RICH AND AT HOME THEY EXPECT ME TO TO GET ACQUAINTED WITH DUKES AND THINGS AND SHE STOPPED +HE HAD NO TIME TO THINK BUT HE WAS AWARE THAT THIS WAS THE MOST EXCITING ADVENTURE THAT HAD EVER HAPPENED TO HIM +THEN SHE TURNED TOWARDS THE QUARTER INDICATED AND DISAPPEARED ROUND THE LAUREL BUSHES +THEN THERE WAS SILENCE THEN A SIGH AND THE SOUND OF LIGHT MOVING FEET ON THE GRAVEL +AND AGAIN HE LISTENED WITH A QUIET PLEASURE +YOU ARE KIND SHE SAID FOR THE THIRD TIME AND REACHED HER HAND OUT TO HIM HE DID NOT KISS IT THEN ONLY TOOK IT IN HIS AND FELT HOW SMALL AND COLD IT WAS THEN IT WAS TAKEN AWAY +HER LITTLE FOOT TAPPED THE GRAVEL IMPATIENTLY +MICHAELIS THE TICKET OF LEAVE APOSTLE WAS SPEAKING IN AN EVEN VOICE A VOICE THAT WHEEZED AS IF DEADENED AND OPPRESSED BY THE LAYER OF FAT ON HIS CHEST +HIS VISION OF TRUTH HAD GROWN SO INTENSE THAT THE SOUND OF A STRANGE VOICE FAILED TO ROUT IT THIS TIME +HE STOOD STILL IN THE MIDDLE OF THE PARLOUR AND LOOKED INTO THE KITCHEN IN SILENCE +THE OTHER DAY STEVIE GOT HOLD OF ONE AND THERE WAS A STORY IN IT OF A GERMAN SOLDIER OFFICER TEARING HALF OFF THE EAR OF A RECRUIT AND NOTHING WAS DONE TO HIM FOR IT THE BRUTE +MISTER VERLOC'S ANXIETIES HAD PREVENTED HIM FROM ATTACHING ANY SENSE TO WHAT HIS WIFE WAS SAYING +I WISH HE HAD NEVER BEEN TO SCHOOL MISSUS VERLOC BEGAN AGAIN BRUSQUELY +A HARSH LAUGH FROM COMRADE OSSIPON CUT THE TIRADE DEAD SHORT IN A SUDDEN FALTERING OF THE TONGUE AND A BEWILDERED UNSTEADINESS OF THE APOSTLE'S MILDLY EXALTED EYES +I WOULDN'T GIVE A HALFPENNY FOR THE WHOLE LOT +VERY CHARACTERISTIC PERFECTLY TYPICAL +HE CLOSED THE DOOR BEHIND THEIR BACKS WITH RESTRAINED VIOLENCE TURNED THE KEY SHOT THE BOLT +AH HE DID NOT DEPEND UPON EMOTIONAL EXCITEMENT TO KEEP UP HIS BELIEF NO DECLAMATIONS NO ANGER NO VISIONS OF BLOOD RED FLAGS WAVING OR METAPHORICAL LURID SUNS OF VENGEANCE RISING ABOVE THE HORIZON OF A DOOMED SOCIETY NOT HE +ALL IDEALISATION MAKES LIFE POORER +STEVIE SWALLOWED THE TERRIFYING STATEMENT WITH AN AUDIBLE GULP AND AT ONCE AS THOUGH IT HAD BEEN SWIFT POISON SANK LIMPLY IN A SITTING POSTURE ON THE STEPS OF THE KITCHEN DOOR +ALEXANDER OSSIPON GOT UP TALL IN HIS THREADBARE BLUE SERGE SUIT UNDER THE LOW CEILING SHOOK OFF THE STIFFNESS OF LONG IMMOBILITY AND STROLLED AWAY INTO THE KITCHEN DOWN TWO STEPS TO LOOK OVER STEVIE'S SHOULDER +HER BARE FEET AS IF POKED THROUGH THE BOTTOM OF AN UNADORNED SLEEVED CALICO SACK BUTTONED TIGHTLY AT NECK AND WRISTS FELT OVER THE RUG FOR THE SLIPPERS WHILE SHE LOOKED UPWARD INTO HER HUSBAND'S FACE +AND EVER SINCE HE HAD NEVER MANAGED TO GET HIS WEIGHT DOWN AS MUCH AS AN OUNCE +HE TOOK THE CASH BOX OUT OF THE DRAWER AND TURNING TO LEAVE THE SHOP BECAME AWARE THAT STEVIE WAS STILL DOWNSTAIRS +DON'T YOU THINK THAT IF I HAD NOT BEEN THE OPTIMIST I AM I COULD NOT HAVE 
FOUND IN FIFTEEN YEARS SOME MEANS TO CUT MY THROAT +THERE WAS AN EXTRAORDINARY FORCE OF SUGGESTION IN THIS POSTURING +THIS SURVEY WAS UNFAVOURABLE +THERE WAS NO YOUNG MAN OF HIS AGE IN LONDON MORE WILLING AND DOCILE THAN STEPHEN SHE AFFIRMED NONE MORE AFFECTIONATE AND READY TO PLEASE AND EVEN USEFUL AS LONG AS PEOPLE DID NOT UPSET HIS POOR HEAD +THAT POOR BOY IS IN A VERY EXCITED STATE TO NIGHT SHE MURMURED AFTER A PAUSE WHICH LASTED FOR THREE TICKS OF THE CLOCK +THIS DREAD LED HIM TO MAKE THE REMARK THAT STEVIE HAD DISREGARDED HIS SUGGESTION TO GO TO BED +THE DISDAINFUL POUT OF COMRADE OSSIPON'S THICK LIPS ACCENTUATED THE NEGRO TYPE OF HIS FACE +THEY ARE NOURISHING THEIR GREED ON THE QUIVERING FLESH AND THE WARM BLOOD OF THE PEOPLE NOTHING ELSE +THERE ARE NATURES TOO TO WHOSE SENSE OF JUSTICE THE PRICE EXACTED LOOMS UP MONSTROUSLY ENORMOUS ODIOUS OPPRESSIVE WORRYING HUMILIATING EXTORTIONATE INTOLERABLE THOSE ARE THE FANATICS +THE SHEET OF PAPER COVERED WITH CIRCLES DROPPED OUT OF HIS FINGERS AND HE REMAINED STARING AT THE OLD TERRORIST AS IF ROOTED SUDDENLY TO THE SPOT BY HIS MORBID HORROR AND DREAD OF PHYSICAL PAIN +HIS SCARED EYES BLAZED WITH INDIGNATION IT WOULD HURT TERRIBLY HIS MOUTH DROPPED OPEN +MISTER VERLOC PERCEIVED WITH SOME SURPRISE THAT HE DID NOT KNOW REALLY WHAT TO SAY TO STEVIE +DOWN BELOW IN THE QUIET NARROW STREET MEASURED FOOTSTEPS APPROACHED THE HOUSE THEN DIED AWAY UNHURRIED AND FIRM AS IF THE PASSER BY HAD STARTED TO PACE OUT ALL ETERNITY FROM GAS LAMP TO GAS LAMP IN A NIGHT WITHOUT END AND THE DROWSY TICKING OF THE OLD CLOCK ON THE LANDING BECAME DISTINCTLY AUDIBLE IN THE BEDROOM +HE KNOWS NO BETTER HE GETS INTO HIS PASSIONS OVER IT +YES NOT AT ALL WELL +THE COMPARISON OCCURRED TO MISTER VERLOC BECAUSE HE HAD SAT ASTRIDE VARIOUS ARMY HORSES IN HIS TIME AND HAD NOW THE SENSATION OF AN INCIPIENT FALL +STEVIE ACCUSTOMED TO MOVE ABOUT DISREGARDED HAD GOT UP FROM THE KITCHEN TABLE CARRYING OFF HIS DRAWING TO BED WITH HIM +MICHAELIS THE TICKET OF LEAVE APOSTLE SMILED VAGUELY WITH HIS GLUED LIPS HIS PASTY MOON FACE DROOPED UNDER THE WEIGHT OF MELANCHOLY ASSENT +THE OLD TERRORIST TURNED SLOWLY HIS HEAD ON HIS SKINNY NECK FROM SIDE TO SIDE +YES I HAD THE TIME TO THINK THINGS OUT A LITTLE HE ADDED WITHOUT EMPHASIS +THE COALS IN THE GRATE SETTLED DOWN WITH A SLIGHT CRASH AND MICHAELIS THE HERMIT OF VISIONS IN THE DESERT OF A PENITENTIARY GOT UP IMPETUOUSLY +WITH HIS ELBOW PRESENTING NO APPEARANCE OF A JOINT BUT MORE LIKE A BEND IN A DUMMY'S LIMB THROWN OVER THE BACK OF A CHAIR HE LEANED FORWARD SLIGHTLY OVER HIS SHORT AND ENORMOUS THIGHS TO SPIT INTO THE GRATE +HE GLARED AT ME AS IF HE DIDN'T KNOW WHO I WAS WHEN I WENT DOWNSTAIRS +THE PROSPECT WAS AS BLACK AS THE WINDOW PANE AGAINST WHICH HE WAS LEANING HIS FOREHEAD +STEVIE PROWLED ROUND THE TABLE LIKE AN EXCITED ANIMAL IN A CAGE +IF I HAD KNOWN THEY WERE COMING TO NIGHT I WOULD HAVE SEEN TO IT THAT HE WENT TO BED AT THE SAME TIME I DID +HIS OWN SKIN HAD SIZZLED UNDER THE RED HOT BRAND HE MURMURED SOFTLY +HE HAD BEEN A PRISONER HIMSELF +I WOULD CALL IT CANNIBALISTIC THAT'S WHAT IT IS +THE LIGHT THROWN DOWN BY THE SHADE FELL DAZZLINGLY ON THE WHITE PILLOW SUNK BY THE WEIGHT OF HER HEAD REPOSING WITH CLOSED EYES AND DARK HAIR DONE UP IN SEVERAL PLAITS FOR THE NIGHT +IN ANY CASE HE HAD NOT THE TIME +IT'S LIKE YOUR HORSE SUDDENLY FALLING DEAD UNDER YOU IN THE MIDST OF AN UNINHABITED AND THIRSTY PLAIN +LOAFING WAS ALL VERY WELL FOR THESE FELLOWS WHO KNEW NOT MISTER VLADIMIR AND HAD WOMEN TO FALL BACK UPON WHEREAS HE HAD A WOMAN TO 
PROVIDE FOR +LOMBROSO IS AN ASS +FOR HIM THE CRIMINAL IS THE PRISONER SIMPLE IS IT NOT +STRUGGLE WARFARE WAS THE CONDITION OF PRIVATE OWNERSHIP IT WAS FATAL +THE POSSESSORS OF PROPERTY HAD NOT ONLY TO FACE THE AWAKENED PROLETARIAT BUT THEY HAD ALSO TO FIGHT AMONGST THEMSELVES YES +HIS ENUNCIATION WOULD HAVE BEEN ALMOST TOTALLY UNINTELLIGIBLE TO A STRANGER +HE LOOKED DUBIOUSLY AT HIS BROTHER IN LAW BUT HE DID NOT ASK HIM FOR INFORMATION +YOU DON'T UNDERSTAND HE BEGAN DISDAINFULLY BUT STOPPED SHORT INTIMIDATED BY THE DEAD BLACKNESS OF THE CAVERNOUS EYES IN THE FACE TURNED SLOWLY TOWARDS HIM WITH A BLIND STARE AS IF GUIDED ONLY BY THE SOUND +YOU WOULD CALL THAT LAD A DEGENERATE WOULD YOU MUMBLED MISTER VERLOC +ASK KARL YUNDT HE GROWLED SAVAGELY +I DON'T SAY THAT PROTESTED MICHAELIS GENTLY +THERE IS NO OCCUPATION THAT FAILS A MAN MORE COMPLETELY THAN THAT OF A SECRET AGENT OF POLICE +AND I COULD NEVER GET AS MANY AS THREE SUCH MEN TOGETHER +HE WATCHED HIM GESTICULATING AND MURMURING IN THE KITCHEN +THESE WERE BUT FEW AND FOR THE FIRST TIME SINCE HE OPENED HIS SHOP HE TOOK A COMMERCIAL SURVEY OF ITS VALUE +COMRADE OSSIPON'S FACE TWITCHED WITH EXASPERATION +THAT BOY HEARS TOO MUCH OF WHAT IS TALKED ABOUT HERE +HE GAVE THE DISCUSSION UP WITH A SLIGHT SHRUG OF THE SHOULDERS +COMFORTABLE DEAR +HE GETS A RED FACE PORING OVER THEM +HE WAS NOT SATISFIED WITH HIS FRIENDS +WHEN HE ROSE PAINFULLY THE THRUSTING FORWARD OF A SKINNY GROPING HAND DEFORMED BY GOUTY SWELLINGS SUGGESTED THE EFFORT OF A MORIBUND MURDERER SUMMONING ALL HIS REMAINING STRENGTH FOR A LAST STAB +THE SHADOW OF HIS EVIL GIFT CLUNG TO HIM YET LIKE THE SMELL OF A DEADLY DRUG IN AN OLD VIAL OF POISON EMPTIED NOW USELESS READY TO BE THROWN AWAY UPON THE RUBBISH HEAP OF THINGS THAT HAD SERVED THEIR TIME +IT WAS KARL YUNDT WHO WAS HEARD IMPLACABLE TO HIS LAST BREATH +HE ISN'T FIT TO HEAR WHAT'S SAID HERE HE BELIEVES IT'S ALL TRUE +AT BEST THEY CAN ONLY INTERPRET THE MIND OF THE PROPHET AND CAN HAVE NO OBJECTIVE VALUE +HE PAUSED THEN ADDED WITH MODEST FIRMNESS +THEN WHY INDULGE IN PROPHETIC PHANTASIES +MISTER VERLOC WAS FULLY RESPONSIVE NOW +HE WAS OUT OF HIS MIND WITH SOMETHING HE OVERHEARD ABOUT EATING PEOPLE'S FLESH AND DRINKING BLOOD WHAT'S THE GOOD OF TALKING LIKE THAT +HE CAN'T STAND THE NOTION OF ANY CRUELTY +THE FAMOUS TERRORIST HAD NEVER IN HIS LIFE RAISED PERSONALLY AS MUCH AS HIS LITTLE FINGER AGAINST THE SOCIAL EDIFICE +AND THE INCONSISTENT WOMAN FELL UPON HIS BUTTONY BREAST WEEPING COPIOUSLY +I COULD NOT LOVE THEE DEAR SO MUCH LOVED I NOT HONOR MORE +THE BOYS BLESS THEIR BRAVE HEARTS HAVE DONE NOBLY BUT OLDER MEN ARE NEEDED NOW WE CANNOT SACRIFICE ALL THE GALLANT LADS AND WE WHO HAVE MORE TO LOSE THAN THEY MUST TAKE OUR TURN AND TRY TO DO AS WELL +BUT EVEN WHILE SHE ENJOYED EVERY HOUR OF LIFE AND BEGRUDGED THE TIME GIVEN TO SLEEP SHE FELT AS IF THE DREAM WAS TOO BEAUTIFUL TO LAST AND OFTEN SAID +HIS WIFE FED HIM WITH THE FAT OF THE LAND REGARDLESS OF CONSEQUENCES HIS CHILDREN REVOLVED ABOUT HIM WITH TIRELESS CURIOSITY AND WONDER HIS NEIGHBORS FLOCKED IN TO APPLAUD ADVISE AND ADMIRE EVERY ONE TREATED HIM WITH A RESPECT MOST GRATEFUL TO HIS FEELINGS HE WAS AN OBJECT OF INTEREST AND WITH EVERY HOUR HIS IMPORTANCE INCREASED SO THAT BY NIGHT HE FELT LIKE A COMMANDER IN CHIEF AND BORE HIMSELF ACCORDINGLY +NOW CYNTHY BE YOU SATISFIED +SO CHRISTIE TURNED A DEAF EAR TO HER PROPHETIC SOUL AND GAVE HERSELF UP TO THE BLISSFUL HOLIDAY THAT HAD COME AT LAST +I HOPE YOU'LL LIKE HIM BETTER THAN THE FROST BITTEN OLD DAVID YOU FIRST KNEW AND WERE 
KIND ENOUGH TO LOVE +THEN SHE SAW DAVID AND THE REGIMENT BECAME ONE MAN TO HER +I'M NOT A TALKER YOU KNOW AND AS THE LAWS OF GRAVITATION FORBID MY SOARING ALOFT ANYWHERE I CAN ONLY EXPRESS MY JOYFULLY UPLIFTED STATE OF MIND BY PRANCING AS YOU CALL IT +I SHALL WAIT FOR TIME TO SHOW +CAN YOU REMEMBER WHAT HEPSEY TOLD US AND CALL THEM POOR LONG SUFFERIN CREETERS NAMES +DAVID AND CHRISTIE WENT SMILING AWAY TOGETHER AND IF THEY SHED ANY TEARS OVER THE BRIEF HAPPINESS NO ONE SAW THEM BUT THE FLOWERS AND THEY LOYALLY KEPT THE SECRET FOLDED UP IN THEIR TENDER HEARTS +MISTER POWER IS WAITING ARE YOU READY LOVE QUITE READY +AS A MARRIED WOMAN YOU WILL GET ON BETTER AS MY WIFE YOU WILL BE ALLOWED TO COME TO ME IF I NEED YOU AND AS MY HE STOPPED THERE FOR HE COULD NOT ADD AS MY WIDOW YOU WILL HAVE MY PENSION TO SUPPORT YOU +BENNET WILL TAKE THE GARDEN AND GREEN HOUSE OFF MY HANDS THIS AUTUMN FOR A YEAR OR LONGER IF I LIKE +THEN THEY WENT BACK TO THEIR WORK LITTLE DREAMING AS THEY TIED ROSES AND TWINED SMILAX WREATHS HOW NEAR THAT OTHER CHANCE WAS HOW SOON THEY WERE TO BE CALLED UPON TO KEEP THEIR PROMISE AND HOW WELL EACH WAS TO PERFORM THE PART GIVEN THEM IN LIFE AND DEATH +HE WAS NOT AS UNMOVED AS HE SEEMED BY THE GENERAL EXCITEMENT AND HAD FELT SUNDRY MANLY IMPULSES TO UP AND AT EM WHEN HIS COMRADES IN THE SHOP DISCUSSED THE CRISIS WITH IREFUL BRANDISHING OF AWLS AND VENGEFUL POUNDING OF SOLE LEATHER AS IF THE REBELS WERE UNDER THE HAMMER +VERY WELL SAID MISSUS WILKINS RESOLUTELY TO HERSELF EF I CAN'T MAKE NO IMPRESSION ON HIS SOUL I WILL ON HIS STOMMICK AND SEE HOW THAT'LL WORK +YOU YOUNG FOLKS TAKE A WEDDING TRIP TO THE GREEN HOUSE WHILE WE SEE HOW WELL WE CAN GET ON WITHOUT YOU +ALL WATCHED WITH QUICKENED BREATH AND PROUD SOULS THAT LIVING WAVE BLUE BELOW AND BRIGHT WITH A STEELY GLITTER ABOVE AS IT FLOWED DOWN THE STREET AND AWAY TO JOIN THE SEA OF DAUNTLESS HEARTS THAT FOR MONTHS HAD ROLLED UP AGAINST THE SOUTH AND EBBED BACK REDDENED WITH THE BLOOD OF MEN LIKE THESE +IT IS TERRIBLE AND YET GLORIOUS +HE'S A KIND NEIGHBORLY MAN AND HIS BOY WILL TAKE MY PLACE ABOUT THE HOUSE AND PROTECT YOU FAITHFULLY +A VERY SIMPLE LITTLE MARRIAGE FEAST BUT MORE LOVE GOOD WILL AND TENDER WISHES ADORNED THE PLAIN TABLE THAN IS OFTEN FOUND AT WEDDING BREAKFASTS AND BETTER THAN ANY SPEECH OR SONG WAS LETTY'S BROKEN WHISPER AS SHE FOLDED HER ARMS ROUND DAVID'S EMPTY CHAIR WHEN NO ONE SAW HER HEAVEN BLESS AND KEEP AND BRING HIM BACK TO US +THE ROSES ARE FOR THEY REMIND ME OF POOR HELEN AND THE FIRST WORK I DID WITH DAVID WAS ARRANGING FLOWERS LIKE THESE FOR A DEAD BABY'S LITTLE COFFIN +TO NO HOME IN THE LAND DID THE GREAT TROUBLE BRING A MORE SUDDEN CHANGE THAN THE LITTLE COTTAGE IN THE LANE +YES DAVID SISTER AND SWEETHEART ANSWERED BRAVELY FORGETTING IN THE FERVOR OF THE MOMENT WHAT HEAVY CONSEQUENCES GOD MIGHT SEE FIT TO SEND GOOD +I KNEW YOU WOULD GO I SAW YOU GETTING READY AND I MADE UP MY MIND TO FOLLOW +TO SAY THAT THE FISH ROSE AT ONCE AND SWALLOWED THE BAIT HOOK AND ALL BUT FEEBLY EXPRESSES THE JUSTICE DONE TO THE CAKES BY THAT LONG SUFFERING MAN +WE CAN'T AFFORD NO NICE VITTLES NOW WHEN OUR MEN ARE SUFFERIN SO +THEY KNEW WHAT IT WAS WITHOUT A WORD MISSUS STERLING CLASPED HER HANDS AND BOWED HER HEAD +IN THAT CASE YOU WILL FIND ME A PROUD IMPETUOUS AMBITIOUS FELLOW CHRISTIE AND HOW WILL THAT SUIT +SURELY I SHALL IF I GIVE YOU AND MYSELF TO THE CAUSE AND I DO IT GLADLY THOUGH I KNOW THAT MY HEART HAS GOT TO ACHE AS IT NEVER HAS ACHED YET WHEN MY COURAGE FAILS AS IT WILL BY AND BY AND MY SELFISH SOUL COUNTS 
THE COST OF MY OFFERING AFTER THE EXCITEMENT IS OVER +DAVID CAUGHT THE EXALTATION AND GAVE NO FURTHER THOUGHT TO ANY THING BUT THE DUTY OF THE HOUR FINDING HIMSELF STRONGER AND BRAVER FOR THAT LONG LOOK INTO THE ILLUMINATED FACE OF THE WOMAN HE LOVED +FINDING THAT LISHA SHOWED LITTLE ENTHUSIASM ON THE SUBJECT SHE TRIED TO ROUSE HIM BY PATRIOTIC APPEALS OF VARIOUS SORTS +NOT ONE DAVID THAT'S TRUE LOVE CHRISTIE +YOU WILL LET ME DO IT AND IN RETURN I WILL MARRY YOU WHENEVER YOU ASK ME ANSWERED CHRISTIE SEALING THE PROMISE WITH A KISS THAT SILENCED HIM +WE WILL WHAT CAN I DO FOR YOU DAVY ASKED CHRISTIE WONDERFULLY SUPPORTED BY THE THOUGHT THAT SHE WAS GOING TOO +DAVID WAS SOBER ENOUGH NOW AND WENT ABOUT HIS WORK WITH A GRIM SET TO HIS LIPS AND A SPARK IN HIS EYES THAT MADE THE THREE WOMEN LOOK AT ONE ANOTHER PALE WITH UNSPOKEN APPREHENSION +MOTHER SAYS I'VE GONE BACK TO THE TIME BEFORE WE LOST LETTY AND I SOMETIMES FEEL AS IF I HAD +DAVID HELD IT CLOSE IN BOTH OF HIS SAYING GRATEFULLY THANK YOU MOTHER THEN FIXING HIS EYES ON THE YOUNGER YET NOT DEARER WOMEN HE ADDED WITH A RING IN HIS VOICE THAT MADE THEIR HEARTS ANSWER WITH A PROMPT AY AY IN SPITE OF LOVE OR FEAR +NOTHING CAN SURPRISE ME NOW I'M PREPARED FOR ANY THING EVEN THE SIGHT OF MY QUAKERISH LOVER DANCING A JIG +THE WOMEN DROPPED THEIR WORK TO LOOK AND LISTEN FOR HIS VISITS WERE FEW AND SHORT AND EVERY INSTANT WAS PRECIOUS +YOU'VE SOMETHING TO TELL ME I SEE IT IN YOUR FACE DEAR I MUST GO +BUT I THINK FEW BRIDES DRESS WITH A BRAVER HAPPIER HEART THAN MINE THOUGH I DO CHOOSE A SOBER WEDDING GOWN ANSWERED CHRISTIE SMILING AGAIN AS SHE TOOK FROM A HALF PACKED TRUNK HER NEW HOSPITAL SUIT OF SOFT GRAY WOOLLEN STUFF +IT WOULD HAVE TAKEN MANY KNAPSACKS TO HOLD ALL THE GIFTS SHOWERED UPON HIM BY HIS FRIENDS AND NEIGHBORS +THEN THEY STOOD QUITE STILL FOR A TIME AND IN THE SILENCE THE TWO HEARTS TALKED TOGETHER IN THE SWEET LANGUAGE NO TONGUE CAN UTTER +I DON'T WANT YOU TO I LOVE TO SEE YOU SO YOUNG AND HAPPY ONLY YOU ARE NOT THE OLD DAVID AND I'VE GOT TO GET ACQUAINTED WITH THE NEW ONE +EXCELLENTLY I LIKE PRIDE OF YOUR SORT IMPETUOSITY BECOMES YOU FOR YOU HAVE LEARNED TO CONTROL IT IF NEED BE AND THE AMBITION IS BEST OF ALL +NO HE AIN'T IT'S A TRAINER ADDED ANN LIZY +HER MEETING WITH LETTY WAS INDESCRIBABLY TENDER AND THE DAYS THAT FOLLOWED WERE PRETTY EQUALLY DIVIDED BETWEEN HER AND HER BROTHER IN NURSING THE ONE AND LOVING THE OTHER +THEN THE GOOD SOUL OPENLY SHOULDERED THE BURDEN SHE HAD BORNE SO LONG IN SECRET AND BRAVELY TRUDGED ON ALONE +I FEEL LIKE A BOY OUT OF SCHOOL OR RATHER A MAN OUT OF PRISON AND MUST ENJOY MY LIBERTY IN SOME WAY +NEXT EVENING AS MISSUS STERLING SAT ALONE IN THE TWILIGHT A TALL MAN IN ARMY BLUE ENTERED QUIETLY STOOD WATCHING THE TRANQUIL FIGURE FOR A MOMENT THEN WENT AND KNELT DOWN BESIDE IT SAYING WITH A MOST UNSOLDIERLY CHOKE IN THE VOICE +NOTHING CAN PART US ANY MORE NOT EVEN DEATH FOR LOVE LIKE OURS WILL LAST FOR EVER +I WISH I HADN'T TAKEN THAT BRANDY HE SAID FOOL THAT I AM +THAT AT ALL TIMES DEBASING AT THIS PARTICULAR TIME IT WAS INFAMOUS THAT A VICE UNWORTHY OF ANY MAN WAS DOUBLY SINFUL IN A MAN OF EDUCATION AND A MINISTER OF GOD IN VAIN +IN THE VALLEY OF THE SHADOW OF DEATH HE IS WITH US +I HAD THE PLEASURE OF MEETING HIM IN SOCIETY +DEAD SAID DOCTOR MACKLEWAIN +I'LL REPORT THIS TO THE GOVERNMENT +YES ONE OUGHTN'T TO LEAVE THE COLONY WITHOUT SEEING IT SAYS BURGESS IT'S WORTH SEEING +THE DEVIL HE IS I HEARD SOMETHING ABOUT IT TOO +IN FACT THE RINGLEADER JOHN REX GAVE ME HIS CONFESSION AND I SENT IT TO THE 
BISHOP +THE GOVERNMENT MAY GO TO AND YOU TOO ROARED BURGESS GET OUT +AN IMPULSIVE GENTLEMAN SAID MEEKIN TO MACKLEWAIN AS THE SOUND OF MISTER NORTH'S FOOTSTEPS DIED AWAY IN THE DISTANCE +SHOW MISTER NORTH OUT HE SAID AND GO DOWN TO THE BARRACKS AND TELL TROKE THAT KIRKLAND IS TO HAVE A HUNDRED LASHES TO MORROW +I'LL SHOW YOU WHO'S MASTER HERE MY GOOD SIR +I AM A MINISTER OF GOD SIR AND I FORBID YOU TO COMMIT THIS CRIME +WELL NOW SAID MEEKIN WITH ASPERITY I DON'T AGREE WITH YOU +BY AND BY A SHORT FIGURE SMOKING A CHEROOT CAME UP OUT OF THE DARK AND PROVED TO BE DOCTOR MACKLEWAIN WHO HAD BEEN PREVENTED FROM ATTENDING THE DINNER BY REASON OF AN ACCIDENT TO A CONSTABLE AT NORFOLK BAY WHICH HAD CLAIMED HIS PROFESSIONAL ATTENTION +WHOM IS HE GOING TO FLOG NOW +BEFORE THE TWO CLERGYMEN HAD GOT HALF WAY DOWN THE STEEP PATH THAT LED FROM THE COMMANDANT'S HOUSE TO THE FLAT ON WHICH THE COTTAGES OF THE DOCTOR AND CHAPLAIN WERE BUILT MACKLEWAIN REJOINED THEM +MACKLEWAIN SHOOK HIS HEAD SERIOUSLY +THIS IS MURDEROUS +PRAY HELP YOURSELF TO WINE +MISTER MEEKIN EXPRESSED SOME ALARM BUT DOCTOR MACKLEWAIN RE ASSURED HIM +OH GOD GIVE ME STRENGTH AID ME HELP ME +NORTH KNEW WELL THAT HE WOULD NEVER DARE TO ATTEMPT ANY SUCH ACT OF VIOLENCE BUT THE INSULT STUNG HIM LIKE THE CUT OF A WHIP +HE SEEMS TO ME TO BE TRULY PENITENT FOR HIS OFFENCES A MISGUIDED BUT NOT A HYPOCRITICAL MAN IF MY KNOWLEDGE OF HUMAN NATURE GOES FOR ANYTHING I HOPE HE IS SAID NORTH +DAMN YOUR IMPERTINENCE SIR BURST OUT BURGESS +TWICE HE PAUSED ON THE WAY TO THE SITTING ROOM AND TWICE WAS HE DRIVEN ON BY A POWER STRONGER THAN HIS WILL +YOU'RE A DISMISSED OFFICER OF THE GOVERNMENT SIR +HAVE YOU MANY VISITORS CAPTAIN BURGESS VERY FEW +I HAVE THESE ATTACKS AT TIMES +IN THE MIDST OF HIS ARGUMENTS HE FOUND HIMSELF AT THE CUPBOARD WITH THE BOTTLE AT HIS LIPS IN AN ATTITUDE THAT WAS AT ONCE LUDICROUS AND HORRIBLE +HE MIXED A TEASPOONFUL OF THIS IN A PANNIKIN OF WATER AND DRANK IT +YOU DON'T MEAN TO SAY HE'S GOING TO FLOG KIRKLAND +HE IS JUST MARRIED YOU KNOW IS HE SAID BURGESS +ANOTHER FLOGGING TO MORROW SAID HE GRUMBLINGLY +THIS OF COURSE WAS MERE BRAVADO ON THE PART OF THE COMMANDANT +I'LL TEACH MY PRISONERS TO ATTEMPT SUICIDE +I WAS QUARTERED WITH HIM AT SARAH ISLAND +THERE'S NO FEAR OF HIM SAID BURGESS CHEERILY IF HE GROWS UPROARIOUS WE'LL SOON GIVE HIM A TOUCH OF THE CAT +I'VE GIVEN MY ORDERS SIR +WE CAN'T DO ANYTHING WITHOUT EVIDENCE COMPLAIN +I SHALL FIND MY PORTMANTEAU IN MY ROOM YOU SAID YES YES +HE SMELT THE NUTTY AROMA OF THE SPIRIT +DELIGHTED TO SEE YOU MISTER MEEKIN +THEN CAPTAIN BURGESS CRIED NORTH HIS PALE FACE FLUSHING I TELL YOU THE BOY'S BLOOD WILL BE ON YOUR HEAD +PERHAPS YOU'LL HAVE THE GOODNESS TO ALLOW ME TO BE THE BEST JUDGE OF THAT RETURNED MACKLEWAIN DRAWING UP HIS LITTLE BODY TO ITS LEAST INSIGNIFICANT STATURE +ABANDONED INDEED BY GOD AND MAN ALMOST +GOOD NIGHT SIR I HOPE YOU WILL BE COMFORTABLE +DOCTOR WE ALL HAVE OUR CROSSES HAVE WE NOT +HOW DELIGHTFUL THE GRASS SMELLS +IT WAS AS THOUGH HE HAD REACHED THE CRISIS OF A DISEASE WHICH HAD BEEN FOR DAYS GATHERING FORCE +THEY SHALL NOT FLOG THAT BOY HE SAID +THE REVEREND MEEKIN EYED HIS CLERICAL BROTHER WITH HORROR THE REVEREND MEEKIN WAS NOT ACCUSTOMED TO CLERGYMEN WHO WORE BLACK NECKTIES SMOKED CLAY PIPES CHEWED TOBACCO AND DRANK NEAT BRANDY OUT OF TUMBLERS +I SUPPOSE SEVERITY IS NECESSARY RETURNED MEEKIN THOUGH TO MY EARS A FLOGGING SOUNDS A LITTLE DISTASTEFUL +CAPTAIN FRERE SAYS THAT THE SCENERY IS DELIGHTFUL +HE SLEEPS AT THE BACK AND NORTH HURRIED OFF +O LORD LOOK DOWN UPON 
ME +IF YOU PLEASE SAID MEEKIN GRAVELY +HE HAS THE STRANGEST FITS AT TIMES +SO THEY WENT ON TO THE VERANDAH AND LOOKED DOWN UPON THE LIGHTS OF THE PRISON AND LISTENED TO THE SEA LAPPING THE SHORE +OUR ROADS LIE TOGETHER DOCTOR +I MUST HAVE A TEASPOONFUL HE SAID TO ALLAY THE CRAVING +A GREAT RASCAL PUT IN NORTH +YOU HAVE NOT BEEN LONG IN THE COLONY MISTER MEEKIN +UNLESS IT'S A CANCER IN THE STOMACH I DON'T KNOW WHAT IT CAN BE CANCER IN THE STOMACH +THEN DON'T YOU INTERFERE WITH ME SIR +THAT'S MACKLEWAIN'S BUSINESS +BUT MACKLEWAIN WAS TIRED AND WANTED TO GET HOME +IMPERTINENT YOUNG BEGGAR SAID BURGESS +THAT LAST FELLOW YOU HAD OUGHT TO HAVE BEEN TIED UP HIMSELF +AND BEAT ON THE BARS WITH WHITE AND SWEATING HANDS +BUT KIRKLAND KEPT STEADILY ON FOR THE RIVER +WHAT IS HE MORE THAN ANYBODY ELSE SAID THE WRETCHED MAN TO HIMSELF AS HE HUGGED HIS MISERY CLOSE +WHAT DOES HE CARE CARE +WHEN THE MUSTER BELL RANG AND THE GANG BROKE UP RUFUS DAWES ON HIS SILENT WAY TO HIS SEPARATE CELL OBSERVED A NOTABLE CHANGE OF CUSTOM IN THE DISPOSITION OF THE NEW CONVICT +THERE'S MORE TROUBLE WITH YOU BLOODY ARISTOCRATS THAN ENOUGH LIE QUIET +VERY GOOD YOUR HONOUR SAYS TROKE +MUST STOP THAT FIFTY LASHES TROKE +OH MISTER NORTH SAYS KIRKLAND WHY DID YOU STOP ME +HAVE YOU EVER BEEN IN THAT THAT PLACE I WAS IN LAST NIGHT ASKED KIRKLAND +A PRISONER REFRACTORY YOUR REVERENCE SAID THE WATCHMAN +HOLD ON TO ME MISS NANCY SAID THE GIANT I'M BIG ENOUGH TO CARRY DOUBLE +IT'S HARD FOR SUCH YOUNG UNS +KIRKLAND JUMPED FOR THE JETTY MISSED HIS FOOTING AND FELL INTO THE ARMS OF THE CHAPLAIN +LET HIM OUT CRIED NORTH AGAIN STAMPING HIS FOOT +WANTS TO COME OUT MISTER NORTH +DO HIM GOOD CURSE HIM +ABOUT DAWN THE NEXT MORNING MISTER NORTH WHO AMONGST OTHER VAGARIES NOT APPROVED OF BY HIS BISHOP HAD A HABIT OF PROWLING ABOUT THE PRISON AT UNOFFICIAL HOURS WAS ATTRACTED BY A DISPUTE AT THE DOOR OF THE DORMITORY +I WON'T HAVE MY MEN KNOCKED UP WITH FLOGGING THESE RASCALS +JUST AS HE REACHED IT HOWEVER THE FIGURE OF MISTER NORTH ROSE FROM BEHIND A PILE OF STONES +IF YOU FALL WE MUST FALL OVER YOU AND THEN YOU'RE DONE FOR +I ORDER YOU SIR NORTH CRIED INDIGNANT +I'M NOT TO GO IN THERE SAYS THE EX BANK CLERK DRAWING BACK IN DISMAY FROM THE CLOUD OF FOUL FACES WHICH LOWERED UPON HIM +YOU CAN GUESS WHAT THAT UNHAPPY BOY HAS SUFFERED +HE HAD BEEN A CLERK IN A BANKING HOUSE AND WAS TRANSPORTED FOR EMBEZZLEMENT THOUGH BY SOME GRAVE DOUBTS AS TO HIS GUILT WERE ENTERTAINED +HE HAD HARDLY UTTERED THE WORDS WHEN THE BOY FLUNG HIMSELF BENEATH THE LOG +KIRKLAND GHASTLY PALE BLEEDING WITH HIS WOOLLEN SHIRT TORN AND HIS BLUE EYES WIDE OPEN WITH TERROR WAS CLINGING TO THE BARS +VERY SORRY YOUR REVERENCE BUT YOUR REVERENCE KNOWS THAT I DAREN'T DO SUCH A THING +YOUR COUSIN THE WILD CONVOLVULUS WHOM I LEFT IN THE FIELDS THIS MORNING DOES NO SUCH THING I ASSURE YOU +DID THAT LOVELY CREATURE SUPPOSE THAT NATURE WHO HAD DONE SO MUCH FOR HER THAT THE FAME OF HER BEAUTY EXTENDED THROUGHOUT THE WORLD HAD YET LEFT HER SO WEAK AND FEEBLE THAT SHE COULD NOT SUPPORT HERSELF IN THE POSITION MOST CALCULATED TO GIVE HER EASE AND PLEASURE +MY YOUNG PLANTS REQUIRE HEAT OR THEY WOULD NOT LIVE AND THE POTS WE ARE KEPT IN PROTECT US FROM THOSE CRUEL WIRE WORMS WHO DELIGHT TO DESTROY OUR ROOTS +THEN THE WIND TOOK ANOTHER FROLIC ROUND THE GARDEN AND MADE UP TO THE LARGE WHITE LILY INTO WHOSE REFINED EAR HE WHISPERED A DOUBT AS TO THE NECESSITY OR ADVANTAGE OF HER THICK POWERFUL STEM BEING PROPPED UP AGAINST A STUPID UGLY STICK +HE REALLY GRIEVED TO SEE IT +YOU SURELY CANNOT SUPPOSE 
THAT IN A NATURAL STATE YOU WOULD BE FORCED TO CLIMB REGULARLY UP ONE TALL BARE STICK SUCH AS I SEE YOU UPON NOW +STILL THE ROSE TREE STOOD OUT THAT THERE MUST BE SOME GREAT ADVANTAGES IN A GARDENER'S CARE FOR SHE COULD NOT PRETEND TO BE IGNORANT OF HER OWN SUPERIORITY TO ALL HER WILD RELATIONS IN THE WOODS +MAKING A SORT OF EDDYING CIRCUIT ROUND THE GARDEN HE KNOCKED OVER THE CONVOLVULUS POLE TORE THE STRIPS FROM THE STICK THAT HELD UP THE WHITE LILY LOOSED ALL THE CARNATION FLOWERS FROM THEIR FASTENINGS BROKE THE ROSE TREE DOWN AND LEVELLED THE SWEET PEAS TO THE GROUND +I AM NOT THINKING ABOUT THE GARDEN MAMMA REPLIED THE YOUNG GIRL WITHOUT LIFTING UP HER FACE WE CAN PLANT NEW FLOWERS AND TIE UP EVEN SOME OF THESE AFRESH +BUT FOR THE SIGHT THAT AWAITED HIM HE WAS NOT PREPARED AT ALL +WHY NOT ALLOW YOUR SILVER TUFTS TO LUXURIATE IN A NATURAL MANNER +WHAT A FUSS IS MADE ABOUT YOU MY DEAR LITTLE FRIENDS +INDEED NOT A FLOWER ESCAPED HIS MISCHIEVOUS SUGGESTIONS +MEANWHILE HOW FARED IT WITH THE FLOWERS +IN THIS POSITION SHE REMAINED UNTIL A GENTLE HAND WAS LAID UPON HER SHOULDER +THE MISTRESS HAD RETURNED AND THE YOUNG LADY WAS WITH HER AND HURRIED AT ONCE TO HER FAVOURITE GARDEN +ECHOED THE FLOWERS TREMULOUSLY AS WITH A SORT OF FEARFUL PLEASURE THEY AWAITED HIS APPROACH +WEEDS MEANWHILE SPRANG UP AND A DREARY CONFUSION REIGNED IN THE ONCE ORDERLY AND BRILLIANT LITTLE GARDEN +BEFORE THE DAY CLOSED THE GARDENER CAME WHISTLING FROM HIS FARM WORK TO LOOK OVER HIS PRETTY CHARGES +OH THAT SHE WERE ONCE MORE CLIMBING UP THE FRIENDLY FIR POLE +THE HONEYSUCKLE ESCAPED NO BETTER AND THE CARNATION WAS READY TO DIE OF VEXATION AT FINDING THAT HER COVETED FREEDOM HAD LEVELLED HER TO THE DIRT +GO ON DOWN THE MOUNTAIN SAID MERCURY AND AS YOU GO CAST THE BONES OF YOUR MOTHER OVER YOUR SHOULDERS BEHIND YOU AND WITH THESE WORDS HE LEAPED INTO THE AIR AND WAS SEEN NO MORE +WE SHOULD LIKE ABOVE ALL THINGS SAID DEUCALION TO SEE THIS LAND FULL OF PEOPLE ONCE MORE FOR WITHOUT NEIGHBORS AND FRIENDS THE WORLD IS A VERY LONELY PLACE INDEED +SURELY I DO NOT KNOW SAID DEUCALION +THE DAY IS COMING SAID PROMETHEUS WHEN JUPITER WILL SEND A FLOOD TO DESTROY MANKIND FROM THE EARTH +BUT MEN KEPT ON FIGHTING AND ROBBING EVEN WHILE THE RAIN WAS POURING DOWN AND THE SEA WAS COMING UP OVER THE LAND +AFTER JUPITER HAD BOUND PROMETHEUS ON MOUNT CAUCASUS AND HAD SENT DISEASES AND CARES INTO THE WORLD MEN BECAME VERY VERY WICKED +THESE MEN HE SAID TO HIS MIGHTY COMPANY ARE NOTHING BUT A SOURCE OF TROUBLE +WHAT DID HE MEAN ASKED PYRRHA +IS THERE ANYTHING THAT YOU WISH HE ASKED +BUT DEUCALION AND PYRRHA WERE VERY SAD FOR THEY KNEW THAT THEY WERE THE ONLY PERSONS WHO WERE LEFT ALIVE IN ALL THE LAND +NO ONE BUT DEUCALION THE SON OF PROMETHEUS WAS READY FOR SUCH A STORM +WHEN AT LAST THEY REACHED THE PLAIN THEY FOUND THEMSELVES AT THE HEAD OF A NOBLE COMPANY OF HUMAN BEINGS ALL EAGER TO SERVE THEM +IN THOSE VERY EARLY TIMES THERE WAS A MAN NAMED DEUCALION AND HE WAS THE SON OF PROMETHEUS +ONE OF THE YOUNG FAIRIES OVERHEARING HER AND FANCYING SHE MIGHT WORK SOME MISCHIEF TO THE LITTLE BABY WENT AND HID HERSELF BEHIND THE HANGINGS IN THE HALL SO AS TO BE ABLE TO HAVE THE LAST WORD AND UNDO ANY HARM THE OLD FAIRY MIGHT WISH TO WORK +HE TURNED TO SHOW THEM THE CASTLE BUT BEHOLD +HE PASSED THROUGH ONE APARTMENT AFTER ANOTHER WHERE WERE LADIES AND GENTLEMEN ASLEEP IN THEIR CHAIRS OR STANDING +THEY TALKED FOR FOUR HOURS AND HAD NOT THEN SAID HALF THAT WAS IN THEIR HEADS TO SAY +MEANWHILE ALL THE REST OF THE PEOPLE IN THE CASTLE HAD BEEN WAKENED 
AT THE SAME MOMENT AS THE PRINCESS AND THEY WERE NOW EXTREMELY HUNGRY +HE ENTERED THE GUARD ROOM THERE THE GUARDS STOOD DRAWN UP IN LINE WITH CARBINES AT THEIR SHOULDERS BUT THEY WERE SOUND ASLEEP +NOW FIFTEEN YEARS AFTER THE PRINCESS WAS BORN SHE WAS WITH THE KING AND QUEEN AT ONE OF THEIR CASTLES AND AS SHE WAS RUNNING ABOUT BY HERSELF SHE CAME TO A LITTLE CHAMBER AT THE TOP OF A TOWER AND THERE SAT AN HONEST OLD WOMAN SPINNING FOR SHE HAD NEVER HEARD OF THE KING'S EDICT +THE YOUNG PRINCE AT THESE WORDS FELT HIMSELF ON FIRE +THE TURN OF THE OLD FAIRY HAD NOW COME AND SHE DECLARED WHILE HER HEAD SHOOK WITH MALICE THAT THE PRINCESS SHOULD PIERCE HER HAND WITH A SPINDLE AND DIE OF THE WOUND +HE ENTERED A LARGE FORECOURT AND STOOD STILL WITH AMAZEMENT AND AWE +SHE HAD NO SOONER TAKEN UP THE SPINDLE THAN BEING HASTY AND CARELESS SHE PIERCED HER HAND WITH THE POINT OF IT AND FAINTED AWAY +THE VIOLINS AND HAUT BOYS PLAYED OLD BUT EXCELLENT PIECES OF MUSIC AND AFTER SUPPER TO LOSE NO TIME THE GRAND ALMONER MARRIED THE ROYAL LOVERS IN THE CHAPEL OF THE CASTLE +WHEN AT LAST THE QUEEN GAVE BIRTH TO A DAUGHTER THE KING WAS SO OVERJOYED THAT HE GAVE A GREAT CHRISTENING FEAST THE LIKE OF WHICH HAD NEVER BEFORE BEEN KNOWN +SCARCELY HAD HE COME TO THE WOOD WHEN ALL THE TREES AND THORNS WHICH HAD MADE SUCH AN IMPENETRABLE THICKET OPENED ON ONE SIDE AND THE OTHER TO OFFER HIM A PATH +THE LADY IN WAITING BECAME VERY IMPATIENT AND AT LENGTH ANNOUNCED TO THE PRINCESS THAT THEY ALL WAITED FOR HER +THEN THE PRINCE TOOK THE PRINCESS BY THE HAND SHE WAS DRESSED IN GREAT SPLENDOUR BUT HE DID NOT HINT THAT SHE LOOKED AS HE HAD SEEN PICTURES OF HIS GREAT GRANDMOTHER LOOK HE THOUGHT HER ALL THE MORE CHARMING FOR THAT +HE KNEW THAT SHE WOULD NOT AWAKE FOR A HUNDRED YEARS +ONE SAID IT WAS AN ENCHANTED CASTLE ANOTHER THAT WITCHES LIVED THERE BUT MOST BELIEVED THAT IT WAS OCCUPIED BY A GREAT OGRE WHICH CARRIED THITHER ALL THE CHILDREN HE COULD CATCH AND ATE THEM UP ONE AT A TIME FOR NOBODY COULD GET AT HIM THROUGH THE WOOD +BUT THE FACES OF THE MEN WERE ROSY AND THE GOBLETS BY THEM HAD A FEW DROPS OF WINE LEFT +IT IS TRUE I CANNOT ENTIRELY UNDO WHAT MY ELDER HAS DONE +IN APPROACHING IT ITS SUSPICIOUS LOOKING YELLOW SPOTTED HOOD AND WATCHFUL ATTITUDE WILL BE LIKELY TO MAKE YOU GO CAUTIOUSLY THROUGH THE BOG WHERE IT STANDS AS IF YOU WERE APPROACHING A DANGEROUS SNAKE +YET STRANGE TO SAY THERE ARE DAYS EVEN HERE SOMEWHAT DULL LOOKING WHEN THE MOUNTAIN SEEMS UNCOMMUNICATIVE SENDING OUT NO APPRECIABLE INVITATION AS IF NOT AT HOME +ASPLENIUM EPILOBIUM HEUCHERA HAZEL DOGWOOD AND ALDER MAKE A LUXURIOUS FRINGE AND SETTING AND THE FORESTS OF DOUGLAS SPRUCE ALONG THE BANKS ARE THE FINEST I HAVE EVER SEEN IN THE SIERRA +PERHAPS THE PROFESSION OF DOING GOOD MAY BE FULL BUT EVERY BODY SHOULD BE KIND AT LEAST TO HIMSELF +THEIR LONG MASSIVE EARS GIVE THEM A VERY STRIKING APPEARANCE +EVERY CRYSTAL DANCES RESPONSIVE TO THE TOUCHES OF THE SUN AND CURRENTS OF SAP IN THE GROWING CELLS OF ALL THE VEGETATION ARE EVER IN A VITAL WHIRL AND RUSH AND THOUGH MANY FEET AND WINGS ARE FOLDED HOW MANY ARE ASTIR +THE VIVID GREEN OF THE BOULDERS BENEATH THE WATER IS VERY STRIKING AND COLORS THE ENTIRE STREAM WITH THE EXCEPTION OF THE PORTIONS BROKEN INTO FOAM +THE GREAT WILDS OF OUR COUNTRY ONCE HELD TO BE BOUNDLESS AND INEXHAUSTIBLE ARE BEING RAPIDLY INVADED AND OVERRUN IN EVERY DIRECTION AND EVERYTHING DESTRUCTIBLE IN THEM IS BEING DESTROYED +BUT IT IS FAR BETTER TO GO AFOOT +GO QUIETLY ALONE NO HARM WILL BEFALL YOU +AS THE LIFE BLOOD OF THE 
LANDSCAPES THE BEST OF THE WILDERNESS COMES TO THEIR BANKS AND NOT ONE DULL PASSAGE IS FOUND IN ALL THEIR EVENTFUL HISTORIES +THUS THE SHASTA RIVER ISSUES FROM A LARGE LAKE LIKE SPRING IN SHASTA VALLEY AND ABOUT TWO THIRDS OF THE VOLUME OF THE MC CLOUD GUSHES FORTH IN A GRAND SPRING ON THE EAST SIDE OF THE MOUNTAIN A FEW MILES BACK FROM ITS IMMEDIATE BASE +WHILE TRAVELING WITH A COMPANY OF HUNTERS I SAW ABOUT FIFTY IN ONE FLOCK +SHOULD THE VOLUME OF THE STREAM WHERE YOU STRIKE IT SEEM SMALL THEN YOU WILL KNOW THAT YOU ARE ABOVE THE SPRING IF LARGE NEARLY EQUAL TO ITS VOLUME AT ITS CONFLUENCE WITH THE PITT RIVER THEN YOU ARE BELOW IT AND IN EITHER CASE HAVE ONLY TO FOLLOW THE RIVER UP OR DOWN UNTIL YOU COME TO IT +BUT NEITHER THE GLORIFIED WOODS ON THE ONE HAND NOR THE LAKE ON THE OTHER COULD AT FIRST HOLD THE EYE +THEY ARE BROAD RUGGED CREVASSED CLOUDLIKE MASSES OF DOWN GRINDING ICE POURING FORTH STREAMS OF MUDDY WATER AS MEASURES OF THE WORK THEY ARE DOING IN SCULPTURING THE ROCKS BENEATH THEM VERY UNLIKE THE LONG MAJESTIC GLACIERS OF ALASKA THAT RIVERLIKE GO WINDING DOWN THE VALLEYS THROUGH THE FORESTS TO THE SEA +MOUNT BREMER IS THE MOST NOTED STRONGHOLD OF THE SHEEP IN THE WHOLE SHASTA REGION +TRACING THIS WILD CHANGING CHANNEL GORGE GULLY OR CANYON THE SECTIONS WILL SHOW MOUNT SHASTA AS A HUGE PALIMPSEST CONTAINING THE RECORDS LAYER UPON LAYER OF STRANGELY CONTRASTED EVENTS IN ITS FIERY ICY HISTORY +SHASTA RAMBLES AND MODOC MEMORIES +ONE BLANKET WILL BE ENOUGH TO CARRY OR YOU MAY FOREGO THE PLEASURE AND BURDEN ALTOGETHER AS WOOD FOR FIRES IS EVERYWHERE ABUNDANT +UNDER CERTAIN CONDITIONS YOU MAY HEAR THE ROAR OF THE WATER RUSHING FROM THE ROCK AT A DISTANCE OF HALF A MILE OR EVEN MORE OR YOU MAY NOT HEAR IT UNTIL WITHIN A FEW RODS +THE BIG MEADOWS LIE NEAR THE FOOT OF LASSEN'S BUTTE A BEAUTIFUL SPACIOUS BASIN SET IN THE HEART OF THE RICHLY FORESTED MOUNTAINS SCARCELY SURPASSED IN THE GRANDEUR OF ITS SURROUNDINGS BY TAHOE +THEN DARKNESS LIKE DEATH +AT SUCH TIME ITS HEIGHT SEEMS MUCH LESS AS IF CROUCHING AND WEARY IT WERE TAKING REST +THE ASCENT OF LASSEN'S BUTTE IS AN EASY WALK AND THE VIEWS FROM THE SUMMIT ARE EXTREMELY TELLING +SLIGHT RAINSTORMS ARE LIKELY TO BE ENCOUNTERED IN A TRIP ROUND THE MOUNTAIN BUT ONE MAY EASILY FIND SHELTER BENEATH WELL THATCHED TREES THAT SHED THE RAIN LIKE A ROOF +ONLY A LITTLE FOOD WILL BE REQUIRED +ARCTIC BEAUTY AND DESOLATION WITH THEIR BLESSINGS AND DANGERS ALL MAY BE FOUND HERE TO TEST THE ENDURANCE AND SKILL OF ADVENTUROUS CLIMBERS BUT FAR BETTER THAN CLIMBING THE MOUNTAIN IS GOING AROUND ITS WARM FERTILE BASE ENJOYING ITS BOUNTIES LIKE A BEE CIRCLING AROUND A BANK OF FLOWERS +THE LONG GRAY SLOPES LEADING UP TO THE GLACIER SEEM REMARKABLY SMOOTH AND UNBROKEN +TWO OR THREE MILES FARTHER ON IS THE MAIN STRONGHOLD OF THE MODOCS HELD BY THEM SO LONG AND DEFIANTLY AGAINST ALL THE SOLDIERS THAT COULD BE BROUGHT TO THE ATTACK +THUS ONE SAUNTERS ON AND ON IN THE GLORIOUS RADIANCE IN UTTER PEACE AND FORGETFULNESS OF TIME +HERE YOU STRIKE THE OLD EMIGRANT ROAD WHICH LEADS OVER THE LOW DIVIDE TO THE EASTERN SLOPES OF THE MOUNTAIN +THE MULE DEER ARE NEARLY AS HEAVY +EVERY LANDSCAPE LOW AND HIGH SEEMS DOOMED TO BE TRAMPLED AND HARRIED +MOST OF THE DRAINAGE OF THE GLACIER VANISHES AT ONCE IN THE POROUS ROCKS TO REAPPEAR IN SPRINGS IN THE DISTANT VALLEY AND IT IS ONLY IN TIME OF FLOOD THAT THE CHANNEL CARRIES MUCH WATER THEN THERE ARE SEVERAL FINE FALLS IN THE GORGE SIX HUNDRED FEET OR MORE IN HEIGHT +LARGE FLOCKS DWELL HERE FROM YEAR TO YEAR WINTER AND SUMMER 
DESCENDING OCCASIONALLY INTO THE ADJACENT SAGE PLAINS AND LAVA BEDS TO FEED BUT EVER READY TO TAKE REFUGE IN THE JAGGED CRAGS OF THEIR MOUNTAIN AT EVERY ALARM +REGAINING THE LOW GROUND AT THE BASE OF THE MOUNTAIN AND HOLDING ON IN YOUR GRAND ORBIT YOU PASS THROUGH A BELT OF JUNIPER WOODS CALLED THE CEDARS TO SHEEP ROCK AT THE FOOT OF THE SHASTA PASS +IT IS LINED WITH EMERALD ALGAE AND MOSSES AND SHADED WITH ALDER WILLOW AND THORN BUSHES WHICH GIVE IT A FINE SETTING +TRACING RIVERS TO THEIR FOUNTAINS MAKES THE MOST CHARMING OF TRAVELS +A THOUSAND THOUSAND VOICES ARE HEARD BUT SO FINELY BLENDED THEY SEEM A PART OF THE NIGHT ITSELF AND MAKE A DEEPER SILENCE +THE LOFTY ICY SHASTA TOWERING HIGH ABOVE ALL SEEMS BUT AN HOUR'S WALK FROM YOU THOUGH THE DISTANCE IN AN AIR LINE IS ABOUT SIXTY MILES +THEN THE SHINING OF THE WET LEAVES IS DELIGHTFUL AND THE STEAMY FRAGRANCE AND THE BURST OF BIRD SONG FROM A MULTITUDE OF THRUSHES AND FINCHES AND WARBLERS THAT HAVE NESTS IN THE CHAPARRAL +THEN FELL THE GLOAMING MAKING EVERYTHING STILL MORE FORBIDDING AND MYSTERIOUS +TRACING THE MC CLOUD TO ITS HIGHEST SPRINGS AND OVER THE DIVIDE TO THE FOUNTAINS OF FALL RIVER NEAR FORT CROOK THENCE DOWN THAT RIVER TO ITS CONFLUENCE WITH THE PITT ON FROM THERE TO THE VOLCANIC REGION ABOUT LASSEN'S BUTTE THROUGH THE BIG MEADOWS AMONG THE SOURCES OF THE FEATHER RIVER AND DOWN THROUGH FORESTS OF SUGAR PINE TO THE FERTILE PLAINS OF CHICO THIS IS A GLORIOUS SAUNTER AND IMPOSES NO HARDSHIP +IT IS THREE OR FOUR MILES LONG AND TERMINATES AT AN ELEVATION OF ABOUT NINE THOUSAND FIVE HUNDRED FEET ABOVE SEA LEVEL IN MORAINE SPRINKLED ICE CLIFFS SIXTY FEET HIGH +IN SETTING OUT FROM STRAWBERRY VALLEY BY BEARING OFF TO THE NORTHWESTWARD A FEW MILES YOU MAY SEE +THE DUCKS LESS WARY KEPT THEIR PLACES MERELY SWIMMING IN AND OUT THROUGH OPENINGS IN THE RUSHES RIPPLING THE GLASSY WATER AND RAISING SPANGLES IN THEIR WAKE +BUT DON QUIXOTE WHOM HIS THOUGHTS FAR MORE THAN HUNGER KEPT AWAKE COULD NOT CLOSE AN EYE AND ROAMED IN FANCY TO AND FRO THROUGH ALL SORTS OF PLACES +GIVE ME MY HORSE AND ARMS AND WAIT FOR ME HERE I WILL GO IN QUEST OF THIS KNIGHT AND DEAD OR ALIVE I WILL MAKE HIM KEEP HIS WORD PLIGHTED TO SO GREAT BEAUTY +DON QUIXOTE WAS ON FOOT WITH HIS HORSE UNBRIDLED AND HIS LANCE LEANING AGAINST A TREE AND IN SHORT COMPLETELY DEFENCELESS HE THOUGHT IT BEST THEREFORE TO FOLD HIS ARMS AND BOW HIS HEAD AND RESERVE HIMSELF FOR A MORE FAVOURABLE OCCASION AND OPPORTUNITY +SEEING THIS SANCHO GOT UP AND GRAPPLING WITH HIS MASTER HE GRIPPED HIM WITH ALL HIS MIGHT IN HIS ARMS GIVING HIM A TRIP WITH THE HEEL STRETCHED HIM ON THE GROUND ON HIS BACK AND PRESSING HIS RIGHT KNEE ON HIS CHEST HELD HIS HANDS IN HIS OWN SO THAT HE COULD NEITHER MOVE NOR BREATHE +NOBODY NEED HAVE ANY DOUBT ABOUT THAT SAID SANCHO FOR MY MASTER HAS A VERY HAPPY KNACK OF MATCHMAKING IT'S NOT MANY DAYS SINCE HE FORCED ANOTHER MAN TO MARRY WHO IN THE SAME WAY BACKED OUT OF HIS PROMISE TO ANOTHER MAIDEN AND IF IT HAD NOT BEEN FOR HIS PERSECUTORS THE ENCHANTERS CHANGING THE MAN'S PROPER SHAPE INTO A LACQUEY'S THE SAID MAIDEN WOULD NOT BE ONE THIS MINUTE +WHAT ARE YOU TALKING ABOUT MAN +THE WOUNDED GENTLEMAN OPENED HIS ALL BUT CLOSED EYES AND RECOGNISING CLAUDIA SAID I SEE CLEARLY FAIR AND MISTAKEN LADY THAT IT IS THOU THAT HAST SLAIN ME A PUNISHMENT NOT MERITED OR DESERVED BY MY FEELINGS TOWARDS THEE FOR NEVER DID I MEAN TO NOR COULD I WRONG THEE IN THOUGHT OR DEED +ON PERCEIVING THIS CLAUDIA WHEN SHE HAD CONVINCED HERSELF THAT HER BELOVED HUSBAND WAS NO MORE RENT THE AIR WITH 
HER SIGHS AND MADE THE HEAVENS RING WITH HER LAMENTATIONS SHE TORE HER HAIR AND SCATTERED IT TO THE WINDS SHE BEAT HER FACE WITH HER HANDS AND SHOWED ALL THE SIGNS OF GRIEF AND SORROW THAT COULD BE CONCEIVED TO COME FROM AN AFFLICTED HEART +DOST THOU REVOLT AGAINST THY MASTER AND NATURAL LORD +DON QUIXOTE DID SO AND ASKED HIM WHAT HAD HAPPENED TO HIM AND WHAT HE WAS AFRAID OF +ONE OF THE SQUIRES OBSERVED IN HIS MIXTURE OF GASCON AND CATALAN THIS CAPTAIN OF OURS WOULD MAKE A BETTER FRIAR THAN HIGHWAYMAN IF HE WANTS TO BE SO GENEROUS ANOTHER TIME LET IT BE WITH HIS OWN PROPERTY AND NOT OURS +SANCHO REPLIED THAT ALL THE TREES WERE FULL OF MEN'S FEET AND LEGS +WHAT LED ME INTO IT WAS A CERTAIN THIRST FOR VENGEANCE WHICH IS STRONG ENOUGH TO DISTURB THE QUIETEST HEARTS +DOST THOU RISE AGAINST HIM WHO GIVES THEE HIS BREAD +THE CAPTAINS SHOWED PLAINLY THE CONCERN THEY FELT THE REGENT'S LADY WAS DOWNCAST AND THE PILGRIMS DID NOT AT ALL ENJOY SEEING THEIR PROPERTY CONFISCATED +O HUSBAND WHOSE UNHAPPY FATE IN BEING MINE HATH BORNE THEE FROM THE MARRIAGE BED TO THE GRAVE +MASTER AND MAN DISMOUNTED FROM THEIR BEASTS AND AS SOON AS THEY HAD SETTLED THEMSELVES AT THE FOOT OF THE TREES SANCHO WHO HAD HAD A GOOD NOONTIDE MEAL THAT DAY LET HIMSELF WITHOUT MORE ADO PASS THE GATES OF SLEEP +CRUEL RECKLESS WOMAN SHE CRIED HOW EASILY WERT THOU MOVED TO CARRY OUT A THOUGHT SO WICKED +CLAUDIA WOULD NOT ON ANY ACCOUNT ALLOW HIM TO ACCOMPANY HER AND THANKING HIM FOR HIS OFFERS AS WELL AS SHE COULD TOOK LEAVE OF HIM IN TEARS +THE SERVANTS WEPT CLAUDIA SWOONED AWAY AGAIN AND AGAIN AND THE WHOLE PLACE SEEMED A FIELD OF SORROW AND AN ABODE OF MISFORTUNE +AND IF YOU HAVE ANY DESIRE TO SHORTEN THE JOURNEY AND PUT YOURSELF EASILY IN THE WAY OF SALVATION COME WITH ME AND I WILL SHOW YOU HOW TO BECOME A KNIGHT ERRANT A CALLING WHEREIN SO MANY HARDSHIPS AND MISHAPS ARE ENCOUNTERED THAT IF THEY BE TAKEN AS PENANCES THEY WILL LODGE YOU IN HEAVEN IN A TRICE +HE SAW THAT HIS SQUIRES FOR SO THEY CALL THOSE WHO FOLLOW THAT TRADE WERE ABOUT TO RIFLE SANCHO PANZA BUT HE ORDERED THEM TO DESIST AND WAS AT ONCE OBEYED SO THE GIRDLE ESCAPED +THE REGENT'S LADY ORDERED ONE OF HER SERVANTS TO GIVE THE EIGHTY CROWNS THAT HAD BEEN ASSESSED AS HER SHARE AT ONCE FOR THE CAPTAINS HAD ALREADY PAID DOWN THEIR SIXTY +SANCHO SAID THEY HAD BUT THAT THREE KERCHIEFS THAT WERE WORTH THREE CITIES WERE MISSING +HOW NOW TRAITOR EXCLAIMED DON QUIXOTE +HE TREMBLED WITH FEAR AND MADE FOR ANOTHER TREE WHERE THE VERY SAME THING HAPPENED TO HIM AND HE FELL A SHOUTING CALLING UPON DON QUIXOTE TO COME AND PROTECT HIM +THEY WERE ALL TAKEN ABACK AND NOT ONE OF THEM DARED TO UTTER A WORD SUCH DEFERENCE DID THEY PAY HIM +THEY MADE HASTE TO OVERTAKE THEM WHICH AS THE PARTY MOVED SLOWLY THEY WERE ABLE TO DO WITH EASE +DULCINEA IS PERISHING THOU ART LIVING ON REGARDLESS I AM DYING OF HOPE DEFERRED THEREFORE UNTRUSS THYSELF WITH A GOOD WILL FOR MINE IT IS HERE IN THIS RETIRED SPOT TO GIVE THEE AT LEAST TWO THOUSAND LASHES +AT THIS INSTANT ONE OR TWO OF THOSE SQUIRES WHO WERE POSTED AS SENTINELS ON THE ROADS TO WATCH WHO CAME ALONG THEM AND REPORT WHAT PASSED TO THEIR CHIEF CAME UP AND SAID SENOR THERE IS A GREAT TROOP OF PEOPLE NOT FAR OFF COMING ALONG THE ROAD TO BARCELONA +AND NOW THE SQUIRES DESPATCHED TO MAKE THE PRIZE CAME UP BRINGING WITH THEM TWO GENTLEMEN ON HORSEBACK TWO PILGRIMS ON FOOT AND A COACH FULL OF WOMEN WITH SOME SIX SERVANTS ON FOOT AND ON HORSEBACK IN ATTENDANCE ON THEM AND A COUPLE OF MULETEERS WHOM THE GENTLEMEN HAD WITH THEM +AT ONE MOMENT IT 
SEEMED TO HIM THAT HE WAS IN THE CAVE OF MONTESINOS AND SAW DULCINEA TRANSFORMED INTO A COUNTRY WENCH SKIPPING AND MOUNTING UPON HER SHE ASS AGAIN THAT THE WORDS OF THE SAGE MERLIN WERE SOUNDING IN HIS EARS SETTING FORTH THE CONDITIONS TO BE OBSERVED AND THE EXERTIONS TO BE MADE FOR THE DISENCHANTMENT OF DULCINEA +WHO IS TOUCHING ME AND UNTRUSSING ME +HE WAS MOUNTED UPON A POWERFUL HORSE AND HAD ON A COAT OF MAIL WITH FOUR OF THE PISTOLS THEY CALL PETRONELS IN THAT COUNTRY AT HIS WAIST +SANCHO ROSE AND REMOVED SOME DISTANCE FROM THE SPOT BUT AS HE WAS ABOUT TO PLACE HIMSELF LEANING AGAINST ANOTHER TREE HE FELT SOMETHING TOUCH HIS HEAD AND PUTTING UP HIS HANDS ENCOUNTERED SOMEBODY'S TWO FEET WITH SHOES AND STOCKINGS ON THEM +DON QUIXOTE GAVE HIS PROMISE AND SWORE BY THE LIFE OF HIS THOUGHTS NOT TO TOUCH SO MUCH AS A HAIR OF HIS GARMENTS AND TO LEAVE HIM ENTIRELY FREE AND TO HIS OWN DISCRETION TO WHIP HIMSELF WHENEVER HE PLEASED +IT IS NOT TRUE THEN SAID CLAUDIA THAT THOU WERT GOING THIS MORNING TO MARRY LEONORA THE DAUGHTER OF THE RICH BALVASTRO +HE SAW ME HE PAID COURT TO ME I LISTENED TO HIM AND UNKNOWN TO MY FATHER I LOVED HIM FOR THERE IS NO WOMAN HOWEVER SECLUDED SHE MAY LIVE OR CLOSE SHE MAY BE KEPT WHO WILL NOT HAVE OPPORTUNITIES AND TO SPARE FOR FOLLOWING HER HEADLONG IMPULSES +CLAUDIA TOLD HIM SHE MEANT TO GO TO A MONASTERY OF WHICH AN AUNT OF HERS WAS ABBESS WHERE SHE INTENDED TO PASS HER LIFE WITH A BETTER AND EVERLASTING SPOUSE +SAID ONE OF THE BYSTANDERS I HAVE GOT THEM AND THEY ARE NOT WORTH THREE REALS +IN A WORD HE PLEDGED HIMSELF TO BE MINE AND I PROMISED TO BE HIS WITHOUT CARRYING MATTERS ANY FURTHER +ONE OF HIS WAITERS PHIL TYSON WAS ONE OF THE EARLIER ONES TO GO BACK INTO THE BURNED DISTRICT TO BEGIN BUSINESS AND HE OPENED A RESTAURANT CALLED THE DEL MONTE IN POWELL STREET NEAR MARKET BUT IT WAS TOO EARLY FOR SUCCESS AND CLOSED AFTER A SHORT CAREER +HERE THERE IS ALWAYS GOOD MUSIC AND FOOD WELL COOKED AND WELL SERVED AND ALWAYS A LIVELY CROWD DURING THE LUNCHEON DINNER AND AFTER THEATRE HOURS THE ROOM IS NOT LARGE BUT ITS DIMENSIONS ARE GREATLY MAGNIFIED OWING TO THE COVERING OF MIRRORS WHICH LINE THE WALLS +OF COURSE THESE ARE NOT THE ENTIRE MENUS BUT OF ALL THE WELL PREPARED DISHES THESE ARE THEIR BEST +TO THE PICKLE ADD TWO LARGE ONIONS CUT IN QUARTERS TWO FRESH CARROTS AND ABOUT ONE OUNCE OF MIXED WHOLE ALLSPICE BLACK PEPPERS CLOVES AND BAY LEAVES +STRAIN THE SAUCE THROUGH A FINE COLLANDER AND ADD A FEW RAISINS A PIECE OF HONEY CAKE OR GINGER SNAPS AND THE MEAT OF ONE FRESH TOMATO +THE POODLE DOG HAS A HOTEL ATTACHMENT WHERE ONE MAY GET ROOMS OR FULL APARTMENTS +PUT INTO THE OVEN AGAIN AND COOK FOR HALF AN HOUR BASTING FREQUENTLY WITH THE ORIGINAL BRINE +WE FINALLY GOT HIM TO SELECT THE ONE PRIZED ABOVE ALL OTHERS AND THIS IS WHAT CHEF SCHEILER GAVE US +THE SPECIALTY OF THE HOF BRAU IS ABALONE'S AND THEY HAVE AS A FEATURE THIS SHELL FISH COOKED IN SEVERAL WAYS +THE RESTAURANTS OF THE PRESENT DAY THAT APPROACH NEAREST THE OLD BOHEMIAN RESTAURANTS OF PRE FIRE DAYS OF THE FRENCH CLASS ARE JACK'S IN SACRAMENTO STREET BETWEEN MONTGOMERY AND KEARNY FELIX IN MONTGOMERY STREET BETWEEN CLAY AND WASHINGTON AND THE POODLE DOG BERGEZ FRANKS IN BUSH STREET BETWEEN KEARNY AND GRANT AVENUE +UNDER ORDINARY CIRCUMSTANCES THE ABALONE IS TOUGH AND UNPALATABLE BUT AFTER THE DEFT MANIPULATION OF HERBERT THEY ARE TENDER AND MAKE A FINE DISH EITHER FRIED AS CHOWDER OR A LA NEWBERG +WHEN DONE TAKE THE MEAT OUT OF THE SAUCE +IN EITHER OF THESE RESTAURANTS YOU WILL BE SERVED WITH THE BEST THE 
MARKET AFFORDS COOKED THE RIGHT WAY +THOMPSON OPENED A LARGE RESTAURANT IN O'FARRELL STREET JUST ABOVE FILLMORE AND FOR TWO YEARS OR MORE DID A THRIVING BUSINESS HIS PLACE BEING NOTED FOR ITS GOOD COOKING AND ITS SPLENDID SERVICE +IF YOU KNOW HOW TO ORDER AND DO NOT CARE TO COUNT THE COST WHEN YOU ORDER PROBABLY THE BEST DINNER AT THESE RESTAURANTS CAN BE HAD AT EITHER BLANCO'S OR THE POODLE DOG +PUT IN THE OVEN AND BROWN TO A GOLDEN COLOR +AT THE CORNER OF MARKET AND EDDY STREETS IS THE ODEON DOWN IN A BASEMENT WITH DECORATIONS OF MOST GARISH ORDER +IN ADDITION TO ABALONE'S THE HOF BRAU MAKES A SPECIALTY OF LITTLE OREGON CRAWFISH +THEY ALSO HAVE AS THE CHEF IN CHARGE OF THE ABALONE DISHES HERBERT FORMERLY CHEF FOR ONE OF THE YACHT CLUBS OF THE COAST WHO CLAIMS TO HAVE THE ONLY PROPER RECIPE FOR MAKING ABALONE'S TENDER +HIS PRICES ARE MODERATE AND HIS COOKING AND VIANDS OF THE BEST AND WILL SATISFY THE MOST CRITICAL OF THE GOURMETS +THE FLY TRAP AND CHARLIE'S FASHION THE FIRST IN SUTTER STREET NEAR KEARNY AND THE OTHER IN MARKET NEAR SUTTER SERVE WELL COOKED FOODS ESPECIALLY SOUP SALADS AND FISH +THE CUISINE IS OF THE BEST AND THE CHEFS RANK AT THE TOP OF THEIR ART +SAN FRANCISCO'S CARE FREE SPIRIT WAS FULLY EXEMPLIFIED BEFORE THE ASHES OF THE GREAT FIRE OF NINETEEN O SIX WERE COLD +IT IS AN IDEA THAT IS WORTH WHILE BUT UNFORTUNATELY THE PROPRIETORS DEPEND TOO MUCH ON THE DECORATIVE FEATURE AND TOO LITTLE ON THE FOOD AND HOW THEY SERVE IT +THIS GARISH DISPLAY OF MIRRORS AND ELABORATE DECORATION OF CEILING AND PILLARS GIVES IT THE APPEARANCE OF THE ABODE OF SATURNALIA BUT DECORUM IS THE RULE AMONG THE PATRONS +IT HAS CHANGED FROM WHAT IT WAS IN THE OLD DAYS BUT IS STILL AN EXCELLENT PLACE TO DINE +BUT IF YOU REALLY LOVE GOOD MUSIC MUSIC THAT HAS MELODY AND RHYTHM AND SOOTHING CADENCES GO TO THE HEIDELBERG INN AND LISTEN TO THE CONCERT WHICH IS A FEATURE OF THE PLACE EVERY EVENING +THEN TAKE IT OUT OF THE ROASTING PAN AND PUT IT INTO A CASSEROLE AFTER SPRINKLING IT WITH TWO OUNCES OF FLOUR +THE HOF BRAU HOWEVER IS LESS DISTINCTIVELY GERMAN AS THE GREATER NUMBER OF ITS PATRONS ARE AMERICANS +ONE CAN ALMOST IMAGINE HIMSELF IN ONE OF THE FAMOUS RATHSKELLERS OF OLD HEIDELBERG NOT AT THE SCHLOSS OF COURSE FOR HERE YOU CANNOT LOOK DOWN ON THE WEISER AS IT FLOWS BENEATH THE WINDOWS OF THE GREAT WINE STUBE ON THE HILL +AT THE TWO MENTIONED ONE PAYS FOR THE SURROUNDINGS AS WELL AS FOR THE FOOD AND SOMETIMES THIS IS WORTH PAYING FOR +BOTH SERVE GOOD SPANISH DINNERS AT REASONABLE PRICES +SEASON WITH SALT AND PEPPER AND A LITTLE SUGAR TO TASTE +HERE AS WELL AS IN A NUMBER OF OTHER PLACES ONE CAN WELL APPRECIATE THE COLLOQUIAL DEFINITION OF CABARET +JOHN TAIT IS THE PRESIDING SPIRIT HERE HE HAVING MADE REPUTATION AS CLUB MANAGER AND THEN AS MANAGER OF THE CLIFF HOUSE +IN THIS SAME DISTRICT IS THE MINT IN COMMERCIAL STREET BETWEEN MONTGOMERY AND KEARNY STREETS +CLARETS ARE VALUED FOR THEIR FLAVOR AND FOR THEIR TONIC PROPERTIES +NEVER DRINK ANY HARD LIQUORS SUCH AS WHISKY BRANDY GIN OR COCKTAILS WITH OYSTERS OR CLAMS AS IT IS LIABLE TO UPSET YOU FOR THE REST OF THE EVENING +CHABLIS A WHITE BURGUNDY DRY AND OF AGREEABLE AROMA +HOCHHEIMER A LIGHT PLEASING AND WHOLESOME WINE +DRY AND OF MAGNIFICENT BOUQUET +GERMAN WINES ARE OF LIGHTER CHARACTER AND ARE GENERALLY TERMED RHEIN WINES +CLARET EIGHTEEN NINETY EIGHT AND NINETEEN O FOUR +RHEIN AND MOSELLE EIGHTEEN NINETY THREE +AUSTRIAN BURGUNDY IS ONE OF THE FINEST WINES POSSESSING RICH FLAVOR AND FINE PERFUME OTHER BURGUNDIES ARE +SAUTERNE IS A WHITE BORDEAUX A 
STRONG LUSCIOUS WINE THE BEST KNOWN VARIETIES BEING +LACRIMA CHRISTI A STILL WINE OF EXCELLENT FLAVOR AND BOUQUET +VINTAGE YEARS HAVE MUCH TO DO WITH THE QUALITY OF WINES +WITH SOUP AND FISH SERVE WHITE WINES SUCH AS RHEIN WINE SAUTERNE OR WHITE BURGUNDY +WITH ENTREES SERVE CLARETS OR OTHER RED WINES SUCH AS SWISS BORDEAUX HUNGARIAN OR ITALIAN WINES +POUR MAYONNAISE OVER ALL CHILL AND SERVE +PUT THE PULP INTO A BASIN WITH TWO OUNCES OF MELTED BUTTER TWO TABLESPOONFULS OF LEMON JUICE HALF A POUND OF CHESTNUTS BOILED AND GRATED AND SEASONING OF SALT AND WHITE PEPPER TO TASTE +ASPARAGUS SALAD COOK THE ASPARAGUS IN SALTED WATER DRAIN AND CHILL +THIS DRESSING SHOULD STAND IN THE ICE BOX FOUR OR FIVE HOURS TO BECOME SEASONED +STIR THE SOAKED GELATIN IN WHILE THE CUCUMBER IS HOT +WHEN THICKENED STRAIN AND COOL +TOMATO BASKETS TOMATO BASKETS ARE CHARMING ACCESSORIES FOR HOLDING VEGETABLE SALAD CHICKEN SHRIMPS COLD BEANS ASPARAGUS TIPS SHREDDED CELERY CUCUMBERS CUT IN CUBES AND MINCED PEPPERS +GARNISH DISH THAT DRESSING IS MADE IN WITH A LITTLE GARLIC +BIRDS NEST SALAD HAVE READY AS MANY CRISP LEAVES OF LETTUCE AS MAY BE REQUIRED TO MAKE A DAINTY LITTLE NEST FOR EACH PERSON +CAULIFLOWER MAYONNAISE TAKE COLD BOILED CAULIFLOWER BREAK INTO BRANCHES ADDING SALT PEPPER AND VINEGAR TO SEASON +HANDLES OF WATERCRESS MAY BE ATTACHED TO THESE BASKETS +CELERY AND NUT SALAD CUT ENOUGH CELERY FINE TO MEASURE TWO CUPS ADD ONE CUP OF FINELY SHREDDED OR SHAVED CABBAGE AND ONE AND ONE HALF CUPS OF WALNUT MEATS BROKEN IN SMALL PIECES BUT NOT CHOPPED +SET INTO A COLD PLACE TO CHILL AND BECOME FIRM +SALAD TWO CUPS OF APPLES CUT INTO SMALL PIECES ONE CUP CELERY CUT INTO SMALL PIECES ONE CUP ENGLISH WALNUTS +SURROUND WITH A GARNISH OF COOKED AND DICED CARROTS TURNIPS GREEN PEAS +SERVE ON A LETTUCE LEAF WITH MAYONNAISE DRESSING MADE WITHOUT MUSTARD AND THINNED WITH CREAM +ADD TWO TABLESPOONS THICK SOUR CREAM TWO TABLESPOONS SUGAR A SPRINKLE OF MUSTARD AND HALF CUP OF VINEGAR +SERVE WITH FRENCH DRESSING HIDDEN UNDER THE LEAVES OF THE NEST +CABBAGE SALAD CHOP OR SHAVE FINE HALF A MEDIUM SIZE HEAD OF CABBAGE THAT HAS BEEN LEFT IN COLD WATER UNTIL CRISP THEN DRAIN +BEAT UNTIL THOROUGHLY MIXED POUR OVER THE CABBAGE AND TOSS LIGHTLY UNTIL UNIFORMLY SEASONED +STRAIN AND BOTTLE AND PUT IN ICE BOX SHAKE BEFORE USING EACH TIME +HOW ASKED TAD +CHAPTER FOUR THE FIRST NIGHT IN CAMP +WHO IS THE WRANGLER THIS MORNING ASKED THE FOREMAN GLANCING ABOUT AT HIS MEN +HI THERE HISSED LUMPY FILLED WITH INDIGNATION THAT ANYONE SHOULD ATTEMPT TO MOUNT A PONY FROM THE RIGHT SIDE +IN SPITE OF THEIR HARD COUCHES THE PONY RIDERS SLEPT SOUNDLY EVEN PROFESSOR ZEPPLIN HIMSELF NEVER WAKING THE WHOLE NIGHT THROUGH +STACY GRUMBLED TURNED OVER AND WENT TO SLEEP AGAIN +EVEN IF I CAN'T SING I CAN BEAT THAT +NONE OF YOU WILL BE FIT FOR DUTY TO MORROW +HUMPH GRUNTED CURLEY ADAMS +A WRANGLER'S A WRANGLER ANSWERED BIG FOOT STOLIDLY +OH NO THIS KIND OF A WRANGLER ISN'T LAUGHED THE FOREMAN +GRUB PI LE GRUB PI LE +NOT ON THE RANGE WHY NOT DEMANDED THE BOY +WALTER HAD GONE OUT WITH THE SECOND GUARD AND THE OTHERS HAD GATHERED AROUND THE CAMP FIRE FOR THEIR NIGHTLY STORY TELLING +WE HAD BETTER START THE DRIVE THIS MORNING +THE LADS FOUND THAT A PAIR OF BLANKETS HAD BEEN ASSIGNED TO EACH OF THEM WITH AN ORDINARY WAGON SHEET DOUBLED FOR A TARPAULIN +THE COWBOY DID THIS VERY THING BUT WITHIN AN HOUR HE FOUND HIMSELF ALONE THE OTHERS HAVING TURNED IN ONE BY ONE +STACY BROWN PROVED THE ONLY GRUMBLER IN THE LOT DECLARING THAT HE COULD NOT SLEEP A WINK ON SUCH A BED AS THAT 
+THE PONY DID MOST OF IT ADMITTED THE LAD I JUST GAVE HIM HIS HEAD AND THAT'S ALL THERE WAS TO IT +KEEP A GOING AND IF YOU'RE LUCKY YOU'LL RUN PLUMB INTO THEM WAS THE JEERING ANSWER AS THE SLEEPY COWMEN SPURRED THEIR PONIES ON TOWARD CAMP MUTTERING THEIR DISAPPROVAL OF TAKING ALONG A BUNCH OF BOYS ON A CATTLE DRIVE +ALMOST BEFORE THE ECHOES OF HIS VOICE HAD DIED AWAY A SHRILL VOICE PIPED UP FROM THE TAIL END OF THE CHUCK WAGON +A LOUD LAUGH FOLLOWED AT CHUNKY'S EXPENSE +LUMPY BATES CAME RUNNING TOWARD HIM NOT DARING TO CALL OUT FOR FEAR OF WAKING THE CAMP +YOU WON'T BE SO FAST TO WAKE UP HARD WORKING COWBOYS AFTER THAT I RECKON +THESE THEY SPREAD OUT ON THE GROUND USING BOOTS WRAPPED IN COATS FOR PILLOWS +PONG TELL THE YOUNG GENTLEMEN WHAT WOULD BECOME OF YOU IF YOU WERE TO SERVE BAD MEALS TO THIS OUTFIT OF COWPUNCHERS +THE HORSES OF THE OUTFIT SAVE THOSE THAT WERE ON NIGHT DUTY AND TWO OR THREE OTHERS THAT HAD DEVELOPED A HABIT OF STRAYING HAD BEEN TURNED LOOSE EARLY IN THE EVENING FOR ANIMALS ON THE TRAIL ARE SELDOM STAKED DOWN +WE'VE GOT A HARD DRIVE BEFORE US AND EVERY MAN MUST BE FIT AS A FIDDLE +STACY BROWN'S LEFT LEG SWUNG OVER THE SADDLE +WHERE ARE THEY ASKED THE BOY +HE'S A FELLOW WHO'S ALL THE TIME MAKING TROUBLE ISN'T HE ASKED STACY INNOCENTLY +HE'S A TROUBLE CURER NOT A TROUBLEMAKER EXCEPT FOR HIMSELF +THEN BIDDING THE BOYS RIDE UP NEAR THE SPOT TO WATCH HIM HE DREW OFF SOME TEN RODS AND WHEELING SPURRED HIS PONY TO A RUN +A TEMPORARY CAMP WAS QUICKLY PITCHED +THE LITTLE ANIMALS WERE BECOMING MORE SURE FOOTED EVERY DAY AND NED SAID THAT BEFORE THE TRIP WAS FINISHED JIMMIE WOULD BE ABLE TO WALK A SLACK ROPE +THEY SAW HIS LONG HAIR ALMOST BRUSH THE GRASS ONE OF HIS HANDS SWEPT DOWN AND UP AND ONCE MORE TAD BUTLER ROSE STANDING IN HIS STIRRUPS UTTERING A COWBOY YELL AS HE WAVED THE SOMBRERO ON HIGH +ONCE MORE STACY APPROACHED THE SOMBRERO HIS PONY RUNNING WELL AND AS HE DREW NEAR IT THEY SAW HIM RISE IN HIS SADDLE JUST AS TAD BUTLER HAD DONE A FEW MINUTES BEFORE +I RECKON THERE ARE SMILED THE GUIDE WE ARE IN THE BEAR COUNTRY NOW +WHAT'S THAT FOR DEMANDED NED WONDERINGLY +THIS ANNOUNCEMENT FILLED THE BOYS WITH EXCITEMENT +AND WE'VE GOT A SURPRISE FOR YOU ANNOUNCED STACY SWELLING WITH PRIDE +HAT TOO CLOSE TO ME I COULDN'T GET IT EXPLAINED CHUNKY THE BOYS ROARED +GRASPING THE POMMEL WITH THE LEFT HAND HE APPEARED TO DIVE HEAD FIRST TOWARD THE GROUND +THE BOYS HOWLED WITH DELIGHT THAT IS ALL DID SAVE STACY BROWN +COLD WATER IS THE MOST NOURISHING THING WE'VE TOUCHED SINCE LAST NIGHT +AN EARLY START WAS MADE SO THAT THE PARTY REACHED THE PROMISED TABLE LANDS SHORTLY BEFORE TEN O'CLOCK IN THE FORENOON +GALLOPING INTO CAMP THE BOY FETCHED HIS SOMBRERO WHICH HE CARRIED WELL OUT INTO THE FIELD AND TOSSED AWAY +HE BURIED HIS BISCUIT UNDER A LAYER OF JAM OVER WHICH HE SPREAD A THICK COATING OF HONEY +WE DID NOT IT MUST HAVE COME TO LIFE SOME TIME DURING THE NIGHT AND DUG ITS WAY OUT LAUGHED TAD +AS YET THEY HAD BEEN UNABLE TO ATTEMPT ANY FANCY RIDING WITH THEIR PONIES OWING TO THE RUGGED NATURE OF THE COUNTRY THROUGH WHICH THEY HAD BEEN JOURNEYING +JAM EXCLAIMED CHUNKY STRETCHING HIS NECK AND EYEING THE DISH LONGINGLY +WHY DON'T YOU MOVE THE PONY +SUPPER HAVING BEEN FINISHED THE PARTY GATHERED ABOUT THE CAMP FIRE FOR THEIR EVENING CHAT AFTER WHICH ADMONISHING STACY TO KEEP WITHIN HIS TENT AND NOT TO GO BORROWING TROUBLE THE BOYS TURNED IN FOR A SOUND SLEEP +IT WAS A BEAUTIFUL RACE THE LITTLE INDIAN PONIES SEEMING TO ENTER THOROUGHLY INTO THE SPIRIT OF THE CONTEST STRETCHING THEMSELVES OUT TO 
THEIR FULL LENGTHS AND WITH HEADS ON A LEVEL WITH THEIR BACKS FAIRLY FLEW ACROSS THE GREAT PLOT OF GREEN +HE NO DOUBT WOULD BRING FOOD OF SOME KIND WITH HIM +WITH A SHOUT THE BOYS DASHED PELL MELL TO MEET THE PACK TRAIN AND FALLING IN BEHIND THE SLOW MOVING BURROS URGED THEM ON WITH DERISIVE SHOUTS AND SUNDRY RESOUNDING SLAPS ON THE ANIMALS FLANKS +PRESIDENT BROWN I WITHDRAW MY CRITICISM I OFFER YOU MY HUMBLE APOLOGIES +YES THE COUNTRY IS FULL OF CAVES +ALL AGREED THAT TAD'S SUPERIOR HORSEMANSHIP ALONE HAD WON THE RACE FOR HIM +THE GREAT GREEN FIELD SURROUNDED ON ALL SIDES BY TALL TREES MADE THE PLACE AN IDEAL ONE FOR THEIR PURPOSE +I AM FREE TO ADMIT THAT I AM HUNGRY TOO +THE FIRST TIME HE RODE SWIFTLY BY IT LEANING OVER TO LOOK AT THE HAT AS HE PASSED HOLDING TO THE POMMEL FIRMLY WITH HIS LEFT HAND +BUT I KNOW AN OLD SETTLER WHO WILL LEND US HIS DOG IF IT IS NOT OUT +NOW FALL TO YOUNG GENTLEMEN DIRECTED THE PROFESSOR +AS A RESULT INSTEAD OF STOPPING WHEN HE REACHED THE HAT THE BOY KEPT ON GOING +AT THE MOMENT WHEN HE FREED HIS LEFT FOOT FROM THE STIRRUP HE THREW HIS BODY SHARPLY TO THE RIGHT REACHING FOR THE HAT WITHOUT TAKING THE PRECAUTION TO GRASP THE POMMEL +TAD IS AN EXPERIENCED RIDER +LIGE LEANING OVER THE BRINK WAS ABLE TO FOLLOW THE BOY'S MOVEMENTS BY THE AID OF THE THIN ARC OF LIGHT MADE BY THE TORCH IN TAD'S HAND +ARE YOU READY YES +LIGE QUICKLY MADE FAST THE LINE TO A TREE +SLOWLY BUT STEADILY THE SLENDER LINE WAS PAID OUT AMID A TENSE SILENCE ON THE PART OF THE LITTLE GROUP AT THE TOP OF THE CANYON +LODGED IN THE BRANCHES OF A PINYON TREE I THINK IT IS BUT HE DOESN'T ANSWER ME +AFTER WHAT SEEMED TO THEM HOURS A SHARP CALL FROM THE DEPTHS REACHED THEIR EARS +I PROTEST SHOUTED THE PROFESSOR +LOOKS LIKE A CLUMP OF BUSHES DOWN THERE BUT I AIN'T SURE CAN YOU MAKE IT OUT +BE SURE TO FASTEN HIM SECURELY TO THE LOOP BEFORE YOU GIVE THE SIGNAL TO HAUL UP WARNED THE GUIDE +YOU'LL ALL BE OVER IF YOU DON'T HAVE A CARE +I COULD NOT THINK OF ALLOWING ANY OF MY CHARGES TO TAKE SO TERRIBLE A RISK AND +AND THAT TUMBLE'S ENOUGH TO KNOCK THE SENSE OUT OF A FULL GROWN MAN +I AM THE ONE TO GO AFTER WALT IF ANYONE HAS TO I'LL GO DOWN MISTER THOMAS +MEBBY YOU THINK HE'S HAVING SOME SORT OF A PICNIC DOWN THERE EH GLARED LIGE +THE MOVEMENT SENT HIS BODY SWAYING GIDDILY FROM SIDE TO SIDE +SHALL WE HAUL UP ASKED LIGE MAKING A MEGAPHONE OF HIS HANDS YES HAUL AWAY +YES AGREED TAD THAT DOES LOOK LIKE BUSHES +MASTER TAD IS RIGHT DECIDED THE GUIDE GAZING AT THE TWO BOYS APPROVINGLY +BUT FROM THE CAUTIOUS MOVEMENTS OF THE LIGHT FAR BELOW THEM THE GUIDE UNDERSTOOD THAT THE LAD WAS AT WORK CARRYING OUT HIS PART OF THE TASK OF RESCUE TO THE BEST OF HIS ABILITY +HE TILTED HIS HEAD TO LOOK UP +DON'T MOVE AROUND LIE PERFECTLY STILL WARNED THE GUIDE ARE YOU HURT +CAUTIOUSLY PLACING A HAND AGAINST THE ROCKS TO STEADY HIMSELF TAD WISELY CONCLUDED THAT HEREAFTER IT WOULD NOT PAY TO BE TOO CURIOUS +SURE THING ANSWERED THE BOY +HE'S SO FAR TO THE RIGHT OF ME THAT I CAN'T REACH HIM +YOU'D HAVE BOTH OF US AT THE BOTTOM IF I LEFT IT TO YOU TO TAKE CARE OF THIS END +NOR WAS HIS SENSE OF SECURITY INCREASED WHEN IN SHIFTING HIS POSITION THE TORCH FELL FROM HIS GRASP THE FAGOTS SCATTERING AS THEY SLIPPED DOWN BETWEEN THE LIMBS OF THE TREE AND WHIRLING IN EVER DIMINISHING CIRCLES UNTIL FINALLY HE HEARD THEM CLATTER ON THE ROCKS BELOW +I SEE HIM CALLED TAD HIS VOICE SOUNDING HOLLOW AND UNNATURAL TO THOSE ABOVE +NO I AM THE LIGHTER OF THE TWO URGED TAD +TO MAKE DRY TOAST +MAIZE NEXT TO WHEAT AND RICE MAIZE IS THE GRAIN MOST USED 
IN THE NOURISHMENT OF MAN +LOOK AT IT FROM TIME TO TIME WHEN IT HAS BEEN LAID FOR NEARLY AN HOUR AND WHEN THE YEAST HAS RISEN AND BROKEN THROUGH THE FLOUR SO THAT BUBBLES APPEAR IN IT YOU WILL KNOW THAT IT IS READY TO BE MADE UP INTO DOUGH +SOYER RECOMMENDS THAT EACH SLICE SHOULD BE CUT INTO PIECES AS SOON AS IT IS BUTTERED AND WHEN ALL ARE READY THAT THEY SHOULD BE PILED LIGHTLY ON THE DISH THEY ARE INTENDED TO BE SERVED ON +ITALIAN RUSKS +MODE PUT THE FLOUR INTO A BASIN MIX THE SUGAR WELL WITH IT MAKE A HOLE IN THE CENTRE AND STIR IN THE YEAST AND MILK WHICH SHOULD BE LUKEWARM WITH ENOUGH OF THE FLOUR TO MAKE IT THE THICKNESS OF CREAM +MODE PUT THE MILK AND BUTTER INTO A SAUCEPAN AND KEEP SHAKING IT ROUND UNTIL THE LATTER IS MELTED +THESE BUNS MAY BE VARIED BY ADDING A FEW CURRANTS CANDIED PEEL OR CARAWAY SEEDS TO THE OTHER INGREDIENTS AND THE ABOVE MIXTURE ANSWERS FOR HOT CROSS BUNS BY PUTTING IN A LITTLE GROUND ALLSPICE AND BY PRESSING A TIN MOULD IN THE FORM OF A CROSS IN THE CENTRE OF THE BUN +A LOAF OF HOUSEHOLD BREAD ABOUT TWO DAYS OLD ANSWERS FOR MAKING TOAST BETTER THAN COTTAGE BREAD THE LATTER NOT BEING A GOOD SHAPE AND TOO CRUSTY FOR THE PURPOSE +WHEN WE TAKE INTO ACCOUNT THAT THE ARABIANS ARE FOND OF LIZARDS AND LOCUSTS AS ARTICLES OF FOOD THEIR CUISINE ALTOGETHER IS SCARCELY A TEMPTING ONE +ITALIAN MILLET OR GREAT INDIAN MILLET IS CULTIVATED IN EGYPT AND NUBIA WHERE IT IS CALLED DHOURRA AND IS USED AS HUMAN FOOD AS WELL AS FOR THE FERMENTATION OF BEER +MODE PUT THE FLOUR INTO A LARGE EARTHENWARE BOWL OR DEEP PAN THEN WITH A STRONG METAL OR WOODEN SPOON HOLLOW OUT THE MIDDLE BUT DO NOT CLEAR IT ENTIRELY AWAY FROM THE BOTTOM OF THE PAN AS IN THAT CASE THE SPONGE OR LEAVEN AS IT WAS FORMERLY TERMED WOULD STICK TO IT WHICH IT OUGHT NOT TO DO +WHEN THEY ARE QUITE HOT DIVIDE THEM LENGTHWISE INTO THREE PUT SOME THIN FLAKES OF GOOD BUTTER BETWEEN THE SLICES PRESS THE ROLLS TOGETHER AND PUT THEM IN THE OVEN FOR A MINUTE OR TWO BUT NOT LONGER OR THE BUTTER WOULD OIL TAKE THEM OUT OF THE OVEN SPREAD THE BUTTER EQUALLY OVER DIVIDE THE ROLLS IN HALF AND PUT THEM ON TO A VERY HOT CLEAN DISH AND SEND THEM INSTANTLY TO TABLE +ILLUSTRATION MAIZE PLANT +SEVENTEEN TWENTY FOUR +MODE WHISK THE EGG STIR IN THE SUGAR AND BEAT THESE INGREDIENTS WELL TOGETHER BEAT THE BUTTER TO A CREAM STIR IN THE GROUND RICE CURRANTS AND CANDIED PEEL AND AS MUCH FLOUR AS WILL MAKE IT OF SUCH A CONSISTENCY THAT IT MAY BE ROLLED INTO SEVEN OR EIGHT BALLS +ILLUSTRATION RUSKS +SEVENTEEN THIRTY FOUR +CUT AS MANY NICE EVEN SLICES AS MAY BE REQUIRED RATHER MORE THAN ONE QUARTER INCH IN THICKNESS AND TOAST THEM BEFORE A VERY BRIGHT FIRE WITHOUT ALLOWING THE BREAD TO BLACKEN WHICH SPOILS THE APPEARANCE AND FLAVOUR OF ALL TOAST +THEN PLACE THE PAN ON A STRONG CHAIR OR DRESSER OR TABLE OF CONVENIENT HEIGHT POUR INTO THE SPONGE THE REMAINDER OF THE WARM MILK AND WATER STIR INTO IT AS MUCH OF THE FLOUR AS YOU CAN WITH THE SPOON THEN WIPE IT OUT CLEAN WITH YOUR FINGERS AND LAY IT ASIDE +TURN IT THEN ON TO A PASTE BOARD OR VERY CLEAN DRESSER AND WITH A LARGE SHARP KNIFE DIVIDE IT IN TWO MAKE IT UP QUICKLY INTO LOAVES AND DISPATCH IT TO THE OVEN MAKE ONE OR TWO INCISIONS ACROSS THE TOPS OF THE LOAVES AS THEY WILL RISE MORE EASILY IF THIS BE DONE +HOT ROLLS +SEVENTEEN SEVENTEEN +TO MAKE HOT BUTTERED TOAST SEVENTEEN TWENTY SIX +KIRKLEATHAM YEAST +ILLUSTRATION ITALIAN MILLET +MODE LET THE TARTARIC ACID AND SALT BE REDUCED TO THE FINEST POSSIBLE POWDER THEN MIX THEM WELL WITH THE FLOUR +SUFFICIENT TO MAKE TWELVE BUNS SEASONABLE AT ANY 
TIME LIGHT BUNS +MOVE IT BACKWARDS AND FORWARDS UNTIL THE BREAD IS NICELY COLOURED THEN TURN IT AND TOAST THE OTHER SIDE AND DO NOT PLACE IT SO NEAR THE FIRE THAT IT BLACKENS +THEY SHOULD BE KEPT IN A CLOSED TIN CANISTER IN A DRY PLACE TO PRESERVE THEIR CRISPNESS +NEVER USE NEW BREAD FOR MAKING ANY KIND OF TOAST AS IT EATS HEAVY AND BESIDES IS VERY EXTRAVAGANT +EXCELLENT ROLLS +SOME OF THE PREPARATIONS OF MAIZE FLOUR ARE VERY GOOD AND WHEN PARTAKEN IN MODERATION SUITABLE FOOD FOR ALMOST EVERYBODY +TO MAKE GOOD HOME MADE BREAD +IT IS NOT CULTIVATED IN ENGLAND BEING PRINCIPALLY CONFINED TO THE EAST +NEXT TAKE EITHER A LARGE TABLESPOONFUL OF BREWER'S YEAST WHICH HAS BEEN RENDERED SOLID BY MIXING IT WITH PLENTY OF COLD WATER AND LETTING IT AFTERWARDS STAND TO SETTLE FOR A DAY AND NIGHT OR NEARLY AN OUNCE OF GERMAN YEAST PUT IT INTO A LARGE BASIN AND PROCEED TO MIX IT SO THAT IT SHALL BE AS SMOOTH AS CREAM WITH THREE QUARTERS PINT OF WARM MILK AND WATER OR WITH WATER ONLY THOUGH EVEN A VERY LITTLE MILK WILL MUCH IMPROVE THE BREAD +FROM FIFTEEN TO TWENTY MINUTES WILL BE REQUIRED TO BAKE THEM NICELY +WHEN COLD THEY SHOULD BE PUT INTO TIN CANISTERS TO KEEP THEM DRY AND IF INTENDED FOR THE CHEESE COURSE THE SIFTED SUGAR SHOULD BE OMITTED +IT HAS BEEN INTRODUCED INTO ITALY WHERE THEY MAKE A COARSE BREAD FROM IT AND IT IS ALSO EMPLOYED IN PASTRY AND PUDDINGS THEY ALSO USE IT FOR FEEDING HORSES AND DOMESTIC FOWLS +VICTORIA BUNS SEVENTEEN THIRTY TWO +SUFFICIENT ALLOW TWO CRUMPETS TO EACH PERSON +MUFFINS AND CRUMPETS SHOULD ALWAYS BE SERVED ON SEPARATE DISHES AND BOTH TOASTED AND SERVED AS EXPEDITIOUSLY AS POSSIBLE +ILLUSTRATION BUNS +PLAIN BUNS SEVENTEEN TWENTY NINE +MODE BOIL THE RICE IN WATER UNTIL IT IS QUITE TENDER POUR OFF THE WATER AND PUT THE RICE BEFORE IT IS COLD TO THE FLOUR +SEVENTEEN EIGHTEEN +IT WILL GROW ON POOR SOILS AND IS EXTREMELY PRODUCTIVE +SOUR MILK OR BUTTERMILK MAY BE USED BUT THEN A LITTLE LESS ACID WILL BE NEEDED +ANOTHER ADVANTAGE THE RED WHEATS POSSESS IS THEIR COMPARATIVE IMMUNITY FROM THE ATTACKS OF MILDEW AND FLY +HE SAYS THAT BY CUTTING THROUGH FOUR OR FIVE SLICES AT A TIME ALL THE BUTTER IS SQUEEZED OUT OF THE UPPER ONES WHILE THE BOTTOM ONE IS SWIMMING IN FAT LIQUID +IF CARRIED ANY DISTANCE IT SHOULD BE STORED AWAY IN AIR TIGHT VESSELS +A YELLOW VARIETY CALLED GOLDEN MILLET IS SOLD IN THE GROCERS SHOPS FOR MAKING PUDDINGS AND IS VERY DELICATE AND WHOLESOME +WHEN SHE HEARD OF MY ENGAGEMENT WITH MARY ANN SHE WROTE AND SUGGESTED THAT WE SHOULD SPEND OUR HONEYMOON IN HER COTTAGE OR PIGSTYE AND THAT I SHOULD PAY HER RENT FOR IT +THE FACT OF HAVING GIVEN MARY ANN A WEDDING PRESENT SEEMS TO FILL THEM WITH A FEELING OF RANCOROUS ACIDITY WHICH TO ME IS INEXPLICABLE +AND OF COURSE I HAD MY EXPECTATIONS AND SHE HAD HERS +FROM A COUSIN OF OURS WHO'S IN THAT LINE +THAT WAS WHAT MISSUS MACPHERSON SAID TO ME ONLY THE OTHER DAY +THE ACCIDENT IN QUESTION OCCURRED UPON THE SUNDAY EVENING +I FOUND THAT AS A WOMAN OF BUSINESS SHE WAS BEYOND ALL MY EXPECTATIONS +THE GIRL IS FRETTING BUT YOU DON'T SEEM TO NOTICE IT +SUCH IS THE SELFISHNESS OF HUMAN NATURE +I CANNOT PRETEND TO EXPLAIN WHY EXCEPT ON THE SUPPOSITION THAT ROMANCE IS DEAD AT LEAST IN THAT CIRCLE OF SOCIETY IN WHICH THE SNELLINGS MOVE +IT IS MOST DELIGHTFUL +HER FATHER IS A MOST REMARKABLE PERSON TO SAY THE LEAST +AND I WILL SEE THAT THERE IS NO SHIRKING ABOUT THE BOYS OR ABOUT THE GIRLS EITHER +I DO NOT KNOW WHEN IT IS GOING TO BE BUT IT WILL BE EITHER NEXT WEEK OR THE WEEK AFTER CERTAINLY AT THE EARLIEST POSSIBLE MOMENT AND I 
SHOULDN'T BE AT ALL SURPRISED TO LEARN THAT ALL MARY ANN'S THINGS HAD BEEN ALREADY BOUGHT AND PERHAPS SOME OF THEM MARKED +A SECOND COUSIN OF MARY ANN'S IS IN THE COOK'S TOURS LINE +I SHALL MAKE PAPA GIVE ME FIVE HUNDRED POUNDS AT LEAST +IT TURNED OUT THAT SHE HAD A LITTLE MONEY OF HER OWN ABOUT A HUNDRED AND THIRTY POUNDS A YEAR +SOMEONE SNIGGERED +HERS HAS BEEN PRODIGIOUS +BUT IT IS QUITE PLAIN TO ME THAT ALL THE ARRANGEMENTS FOR MY WEDDING ARE GOING TO BE MADE BY THE SNELLINGS +P S THE CARDS ARE OUT FOR THE WEDDING +I HAVE DRAWN UP A LIST OF ALL THE PEOPLE WHO OUGHT TO GIVE US A PRESENT AND I SHALL TELL THEM WHAT THEY OUGHT TO GIVE IT WON'T BE MY FAULT IF I DON'T GET IT +SHE HAS A KNACK OF GETTING PEOPLE TO DO WHAT SHE WISHES AND TO GIVE HER WHAT SHE WANTS WHICH IS A LITTLE SHORT OF MIRACULOUS +IT WAS PLAIN THAT TOGETHER WE SHOULD MANAGE MOST COMFORTABLY DELIGHTFULLY IN FACT +WE HAVE ALL BEEN GIVING MARY ANN PRESENTS AND I SUPPOSE YOU MISTER WHITING HAVE BEEN GIVING HER SOMETHING TOO +OF COURSE THERE ARE SOME PEOPLE WITH WHOM YOU CAN'T BE PERFECTLY PLAIN BUT I SHALL BE AS PLAIN AS I CAN THERE'S A WAY AND A MANNER OF DOING THAT KIND OF THING +HE HAS GIVEN US FREE PASSES ALL THE WAY TO THE END OF OUR JOURNEY AND ALL THE WAY BACK AGAIN AND COUPONS FOR FREE BOARD AND LODGING AT THE HOTEL IT'S A WEDDING PRESENT +AS IT IS UNLESS I AM MISTAKEN SOME OF THE RENDING WILL BE ON OUR SIDE AND THEY KNOW IT +FOR INSTANCE LOOK AT THEIR BEHAVIOUR IN THE MATTER OF THE RING +I GASPED POSITIVELY GASPED +I NEVER SAW PEOPLE LIKE THE SNELLINGS FOR POSSESSING RELATIVES IN ALL SORTS OF LINES +IT IS FROM HER ACTION IN THAT MATTER THAT MY SUSPICION SPRINGS +IT MIGHT JUST AS WELL BE SOME ONE ELSE'S WEDDING SO UNIMPORTANT IS THE PART WHICH I AM SET TO PLAY IN IT +I KNOW WHAT MAMMA CAN AFFORD TO GIVE AND I WILL SEE SHE GIVES IT +WE ARE GOING FOR OUR HONEYMOON TO ITALY AND THE SOUTH OF FRANCE +BUT WHY ON THAT ACCOUNT THEY SHOULD PITY ME I ALTOGETHER FAIL TO UNDERSTAND +I NOTICE THAT THEY ARE GENERALLY PERSONS WHO HAVE ALREADY TENDERED THEIR OFFERINGS +THERE WERE NO SIGNS OF FALTERING ABOUT HER FLOW OF LANGUAGE +A BIRD IN THE HAND IS WORTH TWO IN A BUSH' AND IT WILL BE SOMETHING TO HAVE BY US +THAT'S IT ON YOUR ACCOUNT +AND WHAT INQUIRED MISSUS MACPHERSON HAS MARY ANN GIVEN YOU HER LOVE +THERE SHE OWNS A COTTAGE OR IT MAY BE A PIGSTYE FOR ALL I KNOW +BESIDES WHICH WE CAN ALWAYS SELL THE COUPONS AND RAILWAY PASSES WHICH WE DON'T USE +I WAS PERSUADED THAT SOMEBODY BESIDES THAT COUSIN GOT A PROFIT OUT OF MARY ANN'S ENGAGEMENT RING BUT I HANDED OVER THE AMOUNT +WHEN AT LAST I REACHED CROFTON MY JOURNEY'S END IT TURNED OUT THAT THE STATION STAFF CONSISTED OF A HALF WITTED INDIVIDUAL WHO WAS STATIONMASTER PORTER AND CLERK COMBINED AND A HULKING LAD WHO DID WHATEVER ELSE THERE WAS TO DO +HERE WE BE THAT MIGHT BE SO +NO ANSWER THOUGH I ALLOWED A MORE THAN DECENT INTERVAL +THERE APPEARED TO BE NO KNOCKER THOUGH WHETHER IT HAD BEEN TWISTED OFF WAS MORE THAN I COULD SAY +WE'VE LOST THE KEY OF THE CELLAR AND THERE'S NOTHING OUT EXCEPT WATER AND I DON'T THINK YOU'D CARE FOR THAT +IN IT I WAS DEPOSITED WITH MY LUGGAGE +I EVEN BOUGHT SOMETHING FOR MADGE I MEAN MISSUS WILSON +A VOICE INQUIRED WHO'S THERE +THE REPLY WAS WRITTEN IN A SPRAWLING FEMININE HAND IT WAS A LITTLE VAGUE +BETTER RING AGAIN SUGGESTED THE DRIVER HARD +THE WHOLE THING WAS A TRIFLE ODD +I DID NOT ASK I WAS BEYOND IT +I HAD NO ILLUSIONS +WHO LIVES HERE ARE THE PEOPLE MAD +I IMAGINE THERE WERE SEVERAL KINDS OF OLD FASHIONED CHRISTMASES BUT IT COULD HARDLY BE WORSE 
THAN A CHOP IN MY CHAMBERS OR HORROR OF HORRORS AT THE CLUB OR MY COUSIN LUCY'S NOTION OF WHAT SHE CALLS THE FESTIVE SEASON +NOW IT IS A REMARKABLE THING THAT I HAVE ALWAYS HAD AN EXTRAORDINARY PREDILECTION FOR THE NAME MADGE I DO NOT KNOW WHY +IT IS SOME SATISFACTION FOR ME TO BE ABLE TO REFLECT THAT I MADE IT WARM FOR THE OFFICIALS HOWEVER COLD I MIGHT HAVE BEEN MYSELF +HE WAS IMPERVIOUS TO REASON +I WAS CHILLED TO THE BONE WET TIRED HUNGRY +I FELT QUITE LIVELY MYSELF AS I MINGLED WITH THE CHRISTMAS CROWD LOOKING FOR THINGS WHICH MIGHT NOT TURN OUT TO BE ABSOLUTELY PREPOSTEROUS +WHEN THE TRAP DID APPEAR IT LOOKED TO ME UNCOMMONLY LIKE AN OPEN SPRING CART +THERE WAS A TRAP AT THE BOY AND BLUNDERBUSS BUT THAT REQUIRED FETCHING +NO ONE HAD COME TO MEET ME THE VILLAGE WAS ABOUT HALF A MILE AND HANGAR DENE THE HOUSE FOR WHICH MY STEPS WERE BENT ABOUT FOUR MILES BY THE ROAD HOW FAR IT WAS ACROSS PLOUGHED FIELDS MY INFORMANT DID NOT MENTION +THERE WAS NOTHING SAID ABOUT THE SORT OF ACCOMMODATION WHICH WOULD BE PROVIDED NOTHING ABOUT THE KIND OF ESTABLISHMENT WHICH WAS MAINTAINED OR THE TABLE WHICH WAS KEPT +MAYBE THEY'RE UP TO SOME OF THEIR GAMES AND WANTS ROUSING +THERE WAS A RUSH OF RETREATING FEET AN EXPOSTULATING VOICE THEN DARKNESS AGAIN AND SILENCE +THE BELL REVERBERATED THROUGH WHAT SEEMED LIKE AN EMPTY HOUSE +ALL NIGHT IT HAD BEEN BLOWING AND RAINING +FESTIVE YES +IT WAS A HORRIBLE JOURNEY +AFTER A VAST AMOUNT OF UNFASTENING THE DOOR WAS OPENED AND ON THE THRESHOLD THERE STOOD A GIRL WITH A LIGHTED CANDLE IN HER HAND +I DID NOT EXPECT A PRINCELY ENTERTAINMENT +PRESENTLY FEET WERE HEARD ADVANCING ALONG THE PASSAGE SEVERAL PAIRS IT SEEMED AND A LIGHT GLEAMED THROUGH THE WINDOW OVER THE DOOR +I DID NOT KNOW WHAT HE MEANT +IT APPEARED THAT THE TERMS WOULD BE FIVE GUINEAS BUT THERE WAS NO MENTION OF THE LENGTH OF TIME WHICH THAT FEE WOULD COVER +THE INFORMATION WAS GREETED WITH WHAT SOUNDED UNCOMMONLY LIKE A CHORUS OF LAUGHTER +UNDER SUCH CIRCUMSTANCES SHE WAS HARDLY LIKELY TO BE LIVELY HERSELF BUT HER NAME WAS MADGE AND IT WAS THE ACCIDENT OF HER CHRISTIAN NAME WHICH DECIDED ME TO GO +I TOLLED THE BELL AGAIN +I HAVE NEVER KNOWN A MADGE AND YET FROM MY BOYHOOD UPWARD I HAVE DESIRED TO MEET ONE +I HAD LONG BEEN WISHING THAT AN OLD FASHIONED CHRISTMAS HAD BEEN COMPLETELY EXTINCT BEFORE I HAD THOUGHT OF ADVENTURING IN QUEST OF ONE +THERE BE THE DOOR IN FRONT OF YOU YOU GO UP THREE STEPS IF YOU CAN FIND EM +I'M MISTER CHRISTOPHER FROM LONDON +THERE'S A KNOCKER IF NONE OF EM HAVEN'T TWISTED IT OFF +EARLY IN THE MORNING THE STEPMOTHER CAME AND PULLED THEM OUT OF BED AND GAVE THEM EACH A SLICE OF BREAD WHICH WAS STILL SMALLER THAN THE FORMER PIECE +BUT SHE LEFT HIM NO PEACE TILL HE CONSENTED SAYING AH BUT I SHALL MISS THE POOR CHILDREN +AND SHE GOT UP AND PUT HER HEAD INTO THE OVEN +THEN SHE TOOK UP HANSEL WITH HER ROUGH HAND AND SHUT HIM UP IN A LITTLE CAGE WITH A LATTICE DOOR AND ALTHOUGH HE SCREAMED LOUDLY IT WAS OF NO USE +HANSEL THOUGHT THE ROOF TASTED VERY NICE AND SO HE TORE OFF A GREAT PIECE WHILE GRETHEL BROKE A LARGE ROUND PANE OUT OF THE WINDOW AND SAT DOWN QUITE CONTENTEDLY +WE ARE GOING INTO THE FOREST TO HEW WOOD AND IN THE EVENING WHEN WE ARE READY WE WILL COME AND FETCH YOU AGAIN +CREEP IN SAID THE WITCH AND SEE IF IT IS HOT ENOUGH AND THEN WE WILL PUT IN THE BREAD BUT SHE INTENDED WHEN GRETHEL GOT IN TO SHUT UP THE OVEN AND LET HER BAKE SO THAT SHE MIGHT EAT HER AS WELL AS HANSEL +AH FATHER SAID HANSEL I AM LOOKING AT MY WHITE CAT SITTING UPON THE ROOF OF THE HOUSE AND TRYING TO 
SAY GOOD BYE +DEAR GOOD GOD HELP US NOW SHE PRAYED +OH YOU SIMPLETON SAID SHE THEN WE MUST ALL FOUR DIE OF HUNGER YOU HAD BETTER PLANE THE COFFINS FOR US +THEN THEY BEGAN TO RUN AND RUSHING INTO THE HOUSE THEY FELL UPON THEIR FATHER'S NECK +GRETHEL SHE CRIED IN A PASSION GET SOME WATER QUICKLY BE HANSEL FAT OR LEAN THIS MORNING I WILL KILL AND COOK HIM +HE HAD LITTLE ENOUGH TO BREAK OR BITE AND ONCE WHEN THERE WAS A GREAT FAMINE IN THE LAND HE COULD HARDLY PROCURE EVEN HIS DAILY BREAD AND AS HE LAY THINKING IN HIS BED ONE NIGHT HE SIGHED AND SAID TO HIS WIFE WHAT WILL BECOME OF US +HOW CAN WE FEED OUR CHILDREN WHEN WE HAVE NO MORE THAN WE CAN EAT OURSELVES +COME IN AND STOP WITH ME AND NO HARM SHALL COME TO YOU AND SO SAYING SHE TOOK THEM BOTH BY THE HAND AND LED THEM INTO HER COTTAGE +SEE I COULD EVEN GET IN MYSELF +AND NOW AS THERE WAS NOTHING TO FEAR THEY WENT BACK TO THE WITCH'S HOUSE WHERE IN EVERY CORNER WERE CASKETS FULL OF PEARLS AND PRECIOUS STONES +BUT HER HUSBAND FELT HEAVY AT HEART AND THOUGHT IT WERE BETTER TO SHARE THE LAST CRUST WITH THE CHILDREN +BUT IN REALITY HANSEL WAS NOT LOOKING AT A CAT BUT EVERY TIME HE STOPPED HE DROPPED A PEBBLE OUT OF HIS POCKET UPON THE PATH +THE OLD WOMAN BEHAVED VERY KINDLY TO THEM BUT IN REALITY SHE WAS A WICKED OLD WITCH WHO WAY LAID CHILDREN AND BUILT THE BREADHOUSE IN ORDER TO ENTICE THEM IN BUT AS SOON AS THEY WERE IN HER POWER SHE KILLED THEM COOKED AND ATE THEM AND MADE A GREAT FESTIVAL OF THE DAY +GRETHEL BEGAN TO CRY BUT IT WAS ALL USELESS FOR THE OLD WITCH MADE HER DO AS SHE WANTED +THAT NIGHT THE ENEMY SLIPPED AWAY LEAVING HUNDREDS AND HUNDREDS OF HIS DEAD AND WOUNDED ON THE FIELD +THERE WERE NO BREASTWORKS YET THAT ONE LITTLE BRIGADE OF HAMILTON'S DIVISION STOOD THERE IN THE OPEN AND REPULSED ASSAULT AFTER ASSAULT +IT REQUIRED MONTHS AND GREAT EVENTS TO MAKE GRANT THE HERO OF THE ARMY WHICH HE AFTERWARD BECAME +FOR SOME REASON THE DEAD AT HATCHIE BRIDGE WERE NOT BURIED +I COULD NOT HELP MY FRIEND +ONCE IN THE NIGHT I SLIPPED AWAY FROM THE BIVOUAC AND HURRIED TO THE OLD TISHIMINGO HOTEL TO SEE A LIEUTENANT OF MY COMPANY WHO HAD BEEN SHOT THROUGH THE BREAST +OUR BRIGADE WAS FEARFULLY OUTNUMBERED +I HASTENED BACK TO THE LINES +FIFTEEN OFFICERS OF OUR LITTLE HALF REGIMENT WERE DEAD OR WOUNDED +THAT NIGHT I STOOD GUARD UNDER AN OAK TREE ON THE BATTLEFIELD AMONG THE UNBURIED DEAD +WHEN MORNING CAME THE FIRING OPENED AND FOR ALL THAT DAY THE BATTLE RAGED FIERCELY AT THE LEFT AND CENTER LEFT WE GETTING THE WORST OF IT TOO +A PERFECT BLAZE OF CLOSE RANGE MUSKETRY TOO MOWED THEM DOWN LIKE GRASS +GO BACK TO THE REGIMENT HE SAID SMILING ALL WILL BE NEEDED +GRANT WAS ONLY A FEW MILES AWAY BUT ALTHOUGH COMMANDER IN CHIEF HE KNEW NOTHING OF THE HARDEST FOUGHT BATTLE OF THE CIVIL WAR UNTIL IT WAS OVER +ROSECRANS PROTESTED IT WAS IN VAIN +THE CLOUD OF REBELS WE HAD SEEN DIVIDED ITSELF INTO THREE COLUMNS +NO BATTERY IN THE WHOLE FOUR YEARS WAR LOST SO MANY MEN IN SO SHORT A TIME +IT WAS NOT A QUESTION WHO WAS DEAD OR WOUNDED BUT WHO WAS NOT +A WEEK AFTER THE BATTLE MY BROTHER RODE BY THERE ON A CAVALRY EXPEDITION AND MADE THE HORRIBLE DISCOVERY THAT HOGS WERE EATING UP THE BODIES OF OUR DEAD HEROES THAT TOO WAS WAR +NOT BALAKLAVA NOR THE ALMA SAW SUCH FIGHTING IT WAS A DUEL TO THE DEATH +HE SURVIVED THE WAR ONLY TO BE MURDERED LATER ON A PLANTATION IN MISSISSIPPI +MY OWN REGIMENT WAS IN THE ADVANCE +MY FRIEND WITH MANY OTHERS WAS BEING CARRIED OUT TO DIE ELSEWHERE +ONE DARING REBEL WAS SHOT DOWN AND BAYONETED CLEAR BEHIND THE LINE OF COMPANY B WHERE 
HE HAD BROKEN THROUGH TO SEIZE THE FLAG OF MY REGIMENT +INDEED WE OF THE RANK AND FILE HAD LITTLE CONFIDENCE IN GRANT IN THOSE DAYS +UNDER THE SAME QUIET MOONLIGHT AND ONLY SIX HUNDRED YARDS AWAY FROM US ALSO LAY THE VICTORIOUS REBEL ARMY +WITH A FEW LANTERNS OUR MEN THEN WENT ABOUT AND TRIED TO GATHER UP THE WOUNDED THE DEAD WERE LEFT TILL MORNING +THEY LAY IN HEAPS OF DOZENS EVEN CLOSE UP TO THE WORKS +I REMAINED AWAKE ALL NIGHT TALKING WITH A COMRADE WHO SHARED MY BLANKET WITH ME POOR JIMMY KING +THAT EVENING AN ORDER CAME FOR US HAMILTON'S DIVISION TO ASSAULT THE ENEMY'S LEFT FLANK AT MIDNIGHT +HE IS OLD AS WELL AS POOR SHE SAID +ALL MY WORRIES ABOUT YOU WERE FOOLISH +THE CRIES AND CURSES OF THE ROBBERS FILLED THE AIR +PERHAPS YOU CAN OUTWIT HER YET CRIED ANOTHER +HOWEVER IN SPITE OF ALL SHE COULD SAY THE ELDER SISTERS OPENED THE DOOR AND ADMITTED THE BEGGAR +THE TWO ELDEST ATE THEIR APPLES BUT THE YOUNGEST COULD NOT EAT THAT NIGHT SHE THREW THE APPLE AWAY +EVERY YEAR AT A CERTAIN DAY OF A CERTAIN MONTH HE WENT AWAY TO A DISTANT CITY TO COLLECT MONEY ON AN ACCOUNT +YOU SHALL NOT COME INTO MY FATHER'S HOUSE +IT IS A FEARFUL NIGHT TO SEND AWAY A BEGGAR SAID THE ELDEST SISTER WHILE THEY WERE EATING +LONG AGO THERE LIVED A MERCHANT WHO HAD THREE DAUGHTERS +THE MERCHANT'S DAUGHTER AT FIRST DID NOT ANSWER BUT AS HE KEPT ON CALLING TO HER SHE FINALLY ASKED HIM WHAT IT WAS THAT HE WANTED +IF WE DECIDE TO SHOW MERCY TO THIS POOR BEGGAR IT IS NOT FOR YOU TO OPPOSE IT +IT'S SURELY A TERRIBLE STORM OUTSIDE SAID THE MERCHANT'S ELDEST DAUGHTER AS THE WIND RATTLED THE TILES OF THE ROOF AND THE RAIN BEAT IN TORRENTS AGAINST THE DOORS AND WINDOWS +WHILE THEY WERE TALKING THE BEGGAR HAD TAKEN THE APPLES WHICH THE GIRLS WERE TO EAT FOR DESSERT AND HAD SPRINKLED A SLEEPING POWDER OVER THEM +BUT WE SHOULD NOT FORGET OUR PROMISE TO OUR FATHER CRIED THE YOUNGEST DAUGHTER +THEY TRIED IN VAIN TO BREAK DOWN THE GREAT DOORS +I PROMISE YOU I WILL DO YOU NO HARM +IT WAS THE YOUNGEST ONE WHO DECEIVED ME CRIED THE ROBBER CHIEFTAIN +HAVE PITY UPON A POOR UNFORTUNATE ONE HE CALLED OUT +THEN SHE HEARD HIM GO DOWN THE STAIRWAY AND UNBOLT THE HEAVY DOORS WHICH LED INTO THE STORE +WHEN IT WAS EVENING HE LED HIS BAND INTO A NEARBY STREET AND IN HIS DISGUISE APPROACHED THE MERCHANT'S HOUSE HE KNOCKED AT THE DOOR +PASS THE CHARM OUT TO ME THEN SAID THE ROBBER +WHEN SHE RETURNED HIS HAND WAS STICKING THROUGH THE HOLE IN THE DOOR +SHE DID NOT STIR AND HE KNEW THAT THE SLEEPING POWDER HAD THOROUGHLY DONE ITS WORK +LET ME ENTER I PRAY YOU TO PASS THE NIGHT UNDER YOUR ROOF +HOW DO YOU KNOW ASKED THEIR FATHER I AM OLDER AND WISER THAN YOU ARE AND I KNOW THAT THERE ARE MANY EVILS WHICH MIGHT COME UPON YOU +WHEN AN ARISTOCRACY CARRIES ON THE PUBLIC AFFAIRS ITS NATIONAL PRIDE NATURALLY ASSUMES THIS RESERVED INDIFFERENT AND HAUGHTY FORM WHICH IS IMITATED BY ALL THE OTHER CLASSES OF THE NATION +THEY THEREFORE ENTERTAIN A CALM SENSE OF THEIR SUPERIORITY THEY DO NOT DREAM OF VAUNTING PRIVILEGES WHICH EVERYONE PERCEIVES AND NO ONE CONTESTS AND THESE THINGS ARE NOT SUFFICIENTLY NEW TO THEM TO BE MADE TOPICS OF CONVERSATION +THEN THERE ARE IN ALL CLASSES A VERY LARGE NUMBER OF MEN CONSTANTLY OCCUPIED WITH THE SERIOUS AFFAIRS OF THE GOVERNMENT AND THOSE WHOSE THOUGHTS ARE NOT ENGAGED IN THE DIRECTION OF THE COMMONWEALTH ARE WHOLLY ENGROSSED BY THE ACQUISITION OF A PRIVATE FORTUNE +I DO NOT BELIEVE IN SUCH REPUBLICS ANY MORE THAN IN THAT OF PLATO OR IF THE THINGS WE READ OF REALLY HAPPENED I DO NOT HESITATE TO AFFIRM THAT THESE SUPPOSED 
DEMOCRACIES WERE COMPOSED OF VERY DIFFERENT ELEMENTS FROM OURS AND THAT THEY HAD NOTHING IN COMMON WITH THE LATTER EXCEPT THEIR NAME +IF I APPLAUD THE FREEDOM WHICH ITS INHABITANTS ENJOY HE ANSWERS FREEDOM IS A FINE THING BUT FEW NATIONS ARE WORTHY TO ENJOY IT +IF I SAY TO AN AMERICAN THAT THE COUNTRY HE LIVES IN IS A FINE ONE AY HE REPLIES THERE IS NOT ITS FELLOW IN THE WORLD +IN ARISTOCRACIES EVERY MAN HAS ONE SOLE OBJECT WHICH HE UNCEASINGLY PURSUES BUT AMONGST DEMOCRATIC NATIONS THE EXISTENCE OF MAN IS MORE COMPLEX THE SAME MIND WILL ALMOST ALWAYS EMBRACE SEVERAL OBJECTS AT THE SAME TIME AND THESE OBJECTS ARE FREQUENTLY WHOLLY FOREIGN TO EACH OTHER AS IT CANNOT KNOW THEM ALL WELL THE MIND IS READILY SATISFIED WITH IMPERFECT NOTIONS OF EACH +IN ARISTOCRATIC COMMUNITIES THE PEOPLE READILY GIVE THEMSELVES UP TO BURSTS OF TUMULTUOUS AND BOISTEROUS GAYETY WHICH SHAKE OFF AT ONCE THE RECOLLECTION OF THEIR PRIVATIONS THE NATIVES OF DEMOCRACIES ARE NOT FOND OF BEING THUS VIOLENTLY BROKEN IN UPON AND THEY NEVER LOSE SIGHT OF THEIR OWN SELVES WITHOUT REGRET +THESE PERSONS THEN DISPLAYED TOWARDS EACH OTHER PRECISELY THE SAME PUERILE JEALOUSIES WHICH ANIMATE THE MEN OF DEMOCRACIES THE SAME EAGERNESS TO SNATCH THE SMALLEST ADVANTAGES WHICH THEIR EQUALS CONTESTED AND THE SAME DESIRE TO PARADE OSTENTATIOUSLY THOSE OF WHICH THEY WERE IN POSSESSION +THEY STAND UNMOVED IN THEIR SOLITARY GREATNESS WELL ASSURED THAT THEY ARE SEEN OF ALL THE WORLD WITHOUT ANY EFFORT TO SHOW THEMSELVES OFF AND THAT NO ONE WILL ATTEMPT TO DRIVE THEM FROM THAT POSITION +THE AMERICANS IN THEIR INTERCOURSE WITH STRANGERS APPEAR IMPATIENT OF THE SMALLEST CENSURE AND INSATIABLE OF PRAISE +IN ARISTOCRATIC COUNTRIES THE GREAT POSSESS IMMENSE PRIVILEGES UPON WHICH THEIR PRIDE RESTS WITHOUT SEEKING TO RELY UPON THE LESSER ADVANTAGES WHICH ACCRUE TO THEM +AN AMERICAN INSTEAD OF GOING IN A LEISURE HOUR TO DANCE MERRILY AT SOME PLACE OF PUBLIC RESORT AS THE FELLOWS OF HIS CALLING CONTINUE TO DO THROUGHOUT THE GREATER PART OF EUROPE SHUTS HIMSELF UP AT HOME TO DRINK +CHAPTER SIXTEEN WHY THE NATIONAL VANITY OF THE AMERICANS IS MORE RESTLESS AND CAPTIOUS THAN THAT OF THE ENGLISH +I BELIEVE THE SERIOUSNESS OF THE AMERICANS ARISES PARTLY FROM THEIR PRIDE +THIS IS MORE ESPECIALLY THE CASE AMONGST THOSE FREE NATIONS WHICH FORM DEMOCRATIC COMMUNITIES +I TOLD TOM WE SHOULDN'T COME SO LATE SAYS HILDA +RIGHT AWAY WHEN I BRING HOME MY NEW PROGRAM HE SAYS HOW COME YOU'RE TAKING ONE LESS COURSE THIS HALF +IT'S YOUR FAULT MOP IT UP YOURSELF +YOU'RE GETTING ALTOGETHER TOO UPSET ABOUT THESE PROGRAMS STOP IT AND BEHAVE YOURSELF +TOM SAYS THANKS AND LOOKS AT HILDA AND SHE BLUSHES REALLY +BESIDES SAYS TOM HALF THE REASON YOU AND YOUR FATHER ARE ALWAYS BICKERING IS THAT YOU'RE SO MUCH ALIKE ME LIKE HIM SURE +POP IT'S A COURSE +SOMEHOW OR OTHER CAT HAS TAUGHT THEM THAT HE'S IN CHARGE HERE AND HE JUST CHASES THEM FOR FUN NOW AND AGAIN WHEN HE'S NOT BUSY SLEEPING +I'LL BE LUCKY IF I HAVE TIME TO BREATHE +POP GOES RIGHT ON TUNING HIS CHANNEL +TOM DRINKS A LITTLE MORE COFFEE AND THEN HE GOES ON THE TROUBLE IS I CAN'T GET MARRIED ON THIS FLOWER SHOP JOB +SOMETIMES SCHOOLS DO LET KIDS TAKE A LOT OF SOFT COURSES AND THEN THEY'RE OUT ON A LIMB LATER HUH +HE DOES AND FOR ONCE I WIN A ROUND I KEEP MUSIC FOR THIS SEMESTER +I EXPLAIN THAT I'M TAKING MUSIC AND ALSO BIOLOGY ALGEBRA ENGLISH AND FRENCH MUSIC HE SNORTS +WHEN ARE YOU GETTING RID OF THESE CATS I'M NOT FIXING TO START AN ANNEX TO KATE'S CAT HOME +IT'S THE FIRST TIME HILDA HAS BEEN TO OUR HOUSE AND TOM 
INTRODUCES HER AROUND +I HEAR THE T V GOING FOR A FEW MINUTES THEN POP TURNS IT OFF AND GOES IN THE KITCHEN TO TALK TO MOM +I LOOK AT MY WATCH IT'S A QUARTER TO ELEVEN +I GET THE PILLOWS COMFORTABLY ARRANGED ON THE FLOOR WITH A BIG BOTTLE OF SODA AND A BAG OF POPCORN WITHIN EASY REACH +AS LONG AS THERE'S A BONE ON THE FLOOR THE TWO OF YOU WORRY IT +I TURN OFF THE TELEVISION SET I'VE LOST TRACK OF WHAT'S HAPPENING AND IT DOESN'T SEEM TO BE THE GRANDFATHER WHO'S THE SPOOK AFTER ALL +WELL I DON'T THINK YOU SHOULD TURN A GUY'S T V PROGRAM OFF IN THE MIDDLE WITHOUT EVEN FINDING OUT ABOUT IT +HERE'S TO YOU A LONG HAPPY LIFE +I'LL HAVE TO CHECK SOME MORE SAYS TOM +YOU KNOW I'D GET DRAFTED IN A YEAR OR TWO ANYWAY +THE TWO STRAY KITTENS GRADUALLY MAKE THEMSELVES AT HOME +SHE DOESN'T PICK THEM UP BUT JUST HAVING THEM IN THE ROOM SURE DOESN'T GIVE HER ASTHMA +SO HE CARES HUH +I'VE DECIDED TO ENLIST IN THE ARMY +THE WIND WAS SO STRONG THAT I HAD TO HOLD MY HAT ON AND THE GIRLS SKIRTS WERE BLOWN OUT BEFORE THEM +HE WAS BORN LIKE THAT THE OTHERS ARE SMART +SHE LOOKED AT ME HER EYES FAIRLY BLAZING WITH THINGS SHE COULD NOT SAY +AT THAT MOMENT THE FATHER CAME OUT OF THE HOLE IN THE BANK +MY GRANDMOTHER ALWAYS SPOKE IN A VERY LOUD TONE TO FOREIGNERS AS IF THEY WERE DEAF +PRESENTLY AGAINST ONE OF THOSE BANKS I SAW A SORT OF SHED THATCHED WITH THE SAME WINE COLORED GRASS THAT GREW EVERYWHERE +I NOTICED HOW WHITE AND WELL SHAPED HIS OWN HANDS WERE +I REMEMBERED WHAT THE CONDUCTOR HAD SAID ABOUT HER EYES +SHE WAS QUICK AND VERY EAGER +AFTER ANTONIA HAD SAID THE NEW WORDS OVER AND OVER SHE WANTED TO GIVE ME A LITTLE CHASED SILVER RING SHE WORE ON HER MIDDLE FINGER +VERY GLAD VERY GLAD SHE EJACULATED +WE STOOD PANTING ON THE EDGE OF THE RAVINE LOOKING DOWN AT THE TREES AND BUSHES THAT GREW BELOW US +AMBROSCH HE MAKE GOOD FARMER +SHE POINTED INTO THE GOLD COTTONWOOD TREE BEHIND WHOSE TOP WE STOOD AND SAID AGAIN WHAT NAME +WE WERE SO DEEP IN THE GRASS THAT WE COULD SEE NOTHING BUT THE BLUE SKY OVER US AND THE GOLD TREE IN FRONT OF US +IT WAS SO LONG THAT IT BUSHED OUT BEHIND HIS EARS AND MADE HIM LOOK LIKE THE OLD PORTRAITS I REMEMBERED IN VIRGINIA +SHE MADE MISSUS SHIMERDA UNDERSTAND THE FRIENDLY INTENTION OF OUR VISIT AND THE BOHEMIAN WOMAN HANDLED THE LOAVES OF BREAD AND EVEN SMELLED THEM AND EXAMINED THE PIES WITH LIVELY CURIOSITY EXCLAIMING MUCH GOOD MUCH THANK +EVEN FROM A DISTANCE ONE COULD SEE THAT THERE WAS SOMETHING STRANGE ABOUT THIS BOY +SHE GOT UP ON HER KNEES AND WRUNG HER HANDS +YOU'LL GET FIXED UP COMFORTABLE AFTER WHILE MISSUS SHIMERDA MAKE GOOD HOUSE +WHEN I CAME UP HE TOUCHED MY SHOULDER AND LOOKED SEARCHINGLY DOWN INTO MY FACE FOR SEVERAL SECONDS +ANTONIA POINTED UP TO THE SKY AND QUESTIONED ME WITH HER GLANCE +OCCASIONALLY ONE OF THE HORSES WOULD TEAR OFF WITH HIS TEETH A PLANT FULL OF BLOSSOMS AND WALK ALONG MUNCHING IT THE FLOWERS NODDING IN TIME TO HIS BITES AS HE ATE DOWN TOWARD THEM +NOW WHY IS THAT OTTO +THE FAMILY HAD BEEN LIVING ON CORNCAKES AND SORGHUM MOLASSES FOR THREE DAYS +HE STRUCK AMBROSCH ON THE BACK AND THE BOY SMILED KNOWINGLY +I BECAME SOMEWHAT EMBARRASSED FOR I WAS USED TO BEING TAKEN FOR GRANTED BY MY ELDERS +FUCHS BROUGHT UP A SACK OF POTATOES AND A PIECE OF CURED PORK FROM THE CELLAR AND GRANDMOTHER PACKED SOME LOAVES OF SATURDAY'S BREAD A JAR OF BUTTER AND SEVERAL PUMPKIN PIES IN THE STRAW OF THE WAGON BOX +IT'S NO BETTER THAN A BADGER HOLE NO PROPER DUGOUT AT ALL +WEEK FOLLOWED WEEK THESE TWO BEINGS LED A HAPPY LIFE IN THAT HOVEL +HE PASSED HOURS IN WATCHING HER 
DRESSING AND UNDRESSING HER DOLL AND IN LISTENING TO HER PRATTLE +ONLY AS HE WAS FIVE AND FIFTY AND COSETTE EIGHT YEARS OF AGE ALL THAT MIGHT HAVE BEEN LOVE IN THE WHOLE COURSE OF HIS LIFE FLOWED TOGETHER INTO A SORT OF INEFFABLE LIGHT +MOREOVER JEAN VALJEAN HAD CHOSEN HIS REFUGE WELL +HE HAD RETURNED TO PRISON THIS TIME FOR HAVING DONE RIGHT HE HAD QUAFFED FRESH BITTERNESS DISGUST AND LASSITUDE WERE OVERPOWERING HIM EVEN THE MEMORY OF THE BISHOP PROBABLY SUFFERED A TEMPORARY ECLIPSE THOUGH SURE TO REAPPEAR LATER ON LUMINOUS AND TRIUMPHANT BUT AFTER ALL THAT SACRED MEMORY WAS GROWING DIM +COSETTE WAS NO LONGER IN RAGS SHE WAS IN MOURNING +WHEN THESE TWO SOULS PERCEIVED EACH OTHER THEY RECOGNIZED EACH OTHER AS NECESSARY TO EACH OTHER AND EMBRACED EACH OTHER CLOSELY +THE MAN NO LONGER PRODUCED ON HER THE EFFECT OF BEING OLD OR POOR SHE THOUGHT JEAN VALJEAN HANDSOME JUST AS SHE THOUGHT THE HOVEL PRETTY +HE HAD NEVER BEEN FATHER LOVER HUSBAND FRIEND +WHO KNOWS WHETHER JEAN VALJEAN HAD NOT BEEN ON THE EVE OF GROWING DISCOURAGED AND OF FALLING ONCE MORE +ALAS HE WALKED WITH NO LESS INDECISION THAN COSETTE +THE BEST OF US ARE NOT EXEMPT FROM EGOTISTICAL THOUGHTS +SHE FELT THAT WHICH SHE HAD NEVER FELT BEFORE A SENSATION OF EXPANSION +HIS SISTER AND HIS SISTER'S CHILDREN HAD LEFT HIM ONLY A VAGUE AND FAR OFF MEMORY WHICH HAD FINALLY ALMOST COMPLETELY VANISHED HE HAD MADE EVERY EFFORT TO FIND THEM AND NOT HAVING BEEN ABLE TO FIND THEM HE HAD FORGOTTEN THEM +THE HEART OF THAT EX CONVICT WAS FULL OF VIRGINITY +NATURE A DIFFERENCE OF FIFTY YEARS HAD SET A PROFOUND GULF BETWEEN JEAN VALJEAN AND COSETTE DESTINY FILLED IN THIS GULF +COSETTE ON HER SIDE HAD ALSO UNKNOWN TO HERSELF BECOME ANOTHER BEING POOR LITTLE THING +TO MEET WAS TO FIND EACH OTHER +HE HAD PAID HER SIX MONTHS IN ADVANCE AND HAD COMMISSIONED THE OLD WOMAN TO FURNISH THE CHAMBER AND DRESSING ROOM AS WE HAVE SEEN +HE WAS THAT CHILD'S STAY AND SHE WAS HIS PROP +AND THEN HE TALKED OF HER MOTHER AND HE MADE HER PRAY +HE SUFFERED ALL THE PANGS OF A MOTHER AND HE KNEW NOT WHAT IT MEANT FOR THAT GREAT AND SINGULAR MOVEMENT OF A HEART WHICH BEGINS TO LOVE IS A VERY OBSCURE AND A VERY SWEET THING +HE PROTECTED HER AND SHE STRENGTHENED HIM +WHAT ALTERNATIVE WAS THERE FOR HER +AND ALREADY THIS ASTOUNDING BLOW BEGINS TO TAKE ITS PLACE AMONG OTHER EVENTS AS A THING STRANGE AND TERRIBLE INDEED BUT RELATED TO ALL THE STRANGENESS AND MYSTERY OF LIFE PART OF THE UNIVERSAL MYSTERIES OF DESPAIR AND FUTILITY AND DEATH THAT HAVE TROUBLED MY CONSCIOUSNESS SINCE CHILDHOOD +I BECAME GROTESQUELY ANXIOUS TO ASSURE HIM THAT INDEED SHE AND I HAD BEEN AS THEY SAY INNOCENT THROUGHOUT OUR LAST DAY TOGETHER +SHE WAS DESTROYED NOT MERELY BY THE UNCONSIDERED UNDISCIPLINED PASSIONS OF HER HUSBAND AND HER LOVER BUT BY THE VAST TRADITION THAT SUSTAINS AND ENFORCES THE SUBJUGATION OF HER SEX +IT WAS THAT IDEA OF WASTE THAT DOMINATED MY MIND IN A STRANGE INTERVIEW I HAD WITH JUSTIN +IT SEEMS TO ME MORE AND MORE AS I LIVE LONGER THAT MOST POETRY AND MOST LITERATURE AND PARTICULARLY THE LITERATURE OF THE PAST IS DISCORDANT WITH THE VASTNESS AND VARIETY THE RESERVES AND RESOURCES AND RECUPERATIONS OF LIFE AS WE LIVE IT TO DAY +WE RANGE WIDER LAST LONGER AND ESCAPE MORE AND MORE FROM INTENSITY TOWARDS UNDERSTANDING +FOR A TIME THE DEATH OF MARY OBSCURED HER LIFE FOR ME BUT NOW HER LIVING PRESENCE IS MORE IN MY MIND AGAIN +IF WE HAD BEEN BROTHER AND SISTER INDEED THERE WAS NOTHING +YOU WERE WRONG IN ALL THAT I SAID SHE KEPT HER FAITH WITH YOU +HOW WE MUST SIMPLIFY +WE NEVER PLANNED 
TO MEET AND WHEN WE MET +AND IT IS UPON THIS EFFECT OF SWEET AND BEAUTIFUL POSSIBILITIES CAUGHT IN THE NET OF ANIMAL JEALOUSIES AND THOUGHTLESS MOTIVES AND ANCIENT RIGID INSTITUTIONS THAT I WOULD END THIS WRITING +BUT NOW IT DOESN'T SEEM TO MATTER VERY MUCH +IT IS THE EXPRESSION OF LIFE UNDER CRUDER AND MORE RIGID CONDITIONS THAN OURS LIVED BY PEOPLE WHO LOVED AND HATED MORE NAIVELY AGED SOONER AND DIED YOUNGER THAN WE DO +IN MARY IT SEEMS TO ME I FOUND BOTH WOMANHOOD AND FELLOWSHIP I FOUND WHAT MANY HAVE DREAMT OF LOVE AND FRIENDSHIP FREELY GIVEN AND I COULD DO NOTHING BUT CLUTCH AT HER TO MAKE HER MY POSSESSION +YOU SEE THE TREATMENT IS A TRIFLE FANCIFUL +BUT IF I PLAY YOU A ROUNDEL LADY GET ME A GIFT FROM THE EMPEROR'S DAUGHTER HER FINGER RING FOR MY FINGER BRING THOUGH SHE'S PLEDGED A THOUSAND LEAGUES OVER THE WATER LADY LADY MY FAIR LADY O MY ROSE WHITE LADY +THE EMPEROR'S DAUGHTER +THE LADIES +LADY LADY MY ROSE WHITE LADY BUT WILL YOU NOT HEAR A ROUNDEL LADY +I'LL PLAY FOR YOU NOW NEATH THE APPLE BOUGH AND YOU SHALL DREAM ON THE LAWN SO SHADY LADY LADY MY FAIR LADY O MY APPLE GOLD LADY +ONCE MORE THE SINGER PLAYS AND THE LADIES DANCE BUT ONE BY ONE THEY FALL ASLEEP TO THE DROWSY MUSIC AND THEN THE SINGER STEPS INTO THE RING AND UNLOCKS THE TOWER AND KISSES THE EMPEROR'S DAUGHTER +BED TIME CHILDREN +THE LADIES IN YELLOW DRESSES STAND AGAIN IN A RING ABOUT THE EMPEROR'S DAUGHTER AND ARE FOR THE LAST TIME ACCOSTED BY THE SINGER WITH HIS LUTE +THE WANDERING SINGER +SHE WOULD NOT SPEAK THOUGH WE DANCED A WEEK WITH HER THOUGHTS A THOUSAND LEAGUES OVER THE WATER SINGER SINGER WANDERING SINGER O MY HONEY SWEET SINGER +I DON'T KNOW WHAT BECOMES OF THE LADIES +BUT THIS IS A FALLACY +BUT I DID ONCE HAVE THE LUCK TO HEAR AND SEE THE LADY PLAYED IN ENTIRETY THE CHILDREN HAD BEEN GRANTED LEAVE TO PLAY JUST ONE MORE GAME BEFORE BED TIME AND OF COURSE THEY CHOSE THE LONGEST AND PLAYED IT WITHOUT MISSING A SYLLABLE +FORGOTTEN TOO THE NAME OF GILLIAN THE LOVELY CAPTIVE +THE WANDERING SINGER APPROACHES THEM WITH HIS LUTE +O IF YOU PLAY US A ROUNDEL SINGER HOW CAN THAT HARM THE EMPEROR'S DAUGHTER +NOW YOU MAY PLAY A SERENA SINGER A DREAM OF NIGHT FOR AN APPLE GOLD LADY FOR THE FRUIT IS NOW ON THE APPLE BOUGH AND THE MOON IS UP AND THE LAWN IS SHADY SINGER SINGER WANDERING SINGER O MY HONEY SWEET SINGER +THE WANDERING SINGER +WORSE AND WORSE HE IS EVEN PRESUMED TO BE THE CAPTIVE'S SWEETHEART WHO WHEEDLES THE FLOWER THE RING AND THE PRISON KEY OUT OF THE STRICT VIRGINS FOR HIS OWN PURPOSES AND FLIES WITH HER AT LAST IN HIS SHALLOP ACROSS THE SEA TO LIVE WITH HER HAPPILY EVER AFTER +IT WAS LOCKED FROM THE INSIDE AND WE HAD TO BURN IT DOWN WITH A TORCH THAT'S WHERE THEY ARE +YES CHARCOAL +THE DAILY NEWSCASTS FROM TERRA SHOWED A CORRESPONDING SHIFT IN INTEREST AT HOME +WELL OF COURSE THEY'RE DEAD WHAT A QUESTION +BRING IT TO THE PUBLIC ATTENTION DRAMATIZE IT +TONY LATTIMER THE DISCOVERER WAS BEGINNING TO CASH IN ON HIS ATTENTIONS TO GLORIA AND HIS INGRATIATION WITH SID HE WAS ALWAYS EITHER MAKING VOICE AND IMAGE TALKS FOR TELECAST OR LISTENING TO THE NEWS FROM THE HOME PLANET +THAT TOOK THE CENTER OF INTEREST AWAY FROM ARCHAEOLOGY AND STARTED A NEW BURST OF ACTIVITY +NOW IT WAS BURNED AWAY AT BOTH SIDES AND LAY STILL HOT ALONG THE EDGES ON THE FLOOR OF THE BIG OFFICE ROOM IN FRONT +THE TERRAN PUBLIC WANTED TO HEAR ABOUT MARTIANS AND IF LIVE MARTIANS COULDN'T BE FOUND A ROOM FULL OF DEAD ONES WAS THE NEXT BEST THING +A FLOODLIGHT WAS ON IN THE ROOM INSIDE AND LATTIMER WAS GOING AROUND LOOKING AT THINGS WHILE 
A SPACE FORCE OFFICER STOOD BY THE DOOR +THEY HAD FOUR OR FIVE SPECIES OF WHAT MIGHT LOOSELY BE CALLED BIRDS AND SOMETHING THAT COULD EASILY BE CLASSED AS A REPTILE AND A CARNIVOROUS MAMMAL THE SIZE OF A CAT WITH BIRDLIKE CLAWS AND A HERBIVORE ALMOST IDENTICAL WITH THE PIGLIKE THING IN THE BIG DARFHULVA MURAL AND ANOTHER LIKE A GAZELLE WITH A SINGLE HORN IN THE MIDDLE OF ITS FOREHEAD +THE ORGANIZATION OF A SOCIETY OF MARTIAN ARCHAEOLOGY WITH ANTHONY LATTIMER PH D THE LOGICAL CANDIDATE FOR THE CHAIR +BILL CHANDLER THE ZOOLOGIST HAD BEEN GOING DEEPER AND DEEPER INTO THE OLD SEA BOTTOM OF SYRTIS +MARTHA REMEMBERED THE CLOSED DOOR ON THE FIRST SURVEY THEY HADN'T ATTEMPTED OPENING IT +THE CIVILIAN SPECIALISTS IN OTHER FIELDS AND THE SPACE FORCE PEOPLE WHO HAD BEEN HOLDING TAPE LINES AND MAKING SKETCHES AND SNAPPING CAMERAS WERE ALL FLYING TO LOWER SYRTIS TO FIND OUT HOW MUCH OXYGEN THERE WAS AND WHAT KIND OF LIFE IT SUPPORTED +WITHOUT QUESTION HE HAD BECOME OVERNIGHT THE MOST WIDELY KNOWN ARCHAEOLOGIST IN HISTORY +SO THEY JUST CAME IN HERE AND LIT THE CHARCOAL AND SAT DRINKING TOGETHER TILL THEY ALL FELL ASLEEP +NOT THAT I'M INTERESTED IN ALL THIS FOR MYSELF HE DISCLAIMED AFTER LISTENING TO THE TELECAST FROM TERRA TWO DAYS AFTER HIS DISCOVERY +TONY'S FOUND THE MARTIANS +SO I BELIEVE I SHALL GO BACK AT LEAST FOR A WHILE AND SEE WHAT I CAN DO +LECTURES +MASS SUICIDE THAT'S WHAT IT WAS NOTICE WHAT'S IN THE CORNERS +GLORIA STANDISH WHO HAD DROPPED IN FOR LUNCH WAS ON THE MEZZANINE FAIRLY SCREAMING INTO A RADIOPHONE EXTENSION DOZEN AND A HALF OF THEM +THEY ALSO FOUND A MARTIAN CALENDAR THE YEAR HAD BEEN DIVIDED INTO TEN MORE OR LESS EQUAL MONTHS AND ONE OF THEM HAD BEEN DOMA +HE SMILED GUILTILY AS HE ADDED BUT I MUST ADMIT I WAS MORE THAN A LITTLE CONCERNED MYSELF +AND THE ONLY TRUCK WE HAD AVAILABLE WAS IN THAT BURNING SHED THE SUPERINTENDENT ADDED BITTERLY +THE TWO GIRLS WERE AS MUCH UPSET AS TOM'S MOTHER TOM LAUGHED +THE TELEPHONE LINE WAS SOON REPAIRED AND A STEADY STREAM OF RESCUE VEHICLES BEGAN ARRIVING FROM HARKNESS FIRE TRUCKS THREE AMBULANCES AND PRIVATE CARS DRIVEN BY VOLUNTEERS +I'LL BE GLAD TO TRY SIR HE REPLIED +MISTER SWIFT'S EYES TWINKLED +THE YOUNG INVENTOR HAD JUST NOTICED HIS FRIEND LYING PINNED BENEATH A HEAVY BEAM NEARBY +HE'S A GREAT SCIENTIST +BUD THREW UP HIS ARMS TO PROTECT HIMSELF BUT TOO LATE +THEY PICKED THEIR WAY THROUGH THE WRECKAGE AND EMERGED ON A SCENE OF FRIGHTFUL DESTRUCTION +HIS FRIEND'S EYELIDS FLICKERED +MALE OR FEMALE HUMAN OR ANIMAL +THE SKY WAS VISIBLE THROUGH SEVERAL GAPING HOLES IN THE ROOF WHICH WAS SAGGING DANGEROUSLY ON ITS SUPPORTING TRUSSES +THEN TOM WHO HAD BEEN STUNNED BY SOME FALLING DEBRIS RAISED HIMSELF TO A SITTING POSITION +LET'S SEE ABOUT GETTING HELP FOR MISTER FABER +ANOTHER ENGINEER RUSHED TOWARD THE DOOR TO SEE WHAT WAS HAPPENING OUTSIDE +ELECTRONIC EQUIPMENT CASCADED FROM THE WALL SHELVES AND A HEAVY DUTY CHAIN HOIST CAME LOOSE FROM ITS OVERHEAD TRACK PLUNGING TO THE FLOOR WITH A TERRIFYING CRASH +AN INSTANT LATER IT CRASHED OVER PINNING MARK FABER BENEATH IT +THIS ISN'T PART OF YOUR TESTING ROUTINE IS IT +TOM NODDED UNHAPPILY +FOR MINUTES NO ONE STIRRED AMONG THE WRECKAGE +ANYHOW WE WANT TO HELP GOT A JOB FOR US +WE'D BETTER NOT TRY TO MOVE HIM TOM DECIDED WE'LL GET AN AMBULANCE +WITHIN MINUTES TOM WAS IN CHARGE OF CLEARING AWAY RUBBLE AND EXTRICATING ANYONE WHO MIGHT BE TRAPPED INSIDE THE BUILDINGS +INSIDE A SECRET ROCKET TELEMETERING DEVICE WAS MOUNTED ON ITS TEST STAND +MISTER SWIFT CAME INTO THE LIVING ROOM JUST THEN AND TOLD TOM HOW 
WORRIED MISSUS SWIFT AND SANDY HAD BEEN +THEY CLUSTER AROUND ME THEIR HANDS ARE TALONED THEIR EYES ARE RED LIKE FLAME BURNING IN DARKNESS +SINCE HIS BIRTH HE HAS BEEN GUARDED SO CLOSELY THAT THE CLEVEREST POISONERS OF THE EAST COULD NOT REACH HIM +BY WHICH A SOUL IS DRAWN FROM ITS BODY AND ACROSS GULFS OF ECHOING SPACE RETURNED THE MAN ON THE MAT +BUT AT THE URGENT ENTREATY OF THE PRINCESS OF KHOSALA WHO LOVED BHUNDA CHAND VAINLY HE GAVE HER A LOCK OF HIS LONG BLACK HAIR AS A TOKEN OF REMEMBRANCE +THE MAN SHRUGGED HIS BROAD SHOULDERS AND TURNED BACK INTO THE ARABESQUE CHAMBER +A LOW CONFUSED MOAN WANED FROM HIS MOUTH +I TELL YOU IT IS NOT POISON SHE CRIED +THE SLANT OF THE MOON PRESAGED EVIL FOR THE KING OF VENDHYA THE STARS ARE IN TURMOIL THE SERPENT IN THE HOUSE OF THE ELEPHANT +THERE THEY STROVE TO BREAK THE SILVER CORD OF LIFE AND THRUST MY SOUL INTO THE BODY OF A FOUL NIGHT WEIRD THEIR SORCERY SUMMONED UP FROM HELL AH +THERE WAS THE OLD IMPERIOUS NOTE IN HIS FAILING WHISPER +ON THE DAIS UNDER THE GOLDEN DOME THE KING CRIED OUT AGAIN RACKED BY AWFUL PAROXYSMS +THEY SEEK TO SNAP THE SILVER CORD THAT BINDS ME TO MY DYING BODY +AS YOU WELL KNOW THERE ARE TEN MEN AND TEN WOMEN WHOSE SOLE DUTY IS TO TASTE HIS FOOD AND WINE AND FIFTY ARMED WARRIORS GUARD HIS CHAMBER AS THEY GUARD IT NOW +POINT OF CONTACT INQUIRED THE OTHER +I KNOW NOW WHAT BRINGS ME TO THE PYRE +THEIR FINGERS SEAR ME LIKE FIRE +NOT UNTIL THE HEAVENS WERE IN THE PROPER ORDER COULD THEY PERFORM THIS NECROMANCY +HE WAS YOUNG NO SPEAR HAD TOUCHED HIM NO POISON LURKED IN HIS WINE +SEND MY SOUL CLEAN TO ASURA +ALL DISCARDED PORTIONS OF THE HUMAN BODY STILL REMAIN PART OF IT ATTACHED TO IT BY INTANGIBLE CONNECTIONS +WITH A LONG STAINED FINGERNAIL HE MAPPED THE CONSTELLATIONS ON THE MARBLE TILED FLOOR +THIS MAN WAS CLAD IN A BROWN CAMEL HAIR ROBE AND SANDALS AND A GREEN TURBAN WAS ON HIS HEAD +YOU HAVE NEVER DISOBEYED ME OBEY MY LAST COMMAND +YOUR CRY AND THE GRIP OF YOUR FINGERS BROUGHT ME BACK BUT I AM GOING FAST +SUMMER SQUASHES ALMOST IN THEIR GOLDEN BLOSSOM CUCUMBERS NOW EVINCING A TENDENCY TO SPREAD AWAY FROM THE MAIN STOCK AND RAMBLE FAR AND WIDE TWO OR THREE ROWS OF STRING BEANS AND AS MANY MORE THAT WERE ABOUT TO FESTOON THEMSELVES ON POLES TOMATOES OCCUPYING A SITE SO SHELTERED AND SUNNY THAT THE PLANTS WERE ALREADY GIGANTIC AND PROMISED AN EARLY AND ABUNDANT HARVEST +WHAT AN INSTRUMENT IS THE HUMAN VOICE +PHOEBE WONDERED WHOSE CARE AND TOIL IT COULD HAVE BEEN THAT HAD PLANTED THESE VEGETABLES AND KEPT THE SOIL SO CLEAN AND ORDERLY +FEWER WORDS THAN BEFORE BUT WITH THE SAME MYSTERIOUS MUSIC IN THEM +AND YET IF YOU COULD ONLY SEE THE BENIGN SMILE OF THE ORIGINAL +THERE IS A WONDERFUL INSIGHT IN HEAVEN'S BROAD AND SIMPLE SUNSHINE +OH REJOINED THE DAGUERREOTYPIST BECAUSE LIKE AN OLD LADY'S CUP OF TEA IT IS WATER BEWITCHED +THE ENCLOSURE HAD FORMERLY BEEN VERY EXTENSIVE BUT WAS NOW CONTRACTED WITHIN SMALL COMPASS AND HEMMED ABOUT PARTLY BY HIGH WOODEN FENCES AND PARTLY BY THE OUTBUILDINGS OF HOUSES THAT STOOD ON ANOTHER STREET +IT NOW CONTAINED ONLY CHANTICLEER HIS TWO WIVES AND A SOLITARY CHICKEN +SINCE YOU ARE A FRIEND OF MY COUSIN HEPZIBAH'S YOU SHOULD ASK HER TO SHOW YOU THE PICTURE +HE EXHIBITED A DAGUERREOTYPE MINIATURE IN A MOROCCO CASE +THIS WAS A FOUNTAIN SET ROUND WITH A RIM OF OLD MOSSY STONES AND PAVED IN ITS BED WITH WHAT APPEARED TO BE A SORT OF MOSAIC WORK OF VARIOUSLY COLORED PEBBLES +WHILE WE GIVE IT CREDIT ONLY FOR DEPICTING THE MEREST SURFACE IT ACTUALLY BRINGS OUT THE SECRET CHARACTER WITH A TRUTH THAT NO 
PAINTER WOULD EVER VENTURE UPON EVEN COULD HE DETECT IT +I TURN UP THE EARTH BY WAY OF PASTIME +THE WHITE DOUBLE ROSEBUSH HAD EVIDENTLY BEEN PROPPED UP ANEW AGAINST THE HOUSE SINCE THE COMMENCEMENT OF THE SEASON AND A PEAR TREE AND THREE DAMSON TREES WHICH EXCEPT A ROW OF CURRANT BUSHES CONSTITUTED THE ONLY VARIETIES OF FRUIT BORE MARKS OF THE RECENT AMPUTATION OF SEVERAL SUPERFLUOUS OR DEFECTIVE LIMBS +HERE WE HAVE THE MAN SLY SUBTLE HARD IMPERIOUS AND WITHAL COLD AS ICE LOOK AT THAT EYE +HOW WONDERFULLY RESPONSIVE TO EVERY EMOTION OF THE HUMAN SOUL +BEES TOO STRANGE TO SAY HAD THOUGHT IT WORTH THEIR WHILE TO COME HITHER POSSIBLY FROM THE RANGE OF HIVES BESIDE SOME FARM HOUSE MILES AWAY +YET THE ORIGINAL WEARS TO COMMON EYES A VERY DIFFERENT EXPRESSION +MOST OF MY LIKENESSES DO LOOK UNAMIABLE BUT THE VERY SUFFICIENT REASON I FANCY IS BECAUSE THE ORIGINALS ARE SO +IS THERE NOTHING WILD IN THE EYE CONTINUED HOLGRAVE SO EARNESTLY THAT IT EMBARRASSED PHOEBE AS DID ALSO THE QUIET FREEDOM WITH WHICH HE PRESUMED ON THEIR SO RECENT ACQUAINTANCE +PHOEBE MERELY GLANCED AT IT AND GAVE IT BACK +IT IS LIKE A BANDAGE OVER ONE'S EYES TO COME INTO IT +WHILE THUS DISMISSING HER THE MAIDEN LADY STEPT FORWARD KISSED PHOEBE AND PRESSED HER TO HER HEART WHICH BEAT AGAINST THE GIRL'S BOSOM WITH A STRONG HIGH AND TUMULTUOUS SWELL +THEY KEPT THEMSELVES ALIVE UNQUESTIONABLY AND LAID NOW AND THEN AN EGG AND HATCHED A CHICKEN NOT FOR ANY PLEASURE OF THEIR OWN BUT THAT THE WORLD MIGHT NOT ABSOLUTELY LOSE WHAT HAD ONCE BEEN SO ADMIRABLE A BREED OF FOWLS +THE CHICKEN CREPT THROUGH THE PALES OF THE COOP AND RAN WITH SOME SHOW OF LIVELINESS TO HER FEET WHILE CHANTICLEER AND THE LADIES OF HIS HOUSEHOLD REGARDED HER WITH QUEER SIDELONG GLANCES AND THEN CROAKED ONE TO ANOTHER AS IF COMMUNICATING THEIR SAGE OPINIONS OF HER CHARACTER +AH BUT THESE HENS ANSWERED THE YOUNG MAN THESE HENS OF ARISTOCRATIC LINEAGE WOULD SCORN TO UNDERSTAND THE VULGAR LANGUAGE OF A BARN YARD FOWL +SO WE WILL BE FELLOW LABORERS SOMEWHAT ON THE COMMUNITY SYSTEM +THEY HAVE KNOWN ME MUCH LONGER BUT NEVER HONOR ME WITH ANY FAMILIARITY THOUGH HARDLY A DAY PASSES WITHOUT MY BRINGING THEM FOOD +HE HELD A HOE IN HIS HAND AND WHILE PHOEBE WAS GONE IN QUEST OF THE CRUMBS HAD BEGUN TO BUSY HIMSELF WITH DRAWING UP FRESH EARTH ABOUT THE ROOTS OF THE TOMATOES +I PREFER TO THINK AND SO WOULD MISS HEPZIBAH THAT THEY RECOGNIZE THE FAMILY TONE FOR YOU ARE A PYNCHEON +IT IS NONSENSE SAID PHOEBE A LITTLE IMPATIENTLY FOR US TO TALK ABOUT A PICTURE WHICH YOU HAVE NEVER SEEN +THESE FEATHERED PEOPLE HAD EXISTED TOO LONG IN THEIR DISTINCT VARIETY A FACT OF WHICH THE PRESENT REPRESENTATIVES JUDGING BY THEIR LUGUBRIOUS DEPORTMENT SEEMED TO BE AWARE +WELL I DON'T WISH TO SEE IT ANY MORE OBSERVED PHOEBE TURNING AWAY HER EYES IT IS CERTAINLY VERY LIKE THE OLD PORTRAIT +PRAY GO TO BED FOR I AM SURE YOU MUST NEED REST +THE SUN AS YOU SEE TELLS QUITE ANOTHER STORY AND WILL NOT BE COAXED OUT OF IT AFTER HALF A DOZEN PATIENT ATTEMPTS ON MY PART +SHE WAS INDISTINCTLY AWARE HOWEVER THAT THE GAUNT FIGURE OF THE OLD GENTLEWOMAN WAS SITTING IN ONE OF THE STRAIGHT BACKED CHAIRS A LITTLE WITHDRAWN FROM THE WINDOW THE FAINT GLEAM OF WHICH SHOWED THE BLANCHED PALENESS OF HER CHEEK TURNED SIDEWAYS TOWARDS A CORNER +THE DISTINGUISHING MARK OF THE HENS WAS A CREST OF LAMENTABLY SCANTY GROWTH IN THESE LATTER DAYS BUT SO ODDLY AND WICKEDLY ANALOGOUS TO HEPZIBAH'S TURBAN THAT PHOEBE TO THE POIGNANT DISTRESS OF HER CONSCIENCE BUT INEVITABLY WAS LED TO FANCY A GENERAL RESEMBLANCE BETWIXT THESE 
FORLORN BIPEDS AND HER RESPECTABLE RELATIVE +I WILL SIT IN THE PARLOR AWHILE AND COLLECT MY THOUGHTS +SO WISE AS WELL AS ANTIQUE WAS THEIR ASPECT AS TO GIVE COLOR TO THE IDEA NOT MERELY THAT THEY WERE THE DESCENDANTS OF A TIME HONORED RACE BUT THAT THEY HAD EXISTED IN THEIR INDIVIDUAL CAPACITY EVER SINCE THE HOUSE OF THE SEVEN GABLES WAS FOUNDED AND WERE SOMEHOW MIXED UP WITH ITS DESTINY +AT SOME UNCERTAIN PERIOD IN THE DEPTHS OF NIGHT AND AS IT WERE THROUGH THE THIN VEIL OF A DREAM SHE WAS CONSCIOUS OF A FOOTSTEP MOUNTING THE STAIRS HEAVILY BUT NOT WITH FORCE AND DECISION +IT WAS EVIDENT THAT THE RACE HAD DEGENERATED LIKE MANY A NOBLE RACE BESIDES IN CONSEQUENCE OF TOO STRICT A WATCHFULNESS TO KEEP IT PURE +BUT PUT IT ON THE TABLE IN THE CORNER OF THE PASSAGE +SHE DID NOT ALTOGETHER LIKE HIM +THERE WERE ALSO A FEW SPECIES OF ANTIQUE AND HEREDITARY FLOWERS IN NO VERY FLOURISHING CONDITION BUT SCRUPULOUSLY WEEDED AS IF SOME PERSON EITHER OUT OF LOVE OR CURIOSITY HAD BEEN ANXIOUS TO BRING THEM TO SUCH PERFECTION AS THEY WERE CAPABLE OF ATTAINING +IF YOU WOULD PERMIT ME SAID THE ARTIST LOOKING AT PHOEBE I SHOULD LIKE TO TRY WHETHER THE DAGUERREOTYPE CAN BRING OUT DISAGREEABLE TRAITS ON A PERFECTLY AMIABLE FACE +IF THE ORIGINAL IS STILL IN THE WORLD I THINK HE MIGHT DEFY THE SUN TO MAKE HIM LOOK STERN AND HARD +MY NAME IS PHOEBE PYNCHEON SAID THE GIRL WITH A MANNER OF SOME RESERVE FOR SHE WAS AWARE THAT HER NEW ACQUAINTANCE COULD BE NO OTHER THAN THE DAGUERREOTYPIST OF WHOSE LAWLESS PROPENSITIES THE OLD MAID HAD GIVEN HER A DISAGREEABLE IDEA +IN GOOD FAITH HOWEVER HE IS NOT SUFFICIENTLY IMAGINATIVE TO FLATTER HIMSELF WITH THE SLIGHTEST HOPE OF THIS KIND +HE TRUSTS NOT TO BE CONSIDERED AS UNPARDONABLY OFFENDING BY LAYING OUT A STREET THAT INFRINGES UPON NOBODY'S PRIVATE RIGHTS AND APPROPRIATING A LOT OF LAND WHICH HAD NO VISIBLE OWNER AND BUILDING A HOUSE OF MATERIALS LONG IN USE FOR CONSTRUCTING CASTLES IN THE AIR +THE AUTHOR HAS CONSIDERED IT HARDLY WORTH HIS WHILE THEREFORE RELENTLESSLY TO IMPALE THE STORY WITH ITS MORAL AS WITH AN IRON ROD OR RATHER AS BY STICKING A PIN THROUGH A BUTTERFLY THUS AT ONCE DEPRIVING IT OF LIFE AND CAUSING IT TO STIFFEN IN AN UNGAINLY AND UNNATURAL ATTITUDE +THE NARRATIVE IT MAY BE IS WOVEN OF SO HUMBLE A TEXTURE AS TO REQUIRE THIS ADVANTAGE AND AT THE SAME TIME TO RENDER IT THE MORE DIFFICULT OF ATTAINMENT +IF PERMITTED BY THE HISTORICAL CONNECTION WHICH THOUGH SLIGHT WAS ESSENTIAL TO HIS PLAN THE AUTHOR WOULD VERY WILLINGLY HAVE AVOIDED ANYTHING OF THIS NATURE +THERE APPEARED TO BE AN IMMEDIATE ASSOCIATION WITH THE DEATH TRAUMA AS IF THE TWO WERE INEXTRICABLY LINKED INTO ONE +A MINUTE IS NOT A VERY LARGE MEASURE OF TIME AND HIS BODY NEEDED EVERY FRACTION OF IT +PARTICULARLY SO ON THIS LAST NIGHT WHEN ONLY TWO OF THE LITTLE CUBICLES WERE OCCUPIED THE THOUSANDS OF OTHERS STANDING WITH DARK EMPTY DOORS +THE CONTESTANTS IN THE TWENTIES NEEDED UNDISTURBED REST THEREFORE NIGHTS IN THE DORMITORIES WERE AS QUIET AS DEATH +THERE COULD BE LITTLE ART IN THIS LAST AND FINAL ROUND OF FENCING +BRION SAW SOMETHING CLOSE TO PANIC ON HIS OPPONENT'S FACE WHEN THE MAN FINALLY RECOGNIZED HIS ERROR +SWEAT COVERED BRION'S BODY TRICKLING INTO THE TIGHT LOINCLOTH THAT WAS THE ONLY GARMENT HE WORE +TEN SECONDS +THIS IS PHYSICALLY IMPOSSIBLE WHEN CONSCIOUS +IROLG LOOKED AMAZED AT THE SUDDEN FURY OF THE ATTACK THEN SMILED +I'M HERE BECAUSE THE MATTER IS OF UTMOST IMPORTANCE AND BRANDD IS THE ONE I MUST SEE NOW STAND ASIDE +A WAVE OF DESPAIR ROLLED OUT FROM IROLG BRION SENSED IT AND KNEW 
THE FIFTH POINT WAS HIS +BREATHING DEEPLY BRION SOFTLY SPOKE THE AUTO HYPNOTIC PHRASES THAT TRIGGERED THE PROCESS +THERE WAS SILENCE THEN AND STILL WONDERING BRION WAS ONCE MORE ASLEEP +A MAN SAID TO THE UNIVERSE SIR I EXIST +WHEN THE BUZZER SOUNDED HE PULLED HIS FOIL FROM HIS SECOND'S STARTLED GRASP AND RAN FORWARD +JUST THRUST AND PARRY AND VICTORY TO THE STRONGER +THE STRENGTH THAT ENABLES SOMEONE IN A TRANCE TO HOLD HIS BODY STIFF AND UNSUPPORTED EXCEPT AT TWO POINTS THE HEAD AND HEELS +THE CUT ON HIS CHEST STILL DRIPPING BLOOD THE ACHE OF HIS OVERSTRAINED EYES EVEN THE SOARING ARENA AROUND HIM WITH THE THOUSANDS OF SPECTATORS WERE TRIVIALITIES NOT WORTH THINKING ABOUT +THE BUZZER'S WHIRR TRIGGERED HIS MUSCLES INTO COMPLETE RELAXATION +OTHERS HAD DIED BEFORE DURING THE TWENTIES AND DEATH DURING THE LAST ROUND WAS IN SOME WAYS EASIER THAN DEFEAT +ONE MINUTE A VOICE SAID AND THE TIME BUZZER SOUNDED +EVERY MAN WHO ENTERED THE TWENTIES HAD HIS OWN TRAINING TRICKS +HE WAS IN REVERIE SLIDING ALONG THE BORDERS OF CONSCIOUSNESS +THE OTHER VOICE SNAPPED WITH A HARSH URGENCY CLEARLY USED TO COMMAND +THE TWENTIES +THEN THE POWERFUL TWIST THAT THRUST IT ASIDE IN AND UNDER THE GUARD +HE ASKED THE HANDLER WHO WAS KNEADING HIS ACHING MUSCLES +HIS INSTANT OF PANIC WAS FOLLOWED BY A SMALL SHARP BLOW HIGH ON HIS CHEST +HE THOUGHT IT WAS A LAST BURST OF ENERGY HE KNEW HOW CLOSE THEY BOTH WERE TO EXHAUSTION +ONLY HIS HEART AND LUNGS WORKED ON AT A STRONG MEASURED RATE +HE MUST HAVE DRAWN HIS GUN BECAUSE THE INTRUDER SAID QUICKLY PUT THAT AWAY YOU'RE BEING A FOOL OUT +A RED HAIRED MOUNTAIN OF A MAN WITH AN APPARENTLY INEXHAUSTIBLE STORE OF ENERGY +TRUE AGREED KALIKO +I HAVE REMAINED A PRISONER ONLY BECAUSE I WISHED TO BE ONE AND WITH THIS HE STEPPED FORWARD AND BURST THE STOUT CHAINS AS EASILY AS IF THEY HAD BEEN THREADS +THE METAL FOREST IS IN THE GREAT DOMED CAVERN THE LARGEST IN ALL OUR DOMINIONS REPLIED KALIKO +WHERE IS MY BROTHER NOW +OH NO I'M QUITE SURE HE DIDN'T +HAVING RETURNED TO THE ROYAL CAVERN KALIKO FIRST POUNDED THE GONG AND THEN SAT IN THE THRONE WEARING RUGGEDO'S DISCARDED RUBY CROWN AND HOLDING IN HIS HAND THE SCEPTRE WHICH RUGGEDO HAD SO OFTEN THROWN AT HIS HEAD +THAT'S FUNNY REMARKED BETSY THOUGHTFULLY +I DON'T BELIEVE ANN KNEW ANY MAGIC OR SHE'D HAVE WORKED IT BEFORE +KALIKO WENT TO THE BIG GONG AND POUNDED ON IT JUST AS RUGGEDO USED TO DO BUT NO ONE ANSWERED THE SUMMONS +IN FACT THERE IS NOTHING HE CAN DO IN THESE DOMINIONS AS WELL AS OUR NOMES WHOSE NUMBERS ARE SO GREAT THAT IT WORRIES US TO KEEP THEM ALL BUSY +HOWEVER IF WE LOOK SHARP WE MAY BE ABLE TO DISCOVER ONE OF THESE SECRET WAYS +I DO NOT KNOW CONFESSED SHAGGY +I HOPE HE DOESN'T WORK TOO HARD SAID SHAGGY +THE LITTLE GIRL HAD BEEN ASLEEP BUT SHE HEARD THE RAPS AND OPENED THE DOOR +I ALSO OFFERED TO HELP YOUR BROTHER TO ESCAPE BUT HE WOULD NOT GO +I BEGGED RUGGEDO LONG AGO TO SEND HIM AWAY BUT HE WOULD NOT DO SO +THE KING HAS FLED IN DISGRACE AND YOUR FRIENDS ARE ASKING FOR YOU +WHERE IS THAT +NOT EXACTLY RETURNED KALIKO +KALIKO HESITATED +HE EATS AND SLEEPS VERY STEADILY REPLIED THE NEW KING +INQUIRED SHAGGY IN THE METAL FOREST +BECAUSE YOU WERE SLEEPING INSTEAD OF CONQUERING THE LOVELY ROSE PRINCESS HAS BECOME A FIDDLE WITHOUT A BOW WHILE POOR SHAGGY SITS THERE A COOING DOVE +HE DOESN'T WORK AT ALL +HE HAS GONE AND GONE FOR GOOD ANSWERED POLYCHROME WHO HAD MANAGED TO SQUEEZE INTO THE ROOM BESIDE THE DRAGON AND HAD WITNESSED THE OCCURRENCES WITH MUCH INTEREST +PAINTING HE TELLS US IS OF A DIFFERENT QUALITY TO MATHEMATICS AND 
FINISH IN ART IS ADDING MORE FACT +IN FACT HE IS QUITE SEVERE ON MISTER RUSKIN FOR NOT RECOGNISING THAT A PICTURE SHOULD DENOTE THE FRAILTY OF MAN AND REMARKS WITH PLEASING COURTESY AND FELICITOUS GRACE THAT MANY PHASES OF FEELING +LINNELL'S PICTURES ARE A SORT OF UP GUARDS AND AT EM PAINTINGS AND MASON'S EXQUISITE IDYLLS ARE AS NATIONAL AS A JINGO POEM MISTER BIRKET FOSTER'S LANDSCAPES SMILE AT ONE MUCH IN THE SAME WAY THAT MISTER CARKER USED TO FLASH HIS TEETH AND MISTER JOHN COLLIER GIVES HIS SITTER A CHEERFUL SLAP ON THE BACK BEFORE HE SAYS LIKE A SHAMPOOER IN A TURKISH BATH NEXT MAN +ONLY UNFORTUNATELY HIS OWN WORK NEVER DOES GET GOOD +ON THE GENERAL PRINCIPLES OF ART MISTER QUILTER WRITES WITH EQUAL LUCIDITY +MISTER QUILTER IS THE APOSTLE OF THE MIDDLE CLASSES AND WE ARE GLAD TO WELCOME HIS GOSPEL +HE TELLS US THAT AT THIS FESTIVE SEASON OF THE YEAR WITH CHRISTMAS AND ROAST BEEF LOOMING BEFORE US SIMILES DRAWN FROM EATING AND ITS RESULTS OCCUR MOST READILY TO THE MIND +AS FOR ETCHINGS THEY ARE OF TWO KINDS BRITISH AND FOREIGN +BY HARRY QUILTER M A +HE HAS GRAVE DOUBTS WHETHER SIR FREDERICK LEIGHTON'S WORK IS REALLY GREEK AFTER ALL AND CAN DISCOVER IN IT BUT LITTLE OF ROCKY ITHACA +NOR IS MISTER QUILTER'S MANNER LESS INTERESTING THAN HIS MATTER +IT IS OBVIOUSLY UNNECESSARY FOR US TO POINT OUT HOW LUMINOUS THESE CRITICISMS ARE HOW DELICATE IN EXPRESSION +HE LAMENTS MOST BITTERLY THE DIVORCE THAT HAS BEEN MADE BETWEEN DECORATIVE ART AND WHAT WE USUALLY CALL PICTURES MAKES THE CUSTOMARY APPEAL TO THE LAST JUDGMENT AND REMINDS US THAT IN THE GREAT DAYS OF ART MICHAEL ANGELO WAS THE FURNISHING UPHOLSTERER +MISTER QUILTER HAS MISSED HIS CHANCE FOR HE HAS FAILED EVEN TO MAKE HIMSELF THE TUPPER OF PAINTING +NEAR THE FIRE AND THE ORNAMENTS FRED BROUGHT HOME FROM INDIA ON THE MANTEL BOARD +IT WAS NOT FROM ANY REAL CAUSE OF GRIEF THAT SHE WEPT BUT THERE WAS A MAGNETIC QUALITY IN TEARS WHICH ALWAYS ATTRACTED HER'S +NIECE I COMMAND YOU NOT TO STIR OUT OF THIS ROOM THIS EVENING +IT IS OFTEN THE UNGRATEFUL TASK OF A FRIEND TO BE TROUBLESOME SOMETIMES UNMANNERLY +YES INDEED AND I BELIEVE IT IS RIGHT THAT I SHOULD KEEP MY FIRST PROMISE IS IT NOT +KEEP YOUR APPOINTMENT AND BE ASSURED THAT I SHALL ISSUE MY COMMANDS WITH MORE CIRCUMSPECTION FOR THE FUTURE AS I FIND HOW STRICTLY THEY ARE COMPLIED WITH +IF YOU THINK SO MADAM I SEE NOTHING THAT SHOULD PREVENT ME NOW +NIGHT AFTER NIGHT HIS SLEEP HAD BEEN DISTURBED BY FEARS FOR HER WHEN ABROAD MORNING AFTER MORNING IT HAD BEEN BROKEN BY THE CLAMOUR OF HER RETURN +NOR HAD THIS GOOD WOMAN'S OFFICIOUS LABOURS TAKEN THE LEAST FROM THE AWKWARDNESS OF THE SILENCE WHICH AS SOON AS THE BUSTLE SHE HAD MADE WAS OVER RETURNED IN ITS FULL FORCE +SIR EDWARD NOT WHOLLY DISCOURAGED BY THE DENIAL WITH WHICH DORRIFORTH HAD WITH DELICACY ACQUAINTED HIM STILL HOPED FOR A KIND RECEPTION AND WAS SO OFTEN AT THE HOUSE OF MISSUS HORTON THAT LORD FREDERICK'S JEALOUSY WAS EXCITED AND THE TORTURES HE SUFFERED IN CONSEQUENCE CONVINCED HIM BEYOND A DOUBT OF THE SINCERITY OF HIS AFFECTION +MISSUS HORTON TOO IN THE SELF APPROVING REFLECTION THAT SHE WAS NOT IN A QUARREL OR ALTERCATION OF ANY KIND FELT HERSELF AT THIS MOMENT REMARKABLY PEACEFUL AND CHARITABLE +I HOPE MISS MILNER YOU PASS THIS EVENING AT HOME +AT THE USUAL HOUR MISTER DORRIFORTH AND HIS WARD WERE SUMMONED TO TEA HE ENTERED WITH A COUNTENANCE WHICH EVINCED THE REMAINS OF ANGER HIS EYE GAVE TESTIMONY OF HIS ABSENT THOUGHTS AND THOUGH HE TOOK UP A PAMPHLET AFFECTING TO READ IT WAS PLAIN TO DISCERN THAT HE SCARCELY KNEW HE HELD 
IT IN HIS HAND +MISS WOODLEY THOUGHT IT HER DUTY TO BE MUTE AND NOW THE GINGLE OF A TEA SPOON WAS LIKE A DEEP TONED BELL ALL WAS SO QUIET +DORRIFORTH THEN LAID THE BOOK OUT OF HIS HAND AND BY THE TIME THE SERVANT HAD LEFT THE ROOM THUS BEGAN +YET DID THE WATCHFUL MISS WOODLEY OFTENTIMES HEAR A SIGH ESCAPE FROM HER UNKNOWN TO HERSELF TILL SHE WAS REMINDED OF IT AND THEN A SUDDEN BLUSH WOULD INSTANTLY OVERSPREAD HER FACE +DORRIFORTH READ ON AND SEEMED AFRAID OF LOOKING UP LEST HE SHOULD SEE WHAT HE COULD NOT HAVE PARDONED +I THOUGHT MISS MILNER YOU GAVE ME YOUR WORD THAT YOU WOULD PASS THIS EVENING AT HOME +ON THIS HE ROSE FROM HIS CHAIR AND GOING TO HER SAID ONCE MORE SHEW YOUR SUBMISSION BY OBEYING ME A SECOND TIME TO DAY +WHAT HE SAID HE LOOKED WITH SO MUCH SINCERITY THAT HAD SHE BEEN BURNING WITH RAGE AT HIS LATE BEHAVIOUR SHE MUST HAVE FORGIVEN HIM FOR THE REGRET WHICH HE SO FORCIBLY EXPREST +MISSUS HORTON ROSE FROM HER SEAT MOVED THE DECANTERS AND FRUIT ROUND THE TABLE STIRRED THE FIRE AND CAME BACK TO HER SEAT AGAIN BEFORE ANOTHER WORD WAS UTTERED +MISS WOODLEY DID NOT RECOLLECT HERSELF SO BUT WAS SO IN REALITY IN HER PEACE AND CHARITY WERE INSTINCTIVE VIRTUES ACCIDENT COULD NOT INCREASE THEM +MISS MILNER YOU SHALL NOT LEAVE THE HOUSE THIS EVENING SIR +HER HAND FELL MOTIONLESS FROM THAT WHICH SHE HELD SHE APPEARED MOTIONLESS HERSELF TILL MISSUS HORTON BESEECHING HER NOT TO BE UNEASY AT THE TREATMENT SHE HAD RECEIVED MADE HER TEARS FLOW AS IF HER HEART WAS BREAKING +SHE WAS GOING TO REPLY BUT FOUND SHE COULD NOT WITHOUT ACCOMPANYING HER WORDS WITH TEARS THEREFORE AFTER THE FIRST ATTEMPT SHE DESISTED +EVERY TIME HE BEHELD THE OBJECT OF HIS PASSION FOR HE STILL CONTINUED HIS VISITS THOUGH NOT SO FREQUENTLY AS HERETOFORE HE PLEADED HIS CAUSE WITH SUCH ARDOUR THAT MISS WOODLEY WHO WAS SOMETIMES PRESENT AND EVER COMPASSIONATE COULD NOT RESIST WISHING HIM SUCCESS +HE COUGHED DRANK HIS TEA ENDEAVOURED TO TALK BUT FOUND IT DIFFICULT SOMETIMES READ AND IN THIS MANNER NEAR TWO HOURS WERE PASSED AWAY WHEN MISS MILNER CAME INTO THE ROOM NOT DRESSED FOR A BALL BUT AS SHE HAD RISEN FROM DINNER +AND HE WALKED IMMEDIATELY OUT OF THE APARTMENT BY ANOTHER DOOR +DO YOU THINK I WOULD GO ANSWERED MISS MILNER WITH AN EAGERNESS THAT FOR A TIME SUPPRESSED HER TEARS IN CONTRADICTION TO HIS WILL +AFTER A FEW MINUTES PAUSE AND SOME LITTLE EMBARRASSMENT ON THE PART OF MISSUS HORTON AT THE DISAPPOINTMENT SHE HAD TO ENCOUNTER FROM THIS UNEXPECTED DUTIFUL CONDUCT SHE ASKED MISS MILNER IF SHE WOULD NOW HAVE ANY TEA +MISS WOODLEY OBEDIENTLY SAT DOWN AND THOUGH HER THOUGHTS AND HEART WERE IN THE CHAMBER OF HER FRIEND SHE NEVER MARKED BY ONE IMPERTINENT WORD OR BY ONE LINE OF HER FACE THE RESTRAINT SHE SUFFERED +FORGIVE THE DUTIES OF MY OFFICE AND BELIEVE THAT NO ONE IS HALF SO MUCH CONCERNED IF IT ROBS YOU OF ANY DEGREE OF HAPPINESS AS I MYSELF AM +WHEN A MARRIED WOMAN HAS FOLLOWERS AND THE HUSBAND DON'T GO THE WRONG SIDE OF THE POST TOO OR IT AIN'T PROVED AGAIN HIM THAT HE DO THEY'LL NEVER LET HER HAVE NOTHING TO DO WITH THE CHILDREN +MISSUS BOZZLE WAS DISPOSED TO THINK THAT LADIES OF QUALITY AMONG WHOM MADAME T WAS ENTITLED IN HER ESTIMATION TO TAKE RANK WERE SELDOM BETTER THAN THEY OUGHT TO BE AND SHE WAS QUITE WILLING THAT HER HUSBAND SHOULD EARN HIS BREAD BY WATCHING THE LADY OR THE LADY'S LOVER +HE CAN'T SUCKLE EM CAN HE +BUT AS FOR THIS HERE CHILD B +AND NOW IT HAD COME TO PASS THAT HIS SOLE REMAINING ALLY MISTER SAMUEL BOZZLE THE EX POLICEMAN WAS BECOMING WEARY OF HIS SERVICE +AT LAST HE SENT WORD TO SAY THAT 
HE HIMSELF WOULD BE IN ENGLAND BEFORE THE END OF MARCH AND WOULD SEE THAT THE MAJESTY OF THE LAW SHOULD BE VINDICATED IN HIS FAVOUR +BOZZLE AWAY FROM HIS OWN HOME OUT ON BUSINESS WITH HIS COAT BUTTONED OVER HIS BREAST AND HIS BEST HAT IN HIS HAND WAS AWARE THAT HE COMMANDED RESPECT AND HE COULD CARRY HIMSELF ACCORDINGLY +BUT IF YOU ASK ME MY OPINION WHY IN COURSE THEY'VE BEEN TOGETHER SOMEWHERE +IN THE LAST COMMUNICATION WHICH HE HAD RECEIVED FROM LADY MILBOROUGH SHE HAD SCOLDED HIM IN TERMS THAT WERE FOR HER SEVERE BECAUSE HE HAD NOT RETURNED TO HIS WIFE AND TAKEN HER OFF WITH HIM TO NAPLES +IF YOU WOULD HAVE GONE TO MISTER SKINT SIR SUGGESTED BOZZLE +AND HAD THE CASE BEEN BROUGHT BEFORE THE JUDGE ORDINARY BY MEANS OF HER HUSBAND'S EXERTIONS SHE WOULD HAVE TAKEN PLEASURE IN READING EVERY WORD OF THE EVIDENCE EVEN THOUGH HER HUSBAND SHOULD HAVE BEEN EVER SO ROUGHLY HANDLED BY THE LAWYERS +DRAT EM ALL WHAT IS IT THEY WANTS THEY DON'T KNOW WHAT THEY WANTS +BUT TREVELYAN WAS OF A DIFFERENT OPINION AND HE WAS DISGUSTED AND REVOLTED MOST UNREASONABLY BY THE APPEARANCE OF HIS MINISTER'S DOMESTIC ARRANGEMENTS +IT'S THAT AS MAKES EM I WON'T SAY WHAT +PERHAPS YOU COULD PUT ON YOUR COAT AND WALK OUT WITH ME FOR A FEW MINUTES SAID TREVELYAN +I'LL TELL YOU WHAT IT IS B EXCLAIMED MISSUS BOZZLE IT'S MY BELIEF AS HE AIN'T QUITE RIGHT UP HERE AND MISSUS BOZZLE TOUCHED HER FOREHEAD +A DISTINCT PROMISE OF A HUNDRED POUNDS WAS MADE TO HIM IF HE WOULD HAVE THE CHILD READY TO HAND OVER TO TREVELYAN ON TREVELYAN'S ARRIVAL IN ENGLAND +TREVELYAN HAD FOLLOWED HIS LETTER QUICKER THAN HE HAD INTENDED WHEN IT WAS WRITTEN AND WAS NOW WITH HIS PRIME MINISTER BEFORE HIS PRIME MINISTER HAD BEEN ABLE TO TAKE ANY ACTION ON THE LAST INSTRUCTION RECEIVED +THEN BOZZLE CAME FORWARD AND INTRODUCED HIS WIFE +THE PATERNAL PARENT HAS A RIGHT TO HIS INFANTS NO DOUBT THAT WAS BOZZLE'S LAW +I DO NOT SUPPOSE THAT ANYBODY WILL QUESTION MY RIGHT TO HAVE THE CARE OF MY OWN CHILD SAID TREVELYAN +I'VE WATCHED AS SHARP AS WATCHING CAN GO PRETTY NEAR +OF COURSE IT AIN'T SAID MISSUS BOZZLE +AS HE WENT ABOUT HIS EYES WERE EVER CAST DOWNWARDS AND HE WALKED WITH A QUICK SHUFFLING GAIT AND HE SUSPECTED OTHERS FEELING THAT HE HIMSELF WAS SUSPECTED +IT IS VERY MUCH EASIER FOR SUCH MEN AS MISTER BOZZLE TO CARRY DECENCY OF APPEARANCE ABOUT WITH THEM THAN TO KEEP IT AT HOME +AND HE DID GO AWAY LEAVING BOZZLE STANDING IN THE MIDDLE OF STONY WALK +AND ALL WORK HAD CEASED WITH HIM +MISSUS BOZZLE WHO WELL UNDERSTOOD THAT BUSINESS WAS BUSINESS AND THAT WIVES WERE NOT BUSINESS FELT NO ANGER AT THIS AND HANDED HER HUSBAND HIS BEST COAT +AND BOZZLE AS HE SAID THIS SMILED ALMOST ALOUD +DOES ONE MISTER SAMUEL BOZZLE LIVE HERE ASKED TREVELYAN +HE'S UP IN TOWN SIR A MINDING OF HIS PARLIAMENTARY DUTIES +BOZZLE HAD ALWAYS WAITED UPON HIM WITH A DECENT COAT AND A WELL BRUSHED HAT AND CLEAN SHOES +IN MAKING THIS HE HAD EXPECTED NO SUCCESS THOUGH FROM THE ENERGETIC NATURE OF HIS DISPOSITION HE HAD MADE THE ATTEMPT WITH SOME ZEAL +WE HAVE BOTH SEEN THE SAME NEWSPAPER OF COURSE AND YOU HAVE BEEN THE FIRST TO CLEAR THE THING UP THAT'S IT ISN'T IT +WHAT ARE YOU DOING HERE HE ASKED +EVEN THE CHANCE OF SUCCESSFULLY CONFIDING HER TO BENNYDECK'S PROTECTION HAD LOST SOMETHING OF ITS FAIR PROMISE SINCE RANDAL'S VISIT TO SYDENHAM +SHALL I SAY THAT SHE MAY EXPECT AN EARLY VISIT FROM YOU WHEN I SEE HER TO MORROW +BUT IT MIGHT PERHAPS BE EXCUSABLE TO INFER THAT THE MARRIAGE HAD NOT YET BEEN DECIDED ON AND THAT THE CAPTAIN'S PROPOSALS WERE STILL WAITING FOR CATHERINE'S REPLY 
+SUPPOSING THE REPORT HAD BEEN TRUE +IN THE MEANTIME AFTER WHAT MISSUS PRESTY HAD CONFESSED THE CRUEL FALSEHOOD WHICH HAD CHECKED POOR KITTY'S NATURAL INQUIRIES RAISED AN INSUPERABLE OBSTACLE TO A MEETING BETWEEN FATHER AND CHILD +BE THE RESULTS HOWEVER WHAT THEY MIGHT RANDAL COULD SEE BUT ONE PLAIN COURSE BEFORE HIM NOW +HE HAD PROMISED TO DO HIS BEST TOWARD PERSUADING CATHERINE TO GRANT SYDNEY AN INTERVIEW +WHAT HAPPIER FUTURE COULD AWAIT HER ESPECIALLY IF SHE JUSTIFIED RANDAL'S PAST EXPERIENCE OF ALL THAT WAS CANDID AND TRUTHFUL IN HER CHARACTER THAN TO BECOME HIS FRIEND'S WIFE +YOU HAVE BEEN TO THE HOTEL HE BURST OUT YOU HAVE SEEN CATHERINE +NOT SATISFIED WITH GOSSIP IN PRIVATE THE GREEDY PUBLIC APPETITE DEVOURS GOSSIP IN PRINT AND WANTS MORE OF IT THAN ANY ONE EDITOR CAN SUPPLY +HE ADDED SYDNEY'S ADDRESS IN A POSTSCRIPT AND DISPATCHED HIS LETTER THAT EVENING +CONSIDERATIONS OF DELICACY SEEMED TO FORBID TAKING THIS LIBERTY EVEN WITH AN INTIMATE FRIEND +RANDAL HE SAID YOU KNOW WHERE SYDNEY IS +I HAVEN'T COURAGE ENOUGH TO DO IT FOR MYSELF +RANDAL WROTE TO ACCEPT THE INVITATION DETERMINING TO PRESENT HIMSELF BEFORE THE APPOINTED HOUR AND TO QUESTION CATHERINE PRIVATELY WITHOUT GIVING HER THE ADVANTAGE OVER HIM OF PREPARING HERSELF FOR THE INTERVIEW +HE PAUSED AND PUT HIS HAND TO HIS FEVERED HEAD +I'M AFRAID HE SAID +HE PUT DOWN THE EMPTY GLASS TAKING NO NOTICE OF HIS BROTHER'S QUESTION +I'M ALONE DO YOU HEAR THAT ALONE +THIS ALTERNATIVE IN THE CAPTAIN'S PLANS TERMINATING THE VOYAGE A MONTH EARLIER THAN HIS ARRANGEMENTS HAD CONTEMPLATED PUZZLED RANDAL +I WILL DO NEITHER THE ONE NOR THE OTHER +THE SAILING MASTER ANNOUNCED THAT HE HAD ORDERS TO TAKE THE VESSEL BACK TO HER PORT WITH NO OTHER EXPLANATION THAN THAT THE CRUISE WAS OVER +SHE SHALL HAVE YOUR MESSAGE ALL THAT I CAN DO TO PERSUADE HER SHALL BE DONE +YOU DISTRESS ME HERBERT MORE THAN WORDS CAN SAY +HE IS STAYING AT THIS HOTEL TO TRY THE AIR OF SYDENHAM AND HE FINDS THAT IT AGREES WITH HIM +YOU DON'T KNOW WHAT IT IS TO BE USED TO SEEING A PRETTY CREATURE ALWAYS NICELY DRESSED ALWAYS ABOUT THE ROOM THINKING SO MUCH OF YOU AND SO LITTLE OF HERSELF AND THEN TO BE LEFT ALONE AS I AM LEFT OUT IN THE DARK +HE DRANK THE WINE GREEDILY +RANDAL WAITED A WHILE IN LONDON ON THE CHANCE THAT BENNYDECK MIGHT PAY HIM A VISIT +AFTER MONTHS OF SEPARATION HE RECEIVED A VISIT FROM HERBERT +LET ME HEAR WHAT IT IS FIRST +NOT HAVING HEARD FROM CAPTAIN BENNYDECK FOR SOME LITTLE TIME RANDAL THOUGHT IT DESIRABLE IN SYDNEY'S INTERESTS TO MAKE INQUIRIES AT HIS CLUB +WAS HIS MIND WANDERING INTO SOME OTHER TRAIN OF THOUGHT +OH WHY DID I ENGAGE THAT GOVERNESS +HAD HER BEAUTY FASCINATED HIM +HE MENTIONED THE NAME OF ONE OF THE OLD SERVANTS AT MOUNT MORVEN WHO HAD ATTACHED HIMSELF TO RANDAL AFTER THE BREAKUP OF THE FAMILY +YOU CAN'T DO IT +I FEEL FOR YOU HERBERT HE SAID WARMLY +LET ME REST A LITTLE HE PLEADED IF I'M NOT IN THE WAY +I TRIED IT YESTERDAY IT SET MY BRAINS ON FIRE I'M FEELING THAT GLASS I TOOK JUST NOW +HONESTLY +BUT IF THESE NEWSPAPER PEOPLE WAITED TO FIND OUT WHETHER A REPORT IS TRUE OR FALSE HOW MUCH GOSSIP WOULD SOCIETY GET IN ITS FAVORITE NEWSPAPERS +WHILE HE WAS WALKING UP AND DOWN THE PLATFORM WITH A MIND DOUBLY DISTRESSED BY ANXIETY ABOUT HIS BROTHER AND ANXIETY ABOUT SYDNEY THE TRAIN FROM LONDON CAME IN +AFTER THAT I HAD HER MOTHER'S AUTHORITY FOR TELLING KITTY THAT SHE WOULD NEVER SEE HER FATHER AGAIN +HAVE YOU ANY MESSAGE FOR CAPTAIN BENNYDECK +SIT DOWN SAID MISSUS PRESTY +WITH HIS OWN SUSPICIONS STEADILY CONTRADICTING HIM HE ARRIVED AT THE 
HOTEL OBSTINATELY BELIEVING THAT THE CHARMING WIDOW WOULD PROVE TO BE A STRANGER +THE NEW NUMBER OF A POPULAR WEEKLY JOURNAL HAD THAT DAY BEEN PUBLISHED RANDAL BOUGHT IT +WHEN HE AND I HAPPENED TO BE LEFT TOGETHER HE NATURALLY WONDERED AFTER HAVING SEEN THE BEAUTIFUL WIFE WHERE THE LUCKY HUSBAND MIGHT BE +YOU WOULD HAVE SEEN HER PINING FOR THE COMPANY OF OTHER CHILDREN AND WOULD HAVE HAD NO MERCY ON HER +WORSE STORIES HAVE BEEN PRINTED I DO ASSURE YOU WORSE STORIES HAVE BEEN PRINTED +ARRIVED AT THE STATION RANDAL FOUND THAT HE MUST WAIT FOR THE TRAIN +MISSUS PRESTY WAS AT HOME SHE WAS REPORTED TO BE IN THE GARDEN OF THE HOTEL +THE REPORT IS PREMATURE MY GOOD FRIEND +GOOD BY DEAR RANDAL +MISSUS NORMAN AND HER LITTLE DAUGHTER WERE OUT DRIVING WITH A FRIEND AND WERE EXPECTED TO RETURN IN GOOD TIME FOR DINNER +HE WAS INTRODUCED TO MISSUS NORMAN AND TO MISSUS NORMAN'S LITTLE GIRL AND WE WERE ALL CHARMED WITH HIM +BUT YOU OUGHT TO HAVE KNOWN THAT WE ARE ONLY HALF AN HOUR BEHIND YOU AT SYDENHAM IN THE MATTER OF NEWS +YOU SHALL HEAR HOW MY DIVORCED DAUGHTER AND MY POOR LITTLE GRANDCHILD WERE TREATED AT SANDYSEAL AFTER YOU LEFT US +SHE ASKED DIRECTLY IF HER FATHER WAS DEAD +RANDAL PASSED THIS OVER WITHOUT NOTICE +AFTER READING ONE OR TWO OF THE POLITICAL ARTICLES HE ARRIVED AT THE COLUMNS SPECIALLY DEVOTED TO FASHIONABLE INTELLIGENCE +IT WAS A RELIEF TO RANDAL IN THE PRESENT STATE OF CATHERINE'S RELATIONS TOWARD BENNYDECK TO RETURN TO LONDON WITHOUT HAVING SEEN HIS FRIEND +NOT AT THE HOTEL JUST NOW +YOU ARE TO UNDERSTAND THAT CATHERINE IS A WIDOW +ON THE NEXT DAY BUT ONE RANDAL ARRANGED HIS DEPARTURE FOR SYDENHAM SO AS TO ARRIVE AT THE HOTEL AN HOUR BEFORE THE TIME APPOINTED FOR THE DINNER +SHE ADDED LOOKING AT HIM SUSPICIOUSLY +RANDAL LOOKED AGAIN AT THE FIRST WORDS IN THE PARAGRAPH +HOW NICE OF YOU TO COME SO SOON SHE BEGAN +AND THE CAPTAIN OF COURSE CONCLUDED AFTER HAVING BEEN INTRODUCED TO KITTY THAT MISSUS NORMAN WAS A WIDOW +THAT WILL DO MISSUS PRESTY YOUR DEFENSE IS THOROUGHLY WORTHY OF YOUR CONDUCT IN ALL OTHER RESPECTS +A VERY WISE DECISION SHE REMARKED +BEFORE I CONSENTED TO ANSWER THE CHILD'S INQUIRIES I CAME TO AN UNDERSTANDING WITH HER MOTHER +WHICH OPPRESSED THE METROPOLITANS OF EUROPE AND ASIA INVADED THE PROVINCES OF ANTIOCH AND ALEXANDRIA AND MEASURED THEIR DIOCESE BY THE LIMITS OF THE EMPIRE +THE ZEAL OF CYRIL EXPOSED HIM TO THE PENALTIES OF THE JULIAN LAW BUT IN A FEEBLE GOVERNMENT AND A SUPERSTITIOUS AGE HE WAS SECURE OF IMPUNITY AND EVEN OF PRAISE +AT THE SAME TIME EVERY AVENUE OF THE THRONE WAS ASSAULTED WITH GOLD +ARDENT IN THE PROSECUTION OF HERESY CYRIL AUSPICIOUSLY OPENED HIS REIGN BY OPPRESSING THE NOVATIANS THE MOST INNOCENT AND HARMLESS OF THE SECTARIES +A WANDERING TRIBE OF THE BLEMMYES OR NUBIANS INVADED HIS SOLITARY PRISON IN THEIR RETREAT THEY DISMISSED A CROWD OF USELESS CAPTIVES BUT NO SOONER HAD NESTORIUS REACHED THE BANKS OF THE NILE THAN HE WOULD GLADLY HAVE ESCAPED FROM A ROMAN AND ORTHODOX CITY TO THE MILDER SERVITUDE OF THE SAVAGES +NESTORIUS WHO DEPENDED ON THE NEAR APPROACH OF HIS EASTERN FRIENDS PERSISTED LIKE HIS PREDECESSOR CHRYSOSTOM TO DISCLAIM THE JURISDICTION AND TO DISOBEY THE SUMMONS OF HIS ENEMIES THEY HASTENED HIS TRIAL AND HIS ACCUSER PRESIDED IN THE SEAT OF JUDGMENT +BY THE VIGILANCE OF MEMNON THE CHURCHES WERE SHUT AGAINST THEM AND A STRONG GARRISON WAS THROWN INTO THE CATHEDRAL +THE VANITY OF CELESTINE WAS FLATTERED BY THE APPEAL AND THE PARTIAL VERSION OF A MONK DECIDED THE FAITH OF THE POPE WHO WITH HIS LATIN CLERGY WAS IGNORANT OF THE 
LANGUAGE THE ARTS AND THE THEOLOGY OF THE GREEKS +SIXTY EIGHT BISHOPS TWENTY TWO OF METROPOLITAN RANK DEFENDED HIS CAUSE BY A MODEST AND TEMPERATE PROTEST THEY WERE EXCLUDED FROM THE COUNCILS OF THEIR BRETHREN +AT THESE BLASPHEMOUS SOUNDS THE PILLARS OF THE SANCTUARY WERE SHAKEN +SUCH CRIMES WOULD HAVE DESERVED THE ANIMADVERSION OF THE MAGISTRATE BUT IN THIS PROMISCUOUS OUTRAGE THE INNOCENT WERE CONFOUNDED WITH THE GUILTY AND ALEXANDRIA WAS IMPOVERISHED BY THE LOSS OF A WEALTHY AND INDUSTRIOUS COLONY +THE FEEBLE SON OF ARCADIUS WAS ALTERNATELY SWAYED BY HIS WIFE AND SISTER BY THE EUNUCHS AND WOMEN OF THE PALACE SUPERSTITION AND AVARICE WERE THEIR RULING PASSIONS AND THE ORTHODOX CHIEFS WERE ASSIDUOUS IN THEIR ENDEAVORS TO ALARM THE FORMER AND TO GRATIFY THE LATTER +RETURN TO YOUR PROVINCES AND MAY YOUR PRIVATE VIRTUES REPAIR THE MISCHIEF AND SCANDAL OF YOUR MEETING +A RUMOR WAS SPREAD AMONG THE CHRISTIANS THAT THE DAUGHTER OF THEON WAS THE ONLY OBSTACLE TO THE RECONCILIATION OF THE PRAEFECT AND THE ARCHBISHOP AND THAT OBSTACLE WAS SPEEDILY REMOVED +ORESTES COMPLAINED BUT HIS JUST COMPLAINTS WERE TOO QUICKLY FORGOTTEN BY THE MINISTERS OF THEODOSIUS AND TOO DEEPLY REMEMBERED BY A PRIEST WHO AFFECTED TO PARDON AND CONTINUED TO HATE THE PRAEFECT OF EGYPT +DURING A BUSY PERIOD OF THREE MONTHS THE EMPEROR TRIED EVERY METHOD EXCEPT THE MOST EFFECTUAL MEANS OF INDIFFERENCE AND CONTEMPT TO RECONCILE THIS THEOLOGICAL QUARREL +WITHOUT ANY LEGAL SENTENCE WITHOUT ANY ROYAL MANDATE THE PATRIARCH AT THE DAWN OF DAY LED A SEDITIOUS MULTITUDE TO THE ATTACK OF THE SYNAGOGUES +THE PAST HE REGRETTED HE WAS DISCONTENTED WITH THE PRESENT AND THE FUTURE HE HAD REASON TO DREAD THE ORIENTAL BISHOPS SUCCESSIVELY DISENGAGED THEIR CAUSE FROM HIS UNPOPULAR NAME AND EACH DAY DECREASED THE NUMBER OF THE SCHISMATICS WHO REVERED NESTORIUS AS THE CONFESSOR OF THE FAITH +BUT IN THIS AWFUL MOMENT OF THE DANGER OF THE CHURCH THEIR VOW WAS SUPERSEDED BY A MORE SUBLIME AND INDISPENSABLE DUTY +EXTERMINATE WITH ME THE HERETICS AND WITH YOU I WILL EXTERMINATE THE PERSIANS +BUT THE VATICAN RECEIVED WITH OPEN ARMS THE MESSENGERS OF EGYPT +BUT THE PREVAILING DOCTRINE OF THE ETERNITY AND INHERENT PRAVITY OF MATTER INFECTED THE PRIMITIVE CHURCHES OF THE EAST +HE FIRST APPEARED ON THE BANKS OF THE JORDAN IN THE FORM OF PERFECT MANHOOD BUT IT WAS A FORM ONLY AND NOT A SUBSTANCE A HUMAN FIGURE CREATED BY THE HAND OF OMNIPOTENCE TO IMITATE THE FACULTIES AND ACTIONS OF A MAN AND TO IMPOSE A PERPETUAL ILLUSION ON THE SENSES OF HIS FRIENDS AND ENEMIES +A LAUDABLE REGARD FOR THE HONOR OF THE FIRST PROSELYTE HAS COUNTENANCED THE BELIEF THE HOPE THE WISH THAT THE EBIONITES OR AT LEAST THE NAZARENES WERE DISTINGUISHED ONLY BY THEIR OBSTINATE PERSEVERANCE IN THE PRACTICE OF THE MOSAIC RITES +BUT THE RASHNESS OF THESE CONCESSIONS HAS ENCOURAGED A MILDER SENTIMENT OF THOSE OF THE DOCETES WHO TAUGHT NOT THAT CHRIST WAS A PHANTOM BUT THAT HE WAS CLOTHED WITH AN IMPASSIBLE AND INCORRUPTIBLE BODY +HE LIVED AND DIED FOR THE SERVICE OF MANKIND BUT THE LIFE AND DEATH OF SOCRATES HAD LIKEWISE BEEN DEVOTED TO THE CAUSE OF RELIGION AND JUSTICE AND ALTHOUGH THE STOIC OR THE HERO MAY DISDAIN THE HUMBLE VIRTUES OF JESUS THE TEARS WHICH HE SHED OVER HIS FRIEND AND COUNTRY MAY BE ESTEEMED THE PUREST EVIDENCE OF HIS HUMANITY +A FOETUS THAT COULD INCREASE FROM AN INVISIBLE POINT TO ITS FULL MATURITY A CHILD THAT COULD ATTAIN THE STATURE OF PERFECT MANHOOD WITHOUT DERIVING ANY NOURISHMENT FROM THE ORDINARY SOURCES MIGHT CONTINUE TO EXIST WITHOUT REPAIRING A DAILY 
WASTE BY A DAILY SUPPLY OF EXTERNAL MATTER +YET THE MOST CHARITABLE CRITICISM MUST REFUSE THESE SECTARIES ANY KNOWLEDGE OF THE PURE AND PROPER DIVINITY OF CHRIST +HIS PROGRESS FROM INFANCY TO YOUTH AND MANHOOD WAS MARKED BY A REGULAR INCREASE IN STATURE AND WISDOM AND AFTER A PAINFUL AGONY OF MIND AND BODY HE EXPIRED ON THE CROSS +THEIR MURMURS WERE VARIOUSLY SILENCED BY THE SECTARIES WHO ESPOUSED AND MODIFIED THE DOUBLE SYSTEM OF CERINTHUS +WHEN THE MESSIAH WAS DELIVERED INTO THE HANDS OF THE JEWS THE CHRIST AN IMMORTAL AND IMPASSIBLE BEING FORSOOK HIS EARTHLY TABERNACLE FLEW BACK TO THE PLEROMA OR WORLD OF SPIRITS AND LEFT THE SOLITARY JESUS TO SUFFER TO COMPLAIN AND TO EXPIRE +HE ACQUIESCED IN THE OLD DISTINCTION OF THE GREEK PHILOSOPHERS BETWEEN THE RATIONAL AND SENSITIVE SOUL OF MAN THAT HE MIGHT RESERVE THE LOGOS FOR INTELLECTUAL FUNCTIONS AND EMPLOY THE SUBORDINATE HUMAN PRINCIPLE IN THE MEANER ACTIONS OF ANIMAL LIFE +MANY AMONG THE GENTILE PROSELYTES REFUSED TO BELIEVE THAT A CELESTIAL SPIRIT AN UNDIVIDED PORTION OF THE FIRST ESSENCE HAD BEEN PERSONALLY UNITED WITH A MASS OF IMPURE AND CONTAMINATED FLESH AND IN THEIR ZEAL FOR THE DIVINITY THEY PIOUSLY ABJURED THE HUMANITY OF CHRIST +NOR COULD IT SEEM STRANGE OR INCREDIBLE THAT THE FIRST OF THESE AEONS THE LOGOS OR WORD OF GOD OF THE SAME SUBSTANCE WITH THE FATHER SHOULD DESCEND UPON EARTH TO DELIVER THE HUMAN RACE FROM VICE AND ERROR AND TO CONDUCT THEM IN THE PATHS OF LIFE AND IMMORTALITY +IN THEIR EYES JESUS OF NAZARETH WAS A MERE MORTAL THE LEGITIMATE SON OF JOSEPH AND MARY BUT HE WAS THE BEST AND WISEST OF THE HUMAN RACE SELECTED AS THE WORTHY INSTRUMENT TO RESTORE UPON EARTH THE WORSHIP OF THE TRUE AND SUPREME DEITY +THE WORTHY FRIEND OF ATHANASIUS THE WORTHY ANTAGONIST OF JULIAN HE BRAVELY WRESTLED WITH THE ARIANS AND POLYTHEISTS AND THOUGH HE AFFECTED THE RIGOR OF GEOMETRICAL DEMONSTRATION HIS COMMENTARIES REVEALED THE LITERAL AND ALLEGORICAL SENSE OF THE SCRIPTURES +BUT THE JUSTICE AND GENEROSITY OF SUCH A DESERTION ARE STRONGLY QUESTIONABLE AND THE FATE OF AN INNOCENT MARTYR AT FIRST IMPELLED AND AT LENGTH ABANDONED BY HIS DIVINE COMPANION MIGHT PROVOKE THE PITY AND INDIGNATION OF THE PROFANE +UNDER THE TUITION OF THE ABBOT SERAPION HE APPLIED HIMSELF TO ECCLESIASTICAL STUDIES WITH SUCH INDEFATIGABLE ARDOR THAT IN THE COURSE OF ONE SLEEPLESS NIGHT HE HAS PERUSED THE FOUR GOSPELS THE CATHOLIC EPISTLES AND THE EPISTLE TO THE ROMANS +THE SON OF A VIRGIN GENERATED BY THE INEFFABLE OPERATION OF THE HOLY SPIRIT WAS A CREATURE WITHOUT EXAMPLE OR RESEMBLANCE SUPERIOR IN EVERY ATTRIBUTE OF MIND AND BODY TO THE CHILDREN OF ADAM +THEIR CHURCHES HAVE DISAPPEARED THEIR BOOKS ARE OBLITERATED THEIR OBSCURE FREEDOM MIGHT ALLOW A LATITUDE OF FAITH AND THE SOFTNESS OF THEIR INFANT CREED WOULD BE VARIOUSLY MOULDED BY THE ZEAL OR PRUDENCE OF THREE HUNDRED YEARS +YET AS THE PROFOUND DOCTOR HAD BEEN TERRIFIED AT HIS OWN RASHNESS APOLLINARIS WAS HEARD TO MUTTER SOME FAINT ACCENTS OF EXCUSE AND EXPLANATION +BUT INSTEAD OF A TEMPORARY AND OCCASIONAL ALLIANCE THEY ESTABLISHED AND WE STILL EMBRACE THE SUBSTANTIAL INDISSOLUBLE AND EVERLASTING UNION OF A PERFECT GOD WITH A PERFECT MAN OF THE SECOND PERSON OF THE TRINITY WITH A REASONABLE SOUL AND HUMAN FLESH +YOU'VE GOT A BIRTHDAY PRESENT THIS TIME JIM AND NO MISTAKE +THE WEEK FOLLOWING CHRISTMAS BROUGHT IN A THAW AND BY NEW YEAR'S DAY ALL THE WORLD ABOUT US WAS A BROTH OF GRAY SLUSH AND THE GUTTERED SLOPE BETWEEN THE WINDMILL AND THE BARN WAS RUNNING BLACK WATER +IT WAS THE FIRST TIME MISSUS SHIMERDA 
HAD BEEN TO OUR HOUSE AND SHE RAN ABOUT EXAMINING OUR CARPETS AND CURTAINS AND FURNITURE ALL THE WHILE COMMENTING UPON THEM TO HER DAUGHTER IN AN ENVIOUS COMPLAINING TONE +THEY BEGAN TO LAUGH BOISTEROUSLY WHEN THEY SAW ME CALLING +BUT YOU SEE A BODY NEVER KNOWS WHAT TRAITS POVERTY MIGHT BRING OUT IN EM +YOUR MAMA I SAID ANGRILY WANTS OTHER PEOPLE'S THINGS +FOR AMBROSCH MY MAMA COME HERE +THE BOARD NOT SO FORMIDABLE AS SHE HAD IMAGINED HAD INQUIRED INTO HER CASE AND INSTEAD OF SENDING HER TO STOKE CLAYPOLE HER HUSBAND'S BUCKINGHAMSHIRE PARISH AS SHE HAD DREADED HAD AGREED TO PAY HER RENT +THE GHOUL LIKE FEVER WAS NOT TO BE BRAVED WITH IMPUNITY AND BAULKED OF ITS PREY +AFORE CHRISTMAS TIME I WAS AS FULL AS FULL COULD BE OF GOING HOME FOR GOOD AND ALL YO HAN HEARD HOW I'VE WISHED IT THIS TERRIBLE LONG TIME +BUT THE BEST OF HER PLANS THE HOLIEST THAT WHICH IN SOME MEASURE REDEEMED THE VANITY OF THE REST WERE THOSE RELATING TO HER FATHER HER DEAR FATHER NOW OPPRESSED WITH CARE AND ALWAYS A DISHEARTENED GLOOMY PERSON +THEN ALICE BROKE THE SILENCE BY SAYING +HER CRIES BROUGHT HER HUSBAND DOWN TO TRY WITH HIS ACHING HEART TO COMFORT HERS +IS THERE ANY CHANCE FOR THE OTHER ONE THINK YOU +BUT HE STAYED LONG THERE AND AT LAST HIS STURDY FRAME SHOOK WITH HIS STRONG AGONY +HER THOUGHTS RAN ON JEM'S MANNER AND WORDS NOT BUT WHAT SHE HAD KNOWN THE TALE THEY TOLD FOR MANY A DAY BUT STILL SHE WISHED HE HAD NOT PUT IT SO PLAINLY +MARY AND ALICE DREW NEAR THE FIRE AND STOOD IN QUIET SORROW FOR SOME TIME +THERE WAS SOMETHING OF KEEN PRACTICAL SHREWDNESS ABOUT HER WHICH CONTRASTED VERY BEWITCHINGLY WITH THE SIMPLE FOOLISH UNWORLDLY IDEAS SHE HAD PICKED UP FROM THE ROMANCES WHICH MISS SIMMONDS YOUNG LADIES WERE IN THE HABIT OF RECOMMENDING TO EACH OTHER YES +THE OLD LEAVEN INFUSED YEARS AGO BY HER AUNT ESTHER FERMENTED IN HER LITTLE BOSOM AND PERHAPS ALL THE MORE FOR HER FATHER'S AVERSION TO THE RICH AND THE GENTLE +SHE OPENED THE DOOR SOFTLY THERE SAT MISSUS WILSON IN THE OLD ROCKING CHAIR WITH ONE SICK DEATH LIKE BOY LYING ON HER KNEE CRYING WITHOUT LET OR PAUSE BUT SOFTLY GENTLY AS FEARING TO DISTURB THE TROUBLED GASPING CHILD WHILE BEHIND HER OLD ALICE LET HER FAST DROPPING TEARS FALL DOWN ON THE DEAD BODY OF THE OTHER TWIN WHICH SHE WAS LAYING OUT ON A BOARD PLACED ON A SORT OF SOFA SETTEE IN A CORNER OF THE ROOM +BUT EARNEST AS THE FATHER WAS IN WATCHING THE YET LIVING HE HAD EYES AND EARS FOR ALL THAT CONCERNED THE DEAD AND SPRANG GENTLY UP AND TOOK HIS DEAD SON ON HIS HARD COUCH IN HIS ARMS WITH TENDER STRENGTH AND CARRIED HIM UPSTAIRS AS IF AFRAID OF WAKENING HIM +MARY I ALMOST LOATHE MYSELF WHEN I FEEL I WOULD NOT GIVE UP THIS MINUTE WHEN MY BROTHERS LIE DEAD AND FATHER AND MOTHER ARE IN SUCH TROUBLE FOR ALL MY LIFE THAT'S PAST AND GONE AND MARY AS SHE TRIED TO RELEASE HER HAND YOU KNOW WHAT MAKES ME FEEL SO BLESSED +DON'T JEM PLEASE DON'T WHISPERED SHE AGAIN BELIEVING THAT HIS SILENCE WAS ONLY ANOTHER FORM OF GRIEF +HE REMAINED UP STAIRS UNTIL AFTER THE EARLY DAWN SHOWED MARY THAT SHE NEED HAVE NO FEAR OF GOING HOME THROUGH THE DESERTED AND QUIET STREETS TO TRY AND GET A LITTLE SLEEP BEFORE WORK HOUR +OH JEM DON'T GIVE WAY SO I CANNOT BEAR TO SEE YOU +SHE STOPPED WITH HER HAND ON THE LATCH OF THE WILSONS DOOR TO STILL HER BEATING HEART AND LISTENED TO THE HUSHED QUIET WITHIN +MARGARET MET JEM WILSON SEVERAL DAYS AFTER HIS BROTHERS WERE SERIOUSLY ILL AND HEARD FROM HIM THE STATE OF THINGS AT HIS HOME +OVER THE CHILD WHICH YET BREATHED THE FATHER BENT WATCHING ANXIOUSLY FOR SOME GROUND OF HOPE WHERE 
HOPE THERE WAS NONE +SO LEAVING KIND MESSAGES TO GEORGE AND JANE WILSON AND HESITATING WHETHER SHE MIGHT DARE TO SEND A FEW KIND WORDS TO JEM AND DECIDING THAT SHE HAD BETTER NOT SHE STEPPED OUT INTO THE BRIGHT MORNING LIGHT SO FRESH A CONTRAST TO THE DARKENED ROOM WHERE DEATH HAD BEEN +WHAT BANKRUPTS IN THE WORLD WE FEEL WHEN DEATH LIKE SOME REMORSELESS CREDITOR SEIZES ON ALL WE FONDLY THOUGHT OUR OWN THE TWINS +THEN THE MOTHER LIFTED UP HER VOICE AND WEPT +IT WAS A COMFORT TO HER WHEN SCOLDED BY MISS SIMMONDS TO THINK OF THE DAY WHEN SHE WOULD DRIVE UP TO THE DOOR IN HER OWN CARRIAGE TO ORDER HER GOWNS FROM THE HASTY TEMPERED YET KIND DRESSMAKER +HE DID NOT SPEAK AS THOUGH FEARING TO DESTROY BY SOUND OR MOTION THE HAPPINESS OF THAT MOMENT WHEN HER SOFT HAND'S TOUCH THRILLED THROUGH HIS FRAME AND HER SILVERY VOICE WAS WHISPERING TENDERNESS IN HIS EAR +I THINK I CANNOT GO RIGHT FOR I EITHER CHECK MYSELF TILL I'M DOWNRIGHT CROSS TO HIM OR ELSE I SPEAK JUST NATURAL AND THAT'S TOO KIND AND TENDER BY HALF +HOW INFINITE THE WEALTH OF LOVE AND HOPE GARNERED IN THESE SAME TINY TREASURE HOUSES AND OH +WISHING HIM SAID MARY IN A TONE OF INQUIRY +I CANNOT THINK WHAT POSSESSES ME THAT I MUST ALWAYS BE WANTING TO COMFORT HIM WHEN HE'S DOWNCAST AND THAT I MUST GO MEDDLING WI HIM TO NIGHT WHEN SURE ENOUGH IT WAS HIS AUNT'S PLACE TO SPEAK TO HIM +BUT WILL HE THANK ME FOR IT +I PUT ON MY CAP AND RAN OUT TO MEET JAKE +ON THE WHITE PAGES I GROUPED SUNDAY SCHOOL CARDS AND ADVERTISING CARDS WHICH I HAD BROUGHT FROM MY OLD COUNTRY +ANYWAY HE WOULD NEVER ALLOW ONE OF HIS HORSES TO BE PUT TO SUCH A STRAIN +THEY SAT ABOUT THE HOUSE MOST OF THE DAY AS IF IT WERE SUNDAY GREASING THEIR BOOTS MENDING THEIR SUSPENDERS PLAITING WHIPLASHES +I CAN SEE THEM NOW EXACTLY AS THEY LOOKED WORKING ABOUT THE TABLE IN THE LAMPLIGHT JAKE WITH HIS HEAVY FEATURES SO RUDELY MOULDED THAT HIS FACE SEEMED SOMEHOW UNFINISHED OTTO WITH HIS HALF EAR AND THE SAVAGE SCAR THAT MADE HIS UPPER LIP CURL SO FEROCIOUSLY UNDER HIS TWISTED MUSTACHE +SHE CUT SQUARES OF COTTON CLOTH AND WE SEWED THEM TOGETHER INTO A BOOK +BY THE TIME WE HAD PLACED THE COLD FRESH SMELLING LITTLE TREE IN A CORNER OF THE SITTING ROOM IT WAS ALREADY CHRISTMAS EVE +I HAD WANTED TO GET SOME PICTURE BOOKS FOR YULKA AND ANTONIA EVEN YULKA WAS ABLE TO READ A LITTLE NOW +HE USED TO HELP MY FATHER CUT CHRISTMAS TREES FOR ME IN VIRGINIA AND HE HAD NOT FORGOTTEN HOW MUCH I LIKED THEM +WHEN HE MOUNTED HIS HORSE AT THE DOOR I SAW THAT HE HAD A HATCHET SLUNG TO HIS BELT AND HE GAVE GRANDMOTHER A MEANING LOOK WHICH TOLD ME HE WAS PLANNING A SURPRISE FOR ME +FROM UNDER THE LINING HE NOW PRODUCED A COLLECTION OF BRILLIANTLY COLORED PAPER FIGURES SEVERAL INCHES HIGH AND STIFF ENOUGH TO STAND ALONE +BECAUSE HE TALKED SO LITTLE HIS WORDS HAD A PECULIAR FORCE THEY WERE NOT WORN DULL FROM CONSTANT USE +HIS FACE HAD A LOOK OF WEARINESS AND PLEASURE LIKE THAT OF SICK PEOPLE WHEN THEY FEEL RELIEF FROM PAIN +HE MADE THE SIGN OF THE CROSS OVER ME PUT ON HIS CAP AND WENT OFF IN THE DARK +HE GAVE THANKS FOR OUR FOOD AND COMFORT AND PRAYED FOR THE POOR AND DESTITUTE IN GREAT CITIES WHERE THE STRUGGLE FOR LIFE WAS HARDER THAN IT WAS HERE WITH US +GRANDFATHER CAME DOWN WEARING A WHITE SHIRT AND HIS SUNDAY COAT +MORNING PRAYERS WERE LONGER THAN USUAL +HE SAT STILL AND PASSIVE HIS HEAD RESTING AGAINST THE BACK OF THE WOODEN ROCKING CHAIR HIS HANDS RELAXED UPON THE ARMS +AT ABOUT FOUR O'CLOCK A VISITOR APPEARED MISTER SHIMERDA WEARING HIS RABBIT SKIN CAP AND COLLAR AND NEW MITTENS HIS WIFE HAD KNITTED 
+ALL AFTERNOON HE SAT IN THE DINING ROOM +I WISH WE HAD NOT LEFT SO SOON +THE SUGGESTION HE HAD LAUGHED AT WAS NOT SO ENTIRELY FOOLISH AS HE HAD BEEN PLEASED TO CONSIDER IT +THE BOYS LOOK WIDE AWAKE ENOUGH IF THE FATHER DOES NOT +I KNOW IT SOUNDS FOOLISH BUT THE ALTERNATIVE IS SO IMPROBABLE +BEEN OVER THE GROUND STUDIED THE AFFAIR CAREFULLY +AT LEAST THAT IS WHAT WE HOPE +HE APPEARED TO KNOW FOR HE TOLD ME AT ONCE THAT HE WAS DETECTIVE GRYCE A MAN WHO HAD GROWN OLD IN SOLVING JUST SUCH BAFFLING PROBLEMS AS THESE +THIS SWEETWATER AS THEY CALLED HIM WAS I HAVE SINCE UNDERSTOOD ONE OF HIS PROTEGES AND MORE OR LESS OF A FAVOURITE +THERE WAS NO PONIARD IN THE WOUND +I EMPHASISED COMPLACENTLY +IT HAS NOT BEEN RUNNING SINCE LAST NIGHT OR IT WOULD BE FULL OF CURIOUS PEOPLE ALL THE TIME HUSTLING TO GET A GLIMPSE OF THIS PLACE +GEORGE +BUT VIGOUR RETURNED TO HIM BEFORE HE HAD WELL REACHED THE DOOR AND HE SHOWED SOME OF HIS OLD SPIRIT AS HE THANKED MISS CLARKE AND TURNED TO TAKE THE ELEVATOR +THIS IS VERY GOOD OF YOU HE BEGAN GLANCING DOWN AT THE AGED DETECTIVE'S BUNDLED UP LEGS AND GENTLY PUSHING A CHAIR TOWARDS HIM +THE NEXT MINUTE SHE WAS IN THIS LADY'S ARMS +THE OLD MAN'S EYES SHOT FIRE AND UNCONSCIOUSLY ONE FOOT SLIPPED TO THE FLOOR +I'LL WAIT HERE TILL YOU'RE READY EXPLAIN YOURSELF TO THE LADY TELL HER I'M AN OLD AND RHEUMATIC INVALID WHO HAS BEEN USED TO ASKING HIS OWN QUESTIONS +VERY GOOD MANAGE IT AS YOU WILL +HAVE YOU EVER THOUGHT THAT SHE MIGHT HAVE BEEN A SUICIDE +I KNOW SHE SAID WHAT YOU ARE GOING TO ASK ME NOW +YES MANY TIMES +GEORGE NODDED +IN OLDEN DAYS THEY WOULD HAVE SAID STRUCK BY A BOLT FROM HEAVEN +I SUPPOSE SHE HAS BEEN CAREFULLY QUESTIONED VERY I SHOULD SAY +I TOOK QUITE A FANCY TO HIM WHY +ONE OR TWO OF THE MUSICIANS FROM THE END OF THE HALL +NOT ALTOGETHER BY ME +NOT TILL THE DOCTOR CAME HER DOCTOR WHO WAS HAPPILY IN HIS OFFICE IN THIS VERY BUILDING +SWEETWATER SOMEONE DREW THAT WEAPON OUT +DO THEY STILL INSIST THAT MISS CHALLONER WAS THE ONLY PERSON IN THE ROOM WITH THEM AT THIS TIME +I ASKED AS SOON AS GEORGE HAD RETURNED TO MY SIDE +BUT CLEWS THERE ARE ABSOLUTELY NONE +WHEN WE TOOK OUR SEATS AT THE BREAKFAST TABLE IT WAS WITH THE FEELING OF BEING NO LONGER LOOKED UPON AS CONNECTED IN ANY WAY WITH THIS CASE +AND SHE SPEAKS OF NO WEAPON +WHO CAME NEXT ON THE SCENE SOME PEOPLE FROM THE LOBBY +I INQUIRED OF GEORGE WITH MY EYES STILL ON THIS FURTIVE WATCHER +I SHOULD LIKE TO SEE THE DESK YOU SPEAK OF AND THE SPOT WHERE SHE FELL +MISS CLARKE STARTED AND HER SWEET FACE SHOWED A MOMENT'S PERPLEXITY DID I SHE QUERIED MUSINGLY +WELL WELL THAT'S HONEST AT ALL EVENTS +HE WAS THE PLAIN FACED DETECTIVE WHO HAD SPOKEN TO GEORGE +I AM LOOKING AT HIM NOW +THE BOYS BELONG TO THE GENTLEMAN WHO IS A WIDOWER +THE TRAIL HERE MUST BE A VERY BLIND ONE FOR THEM TO CALL HIM IN +THERE WAS NO DOUBTING THEM IN THIS INSTANCE +IT'S THE MOST INEXPLICABLE THERE +A YOUNG FELLOW WHO HAD BEEN HOVERING IN THE BACKGROUND AT ONCE STEPPED FORWARD +WHETHER HE GOT ANYTHING ELSE IT WOULD BE IMPOSSIBLE TO SAY FROM HIS MANNER AS HE FINALLY SANK INTO A CHAIR BY ONE OF THE OPENINGS AND LOOKED DOWN ON THE LOBBY BELOW +OLD DAYS DON'T RETURN FOR THE ASKING +NO DOUBT +NO WEAPON PROTRUDED FROM THE WOUND NOR WAS ANY FOUND ON OR NEAR HER IN THE MEZZANINE WHAT FOLLOWS +THE BOYS LOOK WIDE AWAKE ENOUGH BUT WHO CAN TELL I WOULD SOONER BELIEVE THAT +BUT THEY'LL PUT A MAN ON FOR YOU +NO A VERY NATURAL ONE I SHOULD SAY +YES +MARK SOWERBY AND CLAUS HENNERBERG +IT WAS ONE WHICH GAVE ME A SMALL TRIUMPH OVER GEORGE +THE TIME IS NARROWED 
DOWN TO ONE AND IN THAT ONE MISS CLARKE WAS THE ONLY PERSON TO TOUCH HER +WHEREVER SHE PLEASES ONLY I CAN'T WALK FAR +A MAN WAS LOOKING IN FROM THE CORRIDOR BEHIND AT THE FOUR PERSONS WE WERE JUST DISCUSSING +WHAT DO YOU MAKE OF IT GRYCE +WHAT MADE THE DIFFERENCE +JUST AN EVERYDAY DETECTIVE BUT AMBITIOUS I SUPPOSE AND QUITE ALIVE TO THE IMPORTANCE OF BEING THOROUGH +NO WORD NO CRY JUST A COLLAPSE AND SUDDEN FALL +SWEETWATER HELP ME OUT OF THIS +YES MISS CLARKE THE MIDDLE AGED LADY WITH THE PARRISHES +IT'S A CASE IN A THOUSAND GRYCE +YES MISTER SLATER THE ASSISTANT MANAGER WHO WAS IN THE LOBBY AT THE TIME SAYS THAT TEN MINUTES AT LEAST MUST HAVE ELAPSED +HONEST GERMANS MEN WHO HAVE PLAYED HERE FOR YEARS +HE WANTS ME TO STAND READY TO OBEY ANY SUMMONS THE POLICE MAY SEND ME +HOWEVER A LITTLE LATER WE HAD A COMFORTABLE CHAT +THAT IS WE HAVE NOT BEEN ABLE TO FIND ANY PERHAPS YOU CAN +ANYBODY BEFORE THE FATHER CAME IN +THEIR GREETING WAS CORDIAL AND THE LINES ON THE LATTER'S FACE RELAXED A LITTLE AS HE MET THE STILL BRIGHT EYE OF THE MAN UPON WHOSE INSTINCT AND JUDGMENT SO MUCH RELIANCE HAD ALWAYS BEEN PLACED +INSTANTLY THEY ABSORBED ALL MY ATTENTION THOUGH I DARED NOT GIVE THEM A DIRECT LOOK AND CONTINUED TO OBSERVE THEM ONLY IN THE GLASS +FOR SOME LITTLE TIME THAT IS IT SEEMED LONG THOUGH I BELIEVE IT WAS NOT MORE THAN A MINUTE BEFORE TWO MEN CAME RUNNING FROM THE MUSICIANS GALLERY +AND THE GLANCE SHE CAST HIM WHILE NOT MEETING HIS EYE SHOWED THAT SHE UNDERSTOOD THE IMPORTANCE OF THE ADMISSION +IT DID NOT FALL UPON THE FLOOR AROUND HER THEREFORE IT FLEW THROUGH ONE OF THOSE OPENINGS INTO THE LOBBY AND THERE IT EITHER WILL BE OR HAS BEEN FOUND +SHE HAD NO COMPANION NEAR HER +WHAT DOES HE WANT +THE LADY IS NOT THE MOTHER OF THE BOYS BUT THEIR AUNT +HE WAS LATE OF COURSE BUT WHEN HE DID APPEAR I ALMOST FORGOT OUR USUAL GREETING IN MY HURRY TO ASK HIM IF HE HAD SEEN THE EVENING PAPERS +I WILL TROUBLE YOU NO FURTHER +AS HER QUIET FIGURE APPEARED IN THE DOORWAY SWEETWATER STOLE A GLANCE AT MISTER GRYCE +NATURALLY THEY REACHED HER FIRST +BUT I'M IN NO POSITION TO MAKE PROMISES +HE GAVE UP WORK SOME TIME AGO I HAVE BEEN TOLD MY HUSBAND WENT ON BUT EVIDENTLY A GREAT CASE STILL HAS ITS ALLUREMENT FOR HIM +YES AND A VERY RESPECTABLE ONE +SHE STRUCK THE BLOW HERSELF AND THE STRENGTH OF PURPOSE WHICH LED HER TO DO THIS GAVE HER THE ADDITIONAL FORCE TO PULL THE WEAPON OUT AND FLING IT FROM HER +VERY WELL THEN YOU'RE IN A POSITION TO PIONEER ME +YES HE'S MERCURIAL IN ALL HIS MOVEMENTS +IT WILL BE WELL TO STATE IN THE BEGINNING OF THIS RECIPE THAT FRENCH FORCEMEAT OR QUENELLES CONSIST OF THE BLENDING OF THREE SEPARATE PROCESSES NAMELY PANADA UDDER AND WHATEVER MEAT YOU INTEND USING PANADA +MODE MIX ALL THE INGREDIENTS WELL TOGETHER CAREFULLY MINCING THEM VERY FINELY BEAT UP THE EGG MOISTEN WITH IT AND WORK THE WHOLE VERY SMOOTHLY TOGETHER +SIMMER FOR A MINUTE OR TWO AND SERVE IN A TUREEN +LEMON JUICE MAY BE ADDED AT PLEASURE +THE GINGER PLANT KNOWN TO NATURALISTS AS ZINGIBER OFFICINALE IS A NATIVE OF THE EAST AND WEST INDIES +ILLUSTRATION SAGE +MODE CUT UP THE ONION AND CARROT INTO SMALL RINGS AND PUT THEM INTO A STEWPAN WITH THE HERBS MUSHROOMS BAY LEAF CLOVES AND MACE ADD THE BUTTER AND SIMMER THE WHOLE VERY GENTLY OVER A SLOW FIRE UNTIL THE ONION IS QUITE TENDER +THEN ADD THE YOLKS OF THE EGGS WELL BEATEN STIR THEM TO THE SAUCE BUT DO NOT ALLOW IT TO BOIL AND SERVE VERY HOT +IF THE QUENELLES ARE NOT FIRM ENOUGH ADD THE YOLK OF ANOTHER EGG BUT OMIT THE WHITE WHICH ONLY MAKES THEM HOLLOW AND PUFFY INSIDE 
+ILLUSTRATION THE LEMON +MODE PARE AND SLICE THE CUCUMBERS AS FOR THE TABLE SPRINKLE WELL WITH SALT AND LET THEM REMAIN FOR TWENTY FOUR HOURS STRAIN OFF THE LIQUOR PACK IN JARS A THICK LAYER OF CUCUMBERS AND SALT ALTERNATELY TIE DOWN CLOSELY AND WHEN WANTED FOR USE TAKE OUT THE QUANTITY REQUIRED +ILLUSTRATION LONG PEPPER +IF IT SHOULD BE IN AN UNSOUND STATE IT MUST BE ON NO ACCOUNT MADE USE OF +LONG PEPPER THIS IS THE PRODUCE OF A DIFFERENT PLANT FROM THAT WHICH PRODUCES THE BLACK IT CONSISTING OF THE HALF RIPE FLOWER HEADS OF WHAT NATURALISTS CALL PIPER LONGUM AND CHABA +ILLUSTRATION GINGER +WHEN IT IS SUFFICIENTLY THICK TAKE IT OFF AS IT SHOULD NOT BOIL +THE FAT THEY ARE FRIED IN SHOULD BE CLEAR AND THE CRUMBS SHOULD NOT HAVE THE SLIGHTEST APPEARANCE OR TASTE OF HAVING BEEN IN THE LEAST DEGREE BURNT +OTHER SWEET HERBS ARE CULTIVATED FOR PURPOSES OF MEDICINE AND PERFUMERY THEY ARE MOST GRATEFUL BOTH TO THE ORGANS OF TASTE AND SMELLING AND TO THE AROMA DERIVED FROM THEM IS DUE IN A GREAT MEASURE THE SWEET AND EXHILARATING FRAGRANCE OF OUR FLOWERY MEADS +BOIL FOR FIVE MINUTES MINCE IT VERY SMALL AND MIX IT WITH THE OTHER INGREDIENTS +NOTE THE WINE IN THIS SAUCE MAY BE OMITTED AND AN ONION SLICED AND FRIED OF A NICE BROWN SUBSTITUTED FOR IT +ILLUSTRATION THE CUCUMBER +SOME SPRINGS ARE SO HIGHLY IMPREGNATED WITH SALT AS TO HAVE RECEIVED THE NAME OF BRINE SPRINGS AND ARE SUPPOSED TO HAVE BECOME SO BY PASSING THROUGH THE SALT ROCKS BELOW GROUND AND THUS DISSOLVING A PORTION OF THIS MINERAL SUBSTANCE +CONTINUE IN THIS MANNER TILL THE BORDER IS COMPLETED ARRANGING THE SIPPETS A PALE AND A DARK ONE ALTERNATELY +SEASONABLE THIS RECIPE SHOULD BE USED IN JUNE JULY OR AUGUST +ILLUSTRATION BASIL +MODE CHOOSE THE GREENEST CUCUMBERS AND THOSE THAT ARE MOST FREE FROM SEEDS PUT THEM IN STRONG SALT AND WATER WITH A CABBAGE LEAF TO KEEP THEM DOWN TIE A PAPER OVER THEM AND PUT THEM IN A WARM PLACE TILL THEY ARE YELLOW THEN WASH THEM AND SET THEM OVER THE FIRE IN FRESH WATER WITH A VERY LITTLE SALT AND ANOTHER CABBAGE LEAF OVER THEM COVER VERY CLOSELY BUT TAKE CARE THEY DO NOT BOIL +ORIGINALLY THE MOST VALUABLE OF THESE WERE FOUND IN THE SPICE ISLANDS OR MOLUCCAS OF THE INDIAN OCEAN AND WERE HIGHLY PRIZED BY THE NATIONS OF ANTIQUITY +SUFFICIENT TO SERVE WITH FIVE OR SIX MACKEREL +PUT THE SUGAR WITH ONE QUARTER PINT OF WATER IN A SAUCEPAN OVER THE FIRE REMOVE THE SCUM AS IT RISES AND ADD THE LEMON PEEL AND GINGER WITH THE OUTSIDE SCRAPED OFF WHEN THE SYRUP IS TOLERABLY THICK TAKE IT OFF THE FIRE AND WHEN COLD WIPE THE CUCUMBERS DRY AND PUT THEM IN +BEAT THE YOLKS OF THE OTHER TWO EGGS ADD THEM WITH A LITTLE FLOUR AND SALT TO THOSE POUNDED MIX ALL WELL TOGETHER AND ROLL INTO BALLS +MODE PUT THE MILK IN A VERY CLEAN SAUCEPAN AND LET IT BOIL +WHEN QUITE CRISP DIP ONE SIDE OF THE SIPPET INTO THE BEATEN WHITE OF AN EGG MIXED WITH A LITTLE FLOUR AND PLACE IT ON THE EDGE OF THE DISH +IT IS A NATIVE OF PORTUGAL AND WHEN ITS LEAVES ARE USED AS A SEASONING HERB THEY HAVE AN AGREEABLE AROMATIC FLAVOUR +THE LEMON THIS FRUIT IS A NATIVE OF ASIA AND IS MENTIONED BY VIRGIL AS AN ANTIDOTE TO POISON +SUFFICIENT FOR A MODERATE SIZED HADDOCK OR PIKE +SOLID ROCKS OF SALT ARE ALSO FOUND IN VARIOUS PARTS OF THE WORLD AND THE COUNTY OF CHESTER CONTAINS MANY OF THESE MINES AND IT IS FROM THERE THAT MUCH OF OUR SALT COMES +FORCEMEAT FOR COLD SAVOURY PIES +PUT THE UDDER INTO A STEWPAN WITH SUFFICIENT WATER TO COVER IT LET IT STEW GENTLY TILL QUITE DONE WHEN TAKE IT OUT TO COOL +FRIED BREAD FOR BORDERS +MODE PUT THE WHOLE OF THE 
INGREDIENTS INTO A BOTTLE AND LET IT REMAIN FOR A FORTNIGHT IN A WARM PLACE OCCASIONALLY SHAKING UP THE CONTENTS +VARIOUS DISHES ARE FREQUENTLY ORNAMENTED AND GARNISHED WITH ITS GRACEFUL LEAVES AND THESE ARE SOMETIMES BOILED IN SOUPS ALTHOUGH IT IS MORE USUALLY CONFINED IN ENGLISH COOKERY TO THE MACKEREL SAUCE AS HERE GIVEN +THEY OUGHT TO BE TAKEN UP IN THE AUTUMN AND WHEN DRIED IN THE HOUSE WILL KEEP TILL SPRING +THIS JUICE WHICH IS CALLED CITRIC ACID MAY BE PRESERVED IN BOTTLES FOR A CONSIDERABLE TIME BY COVERING IT WITH A THIN STRATUM OF OIL +FRIED BREAD CRUMBS +NOW BEAT AND STRAIN THE EGGS WORK THESE UP WITH THE OTHER INGREDIENTS AND THE FORCEMEAT WILL BE READY FOR USE +TO PICKLE EGGS +PLACE THE JUG IN A SAUCEPAN OF BOILING WATER KEEP STIRRING WELL UNTIL IT THICKENS BUT DO NOT ALLOW IT TO BOIL OR IT WILL CURDLE +POUND WELL AND BIND WITH ONE OR TWO EGGS WHICH HAVE BEEN PREVIOUSLY BEATEN AND STRAINED +IT IS HARDIER THAN THE ORANGE AND AS ONE OF THE CITRON TRIBE WAS BROUGHT INTO EUROPE BY THE ARABIANS +THE LEMON WAS FIRST CULTIVATED IN ENGLAND IN THE BEGINNING OF THE SEVENTEENTH CENTURY AND IS NOW OFTEN TO BE FOUND IN OUR GREEN HOUSES +FRENCH FORCEMEAT +A STORE OF PICKLED EGGS WILL BE FOUND VERY USEFUL AND ORNAMENTAL IN SERVING WITH MANY FIRST AND SECOND COURSE DISHES +ADD THE WINE AND IF NECESSARY A SEASONING OF CAYENNE WHEN IT WILL BE READY TO SERVE +PLACE IT OVER THE FIRE KEEP CONSTANTLY STIRRING TO PREVENT ITS BURNING AND WHEN QUITE DRY PUT IN A SMALL PIECE OF BUTTER +ANY ONE WITH THE SLIGHTEST PRETENSIONS TO REFINED COOKERY MUST IN THIS PARTICULAR IMPLICITLY FOLLOW THE EXAMPLE OF OUR FRIENDS ACROSS THE CHANNEL +ILLUSTRATION PESTLE AND MORTAR +WHEN THE THREE INGREDIENTS ARE PROPERLY PREPARED POUND THEM ALTOGETHER IN A MORTAR FOR SOME TIME FOR THE MORE QUENELLES ARE POUNDED THE MORE DELICATE THEY ARE +IN JAMAICA IT FLOWERS ABOUT AUGUST OR SEPTEMBER FADING ABOUT THE END OF THE YEAR +THE LONG PEPPER IS LESS AROMATIC THAN THE BLACK BUT ITS OIL IS MORE PUNGENT +SUFFICIENT HALF THIS QUANTITY FOR TWO SLICES OF SALMON +SEASONABLE THIS SHOULD BE MADE ABOUT EASTER AS AT THIS TIME EGGS ARE PLENTIFUL AND CHEAP +BOIL THEM BEFORE THEY ARE PUT INTO THE SOUP OR OTHER DISH THEY MAY BE INTENDED FOR +BEAT THE EGGS STIR TO THEM THE MILK AND POUNDED SUGAR AND PUT THE MIXTURE INTO A JUG +ILLUSTRATION MARJORAM +SOMETHING OF THEIR TROUBLING SWEETNESS CAME BACK TO ALEXANDER TOO +DON'T CRY DON'T CRY HE WHISPERED +HILDA'S FACE QUIVERED BUT SHE WHISPERED YES I THINK IT MUST HAVE BEEN +SHE LOOKED AT HIS HEAVY SHOULDERS AND BIG DETERMINED HEAD THRUST FORWARD LIKE A CATAPULT IN LEASH +SHE BIT HER LIP AND LOOKED DOWN AT HER HANDS WHICH WERE CLASPED TIGHTLY IN FRONT OF HER +YOU ASK ME TO STAY AWAY FROM YOU BECAUSE YOU WANT ME +HILDA WATCHED HIM FROM HER CORNER TREMBLING AND SCARCELY BREATHING DARK SHADOWS GROWING ABOUT HER EYES +SHE LISTENED INTENTLY BUT SHE HEARD NOTHING BUT THE CREAKING OF HIS CHAIR +I UNDERSTAND BARTLEY I WAS WRONG +AT THAT WORD DECEPTION SPOKEN WITH SUCH SELF CONTEMPT THE COLOR FLASHED BACK INTO HILDA'S FACE AS SUDDENLY AS IF SHE HAD BEEN STRUCK BY A WHIPLASH +ALWAYS BUT IT'S WORSE NOW +SHE BLUSHED AND SMILED AND FUMBLED HIS CARD IN HER CONFUSION BEFORE SHE RAN UPSTAIRS +SHE MERELY BRUSHED HIS CHEEK WITH HER LIPS AND PUT A HAND LIGHTLY AND JOYOUSLY ON EITHER SHOULDER +THE LAST TWO DAYS OF THE VOYAGE BARTLEY FOUND ALMOST INTOLERABLE +THERE IS THIS DECEPTION BETWEEN ME AND EVERYTHING +YES HILDA I KNOW THAT HE SAID SIMPLY +HILDA SAT ON THE ARM OF IT AND PUT HER HANDS LIGHTLY ON HIS SHOULDERS +OH 
BARTLEY WHAT AM I TO DO +I WILL ASK THE LEAST IMAGINABLE BUT I MUST HAVE SOMETHING +WHEN DID YOU COME BARTLEY AND HOW DID IT HAPPEN YOU HAVEN'T SPOKEN A WORD +BARTLEY LEANED HIS HEAD IN HIS HANDS AND SPOKE THROUGH HIS TEETH +YOU WANT ME TO SAY IT SHE WHISPERED +AND THEN YOU CAME BACK NOT CARING VERY MUCH BUT IT MADE NO DIFFERENCE +A COAL FIRE WAS CRACKLING IN THE GRATE AND THE LAMPS WERE LIT FOR IT WAS ALREADY BEGINNING TO GROW DARK OUTSIDE +AFTER THE VERY FIRST +SHE CALLED HIS NAME ON THE THRESHOLD BUT IN HER SWIFT FLIGHT ACROSS THE ROOM SHE FELT A CHANGE IN HIM AND CAUGHT HERSELF UP SO DEFTLY THAT HE COULD NOT TELL JUST WHEN SHE DID IT +HE PULLED UP A WINDOW AS IF THE AIR WERE HEAVY +IT'S UNBEARABLE IT TORTURES ME EVERY MINUTE +PRESENTLY IT STOLE BACK TO HIS COAT SLEEVE +I'LL DO ANYTHING YOU WISH ME TO BARTLEY SHE SAID TREMULOUSLY +I HAVE THOUGHT ABOUT IT UNTIL I AM WORN OUT +THE ROOM WAS EMPTY WHEN HE ENTERED +SHE PRESSED HIS HAND GENTLY IN GRATITUDE WEREN'T YOU HAPPY THEN AT ALL +EMERGING AT EUSTON AT HALF PAST THREE O'CLOCK IN THE AFTERNOON ALEXANDER HAD HIS LUGGAGE SENT TO THE SAVOY AND DROVE AT ONCE TO BEDFORD SQUARE +IT'S GOT TO BE A CLEAN BREAK HILDA +IT IT HASN'T ALWAYS MADE YOU MISERABLE HAS IT +YOU SEE LOVING SOME ONE AS I LOVE YOU MAKES THE WHOLE WORLD DIFFERENT +COULD YOU COULD YOU SIT DOWN AND TALK ABOUT IT QUIETLY BARTLEY AS IF I WERE A FRIEND AND NOT SOME ONE WHO HAD TO BE DEFIED +I NEVER DREAMED IT WOULD BE YOU BARTLEY +I GET NOTHING BUT MISERY OUT OF EITHER +HE DROPPED BACK HEAVILY INTO HIS CHAIR BY THE FIRE +I AM NOT A MAN WHO CAN LIVE TWO LIVES HE WENT ON FEVERISHLY EACH LIFE SPOILS THE OTHER +SHE SLID TO THE FLOOR BESIDE HIM AS IF SHE WERE TOO TIRED TO SIT UP ANY LONGER +WHEN SHE BEGAN TO DANCE BY WAY OF SHOWING THE GOSSOONS WHAT SHE HAD SEEN IN THE FAIRY RINGS AT NIGHT THE HOUSE BROKE INTO A PROLONGED UPROAR +I'M GLAD SHE'S HELD HER OWN SINCE +IN A MOMENT PEGGY WAS ON THE STAGE AGAIN AND ALEXANDER APPLAUDED VIGOROUSLY WITH THE REST +ALEXANDER EXCLAIMED MILDLY +I HAPPEN TO HAVE MAC CONNELL'S BOX FOR TONIGHT OR THERE'D BE NO CHANCE OF OUR GETTING PLACES +A LITTLE ATTACK OF NERVES POSSIBLY +MAC CONNELL LET ME INTRODUCE MISTER BARTLEY ALEXANDER +IN THE HALF LIGHT HE LOOKED ABOUT AT THE STALLS AND BOXES AND SMILED A LITTLE CONSCIOUSLY RECALLING WITH AMUSEMENT SIR HARRY'S JUDICIAL FROWN +HUGH'S WRITTEN A DELIGHTFUL PART FOR HER AND SHE'S QUITE INEXPRESSIBLE +MYSELF I ALWAYS KNEW SHE HAD IT IN HER +HE'S ANOTHER WHO'S AWFULLY KEEN ABOUT HER LET ME INTRODUCE YOU +IT'S DELIGHTFUL TO HEAR IT IN A LONDON THEATRE +I SAY SIR HARRY THE LITTLE GIRL'S GOING FAMOUSLY TO NIGHT ISN'T SHE +HE LEANED FORWARD AND BEAMED FELICITATIONS AS WARMLY AS MAINHALL HIMSELF WHEN AT THE END OF THE PLAY SHE CAME AGAIN AND AGAIN BEFORE THE CURTAIN PANTING A LITTLE AND FLUSHED HER EYES DANCING AND HER EAGER NERVOUS LITTLE MOUTH TREMULOUS WITH EXCITEMENT +OF COURSE HILDA IS IRISH THE BURGOYNES HAVE BEEN STAGE PEOPLE FOR GENERATIONS AND SHE HAS THE IRISH VOICE +WHEN THEY ENTERED THE STAGE BOX ON THE LEFT THE FIRST ACT WAS WELL UNDER WAY THE SCENE BEING THE INTERIOR OF A CABIN IN THE SOUTH OF IRELAND +HE NODDED CURTLY AND MADE FOR THE DOOR DODGING ACQUAINTANCES AS HE WENT +AFTER HER DANCE SHE WITHDREW FROM THE DIALOGUE AND RETREATED TO THE DITCH WALL BACK OF PHILLY'S BURROW WHERE SHE SAT SINGING THE RISING OF THE MOON AND MAKING A WREATH OF PRIMROSES FOR HER DONKEY +AS THEY SAT DOWN A BURST OF APPLAUSE DREW ALEXANDER'S ATTENTION TO THE STAGE +ALL THE SAME HE LIFTED HIS GLASS HERE'S TO YOU LITTLE HILDA +DO 
YOU KNOW ALEXANDER MAINHALL LOOKED WITH PERPLEXITY UP INTO THE TOP OF THE HANSOM AND RUBBED HIS PINK CHEEK WITH HIS GLOVED FINGER DO YOU KNOW I SOMETIMES THINK OF TAKING TO CRITICISM SERIOUSLY MYSELF +SIR HARRY TOWNE BOWED AND SAID THAT HE HAD MET MISTER ALEXANDER AND HIS WIFE IN TOKYO +THE FACT IS SHE'S FEELING RATHER SEEDY POOR CHILD +HE BOWED AS THE WARNING BELL RANG AND MAINHALL WHISPERED YOU KNOW LORD WESTMERE OF COURSE THE STOOPED MAN WITH THE LONG GRAY MUSTACHE TALKING TO LADY DOWLE +I DARE SAY IT'S QUITE TRUE THAT THERE'S NEVER BEEN ANY ONE ELSE +THE PLAYWRIGHT GAVE MAINHALL A CURIOUS LOOK OUT OF HIS DEEP SET FADED EYES AND MADE A WRY FACE +IT WAS YOUTH AND POVERTY AND PROXIMITY AND EVERYTHING WAS YOUNG AND KINDLY +HE HAD WRITTEN A NUMBER OF BOOKS HIMSELF AMONG THEM A HISTORY OF DANCING A HISTORY OF COSTUME A KEY TO SHAKESPEARE'S SONNETS A STUDY OF THE POETRY OF ERNEST DOWSON ET CETERA +OH BARTLEY DID YOU WRITE TO ME +IF YOU'D SENT ME A NOTE OR TELEPHONED ME OR ANYTHING +HE BENT HIS FACE OVER HER HAIR +ALEXANDER SLIPPED HIS ARM ABOUT HER +OF COURSE I KNOW BARTLEY SHE SAID AT LAST THAT AFTER THIS YOU WON'T OWE ME THE LEAST CONSIDERATION BUT WE SAIL ON TUESDAY +I SAW THAT INTERVIEW IN THE PAPER YESTERDAY TELLING WHERE YOU WERE AND I THOUGHT I HAD TO SEE YOU THAT'S ALL GOOD NIGHT I'M GOING NOW +I DON'T KNOW WHAT I OUGHT TO SAY BUT I DON'T BELIEVE YOU'D BE HAPPY TRULY I DON'T AREN'T YOU TRYING TO FRIGHTEN ME +ONLY I'LL DO IT MORE COMPLETELY +I'VE BEEN UP IN CANADA WITH MY BRIDGE AND I ARRANGED NOT TO COME TO NEW YORK UNTIL AFTER YOU HAD GONE +THEN YOU DON'T KNOW WHAT YOU'RE TALKING ABOUT +OVER THE FIREPLACE THERE WAS A LARGE OLD FASHIONED GILT MIRROR +ALEXANDER FLUSHED ANGRILY +I THINK I HAVE FELT THAT YOU WERE COMING +HE PAUSED THEY NEVER DID TO ME +I TOLD MYSELF THAT IF I WERE REALLY THINKING OF YOU AND NOT OF MYSELF A LETTER WOULD BE BETTER THAN NOTHING +THEN WHEN YOUR MANAGER ADDED TWO MORE WEEKS I WAS ALREADY COMMITTED +I'M GOING TO DO WHAT YOU ASKED ME TO DO WHEN YOU WERE IN LONDON +ON THE LAST SATURDAY IN APRIL THE NEW YORK TIMES PUBLISHED AN ACCOUNT OF THE STRIKE COMPLICATIONS WHICH WERE DELAYING ALEXANDER'S NEW JERSEY BRIDGE AND STATED THAT THE ENGINEER HIMSELF WAS IN TOWN AND AT HIS OFFICE ON WEST TENTH STREET +LET ME TAKE OFF YOUR COAT AND YOUR BOOTS THEY'RE OOZING WATER +BUT WHEN I CAME I THOUGHT I HAD BEEN MISTAKEN +YES I KNOW VERY WELL +HE ROSE AND CROSSED THE ROOM QUICKLY +AND I SHE WHISPERED I FELT THAT YOU WERE FEELING THAT +IT SEEMED AS IF HIS FAMILY TROUBLES WERE JUST BEGINNING +HE SAW A BUSY SATURDAY USHERED OUT THE SABBATH IN AND NOTHING DONE +HE SAW THAT IN THE EXCITEMENT OF RECENT EVENTS HE HAD NOT FORMULATED A PLAN UPON THAT SCORE +SO HERE IT WAS SPREAD OUT CLEAR BEFORE HIM AND NOW HE KNEW WHAT TO EXPECT +YOU TAKE THIS TO THIS ADDRESS HE SAID HANDING HIM THE ENVELOPE AND GIVE IT TO MISSUS HURSTWOOD YES SIR SAID THE BOY +DEAR SIR WE BEG TO INFORM YOU THAT WE ARE INSTRUCTED TO WAIT UNTIL TO MORROW THURSDAY AT ONE O'CLOCK BEFORE FILING SUIT AGAINST YOU ON BEHALF OF MISSUS JULIA HURSTWOOD FOR DIVORCE AND ALIMONY +HE WAS GETTING SOME VAGUE COMFORT OUT OF A GOOD CIGAR BUT IT WAS NO PANACEA FOR THE ILL WHICH AFFECTED HIM +ANY ANSWER I GUESS NOT +HE WAS QUITE CERTAIN NOW THAT SHE KNEW HE WAS MARRIED AND WAS ANGERED AT HIS PERFIDY +HE FANCIED AS HE SAT AT HIS DESK THAT NOTHING WOULD BE DONE FOR A WEEK OR TWO MEANWHILE HE WOULD HAVE TIME TO THINK +ON WEDNESDAY HE RECEIVED ANOTHER POLITE NOTE FROM MC GREGOR JAMES AND HAY IT READ +HE STAYED AT HIS DESK LONG AFTER ALL OTHERS 
HAD GONE AND ONLY QUITTED IT WHEN THE NIGHT WATCHMAN ON HIS ROUND PULLED AT THE FRONT DOOR TO SEE IF IT WAS SAFELY LOCKED +HOW ABOUT THAT NOW HIS PAIN AT HER FAILURE TO MEET OR WRITE HIM RAPIDLY INCREASED AS HE DEVOTED HIMSELF TO THIS SUBJECT +VERY TRULY YOURS ET CETERA COMPROMISE +HE TROUBLED OVER MANY LITTLE DETAILS AND TALKED PERFUNCTORILY TO EVERYBODY +HE WOULD GO TO HER AND TELL HER ALL HIS FAMILY COMPLICATIONS +IF HE DIDN'T GO AND SEE THEM THEY WOULD SUE HIM PROMPTLY +HE WAS BEATEN FOR TO NIGHT AND HE MIGHT JUST AS WELL MAKE THE BEST OF IT +HE COULD HARDLY REALISE HOW IT HAD ALL COME ABOUT +HE WOULD EXPLAIN TO HER JUST WHERE HE STOOD AND HOW MUCH HE NEEDED HER +THREE O'CLOCK CAME FOUR FIVE SIX AND NO LETTER +SHE WOULD TAKE THE ENVELOPE AND KNOW THAT SHE HAD TRIUMPHED +IF HE ONLY HAD THAT LETTER BACK HE WOULDN'T SEND IT +IN ABOUT AN HOUR AND THREE QUARTERS THE BOY RETURNED +NO LETTER HAD COME NO WORD OF ANY KIND AND YET HERE IT WAS LATE IN THE EVENING AND SHE HAD AGREED TO MEET HIM THAT MORNING +HE DID NOT GO WITHIN A BLOCK OF THE HOUSE +ALL DAY THE BAR BEING CLOSED HE BROODED ALONE SHUT OUT FROM HOME FROM THE EXCITEMENT OF HIS RESORT FROM CARRIE AND WITHOUT THE ABILITY TO ALTER HIS CONDITION ONE IOTA +IT WAS WITH GREAT OPPOSITION AFTER TWO OR THREE HOURS OF THE MOST URGENT MENTAL AFFIRMATION AND DENIAL THAT AT LAST HE GOT AN ENVELOPE PLACED IN IT THE REQUESTED AMOUNT AND SLOWLY SEALED IT UP +THE HELPLESS MANAGER PACED THE FLOOR AND GRIMLY ENDURED THE GLOOM OF DEFEAT +WHEN HURSTWOOD GOT BACK TO HIS OFFICE AGAIN HE WAS IN A GREATER QUANDARY THAN EVER +ALL THE TIME HIS THOUGHTS WOULD RUN OUT TO HIS HOME AND SEE THE SCENE BEING THEREIN ENACTED +HE HAD LOVED HER EARNESTLY ENOUGH BUT NOW THAT THE POSSIBILITY OF LOSING HER STARED HIM IN THE FACE SHE SEEMED MUCH MORE ATTRACTIVE +HE DID MANAGE TO BRING HIMSELF INTO THE MOOD TO GO OUT TO CARRIE BUT WHEN HE GOT IN OGDEN PLACE HE THOUGHT HE SAW A MAN WATCHING HIM AND WENT AWAY +THE BOY HASTENED AWAY AND THE MANAGER FELL TO HIS MUSINGS +FOR RELIEF HE AROSE AND JOINED IN CONVERSATION WITH A FEW FRIENDS WHO WERE DRINKING +HE DECIDED TO WRITE HER CARE OF THE WEST SIDE POST OFFICE AND ASK FOR AN EXPLANATION AS WELL AS TO HAVE HER MEET HIM +THEN HE CALLED HARRY THE BOY OF ALL WORK AROUND THE PLACE +BUT HE COMPROMISED BY TELLING THE BOY THAT THERE WOULD BE NO REPLY +HE WENT IN AND EXAMINED HIS LETTERS BUT THERE WAS NOTHING FROM CARRIE +HE COULD ARRANGE THAT SATISFACTORILY FOR CARRIE WOULD BE GLAD TO WAIT IF NECESSARY +THEN HE RANG THE BELL NO ANSWER +HURSTWOOD ALMOST EXCLAIMED OUT LOUD AT THE INSISTENCY OF THIS THING +AT ONE THIRTY HE WENT TO RECTOR'S FOR LUNCH AND WHEN HE RETURNED A MESSENGER WAS WAITING FOR HIM +HE WOULD GET ONE TO DAY IT WOULD PROBABLY BE ON HIS DESK WHEN HE GOT BACK HE WOULD LOOK FOR IT AT ONCE +WHILE THE DANGER HAD NOT LESSENED IT HAD NOT AS YET MATERIALISED AND WITH HIM NO NEWS WAS GOOD NEWS +HE WOULD SEE HOW THINGS TURNED OUT TO MORROW AND THEN HE WOULD TALK TO HER THEY WERE GOING TO MEET AS USUAL +HE KNEW HER WELL ENOUGH TO KNOW THAT WHEN SHE HAD DECIDED UPON A PLAN SHE WOULD FOLLOW IT UP +FOR SOME REASON HE FELT AS IF SOMETHING MIGHT COME THAT WAY AND WAS RELIEVED WHEN ALL THE ENVELOPES HAD BEEN SCANNED AND NOTHING SUSPICIOUS NOTICED +SO LITTLE DID HE CONSIDER DROUET THAT IT NEVER ONCE OCCURRED TO HIM TO WORRY ABOUT HIS FINDING OUT +HE RANG AGAIN THIS TIME HARDER STILL NO ANSWER +HE GREW RESTLESS AS HE RUMINATED AND THEN DECIDED THAT PERHAPS IT WAS NOTHING +LATER HOWEVER HIS OLD DISCRETION ASSERTED ITSELF +FORTUNATELY THERE 
WAS NOTHING FROM HIS WIFE EITHER +THEN HE SAT DOWN IN HIS CHAIR AND GAZED WITHOUT SEEING CONTEMPLATING THE RESULT OF HIS WORK +HE BEGAN TO WISH THAT HE HAD COMPROMISED IN SOME WAY OR OTHER THAT HE HAD SENT THE MONEY PERHAPS HE COULD DO IT UP HERE +HE PUT ON HIS HAT AND LOOKED AROUND FOR HIS UMBRELLA +HE WAS IN A FEVERED STATE OF MIND OWING TO THE BLIGHT HIS WIFE'S ACTION THREATENED TO CAST UPON HIS ENTIRE FUTURE +HOW WOULD THE PAPERS TALK ABOUT IT +AFTER A TIME HE GAVE UP WAITING AND DREARILY HEADED FOR THE MADISON CAR +SOMETHING HAD TO BE DONE A CLIMAX WAS NEAR AND SHE WOULD NOT SIT IDLE +SHE HAD NOT BEEN ABLE TO GET AWAY THIS MORNING +HE AROSE FROM HIS CHAIR AND WENT AND LOOKED OUT INTO THE STREET +HE WOULD GO IN AND SEE ANYHOW HE WOULD HAVE NO ROW +WHAT WOULD SHE DO ABOUT THAT THE CONFOUNDED WRETCH +HIS FIRST IMPULSE WAS TO WRITE BUT FOUR WORDS IN REPLY GO TO THE DEVIL +THE LONG DRIZZLE HAD BEGUN PEDESTRIANS HAD TURNED UP COLLARS AND TROUSERS AT THE BOTTOM +HE WOULD HAVE SOME ARRANGEMENT OF THIS THING +HURSTWOOD WALKED THE FLOOR MENTALLY ARRANGING THE CHIEF POINTS OF HIS SITUATION +HE WOULD HAVE TO PAY HER THE MONEY WHICH SHE WOULD NOW REGULARLY DEMAND OR THERE WOULD BE TROUBLE IT DID NOT MATTER WHAT HE DID +BY THE TIME HE REACHED HIS OWN STREET HE WAS KEENLY ALIVE TO THE DIFFICULTIES OF HIS SITUATION AND WISHED OVER AND OVER THAT SOME SOLUTION WOULD OFFER ITSELF THAT HE COULD SEE HIS WAY OUT +HE ALSO THOUGHT OF HIS MANAGERIAL POSITION +THE WALLS OF THE ROOMS WERE DISCORDANTLY PAPERED +YOU COULD GET HOME EASY TOO IT ISN'T VERY FAR +HIS AMBITION WAS SOME DAY TO BUILD A HOUSE ON THEM +HE WAS OF A CLEAN SAVING DISPOSITION AND HAD ALREADY PAID A NUMBER OF MONTHLY INSTALMENTS ON TWO LOTS FAR OUT ON THE WEST SIDE +MINNIE'S FLAT AS THE ONE FLOOR RESIDENT APARTMENTS WERE THEN BEING CALLED WAS IN A PART OF WEST VAN BUREN STREET INHABITED BY FAMILIES OF LABOURERS AND CLERKS MEN WHO HAD COME AND WERE STILL COMING WITH THE RUSH OF POPULATION POURING IN AT THE RATE OF FIFTY THOUSAND A YEAR +HE SEEMED TO BE THINKING OF SOMETHING ELSE +THESE VAST BUILDINGS WHAT WERE THEY +TO CARRIE THE SOUND OF THE LITTLE BELLS UPON THE HORSE CARS AS THEY TINKLED IN AND OUT OF HEARING WAS AS PLEASING AS IT WAS NOVEL +SHE ASKED MINNIE FOR INK AND PAPER WHICH WERE UPON THE MANTEL IN THE DINING ROOM AND WHEN THE LATTER HAD GONE TO BED AT TEN GOT OUT DROUET'S CARD AND WROTE HIM +ONE COULD SEE THAT HE WAS VERY MUCH WRAPPED UP IN HIS OFFSPRING +A SHOP GIRL WAS THE DESTINY PREFIGURED FOR THE NEWCOMER +THE FLOORS WERE COVERED WITH MATTING AND THE HALL LAID WITH A THIN RAG CARPET +SHE HAD SOME SLIGHT GIFT OF OBSERVATION AND THAT SENSE SO RICH IN EVERY WOMAN INTUITION +MINNIE BEGAN TO EXPLAIN BUT HER HUSBAND TOOK THIS PART OF THE CONVERSATION TO HIMSELF +NOW NOW HE SAID WALKING THERE THERE AND THERE WAS A CERTAIN SWEDISH ACCENT NOTICEABLE IN HIS VOICE +IT GAVE AN IMPOSING APPEARANCE TO MOST OF THE WHOLESALE HOUSES WHOSE OFFICES WERE UPON THE GROUND FLOOR AND IN PLAIN VIEW OF THE STREET +SHE WANTED TO MAKE SOME REFERENCE TO THEIR RELATIONS UPON THE TRAIN BUT WAS TOO TIMID +ANYTHING WAS GOOD ENOUGH SO LONG AS IT PAID SAY FIVE DOLLARS A WEEK TO BEGIN WITH +THEN SHE WALKED AND SANG TO IT UNTIL HANSON DISTURBED IN HIS READING CAME AND TOOK IT +IT WAS UNDER SUCH AUSPICIOUS CIRCUMSTANCES THAT SHE STARTED OUT THIS MORNING TO LOOK FOR WORK +NARROW BOARD WALKS EXTENDED OUT PASSING HERE A HOUSE AND THERE A STORE AT FAR INTERVALS EVENTUALLY ENDING ON THE OPEN PRAIRIE +TO HIM THE PRESENCE OR ABSENCE OF HIS WIFE'S SISTER WAS A MATTER OF 
INDIFFERENCE +OH WHAT SHALL WE DO FOR A HOME +YOU SAY YOU KNOW ALL ABOUT IT THEN GO ON AND FINISH YOUR NESTS BY YOURSELVES +SHE WAS INDEED A CLEVER BIRD +AND SHE SAW THE OTHER BIRDS HOPPING ABOUT AND TWITTERING HELPLESSLY +MUCH LUCK MAY YOU HAVE +CERTAINLY OF COURSE SCREAMED THE JACKDAW +AND AWAY SHE FLEW TO HER OWN COSY NEST IN THE ELM TREE WHERE SHE WAS SOON FAST ASLEEP FORGETTING ALL ABOUT THE MATTER +AND SOME OF THE BIRDS WHO WERE ATTENTIVE AND CAREFUL SOON SAW HOW IT WAS DONE AND STARTED NICE HOMES FOR THEMSELVES +INDEED IT IS NOT A NEST AT ALL ONLY THE BEGINNING OF ONE +BUT THE WOOD PIGEON WAS IN THE WORST CASE OF THEM ALL +AND WHERE EACH BIRD PERCHED THERE IT WAS TO BUILD ITS NEST +AND THE POOR SILLY THINGS RUFFLED UP THEIR FEATHERS AND LOOKED MISERABLE AS ONLY A LITTLE BIRD CAN LOOK WHEN IT IS UNHAPPY +CRISS CROSS CRISS CROSS SO INTERRUPTED THE WOOD PIGEON +I THOUGHT THAT WAS THE WAY TO BEGIN +AND THERE IS AN OLD STORY ABOUT THIS WHICH I SHALL TELL YOU +THE MAGPIE SAID SHE WOULD TEACH THEM IF THEY WOULD BE A PATIENT DILIGENT OBEDIENT CLASS OF LITTLE BIRDS +SOME ARE WONDERFULLY WROUGHT PRETTY LITTLE HOMES FOR BIRDIKINS +FOR SHE HAD ONLY THE FOUNDATION LAID CRISS CROSS AS THE MAGPIE HAD SHOWN HER +THEN ALL THE OTHER BIRDS CHIRPED EAGERLY YES YES LET US ASK HER TO TEACH US +HERE WOOD PIGEON SAID MOTHER MAGPIE YOU MUST PLACE THOSE STICKS THROUGH AND ACROSS CRISS CROSS CRISS CROSS SO +O WISE MOTHER MAGPIE DEAR MOTHER MAGPIE THEY CRIED TEACH US HOW TO BUILD OUR NESTS LIKE YOURS FOR IT IS GROWING NIGHT AND WE ARE TIRED AND SLEEPY +SHE BEGAN TO SHOW THEM HOW TO WEAVE THE BITS OF THINGS TOGETHER INTO NESTS AS THEY SHOULD BE MADE +SHE POPPED INTO HER NEW HOUSE AND SAT THERE COMFORTABLY PEERING OUT THROUGH THE WINDOW SLITS WITH HER SHARP LITTLE EYES +SO IN A GREAT COMPANY THEY CAME FLUTTERING HOPPING TWITTERING UP TO THE ELM TREE WHERE MOTHER MAGPIE NESTLED COMFORTABLY IN HER NEW HOUSE +IT WAS INDEED DANCING ON A VOLCANO +HE MIGHT BE ENCHANTED BUT THAT WAS THE TALISMAN +HOW MANY INCIDENTS HOW MANY CHARACTERS HOW MANY FEELINGS FLITTED OVER HIS MEMORY OF WHAT SWEET AND BITTER EXPERIENCE DID HE NOT CHEW THE CUD +IT WAS INDEED HER HANDWRITING +FOUR AND TWENTY HOURS AGO AND HE DEEMED HIMSELF THE MOST MISERABLE AND FORLORN OF HUMAN BEINGS AND NOW ALL THE BLESSINGS OF THE WORLD SEEMED SHOWERED AT HIS FEET +THE MOST GIFTED INDIVIDUALS IN THE LAND EMULATED EACH OTHER IN PROVING WHICH ENTERTAINED FOR HIM THE MOST SINCERE AFFECTION +AND NOW ALL HAD ENDED SO HAPPILY +AS FOR HIS FRIENDS THE FUTURE MUST PROVE HIS GRATITUDE TO THEM +IN EXACTLY TEN MINUTES IT IS IN THE POWER OF EVERY MAN TO FREE HIMSELF FROM ALL THE TUMULT OF THE WORLD THE PANGS OF LOVE THE THROBS OF AMBITION THE WEAR AND TEAR OF PLAY THE RECRIMINATING BOUDOIR THE CONSPIRING CLUB THE RATTLING HELL AND FIND HIMSELF IN A SUBLIME SYLVAN SOLITUDE SUPERIOR TO THE CEDARS OF LEBANON AND INFERIOR ONLY IN EXTENT TO THE CHESTNUT FORESTS OF ANATOLIA +THIS VIOLENT AND TRIUMPHANT REVOLUTION IN HIS PROSPECTS AND HIS FORTUNES WAS HARDLY YET COMPLETELY COMPREHENDED BY OUR FRIEND FERDINAND ARMINE AND WHEN HE HAD LEFT A NOTE FOR THE GENEROUS MIRABEL WHOSE SLUMBERS HE WOULD NOT DISTURB AT THIS EARLY HOUR EVEN WITH GOOD NEWS HE STROLLED ALONG UP CHARLES STREET AND TO THE PARK IN ONE OF THOSE WILD AND JOYOUS REVERIES IN WHICH WE BROOD OVER COMING BLISS AND CREATE A THOUSAND GLORIOUS CONSEQUENCES +IT REQUIRES SOME SELF COMMUNION TO PREPARE OURSELVES FOR GOOD FORTUNE AS WELL AS TO ENCOUNTER DIFFICULTY AND DANGER AND DISGRACE +HIS CONSTANCY TO HER WAS NOW REWARDED 
+FERDINAND FELT HIS FREEDOM AS WELL AS HIS HAPPINESS +FERDINAND MEDITATES OVER HIS GOOD FORTUNE +IN MOMENTS OF DEEP FEELING ALIKE SUDDEN BURSTS OF PROSPERITY AS IN DARKER HOURS MAN MUST BE ALONE +IN THE PRESENT UNSETTLED THOUGH HOPEFUL STATE OF AFFAIRS FERDINAND WOULD NOT GO HOME +WAS IT NOT ALL A DREAM OF HIS OWN CREATION WHILE HIS EYE HAD BEEN FIXED IN ABSTRACTION ON THAT BRIGHT AND FLOWING RIVER +RESTLESS WITH IMPENDING JOY HE SAUNTERED TO THE BRIDGE AND LEANT OVER THE BALUSTRADE GAZING ON THE WATERS IN CHARMED AND CHARMING VACANCY +IS PAPA ALONE ENQUIRED MISS TEMPLE +HE COULD NOT FLATTER HIMSELF THAT HE INDEED MERITED SUCH SINGULAR BLESSINGS AND YET WITH ALL HIS FAULTS WHICH WITH HIM WERE BUT THE CONSEQUENCES OF HIS FIERY YOUTH FERDINAND HAD BEEN FAITHFUL TO HENRIETTA +TIME WORE AWAY AND ON THE NINTH OF APRIL EIGHTEEN SIXTY FIVE GRANT CAPTURED THE CONFEDERATE ARMY UNDER LEE THUS VIRTUALLY ENDING THE WAR +INDEED IF EVER A GENERAL DESERVED HONOR GRANT HAD WON IT HE HAD OPENED THE MISSISSIPPI TO NAVIGATION AND HAD CAPTURED NEARLY ONE HUNDRED THOUSAND PRISONERS AND ARMS +HE WAS NOW COMMANDER OF ALL THE FEDERAL FORCES +WHEN HIS PUBLIC SERVICES WERE FINISHED HE STARTED IN COMPANY WITH HIS WIFE SON JESSE AND A FEW FRIENDS +HIS SUCCESS SEEMS TO HAVE BEEN THE OUTGROWTH OF HARD STUDY AND ABILITY TO PERFORM THE MOST EXHAUSTIVE LABOR WITHOUT FATIGUE +THE CAPTURE OF LEE WAS A FAR MORE DIFFICULT UNDERTAKING +THROUGH THE INFLUENCE OF HON THOMAS L HAMER HE WAS ADMITTED AT WEST POINT IN EIGHTEEN THIRTY NINE +AT THIS TIME GRANT WAS NOT TAKEN WITH WAR AND PROBABLY EVINCED LITTLE INTEREST IN ARMY TACTICS +GRANT ACTED AS MUSTERING OFFICER UNTIL BEING COMMISSIONED COLONEL OF THE TWENTY FIRST ILLINOIS VOLUNTEERS HE TOOK THE FIELD +GENERAL HALLECK IN SPEAKING OF THIS BATTLE SAID +THEY DID NOT BREATHE IT INTO THEIR MOUTHS OR THROUGH GILLS BUT TOOK IT IN THROUGH SOME OPENINGS IN THE BACK PART OF THEIR BODIES +THEY KNEW THAT WHENEVER THEY STUCK OUT THEIR LOWER LIPS AT THE SMALL FISHES AND BUGS THEY SWAM AWAY AS FAST AS THEY COULD +A PERSON WOULD THINK THAT AFTER A FAMILY HAD LIVED SO LONG IN A PLACE ALL THE NEIGHBORS WOULD BE FOND OF THEM YET IT IS NOT SO +THEY ALWAYS ATE PLAIN FOOD AND PLENTY OF IT AND THEY NEVER ATE BETWEEN MEALS +SURE ENOUGH THERE HE CAME THROUGH THE SHALLOW WATER HIS WET BACK SHELL PARTLY OUT OF IT AND SHINING IN THE SUNLIGHT +YOU WOULD THINK THAT WITH SIX LEGS APIECE AND THREE JOINTS IN EACH LEG THEY MIGHT WALK QUITE FAST YET THEY NEVER DID +HE BEGAN TO DRAW IN HIS LEGS VERY VERY SLOWLY AND JUST AS HIS GREAT HARD LOWER SHELL TOUCHED THE MUD THE LAST LARVA CRAWLED OUT UNDER HIS TAIL +BUT SOMETIMES HE STRAIGHTENS THE JOINT AND HOLDS HIS LIP OUT BEFORE HIM AND THEN ITS PINCERS CATCH HOLD OF THINGS HE DOES THIS WHEN HE IS HUNGRY +IT IS DISGRACEFUL +THE NYMPHS HAD ALREADY GOTTEN AWAY +OUR UPPER LIPS ARE SO SMALL THEY DON'T MATTER +THEY THOUGHT THE TROUBLE CAME FROM BAD BRINGING UP OR NO BRINGING UP AT ALL +BOTH LIPS ASKED THE LARVAE +SCARED DAH WHO'S AFRAID ANSWERED HE +THEY THOUGHT HE MIGHT BE GOING TO TAKE A NAP AFTER HIS DINNER +HERE COMES THE SNAPPING TURTLE +INDEED THE LOWER LIP OF A DRAGON FLY CHILD MIGHT WELL FRIGHTEN PEOPLE FOR IT IS FASTENED ON A LONG JOINTED ARM LIKE THING AND HAS PINCERS ON IT WITH WHICH IT CATCHES AND HOLDS ITS FOOD +WELL OUR LOWER LIPS ANYWAY ANSWERED THE NYMPH +ON THIS ACCOUNT THE PEOPLE OF ONE NATION UNDERSTAND ONE ANOTHER BETTER THAN THOSE BELONGING TO DIFFERENT NATIONS EVEN WHEN THEY USE THE SAME LANGUAGE OR RATHER WHEN PEOPLE HAVE LIVED LONG TOGETHER UNDER 
SIMILAR CONDITIONS OF CLIMATE SOIL DANGER REQUIREMENT TOIL THERE ORIGINATES THEREFROM AN ENTITY THAT UNDERSTANDS ITSELF NAMELY A NATION +EVERYWHERE THAT SLAVE MORALITY GAINS THE ASCENDANCY LANGUAGE SHOWS A TENDENCY TO APPROXIMATE THE SIGNIFICATIONS OF THE WORDS GOOD AND STUPID +DANGER IS AGAIN PRESENT THE MOTHER OF MORALITY GREAT DANGER THIS TIME SHIFTED INTO THE INDIVIDUAL INTO THE NEIGHBOUR AND FRIEND INTO THE STREET INTO THEIR OWN CHILD INTO THEIR OWN HEART INTO ALL THE MOST PERSONAL AND SECRET RECESSES OF THEIR DESIRES AND VOLITIONS +WE TRUTHFUL ONES THE NOBILITY IN ANCIENT GREECE CALLED THEMSELVES +PROBABLY A PESSIMISTIC SUSPICION WITH REGARD TO THE ENTIRE SITUATION OF MAN WILL FIND EXPRESSION PERHAPS A CONDEMNATION OF MAN TOGETHER WITH HIS SITUATION +VARIATIONS WHETHER THEY BE DEVIATIONS INTO THE HIGHER FINER AND RARER OR DETERIORATIONS AND MONSTROSITIES APPEAR SUDDENLY ON THE SCENE IN THE GREATEST EXUBERANCE AND SPLENDOUR THE INDIVIDUAL DARES TO BE INDIVIDUAL AND DETACH HIMSELF +IN FACT CONFORMABLY TO THE SLOW RISE OF THE DEMOCRATIC SOCIAL ORDER AND ITS CAUSE THE BLENDING OF THE BLOOD OF MASTERS AND SLAVES THE ORIGINALLY NOBLE AND RARE IMPULSE OF THE MASTERS TO ASSIGN A VALUE TO THEMSELVES AND TO THINK WELL OF THEMSELVES WILL NOW BE MORE AND MORE ENCOURAGED AND EXTENDED BUT IT HAS AT ALL TIMES AN OLDER AMPLER AND MORE RADICALLY INGRAINED PROPENSITY OPPOSED TO IT AND IN THE PHENOMENON OF VANITY THIS OLDER PROPENSITY OVERMASTERS THE YOUNGER +OCCASIONALLY TOO THE WAKING CALL COMES TOO LATE THE CHANCE WHICH GIVES PERMISSION TO TAKE ACTION WHEN THEIR BEST YOUTH AND STRENGTH FOR ACTION HAVE BEEN USED UP IN SITTING STILL AND HOW MANY A ONE JUST AS HE SPRANG UP HAS FOUND WITH HORROR THAT HIS LIMBS ARE BENUMBED AND HIS SPIRITS ARE NOW TOO HEAVY +WHAT WILL THE MORAL PHILOSOPHERS WHO APPEAR AT THIS TIME HAVE TO PREACH +IT IS OBVIOUS THAT EVERYWHERE THE DESIGNATIONS OF MORAL VALUE WERE AT FIRST APPLIED TO MEN AND WERE ONLY DERIVATIVELY AND AT A LATER PERIOD APPLIED TO ACTIONS IT IS A GROSS MISTAKE THEREFORE WHEN HISTORIANS OF MORALS START WITH QUESTIONS LIKE WHY HAVE SYMPATHETIC ACTIONS BEEN PRAISED +WHICHEVER GROUPS OF SENSATIONS WITHIN A SOUL AWAKEN MOST READILY BEGIN TO SPEAK AND GIVE THE WORD OF COMMAND THESE DECIDE AS TO THE GENERAL ORDER OF RANK OF ITS VALUES AND DETERMINE ULTIMATELY ITS LIST OF DESIRABLE THINGS +HE HONOURS WHATEVER HE RECOGNIZES IN HIMSELF SUCH MORALITY EQUALS SELF GLORIFICATION +AND TO CHOOSE FOR COMPANY THAT ROGUISH AND CHEERFUL VICE POLITENESS +ACCORDING TO SLAVE MORALITY THEREFORE THE EVIL MAN AROUSES FEAR ACCORDING TO MASTER MORALITY IT IS PRECISELY THE GOOD MAN WHO AROUSES FEAR AND SEEKS TO AROUSE IT WHILE THE BAD MAN IS REGARDED AS THE DESPICABLE BEING +THE DISTINCTIONS OF MORAL VALUES HAVE EITHER ORIGINATED IN A RULING CASTE PLEASANTLY CONSCIOUS OF BEING DIFFERENT FROM THE RULED OR AMONG THE RULED CLASS THE SLAVES AND DEPENDENTS OF ALL SORTS +THE HIGHEST INSTINCT FOR PURITY PLACES HIM WHO IS AFFECTED WITH IT IN THE MOST EXTRAORDINARY AND DANGEROUS ISOLATION AS A SAINT FOR IT IS JUST HOLINESS THE HIGHEST SPIRITUALIZATION OF THE INSTINCT IN QUESTION +THIS IS THE PROBLEM OF RACE +NOTHING BUT NEW WHYS NOTHING BUT NEW HOWS NO COMMON FORMULAS ANY LONGER MISUNDERSTANDING AND DISREGARD IN LEAGUE WITH EACH OTHER DECAY DETERIORATION AND THE LOFTIEST DESIRES FRIGHTFULLY ENTANGLED THE GENIUS OF THE RACE OVERFLOWING FROM ALL THE CORNUCOPIAS OF GOOD AND BAD A PORTENTOUS SIMULTANEOUSNESS OF SPRING AND AUTUMN FULL OF NEW CHARMS AND MYSTERIES PECULIAR TO THE FRESH STILL 
INEXHAUSTED STILL UNWEARIED CORRUPTION +EVERY ELEVATION OF THE TYPE MAN HAS HITHERTO BEEN THE WORK OF AN ARISTOCRATIC SOCIETY AND SO IT WILL ALWAYS BE A SOCIETY BELIEVING IN A LONG SCALE OF GRADATIONS OF RANK AND DIFFERENCES OF WORTH AMONG HUMAN BEINGS AND REQUIRING SLAVERY IN SOME FORM OR OTHER +HERE IS THE SEAT OF THE ORIGIN OF THE FAMOUS ANTITHESIS GOOD AND EVIL POWER AND DANGEROUSNESS ARE ASSUMED TO RESIDE IN THE EVIL A CERTAIN DREADFULNESS SUBTLETY AND STRENGTH WHICH DO NOT ADMIT OF BEING DESPISED +AND WHOEVER THOU ART WHAT IS IT THAT NOW PLEASES THEE +THE GREATER THE DANGER THE GREATER IS THE NEED OF AGREEING QUICKLY AND READILY ABOUT WHAT IS NECESSARY NOT TO MISUNDERSTAND ONE ANOTHER IN DANGER THAT IS WHAT CANNOT AT ALL BE DISPENSED WITH IN INTERCOURSE +I DO NOT KNOW HE SAID HESITATINGLY PERHAPS THE HARPIES HAVE FLOWN OVER MY TABLE +IN OUR VERY DEMOCRATIC OR RATHER VERY PLEBEIAN AGE EDUCATION AND CULTURE MUST BE ESSENTIALLY THE ART OF DECEIVING DECEIVING WITH REGARD TO ORIGIN WITH REGARD TO THE INHERITED PLEBEIANISM IN BODY AND SOUL +OR HE WILL EVEN SAY FOR MANY REASONS I CAN DELIGHT IN THE GOOD OPINION OF OTHERS PERHAPS BECAUSE I LOVE AND HONOUR THEM AND REJOICE IN ALL THEIR JOYS PERHAPS ALSO BECAUSE THEIR GOOD OPINION ENDORSES AND STRENGTHENS MY BELIEF IN MY OWN GOOD OPINION PERHAPS BECAUSE THE GOOD OPINION OF OTHERS EVEN IN CASES WHERE I DO NOT SHARE IT IS USEFUL TO ME OR GIVES PROMISE OF USEFULNESS ALL THIS HOWEVER IS NOT VANITY +A MAN'S ESTIMATES OF VALUE BETRAY SOMETHING OF THE STRUCTURE OF HIS SOUL AND WHEREIN IT SEES ITS CONDITIONS OF LIFE ITS INTRINSIC NEEDS +ONLY NAME IT WHATEVER I HAVE I OFFER THEE +ALSO IN ALL LOVES AND FRIENDSHIPS ONE HAS THE EXPERIENCE THAT NOTHING OF THE KIND CONTINUES WHEN THE DISCOVERY HAS BEEN MADE THAT IN USING THE SAME WORDS ONE OF THE TWO PARTIES HAS FEELINGS THOUGHTS INTUITIONS WISHES OR FEARS DIFFERENT FROM THOSE OF THE OTHER +TO SUFFOCATE WITH HIS MEMORIES TO HIM WHO HAS THE DESIRES OF A LOFTY AND DAINTY SOUL AND ONLY SELDOM FINDS HIS TABLE LAID AND HIS FOOD PREPARED THE DANGER WILL ALWAYS BE GREAT NOWADAYS HOWEVER IT IS EXTRAORDINARILY SO +THE NOBLE SOUL ACCEPTS THE FACT OF HIS EGOISM WITHOUT QUESTION AND ALSO WITHOUT CONSCIOUSNESS OF HARSHNESS CONSTRAINT OR ARBITRARINESS THEREIN BUT RATHER AS SOMETHING THAT MAY HAVE ITS BASIS IN THE PRIMARY LAW OF THINGS IF HE SOUGHT A DESIGNATION FOR IT HE WOULD SAY IT IS JUSTICE ITSELF +BUT YOU MISUNDERSTAND HIM WHEN YOU COMPLAIN ABOUT IT +AT THIS TURNING POINT OF HISTORY THERE MANIFEST THEMSELVES SIDE BY SIDE AND OFTEN MIXED AND ENTANGLED TOGETHER A MAGNIFICENT MANIFOLD VIRGIN FOREST LIKE UP GROWTH AND UP STRIVING A KIND OF TROPICAL TEMPO IN THE RIVALRY OF GROWTH AND AN EXTRAORDINARY DECAY AND SELF DESTRUCTION OWING TO THE SAVAGELY OPPOSING AND SEEMINGLY EXPLODING EGOISMS WHICH STRIVE WITH ONE ANOTHER FOR SUN AND LIGHT AND CAN NO LONGER ASSIGN ANY LIMIT RESTRAINT OR FORBEARANCE FOR THEMSELVES BY MEANS OF THE HITHERTO EXISTING MORALITY +PROBABLY BUT FORTUNATELY NOTHING FOR MY OWN TEETH PERHAPS IT BETRAYS THE SPECIES TO WHICH I BELONG BUT NOT TO MYSELF AS IS SUFFICIENTLY AGREEABLE TO ME BUT WHAT HAS HAPPENED TO YOU +THERE MUST BE A SORT OF REPUGNANCE IN ME TO BELIEVE ANYTHING DEFINITE ABOUT MYSELF IS THERE PERHAPS SOME ENIGMA THEREIN +THE MOST VARIED EXPERIENCE TEACHES IT WHAT ARE THE QUALITIES TO WHICH IT PRINCIPALLY OWES THE FACT THAT IT STILL EXISTS IN SPITE OF ALL GODS AND MEN AND HAS HITHERTO BEEN VICTORIOUS THESE QUALITIES IT CALLS VIRTUES AND THESE VIRTUES ALONE IT DEVELOPS TO MATURITY +BOATS PUT 
OUT BOTH FROM THE FORT AND THE SHORE +BEAUREGARD AT ONCE WROTE AN ORDER +CHAPTER SEVEN THE HOMECOMING +IF YOU'VE GOT PISTOLS JUST YOU THINK ONCE BEFORE YOU SHOOT SAID COLLINS +AN EXTRAORDINARY WAVE OF EMOTION SWEPT OVER THE SOUTH CARRYING EVERYBODY WITH IT +HE INTENDED TO LEAVE EARLY IN THE MORNING BUT FIRST HE SOUGHT HIS FRIENDS AND TOLD THEM GOOD BYE +BILL SKELLY AN HIS GANG THEM MOUNTAINEERS ARE UP +THAT WHITE FLAG AND THOSE BOATS GOING OUT MEAN THAT SUMTER IS OURS +WHETHER THEIR MANNER WAS GRAVE OR FRIVOLOUS HE KNEW THAT THESE WERE GOOD FRIENDS OF HIS AND HE SINCERELY HOPED THAT HE WOULD MEET THEM AGAIN +BUT THE NEGOTIATIONS WERE SOON COMPLETED +HE WAS GOING HOME AFTER VICTORY +HE FELT THE DIFFERENCE AS SOON AS HE REACHED THE HILLS OF HIS NATIVE STATE +HE SOON LEFT CHARLESTON OUT OF SIGHT +THERE WERE NEVER BEFORE SUCH TIMES IN OLD KENTUCKY +EUROPE WHICH MUST HAVE ITS COTTON WOULD FAVOR THE SUCCESS OF THE SOUTH +PEOPLE WERE COOLER HERE AND THEY WERE MORE PRONE TO LOOK AT THE TWO SIDES OF A QUESTION +THE GREAT STATE OF VIRGINIA MOTHER OF PRESIDENTS WENT OUT OF THE UNION AT LAST AND NORTH CAROLINA TENNESSEE AND ARKANSAS FOLLOWED HER BUT MARYLAND KENTUCKY AND MISSOURI STILL HUNG IN THE BALANCE +THE AIR TOO WAS UNLIKE THAT OF SOUTH CAROLINA THERE WAS A SHARPER TANG TO IT +HARRY FEELING PRIDE BUT NOT SHOWING IT SALUTED AND LEFT THE ROOM GOING AT ONCE TO MADAME DELAUNAY'S WHERE HE HAD LEFT HIS BAGGAGE +HE DID NOT SAY THE LAST AS A BOAST BUT MERELY AS AN ASSURANCE TO THE LIVERYMAN WHO HE SAW WAS ANXIOUS ON HIS ACCOUNT +THERE WAS NOT A SINGLE NOTE OF GLOOM +BUT HE LOOKED BACK AT CHARLESTON THE GAY THE VOLATILE AND THE BEAUTIFUL WITH REAL AFFECTION +THE SMOKE ITSELF WHICH HAD FORMED A VAST CLOUD OVER HARBOR FORTS AND CITY WAS NOW DRIFTING OUT TO SEA LEAVING ALL THINGS ETCHED SHARPLY IN THE DAZZLING SUNLIGHT OF A SOUTHERN SPRING DAY +HARRY THANKED HIM THREW HIS SADDLE BAGS ACROSS THE HORSE A POWERFUL BAY AND GIVING A FINAL WAVE OF HIS HAND TO THE SYMPATHETIC LIVERYMAN RODE AWAY +HE GAZED UPON A WORLD FULL OF RESPONSIBILITIES AND PERILS +IT WAS AFTERNOON WHEN HE REACHED THE LITTLE STATION OF WINTON AND LEFT THE TRAIN A TALL STURDY BOY THE SUPERIOR OF MANY A MAN IN SIZE STRENGTH AND AGILITY +HE HAD SEEN GREAT THINGS AND HE HAD DONE HIS SHARE OF THEM +IT WAS A DIFFERENT HARRY WHO STARTED HOME LATE IN APRIL +LINCOLN HAD CALLED FOR VOLUNTEERS TO PUT DOWN A REBELLION BUT HARRY HEARD EVERYWHERE IN CHARLESTON THAT THE CONFEDERACY WAS NOW SECURE +IT WAS ALMOST BURIED NOW IN FLOWERS AND FOLIAGE +COLONEL KENTON WRITES WISELY +THE PROGRESS OF PRESIDENT DAVIS TO THE NEW CAPITAL SET IN THE VERY FACE OF THE FOE WAS TO BE ONE HUGE TRIUMPH OF FAITH AND LOYALTY +BUT HE SAW NOTHING THAT MOVED THERE NO SIGNAL LIGHTS TWINKLED +IT WHIPPED HIS BLOOD AS IT BLEW DOWN FROM THE SLOPES AND CRESTS +WE NEED KENTUCKY AND I UNDERSTAND THAT A VERY LITTLE MORE MAY BRING THE STATE TO US GO WITH YOUR FATHER I UNDERSTAND THAT YOU HAVE BEEN A BRAVE YOUNG SOLDIER HERE AND MAY YOU DO AS WELL UP THERE +FOUR MONTHS HAD MADE GREAT CHANGES HE BORE HIMSELF MORE LIKE A MAN HIS MANNER WAS MUCH MORE CONSIDERED AND GRAVE +HARRY GAVE HIS FAREWELLS WITH DEEP AND GENUINE REGRET +THIS WAS NOT THE FASHION OF A YEAR AGO WHEN THEY EXCHANGED A FRIENDLY WORD OR TWO BUT HARRY KNEW ITS CAUSE NOW NOBODY COULD TRUST ANYBODY ELSE +BUT THE EMOTIONS OF HARRY AND HIS COMRADES WERE FOR THE MOMENT THOSE OF VICTORY ONLY +COLONEL LEONIDAS TALBOT REGARDED THE WHITE FLAG WITH FEELINGS IN WHICH TRIUMPH AND SADNESS WERE MINGLED STRANGELY +ALL THE AMENITIES WERE 
PRESERVED BETWEEN THE CAPTURED GARRISON AND THEIR CAPTORS +HIS TREASURE TAKEN TYPE OF HIS SELF AND A WOMAN GIVEN HIM INSTEAD +LET BUT A MOOD BE STRONG ENOUGH AND THE SOUL CLOTHING ITSELF IN THAT MOOD AS WITH A GARMENT CAN WALK ABROAD AND HAUNT THE WORLD +THAT ENCHANTMENT HAD POSSESSED HIM USURPING AS IT WERE THE THRONE OF HIS LIFE AND DISPLACING IT WHEN IT CEASED HE WAS NOT HIS OWN MASTER +HE STARTED TO CONSCIOUS CONFUSION ONLY NEITHER KNOWING WHERE HE WAS NOR WHAT HE DID +HOW IT HAPPENED HE NEVER COULD TELL BUT HE BROUGHT DOWN HIS VIOLIN WITH A CRASH AGAINST THE PIANO THEN SOMEHOW STUMBLED AND ALL BUT FELL +BUT IN HIS HANDS SOLITUDE AND A VIOLIN WERE SURE TO MARRY IN MUSIC +WAS THERE EVER A HAPPIER MAN THAN JOSEPH THAT NIGHT AS HE STRODE ALONG THE FOOTPATH +IT CRIED ALOUD THAT ETERNITY WAS VERY LONG AND LIKE A GREAT PALACE WITHOUT A QUIET ROOM +THEY SAT DOWN AND LISTENED IN SILENCE +IN THE ACT OF RECOVERING HIMSELF HE HEARD THE NECK OF HIS INSTRUMENT PART FROM THE BODY WITH A TEARING DISCORDANT CRY LIKE THE SOUND OF THE RUIN OF A LIVING WORLD +HAST THOU YET TO LEARN THAT THE LOVE OF THE HUMAN IS LOVE IS DIVINE IS BUT A LOWER FORM OF A PART OF THE LOVE OF GOD +HE PRESSED HIS VIOLIN CASE TO HIS HEART AS IF IT WERE A LIVING THING THAT COULD KNOW THAT HE LOVED IT +HER HEART SEEMED TO SWELL UP INTO HER THROAT AND IT WAS ALL SHE COULD DO TO KEEP FROM WEEPING +A SOB LIKE A BIRD NEW BORN BURST FROM MARY'S BOSOM +HE THAT LOVETH NOT HIS BROTHER WHOM HE HATH SEEN HOW SHALL HE LOVE GOD WHOM HE HATH NOT SEEN +THE MUSIC WAS BROKEN AND JOSEPH LEFT ALONE WITH THE DUMB INSTRUMENTS +WHEN HE REACHED THE SUBURBS THE LIGHT OF HOMES WAS SHINING THROUGH CURTAINS OF ALL COLORS +IT'S JUST LIKE HIM HE MURMURED +HE WAS IN A MOOD FOR MUSIC WAS HE NOT +I LOVE THEE I LOVE THEE CRIED THE VIOLIN AND THE WORSHIP WAS ENTREATY THAT KNEW NOT ITSELF +LETTY FINDING HERSELF NOT QUITE EQUAL TO THE EMERGENCY CAME IN HER TURN TO CALL MARY SHE WENT AS QUIETLY AS IF SHE WERE LEAVING A TIRESOME VISITOR +BLESSED AM I HERE NOW MY GOD AND BLESSED SHALL I BE THERE THEN +JUST THEN HE WAS IN NO MOOD TO THINK OF THE SORROWS +A LITTLE LONGER AND SHE WAS COMPELLED TO YIELD AND THE SILENT TEARS FLOWED FREELY +IT WAS THE AFTERNOON OF A HOLIDAY AND SHE HAD CLOSED EARLY +HE LAID DOWN HIS VIOLIN AND SEATED HIMSELF WHERE MARY TOLD HIM IN HER FATHER'S ARM CHAIR BY THE FIRE +HIS VIOLIN WAS BROKEN BUT HIS BEING WAS MADE WHOLE +EARTH WAS GONE AND HEAVEN WAS ALL +ONE WINTER EVENING AS SOON AS HIS WORK WAS OVER FOR THE DAY JOSEPH LOCKED THE DOOR OF HIS SMITHY WASHED HIMSELF WELL PUT ON CLEAN CLOTHES AND TAKING HIS VIOLIN SET OUT FOR TESTBRIDGE MARY WAS EXPECTING HIM TO TEA +WHEN THOU LOVEST MAN OR WOMAN OR CHILD YEA OR EVEN DOG ARIGHT THEN WILT THOU NO LONGER NEED THAT I TELL THEE HOW GOD AND HIS CHRIST WOULD NOT BE CONTENT WITH EACH OTHER ALONE IN THE GLORIES EVEN OF THE ETERNAL ORIGINAL LOVE BECAUSE THEY COULD CREATE MORE LOVE +NOR WAS THIS EXACTLY THE SHAPE THE THING TOOK TO THE CONSCIOUSNESS OF THE MUSICIAN +LETTY TOO WAS OVERCOME MORE THAN EVER SHE HAD BEEN BY MUSIC +THE NETTLE AND THE DOCK SAID JOSEPH +BUT MY UNCLE WAS IN NO HUMOR TO WAIT +ACCUSTOMED AS I HAD BEEN TO THE STEAM FERRY BOATS OF THE ELBE I FOUND THE LONG OARS OF THE BOATMEN BUT SORRY MEANS OF LOCOMOTION +I COULD NOT HELP SMILING TO SEE HIM LOOK SO BIG ON HIS LITTLE HORSE HIS LONG LEGS NOW AND THEN TOUCHING THE GROUND MADE HIM LOOK LIKE A SIX FOOTED CENTAUR +LITTLE DID I EXPECT HOWEVER THE SPECTACLE WHICH AWAITED US WHEN WE REACHED THE PENINSULA OF SNEFFELS WHERE AGGLOMERATIONS OF 
NATURE'S RUINS FORM A KIND OF TERRIBLE CHAOS +HANS OUR EXTRAORDINARY GUIDE WENT FIRST WALKING WITH A STEADY RAPID UNVARYING STEP +GEOGRAPHERS HAVE DIVIDED IT INTO FOUR PARTS AND WE HAD TO CROSS THE SOUTHWEST QUARTER WHICH IN THE VERNACULAR IS CALLED SUDVESTR FJORDUNGR +IT CONSISTS SIMPLY OF A FEW HOUSES NOT WHAT IN ENGLAND OR GERMANY WE SHOULD CALL A HAMLET +OUR TWO HORSES WITH THE LUGGAGE FOLLOWED OF THEIR OWN ACCORD WITHOUT REQUIRING WHIP OR SPUR +HE SAYS TIDE REPLIED MY UNCLE TRANSLATING THE DANISH WORD FOR MY INFORMATION +THESE SACRED EDIFICES ARE HOWEVER VERY MUCH LIKE THESE PEOPLE WHO DO WITHOUT WATCHES AND NEVER MISS THEM +WE MAY DO SO WAS MY REPLY BUT WHAT ABOUT OUR WORTHY GUIDE +THEY VERY RARELY SUCCEED IN A GOOD SHOW OF YELLOW +SNOW TEMPEST IMPRACTICABLE ROADS ROCKS ICEBERGS NOTHING STOPS HIM +AT LENGTH THE STURDY LITTLE PONY SPREADING OUT HIS LEGS IN A STIFF AND LUDICROUS ATTITUDE GOT FROM UNDER THE PROFESSOR'S LEGS AND LEFT HIM STANDING WITH BOTH FEET ON A SEPARATE STONE LIKE THE COLOSSUS OF RHODES +MY ARMS ARE RIGHT BUT MY LEGS ARE GETTING A LITTLE STIFF +WE TOOK OUR WAY THROUGH POOR AND SPARSE MEADOWS WHICH MADE A DESPERATE EFFORT EVERY YEAR TO SHOW A LITTLE GREEN +A FEW STRAY COWS AND SHEEP WERE ONLY SEEN OCCASIONALLY +I SHOULD HAVE A VIOLENT ATTACK OF THE CRAMP IF I WERE NOT TO HAVE SOME SORT OF EXERCISE +I THOROUGHLY UNDERSTOOD AND APPRECIATED THE NECESSITY FOR WAITING BEFORE CROSSING THE FJORD FOR THAT MOMENT WHEN THE SEA AT ITS HIGHEST POINT IS IN A STATE OF SLACK WATER +TO RIDE OVER SALT WATER UPON THE BACK OF A LITTLE HORSE SEEMED TO ME ABSURD +HERE AND THERE COULD BE SEEN AN ISOLATED FARM SOME SOLITARY BUR OR ICELANDIC HOUSE BUILT OF WOOD EARTH FRAGMENTS OF LAVA LOOKING LIKE BEGGARS ON THE HIGHWAY OF LIFE +I BEGAN TO ENJOY THE EXHILARATING DELIGHT OF TRAVELING A LIFE OF DESIRE GRATIFICATION AND LIBERTY +I TOOK OCCASION TO CONSULT THE MAP TO SEE WHERE GARDAR WAS TO BE FOUND +IN ANY CASE I SHALL TRUST RATHER TO MY OWN INTELLIGENCE THAN THEIRS +CURIOUSLY ENOUGH THE BLOOD OF WABI RAN ALMOST PURE TO HIS INDIAN FOREFATHERS WHILE MINNETAKI AS SHE BECAME OLDER DEVELOPED LESS OF THE WILD BEAUTY OF HER MOTHER AND MORE OF THE SOFTER LOVELINESS OF THE WHITE RACE HER WEALTH OF SOFT JET BLACK HAIR AND HER GREAT DARK EYES CONTRASTING WITH THE LIGHTER SKIN OF HER FATHER'S BLOOD +BUT IN TIME THE END OF IT ALL CAME AND WABI WENT BACK TO THE PRINCESS MOTHER TO MINNETAKI AND TO HIS FORESTS +WHILE THE ATTACK WAS SUCCESSFUL IN A WAY ITS MAIN PURPOSE FAILED +AT LAST SO DARING DID HE BECOME THAT THE PROVINCIAL GOVERNMENT PLACED A PRICE UPON HIS HEAD AND UPON THOSE OF A NUMBER OF HIS MOST NOTORIOUS FOLLOWERS +ONE OF NEWSOME'S CHIEF PLEASURES IN LIFE HAD BEEN THE EDUCATING OF HIS WOODLAND BRIDE AND IT WAS THE AMBITION OF BOTH THAT THE LITTLE MINNETAKI AND HER BROTHER BE REARED IN THE WAYS OF WHITE CHILDREN +IT WAS AT ABOUT THIS TIME IN THEIR LIVES THAT THE WOONGAS BECAME ESPECIALLY DARING IN THEIR DEPREDATIONS +WABI ON THE OTHER HAND WAS AN INDIAN IN APPEARANCE FROM HIS MOCCASINS TO THE CROWN OF HIS HEAD SWARTHY SINEWY AS AGILE AS A LYNX AND WITH EVERY INSTINCT IN HIM CRYING FOR THE LIFE OF THE WILD +MEANWHILE TWO CHILDREN CAME TO BLESS THE HAPPY UNION OF NEWSOME AND HIS LOVELY INDIAN WIFE +THREE DAYS LATER MINNETAKI BECAME NEWSOME'S WIFE AT THE HUDSON BAY POST +BUT EACH WEEK ADDED TO HIS LONELINESS AND HIS LONGINGS FOR MINNETAKI AND HIS FORESTS +THERE WERE TEARS IN THE BOYS EYES WHEN THEY PARTED AND THE MOTHER CRIED FOR THE INDIAN BOY WHO WAS RETURNING TO HIS PEOPLE +THE OTHER WAS A GIRL 
THREE YEARS YOUNGER AND NEWSOME INSISTED THAT SHE BE CALLED MINNETAKI +ONE DARK NIGHT AT THE HEAD OF A SCORE OF HIS TRIBE HE FELL UPON WABIGOON'S CAMP HIS OBJECT BEING THE ABDUCTION OF THE PRINCESS +THERE WAS LITTLE TIME TO LOSE IN MAKING PREPARATIONS AND THE FOURTH DAY FOLLOWING THE RECEIPT OF WABI'S LETTER FOUND ROD AND HIS MOTHER WAITING FOR THE TRAIN WHICH WAS TO WHIRL THE BOY INTO HIS NEW LIFE +THE CHILDREN PROVED THEMSELVES UNUSUALLY BRIGHT PUPILS AND BY THE TIME WABI WAS SIXTEEN AND MINNETAKI TWELVE ONE WOULD NOT HAVE KNOWN FROM THEIR MANNER OF SPEECH THAT INDIAN BLOOD RAN IN THEIR VEINS +A COUNTER ATTACK WAS MADE UPON WOONGA AND HE WAS DRIVEN DEEP INTO THE WILDERNESS WITH GREAT LOSS +FROM THAT HOUR DATED ONE OF THE MOST SANGUINARY FEUDS IN THE HISTORY OF THE GREAT TRADING COMPANY A FEUD WHICH AS WE SHALL SEE WAS DESTINED TO LIVE EVEN UNTO THE SECOND GENERATION +BUT THIS POWER OF DISCERNMENT WAS DENIED THEM AND ONLY IN AFTER YEARS WITH THE LOVED ONES OF THEIR OWN FIRESIDES CLOSE ABOUT THEM WAS THE WHOLE PICTURE REVEALED +A THOUSAND PLANS WERE MADE A THOUSAND ADVENTURES PICTURED AND THE MOTHER WOULD SMILE AND LAUGH AND PLAN WITH THEM +WE SHALL MAKE MORE MONEY UP HERE THIS WINTER THAN YOU COULD EARN IN DETROIT IN THREE YEARS +ON THE TENTH OF OCTOBER HE WOULD MEET ROD AT SPRUCEWOOD ON THE BLACK STURGEON RIVER +SPRING CAME AND PASSED AND THEN SUMMER +WE WILL HUNT WOLVES THE COUNTRY IS ALIVE WITH THEM AND THE GOVERNMENT GIVES A BOUNTY OF FIFTEEN DOLLARS FOR EVERY SCALP TAKEN +THREE WEEKS LATER CAME WABIGOON'S REPLY +CONSEQUENTLY BOTH MOTHER AND FATHER BEGAN THEIR EDUCATION AT THE POST THEY WERE SENT TO THE FACTOR'S SCHOOL AND TWO WINTERS WERE PASSED IN PORT ARTHUR THAT THEY MIGHT HAVE THE ADVANTAGE OF THOROUGHLY EQUIPPED SCHOOLS +NECESSITY HAD BECOME HIS GRIM MASTER AND THE FOLLOWING WEEK HE WAS GOING TO WORK +ON THE SECOND OF THE MONTH AT TWO IN THE MORNING OUR PRECIOUS CARGO OF LUGGAGE WAS TAKEN ON BOARD THE GOOD SHIP VALKYRIE +THE MEN APPEARED ROBUST BUT HEAVY FAIR HAIRED LIKE GERMANS BUT OF PENSIVE MIEN EXILES OF A HIGHER SCALE IN THE LADDER OF HUMANITY THAN THE ESKIMOS BUT I THOUGHT MUCH MORE UNHAPPY SINCE WITH SUPERIOR PERCEPTIONS THEY ARE COMPELLED TO LIVE WITHIN THE LIMITS OF THE POLAR CIRCLE +THE PROFESSOR KNEW WHOM HE HAD TO DEAL WITH +NEARLY THE WHOLE POPULATION OF THE TOWN WAS ON FOOT TO SEE US LAND +VERY LIKELY I MAY FIND THERE SOME MANUSCRIPTS FROM THE HAND OF SAKNUSSEMM +IN THE MEANTIME THERE IS NOT AN HOUR TO LOSE +HE WAS HOWEVER BUT A CIVIL SERVANT A MAGISTRATE THE GOVERNOR OF THE ISLAND BARON TRAMPE +AT ALL EVENTS WE SHALL GET THERE SOME DAY +ON THE ELEVENTH DAY WE SIGHTED CAPE PORTLAND OVER WHICH TOWERED MOUNT MYRDALS YOKUL WHICH THE WEATHER BEING CLEAR WE MADE OUT VERY READILY +THE FACT WAS THAT SCARCELY ANY ONE OF THEM BUT EXPECTED SOME GOODS BY THE PERIODICAL VESSEL +THOUGH NOT VERY LARGE IT APPEARED NOT LIKELY TO BE FILLED FOR CENTURIES +MY UNCLE WAS DELIGHTED FOR MYSELF MOODY AND DISSATISFIED I APPEARED ALMOST TO EXPECT A GLIMPSE OF THE GHOST OF HAMLET +NO MISTER HARDWIGG SAID THE CAPTAIN NO FEAR OF THAT +THEY WERE NOW HOWEVER ABSENT ON DUTY +I SAW BUT FEW INHABITANTS DURING MY EXCURSION BUT I MET A CROWD ON THE BEACH DRYING SALTING AND LOADING CODFISH THE PRINCIPAL ARTICLE OF EXPORTATION +WELL AND HAVE WE A FAIR WIND +ON ALL SIDES WERE TO BE SEEN WHOLE SCHOOLS OF WHALES AND SHARKS +BUT NO GHOST OR ANYTHING ELSE APPEARED UPON THE ANCIENT WALLS +NOW HARRY SAID MY UNCLE RUBBING HIS HANDS AN GOES WELL THE WORSE DIFFICULTY IS NOW OVER +WHEN THEREFORE HE ADDRESSED 
HIMSELF TO ME IN THE LANGUAGE OF HORACE WE AT ONCE CAME TO UNDERSTAND ONE ANOTHER +THANKS TO THE HEAT OF THESE RESIDENCES GRASS GROWS ON THE ROOF WHICH GRASS IS CAREFULLY CUT FOR HAY +I SHALL BE GLAD TO CONSULT THEM +THE VALKYRIE KEPT OFF THE COAST STEERING TO THE WESTWARD +THEN WITHOUT FURTHER REMARK HE PUT HIS FINGER TO HIS LIPS FROWNED DARKLY AND DESCENDED INTO THE SMALL BOAT WHICH AWAITED US +THIS MODEST SCHOLAR SPOKE NO LANGUAGES SAVE ICELANDIC AND LATIN +THE FACT IS THE CASTLE IS MUCH LATER THAN THE TIME OF THE HEROIC PRINCE OF DENMARK +NOT BE IT EVER REMEMBERED THAT THE SLIGHTEST SUSPICION OF IMMORALITY ATTACHES EITHER TO THE HEROINE OF THIS BOOK OR TO THE LEADING PHILOSOPHERS OF HER SCHOOL FOR SEVERAL CENTURIES +TO SYNESIUS'S MOST CHARMING LETTERS AS WELL AS TO THOSE OF ISIDORE THE GOOD ABBOT OF PELUSIUM I BEG LEAVE TO REFER THOSE READERS WHO WISH FOR FURTHER INFORMATION ABOUT THE PRIVATE LIFE OF THE FIFTH CENTURY +THAT WONDERFUL METAPHYSIC SUBTLETY WHICH IN PHRASES AND DEFINITIONS TOO OFTEN UNMEANING TO OUR GROSSER INTELLECT SAW THE SYMBOLS OF THE MOST IMPORTANT SPIRITUAL REALITIES AND FELT THAT ON THE DISTINCTION BETWEEN HOMOOUSIOS AND HOMOIOUSIOS MIGHT HANG THE SOLUTION OF THE WHOLE PROBLEM OF HUMANITY WAS SET TO BATTLE IN ALEXANDRIA THE ANCIENT STRONGHOLD OF GREEK PHILOSOPHY WITH THE EFFETE REMAINS OF THE VERY SCIENTIFIC THOUGHT TO WHICH IT OWED ITS EXTRAORDINARY CULTURE +THE HUNS SINGLY THEIR INFERIORS PRESSED THEM FROM BEHIND WITH THE IRRESISTIBLE WEIGHT OF NUMBERS ITALY WITH HER RICH CITIES AND FERTILE LOWLANDS BECKONED THEM ON TO PLUNDER AS AUXILIARIES THEY HAD LEARNED THEIR OWN STRENGTH AND ROMAN WEAKNESS A CASUS BELLI WAS SOON FOUND +THE VERY EMPERORS HAD ARRAYED THEMSELVES ON HER SIDE +I CANNOT HOPE THAT THESE PAGES WILL BE ALTOGETHER FREE FROM ANACHRONISMS AND ERRORS +HOW INIQUITOUS WAS THE CONDUCT OF THE SONS OF THEODOSIUS IN REFUSING THE USUAL BOUNTY BY WHICH THE GOTHS WERE BRIBED NOT TO ATTACK THE EMPIRE THE WHOLE PENT UP DELUGE BURST OVER THE PLAINS OF ITALY AND THE WESTERN EMPIRE BECAME FROM THAT DAY FORTH A DYING IDIOT WHILE THE NEW INVADERS DIVIDED EUROPE AMONG THEMSELVES +THE COUNTLESS TREASURES WHICH FIVE CENTURIES OF RAPINE HAD ACCUMULATED ROUND THE CAPITOL HAD BECOME THE PREY OF MEN CLOTHED IN SHEEPSKINS AND HORSE HIDE AND THE SISTER OF AN EMPEROR HAD FOUND HER BEAUTY VIRTUE AND PRIDE OF RACE WORTHILY MATCHED BY THOSE OF THE HARD HANDED NORTHERN HERO WHO LED HER AWAY FROM ITALY AS HIS CAPTIVE AND HIS BRIDE TO FOUND NEW KINGDOMS IN SOUTH FRANCE AND SPAIN AND TO DRIVE THE NEWLY ARRIVED VANDALS ACROSS THE STRAITS OF GIBRALTAR INTO THE THEN BLOOMING COAST LAND OF NORTHERN AFRICA +ONE WHO WRITES OF SUCH AN ERA LABOURS UNDER A TROUBLESOME DISADVANTAGE +IN THE PRESENT CASE THAT DISADVANTAGE IS DOUBLED FOR WHILE THE SINS OF THE CHURCH HOWEVER HEINOUS WERE STILL SUCH AS ADMIT OF BEING EXPRESSED IN WORDS THE SINS OF THE HEATHEN WORLD AGAINST WHICH SHE FOUGHT WERE UTTERLY INDESCRIBABLE AND THE CHRISTIAN APOLOGIST IS THUS COMPELLED FOR THE SAKE OF DECENCY TO STATE THE CHURCH'S CASE FAR MORE WEAKLY THAN THE FACTS DESERVE +AND THE NEW BLOOD AT THE ERA OF THIS STORY WAS AT HAND +THAT EXTRAORDINARY REFORM IN MORALS WHICH ACCORDING TO SALVIAN AND HIS CONTEMPORARIES THE VANDAL CONQUERORS WORKED IN NORTH AFRICA AVAILED THEM NOTHING THEY LOST MORE THAN THEY GAVE +BUT IF THE EMPERORS HAD BECOME CHRISTIAN THE EMPIRE HAD NOT +TRIBE AFTER TRIBE WAS CROWDING DOWN TO THE ALPS AND TRAMPLING UPON EACH OTHER ON THE FRONTIERS OF THE EMPIRE +THE MENS SANA MUST HAVE A CORPUS SANUM TO 
INHABIT +IN THE MEANWHILE THE MINDS OF MEN CUT ADRIFT FROM THEIR ANCIENT MOORINGS WANDERED WILDLY OVER PATHLESS SEAS OF SPECULATIVE DOUBT AND ESPECIALLY IN THE MORE METAPHYSICAL AND CONTEMPLATIVE EAST ATTEMPTED TO SOLVE FOR THEMSELVES THE QUESTIONS OF MAN'S RELATION TO THE UNSEEN BY THOSE THOUSAND SCHISMS HERESIES AND THEOSOPHIES IT IS A DISGRACE TO THE WORD PHILOSOPHY TO CALL THEM BY IT ON THE RECORDS OF WHICH THE STUDENT NOW GAZES BEWILDERED UNABLE ALIKE TO COUNT OR TO EXPLAIN THEIR FANTASIES +THAT DIVINE WORD WHO IS THE LIGHT WHO LIGHTETH EVERY MAN WHICH COMETH INTO THE WORLD HAD AWAKENED IN THE HEART OF MANKIND A MORAL CRAVING NEVER BEFORE FELT IN ANY STRENGTH EXCEPT BY A FEW ISOLATED PHILOSOPHERS OR PROPHETS +BUT THE HEALTH OF A CHURCH DEPENDS NOT MERELY ON THE CREED WHICH IT PROFESSES NOT EVEN ON THE WISDOM AND HOLINESS OF A FEW GREAT ECCLESIASTICS BUT ON THE FAITH AND VIRTUE OF ITS INDIVIDUAL MEMBERS +JULIAN'S LAST ATTEMPT TO RESTORE PAGANISM BY IMPERIAL INFLUENCE HAD ONLY PROVED THAT THE OLD FAITH HAD LOST ALL HOLD UPON THE HEARTS OF THE MASSES AT HIS DEATH THE GREAT TIDE WAVE OF NEW OPINION ROLLED ON UNCHECKED AND THE RULERS OF EARTH WERE FAIN TO SWIM WITH THE STREAM TO ACCEPT IN WORDS AT LEAST THE CHURCH'S LAWS AS THEIRS TO ACKNOWLEDGE A KING OF KINGS TO WHOM EVEN THEY OWED HOMAGE AND OBEDIENCE AND TO CALL THEIR OWN SLAVES THEIR POORER BRETHREN AND OFTEN TOO THEIR SPIRITUAL SUPERIORS +THEY BROUGHT BEFORE THE MINDS OF CHURCHMEN A THOUSAND NEW QUESTIONS WHICH MUST BE SOLVED UNLESS THE CHURCH WAS TO RELINQUISH FOR EVER HER CLAIMS AS THE GREAT TEACHER AND SATISFIER OF THE HUMAN SOUL +AY SHE ANSWERED HALF BITTERLY AND WOULD THAT WE COULD LIVE WITHOUT FOOD AND IMITATE PERFECTLY THE IMMORTAL GODS +THERE IS FRUIT WITH LENTILS AND RICE WAITING FOR YOU IN THE NEXT ROOM AND BREAD UNLESS YOU DESPISE IT TOO MUCH +NOT THAT SUCH A CREATURE AS THAT DISTURBS ME NO CREATED THING I HOPE CAN MOVE MY EQUANIMITY BUT IF I COULD STOOP TO HATE I SHOULD HATE HER HATE HER +HIS EXCELLENCY MADAM THE PREFECT +AND WHY SHOULD THAT DISTURB ME LET HIM ENTER +SHE HAS LIFTED HER EYES OFF HER MANUSCRIPT SHE IS LOOKING OUT WITH KINDLING COUNTENANCE OVER THE GARDENS OF THE MUSEUM HER RIPE CURLING GREEK LIPS SUCH AS WE NEVER SEE NOW EVEN AMONG HER OWN WIVES AND SISTERS OPEN +IF THEY HAVE CAST OFF THE VULGAR HERD THEY HAVE NOT CAST OFF HYPATIA +WHAT DO I CARE FOR FOOD +AND HER VOICE TOOK A TONE WHICH MADE IT SOMEWHAT UNCERTAIN WHETHER IN SPITE OF ALL THE LOFTY IMPASSIBILITY WHICH SHE FELT BOUND TO POSSESS SHE DID NOT HATE PELAGIA WITH A MOST HUMAN AND MUNDANE HATRED +IF THEY HAVE CEASED TO GUIDE NATIONS THEY HAVE NOT CEASED TO SPEAK TO THEIR OWN ELECT +STRANGE THAT MEN SHOULD BE CONTENT TO GROVEL AND BE MEN WHEN THEY MIGHT RISE TO THE RANK OF GODS +I TO BELIEVE AGAINST THE AUTHORITY OF PORPHYRY HIMSELF TOO IN EVIL EYES AND MAGIC +THE PLACE SEEMED FRAGRANT WITH ALL THE RICHES OF GREEK THOUGHT AND SONG SINCE THE DAYS WHEN PTOLEMY PHILADELPHUS WALKED THERE WITH EUCLID AND THEOCRITUS CALLIMACHUS AND LYCOPHRON +BUT MOST PROBABLY HAD ANY OF US ENTERED THAT ROOM THAT MORNING WE SHOULD NOT HAVE BEEN ABLE TO SPARE A LOOK EITHER FOR THE FURNITURE OR THE GENERAL EFFECT OR THE MUSEUM GARDENS OR THE SPARKLING MEDITERRANEAN BEYOND BUT WE SHOULD HAVE AGREED THAT THE ROOM WAS QUITE RICH ENOUGH FOR HUMAN EYES FOR THE SAKE OF ONE TREASURE WHICH IT POSSESSED AND BESIDE WHICH NOTHING WAS WORTH A MOMENT'S GLANCE +THE ROOM HAD NEITHER CARPET NOR FIREPLACE AND THE ONLY MOVABLES IN IT WERE A SOFA BED A TABLE AND AN ARM CHAIR ALL OF SUCH 
DELICATE AND GRACEFUL FORMS AS MAY BE SEEN ON ANCIENT VASES OF A FAR EARLIER PERIOD THAN THAT WHEREOF WE WRITE +TO BE WELCOMED INTO THE CELESTIAL RANKS OF THE HEROIC TO RISE TO THE IMMORTAL GODS TO THE INEFFABLE POWERS ONWARD UPWARD EVER THROUGH AGES AND THROUGH ETERNITIES TILL I FIND MY HOME AT LAST AND VANISH IN THE GLORY OF THE NAMELESS AND THE ABSOLUTE ONE +HOW CAN HE WHOSE SPHERE LIES ABOVE THE STARS STOOP EVERY MOMENT TO EARTH +THE OPERATION WHATEVER IT HAD BEEN WHICH HAD DEPRIVED HIS FEATURES OF HARMONY AND PUT ALL THEIR FLESH INTO DISORDER HAD HAD NO EFFECT ON THE BONY STRUCTURE OF HIS HEAD +BESIDES WE MUST REMEMBER THAT THEY HAD IN THOSE TIMES MEANS OF PUTTING PATIENTS TO SLEEP AND OF SUPPRESSING ALL SUFFERING ONLY THEN IT WAS CALLED MAGIC WHILE NOW IT IS CALLED ANAESTHESIA +HIS HAIR HAVING PROBABLY BEEN DYED WITH SOME CORROSIVE PREPARATION HAD LEFT IT WOOLLY AND ROUGH TO THE TOUCH +BESIDES THIS FACE THOSE WHO HAD BROUGHT HIM UP HAD GIVEN HIM THE RESOURCES OF A GYMNAST AND AN ATHLETE +GWYNPLAINE HAD YELLOW HAIR +THE JOYOUS CONVULSION OF LAUGHTER WAS AS A TRIBUTE PAID THEY SUBMITTED TO IT GLADLY BUT ALMOST MECHANICALLY +THE OUTSIDE DID NOT DEPEND ON THE INTERIOR +ALL HIS EMOTIONS WHATEVER THEY MIGHT HAVE BEEN AUGMENTED HIS STRANGE FACE OF JOY OR TO SPEAK MORE CORRECTLY AGGRAVATED IT +WITH THIS EXCEPTION GWYNPLAINE'S LAUGH WAS EVERLASTING +IT SEEMED EVIDENT THAT A MYSTERIOUS AND PROBABLY OCCULT SCIENCE WHICH WAS TO SURGERY WHAT ALCHEMY WAS TO CHEMISTRY HAD CHISELLED HIS FLESH EVIDENTLY AT A VERY TENDER AGE AND MANUFACTURED HIS COUNTENANCE WITH PREMEDITATION +HAD GWYNPLAINE WHEN A CHILD BEEN SO WORTHY OF ATTENTION THAT HIS FACE HAD BEEN SUBJECTED TO TRANSMUTATION WHY NOT +ACCORDING TO ALL APPEARANCE INDUSTRIOUS MANIPULATORS OF CHILDREN HAD WORKED UPON HIS FACE +GWYNPLAINE WAS A MOUNTEBANK +BUT IS LAUGHTER A SYNONYM OF JOY +SUCH PERFECT COMPLETENESS IS NOT IN NATURE +AN EVERLASTING LAUGH +HE SHOWED HIMSELF ON THE PLATFORM +THE WHOLE OF EXISTENCE RESEMBLES A LETTER MODIFIED IN THE POSTSCRIPT +THE MANICHAEANS BELIEVED THE ABSOLUTE OCCASIONALLY GIVES WAY AND THAT GOD HIMSELF SOMETIMES ABDICATES FOR A TIME SO ALSO OF THE WILL +NO ONE COULD ESCAPE FROM THIS RICTUS +IT WAS GWYNPLAINE'S LAUGH WHICH CREATED THE LAUGHTER OF OTHERS YET HE DID NOT LAUGH HIMSELF +ITS YELLOW BRISTLES RATHER A MANE THAN A HEAD OF HAIR COVERED AND CONCEALED A LOFTY BROW EVIDENTLY MADE TO CONTAIN THOUGHT +PHOEBE COOKED VENUS SCRUBBED THE TEMPLE +WHAT TRUE THINGS ARE TOLD IN STORIES +UNKNOWN PEOPLE HAD WORKED UPON HIS FACE HE ON THE OTHER HAND HAD WORKED ON HIS MIND AND BEHIND THIS WELL EXECUTED MASK HE HAD PLACED ALL THAT HE COULD OF THOUGHT +URSUS WAS IN EVERYTHING IN THE PIECE IN THE COMPANY IN THE KITCHEN IN THE ORCHESTRA +THE CARAVAN WAS DIVIDED INTO THREE COMPARTMENTS PARTITIONED FROM EACH OTHER +THIS WAS THE OLD ESTABLISHMENT OF URSUS ITS PROPORTIONS AUGMENTED BY SUCCESS AND IMPROVED FROM A WRETCHED BOOTH INTO A THEATRE +THIS GREEN COLOUR HAD SUCCEEDED IN DRAWING ATTENTION TO THE CARRIAGE WHICH WAS KNOWN IN ALL THE FAIR GROUNDS AS THE GREEN BOX +URSUS WAS THE POET OF THESE MAGICAL REPRESENTATIONS HE WROTE THE PIECES +THIS HUT IN A CORNER AT THE BACK TO THE RIGHT OF THE DOOR SERVED AS BEDCHAMBER AND DRESSING ROOM TO URSUS AND GWYNPLAINE +THE ASTONISHMENT WITH WHICH THE VILLAGERS REGARDED THIS MACHINE WAS OVERWHELMING +ON THE ROOF FROM A TUBE PAINTED GREEN LIKE THE REST SMOKE AROSE +IN GWYNPLAINE EVIL THOUGHTS NEVER RIPENED AND HE HAD THEREFORE NO REMORSE +THE CURIOSITY OF ONE PLACE EXHAUSTED THEY PASSED ON TO 
ANOTHER +SOME BELIEVED IT TO BE NATURAL OTHERS DECLARED IT TO BE ARTIFICIAL AND AS CONJECTURE WAS ADDED TO REALITY EVERYWHERE AT EVERY CROSS ROAD ON THE JOURNEY IN ALL THE GROUNDS OF FAIRS AND FETES THE CROWD RAN AFTER GWYNPLAINE +THEN I LOOK PERHAPS LIKE WHAT I AM +FOR THESE READ FIBI AND VINOS THAT WE MAY CONFORM TO ENGLISH PRONUNCIATION +FROM SIXTEEN EIGHTY TO SEVENTEEN O FOUR A GREAT CHANGE HAD TAKEN PLACE +THE WHEELS WERE ALL OF THE SAME SIZE AND HIGH AS WAGON WHEELS +A LOFT UNDER THE ARCH OF THE ROOF CONTAINED THE SCENES AND ON OPENING A TRAP DOOR LAMPS APPEARED PRODUCING WONDERS OF LIGHT +THIS OPENING LOOKED FOR ALL THE WORLD LIKE A MOUTH OF HELL IN THE WORDS OF THE ITINERANT PURITAN PREACHERS WHO TURNED AWAY FROM IT WITH HORROR +THE EFFECT OF HIS APPEARANCE HAD BEEN SURPRISING +URSUS AND HOMO TOOK CHARGE OF EACH OTHER +WHAT WAS THIS NOTHING +THIS FORTUNE HAD ALLOWED URSUS WHO WAS THE ADMINISTRATOR OF GWYNPLAINE'S SUCCESS TO HAVE THE CHARIOT OF HIS DREAMS CONSTRUCTED THAT IS TO SAY A CARAVAN LARGE ENOUGH TO CARRY A THEATRE AND TO SOW SCIENCE AND ART IN THE HIGHWAYS +IT WAS ESTABLISHED AT SOUTHWARK +BY THE SIDE OF THE DOOR WAS CONSTRUCTED OFF HAND BY MEANS OF AN EMPTY BARREL A BOX FOR THE MONEY TAKER WHO WAS SOMETIMES FIBI AND SOMETIMES VINOS +HE DID NOT COME EVERY EVENING BUT WHEN HE CAME HE LED THE PUBLIC APPLAUSE GREW INTO ACCLAMATION SUCCESS ROSE NOT TO THE ROOF FOR THERE WAS NONE BUT TO THE CLOUDS FOR THERE WERE PLENTY OF THEM +ONE EVENING URSUS WAS IN THE SIDE SCENE WHICH WAS THE KITCHEN DOOR OF THE GREEN BOX SEEING MASTER NICLESS STANDING BY HIM SHOWED HIM THIS MAN IN THE CROWD AND ASKED HIM +HE ENTERED HEAVEN ONLY BY THE ARTISTS DOOR +WHICH CLOUDS SEEING THAT THERE WAS NO ROOF SOMETIMES WEPT OVER THE MASTERPIECE OF URSUS +ALL SOUTHWARK RAN IN CROWDS TO ADMIRE THE LAUGHING MAN +EVEN THIS COMEDIAN OF JAWS AND CLAWS WAS ECLIPSED IN SUCCESS +WHAT A PITY THAT HE SHOULD NOT BE A LORD +THAT SUCCESS WAS PRODIGIOUS STILL IT REMAINED LOCAL +AT EVERY PERFORMANCE THE YARD OF THE INN TRANSFORMED INTO A PIT WAS FILLED WITH A RAGGED AND ENTHUSIASTIC AUDIENCE +IT TOOK A HUNDRED AND THIRTY YEARS FOR THE NAME OF SHAKESPEARE TO PENETRATE FROM ENGLAND INTO FRANCE +WE ARE IN LONDON SAID URSUS WE MUST BE PREPARED FOR THE GENTRY +THE MERRY ANDREWS AND MOUNTEBANKS OF TARRINZEAU FIELD WERE AGHAST AT GWYNPLAINE +THE EMPTYING OF TANKARDS DID NOT DECREASE THEIR SUCCESS +SAINT PAUL IS A SAINT ONLY WITH EXTENUATING CIRCUMSTANCES +HIS ENTHUSIASM CAUSED URSUS TO REMARK THIS MAN AND GWYNPLAINE TO OBSERVE HIM +IT MIGHT HAVE BEEN ORDERED FOR THE GREEN BOX +THE PLACARD GWYNPLAINE THE LAUGHING MAN TAKEN FROM ITS NAIL IN THE GREEN BOX WAS HUNG UP CLOSE TO THE SIGN OF THE INN +HE WOULD MAKE A FAMOUS SCOUNDREL +GWYNPLAINE ATE UP THEIR PUBLIC +WITH THAT EXCEPTION THEIR SUCCESS BECAME SO GREAT THAT NO MOUNTEBANK MEMORY COULD RECALL ITS PARALLEL +IT WAS A THEATRE READY MADE +AT THAT HOUR THERE WAS NO ONE IN THE FAIR GROUND EXCEPT PERHAPS SOME REELING DRUNKARD MAKING STAGGERING SHADOWS IN DARK CORNERS +AGAINST THIS WALL WAS PLACED THE GREEN BOX WHICH THEY WERE ABLE TO DRAW INTO THE YARD OWING TO THE HEIGHT OF THE GATE +THEY BEGAN THEIR PERFORMANCES +THE GLORY OF GWYNPLAINE HAD NOT PASSED LONDON BRIDGE +THESE WERE REMARKABLE TALENTS +THIS CONNOISSEUR WAS SUDDENLY FASCINATED AND HAD ADOPTED THE LAUGHING MAN +THEY HAD A GREAT FRIEND IN THIS UNKNOWN VISITOR +BESIDES THE SMALL FRY THE SWALLOWERS OF SWORDS AND THE GRIMACE MAKERS REAL PERFORMANCES TOOK PLACE ON THE GREEN +BESIDES THIS HE HARANGUED LIKE CICERO AS WE HAVE JUST 
SEEN SOLD HIS DRUGS ATTENDED SICKNESS AND EVEN HEALED THE SICK +THE DOME OF SAINT PAUL'S WAS A DELIGHT TO URSUS +YET THERE ARE A FEW PRIVATE ROOMS WHICH CONTAIN A TABLE SURROUNDED WITH BENCHES IN WHICH A RESPECTABLE FAMILY OR A FEW FRIENDS CAN ENJOY THEMSELVES IN A DECENT WAY +NEVER FEAR YOU SHALL SEE HIM AGAIN TO MORROW +NOT VERY LONG I ANSWERED AND I WILL TEACH YOU AS YOU WISH ALTHOUGH THE HERMIT ASSURED ME THAT I WOULD DIE SUDDENLY WITHIN THREE DAYS IF I COMMUNICATED MY SCIENCE TO ANYONE BUT I HAVE NO FAITH WHATEVER IN THAT PREDICTION +THEY ALL ASKED ME HOW LONG I WOULD REQUIRE TO TEACH THEM THE RULES OF MY SUBLIME CALCULUS +UNWILLING TO HURT HIS VANITY BY TELLING HIM THAT HE WAS MISTAKEN I TOOK THE WILD RESOLUTION OF INFORMING HIM IN THE PRESENCE OF HIS TWO FRIENDS THAT I POSSESSED A CERTAIN NUMERAL CALCULUS WHICH GAVE ANSWERS ALSO IN NUMBERS TO ANY QUESTIONS I LIKED TO PUT +THEY DID NOT KNOW WHO I WAS AND DID NOT LIKE TO ASK ME WHILST I THOUGHT IT BETTER TO PRESERVE A MODEST SILENCE +WE WOULD VERY OFTEN SPEND THE WHOLE NIGHT RAMBLING ABOUT THE CITY INVENTING AND CARRYING INTO EXECUTION THE MOST IMPERTINENT PRACTICAL JOKES +IF THE QUESTION WAS SO OBSCURE THAT I COULD NOT MAKE OUT THE SENSE OF IT IT WAS NATURAL THAT I SHOULD NOT UNDERSTAND THE ANSWER +WHERE IS MY HUSBAND +I DECLARED MYSELF QUITE WILLING FOR IT WAS NECESSARY TO BRAZEN IT OUT AFTER HAVING VENTURED AS FAR AS I HAD DONE +I FELT THAT IN MY FIRST PROFESSION AS I WAS NOT BLESSED WITH THE VOCATION NECESSARY TO IT I SHOULD HAVE SUCCEEDED ONLY BY DINT OF HYPOCRISY AND I SHOULD HAVE BEEN DESPICABLE IN MY OWN ESTIMATION EVEN IF I HAD SEEN THE PURPLE MANTLE ON MY SHOULDERS FOR THE GREATEST DIGNITIES CANNOT SILENCE A MAN'S OWN CONSCIENCE +BUT ALTHOUGH BELIEVING FULLY IN MY ORACLES THEY WERE TOO KIND HEARTED TO THINK THEM THE WORK OF THE DEVIL AND IT SUITED THEIR NATURAL GOODNESS BETTER TO BELIEVE MY ANSWERS INSPIRED BY SOME HEAVENLY SPIRIT +OUR SCANDALOUS PROCEEDINGS OFTEN EXPOSED US TO THE GREATEST DANGER +THEY BELIEVED THAT THROUGH ME THEY POSSESSED THE PHILOSOPHER'S STONE THE UNIVERSAL PANACEA THE INTERCOURSE WITH ALL THE ELEMENTARY HEAVENLY AND INFERNAL SPIRITS THEY HAD NO DOUBT WHATEVER THAT THANKS TO MY SUBLIME SCIENCE THEY COULD FIND OUT THE SECRETS OF EVERY GOVERNMENT IN EUROPE +YOUR APARTMENT IS READY YOU MAY SEND YOUR CLOTHES YOU SHALL HAVE A SERVANT A GONDOLA AT YOUR ORDERS MY OWN TABLE AND TEN SEQUINS A MONTH +I TOLD HIM AND HE INSISTED UPON MY COMING WITH HIM IN THE GONDOLA SAYING THAT HE WOULD LEAVE ME AT MY HOUSE +WHENEVER WE COULD CONTRIVE TO GET INTO A CHURCH TOWER WE THOUGHT IT GREAT FUN TO FRIGHTEN ALL THE PARISH BY RINGING THE ALARM BELL AS IF SOME FIRE HAD BROKEN OUT BUT THAT WAS NOT ALL WE ALWAYS CUT THE BELL ROPES SO THAT IN THE MORNING THE CHURCHWARDENS HAD NO MEANS OF SUMMONING THE FAITHFUL TO EARLY MASS +I JUMPED OUT OF THE GONDOLA AND FOUND MYSELF ON THE VERY SPOT WHERE THREE YEARS BEFORE I HAD TAUGHT RAZETTA SUCH A FORCIBLE LESSON I ENQUIRED FOR A SURGEON AT THE FIRST COFFEE HOUSE AND RAN TO THE HOUSE THAT WAS POINTED OUT TO ME +WE DID THE SAME WITH PHYSICIANS WHOM WE OFTEN SENT HALF DRESSED TO SOME NOBLEMAN WHO WAS ENJOYING EXCELLENT HEALTH +BESIDES I FOUND IT VERY FLATTERING TO MY VANITY TO BECOME THE SUBJECT OF THE SPECULATIVE CHATTERING OF EMPTY FOOLS WHO HAVING NOTHING ELSE TO DO ARE ALWAYS TRYING TO FIND OUT THE CAUSE OF EVERY MORAL PHENOMENON THEY MEET WITH WHICH THEIR NARROW INTELLECT CANNOT UNDERSTAND +HE ENTREATED ME TO TELL HIM THE TRUTH +WHAT EXTRAORDINARY THINGS WILL SOMETIMES OCCUR FROM MERE 
CHANCE OR FROM THE FORCE OF CIRCUMSTANCES +I PICKED IT UP AND COMING UP TO HIM JUST AS HE WAS GOING DOWN THE STEPS I HANDED IT TO HIM +THINKING I HAD A RIGHT TO WATCH THE SICK MAN I SETTLED MYSELF NEAR HIS BED TO GIVE HIM EVERY CARE HE REQUIRED +WE TOOK OUR THREE PRISONERS TO A LARGE BOAT +TAKING EVERYTHING UPON MYSELF I ORDERED A SERVANT TO HURRY OUT FOR A PHYSICIAN WHO CAME IN A SHORT TIME AND ORDERED THE PATIENT TO BE BLED AGAIN THUS APPROVING THE FIRST BLEEDING PRESCRIBED BY ME +WITH AN EDUCATION WHICH OUGHT TO HAVE ENSURED ME AN HONOURABLE STANDING IN THE WORLD WITH SOME INTELLIGENCE WIT GOOD LITERARY AND SCIENTIFIC KNOWLEDGE AND ENDOWED WITH THOSE ACCIDENTAL PHYSICAL QUALITIES WHICH ARE SUCH A GOOD PASSPORT INTO SOCIETY I FOUND MYSELF AT THE AGE OF TWENTY THE MEAN FOLLOWER OF A SUBLIME ART IN WHICH IF GREAT TALENT IS RIGHTLY ADMIRED MEDIOCRITY IS AS RIGHTLY DESPISED +HE HAD GAMBLED AND LOST A GREAT DEAL AND HIS BROTHER WAS HIS MOST BITTER ENEMY BECAUSE HE WAS INFATUATED WITH THE IDEA THAT HE HAD TRIED TO POISON HIM +I WAS COMPELLED BY POVERTY TO BECOME A MEMBER OF A MUSICAL BAND IN WHICH I COULD EXPECT NEITHER ESTEEM NOR CONSIDERATION AND I WAS WELL AWARE THAT I SHOULD BE THE LAUGHING STOCK OF THE PERSONS WHO HAD KNOWN ME AS A DOCTOR IN DIVINITY AS AN ECCLESIASTIC AND AS AN OFFICER IN THE ARMY AND HAD WELCOMED ME IN THE HIGHEST SOCIETY +I THREW MYSELF AT HIS FEET TO ASSURE HIM OF MY GRATITUDE AND EMBRACED HIM CALLING HIM MY FATHER +IT WENT ON TO SAY THAT THE TWO MEN WHO HAD CARRIED HER OFF HAD TAKEN HER TO SUCH A PLACE WHERE THEY HAD AN HOUR LATER BEEN MET BY THE OTHER SIX AND THAT THEY HAD ALL REPAIRED TO THE TWO SWORDS WHERE THEY HAD SPENT AN HOUR IN DRINKING +THE THREE FRIENDS WERE ASTOUNDED +MY READERS MAY IMAGINE WHETHER WE FELT INCLINED TO LAUGH WHEN THE CHARMING CREATURE BADE US GOOD NIGHT THANKING US ALL WITH PERFECT GOOD FAITH +WHOEVER YOU MAY BE I AM INDEBTED TO YOU FOR MY LIFE +THERE WAS NO COWARDLY TRAITOR AMONGST US ALTHOUGH WE WERE ALL POOR BUT FEAR HAD ITS EFFECT AND OUR NOCTURNAL PRANKS WERE NOT RENEWED +BESIDES I WAS OF OPINION THAT A MAN'S PROFESSION WHATEVER IT MIGHT BE OUGHT TO SUPPLY HIM WITH ENOUGH MONEY TO SATISFY ALL HIS WANTS AND THE VERY POOR PAY OF AN OFFICER WOULD NEVER HAVE BEEN SUFFICIENT TO COVER MY EXPENSES BECAUSE MY EDUCATION HAD GIVEN ME GREATER WANTS THAN THOSE OF OFFICERS IN GENERAL +YOU NEED NOT THINK OF THE FUTURE THINK ONLY OF ENJOYING YOURSELF AND TAKE ME AS YOUR ADVISER IN EVERYTHING THAT MAY HAPPEN TO YOU IN EVERYTHING YOU MAY WISH TO UNDERTAKE AND YOU MAY BE CERTAIN OF ALWAYS FINDING ME YOUR FRIEND +THIS IS THE AMUSING ADVENTURE WHICH CLOSED OUR EXPLOITS +I OBEYED IMPLICITLY AND MET YOUR EXCELLENCY +I RUBBED IT WITH ALL MY STRENGTH BUT HE TOLD ME IN A SORT OF INDISTINCT WHISPER THAT THE NUMBNESS WAS SPREADING ALL ALONG THE LEFT SIDE AND THAT HE WAS DYING +IN EVERY ONE OF THE SEVENTY TWO PARISHES OF THE CITY OF VENICE THERE IS A LARGE PUBLIC HOUSE CALLED MAGAZZINO +THE WAITER OF THE MAGAZZINO CAME TO BE PAID AND OUR CHIEF GAVE HIM WHAT WAS DUE ENJOINING SILENCE UNDER PENALTY OF DEATH +AS FOR THE EUCHARIST TRANSUBSTANTIATION THE REAL PRESENCE IT WAS ALL NO MYSTERY TO THEM BUT PALPABLE EVIDENCE AND YET THEY WERE NOT JESUITS +THEY WERE NOT ONLY GOOD CHRISTIANS AND FAITHFUL TO THE CHURCH BUT EVEN REAL DEVOTEES AND FULL OF SCRUPLES +DELIGHTED WITH SUCH A FORTUNATE RESULT WE LAY DOWN AGAIN +THE PHYSICIAN WHO ATTENDED HIM WAS NAMED TERRO HE THOUGHT BY SOME PECULIAR TRAIN OF REASONING THAT HE COULD CURE HIM BY APPLYING A MERCURIAL OINTMENT TO THE 
CHEST TO WHICH NO ONE RAISED ANY OBJECTION +TWO DAYS AFTERWARDS OUR NOCTURNAL ORGY BEGAN TO BE TALKED OF +HE WROTE THE QUESTION AND GAVE IT TO ME I READ IT I COULD NOT UNDERSTAND EITHER THE SUBJECT OR THE MEANING OF THE WORDS BUT IT DID NOT MATTER I HAD TO GIVE AN ANSWER +I MIGHT BE TOLD THAT IF I HAD WISHED TO FOLLOW THE RULES OF PURE MORALITY I OUGHT EITHER TO HAVE DECLINED INTIMATE INTERCOURSE WITH THEM OR TO HAVE UNDECEIVED THEM +BUT TO HAVE A FRIEND AND TO BE TRUE UNDER ANY AND ALL TRIALS IS THE MARK OF A MAN +BEFORE THIS CALAMITY CAME UPON US YOU COULD NOT FIND ANYWHERE A HAPPIER HOME THAN THAT CREATED BY THE INDIAN WOMAN +THIS WILD MOTHER HAS NOT ONLY THE EXPERIENCE OF HER MOTHER AND GRANDMOTHER AND THE ACCEPTED RULES OF HER PEOPLE FOR A GUIDE BUT SHE HUMBLY SEEKS TO LEARN A LESSON FROM ANTS BEES SPIDERS BEAVERS AND BADGERS +HE HAD NEITHER A NATIONAL ARMY NOR AN ORGANIZED CHURCH +HER ATTITUDE AND SECRET MEDITATIONS MUST BE SUCH AS TO INSTILL INTO THE RECEPTIVE SOUL OF THE UNBORN CHILD THE LOVE OF THE GREAT MYSTERY AND A SENSE OF BROTHERHOOD WITH ALL CREATION +INDEED THE DISTINCTIVE WORK OF BOTH GRANDPARENTS IS THAT OF ACQUAINTING THE YOUTH WITH THE NATIONAL TRADITIONS AND BELIEFS +OUR HONOR IS THE GUARANTEE FOR HIS SAFETY SO LONG AS HE IS WITHIN THE CAMP +WHEN SHE FELL THE WHOLE RACE FELL WITH HER +HE CUTS OFF THE CHOICEST MORSEL OF THE MEAT AND CASTS IT INTO THE FIRE THE PUREST AND MOST ETHEREAL ELEMENT +THE FAMILY WAS NOT ONLY THE SOCIAL UNIT BUT ALSO THE UNIT OF GOVERNMENT +LOVE BETWEEN MAN AND WOMAN IS FOUNDED ON THE MATING INSTINCT AND IS NOT FREE FROM DESIRE AND SELF SEEKING +WHEN HE BECOMES AN OLD MAN HE LOVES TO MAKE A NOTABLE EFFORT TO PROVE HIS GRATITUDE +IN DUE TIME THE CHILD TAKES OF HIS OWN ACCORD THE ATTITUDE OF PRAYER AND SPEAKS REVERENTLY OF THE POWERS +THE ORDEAL IS BEST MET ALONE WHERE NO CURIOUS OR PITYING EYES EMBARRASS HER WHERE ALL NATURE SAYS TO HER SPIRIT TIS LOVE TIS LOVE THE FULFILLING OF LIFE +HIS DAILY DEVOTIONS WERE MORE NECESSARY TO HIM THAN DAILY FOOD +WHENEVER IN THE COURSE OF THE DAILY HUNT THE RED HUNTER COMES UPON A SCENE THAT IS STRIKINGLY BEAUTIFUL OR SUBLIME A BLACK THUNDERCLOUD WITH THE RAINBOW'S GLOWING ARCH ABOVE THE MOUNTAIN A WHITE WATERFALL IN THE HEART OF A GREEN GORGE A VAST PRAIRIE TINGED WITH THE BLOOD RED OF SUNSET HE PAUSES FOR AN INSTANT IN THE ATTITUDE OF WORSHIP +THIS BOND IS BETWEEN MAN AND MAN IS USUALLY FORMED IN EARLY YOUTH AND CAN ONLY BE BROKEN BY DEATH +THE REMOTER DEGREES OF KINSHIP WERE FULLY RECOGNIZED AND THAT NOT AS A MATTER OF FORM ONLY FIRST COUSINS WERE KNOWN AS BROTHERS AND SISTERS THE NAME OF COUSIN CONSTITUTED A BINDING CLAIM AND OUR RIGID MORALITY FORBADE MARRIAGE BETWEEN COUSINS IN ANY KNOWN DEGREE OR IN OTHER WORDS WITHIN THE CLAN +THE HOSPITALITY OF THE WIGWAM IS ONLY LIMITED BY THE INSTITUTION OF WAR +AT ANOTHER TIME WHEN I WAS FOURTEEN YEARS OLD WE HAD JUST LEFT FORT ELLIS ON THE ASSINIBOINE RIVER AND MY YOUNGEST UNCLE HAD SELECTED A FINE SPOT FOR OUR NIGHT CAMP +AS A SPECIAL MARK OF RESPECT THE BODY OF A YOUNG WOMAN OR A WARRIOR WAS SOMETIMES LAID OUT IN STATE IN A NEW TEEPEE WITH THE USUAL HOUSEHOLD ARTICLES AND EVEN WITH A DISH OF FOOD LEFT BESIDE IT NOT THAT THEY SUPPOSED THE SPIRIT COULD USE THE IMPLEMENTS OR EAT THE FOOD BUT MERELY AS A LAST TRIBUTE +MANY OF THE INDIANS BELIEVED THAT ONE MAY BE BORN MORE THAN ONCE AND THERE WERE SOME WHO CLAIMED TO HAVE FULL KNOWLEDGE OF A FORMER INCARNATION +GIVING THEMSELVES UP WHOLLY TO THEIR GRIEF THEY ARE NO LONGER CONCERNED ABOUT ANY EARTHLY POSSESSION AND OFTEN 
GIVE AWAY ALL THAT THEY HAVE TO THE FIRST COMERS EVEN TO THEIR BEDS AND THEIR HOME +THIS WAS CARRIED OUT TO THE LETTER +IF A MAN WERE SLAIN IN BATTLE IT WAS AN OLD CUSTOM TO PLACE HIS BODY AGAINST A TREE OR ROCK IN A SITTING POSITION ALWAYS FACING THE ENEMY TO INDICATE HIS UNDAUNTED DEFIANCE AND BRAVERY EVEN IN DEATH +THERE ARE MANY TRUSTWORTHY MEN AND MEN OF CHRISTIAN FAITH TO VOUCH FOR THESE AND SIMILAR EVENTS OCCURRING AS FORETOLD +FIVE YEARS LATER HE REPEATED THE SERVICE AND AGAIN SAVED HIS PEOPLE FROM AWFUL SLAUGHTER +THIS WAS ONLY ONE OF HIS REMARKABLE PROPHECIES +AT THE AGE OF ABOUT SEVENTY FIVE YEARS HE SAVED HIS BAND FROM UTTER DESTRUCTION AT THE HANDS OF THEIR ANCESTRAL ENEMIES BY SUDDENLY GIVING WARNING RECEIVED IN A DREAM OF THE APPROACH OF A LARGE WAR PARTY +REINCARNATION AND THE CONVERSE OF SPIRITS +AT EVERY MEAL TIME A DISH OF FOOD WAS PLACED UNDER IT AND SOME PERSON OF THE SAME SEX AND AGE AS THE ONE WHO WAS GONE MUST AFTERWARD BE INVITED IN TO PARTAKE OF THE FOOD +ANOTHER FAMOUS MEDICINE MAN WAS BORN ON THE RUM RIVER ABOUT ONE HUNDRED AND FIFTY YEARS AGO AND LIVED TO BE OVER A CENTURY OLD +IT WAS PREPARED BY DRESSING IN THE FINEST CLOTHES TOGETHER WITH SOME PERSONAL POSSESSIONS AND ORNAMENTS WRAPPED IN SEVERAL ROBES AND FINALLY IN A SECURE COVERING OF RAW HIDE +AT THE END OF A YEAR FROM THE TIME OF DEATH THE RELATIVES MADE A PUBLIC FEAST AND GAVE AWAY THE CLOTHING AND OTHER GIFTS WHILE THE LOCK OF HAIR WAS INTERRED WITH APPROPRIATE CEREMONIES +THEREFORE HE COURTS DEATH IN BATTLE ON THE OTHER HAND HE WOULD REGARD IT AS DISGRACEFUL TO BE KILLED IN A PRIVATE QUARREL +THERE WAS A WELL KNOWN SIOUX WAR PROPHET WHO LIVED IN THE MIDDLE OF THE LAST CENTURY SO THAT HE IS STILL REMEMBERED BY THE OLD MEN OF HIS BAND +NO DOUBT MANY PREDICTIONS HAVE BEEN COLORED TO SUIT THE NEW AGE AND UNQUESTIONABLY FALSE PROPHETS FAKIRS AND CONJURERS HAVE BECOME THE PEST OF THE TRIBES DURING THE TRANSITION PERIOD +IT IS WELL KNOWN THAT THE AMERICAN INDIAN HAD SOMEHOW DEVELOPED OCCULT POWER AND ALTHOUGH IN THE LATTER DAYS THERE HAVE BEEN MANY IMPOSTORS AND ALLOWING FOR THE VANITY AND WEAKNESS OF HUMAN NATURE IT IS FAIR TO ASSUME THAT THERE MUST HAVE BEEN SOME EVEN IN THE OLD DAYS YET THERE ARE WELL ATTESTED INSTANCES OF REMARKABLE PROPHECIES AND OTHER MYSTIC PRACTICE +THE MEN BLACKEN THEIR FACES AND WIDOWS OR BEREAVED PARENTS SOMETIMES GASH THEIR ARMS AND LEGS TILL THEY ARE COVERED WITH BLOOD +TO THE UNTUTORED SAGE THE CONCENTRATION OF POPULATION WAS THE PROLIFIC MOTHER OF ALL EVILS MORAL NO LESS THAN PHYSICAL +IN HIS OWN THOUGHT HE ROSE SUPERIOR TO THEM HE SCORNED THEM EVEN AS A LOFTY SPIRIT ABSORBED IN ITS STERN TASK REJECTS THE SOFT BEDS THE LUXURIOUS FOOD THE PLEASURE WORSHIPING DALLIANCE OF A RICH NEIGHBOR +WHEN HE RETURNED TO THE CAMP HE MUST REMAIN AT A DISTANCE UNTIL HE HAD AGAIN ENTERED THE VAPOR BATH AND PREPARED HIMSELF FOR INTERCOURSE WITH HIS FELLOWS +IT WAS SILENT BECAUSE ALL SPEECH IS OF NECESSITY FEEBLE AND IMPERFECT THEREFORE THE SOULS OF MY ANCESTORS ASCENDED TO GOD IN WORDLESS ADORATION +THE ORIGINAL ATTITUDE OF THE AMERICAN INDIAN TOWARD THE ETERNAL THE GREAT MYSTERY THAT SURROUNDS AND EMBRACES US WAS AS SIMPLE AS IT WAS EXALTED +THE FIRST BAMBEDAY OR RELIGIOUS RETREAT MARKED AN EPOCH IN THE LIFE OF THE YOUTH WHICH MAY BE COMPARED TO THAT OF CONFIRMATION OR CONVERSION IN CHRISTIAN EXPERIENCE +IT WAS NOT THEN WHOLLY FROM IGNORANCE OR IMPROVIDENCE THAT HE FAILED TO ESTABLISH PERMANENT TOWNS AND TO DEVELOP A MATERIAL CIVILIZATION +FROM THE SUN AS THE UNIVERSAL FATHER PROCEEDS THE 
QUICKENING PRINCIPLE IN NATURE AND IN THE PATIENT AND FRUITFUL WOMB OF OUR MOTHER THE EARTH ARE HIDDEN EMBRYOS OF PLANTS AND MEN +NOTHING OF THE MARVELOUS COULD ASTONISH HIM AS THAT A BEAST SHOULD SPEAK OR THE SUN STAND STILL +KNOWING THAT GOD SETS NO VALUE UPON MATERIAL THINGS HE TOOK WITH HIM NO OFFERINGS OR SACRIFICES OTHER THAN SYMBOLIC OBJECTS SUCH AS PAINTS AND TOBACCO +IN THIS TYPE OF PRAYER THERE WAS NO BESEECHING OF FAVOR OR HELP +THIS IS THE MATERIAL OR PHYSICAL PRAYER +THAT SOLITARY COMMUNION WITH THE UNSEEN WHICH WAS THE HIGHEST EXPRESSION OF OUR RELIGIOUS LIFE IS PARTLY DESCRIBED IN THE WORD BAMBEDAY LITERALLY MYSTERIOUS FEELING WHICH HAS BEEN VARIOUSLY TRANSLATED FASTING AND DREAMING +THE HISTORIANS OF THE WHITE RACE ADMIT THAT THE INDIAN WAS NEVER THE FIRST TO REPUDIATE HIS OATH +AT THE SOLEMN HOUR OF SUNRISE OR SUNSET HE TOOK UP HIS POSITION OVERLOOKING THE GLORIES OF EARTH AND FACING THE GREAT MYSTERY AND THERE HE REMAINED NAKED ERECT SILENT AND MOTIONLESS EXPOSED TO THE ELEMENTS AND FORCES OF HIS ARMING FOR A NIGHT AND A DAY TO TWO DAYS AND NIGHTS BUT RARELY LONGER +NONE MIGHT EXHORT OR CONFESS OR IN ANY WAY MEDDLE WITH THE RELIGIOUS EXPERIENCE OF ANOTHER +THE SAVAGE PHILOSOPHER THE DUAL MIND +AMONG US ALL MEN WERE CREATED SONS OF GOD AND STOOD ERECT AS CONSCIOUS OF THEIR DIVINITY +HERE IS THE SUPREME MYSTERY THAT IS THE ESSENCE OF WORSHIP WITHOUT WHICH THERE CAN BE NO RELIGION AND IN THE PRESENCE OF THIS MYSTERY OUR ATTITUDE CANNOT BE VERY UNLIKE THAT OF THE NATURAL PHILOSOPHER WHO BEHOLDS WITH AWE THE DIVINE IN ALL CREATION +WHO MAY CONDEMN HIS SUPERSTITION +THEIR FRIENDS DID THEIR BEST TO AMUSE THEM +THEIR MINDS WERE SO DISTRACTED AT THIS CHANGE OF ROUTE AS TO BE QUITE UNHINGED +WILL HALLEY IS A BRUTE BUT I AM KEEPING MY EYES OPEN AND IF THE COAST LOOKS DANGEROUS I WILL PUT THE SHIP'S HEAD TO SEA AGAIN +SO THAT ON THAT SCORE THERE IS LITTLE OR NO DANGER +FORTUNATELY WILL HALLEY WAS NOT A MAN IN A HURRY AND DID NOT USE A PRESS OF CANVAS OR HIS MASTS WOULD INEVITABLY HAVE COME DOWN +THINK OF LADY GLENARVAN THINK OF MARY GRANT +YES MY LORD WE SHOULD TRY IN VAIN +BUT AS TO GETTING ALONGSIDE THE DUNCAN GOD FORBID +WHAT THEN MY LORD +WE WOULD FIGHT TO THE DEATH OF COURSE BUT AFTER THAT +JOHN MANGLES THEREFORE HOPED THAT THE WRETCHED HULL WOULD REACH PORT WITHOUT ACCIDENT BUT IT GRIEVED HIM THAT HIS COMPANIONS SHOULD HAVE TO SUFFER SO MUCH DISCOMFORT FROM THE DEFECTIVE ARRANGEMENTS OF THE BRIG +HIS EYES WANDERED CEASELESSLY OVER THE BLANK HORIZON +WE COULD NOT EVEN FLY FLY JOHN +GOD KEEP US FROM SUCH A MEETING WHY JOHN +MUCH AS THEY HAD BEEN INTERESTED IN HIS DISSERTATION ON THE PAMPAS OR AUSTRALIA HIS LECTURES ON NEW ZEALAND FELL ON COLD AND INDIFFERENT EARS +DON'T YOU SEE HOW MANY USES WE HAVE FOUND FOR THIS REFUSE COAL TAR +LOOK A LITTLE CLOSER WHILE OUR GUIDE LETS THE LIGHT OF HIS LAMP FALL UPON THE BLACK WALL AT YOUR SIDE +WHEN YOUR HANDS OR LIPS ARE CRACKED AND ROUGH FROM THE COLD DOES YOUR MOTHER EVER PUT ON GLYCERIN TO HEAL THEM +FERNS AND PALMS MOSSES AND TREES AND ANIMALS ALL PERFECT ALL BEAUTIFUL AND YET ALL HIDDEN AWAY UNDER THIS HILL AND TURNED INTO SHINING BLACK COAL NOW I CAN VERY WELL REMEMBER WHEN I FIRST SAW A COAL FIRE AND HOW ODD IT LOOKED TO SEE WHAT SEEMED TO BE BURNING STONES +WHY DID HE GIVE THAT SO ODD A SHAPE OR SO STRANGE A COVERING +ONCE THERE WAS A FATHER WHO THOUGHT HE WOULD BUILD FOR HIS CHILDREN A BEAUTIFUL HOME PUTTING INTO IT EVERY THING THEY COULD NEED OR DESIRE THROUGHOUT THEIR LIVES +FOR WHEN I WAS A LITTLE GIRL WE ALWAYS HAD LOGS OF WOOD 
BLAZING IN AN OPEN FIREPLACE AND SO DID MANY OTHER PEOPLE AND COAL WAS JUST COMING INTO USE FOR FUEL +SEE BENEATH YOUR FEET IS THE MARKING OF GREAT TREE TRUNKS LYING ASLANT ACROSS THE FLOOR AND THE FORMS OF GIGANTIC PALM LEAVES STREWED AMONG THEM +THE SWEETEST PERFUMES FLOATED THROUGH THE AIR WHILE THOUSANDS OF BIRDS ANSWERED THE MUSIC OF FOUNTAINS WITH THEIR SONGS +THEN THE HILLS WERE PILED UP ON TOP OF IT ALL BUT HERE AND THERE SOME EDGE OF A COAL BED WAS TILTED UP AND APPEARED ABOVE THE GROUND +THESE FORESTS WERE OF TREES DIFFERENT IN SOME WAYS FROM THOSE WE HAVE NOW GREAT FERNS AS TALL AS THIS HOUSE AND MOSSES AS HIGH AS LITTLE TREES AND PALM LEAVES OF ENORMOUS SIZE +IT WAS ONLY A TROUBLE TO THE GAS MAKERS WHO HAD NO USE FOR IT AND EVEN THREW IT AWAY UNTIL SOME ONE MORE THOUGHTFUL THAN THE OTHERS FOUND OUT THAT WATER WOULD NOT PASS THROUGH IT +AND SO THROUGH MANY QUESTIONS AND MANY EXPERIMENTS THEY LEARN AT LAST HOW TO USE THE CONTENTS OF THIS ONE STOREHOUSE +THE ENTRANCE IS LIGHT BECAUSE IT OPENS SO WIDE BUT WE CAN SEE THAT THE FLOOR SLOPES DOWNWARD AND THE WAY LOOKS DARK AND NARROW BEFORE US +HERE IS SOMETHING DIFFERENT ROUNDED LIKE A NUT SHELL YOU CAN SPLIT OFF ONE SIDE AND BEHOLD THERE IS THE NUT LYING SNUGLY AS DOES ANY CHESTNUT IN ITS BUR +WHAT SHOULD WE HAVE DONE IF EVERYBODY HAD KEPT ON BURNING WOOD TO THIS DAY +WALK DOWN THE SLOPING FOOT PATH NOW AND BE CAREFUL TO KEEP OUT OF THE WAY OF THE LITTLE CARS THAT ARE COMING AND GOING ON EACH SIDE OF YOU LOADED ON ONE SIDE AND EMPTY ON THE OTHER AND SEEMING TO RUN UP AND DOWN BY THEMSELVES +BUT BY AND BY THE WISE MEN THOUGHT ABOUT IT AND SAID TO THEMSELVES WE MUST FIND OUT WHAT USEFUL PURPOSE GOD MADE THE GAS FOR WE KNOW THAT HE DOES NOT MAKE ANY THING FOR HARM ONLY +ON THAT SIDE DESCENT WAS IMPOSSIBLE AND HAD IT BEEN POSSIBLE THE BOTTOM WAS SHUT IN BY THE ENORMOUS ROCK +WHAT COULD BE THE OBJECT +WATCH THE SAVAGES OUTSIDE SAID ROBERT +THE MEAL ENDED +LISTEN SAID HE MOTIONING THEM TO STOOP +IF IT IS DECREED THAT WE DIE TO MORROW LET US DIE BRAVELY LIKE CHRISTIAN MEN READY TO APPEAR WITHOUT TERROR BEFORE THE SUPREME JUDGE +JOHN YOU HAVE PROMISED MARY WHAT I PROMISED LADY HELENA WHAT IS YOUR PLAN +SLEEP WHICH KEEPS ALL SORROW IN ABEYANCE SOON WEIGHED DOWN THEIR EYELIDS THEY SLEPT IN EACH OTHER'S ARMS OVERCOME BY EXHAUSTION AND PROLONGED WATCHING +JOHN MANGLES INSERTING THE BLADE OF HIS PONIARD AVOIDED THE KNIFE WHICH NOW PROTRUDED ABOVE THE SOIL BUT SEIZED THE HAND THAT WIELDED IT +THEY HAD ONE NIGHT IN WHICH TO PREPARE FOR DEATH +THEIR FINGERS BLED BUT STILL THEY WORKED ON AFTER HALF AN HOUR THEY HAD GONE THREE FEET DEEP THEY PERCEIVED BY THE INCREASED SHARPNESS OF THE SOUNDS THAT ONLY A THIN LAYER OF EARTH PREVENTED IMMEDIATE COMMUNICATION +GOD WHO READS OUR HEARTS KNOWS THAT WE HAD A NOBLE END IN VIEW +DID THEY KNOW OF THE EXISTENCE OF THE PRISONERS OR WAS IT SOME PRIVATE ENTERPRISE THAT LED TO THE UNDERTAKING +THEY WERE NOT TO LEAVE IT AGAIN TILL THE TOPS OF THE WAHITI RANGES WERE LIT WITH THE FIRST FIRES OF DAY +MY CHILD MY CHILD MURMURED LADY HELENA THE SAVAGES DID NOT KILL YOU +I BELIEVE SAID JOHN THAT IN THE SIGHT OF GOD I HAVE A RIGHT TO FULFILL THAT PROMISE +BUT SOFTLY AS THE NAME WAS BREATHED MARY GRANT ALREADY AWAKENED BY THE SOUNDS IN THE HUT SLIPPED OVER TOWARD GLENARVAN AND SEIZING THE HAND ALL STAINED WITH EARTH SHE COVERED IT WITH KISSES +WILSON AND OLBINETT JOINED THEIR COMPANIONS AND ALL UNITED TO DIG THROUGH THE WALL JOHN WITH HIS DAGGER THE OTHERS WITH STONES TAKEN FROM THE GROUND OR WITH THEIR NAILS WHILE MULRADY 
STRETCHED ALONG THE GROUND WATCHED THE NATIVE GUARD THROUGH A CREVICE OF THE MATTING +AT LAST THE MAJOR SAID MY FRIENDS KEEP THAT TO THE LAST MOMENT +MY LORD WHICHEVER OF US SURVIVES THE OTHER WILL FULFILL THE WISH OF LADY HELENA AND MARY GRANT +GLENARVAN'S VOICE FIRM TILL NOW FALTERED +ANIMAL OR MAN ANSWERED THE MAJOR I WILL SOON FIND OUT +ROUND HIS BODY WAS ROLLED A LONG COIL OF FLAX ROPE +THE JAILER MAY FORGET THAT HE IS ON GUARD THE PRISONER NEVER FORGETS THAT HE IS GUARDED +CHAPTER THIRTY THREE A CONFIDANT +MISTER MORTON THEN MADE A CAREFUL MEMORANDUM OF THE VARIOUS PARTICULARS OF WAVERLEY'S INTERVIEW WITH DONALD BEAN LEAN AND THE OTHER CIRCUMSTANCES WHICH HE HAD COMMUNICATED +WHEN I WAS A YOUNG MAN LIKE YOU MISTER WAVERLEY ANY SUCH HAIR BRAINED EXPEDITION I BEG YOUR PARDON FOR THE EXPRESSION WOULD HAVE HAD INEXPRESSIBLE CHARMS FOR ME +EVIL TO HIM THAT THINKS OTHERWISE SAID MISTER MORTON OR WHO HOLDS CHURCH GOVERNMENT AND CEREMONIES AS THE EXCLUSIVE GAGE OF CHRISTIAN FAITH OR MORAL VIRTUE +SINCE THAT TIME THEIR NUMBERS HAVE GRADUALLY DIMINISHED BUT A GOOD MANY ARE STILL TO BE FOUND IN THE WESTERN COUNTIES AND SEVERAL WITH A BETTER TEMPER THAN IN SEVENTEEN O SEVEN HAVE NOW TAKEN ARMS FOR GOVERNMENT +HE HAD NEITHER SYMPATHY WITH MY INNOCENCE NOR WITH MY WRETCHEDNESS AND THE PETRIFYING ACCURACY WITH WHICH HE ATTENDED TO EVERY FORM OF CIVILITY WHILE HE TORTURED ME BY HIS QUESTIONS HIS SUSPICIONS AND HIS INFERENCES WAS AS TORMENTING AS THE RACKS OF THE INQUISITION +HE CERTAINLY POSSESSES TALENTS BEYOND THE RUDE SPHERE IN WHICH HE MOVES AND BEING NEITHER DESTITUTE OF AMBITION NOR ENCUMBERED WITH SCRUPLES HE WILL PROBABLY ATTEMPT BY EVERY MEANS TO DISTINGUISH HIMSELF DURING THE PERIOD OF THESE UNHAPPY COMMOTIONS +MISTER MORTON REPLIED THAT FAR FROM MAKING ANY CLAIM UPON HIS GOOD OPINION HIS ONLY WISH AND THE SOLE PURPOSE OF HIS VISIT WAS TO FIND OUT THE MEANS OF DESERVING IT +THEY HELD CONVENTICLES IN THE OPEN FIELDS AND BEING TREATED WITH GREAT VIOLENCE AND CRUELTY BY THE SCOTTISH GOVERNMENT MORE THAN ONCE TOOK ARMS DURING THOSE REIGNS +IT WAS ONE OF THOSE EFFECTS WHICH A PAINTER LOVES TO REPRESENT AND MINGLED WELL WITH THE STRUGGLING LIGHT WHICH FOUND ITS WAY BETWEEN THE BOUGHS OF THE SHADY ARCH THAT VAULTED THE BROAD GREEN ALLEY +THE HOUSE WHICH SEEMED TO CONSIST OF TWO OR THREE HIGH NARROW AND STEEP ROOFED BUILDINGS PROJECTING FROM EACH OTHER AT RIGHT ANGLES FORMED ONE SIDE OF THE INCLOSURE +THE EVIL AND REMEDY SUCH AS IT IS STILL EXIST BUT THIS IS REMOTE FROM OUR PRESENT PURPOSE AND IS ONLY THROWN OUT FOR CONSIDERATION OF THE COLLECTORS UNDER MISTER DENT'S DOG BILL +IT WAS ABOUT NOON WHEN CAPTAIN WAVERLEY ENTERED THE STRAGGLING VILLAGE OR RATHER HAMLET OF TULLY VEOLAN CLOSE TO WHICH WAS SITUATED THE MANSION OF THE PROPRIETOR +EVERYTHING AROUND APPEARED SOLITARY AND WOULD HAVE BEEN SILENT BUT FOR THE CONTINUED PLASHING OF THE FOUNTAIN AND THE WHOLE SCENE STILL MAINTAINED THE MONASTIC ILLUSION WHICH THE FANCY OF WAVERLEY HAD CONJURED UP +IT HAD BEEN BUILT AT A PERIOD WHEN CASTLES WERE NO LONGER NECESSARY AND WHEN THE SCOTTISH ARCHITECTS HAD NOT YET ACQUIRED THE ART OF DESIGNING A DOMESTIC RESIDENCE +NEITHER DID THE FRONT INDICATE ABSOLUTE SECURITY FROM DANGER +THE HOUSES SEEMED MISERABLE IN THE EXTREME ESPECIALLY TO AN EYE ACCUSTOMED TO THE SMILING NEATNESS OF ENGLISH COTTAGES +YET THE PHYSIOGNOMY OF THE PEOPLE WHEN MORE CLOSELY EXAMINED WAS FAR FROM EXHIBITING THE INDIFFERENCE OF STUPIDITY THEIR FEATURES WERE ROUGH BUT REMARKABLY INTELLIGENT GRAVE BUT THE VERY REVERSE OF STUPID AND FROM 
AMONG THE YOUNG WOMEN AN ARTIST MIGHT HAVE CHOSEN MORE THAN ONE MODEL WHOSE FEATURES AND FORM RESEMBLED THOSE OF MINERVA +THIS WORK OF ART WAS THE WONDER OF THE COUNTRY TEN MILES ROUND +THE COURT WAS SPACIOUS WELL PAVED AND PERFECTLY CLEAN THERE BEING PROBABLY ANOTHER ENTRANCE BEHIND THE STABLES FOR REMOVING THE LITTER +THIS AVENUE WAS STRAIGHT AND OF MODERATE LENGTH RUNNING BETWEEN A DOUBLE ROW OF VERY ANCIENT HORSE CHESTNUTS PLANTED ALTERNATELY WITH SYCAMORES WHICH ROSE TO SUCH HUGE HEIGHT AND NOURISHED SO LUXURIANTLY THAT THEIR BOUGHS COMPLETELY OVER ARCHED THE BROAD ROAD BENEATH +OCCASIONALLY INDEED WHEN SUCH A CONSUMMATION SEEMED INEVITABLE A WATCHFUL OLD GRANDAM WITH HER CLOSE CAP DISTAFF AND SPINDLE RUSHED LIKE A SIBYL IN FRENZY OUT OF ONE OF THESE MISERABLE CELLS DASHED INTO THE MIDDLE OF THE PATH AND SNATCHING UP HER OWN CHARGE FROM AMONG THE SUNBURNT LOITERERS SALUTED HIM WITH A SOUND CUFF AND TRANSPORTED HIM BACK TO HIS DUNGEON THE LITTLE WHITE HEADED VARLET SCREAMING ALL THE WHILE FROM THE VERY TOP OF HIS LUNGS A SHRILLY TREBLE TO THE GROWLING REMONSTRANCES OF THE ENRAGED MATRON +STABLES AND OTHER OFFICES OCCUPIED ANOTHER SIDE OF THE SQUARE +TWO BATTLEMENTED WALLS ONE OF WHICH FACED THE AVENUE AND THE OTHER DIVIDED THE COURT FROM THE GARDEN COMPLETED THE INCLOSURE +HER COMPLEXION WAS NOT A DECIDED PINK BUT A SOFT ROSY TINT NOT MUCH DEEPER THAN THAT OF TROT'S SKIN +WHAT IS IT CORALIE SHE ASKED THE WOMAN +WE DO NOT HATE YOU AS YOU SAY THE BLUESKINS DO NOR ARE WE SAVAGE OR CRUEL BUT WE DO NOT WANT YOU HERE AND I AM REALLY PUZZLED WHAT TO DO WITH YOU +THESE INTRUDERS ARE VERY PECULIAR PEOPLE REMARKED A MAN IN THE CROWD +YOU ARE NOT LIKE MY PEOPLE THE PINKIES AND THERE IS NO PLACE FOR YOU IN OUR COUNTRY +WHAT THAT LITTLE CABIN +A MISFORTUNE OF BIRTH PLACED ME HERE AND I CANNOT ESCAPE MY FATE +IN ALL OUR HISTORY YOU ARE THE FIRST PEOPLE FROM OUTSIDE OUR BORDERS WHO HAVE EVER STEPPED A FOOT IN OUR LAND +IF I LIVED AS LUXURIOUSLY AS MY PEOPLE DO AND HAD SERVANTS AND COSTLY GOWNS THE GOOD PINKIES WOULD SAY THAT THEIR QUEEN HAD MORE THAN THEY THEMSELVES AND IT WOULD BE TRUE +NO OUR WAY IS BEST +SHE SMILED A LITTLE SADLY AT TROT SEEMED TO APPROVE BUTTON BRIGHT'S OPEN FRANK FACE AND WAS QUITE SURPRISED BECAUSE CAP'N BILL WAS SO MUCH BIGGER THAN HER OWN PEOPLE +EVEN IN AMERICA EVER'BODY BOWS LOW TO OUR PRESIDENT AN THE BLUESKINS ARE SO FRAID O THEIR BOOLOOROO THAT THEY TREMBLE WHENEVER THEY GO NEAR HIM +THE PEOPLE MUST WAIT OUTSIDE FOR THERE IS NO ROOM FOR THEM IN THE PALACE +IN THAT CASE SAID BUTTON BRIGHT YOU'RE ENTITLED TO THE BEST THERE IS TO PAY FOR YOUR TROUBLE +BUT SURELY THAT IS ALL WRONG SAID TOURMALINE GRAVELY +YES IT WAS WET AN STICKY ALL RIGHT AGREED THE SAILOR BUT THE BIG FROG HELPED US AN WE GOT THROUGH ALL RIGHT +I'LL LOOK IN THE GREAT BOOK FIRST +I HAVE ONE GREAT PRIVILEGE +EXCLAIMED TROT OF COURSE +THERE IS NOTHING MAJESTIC ABOUT ME AS YOU KNOW VERY WELL +PERHAPS YOU ARE TRYING TO RIDICULE ME SHE CONTINUED REGARDING THE SAILOR'S FACE CLOSELY +THE QUEEN HAS NOTHING BUT THE POWER TO EXECUTE THE LAWS TO ADJUST GRIEVANCES AND TO COMPEL ORDER +IT HAD NO ORNAMENTATION BEING EXCEEDINGLY PLAIN IN APPEARANCE +ARE YOU A GIANT +THEY SEEM VERY IGNORANT POOR THINGS SAID ANOTHER IN REPLY +IT IS MUCH MORE DESIRABLE TO BE A PRIVATE CITIZEN HAPPY AND CARE FREE +HERE SAID ONE OF THEIR GUIDES AS THE PROCESSION HALTED BEFORE THE LITTLE STONE BUILDING IS THE PALACE OF TOURMALINE WHO IS OUR QUEEN +SHE WAS A BEAUTIFUL GIRL OF ABOUT SEVENTEEN YEARS OF AGE NOT FAT LIKE ALL THE REST OF THE PINKIES 
BUT SLENDER AND WELL FORMED ACCORDING TO OUR OWN IDEAS OF BEAUTY +THEREFORE I AM A MERE AGENT TO DIRECT THE LAWS WHICH ARE THE WILL OF THE PEOPLE AND AM ONLY A PUBLIC SERVANT OBLIGED CONSTANTLY TO GUARD THE WELFARE OF MY SUBJECTS +DID YOU SUPPOSE A PALACE WOULD BE LIKE ONE OF OUR HANDSOME RESIDENCES ASKED THE WOMAN EVIDENTLY SURPRISED +SO THEY FOLLOWED HER THROUGH THE LOW ARCHWAY AND IN A ROOM BEYOND VERY SIMPLY FURNISHED SAT A YOUNG GIRL ENGAGED IN DARNING A PAIR OF PINK STOCKINGS +THE QUEEN GAZED UPON OUR FRIENDS WITH EVIDENT INTEREST +CORALIE DO YOU CONSIDER MAJESTY A PROPER WORD TO USE WHEN ADDRESSING A QUEEN +AFTER MY DEATH A PINK MARBLE STATUE OF ME WILL BE SET UP IN THE GRAND COURT WITH THE STATUES OF THE OTHER KINGS AND QUEENS WHO HAVE RULED THIS LAND AND ALL THE PINKIES IN AGES TO COME WILL THEN HONOR ME AS HAVING BEEN A JUST AND UPRIGHT QUEEN THAT IS MY REWARD +HE WAS WISE IN HIS OWN CONCEIT +ON SUNDAY MORNING A CLEAR BEAUTIFUL AND STILL DAY THE ORDER WAS GIVEN FOR THE WHOLE ARMY TO ADVANCE AND TO ATTACK IMMEDIATELY +ON MONDAY THE TIDE WAS REVERSED +THIS WAS THE FIRST BIG BATTLE IN WHICH OUR REGIMENT HAD EVER BEEN ENGAGED +THE ROPE HOWEVER WAS STRONGER THAN THE MULE'S NO AND HE WAS FINALLY PREVAILED UPON BY THE STRENGTH OF THE ROPE TO CROSS THE CREEK +AS GLADDEN RODE BY US A COURIER RODE UP AND TOLD HIM SOMETHING +I HAD HEARD AND READ OF BATTLEFIELDS SEEN PICTURES OF BATTLEFIELDS OF HORSES AND MEN OF CANNON AND WAGONS ALL JUMBLED TOGETHER WHILE THE GROUND WAS STREWN WITH DEAD AND DYING AND WOUNDED BUT I MUST CONFESS THAT I NEVER REALIZED THE POMP AND CIRCUMSTANCE OF THE THING CALLED GLORIOUS WAR UNTIL I SAW THIS +BUT AS I SAID BEFORE READER A PRIVATE SOLDIER IS BUT AN AUTOMATON AND KNOWS NOTHING OF WHAT IS GOING ON AMONG THE GENERALS AND I AM ONLY GIVING THE CHRONICLES OF LITTLE THINGS AND EVENTS THAT CAME UNDER MY OWN OBSERVATION AS I SAW THEM THEN AND REMEMBER THEM NOW +THE FACT WAS KEPT FROM THE TROOPS +ABOUT DAYLIGHT ON SUNDAY MORNING CHALMERS BRIGADE RELIEVED GLADDEN'S +ON MY TAKING THE ROPE OFF HE SHOOK HIMSELF AND SEEMED TO SAY YOU THINK THAT YOU ARE MIGHTY SMART FOLKS BUT YOU ARE A LEETLE TOO SMART +OFFICERS COULD NOT CURB THE MEN TO KEEP IN LINE +ABOUT THE TIME HE PULLED TRIGGER A STRAY BALL FROM SOME DIRECTION STRUCK HIM IN THE SIDE AND HE FELL OFF DEAD AND HIS HORSE BECOMING FRIGHTENED GALLOPED OFF DRAGGING HIM THROUGH THE CONFEDERATE LINES +MULE DID NOT DESIRE TO CROSS WHILE I WAS TRYING TO PERSUADE HIM WITH A BIG STICK A ROCK IN HIS EAR AND A TWISTER ON HIS NOSE +SHOULD YOU DESIRE TO FIND OUT MORE ABOUT THE BATTLE I REFER YOU TO HISTORY +SO HE GOT A LARGE TWO INCH ROPE TIED ONE END AROUND THE MULE'S NECK AND THE OTHER TO THE CAISSON AND ORDERED THE DRIVER TO WHIP UP +I HAD BEEN FEELING MEAN ALL THE MORNING AS IF I HAD STOLEN A SHEEP BUT WHEN THE ORDER TO CHARGE WAS GIVEN I GOT HAPPY +THAT'S RIGHT MY BRAVE FIRST TENNESSEE GIVE EM HAIL COLUMBIA +WE HAD TO PASS OVER THE GROUND WHERE TROOPS HAD BEEN FIGHTING ALL DAY +SHILOH +I FREQUENTLY THOUGHT IT WOULD BE PLEASANT TO SPLIT THE DIFFERENCE WITH THAT MULE AND I WOULD GLADLY HAVE DONE SO IF I COULD HAVE GOTTEN ONE HALF OF HIS NO +I DO NOT PRETEND TO TELL OF WHAT COMMAND DISTINGUISHED ITSELF OF HEROES OF BLOOD AND WOUNDS OF SHRIEKS AND GROANS OF BRILLIANT CHARGES OF CANNON CAPTURED ET CETERA +WE WERE SUPPORTING AN ALABAMA BRIGADE +ON MONDAY MORNING I TOO CAPTURED ME A MULE +THE VOICE APPEARED TO BE OVERHEAD +BUT HOW TO GET HIM OUT WAS THE UNSOLVED PROBLEM +I DON'T THINK HIS GUN WAS LOADED THOUGH BECAUSE WE DID NOT HEAR THE 
BALL WHISTLE +THE POOR FELLOW STAYED IN THAT WELL ALL NIGHT +THOSE OLD SOLDIERS HAD LONG LONG AGO FORGOTTEN ABOUT THAT OLD LAW OF THE LONG GONE PAST BUT JIM HAD TREASURED IT UP IN HIS MEMORY LO THESE MANY YEARS AND HE THOUGHT IT WOULD SERVE HIM NOW AS IT HAD NO DOUBT FREQUENTLY DONE IN THE PAST +HE WALKED UP AND SAYS HELLO BOYS WHAT IS IT BOSS +WE WERE INURED TO PRIVATIONS AND HARDSHIPS HAD BEEN UPON EVERY MARCH IN EVERY BATTLE IN EVERY SKIRMISH IN EVERY ADVANCE IN EVERY RETREAT IN EVERY VICTORY IN EVERY DEFEAT +WE WALKED OVER THIS FLOATING BRIDGE AND SOON FOUND OURSELVES ON THE TENNESSEE SIDE OF TENNESSEE RIVER +WE PASSED AROUND ATLANTA CROSSED THE CHATTAHOOCHEE AND TRAVELED BACK OVER THE SAME ROUTE ON WHICH WE HAD MADE THE ARDUOUS CAMPAIGN UNDER JOE JOHNSTON +THE THIRD DAY IT WAS REPORTED THAT THE YANKEES HAD TAKEN POSITION ON THE MURFREESBORO PIKE +WE LOOKED ALL AROUND AND THOUGHT THAT THE COAST WAS CLEAR +A REGIMENT WAS SENT TO THE ATTACK IT WAS JIM'S REGIMENT +HOW EVERY PULSE DID BEAT AND LEAP AND HOW EVERY HEART DID THROB WITH EMOTIONS OF JOY WHICH SEEMED NEARLY AKIN TO HEAVEN WHEN WE RECEIVED THE GLAD INTELLIGENCE OF OUR ONWARD MARCH TOWARD THE LAND OF PROMISE AND OF OUR LOVED ONES +THEY PERSUADED ELOQUENTLY +BUT AFTER AWHILE JIM SAYS GENTLEMEN AY GANNY THE LAW +RIGHT BEFORE ME I SAW THE LONG DRY GRASS ALL BENDING TOWARD A COMMON CENTER AND I KNEW THAT IT WAS AN OLD WELL AND THAT MY COMRADE HAD FALLEN IN IT +HE HADN'T SEEN ANYTHING TO SHOOT AT BUT HE BLAZED AWAY HE LOADED AND FIRED THE SECOND TIME WHEN THEY WERE ORDERED TO RETREAT +HE WANTED TO GO BY HOME AND TELL HIS WIFE AND CHILDREN GOOD BYE AND TO GET HIS CLOTHES IT WAS NO GO +WE HAD BEEF FOR SUPPER THAT NIGHT +A YANKEE ALWAYS SAYS NAGER +YANK SAYS WHAT YOU DOING JOHNNY +ADVANCE INTO TENNESSEE +A MAN IN THE WELL +OUTSIDE OF THESE OCCASIONAL REMINDERS WE COULD SEE NO EVIDENCE OF THE DESOLATION OF THE TRACK OF AN INVADING ARMY +YOU SEE JIM KNOWED THE LAW +WE WERE NOT TWENTY YARDS OFF FROM THE YANKEES AND THEY WERE POURING THE HOT SHOT AND SHELLS RIGHT INTO OUR RANKS AND EVERY MAN WAS YELLING AT THE TOP OF HIS VOICE CEASE FIRING YOU ARE FIRING ON YOUR OWN MEN CEASE FIRING YOU ARE FIRING ON YOUR OWN MEN +I SOON FOUND OUT THAT HE HAD CAUGHT SIGHT OF THE RELIEF ON THE ROAD AND WAS AFRAID TO SHOOT I QUICKLY MADE UP MY MIND +WE KEPT FALLING BACK AND FIRING ALL DAY AND WERE RELIEVED BY ANOTHER REGIMENT ABOUT DARK WE REJOINED OUR REGIMENT +I THINK WE MUST HAVE KILLED A GOOD MANY IN THE OLD FIELD BECAUSE WE WERE FIRING ALL THE TIME AT THE SOLID LINE AS THEY ADVANCED UPON US +MY GUN WAS AT MY FEET AND ONE STEP WOULD GET IT +THE PRIVATE COULD BUT HE WAS NO GENERAL YOU SEE +WE REMAINED SEVERAL MONTHS BUT SOON WE WERE ON THE TRAMP AGAIN +I THOUGHT IT HAD BEEN TORN FROM MY SHOULDER +I MADE A QUICK GLANCE OVER MY SHOULDER AND GRABBED AT MY GUN +FROM TIME TO TIME DIFFERENT REGIMENTS WERE SENT FORWARD TO DO PICKET DUTY +HE DIVINED MY MOTIVE AND FIRED THE BALL MISSED ITS AIM +THE YANKEE PICKET LINES WERE NOT A HALF MILE OFF +I AM A VIDET YOU KNOW THE RESPONSIBILITY RESTING ON ME +WE WERE ORDERED FORWARD TO THE ATTACK +BUT I COULD NOT BEAR THE THOUGHT OF WEARING DEAD MEN'S SHOES +I LOOKED AT IT PRETTY CLOSE AND I SAID GREAT GOD +HE WAS WALKING ALONG WHEN ALL AT ONCE HE DROPPED DOWN AND DIED WITHOUT A STRUGGLE OR A GROAN +SAYS HE I WOULD NOT TRUST A SECESH ON HIS WORD OATH OR BOND MARCH I SAY +I SAW AND FELT THAT HE WAS NOT FIGHTING FOR GLORY BUT THAT HE WAS FIGHTING FOR HIS COUNTRY BECAUSE HE LOVED THAT COUNTRY AND HE WAS WILLING TO GIVE HIS LIFE 
FOR HIS COUNTRY AND THE SUCCESS OF OUR CAUSE +BAD GENERALSHIP I THOUGHT IT WAS CHRISTMAS +OAKLEY COLOR BEARER OF THE FOURTH TENNESSEE REGIMENT RAN RIGHT UP IN THE MIDST OF THE YANKEE LINE WITH HIS COLORS BEGGING HIS MEN TO FOLLOW +OUR ARMY STOPPED AT MURFREESBORO +WE WERE AT THAT TIME AT LEAST A HUNDRED YARDS IN ADVANCE OF THE BRIGADE CHEATHAM ALL THE TIME CALLING UPON THE MEN TO COME ON +WE WERE RIGHT UPON THE YANKEE LINE ON THE WILKERSON TURNPIKE +HE WAS STONE DEAD BUT I DROPPED THAT FOOT QUICK +OUR PICKETS HAD RUN IN AND REPORTED A NIGHT ATTACK +THE FEDERAL ARMY WAS CONCENTRATING AT NASHVILLE THERE WAS NO REST FOR THE WEARY +THE LEADEN HAIL STORM SWEPT THEM OFF THE FIELD THEY FELL BACK AND RE FORMED +BEFORE WE ARRIVED AT THE HOUSE WE SAW A BODY OF YANKEES APPROACHING AND AS WE STARTED TO RUN BACK THEY FIRED UPON US +LINE OF BATTLE WAS FORMED ON THE NORTH BANK OF STONE'S RIVER ON THE YANKEE SIDE +THE YANKEES MARCHED OVER THE HILL OUT OF SIGHT +I CALLED LIEUTENANT COLONEL FRIERSON'S ATTENTION TO THE YANKEES AND HE REMARKED WELL I DON'T KNOW WHETHER THEY ARE YANKEES OR NOT BUT IF THEY ARE THEY WILL COME OUT OF THERE MIGHTY QUICK +AS SOON AS EVER OF MY SECOND AGE I WAS UPON THE THRESHOLD AND CHANGED LIFE HIMSELF FROM ME HE TOOK AND GAVE TO OTHERS +BUT WITH FULL RAVISHMENT THE HOURS OF PRIME SINGING RECEIVED THEY IN THE MIDST OF LEAVES THAT EVER BORE A BURDEN TO THEIR RHYMES +BETWEEN HER STEPS AND MINE WERE NOT A HUNDRED WHEN EQUALLY THE MARGINS GAVE A TURN IN SUCH A WAY THAT TO THE EAST I FACED +THESE STANDARDS TO THE REARWARD LONGER WERE THAN WAS MY SIGHT AND AS IT SEEMED TO ME TEN PACES WERE THE OUTERMOST APART +I SAW THE LADY WHO EREWHILE APPEARED VEILED UNDERNEATH THE ANGELIC FESTIVAL DIRECT HER EYES TO ME ACROSS THE RIVER +BY HIS DEFAULT SHORT WHILE HE SOJOURNED HERE BY HIS DEFAULT TO WEEPING AND TO TOIL HE CHANGED HIS INNOCENT LAUGHTER AND SWEET PLAY +THIS EVERY OTHER SAVOUR DOTH TRANSCEND AND NOTWITHSTANDING SLAKED SO FAR MAY BE THY THIRST THAT I REVEAL TO THEE NO MORE +AS SOON AS ON MY VISION SMOTE THE POWER SUBLIME THAT HAD ALREADY PIERCED ME THROUGH ERE FROM MY BOYHOOD I HAD YET COME FORTH +YE KEEP YOUR WATCH IN THE ETERNAL DAY SO THAT NOR NIGHT NOR SLEEP CAN STEAL FROM YOU ONE STEP THE AGES MAKE UPON THEIR PATH +TO THE LEFT HAND I TURNED WITH THAT RELIANCE WITH WHICH THE LITTLE CHILD RUNS TO HIS MOTHER WHEN HE HAS FEAR OR WHEN HE IS AFFLICTED +NOT ONLY ROME WITH NO SUCH SPLENDID CAR E'ER GLADDENED AFRICANUS OR AUGUSTUS BUT POOR TO IT THAT OF THE SUN WOULD BE +AND WHEN THE CAR WAS OPPOSITE TO ME THUNDER WAS HEARD AND ALL THAT FOLK AUGUST SEEMED TO HAVE FURTHER PROGRESS INTERDICTED +CONFUSION AND DISMAY TOGETHER MINGLED FORCED SUCH A YES FROM OUT MY MOUTH THAT SIGHT WAS NEEDFUL TO THE UNDERSTANDING OF IT +TO SAY UNTO VIRGILIUS NOT A DRACHM OF BLOOD REMAINS IN ME THAT DOES NOT TREMBLE I KNOW THE TRACES OF THE ANCIENT FLAME +THE GOOD SUPREME SOLE IN ITSELF DELIGHTING CREATED MAN GOOD AND THIS GOODLY PLACE GAVE HIM AS HANSEL OF ETERNAL PEACE +THOU MAKEST ME REMEMBER WHERE AND WHAT PROSERPINA THAT MOMENT WAS WHEN LOST HER MOTHER HER AND SHE HERSELF THE SPRING +I DO NOT THINK THERE SHONE SO GREAT A LIGHT UNDER THE LIDS OF VENUS WHEN TRANSFIXED BY HER OWN SON BEYOND HIS USUAL CUSTOM +BUT BY THE LARGESS OF CELESTIAL GRACES WHICH HAVE SUCH LOFTY VAPOURS FOR THEIR RAIN THAT NEAR TO THEM OUR SIGHT APPROACHES NOT +DANTE BECAUSE VIRGILIUS HAS DEPARTED DO NOT WEEP YET DO NOT WEEP YET AWHILE FOR BY ANOTHER SWORD THOU NEED'ST MUST WEEP +AND ONE OF THEM AS IF BY HEAVEN COMMISSIONED SINGING VENI 
SPONSA DE LIBANO SHOUTED THREE TIMES AND ALL THE OTHERS AFTER +YE ARE NEW COMERS AND BECAUSE I SMILE BEGAN SHE PERADVENTURE IN THIS PLACE ELECT TO HUMAN NATURE FOR ITS NEST +THREE MAIDENS AT THE RIGHT WHEEL IN A CIRCLE CAME ONWARD DANCING ONE SO VERY RED THAT IN THE FIRE SHE HARDLY HAD BEEN NOTED +LOOK AT ME WELL IN SOOTH I'M BEATRICE +ALL WATERS THAT ON EARTH MOST LIMPID ARE WOULD SEEM TO HAVE WITHIN THEMSELVES SOME MIXTURE COMPARED WITH THAT WHICH NOTHING DOTH CONCEAL +WHENCE SHE TO ME IN THOSE DESIRES OF MINE WHICH LED THEE TO THE LOVING OF THAT GOOD BEYOND WHICH THERE IS NOTHING TO ASPIRE TO +NOW HELICON MUST NEEDS POUR FORTH FOR ME AND WITH HER CHOIR URANIA MUST ASSIST ME TO PUT IN VERSE THINGS DIFFICULT TO THINK +THE INTERVAL BETWEEN THESE FOUR CONTAINED A CHARIOT TRIUMPHAL ON TWO WHEELS WHICH BY A GRIFFIN'S NECK CAME DRAWN ALONG +NOR EVEN THUS OUR WAY CONTINUED FAR BEFORE THE LADY WHOLLY TURNED HERSELF UNTO ME SAYING BROTHER LOOK AND LISTEN +AND I BEHELD THE FLAMELETS ONWARD GO LEAVING BEHIND THEMSELVES THE AIR DEPICTED AND THEY OF TRAILING PENNONS HAD THE SEMBLANCE SO THAT IT OVERHEAD REMAINED DISTINCT WITH SEVENFOLD LISTS ALL OF THEM OF THE COLOURS WHENCE THE SUN'S BOW IS MADE AND DELIA'S GIRDLE +THEREFORE MY ANSWER IS WITH GREATER CARE THAT HE MAY HEAR ME WHO IS WEEPING YONDER SO THAT THE SIN AND DOLE BE OF ONE MEASURE +IN REAR OF ALL THE GROUP HERE TREATED OF TWO OLD MEN I BEHELD UNLIKE IN HABIT BUT LIKE IN GAIT EACH DIGNIFIED AND GRAVE +AND WHAT ALLUREMENTS OR WHAT VANTAGES UPON THE FOREHEAD OF THE OTHERS SHOWED THAT THOU SHOULDST TURN THY FOOTSTEPS UNTO THEM +THEN BACK I TURNED MY FACE TO THOSE HIGH THINGS WHICH MOVED THEMSELVES TOWARDS US SO SEDATELY THEY HAD BEEN DISTANCED BY NEW WEDDED BRIDES +MUST I LEAVE ALONE NO +MY FATHER HAS REVEALED THE CULPRIT'S NAME MY FATHER THIRSTS FOR REVENGE AS MUCH AS YOU DO YET EVEN HE CONJURES YOU AS I DO TO KEEP THIS SECRET DO YOU NOT FATHER +AND THE CRY ISSUED FROM HIS PORES IF WE MAY THUS SPEAK A CRY FRIGHTFUL IN ITS SILENCE +THE OLD MAN'S EYES REMAINED FIXED ON THE DOOR +BUT IN LESS THAN FIVE MINUTES THE STAIRCASE GROANED BENEATH AN EXTRAORDINARY WEIGHT +THE TWO DOCTORS THEREFORE ENTERED THE ROOM ALONE +I AM GOING SIR AND I DO NOT HESITATE TO SAY THAT NO PRAYERS WILL BE MORE FERVENT THAN MINE +D'AVRIGNY UNABLE TO BEAR THE SIGHT OF THIS TOUCHING EMOTION TURNED AWAY AND VILLEFORT WITHOUT SEEKING ANY FURTHER EXPLANATION AND ATTRACTED TOWARDS HIM BY THE IRRESISTIBLE MAGNETISM WHICH DRAWS US TOWARDS THOSE WHO HAVE LOVED THE PEOPLE FOR WHOM WE MOURN EXTENDED HIS HAND TOWARDS THE YOUNG MAN +GENTLEMEN HE SAID IN A HOARSE VOICE GIVE ME YOUR WORD OF HONOR THAT THIS HORRIBLE SECRET SHALL FOREVER REMAIN BURIED AMONGST OURSELVES THE TWO MEN DREW BACK +D'AVRIGNY RUSHED TOWARDS THE OLD MAN AND MADE HIM INHALE A POWERFUL RESTORATIVE +MORREL SUFFERED AN EXCLAMATION OF HORROR AND SURPRISE TO ESCAPE HIM +ASKED MORREL YES +IT WAS SOMETHING TERRIBLE TO WITNESS THE SILENT AGONY THE MUTE DESPAIR OF NOIRTIER WHOSE TEARS SILENTLY ROLLED DOWN HIS CHEEKS +GO DO YOU HEAR +THE DISTRICT DOCTOR APPROACHED WITH THE INDIFFERENCE OF A MAN ACCUSTOMED TO SPEND HALF HIS TIME AMONGST THE DEAD HE THEN LIFTED THE SHEET WHICH WAS PLACED OVER THE FACE AND JUST UNCLOSED THE LIPS +SAID MORREL SADLY YES REPLIED NOIRTIER +OH YOU RAVE SIR EXCLAIMED VILLEFORT IN VAIN ENDEAVORING TO ESCAPE THE NET IN WHICH HE WAS TAKEN I RAVE +DO YOU KNOW THE ASSASSIN ASKED MORREL +THE OLD MAN MADE A SIGN IN THE AFFIRMATIVE +NOIRTIER WAS NEAR THE BED PALE MOTIONLESS AND SILENT AS THE CORPSE +THE NEAREST 
SAID THE DISTRICT DOCTOR IS A GOOD ITALIAN ABBE WHO LIVES NEXT DOOR TO YOU SHALL I CALL ON HIM AS I PASS +AT THIS MOMENT THE WHOLE SOUL OF THE OLD MAN SEEMED CENTRED IN HIS EYES WHICH BECAME BLOODSHOT THE VEINS OF THE THROAT SWELLED HIS CHEEKS AND TEMPLES BECAME PURPLE AS THOUGH HE WAS STRUCK WITH EPILEPSY NOTHING WAS WANTING TO COMPLETE THIS BUT THE UTTERANCE OF A CRY +WHAT DO YOU MEAN SIR +BUT CAN HE UNDERSTAND YOU YES +D'AVRIGNY SAID VILLEFORT BE SO KIND I BESEECH YOU AS TO ACCOMPANY THIS GENTLEMAN HERE IS THE KEY OF THE DOOR SO THAT YOU CAN GO IN AND OUT AS YOU PLEASE YOU WILL BRING THE PRIEST WITH YOU AND WILL OBLIGE ME BY INTRODUCING HIM INTO MY CHILD'S ROOM DO YOU WISH TO SEE HIM +BUT HE STOPPED ON THE LANDING HE HAD NOT THE COURAGE TO AGAIN VISIT THE DEATH CHAMBER +I ONLY WISH TO BE ALONE YOU WILL EXCUSE ME WILL YOU NOT +NOIRTIER LOOKED UPON MORREL WITH ONE OF THOSE MELANCHOLY SMILES WHICH HAD SO OFTEN MADE VALENTINE HAPPY AND THUS FIXED HIS ATTENTION +FOR SOME TIME NOTHING WAS HEARD IN THAT CHAMBER BUT SOBS EXCLAMATIONS AND PRAYERS +I COULD NOT TAKE MY EYES OFF THE MAN IN THE BED +THE LITTLE HOUSE ON THE HILLSIDE WAS SO MUCH THE COLOR OF THE NIGHT THAT WE COULD NOT SEE IT AS WE CAME UP THE DRAW +SHE ASKED PETER TO WAIT A MOMENT AND WHEN SHE CAME BACK FROM THE KITCHEN SHE BROUGHT A BAG OF SANDWICHES AND DOUGHNUTS FOR US +HE LAY PATIENTLY FIGHTING FOR BREATH LIKE A CHILD WITH CROUP +WHEN HE WAS OUT HUNTING HE USED TO GO INTO THE EMPTY LOG HOUSE AND SIT THERE BROODING +PETER CROUCHING IN THE FRONT SEAT SAW NOTHING +WITHOUT A WORD PETER GOT UP AND LIT HIS LANTERN +QUICKLY IT WAS COVERED WITH BRIGHT RED SPOTS I THOUGHT I HAD NEVER SEEN ANY BLOOD SO BRIGHT +THE ROAD WAS CLEAR AND WHITE AND THE GROOM'S THREE BLACKS WENT LIKE THE WIND +THEY MADE ME THINK OF DEFEATED ARMIES RETREATING OR OF GHOSTS WHO WERE TRYING DESPERATELY TO GET IN FOR SHELTER AND THEN WENT MOANING ON +THE SHARP SMELL OF SPIRITS WENT THROUGH THE ROOM +YES HOW MANY +WHEREVER THEY WENT THE STORY FOLLOWED THEM +THIS CABIN WAS HIS HERMITAGE UNTIL THE WINTER SNOWS PENNED HIM IN HIS CAVE +THE SHRIEKS THAT FOLLOWED MADE EVERYBODY SOBER +AND THE WOLVES PAVEL ASKED ENOUGH ENOUGH FOR ALL OF US +SOMETHING HAPPENED TO THE HINDMOST SLEDGE THE DRIVER LOST CONTROL HE WAS PROBABLY VERY DRUNK THE HORSES LEFT THE ROAD THE SLEDGE WAS CAUGHT IN A CLUMP OF TREES AND OVERTURNED +FROM OUR BENCH WE COULD SEE WHAT A HOLLOW CASE HIS BODY WAS +DURING THE AUCTION HE WENT ABOUT WITH HIS HEAD DOWN AND NEVER LIFTED HIS EYES +GRADUALLY RELIEF CAME TO ALL OF US +THE SICK MAN RAGED AND SHOOK HIS FIST +THE LOSS OF HIS TWO FRIENDS HAD A DEPRESSING EFFECT UPON OLD MISTER SHIMERDA +MISTER SHIMERDA WENT WITH HIM +IT SEEMED TO ME THAT HE DESPISED HIM FOR BEING SO SIMPLE AND DOCILE +ANTONIA'S FATHER UNCOVERED ONE OF HIS LONG BONY LEGS AND RUBBED IT RHYTHMICALLY +THE FIRST HOWLS WERE TAKEN UP AND ECHOED AND WITH QUICKENING REPETITIONS +PETER TOLD HIS TROUBLES TO MISTER SHIMERDA HE WAS UNABLE TO MEET A NOTE WHICH FELL DUE ON THE FIRST OF NOVEMBER HAD TO PAY AN EXORBITANT BONUS ON RENEWING IT AND TO GIVE A MORTGAGE ON HIS PIGS AND HORSES AND EVEN HIS MILK COW +A BLACK DROVE CAME UP OVER THE HILL BEHIND THE WEDDING PARTY +TWENTY THIRTY ENOUGH +WE LAY STILL AND DID NOT TALK +THEY WERE RUN OUT OF THEIR VILLAGE +THEY WORKED IN CHICAGO DES MOINES FORT WAYNE BUT THEY WERE ALWAYS UNFORTUNATE +THE FIRST THING EITHER OF THEM NOTICED WAS A NEW SOUND THAT BROKE INTO THE CLEAR AIR LOUDER THAN THEY HAD EVER HEARD IT BEFORE THE BELL OF THE MONASTERY OF THEIR OWN VILLAGE 
RINGING FOR EARLY PRAYERS +PETER COULD GIVE NO VERY CLEAR ACCOUNT OF HIS TRANSACTIONS WITH CUTTER +EVERY ONE SAID PETER KISSED THE COW BEFORE SHE WAS LED AWAY BY HER NEW OWNER +HE SEEMED TO BE CURSING PEOPLE WHO HAD WRONGED HIM +THEY WERE WITHIN A FEW MILES OF THEIR VILLAGE NOW +THERE ARE ONLY THREE SLEDGES LEFT HE WHISPERED +PAVEL KNOCKED HIM OVER THE SIDE OF THE SLEDGE AND THREW THE GIRL AFTER HIM +AFTER THE CEREMONY AT THE CHURCH THE PARTY WENT TO A DINNER GIVEN BY THE PARENTS OF THE BRIDE +NOW HIS MIDDLE HORSE WAS BEING ALMOST DRAGGED BY THE OTHER TWO +I EXPLAINED TO ANTONIA HOW THIS MEANT THAT HE WAS TWENTY FOUR YEARS OLD THAT HE MUST HAVE BEEN THERE WHEN WHITE MEN FIRST CAME LEFT ON FROM BUFFALO AND INDIAN TIMES +I FOLLOWED WITH THE SPADE OVER MY SHOULDER DRAGGING MY SNAKE +I NEVER KNOW YOU WAS SO BRAVE JIM SHE WENT ON COMFORTINGLY +I WHIRLED ROUND AND THERE ON ONE OF THOSE DRY GRAVEL BEDS WAS THE BIGGEST SNAKE I HAD EVER SEEN +HE COULD STAND RIGHT UP AND TALK TO YOU HE COULD DID HE FIGHT HARD +OTTO WINKED AT ME +ONE DAY WHEN I RODE OVER TO THE SHIMERDAS I FOUND ANTONIA STARTING OFF ON FOOT FOR RUSSIAN PETER'S HOUSE TO BORROW A SPADE AMBROSCH NEEDED +THERE HAD BEEN ANOTHER BLACK FROST THE NIGHT BEFORE AND THE AIR WAS CLEAR AND HEADY AS WINE +WE DECIDED THAT ANTONIA SHOULD RIDE DUDE HOME AND I WOULD WALK +A SNAKE OF HIS SIZE IN FIGHTING TRIM WOULD BE MORE THAN ANY BOY COULD HANDLE +OTTO FUCHS WAS THE FIRST ONE WE MET +LOOK TONY THAT'S HIS POISON I SAID +THIS CHANGE CAME ABOUT FROM AN ADVENTURE WE HAD TOGETHER +A FAINT FETID SMELL CAME FROM HIM AND A THREAD OF GREEN LIQUID OOZED FROM HIS CRUSHED HEAD +SHE WAS FOUR YEARS OLDER THAN I TO BE SURE AND HAD SEEN MORE OF THE WORLD BUT I WAS A BOY AND SHE WAS A GIRL AND I RESENTED HER PROTECTING MANNER +I KNOW I AM JUST AWFUL JIM I WAS SO SCARED +IT WAS ON ONE OF THESE GRAVEL BEDS THAT I MET MY ADVENTURE +IT WAS THE SEASON WHEN THE ANCIENT SUN GOD HAD BEEN ACCUSTOMED TO RECEIVE HIS ANNUAL OBLATIONS AND WE CAN WELL BELIEVE THAT THOSE WHOSE HEARTS STILL TREMBLED AT THE NAME OF BEL MUST HAVE CONNECTED THE ECLIPSE AND THE PLAGUE WITH THE REVOLUTION IN THE NATIONAL WORSHIP AND THE OVERTHROW OF THE ANCIENT GODS ON THAT PLAIN OF PROSTRATION WHERE THEY HAD SO LONG RECEIVED THE HOMAGE OF AN ENTIRE PEOPLE +SO SLOW AND PATIENT IS THE PROCESS BY WHICH CHRISTIANITY INFUSES ITSELF INTO THE SOCIAL LIFE OF A CONVERTED PEOPLE +NOTHING COULD BE MORE NATURAL THAN SUCH AN ASSEMBLY IN SUCH A PLACE AT SUCH A PERIOD +LASTLY THE ROYAL BROTHERS FELL THEMSELVES VICTIMS TO THE EPIDEMIC WHICH SO SADLY SIGNALIZES THEIR REIGN +THE TRIBUTE WAS AT THIS PERIOD ENORMOUS FIFTEEN THOUSAND HEAD OF CATTLE ANNUALLY +THE KINGDOM OF NORTHUMBRIA AS THE NAME IMPLIES EMBRACED NEARLY ALL THE COUNTRY FROM THE HUMBER TO THE PICTISH BORDER +HERE THE HOLY PRELATE OF FERNS MET HIM AND RELATED A VISION IN WHICH HE HAD BEEN INSTRUCTED TO DEMAND THE ABOLITION OF THE IMPOST +THE POETS OF SUCCEEDING AGES HAVE DWELT MUCH IN DETAIL ON THE OCCURRENCES OF THIS MEMORABLE DAY +THE SAXONS OF KENT AND THE SOUTHERN KINGDOMS GENERALLY WERE CONVERTED BY MISSIONARIES FROM FRANCE OR ROME OR NATIVE PREACHERS OF THE FIRST OR SECOND CHRISTIAN GENERATION THOSE OF NORTHUMBRIA RECOGNISE AS THEIR APOSTLES SAINT AIDAN AND SAINT CUTHBERT TWO FATHERS FROM IONA +IT IS PRETTY CLEAR ALSO THAT THE LAST RALLY OF DRUIDISM AGAINST CHRISTIANITY TOOK PLACE BEHIND HIS BANNER ON THE PLAIN OF MOIRA +THROUGHOUT THIS CENTURY THE POWER OF THE CHURCH WAS CONSTANTLY ON THE INCREASE AND IS VISIBLE IN MANY IMPORTANT CHANGES +THE ANCESTORS 
OF THE PRESENT PRETENDER CONGAL SURNAMED THE SQUINT EYED HAD TWICE RECEIVED AND CHERISHED THE LICENTIOUS BARDS WHEN UNDER THE BAN OF TARA AND HIS POPULARITY WITH THAT STILL POWERFUL ORDER WAS ONE PROP OF HIS AMBITION +THE BARREN ROCK ABOUT THREE MILES IN LENGTH WAS COVERED WITH MONASTIC BUILDINGS AND ITS CEMETERY WAS ALREADY ADORNED WITH THE TOMBS OF SAINTS AND KINGS +WHILE THE LIBERATED EXILES REJOICED ON THE PLAIN OF MEATH THE TENT OF THE ABBOT OF IONA WAS PITCHED ON THE RATH OF TARA A FACT WHICH WOULD SEEM TO INDICATE THAT ALREADY IN LITTLE MORE THAN A CENTURY SINCE THE INTERDICT HAD FALLEN ON IT THE EDIFICES WHICH MADE SO FINE A SHOW IN THE DAYS OF PATRICK WERE RUINED AND UNINHABITABLE +THE ONLY CONFLICTS THAT OCCURRED ON IRISH SOIL WITH A PICTISH OR AN ANGLO SAXON FORCE IF WE EXCEPT THOSE WHO FORMED A CONTINGENT OF CONGAL'S ARMY AT MOIRA OCCURRED IN THE TIME OF THE HOSPITABLE FINNACTA +SAINT MOLING SURVIVED HIM THREE YEARS AND SAINT ADAMNAN SO INTIMATELY CONNECTED WITH HIS REIGN TEN YEARS +NOW EVERY MISSIONARY THAT EVER WENT OUT FROM IONA HAD TAUGHT THAT TO REDUCE CHRISTIANS TO SLAVERY WAS WHOLLY INCONSISTENT WITH A BELIEF IN THE DOCTRINES OF THE GOSPEL +AS LEADING TO THE MENTION OF OTHER INTERESTING EVENTS WE MUST SET THIS INROAD CLEARLY BEFORE THE READER +LIKE THE TWO KINGS OF SPARTA THEY REIGNED JOINTLY DIVIDING BETWEEN THEM THE LABOURS AND CARES OF STATE diff --git a/SpeechT5/asr_train/valid.ltr b/SpeechT5/asr_train/valid.ltr new file mode 100644 index 0000000000000000000000000000000000000000..56e348a5ce146908df05619638a04d030f131054 --- /dev/null +++ b/SpeechT5/asr_train/valid.ltr @@ -0,0 +1,29 @@ +T H E | M E N | W E R E | A S | H A N D S O M E | A S | T H E | W O M E N | B E A U T I F U L | +I | D O N ' T | W O N D E R | Y O U | W E R E | A F R A I D | T O | T E L L | M E | S H E | B E G A N | Y O U | D O N ' T | L O V E | M E | Y O U ' V E | N E V E R | L O V E D | M E | I | W A S | A N | I D I O T | T O | B E L I E V E | Y O U | D I D | +W I T H | T H E | I N S I G H T | O F | A | K I N D R E D | T E M P E R A M E N T | H E | P R O N O U N C E D | H I S | V E R D I C T | +T H E | L O Y A L | F R E N Z Y | F E L L | U P O N | T H E | T H R E E | Q U I E T | W O M E N | A N D | T H E Y | C O U L D | N O T | D O | T O O | M U C H | F O R | T H E I R | C O U N T R Y | +N O W | L E T ' S | B E | B R A V E | A N D | E N J O Y | E V E R Y | M I N U T E | O F | I T | +I T | R E L I E V E D | H I M | F O R | A | W H I L E | +S O | H E ' S | A | F R I E N D | O F | Y O U R S | E H | +O H | Y O U | M I N I S T E R S | O F | C H R I S T | W O L V E S | I N | S H E E P ' S | C L O T H I N G | Y O U | S H A L L | B E | J U D G E D | F O R | T H I S | +I | H A V E | W A I T E D | L O N G | F O R | Y O U | +T H E | B O Y S | W E R E | N O W | A L L | A N X I E T Y | T O | S T A R T | W H I L E | T H E | P O N I E S | A F T E R | T H E I R | S U N D A Y | R E S T | W E R E | A L M O S T | A S | F U L L | O F | L I F E | A S | W E R E | T H E I R | O W N E R S | +A N D | A S | S O O N | A S | T H E I R | P A R E N T S | H A D | G O N E | T O | S L E E P | H E | G O T | U P | P U T | O N | H I S | C O A T | A N D | U N B A R R I N G | T H E | B A C K | D O O R | W E N T | O U T | +H E R | S K I N | W A S | B R O W N | T O O | A N D | I N | H E R | C H E E K S | S H E | H A D | A | G L O W | O F | R I C H | D A R K | C O L O R | +T O M ' S | E Y E S | F O C U S E D | I N | H O R R O R | O N | T H E | W R E C K A G E | E N V E L O P E D | B Y | S T I L L | B I L L O W I N G | D U S T | +I | C A N 
| A S S U R E | Y O U | T H A T | T H I S | I S | A | M O D E R N | F A C E | A N D | O N E | W H I C H | Y O U | W I L L | V E R Y | P R O B A B L Y | M E E T | +M I S S | H E P Z I B A H | I | S U P P O S E | W I L L | I N T E R W E A V E | T H E | F A C T | W I T H | H E R | O T H E R | T R A D I T I O N S | A N D | S E T | I T | D O W N | T H A T | T H E | F O W L S | K N O W | Y O U | T O | B E | A | P Y N C H E O N | +W H A T E V E R | R E V I V I N G | E F F E C T | I T | M I G H T | O T H E R W I S E | H A V E | P R O D U C E D | O N | H I M | I T | M A D E | N O | C H A N G E | I N | T H E | T H R E A T E N I N G | G L O O M | O F | H I S | M A N N E R | +I T | W A S | T H E | W O R S T | S U N D A Y | H E | H A D | S P E N T | I N | H I S | L I F E | +M A N Y | L I T T L E | W R I N K L E S | G A T H E R E D | B E T W E E N | H I S | E Y E S | A S | H E | C O N T E M P L A T E D | T H I S | A N D | H I S | B R O W | M O I S T E N E D | +P R O F O U N D | S U F F E R I N G | M A K E S | N O B L E | I T | S E P A R A T E S | O N E | O F | T H E | M O S T | R E F I N E D | F O R M S | O F | D I S G U I S E | I S | E P I C U R I S M | A L O N G | W I T H | A | C E R T A I N | O S T E N T A T I O U S | B O L D N E S S | O F | T A S T E | W H I C H | T A K E S | S U F F E R I N G | L I G H T L Y | A N D | P U T S | I T S E L F | O N | T H E | D E F E N S I V E | A G A I N S T | A L L | T H A T | I S | S O R R O W F U L | A N D | P R O F O U N D | +B U T | I N | T H E | C A U S E | O F | S C I E N C E | M E N | A R E | E X P E C T E D | T O | S U F F E R | +I | H A V E | N O T | T H E | S L I G H T E S T | D O U B T | T H A T | I N | H I G H | W I N D S | I T S | R E D | T I L E S | W E R E | B L O W N | O U T | T O | T H E | G R E A T | A N N O Y A N C E | O F | T H E | P A S T O R | A N D | C O N G R E G A T I O N | +C L I M A T E | B A D | E X A M P L E | A N D | T H E | L U X U R Y | O F | P O W E R | D E G R A D E D | T H E M | I N | O N E | C E N T U R Y | I N T O | A | R A C E | O F | H E L P L E S S | A N D | D E B A U C H E D | S L A V E | H O L D E R S | D O O M E D | T O | U T T E R | E X T E R M I N A T I O N | B E F O R E | T H E | S E M I | G O T H I C | A R M I E S | O F | B E L I S A R I U S | A N D | W I T H | T H E M | V A N I S H E D | T H E | L A S T | C H A N C E | T H A T | T H E | G O T H I C | R A C E S | W O U L D | E X E R C I S E | O N | T H E | E A S T E R N | W O R L D | T H E | S A M E | S T E R N | Y E T | W H O L E S O M E | D I S C I P L I N E | U N D E R | W H I C H | T H E | W E S T E R N | H A D | B E E N | R E S T O R E D | T O | L I F E | +U R S U S | W A S | S A T I S F I E D | W I T H | T H E | A P P L A U S E | O F | S O U T H W A R K | B U T | B Y | N O | M E A N S | A S T O N I S H E D | +M I S T E R | M O R T O N | S E E M E D | P A R T I C U L A R L Y | S T R U C K | W I T H | T H E | A C C O U N T | O F | W A V E R L E Y ' S | V I S I T | T O | D O N A L D | B E A N | L E A N | +W E | S A W | T H E | U N I T E D | S T A T E S | F L A G | F L Y I N G | F R O M | T H E | R A M P A R T S | A N D | T H O U G H T | T H A T | Y A N K | W O U L D | P R O B A B L Y | B E | A S L E E P | O R | C A T C H I N G | L I C E | O R | M A Y B E | E N G A G E D | I N | A | G A M E | O F | S E V E N | U P | +A S | I | W E N T | B A C K | T O | T H E | F I E L D | H O S P I T A L | I | O V E R T O O K | A N O T H E R | M A N | W A L K I N G | A L O N G | +S O | L O W | H E | F E L L | T H A T | A L L | A P P L I A N C E S | F O R | H I S | S A L V A T I O N | W 
E R E | A L R E A D Y | S H O R T | S A V E | S H O W I N G | H I M | T H E | P E O P L E | O F | P E R D I T I O N | +S O M E | A P P R E H E N S I O N | K E E P S | Y O U | M A R V E L L I N G | B U T | T H E | P S A L M | D E L E C T A S T I | G I V E T H | L I G H T | W H I C H | H A S | T H E | P O W E R | T O | U N C L O U D | Y O U R | I N T E L L E C T | +T H E | S E C O N D | W A S | A S | I F | H E R | F L E S H | A N D | B O N E S | H A D | A L L | B E E N | F A S H I O N E D | O U T | O F | E M E R A L D | T H E | T H I R D | A P P E A R E D | A S | S N O W | B U T | N E W L Y | F A L L E N | diff --git a/SpeechT5/asr_train/valid.tsv b/SpeechT5/asr_train/valid.tsv new file mode 100644 index 0000000000000000000000000000000000000000..860f40b200fdd4127937dbd27f7ac893bac5b39c --- /dev/null +++ b/SpeechT5/asr_train/valid.tsv @@ -0,0 +1,30 @@ +/public/home/changhl/dataset/LibriSpeech/dev-clean +2412/153954/2412-153954-0008.flac 43760 +6345/93302/6345-93302-0018.flac 99200 +777/126732/777-126732-0044.flac 64400 +3853/163249/3853-163249-0025.flac 97520 +3853/163249/3853-163249-0040.flac 52320 +3752/4944/3752-4944-0063.flac 33120 +3752/4944/3752-4944-0008.flac 31040 +3752/4943/3752-4943-0011.flac 111520 +6319/64726/6319-64726-0014.flac 42080 +6313/66129/6313-66129-0016.flac 119600 +7976/110523/7976-110523-0004.flac 116160 +1988/147956/1988-147956-0011.flac 83680 +251/137823/251-137823-0009.flac 93360 +2086/149220/2086-149220-0029.flac 89280 +2086/149220/2086-149220-0016.flac 120320 +8297/275154/8297-275154-0010.flac 121760 +2277/149897/2277-149897-0023.flac 52960 +2277/149896/2277-149896-0005.flac 89600 +422/122949/422-122949-0025.flac 285600 +6241/61943/6241-61943-0001.flac 51520 +6241/61943/6241-61943-0024.flac 117760 +2902/9006/2902-9006-0017.flac 380160 +5895/34629/5895-34629-0022.flac 96640 +5338/24640/5338-24640-0003.flac 102720 +5694/64038/5694-64038-0004.flac 153280 +5694/64029/5694-64029-0026.flac 77120 +84/121550/84-121550-0032.flac 132000 +84/121550/84-121550-0005.flac 139439 +84/121550/84-121550-0018.flac 151360 \ No newline at end of file diff --git a/SpeechT5/asr_train/valid.txt b/SpeechT5/asr_train/valid.txt new file mode 100644 index 0000000000000000000000000000000000000000..938ca3a67a204be02e89e851918c21c6026645af --- /dev/null +++ b/SpeechT5/asr_train/valid.txt @@ -0,0 +1,29 @@ +THE MEN WERE AS HANDSOME AS THE WOMEN BEAUTIFUL +I DON'T WONDER YOU WERE AFRAID TO TELL ME SHE BEGAN YOU DON'T LOVE ME YOU'VE NEVER LOVED ME I WAS AN IDIOT TO BELIEVE YOU DID +WITH THE INSIGHT OF A KINDRED TEMPERAMENT HE PRONOUNCED HIS VERDICT +THE LOYAL FRENZY FELL UPON THE THREE QUIET WOMEN AND THEY COULD NOT DO TOO MUCH FOR THEIR COUNTRY +NOW LET'S BE BRAVE AND ENJOY EVERY MINUTE OF IT +IT RELIEVED HIM FOR A WHILE +SO HE'S A FRIEND OF YOURS EH +OH YOU MINISTERS OF CHRIST WOLVES IN SHEEP'S CLOTHING YOU SHALL BE JUDGED FOR THIS +I HAVE WAITED LONG FOR YOU +THE BOYS WERE NOW ALL ANXIETY TO START WHILE THE PONIES AFTER THEIR SUNDAY REST WERE ALMOST AS FULL OF LIFE AS WERE THEIR OWNERS +AND AS SOON AS THEIR PARENTS HAD GONE TO SLEEP HE GOT UP PUT ON HIS COAT AND UNBARRING THE BACK DOOR WENT OUT +HER SKIN WAS BROWN TOO AND IN HER CHEEKS SHE HAD A GLOW OF RICH DARK COLOR +TOM'S EYES FOCUSED IN HORROR ON THE WRECKAGE ENVELOPED BY STILL BILLOWING DUST +I CAN ASSURE YOU THAT THIS IS A MODERN FACE AND ONE WHICH YOU WILL VERY PROBABLY MEET +MISS HEPZIBAH I SUPPOSE WILL INTERWEAVE THE FACT WITH HER OTHER TRADITIONS AND SET IT DOWN THAT THE FOWLS KNOW YOU TO BE A PYNCHEON +WHATEVER REVIVING EFFECT IT 
MIGHT OTHERWISE HAVE PRODUCED ON HIM IT MADE NO CHANGE IN THE THREATENING GLOOM OF HIS MANNER +IT WAS THE WORST SUNDAY HE HAD SPENT IN HIS LIFE +MANY LITTLE WRINKLES GATHERED BETWEEN HIS EYES AS HE CONTEMPLATED THIS AND HIS BROW MOISTENED +PROFOUND SUFFERING MAKES NOBLE IT SEPARATES ONE OF THE MOST REFINED FORMS OF DISGUISE IS EPICURISM ALONG WITH A CERTAIN OSTENTATIOUS BOLDNESS OF TASTE WHICH TAKES SUFFERING LIGHTLY AND PUTS ITSELF ON THE DEFENSIVE AGAINST ALL THAT IS SORROWFUL AND PROFOUND +BUT IN THE CAUSE OF SCIENCE MEN ARE EXPECTED TO SUFFER +I HAVE NOT THE SLIGHTEST DOUBT THAT IN HIGH WINDS ITS RED TILES WERE BLOWN OUT TO THE GREAT ANNOYANCE OF THE PASTOR AND CONGREGATION +CLIMATE BAD EXAMPLE AND THE LUXURY OF POWER DEGRADED THEM IN ONE CENTURY INTO A RACE OF HELPLESS AND DEBAUCHED SLAVE HOLDERS DOOMED TO UTTER EXTERMINATION BEFORE THE SEMI GOTHIC ARMIES OF BELISARIUS AND WITH THEM VANISHED THE LAST CHANCE THAT THE GOTHIC RACES WOULD EXERCISE ON THE EASTERN WORLD THE SAME STERN YET WHOLESOME DISCIPLINE UNDER WHICH THE WESTERN HAD BEEN RESTORED TO LIFE +URSUS WAS SATISFIED WITH THE APPLAUSE OF SOUTHWARK BUT BY NO MEANS ASTONISHED +MISTER MORTON SEEMED PARTICULARLY STRUCK WITH THE ACCOUNT OF WAVERLEY'S VISIT TO DONALD BEAN LEAN +WE SAW THE UNITED STATES FLAG FLYING FROM THE RAMPARTS AND THOUGHT THAT YANK WOULD PROBABLY BE ASLEEP OR CATCHING LICE OR MAYBE ENGAGED IN A GAME OF SEVEN UP +AS I WENT BACK TO THE FIELD HOSPITAL I OVERTOOK ANOTHER MAN WALKING ALONG +SO LOW HE FELL THAT ALL APPLIANCES FOR HIS SALVATION WERE ALREADY SHORT SAVE SHOWING HIM THE PEOPLE OF PERDITION +SOME APPREHENSION KEEPS YOU MARVELLING BUT THE PSALM DELECTASTI GIVETH LIGHT WHICH HAS THE POWER TO UNCLOUD YOUR INTELLECT +THE SECOND WAS AS IF HER FLESH AND BONES HAD ALL BEEN FASHIONED OUT OF EMERALD THE THIRD APPEARED AS SNOW BUT NEWLY FALLEN diff --git a/SpeechT5/fairseq/.github/ISSUE_TEMPLATE.md b/SpeechT5/fairseq/.github/ISSUE_TEMPLATE.md new file mode 100644 index 0000000000000000000000000000000000000000..5c4c4493e4a8e5386b927e4f4554df925955d129 --- /dev/null +++ b/SpeechT5/fairseq/.github/ISSUE_TEMPLATE.md @@ -0,0 +1,3 @@ +## 👉 [Please follow one of these issue templates](https://github.com/pytorch/fairseq/issues/new/choose) 👈 + +Note: to keep the backlog clean and actionable, issues may be immediately closed if they do not follow one of the above issue templates. diff --git a/SpeechT5/fairseq/.github/ISSUE_TEMPLATE/bug_report.md b/SpeechT5/fairseq/.github/ISSUE_TEMPLATE/bug_report.md new file mode 100644 index 0000000000000000000000000000000000000000..a7f4f0a902e92a6b40e437ab496a50fdee4d6aae --- /dev/null +++ b/SpeechT5/fairseq/.github/ISSUE_TEMPLATE/bug_report.md @@ -0,0 +1,43 @@ +--- +name: 🐛 Bug Report +about: Submit a bug report to help us improve +labels: 'bug, needs triage' +--- + +## 🐛 Bug + + + +### To Reproduce + +Steps to reproduce the behavior (**always include the command you ran**): + +1. Run cmd '....' +2. 
See error + + + + +#### Code sample + + +### Expected behavior + + + +### Environment + + - fairseq Version (e.g., 1.0 or master): + - PyTorch Version (e.g., 1.0) + - OS (e.g., Linux): + - How you installed fairseq (`pip`, source): + - Build command you used (if compiling from source): + - Python version: + - CUDA/cuDNN version: + - GPU models and configuration: + - Any other relevant information: + +### Additional context + + diff --git a/SpeechT5/fairseq/.github/ISSUE_TEMPLATE/documentation.md b/SpeechT5/fairseq/.github/ISSUE_TEMPLATE/documentation.md new file mode 100644 index 0000000000000000000000000000000000000000..3a6e2e9ea4bb71102122c17ff53051eb3770cb5e --- /dev/null +++ b/SpeechT5/fairseq/.github/ISSUE_TEMPLATE/documentation.md @@ -0,0 +1,15 @@ +--- +name: 📚 Documentation/Typos +about: Report an issue related to documentation or a typo +labels: 'documentation, needs triage' +--- + +## 📚 Documentation + +For typos and doc fixes, please go ahead and: + +1. Create an issue. +2. Fix the typo. +3. Submit a PR. + +Thanks! diff --git a/SpeechT5/fairseq/.github/ISSUE_TEMPLATE/feature_request.md b/SpeechT5/fairseq/.github/ISSUE_TEMPLATE/feature_request.md new file mode 100644 index 0000000000000000000000000000000000000000..93c8668041f8a7af29e4c11e905d8b56b946dd51 --- /dev/null +++ b/SpeechT5/fairseq/.github/ISSUE_TEMPLATE/feature_request.md @@ -0,0 +1,24 @@ +--- +name: 🚀 Feature Request +about: Submit a proposal/request for a new feature +labels: 'enhancement, help wanted, needs triage' +--- + +## 🚀 Feature Request + + +### Motivation + + + +### Pitch + + + +### Alternatives + + + +### Additional context + + diff --git a/SpeechT5/fairseq/.github/ISSUE_TEMPLATE/how-to-question.md b/SpeechT5/fairseq/.github/ISSUE_TEMPLATE/how-to-question.md new file mode 100644 index 0000000000000000000000000000000000000000..4beb180dbf6dd61651aabf4a1b0748f2cd834300 --- /dev/null +++ b/SpeechT5/fairseq/.github/ISSUE_TEMPLATE/how-to-question.md @@ -0,0 +1,33 @@ +--- +name: ❓ Questions/Help +about: If you have questions, please first search existing issues and docs +labels: 'question, needs triage' +--- + +## ❓ Questions and Help + +### Before asking: +1. search the issues. +2. search the docs. + + + +#### What is your question? + +#### Code + + + +#### What have you tried? + +#### What's your environment? + + - fairseq Version (e.g., 1.0 or master): + - PyTorch Version (e.g., 1.0) + - OS (e.g., Linux): + - How you installed fairseq (`pip`, source): + - Build command you used (if compiling from source): + - Python version: + - CUDA/cuDNN version: + - GPU models and configuration: + - Any other relevant information: diff --git a/SpeechT5/fairseq/.github/PULL_REQUEST_TEMPLATE.md b/SpeechT5/fairseq/.github/PULL_REQUEST_TEMPLATE.md new file mode 100644 index 0000000000000000000000000000000000000000..b28ff98e7bc807d68b142228721f291553164519 --- /dev/null +++ b/SpeechT5/fairseq/.github/PULL_REQUEST_TEMPLATE.md @@ -0,0 +1,16 @@ +# Before submitting + +- [ ] Was this discussed/approved via a Github issue? (no need for typos, doc improvements) +- [ ] Did you read the [contributor guideline](https://github.com/pytorch/fairseq/blob/master/CONTRIBUTING.md)? +- [ ] Did you make sure to update the docs? +- [ ] Did you write any new necessary tests? + +## What does this PR do? +Fixes # (issue). + +## PR review +Anyone in the community is free to review the PR once the tests have passed. +If we didn't discuss your PR in Github issues there's a high chance it will not be merged. + +## Did you have fun? 
+Make sure you had fun coding 🙃 diff --git a/SpeechT5/fairseq/.github/stale.yml b/SpeechT5/fairseq/.github/stale.yml new file mode 100644 index 0000000000000000000000000000000000000000..b12867dab005e7a7608d4c7138a67d409c76f7ae --- /dev/null +++ b/SpeechT5/fairseq/.github/stale.yml @@ -0,0 +1,30 @@ +# Configuration for probot-stale - https://github.com/probot/stale +# Mostly copied from github.com/facebook/react/blob/master/.github/stale.yml +# Number of days of inactivity before an issue becomes stale +daysUntilStale: 90 +# Number of days of inactivity before a stale issue is closed +daysUntilClose: 7 +# Issues with these labels will never be considered stale +exemptLabels: + - bug +# Label to use when marking an issue as stale +staleLabel: stale +issues: + # Comment to post when marking an issue as stale. + markComment: > + This issue has been automatically marked as stale. + **If this issue is still affecting you, please leave any comment** (for example, "bump"), and we'll keep it open. + We are sorry that we haven't been able to prioritize it yet. If you have any new additional information, please include it with your comment! + # Comment to post when closing a stale issue. + closeComment: > + Closing this issue after a prolonged period of inactivity. If this issue is still present in the latest release, please create a new issue with up-to-date information. Thank you! +pulls: + # Comment to post when marking a pull request as stale. + markComment: > + This pull request has been automatically marked as stale. + **If this pull request is still relevant, please leave any comment** (for example, "bump"), and we'll keep it open. + We are sorry that we haven't been able to prioritize reviewing it yet. Your contribution is very much appreciated. + # Comment to post when closing a stale pull request. + closeComment: > + Closing this pull request after a prolonged period of inactivity. If this issue is still present in the latest release, please ask for this pull request to be reopened. Thank you! + diff --git a/SpeechT5/fairseq/.github/workflows/build.yml b/SpeechT5/fairseq/.github/workflows/build.yml new file mode 100644 index 0000000000000000000000000000000000000000..105c42a503a6e8b493e11217b91e2a1fca79c081 --- /dev/null +++ b/SpeechT5/fairseq/.github/workflows/build.yml @@ -0,0 +1,55 @@ +name: build + +on: + # Trigger the workflow on push to master or any pull request + push: + branches: + - master + pull_request: + +jobs: + build: + + strategy: + max-parallel: 4 + matrix: + platform: [ubuntu-latest, macos-latest] + python-version: [3.6, 3.7] + + runs-on: ${{ matrix.platform }} + + steps: + - uses: actions/checkout@v2 + + - name: Set up Python ${{ matrix.python-version }} + uses: actions/setup-python@v2 + with: + python-version: ${{ matrix.python-version }} + + - name: Conditionally install pytorch + if: matrix.platform == 'windows-latest' + run: pip3 install torch -f https://download.pytorch.org/whl/torch_stable.html + + - name: Install locally + run: | + python -m pip install --upgrade pip + git submodule update --init --recursive + python setup.py build_ext --inplace + python -m pip install --editable . + + - name: Install optional test requirements + run: | + python -m pip install iopath transformers pyarrow + python -m pip install git+https://github.com/facebookresearch/fairscale.git@master + + - name: Lint with flake8 + run: | + pip install flake8 + # stop the build if there are Python syntax errors or undefined names + flake8 . 
--count --select=E9,F63,F7,F82 --show-source --statistics --extend-exclude fairseq/model_parallel/megatron + # exit-zero treats all errors as warnings. The GitHub editor is 127 chars wide + flake8 . --count --exit-zero --max-complexity=10 --max-line-length=127 --statistics --extend-exclude fairseq/model_parallel/megatron + + - name: Run tests + run: | + python setup.py test diff --git a/SpeechT5/fairseq/.github/workflows/build_wheels.yml b/SpeechT5/fairseq/.github/workflows/build_wheels.yml new file mode 100644 index 0000000000000000000000000000000000000000..7261708596f0c781cf670119cb63c811f9c0d50c --- /dev/null +++ b/SpeechT5/fairseq/.github/workflows/build_wheels.yml @@ -0,0 +1,41 @@ +name: build_wheels + +on: + push: + branches: + - v[0-9]+.[0-9]+.[x0-9]+ + tags: + - v* + +jobs: + build_wheels: + name: Build wheels on ${{ matrix.os }} + runs-on: ${{ matrix.os }} + strategy: + matrix: + os: [ubuntu-latest, macos-latest] + + steps: + - uses: actions/checkout@v2 + + - name: Install Python + uses: actions/setup-python@v2 + with: + python-version: '3.7' + + - name: Install cibuildwheel + run: | + python -m pip install cibuildwheel + + - name: Build wheels for CPython + run: | + python -m cibuildwheel --output-dir dist + env: + CIBW_BUILD: "cp36-*64 cp37-*64 cp38-*64" + CIBW_MANYLINUX_X86_64_IMAGE: manylinux1 + CIBW_BEFORE_BUILD: git submodule update --init --recursive && pip install . + + - uses: actions/upload-artifact@v2 + with: + name: wheels + path: ./dist/*.whl diff --git a/SpeechT5/fairseq/.gitignore b/SpeechT5/fairseq/.gitignore new file mode 100644 index 0000000000000000000000000000000000000000..4112804793c441354e6a2e6398075eea72ab6c0a --- /dev/null +++ b/SpeechT5/fairseq/.gitignore @@ -0,0 +1,136 @@ +# JetBrains PyCharm IDE +.idea/ + +# Byte-compiled / optimized / DLL files +__pycache__/ +*.py[cod] +*$py.class + +# C extensions +*.so + +# macOS dir files +.DS_Store + +# Distribution / packaging +.Python +env/ +build/ +develop-eggs/ +dist/ +downloads/ +eggs/ +.eggs/ +lib/ +lib64/ +parts/ +sdist/ +var/ +wheels/ +*.egg-info/ +.installed.cfg +*.egg + +# Checkpoints +checkpoints + +# PyInstaller +# Usually these files are written by a python script from a template +# before PyInstaller builds the exe, so as to inject date/other infos into it. 
+*.manifest +*.spec + +# Installer logs +pip-log.txt +pip-delete-this-directory.txt + +# Unit test / coverage reports +htmlcov/ +.tox/ +.coverage +.coverage.* +.cache +nosetests.xml +coverage.xml +*.cover +.hypothesis/ + +# Translations +*.mo +*.pot + +# Django stuff: +*.log +local_settings.py + +# Flask stuff: +instance/ +.webassets-cache + +# Scrapy stuff: +.scrapy + +# Sphinx documentation +docs/_build/ + +# PyBuilder +target/ + +# Jupyter Notebook +.ipynb_checkpoints + +# pyenv +.python-version + +# celery beat schedule file +celerybeat-schedule + +# SageMath parsed files +*.sage.py + +# dotenv +.env + +# virtualenv +.venv +venv/ +ENV/ + +# Spyder project settings +.spyderproject +.spyproject + +# Rope project settings +.ropeproject + +# mkdocs documentation +/site + +# mypy +.mypy_cache/ + +# Generated files +/fairseq/temporal_convolution_tbc +/fairseq/modules/*_layer/*_forward.cu +/fairseq/modules/*_layer/*_backward.cu +/fairseq/version.py + +# data +data-bin/ + +# reranking +/examples/reranking/rerank_data + +# Cython-generated C++ source files +/fairseq/data/data_utils_fast.cpp +/fairseq/data/token_block_utils_fast.cpp + +# VSCODE +.vscode/ftp-sync.json +.vscode/settings.json + +# Experimental Folder +experimental/* + +# Weights and Biases logs +wandb/ diff --git a/SpeechT5/fairseq/.gitmodules b/SpeechT5/fairseq/.gitmodules new file mode 100644 index 0000000000000000000000000000000000000000..07a55d45d4f0bed755dbfc1f440f214ed43d206a --- /dev/null +++ b/SpeechT5/fairseq/.gitmodules @@ -0,0 +1,4 @@ +[submodule "fairseq/model_parallel/megatron"] + path = fairseq/model_parallel/megatron + url = https://github.com/ngoyal2707/Megatron-LM + branch = fairseq diff --git a/SpeechT5/fairseq/CODE_OF_CONDUCT.md b/SpeechT5/fairseq/CODE_OF_CONDUCT.md new file mode 100644 index 0000000000000000000000000000000000000000..a0cbeaab7650bf08267fbdbc9bb54e845c88f392 --- /dev/null +++ b/SpeechT5/fairseq/CODE_OF_CONDUCT.md @@ -0,0 +1,77 @@ +# Code of Conduct + +## Our Pledge + +In the interest of fostering an open and welcoming environment, we as +contributors and maintainers pledge to make participation in our project and +our community a harassment-free experience for everyone, regardless of age, body +size, disability, ethnicity, sex characteristics, gender identity and expression, +level of experience, education, socio-economic status, nationality, personal +appearance, race, religion, or sexual identity and orientation. + +## Our Standards + +Examples of behavior that contributes to creating a positive environment +include: + +* Using welcoming and inclusive language +* Being respectful of differing viewpoints and experiences +* Gracefully accepting constructive criticism +* Focusing on what is best for the community +* Showing empathy towards other community members + +Examples of unacceptable behavior by participants include: + +* The use of sexualized language or imagery and unwelcome sexual attention or + advances +* Trolling, insulting/derogatory comments, and personal or political attacks +* Public or private harassment +* Publishing others' private information, such as a physical or electronic + address, without explicit permission +* Other conduct which could reasonably be considered inappropriate in a + professional setting + +## Our Responsibilities + +Project maintainers are responsible for clarifying the standards of acceptable +behavior and are expected to take appropriate and fair corrective action in +response to any instances of unacceptable behavior. 
+ +Project maintainers have the right and responsibility to remove, edit, or +reject comments, commits, code, wiki edits, issues, and other contributions +that are not aligned to this Code of Conduct, or to ban temporarily or +permanently any contributor for other behaviors that they deem inappropriate, +threatening, offensive, or harmful. + +## Scope + +This Code of Conduct applies within all project spaces, and it also applies when +an individual is representing the project or its community in public spaces. +Examples of representing a project or community include using an official +project e-mail address, posting via an official social media account, or acting +as an appointed representative at an online or offline event. Representation of +a project may be further defined and clarified by project maintainers. + +## Enforcement + +Instances of abusive, harassing, or otherwise unacceptable behavior may be +reported by contacting the project team at . All +complaints will be reviewed and investigated and will result in a response that +is deemed necessary and appropriate to the circumstances. The project team is +obligated to maintain confidentiality with regard to the reporter of an incident. +Further details of specific enforcement policies may be posted separately. + +Project maintainers who do not follow or enforce the Code of Conduct in good +faith may face temporary or permanent repercussions as determined by other +members of the project's leadership. + +## Attribution + +This Code of Conduct is adapted from the [Contributor Covenant][homepage], version 1.4, +available at https://www.contributor-covenant.org/version/1/4/code-of-conduct.html + +[homepage]: https://www.contributor-covenant.org + +For answers to common questions about this code of conduct, see +https://www.contributor-covenant.org/faq + diff --git a/SpeechT5/fairseq/CONTRIBUTING.md b/SpeechT5/fairseq/CONTRIBUTING.md new file mode 100644 index 0000000000000000000000000000000000000000..4d7ca6a98ebdabd7a6770ea616ee355ffb4a41e1 --- /dev/null +++ b/SpeechT5/fairseq/CONTRIBUTING.md @@ -0,0 +1,28 @@ +# Contributing to Facebook AI Research Sequence-to-Sequence Toolkit (fairseq) +We want to make contributing to this project as easy and transparent as +possible. + +## Pull Requests +We actively welcome your pull requests. + +1. Fork the repo and create your branch from `master`. +2. If you've added code that should be tested, add tests. +3. If you've changed APIs, update the documentation. +4. Ensure the test suite passes. +5. Make sure your code lints. +6. If you haven't already, complete the Contributor License Agreement ("CLA"). + +## Contributor License Agreement ("CLA") +In order to accept your pull request, we need you to submit a CLA. You only need +to do this once to work on any of Facebook's open source projects. + +Complete your CLA here: + +## Issues +We use GitHub issues to track public bugs. Please ensure your description is +clear and has sufficient instructions to be able to reproduce the issue. + +## License +By contributing to Facebook AI Research Sequence-to-Sequence Toolkit (fairseq), +you agree that your contributions will be licensed under the LICENSE file in +the root directory of this source tree. diff --git a/SpeechT5/fairseq/LICENSE b/SpeechT5/fairseq/LICENSE new file mode 100644 index 0000000000000000000000000000000000000000..b96dcb0480a0b0be0727976e5202a1e7b23edc3f --- /dev/null +++ b/SpeechT5/fairseq/LICENSE @@ -0,0 +1,21 @@ +MIT License + +Copyright (c) Facebook, Inc. and its affiliates. 
+ +Permission is hereby granted, free of charge, to any person obtaining a copy +of this software and associated documentation files (the "Software"), to deal +in the Software without restriction, including without limitation the rights +to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +copies of the Software, and to permit persons to whom the Software is +furnished to do so, subject to the following conditions: + +The above copyright notice and this permission notice shall be included in all +copies or substantial portions of the Software. + +THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE +AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE +SOFTWARE. diff --git a/SpeechT5/fairseq/README.md b/SpeechT5/fairseq/README.md new file mode 100644 index 0000000000000000000000000000000000000000..82b6ba7cd8a562f079ed097433f1293dc3d5e46f --- /dev/null +++ b/SpeechT5/fairseq/README.md @@ -0,0 +1,222 @@ +

+[badges: MIT License, Latest Release, Build Status, Documentation Status]
+ +-------------------------------------------------------------------------------- + +Fairseq(-py) is a sequence modeling toolkit that allows researchers and +developers to train custom models for translation, summarization, language +modeling and other text generation tasks. + +We provide reference implementations of various sequence modeling papers: + +
+List of implemented papers:
+ +* **Convolutional Neural Networks (CNN)** + + [Language Modeling with Gated Convolutional Networks (Dauphin et al., 2017)](examples/language_model/conv_lm/README.md) + + [Convolutional Sequence to Sequence Learning (Gehring et al., 2017)](examples/conv_seq2seq/README.md) + + [Classical Structured Prediction Losses for Sequence to Sequence Learning (Edunov et al., 2018)](https://github.com/pytorch/fairseq/tree/classic_seqlevel) + + [Hierarchical Neural Story Generation (Fan et al., 2018)](examples/stories/README.md) + + [wav2vec: Unsupervised Pre-training for Speech Recognition (Schneider et al., 2019)](examples/wav2vec/README.md) +* **LightConv and DynamicConv models** + + [Pay Less Attention with Lightweight and Dynamic Convolutions (Wu et al., 2019)](examples/pay_less_attention_paper/README.md) +* **Long Short-Term Memory (LSTM) networks** + + Effective Approaches to Attention-based Neural Machine Translation (Luong et al., 2015) +* **Transformer (self-attention) networks** + + Attention Is All You Need (Vaswani et al., 2017) + + [Scaling Neural Machine Translation (Ott et al., 2018)](examples/scaling_nmt/README.md) + + [Understanding Back-Translation at Scale (Edunov et al., 2018)](examples/backtranslation/README.md) + + [Adaptive Input Representations for Neural Language Modeling (Baevski and Auli, 2018)](examples/language_model/README.adaptive_inputs.md) + + [Lexically constrained decoding with dynamic beam allocation (Post & Vilar, 2018)](examples/constrained_decoding/README.md) + + [Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context (Dai et al., 2019)](examples/truncated_bptt/README.md) + + [Adaptive Attention Span in Transformers (Sukhbaatar et al., 2019)](examples/adaptive_span/README.md) + + [Mixture Models for Diverse Machine Translation: Tricks of the Trade (Shen et al., 2019)](examples/translation_moe/README.md) + + [RoBERTa: A Robustly Optimized BERT Pretraining Approach (Liu et al., 2019)](examples/roberta/README.md) + + [Facebook FAIR's WMT19 News Translation Task Submission (Ng et al., 2019)](examples/wmt19/README.md) + + [Jointly Learning to Align and Translate with Transformer Models (Garg et al., 2019)](examples/joint_alignment_translation/README.md ) + + [Multilingual Denoising Pre-training for Neural Machine Translation (Liu et at., 2020)](examples/mbart/README.md) + + [Neural Machine Translation with Byte-Level Subwords (Wang et al., 2020)](examples/byte_level_bpe/README.md) + + [Unsupervised Quality Estimation for Neural Machine Translation (Fomicheva et al., 2020)](examples/unsupervised_quality_estimation/README.md) + + [wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations (Baevski et al., 2020)](examples/wav2vec/README.md) + + [Generating Medical Reports from Patient-Doctor Conversations Using Sequence-to-Sequence Models (Enarvi et al., 2020)](examples/pointer_generator/README.md) + + [Linformer: Self-Attention with Linear Complexity (Wang et al., 2020)](examples/linformer/README.md) + + [Cross-lingual Retrieval for Iterative Self-Supervised Training (Tran et al., 2020)](examples/criss/README.md) + + [Deep Transformers with Latent Depth (Li et al., 2020)](examples/latent_depth/README.md) +* **Non-autoregressive Transformers** + + Non-Autoregressive Neural Machine Translation (Gu et al., 2017) + + Deterministic Non-Autoregressive Neural Sequence Modeling by Iterative Refinement (Lee et al. 2018) + + Insertion Transformer: Flexible Sequence Generation via Insertion Operations (Stern et al. 
2019) + + Mask-Predict: Parallel Decoding of Conditional Masked Language Models (Ghazvininejad et al., 2019) + + [Levenshtein Transformer (Gu et al., 2019)](examples/nonautoregressive_translation/README.md) +* **Finetuning** + + [Better Fine-Tuning by Reducing Representational Collapse (Aghajanyan et al. 2020)](examples/rxf/README.md) + +

+ +### What's New: + +* June 2021 [Released XLMR-XL and XLMR-XXL models](examples/xlmr/README.md) +* March 2021 [Added full parameter and optimizer state sharding + CPU offloading](examples/fully_sharded_data_parallel/README.md) +* February 2021 [Added LASER training code](examples/laser/README.md) +* December 2020: [Added Adaptive Attention Span code](examples/adaptive_span/README.md) +* December 2020: [GottBERT model and code released](examples/gottbert/README.md) +* November 2020: Adopted the [Hydra](https://github.com/facebookresearch/hydra) configuration framework + * [see documentation explaining how to use it for new and existing projects](docs/hydra_integration.md) +* November 2020: [fairseq 0.10.0 released](https://github.com/pytorch/fairseq/releases/tag/v0.10.0) +* October 2020: [Added R3F/R4F (Better Fine-Tuning) code](examples/rxf/README.md) +* October 2020: [Deep Transformer with Latent Depth code released](examples/latent_depth/README.md) +* October 2020: [Added CRISS models and code](examples/criss/README.md) + +
+Previous updates:
+ +* September 2020: [Added Linformer code](examples/linformer/README.md) +* September 2020: [Added pointer-generator networks](examples/pointer_generator/README.md) +* August 2020: [Added lexically constrained decoding](examples/constrained_decoding/README.md) +* August 2020: [wav2vec2 models and code released](examples/wav2vec/README.md) +* July 2020: [Unsupervised Quality Estimation code released](examples/unsupervised_quality_estimation/README.md) +* May 2020: [Follow fairseq on Twitter](https://twitter.com/fairseq) +* April 2020: [Monotonic Multihead Attention code released](examples/simultaneous_translation/README.md) +* April 2020: [Quant-Noise code released](examples/quant_noise/README.md) +* April 2020: [Initial model parallel support and 11B parameters unidirectional LM released](examples/megatron_11b/README.md) +* March 2020: [Byte-level BPE code released](examples/byte_level_bpe/README.md) +* February 2020: [mBART model and code released](examples/mbart/README.md) +* February 2020: [Added tutorial for back-translation](https://github.com/pytorch/fairseq/tree/master/examples/backtranslation#training-your-own-model-wmt18-english-german) +* December 2019: [fairseq 0.9.0 released](https://github.com/pytorch/fairseq/releases/tag/v0.9.0) +* November 2019: [VizSeq released (a visual analysis toolkit for evaluating fairseq models)](https://facebookresearch.github.io/vizseq/docs/getting_started/fairseq_example) +* November 2019: [CamemBERT model and code released](examples/camembert/README.md) +* November 2019: [BART model and code released](examples/bart/README.md) +* November 2019: [XLM-R models and code released](examples/xlmr/README.md) +* September 2019: [Nonautoregressive translation code released](examples/nonautoregressive_translation/README.md) +* August 2019: [WMT'19 models released](examples/wmt19/README.md) +* July 2019: fairseq relicensed under MIT license +* July 2019: [RoBERTa models and code released](examples/roberta/README.md) +* June 2019: [wav2vec models and code released](examples/wav2vec/README.md) + +

+ +### Features: + +* multi-GPU training on one machine or across multiple machines (data and model parallel) +* fast generation on both CPU and GPU with multiple search algorithms implemented: + + beam search + + Diverse Beam Search ([Vijayakumar et al., 2016](https://arxiv.org/abs/1610.02424)) + + sampling (unconstrained, top-k and top-p/nucleus) + + [lexically constrained decoding](examples/constrained_decoding/README.md) (Post & Vilar, 2018) +* [gradient accumulation](https://fairseq.readthedocs.io/en/latest/getting_started.html#large-mini-batch-training-with-delayed-updates) enables training with large mini-batches even on a single GPU +* [mixed precision training](https://fairseq.readthedocs.io/en/latest/getting_started.html#training-with-half-precision-floating-point-fp16) (trains faster with less GPU memory on [NVIDIA tensor cores](https://developer.nvidia.com/tensor-cores)) +* [extensible](https://fairseq.readthedocs.io/en/latest/overview.html): easily register new models, criterions, tasks, optimizers and learning rate schedulers +* [flexible configuration](docs/hydra_integration.md) based on [Hydra](https://github.com/facebookresearch/hydra) allowing a combination of code, command-line and file based configuration +* [full parameter and optimizer state sharding](examples/fully_sharded_data_parallel/README.md) +* [offloading parameters to CPU](examples/fully_sharded_data_parallel/README.md) + +We also provide [pre-trained models for translation and language modeling](#pre-trained-models-and-examples) +with a convenient `torch.hub` interface: + +``` python +en2de = torch.hub.load('pytorch/fairseq', 'transformer.wmt19.en-de.single_model') +en2de.translate('Hello world', beam=5) +# 'Hallo Welt' +``` + +See the PyTorch Hub tutorials for [translation](https://pytorch.org/hub/pytorch_fairseq_translation/) +and [RoBERTa](https://pytorch.org/hub/pytorch_fairseq_roberta/) for more examples. + +# Requirements and Installation + +* [PyTorch](http://pytorch.org/) version >= 1.5.0 +* Python version >= 3.6 +* For training new models, you'll also need an NVIDIA GPU and [NCCL](https://github.com/NVIDIA/nccl) +* **To install fairseq** and develop locally: + +``` bash +git clone https://github.com/pytorch/fairseq +cd fairseq +pip install --editable ./ + +# on MacOS: +# CFLAGS="-stdlib=libc++" pip install --editable ./ + +# to install the latest stable release (0.10.x) +# pip install fairseq +``` + +* **For faster training** install NVIDIA's [apex](https://github.com/NVIDIA/apex) library: + +``` bash +git clone https://github.com/NVIDIA/apex +cd apex +pip install -v --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" \ + --global-option="--deprecated_fused_adam" --global-option="--xentropy" \ + --global-option="--fast_multihead_attn" ./ +``` + +* **For large datasets** install [PyArrow](https://arrow.apache.org/docs/python/install.html#using-pip): `pip install pyarrow` +* If you use Docker make sure to increase the shared memory size either with `--ipc=host` or `--shm-size` + as command line options to `nvidia-docker run` . + +# Getting Started + +The [full documentation](https://fairseq.readthedocs.io/) contains instructions +for getting started, training new models and extending fairseq with new model +types and tasks. + +# Pre-trained models and examples + +We provide pre-trained models and pre-processed, binarized test sets for several tasks listed below, +as well as example training and evaluation commands. 
+ +* [Translation](examples/translation/README.md): convolutional and transformer models are available +* [Language Modeling](examples/language_model/README.md): convolutional and transformer models are available + +We also have more detailed READMEs to reproduce results from specific papers: + +* [Cross-lingual Retrieval for Iterative Self-Supervised Training (Tran et al., 2020)](examples/criss/README.md) +* [wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations (Baevski et al., 2020)](examples/wav2vec/README.md) +* [Unsupervised Quality Estimation for Neural Machine Translation (Fomicheva et al., 2020)](examples/unsupervised_quality_estimation/README.md) +* [Training with Quantization Noise for Extreme Model Compression ({Fan*, Stock*} et al., 2020)](examples/quant_noise/README.md) +* [Neural Machine Translation with Byte-Level Subwords (Wang et al., 2020)](examples/byte_level_bpe/README.md) +* [Multilingual Denoising Pre-training for Neural Machine Translation (Liu et at., 2020)](examples/mbart/README.md) +* [Reducing Transformer Depth on Demand with Structured Dropout (Fan et al., 2019)](examples/layerdrop/README.md) +* [Jointly Learning to Align and Translate with Transformer Models (Garg et al., 2019)](examples/joint_alignment_translation/README.md) +* [Levenshtein Transformer (Gu et al., 2019)](examples/nonautoregressive_translation/README.md) +* [Facebook FAIR's WMT19 News Translation Task Submission (Ng et al., 2019)](examples/wmt19/README.md) +* [RoBERTa: A Robustly Optimized BERT Pretraining Approach (Liu et al., 2019)](examples/roberta/README.md) +* [wav2vec: Unsupervised Pre-training for Speech Recognition (Schneider et al., 2019)](examples/wav2vec/README.md) +* [Mixture Models for Diverse Machine Translation: Tricks of the Trade (Shen et al., 2019)](examples/translation_moe/README.md) +* [Pay Less Attention with Lightweight and Dynamic Convolutions (Wu et al., 2019)](examples/pay_less_attention_paper/README.md) +* [Understanding Back-Translation at Scale (Edunov et al., 2018)](examples/backtranslation/README.md) +* [Classical Structured Prediction Losses for Sequence to Sequence Learning (Edunov et al., 2018)](https://github.com/pytorch/fairseq/tree/classic_seqlevel) +* [Hierarchical Neural Story Generation (Fan et al., 2018)](examples/stories/README.md) +* [Scaling Neural Machine Translation (Ott et al., 2018)](examples/scaling_nmt/README.md) +* [Convolutional Sequence to Sequence Learning (Gehring et al., 2017)](examples/conv_seq2seq/README.md) +* [Language Modeling with Gated Convolutional Networks (Dauphin et al., 2017)](examples/language_model/README.conv.md) + +# Join the fairseq community + +* Twitter: https://twitter.com/fairseq +* Facebook page: https://www.facebook.com/groups/fairseq.users +* Google group: https://groups.google.com/forum/#!forum/fairseq-users + +# License + +fairseq(-py) is MIT-licensed. +The license applies to the pre-trained models as well. 
+ +# Citation + +Please cite as: + +``` bibtex +@inproceedings{ott2019fairseq, + title = {fairseq: A Fast, Extensible Toolkit for Sequence Modeling}, + author = {Myle Ott and Sergey Edunov and Alexei Baevski and Angela Fan and Sam Gross and Nathan Ng and David Grangier and Michael Auli}, + booktitle = {Proceedings of NAACL-HLT 2019: Demonstrations}, + year = {2019}, +} +``` diff --git a/SpeechT5/fairseq/docs/Makefile b/SpeechT5/fairseq/docs/Makefile new file mode 100644 index 0000000000000000000000000000000000000000..c2f5b1a89cfc9e02d1bb09027d9e1e520ba53d53 --- /dev/null +++ b/SpeechT5/fairseq/docs/Makefile @@ -0,0 +1,20 @@ +# Minimal makefile for Sphinx documentation +# + +# You can set these variables from the command line. +SPHINXOPTS = +SPHINXBUILD = python -msphinx +SPHINXPROJ = fairseq +SOURCEDIR = . +BUILDDIR = _build + +# Put it first so that "make" without argument is like "make help". +help: + @$(SPHINXBUILD) -M help "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O) + +.PHONY: help Makefile + +# Catch-all target: route all unknown targets to Sphinx using the new +# "make mode" option. $(O) is meant as a shortcut for $(SPHINXOPTS). +%: Makefile + @$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O) \ No newline at end of file diff --git a/SpeechT5/fairseq/docs/_static/theme_overrides.css b/SpeechT5/fairseq/docs/_static/theme_overrides.css new file mode 100644 index 0000000000000000000000000000000000000000..2a0764193625e1a6fd66ff8af2ccdd0ad6369188 --- /dev/null +++ b/SpeechT5/fairseq/docs/_static/theme_overrides.css @@ -0,0 +1,9 @@ +.wy-table-responsive table td kbd { + white-space: nowrap; +} +.wy-table-responsive table td { + white-space: normal !important; +} +.wy-table-responsive { + overflow: visible !important; +} diff --git a/SpeechT5/fairseq/docs/command_line_tools.rst b/SpeechT5/fairseq/docs/command_line_tools.rst new file mode 100644 index 0000000000000000000000000000000000000000..c16300ff5cd42d9a6c0070c2d9bec3a802eacfad --- /dev/null +++ b/SpeechT5/fairseq/docs/command_line_tools.rst @@ -0,0 +1,85 @@ +.. _Command-line Tools: + +Command-line Tools +================== + +Fairseq provides several command-line tools for training and evaluating models: + +- :ref:`fairseq-preprocess`: Data pre-processing: build vocabularies and binarize training data +- :ref:`fairseq-train`: Train a new model on one or multiple GPUs +- :ref:`fairseq-generate`: Translate pre-processed data with a trained model +- :ref:`fairseq-interactive`: Translate raw text with a trained model +- :ref:`fairseq-score`: BLEU scoring of generated translations against reference translations +- :ref:`fairseq-eval-lm`: Language model evaluation + + +.. _fairseq-preprocess: + +fairseq-preprocess +~~~~~~~~~~~~~~~~~~ +.. automodule:: fairseq_cli.preprocess + + .. argparse:: + :module: fairseq.options + :func: get_preprocessing_parser + :prog: fairseq-preprocess + + +.. _fairseq-train: + +fairseq-train +~~~~~~~~~~~~~ +.. automodule:: fairseq_cli.train + + .. argparse:: + :module: fairseq.options + :func: get_training_parser + :prog: fairseq-train + + +.. _fairseq-generate: + +fairseq-generate +~~~~~~~~~~~~~~~~ +.. automodule:: fairseq_cli.generate + + .. argparse:: + :module: fairseq.options + :func: get_generation_parser + :prog: fairseq-generate + + +.. _fairseq-interactive: + +fairseq-interactive +~~~~~~~~~~~~~~~~~~~ +.. automodule:: fairseq_cli.interactive + + .. argparse:: + :module: fairseq.options + :func: get_interactive_generation_parser + :prog: fairseq-interactive + + +.. 
_fairseq-score: + +fairseq-score +~~~~~~~~~~~~~ +.. automodule:: fairseq_cli.score + + .. argparse:: + :module: fairseq_cli.score + :func: get_parser + :prog: fairseq-score + + +.. _fairseq-eval-lm: + +fairseq-eval-lm +~~~~~~~~~~~~~~~ +.. automodule:: fairseq_cli.eval_lm + + .. argparse:: + :module: fairseq.options + :func: get_eval_lm_parser + :prog: fairseq-eval-lm diff --git a/SpeechT5/fairseq/docs/conf.py b/SpeechT5/fairseq/docs/conf.py new file mode 100644 index 0000000000000000000000000000000000000000..440784bfae96c14e9050542b1b1921a75a3b4b27 --- /dev/null +++ b/SpeechT5/fairseq/docs/conf.py @@ -0,0 +1,134 @@ +#!/usr/bin/env python3 +# -*- coding: utf-8 -*- +# +# fairseq documentation build configuration file, created by +# sphinx-quickstart on Fri Aug 17 21:45:30 2018. +# +# This file is execfile()d with the current directory set to its +# containing dir. +# +# Note that not all possible configuration values are present in this +# autogenerated file. +# +# All configuration values have a default; values that are commented out +# serve to show the default. + +# If extensions (or modules to document with autodoc) are in another directory, +# add these directories to sys.path here. If the directory is relative to the +# documentation root, use os.path.abspath to make it absolute, like shown here. + +import os +import sys +from fairseq import __version__ + + +# source code directory, relative to this file, for sphinx-autobuild +sys.path.insert(0, os.path.abspath("..")) + +source_suffix = [".rst"] + +# -- General configuration ------------------------------------------------ + +# If your documentation needs a minimal Sphinx version, state it here. +# +# needs_sphinx = '1.0' + +# Add any Sphinx extension module names here, as strings. They can be +# extensions coming with Sphinx (named 'sphinx.ext.*') or your custom +# ones. +extensions = [ + "sphinx.ext.autodoc", + "sphinx.ext.intersphinx", + "sphinx.ext.viewcode", + "sphinx.ext.napoleon", + "sphinxarg.ext", +] + +# Add any paths that contain templates here, relative to this directory. +templates_path = ["_templates"] + +# The master toctree document. +master_doc = "index" + +# General information about the project. +project = "fairseq" +copyright = "Facebook AI Research (FAIR)" +author = "Facebook AI Research (FAIR)" + +github_doc_root = "https://github.com/pytorch/fairseq/tree/master/docs/" + +# The version info for the project you're documenting, acts as replacement for +# |version| and |release|, also used in various other places throughout the +# built documents. +# +# The short X.Y version. +version = __version__ +# The full version, including alpha/beta/rc tags. +release = __version__ + +# The language for content autogenerated by Sphinx. Refer to documentation +# for a list of supported languages. +# +# This is also used if you do content translation via gettext catalogs. +# Usually you set "language" from the command line for these cases. +language = None + +# List of patterns, relative to source directory, that match files and +# directories to ignore when looking for source files. +# This patterns also effect to html_static_path and html_extra_path +exclude_patterns = ["_build", "Thumbs.db", ".DS_Store"] + +# The name of the Pygments (syntax highlighting) style to use. +pygments_style = "sphinx" +highlight_language = "python" + +# If true, `todo` and `todoList` produce output, else they produce nothing. 
+todo_include_todos = False + + +# -- Options for HTML output ---------------------------------------------- + +# The theme to use for HTML and HTML Help pages. See the documentation for +# a list of builtin themes. +# +html_theme = "sphinx_rtd_theme" + +# Theme options are theme-specific and customize the look and feel of a theme +# further. For a list of options available for each theme, see the +# documentation. +# +# html_theme_options = {} + +# Add any paths that contain custom static files (such as style sheets) here, +# relative to this directory. They are copied after the builtin static files, +# so a file named "default.css" will overwrite the builtin "default.css". +html_static_path = ["_static"] + +html_context = { + "css_files": [ + "_static/theme_overrides.css", # override wide tables in RTD theme + ], +} + +# Custom sidebar templates, must be a dictionary that maps document names +# to template names. +# +# This is required for the alabaster theme +# refs: http://alabaster.readthedocs.io/en/latest/installation.html#sidebars +# html_sidebars = { +# '**': [ +# 'about.html', +# 'navigation.html', +# 'relations.html', # needs 'show_related': True theme option to display +# 'searchbox.html', +# 'donate.html', +# ] +# } + + +# Example configuration for intersphinx: refer to the Python standard library. +intersphinx_mapping = { + "numpy": ("http://docs.scipy.org/doc/numpy/", None), + "python": ("https://docs.python.org/", None), + "torch": ("https://pytorch.org/docs/master/", None), +} diff --git a/SpeechT5/fairseq/docs/criterions.rst b/SpeechT5/fairseq/docs/criterions.rst new file mode 100644 index 0000000000000000000000000000000000000000..d6b8ca6b671a32d0da4aca7b18626e0df58a7258 --- /dev/null +++ b/SpeechT5/fairseq/docs/criterions.rst @@ -0,0 +1,31 @@ +.. role:: hidden + :class: hidden-section + +.. _Criterions: + +Criterions +========== + +Criterions compute the loss function given the model and batch, roughly:: + + loss = criterion(model, batch) + +.. automodule:: fairseq.criterions + :members: + +.. autoclass:: fairseq.criterions.FairseqCriterion + :members: + :undoc-members: + +.. autoclass:: fairseq.criterions.adaptive_loss.AdaptiveLoss + :members: + :undoc-members: +.. autoclass:: fairseq.criterions.composite_loss.CompositeLoss + :members: + :undoc-members: +.. autoclass:: fairseq.criterions.cross_entropy.CrossEntropyCriterion + :members: + :undoc-members: +.. autoclass:: fairseq.criterions.label_smoothed_cross_entropy.LabelSmoothedCrossEntropyCriterion + :members: + :undoc-members: diff --git a/SpeechT5/fairseq/docs/data.rst b/SpeechT5/fairseq/docs/data.rst new file mode 100644 index 0000000000000000000000000000000000000000..6a390cb336ab3c5fb28edec7448abc35a8e22bbb --- /dev/null +++ b/SpeechT5/fairseq/docs/data.rst @@ -0,0 +1,58 @@ +.. role:: hidden + :class: hidden-section + +.. module:: fairseq.data + +Data Loading and Utilities +========================== + +.. _datasets: + +Datasets +-------- + +**Datasets** define the data format and provide helpers for creating +mini-batches. + +.. autoclass:: fairseq.data.FairseqDataset + :members: +.. autoclass:: fairseq.data.LanguagePairDataset + :members: +.. autoclass:: fairseq.data.MonolingualDataset + :members: + +**Helper Datasets** + +These datasets wrap other :class:`fairseq.data.FairseqDataset` instances and +provide additional functionality: + +.. autoclass:: fairseq.data.BacktranslationDataset + :members: +.. autoclass:: fairseq.data.ConcatDataset + :members: +.. autoclass:: fairseq.data.ResamplingDataset + :members: +.. 
autoclass:: fairseq.data.RoundRobinZipDatasets + :members: +.. autoclass:: fairseq.data.TransformEosDataset + :members: + + +Dictionary +---------- + +.. autoclass:: fairseq.data.Dictionary + :members: + + +Iterators +--------- + +.. autoclass:: fairseq.data.CountingIterator + :members: +.. autoclass:: fairseq.data.EpochBatchIterator + :members: +.. autoclass:: fairseq.data.GroupedIterator + :members: +.. autoclass:: fairseq.data.ShardedIterator + :members: diff --git a/SpeechT5/fairseq/docs/docutils.conf b/SpeechT5/fairseq/docs/docutils.conf new file mode 100644 index 0000000000000000000000000000000000000000..526acffd32d16217160aee917db2b120354f20f0 --- /dev/null +++ b/SpeechT5/fairseq/docs/docutils.conf @@ -0,0 +1,2 @@ +[writers] +option-limit=0 diff --git a/SpeechT5/fairseq/docs/fairseq.gif b/SpeechT5/fairseq/docs/fairseq.gif new file mode 100644 index 0000000000000000000000000000000000000000..5782fdbc7e0014564725c3ad0fc6be5c6bcd9983 Binary files /dev/null and b/SpeechT5/fairseq/docs/fairseq.gif differ diff --git a/SpeechT5/fairseq/docs/fairseq_logo.png b/SpeechT5/fairseq/docs/fairseq_logo.png new file mode 100644 index 0000000000000000000000000000000000000000..75472cbb5ff78acc8716ad9121ed421f17f96c9a Binary files /dev/null and b/SpeechT5/fairseq/docs/fairseq_logo.png differ diff --git a/SpeechT5/fairseq/docs/getting_started.rst b/SpeechT5/fairseq/docs/getting_started.rst new file mode 100644 index 0000000000000000000000000000000000000000..745ad7763cee67a8dec25bdd7ba7b79cbe0b7754 --- /dev/null +++ b/SpeechT5/fairseq/docs/getting_started.rst @@ -0,0 +1,216 @@ +Evaluating Pre-trained Models +============================= + +First, download a pre-trained model along with its vocabularies: + +.. code-block:: console + + > curl https://dl.fbaipublicfiles.com/fairseq/models/wmt14.v2.en-fr.fconv-py.tar.bz2 | tar xvjf - + +This model uses a `Byte Pair Encoding (BPE) +vocabulary `__, so we'll have to apply +the encoding to the source text before it can be translated. This can be +done with the +`apply\_bpe.py `__ +script using the ``wmt14.en-fr.fconv-cuda/bpecodes`` file. ``@@`` is +used as a continuation marker and the original text can be easily +recovered with e.g. ``sed s/@@ //g`` or by passing the ``--remove-bpe`` +flag to :ref:`fairseq-generate`. Prior to BPE, input text needs to be tokenized +using ``tokenizer.perl`` from +`mosesdecoder `__. + +Let's use :ref:`fairseq-interactive` to generate translations interactively. +Here, we use a beam size of 5 and preprocess the input with the Moses +tokenizer and the given Byte-Pair Encoding vocabulary. It will automatically +remove the BPE continuation markers and detokenize the output. + +.. code-block:: console + + > MODEL_DIR=wmt14.en-fr.fconv-py + > fairseq-interactive \ + --path $MODEL_DIR/model.pt $MODEL_DIR \ + --beam 5 --source-lang en --target-lang fr \ + --tokenizer moses \ + --bpe subword_nmt --bpe-codes $MODEL_DIR/bpecodes + | loading model(s) from wmt14.en-fr.fconv-py/model.pt + | [en] dictionary: 44206 types + | [fr] dictionary: 44463 types + | Type the input sentence and press return: + Why is it rare to discover new marine mammal species? + S-0 Why is it rare to discover new marine mam@@ mal species ? + H-0 -0.0643349438905716 Pourquoi est-il rare de découvrir de nouvelles espèces de mammifères marins? 
+ P-0 -0.0763 -0.1849 -0.0956 -0.0946 -0.0735 -0.1150 -0.1301 -0.0042 -0.0321 -0.0171 -0.0052 -0.0062 -0.0015 + +This generation script produces three types of outputs: a line prefixed +with *O* is a copy of the original source sentence; *H* is the +hypothesis along with an average log-likelihood; and *P* is the +positional score per token position, including the +end-of-sentence marker which is omitted from the text. + +Other types of output lines you might see are *D*, the detokenized hypothesis, +*T*, the reference target, *A*, alignment info, *E* the history of generation steps. + +See the `README `__ for a +full list of pre-trained models available. + +Training a New Model +==================== + +The following tutorial is for machine translation. For an example of how +to use Fairseq for other tasks, such as :ref:`language modeling`, please see the +``examples/`` directory. + +Data Pre-processing +------------------- + +Fairseq contains example pre-processing scripts for several translation +datasets: IWSLT 2014 (German-English), WMT 2014 (English-French) and WMT +2014 (English-German). To pre-process and binarize the IWSLT dataset: + +.. code-block:: console + + > cd examples/translation/ + > bash prepare-iwslt14.sh + > cd ../.. + > TEXT=examples/translation/iwslt14.tokenized.de-en + > fairseq-preprocess --source-lang de --target-lang en \ + --trainpref $TEXT/train --validpref $TEXT/valid --testpref $TEXT/test \ + --destdir data-bin/iwslt14.tokenized.de-en + +This will write binarized data that can be used for model training to +``data-bin/iwslt14.tokenized.de-en``. + +Training +-------- + +Use :ref:`fairseq-train` to train a new model. Here a few example settings that work +well for the IWSLT 2014 dataset: + +.. code-block:: console + + > mkdir -p checkpoints/fconv + > CUDA_VISIBLE_DEVICES=0 fairseq-train data-bin/iwslt14.tokenized.de-en \ + --optimizer nag --lr 0.25 --clip-norm 0.1 --dropout 0.2 --max-tokens 4000 \ + --arch fconv_iwslt_de_en --save-dir checkpoints/fconv + +By default, :ref:`fairseq-train` will use all available GPUs on your machine. Use the +``CUDA_VISIBLE_DEVICES`` environment variable to select specific GPUs and/or to +change the number of GPU devices that will be used. + +Also note that the batch size is specified in terms of the maximum +number of tokens per batch (``--max-tokens``). You may need to use a +smaller value depending on the available GPU memory on your system. + +Generation +---------- + +Once your model is trained, you can generate translations using +:ref:`fairseq-generate` **(for binarized data)** or +:ref:`fairseq-interactive` **(for raw text)**: + +.. code-block:: console + + > fairseq-generate data-bin/iwslt14.tokenized.de-en \ + --path checkpoints/fconv/checkpoint_best.pt \ + --batch-size 128 --beam 5 + | [de] dictionary: 35475 types + | [en] dictionary: 24739 types + | data-bin/iwslt14.tokenized.de-en test 6750 examples + | model fconv + | loaded checkpoint trainings/fconv/checkpoint_best.pt + S-721 danke . + T-721 thank you . + ... + +To generate translations with only a CPU, use the ``--cpu`` flag. BPE +continuation markers can be removed with the ``--remove-bpe`` flag. + +Advanced Training Options +========================= + +Large mini-batch training with delayed updates +---------------------------------------------- + +The ``--update-freq`` option can be used to accumulate gradients from +multiple mini-batches and delay updating, creating a larger effective +batch size. 
Delayed updates can also improve training speed by reducing +inter-GPU communication costs and by saving idle time caused by variance +in workload across GPUs. See `Ott et al. +(2018) `__ for more details. + +To train on a single GPU with an effective batch size that is equivalent +to training on 8 GPUs: + +.. code-block:: console + + > CUDA_VISIBLE_DEVICES=0 fairseq-train --update-freq 8 (...) + +Training with half precision floating point (FP16) +-------------------------------------------------- + +.. note:: + + FP16 training requires a Volta GPU and CUDA 9.1 or greater + +Recent GPUs enable efficient half precision floating point computation, +e.g., using `Nvidia Tensor Cores +`__. +Fairseq supports FP16 training with the ``--fp16`` flag: + +.. code-block:: console + + > fairseq-train --fp16 (...) + +Distributed training +-------------------- + +Distributed training in fairseq is implemented on top of ``torch.distributed``. +The easiest way to launch jobs is with the `torch.distributed.launch +`__ tool. + +For example, to train a large English-German Transformer model on 2 nodes each +with 8 GPUs (in total 16 GPUs), run the following command on each node, +replacing ``node_rank=0`` with ``node_rank=1`` on the second node and making +sure to update ``--master_addr`` to the IP address of the first node: + +.. code-block:: console + + > python -m torch.distributed.launch --nproc_per_node=8 \ + --nnodes=2 --node_rank=0 --master_addr="192.168.1.1" \ + --master_port=12345 \ + $(which fairseq-train) data-bin/wmt16_en_de_bpe32k \ + --arch transformer_vaswani_wmt_en_de_big --share-all-embeddings \ + --optimizer adam --adam-betas '(0.9, 0.98)' --clip-norm 0.0 \ + --lr-scheduler inverse_sqrt --warmup-init-lr 1e-07 --warmup-updates 4000 \ + --lr 0.0005 \ + --dropout 0.3 --weight-decay 0.0 --criterion label_smoothed_cross_entropy --label-smoothing 0.1 \ + --max-tokens 3584 \ + --max-epoch 70 \ + --fp16 + +On SLURM clusters, fairseq will automatically detect the number of nodes and +GPUs, but a port number must be provided: + +.. code-block:: console + + > salloc --gpus=16 --nodes 2 (...) + > srun fairseq-train --distributed-port 12345 (...). + +Sharding very large datasets +---------------------------- + +It can be challenging to train over very large datasets, particularly if your +machine does not have much system RAM. Most tasks in fairseq support training +over "sharded" datasets, in which the original dataset has been preprocessed +into non-overlapping chunks (or "shards"). + +For example, instead of preprocessing all your data into a single "data-bin" +directory, you can split the data and create "data-bin1", "data-bin2", etc. +Then you can adapt your training command like so: + +.. code-block:: console + + > fairseq-train data-bin1:data-bin2:data-bin3 (...) + +Training will now iterate over each shard, one by one, with each shard +corresponding to an "epoch", thus reducing system memory usage. diff --git a/SpeechT5/fairseq/docs/hydra_integration.md b/SpeechT5/fairseq/docs/hydra_integration.md new file mode 100644 index 0000000000000000000000000000000000000000..6a15298382a6a16dfc4c5a4a812ea1cd0477ed52 --- /dev/null +++ b/SpeechT5/fairseq/docs/hydra_integration.md @@ -0,0 +1,284 @@ +## Hydra + +[Hydra](https://github.com/facebookresearch/hydra) is an open-source Python +framework that simplifies the development of research and other complex +applications. 
The key feature is the ability to dynamically create a
hierarchical configuration by composition and override it through config files
and the command line. The name Hydra comes from its ability to run multiple
similar jobs - much like a Hydra with multiple heads.

## Motivation

Until recently, all components in fairseq were configured through a shared
`args` namespace that was created at application startup. Components declared
their own `add_args` method to update the argparse parser, hoping that the names
would not clash with arguments from other components. While this model works for
smaller applications, it became problematic as fairseq grew and became
integrated into other applications. In order to determine how to configure each
component, one needed to a) examine what args were added by this component, and
b) read the code to figure out what shared arguments it was using that were
added elsewhere. Reproducing models involved sharing commands that often
contained dozens of command-line switches.

The model described above is still supported by fairseq for backward
compatibility, but will be deprecated some time in the future.

New components in fairseq should now create a dataclass that encapsulates all
parameters required to configure this component. The dataclass is registered
along with the component, and fairseq takes care of constructing and providing
this configuration object to the component's constructor. Note that sharing
parameters can optionally still work, but one has to explicitly point to the
"source of truth" (see the inheritance example below). These changes make
components in fairseq more independent and reusable by other applications: all
that is needed to create a component is to initialize its dataclass and
overwrite some of the defaults.

While configuring fairseq through the command line (using either the legacy
argparse-based or the new Hydra-based entry points) is still fully supported,
you can now take advantage of configuring fairseq completely or piece-by-piece
through hierarchical YAML configuration files. These files can also be shipped
as examples that others can use to run an identically configured job.

Additionally, Hydra has a rich and growing [library of
plugins](https://github.com/facebookresearch/hydra/tree/master/plugins) that
provide functionality such as hyperparameter sweeping (including using Bayesian
optimization through the [Ax](https://github.com/facebook/Ax) library), job
launching across various platforms, and more.

## Creating or migrating components

In general, each new (or updated) component should provide a companion
[dataclass](https://www.python.org/dev/peps/pep-0557/). These dataclasses are
typically located in the same file as the component and are passed as arguments
to the `register_*()` functions. Top-level configs that should be present in
every fairseq application are placed in the
[global](fairseq/dataclass/configs.py) config file and added to the
`FairseqConfig` object.

Each dataclass is a plain-old-data object, similar to a `NamedTuple`. These
classes are decorated with a `@dataclass` decorator, and typically inherit from
`FairseqDataclass` (which adds some functionality for backward compatibility).
Each field must have a type, and generally has metadata (such as a help string)
and a default value. Only primitive types or other config objects are allowed as
data types for each field.

#### Example:

```python
from dataclasses import dataclass, field
from fairseq.dataclass import FairseqDataclass

@dataclass
class InteractiveConfig(FairseqDataclass):
    buffer_size: int = field(
        default=0,
        metadata={
            "help": "read this many sentences into a buffer before processing them"
        },
    )
    input: str = field(
        default="-",
        metadata={"help": "file to read from; use - for stdin"},
    )
```

### Inheriting values

Some components require sharing a value. For example, a learning rate scheduler
and an optimizer may both need to know the initial learning rate value. One can
declare a field that, by default, will inherit its value from another config
node in the same hierarchy:

```python
@dataclass
class FairseqAdamConfig(FairseqDataclass):
    ...
    lr: List[float] = II("optimization.lr")
    ...
```

`II("optimization.lr")` is syntactic sugar for `"${optimization.lr}"`, which is
the value one can use in a YAML config file or on the command line to achieve
the same effect. Note that this assumes that there is an "optimization" config
object in the root config and that it has a field called "lr".

### Tasks and Models

Creating Tasks and Models works the same as before, except that legacy
implementations now inherit from `LegacyFairseq*` base classes, while new
components inherit from `FairseqTask` and `FairseqModel` and provide a dataclass
to the `register_*()` functions.

#### Task example:

```python
@dataclass
class LanguageModelingConfig(FairseqDataclass):
    data: Optional[str] = field(
        default=None, metadata={"help": "path to data directory"}
    )
    ...

@register_task("language_modeling", dataclass=LanguageModelingConfig)
class LanguageModelingTask(FairseqTask):
    ...
    @classmethod
    def setup_task(cls, cfg: LanguageModelingConfig):
        ...
```

#### Model example:

```python
@dataclass
class TransformerLanguageModelConfig(FairseqDataclass):
    activation_fn: ChoiceEnum(utils.get_available_activation_fns()) = field(
        default="relu", metadata={"help": "activation function to use"}
    )
    dropout: float = field(default=0.1, metadata={"help": "dropout probability"})
    ...

@register_model("transformer_lm", dataclass=TransformerLanguageModelConfig)
class TransformerLanguageModel(FairseqLanguageModel):
    ...
    @classmethod
    def build_model(cls, cfg: TransformerLanguageModelConfig, task: FairseqTask):
        ...
```

### Other components

Other components work as before, but they now take their configuration dataclass
as the only constructor argument:

```python
@dataclass
class MosesTokenizerConfig(FairseqDataclass):
    source_lang: str = field(default="en", metadata={"help": "source language"})
    ...

@register_tokenizer("moses", dataclass=MosesTokenizerConfig)
class MosesTokenizer(object):
    def __init__(self, cfg: MosesTokenizerConfig):
        ...
```

Note that if you are adding a new registry for a new set of components, you need
to add it to the `FairseqConfig` object in `fairseq/dataclass/configs.py`:

```python
@dataclass
class FairseqConfig(object):
    ...
    my_new_registry: Any = None
```

## Training with `fairseq-hydra-train`

To fully take advantage of the configuration flexibility offered by Hydra, you
may want to train new models using the `fairseq-hydra-train` entry point. Legacy
CLI tools such as `fairseq-train` will remain supported for the foreseeable
future but will be deprecated eventually.
+ +On startup, Hydra will create a configuration object that contains a hierarchy +of all the necessary dataclasses populated with their default values in the +code. The default values are overwritten by values found in YAML files in +`fairseq/config` directory (which currently sets minimal defaults) and then +further overwritten by values provided through command line arguments. + +Some of the most common use cases are shown below: + +### 1. Override default values through command line: + +```shell script +$ fairseq-hydra-train \ + distributed_training.distributed_world_size=1 \ + dataset.batch_size=2 \ + task.data=data-bin \ + model=transformer_lm/transformer_lm_gpt \ + task=language_modeling \ + optimization.max_update=5000 +``` + +Note that along with explicitly providing values for parameters such as +`dataset.batch_size`, this also tells Hydra to overlay configuration found in +`fairseq/config/model/transformer_lm/transformer_lm_gpt.yaml` over the default +values in the dataclass. If you want to train a model without specifying a +particular architecture you can simply specify `model=transformer_lm`. This only +works for migrated tasks and models. + +### 2. Replace bundled configs with an external config: + +```shell script +$ fairseq-hydra-train \ + --config-dir /path/to/external/configs \ + --config-name wiki103 +``` + +where `/path/to/external/configs/wiki103.yaml` contains: + +```yaml +# @package _group_ + +model: + _name: transformer_lm +distributed_training: + distributed_world_size: 1 +dataset: + batch_size: 2 +task: + _name: language_modeling + data: /path/to/data + add_bos_token: false + max_target_positions: 1024 +optimization: + max_update: 50000 + lr: [ 0.25 ] +criterion: cross_entropy +optimizer: adam +lr_scheduler: + _name: cosine +``` + +Note that here bundled configs from `fairseq/config` directory are not used, +however the defaults from each dataclass will still be used (unless overwritten +by your external config). + +Additionally you can choose to break up your configs by creating a directory +structure in the same location as your main config file, with the names of the +top-level fields (such as "model", "dataset", etc), and placing config files +with meaningful names that would populate that specific section of your +top-level config file (for example, you might have +`model/small_transformer_lm.yaml`, `model/big_transformer_lm.yaml`, etc). You +can then specify the correct configuration via command line, defaults in the +main config, or even launch all of them as a sweep (see Hydra documentation on +how to do this). + +### 3. Add an external config directory to Hydra search path: + +This allows combining default configuration (including using any bundled config +files), while specifying your own config files for some parts of the +configuration. + +```shell script +$ fairseq-hydra-train \ + distributed_training.distributed_world_size=1 \ + dataset.batch_size=2 \ + task.data=/path/to/data/ \ + model=transformer_lm/2_layers \ + task=language_modeling \ + optimization.max_update=5000 \ + --config-dir /path/to/external/configs +``` + +where `/path/to/external/configs` has the following structure: +``` +. ++-- model +| +-- transformer_lm +| | +-- 2_layers.yaml +``` + +and `2_layers.yaml` contains a copy of `transformer_lm_gpt.yaml` but with +`decoder_layers` set to 2. You can add other configs to configure other +components as well. 
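
For reference, a minimal sketch of what `2_layers.yaml` could look like is shown
below. This is an abridged illustration rather than the exact file: in practice
you would copy every field from the bundled
`fairseq/config/model/transformer_lm/transformer_lm_gpt.yaml` and change only
`decoder_layers`; the specific field names and values shown here are assumptions
about that bundled config.

```yaml
# @package _group_
# Copied from the bundled transformer_lm_gpt.yaml, with decoder_layers
# overridden to 2; the remaining (omitted) fields stay unchanged.
activation_fn: gelu
decoder_embed_dim: 768
decoder_ffn_embed_dim: 3072
decoder_attention_heads: 12
decoder_layers: 2
```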
diff --git a/SpeechT5/fairseq/docs/index.rst b/SpeechT5/fairseq/docs/index.rst new file mode 100644 index 0000000000000000000000000000000000000000..591db86cdf49e6f0a7a6686df2150f11418e90d0 --- /dev/null +++ b/SpeechT5/fairseq/docs/index.rst @@ -0,0 +1,49 @@ +.. fairseq documentation master file, created by + sphinx-quickstart on Fri Aug 17 21:45:30 2018. + You can adapt this file completely to your liking, but it should at least + contain the root `toctree` directive. + +:github_url: https://github.com/pytorch/fairseq + + +fairseq documentation +===================== + +Fairseq is a sequence modeling toolkit written in `PyTorch +`_ that allows researchers and developers to +train custom models for translation, summarization, language modeling and other +text generation tasks. + +.. toctree:: + :maxdepth: 1 + :caption: Getting Started + + getting_started + command_line_tools + +.. toctree:: + :maxdepth: 1 + :caption: Extending Fairseq + + overview + tutorial_simple_lstm + tutorial_classifying_names + +.. toctree:: + :maxdepth: 2 + :caption: Library Reference + + tasks + models + criterions + optim + lr_scheduler + data + modules + + +Indices and tables +================== + +* :ref:`genindex` +* :ref:`search` diff --git a/SpeechT5/fairseq/docs/lr_scheduler.rst b/SpeechT5/fairseq/docs/lr_scheduler.rst new file mode 100644 index 0000000000000000000000000000000000000000..bbc09dc22e6a7ac05137954e0b9c80ca030f62f4 --- /dev/null +++ b/SpeechT5/fairseq/docs/lr_scheduler.rst @@ -0,0 +1,34 @@ +.. role:: hidden + :class: hidden-section + +.. _Learning Rate Schedulers: + +Learning Rate Schedulers +======================== + +Learning Rate Schedulers update the learning rate over the course of training. +Learning rates can be updated after each update via :func:`step_update` or at +epoch boundaries via :func:`step`. + +.. automodule:: fairseq.optim.lr_scheduler + :members: + +.. autoclass:: fairseq.optim.lr_scheduler.FairseqLRScheduler + :members: + :undoc-members: + +.. autoclass:: fairseq.optim.lr_scheduler.cosine_lr_scheduler.CosineSchedule + :members: + :undoc-members: +.. autoclass:: fairseq.optim.lr_scheduler.fixed_schedule.FixedSchedule + :members: + :undoc-members: +.. autoclass:: fairseq.optim.lr_scheduler.inverse_square_root_schedule.InverseSquareRootSchedule + :members: + :undoc-members: +.. autoclass:: fairseq.optim.lr_scheduler.reduce_lr_on_plateau.ReduceLROnPlateau + :members: + :undoc-members: +.. autoclass:: fairseq.optim.lr_scheduler.triangular_lr_scheduler.TriangularSchedule + :members: + :undoc-members: diff --git a/SpeechT5/fairseq/docs/make.bat b/SpeechT5/fairseq/docs/make.bat new file mode 100644 index 0000000000000000000000000000000000000000..baa9d02a79266ed17e0841f08a83931d46583393 --- /dev/null +++ b/SpeechT5/fairseq/docs/make.bat @@ -0,0 +1,36 @@ +@ECHO OFF + +pushd %~dp0 + +REM Command file for Sphinx documentation + +if "%SPHINXBUILD%" == "" ( + set SPHINXBUILD=python -msphinx +) +set SOURCEDIR=. +set BUILDDIR=_build +set SPHINXPROJ=fairseq + +if "%1" == "" goto help + +%SPHINXBUILD% >NUL 2>NUL +if errorlevel 9009 ( + echo. + echo.The Sphinx module was not found. Make sure you have Sphinx installed, + echo.then set the SPHINXBUILD environment variable to point to the full + echo.path of the 'sphinx-build' executable. Alternatively you may add the + echo.Sphinx directory to PATH. + echo. 
+ echo.If you don't have Sphinx installed, grab it from + echo.http://sphinx-doc.org/ + exit /b 1 +) + +%SPHINXBUILD% -M %1 %SOURCEDIR% %BUILDDIR% %SPHINXOPTS% +goto end + +:help +%SPHINXBUILD% -M help %SOURCEDIR% %BUILDDIR% %SPHINXOPTS% + +:end +popd diff --git a/SpeechT5/fairseq/docs/models.rst b/SpeechT5/fairseq/docs/models.rst new file mode 100644 index 0000000000000000000000000000000000000000..054622d587c3b7f01f17f442919140755acd8f9e --- /dev/null +++ b/SpeechT5/fairseq/docs/models.rst @@ -0,0 +1,104 @@ +.. role:: hidden + :class: hidden-section + +.. module:: fairseq.models + +.. _Models: + +Models +====== + +A Model defines the neural network's ``forward()`` method and encapsulates all +of the learnable parameters in the network. Each model also provides a set of +named *architectures* that define the precise network configuration (e.g., +embedding dimension, number of layers, etc.). + +Both the model type and architecture are selected via the ``--arch`` +command-line argument. Once selected, a model may expose additional command-line +arguments for further configuration. + +.. note:: + + All fairseq Models extend :class:`BaseFairseqModel`, which in turn extends + :class:`torch.nn.Module`. Thus any fairseq Model can be used as a + stand-alone Module in other PyTorch code. + + +Convolutional Neural Networks (CNN) +----------------------------------- + +.. module:: fairseq.models.fconv +.. autoclass:: fairseq.models.fconv.FConvModel + :members: +.. autoclass:: fairseq.models.fconv.FConvEncoder + :members: + :undoc-members: +.. autoclass:: fairseq.models.fconv.FConvDecoder + :members: + + +Long Short-Term Memory (LSTM) networks +-------------------------------------- + +.. module:: fairseq.models.lstm +.. autoclass:: fairseq.models.lstm.LSTMModel + :members: +.. autoclass:: fairseq.models.lstm.LSTMEncoder + :members: +.. autoclass:: fairseq.models.lstm.LSTMDecoder + :members: + + +Transformer (self-attention) networks +------------------------------------- + +.. module:: fairseq.models.transformer +.. autoclass:: fairseq.models.transformer.TransformerModel + :members: +.. autoclass:: fairseq.models.transformer.TransformerEncoder + :members: +.. autoclass:: fairseq.models.transformer.TransformerEncoderLayer + :members: +.. autoclass:: fairseq.models.transformer.TransformerDecoder + :members: +.. autoclass:: fairseq.models.transformer.TransformerDecoderLayer + :members: + + +Adding new models +----------------- + +.. currentmodule:: fairseq.models +.. autofunction:: fairseq.models.register_model +.. autofunction:: fairseq.models.register_model_architecture +.. autoclass:: fairseq.models.BaseFairseqModel + :members: + :undoc-members: +.. autoclass:: fairseq.models.FairseqEncoderDecoderModel + :members: + :undoc-members: +.. autoclass:: fairseq.models.FairseqEncoderModel + :members: + :undoc-members: +.. autoclass:: fairseq.models.FairseqLanguageModel + :members: + :undoc-members: +.. autoclass:: fairseq.models.FairseqMultiModel + :members: + :undoc-members: +.. autoclass:: fairseq.models.FairseqEncoder + :members: +.. autoclass:: fairseq.models.CompositeEncoder + :members: +.. autoclass:: fairseq.models.FairseqDecoder + :members: + + +.. _Incremental decoding: + +Incremental decoding +-------------------- + +.. 
autoclass:: fairseq.models.FairseqIncrementalDecoder + :members: + :undoc-members: diff --git a/SpeechT5/fairseq/docs/modules.rst b/SpeechT5/fairseq/docs/modules.rst new file mode 100644 index 0000000000000000000000000000000000000000..9631c93d4682286e1cea1ddd961d3f6ab06f2589 --- /dev/null +++ b/SpeechT5/fairseq/docs/modules.rst @@ -0,0 +1,9 @@ +Modules +======= + +Fairseq provides several stand-alone :class:`torch.nn.Module` classes that may +be helpful when implementing a new :class:`~fairseq.models.BaseFairseqModel`. + +.. automodule:: fairseq.modules + :members: + :undoc-members: diff --git a/SpeechT5/fairseq/docs/optim.rst b/SpeechT5/fairseq/docs/optim.rst new file mode 100644 index 0000000000000000000000000000000000000000..c3326456bd9291a1d05bd3316bef5c9fb25c6c49 --- /dev/null +++ b/SpeechT5/fairseq/docs/optim.rst @@ -0,0 +1,38 @@ +.. role:: hidden + :class: hidden-section + +.. _optimizers: + +Optimizers +========== + +Optimizers update the Model parameters based on the gradients. + +.. automodule:: fairseq.optim + :members: + +.. autoclass:: fairseq.optim.FairseqOptimizer + :members: + :undoc-members: + +.. autoclass:: fairseq.optim.adadelta.Adadelta + :members: + :undoc-members: +.. autoclass:: fairseq.optim.adagrad.Adagrad + :members: + :undoc-members: +.. autoclass:: fairseq.optim.adafactor.FairseqAdafactor + :members: + :undoc-members: +.. autoclass:: fairseq.optim.adam.FairseqAdam + :members: + :undoc-members: +.. autoclass:: fairseq.optim.fp16_optimizer.FP16Optimizer + :members: + :undoc-members: +.. autoclass:: fairseq.optim.nag.FairseqNAG + :members: + :undoc-members: +.. autoclass:: fairseq.optim.sgd.SGD + :members: + :undoc-members: diff --git a/SpeechT5/fairseq/docs/overview.rst b/SpeechT5/fairseq/docs/overview.rst new file mode 100644 index 0000000000000000000000000000000000000000..026b3b5c7b21d071d8b8a3405898977c760d05b8 --- /dev/null +++ b/SpeechT5/fairseq/docs/overview.rst @@ -0,0 +1,74 @@ +Overview +======== + +Fairseq can be extended through user-supplied `plug-ins +`_. We support five kinds of +plug-ins: + +- :ref:`Models` define the neural network architecture and encapsulate all of the + learnable parameters. +- :ref:`Criterions` compute the loss function given the model outputs and targets. +- :ref:`Tasks` store dictionaries and provide helpers for loading/iterating over + Datasets, initializing the Model/Criterion and calculating the loss. +- :ref:`Optimizers` update the Model parameters based on the gradients. +- :ref:`Learning Rate Schedulers` update the learning rate over the course of + training. + +**Training Flow** + +Given a ``model``, ``criterion``, ``task``, ``optimizer`` and ``lr_scheduler``, +fairseq implements the following high-level training flow:: + + for epoch in range(num_epochs): + itr = task.get_batch_iterator(task.dataset('train')) + for num_updates, batch in enumerate(itr): + task.train_step(batch, model, criterion, optimizer) + average_and_clip_gradients() + optimizer.step() + lr_scheduler.step_update(num_updates) + lr_scheduler.step(epoch) + +where the default implementation for ``task.train_step`` is roughly:: + + def train_step(self, batch, model, criterion, optimizer, **unused): + loss = criterion(model, batch) + optimizer.backward(loss) + return loss + +**Registering new plug-ins** + +New plug-ins are *registered* through a set of ``@register`` function +decorators, for example:: + + @register_model('my_lstm') + class MyLSTM(FairseqEncoderDecoderModel): + (...) 
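
An analogous decorator exists for each of the other plug-in kinds. As a rough
sketch only (``register_criterion`` and ``FairseqCriterion`` come from
``fairseq.criterions``; the ``my_criterion`` name and the elided body are
hypothetical placeholders mirroring the model example above)::

    from fairseq.criterions import FairseqCriterion, register_criterion

    @register_criterion('my_criterion')
    class MyCriterion(FairseqCriterion):
        (...)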
+ +Once registered, new plug-ins can be used with the existing :ref:`Command-line +Tools`. See the Tutorial sections for more detailed walkthroughs of how to add +new plug-ins. + +**Loading plug-ins from another directory** + +New plug-ins can be defined in a custom module stored in the user system. In +order to import the module, and make the plugin available to *fairseq*, the +command line supports the ``--user-dir`` flag that can be used to specify a +custom location for additional modules to load into *fairseq*. + +For example, assuming this directory tree:: + + /home/user/my-module/ + └── __init__.py + +with ``__init__.py``:: + + from fairseq.models import register_model_architecture + from fairseq.models.transformer import transformer_vaswani_wmt_en_de_big + + @register_model_architecture('transformer', 'my_transformer') + def transformer_mmt_big(args): + transformer_vaswani_wmt_en_de_big(args) + +it is possible to invoke the :ref:`fairseq-train` script with the new architecture with:: + + fairseq-train ... --user-dir /home/user/my-module -a my_transformer --task translation diff --git a/SpeechT5/fairseq/docs/requirements.txt b/SpeechT5/fairseq/docs/requirements.txt new file mode 100644 index 0000000000000000000000000000000000000000..c734a1f04f1c108d84d3a07643ac93adf6485f13 --- /dev/null +++ b/SpeechT5/fairseq/docs/requirements.txt @@ -0,0 +1,2 @@ +sphinx<2.0 +sphinx-argparse diff --git a/SpeechT5/fairseq/docs/tasks.rst b/SpeechT5/fairseq/docs/tasks.rst new file mode 100644 index 0000000000000000000000000000000000000000..5f65c3c866865e50332d8e6ca012a4a81e7bea74 --- /dev/null +++ b/SpeechT5/fairseq/docs/tasks.rst @@ -0,0 +1,61 @@ +.. role:: hidden + :class: hidden-section + +.. module:: fairseq.tasks + +.. _Tasks: + +Tasks +===== + +Tasks store dictionaries and provide helpers for loading/iterating over +Datasets, initializing the Model/Criterion and calculating the loss. + +Tasks can be selected via the ``--task`` command-line argument. Once selected, a +task may expose additional command-line arguments for further configuration. + +Example usage:: + + # setup the task (e.g., load dictionaries) + task = fairseq.tasks.setup_task(args) + + # build model and criterion + model = task.build_model(args) + criterion = task.build_criterion(args) + + # load datasets + task.load_dataset('train') + task.load_dataset('valid') + + # iterate over mini-batches of data + batch_itr = task.get_batch_iterator( + task.dataset('train'), max_tokens=4096, + ) + for batch in batch_itr: + # compute the loss + loss, sample_size, logging_output = task.get_loss( + model, criterion, batch, + ) + loss.backward() + + +Translation +----------- + +.. autoclass:: fairseq.tasks.translation.TranslationTask + +.. _language modeling: + +Language Modeling +----------------- + +.. autoclass:: fairseq.tasks.language_modeling.LanguageModelingTask + + +Adding new tasks +---------------- + +.. autofunction:: fairseq.tasks.register_task +.. autoclass:: fairseq.tasks.FairseqTask + :members: + :undoc-members: diff --git a/SpeechT5/fairseq/docs/tutorial_classifying_names.rst b/SpeechT5/fairseq/docs/tutorial_classifying_names.rst new file mode 100644 index 0000000000000000000000000000000000000000..b02fec0489a86e7b1ccec481342fa4fbd93a80ae --- /dev/null +++ b/SpeechT5/fairseq/docs/tutorial_classifying_names.rst @@ -0,0 +1,415 @@ +Tutorial: Classifying Names with a Character-Level RNN +====================================================== + +In this tutorial we will extend fairseq to support *classification* tasks. 
In +particular we will re-implement the PyTorch tutorial for `Classifying Names with +a Character-Level RNN `_ +in fairseq. It is recommended to quickly skim that tutorial before beginning +this one. + +This tutorial covers: + +1. **Preprocessing the data** to create dictionaries. +2. **Registering a new Model** that encodes an input sentence with a simple RNN + and predicts the output label. +3. **Registering a new Task** that loads our dictionaries and dataset. +4. **Training the Model** using the existing command-line tools. +5. **Writing an evaluation script** that imports fairseq and allows us to + interactively evaluate our model on new inputs. + + +1. Preprocessing the data +------------------------- + +The original tutorial provides raw data, but we'll work with a modified version +of the data that is already tokenized into characters and split into separate +train, valid and test sets. + +Download and extract the data from here: +`tutorial_names.tar.gz `_ + +Once extracted, let's preprocess the data using the :ref:`fairseq-preprocess` +command-line tool to create the dictionaries. While this tool is primarily +intended for sequence-to-sequence problems, we're able to reuse it here by +treating the label as a "target" sequence of length 1. We'll also output the +preprocessed files in "raw" format using the ``--dataset-impl`` option to +enhance readability: + +.. code-block:: console + + > fairseq-preprocess \ + --trainpref names/train --validpref names/valid --testpref names/test \ + --source-lang input --target-lang label \ + --destdir names-bin --dataset-impl raw + +After running the above command you should see a new directory, +:file:`names-bin/`, containing the dictionaries for *inputs* and *labels*. + + +2. Registering a new Model +-------------------------- + +Next we'll register a new model in fairseq that will encode an input sentence +with a simple RNN and predict the output label. Compared to the original PyTorch +tutorial, our version will also work with batches of data and GPU Tensors. + +First let's copy the simple RNN module implemented in the `PyTorch tutorial +`_. +Create a new file named :file:`fairseq/models/rnn_classifier.py` with the +following contents:: + + import torch + import torch.nn as nn + + class RNN(nn.Module): + + def __init__(self, input_size, hidden_size, output_size): + super(RNN, self).__init__() + + self.hidden_size = hidden_size + + self.i2h = nn.Linear(input_size + hidden_size, hidden_size) + self.i2o = nn.Linear(input_size + hidden_size, output_size) + self.softmax = nn.LogSoftmax(dim=1) + + def forward(self, input, hidden): + combined = torch.cat((input, hidden), 1) + hidden = self.i2h(combined) + output = self.i2o(combined) + output = self.softmax(output) + return output, hidden + + def initHidden(self): + return torch.zeros(1, self.hidden_size) + +We must also *register* this model with fairseq using the +:func:`~fairseq.models.register_model` function decorator. Once the model is +registered we'll be able to use it with the existing :ref:`Command-line Tools`. + +All registered models must implement the :class:`~fairseq.models.BaseFairseqModel` +interface, so we'll create a small wrapper class in the same file and register +it in fairseq with the name ``'rnn_classifier'``:: + + from fairseq.models import BaseFairseqModel, register_model + + # Note: the register_model "decorator" should immediately precede the + # definition of the Model class. 
+ + @register_model('rnn_classifier') + class FairseqRNNClassifier(BaseFairseqModel): + + @staticmethod + def add_args(parser): + # Models can override this method to add new command-line arguments. + # Here we'll add a new command-line argument to configure the + # dimensionality of the hidden state. + parser.add_argument( + '--hidden-dim', type=int, metavar='N', + help='dimensionality of the hidden state', + ) + + @classmethod + def build_model(cls, args, task): + # Fairseq initializes models by calling the ``build_model()`` + # function. This provides more flexibility, since the returned model + # instance can be of a different type than the one that was called. + # In this case we'll just return a FairseqRNNClassifier instance. + + # Initialize our RNN module + rnn = RNN( + # We'll define the Task in the next section, but for now just + # notice that the task holds the dictionaries for the "source" + # (i.e., the input sentence) and "target" (i.e., the label). + input_size=len(task.source_dictionary), + hidden_size=args.hidden_dim, + output_size=len(task.target_dictionary), + ) + + # Return the wrapped version of the module + return FairseqRNNClassifier( + rnn=rnn, + input_vocab=task.source_dictionary, + ) + + def __init__(self, rnn, input_vocab): + super(FairseqRNNClassifier, self).__init__() + + self.rnn = rnn + self.input_vocab = input_vocab + + # The RNN module in the tutorial expects one-hot inputs, so we can + # precompute the identity matrix to help convert from indices to + # one-hot vectors. We register it as a buffer so that it is moved to + # the GPU when ``cuda()`` is called. + self.register_buffer('one_hot_inputs', torch.eye(len(input_vocab))) + + def forward(self, src_tokens, src_lengths): + # The inputs to the ``forward()`` function are determined by the + # Task, and in particular the ``'net_input'`` key in each + # mini-batch. We'll define the Task in the next section, but for + # now just know that *src_tokens* has shape `(batch, src_len)` and + # *src_lengths* has shape `(batch)`. + bsz, max_src_len = src_tokens.size() + + # Initialize the RNN hidden state. Compared to the original PyTorch + # tutorial we'll also handle batched inputs and work on the GPU. + hidden = self.rnn.initHidden() + hidden = hidden.repeat(bsz, 1) # expand for batched inputs + hidden = hidden.to(src_tokens.device) # move to GPU + + for i in range(max_src_len): + # WARNING: The inputs have padding, so we should mask those + # elements here so that padding doesn't affect the results. + # This is left as an exercise for the reader. The padding symbol + # is given by ``self.input_vocab.pad()`` and the unpadded length + # of each input is given by *src_lengths*. + + # One-hot encode a batch of input characters. + input = self.one_hot_inputs[src_tokens[:, i].long()] + + # Feed the input to our RNN. + output, hidden = self.rnn(input, hidden) + + # Return the final output state for making a prediction + return output + +Finally let's define a *named architecture* with the configuration for our +model. This is done with the :func:`~fairseq.models.register_model_architecture` +function decorator. Thereafter this named architecture can be used with the +``--arch`` command-line argument, e.g., ``--arch pytorch_tutorial_rnn``:: + + from fairseq.models import register_model_architecture + + # The first argument to ``register_model_architecture()`` should be the name + # of the model we registered above (i.e., 'rnn_classifier'). 
The function we
    # register here should take a single argument *args* and modify it in-place
    # to match the desired architecture.

    @register_model_architecture('rnn_classifier', 'pytorch_tutorial_rnn')
    def pytorch_tutorial_rnn(args):
        # We use ``getattr()`` to prioritize arguments that are explicitly given
        # on the command-line, so that the defaults defined below are only used
        # when no other value has been specified.
        args.hidden_dim = getattr(args, 'hidden_dim', 128)


3. Registering a new Task
-------------------------

Now we'll register a new :class:`~fairseq.tasks.FairseqTask` that will load our
dictionaries and dataset. Tasks can also control how the data is batched into
mini-batches, but in this tutorial we'll reuse the batching provided by
:class:`fairseq.data.LanguagePairDataset`.

Create a new file named :file:`fairseq/tasks/simple_classification.py` with the
following contents::

    import os
    import torch

    from fairseq.data import Dictionary, LanguagePairDataset
    # LegacyFairseqTask is the FairseqTask subclass that supports the
    # argparse-style (args-based) task API used in this tutorial.
    from fairseq.tasks import LegacyFairseqTask, register_task


    @register_task('simple_classification')
    class SimpleClassificationTask(LegacyFairseqTask):

        @staticmethod
        def add_args(parser):
            # Add some command-line arguments for specifying where the data is
            # located and the maximum supported input length.
            parser.add_argument('data', metavar='FILE',
                                help='file prefix for data')
            parser.add_argument('--max-positions', default=1024, type=int,
                                help='max input length')

        @classmethod
        def setup_task(cls, args, **kwargs):
            # Here we can perform any setup required for the task. This may include
            # loading Dictionaries, initializing shared Embedding layers, etc.
            # In this case we'll just load the Dictionaries.
            input_vocab = Dictionary.load(os.path.join(args.data, 'dict.input.txt'))
            label_vocab = Dictionary.load(os.path.join(args.data, 'dict.label.txt'))
            print('| [input] dictionary: {} types'.format(len(input_vocab)))
            print('| [label] dictionary: {} types'.format(len(label_vocab)))

            return SimpleClassificationTask(args, input_vocab, label_vocab)

        def __init__(self, args, input_vocab, label_vocab):
            super().__init__(args)
            self.input_vocab = input_vocab
            self.label_vocab = label_vocab

        def load_dataset(self, split, **kwargs):
            """Load a given dataset split (e.g., train, valid, test)."""

            prefix = os.path.join(self.args.data, '{}.input-label'.format(split))

            # Read input sentences.
            sentences, lengths = [], []
            with open(prefix + '.input', encoding='utf-8') as file:
                for line in file:
                    sentence = line.strip()

                    # Tokenize the sentence, splitting on spaces.
                    tokens = self.input_vocab.encode_line(
                        sentence, add_if_not_exist=False,
                    )

                    sentences.append(tokens)
                    lengths.append(tokens.numel())

            # Read labels.
            labels = []
            with open(prefix + '.label', encoding='utf-8') as file:
                for line in file:
                    label = line.strip()
                    labels.append(
                        # Convert label to a numeric ID.
                        torch.LongTensor([self.label_vocab.add_symbol(label)])
                    )

            assert len(sentences) == len(labels)
            print('| {} {} {} examples'.format(self.args.data, split, len(sentences)))

            # We reuse LanguagePairDataset since classification can be modeled as a
            # sequence-to-sequence task where the target sequence has length 1.
+ self.datasets[split] = LanguagePairDataset( + src=sentences, + src_sizes=lengths, + src_dict=self.input_vocab, + tgt=labels, + tgt_sizes=torch.ones(len(labels)), # targets have length 1 + tgt_dict=self.label_vocab, + left_pad_source=False, + # Since our target is a single class label, there's no need for + # teacher forcing. If we set this to ``True`` then our Model's + # ``forward()`` method would receive an additional argument called + # *prev_output_tokens* that would contain a shifted version of the + # target sequence. + input_feeding=False, + ) + + def max_positions(self): + """Return the max input length allowed by the task.""" + # The source should be less than *args.max_positions* and the "target" + # has max length 1. + return (self.args.max_positions, 1) + + @property + def source_dictionary(self): + """Return the source :class:`~fairseq.data.Dictionary`.""" + return self.input_vocab + + @property + def target_dictionary(self): + """Return the target :class:`~fairseq.data.Dictionary`.""" + return self.label_vocab + + # We could override this method if we wanted more control over how batches + # are constructed, but it's not necessary for this tutorial since we can + # reuse the batching provided by LanguagePairDataset. + # + # def get_batch_iterator( + # self, dataset, max_tokens=None, max_sentences=None, max_positions=None, + # ignore_invalid_inputs=False, required_batch_size_multiple=1, + # seed=1, num_shards=1, shard_id=0, num_workers=0, epoch=1, + # data_buffer_size=0, disable_iterator_cache=False, + # ): + # (...) + + +4. Training the Model +--------------------- + +Now we're ready to train the model. We can use the existing :ref:`fairseq-train` +command-line tool for this, making sure to specify our new Task (``--task +simple_classification``) and Model architecture (``--arch +pytorch_tutorial_rnn``): + +.. note:: + + You can also configure the dimensionality of the hidden state by passing the + ``--hidden-dim`` argument to :ref:`fairseq-train`. + +.. code-block:: console + + > fairseq-train names-bin \ + --task simple_classification \ + --arch pytorch_tutorial_rnn \ + --optimizer adam --lr 0.001 --lr-shrink 0.5 \ + --max-tokens 1000 + (...) + | epoch 027 | loss 1.200 | ppl 2.30 | wps 15728 | ups 119.4 | wpb 116 | bsz 116 | num_updates 3726 | lr 1.5625e-05 | gnorm 1.290 | clip 0% | oom 0 | wall 32 | train_wall 21 + | epoch 027 | valid on 'valid' subset | valid_loss 1.41304 | valid_ppl 2.66 | num_updates 3726 | best 1.41208 + | done training in 31.6 seconds + +The model files should appear in the :file:`checkpoints/` directory. + + +5. Writing an evaluation script +------------------------------- + +Finally we can write a short script to evaluate our model on new inputs. 
Create +a new file named :file:`eval_classifier.py` with the following contents:: + + from fairseq import checkpoint_utils, data, options, tasks + + # Parse command-line arguments for generation + parser = options.get_generation_parser(default_task='simple_classification') + args = options.parse_args_and_arch(parser) + + # Setup task + task = tasks.setup_task(args) + + # Load model + print('| loading model from {}'.format(args.path)) + models, _model_args = checkpoint_utils.load_model_ensemble([args.path], task=task) + model = models[0] + + while True: + sentence = input('\nInput: ') + + # Tokenize into characters + chars = ' '.join(list(sentence.strip())) + tokens = task.source_dictionary.encode_line( + chars, add_if_not_exist=False, + ) + + # Build mini-batch to feed to the model + batch = data.language_pair_dataset.collate( + samples=[{'id': -1, 'source': tokens}], # bsz = 1 + pad_idx=task.source_dictionary.pad(), + eos_idx=task.source_dictionary.eos(), + left_pad_source=False, + input_feeding=False, + ) + + # Feed batch to the model and get predictions + preds = model(**batch['net_input']) + + # Print top 3 predictions and their log-probabilities + top_scores, top_labels = preds[0].topk(k=3) + for score, label_idx in zip(top_scores, top_labels): + label_name = task.target_dictionary.string([label_idx]) + print('({:.2f})\t{}'.format(score, label_name)) + +Now we can evaluate our model interactively. Note that we have included the +original data path (:file:`names-bin/`) so that the dictionaries can be loaded: + +.. code-block:: console + + > python eval_classifier.py names-bin --path checkpoints/checkpoint_best.pt + | [input] dictionary: 64 types + | [label] dictionary: 24 types + | loading model from checkpoints/checkpoint_best.pt + + Input: Satoshi + (-0.61) Japanese + (-1.20) Arabic + (-2.86) Italian + + Input: Sinbad + (-0.30) Arabic + (-1.76) English + (-4.08) Russian diff --git a/SpeechT5/fairseq/docs/tutorial_simple_lstm.rst b/SpeechT5/fairseq/docs/tutorial_simple_lstm.rst new file mode 100644 index 0000000000000000000000000000000000000000..f52988507c5da5125668e143bd2bfe4df117b41c --- /dev/null +++ b/SpeechT5/fairseq/docs/tutorial_simple_lstm.rst @@ -0,0 +1,518 @@ +Tutorial: Simple LSTM +===================== + +In this tutorial we will extend fairseq by adding a new +:class:`~fairseq.models.FairseqEncoderDecoderModel` that encodes a source +sentence with an LSTM and then passes the final hidden state to a second LSTM +that decodes the target sentence (without attention). + +This tutorial covers: + +1. **Writing an Encoder and Decoder** to encode/decode the source/target + sentence, respectively. +2. **Registering a new Model** so that it can be used with the existing + :ref:`Command-line tools`. +3. **Training the Model** using the existing command-line tools. +4. **Making generation faster** by modifying the Decoder to use + :ref:`Incremental decoding`. + + +1. Building an Encoder and Decoder +---------------------------------- + +In this section we'll define a simple LSTM Encoder and Decoder. All Encoders +should implement the :class:`~fairseq.models.FairseqEncoder` interface and +Decoders should implement the :class:`~fairseq.models.FairseqDecoder` interface. +These interfaces themselves extend :class:`torch.nn.Module`, so FairseqEncoders +and FairseqDecoders can be written and used in the same ways as ordinary PyTorch +Modules. 
+ + +Encoder +~~~~~~~ + +Our Encoder will embed the tokens in the source sentence, feed them to a +:class:`torch.nn.LSTM` and return the final hidden state. To create our encoder +save the following in a new file named :file:`fairseq/models/simple_lstm.py`:: + + import torch.nn as nn + from fairseq import utils + from fairseq.models import FairseqEncoder + + class SimpleLSTMEncoder(FairseqEncoder): + + def __init__( + self, args, dictionary, embed_dim=128, hidden_dim=128, dropout=0.1, + ): + super().__init__(dictionary) + self.args = args + + # Our encoder will embed the inputs before feeding them to the LSTM. + self.embed_tokens = nn.Embedding( + num_embeddings=len(dictionary), + embedding_dim=embed_dim, + padding_idx=dictionary.pad(), + ) + self.dropout = nn.Dropout(p=dropout) + + # We'll use a single-layer, unidirectional LSTM for simplicity. + self.lstm = nn.LSTM( + input_size=embed_dim, + hidden_size=hidden_dim, + num_layers=1, + bidirectional=False, + batch_first=True, + ) + + def forward(self, src_tokens, src_lengths): + # The inputs to the ``forward()`` function are determined by the + # Task, and in particular the ``'net_input'`` key in each + # mini-batch. We discuss Tasks in the next tutorial, but for now just + # know that *src_tokens* has shape `(batch, src_len)` and *src_lengths* + # has shape `(batch)`. + + # Note that the source is typically padded on the left. This can be + # configured by adding the `--left-pad-source "False"` command-line + # argument, but here we'll make the Encoder handle either kind of + # padding by converting everything to be right-padded. + if self.args.left_pad_source: + # Convert left-padding to right-padding. + src_tokens = utils.convert_padding_direction( + src_tokens, + padding_idx=self.dictionary.pad(), + left_to_right=True + ) + + # Embed the source. + x = self.embed_tokens(src_tokens) + + # Apply dropout. + x = self.dropout(x) + + # Pack the sequence into a PackedSequence object to feed to the LSTM. + x = nn.utils.rnn.pack_padded_sequence(x, src_lengths, batch_first=True) + + # Get the output from the LSTM. + _outputs, (final_hidden, _final_cell) = self.lstm(x) + + # Return the Encoder's output. This can be any object and will be + # passed directly to the Decoder. + return { + # this will have shape `(bsz, hidden_dim)` + 'final_hidden': final_hidden.squeeze(0), + } + + # Encoders are required to implement this method so that we can rearrange + # the order of the batch elements during inference (e.g., beam search). + def reorder_encoder_out(self, encoder_out, new_order): + """ + Reorder encoder output according to `new_order`. + + Args: + encoder_out: output from the ``forward()`` method + new_order (LongTensor): desired order + + Returns: + `encoder_out` rearranged according to `new_order` + """ + final_hidden = encoder_out['final_hidden'] + return { + 'final_hidden': final_hidden.index_select(0, new_order), + } + + +Decoder +~~~~~~~ + +Our Decoder will predict the next word, conditioned on the Encoder's final +hidden state and an embedded representation of the previous target word -- which +is sometimes called *teacher forcing*. More specifically, we'll use a +:class:`torch.nn.LSTM` to produce a sequence of hidden states that we'll project +to the size of the output vocabulary to predict each target word. 
+ +:: + + import torch + from fairseq.models import FairseqDecoder + + class SimpleLSTMDecoder(FairseqDecoder): + + def __init__( + self, dictionary, encoder_hidden_dim=128, embed_dim=128, hidden_dim=128, + dropout=0.1, + ): + super().__init__(dictionary) + + # Our decoder will embed the inputs before feeding them to the LSTM. + self.embed_tokens = nn.Embedding( + num_embeddings=len(dictionary), + embedding_dim=embed_dim, + padding_idx=dictionary.pad(), + ) + self.dropout = nn.Dropout(p=dropout) + + # We'll use a single-layer, unidirectional LSTM for simplicity. + self.lstm = nn.LSTM( + # For the first layer we'll concatenate the Encoder's final hidden + # state with the embedded target tokens. + input_size=encoder_hidden_dim + embed_dim, + hidden_size=hidden_dim, + num_layers=1, + bidirectional=False, + ) + + # Define the output projection. + self.output_projection = nn.Linear(hidden_dim, len(dictionary)) + + # During training Decoders are expected to take the entire target sequence + # (shifted right by one position) and produce logits over the vocabulary. + # The *prev_output_tokens* tensor begins with the end-of-sentence symbol, + # ``dictionary.eos()``, followed by the target sequence. + def forward(self, prev_output_tokens, encoder_out): + """ + Args: + prev_output_tokens (LongTensor): previous decoder outputs of shape + `(batch, tgt_len)`, for teacher forcing + encoder_out (Tensor, optional): output from the encoder, used for + encoder-side attention + + Returns: + tuple: + - the last decoder layer's output of shape + `(batch, tgt_len, vocab)` + - the last decoder layer's attention weights of shape + `(batch, tgt_len, src_len)` + """ + bsz, tgt_len = prev_output_tokens.size() + + # Extract the final hidden state from the Encoder. + final_encoder_hidden = encoder_out['final_hidden'] + + # Embed the target sequence, which has been shifted right by one + # position and now starts with the end-of-sentence symbol. + x = self.embed_tokens(prev_output_tokens) + + # Apply dropout. + x = self.dropout(x) + + # Concatenate the Encoder's final hidden state to *every* embedded + # target token. + x = torch.cat( + [x, final_encoder_hidden.unsqueeze(1).expand(bsz, tgt_len, -1)], + dim=2, + ) + + # Using PackedSequence objects in the Decoder is harder than in the + # Encoder, since the targets are not sorted in descending length order, + # which is a requirement of ``pack_padded_sequence()``. Instead we'll + # feed nn.LSTM directly. + initial_state = ( + final_encoder_hidden.unsqueeze(0), # hidden + torch.zeros_like(final_encoder_hidden).unsqueeze(0), # cell + ) + output, _ = self.lstm( + x.transpose(0, 1), # convert to shape `(tgt_len, bsz, dim)` + initial_state, + ) + x = output.transpose(0, 1) # convert to shape `(bsz, tgt_len, hidden)` + + # Project the outputs to the size of the vocabulary. + x = self.output_projection(x) + + # Return the logits and ``None`` for the attention weights + return x, None + + +2. Registering the Model +------------------------ + +Now that we've defined our Encoder and Decoder we must *register* our model with +fairseq using the :func:`~fairseq.models.register_model` function decorator. +Once the model is registered we'll be able to use it with the existing +:ref:`Command-line Tools`. + +All registered models must implement the +:class:`~fairseq.models.BaseFairseqModel` interface. For sequence-to-sequence +models (i.e., any model with a single Encoder and Decoder), we can instead +implement the :class:`~fairseq.models.FairseqEncoderDecoderModel` interface. 
+ +Create a small wrapper class in the same file and register it in fairseq with +the name ``'simple_lstm'``:: + + from fairseq.models import FairseqEncoderDecoderModel, register_model + + # Note: the register_model "decorator" should immediately precede the + # definition of the Model class. + + @register_model('simple_lstm') + class SimpleLSTMModel(FairseqEncoderDecoderModel): + + @staticmethod + def add_args(parser): + # Models can override this method to add new command-line arguments. + # Here we'll add some new command-line arguments to configure dropout + # and the dimensionality of the embeddings and hidden states. + parser.add_argument( + '--encoder-embed-dim', type=int, metavar='N', + help='dimensionality of the encoder embeddings', + ) + parser.add_argument( + '--encoder-hidden-dim', type=int, metavar='N', + help='dimensionality of the encoder hidden state', + ) + parser.add_argument( + '--encoder-dropout', type=float, default=0.1, + help='encoder dropout probability', + ) + parser.add_argument( + '--decoder-embed-dim', type=int, metavar='N', + help='dimensionality of the decoder embeddings', + ) + parser.add_argument( + '--decoder-hidden-dim', type=int, metavar='N', + help='dimensionality of the decoder hidden state', + ) + parser.add_argument( + '--decoder-dropout', type=float, default=0.1, + help='decoder dropout probability', + ) + + @classmethod + def build_model(cls, args, task): + # Fairseq initializes models by calling the ``build_model()`` + # function. This provides more flexibility, since the returned model + # instance can be of a different type than the one that was called. + # In this case we'll just return a SimpleLSTMModel instance. + + # Initialize our Encoder and Decoder. + encoder = SimpleLSTMEncoder( + args=args, + dictionary=task.source_dictionary, + embed_dim=args.encoder_embed_dim, + hidden_dim=args.encoder_hidden_dim, + dropout=args.encoder_dropout, + ) + decoder = SimpleLSTMDecoder( + dictionary=task.target_dictionary, + encoder_hidden_dim=args.encoder_hidden_dim, + embed_dim=args.decoder_embed_dim, + hidden_dim=args.decoder_hidden_dim, + dropout=args.decoder_dropout, + ) + model = SimpleLSTMModel(encoder, decoder) + + # Print the model architecture. + print(model) + + return model + + # We could override the ``forward()`` if we wanted more control over how + # the encoder and decoder interact, but it's not necessary for this + # tutorial since we can inherit the default implementation provided by + # the FairseqEncoderDecoderModel base class, which looks like: + # + # def forward(self, src_tokens, src_lengths, prev_output_tokens): + # encoder_out = self.encoder(src_tokens, src_lengths) + # decoder_out = self.decoder(prev_output_tokens, encoder_out) + # return decoder_out + +Finally let's define a *named architecture* with the configuration for our +model. This is done with the :func:`~fairseq.models.register_model_architecture` +function decorator. Thereafter this named architecture can be used with the +``--arch`` command-line argument, e.g., ``--arch tutorial_simple_lstm``:: + + from fairseq.models import register_model_architecture + + # The first argument to ``register_model_architecture()`` should be the name + # of the model we registered above (i.e., 'simple_lstm'). The function we + # register here should take a single argument *args* and modify it in-place + # to match the desired architecture. 
+ + @register_model_architecture('simple_lstm', 'tutorial_simple_lstm') + def tutorial_simple_lstm(args): + # We use ``getattr()`` to prioritize arguments that are explicitly given + # on the command-line, so that the defaults defined below are only used + # when no other value has been specified. + args.encoder_embed_dim = getattr(args, 'encoder_embed_dim', 256) + args.encoder_hidden_dim = getattr(args, 'encoder_hidden_dim', 256) + args.decoder_embed_dim = getattr(args, 'decoder_embed_dim', 256) + args.decoder_hidden_dim = getattr(args, 'decoder_hidden_dim', 256) + + +3. Training the Model +--------------------- + +Now we're ready to train the model. We can use the existing :ref:`fairseq-train` +command-line tool for this, making sure to specify our new Model architecture +(``--arch tutorial_simple_lstm``). + +.. note:: + + Make sure you've already preprocessed the data from the IWSLT example in the + :file:`examples/translation/` directory. + +.. code-block:: console + + > fairseq-train data-bin/iwslt14.tokenized.de-en \ + --arch tutorial_simple_lstm \ + --encoder-dropout 0.2 --decoder-dropout 0.2 \ + --optimizer adam --lr 0.005 --lr-shrink 0.5 \ + --max-tokens 12000 + (...) + | epoch 052 | loss 4.027 | ppl 16.30 | wps 420805 | ups 39.7 | wpb 9841 | bsz 400 | num_updates 20852 | lr 1.95313e-05 | gnorm 0.218 | clip 0% | oom 0 | wall 529 | train_wall 396 + | epoch 052 | valid on 'valid' subset | valid_loss 4.74989 | valid_ppl 26.91 | num_updates 20852 | best 4.74954 + +The model files should appear in the :file:`checkpoints/` directory. While this +model architecture is not very good, we can use the :ref:`fairseq-generate` script to +generate translations and compute our BLEU score over the test set: + +.. code-block:: console + + > fairseq-generate data-bin/iwslt14.tokenized.de-en \ + --path checkpoints/checkpoint_best.pt \ + --beam 5 \ + --remove-bpe + (...) + | Translated 6750 sentences (153132 tokens) in 17.3s (389.12 sentences/s, 8827.68 tokens/s) + | Generate test with beam=5: BLEU4 = 8.18, 38.8/12.1/4.7/2.0 (BP=1.000, ratio=1.066, syslen=139865, reflen=131146) + + +4. Making generation faster +--------------------------- + +While autoregressive generation from sequence-to-sequence models is inherently +slow, our implementation above is especially slow because it recomputes the +entire sequence of Decoder hidden states for every output token (i.e., it is +``O(n^2)``). We can make this significantly faster by instead caching the +previous hidden states. + +In fairseq this is called :ref:`Incremental decoding`. Incremental decoding is a +special mode at inference time where the Model only receives a single timestep +of input corresponding to the immediately previous output token (for teacher +forcing) and must produce the next output incrementally. Thus the model must +cache any long-term state that is needed about the sequence, e.g., hidden +states, convolutional states, etc. + +To implement incremental decoding we will modify our model to implement the +:class:`~fairseq.models.FairseqIncrementalDecoder` interface. Compared to the +standard :class:`~fairseq.models.FairseqDecoder` interface, the incremental +decoder interface allows ``forward()`` methods to take an extra keyword argument +(*incremental_state*) that can be used to cache state across time-steps. 
+ +Let's replace our ``SimpleLSTMDecoder`` with an incremental one:: + + import torch + from fairseq.models import FairseqIncrementalDecoder + + class SimpleLSTMDecoder(FairseqIncrementalDecoder): + + def __init__( + self, dictionary, encoder_hidden_dim=128, embed_dim=128, hidden_dim=128, + dropout=0.1, + ): + # This remains the same as before. + super().__init__(dictionary) + self.embed_tokens = nn.Embedding( + num_embeddings=len(dictionary), + embedding_dim=embed_dim, + padding_idx=dictionary.pad(), + ) + self.dropout = nn.Dropout(p=dropout) + self.lstm = nn.LSTM( + input_size=encoder_hidden_dim + embed_dim, + hidden_size=hidden_dim, + num_layers=1, + bidirectional=False, + ) + self.output_projection = nn.Linear(hidden_dim, len(dictionary)) + + # We now take an additional kwarg (*incremental_state*) for caching the + # previous hidden and cell states. + def forward(self, prev_output_tokens, encoder_out, incremental_state=None): + if incremental_state is not None: + # If the *incremental_state* argument is not ``None`` then we are + # in incremental inference mode. While *prev_output_tokens* will + # still contain the entire decoded prefix, we will only use the + # last step and assume that the rest of the state is cached. + prev_output_tokens = prev_output_tokens[:, -1:] + + # This remains the same as before. + bsz, tgt_len = prev_output_tokens.size() + final_encoder_hidden = encoder_out['final_hidden'] + x = self.embed_tokens(prev_output_tokens) + x = self.dropout(x) + x = torch.cat( + [x, final_encoder_hidden.unsqueeze(1).expand(bsz, tgt_len, -1)], + dim=2, + ) + + # We will now check the cache and load the cached previous hidden and + # cell states, if they exist, otherwise we will initialize them to + # zeros (as before). We will use the ``utils.get_incremental_state()`` + # and ``utils.set_incremental_state()`` helpers. + initial_state = utils.get_incremental_state( + self, incremental_state, 'prev_state', + ) + if initial_state is None: + # first time initialization, same as the original version + initial_state = ( + final_encoder_hidden.unsqueeze(0), # hidden + torch.zeros_like(final_encoder_hidden).unsqueeze(0), # cell + ) + + # Run one step of our LSTM. + output, latest_state = self.lstm(x.transpose(0, 1), initial_state) + + # Update the cache with the latest hidden and cell states. + utils.set_incremental_state( + self, incremental_state, 'prev_state', latest_state, + ) + + # This remains the same as before + x = output.transpose(0, 1) + x = self.output_projection(x) + return x, None + + # The ``FairseqIncrementalDecoder`` interface also requires implementing a + # ``reorder_incremental_state()`` method, which is used during beam search + # to select and reorder the incremental state. + def reorder_incremental_state(self, incremental_state, new_order): + # Load the cached state. + prev_state = utils.get_incremental_state( + self, incremental_state, 'prev_state', + ) + + # Reorder batches according to *new_order*. + reordered_state = ( + prev_state[0].index_select(1, new_order), # hidden + prev_state[1].index_select(1, new_order), # cell + ) + + # Update the cached state. + utils.set_incremental_state( + self, incremental_state, 'prev_state', reordered_state, + ) + +Finally, we can rerun generation and observe the speedup: + +.. code-block:: console + + # Before + + > fairseq-generate data-bin/iwslt14.tokenized.de-en \ + --path checkpoints/checkpoint_best.pt \ + --beam 5 \ + --remove-bpe + (...) 
+ | Translated 6750 sentences (153132 tokens) in 17.3s (389.12 sentences/s, 8827.68 tokens/s) + | Generate test with beam=5: BLEU4 = 8.18, 38.8/12.1/4.7/2.0 (BP=1.000, ratio=1.066, syslen=139865, reflen=131146) + + # After + + > fairseq-generate data-bin/iwslt14.tokenized.de-en \ + --path checkpoints/checkpoint_best.pt \ + --beam 5 \ + --remove-bpe + (...) + | Translated 6750 sentences (153132 tokens) in 5.5s (1225.54 sentences/s, 27802.94 tokens/s) + | Generate test with beam=5: BLEU4 = 8.18, 38.8/12.1/4.7/2.0 (BP=1.000, ratio=1.066, syslen=139865, reflen=131146) diff --git a/SpeechT5/fairseq/examples/.gitignore b/SpeechT5/fairseq/examples/.gitignore new file mode 100644 index 0000000000000000000000000000000000000000..1ef816f2cd7b4a9aa7adf8bd5635a644834738f1 --- /dev/null +++ b/SpeechT5/fairseq/examples/.gitignore @@ -0,0 +1,2 @@ +!*/*.sh +!*/*.md diff --git a/SpeechT5/fairseq/examples/__init__.py b/SpeechT5/fairseq/examples/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..44bb24ae614941f23fea29c56d60167650c39bcb --- /dev/null +++ b/SpeechT5/fairseq/examples/__init__.py @@ -0,0 +1,9 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +try: + from fairseq.version import __version__ # noqa +except ImportError: + pass diff --git a/SpeechT5/fairseq/examples/adaptive_span/README.md b/SpeechT5/fairseq/examples/adaptive_span/README.md new file mode 100644 index 0000000000000000000000000000000000000000..913a87338633f8a790d70fe4133b8bd8b95a4c50 --- /dev/null +++ b/SpeechT5/fairseq/examples/adaptive_span/README.md @@ -0,0 +1,90 @@ +# Adaptive Span + +Adaptive Span is a novel self-attention mechanism that can learn its optimal +attention span. This allows us to extend significantly the maximum context size +used in Transformer, while maintaining control over their memory footprint +and computational time. It uses the Truncated BPTT technique for training, +as in [transformerXL](https://github.com/pytorch/fairseq/blob/master/examples/truncated_bptt/README.md). + +Adaptive Span was introduced by paper: +[Adaptive Attention Span in Transformers](https://arxiv.org/abs/1905.07799), +which achieved state-of-the-art language modeling results at the time of publication. + +We manage to reproduce their result in fairseq and keep most of the +[original implementation](https://github.com/facebookresearch/adaptive-span) untouched. +You can refer to the their sweep file as well if any combination of hyperparameter is not clear. + +##### 0. Setup + +First you need to process the Enwik8 dataset, we use the pre-tokenized dataset +from [adaptive span paper](https://github.com/facebookresearch/adaptive-span/blob/master/get_data.sh). +You can download the dataset, and then run: +```bash +fairseq-preprocess --only-source --trainpref ~/data/enwik8/train.txt \ + --validpref ~/data/enwik8/valid.txt --testpref ~/data/enwik8/test.txt \ + --destdir ~/data/enwik8/data-bin/ --joined-dictionary --workers 20 +``` + +##### 1. Train a Adaptive Span model on Enwik8 + +We will train a 12-layer Adaptive Span model following the [hyperparameters +used in the original +paper](https://github.com/facebookresearch/adaptive-span/blob/master/experiments/enwik8.sh). + +The following command assumes 4 GPUs, so that the total batch size is 64 +sequences (4 x 16). 
Training should take 2-3 days on 4 V100 GPUs: +```bash +CUDA_VISIBLE_DEVICES=0,1,2,3 fairseq-train \ + --user-dir examples/adaptive_span \ + --data ~/data/enwik8/data-bin/ \ + --fp16 --fp16-no-flatten-grads --max-update 600000 \ + --task truncated_bptt_lm --tokens-per-sample 512 --arch adaptive_span \ + --n-layer 12 --d-model 512 --n-head 8 --d-inner 2048 --dropout 0.3 \ + --attn-span 8192 --optimizer adagrad_with_grad_clip --adagrad-clip 0.03 \ + --validate-interval-updates 1000 \ + --lr-scheduler fixed --warmup-updates 32000 --batch-size-valid 32 \ + --lr 0.07 --criterion adaptive_span_loss --batch-size 16 --update-freq 1 \ + --seed 2 --log-format json --log-interval 25 --aux-loss-scaler 5e-07 +``` +This should land around 1.05 on validation, 1.03 on test. You can lower the +--aux-loss-scaler for better performance (longer span). It gives ~0.03 bpc +improvement to the transformerXL baseline here. +If training on a single GPU, set `--update-freq=4` to accumulate 4x gradients +and simulate training on 4 GPUs. +You can also reproduce the transformerXL result on enwik8 using this code base. +It should land around 1.06 on test,matching the [original paper](https://github.com/kimiyoung/transformer-xl/blob/master/pytorch/run_enwik8_base.sh). +You can try by +```bash +CUDA_VISIBLE_DEVICES=0,1,2,3 fairseq-train \ + --user-dir examples/truncated_bptt \ + ~/data/enwik8/data-bin/ \ + --task truncated_bptt_lm --fp16 --max-update 400000 \ + --tokens-per-sample 512 --arch transformer_xl --n-layer 12 \ + --d-model 512 --n-head 8 --d-head 64 --d-inner 2048 --dropout 0.1 \ + --dropatt 0.0 --mem-len 512 --optimizer adam --clip-norm 0.25 \ + --lr-scheduler cosine --warmup-updates 0 \ + --lr 0.0 --lr 0.00025 --batch-size 15 \ + --update-freq 1 --seed 2 --log-format json --log-interval 25 \ + --fp16 +``` + +##### 2. Evaluate +For Adaptive Span: +```bash +fairseq-eval-lm ~/data/enwik8/data-bin/ --path model/checkpoint_best.pt \ + --user-dir examples/adaptive_span \ + --task truncated_bptt_lm --batch-size 8 --tokens-per-sample 512 --gen-subset test +``` +For Transformer-XL evaluation: +```bash +fairseq-eval-lm ~/data/enwik8/data-bin/ --path model/checkpoint_best.pt \ + --user-dir examples/truncated_bptt/ --task truncated_bptt_lm --batch-size 8 \ + --tokens-per-sample 80 \ + --model-overrides '{"mem_len":2100,"clamp_len":820,"same_length":True}' \ + --gen-subset valid +``` + +*Note:* During training the model saw 512 tokens of context +(``--tokens-per-sample=512``), with batch size 8. These settings match the evaluation +settings from [the original +paper](https://github.com/facebookresearch/adaptive-span/blob/master/experiments/enwik8.sh). diff --git a/SpeechT5/fairseq/examples/adaptive_span/__init__.py b/SpeechT5/fairseq/examples/adaptive_span/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..e0a142a769360e1140bf814c532eaf841f1d52d8 --- /dev/null +++ b/SpeechT5/fairseq/examples/adaptive_span/__init__.py @@ -0,0 +1,19 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. 
+ +import importlib +import os + +# automatically import any Python files in the current directory +cur_dir = os.path.dirname(__file__) +for file in os.listdir(cur_dir): + path = os.path.join(cur_dir, file) + if ( + not file.startswith("_") + and not file.startswith(".") + and (file.endswith(".py") or os.path.isdir(path)) + ): + mod_name = file[: file.find(".py")] if file.endswith(".py") else file + module = importlib.import_module(__name__ + "." + mod_name) diff --git a/SpeechT5/fairseq/examples/adaptive_span/adagrad_with_grad_clip.py b/SpeechT5/fairseq/examples/adaptive_span/adagrad_with_grad_clip.py new file mode 100644 index 0000000000000000000000000000000000000000..585ce184ab2d6bbde0d2f7fcafd6536fa8f6d8b6 --- /dev/null +++ b/SpeechT5/fairseq/examples/adaptive_span/adagrad_with_grad_clip.py @@ -0,0 +1,128 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +from torch.optim import Adagrad + +from fairseq.optim import LegacyFairseqOptimizer, register_optimizer + + +@register_optimizer("adagrad_with_grad_clip") +class FairseqAdagradWithGradClip(LegacyFairseqOptimizer): + def __init__(self, args, params): + super().__init__(args) + self._optimizer = AdagradWithGradClip(params, **self.optimizer_config) + + @staticmethod + def add_args(parser): + """Add optimizer-specific arguments to the parser.""" + # fmt: off + parser.add_argument('--weight-decay', '--wd', default=0.0, type=float, metavar='WD', + help='weight decay') + parser.add_argument('--adagrad-clip', default=0.0, type=float, metavar='D', + help='internal grad clip') + # fmt: on + + @property + def optimizer_config(self): + """ + Return a kwarg dictionary that will be used to override optimizer + args stored in checkpoints. This allows us to load a checkpoint and + resume training using a different set of optimizer args, e.g., with a + different learning rate. 
+ """ + return { + "lr": self.args.lr[0], + "weight_decay": self.args.weight_decay, + "grad_clip": self.args.adagrad_clip, + } + + @property + def supports_flat_params(self): + return False + + +def _clip_grad(clr, grad, group_grad_clip): + if group_grad_clip > 0: + norm = grad.norm(2).item() + if norm > group_grad_clip: + clr *= group_grad_clip / (norm + 1e-10) + return clr + + +class AdagradWithGradClip(Adagrad): + """Adagrad algorithm with custom gradient clipping""" + + def __init__( + self, + params, + lr=1e-2, + lr_decay=0, + weight_decay=0, + initial_accumulator_value=0, + grad_clip=0, + ): + Adagrad.__init__( + self, + params, + lr=lr, + lr_decay=lr_decay, + weight_decay=weight_decay, + initial_accumulator_value=initial_accumulator_value, + ) + self.defaults["grad_clip"] = grad_clip + self.param_groups[0].setdefault("grad_clip", grad_clip) + + def step(self, closure=None): + loss = None + if closure is not None: + loss = closure() + + for group in self.param_groups: + for p in group["params"]: + if p.grad is None: + continue + + grad = p.grad.data + state = self.state[p] + + state["step"] += 1 + + if group["weight_decay"] != 0: + if p.grad.data.is_sparse: + raise RuntimeError( + "weight_decay option is " + "not compatible with sparse " + "gradients" + ) + grad = grad.add(group["weight_decay"], p.data) + + clr = group["lr"] / (1 + (state["step"] - 1) * group["lr_decay"]) + + # clip + clr = _clip_grad(clr=clr, grad=grad, group_grad_clip=group["grad_clip"]) + + if grad.is_sparse: + # the update is non-linear so indices must be unique + grad = grad.coalesce() + grad_indices = grad._indices() + grad_values = grad._values() + size = grad.size() + + def make_sparse(values): + constructor = grad.new + if grad_indices.dim() == 0 or values.dim() == 0: + return constructor().resize_as_(grad) + return constructor(grad_indices, values, size) + + state["sum"].add_(make_sparse(grad_values.pow(2))) + std = state["sum"]._sparse_mask(grad) + std_values = std._values().sqrt_().add_(1e-10) + p.data.add_(-clr, make_sparse(grad_values / std_values)) + else: + state["sum"].addcmul_(1, grad, grad) + std = state["sum"].sqrt().add_(1e-10) + p.data.addcdiv_(-clr, grad, std) + + return loss diff --git a/SpeechT5/fairseq/examples/adaptive_span/adaptive_span_attention.py b/SpeechT5/fairseq/examples/adaptive_span/adaptive_span_attention.py new file mode 100644 index 0000000000000000000000000000000000000000..07f757bb8e1a8a67b1124175ee338c8735aa8d65 --- /dev/null +++ b/SpeechT5/fairseq/examples/adaptive_span/adaptive_span_attention.py @@ -0,0 +1,160 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. +import math + +import torch +import torch.nn as nn +import torch.nn.functional as F + + +class AdaptiveMask(nn.Module): + """Soft masking function for adaptive size. + It masks out the last K values of an input. The masking value + goes from 1 to 0 gradually, so K can be learned with + back-propagation. + Args: + max_size: maximum size (i.e. 
input dimension) + ramp_size: size of the ramp going from 0 to 1 + init_val: initial size proportion not to be masked out + shape: learn multiple sizes independent of each other + """ + + def __init__(self, max_size, ramp_size, init_val=0, shape=(1,)): + nn.Module.__init__(self) + self._max_size = max_size + self._ramp_size = ramp_size + self.current_val = nn.Parameter(torch.zeros(*shape) + init_val) + mask_template = torch.linspace(1 - max_size, 0, steps=max_size) + self.register_buffer("mask_template", mask_template) + + def forward(self, x): + mask = self.mask_template.float() + self.current_val.float() * self._max_size + mask = mask / self._ramp_size + 1 + mask = mask.clamp(0, 1) + if x.size(-1) < self._max_size: + # the input could have been trimmed beforehand to save computation + mask = mask.narrow(-1, self._max_size - x.size(-1), x.size(-1)) + x = (x * mask).type_as(x) + return x + + def get_current_max_size(self, include_ramp=True): + current_size = math.ceil(self.current_val.max().item() * self._max_size) + if include_ramp: + current_size += self._ramp_size + current_size = max(0, min(self._max_size, current_size)) + return current_size + + def get_current_avg_size(self, include_ramp=True): + current_size = math.ceil( + self.current_val.float().mean().item() * self._max_size + ) + if include_ramp: + current_size += self._ramp_size + current_size = max(0, min(self._max_size, current_size)) + return current_size + + def clamp_param(self): + """this need to be called after each update""" + self.current_val.data.clamp_(0, 1) + + +class AdaptiveSpan(nn.Module): + """Adaptive attention span for Transformerself. + This module learns an attention span length from data for each + self-attention head. + Args: + attn_span: maximum attention span + adapt_span_loss: loss coefficient for the span length + adapt_span_ramp: length of the masking ramp + adapt_span_init: initial size ratio + adapt_span_cache: adapt cache size to reduce memory usage + """ + + def __init__( + self, + attn_span, + adapt_span_ramp, + adapt_span_init, + n_head, + adapt_span_layer, + **kargs + ): + nn.Module.__init__(self) + self._max_span = attn_span + self._n_head = n_head + self._adapt_span_layer = adapt_span_layer + if self._adapt_span_layer: + self._mask = AdaptiveMask( + max_size=self._max_span, + ramp_size=adapt_span_ramp, + init_val=adapt_span_init, + ) + else: + self._mask = AdaptiveMask( + max_size=self._max_span, + ramp_size=adapt_span_ramp, + init_val=adapt_span_init, + shape=(n_head, 1, 1), + ) + + def forward(self, attn, normalize=True): + """mask attention with the right span""" + # batch and head dimensions are merged together, so separate them first + self.clamp_param() + if self._adapt_span_layer: + attn = self._mask(attn) + else: + B = attn.size(0) # batch size + M = attn.size(1) # block size + attn = attn.reshape(B // self._n_head, self._n_head, M, -1) + attn = self._mask(attn) + attn = attn.view(B, M, -1) + return attn + + def get_trim_len(self): + """how much of memory can be trimmed to reduce computation""" + L = self._max_span + trim_len = min(L - 1, L - self._mask.get_current_max_size()) + # too fine granularity might be bad for the memory management + trim_len = math.floor(trim_len / 64) * 64 + return trim_len + + def trim_memory(self, query, key, value, key_pe): + """trim out unnecessary memory beforehand to reduce computation""" + trim_len = self.get_trim_len() + cache_size = key.size(1) - query.size(1) + trim_len_cache = trim_len - (self._max_span - cache_size) + if trim_len_cache > 0: + key 
= key[:, trim_len_cache:, :] + value = value[:, trim_len_cache:, :] + elif trim_len_cache < 0: + # cache is too short! this happens when validation resumes + # after a lot of updates. + key = F.pad(key, [0, 0, -trim_len_cache, 0]) + value = F.pad(value, [0, 0, -trim_len_cache, 0]) + if trim_len > 0: + if key_pe is not None: + key_pe = key_pe[:, :, trim_len:] + return key, value, key_pe + + def get_cache_size(self): + """determine how long the cache should be""" + trim_len = self.get_trim_len() + # give a buffer of 64 steps since a span might increase + # in future updates + return min(self._max_span, self._max_span - trim_len + 64) + + def get_loss(self): + """a loss term for regularizing the span length""" + return self._max_span * self._mask.current_val.float().mean() + + def get_current_max_span(self): + return self._mask.get_current_max_size() + + def get_current_avg_span(self): + return self._mask.get_current_avg_size() + + def clamp_param(self): + self._mask.clamp_param() diff --git a/SpeechT5/fairseq/examples/adaptive_span/adaptive_span_loss.py b/SpeechT5/fairseq/examples/adaptive_span/adaptive_span_loss.py new file mode 100644 index 0000000000000000000000000000000000000000..056245807e5f8d313a8ad5be68aea4e285f4f580 --- /dev/null +++ b/SpeechT5/fairseq/examples/adaptive_span/adaptive_span_loss.py @@ -0,0 +1,106 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +import math +from dataclasses import dataclass + +import torch.nn.functional as F +from fairseq import metrics, utils +from fairseq.criterions import register_criterion +from fairseq.criterions.cross_entropy import CrossEntropyCriterion +from fairseq.dataclass import FairseqDataclass +from omegaconf import II + + +@dataclass +class AdaptiveSpanCriterionConfig(FairseqDataclass): + sentence_avg: bool = II("optimization.sentence_avg") + + +@register_criterion("adaptive_span_loss", dataclass=AdaptiveSpanCriterionConfig) +class AdaptiveSpanCriterion(CrossEntropyCriterion): + def __init__(self, task, sentence_avg): + super().__init__(task, sentence_avg) + + def forward(self, model, sample, reduce=True): + """Compute the loss for the given sample. 
+ + Returns a tuple with three elements: + 1) the loss here is summed, different from the adaptive span code + 2) the sample size, which is used as the denominator for the gradient + 3) logging outputs to display while training + """ + net_output = model(**sample["net_input"]) + loss, aux_loss, avg_span, max_span = self.compute_loss( + model, net_output, sample, reduce=reduce + ) + sample_size = ( + sample["target"].size(0) if self.sentence_avg else sample["ntokens"] + ) + loss /= sample_size + total_loss = loss + aux_loss + sample_size = 1 + + logging_output = { + "loss": loss.data, + "ntokens": sample["ntokens"], + "nsentences": sample["target"].size(0), + "sample_size": sample_size, + "total_loss": total_loss.data, + "avg_span": avg_span * sample_size, + "max_span": max_span * sample_size, + } + return total_loss, sample_size, logging_output + + def compute_loss(self, model, net_output, sample, reduce=True): + loss, _ = super().compute_loss(model, net_output, sample, reduce) + aux_loss = model.get_aux_loss() + avg_span = model.get_current_avg_span() + max_span = model.get_current_max_span() + return loss, aux_loss, avg_span, max_span + + @staticmethod + def reduce_metrics(logging_outputs) -> None: + """Aggregate logging outputs from data parallel training.""" + loss_sum = sum(log.get("loss", 0) for log in logging_outputs) + ntokens = sum(log.get("ntokens", 0) for log in logging_outputs) + sample_size = sum(log.get("sample_size", 0) for log in logging_outputs) + total_loss_sum = sum(log.get("total_loss", 0) for log in logging_outputs) + avg_span_sum = sum(log.get("avg_span", 0) for log in logging_outputs) + max_span_sum = sum(log.get("max_span", 0) for log in logging_outputs) + + # we divide by log(2) to convert the loss from base e to base 2 + metrics.log_scalar( + "loss", loss_sum / sample_size / math.log(2), sample_size, round=3 + ) + metrics.log_scalar("avg_span", avg_span_sum / sample_size, sample_size, round=3) + metrics.log_scalar("max_span", max_span_sum / sample_size, sample_size, round=3) + # total loss contains the L1 norm on adaptive-span + metrics.log_scalar( + "total_loss", + total_loss_sum / sample_size / math.log(2), + sample_size, + round=3, + ) + if sample_size != ntokens: + metrics.log_scalar( + "nll_loss", loss_sum / ntokens / math.log(2), ntokens, round=3 + ) + metrics.log_derived( + "ppl", lambda meters: utils.get_perplexity(meters["nll_loss"].avg) + ) + else: + metrics.log_derived( + "ppl", lambda meters: utils.get_perplexity(meters["loss"].avg) + ) + + @staticmethod + def logging_outputs_can_be_summed() -> bool: + """ + Whether the logging outputs returned by `forward` can be summed + across workers prior to calling `reduce_metrics`. Setting this + to True will improves distributed training speed. + """ + return True diff --git a/SpeechT5/fairseq/examples/adaptive_span/adaptive_span_model.py b/SpeechT5/fairseq/examples/adaptive_span/adaptive_span_model.py new file mode 100644 index 0000000000000000000000000000000000000000..d96c95b85dbcf29e9384cc6d8d9630d2489991b2 --- /dev/null +++ b/SpeechT5/fairseq/examples/adaptive_span/adaptive_span_model.py @@ -0,0 +1,263 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# All rights reserved. +# +# This source code is licensed under the license found in the +# LICENSE file in the root directory of this source tree. 
+ +import math + +import torch +import torch.nn as nn +import torch.nn.functional as F + +from fairseq.modules.layer_norm import LayerNorm + +from .adaptive_span_attention import AdaptiveSpan + +# Size notations: +# B = batch_size, H = d_model, M = block_size, L = attn_span + + +def _skew(X, pad_value): + """shift every row 1 step to right""" + # X = B x M x L + B, M, L = X.size() + X = F.pad(X, (0, M + 1), value=pad_value) # B x M x (L+M+1) + X = X.view(B, -1) # B x ML+MM+M + X = X[:, :-M] # B x ML+MM + X = X.view(B, M, M + L) # B x M x L+M + return X + + +def _unskew(X): + """reverse _skew operation""" + # X = B x M x L+M + B, M, L = X.size() + L -= M + X = X.view(B, -1) # B x ML+MM + X = F.pad(X, (0, M)) # B x ML+MM+M + X = X.view(B, M, M + L + 1) # B x M x L+M+1 + X = X[:, :, :L] # B x M x L + return X + + +class SeqAttention(nn.Module): + """Sequential self-attention layer. + Each token will attend to its previous fixed number of steps. + Note that attention doesn't include the current step itself. + """ + + def __init__(self, d_model, n_head, attn_span, dropout, adapt_span_layer, **kargs): + nn.Module.__init__(self) + self.dropout = nn.Dropout(dropout) + self.d_model = d_model # size of a single head + self.attn_span = attn_span + self.adaptive_span = AdaptiveSpan( + attn_span=attn_span, + n_head=n_head, + adapt_span_layer=adapt_span_layer, + **kargs + ) + + def forward(self, query, key, value, key_pe): + # query size = B x M x H + # key, value sizes = B x (M+L) x H + + key, value, key_pe = self.adaptive_span.trim_memory(query, key, value, key_pe) + + # compute attention from context + # B x M (dest) x (M+L) (src) + attn_cont = torch.matmul(query, key.transpose(-1, -2)) + attn_cont = _unskew(attn_cont) # B x M x L + + # compute the effect of position embedding + attn_pos = torch.matmul(query, key_pe) # B x M x L_pos + attn = attn_cont + attn_pos + + attn = attn / math.sqrt(self.d_model) # B x M X L_pos + + attn = F.softmax(attn.float(), dim=-1).type_as(attn) + + # trim attention lengths according to the learned span + attn = self.adaptive_span(attn) + + attn = self.dropout(attn) # B x M X L_pos + + attn_cont = _skew(attn, 0) # B x M X (L+M) + out = torch.matmul(attn_cont, value) # B x M x H + return out + + def get_cache_size(self): + return self.adaptive_span.get_cache_size() + + +class MultiHeadSeqAttention(nn.Module): + def __init__(self, d_model, n_head, **kargs): + nn.Module.__init__(self) + assert d_model % n_head == 0 + self.n_head = n_head + self.head_dim = d_model // n_head + self.attn = SeqAttention(d_model=self.head_dim, n_head=n_head, **kargs) + self.proj_query = nn.Linear(d_model, d_model, bias=False) + nn.init.xavier_normal_(self.proj_query.weight) + self.proj_out = nn.Linear(d_model, d_model, bias=False) + nn.init.xavier_normal_(self.proj_out.weight) + self.proj_val = nn.Linear(d_model, d_model, bias=False) + nn.init.xavier_normal_(self.proj_val.weight) + self.proj_key = nn.Linear(d_model, d_model, bias=False) + nn.init.xavier_normal_(self.proj_key.weight) + + def head_reshape(self, x): + K = self.n_head + D = self.head_dim + x = x.view(x.size()[:-1] + (K, D)) # B x (M+L) x K x D + x = x.transpose(1, 2).contiguous() # B x K x (M+L) x D + x = x.view(-1, x.size(-2), x.size(-1)) # B_K x (M+L) x D + return x + + def forward(self, query, key, value, key_pe): + B = query.size(0) + K = self.n_head + D = self.head_dim + M = query.size(1) + + query = self.proj_query(query) + query = self.head_reshape(query) + value = self.proj_val(value) + value = self.head_reshape(value) + key 
= self.proj_key(key) + key = self.head_reshape(key) + + out = self.attn(query, key, value, key_pe) # B_K x M x D + out = out.view(B, K, M, D) # B x K x M x D + out = out.transpose(1, 2).contiguous() # B x M x K x D + out = out.view(B, M, -1) # B x M x K_D + out = self.proj_out(out) + return out + + +class FeedForwardLayer(nn.Module): + def __init__(self, d_model, d_inner, dropout, **kargs): + nn.Module.__init__(self) + self.fc1 = nn.Linear(d_model, d_inner) + self.fc2 = nn.Linear(d_inner, d_model) + nn.init.xavier_uniform_(self.fc1.weight) + nn.init.xavier_uniform_(self.fc2.weight) + self.dropout = nn.Dropout(dropout) + + def forward(self, h): + h1 = F.relu(self.fc1(h)) + h1 = self.dropout(h1) + h2 = self.fc2(h1) + return h2 + + +class TransformerSeqLayer(nn.Module): + def __init__(self, d_model, **kargs): + nn.Module.__init__(self) + self.attn = MultiHeadSeqAttention(d_model=d_model, **kargs) + self.norm1 = LayerNorm(d_model) + self.ff = FeedForwardLayer(d_model=d_model, **kargs) + self.norm2 = LayerNorm(d_model) + + def forward(self, h, h_cache, key_pe): + # h = B x M x H + # h_cache = B x L x H + h_all = torch.cat([h_cache, h], dim=1) # B x (M+L) x H + attn_out = self.attn(h, h_all, h_all, key_pe) + h = self.norm1(h + attn_out) # B x M x H + if self.ff is not None: + ff_out = self.ff(h) + out = self.norm2(h + ff_out) # B x M x H + else: + out = h + return out + + def get_cache_size(self): + return self.attn.attn.get_cache_size() + + +class TransformerSeq(nn.Module): + def __init__( + self, + vocab_size, + d_model, + n_head, + n_layer, + attn_span, + emb_dropout, + aux_loss_scaler, + adapt_span_layer, + **kargs + ): + nn.Module.__init__(self) + # token embeddings + self.in_emb = nn.Embedding(vocab_size, d_model) + nn.init.normal_(self.in_emb.weight, mean=0, std=d_model ** -0.5) + self.out_emb = nn.Linear(d_model, vocab_size) + self.aux_loss_scaler = aux_loss_scaler + if emb_dropout > 0: + self.emb_dropout = nn.Dropout(emb_dropout) + else: + self.emb_dropout = None + # position embeddings + self.key_pe = nn.Parameter(torch.randn(1, d_model // n_head, attn_span)) + + self.layers = nn.ModuleList() + self.layers.extend( + TransformerSeqLayer( + d_model=d_model, + n_head=n_head, + attn_span=attn_span, + adapt_span_layer=adapt_span_layer, + **kargs + ) + for _ in range(n_layer) + ) + + def forward(self, x, h_cache, target=None): + # x size = B x M + block_size = x.size(1) + h = self.in_emb(x) # B x M x H + if self.emb_dropout is not None: + h = self.emb_dropout(h) + + h_cache_next = [] + for l, layer in enumerate(self.layers): + cache_size = layer.attn.attn.get_cache_size() + if cache_size > block_size: + h_cache_next_l = torch.cat( + [h_cache[l][:, -cache_size + block_size :, :], h], dim=1 + ).detach() + else: + h_cache_next_l = h[:, -cache_size:, :].detach() + h_cache_next.append(h_cache_next_l) + h = layer(h, h_cache[l], self.key_pe) # B x M x H + + if self.emb_dropout is not None: + h = self.emb_dropout(h) + + out = F.log_softmax(self.out_emb(h).float(), dim=-1).type_as(h) + dummy_loss = None + + return out, h_cache_next, dummy_loss + + def get_aux_loss(self): + loss = 0.0 + for layer in self.layers: + loss += layer.attn.attn.adaptive_span.get_loss() + return self.aux_loss_scaler * loss + + def get_current_max_span(self): + max_span = 0.0 + for layer in self.layers: + max_span = max( + max_span, layer.attn.attn.adaptive_span.get_current_max_span() + ) + return max_span + + def get_current_avg_span(self): + avg_span = 0.0 + for layer in self.layers: + avg_span += 
layer.attn.attn.adaptive_span.get_current_avg_span() + return avg_span / len(self.layers) diff --git a/SpeechT5/fairseq/examples/adaptive_span/adaptive_span_model_wrapper.py b/SpeechT5/fairseq/examples/adaptive_span/adaptive_span_model_wrapper.py new file mode 100644 index 0000000000000000000000000000000000000000..5b147fe11f9d730438d036321a2d4a5d776efaa2 --- /dev/null +++ b/SpeechT5/fairseq/examples/adaptive_span/adaptive_span_model_wrapper.py @@ -0,0 +1,145 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +import logging +from dataclasses import dataclass +from typing import Dict, List, Optional + +import torch +from fairseq.dataclass import FairseqDataclass +from fairseq.models import ( + FairseqIncrementalDecoder, + FairseqLanguageModel, + register_model, +) +from .adaptive_span_model import TransformerSeq as AdaptiveSpanTransformerModel + + +logger = logging.getLogger(__name__) + + +@dataclass +class AdaptiveSpanSmallConfig(FairseqDataclass): + # defaults come from https://github.com/facebookresearch/adaptive-span/blob/master/experiments/enwik8_small.sh + vocab_size: int = 50 + d_model: int = 256 + n_head: int = 4 + d_inner: int = 1024 + n_layer: int = 8 + attn_span: int = 1024 + dropout: float = 0.0 + emb_dropout: float = 0.0 + adapt_span_ramp: int = 32 + adapt_span_init: float = 0.0 + aux_loss_scaler: float = 0.000002 + adapt_span_layer: bool = False + + +@register_model("adaptive_span", dataclass=AdaptiveSpanSmallConfig) +class AdaptiveSpanTransformer(FairseqLanguageModel): + @classmethod + def build_model(cls, cfg: AdaptiveSpanSmallConfig, task): + return cls(AdaptiveSpanDecoder(cfg, task)) + + def get_aux_loss(self): + return self.decoder.get_aux_loss() + + def get_current_max_span(self): + return self.decoder.get_current_max_span() + + def get_current_avg_span(self): + return self.decoder.get_current_avg_span() + + +class AdaptiveSpanDecoder(FairseqIncrementalDecoder): + def __init__(self, cfg, task): + + super().__init__(task.target_dictionary) + + self.config = cfg + config = AdaptiveSpanSmallConfig( + vocab_size=len(task.target_dictionary), + d_model=cfg.d_model, + n_head=cfg.n_head, + d_inner=cfg.d_inner, + n_layer=cfg.n_layer, + attn_span=cfg.attn_span, + dropout=cfg.dropout, + emb_dropout=cfg.emb_dropout, + adapt_span_ramp=cfg.adapt_span_ramp, + adapt_span_init=cfg.adapt_span_init, + aux_loss_scaler=cfg.aux_loss_scaler, + adapt_span_layer=cfg.adapt_span_layer, + ) + logger.info(config) + self.model = AdaptiveSpanTransformerModel(**config.__dict__) + + self._mems = None + + def forward( + self, + src_tokens, + incremental_state: Optional[Dict[str, List[torch.Tensor]]] = None, + encoder_out=None, + ): + bsz = src_tokens.size(0) + if incremental_state is not None: # used during inference + mems = self.get_incremental_state("mems") + src_tokens = src_tokens[:, -1:] # only keep the most recent token + else: + mems = self._mems + + if mems is None: + # first time init + mems = self.init_hid_cache(bsz) + output = self.model(x=src_tokens, h_cache=mems,) + if incremental_state is not None: + self.set_incremental_state(incremental_state, "mems", output[1]) + else: + self._mems = output[1] + return (output[0],) + + def max_positions(self): + return self.config.attn_span + + def init_hid_cache(self, batch_sz): + hid = [] + for layer in self.model.layers: + param = next(self.model.parameters()) + h = torch.zeros( + batch_sz, + layer.get_cache_size(), + 
self.config.d_model, + dtype=param.dtype, + device=param.device, + ) + hid.append(h) + return hid + + def get_aux_loss(self): + return self.model.get_aux_loss() + + def get_current_max_span(self): + return self.model.get_current_max_span() + + def get_current_avg_span(self): + return self.model.get_current_avg_span() + + def reorder_incremental_state( + self, + incremental_state: Dict[str, Dict[str, Optional[torch.Tensor]]], + new_order: torch.Tensor, + ): + """Reorder incremental state. + + This will be called when the order of the input has changed from the + previous time step. A typical use case is beam search, where the input + order changes between time steps based on the selection of beams. + """ + raise NotImplementedError("This is required for generation/beam search") + # mems = self.get_incremental_state(incremental_state, "mems") + # if mems is not None: + # new_mems = [mems_i.index_select(1, new_order) for mems_i in mems] + # self.set_incremental_state(incremental_state, "mems", new_mems) diff --git a/SpeechT5/fairseq/examples/adaptive_span/truncated_bptt_lm_task.py b/SpeechT5/fairseq/examples/adaptive_span/truncated_bptt_lm_task.py new file mode 100644 index 0000000000000000000000000000000000000000..02be0e7fb4213b98798c85b79e9046e9990b97fc --- /dev/null +++ b/SpeechT5/fairseq/examples/adaptive_span/truncated_bptt_lm_task.py @@ -0,0 +1,281 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +import logging +import os +from dataclasses import dataclass, field +from typing import List, Optional, Tuple + +import torch +from fairseq import utils +from fairseq.data import ( + Dictionary, + TokenBlockDataset, + data_utils, + iterators, +) +from fairseq.dataclass import FairseqDataclass +from fairseq.distributed import utils as dist_utils +from fairseq.tasks import FairseqTask, register_task +from omegaconf import II + + +logger = logging.getLogger(__name__) + + +@dataclass +class TruncatedBPTTLMConfig(FairseqDataclass): + data: str = field(default="???", metadata={"help": "path to data directory"}) + tokens_per_sample: int = field( + default=1024, + metadata={"help": "max number of tokens per sequence"}, + ) + batch_size: int = II("dataset.batch_size") + # Some models use *max_target_positions* to know how many positional + # embeddings to learn. We use II(...) to make it default to + # *tokens_per_sample*, but in principle there could be more positional + # embeddings than tokens in a single batch. This may also be irrelevant for + # custom model implementations. 
+ max_target_positions: int = II("task.tokens_per_sample") + # these will be populated automatically if not provided + data_parallel_rank: Optional[int] = None + data_parallel_size: Optional[int] = None + + +@register_task("truncated_bptt_lm", dataclass=TruncatedBPTTLMConfig) +class TruncatedBPTTLMTask(FairseqTask): + def __init__(self, cfg: TruncatedBPTTLMConfig): + super().__init__(cfg) + + if cfg.data_parallel_rank is None or cfg.data_parallel_size is None: + if torch.distributed.is_initialized(): + cfg.data_parallel_rank = dist_utils.get_data_parallel_rank() + cfg.data_parallel_size = dist_utils.get_data_parallel_world_size() + else: + cfg.data_parallel_rank = 0 + cfg.data_parallel_size = 1 + + # load the dictionary + paths = utils.split_paths(cfg.data) + assert len(paths) > 0 + self.dictionary = Dictionary.load(os.path.join(paths[0], "dict.txt")) + logger.info("dictionary: {} types".format(len(self.dictionary))) + + def load_dataset(self, split, epoch=1, combine=False, **kwargs): + """Load a given dataset split (e.g., train, valid, test)""" + + # support sharded datasets + paths = utils.split_paths(self.cfg.data) + assert len(paths) > 0 + data_path = paths[(epoch - 1) % len(paths)] + split_path = os.path.join(data_path, split) + + # each element of *data* will be a tensorized line from the original + # text dataset, similar to ``open(split_path).readlines()`` + data = data_utils.load_indexed_dataset( + split_path, self.dictionary, combine=combine + ) + if data is None: + raise FileNotFoundError( + "Dataset not found: {} ({})".format(split, split_path) + ) + + # this is similar to ``data.view(-1).split(tokens_per_sample)`` + data = TokenBlockDataset( + data, + data.sizes, + block_size=self.cfg.tokens_per_sample, + pad=None, # unused + eos=None, # unused + break_mode="none", + ) + + self.datasets[split] = TruncatedBPTTDataset( + data=data, + bsz_per_shard=self.cfg.batch_size, + shard_id=self.cfg.data_parallel_rank, + num_shards=self.cfg.data_parallel_size, + ) + + def dataset(self, split): + return self.datasets[split] + + def get_batch_iterator( + self, dataset, num_workers=0, epoch=1, data_buffer_size=0, **kwargs + ): + return iterators.EpochBatchIterator( + dataset=dataset, + collate_fn=self._collate_fn, + num_workers=num_workers, + epoch=epoch, + buffer_size=data_buffer_size, + # we don't use the batching functionality from EpochBatchIterator; + # instead every item in *dataset* is a whole batch + batch_sampler=[[i] for i in range(len(dataset))], + disable_shuffling=True, + ) + + def _collate_fn(self, items: List[List[torch.Tensor]]): + # we don't use fairseq's batching functionality, so we expect a single + # Tensor of type List[torch.Tensor] + assert len(items) == 1 + + # item will have shape B x T (the last batch may have length < T) + id, item = items[0] + item = data_utils.collate_tokens(item, pad_idx=self.source_dictionary.pad()) + B, T = item.size() + + # shift item one position over and append a padding token for the target + target = torch.nn.functional.pad( + item[:, 1:], (0, 1, 0, 0), value=self.target_dictionary.pad() + ) + + # fairseq expects batches to have the following structure + return { + "id": torch.tensor([id]*item.size(0)), + "net_input": { + "src_tokens": item, + }, + "target": target, + "nsentences": item.size(0), + "ntokens": item.numel(), + } + + def build_dataset_for_inference( + self, src_tokens: List[torch.Tensor], src_lengths: List[int], **kwargs + ) -> torch.utils.data.Dataset: + eos = self.source_dictionary.eos() + dataset = TokenBlockDataset( + 
src_tokens, + src_lengths, + block_size=None, # ignored for "eos" break mode + pad=self.source_dictionary.pad(), + eos=eos, + break_mode="eos", + ) + + class Dataset(torch.utils.data.Dataset): + def __getitem__(self, i): + item = dataset[i] + if item[-1] == eos: + # remove eos to support generating with a prefix + item = item[:-1] + return (i, [item]) + + def __len__(self): + return len(dataset) + + return Dataset() + + def inference_step( + self, generator, models, sample, prefix_tokens=None, constraints=None + ): + with torch.no_grad(): + if constraints is not None: + raise NotImplementedError + + # SequenceGenerator doesn't use *src_tokens* directly, we need to + # pass the *prefix_tokens* argument instead. + if prefix_tokens is None and sample["net_input"]["src_tokens"].nelement(): + prefix_tokens = sample["net_input"]["src_tokens"] + + # begin generation with the end-of-sentence token + bos_token = self.source_dictionary.eos() + + return generator.generate( + models, sample, prefix_tokens=prefix_tokens, bos_token=bos_token + ) + + def eval_lm_dataloader( + self, + dataset, + max_tokens: Optional[int] = 36000, + batch_size: Optional[int] = None, + max_positions: Optional[int] = None, + num_shards: int = 1, + shard_id: int = 0, + num_workers: int = 1, + data_buffer_size: int = 10, + context_window: int = 0, + ): + if context_window > 0: + raise NotImplementedError( + "Transformer-XL doesn't need --context-window, try " + "--model-overrides '{\"mem_len\":42}' instead " + ) + return self.get_batch_iterator( + dataset=dataset, + max_tokens=max_tokens, + max_sentences=batch_size, + max_positions=max_positions, + ignore_invalid_inputs=True, + num_shards=num_shards, + shard_id=shard_id, + num_workers=num_workers, + data_buffer_size=data_buffer_size, + ).next_epoch_itr(shuffle=False) + + @property + def source_dictionary(self): + return self.dictionary + + @property + def target_dictionary(self): + return self.dictionary + + +class TruncatedBPTTDataset(torch.utils.data.Dataset): + def __init__( + self, + data: List[torch.Tensor], # ordered list of items + bsz_per_shard, # number of items processed per GPUs per forward + shard_id, # current GPU ID + num_shards, # number of GPUs + ): + super().__init__() + self.data = data + + def batchify(data, bsz): + # Work out how cleanly we can divide the dataset into bsz parts. + nbatch = data.size(0) // bsz + # Trim off any extra elements that wouldn't cleanly fit (remainders). + data = data.narrow(0, 0, nbatch * bsz) + # Evenly divide the data across the bsz batches. 
+ data = data.view(bsz, -1).contiguous() + return data + + # total number of sequences processed by all GPUs in each forward pass + global_batch_size = bsz_per_shard * num_shards + + """ + With a 16 item dataset, bsz_per_shard=2 and num_shards=3, + *indices* might look like: + + indices = [[0, 1], + [2, 3], + [4, 5], + [6, 7], + [8, 9], + [10, 11]] + + The size of the TruncatedBPTTDataset instance will be 2, + and shard 1 will see items: + + [(0, [data[4], data[6]]), + (1, [data[5], data[7]])] + """ + indices = batchify(torch.arange(len(data)), global_batch_size) + assert indices.size(0) == global_batch_size + + self.my_indices = indices[ + shard_id * bsz_per_shard : (shard_id + 1) * bsz_per_shard + ] + assert self.my_indices.size(0) == bsz_per_shard + + def __len__(self): + return self.my_indices.size(1) + + def __getitem__(self, i) -> Tuple[int, List[torch.Tensor]]: + return (i, [self.data[idx] for idx in self.my_indices[:, i]]) diff --git a/SpeechT5/fairseq/examples/backtranslation/README.md b/SpeechT5/fairseq/examples/backtranslation/README.md new file mode 100644 index 0000000000000000000000000000000000000000..73675f1125d80f58aa824db67d8970504d4d6b2a --- /dev/null +++ b/SpeechT5/fairseq/examples/backtranslation/README.md @@ -0,0 +1,297 @@ +# Understanding Back-Translation at Scale (Edunov et al., 2018) + +This page includes pre-trained models from the paper [Understanding Back-Translation at Scale (Edunov et al., 2018)](https://arxiv.org/abs/1808.09381). + +## Pre-trained models + +Model | Description | Dataset | Download +---|---|---|--- +`transformer.wmt18.en-de` | Transformer
([Edunov et al., 2018](https://arxiv.org/abs/1808.09381))<br>
WMT'18 winner | [WMT'18 English-German](http://www.statmt.org/wmt18/translation-task.html) | [download (.tar.gz)](https://dl.fbaipublicfiles.com/fairseq/models/wmt18.en-de.ensemble.tar.gz)<br>
See NOTE in the archive + +## Example usage (torch.hub) + +We require a few additional Python dependencies for preprocessing: +```bash +pip install subword_nmt sacremoses +``` + +Then to generate translations from the full model ensemble: +```python +import torch + +# List available models +torch.hub.list('pytorch/fairseq') # [..., 'transformer.wmt18.en-de', ... ] + +# Load the WMT'18 En-De ensemble +en2de_ensemble = torch.hub.load( + 'pytorch/fairseq', 'transformer.wmt18.en-de', + checkpoint_file='wmt18.model1.pt:wmt18.model2.pt:wmt18.model3.pt:wmt18.model4.pt:wmt18.model5.pt', + tokenizer='moses', bpe='subword_nmt') + +# The ensemble contains 5 models +len(en2de_ensemble.models) +# 5 + +# Translate +en2de_ensemble.translate('Hello world!') +# 'Hallo Welt!' +``` + +## Training your own model (WMT'18 English-German) + +The following instructions can be adapted to reproduce the models from the paper. + + +#### Step 1. Prepare parallel data and optionally train a baseline (English-German) model + +First download and preprocess the data: +```bash +# Download and prepare the data +cd examples/backtranslation/ +bash prepare-wmt18en2de.sh +cd ../.. + +# Binarize the data +TEXT=examples/backtranslation/wmt18_en_de +fairseq-preprocess \ + --joined-dictionary \ + --source-lang en --target-lang de \ + --trainpref $TEXT/train --validpref $TEXT/valid --testpref $TEXT/test \ + --destdir data-bin/wmt18_en_de --thresholdtgt 0 --thresholdsrc 0 \ + --workers 20 + +# Copy the BPE code into the data-bin directory for future use +cp examples/backtranslation/wmt18_en_de/code data-bin/wmt18_en_de/code +``` + +(Optionally) Train a baseline model (English-German) using just the parallel data: +```bash +CHECKPOINT_DIR=checkpoints_en_de_parallel +fairseq-train --fp16 \ + data-bin/wmt18_en_de \ + --source-lang en --target-lang de \ + --arch transformer_wmt_en_de_big --share-all-embeddings \ + --dropout 0.3 --weight-decay 0.0 \ + --criterion label_smoothed_cross_entropy --label-smoothing 0.1 \ + --optimizer adam --adam-betas '(0.9, 0.98)' --clip-norm 0.0 \ + --lr 0.001 --lr-scheduler inverse_sqrt --warmup-updates 4000 \ + --max-tokens 3584 --update-freq 16 \ + --max-update 30000 \ + --save-dir $CHECKPOINT_DIR +# Note: the above command assumes 8 GPUs. Adjust `--update-freq` if you have a +# different number of GPUs. +``` + +Average the last 10 checkpoints: +```bash +python scripts/average_checkpoints.py \ + --inputs $CHECKPOINT_DIR \ + --num-epoch-checkpoints 10 \ + --output $CHECKPOINT_DIR/checkpoint.avg10.pt +``` + +Evaluate BLEU: +```bash +# tokenized BLEU on newstest2017: +bash examples/backtranslation/tokenized_bleu.sh \ + wmt17 \ + en-de \ + data-bin/wmt18_en_de \ + data-bin/wmt18_en_de/code \ + $CHECKPOINT_DIR/checkpoint.avg10.pt +# BLEU4 = 29.57, 60.9/35.4/22.9/15.5 (BP=1.000, ratio=1.014, syslen=63049, reflen=62152) +# compare to 29.46 in Table 1, which is also for tokenized BLEU + +# generally it's better to report (detokenized) sacrebleu though: +bash examples/backtranslation/sacrebleu.sh \ + wmt17 \ + en-de \ + data-bin/wmt18_en_de \ + data-bin/wmt18_en_de/code \ + $CHECKPOINT_DIR/checkpoint.avg10.pt +# BLEU+case.mixed+lang.en-de+numrefs.1+smooth.exp+test.wmt17+tok.13a+version.1.4.3 = 29.0 60.6/34.7/22.4/14.9 (BP = 1.000 ratio = 1.013 hyp_len = 62099 ref_len = 61287) +``` + + +#### Step 2. 
Back-translate monolingual German data + +Train a reverse model (German-English) to do the back-translation: +```bash +CHECKPOINT_DIR=checkpoints_de_en_parallel +fairseq-train --fp16 \ + data-bin/wmt18_en_de \ + --source-lang de --target-lang en \ + --arch transformer_wmt_en_de_big --share-all-embeddings \ + --dropout 0.3 --weight-decay 0.0 \ + --criterion label_smoothed_cross_entropy --label-smoothing 0.1 \ + --optimizer adam --adam-betas '(0.9, 0.98)' --clip-norm 0.0 \ + --lr 0.001 --lr-scheduler inverse_sqrt --warmup-updates 4000 \ + --max-tokens 3584 --update-freq 16 \ + --max-update 30000 \ + --save-dir $CHECKPOINT_DIR +# Note: the above command assumes 8 GPUs. Adjust `--update-freq` if you have a +# different number of GPUs. +``` + +Let's evaluate the back-translation (BT) model to make sure it is well trained: +```bash +bash examples/backtranslation/sacrebleu.sh \ + wmt17 \ + de-en \ + data-bin/wmt18_en_de \ + data-bin/wmt18_en_de/code \ + $CHECKPOINT_DIR/checkpoint_best.py +# BLEU+case.mixed+lang.de-en+numrefs.1+smooth.exp+test.wmt17+tok.13a+version.1.4.3 = 34.9 66.9/41.8/28.5/19.9 (BP = 0.983 ratio = 0.984 hyp_len = 63342 ref_len = 64399) +# compare to the best system from WMT'17 which scored 35.1: http://matrix.statmt.org/matrix/systems_list/1868 +``` + +Next prepare the monolingual data: +```bash +# Download and prepare the monolingual data +# By default the script samples 25M monolingual sentences, which after +# deduplication should be just over 24M sentences. These are split into 25 +# shards, each with 1M sentences (except for the last shard). +cd examples/backtranslation/ +bash prepare-de-monolingual.sh +cd ../.. + +# Binarize each shard of the monolingual data +TEXT=examples/backtranslation/wmt18_de_mono +for SHARD in $(seq -f "%02g" 0 24); do \ + fairseq-preprocess \ + --only-source \ + --source-lang de --target-lang en \ + --joined-dictionary \ + --srcdict data-bin/wmt18_en_de/dict.de.txt \ + --testpref $TEXT/bpe.monolingual.dedup.${SHARD} \ + --destdir data-bin/wmt18_de_mono/shard${SHARD} \ + --workers 20; \ + cp data-bin/wmt18_en_de/dict.en.txt data-bin/wmt18_de_mono/shard${SHARD}/; \ +done +``` + +Now we're ready to perform back-translation over the monolingual data. 
The +following command generates via sampling, but it's possible to use greedy +decoding (`--beam 1`), beam search (`--beam 5`), +top-k sampling (`--sampling --beam 1 --sampling-topk 10`), etc.: +```bash +mkdir backtranslation_output +for SHARD in $(seq -f "%02g" 0 24); do \ + fairseq-generate --fp16 \ + data-bin/wmt18_de_mono/shard${SHARD} \ + --path $CHECKPOINT_DIR/checkpoint_best.pt \ + --skip-invalid-size-inputs-valid-test \ + --max-tokens 4096 \ + --sampling --beam 1 \ + > backtranslation_output/sampling.shard${SHARD}.out; \ +done +``` + +After BT, use the `extract_bt_data.py` script to re-combine the shards, extract +the back-translations and apply length ratio filters: +```bash +python examples/backtranslation/extract_bt_data.py \ + --minlen 1 --maxlen 250 --ratio 1.5 \ + --output backtranslation_output/bt_data --srclang en --tgtlang de \ + backtranslation_output/sampling.shard*.out + +# Ensure lengths are the same: +# wc -l backtranslation_output/bt_data.{en,de} +# 21795614 backtranslation_output/bt_data.en +# 21795614 backtranslation_output/bt_data.de +# 43591228 total +``` + +Binarize the filtered BT data and combine it with the parallel data: +```bash +TEXT=backtranslation_output +fairseq-preprocess \ + --source-lang en --target-lang de \ + --joined-dictionary \ + --srcdict data-bin/wmt18_en_de/dict.en.txt \ + --trainpref $TEXT/bt_data \ + --destdir data-bin/wmt18_en_de_bt \ + --workers 20 + +# We want to train on the combined data, so we'll symlink the parallel + BT data +# in the wmt18_en_de_para_plus_bt directory. We link the parallel data as "train" +# and the BT data as "train1", so that fairseq will combine them automatically +# and so that we can use the `--upsample-primary` option to upsample the +# parallel data (if desired). +PARA_DATA=$(readlink -f data-bin/wmt18_en_de) +BT_DATA=$(readlink -f data-bin/wmt18_en_de_bt) +COMB_DATA=data-bin/wmt18_en_de_para_plus_bt +mkdir -p $COMB_DATA +for LANG in en de; do \ + ln -s ${PARA_DATA}/dict.$LANG.txt ${COMB_DATA}/dict.$LANG.txt; \ + for EXT in bin idx; do \ + ln -s ${PARA_DATA}/train.en-de.$LANG.$EXT ${COMB_DATA}/train.en-de.$LANG.$EXT; \ + ln -s ${BT_DATA}/train.en-de.$LANG.$EXT ${COMB_DATA}/train1.en-de.$LANG.$EXT; \ + ln -s ${PARA_DATA}/valid.en-de.$LANG.$EXT ${COMB_DATA}/valid.en-de.$LANG.$EXT; \ + ln -s ${PARA_DATA}/test.en-de.$LANG.$EXT ${COMB_DATA}/test.en-de.$LANG.$EXT; \ + done; \ +done +``` + + +#### 3. Train an English-German model over the combined parallel + BT data + +Finally we can train a model over the parallel + BT data: +```bash +CHECKPOINT_DIR=checkpoints_en_de_parallel_plus_bt +fairseq-train --fp16 \ + data-bin/wmt18_en_de_para_plus_bt \ + --upsample-primary 16 \ + --source-lang en --target-lang de \ + --arch transformer_wmt_en_de_big --share-all-embeddings \ + --dropout 0.3 --weight-decay 0.0 \ + --criterion label_smoothed_cross_entropy --label-smoothing 0.1 \ + --optimizer adam --adam-betas '(0.9, 0.98)' --clip-norm 0.0 \ + --lr 0.0007 --lr-scheduler inverse_sqrt --warmup-updates 4000 \ + --max-tokens 3584 --update-freq 16 \ + --max-update 100000 \ + --save-dir $CHECKPOINT_DIR +# Note: the above command assumes 8 GPUs. Adjust `--update-freq` if you have a +# different number of GPUs. 
+``` + +Average the last 10 checkpoints: +```bash +python scripts/average_checkpoints.py \ + --inputs $CHECKPOINT_DIR \ + --num-epoch-checkpoints 10 \ + --output $CHECKPOINT_DIR/checkpoint.avg10.pt +``` + +Evaluate BLEU: +```bash +# tokenized BLEU on newstest2017: +bash examples/backtranslation/tokenized_bleu.sh \ + wmt17 \ + en-de \ + data-bin/wmt18_en_de \ + data-bin/wmt18_en_de/code \ + $CHECKPOINT_DIR/checkpoint.avg10.pt +# BLEU4 = 32.35, 64.4/38.9/26.2/18.3 (BP=0.977, ratio=0.977, syslen=60729, reflen=62152) +# compare to 32.35 in Table 1, which is also for tokenized BLEU + +# generally it's better to report (detokenized) sacrebleu: +bash examples/backtranslation/sacrebleu.sh \ + wmt17 \ + en-de \ + data-bin/wmt18_en_de \ + data-bin/wmt18_en_de/code \ + $CHECKPOINT_DIR/checkpoint.avg10.pt +# BLEU+case.mixed+lang.en-de+numrefs.1+smooth.exp+test.wmt17+tok.13a+version.1.4.3 = 31.5 64.3/38.2/25.6/17.6 (BP = 0.971 ratio = 0.971 hyp_len = 59515 ref_len = 61287) +``` + + +## Citation +```bibtex +@inproceedings{edunov2018backtranslation, + title = {Understanding Back-Translation at Scale}, + author = {Edunov, Sergey and Ott, Myle and Auli, Michael and Grangier, David}, + booktitle = {Conference of the Association for Computational Linguistics (ACL)}, + year = 2018, +} +``` diff --git a/SpeechT5/fairseq/examples/backtranslation/deduplicate_lines.py b/SpeechT5/fairseq/examples/backtranslation/deduplicate_lines.py new file mode 100644 index 0000000000000000000000000000000000000000..50e458328c80b71c42a66d473381ca7e98d294da --- /dev/null +++ b/SpeechT5/fairseq/examples/backtranslation/deduplicate_lines.py @@ -0,0 +1,41 @@ +#!/usr/bin/python3 +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +import argparse +import fileinput +import hashlib +import sys +from multiprocessing import Pool + + +def get_hashes_and_lines(raw_line): + hash = hashlib.md5(raw_line).hexdigest() + return hash, raw_line + + +def main(): + parser = argparse.ArgumentParser() + parser.add_argument("--workers", type=int, default=10) + parser.add_argument("files", nargs="*", help="input files") + args = parser.parse_args() + + seen = set() + with fileinput.input(args.files, mode="rb") as h: + pool = Pool(args.workers) + results = pool.imap_unordered(get_hashes_and_lines, h, 1000) + for i, (hash, raw_line) in enumerate(results): + if hash not in seen: + seen.add(hash) + sys.stdout.buffer.write(raw_line) + if i % 1000000 == 0: + print(i, file=sys.stderr, end="", flush=True) + elif i % 100000 == 0: + print(".", file=sys.stderr, end="", flush=True) + print(file=sys.stderr, flush=True) + + +if __name__ == "__main__": + main() diff --git a/SpeechT5/fairseq/examples/backtranslation/extract_bt_data.py b/SpeechT5/fairseq/examples/backtranslation/extract_bt_data.py new file mode 100644 index 0000000000000000000000000000000000000000..e766391e873d0d9a9561d67d5864934b2fad0681 --- /dev/null +++ b/SpeechT5/fairseq/examples/backtranslation/extract_bt_data.py @@ -0,0 +1,72 @@ +#!/usr/bin/env python +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +import argparse +import fileinput + +from tqdm import tqdm + + +def main(): + parser = argparse.ArgumentParser( + description=( + "Extract back-translations from the stdout of fairseq-generate. 
" + "If there are multiply hypotheses for a source, we only keep the first one. " + ) + ) + parser.add_argument("--output", required=True, help="output prefix") + parser.add_argument( + "--srclang", required=True, help="source language (extracted from H-* lines)" + ) + parser.add_argument( + "--tgtlang", required=True, help="target language (extracted from S-* lines)" + ) + parser.add_argument("--minlen", type=int, help="min length filter") + parser.add_argument("--maxlen", type=int, help="max length filter") + parser.add_argument("--ratio", type=float, help="ratio filter") + parser.add_argument("files", nargs="*", help="input files") + args = parser.parse_args() + + def validate(src, tgt): + srclen = len(src.split(" ")) if src != "" else 0 + tgtlen = len(tgt.split(" ")) if tgt != "" else 0 + if ( + (args.minlen is not None and (srclen < args.minlen or tgtlen < args.minlen)) + or ( + args.maxlen is not None + and (srclen > args.maxlen or tgtlen > args.maxlen) + ) + or ( + args.ratio is not None + and (max(srclen, tgtlen) / float(min(srclen, tgtlen)) > args.ratio) + ) + ): + return False + return True + + def safe_index(toks, index, default): + try: + return toks[index] + except IndexError: + return default + + with open(args.output + "." + args.srclang, "w") as src_h, open( + args.output + "." + args.tgtlang, "w" + ) as tgt_h: + for line in tqdm(fileinput.input(args.files)): + if line.startswith("S-"): + tgt = safe_index(line.rstrip().split("\t"), 1, "") + elif line.startswith("H-"): + if tgt is not None: + src = safe_index(line.rstrip().split("\t"), 2, "") + if validate(src, tgt): + print(src, file=src_h) + print(tgt, file=tgt_h) + tgt = None + + +if __name__ == "__main__": + main() diff --git a/SpeechT5/fairseq/examples/backtranslation/prepare-de-monolingual.sh b/SpeechT5/fairseq/examples/backtranslation/prepare-de-monolingual.sh new file mode 100644 index 0000000000000000000000000000000000000000..5e67b2b3bcf27d3436031453e796e58a0ae79ec4 --- /dev/null +++ b/SpeechT5/fairseq/examples/backtranslation/prepare-de-monolingual.sh @@ -0,0 +1,98 @@ +#!/bin/bash + +SCRIPTS=mosesdecoder/scripts +TOKENIZER=$SCRIPTS/tokenizer/tokenizer.perl +NORM_PUNC=$SCRIPTS/tokenizer/normalize-punctuation.perl +REM_NON_PRINT_CHAR=$SCRIPTS/tokenizer/remove-non-printing-char.perl +BPEROOT=subword-nmt/subword_nmt + + +BPE_CODE=wmt18_en_de/code +SUBSAMPLE_SIZE=25000000 +LANG=de + + +OUTDIR=wmt18_${LANG}_mono +orig=orig +tmp=$OUTDIR/tmp +mkdir -p $OUTDIR $tmp + + +URLS=( + "http://www.statmt.org/wmt14/training-monolingual-news-crawl/news.2007.de.shuffled.gz" + "http://www.statmt.org/wmt14/training-monolingual-news-crawl/news.2008.de.shuffled.gz" + "http://www.statmt.org/wmt14/training-monolingual-news-crawl/news.2009.de.shuffled.gz" + "http://www.statmt.org/wmt14/training-monolingual-news-crawl/news.2010.de.shuffled.gz" + "http://www.statmt.org/wmt14/training-monolingual-news-crawl/news.2011.de.shuffled.gz" + "http://www.statmt.org/wmt14/training-monolingual-news-crawl/news.2012.de.shuffled.gz" + "http://www.statmt.org/wmt14/training-monolingual-news-crawl/news.2013.de.shuffled.gz" + "http://www.statmt.org/wmt15/training-monolingual-news-crawl-v2/news.2014.de.shuffled.v2.gz" + "http://data.statmt.org/wmt16/translation-task/news.2015.de.shuffled.gz" + "http://data.statmt.org/wmt17/translation-task/news.2016.de.shuffled.gz" + "http://data.statmt.org/wmt18/translation-task/news.2017.de.shuffled.deduped.gz" +) +FILES=( + "news.2007.de.shuffled.gz" + "news.2008.de.shuffled.gz" + "news.2009.de.shuffled.gz" + 
"news.2010.de.shuffled.gz" + "news.2011.de.shuffled.gz" + "news.2012.de.shuffled.gz" + "news.2013.de.shuffled.gz" + "news.2014.de.shuffled.v2.gz" + "news.2015.de.shuffled.gz" + "news.2016.de.shuffled.gz" + "news.2017.de.shuffled.deduped.gz" +) + + +cd $orig +for ((i=0;i<${#URLS[@]};++i)); do + file=${FILES[i]} + if [ -f $file ]; then + echo "$file already exists, skipping download" + else + url=${URLS[i]} + wget "$url" + fi +done +cd .. + + +if [ -f $tmp/monolingual.${SUBSAMPLE_SIZE}.${LANG} ]; then + echo "found monolingual sample, skipping shuffle/sample/tokenize" +else + gzip -c -d -k $(for FILE in "${FILES[@]}"; do echo $orig/$FILE; done) \ + | shuf -n $SUBSAMPLE_SIZE \ + | perl $NORM_PUNC $LANG \ + | perl $REM_NON_PRINT_CHAR \ + | perl $TOKENIZER -threads 8 -a -l $LANG \ + > $tmp/monolingual.${SUBSAMPLE_SIZE}.${LANG} +fi + + +if [ -f $tmp/bpe.monolingual.${SUBSAMPLE_SIZE}.${LANG} ]; then + echo "found BPE monolingual sample, skipping BPE step" +else + python $BPEROOT/apply_bpe.py -c $BPE_CODE \ + < $tmp/monolingual.${SUBSAMPLE_SIZE}.${LANG} \ + > $tmp/bpe.monolingual.${SUBSAMPLE_SIZE}.${LANG} +fi + + +if [ -f $tmp/bpe.monolingual.dedup.${SUBSAMPLE_SIZE}.${LANG} ]; then + echo "found deduplicated monolingual sample, skipping deduplication step" +else + python deduplicate_lines.py $tmp/bpe.monolingual.${SUBSAMPLE_SIZE}.${LANG} \ + > $tmp/bpe.monolingual.dedup.${SUBSAMPLE_SIZE}.${LANG} +fi + + +if [ -f $OUTDIR/bpe.monolingual.dedup.00.de ]; then + echo "found sharded data, skipping sharding step" +else + split --lines 1000000 --numeric-suffixes \ + --additional-suffix .${LANG} \ + $tmp/bpe.monolingual.dedup.${SUBSAMPLE_SIZE}.${LANG} \ + $OUTDIR/bpe.monolingual.dedup. +fi diff --git a/SpeechT5/fairseq/examples/backtranslation/prepare-wmt18en2de.sh b/SpeechT5/fairseq/examples/backtranslation/prepare-wmt18en2de.sh new file mode 100644 index 0000000000000000000000000000000000000000..f6fd275307db50ca84c299440ae02dce49064030 --- /dev/null +++ b/SpeechT5/fairseq/examples/backtranslation/prepare-wmt18en2de.sh @@ -0,0 +1,135 @@ +#!/bin/bash +# Adapted from https://github.com/facebookresearch/MIXER/blob/master/prepareData.sh + +echo 'Cloning Moses github repository (for tokenization scripts)...' +git clone https://github.com/moses-smt/mosesdecoder.git + +echo 'Cloning Subword NMT repository (for BPE pre-processing)...' +git clone https://github.com/rsennrich/subword-nmt.git + +SCRIPTS=mosesdecoder/scripts +TOKENIZER=$SCRIPTS/tokenizer/tokenizer.perl +CLEAN=$SCRIPTS/training/clean-corpus-n.perl +NORM_PUNC=$SCRIPTS/tokenizer/normalize-punctuation.perl +REM_NON_PRINT_CHAR=$SCRIPTS/tokenizer/remove-non-printing-char.perl +BPEROOT=subword-nmt/subword_nmt +BPE_TOKENS=32000 + +URLS=( + "http://statmt.org/wmt13/training-parallel-europarl-v7.tgz" + "http://statmt.org/wmt13/training-parallel-commoncrawl.tgz" + "http://data.statmt.org/wmt18/translation-task/training-parallel-nc-v13.tgz" + "http://data.statmt.org/wmt18/translation-task/rapid2016.tgz" + "http://data.statmt.org/wmt17/translation-task/dev.tgz" + "http://statmt.org/wmt14/test-full.tgz" +) +FILES=( + "training-parallel-europarl-v7.tgz" + "training-parallel-commoncrawl.tgz" + "training-parallel-nc-v13.tgz" + "rapid2016.tgz" + "dev.tgz" + "test-full.tgz" +) +CORPORA=( + "training/europarl-v7.de-en" + "commoncrawl.de-en" + "training-parallel-nc-v13/news-commentary-v13.de-en" + "rapid2016.de-en" +) + +if [ ! -d "$SCRIPTS" ]; then + echo "Please set SCRIPTS variable correctly to point to Moses scripts." 
+ exit 1 +fi + +OUTDIR=wmt18_en_de + +src=en +tgt=de +lang=en-de +prep=$OUTDIR +tmp=$prep/tmp +orig=orig + +mkdir -p $orig $tmp $prep + +cd $orig + +for ((i=0;i<${#URLS[@]};++i)); do + file=${FILES[i]} + if [ -f $file ]; then + echo "$file already exists, skipping download" + else + url=${URLS[i]} + wget "$url" + if [ -f $file ]; then + echo "$url successfully downloaded." + else + echo "$url not successfully downloaded." + exit 1 + fi + if [ ${file: -4} == ".tgz" ]; then + tar zxvf $file + elif [ ${file: -4} == ".tar" ]; then + tar xvf $file + fi + fi +done +cd .. + +echo "pre-processing train data..." +for l in $src $tgt; do + rm $tmp/train.tags.$lang.tok.$l + for f in "${CORPORA[@]}"; do + cat $orig/$f.$l | \ + perl $NORM_PUNC $l | \ + perl $REM_NON_PRINT_CHAR | \ + perl $TOKENIZER -threads 8 -a -l $l >> $tmp/train.tags.$lang.tok.$l + done +done + +echo "pre-processing test data..." +for l in $src $tgt; do + if [ "$l" == "$src" ]; then + t="src" + else + t="ref" + fi + grep '\s*//g' | \ + sed -e 's/\s*<\/seg>\s*//g' | \ + sed -e "s/\’/\'/g" | \ + perl $TOKENIZER -threads 8 -a -l $l > $tmp/test.$l + echo "" +done + +echo "splitting train and valid..." +for l in $src $tgt; do + awk '{if (NR%100 == 0) print $0; }' $tmp/train.tags.$lang.tok.$l > $tmp/valid.$l + awk '{if (NR%100 != 0) print $0; }' $tmp/train.tags.$lang.tok.$l > $tmp/train.$l +done + +TRAIN=$tmp/train.de-en +BPE_CODE=$prep/code +rm -f $TRAIN +for l in $src $tgt; do + cat $tmp/train.$l >> $TRAIN +done + +echo "learn_bpe.py on ${TRAIN}..." +python $BPEROOT/learn_bpe.py -s $BPE_TOKENS < $TRAIN > $BPE_CODE + +for L in $src $tgt; do + for f in train.$L valid.$L test.$L; do + echo "apply_bpe.py to ${f}..." + python $BPEROOT/apply_bpe.py -c $BPE_CODE < $tmp/$f > $tmp/bpe.$f + done +done + +perl $CLEAN -ratio 1.5 $tmp/bpe.train $src $tgt $prep/train 1 250 +perl $CLEAN -ratio 1.5 $tmp/bpe.valid $src $tgt $prep/valid 1 250 + +for L in $src $tgt; do + cp $tmp/bpe.test.$L $prep/test.$L +done diff --git a/SpeechT5/fairseq/examples/backtranslation/sacrebleu.sh b/SpeechT5/fairseq/examples/backtranslation/sacrebleu.sh new file mode 100644 index 0000000000000000000000000000000000000000..a70da23f48e2699297799611412783d4560dc45a --- /dev/null +++ b/SpeechT5/fairseq/examples/backtranslation/sacrebleu.sh @@ -0,0 +1,37 @@ +#!/bin/bash + +if [ $# -ne 5 ]; then + echo "usage: $0 [dataset=wmt14/full] [langpair=en-de] [databin] [bpecode] [model]" + exit +fi + + +DATASET=$1 +LANGPAIR=$2 +DATABIN=$3 +BPECODE=$4 +MODEL=$5 + +SRCLANG=$(echo $LANGPAIR | cut -d '-' -f 1) +TGTLANG=$(echo $LANGPAIR | cut -d '-' -f 2) + + +BPEROOT=examples/backtranslation/subword-nmt/subword_nmt +if [ ! -e $BPEROOT ]; then + BPEROOT=subword-nmt/subword_nmt + if [ ! -e $BPEROOT ]; then + echo 'Cloning Subword NMT repository (for BPE pre-processing)...' 
+ git clone https://github.com/rsennrich/subword-nmt.git + fi +fi + + +sacrebleu -t $DATASET -l $LANGPAIR --echo src \ +| sacremoses tokenize -a -l $SRCLANG -q \ +| python $BPEROOT/apply_bpe.py -c $BPECODE \ +| fairseq-interactive $DATABIN --path $MODEL \ + -s $SRCLANG -t $TGTLANG \ + --beam 5 --remove-bpe --buffer-size 1024 --max-tokens 8000 \ +| grep ^H- | cut -f 3- \ +| sacremoses detokenize -l $TGTLANG -q \ +| sacrebleu -t $DATASET -l $LANGPAIR diff --git a/SpeechT5/fairseq/examples/backtranslation/tokenized_bleu.sh b/SpeechT5/fairseq/examples/backtranslation/tokenized_bleu.sh new file mode 100644 index 0000000000000000000000000000000000000000..c6d6aaa193f6059299bc98909324fe4b9b060372 --- /dev/null +++ b/SpeechT5/fairseq/examples/backtranslation/tokenized_bleu.sh @@ -0,0 +1,46 @@ +#!/bin/bash + +if [ $# -ne 5 ]; then + echo "usage: $0 [dataset=wmt14/full] [langpair=en-de] [databin] [bpecode] [model]" + exit +fi + + +DATASET=$1 +LANGPAIR=$2 +DATABIN=$3 +BPECODE=$4 +MODEL=$5 + +SRCLANG=$(echo $LANGPAIR | cut -d '-' -f 1) +TGTLANG=$(echo $LANGPAIR | cut -d '-' -f 2) + + +BPEROOT=examples/backtranslation/subword-nmt/subword_nmt +if [ ! -e $BPEROOT ]; then + BPEROOT=subword-nmt/subword_nmt + if [ ! -e $BPEROOT ]; then + echo 'Cloning Subword NMT repository (for BPE pre-processing)...' + git clone https://github.com/rsennrich/subword-nmt.git + fi +fi + + +TMP_REF=$(mktemp) + +sacrebleu -t $DATASET -l $LANGPAIR --echo ref -q \ +| sacremoses normalize -l $TGTLANG -q \ +| sacremoses tokenize -a -l $TGTLANG -q \ +> $TMP_REF + +sacrebleu -t $DATASET -l $LANGPAIR --echo src -q \ +| sacremoses normalize -l $SRCLANG -q \ +| sacremoses tokenize -a -l $SRCLANG -q \ +| python $BPEROOT/apply_bpe.py -c $BPECODE \ +| fairseq-interactive $DATABIN --path $MODEL \ + -s $SRCLANG -t $TGTLANG \ + --beam 5 --remove-bpe --buffer-size 1024 --max-tokens 8000 \ +| grep ^H- | cut -f 3- \ +| fairseq-score --ref $TMP_REF + +rm -f $TMP_REF diff --git a/SpeechT5/fairseq/examples/bart/README.glue.md b/SpeechT5/fairseq/examples/bart/README.glue.md new file mode 100644 index 0000000000000000000000000000000000000000..a010934e1e6dec491eb1c704ec02ba7405760510 --- /dev/null +++ b/SpeechT5/fairseq/examples/bart/README.glue.md @@ -0,0 +1,99 @@ +# Fine-tuning BART on GLUE tasks + +### 1) Download the data from GLUE website (https://gluebenchmark.com/tasks) using following commands: +```bash +wget https://gist.githubusercontent.com/W4ngatang/60c2bdb54d156a41194446737ce03e2e/raw/17b8dd0d724281ed7c3b2aeeda662b92809aadd5/download_glue_data.py +python download_glue_data.py --data_dir glue_data --tasks all +``` + +### 2) Preprocess GLUE task data (same as RoBERTa): +```bash +./examples/roberta/preprocess_GLUE_tasks.sh glue_data +``` +`glue_task_name` is one of the following: +`{ALL, QQP, MNLI, QNLI, MRPC, RTE, STS-B, SST-2, CoLA}` +Use `ALL` for preprocessing all the glue tasks. + +### 3) Fine-tuning on GLUE task: +Example fine-tuning cmd for `RTE` task +```bash +TOTAL_NUM_UPDATES=2036 # 10 epochs through RTE for bsz 16 +WARMUP_UPDATES=61 # 6 percent of the number of updates +LR=1e-05 # Peak LR for polynomial LR scheduler. +NUM_CLASSES=2 +MAX_SENTENCES=16 # Batch size. 
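+# These are the `RTE` settings from the per-task table further down, except that
+# the batch size is halved here (16 vs. 32), so TOTAL_NUM_UPDATES is doubled
+# (2036 vs. 1018) to cover the same 10 epochs; swap in that table's values for
+# the other GLUE tasks.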
+BART_PATH=/path/to/bart/model.pt
+
+CUDA_VISIBLE_DEVICES=0,1 fairseq-train RTE-bin/ \
+    --restore-file $BART_PATH \
+    --batch-size $MAX_SENTENCES \
+    --max-tokens 4400 \
+    --task sentence_prediction \
+    --add-prev-output-tokens \
+    --layernorm-embedding \
+    --share-all-embeddings \
+    --share-decoder-input-output-embed \
+    --reset-optimizer --reset-dataloader --reset-meters \
+    --required-batch-size-multiple 1 \
+    --init-token 0 \
+    --arch bart_large \
+    --criterion sentence_prediction \
+    --num-classes $NUM_CLASSES \
+    --dropout 0.1 --attention-dropout 0.1 \
+    --weight-decay 0.01 --optimizer adam --adam-betas "(0.9, 0.98)" --adam-eps 1e-08 \
+    --clip-norm 0.0 \
+    --lr-scheduler polynomial_decay --lr $LR --total-num-update $TOTAL_NUM_UPDATES --warmup-updates $WARMUP_UPDATES \
+    --fp16 --fp16-init-scale 4 --threshold-loss-scale 1 --fp16-scale-window 128 \
+    --max-epoch 10 \
+    --find-unused-parameters \
+    --best-checkpoint-metric accuracy --maximize-best-checkpoint-metric;
+```
+
+For each of the GLUE tasks, you will need to use the following cmd-line arguments:
+
+Model | MNLI | QNLI | QQP | RTE | SST-2 | MRPC | CoLA | STS-B
+---|---|---|---|---|---|---|---|---
+`--num-classes` | 3 | 2 | 2 | 2 | 2 | 2 | 2 | 1
+`--lr` | 5e-6 | 1e-5 | 1e-5 | 1e-5 | 5e-6 | 2e-5 | 2e-5 | 2e-5
+`bsz` | 128 | 32 | 32 | 32 | 128 | 64 | 64 | 32
+`--total-num-update` | 30968 | 33112 | 113272 | 1018 | 5233 | 1148 | 1334 | 1799
+`--warmup-updates` | 1858 | 1986 | 6796 | 61 | 314 | 68 | 80 | 107
+
+For `STS-B` additionally add `--regression-target --best-checkpoint-metric loss` and remove `--maximize-best-checkpoint-metric`.
+
+**Note:**
+
+a) `--total-num-update` is used by the `polynomial_decay` LR scheduler and is calculated for `--max-epoch=10` and `--batch-size=32/64/128` depending on the task.
+
+b) The cmd-args and hyperparams above were tested on an Nvidia `V100` GPU with `32gb` of memory for each task. Depending on the GPU memory available to you, you can increase `--update-freq` and reduce `--batch-size`, as sketched below.
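+
+As a rough illustration of that trade-off (hypothetical values, not part of the original recipe): the effective batch size is (number of GPUs) x `--batch-size` x `--update-freq`, so the `RTE` setup above (2 GPUs x batch 16 = 32 sentences) could be reproduced on a single smaller GPU with:
+
+```bash
+# Hypothetical single-GPU variant of the RTE command above (an assumption for
+# illustration): 1 GPU x batch 8 x update-freq 4 = 32, the same effective batch
+# as 2 GPUs x batch 16 in the original command.
+MAX_SENTENCES=8
+UPDATE_FREQ=4
+# then pass `--batch-size $MAX_SENTENCES --update-freq $UPDATE_FREQ` to fairseq-train.
+```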
+
+### Inference on GLUE task
+After training the model as described in the previous step, you can perform inference with the checkpoints in the `checkpoints/` directory using the following Python code snippet:
+
+```python
+from fairseq.models.bart import BARTModel
+
+bart = BARTModel.from_pretrained(
+    'checkpoints/',
+    checkpoint_file='checkpoint_best.pt',
+    data_name_or_path='RTE-bin'
+)
+
+label_fn = lambda label: bart.task.label_dictionary.string(
+    [label + bart.task.label_dictionary.nspecial]
+)
+ncorrect, nsamples = 0, 0
+bart.cuda()
+bart.eval()
+with open('glue_data/RTE/dev.tsv') as fin:
+    fin.readline()
+    for index, line in enumerate(fin):
+        tokens = line.strip().split('\t')
+        sent1, sent2, target = tokens[1], tokens[2], tokens[3]
+        tokens = bart.encode(sent1, sent2)
+        prediction = bart.predict('sentence_classification_head', tokens).argmax().item()
+        prediction_label = label_fn(prediction)
+        ncorrect += int(prediction_label == target)
+        nsamples += 1
+print('| Accuracy: ', float(ncorrect)/float(nsamples))
+```
diff --git a/SpeechT5/fairseq/examples/bart/README.md b/SpeechT5/fairseq/examples/bart/README.md
new file mode 100644
index 0000000000000000000000000000000000000000..4050a724ee6a2f20c9998a95df48c58b64764ab1
--- /dev/null
+++ b/SpeechT5/fairseq/examples/bart/README.md
@@ -0,0 +1,228 @@
+# BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension
+
+[https://arxiv.org/abs/1910.13461](https://arxiv.org/abs/1910.13461)
+
+## Introduction
+
+BART is a sequence-to-sequence model trained with a denoising pretraining objective. We show that this objective is more generic and that we can match [RoBERTa](../roberta) results on SQuAD and GLUE, and achieve state-of-the-art results on summarization (XSum, CNN dataset), long-form generative question answering (ELI5) and dialog response generation (ConvAI2). See the associated paper for more details.
+ +## Pre-trained models + +Model | Description | # params | Download +---|---|---|--- +`bart.base` | BART model with 6 encoder and decoder layers | 140M | [bart.base.tar.gz](https://dl.fbaipublicfiles.com/fairseq/models/bart.base.tar.gz) +`bart.large` | BART model with 12 encoder and decoder layers | 400M | [bart.large.tar.gz](https://dl.fbaipublicfiles.com/fairseq/models/bart.large.tar.gz) +`bart.large.mnli` | `bart.large` finetuned on `MNLI` | 400M | [bart.large.mnli.tar.gz](https://dl.fbaipublicfiles.com/fairseq/models/bart.large.mnli.tar.gz) +`bart.large.cnn` | `bart.large` finetuned on `CNN-DM` | 400M | [bart.large.cnn.tar.gz](https://dl.fbaipublicfiles.com/fairseq/models/bart.large.cnn.tar.gz) +`bart.large.xsum` | `bart.large` finetuned on `Xsum` | 400M | [bart.large.xsum.tar.gz](https://dl.fbaipublicfiles.com/fairseq/models/bart.large.xsum.tar.gz) + +## Results + +**[GLUE (Wang et al., 2019)](https://gluebenchmark.com/)** +_(dev set, single model, single-task finetuning)_ + +Model | MNLI | QNLI | QQP | RTE | SST-2 | MRPC | CoLA | STS-B +---|---|---|---|---|---|---|---|--- +`roberta.large` | 90.2 | 94.7 | 92.2 | 86.6 | 96.4 | 90.9 | 68.0 | 92.4 +`bart.large` | 89.9 | 94.9 | 92.5 | 87.0 | 96.6 | 90.4 | 62.8 | 91.2 + +**[SQuAD (Rajpurkar et al., 2018)](https://rajpurkar.github.io/SQuAD-explorer/)** +_(dev set, no additional data used)_ + +Model | SQuAD 1.1 EM/F1 | SQuAD 2.0 EM/F1 +---|---|--- +`roberta.large` | 88.9/94.6 | 86.5/89.4 +`bart.large` | 88.8/94.6 | 86.1/89.2 + +**[CNN/Daily Mail](http://nlpprogress.com/english/summarization.html)** +_(test set, no additional data used)_ + +Model | R1 | R2 | RL +---|---|---|--- +`BERTSUMEXTABS` | 42.13 | 19.60 | 39.18 +`bart.large` | 44.16 | 21.28 | 40.90 + +## Example usage + +##### Load BART from torch.hub (PyTorch >= 1.1): +```python +import torch +bart = torch.hub.load('pytorch/fairseq', 'bart.large') +bart.eval() # disable dropout (or leave in train mode to finetune) +``` + +##### Load BART (for PyTorch 1.0 or custom models): +```python +# Download bart.large model +wget https://dl.fbaipublicfiles.com/fairseq/models/bart.large.tar.gz +tar -xzvf bart.large.tar.gz + +# Load the model in fairseq +from fairseq.models.bart import BARTModel +bart = BARTModel.from_pretrained('/path/to/bart.large', checkpoint_file='model.pt') +bart.eval() # disable dropout (or leave in train mode to finetune) +``` + +##### Apply Byte-Pair Encoding (BPE) to input text: +```python +tokens = bart.encode('Hello world!') +assert tokens.tolist() == [0, 31414, 232, 328, 2] +bart.decode(tokens) # 'Hello world!' 
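+# (The assertion above shows that encode() wraps the BPE token ids with the
+# dictionary's BOS id 0 and EOS id 2; decode() strips them to recover the text.)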
+``` + +##### Extract features from BART: +```python +# Extract the last layer's features +last_layer_features = bart.extract_features(tokens) +assert last_layer_features.size() == torch.Size([1, 5, 1024]) + +# Extract all layer's features from decoder (layer 0 is the embedding layer) +all_layers = bart.extract_features(tokens, return_all_hiddens=True) +assert len(all_layers) == 13 +assert torch.all(all_layers[-1] == last_layer_features) +``` + +##### Use BART for sentence-pair classification tasks: +```python +# Download BART already finetuned for MNLI +bart = torch.hub.load('pytorch/fairseq', 'bart.large.mnli') +bart.eval() # disable dropout for evaluation + +# Encode a pair of sentences and make a prediction +tokens = bart.encode('BART is a seq2seq model.', 'BART is not sequence to sequence.') +bart.predict('mnli', tokens).argmax() # 0: contradiction + +# Encode another pair of sentences +tokens = bart.encode('BART is denoising autoencoder.', 'BART is version of autoencoder.') +bart.predict('mnli', tokens).argmax() # 2: entailment +``` + +##### Register a new (randomly initialized) classification head: +```python +bart.register_classification_head('new_task', num_classes=3) +logprobs = bart.predict('new_task', tokens) +``` + +##### Batched prediction: +```python +import torch +from fairseq.data.data_utils import collate_tokens + +bart = torch.hub.load('pytorch/fairseq', 'bart.large.mnli') +bart.eval() + +batch_of_pairs = [ + ['BART is a seq2seq model.', 'BART is not sequence to sequence.'], + ['BART is denoising autoencoder.', 'BART is version of autoencoder.'], +] + +batch = collate_tokens( + [bart.encode(pair[0], pair[1]) for pair in batch_of_pairs], pad_idx=1 +) + +logprobs = bart.predict('mnli', batch) +print(logprobs.argmax(dim=1)) +# tensor([0, 2]) +``` + +##### Using the GPU: +```python +bart.cuda() +bart.predict('new_task', tokens) +``` + +#### Filling masks: + +BART can be used to fill multiple `` tokens in the input. +```python +bart = torch.hub.load('pytorch/fairseq', 'bart.base') +bart.eval() +bart.fill_mask(['The cat on the .'], topk=3, beam=10) +# [[('The cat was on the ground.', tensor(-0.6183)), ('The cat was on the floor.', tensor(-0.6798)), ('The cat sleeps on the couch.', tensor(-0.6830))]] +``` + +Note that by default we enforce the output length to match the input length. +This can be disabled by setting ``match_source_len=False``: +``` +bart.fill_mask(['The cat on the .'], topk=3, beam=10, match_source_len=False) +# [[('The cat was on the ground.', tensor(-0.6185)), ('The cat was asleep on the couch.', tensor(-0.6276)), ('The cat was on the floor.', tensor(-0.6800))]] +``` + +Example code to fill masks for a batch of sentences using GPU +``` +bart.cuda() +bart.fill_mask(['The cat on the .', 'The dog on the .'], topk=3, beam=10) +# [[('The cat was on the ground.', tensor(-0.6183)), ('The cat was on the floor.', tensor(-0.6798)), ('The cat sleeps on the couch.', tensor(-0.6830))], [('The dog was on the ground.', tensor(-0.6190)), ('The dog lay on the ground.', tensor(-0.6711)), +('The dog was asleep on the couch', tensor(-0.6796))]] +``` + +#### Evaluating the `bart.large.mnli` model: + +Example python code snippet to evaluate accuracy on the MNLI `dev_matched` set. 
+```python +label_map = {0: 'contradiction', 1: 'neutral', 2: 'entailment'} +ncorrect, nsamples = 0, 0 +bart.cuda() +bart.eval() +with open('glue_data/MNLI/dev_matched.tsv') as fin: + fin.readline() + for index, line in enumerate(fin): + tokens = line.strip().split('\t') + sent1, sent2, target = tokens[8], tokens[9], tokens[-1] + tokens = bart.encode(sent1, sent2) + prediction = bart.predict('mnli', tokens).argmax().item() + prediction_label = label_map[prediction] + ncorrect += int(prediction_label == target) + nsamples += 1 + print('| Accuracy: ', float(ncorrect)/float(nsamples)) +# Expected output: 0.9010 +``` + +#### Evaluating the `bart.large.cnn` model: +- Follow instructions [here](https://github.com/abisee/cnn-dailymail) to download and process into data-files such that `test.source` and `test.target` has one line for each non-tokenized sample. +- For simpler preprocessing, you can also `wget https://cdn-datasets.huggingface.co/summarization/cnn_dm_v2.tgz`, although there is no guarantee of identical scores +- `huggingface/transformers` has a simpler interface that supports [single-gpu](https://github.com/huggingface/transformers/blob/master/examples/legacy/seq2seq/run_eval.py) and [multi-gpu](https://github.com/huggingface/transformers/blob/master/examples/legacy/seq2seq/run_distributed_eval.py) beam search. + In `huggingface/transformers`, the BART models' paths are `facebook/bart-large-cnn` and `facebook/bart-large-xsum`. + +In `fairseq`, summaries can be generated using: + +```bash +cp data-bin/cnn_dm/dict.source.txt checkpoints/ +python examples/bart/summarize.py \ + --model-dir pytorch/fairseq \ + --model-file bart.large.cnn \ + --src cnn_dm/test.source \ + --out cnn_dm/test.hypo +``` + +For calculating rouge, install `files2rouge` from [here](https://github.com/pltrdy/files2rouge). + +```bash +export CLASSPATH=/path/to/stanford-corenlp-full-2016-10-31/stanford-corenlp-3.7.0.jar + +# Tokenize hypothesis and target files. +cat test.hypo | java edu.stanford.nlp.process.PTBTokenizer -ioFileList -preserveLines > test.hypo.tokenized +cat test.target | java edu.stanford.nlp.process.PTBTokenizer -ioFileList -preserveLines > test.hypo.target +files2rouge test.hypo.tokenized test.hypo.target +# Expected output: (ROUGE-2 Average_F: 0.21238) +``` + + +## Finetuning + +- [Finetuning on GLUE](README.glue.md) +- [Finetuning on CNN-DM](README.summarization.md) + +## Citation + +```bibtex +@article{lewis2019bart, + title = {BART: Denoising Sequence-to-Sequence Pre-training for Natural +Language Generation, Translation, and Comprehension}, + author = {Mike Lewis and Yinhan Liu and Naman Goyal and Marjan Ghazvininejad and + Abdelrahman Mohamed and Omer Levy and Veselin Stoyanov + and Luke Zettlemoyer }, + journal={arXiv preprint arXiv:1910.13461}, + year = {2019}, +} +``` diff --git a/SpeechT5/fairseq/examples/bart/README.summarization.md b/SpeechT5/fairseq/examples/bart/README.summarization.md new file mode 100644 index 0000000000000000000000000000000000000000..8727584f2b2bdd880c6cd3abbf39b75dfbf4a67c --- /dev/null +++ b/SpeechT5/fairseq/examples/bart/README.summarization.md @@ -0,0 +1,102 @@ +# Fine-tuning BART on CNN-Dailymail summarization task + +### 1) Download the CNN and Daily Mail data and preprocess it into data files with non-tokenized cased samples. + +Follow the instructions [here](https://github.com/abisee/cnn-dailymail) to download the original CNN and Daily Mail datasets. 
To preprocess the data, refer to the pointers in [this issue](https://github.com/pytorch/fairseq/issues/1391) or check out the code [here](https://github.com/artmatsak/cnn-dailymail). + +Follow the instructions [here](https://github.com/EdinburghNLP/XSum) to download the original Extreme Summarization datasets, or check out the code [here](https://github.com/EdinburghNLP/XSum/tree/master/XSum-Dataset), Please keep the raw dataset and make sure no tokenization nor BPE on the dataset. + +### 2) BPE preprocess: + +```bash +wget -N 'https://dl.fbaipublicfiles.com/fairseq/gpt2_bpe/encoder.json' +wget -N 'https://dl.fbaipublicfiles.com/fairseq/gpt2_bpe/vocab.bpe' +wget -N 'https://dl.fbaipublicfiles.com/fairseq/gpt2_bpe/dict.txt' + +TASK=cnn_dm +for SPLIT in train val +do + for LANG in source target + do + python -m examples.roberta.multiprocessing_bpe_encoder \ + --encoder-json encoder.json \ + --vocab-bpe vocab.bpe \ + --inputs "$TASK/$SPLIT.$LANG" \ + --outputs "$TASK/$SPLIT.bpe.$LANG" \ + --workers 60 \ + --keep-empty; + done +done +``` + +### 3) Binarize dataset: +```bash +fairseq-preprocess \ + --source-lang "source" \ + --target-lang "target" \ + --trainpref "${TASK}/train.bpe" \ + --validpref "${TASK}/val.bpe" \ + --destdir "${TASK}-bin/" \ + --workers 60 \ + --srcdict dict.txt \ + --tgtdict dict.txt; +``` + +### 4) Fine-tuning on CNN-DM summarization task: +Example fine-tuning CNN-DM +```bash +TOTAL_NUM_UPDATES=20000 +WARMUP_UPDATES=500 +LR=3e-05 +MAX_TOKENS=2048 +UPDATE_FREQ=4 +BART_PATH=/path/to/bart/model.pt + +CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 fairseq-train cnn_dm-bin \ + --restore-file $BART_PATH \ + --max-tokens $MAX_TOKENS \ + --task translation \ + --source-lang source --target-lang target \ + --truncate-source \ + --layernorm-embedding \ + --share-all-embeddings \ + --share-decoder-input-output-embed \ + --reset-optimizer --reset-dataloader --reset-meters \ + --required-batch-size-multiple 1 \ + --arch bart_large \ + --criterion label_smoothed_cross_entropy \ + --label-smoothing 0.1 \ + --dropout 0.1 --attention-dropout 0.1 \ + --weight-decay 0.01 --optimizer adam --adam-betas "(0.9, 0.999)" --adam-eps 1e-08 \ + --clip-norm 0.1 \ + --lr-scheduler polynomial_decay --lr $LR --total-num-update $TOTAL_NUM_UPDATES --warmup-updates $WARMUP_UPDATES \ + --fp16 --update-freq $UPDATE_FREQ \ + --skip-invalid-size-inputs-valid-test \ + --find-unused-parameters; +``` +Above is expected to run on `1` node with `8 32gb-V100`. +Expected training time is about `5 hours`. Training time can be reduced with distributed training on `4` nodes and `--update-freq 1`. + +Use TOTAL_NUM_UPDATES=15000 UPDATE_FREQ=2 for Xsum task + +### Inference for CNN-DM test data using above trained checkpoint. 
+After training the model as mentioned in previous step, you can perform inference with checkpoints in `checkpoints/` directory using `eval_cnn.py`, for example + +```bash +cp data-bin/cnn_dm/dict.source.txt checkpoints/ +python examples/bart/summarize.py \ + --model-dir checkpoints \ + --model-file checkpoint_best.pt \ + --src cnn_dm/test.source \ + --out cnn_dm/test.hypo +``` +For XSUM, which uses beam=6, lenpen=1.0, max_len_b=60, min_len=10: +```bash +cp data-bin/cnn_dm/dict.source.txt checkpoints/ +python examples/bart/summarize.py \ + --model-dir checkpoints \ + --model-file checkpoint_best.pt \ + --src cnn_dm/test.source \ + --out cnn_dm/test.hypo \ + --xsum-kwargs +``` diff --git a/SpeechT5/fairseq/examples/bart/summarize.py b/SpeechT5/fairseq/examples/bart/summarize.py new file mode 100644 index 0000000000000000000000000000000000000000..04435f80e39c2d9d894696dae7cba5b381e13da9 --- /dev/null +++ b/SpeechT5/fairseq/examples/bart/summarize.py @@ -0,0 +1,100 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +import torch +from fairseq.models.bart import BARTModel +import argparse + +XSUM_KWARGS = dict(beam=6, lenpen=1.0, max_len_b=60, min_len=10, no_repeat_ngram_size=3) +CNN_KWARGS = dict(beam=4, lenpen=2.0, max_len_b=140, min_len=55, no_repeat_ngram_size=3) + + +@torch.no_grad() +def generate(bart, infile, outfile="bart_hypo.txt", bsz=32, n_obs=None, **eval_kwargs): + count = 1 + + # if n_obs is not None: bsz = min(bsz, n_obs) + + with open(infile) as source, open(outfile, "w") as fout: + sline = source.readline().strip() + slines = [sline] + for sline in source: + if n_obs is not None and count > n_obs: + break + if count % bsz == 0: + hypotheses_batch = bart.sample(slines, **eval_kwargs) + for hypothesis in hypotheses_batch: + fout.write(hypothesis + "\n") + fout.flush() + slines = [] + + slines.append(sline.strip()) + count += 1 + + if slines != []: + hypotheses_batch = bart.sample(slines, **eval_kwargs) + for hypothesis in hypotheses_batch: + fout.write(hypothesis + "\n") + fout.flush() + + +def main(): + """ + Usage:: + + python examples/bart/summarize.py \ + --model-dir $HOME/bart.large.cnn \ + --model-file model.pt \ + --src $HOME/data-bin/cnn_dm/test.source + """ + parser = argparse.ArgumentParser() + parser.add_argument( + "--model-dir", + required=True, + type=str, + default="bart.large.cnn/", + help="path containing model file and src_dict.txt", + ) + parser.add_argument( + "--model-file", + default="checkpoint_best.pt", + help="where in model_dir are weights saved", + ) + parser.add_argument( + "--src", default="test.source", help="text to summarize", type=str + ) + parser.add_argument( + "--out", default="test.hypo", help="where to save summaries", type=str + ) + parser.add_argument("--bsz", default=32, help="where to save summaries", type=int) + parser.add_argument( + "--n", default=None, help="how many examples to summarize", type=int + ) + parser.add_argument( + "--xsum-kwargs", + action="store_true", + default=False, + help="if true use XSUM_KWARGS else CNN_KWARGS", + ) + args = parser.parse_args() + eval_kwargs = XSUM_KWARGS if args.xsum_kwargs else CNN_KWARGS + if args.model_dir == "pytorch/fairseq": + bart = torch.hub.load("pytorch/fairseq", args.model_file) + else: + bart = BARTModel.from_pretrained( + args.model_dir, + checkpoint_file=args.model_file, + data_name_or_path=args.model_dir, + ) + bart = bart.eval() + if 
torch.cuda.is_available(): + bart = bart.cuda().half() + generate( + bart, args.src, bsz=args.bsz, n_obs=args.n, outfile=args.out, **eval_kwargs + ) + + +if __name__ == "__main__": + main() diff --git a/SpeechT5/fairseq/examples/byte_level_bpe/README.md b/SpeechT5/fairseq/examples/byte_level_bpe/README.md new file mode 100644 index 0000000000000000000000000000000000000000..657092660eae42d20f67647417623b8b8cb7b66c --- /dev/null +++ b/SpeechT5/fairseq/examples/byte_level_bpe/README.md @@ -0,0 +1,88 @@ +# Neural Machine Translation with Byte-Level Subwords + +https://arxiv.org/abs/1909.03341 + +We provide an implementation of byte-level byte-pair encoding (BBPE), taking IWSLT 2017 Fr-En translation as +example. + +## Data +Get data and generate fairseq binary dataset: +```bash +bash ./get_data.sh +``` + +## Model Training +Train Transformer model with Bi-GRU embedding contextualization (implemented in `gru_transformer.py`): +```bash +# VOCAB=bytes +# VOCAB=chars +VOCAB=bbpe2048 +# VOCAB=bpe2048 +# VOCAB=bbpe4096 +# VOCAB=bpe4096 +# VOCAB=bpe16384 +``` +```bash +fairseq-train "data/bin_${VOCAB}" --task translation --user-dir examples/byte_level_bpe/gru_transformer \ + --arch gru_transformer --encoder-layers 2 --decoder-layers 2 --dropout 0.3 --share-all-embeddings \ + --optimizer adam --adam-betas '(0.9, 0.98)' \ + --lr 5e-4 --lr-scheduler inverse_sqrt --warmup-updates 4000 \ + --criterion label_smoothed_cross_entropy --label-smoothing 0.1 \ + --log-format 'simple' --log-interval 100 --save-dir "checkpoints/${VOCAB}" \ + --batch-size 100 --max-update 100000 --update-freq 2 +``` + +## Generation +`fairseq-generate` requires bytes (BBPE) decoder to convert byte-level representation back to characters: +```bash +# BPE=--bpe bytes +# BPE=--bpe characters +BPE=--bpe byte_bpe --sentencepiece-model-path data/spm_bbpe2048.model +# BPE=--bpe sentencepiece --sentencepiece-model data/spm_bpe2048.model +# BPE=--bpe byte_bpe --sentencepiece-model-path data/spm_bbpe4096.model +# BPE=--bpe sentencepiece --sentencepiece-model data/spm_bpe4096.model +# BPE=--bpe sentencepiece --sentencepiece-model data/spm_bpe16384.model +``` + +```bash +fairseq-generate "data/bin_${VOCAB}" --task translation --user-dir examples/byte_level_bpe/gru_transformer \ + --source-lang fr --gen-subset test --sacrebleu --path "checkpoints/${VOCAB}/checkpoint_last.pt" \ + --tokenizer moses --moses-target-lang en ${BPE} +``` +When using `fairseq-interactive`, bytes (BBPE) encoder/decoder is required to tokenize input data and detokenize model predictions: +```bash +fairseq-interactive "data/bin_${VOCAB}" --task translation --user-dir examples/byte_level_bpe/gru_transformer \ + --path "checkpoints/${VOCAB}/checkpoint_last.pt" --input data/test.fr --tokenizer moses --moses-source-lang fr \ + --moses-target-lang en ${BPE} --buffer-size 1000 --max-tokens 10000 +``` + +## Results +| Vocabulary | Model | BLEU | +|:-------------:|:-------------:|:-------------:| +| Joint BPE 16k ([Kudo, 2018](https://arxiv.org/abs/1804.10959)) | 512d LSTM 2+2 | 33.81 | +| Joint BPE 16k | Transformer base 2+2 (w/ GRU) | 36.64 (36.72) | +| Joint BPE 4k | Transformer base 2+2 (w/ GRU) | 35.49 (36.10) | +| Joint BBPE 4k | Transformer base 2+2 (w/ GRU) | 35.61 (35.82) | +| Joint BPE 2k | Transformer base 2+2 (w/ GRU) | 34.87 (36.13) | +| Joint BBPE 2k | Transformer base 2+2 (w/ GRU) | 34.98 (35.43) | +| Characters | Transformer base 2+2 (w/ GRU) | 31.78 (33.30) | +| Bytes | Transformer base 2+2 (w/ GRU) | 31.57 (33.62) | + + +## Citation +``` +@misc{wang2019neural, + 
title={Neural Machine Translation with Byte-Level Subwords}, + author={Changhan Wang and Kyunghyun Cho and Jiatao Gu}, + year={2019}, + eprint={1909.03341}, + archivePrefix={arXiv}, + primaryClass={cs.CL} +} +``` + + +## Contact +Changhan Wang ([changhan@fb.com](mailto:changhan@fb.com)), +Kyunghyun Cho ([kyunghyuncho@fb.com](mailto:kyunghyuncho@fb.com)), +Jiatao Gu ([jgu@fb.com](mailto:jgu@fb.com)) diff --git a/SpeechT5/fairseq/examples/byte_level_bpe/get_bitext.py b/SpeechT5/fairseq/examples/byte_level_bpe/get_bitext.py new file mode 100644 index 0000000000000000000000000000000000000000..6ac1eeec1e6167ec6bafd76b37173ee6987cae7e --- /dev/null +++ b/SpeechT5/fairseq/examples/byte_level_bpe/get_bitext.py @@ -0,0 +1,254 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + + +import argparse +import os +import os.path as op +from collections import namedtuple +from multiprocessing import cpu_count +from typing import List, Optional + +import sentencepiece as sp +from fairseq.data.encoders.byte_bpe import ByteBPE +from fairseq.data.encoders.byte_utils import byte_encode +from fairseq.data.encoders.bytes import Bytes +from fairseq.data.encoders.characters import Characters +from fairseq.data.encoders.moses_tokenizer import MosesTokenizer +from fairseq.data.encoders.sentencepiece_bpe import SentencepieceBPE + + +SPLITS = ["train", "valid", "test"] + + +def _convert_xml(in_path: str, out_path: str): + with open(in_path) as f, open(out_path, "w") as f_o: + for s in f: + ss = s.strip() + if not ss.startswith("", "").split('">') + assert len(ss) == 2 + f_o.write(ss[1].strip() + "\n") + + +def _convert_train(in_path: str, out_path: str): + with open(in_path) as f, open(out_path, "w") as f_o: + for s in f: + ss = s.strip() + if ss.startswith("<"): + continue + f_o.write(ss.strip() + "\n") + + +def _get_bytes(in_path: str, out_path: str): + with open(in_path) as f, open(out_path, "w") as f_o: + for s in f: + f_o.write(Bytes.encode(s.strip()) + "\n") + + +def _get_chars(in_path: str, out_path: str): + with open(in_path) as f, open(out_path, "w") as f_o: + for s in f: + f_o.write(Characters.encode(s.strip()) + "\n") + + +def pretokenize(in_path: str, out_path: str, src: str, tgt: str): + Args = namedtuple( + "Args", + [ + "moses_source_lang", + "moses_target_lang", + "moses_no_dash_splits", + "moses_no_escape", + ], + ) + args = Args( + moses_source_lang=src, + moses_target_lang=tgt, + moses_no_dash_splits=False, + moses_no_escape=False, + ) + pretokenizer = MosesTokenizer(args) + with open(in_path) as f, open(out_path, "w") as f_o: + for s in f: + f_o.write(pretokenizer.encode(s.strip()) + "\n") + + +def _convert_to_bchar(in_path_prefix: str, src: str, tgt: str, out_path: str): + with open(out_path, "w") as f_o: + for lang in [src, tgt]: + with open(f"{in_path_prefix}.{lang}") as f: + for s in f: + f_o.write(byte_encode(s.strip()) + "\n") + + +def _get_bpe(in_path: str, model_prefix: str, vocab_size: int): + arguments = [ + f"--input={in_path}", + f"--model_prefix={model_prefix}", + f"--model_type=bpe", + f"--vocab_size={vocab_size}", + "--character_coverage=1.0", + "--normalization_rule_name=identity", + f"--num_threads={cpu_count()}", + ] + sp.SentencePieceTrainer.Train(" ".join(arguments)) + + +def _apply_bbpe(model_path: str, in_path: str, out_path: str): + Args = namedtuple("Args", ["sentencepiece_model_path"]) + args = Args(sentencepiece_model_path=model_path) + 
tokenizer = ByteBPE(args) + with open(in_path) as f, open(out_path, "w") as f_o: + for s in f: + f_o.write(tokenizer.encode(s.strip()) + "\n") + + +def _apply_bpe(model_path: str, in_path: str, out_path: str): + Args = namedtuple("Args", ["sentencepiece_model"]) + args = Args(sentencepiece_model=model_path) + tokenizer = SentencepieceBPE(args) + with open(in_path) as f, open(out_path, "w") as f_o: + for s in f: + f_o.write(tokenizer.encode(s.strip()) + "\n") + + +def _concat_files(in_paths: List[str], out_path: str): + with open(out_path, "w") as f_o: + for p in in_paths: + with open(p) as f: + for r in f: + f_o.write(r) + + +def preprocess_iwslt17( + root: str, + src: str, + tgt: str, + bpe_size: Optional[int], + need_chars: bool, + bbpe_size: Optional[int], + need_bytes: bool, +): + # extract bitext + in_root = op.join(root, f"{src}-{tgt}") + for lang in [src, tgt]: + _convert_train( + op.join(in_root, f"train.tags.{src}-{tgt}.{lang}"), + op.join(root, f"train.{lang}"), + ) + _convert_xml( + op.join(in_root, f"IWSLT17.TED.dev2010.{src}-{tgt}.{lang}.xml"), + op.join(root, f"valid.{lang}"), + ) + _convert_xml( + op.join(in_root, f"IWSLT17.TED.tst2015.{src}-{tgt}.{lang}.xml"), + op.join(root, f"test.{lang}"), + ) + # pre-tokenize + for lang in [src, tgt]: + for split in SPLITS: + pretokenize( + op.join(root, f"{split}.{lang}"), + op.join(root, f"{split}.moses.{lang}"), + src, + tgt, + ) + # tokenize with BPE vocabulary + if bpe_size is not None: + # learn vocabulary + concated_train_path = op.join(root, "train.all") + _concat_files( + [op.join(root, "train.moses.fr"), op.join(root, "train.moses.en")], + concated_train_path, + ) + bpe_model_prefix = op.join(root, f"spm_bpe{bpe_size}") + _get_bpe(concated_train_path, bpe_model_prefix, bpe_size) + os.remove(concated_train_path) + # apply + for lang in [src, tgt]: + for split in SPLITS: + _apply_bpe( + bpe_model_prefix + ".model", + op.join(root, f"{split}.moses.{lang}"), + op.join(root, f"{split}.moses.bpe{bpe_size}.{lang}"), + ) + # tokenize with bytes vocabulary + if need_bytes: + for lang in [src, tgt]: + for split in SPLITS: + _get_bytes( + op.join(root, f"{split}.moses.{lang}"), + op.join(root, f"{split}.moses.bytes.{lang}"), + ) + # tokenize with characters vocabulary + if need_chars: + for lang in [src, tgt]: + for split in SPLITS: + _get_chars( + op.join(root, f"{split}.moses.{lang}"), + op.join(root, f"{split}.moses.chars.{lang}"), + ) + # tokenize with byte-level BPE vocabulary + if bbpe_size is not None: + # learn vocabulary + bchar_path = op.join(root, "train.bchar") + _convert_to_bchar(op.join(root, "train.moses"), src, tgt, bchar_path) + bbpe_model_prefix = op.join(root, f"spm_bbpe{bbpe_size}") + _get_bpe(bchar_path, bbpe_model_prefix, bbpe_size) + os.remove(bchar_path) + # apply + for lang in [src, tgt]: + for split in SPLITS: + _apply_bbpe( + bbpe_model_prefix + ".model", + op.join(root, f"{split}.moses.{lang}"), + op.join(root, f"{split}.moses.bbpe{bbpe_size}.{lang}"), + ) + + +def main(): + parser = argparse.ArgumentParser() + parser.add_argument("--root", type=str, default="data") + parser.add_argument( + "--bpe-vocab", + default=None, + type=int, + help="Generate tokenized bitext with BPE of size K." + "Default to None (disabled).", + ) + parser.add_argument( + "--bbpe-vocab", + default=None, + type=int, + help="Generate tokenized bitext with BBPE of size K." 
+ "Default to None (disabled).", + ) + parser.add_argument( + "--byte-vocab", + action="store_true", + help="Generate tokenized bitext with bytes vocabulary", + ) + parser.add_argument( + "--char-vocab", + action="store_true", + help="Generate tokenized bitext with chars vocabulary", + ) + args = parser.parse_args() + + preprocess_iwslt17( + args.root, + "fr", + "en", + args.bpe_vocab, + args.char_vocab, + args.bbpe_vocab, + args.byte_vocab, + ) + + +if __name__ == "__main__": + main() diff --git a/SpeechT5/fairseq/examples/byte_level_bpe/get_data.sh b/SpeechT5/fairseq/examples/byte_level_bpe/get_data.sh new file mode 100644 index 0000000000000000000000000000000000000000..c3d55d4925a6e6e23d12d293f093c1ae14acf76e --- /dev/null +++ b/SpeechT5/fairseq/examples/byte_level_bpe/get_data.sh @@ -0,0 +1,47 @@ +#!/bin/bash + +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +PY_BIN_ROOT= + +# PyPI dependency +${PY_BIN_ROOT}pip install sentencepiece sacremoses + +# Get data +if [ ! -d "data" ]; then + mkdir data +fi + +if [ ! -f "data/fr-en.tgz" ]; then + wget https://wit3.fbk.eu/archive/2017-01-trnted/texts/fr/en/fr-en.tgz -P data + tar xvf data/fr-en.tgz -C data +fi +${PY_BIN_ROOT}python get_bitext.py --bpe-vocab 16384 --byte-vocab --char-vocab +for VOCAB_SIZE in 2048 4096; do + ${PY_BIN_ROOT}python get_bitext.py --bpe-vocab ${VOCAB_SIZE} --bbpe-vocab ${VOCAB_SIZE} +done +rm -r data/fr-en data/fr-en.tgz + +# Generate binary dataset +${PY_BIN_ROOT}/fairseq-preprocess --source-lang fr --target-lang en --destdir data/bin_bpe16384 --joined-dictionary \ + --workers "$(nproc)" --trainpref data/train.moses.bpe16384 --validpref data/valid.moses.bpe16384 \ + --testpref data/test.moses.bpe16384 + +${PY_BIN_ROOT}/fairseq-preprocess --source-lang fr --target-lang en --destdir data/bin_bytes --joined-dictionary \ + --workers "$(nproc)" --trainpref data/train.moses.bytes --validpref data/valid.moses.bytes \ + --testpref data/test.moses.bytes + +${PY_BIN_ROOT}/fairseq-preprocess --source-lang fr --target-lang en --destdir data/bin_chars --joined-dictionary \ + --workers "$(nproc)" --trainpref data/train.moses.chars --validpref data/valid.moses.chars \ + --testpref data/test.moses.chars + +for VOCAB_SIZE in 2048 4096; do + for TYPE in bbpe bpe; do + ${PY_BIN_ROOT}/fairseq-preprocess --source-lang fr --target-lang en --destdir "data/bin_${TYPE}${VOCAB_SIZE}" \ + --joined-dictionary --workers "$(nproc)" --trainpref "data/train.moses.${TYPE}${VOCAB_SIZE}" \ + --validpref "data/valid.moses.${TYPE}${VOCAB_SIZE}" --testpref "data/test.moses.${TYPE}${VOCAB_SIZE}" + done +done diff --git a/SpeechT5/fairseq/examples/byte_level_bpe/gru_transformer.py b/SpeechT5/fairseq/examples/byte_level_bpe/gru_transformer.py new file mode 100644 index 0000000000000000000000000000000000000000..d4efa93a4d75da71c78e786d7f62101ef3266af4 --- /dev/null +++ b/SpeechT5/fairseq/examples/byte_level_bpe/gru_transformer.py @@ -0,0 +1,107 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. 
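+
+# gru_transformer: a Transformer in which the encoder contextualizes token
+# embeddings with a small bidirectional GRU before the self-attention layers
+# ("Bi-GRU embedding contextualization" in the byte_level_bpe README).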
+ +import torch.nn as nn +import torch.nn.functional as F +from fairseq.models import register_model, register_model_architecture +from fairseq.models.transformer import TransformerEncoder, TransformerModel + + +@register_model("gru_transformer") +class GRUTransformerModel(TransformerModel): + @classmethod + def build_encoder(cls, args, src_dict, embed_tokens): + return GRUTransformerEncoder(args, src_dict, embed_tokens) + + +class GRUTransformerEncoder(TransformerEncoder): + def __init__(self, args, dictionary, embed_tokens): + super().__init__(args, dictionary, embed_tokens) + self.emb_ctx = nn.GRU( + input_size=embed_tokens.embedding_dim, + hidden_size=embed_tokens.embedding_dim // 2, + num_layers=1, + bidirectional=True, + ) + + def forward_embedding(self, src_tokens): + # embed tokens and positions + x = embed = self.embed_scale * self.embed_tokens(src_tokens) + if self.embed_positions is not None: + x = embed + self.embed_positions(src_tokens) + + # contextualize embeddings + x = x.transpose(0, 1) + x = self.dropout_module(x) + x, _ = self.emb_ctx.forward(x) + x = x.transpose(0, 1) + + if self.layernorm_embedding is not None: + x = self.layernorm_embedding(x) + x = self.dropout_module(x) + return x, embed + + +@register_model_architecture("gru_transformer", "gru_transformer") +def gru_transformer_base_architecture(args): + args.encoder_embed_path = getattr(args, "encoder_embed_path", None) + args.encoder_embed_dim = getattr(args, "encoder_embed_dim", 512) + args.encoder_ffn_embed_dim = getattr(args, "encoder_ffn_embed_dim", 2048) + args.encoder_layers = getattr(args, "encoder_layers", 6) + args.encoder_attention_heads = getattr(args, "encoder_attention_heads", 8) + args.encoder_normalize_before = getattr(args, "encoder_normalize_before", False) + args.encoder_learned_pos = getattr(args, "encoder_learned_pos", False) + args.decoder_embed_path = getattr(args, "decoder_embed_path", None) + args.decoder_embed_dim = getattr(args, "decoder_embed_dim", args.encoder_embed_dim) + args.decoder_ffn_embed_dim = getattr( + args, "decoder_ffn_embed_dim", args.encoder_ffn_embed_dim + ) + args.decoder_layers = getattr(args, "decoder_layers", 6) + args.decoder_attention_heads = getattr(args, "decoder_attention_heads", 8) + args.decoder_normalize_before = getattr(args, "decoder_normalize_before", False) + args.decoder_learned_pos = getattr(args, "decoder_learned_pos", False) + args.attention_dropout = getattr(args, "attention_dropout", 0.0) + args.activation_dropout = getattr(args, "activation_dropout", 0.0) + args.activation_fn = getattr(args, "activation_fn", "relu") + args.dropout = getattr(args, "dropout", 0.1) + args.adaptive_softmax_cutoff = getattr(args, "adaptive_softmax_cutoff", None) + args.adaptive_softmax_dropout = getattr(args, "adaptive_softmax_dropout", 0) + args.share_decoder_input_output_embed = getattr( + args, "share_decoder_input_output_embed", False + ) + args.share_all_embeddings = getattr(args, "share_all_embeddings", False) + args.no_token_positional_embeddings = getattr( + args, "no_token_positional_embeddings", False + ) + args.adaptive_input = getattr(args, "adaptive_input", False) + args.no_cross_attention = getattr(args, "no_cross_attention", False) + args.cross_self_attention = getattr(args, "cross_self_attention", False) + args.layer_wise_attention = getattr(args, "layer_wise_attention", False) + + args.decoder_output_dim = getattr( + args, "decoder_output_dim", args.decoder_embed_dim + ) + args.decoder_input_dim = getattr(args, "decoder_input_dim", 
args.decoder_embed_dim) + + args.no_scale_embedding = getattr(args, "no_scale_embedding", False) + args.layernorm_embedding = getattr(args, "layernorm_embedding", False) + + +@register_model_architecture("gru_transformer", "gru_transformer_big") +def gru_transformer_big(args): + args.encoder_embed_dim = getattr(args, "encoder_embed_dim", 1024) + args.encoder_ffn_embed_dim = getattr(args, "encoder_ffn_embed_dim", 4096) + args.encoder_attention_heads = getattr(args, "encoder_attention_heads", 16) + args.encoder_normalize_before = getattr(args, "encoder_normalize_before", False) + args.decoder_embed_dim = getattr(args, "decoder_embed_dim", 1024) + args.decoder_ffn_embed_dim = getattr(args, "decoder_ffn_embed_dim", 4096) + args.decoder_attention_heads = getattr(args, "decoder_attention_heads", 16) + args.dropout = getattr(args, "dropout", 0.3) + gru_transformer_base_architecture(args) diff --git a/SpeechT5/fairseq/examples/camembert/README.md b/SpeechT5/fairseq/examples/camembert/README.md new file mode 100644 index 0000000000000000000000000000000000000000..5ef4fe3f151bb468712f3be935ea5bb1b1360bf7 --- /dev/null +++ b/SpeechT5/fairseq/examples/camembert/README.md @@ -0,0 +1,75 @@ +# CamemBERT: a Tasty French Language Model + +## Introduction + +[CamemBERT](https://arxiv.org/abs/1911.03894) is a pretrained language model trained on 138GB of French text based on RoBERTa. + +Also available in [github.com/huggingface/transformers](https://github.com/huggingface/transformers/). + +## Pre-trained models + +| Model | #params | Download | Arch. | Training data | +|--------------------------------|---------|--------------------------------------------------------------------------------------------------------------------------|-------|-----------------------------------| +| `camembert` / `camembert-base` | 110M | [camembert-base.tar.gz](https://dl.fbaipublicfiles.com/fairseq/models/camembert-base.tar.gz) | Base | OSCAR (138 GB of text) | +| `camembert-large` | 335M | [camembert-large.tar.gz](https://dl.fbaipublicfiles.com/fairseq/models/camembert-large.tar.gz) | Large | CCNet (135 GB of text) | +| `camembert-base-ccnet` | 110M | [camembert-base-ccnet.tar.gz](https://dl.fbaipublicfiles.com/fairseq/models/camembert-base-ccnet.tar.gz) | Base | CCNet (135 GB of text) | +| `camembert-base-wikipedia-4gb` | 110M | [camembert-base-wikipedia-4gb.tar.gz](https://dl.fbaipublicfiles.com/fairseq/models/camembert-base-wikipedia-4gb.tar.gz) | Base | Wikipedia (4 GB of text) | +| `camembert-base-oscar-4gb` | 110M | [camembert-base-oscar-4gb.tar.gz](https://dl.fbaipublicfiles.com/fairseq/models/camembert-base-oscar-4gb.tar.gz) | Base | Subsample of OSCAR (4 GB of text) | +| `camembert-base-ccnet-4gb` | 110M | [camembert-base-ccnet-4gb.tar.gz](https://dl.fbaipublicfiles.com/fairseq/models/camembert-base-ccnet-4gb.tar.gz) | Base | Subsample of CCNet (4 GB of text) | + +## Example usage + +### fairseq +##### Load CamemBERT from torch.hub (PyTorch >= 1.1): +```python +import torch +camembert = torch.hub.load('pytorch/fairseq', 'camembert') +camembert.eval() # disable dropout (or leave in train mode to finetune) +``` + +##### Load CamemBERT (for PyTorch 1.0 or custom models): +```python +# Download camembert model +wget https://dl.fbaipublicfiles.com/fairseq/models/camembert-base.tar.gz +tar -xzvf camembert.tar.gz + +# Load the model in fairseq +from fairseq.models.roberta import CamembertModel +camembert = CamembertModel.from_pretrained('/path/to/camembert') +camembert.eval() # disable dropout (or leave in train mode to 
finetune) +``` + +##### Filling masks: +```python +masked_line = 'Le camembert est :)' +camembert.fill_mask(masked_line, topk=3) +# [('Le camembert est délicieux :)', 0.4909118115901947, ' délicieux'), +# ('Le camembert est excellent :)', 0.10556942224502563, ' excellent'), +# ('Le camembert est succulent :)', 0.03453322499990463, ' succulent')] +``` + +##### Extract features from Camembert: +```python +# Extract the last layer's features +line = "J'aime le camembert !" +tokens = camembert.encode(line) +last_layer_features = camembert.extract_features(tokens) +assert last_layer_features.size() == torch.Size([1, 10, 768]) + +# Extract all layer's features (layer 0 is the embedding layer) +all_layers = camembert.extract_features(tokens, return_all_hiddens=True) +assert len(all_layers) == 13 +assert torch.all(all_layers[-1] == last_layer_features) +``` + +## Citation +If you use our work, please cite: + +```bibtex +@inproceedings{martin2020camembert, + title={CamemBERT: a Tasty French Language Model}, + author={Martin, Louis and Muller, Benjamin and Su{\'a}rez, Pedro Javier Ortiz and Dupont, Yoann and Romary, Laurent and de la Clergerie, {\'E}ric Villemonte and Seddah, Djam{\'e} and Sagot, Beno{\^\i}t}, + booktitle={Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics}, + year={2020} +} +``` diff --git a/SpeechT5/fairseq/examples/constrained_decoding/README.md b/SpeechT5/fairseq/examples/constrained_decoding/README.md new file mode 100644 index 0000000000000000000000000000000000000000..cfca9c91fdb65e64b80af54f2d89f6b5f0db61d0 --- /dev/null +++ b/SpeechT5/fairseq/examples/constrained_decoding/README.md @@ -0,0 +1,123 @@ +# (Vectorized) Lexically constrained decoding with dynamic beam allocation + +This page provides instructions for how to use lexically constrained decoding in Fairseq. +Fairseq implements the code described in the following papers: + +* [Fast Lexically Constrained Decoding With Dynamic Beam Allocation](https://www.aclweb.org/anthology/N18-1119/) (Post & Vilar, 2018) +* [Improved Lexically Constrained Decoding for Translation and Monolingual Rewriting](https://www.aclweb.org/anthology/N19-1090/) (Hu et al., 2019) + +## Quick start + +Constrained search is enabled by adding the command-line argument `--constraints` to `fairseq-interactive`. +Constraints are appended to each line of input, separated by tabs. Each constraint (one or more tokens) +is a separate field. + +The following command, using [Fairseq's WMT19 German--English model](https://github.com/pytorch/fairseq/blob/master/examples/wmt19/README.md), +translates the sentence *Die maschinelle Übersetzung ist schwer zu kontrollieren.* with the constraints +"hard" and "to influence". + + echo -e "Die maschinelle Übersetzung ist schwer zu kontrollieren.\thard\ttoinfluence" \ + | normalize.py | tok.py \ + | fairseq-interactive /path/to/model \ + --path /path/to/model/model1.pt \ + --bpe fastbpe \ + --bpe-codes /path/to/model/bpecodes \ + --constraints \ + -s de -t en \ + --beam 10 + +(tok.py and normalize.py can be found in the same directory as this README; they are just shortcuts around Fairseq's WMT19 preprocessing). +This will generate the following output: + + [snip] + S-0 Die masch@@ in@@ elle Über@@ setzung ist schwer zu kontrollieren . + W-0 1.844 seconds + C-0 hard + C-0 influence + H-0 -1.5333266258239746 Mach@@ ine trans@@ lation is hard to influence . + D-0 -1.5333266258239746 Machine translation is hard to influence . 
+ P-0 -0.5434 -0.1423 -0.1930 -0.1415 -0.2346 -1.8031 -0.1701 -11.7727 -0.1815 -0.1511 + +By default, constraints are generated in the order supplied, with any number (zero or more) of tokens generated +between constraints. If you wish for the decoder to order the constraints, then use `--constraints unordered`. +Note that you may want to use a larger beam. + +## Implementation details + +The heart of the implementation is in `fairseq/search.py`, which adds a `LexicallyConstrainedBeamSearch` instance. +This instance of beam search tracks the progress of each hypothesis in the beam through the set of constraints +provided for each input sentence. It does this using one of two classes, both found in `fairseq/token_generation_contstraints.py`: + +* OrderedConstraintState: assumes the `C` input constraints will be generated in the provided order +* UnorderedConstraintState: tries to apply `C` (phrasal) constraints in all `C!` orders + +## Differences from Sockeye + +There are a number of [differences from Sockeye's implementation](https://awslabs.github.io/sockeye/inference.html#lexical-constraints). + +* Generating constraints in the order supplied (the default option here) is not available in Sockeye. +* Due to an improved beam allocation method, there is no need to prune the beam. +* Again due to better allocation, beam sizes as low as 10 or even 5 are often sufficient. +* [The vector extensions described in Hu et al.](https://github.com/edwardjhu/sockeye/tree/trie_constraints) (NAACL 2019) were never merged + into the main Sockeye branch. + +## Citation + +The paper first describing lexical constraints for seq2seq decoding is: + +```bibtex +@inproceedings{hokamp-liu-2017-lexically, + title = "Lexically Constrained Decoding for Sequence Generation Using Grid Beam Search", + author = "Hokamp, Chris and + Liu, Qun", + booktitle = "Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)", + month = jul, + year = "2017", + address = "Vancouver, Canada", + publisher = "Association for Computational Linguistics", + url = "https://www.aclweb.org/anthology/P17-1141", + doi = "10.18653/v1/P17-1141", + pages = "1535--1546", +} +``` + +The fairseq implementation uses the extensions described in + +```bibtex +@inproceedings{post-vilar-2018-fast, + title = "Fast Lexically Constrained Decoding with Dynamic Beam Allocation for Neural Machine Translation", + author = "Post, Matt and + Vilar, David", + booktitle = "Proceedings of the 2018 Conference of the North {A}merican Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers)", + month = jun, + year = "2018", + address = "New Orleans, Louisiana", + publisher = "Association for Computational Linguistics", + url = "https://www.aclweb.org/anthology/N18-1119", + doi = "10.18653/v1/N18-1119", + pages = "1314--1324", +} +``` + +and + +```bibtex +@inproceedings{hu-etal-2019-improved, + title = "Improved Lexically Constrained Decoding for Translation and Monolingual Rewriting", + author = "Hu, J. 
Edward and + Khayrallah, Huda and + Culkin, Ryan and + Xia, Patrick and + Chen, Tongfei and + Post, Matt and + Van Durme, Benjamin", + booktitle = "Proceedings of the 2019 Conference of the North {A}merican Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)", + month = jun, + year = "2019", + address = "Minneapolis, Minnesota", + publisher = "Association for Computational Linguistics", + url = "https://www.aclweb.org/anthology/N19-1090", + doi = "10.18653/v1/N19-1090", + pages = "839--850", +} +``` diff --git a/SpeechT5/fairseq/examples/constrained_decoding/normalize.py b/SpeechT5/fairseq/examples/constrained_decoding/normalize.py new file mode 100644 index 0000000000000000000000000000000000000000..4ae2b5111ba025acb9e1613865c92fdc339a58d5 --- /dev/null +++ b/SpeechT5/fairseq/examples/constrained_decoding/normalize.py @@ -0,0 +1,27 @@ +#!/usr/bin/env python3 +# +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +import sys + +from sacremoses.normalize import MosesPunctNormalizer + + +def main(args): + normalizer = MosesPunctNormalizer(lang=args.lang, penn=args.penn) + for line in sys.stdin: + print(normalizer.normalize(line.rstrip()), flush=True) + + +if __name__ == "__main__": + import argparse + + parser = argparse.ArgumentParser() + parser.add_argument("--lang", "-l", default="en") + parser.add_argument("--penn", "-p", action="store_true") + args = parser.parse_args() + + main(args) diff --git a/SpeechT5/fairseq/examples/constrained_decoding/tok.py b/SpeechT5/fairseq/examples/constrained_decoding/tok.py new file mode 100644 index 0000000000000000000000000000000000000000..b1f888a8c0d1b8ec7174859476cc3222456e0d2c --- /dev/null +++ b/SpeechT5/fairseq/examples/constrained_decoding/tok.py @@ -0,0 +1,34 @@ +#!/usr/bin/env python3 +# +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +import sys + +import sacremoses + + +def main(args): + """Tokenizes, preserving tabs""" + mt = sacremoses.MosesTokenizer(lang=args.lang) + + def tok(s): + return mt.tokenize(s, return_str=True) + + for line in sys.stdin: + parts = list(map(tok, line.split("\t"))) + print(*parts, sep="\t", flush=True) + + +if __name__ == "__main__": + import argparse + + parser = argparse.ArgumentParser() + parser.add_argument("--lang", "-l", default="en") + parser.add_argument("--penn", "-p", action="store_true") + parser.add_argument("--fields", "-f", help="fields to tokenize") + args = parser.parse_args() + + main(args) diff --git a/SpeechT5/fairseq/examples/conv_seq2seq/README.md b/SpeechT5/fairseq/examples/conv_seq2seq/README.md new file mode 100644 index 0000000000000000000000000000000000000000..95fe7e7909a77ee0e50fe31d4b8be38daa8f3be7 --- /dev/null +++ b/SpeechT5/fairseq/examples/conv_seq2seq/README.md @@ -0,0 +1,25 @@ +# Convolutional Sequence to Sequence Learning (Gehring et al., 2017) + +## Pre-trained models + +Description | Dataset | Model | Test set(s) +---|---|---|--- +Convolutional
([Gehring et al., 2017](https://arxiv.org/abs/1705.03122)) | [WMT14 English-French](http://statmt.org/wmt14/translation-task.html#Download) | [download (.tar.bz2)](https://dl.fbaipublicfiles.com/fairseq/models/wmt14.v2.en-fr.fconv-py.tar.bz2) | newstest2014:
[download (.tar.bz2)](https://dl.fbaipublicfiles.com/fairseq/data/wmt14.v2.en-fr.newstest2014.tar.bz2)
newstest2012/2013:
[download (.tar.bz2)](https://dl.fbaipublicfiles.com/fairseq/data/wmt14.v2.en-fr.ntst1213.tar.bz2) +Convolutional
([Gehring et al., 2017](https://arxiv.org/abs/1705.03122)) | [WMT14 English-German](http://statmt.org/wmt14/translation-task.html#Download) | [download (.tar.bz2)](https://dl.fbaipublicfiles.com/fairseq/models/wmt14.en-de.fconv-py.tar.bz2) | newstest2014:
[download (.tar.bz2)](https://dl.fbaipublicfiles.com/fairseq/data/wmt14.en-de.newstest2014.tar.bz2) +Convolutional
([Gehring et al., 2017](https://arxiv.org/abs/1705.03122)) | [WMT17 English-German](http://statmt.org/wmt17/translation-task.html#Download) | [download (.tar.bz2)](https://dl.fbaipublicfiles.com/fairseq/models/wmt17.v2.en-de.fconv-py.tar.bz2) | newstest2014:
[download (.tar.bz2)](https://dl.fbaipublicfiles.com/fairseq/data/wmt17.v2.en-de.newstest2014.tar.bz2) + +## Example usage + +See the [translation README](../translation/README.md) for instructions on reproducing results for WMT'14 En-De and +WMT'14 En-Fr using the `fconv_wmt_en_de` and `fconv_wmt_en_fr` model architectures. + +## Citation + +```bibtex +@inproceedings{gehring2017convs2s, + title = {Convolutional Sequence to Sequence Learning}, + author = {Gehring, Jonas, and Auli, Michael and Grangier, David and Yarats, Denis and Dauphin, Yann N}, + booktitle = {Proc. of ICML}, + year = 2017, +} +``` diff --git a/SpeechT5/fairseq/examples/criss/README.md b/SpeechT5/fairseq/examples/criss/README.md new file mode 100644 index 0000000000000000000000000000000000000000..4689ed7c10497a5100b28fe6d6801a7c089da569 --- /dev/null +++ b/SpeechT5/fairseq/examples/criss/README.md @@ -0,0 +1,61 @@ +# Cross-lingual Retrieval for Iterative Self-Supervised Training + +https://arxiv.org/pdf/2006.09526.pdf + +## Introduction + +CRISS is a multilingual sequence-to-sequnce pretraining method where mining and training processes are applied iteratively, improving cross-lingual alignment and translation ability at the same time. + +## Requirements: + +* faiss: https://github.com/facebookresearch/faiss +* mosesdecoder: https://github.com/moses-smt/mosesdecoder +* flores: https://github.com/facebookresearch/flores +* LASER: https://github.com/facebookresearch/LASER + +## Unsupervised Machine Translation +##### 1. Download and decompress CRISS checkpoints +``` +cd examples/criss +wget https://dl.fbaipublicfiles.com/criss/criss_3rd_checkpoints.tar.gz +tar -xf criss_checkpoints.tar.gz +``` +##### 2. Download and preprocess Flores test dataset +Make sure to run all scripts from examples/criss directory +``` +bash download_and_preprocess_flores_test.sh +``` + +##### 3. Run Evaluation on Sinhala-English +``` +bash unsupervised_mt/eval.sh +``` + +## Sentence Retrieval +##### 1. Download and preprocess Tatoeba dataset +``` +bash download_and_preprocess_tatoeba.sh +``` + +##### 2. Run Sentence Retrieval on Tatoeba Kazakh-English +``` +bash sentence_retrieval/sentence_retrieval_tatoeba.sh +``` + +## Mining +##### 1. Install faiss +Follow instructions on https://github.com/facebookresearch/faiss/blob/master/INSTALL.md +##### 2. Mine pseudo-parallel data between Kazakh and English +``` +bash mining/mine_example.sh +``` + +## Citation +```bibtex +@article{tran2020cross, + title={Cross-lingual retrieval for iterative self-supervised training}, + author={Tran, Chau and Tang, Yuqing and Li, Xian and Gu, Jiatao}, + journal={arXiv preprint arXiv:2006.09526}, + year={2020} +} +``` diff --git a/SpeechT5/fairseq/examples/criss/download_and_preprocess_flores_test.sh b/SpeechT5/fairseq/examples/criss/download_and_preprocess_flores_test.sh new file mode 100644 index 0000000000000000000000000000000000000000..ed4b390fbdee3991efeb298050e12065d7fe605b --- /dev/null +++ b/SpeechT5/fairseq/examples/criss/download_and_preprocess_flores_test.sh @@ -0,0 +1,64 @@ +#!/bin/bash +# Copyright (c) Facebook, Inc. and its affiliates. +# All rights reserved. +# +# This source code is licensed under the license found in the +# LICENSE file in the root directory of this source tree. 
+ +SPM_ENCODE=flores/scripts/spm_encode.py +DATA=data_tmp +SPM_MODEL=criss_checkpoints/sentence.bpe.model +DICT=criss_checkpoints/dict.txt + +download_data() { + CORPORA=$1 + URL=$2 + + if [ -f $CORPORA ]; then + echo "$CORPORA already exists, skipping download" + else + echo "Downloading $URL" + wget $URL -O $CORPORA --no-check-certificate || rm -f $CORPORA + if [ -f $CORPORA ]; then + echo "$URL successfully downloaded." + else + echo "$URL not successfully downloaded." + rm -f $CORPORA + fi + fi +} + +if [[ -f flores ]]; then + echo "flores already cloned" +else + git clone https://github.com/facebookresearch/flores +fi + +mkdir -p $DATA +download_data $DATA/wikipedia_en_ne_si_test_sets.tgz "https://github.com/facebookresearch/flores/raw/master/data/wikipedia_en_ne_si_test_sets.tgz" +pushd $DATA +pwd +tar -vxf wikipedia_en_ne_si_test_sets.tgz +popd + + +for lang in ne_NP si_LK; do + datadir=$DATA/${lang}-en_XX-flores + rm -rf $datadir + mkdir -p $datadir + TEST_PREFIX=$DATA/wikipedia_en_ne_si_test_sets/wikipedia.test + python $SPM_ENCODE \ + --model ${SPM_MODEL} \ + --output_format=piece \ + --inputs ${TEST_PREFIX}.${lang:0:2}-en.${lang:0:2} ${TEST_PREFIX}.${lang:0:2}-en.en \ + --outputs $datadir/test.bpe.${lang}-en_XX.${lang} $datadir/test.bpe.${lang}-en_XX.en_XX + + # binarize data + fairseq-preprocess \ + --source-lang ${lang} --target-lang en_XX \ + --testpref $datadir/test.bpe.${lang}-en_XX \ + --destdir $datadir \ + --srcdict ${DICT} \ + --joined-dictionary \ + --workers 4 +done diff --git a/SpeechT5/fairseq/examples/criss/download_and_preprocess_tatoeba.sh b/SpeechT5/fairseq/examples/criss/download_and_preprocess_tatoeba.sh new file mode 100644 index 0000000000000000000000000000000000000000..7ed64f017d5e62695ba73745c840507b994abc0f --- /dev/null +++ b/SpeechT5/fairseq/examples/criss/download_and_preprocess_tatoeba.sh @@ -0,0 +1,46 @@ +#!/bin/bash +# Copyright (c) Facebook, Inc. and its affiliates. +# All rights reserved. +# +# This source code is licensed under the license found in the +# LICENSE file in the root directory of this source tree. 
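+# Encode the Tatoeba v1 test sets (taken from the LASER repository) with the
+# CRISS SentencePiece model and binarize each language pair against en_XX.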
+ +SPM_ENCODE=flores/scripts/spm_encode.py +DATA=data_tmp +SPM_MODEL=criss_checkpoints/sentence.bpe.model +DICT=criss_checkpoints/dict.txt + +if [[ -f flores ]]; then + echo "flores already cloned" +else + git clone https://github.com/facebookresearch/flores +fi +if [[ -f LASER ]]; then + echo "LASER already cloned" +else + git clone https://github.com/facebookresearch/LASER +fi +mkdir -p data_tmp +declare -A lang_tatoeba_map=( ["ar_AR"]="ara" ["de_DE"]="deu" ["es_XX"]="spa" ["et_EE"]="est" ["fi_FI"]="fin" ["fr_XX"]="fra" ["hi_IN"]="hin" ["it_IT"]="ita" ["ja_XX"]="jpn" ["ko_KR"]="kor" ["kk_KZ"]="kaz" ["nl_XX"]="nld" ["ru_RU"]="rus" ["tr_TR"]="tur" ["vi_VN"]="vie" ["zh_CN"]="cmn") +for lang in ar_AR de_DE es_XX et_EE fi_FI fr_XX hi_IN it_IT ja_XX kk_KZ ko_KR nl_XX ru_RU tr_TR vi_VN zh_CN; do + lang_tatoeba=${lang_tatoeba_map[$lang]} + echo $lang_tatoeba + datadir=$DATA/${lang}-en_XX-tatoeba + rm -rf $datadir + mkdir -p $datadir + TEST_PREFIX=LASER/data/tatoeba/v1/tatoeba + python $SPM_ENCODE \ + --model ${SPM_MODEL} \ + --output_format=piece \ + --inputs ${TEST_PREFIX}.${lang_tatoeba}-eng.${lang_tatoeba} ${TEST_PREFIX}.${lang_tatoeba}-eng.eng \ + --outputs $datadir/test.bpe.${lang}-en_XX.${lang} $datadir/test.bpe.${lang}-en_XX.en_XX + + # binarize data + fairseq-preprocess \ + --source-lang ${lang} --target-lang en_XX \ + --testpref $datadir/test.bpe.${lang}-en_XX \ + --destdir $datadir \ + --srcdict ${DICT} \ + --joined-dictionary \ + --workers 4 +done diff --git a/SpeechT5/fairseq/examples/criss/mining/mine.py b/SpeechT5/fairseq/examples/criss/mining/mine.py new file mode 100644 index 0000000000000000000000000000000000000000..c872da196fe0df776622365748ad7963fee1f0a0 --- /dev/null +++ b/SpeechT5/fairseq/examples/criss/mining/mine.py @@ -0,0 +1,240 @@ +#!/usr/bin/env python3 -u +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. 
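+# Mine pseudo-parallel bitext from saved encoder outputs: run sharded k-NN
+# search with FAISS in both directions, score candidates with a ratio margin
+# over the forward/backward neighbourhood means, and greedily keep pairs above
+# --threshold (padding to at least --min-count), writing train/valid splits.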
+import argparse +import glob +from subprocess import check_call + +try: + import faiss + + has_faiss = True +except ImportError: + has_faiss = False +import numpy as np + + +GB = 1024 * 1024 * 1024 + + +def call(cmd): + print(cmd) + check_call(cmd, shell=True) + + +def get_batches(directory, lang, prefix="all_avg_pool"): + print(f"Finding in {directory}/{prefix}.{lang}*") + files = glob.glob(f"{directory}/{prefix}.{lang}*") + emb_files = [] + txt_files = [] + for emb_fi in files: + emb_files.append(emb_fi) + txt_fi = emb_fi.replace(prefix, "sentences") + txt_files.append(txt_fi) + return emb_files, txt_files + + +def load_batch(emb_file, dim): + embeddings = np.fromfile(emb_file, dtype=np.float32) + num_rows = int(embeddings.shape[0] / dim) + embeddings = embeddings.reshape((num_rows, dim)) + faiss.normalize_L2(embeddings) + return embeddings + + +def knnGPU_sharded(x_batches_f, y_batches_f, dim, k, direction="x2y"): + if not has_faiss: + raise ImportError("Please install Faiss") + sims = [] + inds = [] + xfrom = 0 + xto = 0 + for x_batch_f in x_batches_f: + yfrom = 0 + yto = 0 + x_batch = load_batch(x_batch_f, dim) + xto = xfrom + x_batch.shape[0] + bsims, binds = [], [] + for y_batch_f in y_batches_f: + y_batch = load_batch(y_batch_f, dim) + neighbor_size = min(k, y_batch.shape[0]) + yto = yfrom + y_batch.shape[0] + print("{}-{} -> {}-{}".format(xfrom, xto, yfrom, yto)) + idx = faiss.IndexFlatIP(dim) + idx = faiss.index_cpu_to_all_gpus(idx) + idx.add(y_batch) + bsim, bind = idx.search(x_batch, neighbor_size) + + bsims.append(bsim) + binds.append(bind + yfrom) + yfrom += y_batch.shape[0] + del idx + del y_batch + bsims = np.concatenate(bsims, axis=1) + binds = np.concatenate(binds, axis=1) + aux = np.argsort(-bsims, axis=1) + sim_batch = np.zeros((x_batch.shape[0], k), dtype=np.float32) + ind_batch = np.zeros((x_batch.shape[0], k), dtype=np.int64) + for i in range(x_batch.shape[0]): + for j in range(k): + sim_batch[i, j] = bsims[i, aux[i, j]] + ind_batch[i, j] = binds[i, aux[i, j]] + sims.append(sim_batch) + inds.append(ind_batch) + xfrom += x_batch.shape[0] + del x_batch + sim = np.concatenate(sims, axis=0) + ind = np.concatenate(inds, axis=0) + return sim, ind + + +def score(sim, fwd_mean, bwd_mean, margin): + return margin(sim, (fwd_mean + bwd_mean) / 2) + + +def score_candidates( + sim_mat, candidate_inds, fwd_mean, bwd_mean, margin, verbose=False +): + print(" - scoring {:d} candidates".format(sim_mat.shape[0])) + scores = np.zeros(candidate_inds.shape) + for i in range(scores.shape[0]): + for j in range(scores.shape[1]): + k = int(candidate_inds[i, j]) + scores[i, j] = score(sim_mat[i, j], fwd_mean[i], bwd_mean[k], margin) + return scores + + +def load_text(files): + all_sentences = [] + for fi in files: + with open(fi) as sentence_fi: + for line in sentence_fi: + all_sentences.append(line.strip()) + print(f"Read {len(all_sentences)} sentences") + return all_sentences + + +if __name__ == "__main__": + parser = argparse.ArgumentParser(description="Mine bitext") + parser.add_argument("--src-lang", help="Source language") + parser.add_argument("--tgt-lang", help="Target language") + parser.add_argument( + "--dict-path", help="Path to dictionary file", default="dict.txt" + ) + parser.add_argument( + "--spm-path", help="Path to SPM model file", default="sentence.bpe.model" + ) + parser.add_argument("--dim", type=int, default=1024, help="Embedding dimension") + parser.add_argument("--mem", type=int, default=5, help="Memory in GB") + parser.add_argument("--src-dir", help="Source 
directory") + parser.add_argument("--tgt-dir", help="Target directory") + parser.add_argument("--output", help="Output path") + parser.add_argument( + "--neighborhood", type=int, default=4, help="Embedding dimension" + ) + parser.add_argument( + "--threshold", type=float, default=1.06, help="Threshold on mined bitext" + ) + parser.add_argument( + "--valid-size", + type=int, + default=2000, + help="Number of sentences used for validation set", + ) + parser.add_argument( + "--min-count", + type=int, + default=50000, + help="Min num sentences used for each language", + ) + args = parser.parse_args() + + x_batches_f, x_sents_f = get_batches(args.src_dir, args.src_lang) + y_batches_f, y_sents_f = get_batches(args.tgt_dir, args.tgt_lang) + margin = lambda a, b: a / b + y2x_sim, y2x_ind = knnGPU_sharded( + y_batches_f, x_batches_f, args.dim, args.neighborhood, direction="y2x" + ) + x2y_sim, x2y_ind = knnGPU_sharded( + x_batches_f, y_batches_f, args.dim, args.neighborhood, direction="x2y" + ) + + x2y_mean = x2y_sim.mean(axis=1) + y2x_mean = y2x_sim.mean(axis=1) + fwd_scores = score_candidates(x2y_sim, x2y_ind, x2y_mean, y2x_mean, margin) + bwd_scores = score_candidates(y2x_sim, y2x_ind, y2x_mean, x2y_mean, margin) + fwd_best = x2y_ind[np.arange(x2y_sim.shape[0]), fwd_scores.argmax(axis=1)] + bwd_best = y2x_ind[np.arange(y2x_sim.shape[0]), bwd_scores.argmax(axis=1)] + indices = np.stack( + ( + np.concatenate((np.arange(x2y_ind.shape[0]), bwd_best)), + np.concatenate((fwd_best, np.arange(y2x_ind.shape[0]))), + ), + axis=1, + ) + scores = np.concatenate((fwd_scores.max(axis=1), bwd_scores.max(axis=1))) + + x_sentences = load_text(x_sents_f) + y_sentences = load_text(y_sents_f) + + threshold = args.threshold + min_count = args.min_count + seen_src, seen_trg = set(), set() + directory = args.output + call(f"mkdir -p {directory}") + src_out = open( + f"{directory}/all.{args.src_lang}", + mode="w", + encoding="utf-8", + errors="surrogateescape", + ) + tgt_out = open( + f"{directory}/all.{args.tgt_lang}", + mode="w", + encoding="utf-8", + errors="surrogateescape", + ) + scores_out = open( + f"{directory}/all.scores", mode="w", encoding="utf-8", errors="surrogateescape" + ) + count = 0 + for i in np.argsort(-scores): + src_ind, trg_ind = indices[i] + if src_ind not in seen_src and trg_ind not in seen_trg: + seen_src.add(src_ind) + seen_trg.add(trg_ind) + if scores[i] > threshold or count < min_count: + if x_sentences[src_ind]: + print(scores[i], file=scores_out) + print(x_sentences[src_ind], file=src_out) + print(y_sentences[trg_ind], file=tgt_out) + count += 1 + else: + print(f"Ignoring sentence: {x_sentences[src_ind]}") + src_out.close() + tgt_out.close() + scores_out.close() + + print(f"Found {count} pairs for threshold={threshold}") + with open(f"{directory}/all.{args.src_lang}") as all_s, open( + f"{directory}/all.{args.tgt_lang}" + ) as all_t, open(f"{directory}/valid.{args.src_lang}", "w") as valid_s, open( + f"{directory}/valid.{args.tgt_lang}", "w" + ) as valid_t, open( + f"{directory}/train.{args.src_lang}", "w" + ) as train_s, open( + f"{directory}/train.{args.tgt_lang}", "w" + ) as train_t: + count = 0 + for s_line, t_line in zip(all_s, all_t): + s_line = s_line.split("\t")[1] + t_line = t_line.split("\t")[1] + if count >= args.valid_size: + train_s.write(s_line) + train_t.write(t_line) + else: + valid_s.write(s_line) + valid_t.write(t_line) + count += 1 diff --git a/SpeechT5/fairseq/examples/criss/mining/mine_example.sh b/SpeechT5/fairseq/examples/criss/mining/mine_example.sh new file mode 
100644 index 0000000000000000000000000000000000000000..ace995ac44665f99d904b6a89d7fbbce24103afe --- /dev/null +++ b/SpeechT5/fairseq/examples/criss/mining/mine_example.sh @@ -0,0 +1,103 @@ +#!/bin/bash +# Copyright (c) Facebook, Inc. and its affiliates. +# All rights reserved. +# +# This source code is licensed under the license found in the +# LICENSE file in the root directory of this source tree. +# +source_lang=kk_KZ +target_lang=en_XX +MODEL=criss_checkpoints/criss.3rd.pt +SPM=criss_checkpoints/sentence.bpe.model +SPLIT=test +LANG_DICT=criss_checkpoints/lang_dict.txt +SPM_ENCODE=flores/scripts/spm_encode.py +SAVE_ENCODER=save_encoder.py +ENCODER_SAVE_ROOT=sentence_embeddings/$MODEL +DICT=criss_checkpoints/dict.txt +THRESHOLD=1.02 +MIN_COUNT=500 + +DATA_DIR=data_tmp +SAVE_DIR=mining/${source_lang}_${target_lang}_mined +ENCODER_SAVE_DIR=${ENCODER_SAVE_ROOT}/${source_lang}-${target_lang} +INPUT_DIR=$DATA_DIR/${source_lang}-${target_lang}-tatoeba + +mkdir -p $ENCODER_SAVE_DIR/${target_lang} +mkdir -p $ENCODER_SAVE_DIR/${source_lang} +mkdir -p $SAVE_DIR + +## Save encoder outputs + +# Save encoder outputs for source sentences +python $SAVE_ENCODER \ + ${INPUT_DIR} \ + --path ${MODEL} \ + --task translation_multi_simple_epoch \ + --lang-pairs ${source_lang}-${target_lang} \ + --lang-dict ${LANG_DICT} \ + --gen-subset ${SPLIT} \ + --bpe 'sentencepiece' \ + -s ${source_lang} -t ${target_lang} \ + --sentencepiece-model ${SPM} \ + --remove-bpe 'sentencepiece' \ + --beam 1 \ + --lang-tok-style mbart \ + --encoder-save-dir ${ENCODER_SAVE_DIR}/${source_lang} + +## Save encoder outputs for target sentences +python $SAVE_ENCODER \ + ${INPUT_DIR} \ + --path ${MODEL} \ + --lang-pairs ${source_lang}-${target_lang} \ + --lang-dict ${LANG_DICT} \ + --task translation_multi_simple_epoch \ + --gen-subset ${SPLIT} \ + --bpe 'sentencepiece' \ + -t ${source_lang} -s ${target_lang} \ + --sentencepiece-model ${SPM} \ + --remove-bpe 'sentencepiece' \ + --beam 1 \ + --lang-tok-style mbart \ + --encoder-save-dir ${ENCODER_SAVE_DIR}/${target_lang} + +## Mining +python mining/mine.py \ + --src-lang ${source_lang} \ + --tgt-lang ${target_lang} \ + --dim 1024 \ + --mem 10 \ + --neighborhood 4 \ + --src-dir ${ENCODER_SAVE_DIR}/${source_lang} \ + --tgt-dir ${ENCODER_SAVE_DIR}/${target_lang} \ + --output $SAVE_DIR \ + --threshold ${THRESHOLD} \ + --min-count ${MIN_COUNT} \ + --valid-size 100 \ + --dict-path ${DICT} \ + --spm-path ${SPM} \ + + +## Process and binarize mined data +python $SPM_ENCODE \ + --model ${SPM} \ + --output_format=piece \ + --inputs mining/${source_lang}_${target_lang}_mined/train.${source_lang} mining/${source_lang}_${target_lang}_mined/train.${target_lang} \ + --outputs mining/${source_lang}_${target_lang}_mined/train.bpe.${source_lang} mining/${source_lang}_${target_lang}_mined/train.bpe.${target_lang} + +python $SPM_ENCODE \ + --model ${SPM} \ + --output_format=piece \ + --inputs mining/${source_lang}_${target_lang}_mined/valid.${source_lang} mining/${source_lang}_${target_lang}_mined/valid.${target_lang} \ + --outputs mining/${source_lang}_${target_lang}_mined/valid.bpe.${source_lang} mining/${source_lang}_${target_lang}_mined/valid.bpe.${target_lang} + + +fairseq-preprocess \ + --source-lang ${source_lang} \ + --target-lang ${target_lang} \ + --trainpref mining/${source_lang}_${target_lang}_mined/train.bpe \ + --validpref mining/${source_lang}_${target_lang}_mined/valid.bpe \ + --destdir mining/${source_lang}_${target_lang}_mined \ + --srcdict ${DICT} \ + --joined-dictionary \ + --workers 8 
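For reference, the candidate scoring that `mine.py` performs after the sharded k-NN search is a ratio margin over the forward and backward neighbourhood means. Below is a minimal NumPy sketch of that computation; the function name and toy inputs are illustrative only and are not part of the scripts above.

```python
import numpy as np

def ratio_margin_scores(x2y_sim, x2y_ind, y2x_sim):
    """Score forward k-NN candidates the way mine.py does (margin = a / b)."""
    fwd_mean = x2y_sim.mean(axis=1)   # mean similarity of each source to its k targets
    bwd_mean = y2x_sim.mean(axis=1)   # mean similarity of each target to its k sources
    scores = np.zeros_like(x2y_sim)
    for i in range(x2y_sim.shape[0]):
        for j in range(x2y_sim.shape[1]):
            k = int(x2y_ind[i, j])    # index of the j-th candidate target
            scores[i, j] = x2y_sim[i, j] / ((fwd_mean[i] + bwd_mean[k]) / 2)
    return scores

# Toy example: 3 source and 3 target sentences, 2 neighbours per direction.
rng = np.random.default_rng(0)
x2y_sim = rng.random((3, 2))               # forward similarities
x2y_ind = rng.integers(0, 3, size=(3, 2))  # forward neighbour indices
y2x_sim = rng.random((3, 2))               # backward similarities
print(ratio_margin_scores(x2y_sim, x2y_ind, y2x_sim))
```

Candidates are then kept greedily in descending score order, which is what the `--threshold` and `--min-count` arguments of `mine.py` control.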
diff --git a/SpeechT5/fairseq/examples/criss/save_encoder.py b/SpeechT5/fairseq/examples/criss/save_encoder.py new file mode 100644 index 0000000000000000000000000000000000000000..d911d066e359f5ce64aa4292d812d6e52fd3cc9b --- /dev/null +++ b/SpeechT5/fairseq/examples/criss/save_encoder.py @@ -0,0 +1,213 @@ +#!/usr/bin/env python3 -u +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. +""" +Translate pre-processed data with a trained model. +""" + +import numpy as np +import torch +from fairseq import checkpoint_utils, options, progress_bar, tasks, utils +from fairseq.sequence_generator import EnsembleModel + + +def get_avg_pool( + models, sample, prefix_tokens, src_dict, remove_bpe, has_langtok=False +): + model = EnsembleModel(models) + + # model.forward normally channels prev_output_tokens into the decoder + # separately, but SequenceGenerator directly calls model.encoder + encoder_input = { + k: v for k, v in sample["net_input"].items() if k != "prev_output_tokens" + } + + # compute the encoder output for each beam + encoder_outs = model.forward_encoder(encoder_input) + np_encoder_outs = encoder_outs[0].encoder_out.cpu().numpy().astype(np.float32) + encoder_mask = 1 - encoder_outs[0].encoder_padding_mask.cpu().numpy().astype( + np.float32 + ) + encoder_mask = np.expand_dims(encoder_mask.T, axis=2) + if has_langtok: + encoder_mask = encoder_mask[1:, :, :] + np_encoder_outs = np_encoder_outs[1, :, :] + masked_encoder_outs = encoder_mask * np_encoder_outs + avg_pool = (masked_encoder_outs / encoder_mask.sum(axis=0)).sum(axis=0) + return avg_pool + + +def main(args): + assert args.path is not None, "--path required for generation!" 
+ assert ( + not args.sampling or args.nbest == args.beam + ), "--sampling requires --nbest to be equal to --beam" + assert ( + args.replace_unk is None or args.raw_text + ), "--replace-unk requires a raw text dataset (--raw-text)" + + args.beam = 1 + utils.import_user_module(args) + + if args.max_tokens is None: + args.max_tokens = 12000 + print(args) + use_cuda = torch.cuda.is_available() and not args.cpu + + # Load dataset splits + task = tasks.setup_task(args) + task.load_dataset(args.gen_subset) + + # Set dictionaries + try: + src_dict = getattr(task, "source_dictionary", None) + except NotImplementedError: + src_dict = None + tgt_dict = task.target_dictionary + + # Load ensemble + print("| loading model(s) from {}".format(args.path)) + models, _model_args = checkpoint_utils.load_model_ensemble( + args.path.split(":"), + arg_overrides=eval(args.model_overrides), + task=task, + ) + + # Optimize ensemble for generation + for model in models: + model.make_generation_fast_( + beamable_mm_beam_size=None if args.no_beamable_mm else args.beam, + need_attn=args.print_alignment, + ) + if args.fp16: + model.half() + if use_cuda: + model.cuda() + + # Load alignment dictionary for unknown word replacement + # (None if no unknown word replacement, empty if no path to align dictionary) + align_dict = utils.load_align_dict(args.replace_unk) + + # Load dataset (possibly sharded) + itr = task.get_batch_iterator( + dataset=task.dataset(args.gen_subset), + max_tokens=args.max_tokens, + max_positions=utils.resolve_max_positions( + task.max_positions(), + ), + ignore_invalid_inputs=args.skip_invalid_size_inputs_valid_test, + required_batch_size_multiple=args.required_batch_size_multiple, + num_shards=args.num_shards, + shard_id=args.shard_id, + num_workers=args.num_workers, + ).next_epoch_itr(shuffle=False) + + num_sentences = 0 + source_sentences = [] + shard_id = 0 + all_avg_pool = None + encoder_has_langtok = ( + hasattr(task.args, "encoder_langtok") + and task.args.encoder_langtok is not None + and hasattr(task.args, "lang_tok_replacing_bos_eos") + and not task.args.lang_tok_replacing_bos_eos + ) + with progress_bar.build_progress_bar(args, itr) as t: + for sample in t: + if sample is None: + print("Skipping None") + continue + sample = utils.move_to_cuda(sample) if use_cuda else sample + if "net_input" not in sample: + continue + + prefix_tokens = None + if args.prefix_size > 0: + prefix_tokens = sample["target"][:, : args.prefix_size] + + with torch.no_grad(): + avg_pool = get_avg_pool( + models, + sample, + prefix_tokens, + src_dict, + args.post_process, + has_langtok=encoder_has_langtok, + ) + if all_avg_pool is not None: + all_avg_pool = np.concatenate((all_avg_pool, avg_pool)) + else: + all_avg_pool = avg_pool + + if not isinstance(sample["id"], list): + sample_ids = sample["id"].tolist() + else: + sample_ids = sample["id"] + for i, sample_id in enumerate(sample_ids): + # Remove padding + src_tokens = utils.strip_pad( + sample["net_input"]["src_tokens"][i, :], tgt_dict.pad() + ) + + # Either retrieve the original sentences or regenerate them from tokens. 
+ if align_dict is not None: + src_str = task.dataset(args.gen_subset).src.get_original_text( + sample_id + ) + else: + if src_dict is not None: + src_str = src_dict.string(src_tokens, args.post_process) + else: + src_str = "" + + if not args.quiet: + if src_dict is not None: + print("S-{}\t{}".format(sample_id, src_str)) + + source_sentences.append(f"{sample_id}\t{src_str}") + + num_sentences += sample["nsentences"] + if all_avg_pool.shape[0] >= 1000000: + with open( + f"{args.encoder_save_dir}/all_avg_pool.{args.source_lang}.{shard_id}", + "w", + ) as avg_pool_file: + all_avg_pool.tofile(avg_pool_file) + with open( + f"{args.encoder_save_dir}/sentences.{args.source_lang}.{shard_id}", + "w", + ) as sentence_file: + sentence_file.writelines(f"{line}\n" for line in source_sentences) + all_avg_pool = None + source_sentences = [] + shard_id += 1 + + if all_avg_pool is not None: + with open( + f"{args.encoder_save_dir}/all_avg_pool.{args.source_lang}.{shard_id}", "w" + ) as avg_pool_file: + all_avg_pool.tofile(avg_pool_file) + with open( + f"{args.encoder_save_dir}/sentences.{args.source_lang}.{shard_id}", "w" + ) as sentence_file: + sentence_file.writelines(f"{line}\n" for line in source_sentences) + return None + + +def cli_main(): + parser = options.get_generation_parser() + parser.add_argument( + "--encoder-save-dir", + default="", + type=str, + metavar="N", + help="directory to save encoder outputs", + ) + args = options.parse_args_and_arch(parser) + main(args) + + +if __name__ == "__main__": + cli_main() diff --git a/SpeechT5/fairseq/examples/criss/sentence_retrieval/encoder_analysis.py b/SpeechT5/fairseq/examples/criss/sentence_retrieval/encoder_analysis.py new file mode 100644 index 0000000000000000000000000000000000000000..b41bfbe38789ba14e6a5ea938c75d761424c00ab --- /dev/null +++ b/SpeechT5/fairseq/examples/criss/sentence_retrieval/encoder_analysis.py @@ -0,0 +1,92 @@ +#!/usr/bin/env python3 -u +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. 
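+# Compute sentence-retrieval accuracy from saved encoder outputs: embeddings
+# are L2-normalised, compared by dot product across languages, and a query
+# counts as correct when its top-1 neighbour shares the same sentence id.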
+import argparse +import glob + +import numpy as np + + +DIM = 1024 + + +def compute_dist(source_embs, target_embs, k=5, return_sim_mat=False): + target_ids = [tid for tid in target_embs] + source_mat = np.stack(source_embs.values(), axis=0) + normalized_source_mat = source_mat / np.linalg.norm( + source_mat, axis=1, keepdims=True + ) + target_mat = np.stack(target_embs.values(), axis=0) + normalized_target_mat = target_mat / np.linalg.norm( + target_mat, axis=1, keepdims=True + ) + sim_mat = normalized_source_mat.dot(normalized_target_mat.T) + if return_sim_mat: + return sim_mat + neighbors_map = {} + for i, sentence_id in enumerate(source_embs): + idx = np.argsort(sim_mat[i, :])[::-1][:k] + neighbors_map[sentence_id] = [target_ids[tid] for tid in idx] + return neighbors_map + + +def load_embeddings(directory, LANGS): + sentence_embeddings = {} + sentence_texts = {} + for lang in LANGS: + sentence_embeddings[lang] = {} + sentence_texts[lang] = {} + lang_dir = f"{directory}/{lang}" + embedding_files = glob.glob(f"{lang_dir}/all_avg_pool.{lang}.*") + for embed_file in embedding_files: + shard_id = embed_file.split(".")[-1] + embeddings = np.fromfile(embed_file, dtype=np.float32) + num_rows = embeddings.shape[0] // DIM + embeddings = embeddings.reshape((num_rows, DIM)) + + with open(f"{lang_dir}/sentences.{lang}.{shard_id}") as sentence_file: + for idx, line in enumerate(sentence_file): + sentence_id, sentence = line.strip().split("\t") + sentence_texts[lang][sentence_id] = sentence + sentence_embeddings[lang][sentence_id] = embeddings[idx, :] + + return sentence_embeddings, sentence_texts + + +def compute_accuracy(directory, LANGS): + sentence_embeddings, sentence_texts = load_embeddings(directory, LANGS) + + top_1_accuracy = {} + + top1_str = " ".join(LANGS) + "\n" + for source_lang in LANGS: + top_1_accuracy[source_lang] = {} + top1_str += f"{source_lang} " + for target_lang in LANGS: + top1 = 0 + top5 = 0 + neighbors_map = compute_dist( + sentence_embeddings[source_lang], sentence_embeddings[target_lang] + ) + for sentence_id, neighbors in neighbors_map.items(): + if sentence_id == neighbors[0]: + top1 += 1 + if sentence_id in neighbors[:5]: + top5 += 1 + n = len(sentence_embeddings[target_lang]) + top1_str += f"{top1/n} " + top1_str += "\n" + + print(top1_str) + print(top1_str, file=open(f"{directory}/accuracy", "w")) + + +if __name__ == "__main__": + parser = argparse.ArgumentParser(description="Analyze encoder outputs") + parser.add_argument("directory", help="Source language corpus") + parser.add_argument("--langs", help="List of langs") + args = parser.parse_args() + langs = args.langs.split(",") + compute_accuracy(args.directory, langs) diff --git a/SpeechT5/fairseq/examples/criss/sentence_retrieval/sentence_retrieval_tatoeba.sh b/SpeechT5/fairseq/examples/criss/sentence_retrieval/sentence_retrieval_tatoeba.sh new file mode 100644 index 0000000000000000000000000000000000000000..0428d8bef9d426ac3e664cd281ce0b688f5f580f --- /dev/null +++ b/SpeechT5/fairseq/examples/criss/sentence_retrieval/sentence_retrieval_tatoeba.sh @@ -0,0 +1,59 @@ +#!/bin/bash +# Copyright (c) Facebook, Inc. and its affiliates. +# All rights reserved. +# +# This source code is licensed under the license found in the +# LICENSE file in the root directory of this source tree. 
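+# Save mean-pooled encoder outputs for both sides of the Tatoeba kk_KZ-en_XX
+# test set, then run encoder_analysis.py to report retrieval accuracy.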
+# +source_lang=kk_KZ +target_lang=en_XX +MODEL=criss_checkpoints/criss.3rd.pt +SPM=criss_checkpoints/sentence.bpe.model +SPLIT=test +LANG_DICT=criss_checkpoints/lang_dict.txt +ENCODER_ANALYSIS=sentence_retrieval/encoder_analysis.py +SAVE_ENCODER=save_encoder.py +ENCODER_SAVE_ROOT=sentence_embeddings/$MODEL + + + +DATA_DIR=data_tmp +INPUT_DIR=$DATA_DIR/${source_lang}-${target_lang}-tatoeba +ENCODER_SAVE_DIR=${ENCODER_SAVE_ROOT}/${source_lang}-${target_lang} +mkdir -p $ENCODER_SAVE_DIR/${target_lang} +mkdir -p $ENCODER_SAVE_DIR/${source_lang} + +# Save encoder outputs for source sentences +python $SAVE_ENCODER \ + ${INPUT_DIR} \ + --path ${MODEL} \ + --task translation_multi_simple_epoch \ + --lang-dict ${LANG_DICT} \ + --gen-subset ${SPLIT} \ + --bpe 'sentencepiece' \ + --lang-pairs ${source_lang}-${target_lang} \ + -s ${source_lang} -t ${target_lang} \ + --sentencepiece-model ${SPM} \ + --remove-bpe 'sentencepiece' \ + --beam 1 \ + --lang-tok-style mbart \ + --encoder-save-dir ${ENCODER_SAVE_DIR}/${source_lang} + +# Save encoder outputs for target sentences +python $SAVE_ENCODER \ + ${INPUT_DIR} \ + --path ${MODEL} \ + --lang-dict ${LANG_DICT} \ + --task translation_multi_simple_epoch \ + --gen-subset ${SPLIT} \ + --bpe 'sentencepiece' \ + --lang-pairs ${target_lang}-${source_lang} \ + -t ${source_lang} -s ${target_lang} \ + --sentencepiece-model ${SPM} \ + --remove-bpe 'sentencepiece' \ + --beam 1 \ + --lang-tok-style mbart \ + --encoder-save-dir ${ENCODER_SAVE_DIR}/${target_lang} + +# Analyze sentence retrieval accuracy +python $ENCODER_ANALYSIS --langs "${source_lang},${target_lang}" ${ENCODER_SAVE_DIR} diff --git a/SpeechT5/fairseq/examples/criss/unsupervised_mt/eval.sh b/SpeechT5/fairseq/examples/criss/unsupervised_mt/eval.sh new file mode 100644 index 0000000000000000000000000000000000000000..03b773ed5a522eb82186fea8ffbb6c557e14b6d3 --- /dev/null +++ b/SpeechT5/fairseq/examples/criss/unsupervised_mt/eval.sh @@ -0,0 +1,37 @@ +#!/bin/bash +# Copyright (c) Facebook, Inc. and its affiliates. +# All rights reserved. +# +# This source code is licensed under the license found in the +# LICENSE file in the root directory of this source tree. +# +SRC=si_LK +TGT=en_XX +MODEL=criss_checkpoints/criss.3rd.pt + +MULTIBLEU=mosesdecoder/scripts/generic/multi-bleu.perl +MOSES=mosesdecoder +REPLACE_UNICODE_PUNCT=$MOSES/scripts/tokenizer/replace-unicode-punctuation.perl +NORM_PUNC=$MOSES/scripts/tokenizer/normalize-punctuation.perl +REM_NON_PRINT_CHAR=$MOSES/scripts/tokenizer/remove-non-printing-char.perl +TOKENIZER=$MOSES/scripts/tokenizer/tokenizer.perl +GEN_TMP_DIR=gen_tmp +LANG_DICT=criss_checkpoints/lang_dict.txt + +if [ ! 
-d "mosesdecoder" ]; then + git clone https://github.com/moses-smt/mosesdecoder +fi +mkdir -p $GEN_TMP_DIR +fairseq-generate data_tmp/${SRC}-${TGT}-flores \ + --task translation_multi_simple_epoch \ + --max-tokens 2000 \ + --path ${MODEL} \ + --skip-invalid-size-inputs-valid-test \ + --beam 5 --lenpen 1.0 --gen-subset test \ + --remove-bpe=sentencepiece \ + --source-lang ${SRC} --target-lang ${TGT} \ + --decoder-langtok --lang-pairs 'en_XX-ar_AR,en_XX-de_DE,en_XX-es_XX,en_XX-fr_XX,en_XX-hi_IN,en_XX-it_IT,en_XX-ja_XX,en_XX-ko_KR,en_XX-nl_XX,en_XX-ru_RU,en_XX-zh_CN,en_XX-tr_TR,en_XX-vi_VN,en_XX-ro_RO,en_XX-my_MM,en_XX-ne_NP,en_XX-si_LK,en_XX-cs_CZ,en_XX-lt_LT,en_XX-kk_KZ,en_XX-gu_IN,en_XX-fi_FI,en_XX-et_EE,en_XX-lv_LV,ar_AR-en_XX,cs_CZ-en_XX,de_DE-en_XX,es_XX-en_XX,et_EE-en_XX,fi_FI-en_XX,fr_XX-en_XX,gu_IN-en_XX,hi_IN-en_XX,it_IT-en_XX,ja_XX-en_XX,kk_KZ-en_XX,ko_KR-en_XX,lt_LT-en_XX,lv_LV-en_XX,my_MM-en_XX,ne_NP-en_XX,nl_XX-en_XX,ro_RO-en_XX,ru_RU-en_XX,si_LK-en_XX,tr_TR-en_XX,vi_VN-en_XX,zh_CN-en_XX,ar_AR-es_XX,es_XX-ar_AR,ar_AR-hi_IN,hi_IN-ar_AR,ar_AR-zh_CN,zh_CN-ar_AR,cs_CZ-es_XX,es_XX-cs_CZ,cs_CZ-hi_IN,hi_IN-cs_CZ,cs_CZ-zh_CN,zh_CN-cs_CZ,de_DE-es_XX,es_XX-de_DE,de_DE-hi_IN,hi_IN-de_DE,de_DE-zh_CN,zh_CN-de_DE,es_XX-hi_IN,hi_IN-es_XX,es_XX-zh_CN,zh_CN-es_XX,et_EE-es_XX,es_XX-et_EE,et_EE-hi_IN,hi_IN-et_EE,et_EE-zh_CN,zh_CN-et_EE,fi_FI-es_XX,es_XX-fi_FI,fi_FI-hi_IN,hi_IN-fi_FI,fi_FI-zh_CN,zh_CN-fi_FI,fr_XX-es_XX,es_XX-fr_XX,fr_XX-hi_IN,hi_IN-fr_XX,fr_XX-zh_CN,zh_CN-fr_XX,gu_IN-es_XX,es_XX-gu_IN,gu_IN-hi_IN,hi_IN-gu_IN,gu_IN-zh_CN,zh_CN-gu_IN,hi_IN-zh_CN,zh_CN-hi_IN,it_IT-es_XX,es_XX-it_IT,it_IT-hi_IN,hi_IN-it_IT,it_IT-zh_CN,zh_CN-it_IT,ja_XX-es_XX,es_XX-ja_XX,ja_XX-hi_IN,hi_IN-ja_XX,ja_XX-zh_CN,zh_CN-ja_XX,kk_KZ-es_XX,es_XX-kk_KZ,kk_KZ-hi_IN,hi_IN-kk_KZ,kk_KZ-zh_CN,zh_CN-kk_KZ,ko_KR-es_XX,es_XX-ko_KR,ko_KR-hi_IN,hi_IN-ko_KR,ko_KR-zh_CN,zh_CN-ko_KR,lt_LT-es_XX,es_XX-lt_LT,lt_LT-hi_IN,hi_IN-lt_LT,lt_LT-zh_CN,zh_CN-lt_LT,lv_LV-es_XX,es_XX-lv_LV,lv_LV-hi_IN,hi_IN-lv_LV,lv_LV-zh_CN,zh_CN-lv_LV,my_MM-es_XX,es_XX-my_MM,my_MM-hi_IN,hi_IN-my_MM,my_MM-zh_CN,zh_CN-my_MM,ne_NP-es_XX,es_XX-ne_NP,ne_NP-hi_IN,hi_IN-ne_NP,ne_NP-zh_CN,zh_CN-ne_NP,nl_XX-es_XX,es_XX-nl_XX,nl_XX-hi_IN,hi_IN-nl_XX,nl_XX-zh_CN,zh_CN-nl_XX,ro_RO-es_XX,es_XX-ro_RO,ro_RO-hi_IN,hi_IN-ro_RO,ro_RO-zh_CN,zh_CN-ro_RO,ru_RU-es_XX,es_XX-ru_RU,ru_RU-hi_IN,hi_IN-ru_RU,ru_RU-zh_CN,zh_CN-ru_RU,si_LK-es_XX,es_XX-si_LK,si_LK-hi_IN,hi_IN-si_LK,si_LK-zh_CN,zh_CN-si_LK,tr_TR-es_XX,es_XX-tr_TR,tr_TR-hi_IN,hi_IN-tr_TR,tr_TR-zh_CN,zh_CN-tr_TR,vi_VN-es_XX,es_XX-vi_VN,vi_VN-hi_IN,hi_IN-vi_VN,vi_VN-zh_CN,zh_CN-vi_VN' \ + --lang-dict ${LANG_DICT} --lang-tok-style 'mbart' --sampling-method 'temperature' --sampling-temperature '1.0' > $GEN_TMP_DIR/${SRC}_${TGT}.gen +cat $GEN_TMP_DIR/${SRC}_${TGT}.gen | grep -P "^T-" | cut -f2 | $REPLACE_UNICODE_PUNCT | $NORM_PUNC -l ${TGT:0:2} | $REM_NON_PRINT_CHAR | $TOKENIZER -no-escape ${TGT:0:2} > $GEN_TMP_DIR/${SRC}_${TGT}.hyp +cat $GEN_TMP_DIR/${SRC}_${TGT}.gen | grep -P "^H-" | cut -f3 | $REPLACE_UNICODE_PUNCT | $NORM_PUNC -l ${TGT:0:2} | $REM_NON_PRINT_CHAR | $TOKENIZER -no-escape ${TGT:0:2} > $GEN_TMP_DIR/${SRC}_${TGT}.ref +${MULTIBLEU} $GEN_TMP_DIR/${SRC}_${TGT}.ref < $GEN_TMP_DIR/${SRC}_${TGT}.hyp diff --git a/SpeechT5/fairseq/examples/cross_lingual_language_model/README.md b/SpeechT5/fairseq/examples/cross_lingual_language_model/README.md new file mode 100644 index 0000000000000000000000000000000000000000..af9128e39e5925e9411d162c2f24a19e4532d618 --- /dev/null +++ 
b/SpeechT5/fairseq/examples/cross_lingual_language_model/README.md @@ -0,0 +1,77 @@ +# Cross-Lingual Language Model Pre-training + +Below are some details for training Cross-Lingual Language Models (XLM) - similar to the ones presented in [Lample & Conneau, 2019](https://arxiv.org/pdf/1901.07291.pdf) - in Fairseq. The current implementation only supports the Masked Language Model (MLM) from the paper above. + +## Downloading and Tokenizing Monolingual Data + +Pointers to the monolingual data from wikipedia, used for training the XLM-style MLM model as well as details on processing (tokenization and BPE) it can be found in the [XLM Github Repository](https://github.com/facebookresearch/XLM#download--preprocess-monolingual-data). + +Let's assume the following for the code snippets in later sections to work +- Processed data is in the folder: monolingual_data/processed +- Each language has 3 files for train, test and validation. For example we have the following files for English: + train.en, valid.en +- We are training a model for 5 languages: Arabic (ar), German (de), English (en), Hindi (hi) and French (fr) +- The vocabulary file is monolingual_data/processed/vocab_mlm + + +## Fairseq Pre-processing and Binarization + +Pre-process and binarize the data with the MaskedLMDictionary and cross_lingual_lm task + +```bash +# Ensure the output directory exists +DATA_DIR=monolingual_data/fairseq_processed +mkdir -p "$DATA_DIR" + +for lg in ar de en hi fr +do + + fairseq-preprocess \ + --task cross_lingual_lm \ + --srcdict monolingual_data/processed/vocab_mlm \ + --only-source \ + --trainpref monolingual_data/processed/train \ + --validpref monolingual_data/processed/valid \ + --testpref monolingual_data/processed/test \ + --destdir monolingual_data/fairseq_processed \ + --workers 20 \ + --source-lang $lg + + # Since we only have a source language, the output file has a None for the + # target language. Remove this + + for stage in train test valid + + sudo mv "$DATA_DIR/$stage.$lg-None.$lg.bin" "$stage.$lg.bin" + sudo mv "$DATA_DIR/$stage.$lg-None.$lg.idx" "$stage.$lg.idx" + + done + +done +``` + +## Train a Cross-lingual Language Model similar to the XLM MLM model + +Use the following command to train the model on 5 languages. + +``` +fairseq-train \ +--task cross_lingual_lm monolingual_data/fairseq_processed \ +--save-dir checkpoints/mlm \ +--max-update 2400000 --save-interval 1 --no-epoch-checkpoints \ +--arch xlm_base \ +--optimizer adam --lr-scheduler reduce_lr_on_plateau \ +--lr-shrink 0.5 --lr 0.0001 --stop-min-lr 1e-09 \ +--dropout 0.1 \ +--criterion legacy_masked_lm_loss \ +--max-tokens 2048 --tokens-per-sample 256 --attention-dropout 0.1 \ +--dataset-impl lazy --seed 0 \ +--masked-lm-only \ +--monolingual-langs 'ar,de,en,hi,fr' --num-segment 5 \ +--ddp-backend=legacy_ddp +``` + +Some Notes: +- Using tokens_per_sample greater than 256 can cause OOM (out-of-memory) issues. Usually since MLM packs in streams of text, this parameter doesn't need much tuning. +- The Evaluation workflow for computing MLM Perplexity on test data is in progress. +- Finetuning this model on a downstream task is something which is not currently available. 
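To make the `tokens_per_sample` note above concrete: because the MLM task packs continuous streams of text, almost every sample is exactly `--tokens-per-sample` tokens long. The sketch below is plain illustrative Python, not fairseq code.

```python
from typing import Iterable, List

def pack_stream(token_stream: Iterable[int], tokens_per_sample: int = 256) -> List[List[int]]:
    """Greedily cut a continuous token stream into fixed-size MLM samples."""
    samples, current = [], []
    for token in token_stream:
        current.append(token)
        if len(current) == tokens_per_sample:
            samples.append(current)
            current = []
    if current:                      # trailing, possibly shorter, sample
        samples.append(current)
    return samples

# A stream of 600 token ids yields two full 256-token samples plus a remainder.
print([len(s) for s in pack_stream(range(600))])   # [256, 256, 88]
```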
diff --git a/SpeechT5/fairseq/examples/fast_noisy_channel/README.md b/SpeechT5/fairseq/examples/fast_noisy_channel/README.md new file mode 100644 index 0000000000000000000000000000000000000000..a04151a796e4e092fa3c803a1679ab521af96aeb --- /dev/null +++ b/SpeechT5/fairseq/examples/fast_noisy_channel/README.md @@ -0,0 +1,345 @@ +# Language Models not just for Pre-training: Fast Online Neural Noisy Channel Modeling + +## Introduction +- [Yee et al. (2019)](https://www.aclweb.org/anthology/D19-1571.pdf) introduce a simple and effective noisy channel modeling approach for neural machine translation. However, the noisy channel online decoding approach introduced in this paper is too slow to be practical. +- To address this, [Bhosale et al. (2020)](http://www.statmt.org/wmt20/pdf/2020.wmt-1.68.pdf) introduces 3 simple approximations to make this approach very fast and practical without much loss in accuracy. +- This README provides intructions on how to run online decoding or generation with the noisy channel modeling approach, including ways to make it very fast without much loss in accuracy. + +## Noisy Channel Modeling + +[Yee et al. (2019)](https://www.aclweb.org/anthology/D19-1571.pdf) applies the Bayes Rule to predict `P(y|x)`, the probability of the target `y` given the source `x`. +```P(y|x) = P(x|y) * P(y) / P(x)``` +- `P(x|y)` predicts the source `x` given the target `y` and is referred to as the **channel model** +- `P(y)` is a **language model** over the target `y` +- `P(x)` is generally not modeled since it is constant for all `y`. + +We use Transformer models to parameterize the direct model `P(y|x)`, the channel model `P(x|y)` and the language model `P(y)`. + +During online decoding with beam search, we generate the top `K2` candidates per beam and score them with the following linear combination of the channel model, the language model as well as the direct model scores. + +```(1 / t) * log(P(y|x) + (1 / s) * ( λ1 * log(P(x|y)) + λ2 * log(P(y) ) )``` +- `t` - Target Prefix Length +- `s` - Source Length +- `λ1` - Channel Model Weight +- `λ2` - Language Model Weight + +The top `beam_size` candidates based on the above combined scores are chosen to continue the beams in beam search. In beam search with a direct model alone, the scores from the direct model `P(y|x)` are used to choose the top candidates in beam search. + +This framework provides a great way to utlize strong target language models trained on large amounts of unlabeled data. Language models can prefer targets unrelated to the source, so we also need a channel model whose role is to ensure that the target preferred by the language model also translates back to the source. + +### Training Translation Models and Language Models + +For training Transformer models in fairseq for machine translation, refer to instructions [here](https://github.com/pytorch/fairseq/tree/master/examples/translation) + +For training Transformer models in fairseq for language modeling, refer to instructions [here](https://github.com/pytorch/fairseq/tree/master/examples/language_model) + +### Generation with Language Model for German-English translation with fairseq + +Here are instructions to generate using a direct model and a target-side language model. 
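As a quick reference before the concrete commands, the sketch below shows how the combined score defined above re-ranks the `K2` candidates per beam, given per-model log-probabilities. The function and toy numbers are illustrative only and are not the fairseq implementation.

```python
import numpy as np

def combined_score(direct_lprob, channel_lprob, lm_lprob,
                   tgt_prefix_len, src_len, ch_wt, lm_wt):
    """(1/t) * log P(y|x) + (1/s) * (ch_wt * log P(x|y) + lm_wt * log P(y))."""
    return (direct_lprob / tgt_prefix_len
            + (ch_wt * channel_lprob + lm_wt * lm_lprob) / src_len)

# Toy example: 3 candidates for one beam, source length 12, target prefix length 4.
direct = np.array([-2.1, -2.3, -2.8])    # log P(y|x)
channel = np.array([-9.0, -7.5, -8.2])   # log P(x|y)
lm = np.array([-6.4, -5.9, -7.1])        # log P(y)
scores = combined_score(direct, channel, lm, tgt_prefix_len=4, src_len=12,
                        ch_wt=0.3, lm_wt=0.5)
print(scores.argsort()[::-1])            # candidate indices, best first
```

The top `beam_size` candidates under this score continue the beam, exactly as described in the Noisy Channel Modeling section above.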
+ +Note: +- Download and install fairseq as per instructions [here](https://github.com/pytorch/fairseq) +- Preprocess and binarize the dataset as per instructions in section [Test Data Preprocessing](#test-data-preprocessing) + +```sh +binarized_data=data_dir/binarized +direct_model=de_en_seed4.pt +lm_model=en_lm.pt +lm_data=lm_data +wget https://dl.fbaipublicfiles.com/fast_noisy_channel/de_en/direct_models/seed4.pt -O ${direct_model} +wget https://dl.fbaipublicfiles.com/fast_noisy_channel/de_en/lm_model/transformer_lm.pt -O ${lm_model} +mkdir -p ${lm_data} +wget https://dl.fbaipublicfiles.com/fast_noisy_channel/de_en/lm_model/lm_dict/dict.txt -O ${lm_data}/dict.txt + +k2=10 +lenpen=0.16 +lm_wt=0.14 +fairseq-generate ${binarized_data} \ + --user-dir examples/fast_noisy_channel \ + --beam 5 \ + --path ${direct_model} \ + --lm-model ${lm_model} \ + --lm-data ${lm_data} \ + --k2 ${k2} \ + --combine-method lm_only \ + --task noisy_channel_translation \ + --lenpen ${lenpen} \ + --lm-wt ${lm_wt} \ + --gen-subset valid \ + --remove-bpe \ + --fp16 \ + --batch-size 10 +``` +### Noisy Channel Generation for German-English translation with fairseq + +Here are instructions for noisy channel generation with a direct model, channel model and language model as explained in section [Noisy Channel Modeling](#noisy-channel-modeling). + +Note: +- Download and install fairseq as per instructions [here](https://github.com/pytorch/fairseq) +- Preprocess and binarize the dataset as per instructions in section [Test Data Preprocessing](#test-data-preprocessing) + +```sh +binarized_data=data_dir/binarized +direct_model=de_en_seed4.pt +lm_model=en_lm.pt +lm_data=lm_data +ch_model=en_de.big.seed4.pt +wget https://dl.fbaipublicfiles.com/fast_noisy_channel/de_en/direct_models/seed4.pt -O ${direct_model} +wget https://dl.fbaipublicfiles.com/fast_noisy_channel/de_en/lm_model/transformer_lm.pt -O ${lm_model} +mkdir -p ${lm_data} +wget https://dl.fbaipublicfiles.com/fast_noisy_channel/de_en/lm_model/lm_dict/dict.txt -O ${lm_data}/dict.txt +wget https://dl.fbaipublicfiles.com/fast_noisy_channel/de_en/channel_models/big.seed4.pt -O ${ch_model} + +k2=10 +lenpen=0.21 +lm_wt=0.50 +bw_wt=0.30 +fairseq-generate ${binarized_data} \ + --user-dir examples/fast_noisy_channel \ + --beam 5 \ + --path ${direct_model} \ + --lm-model ${lm_model} \ + --lm-data ${lm_data} \ + --channel-model ${ch_model} \ + --k2 ${k2} \ + --combine-method noisy_channel \ + --task noisy_channel_translation \ + --lenpen ${lenpen} \ + --lm-wt ${lm_wt} \ + --ch-wt ${bw_wt} \ + --gen-subset test \ + --remove-bpe \ + --fp16 \ + --batch-size 1 +``` +## Fast Noisy Channel Modeling + +[Bhosale et al. (2020)](http://www.statmt.org/wmt20/pdf/2020.wmt-1.68.pdf) introduces 3 approximations that speed up online noisy channel decoding - +- Smaller channel models (`Tranformer Base` with 1 encoder and decoder layer each vs. `Transformer Big`) + - This involves training a channel model that is possibly smaller and less accurate in terms of BLEU than a channel model of the same size as the direct model. + - Since the role of the channel model is mainly to assign low scores to generations from the language model if they don't translate back to the source, we may not need the most accurate channel model for this purpose. +- Smaller output vocabulary size for the channel model (~30,000 -> ~1000) + - The channel model doesn't need to score the full output vocabulary, it just needs to score the source tokens, which are completely known. 
+ - This is specified using the arguments `--channel-scoring-type src_vocab --top-k-vocab 500` + - This means that the output vocabulary for the channel model will be the source tokens for all examples in the batch and the top-K most frequent tokens in the vocabulary + - This reduces the memory consumption needed to store channel model scores significantly +- Smaller number of candidates (`k2`) scored per beam + - This is specified by reducing the argument `--k2` + + +### Fast Noisy Channel Generation for German-English translation with fairseq + +Here are instructions for **fast** noisy channel generation with a direct model, channel model and language model as explained in section [Fast Noisy Channel Modeling](#fast-noisy-channel-modeling). The main differences are that we use a smaller channel model, reduce `--k2`, set `--channel-scoring-type src_vocab --top-k-vocab 500` and increase the `--batch-size`. + +Note: +- Download and install fairseq as per instructions [here](https://github.com/pytorch/fairseq) +- Preprocess and binarize the dataset as per instructions in section [Test Data Preprocessing](#test-data-preprocessing) + +```sh +binarized_data=data_dir/binarized +direct_model=de_en_seed4.pt +lm_model=en_lm.pt +lm_data=lm_data +small_ch_model=en_de.base_1_1.seed4.pt +wget https://dl.fbaipublicfiles.com/fast_noisy_channel/de_en/direct_models/seed4.pt -O ${direct_model} +wget https://dl.fbaipublicfiles.com/fast_noisy_channel/de_en/lm_model/transformer_lm.pt -O ${lm_model} +mkdir -p ${lm_data} +wget https://dl.fbaipublicfiles.com/fast_noisy_channel/de_en/lm_model/lm_dict/dict.txt -O ${lm_data}/dict.txt +wget https://dl.fbaipublicfiles.com/fast_noisy_channel/de_en/channel_models/base_1_1.seed4.pt -O ${small_ch_model} + +k2=3 +lenpen=0.23 +lm_wt=0.58 +bw_wt=0.26 +fairseq-generate ${binarized_data} \ + --user-dir examples/fast_noisy_channel \ + --beam 5 \ + --path ${direct_model} \ + --lm-model ${lm_model} \ + --lm-data ${lm_data} \ + --channel-model ${small_ch_model} \ + --k2 ${k2} \ + --combine-method noisy_channel \ + --task noisy_channel_translation \ + --lenpen ${lenpen} \ + --lm-wt ${lm_wt} \ + --ch-wt ${bw_wt} \ + --gen-subset test \ + --remove-bpe \ + --fp16 \ + --batch-size 50 \ + --channel-scoring-type src_vocab --top-k-vocab 500 +``` + +## Test Data Preprocessing + +For preprocessing and binarizing the test sets for Romanian-English and German-English translation, we use the following script - + +```sh +FAIRSEQ=/path/to/fairseq +cd $FAIRSEQ +SCRIPTS=$FAIRSEQ/mosesdecoder/scripts +if [ ! -d "${SCRIPTS}" ]; then + echo 'Cloning Moses github repository (for tokenization scripts)...' 
+ git clone https://github.com/moses-smt/mosesdecoder.git +fi +TOKENIZER=$SCRIPTS/tokenizer/tokenizer.perl +NORMALIZE=$SCRIPTS/tokenizer/normalize-punctuation.perl + +s=de +t=en +test=wmt18 + +mkdir -p data_dir + +# Tokenization +if [ $s == "ro" ] ; then + # Note: Get normalise-romanian.py and remove-diacritics.py from + # https://github.com/rsennrich/wmt16-scripts/tree/master/preprocess + sacrebleu -t $test -l $s-$t --echo src | \ + $NORMALIZE -l $s | \ + python normalise-romanian.py | \ + python remove-diacritics.py | \ + $TOKENIZER -l $s -a -q > data_dir/$test.$s-$t.$s +else + sacrebleu -t $test -l $s-$t --echo src | perl $NORMALIZE -l $s | perl $TOKENIZER -threads 8 -a -l $s > data_dir/$test.$s-$t.$s +fi + +sacrebleu -t $test -l $s-$t --echo ref | perl $NORMALIZE -l $t | perl $TOKENIZER -threads 8 -a -l $t > data_dir/$test.$s-$t.$t + + +# Applying BPE +src_bpe_code=/path/to/source/language/bpe/code +tgt_bpe_code=/path/to/target/language/bpe/code +src_dict=/path/to/source/language/dict +tgt_dict=/path/to/target/language/dict + +FASTBPE=$FAIRSEQ/fastBPE +if [ ! -d "${FASTBPE}" ] ; then + git clone https://github.com/glample/fastBPE.git + # Follow compilation instructions at https://github.com/glample/fastBPE + g++ -std=c++11 -pthread -O3 fastBPE/main.cc -IfastBPE -o fast +fi + +${FASTBPE}/fast applybpe data_dir/bpe.$test.$s-$t.$s data_dir/$test.$s-$t.$s ${src_bpe_code} +${FASTBPE}/fast applybpe data_dir/bpe.$test.$s-$t.$s data_dir/$test.$s-$t.$s ${tgt_bpe_code} + +fairseq-preprocess -s $s -t $t \ + --testpref data_dir/bpe.$test.$s-$t \ + --destdir data_dir/binarized \ + --srcdict ${src_dict} \ + --tgtdict ${tgt_dict} +``` + +## Calculating BLEU + +```sh +DETOKENIZER=$SCRIPTS/tokenizer/detokenizer.perl +cat ${generation_output} | grep -P "^H" | sort -V | cut -f 3- | $DETOKENIZER -l $t -q -a | sacrebleu -t $test -l $s-$t +``` + + +## Romanian-English Translation + +The direct and channel models are trained using bitext data (WMT16) combined with backtranslated data (The monolingual data used for backtranslation comes from http://data.statmt.org/rsennrich/wmt16_backtranslations/ (Sennrich et al., 2016c)) + +The backtranslated data is generated using an ensemble of 3 English-Romanian models trained on bitext training data (WMT16) with unrestricted sampling. + +### BPE Codes and Dictionary + +We learn a joint BPE vocabulary of 18K types on the bitext training data which is used for both the source and target. +||Path| +|----------|------| +| BPE Code | [joint_bpe_18k](https://dl.fbaipublicfiles.com/fast_noisy_channel/ro_en/bpe_18k) | +| Dictionary | [dict](https://dl.fbaipublicfiles.com/fast_noisy_channel/ro_en/dict) | + +### Direct Models +For Ro-En with backtranslation, the direct and channel models use a Transformer-Big architecture. + +| Seed | Model | +|----|----| +| 2 | [ro_en_seed2.pt](https://dl.fbaipublicfiles.com/fast_noisy_channel/ro_en/direct_models/seed2.pt) +| 4 | [ro_en_seed4.pt](https://dl.fbaipublicfiles.com/fast_noisy_channel/ro_en/direct_models/seed4.pt) +| 6 | [ro_en_seed6.pt](https://dl.fbaipublicfiles.com/fast_noisy_channel/ro_en/direct_models/seed6.pt) + +### Channel Models +For channel models, we follow the same steps as for the direct models. But backtranslated data is generated in the opposite direction using [this Romanian monolingual data](http://data.statmt.org/rsennrich/wmt16_backtranslations/). +The best lenpen, LM weight and CH weight are obtained by sweeping over the validation set (wmt16/dev) using beam 5. 
+| Model Size | Lenpen | LM Weight | CH Weight | Seed 2 | Seed 4 | Seed 6 | +|----|----|----|----|----|----|----| +| `big` | 0.84 | 0.64 | 0.56 | [big.seed2.pt](https://dl.fbaipublicfiles.com/fast_noisy_channel/ro_en/channel_models/big.seed2.pt) | [big.seed2.pt](https://dl.fbaipublicfiles.com/fast_noisy_channel/ro_en/channel_models/big.seed2.pt) | [big.seed2.pt](https://dl.fbaipublicfiles.com/fast_noisy_channel/ro_en/channel_models/big.seed2.pt) | +| `base_1_1` | 0.63 | 0.40 | 0.37 | [base_1_1.seed2.pt](https://dl.fbaipublicfiles.com/fast_noisy_channel/ro_en/channel_models/base_1_1.seed2.pt) | [base_1_1.seed4.pt](https://dl.fbaipublicfiles.com/fast_noisy_channel/ro_en/channel_models/base_1_1.seed4.pt) | [base_1_1.seed6.pt](https://dl.fbaipublicfiles.com/fast_noisy_channel/ro_en/channel_models/base_1_1.seed6.pt) | + +### Language Model +The model is trained on de-duplicated English Newscrawl data from 2007-2018 comprising 186 million sentences or 4.5B words after normalization and tokenization. +| | Path | +|----|----| +| `--lm-model` | [transformer_en_lm](https://dl.fbaipublicfiles.com/fast_noisy_channel/ro_en/lm_model/transformer_lm.pt) | +| `--lm-data` | [lm_data](https://dl.fbaipublicfiles.com/fast_noisy_channel/ro_en/lm_model/lm_dict) + +## German-English Translation + +### BPE Codes and Dictionaries + +| | Path| +|----------|------| +| Source BPE Code | [de_bpe_code_24K](https://dl.fbaipublicfiles.com/fast_noisy_channel/de_en/de_bpe_code_24K) | +| Target BPE Code | [en_bpe_code_24K](https://dl.fbaipublicfiles.com/fast_noisy_channel/de_en/en_bpe_code_24K) +| Source Dictionary | [de_dict](https://dl.fbaipublicfiles.com/fast_noisy_channel/de_en/de_dict) | +| Target Dictionary | [en_dict](https://dl.fbaipublicfiles.com/fast_noisy_channel/de_en/en_dict) | + +### Direct Models +We train on WMT’19 training data. Following [Ng et al., 2019](http://statmt.org/wmt19/pdf/53/WMT33.pdf), we apply language identification filtering and remove sentences longer than 250 tokens as well as sentence pairs with a source/target length ratio exceeding 1.5. This results in 26.8M sentence pairs. +We use the Transformer-Big architecture for the direct model. + +| Seed | Model | +|:----:|----| +| 4 | [de_en_seed4.pt](https://dl.fbaipublicfiles.com/fast_noisy_channel/de_en/direct_models/seed4.pt) +| 5 | [de_en_seed5.pt](https://dl.fbaipublicfiles.com/fast_noisy_channel/de_en/direct_models/seed5.pt) +| 6 | [de_en_seed6.pt](https://dl.fbaipublicfiles.com/fast_noisy_channel/de_en/direct_models/seed6.pt) + +### Channel Models + +We train on WMT’19 training data. Following [Ng et al., 2019](http://statmt.org/wmt19/pdf/53/WMT33.pdf), we apply language identification filtering and remove sentences longer than 250 tokens as well as sentence pairs with a source/target length ratio exceeding 1.5. This results in 26.8M sentence pairs. 
+ +| Model Size | Seed 4 | Seed 5 | Seed 6 | +|----|----|----|----| +| `big` | [big.seed4.pt](https://dl.fbaipublicfiles.com/fast_noisy_channel/de_en/channel_models/big.seed4.pt) | [big.seed5.pt](https://dl.fbaipublicfiles.com/fast_noisy_channel/de_en/channel_models/big.seed5.pt) | [big.seed6.pt](https://dl.fbaipublicfiles.com/fast_noisy_channel/de_en/channel_models/big.seed6.pt) | +| `big_1_1` | [big_1_1.seed4.pt](https://dl.fbaipublicfiles.com/fast_noisy_channel/de_en/channel_models/big_1_1.seed4.pt) | [big_1_1.seed5.pt](https://dl.fbaipublicfiles.com/fast_noisy_channel/de_en/channel_models/big_1_1.seed5.pt) | [big_1_1.seed6.pt](https://dl.fbaipublicfiles.com/fast_noisy_channel/de_en/channel_models/big_1_1.seed6.pt) | +| `base` | [base.seed4.pt](https://dl.fbaipublicfiles.com/fast_noisy_channel/de_en/channel_models/base.seed4.pt) | [base.seed5.pt](https://dl.fbaipublicfiles.com/fast_noisy_channel/de_en/channel_models/base.seed5.pt) | [base.seed6.pt](https://dl.fbaipublicfiles.com/fast_noisy_channel/de_en/channel_models/base.seed6.pt) | +| `base_1_1` | [base_1_1.seed4.pt](https://dl.fbaipublicfiles.com/fast_noisy_channel/de_en/channel_models/base_1_1.seed4.pt) | [base_1_1.seed5.pt](https://dl.fbaipublicfiles.com/fast_noisy_channel/de_en/channel_models/base_1_1.seed5.pt) | [base_1_1.seed6.pt](https://dl.fbaipublicfiles.com/fast_noisy_channel/de_en/channel_models/base_1_1.seed6.pt) | +| `half` | [half.seed4.pt](https://dl.fbaipublicfiles.com/fast_noisy_channel/de_en/channel_models/half.seed4.pt) | [half.seed5.pt](https://dl.fbaipublicfiles.com/fast_noisy_channel/de_en/channel_models/half.seed5.pt) | [half.seed6.pt](https://dl.fbaipublicfiles.com/fast_noisy_channel/de_en/channel_models/half.seed6.pt) | +| `half_1_1` | [half_1_1.seed4.pt](https://dl.fbaipublicfiles.com/fast_noisy_channel/de_en/channel_models/half_1_1.seed4.pt) | [half_1_1.seed5.pt](https://dl.fbaipublicfiles.com/fast_noisy_channel/de_en/channel_models/half_1_1.seed5.pt) | [half_1_1.seed6.pt](https://dl.fbaipublicfiles.com/fast_noisy_channel/de_en/channel_models/half_1_1.seed6.pt) | +| `quarter` | [quarter.seed4.pt](https://dl.fbaipublicfiles.com/fast_noisy_channel/de_en/channel_models/quarter.seed4.pt) | [quarter.seed5.pt](https://dl.fbaipublicfiles.com/fast_noisy_channel/de_en/channel_models/quarter.seed5.pt) | [quarter.seed6.pt](https://dl.fbaipublicfiles.com/fast_noisy_channel/de_en/channel_models/quarter.seed6.pt) | +| `quarter_1_1` | [quarter_1_1.seed4.pt](https://dl.fbaipublicfiles.com/fast_noisy_channel/de_en/channel_models/quarter_1_1.seed4.pt) | [quarter_1_1.seed5.pt](https://dl.fbaipublicfiles.com/fast_noisy_channel/de_en/channel_models/quarter_1_1.seed5.pt) | [quarter_1_1.seed6.pt](https://dl.fbaipublicfiles.com/fast_noisy_channel/de_en/channel_models/quarter_1_1.seed6.pt) | +| `8th` | [8th.seed4.pt](https://dl.fbaipublicfiles.com/fast_noisy_channel/de_en/channel_models/8th.seed4.pt) | [8th.seed5.pt](https://dl.fbaipublicfiles.com/fast_noisy_channel/de_en/channel_models/8th.seed5.pt) | [8th.seed6.pt](https://dl.fbaipublicfiles.com/fast_noisy_channel/de_en/channel_models/8th.seed6.pt) | +| `8th_1_1` | [8th_1_1.seed4.pt](https://dl.fbaipublicfiles.com/fast_noisy_channel/de_en/channel_models/8th_1_1.seed4.pt) | [8th_1_1.seed5.pt](https://dl.fbaipublicfiles.com/fast_noisy_channel/de_en/channel_models/8th_1_1.seed5.pt) | [8th_1_1.seed6.pt](https://dl.fbaipublicfiles.com/fast_noisy_channel/de_en/channel_models/8th_1_1.seed6.pt) | +| `16th` | 
[16th.seed4.pt](https://dl.fbaipublicfiles.com/fast_noisy_channel/de_en/channel_models/16th.seed4.pt) | [16th.seed5.pt](https://dl.fbaipublicfiles.com/fast_noisy_channel/de_en/channel_models/16th.seed5.pt) | [16th.seed6.pt](https://dl.fbaipublicfiles.com/fast_noisy_channel/de_en/channel_models/16th.seed6.pt) | +| `16th_1_1` | [16th_1_1.seed4.pt](https://dl.fbaipublicfiles.com/fast_noisy_channel/de_en/channel_models/16th_1_1.seed4.pt) | [16th_1_1.seed5.pt](https://dl.fbaipublicfiles.com/fast_noisy_channel/de_en/channel_models/16th_1_1.seed5.pt) | [16th_1_1.seed6.pt](https://dl.fbaipublicfiles.com/fast_noisy_channel/de_en/channel_models/16th_1_1.seed6.pt) | + +### Language Model +The model is trained on de-duplicated English Newscrawl data from 2007-2018 comprising 186 million sentences or 4.5B words after normalization and tokenization. +| | Path | +|----|----| +| `--lm-model` | [transformer_en_lm](https://dl.fbaipublicfiles.com/fast_noisy_channel/de_en/lm_model/transformer_lm.pt) | +| `--lm-data` | [lm_data](https://dl.fbaipublicfiles.com/fast_noisy_channel/de_en/lm_model/lm_dict/) + + +## Citation + +```bibtex +@inproceedings{bhosale2020language, + title={Language Models not just for Pre-training: Fast Online Neural Noisy Channel Modeling}, + author={Shruti Bhosale and Kyra Yee and Sergey Edunov and Michael Auli}, + booktitle={Proceedings of the Fifth Conference on Machine Translation (WMT)}, + year={2020}, +} + +@inproceedings{yee2019simple, + title={Simple and Effective Noisy Channel Modeling for Neural Machine Translation}, + author={Yee, Kyra and Dauphin, Yann and Auli, Michael}, + booktitle={Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)}, + pages={5700--5705}, + year={2019} +} +``` diff --git a/SpeechT5/fairseq/examples/fast_noisy_channel/__init__.py b/SpeechT5/fairseq/examples/fast_noisy_channel/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..9b248c3a24e12ad3da885a7f328c714942de2e6b --- /dev/null +++ b/SpeechT5/fairseq/examples/fast_noisy_channel/__init__.py @@ -0,0 +1,8 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +from . import noisy_channel_translation # noqa +from . import noisy_channel_sequence_generator # noqa +from . import noisy_channel_beam_search # noqa diff --git a/SpeechT5/fairseq/examples/fast_noisy_channel/noisy_channel_beam_search.py b/SpeechT5/fairseq/examples/fast_noisy_channel/noisy_channel_beam_search.py new file mode 100644 index 0000000000000000000000000000000000000000..23869ebcd0c438f36e310c8ccddd3b5c07a71182 --- /dev/null +++ b/SpeechT5/fairseq/examples/fast_noisy_channel/noisy_channel_beam_search.py @@ -0,0 +1,71 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. 
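+
+# NOTE: this search combines scores from the direct model and from the
+# channel/language models at every decoding step.  For the "noisy_channel"
+# method, `combine_fw_bw` length-normalizes the cumulative direct ("forward")
+# score by the current step before adding the channel+LM ("backward") score;
+# `step` then keeps the top 2*beam_size candidates of the combined score,
+# together with their forward and LM scores, for the next iteration.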
+ +import torch +from fairseq.search import Search + + +class NoisyChannelBeamSearch(Search): + + def __init__(self, tgt_dict): + super().__init__(tgt_dict) + self.fw_scores_buf = None + self.lm_scores_buf = None + + def _init_buffers(self, t): + # super()._init_buffers(t) + if self.fw_scores_buf is None: + self.scores_buf = t.new() + self.indices_buf = torch.LongTensor().to(device=t.device) + self.beams_buf = torch.LongTensor().to(device=t.device) + self.fw_scores_buf = t.new() + self.lm_scores_buf = t.new() + + def combine_fw_bw(self, combine_method, fw_cum, bw, step): + if combine_method == "noisy_channel": + fw_norm = fw_cum.div(step + 1) + lprobs = bw + fw_norm + elif combine_method == "lm_only": + lprobs = bw + fw_cum + + return lprobs + + def step(self, step, fw_lprobs, scores, bw_lprobs, lm_lprobs, combine_method): + self._init_buffers(fw_lprobs) + bsz, beam_size, vocab_size = fw_lprobs.size() + + if step == 0: + # at the first step all hypotheses are equally likely, so use + # only the first beam + fw_lprobs = fw_lprobs[:, ::beam_size, :].contiguous() + bw_lprobs = bw_lprobs[:, ::beam_size, :].contiguous() + # nothing to add since we are at the first step + fw_lprobs_cum = fw_lprobs + + else: + # make probs contain cumulative scores for each hypothesis + raw_scores = (scores[:, :, step - 1].unsqueeze(-1)) + fw_lprobs_cum = (fw_lprobs.add(raw_scores)) + + combined_lprobs = self.combine_fw_bw(combine_method, fw_lprobs_cum, bw_lprobs, step) + + # choose the top k according to the combined noisy channel model score + torch.topk( + combined_lprobs.view(bsz, -1), + k=min( + # Take the best 2 x beam_size predictions. We'll choose the first + # beam_size of these which don't predict eos to continue with. + beam_size * 2, + combined_lprobs.view(bsz, -1).size(1) - 1, # -1 so we never select pad + ), + out=(self.scores_buf, self.indices_buf), + ) + # save corresponding fw and lm scores + self.fw_scores_buf = torch.gather(fw_lprobs_cum.view(bsz, -1), 1, self.indices_buf) + self.lm_scores_buf = torch.gather(lm_lprobs.view(bsz, -1), 1, self.indices_buf) + # Project back into relative indices and beams + self.beams_buf = self.indices_buf // vocab_size + self.indices_buf.fmod_(vocab_size) + return self.scores_buf, self.fw_scores_buf, self.lm_scores_buf, self.indices_buf, self.beams_buf diff --git a/SpeechT5/fairseq/examples/fast_noisy_channel/noisy_channel_sequence_generator.py b/SpeechT5/fairseq/examples/fast_noisy_channel/noisy_channel_sequence_generator.py new file mode 100644 index 0000000000000000000000000000000000000000..ea8fae98e87e9f3e69bc51987703a6429eb0c92a --- /dev/null +++ b/SpeechT5/fairseq/examples/fast_noisy_channel/noisy_channel_sequence_generator.py @@ -0,0 +1,842 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. 
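+
+# NOTE: decoding proceeds in two stages at every step: the direct model P(T|S)
+# proposes the top `k2` continuations for each beam, and only those candidates
+# are rescored with the channel model P(S|T) and the language model P(T) (see
+# `noisy_channel_rescoring` below).  The combined scores are then used by
+# NoisyChannelBeamSearch to pick the next set of beams.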
+ +from typing import Dict, List, Optional + +import math +import numpy as np + +import torch +import torch.nn.functional as F +from torch import Tensor + +from .noisy_channel_beam_search import NoisyChannelBeamSearch +from fairseq.sequence_generator import EnsembleModel + + +class NoisyChannelSequenceGenerator(object): + def __init__( + self, + combine_method, + tgt_dict, + src_dict=None, + beam_size=1, + max_len_a=0, + max_len_b=200, + min_len=1, + len_penalty=1.0, + unk_penalty=0.0, + retain_dropout=False, + temperature=1.0, + match_source_len=False, + no_repeat_ngram_size=0, + normalize_scores=True, + channel_models=None, + k2=10, + ch_weight=1.0, + channel_scoring_type='log_norm', + top_k_vocab=0, + lm_models=None, + lm_dict=None, + lm_weight=1.0, + normalize_lm_scores_by_tgt_len=False, + ): + """Generates translations of a given source sentence, + using beam search with noisy channel decoding. + + Args: + combine_method (string, optional): Method to combine direct, LM and + channel model scores (default: None) + tgt_dict (~fairseq.data.Dictionary): target dictionary + src_dict (~fairseq.data.Dictionary): source dictionary + beam_size (int, optional): beam width (default: 1) + max_len_a/b (int, optional): generate sequences of maximum length + ax + b, where x is the source length + min_len (int, optional): the minimum length of the generated output + (not including end-of-sentence) + len_penalty (float, optional): length penalty, where <1.0 favors + shorter, >1.0 favors longer sentences (default: 1.0) + unk_penalty (float, optional): unknown word penalty, where <0 + produces more unks, >0 produces fewer (default: 0.0) + retain_dropout (bool, optional): use dropout when generating + (default: False) + temperature (float, optional): temperature, where values + >1.0 produce more uniform samples and values <1.0 produce + sharper samples (default: 1.0) + match_source_len (bool, optional): outputs should match the source + length (default: False) + no_repeat_ngram_size (int, optional): Size of n-grams that we avoid + repeating in the generation (default: 0) + normalize_scores (bool, optional): normalize scores by the length + of the output (default: True) + channel_models (List[~fairseq.models.FairseqModel]): ensemble of models + translating from the target to the source + k2 (int, optional): Top K2 candidates to score per beam at each step (default:10) + ch_weight (int, optional): Weight associated with the channel model score + assuming that the direct model score has weight 1.0 (default: 1.0) + channel_scoring_type (str, optional): String specifying how to score + the channel model (default: 'log_norm') + top_k_vocab (int, optional): If `channel_scoring_type` is `'src_vocab'` or + `'src_vocab_batched'`, then this parameter specifies the number of + most frequent tokens to include in the channel model output vocabulary, + in addition to the source tokens in the input batch (default: 0) + lm_models (List[~fairseq.models.FairseqModel]): ensemble of models + generating text in the target language + lm_dict (~fairseq.data.Dictionary): LM Model dictionary + lm_weight (int, optional): Weight associated with the LM model score + assuming that the direct model score has weight 1.0 (default: 1.0) + normalize_lm_scores_by_tgt_len (bool, optional): Should we normalize LM scores + by the target length? 
By default, we normalize the combination of + LM and channel model scores by the source length + """ + self.pad = tgt_dict.pad() + self.unk = tgt_dict.unk() + self.eos = tgt_dict.eos() + self.vocab_size = len(tgt_dict) + self.beam_size = beam_size + # the max beam size is the dictionary size - 1, since we never select pad + self.beam_size = min(beam_size, self.vocab_size - 1) + self.max_len_a = max_len_a + self.max_len_b = max_len_b + self.min_len = min_len + self.normalize_scores = normalize_scores + self.len_penalty = len_penalty + self.unk_penalty = unk_penalty + self.retain_dropout = retain_dropout + self.temperature = temperature + self.match_source_len = match_source_len + self.no_repeat_ngram_size = no_repeat_ngram_size + self.channel_models = channel_models + self.src_dict = src_dict + self.tgt_dict = tgt_dict + self.combine_method = combine_method + self.k2 = k2 + self.ch_weight = ch_weight + self.channel_scoring_type = channel_scoring_type + self.top_k_vocab = top_k_vocab + self.lm_models = lm_models + self.lm_dict = lm_dict + self.lm_weight = lm_weight + self.log_softmax_fn = torch.nn.LogSoftmax(dim=1) + self.normalize_lm_scores_by_tgt_len = normalize_lm_scores_by_tgt_len + + self.share_tgt_dict = (self.lm_dict == self.tgt_dict) + self.tgt_to_lm = make_dict2dict(tgt_dict, lm_dict) + + self.ch_scoring_bsz = 3072 + + assert temperature > 0, '--temperature must be greater than 0' + + self.search = NoisyChannelBeamSearch(tgt_dict) + + @torch.no_grad() + def generate( + self, + models, + sample, + prefix_tokens=None, + bos_token=None, + **kwargs + ): + """Generate a batch of translations. + Args: + models (List[~fairseq.models.FairseqModel]): ensemble of models + sample (dict): batch + prefix_tokens (torch.LongTensor, optional): force decoder to begin + with these tokens + """ + model = EnsembleModel(models) + incremental_states = torch.jit.annotate( + List[Dict[str, Dict[str, Optional[Tensor]]]], + [ + torch.jit.annotate(Dict[str, Dict[str, Optional[Tensor]]], {}) + for i in range(model.models_size) + ], + ) + if not self.retain_dropout: + model.eval() + + # model.forward normally channels prev_output_tokens into the decoder + # separately, but SequenceGenerator directly calls model.encoder + encoder_input = { + k: v for k, v in sample['net_input'].items() + if k != 'prev_output_tokens' + } + src_tokens = encoder_input['src_tokens'] + src_lengths_no_eos = (src_tokens.ne(self.eos) & src_tokens.ne(self.pad)).long().sum(dim=1) + input_size = src_tokens.size() + # batch dimension goes first followed by source lengths + bsz = input_size[0] + src_len = input_size[1] + beam_size = self.beam_size + + if self.match_source_len: + max_len = src_lengths_no_eos.max().item() + else: + max_len = min( + int(self.max_len_a * src_len + self.max_len_b), + # exclude the EOS marker + model.max_decoder_positions() - 1, + ) + + # compute the encoder output for each beam + encoder_outs = model.forward_encoder(encoder_input) + new_order = torch.arange(bsz).view(-1, 1).repeat(1, beam_size).view(-1) + new_order = new_order.to(src_tokens.device).long() + encoder_outs = model.reorder_encoder_out(encoder_outs, new_order) + + src_lengths = encoder_input['src_lengths'] + # initialize buffers + scores = src_tokens.new(bsz * beam_size, max_len + 1).float().fill_(0) + lm_prefix_scores = src_tokens.new(bsz * beam_size).float().fill_(0) + + scores_buf = scores.clone() + tokens = src_tokens.new(bsz * beam_size, max_len + 2).long().fill_(self.pad) + tokens_buf = tokens.clone() + tokens[:, 0] = self.eos if bos_token is 
None else bos_token + + # reorder source tokens so they may be used as a reference in generating P(S|T) + src_tokens = reorder_all_tokens(src_tokens, src_lengths, self.src_dict.eos_index) + + src_tokens = src_tokens.repeat(1, beam_size).view(-1, src_len) + src_lengths = src_lengths.view(bsz, -1).repeat(1, beam_size).view(bsz*beam_size, -1) + + attn, attn_buf = None, None + nonpad_idxs = None + + # The cands_to_ignore indicates candidates that should be ignored. + # For example, suppose we're sampling and have already finalized 2/5 + # samples. Then the cands_to_ignore would mark 2 positions as being ignored, + # so that we only finalize the remaining 3 samples. + cands_to_ignore = src_tokens.new_zeros(bsz, beam_size).eq(-1) # forward and backward-compatible False mask + + # list of completed sentences + finalized = [[] for i in range(bsz)] + finished = [False for i in range(bsz)] + num_remaining_sent = bsz + + # number of candidate hypos per step + cand_size = 2 * beam_size # 2 x beam size in case half are EOS + + # offset arrays for converting between different indexing schemes + bbsz_offsets = (torch.arange(0, bsz) * beam_size).unsqueeze(1).type_as(tokens) + cand_offsets = torch.arange(0, cand_size).type_as(tokens) + + # helper function for allocating buffers on the fly + buffers = {} + + def buffer(name, type_of=tokens): # noqa + if name not in buffers: + buffers[name] = type_of.new() + return buffers[name] + + def is_finished(sent, step, unfin_idx): + """ + Check whether we've finished generation for a given sentence, by + comparing the worst score among finalized hypotheses to the best + possible score among unfinalized hypotheses. + """ + assert len(finalized[sent]) <= beam_size + if len(finalized[sent]) == beam_size: + return True + return False + + def finalize_hypos(step, bbsz_idx, eos_scores, combined_noisy_channel_eos_scores): + """ + Finalize the given hypotheses at this step, while keeping the total + number of finalized hypotheses per sentence <= beam_size. + + Note: the input must be in the desired finalization order, so that + hypotheses that appear earlier in the input are preferred to those + that appear later. 
+ + Args: + step: current time step + bbsz_idx: A vector of indices in the range [0, bsz*beam_size), + indicating which hypotheses to finalize + eos_scores: A vector of the same size as bbsz_idx containing + fw scores for each hypothesis + combined_noisy_channel_eos_scores: A vector of the same size as bbsz_idx containing + combined noisy channel scores for each hypothesis + """ + assert bbsz_idx.numel() == eos_scores.numel() + + # clone relevant token and attention tensors + tokens_clone = tokens.index_select(0, bbsz_idx) + tokens_clone = tokens_clone[:, 1:step + 2] # skip the first index, which is EOS + assert not tokens_clone.eq(self.eos).any() + tokens_clone[:, step] = self.eos + attn_clone = attn.index_select(0, bbsz_idx)[:, :, 1:step+2] if attn is not None else None + + # compute scores per token position + pos_scores = scores.index_select(0, bbsz_idx)[:, :step+1] + pos_scores[:, step] = eos_scores + # convert from cumulative to per-position scores + pos_scores[:, 1:] = pos_scores[:, 1:] - pos_scores[:, :-1] + + # normalize sentence-level scores + if self.normalize_scores: + combined_noisy_channel_eos_scores /= (step + 1) ** self.len_penalty + + cum_unfin = [] + prev = 0 + for f in finished: + if f: + prev += 1 + else: + cum_unfin.append(prev) + + sents_seen = set() + for i, (idx, score) in enumerate(zip(bbsz_idx.tolist(), combined_noisy_channel_eos_scores.tolist())): + unfin_idx = idx // beam_size + sent = unfin_idx + cum_unfin[unfin_idx] + + sents_seen.add((sent, unfin_idx)) + + if self.match_source_len and step > src_lengths_no_eos[unfin_idx]: + score = -math.inf + + def get_hypo(): + + if attn_clone is not None: + # remove padding tokens from attn scores + hypo_attn = attn_clone[i][nonpad_idxs[sent]] + _, alignment = hypo_attn.max(dim=0) + else: + hypo_attn = None + alignment = None + + return { + 'tokens': tokens_clone[i], + 'score': score, + 'attention': hypo_attn, # src_len x tgt_len + 'alignment': alignment, + 'positional_scores': pos_scores[i], + } + + if len(finalized[sent]) < beam_size: + finalized[sent].append(get_hypo()) + + newly_finished = [] + for sent, unfin_idx in sents_seen: + # check termination conditions for this sentence + if not finished[sent] and is_finished(sent, step, unfin_idx): + finished[sent] = True + newly_finished.append(unfin_idx) + return newly_finished + + def noisy_channel_rescoring(lprobs, beam_size, bsz, src_tokens, tokens, k): + """Rescore the top k hypothesis from each beam using noisy channel modeling + Returns: + new_fw_lprobs: the direct model probabilities after pruning the top k + new_ch_lm_lprobs: the combined channel and language model probabilities + new_lm_lprobs: the language model probabilities after pruning the top k + """ + with torch.no_grad(): + lprobs_size = lprobs.size() + if prefix_tokens is not None and step < prefix_tokens.size(1): + probs_slice = lprobs.view(bsz, -1, lprobs.size(-1))[:, 0, :] + cand_scores = torch.gather( + probs_slice, dim=1, + index=prefix_tokens[:, step].view(-1, 1).data + ).expand(-1, beam_size).contiguous().view(bsz*beam_size, 1) + cand_indices = prefix_tokens[:, step].view(-1, 1).expand(bsz, beam_size).data.contiguous().view(bsz*beam_size, 1) + + # need to calculate and save fw and lm probs for prefix tokens + fw_top_k = cand_scores + fw_top_k_idx = cand_indices + k = 1 + else: + # take the top k best words for every sentence in batch*beam + fw_top_k, fw_top_k_idx = torch.topk(lprobs.view(beam_size*bsz, -1), k=k) + eos_idx = torch.nonzero(fw_top_k_idx.view(bsz*beam_size*k, -1) == self.eos)[:, 0] + 
ch_scores = fw_top_k.new_full((beam_size*bsz*k, ), 0) + src_size = torch.sum(src_tokens[:, :] != self.src_dict.pad_index, dim=1, keepdim=True, dtype=fw_top_k.dtype) + + if self.combine_method != "lm_only": + temp_src_tokens_full = src_tokens[:, :].repeat(1, k).view(bsz*beam_size*k, -1) + not_padding = temp_src_tokens_full[:, 1:] != self.src_dict.pad_index + cur_tgt_size = step+2 + + # add eos to all candidate sentences except those that already end in eos + eos_tokens = tokens[:, 0].repeat(1, k).view(-1, 1) + eos_tokens[eos_idx] = self.tgt_dict.pad_index + + if step == 0: + channel_input = torch.cat((fw_top_k_idx.view(-1, 1), eos_tokens), 1) + else: + # move eos from beginning to end of target sentence + channel_input = torch.cat((tokens[:, 1:step + 1].repeat(1, k).view(-1, step), fw_top_k_idx.view(-1, 1), eos_tokens), 1) + + ch_input_lengths = torch.tensor(np.full(channel_input.size(0), cur_tgt_size)) + ch_input_lengths[eos_idx] = cur_tgt_size-1 + if self.channel_scoring_type == "unnormalized": + ch_encoder_output = channel_model.encoder(channel_input, src_lengths=ch_input_lengths) + ch_decoder_output, _ = channel_model.decoder(temp_src_tokens_full, encoder_out=ch_encoder_output, features_only=True) + del ch_encoder_output + ch_intermed_scores = channel_model.decoder.unnormalized_scores_given_target(ch_decoder_output, target_ids=temp_src_tokens_full[:, 1:]) + ch_intermed_scores = ch_intermed_scores.float() + ch_intermed_scores *= not_padding.float() + ch_scores = torch.sum(ch_intermed_scores, dim=1) + elif self.channel_scoring_type == "k2_separate": + for k_idx in range(k): + k_eos_tokens = eos_tokens[k_idx::k, :] + if step == 0: + k_ch_input = torch.cat((fw_top_k_idx[:, k_idx:k_idx+1], k_eos_tokens), 1) + else: + # move eos from beginning to end of target sentence + k_ch_input = torch.cat((tokens[:, 1:step + 1], fw_top_k_idx[:, k_idx:k_idx+1], k_eos_tokens), 1) + k_ch_input_lengths = ch_input_lengths[k_idx::k] + k_ch_output = channel_model(k_ch_input, k_ch_input_lengths, src_tokens) + k_ch_lprobs = channel_model.get_normalized_probs(k_ch_output, log_probs=True) + k_ch_intermed_scores = torch.gather(k_ch_lprobs[:, :-1, :], 2, src_tokens[:, 1:].unsqueeze(2)).squeeze(2) + k_ch_intermed_scores *= not_padding.float() + ch_scores[k_idx::k] = torch.sum(k_ch_intermed_scores, dim=1) + elif self.channel_scoring_type == "src_vocab": + ch_encoder_output = channel_model.encoder(channel_input, src_lengths=ch_input_lengths) + ch_decoder_output, _ = channel_model.decoder(temp_src_tokens_full, encoder_out=ch_encoder_output, features_only=True) + + del ch_encoder_output + ch_lprobs = normalized_scores_with_batch_vocab( + channel_model.decoder, + ch_decoder_output, src_tokens, k, bsz, beam_size, + self.src_dict.pad_index, top_k=self.top_k_vocab) + ch_scores = torch.sum(ch_lprobs, dim=1) + elif self.channel_scoring_type == "src_vocab_batched": + ch_bsz_size = temp_src_tokens_full.shape[0] + ch_lprobs_list = [None] * len(range(0, ch_bsz_size, self.ch_scoring_bsz)) + for i, start_idx in enumerate(range(0, ch_bsz_size, self.ch_scoring_bsz)): + end_idx = min(start_idx + self.ch_scoring_bsz, ch_bsz_size) + temp_src_tokens_full_batch = temp_src_tokens_full[start_idx:end_idx, :] + channel_input_batch = channel_input[start_idx:end_idx, :] + ch_input_lengths_batch = ch_input_lengths[start_idx:end_idx] + ch_encoder_output_batch = channel_model.encoder(channel_input_batch, src_lengths=ch_input_lengths_batch) + ch_decoder_output_batch, _ = channel_model.decoder(temp_src_tokens_full_batch, 
encoder_out=ch_encoder_output_batch, features_only=True) + ch_lprobs_list[i] = normalized_scores_with_batch_vocab( + channel_model.decoder, + ch_decoder_output_batch, src_tokens, k, bsz, beam_size, + self.src_dict.pad_index, top_k=self.top_k_vocab, + start_idx=start_idx, end_idx=end_idx) + ch_lprobs = torch.cat(ch_lprobs_list, dim=0) + ch_scores = torch.sum(ch_lprobs, dim=1) + else: + ch_output = channel_model(channel_input, ch_input_lengths, temp_src_tokens_full) + ch_lprobs = channel_model.get_normalized_probs(ch_output, log_probs=True) + ch_intermed_scores = torch.gather(ch_lprobs[:, :-1, :], 2, temp_src_tokens_full[:, 1:].unsqueeze(2)).squeeze().view(bsz*beam_size*k, -1) + ch_intermed_scores *= not_padding.float() + ch_scores = torch.sum(ch_intermed_scores, dim=1) + + else: + cur_tgt_size = 0 + ch_scores = ch_scores.view(bsz*beam_size, k) + expanded_lm_prefix_scores = lm_prefix_scores.unsqueeze(1).expand(-1, k).flatten() + + if self.share_tgt_dict: + lm_scores = get_lm_scores(lm, tokens[:, :step + 1].view(-1, step+1), lm_incremental_states, fw_top_k_idx.view(-1, 1), torch.tensor(np.full(tokens.size(0), step+1)), k) + else: + new_lm_input = dict2dict(tokens[:, :step + 1].view(-1, step+1), self.tgt_to_lm) + new_cands = dict2dict(fw_top_k_idx.view(-1, 1), self.tgt_to_lm) + lm_scores = get_lm_scores(lm, new_lm_input, lm_incremental_states, new_cands, torch.tensor(np.full(tokens.size(0), step+1)), k) + + lm_scores.add_(expanded_lm_prefix_scores) + ch_lm_scores = combine_ch_lm(self.combine_method, ch_scores, lm_scores, src_size, cur_tgt_size) + # initialize all as min value + new_fw_lprobs = ch_scores.new(lprobs_size).fill_(-1e17).view(bsz*beam_size, -1) + new_ch_lm_lprobs = ch_scores.new(lprobs_size).fill_(-1e17).view(bsz*beam_size, -1) + new_lm_lprobs = ch_scores.new(lprobs_size).fill_(-1e17).view(bsz*beam_size, -1) + new_fw_lprobs[:, self.pad] = -math.inf + new_ch_lm_lprobs[:, self.pad] = -math.inf + new_lm_lprobs[:, self.pad] = -math.inf + + new_fw_lprobs.scatter_(1, fw_top_k_idx, fw_top_k) + new_ch_lm_lprobs.scatter_(1, fw_top_k_idx, ch_lm_scores) + new_lm_lprobs.scatter_(1, fw_top_k_idx, lm_scores.view(-1, k)) + return new_fw_lprobs, new_ch_lm_lprobs, new_lm_lprobs + + def combine_ch_lm(combine_type, ch_scores, lm_scores1, src_size, tgt_size): + if self.channel_scoring_type == "unnormalized": + ch_scores = self.log_softmax_fn( + ch_scores.view(-1, self.beam_size * self.k2) + ).view(ch_scores.shape) + ch_scores = ch_scores * self.ch_weight + lm_scores1 = lm_scores1 * self.lm_weight + + if combine_type == "lm_only": + # log P(T|S) + log P(T) + ch_scores = lm_scores1.view(ch_scores.size()) + elif combine_type == "noisy_channel": + # 1/t log P(T|S) + 1/s log P(S|T) + 1/t log P(T) + if self.normalize_lm_scores_by_tgt_len: + ch_scores.div_(src_size) + lm_scores_norm = lm_scores1.view(ch_scores.size()).div(tgt_size) + ch_scores.add_(lm_scores_norm) + # 1/t log P(T|S) + 1/s log P(S|T) + 1/s log P(T) + else: + ch_scores.add_(lm_scores1.view(ch_scores.size())) + ch_scores.div_(src_size) + + return ch_scores + + if self.channel_models is not None: + channel_model = self.channel_models[0] # assume only one channel_model model + else: + channel_model = None + + lm = EnsembleModel(self.lm_models) + lm_incremental_states = torch.jit.annotate( + List[Dict[str, Dict[str, Optional[Tensor]]]], + [ + torch.jit.annotate(Dict[str, Dict[str, Optional[Tensor]]], {}) + for i in range(lm.models_size) + ], + ) + + reorder_state = None + batch_idxs = None + for step in range(max_len + 1): # one extra step for EOS 
marker + # reorder decoder internal states based on the prev choice of beams + if reorder_state is not None: + if batch_idxs is not None: + # update beam indices to take into account removed sentences + corr = batch_idxs - torch.arange(batch_idxs.numel()).type_as(batch_idxs) + reorder_state.view(-1, beam_size).add_(corr.unsqueeze(-1) * beam_size) + model.reorder_incremental_state(incremental_states, reorder_state) + encoder_outs = model.reorder_encoder_out(encoder_outs, reorder_state) + + lm.reorder_incremental_state(lm_incremental_states, reorder_state) + + fw_lprobs, avg_attn_scores = model.forward_decoder( + tokens[:, :step + 1], encoder_outs, incremental_states, temperature=self.temperature, + ) + + fw_lprobs[:, self.pad] = -math.inf # never select pad + fw_lprobs[:, self.unk] -= self.unk_penalty # apply unk penalty + fw_lprobs, ch_lm_lprobs, lm_lprobs = noisy_channel_rescoring(fw_lprobs, beam_size, bsz, src_tokens, tokens, self.k2) + + # handle min and max length constraints + if step >= max_len: + fw_lprobs[:, :self.eos] = -math.inf + fw_lprobs[:, self.eos + 1:] = -math.inf + elif step < self.min_len: + fw_lprobs[:, self.eos] = -math.inf + + # handle prefix tokens (possibly with different lengths) + if prefix_tokens is not None and step < prefix_tokens.size(1): + prefix_toks = prefix_tokens[:, step].unsqueeze(-1).repeat(1, beam_size).view(-1) + prefix_mask = prefix_toks.ne(self.pad) + + prefix_fw_lprobs = fw_lprobs.gather(-1, prefix_toks.unsqueeze(-1)) + fw_lprobs[prefix_mask] = -math.inf + fw_lprobs[prefix_mask] = fw_lprobs[prefix_mask].scatter_( + -1, prefix_toks[prefix_mask].unsqueeze(-1), prefix_fw_lprobs + ) + + prefix_ch_lm_lprobs = ch_lm_lprobs.gather(-1, prefix_toks.unsqueeze(-1)) + ch_lm_lprobs[prefix_mask] = -math.inf + ch_lm_lprobs[prefix_mask] = ch_lm_lprobs[prefix_mask].scatter_( + -1, prefix_toks[prefix_mask].unsqueeze(-1), prefix_ch_lm_lprobs + ) + + prefix_lm_lprobs = lm_lprobs.gather(-1, prefix_toks.unsqueeze(-1)) + lm_lprobs[prefix_mask] = -math.inf + lm_lprobs[prefix_mask] = lm_lprobs[prefix_mask].scatter_( + -1, prefix_toks[prefix_mask].unsqueeze(-1), prefix_lm_lprobs + ) + + # if prefix includes eos, then we should make sure tokens and + # scores are the same across all beams + eos_mask = prefix_toks.eq(self.eos) + if eos_mask.any(): + # validate that the first beam matches the prefix + first_beam = tokens[eos_mask].view(-1, beam_size, tokens.size(-1))[:, 0, 1:step + 1] + eos_mask_batch_dim = eos_mask.view(-1, beam_size)[:, 0] + target_prefix = prefix_tokens[eos_mask_batch_dim][:, :step] + assert (first_beam == target_prefix).all() + + def replicate_first_beam(tensor, mask): + tensor = tensor.view(-1, beam_size, tensor.size(-1)) + tensor[mask] = tensor[mask][:, :1, :] + return tensor.view(-1, tensor.size(-1)) + + # copy tokens, scores and lprobs from the first beam to all beams + tokens = replicate_first_beam(tokens, eos_mask_batch_dim) + scores = replicate_first_beam(scores, eos_mask_batch_dim) + + fw_lprobs = replicate_first_beam(fw_lprobs, eos_mask_batch_dim) + ch_lm_lprobs = replicate_first_beam(ch_lm_lprobs, eos_mask_batch_dim) + lm_lprobs = replicate_first_beam(lm_lprobs, eos_mask_batch_dim) + + if self.no_repeat_ngram_size > 0: + # for each beam and batch sentence, generate a list of previous ngrams + gen_ngrams = [{} for bbsz_idx in range(bsz * beam_size)] + for bbsz_idx in range(bsz * beam_size): + gen_tokens = tokens[bbsz_idx].tolist() + for ngram in zip(*[gen_tokens[i:] for i in range(self.no_repeat_ngram_size)]): + 
gen_ngrams[bbsz_idx][tuple(ngram[:-1])] = \ + gen_ngrams[bbsz_idx].get(tuple(ngram[:-1]), []) + [ngram[-1]] + + # Record attention scores + if avg_attn_scores is not None: + if attn is None: + attn = scores.new(bsz * beam_size, src_tokens.size(1), max_len + 2) + attn_buf = attn.clone() + nonpad_idxs = src_tokens.ne(self.pad) + attn[:, :, step + 1].copy_(avg_attn_scores) + + scores = scores.type_as(fw_lprobs) + scores_buf = scores_buf.type_as(fw_lprobs) + + self.search.set_src_lengths(src_lengths_no_eos) + + if self.no_repeat_ngram_size > 0: + def calculate_banned_tokens(bbsz_idx): + # before decoding the next token, prevent decoding of ngrams that have already appeared + ngram_index = tuple(tokens[bbsz_idx, step + 2 - self.no_repeat_ngram_size:step + 1].tolist()) + return gen_ngrams[bbsz_idx].get(ngram_index, []) + + if step + 2 - self.no_repeat_ngram_size >= 0: + # no banned tokens if we haven't generated no_repeat_ngram_size tokens yet + banned_tokens = [calculate_banned_tokens(bbsz_idx) for bbsz_idx in range(bsz * beam_size)] + else: + banned_tokens = [[] for bbsz_idx in range(bsz * beam_size)] + + for bbsz_idx in range(bsz * beam_size): + fw_lprobs[bbsz_idx, banned_tokens[bbsz_idx]] = -math.inf + + combined_noisy_channel_scores, fw_lprobs_top_k, lm_lprobs_top_k, cand_indices, cand_beams = self.search.step( + step, + fw_lprobs.view(bsz, -1, self.vocab_size), + scores.view(bsz, beam_size, -1)[:, :, :step], ch_lm_lprobs.view(bsz, -1, self.vocab_size), + lm_lprobs.view(bsz, -1, self.vocab_size), self.combine_method + ) + + # cand_bbsz_idx contains beam indices for the top candidate + # hypotheses, with a range of values: [0, bsz*beam_size), + # and dimensions: [bsz, cand_size] + cand_bbsz_idx = cand_beams.add(bbsz_offsets) + + # finalize hypotheses that end in eos (except for candidates to be ignored) + eos_mask = cand_indices.eq(self.eos) + eos_mask[:, :beam_size] &= ~cands_to_ignore + + # only consider eos when it's among the top beam_size indices + eos_bbsz_idx = torch.masked_select( + cand_bbsz_idx[:, :beam_size], mask=eos_mask[:, :beam_size] + ) + + finalized_sents = set() + if eos_bbsz_idx.numel() > 0: + eos_scores = torch.masked_select( + fw_lprobs_top_k[:, :beam_size], mask=eos_mask[:, :beam_size] + ) + combined_noisy_channel_eos_scores = torch.masked_select( + combined_noisy_channel_scores[:, :beam_size], + mask=eos_mask[:, :beam_size], + ) + + # finalize hypo using channel model score + finalized_sents = finalize_hypos( + step, eos_bbsz_idx, eos_scores, combined_noisy_channel_eos_scores) + + num_remaining_sent -= len(finalized_sents) + + assert num_remaining_sent >= 0 + if num_remaining_sent == 0: + break + + if len(finalized_sents) > 0: + new_bsz = bsz - len(finalized_sents) + + # construct batch_idxs which holds indices of batches to keep for the next pass + batch_mask = cand_indices.new_ones(bsz) + batch_mask[cand_indices.new(finalized_sents)] = 0 + batch_idxs = torch.nonzero(batch_mask).squeeze(-1) + + eos_mask = eos_mask[batch_idxs] + cand_beams = cand_beams[batch_idxs] + bbsz_offsets.resize_(new_bsz, 1) + cand_bbsz_idx = cand_beams.add(bbsz_offsets) + + lm_lprobs_top_k = lm_lprobs_top_k[batch_idxs] + + fw_lprobs_top_k = fw_lprobs_top_k[batch_idxs] + cand_indices = cand_indices[batch_idxs] + if prefix_tokens is not None: + prefix_tokens = prefix_tokens[batch_idxs] + src_lengths_no_eos = src_lengths_no_eos[batch_idxs] + cands_to_ignore = cands_to_ignore[batch_idxs] + + scores = scores.view(bsz, -1)[batch_idxs].view(new_bsz * beam_size, -1) + scores_buf.resize_as_(scores) + 
tokens = tokens.view(bsz, -1)[batch_idxs].view(new_bsz * beam_size, -1) + tokens_buf.resize_as_(tokens) + src_tokens = src_tokens.view(bsz, -1)[batch_idxs].view(new_bsz * beam_size, -1) + src_lengths = src_lengths.view(bsz, -1)[batch_idxs].view(new_bsz * beam_size, -1) + lm_prefix_scores = lm_prefix_scores.view(bsz, -1)[batch_idxs].view(new_bsz * beam_size, -1).squeeze() + + if attn is not None: + attn = attn.view(bsz, -1)[batch_idxs].view(new_bsz * beam_size, attn.size(1), -1) + attn_buf.resize_as_(attn) + bsz = new_bsz + else: + batch_idxs = None + + # Set active_mask so that values > cand_size indicate eos or + # ignored hypos and values < cand_size indicate candidate + # active hypos. After this, the min values per row are the top + # candidate active hypos. + eos_mask[:, :beam_size] |= cands_to_ignore + active_mask = torch.add( + eos_mask.type_as(cand_offsets) * cand_size, + cand_offsets[: eos_mask.size(1)], + ) + + # get the top beam_size active hypotheses, which are just the hypos + # with the smallest values in active_mask + active_hypos, new_cands_to_ignore = buffer('active_hypos'), buffer('new_cands_to_ignore') + torch.topk( + active_mask, k=beam_size, dim=1, largest=False, + out=(new_cands_to_ignore, active_hypos) + ) + + # update cands_to_ignore to ignore any finalized hypos + cands_to_ignore = new_cands_to_ignore.ge(cand_size)[:, :beam_size] + assert (~cands_to_ignore).any(dim=1).all() + + active_bbsz_idx = buffer('active_bbsz_idx') + torch.gather( + cand_bbsz_idx, dim=1, index=active_hypos, + out=active_bbsz_idx, + ) + active_scores = torch.gather( + fw_lprobs_top_k, dim=1, index=active_hypos, + out=scores[:, step].view(bsz, beam_size), + ) + + active_bbsz_idx = active_bbsz_idx.view(-1) + active_scores = active_scores.view(-1) + + # copy tokens and scores for active hypotheses + torch.index_select( + tokens[:, :step + 1], dim=0, index=active_bbsz_idx, + out=tokens_buf[:, :step + 1], + ) + torch.gather( + cand_indices, dim=1, index=active_hypos, + out=tokens_buf.view(bsz, beam_size, -1)[:, :, step + 1], + ) + if step > 0: + torch.index_select( + scores[:, :step], dim=0, index=active_bbsz_idx, + out=scores_buf[:, :step], + ) + torch.gather( + fw_lprobs_top_k, dim=1, index=active_hypos, + out=scores_buf.view(bsz, beam_size, -1)[:, :, step], + ) + torch.gather( + lm_lprobs_top_k, dim=1, index=active_hypos, + out=lm_prefix_scores.view(bsz, beam_size) + ) + + # copy attention for active hypotheses + if attn is not None: + torch.index_select( + attn[:, :, :step + 2], dim=0, index=active_bbsz_idx, + out=attn_buf[:, :, :step + 2], + ) + + # swap buffers + tokens, tokens_buf = tokens_buf, tokens + scores, scores_buf = scores_buf, scores + if attn is not None: + attn, attn_buf = attn_buf, attn + + # reorder incremental state in decoder + reorder_state = active_bbsz_idx + + # sort by score descending + for sent in range(len(finalized)): + finalized[sent] = sorted(finalized[sent], key=lambda r: r['score'], reverse=True) + + return finalized + + +def get_lm_scores(model, input_tokens, incremental_states, cand_tokens, input_len, k): + with torch.no_grad(): + lm_lprobs, avg_attn_scores = model.forward_decoder( + input_tokens, encoder_outs=None, incremental_states=incremental_states, + ) + + lm_lprobs_size = lm_lprobs.size(0) + probs_next_wrd = torch.gather(lm_lprobs.repeat(1, k).view(lm_lprobs_size*k, -1), 1, cand_tokens).squeeze().view(-1) + + return probs_next_wrd + + +def make_dict2dict(old_dict, new_dict): + dict2dict_map = {} + for sym in old_dict.symbols: + 
dict2dict_map[old_dict.index(sym)] = new_dict.index(sym) + return dict2dict_map + + +def dict2dict(tokens, dict2dict_map): + if tokens.device == torch.device('cpu'): + tokens_tmp = tokens + else: + tokens_tmp = tokens.cpu() + return tokens_tmp.map_( + tokens_tmp, + lambda _, val, dict2dict_map=dict2dict_map : dict2dict_map[float(val)] + ).to(tokens.device) + + +def reorder_tokens(tokens, lengths, eos): + # reorder source tokens so they may be used as reference for P(S|T) + return torch.cat((tokens.new([eos]), tokens[-lengths:-1], tokens[:-lengths]), 0) + + +def reorder_all_tokens(tokens, lengths, eos): + # used to reorder src tokens from [ .. ] to [ ...] + # so source tokens can be used to predict P(S|T) + return torch.stack([reorder_tokens(token, length, eos) for token, length in zip(tokens, lengths)]) + + +def normalized_scores_with_batch_vocab( + model_decoder, features, target_ids, k, bsz, beam_size, + pad_idx, top_k=0, vocab_size_meter=None, start_idx=None, + end_idx=None, **kwargs): + """ + Get normalized probabilities (or log probs) from a net's output + w.r.t. vocab consisting of target IDs in the batch + """ + if model_decoder.adaptive_softmax is None: + weight = model_decoder.output_projection.weight + vocab_ids = torch.unique( + torch.cat( + (torch.unique(target_ids), torch.arange(top_k, device=target_ids.device)) + ) + ) + id_map = dict(zip(vocab_ids.tolist(), range(len(vocab_ids)))) + mapped_target_ids = target_ids.cpu().apply_( + lambda x, id_map=id_map: id_map[x] + ).to(target_ids.device) + expanded_target_ids = mapped_target_ids[:, :].repeat(1, k).view(bsz*beam_size*k, -1) + if start_idx is not None and end_idx is not None: + expanded_target_ids = expanded_target_ids[start_idx:end_idx, :] + logits = F.linear(features, weight[vocab_ids, :]) + log_softmax = F.log_softmax(logits, dim=-1, dtype=torch.float32) + intermed_scores = torch.gather( + log_softmax[:, :-1, :], + 2, + expanded_target_ids[:, 1:].unsqueeze(2), + ).squeeze() + not_padding = expanded_target_ids[:, 1:] != pad_idx + intermed_scores *= not_padding.float() + return intermed_scores + else: + raise ValueError("adaptive softmax doesn't work with " + + "`normalized_scores_with_batch_vocab()`") diff --git a/SpeechT5/fairseq/examples/fast_noisy_channel/noisy_channel_translation.py b/SpeechT5/fairseq/examples/fast_noisy_channel/noisy_channel_translation.py new file mode 100644 index 0000000000000000000000000000000000000000..b74bdfd456f9b7c546ce528173c77431b4f57ac1 --- /dev/null +++ b/SpeechT5/fairseq/examples/fast_noisy_channel/noisy_channel_translation.py @@ -0,0 +1,127 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +from fairseq.tasks.translation import TranslationTask +from fairseq.tasks.language_modeling import LanguageModelingTask +from fairseq import checkpoint_utils +import argparse +from fairseq.tasks import register_task +import torch + + +@register_task("noisy_channel_translation") +class NoisyChannelTranslation(TranslationTask): + """ + Rescore the top k candidates from each beam using noisy channel modeling + """ + + @staticmethod + def add_args(parser): + """Add task-specific arguments to the parser.""" + TranslationTask.add_args(parser) + # fmt: off + parser.add_argument('--channel-model', metavar='FILE', + help='path to P(S|T) model. 
P(S|T) and P(T|S) must share source and target dictionaries.') + parser.add_argument('--combine-method', default='lm_only', + choices=['lm_only', 'noisy_channel'], + help="""method for combining direct and channel model scores. + lm_only: decode with P(T|S)P(T) + noisy_channel: decode with 1/t P(T|S) + 1/s(P(S|T)P(T))""") + parser.add_argument('--normalize-lm-scores-by-tgt-len', action='store_true', default=False, + help='normalize lm score by target length instead of source length') + parser.add_argument('--channel-scoring-type', default='log_norm', choices=['unnormalized', 'log_norm', 'k2_separate', 'src_vocab', 'src_vocab_batched'], + help="Normalize bw scores with log softmax or return bw scores without log softmax") + parser.add_argument('--top-k-vocab', default=0, type=int, + help='top k vocab IDs to use with `src_vocab` in channel model scoring') + parser.add_argument('--k2', default=50, type=int, + help='the top k2 candidates to rescore with the noisy channel model for each beam') + parser.add_argument('--ch-wt', default=1, type=float, + help='weight for the channel model') + parser.add_argument('--lm-model', metavar='FILE', + help='path to lm model file, to model P(T). P(T) must share the same vocab as the direct model on the target side') + parser.add_argument('--lm-data', metavar='FILE', + help='path to lm model training data for target language, used to properly load LM with correct dictionary') + parser.add_argument('--lm-wt', default=1, type=float, + help='the weight of the lm in joint decoding') + # fmt: on + + def build_generator( + self, models, args, seq_gen_cls=None, extra_gen_cls_kwargs=None + ): + if getattr(args, "score_reference", False): + raise NotImplementedError() + else: + from .noisy_channel_sequence_generator import NoisyChannelSequenceGenerator + use_cuda = torch.cuda.is_available() and not self.args.cpu + assert self.args.lm_model is not None, '--lm-model required for noisy channel generation!' 
+ assert self.args.lm_data is not None, '--lm-data required for noisy channel generation to map between LM and bitext vocabs' + if self.args.channel_model is not None: + import copy + ch_args_task = copy.deepcopy(self.args) + tmp = ch_args_task.source_lang + ch_args_task.source_lang = ch_args_task.target_lang + ch_args_task.target_lang = tmp + ch_args_task._name = 'translation' + channel_task = TranslationTask.setup_task(ch_args_task) + + arg_dict = {} + arg_dict['task'] = 'language_modeling' + arg_dict['sample_break_mode'] = 'eos' + arg_dict['data'] = self.args.lm_data + arg_dict['output_dictionary_size'] = -1 + lm_args = argparse.Namespace(**arg_dict) + lm_task = LanguageModelingTask.setup_task(lm_args) + lm_dict = lm_task.output_dictionary + + if self.args.channel_model is not None: + channel_models, _ = checkpoint_utils.load_model_ensemble(self.args.channel_model.split(':'), task=channel_task) + + for model in channel_models: + model.make_generation_fast_( + beamable_mm_beam_size=None if args.no_beamable_mm else args.beam, + need_attn=args.print_alignment, + ) + if self.args.fp16: + model.half() + if use_cuda: + model.cuda() + else: + channel_models = None + + lm_models, _ = checkpoint_utils.load_model_ensemble(self.args.lm_model.split(':'), task=lm_task) + + for model in lm_models: + model.make_generation_fast_( + beamable_mm_beam_size=None if args.no_beamable_mm else args.beam, + need_attn=args.print_alignment, + ) + if self.args.fp16: + model.half() + if use_cuda: + model.cuda() + return NoisyChannelSequenceGenerator( + combine_method=self.args.combine_method, + tgt_dict=self.target_dictionary, + src_dict=self.source_dictionary, + beam_size=getattr(args, 'beam', 5), + max_len_a=getattr(args, 'max_len_a', 0), + max_len_b=getattr(args, 'max_len_b', 200), + min_len=getattr(args, 'min_len', 1), + len_penalty=getattr(args, 'lenpen', 1), + unk_penalty=getattr(args, 'unkpen', 0), + temperature=getattr(args, 'temperature', 1.), + match_source_len=getattr(args, 'match_source_len', False), + no_repeat_ngram_size=getattr(args, 'no_repeat_ngram_size', 0), + normalize_scores=(not getattr(args, 'unnormalized', False)), + channel_models=channel_models, + k2=getattr(self.args, 'k2', 50), + ch_weight=getattr(self.args, 'ch_wt', 1), + channel_scoring_type=self.args.channel_scoring_type, + top_k_vocab=self.args.top_k_vocab, + lm_models=lm_models, + lm_dict=lm_dict, + lm_weight=getattr(self.args, 'lm_wt', 1), + normalize_lm_scores_by_tgt_len=getattr(self.args, 'normalize_lm_scores_by_tgt_len', False), + ) diff --git a/SpeechT5/fairseq/examples/flores101/README.md b/SpeechT5/fairseq/examples/flores101/README.md new file mode 100644 index 0000000000000000000000000000000000000000..635c13f40bd0ccab704735bc5c26ea0192ea98cd --- /dev/null +++ b/SpeechT5/fairseq/examples/flores101/README.md @@ -0,0 +1,223 @@ +
+ +# Flores101: Large-Scale Multilingual Machine Translation + +## Introduction + +Baseline pretrained models for small and large tracks of WMT 21 Large-Scale Multilingual Machine Translation competition. + +Flores Task at WMT 21: http://www.statmt.org/wmt21/large-scale-multilingual-translation-task.html + +Flores announement blog post: https://ai.facebook.com/blog/flores-researchers-kick-off-multilingual-translation-challenge-at-wmt-and-call-for-compute-grants/ + + + +## Pretrained models + +Model | Num layers | Embed dimension | FFN dimension| Vocab Size | #params | Download +---|---|---|---|---|---|--- +`flores101_mm100_615M` | 12 | 1024 | 4096 | 256,000 | 615M | https://dl.fbaipublicfiles.com/flores101/pretrained_models/flores101_mm100_615M.tar.gz +`flores101_mm100_175M` | 6 | 512 | 2048 | 256,000 | 175M | https://dl.fbaipublicfiles.com/flores101/pretrained_models/flores101_mm100_175M.tar.gz + + +These models are trained similar to [M2M-100](https://arxiv.org/abs/2010.11125) with additional support for the languages that are part of the WMT Large-Scale Multilingual Machine Translation track. Full list of languages can be found at the bottom. + + +## Example Generation code + +### Download model, sentencepiece vocab + +```bash +fairseq=/path/to/fairseq +cd $fairseq + +# Download 615M param model. +wget https://dl.fbaipublicfiles.com/flores101/pretrained_models/flores101_mm100_615M.tar.gz + +# Extract +tar -xvzf flores101_mm100_615M.tar.gz +``` + +### Encode using our SentencePiece Model +Note: Install SentencePiece from [here](https://github.com/google/sentencepiece) + + +```bash +fairseq=/path/to/fairseq +cd $fairseq + +# Download example dataset From German to French +sacrebleu --echo src -l de-fr -t wmt19 | head -n 20 > raw_input.de-fr.de +sacrebleu --echo ref -l de-fr -t wmt19 | head -n 20 > raw_input.de-fr.fr + +for lang in de fr ; do + python scripts/spm_encode.py \ + --model flores101_mm100_615M/sentencepiece.bpe.model \ + --output_format=piece \ + --inputs=raw_input.de-fr.${lang} \ + --outputs=spm.de-fr.${lang} +done +``` + +### Binarization + +```bash +fairseq-preprocess \ + --source-lang de --target-lang fr \ + --testpref spm.de-fr \ + --thresholdsrc 0 --thresholdtgt 0 \ + --destdir data_bin \ + --srcdict flores101_mm100_615M/dict.txt --tgtdict flores101_mm100_615M/dict.txt +``` + +### Generation + + +```bash +fairseq-generate \ + data_bin \ + --batch-size 1 \ + --path flores101_mm100_615M/model.pt \ + --fixed-dictionary flores101_mm100_615M/dict.txt \ + -s de -t fr \ + --remove-bpe 'sentencepiece' \ + --beam 5 \ + --task translation_multi_simple_epoch \ + --lang-pairs flores101_mm100_615M/language_pairs.txt \ + --decoder-langtok --encoder-langtok src \ + --gen-subset test \ + --fp16 \ + --dataset-impl mmap \ + --distributed-world-size 1 --distributed-no-spawn +``` + +### Supported Languages and lang code + +Language | lang code +---|--- +Akrikaans | af +Amharic | am +Arabic | ar +Assamese | as +Asturian | ast +Aymara | ay +Azerbaijani | az +Bashkir | ba +Belarusian | be +Bulgarian | bg +Bengali | bn +Breton | br +Bosnian | bs +Catalan | ca +Cebuano | ceb +Chokwe | cjk +Czech | cs +Welsh | cy +Danish | da +German | de +Dyula| dyu +Greek | el +English | en +Spanish | es +Estonian | et +Persian | fa +Fulah | ff +Finnish | fi +French | fr +Western Frisian | fy +Irish | ga +Scottish Gaelic | gd +Galician | gl +Gujarati | gu +Hausa | ha +Hebrew | he +Hindi | hi +Croatian | hr +Haitian Creole | ht +Hungarian | hu +Armenian | hy +Indonesian | id +Igbo | ig +Iloko | ilo +Icelandic | 
is +Italian | it +Japanese | ja +Javanese | jv +Georgian | ka +Kachin | kac +Kamba | kam +Kabuverdianu | kea +Kongo | kg +Kazakh | kk +Central Khmer | km +Kimbundu | kmb +Northern Kurdish | kmr +Kannada | kn +Korean | ko +Kurdish | ku +Kyrgyz | ky +Luxembourgish | lb +Ganda | lg +Lingala | ln +Lao | lo +Lithuanian | lt +Luo | luo +Latvian | lv +Malagasy | mg +Maori | mi +Macedonian | mk +Malayalam | ml +Mongolian | mn +Marathi | mr +Malay | ms +Maltese | mt +Burmese | my +Nepali | ne +Dutch | nl +Norwegian | no +Northern Sotho | ns +Nyanja | ny +Occitan | oc +Oromo | om +Oriya | or +Punjabi | pa +Polish | pl +Pashto | ps +Portuguese | pt +Quechua | qu +Romanian | ro +Russian | ru +Sindhi | sd +Shan | shn +Sinhala | si +Slovak | sk +Slovenian | sl +Shona | sn +Somali | so +Albanian | sq +Serbian | sr +Swati | ss +Sundanese | su +Swedish | sv +Swahili | sw +Tamil | ta +Telugu | te +Tajik | tg +Thai | th +Tigrinya | ti +Tagalog | tl +Tswana | tn +Turkish | tr +Ukrainian | uk +Umbundu | umb +Urdu | ur +Uzbek | uz +Vietnamese | vi +Wolof | wo +Xhosa | xh +Yiddish | yi +Yoruba | yo +Chinese| zh +Zulu | zu diff --git a/SpeechT5/fairseq/examples/flores101/flores_logo.png b/SpeechT5/fairseq/examples/flores101/flores_logo.png new file mode 100644 index 0000000000000000000000000000000000000000..d4d1455c6eab608ff5317ce885183cd213564273 Binary files /dev/null and b/SpeechT5/fairseq/examples/flores101/flores_logo.png differ diff --git a/SpeechT5/fairseq/examples/fully_sharded_data_parallel/README.md b/SpeechT5/fairseq/examples/fully_sharded_data_parallel/README.md new file mode 100644 index 0000000000000000000000000000000000000000..d620f0e4f1a1561c267140b9b6f4c705a38a8865 --- /dev/null +++ b/SpeechT5/fairseq/examples/fully_sharded_data_parallel/README.md @@ -0,0 +1,177 @@ +# Fully Sharded Data Parallel (FSDP) + +## Overview +Recent work by [Microsoft](https://arxiv.org/abs/1910.02054) and +[Google](https://arxiv.org/abs/2004.13336) has shown that data parallel +training can be made significantly more efficient by sharding the model +parameters and optimizer state across data parallel workers. These ideas are +encapsulated in the new **`FullyShardedDataParallel` (FSDP)** wrapper provided +by [fairscale](https://github.com/facebookresearch/fairscale/). + +Compared to PyTorch DDP: +* FSDP produces identical results as PyTorch DDP (it's still synchronous data parallel training) +* FSDP shards parameters (FP16 + FP32) and optimizer state across data parallel GPUs +* FSDP is faster than PyTorch DDP because the optimizer step is sharded, and the communication can be overlapped with the forward pass +* FSDP enables training 13B parameter models on 8 GPUs and 175B parameter models on 128 GPUs + +FSDP is fully supported in fairseq via the following new arguments: +* `--ddp-backend=fully_sharded`: enables full sharding via FSDP +* `--cpu-offload`: offloads the optimizer state and FP32 model copy to CPU (combine with `--optimizer=cpu_adam`) +* `--no-reshard-after-forward`: increases training speed for large models (1B+ params) and is similar to ZeRO stage 2 +* other popular options (`--fp16`, `--update-freq`, `--checkpoint-activations`, `--offload-activations`, etc.) continue to work as normal + +
+## Limitations

+
+FSDP currently has several limitations compared to fairseq's default DDP backend (PyTorch DDP):
+* while FSDP is fully compatible with pointwise optimizers (e.g., Adam, AdamW, Adadelta, Adamax, SGD, etc.), it is not currently compatible with non-pointwise optimizers (e.g., Adagrad, Adafactor, LAMB, etc.)
+* FSDP depends on flattening the parameters, so models that currently require `--fp16-no-flatten-grads` may not be supported
+
+See the [fairscale docs](https://fairscale.readthedocs.io/en/latest/api/nn/fsdp_tips.html) for a more detailed
+explanation of these and other limitations.
+

+ +
+## How it works

+
+*(Figure: Fully Sharded Data Parallel)*
+
+See the [fairscale docs](https://fairscale.readthedocs.io/en/latest/api/nn/fsdp_tips.html) for a more detailed
+explanation of how FSDP works.
+
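+To see why sharding helps at this scale, a rough back-of-the-envelope sketch (an illustration only: it assumes the usual mixed-precision Adam accounting of ~16 bytes of parameter plus optimizer state per parameter, i.e. fp16 params, fp16 grads, fp32 master params and fp32 Adam moments, and it ignores activations and buffers; the parameter count is the one reported in the example output below):
+
+```bash
+params=13110865920      # num. model params for the 13B example below
+bytes_per_param=16      # fp16 params + fp16 grads + fp32 master copy + fp32 Adam moments (assumed)
+echo "unsharded:           $(( params * bytes_per_param / 1024**3 )) GB of state per GPU"
+echo "sharded over 8 GPUs: $(( params * bytes_per_param / 8 / 1024**3 )) GB of state per GPU"
+```
+
+Under this rough accounting the full training state is far too large for a single 32GB V100, but the per-GPU shard fits once it is split across 8 GPUs, which matches the example usage below.
+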

+ +## Example usage + +The following examples illustrate how to train a very large language model with +13 billion parameters on 1 GPU by offloading parameters and optimizer states to +CPU, or on 8 GPUs by fully sharding the params and optimizer states across GPUs. + +These examples use the WikiText-103 dataset for demonstration purposes, but +in practice a much larger dataset will be needed to achieve good results. +Follow the [instructions here](https://github.com/pytorch/fairseq/blob/master/examples/roberta/README.pretraining.md#1-preprocess-the-data) +to preprocess the WikiText-103 dataset using the GPT-2/RoBERTa vocabulary. + +### 13B params on 1 V100 GPU (with CPU offloading) + +The following command trains a 13B parameter GPT-3 model on a single V100 GPU +using the `--cpu-offload` feature to offload parameters and optimizer states to +CPU. In this setting, the optimizer step (Adam) happens on CPU. We also use the +`--checkpoint-activations` feature (sometimes called [gradient checkpointing](https://pytorch.org/docs/stable/checkpoint.html)), +which further saves memory in exchange for a small increase in computation. + +**Requirements:** +- Install the latest master version of fairscale: `pip install git+https://github.com/facebookresearch/fairscale.git@master` +- You'll need 32GB of GPU memory and ~256GB of system memory to train the 13B param model. +- If you have less system memory, the 6.7B param model can be trained with ~128GB of system memory, just set `--arch transformer_lm_gpt3_6_7` +- We use the CPU Adam optimizer from [DeepSpeed](https://github.com/microsoft/DeepSpeed), so you'll need to `pip install deepspeed` before running the command. + +**Notes:** +- The command will take ~5 minutes to start training, during which time it will appear to be hung, since randomly initializing 13B weights can be slow. +- The `--cpu-offload` feature requires training in mixed precision (`--fp16`). +- Tune the `OMP_NUM_THREADS` env variable for best performance with CPU offloading. +- The example command below stops training after 10 steps (`--max-update 10`) and does not save checkpoints (`--no-save`). + +```bash +OMP_NUM_THREADS=20 CUDA_VISIBLE_DEVICES=0 \ + fairseq-train data-bin/wikitext-103-roberta-bpe-bin \ + --ddp-backend fully_sharded --fp16 --fp16-init-scale 4 \ + --cpu-offload --checkpoint-activations \ + --task language_modeling --tokens-per-sample 2048 --batch-size 8 \ + --arch transformer_lm_gpt3_13 \ + --optimizer cpu_adam --adam-betas "(0.9,0.98)" \ + --lr 0.0001 --lr-scheduler polynomial_decay --warmup-updates 5 --total-num-update 10 \ + --max-update 10 --no-save --log-format json --log-interval 1 +``` + +
+**Example output:**

+ +``` +(...) +2021-03-08 12:29:51 | INFO | fairseq_cli.train | num. model params: 13,110,865,920 (num. trained: 13,110,865,920) +(...) +2021-03-08 12:29:51 | INFO | fairseq_cli.train | training on 1 devices (GPUs/TPUs) +2021-03-08 12:29:51 | INFO | fairseq_cli.train | max tokens per GPU = None and batch size per GPU = 8 +(...) +Adam Optimizer #0 is created with AVX2 arithmetic capability. +Config: alpha=0.000100, betas=(0.900000, 0.980000), weight_decay=0.000000, adam_w=1 +(...) +2021-03-08 12:31:36 | INFO | train_inner | {"epoch": 1, "update": 0.0, "loss": "16.475", "ppl": "91120.8", "wps": "0", "ups": "0", "wpb": "16384", "bsz": "8", "num_updates": "1", "lr": "2e-05", "gnorm": "20.751", "loss_scale": "4", "train_wall": "99", "gb_free": "9.3", "wall": "105"} +2021-03-08 12:32:33 | INFO | train_inner | {"epoch": 1, "update": 0.0, "loss": "16.446", "ppl": "89281.6", "wps": "288.7", "ups": "0.02", "wpb": "16384", "bsz": "8", "num_updates": "2", "lr": "4e-05", "gnorm": "19.777", "loss_scale": "4", "train_wall": "57", "gb_free": "9.3", "wall": "161"} +2021-03-08 12:33:12 | INFO | fairseq.trainer | NOTE: gradient overflow detected, ignoring gradient, setting loss scale to: 2.0 +2021-03-08 12:33:51 | INFO | fairseq.trainer | NOTE: gradient overflow detected, ignoring gradient, setting loss scale to: 1.0 +2021-03-08 12:34:45 | INFO | train_inner | {"epoch": 1, "update": 0.001, "loss": "25.22", "ppl": "3.90691e+07", "wps": "123.4", "ups": "0.01", "wpb": "16384", "bsz": "8", "num_updates": "3", "lr": "6e-05", "gnorm": "131.281", "loss_scale": "1", "train_wall": "133", "gb_free": "9.3", "wall": "294"} +2021-03-08 12:35:43 | INFO | train_inner | {"epoch": 1, "update": 0.001, "loss": "18.079", "ppl": "276809", "wps": "285.5", "ups": "0.02", "wpb": "16384", "bsz": "8", "num_updates": "4", "lr": "8e-05", "gnorm": "13.776", "loss_scale": "1", "train_wall": "57", "gb_free": "9.3", "wall": "351"} +2021-03-08 12:36:35 | INFO | train_inner | {"epoch": 1, "update": 0.001, "loss": "23.729", "ppl": "1.39088e+07", "wps": "316.7", "ups": "0.02", "wpb": "16384", "bsz": "8", "num_updates": "5", "lr": "0.0001", "gnorm": "72.774", "loss_scale": "1", "train_wall": "52", "gb_free": "9.3", "wall": "403"} +2021-03-08 12:37:28 | INFO | train_inner | {"epoch": 1, "update": 0.001, "loss": "20.429", "ppl": "1.41203e+06", "wps": "307.6", "ups": "0.02", "wpb": "16384", "bsz": "8", "num_updates": "6", "lr": "8e-05", "gnorm": "60.846", "loss_scale": "1", "train_wall": "53", "gb_free": "9.3", "wall": "456"} +2021-03-08 12:38:27 | INFO | train_inner | {"epoch": 1, "update": 0.001, "loss": "18.965", "ppl": "511684", "wps": "279.4", "ups": "0.02", "wpb": "16384", "bsz": "8", "num_updates": "7", "lr": "6e-05", "gnorm": "22.687", "loss_scale": "1", "train_wall": "59", "gb_free": "9.3", "wall": "515"} +2021-03-08 12:39:18 | INFO | train_inner | {"epoch": 1, "update": 0.001, "loss": "18.345", "ppl": "332887", "wps": "319.1", "ups": "0.02", "wpb": "16384", "bsz": "8", "num_updates": "8", "lr": "4e-05", "gnorm": "8.451", "loss_scale": "1", "train_wall": "51", "gb_free": "9.3", "wall": "566"} +2021-03-08 12:40:11 | INFO | train_inner | {"epoch": 1, "update": 0.002, "loss": "18.262", "ppl": "314336", "wps": "305.9", "ups": "0.02", "wpb": "16384", "bsz": "8", "num_updates": "9", "lr": "2e-05", "gnorm": "6.457", "loss_scale": "1", "train_wall": "54", "gb_free": "9.3", "wall": "620"} +2021-03-08 12:41:04 | INFO | train_inner | {"epoch": 1, "update": 0.002, "loss": "17.556", "ppl": "192686", "wps": "311.8", "ups": "0.02", "wpb": "16384", 
"bsz": "8", "num_updates": "10", "lr": "0", "gnorm": "5.796", "loss_scale": "1", "train_wall": "53", "gb_free": "9.3", "wall": "673"} +2021-03-08 12:41:04 | INFO | fairseq_cli.train | Stopping training due to num_updates: 10 >= max_update: 10 +2021-03-08 12:41:04 | INFO | fairseq_cli.train | begin validation on "valid" subset +2021-03-08 12:43:15 | INFO | valid | {"epoch": 1, "valid_loss": "17.953", "valid_ppl": "253807", "valid_wps": "1868.4", "valid_wpb": "15400.2", "valid_bsz": "7.6", "valid_num_updates": "10"} +2021-03-08 12:43:15 | INFO | fairseq_cli.train | end of epoch 1 (average epoch stats below) +2021-03-08 12:43:15 | INFO | train | {"epoch": 1, "train_loss": "19.351", "train_ppl": "668509", "train_wps": "210.9", "train_ups": "0.01", "train_wpb": "16384", "train_bsz": "8", "train_num_updates": "10", "train_lr": "0", "train_gnorm": "36.26", "train_loss_scale": "1", "train_train_wall": "667", "train_gb_free": "9.3", "train_wall": "804"} +2021-03-08 12:43:15 | INFO | fairseq_cli.train | done training in 798.6 seconds +``` + +

+ +### 13B params on 8 V100 GPUs (with full parameter + optimizer state sharding) + +FSDP can also shard the parameters and optimizer states across multiple GPUs, +reducing memory requirements significantly. On 8 x 32GB GPUs, sharding enables +training the same 13B parameter model *without offloading the parameters to +CPU*. However, without CPU offloading we'd only be able to fit a batch size of +1 per GPU, which would cause training speed to suffer. + +We obtain the best performance on 8 GPUs by combining full sharding and CPU +offloading. The following command trains the same 13B parameter GPT-3 model as +before on 8 x 32GB V100 GPUs; training speed increases superlinearly from ~310 +words per second to ~3200 words per second. + +```bash +OMP_NUM_THREADS=20 CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 \ + fairseq-train data-bin/wikitext-103-roberta-bpe-bin \ + --ddp-backend fully_sharded --fp16 --fp16-init-scale 4 \ + --cpu-offload --checkpoint-activations \ + --task language_modeling --tokens-per-sample 2048 --batch-size 8 \ + --arch transformer_lm_gpt3_13 \ + --optimizer cpu_adam --adam-betas "(0.9,0.98)" \ + --lr 0.0001 --lr-scheduler polynomial_decay --warmup-updates 5 --total-num-update 10 \ + --max-update 10 --no-save --log-format json --log-interval 1 +``` + +
+**Example output:**

+ +``` +(...) +2021-03-08 18:04:09 | INFO | fairseq_cli.train | num. model params: 13,110,865,920 (num. trained: 13,110,865,920) +(...) +2021-03-08 18:04:09 | INFO | fairseq_cli.train | training on 8 devices (GPUs/TPUs) +2021-03-08 18:04:09 | INFO | fairseq_cli.train | max tokens per GPU = None and batch size per GPU = 8 +(...) +Adam Optimizer #0 is created with AVX2 arithmetic capability. +Config: alpha=0.000100, betas=(0.900000, 0.980000), weight_decay=0.000000, adam_w=1 +(...) +2021-03-08 18:05:06 | INFO | train_inner | {"epoch": 1, "update": 0.001, "loss": "16.408", "ppl": "86945.6", "wps": "0", "ups": "0", "wpb": "131072", "bsz": "64", "num_updates": "1", "lr": "2e-05", "gnorm": "18.27", "loss_scale": "4", "train_wall": "47", "gb_free": "9.3", "wall": "56"} +2021-03-08 18:05:45 | INFO | train_inner | {"epoch": 1, "update": 0.002, "loss": "16.352", "ppl": "83644.3", "wps": "3283.4", "ups": "0.03", "wpb": "131072", "bsz": "64", "num_updates": "2", "lr": "4e-05", "gnorm": "18.411", "loss_scale": "4", "train_wall": "40", "gb_free": "9.3", "wall": "96"} +2021-03-08 18:06:21 | INFO | fairseq.trainer | NOTE: gradient overflow detected, ignoring gradient, setting loss scale to: 2.0 +2021-03-08 18:06:56 | INFO | fairseq.trainer | NOTE: gradient overflow detected, ignoring gradient, setting loss scale to: 1.0 +2021-03-08 18:07:37 | INFO | train_inner | {"epoch": 1, "update": 0.006, "loss": "23.682", "ppl": "1.34537e+07", "wps": "1176.6", "ups": "0.01", "wpb": "131072", "bsz": "64", "num_updates": "3", "lr": "6e-05", "gnorm": "119.682", "loss_scale": "1", "train_wall": "111", "gb_free": "9.3", "wall": "208"} +2021-03-08 18:08:18 | INFO | train_inner | {"epoch": 1, "update": 0.007, "loss": "18.988", "ppl": "519921", "wps": "3189.1", "ups": "0.02", "wpb": "131072", "bsz": "64", "num_updates": "4", "lr": "8e-05", "gnorm": "14.934", "loss_scale": "1", "train_wall": "41", "gb_free": "9.3", "wall": "249"} +2021-03-08 18:08:59 | INFO | train_inner | {"epoch": 1, "update": 0.008, "loss": "20.08", "ppl": "1.10798e+06", "wps": "3223.1", "ups": "0.02", "wpb": "131072", "bsz": "64", "num_updates": "5", "lr": "0.0001", "gnorm": "59.92", "loss_scale": "1", "train_wall": "41", "gb_free": "9.3", "wall": "289"} +2021-03-08 18:09:39 | INFO | train_inner | {"epoch": 1, "update": 0.009, "loss": "18.323", "ppl": "327980", "wps": "3256.6", "ups": "0.02", "wpb": "131072", "bsz": "64", "num_updates": "6", "lr": "8e-05", "gnorm": "37.425", "loss_scale": "1", "train_wall": "40", "gb_free": "9.3", "wall": "330"} +2021-03-08 18:10:20 | INFO | train_inner | {"epoch": 1, "update": 0.01, "loss": "17.264", "ppl": "157354", "wps": "3188.7", "ups": "0.02", "wpb": "131072", "bsz": "64", "num_updates": "7", "lr": "6e-05", "gnorm": "10.824", "loss_scale": "1", "train_wall": "41", "gb_free": "9.3", "wall": "371"} +2021-03-08 18:11:01 | INFO | train_inner | {"epoch": 1, "update": 0.011, "loss": "16.794", "ppl": "113647", "wps": "3230", "ups": "0.02", "wpb": "131072", "bsz": "64", "num_updates": "8", "lr": "4e-05", "gnorm": "5.616", "loss_scale": "1", "train_wall": "41", "gb_free": "9.3", "wall": "411"} +2021-03-08 18:11:39 | INFO | train_inner | {"epoch": 1, "update": 0.012, "loss": "16.706", "ppl": "106938", "wps": "3384", "ups": "0.03", "wpb": "131072", "bsz": "64", "num_updates": "9", "lr": "2e-05", "gnorm": "5.318", "loss_scale": "1", "train_wall": "39", "gb_free": "9.3", "wall": "450"} +2021-03-08 18:12:19 | INFO | train_inner | {"epoch": 1, "update": 0.013, "loss": "16.548", "ppl": "95796.2", "wps": "3274.4", "ups": "0.02", 
"wpb": "131072", "bsz": "64", "num_updates": "10", "lr": "0", "gnorm": "5.22", "loss_scale": "1", "train_wall": "40", "gb_free": "9.3", "wall": "490"} +2021-03-08 18:12:19 | INFO | fairseq_cli.train | Stopping training due to num_updates: 10 >= max_update: 10 +2021-03-08 18:12:19 | INFO | fairseq_cli.train | begin validation on "valid" subset +2021-03-08 18:12:45 | INFO | valid | {"epoch": 1, "valid_loss": "16.624", "valid_ppl": "101000", "valid_wps": "10855.9", "valid_wpb": "123202", "valid_bsz": "60.5", "valid_num_updates": "10"} +2021-03-08 18:12:45 | INFO | fairseq_cli.train | end of epoch 1 (average epoch stats below) +2021-03-08 18:12:45 | INFO | train | {"epoch": 1, "train_loss": "18.114", "train_ppl": "283776", "train_wps": "2567.8", "train_ups": "0.02", "train_wpb": "131072", "train_bsz": "64", "train_num_updates": "10", "train_lr": "0", "train_gnorm": "29.562", "train_loss_scale": "1", "train_train_wall": "480", "train_gb_free": "9.3", "train_wall": "516"} +2021-03-08 18:12:45 | INFO | fairseq_cli.train | done training in 509.9 seconds +``` + +

diff --git a/SpeechT5/fairseq/examples/gottbert/README.md b/SpeechT5/fairseq/examples/gottbert/README.md new file mode 100644 index 0000000000000000000000000000000000000000..1d58feb279a4a50222290546c3bb285d3cea98e6 --- /dev/null +++ b/SpeechT5/fairseq/examples/gottbert/README.md @@ -0,0 +1,64 @@ +# GottBERT: a pure German language model + +## Introduction + +[GottBERT](http://arxiv.org/abs/2012.02110) is a pretrained language model trained on 145GB of German text based on RoBERTa. + +## Example usage + +### fairseq +##### Load GottBERT from torch.hub (PyTorch >= 1.1): +```python +import torch +gottbert = torch.hub.load('pytorch/fairseq', 'gottbert-base') +gottbert.eval() # disable dropout (or leave in train mode to finetune) +``` + +##### Load GottBERT (for PyTorch 1.0 or custom models): +```python +# Download gottbert model +wget https://dl.gottbert.de/fairseq/models/gottbert-base.tar.gz +tar -xzvf gottbert.tar.gz + +# Load the model in fairseq +from fairseq.models.roberta import GottbertModel +gottbert = GottbertModel.from_pretrained('/path/to/gottbert') +gottbert.eval() # disable dropout (or leave in train mode to finetune) +``` + +##### Filling masks: +```python +masked_line = 'Gott ist ! :)' +gottbert.fill_mask(masked_line, topk=3) +# [('Gott ist gut ! :)', 0.3642110526561737, ' gut'), +# ('Gott ist überall ! :)', 0.06009674072265625, ' überall'), +# ('Gott ist großartig ! :)', 0.0370681993663311, ' großartig')] +``` + +##### Extract features from GottBERT + +```python +# Extract the last layer's features +line = "Der erste Schluck aus dem Becher der Naturwissenschaft macht atheistisch , aber auf dem Grunde des Bechers wartet Gott !" +tokens = gottbert.encode(line) +last_layer_features = gottbert.extract_features(tokens) +assert last_layer_features.size() == torch.Size([1, 27, 768]) + +# Extract all layer's features (layer 0 is the embedding layer) +all_layers = gottbert.extract_features(tokens, return_all_hiddens=True) +assert len(all_layers) == 13 +assert torch.all(all_layers[-1] == last_layer_features) +``` +## Citation +If you use our work, please cite: + +```bibtex +@misc{scheible2020gottbert, + title={GottBERT: a pure German Language Model}, + author={Raphael Scheible and Fabian Thomczyk and Patric Tippmann and Victor Jaravine and Martin Boeker}, + year={2020}, + eprint={2012.02110}, + archivePrefix={arXiv}, + primaryClass={cs.CL} +} +``` diff --git a/SpeechT5/fairseq/examples/hubert/README.md b/SpeechT5/fairseq/examples/hubert/README.md new file mode 100644 index 0000000000000000000000000000000000000000..3254b754f0272d3bc02a94ee8c33341f7d4a4bdf --- /dev/null +++ b/SpeechT5/fairseq/examples/hubert/README.md @@ -0,0 +1,116 @@ +# HuBERT + +## Pre-trained and fine-tuned (ASR) models +Model | Pretraining Data | Finetuning Dataset | Model +|---|---|---|--- +HuBERT Base (~95M params) | [Librispeech](http://www.openslr.org/12) 960 hr | No finetuning (Pretrained Model) | [download](https://dl.fbaipublicfiles.com/hubert/hubert_base_ls960.pt) +HuBERT Large (~316M params) | [Libri-Light](https://github.com/facebookresearch/libri-light) 60k hr | No finetuning (Pretrained Model) | [download](https://dl.fbaipublicfiles.com/hubert/hubert_large_ll60k.pt) +HuBERT Extra Large (~1B params) | [Libri-Light](https://github.com/facebookresearch/libri-light) 60k hr | No finetuning (Pretrained Model) | [download](https://dl.fbaipublicfiles.com/hubert/hubert_xtralarge_ll60k.pt) +HuBERT Large | [Libri-Light](https://github.com/facebookresearch/libri-light) 60k hr | [Librispeech](http://www.openslr.org/12) 
960 hr | [download](https://dl.fbaipublicfiles.com/hubert/hubert_large_ll60k_finetune_ls960.pt) +HuBERT Extra Large | [Libri-Light](https://github.com/facebookresearch/libri-light) 60k hr | [Librispeech](http://www.openslr.org/12) 960 hr | [download](https://dl.fbaipublicfiles.com/hubert/hubert_xtralarge_ll60k_finetune_ls960.pt) + +## Load a pretrained model +``` +ckpt_path = "/path/to/the/checkpoint.pt" +models, cfg, task = fairseq.checkpoint_utils.load_model_ensemble_and_task([ckpt_path], strict=False) +model = models[0] +``` +** We will follow-up with a patch such that you wouldn't need to pass `strict=False` for loading the checkpoint in future. + +## Train a new model + +### Data preparation + +Follow the steps in `./simple_kmeans` to create: +- `{train,valid}.tsv` waveform list files +- `{train,valid}.km` frame-aligned pseudo label files. +The `label_rate` is the same as the feature frame rate used for clustering, +which is 100Hz for MFCC features and 50Hz for HuBERT features by default. + +### Pre-train a HuBERT model + +Suppose `{train,valid}.tsv` are saved at `/path/to/data`, `{train,valid}.km` +are saved at `/path/to/labels`, and the label rate is 100Hz. + +To train a base model (12 layer transformer), run: +```sh +$ python fairseq_cli/hydra_train.py \ + --config-dir /path/to/fairseq-py/examples/hubert/config/pretrain \ + --config-name hubert_base_librispeech \ + task.data=/path/to/data task.label_dir=/path/to/labels model.label_rate=100 +``` + +### Fine-tune a HuBERT model with a CTC loss + +Suppose `{train,valid}.tsv` are saved at `/path/to/data`, and their +corresponding character transcripts `{train,valid}.ltr` are saved at +`/path/to/trans`. + +To fine-tune a pre-trained HuBERT model at `/path/to/checkpoint`, run +```sh +$ python fairseq_cli/hydra_train.py \ + --config-dir /path/to/fairseq-py/examples/hubert/config/finetune \ + --config-name base_10h \ + task.data=/path/to/data task.label_dir=/path/to/trans \ + model.w2v_path=/path/to/checkpoint +``` + +### Decode a HuBERT model + +Suppose the `test.tsv` and `test.ltr` are the waveform list and transcripts of +the split to be decoded, saved at `/path/to/data`, and the fine-tuned model is +saved at `/path/to/checkpoint`. We support three decoding modes: +- Viterbi decoding: greedy decoding without a language model +- KenLM decoding: decoding with an arpa-format KenLM n-gram language model +- Fairseq-LM deocding: decoding with a Fairseq neural language model + + +#### Viterbi decoding + +`task.normalize` needs to be consistent with the value used during fine-tuning. +Decoding results will be saved at +`/path/to/experiment/directory/decode/viterbi/test`. + +```sh +$ python examples/speech_recognition/new/infer.py \ + --config-dir /path/to/fairseq-py/examples/hubert/config/decode \ + --config-name infer_viterbi \ + task.data=/path/to/data \ + task.normalize=[true|false] \ + decoding.exp_dir=/path/to/experiment/directory \ + common_eval.path=/path/to/checkpoint + dataset.gen_subset=test \ +``` + +#### KenLM / Fairseq-LM decoding + +Suppose the pronunciation lexicon and the n-gram LM are saved at +`/path/to/lexicon` and `/path/to/arpa`, respectively. Decoding results will be +saved at `/path/to/experiment/directory/decode/kenlm/test`. 
+ +```sh +$ python examples/speech_recognition/new/infer.py \ + --config-dir /path/to/fairseq-py/examples/hubert/config/decode \ + --config-name infer_kenlm \ + task.data=/path/to/data \ + task.normalize=[true|false] \ + decoding.exp_dir=/path/to/experiment/directory \ + common_eval.path=/path/to/checkpoint + dataset.gen_subset=test \ + decoding.decoder.lexicon=/path/to/lexicon \ + decoding.decoder.lmpath=/path/to/arpa +``` + +The command above uses the default decoding hyperparameter, which can be found +in `examples/speech_recognition/hydra/decoder.py`. These parameters can be +configured from the command line. For example, to search with a beam size of +500, we can append the command above with `decoding.decoder.beam=500`. +Important parameters include: +- decoding.decoder.beam +- decoding.decoder.beamthreshold +- decoding.decoder.lmweight +- decoding.decoder.wordscore +- decoding.decoder.silweight + +To decode with a Fairseq LM, use `--config-name infer_fsqlm` instead, and +change the path of lexicon and LM accordingly. diff --git a/SpeechT5/fairseq/examples/hubert/config/decode/ax_sweep/ngram.yaml b/SpeechT5/fairseq/examples/hubert/config/decode/ax_sweep/ngram.yaml new file mode 100644 index 0000000000000000000000000000000000000000..5a02df1f7da7eebfebe4018ef2758a716fbab646 --- /dev/null +++ b/SpeechT5/fairseq/examples/hubert/config/decode/ax_sweep/ngram.yaml @@ -0,0 +1,33 @@ +# @package _global_ + +common_eval: + results_path: ${decoding.exp_dir}/decode/${decoding.decoder.name}_ax/${dataset.gen_subset} + +hydra: + sweeper: + ax_config: + max_trials: 60 + early_stop: + minimize: true + max_epochs_without_improvement: 10 + epsilon: 0.025 + experiment: + name: ${dataset.gen_subset} + objective_name: wer + minimize: true + parameter_constraints: null + outcome_constraints: null + status_quo: null + client: + verbose_logging: false + random_seed: null + params: + decoding.decoder.lmweight: + type: range + bounds: [0.0, 8.0] + decoding.decoder.wordscore: + type: range + bounds: [-5.0, 5.0] + decoding.decoder.silweight: + type: range + bounds: [-10.0, 0.0] diff --git a/SpeechT5/fairseq/examples/hubert/config/decode/ax_sweep/transformer.yaml b/SpeechT5/fairseq/examples/hubert/config/decode/ax_sweep/transformer.yaml new file mode 100644 index 0000000000000000000000000000000000000000..85ed3bd1a5a44871260f572786044c28f441add6 --- /dev/null +++ b/SpeechT5/fairseq/examples/hubert/config/decode/ax_sweep/transformer.yaml @@ -0,0 +1,33 @@ +# @package _global_ + +common_eval: + results_path: ${decoding.exp_dir}/decode/${decoding.decoder.name}_ax/${dataset.gen_subset} + +hydra: + sweeper: + ax_config: + max_trials: 60 + early_stop: + minimize: true + max_epochs_without_improvement: 10 + epsilon: 0.025 + experiment: + name: ${dataset.gen_subset} + objective_name: wer + minimize: true + parameter_constraints: null + outcome_constraints: null + status_quo: null + client: + verbose_logging: false + random_seed: null + params: + decoding.decoder.lmweight: + type: range + bounds: [0.0, 4.0] + decoding.decoder.wordscore: + type: range + bounds: [-5.0, 5.0] + decoding.decoder.silweight: + type: range + bounds: [-8.0, 0.0] diff --git a/SpeechT5/fairseq/examples/hubert/config/decode/infer_fsqlm.yaml b/SpeechT5/fairseq/examples/hubert/config/decode/infer_fsqlm.yaml new file mode 100644 index 0000000000000000000000000000000000000000..bc77cab32e156f393a2c3eae336392d1796b8a95 --- /dev/null +++ b/SpeechT5/fairseq/examples/hubert/config/decode/infer_fsqlm.yaml @@ -0,0 +1,36 @@ +# @package _group_ + +defaults: + - 
model: null + +hydra: + run: + dir: ${common_eval.results_path}/beam${decoding.decoder.beam}_lmw${decoding.decoder.lmweight}_wrd${decoding.decoder.wordscore}_sil${decoding.decoder.silweight} + sweep: + dir: ${common_eval.results_path} + subdir: beam${decoding.decoder.beam}_th${decoding.decoder.beamthreshold}_lmw${decoding.decoder.lmweight}_wrd${decoding.decoder.wordscore}_sil${decoding.decoder.silweight} + +task: + _name: hubert_pretraining + single_target: true + data: ??? + normalize: ??? + +decoding: + type: fairseqlm + lexicon: ??? + lmpath: ??? + beamthreshold: 25 # 100 + beam: 500 + lmweight: 2 + wordscore: -1 + silweight: 0 + unique_wer_file: true + beam: 500 +common_eval: + results_path: ??? + path: ??? + post_process: letter +dataset: + max_tokens: 1100000 + gen_subset: ??? diff --git a/SpeechT5/fairseq/examples/hubert/config/decode/infer_kenlm.yaml b/SpeechT5/fairseq/examples/hubert/config/decode/infer_kenlm.yaml new file mode 100644 index 0000000000000000000000000000000000000000..26f5c48928f67af84609ce49e41b93905dafb3ec --- /dev/null +++ b/SpeechT5/fairseq/examples/hubert/config/decode/infer_kenlm.yaml @@ -0,0 +1,36 @@ +# @package _group_ + +defaults: + - model: null + +hydra: + run: + dir: ${common_eval.results_path}/beam${decoding.decoder.beam}_lmw${decoding.decoder.lmweight}_wrd${decoding.decoder.wordscore}_sil${decoding.decoder.silweight} + sweep: + dir: ${common_eval.results_path} + subdir: beam${decoding.decoder.beam}_th${decoding.decoder.beamthreshold}_lmw${decoding.decoder.lmweight}_wrd${decoding.decoder.wordscore}_sil${decoding.decoder.silweight} + +task: + _name: hubert_pretraining + single_target: true + data: ??? + normalize: ??? + +decoding: + type: kenlm + lexicon: ??? + lmpath: ??? + beamthreshold: 100 + beam: 500 + lmweight: 2 + wordscore: -1 + silweight: 0 + unique_wer_file: true + beam: 500 +common_eval: + results_path: ??? + path: ??? + post_process: letter +dataset: + max_tokens: 1100000 + gen_subset: ??? diff --git a/SpeechT5/fairseq/examples/hubert/config/decode/infer_viterbi.yaml b/SpeechT5/fairseq/examples/hubert/config/decode/infer_viterbi.yaml new file mode 100644 index 0000000000000000000000000000000000000000..935d7d1d013136090e5a0154d70a1266d230ee96 --- /dev/null +++ b/SpeechT5/fairseq/examples/hubert/config/decode/infer_viterbi.yaml @@ -0,0 +1,31 @@ +# @package _group_ + +defaults: + - model: null + +hydra: + run: + dir: ${common_eval.results_path}/beam${decoding.decoder.beam}_lmw${decoding.decoder.lmweight}_wrd${decoding.decoder.wordscore}_sil${decoding.decoder.silweight} + sweep: + dir: ${common_eval.results_path} + subdir: beam${decoding.decoder.beam}_th${decoding.decoder.beamthreshold}_lmw${decoding.decoder.lmweight}_wrd${decoding.decoder.wordscore}_sil${decoding.decoder.silweight} + +task: + _name: hubert_pretraining + single_target: true + data: ??? + normalize: ??? + +decoding: + type: viterbi + unique_wer_file: true +common_eval: + results_path: ??? + path: ??? + post_process: letter +generation: + nbest: 1 + beam: 500 +dataset: + max_tokens: 1100000 + gen_subset: ??? 
diff --git a/SpeechT5/fairseq/examples/hubert/config/decode/run/submitit_slurm.yaml b/SpeechT5/fairseq/examples/hubert/config/decode/run/submitit_slurm.yaml new file mode 100644 index 0000000000000000000000000000000000000000..0b8065832ecacf9dd4fe4e99c87941e00fb3ef7f --- /dev/null +++ b/SpeechT5/fairseq/examples/hubert/config/decode/run/submitit_slurm.yaml @@ -0,0 +1,17 @@ +# @package _global_ +hydra: + launcher: + cpus_per_task: ${distributed_training.distributed_world_size} + gpus_per_node: ${distributed_training.distributed_world_size} + tasks_per_node: ${hydra.launcher.gpus_per_node} + nodes: 1 + mem_gb: 200 + timeout_min: 4320 + max_num_timeout: 50 + name: ${hydra.job.config_name} + submitit_folder: ${hydra.sweep.dir}/submitit + +distributed_training: + distributed_world_size: 1 + distributed_no_spawn: true + distributed_port: 29761 diff --git a/SpeechT5/fairseq/examples/hubert/config/decode/run/submitit_slurm_8gpu.yaml b/SpeechT5/fairseq/examples/hubert/config/decode/run/submitit_slurm_8gpu.yaml new file mode 100644 index 0000000000000000000000000000000000000000..2f669f376312dbfe4611cc08f4996a314155fb87 --- /dev/null +++ b/SpeechT5/fairseq/examples/hubert/config/decode/run/submitit_slurm_8gpu.yaml @@ -0,0 +1,17 @@ +# @package _global_ +hydra: + launcher: + cpus_per_task: ${distributed_training.distributed_world_size} + gpus_per_node: ${distributed_training.distributed_world_size} + tasks_per_node: ${hydra.launcher.gpus_per_node} + nodes: 1 + mem_gb: 200 + timeout_min: 4320 + max_num_timeout: 50 + name: ${hydra.job.config_name} + submitit_folder: ${hydra.sweep.dir}/submitit + +distributed_training: + distributed_world_size: 8 + distributed_no_spawn: true + distributed_port: 29761 diff --git a/SpeechT5/fairseq/examples/hubert/config/finetune/base_10h.yaml b/SpeechT5/fairseq/examples/hubert/config/finetune/base_10h.yaml new file mode 100644 index 0000000000000000000000000000000000000000..a22c7c0347f792221f209bcfba7ba380a69f90a8 --- /dev/null +++ b/SpeechT5/fairseq/examples/hubert/config/finetune/base_10h.yaml @@ -0,0 +1,100 @@ +# @package _group_ + +common: + fp16: true + log_format: json + log_interval: 200 + tensorboard_logdir: tblog + seed: 1337 + +checkpoint: + save_interval: 5 + keep_interval_updates: 1 + no_epoch_checkpoints: true + best_checkpoint_metric: wer + +distributed_training: + ddp_backend: c10d + find_unused_parameters: true + distributed_world_size: 1 + distributed_port: 29671 + nprocs_per_node: 8 + +task: + _name: hubert_pretraining + data: ??? + fine_tuning: true + label_dir: ??? + normalize: false # must be consistent with pre-training + labels: ["ltr"] + single_target: true + +dataset: + num_workers: 0 + max_tokens: 3200000 + validate_after_updates: ${model.freeze_finetune_updates} + validate_interval: 5 + train_subset: train + valid_subset: valid + +criterion: + _name: ctc + zero_infinity: true + +optimization: + max_update: 25000 + lr: [2e-5] + sentence_avg: true + update_freq: [1] + +optimizer: + _name: adam + adam_betas: (0.9,0.98) + adam_eps: 1e-08 + +lr_scheduler: + _name: tri_stage + warmup_steps: 8000 + hold_steps: 0 + decay_steps: 72000 + final_lr_scale: 0.05 + +model: + _name: hubert_ctc + w2v_path: ??? 
+ apply_mask: true + mask_selection: static + mask_length: 10 + mask_other: 0 + mask_prob: 0.75 + mask_channel_selection: static + mask_channel_length: 64 + mask_channel_other: 0 + mask_channel_prob: 0.5 + layerdrop: 0.1 + dropout: 0.0 + activation_dropout: 0.1 + attention_dropout: 0.0 + feature_grad_mult: 0.0 + freeze_finetune_updates: 10000 + +hydra: + job: + config: + override_dirname: + kv_sep: '-' + item_sep: '__' + exclude_keys: + - run + - task.data + - task.label_dir + - model.w2v_path + - dataset.train_subset + - dataset.valid_subset + - criterion.wer_kenlm_model + - criterion.wer_lexicon + run: + dir: ??? + sweep: + dir: ??? + subdir: ${hydra.job.config_name}__${hydra.job.override_dirname} diff --git a/SpeechT5/fairseq/examples/hubert/config/finetune/ckpt/it1.yaml b/SpeechT5/fairseq/examples/hubert/config/finetune/ckpt/it1.yaml new file mode 100644 index 0000000000000000000000000000000000000000..2af96b3f72746f85feb13e7efcbdab6602b293de --- /dev/null +++ b/SpeechT5/fairseq/examples/hubert/config/finetune/ckpt/it1.yaml @@ -0,0 +1,7 @@ +# @package _global_ + +task: + normalize: false + +model: + w2v_path: /checkpoint/wnhsu/w2v/hubert_final/iter1/hubert.km.randcrop.pmw1_0.puw0_0.grpnorm.ml10.mp0_8.untie.mxsz250000.ufreq1.maxtok1400000.MU400k.s1337.ngpu32/checkpoint_last.pt diff --git a/SpeechT5/fairseq/examples/hubert/config/finetune/lm/ls_4gram.yaml b/SpeechT5/fairseq/examples/hubert/config/finetune/lm/ls_4gram.yaml new file mode 100644 index 0000000000000000000000000000000000000000..8c7728ad29965d3cf18605808a893bc442afd56b --- /dev/null +++ b/SpeechT5/fairseq/examples/hubert/config/finetune/lm/ls_4gram.yaml @@ -0,0 +1,7 @@ +# @package _global_ + +criterion: + wer_kenlm_model: /checkpoint/abdo/old_checkpoint02/datasets/librispeech/4-gram.bin + wer_lexicon: /checkpoint/abdo/old_checkpoint02/datasets/librispeech/10h/raw/lexicon_ltr.lst + wer_lm_weight: 2.0 + wer_word_score: -1.0 diff --git a/SpeechT5/fairseq/examples/hubert/config/finetune/run/submitit_reg.yaml b/SpeechT5/fairseq/examples/hubert/config/finetune/run/submitit_reg.yaml new file mode 100644 index 0000000000000000000000000000000000000000..27509503e7b306c07742fbed2fc5726d001bb7df --- /dev/null +++ b/SpeechT5/fairseq/examples/hubert/config/finetune/run/submitit_reg.yaml @@ -0,0 +1,20 @@ +# @package _global_ + +hydra: + launcher: + cpus_per_task: 8 + gpus_per_node: 8 + tasks_per_node: ${hydra.launcher.gpus_per_node} + nodes: 1 + comment: null + mem_gb: 384 + timeout_min: 4320 + max_num_timeout: 100 + constraint: volta32gb + name: ${hydra.job.config_name}/${hydra.job.override_dirname} + submitit_folder: ${hydra.sweep.dir}/submitit/%j + +distributed_training: + distributed_world_size: 8 + distributed_port: 29671 + nprocs_per_node: 8 diff --git a/SpeechT5/fairseq/examples/hubert/config/pretrain/data/iter1.yaml b/SpeechT5/fairseq/examples/hubert/config/pretrain/data/iter1.yaml new file mode 100644 index 0000000000000000000000000000000000000000..0a1b65d802c83128c53f32b21807fa5e51da6cc9 --- /dev/null +++ b/SpeechT5/fairseq/examples/hubert/config/pretrain/data/iter1.yaml @@ -0,0 +1,8 @@ +# @package _global_ + +task: + label_dir: ??? 
+ labels: ["km"] + +model: + label_rate: 100 diff --git a/SpeechT5/fairseq/examples/hubert/config/pretrain/data/iter2.yaml b/SpeechT5/fairseq/examples/hubert/config/pretrain/data/iter2.yaml new file mode 100644 index 0000000000000000000000000000000000000000..2d4bfe61cc638af9de48e92c58994e435fba2abf --- /dev/null +++ b/SpeechT5/fairseq/examples/hubert/config/pretrain/data/iter2.yaml @@ -0,0 +1,8 @@ +# @package _global_ + +task: + label_dir: ??? + labels: ["km"] + +model: + label_rate: 50 diff --git a/SpeechT5/fairseq/examples/hubert/config/pretrain/hubert_base_librispeech.yaml b/SpeechT5/fairseq/examples/hubert/config/pretrain/hubert_base_librispeech.yaml new file mode 100644 index 0000000000000000000000000000000000000000..bd84461a163866f622b01bf6d36b4de6215f3d97 --- /dev/null +++ b/SpeechT5/fairseq/examples/hubert/config/pretrain/hubert_base_librispeech.yaml @@ -0,0 +1,97 @@ +# @package _group_ + +common: + fp16: true + log_format: json + log_interval: 200 + seed: 1337 + tensorboard_logdir: tblog + +checkpoint: + save_interval_updates: 25000 + keep_interval_updates: 1 + no_epoch_checkpoints: true + + +distributed_training: + ddp_backend: no_c10d + distributed_backend: 'nccl' + distributed_world_size: 32 + distributed_port: 29671 + nprocs_per_node: 8 + find_unused_parameters: true + +task: + _name: hubert_pretraining + data: ??? + label_dir: ??? + labels: ??? + label_rate: ${model.label_rate} + sample_rate: 16000 + max_sample_size: 250000 + min_sample_size: 32000 + pad_audio: false + random_crop: true + normalize: false # must be consistent with extractor + +dataset: + num_workers: 6 + max_tokens: 1400000 + skip_invalid_size_inputs_valid_test: true + validate_interval: 5 + validate_interval_updates: 10000 + +criterion: + _name: hubert + pred_masked_weight: 1.0 + pred_nomask_weight: 0.0 + loss_weights: [10,] + +optimization: + max_update: 400000 + lr: [0.0005] + clip_norm: 10.0 + +optimizer: + _name: adam + adam_betas: (0.9,0.98) + adam_eps: 1e-06 + weight_decay: 0.01 + +lr_scheduler: + _name: polynomial_decay + warmup_updates: 32000 + +model: + _name: hubert + label_rate: ??? + skip_masked: false + skip_nomask: false + mask_prob: 0.80 + extractor_mode: default + conv_feature_layers: '[(512,10,5)] + [(512,3,2)] * 4 + [(512,2,2)] * 2' + final_dim: 256 + encoder_layerdrop: 0.05 + dropout_input: 0.1 + dropout_features: 0.1 + dropout: 0.1 + attention_dropout: 0.1 + feature_grad_mult: 0.1 + untie_final_proj: true + activation_dropout: 0.0 + +hydra: + job: + config: + override_dirname: + kv_sep: '-' + item_sep: '__' + exclude_keys: + - run + - task.data + - task.label_dir + run: + dir: ??? + sweep: + dir: ??? 
+ subdir: ${hydra.job.config_name}__${hydra.job.override_dirname} diff --git a/SpeechT5/fairseq/examples/hubert/config/pretrain/hubert_large_librivox.yaml b/SpeechT5/fairseq/examples/hubert/config/pretrain/hubert_large_librivox.yaml new file mode 100644 index 0000000000000000000000000000000000000000..a5192b5f29b53aa8391a0ab67b6238c0d0b4985e --- /dev/null +++ b/SpeechT5/fairseq/examples/hubert/config/pretrain/hubert_large_librivox.yaml @@ -0,0 +1,101 @@ +# @package _group_ + +common: + fp16: true + log_format: json + log_interval: 200 + seed: 1337 + tensorboard_logdir: tblog + +checkpoint: + save_interval_updates: 25000 + keep_interval_updates: 1 + no_epoch_checkpoints: true + + +distributed_training: + ddp_backend: no_c10d + distributed_backend: 'nccl' + distributed_world_size: 128 + distributed_port: 29671 + nprocs_per_node: 8 + find_unused_parameters: true + +task: + _name: hubert_pretraining + data: ??? + label_dir: ??? + labels: ??? + label_rate: ${model.label_rate} + sample_rate: 16000 + max_sample_size: 250000 + min_sample_size: 32000 + pad_audio: false + random_crop: true + normalize: true # must be consistent with extractor + +dataset: + num_workers: 6 + max_tokens: 900000 + skip_invalid_size_inputs_valid_test: true + validate_interval: 5 + validate_interval_updates: 10000 + +criterion: + _name: hubert + pred_masked_weight: 1.0 + pred_nomask_weight: 0.0 + loss_weights: [10,] + +optimization: + max_update: 400000 + lr: [0.0015] + clip_norm: 1.0 + +optimizer: + _name: adam + adam_betas: (0.9,0.98) + adam_eps: 1e-06 + weight_decay: 0.01 + +lr_scheduler: + _name: polynomial_decay + warmup_updates: 32000 + +model: + _name: hubert + label_rate: ??? + encoder_layers: 24 + encoder_embed_dim: 1024 + encoder_ffn_embed_dim: 4096 + encoder_attention_heads: 16 + final_dim: 768 + skip_masked: false + skip_nomask: false + mask_prob: 0.80 + extractor_mode: layer_norm + conv_feature_layers: '[(512,10,5)] + [(512,3,2)] * 4 + [(512,2,2)] * 2' + encoder_layerdrop: 0.0 + dropout_input: 0.0 + dropout_features: 0.0 + dropout: 0.0 + attention_dropout: 0.0 + layer_norm_first: true + feature_grad_mult: 1.0 + untie_final_proj: true + activation_dropout: 0.0 + +hydra: + job: + config: + override_dirname: + kv_sep: '-' + item_sep: '__' + exclude_keys: + - run + - task.data + run: + dir: /checkpoint/wnhsu/w2v/hubert_final/hydra_pt + sweep: + dir: /checkpoint/wnhsu/w2v/hubert_final/hydra_pt + subdir: ${hydra.job.config_name}__${hydra.job.override_dirname} diff --git a/SpeechT5/fairseq/examples/hubert/config/pretrain/hubert_xlarge_librivox.yaml b/SpeechT5/fairseq/examples/hubert/config/pretrain/hubert_xlarge_librivox.yaml new file mode 100644 index 0000000000000000000000000000000000000000..34e8f2bfb93863db122f694785b80857713ceb05 --- /dev/null +++ b/SpeechT5/fairseq/examples/hubert/config/pretrain/hubert_xlarge_librivox.yaml @@ -0,0 +1,101 @@ +# @package _group_ + +common: + fp16: true + log_format: json + log_interval: 200 + seed: 1337 + tensorboard_logdir: tblog + +checkpoint: + save_interval_updates: 25000 + keep_interval_updates: 1 + no_epoch_checkpoints: true + + +distributed_training: + ddp_backend: no_c10d + distributed_backend: 'nccl' + distributed_world_size: 256 + distributed_port: 29671 + nprocs_per_node: 8 + find_unused_parameters: true + +task: + _name: hubert_pretraining + data: ??? + label_dir: ??? + labels: ??? 
+ label_rate: ${model.label_rate} + sample_rate: 16000 + max_sample_size: 250000 + min_sample_size: 32000 + pad_audio: false + random_crop: true + normalize: true # must be consistent with extractor + +dataset: + num_workers: 6 + max_tokens: 360000 + skip_invalid_size_inputs_valid_test: true + validate_interval: 5 + validate_interval_updates: 10000 + +criterion: + _name: hubert + pred_masked_weight: 1.0 + pred_nomask_weight: 0.0 + loss_weights: [10,] + +optimization: + max_update: 400000 + lr: [0.003] + clip_norm: 1.0 + +optimizer: + _name: adam + adam_betas: (0.9,0.98) + adam_eps: 1e-06 + weight_decay: 0.01 + +lr_scheduler: + _name: polynomial_decay + warmup_updates: 32000 + +model: + _name: hubert + label_rate: ??? + encoder_layers: 48 + encoder_embed_dim: 1280 + encoder_ffn_embed_dim: 5120 + encoder_attention_heads: 16 + final_dim: 1024 + skip_masked: false + skip_nomask: false + mask_prob: 0.80 + extractor_mode: layer_norm + conv_feature_layers: '[(512,10,5)] + [(512,3,2)] * 4 + [(512,2,2)] * 2' + encoder_layerdrop: 0.0 + dropout_input: 0.0 + dropout_features: 0.0 + dropout: 0.0 + attention_dropout: 0.0 + layer_norm_first: true + feature_grad_mult: 1.0 + untie_final_proj: true + activation_dropout: 0.0 + +hydra: + job: + config: + override_dirname: + kv_sep: '-' + item_sep: '__' + exclude_keys: + - run + - task.data + run: + dir: /checkpoint/wnhsu/w2v/hubert_final/hydra_pt + sweep: + dir: /checkpoint/wnhsu/w2v/hubert_final/hydra_pt + subdir: ${hydra.job.config_name}__${hydra.job.override_dirname} diff --git a/SpeechT5/fairseq/examples/hubert/config/pretrain/run/submitit_reg.yaml b/SpeechT5/fairseq/examples/hubert/config/pretrain/run/submitit_reg.yaml new file mode 100644 index 0000000000000000000000000000000000000000..46c979cd2835fe026b0a532a54533904d1001e54 --- /dev/null +++ b/SpeechT5/fairseq/examples/hubert/config/pretrain/run/submitit_reg.yaml @@ -0,0 +1,20 @@ +# @package _global_ + +hydra: + launcher: + cpus_per_task: 8 + gpus_per_node: 8 + tasks_per_node: ${hydra.launcher.gpus_per_node} + nodes: 4 + comment: null + mem_gb: 384 + timeout_min: 4320 + max_num_timeout: 100 + constraint: volta32gb + name: ${hydra.job.config_name}/${hydra.job.override_dirname} + submitit_folder: ${hydra.sweep.dir}/submitit/%j + +distributed_training: + distributed_world_size: 32 + distributed_port: 29671 + nprocs_per_node: 8 diff --git a/SpeechT5/fairseq/examples/hubert/measure_teacher_quality.py b/SpeechT5/fairseq/examples/hubert/measure_teacher_quality.py new file mode 100644 index 0000000000000000000000000000000000000000..92279b2214bb2ba4a99aea92098907ef4f55821b --- /dev/null +++ b/SpeechT5/fairseq/examples/hubert/measure_teacher_quality.py @@ -0,0 +1,241 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. 
+ +import numpy as np +import os.path as op +import re +from tabulate import tabulate +from collections import Counter + + +def comp_purity(p_xy, axis): + max_p = p_xy.max(axis=axis) + marg_p = p_xy.sum(axis=axis) + indv_pur = max_p / marg_p + aggr_pur = max_p.sum() + return indv_pur, aggr_pur + + +def comp_entropy(p): + return (-p * np.log(p + 1e-8)).sum() + + +def comp_norm_mutual_info(p_xy): + p_x = p_xy.sum(axis=1, keepdims=True) + p_y = p_xy.sum(axis=0, keepdims=True) + pmi = np.log(p_xy / np.matmul(p_x, p_y) + 1e-8) + mi = (p_xy * pmi).sum() + h_x = comp_entropy(p_x) + h_y = comp_entropy(p_y) + return mi, mi / h_x, mi / h_y, h_x, h_y + + +def pad(labs, n): + if n == 0: + return np.array(labs) + return np.concatenate([[labs[0]] * n, labs, [labs[-1]] * n]) + + +def comp_avg_seg_dur(labs_list): + n_frms = 0 + n_segs = 0 + for labs in labs_list: + labs = np.array(labs) + edges = np.zeros(len(labs)).astype(bool) + edges[0] = True + edges[1:] = labs[1:] != labs[:-1] + n_frms += len(edges) + n_segs += edges.astype(int).sum() + return n_frms / n_segs + + +def comp_joint_prob(uid2refs, uid2hyps): + """ + Args: + pad: padding for spliced-feature derived labels + """ + cnts = Counter() + skipped = [] + abs_frmdiff = 0 + for uid in uid2refs: + if uid not in uid2hyps: + skipped.append(uid) + continue + refs = uid2refs[uid] + hyps = uid2hyps[uid] + abs_frmdiff += abs(len(refs) - len(hyps)) + min_len = min(len(refs), len(hyps)) + refs = refs[:min_len] + hyps = hyps[:min_len] + cnts.update(zip(refs, hyps)) + tot = sum(cnts.values()) + + ref_set = sorted({ref for ref, _ in cnts.keys()}) + hyp_set = sorted({hyp for _, hyp in cnts.keys()}) + ref2pid = dict(zip(ref_set, range(len(ref_set)))) + hyp2lid = dict(zip(hyp_set, range(len(hyp_set)))) + # print(hyp_set) + p_xy = np.zeros((len(ref2pid), len(hyp2lid)), dtype=float) + for (ref, hyp), cnt in cnts.items(): + p_xy[ref2pid[ref], hyp2lid[hyp]] = cnt + p_xy /= p_xy.sum() + return p_xy, ref2pid, hyp2lid, tot, abs_frmdiff, skipped + + +def read_phn(tsv_path, rm_stress=True): + uid2phns = {} + with open(tsv_path) as f: + for line in f: + uid, phns = line.rstrip().split("\t") + phns = phns.split(",") + if rm_stress: + phns = [re.sub("[0-9]", "", phn) for phn in phns] + uid2phns[uid] = phns + return uid2phns + + +def read_lab(tsv_path, lab_path, pad_len=0, upsample=1): + """ + tsv is needed to retrieve the uids for the labels + """ + with open(tsv_path) as f: + f.readline() + uids = [op.splitext(op.basename(line.rstrip().split()[0]))[0] for line in f] + with open(lab_path) as f: + labs_list = [pad(line.rstrip().split(), pad_len).repeat(upsample) for line in f] + assert len(uids) == len(labs_list) + return dict(zip(uids, labs_list)) + + +def main_lab_lab( + tsv_dir, + lab_dir, + lab_name, + lab_sets, + ref_dir, + ref_name, + pad_len=0, + upsample=1, + verbose=False, +): + # assume tsv_dir is the same for both the reference and the hypotheses + tsv_dir = lab_dir if tsv_dir is None else tsv_dir + + uid2refs = {} + for s in lab_sets: + uid2refs.update(read_lab(f"{tsv_dir}/{s}.tsv", f"{ref_dir}/{s}.{ref_name}")) + + uid2hyps = {} + for s in lab_sets: + uid2hyps.update( + read_lab( + f"{tsv_dir}/{s}.tsv", f"{lab_dir}/{s}.{lab_name}", pad_len, upsample + ) + ) + _main(uid2refs, uid2hyps, verbose) + + +def main_phn_lab( + tsv_dir, + lab_dir, + lab_name, + lab_sets, + phn_dir, + phn_sets, + pad_len=0, + upsample=1, + verbose=False, +): + uid2refs = {} + for s in phn_sets: + uid2refs.update(read_phn(f"{phn_dir}/{s}.tsv")) + + uid2hyps = {} + tsv_dir = lab_dir if 
tsv_dir is None else tsv_dir + for s in lab_sets: + uid2hyps.update( + read_lab( + f"{tsv_dir}/{s}.tsv", f"{lab_dir}/{s}.{lab_name}", pad_len, upsample + ) + ) + _main(uid2refs, uid2hyps, verbose) + + +def _main(uid2refs, uid2hyps, verbose): + (p_xy, ref2pid, hyp2lid, tot, frmdiff, skipped) = comp_joint_prob( + uid2refs, uid2hyps + ) + ref_pur_by_hyp, ref_pur = comp_purity(p_xy, axis=0) + hyp_pur_by_ref, hyp_pur = comp_purity(p_xy, axis=1) + (mi, mi_norm_by_ref, mi_norm_by_hyp, h_ref, h_hyp) = comp_norm_mutual_info(p_xy) + outputs = { + "ref pur": ref_pur, + "hyp pur": hyp_pur, + "H(ref)": h_ref, + "H(hyp)": h_hyp, + "MI": mi, + "MI/H(ref)": mi_norm_by_ref, + "ref segL": comp_avg_seg_dur(uid2refs.values()), + "hyp segL": comp_avg_seg_dur(uid2hyps.values()), + "p_xy shape": p_xy.shape, + "frm tot": tot, + "frm diff": frmdiff, + "utt tot": len(uid2refs), + "utt miss": len(skipped), + } + print(tabulate([outputs.values()], outputs.keys(), floatfmt=".4f")) + + +if __name__ == "__main__": + """ + compute quality of labels with respect to phone or another labels if set + """ + import argparse + + parser = argparse.ArgumentParser() + parser.add_argument("tsv_dir") + parser.add_argument("lab_dir") + parser.add_argument("lab_name") + parser.add_argument("--lab_sets", default=["valid"], type=str, nargs="+") + parser.add_argument( + "--phn_dir", + default="/checkpoint/wnhsu/data/librispeech/960h/fa/raw_phn/phone_frame_align_v1", + ) + parser.add_argument( + "--phn_sets", default=["dev-clean", "dev-other"], type=str, nargs="+" + ) + parser.add_argument("--pad_len", default=0, type=int, help="padding for hypotheses") + parser.add_argument( + "--upsample", default=1, type=int, help="upsample factor for hypotheses" + ) + parser.add_argument("--ref_lab_dir", default="") + parser.add_argument("--ref_lab_name", default="") + parser.add_argument("--verbose", action="store_true") + args = parser.parse_args() + + if args.ref_lab_dir and args.ref_lab_name: + main_lab_lab( + args.tsv_dir, + args.lab_dir, + args.lab_name, + args.lab_sets, + args.ref_lab_dir, + args.ref_lab_name, + args.pad_len, + args.upsample, + args.verbose, + ) + else: + main_phn_lab( + args.tsv_dir, + args.lab_dir, + args.lab_name, + args.lab_sets, + args.phn_dir, + args.phn_sets, + args.pad_len, + args.upsample, + args.verbose, + ) diff --git a/SpeechT5/fairseq/examples/hubert/simple_kmeans/README.md b/SpeechT5/fairseq/examples/hubert/simple_kmeans/README.md new file mode 100644 index 0000000000000000000000000000000000000000..cd17da3b3e6f3e39083f7a76a56ff46c3a63b929 --- /dev/null +++ b/SpeechT5/fairseq/examples/hubert/simple_kmeans/README.md @@ -0,0 +1,71 @@ +# Sharded Feature Extraction and K-means Application + +This folder contains scripts for preparing HUBERT labels from tsv files, the +steps are: +1. feature extraction +2. k-means clustering +3. k-means application + + +## Data preparation + +`*.tsv` files contains a list of audio, where each line is the root, and +following lines are the subpath for each audio: +``` + + + +... +``` + + +## Feature extraction + +### MFCC feature +Suppose the tsv file is at `${tsv_dir}/${split}.tsv`. To extract 39-D +mfcc+delta+ddelta features for the 1st iteration HUBERT training, run: +```sh +python dump_mfcc_feature.py ${tsv_dir} ${split} ${nshard} ${rank} ${feat_dir} +``` +This would shard the tsv file into `${nshard}` and extract features for the +`${rank}`-th shard, where rank is an integer in `[0, nshard-1]`. Features would +be saved at `${feat_dir}/${split}_${rank}_${nshard}.{npy,len}`. 
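+
+For example, a minimal sketch that runs every shard of the split sequentially (in practice the shards are typically dispatched as parallel jobs; `tsv_dir`, `split`, `nshard`, and `feat_dir` are assumed to already be set in the shell):
+```sh
+for rank in $(seq 0 $((nshard - 1))); do
+  python dump_mfcc_feature.py ${tsv_dir} ${split} ${nshard} ${rank} ${feat_dir}
+done
+```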
+ + +### HUBERT feature +To extract features from the `${layer}`-th transformer layer of a trained +HUBERT model saved at `${ckpt_path}`, run: +```sh +python dump_hubert_feature.py ${tsv_dir} ${split} ${ckpt_path} ${layer} ${nshard} ${rank} ${feat_dir} +``` +Features would also be saved at `${feat_dir}/${split}_${rank}_${nshard}.{npy,len}`. + +- if out-of-memory, decrease the chunk size with `--max_chunk` + + +## K-means clustering +To fit a k-means model with `${n_clusters}` clusters on 10% of the `${split}` data, run +```sh +python learn_kmeans.py ${feat_dir} ${split} ${nshard} ${km_path} ${n_cluster} --percent 0.1 +``` +This saves the k-means model to `${km_path}`. + +- set `--precent -1` to use all data +- more kmeans options can be found with `-h` flag + + +## K-means application +To apply a trained k-means model `${km_path}` to obtain labels for `${split}`, run +```sh +python dump_km_label.py ${feat_dir} ${split} ${km_path} ${nshard} ${rank} ${lab_dir} +``` +This would extract labels for the `${rank}`-th shard out of `${nshard}` shards +and dump them to `${lab_dir}/${split}_${rank}_${shard}.km` + + +Finally, merge shards for `${split}` by running +```sh +for rank in $(seq 0 $((nshard - 1))); do + cat $lab_dir/${split}_${rank}_${nshard}.km +done > $lab_dir/${split}.km +``` diff --git a/SpeechT5/fairseq/examples/hubert/simple_kmeans/dump_hubert_feature.py b/SpeechT5/fairseq/examples/hubert/simple_kmeans/dump_hubert_feature.py new file mode 100644 index 0000000000000000000000000000000000000000..cd242890e531208e5a732842e53085bb1acc8664 --- /dev/null +++ b/SpeechT5/fairseq/examples/hubert/simple_kmeans/dump_hubert_feature.py @@ -0,0 +1,133 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. 
+ +import logging +import math +import os +import sys + +import fairseq +import soundfile as sf +import torch +import torch.nn.functional as F +import tqdm +from npy_append_array import NpyAppendArray + +logging.basicConfig( + format="%(asctime)s | %(levelname)s | %(name)s | %(message)s", + datefmt="%Y-%m-%d %H:%M:%S", + level=os.environ.get("LOGLEVEL", "INFO").upper(), + stream=sys.stdout, +) +logger = logging.getLogger("dump_hubert_feature") + + +class HubertFeatureReader(object): + def __init__(self, ckpt_path, layer, max_chunk=1600000): + ( + model, + cfg, + task, + ) = fairseq.checkpoint_utils.load_model_ensemble_and_task([ckpt_path]) + self.model = model[0].eval().cuda() + self.task = task + self.layer = layer + self.max_chunk = max_chunk + logger.info(f"TASK CONFIG:\n{self.task.cfg}") + logger.info(f" max_chunk = {self.max_chunk}") + + def read_audio(self, path, ref_len=None): + wav, sr = sf.read(path) + assert sr == self.task.cfg.sample_rate, sr + if wav.ndim == 2: + wav = wav.mean(-1) + assert wav.ndim == 1, wav.ndim + if ref_len is not None and abs(ref_len - len(wav)) > 160: + logging.warning(f"ref {ref_len} != read {len(wav)} ({path})") + return wav + + def get_feats(self, path, ref_len=None): + x = self.read_audio(path, ref_len) + with torch.no_grad(): + x = torch.from_numpy(x).float().cuda() + if self.task.cfg.normalize: + x = F.layer_norm(x, x.shape) + x = x.view(1, -1) + + feat = [] + for start in range(0, x.size(1), self.max_chunk): + x_chunk = x[:, start: start + self.max_chunk] + feat_chunk, _ = self.model.extract_features( + source=x_chunk, + padding_mask=None, + mask=False, + output_layer=self.layer, + ) + feat.append(feat_chunk) + return torch.cat(feat, 1).squeeze(0) + + +def get_path_iterator(tsv, nshard, rank): + with open(tsv, "r") as f: + root = f.readline().rstrip() + lines = [line.rstrip() for line in f] + tot = len(lines) + shard_size = math.ceil(tot / nshard) + start, end = rank * shard_size, min((rank + 1) * shard_size, tot) + assert start < end, "start={start}, end={end}" + logger.info( + f"rank {rank} of {nshard}, process {end-start} " + f"({start}-{end}) out of {tot}" + ) + + lines = lines[start:end] + + def iterate(): + for line in lines: + subpath, nsample = line.split("\t") + yield f"{root}/{subpath}", int(nsample) + + return iterate, len(lines) + + +def dump_feature( + tsv_dir, split, ckpt_path, layer, nshard, rank, feat_dir, max_chunk +): + reader = HubertFeatureReader(ckpt_path, layer, max_chunk) + generator, num = get_path_iterator(f"{tsv_dir}/{split}.tsv", nshard, rank) + iterator = generator() + + feat_path = f"{feat_dir}/{split}_{rank}_{nshard}.npy" + leng_path = f"{feat_dir}/{split}_{rank}_{nshard}.len" + + os.makedirs(feat_dir, exist_ok=True) + if os.path.exists(feat_path): + os.remove(feat_path) + + feat_f = NpyAppendArray(feat_path) + with open(leng_path, "w") as leng_f: + for path, nsample in tqdm.tqdm(iterator, total=num): + feat = reader.get_feats(path, nsample) + feat_f.append(feat.cpu().numpy()) + leng_f.write(f"{len(feat)}\n") + logger.info("finished successfully") + + +if __name__ == "__main__": + import argparse + + parser = argparse.ArgumentParser() + parser.add_argument("tsv_dir") + parser.add_argument("split") + parser.add_argument("ckpt_path") + parser.add_argument("layer", type=int) + parser.add_argument("nshard", type=int) + parser.add_argument("rank", type=int) + parser.add_argument("feat_dir") + parser.add_argument("--max_chunk", type=int, default=1600000) + args = parser.parse_args() + logger.info(args) + + 
dump_feature(**vars(args)) diff --git a/SpeechT5/fairseq/examples/hubert/simple_kmeans/dump_hubert_feature_s2t.py b/SpeechT5/fairseq/examples/hubert/simple_kmeans/dump_hubert_feature_s2t.py new file mode 100644 index 0000000000000000000000000000000000000000..7ec8a7311b4eee7fa6d42c0615fc06148262f63c --- /dev/null +++ b/SpeechT5/fairseq/examples/hubert/simple_kmeans/dump_hubert_feature_s2t.py @@ -0,0 +1,126 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +import csv +import io +import logging +import math +import os +import os.path as op +import sys + +import tqdm +from dump_hubert_feature import HubertFeatureReader +from fairseq.data.audio.audio_utils import get_waveform +from fairseq.data.audio.speech_to_text_dataset import ( + read_from_uncompressed_zip, +) +from npy_append_array import NpyAppendArray + +logging.basicConfig( + format="%(asctime)s | %(levelname)s | %(name)s | %(message)s", + datefmt="%Y-%m-%d %H:%M:%S", + level=os.environ.get("LOGLEVEL", "INFO").upper(), + stream=sys.stdout, +) +logger = logging.getLogger("dump_hubert_feature_s2t") + + +class HubertFeatureReaderS2T(HubertFeatureReader): + def read_audio(self, path, ref_len=None): + path, *extra = path.split(":") + assert len(extra) == 2 + assert path.endswith(".zip") + + data = read_from_uncompressed_zip(path, int(extra[0]), int(extra[1])) + f = io.BytesIO(data) + wav, sr = get_waveform(f) + assert sr == self.task.cfg.sample_rate, sr + if wav.ndim == 2: + wav = wav.mean(-1) + assert wav.ndim == 1, wav.ndim + if ref_len is not None and abs(ref_len - len(wav)) > 160: + logging.warning(f"ref {ref_len} != read {len(wav)} ({path})") + return wav + + +def get_path_iterator(root, tsv, nshard, rank): + with open(tsv) as f: + reader = csv.DictReader( + f, + delimiter="\t", + quotechar=None, + doublequote=False, + lineterminator="\n", + quoting=csv.QUOTE_NONE, + ) + subpaths = [op.join(root, e["audio"]) for e in reader] + + tot = len(subpaths) + shard_size = math.ceil(tot / nshard) + start, end = rank * shard_size, min((rank + 1) * shard_size, tot) + assert start < end, "start={start}, end={end}" + logger.info( + f"rank {rank} of {nshard}, process {end-start} " + f"({start}-{end}) out of {tot}" + ) + + subpaths = subpaths[start:end] + + def iterate(): + for subpath in subpaths: + yield op.join(root, subpath) + + return iterate, len(subpaths) + + +def dump_feature( + root, + tsv_path, + ckpt_path, + layer, + nshard, + rank, + feat_dir, + feat_name, + max_chunk, +): + reader = HubertFeatureReaderS2T(ckpt_path, layer, max_chunk) + generator, num = get_path_iterator(root, tsv_path, nshard, rank) + iterator = generator() + + feat_path = f"{feat_dir}/{feat_name}_{rank}_{nshard}.npy" + leng_path = f"{feat_dir}/{feat_name}_{rank}_{nshard}.len" + + os.makedirs(feat_dir, exist_ok=True) + if op.exists(feat_path): + os.remove(feat_path) + + feat_f = NpyAppendArray(feat_path) + with open(leng_path, "w") as leng_f: + for path in tqdm.tqdm(iterator, total=num): + feat = reader.get_feats(path) + feat_f.append(feat.cpu().numpy()) + leng_f.write(f"{len(feat)}\n") + logger.info("finished successfully") + + +if __name__ == "__main__": + import argparse + + parser = argparse.ArgumentParser() + parser.add_argument("root") + parser.add_argument("tsv_path") + parser.add_argument("ckpt_path") + parser.add_argument("layer", type=int) + parser.add_argument("nshard", type=int) + parser.add_argument("rank", type=int) + 
parser.add_argument("feat_dir") + parser.add_argument("feat_name") + parser.add_argument("--max_chunk", type=int, default=1600000) + args = parser.parse_args() + logger.info(args) + + dump_feature(**vars(args)) diff --git a/SpeechT5/fairseq/examples/hubert/simple_kmeans/dump_km_label.py b/SpeechT5/fairseq/examples/hubert/simple_kmeans/dump_km_label.py new file mode 100644 index 0000000000000000000000000000000000000000..8871307804d3f1e5c7cc49061614c69df26ab1ee --- /dev/null +++ b/SpeechT5/fairseq/examples/hubert/simple_kmeans/dump_km_label.py @@ -0,0 +1,98 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +import logging +import os +import sys + +import numpy as np + +import joblib +import torch +import tqdm + +logging.basicConfig( + format="%(asctime)s | %(levelname)s | %(name)s | %(message)s", + datefmt="%Y-%m-%d %H:%M:%S", + level=os.environ.get("LOGLEVEL", "INFO").upper(), + stream=sys.stdout, +) +logger = logging.getLogger("dump_km_label") + + +class ApplyKmeans(object): + def __init__(self, km_path): + self.km_model = joblib.load(km_path) + self.C_np = self.km_model.cluster_centers_.transpose() + self.Cnorm_np = (self.C_np ** 2).sum(0, keepdims=True) + + self.C = torch.from_numpy(self.C_np) + self.Cnorm = torch.from_numpy(self.Cnorm_np) + if torch.cuda.is_available(): + self.C = self.C.cuda() + self.Cnorm = self.Cnorm.cuda() + + def __call__(self, x): + if isinstance(x, torch.Tensor): + dist = ( + x.pow(2).sum(1, keepdim=True) + - 2 * torch.matmul(x, self.C) + + self.Cnorm + ) + return dist.argmin(dim=1).cpu().numpy() + else: + dist = ( + (x ** 2).sum(1, keepdims=True) + - 2 * np.matmul(x, self.C_np) + + self.Cnorm_np + ) + return np.argmin(dist, axis=1) + + +def get_feat_iterator(feat_dir, split, nshard, rank): + feat_path = f"{feat_dir}/{split}_{rank}_{nshard}.npy" + leng_path = f"{feat_dir}/{split}_{rank}_{nshard}.len" + with open(leng_path, "r") as f: + lengs = [int(line.rstrip()) for line in f] + offsets = [0] + np.cumsum(lengs[:-1]).tolist() + + def iterate(): + feat = np.load(feat_path, mmap_mode="r") + assert feat.shape[0] == (offsets[-1] + lengs[-1]) + for offset, leng in zip(offsets, lengs): + yield feat[offset: offset + leng] + + return iterate, len(lengs) + + +def dump_label(feat_dir, split, km_path, nshard, rank, lab_dir): + apply_kmeans = ApplyKmeans(km_path) + generator, num = get_feat_iterator(feat_dir, split, nshard, rank) + iterator = generator() + + lab_path = f"{lab_dir}/{split}_{rank}_{nshard}.km" + os.makedirs(lab_dir, exist_ok=True) + with open(lab_path, "w") as f: + for feat in tqdm.tqdm(iterator, total=num): + # feat = torch.from_numpy(feat).cuda() + lab = apply_kmeans(feat).tolist() + f.write(" ".join(map(str, lab)) + "\n") + logger.info("finished successfully") + + +if __name__ == "__main__": + import argparse + + parser = argparse.ArgumentParser() + parser.add_argument("feat_dir") + parser.add_argument("split") + parser.add_argument("km_path") + parser.add_argument("nshard", type=int) + parser.add_argument("rank", type=int) + parser.add_argument("lab_dir") + args = parser.parse_args() + logging.info(str(args)) + + dump_label(**vars(args)) diff --git a/SpeechT5/fairseq/examples/hubert/simple_kmeans/dump_mfcc_feature.py b/SpeechT5/fairseq/examples/hubert/simple_kmeans/dump_mfcc_feature.py new file mode 100644 index 0000000000000000000000000000000000000000..a36fa643bd28134aef56d0630d49efdf7969f876 --- /dev/null +++ 
b/SpeechT5/fairseq/examples/hubert/simple_kmeans/dump_mfcc_feature.py @@ -0,0 +1,116 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +import logging +import math +import os +import sys + +import soundfile as sf +import torch +import torchaudio +import tqdm +from npy_append_array import NpyAppendArray + +logging.basicConfig( + format="%(asctime)s | %(levelname)s | %(name)s | %(message)s", + datefmt="%Y-%m-%d %H:%M:%S", + level=os.environ.get("LOGLEVEL", "INFO").upper(), + stream=sys.stdout, +) +logger = logging.getLogger("dump_mfcc_feature") + + +class MfccFeatureReader(object): + def __init__(self, sample_rate): + self.sample_rate = sample_rate + + def read_audio(self, path, ref_len=None): + wav, sr = sf.read(path) + assert sr == self.sample_rate, sr + if wav.ndim == 2: + wav = wav.mean(-1) + assert wav.ndim == 1, wav.ndim + if ref_len is not None and abs(ref_len - len(wav)) > 160: + logging.warning(f"ref {ref_len} != read {len(wav)} ({path})") + return wav + + def get_feats(self, path, ref_len=None): + x = self.read_audio(path, ref_len) + with torch.no_grad(): + x = torch.from_numpy(x).float() + x = x.view(1, -1) + + mfccs = torchaudio.compliance.kaldi.mfcc( + waveform=x, + sample_frequency=self.sample_rate, + use_energy=False, + ) # (time, freq) + mfccs = mfccs.transpose(0, 1) # (freq, time) + deltas = torchaudio.functional.compute_deltas(mfccs) + ddeltas = torchaudio.functional.compute_deltas(deltas) + concat = torch.cat([mfccs, deltas, ddeltas], dim=0) + concat = concat.transpose(0, 1).contiguous() # (freq, time) + return concat + + +def get_path_iterator(tsv, nshard, rank): + with open(tsv, "r") as f: + root = f.readline().rstrip() + lines = [line.rstrip() for line in f] + tot = len(lines) + shard_size = math.ceil(tot / nshard) + start, end = rank * shard_size, min((rank + 1) * shard_size, tot) + assert start < end, "start={start}, end={end}" + logger.info( + f"rank {rank} of {nshard}, process {end-start} " + f"({start}-{end}) out of {tot}" + ) + + lines = lines[start:end] + + def iterate(): + for line in lines: + subpath, nsample = line.split("\t") + yield f"{root}/{subpath}", int(nsample) + + return iterate, len(lines) + + +def dump_feature(tsv_dir, split, sample_rate, nshard, rank, feat_dir): + reader = MfccFeatureReader(sample_rate) + generator, num = get_path_iterator(f"{tsv_dir}/{split}.tsv", nshard, rank) + iterator = generator() + + feat_path = f"{feat_dir}/{split}_{rank}_{nshard}.npy" + leng_path = f"{feat_dir}/{split}_{rank}_{nshard}.len" + + os.makedirs(feat_dir, exist_ok=True) + if os.path.exists(feat_path): + os.remove(feat_path) + + feat_f = NpyAppendArray(feat_path) + with open(leng_path, "w") as leng_f: + for path, nsample in tqdm.tqdm(iterator, total=num): + feat = reader.get_feats(path, nsample) + feat_f.append(feat.cpu().numpy()) + leng_f.write(f"{len(feat)}\n") + logger.info("finished successfully") + + +if __name__ == "__main__": + import argparse + + parser = argparse.ArgumentParser() + parser.add_argument("tsv_dir") + parser.add_argument("split") + parser.add_argument("nshard", type=int) + parser.add_argument("rank", type=int) + parser.add_argument("feat_dir") + parser.add_argument("--sample_rate", type=int, default=16000) + args = parser.parse_args() + logger.info(args) + + dump_feature(**vars(args)) diff --git a/SpeechT5/fairseq/examples/hubert/simple_kmeans/learn_kmeans.py 
b/SpeechT5/fairseq/examples/hubert/simple_kmeans/learn_kmeans.py new file mode 100644 index 0000000000000000000000000000000000000000..113ac655b8c0a585fe43797e99674e445098edd0 --- /dev/null +++ b/SpeechT5/fairseq/examples/hubert/simple_kmeans/learn_kmeans.py @@ -0,0 +1,146 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +import logging +import os +import sys + +import numpy as np +from sklearn.cluster import MiniBatchKMeans + +import joblib + +logging.basicConfig( + format="%(asctime)s | %(levelname)s | %(name)s | %(message)s", + datefmt="%Y-%m-%d %H:%M:%S", + level=os.environ.get("LOGLEVEL", "INFO").upper(), + stream=sys.stdout, +) +logger = logging.getLogger("learn_kmeans") + + +def get_km_model( + n_clusters, + init, + max_iter, + batch_size, + tol, + max_no_improvement, + n_init, + reassignment_ratio, +): + return MiniBatchKMeans( + n_clusters=n_clusters, + init=init, + max_iter=max_iter, + batch_size=batch_size, + verbose=1, + compute_labels=False, + tol=tol, + max_no_improvement=max_no_improvement, + init_size=None, + n_init=n_init, + reassignment_ratio=reassignment_ratio, + ) + + +def load_feature_shard(feat_dir, split, nshard, rank, percent): + feat_path = f"{feat_dir}/{split}_{rank}_{nshard}.npy" + leng_path = f"{feat_dir}/{split}_{rank}_{nshard}.len" + with open(leng_path, "r") as f: + lengs = [int(line.rstrip()) for line in f] + offsets = [0] + np.cumsum(lengs[:-1]).tolist() + + if percent < 0: + return np.load(feat_path, mmap_mode="r") + else: + nsample = int(np.ceil(len(lengs) * percent)) + indices = np.random.choice(len(lengs), nsample, replace=False) + feat = np.load(feat_path, mmap_mode="r") + sampled_feat = np.concatenate( + [feat[offsets[i]: offsets[i] + lengs[i]] for i in indices], axis=0 + ) + logger.info( + ( + f"sampled {nsample} utterances, {len(sampled_feat)} frames " + f"from shard {rank}/{nshard}" + ) + ) + return sampled_feat + + +def load_feature(feat_dir, split, nshard, seed, percent): + assert percent <= 1.0 + feat = np.concatenate( + [ + load_feature_shard(feat_dir, split, nshard, r, percent) + for r in range(nshard) + ], + axis=0, + ) + logging.info(f"loaded feature with dimension {feat.shape}") + return feat + + +def learn_kmeans( + feat_dir, + split, + nshard, + km_path, + n_clusters, + seed, + percent, + init, + max_iter, + batch_size, + tol, + n_init, + reassignment_ratio, + max_no_improvement, +): + np.random.seed(seed) + feat = load_feature(feat_dir, split, nshard, seed, percent) + km_model = get_km_model( + n_clusters, + init, + max_iter, + batch_size, + tol, + max_no_improvement, + n_init, + reassignment_ratio, + ) + km_model.fit(feat) + joblib.dump(km_model, km_path) + + inertia = -km_model.score(feat) / len(feat) + logger.info("total intertia: %.5f", inertia) + logger.info("finished successfully") + + +if __name__ == "__main__": + import argparse + + parser = argparse.ArgumentParser() + parser.add_argument("feat_dir", type=str) + parser.add_argument("split", type=str) + parser.add_argument("nshard", type=int) + parser.add_argument("km_path", type=str) + parser.add_argument("n_clusters", type=int) + parser.add_argument("--seed", default=0, type=int) + parser.add_argument( + "--percent", default=-1, type=float, help="sample a subset; -1 for all" + ) + parser.add_argument("--init", default="k-means++") + parser.add_argument("--max_iter", default=100, type=int) + parser.add_argument("--batch_size", default=10000, 
type=int) + parser.add_argument("--tol", default=0.0, type=float) + parser.add_argument("--max_no_improvement", default=100, type=int) + parser.add_argument("--n_init", default=20, type=int) + parser.add_argument("--reassignment_ratio", default=0.0, type=float) + args = parser.parse_args() + logging.info(str(args)) + + learn_kmeans(**vars(args)) diff --git a/SpeechT5/fairseq/examples/hubert/update_ckpt.py b/SpeechT5/fairseq/examples/hubert/update_ckpt.py new file mode 100644 index 0000000000000000000000000000000000000000..53c9e74ea613e30aa5c22614e658f2b7272bac0c --- /dev/null +++ b/SpeechT5/fairseq/examples/hubert/update_ckpt.py @@ -0,0 +1,22 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +import torch + +src_ckpt = "/checkpoint/wnhsu/w2v/archived/hubert_base_ls960_it2.pt" +ref_ckpt = "/checkpoint/wnhsu/w2v/hubert_icassp_oss_v3/iter2_km100-400k-grp-L6/oss.km500_p0_1_s334.pmw1_0.puw0_0.grpnorm.ml10.mp0_8.untie.mxsz250000.ufreq1.maxtok1400000.MU100k.s1337.ngpu32/checkpoint_last.pt" +new_ckpt = "/checkpoint/wnhsu/w2v/archived/hubert_base_ls960_it2_updated.pt" + + +def update_state(state): + state["model"]["label_embs_concat"] = state["model"].pop("label_embs") + state["args"].task = "hubert_pretraining" + state["args"].labels = f"['{state['args'].labels}']" + return state + + +src_state = torch.load(src_ckpt) +src_state = update_state(src_state) +torch.save(src_state, new_ckpt) diff --git a/SpeechT5/fairseq/examples/joint_alignment_translation/README.md b/SpeechT5/fairseq/examples/joint_alignment_translation/README.md new file mode 100644 index 0000000000000000000000000000000000000000..cd9c0ea65f5292198296a8f427b42e01b584e2d9 --- /dev/null +++ b/SpeechT5/fairseq/examples/joint_alignment_translation/README.md @@ -0,0 +1,89 @@ +# Jointly Learning to Align and Translate with Transformer Models (Garg et al., 2019) + +This page includes instructions for training models described in [Jointly Learning to Align and Translate with Transformer Models (Garg et al., 2019)](https://arxiv.org/abs/1909.02074). + +## Training a joint alignment-translation model on WMT'18 En-De + +##### 1. Extract and preprocess the WMT'18 En-De data +```bash +./prepare-wmt18en2de_no_norm_no_escape_no_agressive.sh +``` + +##### 2. Generate alignments from statistical alignment toolkits e.g. Giza++/FastAlign. +In this example, we use FastAlign. +```bash +git clone git@github.com:clab/fast_align.git +pushd fast_align +mkdir build +cd build +cmake .. +make +popd +ALIGN=fast_align/build/fast_align +paste bpe.32k/train.en bpe.32k/train.de | awk -F '\t' '{print $1 " ||| " $2}' > bpe.32k/train.en-de +$ALIGN -i bpe.32k/train.en-de -d -o -v > bpe.32k/train.align +``` + +##### 3. Preprocess the dataset with the above generated alignments. +```bash +fairseq-preprocess \ + --source-lang en --target-lang de \ + --trainpref bpe.32k/train \ + --validpref bpe.32k/valid \ + --testpref bpe.32k/test \ + --align-suffix align \ + --destdir binarized/ \ + --joined-dictionary \ + --workers 32 +``` + +##### 4. 
Train a model +```bash +fairseq-train \ + binarized \ + --arch transformer_wmt_en_de_big_align --share-all-embeddings \ + --optimizer adam --adam-betas '(0.9, 0.98)' --clip-norm 0.0 --activation-fn relu\ + --lr 0.0002 --lr-scheduler inverse_sqrt --warmup-updates 4000 --warmup-init-lr 1e-07 \ + --dropout 0.3 --attention-dropout 0.1 --weight-decay 0.0 \ + --max-tokens 3500 --label-smoothing 0.1 \ + --save-dir ./checkpoints --log-interval 1000 --max-update 60000 \ + --keep-interval-updates -1 --save-interval-updates 0 \ + --load-alignments --criterion label_smoothed_cross_entropy_with_alignment \ + --fp16 +``` + +Note that the `--fp16` flag requires you have CUDA 9.1 or greater and a Volta GPU or newer. + +If you want to train the above model with big batches (assuming your machine has 8 GPUs): +- add `--update-freq 8` to simulate training on 8x8=64 GPUs +- increase the learning rate; 0.0007 works well for big batches + +##### 5. Evaluate and generate the alignments (BPE level) +```bash +fairseq-generate \ + binarized --gen-subset test --print-alignment \ + --source-lang en --target-lang de \ + --path checkpoints/checkpoint_best.pt --beam 5 --nbest 1 +``` + +##### 6. Other resources. +The code for: +1. preparing alignment test sets +2. converting BPE level alignments to token level alignments +3. symmetrizing bidirectional alignments +4. evaluating alignments using AER metric +can be found [here](https://github.com/lilt/alignment-scripts) + +## Citation + +```bibtex +@inproceedings{garg2019jointly, + title = {Jointly Learning to Align and Translate with Transformer Models}, + author = {Garg, Sarthak and Peitz, Stephan and Nallasamy, Udhyakumar and Paulik, Matthias}, + booktitle = {Conference on Empirical Methods in Natural Language Processing (EMNLP)}, + address = {Hong Kong}, + month = {November}, + url = {https://arxiv.org/abs/1909.02074}, + year = {2019}, +} +``` diff --git a/SpeechT5/fairseq/examples/joint_alignment_translation/prepare-wmt18en2de_no_norm_no_escape_no_agressive.sh b/SpeechT5/fairseq/examples/joint_alignment_translation/prepare-wmt18en2de_no_norm_no_escape_no_agressive.sh new file mode 100644 index 0000000000000000000000000000000000000000..e3efeb21d302ef8d9eae8f1d4b06434c593705f6 --- /dev/null +++ b/SpeechT5/fairseq/examples/joint_alignment_translation/prepare-wmt18en2de_no_norm_no_escape_no_agressive.sh @@ -0,0 +1,118 @@ +#!/bin/bash + +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +echo 'Cloning Moses github repository (for tokenization scripts)...' +git clone https://github.com/moses-smt/mosesdecoder.git + +SCRIPTS=mosesdecoder/scripts +TOKENIZER=$SCRIPTS/tokenizer/tokenizer.perl +CLEAN=$SCRIPTS/training/clean-corpus-n.perl +REM_NON_PRINT_CHAR=$SCRIPTS/tokenizer/remove-non-printing-char.perl + +URLS=( + "http://statmt.org/wmt13/training-parallel-europarl-v7.tgz" + "http://statmt.org/wmt13/training-parallel-commoncrawl.tgz" + "http://data.statmt.org/wmt18/translation-task/training-parallel-nc-v13.tgz" + "http://data.statmt.org/wmt18/translation-task/rapid2016.tgz" + "http://data.statmt.org/wmt17/translation-task/dev.tgz" + "http://statmt.org/wmt14/test-full.tgz" +) +CORPORA=( + "training/europarl-v7.de-en" + "commoncrawl.de-en" + "training-parallel-nc-v13/news-commentary-v13.de-en" + "rapid2016.de-en" +) + +if [ ! -d "$SCRIPTS" ]; then + echo "Please set SCRIPTS variable correctly to point to Moses scripts." 
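    # SCRIPTS points into the mosesdecoder checkout cloned at the top of this
    # script; if that clone lives elsewhere, adjust the SCRIPTS variable above.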
+ exit +fi + +src=en +tgt=de +lang=en-de +prep=wmt18_en_de +tmp=$prep/tmp +orig=orig +dev=dev/newstest2012 +codes=32000 +bpe=bpe.32k + +mkdir -p $orig $tmp $prep $bpe + +cd $orig + +for ((i=0;i<${#URLS[@]};++i)); do + url=${URLS[i]} + file=$(basename $url) + if [ -f $file ]; then + echo "$file already exists, skipping download" + else + wget "$url" + if [ -f $file ]; then + echo "$url successfully downloaded." + else + echo "$url not successfully downloaded." + exit 1 + fi + if [ ${file: -4} == ".tgz" ]; then + tar zxvf $file + elif [ ${file: -4} == ".tar" ]; then + tar xvf $file + fi + fi +done +cd .. + +echo "pre-processing train data..." +for l in $src $tgt; do + rm -rf $tmp/train.tags.$lang.tok.$l + for f in "${CORPORA[@]}"; do + cat $orig/$f.$l | \ + perl $REM_NON_PRINT_CHAR | \ + perl $TOKENIZER -threads 8 -l $l -no-escape >> $tmp/train.tags.$lang.tok.$l + done +done + +echo "pre-processing test data..." +for l in $src $tgt; do + if [ "$l" == "$src" ]; then + t="src" + else + t="ref" + fi + grep '\s*//g' | \ + sed -e 's/\s*<\/seg>\s*//g' | \ + sed -e "s/\’/\'/g" | \ + perl $TOKENIZER -threads 8 -l $l -no-escape > $tmp/test.$l + echo "" +done + +# apply length filtering before BPE +perl $CLEAN -ratio 1.5 $tmp/train.tags.$lang.tok $src $tgt $tmp/train 1 100 + +# use newstest2012 for valid +echo "pre-processing valid data..." +for l in $src $tgt; do + rm -rf $tmp/valid.$l + cat $orig/$dev.$l | \ + perl $REM_NON_PRINT_CHAR | \ + perl $TOKENIZER -threads 8 -l $l -no-escape >> $tmp/valid.$l +done + +mkdir output +mv $tmp/{train,valid,test}.{$src,$tgt} output + +#BPE +git clone https://github.com/glample/fastBPE.git +pushd fastBPE +g++ -std=c++11 -pthread -O3 fastBPE/main.cc -IfastBPE -o fast +popd +fastBPE/fast learnbpe $codes output/train.$src output/train.$tgt > $bpe/codes +for split in {train,valid,test}; do for lang in {en,de}; do fastBPE/fast applybpe $bpe/$split.$lang output/$split.$lang $bpe/codes; done; done diff --git a/SpeechT5/fairseq/examples/language_model/README.adaptive_inputs.md b/SpeechT5/fairseq/examples/language_model/README.adaptive_inputs.md new file mode 100644 index 0000000000000000000000000000000000000000..6650d58f37f320aa46402d59ce6494b2dd1c3faa --- /dev/null +++ b/SpeechT5/fairseq/examples/language_model/README.adaptive_inputs.md @@ -0,0 +1,39 @@ +# Adaptive Input Representations for Neural Language Modeling (Baevski and Auli, 2018) + +## Pre-trained models + +Description | Parameters | Dataset | Model and Test set(s) +---|---:|---|--- +Adaptive Inputs
([Baevski and Auli, 2018](https://arxiv.org/abs/1809.10853)) | 1026M | [Google Billion Words](https://github.com/ciprian-chelba/1-billion-word-language-modeling-benchmark) | [download (.tar.bz2)](https://dl.fbaipublicfiles.com/fairseq/models/lm/adaptive_lm_gbw_huge.tar.bz2) +Adaptive Inputs
([Baevski and Auli, 2018](https://arxiv.org/abs/1809.10853)) | 247M | [WikiText-103](https://blog.einstein.ai/the-wikitext-long-term-dependency-language-modeling-dataset/) | [download (.tar.bz2)](https://dl.fbaipublicfiles.com/fairseq/models/lm/adaptive_lm_wiki103.v2.tar.bz2) + +## Training an LM with adaptive inputs + +First, see the general [language modeling README](README.md) for instructions on +preprocessing the WikiText-103 data. + +Then use the following training command to train a model with adaptive inputs +using the `transformer_lm_wiki103` model architecture: +```bash +fairseq-train --task language_modeling \ + data-bin/wikitext-103 \ + --save-dir checkpoints/transformer_wikitext-103 \ + --arch transformer_lm_wiki103 \ + --max-update 286000 --lr 1.0 --t-mult 2 --lr-period-updates 270000 --lr-scheduler cosine --lr-shrink 0.75 \ + --warmup-updates 16000 --warmup-init-lr 1e-07 --stop-min-lr 1e-09 --optimizer nag --min-lr 0.0001 --clip-norm 0.1 \ + --criterion adaptive_loss --max-tokens 3072 --update-freq 3 --tokens-per-sample 3072 --seed 1 \ + --sample-break-mode none --skip-invalid-size-inputs-valid-test --ddp-backend=legacy_ddp +``` + +## Citation + +```bibtex +@inproceedings{ + baevski2018adaptive, + title={Adaptive Input Representations for Neural Language Modeling}, + author={Alexei Baevski and Michael Auli}, + booktitle={International Conference on Learning Representations}, + year={2019}, + url={https://openreview.net/forum?id=ByxZX20qFQ}, +} +``` diff --git a/SpeechT5/fairseq/examples/language_model/README.conv.md b/SpeechT5/fairseq/examples/language_model/README.conv.md new file mode 100644 index 0000000000000000000000000000000000000000..1ff8635906cf278208be4714e0ef805a6a6b4da1 --- /dev/null +++ b/SpeechT5/fairseq/examples/language_model/README.conv.md @@ -0,0 +1,40 @@ +# Language Modeling with Gated Convolutional Networks (Dauphin et al., 2017) + +## Example usage + +First download and preprocess the data following the main [language modeling README](README.md). 
+ +Then to train a convolutional LM using the `fconv_lm_dauphin_wikitext103` +architecture: +```bash +fairseq-train --task language_modeling \ + data-bin/wikitext-103 \ + --save-dir checkpoints/fconv_wikitext-103 \ + --arch fconv_lm_dauphin_wikitext103 \ + --adaptive-softmax-cutoff 10000,20000,200000 \ + --dropout 0.2 \ + --criterion adaptive_loss \ + --optimizer nag --clip-norm 0.1 --weight-decay 5e-06 \ + --lr 1.0 --lr-scheduler reduce_lr_on_plateau --lr-shrink 0.5 \ + --max-tokens 1024 --tokens-per-sample 1024 \ + --ddp-backend legacy_ddp \ + --max-epoch 35 +``` + +And evaluate with: +```bash +fairseq-eval-lm data-bin/wikitext-103 --path checkpoints/fconv_wiki103/checkpoint_best.pt +``` + +## Citation + +```bibtex +@inproceedings{dauphin2017language, + title={Language Modeling with Gated Convolutional Networks}, + author={Dauphin, Yann N and Fan, Angela and Auli, Michael and Grangier, David}, + booktitle={Proceedings of the 34th International Conference on Machine Learning-Volume 70}, + pages={933--941}, + year={2017}, + organization={JMLR} +} +``` diff --git a/SpeechT5/fairseq/examples/language_model/README.md b/SpeechT5/fairseq/examples/language_model/README.md new file mode 100644 index 0000000000000000000000000000000000000000..e78ea48e08dc99b69751923762107a8f8a9a5e3e --- /dev/null +++ b/SpeechT5/fairseq/examples/language_model/README.md @@ -0,0 +1,123 @@ +# Neural Language Modeling + +## Pre-trained models + +Model | Description | Dataset | Download +---|---|---|--- +`transformer_lm.gbw.adaptive_huge` | Adaptive Inputs
([Baevski and Auli, 2018](https://arxiv.org/abs/1809.10853))
1026M params | [Google Billion Words](https://github.com/ciprian-chelba/1-billion-word-language-modeling-benchmark) | [download (.tar.bz2)](https://dl.fbaipublicfiles.com/fairseq/models/lm/adaptive_lm_gbw_huge.tar.bz2) +`transformer_lm.wiki103.adaptive` | Adaptive Inputs
([Baevski and Auli, 2018](https://arxiv.org/abs/1809.10853))
247M params | [WikiText-103](https://blog.einstein.ai/the-wikitext-long-term-dependency-language-modeling-dataset) | [download (.tar.bz2)](https://dl.fbaipublicfiles.com/fairseq/models/lm/adaptive_lm_wiki103.v2.tar.bz2) +`transformer_lm.wmt19.en` | English LM
([Ng et al., 2019](https://arxiv.org/abs/1907.06616)) | [WMT News Crawl](http://data.statmt.org/news-crawl/) | [download (.tar.gz)](https://dl.fbaipublicfiles.com/fairseq/models/lm/wmt19.en.tar.gz) +`transformer_lm.wmt19.de` | German LM
([Ng et al., 2019](https://arxiv.org/abs/1907.06616)) | [WMT News Crawl](http://data.statmt.org/news-crawl/) | [download (.tar.gz)](https://dl.fbaipublicfiles.com/fairseq/models/lm/wmt19.de.tar.gz) +`transformer_lm.wmt19.ru` | Russian LM
([Ng et al., 2019](https://arxiv.org/abs/1907.06616)) | [WMT News Crawl](http://data.statmt.org/news-crawl/) | [download (.tar.gz)](https://dl.fbaipublicfiles.com/fairseq/models/lm/wmt19.ru.tar.gz) + +## Example usage + +We require a few additional Python dependencies for preprocessing: +```bash +pip install fastBPE sacremoses +``` + +To sample from a language model using PyTorch Hub: +```python +import torch + +# List available models +torch.hub.list('pytorch/fairseq') # [..., 'transformer_lm.wmt19.en', ...] + +# Load an English LM trained on WMT'19 News Crawl data +en_lm = torch.hub.load('pytorch/fairseq', 'transformer_lm.wmt19.en', tokenizer='moses', bpe='fastbpe') +en_lm.eval() # disable dropout + +# Move model to GPU +en_lm.cuda() + +# Sample from the language model +en_lm.sample('Barack Obama', beam=1, sampling=True, sampling_topk=10, temperature=0.8) +# "Barack Obama is coming to Sydney and New Zealand (...)" + +# Compute perplexity for a sequence +en_lm.score('Barack Obama is coming to Sydney and New Zealand')['positional_scores'].mean().neg().exp() +# tensor(15.1474) + +# The same interface can be used with custom models as well +from fairseq.models.transformer_lm import TransformerLanguageModel +custom_lm = TransformerLanguageModel.from_pretrained('/path/to/model/dir', 'checkpoint100.pt', tokenizer='moses', bpe='fastbpe') +custom_lm.sample('Barack Obama', beam=5) +# "Barack Obama (...)" +``` + +## Training a transformer language model with the CLI tools + +### 1) Preprocess the data + +First download and prepare the [WikiText-103 dataset](https://www.salesforce.com/products/einstein/ai-research/the-wikitext-dependency-language-modeling-dataset/): +```bash +cd examples/language_model/ +bash prepare-wikitext-103.sh +cd ../.. +``` + +Next preprocess/binarize the data: +```bash +TEXT=examples/language_model/wikitext-103 +fairseq-preprocess \ + --only-source \ + --trainpref $TEXT/wiki.train.tokens \ + --validpref $TEXT/wiki.valid.tokens \ + --testpref $TEXT/wiki.test.tokens \ + --destdir data-bin/wikitext-103 \ + --workers 20 +``` + +### 2) Train a language model + +Next we'll train a basic transformer language model on wikitext-103. For more +advanced usage, see the [adaptive inputs README](README.adaptive_inputs.md). + +To train a basic LM (assumes 2 GPUs): +``` +$ fairseq-train --task language_modeling \ + data-bin/wikitext-103 \ + --save-dir checkpoints/transformer_wikitext-103 \ + --arch transformer_lm --share-decoder-input-output-embed \ + --dropout 0.1 \ + --optimizer adam --adam-betas '(0.9, 0.98)' --weight-decay 0.01 --clip-norm 0.0 \ + --lr 0.0005 --lr-scheduler inverse_sqrt --warmup-updates 4000 --warmup-init-lr 1e-07 \ + --tokens-per-sample 512 --sample-break-mode none \ + --max-tokens 2048 --update-freq 16 \ + --fp16 \ + --max-update 50000 +``` + +If you run out of memory, try reducing `--max-tokens` (max number of tokens per +batch) or `--tokens-per-sample` (max sequence length). You can also adjust +`--update-freq` to accumulate gradients and simulate training on a different +number of GPUs. + +### 3) Evaluate + +```bash +fairseq-eval-lm data-bin/wikitext-103 \ + --path checkpoints/transformer_wiki103/checkpoint_best.pt \ + --batch-size 2 \ + --tokens-per-sample 512 \ + --context-window 400 +# | Evaluated 245569 tokens in 56.1s (4379.02 tokens/s) +# | Loss: 3.4164, Perplexity: 30.46 +``` + +*Note:* The `--context-window` option controls how much context is provided to +each token when computing perplexity. 
When the window size is 0, the dataset is +chunked into segments of length 512 and perplexity is computed over each segment +normally. However, this results in worse (higher) perplexity since tokens that +appear earlier in each segment have less conditioning. When the maximum window +size is used (511 in this case), then we compute perplexity for each token +fully conditioned on 511 tokens of context. This slows down evaluation +significantly, since we must run a separate forward pass for every token in the +dataset, but results in better (lower) perplexity. + + +## Convolutional language models + +Please see the [convolutional LM README](README.conv.md) for instructions on +training convolutional language models. diff --git a/SpeechT5/fairseq/examples/language_model/prepare-wikitext-103.sh b/SpeechT5/fairseq/examples/language_model/prepare-wikitext-103.sh new file mode 100644 index 0000000000000000000000000000000000000000..751302156f0a6829af9c2ee5e0e2ca62c2cd4187 --- /dev/null +++ b/SpeechT5/fairseq/examples/language_model/prepare-wikitext-103.sh @@ -0,0 +1,33 @@ +#!/bin/bash +# Adapted from https://github.com/facebookresearch/MIXER/blob/master/prepareData.sh + +URLS=( + "https://s3.amazonaws.com/research.metamind.io/wikitext/wikitext-103-v1.zip" +) +FILES=( + "wikitext-103-v1.zip" +) + +for ((i=0;i<${#URLS[@]};++i)); do + file=${FILES[i]} + if [ -f $file ]; then + echo "$file already exists, skipping download" + else + url=${URLS[i]} + wget "$url" + if [ -f $file ]; then + echo "$url successfully downloaded." + else + echo "$url not successfully downloaded." + exit -1 + fi + if [ ${file: -4} == ".tgz" ]; then + tar zxvf $file + elif [ ${file: -4} == ".tar" ]; then + tar xvf $file + elif [ ${file: -4} == ".zip" ]; then + unzip $file + fi + fi +done +cd .. diff --git a/SpeechT5/fairseq/examples/laser/README.md b/SpeechT5/fairseq/examples/laser/README.md new file mode 100644 index 0000000000000000000000000000000000000000..66acada04f58fa235cd312753f144f6f1e5f4a33 --- /dev/null +++ b/SpeechT5/fairseq/examples/laser/README.md @@ -0,0 +1,144 @@ +# LASER Language-Agnostic SEntence Representations + +LASER is a library to calculate and use multilingual sentence embeddings. + +You can find more information about LASER and how to use it on the official [LASER repository](https://github.com/facebookresearch/LASER). + +This folder contains source code for training LASER embeddings. + + +## Prepare data and configuration file + +Binarize your data with fairseq, as described [here](https://fairseq.readthedocs.io/en/latest/getting_started.html#data-pre-processing). + +Create a json config file with this format: +``` +{ + "src_vocab": "/path/to/spm.src.cvocab", + "tgt_vocab": "/path/to/spm.tgt.cvocab", + "train": [ + { + "type": "translation", + "id": 0, + "src": "/path/to/srclang1-tgtlang0/train.srclang1", + "tgt": "/path/to/srclang1-tgtlang0/train.tgtlang0" + }, + { + "type": "translation", + "id": 1, + "src": "/path/to/srclang1-tgtlang1/train.srclang1", + "tgt": "/path/to/srclang1-tgtlang1/train.tgtlang1" + }, + { + "type": "translation", + "id": 0, + "src": "/path/to/srclang2-tgtlang0/train.srclang2", + "tgt": "/path/to/srclang2-tgtlang0/train.tgtlang0" + }, + { + "type": "translation", + "id": 1, + "src": "/path/to/srclang2-tgtlang1/train.srclang2", + "tgt": "/path/to/srclang2-tgtlang1/train.tgtlang1" + }, + ... + ], + "valid": [ + { + "type": "translation", + "id": 0, + "src": "/unused", + "tgt": "/unused" + } + ] +} +``` +where paths are paths to binarized indexed fairseq dataset files. 
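A config in this format can also be generated with a short script. The sketch below is purely illustrative: the paths, language names, and ids are placeholders rather than files shipped with this example.

```python
import json

# Hypothetical corpora: two source languages translated into two target
# languages; `id` indexes the target language (see the note that follows).
pairs = [("fr", "en", 0), ("de", "en", 0), ("fr", "es", 1), ("de", "es", 1)]

config = {
    "src_vocab": "/data/laser/spm.src.cvocab",
    "tgt_vocab": "/data/laser/spm.tgt.cvocab",
    "train": [
        {
            "type": "translation",
            "id": tgt_id,
            "src": f"/data/laser/{src}-{tgt}/train.{src}",
            "tgt": f"/data/laser/{src}-{tgt}/train.{tgt}",
        }
        for src, tgt, tgt_id in pairs
    ],
    "valid": [
        {"type": "translation", "id": 0, "src": "/unused", "tgt": "/unused"}
    ],
}

with open("laser_config.json", "w") as f:
    json.dump(config, f, indent=2)
```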
+`id` represents the target language id. + + +## Training Command Line Example + +``` +fairseq-train \ + /path/to/configfile_described_above.json \ + --user-dir examples/laser/laser_src \ + --log-interval 100 --log-format simple \ + --task laser --arch laser_lstm \ + --save-dir . \ + --optimizer adam \ + --lr 0.001 \ + --lr-scheduler inverse_sqrt \ + --clip-norm 5 \ + --warmup-updates 90000 \ + --update-freq 2 \ + --dropout 0.0 \ + --encoder-dropout-out 0.1 \ + --max-tokens 2000 \ + --max-epoch 50 \ + --encoder-bidirectional \ + --encoder-layers 5 \ + --encoder-hidden-size 512 \ + --decoder-layers 1 \ + --decoder-hidden-size 2048 \ + --encoder-embed-dim 320 \ + --decoder-embed-dim 320 \ + --decoder-lang-embed-dim 32 \ + --warmup-init-lr 0.001 \ + --disable-validation +``` + + +## Applications + +We showcase several applications of multilingual sentence embeddings +with code to reproduce our results (in the directory "tasks"). + +* [**Cross-lingual document classification**](https://github.com/facebookresearch/LASER/tree/master/tasks/mldoc) using the + [*MLDoc*](https://github.com/facebookresearch/MLDoc) corpus [2,6] +* [**WikiMatrix**](https://github.com/facebookresearch/LASER/tree/master/tasks/WikiMatrix) + Mining 135M Parallel Sentences in 1620 Language Pairs from Wikipedia [7] +* [**Bitext mining**](https://github.com/facebookresearch/LASER/tree/master/tasks/bucc) using the + [*BUCC*](https://comparable.limsi.fr/bucc2018/bucc2018-task.html) corpus [3,5] +* [**Cross-lingual NLI**](https://github.com/facebookresearch/LASER/tree/master/tasks/xnli) + using the [*XNLI*](https://www.nyu.edu/projects/bowman/xnli/) corpus [4,5,6] +* [**Multilingual similarity search**](https://github.com/facebookresearch/LASER/tree/master/tasks/similarity) [1,6] +* [**Sentence embedding of text files**](https://github.com/facebookresearch/LASER/tree/master/tasks/embed) + example how to calculate sentence embeddings for arbitrary text files in any of the supported language. + +**For all tasks, we use exactly the same multilingual encoder, without any task specific optimization or fine-tuning.** + + + +## References + +[1] Holger Schwenk and Matthijs Douze, + [*Learning Joint Multilingual Sentence Representations with Neural Machine Translation*](https://aclanthology.info/papers/W17-2619/w17-2619), + ACL workshop on Representation Learning for NLP, 2017 + +[2] Holger Schwenk and Xian Li, + [*A Corpus for Multilingual Document Classification in Eight Languages*](http://www.lrec-conf.org/proceedings/lrec2018/pdf/658.pdf), + LREC, pages 3548-3551, 2018. + +[3] Holger Schwenk, + [*Filtering and Mining Parallel Data in a Joint Multilingual Space*](http://aclweb.org/anthology/P18-2037) + ACL, July 2018 + +[4] Alexis Conneau, Guillaume Lample, Ruty Rinott, Adina Williams, Samuel R. Bowman, Holger Schwenk and Veselin Stoyanov, + [*XNLI: Cross-lingual Sentence Understanding through Inference*](https://aclweb.org/anthology/D18-1269), + EMNLP, 2018. + +[5] Mikel Artetxe and Holger Schwenk, + [*Margin-based Parallel Corpus Mining with Multilingual Sentence Embeddings*](https://arxiv.org/abs/1811.01136) + arXiv, Nov 3 2018. + +[6] Mikel Artetxe and Holger Schwenk, + [*Massively Multilingual Sentence Embeddings for Zero-Shot Cross-Lingual Transfer and Beyond*](https://arxiv.org/abs/1812.10464) + arXiv, Dec 26 2018. 
+ +[7] Holger Schwenk, Vishrav Chaudhary, Shuo Sun, Hongyu Gong and Paco Guzman, + [*WikiMatrix: Mining 135M Parallel Sentences in 1620 Language Pairs from Wikipedia*](https://arxiv.org/abs/1907.05791) + arXiv, July 11 2019. + +[8] Holger Schwenk, Guillaume Wenzek, Sergey Edunov, Edouard Grave and Armand Joulin + [*CCMatrix: Mining Billions of High-Quality Parallel Sentences on the WEB*](https://arxiv.org/abs/1911.04944) diff --git a/SpeechT5/fairseq/examples/laser/laser_src/__init__.py b/SpeechT5/fairseq/examples/laser/laser_src/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..9ffbd656d8786e421008fb4cb0d1d8911dc8330c --- /dev/null +++ b/SpeechT5/fairseq/examples/laser/laser_src/__init__.py @@ -0,0 +1,8 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +from .laser_task import * # noqa +from .laser_lstm import * # noqa +from .laser_transformer import * # noqa diff --git a/SpeechT5/fairseq/examples/laser/laser_src/laser_lstm.py b/SpeechT5/fairseq/examples/laser/laser_src/laser_lstm.py new file mode 100644 index 0000000000000000000000000000000000000000..10df90e002d5a7dd74a571dbc3b328c130c57a0a --- /dev/null +++ b/SpeechT5/fairseq/examples/laser/laser_src/laser_lstm.py @@ -0,0 +1,585 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +import torch +import torch.nn as nn +import torch.nn.functional as F + +from fairseq import options, utils + +from fairseq.models import ( + FairseqEncoder, + FairseqIncrementalDecoder, + FairseqEncoderDecoderModel, + register_model, + register_model_architecture, +) + + +@register_model("laser_lstm") +class LSTMModel(FairseqEncoderDecoderModel): + def __init__(self, encoder, decoder): + super().__init__(encoder, decoder) + + def forward( + self, + src_tokens, + src_lengths, + prev_output_tokens=None, + tgt_tokens=None, + tgt_lengths=None, + target_language_id=None, + dataset_name="", + ): + assert target_language_id is not None + + src_encoder_out = self.encoder(src_tokens, src_lengths, dataset_name) + return self.decoder( + prev_output_tokens, src_encoder_out, lang_id=target_language_id + ) + + @staticmethod + def add_args(parser): + """Add model-specific arguments to the parser.""" + parser.add_argument( + "--dropout", + default=0.1, + type=float, + metavar="D", + help="dropout probability", + ) + parser.add_argument( + "--encoder-embed-dim", + type=int, + metavar="N", + help="encoder embedding dimension", + ) + parser.add_argument( + "--encoder-embed-path", + default=None, + type=str, + metavar="STR", + help="path to pre-trained encoder embedding", + ) + parser.add_argument( + "--encoder-hidden-size", type=int, metavar="N", help="encoder hidden size" + ) + parser.add_argument( + "--encoder-layers", type=int, metavar="N", help="number of encoder layers" + ) + parser.add_argument( + "--encoder-bidirectional", + action="store_true", + help="make all layers of encoder bidirectional", + ) + parser.add_argument( + "--decoder-embed-dim", + type=int, + metavar="N", + help="decoder embedding dimension", + ) + parser.add_argument( + "--decoder-embed-path", + default=None, + type=str, + metavar="STR", + help="path to pre-trained decoder embedding", + ) + parser.add_argument( + "--decoder-hidden-size", type=int, metavar="N", help="decoder hidden size" + ) + 
parser.add_argument( + "--decoder-layers", type=int, metavar="N", help="number of decoder layers" + ) + parser.add_argument( + "--decoder-out-embed-dim", + type=int, + metavar="N", + help="decoder output embedding dimension", + ) + parser.add_argument( + "--decoder-zero-init", + type=str, + metavar="BOOL", + help="initialize the decoder hidden/cell state to zero", + ) + parser.add_argument( + "--decoder-lang-embed-dim", + type=int, + metavar="N", + help="decoder language embedding dimension", + ) + parser.add_argument( + "--fixed-embeddings", + action="store_true", + help="keep embeddings fixed (ENCODER ONLY)", + ) # TODO Also apply to decoder embeddings? + + # Granular dropout settings (if not specified these default to --dropout) + parser.add_argument( + "--encoder-dropout-in", + type=float, + metavar="D", + help="dropout probability for encoder input embedding", + ) + parser.add_argument( + "--encoder-dropout-out", + type=float, + metavar="D", + help="dropout probability for encoder output", + ) + parser.add_argument( + "--decoder-dropout-in", + type=float, + metavar="D", + help="dropout probability for decoder input embedding", + ) + parser.add_argument( + "--decoder-dropout-out", + type=float, + metavar="D", + help="dropout probability for decoder output", + ) + + @classmethod + def build_model(cls, args, task): + """Build a new model instance.""" + # make sure that all args are properly defaulted (in case there are any new ones) + base_architecture(args) + + def load_pretrained_embedding_from_file(embed_path, dictionary, embed_dim): + num_embeddings = len(dictionary) + padding_idx = dictionary.pad() + embed_tokens = Embedding(num_embeddings, embed_dim, padding_idx) + embed_dict = utils.parse_embedding(embed_path) + utils.print_embed_overlap(embed_dict, dictionary) + return utils.load_embedding(embed_dict, dictionary, embed_tokens) + + pretrained_encoder_embed = None + if args.encoder_embed_path: + pretrained_encoder_embed = load_pretrained_embedding_from_file( + args.encoder_embed_path, task.source_dictionary, args.encoder_embed_dim + ) + pretrained_decoder_embed = None + if args.decoder_embed_path: + pretrained_decoder_embed = load_pretrained_embedding_from_file( + args.decoder_embed_path, task.target_dictionary, args.decoder_embed_dim + ) + + num_langs = task.num_tasks if hasattr(task, "num_tasks") else 0 + + encoder = LSTMEncoder( + dictionary=task.source_dictionary, + embed_dim=args.encoder_embed_dim, + hidden_size=args.encoder_hidden_size, + num_layers=args.encoder_layers, + dropout_in=args.encoder_dropout_in, + dropout_out=args.encoder_dropout_out, + bidirectional=args.encoder_bidirectional, + pretrained_embed=pretrained_encoder_embed, + fixed_embeddings=args.fixed_embeddings, + ) + decoder = LSTMDecoder( + dictionary=task.target_dictionary, + embed_dim=args.decoder_embed_dim, + hidden_size=args.decoder_hidden_size, + out_embed_dim=args.decoder_out_embed_dim, + num_layers=args.decoder_layers, + dropout_in=args.decoder_dropout_in, + dropout_out=args.decoder_dropout_out, + zero_init=options.eval_bool(args.decoder_zero_init), + encoder_embed_dim=args.encoder_embed_dim, + encoder_output_units=encoder.output_units, + pretrained_embed=pretrained_decoder_embed, + num_langs=num_langs, + lang_embed_dim=args.decoder_lang_embed_dim, + ) + return cls(encoder, decoder) + + +class LSTMEncoder(FairseqEncoder): + """LSTM encoder.""" + + def __init__( + self, + dictionary, + embed_dim=512, + hidden_size=512, + num_layers=1, + dropout_in=0.1, + dropout_out=0.1, + bidirectional=False, + 
left_pad=True, + pretrained_embed=None, + padding_value=0.0, + fixed_embeddings=False, + ): + super().__init__(dictionary) + self.num_layers = num_layers + self.dropout_in = dropout_in + self.dropout_out = dropout_out + self.bidirectional = bidirectional + self.hidden_size = hidden_size + + num_embeddings = len(dictionary) + self.padding_idx = dictionary.pad() + if pretrained_embed is None: + self.embed_tokens = Embedding(num_embeddings, embed_dim, self.padding_idx) + else: + self.embed_tokens = pretrained_embed + if fixed_embeddings: + self.embed_tokens.weight.requires_grad = False + + self.lstm = LSTM( + input_size=embed_dim, + hidden_size=hidden_size, + num_layers=num_layers, + dropout=self.dropout_out if num_layers > 1 else 0.0, + bidirectional=bidirectional, + ) + self.left_pad = left_pad + self.padding_value = padding_value + + self.output_units = hidden_size + if bidirectional: + self.output_units *= 2 + + def forward(self, src_tokens, src_lengths, dataset_name): + if self.left_pad: + # convert left-padding to right-padding + src_tokens = utils.convert_padding_direction( + src_tokens, + self.padding_idx, + left_to_right=True, + ) + + bsz, seqlen = src_tokens.size() + + # embed tokens + x = self.embed_tokens(src_tokens) + x = F.dropout(x, p=self.dropout_in, training=self.training) + + # B x T x C -> T x B x C + x = x.transpose(0, 1) + + # pack embedded source tokens into a PackedSequence + try: + packed_x = nn.utils.rnn.pack_padded_sequence(x, src_lengths.data.tolist()) + except BaseException: + raise Exception(f"Packing failed in dataset {dataset_name}") + + # apply LSTM + if self.bidirectional: + state_size = 2 * self.num_layers, bsz, self.hidden_size + else: + state_size = self.num_layers, bsz, self.hidden_size + h0 = x.data.new(*state_size).zero_() + c0 = x.data.new(*state_size).zero_() + packed_outs, (final_hiddens, final_cells) = self.lstm(packed_x, (h0, c0)) + + # unpack outputs and apply dropout + x, _ = nn.utils.rnn.pad_packed_sequence( + packed_outs, padding_value=self.padding_value + ) + x = F.dropout(x, p=self.dropout_out, training=self.training) + assert list(x.size()) == [seqlen, bsz, self.output_units] + + if self.bidirectional: + + def combine_bidir(outs): + return torch.cat( + [ + torch.cat([outs[2 * i], outs[2 * i + 1]], dim=0).view( + 1, bsz, self.output_units + ) + for i in range(self.num_layers) + ], + dim=0, + ) + + final_hiddens = combine_bidir(final_hiddens) + final_cells = combine_bidir(final_cells) + + encoder_padding_mask = src_tokens.eq(self.padding_idx).t() + + # Set padded outputs to -inf so they are not selected by max-pooling + padding_mask = src_tokens.eq(self.padding_idx).t().unsqueeze(-1) + if padding_mask.any(): + x = x.float().masked_fill_(padding_mask, float("-inf")).type_as(x) + + # Build the sentence embedding by max-pooling over the encoder outputs + sentemb = x.max(dim=0)[0] + + return { + "sentemb": sentemb, + "encoder_out": (x, final_hiddens, final_cells), + "encoder_padding_mask": encoder_padding_mask + if encoder_padding_mask.any() + else None, + } + + def reorder_encoder_out(self, encoder_out_dict, new_order): + encoder_out_dict["sentemb"] = encoder_out_dict["sentemb"].index_select( + 0, new_order + ) + encoder_out_dict["encoder_out"] = tuple( + eo.index_select(1, new_order) for eo in encoder_out_dict["encoder_out"] + ) + if encoder_out_dict["encoder_padding_mask"] is not None: + encoder_out_dict["encoder_padding_mask"] = encoder_out_dict[ + "encoder_padding_mask" + ].index_select(1, new_order) + return encoder_out_dict + + def 
max_positions(self): + """Maximum input length supported by the encoder.""" + return int(1e5) # an arbitrary large number + + +class LSTMDecoder(FairseqIncrementalDecoder): + """LSTM decoder.""" + + def __init__( + self, + dictionary, + embed_dim=512, + hidden_size=512, + out_embed_dim=512, + num_layers=1, + dropout_in=0.1, + dropout_out=0.1, + zero_init=False, + encoder_embed_dim=512, + encoder_output_units=512, + pretrained_embed=None, + num_langs=1, + lang_embed_dim=0, + ): + super().__init__(dictionary) + self.dropout_in = dropout_in + self.dropout_out = dropout_out + self.hidden_size = hidden_size + + num_embeddings = len(dictionary) + padding_idx = dictionary.pad() + if pretrained_embed is None: + self.embed_tokens = Embedding(num_embeddings, embed_dim, padding_idx) + else: + self.embed_tokens = pretrained_embed + + self.layers = nn.ModuleList( + [ + LSTMCell( + input_size=encoder_output_units + embed_dim + lang_embed_dim + if layer == 0 + else hidden_size, + hidden_size=hidden_size, + ) + for layer in range(num_layers) + ] + ) + if hidden_size != out_embed_dim: + self.additional_fc = Linear(hidden_size, out_embed_dim) + self.fc_out = Linear(out_embed_dim, num_embeddings, dropout=dropout_out) + + if zero_init: + self.sentemb2init = None + else: + self.sentemb2init = Linear( + encoder_output_units, 2 * num_layers * hidden_size + ) + + if lang_embed_dim == 0: + self.embed_lang = None + else: + self.embed_lang = nn.Embedding(num_langs, lang_embed_dim) + nn.init.uniform_(self.embed_lang.weight, -0.1, 0.1) + + def forward( + self, prev_output_tokens, encoder_out_dict, incremental_state=None, lang_id=0 + ): + sentemb = encoder_out_dict["sentemb"] + encoder_out = encoder_out_dict["encoder_out"] + + if incremental_state is not None: + prev_output_tokens = prev_output_tokens[:, -1:] + bsz, seqlen = prev_output_tokens.size() + + # get outputs from encoder + encoder_outs, _, _ = encoder_out[:3] + srclen = encoder_outs.size(0) + + # embed tokens + x = self.embed_tokens(prev_output_tokens) + x = F.dropout(x, p=self.dropout_in, training=self.training) + + # embed language identifier + if self.embed_lang is not None: + lang_ids = prev_output_tokens.data.new_full((bsz,), lang_id) + langemb = self.embed_lang(lang_ids) + # TODO Should we dropout here??? 
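        # `langemb` (when present) and the encoder's max-pooled sentence
        # embedding `sentemb` are concatenated to the token embedding at every
        # step of the decoding loop below, so the target-language id conditions
        # every generated token.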
+ + # B x T x C -> T x B x C + x = x.transpose(0, 1) + + # initialize previous states (or get from cache during incremental generation) + cached_state = utils.get_incremental_state( + self, incremental_state, "cached_state" + ) + if cached_state is not None: + prev_hiddens, prev_cells, input_feed = cached_state + else: + num_layers = len(self.layers) + if self.sentemb2init is None: + prev_hiddens = [ + x.data.new(bsz, self.hidden_size).zero_() for i in range(num_layers) + ] + prev_cells = [ + x.data.new(bsz, self.hidden_size).zero_() for i in range(num_layers) + ] + else: + init = self.sentemb2init(sentemb) + prev_hiddens = [ + init[:, (2 * i) * self.hidden_size : (2 * i + 1) * self.hidden_size] + for i in range(num_layers) + ] + prev_cells = [ + init[ + :, + (2 * i + 1) * self.hidden_size : (2 * i + 2) * self.hidden_size, + ] + for i in range(num_layers) + ] + input_feed = x.data.new(bsz, self.hidden_size).zero_() + + attn_scores = x.data.new(srclen, seqlen, bsz).zero_() + outs = [] + for j in range(seqlen): + if self.embed_lang is None: + input = torch.cat((x[j, :, :], sentemb), dim=1) + else: + input = torch.cat((x[j, :, :], sentemb, langemb), dim=1) + + for i, rnn in enumerate(self.layers): + # recurrent cell + hidden, cell = rnn(input, (prev_hiddens[i], prev_cells[i])) + + # hidden state becomes the input to the next layer + input = F.dropout(hidden, p=self.dropout_out, training=self.training) + + # save state for next time step + prev_hiddens[i] = hidden + prev_cells[i] = cell + + out = hidden + out = F.dropout(out, p=self.dropout_out, training=self.training) + + # input feeding + input_feed = out + + # save final output + outs.append(out) + + # cache previous states (no-op except during incremental generation) + utils.set_incremental_state( + self, + incremental_state, + "cached_state", + (prev_hiddens, prev_cells, input_feed), + ) + + # collect outputs across time steps + x = torch.cat(outs, dim=0).view(seqlen, bsz, self.hidden_size) + + # T x B x C -> B x T x C + x = x.transpose(1, 0) + + # srclen x tgtlen x bsz -> bsz x tgtlen x srclen + attn_scores = attn_scores.transpose(0, 2) + + # project back to size of vocabulary + if hasattr(self, "additional_fc"): + x = self.additional_fc(x) + x = F.dropout(x, p=self.dropout_out, training=self.training) + x = self.fc_out(x) + + return x, attn_scores + + def reorder_incremental_state(self, incremental_state, new_order): + super().reorder_incremental_state(incremental_state, new_order) + cached_state = utils.get_incremental_state( + self, incremental_state, "cached_state" + ) + if cached_state is None: + return + + def reorder_state(state): + if isinstance(state, list): + return [reorder_state(state_i) for state_i in state] + return state.index_select(0, new_order) + + new_state = tuple(map(reorder_state, cached_state)) + utils.set_incremental_state(self, incremental_state, "cached_state", new_state) + + def max_positions(self): + """Maximum output length supported by the decoder.""" + return int(1e5) # an arbitrary large number + + +def Embedding(num_embeddings, embedding_dim, padding_idx): + m = nn.Embedding(num_embeddings, embedding_dim, padding_idx=padding_idx) + nn.init.uniform_(m.weight, -0.1, 0.1) + nn.init.constant_(m.weight[padding_idx], 0) + return m + + +def LSTM(input_size, hidden_size, **kwargs): + m = nn.LSTM(input_size, hidden_size, **kwargs) + for name, param in m.named_parameters(): + if "weight" in name or "bias" in name: + param.data.uniform_(-0.1, 0.1) + return m + + +def LSTMCell(input_size, hidden_size, **kwargs): + m 
= nn.LSTMCell(input_size, hidden_size, **kwargs) + for name, param in m.named_parameters(): + if "weight" in name or "bias" in name: + param.data.uniform_(-0.1, 0.1) + return m + + +def Linear(in_features, out_features, bias=True, dropout=0): + """Weight-normalized Linear layer (input: N x T x C)""" + m = nn.Linear(in_features, out_features, bias=bias) + m.weight.data.uniform_(-0.1, 0.1) + if bias: + m.bias.data.uniform_(-0.1, 0.1) + return m + + +@register_model_architecture("laser_lstm", "laser_lstm") +def base_architecture(args): + args.encoder_embed_dim = getattr(args, "encoder_embed_dim", 512) + args.encoder_embed_path = getattr(args, "encoder_embed_path", None) + args.encoder_hidden_size = getattr( + args, "encoder_hidden_size", args.encoder_embed_dim + ) + args.encoder_layers = getattr(args, "encoder_layers", 1) + args.encoder_bidirectional = getattr(args, "encoder_bidirectional", False) + args.encoder_dropout_in = getattr(args, "encoder_dropout_in", args.dropout) + args.encoder_dropout_out = getattr(args, "encoder_dropout_out", args.dropout) + args.decoder_embed_dim = getattr(args, "decoder_embed_dim", 512) + args.decoder_embed_path = getattr(args, "decoder_embed_path", None) + args.decoder_hidden_size = getattr( + args, "decoder_hidden_size", args.decoder_embed_dim + ) + args.decoder_layers = getattr(args, "decoder_layers", 1) + args.decoder_out_embed_dim = getattr(args, "decoder_out_embed_dim", 512) + args.decoder_dropout_in = getattr(args, "decoder_dropout_in", args.dropout) + args.decoder_dropout_out = getattr(args, "decoder_dropout_out", args.dropout) + args.decoder_zero_init = getattr(args, "decoder_zero_init", "0") + args.decoder_lang_embed_dim = getattr(args, "decoder_lang_embed_dim", 0) + args.fixed_embeddings = getattr(args, "fixed_embeddings", False) diff --git a/SpeechT5/fairseq/examples/laser/laser_src/laser_task.py b/SpeechT5/fairseq/examples/laser/laser_src/laser_task.py new file mode 100644 index 0000000000000000000000000000000000000000..c8ac805f540030802e36360abcfc036a9c6f5427 --- /dev/null +++ b/SpeechT5/fairseq/examples/laser/laser_src/laser_task.py @@ -0,0 +1,326 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. 
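# The `laser` task below consumes the JSON config described in the LASER
# README earlier in this diff: it builds one LanguagePairDataset per
# (corpus, language-pair) entry and batches them jointly, with each entry's
# `id` selecting the target-language embedding used by the decoder.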
+ + +from collections import OrderedDict, defaultdict +import json +import os +import logging + +from fairseq import options, models +from fairseq.data import ( + data_utils, + Dictionary, + LanguagePairDataset, + IndexedDataset, + FairseqDataset, +) +from .multitask_data_utils import ( + MultitaskDatasetWrapper, + MultidatasetEpochBatchIterator, +) + + +from fairseq.tasks import LegacyFairseqTask, register_task + +logger = logging.getLogger(__name__) + + +@register_task("laser") +class LaserTask(LegacyFairseqTask): + @staticmethod + def add_args(parser): + """Add task-specific arguments to the parser.""" + parser.add_argument( + "configfile", metavar="PATH", help="dataset configuration file in json" + ) + parser.add_argument( + "--weighting-alpha", + type=float, + default=None, + help="alpha for automatic weighting", + ) + parser.add_argument( + "--raw-text", action="store_true", help="load raw text dataset" + ) + parser.add_argument( + "--left-pad-source", + default="True", + type=str, + metavar="BOOL", + help="pad the source on the left (default: True)", + ) + parser.add_argument( + "--left-pad-target", + default="False", + type=str, + metavar="BOOL", + help="pad the target on the left (default: False)", + ) + parser.add_argument( + "--max-source-positions", + default=1024, + type=int, + metavar="N", + help="max number of tokens in the source sequence", + ) + parser.add_argument( + "--max-target-positions", + default=1024, + type=int, + metavar="N", + help="max number of tokens in the target sequence", + ) + + def __init__(self, args, config, src_dictionary, tgt_dictionary, num_tasks): + super().__init__(args) + self.config = config + self.src_dictionary = src_dictionary + self.tgt_dictionary = tgt_dictionary + self.num_tasks = num_tasks + + @classmethod + def setup_task(cls, args, **kwargs): + with open(args.configfile, "r") as f: + config = json.load(f) + num_tasks = max(dataset["id"] for dataset in config["train"]) + 1 + + args.left_pad_source = options.eval_bool(args.left_pad_source) + args.left_pad_target = options.eval_bool(args.left_pad_target) + + src_dictionary = Dictionary.load(config["src_vocab"]) + tgt_dictionary = Dictionary.load(config["tgt_vocab"]) + + logger.info( + "| src Dictionary {} : {} types".format( + config["src_vocab"], len(src_dictionary) + ) + ) + logger.info( + "| tgt Dictionary {} : {} types".format( + config["tgt_vocab"], len(tgt_dictionary) + ) + ) + + return cls(args, config, src_dictionary, tgt_dictionary, num_tasks) + + # Experimental overriding for backtranslation + def build_model(self, args): + model = models.build_model(args, self) + return model + + def dataset(self, split): + if split not in self.datasets: + raise KeyError("Dataset not loaded: " + split) + return self.datasets[split] + + def load_dataset(self, split, epoch=1, **kwargs): + """Load a dataset split.""" + + def indexed_dataset(path, dictionary): + if self.args.raw_text: + raise Exception("Unable to handle raw text.") + dataset = IndexedDataset(path, fix_lua_indexing=True) + + return dataset + + pair_datasets = OrderedDict() + + if split == "valid": + self.datasets[split] = pair_datasets + return + + if split not in self.config: + raise FileNotFoundError( + "Dataset not found in config file: {}".format(split) + ) + + size_by_corpus = defaultdict(int) + size_sum = 0 + size_sum_with_subsampling = 0 + init_pair_datasets = {} + + for dataset_config in self.config[split]: + src_path = os.path.dirname(dataset_config["src"]) + corpus_name = src_path.split("/")[-2] + language_pair_name = 
src_path.split("/")[-1] + pair_datasets_key = corpus_name + "-" + language_pair_name + + logger.info(f"loading... {pair_datasets_key}") + if "src" in dataset_config: + src_dataset = indexed_dataset( + dataset_config["src"], self.src_dictionary + ) + else: + src_dataset = None + + if "tgt" in dataset_config: + tgt_dataset = indexed_dataset( + dataset_config["tgt"], self.tgt_dictionary + ) + else: + tgt_dataset = None + + dataset = LanguagePairDataset( + src_dataset, + src_dataset.sizes, + self.src_dictionary, + tgt_dataset, + tgt_dataset.sizes, + self.tgt_dictionary, + left_pad_source=self.args.left_pad_source, + left_pad_target=self.args.left_pad_target, + ) + + if pair_datasets_key in init_pair_datasets: + logger.warning( + f"Ignoring already added {pair_datasets_key}. " + f"Consider using `sample` key in order to upsample." + ) + else: + init_pair_datasets[pair_datasets_key] = { + "dataset": dataset, + "sample": dataset_config.get("sample", None), + "id": dataset_config.get("id", None), + "len": len(dataset), + } + + length_sum = 0 + weighted_freqs_sum = 0 + freq_per_dataset = {} + vmax = 0 + vmin = 1 + weighted_freq_per_dataset = {} + + if self.args.weighting_alpha: + for key in init_pair_datasets: + if init_pair_datasets[key]["sample"] is None: + length_sum += len(init_pair_datasets[key]["dataset"]) + + for key in init_pair_datasets: + if init_pair_datasets[key]["sample"] is None: + val = float(init_pair_datasets[key]["len"]) / length_sum + freq_per_dataset[key] = val + weighted_freqs_sum += val ** self.args.weighting_alpha + + for key in freq_per_dataset: + val = ( + freq_per_dataset[key] ** self.args.weighting_alpha + / weighted_freqs_sum + ) + vmin = min(vmin, val) + vmax = max(vmax, val) + weighted_freq_per_dataset[key] = val + + for pair_datasets_key in init_pair_datasets: + dataset_config = init_pair_datasets[pair_datasets_key] + dataset = dataset_config["dataset"] + sample = dataset_config["sample"] + if sample is None: + sample = 1.0 + + if pair_datasets_key in weighted_freq_per_dataset: + w = vmax / weighted_freq_per_dataset[pair_datasets_key] + sample = w + + sample = round(sample) + + initial_sample = sample + initial_pair_datasets_key = pair_datasets_key + + while sample >= 1.0: + assert ( + pair_datasets_key not in pair_datasets + ), f"{pair_datasets_key} already in" + size_sum_with_subsampling += len(dataset) + pair_datasets[pair_datasets_key] = MultitaskDatasetWrapper( + dataset, dataset_config.get("id", 0), 1.0, name=pair_datasets_key + ) + size_sum += len(dataset) + sample -= 1.0 + pair_datasets_key += "-up" + + assert sample < 1e-6, f"sample remains > 0 {pair_datasets_key}" + + logger.info( + f"added pair {initial_pair_datasets_key} length {len(dataset)} new_length = {len(dataset)*initial_sample}" + ) + size_by_corpus[corpus_name] += len(dataset) + + self.datasets[split] = pair_datasets + logger.info( + f"Datasets number = {len(self.datasets[split])} size = {size_sum} size_sum_with_subsampling = {size_sum_with_subsampling}" + ) + + @property + def source_dictionary(self): + return self.src_dictionary + + @property + def target_dictionary(self): + return self.tgt_dictionary + + def get_batch_iterator( + self, + dataset, + max_tokens=None, + max_sentences=None, + max_positions=None, + ignore_invalid_inputs=False, + required_batch_size_multiple=1, + seed=1, + num_shards=1, + shard_id=0, + num_workers=0, + epoch=1, + data_buffer_size=0, + disable_iterator_cache=False, + ): + + assert isinstance(dataset, OrderedDict) + assert len(dataset) + assert 
isinstance(dataset[next(iter(dataset))], FairseqDataset) + + # initialize the dataset with the correct starting epoch + for _, dt in dataset.items(): + dt.set_epoch(epoch) + + indices = OrderedDict() + batch_sampler = OrderedDict() + + with data_utils.numpy_seed(seed + epoch): + for key, dt in dataset.items(): + logger.info(f"\t ordered_indices {key}") + indices[key] = dt.ordered_indices() + + # filter examples that are too large + if max_positions is not None: + for key, dt in dataset.items(): + logger.info(f"\t filter_by_size {key}") + indices[key], ignored = dt.filter_indices_by_size( + indices[key], max_positions + ) + + for key, dt in dataset.items(): + logger.info(f"\t batch_by_size {key}") + batch_sampler[key] = data_utils.batch_by_size( + indices[key], + dt.num_tokens, + max_tokens=max_tokens, + max_sentences=max_sentences, + required_batch_size_multiple=required_batch_size_multiple, + ) + + epoch_iter = MultidatasetEpochBatchIterator( + dataset=dataset, + batch_sampler=batch_sampler, + seed=seed, + num_shards=num_shards, + shard_id=shard_id, + num_workers=num_workers, + epoch=epoch, + ) + + return epoch_iter diff --git a/SpeechT5/fairseq/examples/laser/laser_src/laser_transformer.py b/SpeechT5/fairseq/examples/laser/laser_src/laser_transformer.py new file mode 100644 index 0000000000000000000000000000000000000000..0be030994ff87334ca0392302374693f7f2c61b3 --- /dev/null +++ b/SpeechT5/fairseq/examples/laser/laser_src/laser_transformer.py @@ -0,0 +1,354 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +import logging + +from typing import Any, Dict, List, Optional +from torch import Tensor + +import torch +import torch.nn as nn + +from fairseq.models import ( + FairseqEncoderDecoderModel, + register_model, + register_model_architecture, +) +from fairseq.models.transformer import ( + base_architecture, + Embedding, + TransformerModel, + TransformerEncoder, + TransformerDecoder, +) +from fairseq.modules import ( + TransformerDecoderLayer, +) + +logger = logging.getLogger(__name__) + + +@register_model("laser_transformer") +class LaserTransformerModel(FairseqEncoderDecoderModel): + """Train Transformer for LASER task + + Requires --task laser + """ + + def __init__(self, encoder, decoder): + super().__init__(encoder, decoder) + + def forward( + self, + src_tokens, + src_lengths, + prev_output_tokens=None, + tgt_tokens=None, + tgt_lengths=None, + target_language_id=-1, + dataset_name="", + ): + laser_encoder_out = self.encoder(src_tokens, src_lengths) + return self.decoder( + prev_output_tokens, laser_encoder_out, lang_id=target_language_id + ) + + @staticmethod + def add_args(parser): + """Add model-specific arguments to the parser.""" + TransformerModel.add_args(parser) + parser.add_argument( + "--decoder-lang-embed-dim", + type=int, + metavar="N", + help="decoder language embedding dimension", + ) + + @classmethod + def build_model(cls, args, task): + base_laser_transformer_architecture(args) + + num_langs = task.num_tasks if hasattr(task, "num_tasks") else 0 + + def load_embed_tokens(dictionary, embed_dim): + num_embeddings = len(dictionary) + padding_idx = dictionary.pad() + + return Embedding(num_embeddings, embed_dim, padding_idx) + + encoder_embed_tokens = load_embed_tokens( + task.source_dictionary, args.encoder_embed_dim + ) + decoder_embed_tokens = load_embed_tokens( + task.target_dictionary, args.decoder_embed_dim + ) + num_langs = 
task.num_tasks if hasattr(task, "num_tasks") else 0 + + encoder = LaserTransformerEncoder( + args, task.source_dictionary, encoder_embed_tokens + ) + + decoder = LaserTransformerDecoder( + args, + task.target_dictionary, + decoder_embed_tokens, + num_langs=num_langs, + lang_embed_dim=args.decoder_lang_embed_dim, + ) + + return cls(encoder, decoder) + + +class LaserTransformerEncoder(TransformerEncoder): + def __init__(self, *args, **kwargs): + super().__init__(*args, **kwargs) + + def forward(self, src_tokens, *args, **kwargs): + encoder_out = super().forward(src_tokens, *args, **kwargs) + + x = encoder_out["encoder_out"][0] # T x B x C + padding_mask = src_tokens.eq(self.padding_idx).t().unsqueeze(-1) + + if padding_mask.any(): + x = x.float().masked_fill_(padding_mask, float("-inf")).type_as(x) + + # Build the sentence embedding by max-pooling over the encoder outputs + sentemb = x.max(dim=0)[0] + + # The Pytorch Mobile lite interpreter does not supports returning NamedTuple in + # `foward` so we use a dictionary instead. + # TorchScript does not support mixed values so the values are all lists. + # The empty list is equivalent to None. + return {"sentemb": [sentemb]} # B x C + + @torch.jit.export + def reorder_encoder_out(self, encoder_out: Dict[str, List[Tensor]], new_order): + """ + Same as the one in transformer.py, with new_sentemb + """ + if len(encoder_out["sentemb"]) == 0: + new_sentemb = [] + else: + new_sentemb = [encoder_out["sentemb"][0].index_select(0, new_order)] + + return { + "sentemb": new_sentemb, # B x C + } + + +class LaserTransformerDecoder(TransformerDecoder): + def __init__(self, args, dictionary, *kargs, **kwargs): + self.num_langs = kwargs.get("num_langs", 1) + self.lang_embed_dim = kwargs.get("lang_embed_dim", 0) + kwargs.pop("num_langs", None) + kwargs.pop("lang_embed_dim", None) + + super().__init__(args, dictionary, *kargs, **kwargs, no_encoder_attn=True) + + if self.lang_embed_dim == 0: + self.embed_lang = None + else: + self.embed_lang = nn.Embedding(self.num_langs, self.lang_embed_dim) + nn.init.uniform_(self.embed_lang.weight, -0.1, 0.1) + + if self.output_projection is not None: + laser_output_embed_dim = ( + self.output_embed_dim + self.lang_embed_dim + args.encoder_embed_dim + ) + self.output_projection = nn.Linear( + laser_output_embed_dim, len(dictionary), bias=False + ) + nn.init.normal_( + self.output_projection.weight, + mean=0, + std=laser_output_embed_dim ** -0.5, + ) + + def build_decoder_layer(self, args, no_encoder_attn=False): + decoder_embed_dim = args.decoder_embed_dim + args.decoder_embed_dim = ( + decoder_embed_dim + self.lang_embed_dim + args.encoder_embed_dim + ) + res = TransformerDecoderLayer(args, no_encoder_attn=True) + args.decoder_embed_dim = decoder_embed_dim + + return res + + def extract_features( + self, + prev_output_tokens, + encoder_out: Optional[Dict[str, List[Tensor]]], + incremental_state: Optional[Dict[str, Dict[str, Optional[Tensor]]]] = None, + full_context_alignment: bool = False, + alignment_layer: Optional[int] = None, + alignment_heads: Optional[int] = None, + lang_id: Optional[int] = None, + ): + """ + Similar to *forward* but only return features. + + Includes several features from "Jointly Learning to Align and + Translate with Transformer Models" (Garg et al., EMNLP 2019). + + Args: + full_context_alignment (bool, optional): don't apply + auto-regressive mask to self-attention (default: False). + alignment_layer (int, optional): return mean alignment over + heads at this layer (default: last layer). 
+ alignment_heads (int, optional): only average alignment over + this many heads (default: all heads). + + Returns: + tuple: + - the decoder's features of shape `(batch, tgt_len, embed_dim)` + - a dictionary with any model-specific outputs + """ + if alignment_layer is None: + alignment_layer = self.num_layers - 1 + + # embed positions + positions = ( + self.embed_positions( + prev_output_tokens, incremental_state=incremental_state + ) + if self.embed_positions is not None + else None + ) + + if incremental_state is not None: + prev_output_tokens = prev_output_tokens[:, -1:] + if positions is not None: + positions = positions[:, -1:] + + bsz, seqlen = prev_output_tokens.size() + + # embed tokens and positions + x = self.embed_scale * self.embed_tokens(prev_output_tokens) + + if self.quant_noise is not None: + x = self.quant_noise(x) + + if self.project_in_dim is not None: + x = self.project_in_dim(x) + + if positions is not None: + x += positions + + if self.layernorm_embedding is not None: + x = self.layernorm_embedding(x) + + x = self.dropout_module(x) + + # B x T x C -> T x B x C + x = x.transpose(0, 1) + + if self.embed_lang is not None: + lang_ids = prev_output_tokens.data.new_full((bsz,), lang_id) + langemb = self.embed_lang(lang_ids) + langemb = langemb.unsqueeze(0) + repeat_vals = [x.shape[0] // langemb.shape[0]] + [-1] * ( + len(langemb.shape) - 1 + ) + x = torch.cat((x, langemb.expand(*repeat_vals)), dim=-1) + + sentemb = encoder_out["sentemb"][0] + sentemb = sentemb.unsqueeze(0) + + repeat_vals = [x.shape[0] // sentemb.shape[0]] + [-1] * (len(sentemb.shape) - 1) + x = torch.cat((x, sentemb.expand(*repeat_vals)), dim=-1) + + self_attn_padding_mask: Optional[Tensor] = None + if self.cross_self_attention or prev_output_tokens.eq(self.padding_idx).any(): + self_attn_padding_mask = prev_output_tokens.eq(self.padding_idx) + + # decoder layers + attn: Optional[Tensor] = None + inner_states: List[Optional[Tensor]] = [x] + for idx, layer in enumerate(self.layers): + if incremental_state is None and not full_context_alignment: + self_attn_mask = self.buffered_future_mask(x) + else: + self_attn_mask = None + + x, layer_attn, _ = layer( + x, + None, + None, + incremental_state, + self_attn_mask=self_attn_mask, + self_attn_padding_mask=self_attn_padding_mask, + need_attn=bool((idx == alignment_layer)), + need_head_weights=bool((idx == alignment_layer)), + ) + inner_states.append(x) + if layer_attn is not None and idx == alignment_layer: + attn = layer_attn.float().to(x) + + if attn is not None: + if alignment_heads is not None: + attn = attn[:alignment_heads] + + # average probabilities over heads + attn = attn.mean(dim=0) + + if self.layer_norm is not None: + x = self.layer_norm(x) + + # T x B x C -> B x T x C + x = x.transpose(0, 1) + + if self.project_out_dim is not None: + x = self.project_out_dim(x) + + return x, {"attn": [attn], "inner_states": inner_states} + + def forward( + self, + prev_output_tokens, + encoder_out: Optional[Dict[str, List[Tensor]]] = None, + incremental_state: Optional[Dict[str, Dict[str, Optional[Tensor]]]] = None, + features_only: bool = False, + alignment_layer: Optional[int] = None, + alignment_heads: Optional[int] = None, + src_lengths: Optional[Any] = None, + return_all_hiddens: bool = False, + lang_id: Optional[int] = None, + ): + """ + Args: + prev_output_tokens (LongTensor): previous decoder outputs of shape + `(batch, tgt_len)`, for teacher forcing + encoder_out (optional): output from the encoder, used for + encoder-side attention + incremental_state 
(dict): dictionary used for storing state during + :ref:`Incremental decoding` + features_only (bool, optional): only return features without + applying output layer (default: False). + + Returns: + tuple: + - the decoder's output of shape `(batch, tgt_len, vocab)` + - a dictionary with any model-specific outputs + """ + + assert lang_id is not None + + x, extra = self.extract_features( + prev_output_tokens, + encoder_out=encoder_out, + incremental_state=incremental_state, + alignment_layer=alignment_layer, + alignment_heads=alignment_heads, + lang_id=lang_id, + ) + if not features_only: + x = self.output_layer(x) + return x, extra + + +@register_model_architecture("laser_transformer", "laser_transformer") +def base_laser_transformer_architecture(args): + base_architecture(args) + args.decoder_lang_embed_dim = getattr(args, "decoder_lang_embed_dim", 0) diff --git a/SpeechT5/fairseq/examples/laser/laser_src/multitask_data_utils.py b/SpeechT5/fairseq/examples/laser/laser_src/multitask_data_utils.py new file mode 100644 index 0000000000000000000000000000000000000000..b05caea26793bf5112a7abc29d76225f578f3ebe --- /dev/null +++ b/SpeechT5/fairseq/examples/laser/laser_src/multitask_data_utils.py @@ -0,0 +1,143 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +from collections import OrderedDict + +import numpy as np + +from fairseq.data import BaseWrapperDataset, FairseqDataset, iterators + + +class MultiItr(object): + def __init__(self, itr): + self.itr = itr + self._counts = [0 for x in itr] + + def __len__(self): + return sum(len(itr) for itr in self.itr) + + def __iter__(self): + return self + + def __next__(self): + ratios = [count / len(itr) for count, itr in zip(self._counts, self.itr)] + idx = ratios.index(min(ratios)) + self._counts[idx] += 1 + return next(self.itr[idx]) + + +class MultidatasetEpochBatchIterator(iterators.EpochBatchIterating): + """A wrapper around multiple epoch batch iterators.""" + + def __init__( + self, + dataset, + batch_sampler, + seed=1, + num_shards=1, + shard_id=0, + num_workers=0, + epoch=1, + ): + + assert isinstance(dataset, OrderedDict) + assert len(dataset) + assert isinstance(dataset[next(iter(dataset))], FairseqDataset) + + self.iterators = [] + + self.epoch = epoch + for key, dt in dataset.items(): + epoch_iter = iterators.EpochBatchIterator( + dataset=dt, + collate_fn=dt.collater, + batch_sampler=batch_sampler[key], + seed=seed, + num_shards=num_shards, + shard_id=shard_id, + num_workers=0, + epoch=epoch, + ) + self.iterators.append(epoch_iter) + + def __len__(self): + return sum(len(itr) for itr in self.iterators) + + def next_epoch_itr(self, shuffle=True, fix_batches_to_gpus=False): + # `self.epoch += 1` should be handled by underlying `EpochBatchIterator`s. 
+ return MultiItr( + [ + itr.next_epoch_itr( + shuffle=shuffle, fix_batches_to_gpus=fix_batches_to_gpus + ) + for itr in self.iterators + ] + ) + + def end_of_epoch(self): + return all(itr.end_of_epoch() for itr in self.iterators) + + @property + def next_epoch_idx(self): + """Return the epoch index after *next_epoch_itr* is called.""" + + epochs = [itr.next_epoch_idx for itr in self.iterators] + self.epoch = epochs[0] + assert all(epoch == self.epoch for epoch in epochs) + + return self.epoch + + @property + def iterations_in_epoch(self): + return sum(itr.iterations_in_epoch for itr in self.iterators) + + def state_dict(self): + return { + "iterators": [it.state_dict() for it in self.iterators], + "epoch": self.epoch, + } + + def load_state_dict(self, state_dict): + self.epoch = state_dict["epoch"] + for it, d in zip(self.iterators, state_dict["iterators"]): + it.load_state_dict(d) + + +class MultitaskDatasetWrapper(BaseWrapperDataset): + """A wrapper for a multitask dataset.""" + + def __init__(self, dataset, target_language_id, sample=1.0, name=""): + super().__init__(dataset) + self.target_language_id = target_language_id + self.sample = sample + self.name = name + + def collater(self, *args, **kwargs): + ans = self.dataset.collater(*args, **kwargs) + if "net_input" in ans: + ans["net_input"]["target_language_id"] = self.target_language_id + ans["net_input"]["dataset_name"] = self.name + return ans + + def num_tokens(self, *args, **kwargs): + return self.dataset.num_tokens(*args, **kwargs) + + def ordered_indices(self, *args, **kwargs): + indices = self.dataset.ordered_indices(*args, **kwargs) + # Hacky solution for sampling + size = int(self.sample * indices.shape[0]) + + return indices.take(np.sort(np.random.permutation(indices.shape[0])[:size])) + + def size(self, index: int): + return self.dataset.size(index) + + @property + def supports_prefetch(self): + """Whether this dataset supports prefetching.""" + return getattr(self.dataset, "supports_prefetch", False) + + def prefetch(self, indices): + return self.dataset.prefetch(indices) diff --git a/SpeechT5/fairseq/examples/latent_depth/README.md b/SpeechT5/fairseq/examples/latent_depth/README.md new file mode 100644 index 0000000000000000000000000000000000000000..7774c333053b95d15b180fdfc3ee3cd817790520 --- /dev/null +++ b/SpeechT5/fairseq/examples/latent_depth/README.md @@ -0,0 +1,77 @@ +# Deep Transformers with Latent Depth (Li et al., 2020) + +[https://arxiv.org/abs/2009.13102](https://arxiv.org/abs/2009.13102). + +## Introduction + +We present a probabilistic framework to automatically learn which layer(s) to use by learning the posterior distributions of layer selection. As an extension of this framework, we propose a novel method to train one shared Transformer network for multilingual machine translation with different layer selection posteriors for each language pair. + +## Training a multilingual model with latent depth + +Below is an example of training with latent depth in decoder for one-to-many (O2M) related languages. We use the same preprocessed (numberized and binarized) TED8 dataset as in [Balancing Training for Multilingual Neural Machine Translation (Wang et al., 2020)](https://github.com/cindyxinyiwang/multiDDS), which could be generated by [the script](https://github.com/cindyxinyiwang/multiDDS/blob/multiDDS/util_scripts/prepare_multilingual_data.sh) the author provided. 
+```bash +lang_pairs_str="eng-aze,eng-bel,eng-ces,eng-glg,eng-por,eng-rus,eng-slk,eng-tur" +databin_dir= + +fairseq-train ${databin_dir} \ + --user-dir examples/latent_depth/latent_depth_src \ + --lang-pairs "${lang_pairs_str}" \ + --arch multilingual_transformer_iwslt_de_en \ + --task multilingual_translation_latent_depth \ + --criterion label_smoothed_cross_entropy --label-smoothing 0.1 \ + --share-encoders \ + --share-decoders \ + --decoder-langtok \ + --share-decoder-input-output-embed \ + --dropout 0.3 --attention-dropout 0.3 \ + --optimizer adam --adam-eps 1e-06 --adam-betas '(0.9, 0.98)' \ + --lr-scheduler inverse_sqrt --stop-min-lr 1e-9 --warmup-init-lr 1e-7 --warmup-updates 8000 \ + --max-tokens 4096 --update-freq 1 \ + --lr 0.0015 \ + --clip-norm 1.0 \ + --seed 2 \ + --ddp-backend=legacy_ddp \ + --encoder-layers 12 \ + --decoder-layers 24 \ + --decoder-latent-layer \ + --sparsity-weight 0.1 \ + --anneal-updates 5000 \ + --soft-update 500 \ + --target-layers 12 \ + --share-weight 0.1 +``` +## Inference command + +```bash +lang_pairs_str="eng-aze,eng-bel,eng-ces,eng-glg,eng-por,eng-rus,eng-slk,eng-tur" +databin_dir= +model_path= +src_lang= +tgt_lang= +gen_data= + +fairseq-generate ${databin_dir} \ + --path ${model_path} \ + --task multilingual_translation_latent_depth \ + --decoder-latent-layer \ + --lang-pairs "${lang_pairs_str}" \ + -s ${src_lang} -t ${tgt_lang} \ + --gen-subset $gen_data \ + --scoring sacrebleu \ + --remove-bpe 'sentencepiece' \ + --lenpen 1.0 \ + --beam 5 \ + --decoder-langtok \ + --max-tokens 4096 +``` + + +## Citation +```bibtex +@article{li2020deep, + title={Deep Transformers with Latent Depth}, + author={Li, Xian and Stickland, Asa Cooper and Tang, Yuqing and Kong, Xiang}, + journal={arXiv preprint arXiv:2009.13102}, + year={2020} +} +``` diff --git a/SpeechT5/fairseq/examples/latent_depth/latent_depth_src/__init__.py b/SpeechT5/fairseq/examples/latent_depth/latent_depth_src/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..c5fa76039ff98c18d3c14b5f4a8f73ffe644de11 --- /dev/null +++ b/SpeechT5/fairseq/examples/latent_depth/latent_depth_src/__init__.py @@ -0,0 +1,9 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +from . import multilingual_translation_latent_depth # noqa +from .loss import latent_depth # noqa +from .models import latent_multilingual_transformer # noqa +from .modules import latent_layers # noqa diff --git a/SpeechT5/fairseq/examples/latent_depth/latent_depth_src/loss/__init__.py b/SpeechT5/fairseq/examples/latent_depth/latent_depth_src/loss/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 diff --git a/SpeechT5/fairseq/examples/latent_depth/latent_depth_src/loss/latent_depth.py b/SpeechT5/fairseq/examples/latent_depth/latent_depth_src/loss/latent_depth.py new file mode 100644 index 0000000000000000000000000000000000000000..a3b9535ecac3ec403868681a8b50c1fbe1c90dfe --- /dev/null +++ b/SpeechT5/fairseq/examples/latent_depth/latent_depth_src/loss/latent_depth.py @@ -0,0 +1,99 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. 
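The loss module that follows anneals the weight on its KL term linearly once soft sampling ends and caps it at `--sparsity-weight`. A small numerical sketch of that schedule, assuming the values from the training command above (`--sparsity-weight 0.1 --soft-update 500 --anneal-updates 5000`); the helper name `kl_weight` is ours, but the formula mirrors the weight computed inside `LatentLayersKLLoss.forward` below:

```python
def kl_weight(update_num, sparsity_weight=0.1, soft_update=500, anneal_updates=5000):
    # Linear ramp starting after `soft_update` steps, capped at `sparsity_weight`.
    return min(sparsity_weight,
               (update_num - soft_update) * sparsity_weight / anneal_updates)

for step in (500, 1000, 3000, 5500, 10000):
    print(step, round(kl_weight(step), 3))
# 500 0.0, 1000 0.01, 3000 0.05, 5500 0.1, 10000 0.1
```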
+ +import math + +import torch +from torch.nn.modules.loss import _Loss + + +class LatentLayersKLLoss(_Loss): + def __init__(self, args): + super().__init__() + self.args = args + + def forward(self, layer_samples, lang_idx, update_num, sample_size): + prior = self.args.prior + samples = layer_samples[lang_idx] + eps = 1e-7 + if prior == "uniform": + # uniform prior + kl_loss = (samples * (torch.log(samples + eps) - math.log(0.5))).sum(-1) + elif prior == "agged_posterior": + # aggregated posterior + y_t = torch.stack([x.detach() for x in layer_samples], dim=0) + agged_q = torch.sum(y_t, dim=0) + row_norm = agged_q.sum(-1) + normed_agg_q = agged_q / row_norm + kl_loss = ( + samples * (torch.log(samples + eps) - torch.log(normed_agg_q + eps)) + ).sum(-1) + else: + raise NotImplementedError("The specified prior is not implemented.") + + # normalized by number of layers + kl_loss /= layer_samples[0].size()[0] + kl_weight = min( + self.args.sparsity_weight, + (update_num - self.args.soft_update) + * self.args.sparsity_weight + / self.args.anneal_updates, + ) + kl_loss *= kl_weight * sample_size + return kl_loss + + +class LatentLayersSparsityLoss(_Loss): + def __init__(self, args): + super().__init__() + self.args = args + + def is_valid(self, update_num): + if self.args.target_layers <= 0: + return False + return update_num > (self.args.soft_update + self.args.anneal_updates) + + def forward(self, layer_samples_list, update_num, sample_size): + batch_loss = 0 + share_loss = 0 + global_sparsity_loss = 0 + layer_samples = torch.stack(layer_samples_list, dim=0) + if ( + self.args.target_layers > 0 or self.args.share_weight > 0 + ) and update_num > (self.args.soft_update + self.args.anneal_updates): + # anneal sparsity weight + if update_num < (self.args.anneal_updates + self.args.soft_update): + weight_anneal = 0 + elif update_num < (2 * self.args.anneal_updates + self.args.soft_update): + weight_anneal = ( + (update_num - self.args.soft_update - self.args.anneal_updates) + * self.args.share_weight + / self.args.anneal_updates + ) + else: + weight_anneal = 1 + # compute ratio among languages + layer_utilization = torch.sum(layer_samples, dim=0) + layer_utilization /= layer_samples.size()[0] + if self.args.share_weight > 0: + # encouraging sharing across languages + share_loss = sum( + -1.0 * v * math.log(v) for v in layer_utilization if v > 0 + ) + batch_loss += ( + weight_anneal * self.args.share_weight * sample_size * share_loss + ) + if self.args.target_layers > 0: + # computed expected number of layers selected + expeted_layers = sum(layer_utilization) + # compute l2 loss wrt target number of layers + global_sparsity_loss = (expeted_layers - self.args.target_layers) ** 2 + batch_loss += ( + weight_anneal + * self.args.share_weight + * sample_size + * global_sparsity_loss + ) + return batch_loss diff --git a/SpeechT5/fairseq/examples/latent_depth/latent_depth_src/models/__init__.py b/SpeechT5/fairseq/examples/latent_depth/latent_depth_src/models/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 diff --git a/SpeechT5/fairseq/examples/latent_depth/latent_depth_src/models/latent_multilingual_transformer.py b/SpeechT5/fairseq/examples/latent_depth/latent_depth_src/models/latent_multilingual_transformer.py new file mode 100644 index 0000000000000000000000000000000000000000..12b7e67d0336e54be05f9fdec49df2b7d4c7ae29 --- /dev/null +++ 
b/SpeechT5/fairseq/examples/latent_depth/latent_depth_src/models/latent_multilingual_transformer.py @@ -0,0 +1,75 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +from fairseq.models import register_model, register_model_architecture +from fairseq.models.multilingual_transformer import MultilingualTransformerModel +from fairseq.models.transformer import ( + TransformerDecoder, + TransformerEncoder, + base_architecture, +) + +from .latent_transformer import LatentTransformerDecoder, LatentTransformerEncoder + + +@register_model("latent_multilingual_transformer") +class LatentMultilingualTransformerModel(MultilingualTransformerModel): + """A variant of standard multilingual Transformer models which encoder and/or + decoders supports latent depth, as is in "Deep Transformer with Latent Depth" + (https://arxiv.org/abs/2009.13102). + """ + + @staticmethod + def add_args(parser): + """Add model-specific arguments to the parser.""" + MultilingualTransformerModel.add_args(parser) + parser.add_argument( + '--soft-select', + action='store_true', + help='use soft samples in training an inference', + ) + parser.add_argument( + '--sampling-tau', + type=float, + default=5., + help='sampling temperature', + ) + + @classmethod + def _get_module_class(cls, is_encoder, args, lang_dict, embed_tokens, langs): + if is_encoder: + if hasattr(args, "encoder_latent_layer") and args.encoder_latent_layer: + return LatentTransformerEncoder( + args, lang_dict, embed_tokens, num_logits=len(langs) + ) + else: + return TransformerEncoder(args, lang_dict, embed_tokens) + else: + if hasattr(args, "decoder_latent_layer") and args.decoder_latent_layer: + return LatentTransformerDecoder( + args, lang_dict, embed_tokens, num_logits=len(langs) + ) + else: + return TransformerDecoder(args, lang_dict, embed_tokens) + + +@register_model_architecture( + "latent_multilingual_transformer", "latent_multilingual_transformer" +) +def latent_multilingual_architecture(args): + args.encoder_embed_dim = getattr(args, "encoder_embed_dim", 512) + args.encoder_ffn_embed_dim = getattr(args, "encoder_ffn_embed_dim", 1024) + args.encoder_attention_heads = getattr(args, "encoder_attention_heads", 4) + args.encoder_layers = getattr(args, "encoder_layers", 12) + args.decoder_embed_dim = getattr(args, "decoder_embed_dim", 512) + args.decoder_ffn_embed_dim = getattr(args, "decoder_ffn_embed_dim", 1024) + args.decoder_attention_heads = getattr(args, "decoder_attention_heads", 4) + args.decoder_layers = getattr(args, "decoder_layers", 24) + args.share_encoders = getattr(args, "share_encoders", True) + args.share_decoders = getattr(args, "share_decoders", True) + args.share_encoder_embeddings = getattr(args, "share_encoder_embeddings", True) + args.share_decoder_embeddings = getattr(args, "share_decoder_embeddings", True) + + base_architecture(args) diff --git a/SpeechT5/fairseq/examples/latent_depth/latent_depth_src/models/latent_transformer.py b/SpeechT5/fairseq/examples/latent_depth/latent_depth_src/models/latent_transformer.py new file mode 100644 index 0000000000000000000000000000000000000000..6a825301a452bd935deafdaf78fa2427ca9a469e --- /dev/null +++ b/SpeechT5/fairseq/examples/latent_depth/latent_depth_src/models/latent_transformer.py @@ -0,0 +1,156 @@ +# Copyright (c) Facebook, Inc. and its affiliates. 
+# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +from typing import Any, Dict, Optional + +import torch.nn as nn +from fairseq.models.fairseq_encoder import EncoderOut +from fairseq.models.transformer import TransformerDecoder, TransformerEncoder +from fairseq.modules import TransformerDecoderLayer, TransformerEncoderLayer +from torch import Tensor + +from ..modules.latent_layers import LayerSelect + + +class LatentTransformerEncoder(TransformerEncoder): + """Latent depth (https://arxiv.org/abs/2009.13102) implemented in + TransformerEncoder. + """ + + def __init__(self, args, dictionary, embed_tokens, num_logits=1): + self.num_logits = num_logits + self.num_layers = args.encoder_layers + super().__init__(args, dictionary, embed_tokens) + self.layer_select = LayerSelect( + num_layers=self.num_layers, + num_logits=self.num_logits, + soft_select=getattr(args, "soft_select", False), + sampling_tau=getattr(args, "sampling_tau", 5.), + ) + self.lang_idx = None + self.layers = nn.ModuleList( + [self._build_encoder_layer(args, idx) for idx in range(args.encoder_layers)] + ) + + def set_lang_idx(self, lang_idx): + self.lang_idx = lang_idx + + def _build_encoder_layer(self, args, idx=None): + return LatentTransformerEncoderLayer(args, idx, layer_select=self.layer_select) + + def forward(self, src_tokens, src_lengths, return_all_hiddens: bool = False): + self.layer_select.sample(self.lang_idx) + return super().forward(src_tokens, src_lengths, return_all_hiddens) + + +class LatentTransformerEncoderLayer(TransformerEncoderLayer): + """Encoder layer with each (non_residual) block weighted by samples of Bernouli + or Gumbel Signmoid samples. + + Args: + args (argparse.Namespace): parsed command-line arguments from standard + TransformerEncoderLayer. + idx (int): layer index (used to retrieve samples). + layer_select (LayerSelect, optional): instance of LayerSelect module with logits + parameters and sampling method. + """ + + def __init__(self, args, idx, layer_select=None): + super().__init__(args) + self.idx = idx + self.layer_select = layer_select + + def residual_connection(self, x, residual): + return residual + x * self.layer_select(self.idx) + + +class LatentTransformerDecoder(TransformerDecoder): + """Latent depth (https://arxiv.org/abs/2009.13102) implemented in + TransformerDecoder. 
+ """ + + def __init__( + self, args, dictionary, embed_tokens, no_encoder_attn=False, num_logits=1 + ): + self.num_logits = num_logits + self.num_layers = args.decoder_layers + super().__init__( + args, dictionary, embed_tokens, no_encoder_attn=no_encoder_attn + ) + self.layer_select = LayerSelect( + num_layers=self.num_layers, + num_logits=self.num_logits, + soft_select=getattr(args, "soft_select", False), + sampling_tau=getattr(args, "sampling_tau", 5.), + ) + self.lang_idx = None + self.layers = nn.ModuleList( + [ + self._build_decoder_layer(args, no_encoder_attn, idx) + for idx in range(args.decoder_layers) + ] + ) + + def set_lang_idx(self, lang_idx): + self.lang_idx = lang_idx + + def _build_decoder_layer(self, args, no_encoder_attn=False, idx=None): + return LatentTransformerDecoderLayer( + args, idx, layer_select=self.layer_select, no_encoder_attn=no_encoder_attn + ) + + def forward( + self, + prev_output_tokens, + encoder_out: Optional[EncoderOut] = None, + incremental_state: Optional[Dict[str, Dict[str, Optional[Tensor]]]] = None, + features_only: bool = False, + alignment_layer: Optional[int] = None, + alignment_heads: Optional[int] = None, + src_lengths: Optional[Any] = None, + return_all_hiddens: bool = False, + ): + self.layer_select.sample(self.lang_idx) + return super().forward( + prev_output_tokens=prev_output_tokens, + encoder_out=encoder_out, + incremental_state=incremental_state, + features_only=features_only, + alignment_layer=alignment_layer, + src_lengths=src_lengths, + return_all_hiddens=return_all_hiddens, + ) + + +class LatentTransformerDecoderLayer(TransformerDecoderLayer): + """Decoder layer with each (non_residual) block weighted by samples of Bernouli + or Gumbel Signmoid samples. + + Args: + args (argparse.Namespace): parsed command-line arguments from standard + TransformerDecoderLayer. + idx (int): layer index (used to retrieve samples). + layer_select (LayerSelect, optional): instance of LayerSelect module with logits + parameters and sampling method. + no_encoder_attn (bool, optional): whether to attend to encoder outputs + (default: False). + + """ + + def __init__( + self, + args, + idx, + layer_select=None, + no_encoder_attn=False, + add_bias_kv=False, + add_zero_attn=False, + ): + super().__init__(args, no_encoder_attn, add_bias_kv, add_zero_attn) + self.idx = idx + self.layer_select = layer_select + + def residual_connection(self, x, residual): + return residual + x * self.layer_select(self.idx) diff --git a/SpeechT5/fairseq/examples/latent_depth/latent_depth_src/modules/__init__.py b/SpeechT5/fairseq/examples/latent_depth/latent_depth_src/modules/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 diff --git a/SpeechT5/fairseq/examples/latent_depth/latent_depth_src/modules/latent_layers.py b/SpeechT5/fairseq/examples/latent_depth/latent_depth_src/modules/latent_layers.py new file mode 100644 index 0000000000000000000000000000000000000000..2be05d5535cb05b16f61603a7356df2326bf2e23 --- /dev/null +++ b/SpeechT5/fairseq/examples/latent_depth/latent_depth_src/modules/latent_layers.py @@ -0,0 +1,75 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. 
+ +import torch +import torch.nn as nn + + +class LayerSelect(nn.Module): + """Compute samples (from a Gumbel-Sigmoid distribution) which is used as + either (soft) weighting or (hard) selection of residual connection. + https://arxiv.org/abs/2009.13102 + """ + def __init__(self, num_layers, num_logits, soft_select=False, sampling_tau=5.): + super(LayerSelect, self).__init__() + self.layer_logits = torch.nn.Parameter( + torch.Tensor(num_logits, num_layers), + requires_grad=True, + ) + self.hard_select = not soft_select + self.tau = sampling_tau + self.detach_grad = False + self.layer_samples = [None] * num_logits + + def sample(self, logit_idx): + """To leverage the efficiency of distributed training, samples for all + layers are computed at once for each logit_idx. Logits are parameters + learnt independent of each other. + + Args: + logit_idx: The index of logit parameters used for sampling. + """ + assert logit_idx is not None + self.samples = self._gumbel_sigmoid( + self.layer_logits[logit_idx, :].detach() + if self.detach_grad + else self.layer_logits[logit_idx, :], + dim=-1, + tau=self.tau, + hard=self.hard_select, + ) + self.layer_samples[logit_idx] = self.samples + + def forward(self, i): + sample = self.samples[i] + return sample + + def _gumbel_sigmoid( + self, logits, tau=1, hard=False, eps=1e-10, dim=-1, threshold=0.5 + ): + # ~Gumbel(0,1) + gumbels1 = ( + -torch.empty_like(logits, memory_format=torch.legacy_contiguous_format) + .exponential_() + .log() + ) + gumbels2 = ( + -torch.empty_like(logits, memory_format=torch.legacy_contiguous_format) + .exponential_() + .log() + ) + # Difference of two gumbels because we apply a sigmoid + gumbels1 = (logits + gumbels1 - gumbels2) / tau + y_soft = gumbels1.sigmoid() + if hard: + # Straight through. + y_hard = torch.zeros_like( + logits, memory_format=torch.legacy_contiguous_format + ).masked_fill(y_soft > threshold, 1.0) + ret = y_hard - y_soft.detach() + y_soft + else: + # Reparametrization trick. + ret = y_soft + return ret diff --git a/SpeechT5/fairseq/examples/latent_depth/latent_depth_src/multilingual_translation_latent_depth.py b/SpeechT5/fairseq/examples/latent_depth/latent_depth_src/multilingual_translation_latent_depth.py new file mode 100644 index 0000000000000000000000000000000000000000..b5cd51a470bd56266d4198b6cd20004c53b04c70 --- /dev/null +++ b/SpeechT5/fairseq/examples/latent_depth/latent_depth_src/multilingual_translation_latent_depth.py @@ -0,0 +1,194 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +from fairseq.tasks import register_task +from fairseq.tasks.multilingual_translation import MultilingualTranslationTask + +from .loss.latent_depth import LatentLayersKLLoss, LatentLayersSparsityLoss + + +@register_task("multilingual_translation_latent_depth") +class MultilingualTranslationTaskLatentDepth(MultilingualTranslationTask): + """A task for multiple translation with latent depth. + + See `"Deep Transformer with Latent Depth" + (Li et al., 2020) `_. 
+ """ + + @staticmethod + def add_args(parser): + """Add task-specific arguments to the parser.""" + # fmt: off + MultilingualTranslationTask.add_args(parser) + parser.add_argument('--encoder-latent-layer', action='store_true', help='latent layer selection in encoder') + parser.add_argument('--decoder-latent-layer', action='store_true', help='latent layer selection in decoder') + parser.add_argument('--target-layers', default=-1, type=int, + help='number of effective layers to learn; -1 means no constraint') + parser.add_argument('--sparsity-weight', default=0.0, type=float, + help='weight for sparsity loss') + parser.add_argument('--share-weight', default=0.0, type=float, + help='weight for sharing loss') + parser.add_argument('--soft-update', default=1, type=int, + help='number of updates with soft sampling') + parser.add_argument('--anneal-updates', default=1, type=int, + help='number of updates to anneal the KL loss weight') + parser.add_argument('--prior', default="uniform", type=str, + help='prior used for computing KL loss') + # fmt: on + + def __init__(self, args, dicts, training): + super().__init__(args, dicts, training) + self.src_langs, self.tgt_langs = zip( + *[(lang.split("-")[0], lang.split("-")[1]) for lang in args.lang_pairs] + ) + if self.training and self.encoder_latent_layer: + assert self.args.share_encoders + if self.training and self.decoder_latent_layer: + assert self.args.share_decoders + if training or self.encoder_latent_layer or self.decoder_latent_layer: + self.lang_pairs = args.lang_pairs + else: + self.lang_pairs = ["{}-{}".format(args.source_lang, args.target_lang)] + self.eval_lang_pairs = self.lang_pairs + self.model_lang_pairs = self.lang_pairs + if self.training and (self.encoder_latent_layer or self.decoder_latent_layer): + self.kl_loss = LatentLayersKLLoss(self.args) + self.sparsity_loss = LatentLayersSparsityLoss(self.args) + + def _per_lang_pair_train_loss( + self, lang_pair, model, update_num, criterion, sample, optimizer, ignore_grad + ): + src, tgt = lang_pair.split("-") + if self.encoder_latent_layer: + src_lang_idx = self.src_lang_idx_dict[src] + model.models[lang_pair].encoder.set_lang_idx(src_lang_idx) + model.models[lang_pair].encoder.layer_select.hard_select = ( + update_num > self.args.soft_update + ) + if self.decoder_latent_layer: + tgt_lang_idx = self.tgt_lang_idx_dict[tgt] + model.models[lang_pair].decoder.set_lang_idx(tgt_lang_idx) + model.models[lang_pair].decoder.layer_select.hard_select = ( + update_num > self.args.soft_update + ) + + loss, sample_size, logging_output = criterion( + model.models[lang_pair], sample[lang_pair] + ) + if self.encoder_latent_layer: + none_samples = sum( + 1 if x is None else 0 + for x in model.models[lang_pair].encoder.layer_select.layer_samples + ) + if none_samples == 0 or self.args.prior != "agged_posterior": + loss += self.kl_loss( + model.models[lang_pair].encoder.layer_select.layer_samples, + src_lang_idx, + update_num, + sample_size, + ) + if self.decoder_latent_layer: + none_samples = sum( + 1 if x is None else 0 + for x in model.models[lang_pair].decoder.layer_select.layer_samples + ) + if none_samples == 0 or self.args.prior != "agged_posterior": + loss += self.kl_loss( + model.models[lang_pair].decoder.layer_select.layer_samples, + tgt_lang_idx, + update_num, + sample_size, + ) + if ignore_grad: + loss *= 0 + + if hasattr(self, "sparsity_loss") and self.sparsity_loss.is_valid(update_num): + # need to retain the graph if sparsity loss needs to be added + loss.backward(retain_graph=True) + else: 
+ optimizer.backward(loss) + + return loss, sample_size, logging_output + + def train_step( + self, sample, model, criterion, optimizer, update_num, ignore_grad=False + ): + agg_loss, agg_sample_size, agg_logging_output = super().train_step( + sample, model, criterion, optimizer, update_num, ignore_grad + ) + # compute auxiliary loss from layere sparsity, based on all samples from all languages + if hasattr(self, "sparsity_loss") and self.sparsity_loss.is_valid(update_num): + sparsity_loss = 0 + if self.encoder_latent_layer: + sparsity_loss += self.sparsity_loss( + next( + iter(model.models.values()) + ).encoder.layer_select.layer_samples, + update_num, + agg_sample_size, + ) + if self.decoder_latent_layer: + sparsity_loss += self.sparsity_loss( + next( + iter(model.models.values()) + ).decoder.layer_select.layer_samples, + update_num, + agg_sample_size, + ) + if sparsity_loss > 0: + optimizer.backward(sparsity_loss) + return agg_loss, agg_sample_size, agg_logging_output + + def _per_lang_pair_valid_loss(self, lang_pair, model, criterion, sample): + src, tgt = lang_pair.split("-") + if self.encoder_latent_layer: + src_lang_idx = self.src_lang_idx_dict[src] + model.models[lang_pair].encoder.set_lang_idx(src_lang_idx) + if self.decoder_latent_layer: + tgt_lang_idx = self.tgt_lang_idx_dict[tgt] + model.models[lang_pair].decoder.set_lang_idx(tgt_lang_idx) + loss, sample_size, logging_output = criterion( + model.models[lang_pair], sample[lang_pair] + ) + return loss, sample_size, logging_output + + def inference_step( + self, generator, models, sample, prefix_tokens=None, constraints=None + ): + if self.encoder_latent_layer or self.decoder_latent_layer: + for model in models: + if self.encoder_latent_layer: + assert model.encoder.layer_select is not None + src_lang_idx = self.src_lang_idx_dict[self.args.source_lang] + model.encoder.set_lang_idx(src_lang_idx) + if self.decoder_latent_layer: + assert model.decoder.layer_select is not None + tgt_lang_idx = self.tgt_lang_idx_dict[self.args.target_lang] + model.decoder.set_lang_idx(tgt_lang_idx) + return super().inference_step( + generator, models, sample, prefix_tokens, constraints + ) + + @property + def encoder_latent_layer(self): + return ( + hasattr(self.args, "encoder_latent_layer") + and self.args.encoder_latent_layer + ) + + @property + def decoder_latent_layer(self): + return ( + hasattr(self.args, "decoder_latent_layer") + and self.args.decoder_latent_layer + ) + + @property + def src_lang_idx_dict(self): + return {lang: lang_idx for lang_idx, lang in enumerate(self.src_langs)} + + @property + def tgt_lang_idx_dict(self): + return {lang: lang_idx for lang_idx, lang in enumerate(self.tgt_langs)} diff --git a/SpeechT5/fairseq/examples/layerdrop/README.md b/SpeechT5/fairseq/examples/layerdrop/README.md new file mode 100644 index 0000000000000000000000000000000000000000..394e710b0f522981dbb073f28eaf550ee28760cf --- /dev/null +++ b/SpeechT5/fairseq/examples/layerdrop/README.md @@ -0,0 +1,154 @@ +# Reducing Transformer Depth on Demand with Structured Dropout (Fan et al., 2019) +This page contains information for how to train models with LayerDrop, based on this [paper](https://arxiv.org/abs/1909.11556). 
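Before the details below, a minimal sketch of the LayerDrop idea itself: during training, each residual layer is skipped with probability `p`, which regularizes the network and lets shallower sub-networks be pruned out at inference time. This is an illustration only, not the fairseq implementation (which is enabled with the `--encoder-layerdrop`/`--decoder-layerdrop` flags shown below):

```python
import torch
import torch.nn as nn

class LayerDropStack(nn.Module):
    """Apply a stack of layers, dropping each with probability p while training."""
    def __init__(self, layers, p=0.2):
        super().__init__()
        self.layers = nn.ModuleList(layers)
        self.p = p

    def forward(self, x):
        for layer in self.layers:
            if self.training and torch.rand(()) < self.p:
                continue  # skip the entire layer for this forward pass
            x = layer(x)
        return x

stack = LayerDropStack([nn.Linear(16, 16) for _ in range(8)], p=0.2)
y_train = stack(torch.randn(4, 16))   # some layers randomly skipped
stack.eval()
y_eval = stack(torch.randn(4, 16))    # all layers applied at inference
```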
+ +## Citation: +If you found this technique useful, please cite our paper: +```bibtex +@article{fan2019reducing, + title={Reducing Transformer Depth on Demand with Structured Dropout}, + author={Fan, Angela and Grave, Edouard and Joulin, Armand}, + journal={arXiv preprint arXiv:1909.11556}, + year={2019} +} +``` + +## Pre-trained models + +Model | Description | Download +---|---|--- +`layerdrop_wmt_en_de_12_6` | Transformer + LayerDrop 0.2 trained on WMT16 en-de with 12 encoder and 6 decoder layers | [layerdrop_wmt_en_de_12_6.tar.gz](https://dl.fbaipublicfiles.com/fairseq/models/layerdrop_wmt_en_de_12_6.tar.gz) +`roberta_layerdrop.base` | RoBERTa Base + LayerDrop 0.2 | [roberta_layerdrop.base.tar.gz](https://dl.fbaipublicfiles.com/fairseq/models/roberta_layerdrop.base.qnli.tar.gz) +`roberta_layerdrop.large` | RoBERTa Large + LayerDrop 0.2 | [roberta_layerdrop.large.tar.gz](https://dl.fbaipublicfiles.com/fairseq/models/roberta_layerdrop.large.tar.gz) +`roberta_layerdrop.large.mnli` | `roberta_layerdrop.large` finetuned on [MNLI](http://www.nyu.edu/projects/bowman/multinli) | [roberta_layerdrop.large.mnli.tar.gz](https://dl.fbaipublicfiles.com/fairseq/models/roberta_layerdrop.large.mnli.tar.gz) +`roberta_layerdrop.large.qnli` | `roberta_layerdrop.large` finetuned on [QNLI](https://arxiv.org/abs/1804.07461) | [roberta_layerdrop.large.mnli.tar.gz](https://dl.fbaipublicfiles.com/fairseq/models/roberta_layerdrop.large.qnli.tar.gz) + + +Evaluate performance of these pre-trained models: +```bash +# Example for Machine Translation +fairseq-generate /path/to/bped/wmt/data --path nmt_checkpoint.pt \ + --beam 8 --lenpen 0.4 \ + --batch-size 64 \ + --remove-bpe \ + --gen-subset test > wmt16_gen.txt +bash scripts/compound_split_bleu.sh wmt16_gen.txt +# prints BLEU4 = 30.17 +``` + +```python +# Example for RoBERTa + LayerDrop finetuned on MNLI: +from fairseq.models.roberta import RobertaModel + +roberta_layerdrop = RobertaModel.from_pretrained( + '/path/to/MNLI/model', + checkpoint_file='mnli_checkpoint.pt', + data_name_or_path='/path/to/MNLI/data/MNLI-bin' +) +label_map = {0: 'contradiction', 2: 'neutral', 1: 'entailment'} +ncorrect, nsamples = 0, 0 +roberta_layerdrop.cuda() +roberta_layerdrop.eval() +with open('/path/to/MNLI/data/dev_matched.tsv') as fin: + fin.readline() + for index, line in enumerate(fin): + tokens = line.strip().split('\t') + sent1, sent2, target = tokens[8], tokens[9], tokens[-1] + tokens = roberta_layerdrop.encode(sent1, sent2) + prediction = roberta_layerdrop.predict('sentence_classification_head', tokens).argmax().item() + prediction_label = label_map[prediction] + ncorrect += int(prediction_label == target) + nsamples += 1 +print('| Accuracy: ', float(ncorrect)/float(nsamples)) +# prints | Accuracy: 0.9026999490575649 + + +# Example for RoBERTa + LayerDrop finetuned on QNLI: +roberta = RobertaModel.from_pretrained( + '/path/to/QNLI/model', + checkpoint_file='qnli_checkpoint.pt', + data_name_or_path='/path/to/QNLI/data/QNLI-bin' +) + +label_fn = lambda label: roberta.task.label_dictionary.string( + [label + roberta.task.target_dictionary.nspecial] +) +ncorrect, nsamples = 0, 0 +roberta.cuda() +roberta.eval() +with open('/path/to/QNLI/data/dev.tsv') as fin: + fin.readline() + for index, line in enumerate(fin): + tokens = line.strip().split('\t') + sent1, sent2, target = tokens[1], tokens[2], tokens[3] + tokens = roberta.encode(sent1, sent2) + prediction = roberta.predict('sentence_classification_head', tokens).argmax().item() + prediction_label = label_fn(prediction) + ncorrect 
+= int(prediction_label == target) + nsamples += 1 +print('| Accuracy: ', float(ncorrect)/float(nsamples)) +# prints | Accuracy: 0.9480139117700896 +``` + + +## Example usage + +To train a model with LayerDrop, add the following flags. We recommend 0.2, a value that worked well in our experiments. For Language Models that are decoder-only, you need only the decoder flag. For RoBERTa, an encoder, you need only the encoder flag. The encoder and decoder LayerDrop values can be set differently. +``` +--encoder-layerdrop 0.2 --decoder-layerdrop 0.2 +``` + +To prune a model that has been trained with LayerDrop, add the following flags followed by a comma separated list of which layers you would like to keep. +``` +--encoder-layers-to-keep 0,2,4,6,8,10,12,14 --decoder-layers-to-keep 0,2,4,6,8,10,12,14 +``` +Setting these flags should print a message such as: +``` +| Pruning model to specified layer configuration +``` +You should also see a smaller number of parameters in the model, for example the 16-Layer Transformer Language Model prints: +``` +num. model params: 246933504 +``` +while a model pruned to 8 Layers prints: +``` +num. model params: 146163712 +``` + +If you would like to pick up training with a model that has been pruned, simply adding these flags is sufficient. If you would like to use a script that only does evaluation (no training), you may need to pass an override command. A specific example would be for language modeling: +```bash +fairseq-eval-lm /path/to/wikitext-103 \ + --path /path/to/model/checkpoint.pt \ + --model-overrides "{'decoder_layers_to_keep':'0,2,4,6,8,10,12,14'}" +``` +This model override command overrides the training parameters and updates the model arguments so that the pruned model is run instead of the full model. + +## Reproduce Paper Results + +Looking to reproduce the results in the paper? + +1. For Translation on WMT16 en-de, we followed this setting [here](https://github.com/pytorch/fairseq/blob/master/examples/scaling_nmt/README.md) +2. To train RoBERTa, we followed this setting [here](https://github.com/pytorch/fairseq/tree/master/examples/roberta) +3. To train Language Models on Wikitext-103, we followed this setting [here](https://github.com/pytorch/fairseq/tree/master/examples/language_model) + + +## Tips + +1. If you would like to train large models with better performance, LayerDrop should be set to a smaller value such as 0.1 or 0.2. Too much LayerDrop will mean the model has too much regularization, so may not reach the best performance. Since LayerDrop adds regularization, you may achieve the best performance by slightly reducing the amount of standard dropout (for example, reduce by 0.1). + +2. If you would like to train large models to be pruned and made smaller, LayerDrop should be set to a larger value such as 0.5 if you want to prune very aggressively (such as removing half the network or more). If you would like to prune fewer layers away, LayerDrop can be set to a smaller value such as 0.2. Our experiments were conducted with low values of LayerDrop (such as 0.1 and 0.2), for reference. + +3. When pruning layers at inference time, it is best to spread out the layers remaining so they are evenly spaced throughout the network. For example, if you want to remove 50% of the network, keeping every other layer is good. + + +## FAQ + +1. How did the sharing layers experiment work? In an appendix (https://openreview.net/pdf?id=SylO2yStDr) we added an experiment on Wikitext-103 language modeling that combined LayerDrop with Weight Sharing. 
We shared chunks of 2 layers such that every other layer had shared weights. For example, if our network has layers 1 through 6, then layer 1 and 2 are shared, layer 3 and 4 are shared, and layer 5 and 6 are shared. + +2. LayerDrop hasn't been helping in my setting? During training time, LayerDrop can help regularize your network. This is most important if your network is already overfitting - if your network is underfitting, it is possible LayerDrop is adding too much regularization. We recommend using smaller values (such as 0.1 or 0.2) and also decreasing the quantity of standard dropout (for example, reduce by 0.1). + +3. Can you train a model without LayerDrop and finetune with LayerDrop (e.g. for BERT)? In our experiments, we did not see great performance. Models such as RoBERTa have trained for a long time in the pre-training setting, so only finetuning with LayerDrop for a few epochs on a downstream task such as MNLI does not achieve the robustness required for successful pruning. + + +## Having an issue or have a question? + +Please open an issue in this repository with the details of your question. Thanks! diff --git a/SpeechT5/fairseq/examples/linformer/README.md b/SpeechT5/fairseq/examples/linformer/README.md new file mode 100644 index 0000000000000000000000000000000000000000..f8b36bc691cb8f5bf82942e07b6d9c014387bdd8 --- /dev/null +++ b/SpeechT5/fairseq/examples/linformer/README.md @@ -0,0 +1,22 @@ +# Linformer: Self-Attention with Linear Complexity (Wang et al., 2020) + +This example contains code to train Linformer models as described in our paper +[Linformer: Self-Attention with Linear Complexity](https://arxiv.org/abs/2006.04768). + +## Training a new Linformer RoBERTa model + +You can mostly follow the [RoBERTa pretraining README](/examples/roberta/README.pretraining.md), +updating your training command with `--user-dir examples/linformer/linformer_src --arch linformer_roberta_base`. + +## Citation + +If you use our work, please cite: + +```bibtex +@article{wang2020linformer, + title={Linformer: Self-Attention with Linear Complexity}, + author={Wang, Sinong and Li, Belinda and Khabsa, Madian and Fang, Han and Ma, Hao}, + journal={arXiv preprint arXiv:2006.04768}, + year={2020} +} +``` diff --git a/SpeechT5/fairseq/examples/linformer/linformer_src/__init__.py b/SpeechT5/fairseq/examples/linformer/linformer_src/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..1c52f135ea6f99d0effe8ce1f7d77cbd66be3745 --- /dev/null +++ b/SpeechT5/fairseq/examples/linformer/linformer_src/__init__.py @@ -0,0 +1,6 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +from .models import linformer_roberta # noqa diff --git a/SpeechT5/fairseq/examples/linformer/linformer_src/models/__init__.py b/SpeechT5/fairseq/examples/linformer/linformer_src/models/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 diff --git a/SpeechT5/fairseq/examples/linformer/linformer_src/models/linformer_roberta.py b/SpeechT5/fairseq/examples/linformer/linformer_src/models/linformer_roberta.py new file mode 100644 index 0000000000000000000000000000000000000000..18ad44f079e691e7f46aa2745fe4f35d4466ca33 --- /dev/null +++ b/SpeechT5/fairseq/examples/linformer/linformer_src/models/linformer_roberta.py @@ -0,0 +1,119 @@ +# Copyright (c) Facebook, Inc. and its affiliates. 
+# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. +""" +Linformer: Self-Attention with Linear Complexity +""" + +import logging + +import torch +from fairseq import utils +from fairseq.models import register_model, register_model_architecture +from fairseq.models.roberta import ( + init_bert_params, + roberta_base_architecture, + roberta_large_architecture, + RobertaEncoder, + RobertaModel, +) + +from ..modules.linformer_sentence_encoder import LinformerTransformerEncoder + + +logger = logging.getLogger(__name__) + + +@register_model("linformer_roberta") +class LinformerModel(RobertaModel): + @staticmethod + def add_args(parser): + RobertaModel.add_args(parser) + + # add args for Linformer + parser.add_argument( + "--compressed", type=int, help="compressed ratio of sequence length" + ) + parser.add_argument( + "--shared-kv-compressed", + type=int, + help="share compressed matrix between k and v, in each layer", + ) + parser.add_argument( + "--shared-layer-kv-compressed", + type=int, + help="share compressed matrix between k and v and across all layers", + ) + parser.add_argument( + "--freeze-compress", + type=int, + help="freeze the parameters in compressed layer", + ) + + @classmethod + def build_model(cls, args, task): + """Build a new model instance.""" + + # make sure all arguments are present + base_architecture(args) + + if not hasattr(args, "max_positions"): + args.max_positions = args.tokens_per_sample + + encoder = LinformerEncoder(args, task.source_dictionary) + return cls(args, encoder) + + +class LinformerEncoder(RobertaEncoder): + """Linformer encoder.""" + + def __init__(self, args, dictionary): + super().__init__(args, dictionary) + self.register_buffer("version", torch.tensor(2)) + + def build_encoder(self, args, dictionary, embed_tokens): + encoder = LinformerTransformerEncoder(args, dictionary, embed_tokens) + encoder.apply(init_bert_params) + return encoder + + def upgrade_state_dict_named(self, state_dict, name): + super().upgrade_state_dict_named(state_dict, name) + prefix = name + "." 
if name != "" else "" + + # some old checkpoints had weight sharing implemented incorrectly + # (note: this was correct in the original paper code) + if utils.item(state_dict.get(f"{prefix}version", torch.tensor(1))) < 2: + state_dict[f"{prefix}version"] = torch.tensor(1) + # check if input embeddings and output embeddings were tied + if not torch.allclose( + state_dict[f"{prefix}sentence_encoder.embed_tokens.weight"], + state_dict[f"{prefix}lm_head.weight"], + ): + # they weren't tied, re-init the LM head without weight sharing + self.lm_head = self.build_lm_head( + embed_dim=self.args.encoder_embed_dim, + output_dim=len(self.dictionary), + activation_fn=self.args.activation_fn, + weight=None, # don't share weights + ) + + +@register_model_architecture("linformer_roberta", "linformer_roberta") +def base_architecture(args): + args.compressed = getattr(args, "compressed", 4) + args.shared_kv_compressed = getattr(args, "shared_kv_compressed", 0) + args.shared_layer_kv_compressed = getattr(args, "shared_layer_kv_compressed", 0) + args.freeze_compress = getattr(args, "freeze_compress", 0) + roberta_base_architecture(args) + + +@register_model_architecture("linformer_roberta", "linformer_roberta_base") +def linformer_roberta_base_architecture(args): + base_architecture(args) + + +@register_model_architecture("linformer_roberta", "linformer_roberta_large") +def linformer_roberta_large_architecture(args): + roberta_large_architecture(args) + base_architecture(args) diff --git a/SpeechT5/fairseq/examples/linformer/linformer_src/modules/__init__.py b/SpeechT5/fairseq/examples/linformer/linformer_src/modules/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 diff --git a/SpeechT5/fairseq/examples/linformer/linformer_src/modules/linformer_sentence_encoder.py b/SpeechT5/fairseq/examples/linformer/linformer_src/modules/linformer_sentence_encoder.py new file mode 100644 index 0000000000000000000000000000000000000000..44f7989bd863329f763aa62b78df2eb42b3084ea --- /dev/null +++ b/SpeechT5/fairseq/examples/linformer/linformer_src/modules/linformer_sentence_encoder.py @@ -0,0 +1,54 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +import math + +import torch.nn as nn +from fairseq.models.transformer import TransformerEncoder + +from .linformer_sentence_encoder_layer import LinformerTransformerEncoderLayer + + +class LinformerTransformerEncoder(TransformerEncoder): + """ + Implementation for a Bi-directional Linformer based Sentence Encoder used + in BERT/XLM style pre-trained models. + + This first computes the token embedding using the token embedding matrix, + position embeddings (if specified) and segment embeddings + (if specified). After applying the specified number of + LinformerEncoderLayers, it outputs all the internal states of the + encoder as well as the final representation associated with the first + token (usually CLS token). + + Input: + - tokens: B x T matrix representing sentences + - segment_labels: B x T matrix representing segment label for tokens + + Output: + - a tuple of the following: + - a list of internal model states used to compute the + predictions where each tensor has shape T x B x C + - sentence representation associated with first input token + in format B x C. 
+ """ + + def __init__(self, args, dictionary, embed_tokens): + self.compress_layer = None + super().__init__(args, dictionary, embed_tokens) + + def build_encoder_layer(self, args): + if self.args.shared_layer_kv_compressed == 1 and self.compress_layer is None: + compress_layer = nn.Linear( + self.args.max_positions, + self.args.max_positions // self.args.compressed, + ) + # intialize parameters for compressed layer + nn.init.xavier_uniform_(compress_layer.weight, gain=1 / math.sqrt(2)) + if self.args.freeze_compress == 1: + compress_layer.weight.requires_grad = False + self.compress_layer = compress_layer + + return LinformerTransformerEncoderLayer(args, self.compress_layer) diff --git a/SpeechT5/fairseq/examples/linformer/linformer_src/modules/linformer_sentence_encoder_layer.py b/SpeechT5/fairseq/examples/linformer/linformer_src/modules/linformer_sentence_encoder_layer.py new file mode 100644 index 0000000000000000000000000000000000000000..7e2caa03400129ac0bb34ae35274cdf46f27a055 --- /dev/null +++ b/SpeechT5/fairseq/examples/linformer/linformer_src/modules/linformer_sentence_encoder_layer.py @@ -0,0 +1,65 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +import torch +from fairseq import utils +from fairseq.modules import TransformerEncoderLayer + +from .multihead_linear_attention import MultiheadLinearAttention + + +class LinformerTransformerEncoderLayer(TransformerEncoderLayer): + """ + Implements a Linformer Encoder Layer used in BERT/XLM style pre-trained + models. + """ + + def __init__(self, args, shared_compress_layer): + # wrap in a list so it's not automatically registered by PyTorch + self.shared_compress_layer = [shared_compress_layer] + + super().__init__(args) + + self.register_buffer("version", torch.tensor(2)) + + def build_self_attention(self, embed_dim, args): + return MultiheadLinearAttention( + embed_dim, + args.encoder_attention_heads, + dropout=args.dropout, + self_attention=True, + q_noise=args.quant_noise_pq, + qn_block_size=args.quant_noise_pq_block_size, + compressed=args.compressed, + max_seq_len=args.max_positions, + shared_kv_compressed=args.shared_kv_compressed, + shared_compress_layer=self.shared_compress_layer[0], + freeze_compress=args.freeze_compress, + ) + + def upgrade_state_dict_named(self, state_dict, name): + super().upgrade_state_dict_named(state_dict, name) + prefix = name + "." 
if name != "" else "" + + # some old checkpoints had weight sharing implemented incorrectly + # (note: this was correct in the original paper code) + if utils.item(state_dict.get(f"{prefix}version", torch.tensor(1))) < 2: + state_dict[f"{prefix}version"] = torch.tensor(1) + # check compression layer sharing + if f"{prefix}shared_compress_layer.weight" in state_dict: + # reinitialize block without sharing compression layer to match + # old behavior + self.shared_compress_layer = [ + torch.nn.Linear( + self.shared_compress_layer[0].weight.size(1), + self.shared_compress_layer[0].weight.size(0), + ) + ] + self.self_attn = self.build_self_attention(self.embed_dim, self.args) + # delete shared_compress_layer, since it's already copied to + # self_attn.compress_k.weight + del state_dict[f"{prefix}shared_compress_layer.weight"] + if f"{prefix}shared_compress_layer.bias" in state_dict: + del state_dict[f"{prefix}shared_compress_layer.bias"] diff --git a/SpeechT5/fairseq/examples/linformer/linformer_src/modules/multihead_linear_attention.py b/SpeechT5/fairseq/examples/linformer/linformer_src/modules/multihead_linear_attention.py new file mode 100644 index 0000000000000000000000000000000000000000..6be1007279217c5de644e8b054f5d14a19f06c55 --- /dev/null +++ b/SpeechT5/fairseq/examples/linformer/linformer_src/modules/multihead_linear_attention.py @@ -0,0 +1,481 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +import math +from typing import Dict, Optional, Tuple + +import torch +import torch.nn.functional as F +from fairseq import utils +from fairseq.incremental_decoding_utils import with_incremental_state +from fairseq.modules.quant_noise import quant_noise +from torch import Tensor, nn +from torch.nn import Parameter + + +@with_incremental_state +class MultiheadLinearAttention(nn.Module): + """Multi-headed linformer attention. + + Projects the key and values down to the compressed dimension, before computing self-attention. + + See "Linformer: Self-Attention with Linear Complexity" for more details. 
+ """ + + def __init__( + self, + embed_dim, + num_heads, + kdim=None, + vdim=None, + dropout=0.0, + bias=True, + add_bias_kv=False, + add_zero_attn=False, + self_attention=False, + encoder_decoder_attention=False, + q_noise=0.0, + qn_block_size=8, + compressed=1, + max_seq_len=256, + shared_kv_compressed=0, + shared_compress_layer=None, + freeze_compress=0, + ): + super().__init__() + self.embed_dim = embed_dim + self.kdim = kdim if kdim is not None else embed_dim + self.vdim = vdim if vdim is not None else embed_dim + self.qkv_same_dim = self.kdim == embed_dim and self.vdim == embed_dim + + self.num_heads = num_heads + self.dropout = dropout + self.head_dim = embed_dim // num_heads + assert ( + self.head_dim * num_heads == self.embed_dim + ), "embed_dim must be divisible by num_heads" + self.scaling = self.head_dim ** -0.5 + + self.self_attention = self_attention + self.encoder_decoder_attention = encoder_decoder_attention + + assert not self.self_attention or self.qkv_same_dim, ( + "Self-attention requires query, key and " "value to be of the same size" + ) + + self.k_proj = quant_noise( + nn.Linear(self.kdim, embed_dim, bias=bias), q_noise, qn_block_size + ) + self.v_proj = quant_noise( + nn.Linear(self.vdim, embed_dim, bias=bias), q_noise, qn_block_size + ) + self.q_proj = quant_noise( + nn.Linear(embed_dim, embed_dim, bias=bias), q_noise, qn_block_size + ) + + # used for compress sequence to subsequence + if shared_compress_layer is None: + self.compress_seq_len = max_seq_len // compressed + self.compress_k = nn.Linear(max_seq_len, self.compress_seq_len, bias=False) + if shared_kv_compressed == 0: + self.compress_v = nn.Linear( + max_seq_len, self.compress_seq_len, bias=False + ) + self.layerwise_sharing = False + else: + self.compress_k = shared_compress_layer + if shared_kv_compressed == 0: + self.compress_v = shared_compress_layer + self.layerwise_sharing = True + self.shared_kv_compressed = shared_kv_compressed + + self.out_proj = quant_noise( + nn.Linear(embed_dim, embed_dim, bias=bias), q_noise, qn_block_size + ) + + if add_bias_kv: + self.bias_k = Parameter(torch.Tensor(1, 1, embed_dim)) + self.bias_v = Parameter(torch.Tensor(1, 1, embed_dim)) + else: + self.bias_k = self.bias_v = None + + self.add_zero_attn = add_zero_attn + + self.reset_parameters() + + if freeze_compress == 1: + self.compress_k.weight.requires_grad = False + if shared_kv_compressed == 0: + self.compress_v.weight.requires_grad = False + + self.onnx_trace = False + + def prepare_for_onnx_export_(self): + self.onnx_trace = True + + def reset_parameters(self): + if self.qkv_same_dim: + # Empirically observed the convergence to be much better with + # the scaled initialization + nn.init.xavier_uniform_(self.k_proj.weight, gain=1 / math.sqrt(2)) + nn.init.xavier_uniform_(self.v_proj.weight, gain=1 / math.sqrt(2)) + nn.init.xavier_uniform_(self.q_proj.weight, gain=1 / math.sqrt(2)) + if ( + not self.layerwise_sharing + ): # otherwise, we already initialize the parameters + nn.init.xavier_uniform_(self.compress_k.weight, gain=1 / math.sqrt(2)) + if self.shared_kv_compressed == 0: + nn.init.xavier_uniform_( + self.compress_v.weight, gain=1 / math.sqrt(2) + ) + else: + nn.init.xavier_uniform_(self.k_proj.weight) + nn.init.xavier_uniform_(self.v_proj.weight) + nn.init.xavier_uniform_(self.q_proj.weight) + if ( + not self.layerwise_sharing + ): # otherwise, we already initialize the parameters + nn.init.xavier_uniform_(self.compress_k.weight) + if self.shared_kv_compressed == 0: + 
nn.init.xavier_uniform_(self.compress_v.weight) + + nn.init.xavier_uniform_(self.out_proj.weight) + if self.out_proj.bias is not None: + nn.init.constant_(self.out_proj.bias, 0.0) + if self.bias_k is not None: + nn.init.xavier_normal_(self.bias_k) + if self.bias_v is not None: + nn.init.xavier_normal_(self.bias_v) + + def forward( + self, + query, + key: Optional[Tensor], + value: Optional[Tensor], + key_padding_mask: Optional[Tensor] = None, + incremental_state: Optional[Dict[str, Dict[str, Optional[Tensor]]]] = None, + need_weights: bool = True, + static_kv: bool = False, + attn_mask: Optional[Tensor] = None, + before_softmax: bool = False, + need_head_weights: bool = False, + ) -> Tuple[Tensor, Optional[Tensor]]: + """Input shape: Time x Batch x Channel + + Args: + key_padding_mask (ByteTensor, optional): mask to exclude + keys that are pads, of shape `(batch, src_len)`, where + padding elements are indicated by 1s. + need_weights (bool, optional): return the attention weights, + averaged over heads (default: False). + attn_mask (ByteTensor, optional): typically used to + implement causal attention, where the mask prevents the + attention from looking forward in time (default: None). + before_softmax (bool, optional): return the raw attention + weights and values before the attention softmax. + need_head_weights (bool, optional): return the attention + weights for each head. Implies *need_weights*. Default: + return the average attention weights over all heads. + """ + if need_head_weights: + need_weights = True + + tgt_len, bsz, embed_dim = query.size() + assert embed_dim == self.embed_dim + assert list(query.size()) == [tgt_len, bsz, embed_dim] + + if incremental_state is not None: + saved_state = self._get_input_buffer(incremental_state) + if saved_state is not None and "prev_key" in saved_state: + # previous time steps are cached - no need to recompute + # key and value if they are static + if static_kv: + assert self.encoder_decoder_attention and not self.self_attention + key = value = None + else: + saved_state = None + + if self.self_attention: + q = self.q_proj(query) + + k_input = query.permute(1, 2, 0).contiguous() # B * C * T + k_input = ( + F.linear(k_input, self.compress_k.weight[:, 0:tgt_len]) + .permute(2, 0, 1) + .contiguous() + ) + k = self.k_proj(k_input) + + v_input = query.permute(1, 2, 0).contiguous() # B * C * T + if self.shared_kv_compressed == 0: + v_input = ( + F.linear(v_input, self.compress_v.weight[:, 0:tgt_len]) + .permute(2, 0, 1) + .contiguous() + ) + if self.shared_kv_compressed == 1: # use shared kv compressed linear layer + v_input = ( + F.linear(v_input, self.compress_k.weight[:, 0:tgt_len]) + .permute(2, 0, 1) + .contiguous() + ) + v = self.v_proj(v_input) + elif self.encoder_decoder_attention: + # encoder-decoder attention + q = self.q_proj(query) + if key is None: + assert value is None + k = v = None + else: + k = self.k_proj(key) + v = self.v_proj(key) + + else: + assert key is not None and value is not None + q = self.q_proj(query) + k = self.k_proj(key) + v = self.v_proj(value) + q *= self.scaling + + if self.bias_k is not None: + assert self.bias_v is not None + k = torch.cat([k, self.bias_k.repeat(1, bsz, 1)]) + v = torch.cat([v, self.bias_v.repeat(1, bsz, 1)]) + if attn_mask is not None: + attn_mask = torch.cat( + [attn_mask, attn_mask.new_zeros(attn_mask.size(0), 1)], dim=1 + ) + if key_padding_mask is not None: + key_padding_mask = torch.cat( + [ + key_padding_mask, + key_padding_mask.new_zeros(key_padding_mask.size(0), 1), + ], + dim=1, + 
) + + q = ( + q.contiguous() + .view(tgt_len, bsz * self.num_heads, self.head_dim) + .transpose(0, 1) + ) + if k is not None: + k = ( + k.contiguous() + .view(-1, bsz * self.num_heads, self.head_dim) + .transpose(0, 1) + ) + if v is not None: + v = ( + v.contiguous() + .view(-1, bsz * self.num_heads, self.head_dim) + .transpose(0, 1) + ) + + if saved_state is not None: + # saved states are stored with shape (bsz, num_heads, seq_len, head_dim) + if "prev_key" in saved_state: + _prev_key = saved_state["prev_key"] + assert _prev_key is not None + prev_key = _prev_key.view(bsz * self.num_heads, -1, self.head_dim) + if static_kv: + k = prev_key + else: + assert k is not None + k = torch.cat([prev_key, k], dim=1) + if "prev_value" in saved_state: + _prev_value = saved_state["prev_value"] + assert _prev_value is not None + prev_value = _prev_value.view(bsz * self.num_heads, -1, self.head_dim) + if static_kv: + v = prev_value + else: + assert v is not None + v = torch.cat([prev_value, v], dim=1) + prev_key_padding_mask: Optional[Tensor] = None + if "prev_key_padding_mask" in saved_state: + prev_key_padding_mask = saved_state["prev_key_padding_mask"] + assert k is not None and v is not None + key_padding_mask = MultiheadLinearAttention._append_prev_key_padding_mask( + key_padding_mask=key_padding_mask, + prev_key_padding_mask=prev_key_padding_mask, + batch_size=bsz, + src_len=k.size(1), + static_kv=static_kv, + ) + + saved_state["prev_key"] = k.view(bsz, self.num_heads, -1, self.head_dim) + saved_state["prev_value"] = v.view(bsz, self.num_heads, -1, self.head_dim) + saved_state["prev_key_padding_mask"] = key_padding_mask + # In this branch incremental_state is never None + assert incremental_state is not None + incremental_state = self._set_input_buffer(incremental_state, saved_state) + assert k is not None + src_len = k.size(1) + + if self.add_zero_attn: + assert v is not None + src_len += 1 + k = torch.cat([k, k.new_zeros((k.size(0), 1) + k.size()[2:])], dim=1) + v = torch.cat([v, v.new_zeros((v.size(0), 1) + v.size()[2:])], dim=1) + if attn_mask is not None: + attn_mask = torch.cat( + [attn_mask, attn_mask.new_zeros(attn_mask.size(0), 1)], dim=1 + ) + + attn_weights = torch.bmm(q, k.transpose(1, 2)) + attn_weights = MultiheadLinearAttention.apply_sparse_mask( + attn_weights, tgt_len, src_len, bsz + ) + + assert list(attn_weights.size()) == [bsz * self.num_heads, tgt_len, src_len] + + if attn_mask is not None: + attn_mask = attn_mask.unsqueeze(0) + if self.onnx_trace: + attn_mask = attn_mask.repeat(attn_weights.size(0), 1, 1) + attn_weights += attn_mask + + if before_softmax: + return attn_weights, v + + attn_weights_float = utils.softmax( + attn_weights, dim=-1, onnx_trace=self.onnx_trace + ) + attn_weights = attn_weights_float.type_as(attn_weights) + attn_probs = F.dropout( + attn_weights, + p=self.dropout, + training=self.training, + ) + assert v is not None + attn = torch.bmm(attn_probs, v) + assert list(attn.size()) == [bsz * self.num_heads, tgt_len, self.head_dim] + if self.onnx_trace and attn.size(1) == 1: + # when ONNX tracing a single decoder step (sequence length == 1) + # the transpose is a no-op copy before view, thus unnecessary + attn = attn.contiguous().view(tgt_len, bsz, embed_dim) + else: + attn = attn.transpose(0, 1).contiguous().view(tgt_len, bsz, embed_dim) + attn = self.out_proj(attn) + attn_weights: Optional[Tensor] = None + if need_weights: + attn_weights = attn_weights_float.view( + bsz, self.num_heads, tgt_len, src_len + ).transpose(1, 0) + if not need_head_weights: + # 
average attention weights over heads + attn_weights = attn_weights.mean(dim=0) + + return attn, attn_weights + + @staticmethod + def _append_prev_key_padding_mask( + key_padding_mask: Optional[Tensor], + prev_key_padding_mask: Optional[Tensor], + batch_size: int, + src_len: int, + static_kv: bool, + ) -> Optional[Tensor]: + # saved key padding masks have shape (bsz, seq_len) + if prev_key_padding_mask is not None and static_kv: + new_key_padding_mask = prev_key_padding_mask + elif prev_key_padding_mask is not None and key_padding_mask is not None: + new_key_padding_mask = torch.cat( + [prev_key_padding_mask.float(), key_padding_mask.float()], dim=1 + ) + # During incremental decoding, as the padding token enters and + # leaves the frame, there will be a time when prev or current + # is None + elif prev_key_padding_mask is not None: + filler = torch.zeros( + (batch_size, src_len - prev_key_padding_mask.size(1)), + device=prev_key_padding_mask.device, + ) + new_key_padding_mask = torch.cat( + [prev_key_padding_mask.float(), filler.float()], dim=1 + ) + elif key_padding_mask is not None: + filler = torch.zeros( + (batch_size, src_len - key_padding_mask.size(1)), + device=key_padding_mask.device, + ) + new_key_padding_mask = torch.cat( + [filler.float(), key_padding_mask.float()], dim=1 + ) + else: + new_key_padding_mask = prev_key_padding_mask + return new_key_padding_mask + + @torch.jit.export + def reorder_incremental_state( + self, + incremental_state: Dict[str, Dict[str, Optional[Tensor]]], + new_order: Tensor, + ): + """Reorder buffered internal state (for incremental generation).""" + input_buffer = self._get_input_buffer(incremental_state) + if input_buffer is not None: + for k in input_buffer.keys(): + input_buffer_k = input_buffer[k] + if input_buffer_k is not None: + if self.encoder_decoder_attention and input_buffer_k.size( + 0 + ) == new_order.size(0): + break + input_buffer[k] = input_buffer_k.index_select(0, new_order) + incremental_state = self._set_input_buffer(incremental_state, input_buffer) + return incremental_state + + def _get_input_buffer( + self, incremental_state: Optional[Dict[str, Dict[str, Optional[Tensor]]]] + ) -> Dict[str, Optional[Tensor]]: + result = self.get_incremental_state(incremental_state, "attn_state") + if result is not None: + return result + else: + empty_result: Dict[str, Optional[Tensor]] = {} + return empty_result + + def _set_input_buffer( + self, + incremental_state: Dict[str, Dict[str, Optional[Tensor]]], + buffer: Dict[str, Optional[Tensor]], + ): + return self.set_incremental_state(incremental_state, "attn_state", buffer) + + def apply_sparse_mask(attn_weights, tgt_len: int, src_len: int, bsz: int): + return attn_weights + + def upgrade_state_dict_named(self, state_dict, name): + prefix = name + "." 
if name != "" else "" + items_to_add = {} + keys_to_remove = [] + for k in state_dict.keys(): + if k.endswith(prefix + "in_proj_weight"): + # in_proj_weight used to be q + k + v with same dimensions + dim = int(state_dict[k].shape[0] / 3) + items_to_add[prefix + "q_proj.weight"] = state_dict[k][:dim] + items_to_add[prefix + "k_proj.weight"] = state_dict[k][dim : 2 * dim] + items_to_add[prefix + "v_proj.weight"] = state_dict[k][2 * dim :] + + keys_to_remove.append(k) + + k_bias = prefix + "in_proj_bias" + if k_bias in state_dict.keys(): + dim = int(state_dict[k].shape[0] / 3) + items_to_add[prefix + "q_proj.bias"] = state_dict[k_bias][:dim] + items_to_add[prefix + "k_proj.bias"] = state_dict[k_bias][ + dim : 2 * dim + ] + items_to_add[prefix + "v_proj.bias"] = state_dict[k_bias][2 * dim :] + + keys_to_remove.append(prefix + "in_proj_bias") + + for k in keys_to_remove: + del state_dict[k] + + for key, value in items_to_add.items(): + state_dict[key] = value diff --git a/SpeechT5/fairseq/examples/m2m_100/README.md b/SpeechT5/fairseq/examples/m2m_100/README.md new file mode 100644 index 0000000000000000000000000000000000000000..05801584d61afef979bf43802a167ca9da4c7a8c --- /dev/null +++ b/SpeechT5/fairseq/examples/m2m_100/README.md @@ -0,0 +1,241 @@ +# Beyond English-Centric Multilingual Machine Translation + +## Introduction +In this work, we create a true Many-to-Many multilingual translation model that can translate directly between any pair of 100 languages. Our focus on non-English-Centric models brings gains of more than 10 BLEU when directly translating between non-English directions while performing competitively with the best single systems of WMT. + +If you are new to using fairseq, read the following walkthrough. Otherwise, skip to the sections below. + +0. **Generation Data** + +To download the generation data, follow the below commands. Note that all datasets need to be detokenized *before* applying SPM in the data preprocessing step. If you use these evaluation datasets, please cite their associated papers. +```bash +# WMT - use sacrebleu, example here: +sacrebleu -t wmt14 -l fr-en --echo src > wmt.test.fr-en.fr +sacrebleu -t wmt14 -l fr-en --echo ref > wmt.test.fr-en.en + +# WAT +wget http://lotus.kuee.kyoto-u.ac.jp/WAT/my-en-data/wat2020.my-en.zip +unzip wat2020.my-en.zip + +# FLORES +# download from: https://github.com/facebookresearch/flores + +# TED - need to detokenize with Moses! +# from: https://github.com/neulab/word-embeddings-for-nmt +wget http://phontron.com/data/ted_talks.tar.gz + +# Autshumato +# request to download: https://repo.sadilar.org/handle/20.500.12185/397 + +# Tatoeba Challenge +# available here: https://github.com/Helsinki-NLP/Tatoeba-Challenge +``` + +1. **Training Data** + +To produce the training data, we use a combination of [CCMatrix](https://arxiv.org/abs/1911.04944) and [CCAligned](https://arxiv.org/abs/1911.06154). Check out the instructions [here](https://github.com/facebookresearch/LASER/tree/master/tasks/CCMatrix) to download the raw data. + +2. **Preprocess Data** + +After downloading raw data, you will need to postprocess the data, then apply SPM, then binarize. Note that it is very important you run the postprocessing script, because this removes any instance of the evaluation data in the mined training data. 
+ +```bash +# preprocess data + +# remove sentences with more than 50% punctuation +python /path/to/fairseq/examples/m2m_100/process_data/remove_too_much_punc.py + +# deduplicate training data +paste /path/to/datadir/train.$src /path/to/datadir/train.$tgt | awk '!x[$0]++' > /path/to/datadir/train.dedup +echo "keeping $(wc -l /path/to/datadir/train.dedup) bitext out of $(wc -l /path/to/datadir/train.$src)" +cut -f1 /path/to/datadir/train.dedup > /path/to/datadir/train.$src +cut -f2 /path/to/datadir/train.dedup > /path/to/datadir/train.$tgt + +# remove all instances of evaluation data from the training data +python /path/to/fairseq/examples/m2m_100/process_data/dedup_data.py + +# frequency cleaning +wget https://dl.fbaipublicfiles.com/m2m_100/histograms.tar.gz +tar -xvzf histograms.tar.gz +python /path/to/fairseq/examples/m2m_100/process_data/clean_histogram.py --src $src --tgt $tgt --src-file /path/to/source/file --tgt-file /path/to/output/file --src-output-file source_output.$src --tgt-output-file target_output.$tgt --histograms /path/to/histograms + +# apply SPM +wget https://dl.fbaipublicfiles.com/m2m_100/spm.128k.model +python /path/to/fairseq/scripts/spm_encode.py \ + --model spm.128k.model \ + --output_format=piece \ + --inputs=/path/to/input/file/here \ + --outputs=/path/to/output/file/here + +# length ratio cleaning +perl mosesdecoder/scripts/training/clean-corpus-n.perl --ratio 3 /path/to/training/data/train.spm.$src-$tgt $src $tgt /path/to/output/directory/train.spm.$src-$tgt 1 250 + +# binarize data +wget https://dl.fbaipublicfiles.com/m2m_100/data_dict.128k.txt +fairseq-preprocess \ + --source-lang $src --target-lang $tgt \ + --testpref spm.$src.$tgt \ + --thresholdsrc 0 --thresholdtgt 0 \ + --destdir data_bin \ + --srcdict data_dict.128k.txt --tgtdict data_dict.128k.txt +``` + +3. **Training Scripts** + +To reproduce the training of our models, we train with fairseq-py's multilingual translation [task](https://github.com/pytorch/fairseq/tree/master/examples/multilingual). If you are interested in model parallel training, also check out [fairscale](https://github.com/facebookresearch/fairscale). + +4. **Generation** + +To generate from our models, follow the the commands in the generation section below. + + +If you use any of the resources listed here, please cite: +```bibtex +@article{fan2020beyond, + title={Beyond English-Centric Multilingual Machine Translation}, + author={Fan, Angela and Bhosale, Shruti and Schwenk, Holger and Ma, Zhiyi and El-Kishky, Ahmed and Goyal, Siddharth and Baines, Mandeep and Celebi, Onur and Wenzek, Guillaume and Chaudhary, Vishrav and Goyal, Naman and Birch, Tom and Liptchinsky, Vitaliy and Edunov, Sergey and Grave, Edouard and Auli, Michael and Joulin, Armand}, + journal={arXiv preprint}, + year={2020} +} + +@article{schwenk2019ccmatrix, + title={Ccmatrix: Mining billions of high-quality parallel sentences on the web}, + author={Schwenk, Holger and Wenzek, Guillaume and Edunov, Sergey and Grave, Edouard and Joulin, Armand}, + journal={arXiv preprint arXiv:1911.04944}, + year={2019} +} + +@article{el2019massive, + title={A Massive Collection of Cross-Lingual Web-Document Pairs}, + author={El-Kishky, Ahmed and Chaudhary, Vishrav and Guzman, Francisco and Koehn, Philipp}, + journal={arXiv preprint arXiv:1911.06154}, + year={2019} +} +``` + + +## Trained Models + +### 418M and 1.2B Model +We include the last checkpoint for both of these models. 
+ +```bash +wget https://dl.fbaipublicfiles.com/m2m_100/model_dict.128k.txt +wget https://dl.fbaipublicfiles.com/m2m_100/language_pairs_small_models.txt + +# 418M parameter model +wget https://dl.fbaipublicfiles.com/m2m_100/418M_last_checkpoint.pt + +# 1.2B parameter model +wget https://dl.fbaipublicfiles.com/m2m_100/1.2B_last_checkpoint.pt + +# Generation: +fairseq-generate $binarized_data_path --batch-size 32 --path $path_to_model --fixed-dictionary model_dict.128k.txt -s en -t fr --remove-bpe 'sentencepiece' --beam 5 --task translation_multi_simple_epoch --lang-pairs language_pairs_small_models.txt --decoder-langtok --encoder-langtok src --gen-subset test > gen_out +``` + +### 12B Model +12B parameter model trained on many-to-many training data for 100 languages. We include the last checkpoint, average of last 5 checkpoints, average of last 10 checkpoints. There isn't a universally best choice out of these three, but all three versions are pretty close in accuracy. You can either sweep over the 3 checkpoints on a dev test and use the best performing checkpoint for final testing. Or the last checkpoint can be a good default choice. + +**Model Download Links** +Configuration | 2 32GB GPUs | 4 16GB GPUs | 6 12GB GPUs | 8 8GB GPUs +:--|:--|:--|:--|:-- +Last Checkpoint | [12b_last_chk_2_gpus.pt](https://dl.fbaipublicfiles.com/m2m_100/12b_last_chk_2_gpus.pt) | [12b_last_chk_4_gpus.pt](https://dl.fbaipublicfiles.com/m2m_100/12b_last_chk_4_gpus.pt) | [12b_last_chk_6_gpus.pt](https://dl.fbaipublicfiles.com/m2m_100/12b_last_chk_6_gpus.pt) | [12b_last_chk_8_gpus.pt](https://dl.fbaipublicfiles.com/m2m_100/12b_last_chk_8_gpus.pt) +Average of last 5 checkpoints | [12b_avg5_chk_2_gpus.pt](https://dl.fbaipublicfiles.com/m2m_100/12b_avg5_chk_2_gpus.pt) | [12b_avg5_chk_4_gpus.pt](https://dl.fbaipublicfiles.com/m2m_100/12b_avg5_chk_4_gpus.pt) | [12b_avg5_chk_6_gpus.pt](https://dl.fbaipublicfiles.com/m2m_100/12b_avg5_chk_6_gpus.pt) | [12b_avg5_chk_8_gpus.pt](https://dl.fbaipublicfiles.com/m2m_100/12b_avg5_chk_8_gpus.pt) +Average of last 10 checkpoints | [12b_avg10_chk_2_gpus.pt](https://dl.fbaipublicfiles.com/m2m_100/12b_avg10_chk_2_gpus.pt) | [12b_avg10_chk_4_gpus.pt](https://dl.fbaipublicfiles.com/m2m_100/12b_avg10_chk_4_gpus.pt) | [12b_avg10_chk_6_gpus.pt](https://dl.fbaipublicfiles.com/m2m_100/12b_avg10_chk_6_gpus.pt) | [12b_avg10_chk_8_gpus.pt](https://dl.fbaipublicfiles.com/m2m_100/12b_avg10_chk_8_gpus.pt) + +**Generation Arguments** +Configuration | 2 32GB GPUs | 4 16GB GPUs | 6 12GB GPUs | 8 8GB GPUs +:--|:--|:--|:--|:-- +`--pipeline-encoder-balance` | `[26]` | `[1,15,10]` | `[1,9,9,7]` | `[1,6,6,6,7]` +`--pipeline-encoder-devices` | `[0]` | `[0,1,0]` | `[0,1,2,0]` | `[0,4,5,1,0]` +`--pipeline-decoder-balance` | `[3,22,1]` | `[3,11,11,1]` | `[3,7,7,8,1]` | `[1,6,6,6,6,1]` +`--pipeline-decoder-devices` | `[0,1,0]` | `[0,2,3,0]` | `[0,3,4,5,0]` | `[0,2,6,7,3,0]` + + +## SentencePiece Model + +```bash +wget https://dl.fbaipublicfiles.com/m2m_100/spm.128k.model +``` + +## Generation with M2M-100 + +### Encode using our SentencePiece Model + +Note: Install SentencePiece from [here](https://github.com/google/sentencepiece) + +```bash +fairseq=/path/to/fairseq +cd $fairseq +sacrebleu --echo src -l de-fr -t wmt19 | head -n 20 > raw_input.de-fr.de +sacrebleu --echo ref -l de-fr -t wmt19 | head -n 20 > raw_input.de-fr.fr +wget https://dl.fbaipublicfiles.com/m2m_100/spm.128k.model +for lang in de fr ; do + python scripts/spm_encode.py \ + --model spm.128k.model \ + --output_format=piece \ + 
--inputs=raw_input.de-fr.${lang} \ + --outputs=spm.de-fr.${lang} +done +``` + +### Binarization + +```bash +wget https://dl.fbaipublicfiles.com/m2m_100/data_dict.128k.txt +fairseq-preprocess \ + --source-lang de --target-lang fr \ + --testpref spm.de-fr \ + --thresholdsrc 0 --thresholdtgt 0 \ + --destdir data_bin \ + --srcdict data_dict.128k.txt --tgtdict data_dict.128k.txt +``` + +### Generation for the 12B model + +Note that generation can currently be run using 2 32GB / 4 16GB / 6 12GB / 8 8GB GPUs, and the corresponding model checkpoints and pipeline arguments can be found in the [12B Model Section](#12b-model). +Generation on CPUs will be added in the future. + +```bash +wget https://dl.fbaipublicfiles.com/m2m_100/model_dict.128k.txt +wget https://dl.fbaipublicfiles.com/m2m_100/language_pairs.txt +wget https://dl.fbaipublicfiles.com/m2m_100/12b_last_chk_4_gpus.pt +fairseq-generate \ + data_bin \ + --batch-size 1 \ + --path 12b_last_chk_4_gpus.pt \ + --fixed-dictionary model_dict.128k.txt \ + -s de -t fr \ + --remove-bpe 'sentencepiece' \ + --beam 5 \ + --task translation_multi_simple_epoch \ + --lang-pairs language_pairs.txt \ + --decoder-langtok --encoder-langtok src \ + --gen-subset test \ + --fp16 \ + --dataset-impl mmap \ + --distributed-world-size 1 --distributed-no-spawn \ + --pipeline-model-parallel \ + --pipeline-chunks 1 \ + --pipeline-encoder-balance '[1,15,10]' \ + --pipeline-encoder-devices '[0,1,0]' \ + --pipeline-decoder-balance '[3,11,11,1]' \ + --pipeline-decoder-devices '[0,2,3,0]' > gen_out +``` +## Evaluation with M2M-100 + +### Tokenization + +Note: Refer to tokenizers/README.md for more details on tokenization. + +```bash +cd ${fairseq}/examples/m2m_100 +cat ${fairseq}/gen_out | grep -P "^H" | sort -V | cut -f 3- | sh tok.sh fr > hyp +cat ${fairseq}/raw_input.de-fr.fr | sh tok.sh fr > ref +``` + +### BLEU + +```bash +sacrebleu -tok 'none' ref < hyp +``` diff --git a/SpeechT5/fairseq/examples/m2m_100/install_dependecies.sh b/SpeechT5/fairseq/examples/m2m_100/install_dependecies.sh new file mode 100644 index 0000000000000000000000000000000000000000..82a1054745264a56fbec4a8eb593884f8a42bd08 --- /dev/null +++ b/SpeechT5/fairseq/examples/m2m_100/install_dependecies.sh @@ -0,0 +1,78 @@ +#!/usr/bin/env bash +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + + +CWD=`pwd` +INSTALL_PATH=$CWD/tokenizers/thirdparty + +MOSES=$INSTALL_PATH/mosesdecoder +if [ ! -d $MOSES ]; then + echo 'Cloning Moses github repository (for tokenization scripts)...' + git clone https://github.com/moses-smt/mosesdecoder.git $MOSES + cd $MOSES + # To deal with differences in handling ' vs " + git checkout 03578921cc1a03402 + cd - +fi + +WMT16_SCRIPTS=$INSTALL_PATH/wmt16-scripts +if [ ! -d $WMT16_SCRIPTS ]; then + echo 'Cloning Romanian tokenization scripts' + git clone https://github.com/rsennrich/wmt16-scripts.git $WMT16_SCRIPTS +fi + +KYTEA=$INSTALL_PATH/kytea +if [ ! -f $KYTEA/bin/kytea ]; then + git clone https://github.com/neubig/kytea.git $KYTEA + cd $KYTEA + autoreconf -i + ./configure --prefix=`pwd` + make + make install + cd .. +fi + +export MECAB=$INSTALL_PATH/mecab-0.996-ko-0.9.2 +if [ ! -f $MECAB/bin/mecab ]; then + cd $INSTALL_PATH + curl -LO https://bitbucket.org/eunjeon/mecab-ko/downloads/mecab-0.996-ko-0.9.2.tar.gz + tar zxfv mecab-0.996-ko-0.9.2.tar.gz + cd mecab-0.996-ko-0.9.2/ + ./configure --prefix=`pwd` + make + make install + + cd .. 
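+    # download and build the Korean dictionary (mecab-ko-dic) used by seg_ko.sh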
+ curl -LO https://bitbucket.org/eunjeon/mecab-ko-dic/downloads/mecab-ko-dic-2.1.1-20180720.tar.gz + tar zxfv mecab-ko-dic-2.1.1-20180720.tar.gz + cd mecab-ko-dic-2.1.1-20180720/ + ./autogen.sh + ./configure --prefix=`pwd` --with-dicdir=$MECAB/lib/mecab/dic/mecab-ko-dic --with-mecab-config=$MECAB/bin/mecab-config + make + sh -c 'echo "dicdir=$MECAB/lib/mecab/dic/mecab-ko-dic" > $MECAB/etc/mecabrc' + make install + cd $CWD +fi + +INDIC_RESOURCES_PATH=$INSTALL_PATH/indic_nlp_resources +if [ ! -d $INDIC_RESOURCES_PATH ]; then + echo 'Cloning indic_nlp_resources' + git clone https://github.com/anoopkunchukuttan/indic_nlp_resources.git $INDIC_RESOURCES_PATH +fi + + +if [ ! -f $INSTALL_PATH/seg_my.py ]; then + cd $INSTALL_PATH + wget http://lotus.kuee.kyoto-u.ac.jp/WAT/my-en-data/wat2020.my-en.zip + unzip wat2020.my-en.zip + # switch to python3 + cat wat2020.my-en/myseg.py |sed 's/^sys.std/###sys.std/g' | sed 's/### sys/sys/g' | sed 's/unichr/chr/g' > seg_my.py + cd $CWD +fi + + +pip install pythainlp sacrebleu indic-nlp-library + diff --git a/SpeechT5/fairseq/examples/m2m_100/process_data/clean_histogram.py b/SpeechT5/fairseq/examples/m2m_100/process_data/clean_histogram.py new file mode 100644 index 0000000000000000000000000000000000000000..e24e073dc0eb43c76e2ce717f52bb848c5b026b8 --- /dev/null +++ b/SpeechT5/fairseq/examples/m2m_100/process_data/clean_histogram.py @@ -0,0 +1,52 @@ +import argparse + +parser = argparse.ArgumentParser() +parser.add_argument('--src', type=str, help='Source language') +parser.add_argument('--tgt', type=str, help='Target language') +parser.add_argument('--src-file', type=str, help='Input source file') +parser.add_argument('--tgt-file', type=str, help='Input target file') +parser.add_argument('--src-output-file', type=str, help='Output source file') +parser.add_argument('--tgt-output-file', type=str, help='Output target file') +parser.add_argument('--threshold', type=float, default=0.5, help='Threshold') +parser.add_argument('--threshold-character', type=str, default=']', help='Threshold character') +parser.add_argument('--histograms', type=str, help='Path to histograms') + +args = parser.parse_args() + + +def read_hist(f): + ch = [] + for line in f: + c = line[0] + if c == args.threshold_character: + break + ch.append(c) + return ch + + +with(open("{}/{}".format(args.histograms, args.src), 'r', encoding='utf8')) as f: + ch1 = read_hist(f) + +with(open("{}/{}".format(args.histograms, args.tgt), 'r', encoding='utf8')) as f: + ch2 = read_hist(f) + +print("Accepted characters for {}: {}".format(args.src, ch1)) +print("Accepted characters for {}: {}".format(args.tgt, ch2)) + +with open(args.src_file, 'r', encoding='utf8') as fs1, open(args.tgt_file, 'r', encoding='utf8') as fs2, open(args.src_output_file, 'w', encoding='utf8') as fos1, open(args.tgt_output_file, 'w', encoding='utf8') as fos2: + ls1 = fs1.readline() + ls2 = fs2.readline() + + while ls1 or ls2: + cnt1 = len([c for c in ls1.strip() if c in ch1]) + cnt2 = len([c for c in ls2.strip() if c in ch2]) + + if cnt1 / len(ls1) > args.threshold and cnt2 / len(ls2) > args.threshold: + fos1.write(ls1) + fos2.write(ls2) + else: + print("{} {} {} \n{} {} {}".format(args.src, cnt1 / len(ls1), ls1.strip(), args.tgt, cnt2 / len(ls2), ls2.strip())) + + ls1 = fs1.readline() + ls2 = fs2.readline() + \ No newline at end of file diff --git a/SpeechT5/fairseq/examples/m2m_100/process_data/dedup_data.py b/SpeechT5/fairseq/examples/m2m_100/process_data/dedup_data.py new file mode 100644 index 
0000000000000000000000000000000000000000..58d9ed1cd17b3ba70772a6d9adab709785495fd9 --- /dev/null +++ b/SpeechT5/fairseq/examples/m2m_100/process_data/dedup_data.py @@ -0,0 +1,91 @@ +import argparse +from collections import namedtuple +import os + +DATADIR = "/path/to/train_data" +DEDUP_FROM_DIR = "/path/to/eval/data" +OUTPUT_DIR = "/path/to/output/data" + + +def main(args): + languages = set() + for language_directory in os.listdir(DATADIR): + if "_" in language_directory: + src, tgt = language_directory.split("_") + languages.add(LanguagePair(src=src, tgt=tgt)) + + data = existing_data() + train_languages = sorted(languages) + for language_pair in train_languages[args.start_index:args.start_index + args.size]: + print(language_pair) + dedup(language_pair, data) + + +LanguagePair = namedtuple("LanguagePair", ["src", "tgt"]) + + +def existing_data(): + data = set() + for file in os.listdir(DEDUP_FROM_DIR): + with open(os.path.join(DEDUP_FROM_DIR, file)) as f: + data |= set(f.readlines()) + return data + +def dedup(language_pair, data, verbose=True, output=True): + train_filenames = LanguagePair( + src=f"{DATADIR}/{language_pair.src}_{language_pair.tgt}/train.{language_pair.src}", + tgt=f"{DATADIR}/{language_pair.src}_{language_pair.tgt}/train.{language_pair.tgt}", + ) + + output_filenames = LanguagePair( + src=f"{OUTPUT_DIR}/train.dedup.{language_pair.src}-{language_pair.tgt}.{language_pair.src}", + tgt=f"{OUTPUT_DIR}/train.dedup.{language_pair.src}-{language_pair.tgt}.{language_pair.tgt}" + ) + + # If output exists, skip this pair. It has already been done. + if (os.path.exists(output_filenames.src) and + os.path.exists(output_filenames.tgt)): + if verbose: + print(f"{language_pair.src}-{language_pair.tgt} already done.") + return + + if verbose: + print(f"{language_pair.src}-{language_pair.tgt} ready, will check dups.") + + # If there is no output, no need to actually do the loop. 
+ if not output: + return + + if os.path.exists(train_filenames.src) and os.path.exists(train_filenames.tgt): + with open(train_filenames.src) as f: + train_source = f.readlines() + + with open(train_filenames.tgt) as f: + train_target = f.readlines() + + # do dedup + new_train_source = [] + new_train_target = [] + for i, train_line in enumerate(train_source): + if train_line not in data and train_target[i] not in data: + new_train_source.append(train_line) + new_train_target.append(train_target[i]) + + assert len(train_source) == len(train_target) + assert len(new_train_source) == len(new_train_target) + assert len(new_train_source) <= len(train_source) + + with open(output_filenames.src, "w") as o: + for line in new_train_source: + o.write(line) + + with open(output_filenames.tgt, "w") as o: + for line in new_train_target: + o.write(line) + + +if __name__ == '__main__': + parser = argparse.ArgumentParser() + parser.add_argument("-s", "--start-index", required=True, type=int) + parser.add_argument("-n", "--size", required=True, type=int) + main(parser.parse_args()) diff --git a/SpeechT5/fairseq/examples/m2m_100/process_data/remove_too_much_punc.py b/SpeechT5/fairseq/examples/m2m_100/process_data/remove_too_much_punc.py new file mode 100644 index 0000000000000000000000000000000000000000..6c280de2403daffab477ac88e2008a68b9e61ff0 --- /dev/null +++ b/SpeechT5/fairseq/examples/m2m_100/process_data/remove_too_much_punc.py @@ -0,0 +1,36 @@ +import gzip +import argparse +from string import punctuation + +def len_no_punc(s, punc): + return len([ch for ch in s if ch in punc]) + +def filter_overpunc(len_npunc, len_sen): + return len_npunc < 0.5*len_sen + +def main(args): + punc = punctuation + "—|–" + print('Processing file {}'.format(args.input)) + with gzip.open(args.input, 'rt', encoding=args.encoding) as tsv: + with open(args.bitext + '.' + args.src_lang, 'wt', encoding=args.encoding) as fsrc: + with open(args.bitext + '.' + args.tgt_lang, 'wt', encoding=args.encoding) as ftgt: + line = tsv.readline() + fields = line.split('\t') + + src, tgt = fields[1], fields[2] + + nchar_npunc_src = len_no_punc(src, punc) + nchar_npunc_tgt = len_no_punc(tgt, punc) + + if filter_overpunc(nchar_npunc_src, len(src)) and filter_overpunc(nchar_npunc_tgt, len(tgt)): + fsrc.write(src.strip() + '\n') + ftgt.write(tgt.strip() + '\n') + +if __name__ == '__main__': + parser = argparse.ArgumentParser() + parser.add_argument("--input", required=True, type=str) + parser.add_argument('--encoding', default='utf-8', help='character encoding for input/output') + parser.add_argument('--bitext', type=str, required=True, help='language direction') + parser.add_argument('--src-lang', type=str, required=True, help='Source language') + parser.add_argument('--tgt-lang', type=str, required=True, help='Target language') + main(parser.parse_args()) diff --git a/SpeechT5/fairseq/examples/m2m_100/tok.sh b/SpeechT5/fairseq/examples/m2m_100/tok.sh new file mode 100644 index 0000000000000000000000000000000000000000..ba2ec5a2f3f4794d2e528d3a6574bf05abe1d043 --- /dev/null +++ b/SpeechT5/fairseq/examples/m2m_100/tok.sh @@ -0,0 +1,83 @@ +#!/usr/bin/env bash +# Copyright (c) 2019-present, Facebook, Inc. +# All rights reserved. +# +# This source code is licensed under the license found in the +# LICENSE file in the root directory of this source tree. 
+# + +set -e + +TOKENIZERS_SCRIPTS=tokenizers +INSTALL_PATH=$TOKENIZERS_SCRIPTS/thirdparty + +N_THREADS=8 + +lg=$1 + +MOSES=$INSTALL_PATH/mosesdecoder +REPLACE_UNICODE_PUNCT=$MOSES/scripts/tokenizer/replace-unicode-punctuation.perl +NORM_PUNC=$MOSES/scripts/tokenizer/normalize-punctuation.perl +REM_NON_PRINT_CHAR=$MOSES/scripts/tokenizer/remove-non-printing-char.perl +TOKENIZER=$MOSES/scripts/tokenizer/tokenizer.perl + +# special tokenization for Romanian +WMT16_SCRIPTS=$INSTALL_PATH/wmt16-scripts + +NORMALIZE_ROMANIAN=$WMT16_SCRIPTS/preprocess/normalise-romanian.py +REMOVE_DIACRITICS=$WMT16_SCRIPTS/preprocess/remove-diacritics.py + +# Burmese +MY_SEGMENT=$INSTALL_PATH/seg_my.py + +# Arabic +AR_TOKENIZER=$TOKENIZERS_SCRIPTS/tokenizer_ar.sh + +# Korean +KO_SEGMENT=$TOKENIZERS_SCRIPTS/seg_ko.sh + +# Japanese +JA_SEGMENT=$TOKENIZERS_SCRIPTS/seg_ja.sh + +# Indic +IN_TOKENIZER=$TOKENIZERS_SCRIPTS/tokenize_indic.py +INDIC_RESOURCES_PATH=$INSTALL_PATH/indic_nlp_resources + +# Thai +THAI_TOKENIZER=$TOKENIZERS_SCRIPTS/tokenize_thai.py + +# Chinese +CHINESE_TOKENIZER=$TOKENIZERS_SCRIPTS/tokenize_zh.py + +# Chinese +if [ "$lg" = "zh" ]; then + cat - | $REPLACE_UNICODE_PUNCT | $NORM_PUNC -l $lg | $REM_NON_PRINT_CHAR | python $CHINESE_TOKENIZER +# Thai +elif [ "$lg" = "th" ]; then + cat - | python $THAI_TOKENIZER +# Japanese +elif [ "$lg" = "ja" ]; then + cat - | $REPLACE_UNICODE_PUNCT | $NORM_PUNC -l $lg | $REM_NON_PRINT_CHAR | ${JA_SEGMENT} +# Korean +elif [ "$lg" = "ko" ]; then + cat - | $REM_NON_PRINT_CHAR | ${KO_SEGMENT} +# Romanian +elif [ "$lg" = "ro" ]; then + cat - | $REPLACE_UNICODE_PUNCT | $NORM_PUNC -l $lg | $REM_NON_PRINT_CHAR | $NORMALIZE_ROMANIAN | $REMOVE_DIACRITICS | $TOKENIZER -no-escape -threads $N_THREADS -l $lg +# Burmese +elif [ "$lg" = "my" ]; then + cat - | python ${MY_SEGMENT} +# Arabic +elif [ "$lg" = "ar" ]; then + cat - | ${AR_TOKENIZER} +# Indic +elif [ "$lg" = "ne" ]; then + cat - | python ${IN_TOKENIZER} $lg +elif [ "$lg" = "si" ]; then + cat - | python ${IN_TOKENIZER} $lg +elif [ "$lg" = "hi" ]; then + cat - | python ${IN_TOKENIZER} $lg +# other languages +else + cat - | $REPLACE_UNICODE_PUNCT | $NORM_PUNC -l $lg | $REM_NON_PRINT_CHAR | $TOKENIZER -no-escape -threads $N_THREADS -l $lg +fi diff --git a/SpeechT5/fairseq/examples/m2m_100/tokenizers/README.md b/SpeechT5/fairseq/examples/m2m_100/tokenizers/README.md new file mode 100644 index 0000000000000000000000000000000000000000..e116932bc80572f221cff6472a7b1eea7032925d --- /dev/null +++ b/SpeechT5/fairseq/examples/m2m_100/tokenizers/README.md @@ -0,0 +1,18 @@ +# M2M-100 Tokenization + +We apply different tokenization strategies for different languages following the existing literature. Here we provide tok.sh a tokenizer that can be used to reproduce our results. + +To reproduce the results, follow these steps: + +``` +tgt_lang=... +reference_translation=... 
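+# keep the hypothesis lines (prefixed with H) from the fairseq-generate output,
+# restore sentence order, and strip the id/score columns before tokenizing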
+cat generation_output | grep -P "^H" | sort -V | cut -f 3- | sh tok.sh $tgt_lang > hyp +cat $reference_translation |sh tok.sh $tgt_lang > ref +sacrebleu -tok 'none' ref < hyp +``` + +## Installation + +Tools needed for all the languages except Arabic can be installed by running install_dependencies.sh +If you want to evaluate Arabic models, please follow the instructions provided here: http://alt.qcri.org/tools/arabic-normalizer/ to install diff --git a/SpeechT5/fairseq/examples/m2m_100/tokenizers/seg_ja.sh b/SpeechT5/fairseq/examples/m2m_100/tokenizers/seg_ja.sh new file mode 100644 index 0000000000000000000000000000000000000000..be6f5ca5fe4ac8e8c786a439caaed1d1314f1aef --- /dev/null +++ b/SpeechT5/fairseq/examples/m2m_100/tokenizers/seg_ja.sh @@ -0,0 +1,11 @@ +#!/usr/bin/env bash +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. +SCRIPT=`realpath $0` +KYTEA=`dirname $SCRIPT`/thirdparty/kytea +export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$KYTEA/lib:/usr/local/lib +export PATH=$PATH:"$KYTEA/bin" + +cat - | tr -d "[:blank:]" | kytea -notags diff --git a/SpeechT5/fairseq/examples/m2m_100/tokenizers/seg_ko.sh b/SpeechT5/fairseq/examples/m2m_100/tokenizers/seg_ko.sh new file mode 100644 index 0000000000000000000000000000000000000000..c523d92634d9b61b97bbcdbfd17dfc33465bfc09 --- /dev/null +++ b/SpeechT5/fairseq/examples/m2m_100/tokenizers/seg_ko.sh @@ -0,0 +1,12 @@ +#!/usr/bin/env bash +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. +SCRIPT=`realpath $0` +MECAB=`dirname $SCRIPT`/thirdparty/mecab-0.996-ko-0.9.2 + +export PATH=$PATH:"$MECAB/bin":"$MECAB/lib" +export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:"$MECAB/lib" + +cat - | mecab -O wakati diff --git a/SpeechT5/fairseq/examples/m2m_100/tokenizers/thirdparty/.gitignore b/SpeechT5/fairseq/examples/m2m_100/tokenizers/thirdparty/.gitignore new file mode 100644 index 0000000000000000000000000000000000000000..19eb6a9dd705ac583f22ecb60d9b744987e27ff1 --- /dev/null +++ b/SpeechT5/fairseq/examples/m2m_100/tokenizers/thirdparty/.gitignore @@ -0,0 +1,12 @@ +seg_my.py +indic_nlp_library/ +indic_nlp_resources/ +kytea/ +mecab-0.996-ko-0.9.2.tar.gz +mecab-0.996-ko-0.9.2/ +mosesdecoder/ +wat2020.my-en.zip +wat2020.my-en/ +wmt16-scripts/ +mecab-ko-dic-2.1.1-20180720/ +mecab-ko-dic-2.1.1-20180720.tar.gz \ No newline at end of file diff --git a/SpeechT5/fairseq/examples/m2m_100/tokenizers/tokenize_indic.py b/SpeechT5/fairseq/examples/m2m_100/tokenizers/tokenize_indic.py new file mode 100644 index 0000000000000000000000000000000000000000..a44fad07f7c718f99cccd445f33c62b0e3c562f4 --- /dev/null +++ b/SpeechT5/fairseq/examples/m2m_100/tokenizers/tokenize_indic.py @@ -0,0 +1,23 @@ +#!/usr/bin/env python3 +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. 
+ +# Use: echo {text} | python tokenize_indic.py {language} + +import sys + +from indicnlp.normalize.indic_normalize import IndicNormalizerFactory +from indicnlp.tokenize.indic_tokenize import trivial_tokenize + + +factory = IndicNormalizerFactory() +normalizer = factory.get_normalizer( + sys.argv[1], remove_nuktas=False, nasals_mode="do_nothing" +) + +for line in sys.stdin: + normalized_line = normalizer.normalize(line.strip()) + tokenized_line = " ".join(trivial_tokenize(normalized_line, sys.argv[1])) + print(tokenized_line) diff --git a/SpeechT5/fairseq/examples/m2m_100/tokenizers/tokenize_thai.py b/SpeechT5/fairseq/examples/m2m_100/tokenizers/tokenize_thai.py new file mode 100644 index 0000000000000000000000000000000000000000..9c72cb89056f6fc92a8963415e5f3a1e61b33a5b --- /dev/null +++ b/SpeechT5/fairseq/examples/m2m_100/tokenizers/tokenize_thai.py @@ -0,0 +1,13 @@ +#!/usr/bin/env python3 +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +import sys + +from pythainlp import word_tokenize + + +for line in sys.stdin: + print(" ".join(word_tokenize(line.strip()))) diff --git a/SpeechT5/fairseq/examples/m2m_100/tokenizers/tokenize_zh.py b/SpeechT5/fairseq/examples/m2m_100/tokenizers/tokenize_zh.py new file mode 100644 index 0000000000000000000000000000000000000000..674b5849cba829cf4f07a69369e9cc6eed376d4c --- /dev/null +++ b/SpeechT5/fairseq/examples/m2m_100/tokenizers/tokenize_zh.py @@ -0,0 +1,14 @@ +#!/usr/bin/env python3 +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + + +import fileinput + +import sacrebleu + + +for line in fileinput.input(): + print(sacrebleu.tokenize_zh(line)) diff --git a/SpeechT5/fairseq/examples/m2m_100/tokenizers/tokenizer_ar.sh b/SpeechT5/fairseq/examples/m2m_100/tokenizers/tokenizer_ar.sh new file mode 100644 index 0000000000000000000000000000000000000000..ad35d7adf28dc9b23d13a6a3fec0b12cb760e855 --- /dev/null +++ b/SpeechT5/fairseq/examples/m2m_100/tokenizers/tokenizer_ar.sh @@ -0,0 +1,27 @@ +#!/usr/bin/env sh +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. +# +# Please follow the instructions here http://alt.qcri.org/tools/arabic-normalizer/ +# to install tools needed for Arabic + +echo "Please install Arabic tools: http://alt.qcri.org/tools/arabic-normalizer/" +echo "Then update environment variables in tokenizer_ar.sh" +exit 1 + +SVMTOOL=... +GOMOSESGO=... +QCRI_ARABIC_NORMALIZER=... 
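+# update the placeholder paths above to your local installations (see the instructions echoed above)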
+ +export PERL5LIB="$SVMTOOL/lib":"$GOMOSESGO/bin/MADA-3.2":$PERL5LIB + + +tempfile=$(mktemp) +cat - > $tempfile + +cd $QCRI_ARABIC_NORMALIZER + +bash qcri_normalizer_mada3.2_aramorph1.2.1.sh $tempfile +cat $tempfile.mada_norm-aramorph.europarl_tok diff --git a/SpeechT5/fairseq/examples/mbart/README.md b/SpeechT5/fairseq/examples/mbart/README.md new file mode 100644 index 0000000000000000000000000000000000000000..a45e37243c2c5d4027f79cf71498ca58bbac7d98 --- /dev/null +++ b/SpeechT5/fairseq/examples/mbart/README.md @@ -0,0 +1,123 @@ +# MBART: Multilingual Denoising Pre-training for Neural Machine Translation +[https://arxiv.org/abs/2001.08210] + +## Introduction + +MBART is a sequence-to-sequence denoising auto-encoder pre-trained on large-scale monolingual corpora in many languages using the BART objective. mBART is one of the first methods for pre-training a complete sequence-to-sequence model by denoising full texts in multiple languages, while previous approaches have focused only on the encoder, decoder, or reconstructing parts of the text. + +## Pre-trained models + +Model | Description | # params | Download +---|---|---|--- +`mbart.CC25` | mBART model with 12 encoder and decoder layers trained on 25 languages' monolingual corpus | 610M | [mbart.CC25.tar.gz](https://dl.fbaipublicfiles.com/fairseq/models/mbart/mbart.cc25.v2.tar.gz) +`mbart.ft.ro_en` | finetune mBART cc25 model on ro-en language pairs | 610M | [mbart.cc25.ft.enro.tar.gz](https://dl.fbaipublicfiles.com/fairseq/models/mbart/mbart.cc25.ft.enro.tar.gz) + +## Results + +**[WMT16 EN-RO](https://www.statmt.org/wmt16/translation-task.html)** + +_(test set, no additional data used)_ + +Model | en-ro | ro-en +---|---|--- +`Random` | 34.3 | 34.0 +`mbart.cc25` | 37.7 | 37.8 +`mbart.enro.bilingual` | 38.5 | 38.5 + +## BPE data +# download model +wget https://dl.fbaipublicfiles.com/fairseq/models/mbart/mbart.cc25.v2.tar.gz +tar -xzvf mbart.CC25.tar.gz +# bpe data +install SPM [here](https://github.com/google/sentencepiece) +```bash +SPM=/path/to/sentencepiece/build/src/spm_encode +MODEL=sentence.bpe.model +${SPM} --model=${MODEL} < ${DATA}/${TRAIN}.${SRC} > ${DATA}/${TRAIN}.spm.${SRC} & +${SPM} --model=${MODEL} < ${DATA}/${TRAIN}.${TGT} > ${DATA}/${TRAIN}.spm.${TGT} & +${SPM} --model=${MODEL} < ${DATA}/${VALID}.${SRC} > ${DATA}/${VALID}.spm.${SRC} & +${SPM} --model=${MODEL} < ${DATA}/${VALID}.${TGT} > ${DATA}/${VALID}.spm.${TGT} & +${SPM} --model=${MODEL} < ${DATA}/${TEST}.${SRC} > ${DATA}/${TEST}.spm.${SRC} & +${SPM} --model=${MODEL} < ${DATA}/${TEST}.${TGT} > ${DATA}/${TEST}.spm.${TGT} & +``` + +## Preprocess data + +```bash +DICT=dict.txt +fairseq-preprocess \ + --source-lang ${SRC} \ + --target-lang ${TGT} \ + --trainpref ${DATA}/${TRAIN}.spm \ + --validpref ${DATA}/${VALID}.spm \ + --testpref ${DATA}/${TEST}.spm \ + --destdir ${DEST}/${NAME} \ + --thresholdtgt 0 \ + --thresholdsrc 0 \ + --srcdict ${DICT} \ + --tgtdict ${DICT} \ + --workers 70 +``` + +## Finetune on EN-RO +Finetune on mbart CC25 + +```bash +PRETRAIN=mbart.cc25 # fix if you moved the downloaded checkpoint +langs=ar_AR,cs_CZ,de_DE,en_XX,es_XX,et_EE,fi_FI,fr_XX,gu_IN,hi_IN,it_IT,ja_XX,kk_KZ,ko_KR,lt_LT,lv_LV,my_MM,ne_NP,nl_XX,ro_RO,ru_RU,si_LK,tr_TR,vi_VN,zh_CN + +fairseq-train path_2_data \ + --encoder-normalize-before --decoder-normalize-before \ + --arch mbart_large --layernorm-embedding \ + --task translation_from_pretrained_bart \ + --source-lang en_XX --target-lang ro_RO \ + --criterion label_smoothed_cross_entropy --label-smoothing 0.2 \ + --optimizer adam 
--adam-eps 1e-06 --adam-betas '(0.9, 0.98)' \ + --lr-scheduler polynomial_decay --lr 3e-05 --warmup-updates 2500 --total-num-update 40000 \ + --dropout 0.3 --attention-dropout 0.1 --weight-decay 0.0 \ + --max-tokens 1024 --update-freq 2 \ + --save-interval 1 --save-interval-updates 5000 --keep-interval-updates 10 --no-epoch-checkpoints \ + --seed 222 --log-format simple --log-interval 2 \ + --restore-file $PRETRAIN \ + --reset-optimizer --reset-meters --reset-dataloader --reset-lr-scheduler \ + --langs $langs \ + --ddp-backend legacy_ddp +``` +## Generate on EN-RO +Get sacrebleu on finetuned en-ro model + +get tokenizer [here](https://github.com/rsennrich/wmt16-scripts) +```bash +wget https://dl.fbaipublicfiles.com/fairseq/models/mbart/mbart.cc25.ft.enro.tar.gz +tar -xzvf mbart.cc25.ft.enro.tar.gz +``` + +```bash +model_dir=MBART_finetuned_enro # fix if you moved the checkpoint + +fairseq-generate path_2_data \ + --path $model_dir/model.pt \ + --task translation_from_pretrained_bart \ + --gen-subset test \ + -t ro_RO -s en_XX \ + --bpe 'sentencepiece' --sentencepiece-model $model_dir/sentence.bpe.model \ + --sacrebleu --remove-bpe 'sentencepiece' \ + --batch-size 32 --langs $langs > en_ro + +cat en_ro | grep -P "^H" |sort -V |cut -f 3- | sed 's/\[ro_RO\]//g' |$TOKENIZER ro > en_ro.hyp +cat en_ro | grep -P "^T" |sort -V |cut -f 2- | sed 's/\[ro_RO\]//g' |$TOKENIZER ro > en_ro.ref +sacrebleu -tok 'none' -s 'none' en_ro.ref < en_ro.hyp +``` + +## Citation + +```bibtex +@article{liu2020multilingual, + title={Multilingual Denoising Pre-training for Neural Machine Translation}, + author={Yinhan Liu and Jiatao Gu and Naman Goyal and Xian Li and Sergey Edunov and Marjan Ghazvininejad and Mike Lewis and Luke Zettlemoyer}, + year={2020}, + eprint={2001.08210}, + archivePrefix={arXiv}, + primaryClass={cs.CL} +} +``` diff --git a/SpeechT5/fairseq/examples/megatron_11b/README.md b/SpeechT5/fairseq/examples/megatron_11b/README.md new file mode 100644 index 0000000000000000000000000000000000000000..945c96c91e2e2d93466abc28d90bc25a1e7dd471 --- /dev/null +++ b/SpeechT5/fairseq/examples/megatron_11b/README.md @@ -0,0 +1,161 @@ +# Megatron-11b + +Megatron-11b is a unidirectional language model with `11B` parameters based on [Megatron-LM](https://arxiv.org/pdf/1909.08053.pdf). Following the original Megatron work, we trained the model using intra-layer model parallelism with each layer's parameters split across 8 GPUs. + +Megatron-11b is trained on the same data and uses the same byte-pair encoding (BPE) as [RoBERTa](https://arxiv.org/pdf/1907.11692.pdf). + +## Pre-trained models + +Model | Description | # params | # filesize | Download +---|---|---|---|--- +`megatron_11b` | megatron_11b unidirectional language model | 11B | 19Gb | [megatron_11b.tar.gz](https://dl.fbaipublicfiles.com/fairseq/models/model_parallel/megatron_11b.tar.gz) + +#### Architecture: + +Param | Value +---|--- +embed_dim | 3072 +ffn_dim | 3072 * 6 +layers | 72 +attention heads | 32 + +#### Training details: + +Param | value +---|--- +bsz | 512 +num_updates | 300,000 +peak_lr | 1.5e-04 +lr scheduler | inverse_sqrt +clip norm | 0.0 + + +## Example training command (model parallel) + +Megatron-11b contains too many parameters to train on a single GPU. Following +the original Megatron work, we adopt an intra-layer model parallel training +approach in which each layer's parameters are split across multiple GPUs and +activations and gradients are communicated during the forward/backward pass, +respectively. 
We similarly split the loss computation using the +`vocab_parallel_cross_entropy` criterion. + +The following training command illustrates how to do model parallel training in +fairseq. We assume that each machine (node) has 8 GPUs among which to split the +model parameters (`--model-parallel-size 8`). If you have access to multiple +nodes, you may combine this with data parallel training by increasing +`--distributed-world-size`. + +To train Megatron-11b on a single node: + + +```bash +fairseq-train \ + --distributed-world-size 8 \ + --memory-efficient-fp16 \ + --num-workers 2 \ + --model-parallel-size 8 \ + --criterion vocab_parallel_cross_entropy \ + --task language_modeling \ + --sample-break-mode none \ + --tokens-per-sample 1024 \ + --arch transformer_lm_megatron_11b \ + --share-decoder-input-output-embed \ + --optimizer adam --adam-betas "(0.9, 0.98)" --adam-eps 1e-08 --clip-norm 0.0 \ + --lr-scheduler inverse_sqrt --lr 0.00015 \ + --warmup-updates 3000 --weight-decay 0.01 \ + --dropout 0.1 --attention-dropout 0.1 \ + --batch-size 2 \ + --max-update 300000; +``` + +Note: Above was tested on `DGX-1` box, with `8xV100-32Gb` GPUs. + +## Results + +**[Wikitext103](https://blog.einstein.ai/the-wikitext-long-term-dependency-language-modeling-dataset/)** + +Model | Valid perplexity | Test perplexity +---|---|--- +`megatron_11b` | 10.64 | 10.54 + + +## Evaluating `megatron_11b` on Wikitext-103 + +#### 1. Downloading Megatron-11b +```bash +# WARNING: this file is 19GB +wget https://dl.fbaipublicfiles.com/fairseq/models/model_parallel/megatron_11b.tar.gz +tar -xzvf megatron_11b.tar.gz +``` + +#### 2. Download Wikitext-103 +```bash +wget https://s3.amazonaws.com/research.metamind.io/wikitext/wikitext-103-raw-v1.zip +unzip wikitext-103-raw-v1.zip +``` + +#### 3. Detokenize test tokens +Megatron-11b uses a byte-level BPE that expects raw (untokenized) input. Since +the wikitext-103 dataset comes tokenized, we apply a simple detokenization +process to restore the untokenized test set: + +```bash +python -m examples.megatron_11b.detok wikitext-103-raw/wiki.test.raw > wikitext-103-raw/wiki.test.detok +``` + +#### 4. BPE encoding +```bash +wget -N 'https://dl.fbaipublicfiles.com/fairseq/gpt2_bpe/encoder.json' +wget -N 'https://dl.fbaipublicfiles.com/fairseq/gpt2_bpe/vocab.bpe' + +python -m examples.roberta.multiprocessing_bpe_encoder \ + --encoder-json encoder.json \ + --vocab-bpe vocab.bpe \ + --inputs "wikitext-103-raw/wiki.test.detok" \ + --outputs "wikitext-103-raw/wiki.test.bpe" \ + --workers 60; +``` + +#### 5. Fairseq binarize +```bash +fairseq-preprocess \ + --only-source \ + --testpref wikitext-103-raw/wiki.test.bpe \ + --srcdict megatron_11b/dict.txt \ + --destdir wikitext103-bin; +``` + +#### 6. Evaluating perplexity. +We can now evaluate perplexity on the test set. Note that because we've modified +the test set (via detokenization and BPE), the perplexity reported by +`fairseq-eval-lm` needs to be renormalized. + +Compute unnormalized perplexity: + +```bash +DATA_PATH=wikitext103-bin/ +fairseq-eval-lm \ + $DATA_PATH \ + --path megatron_11b/model.pt \ + --task language_modeling \ + --gen-subset test \ + --batch-size 8 \ + --criterion cross_entropy \ + --context-window 992 \ + --distributed-world-size 8 \ + --model-parallel-size 8; +# Expected PPL (unnormalized_ppl): [8.46] +# Note: the eval command needs to run on 8 GPUs for the released model +``` +Renormalizing formula: `2 ^ ( log_2(unnormalized_PPL) * (270847 / 245566))`. 
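+
+As a quick check, the formula can be evaluated with the example values from above (unnormalized PPL `8.46`, token counts `270847` and `245566`):
+
+```bash
+# evaluate the renormalization formula with the example numbers
+python3 -c "import math; print(2 ** (math.log2(8.46) * (270847 / 245566)))"
+# prints roughly 10.54
+```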
+PPL After normalization: `10.54` + +To renormalize the perplexity, we must account for the change in token count +after detokenizing and appling BPE. The formula for this is: +`2 ^ ( log_2(unnormalized_PPL) * (new_token_cnt / orig_token_cnt))` + +For the wikitext-103 test set, the original token count is `245566` and the +token count after detokenization and applying BPE is `270847`. + +The perplexity after renormalization is: +`2 ^ ( log_2(8.46) * (270847 / 245566)) = 10.54` diff --git a/SpeechT5/fairseq/examples/megatron_11b/detok.py b/SpeechT5/fairseq/examples/megatron_11b/detok.py new file mode 100644 index 0000000000000000000000000000000000000000..49921b28a1f35c6216b5ed85729453524e7a049d --- /dev/null +++ b/SpeechT5/fairseq/examples/megatron_11b/detok.py @@ -0,0 +1,32 @@ +#!/usr/bin/env python3 -u +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +import argparse +import fileinput + +import sacremoses + + +def main(): + parser = argparse.ArgumentParser(description="") + parser.add_argument("files", nargs="*", help="input files") + args = parser.parse_args() + + detok = sacremoses.MosesDetokenizer() + + for line in fileinput.input(args.files, openhook=fileinput.hook_compressed): + print( + detok.detokenize(line.strip().split(" ")) + .replace(" @", "") + .replace("@ ", "") + .replace(" =", "=") + .replace("= ", "=") + .replace(" – ", "–") + ) + + +if __name__ == "__main__": + main() diff --git a/SpeechT5/fairseq/examples/multilingual/ML50_langs.txt b/SpeechT5/fairseq/examples/multilingual/ML50_langs.txt new file mode 100644 index 0000000000000000000000000000000000000000..558abbc785072629de8000e343fc02a32c0afb97 --- /dev/null +++ b/SpeechT5/fairseq/examples/multilingual/ML50_langs.txt @@ -0,0 +1,52 @@ +ar_AR +cs_CZ +de_DE +en_XX +es_XX +et_EE +fi_FI +fr_XX +gu_IN +hi_IN +it_IT +ja_XX +kk_KZ +ko_KR +lt_LT +lv_LV +my_MM +ne_NP +nl_XX +ro_RO +ru_RU +si_LK +tr_TR +vi_VN +zh_CN +af_ZA +az_AZ +bn_IN +fa_IR +he_IL +hr_HR +id_ID +ka_GE +km_KH +mk_MK +ml_IN +mn_MN +mr_IN +pl_PL +ps_AF +pt_XX +sv_SE +sw_KE +ta_IN +te_IN +th_TH +tl_XX +uk_UA +ur_PK +xh_ZA +gl_ES +sl_SI \ No newline at end of file diff --git a/SpeechT5/fairseq/examples/multilingual/README.md b/SpeechT5/fairseq/examples/multilingual/README.md new file mode 100644 index 0000000000000000000000000000000000000000..0076f5e8f0ab5c2c8dfd32b3eef02c556dddb88a --- /dev/null +++ b/SpeechT5/fairseq/examples/multilingual/README.md @@ -0,0 +1,158 @@ +# Multilingual Translation + +[[Multilingual Translation with Extensible Multilingual Pretraining and Finetuning, https://arxiv.org/abs/2008.00401]](https://arxiv.org/abs/2008.00401) + +## Introduction + +This work is for training multilingual translation models with multiple bitext datasets. 
This multilingual translation framework supports (see [[training section]](#Training) and [[finetuning section]](#Finetuning) for examples) + +* temperature based sampling over unbalancing datasets of different translation directions + - --sampling-method' with + choices=['uniform', 'temperature', 'concat'] + - --sampling-temperature +* configurable to automatically add source and/or target language tokens to source/target sentences using data which are prepared in the same way as bilignual training + - --encoder-langtok with choices=['src', 'tgt', None] to specify whether to add source or target language tokens to the source sentences + - --decoder-langtok (binary option) to specify whether to add target language tokens to the target sentences or not +* finetuning mBART pretrained models for multilingual translation + - --finetune-from-model to specify the path from which to load the pretrained model + +## Preprocessing data +Multilingual training requires a joint BPE vocab. Please follow [mBART's preprocessing steps](https://github.com/pytorch/fairseq/tree/master/examples/mbart#bpe-data) to reuse our pretrained sentence-piece model. + +You can also train a joint BPE model on your own dataset and then follow the steps in [[link]](https://github.com/pytorch/fairseq/tree/master/examples/translation#multilingual-translation). + +## Training + + +```bash +lang_pairs= +path_2_data= +lang_list= + +fairseq-train $path_2_data \ + --encoder-normalize-before --decoder-normalize-before \ + --arch transformer --layernorm-embedding \ + --task translation_multi_simple_epoch \ + --sampling-method "temperature" \ + --sampling-temperature 1.5 \ + --encoder-langtok "src" \ + --decoder-langtok \ + --lang-dict "$lang_list" \ + --lang-pairs "$lang_pairs" \ + --criterion label_smoothed_cross_entropy --label-smoothing 0.2 \ + --optimizer adam --adam-eps 1e-06 --adam-betas '(0.9, 0.98)' \ + --lr-scheduler inverse_sqrt --lr 3e-05 --warmup-updates 2500 --max-update 40000 \ + --dropout 0.3 --attention-dropout 0.1 --weight-decay 0.0 \ + --max-tokens 1024 --update-freq 2 \ + --save-interval 1 --save-interval-updates 5000 --keep-interval-updates 10 --no-epoch-checkpoints \ + --seed 222 --log-format simple --log-interval 2 +``` + +## Finetuning +We can also finetune multilingual models from a monolingual pretrained models, e.g. [mMBART](https://github.com/pytorch/fairseq/tree/master/examples/mbart). +```bash +lang_pairs= +path_2_data= +lang_list= +pretrained_model= + +fairseq-train $path_2_data \ + --finetune-from-model $pretrained_model \ + --encoder-normalize-before --decoder-normalize-before \ + --arch transformer --layernorm-embedding \ + --task translation_multi_simple_epoch \ + --sampling-method "temperature" \ + --sampling-temperature 1.5 \ + --encoder-langtok "src" \ + --decoder-langtok \ + --lang-dict "$lang_list" \ + --lang-pairs "$lang_pairs" \ + --criterion label_smoothed_cross_entropy --label-smoothing 0.2 \ + --optimizer adam --adam-eps 1e-06 --adam-betas '(0.9, 0.98)' \ + --lr-scheduler inverse_sqrt --lr 3e-05 --warmup-updates 2500 --max-update 40000 \ + --dropout 0.3 --attention-dropout 0.1 --weight-decay 0.0 \ + --max-tokens 1024 --update-freq 2 \ + --save-interval 1 --save-interval-updates 5000 --keep-interval-updates 10 --no-epoch-checkpoints \ + --seed 222 --log-format simple --log-interval 2 +``` +## Generate +The following command uses the multilingual task (translation_multi_simple_epoch) to generate translation from $source_lang to $target_lang on the test dataset. 
During generation, the source language tokens are added to the source sentences and the target language tokens are added as the starting tokens to decode the target sentences. The options --lang-dict and --lang-pairs are needed to tell the generation process the ordered list of languages and the translation directions that the trained model is aware of; they need to be consistent with those used in training.
+
+```bash
+model=
+source_lang=
+target_lang=
+
+fairseq-generate $path_2_data \
+  --path $model \
+  --task translation_multi_simple_epoch \
+  --gen-subset test \
+  --source-lang $source_lang \
+  --target-lang $target_lang \
+  --sacrebleu --remove-bpe 'sentencepiece' \
+  --batch-size 32 \
+  --encoder-langtok "src" \
+  --decoder-langtok \
+  --lang-dict "$lang_list" \
+  --lang-pairs "$lang_pairs" > ${source_lang}_${target_lang}.txt
+```
+Fairseq will write the translations to the file ${source_lang}_${target_lang}.txt, with the sacrebleu score at the end.
+
+You can also use a customized tokenizer to compare the performance with the literature. For example, you can get a tokenizer [here](https://github.com/rsennrich/wmt16-scripts) and do the following:
+```bash
+TOKENIZER=
+TOK_CMD=<"$TOKENIZER $target_lang" or cat for sacrebleu>
+
+cat ${source_lang}_${target_lang}.txt | grep -P "^H" | sort -V | cut -f 3- | $TOK_CMD > ${source_lang}_${target_lang}.hyp
+cat ${source_lang}_${target_lang}.txt | grep -P "^T" | sort -V | cut -f 2- | $TOK_CMD > ${source_lang}_${target_lang}.ref
+sacrebleu -tok 'none' -s 'none' ${source_lang}_${target_lang}.ref < ${source_lang}_${target_lang}.hyp
+```
+
+# mBART50 models
+
+* [mBART 50 pretrained model](https://dl.fbaipublicfiles.com/fairseq/models/mbart50/mbart50.pretrained.tar.gz).
+* [mBART 50 finetuned many-to-one](https://dl.fbaipublicfiles.com/fairseq/models/mbart50/mbart50.ft.n1.tar.gz).
+* [mBART 50 finetuned one-to-many](https://dl.fbaipublicfiles.com/fairseq/models/mbart50/mbart50.ft.1n.tar.gz).
+* [mBART 50 finetuned many-to-many](https://dl.fbaipublicfiles.com/fairseq/models/mbart50/mbart50.ft.nn.tar.gz).
+
+Please download and extract from the above tarballs.
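+
+For example, to fetch and unpack one of the checkpoints listed above (shown here for the many-to-many model; substitute the URL for the other variants):
+
+```bash
+# download and extract the mBART50 many-to-many finetuned model
+wget https://dl.fbaipublicfiles.com/fairseq/models/mbart50/mbart50.ft.nn.tar.gz
+tar -xzvf mbart50.ft.nn.tar.gz
+```
+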
Each tarball contains +* The fairseq model checkpoint: model.pt +* The list of supported languages: ML50_langs.txt +* Sentence piece model: sentence.bpe.model +* Fairseq dictionary of each language: dict.{lang}.txt (please replace lang with a language specified in ML50_langs.txt) + +To use the trained models, +* use the tool [binarize.py](./data_scripts/binarize.py) to binarize your data using sentence.bpe.model and dict.{lang}.txt, and copy the dictionaries to your data path +* then run the generation command: +```bash +path_2_data= +model=/model.pt +lang_list=/ML50_langs.txt +source_lang= +target_lang= + +fairseq-generate $path_2_data \ + --path $model \ + --task translation_multi_simple_epoch \ + --gen-subset test \ + --source-lang $source_lang \ + --target-lang $target_lang + --sacrebleu --remove-bpe 'sentencepiece'\ + --batch-size 32 \ + --encoder-langtok "src" \ + --decoder-langtok \ + --lang-dict "$lang_list" +``` + +## Citation + +```bibtex +@article{tang2020multilingual, + title={Multilingual Translation with Extensible Multilingual Pretraining and Finetuning}, + author={Yuqing Tang and Chau Tran and Xian Li and Peng-Jen Chen and Naman Goyal and Vishrav Chaudhary and Jiatao Gu and Angela Fan}, + year={2020}, + eprint={2008.00401}, + archivePrefix={arXiv}, + primaryClass={cs.CL} +} +``` diff --git a/SpeechT5/fairseq/examples/multilingual/data_scripts/README.md b/SpeechT5/fairseq/examples/multilingual/data_scripts/README.md new file mode 100644 index 0000000000000000000000000000000000000000..cc610c0c9e936a5ae4659ceda691c6db6d387296 --- /dev/null +++ b/SpeechT5/fairseq/examples/multilingual/data_scripts/README.md @@ -0,0 +1,24 @@ + +# Install dependency +```bash +pip install -r requirement.txt +``` + +# Download the data set +```bash +export WORKDIR_ROOT= + +``` +The downloaded data will be at $WORKDIR_ROOT/ML50 + +# preprocess the data +Install SPM [here](https://github.com/google/sentencepiece) +```bash +export WORKDIR_ROOT= +export SPM_PATH= +``` +* $WORKDIR_ROOT/ML50/raw: extracted raw data +* $WORKDIR_ROOT/ML50/dedup: dedup data +* $WORKDIR_ROOT/ML50/clean: data with valid and test sentences removed from the dedup data + + diff --git a/SpeechT5/fairseq/examples/multilingual/data_scripts/binarize.py b/SpeechT5/fairseq/examples/multilingual/data_scripts/binarize.py new file mode 100644 index 0000000000000000000000000000000000000000..ee54c6aabf021ca526743f8f1f67b91889e1e335 --- /dev/null +++ b/SpeechT5/fairseq/examples/multilingual/data_scripts/binarize.py @@ -0,0 +1,200 @@ +import shutil +import os, sys +from subprocess import check_call, check_output +import glob +import argparse +import shutil +import pathlib +import itertools + +def call_output(cmd): + print(f"Executing: {cmd}") + ret = check_output(cmd, shell=True) + print(ret) + return ret + +def call(cmd): + print(cmd) + check_call(cmd, shell=True) + + +WORKDIR_ROOT = os.environ.get('WORKDIR_ROOT', None) + +if WORKDIR_ROOT is None or not WORKDIR_ROOT.strip(): + print('please specify your working directory root in OS environment variable WORKDIR_ROOT. Exitting..."') + sys.exit(-1) + +SPM_PATH = os.environ.get('SPM_PATH', None) + +if SPM_PATH is None or not SPM_PATH.strip(): + print("Please install sentence piecence from https://github.com/google/sentencepiece and set SPM_PATH pointing to the installed spm_encode.py. 
Exitting...") + sys.exit(-1) + + +SPM_MODEL = f'{WORKDIR_ROOT}/sentence.bpe.model' +SPM_VOCAB = f'{WORKDIR_ROOT}/dict_250k.txt' + +SPM_ENCODE = f'{SPM_PATH}' + +if not os.path.exists(SPM_MODEL): + call(f"wget https://dl.fbaipublicfiles.com/fairseq/models/mbart50/sentence.bpe.model -O {SPM_MODEL}") + + +if not os.path.exists(SPM_VOCAB): + call(f"wget https://dl.fbaipublicfiles.com/fairseq/models/mbart50/dict_250k.txt -O {SPM_VOCAB}") + + + +def get_data_size(raw): + cmd = f'wc -l {raw}' + ret = call_output(cmd) + return int(ret.split()[0]) + +def encode_spm(model, direction, prefix='', splits=['train', 'test', 'valid'], pairs_per_shard=None): + src, tgt = direction.split('-') + + for split in splits: + src_raw, tgt_raw = f'{RAW_DIR}/{split}{prefix}.{direction}.{src}', f'{RAW_DIR}/{split}{prefix}.{direction}.{tgt}' + if os.path.exists(src_raw) and os.path.exists(tgt_raw): + cmd = f"""python {SPM_ENCODE} \ + --model {model}\ + --output_format=piece \ + --inputs {src_raw} {tgt_raw} \ + --outputs {BPE_DIR}/{direction}{prefix}/{split}.bpe.{src} {BPE_DIR}/{direction}{prefix}/{split}.bpe.{tgt} """ + print(cmd) + call(cmd) + + +def binarize_( + bpe_dir, + databin_dir, + direction, spm_vocab=SPM_VOCAB, + splits=['train', 'test', 'valid'], +): + src, tgt = direction.split('-') + + try: + shutil.rmtree(f'{databin_dir}', ignore_errors=True) + os.mkdir(f'{databin_dir}') + except OSError as error: + print(error) + cmds = [ + "fairseq-preprocess", + f"--source-lang {src} --target-lang {tgt}", + f"--destdir {databin_dir}/", + f"--workers 8", + ] + if isinstance(spm_vocab, tuple): + src_vocab, tgt_vocab = spm_vocab + cmds.extend( + [ + f"--srcdict {src_vocab}", + f"--tgtdict {tgt_vocab}", + ] + ) + else: + cmds.extend( + [ + f"--joined-dictionary", + f"--srcdict {spm_vocab}", + ] + ) + input_options = [] + if 'train' in splits and glob.glob(f"{bpe_dir}/train.bpe*"): + input_options.append( + f"--trainpref {bpe_dir}/train.bpe", + ) + if 'valid' in splits and glob.glob(f"{bpe_dir}/valid.bpe*"): + input_options.append(f"--validpref {bpe_dir}/valid.bpe") + if 'test' in splits and glob.glob(f"{bpe_dir}/test.bpe*"): + input_options.append(f"--testpref {bpe_dir}/test.bpe") + if len(input_options) > 0: + cmd = " ".join(cmds + input_options) + print(cmd) + call(cmd) + + +def binarize( + databin_dir, + direction, spm_vocab=SPM_VOCAB, prefix='', + splits=['train', 'test', 'valid'], + pairs_per_shard=None, +): + def move_databin_files(from_folder, to_folder): + for bin_file in glob.glob(f"{from_folder}/*.bin") \ + + glob.glob(f"{from_folder}/*.idx") \ + + glob.glob(f"{from_folder}/dict*"): + try: + shutil.move(bin_file, to_folder) + except OSError as error: + print(error) + bpe_databin_dir = f"{BPE_DIR}/{direction}{prefix}_databin" + bpe_dir = f"{BPE_DIR}/{direction}{prefix}" + if pairs_per_shard is None: + binarize_(bpe_dir, bpe_databin_dir, direction, spm_vocab=spm_vocab, splits=splits) + move_databin_files(bpe_databin_dir, databin_dir) + else: + # binarize valid and test which will not be sharded + binarize_( + bpe_dir, bpe_databin_dir, direction, + spm_vocab=spm_vocab, splits=[s for s in splits if s != "train"]) + for shard_bpe_dir in glob.glob(f"{bpe_dir}/shard*"): + path_strs = os.path.split(shard_bpe_dir) + shard_str = path_strs[-1] + shard_folder = f"{bpe_databin_dir}/{shard_str}" + databin_shard_folder = f"{databin_dir}/{shard_str}" + print(f'working from {shard_folder} to {databin_shard_folder}') + os.makedirs(databin_shard_folder, exist_ok=True) + binarize_( + shard_bpe_dir, shard_folder, direction, + 
spm_vocab=spm_vocab, splits=["train"]) + + for test_data in glob.glob(f"{bpe_databin_dir}/valid.*") + glob.glob(f"{bpe_databin_dir}/test.*"): + filename = os.path.split(test_data)[-1] + try: + os.symlink(test_data, f"{databin_shard_folder}/{filename}") + except OSError as error: + print(error) + move_databin_files(shard_folder, databin_shard_folder) + + +def load_langs(path): + with open(path) as fr: + langs = [l.strip() for l in fr] + return langs + +if __name__ == '__main__': + parser = argparse.ArgumentParser() + parser.add_argument("--data_root", default=f"{WORKDIR_ROOT}/ML50") + parser.add_argument("--raw-folder", default='raw') + parser.add_argument("--bpe-folder", default='bpe') + parser.add_argument("--databin-folder", default='databin') + + args = parser.parse_args() + + DATA_PATH = args.data_root #'/private/home/yuqtang/public_data/ML50' + RAW_DIR = f'{DATA_PATH}/{args.raw_folder}' + BPE_DIR = f'{DATA_PATH}/{args.bpe_folder}' + DATABIN_DIR = f'{DATA_PATH}/{args.databin_folder}' + os.makedirs(BPE_DIR, exist_ok=True) + + raw_files = itertools.chain( + glob.glob(f'{RAW_DIR}/train*'), + glob.glob(f'{RAW_DIR}/valid*'), + glob.glob(f'{RAW_DIR}/test*'), + ) + + directions = [os.path.split(file_path)[-1].split('.')[1] for file_path in raw_files] + + for direction in directions: + prefix = "" + splits = ['train', 'valid', 'test'] + try: + shutil.rmtree(f'{BPE_DIR}/{direction}{prefix}', ignore_errors=True) + os.mkdir(f'{BPE_DIR}/{direction}{prefix}') + os.makedirs(DATABIN_DIR, exist_ok=True) + except OSError as error: + print(error) + spm_model, spm_vocab = SPM_MODEL, SPM_VOCAB + encode_spm(spm_model, direction=direction, splits=splits) + binarize(DATABIN_DIR, direction, spm_vocab=spm_vocab, splits=splits) diff --git a/SpeechT5/fairseq/examples/multilingual/data_scripts/check_iswlt_test_data.py b/SpeechT5/fairseq/examples/multilingual/data_scripts/check_iswlt_test_data.py new file mode 100644 index 0000000000000000000000000000000000000000..f8e2eb0f15699f1b458a8445d0c1dd6229a21f77 --- /dev/null +++ b/SpeechT5/fairseq/examples/multilingual/data_scripts/check_iswlt_test_data.py @@ -0,0 +1,67 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + + +import os, sys +import subprocess +import re +from subprocess import check_call, check_output + +WORKDIR_ROOT = os.environ.get('WORKDIR_ROOT', None) + +if WORKDIR_ROOT is None or not WORKDIR_ROOT.strip(): + print('please specify your working directory root in OS environment variable WORKDIR_ROOT. 
Exitting..."') + sys.exit(-1) + + +BLEU_REGEX = re.compile("^BLEU\\S* = (\\S+) ") +def run_eval_bleu(cmd): + output = check_output(cmd, shell=True, stderr=subprocess.STDOUT).decode("utf-8").strip() + print(output) + bleu = -1.0 + for line in output.strip().split('\n'): + m = BLEU_REGEX.search(line) + if m is not None: + bleu = m.groups()[0] + bleu = float(bleu) + break + return bleu + +def check_data_test_bleu(raw_folder, data_lang_pairs): + not_matchings = [] + for sacrebleu_set, src_tgts in data_lang_pairs: + for src_tgt in src_tgts: + print(f'checking test bleus for: {src_tgt} at {sacrebleu_set}') + src, tgt = src_tgt.split('-') + ssrc, stgt = src[:2], tgt[:2] + if os.path.exists(f'{raw_folder}/test.{tgt}-{src}.{src}'): + # reversed direction may have different test set + test_src = f'{raw_folder}/test.{tgt}-{src}.{src}' + else: + test_src = f'{raw_folder}/test.{src}-{tgt}.{src}' + cmd1 = f'cat {test_src} | sacrebleu -t "{sacrebleu_set}" -l {stgt}-{ssrc}; [ $? -eq 0 ] || echo ""' + test_tgt = f'{raw_folder}/test.{src}-{tgt}.{tgt}' + cmd2 = f'cat {test_tgt} | sacrebleu -t "{sacrebleu_set}" -l {ssrc}-{stgt}; [ $? -eq 0 ] || echo ""' + bleu1 = run_eval_bleu(cmd1) + if bleu1 != 100.0: + not_matchings.append(f'{sacrebleu_set}:{src_tgt} source side not matching: {test_src}') + bleu2 = run_eval_bleu(cmd2) + if bleu2 != 100.0: + not_matchings.append(f'{sacrebleu_set}:{src_tgt} target side not matching: {test_tgt}') + return not_matchings + +if __name__ == "__main__": + to_data_path = f'{WORKDIR_ROOT}/iwsltv2' + not_matching = check_data_test_bleu( + f'{to_data_path}/raw', + [ + ('iwslt17', ['en_XX-ar_AR', 'en_XX-ko_KR', 'ar_AR-en_XX', 'ko_KR-en_XX']), + ('iwslt17', ['en_XX-it_IT', 'en_XX-nl_XX', 'it_IT-en_XX', 'nl_XX-en_XX']), + ('iwslt17/tst2015', ['en_XX-vi_VN', "vi_VN-en_XX"]), + ] + ) + if len(not_matching) > 0: + print('the following datasets do not have matching test datasets:\n\t', '\n\t'.join(not_matching)) + diff --git a/SpeechT5/fairseq/examples/multilingual/data_scripts/check_self_overlaps.py b/SpeechT5/fairseq/examples/multilingual/data_scripts/check_self_overlaps.py new file mode 100644 index 0000000000000000000000000000000000000000..07b338dcfd2d7f10317608274631d0edd93ba889 --- /dev/null +++ b/SpeechT5/fairseq/examples/multilingual/data_scripts/check_self_overlaps.py @@ -0,0 +1,103 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + + +import os +import glob +import argparse +from utils.dedup import deup +import sys + +WORKDIR_ROOT = os.environ.get('WORKDIR_ROOT', None) + +if WORKDIR_ROOT is None or not WORKDIR_ROOT.strip(): + print('please specify your working directory root in OS environment variable WORKDIR_ROOT. 
Exitting..."') + sys.exit(-1) + +def get_directions(folder): + raw_files = glob.glob(f'{folder}/train*') + directions = [os.path.split(file_path)[-1].split('.')[1] for file_path in raw_files] + return directions + +def diff_list(lhs, rhs): + return set(lhs).difference(set(rhs)) + +def check_diff( + from_src_file, from_tgt_file, + to_src_file, to_tgt_file, +): + seen_in_from = set() + seen_src_in_from = set() + seen_tgt_in_from = set() + from_count = 0 + with open(from_src_file, encoding='utf-8') as fsrc, \ + open(from_tgt_file, encoding='utf-8') as ftgt: + for s, t in zip(fsrc, ftgt): + seen_in_from.add((s, t)) + seen_src_in_from.add(s) + seen_tgt_in_from.add(t) + from_count += 1 + common = 0 + common_src = 0 + common_tgt = 0 + to_count = 0 + seen = set() + + with open(to_src_file, encoding='utf-8') as fsrc, \ + open(to_tgt_file, encoding='utf-8') as ftgt: + for s, t in zip(fsrc, ftgt): + to_count += 1 + if (s, t) not in seen: + if (s, t) in seen_in_from: + common += 1 + if s in seen_src_in_from: + common_src += 1 + seen_src_in_from.remove(s) + if t in seen_tgt_in_from: + common_tgt += 1 + seen_tgt_in_from.remove(t) + seen.add((s, t)) + return common, common_src, common_tgt, from_count, to_count + +def main(): + parser = argparse.ArgumentParser() + parser.add_argument("--folder", type=str, required=True, + help="the data folder ") + parser.add_argument("--split", type=str, default='test', + help="split (valid, test) to check against training data") + parser.add_argument('--directions', type=str, default=None, required=False) + + args = parser.parse_args() + + if args.directions is None: + directions = set(get_directions(args.folder)) + directions = sorted(directions) + else: + directions = args.directions.split(',') + directions = sorted(set(directions)) + + results = [] + print(f'checking where {args.split} split data are in training') + print(f'direction\tcommon_count\tsrc common\ttgt common\tfrom_size\tto_size') + + for direction in directions: + src, tgt = direction.split('-') + from_src_file = f'{args.folder}/{args.split}.{src}-{tgt}.{src}' + from_tgt_file = f'{args.folder}/{args.split}.{src}-{tgt}.{tgt}' + if not os.path.exists(from_src_file): + # some test/valid data might in reverse directinos: + from_src_file = f'{args.folder}/{args.split}.{tgt}-{src}.{src}' + from_tgt_file = f'{args.folder}/{args.split}.{tgt}-{src}.{tgt}' + to_src_file = f'{args.folder}/train.{src}-{tgt}.{src}' + to_tgt_file = f'{args.folder}/train.{src}-{tgt}.{tgt}' + if not os.path.exists(to_src_file) or not os.path.exists(from_src_file): + continue + r = check_diff(from_src_file, from_tgt_file, to_src_file, to_tgt_file) + results.append(r) + print(f'{direction}\t', '\t'.join(map(str, r))) + + +if __name__ == "__main__": + main() diff --git a/SpeechT5/fairseq/examples/multilingual/data_scripts/check_valid_test_overlaps.py b/SpeechT5/fairseq/examples/multilingual/data_scripts/check_valid_test_overlaps.py new file mode 100644 index 0000000000000000000000000000000000000000..40fa9aecdf9108e095feb3661236453c0f7ed7c4 --- /dev/null +++ b/SpeechT5/fairseq/examples/multilingual/data_scripts/check_valid_test_overlaps.py @@ -0,0 +1,124 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. 
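+
+# Checks whether training sentences overlap with the test sentences: it loads the test
+# data for the given directions from --test-data, scans the training files under
+# --folder (expected at <folder>/en_XX/<direction>/all.<lang>), and prints, per direction,
+# the training size and how many training lines also appear in the test data.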
+ + +import os +import argparse +import pandas as pd +import sys + + +WORKDIR_ROOT = os.environ.get('WORKDIR_ROOT', None) + +if WORKDIR_ROOT is None or not WORKDIR_ROOT.strip(): + print('please specify your working directory root in OS environment variable WORKDIR_ROOT. Exitting..."') + sys.exit(-1) + +def load_langs(path): + with open(path) as fr: + langs = [l.strip() for l in fr] + return langs + + + +def load_sentences(raw_data, split, direction): + src, tgt = direction.split('-') + src_path = f"{raw_data}/{split}.{direction}.{src}" + tgt_path = f"{raw_data}/{split}.{direction}.{tgt}" + if os.path.exists(src_path) and os.path.exists(tgt_path): + return [(src, open(src_path).read().splitlines()), (tgt, open(tgt_path).read().splitlines())] + else: + return [] + +def swap_direction(d): + src, tgt = d.split('-') + return f'{tgt}-{src}' + +def get_all_test_data(raw_data, directions, split='test'): + test_data = [ + x + for dd in directions + for d in [dd, swap_direction(dd)] + for x in load_sentences(raw_data, split, d) + ] + # all_test_data = {s for _, d in test_data for s in d} + all_test_data = {} + for lang, d in test_data: + for s in d: + s = s.strip() + lgs = all_test_data.get(s, set()) + lgs.add(lang) + all_test_data[s] = lgs + return all_test_data, test_data + + +def check_train_sentences(src_path, tgt_path, direction, all_test_data, mess_up_train={}): + # src, tgt = direction.split('-') + print(f'check training data for {direction} in {src_path} and {tgt_path}') + size = 0 + overlapped_size_counted_dup = 0 + if not os.path.exists(tgt_path) or not os.path.exists(src_path): + return mess_up_train, size, overlapped_size_counted_dup + + with open(src_path) as f, open(tgt_path) as g: + for src_line, tgt_line in zip(f, g): + s = src_line.strip() + t = tgt_line.strip() + size += 1 + if s in all_test_data: + langs = mess_up_train.get(s, set()) + langs.add(direction) + mess_up_train[s] = langs + overlapped_size_counted_dup += 1 + if t in all_test_data: + langs = mess_up_train.get(t, set()) + langs.add(direction) + mess_up_train[t] = langs + overlapped_size_counted_dup += 1 + print(f'{direction}: size={size}, overlapped={overlapped_size_counted_dup}') + return mess_up_train, size, overlapped_size_counted_dup + +def check_train_all(raw_data, directions, all_test_data): + mess_up_train = {} + data_sizes = {} + # raw_data = '~chau/data-bin/MineBART/multilingual_mined_100M/en_XX/et_EE-en_XX/all.{en_XX, et_EE}' + print(f'checking training data againsts # {len(all_test_data)} sentences') + print(f'example test data: ', [s for i, s in enumerate(all_test_data.keys()) if i < 10]) + for direction in directions: + src, tgt = direction.split('-') + path = f'{raw_data}/en_XX/{direction}/all' + src_path = f'{path}.{src}' + tgt_path = f'{path}.{tgt}' + print(f'checking {src_path} {tgt_path}') + _, size, overlapped_size_counted_dup = check_train_sentences(src_path, tgt_path, direction, all_test_data, mess_up_train) + data_sizes[direction] = (size, overlapped_size_counted_dup) + return mess_up_train, data_sizes + + + + +def main(): + parser = argparse.ArgumentParser() + parser.add_argument("--folder", type=str, required=True, + help="the data folder ") + parser.add_argument("--test-data", type=str, required=True, + help="the test data folder ") + parser.add_argument('--directions', type=str, default=None, required=False) + + args = parser.parse_args() + directions = args.directions.split(',') + directions = sorted(set(directions)) + + results = [] + # print(f'checking where {args.split} split data are in 
training') + # print(f'direction\tcommon_count\tsrc common\ttgt common\tfrom_size\tto_size') + raw_data = args.folder + all_test_data, test_data = get_all_test_data(args.test_data, directions, split='test') + mess_up_train, data_sizes = check_train_all(raw_data, directions, all_test_data) + print(data_sizes) + + +if __name__ == "__main__": + main() diff --git a/SpeechT5/fairseq/examples/multilingual/data_scripts/dedup_all.py b/SpeechT5/fairseq/examples/multilingual/data_scripts/dedup_all.py new file mode 100644 index 0000000000000000000000000000000000000000..ef39c05ee606aaeda1d9e94970932d2241a8b281 --- /dev/null +++ b/SpeechT5/fairseq/examples/multilingual/data_scripts/dedup_all.py @@ -0,0 +1,52 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + + + +import os +import glob +import argparse +from utils.dedup import deup + +import sys +WORKDIR_ROOT = os.environ.get('WORKDIR_ROOT', None) + +if WORKDIR_ROOT is None or not WORKDIR_ROOT.strip(): + print('please specify your working directory root in OS environment variable WORKDIR_ROOT. Exitting..."') + sys.exit(-1) + + +def main(): + parser = argparse.ArgumentParser() + parser.add_argument("--from-folder", type=str, required=True, + help="the data folder to be dedup") + parser.add_argument("--to-folder", type=str, required=True, + help="the data folder to save deduped data") + parser.add_argument('--directions', type=str, default=None, required=False) + + args = parser.parse_args() + + if args.directions is None: + raw_files = glob.glob(f'{args.from_folder}/train*') + + directions = [os.path.split(file_path)[-1].split('.')[1] for file_path in raw_files] + else: + directions = args.directions.split(',') + directions = sorted(set(directions)) + + for direction in directions: + src, tgt = direction.split('-') + src_file = f'{args.from_folder}/train.{src}-{tgt}.{src}' + tgt_file = f'{args.from_folder}/train.{src}-{tgt}.{tgt}' + src_file_out = f'{args.to_folder}/train.{src}-{tgt}.{src}' + tgt_file_out = f'{args.to_folder}/train.{src}-{tgt}.{tgt}' + assert src_file != src_file_out + assert tgt_file != tgt_file_out + print(f'deduping {src_file}, {tgt_file}') + deup(src_file, tgt_file, src_file_out, tgt_file_out) + + +if __name__ == "__main__": + main() diff --git a/SpeechT5/fairseq/examples/multilingual/data_scripts/download_ML50_v1.sh b/SpeechT5/fairseq/examples/multilingual/data_scripts/download_ML50_v1.sh new file mode 100644 index 0000000000000000000000000000000000000000..99fbc75920836a4b4bbdbd6b523749843288e450 --- /dev/null +++ b/SpeechT5/fairseq/examples/multilingual/data_scripts/download_ML50_v1.sh @@ -0,0 +1,30 @@ +#!/bin/bash +# Copyright (c) Facebook, Inc. and its affiliates. +# All rights reserved. +# +# This source code is licensed under the license found in the +# LICENSE file in the root directory of this source tree. + +if [ -z $WORKDIR_ROOT ] ; +then + echo "please specify your working directory root in environment variable WORKDIR_ROOT. Exitting..." 
+ exit +fi + +# first run download_wmt20.sh; it will install a few useful tools for other scripts +# TODO: need to print out instructions on downloading a few files which requires manually authentication from the websites +bash ./download_wmt20.sh + +python ./download_wmt19_and_before.py +bash ./download_wat19_my.sh +python ./download_ted_and_extract.py +bash ./download_lotus.sh +bash ./download_iitb.sh +bash ./download_af_xh.sh + + +# IWSLT downloading URLs have changed in between; TODO: fix them: +bash ./download_iwslt_and_extract.sh + +# TODO: globalvoices URLs changed; need to be fixed +bash ./download_flores_data.sh diff --git a/SpeechT5/fairseq/examples/multilingual/data_scripts/download_af_xh.sh b/SpeechT5/fairseq/examples/multilingual/data_scripts/download_af_xh.sh new file mode 100644 index 0000000000000000000000000000000000000000..a78fbbbbccb6f6ae005a1f03b97f083a2d958ebe --- /dev/null +++ b/SpeechT5/fairseq/examples/multilingual/data_scripts/download_af_xh.sh @@ -0,0 +1,164 @@ +#!/bin/bash +# Copyright (c) Facebook, Inc. and its affiliates. +# All rights reserved. +# +# This source code is licensed under the license found in the +# LICENSE file in the root directory of this source tree. + +# set -x -e + +if [ -z $WORKDIR_ROOT ] ; +then + echo "please specify your working directory root in environment variable WORKDIR_ROOT. Exitting..." + exit +fi + + +# put intermediate files +TMP_DIR=$WORKDIR_ROOT/temp/af_xhv2 +# output {train,valid,test} files to dest +DEST=${WORKDIR_ROOT}/ML50/raw + + + +ROOT=${WORKDIR_ROOT} +UTILS=$PWD/utils +TMX2CORPUS="${UTILS}/tmx2corpus" +TMX_TOOL="python ${TMX2CORPUS}/tmx2corpus.py" + +mkdir -p $TMP_DIR +mkdir -p $DEST +mkdir -p $UTILS + +function download_opus(){ + src=$1 + tgt=$2 + subset=$3 + ulr=$4 + + mkdir extract_$subset.$src-$tgt + pushd extract_$subset.$src-$tgt + if [ ! 
-f "$subset.$src-$tgt.tmx.gz" ]; then + wget $url -O "$subset.$src-$tgt.tmx.gz" + gzip -d "$subset.$src-$tgt.tmx.gz" + f=$subset.$src-$tgt.tmx + $TMX_TOOL $f + mv bitext.$src ../$subset.$src-$tgt.$src + mv bitext.$tgt ../$subset.$src-$tgt.$tgt + fi + popd +} + +function concat_subsets(){ + src=$1 + tgt=$2 + subsets=$3 + src_train=raw_train.$src-$tgt.$src + tgt_train=raw_train.$src-$tgt.$tgt + > $src_train + > $tgt_train + for subset in $subsets; do + cat $subset.$src-$tgt.$src >> $src_train + cat $subset.$src-$tgt.$tgt >> $tgt_train + done +} + + + +function get_seeded_random() +{ + seed="$1" + openssl enc -aes-256-ctr -pass pass:"$seed" -nosalt \ + /dev/null +} + +function split_train_valid(){ + src=$1 + tgt=$2 + raw_src_train=raw_train.$src-$tgt.$src + raw_tgt_train=raw_train.$src-$tgt.$tgt + + shuf --random-source=<(get_seeded_random 43) $raw_src_train > shuffled.$src-$tgt.$src + shuf --random-source=<(get_seeded_random 43) $raw_tgt_train > shuffled.$src-$tgt.$tgt + + head -n 1500 shuffled.$src-$tgt.$src > valid.$src-$tgt.$src + head -n 1500 shuffled.$src-$tgt.$tgt > valid.$src-$tgt.$tgt + + tail +1501 shuffled.$src-$tgt.$src > train.$src-$tgt.$src + tail +1501 shuffled.$src-$tgt.$tgt > train.$src-$tgt.$tgt +} + +function copy2dst(){ + lsrc=$1 + ltgt=$2 + src=${lsrc:0:2} + tgt=${ltgt:0:2} + + + cp valid.$src-$tgt.$src $DEST/valid.$lsrc-$ltgt.$lsrc + cp valid.$src-$tgt.$tgt $DEST/valid.$lsrc-$ltgt.$ltgt + + cp train.$src-$tgt.$src $DEST/train.$lsrc-$ltgt.$lsrc + cp train.$src-$tgt.$tgt $DEST/train.$lsrc-$ltgt.$ltgt +} + + + + +#for xh-en +declare -A xh_en_urls +xh_en_urls=( + [Tatoeba]=https://object.pouta.csc.fi/OPUS-Tatoeba/v20190709/tmx/en-xh.tmx.gz + [wikimedia]=https://object.pouta.csc.fi/OPUS-wikimedia/v20190628/tmx/en-xh.tmx.gz + [memat]=https://object.pouta.csc.fi/OPUS-memat/v1/tmx/en-xh.tmx.gz + [uedin]=https://object.pouta.csc.fi/OPUS-bible-uedin/v1/tmx/en-xh.tmx.gz + [GNOME]=https://object.pouta.csc.fi/OPUS-GNOME/v1/tmx/en-xh.tmx.gz + [XhosaNavy]=https://object.pouta.csc.fi/OPUS-XhosaNavy/v1/tmx/en-xh.tmx.gz + [KDE4]=https://object.pouta.csc.fi/OPUS-KDE4/v2/tmx/en-xh.tmx.gz + [Ubuntu]=https://object.pouta.csc.fi/OPUS-Ubuntu/v14.10/tmx/en-xh.tmx.gz +) + +mkdir $TMP_DIR/xh-en +pushd $TMP_DIR/xh-en +for k in "${!xh_en_urls[@]}" +do + name=$k + url=${xh_en_urls[$k]} + echo "$name: $url" + download_opus xh en $name $ulr +done +concat_subsets xh en "${!xh_en_urls[@]}" +split_train_valid xh en +copy2dst xh_ZA en_XX +popd + + +## +#for af-en +declare -A af_en_urls +af_en_urls=( + [Tatoeba]=https://object.pouta.csc.fi/OPUS-Tatoeba/v20190709/tmx/af-en.tmx.gz + [uedin]=https://object.pouta.csc.fi/OPUS-bible-uedin/v1/tmx/af-en.tmx.gz + [GNOME]=https://object.pouta.csc.fi/OPUS-GNOME/v1/tmx/af-en.tmx.gz + [QED]=https://object.pouta.csc.fi/OPUS-QED/v2.0a/tmx/af-en.tmx.gz + [KDE4]=https://object.pouta.csc.fi/OPUS-KDE4/v2/tmx/af-en.tmx.gz + [OpenSubtitles]=https://object.pouta.csc.fi/OPUS-OpenSubtitles/v2018/tmx/af-en.tmx.gz + [SPC]=https://object.pouta.csc.fi/OPUS-SPC/v1/tmx/af-en.tmx.gz + [Ubuntu]=https://object.pouta.csc.fi/OPUS-Ubuntu/v14.10/tmx/af-en.tmx.gz +) + +mkdir $TMP_DIR/af-en +pushd $TMP_DIR/af-en +for k in "${!af_en_urls[@]}" +do + name=$k + url=${af_en_urls[$k]} + echo "$name: $url" + download_opus af en $name $ulr +done +concat_subsets af en "${!af_en_urls[@]}" +split_train_valid af en +copy2dst af_ZA en_XX +popd + + diff --git a/SpeechT5/fairseq/examples/multilingual/data_scripts/download_flores_data.sh 
b/SpeechT5/fairseq/examples/multilingual/data_scripts/download_flores_data.sh new file mode 100644 index 0000000000000000000000000000000000000000..e6175ce0c38b06a1ebddaeca808f71b47f77f500 --- /dev/null +++ b/SpeechT5/fairseq/examples/multilingual/data_scripts/download_flores_data.sh @@ -0,0 +1,246 @@ +#!/bin/bash + +# Copyright (c) Facebook, Inc. and its affiliates. +# All rights reserved. +# +# This source code is licensed under the license found in the +# LICENSE file in the root directory of this source tree. +# + +if [ -z $WORKDIR_ROOT ] ; +then + echo "please specify your working directory root in environment variable WORKDIR_ROOT. Exitting..." + exit +fi + + +set -e +set -o pipefail + +SRC=en +SI_TGT=si +NE_TGT=ne + +DESTDIR=${WORKDIR_ROOT}/ML50/raw/ + +ROOT=${WORKDIR_ROOT}/tmp +mkdir -p $ROOT +DATA=$ROOT/data +NE_ROOT=$DATA/all-clean-ne +SI_ROOT=$DATA/all-clean-si + +mkdir -p $DATA $NE_ROOT $SI_ROOT + +SI_OPUS_DATASETS=( + "$SI_ROOT/GNOME.en-si" + "$SI_ROOT/Ubuntu.en-si" + "$SI_ROOT/KDE4.en-si" + "$SI_ROOT/OpenSubtitles.en-si" +) + +SI_OPUS_URLS=( + "https://object.pouta.csc.fi/OPUS-GNOME/v1/moses/en-si.txt.zip" + "https://object.pouta.csc.fi/OPUS-Ubuntu/v14.10/moses/en-si.txt.zip" + "https://object.pouta.csc.fi/OPUS-KDE4/v2/moses/en-si.txt.zip" + "https://object.pouta.csc.fi/OPUS-OpenSubtitles/v2018/moses/en-si.txt.zip" +) + +NE_OPUS_DATASETS=( + "$NE_ROOT/GNOME.en-ne" + "$NE_ROOT/Ubuntu.en-ne" + "$NE_ROOT/KDE4.en-ne" +) + +NE_OPUS_URLS=( + "https://object.pouta.csc.fi/OPUS-GNOME/v1/moses/en-ne.txt.zip" + "https://object.pouta.csc.fi/OPUS-Ubuntu/v14.10/moses/en-ne.txt.zip" + "https://object.pouta.csc.fi/OPUS-KDE4/v2/moses/en-ne.txt.zip" +) + +REMOVE_FILE_PATHS=() + +# Download data +download_data() { + CORPORA=$1 + URL=$2 + + if [ -f $CORPORA ]; then + echo "$CORPORA already exists, skipping download" + else + echo "Downloading $URL" + wget $URL -O $CORPORA --no-check-certificate || rm -f $CORPORA + if [ -f $CORPORA ]; then + echo "$URL successfully downloaded." + else + echo "$URL not successfully downloaded." 
+ rm -f $CORPORA + exit -1 + fi + fi +} + +# Example: download_opus_data $LANG_ROOT $TGT +download_opus_data() { + LANG_ROOT=$1 + TGT=$2 + + if [ "$TGT" = "si" ]; then + URLS=("${SI_OPUS_URLS[@]}") + DATASETS=("${SI_OPUS_DATASETS[@]}") + else + URLS=("${NE_OPUS_URLS[@]}") + DATASETS=("${NE_OPUS_DATASETS[@]}") + fi + + # Download and extract data + for ((i=0;i<${#URLS[@]};++i)); do + URL=${URLS[i]} + CORPORA=${DATASETS[i]} + + download_data $CORPORA $URL + unzip -o $CORPORA -d $LANG_ROOT + REMOVE_FILE_PATHS+=( $CORPORA $CORPORA.xml $CORPORA.ids $LANG_ROOT/README $LANG_ROOT/LICENSE ) + done + + cat ${DATASETS[0]}.$SRC ${DATASETS[1]}.$SRC ${DATASETS[2]}.$SRC > $LANG_ROOT/GNOMEKDEUbuntu.$SRC-$TGT.$SRC + cat ${DATASETS[0]}.$TGT ${DATASETS[1]}.$TGT ${DATASETS[2]}.$TGT > $LANG_ROOT/GNOMEKDEUbuntu.$SRC-$TGT.$TGT + + REMOVE_FILE_PATHS+=( ${DATASETS[0]}.$SRC ${DATASETS[1]}.$SRC ${DATASETS[2]}.$SRC ) + REMOVE_FILE_PATHS+=( ${DATASETS[0]}.$TGT ${DATASETS[1]}.$TGT ${DATASETS[2]}.$TGT ) +} + +download_opus_data $SI_ROOT $SI_TGT +cp ${SI_OPUS_DATASETS[3]}.$SRC $SI_ROOT/OpenSubtitles2018.$SRC-$SI_TGT.$SRC +cp ${SI_OPUS_DATASETS[3]}.$SI_TGT $SI_ROOT/OpenSubtitles2018.$SRC-$SI_TGT.$SI_TGT +REMOVE_FILE_PATHS+=( ${SI_OPUS_DATASETS[3]}.$SRC ${SI_OPUS_DATASETS[3]}.$SI_TGT ) + +download_opus_data $NE_ROOT $NE_TGT + + +# Download and extract Global Voices data +GLOBAL_VOICES="$NE_ROOT/globalvoices.2018q4.ne-en" +GLOBAL_VOICES_URL="http://www.casmacat.eu/corpus/global-voices/globalvoices.ne-en.xliff.gz" + +download_data $GLOBAL_VOICES.gz $GLOBAL_VOICES_URL +gunzip -Nf $GLOBAL_VOICES.gz + +sed -ne 's?.*\(.*\).*?\1?p' $GLOBAL_VOICES > $GLOBAL_VOICES.$NE_TGT +sed -ne 's?.*]*>\(.*\).*?\1?p' $GLOBAL_VOICES > $GLOBAL_VOICES.$SRC + +REMOVE_FILE_PATHS+=( $GLOBAL_VOICES ) + +# Download and extract the bible dataset +BIBLE_TOOLS=bible-corpus-tools +XML_BIBLES=XML_Bibles +XML_BIBLES_DUP=XML_Bibles_dup + +if [ ! -e $BIBLE_TOOLS ]; then + echo "Cloning bible-corpus-tools repository..." 
+ git clone https://github.com/christos-c/bible-corpus-tools.git +fi + +mkdir -p $BIBLE_TOOLS/bin $XML_BIBLES $XML_BIBLES_DUP +javac -cp "$BIBLE_TOOLS/lib/*" -d $BIBLE_TOOLS/bin $BIBLE_TOOLS/src/bible/readers/*.java $BIBLE_TOOLS/src/bible/*.java + +download_data bible.tar.gz "https://github.com/christos-c/bible-corpus/archive/v1.2.1.tar.gz" +tar xvzf bible.tar.gz + +cp bible-corpus-1.2.1/bibles/{Greek.xml,English.xml,Nepali.xml} $XML_BIBLES/ +cp bible-corpus-1.2.1/bibles/{Greek.xml,English-WEB.xml,Nepali.xml} $XML_BIBLES_DUP/ + +java -cp $BIBLE_TOOLS/lib/*:$BIBLE_TOOLS/bin bible.CreateMLBooks $XML_BIBLES +java -cp $BIBLE_TOOLS/lib/*:$BIBLE_TOOLS/bin bible.CreateMLBooks $XML_BIBLES_DUP +java -cp $BIBLE_TOOLS/lib/*:$BIBLE_TOOLS/bin bible.CreateVerseAlignedBooks $XML_BIBLES +java -cp $BIBLE_TOOLS/lib/*:$BIBLE_TOOLS/bin bible.CreateVerseAlignedBooks $XML_BIBLES_DUP + +cat $XML_BIBLES/aligned/*/English.txt > $NE_ROOT/bible.$SRC-$NE_TGT.$SRC +cat $XML_BIBLES/aligned/*/Nepali.txt > $NE_ROOT/bible.$SRC-$NE_TGT.$NE_TGT +cat $XML_BIBLES_DUP/aligned/*/English-WEB.txt > $NE_ROOT/bible_dup.$SRC-$NE_TGT.$SRC +cat $XML_BIBLES_DUP/aligned/*/Nepali.txt > $NE_ROOT/bible_dup.$SRC-$NE_TGT.$NE_TGT +REMOVE_FILE_PATHS+=( bible-corpus-1.2.1 bible.tar.gz $BIBLE_TOOLS $XML_BIBLES $XML_BIBLES_DUP ) + +# Download and extract the Penn Treebank dataset +NE_TAGGED=$ROOT/new_submissions_parallel_corpus_project_Nepal +NE_TAGGED_URL="http://www.cle.org.pk/Downloads/ling_resources/parallelcorpus/NepaliTaggedCorpus.zip" +EN_TAGGED_PATCH_URL="https://dl.fbaipublicfiles.com/fairseq/data/nepali-penn-treebank.en.patch" +NE_TAGGED_PATCH_URL="https://dl.fbaipublicfiles.com/fairseq/data/nepali-penn-treebank.ne.patch" +MOSES=mosesdecoder +MOSES_TOK=$MOSES/scripts/tokenizer +EN_PATCH_REGEX="{s:\\\/:\/:g;s/\*\T\*\-\n+//g;s/\-LCB\-/\{/g;s/\-RCB\-/\}/g; s/\-LSB\-/\[/g; s/\-RSB\-/\]/g;s/\-LRB\-/\(/g; s/\-RRB\-/\)/g; s/\'\'/\"/g; s/\`\`/\"/g; s/\ +\'s\ +/\'s /g; s/\ +\'re\ +/\'re /g; s/\"\ +/\"/g; s/\ +\"/\"/g; s/\ n't([\ \.\"])/n't\1/g; s/\r+(.)/\1/g;}" +NE_PATCH_REGEX="{s:\p{Cf}::g;s:\\\/:\/:g;s/\*\T\*\-\n+//g;s/\-LCB\-/\{/g;s/\-RCB\-/\}/g; s/\-LSB\-/\[/g; s/\-RSB\-/\]/g;s/\-LRB\-/\(/g; s/\-RRB\-/\)/g; s/\'\'/\"/g; s/\`\`/\"/g; s/\ +\'s\ +/\'s /g; s/\ +\'re\ +/\'re /g; s/\"\ +/\"/g; s/\ +\"/\"/g; s/\ n't([\ \.\"])/n't\1/g; s/\r+(.)/\1/g;}" + +download_data $DATA/nepali-penn-treebank.$SRC.patch $EN_TAGGED_PATCH_URL +download_data $DATA/nepali-penn-treebank.$NE_TGT.patch $NE_TAGGED_PATCH_URL +download_data original.zip $NE_TAGGED_URL +unzip -o original.zip -d $ROOT + +cat $NE_TAGGED/00.txt $NE_TAGGED/01.txt $NE_TAGGED/02.txt > $NE_TAGGED/nepali-penn-treebank.$SRC +cat $NE_TAGGED/00ne_revised.txt $NE_TAGGED/01ne_revised.txt $NE_TAGGED/02ne_revised.txt > $NE_TAGGED/nepali-penn-treebank.$NE_TGT + +patch $NE_TAGGED/nepali-penn-treebank.$SRC -i $DATA/nepali-penn-treebank.$SRC.patch -o $NE_TAGGED/nepali-penn-treebank-patched.$SRC +patch $NE_TAGGED/nepali-penn-treebank.$NE_TGT -i $DATA/nepali-penn-treebank.$NE_TGT.patch -o $NE_TAGGED/nepali-penn-treebank-patched.$NE_TGT + +if [ ! -e $MOSES ]; then + echo "Cloning moses repository..." 
+ git clone https://github.com/moses-smt/mosesdecoder.git +fi + +cat $NE_TAGGED/nepali-penn-treebank-patched.$SRC | \ + perl -anpe "$EN_PATCH_REGEX" | \ + $MOSES_TOK/tokenizer.perl -l $SRC | \ + $MOSES_TOK/detokenizer.perl -l $SRC > $NE_ROOT/nepali-penn-treebank.$SRC + +cat $NE_TAGGED/nepali-penn-treebank-patched.$NE_TGT | \ + perl -CIO -anpe "$NE_PATCH_REGEX" | \ + $MOSES_TOK/detokenizer.perl -l $SRC > $NE_ROOT/nepali-penn-treebank.$NE_TGT + + +# Download nepali dictionary data +NE_DICT=$NE_ROOT/dictionaries +download_data $NE_DICT "http://www.seas.upenn.edu/~nlp/resources/TACL-data-release/dictionaries.tar.gz" +tar xvzf $NE_DICT +cp dictionaries/dict.ne $NE_ROOT/dictionary.$NE_TGT-$SRC +REMOVE_FILE_PATHS+=( $NE_DICT dictionaries ) + +REMOVE_FILE_PATHS+=( $MOSES $NE_TAGGED original.zip $DATA/nepali-penn-treebank.$SRC.patch $DATA/nepali-penn-treebank.$NE_TGT.patch ) + + +# Remove the temporary files +for ((i=0;i<${#REMOVE_FILE_PATHS[@]};++i)); do + rm -rf ${REMOVE_FILE_PATHS[i]} +done + +# Copy the training data +si=si_LK +ne=ne_NP +en=en_XX +cat $SI_ROOT/GNOMEKDEUbuntu.en-si.si $SI_ROOT/OpenSubtitles2018.en-si.si > $DESTDIR/train.$si-$en.$si +cat $SI_ROOT/GNOMEKDEUbuntu.en-si.en $SI_ROOT/OpenSubtitles2018.en-si.en > $DESTDIR/train.$si-$en.$en + +cat $NE_ROOT/bible_dup.en-ne.ne $NE_ROOT/bible.en-ne.ne $NE_ROOT/globalvoices.2018q4.ne-en.ne $NE_ROOT/GNOMEKDEUbuntu.en-ne.ne $NE_ROOT/nepali-penn-treebank.ne > $DESTDIR/train.$ne-$en.$ne +cat $NE_ROOT/bible_dup.en-ne.en $NE_ROOT/bible.en-ne.en $NE_ROOT/globalvoices.2018q4.ne-en.en $NE_ROOT/GNOMEKDEUbuntu.en-ne.en $NE_ROOT/nepali-penn-treebank.en > $DESTDIR/train.$ne-$en.$en + + +#Download the test sets +wget https://github.com/facebookresearch/flores/raw/master/data/wikipedia_en_ne_si_test_sets.tgz +tar -xvzf wikipedia_en_ne_si_test_sets.tgz + +cp wikipedia_en_ne_si_test_sets/wikipedia.dev.ne-en.ne $DESTDIR/valid.$ne-$en.$ne +cp wikipedia_en_ne_si_test_sets/wikipedia.dev.ne-en.en $DESTDIR/valid.$ne-$en.$en + +cp wikipedia_en_ne_si_test_sets/wikipedia.dev.si-en.si $DESTDIR/valid.$si-$en.$si +cp wikipedia_en_ne_si_test_sets/wikipedia.dev.si-en.en $DESTDIR/valid.$si-$en.$en + +cp wikipedia_en_ne_si_test_sets/wikipedia.devtest.ne-en.ne $DESTDIR/devtest.$ne-$en.$ne +cp wikipedia_en_ne_si_test_sets/wikipedia.devtest.ne-en.en $DESTDIR/devtest.$ne-$en.$en + +cp wikipedia_en_ne_si_test_sets/wikipedia.devtest.si-en.si $DESTDIR/devtest.$si-$en.$si +cp wikipedia_en_ne_si_test_sets/wikipedia.devtest.si-en.en $DESTDIR/devtest.$si-$en.$en + +cp wikipedia_en_ne_si_test_sets/wikipedia.test.ne-en.ne $DESTDIR/test.$ne-$en.$ne +cp wikipedia_en_ne_si_test_sets/wikipedia.test.ne-en.en $DESTDIR/test.$ne-$en.$en + +cp wikipedia_en_ne_si_test_sets/wikipedia.test.si-en.si $DESTDIR/test.$si-$en.$si +cp wikipedia_en_ne_si_test_sets/wikipedia.test.si-en.en $DESTDIR/test.$si-$en.$en + +rm -rf wikipedia_en_ne_si_test_sets.tgz wikipedia_en_ne_si_test_sets diff --git a/SpeechT5/fairseq/examples/multilingual/data_scripts/download_iitb.sh b/SpeechT5/fairseq/examples/multilingual/data_scripts/download_iitb.sh new file mode 100644 index 0000000000000000000000000000000000000000..a884e20839e2a41a57405cb6af362e37bd16ab6f --- /dev/null +++ b/SpeechT5/fairseq/examples/multilingual/data_scripts/download_iitb.sh @@ -0,0 +1,35 @@ +#!/bin/bash +# Copyright (c) Facebook, Inc. and its affiliates. +# All rights reserved. +# +# This source code is licensed under the license found in the +# LICENSE file in the root directory of this source tree. 
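+
+# Downloads the IIT Bombay English-Hindi parallel corpus (parallel.tgz and dev_test.tgz)
+# and copies its train/dev/test splits into $WORKDIR_ROOT/ML50/raw as hi_IN-en_XX files.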
+ + +if [ -z $WORKDIR_ROOT ] ; +then + echo "please specify your working directory root in environment variable WORKDIR_ROOT. Exitting..." + exit +fi + +IITB=$WORKDIR_ROOT/IITB +mkdir -p $IITB +pushd $IITB + +wget http://www.cfilt.iitb.ac.in/~moses/iitb_en_hi_parallel/iitb_corpus_download/parallel.tgz +tar -xvzf parallel.tgz + +wget http://www.cfilt.iitb.ac.in/~moses/iitb_en_hi_parallel/iitb_corpus_download/dev_test.tgz +tar -xvzf dev_test.tgz + +DESTDIR=${WORKDIR_ROOT}/ML50/raw/ + +cp parallel/IITB.en-hi.en $DESTDIR/train.hi_IN-en_XX.en_XX +cp parallel/IITB.en-hi.hi $DESTDIR/train.hi_IN-en_XX.hi_IN + +cp dev_test/dev.en $DESTDIR/valid.hi_IN-en_XX.en_XX +cp dev_test/dev.hi $DESTDIR/valid.hi_IN-en_XX.hi_IN + +cp dev_test/test.en $DESTDIR/test.hi_IN-en_XX.en_XX +cp dev_test/test.hi $DESTDIR/test.hi_IN-en_XX.hi_IN +popd \ No newline at end of file diff --git a/SpeechT5/fairseq/examples/multilingual/data_scripts/download_iwslt_and_extract.sh b/SpeechT5/fairseq/examples/multilingual/data_scripts/download_iwslt_and_extract.sh new file mode 100644 index 0000000000000000000000000000000000000000..ca3591b3db1715f136773d62e4b9b9ede97d436c --- /dev/null +++ b/SpeechT5/fairseq/examples/multilingual/data_scripts/download_iwslt_and_extract.sh @@ -0,0 +1,225 @@ +#!/bin/bash +# Copyright (c) Facebook, Inc. and its affiliates. +# All rights reserved. +# +# This source code is licensed under the license found in the +# LICENSE file in the root directory of this source tree. + +#echo 'Cloning Moses github repository (for tokenization scripts)...' +#git clone https://github.com/moses-smt/mosesdecoder.git + +if [ -z $WORKDIR_ROOT ] ; +then + echo "please specify your working directory root in environment variable WORKDIR_ROOT. Exitting..." + exit +fi + + + +data_root=${WORKDIR_ROOT}/iwsltv2 +DESTDIR=${WORKDIR_ROOT}/ML50/raw + + +langs="ar_AR it_IT nl_XX ko_KR vi_VN" +echo "data_root: $data_root" + +download_path=${data_root}/downloads +raw=${DESTDIR} +tmp=${data_root}/tmp +orig=${data_root}/orig + +mkdir -p $download_path $orig $raw $tmp +####################### +download_iwslt(){ + iwslt_key=$1 + src=$2 + tgt=$3 + save_prefix=$4 + pushd ${download_path} + if [[ ! -f ${save_prefix}$src-$tgt.tgz ]]; then + wget https://wit3.fbk.eu/archive/${iwslt_key}/texts/$src/$tgt/$src-$tgt.tgz -O ${save_prefix}$src-$tgt.tgz + [ $? -eq 0 ] && return 0 + fi + popd +} + +extract_iwslt(){ + src=$1 + tgt=$2 + prefix=$3 + pushd $orig + tar zxvf ${download_path}/${prefix}$src-${tgt}.tgz + popd +} + +generate_train(){ + lsrc=$1 + ltgt=$2 + src=${lsrc:0:2} + tgt=${ltgt:0:2} + for ll in $lsrc $ltgt; do + l=${ll:0:2} + f="$orig/*/train.tags.$src-$tgt.$l" + f_raw=$raw/train.$lsrc-$ltgt.$ll + cat $f \ + | grep -v '' \ + | grep -v '' \ + | grep -v '' \ + | grep -v '' \ + | grep -v '' \ + | sed -e 's///g' \ + | sed -e 's/<\/title>//g' \ + | sed -e 's/<description>//g' \ + | sed -e 's/<\/description>//g' \ + | sed 's/^\s*//g' \ + | sed 's/\s*$//g' \ + > $f_raw + [ $? 
-eq 0 ] && echo "extracted $f to $f_raw" + done + return 0 +} + +convert_valid_test(){ + src=$1 + tgt=$2 + for l in $src $tgt; do + echo "lang: ${l}" + for o in `ls $orig/*/IWSLT*.TED*.$src-$tgt.$l.xml`; do + fname=${o##*/} + f=$tmp/${fname%.*} + echo "$o => $f" + grep '<seg id' $o \ + | sed -e 's/<seg id="[0-9]*">\s*//g' \ + | sed -e 's/\s*<\/seg>\s*//g' \ + | sed -e "s/\’/\'/g" \ + > $f + echo "" + done + done +} + +generate_subset(){ + lsrc=$1 + ltgt=$2 + src=${lsrc:0:2} + tgt=${ltgt:0:2} + subset=$3 + prefix=$4 + for ll in $lsrc $ltgt; do + l=${ll:0:2} + f=$tmp/$prefix.${src}-${tgt}.$l + if [[ -f $f ]]; then + cp $f $raw/$subset.${lsrc}-$ltgt.${ll} + fi + done +} +################# + +echo "downloading iwslt training and dev data" +# using multilingual for it, nl +download_iwslt "2017-01-trnmted" DeEnItNlRo DeEnItNlRo +download_iwslt "2017-01-trnted" ar en +download_iwslt "2017-01-trnted" en ar +download_iwslt "2017-01-trnted" ko en +download_iwslt "2017-01-trnted" en ko +download_iwslt "2015-01" vi en +download_iwslt "2015-01" en vi + +echo "donwloading iwslt test data" +download_iwslt "2017-01-mted-test" it en "test." +download_iwslt "2017-01-mted-test" en it "test." +download_iwslt "2017-01-mted-test" nl en "test." +download_iwslt "2017-01-mted-test" en nl "test." + +download_iwslt "2017-01-ted-test" ar en "test." +download_iwslt "2017-01-ted-test" en ar "test." +download_iwslt "2017-01-ted-test" ko en "test." +download_iwslt "2017-01-ted-test" en ko "test." +download_iwslt "2015-01-test" vi en "test." +download_iwslt "2015-01-test" en vi "test." + +echo "extract training data tar balls" +extract_iwslt DeEnItNlRo DeEnItNlRo +extract_iwslt ar en +extract_iwslt en ar +extract_iwslt ko en +extract_iwslt en ko +extract_iwslt vi en +extract_iwslt en vi + + +echo "extracting iwslt test data" +for lang in $langs; do + l=${lang:0:2} + extract_iwslt $l en "test." + extract_iwslt en $l "test." 
+done + +echo "convert dev and test data" +for lang in $langs; do + s_lang=${lang:0:2} + convert_valid_test $s_lang en + convert_valid_test en $s_lang +done + + + +echo "creating training data into $raw" +for lang in $langs; do + generate_train $lang en_XX + generate_train en_XX $lang +done + +echo "creating iwslt dev data into raw" +generate_subset en_XX vi_VN valid "IWSLT15.TED.tst2013" +generate_subset vi_VN en_XX valid "IWSLT15.TED.tst2013" + +generate_subset en_XX ar_AR valid "IWSLT17.TED.tst2016" +generate_subset ar_AR en_XX valid "IWSLT17.TED.tst2016" +generate_subset en_XX ko_KR valid "IWSLT17.TED.tst2016" +generate_subset ko_KR en_XX valid "IWSLT17.TED.tst2016" + + +generate_subset en_XX it_IT valid "IWSLT17.TED.tst2010" +generate_subset it_IT en_XX valid "IWSLT17.TED.tst2010" +generate_subset en_XX nl_XX valid "IWSLT17.TED.tst2010" +generate_subset nl_XX en_XX valid "IWSLT17.TED.tst2010" + +echo "creating iswslt test data into raw" +generate_subset en_XX vi_VN test "IWSLT15.TED.tst2015" +generate_subset vi_VN en_XX test "IWSLT15.TED.tst2015" + +generate_subset en_XX ar_AR test "IWSLT17.TED.tst2017" +generate_subset ar_AR en_XX test "IWSLT17.TED.tst2017" +generate_subset en_XX ko_KR test "IWSLT17.TED.tst2017" +generate_subset ko_KR en_XX test "IWSLT17.TED.tst2017" + +generate_subset en_XX it_IT test "IWSLT17.TED.tst2017.mltlng" +generate_subset it_IT en_XX test "IWSLT17.TED.tst2017.mltlng" +generate_subset en_XX nl_XX test "IWSLT17.TED.tst2017.mltlng" +generate_subset nl_XX en_XX test "IWSLT17.TED.tst2017.mltlng" + +# normalze iwslt directions into x-en +pushd $raw +for lang in $langs; do + for split in test valid; do + x_en_f1=$split.$lang-en_XX.en_XX + x_en_f2=$split.$lang-en_XX.${lang} + + en_x_f1=$split.en_XX-$lang.en_XX + en_x_f2=$split.en_XX-$lang.${lang} + + if [ -f $en_x_f1 ] && [ ! -f $x_en_f1 ]; then + echo "cp $en_x_f1 $x_en_f1" + cp $en_x_f1 $x_en_f1 + fi + if [ -f $x_en_f2 ] && [ ! -f $x_en_f2 ]; then + echo "cp $en_x_f2 $x_en_f2" + cp $en_x_f2 $x_en_f2 + fi + done +done +popd \ No newline at end of file diff --git a/SpeechT5/fairseq/examples/multilingual/data_scripts/download_lotus.sh b/SpeechT5/fairseq/examples/multilingual/data_scripts/download_lotus.sh new file mode 100644 index 0000000000000000000000000000000000000000..c08c701314a8e575637deff78381ab02c2ef6728 --- /dev/null +++ b/SpeechT5/fairseq/examples/multilingual/data_scripts/download_lotus.sh @@ -0,0 +1,46 @@ +#!/bin/bash +# Copyright (c) Facebook, Inc. and its affiliates. +# All rights reserved. +# +# This source code is licensed under the license found in the +# LICENSE file in the root directory of this source tree. + + +if [ -z $WORKDIR_ROOT ] ; +then + echo "please specify your working directory root in environment variable WORKDIR_ROOT. Exitting..." 
+ exit +fi + + +SRCDIR=$WORKDIR_ROOT/indic_languages_corpus +DESTDIR=${WORKDIR_ROOT}/ML50/raw/ +mkdir -p $SRCDIR +mkdir -p $DESTDIR + +cd $SRCDIR +wget http://lotus.kuee.kyoto-u.ac.jp/WAT/indic-multilingual/indic_languages_corpus.tar.gz +tar -xvzf indic_languages_corpus.tar.gz + +SRC_EXTRACT_DIR=$SRCDIR/indic_languages_corpus/bilingual + +cp $SRC_EXTRACT_DIR/ml-en/train.ml $DESTDIR/train.ml_IN-en_XX.ml_IN +cp $SRC_EXTRACT_DIR/ml-en/train.en $DESTDIR/train.ml_IN-en_XX.en_XX +cp $SRC_EXTRACT_DIR/ml-en/dev.ml $DESTDIR/valid.ml_IN-en_XX.ml_IN +cp $SRC_EXTRACT_DIR/ml-en/dev.en $DESTDIR/valid.ml_IN-en_XX.en_XX +cp $SRC_EXTRACT_DIR/ml-en/test.ml $DESTDIR/test.ml_IN-en_XX.ml_IN +cp $SRC_EXTRACT_DIR/ml-en/test.en $DESTDIR/test.ml_IN-en_XX.en_XX + +cp $SRC_EXTRACT_DIR/ur-en/train.ur $DESTDIR/train.ur_PK-en_XX.ur_PK +cp $SRC_EXTRACT_DIR/ur-en/train.en $DESTDIR/train.ur_PK-en_XX.en_XX +cp $SRC_EXTRACT_DIR/ur-en/dev.ur $DESTDIR/valid.ur_PK-en_XX.ur_PK +cp $SRC_EXTRACT_DIR/ur-en/dev.en $DESTDIR/valid.ur_PK-en_XX.en_XX +cp $SRC_EXTRACT_DIR/ur-en/test.ur $DESTDIR/test.ur_PK-en_XX.ur_PK +cp $SRC_EXTRACT_DIR/ur-en/test.en $DESTDIR/test.ur_PK-en_XX.en_XX + +cp $SRC_EXTRACT_DIR/te-en/train.te $DESTDIR/train.te_IN-en_XX.te_IN +cp $SRC_EXTRACT_DIR/te-en/train.en $DESTDIR/train.te_IN-en_XX.en_XX +cp $SRC_EXTRACT_DIR/te-en/dev.te $DESTDIR/valid.te_IN-en_XX.te_IN +cp $SRC_EXTRACT_DIR/te-en/dev.en $DESTDIR/valid.te_IN-en_XX.en_XX +cp $SRC_EXTRACT_DIR/te-en/test.te $DESTDIR/test.te_IN-en_XX.te_IN +cp $SRC_EXTRACT_DIR/te-en/test.en $DESTDIR/test.te_IN-en_XX.en_XX diff --git a/SpeechT5/fairseq/examples/multilingual/data_scripts/download_ted_and_extract.py b/SpeechT5/fairseq/examples/multilingual/data_scripts/download_ted_and_extract.py new file mode 100644 index 0000000000000000000000000000000000000000..eb756680fa7dc31a14ba45c216776a6d60c16b60 --- /dev/null +++ b/SpeechT5/fairseq/examples/multilingual/data_scripts/download_ted_and_extract.py @@ -0,0 +1,338 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + + +import itertools +import os +import csv +from collections import defaultdict +from six.moves import zip +import io +import wget +import sys + +from subprocess import check_call, check_output + +# scripts and data locations +CWD = os.getcwd() +UTILS = f"{CWD}/utils" + +MOSES = f"{UTILS}/mosesdecoder" + +WORKDIR_ROOT = os.environ.get('WORKDIR_ROOT', None) + +if WORKDIR_ROOT is None or not WORKDIR_ROOT.strip(): + print('please specify your working directory root in OS environment variable WORKDIR_ROOT. 
Exitting..."') + sys.exit(-1) + + +# please donwload mosesdecoder here: +detok_cmd = f'{MOSES}/scripts/tokenizer/detokenizer.perl' + + +def call(cmd): + print(f"Executing: {cmd}") + check_call(cmd, shell=True) + +class MultiLingualAlignedCorpusReader(object): + """A class to read TED talk dataset + """ + + def __init__(self, corpus_path, delimiter='\t', + target_token=True, bilingual=True, corpus_type='file', + lang_dict={'source': ['fr'], 'target': ['en']}, + eval_lang_dict=None, zero_shot=False, + detok=True, + ): + + self.empty_line_flag = 'NULL' + self.corpus_path = corpus_path + self.delimiter = delimiter + self.bilingual = bilingual + self.lang_dict = lang_dict + self.lang_set = set() + self.target_token = target_token + self.zero_shot = zero_shot + self.eval_lang_dict = eval_lang_dict + self.corpus_type = corpus_type + self.detok = detok + + for list_ in self.lang_dict.values(): + for lang in list_: + self.lang_set.add(lang) + + self.data = dict() + self.data['train'] = self.read_aligned_corpus(split_type='train') + self.data['test'] = self.read_aligned_corpus(split_type='test') + self.data['dev'] = self.read_aligned_corpus(split_type='dev') + + def read_data(self, file_loc_): + data_list = list() + with io.open(file_loc_, 'r', encoding='utf8') as fp: + for line in fp: + try: + text = line.strip() + except IndexError: + text = self.empty_line_flag + data_list.append(text) + return data_list + + def filter_text(self, dict_): + if self.target_token: + field_index = 1 + else: + field_index = 0 + data_dict = defaultdict(list) + list1 = dict_['source'] + list2 = dict_['target'] + for sent1, sent2 in zip(list1, list2): + try: + src_sent = ' '.join(sent1.split()[field_index: ]) + except IndexError: + src_sent = 'NULL' + + if src_sent.find(self.empty_line_flag) != -1 or len(src_sent) == 0: + continue + + elif sent2.find(self.empty_line_flag) != -1 or len(sent2) == 0: + continue + + else: + data_dict['source'].append(sent1) + data_dict['target'].append(sent2) + return data_dict + + def read_file(self, split_type, data_type): + return self.data[split_type][data_type] + + def save_file(self, path_, split_type, data_type, lang): + tok_file = tok_file_name(path_, lang) + with io.open(tok_file, 'w', encoding='utf8') as fp: + for line in self.data[split_type][data_type]: + fp.write(line + '\n') + if self.detok: + de_tok(tok_file, lang) + + def add_target_token(self, list_, lang_id): + new_list = list() + token = '__' + lang_id + '__' + for sent in list_: + new_list.append(token + ' ' + sent) + return new_list + + def read_from_single_file(self, path_, s_lang, t_lang): + data_dict = defaultdict(list) + with io.open(path_, 'r', encoding='utf8') as fp: + reader = csv.DictReader(fp, delimiter='\t', quoting=csv.QUOTE_NONE) + for row in reader: + data_dict['source'].append(row[s_lang]) + data_dict['target'].append(row[t_lang]) + + if self.target_token: + text = self.add_target_token(data_dict['source'], t_lang) + data_dict['source'] = text + + return data_dict['source'], data_dict['target'] + + def read_aligned_corpus(self, split_type='train'): + data_dict = defaultdict(list) + iterable = [] + s_list = [] + t_list = [] + + if self.zero_shot: + if split_type == "train": + iterable = zip(self.lang_dict['source'], self.lang_dict['target']) + else: + iterable = zip(self.eval_lang_dict['source'], self.eval_lang_dict['target']) + + elif self.bilingual: + iterable = itertools.product(self.lang_dict['source'], self.lang_dict['target']) + + for s_lang, t_lang in iterable: + if s_lang == t_lang: + continue + if 
self.corpus_type == 'file': + split_type_file_path = os.path.join(self.corpus_path, + "all_talks_{}.tsv".format(split_type)) + s_list, t_list = self.read_from_single_file(split_type_file_path, + s_lang=s_lang, + t_lang=t_lang) + data_dict['source'] += s_list + data_dict['target'] += t_list + new_data_dict = self.filter_text(data_dict) + return new_data_dict + + +def read_langs(corpus_path): + split_type_file_path = os.path.join(corpus_path, 'extracted', + "all_talks_dev.tsv") + with io.open(split_type_file_path, 'r', encoding='utf8') as fp: + reader = csv.DictReader(fp, delimiter='\t', quoting=csv.QUOTE_NONE) + header = next(reader) + return [k for k in header.keys() if k != 'talk_name'] + +def extra_english(corpus_path, split): + split_type_file_path = os.path.join(corpus_path, + f"all_talks_{split}.tsv") + output_split_type_file_path = os.path.join(corpus_path, + f"all_talks_{split}.en") + with io.open(split_type_file_path, 'r', encoding='utf8') as fp, io.open(output_split_type_file_path, 'w', encoding='utf8') as fw: + reader = csv.DictReader(fp, delimiter='\t', quoting=csv.QUOTE_NONE) + for row in reader: + line = row['en'] + fw.write(line + '\n') + de_tok(output_split_type_file_path, 'en') + + + +def tok_file_name(filename, lang): + seps = filename.split('.') + seps.insert(-1, 'tok') + tok_file = '.'.join(seps) + return tok_file + +def de_tok(tok_file, lang): + # seps = tok_file.split('.') + # seps.insert(-1, 'detok') + # de_tok_file = '.'.join(seps) + de_tok_file = tok_file.replace('.tok.', '.') + cmd = 'perl {detok_cmd} -l {lang} < {tok_file} > {de_tok_file}'.format( + detok_cmd=detok_cmd, tok_file=tok_file, + de_tok_file=de_tok_file, lang=lang[:2]) + call(cmd) + +def extra_bitex( + ted_data_path, + lsrc_lang, + ltrg_lang, + target_token, + output_data_path, +): + def get_ted_lang(lang): + long_langs = ['pt-br', 'zh-cn', 'zh-tw', 'fr-ca'] + if lang[:5] in long_langs: + return lang[:5] + elif lang[:4] =='calv': + return lang[:5] + elif lang in ['pt_BR', 'zh_CN', 'zh_TW', 'fr_CA']: + return lang.lower().replace('_', '-') + return lang[:2] + src_lang = get_ted_lang(lsrc_lang) + trg_lang = get_ted_lang(ltrg_lang) + train_lang_dict={'source': [src_lang], 'target': [trg_lang]} + eval_lang_dict = {'source': [src_lang], 'target': [trg_lang]} + + obj = MultiLingualAlignedCorpusReader(corpus_path=ted_data_path, + lang_dict=train_lang_dict, + target_token=target_token, + corpus_type='file', + eval_lang_dict=eval_lang_dict, + zero_shot=False, + bilingual=True) + + os.makedirs(output_data_path, exist_ok=True) + lsrc_lang = lsrc_lang.replace('-', '_') + ltrg_lang = ltrg_lang.replace('-', '_') + obj.save_file(output_data_path + f"/train.{lsrc_lang}-{ltrg_lang}.{lsrc_lang}", + split_type='train', data_type='source', lang=src_lang) + obj.save_file(output_data_path + f"/train.{lsrc_lang}-{ltrg_lang}.{ltrg_lang}", + split_type='train', data_type='target', lang=trg_lang) + + obj.save_file(output_data_path + f"/test.{lsrc_lang}-{ltrg_lang}.{lsrc_lang}", + split_type='test', data_type='source', lang=src_lang) + obj.save_file(output_data_path + f"/test.{lsrc_lang}-{ltrg_lang}.{ltrg_lang}", + split_type='test', data_type='target', lang=trg_lang) + + obj.save_file(output_data_path + f"/valid.{lsrc_lang}-{ltrg_lang}.{lsrc_lang}", + split_type='dev', data_type='source', lang=src_lang) + obj.save_file(output_data_path + f"/valid.{lsrc_lang}-{ltrg_lang}.{ltrg_lang}", + split_type='dev', data_type='target', lang=trg_lang) + + +def bar_custom(current, total, width=80): + print("Downloading: %d%% [%d / %d] Ks" % 
(current / total * 100, current / 1000, total / 1000), end='\r') + + +def download_and_extract(download_to, extract_to): + url = 'http://phontron.com/data/ted_talks.tar.gz' + filename = f"{download_to}/ted_talks.tar.gz" + if os.path.exists(filename): + print(f'{filename} has already been downloaded so skip') + else: + filename = wget.download(url, filename, bar=bar_custom) + if os.path.exists(f'{extract_to}/all_talks_train.tsv'): + print(f'Already extracted so skip') + else: + extract_cmd = f'tar xzfv "{filename}" -C "{extract_to}"' + call(extract_cmd) + + +if __name__ == "__main__": + import argparse + parser = argparse.ArgumentParser() + parser.add_argument('--ted_data_path', type=str, default=WORKDIR_ROOT, required=False) + parser.add_argument( + '--direction-list', + type=str, + # default=None, + #for ML50 + default=( + "bn_IN-en_XX,he_IL-en_XX,fa_IR-en_XX,id_ID-en_XX,sv_SE-en_XX,pt_XX-en_XX,ka_GE-en_XX,ka_GE-en_XX,th_TH-en_XX," + "mr_IN-en_XX,hr_HR-en_XX,uk_UA-en_XX,az_AZ-en_XX,mk_MK-en_XX,gl_ES-en_XX,sl_SI-en_XX,mn_MN-en_XX," + #non-english directions + # "fr_XX-de_DE," # replaced with wmt20 + # "ja_XX-ko_KR,es_XX-pt_XX,ru_RU-sv_SE,hi_IN-bn_IN,id_ID-ar_AR,cs_CZ-pl_PL,ar_AR-tr_TR" + ), + required=False) + parser.add_argument('--target-token', action='store_true', default=False) + parser.add_argument('--extract-all-english', action='store_true', default=False) + + args = parser.parse_args() + + import sys + import json + + # TED Talks data directory + ted_data_path = args.ted_data_path + + download_to = f'{ted_data_path}/downloads' + extract_to = f'{ted_data_path}/extracted' + + #DESTDIR=${WORKDIR_ROOT}/ML50/raw/ + output_path = f'{ted_data_path}/ML50/raw' + os.makedirs(download_to, exist_ok=True) + os.makedirs(extract_to, exist_ok=True) + os.makedirs(output_path, exist_ok=True) + download_and_extract(download_to, extract_to) + + + if args.extract_all_english: + for split in ['train', 'dev', 'test']: + extra_english(ted_data_path, split) + exit(0) + if args.direction_list is not None: + directions = args.direction_list.strip().split(',') + directions = [tuple(d.strip().split('-', 1)) for d in directions if d] + else: + langs = read_langs(ted_data_path) + # directions = [ + # '{}.{}'.format(src, tgt) + # for src in langs + # for tgt in langs + # if src < tgt + # ] + directions = [('en', tgt) for tgt in langs if tgt != 'en'] + print(f'num directions={len(directions)}: {directions}') + + for src_lang, trg_lang in directions: + print('--working on {}-{}'.format(src_lang, trg_lang)) + extra_bitex( + extract_to, + src_lang, + trg_lang, + target_token=args.target_token, + output_data_path=output_path + ) diff --git a/SpeechT5/fairseq/examples/multilingual/data_scripts/download_wat19_my.sh b/SpeechT5/fairseq/examples/multilingual/data_scripts/download_wat19_my.sh new file mode 100644 index 0000000000000000000000000000000000000000..c1e2d47287a29af4576e7a63641e8152ecb63c44 --- /dev/null +++ b/SpeechT5/fairseq/examples/multilingual/data_scripts/download_wat19_my.sh @@ -0,0 +1,36 @@ +#!/bin/bash +# Copyright (c) Facebook, Inc. and its affiliates. +# All rights reserved. +# +# This source code is licensed under the license found in the +# LICENSE file in the root directory of this source tree. + + +if [ -z $WORKDIR_ROOT ] ; +then + echo "please specify your working directory root in environment variable WORKDIR_ROOT. Exitting..." 
+ exit +fi + + +SRCDIR=$WORKDIR_ROOT/indic_languages_corpus +DESTDIR=$WORKDIR_ROOT/ML50/raw +mkdir -p $SRCDIR +mkdir -p $DESTDIR + +WAT_MY_EN=wat2020.my-en.zip +cd $SRCDIR +# please refer to http://lotus.kuee.kyoto-u.ac.jp/WAT/my-en-data/ for latest URL if the following url expired +#- The data used for WAT2020 are identical to those used in WAT2019. +wget http://lotus.kuee.kyoto-u.ac.jp/WAT/my-en-data/$WAT_MY_EN +unzip $WAT_MY_EN + + +SRC_EXTRACT_DIR=$SRCDIR/wat2020.my-en/alt + +cp $SRC_EXTRACT_DIR/train.alt.en $DESTDIR/train.my_MM-en_XX.en_XX +cp $SRC_EXTRACT_DIR/train.alt.my $DESTDIR/train.my_MM-en_XX.my_MM +cp $SRC_EXTRACT_DIR/dev.alt.en $DESTDIR/valid.my_MM-en_XX.en_XX +cp $SRC_EXTRACT_DIR/dev.alt.my $DESTDIR/valid.my_MM-en_XX.my_MM +cp $SRC_EXTRACT_DIR/test.alt.en $DESTDIR/test.my_MM-en_XX.en_XX +cp $SRC_EXTRACT_DIR/test.alt.my $DESTDIR/test.my_MM-en_XX.my_MM diff --git a/SpeechT5/fairseq/examples/multilingual/data_scripts/download_wmt19_and_before.py b/SpeechT5/fairseq/examples/multilingual/data_scripts/download_wmt19_and_before.py new file mode 100644 index 0000000000000000000000000000000000000000..3465731eb3e55047c44d1b336a97e99cb3a89a53 --- /dev/null +++ b/SpeechT5/fairseq/examples/multilingual/data_scripts/download_wmt19_and_before.py @@ -0,0 +1,899 @@ +from typing import NamedTuple, List +from urllib.parse import urlparse +import os, sys +import subprocess +from subprocess import check_call, check_output +import glob +import wget +import re +import multiprocessing as mp +from functools import partial +import pathlib +from collections import OrderedDict + +WORKDIR_ROOT = os.environ.get('WORKDIR_ROOT', None) + +if WORKDIR_ROOT is None or not WORKDIR_ROOT.strip(): + print('please specify your working directory root in OS environment variable WORKDIR_ROOT. Exitting..."') + sys.exit(-1) + +# scripts and data locations +CWD = os.getcwd() +UTILS = f"{CWD}/utils" + +MOSES = f"{UTILS}/mosesdecoder" +SGM_TOOL = f'{MOSES}/scripts/ems/support/input-from-sgm.perl' + +TMX2CORPUS = f"{UTILS}/tmx2corpus" +TMX_TOOL = f'python {TMX2CORPUS}/tmx2corpus.py' + +to_data_path = f'{WORKDIR_ROOT}/wmt' +download_to = f'{to_data_path}/downloads' +manually_downloads = f'{to_data_path}/downloads' +extract_to = f'{to_data_path}/extracted' +#DESTDIR=${WORKDIR_ROOT}/ML50/raw/ +raw_data = f'{WORKDIR_ROOT}/ML50/raw' +#### + +class DLDataset(NamedTuple): + name: str + train_urls: List[str] + valid_urls: List[str] + test_urls: List[str] + train_files_patterns: List[str] = [] + valid_files_patterns: List[str] = [] + test_files_patterns: List[str] = [] + + + +def bar_custom(current, total, width=80): + print("Downloading: %d%% [%d / %d] Ks" % (current / total * 100, current / 1000, total / 1000), end='\r') + +def get_downloaded_file(dl_folder, url): + if isinstance(url, tuple): + url, f = url + else: + url_f = urlparse(url) + # f = os.path.split(url_f.path)[-1] + f = '_'.join(url_f.path.split('/')[1:]) + return url, f"{dl_folder}/{f}" + +def download_parts_and_combine(dl_folder, urls, filename): + parts = [] + for url_record in urls: + url, part_file = get_downloaded_file(dl_folder, url_record) + if os.path.exists(part_file): + print(f'{part_file} has already been downloaded so skip') + else: + part_file = wget.download(url, part_file, bar=bar_custom) + parts.append(part_file) + + def get_combine_cmd(parts): + #default as tar.gz.?? 
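        # The multi-part downloads defined below are the UN v1.0 corpora, which come as
        # numbered chunks of a single gzipped tar; concatenating the parts in order
        # restores the archive, e.g. (illustrative):
        #   cat UNv1.0.en-zh.tar.gz.00 UNv1.0.en-zh.tar.gz.01 > UNv1.0.en-zh.tar.gz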
+ return f'cat {" ".join(parts)} > {filename}' + + combine_cmd = get_combine_cmd(parts) + call(combine_cmd, debug=True) + return filename + +def download_a_url(dl_folder, url): + url, filename = get_downloaded_file(dl_folder, url) + if os.path.exists(filename): + print(f'{filename} has already been downloaded so skip') + return filename + + print(f'downloading {url} to {filename}') + if isinstance(url, list) or isinstance(url, tuple): + download_parts_and_combine(dl_folder, url, filename) + else: + wget.download(url, filename, bar=bar_custom) + print(f'dowloaded: {filename}') + return filename + +def download_files(dl_folder, urls, completed_urls={}): + for url_record in urls: + url, _ = get_downloaded_file(dl_folder, url_record) + filename = download_a_url(dl_folder, url_record) + completed_urls[str(url)] = filename + return completed_urls + +def check_need_manual_downalod(dl_folder, to_manually_download_urls): + to_be_manually_dowloaded = [] + manually_completed_urls = {} + for url_record, instruction in to_manually_download_urls: + url, filename = get_downloaded_file(dl_folder, url_record) + if not os.path.exists(filename): + print(f'{url} need to be download manually, please download it manually following {instruction}; and copy it to {filename}') + to_be_manually_dowloaded.append((url, filename)) + else: + manually_completed_urls[url] = filename + # if len(to_be_manually_dowloaded) > 0: + # raise ValueError('Missing files that need to be downloaded manually; stop the process now.') + return to_be_manually_dowloaded + +def download_dataset(to_folder, dl_dataset, completed_urls={}): + download_files(to_folder, dl_dataset.train_urls, completed_urls) + download_files(to_folder, dl_dataset.valid_urls, completed_urls) + download_files(to_folder, dl_dataset.test_urls, completed_urls) + print('completed downloading') + return completed_urls + +def call(cmd, debug=False): + if debug: + print(cmd) + check_call(cmd, shell=True) + + +def get_extract_name(file_path): + path = os.path.split(file_path) + return path[-1] + '_extract' #.split('.')[0] + +def extract_file(downloaded_file, extract_folder, get_extract_name=get_extract_name, debug=False): + extract_name = get_extract_name(downloaded_file) + extract_to = f'{extract_folder}/{extract_name}' + os.makedirs(extract_to, exist_ok=True) + if os.path.exists(f'{extract_to}/DONE'): + print(f'{downloaded_file} has already been extracted to {extract_to} so skip') + return extract_to + def get_extract_cmd(filename): + if filename.endswith('.tgz') or filename.endswith('tar.gz'): + return f'tar xzfv {filename} -C {extract_to}' + elif filename.endswith('.gz.tar'): + return f'tar xfv {filename} -C {extract_to}; (cd {extract_to}; gzip -d *.gz; [ $? 
-eq 0 ] || gzip -d */*.gz)' + elif filename.endswith('.tar'): + return f'tar xfv {filename} -C {extract_to}' + elif filename.endswith('.gz'): + return f'cp {filename} {extract_to}; (cd {extract_to}; gzip -d *.gz)' + elif filename.endswith('.zip'): + return f'unzip {filename} -d {extract_to}' + extract_cmd = get_extract_cmd(downloaded_file) + print(f'extracting {downloaded_file}') + if isinstance(extract_cmd, list): + for c in extract_cmd: + call(c, debug=debug) + else: + call(extract_cmd, debug=debug) + call(f'echo DONE > {extract_to}/DONE') + return extract_to + + +def extract_all_files( + completed_urls, extract_folder, + get_extract_name=get_extract_name, + completed_extraction={}, + debug=False): + extracted_folders = OrderedDict() + for url, downloaded_file in set(completed_urls.items()): + if downloaded_file in completed_extraction: + print(f'{downloaded_file} is already extracted; so skip') + continue + folder = extract_file(downloaded_file, extract_folder, get_extract_name, debug) + extracted_folders[url] = folder + return extracted_folders + + +def my_glob(folder): + for p in [f'{folder}/*', f'{folder}/*/*', f'{folder}/*/*/*']: + for f in glob.glob(p): + yield f + + +def sgm2raw(sgm, debug): + to_file = sgm[0:len(sgm) - len('.sgm')] + if os.path.exists(to_file): + debug and print(f'{sgm} already converted to {to_file}; so skip') + return to_file + cmd = f'{SGM_TOOL} < {sgm} > {to_file}' + call(cmd, debug) + return to_file + +def tmx2raw(tmx, debug): + to_file = tmx[0:len(tmx) - len('.tmx')] + to_folder = os.path.join(*os.path.split(tmx)[:-1]) + if os.path.exists(f'{to_folder}/bitext.en'): + debug and print(f'{tmx} already extracted to {to_file}; so skip') + return to_file + cmd = f'(cd {to_folder}; {TMX_TOOL} {tmx})' + call(cmd, debug) + return to_file + +CZENG16_REGEX = re.compile(r'.*?data.plaintext-format/0[0-9]train$') +WMT19_WIKITITLES_REGEX = re.compile(r'.*?wikititles-v1.(\w\w)-en.tsv.gz') +TSV_REGEX = re.compile(r'.*?(\w\w)-(\w\w).tsv$') + + + +def cut_wikitles(wiki_file, debug): + # different languages have different file names: + if wiki_file.endswith('wiki/fi-en/titles.fi-en'): + to_file1 = f'{wiki_file}.fi' + to_file2 = f'{wiki_file}.en' + BACKSLASH = '\\' + cmd1 = f"cat {wiki_file} | sed 's/|||/{BACKSLASH}t/g' |cut -f1 |awk '{{$1=$1}};1' > {to_file1}" + cmd2 = f"cat {wiki_file} | sed 's/|||/{BACKSLASH}t/g' |cut -f2 |awk '{{$1=$1}};1' > {to_file2}" +# elif WMT19_WIKITITLES_REGEX.match(wiki_file): +# src = WMT19_WIKITITLES_REGEX.match(wiki_file).groups()[0] +# to_file1 = f'{wiki_file}.{src}' +# to_file2 = f'{wiki_file}.en' +# cmd1 = f"cat {wiki_file} | cut -f1 |awk '{{$1=$1}};1' > {to_file1}" +# cmd2 = f"cat {wiki_file} | cut -f2 |awk '{{$1=$1}};1' > {to_file2}" + else: + return None + if os.path.exists(to_file1) and os.path.exists(to_file2): + debug and print(f'{wiki_file} already processed to {to_file1} and {to_file2}; so skip') + return wiki_file + + call(cmd1, debug=debug) + call(cmd2, debug=debug) + return wiki_file + +def cut_tsv(file, debug): + m = TSV_REGEX.match(file) + if m is None: + raise ValueError(f'{file} is not matching tsv pattern') + src = m.groups()[0] + tgt = m.groups()[1] + + to_file1 = f'{file}.{src}' + to_file2 = f'{file}.{tgt}' + cmd1 = f"cat {file} | cut -f1 |awk '{{$1=$1}};1' > {to_file1}" + cmd2 = f"cat {file} | cut -f2 |awk '{{$1=$1}};1' > {to_file2}" + if os.path.exists(to_file1) and os.path.exists(to_file2): + debug and print(f'{file} already processed to {to_file1} and {to_file2}; so skip') + return file + + call(cmd1, debug=debug) + 
call(cmd2, debug=debug) + return file + + +def convert_file_if_needed(file, debug): + if file.endswith('.sgm'): + return sgm2raw(file, debug) + elif file.endswith('.tmx'): + return tmx2raw(file, debug) + elif file.endswith('wiki/fi-en/titles.fi-en'): + return cut_wikitles(file, debug) +# elif WMT19_WIKITITLES_REGEX.match(file): +# return cut_wikitles(file, debug) + elif file.endswith('.tsv'): + return cut_tsv(file, debug) + elif CZENG16_REGEX.match(file): + return convert2czeng17(file, debug) + else: + return file + + +def convert_files_if_needed(extracted_foldrs, my_glob=my_glob, debug=False): + return { + url: list(sorted(set(convert_file_if_needed(f, debug)) for f in sorted(set(my_glob(folder))))) + for url, folder in extracted_foldrs.items() + } + +def match_patt(file_path, file_pattern, src, tgt, lang): + return file_pattern.format(src=src, tgt=tgt, lang=lang) in file_path + +def match_patts(file_path, file_patterns, src, tgt, lang): + for file_pattern in file_patterns: + params = { k: v for k, v in [('src', src), ('tgt', tgt), ('lang', lang)] if k in file_pattern} + matching = file_pattern.format(**params) + + if isinstance(file_pattern, tuple): + pattern, directions = file_pattern + if f'{src}-{tgt}' in directions and matching in file_path: + return True + else: + if matching in file_path: + return True + return False + +def extracted_glob(extracted_folder, file_patterns, src, tgt, lang): + def get_matching_pattern(file_pattern): + params = { + k: v + for k, v in [('src', src), ('tgt', tgt), ('lang', lang)] + if '{' + k + '}' in file_pattern + } + file_pattern = re.sub(r'{src:(.*?)}', r'\1' if lang == src else '', file_pattern) + file_pattern = re.sub(r'{tgt:(.*?)}', r'\1' if lang == tgt else '', file_pattern) + file_pattern = file_pattern.format(**params) + return file_pattern + for file_pattern in file_patterns: + if isinstance(file_pattern, tuple): + file_pattern, lang_pairs = file_pattern + if f'{src}-{tgt}' not in lang_pairs: + continue +# print('working on pattern: ', file_pattern, lang_pairs ) + matching_pattern = get_matching_pattern(file_pattern) + if matching_pattern is None: + continue + glob_patterns = f'{extracted_folder}/{matching_pattern}' +# print('glob_patterns: ', glob_patterns) + for f in glob.glob(glob_patterns): + yield f + +# for debug usage +def all_extracted_files(split, src, tgt, extracted_folders, split_urls): + def get_url(url): + if isinstance(url, tuple): + url, downloaded_file = url + return url + return [ + f + for url in split_urls + for f in my_glob(extracted_folders[str(get_url(url))]) + ] + +def concat_files(split, src, tgt, extracted_folders, split_urls, path_patterns, to_folder, debug=False): +# if debug: +# print('extracted files to be filtered by patterns: ', +# '\n\t'.join(sorted(all_extracted_files(split, src, tgt, extracted_folders, split_urls)))) + for lang in [src, tgt]: + to_file = f'{to_folder}/{split}.{src}-{tgt}.{lang}' + s_src, s_tgt, s_lang = src.split('_')[0], tgt.split('_')[0], lang.split('_')[0] + files = [] + for url in split_urls: + if isinstance(url, tuple): + url, downloaded_file = url + if str(url) not in extracted_folders: + print(f'warning: {url} not in extracted files') + for extracted_file in set( + extracted_glob( + extracted_folders[str(url)], path_patterns, + s_src, s_tgt, s_lang)): + files.append(extracted_file) + if len(files) == 0: + print('warning: ', f'No files found for split {to_file}') + continue + files = sorted(set(files)) + print(f'concating {len(files)} files into {to_file}') + cmd = ['cat'] + [f'"{f}"' for 
f in files] + [f'>{to_file}'] + cmd = " ".join(cmd) + call(cmd, debug=debug) + +UTILS = os.path.join(pathlib.Path(__file__).parent, 'utils') +LID_MODEL = f'{download_to}/lid.176.bin' +LID_MULTI = f'{UTILS}/fasttext_multi_filter.py' + +def lid_filter(split, src, tgt, from_folder, to_folder, debug=False): + if not os.path.exists(LID_MODEL): + call(f'wget -nc https://dl.fbaipublicfiles.com/fasttext/supervised-models/lid.176.bin -O {LID_MODEL}') + from_prefix = f'{from_folder}/{split}.{src}-{tgt}' + to_prefix = f'{to_folder}/{split}.{src}-{tgt}' + if os.path.exists(f'{from_prefix}.{src}') and os.path.exists(f'{from_prefix}.{tgt}'): + s_src, s_tgt = src.split('_')[0], tgt.split('_')[0] + cmd = ( + f'python {LID_MULTI} --model {LID_MODEL} --inputs {from_prefix}.{src} {from_prefix}.{tgt} ' + f'--langs {s_src} {s_tgt} --outputs {to_prefix}.{src} {to_prefix}.{tgt}' + ) + print(f'filtering {from_prefix}') + call(cmd, debug=debug) + +def concat_into_splits(dl_dataset, src, tgt, extracted_folders, to_folder, debug): + to_folder_tmp = f"{to_folder}_tmp" + os.makedirs(to_folder_tmp, exist_ok=True) + concat_files('train', src, tgt, + extracted_folders, + split_urls=dl_dataset.train_urls, + path_patterns=dl_dataset.train_files_patterns, + to_folder=to_folder_tmp, debug=debug) + lid_filter('train', src, tgt, to_folder_tmp, to_folder, debug) + + concat_files('valid', src, tgt, + extracted_folders, + split_urls=dl_dataset.valid_urls, + path_patterns=dl_dataset.valid_files_patterns, + to_folder=to_folder, debug=debug) + concat_files('test', src, tgt, + extracted_folders, + split_urls=dl_dataset.test_urls, + path_patterns=dl_dataset.test_files_patterns, + to_folder=to_folder, debug=debug) + + +def download_multi(dl_folder, extract_folder, urls, num_processes=8, debug=False): + pool = mp.Pool(processes=num_processes) + download_f = partial(download_a_url, dl_folder) + downloaded_files = pool.imap_unordered(download_f, urls) + pool.close() + pool.join() + +BLEU_REGEX = re.compile("^BLEU\\S* = (\\S+) ") +def run_eval_bleu(cmd): + output = check_output(cmd, shell=True, stderr=subprocess.STDOUT).decode("utf-8").strip() + print(output) + bleu = -1.0 + for line in output.strip().split('\n'): + m = BLEU_REGEX.search(line) + if m is not None: + bleu = m.groups()[0] + bleu = float(bleu) + break + return bleu + +def check_wmt_test_bleu(raw_folder, wmt_lang_pairs): + not_matchings = [] + for wmt, src_tgts in wmt_lang_pairs: + for src_tgt in src_tgts: + print(f'checking test bleus for: {src_tgt} at {wmt}') + src, tgt = src_tgt.split('-') + ssrc, stgt = src[:2], tgt[:2] + if os.path.exists(f'{raw_folder}/test.{tgt}-{src}.{src}'): + # reversed direction may have different test set + test_src = f'{raw_folder}/test.{tgt}-{src}.{src}' + else: + test_src = f'{raw_folder}/test.{src}-{tgt}.{src}' + cmd1 = f'cat {test_src} | sacrebleu -t "{wmt}" -l {stgt}-{ssrc}; [ $? -eq 0 ] || echo ""' + test_tgt = f'{raw_folder}/test.{src}-{tgt}.{tgt}' + cmd2 = f'cat {test_tgt} | sacrebleu -t "{wmt}" -l {ssrc}-{stgt}; [ $? 
-eq 0 ] || echo ""' + bleu1 = run_eval_bleu(cmd1) + if bleu1 != 100.0: + not_matchings.append(f'{wmt}:{src_tgt} source side not matching: {test_src}') + bleu2 = run_eval_bleu(cmd2) + if bleu2 != 100.0: + not_matchings.append(f'{wmt}:{src_tgt} target side not matching: {test_tgt}') + return not_matchings + +def download_and_extract( + to_folder, lang_pairs, dl_dataset, + to_manually_download_urls, + completed_urls={}, completed_extraction={}, + debug=False): + + dl_folder = f'{to_folder}/downloads' + extract_folder = f'{to_folder}/extracted' + raw_folder = f'{to_folder}/raw' + lid_filtered = f'{to_folder}/lid_filtered' + + os.makedirs(extract_folder, exist_ok=True) + os.makedirs(raw_folder, exist_ok=True) + os.makedirs(lid_filtered, exist_ok=True) + + + to_be_manually_dowloaded = check_need_manual_downalod(dl_folder, to_manually_download_urls) + + completed_urls = download_dataset( + dl_folder, dl_dataset, completed_urls) + if debug: + print('completed urls: ', completed_urls) + + + extracted_folders = extract_all_files( + completed_urls, + extract_folder=extract_folder, + completed_extraction=completed_extraction, + debug=debug) + if debug: + print('download files have been extracted to folders: ', extracted_folders) + + converted_files = convert_files_if_needed(extracted_folders, debug=False) + for src_tgt in lang_pairs: + print(f'working on {dl_dataset.name}: {src_tgt}') + src, tgt = src_tgt.split('-') + concat_into_splits(dl_dataset, + src=src, tgt=tgt, + extracted_folders=extracted_folders, + to_folder=raw_folder, debug=debug) + print('completed data into: ', raw_folder) + +def download_czang16(download_to, username=None): + wgets = [ + f'wget --user={username} --password=czeng -P {download_to} http://ufallab.ms.mff.cuni.cz/~bojar/czeng16-data/data-plaintext-format.{i}.tar' + for i in range(10)] + cmds = [] + for i, cmd in enumerate(wgets): + filename = f'{download_to}/data-plaintext-format.{i}.tar' + if os.path.exists(filename): + print(f'{filename} has already been downloaded; so skip') + continue + cmds.append(cmd) + if cmds and username is None: + raise ValueError('No czeng username is given; please register at http://ufal.mff.cuni.cz/czeng/czeng16 to obtain username to download') + for cmd in cmds: + call(cmd) + print('done with downloading czeng1.6') + +def download_czeng17_script(download_to, extract_folder, debug=False): + url = 'http://ufal.mff.cuni.cz/czeng/download.php?f=convert_czeng16_to_17.pl.zip' + filename = f'{download_to}/convert_czeng16_to_17.pl.zip' + extract_to = f'{extract_folder}/{get_extract_name(filename)}' + script_path = f'{extract_to}/convert_czeng16_to_17.pl' + + if not os.path.exists(script_path): + wget.download(url, filename, bar=bar_custom) + extract_to = extract_file(f'{download_to}/convert_czeng16_to_17.pl.zip', extract_folder, get_extract_name=get_extract_name, debug=debug) + return script_path + +czeng17_script_path = "" +def convert2czeng17(file, debug): + en_file = f'{file}.en' + cs_file = f'{file}.cs' + + if not os.path.exists(en_file) or not os.path.exists(cs_file): + cs_cmd = f'cat {file} | perl {czeng17_script_path} | cut -f3 > {cs_file}' + en_cmd = f'cat {file} | perl {czeng17_script_path} | cut -f4 > {en_file}' + call(cs_cmd, debug) + call(en_cmd, debug) + else: + print(f'already extracted: {en_file} and {cs_file}') + return file + +def extract_czeng17(extract_folder, debug=False): + url = 'http://ufal.mff.cuni.cz/czeng/download.php?f=convert_czeng16_to_17.pl.zip' + filename = f'{download_to}/convert_czeng16_to_17.pl.zip' + extract_to = 
f'{extract_folder}/{get_extract_name(filename)}' + script_path = f'{extract_to}/convert_czeng16_to_17.pl' + + if not os.path.exists(script_path): + wget.download(url, filename, bar=bar_custom) + extract_to = extract_file(f'{download_to}/convert_czeng16_to_17.pl.zip', extract_folder, get_extract_name=get_extract_name, debug=debug) + return script_path + +######### +# definitions of wmt data sources +# for es-en +# Punctuation in the official test sets will be encoded with ASCII characters (not complex Unicode characters) as much as possible. You may want to normalize your system's output before submission. You are able able to use a rawer version of the test sets that does not have this normalization. +# script to normalize punctuation: http://www.statmt.org/wmt11/normalize-punctuation.perl +wmt13_es_en = DLDataset( + name='wmt13_es-en', + train_urls=[ + 'http://www.statmt.org/wmt13/training-parallel-europarl-v7.tgz', + 'http://www.statmt.org/wmt13/training-parallel-commoncrawl.tgz', + 'http://www.statmt.org/wmt13/training-parallel-un.tgz', + 'http://www.statmt.org/wmt13/training-parallel-nc-v8.tgz', + ], + valid_urls=[ + ('http://www.statmt.org/wmt13/dev.tgz', 'wmt13_dev.tgz') + ], + test_urls=[ + ('http://www.statmt.org/wmt13/test.tgz', 'wmt13_test.tgz') + ], + train_files_patterns=[ + ('*/europarl-v7.{src}-{tgt}.{lang}', ['es-en']), + ('*commoncrawl.{src}-{tgt}.{lang}', ['es-en']), + ('*/news-commentary-v8.{src}-{tgt}.{lang}', ['es-en']), + ('un/*undoc.2000.{src}-{tgt}.{lang}', ['es-en']), + ] , + valid_files_patterns=[ + ('dev/newstest2012.{lang}', ['es-en']) + ], + test_files_patterns=[ + ('test/newstest*.{lang}', ['es-en']) + ], +) + +wmt14_de_fr_en = DLDataset( + name='wmt14_de_fr_en', + train_urls=[ + 'http://www.statmt.org/wmt13/training-parallel-europarl-v7.tgz', + 'http://www.statmt.org/wmt13/training-parallel-commoncrawl.tgz', + 'http://www.statmt.org/wmt13/training-parallel-un.tgz', + 'http://www.statmt.org/wmt14/training-parallel-nc-v9.tgz', + ('http://www.statmt.org/wmt10/training-giga-fren.tar', 'training-giga-fren.gz.tar'), #it is actuall a gz.tar + ], + valid_urls=[ + ('http://www.statmt.org/wmt14/dev.tgz', 'wmt14_dev.tgz'), + ], + test_urls=[ + ('http://www.statmt.org/wmt14/test-full.tgz', 'wmt14_test_full.tgz'), # cleaned test sets + ], + train_files_patterns=[ + ('*/europarl-v7.{src}-{tgt}.{lang}', ['fr-en', 'de-en']), + ('*commoncrawl.{src}-{tgt}.{lang}', ['fr-en', 'de-en']), + ('*/*news-commentary-v9.{src}-{tgt}.{lang}', ['fr-en', 'de-en']), + ('un/undoc.2000.{src}-{tgt}.{lang}', ['fr-en']), + ('*giga-{src}{tgt}*{lang}', ['fr-en']) + ], + valid_files_patterns=[ + ('dev/newstest2013.{lang}', ['fr-en', 'de-en']) + ], + test_files_patterns=[ + ('test-full/newstest*{src}{tgt}-{src:src}{tgt:ref}.{lang}', ['en-de', 'de-en', 'fr-en', 'en-fr']), + ], +) + +# pip install git+https://github.com/amake/tmx2corpus.git +wmt16_ro_en = DLDataset( + name='wmt16_ro-en', + train_urls=[ + ('http://data.statmt.org/wmt16/translation-task/training-parallel-ep-v8.tgz', 'wmt16_training-parallel-ep-v8.tgz'), + ('http://opus.nlpl.eu/download.php?f=SETIMES/v2/tmx/en-ro.tmx.gz', 'en-ro.tmx.gz'), + ], + valid_urls=[ + ('http://data.statmt.org/wmt16/translation-task/dev-romanian-updated.tgz', 'wmt16_dev.tgz') + ], + test_urls=[ + ('http://data.statmt.org/wmt16/translation-task/test.tgz', 'wmt16_test.tgz') + ], + train_files_patterns=[ + ('*/*europarl-v8.{src}-{tgt}.{lang}', ['ro-en']), + ('bitext.{lang}', ['ro-en']) #setimes from tmux + ] , + valid_files_patterns=[ + 
('dev/newsdev2016*{src}{tgt}*.{lang}', ['ro-en', 'ro-en']) + ], + test_files_patterns=[ + ('test/newstest*{src}{tgt}*.{lang}', ['ro-en', 'en-ro']) + ], +) + +cwmt_wmt_instruction = 'cwmt download instruction at: http://nlp.nju.edu.cn/cwmt-wmt' +wmt17_fi_lv_tr_zh_en_manual_downloads = [ + # fake urls to have unique keys for the data + ( ('http://nlp.nju.edu.cn/cwmt-wmt/CASIA2015.zip', 'CASIA2015.zip'), cwmt_wmt_instruction), + ( ('http://nlp.nju.edu.cn/cwmt-wmt/CASICT2011.zip', 'CASICT2011.zip'), cwmt_wmt_instruction), + ( ('http://nlp.nju.edu.cn/cwmt-wmt/CASICT2015.zip', 'CASICT2015.zip'), cwmt_wmt_instruction), + ( ('http://nlp.nju.edu.cn/cwmt-wmt/Datum2015.zip', 'Datum2015.zip'), cwmt_wmt_instruction), + ( ('http://nlp.nju.edu.cn/cwmt-wmt/Datum2017.zip', 'Datum2017.zip'), cwmt_wmt_instruction), + ( ('http://nlp.nju.edu.cn/cwmt-wmt/NEU2017.zip', 'NEU2017.zip'), cwmt_wmt_instruction), +] +wmt17_fi_lv_tr_zh_en = DLDataset( + name='wmt17_fi_lv_tr_zh_en', + train_urls=[ + ('http://data.statmt.org/wmt17/translation-task/training-parallel-ep-v8.tgz', 'wmt17_training-parallel-ep-v8.tgz'), + 'http://data.statmt.org/wmt17/translation-task/training-parallel-nc-v12.tgz', + 'http://www.statmt.org/wmt15/wiki-titles.tgz', + ('http://opus.nlpl.eu/download.php?f=SETIMES/v2/tmx/en-tr.tmx.gz', 'en-tr.tmx.gz'), + ('http://data.statmt.org/wmt17/translation-task/rapid2016.tgz', 'wmt17_rapid2016.tgz'), + 'http://data.statmt.org/wmt17/translation-task/leta.v1.tgz', + 'http://data.statmt.org/wmt17/translation-task/dcep.lv-en.v1.tgz', + 'http://data.statmt.org/wmt17/translation-task/books.lv-en.v1.tgz', + (('https://stuncorpusprod.blob.core.windows.net/corpusfiles/UNv1.0.en-zh.tar.gz.00', + 'https://stuncorpusprod.blob.core.windows.net/corpusfiles/UNv1.0.en-zh.tar.gz.01',), 'UNv1.0.en-zh.tar.gz'), + #manually download files: + ('http://nlp.nju.edu.cn/cwmt-wmt/CASIA2015.zip', 'CASIA2015.zip'), + ('http://nlp.nju.edu.cn/cwmt-wmt/CASICT2011.zip', 'CASICT2011.zip'), + ('http://nlp.nju.edu.cn/cwmt-wmt/CASICT2015.zip', 'CASICT2015.zip'), + ('http://nlp.nju.edu.cn/cwmt-wmt/Datum2015.zip', 'Datum2015.zip'), + ('http://nlp.nju.edu.cn/cwmt-wmt/Datum2017.zip', 'Datum2017.zip'), + ('http://nlp.nju.edu.cn/cwmt-wmt/NEU2017.zip', 'NEU2017.zip'), + ], + valid_urls=[ + ('http://data.statmt.org/wmt17/translation-task/dev.tgz', 'wmt17_dev.tgz'), + ], + test_urls=[ + #NEW: Improved translations for zh test sets + ('http://data.statmt.org/wmt17/translation-task/test-update-1.tgz', 'wmt17_test_zh_en.tgz'), + ('http://data.statmt.org/wmt17/translation-task/test.tgz', 'wmt17_test_others.tgz') + ], + train_files_patterns=[ + ('casict*/cas*{src:ch}{tgt:en}.txt', ['zh-en', 'zh-en'] ), + ('casia*/cas*{src:ch}{tgt:en}.txt', ['zh-en', 'zh-en'] ), + ('dataum*/Book*{src:cn}{tgt:en}.txt', ['zh-en', 'zh-en']), + ('neu*/NEU*{src:cn}{tgt:en}.txt', ['zh-en', 'zh-en'] ), + ('*/*UNv1.0.en-zh.{src:zh}{tgt:en}', ['zh-en']), + ('training/*news-commentary-v12.{src}-{tgt}.{lang}', ['zh-en', ]), + + ('*/*europarl-v8.{src}-{tgt}.{lang}', ['fi-en', 'lv-en']), + ('wiki/fi-en/titles.{src}-{tgt}.{lang}', ['fi-en', ]), + ('rapid2016.{tgt}-{src}.{lang}', ['fi-en', 'lv-en']), + ('*/leta.{lang}', ['lv-en']), + ('*/dcep.{lang}', ['lv-en']), + ('*/farewell.{lang}', ['lv-en']), + ('bitext.{lang}', ['tr-en']), + ] , + valid_files_patterns=[ + ('dev/newsdev2017*{src}{tgt}-{src:src}{tgt:ref}.{lang}', + [ + 'fi-en', 'lv-en', 'tr-en', 'zh-en', + 'en-fi', 'en-lv', 'en-tr', 'en-zh' + ]), + ('dev/newstest2016*{src}{tgt}-{src:src}{tgt:ref}.{lang}', + [ + 'fi-en', 'tr-en', + 
'en-fi', 'en-tr', + ]), + ], + test_files_patterns=[ + ('test/newstest2017-{src}{tgt}-{src:src}{tgt:ref}.{lang}', + [ + 'fi-en', 'lv-en', 'tr-en', + 'en-fi', 'en-lv', 'en-tr', + ]), + ('newstest2017-{src}{tgt}-{src:src}{tgt:ref}.{lang}', + [ + 'zh-en', + 'en-zh' + ]), + ], +) + +czeng_instruction = 'download instruction at: http://ufal.mff.cuni.cz/czeng/czeng16' +#alternative: use the prepared data but detokenize it? +wmt18_cs_et_en_manual_downloads = [ +#for cs, need to register and download; Register and download CzEng 1.6. +#Better results can be obtained by using a subset of sentences, released under a new version name CzEng 1.7. + # ((f'http://ufallab.ms.mff.cuni.cz/~bojar/czeng16-data/data-plaintext-format.{i}.tar', + # f'data-plaintext-format.{i}.tar'), czeng_instruction) + # for i in range(10) +] + +wmt18_cs_et_en = DLDataset( + name='wmt18_cs_et_en', + train_urls=[ + 'http://www.statmt.org/wmt13/training-parallel-europarl-v7.tgz', + 'http://data.statmt.org/wmt18/translation-task/training-parallel-ep-v8.tgz', + 'https://s3.amazonaws.com/web-language-models/paracrawl/release1/paracrawl-release1.en-cs.zipporah0-dedup-clean.tgz', + 'https://s3.amazonaws.com/web-language-models/paracrawl/release1/paracrawl-release1.en-et.zipporah0-dedup-clean.tgz', + 'http://www.statmt.org/wmt13/training-parallel-commoncrawl.tgz', + 'http://data.statmt.org/wmt18/translation-task/training-parallel-nc-v13.tgz', + ('http://data.statmt.org/wmt18/translation-task/rapid2016.tgz', 'wmt18_rapid2016.tgz'), + # (tuple( + # (f'http://ufallab.ms.mff.cuni.cz/~bojar/czeng16-data/data-plaintext-format.{i}.tar', + # f'data-plaintext-format.{i}.tar') + # for i in range(10) + # ), + # 'czeng16_data_plaintext.gz.tar'), + ], + valid_urls=[ + ('http://data.statmt.org/wmt18/translation-task/dev.tgz', 'wmt18_dev.tgz'), + ], + test_urls=[ + ('http://data.statmt.org/wmt18/translation-task/test.tgz', 'wmt18_test.tgz'), + ], + train_files_patterns=[ + # ('*/*europarl-v7.{src}-{tgt}.{lang}', ['cs-en']), + ('*/*europarl-v8.{src}-{tgt}.{lang}', ['et-en']), + # ('*paracrawl-release1.{tgt}-{src}.zipporah0-dedup-clean.{lang}', ['cs-en', 'et-en']), + ('*paracrawl-release1.{tgt}-{src}.zipporah0-dedup-clean.{lang}', ['et-en']), + # ('*commoncrawl.{src}-{tgt}.{lang}', ['cs-en']), + # ('*/news-commentary-v13.{src}-{tgt}.{lang}', ['cs-en']), + # ('data.plaintext-format/*train.{lang}', ['cs-en']), + ('rapid2016.{tgt}-{src}.{lang}', ['et-en']), + ] , + valid_files_patterns=[ + ('dev/newsdev2018*{src}{tgt}-{src:src}{tgt:ref}.{lang}', ['et-en']), + # ('dev/newstest2017*{src}{tgt}-{src:src}{tgt:ref}.{lang}', ['cs-en']) + ], + test_files_patterns=[ + ('test/newstest2018-{src}{tgt}-{src:src}{tgt:ref}.{lang}', + # ['cs-en', 'et-en']), + ['et-en']), + ] +) + +ru_en_yandex_instruction = 'Yandex Corpus download instruction at: https://translate.yandex.ru/corpus?lang=en' +wmt19_ru_gu_kk_lt_manual_downloads = [ + (('https://translate.yandex.ru/corpus?lang=en', 'wmt19_1mcorpus.zip'), ru_en_yandex_instruction) +] +wmt19_ru_gu_kk_lt = DLDataset( + name='wmt19_ru_gu_kk_lt', + train_urls=[ + 'http://www.statmt.org/europarl/v9/training/europarl-v9.lt-en.tsv.gz', + 'https://s3.amazonaws.com/web-language-models/paracrawl/release3/en-lt.bicleaner07.tmx.gz', + 'https://s3.amazonaws.com/web-language-models/paracrawl/release1/paracrawl-release1.en-ru.zipporah0-dedup-clean.tgz', + 'http://www.statmt.org/wmt13/training-parallel-commoncrawl.tgz', + 'http://data.statmt.org/news-commentary/v14/training/news-commentary-v14-wmt19.en-kk.tsv.gz', + 
'http://data.statmt.org/news-commentary/v14/training/news-commentary-v14.en-ru.tsv.gz', + 'http://data.statmt.org/wikititles/v1/wikititles-v1.kk-en.tsv.gz', + 'http://data.statmt.org/wikititles/v1/wikititles-v1.ru-en.tsv.gz', + 'http://data.statmt.org/wikititles/v1/wikititles-v1.kk-en.tsv.gz', + 'http://data.statmt.org/wikititles/v1/wikititles-v1.lt-en.tsv.gz', + 'http://data.statmt.org/wikititles/v1/wikititles-v1.gu-en.tsv.gz', + (('https://stuncorpusprod.blob.core.windows.net/corpusfiles/UNv1.0.en-ru.tar.gz.00', + 'https://stuncorpusprod.blob.core.windows.net/corpusfiles/UNv1.0.en-ru.tar.gz.01', + 'https://stuncorpusprod.blob.core.windows.net/corpusfiles/UNv1.0.en-ru.tar.gz.02',), + 'wmt19_UNv1.0.en-ru.tar.gz'), + 'https://tilde-model.s3-eu-west-1.amazonaws.com/rapid2016.en-lt.tmx.zip', + ('https://translate.yandex.ru/corpus?lang=en', 'wmt19_1mcorpus.zip'), + ], + valid_urls=[ + ('http://data.statmt.org/wmt19/translation-task/dev.tgz', 'wmt19_dev.tgz'), + ], + test_urls=[ + ('http://data.statmt.org/wmt19/translation-task/test.tgz', 'wmt19_test.tgz'), + ], + train_files_patterns=[ + ('*europarl-v9.{src}-{tgt}.tsv.{lang}', ['lt-en']), + #paracrawl + ('*paracrawl-release1.{tgt}-{src}.zipporah0-dedup-clean.{lang}', ['ru-en']), + ('bitext.{lang}', ['lt-en',]), + ('*commoncrawl.{src}-{tgt}.{lang}', ['ru-en',]), + ('*news-commentary-v14-wmt19.{tgt}-{src}.tsv.{lang}', ['kk-en', ]), + ('*news-commentary-v14.{tgt}-{src}.tsv.{lang}', ['ru-en']), + #yandex + ('corpus.{tgt}_{src}.1m.{lang}', ['ru-en']), + ('wikititles_v1_wikititles-v1.{src}-{tgt}.tsv.{lang}', ['ru-en', 'kk-en', 'lt-en', 'gu-en']), + ('*/UNv1.0.{tgt}-{src}.{lang}', ['ru-en']), + #rapid + ('bitext.{lang}', ['lt-en']) + ], + valid_files_patterns=[ + ('dev/newsdev2019*{src}{tgt}-{src:src}{tgt:ref}.{lang}', ['gu-en', 'kk-en', 'lt-en']), + ('dev/newstest2018*{src}{tgt}-{src:src}{tgt:ref}.{lang}', ['ru-en']), + ], + test_files_patterns=[ + ('sgm/newstest2019-{src}{tgt}-{src:src}{tgt:ref}.{lang}', + ['ru-en', 'gu-en', 'kk-en', 'lt-en', 'en-ru', 'en-gu', 'en-kk', 'en-lt']), + ] +) + + +######### + +if __name__ == "__main__": + # speed up the downloads with multiple processing + dl_folder = f'{to_data_path}/downloads' + extract_folder = f'{to_data_path}/extracted' + + urls = [ + url + for dataset in [wmt13_es_en, wmt14_de_fr_en, wmt16_ro_en, wmt18_cs_et_en, wmt19_ru_gu_kk_lt] + for urls in [dataset.train_urls, dataset.valid_urls, dataset.test_urls] + for url in urls + ] + urls = set(urls) + download_multi(dl_folder, extract_folder, urls, num_processes=8, debug=True) + + # check manually downlaods + to_manually_download_urls = ( + wmt17_fi_lv_tr_zh_en_manual_downloads + wmt18_cs_et_en_manual_downloads + wmt19_ru_gu_kk_lt_manual_downloads + ) + to_be_manually_dowloaded = check_need_manual_downalod(dl_folder, to_manually_download_urls) + if len(to_be_manually_dowloaded) > 0: + print('Missing files that need to be downloaded manually; stop the process now.') + exit(-1) + + completed_urls = {} + completed_extraction = {} + def work_on_wmt(directions, wmt_data): + download_and_extract( + to_data_path, + directions, + wmt_data, + to_manually_download_urls=to_manually_download_urls, + completed_urls=completed_urls, completed_extraction=completed_extraction, debug=True) + + work_on_wmt( + ['es_XX-en_XX'], + wmt13_es_en,) + work_on_wmt( + [ + 'fr_XX-en_XX', 'en_XX-fr_XX', + # 'en_XX-de_DE', 'de_DE-en_XX', + ], + wmt14_de_fr_en,) + work_on_wmt( + ['ro_RO-en_XX', 'en_XX-ro_XX'], + wmt16_ro_en,) + work_on_wmt( + [ + # 'zh_CN-en_XX', + 'lv_LV-en_XX', 
'fi_FI-en_XX', 'tr_TR-en_XX', + #in case the reversed directions have different train/valid/test data + # 'en_XX-zh_CN', + 'en_XX-lv_LV', 'en_XX-fi_FI', 'en_XX-tr_TR', + ], + wmt17_fi_lv_tr_zh_en, ) + # czeng17_script_path = download_czeng17_script(download_to, extract_to, debug=False) + # cz_username = None + work_on_wmt( + [ + # 'cs_CZ-en_XX', + 'et_EE-en_XX'], + wmt18_cs_et_en,) + work_on_wmt( + [ + # 'ru_RU-en_XX', 'en_XX-ru_RU', + 'gu_IN-en_XX', 'kk_KZ-en_XX', 'lt_LT-en_XX', + #in case the reversed directions have different train/valid/test data + 'en_XX-gu_IN', 'en_XX-kk_KZ', 'en_XX-lt_LT' + ], + wmt19_ru_gu_kk_lt,) + + not_matching = check_wmt_test_bleu( + f'{to_data_path}/raw', + [ + ('wmt13', ['es_XX-en_XX']), + ('wmt14/full', ['fr_XX-en_XX',]), + ('wmt16', ['ro_RO-en_XX',]), + # ('wmt17/improved', ['zh_CN-en_XX']), + ('wmt17', [ 'lv_LV-en_XX', 'fi_FI-en_XX', 'tr_TR-en_XX']), + ('wmt18', ['cs_CZ-en_XX', 'et_EE-en_XX']), + ('wmt19', ['gu_IN-en_XX', 'kk_KZ-en_XX', 'lt_LT-en_XX']), + #'ru_RU-en_XX', + ] + ) + if len(not_matching) > 0: + print('the following datasets do not have matching test datasets:\n\t', '\n\t'.join(not_matching)) + diff --git a/SpeechT5/fairseq/examples/multilingual/data_scripts/download_wmt20.sh b/SpeechT5/fairseq/examples/multilingual/data_scripts/download_wmt20.sh new file mode 100644 index 0000000000000000000000000000000000000000..31cd5c76b75081331ae03c5ea70ea7ddebaa06e1 --- /dev/null +++ b/SpeechT5/fairseq/examples/multilingual/data_scripts/download_wmt20.sh @@ -0,0 +1,547 @@ +#!/bin/bash +# Copyright (c) Facebook, Inc. and its affiliates. +# All rights reserved. +# +# This source code is licensed under the license found in the +# LICENSE file in the root directory of this source tree. + +if [ -z $WORKDIR_ROOT ] ; +then + echo "please specify your working directory root in environment variable WORKDIR_ROOT. Exitting..." + exit +fi + + + +set -x -e + +# TODO update the workdir and dest dir name +# put fasttext model +WORKDIR=$WORKDIR_ROOT +# put intermediate files +TMP_DIR=$WORKDIR_ROOT/tmp/tmp_wmt20_lowres_download +# output {train,valid,test} files to dest +DEST=$WORKDIR_ROOT/ML50/raw + +UTILS=$PWD/utils + +# per dataset locations +COMMONCRAWL_DIR=$TMP_DIR/commoncrawl +YANDEX_CORPUS=$WORKDIR_ROOT/wmt20/official/ru/yandex/1mcorpus.zip +# unzipped +CZENG_CORPUS=$WORKDIR_ROOT/wmt20/official/cs/czeng/czeng20-train +CCMT_DIR=$WORKDIR_ROOT/wmt20/official/zh/ccmt/parallel + +download_and_select() { + SUBFOLDER=$1 + URL=$2 + UNCOMPRESS_CMD=$3 + LANG=$4 + INPUT_FILEPATH=$5 + if [[ $# -gt 5 ]]; then + LANG_COL=$6 + EN_COL=$7 + fi + + mkdir -p $SUBFOLDER + cd $SUBFOLDER + wget -nc --content-disposition $URL + $UNCOMPRESS_CMD + + if [[ $# -gt 5 ]]; then + cut -f$LANG_COL $INPUT_FILEPATH > $INPUT_FILEPATH.$LANG + cut -f$EN_COL $INPUT_FILEPATH > $INPUT_FILEPATH.en + fi + cd .. + + ln -sf $SUBFOLDER/$INPUT_FILEPATH.$LANG $SUBFOLDER.$LANG + ln -sf $SUBFOLDER/$INPUT_FILEPATH.en $SUBFOLDER.en +} + +prepare_lid() { + pip install fasttext + + # TODO specify global workdir + MODEL=$WORKDIR/fasttext/lid.176.bin + LID_MULTI=$UTILS/fasttext_multi_filter.py + + if [ ! -f "$MODEL" ]; then + echo "downloading fasttext lid model..." + mkdir -p $WORKDIR/fasttext + wget -nc https://dl.fbaipublicfiles.com/fasttext/supervised-models/lid.176.bin -O $MODEL + fi +} + +prepare_moses() { + pushd $UTILS + echo 'Cloning Moses github repository (for tokenization scripts)...' 
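    # mosesdecoder is cloned into $UTILS, presumably so later normalization/tokenization
    # steps can call its perl scripts; an illustrative (assumed) invocation would be
    #   $UTILS/mosesdecoder/scripts/tokenizer/normalize-punctuation.perl -l de < in.de > out.de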
+ git clone https://github.com/moses-smt/mosesdecoder.git + popd +} + +lid_filter() { + # TODO specify global workdir + MODEL=$WORKDIR/fasttext/lid.176.bin + LID_MULTI=$UTILS/fasttext_multi_filter.py + + prepare_lid + + SRC=$1 + SRC_FILE=$2 + SRC_OUTPUT=$3 + TGT=$4 + TGT_FILE=$5 + TGT_OUTPUT=$6 + python $LID_MULTI --model $MODEL --inputs $SRC_FILE $TGT_FILE --langs $SRC $TGT --outputs $SRC_OUTPUT $TGT_OUTPUT +} + +prepare_ja_ted() { + mkdir -p ted + cd ted + + wget -nc https://wit3.fbk.eu/archive/2017-01-trnted//texts/en/ja/en-ja.tgz + tar -zxvf en-ja.tgz + cat en-ja/train.tags.en-ja.en | grep -v -P "^[ ]*\<" | sed 's/^[ \t]*//g' | sed 's/[ \t]*$//g' > en-ja/train.en-ja.en + cat en-ja/train.tags.en-ja.ja | grep -v -P "^[ ]*\<" | sed 's/^[ \t]*//g' | sed 's/[ \t]*$//g' > en-ja/train.en-ja.ja + + cd .. + ln -sf ted/en-ja/train.en-ja.ja ted.ja + ln -sf ted/en-ja/train.en-ja.en ted.en +} + +prepare_ja() { + OUTPUT_DIR=$TMP_DIR/ja + mkdir -p $OUTPUT_DIR + cd $OUTPUT_DIR + + download_and_select paracrawl "http://www.kecl.ntt.co.jp/icl/lirg/jparacrawl/release/2.0/bitext/en-ja.tar.gz" "tar -zxvf en-ja.tar.gz" ja en-ja/en-ja.bicleaner05.txt 4 3 & + download_and_select newscommentary "http://data.statmt.org/news-commentary/v15/training/news-commentary-v15.en-ja.tsv.gz" "gunzip -f news-commentary-v15.en-ja.tsv.gz" ja news-commentary-v15.en-ja.tsv 2 1 & + download_and_select wikititles "http://data.statmt.org/wikititles/v2/wikititles-v2.ja-en.tsv.gz" "gunzip -f wikititles-v2.ja-en.tsv.gz" ja wikititles-v2.ja-en.tsv 1 2 & + download_and_select wikimatrix "http://data.statmt.org/wmt20/translation-task/WikiMatrix/WikiMatrix.v1.en-ja.langid.tsv.gz" "gunzip -f WikiMatrix.v1.en-ja.langid.tsv.gz" ja WikiMatrix.v1.en-ja.langid.tsv 3 2 & + download_and_select subtitle "https://nlp.stanford.edu/projects/jesc/data/split.tar.gz" "tar -zxvf split.tar.gz" ja split/train 2 1 & + download_and_select kftt "http://www.phontron.com/kftt/download/kftt-data-1.0.tar.gz" "tar -zxvf kftt-data-1.0.tar.gz" ja kftt-data-1.0/data/orig/kyoto-train & + + prepare_ja_ted & + + # ted data needs to + + wait + + # remove previous results + rm -f all.?? 
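    # Each corpus above was symlinked as <name>.ja / <name>.en, so enumerating both sides
    # with the same `sort -V` order keeps the concatenated all.ja / all.en line-aligned;
    # lid_filter then screens the pairs with the fastText LID model (lid.176.bin) before
    # writing the ML50 training files, e.g.
    #   $DEST/train.ja_XX-en_XX.ja_XX  and  $DEST/train.ja_XX-en_XX.en_XX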
+ find ./ -maxdepth 1 -name "*.ja" | sort -V | xargs cat > all.ja + find ./ -maxdepth 1 -name "*.en" | sort -V | xargs cat > all.en + lid_filter ja all.ja $DEST/train.ja_XX-en_XX.ja_XX en all.en $DEST/train.ja_XX-en_XX.en_XX +} + +prepare_ta() { + OUTPUT_DIR=$TMP_DIR/ta + mkdir -p $OUTPUT_DIR + cd $OUTPUT_DIR + + download_and_select wikititles "http://data.statmt.org/wikititles/v2/wikititles-v2.ta-en.tsv.gz" "gunzip -f wikititles-v2.ta-en.tsv.gz" ta wikititles-v2.ta-en.tsv 1 2 & + download_and_select wikimatrix "http://data.statmt.org/wmt20/translation-task/WikiMatrix/WikiMatrix.v1.en-ta.langid.tsv.gz" "gunzip -f WikiMatrix.v1.en-ta.langid.tsv.gz" ta WikiMatrix.v1.en-ta.langid.tsv 3 2 & + download_and_select pmindia "http://data.statmt.org/pmindia/v1/parallel/pmindia.v1.ta-en.tsv" "" ta pmindia.v1.ta-en.tsv 2 1 & + download_and_select tanzil "https://object.pouta.csc.fi/OPUS-Tanzil/v1/moses/en-ta.txt.zip" "unzip en-ta.txt.zip" ta Tanzil.en-ta & + download_and_select pib "http://preon.iiit.ac.in/~jerin/resources/datasets/pib-v0.tar" "tar -xvf pib-v0.tar" ta pib/en-ta/train & + download_and_select mkb "http://preon.iiit.ac.in/~jerin/resources/datasets/mkb-v0.tar" "tar -xvf mkb-v0.tar" ta mkb/en-ta/mkb & + download_and_select ufal "http://ufal.mff.cuni.cz/~ramasamy/parallel/data/v2/en-ta-parallel-v2.tar.gz" "tar -zxvf en-ta-parallel-v2.tar.gz" ta en-ta-parallel-v2/corpus.bcn.train & + + wait + + # need special handling for nlpc + mkdir -p nlpc + cd nlpc + wget -nc https://raw.githubusercontent.com/nlpc-uom/English-Tamil-Parallel-Corpus/master/En-Ta%20Corpus/En-Ta%20English.txt + wget -nc https://github.com/nlpc-uom/English-Tamil-Parallel-Corpus/raw/master/En-Ta%20Corpus/En-Ta%20Tamil.txt + tail -n +4 "En-Ta English.txt" > en-ta.en + tail -n +4 "En-Ta Tamil.txt" > en-ta.ta + cd .. + ln -sf nlpc/en-ta.en nlpc.en + ln -sf nlpc/en-ta.ta nlpc.ta + + # remove previous results + rm -f all.?? + find ./ -maxdepth 1 -name "*.ta" | sort -V | xargs cat > all.ta + find ./ -maxdepth 1 -name "*.en" | sort -V | xargs cat > all.en + lid_filter ta all.ta $DEST/train.ta_IN-en_XX.ta_IN en all.en $DEST/train.ta_IN-en_XX.en_XX +} + +prepare_iu() { + OUTPUT_DIR=$TMP_DIR/iu + mkdir -p $OUTPUT_DIR + cd $OUTPUT_DIR + + download_and_select nh "https://nrc-digital-repository.canada.ca/eng/view/dataset/?id=c7e34fa7-7629-43c2-bd6d-19b32bf64f60" "tar -zxvf Nunavut-Hansard-Inuktitut-English-Parallel-Corpus-3.0.1.tgz" iu Nunavut-Hansard-Inuktitut-English-Parallel-Corpus-3.0/NunavutHansard > /dev/null & + download_and_select wikititles "http://data.statmt.org/wikititles/v2/wikititles-v2.iu-en.tsv.gz" "gunzip -f wikititles-v2.iu-en.tsv.gz" iu wikititles-v2.iu-en.tsv 1 2 & + + wait + + # remove previous results + rm -f all.?? 
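    # Unlike the other languages, Inuktitut is not passed through lid_filter here; instead
    # the two sides are pasted into a TSV and rows with an empty field on either side are
    # dropped, i.e.
    #   paste all.iu all.en | awk -F $'\t' '$1!=""&&$2!=""'
    # keeps only line pairs where both the iu and en columns are non-empty.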
+ find ./ -maxdepth 1 -name "*.iu" | sort -V | xargs cat | nh/Nunavut-Hansard-Inuktitut-English-Parallel-Corpus-3.0/scripts/normalize-iu-spelling.pl > all.iu + find ./ -maxdepth 1 -name "*.en" | sort -V | xargs cat > all.en + paste all.iu all.en | awk -F $'\t' '$1!=""&&$2!=""' > all.iuen + cut -f1 all.iuen > $DEST/train.iu_CA-en_XX.iu_CA + cut -f2 all.iuen > $DEST/train.iu_CA-en_XX.en_XX +} + +prepare_km() { + OUTPUT_DIR=$TMP_DIR/km + mkdir -p $OUTPUT_DIR + cd $OUTPUT_DIR + + download_and_select paracrawl "http://data.statmt.org/wmt20/translation-task/ps-km/wmt20-sent.en-km.xz" "unxz wmt20-sent.en-km.zx" km wmt20-sent.en-km 2 1 & + + # km-parallel has multiple sets, concat all of them together + mkdir -p opus + cd opus + wget -nc "http://data.statmt.org/wmt20/translation-task/ps-km/km-parallel.tgz" + tar -zxvf km-parallel.tgz + find ./km-parallel -maxdepth 1 -name "*.km" | sort -V | xargs cat > opus.km + find ./km-parallel -maxdepth 1 -name "*.en" | sort -V | xargs cat > opus.en + cd .. + ln -sf opus/opus.km . + ln -sf opus/opus.en . + + wait + + # remove previous results + rm -f all.?? + find ./ -maxdepth 1 -name "*.km" | sort -V | xargs cat > all.km + find ./ -maxdepth 1 -name "*.en" | sort -V | xargs cat > all.en + lid_filter km all.km $DEST/train.km_KH-en_XX.km_KH en all.en $DEST/train.km_KH-en_XX.en_XX +} + +prepare_ps() { + OUTPUT_DIR=$TMP_DIR/ps + mkdir -p $OUTPUT_DIR + cd $OUTPUT_DIR + + download_and_select paracrawl "http://data.statmt.org/wmt20/translation-task/ps-km/wmt20-sent.en-ps.xz" "unxz wmt20-sent.en-ps.xz" ps wmt20-sent.en-ps 2 1 & + download_and_select wikititles "http://data.statmt.org/wikititles/v2/wikititles-v2.ps-en.tsv.gz" "gunzip -f wikititles-v2.ps-en.tsv.gz" ps wikititles-v2.ps-en.tsv 1 2 & + # ps-parallel has multiple sets, concat all of them together + mkdir -p opus + cd opus + wget -nc "http://data.statmt.org/wmt20/translation-task/ps-km/ps-parallel.tgz" + tar -zxvf ps-parallel.tgz + find ./ps-parallel -maxdepth 1 -name "*.ps" | sort -V | xargs cat > opus.ps + find ./ps-parallel -maxdepth 1 -name "*.en" | sort -V | xargs cat > opus.en + cd .. + ln -sf opus/opus.ps opus.ps + ln -sf opus/opus.en opus.en + + wait + + # remove previous results + rm -f all.?? + find ./ -maxdepth 1 -name "*.ps" | sort -V | xargs cat > all.ps + find ./ -maxdepth 1 -name "*.en" | sort -V | xargs cat > all.en + lid_filter ps all.ps $DEST/train.ps_AF-en_XX.ps_AF en all.en $DEST/train.ps_AF-en_XX.en_XX +} + +download_commoncrawl() { + mkdir -p $COMMONCRAWL_DIR + cd $COMMONCRAWL_DIR + + wget -nc "http://www.statmt.org/wmt13/training-parallel-commoncrawl.tgz" + tar -zxvf training-parallel-commoncrawl.tgz +} +link_commoncrawl() { + LANG=$1 + ln -sf $COMMONCRAWL_DIR/commoncrawl.$LANG-en.en commoncrawl.en + ln -sf $COMMONCRAWL_DIR/commoncrawl.$LANG-en.$LANG commoncrawl.$LANG +} + +strip_xlf() { + INPUT_FILE=$1 + SRC=$2 + TGT=$3 + grep '<source xml:lang=' $INPUT_FILE | sed 's/^<[^<>]*>//g' | sed 's/<[^<>]*>$//g' > $INPUT_FILE.$SRC + grep '<target xml:lang=' $INPUT_FILE | sed 's/^<[^<>]*>//g' | sed 's/<[^<>]*>$//g' > $INPUT_FILE.$TGT +} + +download_and_process_tilde() { + URL=$1 + UNCOMPRESS_CMD=$2 + FILENAME=$3 + LANG=$4 + PROCESS_CMD=$5 + + mkdir -p tilde + cd tilde + wget -nc $URL + $UNCOMPRESS_CMD + echo "executing cmd" + echo $PROCESS_CMD + $PROCESS_CMD + cd .. 
+ ln -sf tilde/$FILENAME.$LANG tilde.$LANG + ln -sf tilde/$FILENAME.en tilde.en +} + +prepare_cs() { + OUTPUT_DIR=$TMP_DIR/cs + mkdir -p $OUTPUT_DIR + cd $OUTPUT_DIR + + #download_and_select europarl "http://www.statmt.org/europarl/v10/training/europarl-v10.cs-en.tsv.gz" "gunzip europarl-v10.cs-en.tsv.gz" cs europarl-v10.cs-en.tsv 1 2 & + #download_and_select paracrawl "https://s3.amazonaws.com/web-language-models/paracrawl/release5.1/en-cs.txt.gz" "gunzip en-cs.txt.gz" cs en-cs.txt 2 1 & + #link_commoncrawl cs + #download_and_select newscommentary "http://data.statmt.org/news-commentary/v15/training/news-commentary-v15.cs-en.tsv.gz" "gunzip news-commentary-v15.cs-en.tsv.gz" cs news-commentary-v15.cs-en.tsv 1 2 & + #download_and_select wikititles "http://data.statmt.org/wikititles/v2/wikititles-v2.cs-en.tsv.gz" "gunzip wikititles-v2.cs-en.tsv.gz" cs wikititles-v2.cs-en.tsv 1 2 & + #download_and_process_tilde "http://data.statmt.org/wmt20/translation-task/rapid/RAPID_2019.cs-en.xlf.gz" "gunzip RAPID_2019.cs-en.xlf.gz" RAPID_2019.cs-en.xlf cs "strip_xlf RAPID_2019.cs-en.xlf cs en" & + #download_and_select wikimatrix "http://data.statmt.org/wmt20/translation-task/WikiMatrix/WikiMatrix.v1.cs-en.langid.tsv.gz" "gunzip WikiMatrix.v1.cs-en.langid.tsv.gz" cs WikiMatrix.v1.cs-en.langid.tsv 2 3 & + + #wait + + # remove previous results + #rm -f all.?? + #find ./ -maxdepth 1 -name "*.cs" | sort -V | xargs cat > all.cs + #find ./ -maxdepth 1 -name "*.en" | sort -V | xargs cat > all.en + if [ -z $CZENG_CORPUS ] ; + then + echo "Please download CZENG_CORPUS manually and place them at $CZENG_CORPUS. Exitting..." + exit + fi + cat $CZENG_CORPUS | sed '/^$/d' | cut -f5 > all.cs + cat $CZENG_CORPUS | sed '/^$/d' | cut -f6 > all.en + + lid_filter cs all.cs $DEST/train.cs_CZ-en_XX.cs_CZ en all.en $DEST/train.cs_CZ-en_XX.en_XX +} + +prepare_de() { + OUTPUT_DIR=$TMP_DIR/de + mkdir -p $OUTPUT_DIR + cd $OUTPUT_DIR + + download_and_select europarl "http://www.statmt.org/europarl/v10/training/europarl-v10.de-en.tsv.gz" "gunzip europarl-v10.de-en.tsv.gz" de europarl-v10.de-en.tsv 1 2 & + download_and_select paracrawl "https://s3.amazonaws.com/web-language-models/paracrawl/release5.1/en-de.txt.gz" "gunzip en-de.txt.gz" de en-de.txt 2 1 & + link_commoncrawl de + download_and_select newscommentary "http://data.statmt.org/news-commentary/v15/training/news-commentary-v15.de-en.tsv.gz" "gunzip news-commentary-v15.de-en.tsv.gz" de news-commentary-v15.de-en.tsv 1 2 & + download_and_select wikititles "http://data.statmt.org/wikititles/v2/wikititles-v2.de-en.tsv.gz" "gunzip wikititles-v2.de-en.tsv.gz" de wikititles-v2.de-en.tsv 1 2 & + download_and_process_tilde "http://data.statmt.org/wmt20/translation-task/rapid/RAPID_2019.de-en.xlf.gz" "gunzip RAPID_2019.de-en.xlf.gz" RAPID_2019.de-en.xlf de "strip_xlf RAPID_2019.de-en.xlf de en" & + download_and_select wikimatrix "http://data.statmt.org/wmt20/translation-task/WikiMatrix/WikiMatrix.v1.de-en.langid.tsv.gz" "gunzip WikiMatrix.v1.de-en.langid.tsv.gz" de WikiMatrix.v1.de-en.langid.tsv 2 3 & + + wait + + # remove previous results + rm -f all.?? 
+ find ./ -maxdepth 1 -name "*.de" | sort -V | xargs cat > all.de + find ./ -maxdepth 1 -name "*.en" | sort -V | xargs cat > all.en + lid_filter de all.de $DEST/train.de_DE-en_XX.de_DE en all.en $DEST/train.de_DE-en_XX.en_XX +} + +prepare_tmx() { + TMX_FILE=$1 + git clone https://github.com/amake/TMX2Corpus $UTILS/tmx2corpus + pip install tinysegmenter + + python $UTILS/tmx2corpus/tmx2corpus.py $TMX_FILE +} + +prepare_pl() { + OUTPUT_DIR=$TMP_DIR/pl + mkdir -p $OUTPUT_DIR + cd $OUTPUT_DIR + + # download_and_select europarl "http://www.statmt.org/europarl/v10/training/europarl-v10.pl-en.tsv.gz" "gunzip europarl-v10.pl-en.tsv.gz" pl europarl-v10.pl-en.tsv 1 2 & + # download_and_select paracrawl "https://s3.amazonaws.com/web-language-models/paracrawl/release5.1/en-pl.txt.gz" "gunzip en-pl.txt.gz" pl en-pl.txt 2 1 & + # download_and_select wikititles "http://data.statmt.org/wikititles/v2/wikititles-v2.pl-en.tsv.gz" "gunzip wikititles-v2.pl-en.tsv.gz" pl wikititles-v2.pl-en.tsv 1 2 & + download_and_select tilde "https://tilde-model.s3-eu-west-1.amazonaws.com/rapid2019.en-pl.tmx.zip" "gunzip rapid2019.en-pl.tmx.zip" bitext pl "prepare_tmx RAPID_2019.UNIQUE.en-pl.tmx" & + # download_and_select wikimatrix "http://data.statmt.org/wmt20/translation-task/WikiMatrix/WikiMatrix.v1.en-pl.langid.tsv.gz" "gunzip WikiMatrix.v1.en-pl.langid.tsv.gz" pl WikiMatrix.v1.en-pl.langid.tsv 3 2 & + + wait + + # remove previous results + rm -f all.?? + find ./ -maxdepth 1 -name "*.pl" | sort -V | xargs cat > all.pl + find ./ -maxdepth 1 -name "*.en" | sort -V | xargs cat > all.en + lid_filter pl all.pl $DEST/train.pl_PL-en_XX.pl_PL en all.en $DEST/train.pl_PL-en_XX.en_XX +} + +prepare_uncorpus() { + $URLS=$1 + $FILES=$2 + + mkdir -p uncorpus + cd uncorpus + + for URL in $URLS; do + wget -nc $URL + done + cat $FILES > uncorpus.tar.gz + tar -zxvf uncorpus.tar.gz + + cd .. + ln -sf uncorpus/en-$LANG/UNv1.0.en-$LANG.$LANG uncorpus.$LANG + ln -sf uncorpus/en-$LANG/UNv1.0.en-$LANG.en uncorpus.en +} + +prepare_yandex() { + mkdir -p yandex + cd yandex + unzip $YANDEX_CORPUS ./ + cd .. + ln -s yandex/corpus.en_ru.1m.en yandex.en + ln -s yandex/corpus.en_ru.1m.ru yandex.ru +} + +prepare_ru() { + OUTPUT_DIR=$TMP_DIR/ru + mkdir -p $OUTPUT_DIR + cd $OUTPUT_DIR + + download_and_select paracrawl "https://s3.amazonaws.com/web-language-models/paracrawl/release1/paracrawl-release1.en-ru.zipporah0-dedup-clean.tgz" "tar -zxvf paracrawl-release1.en-ru.zipporah0-dedup-clean.tgz" ru paracrawl-release1.en-ru.zipporah0-dedup-clean & + link_commoncrawl ru + download_and_select newscommentary "http://data.statmt.org/news-commentary/v15/training/news-commentary-v15.en-ru.tsv.gz" "gunzip news-commentary-v15.en-ru.tsv.gz" ru news-commentary-v15.en-ru.tsv 2 1 & + prepare_yandex & + download_and_select wikititles "http://data.statmt.org/wikititles/v2/wikititles-v2.ru-en.tsv.gz" "gunzip wikititles-v2.ru-en.tsv.gz" ru wikititles-v2.ru-en.tsv 1 2 & + prepare_uncorpus "https://stuncorpusprod.blob.core.windows.net/corpusfiles/UNv1.0.en-ru.tar.gz.00 https://stuncorpusprod.blob.core.windows.net/corpusfiles/UNv1.0.en-ru.tar.gz.01 https://stuncorpusprod.blob.core.windows.net/corpusfiles/UNv1.0.en-ru.tar.gz.02" "UNv1.0.en-ru.tar.gz.00 UNv1.0.en-ru.tar.gz.01 UNv1.0.en-ru.tar.gz.02" & + download_and_select wikimatrix "http://data.statmt.org/wmt20/translation-task/WikiMatrix/WikiMatrix.v1.en-ru.langid.tsv.gz" "gunzip WikiMatrix.v1.en-ru.langid.tsv.gz" ru WikiMatrix.v1.en-ru.langid.tsv 3 2 & + + wait + + # remove previous results + rm -f all.?? 
+ find ./ -maxdepth 1 -name "*.ru" | sort -V | xargs cat > all.ru + find ./ -maxdepth 1 -name "*.en" | sort -V | xargs cat > all.en + lid_filter ru all.ru $DEST/train.ru_RU-en_XX.ru_RU en all.en $DEST/train.ru_RU-en_XX.en_XX +} + +prepare_ccmt() { + mkdir -p ccmt + cd ccmt + # assume ccmt data is already unzipped under CCMT_DIR folder + cat $CCMT_DIR/datum2017/Book*_cn.txt | sed 's/ //g' > datum2017.detok.zh + cat $CCMT_DIR/datum2017/Book*_en.txt > datum2017.detok.en + cat $CCMT_DIR/casict2011/casict-A_ch.txt $CCMT_DIR/casict2011/casict-B_ch.txt $CCMT_DIR/casict2015/casict2015_ch.txt $CCMT_DIR/datum2015/datum_ch.txt $CCMT_DIR/neu2017/NEU_cn.txt datum2017.detok.zh > ccmt.zh + cat $CCMT_DIR/casict2011/casict-A_en.txt $CCMT_DIR/casict2011/casict-B_en.txt $CCMT_DIR/casict2015/casict2015_en.txt $CCMT_DIR/datum2015/datum_en.txt $CCMT_DIR/neu2017/NEU_en.txt datum2017.detok.en > ccmt.en + cd .. + ln -sf ccmt/ccmt.zh ccmt.zh + ln -sf ccmt/ccmt.en ccmt.en +} + +prepare_zh() { + OUTPUT_DIR=$TMP_DIR/zh + mkdir -p $OUTPUT_DIR + cd $OUTPUT_DIR + + download_and_select newscommentary "http://data.statmt.org/news-commentary/v15/training/news-commentary-v15.en-zh.tsv.gz" "gunzip news-commentary-v15.en-zh.tsv.gz" zh news-commentary-v15.en-zh.tsv 2 1 & + download_and_select wikititles "http://data.statmt.org/wikititles/v2/wikititles-v2.zh-en.tsv.gz" "gunzip wikititles-v2.zh-en.tsv.gz" zh wikititles-v2.zh-en.tsv 1 2 & + prepare_uncorpus "https://stuncorpusprod.blob.core.windows.net/corpusfiles/UNv1.0.en-zh.tar.gz.00 https://stuncorpusprod.blob.core.windows.net/corpusfiles/UNv1.0.en-zh.tar.gz.01" "UNv1.0.en-zh.tar.gz.00 UNv1.0.en-zh.tar.gz.01" & + prepare_ccmt & + download_and_select wikimatrix "http://data.statmt.org/wmt20/translation-task/WikiMatrix/WikiMatrix.v1.en-zh.langid.tsv.gz" "gunzip WikiMatrix.v1.en-zh.langid.tsv.gz" zh WikiMatrix.v1.en-zh.langid.tsv 3 2 & + + wait + + # remove previous results + rm -f all.?? 
+ find ./ -maxdepth 1 -name "*.zh" | sort -V | xargs cat > all.zh + find ./ -maxdepth 1 -name "*.en" | sort -V | xargs cat > all.en + lid_filter zh all.zh $DEST/train.zh_CN-en_XX.zh_CN en all.en $DEST/train.zh_CN-en_XX.en_XX +} + +prepare_tests() { + OUTPUT_DIR=$TMP_DIR + mkdir -p $OUTPUT_DIR + cd $OUTPUT_DIR + wget -nc http://data.statmt.org/wmt20/translation-task/dev.tgz + tar -zxvf dev.tgz + cd dev + + cat newsdev2020-jaen-src.ja.sgm | $UTILS/strip_sgm.sh > newsdev2020-jaen.ja + cat newsdev2020-jaen-ref.en.sgm | $UTILS/strip_sgm.sh > newsdev2020-jaen.en + split newsdev2020-jaen.ja -a 0 -n r/1/2 > $DEST/valid.ja_XX-en_XX.ja_XX + split newsdev2020-jaen.en -a 0 -n r/1/2 > $DEST/valid.ja_XX-en_XX.en_XX + split newsdev2020-jaen.ja -a 0 -n r/2/2 > $DEST/test.ja_XX-en_XX.ja_XX + split newsdev2020-jaen.en -a 0 -n r/2/2 > $DEST/test.ja_XX-en_XX.en_XX + + cat newsdev2020-iuen-src.iu.sgm | strip_sgm.sh > newsdev2020-iuen.iu + cat newsdev2020-iuen-ref.en.sgm | strip_sgm.sh > newsdev2020-iuen.en + split newsdev2020-iuen.iu -a 0 -n r/1/2 > $DEST/valid.iu_CA-en_XX.iu_CA + split newsdev2020-iuen.en -a 0 -n r/1/2 > $DEST/valid.iu_CA-en_XX.en_XX + split newsdev2020-iuen.iu -a 0 -n r/2/2 > $DEST/test.iu_CA-en_XX.iu_CA + split newsdev2020-iuen.en -a 0 -n r/2/2 > $DEST/test.iu_CA-en_XX.en_XX + + cat newsdev2020-taen-src.ta.sgm | strip_sgm.sh > newsdev2020-taen.ta + cat newsdev2020-taen-ref.en.sgm | strip_sgm.sh > newsdev2020-taen.en + split newsdev2020-taen.ta -a 0 -n r/1/2 > $DEST/valid.ta_IN-en_XX.ta_IN + split newsdev2020-taen.en -a 0 -n r/1/2 > $DEST/valid.ta_IN-en_XX.en_XX + split newsdev2020-taen.ta -a 0 -n r/2/2 > $DEST/test.ta_IN-en_XX.ta_IN + split newsdev2020-taen.en -a 0 -n r/2/2 > $DEST/test.ta_IN-en_XX.en_XX + + cp wikipedia.dev.km-en.km $DEST/valid.km_KH-en_XX.km_KH + cp wikipedia.dev.km-en.en $DEST/valid.km_KH-en_XX.en_XX + cp wikipedia.devtest.km-en.km $DEST/test.km_KH-en_XX.km_KH + cp wikipedia.devtest.km-en.en $DEST/test.km_KH-en_XX.en_XX + + cp wikipedia.dev.ps-en.ps $DEST/valid.ps_AF-en_XX.ps_AF + cp wikipedia.dev.ps-en.en $DEST/valid.ps_AF-en_XX.en_XX + cp wikipedia.devtest.ps-en.ps $DEST/test.ps_AF-en_XX.ps_AF + cp wikipedia.devtest.ps-en.en $DEST/test.ps_AF-en_XX.en_XX + + cat newsdev2020-plen-src.pl.sgm | strip_sgm.sh > newsdev2020-plen.pl + cat newsdev2020-plen-ref.en.sgm | strip_sgm.sh > newsdev2020-plen.en + split newsdev2020-plen.pl -a 0 -n r/1/2 > $DEST/valid.pl_PL-en_XX.pl_PL + split newsdev2020-plen.en -a 0 -n r/1/2 > $DEST/valid.pl_PL-en_XX.en_XX + split newsdev2020-plen.pl -a 0 -n r/2/2 > $DEST/test.pl_PL-en_XX.pl_PL + split newsdev2020-plen.en -a 0 -n r/2/2 > $DEST/test.pl_PL-en_XX.en_XX + + cat newstest2018-encs-src.en.sgm | strip_sgm.sh > $DEST/valid.en_XX-cs_CZ.en_XX + cat newstest2018-encs-ref.cs.sgm | strip_sgm.sh > $DEST/valid.en_XX-cs_CZ.cs_CZ + cat newstest2019-encs-src.en.sgm | strip_sgm.sh > $DEST/test.en_XX-cs_CZ.en_XX + cat newstest2019-encs-ref.cs.sgm | strip_sgm.sh > $DEST/test.en_XX-cs_CZ.cs_CZ + + cat newstest2018-deen-src.de.sgm | strip_sgm.sh > $DEST/valid.de_DE-en_XX.de_DE + cat newstest2018-deen-ref.en.sgm | strip_sgm.sh > $DEST/valid.de_DE-en_XX.en_XX + cat newstest2018-ende-src.en.sgm | strip_sgm.sh > $DEST/valid.en_XX-de_DE.en_XX + cat newstest2018-ende-ref.de.sgm | strip_sgm.sh > $DEST/valid.en_XX-de_DE.de_DE + cat newstest2019-deen-src.de.sgm | strip_sgm.sh > $DEST/test.de_DE-en_XX.de_DE + cat newstest2019-deen-ref.en.sgm | strip_sgm.sh > $DEST/test.de_DE-en_XX.en_XX + cat newstest2019-ende-src.en.sgm | strip_sgm.sh > $DEST/test.en_XX-de_DE.en_XX + 
cat newstest2019-ende-ref.de.sgm | strip_sgm.sh > $DEST/test.en_XX-de_DE.de_DE + + cat newstest2018-ruen-src.ru.sgm | strip_sgm.sh > $DEST/valid.ru_RU-en_XX.ru_RU + cat newstest2018-ruen-ref.en.sgm | strip_sgm.sh > $DEST/valid.ru_RU-en_XX.en_XX + cat newstest2018-enru-src.en.sgm | strip_sgm.sh > $DEST/valid.en_XX-ru_RU.en_XX + cat newstest2018-enru-ref.ru.sgm | strip_sgm.sh > $DEST/valid.en_XX-ru_RU.ru_RU + cat newstest2019-ruen-src.ru.sgm | strip_sgm.sh > $DEST/test.ru_RU-en_XX.ru_RU + cat newstest2019-ruen-ref.en.sgm | strip_sgm.sh > $DEST/test.ru_RU-en_XX.en_XX + cat newstest2019-enru-src.en.sgm | strip_sgm.sh > $DEST/test.en_XX-ru_RU.en_XX + cat newstest2019-enru-ref.ru.sgm | strip_sgm.sh > $DEST/test.en_XX-ru_RU.ru_RU + + cat newstest2018-zhen-src.zh.sgm | strip_sgm.sh > $DEST/valid.zh_CN-en_XX.zh_CN + cat newstest2018-zhen-ref.en.sgm | strip_sgm.sh > $DEST/valid.zh_CN-en_XX.en_XX + cat newstest2018-enzh-src.en.sgm | strip_sgm.sh > $DEST/valid.en_XX-zh_CN.en_XX + cat newstest2018-enzh-ref.zh.sgm | strip_sgm.sh > $DEST/valid.en_XX-zh_CN.zh_CN + cat newstest2019-zhen-src.zh.sgm | strip_sgm.sh > $DEST/test.zh_CN-en_XX.zh_CN + cat newstest2019-zhen-ref.en.sgm | strip_sgm.sh > $DEST/test.zh_CN-en_XX.en_XX + cat newstest2019-enzh-src.en.sgm | strip_sgm.sh > $DEST/test.en_XX-zh_CN.en_XX + cat newstest2019-enzh-ref.zh.sgm | strip_sgm.sh > $DEST/test.en_XX-zh_CN.zh_CN +} + +mkdir -p $DEST + +prepare_lid +prepare_moses +download_commoncrawl + +prepare_ja & +prepare_ta & +prepare_km & +prepare_ps & +prepare_iu & +prepare_cs & +prepare_de & +prepare_pl & +prepare_ru & +prepare_zh & + +# prepare valid/test set +prepare_tests & + +# wait + +# TODO remove intermediate files +# rm -rf $TMP_DIR diff --git a/SpeechT5/fairseq/examples/multilingual/data_scripts/preprocess_ML50_v1.sh b/SpeechT5/fairseq/examples/multilingual/data_scripts/preprocess_ML50_v1.sh new file mode 100644 index 0000000000000000000000000000000000000000..4655936149cab212b3cfa14f306d71153729f9d7 --- /dev/null +++ b/SpeechT5/fairseq/examples/multilingual/data_scripts/preprocess_ML50_v1.sh @@ -0,0 +1,27 @@ +#!/bin/bash +# Copyright (c) Facebook, Inc. and its affiliates. +# All rights reserved. +# +# This source code is licensed under the license found in the +# LICENSE file in the root directory of this source tree. + +if [ -z $WORKDIR_ROOT ] ; +then + echo "please specify your working directory root in environment variable WORKDIR_ROOT. Exitting..." + exit +fi + +if [ -z $SPM_PATH ] ; +then + echo "Please install sentence piecence from https://github.com/google/sentencepiece and set SPM_PATH pointing to the installed spm_encode.py. Exitting..." 
+ exit +fi + +ML50=${WORKDIR_ROOT}/ML50 + +mkdir -p $ML50/dedup +mkdir -p $ML50/cleaned_dedup + +python ./dedup_all.py --from-folder $ML50/raw --to-folder $ML50/dedup +python ./remove_valid_test_in_train.py --from-folder $ML50/dedup --to-folder $ML50/clean +python ./binarize.py --raw-folder $ML50/clean \ No newline at end of file diff --git a/SpeechT5/fairseq/examples/multilingual/data_scripts/remove_valid_test_in_train.py b/SpeechT5/fairseq/examples/multilingual/data_scripts/remove_valid_test_in_train.py new file mode 100644 index 0000000000000000000000000000000000000000..ef618adef7c7d010f8de38fb5ebeb5a35d2d3cac --- /dev/null +++ b/SpeechT5/fairseq/examples/multilingual/data_scripts/remove_valid_test_in_train.py @@ -0,0 +1,290 @@ +import os, sys +import glob, itertools +import pandas as pd + +WORKDIR_ROOT = os.environ.get('WORKDIR_ROOT', None) + +if WORKDIR_ROOT is None or not WORKDIR_ROOT.strip(): + print('please specify your working directory root in OS environment variable WORKDIR_ROOT. Exitting..."') + sys.exit(-1) + + +def load_langs(path): + with open(path) as fr: + langs = [l.strip() for l in fr] + return langs + + + +def load_sentences(raw_data, split, direction): + src, tgt = direction.split('-') + src_path = f"{raw_data}/{split}.{direction}.{src}" + tgt_path = f"{raw_data}/{split}.{direction}.{tgt}" + if os.path.exists(src_path) and os.path.exists(tgt_path): + return [(src, open(src_path).read().splitlines()), (tgt, open(tgt_path).read().splitlines())] + else: + return [] + +def swap_direction(d): + src, tgt = d.split('-') + return f'{tgt}-{src}' + +def get_all_test_data(raw_data, directions, split='test'): + test_data = [ + x + for dd in directions + for d in [dd, swap_direction(dd)] + for x in load_sentences(raw_data, split, d) + ] + # all_test_data = {s for _, d in test_data for s in d} + all_test_data = {} + for lang, d in test_data: + for s in d: + s = s.strip() + lgs = all_test_data.get(s, set()) + lgs.add(lang) + all_test_data[s] = lgs + return all_test_data, test_data + +def check_train_sentences(raw_data, direction, all_test_data, mess_up_train={}): + src, tgt = direction.split('-') + tgt_path = f"{raw_data}/train.{direction}.{tgt}" + src_path = f"{raw_data}/train.{direction}.{src}" + print(f'check training data in {raw_data}/train.{direction}') + size = 0 + if not os.path.exists(tgt_path) or not os.path.exists(src_path): + return mess_up_train, size + with open(src_path) as f, open(tgt_path) as g: + for src_line, tgt_line in zip(f, g): + s = src_line.strip() + t = tgt_line.strip() + size += 1 + if s in all_test_data: + langs = mess_up_train.get(s, set()) + langs.add(direction) + mess_up_train[s] = langs + if t in all_test_data: + langs = mess_up_train.get(t, set()) + langs.add(direction) + mess_up_train[t] = langs + return mess_up_train, size + +def check_train_all(raw_data, directions, all_test_data): + mess_up_train = {} + data_sizes = {} + for direction in directions: + _, size = check_train_sentences(raw_data, direction, all_test_data, mess_up_train) + data_sizes[direction] = size + return mess_up_train, data_sizes + +def count_train_in_other_set(mess_up_train): + train_in_others = [(direction, s) for s, directions in mess_up_train.items() for direction in directions] + counts = {} + for direction, s in train_in_others: + counts[direction] = counts.get(direction, 0) + 1 + return counts + +def train_size_if_remove_in_otherset(data_sizes, mess_up_train): + counts_in_other = count_train_in_other_set(mess_up_train) + remain_sizes = [] + for direction, count in 
counts_in_other.items(): + remain_sizes.append((direction, data_sizes[direction] - count, data_sizes[direction], count, 100 * count / data_sizes[direction] )) + return remain_sizes + + +def remove_messed_up_sentences(raw_data, direction, mess_up_train, mess_up_train_pairs, corrected_langs): + split = 'train' + src_lang, tgt_lang = direction.split('-') + + tgt = f"{raw_data}/{split}.{direction}.{tgt_lang}" + src = f"{raw_data}/{split}.{direction}.{src_lang}" + print(f'working on {direction}: ', src, tgt) + if not os.path.exists(tgt) or not os.path.exists(src) : + return + + corrected_tgt = f"{to_folder}/{split}.{direction}.{tgt_lang}" + corrected_src = f"{to_folder}/{split}.{direction}.{src_lang}" + line_num = 0 + keep_num = 0 + with open(src, encoding='utf8',) as fsrc, \ + open(tgt, encoding='utf8',) as ftgt, \ + open(corrected_src, 'w', encoding='utf8') as fsrc_corrected, \ + open(corrected_tgt, 'w', encoding='utf8') as ftgt_corrected: + for s, t in zip(fsrc, ftgt): + s = s.strip() + t = t.strip() + if t not in mess_up_train \ + and s not in mess_up_train \ + and (s, t) not in mess_up_train_pairs \ + and (t, s) not in mess_up_train_pairs: + corrected_langs.add(direction) + print(s, file=fsrc_corrected) + print(t, file=ftgt_corrected) + keep_num += 1 + line_num += 1 + if line_num % 1000 == 0: + print(f'completed {line_num} lines', end='\r') + return line_num, keep_num + +########## + + +def merge_valid_test_messup(mess_up_train_valid, mess_up_train_test): + merged_mess = [] + for s in set(list(mess_up_train_valid.keys()) + list(mess_up_train_test.keys())): + if not s: + continue + valid = mess_up_train_valid.get(s, set()) + test = mess_up_train_test.get(s, set()) + merged_mess.append((s, valid | test)) + return dict(merged_mess) + + + +######### +def check_train_pairs(raw_data, direction, all_test_data, mess_up_train={}): + src, tgt = direction.split('-') + #a hack; TODO: check the reversed directions + path1 = f"{raw_data}/train.{src}-{tgt}.{src}" + path2 = f"{raw_data}/train.{src}-{tgt}.{tgt}" + if not os.path.exists(path1) or not os.path.exists(path2) : + return + + with open(path1) as f1, open(path2) as f2: + for src_line, tgt_line in zip(f1, f2): + s = src_line.strip() + t = tgt_line.strip() + if (s, t) in all_test_data or (t, s) in all_test_data: + langs = mess_up_train.get( (s, t), set()) + langs.add(src) + langs.add(tgt) + mess_up_train[(s, t)] = langs + + +def load_pairs(raw_data, split, direction): + src, tgt = direction.split('-') + src_f = f"{raw_data}/{split}.{direction}.{src}" + tgt_f = f"{raw_data}/{split}.{direction}.{tgt}" + if tgt != 'en_XX': + src_f, tgt_f = tgt_f, src_f + if os.path.exists(src_f) and os.path.exists(tgt_f): + return list(zip(open(src_f).read().splitlines(), + open(tgt_f).read().splitlines(), + )) + else: + return [] + +# skip_langs = ['cs_CZ', 'en_XX', 'tl_XX', 'tr_TR'] +def get_messed_up_test_pairs(split, directions): + test_pairs = [ + (d, load_pairs(raw_data, split, d)) + for d in directions + ] + # all_test_data = {s for _, d in test_data for s in d} + all_test_pairs = {} + for direction, d in test_pairs: + src, tgt = direction.split('-') + for s in d: + langs = all_test_pairs.get(s, set()) + langs.add(src) + langs.add(tgt) + all_test_pairs[s] = langs + mess_up_train_pairs = {} + for direction in directions: + check_train_pairs(raw_data, direction, all_test_pairs, mess_up_train_pairs) + return all_test_pairs, mess_up_train_pairs + + + +if __name__ == "__main__": + ####### + import argparse + parser = argparse.ArgumentParser() + parser.add_argument( + 
'--from-folder', + required=True, + type=str) + parser.add_argument( + '--to-folder', + required=True, + type=str) + parser.add_argument( + '--directions', + default=None, + type=str) + + + args = parser.parse_args() + raw_data = args.from_folder + to_folder = args.to_folder + os.makedirs(to_folder, exist_ok=True) + + if args.directions: + directions = args.directions.split(',') + else: + raw_files = itertools.chain( + glob.glob(f'{raw_data}/train*'), + glob.glob(f'{raw_data}/valid*'), + glob.glob(f'{raw_data}/test*'), + ) + directions = [os.path.split(file_path)[-1].split('.')[1] for file_path in raw_files] + print('working on directions: ', directions) + + ########## + + + + all_test_data, test_data = get_all_test_data(raw_data, directions, 'test') + print('==loaded test data==') + all_valid_data, valid_data = get_all_test_data(raw_data, directions, 'valid') + print('==loaded valid data==') + all_valid_test_data = merge_valid_test_messup(all_test_data, all_valid_data) + mess_up_train, data_sizes = check_train_all(raw_data, directions, all_valid_test_data) + print('training messing up with valid, test data:', len(mess_up_train)) + data_situation = train_size_if_remove_in_otherset(data_sizes, mess_up_train) + df = pd.DataFrame(data_situation, columns=['direction', 'train_size_after_remove', 'orig_size', 'num_to_remove', 'remove_percent']) + df.sort_values('remove_percent', ascending=False) + df.to_csv(f'{raw_data}/clean_summary.tsv', sep='\t') + print(f'projected data clean summary in: {raw_data}/clean_summary.tsv') + + # correct the dataset: + all_test_pairs, mess_up_test_train_pairs = get_messed_up_test_pairs('test', directions) + all_valid_pairs, mess_up_valid_train_pairs = get_messed_up_test_pairs('valid', directions) + + all_messed_pairs = set(mess_up_test_train_pairs.keys()).union(set(mess_up_valid_train_pairs.keys())) + corrected_directions = set() + + real_data_situation = [] + for direction in directions: + org_size, new_size = remove_messed_up_sentences(raw_data, direction, mess_up_train, all_messed_pairs, corrected_directions) + if org_size == 0: + print(f"{direction} has size 0") + continue + real_data_situation.append( + (direction, new_size, org_size, org_size - new_size, (org_size - new_size) / org_size * 100) + ) + print('corrected directions: ', corrected_directions) + df = pd.DataFrame(real_data_situation, columns=['direction', 'train_size_after_remove', 'orig_size', 'num_to_remove', 'remove_percent']) + df.sort_values('remove_percent', ascending=False) + df.to_csv(f'{raw_data}/actual_clean_summary.tsv', sep='\t') + print(f'actual data clean summary (which can be different from the projected one because of duplications) in: {raw_data}/actual_clean_summary.tsv') + + import shutil + for direction in directions: + src_lang, tgt_lang = direction.split('-') + for split in ['train', 'valid', 'test']: + # copying valid, test and uncorrected train + if direction in corrected_directions and split == 'train': + continue + tgt = f"{raw_data}/{split}.{direction}.{tgt_lang}" + src = f"{raw_data}/{split}.{direction}.{src_lang}" + if not (os.path.exists(src) and os.path.exists(tgt)): + continue + corrected_tgt = f"{to_folder}/{split}.{direction}.{tgt_lang}" + corrected_src = f"{to_folder}/{split}.{direction}.{src_lang}" + print(f'copying {src} to {corrected_src}') + shutil.copyfile(src, corrected_src) + print(f'copying {tgt} to {corrected_tgt}') + shutil.copyfile(tgt, corrected_tgt) + + print('completed') \ No newline at end of file diff --git 
a/SpeechT5/fairseq/examples/multilingual/data_scripts/requirement.txt b/SpeechT5/fairseq/examples/multilingual/data_scripts/requirement.txt new file mode 100644 index 0000000000000000000000000000000000000000..e85d7d540e08a1407f92dfb2311972a1a5a30123 --- /dev/null +++ b/SpeechT5/fairseq/examples/multilingual/data_scripts/requirement.txt @@ -0,0 +1,2 @@
+wget
+pandas \ No newline at end of file diff --git a/SpeechT5/fairseq/examples/multilingual/data_scripts/utils/dedup.py b/SpeechT5/fairseq/examples/multilingual/data_scripts/utils/dedup.py new file mode 100644 index 0000000000000000000000000000000000000000..d6fed8c695cf218d3502d6ed8d23015520c0e179 --- /dev/null +++ b/SpeechT5/fairseq/examples/multilingual/data_scripts/utils/dedup.py @@ -0,0 +1,41 @@
+# Copyright (c) Facebook, Inc. and its affiliates.
+#
+# This source code is licensed under the MIT license found in the
+# LICENSE file in the root directory of this source tree.
+
+
+import argparse
+
+def deup(src_file, tgt_file, src_file_out, tgt_file_out):
+ seen = set()
+ dup_count = 0
+ with open(src_file, encoding='utf-8') as fsrc, \
+ open(tgt_file, encoding='utf-8') as ftgt, \
+ open(src_file_out, 'w', encoding='utf-8') as fsrc_out, \
+ open(tgt_file_out, 'w', encoding='utf-8') as ftgt_out:
+ for s, t in zip(fsrc, ftgt):
+ if (s, t) not in seen:
+ fsrc_out.write(s)
+ ftgt_out.write(t)
+ seen.add((s, t))
+ else:
+ dup_count += 1
+ print(f'number of duplications: {dup_count}')
+
+
+def main():
+ parser = argparse.ArgumentParser()
+ parser.add_argument("--src-file", type=str, required=True,
+ help="src file")
+ parser.add_argument("--tgt-file", type=str, required=True,
+ help="tgt file")
+ parser.add_argument("--src-file-out", type=str, required=True,
+ help="src output file")
+ parser.add_argument("--tgt-file-out", type=str, required=True,
+ help="tgt output file")
+ args = parser.parse_args()
+ deup(args.src_file, args.tgt_file, args.src_file_out, args.tgt_file_out)
+
+
+if __name__ == "__main__":
+ main() diff --git a/SpeechT5/fairseq/examples/multilingual/data_scripts/utils/fasttext_multi_filter.py b/SpeechT5/fairseq/examples/multilingual/data_scripts/utils/fasttext_multi_filter.py new file mode 100644 index 0000000000000000000000000000000000000000..41b38ba5bef20cb043921ac61820db8689189a5a --- /dev/null +++ b/SpeechT5/fairseq/examples/multilingual/data_scripts/utils/fasttext_multi_filter.py @@ -0,0 +1,63 @@
+# Copyright (c) Facebook, Inc. and its affiliates.
+#
+# This source code is licensed under the MIT license found in the
+# LICENSE file in the root directory of this source tree.
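+#
+# Multiprocess fastText language-ID filter used by lid_filter in the data
+# preparation script above: it reads the parallel --inputs line by line, predicts
+# each line's language with the supplied lid.176.bin model, and drops an entire
+# sentence tuple whenever any prediction disagrees with the expected --langs.
+# Illustrative invocation (file names only as examples, mirroring lid_filter):
+#   python fasttext_multi_filter.py --model lid.176.bin \
+#     --inputs all.de all.en --langs de en \
+#     --outputs train.de_DE-en_XX.de_DE train.de_DE-en_XX.en_XX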
+ + +#!/bin/python + +import fasttext +from multiprocessing import Pool +import contextlib +import sys +import argparse +from functools import partial +import io + +model = None +def init(model_path): + global model + model = fasttext.load_model(model_path) + +def pred(lines): + return lines, [model.predict(line.strip())[0][0][9:] for line in lines] + +def main(): + parser = argparse.ArgumentParser() + parser.add_argument("--model", type=str, required=True, + help="model to load") + parser.add_argument("--inputs", nargs="+", default=['-'], + help="input files to filter") + parser.add_argument("--langs", nargs="+", required=True, + help="lang ids of each input file") + parser.add_argument("--outputs", nargs="+", default=['-'], + help="path to save lid filtered outputs") + parser.add_argument("--num-workers", type=int, metavar="N", default=10, + help="number of processes in parallel") + args = parser.parse_args() + + assert len(args.inputs) == len(args.langs) and len(args.inputs) == len(args.outputs) + + with contextlib.ExitStack() as stack: + inputs = [ + stack.enter_context(open(input, "r", encoding="utf-8", newline="\n", errors="replace")) + if input != "-" else io.TextIOWrapper(sys.stdin.buffer, encoding='utf-8', errors="replace") + for input in args.inputs + ] + outputs = [ + stack.enter_context(open(output, "w", encoding="utf-8", newline="\n")) + if output != "-" else sys.stdout + for output in args.outputs + ] + with Pool(args.num_workers, initializer=partial(init, args.model)) as p: + skip_cnt = 0 + for lines, preds in p.imap(pred, list(zip(*inputs)), chunksize=500): + if not all(a == b for a, b in zip(preds, args.langs)): + skip_cnt += 1 + continue + for line, output_h in zip(lines, outputs): + print(line.strip(), file=output_h) + print(f"Skipped {skip_cnt} lines.") + +if __name__ == "__main__": + main() diff --git a/SpeechT5/fairseq/examples/multilingual/data_scripts/utils/strip_sgm.sh b/SpeechT5/fairseq/examples/multilingual/data_scripts/utils/strip_sgm.sh new file mode 100644 index 0000000000000000000000000000000000000000..7f4f61d7b1a46f51a1221de6b336cb70b5a0b8b3 --- /dev/null +++ b/SpeechT5/fairseq/examples/multilingual/data_scripts/utils/strip_sgm.sh @@ -0,0 +1 @@ +grep "seg id" | sed 's/<seg id="[0-9]\+">//g' | sed 's/<\/seg>//g' diff --git a/SpeechT5/fairseq/examples/multilingual/finetune_multilingual_model.sh b/SpeechT5/fairseq/examples/multilingual/finetune_multilingual_model.sh new file mode 100644 index 0000000000000000000000000000000000000000..25960c5dc8a02e5580b61837099770a082b4dd83 --- /dev/null +++ b/SpeechT5/fairseq/examples/multilingual/finetune_multilingual_model.sh @@ -0,0 +1,32 @@ +#!/bin/bash +# Copyright (c) Facebook, Inc. and its affiliates. +# All rights reserved. +# +# This source code is licensed under the license found in the +# LICENSE file in the root directory of this source tree. + +path_2_data=$1 # <path to data> which contains binarized data for each directions +lang_list=$2 # <path to a file which contains a list of languages separted by new lines> +lang_pairs=$3 #a list language pairs to train multilingual models, e.g. 
"en-fr,en-cs,fr-en,cs-en" +# pretrained can be an mBART pretrained model as well +pretrained_model=$4 #<path to a pretrained model> + + +fairseq-train "$path_2_data" \ + --encoder-normalize-before --decoder-normalize-before \ + --arch transformer --layernorm-embedding \ + --task translation_multi_simple_epoch \ + --finetune-from-model "$pretrained_model" \ + --sampling-method "temperature" \ + --sampling-temperature "1.5" \ + --encoder-langtok "src" \ + --decoder-langtok \ + --lang-dict "$lang_list" \ + --lang-pairs "$lang_pairs" \ + --criterion label_smoothed_cross_entropy --label-smoothing 0.2 \ + --optimizer adam --adam-eps 1e-06 --adam-betas '(0.9, 0.98)' \ + --lr-scheduler inverse_sqrt --lr 3e-05 --warmup-updates 2500 --max-update 40000 \ + --dropout 0.3 --attention-dropout 0.1 --weight-decay 0.0 \ + --max-tokens 1024 --update-freq 2 \ + --save-interval 1 --save-interval-updates 5000 --keep-interval-updates 10 --no-epoch-checkpoints \ + --seed 222 --log-format simple --log-interval 2 diff --git a/SpeechT5/fairseq/examples/multilingual/multilingual_fairseq_gen.sh b/SpeechT5/fairseq/examples/multilingual/multilingual_fairseq_gen.sh new file mode 100644 index 0000000000000000000000000000000000000000..65aa322d7daaa428015de98abe4664a6a4164bfd --- /dev/null +++ b/SpeechT5/fairseq/examples/multilingual/multilingual_fairseq_gen.sh @@ -0,0 +1,26 @@ +#!/bin/bash +# Copyright (c) Facebook, Inc. and its affiliates. +# All rights reserved. +# +# This source code is licensed under the license found in the +# LICENSE file in the root directory of this source tree. + +lang_pairs="en-fr,en-cs,fr-en,cs-en" +path_2_data=$1 # <path to data> +lang_list=$2 # <path to a file which contains list of languages separted by new lines> +model=$3 # <path to a trained model> +source_lang=cs +target_lang=en + +fairseq-generate "$path_2_data" \ + --path "$model" \ + --task translation_multi_simple_epoch \ + --gen-subset test \ + --source-lang "$source_lang" \ + --target-lang "$target_lang" \ + --sacrebleu --remove-bpe 'sentencepiece'\ + --batch-size 32 \ + --encoder-langtok "src" \ + --decoder-langtok \ + --lang-dict "$lang_list" \ + --lang-pairs "$lang_pairs" diff --git a/SpeechT5/fairseq/examples/multilingual/train_multilingual_model.sh b/SpeechT5/fairseq/examples/multilingual/train_multilingual_model.sh new file mode 100644 index 0000000000000000000000000000000000000000..cc050bd3f02de8a2f303737f187442d2eb80e4ef --- /dev/null +++ b/SpeechT5/fairseq/examples/multilingual/train_multilingual_model.sh @@ -0,0 +1,28 @@ +#!/bin/bash +# Copyright (c) Facebook, Inc. and its affiliates. +# All rights reserved. +# +# This source code is licensed under the license found in the +# LICENSE file in the root directory of this source tree. + +path_2_data=$1 # <path to data> which contains binarized data for each directions +lang_list=$2 # <path to a file which contains a list of languages separted by new lines> +lang_pairs=$3 #a list language pairs to train multilingual models, e.g. 
"en-fr,en-cs,fr-en,cs-en" + +fairseq-train "$path_2_data" \ + --encoder-normalize-before --decoder-normalize-before \ + --arch transformer --layernorm-embedding \ + --task translation_multi_simple_epoch \ + --sampling-method "temperature" \ + --sampling-temperature 1.5 \ + --encoder-langtok "src" \ + --decoder-langtok \ + --lang-dict "$lang_list" \ + --lang-pairs "$lang_pairs" \ + --criterion label_smoothed_cross_entropy --label-smoothing 0.2 \ + --optimizer adam --adam-eps 1e-06 --adam-betas '(0.9, 0.98)' \ + --lr-scheduler inverse_sqrt --lr 3e-05 --warmup-updates 2500 --max-update 40000 \ + --dropout 0.3 --attention-dropout 0.1 --weight-decay 0.0 \ + --max-tokens 1024 --update-freq 2 \ + --save-interval 1 --save-interval-updates 5000 --keep-interval-updates 10 --no-epoch-checkpoints \ + --seed 222 --log-format simple --log-interval 2 diff --git a/SpeechT5/fairseq/examples/noisychannel/README.md b/SpeechT5/fairseq/examples/noisychannel/README.md new file mode 100644 index 0000000000000000000000000000000000000000..9d101aa874ec36ff3bb5c1166169a4c4f38ffe2b --- /dev/null +++ b/SpeechT5/fairseq/examples/noisychannel/README.md @@ -0,0 +1,72 @@ +# Simple and Effective Noisy Channel Modeling for Neural Machine Translation (Yee et al., 2019) +This page contains pointers to pre-trained models as well as instructions on how to run the reranking scripts. + +## Citation: +```bibtex +@inproceedings{yee2019simple, + title = {Simple and Effective Noisy Channel Modeling for Neural Machine Translation}, + author = {Kyra Yee and Yann Dauphin and Michael Auli}, + booktitle = {Conference on Empirical Methods in Natural Language Processing}, + year = {2019}, +} +``` + +## Pre-trained Models: + +Model | Description | Download +---|---|--- +`transformer.noisychannel.de-en` | De->En Forward Model | [download (.tar.gz)](https://dl.fbaipublicfiles.com/fairseq/models/noisychannel/forward_de2en.tar.bz2) +`transformer.noisychannel.en-de` | En->De Channel Model | [download (.tar.gz)](https://dl.fbaipublicfiles.com/fairseq/models/noisychannel/backward_en2de.tar.bz2) +`transformer_lm.noisychannel.en` | En Language model | [download (.tar.gz)](https://dl.fbaipublicfiles.com/fairseq/models/noisychannel/reranking_en_lm.tar.bz2) + +Test Data: [newstest_wmt17](https://dl.fbaipublicfiles.com/fairseq/models/noisychannel/wmt17test.tar.bz2) + +## Example usage + +``` +mkdir rerank_example +curl https://dl.fbaipublicfiles.com/fairseq/models/noisychannel/forward_de2en.tar.bz2 | tar xvjf - -C rerank_example +curl https://dl.fbaipublicfiles.com/fairseq/models/noisychannel/backward_en2de.tar.bz2 | tar xvjf - -C rerank_example +curl https://dl.fbaipublicfiles.com/fairseq/models/noisychannel/reranking_en_lm.tar.bz2 | tar xvjf - -C rerank_example +curl https://dl.fbaipublicfiles.com/fairseq/models/noisychannel/wmt17test.tar.bz2 | tar xvjf - -C rerank_example + +beam=50 +num_trials=1000 +fw_name=fw_model_ex +bw_name=bw_model_ex +lm_name=lm_ex +data_dir=rerank_example/hyphen-splitting-mixed-case-wmt17test-wmt14bpe +data_dir_name=wmt17 +lm=rerank_example/lm/checkpoint_best.pt +lm_bpe_code=rerank_example/lm/bpe32k.code +lm_dict=rerank_example/lm/dict.txt +batch_size=32 +bw=rerank_example/backward_en2de.pt +fw=rerank_example/forward_de2en.pt + +# reranking with P(T|S) P(S|T) and P(T) +python examples/noisychannel/rerank_tune.py $data_dir --tune-param lenpen weight1 weight3 \ + --lower-bound 0 0 0 --upper-bound 3 3 3 --data-dir-name $data_dir_name \ + --num-trials $num_trials --source-lang de --target-lang en --gen-model $fw \ + -n $beam 
--batch-size $batch_size --score-model2 $fw --score-model1 $bw \ + --backwards1 --weight2 1 \ + -lm $lm --lm-dict $lm_dict --lm-name en_newscrawl --lm-bpe-code $lm_bpe_code \ + --model2-name $fw_name --model1-name $bw_name --gen-model-name $fw_name + +# reranking with P(T|S) and P(T) +python examples/noisychannel/rerank_tune.py $data_dir --tune-param lenpen weight3 \ + --lower-bound 0 0 --upper-bound 3 3 --data-dir-name $data_dir_name \ + --num-trials $num_trials --source-lang de --target-lang en --gen-model $fw \ + -n $beam --batch-size $batch_size --score-model1 $fw \ + -lm $lm --lm-dict $lm_dict --lm-name en_newscrawl --lm-bpe-code $lm_bpe_code \ + --model1-name $fw_name --gen-model-name $fw_name + +# to run with a preconfigured set of hyperparameters for the lenpen and model weights, using rerank.py instead. +python examples/noisychannel/rerank.py $data_dir \ + --lenpen 0.269 --weight1 1 --weight2 0.929 --weight3 0.831 \ + --data-dir-name $data_dir_name --source-lang de --target-lang en --gen-model $fw \ + -n $beam --batch-size $batch_size --score-model2 $fw --score-model1 $bw --backwards1 \ + -lm $lm --lm-dict $lm_dict --lm-name en_newscrawl --lm-bpe-code $lm_bpe_code \ + --model2-name $fw_name --model1-name $bw_name --gen-model-name $fw_name +``` + diff --git a/SpeechT5/fairseq/examples/noisychannel/__init__.py b/SpeechT5/fairseq/examples/noisychannel/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..89f1aef4f6328d25425e0bcabb42dfffd2ed35f0 --- /dev/null +++ b/SpeechT5/fairseq/examples/noisychannel/__init__.py @@ -0,0 +1,6 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +from .rerank_options import * # noqa diff --git a/SpeechT5/fairseq/examples/noisychannel/rerank.py b/SpeechT5/fairseq/examples/noisychannel/rerank.py new file mode 100644 index 0000000000000000000000000000000000000000..bb80d11a67cd75764a89f6f41915b0348ae96e92 --- /dev/null +++ b/SpeechT5/fairseq/examples/noisychannel/rerank.py @@ -0,0 +1,428 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. 
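+#
+# Reranks an n-best list by combining the scores of up to two translation models
+# (weight1 for --score-model1, weight2 for --score-model2, either of which may be a
+# backwards "channel" model P(S|T)) with a language model score P(T) (weight3), as
+# in the README above. Roughly, with the exact normalization and handling of
+# backwards/right-to-left models left to rerank_utils.get_score:
+#   score = (weight1*score1 + weight2*score2 + weight3*lm_score) / len(T)**lenpen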
+ +import math +from multiprocessing import Pool + +import numpy as np +from fairseq import options +from fairseq.data import dictionary +from fairseq.scoring import bleu + +from examples.noisychannel import ( + rerank_generate, + rerank_options, + rerank_score_bw, + rerank_score_lm, + rerank_utils, +) + + +def score_target_hypo( + args, a, b, c, lenpen, target_outfile, hypo_outfile, write_hypos, normalize +): + + print("lenpen", lenpen, "weight1", a, "weight2", b, "weight3", c) + gen_output_lst, bitext1_lst, bitext2_lst, lm_res_lst = load_score_files(args) + dict = dictionary.Dictionary() + scorer = scorer = bleu.Scorer( + bleu.BleuConfig( + pad=dict.pad(), + eos=dict.eos(), + unk=dict.unk(), + ) + ) + + ordered_hypos = {} + ordered_targets = {} + + for shard_id in range(len(bitext1_lst)): + bitext1 = bitext1_lst[shard_id] + bitext2 = bitext2_lst[shard_id] + gen_output = gen_output_lst[shard_id] + lm_res = lm_res_lst[shard_id] + + total = len(bitext1.rescore_source.keys()) + source_lst = [] + hypo_lst = [] + score_lst = [] + reference_lst = [] + j = 1 + best_score = -math.inf + + for i in range(total): + # length is measured in terms of words, not bpe tokens, since models may not share the same bpe + target_len = len(bitext1.rescore_hypo[i].split()) + + if lm_res is not None: + lm_score = lm_res.score[i] + else: + lm_score = 0 + + if bitext2 is not None: + bitext2_score = bitext2.rescore_score[i] + bitext2_backwards = bitext2.backwards + else: + bitext2_score = None + bitext2_backwards = None + + score = rerank_utils.get_score( + a, + b, + c, + target_len, + bitext1.rescore_score[i], + bitext2_score, + lm_score=lm_score, + lenpen=lenpen, + src_len=bitext1.source_lengths[i], + tgt_len=bitext1.target_lengths[i], + bitext1_backwards=bitext1.backwards, + bitext2_backwards=bitext2_backwards, + normalize=normalize, + ) + + if score > best_score: + best_score = score + best_hypo = bitext1.rescore_hypo[i] + + if j == gen_output.num_hypos[i] or j == args.num_rescore: + j = 1 + hypo_lst.append(best_hypo) + score_lst.append(best_score) + source_lst.append(bitext1.rescore_source[i]) + reference_lst.append(bitext1.rescore_target[i]) + + best_score = -math.inf + best_hypo = "" + else: + j += 1 + + gen_keys = list(sorted(gen_output.no_bpe_target.keys())) + + for key in range(len(gen_keys)): + if args.prefix_len is None: + assert hypo_lst[key] in gen_output.no_bpe_hypo[gen_keys[key]], ( + "pred and rescore hypo mismatch: i: " + + str(key) + + ", " + + str(hypo_lst[key]) + + str(gen_keys[key]) + + str(gen_output.no_bpe_hypo[key]) + ) + sys_tok = dict.encode_line(hypo_lst[key]) + ref_tok = dict.encode_line(gen_output.no_bpe_target[gen_keys[key]]) + scorer.add(ref_tok, sys_tok) + + else: + full_hypo = rerank_utils.get_full_from_prefix( + hypo_lst[key], gen_output.no_bpe_hypo[gen_keys[key]] + ) + sys_tok = dict.encode_line(full_hypo) + ref_tok = dict.encode_line(gen_output.no_bpe_target[gen_keys[key]]) + scorer.add(ref_tok, sys_tok) + + # if only one set of hyper parameters is provided, write the predictions to a file + if write_hypos: + # recover the orinal ids from n best list generation + for key in range(len(gen_output.no_bpe_target)): + if args.prefix_len is None: + assert hypo_lst[key] in gen_output.no_bpe_hypo[gen_keys[key]], ( + "pred and rescore hypo mismatch:" + + "i:" + + str(key) + + str(hypo_lst[key]) + + str(gen_output.no_bpe_hypo[key]) + ) + ordered_hypos[gen_keys[key]] = hypo_lst[key] + ordered_targets[gen_keys[key]] = gen_output.no_bpe_target[ + gen_keys[key] + ] + + else: + full_hypo = 
rerank_utils.get_full_from_prefix( + hypo_lst[key], gen_output.no_bpe_hypo[gen_keys[key]] + ) + ordered_hypos[gen_keys[key]] = full_hypo + ordered_targets[gen_keys[key]] = gen_output.no_bpe_target[ + gen_keys[key] + ] + + # write the hypos in the original order from nbest list generation + if args.num_shards == (len(bitext1_lst)): + with open(target_outfile, "w") as t: + with open(hypo_outfile, "w") as h: + for key in range(len(ordered_hypos)): + t.write(ordered_targets[key]) + h.write(ordered_hypos[key]) + + res = scorer.result_string(4) + if write_hypos: + print(res) + score = rerank_utils.parse_bleu_scoring(res) + return score + + +def match_target_hypo(args, target_outfile, hypo_outfile): + """combine scores from the LM and bitext models, and write the top scoring hypothesis to a file""" + if len(args.weight1) == 1: + res = score_target_hypo( + args, + args.weight1[0], + args.weight2[0], + args.weight3[0], + args.lenpen[0], + target_outfile, + hypo_outfile, + True, + args.normalize, + ) + rerank_scores = [res] + else: + print("launching pool") + with Pool(32) as p: + rerank_scores = p.starmap( + score_target_hypo, + [ + ( + args, + args.weight1[i], + args.weight2[i], + args.weight3[i], + args.lenpen[i], + target_outfile, + hypo_outfile, + False, + args.normalize, + ) + for i in range(len(args.weight1)) + ], + ) + + if len(rerank_scores) > 1: + best_index = np.argmax(rerank_scores) + best_score = rerank_scores[best_index] + print("best score", best_score) + print("best lenpen", args.lenpen[best_index]) + print("best weight1", args.weight1[best_index]) + print("best weight2", args.weight2[best_index]) + print("best weight3", args.weight3[best_index]) + return ( + args.lenpen[best_index], + args.weight1[best_index], + args.weight2[best_index], + args.weight3[best_index], + best_score, + ) + + else: + return ( + args.lenpen[0], + args.weight1[0], + args.weight2[0], + args.weight3[0], + rerank_scores[0], + ) + + +def load_score_files(args): + if args.all_shards: + shard_ids = list(range(args.num_shards)) + else: + shard_ids = [args.shard_id] + + gen_output_lst = [] + bitext1_lst = [] + bitext2_lst = [] + lm_res1_lst = [] + + for shard_id in shard_ids: + using_nbest = args.nbest_list is not None + ( + pre_gen, + left_to_right_preprocessed_dir, + right_to_left_preprocessed_dir, + backwards_preprocessed_dir, + lm_preprocessed_dir, + ) = rerank_utils.get_directories( + args.data_dir_name, + args.num_rescore, + args.gen_subset, + args.gen_model_name, + shard_id, + args.num_shards, + args.sampling, + args.prefix_len, + args.target_prefix_frac, + args.source_prefix_frac, + ) + + rerank1_is_gen = ( + args.gen_model == args.score_model1 and args.source_prefix_frac is None + ) + rerank2_is_gen = ( + args.gen_model == args.score_model2 and args.source_prefix_frac is None + ) + + score1_file = rerank_utils.rescore_file_name( + pre_gen, + args.prefix_len, + args.model1_name, + target_prefix_frac=args.target_prefix_frac, + source_prefix_frac=args.source_prefix_frac, + backwards=args.backwards1, + ) + if args.score_model2 is not None: + score2_file = rerank_utils.rescore_file_name( + pre_gen, + args.prefix_len, + args.model2_name, + target_prefix_frac=args.target_prefix_frac, + source_prefix_frac=args.source_prefix_frac, + backwards=args.backwards2, + ) + if args.language_model is not None: + lm_score_file = rerank_utils.rescore_file_name( + pre_gen, args.prefix_len, args.lm_name, lm_file=True + ) + + # get gen output + predictions_bpe_file = pre_gen + "/generate_output_bpe.txt" + if using_nbest: + 
print("Using predefined n-best list from interactive.py") + predictions_bpe_file = args.nbest_list + gen_output = rerank_utils.BitextOutputFromGen( + predictions_bpe_file, + bpe_symbol=args.post_process, + nbest=using_nbest, + prefix_len=args.prefix_len, + target_prefix_frac=args.target_prefix_frac, + ) + + if rerank1_is_gen: + bitext1 = gen_output + else: + bitext1 = rerank_utils.BitextOutput( + score1_file, + args.backwards1, + args.right_to_left1, + args.post_process, + args.prefix_len, + args.target_prefix_frac, + args.source_prefix_frac, + ) + + if args.score_model2 is not None or args.nbest_list is not None: + if rerank2_is_gen: + bitext2 = gen_output + else: + bitext2 = rerank_utils.BitextOutput( + score2_file, + args.backwards2, + args.right_to_left2, + args.post_process, + args.prefix_len, + args.target_prefix_frac, + args.source_prefix_frac, + ) + + assert ( + bitext2.source_lengths == bitext1.source_lengths + ), "source lengths for rescoring models do not match" + assert ( + bitext2.target_lengths == bitext1.target_lengths + ), "target lengths for rescoring models do not match" + else: + if args.diff_bpe: + assert args.score_model2 is None + bitext2 = gen_output + else: + bitext2 = None + + if args.language_model is not None: + lm_res1 = rerank_utils.LMOutput( + lm_score_file, + args.lm_dict, + args.prefix_len, + args.post_process, + args.target_prefix_frac, + ) + else: + lm_res1 = None + + gen_output_lst.append(gen_output) + bitext1_lst.append(bitext1) + bitext2_lst.append(bitext2) + lm_res1_lst.append(lm_res1) + return gen_output_lst, bitext1_lst, bitext2_lst, lm_res1_lst + + +def rerank(args): + if type(args.lenpen) is not list: + args.lenpen = [args.lenpen] + if type(args.weight1) is not list: + args.weight1 = [args.weight1] + if type(args.weight2) is not list: + args.weight2 = [args.weight2] + if type(args.weight3) is not list: + args.weight3 = [args.weight3] + if args.all_shards: + shard_ids = list(range(args.num_shards)) + else: + shard_ids = [args.shard_id] + + for shard_id in shard_ids: + ( + pre_gen, + left_to_right_preprocessed_dir, + right_to_left_preprocessed_dir, + backwards_preprocessed_dir, + lm_preprocessed_dir, + ) = rerank_utils.get_directories( + args.data_dir_name, + args.num_rescore, + args.gen_subset, + args.gen_model_name, + shard_id, + args.num_shards, + args.sampling, + args.prefix_len, + args.target_prefix_frac, + args.source_prefix_frac, + ) + rerank_generate.gen_and_reprocess_nbest(args) + rerank_score_bw.score_bw(args) + rerank_score_lm.score_lm(args) + + if args.write_hypos is None: + write_targets = pre_gen + "/matched_targets" + write_hypos = pre_gen + "/matched_hypos" + else: + write_targets = args.write_hypos + "_targets" + args.gen_subset + write_hypos = args.write_hypos + "_hypos" + args.gen_subset + + if args.all_shards: + write_targets += "_all_shards" + write_hypos += "_all_shards" + + ( + best_lenpen, + best_weight1, + best_weight2, + best_weight3, + best_score, + ) = match_target_hypo(args, write_targets, write_hypos) + + return best_lenpen, best_weight1, best_weight2, best_weight3, best_score + + +def cli_main(): + parser = rerank_options.get_reranking_parser() + args = options.parse_args_and_arch(parser) + rerank(args) + + +if __name__ == "__main__": + cli_main() diff --git a/SpeechT5/fairseq/examples/noisychannel/rerank_generate.py b/SpeechT5/fairseq/examples/noisychannel/rerank_generate.py new file mode 100644 index 0000000000000000000000000000000000000000..daeeae059a677a9fcd7c370be087f1f5c189bc52 --- /dev/null +++ 
b/SpeechT5/fairseq/examples/noisychannel/rerank_generate.py @@ -0,0 +1,397 @@ +#!/usr/bin/env python3 -u +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +""" +Generate n-best translations using a trained model. +""" + +import os +import subprocess +from contextlib import redirect_stdout + +from fairseq import options +from fairseq_cli import generate, preprocess + +from examples.noisychannel import rerank_options, rerank_utils + + +def gen_and_reprocess_nbest(args): + if args.score_dict_dir is None: + args.score_dict_dir = args.data + if args.prefix_len is not None: + assert ( + args.right_to_left1 is False + ), "prefix length not compatible with right to left models" + assert ( + args.right_to_left2 is False + ), "prefix length not compatible with right to left models" + + if args.nbest_list is not None: + assert args.score_model2 is None + + if args.backwards1: + scorer1_src = args.target_lang + scorer1_tgt = args.source_lang + else: + scorer1_src = args.source_lang + scorer1_tgt = args.target_lang + + store_data = ( + os.path.join(os.path.dirname(__file__)) + "/rerank_data/" + args.data_dir_name + ) + if not os.path.exists(store_data): + os.makedirs(store_data) + + ( + pre_gen, + left_to_right_preprocessed_dir, + right_to_left_preprocessed_dir, + backwards_preprocessed_dir, + lm_preprocessed_dir, + ) = rerank_utils.get_directories( + args.data_dir_name, + args.num_rescore, + args.gen_subset, + args.gen_model_name, + args.shard_id, + args.num_shards, + args.sampling, + args.prefix_len, + args.target_prefix_frac, + args.source_prefix_frac, + ) + assert not ( + args.right_to_left1 and args.backwards1 + ), "backwards right to left not supported" + assert not ( + args.right_to_left2 and args.backwards2 + ), "backwards right to left not supported" + assert not ( + args.prefix_len is not None and args.target_prefix_frac is not None + ), "target prefix frac and target prefix len incompatible" + + # make directory to store generation results + if not os.path.exists(pre_gen): + os.makedirs(pre_gen) + + rerank1_is_gen = ( + args.gen_model == args.score_model1 and args.source_prefix_frac is None + ) + rerank2_is_gen = ( + args.gen_model == args.score_model2 and args.source_prefix_frac is None + ) + + if args.nbest_list is not None: + rerank2_is_gen = True + + # make directories to store preprossed nbest list for reranking + if not os.path.exists(left_to_right_preprocessed_dir): + os.makedirs(left_to_right_preprocessed_dir) + if not os.path.exists(right_to_left_preprocessed_dir): + os.makedirs(right_to_left_preprocessed_dir) + if not os.path.exists(lm_preprocessed_dir): + os.makedirs(lm_preprocessed_dir) + if not os.path.exists(backwards_preprocessed_dir): + os.makedirs(backwards_preprocessed_dir) + + score1_file = rerank_utils.rescore_file_name( + pre_gen, + args.prefix_len, + args.model1_name, + target_prefix_frac=args.target_prefix_frac, + source_prefix_frac=args.source_prefix_frac, + backwards=args.backwards1, + ) + if args.score_model2 is not None: + score2_file = rerank_utils.rescore_file_name( + pre_gen, + args.prefix_len, + args.model2_name, + target_prefix_frac=args.target_prefix_frac, + source_prefix_frac=args.source_prefix_frac, + backwards=args.backwards2, + ) + + predictions_bpe_file = pre_gen + "/generate_output_bpe.txt" + + using_nbest = args.nbest_list is not None + + if using_nbest: + print("Using predefined n-best list from interactive.py") + 
predictions_bpe_file = args.nbest_list + + else: + if not os.path.isfile(predictions_bpe_file): + print("STEP 1: generate predictions using the p(T|S) model with bpe") + print(args.data) + param1 = [ + args.data, + "--path", + args.gen_model, + "--shard-id", + str(args.shard_id), + "--num-shards", + str(args.num_shards), + "--nbest", + str(args.num_rescore), + "--batch-size", + str(args.batch_size), + "--beam", + str(args.num_rescore), + "--batch-size", + str(args.num_rescore), + "--gen-subset", + args.gen_subset, + "--source-lang", + args.source_lang, + "--target-lang", + args.target_lang, + ] + if args.sampling: + param1 += ["--sampling"] + + gen_parser = options.get_generation_parser() + input_args = options.parse_args_and_arch(gen_parser, param1) + + print(input_args) + with open(predictions_bpe_file, "w") as f: + with redirect_stdout(f): + generate.main(input_args) + + gen_output = rerank_utils.BitextOutputFromGen( + predictions_bpe_file, + bpe_symbol=args.post_process, + nbest=using_nbest, + prefix_len=args.prefix_len, + target_prefix_frac=args.target_prefix_frac, + ) + + if args.diff_bpe: + rerank_utils.write_reprocessed( + gen_output.no_bpe_source, + gen_output.no_bpe_hypo, + gen_output.no_bpe_target, + pre_gen + "/source_gen_bpe." + args.source_lang, + pre_gen + "/target_gen_bpe." + args.target_lang, + pre_gen + "/reference_gen_bpe." + args.target_lang, + ) + bitext_bpe = args.rescore_bpe_code + bpe_src_param = [ + "-c", + bitext_bpe, + "--input", + pre_gen + "/source_gen_bpe." + args.source_lang, + "--output", + pre_gen + "/rescore_data." + args.source_lang, + ] + bpe_tgt_param = [ + "-c", + bitext_bpe, + "--input", + pre_gen + "/target_gen_bpe." + args.target_lang, + "--output", + pre_gen + "/rescore_data." + args.target_lang, + ] + + subprocess.call( + [ + "python", + os.path.join( + os.path.dirname(__file__), "subword-nmt/subword_nmt/apply_bpe.py" + ), + ] + + bpe_src_param, + shell=False, + ) + + subprocess.call( + [ + "python", + os.path.join( + os.path.dirname(__file__), "subword-nmt/subword_nmt/apply_bpe.py" + ), + ] + + bpe_tgt_param, + shell=False, + ) + + if (not os.path.isfile(score1_file) and not rerank1_is_gen) or ( + args.score_model2 is not None + and not os.path.isfile(score2_file) + and not rerank2_is_gen + ): + print( + "STEP 2: process the output of generate.py so we have clean text files with the translations" + ) + + rescore_file = "/rescore_data" + if args.prefix_len is not None: + prefix_len_rescore_file = rescore_file + "prefix" + str(args.prefix_len) + if args.target_prefix_frac is not None: + target_prefix_frac_rescore_file = ( + rescore_file + "target_prefix_frac" + str(args.target_prefix_frac) + ) + if args.source_prefix_frac is not None: + source_prefix_frac_rescore_file = ( + rescore_file + "source_prefix_frac" + str(args.source_prefix_frac) + ) + + if not args.right_to_left1 or not args.right_to_left2: + if not args.diff_bpe: + rerank_utils.write_reprocessed( + gen_output.source, + gen_output.hypo, + gen_output.target, + pre_gen + rescore_file + "." + args.source_lang, + pre_gen + rescore_file + "." + args.target_lang, + pre_gen + "/reference_file", + bpe_symbol=args.post_process, + ) + if args.prefix_len is not None: + bw_rescore_file = prefix_len_rescore_file + rerank_utils.write_reprocessed( + gen_output.source, + gen_output.hypo, + gen_output.target, + pre_gen + prefix_len_rescore_file + "." + args.source_lang, + pre_gen + prefix_len_rescore_file + "." 
+ args.target_lang, + pre_gen + "/reference_file", + prefix_len=args.prefix_len, + bpe_symbol=args.post_process, + ) + elif args.target_prefix_frac is not None: + bw_rescore_file = target_prefix_frac_rescore_file + rerank_utils.write_reprocessed( + gen_output.source, + gen_output.hypo, + gen_output.target, + pre_gen + + target_prefix_frac_rescore_file + + "." + + args.source_lang, + pre_gen + + target_prefix_frac_rescore_file + + "." + + args.target_lang, + pre_gen + "/reference_file", + bpe_symbol=args.post_process, + target_prefix_frac=args.target_prefix_frac, + ) + else: + bw_rescore_file = rescore_file + + if args.source_prefix_frac is not None: + fw_rescore_file = source_prefix_frac_rescore_file + rerank_utils.write_reprocessed( + gen_output.source, + gen_output.hypo, + gen_output.target, + pre_gen + + source_prefix_frac_rescore_file + + "." + + args.source_lang, + pre_gen + + source_prefix_frac_rescore_file + + "." + + args.target_lang, + pre_gen + "/reference_file", + bpe_symbol=args.post_process, + source_prefix_frac=args.source_prefix_frac, + ) + else: + fw_rescore_file = rescore_file + + if args.right_to_left1 or args.right_to_left2: + rerank_utils.write_reprocessed( + gen_output.source, + gen_output.hypo, + gen_output.target, + pre_gen + "/right_to_left_rescore_data." + args.source_lang, + pre_gen + "/right_to_left_rescore_data." + args.target_lang, + pre_gen + "/right_to_left_reference_file", + right_to_left=True, + bpe_symbol=args.post_process, + ) + + print("STEP 3: binarize the translations") + if ( + not args.right_to_left1 + or args.score_model2 is not None + and not args.right_to_left2 + or not rerank1_is_gen + ): + + if args.backwards1 or args.backwards2: + if args.backwards_score_dict_dir is not None: + bw_dict = args.backwards_score_dict_dir + else: + bw_dict = args.score_dict_dir + bw_preprocess_param = [ + "--source-lang", + scorer1_src, + "--target-lang", + scorer1_tgt, + "--trainpref", + pre_gen + bw_rescore_file, + "--srcdict", + bw_dict + "/dict." + scorer1_src + ".txt", + "--tgtdict", + bw_dict + "/dict." + scorer1_tgt + ".txt", + "--destdir", + backwards_preprocessed_dir, + ] + preprocess_parser = options.get_preprocessing_parser() + input_args = preprocess_parser.parse_args(bw_preprocess_param) + preprocess.main(input_args) + + preprocess_param = [ + "--source-lang", + scorer1_src, + "--target-lang", + scorer1_tgt, + "--trainpref", + pre_gen + fw_rescore_file, + "--srcdict", + args.score_dict_dir + "/dict." + scorer1_src + ".txt", + "--tgtdict", + args.score_dict_dir + "/dict." + scorer1_tgt + ".txt", + "--destdir", + left_to_right_preprocessed_dir, + ] + preprocess_parser = options.get_preprocessing_parser() + input_args = preprocess_parser.parse_args(preprocess_param) + preprocess.main(input_args) + + if args.right_to_left1 or args.right_to_left2: + preprocess_param = [ + "--source-lang", + scorer1_src, + "--target-lang", + scorer1_tgt, + "--trainpref", + pre_gen + "/right_to_left_rescore_data", + "--srcdict", + args.score_dict_dir + "/dict." + scorer1_src + ".txt", + "--tgtdict", + args.score_dict_dir + "/dict." 
+ scorer1_tgt + ".txt", + "--destdir", + right_to_left_preprocessed_dir, + ] + preprocess_parser = options.get_preprocessing_parser() + input_args = preprocess_parser.parse_args(preprocess_param) + preprocess.main(input_args) + + return gen_output + + +def cli_main(): + parser = rerank_options.get_reranking_parser() + args = options.parse_args_and_arch(parser) + gen_and_reprocess_nbest(args) + + +if __name__ == "__main__": + cli_main() diff --git a/SpeechT5/fairseq/examples/noisychannel/rerank_options.py b/SpeechT5/fairseq/examples/noisychannel/rerank_options.py new file mode 100644 index 0000000000000000000000000000000000000000..de91939e6635bdf33c9dc330116be07d9e8be6a2 --- /dev/null +++ b/SpeechT5/fairseq/examples/noisychannel/rerank_options.py @@ -0,0 +1,149 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +from fairseq import options + + +def get_reranking_parser(default_task="translation"): + parser = options.get_parser("Generation and reranking", default_task) + add_reranking_args(parser) + return parser + + +def get_tuning_parser(default_task="translation"): + parser = options.get_parser("Reranking tuning", default_task) + add_reranking_args(parser) + add_tuning_args(parser) + return parser + + +def add_reranking_args(parser): + group = parser.add_argument_group("Reranking") + # fmt: off + group.add_argument('--score-model1', '-s1', type=str, metavar='FILE', required=True, + help='path to first model or ensemble of models for rescoring') + group.add_argument('--score-model2', '-s2', type=str, metavar='FILE', required=False, + help='path to second model or ensemble of models for rescoring') + group.add_argument('--num-rescore', '-n', type=int, metavar='N', default=10, + help='the number of candidate hypothesis to rescore') + group.add_argument('-bz', '--batch-size', type=int, metavar='N', default=128, + help='batch size for generating the nbest list') + group.add_argument('--gen-subset', default='test', metavar='SET', choices=['test', 'train', 'valid'], + help='data subset to generate (train, valid, test)') + group.add_argument('--gen-model', default=None, metavar='FILE', + help='the model to generate translations') + group.add_argument('-b1', '--backwards1', action='store_true', + help='whether or not the first model group is backwards') + group.add_argument('-b2', '--backwards2', action='store_true', + help='whether or not the second model group is backwards') + group.add_argument('-a', '--weight1', default=1, nargs='+', type=float, + help='the weight(s) of the first model') + group.add_argument('-b', '--weight2', default=1, nargs='+', type=float, + help='the weight(s) of the second model, or the gen model if using nbest from interactive.py') + group.add_argument('-c', '--weight3', default=1, nargs='+', type=float, + help='the weight(s) of the third model') + + # lm arguments + group.add_argument('-lm', '--language-model', default=None, metavar='FILE', + help='language model for target language to rescore translations') + group.add_argument('--lm-dict', default=None, metavar='FILE', + help='the dict of the language model for the target language') + group.add_argument('--lm-name', default=None, + help='the name of the language model for the target language') + group.add_argument('--lm-bpe-code', default=None, metavar='FILE', + help='the bpe code for the language model for the target language') + group.add_argument('--data-dir-name', default=None, + 
help='name of data directory') + group.add_argument('--lenpen', default=1, nargs='+', type=float, + help='length penalty: <1.0 favors shorter, >1.0 favors longer sentences') + group.add_argument('--score-dict-dir', default=None, + help='the directory with dictionaries for the scoring models') + group.add_argument('--right-to-left1', action='store_true', + help='whether the first model group is a right to left model') + group.add_argument('--right-to-left2', action='store_true', + help='whether the second model group is a right to left model') + group.add_argument('--post-process', '--remove-bpe', default='@@ ', + help='the bpe symbol, used for the bitext and LM') + group.add_argument('--prefix-len', default=None, type=int, + help='the length of the target prefix to use in rescoring (in terms of words wo bpe)') + group.add_argument('--sampling', action='store_true', + help='use sampling instead of beam search for generating n best list') + group.add_argument('--diff-bpe', action='store_true', + help='bpe for rescoring and nbest list not the same') + group.add_argument('--rescore-bpe-code', default=None, + help='bpe code for rescoring models') + group.add_argument('--nbest-list', default=None, + help='use predefined nbest list in interactive.py format') + group.add_argument('--write-hypos', default=None, + help='filename prefix to write hypos to') + group.add_argument('--ref-translation', default=None, + help='reference translation to use with nbest list from interactive.py') + group.add_argument('--backwards-score-dict-dir', default=None, + help='the directory with dictionaries for the backwards model,' + 'if None then it is assumed the fw and backwards models share dictionaries') + + # extra scaling args + group.add_argument('--gen-model-name', default=None, + help='the name of the models that generated the nbest list') + group.add_argument('--model1-name', default=None, + help='the name of the set for model1 group ') + group.add_argument('--model2-name', default=None, + help='the name of the set for model2 group') + group.add_argument('--shard-id', default=0, type=int, + help='the id of the shard to generate') + group.add_argument('--num-shards', default=1, type=int, + help='the number of shards to generate across') + group.add_argument('--all-shards', action='store_true', + help='use all shards') + group.add_argument('--target-prefix-frac', default=None, type=float, + help='the fraction of the target prefix to use in rescoring (in terms of words wo bpe)') + group.add_argument('--source-prefix-frac', default=None, type=float, + help='the fraction of the source prefix to use in rescoring (in terms of words wo bpe)') + group.add_argument('--normalize', action='store_true', + help='whether to normalize by src and target len') + # fmt: on + return group + + +def add_tuning_args(parser): + group = parser.add_argument_group("Tuning") + + group.add_argument( + "--lower-bound", + default=[-0.7], + nargs="+", + type=float, + help="lower bound of search space", + ) + group.add_argument( + "--upper-bound", + default=[3], + nargs="+", + type=float, + help="upper bound of search space", + ) + group.add_argument( + "--tune-param", + default=["lenpen"], + nargs="+", + choices=["lenpen", "weight1", "weight2", "weight3"], + help="the parameter(s) to tune", + ) + group.add_argument( + "--tune-subset", + default="valid", + choices=["valid", "test", "train"], + help="the subset to tune on ", + ) + group.add_argument( + "--num-trials", + default=1000, + type=int, + help="number of trials to do for random 
search", + ) + group.add_argument( + "--share-weights", action="store_true", help="share weight2 and weight 3" + ) + return group diff --git a/SpeechT5/fairseq/examples/noisychannel/rerank_score_bw.py b/SpeechT5/fairseq/examples/noisychannel/rerank_score_bw.py new file mode 100644 index 0000000000000000000000000000000000000000..b0bc913651bd76667e25c214acb70f2bca19e185 --- /dev/null +++ b/SpeechT5/fairseq/examples/noisychannel/rerank_score_bw.py @@ -0,0 +1,143 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +import os +from contextlib import redirect_stdout + +from fairseq import options +from fairseq_cli import generate + +from examples.noisychannel import rerank_options, rerank_utils + + +def score_bw(args): + if args.backwards1: + scorer1_src = args.target_lang + scorer1_tgt = args.source_lang + else: + scorer1_src = args.source_lang + scorer1_tgt = args.target_lang + + if args.score_model2 is not None: + if args.backwards2: + scorer2_src = args.target_lang + scorer2_tgt = args.source_lang + else: + scorer2_src = args.source_lang + scorer2_tgt = args.target_lang + + rerank1_is_gen = ( + args.gen_model == args.score_model1 and args.source_prefix_frac is None + ) + rerank2_is_gen = ( + args.gen_model == args.score_model2 and args.source_prefix_frac is None + ) + + ( + pre_gen, + left_to_right_preprocessed_dir, + right_to_left_preprocessed_dir, + backwards_preprocessed_dir, + lm_preprocessed_dir, + ) = rerank_utils.get_directories( + args.data_dir_name, + args.num_rescore, + args.gen_subset, + args.gen_model_name, + args.shard_id, + args.num_shards, + args.sampling, + args.prefix_len, + args.target_prefix_frac, + args.source_prefix_frac, + ) + + score1_file = rerank_utils.rescore_file_name( + pre_gen, + args.prefix_len, + args.model1_name, + target_prefix_frac=args.target_prefix_frac, + source_prefix_frac=args.source_prefix_frac, + backwards=args.backwards1, + ) + + if args.score_model2 is not None: + score2_file = rerank_utils.rescore_file_name( + pre_gen, + args.prefix_len, + args.model2_name, + target_prefix_frac=args.target_prefix_frac, + source_prefix_frac=args.source_prefix_frac, + backwards=args.backwards2, + ) + + if args.right_to_left1: + rerank_data1 = right_to_left_preprocessed_dir + elif args.backwards1: + rerank_data1 = backwards_preprocessed_dir + else: + rerank_data1 = left_to_right_preprocessed_dir + + gen_param = ["--batch-size", str(128), "--score-reference", "--gen-subset", "train"] + if not rerank1_is_gen and not os.path.isfile(score1_file): + print("STEP 4: score the translations for model 1") + + model_param1 = [ + "--path", + args.score_model1, + "--source-lang", + scorer1_src, + "--target-lang", + scorer1_tgt, + ] + gen_model1_param = [rerank_data1] + gen_param + model_param1 + + gen_parser = options.get_generation_parser() + input_args = options.parse_args_and_arch(gen_parser, gen_model1_param) + + with open(score1_file, "w") as f: + with redirect_stdout(f): + generate.main(input_args) + + if ( + args.score_model2 is not None + and not os.path.isfile(score2_file) + and not rerank2_is_gen + ): + print("STEP 4: score the translations for model 2") + + if args.right_to_left2: + rerank_data2 = right_to_left_preprocessed_dir + elif args.backwards2: + rerank_data2 = backwards_preprocessed_dir + else: + rerank_data2 = left_to_right_preprocessed_dir + + model_param2 = [ + "--path", + args.score_model2, + "--source-lang", + scorer2_src, + 
"--target-lang", + scorer2_tgt, + ] + gen_model2_param = [rerank_data2] + gen_param + model_param2 + + gen_parser = options.get_generation_parser() + input_args = options.parse_args_and_arch(gen_parser, gen_model2_param) + + with open(score2_file, "w") as f: + with redirect_stdout(f): + generate.main(input_args) + + +def cli_main(): + parser = rerank_options.get_reranking_parser() + args = options.parse_args_and_arch(parser) + score_bw(args) + + +if __name__ == "__main__": + cli_main() diff --git a/SpeechT5/fairseq/examples/noisychannel/rerank_score_lm.py b/SpeechT5/fairseq/examples/noisychannel/rerank_score_lm.py new file mode 100644 index 0000000000000000000000000000000000000000..e80948d78b02561cbd09d72c319222105f41f6bb --- /dev/null +++ b/SpeechT5/fairseq/examples/noisychannel/rerank_score_lm.py @@ -0,0 +1,81 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +import os + +from fairseq import options + +from examples.noisychannel import rerank_options, rerank_utils + + +def score_lm(args): + using_nbest = args.nbest_list is not None + ( + pre_gen, + left_to_right_preprocessed_dir, + right_to_left_preprocessed_dir, + backwards_preprocessed_dir, + lm_preprocessed_dir, + ) = rerank_utils.get_directories( + args.data_dir_name, + args.num_rescore, + args.gen_subset, + args.gen_model_name, + args.shard_id, + args.num_shards, + args.sampling, + args.prefix_len, + args.target_prefix_frac, + args.source_prefix_frac, + ) + + predictions_bpe_file = pre_gen + "/generate_output_bpe.txt" + if using_nbest: + print("Using predefined n-best list from interactive.py") + predictions_bpe_file = args.nbest_list + + gen_output = rerank_utils.BitextOutputFromGen( + predictions_bpe_file, bpe_symbol=args.post_process, nbest=using_nbest + ) + + if args.language_model is not None: + lm_score_file = rerank_utils.rescore_file_name( + pre_gen, args.prefix_len, args.lm_name, lm_file=True + ) + + if args.language_model is not None and not os.path.isfile(lm_score_file): + print("STEP 4.5: language modeling for P(T)") + if args.lm_bpe_code is None: + bpe_status = "no bpe" + elif args.lm_bpe_code == "shared": + bpe_status = "shared" + else: + bpe_status = "different" + + rerank_utils.lm_scoring( + lm_preprocessed_dir, + bpe_status, + gen_output, + pre_gen, + args.lm_dict, + args.lm_name, + args.language_model, + args.lm_bpe_code, + 128, + lm_score_file, + args.target_lang, + args.source_lang, + prefix_len=args.prefix_len, + ) + + +def cli_main(): + parser = rerank_options.get_reranking_parser() + args = options.parse_args_and_arch(parser) + score_lm(args) + + +if __name__ == "__main__": + cli_main() diff --git a/SpeechT5/fairseq/examples/noisychannel/rerank_tune.py b/SpeechT5/fairseq/examples/noisychannel/rerank_tune.py new file mode 100644 index 0000000000000000000000000000000000000000..b2e8b7594a370b2462f77252d54d7ef80e290f7c --- /dev/null +++ b/SpeechT5/fairseq/examples/noisychannel/rerank_tune.py @@ -0,0 +1,102 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. 
+ +import argparse +import random + +import numpy as np +from fairseq import options + +from examples.noisychannel import rerank, rerank_options + + +def random_search(args): + param_values = [] + tuneable_parameters = ["lenpen", "weight1", "weight2", "weight3"] + initial_params = [args.lenpen, args.weight1, args.weight2, args.weight3] + for i, elem in enumerate(initial_params): + if type(elem) is not list: + initial_params[i] = [elem] + else: + initial_params[i] = elem + + tune_parameters = args.tune_param.copy() + for i in range(len(args.tune_param)): + assert args.upper_bound[i] >= args.lower_bound[i] + index = tuneable_parameters.index(args.tune_param[i]) + del tuneable_parameters[index] + del initial_params[index] + + tune_parameters += tuneable_parameters + param_values += initial_params + random.seed(args.seed) + + random_params = np.array( + [ + [ + random.uniform(args.lower_bound[i], args.upper_bound[i]) + for i in range(len(args.tune_param)) + ] + for k in range(args.num_trials) + ] + ) + set_params = np.array( + [ + [initial_params[i][0] for i in range(len(tuneable_parameters))] + for k in range(args.num_trials) + ] + ) + random_params = np.concatenate((random_params, set_params), 1) + + rerank_args = vars(args).copy() + if args.nbest_list: + rerank_args["gen_subset"] = "test" + else: + rerank_args["gen_subset"] = args.tune_subset + + for k in range(len(tune_parameters)): + rerank_args[tune_parameters[k]] = list(random_params[:, k]) + + if args.share_weights: + k = tune_parameters.index("weight2") + rerank_args["weight3"] = list(random_params[:, k]) + + rerank_args = argparse.Namespace(**rerank_args) + best_lenpen, best_weight1, best_weight2, best_weight3, best_score = rerank.rerank( + rerank_args + ) + rerank_args = vars(args).copy() + rerank_args["lenpen"] = [best_lenpen] + rerank_args["weight1"] = [best_weight1] + rerank_args["weight2"] = [best_weight2] + rerank_args["weight3"] = [best_weight3] + + # write the hypothesis from the valid set from the best trial + + if args.gen_subset != "valid": + rerank_args["gen_subset"] = "valid" + rerank_args = argparse.Namespace(**rerank_args) + rerank.rerank(rerank_args) + + # test with the best hyperparameters on gen subset + rerank_args = vars(args).copy() + rerank_args["gen_subset"] = args.gen_subset + rerank_args["lenpen"] = [best_lenpen] + rerank_args["weight1"] = [best_weight1] + rerank_args["weight2"] = [best_weight2] + rerank_args["weight3"] = [best_weight3] + rerank_args = argparse.Namespace(**rerank_args) + rerank.rerank(rerank_args) + + +def cli_main(): + parser = rerank_options.get_tuning_parser() + args = options.parse_args_and_arch(parser) + + random_search(args) + + +if __name__ == "__main__": + cli_main() diff --git a/SpeechT5/fairseq/examples/noisychannel/rerank_utils.py b/SpeechT5/fairseq/examples/noisychannel/rerank_utils.py new file mode 100644 index 0000000000000000000000000000000000000000..2c6bf1b1afbb089cf5e84f720eb7a067479fbcbc --- /dev/null +++ b/SpeechT5/fairseq/examples/noisychannel/rerank_utils.py @@ -0,0 +1,850 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. 
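+# Overview of this module: reprocess() and reprocess_nbest() parse generate.py and
+# interactive.py output into per-sentence source/hypothesis/score dicts;
+# write_reprocessed() writes an n-best list back out as plain-text files for rescoring;
+# get_score() combines the rescoring models and the language model as
+# a * score1 + b * score2 + c * lm_score (optionally length-normalized) and divides
+# the result by target_len ** lenpen; get_directories() and rescore_file_name() define
+# the on-disk layout under rerank_data/; lm_scoring() binarizes the hypotheses and runs
+# eval_lm to obtain the language-model scores.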
+ +import math +import os +import re +import subprocess +from contextlib import redirect_stdout + +from fairseq import options +from fairseq_cli import eval_lm, preprocess + + +def reprocess(fle): + # takes in a file of generate.py translation generate_output + # returns a source dict and hypothesis dict, where keys are the ID num (as a string) + # and values and the corresponding source and translation. There may be several translations + # per source, so the values for hypothesis_dict are lists. + # parses output of generate.py + + with open(fle, "r") as f: + txt = f.read() + + """reprocess generate.py output""" + p = re.compile(r"[STHP][-]\d+\s*") + hp = re.compile(r"(\s*[-]?\d+[.]?\d+\s*)|(\s*(-inf)\s*)") + source_dict = {} + hypothesis_dict = {} + score_dict = {} + target_dict = {} + pos_score_dict = {} + lines = txt.split("\n") + + for line in lines: + line += "\n" + prefix = re.search(p, line) + if prefix is not None: + assert len(prefix.group()) > 2, "prefix id not found" + _, j = prefix.span() + id_num = prefix.group()[2:] + id_num = int(id_num) + line_type = prefix.group()[0] + if line_type == "H": + h_txt = line[j:] + hypo = re.search(hp, h_txt) + assert ( + hypo is not None + ), "regular expression failed to find the hypothesis scoring" + _, i = hypo.span() + score = hypo.group() + if id_num in hypothesis_dict: + hypothesis_dict[id_num].append(h_txt[i:]) + score_dict[id_num].append(float(score)) + else: + hypothesis_dict[id_num] = [h_txt[i:]] + score_dict[id_num] = [float(score)] + + elif line_type == "S": + source_dict[id_num] = line[j:] + elif line_type == "T": + target_dict[id_num] = line[j:] + elif line_type == "P": + pos_scores = (line[j:]).split() + pos_scores = [float(x) for x in pos_scores] + if id_num in pos_score_dict: + pos_score_dict[id_num].append(pos_scores) + else: + pos_score_dict[id_num] = [pos_scores] + + return source_dict, hypothesis_dict, score_dict, target_dict, pos_score_dict + + +def reprocess_nbest(fle): + """reprocess interactive.py output""" + with open(fle, "r") as f: + txt = f.read() + + source_dict = {} + hypothesis_dict = {} + score_dict = {} + target_dict = {} + pos_score_dict = {} + lines = txt.split("\n") + + hp = re.compile(r"[-]?\d+[.]?\d+") + j = -1 + + for _i, line in enumerate(lines): + line += "\n" + line_type = line[0] + + if line_type == "H": + hypo = re.search(hp, line) + _, start_index = hypo.span() + score = hypo.group() + if j in score_dict: + score_dict[j].append(float(score)) + hypothesis_dict[j].append(line[start_index:].strip("\t")) + else: + score_dict[j] = [float(score)] + hypothesis_dict[j] = [line[start_index:].strip("\t")] + elif line_type == "O": + j += 1 + source_dict[j] = line[2:] + # we don't have the targets for interactive.py + target_dict[j] = "filler" + + elif line_type == "P": + pos_scores = [float(pos_score) for pos_score in line.split()[1:]] + if j in pos_score_dict: + pos_score_dict[j].append(pos_scores) + else: + pos_score_dict[j] = [pos_scores] + + assert source_dict.keys() == hypothesis_dict.keys() + assert source_dict.keys() == pos_score_dict.keys() + assert source_dict.keys() == score_dict.keys() + + return source_dict, hypothesis_dict, score_dict, target_dict, pos_score_dict + + +def write_reprocessed( + sources, + hypos, + targets, + source_outfile, + hypo_outfile, + target_outfile, + right_to_left=False, + prefix_len=None, + bpe_symbol=None, + target_prefix_frac=None, + source_prefix_frac=None, +): + + """writes nbest hypothesis for rescoring""" + assert not ( + prefix_len is not None and 
target_prefix_frac is not None + ), "in writing reprocessed, only one type of prefix may be used" + assert not ( + prefix_len is not None and source_prefix_frac is not None + ), "in writing reprocessed, only one type of prefix may be used" + assert not ( + target_prefix_frac is not None and source_prefix_frac is not None + ), "in writing reprocessed, only one type of prefix may be used" + + with open(source_outfile, "w") as source_file, open( + hypo_outfile, "w" + ) as hypo_file, open(target_outfile, "w") as target_file: + + assert len(sources) == len(hypos), "sources and hypos list length mismatch" + if right_to_left: + for i in range(len(sources)): + for j in range(len(hypos[i])): + if prefix_len is None: + hypo_file.write(make_right_to_left(hypos[i][j]) + "\n") + else: + raise NotImplementedError() + source_file.write(make_right_to_left(sources[i]) + "\n") + target_file.write(make_right_to_left(targets[i]) + "\n") + else: + for i in sorted(sources.keys()): + for j in range(len(hypos[i])): + if prefix_len is not None: + shortened = ( + get_prefix_no_bpe(hypos[i][j], bpe_symbol, prefix_len) + + "\n" + ) + hypo_file.write(shortened) + source_file.write(sources[i]) + target_file.write(targets[i]) + elif target_prefix_frac is not None: + num_words, shortened, num_bpe_tokens = calc_length_from_frac( + hypos[i][j], target_prefix_frac, bpe_symbol + ) + shortened += "\n" + hypo_file.write(shortened) + source_file.write(sources[i]) + target_file.write(targets[i]) + elif source_prefix_frac is not None: + num_words, shortened, num_bpe_tokensn = calc_length_from_frac( + sources[i], source_prefix_frac, bpe_symbol + ) + shortened += "\n" + hypo_file.write(hypos[i][j]) + source_file.write(shortened) + target_file.write(targets[i]) + else: + hypo_file.write(hypos[i][j]) + source_file.write(sources[i]) + target_file.write(targets[i]) + + +def calc_length_from_frac(bpe_sentence, prefix_frac, bpe_symbol): + # return number of words, (not bpe tokens) that we want + no_bpe_sen = remove_bpe(bpe_sentence, bpe_symbol) + len_sen = len(no_bpe_sen.split()) + + num_words = math.ceil(len_sen * prefix_frac) + prefix = get_prefix_no_bpe(bpe_sentence, bpe_symbol, num_words) + num_bpe_tokens = len(prefix.split()) + return num_words, prefix, num_bpe_tokens + + +def get_prefix(sentence, prefix_len): + """assuming no bpe, gets the prefix of the sentence with prefix_len words""" + tokens = sentence.strip("\n").split() + if prefix_len >= len(tokens): + return sentence.strip("\n") + else: + return " ".join(tokens[:prefix_len]) + + +def get_prefix_no_bpe(sentence, bpe_symbol, prefix_len): + if bpe_symbol is None: + return get_prefix(sentence, prefix_len) + else: + return " ".join(get_prefix_from_len(sentence.split(), bpe_symbol, prefix_len)) + + +def get_prefix_from_len(sentence, bpe_symbol, prefix_len): + """get the prefix of sentence with bpe, with prefix len in terms of words, not bpe tokens""" + bpe_count = sum([bpe_symbol.strip(" ") in t for t in sentence[:prefix_len]]) + if bpe_count == 0: + return sentence[:prefix_len] + else: + return sentence[:prefix_len] + get_prefix_from_len( + sentence[prefix_len:], bpe_symbol, bpe_count + ) + + +def get_num_bpe_tokens_from_len(sentence, bpe_symbol, prefix_len): + """given a prefix length in terms of words, return the number of bpe tokens""" + prefix = get_prefix_no_bpe(sentence, bpe_symbol, prefix_len) + assert len(remove_bpe(prefix, bpe_symbol).split()) <= prefix_len + return len(prefix.split(" ")) + + +def make_right_to_left(line): + tokens = line.split() + tokens.reverse() + 
new_line = " ".join(tokens) + return new_line + + +def remove_bpe(line, bpe_symbol): + line = line.replace("\n", "") + line = (line + " ").replace(bpe_symbol, "").rstrip() + return line + ("\n") + + +def remove_bpe_dict(pred_dict, bpe_symbol): + new_dict = {} + for i in pred_dict: + if type(pred_dict[i]) == list: + new_list = [remove_bpe(elem, bpe_symbol) for elem in pred_dict[i]] + new_dict[i] = new_list + else: + new_dict[i] = remove_bpe(pred_dict[i], bpe_symbol) + return new_dict + + +def parse_bleu_scoring(line): + p = re.compile(r"(BLEU4 = )\d+[.]\d+") + res = re.search(p, line) + assert res is not None, line + return float(res.group()[8:]) + + +def get_full_from_prefix(hypo_prefix, hypos): + """given a hypo prefix, recover the first hypo from the list of complete hypos beginning with that prefix""" + for hypo in hypos: + hypo_prefix = hypo_prefix.strip("\n") + len_prefix = len(hypo_prefix) + if hypo[:len_prefix] == hypo_prefix: + return hypo + # no match found + raise Exception() + + +def get_score( + a, + b, + c, + target_len, + bitext_score1, + bitext_score2=None, + lm_score=None, + lenpen=None, + src_len=None, + tgt_len=None, + bitext1_backwards=False, + bitext2_backwards=False, + normalize=False, +): + if bitext1_backwards: + bitext1_norm = src_len + else: + bitext1_norm = tgt_len + if bitext_score2 is not None: + if bitext2_backwards: + bitext2_norm = src_len + else: + bitext2_norm = tgt_len + else: + bitext2_norm = 1 + bitext_score2 = 0 + if normalize: + score = ( + a * bitext_score1 / bitext1_norm + + b * bitext_score2 / bitext2_norm + + c * lm_score / src_len + ) + else: + score = a * bitext_score1 + b * bitext_score2 + c * lm_score + + if lenpen is not None: + score /= (target_len) ** float(lenpen) + + return score + + +class BitextOutput(object): + def __init__( + self, + output_file, + backwards, + right_to_left, + bpe_symbol, + prefix_len=None, + target_prefix_frac=None, + source_prefix_frac=None, + ): + """process output from rescoring""" + source, hypo, score, target, pos_score = reprocess(output_file) + if backwards: + self.hypo_fracs = source_prefix_frac + else: + self.hypo_fracs = target_prefix_frac + + # remove length penalty so we can use raw scores + score, num_bpe_tokens = get_score_from_pos( + pos_score, prefix_len, hypo, bpe_symbol, self.hypo_fracs, backwards + ) + source_lengths = {} + target_lengths = {} + + assert hypo.keys() == source.keys(), "key mismatch" + if backwards: + tmp = hypo + hypo = source + source = tmp + for i in source: + # since we are reranking, there should only be one hypo per source sentence + if backwards: + len_src = len(source[i][0].split()) + # record length without <eos> + if len_src == num_bpe_tokens[i][0] - 1: + source_lengths[i] = num_bpe_tokens[i][0] - 1 + else: + source_lengths[i] = num_bpe_tokens[i][0] + + target_lengths[i] = len(hypo[i].split()) + + source[i] = remove_bpe(source[i][0], bpe_symbol) + target[i] = remove_bpe(target[i], bpe_symbol) + hypo[i] = remove_bpe(hypo[i], bpe_symbol) + + score[i] = float(score[i][0]) + pos_score[i] = pos_score[i][0] + + else: + len_tgt = len(hypo[i][0].split()) + # record length without <eos> + if len_tgt == num_bpe_tokens[i][0] - 1: + target_lengths[i] = num_bpe_tokens[i][0] - 1 + else: + target_lengths[i] = num_bpe_tokens[i][0] + + source_lengths[i] = len(source[i].split()) + + if right_to_left: + source[i] = remove_bpe(make_right_to_left(source[i]), bpe_symbol) + target[i] = remove_bpe(make_right_to_left(target[i]), bpe_symbol) + hypo[i] = remove_bpe(make_right_to_left(hypo[i][0]), 
bpe_symbol) + score[i] = float(score[i][0]) + pos_score[i] = pos_score[i][0] + else: + assert ( + len(hypo[i]) == 1 + ), "expected only one hypothesis per source sentence" + source[i] = remove_bpe(source[i], bpe_symbol) + target[i] = remove_bpe(target[i], bpe_symbol) + hypo[i] = remove_bpe(hypo[i][0], bpe_symbol) + score[i] = float(score[i][0]) + pos_score[i] = pos_score[i][0] + + self.rescore_source = source + self.rescore_hypo = hypo + self.rescore_score = score + self.rescore_target = target + self.rescore_pos_score = pos_score + self.backwards = backwards + self.right_to_left = right_to_left + self.target_lengths = target_lengths + self.source_lengths = source_lengths + + +class BitextOutputFromGen(object): + def __init__( + self, + predictions_bpe_file, + bpe_symbol=None, + nbest=False, + prefix_len=None, + target_prefix_frac=None, + ): + if nbest: + ( + pred_source, + pred_hypo, + pred_score, + pred_target, + pred_pos_score, + ) = reprocess_nbest(predictions_bpe_file) + else: + pred_source, pred_hypo, pred_score, pred_target, pred_pos_score = reprocess( + predictions_bpe_file + ) + + assert len(pred_source) == len(pred_hypo) + assert len(pred_source) == len(pred_score) + assert len(pred_source) == len(pred_target) + assert len(pred_source) == len(pred_pos_score) + + # remove length penalty so we can use raw scores + pred_score, num_bpe_tokens = get_score_from_pos( + pred_pos_score, prefix_len, pred_hypo, bpe_symbol, target_prefix_frac, False + ) + + self.source = pred_source + self.target = pred_target + self.score = pred_score + self.pos_score = pred_pos_score + self.hypo = pred_hypo + self.target_lengths = {} + self.source_lengths = {} + + self.no_bpe_source = remove_bpe_dict(pred_source.copy(), bpe_symbol) + self.no_bpe_hypo = remove_bpe_dict(pred_hypo.copy(), bpe_symbol) + self.no_bpe_target = remove_bpe_dict(pred_target.copy(), bpe_symbol) + + # indexes to match those from the rescoring models + self.rescore_source = {} + self.rescore_target = {} + self.rescore_pos_score = {} + self.rescore_hypo = {} + self.rescore_score = {} + self.num_hypos = {} + self.backwards = False + self.right_to_left = False + + index = 0 + + for i in sorted(pred_source.keys()): + for j in range(len(pred_hypo[i])): + + self.target_lengths[index] = len(self.hypo[i][j].split()) + self.source_lengths[index] = len(self.source[i].split()) + + self.rescore_source[index] = self.no_bpe_source[i] + self.rescore_target[index] = self.no_bpe_target[i] + self.rescore_hypo[index] = self.no_bpe_hypo[i][j] + self.rescore_score[index] = float(pred_score[i][j]) + self.rescore_pos_score[index] = pred_pos_score[i][j] + self.num_hypos[index] = len(pred_hypo[i]) + index += 1 + + +def get_score_from_pos( + pos_score_dict, prefix_len, hypo_dict, bpe_symbol, hypo_frac, backwards +): + score_dict = {} + num_bpe_tokens_dict = {} + assert prefix_len is None or hypo_frac is None + for key in pos_score_dict: + score_dict[key] = [] + num_bpe_tokens_dict[key] = [] + for i in range(len(pos_score_dict[key])): + if prefix_len is not None and not backwards: + num_bpe_tokens = get_num_bpe_tokens_from_len( + hypo_dict[key][i], bpe_symbol, prefix_len + ) + score_dict[key].append(sum(pos_score_dict[key][i][:num_bpe_tokens])) + num_bpe_tokens_dict[key].append(num_bpe_tokens) + elif hypo_frac is not None: + num_words, shortened, hypo_prefix_len = calc_length_from_frac( + hypo_dict[key][i], hypo_frac, bpe_symbol + ) + score_dict[key].append(sum(pos_score_dict[key][i][:hypo_prefix_len])) + num_bpe_tokens_dict[key].append(hypo_prefix_len) + else: 
+ score_dict[key].append(sum(pos_score_dict[key][i])) + num_bpe_tokens_dict[key].append(len(pos_score_dict[key][i])) + return score_dict, num_bpe_tokens_dict + + +class LMOutput(object): + def __init__( + self, + lm_score_file, + lm_dict=None, + prefix_len=None, + bpe_symbol=None, + target_prefix_frac=None, + ): + ( + lm_sentences, + lm_sen_scores, + lm_sen_pos_scores, + lm_no_bpe_sentences, + lm_bpe_tokens, + ) = parse_lm( + lm_score_file, + prefix_len=prefix_len, + bpe_symbol=bpe_symbol, + target_prefix_frac=target_prefix_frac, + ) + + self.sentences = lm_sentences + self.score = lm_sen_scores + self.pos_score = lm_sen_pos_scores + self.lm_dict = lm_dict + self.no_bpe_sentences = lm_no_bpe_sentences + self.bpe_tokens = lm_bpe_tokens + + +def parse_lm(input_file, prefix_len=None, bpe_symbol=None, target_prefix_frac=None): + """parse output of eval_lm""" + with open(input_file, "r") as f: + text = f.readlines() + text = text[7:] + cleaned_text = text[:-2] + + sentences = {} + sen_scores = {} + sen_pos_scores = {} + no_bpe_sentences = {} + num_bpe_tokens_dict = {} + for _i, line in enumerate(cleaned_text): + tokens = line.split() + if tokens[0].isdigit(): + line_id = int(tokens[0]) + scores = [float(x[1:-1]) for x in tokens[2::2]] + sentences[line_id] = " ".join(tokens[1::2][:-1]) + "\n" + if bpe_symbol is not None: + # exclude <eos> symbol to match output from generate.py + bpe_sen = " ".join(tokens[1::2][:-1]) + "\n" + no_bpe_sen = remove_bpe(bpe_sen, bpe_symbol) + no_bpe_sentences[line_id] = no_bpe_sen + + if prefix_len is not None: + num_bpe_tokens = get_num_bpe_tokens_from_len( + bpe_sen, bpe_symbol, prefix_len + ) + sen_scores[line_id] = sum(scores[:num_bpe_tokens]) + num_bpe_tokens_dict[line_id] = num_bpe_tokens + elif target_prefix_frac is not None: + num_words, shortened, target_prefix_len = calc_length_from_frac( + bpe_sen, target_prefix_frac, bpe_symbol + ) + sen_scores[line_id] = sum(scores[:target_prefix_len]) + num_bpe_tokens_dict[line_id] = target_prefix_len + else: + sen_scores[line_id] = sum(scores) + num_bpe_tokens_dict[line_id] = len(scores) + + sen_pos_scores[line_id] = scores + + return sentences, sen_scores, sen_pos_scores, no_bpe_sentences, num_bpe_tokens_dict + + +def get_directories( + data_dir_name, + num_rescore, + gen_subset, + fw_name, + shard_id, + num_shards, + sampling=False, + prefix_len=None, + target_prefix_frac=None, + source_prefix_frac=None, +): + nbest_file_id = ( + "nbest_" + + str(num_rescore) + + "_subset_" + + gen_subset + + "_fw_name_" + + fw_name + + "_shard_" + + str(shard_id) + + "_of_" + + str(num_shards) + ) + + if sampling: + nbest_file_id += "_sampling" + + # the directory containing all information for this nbest list + pre_gen = ( + os.path.join(os.path.dirname(__file__)) + + "/rerank_data/" + + data_dir_name + + "/" + + nbest_file_id + ) + # the directory to store the preprocessed nbest list, for left to right rescoring + left_to_right_preprocessed_dir = pre_gen + "/left_to_right_preprocessed" + if source_prefix_frac is not None: + left_to_right_preprocessed_dir = ( + left_to_right_preprocessed_dir + "/prefix_frac" + str(source_prefix_frac) + ) + # the directory to store the preprocessed nbest list, for right to left rescoring + right_to_left_preprocessed_dir = pre_gen + "/right_to_left_preprocessed" + # the directory to store the preprocessed nbest list, for backwards rescoring + backwards_preprocessed_dir = pre_gen + "/backwards" + if target_prefix_frac is not None: + backwards_preprocessed_dir = ( + backwards_preprocessed_dir + 
"/prefix_frac" + str(target_prefix_frac) + ) + elif prefix_len is not None: + backwards_preprocessed_dir = ( + backwards_preprocessed_dir + "/prefix_" + str(prefix_len) + ) + + # the directory to store the preprocessed nbest list, for rescoring with P(T) + lm_preprocessed_dir = pre_gen + "/lm_preprocessed" + + return ( + pre_gen, + left_to_right_preprocessed_dir, + right_to_left_preprocessed_dir, + backwards_preprocessed_dir, + lm_preprocessed_dir, + ) + + +def lm_scoring( + preprocess_directory, + bpe_status, + gen_output, + pre_gen, + cur_lm_dict, + cur_lm_name, + cur_language_model, + cur_lm_bpe_code, + batch_size, + lm_score_file, + target_lang, + source_lang, + prefix_len=None, +): + if prefix_len is not None: + assert ( + bpe_status == "different" + ), "bpe status must be different to use prefix len" + if bpe_status == "no bpe": + # run lm on output without bpe + write_reprocessed( + gen_output.no_bpe_source, + gen_output.no_bpe_hypo, + gen_output.no_bpe_target, + pre_gen + "/rescore_data_no_bpe.de", + pre_gen + "/rescore_data_no_bpe.en", + pre_gen + "/reference_file_no_bpe", + ) + + preprocess_lm_param = [ + "--only-source", + "--trainpref", + pre_gen + "/rescore_data_no_bpe." + target_lang, + "--srcdict", + cur_lm_dict, + "--destdir", + preprocess_directory, + ] + preprocess_parser = options.get_preprocessing_parser() + input_args = preprocess_parser.parse_args(preprocess_lm_param) + preprocess.main(input_args) + + eval_lm_param = [ + preprocess_directory, + "--path", + cur_language_model, + "--output-word-probs", + "--batch-size", + str(batch_size), + "--max-tokens", + "1024", + "--sample-break-mode", + "eos", + "--gen-subset", + "train", + ] + + eval_lm_parser = options.get_eval_lm_parser() + input_args = options.parse_args_and_arch(eval_lm_parser, eval_lm_param) + + with open(lm_score_file, "w") as f: + with redirect_stdout(f): + eval_lm.main(input_args) + + elif bpe_status == "shared": + preprocess_lm_param = [ + "--only-source", + "--trainpref", + pre_gen + "/rescore_data." + target_lang, + "--srcdict", + cur_lm_dict, + "--destdir", + preprocess_directory, + ] + preprocess_parser = options.get_preprocessing_parser() + input_args = preprocess_parser.parse_args(preprocess_lm_param) + preprocess.main(input_args) + + eval_lm_param = [ + preprocess_directory, + "--path", + cur_language_model, + "--output-word-probs", + "--batch-size", + str(batch_size), + "--sample-break-mode", + "eos", + "--gen-subset", + "train", + ] + + eval_lm_parser = options.get_eval_lm_parser() + input_args = options.parse_args_and_arch(eval_lm_parser, eval_lm_param) + + with open(lm_score_file, "w") as f: + with redirect_stdout(f): + eval_lm.main(input_args) + + elif bpe_status == "different": + rescore_file = pre_gen + "/rescore_data_no_bpe" + rescore_bpe = pre_gen + "/rescore_data_new_bpe" + + rescore_file += "." + rescore_bpe += "." 
+ + write_reprocessed( + gen_output.no_bpe_source, + gen_output.no_bpe_hypo, + gen_output.no_bpe_target, + rescore_file + source_lang, + rescore_file + target_lang, + pre_gen + "/reference_file_no_bpe", + bpe_symbol=None, + ) + + # apply LM bpe to nbest list + bpe_src_param = [ + "-c", + cur_lm_bpe_code, + "--input", + rescore_file + target_lang, + "--output", + rescore_bpe + target_lang, + ] + subprocess.call( + [ + "python", + os.path.join( + os.path.dirname(__file__), "subword-nmt/subword_nmt/apply_bpe.py" + ), + ] + + bpe_src_param, + shell=False, + ) + # uncomment to use fastbpe instead of subword-nmt bpe + # bpe_src_param = [rescore_bpe+target_lang, rescore_file+target_lang, cur_lm_bpe_code] + # subprocess.call(["/private/home/edunov/fastBPE/fast", "applybpe"] + bpe_src_param, shell=False) + + preprocess_dir = preprocess_directory + + preprocess_lm_param = [ + "--only-source", + "--trainpref", + rescore_bpe + target_lang, + "--srcdict", + cur_lm_dict, + "--destdir", + preprocess_dir, + ] + preprocess_parser = options.get_preprocessing_parser() + input_args = preprocess_parser.parse_args(preprocess_lm_param) + preprocess.main(input_args) + + eval_lm_param = [ + preprocess_dir, + "--path", + cur_language_model, + "--output-word-probs", + "--batch-size", + str(batch_size), + "--max-tokens", + "1024", + "--sample-break-mode", + "eos", + "--gen-subset", + "train", + ] + + eval_lm_parser = options.get_eval_lm_parser() + input_args = options.parse_args_and_arch(eval_lm_parser, eval_lm_param) + + with open(lm_score_file, "w") as f: + with redirect_stdout(f): + eval_lm.main(input_args) + + +def rescore_file_name( + nbest_dir, + prefix_len, + scorer_name, + lm_file=False, + target_prefix_frac=None, + source_prefix_frac=None, + backwards=None, +): + if lm_file: + score_file = nbest_dir + "/lm_score_translations_model_" + scorer_name + ".txt" + else: + score_file = nbest_dir + "/" + scorer_name + "_score_translations.txt" + if backwards: + if prefix_len is not None: + score_file += "prefix_len" + str(prefix_len) + elif target_prefix_frac is not None: + score_file += "target_prefix_frac" + str(target_prefix_frac) + else: + if source_prefix_frac is not None: + score_file += "source_prefix_frac" + str(source_prefix_frac) + return score_file diff --git a/SpeechT5/fairseq/examples/nonautoregressive_translation/README.md b/SpeechT5/fairseq/examples/nonautoregressive_translation/README.md new file mode 100644 index 0000000000000000000000000000000000000000..8793e225c99732c42c9c19e22075cde37c73341d --- /dev/null +++ b/SpeechT5/fairseq/examples/nonautoregressive_translation/README.md @@ -0,0 +1,146 @@ +# Non-autoregressive Neural Machine Translation (NAT) + +This page mainly includes instructions for reproducing results from the following papers +* [Levenshtein Transformer (Gu et al., 2019)](https://arxiv.org/abs/1905.11006). +* [Understanding Knowledge Distillation in Non-autoregressive Machine Translation (Zhou et al., 2019)](https://arxiv.org/abs/1911.02727). 
+ +We also provided our own implementations for several popular non-autoregressive-based models as reference:<br> +* [Non-Autoregressive Neural Machine Translation (Gu et al., 2017)](https://arxiv.org/abs/1711.02281)<br> +* [Deterministic Non-Autoregressive Neural Sequence Modeling by Iterative Refinement (Lee et al., 2018)](https://arxiv.org/abs/1802.06901)<br> +* [Insertion Transformer: Flexible Sequence Generation via Insertion Operations (Stern et al., 2019)](https://arxiv.org/abs/1902.03249)<br> +* [Mask-Predict: Parallel Decoding of Conditional Masked Language Models (Ghazvininejad et al., 2019)](https://arxiv.org/abs/1904.09324v2)<br> +* [Fast Structured Decoding for Sequence Models (Sun et al., 2019)](https://arxiv.org/abs/1910.11555) + +## Dataset + +First, follow the [instructions to download and preprocess the WMT'14 En-De dataset](../translation#wmt14-english-to-german-convolutional). +Make sure to learn a joint vocabulary by passing the `--joined-dictionary` option to `fairseq-preprocess`. + +### Knowledge Distillation +Following [Gu et al. 2019](https://arxiv.org/abs/1905.11006), [knowledge distillation](https://arxiv.org/abs/1606.07947) from an autoregressive model can effectively simplify the training data distribution, which is sometimes essential for NAT-based models to learn good translations. +The easiest way of performing distillation is to follow the [instructions of training a standard transformer model](../translation) on the same data, and then decode the training set to produce a distillation dataset for NAT. + +### Download +We also provided the preprocessed [original](http://dl.fbaipublicfiles.com/nat/original_dataset.zip) and [distillation](http://dl.fbaipublicfiles.com/nat/distill_dataset.zip) datasets. Please build the binarized dataset on your own. + + +## Train a model + +Then we can train a nonautoregressive model using the `translation_lev` task and a new criterion `nat_loss`. +Use the `--noise` flag to specify the input noise used on the target sentences. +In default, we run the task for *Levenshtein Transformer*, with `--noise='random_delete'`. Full scripts to run other models can also be found [here](./scripts.md). + +The following command will train a *Levenshtein Transformer* on the binarized dataset. + +```bash +fairseq-train \ + data-bin/wmt14_en_de_distill \ + --save-dir checkpoints \ + --ddp-backend=legacy_ddp \ + --task translation_lev \ + --criterion nat_loss \ + --arch levenshtein_transformer \ + --noise random_delete \ + --share-all-embeddings \ + --optimizer adam --adam-betas '(0.9,0.98)' \ + --lr 0.0005 --lr-scheduler inverse_sqrt \ + --stop-min-lr '1e-09' --warmup-updates 10000 \ + --warmup-init-lr '1e-07' --label-smoothing 0.1 \ + --dropout 0.3 --weight-decay 0.01 \ + --decoder-learned-pos \ + --encoder-learned-pos \ + --apply-bert-init \ + --log-format 'simple' --log-interval 100 \ + --fixed-validation-seed 7 \ + --max-tokens 8000 \ + --save-interval-updates 10000 \ + --max-update 300000 +``` + +## Translate + +Once a model is trained, we can generate translations using an `iterative_refinement_generator` which will based on the model's initial output and iteratively read and greedily refine the translation until (1) the model predicts the same translations for two consecutive iterations; or (2) the generator reaches the maximum iterations (`--iter-decode-max-iter`). Use `--print-step` to check the actual # of iteration for each sentence. 
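+As a rough illustration of this stopping rule, here is a minimal sketch (placeholder
+names such as `refine_step` and `initial_output` are assumptions for illustration, not
+the actual `iterative_refinement_generator` API):
+```python
+# Sketch of the stopping rule: refine until the output stops changing between two
+# consecutive iterations, or until the iteration cap is reached.
+# `refine_step` is a hypothetical callable standing in for one decode-and-refine pass.
+def refine_until_stable(refine_step, initial_output, max_iter=9):
+    prev, cur = None, initial_output
+    steps = 0
+    while steps < max_iter and cur != prev:  # (2) iteration cap, (1) convergence check
+        prev, cur = cur, refine_step(cur)    # read the current output and greedily refine it
+        steps += 1
+    return cur, steps                        # roughly what --print-step reports per sentence
+```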
+ +For *Levenshtein Transformer*, it sometimes helps to apply a `--iter-decode-eos-penalty` (typically, 0~3) to penalize the model finishing generation too early and generating too short translations. + +For example, to generate with `--iter-decode-max-iter=9`: +```bash +fairseq-generate \ + data-bin/wmt14_en_de_distill \ + --gen-subset test \ + --task translation_lev \ + --path checkpoints/checkpoint_best.pt \ + --iter-decode-max-iter 9 \ + --iter-decode-eos-penalty 0 \ + --beam 1 --remove-bpe \ + --print-step \ + --batch-size 400 +``` +In the end of the generation, we can see the tokenized BLEU score for the translation. + +## Advanced Decoding Methods +### Ensemble +The NAT models use special implementations of [ensembling](https://github.com/fairinternal/fairseq-py/blob/b98d88da52f2f21f1b169bab8c70c1c4ca19a768/fairseq/sequence_generator.py#L522) to support iterative refinement and a variety of parallel operations in different models, while it shares the same API as standard autoregressive models as follows: +```bash +fairseq-generate \ + data-bin/wmt14_en_de_distill \ + --gen-subset test \ + --task translation_lev \ + --path checkpoint_1.pt:checkpoint_2.pt:checkpoint_3.pt \ + --iter-decode-max-iter 9 \ + --iter-decode-eos-penalty 0 \ + --beam 1 --remove-bpe \ + --print-step \ + --batch-size 400 +``` +We use ``:`` to split multiple models. Note that, not all NAT models support ensembling for now. + + +### Length-beam +For models that predict lengths before decoding (e.g. the vanilla NAT, Mask-Predict, etc), it is possible to improve the translation quality by varying the target lengths around the predicted value, and translating the same example multiple times in parallel. We can select the best translation with the highest scores defined by your model's output. + +Note that, not all models support length beams. For models which dynamically change the lengths (e.g. *Insertion Transformer*, *Levenshtein Transformer*), the same trick does not apply. + +### Re-ranking +If the model generates multiple translations with length beam, we can also introduce an autoregressive model to rerank the translations considering scoring from an autoregressive model is much faster than decoding from that. + +For example, to generate translations with length beam and reranking, +```bash +fairseq-generate \ + data-bin/wmt14_en_de_distill \ + --gen-subset test \ + --task translation_lev \ + --path checkpoints/checkpoint_best.pt:at_checkpoints/checkpoint_best.pt \ + --iter-decode-max-iter 9 \ + --iter-decode-eos-penalty 0 \ + --iter-decode-with-beam 9 \ + --iter-decode-with-external-reranker \ + --beam 1 --remove-bpe \ + --print-step \ + --batch-size 100 +``` +Note that we need to make sure the autoregressive model shares the same vocabulary as our target non-autoregressive model. + + +## Citation + +```bibtex +@incollection{NIPS2019_9297, + title = {Levenshtein Transformer}, + author = {Gu, Jiatao and Wang, Changhan and Zhao, Junbo}, + booktitle = {Advances in Neural Information Processing Systems 32}, + editor = {H. Wallach and H. Larochelle and A. Beygelzimer and F. d\textquotesingle Alch\'{e}-Buc and E. Fox and R. 
Garnett}, + pages = {11179--11189}, + year = {2019}, + publisher = {Curran Associates, Inc.}, + url = {http://papers.nips.cc/paper/9297-levenshtein-transformer.pdf} +} +``` +```bibtex +@article{zhou2019understanding, + title={Understanding Knowledge Distillation in Non-autoregressive Machine Translation}, + author={Zhou, Chunting and Neubig, Graham and Gu, Jiatao}, + journal={arXiv preprint arXiv:1911.02727}, + year={2019} +} +``` diff --git a/SpeechT5/fairseq/examples/nonautoregressive_translation/scripts.md b/SpeechT5/fairseq/examples/nonautoregressive_translation/scripts.md new file mode 100644 index 0000000000000000000000000000000000000000..9d3d7b67dc08440b5f4d1c5a7ffcd4bd6e76c14f --- /dev/null +++ b/SpeechT5/fairseq/examples/nonautoregressive_translation/scripts.md @@ -0,0 +1,179 @@ +# Examples of Training scripts for Non-autoregressive Machine Translation models + +### Non-autoregressive Transformer (NAT, Gu et al., 2017) +Note that we need to have an additional module to perform "length prediction" (`--length-loss-factor`) before generating the whole sequence. +```bash +fairseq-train \ + data-bin/wmt14_en_de_distill \ + --save-dir checkpoints \ + --ddp-backend=legacy_ddp \ + --task translation_lev \ + --criterion nat_loss \ + --arch nonautoregressive_transformer \ + --noise full_mask \ + --share-all-embeddings \ + --optimizer adam --adam-betas '(0.9,0.98)' \ + --lr 0.0005 --lr-scheduler inverse_sqrt \ + --stop-min-lr '1e-09' --warmup-updates 10000 \ + --warmup-init-lr '1e-07' --label-smoothing 0.1 \ + --dropout 0.3 --weight-decay 0.01 \ + --decoder-learned-pos \ + --encoder-learned-pos \ + --pred-length-offset \ + --length-loss-factor 0.1 \ + --apply-bert-init \ + --log-format 'simple' --log-interval 100 \ + --fixed-validation-seed 7 \ + --max-tokens 8000 \ + --save-interval-updates 10000 \ + --max-update 300000 +``` + +### Fast Structured Decoding for Sequence Models (NAT-CRF, Sun et al., 2019) +Note that we implemented a low-rank appromixated CRF model by setting `--crf-lowrank-approx=32` and `--crf-beam-approx=64` as discribed in the original paper. All other settings are the same as the vanilla NAT model. +```bash +fairseq-train \ + data-bin/wmt14_en_de_distill \ + --save-dir checkpoints \ + --ddp-backend=legacy_ddp \ + --task translation_lev \ + --criterion nat_loss \ + --arch nacrf_transformer \ + --noise full_mask \ + --share-all-embeddings \ + --optimizer adam --adam-betas '(0.9,0.98)' \ + --lr 0.0005 --lr-scheduler inverse_sqrt \ + --stop-min-lr '1e-09' --warmup-updates 10000 \ + --warmup-init-lr '1e-07' --label-smoothing 0.1 \ + --dropout 0.3 --weight-decay 0.01 \ + --decoder-learned-pos \ + --encoder-learned-pos \ + --pred-length-offset \ + --length-loss-factor 0.1 \ + --word-ins-loss-factor 0.5 \ + --crf-lowrank-approx 32 \ + --crf-beam-approx 64 \ + --apply-bert-init \ + --log-format 'simple' --log-interval 100 \ + --fixed-validation-seed 7 \ + --max-tokens 8000 \ + --save-interval-updates 10000 \ + --max-update 300000 +``` + + +### Non-autoregressive Transformer with Iterative Refinement (iNAT, Lee et al., 2018) +Note that `--train-step` means how many iterations of refinement we used during training, and `--dae-ratio` controls the ratio of denoising auto-encoder training described in the original paper. 
+```bash +fairseq-train \ + data-bin/wmt14_en_de_distill \ + --save-dir checkpoints \ + --ddp-backend=legacy_ddp \ + --task translation_lev \ + --criterion nat_loss \ + --arch iterative_nonautoregressive_transformer \ + --noise full_mask \ + --share-all-embeddings \ + --optimizer adam --adam-betas '(0.9,0.98)' \ + --lr 0.0005 --lr-scheduler inverse_sqrt \ + --stop-min-lr '1e-09' --warmup-updates 10000 \ + --warmup-init-lr '1e-07' --label-smoothing 0.1 \ + --dropout 0.3 --weight-decay 0.01 \ + --decoder-learned-pos \ + --encoder-learned-pos \ + --pred-length-offset \ + --length-loss-factor 0.1 \ + --train-step 4 \ + --dae-ratio 0.5 \ + --stochastic-approx \ + --apply-bert-init \ + --log-format 'simple' --log-interval 100 \ + --fixed-validation-seed 7 \ + --max-tokens 8000 \ + --save-interval-updates 10000 \ + --max-update 300000 +``` + +### Insertion Transformer (InsT, Stern et al., 2019) +Note that we need to specify the "slot-loss" (uniform or balanced tree) described in the original paper. Here we use `--label-tau` to control the temperature. + +```bash +fairseq-train \ + data-bin/wmt14_en_de_distill \ + --save-dir checkpoints \ + --ddp-backend=legacy_ddp \ + --task translation_lev \ + --criterion nat_loss \ + --arch insertion_transformer \ + --noise random_delete \ + --share-all-embeddings \ + --optimizer adam --adam-betas '(0.9,0.98)' \ + --lr 0.0005 --lr-scheduler inverse_sqrt \ + --stop-min-lr '1e-09' --warmup-updates 10000 \ + --warmup-init-lr '1e-07' --label-smoothing 0.1 \ + --dropout 0.3 --weight-decay 0.01 \ + --decoder-learned-pos \ + --encoder-learned-pos \ + --apply-bert-init \ + --log-format 'simple' --log-interval 100 \ + --fixed-validation-seed 7 \ + --max-tokens 8000 \ + --save-interval-updates 10000 \ + --max-update 300000 +``` + + +### Mask Predict (CMLM, Ghazvininejad et al., 2019) +```bash +fairseq-train \ + data-bin/wmt14_en_de_distill \ + --save-dir checkpoints \ + --ddp-backend=legacy_ddp \ + --task translation_lev \ + --criterion nat_loss \ + --arch cmlm_transformer \ + --noise random_mask \ + --share-all-embeddings \ + --optimizer adam --adam-betas '(0.9,0.98)' \ + --lr 0.0005 --lr-scheduler inverse_sqrt \ + --stop-min-lr '1e-09' --warmup-updates 10000 \ + --warmup-init-lr '1e-07' --label-smoothing 0.1 \ + --dropout 0.3 --weight-decay 0.01 \ + --decoder-learned-pos \ + --encoder-learned-pos \ + --apply-bert-init \ + --log-format 'simple' --log-interval 100 \ + --fixed-validation-seed 7 \ + --max-tokens 8000 \ + --save-interval-updates 10000 \ + --max-update 300000 +``` + + + + +### Levenshtein Transformer (LevT, Gu et al., 2019) +```bash +fairseq-train \ + data-bin/wmt14_en_de_distill \ + --save-dir checkpoints \ + --ddp-backend=legacy_ddp \ + --task translation_lev \ + --criterion nat_loss \ + --arch levenshtein_transformer \ + --noise random_delete \ + --share-all-embeddings \ + --optimizer adam --adam-betas '(0.9,0.98)' \ + --lr 0.0005 --lr-scheduler inverse_sqrt \ + --stop-min-lr '1e-09' --warmup-updates 10000 \ + --warmup-init-lr '1e-07' --label-smoothing 0.1 \ + --dropout 0.3 --weight-decay 0.01 \ + --decoder-learned-pos \ + --encoder-learned-pos \ + --apply-bert-init \ + --log-format 'simple' --log-interval 100 \ + --fixed-validation-seed 7 \ + --max-tokens 8000 \ + --save-interval-updates 10000 \ + --max-update 300000 +``` diff --git a/SpeechT5/fairseq/examples/paraphraser/README.md b/SpeechT5/fairseq/examples/paraphraser/README.md new file mode 100644 index 0000000000000000000000000000000000000000..3810311f30f99f0a07fd8e5d3723bffeba9948c3 --- /dev/null 
+++ b/SpeechT5/fairseq/examples/paraphraser/README.md @@ -0,0 +1,46 @@ +# Paraphrasing with round-trip translation and mixture of experts + +Machine translation models can be used to paraphrase text by translating it to +an intermediate language and back (round-trip translation). + +This example shows how to paraphrase text by first passing it to an +English-French translation model, followed by a French-English [mixture of +experts translation model](/examples/translation_moe). + +##### 0. Setup + +Clone fairseq from source and install necessary dependencies: +```bash +git clone https://github.com/pytorch/fairseq.git +cd fairseq +pip install --editable . +pip install sacremoses sentencepiece +``` + +##### 1. Download models +```bash +wget https://dl.fbaipublicfiles.com/fairseq/models/paraphraser.en-fr.tar.gz +wget https://dl.fbaipublicfiles.com/fairseq/models/paraphraser.fr-en.hMoEup.tar.gz +tar -xzvf paraphraser.en-fr.tar.gz +tar -xzvf paraphraser.fr-en.hMoEup.tar.gz +``` + +##### 2. Paraphrase +```bash +python examples/paraphraser/paraphrase.py \ + --en2fr paraphraser.en-fr \ + --fr2en paraphraser.fr-en.hMoEup +# Example input: +# The new date for the Games, postponed for a year in response to the coronavirus pandemic, gives athletes time to recalibrate their training schedules. +# Example outputs: +# Delayed one year in response to the coronavirus pandemic, the new date of the Games gives athletes time to rebalance their training schedule. +# The new date of the Games, which was rescheduled one year in response to the coronavirus (CV) pandemic, gives athletes time to rebalance their training schedule. +# The new date of the Games, postponed one year in response to the coronavirus pandemic, provides athletes with time to rebalance their training schedule. +# The Games' new date, postponed one year in response to the coronavirus pandemic, gives athletes time to rebalance their training schedule. +# The new Games date, postponed one year in response to the coronavirus pandemic, gives the athletes time to rebalance their training schedule. +# The new date of the Games, which was postponed one year in response to the coronavirus pandemic, gives the athletes time to rebalance their training schedule. +# The new date of the Games, postponed one year in response to the coronavirus pandemic, gives athletes time to rebalance their training schedule. +# The new date of the Games, postponed one year in response to the coronavirus pandemic, gives athletes time to re-balance their training schedule. +# The new date of the Games, postponed one year in response to the coronavirus pandemic, gives the athletes time to rebalance their schedule of training. +# The new date of the Games, postponed one year in response to the pandemic of coronavirus, gives the athletes time to rebalance their training schedule. 
+``` diff --git a/SpeechT5/fairseq/examples/paraphraser/paraphrase.py b/SpeechT5/fairseq/examples/paraphraser/paraphrase.py new file mode 100644 index 0000000000000000000000000000000000000000..d3422fb3db9a381b73a854d2379df214ebe544a2 --- /dev/null +++ b/SpeechT5/fairseq/examples/paraphraser/paraphrase.py @@ -0,0 +1,85 @@ +#!/usr/bin/env python3 -u + +import argparse +import fileinput +import logging +import os +import sys + +from fairseq.models.transformer import TransformerModel + + +logging.getLogger().setLevel(logging.INFO) + + +def main(): + parser = argparse.ArgumentParser(description="") + parser.add_argument("--en2fr", required=True, help="path to en2fr model") + parser.add_argument( + "--fr2en", required=True, help="path to fr2en mixture of experts model" + ) + parser.add_argument( + "--user-dir", help="path to fairseq examples/translation_moe/src directory" + ) + parser.add_argument( + "--num-experts", + type=int, + default=10, + help="(keep at 10 unless using a different model)", + ) + parser.add_argument( + "files", + nargs="*", + default=["-"], + help='input files to paraphrase; "-" for stdin', + ) + args = parser.parse_args() + + if args.user_dir is None: + args.user_dir = os.path.join( + os.path.dirname(os.path.dirname(os.path.abspath(__file__))), # examples/ + "translation_moe", + "src", + ) + if os.path.exists(args.user_dir): + logging.info("found user_dir:" + args.user_dir) + else: + raise RuntimeError( + "cannot find fairseq examples/translation_moe/src " + "(tried looking here: {})".format(args.user_dir) + ) + + logging.info("loading en2fr model from:" + args.en2fr) + en2fr = TransformerModel.from_pretrained( + model_name_or_path=args.en2fr, + tokenizer="moses", + bpe="sentencepiece", + ).eval() + + logging.info("loading fr2en model from:" + args.fr2en) + fr2en = TransformerModel.from_pretrained( + model_name_or_path=args.fr2en, + tokenizer="moses", + bpe="sentencepiece", + user_dir=args.user_dir, + task="translation_moe", + ).eval() + + def gen_paraphrases(en): + fr = en2fr.translate(en) + return [ + fr2en.translate(fr, inference_step_args={"expert": i}) + for i in range(args.num_experts) + ] + + logging.info("Type the input sentence and press return:") + for line in fileinput.input(args.files): + line = line.strip() + if len(line) == 0: + continue + for paraphrase in gen_paraphrases(line): + print(paraphrase) + + +if __name__ == "__main__": + main() diff --git a/SpeechT5/fairseq/examples/pay_less_attention_paper/README.md b/SpeechT5/fairseq/examples/pay_less_attention_paper/README.md new file mode 100644 index 0000000000000000000000000000000000000000..5adab11f4dc3461f9e7126ac391b04e703616e6b --- /dev/null +++ b/SpeechT5/fairseq/examples/pay_less_attention_paper/README.md @@ -0,0 +1,176 @@ +# Pay Less Attention with Lightweight and Dynamic Convolutions (Wu et al., 2019) + +This page contains pointers to pre-trained models as well as instructions on how to train new models for [our paper](https://arxiv.org/abs/1901.10430). + +## Citation: +```bibtex +@inproceedings{wu2018pay, + title = {Pay Less Attention with Lightweight and Dynamic Convolutions}, + author = {Felix Wu and Angela Fan and Alexei Baevski and Yann Dauphin and Michael Auli}, + booktitle = {International Conference on Learning Representations}, + year = {2019}, + url = {https://arxiv.org/abs/1901.10430}, +} +``` + +## Translation + +### Pre-trained models +For some datasets we release models without GLUs which are faster at inference. 
+ +Model | Description | Dataset | Download +---|---|---|--- +`lightconv.no_glu.iwslt14.de-en` | LightConv (without GLUs) | [IWSLT14 German-English](https://wit3.fbk.eu/archive/2014-01/texts/de/en/de-en.tgz) | model: <br> [download (.tar.gz)](https://dl.fbaipublicfiles.com/fairseq/models/dynamicconv/iwslt14.de-en.lightconv.tar.gz) <br> IWSLT14 test: <br> [download (.tar.bz2)](https://dl.fbaipublicfiles.com/fairseq/data/iwslt14.de-en.test.tar.bz2) +`dynamicconv.no_glu.iwslt14.de-en` | DynamicConv (without GLUs) | [IWSLT14 German-English](https://wit3.fbk.eu/archive/2014-01/texts/de/en/de-en.tgz) | model: <br> [download (.tar.gz)](https://dl.fbaipublicfiles.com/fairseq/models/dynamicconv/iwslt14.de-en.dynamicconv.tar.gz) <br> IWSLT14 test: <br> [download (.tar.bz2)](https://dl.fbaipublicfiles.com/fairseq/data/iwslt14.de-en.test.tar.bz2) +`lightconv.no_glu.wmt16.en-de` | LightConv (without GLUs) | [WMT16 English-German](https://drive.google.com/uc?export=download&id=0B_bZck-ksdkpM25jRUN2X2UxMm8) | model: <br> [download (.tar.gz)](https://dl.fbaipublicfiles.com/fairseq/models/dynamicconv/wmt16.en-de.joined-dict.lightconv.tar.gz) <br> newstest2014 (shared vocab): <br> [download (.tar.bz2)](https://dl.fbaipublicfiles.com/fairseq/data/wmt16.en-de.joined-dict.newstest2014.tar.bz2) +`dynamicconv.no_glu.wmt16.en-de` | DynamicConv (without GLUs) | [WMT16 English-German](https://drive.google.com/uc?export=download&id=0B_bZck-ksdkpM25jRUN2X2UxMm8) | model: <br> [download (.tar.gz)](https://dl.fbaipublicfiles.com/fairseq/models/dynamicconv/wmt16.en-de.joined-dict.dynamicconv.tar.gz) <br> newstest2014 (shared vocab): <br> [download (.tar.bz2)](https://dl.fbaipublicfiles.com/fairseq/data/wmt16.en-de.joined-dict.newstest2014.tar.bz2) +`lightconv.glu.wmt16.en-de` | LightConv | [WMT16 English-German](https://drive.google.com/uc?export=download&id=0B_bZck-ksdkpM25jRUN2X2UxMm8) | model: <br> [download (.tar.gz)](https://dl.fbaipublicfiles.com/fairseq/models/dynamicconv/wmt16.en-de.joined-dict.lightconv-glu.tar.gz) <br> newstest2014 (shared vocab): <br> [download (.tar.bz2)](https://dl.fbaipublicfiles.com/fairseq/data/wmt16.en-de.joined-dict.newstest2014.tar.bz2) +`dynamicconv.glu.wmt16.en-de` | DynamicConv | [WMT16 English-German](https://drive.google.com/uc?export=download&id=0B_bZck-ksdkpM25jRUN2X2UxMm8) | model: <br> [download (.tar.gz)](https://dl.fbaipublicfiles.com/fairseq/models/dynamicconv/wmt16.en-de.joined-dict.dynamicconv-glu.tar.gz) <br> newstest2014 (shared vocab): <br> [download (.tar.bz2)](https://dl.fbaipublicfiles.com/fairseq/data/wmt16.en-de.joined-dict.newstest2014.tar.bz2) +`lightconv.glu.wmt14.en-fr` | LightConv | [WMT14 English-French](http://statmt.org/wmt14/translation-task.html#Download) | model: <br> [download (.tar.gz)](https://dl.fbaipublicfiles.com/fairseq/models/dynamicconv/wmt14.en-fr.joined-dict.lightconv-glu.tar.gz) <br> newstest2014: <br> [download (.tar.bz2)](https://dl.fbaipublicfiles.com/fairseq/data/wmt14.en-fr.joined-dict.newstest2014.tar.bz2) +`dynamicconv.glu.wmt14.en-fr` | DynamicConv | [WMT14 English-French](http://statmt.org/wmt14/translation-task.html#Download) | model: <br> [download (.tar.gz)](https://dl.fbaipublicfiles.com/fairseq/models/dynamicconv/wmt14.en-fr.joined-dict.dynamicconv-glu.tar.gz) <br> newstest2014: <br> [download (.tar.bz2)](https://dl.fbaipublicfiles.com/fairseq/data/wmt14.en-fr.joined-dict.newstest2014.tar.bz2) +`lightconv.glu.wmt17.zh-en` | LightConv | [WMT17 Chinese-English](http://statmt.org/wmt17/translation-task.html#Download) | model: 
<br> [download (.tar.gz)](https://dl.fbaipublicfiles.com/fairseq/models/dynamicconv/wmt17.zh-en.lightconv-glu.tar.gz) <br> newstest2017: <br> [download (.tar.bz2)](https://dl.fbaipublicfiles.com/fairseq/data/wmt17.zh-en.newstest2017.tar.bz2) +`dynamicconv.glu.wmt17.zh-en` | DynamicConv | [WMT17 Chinese-English](http://statmt.org/wmt17/translation-task.html#Download) | model: <br> [download (.tar.gz)](https://dl.fbaipublicfiles.com/fairseq/models/dynamicconv/wmt17.zh-en.dynamicconv-glu.tar.gz) <br> newstest2017: <br> [download (.tar.bz2)](https://dl.fbaipublicfiles.com/fairseq/data/wmt17.zh-en.newstest2017.tar.bz2) + +### Memory-Efficient CUDA Kernels + +Since the PyTorch implementations of Light/Dynamic conv are quite memory intensive, we have developed CUDA kernels that implement the light and dynamic convolution operator in a memory-efficient and performant manner. For large sequence lengths, these kernels save about 50% memory compared to the PyTorch equivalent. + +To install the kernels, use the commands below. Once installed, they will automatically be used in place of the PyTorch implementations whenever a light or dynamic convolution is used. + +```sh +# to install lightconv +cd fairseq/modules/lightconv_layer +python cuda_function_gen.py +python setup.py install + +# to install dynamicconv +cd fairseq/modules/dynamicconv_layer +python cuda_function_gen.py +python setup.py install +``` + +### Example usage (torch.hub) + +We require a few additional Python dependencies for preprocessing: +```bash +pip install sacremoses subword_nmt +``` + +Interactive translation via PyTorch Hub: +```python +import torch + +# List available models +torch.hub.list('pytorch/fairseq') # [..., 'lightconv.glu.wmt17.zh-en', ... ] + +# Load a transformer trained on WMT'16 En-De +zh2en = torch.hub.load('pytorch/fairseq', 'lightconv.glu.wmt17.zh-en', tokenizer='moses', bpe='subword_nmt') + +# The underlying model is available under the *models* attribute +assert isinstance(zh2en.models[0], fairseq.models.lightconv.LightConvModel) + +# Translate a sentence +zh2en.translate('你好 世界') +# 'Hello World' +``` + +Loading custom models: +```python +from fairseq.models.lightconv import LightConvModel +en2fr = LightConvModel.from_pretrained( + '/path/to/checkpoints', + checkpoint_file='checkpoint_best.pt', + data_name_or_path='data-bin/wmt14_en_fr', + bpe='subword_nmt', + bpe_codes='data-bin/wmt14_en_fr/en.code' +) +en2fr.translate('Hello world!') +# 'Bonjour le monde' +``` + +### Preprocessing the training datasets + +Please follow the instructions in [`examples/translation/README.md`](../translation/README.md) to preprocess the data. + +### Training and evaluation options: +To use the model without GLU, please set `--encoder-glu 0 --decoder-glu 0`. +For LightConv, please use `--encoder-conv-type lightweight --decoder-conv-type lightweight`, otherwise the default is DynamicConv. +For best BLEU results, lenpen may need to be manually tuned. + +To use the CUDA kernels, first install the PyTorch modules using the commands +above. Once the CUDA modules are installed, they will automatically be used +instead of the PyTorch modules. 
+ +### IWSLT14 De-En +Training and evaluating DynamicConv (without GLU) on a GPU: +```sh +# Training +SAVE="save/dynamic_conv_iwslt" +mkdir -p $SAVE +CUDA_VISIBLE_DEVICES=0 $(which fairseq-train) data-bin/iwslt14.tokenized.de-en \ + --clip-norm 0 --optimizer adam --lr 0.0005 \ + --source-lang de --target-lang en --max-tokens 4000 --no-progress-bar \ + --log-interval 100 --stop-min-lr '1e-09' --weight-decay 0.0001 \ + --criterion label_smoothed_cross_entropy --label-smoothing 0.1 \ + --lr-scheduler inverse_sqrt \ + --ddp-backend=legacy_ddp \ + --max-update 50000 --warmup-updates 4000 --warmup-init-lr '1e-07' \ + --adam-betas '(0.9, 0.98)' --keep-last-epochs 10 \ + -a lightconv_iwslt_de_en --save-dir $SAVE \ + --dropout 0.3 --attention-dropout 0.1 --weight-dropout 0.1 \ + --encoder-glu 0 --decoder-glu 0 +python scripts/average_checkpoints.py --inputs $SAVE \ + --num-epoch-checkpoints 10 --output "${SAVE}/checkpoint_last10_avg.pt" + +# Evaluation +CUDA_VISIBLE_DEVICES=0 fairseq-generate data-bin/iwslt14.tokenized.de-en --path "${SAVE}/checkpoint_last10_avg.pt" --batch-size 128 --beam 4 --remove-bpe --lenpen 1 --gen-subset test --quiet +``` + +### WMT16 En-De +Training and evaluating DynamicConv (with GLU) on WMT16 En-De using cosine scheduler on one machine with 8 V100 GPUs: +```sh +# Training +SAVE="save/dynamic_conv_wmt16en2de" +mkdir -p $SAVE +python -m torch.distributed.launch --nproc_per_node 8 $(which fairseq-train) \ + data-bin/wmt16_en_de_bpe32k --fp16 --log-interval 100 --no-progress-bar \ + --max-update 30000 --share-all-embeddings --optimizer adam \ + --adam-betas '(0.9, 0.98)' --clip-norm 0.0 --weight-decay 0.0 \ + --criterion label_smoothed_cross_entropy --label-smoothing 0.1 \ + --stop-min-lr 1e-09 --update-freq 16 --attention-dropout 0.1 --keep-last-epochs 10 \ + --ddp-backend=legacy_ddp --max-tokens 3584 \ + --lr-scheduler cosine --warmup-init-lr 1e-7 --warmup-updates 10000 \ + --lr-shrink 1 --lr 0.001 --min-lr 1e-7 --warmup-init-lr 1e-07 \ + --t-mult 1 --lr-period-updates 20000 \ + --arch lightconv_wmt_en_de_big --save-dir $SAVE \ + --dropout 0.3 --attention-dropout 0.1 --weight-dropout 0.1 \ + --encoder-glu 1 --decoder-glu 1 + +# Evaluation +CUDA_VISIBLE_DEVICES=0 fairseq-generate data-bin/wmt16.en-de.joined-dict.newstest2014 --path "${SAVE}/checkpoint_best.pt" --batch-size 128 --beam 5 --remove-bpe --lenpen 0.5 --gen-subset test > wmt16_gen.txt +bash scripts/compound_split_bleu.sh wmt16_gen.txt +``` + +### WMT14 En-Fr +Training DynamicConv (with GLU) on WMT14 En-Fr using cosine scheduler on one machine with 8 V100 GPUs: +```sh +# Training +SAVE="save/dynamic_conv_wmt14en2fr" +mkdir -p $SAVE +python -m torch.distributed.launch --nproc_per_node 8 $(which fairseq-train) \ + data-bin/wmt14_en_fr --fp16 --log-interval 100 --no-progress-bar \ + --max-update 30000 --share-all-embeddings --optimizer adam \ + --adam-betas '(0.9, 0.98)' --clip-norm 0.0 --weight-decay 0.0 \ + --criterion label_smoothed_cross_entropy --label-smoothing 0.1 \ + --stop-min-lr 1e-09 --update-freq 16 --attention-dropout 0.1 --keep-last-epochs 10 \ + --ddp-backend=legacy_ddp --max-tokens 3584 \ + --lr-scheduler cosine --warmup-init-lr 1e-7 --warmup-updates 10000 \ + --lr-shrink 1 --lr 0.001 --min-lr 1e-7 --warmup-init-lr 1e-07 \ + --t-mult 1 --lr-period-updates 70000 \ + --arch lightconv_wmt_en_fr_big --save-dir $SAVE \ + --dropout 0.1 --attention-dropout 0.1 --weight-dropout 0.1 \ + --encoder-glu 1 --decoder-glu 1 + +# Evaluation +CUDA_VISIBLE_DEVICES=0 fairseq-generate 
data-bin/wmt14.en-fr.joined-dict.newstest2014 --path "${SAVE}/checkpoint_best.pt" --batch-size 128 --beam 5 --remove-bpe --lenpen 0.9 --gen-subset test +``` diff --git a/SpeechT5/fairseq/examples/pointer_generator/README.md b/SpeechT5/fairseq/examples/pointer_generator/README.md new file mode 100644 index 0000000000000000000000000000000000000000..60965708254aae2174812ea6686a9807825b7fb6 --- /dev/null +++ b/SpeechT5/fairseq/examples/pointer_generator/README.md @@ -0,0 +1,82 @@ +# Transformer with Pointer-Generator Network + +This page describes the `transformer_pointer_generator` model that incorporates +a pointing mechanism in the Transformer model that facilitates copying of input +words to the output. This architecture is described in [Enarvi et al. (2020)](https://www.aclweb.org/anthology/2020.nlpmc-1.4/). + +## Background + +The pointer-generator network was introduced in [See et al. (2017)](https://arxiv.org/abs/1704.04368) +for RNN encoder-decoder attention models. A similar mechanism can be +incorporated in a Transformer model by reusing one of the many attention +distributions for pointing. The attention distribution over the input words is +interpolated with the normal output distribution over the vocabulary words. This +allows the model to generate words that appear in the input, even if they don't +appear in the vocabulary, helping especially with small vocabularies. + +## Implementation + +The mechanism for copying out-of-vocabulary words from the input has been +implemented differently to See et al. In their [implementation](https://github.com/abisee/pointer-generator) +they convey the word identities through the model in order to be able to produce +words that appear in the input sequence but not in the vocabulary. A different +approach was taken in the Fairseq implementation to keep it self-contained in +the model file, avoiding any changes to the rest of the code base. Copying +out-of-vocabulary words is possible by pre-processing the input and +post-processing the output. This is described in detail in the next section. + +## Usage + +The training and evaluation procedure is outlined below. You can also find a +more detailed example for the XSum dataset on [this page](README.xsum.md). + +##### 1. Create a vocabulary and extend it with source position markers + +The pointing mechanism is especially helpful with small vocabularies, if we are +able to recover the identities of any out-of-vocabulary words that are copied +from the input. For this purpose, the model allows extending the vocabulary with +special tokens that can be used in place of `<unk>` tokens to identify different +input positions. For example, the user may add `<unk-0>`, `<unk-1>`, `<unk-2>`, +etc. to the end of the vocabulary, after the normal words. Below is an example +of how to create a vocabulary of 10000 most common words and add 1000 input +position markers. + +```bash +vocab_size=10000 +position_markers=1000 +export LC_ALL=C +cat train.src train.tgt | + tr -s '[:space:]' '\n' | + sort | + uniq -c | + sort -k1,1bnr -k2 | + head -n "$((vocab_size - 4))" | + awk '{ print $2 " " $1 }' >dict.pg.txt +python3 -c "[print('<unk-{}> 0'.format(n)) for n in range($position_markers)]" >>dict.pg.txt +``` + +##### 2. Preprocess the text data + +The idea is that any `<unk>` tokens in the text are replaced with `<unk-0>` if +it appears in the first input position, `<unk-1>` if it appears in the second +input position, and so on. 
This can be achieved using the `preprocess.py` script +that is provided in this directory. + +##### 3. Train a model + +The number of these special tokens is given to the model with the +`--source-position-markers` argument—the model simply maps all of these to the +same word embedding as `<unk>`. + +The attention distribution that is used for pointing is selected using the +`--alignment-heads` and `--alignment-layer` command-line arguments in the same +way as with the `transformer_align` model. + +##### 4. Generate text and postprocess it + +When using the model to generate text, you want to preprocess the input text in +the same way that training data was processed, replacing out-of-vocabulary words +with `<unk-N>` tokens. If any of these tokens are copied to the output, the +actual words can be retrieved from the unprocessed input text. Any `<unk-N>` +token should be replaced with the word at position N in the original input +sequence. This can be achieved using the `postprocess.py` script. diff --git a/SpeechT5/fairseq/examples/pointer_generator/README.xsum.md b/SpeechT5/fairseq/examples/pointer_generator/README.xsum.md new file mode 100644 index 0000000000000000000000000000000000000000..ac3a8c3ddc96cd9810b45d49f6b361e43de1e9fb --- /dev/null +++ b/SpeechT5/fairseq/examples/pointer_generator/README.xsum.md @@ -0,0 +1,180 @@ +## Training a pointer-generator model on the Extreme Summarization dataset + +##### 1. Download the Extreme Summarization data and preprocess it + +Follow the instructions [here](https://github.com/EdinburghNLP/XSum) to obtain +the original Extreme Summarization dataset. You should have six files, +{train,validation,test}.{document,summary}. + +##### 2. Create a vocabulary and extend it with source position markers + +```bash +vocab_size=10000 +position_markers=1000 +export LC_ALL=C +cat train.document train.summary | + tr -s '[:space:]' '\n' | + sort | + uniq -c | + sort -k1,1bnr -k2 | + head -n "$((vocab_size - 4))" | + awk '{ print $2 " " $1 }' >dict.pg.txt +python3 -c "[print('<unk-{}> 0'.format(n)) for n in range($position_markers)]" >>dict.pg.txt +``` + +This creates the file dict.pg.txt that contains the 10k most frequent words, +followed by 1k source position markers: + +``` +the 4954867 +. 4157552 +, 3439668 +to 2212159 +a 1916857 +of 1916820 +and 1823350 +... +<unk-0> 0 +<unk-1> 0 +<unk-2> 0 +<unk-3> 0 +<unk-4> 0 +... +``` + +##### 2. Preprocess the text data + +```bash +./preprocess.py --source train.document --target train.summary --vocab <(cut -d' ' -f1 dict.pg.txt) --source-out train.pg.src --target-out train.pg.tgt +./preprocess.py --source validation.document --target validation.summary --vocab <(cut -d' ' -f1 dict.pg.txt) --source-out valid.pg.src --target-out valid.pg.tgt +./preprocess.py --source test.document --vocab <(cut -d' ' -f1 dict.pg.txt) --source-out test.pg.src +``` + +The data should now contain `<unk-N>` tokens in place of out-of-vocabulary words. + +##### 3. Binarize the dataset: + +```bash +fairseq-preprocess \ + --source-lang src \ + --target-lang tgt \ + --trainpref train.pg \ + --validpref valid.pg \ + --destdir bin \ + --workers 60 \ + --srcdict dict.pg.txt \ + --joined-dictionary +``` + +##### 3. 
Train a model + +```bash +total_updates=20000 +warmup_updates=500 +lr=0.001 +max_tokens=4096 +update_freq=4 +pointer_layer=-2 + +CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 fairseq-train bin \ + --user-dir examples/pointer_generator/pointer_generator_src \ + --max-tokens "$max_tokens" \ + --task translation \ + --source-lang src --target-lang tgt \ + --truncate-source \ + --layernorm-embedding \ + --share-all-embeddings \ + --encoder-normalize-before \ + --decoder-normalize-before \ + --required-batch-size-multiple 1 \ + --arch transformer_pointer_generator \ + --alignment-layer "$pointer_layer" \ + --alignment-heads 1 \ + --source-position-markers 1000 \ + --criterion label_smoothed_cross_entropy \ + --label-smoothing 0.1 \ + --dropout 0.1 --attention-dropout 0.1 \ + --weight-decay 0.01 --optimizer adam --adam-betas "(0.9, 0.999)" --adam-eps 1e-08 \ + --clip-norm 0.1 \ + --lr-scheduler inverse_sqrt --lr "$lr" --max-update "$total_updates" --warmup-updates "$warmup_updates" \ + --update-freq "$update_freq" \ + --skip-invalid-size-inputs-valid-test +``` + +Above we specify that our dictionary contains 1000 source position markers, and +that we want to use one attention head from the penultimate decoder layer for +pointing. It should run in 5.5 hours on one node with eight 32GB V100 GPUs. The +logged messages confirm that dictionary indices above 10000 will be mapped to +the `<unk>` embedding: + +``` +2020-09-24 20:43:53 | INFO | fairseq.tasks.translation | [src] dictionary: 11000 types +2020-09-24 20:43:53 | INFO | fairseq.tasks.translation | [tgt] dictionary: 11000 types +2020-09-24 20:43:53 | INFO | fairseq.data.data_utils | loaded 11332 examples from: bin/valid.src-tgt.src +2020-09-24 20:43:53 | INFO | fairseq.data.data_utils | loaded 11332 examples from: bin/valid.src-tgt.tgt +2020-09-24 20:43:53 | INFO | fairseq.tasks.translation | bin valid src-tgt 11332 examples +2020-09-24 20:43:53 | INFO | fairseq.models.transformer_pg | dictionary indices from 10000 to 10999 will be mapped to 3 +``` + +##### 4. Summarize the test sequences + +```bash +batch_size=32 +beam_size=6 +max_length=60 +length_penalty=1.0 + +fairseq-interactive bin \ + --user-dir examples/pointer_generator/pointer_generator_src \ + --batch-size "$batch_size" \ + --task translation \ + --source-lang src --target-lang tgt \ + --path checkpoints/checkpoint_last.pt \ + --input test.pg.src \ + --buffer-size 200 \ + --max-len-a 0 \ + --max-len-b "$max_length" \ + --lenpen "$length_penalty" \ + --beam "$beam_size" \ + --skip-invalid-size-inputs-valid-test | + tee generate.out +grep ^H generate.out | cut -f 3- >generate.hyp +``` + +Now you should have the generated sequences in `generate.hyp`. They contain +`<unk-N>` tokens that the model has copied from the source sequence. In order to +retrieve the original words, we need the unprocessed source sequences from +`test.document`. + +##### 5. Process the generated output + +Since we skipped too long inputs when producing `generate.hyp`, we also have to +skip too long sequences now that we read `test.document`. + +```bash +./postprocess.py \ + --source <(awk 'NF<1024' test.document) \ + --target generate.hyp \ + --target-out generate.hyp.processed +``` + +Now you'll find the final sequences from `generate.hyp.processed`, with +`<unk-N>` replaced with the original word from the source sequence. 
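+
+For intuition, the replacement that `postprocess.py` performs amounts to the
+following (a minimal sketch of the logic, not the script itself):
+
+```python
+import re
+
+UNK_RE = re.compile(r"^<unk-(\d+)>$")
+
+def restore_oovs(source_line: str, hypo_line: str) -> str:
+    """Replace each <unk-N> in the hypothesis with the N-th source token."""
+    src_tokens = source_line.split()
+    out = []
+    for token in hypo_line.split():
+        m = UNK_RE.match(token)
+        # <unk-N> points at position N of the unprocessed source sequence.
+        out.append(src_tokens[int(m.group(1))] if m else token)
+    return " ".join(out)
+```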
+ +##### An example of a summarized sequence + +The original source document in `test.document`: + +> de roon moved to teesside in june 2016 for an initial # 8.8 m fee and played 33 premier league games last term . the netherlands international , 26 , scored five goals in 36 league and cup games during his spell at boro . meanwhile , manager garry monk confirmed the championship club 's interest in signing chelsea midfielder lewis baker . `` he 's a target and one of many that we 've had throughout the summer months , '' said monk . find all the latest football transfers on our dedicated page . + +The preprocessed source document in `test.src.pg`: + +> de \<unk-1> moved to \<unk-4> in june 2016 for an initial # \<unk-12> m fee and played 33 premier league games last term . the netherlands international , 26 , scored five goals in 36 league and cup games during his spell at boro . meanwhile , manager garry monk confirmed the championship club 's interest in signing chelsea midfielder lewis baker . `` he 's a target and one of many that we 've had throughout the summer months , '' said monk . find all the latest football transfers on our dedicated page . + +The generated summary in `generate.hyp`: + +> middlesbrough striker \<unk> de \<unk-1> has joined spanish side \<unk> on a season-long loan . + +The generated summary after postprocessing in `generate.hyp.processed`: + +> middlesbrough striker \<unk> de roon has joined spanish side \<unk> on a season-long loan . diff --git a/SpeechT5/fairseq/examples/pointer_generator/pointer_generator_src/__init__.py b/SpeechT5/fairseq/examples/pointer_generator/pointer_generator_src/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..c361ff6bd616512fe2521387665de1ad1aff66d0 --- /dev/null +++ b/SpeechT5/fairseq/examples/pointer_generator/pointer_generator_src/__init__.py @@ -0,0 +1,6 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +from . import transformer_pg # noqa diff --git a/SpeechT5/fairseq/examples/pointer_generator/pointer_generator_src/transformer_pg.py b/SpeechT5/fairseq/examples/pointer_generator/pointer_generator_src/transformer_pg.py new file mode 100644 index 0000000000000000000000000000000000000000..4ccf30f4eb154f8fab1e285934fb973a2d1166cb --- /dev/null +++ b/SpeechT5/fairseq/examples/pointer_generator/pointer_generator_src/transformer_pg.py @@ -0,0 +1,518 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. 
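+
+# In brief, this module augments the standard Transformer with a pointing
+# mechanism: the encoder additionally returns the source token ids, and the
+# decoder mixes its vocabulary distribution with an attention distribution
+# scattered onto per-position OOV markers (<unk-0>, <unk-1>, ...), so that
+# copied out-of-vocabulary words can be recovered in post-processing.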
+ +import logging +from typing import Any, Dict, Optional, List, Tuple + +import torch +import torch.nn as nn +from fairseq import utils +from fairseq.models import register_model, register_model_architecture +from fairseq.models.transformer import ( + DEFAULT_MAX_SOURCE_POSITIONS, + DEFAULT_MAX_TARGET_POSITIONS, + TransformerDecoder, + TransformerEncoder, + TransformerModel, + base_architecture, +) +from torch import Tensor + + +logger = logging.getLogger(__name__) + + +@register_model("transformer_pointer_generator") +class TransformerPointerGeneratorModel(TransformerModel): + """ + Transformer model from `"Attention Is All You Need" (Vaswani et al, 2017) + <https://arxiv.org/abs/1706.03762>`_, augmented with a pointer-generator + network from `"Get To The Point: Summarization with Pointer-Generator + Networks" (See et al, 2017) <https://arxiv.org/abs/1704.04368>`_. + + Args: + encoder (TransformerPointerGeneratorEncoder): the encoder + decoder (TransformerPointerGeneratorDecoder): the decoder + + The Transformer pointer-generator model provides the following named + architectures and command-line arguments: + + .. argparse:: + :ref: fairseq.models.transformer_pointer_generator_parser + :prog: + """ + + @staticmethod + def add_args(parser): + """Add model-specific arguments to the parser.""" + # fmt: off + TransformerModel.add_args(parser) + parser.add_argument('--alignment-heads', type=int, metavar='N', + help='number of attention heads to be used for ' + 'pointing') + parser.add_argument('--alignment-layer', type=int, metavar='I', + help='layer number to be used for pointing (0 ' + 'corresponding to the bottommost layer)') + parser.add_argument('--source-position-markers', type=int, metavar='N', + help='dictionary includes N additional items that ' + 'represent an OOV token at a particular input ' + 'position') + parser.add_argument('--force-generation', type=float, metavar='P', + default=None, + help='set the vocabulary distribution weight to P, ' + 'instead of predicting it from the input (1.0 ' + 'corresponding to generation, 0.0 to pointing)') + # fmt: on + + @classmethod + def build_model(cls, args, task): + """Build a new model instance.""" + + # make sure all arguments are present in older models + base_architecture(args) + + if args.encoder_layers_to_keep: + args.encoder_layers = len(args.encoder_layers_to_keep.split(",")) + if args.decoder_layers_to_keep: + args.decoder_layers = len(args.decoder_layers_to_keep.split(",")) + + if getattr(args, "max_source_positions", None) is None: + args.max_source_positions = DEFAULT_MAX_SOURCE_POSITIONS + if getattr(args, "max_target_positions", None) is None: + args.max_target_positions = DEFAULT_MAX_TARGET_POSITIONS + if getattr(args, "source_position_markers", None) is None: + args.source_position_markers = args.max_source_positions + + src_dict, tgt_dict = task.source_dictionary, task.target_dictionary + if src_dict != tgt_dict: + raise ValueError("Pointer-generator requires a joined dictionary") + + def build_embedding(dictionary, embed_dim, path=None): + # The dictionary may include additional items that can be used in + # place of the normal OOV token and that all map to the same + # embedding. Using a different token for each input position allows + # one to restore the word identities from the original source text. 
+ num_embeddings = len(dictionary) - args.source_position_markers + padding_idx = dictionary.pad() + unk_idx = dictionary.unk() + logger.info( + "dictionary indices from {0} to {1} will be mapped to {2}".format( + num_embeddings, len(dictionary) - 1, unk_idx + ) + ) + emb = Embedding(num_embeddings, embed_dim, padding_idx, unk_idx) + # if provided, load from preloaded dictionaries + if path: + embed_dict = utils.parse_embedding(path) + utils.load_embedding(embed_dict, dictionary, emb) + return emb + + if args.share_all_embeddings: + if args.encoder_embed_dim != args.decoder_embed_dim: + raise ValueError( + "--share-all-embeddings requires --encoder-embed-dim to match --decoder-embed-dim" + ) + if args.decoder_embed_path and ( + args.decoder_embed_path != args.encoder_embed_path + ): + raise ValueError( + "--share-all-embeddings not compatible with --decoder-embed-path" + ) + encoder_embed_tokens = build_embedding( + src_dict, args.encoder_embed_dim, args.encoder_embed_path + ) + decoder_embed_tokens = encoder_embed_tokens + args.share_decoder_input_output_embed = True + else: + encoder_embed_tokens = build_embedding( + src_dict, args.encoder_embed_dim, args.encoder_embed_path + ) + decoder_embed_tokens = build_embedding( + tgt_dict, args.decoder_embed_dim, args.decoder_embed_path + ) + + encoder = cls.build_encoder(args, src_dict, encoder_embed_tokens) + decoder = cls.build_decoder(args, tgt_dict, decoder_embed_tokens) + return cls(args, encoder, decoder) + + @classmethod + def build_encoder(cls, args, src_dict, embed_tokens): + return TransformerPointerGeneratorEncoder(args, src_dict, embed_tokens) + + @classmethod + def build_decoder(cls, args, tgt_dict, embed_tokens): + return TransformerPointerGeneratorDecoder(args, tgt_dict, embed_tokens) + + +class TransformerPointerGeneratorEncoder(TransformerEncoder): + """ + Transformer encoder consisting of *args.encoder_layers* layers. Each layer + is a :class:`TransformerEncoderLayer`. The pointer-generator variant adds + the source tokens to the encoder output as these are otherwise not passed + to the decoder. + """ + + def forward( + self, + src_tokens, + src_lengths: Optional[Tensor] = None, + return_all_hiddens: bool = False, + token_embeddings: Optional[Tensor] = None + ): + """ + Runs the `forward()` method of the parent Transformer class. Then adds + the source tokens into the encoder output tuple. + + While it might be more elegant that the model would pass the source + tokens to the `forward()` method of the decoder too, this would require + changes to `SequenceGenerator`. + + Args: + src_tokens (torch.LongTensor): tokens in the source language of + shape `(batch, src_len)` + src_lengths (torch.LongTensor): lengths of each source sentence of + shape `(batch)` + return_all_hiddens (bool, optional): also return all of the + intermediate hidden states (default: False). + token_embeddings (torch.Tensor, optional): precomputed embeddings + default `None` will recompute embeddings + + Returns: + namedtuple: + - **encoder_out** (Tensor): the last encoder layer's output of + shape `(src_len, batch, embed_dim)` + - **encoder_padding_mask** (ByteTensor): the positions of + padding elements of shape `(batch, src_len)` + - **encoder_embedding** (Tensor): the (scaled) embedding lookup + of shape `(batch, src_len, embed_dim)` + - **encoder_states** (List[Tensor]): all intermediate + hidden states of shape `(src_len, batch, embed_dim)`. + Only populated if *return_all_hiddens* is True. 
+ - **src_tokens** (Tensor): input token ids of shape + `(batch, src_len)` + """ + encoder_out = self.forward_scriptable(src_tokens, + src_lengths, + return_all_hiddens, + token_embeddings) + + # The Pytorch Mobile lite interpreter does not supports returning NamedTuple in + # `forward` so we use a dictionary instead. + # TorchScript does not support mixed values so the values are all lists. + # The empty list is equivalent to None. + return { + "encoder_out": encoder_out["encoder_out"], # T x B x C + "encoder_padding_mask": encoder_out["encoder_padding_mask"], # B x T + "encoder_embedding": encoder_out["encoder_embedding"], # B x T x C + "encoder_states": encoder_out["encoder_states"], # List[T x B x C] + "src_tokens": [src_tokens], # B x T + "src_lengths": [], + } + + +class TransformerPointerGeneratorDecoder(TransformerDecoder): + """ + Transformer decoder consisting of *args.decoder_layers* layers. Each layer + is a :class:`TransformerDecoderLayer`. The pointer-generator variant mixes + the output probabilities with an attention distribution in the output layer. + + Args: + args (argparse.Namespace): parsed command-line arguments + dictionary (~fairseq.data.Dictionary): decoding dictionary + embed_tokens (torch.nn.Embedding): output embedding + """ + + def __init__(self, args, dictionary, embed_tokens): + super().__init__(args, dictionary, embed_tokens, no_encoder_attn=False) + + # In the pointer-generator model these arguments define the decoder + # layer and the number of attention heads that will be averaged to + # create the alignment for pointing. + self.alignment_heads = args.alignment_heads + self.alignment_layer = args.alignment_layer + + input_embed_dim = embed_tokens.embedding_dim + + # Generation probabilities / interpolation coefficients are predicted + # from the current decoder input embedding and the decoder output, which + # is the size of output_embed_dim. + p_gen_input_size = input_embed_dim + self.output_embed_dim + self.project_p_gens = nn.Linear(p_gen_input_size, 1) + nn.init.zeros_(self.project_p_gens.bias) + + # The dictionary may include a separate entry for an OOV token in each + # input position, so that their identity can be restored from the + # original source text. 
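+        # num_types: size of the extended dictionary (regular words plus the
+        # per-position OOV markers); num_embeddings: size of the real
+        # embedding table (regular words only).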
+ self.num_types = len(dictionary) + self.num_oov_types = args.source_position_markers + self.num_embeddings = self.num_types - self.num_oov_types + self.force_p_gen = args.force_generation + + def forward( + self, + prev_output_tokens, + encoder_out: Optional[Dict[str, List[Tensor]]] = None, + incremental_state: Optional[Dict[str, Dict[str, Optional[Tensor]]]] = None, + features_only: bool = False, + alignment_layer: Optional[int] = 0, + alignment_heads: Optional[int] = 1, + src_lengths: Optional[Any] = None, + return_all_hiddens: bool = False, + ): + """ + Args: + prev_output_tokens (LongTensor): previous decoder outputs of shape + `(batch, tgt_len)`, for teacher forcing + encoder_out (optional): output from the encoder, used for + encoder-side attention + incremental_state (dict, optional): dictionary used for storing + state during :ref:`Incremental decoding` + features_only (bool, optional): only return features without + applying output layer (default: False) + alignment_layer (int, optional): 0-based index of the layer to be + used for pointing (default: 0) + alignment_heads (int, optional): number of attention heads to be + used for pointing (default: 1) + + Returns: + tuple: + - the decoder's output of shape `(batch, tgt_len, vocab)` + - a dictionary with any model-specific outputs + """ + # The normal Transformer model doesn't pass the alignment_layer and + # alignment_heads parameters correctly. We use our local variables. + x, extra = self.extract_features( + prev_output_tokens, + encoder_out=encoder_out, + incremental_state=incremental_state, + alignment_layer=self.alignment_layer, + alignment_heads=self.alignment_heads, + ) + if not features_only: + # Embedding the tokens again for generation probability prediction, + # so that we don't have to reimplement the whole extract_features() + # method. + if incremental_state is not None: + prev_output_tokens = prev_output_tokens[:, -1:] + prev_output_embed = self.embed_tokens(prev_output_tokens) + prev_output_embed *= self.embed_scale + predictors = torch.cat((prev_output_embed, x), 2) + p_gens = self.project_p_gens(predictors) + p_gens = torch.sigmoid(p_gens.float()) + # Torchscript complains if encoder_out or attn are None because + # `output_layer()` signature expects tensors instead + attn: Optional[Tensor] = extra["attn"][0] + assert encoder_out is not None + assert attn is not None + x = self.output_layer(x, attn, encoder_out["src_tokens"][0], p_gens) + return x, extra + + def output_layer( + self, + features: Tensor, + attn: Tensor, + src_tokens: Tensor, + p_gens: Tensor + ) -> Tensor: + """ + Project features to the vocabulary size and mix with the attention + distributions. + """ + if self.force_p_gen is not None: + p_gens = self.force_p_gen + + # project back to size of vocabulary + if self.adaptive_softmax is None: + logits = self.output_projection(features) + else: + logits = features + + batch_size = logits.shape[0] + output_length = logits.shape[1] + assert logits.shape[2] == self.num_embeddings + assert src_tokens.shape[0] == batch_size + src_length = src_tokens.shape[1] + + # The final output distribution will be a mixture of the normal output + # distribution (softmax of logits) and attention weights. 
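+        # Concretely, for every target position and vocabulary entry w:
+        #   P(w) = p_gen * P_vocab(w) + (1 - p_gen) * sum_j attn_j * 1[src_token_j == w]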
+ gen_dists = self.get_normalized_probs_scriptable( + (logits, None), log_probs=False, sample=None + ) + gen_dists = torch.mul(gen_dists, p_gens) + padding_size = (batch_size, output_length, self.num_oov_types) + padding = gen_dists.new_zeros(padding_size) + gen_dists = torch.cat((gen_dists, padding), 2) + assert gen_dists.shape[2] == self.num_types + + # Scatter attention distributions to distributions over the extended + # vocabulary in a tensor of shape [batch_size, output_length, + # vocab_size]. Each attention weight will be written into a location + # that is for other dimensions the same as in the index tensor, but for + # the third dimension it's the value of the index tensor (the token ID). + attn = torch.mul(attn.float(), 1 - p_gens) + index = src_tokens[:, None, :] + index = index.expand(batch_size, output_length, src_length) + attn_dists_size = (batch_size, output_length, self.num_types) + attn_dists = attn.new_zeros(attn_dists_size) + attn_dists.scatter_add_(2, index, attn.float()) + + # Final distributions, [batch_size, output_length, num_types]. + return gen_dists + attn_dists + + def get_normalized_probs( + self, + net_output: Tuple[Tensor, Optional[Dict[str, List[Optional[Tensor]]]]], + log_probs: bool, + sample: Optional[Dict[str, Tensor]] = None, + ): + """ + Get normalized probabilities (or log probs) from a net's output. + Pointer-generator network output is already normalized. + """ + probs = net_output[0] + # Make sure the probabilities are greater than zero when returning log + # probabilities. + return probs.clamp(1e-10, 1.0).log() if log_probs else probs + + +class Embedding(nn.Embedding): + r"""A simple lookup table that stores embeddings of a fixed dictionary and size. + This module is often used to store word embeddings and retrieve them using indices. + The input to the module is a list of indices, and the output is the corresponding + word embeddings. This subclass differs from the standard PyTorch Embedding class by + allowing additional vocabulary entries that will be mapped to the unknown token + embedding. + Args: + num_embeddings (int): size of the dictionary of embeddings + embedding_dim (int): the size of each embedding vector + padding_idx (int): Pads the output with the embedding vector at :attr:`padding_idx` + (initialized to zeros) whenever it encounters the index. + unk_idx (int): Maps all token indices that are greater than or equal to + num_embeddings to this index. + Attributes: + weight (Tensor): the learnable weights of the module of shape (num_embeddings, embedding_dim) + initialized from :math:`\mathcal{N}(0, 1)` + Shape: + - Input: :math:`(*)`, LongTensor of arbitrary shape containing the indices to extract + - Output: :math:`(*, H)`, where `*` is the input shape and :math:`H=\text{embedding\_dim}` + .. note:: + Keep in mind that only a limited number of optimizers support + sparse gradients: currently it's :class:`optim.SGD` (`CUDA` and `CPU`), + :class:`optim.SparseAdam` (`CUDA` and `CPU`) and :class:`optim.Adagrad` (`CPU`) + .. note:: + With :attr:`padding_idx` set, the embedding vector at + :attr:`padding_idx` is initialized to all zeros. However, note that this + vector can be modified afterwards, e.g., using a customized + initialization method, and thus changing the vector used to pad the + output. The gradient for this vector from :class:`~torch.nn.Embedding` + is always zero. 
+ """ + __constants__ = ["unk_idx"] + + # Torchscript: Inheriting from Embedding class produces an error when exporting to Torchscript + # -> RuntimeError: Unable to cast Python instance to C++ type (compile in debug mode for details + # It's happening because max_norm attribute from nn.Embedding is None by default and it cannot be + # cast to a C++ type + def __init__( + self, + num_embeddings: int, + embedding_dim: int, + padding_idx: Optional[int], + unk_idx: int, + max_norm: Optional[float] = float("inf"), + ): + super().__init__(num_embeddings, embedding_dim, padding_idx=padding_idx, max_norm=max_norm) + self.unk_idx = unk_idx + nn.init.normal_(self.weight, mean=0, std=embedding_dim ** -0.5) + nn.init.constant_(self.weight[padding_idx], 0) + + def forward(self, input): + input = torch.where( + input >= self.num_embeddings, torch.ones_like(input) * self.unk_idx, input + ) + return nn.functional.embedding( + input, self.weight, self.padding_idx, self.max_norm, + self.norm_type, self.scale_grad_by_freq, self.sparse + ) + + +@register_model_architecture( + "transformer_pointer_generator", "transformer_pointer_generator" +) +def transformer_pointer_generator(args): + args.alignment_heads = getattr(args, "alignment_heads", 1) + args.alignment_layer = getattr(args, "alignment_layer", -1) + base_architecture(args) + if args.alignment_layer < 0: + args.alignment_layer = args.decoder_layers + args.alignment_layer + + +@register_model_architecture( + "transformer_pointer_generator", "transformer_pointer_generator_iwslt_de_en" +) +def transformer_pointer_generator_iwslt_de_en(args): + args.encoder_embed_dim = getattr(args, "encoder_embed_dim", 512) + args.encoder_ffn_embed_dim = getattr(args, "encoder_ffn_embed_dim", 1024) + args.encoder_attention_heads = getattr(args, "encoder_attention_heads", 4) + args.encoder_layers = getattr(args, "encoder_layers", 6) + args.decoder_embed_dim = getattr(args, "decoder_embed_dim", 512) + args.decoder_ffn_embed_dim = getattr(args, "decoder_ffn_embed_dim", 1024) + args.decoder_attention_heads = getattr(args, "decoder_attention_heads", 4) + args.decoder_layers = getattr(args, "decoder_layers", 6) + transformer_pointer_generator(args) + + +@register_model_architecture( + "transformer_pointer_generator", "transformer_pointer_generator_wmt_en_de" +) +def transformer_pointer_generator_wmt_en_de(args): + transformer_pointer_generator(args) + + +# Transformer pointer-generator with the base Transformer parameters as used in +# the "Attention Is All You Need" paper (Vaswani et al., 2017) +@register_model_architecture( + "transformer_pointer_generator", + "transformer_pointer_generator_vaswani_wmt_en_de_big", +) +def transformer_pointer_generator_vaswani_wmt_en_de_big(args): + args.encoder_embed_dim = getattr(args, "encoder_embed_dim", 1024) + args.encoder_ffn_embed_dim = getattr(args, "encoder_ffn_embed_dim", 4096) + args.encoder_attention_heads = getattr(args, "encoder_attention_heads", 16) + args.encoder_normalize_before = getattr(args, "encoder_normalize_before", False) + args.decoder_embed_dim = getattr(args, "decoder_embed_dim", 1024) + args.decoder_ffn_embed_dim = getattr(args, "decoder_ffn_embed_dim", 4096) + args.decoder_attention_heads = getattr(args, "decoder_attention_heads", 16) + args.dropout = getattr(args, "dropout", 0.3) + transformer_pointer_generator(args) + + +@register_model_architecture( + "transformer_pointer_generator", + "transformer_pointer_generator_vaswani_wmt_en_fr_big", +) +def transformer_pointer_generator_vaswani_wmt_en_fr_big(args): + 
args.dropout = getattr(args, "dropout", 0.1) + transformer_pointer_generator_vaswani_wmt_en_de_big(args) + + +@register_model_architecture( + "transformer_pointer_generator", "transformer_pointer_generator_wmt_en_de_big" +) +def transformer_pointer_generator_wmt_en_de_big(args): + args.attention_dropout = getattr(args, "attention_dropout", 0.1) + transformer_pointer_generator_vaswani_wmt_en_de_big(args) + + +# default parameters used in tensor2tensor implementation +@register_model_architecture( + "transformer_pointer_generator", "transformer_pointer_generator_wmt_en_de_big_t2t" +) +def transformer_pointer_generator_wmt_en_de_big_t2t(args): + args.encoder_normalize_before = getattr(args, "encoder_normalize_before", True) + args.decoder_normalize_before = getattr(args, "decoder_normalize_before", True) + args.attention_dropout = getattr(args, "attention_dropout", 0.1) + args.activation_dropout = getattr(args, "activation_dropout", 0.1) + transformer_pointer_generator_vaswani_wmt_en_de_big(args) diff --git a/SpeechT5/fairseq/examples/pointer_generator/postprocess.py b/SpeechT5/fairseq/examples/pointer_generator/postprocess.py new file mode 100644 index 0000000000000000000000000000000000000000..b213aed80fd1e3d86f975256fcb7d9d4c16ca857 --- /dev/null +++ b/SpeechT5/fairseq/examples/pointer_generator/postprocess.py @@ -0,0 +1,96 @@ +#!/usr/bin/env python3 +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +import argparse +import re +import sys + + +class OOVIndexError(IndexError): + def __init__(self, pos, source_seq, target_seq): + super(OOVIndexError, self).__init__( + "A <unk-N> tag in the target sequence refers to a position that is " + "outside the source sequence. Most likely there was a mismatch in " + "provided source and target sequences. Otherwise this would mean that " + "the pointing mechanism somehow attended to a position that is past " + "the actual sequence end." + ) + self.source_pos = pos + self.source_seq = source_seq + self.target_seq = target_seq + + +def replace_oovs(source_in, target_in, target_out): + """Replaces <unk-N> tokens in the target text with the corresponding word in + the source text. + """ + + oov_re = re.compile("^<unk-([0-9]+)>$") + + for source_seq, target_seq in zip(source_in, target_in): + target_seq_out = [] + + pos_to_word = source_seq.strip().split() + for token in target_seq.strip().split(): + m = oov_re.match(token) + if m: + pos = int(m.group(1)) + if pos >= len(pos_to_word): + raise OOVIndexError(pos, source_seq, target_seq) + token_out = pos_to_word[pos] + else: + token_out = token + target_seq_out.append(token_out) + target_out.write(" ".join(target_seq_out) + "\n") + + +def main(): + parser = argparse.ArgumentParser( + description="Replaces <unk-N> tokens in target sequences with words from " + "the corresponding position in the source sequence." 
+ ) + parser.add_argument( + "--source", type=str, help="text file with source sequences", required=True + ) + parser.add_argument( + "--target", type=str, help="text file with target sequences", required=True + ) + parser.add_argument( + "--target-out", + type=str, + help="where to write target sequences without <unk-N> " "entries", + required=True, + ) + args = parser.parse_args() + + target_in = ( + open(args.target, "r", encoding="utf-8") if args.target is not None else None + ) + target_out = ( + open(args.target_out, "w", encoding="utf-8") + if args.target_out is not None + else None + ) + with open(args.source, "r", encoding="utf-8") as source_in, open( + args.target, "r", encoding="utf-8" + ) as target_in, open(args.target_out, "w", encoding="utf-8") as target_out: + replace_oovs(source_in, target_in, target_out) + + +if __name__ == "__main__": + try: + main() + except OOVIndexError as e: + print(e, file=sys.stderr) + print("Source sequence:", e.source_seq.strip(), file=sys.stderr) + print("Target sequence:", e.target_seq.strip(), file=sys.stderr) + print( + "Source sequence length:", + len(e.source_seq.strip().split()), + file=sys.stderr, + ) + print("The offending tag points to:", e.source_pos) + sys.exit(2) diff --git a/SpeechT5/fairseq/examples/pointer_generator/preprocess.py b/SpeechT5/fairseq/examples/pointer_generator/preprocess.py new file mode 100644 index 0000000000000000000000000000000000000000..f72ca7d3d97e12ab7b405dcff314bdb6c0a78755 --- /dev/null +++ b/SpeechT5/fairseq/examples/pointer_generator/preprocess.py @@ -0,0 +1,102 @@ +#!/usr/bin/env python3 +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +import argparse +from itertools import zip_longest + + +def replace_oovs(source_in, target_in, vocabulary, source_out, target_out): + """Replaces out-of-vocabulary words in source and target text with <unk-N>, + where N in is the position of the word in the source sequence. + """ + + def format_unk(pos): + return "<unk-{}>".format(pos) + + if target_in is None: + target_in = [] + + for seq_num, (source_seq, target_seq) in enumerate( + zip_longest(source_in, target_in) + ): + source_seq_out = [] + target_seq_out = [] + + word_to_pos = dict() + for position, token in enumerate(source_seq.strip().split()): + if token in vocabulary: + token_out = token + else: + if token in word_to_pos: + oov_pos = word_to_pos[token] + else: + word_to_pos[token] = position + oov_pos = position + token_out = format_unk(oov_pos) + source_seq_out.append(token_out) + source_out.write(" ".join(source_seq_out) + "\n") + + if target_seq is not None: + for token in target_seq.strip().split(): + if token in word_to_pos: + token_out = format_unk(word_to_pos[token]) + else: + token_out = token + target_seq_out.append(token_out) + if target_out is not None: + target_out.write(" ".join(target_seq_out) + "\n") + + +def main(): + parser = argparse.ArgumentParser( + description="Replaces out-of-vocabulary words in both source and target " + "sequences with tokens that indicate the position of the word " + "in the source sequence." 
+ ) + parser.add_argument( + "--source", type=str, help="text file with source sequences", required=True + ) + parser.add_argument( + "--target", type=str, help="text file with target sequences", default=None + ) + parser.add_argument("--vocab", type=str, help="vocabulary file", required=True) + parser.add_argument( + "--source-out", + type=str, + help="where to write source sequences with <unk-N> entries", + required=True, + ) + parser.add_argument( + "--target-out", + type=str, + help="where to write target sequences with <unk-N> entries", + default=None, + ) + args = parser.parse_args() + + with open(args.vocab, encoding="utf-8") as vocab: + vocabulary = vocab.read().splitlines() + + target_in = ( + open(args.target, "r", encoding="utf-8") if args.target is not None else None + ) + target_out = ( + open(args.target_out, "w", encoding="utf-8") + if args.target_out is not None + else None + ) + with open(args.source, "r", encoding="utf-8") as source_in, open( + args.source_out, "w", encoding="utf-8" + ) as source_out: + replace_oovs(source_in, target_in, vocabulary, source_out, target_out) + if target_in is not None: + target_in.close() + if target_out is not None: + target_out.close() + + +if __name__ == "__main__": + main() diff --git a/SpeechT5/fairseq/examples/quant_noise/README.md b/SpeechT5/fairseq/examples/quant_noise/README.md new file mode 100644 index 0000000000000000000000000000000000000000..539c3d5af906d353e264a1c44612229255428dba --- /dev/null +++ b/SpeechT5/fairseq/examples/quant_noise/README.md @@ -0,0 +1,298 @@ +# Training with Quantization Noise for Extreme Model Compression ({Fan\*, Stock\*} *et al.*, 2020) +This page contains information for how to train and quantize models with Quantization Noise, for both scalar quantization like `int8` and Iterative Product Quantization. +Check out our paper [here](https://arxiv.org/abs/2004.07320). + +Looking for pretrained models? They will be added shortly. +Looking for code to train vision models? We are working on open sourcing our code as part of ClassyVision. Please check back, but note that both the Scalar and Iterative Product Quantization counterparts of the `nn.Conv2d` module are already included in this release. + +**Contents**: +- [Walk through of code](#walk-through-the-code) +- [Reproduce NLP Results](#looking-to-reproduce-the-nlp-results-in-the-paper) +- [Reproduce Vision Results](#looking-to-reproduce-the-vision-results-in-the-paper) + + +## Citation +```bibtex +@article{fan2020training, + title={Training with Quantization Noise for Extreme Model Compression}, + author={Angela Fan* and Pierre Stock* and and Benjamin Graham and Edouard Grave and Remi Gribonval and Herve Jegou and Armand Joulin}, + year={2020}, + eprint={2004.07320}, + archivePrefix={arXiv}, + primaryClass={cs.ML} +} +``` + +## Walk through the code + +Training a model with Quant-Noise improves the performance in subsequent inference-time quantization by training models to be robust to quantization. This technique is useful for both scalar and product quantization methods, as well as multiple domains. We detail below our approach to train, quantize models and integrate our code to quantize your favorite models. + +### Scalar Quantization + +Unlike the section [Iterative Product Quantization](#iterative-product-quantization) which gives state-of-the-art compression, this section showcases the usefulness of our approach for simple scalar quantization baselines such as int8 using on-GPU Fake Quantization. 
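As a rough mental model (a toy illustration only, not the fairseq implementation), fake int8 quantization rounds values onto an int8 grid and immediately maps them back to floating point, so the surrounding computation stays in float:

```python
# Toy illustration of fake (simulated) int8 quantization; the scaling scheme
# below is a generic symmetric per-tensor choice, not fairseq's exact recipe.
import torch

def fake_quantize_int8(x: torch.Tensor) -> torch.Tensor:
    scale = x.abs().max().clamp(min=1e-8) / 127.0        # symmetric per-tensor scale
    q = torch.clamp(torch.round(x / scale), -128, 127)   # "quantize" onto the int8 grid
    return q * scale                                      # "de-quantize" back to float

w = torch.randn(4, 4)
print((w - fake_quantize_int8(w)).abs().max())  # small rounding error
```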
+ +#### Training + +Scalar quantization with Quant-Noise consists in randomly quantizing a proportion `p` of the weights during training. Scalar quantization is implemented [here](https://github.com/pytorch/fairseq/tree/master/fairseq/modules/quantization/scalar) under the form of Fake Quantization, meaning that we emulate int8 on GPU by quantizing and de-quantizing both the weights and the activations. We rely on PyTorch's [quantization primitives](https://github.com/pytorch/pytorch/tree/master/torch/quantization). + +To train a model with Quant-Noise, add the following flag: +``` +--quant-noise-scalar 0.5 +``` +Large values of noise make the network easier to quantize but may result in higher non-quantized test and validation perplexities. + +#### Quantization + +When evaluating a network, all quantized modules and activation hooks automatically switch to `p=1` so the validation accuracy reported by Fairseq is actually the quantized one, nothing more to do. + + +#### Integration with your own code + +Looking to quantize your own models with Quant-Noise + Scalar Quantization? +- Use the function `quantize_model_` implemented [here](https://github.com/pytorch/fairseq/tree/master/fairseq/modules/quantization/scalar/utils.py) to (1) replace all your modules by their quantized counterparts and (2) add hooks to those modules to quantize the activations. +- Then, perform your training as usual. Note that in `eval()` mode, the network is always fully quantized (weights and activations) by default (`p=1`). + + + +### Iterative Product Quantization + + +Iterative Product Quantization with Quant-Noise proceeds in two steps. First, a model must be trained uncompressed with Quant-Noise. Second, the model must be quantized with iPQ. Note that we implement here the simplest form of noise, which consists in randomly dropping a proportion `p` of blocks, and that worked as well as assigning those blocks to their current centroid. + +#### Training + +To train a model with Quant-Noise, add the following flags: +``` +--quant-noise-pq 0.1 --quant-noise-pq-block-size 8 +``` +`quant-noise-pq` controls how much dropout is applied to the blocks of the weight matrix. `quant-noise-pq-block-size` controls the size of the weight matrix blocks. +We recommend training with 0.05 to 0.2 Quant-Noise, a value that worked well in our experiments. For the block-size, we recommend training with block-size of 8. Note that the block size must be a multiple of `input_features`, see the size checks [here](https://github.com/pytorch/fairseq/tree/master/fairseq/modules/quant_noise.py). Large block sizes result in higher compression ratio but may induce a loss in accuracy. + +We currently support training Transformer based models, such as sequence-to-sequence, language models, and BERT architectures. The `quant_noise` function [here](https://github.com/pytorch/fairseq/tree/master/fairseq/modules/quant_noise.py) wraps a module. It splits a weight matrix into blocks and applies random dropout to these blocks. +In the Transformer architectures, quant-noise is applied to the input and output embeddings, the attention, and the FFN. + +Quant-Noise can also be combined with **LayerDrop** (see [here](https://github.com/pytorch/fairseq/tree/master/examples/layerdrop)) to add its pruning effect to the quantized model and make the model even smaller. We recommend training with LayerDrop 0.1 or 0.2. 
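For reference, wrapping a single layer with the `quant_noise` function looks roughly like the sketch below; the layer sizes and noise values are illustrative assumptions, not recommendations from the paper:

```python
# Minimal sketch: apply quantization noise to one linear layer before training.
# Dimensions and hyperparameters here are illustrative assumptions.
import torch
import torch.nn as nn
from fairseq.modules.quant_noise import quant_noise

p = 0.1          # proportion of blocks dropped (--quant-noise-pq)
block_size = 8   # block size (--quant-noise-pq-block-size)

# the layer's input dimension (512) is divisible by block_size, as required by the size checks
layer = quant_noise(nn.Linear(512, 2048), p=p, block_size=block_size)

x = torch.randn(4, 512)
y = layer(x)      # noise is only applied in training mode; layer.eval() runs it as usual
print(y.shape)    # torch.Size([4, 2048])
```

In fairseq's Transformer models this wrapping is already done for the embeddings, attention and FFN when the flags above are set, so a snippet like this is only needed when integrating Quant-Noise into your own modules.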
+ +#### Quantization + +We implement an improved version of product quantization from Stock et al, **iPQ**, described [here](https://arxiv.org/abs/1907.05686), see code with old API [here](https://github.com/facebookresearch/kill-the-bits). Note that we improved the iPQ API in terms of both compute speed and usability as described below. + +For the particular case of PQ, quantization is made sequentially. We recommend first quantizing the FFNs, then the EMBs, and finally the ATTNs. Quantization is done in two sub-steps: +- First, perform `n` steps of Product Quantization (generally `n=20` is enough). +- Then, finetune the obtained centroids. + +#### Integration with your own code + +Looking to quantize your own models with Quant-Noise + iPQ? +- First wrap your modules with the `quant_noise` function [here](https://github.com/pytorch/fairseq/tree/master/fairseq/modules/quant_noise.py), which is module-agnostic and train your favorite model. +- Then, quantize your trained model using the code [here](https://github.com/pytorch/fairseq/tree/master/fairseq/modules/quantization/pq). This can be done *without any changes to your training loop*. Below is an example code for integration. +Note that we tried our approach only on Transformers and various Convolutional Models such as EfficientNets. + +```python +from fairseq.modules.quantization.pq import quantize_model_, SizeTracker + +# get configuration parameters +n_centroids_config = config["n_centroids"] +block_sizes_config = config["block_sizes"] +layers_to_quantize = config["layers_to_quantize"] + +# size tracker for keeping track of assignments, centroids and non-compressed sizes +size_tracker = SizeTracker(model) + +# Quantize model by stages +for step in range(len(layers_to_quantize)): + + # quantize model in-place + quantized_layers = quantize_model_( + model, + size_tracker, + layers_to_quantize, + block_sizes_config, + n_centroids_config, + step=step, + ) + logger.info(f"Finetuning stage {step}, quantized layers: {quantized_layers}") + logger.info(f"{size_tracker}") + + # Don't forget to re-create/update trainer/optimizer since model parameters have changed + optimizer = ... + + # Finetune the centroids with your usual training loop for a few epochs + trainer.train_epoch() +``` + + +## Looking to reproduce the NLP results in the paper? + +We detail below how to reproduce the state-of-the-art results in reported in the paper for Quant-Noise + Iterative Product Quantization. + +### Training with Quant-Noise + +To **train** RoBERTa + QuantNoise, we followed this setting [here](https://github.com/pytorch/fairseq/tree/master/examples/roberta). 
+The following command can be used to train a RoBERTa Base + QuantNoise model: + +```bash +TOTAL_UPDATES=125000 +WARMUP_UPDATES=10000 +PEAK_LR=0.0005 +TOKENS_PER_SAMPLE=512 +MAX_POSITIONS=512 +MAX_SENTENCES=16 +UPDATE_FREQ=2 +DATA_DIR=/path/to/data/here + +fairseq-train $DATA_DIR \ + --task masked_lm --criterion masked_lm --arch roberta_base \ + --sample-break-mode complete \ + --tokens-per-sample $TOKENS_PER_SAMPLE --max-positions $MAX_POSITIONS \ + --optimizer adam --adam-betas '(0.9, 0.98)' --adam-eps 1e-6 \ + --clip-norm 0.0 \ + --lr-scheduler polynomial_decay --lr $PEAK_LR \ + --warmup-updates $WARMUP_UPDATES --total-num-update $TOTAL_UPDATES \ + --dropout 0.1 --attention-dropout 0.1 \ + --weight-decay 0.01 \ + --batch-size $MAX_SENTENCES \ + --update-freq $UPDATE_FREQ --max-update $TOTAL_UPDATES \ + --save-dir checkpoint/roberta \ + --ddp-backend legacy_ddp --encoder-layerdrop 0.2 \ + --quant-noise-pq 0.2 --quant-noise-pq-block-size 8 --untie-weights-roberta +``` + +To **finetune** RoBERTa + QuantNoise, we followed this setting [here](https://github.com/pytorch/fairseq/blob/master/examples/roberta/README.glue.md). +The following command can be used to finetune a RoBERTa Base + QuantNoise model on the RTE dataset: + +```bash +TOTAL_NUM_UPDATES=2036 +WARMUP_UPDATES=122 +LR=2e-05 +NUM_CLASSES=2 +MAX_SENTENCES=16 +ROBERTA_PATH=/path/to/roberta_quantnoise/model.pt + +fairseq-train /path/to/rte/data/ \ + --restore-file $ROBERTA_PATH \ + --max-positions 512 \ + --batch-size $MAX_SENTENCES \ + --max-tokens 4400 \ + --task sentence_prediction \ + --reset-optimizer --reset-dataloader --reset-meters \ + --required-batch-size-multiple 1 \ + --init-token 0 --separator-token 2 \ + --arch roberta_large \ + --criterion sentence_prediction \ + --num-classes $NUM_CLASSES \ + --dropout 0.1 --attention-dropout 0.1 \ + --weight-decay 0.1 --optimizer adam --adam-betas "(0.9, 0.98)" --adam-eps 1e-06 \ + --clip-norm 0.0 \ + --lr-scheduler polynomial_decay --lr $LR --total-num-update $TOTAL_NUM_UPDATES --warmup-updates $WARMUP_UPDATES \ + --fp16 --fp16-init-scale 4 --threshold-loss-scale 1 --fp16-scale-window 128 \ + --max-epoch 10 \ + --find-unused-parameters \ + --best-checkpoint-metric accuracy --maximize-best-checkpoint-metric \ + --ddp-backend legacy_ddp \ + --quant-noise-pq 0.2 --quant-noise-pq-block-size 8 +``` + +To **train** Language Models on Wikitext-103, we followed this setting [here](https://github.com/pytorch/fairseq/tree/master/examples/language_model). 
+The following command can be used to train a Transformer + QuantNoise model on Wikitext-103: + +```bash +fairseq-train --task language_modeling /path/to/wikitext-103/data \ + --save-dir checkpoints/transformer_wikitext-103 \ + --adaptive-input --adaptive-input-cutoff 20000,60000 --adaptive-input-factor 4 \ + --adaptive-softmax-cutoff 20000,60000 --adaptive-softmax-dropout 0.2 --adaptive-softmax-factor 4.0 \ + --tie-adaptive-proj --tie-adaptive-weights \ + --arch transformer_lm_gbw \ + --attention-dropout 0.1 --dropout 0.2 --relu-dropout 0.1 \ + --clip-norm 0.1 --criterion adaptive_loss \ + --ddp-backend legacy_ddp \ + --decoder-attention-heads 8 --decoder-embed-dim 1024 --decoder-ffn-embed-dim 4096 --decoder-input-dim 1024 \ + --decoder-layers 16 --decoder-normalize-before --decoder-output-dim 1024 \ + --min-lr 0.0001 --lr-period-updates 270000 --lr-scheduler cosine --lr-shrink 0.75 --lr 1.0 --t-mult 2.0 \ + --max-tokens 3072 --tokens-per-sample 3072 --momentum 0.99 --optimizer nag \ + --sample-break-mode none --update-freq 3 \ + --warmup-init-lr 1e-07 --warmup-updates 16000 \ + --weight-decay 0 --seed 1 --stop-min-lr 1e-09 \ + --quant-noise-pq 0.05 --quant-noise-pq-block-size 8 +``` + +To **evaluate** this model, note you need to use the `eval.py` script. The following command can be used to evaluate: + +```bash +fairseq-eval-lm /path/to/wikitext-103/data --path /path/to/model/checkpoint \ + --sample-break-mode complete \ + --max-tokens 3072 \ + --context-window 2560 \ + --softmax-batch 1024 \ + --gen-subset valid +``` +and change the `--gen-subset` to `test` if you would like to evaluate on the test set instead. + + +### Iterative Product Quantization + +To quantize the finetuned RoBERTa model, we use this command on 1 GPU. This should run in a day. +```bash +TOTAL_NUM_UPDATES=6108 # 2036 updates for each iteration +WARMUP_UPDATES=122 +LR=2e-05 +NUM_CLASSES=2 +MAX_SENTENCES=16 +fairseq-train --task sentence_prediction /path/to/data/ \ + --restore-file $ROBERTA_PATH \ + --save-dir checkpoints/roberta_finetuned \ + --max-positions 512 \ + --batch-size $MAX_SENTENCES \ + --max-tokens 4400 \ + --init-token 0 --separator-token 2 \ + --arch roberta_large \ + --criterion sentence_prediction \ + --num-classes $NUM_CLASSES \ + --dropout 0.1 --attention-dropout 0.1 \ + --weight-decay 0.1 --optimizer adam --adam-betas "(0.9, 0.98)" --adam-eps 1e-06 \ + --clip-norm 0.0 --lr-scheduler polynomial_decay \ + --fp16 --fp16-init-scale 4 --threshold-loss-scale 1 --fp16-scale-window 128 \ + --no-progress-bar --skip-invalid-size-inputs-valid-test --ddp-backend legacy_ddp \ + --quantization-config-path /path/to/config/yaml +``` + +To quantize the trained Language Model, we use this command on 8 V100 23GB GPUs. This should run in a couple of hours. 
+```bash +fairseq-train --task language_modeling /path/to/wikitext-103/data \ + --save-dir checkpoints/transformer_wikitext-103 \ + --adaptive-input --adaptive-input-cutoff 20000,60000 --adaptive-input-factor 4 \ + --adaptive-softmax-cutoff 20000,60000 --adaptive-softmax-dropout 0.2 --adaptive-softmax-factor 4.0 \ + --arch transformer_lm_gbw \ + --attention-dropout 0.1 --dropout 0.2 --relu-dropout 0.1 \ + --bucket-cap-mb 25 --char-embedder-highway-layers 2 --character-embedding-dim 4 \ + --clip-norm 0.1 --criterion adaptive_loss \ + --ddp-backend legacy_ddp \ + --decoder-attention-heads 8 --decoder-embed-dim 1024 --decoder-ffn-embed-dim 4096 --decoder-input-dim 1024 --decoder-layers 16 --decoder-normalize-before --decoder-output-dim 1024 \ + --fp16 --keep-last-epochs -1 \ + --min-lr 0.0001 --lr-period-updates 270000 --lr-scheduler cosine --lr-shrink 0.75 --lr 0.05 --stop-min-lr 1e-09 \ + --max-tokens 2944 --tokens-per-sample 2944\ + --momentum 0.99 --no-epoch-checkpoints --no-progress-bar --optimizer nag --required-batch-size-multiple 8 \ + --sample-break-mode none --t-mult 2.0 --skip-invalid-size-inputs-valid-test \ + --tie-adaptive-proj --tie-adaptive-weights --update-freq 3 --weight-decay 0 --seed 1 \ + --log-interval 100 --no-progress-bar --skip-invalid-size-inputs-valid-test \ + --restore-file path/to/trained/lm/with/quant/noise \ + --max-update 13500 --quantization-config-path /path/to/config/yaml +``` +If you have less capacity or if your distributed training freezes, try reducing `--max-tokens` and `--tokens-per-sample` (this may reduce the quantized accuracy a bit). + +### Remarks + +We try to keep the open-sourced code as readable and as easy-to-plug as possible. Therefore, we did not test it for the following cases: +- Scalar quantization with RoBERTa. +- Quantization with iPQ and `int8` combined. + +If you have trouble adapting it, we will be more than happy to help! + +## Looking to reproduce the Vision results in the paper? + +We are working on open sourcing our code as part of ClassyVision. Please check back. + + +## Having an issue or have a question? + +Please open an issue in this repository with the details of your question. Thanks! diff --git a/SpeechT5/fairseq/examples/quant_noise/transformer_quantization_config.yaml b/SpeechT5/fairseq/examples/quant_noise/transformer_quantization_config.yaml new file mode 100644 index 0000000000000000000000000000000000000000..d4be14a93a3593f8e6dc66c3b05061bfdde3e0e0 --- /dev/null +++ b/SpeechT5/fairseq/examples/quant_noise/transformer_quantization_config.yaml @@ -0,0 +1,33 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. 
+ +# This file defines example configuration arguments for quantizing +# a transformer model with product quantization + +# Number of Centroids for Product Quantization, by default 256 (byte-aligned) +n_centroids: + Linear: + key: in_features + value: {"*": 256} + Embedding: + key: embedding_dim + value: {"*": 256} + +# Block Sizes for Product Quantization +# We suggest: 8 for FFN, 4 for ATTN, 4 for embedding projections, 8 for embeddings +block_sizes: + Linear: + key: fuzzy_name + value: {fc: 8, attn: 4, emb: 4} + Embedding: + key: fuzzy_name + value: {emb: 8} + +# Layers to Quantize Sequentially +# We suggest: first FFN, then EMB, then ATTN +layers_to_quantize: + - decoder\\.layers\\.\d+\\.fc[12] + - decoder\\.embed_tokens\\.embeddings\\.[012]\\.[01] + - decoder\\.layers\\.\d+\\.self_attn\\.(k_proj|v_proj|q_proj|out_proj) diff --git a/SpeechT5/fairseq/examples/roberta/README.custom_classification.md b/SpeechT5/fairseq/examples/roberta/README.custom_classification.md new file mode 100644 index 0000000000000000000000000000000000000000..7254bb7d178760ef5b847901bbcac3711af33ca2 --- /dev/null +++ b/SpeechT5/fairseq/examples/roberta/README.custom_classification.md @@ -0,0 +1,168 @@ +# Finetuning RoBERTa on a custom classification task + +This example shows how to finetune RoBERTa on the IMDB dataset, but should illustrate the process for most classification tasks. + +### 1) Get the data + +```bash +wget http://ai.stanford.edu/~amaas/data/sentiment/aclImdb_v1.tar.gz +tar zxvf aclImdb_v1.tar.gz +``` + + +### 2) Format data + +`IMDB` data has one data-sample in each file, below python code-snippet converts it one file for train and valid each for ease of processing. +```python +import argparse +import os +import random +from glob import glob + +random.seed(0) + +def main(args): + for split in ['train', 'test']: + samples = [] + for class_label in ['pos', 'neg']: + fnames = glob(os.path.join(args.datadir, split, class_label) + '/*.txt') + for fname in fnames: + with open(fname) as fin: + line = fin.readline() + samples.append((line, 1 if class_label == 'pos' else 0)) + random.shuffle(samples) + out_fname = 'train' if split == 'train' else 'dev' + f1 = open(os.path.join(args.datadir, out_fname + '.input0'), 'w') + f2 = open(os.path.join(args.datadir, out_fname + '.label'), 'w') + for sample in samples: + f1.write(sample[0] + '\n') + f2.write(str(sample[1]) + '\n') + f1.close() + f2.close() + +if __name__ == '__main__': + parser = argparse.ArgumentParser() + parser.add_argument('--datadir', default='aclImdb') + args = parser.parse_args() + main(args) +``` + + +### 3) BPE encode + +Run `multiprocessing_bpe_encoder`, you can also do this in previous step for each sample but that might be slower. +```bash +# Download encoder.json and vocab.bpe +wget -N 'https://dl.fbaipublicfiles.com/fairseq/gpt2_bpe/encoder.json' +wget -N 'https://dl.fbaipublicfiles.com/fairseq/gpt2_bpe/vocab.bpe' + +for SPLIT in train dev; do + python -m examples.roberta.multiprocessing_bpe_encoder \ + --encoder-json encoder.json \ + --vocab-bpe vocab.bpe \ + --inputs "aclImdb/$SPLIT.input0" \ + --outputs "aclImdb/$SPLIT.input0.bpe" \ + --workers 60 \ + --keep-empty +done +``` + + +### 4) Preprocess data + +```bash +# Download fairseq dictionary. 
+wget -N 'https://dl.fbaipublicfiles.com/fairseq/gpt2_bpe/dict.txt' + +fairseq-preprocess \ + --only-source \ + --trainpref "aclImdb/train.input0.bpe" \ + --validpref "aclImdb/dev.input0.bpe" \ + --destdir "IMDB-bin/input0" \ + --workers 60 \ + --srcdict dict.txt + +fairseq-preprocess \ + --only-source \ + --trainpref "aclImdb/train.label" \ + --validpref "aclImdb/dev.label" \ + --destdir "IMDB-bin/label" \ + --workers 60 + +``` + + +### 5) Run training + +```bash +TOTAL_NUM_UPDATES=7812 # 10 epochs through IMDB for bsz 32 +WARMUP_UPDATES=469 # 6 percent of the number of updates +LR=1e-05 # Peak LR for polynomial LR scheduler. +HEAD_NAME=imdb_head # Custom name for the classification head. +NUM_CLASSES=2 # Number of classes for the classification task. +MAX_SENTENCES=8 # Batch size. +ROBERTA_PATH=/path/to/roberta.large/model.pt + +CUDA_VISIBLE_DEVICES=0 fairseq-train IMDB-bin/ \ + --restore-file $ROBERTA_PATH \ + --max-positions 512 \ + --batch-size $MAX_SENTENCES \ + --max-tokens 4400 \ + --task sentence_prediction \ + --reset-optimizer --reset-dataloader --reset-meters \ + --required-batch-size-multiple 1 \ + --init-token 0 --separator-token 2 \ + --arch roberta_large \ + --criterion sentence_prediction \ + --classification-head-name $HEAD_NAME \ + --num-classes $NUM_CLASSES \ + --dropout 0.1 --attention-dropout 0.1 \ + --weight-decay 0.1 --optimizer adam --adam-betas "(0.9, 0.98)" --adam-eps 1e-06 \ + --clip-norm 0.0 \ + --lr-scheduler polynomial_decay --lr $LR --total-num-update $TOTAL_NUM_UPDATES --warmup-updates $WARMUP_UPDATES \ + --fp16 --fp16-init-scale 4 --threshold-loss-scale 1 --fp16-scale-window 128 \ + --max-epoch 10 \ + --best-checkpoint-metric accuracy --maximize-best-checkpoint-metric \ + --shorten-method "truncate" \ + --find-unused-parameters \ + --update-freq 4 +``` + +The above command will finetune RoBERTa-large with an effective batch-size of 32 +sentences (`--batch-size=8 --update-freq=4`). The expected +`best-validation-accuracy` after 10 epochs is ~96.5%. + +If you run out of GPU memory, try decreasing `--batch-size` and increase +`--update-freq` to compensate. + + +### 6) Load model using hub interface + +Now we can load the trained model checkpoint using the RoBERTa hub interface. 
+ +Assuming your checkpoints are stored in `checkpoints/`: +```python +from fairseq.models.roberta import RobertaModel +roberta = RobertaModel.from_pretrained( + 'checkpoints', + checkpoint_file='checkpoint_best.pt', + data_name_or_path='IMDB-bin' +) +roberta.eval() # disable dropout +``` + +Finally you can make predictions using the `imdb_head` (or whatever you set +`--classification-head-name` to during training): +```python +label_fn = lambda label: roberta.task.label_dictionary.string( + [label + roberta.task.label_dictionary.nspecial] +) + +tokens = roberta.encode('Best movie this year') +pred = label_fn(roberta.predict('imdb_head', tokens).argmax().item()) +assert pred == '1' # positive + +tokens = roberta.encode('Worst movie ever') +pred = label_fn(roberta.predict('imdb_head', tokens).argmax().item()) +assert pred == '0' # negative +``` diff --git a/SpeechT5/fairseq/examples/roberta/README.glue.md b/SpeechT5/fairseq/examples/roberta/README.glue.md new file mode 100644 index 0000000000000000000000000000000000000000..77015d2e2f76fb7d62fe20c504d14b0c817f19c9 --- /dev/null +++ b/SpeechT5/fairseq/examples/roberta/README.glue.md @@ -0,0 +1,99 @@ +# Finetuning RoBERTa on GLUE tasks + +### 1) Download the data from GLUE website (https://gluebenchmark.com/tasks) using following commands: +```bash +wget https://gist.githubusercontent.com/W4ngatang/60c2bdb54d156a41194446737ce03e2e/raw/17b8dd0d724281ed7c3b2aeeda662b92809aadd5/download_glue_data.py +python download_glue_data.py --data_dir glue_data --tasks all +``` + +### 2) Preprocess GLUE task data: +```bash +./examples/roberta/preprocess_GLUE_tasks.sh glue_data <glue_task_name> +``` +`glue_task_name` is one of the following: +`{ALL, QQP, MNLI, QNLI, MRPC, RTE, STS-B, SST-2, CoLA}` +Use `ALL` for preprocessing all the glue tasks. + +### 3) Fine-tuning on GLUE task: +Example fine-tuning cmd for `RTE` task +```bash +TOTAL_NUM_UPDATES=2036 # 10 epochs through RTE for bsz 16 +WARMUP_UPDATES=122 # 6 percent of the number of updates +LR=2e-05 # Peak LR for polynomial LR scheduler. +NUM_CLASSES=2 +MAX_SENTENCES=16 # Batch size. 
+ROBERTA_PATH=/path/to/roberta/model.pt + +CUDA_VISIBLE_DEVICES=0 fairseq-train RTE-bin/ \ + --restore-file $ROBERTA_PATH \ + --max-positions 512 \ + --batch-size $MAX_SENTENCES \ + --max-tokens 4400 \ + --task sentence_prediction \ + --reset-optimizer --reset-dataloader --reset-meters \ + --required-batch-size-multiple 1 \ + --init-token 0 --separator-token 2 \ + --arch roberta_large \ + --criterion sentence_prediction \ + --num-classes $NUM_CLASSES \ + --dropout 0.1 --attention-dropout 0.1 \ + --weight-decay 0.1 --optimizer adam --adam-betas "(0.9, 0.98)" --adam-eps 1e-06 \ + --clip-norm 0.0 \ + --lr-scheduler polynomial_decay --lr $LR --total-num-update $TOTAL_NUM_UPDATES --warmup-updates $WARMUP_UPDATES \ + --fp16 --fp16-init-scale 4 --threshold-loss-scale 1 --fp16-scale-window 128 \ + --max-epoch 10 \ + --find-unused-parameters \ + --best-checkpoint-metric accuracy --maximize-best-checkpoint-metric; +``` + +For each of the GLUE task, you will need to use following cmd-line arguments: + +Model | MNLI | QNLI | QQP | RTE | SST-2 | MRPC | CoLA | STS-B +---|---|---|---|---|---|---|---|--- +`--num-classes` | 3 | 2 | 2 | 2 | 2 | 2 | 2 | 1 +`--lr` | 1e-5 | 1e-5 | 1e-5 | 2e-5 | 1e-5 | 1e-5 | 1e-5 | 2e-5 +`--batch-size` | 32 | 32 | 32 | 16 | 32 | 16 | 16 | 16 +`--total-num-update` | 123873 | 33112 | 113272 | 2036 | 20935 | 2296 | 5336 | 3598 +`--warmup-updates` | 7432 | 1986 | 28318 | 122 | 1256 | 137 | 320 | 214 + +For `STS-B` additionally add `--regression-target --best-checkpoint-metric loss` and remove `--maximize-best-checkpoint-metric`. + +**Note:** + +a) `--total-num-updates` is used by `--polynomial_decay` scheduler and is calculated for `--max-epoch=10` and `--batch-size=16/32` depending on the task. + +b) Above cmd-args and hyperparams are tested on one Nvidia `V100` GPU with `32gb` of memory for each task. Depending on the GPU memory resources available to you, you can use increase `--update-freq` and reduce `--batch-size`. + +c) All the settings in above table are suggested settings based on our hyperparam search within a fixed search space (for careful comparison across models). You might be able to find better metrics with wider hyperparam search. 
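If you change the batch size or number of epochs, the schedule values from the table need to be recomputed. Below is a rough sketch of the calculation described in the notes; the dataset size is a made-up example, and the exact table values come from the authors' own setup:

```python
# Sketch: derive --total-num-update and --warmup-updates for a custom setup.
# train_examples is a hypothetical number, not an actual GLUE task size.
import math

train_examples = 10_000   # hypothetical training-set size
batch_size = 16           # --batch-size (per the table above)
max_epoch = 10            # --max-epoch

total_num_update = math.ceil(train_examples / batch_size) * max_epoch
warmup_updates = round(0.06 * total_num_update)   # "6 percent of the number of updates"
print(total_num_update, warmup_updates)           # 6250 375
```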
+ +### Inference on GLUE task +After training the model as mentioned in previous step, you can perform inference with checkpoints in `checkpoints/` directory using following python code snippet: + +```python +from fairseq.models.roberta import RobertaModel + +roberta = RobertaModel.from_pretrained( + 'checkpoints/', + checkpoint_file='checkpoint_best.pt', + data_name_or_path='RTE-bin' +) + +label_fn = lambda label: roberta.task.label_dictionary.string( + [label + roberta.task.label_dictionary.nspecial] +) +ncorrect, nsamples = 0, 0 +roberta.cuda() +roberta.eval() +with open('glue_data/RTE/dev.tsv') as fin: + fin.readline() + for index, line in enumerate(fin): + tokens = line.strip().split('\t') + sent1, sent2, target = tokens[1], tokens[2], tokens[3] + tokens = roberta.encode(sent1, sent2) + prediction = roberta.predict('sentence_classification_head', tokens).argmax().item() + prediction_label = label_fn(prediction) + ncorrect += int(prediction_label == target) + nsamples += 1 +print('| Accuracy: ', float(ncorrect)/float(nsamples)) + +``` diff --git a/SpeechT5/fairseq/examples/roberta/README.md b/SpeechT5/fairseq/examples/roberta/README.md new file mode 100644 index 0000000000000000000000000000000000000000..58091b2c7d7949e10fe963c7e85d0c727a006b5e --- /dev/null +++ b/SpeechT5/fairseq/examples/roberta/README.md @@ -0,0 +1,296 @@ +# RoBERTa: A Robustly Optimized BERT Pretraining Approach + +https://arxiv.org/abs/1907.11692 + +## Introduction + +RoBERTa iterates on BERT's pretraining procedure, including training the model longer, with bigger batches over more data; removing the next sentence prediction objective; training on longer sequences; and dynamically changing the masking pattern applied to the training data. See the associated paper for more details. + +### What's New: + +- December 2020: German model (GottBERT) is available: [GottBERT](https://github.com/pytorch/fairseq/tree/master/examples/gottbert). +- January 2020: Italian model (UmBERTo) is available from Musixmatch Research: [UmBERTo](https://github.com/musixmatchresearch/umberto). +- November 2019: French model (CamemBERT) is available: [CamemBERT](https://github.com/pytorch/fairseq/tree/master/examples/camembert). +- November 2019: Multilingual encoder (XLM-RoBERTa) is available: [XLM-R](https://github.com/pytorch/fairseq/tree/master/examples/xlmr). +- September 2019: TensorFlow and TPU support via the [transformers library](https://github.com/huggingface/transformers). +- August 2019: RoBERTa is now supported in the [pytorch-transformers library](https://github.com/huggingface/pytorch-transformers). +- August 2019: Added [tutorial for finetuning on WinoGrande](https://github.com/pytorch/fairseq/tree/master/examples/roberta/wsc#roberta-training-on-winogrande-dataset). +- August 2019: Added [tutorial for pretraining RoBERTa using your own data](README.pretraining.md). 
+ +## Pre-trained models + +Model | Description | # params | Download +---|---|---|--- +`roberta.base` | RoBERTa using the BERT-base architecture | 125M | [roberta.base.tar.gz](https://dl.fbaipublicfiles.com/fairseq/models/roberta.base.tar.gz) +`roberta.large` | RoBERTa using the BERT-large architecture | 355M | [roberta.large.tar.gz](https://dl.fbaipublicfiles.com/fairseq/models/roberta.large.tar.gz) +`roberta.large.mnli` | `roberta.large` finetuned on [MNLI](http://www.nyu.edu/projects/bowman/multinli) | 355M | [roberta.large.mnli.tar.gz](https://dl.fbaipublicfiles.com/fairseq/models/roberta.large.mnli.tar.gz) +`roberta.large.wsc` | `roberta.large` finetuned on [WSC](wsc/README.md) | 355M | [roberta.large.wsc.tar.gz](https://dl.fbaipublicfiles.com/fairseq/models/roberta.large.wsc.tar.gz) + +## Results + +**[GLUE (Wang et al., 2019)](https://gluebenchmark.com/)** +_(dev set, single model, single-task finetuning)_ + +Model | MNLI | QNLI | QQP | RTE | SST-2 | MRPC | CoLA | STS-B +---|---|---|---|---|---|---|---|--- +`roberta.base` | 87.6 | 92.8 | 91.9 | 78.7 | 94.8 | 90.2 | 63.6 | 91.2 +`roberta.large` | 90.2 | 94.7 | 92.2 | 86.6 | 96.4 | 90.9 | 68.0 | 92.4 +`roberta.large.mnli` | 90.2 | - | - | - | - | - | - | - + +**[SuperGLUE (Wang et al., 2019)](https://super.gluebenchmark.com/)** +_(dev set, single model, single-task finetuning)_ + +Model | BoolQ | CB | COPA | MultiRC | RTE | WiC | WSC +---|---|---|---|---|---|---|--- +`roberta.large` | 86.9 | 98.2 | 94.0 | 85.7 | 89.5 | 75.6 | - +`roberta.large.wsc` | - | - | - | - | - | - | 91.3 + +**[SQuAD (Rajpurkar et al., 2018)](https://rajpurkar.github.io/SQuAD-explorer/)** +_(dev set, no additional data used)_ + +Model | SQuAD 1.1 EM/F1 | SQuAD 2.0 EM/F1 +---|---|--- +`roberta.large` | 88.9/94.6 | 86.5/89.4 + +**[RACE (Lai et al., 2017)](http://www.qizhexie.com/data/RACE_leaderboard.html)** +_(test set)_ + +Model | Accuracy | Middle | High +---|---|---|--- +`roberta.large` | 83.2 | 86.5 | 81.3 + +**[HellaSwag (Zellers et al., 2019)](https://rowanzellers.com/hellaswag/)** +_(test set)_ + +Model | Overall | In-domain | Zero-shot | ActivityNet | WikiHow +---|---|---|---|---|--- +`roberta.large` | 85.2 | 87.3 | 83.1 | 74.6 | 90.9 + +**[Commonsense QA (Talmor et al., 2019)](https://www.tau-nlp.org/commonsenseqa)** +_(test set)_ + +Model | Accuracy +---|--- +`roberta.large` (single model) | 72.1 +`roberta.large` (ensemble) | 72.5 + +**[Winogrande (Sakaguchi et al., 2019)](https://arxiv.org/abs/1907.10641)** +_(test set)_ + +Model | Accuracy +---|--- +`roberta.large` | 78.1 + +**[XNLI (Conneau et al., 2018)](https://arxiv.org/abs/1809.05053)** +_(TRANSLATE-TEST)_ + +Model | en | fr | es | de | el | bg | ru | tr | ar | vi | th | zh | hi | sw | ur +---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|--- +`roberta.large.mnli` | 91.3 | 82.91 | 84.27 | 81.24 | 81.74 | 83.13 | 78.28 | 76.79 | 76.64 | 74.17 | 74.05 | 77.5 | 70.9 | 66.65 | 66.81 + +## Example usage + +##### Load RoBERTa from torch.hub (PyTorch >= 1.1): +```python +import torch +roberta = torch.hub.load('pytorch/fairseq', 'roberta.large') +roberta.eval() # disable dropout (or leave in train mode to finetune) +``` + +##### Load RoBERTa (for PyTorch 1.0 or custom models): +```python +# Download roberta.large model +wget https://dl.fbaipublicfiles.com/fairseq/models/roberta.large.tar.gz +tar -xzvf roberta.large.tar.gz + +# Load the model in fairseq +from fairseq.models.roberta import RobertaModel +roberta = RobertaModel.from_pretrained('/path/to/roberta.large', checkpoint_file='model.pt') 
+roberta.eval() # disable dropout (or leave in train mode to finetune) +``` + +##### Apply Byte-Pair Encoding (BPE) to input text: +```python +tokens = roberta.encode('Hello world!') +assert tokens.tolist() == [0, 31414, 232, 328, 2] +roberta.decode(tokens) # 'Hello world!' +``` + +##### Extract features from RoBERTa: +```python +# Extract the last layer's features +last_layer_features = roberta.extract_features(tokens) +assert last_layer_features.size() == torch.Size([1, 5, 1024]) + +# Extract all layer's features (layer 0 is the embedding layer) +all_layers = roberta.extract_features(tokens, return_all_hiddens=True) +assert len(all_layers) == 25 +assert torch.all(all_layers[-1] == last_layer_features) +``` + +##### Use RoBERTa for sentence-pair classification tasks: +```python +# Download RoBERTa already finetuned for MNLI +roberta = torch.hub.load('pytorch/fairseq', 'roberta.large.mnli') +roberta.eval() # disable dropout for evaluation + +# Encode a pair of sentences and make a prediction +tokens = roberta.encode('Roberta is a heavily optimized version of BERT.', 'Roberta is not very optimized.') +roberta.predict('mnli', tokens).argmax() # 0: contradiction + +# Encode another pair of sentences +tokens = roberta.encode('Roberta is a heavily optimized version of BERT.', 'Roberta is based on BERT.') +roberta.predict('mnli', tokens).argmax() # 2: entailment +``` + +##### Register a new (randomly initialized) classification head: +```python +roberta.register_classification_head('new_task', num_classes=3) +logprobs = roberta.predict('new_task', tokens) # tensor([[-1.1050, -1.0672, -1.1245]], grad_fn=<LogSoftmaxBackward>) +``` + +##### Batched prediction: +```python +import torch +from fairseq.data.data_utils import collate_tokens + +roberta = torch.hub.load('pytorch/fairseq', 'roberta.large.mnli') +roberta.eval() + +batch_of_pairs = [ + ['Roberta is a heavily optimized version of BERT.', 'Roberta is not very optimized.'], + ['Roberta is a heavily optimized version of BERT.', 'Roberta is based on BERT.'], + ['potatoes are awesome.', 'I like to run.'], + ['Mars is very far from earth.', 'Mars is very close.'], +] + +batch = collate_tokens( + [roberta.encode(pair[0], pair[1]) for pair in batch_of_pairs], pad_idx=1 +) + +logprobs = roberta.predict('mnli', batch) +print(logprobs.argmax(dim=1)) +# tensor([0, 2, 1, 0]) +``` + +##### Using the GPU: +```python +roberta.cuda() +roberta.predict('new_task', tokens) # tensor([[-1.1050, -1.0672, -1.1245]], device='cuda:0', grad_fn=<LogSoftmaxBackward>) +``` + +## Advanced usage + +#### Filling masks: + +RoBERTa can be used to fill `<mask>` tokens in the input. 
Some examples from the +[Natural Questions dataset](https://ai.google.com/research/NaturalQuestions/): +```python +roberta.fill_mask('The first Star wars movie came out in <mask>', topk=3) +# [('The first Star wars movie came out in 1977', 0.9504708051681519, ' 1977'), ('The first Star wars movie came out in 1978', 0.009986862540245056, ' 1978'), ('The first Star wars movie came out in 1979', 0.009574787691235542, ' 1979')] + +roberta.fill_mask('Vikram samvat calender is official in <mask>', topk=3) +# [('Vikram samvat calender is official in India', 0.21878819167613983, ' India'), ('Vikram samvat calender is official in Delhi', 0.08547237515449524, ' Delhi'), ('Vikram samvat calender is official in Gujarat', 0.07556215673685074, ' Gujarat')] + +roberta.fill_mask('<mask> is the common currency of the European Union', topk=3) +# [('Euro is the common currency of the European Union', 0.9456493854522705, 'Euro'), ('euro is the common currency of the European Union', 0.025748178362846375, 'euro'), ('€ is the common currency of the European Union', 0.011183084920048714, '€')] +``` + +#### Pronoun disambiguation (Winograd Schema Challenge): + +RoBERTa can be used to disambiguate pronouns. First install spaCy and download the English-language model: +```bash +pip install spacy +python -m spacy download en_core_web_lg +``` + +Next load the `roberta.large.wsc` model and call the `disambiguate_pronoun` +function. The pronoun should be surrounded by square brackets (`[]`) and the +query referent surrounded by underscores (`_`), or left blank to return the +predicted candidate text directly: +```python +roberta = torch.hub.load('pytorch/fairseq', 'roberta.large.wsc', user_dir='examples/roberta/wsc') +roberta.cuda() # use the GPU (optional) + +roberta.disambiguate_pronoun('The _trophy_ would not fit in the brown suitcase because [it] was too big.') +# True +roberta.disambiguate_pronoun('The trophy would not fit in the brown _suitcase_ because [it] was too big.') +# False + +roberta.disambiguate_pronoun('The city councilmen refused the demonstrators a permit because [they] feared violence.') +# 'The city councilmen' +roberta.disambiguate_pronoun('The city councilmen refused the demonstrators a permit because [they] advocated violence.') +# 'demonstrators' +``` + +See the [RoBERTA Winograd Schema Challenge (WSC) README](wsc/README.md) for more details on how to train this model. + +#### Extract features aligned to words: + +By default RoBERTa outputs one feature vector per BPE token. You can instead +realign the features to match [spaCy's word-level tokenization](https://spacy.io/usage/linguistic-features#tokenization) +with the `extract_features_aligned_to_words` method. This will compute a +weighted average of the BPE-level features for each word and expose them in +spaCy's `Token.vector` attribute: +```python +doc = roberta.extract_features_aligned_to_words('I said, "hello RoBERTa."') +assert len(doc) == 10 +for tok in doc: + print('{:10}{} (...)'.format(str(tok), tok.vector[:5])) +# <s> tensor([-0.1316, -0.0386, -0.0832, -0.0477, 0.1943], grad_fn=<SliceBackward>) (...) +# I tensor([ 0.0559, 0.1541, -0.4832, 0.0880, 0.0120], grad_fn=<SliceBackward>) (...) +# said tensor([-0.1565, -0.0069, -0.8915, 0.0501, -0.0647], grad_fn=<SliceBackward>) (...) +# , tensor([-0.1318, -0.0387, -0.0834, -0.0477, 0.1944], grad_fn=<SliceBackward>) (...) +# " tensor([-0.0486, 0.1818, -0.3946, -0.0553, 0.0981], grad_fn=<SliceBackward>) (...) 
+# hello tensor([ 0.0079, 0.1799, -0.6204, -0.0777, -0.0923], grad_fn=<SliceBackward>) (...) +# RoBERTa tensor([-0.2339, -0.1184, -0.7343, -0.0492, 0.5829], grad_fn=<SliceBackward>) (...) +# . tensor([-0.1341, -0.1203, -0.1012, -0.0621, 0.1892], grad_fn=<SliceBackward>) (...) +# " tensor([-0.1341, -0.1203, -0.1012, -0.0621, 0.1892], grad_fn=<SliceBackward>) (...) +# </s> tensor([-0.0930, -0.0392, -0.0821, 0.0158, 0.0649], grad_fn=<SliceBackward>) (...) +``` + +#### Evaluating the `roberta.large.mnli` model: + +Example python code snippet to evaluate accuracy on the MNLI `dev_matched` set. +```python +label_map = {0: 'contradiction', 1: 'neutral', 2: 'entailment'} +ncorrect, nsamples = 0, 0 +roberta.cuda() +roberta.eval() +with open('glue_data/MNLI/dev_matched.tsv') as fin: + fin.readline() + for index, line in enumerate(fin): + tokens = line.strip().split('\t') + sent1, sent2, target = tokens[8], tokens[9], tokens[-1] + tokens = roberta.encode(sent1, sent2) + prediction = roberta.predict('mnli', tokens).argmax().item() + prediction_label = label_map[prediction] + ncorrect += int(prediction_label == target) + nsamples += 1 +print('| Accuracy: ', float(ncorrect)/float(nsamples)) +# Expected output: 0.9060 +``` + +## Finetuning + +- [Finetuning on GLUE](README.glue.md) +- [Finetuning on custom classification tasks (e.g., IMDB)](README.custom_classification.md) +- [Finetuning on Winograd Schema Challenge (WSC)](wsc/README.md) +- [Finetuning on Commonsense QA (CQA)](commonsense_qa/README.md) + +## Pretraining using your own data + +See the [tutorial for pretraining RoBERTa using your own data](README.pretraining.md). + +## Citation + +```bibtex +@article{liu2019roberta, + title = {RoBERTa: A Robustly Optimized BERT Pretraining Approach}, + author = {Yinhan Liu and Myle Ott and Naman Goyal and Jingfei Du and + Mandar Joshi and Danqi Chen and Omer Levy and Mike Lewis and + Luke Zettlemoyer and Veselin Stoyanov}, + journal={arXiv preprint arXiv:1907.11692}, + year = {2019}, +} +``` diff --git a/SpeechT5/fairseq/examples/roberta/README.pretraining.md b/SpeechT5/fairseq/examples/roberta/README.pretraining.md new file mode 100644 index 0000000000000000000000000000000000000000..8b6e10c08c14713e7f3f7ee37c44e9b6a662df06 --- /dev/null +++ b/SpeechT5/fairseq/examples/roberta/README.pretraining.md @@ -0,0 +1,98 @@ +# Pretraining RoBERTa using your own data + +This tutorial will walk you through pretraining RoBERTa over your own data. + +### 1) Preprocess the data + +Data should be preprocessed following the [language modeling format](/examples/language_model), i.e. each document should be separated by an empty line (only useful with `--sample-break-mode complete_doc`). Lines will be concatenated as a 1D text stream during training. + +We'll use the [WikiText-103 dataset](https://www.salesforce.com/products/einstein/ai-research/the-wikitext-dependency-language-modeling-dataset/) +to demonstrate how to preprocess raw text data with the GPT-2 BPE. Of course +this dataset is quite small, so the resulting pretrained model will perform +poorly, but it gives the general idea. 
+ +First download the dataset: +```bash +wget https://s3.amazonaws.com/research.metamind.io/wikitext/wikitext-103-raw-v1.zip +unzip wikitext-103-raw-v1.zip +``` + +Next encode it with the GPT-2 BPE: +```bash +mkdir -p gpt2_bpe +wget -O gpt2_bpe/encoder.json https://dl.fbaipublicfiles.com/fairseq/gpt2_bpe/encoder.json +wget -O gpt2_bpe/vocab.bpe https://dl.fbaipublicfiles.com/fairseq/gpt2_bpe/vocab.bpe +for SPLIT in train valid test; do \ + python -m examples.roberta.multiprocessing_bpe_encoder \ + --encoder-json gpt2_bpe/encoder.json \ + --vocab-bpe gpt2_bpe/vocab.bpe \ + --inputs wikitext-103-raw/wiki.${SPLIT}.raw \ + --outputs wikitext-103-raw/wiki.${SPLIT}.bpe \ + --keep-empty \ + --workers 60; \ +done +``` + +Finally preprocess/binarize the data using the GPT-2 fairseq dictionary: +```bash +wget -O gpt2_bpe/dict.txt https://dl.fbaipublicfiles.com/fairseq/gpt2_bpe/dict.txt +fairseq-preprocess \ + --only-source \ + --srcdict gpt2_bpe/dict.txt \ + --trainpref wikitext-103-raw/wiki.train.bpe \ + --validpref wikitext-103-raw/wiki.valid.bpe \ + --testpref wikitext-103-raw/wiki.test.bpe \ + --destdir data-bin/wikitext-103 \ + --workers 60 +``` + +### 2) Train RoBERTa base +```bash +TOTAL_UPDATES=125000 # Total number of training steps +WARMUP_UPDATES=10000 # Warmup the learning rate over this many updates +PEAK_LR=0.0005 # Peak learning rate, adjust as needed +TOKENS_PER_SAMPLE=512 # Max sequence length +MAX_POSITIONS=512 # Num. positional embeddings (usually same as above) +MAX_SENTENCES=16 # Number of sequences per batch (batch size) +UPDATE_FREQ=16 # Increase the batch size 16x + +DATA_DIR=data-bin/wikitext-103 + +fairseq-train --fp16 $DATA_DIR \ + --task masked_lm --criterion masked_lm \ + --arch roberta_base --sample-break-mode complete --tokens-per-sample $TOKENS_PER_SAMPLE \ + --optimizer adam --adam-betas '(0.9,0.98)' --adam-eps 1e-6 --clip-norm 0.0 \ + --lr-scheduler polynomial_decay --lr $PEAK_LR --warmup-updates $WARMUP_UPDATES --total-num-update $TOTAL_UPDATES \ + --dropout 0.1 --attention-dropout 0.1 --weight-decay 0.01 \ + --batch-size $MAX_SENTENCES --update-freq $UPDATE_FREQ \ + --max-update $TOTAL_UPDATES --log-format simple --log-interval 1 +``` + +**Note:** You can optionally resume training the released RoBERTa base model by +adding `--restore-file /path/to/roberta.base/model.pt`. + +**Note:** The above command assumes training on 8x32GB V100 GPUs. Each GPU uses +a batch size of 16 sequences (`$MAX_SENTENCES`) and accumulates gradients to +further increase the batch size by 16x (`$UPDATE_FREQ`), for a total batch size +of 2048 sequences. If you have fewer GPUs or GPUs with less memory you may need +to reduce `$MAX_SENTENCES` and increase `$UPDATE_FREQ` to compensate. +Alternatively if you have more GPUs you can decrease `$UPDATE_FREQ` accordingly +to increase training speed. + +**Note:** The learning rate and batch size are tightly connected and need to be +adjusted together. 
We generally recommend increasing the learning rate as you +increase the batch size according to the following table (although it's also +dataset dependent, so don't rely on the following values too closely): + +batch size | peak learning rate +---|--- +256 | 0.0001 +2048 | 0.0005 +8192 | 0.0007 + +### 3) Load your pretrained model +```python +from fairseq.models.roberta import RobertaModel +roberta = RobertaModel.from_pretrained('checkpoints', 'checkpoint_best.pt', 'path/to/data') +assert isinstance(roberta.model, torch.nn.Module) +``` diff --git a/SpeechT5/fairseq/examples/roberta/README.race.md b/SpeechT5/fairseq/examples/roberta/README.race.md new file mode 100644 index 0000000000000000000000000000000000000000..13c917e8eca6621e91dce541c7e41436b38cbdc1 --- /dev/null +++ b/SpeechT5/fairseq/examples/roberta/README.race.md @@ -0,0 +1,68 @@ +# Finetuning RoBERTa on RACE tasks + +### 1) Download the data from RACE website (http://www.cs.cmu.edu/~glai1/data/race/) + +### 2) Preprocess RACE data: +```bash +python ./examples/roberta/preprocess_RACE.py --input-dir <input-dir> --output-dir <extracted-data-dir> +./examples/roberta/preprocess_RACE.sh <extracted-data-dir> <output-dir> +``` + +### 3) Fine-tuning on RACE: + +```bash +MAX_EPOCH=5 # Number of training epochs. +LR=1e-05 # Peak LR for fixed LR scheduler. +NUM_CLASSES=4 +MAX_SENTENCES=1 # Batch size per GPU. +UPDATE_FREQ=8 # Accumulate gradients to simulate training on 8 GPUs. +DATA_DIR=/path/to/race-output-dir +ROBERTA_PATH=/path/to/roberta/model.pt + +CUDA_VISIBLE_DEVICES=0,1 fairseq-train $DATA_DIR --ddp-backend=legacy_ddp \ + --restore-file $ROBERTA_PATH \ + --reset-optimizer --reset-dataloader --reset-meters \ + --best-checkpoint-metric accuracy --maximize-best-checkpoint-metric \ + --task sentence_ranking \ + --num-classes $NUM_CLASSES \ + --init-token 0 --separator-token 2 \ + --max-option-length 128 \ + --max-positions 512 \ + --shorten-method "truncate" \ + --arch roberta_large \ + --dropout 0.1 --attention-dropout 0.1 --weight-decay 0.01 \ + --criterion sentence_ranking \ + --optimizer adam --adam-betas '(0.9, 0.98)' --adam-eps 1e-06 \ + --clip-norm 0.0 \ + --lr-scheduler fixed --lr $LR \ + --fp16 --fp16-init-scale 4 --threshold-loss-scale 1 --fp16-scale-window 128 \ + --batch-size $MAX_SENTENCES \ + --required-batch-size-multiple 1 \ + --update-freq $UPDATE_FREQ \ + --max-epoch $MAX_EPOCH +``` + +**Note:** + +a) As contexts in RACE are relatively long, we are using smaller batch size per GPU while increasing update-freq to achieve larger effective batch size. + +b) Above cmd-args and hyperparams are tested on one Nvidia `V100` GPU with `32gb` of memory for each task. Depending on the GPU memory resources available to you, you can use increase `--update-freq` and reduce `--batch-size`. + +c) The setting in above command is based on our hyperparam search within a fixed search space (for careful comparison across models). You might be able to find better metrics with wider hyperparam search. 
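As a quick sanity check on note (a), the effective batch size of the command above is the per-GPU batch size times the gradient-accumulation factor times the number of visible GPUs:

```python
# Sketch: effective batch size implied by the RACE fine-tuning command above.
batch_size_per_gpu = 1   # MAX_SENTENCES
update_freq = 8          # UPDATE_FREQ
num_gpus = 2             # CUDA_VISIBLE_DEVICES=0,1
print(batch_size_per_gpu * update_freq * num_gpus)  # 16 sequences per update
```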
+ +### 4) Evaluation: + +``` +DATA_DIR=/path/to/race-output-dir # data directory used during training +MODEL_PATH=/path/to/checkpoint_best.pt # path to the finetuned model checkpoint +PREDS_OUT=preds.tsv # output file path to save prediction +TEST_SPLIT=test # can be test (Middle) or test1 (High) +fairseq-validate \ + $DATA_DIR \ + --valid-subset $TEST_SPLIT \ + --path $MODEL_PATH \ + --batch-size 1 \ + --task sentence_ranking \ + --criterion sentence_ranking \ + --save-predictions $PREDS_OUT +``` diff --git a/SpeechT5/fairseq/examples/roberta/commonsense_qa/README.md b/SpeechT5/fairseq/examples/roberta/commonsense_qa/README.md new file mode 100644 index 0000000000000000000000000000000000000000..05c6f841a8966d2b74a8d3fe73bca22694fe9a8a --- /dev/null +++ b/SpeechT5/fairseq/examples/roberta/commonsense_qa/README.md @@ -0,0 +1,99 @@ +# Finetuning RoBERTa on Commonsense QA + +We follow a similar approach to [finetuning RACE](../README.race.md). Specifically +for each question we construct five inputs, one for each of the five candidate +answer choices. Each input is constructed by concatenating the question and +candidate answer. We then encode each input and pass the resulting "[CLS]" +representations through a fully-connected layer to predict the correct answer. +We train with a standard cross-entropy loss. + +We also found it helpful to prepend a prefix of `Q:` to the question and `A:` to +the answer. The complete input format is: +``` +<s> Q: Where would I not want a fox? </s> A: hen house </s> +``` + +Our final submission is based on a hyperparameter search over the learning rate +(1e-5, 2e-5, 3e-5), batch size (8, 16), number of training steps (2000, 3000, +4000) and random seed. We selected the model with the best performance on the +development set after 100 trials. + +### 1) Download data from the Commonsense QA website (https://www.tau-nlp.org/commonsenseqa) +```bash +bash examples/roberta/commonsense_qa/download_cqa_data.sh +``` + +### 2) Finetune + +```bash +MAX_UPDATES=3000 # Number of training steps. +WARMUP_UPDATES=150 # Linearly increase LR over this many steps. +LR=1e-05 # Peak LR for polynomial LR scheduler. +MAX_SENTENCES=16 # Batch size. +SEED=1 # Random seed. +ROBERTA_PATH=/path/to/roberta/model.pt +DATA_DIR=data/CommonsenseQA + +# we use the --user-dir option to load the task from +# the examples/roberta/commonsense_qa directory: +FAIRSEQ_PATH=/path/to/fairseq +FAIRSEQ_USER_DIR=${FAIRSEQ_PATH}/examples/roberta/commonsense_qa + +CUDA_VISIBLE_DEVICES=0 fairseq-train --fp16 --ddp-backend=legacy_ddp \ + $DATA_DIR \ + --user-dir $FAIRSEQ_USER_DIR \ + --restore-file $ROBERTA_PATH \ + --reset-optimizer --reset-dataloader --reset-meters \ + --no-epoch-checkpoints --no-last-checkpoints --no-save-optimizer-state \ + --best-checkpoint-metric accuracy --maximize-best-checkpoint-metric \ + --task commonsense_qa --init-token 0 --bpe gpt2 \ + --arch roberta_large --max-positions 512 \ + --dropout 0.1 --attention-dropout 0.1 --weight-decay 0.01 \ + --criterion sentence_ranking --num-classes 5 \ + --optimizer adam --adam-betas '(0.9, 0.98)' --adam-eps 1e-06 --clip-norm 0.0 \ + --lr-scheduler polynomial_decay --lr $LR \ + --warmup-updates $WARMUP_UPDATES --total-num-update $MAX_UPDATES \ + --batch-size $MAX_SENTENCES \ + --max-update $MAX_UPDATES \ + --log-format simple --log-interval 25 \ + --seed $SEED +``` + +The above command assumes training on 1 GPU with 32GB of RAM. For GPUs with +less memory, decrease `--batch-size` and increase `--update-freq` +accordingly to compensate. 
+ +### 3) Evaluate +```python +import json +import torch +from fairseq.models.roberta import RobertaModel +from examples.roberta import commonsense_qa # load the Commonsense QA task +roberta = RobertaModel.from_pretrained('checkpoints', 'checkpoint_best.pt', 'data/CommonsenseQA') +roberta.eval() # disable dropout +roberta.cuda() # use the GPU (optional) +nsamples, ncorrect = 0, 0 +with open('data/CommonsenseQA/valid.jsonl') as h: + for line in h: + example = json.loads(line) + scores = [] + for choice in example['question']['choices']: + input = roberta.encode( + 'Q: ' + example['question']['stem'], + 'A: ' + choice['text'], + no_separator=True + ) + score = roberta.predict('sentence_classification_head', input, return_logits=True) + scores.append(score) + pred = torch.cat(scores).argmax() + answer = ord(example['answerKey']) - ord('A') + nsamples += 1 + if pred == answer: + ncorrect += 1 + +print('Accuracy: ' + str(ncorrect / float(nsamples))) +# Accuracy: 0.7846027846027847 +``` + +The above snippet is not batched, which makes it quite slow. See [instructions +for batched prediction with RoBERTa](https://github.com/pytorch/fairseq/tree/master/examples/roberta#batched-prediction). diff --git a/SpeechT5/fairseq/examples/roberta/commonsense_qa/__init__.py b/SpeechT5/fairseq/examples/roberta/commonsense_qa/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..42d21f35eb3dd33a053dcf0edd5eadd2dff11294 --- /dev/null +++ b/SpeechT5/fairseq/examples/roberta/commonsense_qa/__init__.py @@ -0,0 +1,6 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +from . import commonsense_qa_task # noqa diff --git a/SpeechT5/fairseq/examples/roberta/commonsense_qa/commonsense_qa_task.py b/SpeechT5/fairseq/examples/roberta/commonsense_qa/commonsense_qa_task.py new file mode 100644 index 0000000000000000000000000000000000000000..216093f7087a61060767babf5a3f3f4e716a4dfe --- /dev/null +++ b/SpeechT5/fairseq/examples/roberta/commonsense_qa/commonsense_qa_task.py @@ -0,0 +1,190 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. 
+ +import json +import os + +import numpy as np +import torch +from fairseq.data import ( + Dictionary, + IdDataset, + ListDataset, + NestedDictionaryDataset, + NumelDataset, + NumSamplesDataset, + RawLabelDataset, + RightPadDataset, + SortDataset, + data_utils, + encoders, +) +from fairseq.tasks import LegacyFairseqTask, register_task + + +@register_task("commonsense_qa") +class CommonsenseQATask(LegacyFairseqTask): + """Task to finetune RoBERTa for Commonsense QA.""" + + @staticmethod + def add_args(parser): + """Add task-specific arguments to the parser.""" + parser.add_argument( + "data", metavar="DIR", help="path to data directory; we load <split>.jsonl" + ) + parser.add_argument( + "--init-token", + type=int, + default=None, + help="add token at the beginning of each batch item", + ) + parser.add_argument("--num-classes", type=int, default=5) + + def __init__(self, args, vocab): + super().__init__(args) + self.vocab = vocab + self.mask = vocab.add_symbol("<mask>") + + self.bpe = encoders.build_bpe(args) + + @classmethod + def load_dictionary(cls, filename): + """Load the dictionary from the filename + + Args: + filename (str): the filename + """ + dictionary = Dictionary.load(filename) + dictionary.add_symbol("<mask>") + return dictionary + + @classmethod + def setup_task(cls, args, **kwargs): + assert ( + args.criterion == "sentence_ranking" + ), "Must set --criterion=sentence_ranking" + + # load data and label dictionaries + vocab = cls.load_dictionary(os.path.join(args.data, "dict.txt")) + print("| dictionary: {} types".format(len(vocab))) + + return cls(args, vocab) + + def load_dataset( + self, split, epoch=1, combine=False, data_path=None, return_only=False, **kwargs + ): + """Load a given dataset split. + + Args: + split (str): name of the split (e.g., train, valid, test) + """ + + def binarize(s, append_bos=False): + if self.bpe is not None: + s = self.bpe.encode(s) + tokens = self.vocab.encode_line( + s, + append_eos=True, + add_if_not_exist=False, + ).long() + if append_bos and self.args.init_token is not None: + tokens = torch.cat([tokens.new([self.args.init_token]), tokens]) + return tokens + + if data_path is None: + data_path = os.path.join(self.args.data, split + ".jsonl") + if not os.path.exists(data_path): + raise FileNotFoundError("Cannot find data: {}".format(data_path)) + + src_tokens = [[] for i in range(self.args.num_classes)] + src_lengths = [[] for i in range(self.args.num_classes)] + labels = [] + + with open(data_path) as h: + for line in h: + example = json.loads(line.strip()) + if "answerKey" in example: + label = ord(example["answerKey"]) - ord("A") + labels.append(label) + question = example["question"]["stem"] + assert len(example["question"]["choices"]) == self.args.num_classes + # format: `<s> Q: Where would I not want a fox? 
</s> A: hen house </s>` + question = "Q: " + question + question_toks = binarize(question, append_bos=True) + for i, choice in enumerate(example["question"]["choices"]): + src = "A: " + choice["text"] + src_bin = torch.cat([question_toks, binarize(src)]) + src_tokens[i].append(src_bin) + src_lengths[i].append(len(src_bin)) + assert all( + len(src_tokens[0]) == len(src_tokens[i]) + for i in range(self.args.num_classes) + ) + assert len(src_tokens[0]) == len(src_lengths[0]) + assert len(labels) == 0 or len(labels) == len(src_tokens[0]) + + for i in range(self.args.num_classes): + src_lengths[i] = np.array(src_lengths[i]) + src_tokens[i] = ListDataset(src_tokens[i], src_lengths[i]) + src_lengths[i] = ListDataset(src_lengths[i]) + + dataset = { + "id": IdDataset(), + "nsentences": NumSamplesDataset(), + "ntokens": NumelDataset(src_tokens[0], reduce=True), + } + + for i in range(self.args.num_classes): + dataset.update( + { + "net_input{}".format(i + 1): { + "src_tokens": RightPadDataset( + src_tokens[i], + pad_idx=self.source_dictionary.pad(), + ), + "src_lengths": src_lengths[i], + } + } + ) + + if len(labels) > 0: + dataset.update({"target": RawLabelDataset(labels)}) + + dataset = NestedDictionaryDataset( + dataset, + sizes=[np.maximum.reduce([src_token.sizes for src_token in src_tokens])], + ) + + with data_utils.numpy_seed(self.args.seed): + dataset = SortDataset( + dataset, + # shuffle + sort_order=[np.random.permutation(len(dataset))], + ) + + print("| Loaded {} with {} samples".format(split, len(dataset))) + + self.datasets[split] = dataset + return self.datasets[split] + + def build_model(self, args): + from fairseq import models + + model = models.build_model(args, self) + + model.register_classification_head( + "sentence_classification_head", + num_classes=1, + ) + + return model + + @property + def source_dictionary(self): + return self.vocab + + @property + def target_dictionary(self): + return self.vocab diff --git a/SpeechT5/fairseq/examples/roberta/commonsense_qa/download_cqa_data.sh b/SpeechT5/fairseq/examples/roberta/commonsense_qa/download_cqa_data.sh new file mode 100644 index 0000000000000000000000000000000000000000..5f300093fa0a0feb819d8b6aed307b59e3891d01 --- /dev/null +++ b/SpeechT5/fairseq/examples/roberta/commonsense_qa/download_cqa_data.sh @@ -0,0 +1,14 @@ +#!/bin/bash +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +OUTDIR=data/CommonsenseQA + +mkdir -p $OUTDIR + +wget -O $OUTDIR/train.jsonl https://s3.amazonaws.com/commensenseqa/train_rand_split.jsonl +wget -O $OUTDIR/valid.jsonl https://s3.amazonaws.com/commensenseqa/dev_rand_split.jsonl +wget -O $OUTDIR/test.jsonl https://s3.amazonaws.com/commensenseqa/test_rand_split_no_answers.jsonl +wget -O $OUTDIR/dict.txt https://dl.fbaipublicfiles.com/fairseq/gpt2_bpe/dict.txt diff --git a/SpeechT5/fairseq/examples/roberta/multiprocessing_bpe_encoder.py b/SpeechT5/fairseq/examples/roberta/multiprocessing_bpe_encoder.py new file mode 100644 index 0000000000000000000000000000000000000000..43fe0451bf4d5762d734314075b1402c2a8db2bb --- /dev/null +++ b/SpeechT5/fairseq/examples/roberta/multiprocessing_bpe_encoder.py @@ -0,0 +1,130 @@ +#!/usr/bin/env python +# Copyright (c) Facebook, Inc. and its affiliates. +# All rights reserved. +# +# This source code is licensed under the license found in the +# LICENSE file in the root directory of this source tree. 
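+#
+# This helper applies the GPT-2 BPE to raw text with a pool of worker
+# processes; it is what preprocess_GLUE_tasks.sh and preprocess_RACE.sh below
+# invoke. A typical call looks like the following (the --inputs/--outputs
+# paths are placeholders):
+#
+#   python -m examples.roberta.multiprocessing_bpe_encoder \
+#       --encoder-json encoder.json \
+#       --vocab-bpe vocab.bpe \
+#       --inputs train.raw \
+#       --outputs train.bpe \
+#       --workers 60 --keep-empty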
+ +import argparse +import contextlib +import sys +from collections import Counter +from multiprocessing import Pool + +from fairseq.data.encoders.gpt2_bpe import get_encoder + + +def main(): + """ + Helper script to encode raw text with the GPT-2 BPE using multiple processes. + + The encoder.json and vocab.bpe files can be obtained here: + - https://dl.fbaipublicfiles.com/fairseq/gpt2_bpe/encoder.json + - https://dl.fbaipublicfiles.com/fairseq/gpt2_bpe/vocab.bpe + """ + parser = argparse.ArgumentParser() + parser.add_argument( + "--encoder-json", + help="path to encoder.json", + ) + parser.add_argument( + "--vocab-bpe", + type=str, + help="path to vocab.bpe", + ) + parser.add_argument( + "--inputs", + nargs="+", + default=["-"], + help="input files to filter/encode", + ) + parser.add_argument( + "--outputs", + nargs="+", + default=["-"], + help="path to save encoded outputs", + ) + parser.add_argument( + "--keep-empty", + action="store_true", + help="keep empty lines", + ) + parser.add_argument("--workers", type=int, default=20) + args = parser.parse_args() + + assert len(args.inputs) == len( + args.outputs + ), "number of input and output paths should match" + + with contextlib.ExitStack() as stack: + inputs = [ + stack.enter_context(open(input, "r", encoding="utf-8")) + if input != "-" + else sys.stdin + for input in args.inputs + ] + outputs = [ + stack.enter_context(open(output, "w", encoding="utf-8")) + if output != "-" + else sys.stdout + for output in args.outputs + ] + + encoder = MultiprocessingEncoder(args) + pool = Pool(args.workers, initializer=encoder.initializer) + encoded_lines = pool.imap(encoder.encode_lines, zip(*inputs), 100) + + stats = Counter() + for i, (filt, enc_lines) in enumerate(encoded_lines, start=1): + if filt == "PASS": + for enc_line, output_h in zip(enc_lines, outputs): + print(enc_line, file=output_h) + else: + stats["num_filtered_" + filt] += 1 + if i % 10000 == 0: + print("processed {} lines".format(i), file=sys.stderr) + + for k, v in stats.most_common(): + print("[{}] filtered {} lines".format(k, v), file=sys.stderr) + + +class MultiprocessingEncoder(object): + def __init__(self, args): + self.args = args + + def initializer(self): + global bpe + bpe = get_encoder(self.args.encoder_json, self.args.vocab_bpe) + + def encode(self, line): + global bpe + ids = bpe.encode(line) + return list(map(str, ids)) + + def decode(self, tokens): + global bpe + return bpe.decode(tokens) + + def encode_lines(self, lines): + """ + Encode a set of lines. All lines will be encoded together. + """ + enc_lines = [] + for line in lines: + line = line.strip() + if len(line) == 0 and not self.args.keep_empty: + return ["EMPTY", None] + tokens = self.encode(line) + enc_lines.append(" ".join(tokens)) + return ["PASS", enc_lines] + + def decode_lines(self, lines): + dec_lines = [] + for line in lines: + tokens = map(int, line.strip().split()) + dec_lines.append(self.decode(tokens)) + return ["PASS", dec_lines] + + +if __name__ == "__main__": + main() diff --git a/SpeechT5/fairseq/examples/roberta/preprocess_GLUE_tasks.sh b/SpeechT5/fairseq/examples/roberta/preprocess_GLUE_tasks.sh new file mode 100644 index 0000000000000000000000000000000000000000..7f215a3b53e1c4a7b1f0320102915a49d84a5015 --- /dev/null +++ b/SpeechT5/fairseq/examples/roberta/preprocess_GLUE_tasks.sh @@ -0,0 +1,185 @@ +#!/bin/bash +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. 
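+#
+# Usage (enforced by the argument check below):
+#   ./examples/roberta/preprocess_GLUE_tasks.sh <glue_data_folder> <task_name>
+# <task_name> is a single GLUE task (e.g. RTE) or ALL; the binarized output
+# is written to <task_name>-bin/.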
+ + +# raw glue data as downloaded by glue download script (https://gist.github.com/W4ngatang/60c2bdb54d156a41194446737ce03e2e) +if [[ $# -ne 2 ]]; then + echo "Run as following:" + echo "./examples/roberta/preprocess_GLUE_tasks.sh <glud_data_folder> <task_name>" + exit 1 +fi + +GLUE_DATA_FOLDER=$1 + +# download bpe encoder.json, vocabulary and fairseq dictionary +wget -N 'https://dl.fbaipublicfiles.com/fairseq/gpt2_bpe/encoder.json' +wget -N 'https://dl.fbaipublicfiles.com/fairseq/gpt2_bpe/vocab.bpe' +wget -N 'https://dl.fbaipublicfiles.com/fairseq/gpt2_bpe/dict.txt' + +TASKS=$2 # QQP + +if [ "$TASKS" = "ALL" ] +then + TASKS="QQP MNLI QNLI MRPC RTE STS-B SST-2 CoLA" +fi + +for TASK in $TASKS +do + echo "Preprocessing $TASK" + + TASK_DATA_FOLDER="$GLUE_DATA_FOLDER/$TASK" + echo "Raw data as downloaded from glue website: $TASK_DATA_FOLDER" + + SPLITS="train dev test" + INPUT_COUNT=2 + if [ "$TASK" = "QQP" ] + then + INPUT_COLUMNS=( 4 5 ) + TEST_INPUT_COLUMNS=( 2 3 ) + LABEL_COLUMN=6 + elif [ "$TASK" = "MNLI" ] + then + SPLITS="train dev_matched dev_mismatched test_matched test_mismatched" + INPUT_COLUMNS=( 9 10 ) + TEST_INPUT_COLUMNS=( 9 10 ) + DEV_LABEL_COLUMN=16 + LABEL_COLUMN=12 + elif [ "$TASK" = "QNLI" ] + then + INPUT_COLUMNS=( 2 3 ) + TEST_INPUT_COLUMNS=( 2 3 ) + LABEL_COLUMN=4 + elif [ "$TASK" = "MRPC" ] + then + INPUT_COLUMNS=( 4 5 ) + TEST_INPUT_COLUMNS=( 4 5 ) + LABEL_COLUMN=1 + elif [ "$TASK" = "RTE" ] + then + INPUT_COLUMNS=( 2 3 ) + TEST_INPUT_COLUMNS=( 2 3 ) + LABEL_COLUMN=4 + elif [ "$TASK" = "STS-B" ] + then + INPUT_COLUMNS=( 8 9 ) + TEST_INPUT_COLUMNS=( 8 9 ) + LABEL_COLUMN=10 + # Following are single sentence tasks. + elif [ "$TASK" = "SST-2" ] + then + INPUT_COLUMNS=( 1 ) + TEST_INPUT_COLUMNS=( 2 ) + LABEL_COLUMN=2 + INPUT_COUNT=1 + elif [ "$TASK" = "CoLA" ] + then + INPUT_COLUMNS=( 4 ) + TEST_INPUT_COLUMNS=( 2 ) + LABEL_COLUMN=2 + INPUT_COUNT=1 + fi + + # Strip out header and filter lines that don't have expected number of fields. + rm -rf "$TASK_DATA_FOLDER/processed" + mkdir -p "$TASK_DATA_FOLDER/processed" + for SPLIT in $SPLITS + do + # CoLA train and dev doesn't have header. + if [[ ( "$TASK" = "CoLA") && ( "$SPLIT" != "test" ) ]] + then + cp "$TASK_DATA_FOLDER/$SPLIT.tsv" "$TASK_DATA_FOLDER/processed/$SPLIT.tsv.temp"; + else + tail -n +2 "$TASK_DATA_FOLDER/$SPLIT.tsv" > "$TASK_DATA_FOLDER/processed/$SPLIT.tsv.temp"; + fi + + # Remove unformatted lines from train and dev files for QQP dataset. 
+ if [[ ( "$TASK" = "QQP") && ( "$SPLIT" != "test" ) ]] + then + awk -F '\t' -v NUM_FIELDS=6 'NF==NUM_FIELDS{print}{}' "$TASK_DATA_FOLDER/processed/$SPLIT.tsv.temp" > "$TASK_DATA_FOLDER/processed/$SPLIT.tsv"; + else + cp "$TASK_DATA_FOLDER/processed/$SPLIT.tsv.temp" "$TASK_DATA_FOLDER/processed/$SPLIT.tsv"; + fi + rm "$TASK_DATA_FOLDER/processed/$SPLIT.tsv.temp"; + done + + # Split into input0, input1 and label + for SPLIT in $SPLITS + do + for INPUT_TYPE in $(seq 0 $((INPUT_COUNT-1))) + do + if [[ "$SPLIT" != test* ]] + then + COLUMN_NUMBER=${INPUT_COLUMNS[$INPUT_TYPE]} + else + COLUMN_NUMBER=${TEST_INPUT_COLUMNS[$INPUT_TYPE]} + fi + cut -f"$COLUMN_NUMBER" "$TASK_DATA_FOLDER/processed/$SPLIT.tsv" > "$TASK_DATA_FOLDER/processed/$SPLIT.raw.input$INPUT_TYPE"; + done + + if [[ "$SPLIT" != test* ]] + then + if [ "$TASK" = "MNLI" ] && [ "$SPLIT" != "train" ] + then + cut -f"$DEV_LABEL_COLUMN" "$TASK_DATA_FOLDER/processed/$SPLIT.tsv" > "$TASK_DATA_FOLDER/processed/$SPLIT.label"; + else + cut -f"$LABEL_COLUMN" "$TASK_DATA_FOLDER/processed/$SPLIT.tsv" > "$TASK_DATA_FOLDER/processed/$SPLIT.label"; + fi + fi + + # BPE encode. + for INPUT_TYPE in $(seq 0 $((INPUT_COUNT-1))) + do + LANG="input$INPUT_TYPE" + echo "BPE encoding $SPLIT/$LANG" + python -m examples.roberta.multiprocessing_bpe_encoder \ + --encoder-json encoder.json \ + --vocab-bpe vocab.bpe \ + --inputs "$TASK_DATA_FOLDER/processed/$SPLIT.raw.$LANG" \ + --outputs "$TASK_DATA_FOLDER/processed/$SPLIT.$LANG" \ + --workers 60 \ + --keep-empty; + done + done + + # Remove output directory. + rm -rf "$TASK-bin" + + DEVPREF="$TASK_DATA_FOLDER/processed/dev.LANG" + TESTPREF="$TASK_DATA_FOLDER/processed/test.LANG" + if [ "$TASK" = "MNLI" ] + then + DEVPREF="$TASK_DATA_FOLDER/processed/dev_matched.LANG,$TASK_DATA_FOLDER/processed/dev_mismatched.LANG" + TESTPREF="$TASK_DATA_FOLDER/processed/test_matched.LANG,$TASK_DATA_FOLDER/processed/test_mismatched.LANG" + fi + + # Run fairseq preprocessing: + for INPUT_TYPE in $(seq 0 $((INPUT_COUNT-1))) + do + LANG="input$INPUT_TYPE" + fairseq-preprocess \ + --only-source \ + --trainpref "$TASK_DATA_FOLDER/processed/train.$LANG" \ + --validpref "${DEVPREF//LANG/$LANG}" \ + --testpref "${TESTPREF//LANG/$LANG}" \ + --destdir "$TASK-bin/$LANG" \ + --workers 60 \ + --srcdict dict.txt; + done + if [[ "$TASK" != "STS-B" ]] + then + fairseq-preprocess \ + --only-source \ + --trainpref "$TASK_DATA_FOLDER/processed/train.label" \ + --validpref "${DEVPREF//LANG/label}" \ + --destdir "$TASK-bin/label" \ + --workers 60; + else + # For STS-B output range is converted to be between: [0.0, 1.0] + mkdir -p "$TASK-bin/label" + awk '{print $1 / 5.0 }' "$TASK_DATA_FOLDER/processed/train.label" > "$TASK-bin/label/train.label" + awk '{print $1 / 5.0 }' "$TASK_DATA_FOLDER/processed/dev.label" > "$TASK-bin/label/valid.label" + fi +done diff --git a/SpeechT5/fairseq/examples/roberta/preprocess_RACE.py b/SpeechT5/fairseq/examples/roberta/preprocess_RACE.py new file mode 100644 index 0000000000000000000000000000000000000000..cdd66072718ccb6033304c97926271909a17f9d6 --- /dev/null +++ b/SpeechT5/fairseq/examples/roberta/preprocess_RACE.py @@ -0,0 +1,102 @@ +#!/usr/bin/env python +# Copyright (c) Facebook, Inc. and its affiliates. +# All rights reserved. +# +# This source code is licensed under the license found in the +# LICENSE file in the root directory of this source tree. 
+ +import argparse +import json +import os +import re + + +class InputExample: + def __init__(self, paragraph, qa_list, label): + self.paragraph = paragraph + self.qa_list = qa_list + self.label = label + + +def get_examples(data_dir, set_type): + """ + Extract paragraph and question-answer list from each json file + """ + examples = [] + + levels = ["middle", "high"] + set_type_c = set_type.split("-") + if len(set_type_c) == 2: + levels = [set_type_c[1]] + set_type = set_type_c[0] + for level in levels: + cur_dir = os.path.join(data_dir, set_type, level) + for filename in os.listdir(cur_dir): + cur_path = os.path.join(cur_dir, filename) + with open(cur_path, "r") as f: + cur_data = json.load(f) + answers = cur_data["answers"] + options = cur_data["options"] + questions = cur_data["questions"] + context = cur_data["article"].replace("\n", " ") + context = re.sub(r"\s+", " ", context) + for i in range(len(answers)): + label = ord(answers[i]) - ord("A") + qa_list = [] + question = questions[i] + for j in range(4): + option = options[i][j] + if "_" in question: + qa_cat = question.replace("_", option) + else: + qa_cat = " ".join([question, option]) + qa_cat = re.sub(r"\s+", " ", qa_cat) + qa_list.append(qa_cat) + examples.append(InputExample(context, qa_list, label)) + + return examples + + +def main(): + """ + Helper script to extract paragraphs questions and answers from RACE datasets. + """ + parser = argparse.ArgumentParser() + parser.add_argument( + "--input-dir", + help="input directory for downloaded RACE dataset", + ) + parser.add_argument( + "--output-dir", + help="output directory for extracted data", + ) + args = parser.parse_args() + + if not os.path.exists(args.output_dir): + os.makedirs(args.output_dir, exist_ok=True) + + for set_type in ["train", "dev", "test-middle", "test-high"]: + examples = get_examples(args.input_dir, set_type) + qa_file_paths = [ + os.path.join(args.output_dir, set_type + ".input" + str(i + 1)) + for i in range(4) + ] + qa_files = [open(qa_file_path, "w") for qa_file_path in qa_file_paths] + outf_context_path = os.path.join(args.output_dir, set_type + ".input0") + outf_label_path = os.path.join(args.output_dir, set_type + ".label") + outf_context = open(outf_context_path, "w") + outf_label = open(outf_label_path, "w") + for example in examples: + outf_context.write(example.paragraph + "\n") + for i in range(4): + qa_files[i].write(example.qa_list[i] + "\n") + outf_label.write(str(example.label) + "\n") + + for f in qa_files: + f.close() + outf_label.close() + outf_context.close() + + +if __name__ == "__main__": + main() diff --git a/SpeechT5/fairseq/examples/roberta/preprocess_RACE.sh b/SpeechT5/fairseq/examples/roberta/preprocess_RACE.sh new file mode 100644 index 0000000000000000000000000000000000000000..932d2ab6e521fecc7d0297f26a8c43857541ef3b --- /dev/null +++ b/SpeechT5/fairseq/examples/roberta/preprocess_RACE.sh @@ -0,0 +1,59 @@ +#!/bin/bash +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. 
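+#
+# Usage (enforced by the argument check below):
+#   ./examples/roberta/preprocess_RACE.sh <race_data_folder> <output_folder>
+# <race_data_folder> must contain the <split>.input*/<split>.label files
+# produced by preprocess_RACE.py above.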
+ + +# data should be downloaded and processed with reprocess_RACE.py +if [[ $# -ne 2 ]]; then + echo "Run as following:" + echo "./examples/roberta/preprocess_RACE.sh <race_data_folder> <output_folder>" + exit 1 +fi + +RACE_DATA_FOLDER=$1 +OUT_DATA_FOLDER=$2 + +# download bpe encoder.json, vocabulary and fairseq dictionary +wget -N 'https://dl.fbaipublicfiles.com/fairseq/gpt2_bpe/encoder.json' +wget -N 'https://dl.fbaipublicfiles.com/fairseq/gpt2_bpe/vocab.bpe' +wget -N 'https://dl.fbaipublicfiles.com/fairseq/gpt2_bpe/dict.txt' + +SPLITS="train dev test-middle test-high" +INPUT_TYPES="input0 input1 input2 input3 input4" +for INPUT_TYPE in $INPUT_TYPES +do + for SPLIT in $SPLITS + do + echo "BPE encoding $SPLIT/$INPUT_TYPE" + python -m examples.roberta.multiprocessing_bpe_encoder \ + --encoder-json encoder.json \ + --vocab-bpe vocab.bpe \ + --inputs "$RACE_DATA_FOLDER/$SPLIT.$INPUT_TYPE" \ + --outputs "$RACE_DATA_FOLDER/$SPLIT.$INPUT_TYPE.bpe" \ + --workers 10 \ + --keep-empty; + + done +done + +for INPUT_TYPE in $INPUT_TYPES + do + LANG="input$INPUT_TYPE" + fairseq-preprocess \ + --only-source \ + --trainpref "$RACE_DATA_FOLDER/train.$INPUT_TYPE.bpe" \ + --validpref "$RACE_DATA_FOLDER/dev.$INPUT_TYPE.bpe" \ + --testpref "$RACE_DATA_FOLDER/test-middle.$INPUT_TYPE.bpe,$RACE_DATA_FOLDER/test-high.$INPUT_TYPE.bpe" \ + --destdir "$OUT_DATA_FOLDER/$INPUT_TYPE" \ + --workers 10 \ + --srcdict dict.txt; +done + +rm -rf "$OUT_DATA_FOLDER/label" +mkdir -p "$OUT_DATA_FOLDER/label" +cp "$RACE_DATA_FOLDER/train.label" "$OUT_DATA_FOLDER/label/" +cp "$RACE_DATA_FOLDER/dev.label" "$OUT_DATA_FOLDER/label/valid.label" +cp "$RACE_DATA_FOLDER/test-middle.label" "$OUT_DATA_FOLDER/label/test.label" +cp "$RACE_DATA_FOLDER/test-high.label" "$OUT_DATA_FOLDER/label/test1.label" diff --git a/SpeechT5/fairseq/examples/roberta/wsc/README.md b/SpeechT5/fairseq/examples/roberta/wsc/README.md new file mode 100644 index 0000000000000000000000000000000000000000..21a045d999739836a17574593292e42131315ae9 --- /dev/null +++ b/SpeechT5/fairseq/examples/roberta/wsc/README.md @@ -0,0 +1,125 @@ +# Finetuning RoBERTa on Winograd Schema Challenge (WSC) data + +The following instructions can be used to finetune RoBERTa on the WSC training +data provided by [SuperGLUE](https://super.gluebenchmark.com/). + +Note that there is high variance in the results. For our GLUE/SuperGLUE +submission we swept over the learning rate (1e-5, 2e-5, 3e-5), batch size (16, +32, 64) and total number of updates (500, 1000, 2000, 3000), as well as the +random seed. Out of ~100 runs we chose the best 7 models and ensembled them. + +**Approach:** The instructions below use a slightly different loss function than +what's described in the original RoBERTa arXiv paper. In particular, +[Kocijan et al. (2019)](https://arxiv.org/abs/1905.06290) introduce a margin +ranking loss between `(query, candidate)` pairs with tunable hyperparameters +alpha and beta. This is supported in our code as well with the `--wsc-alpha` and +`--wsc-beta` arguments. However, we achieved slightly better (and more robust) +results on the development set by instead using a single cross entropy loss term +over the log-probabilities for the query and all mined candidates. **The +candidates are mined using spaCy from each input sentence in isolation, so the +approach remains strictly pointwise.** This reduces the number of +hyperparameters and our best model achieved 92.3% development set accuracy, +compared to ~90% accuracy for the margin loss. 
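+For reference, the two formulations can be sketched as follows (a minimal,
+self-contained sketch; `query_lprobs` and `cand_lprobs` stand for the
+span-averaged masked-LM log-probabilities of the query span and of the mined
+candidate spans, and the full implementation lives in `wsc_criterion.py`,
+included further below):
+
+```python
+import torch
+import torch.nn.functional as F
+
+def wsc_loss(query_lprobs, cand_lprobs, alpha=1.0, beta=0.0, cross_entropy=False):
+    """query_lprobs: shape (1,); cand_lprobs: shape (num_candidates,)."""
+    if cross_entropy:
+        # single cross-entropy term: the query span must out-score every candidate
+        logits = torch.cat([query_lprobs, cand_lprobs]).unsqueeze(0)
+        return F.cross_entropy(logits, logits.new_zeros(1).long())
+    # margin ranking loss of Kocijan et al. (2019), tuned via alpha and beta
+    return (
+        -query_lprobs
+        + alpha * (cand_lprobs - query_lprobs + beta).clamp(min=0)
+    ).sum()
+```
+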
Later versions of the RoBERTa +arXiv paper will describe this updated formulation. + +### 1) Download the WSC data from the SuperGLUE website: +```bash +wget https://dl.fbaipublicfiles.com/glue/superglue/data/v2/WSC.zip +unzip WSC.zip + +# we also need to copy the RoBERTa dictionary into the same directory +wget -O WSC/dict.txt https://dl.fbaipublicfiles.com/fairseq/gpt2_bpe/dict.txt +``` + +### 2) Finetune over the provided training data: +```bash +TOTAL_NUM_UPDATES=2000 # Total number of training steps. +WARMUP_UPDATES=250 # Linearly increase LR over this many steps. +LR=2e-05 # Peak LR for polynomial LR scheduler. +MAX_SENTENCES=16 # Batch size per GPU. +SEED=1 # Random seed. +ROBERTA_PATH=/path/to/roberta/model.pt + +# we use the --user-dir option to load the task and criterion +# from the examples/roberta/wsc directory: +FAIRSEQ_PATH=/path/to/fairseq +FAIRSEQ_USER_DIR=${FAIRSEQ_PATH}/examples/roberta/wsc + +CUDA_VISIBLE_DEVICES=0,1,2,3 fairseq-train WSC/ \ + --restore-file $ROBERTA_PATH \ + --reset-optimizer --reset-dataloader --reset-meters \ + --no-epoch-checkpoints --no-last-checkpoints --no-save-optimizer-state \ + --best-checkpoint-metric accuracy --maximize-best-checkpoint-metric \ + --valid-subset val \ + --fp16 --ddp-backend legacy_ddp \ + --user-dir $FAIRSEQ_USER_DIR \ + --task wsc --criterion wsc --wsc-cross-entropy \ + --arch roberta_large --bpe gpt2 --max-positions 512 \ + --dropout 0.1 --attention-dropout 0.1 --weight-decay 0.01 \ + --optimizer adam --adam-betas '(0.9, 0.98)' --adam-eps 1e-06 \ + --lr-scheduler polynomial_decay --lr $LR \ + --warmup-updates $WARMUP_UPDATES --total-num-update $TOTAL_NUM_UPDATES \ + --batch-size $MAX_SENTENCES \ + --max-update $TOTAL_NUM_UPDATES \ + --log-format simple --log-interval 100 \ + --seed $SEED +``` + +The above command assumes training on 4 GPUs, but you can achieve the same +results on a single GPU by adding `--update-freq=4`. + +### 3) Evaluate +```python +from fairseq.models.roberta import RobertaModel +from examples.roberta.wsc import wsc_utils # also loads WSC task and criterion +roberta = RobertaModel.from_pretrained('checkpoints', 'checkpoint_best.pt', 'WSC/') +roberta.cuda() +nsamples, ncorrect = 0, 0 +for sentence, label in wsc_utils.jsonl_iterator('WSC/val.jsonl', eval=True): + pred = roberta.disambiguate_pronoun(sentence) + nsamples += 1 + if pred == label: + ncorrect += 1 +print('Accuracy: ' + str(ncorrect / float(nsamples))) +# Accuracy: 0.9230769230769231 +``` + +## RoBERTa training on WinoGrande dataset +We have also provided `winogrande` task and criterion for finetuning on the +[WinoGrande](https://mosaic.allenai.org/projects/winogrande) like datasets +where there are always two candidates and one is correct. +It's more efficient implementation for such subcases. + +```bash +TOTAL_NUM_UPDATES=23750 # Total number of training steps. +WARMUP_UPDATES=2375 # Linearly increase LR over this many steps. +LR=1e-05 # Peak LR for polynomial LR scheduler. +MAX_SENTENCES=32 # Batch size per GPU. +SEED=1 # Random seed. 
+ROBERTA_PATH=/path/to/roberta/model.pt + +# we use the --user-dir option to load the task and criterion +# from the examples/roberta/wsc directory: +FAIRSEQ_PATH=/path/to/fairseq +FAIRSEQ_USER_DIR=${FAIRSEQ_PATH}/examples/roberta/wsc + +cd fairseq +CUDA_VISIBLE_DEVICES=0 fairseq-train winogrande_1.0/ \ + --restore-file $ROBERTA_PATH \ + --reset-optimizer --reset-dataloader --reset-meters \ + --no-epoch-checkpoints --no-last-checkpoints --no-save-optimizer-state \ + --best-checkpoint-metric accuracy --maximize-best-checkpoint-metric \ + --valid-subset val \ + --fp16 --ddp-backend legacy_ddp \ + --user-dir $FAIRSEQ_USER_DIR \ + --task winogrande --criterion winogrande \ + --wsc-margin-alpha 5.0 --wsc-margin-beta 0.4 \ + --arch roberta_large --bpe gpt2 --max-positions 512 \ + --dropout 0.1 --attention-dropout 0.1 --weight-decay 0.01 \ + --optimizer adam --adam-betas '(0.9, 0.98)' --adam-eps 1e-06 \ + --lr-scheduler polynomial_decay --lr $LR \ + --warmup-updates $WARMUP_UPDATES --total-num-update $TOTAL_NUM_UPDATES \ + --batch-size $MAX_SENTENCES \ + --max-update $TOTAL_NUM_UPDATES \ + --log-format simple --log-interval 100 +``` diff --git a/SpeechT5/fairseq/examples/roberta/wsc/__init__.py b/SpeechT5/fairseq/examples/roberta/wsc/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..78afa4728eeed96142900118f6452730023466c9 --- /dev/null +++ b/SpeechT5/fairseq/examples/roberta/wsc/__init__.py @@ -0,0 +1,7 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +from . import wsc_criterion # noqa +from . import wsc_task # noqa diff --git a/SpeechT5/fairseq/examples/roberta/wsc/wsc_criterion.py b/SpeechT5/fairseq/examples/roberta/wsc/wsc_criterion.py new file mode 100644 index 0000000000000000000000000000000000000000..ed0251fdecc3573228ad271f1090aaf914b48cd1 --- /dev/null +++ b/SpeechT5/fairseq/examples/roberta/wsc/wsc_criterion.py @@ -0,0 +1,167 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. 
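+#
+# Overview: the "wsc" criterion scores the query span and each mined
+# candidate span by masking the span in the input, running the masked LM,
+# and averaging the log-probabilities of the original span tokens
+# (get_lprobs). The loss is either the margin ranking loss controlled by
+# --wsc-margin-alpha/--wsc-margin-beta or, with --wsc-cross-entropy, a
+# single cross-entropy term over [query, candidates]. The "winogrande"
+# criterion below is the special case with exactly one candidate per query.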
+ +import math + +import torch +import torch.nn.functional as F +from fairseq import utils +from fairseq.criterions import LegacyFairseqCriterion, register_criterion +from fairseq.data import encoders + + +@register_criterion("wsc") +class WSCCriterion(LegacyFairseqCriterion): + def __init__(self, args, task): + super().__init__(args, task) + if self.args.save_predictions is not None: + self.prediction_h = open(self.args.save_predictions, "w") + else: + self.prediction_h = None + self.bpe = encoders.build_bpe(args.bpe) + self.tokenizer = encoders.build_tokenizer(args.tokenizer) + + def __del__(self): + if self.prediction_h is not None: + self.prediction_h.close() + + @staticmethod + def add_args(parser): + """Add criterion-specific arguments to the parser.""" + parser.add_argument("--wsc-margin-alpha", type=float, metavar="A", default=1.0) + parser.add_argument("--wsc-margin-beta", type=float, metavar="B", default=0.0) + parser.add_argument( + "--wsc-cross-entropy", + action="store_true", + help="use cross entropy formulation instead of margin loss", + ) + parser.add_argument( + "--save-predictions", metavar="FILE", help="file to save predictions to" + ) + + def get_masked_input(self, tokens, mask): + masked_tokens = tokens.clone() + masked_tokens[mask] = self.task.mask + return masked_tokens + + def get_lprobs(self, model, tokens, mask): + logits, _ = model(src_tokens=self.get_masked_input(tokens, mask)) + lprobs = F.log_softmax(logits, dim=-1, dtype=torch.float) + scores = lprobs.gather(2, tokens.unsqueeze(-1)).squeeze(-1) + mask = mask.type_as(scores) + scores = (scores * mask).sum(dim=-1) / mask.sum(dim=-1) + return scores + + def get_loss(self, query_lprobs, cand_lprobs): + if self.args.wsc_cross_entropy: + return F.cross_entropy( + torch.cat([query_lprobs, cand_lprobs]).unsqueeze(0), + query_lprobs.new([0]).long(), + ) + else: + return ( + -query_lprobs + + self.args.wsc_margin_alpha + * (cand_lprobs - query_lprobs + self.args.wsc_margin_beta).clamp(min=0) + ).sum() + + def forward(self, model, sample, reduce=True): + # compute loss and accuracy + loss, nloss = 0.0, 0 + ncorrect, nqueries = 0, 0 + + for i, label in enumerate(sample["labels"]): + query_lprobs = self.get_lprobs( + model, + sample["query_tokens"][i].unsqueeze(0), + sample["query_masks"][i].unsqueeze(0), + ) + cand_lprobs = self.get_lprobs( + model, + sample["candidate_tokens"][i], + sample["candidate_masks"][i], + ) + + pred = (query_lprobs >= cand_lprobs).all().item() + + if label is not None: + label = 1 if label else 0 + ncorrect += 1 if pred == label else 0 + nqueries += 1 + + if label: + # only compute a loss for positive instances + nloss += 1 + loss += self.get_loss(query_lprobs, cand_lprobs) + + id = sample["id"][i].item() + if self.prediction_h is not None: + print("{}\t{}\t{}".format(id, pred, label), file=self.prediction_h) + + if nloss == 0: + loss = torch.tensor(0.0, requires_grad=True) + + sample_size = nqueries if nqueries > 0 else 1 + logging_output = { + "loss": utils.item(loss.data) if reduce else loss.data, + "ntokens": sample["ntokens"], + "nsentences": sample["nsentences"], + "sample_size": sample_size, + "ncorrect": ncorrect, + "nqueries": nqueries, + } + return loss, sample_size, logging_output + + @staticmethod + def aggregate_logging_outputs(logging_outputs): + """Aggregate logging outputs from data parallel training.""" + loss_sum = sum(log.get("loss", 0) for log in logging_outputs) + ntokens = sum(log.get("ntokens", 0) for log in logging_outputs) + nsentences = sum(log.get("nsentences", 0) for 
log in logging_outputs) + sample_size = sum(log.get("sample_size", 0) for log in logging_outputs) + + agg_output = { + "loss": loss_sum / sample_size / math.log(2), + "ntokens": ntokens, + "nsentences": nsentences, + "sample_size": sample_size, + } + + ncorrect = sum(log.get("ncorrect", 0) for log in logging_outputs) + nqueries = sum(log.get("nqueries", 0) for log in logging_outputs) + if nqueries > 0: + agg_output["accuracy"] = ncorrect / float(nqueries) + + return agg_output + + +@register_criterion("winogrande") +class WinograndeCriterion(WSCCriterion): + def forward(self, model, sample, reduce=True): + # compute loss and accuracy + query_lprobs = self.get_lprobs( + model, + sample["query_tokens"], + sample["query_masks"], + ) + cand_lprobs = self.get_lprobs( + model, + sample["candidate_tokens"], + sample["candidate_masks"], + ) + pred = query_lprobs >= cand_lprobs + loss = self.get_loss(query_lprobs, cand_lprobs) + + sample_size = sample["query_tokens"].size(0) + ncorrect = pred.sum().item() + logging_output = { + "loss": utils.item(loss.data) if reduce else loss.data, + "ntokens": sample["ntokens"], + "nsentences": sample["nsentences"], + "sample_size": sample_size, + "ncorrect": ncorrect, + "nqueries": sample_size, + } + return loss, sample_size, logging_output diff --git a/SpeechT5/fairseq/examples/roberta/wsc/wsc_task.py b/SpeechT5/fairseq/examples/roberta/wsc/wsc_task.py new file mode 100644 index 0000000000000000000000000000000000000000..602ea737ed75a33fddf44dd859e999ecfce2730d --- /dev/null +++ b/SpeechT5/fairseq/examples/roberta/wsc/wsc_task.py @@ -0,0 +1,401 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +import json +import os +import tempfile + +import numpy as np +import torch +import torch.nn.functional as F +from fairseq import utils +from fairseq.data import ( + Dictionary, + IdDataset, + ListDataset, + NestedDictionaryDataset, + NumelDataset, + NumSamplesDataset, + PadDataset, + SortDataset, + data_utils, + encoders, +) +from fairseq.tasks import LegacyFairseqTask, register_task + +from . 
import wsc_utils + + +@register_task("wsc") +class WSCTask(LegacyFairseqTask): + """Task to finetune RoBERTa for Winograd Schemas.""" + + @staticmethod + def add_args(parser): + """Add task-specific arguments to the parser.""" + parser.add_argument( + "data", metavar="DIR", help="path to data directory; we load <split>.jsonl" + ) + parser.add_argument( + "--init-token", + type=int, + default=None, + help="add token at the beginning of each batch item", + ) + + def __init__(self, args, vocab): + super().__init__(args) + self.vocab = vocab + self.mask = vocab.add_symbol("<mask>") + + self.bpe = encoders.build_bpe(args) + self.tokenizer = encoders.build_tokenizer(args) + + # hack to handle GPT-2 BPE, which includes leading spaces + if args.bpe == "gpt2": + self.leading_space = True + self.trailing_space = False + else: + self.leading_space = False + self.trailing_space = True + + @classmethod + def load_dictionary(cls, filename): + """Load the dictionary from the filename + + Args: + filename (str): the filename + """ + dictionary = Dictionary.load(filename) + dictionary.add_symbol("<mask>") + return dictionary + + @classmethod + def setup_task(cls, args, **kwargs): + assert args.criterion == "wsc", "Must set --criterion=wsc" + + # load data and label dictionaries + vocab = cls.load_dictionary(os.path.join(args.data, "dict.txt")) + print("| dictionary: {} types".format(len(vocab))) + + return cls(args, vocab) + + def binarize(self, s: str, append_eos: bool = False): + if self.tokenizer is not None: + s = self.tokenizer.encode(s) + if self.bpe is not None: + s = self.bpe.encode(s) + tokens = self.vocab.encode_line( + s, + append_eos=append_eos, + add_if_not_exist=False, + ).long() + if self.args.init_token is not None: + tokens = torch.cat([tokens.new([self.args.init_token]), tokens]) + return tokens + + def binarize_with_mask(self, txt, prefix, suffix, leading_space, trailing_space): + toks = self.binarize( + prefix + leading_space + txt + trailing_space + suffix, + append_eos=True, + ) + mask = torch.zeros_like(toks, dtype=torch.bool) + mask_start = len(self.binarize(prefix)) + mask_size = len(self.binarize(leading_space + txt)) + mask[mask_start : mask_start + mask_size] = 1 + return toks, mask + + def load_dataset( + self, split, epoch=1, combine=False, data_path=None, return_only=False, **kwargs + ): + """Load a given dataset split. 
+ + Args: + split (str): name of the split (e.g., train, valid, test) + """ + if data_path is None: + data_path = os.path.join(self.args.data, split + ".jsonl") + if not os.path.exists(data_path): + raise FileNotFoundError("Cannot find data: {}".format(data_path)) + + query_tokens = [] + query_masks = [] + query_lengths = [] + candidate_tokens = [] + candidate_masks = [] + candidate_lengths = [] + labels = [] + + for sentence, pronoun_span, query, label in wsc_utils.jsonl_iterator(data_path): + prefix = sentence[: pronoun_span.start].text + suffix = sentence[pronoun_span.end :].text_with_ws + + # spaCy spans include trailing spaces, but we need to know about + # leading spaces for the GPT-2 BPE + leading_space = ( + " " if sentence[: pronoun_span.start].text_with_ws.endswith(" ") else "" + ) + trailing_space = " " if pronoun_span.text_with_ws.endswith(" ") else "" + + # get noun phrases, excluding pronouns and anything overlapping with the query + cand_spans = wsc_utils.filter_noun_chunks( + wsc_utils.extended_noun_chunks(sentence), + exclude_pronouns=True, + exclude_query=query, + exact_match=False, + ) + + if query is not None: + query_toks, query_mask = self.binarize_with_mask( + query, prefix, suffix, leading_space, trailing_space + ) + query_len = len(query_toks) + else: + query_toks, query_mask, query_len = None, None, 0 + + query_tokens.append(query_toks) + query_masks.append(query_mask) + query_lengths.append(query_len) + + cand_toks, cand_masks = [], [] + for cand_span in cand_spans: + toks, mask = self.binarize_with_mask( + cand_span.text, + prefix, + suffix, + leading_space, + trailing_space, + ) + cand_toks.append(toks) + cand_masks.append(mask) + + # collate candidates + cand_toks = data_utils.collate_tokens(cand_toks, pad_idx=self.vocab.pad()) + cand_masks = data_utils.collate_tokens(cand_masks, pad_idx=0) + assert cand_toks.size() == cand_masks.size() + + candidate_tokens.append(cand_toks) + candidate_masks.append(cand_masks) + candidate_lengths.append(cand_toks.size(1)) + + labels.append(label) + + query_lengths = np.array(query_lengths) + query_tokens = ListDataset(query_tokens, query_lengths) + query_masks = ListDataset(query_masks, query_lengths) + + candidate_lengths = np.array(candidate_lengths) + candidate_tokens = ListDataset(candidate_tokens, candidate_lengths) + candidate_masks = ListDataset(candidate_masks, candidate_lengths) + + labels = ListDataset(labels, [1] * len(labels)) + + dataset = { + "id": IdDataset(), + "query_tokens": query_tokens, + "query_masks": query_masks, + "candidate_tokens": candidate_tokens, + "candidate_masks": candidate_masks, + "labels": labels, + "nsentences": NumSamplesDataset(), + "ntokens": NumelDataset(query_tokens, reduce=True), + } + + nested_dataset = NestedDictionaryDataset( + dataset, + sizes=[query_lengths], + ) + + with data_utils.numpy_seed(self.args.seed): + shuffle = np.random.permutation(len(query_tokens)) + dataset = SortDataset( + nested_dataset, + # shuffle + sort_order=[shuffle], + ) + + if return_only: + return dataset + + self.datasets[split] = dataset + return self.datasets[split] + + def build_dataset_for_inference(self, sample_json): + with tempfile.NamedTemporaryFile(buffering=0) as h: + h.write((json.dumps(sample_json) + "\n").encode("utf-8")) + dataset = self.load_dataset( + "disambiguate_pronoun", + data_path=h.name, + return_only=True, + ) + return dataset + + def disambiguate_pronoun(self, model, sentence, use_cuda=False): + sample_json = wsc_utils.convert_sentence_to_json(sentence) + dataset = 
self.build_dataset_for_inference(sample_json) + sample = dataset.collater([dataset[0]]) + if use_cuda: + sample = utils.move_to_cuda(sample) + + def get_masked_input(tokens, mask): + masked_tokens = tokens.clone() + masked_tokens[mask.bool()] = self.mask + return masked_tokens + + def get_lprobs(tokens, mask): + logits, _ = model(src_tokens=get_masked_input(tokens, mask)) + lprobs = F.log_softmax(logits, dim=-1, dtype=torch.float) + scores = lprobs.gather(2, tokens.unsqueeze(-1)).squeeze(-1) + mask = mask.type_as(scores) + scores = (scores * mask).sum(dim=-1) / mask.sum(dim=-1) + return scores + + cand_lprobs = get_lprobs( + sample["candidate_tokens"][0], + sample["candidate_masks"][0], + ) + if sample["query_tokens"][0] is not None: + query_lprobs = get_lprobs( + sample["query_tokens"][0].unsqueeze(0), + sample["query_masks"][0].unsqueeze(0), + ) + return (query_lprobs >= cand_lprobs).all().item() == 1 + else: + best_idx = cand_lprobs.argmax().item() + full_cand = sample["candidate_tokens"][0][best_idx] + mask = sample["candidate_masks"][0][best_idx] + toks = full_cand[mask.bool()] + return self.bpe.decode(self.source_dictionary.string(toks)).strip() + + @property + def source_dictionary(self): + return self.vocab + + @property + def target_dictionary(self): + return self.vocab + + +@register_task("winogrande") +class WinograndeTask(WSCTask): + """ + Task for WinoGrande dataset. Efficient implementation for Winograd schema + tasks with exactly two candidates, one of which is correct. + """ + + @classmethod + def setup_task(cls, args, **kwargs): + assert args.criterion == "winogrande", "Must set --criterion=winogrande" + + # load data and label dictionaries + vocab = cls.load_dictionary(os.path.join(args.data, "dict.txt")) + print("| dictionary: {} types".format(len(vocab))) + + return cls(args, vocab) + + def load_dataset( + self, split, epoch=1, combine=False, data_path=None, return_only=False, **kwargs + ): + """Load a given dataset split. 
+ + Args: + split (str): name of the split (e.g., train, valid, test) + """ + if data_path is None: + data_path = os.path.join(self.args.data, split + ".jsonl") + if not os.path.exists(data_path): + raise FileNotFoundError("Cannot find data: {}".format(data_path)) + + query_tokens = [] + query_masks = [] + query_lengths = [] + candidate_tokens = [] + candidate_masks = [] + candidate_lengths = [] + + itr = wsc_utils.winogrande_jsonl_iterator(data_path, eval=(split == "test")) + + for sample in itr: + sentence, pronoun_span, query, cand_text = sample + prefix = sentence[: pronoun_span[0]].rstrip() + suffix = sentence[pronoun_span[1] :] + + leading_space = " " if sentence[: pronoun_span[0]].endswith(" ") else "" + trailing_space = "" + + if query is not None: + query_toks, query_mask = self.binarize_with_mask( + query, + prefix, + suffix, + leading_space, + trailing_space, + ) + query_len = len(query_toks) + else: + query_toks, query_mask, query_len = None, None, 0 + + query_tokens.append(query_toks) + query_masks.append(query_mask) + query_lengths.append(query_len) + + cand_toks, cand_mask = self.binarize_with_mask( + cand_text, + prefix, + suffix, + leading_space, + trailing_space, + ) + + candidate_tokens.append(cand_toks) + candidate_masks.append(cand_mask) + candidate_lengths.append(cand_toks.size(0)) + + query_lengths = np.array(query_lengths) + + def get_pad_dataset_fn(tokens, length, pad_idx): + return PadDataset( + ListDataset(tokens, length), + pad_idx=pad_idx, + left_pad=False, + ) + + query_tokens = get_pad_dataset_fn(query_tokens, query_lengths, self.vocab.pad()) + query_masks = get_pad_dataset_fn(query_masks, query_lengths, 0) + + candidate_lengths = np.array(candidate_lengths) + candidate_tokens = get_pad_dataset_fn( + candidate_tokens, candidate_lengths, self.vocab.pad() + ) + candidate_masks = get_pad_dataset_fn(candidate_masks, candidate_lengths, 0) + + dataset = { + "id": IdDataset(), + "query_tokens": query_tokens, + "query_masks": query_masks, + "candidate_tokens": candidate_tokens, + "candidate_masks": candidate_masks, + "nsentences": NumSamplesDataset(), + "ntokens": NumelDataset(query_tokens, reduce=True), + } + + nested_dataset = NestedDictionaryDataset( + dataset, + sizes=[query_lengths], + ) + + with data_utils.numpy_seed(self.args.seed): + shuffle = np.random.permutation(len(query_tokens)) + dataset = SortDataset( + nested_dataset, + # shuffle + sort_order=[shuffle], + ) + + if return_only: + return dataset + + self.datasets[split] = dataset + return self.datasets[split] diff --git a/SpeechT5/fairseq/examples/roberta/wsc/wsc_utils.py b/SpeechT5/fairseq/examples/roberta/wsc/wsc_utils.py new file mode 100644 index 0000000000000000000000000000000000000000..da6ba74383a2490e1108609f315f44ad4b3bf002 --- /dev/null +++ b/SpeechT5/fairseq/examples/roberta/wsc/wsc_utils.py @@ -0,0 +1,241 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. 
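+#
+# Helper utilities shared by the WSC and WinoGrande tasks above: jsonl
+# iterators that detokenize each example with Moses and re-parse it with
+# spaCy (en_core_web_lg), locate the pronoun span, and mine candidate noun
+# chunks (extended_noun_chunks / filter_noun_chunks) for the criterion to
+# score against the query span.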
+ +import json +from functools import lru_cache + + +def convert_sentence_to_json(sentence): + if "_" in sentence: + prefix, rest = sentence.split("_", 1) + query, rest = rest.split("_", 1) + query_index = len(prefix.rstrip().split(" ")) + else: + query, query_index = None, None + + prefix, rest = sentence.split("[", 1) + pronoun, rest = rest.split("]", 1) + pronoun_index = len(prefix.rstrip().split(" ")) + + sentence = sentence.replace("_", "").replace("[", "").replace("]", "") + + return { + "idx": 0, + "text": sentence, + "target": { + "span1_index": query_index, + "span1_text": query, + "span2_index": pronoun_index, + "span2_text": pronoun, + }, + } + + +def extended_noun_chunks(sentence): + noun_chunks = {(np.start, np.end) for np in sentence.noun_chunks} + np_start, cur_np = 0, "NONE" + for i, token in enumerate(sentence): + np_type = token.pos_ if token.pos_ in {"NOUN", "PROPN"} else "NONE" + if np_type != cur_np: + if cur_np != "NONE": + noun_chunks.add((np_start, i)) + if np_type != "NONE": + np_start = i + cur_np = np_type + if cur_np != "NONE": + noun_chunks.add((np_start, len(sentence))) + return [sentence[s:e] for (s, e) in sorted(noun_chunks)] + + +def find_token(sentence, start_pos): + found_tok = None + for tok in sentence: + if tok.idx == start_pos: + found_tok = tok + break + return found_tok + + +def find_span(sentence, search_text, start=0): + search_text = search_text.lower() + for tok in sentence[start:]: + remainder = sentence[tok.i :].text.lower() + if remainder.startswith(search_text): + len_to_consume = len(search_text) + start_idx = tok.idx + for next_tok in sentence[tok.i :]: + end_idx = next_tok.idx + len(next_tok.text) + if end_idx - start_idx == len_to_consume: + span = sentence[tok.i : next_tok.i + 1] + return span + return None + + +@lru_cache(maxsize=1) +def get_detokenizer(): + from sacremoses import MosesDetokenizer + + detok = MosesDetokenizer(lang="en") + return detok + + +@lru_cache(maxsize=1) +def get_spacy_nlp(): + import en_core_web_lg + + nlp = en_core_web_lg.load() + return nlp + + +def jsonl_iterator(input_fname, positive_only=False, ngram_order=3, eval=False): + detok = get_detokenizer() + nlp = get_spacy_nlp() + + with open(input_fname) as fin: + for line in fin: + sample = json.loads(line.strip()) + + if positive_only and "label" in sample and not sample["label"]: + # only consider examples where the query is correct + continue + + target = sample["target"] + + # clean up the query + query = target["span1_text"] + if query is not None: + if "\n" in query: + continue + if query.endswith(".") or query.endswith(","): + query = query[:-1] + + # split tokens + tokens = sample["text"].split(" ") + + def strip_pronoun(x): + return x.rstrip('.,"') + + # find the pronoun + pronoun_idx = target["span2_index"] + pronoun = strip_pronoun(target["span2_text"]) + if strip_pronoun(tokens[pronoun_idx]) != pronoun: + # hack: sometimes the index is misaligned + if strip_pronoun(tokens[pronoun_idx + 1]) == pronoun: + pronoun_idx += 1 + else: + raise Exception("Misaligned pronoun!") + assert strip_pronoun(tokens[pronoun_idx]) == pronoun + + # split tokens before and after the pronoun + before = tokens[:pronoun_idx] + after = tokens[pronoun_idx + 1 :] + + # the GPT BPE attaches leading spaces to tokens, so we keep track + # of whether we need spaces before or after the pronoun + leading_space = " " if pronoun_idx > 0 else "" + trailing_space = " " if len(after) > 0 else "" + + # detokenize + before = detok.detokenize(before, return_str=True) + pronoun = 
detok.detokenize([pronoun], return_str=True) + after = detok.detokenize(after, return_str=True) + + # hack: when the pronoun ends in a period (or comma), move the + # punctuation to the "after" part + if pronoun.endswith(".") or pronoun.endswith(","): + after = pronoun[-1] + trailing_space + after + pronoun = pronoun[:-1] + + # hack: when the "after" part begins with a comma or period, remove + # the trailing space + if after.startswith(".") or after.startswith(","): + trailing_space = "" + + # parse sentence with spacy + sentence = nlp(before + leading_space + pronoun + trailing_space + after) + + # find pronoun span + start = len(before + leading_space) + first_pronoun_tok = find_token(sentence, start_pos=start) + pronoun_span = find_span(sentence, pronoun, start=first_pronoun_tok.i) + assert pronoun_span.text == pronoun + + if eval: + # convert to format where pronoun is surrounded by "[]" and + # query is surrounded by "_" + query_span = find_span(sentence, query) + query_with_ws = "_{}_{}".format( + query_span.text, + (" " if query_span.text_with_ws.endswith(" ") else ""), + ) + pronoun_with_ws = "[{}]{}".format( + pronoun_span.text, + (" " if pronoun_span.text_with_ws.endswith(" ") else ""), + ) + if query_span.start < pronoun_span.start: + first = (query_span, query_with_ws) + second = (pronoun_span, pronoun_with_ws) + else: + first = (pronoun_span, pronoun_with_ws) + second = (query_span, query_with_ws) + sentence = ( + sentence[: first[0].start].text_with_ws + + first[1] + + sentence[first[0].end : second[0].start].text_with_ws + + second[1] + + sentence[second[0].end :].text + ) + yield sentence, sample.get("label", None) + else: + yield sentence, pronoun_span, query, sample.get("label", None) + + +def winogrande_jsonl_iterator(input_fname, eval=False): + with open(input_fname) as fin: + for line in fin: + sample = json.loads(line.strip()) + sentence, option1, option2 = ( + sample["sentence"], + sample["option1"], + sample["option2"], + ) + + pronoun_span = (sentence.index("_"), sentence.index("_") + 1) + + if eval: + query, cand = option1, option2 + else: + query = option1 if sample["answer"] == "1" else option2 + cand = option2 if sample["answer"] == "1" else option1 + yield sentence, pronoun_span, query, cand + + +def filter_noun_chunks( + chunks, exclude_pronouns=False, exclude_query=None, exact_match=False +): + if exclude_pronouns: + chunks = [ + np + for np in chunks + if (np.lemma_ != "-PRON-" and not all(tok.pos_ == "PRON" for tok in np)) + ] + + if exclude_query is not None: + excl_txt = [exclude_query.lower()] + filtered_chunks = [] + for chunk in chunks: + lower_chunk = chunk.text.lower() + found = False + for excl in excl_txt: + if ( + not exact_match and (lower_chunk in excl or excl in lower_chunk) + ) or lower_chunk == excl: + found = True + break + if not found: + filtered_chunks.append(chunk) + chunks = filtered_chunks + + return chunks diff --git a/SpeechT5/fairseq/examples/rxf/README.md b/SpeechT5/fairseq/examples/rxf/README.md new file mode 100644 index 0000000000000000000000000000000000000000..22a1cc47df23c7e0ebbf0ad805031478d1b4a95e --- /dev/null +++ b/SpeechT5/fairseq/examples/rxf/README.md @@ -0,0 +1,52 @@ +[Better Fine-Tuning by Reducing Representational Collapse](https://arxiv.org/abs/2008.03156) +===================== +This repo contains the code to replicate all experiments from the _Better Fine-Tuning by Reducing Representational Collapse_ paper excluding the probing results. 
+ +The R3F sentence prediction criterion is registered as `sentence_prediction_r3f` while the label smoothing version of it is implemented as `label_smoothed_cross_entropy_r3f`. The R4F version of the sentence prediction criterion can be achieved by applying spectral norm to the classification head via the `--spectral-norm-classification-head` parameter. + +## Hyper-parameters +Our methods introduce 3 new hyper-parameters; `--eps` which sets the standard deviation or range of the distribution we're sampling from, `--r3f-lambda` which controls the combining of logistic loss and noisy KL loss and `--noise-type` which controls which parametric distribution we use ('normal', 'uniform'). + +For example to run R3F on RTE from GLUE + +``` +TOTAL_NUM_UPDATES=3120 +WARMUP_UPDATES=187 +LR=1e-05 +NUM_CLASSES=2 +MAX_SENTENCES=8 # Batch size. +ROBERTA_PATH=/path/to/roberta/model.pt + +CUDA_VISIBLE_DEVICES=0 fairseq-train RTE-bin \ + --restore-file $ROBERTA_PATH \ + --max-positions 512 \ + --max-sentences $MAX_SENTENCES \ + --max-tokens 4400 \ + --task sentence_prediction \ + --reset-optimizer --reset-dataloader --reset-meters \ + --required-batch-size-multiple 1 \ + --init-token 0 --separator-token 2 \ + --arch roberta_large \ + --criterion sentence_prediction_r3f \ + --num-classes $NUM_CLASSES \ + --dropout 0.1 --attention-dropout 0.1 \ + --weight-decay 0.1 --optimizer adam --adam-betas "(0.9, 0.98)" --adam-eps 1e-06 \ + --clip-norm 0.0 \ + --lr-scheduler polynomial_decay --lr $LR --total-num-update $TOTAL_NUM_UPDATES --warmup-updates $WARMUP_UPDATES \ + --fp16 --fp16-init-scale 4 --threshold-loss-scale 1 --fp16-scale-window 128 \ + --max-epoch 10 \ + --find-unused-parameters \ + --best-checkpoint-metric accuracy --maximize-best-checkpoint-metric \ + --noise-type uniform --r3f-lambda 0.7 \ + --user-dir examples/rxf/rxf_src +``` + +## Citation +```bibtex +@article{aghajanyan2020better, + title={Better Fine-Tuning by Reducing Representational Collapse}, + author={Aghajanyan, Armen and Shrivastava, Akshat and Gupta, Anchit and Goyal, Naman and Zettlemoyer, Luke and Gupta, Sonal}, + journal={arXiv preprint arXiv:2008.03156}, + year={2020} +} +``` diff --git a/SpeechT5/fairseq/examples/rxf/__init__.py b/SpeechT5/fairseq/examples/rxf/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..b24cb6b797b4159c9862bab1f882ee6ae95614ab --- /dev/null +++ b/SpeechT5/fairseq/examples/rxf/__init__.py @@ -0,0 +1,6 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +from . import rxf_src # noqa diff --git a/SpeechT5/fairseq/examples/rxf/rxf_src/__init__.py b/SpeechT5/fairseq/examples/rxf/rxf_src/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..306e232d6f386b26153864601114e162080dcee4 --- /dev/null +++ b/SpeechT5/fairseq/examples/rxf/rxf_src/__init__.py @@ -0,0 +1,6 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +from . 
import label_smoothed_cross_entropy_r3f, sentence_prediction_r3f # noqa diff --git a/SpeechT5/fairseq/examples/rxf/rxf_src/label_smoothed_cross_entropy_r3f.py b/SpeechT5/fairseq/examples/rxf/rxf_src/label_smoothed_cross_entropy_r3f.py new file mode 100644 index 0000000000000000000000000000000000000000..079db13e61c5ef46d1b1d288012145148eb0be04 --- /dev/null +++ b/SpeechT5/fairseq/examples/rxf/rxf_src/label_smoothed_cross_entropy_r3f.py @@ -0,0 +1,157 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +import math + +import torch +import torch.nn.functional as F +from fairseq import metrics, utils +from fairseq.criterions import FairseqCriterion, register_criterion +from fairseq.criterions.label_smoothed_cross_entropy import label_smoothed_nll_loss + + +@register_criterion("label_smoothed_cross_entropy_r3f") +class LabelSmoothedCrossEntropyR3FCriterion(FairseqCriterion): + def __init__( + self, task, sentence_avg, label_smoothing, eps, r3f_lambda, noise_type + ): + super().__init__(task) + self.sentence_avg = sentence_avg + self.label_smoothing = label_smoothing + self.eps = eps + self.r3f_lambda = r3f_lambda + self.noise_type = noise_type + if self.noise_type in {"normal"}: + self.noise_sampler = torch.distributions.normal.Normal( + loc=0.0, scale=self.eps + ) + elif self.noise_type == "uniform": + self.noise_sampler = torch.distributions.uniform.Uniform( + low=-self.eps, high=self.eps + ) + else: + raise Exception(f"unrecognized noise type {self.noise_type}") + + @staticmethod + def add_args(parser): + """Add criterion-specific arguments to the parser.""" + # fmt: off + parser.add_argument('--label-smoothing', default=0., type=float, metavar='D', + help='epsilon for label smoothing, 0 means no label smoothing') + parser.add_argument('--eps', type=float, default=1e-5, + help='noise eps') + parser.add_argument('--r3f-lambda', type=float, default=1.0, + help='lambda for combining logistic loss and noisy KL loss') + parser.add_argument('--noise-type', type=str, default='normal', + choices=['normal', 'uniform'], + help='type of noises') + # fmt: on + + def _get_symm_kl(self, noised_logits, input_logits): + return ( + F.kl_div( + F.log_softmax(noised_logits, dim=-1, dtype=torch.float32), + F.softmax(input_logits, dim=-1, dtype=torch.float32), + None, + None, + "sum", + ) + + F.kl_div( + F.log_softmax(input_logits, dim=-1, dtype=torch.float32), + F.softmax(noised_logits, dim=-1, dtype=torch.float32), + None, + None, + "sum", + ) + ) / noised_logits.size(0) + + def forward(self, model, sample, reduce=True): + """Compute the loss for the given sample. 
+ + Returns a tuple with three elements: + 1) the loss + 2) the sample size, which is used as the denominator for the gradient + 3) logging outputs to display while training + """ + token_embeddings = model.encoder.embed_tokens(sample["net_input"]["src_tokens"]) + input_logits, extra = model(**sample["net_input"]) + loss, nll_loss = self.compute_loss( + model, (input_logits, extra), sample, reduce=reduce + ) + sample_size = ( + sample["target"].size(0) if self.sentence_avg else sample["ntokens"] + ) + + if model.training: + noise = self.noise_sampler.sample(sample_shape=token_embeddings.shape).to( + token_embeddings + ) + noised_embeddings = token_embeddings.clone() + noise + + noised_logits, _ = model( + **sample["net_input"], token_embeddings=noised_embeddings + ) + symm_kl = self._get_symm_kl(noised_logits, input_logits) + + if model.training: + symm_kl = symm_kl * sample_size + loss = loss + self.r3f_lambda * symm_kl + + logging_output = { + "loss": loss.data, + "nll_loss": nll_loss.data, + "ntokens": sample["ntokens"], + "nsentences": sample["target"].size(0), + "sample_size": sample_size, + } + + if model.training: + logging_output.update( + symm_kl=utils.item(symm_kl.data) if reduce else symm_kl.data + ) + + return loss, sample_size, logging_output + + def compute_loss(self, model, net_output, sample, reduce=True): + lprobs = model.get_normalized_probs(net_output, log_probs=True) + lprobs = lprobs.view(-1, lprobs.size(-1)) + target = model.get_targets(sample, net_output).view(-1, 1) + loss, nll_loss = label_smoothed_nll_loss( + lprobs, + target, + self.label_smoothing, + ignore_index=self.padding_idx, + reduce=reduce, + ) + return loss, nll_loss + + @staticmethod + def reduce_metrics(logging_outputs) -> None: + """Aggregate logging outputs from data parallel training.""" + loss_sum = sum(log.get("loss", 0) for log in logging_outputs) + nll_loss_sum = sum(log.get("nll_loss", 0) for log in logging_outputs) + ntokens = sum(log.get("ntokens", 0) for log in logging_outputs) + sample_size = sum(log.get("sample_size", 0) for log in logging_outputs) + symm_kl_sum = sum(log.get("symm_kl", 0) for log in logging_outputs) + + metrics.log_scalar("symm_kl", symm_kl_sum / sample_size, sample_size, round=3) + metrics.log_scalar( + "loss", loss_sum / sample_size / math.log(2), sample_size, round=3 + ) + metrics.log_scalar( + "nll_loss", nll_loss_sum / ntokens / math.log(2), ntokens, round=3 + ) + metrics.log_derived( + "ppl", lambda meters: utils.get_perplexity(meters["nll_loss"].avg) + ) + + @staticmethod + def logging_outputs_can_be_summed() -> bool: + """ + Whether the logging outputs returned by `forward` can be summed + across workers prior to calling `reduce_metrics`. Setting this + to True will improves distributed training speed. + """ + return True diff --git a/SpeechT5/fairseq/examples/rxf/rxf_src/sentence_prediction_r3f.py b/SpeechT5/fairseq/examples/rxf/rxf_src/sentence_prediction_r3f.py new file mode 100644 index 0000000000000000000000000000000000000000..62dd63390c24445e2610b9b0609edbd36045ce8a --- /dev/null +++ b/SpeechT5/fairseq/examples/rxf/rxf_src/sentence_prediction_r3f.py @@ -0,0 +1,170 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. 
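+#
+# Overview: R3F for sentence prediction. During training the input token
+# embeddings are perturbed with normal or uniform noise (--noise-type, with
+# scale --eps), the model is run on both the clean and the noised embeddings,
+# and a symmetric KL divergence between the two logit distributions is added
+# to the cross-entropy loss, weighted by --r3f-lambda (regression targets use
+# a plain MSE loss instead). R4F additionally applies spectral norm to the
+# classification head via --spectral-norm-classification-head (see the README
+# above).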
+ +import math + +import torch +import torch.nn.functional as F +from fairseq import utils +from fairseq.criterions import FairseqCriterion, register_criterion + + +@register_criterion("sentence_prediction_r3f") +class SentencePredictionR3F(FairseqCriterion): + def __init__( + self, + task, + eps, + r3f_lambda, + noise_type, + classification_head_name, + regression_target, + ): + super().__init__(task) + self.eps = eps + self.r3f_lambda = r3f_lambda + self.noise_type = noise_type + self.classification_head_name = classification_head_name + self.regression_target = regression_target + if self.noise_type in {"normal"}: + self.noise_sampler = torch.distributions.normal.Normal( + loc=0.0, scale=self.eps + ) + elif self.noise_type == "uniform": + self.noise_sampler = torch.distributions.uniform.Uniform( + low=-self.eps, high=self.eps + ) + else: + raise Exception(f"unrecognized noise type {self.noise_type}") + + @staticmethod + def add_args(parser): + # fmt: off + parser.add_argument('--eps', type=float, default=1e-5, + help='noise eps') + parser.add_argument('--r3f-lambda', type=float, default=1.0, + help='lambda for combining logistic loss and noisy KL loss') + parser.add_argument('--noise-type', type=str, default='uniform', + choices=['normal', 'uniform'], + help='type of noises for RXF methods') + parser.add_argument('--classification-head-name', + default='sentence_classification_head', + help='name of the classification head to use') + # fmt: on + + def _get_symm_kl(self, noised_logits, input_logits): + return ( + F.kl_div( + F.log_softmax(noised_logits, dim=-1, dtype=torch.float32), + F.softmax(input_logits, dim=-1, dtype=torch.float32), + None, + None, + "sum", + ) + + F.kl_div( + F.log_softmax(input_logits, dim=-1, dtype=torch.float32), + F.softmax(noised_logits, dim=-1, dtype=torch.float32), + None, + None, + "sum", + ) + ) / noised_logits.size(0) + + def forward(self, model, sample, reduce=True): + """Compute the loss for the given sample. 
+ + Returns a tuple with three elements: + 1) the loss + 2) the sample size, which is used as the denominator for the gradient + 3) logging outputs to display while training + """ + assert ( + hasattr(model, "classification_heads") + and self.classification_head_name in model.classification_heads + ), "model must provide sentence classification head for --criterion=sentence_prediction" + + token_embeddings = model.encoder.sentence_encoder.embed_tokens( + sample["net_input"]["src_tokens"] + ) + input_logits, _ = model( + **sample["net_input"], + features_only=True, + classification_head_name=self.classification_head_name, + token_embeddings=token_embeddings, + ) + if model.training and self.noise_sampler: + noise = self.noise_sampler.sample(sample_shape=token_embeddings.shape).to( + token_embeddings + ) + noised_embeddings = token_embeddings.detach().clone() + noise + + noised_logits, _ = model( + **sample["net_input"], + features_only=True, + classification_head_name=self.classification_head_name, + token_embeddings=noised_embeddings, + ) + symm_kl = self._get_symm_kl(noised_logits, input_logits) + else: + symm_kl = 0 + + targets = model.get_targets(sample, [input_logits]).view(-1) + sample_size = targets.numel() + + if not self.regression_target: + loss = F.nll_loss( + F.log_softmax(input_logits, dim=-1, dtype=torch.float32), + targets, + reduction="sum", + ) + if model.training: + symm_kl = symm_kl * sample_size + loss = loss + self.r3f_lambda * symm_kl + else: + logits = input_logits.squeeze().float() + targets = targets.float() + loss = F.mse_loss(logits, targets, reduction="sum") + + logging_output = { + "loss": utils.item(loss.data) if reduce else loss.data, + "ntokens": sample["ntokens"], + "nsentences": sample_size, + "sample_size": sample_size, + } + + if not self.regression_target: + preds = input_logits.max(dim=1)[1] + logging_output.update(ncorrect=(preds == targets).sum().item()) + + if model.training and self.noise_sampler: + logging_output.update( + symm_kl=utils.item(symm_kl.data) if reduce else symm_kl.data + ) + return loss, sample_size, logging_output + + @staticmethod + def aggregate_logging_outputs(logging_outputs): + """Aggregate logging outputs from data parallel training.""" + loss_sum = sum(log.get("loss", 0) for log in logging_outputs) + symm_kl_sum = sum(log.get("symm_kl", 0) for log in logging_outputs) + ntokens = sum(log.get("ntokens", 0) for log in logging_outputs) + nsentences = sum(log.get("nsentences", 0) for log in logging_outputs) + sample_size = sum(log.get("sample_size", 0) for log in logging_outputs) + + agg_output = { + "loss": loss_sum / sample_size / math.log(2), + "symm_kl": symm_kl_sum / sample_size, + "ntokens": ntokens, + "nsentences": nsentences, + "sample_size": sample_size, + } + + if len(logging_outputs) > 0 and "ncorrect" in logging_outputs[0]: + ncorrect = sum(log.get("ncorrect", 0) for log in logging_outputs) + agg_output.update(accuracy=ncorrect / nsentences) + + if sample_size != ntokens: + agg_output["nll_loss"] = loss_sum / ntokens / math.log(2) + return agg_output diff --git a/SpeechT5/fairseq/examples/scaling_nmt/README.md b/SpeechT5/fairseq/examples/scaling_nmt/README.md new file mode 100644 index 0000000000000000000000000000000000000000..0cc3360c3bbd58fe35a51591db8f081fc8576877 --- /dev/null +++ b/SpeechT5/fairseq/examples/scaling_nmt/README.md @@ -0,0 +1,114 @@ +# Scaling Neural Machine Translation (Ott et al., 2018) + +This page includes instructions for reproducing results from the paper [Scaling Neural Machine Translation (Ott 
et al., 2018)](https://arxiv.org/abs/1806.00187). + +## Pre-trained models + +Model | Description | Dataset | Download +---|---|---|--- +`transformer.wmt14.en-fr` | Transformer <br> ([Ott et al., 2018](https://arxiv.org/abs/1806.00187)) | [WMT14 English-French](http://statmt.org/wmt14/translation-task.html#Download) | model: <br> [download (.tar.bz2)](https://dl.fbaipublicfiles.com/fairseq/models/wmt14.en-fr.joined-dict.transformer.tar.bz2) <br> newstest2014: <br> [download (.tar.bz2)](https://dl.fbaipublicfiles.com/fairseq/data/wmt14.en-fr.joined-dict.newstest2014.tar.bz2) +`transformer.wmt16.en-de` | Transformer <br> ([Ott et al., 2018](https://arxiv.org/abs/1806.00187)) | [WMT16 English-German](https://drive.google.com/uc?export=download&id=0B_bZck-ksdkpM25jRUN2X2UxMm8) | model: <br> [download (.tar.bz2)](https://dl.fbaipublicfiles.com/fairseq/models/wmt16.en-de.joined-dict.transformer.tar.bz2) <br> newstest2014: <br> [download (.tar.bz2)](https://dl.fbaipublicfiles.com/fairseq/data/wmt16.en-de.joined-dict.newstest2014.tar.bz2) + +## Training a new model on WMT'16 En-De + +First download the [preprocessed WMT'16 En-De data provided by Google](https://drive.google.com/uc?export=download&id=0B_bZck-ksdkpM25jRUN2X2UxMm8). + +Then: + +##### 1. Extract the WMT'16 En-De data +```bash +TEXT=wmt16_en_de_bpe32k +mkdir -p $TEXT +tar -xzvf wmt16_en_de.tar.gz -C $TEXT +``` + +##### 2. Preprocess the dataset with a joined dictionary +```bash +fairseq-preprocess \ + --source-lang en --target-lang de \ + --trainpref $TEXT/train.tok.clean.bpe.32000 \ + --validpref $TEXT/newstest2013.tok.bpe.32000 \ + --testpref $TEXT/newstest2014.tok.bpe.32000 \ + --destdir data-bin/wmt16_en_de_bpe32k \ + --nwordssrc 32768 --nwordstgt 32768 \ + --joined-dictionary \ + --workers 20 +``` + +##### 3. Train a model +```bash +fairseq-train \ + data-bin/wmt16_en_de_bpe32k \ + --arch transformer_vaswani_wmt_en_de_big --share-all-embeddings \ + --optimizer adam --adam-betas '(0.9, 0.98)' --clip-norm 0.0 \ + --lr 0.0005 --lr-scheduler inverse_sqrt --warmup-updates 4000 --warmup-init-lr 1e-07 \ + --dropout 0.3 --weight-decay 0.0 \ + --criterion label_smoothed_cross_entropy --label-smoothing 0.1 \ + --max-tokens 3584 \ + --fp16 +``` + +Note that the `--fp16` flag requires you have CUDA 9.1 or greater and a Volta GPU or newer. + +***IMPORTANT:*** You will get better performance by training with big batches and +increasing the learning rate. If you want to train the above model with big batches +(assuming your machine has 8 GPUs): +- add `--update-freq 16` to simulate training on 8x16=128 GPUs +- increase the learning rate; 0.001 works well for big batches + +##### 4. Evaluate + +Now we can evaluate our trained model. + +Note that the original [Attention Is All You Need](https://arxiv.org/abs/1706.03762) +paper used a couple tricks to achieve better BLEU scores. We use these same tricks in +the Scaling NMT paper, so it's important to apply them when reproducing our results. + +First, use the [average_checkpoints.py](/scripts/average_checkpoints.py) script to +average the last few checkpoints. 
Averaging the last 5-10 checkpoints is usually +good, but you may need to adjust this depending on how long you've trained: +```bash +python scripts/average_checkpoints \ + --inputs /path/to/checkpoints \ + --num-epoch-checkpoints 10 \ + --output checkpoint.avg10.pt +``` + +Next, generate translations using a beam width of 4 and length penalty of 0.6: +```bash +fairseq-generate \ + data-bin/wmt16_en_de_bpe32k \ + --path checkpoint.avg10.pt \ + --beam 4 --lenpen 0.6 --remove-bpe > gen.out +``` + +Finally, we apply the ["compound splitting" script](/scripts/compound_split_bleu.sh) to +add spaces around dashes. For example "Café-Liebhaber" would become three tokens: +"Café - Liebhaber". This typically results in larger BLEU scores, but it is not +appropriate to compare these inflated scores to work which does not include this trick. +This trick was used in the [original AIAYN code](https://github.com/tensorflow/tensor2tensor/blob/fc9335c0203685cbbfe2b30c92db4352d8f60779/tensor2tensor/utils/get_ende_bleu.sh), +so we used it in the Scaling NMT paper as well. That said, it's strongly advised to +report [sacrebleu](https://github.com/mjpost/sacrebleu) scores instead. + +To compute "compound split" tokenized BLEU (not recommended!): +```bash +bash scripts/compound_split_bleu.sh gen.out +# BLEU4 = 29.29, 60.3/35.0/22.8/15.3 (BP=1.000, ratio=1.004, syslen=64763, reflen=64496) +``` + +To compute detokenized BLEU with sacrebleu (preferred): +```bash +bash scripts/sacrebleu.sh wmt14/full en de gen.out +# BLEU+case.mixed+lang.en-de+numrefs.1+smooth.exp+test.wmt14/full+tok.13a+version.1.4.3 = 28.6 59.3/34.3/22.1/14.9 (BP = 1.000 ratio = 1.016 hyp_len = 63666 ref_len = 62688) +``` + +## Citation + +```bibtex +@inproceedings{ott2018scaling, + title = {Scaling Neural Machine Translation}, + author = {Ott, Myle and Edunov, Sergey and Grangier, David and Auli, Michael}, + booktitle = {Proceedings of the Third Conference on Machine Translation (WMT)}, + year = 2018, +} +``` diff --git a/SpeechT5/fairseq/examples/simultaneous_translation/README.md b/SpeechT5/fairseq/examples/simultaneous_translation/README.md new file mode 100644 index 0000000000000000000000000000000000000000..62a005e0ec6f15af9015d335e34b45df6ed89b6c --- /dev/null +++ b/SpeechT5/fairseq/examples/simultaneous_translation/README.md @@ -0,0 +1,5 @@ +# Simultaneous Translation +Examples of simultaneous translation in fairseq +- [English-to-Japanese text-to-text wait-k model](docs/enja-waitk.md) +- [English-to-Germen text-to-text monotonic multihead attention model](docs/ende-mma.md) +- [English-to-Germen speech-to-text simultaneous translation model](../speech_to_text/docs/simulst_mustc_example.md) diff --git a/SpeechT5/fairseq/examples/simultaneous_translation/__init__.py b/SpeechT5/fairseq/examples/simultaneous_translation/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..5835316ba9b23c0d99d1a8f109ee047682211546 --- /dev/null +++ b/SpeechT5/fairseq/examples/simultaneous_translation/__init__.py @@ -0,0 +1,6 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +from . 
import models # noqa diff --git a/SpeechT5/fairseq/examples/simultaneous_translation/docs/ende-mma.md b/SpeechT5/fairseq/examples/simultaneous_translation/docs/ende-mma.md new file mode 100644 index 0000000000000000000000000000000000000000..241d604a3b31a37755da68aad6ff47d46891d3fc --- /dev/null +++ b/SpeechT5/fairseq/examples/simultaneous_translation/docs/ende-mma.md @@ -0,0 +1,74 @@ +# Simultaneous Machine Translation + +This directory contains the code for the paper [Monotonic Multihead Attention](https://openreview.net/forum?id=Hyg96gBKPS) + +## Prepare Data + +[Please follow the instructions to download and preprocess the WMT'15 En-De dataset.](https://github.com/pytorch/fairseq/tree/simulastsharedtask/examples/translation#prepare-wmt14en2desh) + +Another example of training an English to Japanese model can be found [here](docs/enja.md) + +## Training + +- MMA-IL + +```shell +fairseq-train \ + data-bin/wmt15_en_de_32k \ + --simul-type infinite_lookback \ + --user-dir $FAIRSEQ/example/simultaneous_translation \ + --mass-preservation \ + --criterion latency_augmented_label_smoothed_cross_entropy \ + --latency-weight-avg 0.1 \ + --max-update 50000 \ + --arch transformer_monotonic_iwslt_de_en save_dir_key=lambda \ + --optimizer adam --adam-betas '(0.9, 0.98)' \ + --lr-scheduler 'inverse_sqrt' \ + --warmup-init-lr 1e-7 --warmup-updates 4000 \ + --lr 5e-4 --stop-min-lr 1e-9 --clip-norm 0.0 --weight-decay 0.0001\ + --dropout 0.3 \ + --label-smoothing 0.1\ + --max-tokens 3584 +``` + +- MMA-H + +```shell +fairseq-train \ + data-bin/wmt15_en_de_32k \ + --simul-type hard_aligned \ + --user-dir $FAIRSEQ/example/simultaneous_translation \ + --mass-preservation \ + --criterion latency_augmented_label_smoothed_cross_entropy \ + --latency-weight-var 0.1 \ + --max-update 50000 \ + --arch transformer_monotonic_iwslt_de_en save_dir_key=lambda \ + --optimizer adam --adam-betas '(0.9, 0.98)' \ + --lr-scheduler 'inverse_sqrt' \ + --warmup-init-lr 1e-7 --warmup-updates 4000 \ + --lr 5e-4 --stop-min-lr 1e-9 --clip-norm 0.0 --weight-decay 0.0001\ + --dropout 0.3 \ + --label-smoothing 0.1\ + --max-tokens 3584 +``` + +- wait-k + +```shell +fairseq-train \ + data-bin/wmt15_en_de_32k \ + --simul-type wait-k \ + --waitk-lagging 3 \ + --user-dir $FAIRSEQ/example/simultaneous_translation \ + --mass-preservation \ + --criterion latency_augmented_label_smoothed_cross_entropy \ + --max-update 50000 \ + --arch transformer_monotonic_iwslt_de_en save_dir_key=lambda \ + --optimizer adam --adam-betas '(0.9, 0.98)' \ + --lr-scheduler 'inverse_sqrt' \ + --warmup-init-lr 1e-7 --warmup-updates 4000 \ + --lr 5e-4 --stop-min-lr 1e-9 --clip-norm 0.0 --weight-decay 0.0001\ + --dropout 0.3 \ + --label-smoothing 0.1\ + --max-tokens 3584 +``` diff --git a/SpeechT5/fairseq/examples/simultaneous_translation/docs/enja-waitk.md b/SpeechT5/fairseq/examples/simultaneous_translation/docs/enja-waitk.md new file mode 100644 index 0000000000000000000000000000000000000000..fb9d82576f80b4405564a99774fc98ac2fe6ad3b --- /dev/null +++ b/SpeechT5/fairseq/examples/simultaneous_translation/docs/enja-waitk.md @@ -0,0 +1,106 @@ +# An example of English to Japaneses Simultaneous Translation System + +This is an example of training and evaluating a transformer *wait-k* English to Japanese simultaneous text-to-text translation model. + +## Data Preparation +This section introduces the data preparation for training and evaluation. 
+If you only want to evaluate the model, please jump to [Inference & Evaluation](#inference-&-evaluation) + +For illustration, we only use the following subsets of the available data from [WMT20 news translation task](http://www.statmt.org/wmt20/translation-task.html), which results in 7,815,391 sentence pairs. +- News Commentary v16 +- Wiki Titles v3 +- WikiMatrix V1 +- Japanese-English Subtitle Corpus +- The Kyoto Free Translation Task Corpus + +We use WMT20 development data as development set. Training `transformer_vaswani_wmt_en_de_big` model on such amount of data will result in 17.3 BLEU with greedy search and 19.7 with beam (10) search. Notice that a better performance can be achieved with the full WMT training data. + +We use [sentencepiece](https://github.com/google/sentencepiece) toolkit to tokenize the data with a vocabulary size of 32000. +Additionally, we filtered out the sentences longer than 200 words after tokenization. +Assuming the tokenized text data is saved at `${DATA_DIR}`, +we prepare the data binary with the following command. + +```bash +fairseq-preprocess \ + --source-lang en --target-lang ja \ + --trainpref ${DATA_DIR}/train \ + --validpref ${DATA_DIR}/dev \ + --testpref ${DATA_DIR}/test \ + --destdir ${WMT20_ENJA_DATA_BIN} \ + --nwordstgt 32000 --nwordssrc 32000 \ + --workers 20 +``` + +## Simultaneous Translation Model Training +To train a wait-k `(k=10)` model. +```bash +fairseq-train ${WMT20_ENJA_DATA_BIN} \ + --save-dir ${SAVEDIR} + --simul-type waitk \ + --waitk-lagging 10 \ + --max-epoch 70 \ + --arch transformer_monotonic_vaswani_wmt_en_de_big \ + --optimizer adam \ + --adam-betas '(0.9, 0.98)' \ + --lr-scheduler inverse_sqrt \ + --warmup-init-lr 1e-07 \ + --warmup-updates 4000 \ + --lr 0.0005 \ + --stop-min-lr 1e-09 \ + --clip-norm 10.0 \ + --dropout 0.3 \ + --weight-decay 0.0 \ + --criterion label_smoothed_cross_entropy \ + --label-smoothing 0.1 \ + --max-tokens 3584 +``` +This command is for training on 8 GPUs. Equivalently, the model can be trained on one GPU with `--update-freq 8`. + +## Inference & Evaluation +First of all, install [SimulEval](https://github.com/facebookresearch/SimulEval) for evaluation. + +```bash +git clone https://github.com/facebookresearch/SimulEval.git +cd SimulEval +pip install -e . +``` + +The following command is for the evaluation. +Assuming the source and reference files are `${SRC_FILE}` and `${REF_FILE}`, the sentencepiece model file for English is saved at `${SRC_SPM_PATH}` + + +```bash +simuleval \ + --source ${SRC_FILE} \ + --target ${TGT_FILE} \ + --data-bin ${WMT20_ENJA_DATA_BIN} \ + --sacrebleu-tokenizer ja-mecab \ + --eval-latency-unit char \ + --no-space \ + --src-splitter-type sentencepiecemodel \ + --src-splitter-path ${SRC_SPM_PATH} \ + --agent ${FAIRSEQ}/examples/simultaneous_translation/agents/simul_trans_text_agent_enja.py \ + --model-path ${SAVE_DIR}/${CHECKPOINT_FILENAME} \ + --output ${OUTPUT} \ + --scores +``` + +The `--data-bin` should be the same in previous sections if you prepare the data from the scratch. +If only for evaluation, a prepared data directory can be found [here](https://dl.fbaipublicfiles.com/simultaneous_translation/wmt20_enja_medium_databin.tgz) and a pretrained checkpoint (wait-k=10 model) can be downloaded from [here](https://dl.fbaipublicfiles.com/simultaneous_translation/wmt20_enja_medium_wait10_ckpt.pt). 
+ +The output should look like this: +```bash +{ + "Quality": { + "BLEU": 11.442253287568398 + }, + "Latency": { + "AL": 8.6587861866951, + "AP": 0.7863304776251316, + "DAL": 9.477850951194764 + } +} +``` +The latency is evaluated by characters (`--eval-latency-unit`) on the target side. The latency is evaluated with `sacrebleu` with `MeCab` tokenizer `--sacrebleu-tokenizer ja-mecab`. `--no-space` indicates that do not add space when merging the predicted words. + +If `--output ${OUTPUT}` option is used, the detailed log and scores will be stored under the `${OUTPUT}` directory. diff --git a/SpeechT5/fairseq/examples/simultaneous_translation/eval/agents/simul_t2t_enja.py b/SpeechT5/fairseq/examples/simultaneous_translation/eval/agents/simul_t2t_enja.py new file mode 100644 index 0000000000000000000000000000000000000000..8f3c8703ca37398b9d389ce5181bdfac2333cdf2 --- /dev/null +++ b/SpeechT5/fairseq/examples/simultaneous_translation/eval/agents/simul_t2t_enja.py @@ -0,0 +1,226 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +import os + +from fairseq import checkpoint_utils, tasks +import sentencepiece as spm +import torch + +try: + from simuleval import READ_ACTION, WRITE_ACTION, DEFAULT_EOS + from simuleval.agents import TextAgent +except ImportError: + print("Please install simuleval 'pip install simuleval'") + + +BOS_PREFIX = "\u2581" + + +class SimulTransTextAgentJA(TextAgent): + """ + Simultaneous Translation + Text agent for Japanese + """ + def __init__(self, args): + + # Whether use gpu + self.gpu = getattr(args, "gpu", False) + + # Max len + self.max_len = args.max_len + + # Load Model + self.load_model_vocab(args) + + # build word splitter + self.build_word_splitter(args) + + self.eos = DEFAULT_EOS + + def initialize_states(self, states): + states.incremental_states = dict() + states.incremental_states["online"] = dict() + + def to_device(self, tensor): + if self.gpu: + return tensor.cuda() + else: + return tensor.cpu() + + def load_model_vocab(self, args): + + filename = args.model_path + if not os.path.exists(filename): + raise IOError("Model file not found: {}".format(filename)) + + state = checkpoint_utils.load_checkpoint_to_cpu(filename) + + task_args = state["cfg"]["task"] + task_args.data = args.data_bin + + task = tasks.setup_task(task_args) + + # build model for ensemble + state["cfg"]["model"].load_pretrained_encoder_from = None + state["cfg"]["model"].load_pretrained_decoder_from = None + + self.model = task.build_model(state["cfg"]["model"]) + self.model.load_state_dict(state["model"], strict=True) + self.model.eval() + self.model.share_memory() + + if self.gpu: + self.model.cuda() + + # Set dictionary + self.dict = {} + self.dict["tgt"] = task.target_dictionary + self.dict["src"] = task.source_dictionary + + @staticmethod + def add_args(parser): + # fmt: off + parser.add_argument('--model-path', type=str, required=True, + help='path to your pretrained model.') + parser.add_argument("--data-bin", type=str, required=True, + help="Path of data binary") + parser.add_argument("--max-len", type=int, default=100, + help="Max length of translation") + parser.add_argument("--tgt-splitter-type", type=str, default="SentencePiece", + help="Subword splitter type for target text.") + parser.add_argument("--tgt-splitter-path", type=str, default=None, + help="Subword splitter model path for target text.") + parser.add_argument("--src-splitter-type", 
type=str, default="SentencePiece", + help="Subword splitter type for source text.") + parser.add_argument("--src-splitter-path", type=str, default=None, + help="Subword splitter model path for source text.") + # fmt: on + return parser + + def build_word_splitter(self, args): + self.spm = {} + for lang in ['src', 'tgt']: + if getattr(args, f'{lang}_splitter_type', None): + path = getattr(args, f'{lang}_splitter_path', None) + if path: + self.spm[lang] = spm.SentencePieceProcessor() + self.spm[lang].Load(path) + + def segment_to_units(self, segment, states): + # Split a full word (segment) into subwords (units) + return self.spm['src'].EncodeAsPieces(segment) + + def update_model_encoder(self, states): + if len(states.units.source) == 0: + return + + src_indices = [ + self.dict['src'].index(x) + for x in states.units.source.value + ] + + if states.finish_read(): + # Append the eos index when the prediction is over + src_indices += [self.dict["tgt"].eos_index] + + src_indices = self.to_device( + torch.LongTensor(src_indices).unsqueeze(0) + ) + src_lengths = self.to_device( + torch.LongTensor([src_indices.size(1)]) + ) + + states.encoder_states = self.model.encoder(src_indices, src_lengths) + + torch.cuda.empty_cache() + + def update_states_read(self, states): + # Happens after a read action. + self.update_model_encoder(states) + + def units_to_segment(self, units, states): + # Merge sub words (units) to full word (segment). + # For Japanese, we can directly send + # the untokenized token to server except the BOS token + # with following option + # --sacrebleu-tokenizer MeCab + # --eval-latency-unit char + # --no-space + token = units.value.pop() + + if ( + token == self.dict["tgt"].eos_word + or len(states.segments.target) > self.max_len + ): + return DEFAULT_EOS + + if BOS_PREFIX == token: + return None + if token[0] == BOS_PREFIX: + return token[1:] + else: + return token + + def policy(self, states): + + if not getattr(states, "encoder_states", None): + # No encoder states, read a token first + return READ_ACTION + + # encode previous predicted target tokens + tgt_indices = self.to_device( + torch.LongTensor( + [self.model.decoder.dictionary.eos()] + + [ + self.dict['tgt'].index(x) + for x in states.units.target.value + if x is not None + ] + ).unsqueeze(0) + ) + + # Current steps + states.incremental_states["steps"] = { + "src": states.encoder_states["encoder_out"][0].size(0), + "tgt": 1 + len(states.units.target), + } + + # Online only means the reading is not finished + states.incremental_states["online"]["only"] = ( + torch.BoolTensor([not states.finish_read()]) + ) + + x, outputs = self.model.decoder.forward( + prev_output_tokens=tgt_indices, + encoder_out=states.encoder_states, + incremental_state=states.incremental_states, + ) + + states.decoder_out = x + + torch.cuda.empty_cache() + + if outputs.action == 0: + return READ_ACTION + else: + return WRITE_ACTION + + def predict(self, states): + # Predict target token from decoder states + decoder_states = states.decoder_out + + lprobs = self.model.get_normalized_probs( + [decoder_states[:, -1:]], log_probs=True + ) + + index = lprobs.argmax(dim=-1)[0, 0].item() + + if index != self.dict['tgt'].eos_index: + token = self.dict['tgt'].string([index]) + else: + token = self.dict['tgt'].eos_word + + return token diff --git a/SpeechT5/fairseq/examples/simultaneous_translation/models/__init__.py b/SpeechT5/fairseq/examples/simultaneous_translation/models/__init__.py new file mode 100644 index 
0000000000000000000000000000000000000000..257a96593ff7af93c206c066d8db4ad795b2ae36 --- /dev/null +++ b/SpeechT5/fairseq/examples/simultaneous_translation/models/__init__.py @@ -0,0 +1,15 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +import importlib +import os + + +for file in sorted(os.listdir(os.path.dirname(__file__))): + if file.endswith(".py") and not file.startswith("_"): + model_name = file[: file.find(".py")] + importlib.import_module( + "examples.simultaneous_translation.models." + model_name + ) diff --git a/SpeechT5/fairseq/examples/simultaneous_translation/models/convtransformer_simul_trans.py b/SpeechT5/fairseq/examples/simultaneous_translation/models/convtransformer_simul_trans.py new file mode 100644 index 0000000000000000000000000000000000000000..4a26422f650cf13ee7d4e8d2228b50ec49876fb8 --- /dev/null +++ b/SpeechT5/fairseq/examples/simultaneous_translation/models/convtransformer_simul_trans.py @@ -0,0 +1,204 @@ +# Copyright (c) 2017-present, Facebook, Inc. +# All rights reserved. +# +# This source code is licensed under the license found in the LICENSE file in +# the root directory of this source tree. An additional grant of patent rights +# can be found in the PATENTS file in the same directory. + +from fairseq import checkpoint_utils +from fairseq.models import ( + register_model, + register_model_architecture, +) +from fairseq.models.speech_to_text import ( + ConvTransformerModel, + convtransformer_espnet, + ConvTransformerEncoder, +) +from fairseq.models.speech_to_text.modules.augmented_memory_attention import ( + augmented_memory, + SequenceEncoder, + AugmentedMemoryConvTransformerEncoder, +) + +from torch import nn, Tensor +from typing import Dict, List +from fairseq.models.speech_to_text.modules.emformer import NoSegAugmentedMemoryTransformerEncoderLayer + +@register_model("convtransformer_simul_trans") +class SimulConvTransformerModel(ConvTransformerModel): + """ + Implementation of the paper: + + SimulMT to SimulST: Adapting Simultaneous Text Translation to + End-to-End Simultaneous Speech Translation + + https://www.aclweb.org/anthology/2020.aacl-main.58.pdf + """ + + @staticmethod + def add_args(parser): + super(SimulConvTransformerModel, SimulConvTransformerModel).add_args(parser) + parser.add_argument( + "--train-monotonic-only", + action="store_true", + default=False, + help="Only train monotonic attention", + ) + + @classmethod + def build_decoder(cls, args, task, embed_tokens): + tgt_dict = task.tgt_dict + + from examples.simultaneous_translation.models.transformer_monotonic_attention import ( + TransformerMonotonicDecoder, + ) + + decoder = TransformerMonotonicDecoder(args, tgt_dict, embed_tokens) + + if getattr(args, "load_pretrained_decoder_from", None): + decoder = checkpoint_utils.load_pretrained_component_from_model( + component=decoder, checkpoint=args.load_pretrained_decoder_from + ) + return decoder + + +@register_model_architecture( + "convtransformer_simul_trans", "convtransformer_simul_trans_espnet" +) +def convtransformer_simul_trans_espnet(args): + convtransformer_espnet(args) + + +@register_model("convtransformer_augmented_memory") +@augmented_memory +class AugmentedMemoryConvTransformerModel(SimulConvTransformerModel): + @classmethod + def build_encoder(cls, args): + encoder = SequenceEncoder(args, AugmentedMemoryConvTransformerEncoder(args)) + + if getattr(args, "load_pretrained_encoder_from", None) is 
not None: + encoder = checkpoint_utils.load_pretrained_component_from_model( + component=encoder, checkpoint=args.load_pretrained_encoder_from + ) + + return encoder + + +@register_model_architecture( + "convtransformer_augmented_memory", "convtransformer_augmented_memory" +) +def augmented_memory_convtransformer_espnet(args): + convtransformer_espnet(args) + + +# ============================================================================ # +# Convtransformer +# with monotonic attention decoder +# with emformer encoder +# ============================================================================ # + + +class ConvTransformerEmformerEncoder(ConvTransformerEncoder): + def __init__(self, args): + super().__init__(args) + stride = self.conv_layer_stride(args) + trf_left_context = args.segment_left_context // stride + trf_right_context = args.segment_right_context // stride + context_config = [trf_left_context, trf_right_context] + self.transformer_layers = nn.ModuleList( + [ + NoSegAugmentedMemoryTransformerEncoderLayer( + input_dim=args.encoder_embed_dim, + num_heads=args.encoder_attention_heads, + ffn_dim=args.encoder_ffn_embed_dim, + num_layers=args.encoder_layers, + dropout_in_attn=args.dropout, + dropout_on_attn=args.dropout, + dropout_on_fc1=args.dropout, + dropout_on_fc2=args.dropout, + activation_fn=args.activation_fn, + context_config=context_config, + segment_size=args.segment_length, + max_memory_size=args.max_memory_size, + scaled_init=True, # TODO: use constant for now. + tanh_on_mem=args.amtrf_tanh_on_mem, + ) + ] + ) + self.conv_transformer_encoder = ConvTransformerEncoder(args) + + def forward(self, src_tokens, src_lengths): + encoder_out: Dict[str, List[Tensor]] = self.conv_transformer_encoder(src_tokens, src_lengths.to(src_tokens.device)) + output = encoder_out["encoder_out"][0] + encoder_padding_masks = encoder_out["encoder_padding_mask"] + + return { + "encoder_out": [output], + # This is because that in the original implementation + # the output didn't consider the last segment as right context. 
+ "encoder_padding_mask": [encoder_padding_masks[0][:, : output.size(0)]] if len(encoder_padding_masks) > 0 + else [], + "encoder_embedding": [], + "encoder_states": [], + "src_tokens": [], + "src_lengths": [], + } + + @staticmethod + def conv_layer_stride(args): + # TODO: make it configurable from the args + return 4 + + +@register_model("convtransformer_emformer") +class ConvtransformerEmformer(SimulConvTransformerModel): + @staticmethod + def add_args(parser): + super(ConvtransformerEmformer, ConvtransformerEmformer).add_args(parser) + + parser.add_argument( + "--segment-length", + type=int, + metavar="N", + help="length of each segment (not including left context / right context)", + ) + parser.add_argument( + "--segment-left-context", + type=int, + help="length of left context in a segment", + ) + parser.add_argument( + "--segment-right-context", + type=int, + help="length of right context in a segment", + ) + parser.add_argument( + "--max-memory-size", + type=int, + default=-1, + help="Right context for the segment.", + ) + parser.add_argument( + "--amtrf-tanh-on-mem", + default=False, + action="store_true", + help="whether to use tanh on memory vector", + ) + + @classmethod + def build_encoder(cls, args): + encoder = ConvTransformerEmformerEncoder(args) + if getattr(args, "load_pretrained_encoder_from", None): + encoder = checkpoint_utils.load_pretrained_component_from_model( + component=encoder, checkpoint=args.load_pretrained_encoder_from + ) + return encoder + + +@register_model_architecture( + "convtransformer_emformer", + "convtransformer_emformer", +) +def convtransformer_emformer_base(args): + convtransformer_espnet(args) diff --git a/SpeechT5/fairseq/examples/simultaneous_translation/models/transformer_monotonic_attention.py b/SpeechT5/fairseq/examples/simultaneous_translation/models/transformer_monotonic_attention.py new file mode 100644 index 0000000000000000000000000000000000000000..1062e9b955be475b2eaca4255bac3e7f219fd2a1 --- /dev/null +++ b/SpeechT5/fairseq/examples/simultaneous_translation/models/transformer_monotonic_attention.py @@ -0,0 +1,315 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. 
+ +from typing import Dict, List, NamedTuple, Optional + +import torch +import torch.nn as nn +import torch.nn.functional as F +from examples.simultaneous_translation.modules.monotonic_transformer_layer import ( + TransformerMonotonicDecoderLayer, + TransformerMonotonicEncoderLayer, +) +from fairseq.models import ( + register_model, + register_model_architecture, +) +from fairseq.models.transformer import ( + TransformerModel, + TransformerEncoder, + TransformerDecoder, + base_architecture, + transformer_iwslt_de_en, + transformer_vaswani_wmt_en_de_big, + transformer_vaswani_wmt_en_fr_big, +) +from torch import Tensor + +DEFAULT_MAX_SOURCE_POSITIONS = 1024 +DEFAULT_MAX_TARGET_POSITIONS = 1024 + +TransformerMonotonicDecoderOut = NamedTuple( + "TransformerMonotonicDecoderOut", + [ + ("action", int), + ("attn_list", Optional[List[Optional[Dict[str, Tensor]]]]), + ("step_list", Optional[List[Optional[Tensor]]]), + ("encoder_out", Optional[Dict[str, List[Tensor]]]), + ("encoder_padding_mask", Optional[Tensor]), + ], +) + + +@register_model("transformer_unidirectional") +class TransformerUnidirectionalModel(TransformerModel): + @classmethod + def build_encoder(cls, args, src_dict, embed_tokens): + return TransformerMonotonicEncoder(args, src_dict, embed_tokens) + + +@register_model("transformer_monotonic") +class TransformerModelSimulTrans(TransformerModel): + @classmethod + def build_encoder(cls, args, src_dict, embed_tokens): + return TransformerMonotonicEncoder(args, src_dict, embed_tokens) + + @classmethod + def build_decoder(cls, args, tgt_dict, embed_tokens): + return TransformerMonotonicDecoder(args, tgt_dict, embed_tokens) + + def _indices_from_states(self, states): + if type(states["indices"]["src"]) == list: + if next(self.parameters()).is_cuda: + tensor = torch.cuda.LongTensor + else: + tensor = torch.LongTensor + + src_indices = tensor( + [states["indices"]["src"][: 1 + states["steps"]["src"]]] + ) + + tgt_indices = tensor( + [[self.decoder.dictionary.eos()] + states["indices"]["tgt"]] + ) + else: + src_indices = states["indices"]["src"][: 1 + states["steps"]["src"]] + tgt_indices = states["indices"]["tgt"] + + return src_indices, None, tgt_indices + + +class TransformerMonotonicEncoder(TransformerEncoder): + def __init__(self, args, dictionary, embed_tokens): + super().__init__(args, dictionary, embed_tokens) + + self.dictionary = dictionary + self.layers = nn.ModuleList([]) + self.layers.extend( + [TransformerMonotonicEncoderLayer(args) for i in range(args.encoder_layers)] + ) + + +class TransformerMonotonicDecoder(TransformerDecoder): + """ + Transformer decoder consisting of *args.decoder_layers* layers. Each layer + is a :class:`TransformerDecoderLayer`. + + Args: + args (argparse.Namespace): parsed command-line arguments + dictionary (~fairseq.data.Dictionary): decoding dictionary + embed_tokens (torch.nn.Embedding): output embedding + no_encoder_attn (bool, optional): whether to attend to encoder outputs + (default: False). 
+ """ + + def __init__(self, args, dictionary, embed_tokens, no_encoder_attn=False): + super().__init__(args, dictionary, embed_tokens, no_encoder_attn=False) + + self.dictionary = dictionary + self.layers = nn.ModuleList([]) + self.layers.extend( + [ + TransformerMonotonicDecoderLayer(args, no_encoder_attn) + for _ in range(args.decoder_layers) + ] + ) + + def pre_attention( + self, + prev_output_tokens, + encoder_out_dict: Dict[str, List[Tensor]], + incremental_state: Optional[Dict[str, Dict[str, Optional[Tensor]]]] = None, + ): + positions = ( + self.embed_positions( + prev_output_tokens, + incremental_state=incremental_state, + ) + if self.embed_positions is not None + else None + ) + + if incremental_state is not None: + prev_output_tokens = prev_output_tokens[:, -1:] + if positions is not None: + positions = positions[:, -1:] + # embed tokens and positions + x = self.embed_scale * self.embed_tokens(prev_output_tokens) + + if self.project_in_dim is not None: + x = self.project_in_dim(x) + + if positions is not None: + x += positions + + x = self.dropout_module(x) + + # B x T x C -> T x B x C + x = x.transpose(0, 1) + + encoder_out = encoder_out_dict["encoder_out"][0] + encoder_padding_mask = ( + encoder_out_dict["encoder_padding_mask"][0] + if encoder_out_dict["encoder_padding_mask"] + and len(encoder_out_dict["encoder_padding_mask"]) > 0 + else None + ) + + return x, encoder_out, encoder_padding_mask + + def post_attention(self, x): + if self.layer_norm is not None: + x = self.layer_norm(x) + + # T x B x C -> B x T x C + x = x.transpose(0, 1) + + if self.project_out_dim is not None: + x = self.project_out_dim(x) + + return x + + def clear_cache( + self, + incremental_state: Optional[Dict[str, Dict[str, Optional[Tensor]]]], + end_id: Optional[int] = None, + ): + """ + Clear cache in the monotonic layers. + The cache is generated because of a forward pass of decode but no prediction. + end_id is the last idx of the layers + """ + if end_id is None: + end_id = len(self.layers) + + for index, layer in enumerate(self.layers): + if index < end_id: + layer.prune_incremental_state(incremental_state) + + def extract_features( + self, + prev_output_tokens, + encoder_out: Optional[Dict[str, List[Tensor]]], + incremental_state: Optional[Dict[str, Dict[str, Optional[Tensor]]]] = None, + full_context_alignment: bool = False, # unused + alignment_layer: Optional[int] = None, # unused + alignment_heads: Optional[int] = None, # unsed + ): + """ + Similar to *forward* but only return features. 
+ + Returns: + tuple: + - the decoder's features of shape `(batch, tgt_len, embed_dim)` + - a dictionary with any model-specific outputs + """ + # incremental_state = None + assert encoder_out is not None + (x, encoder_outs, encoder_padding_mask) = self.pre_attention( + prev_output_tokens, encoder_out, incremental_state + ) + attn = None + inner_states = [x] + attn_list: List[Optional[Dict[str, Tensor]]] = [] + step_list: List[Optional[Tensor]] = [] + + for i, layer in enumerate(self.layers): + + x, attn, _ = layer( + x=x, + encoder_out=encoder_outs, + encoder_padding_mask=encoder_padding_mask, + incremental_state=incremental_state, + self_attn_mask=self.buffered_future_mask(x) + if incremental_state is None + else None, + ) + + inner_states.append(x) + attn_list.append(attn) + + if incremental_state is not None: + curr_steps = layer.get_head_steps(incremental_state) + step_list.append(curr_steps) + if_online = incremental_state["online"]["only"] + assert if_online is not None + if if_online.to(torch.bool): + # Online indicates that the encoder states are still changing + assert attn is not None + assert curr_steps is not None + p_choose = ( + attn["p_choose"].squeeze(0).squeeze(1).gather(1, curr_steps.t()) + ) + + new_steps = curr_steps + (p_choose < 0.5).t().type_as(curr_steps) + src = incremental_state["steps"]["src"] + assert src is not None + + if (new_steps >= src).any(): + # We need to prune the last self_attn saved_state + # if model decide not to read + # otherwise there will be duplicated saved_state + self.clear_cache(incremental_state, i + 1) + + return x, TransformerMonotonicDecoderOut( + action=0, + attn_list=None, + step_list=None, + encoder_out=None, + encoder_padding_mask=None, + ) + + x = self.post_attention(x) + + return x, TransformerMonotonicDecoderOut( + action=1, + attn_list=attn_list, + step_list=step_list, + encoder_out=encoder_out, + encoder_padding_mask=encoder_padding_mask, + ) + + def reorder_incremental_state(self, incremental_state, new_order): + super().reorder_incremental_state(incremental_state, new_order) + if "fastest_step" in incremental_state: + incremental_state["fastest_step"] = incremental_state[ + "fastest_step" + ].index_select(0, new_order) + + +@register_model_architecture("transformer_monotonic", "transformer_monotonic") +def base_monotonic_architecture(args): + base_architecture(args) + args.encoder_unidirectional = getattr(args, "encoder_unidirectional", False) + + +@register_model_architecture( + "transformer_monotonic", "transformer_monotonic_iwslt_de_en" +) +def transformer_monotonic_iwslt_de_en(args): + transformer_iwslt_de_en(args) + base_monotonic_architecture(args) + + +# parameters used in the "Attention Is All You Need" paper (Vaswani et al., 2017) +@register_model_architecture( + "transformer_monotonic", "transformer_monotonic_vaswani_wmt_en_de_big" +) +def transformer_monotonic_vaswani_wmt_en_de_big(args): + transformer_vaswani_wmt_en_de_big(args) + + +@register_model_architecture( + "transformer_monotonic", "transformer_monotonic_vaswani_wmt_en_fr_big" +) +def transformer_monotonic_vaswani_wmt_en_fr_big(args): + transformer_monotonic_vaswani_wmt_en_fr_big(args) + + +@register_model_architecture( + "transformer_unidirectional", "transformer_unidirectional_iwslt_de_en" +) +def transformer_unidirectional_iwslt_de_en(args): + transformer_iwslt_de_en(args) diff --git a/SpeechT5/fairseq/examples/simultaneous_translation/modules/__init__.py b/SpeechT5/fairseq/examples/simultaneous_translation/modules/__init__.py new file mode 100644 
index 0000000000000000000000000000000000000000..c695850c04952c1095edc676cf062f7ee43eb788 --- /dev/null +++ b/SpeechT5/fairseq/examples/simultaneous_translation/modules/__init__.py @@ -0,0 +1,24 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +import importlib +import os + +from fairseq import registry + + +( + build_monotonic_attention, + register_monotonic_attention, + MONOTONIC_ATTENTION_REGISTRY, + _, +) = registry.setup_registry("--simul-type") + +for file in sorted(os.listdir(os.path.dirname(__file__))): + if file.endswith(".py") and not file.startswith("_"): + model_name = file[: file.find(".py")] + importlib.import_module( + "examples.simultaneous_translation.modules." + model_name + ) diff --git a/SpeechT5/fairseq/examples/simultaneous_translation/modules/fixed_pre_decision.py b/SpeechT5/fairseq/examples/simultaneous_translation/modules/fixed_pre_decision.py new file mode 100644 index 0000000000000000000000000000000000000000..dd29c031b3b23401dcf61bbe48991934099429a8 --- /dev/null +++ b/SpeechT5/fairseq/examples/simultaneous_translation/modules/fixed_pre_decision.py @@ -0,0 +1,254 @@ +from functools import partial + +import torch +from torch import Tensor +import math +import torch.nn.functional as F + +from . import register_monotonic_attention +from .monotonic_multihead_attention import ( + MonotonicMultiheadAttentionWaitK, + MonotonicMultiheadAttentionHardAligned, + MonotonicMultiheadAttentionInfiniteLookback, +) +from typing import Dict, Optional +from examples.simultaneous_translation.utils import p_choose_strategy + +def fixed_pooling_monotonic_attention(monotonic_attention): + def create_model(monotonic_attention, klass): + class FixedStrideMonotonicAttention(monotonic_attention): + def __init__(self, args): + self.waitk_lagging = 0 + self.num_heads = 0 + self.noise_mean = 0.0 + self.noise_var = 0.0 + super().__init__(args) + self.pre_decision_type = args.fixed_pre_decision_type + self.pre_decision_ratio = args.fixed_pre_decision_ratio + self.pre_decision_pad_threshold = args.fixed_pre_decision_pad_threshold + if self.pre_decision_ratio == 1: + return + + self.strategy = args.simul_type + + if args.fixed_pre_decision_type == "average": + self.pooling_layer = torch.nn.AvgPool1d( + kernel_size=self.pre_decision_ratio, + stride=self.pre_decision_ratio, + ceil_mode=True, + ) + elif args.fixed_pre_decision_type == "last": + + def last(key): + if key.size(2) < self.pre_decision_ratio: + return key + else: + k = key[ + :, + :, + self.pre_decision_ratio - 1 :: self.pre_decision_ratio, + ].contiguous() + if key.size(-1) % self.pre_decision_ratio != 0: + k = torch.cat([k, key[:, :, -1:]], dim=-1).contiguous() + return k + + self.pooling_layer = last + else: + raise NotImplementedError + + @staticmethod + def add_args(parser): + super( + FixedStrideMonotonicAttention, FixedStrideMonotonicAttention + ).add_args(parser) + parser.add_argument( + "--fixed-pre-decision-ratio", + type=int, + required=True, + help=( + "Ratio for the fixed pre-decision," + "indicating how many encoder steps will start" + "simultaneous decision making process." 
+ ), + ) + parser.add_argument( + "--fixed-pre-decision-type", + default="average", + choices=["average", "last"], + help="Pooling type", + ) + parser.add_argument( + "--fixed-pre-decision-pad-threshold", + type=float, + default=0.3, + help="If a part of the sequence has pad" + ",the threshold the pooled part is a pad.", + ) + + def insert_zeros(self, x): + bsz_num_heads, tgt_len, src_len = x.size() + stride = self.pre_decision_ratio + weight = F.pad(torch.ones(1, 1, 1).to(x), (stride - 1, 0)) + x_upsample = F.conv_transpose1d( + x.view(-1, src_len).unsqueeze(1), + weight, + stride=stride, + padding=0, + ) + return x_upsample.squeeze(1).view(bsz_num_heads, tgt_len, -1) + + def p_choose_waitk( + self, query, key, key_padding_mask: Optional[Tensor] = None, + incremental_state: Optional[Dict[str, Dict[str, Optional[Tensor]]]] = None + ): + """ + query: bsz, tgt_len + key: bsz, src_len + key_padding_mask: bsz, src_len + """ + if incremental_state is not None: + # Retrieve target length from incremental states + # For inference the length of query is always 1 + tgt = incremental_state["steps"]["tgt"] + assert tgt is not None + tgt_len = int(tgt) + else: + tgt_len, bsz, _ = query.size() + + src_len, bsz, _ = key.size() + + p_choose = torch.ones(bsz, tgt_len, src_len).to(query) + p_choose = torch.tril(p_choose, diagonal=self.waitk_lagging - 1) + p_choose = torch.triu(p_choose, diagonal=self.waitk_lagging - 1) + + if incremental_state is not None: + p_choose = p_choose[:, -1:] + tgt_len = 1 + + # Extend to each head + p_choose = ( + p_choose.contiguous() + .unsqueeze(1) + .expand(-1, self.num_heads, -1, -1) + .contiguous() + .view(-1, tgt_len, src_len) + ) + + return p_choose + + def p_choose( + self, + query: Optional[Tensor], + key: Optional[Tensor], + key_padding_mask: Optional[Tensor] = None, + incremental_state: Optional[Dict[str, Dict[str, Optional[Tensor]]]] = None, + ): + assert key is not None + assert query is not None + src_len = key.size(0) + tgt_len = query.size(0) + batch_size = query.size(1) + + if self.pre_decision_ratio == 1: + if self.strategy == "waitk": + return p_choose_strategy.waitk( + query, + key, + self.waitk_lagging, + self.num_heads, + key_padding_mask, + incremental_state=incremental_state, + ) + else: # hard_aligned or infinite_lookback + q_proj, k_proj, _ = self.input_projections(query, key, None, "monotonic") + attn_energy = self.attn_energy(q_proj, k_proj, key_padding_mask) + return p_choose_strategy.hard_aligned( + q_proj, + k_proj, + attn_energy, + self.noise_mean, + self.noise_var, + self.training + ) + + key_pool = self.pooling_layer(key.transpose(0, 2)).transpose(0, 2) + + if key_padding_mask is not None: + key_padding_mask_pool = ( + self.pooling_layer(key_padding_mask.unsqueeze(0).float()) + .squeeze(0) + .gt(self.pre_decision_pad_threshold) + ) + # Make sure at least one element is not pad + key_padding_mask_pool[:, 0] = 0 + else: + key_padding_mask_pool = None + + if incremental_state is not None: + # The floor instead of ceil is used for inference + # But make sure the length key_pool at least 1 + if ( + max(1, math.floor(key.size(0) / self.pre_decision_ratio)) + ) < key_pool.size(0): + key_pool = key_pool[:-1] + if key_padding_mask_pool is not None: + key_padding_mask_pool = key_padding_mask_pool[:-1] + + p_choose_pooled = self.p_choose_waitk( + query, + key_pool, + key_padding_mask_pool, + incremental_state=incremental_state, + ) + + # Upsample, interpolate zeros + p_choose = self.insert_zeros(p_choose_pooled) + + if p_choose.size(-1) < src_len: + # 
Append zeros if the upsampled p_choose is shorter than src_len + p_choose = torch.cat( + [ + p_choose, + torch.zeros( + p_choose.size(0), + tgt_len, + src_len - p_choose.size(-1) + ).to(p_choose) + ], + dim=2 + ) + else: + # can be larger than src_len because we used ceil before + p_choose = p_choose[:, :, :src_len] + p_choose[:, :, -1] = p_choose_pooled[:, :, -1] + + assert list(p_choose.size()) == [ + batch_size * self.num_heads, + tgt_len, + src_len, + ] + + return p_choose + + FixedStrideMonotonicAttention.__name__ = klass.__name__ + return FixedStrideMonotonicAttention + + return partial(create_model, monotonic_attention) + + +@register_monotonic_attention("waitk_fixed_pre_decision") +@fixed_pooling_monotonic_attention(MonotonicMultiheadAttentionWaitK) +class MonotonicMultiheadAttentionWaitkFixedStride: + pass + + +@register_monotonic_attention("hard_aligned_fixed_pre_decision") +@fixed_pooling_monotonic_attention(MonotonicMultiheadAttentionHardAligned) +class MonotonicMultiheadAttentionHardFixedStride: + pass + + +@register_monotonic_attention("infinite_lookback_fixed_pre_decision") +@fixed_pooling_monotonic_attention(MonotonicMultiheadAttentionInfiniteLookback) +class MonotonicMultiheadAttentionInfiniteLookbackFixedStride: + pass diff --git a/SpeechT5/fairseq/examples/simultaneous_translation/modules/monotonic_multihead_attention.py b/SpeechT5/fairseq/examples/simultaneous_translation/modules/monotonic_multihead_attention.py new file mode 100644 index 0000000000000000000000000000000000000000..f49b1daa2fbe920c290055b44a09bfe404fc4f89 --- /dev/null +++ b/SpeechT5/fairseq/examples/simultaneous_translation/modules/monotonic_multihead_attention.py @@ -0,0 +1,910 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +import math + +import torch +from torch import Tensor +import torch.nn as nn + +from examples.simultaneous_translation.utils.functions import ( + exclusive_cumprod, + lengths_to_mask, +) +from fairseq.incremental_decoding_utils import with_incremental_state +from fairseq.modules import MultiheadAttention + +from . 
import register_monotonic_attention +from typing import Dict, Optional + +from examples.simultaneous_translation.utils import p_choose_strategy + +@with_incremental_state +class MonotonicAttention(nn.Module): + """ + Abstract class of monotonic attentions + """ + + def __init__(self, args): + self.eps = args.attention_eps + self.mass_preservation = args.mass_preservation + + self.noise_type = args.noise_type + self.noise_mean = args.noise_mean + self.noise_var = args.noise_var + + self.energy_bias_init = args.energy_bias_init + self.energy_bias = ( + nn.Parameter(self.energy_bias_init * torch.ones([1])) + if args.energy_bias is True + else 0 + ) + + @staticmethod + def add_args(parser): + # fmt: off + parser.add_argument('--no-mass-preservation', action="store_false", + dest="mass_preservation", + help='Do not stay on the last token when decoding') + parser.add_argument('--mass-preservation', action="store_true", + dest="mass_preservation", + help='Stay on the last token when decoding') + parser.set_defaults(mass_preservation=True) + parser.add_argument('--noise-var', type=float, default=1.0, + help='Variance of discretness noise') + parser.add_argument('--noise-mean', type=float, default=0.0, + help='Mean of discretness noise') + parser.add_argument('--noise-type', type=str, default="flat", + help='Type of discretness noise') + parser.add_argument('--energy-bias', action="store_true", + default=False, + help='Bias for energy') + parser.add_argument('--energy-bias-init', type=float, default=-2.0, + help='Initial value of the bias for energy') + parser.add_argument('--attention-eps', type=float, default=1e-6, + help='Epsilon when calculating expected attention') + + def p_choose(self, *args): + raise NotImplementedError + + def input_projections(self, *args): + raise NotImplementedError + + def attn_energy( + self, q_proj, k_proj, key_padding_mask=None, attn_mask=None + ): + """ + Calculating monotonic energies + + ============================================================ + Expected input size + q_proj: bsz * num_heads, tgt_len, self.head_dim + k_proj: bsz * num_heads, src_len, self.head_dim + key_padding_mask: bsz, src_len + attn_mask: tgt_len, src_len + """ + bsz, tgt_len, embed_dim = q_proj.size() + bsz = bsz // self.num_heads + src_len = k_proj.size(1) + + attn_energy = ( + torch.bmm(q_proj, k_proj.transpose(1, 2)) + self.energy_bias + ) + + if attn_mask is not None: + attn_mask = attn_mask.unsqueeze(0) + attn_energy += attn_mask + + attn_energy = attn_energy.view(bsz, self.num_heads, tgt_len, src_len) + + if key_padding_mask is not None: + attn_energy = attn_energy.masked_fill( + key_padding_mask.unsqueeze(1).unsqueeze(2).to(torch.bool), + float("-inf"), + ) + + return attn_energy + + def expected_alignment_train(self, p_choose, key_padding_mask: Optional[Tensor]): + """ + Calculating expected alignment for MMA + Mask is not need because p_choose will be 0 if masked + + q_ij = (1 − p_{ij−1})q_{ij−1} + a+{i−1j} + a_ij = p_ij q_ij + + Parallel solution: + ai = p_i * cumprod(1 − pi) * cumsum(a_i / cumprod(1 − pi)) + + ============================================================ + Expected input size + p_choose: bsz * num_heads, tgt_len, src_len + """ + + # p_choose: bsz * num_heads, tgt_len, src_len + bsz_num_heads, tgt_len, src_len = p_choose.size() + + # cumprod_1mp : bsz * num_heads, tgt_len, src_len + cumprod_1mp = exclusive_cumprod(1 - p_choose, dim=2, eps=self.eps) + cumprod_1mp_clamp = torch.clamp(cumprod_1mp, self.eps, 1.0) + + init_attention = 
p_choose.new_zeros([bsz_num_heads, 1, src_len]) + init_attention[:, :, 0] = 1.0 + + previous_attn = [init_attention] + + for i in range(tgt_len): + # p_choose: bsz * num_heads, tgt_len, src_len + # cumprod_1mp_clamp : bsz * num_heads, tgt_len, src_len + # previous_attn[i]: bsz * num_heads, 1, src_len + # alpha_i: bsz * num_heads, src_len + alpha_i = ( + p_choose[:, i] + * cumprod_1mp[:, i] + * torch.cumsum(previous_attn[i][:, 0] / cumprod_1mp_clamp[:, i], dim=1) + ).clamp(0, 1.0) + previous_attn.append(alpha_i.unsqueeze(1)) + + # alpha: bsz * num_heads, tgt_len, src_len + alpha = torch.cat(previous_attn[1:], dim=1) + + if self.mass_preservation: + # Last token has the residual probabilities + if key_padding_mask is not None and key_padding_mask[:, -1].any(): + # right padding + batch_size = key_padding_mask.size(0) + residuals = 1 - alpha.sum(dim=-1, keepdim=True).clamp(0.0, 1.0) + src_lens = src_len - key_padding_mask.sum(dim=1, keepdim=True) + src_lens = src_lens.expand( + batch_size, self.num_heads + ).contiguous().view(-1, 1) + src_lens = src_lens.expand(-1, tgt_len).contiguous() + # add back the last value + residuals += alpha.gather(2, src_lens.unsqueeze(-1) - 1) + alpha = alpha.scatter(2, src_lens.unsqueeze(-1) - 1, residuals) + else: + residuals = 1 - alpha[:, :, :-1].sum(dim=-1).clamp(0.0, 1.0) + alpha[:, :, -1] = residuals + + if torch.isnan(alpha).any(): + # Something is wrong + raise RuntimeError("NaN in alpha.") + + return alpha + + def expected_alignment_infer( + self, p_choose, encoder_padding_mask: Optional[Tensor], incremental_state: Optional[Dict[str, Dict[str, Optional[Tensor]]]] + ): + # TODO modify this function + """ + Calculating mo alignment for MMA during inference time + + ============================================================ + Expected input size + p_choose: bsz * num_heads, tgt_len, src_len + incremental_state: dict + encodencoder_padding_mask: bsz * src_len + """ + # p_choose: bsz * self.num_heads, src_len + bsz_num_heads, tgt_len, src_len = p_choose.size() + # One token at a time + assert tgt_len == 1 + p_choose = p_choose[:, 0, :] + + monotonic_cache = self._get_monotonic_buffer(incremental_state) + + # prev_monotonic_step: bsz, num_heads + bsz = bsz_num_heads // self.num_heads + prev_monotonic_step = monotonic_cache.get( + "head_step", + p_choose.new_zeros([bsz, self.num_heads]).long() + ) + assert prev_monotonic_step is not None + bsz, num_heads = prev_monotonic_step.size() + assert num_heads == self.num_heads + assert bsz * num_heads == bsz_num_heads + + # p_choose: bsz, num_heads, src_len + p_choose = p_choose.view(bsz, num_heads, src_len) + + if encoder_padding_mask is not None: + src_lengths = src_len - \ + encoder_padding_mask.sum(dim=1, keepdim=True).long() + else: + src_lengths = prev_monotonic_step.new_ones(bsz, 1) * src_len + + # src_lengths: bsz, num_heads + src_lengths = src_lengths.expand_as(prev_monotonic_step) + # new_monotonic_step: bsz, num_heads + new_monotonic_step = prev_monotonic_step + + step_offset = 0 + if encoder_padding_mask is not None: + if encoder_padding_mask[:, 0].any(): + # left_pad_source = True: + step_offset = encoder_padding_mask.sum(dim=-1, keepdim=True) + + max_steps = src_lengths - 1 if self.mass_preservation else src_lengths + + # finish_read: bsz, num_heads + finish_read = new_monotonic_step.eq(max_steps) + p_choose_i = 1 + while finish_read.sum().item() < bsz * self.num_heads: + # p_choose: bsz * self.num_heads, src_len + # only choose the p at monotonic steps + # p_choose_i: bsz , self.num_heads + p_choose_i 
= ( + p_choose.gather( + 2, + (step_offset + new_monotonic_step) + .unsqueeze(2) + .clamp(0, src_len - 1), + ) + ).squeeze(2) + + action = ( + (p_choose_i < 0.5) + .type_as(prev_monotonic_step) + .masked_fill(finish_read, 0) + ) + # 1 x bsz + # sample actions on unfinished seq + # 1 means stay, finish reading + # 0 means leave, continue reading + # dist = torch.distributions.bernoulli.Bernoulli(p_choose) + # action = dist.sample().type_as(finish_read) * (1 - finish_read) + + new_monotonic_step += action + + finish_read = new_monotonic_step.eq(max_steps) | (action == 0) + + monotonic_cache["head_step"] = new_monotonic_step + # Whether a head is looking for new input + monotonic_cache["head_read"] = ( + new_monotonic_step.eq(max_steps) & (p_choose_i < 0.5) + ) + + # alpha: bsz * num_heads, 1, src_len + # new_monotonic_step: bsz, num_heads + alpha = ( + p_choose + .new_zeros([bsz * self.num_heads, src_len]) + .scatter( + 1, + (step_offset + new_monotonic_step) + .view(bsz * self.num_heads, 1).clamp(0, src_len - 1), + 1 + ) + ) + + if not self.mass_preservation: + alpha = alpha.masked_fill( + (new_monotonic_step == max_steps) + .view(bsz * self.num_heads, 1), + 0 + ) + + alpha = alpha.unsqueeze(1) + + self._set_monotonic_buffer(incremental_state, monotonic_cache) + + return alpha + + def _get_monotonic_buffer(self, incremental_state: Optional[Dict[str, Dict[str, Optional[Tensor]]]]): + return self.get_incremental_state( + incremental_state, + 'monotonic', + ) or {} + + def _set_monotonic_buffer(self, incremental_state: Optional[Dict[str, Dict[str, Optional[Tensor]]]], buffer: Dict[str, Optional[Tensor]]): + self.set_incremental_state( + incremental_state, + 'monotonic', + buffer, + ) + + def v_proj_output(self, value): + raise NotImplementedError + + def forward( + self, query, key, value, + key_padding_mask=None, attn_mask=None, incremental_state: Optional[Dict[str, Dict[str, Optional[Tensor]]]] = None, + need_weights=True, static_kv=False + ): + + tgt_len, bsz, embed_dim = query.size() + src_len = value.size(0) + + # stepwise prob + # p_choose: bsz * self.num_heads, tgt_len, src_len + p_choose = self.p_choose( + query, key, key_padding_mask, incremental_state, + ) + + # expected alignment alpha + # bsz * self.num_heads, tgt_len, src_len + if incremental_state is not None: + alpha = self.expected_alignment_infer( + p_choose, key_padding_mask, incremental_state) + else: + alpha = self.expected_alignment_train( + p_choose, key_padding_mask) + + # expected attention beta + # bsz * self.num_heads, tgt_len, src_len + beta = self.expected_attention( + alpha, query, key, value, + key_padding_mask, attn_mask, + incremental_state + ) + + attn_weights = beta + + v_proj = self.v_proj_output(value) + + attn = torch.bmm(attn_weights.type_as(v_proj), v_proj) + + attn = attn.transpose(0, 1).contiguous().view(tgt_len, bsz, embed_dim) + + attn = self.out_proj(attn) + + beta = beta.view(bsz, self.num_heads, tgt_len, src_len) + alpha = alpha.view(bsz, self.num_heads, tgt_len, src_len) + p_choose = p_choose.view(bsz, self.num_heads, tgt_len, src_len) + + return attn, { + "alpha": alpha, + "beta": beta, + "p_choose": p_choose, + } + + +@register_monotonic_attention("hard_aligned") +class MonotonicMultiheadAttentionHardAligned( + MonotonicAttention, MultiheadAttention +): + def __init__(self, args): + MultiheadAttention.__init__( + self, + embed_dim=args.decoder_embed_dim, + num_heads=args.decoder_attention_heads, + kdim=getattr(args, "encoder_embed_dim", None), + vdim=getattr(args, "encoder_embed_dim", None), + 
dropout=args.attention_dropout, + encoder_decoder_attention=True, + ) + + MonotonicAttention.__init__(self, args) + + self.k_in_proj = {"monotonic": self.k_proj} + self.q_in_proj = {"monotonic": self.q_proj} + self.v_in_proj = {"output": self.v_proj} + + @staticmethod + def add_args(parser): + # fmt: off + parser.add_argument('--no-mass-preservation', action="store_false", + dest="mass_preservation", + help='Do not stay on the last token when decoding') + parser.add_argument('--mass-preservation', action="store_true", + dest="mass_preservation", + help='Stay on the last token when decoding') + parser.set_defaults(mass_preservation=True) + parser.add_argument('--noise-var', type=float, default=1.0, + help='Variance of discretness noise') + parser.add_argument('--noise-mean', type=float, default=0.0, + help='Mean of discretness noise') + parser.add_argument('--noise-type', type=str, default="flat", + help='Type of discretness noise') + parser.add_argument('--energy-bias', action="store_true", + default=False, + help='Bias for energy') + parser.add_argument('--energy-bias-init', type=float, default=-2.0, + help='Initial value of the bias for energy') + parser.add_argument('--attention-eps', type=float, default=1e-6, + help='Epsilon when calculating expected attention') + + def attn_energy( + self, q_proj: Optional[Tensor], k_proj: Optional[Tensor], key_padding_mask: Optional[Tensor] = None, attn_mask: Optional[Tensor] = None + ): + """ + Calculating monotonic energies + + ============================================================ + Expected input size + q_proj: bsz * num_heads, tgt_len, self.head_dim + k_proj: bsz * num_heads, src_len, self.head_dim + key_padding_mask: bsz, src_len + attn_mask: tgt_len, src_len + """ + assert q_proj is not None # Optional[Tensor] annotations in the signature above are to make the JIT compiler happy + assert k_proj is not None + bsz, tgt_len, embed_dim = q_proj.size() + bsz = bsz // self.num_heads + src_len = k_proj.size(1) + + attn_energy = ( + torch.bmm(q_proj, k_proj.transpose(1, 2)) + self.energy_bias + ) + + if attn_mask is not None: + attn_mask = attn_mask.unsqueeze(0) + attn_energy += attn_mask + + attn_energy = attn_energy.view(bsz, self.num_heads, tgt_len, src_len) + + if key_padding_mask is not None: + attn_energy = attn_energy.masked_fill( + key_padding_mask.unsqueeze(1).unsqueeze(2).to(torch.bool), + float("-inf"), + ) + + return attn_energy + + def expected_alignment_train(self, p_choose, key_padding_mask: Optional[Tensor]): + """ + Calculating expected alignment for MMA + Mask is not need because p_choose will be 0 if masked + + q_ij = (1 − p_{ij−1})q_{ij−1} + a+{i−1j} + a_ij = p_ij q_ij + + Parallel solution: + ai = p_i * cumprod(1 − pi) * cumsum(a_i / cumprod(1 − pi)) + + ============================================================ + Expected input size + p_choose: bsz * num_heads, tgt_len, src_len + """ + + # p_choose: bsz * num_heads, tgt_len, src_len + bsz_num_heads, tgt_len, src_len = p_choose.size() + + # cumprod_1mp : bsz * num_heads, tgt_len, src_len + cumprod_1mp = exclusive_cumprod(1 - p_choose, dim=2, eps=self.eps) + cumprod_1mp_clamp = torch.clamp(cumprod_1mp, self.eps, 1.0) + + init_attention = p_choose.new_zeros([bsz_num_heads, 1, src_len]) + init_attention[:, :, 0] = 1.0 + + previous_attn = [init_attention] + + for i in range(tgt_len): + # p_choose: bsz * num_heads, tgt_len, src_len + # cumprod_1mp_clamp : bsz * num_heads, tgt_len, src_len + # previous_attn[i]: bsz * num_heads, 1, src_len + # alpha_i: bsz * num_heads, src_len + 
alpha_i = ( + p_choose[:, i] + * cumprod_1mp[:, i] + * torch.cumsum(previous_attn[i][:, 0] / cumprod_1mp_clamp[:, i], dim=1) + ).clamp(0, 1.0) + previous_attn.append(alpha_i.unsqueeze(1)) + + # alpha: bsz * num_heads, tgt_len, src_len + alpha = torch.cat(previous_attn[1:], dim=1) + + if self.mass_preservation: + # Last token has the residual probabilities + if key_padding_mask is not None and key_padding_mask[:, -1].any(): + # right padding + batch_size = key_padding_mask.size(0) + residuals = 1 - alpha.sum(dim=-1, keepdim=True).clamp(0.0, 1.0) + src_lens = src_len - key_padding_mask.sum(dim=1, keepdim=True) + src_lens = src_lens.expand( + batch_size, self.num_heads + ).contiguous().view(-1, 1) + src_lens = src_lens.expand(-1, tgt_len).contiguous() + # add back the last value + residuals += alpha.gather(2, src_lens.unsqueeze(-1) - 1) + alpha = alpha.scatter(2, src_lens.unsqueeze(-1) - 1, residuals) + else: + residuals = 1 - alpha[:, :, :-1].sum(dim=-1).clamp(0.0, 1.0) + alpha[:, :, -1] = residuals + + if torch.isnan(alpha).any(): + # Something is wrong + raise RuntimeError("NaN in alpha.") + + return alpha + + def expected_alignment_infer( + self, p_choose, encoder_padding_mask: Optional[Tensor], incremental_state: Optional[Dict[str, Dict[str, Optional[Tensor]]]] + ): + # TODO modify this function + """ + Calculating mo alignment for MMA during inference time + + ============================================================ + Expected input size + p_choose: bsz * num_heads, tgt_len, src_len + incremental_state: dict + encodencoder_padding_mask: bsz * src_len + """ + # p_choose: bsz * self.num_heads, src_len + bsz_num_heads, tgt_len, src_len = p_choose.size() + # One token at a time + assert tgt_len == 1 + p_choose = p_choose[:, 0, :] + + monotonic_cache = self._get_monotonic_buffer(incremental_state) + + # prev_monotonic_step: bsz, num_heads + bsz = bsz_num_heads // self.num_heads + prev_monotonic_step = monotonic_cache.get( + "head_step", + p_choose.new_zeros([bsz, self.num_heads]).long() + ) + assert prev_monotonic_step is not None + bsz, num_heads = prev_monotonic_step.size() + assert num_heads == self.num_heads + assert bsz * num_heads == bsz_num_heads + + # p_choose: bsz, num_heads, src_len + p_choose = p_choose.view(bsz, num_heads, src_len) + + if encoder_padding_mask is not None: + src_lengths = src_len - \ + encoder_padding_mask.sum(dim=1, keepdim=True).long() + else: + src_lengths = torch.ones(bsz, 1).to(prev_monotonic_step) * src_len + + # src_lengths: bsz, num_heads + src_lengths = src_lengths.expand_as(prev_monotonic_step) + # new_monotonic_step: bsz, num_heads + new_monotonic_step = prev_monotonic_step + + step_offset = torch.tensor(0) + if encoder_padding_mask is not None: + if encoder_padding_mask[:, 0].any(): + # left_pad_source = True: + step_offset = encoder_padding_mask.sum(dim=-1, keepdim=True) + + max_steps = src_lengths - 1 if self.mass_preservation else src_lengths + + # finish_read: bsz, num_heads + finish_read = new_monotonic_step.eq(max_steps) + p_choose_i = torch.tensor(1) + while finish_read.sum().item() < bsz * self.num_heads: + # p_choose: bsz * self.num_heads, src_len + # only choose the p at monotonic steps + # p_choose_i: bsz , self.num_heads + p_choose_i = ( + p_choose.gather( + 2, + (step_offset + new_monotonic_step) + .unsqueeze(2) + .clamp(0, src_len - 1), + ) + ).squeeze(2) + + action = ( + (p_choose_i < 0.5) + .type_as(prev_monotonic_step) + .masked_fill(finish_read, 0) + ) + # 1 x bsz + # sample actions on unfinished seq + # 1 means stay, finish 
reading + # 0 means leave, continue reading + # dist = torch.distributions.bernoulli.Bernoulli(p_choose) + # action = dist.sample().type_as(finish_read) * (1 - finish_read) + + new_monotonic_step += action + + finish_read = new_monotonic_step.eq(max_steps) | (action == 0) + + monotonic_cache["head_step"] = new_monotonic_step + # Whether a head is looking for new input + monotonic_cache["head_read"] = ( + new_monotonic_step.eq(max_steps) & (p_choose_i < 0.5) + ) + + # alpha: bsz * num_heads, 1, src_len + # new_monotonic_step: bsz, num_heads + alpha = ( + p_choose + .new_zeros([bsz * self.num_heads, src_len]) + .scatter( + 1, + (step_offset + new_monotonic_step) + .view(bsz * self.num_heads, 1).clamp(0, src_len - 1), + 1 + ) + ) + + if not self.mass_preservation: + alpha = alpha.masked_fill( + (new_monotonic_step == max_steps) + .view(bsz * self.num_heads, 1), + 0 + ) + + alpha = alpha.unsqueeze(1) + + self._set_monotonic_buffer(incremental_state, monotonic_cache) + + return alpha + + def _get_monotonic_buffer(self, incremental_state: Optional[Dict[str, Dict[str, Optional[Tensor]]]]): + maybe_incremental_state = self.get_incremental_state( + incremental_state, + 'monotonic', + ) + if maybe_incremental_state is None: + typed_empty_dict: Dict[str, Optional[Tensor]] = {} + return typed_empty_dict + else: + return maybe_incremental_state + + def _set_monotonic_buffer(self, incremental_state: Optional[Dict[str, Dict[str, Optional[Tensor]]]], buffer: Dict[str, Optional[Tensor]]): + self.set_incremental_state( + incremental_state, + 'monotonic', + buffer, + ) + + def forward( + self, query: Optional[Tensor], key: Optional[Tensor], value: Optional[Tensor], + key_padding_mask: Optional[Tensor] = None, attn_mask: Optional[Tensor] = None, incremental_state: Optional[Dict[str, Dict[str, Optional[Tensor]]]] = None, + need_weights: bool = True, static_kv: bool = False, need_head_weights: bool = False, + ): + assert query is not None + assert value is not None + tgt_len, bsz, embed_dim = query.size() + src_len = value.size(0) + + # stepwise prob + # p_choose: bsz * self.num_heads, tgt_len, src_len + p_choose = self.p_choose( + query, key, key_padding_mask, incremental_state, + ) + + # expected alignment alpha + # bsz * self.num_heads, tgt_len, src_len + if incremental_state is not None: + alpha = self.expected_alignment_infer( + p_choose, key_padding_mask, incremental_state) + else: + alpha = self.expected_alignment_train( + p_choose, key_padding_mask) + + # expected attention beta + # bsz * self.num_heads, tgt_len, src_len + beta = self.expected_attention( + alpha, query, key, value, + key_padding_mask, attn_mask, + incremental_state + ) + + attn_weights = beta + + v_proj = self.v_proj_output(value) + assert v_proj is not None + + attn = torch.bmm(attn_weights.type_as(v_proj), v_proj) + + attn = attn.transpose(0, 1).contiguous().view(tgt_len, bsz, embed_dim) + + attn = self.out_proj(attn) + + beta = beta.view(bsz, self.num_heads, tgt_len, src_len) + alpha = alpha.view(bsz, self.num_heads, tgt_len, src_len) + p_choose = p_choose.view(bsz, self.num_heads, tgt_len, src_len) + + return attn, { + "alpha": alpha, + "beta": beta, + "p_choose": p_choose, + } + + def input_projections(self, query: Optional[Tensor], key: Optional[Tensor], value: Optional[Tensor], name: str): + """ + Prepare inputs for multihead attention + + ============================================================ + Expected input size + query: tgt_len, bsz, embed_dim + key: src_len, bsz, embed_dim + value: src_len, bsz, embed_dim + name: 
monotonic or soft + """ + + if query is not None: + bsz = query.size(1) + q = self.q_proj(query) + q *= self.scaling + q = q.contiguous().view( + -1, bsz * self.num_heads, self.head_dim + ).transpose(0, 1) + else: + q = None + + if key is not None: + bsz = key.size(1) + k = self.k_proj(key) + k = k.contiguous().view( + -1, bsz * self.num_heads, self.head_dim + ).transpose(0, 1) + else: + k = None + + if value is not None: + bsz = value.size(1) + v = self.v_proj(value) + v = v.contiguous().view( + -1, bsz * self.num_heads, self.head_dim + ).transpose(0, 1) + else: + v = None + + return q, k, v + + def p_choose( + self, query: Optional[Tensor], key: Optional[Tensor], key_padding_mask: Optional[Tensor] = None, + incremental_state: Optional[Dict[str, Dict[str, Optional[Tensor]]]] = None + ): + """ + Calculating step wise prob for reading and writing + 1 to read, 0 to write + + ============================================================ + Expected input size + query: bsz, tgt_len, embed_dim + key: bsz, src_len, embed_dim + value: bsz, src_len, embed_dim + key_padding_mask: bsz, src_len + attn_mask: bsz, src_len + query: bsz, tgt_len, embed_dim + """ + + # prepare inputs + q_proj, k_proj, _ = self.input_projections( + query, key, None, "monotonic" + ) + + # attention energy + attn_energy = self.attn_energy(q_proj, k_proj, key_padding_mask) + + return p_choose_strategy.hard_aligned(q_proj, k_proj, attn_energy, self.noise_mean, self.noise_var, self.training) + + def expected_attention(self, alpha, *args): + """ + For MMA-H, beta = alpha + """ + return alpha + + def v_proj_output(self, value): + _, _, v_proj = self.input_projections(None, None, value, "output") + return v_proj + + +@register_monotonic_attention("infinite_lookback") +class MonotonicMultiheadAttentionInfiniteLookback( + MonotonicMultiheadAttentionHardAligned +): + def __init__(self, args): + super().__init__(args) + self.init_soft_attention() + + def init_soft_attention(self): + self.k_proj_soft = nn.Linear(self.kdim, self.embed_dim, bias=True) + self.q_proj_soft = nn.Linear(self.embed_dim, self.embed_dim, bias=True) + self.k_in_proj["soft"] = self.k_proj_soft + self.q_in_proj["soft"] = self.q_proj_soft + + if self.qkv_same_dim: + # Empirically observed the convergence to be much better with + # the scaled initialization + nn.init.xavier_uniform_( + self.k_in_proj["soft"].weight, gain=1 / math.sqrt(2) + ) + nn.init.xavier_uniform_( + self.q_in_proj["soft"].weight, gain=1 / math.sqrt(2) + ) + else: + nn.init.xavier_uniform_(self.k_in_proj["soft"].weight) + nn.init.xavier_uniform_(self.q_in_proj["soft"].weight) + + def expected_attention( + self, alpha, query: Optional[Tensor], key: Optional[Tensor], value: Optional[Tensor], + key_padding_mask: Optional[Tensor], attn_mask: Optional[Tensor], incremental_state: Optional[Dict[str, Dict[str, Optional[Tensor]]]] + ): + # monotonic attention, we will calculate milk here + bsz_x_num_heads, tgt_len, src_len = alpha.size() + bsz = int(bsz_x_num_heads / self.num_heads) + + q, k, _ = self.input_projections(query, key, None, "soft") + soft_energy = self.attn_energy(q, k, key_padding_mask, attn_mask) + + assert list(soft_energy.size()) == \ + [bsz, self.num_heads, tgt_len, src_len] + + soft_energy = soft_energy.view(bsz * self.num_heads, tgt_len, src_len) + + if incremental_state is not None: + monotonic_cache = self._get_monotonic_buffer(incremental_state) + head_step = monotonic_cache["head_step"] + assert head_step is not None + monotonic_length = head_step + 1 + step_offset = 0 + if 
key_padding_mask is not None: + if key_padding_mask[:, 0].any(): + # left_pad_source = True: + step_offset = key_padding_mask.sum(dim=-1, keepdim=True) + monotonic_length += step_offset + mask = lengths_to_mask( + monotonic_length.view(-1), + soft_energy.size(2), 1 + ).unsqueeze(1) + + soft_energy = soft_energy.masked_fill(~mask.to(torch.bool), float("-inf")) + soft_energy = soft_energy - soft_energy.max(dim=2, keepdim=True)[0] + exp_soft_energy = torch.exp(soft_energy) + exp_soft_energy_sum = exp_soft_energy.sum(dim=2) + beta = exp_soft_energy / exp_soft_energy_sum.unsqueeze(2) + + else: + soft_energy = soft_energy - soft_energy.max(dim=2, keepdim=True)[0] + exp_soft_energy = torch.exp(soft_energy) + self.eps + inner_items = alpha / (torch.cumsum(exp_soft_energy, dim=2)) + + beta = ( + exp_soft_energy + * torch.cumsum(inner_items.flip(dims=[2]), dim=2) + .flip(dims=[2]) + ) + + beta = beta.view(bsz, self.num_heads, tgt_len, src_len) + + if key_padding_mask is not None: + beta = beta.masked_fill( + key_padding_mask.unsqueeze(1).unsqueeze(2).to(torch.bool), 0) + + beta = beta / beta.sum(dim=3, keepdim=True) + beta = beta.view(bsz * self.num_heads, tgt_len, src_len) + beta = self.dropout_module(beta) + + if torch.isnan(beta).any(): + # Something is wrong + raise RuntimeError("NaN in beta.") + + return beta + + +@register_monotonic_attention("waitk") +class MonotonicMultiheadAttentionWaitK( + MonotonicMultiheadAttentionInfiniteLookback +): + def __init__(self, args): + super().__init__(args) + self.q_in_proj["soft"] = self.q_in_proj["monotonic"] + self.k_in_proj["soft"] = self.k_in_proj["monotonic"] + self.waitk_lagging = args.waitk_lagging + assert self.waitk_lagging > 0, ( + f"Lagging has to been larger than 0, get {self.waitk_lagging}." + ) + + @staticmethod + def add_args(parser): + super( + MonotonicMultiheadAttentionWaitK, + MonotonicMultiheadAttentionWaitK, + ).add_args(parser) + + parser.add_argument( + "--waitk-lagging", type=int, required=True, help="Wait K lagging" + ) + + def p_choose( + self, query: Optional[Tensor], key: Optional[Tensor], key_padding_mask: Optional[Tensor] = None, + incremental_state: Optional[Dict[str, Dict[str, Optional[Tensor]]]] = None, + ): + """ + query: bsz, tgt_len + key: bsz, src_len + key_padding_mask: bsz, src_len + """ + return p_choose_strategy.waitk(query, key, self.waitk_lagging, self.num_heads, key_padding_mask, incremental_state) diff --git a/SpeechT5/fairseq/examples/simultaneous_translation/modules/monotonic_transformer_layer.py b/SpeechT5/fairseq/examples/simultaneous_translation/modules/monotonic_transformer_layer.py new file mode 100644 index 0000000000000000000000000000000000000000..bcd45aa8a6bbe86d2e3826c9589cc0ae648730a2 --- /dev/null +++ b/SpeechT5/fairseq/examples/simultaneous_translation/modules/monotonic_transformer_layer.py @@ -0,0 +1,198 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +from fairseq.modules import LayerNorm, TransformerDecoderLayer, TransformerEncoderLayer + +from . 
import build_monotonic_attention + +from typing import Dict, List, Optional + +import torch +from torch import Tensor + + +class TransformerMonotonicEncoderLayer(TransformerEncoderLayer): + def forward(self, x, encoder_padding_mask): + seq_len, _, _ = x.size() + attn_mask = x.new_ones([seq_len, seq_len]).triu(1) + attn_mask = attn_mask.masked_fill(attn_mask.bool(), float("-inf")) + return super().forward(x, encoder_padding_mask, attn_mask) + + +class TransformerMonotonicDecoderLayer(TransformerDecoderLayer): + def __init__( + self, args, no_encoder_attn=False, add_bias_kv=False, add_zero_attn=False + ): + super().__init__( + args, + no_encoder_attn=True, + add_bias_kv=add_bias_kv, + add_zero_attn=add_zero_attn, + ) + + assert args.simul_type is not None, "A --simul-type is needed." + + self.encoder_attn = build_monotonic_attention(args) + self.encoder_attn_layer_norm = LayerNorm( + self.embed_dim, export=getattr(args, "char_inputs", False) + ) + + def get_head_steps(self, incremental_state: Optional[Dict[str, Dict[str, Optional[Tensor]]]]): + return self.encoder_attn._get_monotonic_buffer(incremental_state).get( + "head_step" + ) + + def prune_incremental_state(self, incremental_state: Optional[Dict[str, Dict[str, Optional[Tensor]]]]): + input_buffer = self.self_attn._get_input_buffer(incremental_state) + for key in ["prev_key", "prev_value"]: + input_buffer_key = input_buffer[key] + assert input_buffer_key is not None + if input_buffer_key.size(2) > 1: + input_buffer[key] = input_buffer_key[:, :, :-1, :] + else: + typed_empty_dict: Dict[str, Optional[Tensor]] = {} + input_buffer = typed_empty_dict + break + assert incremental_state is not None + self.self_attn._set_input_buffer(incremental_state, input_buffer) + + def get_steps(self, incremental_state): + return self.encoder_attn._get_monotonic_buffer(incremental_state).get("step", 0) + + def forward( + self, + x, + encoder_out: Optional[torch.Tensor] = None, + encoder_padding_mask: Optional[torch.Tensor] = None, + incremental_state: Optional[Dict[str, Dict[str, Optional[Tensor]]]] = None, + prev_self_attn_state: Optional[List[torch.Tensor]] = None, + prev_attn_state: Optional[List[torch.Tensor]] = None, + self_attn_mask: Optional[torch.Tensor] = None, + self_attn_padding_mask: Optional[torch.Tensor] = None, + need_attn: bool = False, + need_head_weights: bool = False, + ): + """ + Args: + x (Tensor): input to the layer of shape `(seq_len, batch, embed_dim)` + encoder_padding_mask (ByteTensor, optional): binary + ByteTensor of shape `(batch, src_len)` where padding + elements are indicated by ``1``. + need_attn (bool, optional): return attention weights + need_head_weights (bool, optional): return attention weights + for each head (default: return average over heads). 
+ + Returns: + encoded output of shape `(seq_len, batch, embed_dim)` + """ + if need_head_weights: + need_attn = True + + residual = x + if self.normalize_before: + x = self.self_attn_layer_norm(x) + if prev_self_attn_state is not None: + prev_key, prev_value = prev_self_attn_state[:2] + saved_state: Dict[str, Optional[Tensor]] = { + "prev_key": prev_key, + "prev_value": prev_value, + } + if len(prev_self_attn_state) >= 3: + saved_state["prev_key_padding_mask"] = prev_self_attn_state[2] + assert incremental_state is not None + self.self_attn._set_input_buffer(incremental_state, saved_state) + _self_attn_input_buffer = self.self_attn._get_input_buffer(incremental_state) + if self.cross_self_attention and not ( + incremental_state is not None + and _self_attn_input_buffer is not None + and "prev_key" in _self_attn_input_buffer + ): + if self_attn_mask is not None: + assert encoder_out is not None + self_attn_mask = torch.cat( + (x.new_zeros(x.size(0), encoder_out.size(0)), self_attn_mask), dim=1 + ) + if self_attn_padding_mask is not None: + if encoder_padding_mask is None: + assert encoder_out is not None + encoder_padding_mask = self_attn_padding_mask.new_zeros( + encoder_out.size(1), encoder_out.size(0) + ) + self_attn_padding_mask = torch.cat( + (encoder_padding_mask, self_attn_padding_mask), dim=1 + ) + assert encoder_out is not None + y = torch.cat((encoder_out, x), dim=0) + else: + y = x + + x, attn = self.self_attn( + query=x, + key=y, + value=y, + key_padding_mask=self_attn_padding_mask, + incremental_state=incremental_state, + need_weights=False, + attn_mask=self_attn_mask, + ) + x = self.dropout_module(x) + x = self.residual_connection(x, residual) + if not self.normalize_before: + x = self.self_attn_layer_norm(x) + + assert self.encoder_attn is not None + residual = x + if self.normalize_before: + x = self.encoder_attn_layer_norm(x) + if prev_attn_state is not None: + prev_key, prev_value = prev_attn_state[:2] + saved_state: Dict[str, Optional[Tensor]] = { + "prev_key": prev_key, + "prev_value": prev_value, + } + if len(prev_attn_state) >= 3: + saved_state["prev_key_padding_mask"] = prev_attn_state[2] + assert incremental_state is not None + self.encoder_attn._set_input_buffer(incremental_state, saved_state) + + x, attn = self.encoder_attn( + query=x, + key=encoder_out, + value=encoder_out, + key_padding_mask=encoder_padding_mask, + incremental_state=incremental_state, + static_kv=True, + need_weights=need_attn or (not self.training and self.need_attn), + need_head_weights=need_head_weights, + ) + x = self.dropout_module(x) + x = self.residual_connection(x, residual) + if not self.normalize_before: + x = self.encoder_attn_layer_norm(x) + + residual = x + if self.normalize_before: + x = self.final_layer_norm(x) + + x = self.activation_fn(self.fc1(x)) + x = self.activation_dropout_module(x) + x = self.fc2(x) + x = self.dropout_module(x) + x = self.residual_connection(x, residual) + if not self.normalize_before: + x = self.final_layer_norm(x) + if self.onnx_trace and incremental_state is not None: + saved_state = self.self_attn._get_input_buffer(incremental_state) + assert saved_state is not None + if self_attn_padding_mask is not None: + self_attn_state = [ + saved_state["prev_key"], + saved_state["prev_value"], + saved_state["prev_key_padding_mask"], + ] + else: + self_attn_state = [saved_state["prev_key"], saved_state["prev_value"]] + return x, attn, self_attn_state + return x, attn, None diff --git a/SpeechT5/fairseq/examples/simultaneous_translation/utils/__init__.py 
b/SpeechT5/fairseq/examples/simultaneous_translation/utils/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..1e9ce844f59a4211061392084cc81075e6bab19f --- /dev/null +++ b/SpeechT5/fairseq/examples/simultaneous_translation/utils/__init__.py @@ -0,0 +1,14 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +import importlib +import os + + +# automatically import any Python files in the criterions/ directory +for file in sorted(os.listdir(os.path.dirname(__file__))): + if file.endswith(".py") and not file.startswith("_"): + module = file[: file.find(".py")] + importlib.import_module("examples.simultaneous_translation.utils." + module) diff --git a/SpeechT5/fairseq/examples/simultaneous_translation/utils/data_utils.py b/SpeechT5/fairseq/examples/simultaneous_translation/utils/data_utils.py new file mode 100644 index 0000000000000000000000000000000000000000..a763ea6686c66024dca4c84a4a2ca238fad0d856 --- /dev/null +++ b/SpeechT5/fairseq/examples/simultaneous_translation/utils/data_utils.py @@ -0,0 +1,100 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +import torch + + +def calc_mean_invstddev(feature): + if len(feature.size()) != 2: + raise ValueError("We expect the input feature to be 2-D tensor") + mean = feature.mean(0) + var = feature.var(0) + # avoid division by ~zero + eps = 1e-8 + if (var < eps).any(): + return mean, 1.0 / (torch.sqrt(var) + eps) + return mean, 1.0 / torch.sqrt(var) + + +def apply_mv_norm(features): + # If there is less than 2 spectrograms, the variance cannot be computed (is NaN) + # and normalization is not possible, so return the item as it is + if features.size(0) < 2: + return features + mean, invstddev = calc_mean_invstddev(features) + res = (features - mean) * invstddev + return res + + +def lengths_to_encoder_padding_mask(lengths, batch_first: bool = False): + """ + convert lengths (a 1-D Long/Int tensor) to 2-D binary tensor + + Args: + lengths: a (B, )-shaped tensor + + Return: + max_length: maximum length of B sequences + encoder_padding_mask: a (max_length, B) binary mask, where + [t, b] = 0 for t < lengths[b] and 1 otherwise + + TODO: + kernelize this function if benchmarking shows this function is slow + """ + max_lengths = torch.max(lengths).item() + bsz = lengths.size(0) + encoder_padding_mask = torch.arange( + max_lengths + ).to( # a (T, ) tensor with [0, ..., T-1] + lengths.device + ).view( # move to the right device + 1, max_lengths + ).expand( # reshape to (1, T)-shaped tensor + bsz, -1 + ) >= lengths.view( # expand to (B, T)-shaped tensor + bsz, 1 + ).expand( + -1, max_lengths + ) + if not batch_first: + return encoder_padding_mask.t(), max_lengths + else: + return encoder_padding_mask, max_lengths + + +def encoder_padding_mask_to_lengths( + encoder_padding_mask, max_lengths, batch_size, device +): + """ + convert encoder_padding_mask (2-D binary tensor) to a 1-D tensor + + Conventionally, encoder output contains a encoder_padding_mask, which is + a 2-D mask in a shape (T, B), whose (t, b) element indicate whether + encoder_out[t, b] is a valid output (=0) or not (=1). 
Occasionally, we + need to convert this mask tensor to a 1-D tensor in shape (B, ), where + [b] denotes the valid length of b-th sequence + + Args: + encoder_padding_mask: a (T, B)-shaped binary tensor or None; if None, + indicating all are valid + Return: + seq_lengths: a (B,)-shaped tensor, where its (b, )-th element is the + number of valid elements of b-th sequence + + max_lengths: maximum length of all sequence, if encoder_padding_mask is + not None, max_lengths must equal to encoder_padding_mask.size(0) + + batch_size: batch size; if encoder_padding_mask is + not None, max_lengths must equal to encoder_padding_mask.size(1) + + device: which device to put the result on + """ + if encoder_padding_mask is None: + return torch.Tensor([max_lengths] * batch_size).to(torch.int32).to(device) + + assert encoder_padding_mask.size(0) == max_lengths, "max_lengths does not match" + assert encoder_padding_mask.size(1) == batch_size, "batch_size does not match" + + return max_lengths - torch.sum(encoder_padding_mask, dim=0) diff --git a/SpeechT5/fairseq/examples/simultaneous_translation/utils/functions.py b/SpeechT5/fairseq/examples/simultaneous_translation/utils/functions.py new file mode 100644 index 0000000000000000000000000000000000000000..f795b5f31cee6d9f8387d6402994b9cbb4c98190 --- /dev/null +++ b/SpeechT5/fairseq/examples/simultaneous_translation/utils/functions.py @@ -0,0 +1,149 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +import torch + + +def exclusive_cumprod(tensor, dim: int, eps: float = 1e-10): + """ + Implementing exclusive cumprod. + There is cumprod in pytorch, however there is no exclusive mode. + cumprod(x) = [x1, x1x2, x2x3x4, ..., prod_{i=1}^n x_i] + exclusive means cumprod(x) = [1, x1, x1x2, x1x2x3, ..., prod_{i=1}^{n-1} x_i] + """ + tensor_size = list(tensor.size()) + tensor_size[dim] = 1 + return_tensor = safe_cumprod( + torch.cat([torch.ones(tensor_size).type_as(tensor), tensor], dim=dim), + dim=dim, + eps=eps, + ) + + if dim == 0: + return return_tensor[:-1] + elif dim == 1: + return return_tensor[:, :-1] + elif dim == 2: + return return_tensor[:, :, :-1] + else: + raise RuntimeError("Cumprod on dimension 3 and more is not implemented") + + +def safe_cumprod(tensor, dim: int, eps: float = 1e-10): + """ + An implementation of cumprod to prevent precision issue. + cumprod(x) + = [x1, x1x2, x1x2x3, ....] + = [exp(log(x1)), exp(log(x1) + log(x2)), exp(log(x1) + log(x2) + log(x3)), ...] + = exp(cumsum(log(x))) + """ + + if (tensor + eps < 0).any().item(): + raise RuntimeError( + "Safe cumprod can only take non-negative tensors as input." + "Consider use torch.cumprod if you want to calculate negative values." 
+ ) + + log_tensor = torch.log(tensor + eps) + cumsum_log_tensor = torch.cumsum(log_tensor, dim) + exp_cumsum_log_tensor = torch.exp(cumsum_log_tensor) + return exp_cumsum_log_tensor + + +def lengths_to_mask(lengths, max_len: int, dim: int = 0, negative_mask: bool = False): + """ + Convert a tensor of lengths to mask + For example, lengths = [[2, 3, 4]], max_len = 5 + mask = + [[1, 1, 1], + [1, 1, 1], + [0, 1, 1], + [0, 0, 1], + [0, 0, 0]] + """ + assert len(lengths.size()) <= 2 + if len(lengths) == 2: + if dim == 1: + lengths = lengths.t() + lengths = lengths + else: + lengths = lengths.unsqueeze(1) + + # lengths : batch_size, 1 + lengths = lengths.view(-1, 1) + + batch_size = lengths.size(0) + # batch_size, max_len + mask = torch.arange(max_len).expand(batch_size, max_len).type_as(lengths) < lengths + + if negative_mask: + mask = ~mask + + if dim == 0: + # max_len, batch_size + mask = mask.t() + + return mask + + +def moving_sum(x, start_idx: int, end_idx: int): + """ + From MONOTONIC CHUNKWISE ATTENTION + https://arxiv.org/pdf/1712.05382.pdf + Equation (18) + + x = [x_1, x_2, ..., x_N] + MovingSum(x, start_idx, end_idx)_n = Sigma_{m=n−(start_idx−1)}^{n+end_idx-1} x_m + for n in {1, 2, 3, ..., N} + + x : src_len, batch_size + start_idx : start idx + end_idx : end idx + + Example + src_len = 5 + batch_size = 3 + x = + [[ 0, 5, 10], + [ 1, 6, 11], + [ 2, 7, 12], + [ 3, 8, 13], + [ 4, 9, 14]] + + MovingSum(x, 3, 1) = + [[ 0, 5, 10], + [ 1, 11, 21], + [ 3, 18, 33], + [ 6, 21, 36], + [ 9, 24, 39]] + + MovingSum(x, 1, 3) = + [[ 3, 18, 33], + [ 6, 21, 36], + [ 9, 24, 39], + [ 7, 17, 27], + [ 4, 9, 14]] + """ + assert start_idx > 0 and end_idx > 0 + assert len(x.size()) == 2 + src_len, batch_size = x.size() + # batch_size, 1, src_len + x = x.t().unsqueeze(1) + # batch_size, 1, src_len + moving_sum_weight = x.new_ones([1, 1, end_idx + start_idx - 1]) + + moving_sum = ( + torch.nn.functional.conv1d( + x, moving_sum_weight, padding=start_idx + end_idx - 1 + ) + .squeeze(1) + .t() + ) + moving_sum = moving_sum[end_idx:-start_idx] + + assert src_len == moving_sum.size(0) + assert batch_size == moving_sum.size(1) + + return moving_sum diff --git a/SpeechT5/fairseq/examples/simultaneous_translation/utils/latency.py b/SpeechT5/fairseq/examples/simultaneous_translation/utils/latency.py new file mode 100644 index 0000000000000000000000000000000000000000..5d800a5d9e992be49cedc72b7a9604a32e35fbcc --- /dev/null +++ b/SpeechT5/fairseq/examples/simultaneous_translation/utils/latency.py @@ -0,0 +1,451 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. 
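(Editorial note on the cumulative-product helpers in `functions.py` above.) `safe_cumprod` computes the product in log space, `cumprod(x) = exp(cumsum(log(x)))`, and `exclusive_cumprod` obtains the shifted variant by prepending a one and dropping the last element. The following self-contained sketch — PyTorch only, tensor values made up for illustration — reproduces both identities and checks them against `torch.cumprod`:

```
import torch

def safe_cumprod_ref(x, dim, eps=1e-10):
    # cumprod(x) = exp(cumsum(log(x + eps))); eps guards against log(0)
    return torch.exp(torch.cumsum(torch.log(x + eps), dim=dim))

def exclusive_cumprod_ref(x, dim, eps=1e-10):
    # prepend ones along `dim`, take the safe cumprod, then drop the last slice
    ones = torch.ones_like(x.narrow(dim, 0, 1))
    full = safe_cumprod_ref(torch.cat([ones, x], dim=dim), dim, eps)
    return full.narrow(dim, 0, x.size(dim))

p = torch.rand(2, 5)  # e.g. rows of (1 - p_choose)
print(torch.allclose(safe_cumprod_ref(p, dim=1), torch.cumprod(p, dim=1), atol=1e-4))
print(exclusive_cumprod_ref(p, dim=1)[:, 0])  # first column is all ones by construction
```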
+ +import torch + + +class LatencyMetric(object): + @staticmethod + def length_from_padding_mask(padding_mask, batch_first: bool = False): + dim = 1 if batch_first else 0 + return padding_mask.size(dim) - padding_mask.sum(dim=dim, keepdim=True) + + def prepare_latency_metric( + self, + delays, + src_lens, + target_padding_mask=None, + batch_first: bool = False, + start_from_zero: bool = True, + ): + assert len(delays.size()) == 2 + assert len(src_lens.size()) == 2 + + if start_from_zero: + delays = delays + 1 + + if batch_first: + # convert to batch_last + delays = delays.t() + src_lens = src_lens.t() + tgt_len, bsz = delays.size() + _, bsz_1 = src_lens.size() + + if target_padding_mask is not None: + target_padding_mask = target_padding_mask.t() + tgt_len_1, bsz_2 = target_padding_mask.size() + assert tgt_len == tgt_len_1 + assert bsz == bsz_2 + + assert bsz == bsz_1 + + if target_padding_mask is None: + tgt_lens = tgt_len * delays.new_ones([1, bsz]).float() + else: + # 1, batch_size + tgt_lens = self.length_from_padding_mask(target_padding_mask, False).float() + delays = delays.masked_fill(target_padding_mask, 0) + + return delays, src_lens, tgt_lens, target_padding_mask + + def __call__( + self, + delays, + src_lens, + target_padding_mask=None, + batch_first: bool = False, + start_from_zero: bool = True, + ): + delays, src_lens, tgt_lens, target_padding_mask = self.prepare_latency_metric( + delays, src_lens, target_padding_mask, batch_first, start_from_zero + ) + return self.cal_metric(delays, src_lens, tgt_lens, target_padding_mask) + + @staticmethod + def cal_metric(delays, src_lens, tgt_lens, target_padding_mask): + """ + Expected sizes: + delays: tgt_len, batch_size + src_lens: 1, batch_size + target_padding_mask: tgt_len, batch_size + """ + raise NotImplementedError + + +class AverageProportion(LatencyMetric): + """ + Function to calculate Average Proportion from + Can neural machine translation do simultaneous translation? + (https://arxiv.org/abs/1606.02012) + + Delays are monotonic steps, range from 1 to src_len. + Give src x tgt y, AP is calculated as: + + AP = 1 / (|x||y]) sum_i^|Y| deleys_i + """ + + @staticmethod + def cal_metric(delays, src_lens, tgt_lens, target_padding_mask): + if target_padding_mask is not None: + AP = torch.sum( + delays.masked_fill(target_padding_mask, 0), dim=0, keepdim=True + ) + else: + AP = torch.sum(delays, dim=0, keepdim=True) + + AP = AP / (src_lens * tgt_lens) + return AP + + +class AverageLagging(LatencyMetric): + """ + Function to calculate Average Lagging from + STACL: Simultaneous Translation with Implicit Anticipation + and Controllable Latency using Prefix-to-Prefix Framework + (https://arxiv.org/abs/1810.08398) + + Delays are monotonic steps, range from 1 to src_len. 
+ Give src x tgt y, AP is calculated as: + + AL = 1 / tau sum_i^tau delays_i - (i - 1) / gamma + + Where + gamma = |y| / |x| + tau = argmin_i(delays_i = |x|) + """ + + @staticmethod + def cal_metric(delays, src_lens, tgt_lens, target_padding_mask): + # tau = argmin_i(delays_i = |x|) + tgt_len, bsz = delays.size() + lagging_padding_mask = delays >= src_lens + lagging_padding_mask = torch.nn.functional.pad( + lagging_padding_mask.t(), (1, 0) + ).t()[:-1, :] + gamma = tgt_lens / src_lens + lagging = ( + delays + - torch.arange(delays.size(0)) + .unsqueeze(1) + .type_as(delays) + .expand_as(delays) + / gamma + ) + lagging.masked_fill_(lagging_padding_mask, 0) + tau = (1 - lagging_padding_mask.type_as(lagging)).sum(dim=0, keepdim=True) + AL = lagging.sum(dim=0, keepdim=True) / tau + + return AL + + +class DifferentiableAverageLagging(LatencyMetric): + """ + Function to calculate Differentiable Average Lagging from + Monotonic Infinite Lookback Attention for Simultaneous Machine Translation + (https://arxiv.org/abs/1906.05218) + + Delays are monotonic steps, range from 0 to src_len-1. + (In the original paper thery are from 1 to src_len) + Give src x tgt y, AP is calculated as: + + DAL = 1 / |Y| sum_i^|Y| delays'_i - (i - 1) / gamma + + Where + delays'_i = + 1. delays_i if i == 1 + 2. max(delays_i, delays'_{i-1} + 1 / gamma) + + """ + + @staticmethod + def cal_metric(delays, src_lens, tgt_lens, target_padding_mask): + tgt_len, bsz = delays.size() + + gamma = tgt_lens / src_lens + new_delays = torch.zeros_like(delays) + + for i in range(delays.size(0)): + if i == 0: + new_delays[i] = delays[i] + else: + new_delays[i] = torch.cat( + [ + new_delays[i - 1].unsqueeze(0) + 1 / gamma, + delays[i].unsqueeze(0), + ], + dim=0, + ).max(dim=0)[0] + + DAL = ( + new_delays + - torch.arange(delays.size(0)) + .unsqueeze(1) + .type_as(delays) + .expand_as(delays) + / gamma + ) + if target_padding_mask is not None: + DAL = DAL.masked_fill(target_padding_mask, 0) + + DAL = DAL.sum(dim=0, keepdim=True) / tgt_lens + + return DAL + + +class LatencyMetricVariance(LatencyMetric): + def prepare_latency_metric( + self, + delays, + src_lens, + target_padding_mask=None, + batch_first: bool = True, + start_from_zero: bool = True, + ): + assert batch_first + assert len(delays.size()) == 3 + assert len(src_lens.size()) == 2 + + if start_from_zero: + delays = delays + 1 + + # convert to batch_last + bsz, num_heads_x_layers, tgt_len = delays.size() + bsz_1, _ = src_lens.size() + assert bsz == bsz_1 + + if target_padding_mask is not None: + bsz_2, tgt_len_1 = target_padding_mask.size() + assert tgt_len == tgt_len_1 + assert bsz == bsz_2 + + if target_padding_mask is None: + tgt_lens = tgt_len * delays.new_ones([bsz, tgt_len]).float() + else: + # batch_size, 1 + tgt_lens = self.length_from_padding_mask(target_padding_mask, True).float() + delays = delays.masked_fill(target_padding_mask.unsqueeze(1), 0) + + return delays, src_lens, tgt_lens, target_padding_mask + + +class VarianceDelay(LatencyMetricVariance): + @staticmethod + def cal_metric(delays, src_lens, tgt_lens, target_padding_mask): + """ + delays : bsz, num_heads_x_layers, tgt_len + src_lens : bsz, 1 + target_lens : bsz, 1 + target_padding_mask: bsz, tgt_len or None + """ + if delays.size(1) == 1: + return delays.new_zeros([1]) + + variance_delays = delays.var(dim=1) + + if target_padding_mask is not None: + variance_delays.masked_fill_(target_padding_mask, 0) + + return variance_delays.sum(dim=1, keepdim=True) / tgt_lens + + +class LatencyInference(object): + def 
__init__(self, start_from_zero=True): + self.metric_calculator = { + "differentiable_average_lagging": DifferentiableAverageLagging(), + "average_lagging": AverageLagging(), + "average_proportion": AverageProportion(), + } + + self.start_from_zero = start_from_zero + + def __call__(self, monotonic_step, src_lens): + """ + monotonic_step range from 0 to src_len. src_len means eos + delays: bsz, tgt_len + src_lens: bsz, 1 + """ + if not self.start_from_zero: + monotonic_step -= 1 + + src_lens = src_lens + + delays = monotonic_step.view( + monotonic_step.size(0), -1, monotonic_step.size(-1) + ).max(dim=1)[0] + + delays = delays.masked_fill(delays >= src_lens, 0) + (src_lens - 1).expand_as( + delays + ).masked_fill(delays < src_lens, 0) + return_dict = {} + for key, func in self.metric_calculator.items(): + return_dict[key] = func( + delays.float(), + src_lens.float(), + target_padding_mask=None, + batch_first=True, + start_from_zero=True, + ).t() + + return return_dict + + +class LatencyTraining(object): + def __init__( + self, + avg_weight, + var_weight, + avg_type, + var_type, + stay_on_last_token, + average_method, + ): + self.avg_weight = avg_weight + self.var_weight = var_weight + self.avg_type = avg_type + self.var_type = var_type + self.stay_on_last_token = stay_on_last_token + self.average_method = average_method + + self.metric_calculator = { + "differentiable_average_lagging": DifferentiableAverageLagging(), + "average_lagging": AverageLagging(), + "average_proportion": AverageProportion(), + } + + self.variance_calculator = { + "variance_delay": VarianceDelay(), + } + + def expected_delays_from_attention( + self, attention, source_padding_mask=None, target_padding_mask=None + ): + if type(attention) == list: + # bsz, num_heads, tgt_len, src_len + bsz, num_heads, tgt_len, src_len = attention[0].size() + attention = torch.cat(attention, dim=1) + bsz, num_heads_x_layers, tgt_len, src_len = attention.size() + # bsz * num_heads * num_layers, tgt_len, src_len + attention = attention.view(-1, tgt_len, src_len) + else: + # bsz * num_heads * num_layers, tgt_len, src_len + bsz, tgt_len, src_len = attention.size() + num_heads_x_layers = 1 + attention = attention.view(-1, tgt_len, src_len) + + if not self.stay_on_last_token: + residual_attention = 1 - attention[:, :, :-1].sum(dim=2, keepdim=True) + attention = torch.cat([attention[:, :, :-1], residual_attention], dim=2) + + # bsz * num_heads_x_num_layers, tgt_len, src_len for MMA + steps = ( + torch.arange(1, 1 + src_len) + .unsqueeze(0) + .unsqueeze(1) + .expand_as(attention) + .type_as(attention) + ) + + if source_padding_mask is not None: + src_offset = ( + source_padding_mask.type_as(attention) + .sum(dim=1, keepdim=True) + .expand(bsz, num_heads_x_layers) + .contiguous() + .view(-1, 1) + ) + src_lens = src_len - src_offset + if source_padding_mask[:, 0].any(): + # Pad left + src_offset = src_offset.view(-1, 1, 1) + steps = steps - src_offset + steps = steps.masked_fill(steps <= 0, 0) + else: + src_lens = attention.new_ones([bsz, num_heads_x_layers]) * src_len + src_lens = src_lens.view(-1, 1) + + # bsz * num_heads_num_layers, tgt_len, src_len + expected_delays = ( + (steps * attention).sum(dim=2).view(bsz, num_heads_x_layers, tgt_len) + ) + + if target_padding_mask is not None: + expected_delays.masked_fill_(target_padding_mask.unsqueeze(1), 0) + + return expected_delays, src_lens + + def avg_loss(self, expected_delays, src_lens, target_padding_mask): + + bsz, num_heads_x_layers, tgt_len = expected_delays.size() + target_padding_mask = ( 
+ target_padding_mask.unsqueeze(1) + .expand_as(expected_delays) + .contiguous() + .view(-1, tgt_len) + ) + + if self.average_method == "average": + # bsz * tgt_len + expected_delays = expected_delays.mean(dim=1) + elif self.average_method == "weighted_average": + weights = torch.nn.functional.softmax(expected_delays, dim=1) + expected_delays = torch.sum(expected_delays * weights, dim=1) + elif self.average_method == "max": + # bsz * num_heads_x_num_layers, tgt_len + expected_delays = expected_delays.max(dim=1)[0] + else: + raise RuntimeError(f"{self.average_method} is not supported") + + src_lens = src_lens.view(bsz, -1)[:, :1] + target_padding_mask = target_padding_mask.view(bsz, -1, tgt_len)[:, 0] + + if self.avg_weight > 0.0: + if self.avg_type in self.metric_calculator: + average_delays = self.metric_calculator[self.avg_type]( + expected_delays, + src_lens, + target_padding_mask, + batch_first=True, + start_from_zero=False, + ) + else: + raise RuntimeError(f"{self.avg_type} is not supported.") + + # bsz * num_heads_x_num_layers, 1 + return self.avg_weight * average_delays.sum() + else: + return 0.0 + + def var_loss(self, expected_delays, src_lens, target_padding_mask): + src_lens = src_lens.view(expected_delays.size(0), expected_delays.size(1))[ + :, :1 + ] + if self.var_weight > 0.0: + if self.var_type in self.variance_calculator: + variance_delays = self.variance_calculator[self.var_type]( + expected_delays, + src_lens, + target_padding_mask, + batch_first=True, + start_from_zero=False, + ) + else: + raise RuntimeError(f"{self.var_type} is not supported.") + + return self.var_weight * variance_delays.sum() + else: + return 0.0 + + def loss(self, attention, source_padding_mask=None, target_padding_mask=None): + expected_delays, src_lens = self.expected_delays_from_attention( + attention, source_padding_mask, target_padding_mask + ) + + latency_loss = 0 + + latency_loss += self.avg_loss(expected_delays, src_lens, target_padding_mask) + + latency_loss += self.var_loss(expected_delays, src_lens, target_padding_mask) + + return latency_loss diff --git a/SpeechT5/fairseq/examples/simultaneous_translation/utils/p_choose_strategy.py b/SpeechT5/fairseq/examples/simultaneous_translation/utils/p_choose_strategy.py new file mode 100644 index 0000000000000000000000000000000000000000..308227ed96d8ee94b66bc0df343c96abbe2c55cc --- /dev/null +++ b/SpeechT5/fairseq/examples/simultaneous_translation/utils/p_choose_strategy.py @@ -0,0 +1,124 @@ +from typing import Optional, Dict +from torch import Tensor +import torch + + +def waitk( + query, key, waitk_lagging: int, num_heads: int, key_padding_mask: Optional[Tensor] = None, + incremental_state: Optional[Dict[str, Dict[str, Optional[Tensor]]]] = None +): + if incremental_state is not None: + # Retrieve target length from incremental states + # For inference the length of query is always 1 + tgt_len = incremental_state["steps"]["tgt"] + assert tgt_len is not None + tgt_len = int(tgt_len) + else: + tgt_len, bsz, _ = query.size() + + max_src_len, bsz, _ = key.size() + + if max_src_len < waitk_lagging: + if incremental_state is not None: + tgt_len = 1 + return query.new_zeros( + bsz * num_heads, tgt_len, max_src_len + ) + + # Assuming the p_choose looks like this for wait k=3 + # src_len = 6, tgt_len = 5 + # [0, 0, 1, 0, 0, 0, 0] + # [0, 0, 0, 1, 0, 0, 0] + # [0, 0, 0, 0, 1, 0, 0] + # [0, 0, 0, 0, 0, 1, 0] + # [0, 0, 0, 0, 0, 0, 1] + # linearize the p_choose matrix: + # [0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0...] 
+ # The indices of linearized matrix that equals 1 is + # 2 + 6 * 0 + # 3 + 6 * 1 + # ... + # n + src_len * n + k - 1 = n * (src_len + 1) + k - 1 + # n from 0 to tgt_len - 1 + # + # First, generate the indices (activate_indices_offset: bsz, tgt_len) + # Second, scatter a zeros tensor (bsz, tgt_len * src_len) + # with activate_indices_offset + # Third, resize the tensor to (bsz, tgt_len, src_len) + + activate_indices_offset = ( + ( + torch.arange(tgt_len) * (max_src_len + 1) + + waitk_lagging - 1 + ) + .unsqueeze(0) + .expand(bsz, tgt_len) + .to(query) + .long() + ) + + if key_padding_mask is not None: + if key_padding_mask[:, 0].any(): + # Left padding + activate_indices_offset += ( + key_padding_mask.sum(dim=1, keepdim=True) + ) + + # Need to clamp the indices that are too large + activate_indices_offset = ( + activate_indices_offset + .clamp( + 0, + min( + [ + tgt_len, + max_src_len - waitk_lagging + 1 + ] + ) * max_src_len - 1 + ) + ) + + p_choose = torch.zeros(bsz, tgt_len * max_src_len).to(query) + + p_choose = p_choose.scatter( + 1, + activate_indices_offset, + 1.0 + ).view(bsz, tgt_len, max_src_len) + + if incremental_state is not None: + p_choose = p_choose[:, -1:] + tgt_len = 1 + + # Extend to each head + p_choose = ( + p_choose.contiguous() + .unsqueeze(1) + .expand(-1, num_heads, -1, -1) + .contiguous() + .view(-1, tgt_len, max_src_len) + ) + + return p_choose + + +def hard_aligned(q_proj: Optional[Tensor], k_proj: Optional[Tensor], attn_energy, noise_mean: float = 0.0, noise_var: float = 0.0, training: bool = True): + """ + Calculating step wise prob for reading and writing + 1 to read, 0 to write + """ + + noise = 0 + if training: + # add noise here to encourage discretness + noise = ( + torch.normal(noise_mean, noise_var, attn_energy.size()) + .type_as(attn_energy) + .to(attn_energy.device) + ) + + p_choose = torch.sigmoid(attn_energy + noise) + _, _, tgt_len, src_len = p_choose.size() + + # p_choose: bsz * self.num_heads, tgt_len, src_len + return p_choose.view(-1, tgt_len, src_len) diff --git a/SpeechT5/fairseq/examples/speech_recognition/README.md b/SpeechT5/fairseq/examples/speech_recognition/README.md new file mode 100644 index 0000000000000000000000000000000000000000..17030bf0fd50bb843a508e13e97ed436eae33287 --- /dev/null +++ b/SpeechT5/fairseq/examples/speech_recognition/README.md @@ -0,0 +1,83 @@ +### 2021 Update: We are merging this example into the [S2T framework](../speech_to_text), which supports more generic speech-to-text tasks (e.g. speech translation) and more flexible data processing pipelines. Please stay tuned. + +# Speech Recognition +`examples/speech_recognition` is implementing ASR task in Fairseq, along with needed features, datasets, models and loss functions to train and infer model described in [Transformers with convolutional context for ASR (Abdelrahman Mohamed et al., 2019)](https://arxiv.org/abs/1904.11660). + + +## Additional dependencies +On top of main fairseq dependencies there are couple more additional requirements. + +1) Please follow the instructions to install [torchaudio](https://github.com/pytorch/audio). This is required to compute audio fbank features. +2) [Sclite](http://www1.icsi.berkeley.edu/Speech/docs/sctk-1.2/sclite.htm#sclite_name_0) is used to measure WER. Sclite can be downloaded and installed from source from sctk package [here](http://www.openslr.org/4/). Training and inference doesn't require Sclite dependency. 
+3) [sentencepiece](https://github.com/google/sentencepiece) is required in order to create dataset with word-piece targets. + +## Preparing librispeech data +``` +./examples/speech_recognition/datasets/prepare-librispeech.sh $DIR_TO_SAVE_RAW_DATA $DIR_FOR_PREPROCESSED_DATA +``` + +## Training librispeech data +``` +python train.py $DIR_FOR_PREPROCESSED_DATA --save-dir $MODEL_PATH --max-epoch 80 --task speech_recognition --arch vggtransformer_2 --optimizer adadelta --lr 1.0 --adadelta-eps 1e-8 --adadelta-rho 0.95 --clip-norm 10.0 --max-tokens 5000 --log-format json --log-interval 1 --criterion cross_entropy_acc --user-dir examples/speech_recognition/ +``` + +## Inference for librispeech +`$SET` can be `test_clean` or `test_other` +Any checkpoint in `$MODEL_PATH` can be selected. In this example we are working with `checkpoint_last.pt` +``` +python examples/speech_recognition/infer.py $DIR_FOR_PREPROCESSED_DATA --task speech_recognition --max-tokens 25000 --nbest 1 --path $MODEL_PATH/checkpoint_last.pt --beam 20 --results-path $RES_DIR --batch-size 40 --gen-subset $SET --user-dir examples/speech_recognition/ +``` + +## Inference for librispeech +``` +sclite -r ${RES_DIR}/ref.word-checkpoint_last.pt-${SET}.txt -h ${RES_DIR}/hypo.word-checkpoint_last.pt-${SET}.txt -i rm -o all stdout > $RES_REPORT +``` +`Sum/Avg` row from first table of the report has WER + +## Using flashlight (previously called [wav2letter](https://github.com/facebookresearch/wav2letter)) components +[flashlight](https://github.com/facebookresearch/flashlight) now has integration with fairseq. Currently this includes: + +* AutoSegmentationCriterion (ASG) +* flashlight-style Conv/GLU model +* flashlight's beam search decoder + +To use these, follow the instructions on [this page](https://github.com/facebookresearch/flashlight/tree/master/bindings/python) to install python bindings. + +## Training librispeech data (flashlight style, Conv/GLU + ASG loss) +Training command: +``` +python train.py $DIR_FOR_PREPROCESSED_DATA --save-dir $MODEL_PATH --max-epoch 100 --task speech_recognition --arch w2l_conv_glu_enc --batch-size 4 --optimizer sgd --lr 0.3,0.8 --momentum 0.8 --clip-norm 0.2 --max-tokens 50000 --log-format json --log-interval 100 --num-workers 0 --sentence-avg --criterion asg_loss --asg-transitions-init 5 --max-replabel 2 --linseg-updates 8789 --user-dir examples/speech_recognition +``` + +Note that ASG loss currently doesn't do well with word-pieces. You should prepare a dataset with character targets by setting `nbpe=31` in `prepare-librispeech.sh`. + +## Inference for librispeech (flashlight decoder, n-gram LM) +Inference command: +``` +python examples/speech_recognition/infer.py $DIR_FOR_PREPROCESSED_DATA --task speech_recognition --seed 1 --nbest 1 --path $MODEL_PATH/checkpoint_last.pt --gen-subset $SET --results-path $RES_DIR --w2l-decoder kenlm --kenlm-model $KENLM_MODEL_PATH --lexicon $LEXICON_PATH --beam 200 --beam-threshold 15 --lm-weight 1.5 --word-score 1.5 --sil-weight -0.3 --criterion asg_loss --max-replabel 2 --user-dir examples/speech_recognition +``` + +`$KENLM_MODEL_PATH` should be a standard n-gram language model file. `$LEXICON_PATH` should be a flashlight-style lexicon (list of known words and their spellings). 
For ASG inference, a lexicon line should look like this (note the repetition labels): +``` +doorbell D O 1 R B E L 1 ▁ +``` +For CTC inference with word-pieces, repetition labels are not used and the lexicon should have most common spellings for each word (one can use sentencepiece's `NBestEncodeAsPieces` for this): +``` +doorbell ▁DOOR BE LL +doorbell ▁DOOR B E LL +doorbell ▁DO OR BE LL +doorbell ▁DOOR B EL L +doorbell ▁DOOR BE L L +doorbell ▁DO OR B E LL +doorbell ▁DOOR B E L L +doorbell ▁DO OR B EL L +doorbell ▁DO O R BE LL +doorbell ▁DO OR BE L L +``` +Lowercase vs. uppercase matters: the *word* should match the case of the n-gram language model (i.e. `$KENLM_MODEL_PATH`), while the *spelling* should match the case of the token dictionary (i.e. `$DIR_FOR_PREPROCESSED_DATA/dict.txt`). + +## Inference for librispeech (flashlight decoder, viterbi only) +Inference command: +``` +python examples/speech_recognition/infer.py $DIR_FOR_PREPROCESSED_DATA --task speech_recognition --seed 1 --nbest 1 --path $MODEL_PATH/checkpoint_last.pt --gen-subset $SET --results-path $RES_DIR --w2l-decoder viterbi --criterion asg_loss --max-replabel 2 --user-dir examples/speech_recognition +``` diff --git a/SpeechT5/fairseq/examples/speech_recognition/__init__.py b/SpeechT5/fairseq/examples/speech_recognition/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..0278f6a27340c7ff7e207d09348483d1b0d3a100 --- /dev/null +++ b/SpeechT5/fairseq/examples/speech_recognition/__init__.py @@ -0,0 +1 @@ +from . import criterions, models, tasks # noqa diff --git a/SpeechT5/fairseq/examples/speech_recognition/criterions/ASG_loss.py b/SpeechT5/fairseq/examples/speech_recognition/criterions/ASG_loss.py new file mode 100644 index 0000000000000000000000000000000000000000..41f50bbd70388ce723f2d316d4e9776bcd6be3c9 --- /dev/null +++ b/SpeechT5/fairseq/examples/speech_recognition/criterions/ASG_loss.py @@ -0,0 +1,170 @@ +#!/usr/bin/env python3 + +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. 
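(Editorial note on the lexicon preparation described in the README above.) The word-piece spellings for CTC decoding can be generated with sentencepiece's `NBestEncodeAsPieces`, as the README suggests. A minimal sketch, assuming a trained sentencepiece model at the hypothetical path `spm.model` and a made-up word list; adjust the output separator to whatever your flashlight build expects:

```
import sentencepiece as spm

sp = spm.SentencePieceProcessor()
sp.Load("spm.model")  # hypothetical path to the trained word-piece model

words = ["doorbell", "doormat"]  # made-up word list
with open("lexicon.txt", "w", encoding="utf-8") as f:
    for word in words:
        # n-best word-piece segmentations of each word, most likely first
        for pieces in sp.NBestEncodeAsPieces(word, 10):
            f.write(word + " " + " ".join(pieces) + "\n")
```

Keep the case convention noted above in mind: the word must match the n-gram LM, while the pieces must match the token dictionary (`dict.txt`).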
+ +import torch +from examples.speech_recognition.data.replabels import pack_replabels +from fairseq import utils +from fairseq.criterions import FairseqCriterion, register_criterion + + +@register_criterion("asg_loss") +class ASGCriterion(FairseqCriterion): + @staticmethod + def add_args(parser): + group = parser.add_argument_group("ASG Loss") + group.add_argument( + "--asg-transitions-init", + help="initial diagonal value of transition matrix", + type=float, + default=0.0, + ) + group.add_argument( + "--max-replabel", help="maximum # of replabels", type=int, default=2 + ) + group.add_argument( + "--linseg-updates", + help="# of training updates to use LinSeg initialization", + type=int, + default=0, + ) + group.add_argument( + "--hide-linseg-messages", + help="hide messages about LinSeg initialization", + action="store_true", + ) + + def __init__( + self, + task, + silence_token, + asg_transitions_init, + max_replabel, + linseg_updates, + hide_linseg_messages, + ): + from flashlight.lib.sequence.criterion import ASGLoss, CriterionScaleMode + + super().__init__(task) + self.tgt_dict = task.target_dictionary + self.eos = self.tgt_dict.eos() + self.silence = ( + self.tgt_dict.index(silence_token) + if silence_token in self.tgt_dict + else None + ) + self.max_replabel = max_replabel + + num_labels = len(self.tgt_dict) + self.asg = ASGLoss(num_labels, scale_mode=CriterionScaleMode.TARGET_SZ_SQRT) + self.asg.trans = torch.nn.Parameter( + asg_transitions_init * torch.eye(num_labels), requires_grad=True + ) + + self.linseg_progress = torch.nn.Parameter( + torch.tensor([0], dtype=torch.int), requires_grad=False + ) + self.linseg_maximum = linseg_updates + self.linseg_message_state = "none" if hide_linseg_messages else "start" + + @classmethod + def build_criterion(cls, args, task): + return cls( + task, + args.silence_token, + args.asg_transitions_init, + args.max_replabel, + args.linseg_updates, + args.hide_linseg_messages, + ) + + def linseg_step(self): + if not self.training: + return False + if self.linseg_progress.item() < self.linseg_maximum: + if self.linseg_message_state == "start": + print("| using LinSeg to initialize ASG") + self.linseg_message_state = "finish" + self.linseg_progress.add_(1) + return True + elif self.linseg_message_state == "finish": + print("| finished LinSeg initialization") + self.linseg_message_state = "none" + return False + + def replace_eos_with_silence(self, tgt): + if tgt[-1] != self.eos: + return tgt + elif self.silence is None or (len(tgt) > 1 and tgt[-2] == self.silence): + return tgt[:-1] + else: + return tgt[:-1] + [self.silence] + + def forward(self, model, sample, reduce=True): + """Compute the loss for the given sample. 
+ + Returns a tuple with three elements: + 1) the loss + 2) the sample size, which is used as the denominator for the gradient + 3) logging outputs to display while training + """ + + net_output = model(**sample["net_input"]) + emissions = net_output["encoder_out"].transpose(0, 1).contiguous() + B = emissions.size(0) + T = emissions.size(1) + device = emissions.device + + target = torch.IntTensor(B, T) + target_size = torch.IntTensor(B) + using_linseg = self.linseg_step() + + for b in range(B): + initial_target_size = sample["target_lengths"][b].item() + if initial_target_size == 0: + raise ValueError("target size cannot be zero") + + tgt = sample["target"][b, :initial_target_size].tolist() + tgt = self.replace_eos_with_silence(tgt) + tgt = pack_replabels(tgt, self.tgt_dict, self.max_replabel) + tgt = tgt[:T] + + if using_linseg: + tgt = [tgt[t * len(tgt) // T] for t in range(T)] + + target[b][: len(tgt)] = torch.IntTensor(tgt) + target_size[b] = len(tgt) + + loss = self.asg.forward(emissions, target.to(device), target_size.to(device)) + + if reduce: + loss = torch.sum(loss) + + sample_size = ( + sample["target"].size(0) if self.args.sentence_avg else sample["ntokens"] + ) + logging_output = { + "loss": utils.item(loss.data) if reduce else loss.data, + "ntokens": sample["ntokens"], + "nsentences": sample["target"].size(0), + "sample_size": sample_size, + } + return loss, sample_size, logging_output + + @staticmethod + def aggregate_logging_outputs(logging_outputs): + """Aggregate logging outputs from data parallel training.""" + loss_sum = sum(log.get("loss", 0) for log in logging_outputs) + ntokens = sum(log.get("ntokens", 0) for log in logging_outputs) + nsentences = sum(log.get("nsentences", 0) for log in logging_outputs) + sample_size = sum(log.get("sample_size", 0) for log in logging_outputs) + agg_output = { + "loss": loss_sum / nsentences, + "ntokens": ntokens, + "nsentences": nsentences, + "sample_size": sample_size, + } + return agg_output diff --git a/SpeechT5/fairseq/examples/speech_recognition/criterions/__init__.py b/SpeechT5/fairseq/examples/speech_recognition/criterions/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..579abd2ace1b14b80f5e53e5c96583e4d5b14c52 --- /dev/null +++ b/SpeechT5/fairseq/examples/speech_recognition/criterions/__init__.py @@ -0,0 +1,17 @@ +import importlib +import os + + +# ASG loss requires flashlight bindings +files_to_skip = set() +try: + import flashlight.lib.sequence.criterion +except ImportError: + files_to_skip.add("ASG_loss.py") + +for file in sorted(os.listdir(os.path.dirname(__file__))): + if file.endswith(".py") and not file.startswith("_") and file not in files_to_skip: + criterion_name = file[: file.find(".py")] + importlib.import_module( + "examples.speech_recognition.criterions." + criterion_name + ) diff --git a/SpeechT5/fairseq/examples/speech_recognition/criterions/cross_entropy_acc.py b/SpeechT5/fairseq/examples/speech_recognition/criterions/cross_entropy_acc.py new file mode 100644 index 0000000000000000000000000000000000000000..7c4d8ba3802a2da9467c42b0aa18653c7bbb2ec9 --- /dev/null +++ b/SpeechT5/fairseq/examples/speech_recognition/criterions/cross_entropy_acc.py @@ -0,0 +1,130 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. 
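# Illustration only: compared with fairseq's stock cross-entropy criterion,
# this one additionally logs token-level accuracy.  The computation in
# get_logging_output() below boils down to a masked argmax comparison that
# excludes padding positions, roughly:
#
#     mask = target != padding_idx                          # (N*T,) bool
#     correct = (lprobs.argmax(-1)[mask] == target[mask]).sum()
#     total = mask.sum()
#     acc = 100.0 * correct / total                         # reported as "acc"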
+ +from __future__ import absolute_import, division, print_function, unicode_literals + +import logging +import math + +import torch +import torch.nn.functional as F +from fairseq import utils +from fairseq.criterions import FairseqCriterion, register_criterion + + +@register_criterion("cross_entropy_acc") +class CrossEntropyWithAccCriterion(FairseqCriterion): + def __init__(self, task, sentence_avg): + super().__init__(task) + self.sentence_avg = sentence_avg + + def compute_loss(self, model, net_output, target, reduction, log_probs): + # N, T -> N * T + target = target.view(-1) + lprobs = model.get_normalized_probs(net_output, log_probs=log_probs) + if not hasattr(lprobs, "batch_first"): + logging.warning( + "ERROR: we need to know whether " + "batch first for the net output; " + "you need to set batch_first attribute for the return value of " + "model.get_normalized_probs. Now, we assume this is true, but " + "in the future, we will raise exception instead. " + ) + batch_first = getattr(lprobs, "batch_first", True) + if not batch_first: + lprobs = lprobs.transpose(0, 1) + + # N, T, D -> N * T, D + lprobs = lprobs.view(-1, lprobs.size(-1)) + loss = F.nll_loss( + lprobs, target, ignore_index=self.padding_idx, reduction=reduction + ) + return lprobs, loss + + def get_logging_output(self, sample, target, lprobs, loss): + target = target.view(-1) + mask = target != self.padding_idx + correct = torch.sum( + lprobs.argmax(1).masked_select(mask) == target.masked_select(mask) + ) + total = torch.sum(mask) + sample_size = ( + sample["target"].size(0) if self.sentence_avg else sample["ntokens"] + ) + + logging_output = { + "loss": utils.item(loss.data), # * sample['ntokens'], + "ntokens": sample["ntokens"], + "nsentences": sample["target"].size(0), + "sample_size": sample_size, + "correct": utils.item(correct.data), + "total": utils.item(total.data), + "nframes": torch.sum(sample["net_input"]["src_lengths"]).item(), + } + + return sample_size, logging_output + + def forward(self, model, sample, reduction="sum", log_probs=True): + """Computes the cross entropy with accuracy metric for the given sample. + + This is similar to CrossEntropyCriterion in fairseq, but also + computes accuracy metrics as part of logging + + Args: + logprobs (Torch.tensor) of shape N, T, D i.e. + batchsize, timesteps, dimensions + targets (Torch.tensor) of shape N, T i.e batchsize, timesteps + + Returns: + tuple: With three elements: + 1) the loss + 2) the sample size, which is used as the denominator for the gradient + 3) logging outputs to display while training + + TODO: + * Currently this Criterion will only work with LSTMEncoderModels or + FairseqModels which have decoder, or Models which return TorchTensor + as net_output. + We need to make a change to support all FairseqEncoder models. 
+ """ + net_output = model(**sample["net_input"]) + target = model.get_targets(sample, net_output) + lprobs, loss = self.compute_loss( + model, net_output, target, reduction, log_probs + ) + sample_size, logging_output = self.get_logging_output( + sample, target, lprobs, loss + ) + return loss, sample_size, logging_output + + @staticmethod + def aggregate_logging_outputs(logging_outputs): + """Aggregate logging outputs from data parallel training.""" + correct_sum = sum(log.get("correct", 0) for log in logging_outputs) + total_sum = sum(log.get("total", 0) for log in logging_outputs) + loss_sum = sum(log.get("loss", 0) for log in logging_outputs) + ntokens = sum(log.get("ntokens", 0) for log in logging_outputs) + nsentences = sum(log.get("nsentences", 0) for log in logging_outputs) + sample_size = sum(log.get("sample_size", 0) for log in logging_outputs) + nframes = sum(log.get("nframes", 0) for log in logging_outputs) + agg_output = { + "loss": loss_sum / sample_size / math.log(2) if sample_size > 0 else 0.0, + # if args.sentence_avg, then sample_size is nsentences, then loss + # is per-sentence loss; else sample_size is ntokens, the loss + # becomes per-output token loss + "ntokens": ntokens, + "nsentences": nsentences, + "nframes": nframes, + "sample_size": sample_size, + "acc": correct_sum * 100.0 / total_sum if total_sum > 0 else 0.0, + "correct": correct_sum, + "total": total_sum, + # total is the number of validate tokens + } + if sample_size != ntokens: + agg_output["nll_loss"] = loss_sum / ntokens / math.log(2) + # loss: per output token loss + # nll_loss: per sentence loss + return agg_output diff --git a/SpeechT5/fairseq/examples/speech_recognition/data/__init__.py b/SpeechT5/fairseq/examples/speech_recognition/data/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..47bb6e24ddf25aa4fd5bf0fe9672f89099efb9b4 --- /dev/null +++ b/SpeechT5/fairseq/examples/speech_recognition/data/__init__.py @@ -0,0 +1,11 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +from .asr_dataset import AsrDataset + + +__all__ = [ + "AsrDataset", +] diff --git a/SpeechT5/fairseq/examples/speech_recognition/data/asr_dataset.py b/SpeechT5/fairseq/examples/speech_recognition/data/asr_dataset.py new file mode 100644 index 0000000000000000000000000000000000000000..63a6fcac85d73b1fce8e4d044b4209b1b67fa8ce --- /dev/null +++ b/SpeechT5/fairseq/examples/speech_recognition/data/asr_dataset.py @@ -0,0 +1,122 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +import os + +import numpy as np +from fairseq.data import FairseqDataset + +from . import data_utils +from .collaters import Seq2SeqCollater + + +class AsrDataset(FairseqDataset): + """ + A dataset representing speech and corresponding transcription. + + Args: + aud_paths: (List[str]): A list of str with paths to audio files. + aud_durations_ms (List[int]): A list of int containing the durations of + audio files. + tgt (List[torch.LongTensor]): A list of LongTensors containing the indices + of target transcriptions. + tgt_dict (~fairseq.data.Dictionary): target vocabulary. + ids (List[str]): A list of utterance IDs. + speakers (List[str]): A list of speakers corresponding to utterances. 
+ num_mel_bins (int): Number of triangular mel-frequency bins (default: 80) + frame_length (float): Frame length in milliseconds (default: 25.0) + frame_shift (float): Frame shift in milliseconds (default: 10.0) + """ + + def __init__( + self, + aud_paths, + aud_durations_ms, + tgt, + tgt_dict, + ids, + speakers, + num_mel_bins=80, + frame_length=25.0, + frame_shift=10.0, + ): + assert frame_length > 0 + assert frame_shift > 0 + assert all(x > frame_length for x in aud_durations_ms) + self.frame_sizes = [ + int(1 + (d - frame_length) / frame_shift) for d in aud_durations_ms + ] + + assert len(aud_paths) > 0 + assert len(aud_paths) == len(aud_durations_ms) + assert len(aud_paths) == len(tgt) + assert len(aud_paths) == len(ids) + assert len(aud_paths) == len(speakers) + self.aud_paths = aud_paths + self.tgt_dict = tgt_dict + self.tgt = tgt + self.ids = ids + self.speakers = speakers + self.num_mel_bins = num_mel_bins + self.frame_length = frame_length + self.frame_shift = frame_shift + + self.s2s_collater = Seq2SeqCollater( + 0, + 1, + pad_index=self.tgt_dict.pad(), + eos_index=self.tgt_dict.eos(), + move_eos_to_beginning=True, + ) + + def __getitem__(self, index): + import torchaudio + import torchaudio.compliance.kaldi as kaldi + + tgt_item = self.tgt[index] if self.tgt is not None else None + + path = self.aud_paths[index] + if not os.path.exists(path): + raise FileNotFoundError("Audio file not found: {}".format(path)) + sound, sample_rate = torchaudio.load_wav(path) + output = kaldi.fbank( + sound, + num_mel_bins=self.num_mel_bins, + frame_length=self.frame_length, + frame_shift=self.frame_shift, + ) + output_cmvn = data_utils.apply_mv_norm(output) + + return {"id": index, "data": [output_cmvn.detach(), tgt_item]} + + def __len__(self): + return len(self.aud_paths) + + def collater(self, samples): + """Merge a list of samples to form a mini-batch. + + Args: + samples (List[int]): sample indices to collate + + Returns: + dict: a mini-batch suitable for forwarding with a Model + """ + return self.s2s_collater.collate(samples) + + def num_tokens(self, index): + return self.frame_sizes[index] + + def size(self, index): + """Return an example's size as a float or tuple. This value is used when + filtering a dataset with ``--max-positions``.""" + return ( + self.frame_sizes[index], + len(self.tgt[index]) if self.tgt is not None else 0, + ) + + def ordered_indices(self): + """Return an ordered list of indices. Batches will be constructed based + on this order.""" + return np.arange(len(self)) diff --git a/SpeechT5/fairseq/examples/speech_recognition/data/collaters.py b/SpeechT5/fairseq/examples/speech_recognition/data/collaters.py new file mode 100644 index 0000000000000000000000000000000000000000..6acfec876b87e5a00bc92083b1181301a2a18e3f --- /dev/null +++ b/SpeechT5/fairseq/examples/speech_recognition/data/collaters.py @@ -0,0 +1,131 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. +""" + This module contains collection of classes which implement + collate functionalities for various tasks. 
+ + Collaters should know what data to expect for each sample + and they should pack / collate them into batches +""" + + +from __future__ import absolute_import, division, print_function, unicode_literals + +import numpy as np +import torch +from fairseq.data import data_utils as fairseq_data_utils + + +class Seq2SeqCollater(object): + """ + Implements collate function mainly for seq2seq tasks + This expects each sample to contain feature (src_tokens) and + targets. + This collator is also used for aligned training task. + """ + + def __init__( + self, + feature_index=0, + label_index=1, + pad_index=1, + eos_index=2, + move_eos_to_beginning=True, + ): + self.feature_index = feature_index + self.label_index = label_index + self.pad_index = pad_index + self.eos_index = eos_index + self.move_eos_to_beginning = move_eos_to_beginning + + def _collate_frames(self, frames): + """Convert a list of 2d frames into a padded 3d tensor + Args: + frames (list): list of 2d frames of size L[i]*f_dim. Where L[i] is + length of i-th frame and f_dim is static dimension of features + Returns: + 3d tensor of size len(frames)*len_max*f_dim where len_max is max of L[i] + """ + len_max = max(frame.size(0) for frame in frames) + f_dim = frames[0].size(1) + res = frames[0].new(len(frames), len_max, f_dim).fill_(0.0) + + for i, v in enumerate(frames): + res[i, : v.size(0)] = v + + return res + + def collate(self, samples): + """ + utility function to collate samples into batch for speech recognition. + """ + if len(samples) == 0: + return {} + + # parse samples into torch tensors + parsed_samples = [] + for s in samples: + # skip invalid samples + if s["data"][self.feature_index] is None: + continue + source = s["data"][self.feature_index] + if isinstance(source, (np.ndarray, np.generic)): + source = torch.from_numpy(source) + target = s["data"][self.label_index] + if isinstance(target, (np.ndarray, np.generic)): + target = torch.from_numpy(target).long() + elif isinstance(target, list): + target = torch.LongTensor(target) + + parsed_sample = {"id": s["id"], "source": source, "target": target} + parsed_samples.append(parsed_sample) + samples = parsed_samples + + id = torch.LongTensor([s["id"] for s in samples]) + frames = self._collate_frames([s["source"] for s in samples]) + # sort samples by descending number of frames + frames_lengths = torch.LongTensor([s["source"].size(0) for s in samples]) + frames_lengths, sort_order = frames_lengths.sort(descending=True) + id = id.index_select(0, sort_order) + frames = frames.index_select(0, sort_order) + + target = None + target_lengths = None + prev_output_tokens = None + if samples[0].get("target", None) is not None: + ntokens = sum(len(s["target"]) for s in samples) + target = fairseq_data_utils.collate_tokens( + [s["target"] for s in samples], + self.pad_index, + self.eos_index, + left_pad=False, + move_eos_to_beginning=False, + ) + target = target.index_select(0, sort_order) + target_lengths = torch.LongTensor( + [s["target"].size(0) for s in samples] + ).index_select(0, sort_order) + prev_output_tokens = fairseq_data_utils.collate_tokens( + [s["target"] for s in samples], + self.pad_index, + self.eos_index, + left_pad=False, + move_eos_to_beginning=self.move_eos_to_beginning, + ) + prev_output_tokens = prev_output_tokens.index_select(0, sort_order) + else: + ntokens = sum(len(s["source"]) for s in samples) + + batch = { + "id": id, + "ntokens": ntokens, + "net_input": {"src_tokens": frames, "src_lengths": frames_lengths}, + "target": target, + "target_lengths": 
target_lengths, + "nsentences": len(samples), + } + if prev_output_tokens is not None: + batch["net_input"]["prev_output_tokens"] = prev_output_tokens + return batch diff --git a/SpeechT5/fairseq/examples/speech_recognition/data/data_utils.py b/SpeechT5/fairseq/examples/speech_recognition/data/data_utils.py new file mode 100644 index 0000000000000000000000000000000000000000..cc4729e63c8ef551b29617d1169a44c24f509ad0 --- /dev/null +++ b/SpeechT5/fairseq/examples/speech_recognition/data/data_utils.py @@ -0,0 +1,100 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +import torch + + +def calc_mean_invstddev(feature): + if len(feature.size()) != 2: + raise ValueError("We expect the input feature to be 2-D tensor") + mean = feature.mean(0) + var = feature.var(0) + # avoid division by ~zero + eps = 1e-8 + if (var < eps).any(): + return mean, 1.0 / (torch.sqrt(var) + eps) + return mean, 1.0 / torch.sqrt(var) + + +def apply_mv_norm(features): + # If there is less than 2 spectrograms, the variance cannot be computed (is NaN) + # and normalization is not possible, so return the item as it is + if features.size(0) < 2: + return features + mean, invstddev = calc_mean_invstddev(features) + res = (features - mean) * invstddev + return res + + +def lengths_to_encoder_padding_mask(lengths, batch_first=False): + """ + convert lengths (a 1-D Long/Int tensor) to 2-D binary tensor + + Args: + lengths: a (B, )-shaped tensor + + Return: + max_length: maximum length of B sequences + encoder_padding_mask: a (max_length, B) binary mask, where + [t, b] = 0 for t < lengths[b] and 1 otherwise + + TODO: + kernelize this function if benchmarking shows this function is slow + """ + max_lengths = torch.max(lengths).item() + bsz = lengths.size(0) + encoder_padding_mask = torch.arange( + max_lengths + ).to( # a (T, ) tensor with [0, ..., T-1] + lengths.device + ).view( # move to the right device + 1, max_lengths + ).expand( # reshape to (1, T)-shaped tensor + bsz, -1 + ) >= lengths.view( # expand to (B, T)-shaped tensor + bsz, 1 + ).expand( + -1, max_lengths + ) + if not batch_first: + return encoder_padding_mask.t(), max_lengths + else: + return encoder_padding_mask, max_lengths + + +def encoder_padding_mask_to_lengths( + encoder_padding_mask, max_lengths, batch_size, device +): + """ + convert encoder_padding_mask (2-D binary tensor) to a 1-D tensor + + Conventionally, encoder output contains a encoder_padding_mask, which is + a 2-D mask in a shape (T, B), whose (t, b) element indicate whether + encoder_out[t, b] is a valid output (=0) or not (=1). 
Occasionally, we + need to convert this mask tensor to a 1-D tensor in shape (B, ), where + [b] denotes the valid length of b-th sequence + + Args: + encoder_padding_mask: a (T, B)-shaped binary tensor or None; if None, + indicating all are valid + Return: + seq_lengths: a (B,)-shaped tensor, where its (b, )-th element is the + number of valid elements of b-th sequence + + max_lengths: maximum length of all sequence, if encoder_padding_mask is + not None, max_lengths must equal to encoder_padding_mask.size(0) + + batch_size: batch size; if encoder_padding_mask is + not None, max_lengths must equal to encoder_padding_mask.size(1) + + device: which device to put the result on + """ + if encoder_padding_mask is None: + return torch.Tensor([max_lengths] * batch_size).to(torch.int32).to(device) + + assert encoder_padding_mask.size(0) == max_lengths, "max_lengths does not match" + assert encoder_padding_mask.size(1) == batch_size, "batch_size does not match" + + return max_lengths - torch.sum(encoder_padding_mask, dim=0) diff --git a/SpeechT5/fairseq/examples/speech_recognition/data/replabels.py b/SpeechT5/fairseq/examples/speech_recognition/data/replabels.py new file mode 100644 index 0000000000000000000000000000000000000000..441f1bd432b95865fc981c6c695cee299b07ed62 --- /dev/null +++ b/SpeechT5/fairseq/examples/speech_recognition/data/replabels.py @@ -0,0 +1,70 @@ +#!/usr/bin/env python3 + +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +""" +Replabel transforms for use with flashlight's ASG criterion. +""" + + +def replabel_symbol(i): + """ + Replabel symbols used in flashlight, currently just "1", "2", ... + This prevents training with numeral tokens, so this might change in the future + """ + return str(i) + + +def pack_replabels(tokens, dictionary, max_reps): + """ + Pack a token sequence so that repeated symbols are replaced by replabels + """ + if len(tokens) == 0 or max_reps <= 0: + return tokens + + replabel_value_to_idx = [0] * (max_reps + 1) + for i in range(1, max_reps + 1): + replabel_value_to_idx[i] = dictionary.index(replabel_symbol(i)) + + result = [] + prev_token = -1 + num_reps = 0 + for token in tokens: + if token == prev_token and num_reps < max_reps: + num_reps += 1 + else: + if num_reps > 0: + result.append(replabel_value_to_idx[num_reps]) + num_reps = 0 + result.append(token) + prev_token = token + if num_reps > 0: + result.append(replabel_value_to_idx[num_reps]) + return result + + +def unpack_replabels(tokens, dictionary, max_reps): + """ + Unpack a token sequence so that replabels are replaced by repeated symbols + """ + if len(tokens) == 0 or max_reps <= 0: + return tokens + + replabel_idx_to_value = {} + for i in range(1, max_reps + 1): + replabel_idx_to_value[dictionary.index(replabel_symbol(i))] = i + + result = [] + prev_token = -1 + for token in tokens: + try: + for _ in range(replabel_idx_to_value[token]): + result.append(prev_token) + prev_token = -1 + except KeyError: + result.append(token) + prev_token = token + return result diff --git a/SpeechT5/fairseq/examples/speech_recognition/datasets/asr_prep_json.py b/SpeechT5/fairseq/examples/speech_recognition/datasets/asr_prep_json.py new file mode 100644 index 0000000000000000000000000000000000000000..b8db8ff16691158fae034a8ab3faad622b351caf --- /dev/null +++ b/SpeechT5/fairseq/examples/speech_recognition/datasets/asr_prep_json.py @@ -0,0 +1,125 @@ +#!/usr/bin/env python3 +# 
Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +from __future__ import absolute_import, division, print_function, unicode_literals + +import argparse +import concurrent.futures +import json +import multiprocessing +import os +from collections import namedtuple +from itertools import chain + +import sentencepiece as spm +from fairseq.data import Dictionary + + +MILLISECONDS_TO_SECONDS = 0.001 + + +def process_sample(aud_path, lable, utt_id, sp, tgt_dict): + import torchaudio + + input = {} + output = {} + si, ei = torchaudio.info(aud_path) + input["length_ms"] = int( + si.length / si.channels / si.rate / MILLISECONDS_TO_SECONDS + ) + input["path"] = aud_path + + token = " ".join(sp.EncodeAsPieces(lable)) + ids = tgt_dict.encode_line(token, append_eos=False) + output["text"] = lable + output["token"] = token + output["tokenid"] = ", ".join(map(str, [t.tolist() for t in ids])) + return {utt_id: {"input": input, "output": output}} + + +def main(): + parser = argparse.ArgumentParser() + parser.add_argument( + "--audio-dirs", + nargs="+", + default=["-"], + required=True, + help="input directories with audio files", + ) + parser.add_argument( + "--labels", + required=True, + help="aggregated input labels with format <ID LABEL> per line", + type=argparse.FileType("r", encoding="UTF-8"), + ) + parser.add_argument( + "--spm-model", + required=True, + help="sentencepiece model to use for encoding", + type=argparse.FileType("r", encoding="UTF-8"), + ) + parser.add_argument( + "--dictionary", + required=True, + help="file to load fairseq dictionary from", + type=argparse.FileType("r", encoding="UTF-8"), + ) + parser.add_argument("--audio-format", choices=["flac", "wav"], default="wav") + parser.add_argument( + "--output", + required=True, + type=argparse.FileType("w"), + help="path to save json output", + ) + args = parser.parse_args() + + sp = spm.SentencePieceProcessor() + sp.Load(args.spm_model.name) + + tgt_dict = Dictionary.load(args.dictionary) + + labels = {} + for line in args.labels: + (utt_id, label) = line.split(" ", 1) + labels[utt_id] = label + if len(labels) == 0: + raise Exception("No labels found in ", args.labels_path) + + Sample = namedtuple("Sample", "aud_path utt_id") + samples = [] + for path, _, files in chain.from_iterable( + os.walk(path) for path in args.audio_dirs + ): + for f in files: + if f.endswith(args.audio_format): + if len(os.path.splitext(f)) != 2: + raise Exception("Expect <utt_id.extension> file name. 
Got: ", f) + utt_id = os.path.splitext(f)[0] + if utt_id not in labels: + continue + samples.append(Sample(os.path.join(path, f), utt_id)) + + utts = {} + num_cpu = multiprocessing.cpu_count() + with concurrent.futures.ThreadPoolExecutor(max_workers=num_cpu) as executor: + future_to_sample = { + executor.submit( + process_sample, s.aud_path, labels[s.utt_id], s.utt_id, sp, tgt_dict + ): s + for s in samples + } + for future in concurrent.futures.as_completed(future_to_sample): + try: + data = future.result() + except Exception as exc: + print("generated an exception: ", exc) + else: + utts.update(data) + json.dump({"utts": utts}, args.output, indent=4) + + +if __name__ == "__main__": + main() diff --git a/SpeechT5/fairseq/examples/speech_recognition/datasets/prepare-librispeech.sh b/SpeechT5/fairseq/examples/speech_recognition/datasets/prepare-librispeech.sh new file mode 100644 index 0000000000000000000000000000000000000000..9e9297f08947027685ff508bfa91ff26b0d8ea0c --- /dev/null +++ b/SpeechT5/fairseq/examples/speech_recognition/datasets/prepare-librispeech.sh @@ -0,0 +1,88 @@ +#!/usr/bin/env bash +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +# Prepare librispeech dataset + +base_url=www.openslr.org/resources/12 +train_dir=train_960 + +if [ "$#" -ne 2 ]; then + echo "Usage: $0 <download_dir> <out_dir>" + echo "e.g.: $0 /tmp/librispeech_raw/ ~/data/librispeech_final" + exit 1 +fi + +download_dir=${1%/} +out_dir=${2%/} + +fairseq_root=~/fairseq-py/ +mkdir -p ${out_dir} +cd ${out_dir} || exit + +nbpe=5000 +bpemode=unigram + +if [ ! -d "$fairseq_root" ]; then + echo "$0: Please set correct fairseq_root" + exit 1 +fi + +echo "Data Download" +for part in dev-clean test-clean dev-other test-other train-clean-100 train-clean-360 train-other-500; do + url=$base_url/$part.tar.gz + if ! wget -P $download_dir $url; then + echo "$0: wget failed for $url" + exit 1 + fi + if ! 
tar -C $download_dir -xvzf $download_dir/$part.tar.gz; then + echo "$0: error un-tarring archive $download_dir/$part.tar.gz" + exit 1 + fi +done + +echo "Merge all train packs into one" +mkdir -p ${download_dir}/LibriSpeech/${train_dir}/ +for part in train-clean-100 train-clean-360 train-other-500; do + mv ${download_dir}/LibriSpeech/${part}/* $download_dir/LibriSpeech/${train_dir}/ +done +echo "Merge train text" +find ${download_dir}/LibriSpeech/${train_dir}/ -name '*.txt' -exec cat {} \; >> ${download_dir}/LibriSpeech/${train_dir}/text + +# Use combined dev-clean and dev-other as validation set +find ${download_dir}/LibriSpeech/dev-clean/ ${download_dir}/LibriSpeech/dev-other/ -name '*.txt' -exec cat {} \; >> ${download_dir}/LibriSpeech/valid_text +find ${download_dir}/LibriSpeech/test-clean/ -name '*.txt' -exec cat {} \; >> ${download_dir}/LibriSpeech/test-clean/text +find ${download_dir}/LibriSpeech/test-other/ -name '*.txt' -exec cat {} \; >> ${download_dir}/LibriSpeech/test-other/text + + +dict=data/lang_char/${train_dir}_${bpemode}${nbpe}_units.txt +encoded=data/lang_char/${train_dir}_${bpemode}${nbpe}_encoded.txt +fairseq_dict=data/lang_char/${train_dir}_${bpemode}${nbpe}_fairseq_dict.txt +bpemodel=data/lang_char/${train_dir}_${bpemode}${nbpe} +echo "dictionary: ${dict}" +echo "Dictionary preparation" +mkdir -p data/lang_char/ +echo "<unk> 3" > ${dict} +echo "</s> 2" >> ${dict} +echo "<pad> 1" >> ${dict} +cut -f 2- -d" " ${download_dir}/LibriSpeech/${train_dir}/text > data/lang_char/input.txt +spm_train --input=data/lang_char/input.txt --vocab_size=${nbpe} --model_type=${bpemode} --model_prefix=${bpemodel} --input_sentence_size=100000000 --unk_id=3 --eos_id=2 --pad_id=1 --bos_id=-1 --character_coverage=1 +spm_encode --model=${bpemodel}.model --output_format=piece < data/lang_char/input.txt > ${encoded} +cat ${encoded} | tr ' ' '\n' | sort | uniq | awk '{print $0 " " NR+3}' >> ${dict} +cat ${encoded} | tr ' ' '\n' | sort | uniq -c | awk '{print $2 " " $1}' > ${fairseq_dict} +wc -l ${dict} + +echo "Prepare train and test jsons" +for part in train_960 test-other test-clean; do + python ${fairseq_root}/examples/speech_recognition/datasets/asr_prep_json.py --audio-dirs ${download_dir}/LibriSpeech/${part} --labels ${download_dir}/LibriSpeech/${part}/text --spm-model ${bpemodel}.model --audio-format flac --dictionary ${fairseq_dict} --output ${part}.json +done +# fairseq expects to find train.json and valid.json during training +mv train_960.json train.json + +echo "Prepare valid json" +python ${fairseq_root}/examples/speech_recognition/datasets/asr_prep_json.py --audio-dirs ${download_dir}/LibriSpeech/dev-clean ${download_dir}/LibriSpeech/dev-other --labels ${download_dir}/LibriSpeech/valid_text --spm-model ${bpemodel}.model --audio-format flac --dictionary ${fairseq_dict} --output valid.json + +cp ${fairseq_dict} ./dict.txt +cp ${bpemodel}.model ./spm.model diff --git a/SpeechT5/fairseq/examples/speech_recognition/infer.py b/SpeechT5/fairseq/examples/speech_recognition/infer.py new file mode 100644 index 0000000000000000000000000000000000000000..6e9a878af46242ced57cfcd0e876a3d2ef3820ae --- /dev/null +++ b/SpeechT5/fairseq/examples/speech_recognition/infer.py @@ -0,0 +1,427 @@ +#!/usr/bin/env python3 -u +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +""" +Run inference for pre-processed data with a trained model. 
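
Example invocation (the environment variables are placeholders; this mirrors
the librispeech recipe in the example README):

    python examples/speech_recognition/infer.py $DIR_FOR_PREPROCESSED_DATA \
        --task speech_recognition --max-tokens 25000 --nbest 1 \
        --path $MODEL_PATH/checkpoint_last.pt --beam 20 \
        --results-path $RES_DIR --batch-size 40 --gen-subset $SET \
        --user-dir examples/speech_recognition/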
+""" + +import ast +import logging +import math +import os +import sys + +import editdistance +import numpy as np +import torch +from fairseq import checkpoint_utils, options, progress_bar, tasks, utils +from fairseq.data.data_utils import post_process +from fairseq.logging.meters import StopwatchMeter, TimeMeter + + +logging.basicConfig() +logging.root.setLevel(logging.INFO) +logging.basicConfig(level=logging.INFO) +logger = logging.getLogger(__name__) + + +def add_asr_eval_argument(parser): + parser.add_argument("--kspmodel", default=None, help="sentence piece model") + parser.add_argument( + "--wfstlm", default=None, help="wfstlm on dictonary output units" + ) + parser.add_argument( + "--rnnt_decoding_type", + default="greedy", + help="wfstlm on dictonary\ +output units", + ) + try: + parser.add_argument( + "--lm-weight", + "--lm_weight", + type=float, + default=0.2, + help="weight for lm while interpolating with neural score", + ) + except: + pass + parser.add_argument( + "--rnnt_len_penalty", default=-0.5, help="rnnt length penalty on word level" + ) + parser.add_argument( + "--w2l-decoder", + choices=["viterbi", "kenlm", "fairseqlm"], + help="use a w2l decoder", + ) + parser.add_argument("--lexicon", help="lexicon for w2l decoder") + parser.add_argument("--unit-lm", action="store_true", help="if using a unit lm") + parser.add_argument("--kenlm-model", "--lm-model", help="lm model for w2l decoder") + parser.add_argument("--beam-threshold", type=float, default=25.0) + parser.add_argument("--beam-size-token", type=float, default=100) + parser.add_argument("--word-score", type=float, default=1.0) + parser.add_argument("--unk-weight", type=float, default=-math.inf) + parser.add_argument("--sil-weight", type=float, default=0.0) + parser.add_argument( + "--dump-emissions", + type=str, + default=None, + help="if present, dumps emissions into this file and exits", + ) + parser.add_argument( + "--dump-features", + type=str, + default=None, + help="if present, dumps features into this file and exits", + ) + parser.add_argument( + "--load-emissions", + type=str, + default=None, + help="if present, loads emissions from this file", + ) + return parser + + +def check_args(args): + # assert args.path is not None, "--path required for generation!" + # assert args.results_path is not None, "--results_path required for generation!" 
+ assert ( + not args.sampling or args.nbest == args.beam + ), "--sampling requires --nbest to be equal to --beam" + assert ( + args.replace_unk is None or args.raw_text + ), "--replace-unk requires a raw text dataset (--raw-text)" + + +def get_dataset_itr(args, task, models): + return task.get_batch_iterator( + dataset=task.dataset(args.gen_subset), + max_tokens=args.max_tokens, + max_sentences=args.batch_size, + max_positions=(sys.maxsize, sys.maxsize), + ignore_invalid_inputs=args.skip_invalid_size_inputs_valid_test, + required_batch_size_multiple=args.required_batch_size_multiple, + num_shards=args.num_shards, + shard_id=args.shard_id, + num_workers=args.num_workers, + data_buffer_size=args.data_buffer_size, + ).next_epoch_itr(shuffle=False) + + +def process_predictions( + args, hypos, sp, tgt_dict, target_tokens, res_files, speaker, id +): + for hypo in hypos[: min(len(hypos), args.nbest)]: + hyp_pieces = tgt_dict.string(hypo["tokens"].int().cpu()) + + if "words" in hypo: + hyp_words = " ".join(hypo["words"]) + else: + hyp_words = post_process(hyp_pieces, args.post_process) + + if res_files is not None: + print( + "{} ({}-{})".format(hyp_pieces, speaker, id), + file=res_files["hypo.units"], + ) + print( + "{} ({}-{})".format(hyp_words, speaker, id), + file=res_files["hypo.words"], + ) + + tgt_pieces = tgt_dict.string(target_tokens) + tgt_words = post_process(tgt_pieces, args.post_process) + + if res_files is not None: + print( + "{} ({}-{})".format(tgt_pieces, speaker, id), + file=res_files["ref.units"], + ) + print( + "{} ({}-{})".format(tgt_words, speaker, id), file=res_files["ref.words"] + ) + + if not args.quiet: + logger.info("HYPO:" + hyp_words) + logger.info("TARGET:" + tgt_words) + logger.info("___________________") + + hyp_words = hyp_words.split() + tgt_words = tgt_words.split() + return editdistance.eval(hyp_words, tgt_words), len(tgt_words) + + +def prepare_result_files(args): + def get_res_file(file_prefix): + if args.num_shards > 1: + file_prefix = f"{args.shard_id}_{file_prefix}" + path = os.path.join( + args.results_path, + "{}-{}-{}.txt".format( + file_prefix, os.path.basename(args.path), args.gen_subset + ), + ) + return open(path, "w", buffering=1) + + if not args.results_path: + return None + + return { + "hypo.words": get_res_file("hypo.word"), + "hypo.units": get_res_file("hypo.units"), + "ref.words": get_res_file("ref.word"), + "ref.units": get_res_file("ref.units"), + } + + +def optimize_models(args, use_cuda, models): + """Optimize ensemble for generation""" + for model in models: + model.make_generation_fast_( + beamable_mm_beam_size=None if args.no_beamable_mm else args.beam, + need_attn=args.print_alignment, + ) + if args.fp16: + model.half() + if use_cuda: + model.cuda() + + +class ExistingEmissionsDecoder(object): + def __init__(self, decoder, emissions): + self.decoder = decoder + self.emissions = emissions + + def generate(self, models, sample, **unused): + ids = sample["id"].cpu().numpy() + try: + emissions = np.stack(self.emissions[ids]) + except: + print([x.shape for x in self.emissions[ids]]) + raise Exception("invalid sizes") + emissions = torch.from_numpy(emissions) + return self.decoder.decode(emissions) + + +def main(args, task=None, model_state=None): + check_args(args) + + if args.max_tokens is None and args.batch_size is None: + args.max_tokens = 4000000 + logger.info(args) + + use_cuda = torch.cuda.is_available() and not args.cpu + + logger.info("| decoding with criterion {}".format(args.criterion)) + + task = tasks.setup_task(args) + + # 
Load ensemble + if args.load_emissions: + models, criterions = [], [] + task.load_dataset(args.gen_subset) + else: + logger.info("| loading model(s) from {}".format(args.path)) + models, saved_cfg, task = checkpoint_utils.load_model_ensemble_and_task( + utils.split_paths(args.path, separator="\\"), + arg_overrides=ast.literal_eval(args.model_overrides), + task=task, + suffix=args.checkpoint_suffix, + strict=(args.checkpoint_shard_count == 1), + num_shards=args.checkpoint_shard_count, + state=model_state, + ) + optimize_models(args, use_cuda, models) + task.load_dataset(args.gen_subset, task_cfg=saved_cfg.task) + + + # Set dictionary + tgt_dict = task.target_dictionary + + logger.info( + "| {} {} {} examples".format( + args.data, args.gen_subset, len(task.dataset(args.gen_subset)) + ) + ) + + # hack to pass transitions to W2lDecoder + if args.criterion == "asg_loss": + raise NotImplementedError("asg_loss is currently not supported") + # trans = criterions[0].asg.trans.data + # args.asg_transitions = torch.flatten(trans).tolist() + + # Load dataset (possibly sharded) + itr = get_dataset_itr(args, task, models) + + # Initialize generator + gen_timer = StopwatchMeter() + + def build_generator(args): + w2l_decoder = getattr(args, "w2l_decoder", None) + if w2l_decoder == "viterbi": + from examples.speech_recognition.w2l_decoder import W2lViterbiDecoder + + return W2lViterbiDecoder(args, task.target_dictionary) + elif w2l_decoder == "kenlm": + from examples.speech_recognition.w2l_decoder import W2lKenLMDecoder + + return W2lKenLMDecoder(args, task.target_dictionary) + elif w2l_decoder == "fairseqlm": + from examples.speech_recognition.w2l_decoder import W2lFairseqLMDecoder + + return W2lFairseqLMDecoder(args, task.target_dictionary) + else: + print( + "only flashlight decoders with (viterbi, kenlm, fairseqlm) options are supported at the moment" + ) + + # please do not touch this unless you test both generate.py and infer.py with audio_pretraining task + generator = build_generator(args) + + if args.load_emissions: + generator = ExistingEmissionsDecoder( + generator, np.load(args.load_emissions, allow_pickle=True) + ) + logger.info("loaded emissions from " + args.load_emissions) + + num_sentences = 0 + + if args.results_path is not None and not os.path.exists(args.results_path): + os.makedirs(args.results_path) + + max_source_pos = ( + utils.resolve_max_positions( + task.max_positions(), *[model.max_positions() for model in models] + ), + ) + + if max_source_pos is not None: + max_source_pos = max_source_pos[0] + if max_source_pos is not None: + max_source_pos = max_source_pos[0] - 1 + + if args.dump_emissions: + emissions = {} + if args.dump_features: + features = {} + models[0].bert.proj = None + else: + res_files = prepare_result_files(args) + errs_t = 0 + lengths_t = 0 + with progress_bar.build_progress_bar(args, itr) as t: + wps_meter = TimeMeter() + for sample in t: + sample = utils.move_to_cuda(sample) if use_cuda else sample + if "net_input" not in sample: + continue + + prefix_tokens = None + if args.prefix_size > 0: + prefix_tokens = sample["target"][:, : args.prefix_size] + + gen_timer.start() + if args.dump_emissions: + with torch.no_grad(): + encoder_out = models[0](**sample["net_input"]) + emm = models[0].get_normalized_probs(encoder_out, log_probs=True) + emm = emm.transpose(0, 1).cpu().numpy() + for i, id in enumerate(sample["id"]): + emissions[id.item()] = emm[i] + continue + elif args.dump_features: + with torch.no_grad(): + encoder_out = models[0](**sample["net_input"]) + feat 
= encoder_out["encoder_out"].transpose(0, 1).cpu().numpy() + for i, id in enumerate(sample["id"]): + padding = ( + encoder_out["encoder_padding_mask"][i].cpu().numpy() + if encoder_out["encoder_padding_mask"] is not None + else None + ) + features[id.item()] = (feat[i], padding) + continue + hypos = task.inference_step(generator, models, sample, prefix_tokens) + num_generated_tokens = sum(len(h[0]["tokens"]) for h in hypos) + gen_timer.stop(num_generated_tokens) + + for i, sample_id in enumerate(sample["id"].tolist()): + speaker = None + # id = task.dataset(args.gen_subset).ids[int(sample_id)] + id = sample_id + toks = ( + sample["target"][i, :] + if "target_label" not in sample + else sample["target_label"][i, :] + ) + target_tokens = utils.strip_pad(toks, tgt_dict.pad()).int().cpu() + # Process top predictions + errs, length = process_predictions( + args, + hypos[i], + None, + tgt_dict, + target_tokens, + res_files, + speaker, + id, + ) + errs_t += errs + lengths_t += length + + wps_meter.update(num_generated_tokens) + t.log({"wps": round(wps_meter.avg)}) + num_sentences += ( + sample["nsentences"] if "nsentences" in sample else sample["id"].numel() + ) + + wer = None + if args.dump_emissions: + emm_arr = [] + for i in range(len(emissions)): + emm_arr.append(emissions[i]) + np.save(args.dump_emissions, emm_arr) + logger.info(f"saved {len(emissions)} emissions to {args.dump_emissions}") + elif args.dump_features: + feat_arr = [] + for i in range(len(features)): + feat_arr.append(features[i]) + np.save(args.dump_features, feat_arr) + logger.info(f"saved {len(features)} emissions to {args.dump_features}") + else: + if lengths_t > 0: + wer = errs_t * 100.0 / lengths_t + logger.info(f"WER: {wer}") + + logger.info( + "| Processed {} sentences ({} tokens) in {:.1f}s ({:.2f}" + "sentences/s, {:.2f} tokens/s)".format( + num_sentences, + gen_timer.n, + gen_timer.sum, + num_sentences / gen_timer.sum, + 1.0 / gen_timer.avg, + ) + ) + logger.info("| Generate {} with beam={}".format(args.gen_subset, args.beam)) + return task, wer + + +def make_parser(): + parser = options.get_generation_parser() + parser = add_asr_eval_argument(parser) + return parser + + +def cli_main(): + parser = make_parser() + args = options.parse_args_and_arch(parser) + main(args) + + +if __name__ == "__main__": + cli_main() diff --git a/SpeechT5/fairseq/examples/speech_recognition/kaldi/__init__.py b/SpeechT5/fairseq/examples/speech_recognition/kaldi/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 diff --git a/SpeechT5/fairseq/examples/speech_recognition/kaldi/add-self-loop-simple.cc b/SpeechT5/fairseq/examples/speech_recognition/kaldi/add-self-loop-simple.cc new file mode 100644 index 0000000000000000000000000000000000000000..89754b925ea2b770e569b24d8ee07c408102733c --- /dev/null +++ b/SpeechT5/fairseq/examples/speech_recognition/kaldi/add-self-loop-simple.cc @@ -0,0 +1,94 @@ +/* +* Copyright (c) Facebook, Inc. and its affiliates. +* +* This source code is licensed under the MIT license found in the +* LICENSE file in the root directory of this source tree. +*/ + +#include <iostream> +#include "fstext/fstext-lib.h" // @manual +#include "util/common-utils.h" // @manual + +/* + * This program is to modify a FST without self-loop by: + * for each incoming arc with non-eps input symbol, add a self-loop arc + * with that non-eps symbol as input and eps as output. 
+ * + * This is to make sure the resultant FST can do deduplication for repeated + * symbols, which is very common in acoustic model + * + */ +namespace { +int32 AddSelfLoopsSimple(fst::StdVectorFst* fst) { + typedef fst::MutableArcIterator<fst::StdVectorFst> IterType; + + int32 num_states_before = fst->NumStates(); + fst::MakePrecedingInputSymbolsSame(false, fst); + int32 num_states_after = fst->NumStates(); + KALDI_LOG << "There are " << num_states_before + << " states in the original FST; " + << " after MakePrecedingInputSymbolsSame, there are " + << num_states_after << " states " << std::endl; + + auto weight_one = fst::StdArc::Weight::One(); + + int32 num_arc_added = 0; + + fst::StdArc self_loop_arc; + self_loop_arc.weight = weight_one; + + int32 num_states = fst->NumStates(); + std::vector<std::set<int32>> incoming_non_eps_label_per_state(num_states); + + for (int32 state = 0; state < num_states; state++) { + for (IterType aiter(fst, state); !aiter.Done(); aiter.Next()) { + fst::StdArc arc(aiter.Value()); + if (arc.ilabel != 0) { + incoming_non_eps_label_per_state[arc.nextstate].insert(arc.ilabel); + } + } + } + + for (int32 state = 0; state < num_states; state++) { + if (!incoming_non_eps_label_per_state[state].empty()) { + auto& ilabel_set = incoming_non_eps_label_per_state[state]; + for (auto it = ilabel_set.begin(); it != ilabel_set.end(); it++) { + self_loop_arc.ilabel = *it; + self_loop_arc.olabel = 0; + self_loop_arc.nextstate = state; + fst->AddArc(state, self_loop_arc); + num_arc_added++; + } + } + } + return num_arc_added; +} + +void print_usage() { + std::cout << "add-self-loop-simple usage:\n" + "\tadd-self-loop-simple <in-fst> <out-fst> \n"; +} +} // namespace + +int main(int argc, char** argv) { + if (argc != 3) { + print_usage(); + exit(1); + } + + auto input = argv[1]; + auto output = argv[2]; + + auto fst = fst::ReadFstKaldi(input); + auto num_states = fst->NumStates(); + KALDI_LOG << "Loading FST from " << input << " with " << num_states + << " states." << std::endl; + + int32 num_arc_added = AddSelfLoopsSimple(fst); + KALDI_LOG << "Adding " << num_arc_added << " self-loop arcs " << std::endl; + + fst::WriteFstKaldi(*fst, std::string(output)); + KALDI_LOG << "Writing FST to " << output << std::endl; + + delete fst; +} \ No newline at end of file diff --git a/SpeechT5/fairseq/examples/speech_recognition/kaldi/config/kaldi_initializer.yaml b/SpeechT5/fairseq/examples/speech_recognition/kaldi/config/kaldi_initializer.yaml new file mode 100644 index 0000000000000000000000000000000000000000..be9ba98f55463d41d5d5ea35e306abc0886dbead --- /dev/null +++ b/SpeechT5/fairseq/examples/speech_recognition/kaldi/config/kaldi_initializer.yaml @@ -0,0 +1,8 @@ +# @package _group_ + +data_dir: ??? +fst_dir: ??? +in_labels: ??? +kaldi_root: ??? +lm_arpa: ??? +blank_symbol: <s> diff --git a/SpeechT5/fairseq/examples/speech_recognition/kaldi/kaldi_decoder.py b/SpeechT5/fairseq/examples/speech_recognition/kaldi/kaldi_decoder.py new file mode 100644 index 0000000000000000000000000000000000000000..5f62cc58ae8c0c5a3ba7d17713fedf0abc302942 --- /dev/null +++ b/SpeechT5/fairseq/examples/speech_recognition/kaldi/kaldi_decoder.py @@ -0,0 +1,244 @@ +#!/usr/bin/env python3 + +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. 
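# Usage sketch (illustration only; not called anywhere in this module).  It
# assumes pykaldi is installed and that an HLG graph and output dictionary
# already exist -- the paths below are placeholders.  KaldiDecoderConfig and
# KaldiDecoder are defined further down; generate() returns one future per
# utterance, each resolving to the n-best list of hypothesis dicts.
def _kaldi_decoder_usage_sketch(models, sample):
    cfg = KaldiDecoderConfig(
        hlg_graph_path="/path/to/HLG.fst",        # placeholder
        output_dict="/path/to/kaldi_dict.txt",    # placeholder
        acoustic_scale=0.5,
    )
    decoder = KaldiDecoder(cfg, beam=15, nbest=1)
    futures = decoder.generate(models, sample)
    return [f.result() for f in futures]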
+ +from concurrent.futures import ThreadPoolExecutor +import logging +from omegaconf import MISSING +import os +import torch +from typing import Optional +import warnings + + +from dataclasses import dataclass +from fairseq.dataclass import FairseqDataclass +from .kaldi_initializer import KaldiInitializerConfig, initalize_kaldi + + +logger = logging.getLogger(__name__) + + +@dataclass +class KaldiDecoderConfig(FairseqDataclass): + hlg_graph_path: Optional[str] = None + output_dict: str = MISSING + + kaldi_initializer_config: Optional[KaldiInitializerConfig] = None + + acoustic_scale: float = 0.5 + max_active: int = 10000 + beam_delta: float = 0.5 + hash_ratio: float = 2.0 + + is_lattice: bool = False + lattice_beam: float = 10.0 + prune_interval: int = 25 + determinize_lattice: bool = True + prune_scale: float = 0.1 + max_mem: int = 0 + phone_determinize: bool = True + word_determinize: bool = True + minimize: bool = True + + num_threads: int = 1 + + +class KaldiDecoder(object): + def __init__( + self, + cfg: KaldiDecoderConfig, + beam: int, + nbest: int = 1, + ): + try: + from kaldi.asr import FasterRecognizer, LatticeFasterRecognizer + from kaldi.base import set_verbose_level + from kaldi.decoder import ( + FasterDecoder, + FasterDecoderOptions, + LatticeFasterDecoder, + LatticeFasterDecoderOptions, + ) + from kaldi.lat.functions import DeterminizeLatticePhonePrunedOptions + from kaldi.fstext import read_fst_kaldi, SymbolTable + except: + warnings.warn( + "pykaldi is required for this functionality. Please install from https://github.com/pykaldi/pykaldi" + ) + + # set_verbose_level(2) + + self.acoustic_scale = cfg.acoustic_scale + self.nbest = nbest + + if cfg.hlg_graph_path is None: + assert ( + cfg.kaldi_initializer_config is not None + ), "Must provide hlg graph path or kaldi initializer config" + cfg.hlg_graph_path = initalize_kaldi(cfg.kaldi_initializer_config) + + assert os.path.exists(cfg.hlg_graph_path), cfg.hlg_graph_path + + if cfg.is_lattice: + self.dec_cls = LatticeFasterDecoder + opt_cls = LatticeFasterDecoderOptions + self.rec_cls = LatticeFasterRecognizer + else: + assert self.nbest == 1, "nbest > 1 requires lattice decoder" + self.dec_cls = FasterDecoder + opt_cls = FasterDecoderOptions + self.rec_cls = FasterRecognizer + + self.decoder_options = opt_cls() + self.decoder_options.beam = beam + self.decoder_options.max_active = cfg.max_active + self.decoder_options.beam_delta = cfg.beam_delta + self.decoder_options.hash_ratio = cfg.hash_ratio + + if cfg.is_lattice: + self.decoder_options.lattice_beam = cfg.lattice_beam + self.decoder_options.prune_interval = cfg.prune_interval + self.decoder_options.determinize_lattice = cfg.determinize_lattice + self.decoder_options.prune_scale = cfg.prune_scale + det_opts = DeterminizeLatticePhonePrunedOptions() + det_opts.max_mem = cfg.max_mem + det_opts.phone_determinize = cfg.phone_determinize + det_opts.word_determinize = cfg.word_determinize + det_opts.minimize = cfg.minimize + self.decoder_options.det_opts = det_opts + + self.output_symbols = {} + with open(cfg.output_dict, "r") as f: + for line in f: + items = line.rstrip().split() + assert len(items) == 2 + self.output_symbols[int(items[1])] = items[0] + + logger.info(f"Loading FST from {cfg.hlg_graph_path}") + self.fst = read_fst_kaldi(cfg.hlg_graph_path) + self.symbol_table = SymbolTable.read_text(cfg.output_dict) + + self.executor = ThreadPoolExecutor(max_workers=cfg.num_threads) + + def generate(self, models, sample, **unused): + """Generate a batch of inferences.""" + # 
model.forward normally channels prev_output_tokens into the decoder + # separately, but SequenceGenerator directly calls model.encoder + encoder_input = { + k: v for k, v in sample["net_input"].items() if k != "prev_output_tokens" + } + emissions, padding = self.get_emissions(models, encoder_input) + return self.decode(emissions, padding) + + def get_emissions(self, models, encoder_input): + """Run encoder and normalize emissions""" + model = models[0] + + all_encoder_out = [m(**encoder_input) for m in models] + + if len(all_encoder_out) > 1: + + if "encoder_out" in all_encoder_out[0]: + encoder_out = { + "encoder_out": sum(e["encoder_out"] for e in all_encoder_out) + / len(all_encoder_out), + "encoder_padding_mask": all_encoder_out[0]["encoder_padding_mask"], + } + padding = encoder_out["encoder_padding_mask"] + else: + encoder_out = { + "logits": sum(e["logits"] for e in all_encoder_out) + / len(all_encoder_out), + "padding_mask": all_encoder_out[0]["padding_mask"], + } + padding = encoder_out["padding_mask"] + else: + encoder_out = all_encoder_out[0] + padding = ( + encoder_out["padding_mask"] + if "padding_mask" in encoder_out + else encoder_out["encoder_padding_mask"] + ) + + if hasattr(model, "get_logits"): + emissions = model.get_logits(encoder_out, normalize=True) + else: + emissions = model.get_normalized_probs(encoder_out, log_probs=True) + + return ( + emissions.cpu().float().transpose(0, 1), + padding.cpu() if padding is not None and padding.any() else None, + ) + + def decode_one(self, logits, padding): + from kaldi.matrix import Matrix + + decoder = self.dec_cls(self.fst, self.decoder_options) + asr = self.rec_cls( + decoder, self.symbol_table, acoustic_scale=self.acoustic_scale + ) + + if padding is not None: + logits = logits[~padding] + + mat = Matrix(logits.numpy()) + + out = asr.decode(mat) + + if self.nbest > 1: + from kaldi.fstext import shortestpath + from kaldi.fstext.utils import ( + convert_compact_lattice_to_lattice, + convert_lattice_to_std, + convert_nbest_to_list, + get_linear_symbol_sequence, + ) + + lat = out["lattice"] + + sp = shortestpath(lat, nshortest=self.nbest) + + sp = convert_compact_lattice_to_lattice(sp) + sp = convert_lattice_to_std(sp) + seq = convert_nbest_to_list(sp) + + results = [] + for s in seq: + _, o, w = get_linear_symbol_sequence(s) + words = list(self.output_symbols[z] for z in o) + results.append( + { + "tokens": words, + "words": words, + "score": w.value, + "emissions": logits, + } + ) + return results + else: + words = out["text"].split() + return [ + { + "tokens": words, + "words": words, + "score": out["likelihood"], + "emissions": logits, + } + ] + + def decode(self, emissions, padding): + if padding is None: + padding = [None] * len(emissions) + + ret = list( + map( + lambda e, p: self.executor.submit(self.decode_one, e, p), + emissions, + padding, + ) + ) + return ret diff --git a/SpeechT5/fairseq/examples/speech_recognition/kaldi/kaldi_initializer.py b/SpeechT5/fairseq/examples/speech_recognition/kaldi/kaldi_initializer.py new file mode 100644 index 0000000000000000000000000000000000000000..6d2a2a4b6b809ba1106f9a57cb6f241dc083e670 --- /dev/null +++ b/SpeechT5/fairseq/examples/speech_recognition/kaldi/kaldi_initializer.py @@ -0,0 +1,698 @@ +#!/usr/bin/env python3 + +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. 
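# Pipeline sketch (illustration only): the helpers defined below assemble a
# Kaldi decoding graph roughly in this order:
#   create_units()    symbol table for the acoustic-model output units
#   create_G()        grammar FST compiled from an ARPA LM via arpa2fst
#   create_lexicon()  lexicon with disambiguation symbols (add_lex_disambig.pl)
#   create_L()        lexicon FST (make_lexicon_fst.pl + fstcompile + ...)
#   create_LG()       L composed with G, determinized and minimized
#   create_H()        emission-to-unit FST with blank/silence self-loops
# The flow is driven by initalize_kaldi(cfg), which kaldi_decoder.py calls
# when no prebuilt HLG graph path is supplied, e.g.:
#
#   cfg = KaldiInitializerConfig(
#       data_dir="/path/to/data",          # placeholder
#       fst_dir="/path/to/fst_out",        # placeholder
#       in_labels="ltr",                   # placeholder label-set name
#       lm_arpa="/path/to/lm.arpa",        # placeholder
#       kaldi_root="/path/to/kaldi",       # placeholder
#   )
#   hlg_graph_path = initalize_kaldi(cfg)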
+ +from dataclasses import dataclass +import hydra +from hydra.core.config_store import ConfigStore +import logging +from omegaconf import MISSING, OmegaConf +import os +import os.path as osp +from pathlib import Path +import subprocess +from typing import Optional + +from fairseq.data.dictionary import Dictionary +from fairseq.dataclass import FairseqDataclass + +script_dir = Path(__file__).resolve().parent +config_path = script_dir / "config" + + +logger = logging.getLogger(__name__) + + +@dataclass +class KaldiInitializerConfig(FairseqDataclass): + data_dir: str = MISSING + fst_dir: Optional[str] = None + in_labels: str = MISSING + out_labels: Optional[str] = None + wav2letter_lexicon: Optional[str] = None + lm_arpa: str = MISSING + kaldi_root: str = MISSING + blank_symbol: str = "<s>" + silence_symbol: Optional[str] = None + + +def create_units(fst_dir: Path, in_labels: str, vocab: Dictionary) -> Path: + in_units_file = fst_dir / f"kaldi_dict.{in_labels}.txt" + if not in_units_file.exists(): + + logger.info(f"Creating {in_units_file}") + + with open(in_units_file, "w") as f: + print("<eps> 0", file=f) + i = 1 + for symb in vocab.symbols[vocab.nspecial :]: + if not symb.startswith("madeupword"): + print(f"{symb} {i}", file=f) + i += 1 + return in_units_file + + +def create_lexicon( + cfg: KaldiInitializerConfig, + fst_dir: Path, + unique_label: str, + in_units_file: Path, + out_words_file: Path, +) -> (Path, Path): + + disambig_in_units_file = fst_dir / f"kaldi_dict.{cfg.in_labels}_disambig.txt" + lexicon_file = fst_dir / f"kaldi_lexicon.{unique_label}.txt" + disambig_lexicon_file = fst_dir / f"kaldi_lexicon.{unique_label}_disambig.txt" + if ( + not lexicon_file.exists() + or not disambig_lexicon_file.exists() + or not disambig_in_units_file.exists() + ): + logger.info(f"Creating {lexicon_file} (in units file: {in_units_file})") + + assert cfg.wav2letter_lexicon is not None or cfg.in_labels == cfg.out_labels + + if cfg.wav2letter_lexicon is not None: + lm_words = set() + with open(out_words_file, "r") as lm_dict_f: + for line in lm_dict_f: + lm_words.add(line.split()[0]) + + num_skipped = 0 + total = 0 + with open(cfg.wav2letter_lexicon, "r") as w2l_lex_f, open( + lexicon_file, "w" + ) as out_f: + for line in w2l_lex_f: + items = line.rstrip().split("\t") + assert len(items) == 2, items + if items[0] in lm_words: + print(items[0], items[1], file=out_f) + else: + num_skipped += 1 + logger.debug( + f"Skipping word {items[0]} as it was not found in LM" + ) + total += 1 + if num_skipped > 0: + logger.warning( + f"Skipped {num_skipped} out of {total} words as they were not found in LM" + ) + else: + with open(in_units_file, "r") as in_f, open(lexicon_file, "w") as out_f: + for line in in_f: + symb = line.split()[0] + if symb != "<eps>" and symb != "<ctc_blank>" and symb != "<SIL>": + print(symb, symb, file=out_f) + + lex_disambig_path = ( + Path(cfg.kaldi_root) / "egs/wsj/s5/utils/add_lex_disambig.pl" + ) + res = subprocess.run( + [lex_disambig_path, lexicon_file, disambig_lexicon_file], + check=True, + capture_output=True, + ) + ndisambig = int(res.stdout) + disamib_path = Path(cfg.kaldi_root) / "egs/wsj/s5/utils/add_disambig.pl" + res = subprocess.run( + [disamib_path, "--include-zero", in_units_file, str(ndisambig)], + check=True, + capture_output=True, + ) + with open(disambig_in_units_file, "wb") as f: + f.write(res.stdout) + + return disambig_lexicon_file, disambig_in_units_file + + +def create_G( + kaldi_root: Path, fst_dir: Path, lm_arpa: Path, arpa_base: str +) -> (Path, Path): + + 
out_words_file = fst_dir / f"kaldi_dict.{arpa_base}.txt" + grammar_graph = fst_dir / f"G_{arpa_base}.fst" + if not grammar_graph.exists() or not out_words_file.exists(): + logger.info(f"Creating {grammar_graph}") + arpa2fst = kaldi_root / "src/lmbin/arpa2fst" + subprocess.run( + [ + arpa2fst, + "--disambig-symbol=#0", + f"--write-symbol-table={out_words_file}", + lm_arpa, + grammar_graph, + ], + check=True, + ) + return grammar_graph, out_words_file + + +def create_L( + kaldi_root: Path, + fst_dir: Path, + unique_label: str, + lexicon_file: Path, + in_units_file: Path, + out_words_file: Path, +) -> Path: + lexicon_graph = fst_dir / f"L.{unique_label}.fst" + + if not lexicon_graph.exists(): + logger.info(f"Creating {lexicon_graph} (in units: {in_units_file})") + make_lex = kaldi_root / "egs/wsj/s5/utils/make_lexicon_fst.pl" + fstcompile = kaldi_root / "tools/openfst-1.6.7/bin/fstcompile" + fstaddselfloops = kaldi_root / "src/fstbin/fstaddselfloops" + fstarcsort = kaldi_root / "tools/openfst-1.6.7/bin/fstarcsort" + + def write_disambig_symbol(file): + with open(file, "r") as f: + for line in f: + items = line.rstrip().split() + if items[0] == "#0": + out_path = str(file) + "_disamig" + with open(out_path, "w") as out_f: + print(items[1], file=out_f) + return out_path + + return None + + in_disambig_sym = write_disambig_symbol(in_units_file) + assert in_disambig_sym is not None + out_disambig_sym = write_disambig_symbol(out_words_file) + assert out_disambig_sym is not None + + try: + with open(lexicon_graph, "wb") as out_f: + res = subprocess.run( + [make_lex, lexicon_file], capture_output=True, check=True + ) + assert len(res.stderr) == 0, res.stderr.decode("utf-8") + res = subprocess.run( + [ + fstcompile, + f"--isymbols={in_units_file}", + f"--osymbols={out_words_file}", + "--keep_isymbols=false", + "--keep_osymbols=false", + ], + input=res.stdout, + capture_output=True, + ) + assert len(res.stderr) == 0, res.stderr.decode("utf-8") + res = subprocess.run( + [fstaddselfloops, in_disambig_sym, out_disambig_sym], + input=res.stdout, + capture_output=True, + check=True, + ) + res = subprocess.run( + [fstarcsort, "--sort_type=olabel"], + input=res.stdout, + capture_output=True, + check=True, + ) + out_f.write(res.stdout) + except subprocess.CalledProcessError as e: + logger.error(f"cmd: {e.cmd}, err: {e.stderr.decode('utf-8')}") + os.remove(lexicon_graph) + raise + except AssertionError: + os.remove(lexicon_graph) + raise + + return lexicon_graph + + +def create_LG( + kaldi_root: Path, + fst_dir: Path, + unique_label: str, + lexicon_graph: Path, + grammar_graph: Path, +) -> Path: + lg_graph = fst_dir / f"LG.{unique_label}.fst" + + if not lg_graph.exists(): + logger.info(f"Creating {lg_graph}") + + fsttablecompose = kaldi_root / "src/fstbin/fsttablecompose" + fstdeterminizestar = kaldi_root / "src/fstbin/fstdeterminizestar" + fstminimizeencoded = kaldi_root / "src/fstbin/fstminimizeencoded" + fstpushspecial = kaldi_root / "src/fstbin/fstpushspecial" + fstarcsort = kaldi_root / "tools/openfst-1.6.7/bin/fstarcsort" + + try: + with open(lg_graph, "wb") as out_f: + res = subprocess.run( + [fsttablecompose, lexicon_graph, grammar_graph], + capture_output=True, + check=True, + ) + res = subprocess.run( + [ + fstdeterminizestar, + "--use-log=true", + ], + input=res.stdout, + capture_output=True, + ) + res = subprocess.run( + [fstminimizeencoded], + input=res.stdout, + capture_output=True, + check=True, + ) + res = subprocess.run( + [fstpushspecial], + input=res.stdout, + capture_output=True, + 
check=True, + ) + res = subprocess.run( + [fstarcsort, "--sort_type=ilabel"], + input=res.stdout, + capture_output=True, + check=True, + ) + out_f.write(res.stdout) + except subprocess.CalledProcessError as e: + logger.error(f"cmd: {e.cmd}, err: {e.stderr.decode('utf-8')}") + os.remove(lg_graph) + raise + + return lg_graph + + +def create_H( + kaldi_root: Path, + fst_dir: Path, + disambig_out_units_file: Path, + in_labels: str, + vocab: Dictionary, + blk_sym: str, + silence_symbol: Optional[str], +) -> (Path, Path, Path): + h_graph = ( + fst_dir / f"H.{in_labels}{'_' + silence_symbol if silence_symbol else ''}.fst" + ) + h_out_units_file = fst_dir / f"kaldi_dict.h_out.{in_labels}.txt" + disambig_in_units_file_int = Path(str(h_graph) + "isym_disambig.int") + disambig_out_units_file_int = Path(str(disambig_out_units_file) + ".int") + if ( + not h_graph.exists() + or not h_out_units_file.exists() + or not disambig_in_units_file_int.exists() + ): + logger.info(f"Creating {h_graph}") + eps_sym = "<eps>" + + num_disambig = 0 + osymbols = [] + + with open(disambig_out_units_file, "r") as f, open( + disambig_out_units_file_int, "w" + ) as out_f: + for line in f: + symb, id = line.rstrip().split() + if line.startswith("#"): + num_disambig += 1 + print(id, file=out_f) + else: + if len(osymbols) == 0: + assert symb == eps_sym, symb + osymbols.append((symb, id)) + + i_idx = 0 + isymbols = [(eps_sym, 0)] + + imap = {} + + for i, s in enumerate(vocab.symbols): + i_idx += 1 + isymbols.append((s, i_idx)) + imap[s] = i_idx + + fst_str = [] + + node_idx = 0 + root_node = node_idx + + special_symbols = [blk_sym] + if silence_symbol is not None: + special_symbols.append(silence_symbol) + + for ss in special_symbols: + fst_str.append("{} {} {} {}".format(root_node, root_node, ss, eps_sym)) + + for symbol, _ in osymbols: + if symbol == eps_sym or symbol.startswith("#"): + continue + + node_idx += 1 + # 1. from root to emitting state + fst_str.append("{} {} {} {}".format(root_node, node_idx, symbol, symbol)) + # 2. from emitting state back to root + fst_str.append("{} {} {} {}".format(node_idx, root_node, eps_sym, eps_sym)) + # 3. from emitting state to optional blank state + pre_node = node_idx + node_idx += 1 + for ss in special_symbols: + fst_str.append("{} {} {} {}".format(pre_node, node_idx, ss, eps_sym)) + # 4. 
from blank state back to root + fst_str.append("{} {} {} {}".format(node_idx, root_node, eps_sym, eps_sym)) + + fst_str.append("{}".format(root_node)) + + fst_str = "\n".join(fst_str) + h_str = str(h_graph) + isym_file = h_str + ".isym" + + with open(isym_file, "w") as f: + for sym, id in isymbols: + f.write("{} {}\n".format(sym, id)) + + with open(h_out_units_file, "w") as f: + for sym, id in osymbols: + f.write("{} {}\n".format(sym, id)) + + with open(disambig_in_units_file_int, "w") as f: + disam_sym_id = len(isymbols) + for _ in range(num_disambig): + f.write("{}\n".format(disam_sym_id)) + disam_sym_id += 1 + + fstcompile = kaldi_root / "tools/openfst-1.6.7/bin/fstcompile" + fstaddselfloops = kaldi_root / "src/fstbin/fstaddselfloops" + fstarcsort = kaldi_root / "tools/openfst-1.6.7/bin/fstarcsort" + + try: + with open(h_graph, "wb") as out_f: + res = subprocess.run( + [ + fstcompile, + f"--isymbols={isym_file}", + f"--osymbols={h_out_units_file}", + "--keep_isymbols=false", + "--keep_osymbols=false", + ], + input=str.encode(fst_str), + capture_output=True, + check=True, + ) + res = subprocess.run( + [ + fstaddselfloops, + disambig_in_units_file_int, + disambig_out_units_file_int, + ], + input=res.stdout, + capture_output=True, + check=True, + ) + res = subprocess.run( + [fstarcsort, "--sort_type=olabel"], + input=res.stdout, + capture_output=True, + check=True, + ) + out_f.write(res.stdout) + except subprocess.CalledProcessError as e: + logger.error(f"cmd: {e.cmd}, err: {e.stderr.decode('utf-8')}") + os.remove(h_graph) + raise + return h_graph, h_out_units_file, disambig_in_units_file_int + + +def create_HLGa( + kaldi_root: Path, + fst_dir: Path, + unique_label: str, + h_graph: Path, + lg_graph: Path, + disambig_in_words_file_int: Path, +) -> Path: + hlga_graph = fst_dir / f"HLGa.{unique_label}.fst" + + if not hlga_graph.exists(): + logger.info(f"Creating {hlga_graph}") + + fsttablecompose = kaldi_root / "src/fstbin/fsttablecompose" + fstdeterminizestar = kaldi_root / "src/fstbin/fstdeterminizestar" + fstrmsymbols = kaldi_root / "src/fstbin/fstrmsymbols" + fstrmepslocal = kaldi_root / "src/fstbin/fstrmepslocal" + fstminimizeencoded = kaldi_root / "src/fstbin/fstminimizeencoded" + + try: + with open(hlga_graph, "wb") as out_f: + res = subprocess.run( + [ + fsttablecompose, + h_graph, + lg_graph, + ], + capture_output=True, + check=True, + ) + res = subprocess.run( + [fstdeterminizestar, "--use-log=true"], + input=res.stdout, + capture_output=True, + check=True, + ) + res = subprocess.run( + [fstrmsymbols, disambig_in_words_file_int], + input=res.stdout, + capture_output=True, + check=True, + ) + res = subprocess.run( + [fstrmepslocal], + input=res.stdout, + capture_output=True, + check=True, + ) + res = subprocess.run( + [fstminimizeencoded], + input=res.stdout, + capture_output=True, + check=True, + ) + out_f.write(res.stdout) + except subprocess.CalledProcessError as e: + logger.error(f"cmd: {e.cmd}, err: {e.stderr.decode('utf-8')}") + os.remove(hlga_graph) + raise + + return hlga_graph + + +def create_HLa( + kaldi_root: Path, + fst_dir: Path, + unique_label: str, + h_graph: Path, + l_graph: Path, + disambig_in_words_file_int: Path, +) -> Path: + hla_graph = fst_dir / f"HLa.{unique_label}.fst" + + if not hla_graph.exists(): + logger.info(f"Creating {hla_graph}") + + fsttablecompose = kaldi_root / "src/fstbin/fsttablecompose" + fstdeterminizestar = kaldi_root / "src/fstbin/fstdeterminizestar" + fstrmsymbols = kaldi_root / "src/fstbin/fstrmsymbols" + fstrmepslocal = kaldi_root / 
"src/fstbin/fstrmepslocal" + fstminimizeencoded = kaldi_root / "src/fstbin/fstminimizeencoded" + + try: + with open(hla_graph, "wb") as out_f: + res = subprocess.run( + [ + fsttablecompose, + h_graph, + l_graph, + ], + capture_output=True, + check=True, + ) + res = subprocess.run( + [fstdeterminizestar, "--use-log=true"], + input=res.stdout, + capture_output=True, + check=True, + ) + res = subprocess.run( + [fstrmsymbols, disambig_in_words_file_int], + input=res.stdout, + capture_output=True, + check=True, + ) + res = subprocess.run( + [fstrmepslocal], + input=res.stdout, + capture_output=True, + check=True, + ) + res = subprocess.run( + [fstminimizeencoded], + input=res.stdout, + capture_output=True, + check=True, + ) + out_f.write(res.stdout) + except subprocess.CalledProcessError as e: + logger.error(f"cmd: {e.cmd}, err: {e.stderr.decode('utf-8')}") + os.remove(hla_graph) + raise + + return hla_graph + + +def create_HLG( + kaldi_root: Path, + fst_dir: Path, + unique_label: str, + hlga_graph: Path, + prefix: str = "HLG", +) -> Path: + hlg_graph = fst_dir / f"{prefix}.{unique_label}.fst" + + if not hlg_graph.exists(): + logger.info(f"Creating {hlg_graph}") + + add_self_loop = script_dir / "add-self-loop-simple" + kaldi_src = kaldi_root / "src" + kaldi_lib = kaldi_src / "lib" + + try: + if not add_self_loop.exists(): + fst_include = kaldi_root / "tools/openfst-1.6.7/include" + add_self_loop_src = script_dir / "add-self-loop-simple.cc" + + subprocess.run( + [ + "c++", + f"-I{kaldi_src}", + f"-I{fst_include}", + f"-L{kaldi_lib}", + add_self_loop_src, + "-lkaldi-base", + "-lkaldi-fstext", + "-o", + add_self_loop, + ], + check=True, + ) + + my_env = os.environ.copy() + my_env["LD_LIBRARY_PATH"] = f"{kaldi_lib}:{my_env['LD_LIBRARY_PATH']}" + + subprocess.run( + [ + add_self_loop, + hlga_graph, + hlg_graph, + ], + check=True, + capture_output=True, + env=my_env, + ) + except subprocess.CalledProcessError as e: + logger.error(f"cmd: {e.cmd}, err: {e.stderr.decode('utf-8')}") + raise + + return hlg_graph + + +def initalize_kaldi(cfg: KaldiInitializerConfig) -> Path: + if cfg.fst_dir is None: + cfg.fst_dir = osp.join(cfg.data_dir, "kaldi") + if cfg.out_labels is None: + cfg.out_labels = cfg.in_labels + + kaldi_root = Path(cfg.kaldi_root) + data_dir = Path(cfg.data_dir) + fst_dir = Path(cfg.fst_dir) + fst_dir.mkdir(parents=True, exist_ok=True) + + arpa_base = osp.splitext(osp.basename(cfg.lm_arpa))[0] + unique_label = f"{cfg.in_labels}.{arpa_base}" + + with open(data_dir / f"dict.{cfg.in_labels}.txt", "r") as f: + vocab = Dictionary.load(f) + + in_units_file = create_units(fst_dir, cfg.in_labels, vocab) + + grammar_graph, out_words_file = create_G( + kaldi_root, fst_dir, Path(cfg.lm_arpa), arpa_base + ) + + disambig_lexicon_file, disambig_L_in_units_file = create_lexicon( + cfg, fst_dir, unique_label, in_units_file, out_words_file + ) + + h_graph, h_out_units_file, disambig_in_units_file_int = create_H( + kaldi_root, + fst_dir, + disambig_L_in_units_file, + cfg.in_labels, + vocab, + cfg.blank_symbol, + cfg.silence_symbol, + ) + lexicon_graph = create_L( + kaldi_root, + fst_dir, + unique_label, + disambig_lexicon_file, + disambig_L_in_units_file, + out_words_file, + ) + lg_graph = create_LG( + kaldi_root, fst_dir, unique_label, lexicon_graph, grammar_graph + ) + hlga_graph = create_HLGa( + kaldi_root, fst_dir, unique_label, h_graph, lg_graph, disambig_in_units_file_int + ) + hlg_graph = create_HLG(kaldi_root, fst_dir, unique_label, hlga_graph) + + # for debugging + # hla_graph = 
create_HLa(kaldi_root, fst_dir, unique_label, h_graph, lexicon_graph, disambig_in_units_file_int) + # hl_graph = create_HLG(kaldi_root, fst_dir, unique_label, hla_graph, prefix="HL_looped") + # create_HLG(kaldi_root, fst_dir, "phnc", h_graph, prefix="H_looped") + + return hlg_graph + + +@hydra.main(config_path=config_path, config_name="kaldi_initializer") +def cli_main(cfg: KaldiInitializerConfig) -> None: + container = OmegaConf.to_container(cfg, resolve=True, enum_to_str=True) + cfg = OmegaConf.create(container) + OmegaConf.set_struct(cfg, True) + initalize_kaldi(cfg) + + +if __name__ == "__main__": + + logging.root.setLevel(logging.INFO) + logging.basicConfig(level=logging.INFO) + + try: + from hydra._internal.utils import ( + get_args, + ) # pylint: disable=import-outside-toplevel + + cfg_name = get_args().config_name or "kaldi_initializer" + except ImportError: + logger.warning("Failed to get config name from hydra args") + cfg_name = "kaldi_initializer" + + cs = ConfigStore.instance() + cs.store(name=cfg_name, node=KaldiInitializerConfig) + + cli_main() diff --git a/SpeechT5/fairseq/examples/speech_recognition/models/__init__.py b/SpeechT5/fairseq/examples/speech_recognition/models/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..54b5a1c31243e55d384f80ef9514461cd35b15c6 --- /dev/null +++ b/SpeechT5/fairseq/examples/speech_recognition/models/__init__.py @@ -0,0 +1,8 @@ +import importlib +import os + + +for file in sorted(os.listdir(os.path.dirname(__file__))): + if file.endswith(".py") and not file.startswith("_"): + model_name = file[: file.find(".py")] + importlib.import_module("examples.speech_recognition.models." + model_name) diff --git a/SpeechT5/fairseq/examples/speech_recognition/models/vggtransformer.py b/SpeechT5/fairseq/examples/speech_recognition/models/vggtransformer.py new file mode 100644 index 0000000000000000000000000000000000000000..97974360a454b581eb63bdfd2af2e2afa05596c7 --- /dev/null +++ b/SpeechT5/fairseq/examples/speech_recognition/models/vggtransformer.py @@ -0,0 +1,1019 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. 
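+#
+# Configuration sketch (illustrative only; the example values are copied from the
+# registered architectures further down in this file). The *-config options are
+# string-encoded tuples that build_encoder()/build_decoder() pass through eval():
+#
+#   # one tuple per VGG block:
+#   #   (out_channels, conv_kernel_size, pooling_kernel_size,
+#   #    num_conv_layers, use_layer_norm)
+#   vggblock_enc_config = "[(64, 3, 2, 2, True), (128, 3, 2, 2, True)]"
+#
+#   # one tuple per transformer layer:
+#   #   (input_dim, num_heads, ffn_dim, normalize_before,
+#   #    dropout, attention_dropout, relu_dropout)
+#   transformer_enc_config = "((1024, 16, 4096, True, 0.15, 0.15, 0.15),) * 16"
+#
+#   # one tuple per decoder 1-D convolution:
+#   #   (out_channels, conv_kernel_size, use_layer_norm)
+#   conv_dec_config = "((256, 3, True),) * 4"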
+ +import argparse +import math +from collections.abc import Iterable + +import torch +import torch.nn as nn +from examples.speech_recognition.data.data_utils import lengths_to_encoder_padding_mask +from fairseq import utils +from fairseq.models import ( + FairseqEncoder, + FairseqEncoderDecoderModel, + FairseqEncoderModel, + FairseqIncrementalDecoder, + register_model, + register_model_architecture, +) +from fairseq.modules import ( + LinearizedConvolution, + TransformerDecoderLayer, + TransformerEncoderLayer, + VGGBlock, +) + + +@register_model("asr_vggtransformer") +class VGGTransformerModel(FairseqEncoderDecoderModel): + """ + Transformers with convolutional context for ASR + https://arxiv.org/abs/1904.11660 + """ + + def __init__(self, encoder, decoder): + super().__init__(encoder, decoder) + + @staticmethod + def add_args(parser): + """Add model-specific arguments to the parser.""" + parser.add_argument( + "--input-feat-per-channel", + type=int, + metavar="N", + help="encoder input dimension per input channel", + ) + parser.add_argument( + "--vggblock-enc-config", + type=str, + metavar="EXPR", + help=""" + an array of tuples each containing the configuration of one vggblock: + [(out_channels, + conv_kernel_size, + pooling_kernel_size, + num_conv_layers, + use_layer_norm), ...]) + """, + ) + parser.add_argument( + "--transformer-enc-config", + type=str, + metavar="EXPR", + help="""" + a tuple containing the configuration of the encoder transformer layers + configurations: + [(input_dim, + num_heads, + ffn_dim, + normalize_before, + dropout, + attention_dropout, + relu_dropout), ...]') + """, + ) + parser.add_argument( + "--enc-output-dim", + type=int, + metavar="N", + help=""" + encoder output dimension, can be None. If specified, projecting the + transformer output to the specified dimension""", + ) + parser.add_argument( + "--in-channels", + type=int, + metavar="N", + help="number of encoder input channels", + ) + parser.add_argument( + "--tgt-embed-dim", + type=int, + metavar="N", + help="embedding dimension of the decoder target tokens", + ) + parser.add_argument( + "--transformer-dec-config", + type=str, + metavar="EXPR", + help=""" + a tuple containing the configuration of the decoder transformer layers + configurations: + [(input_dim, + num_heads, + ffn_dim, + normalize_before, + dropout, + attention_dropout, + relu_dropout), ...] 
+ """, + ) + parser.add_argument( + "--conv-dec-config", + type=str, + metavar="EXPR", + help=""" + an array of tuples for the decoder 1-D convolution config + [(out_channels, conv_kernel_size, use_layer_norm), ...]""", + ) + + @classmethod + def build_encoder(cls, args, task): + return VGGTransformerEncoder( + input_feat_per_channel=args.input_feat_per_channel, + vggblock_config=eval(args.vggblock_enc_config), + transformer_config=eval(args.transformer_enc_config), + encoder_output_dim=args.enc_output_dim, + in_channels=args.in_channels, + ) + + @classmethod + def build_decoder(cls, args, task): + return TransformerDecoder( + dictionary=task.target_dictionary, + embed_dim=args.tgt_embed_dim, + transformer_config=eval(args.transformer_dec_config), + conv_config=eval(args.conv_dec_config), + encoder_output_dim=args.enc_output_dim, + ) + + @classmethod + def build_model(cls, args, task): + """Build a new model instance.""" + # make sure that all args are properly defaulted + # (in case there are any new ones) + base_architecture(args) + + encoder = cls.build_encoder(args, task) + decoder = cls.build_decoder(args, task) + return cls(encoder, decoder) + + def get_normalized_probs(self, net_output, log_probs, sample=None): + # net_output['encoder_out'] is a (B, T, D) tensor + lprobs = super().get_normalized_probs(net_output, log_probs, sample) + lprobs.batch_first = True + return lprobs + + +DEFAULT_ENC_VGGBLOCK_CONFIG = ((32, 3, 2, 2, False),) * 2 +DEFAULT_ENC_TRANSFORMER_CONFIG = ((256, 4, 1024, True, 0.2, 0.2, 0.2),) * 2 +# 256: embedding dimension +# 4: number of heads +# 1024: FFN +# True: apply layerNorm before (dropout + resiaul) instead of after +# 0.2 (dropout): dropout after MultiheadAttention and second FC +# 0.2 (attention_dropout): dropout in MultiheadAttention +# 0.2 (relu_dropout): dropout after ReLu +DEFAULT_DEC_TRANSFORMER_CONFIG = ((256, 2, 1024, True, 0.2, 0.2, 0.2),) * 2 +DEFAULT_DEC_CONV_CONFIG = ((256, 3, True),) * 2 + + +# TODO: repace transformer encoder config from one liner +# to explicit args to get rid of this transformation +def prepare_transformer_encoder_params( + input_dim, + num_heads, + ffn_dim, + normalize_before, + dropout, + attention_dropout, + relu_dropout, +): + args = argparse.Namespace() + args.encoder_embed_dim = input_dim + args.encoder_attention_heads = num_heads + args.attention_dropout = attention_dropout + args.dropout = dropout + args.activation_dropout = relu_dropout + args.encoder_normalize_before = normalize_before + args.encoder_ffn_embed_dim = ffn_dim + return args + + +def prepare_transformer_decoder_params( + input_dim, + num_heads, + ffn_dim, + normalize_before, + dropout, + attention_dropout, + relu_dropout, +): + args = argparse.Namespace() + args.decoder_embed_dim = input_dim + args.decoder_attention_heads = num_heads + args.attention_dropout = attention_dropout + args.dropout = dropout + args.activation_dropout = relu_dropout + args.decoder_normalize_before = normalize_before + args.decoder_ffn_embed_dim = ffn_dim + return args + + +class VGGTransformerEncoder(FairseqEncoder): + """VGG + Transformer encoder""" + + def __init__( + self, + input_feat_per_channel, + vggblock_config=DEFAULT_ENC_VGGBLOCK_CONFIG, + transformer_config=DEFAULT_ENC_TRANSFORMER_CONFIG, + encoder_output_dim=512, + in_channels=1, + transformer_context=None, + transformer_sampling=None, + ): + """constructor for VGGTransformerEncoder + + Args: + - input_feat_per_channel: feature dim (not including stacked, + just base feature) + - in_channel: # input channels 
(e.g., if stack 8 feature vector + together, this is 8) + - vggblock_config: configuration of vggblock, see comments on + DEFAULT_ENC_VGGBLOCK_CONFIG + - transformer_config: configuration of transformer layer, see comments + on DEFAULT_ENC_TRANSFORMER_CONFIG + - encoder_output_dim: final transformer output embedding dimension + - transformer_context: (left, right) if set, self-attention will be focused + on (t-left, t+right) + - transformer_sampling: an iterable of int, must match with + len(transformer_config), transformer_sampling[i] indicates sampling + factor for i-th transformer layer, after multihead att and feedfoward + part + """ + super().__init__(None) + + self.num_vggblocks = 0 + if vggblock_config is not None: + if not isinstance(vggblock_config, Iterable): + raise ValueError("vggblock_config is not iterable") + self.num_vggblocks = len(vggblock_config) + + self.conv_layers = nn.ModuleList() + self.in_channels = in_channels + self.input_dim = input_feat_per_channel + self.pooling_kernel_sizes = [] + + if vggblock_config is not None: + for _, config in enumerate(vggblock_config): + ( + out_channels, + conv_kernel_size, + pooling_kernel_size, + num_conv_layers, + layer_norm, + ) = config + self.conv_layers.append( + VGGBlock( + in_channels, + out_channels, + conv_kernel_size, + pooling_kernel_size, + num_conv_layers, + input_dim=input_feat_per_channel, + layer_norm=layer_norm, + ) + ) + self.pooling_kernel_sizes.append(pooling_kernel_size) + in_channels = out_channels + input_feat_per_channel = self.conv_layers[-1].output_dim + + transformer_input_dim = self.infer_conv_output_dim( + self.in_channels, self.input_dim + ) + # transformer_input_dim is the output dimension of VGG part + + self.validate_transformer_config(transformer_config) + self.transformer_context = self.parse_transformer_context(transformer_context) + self.transformer_sampling = self.parse_transformer_sampling( + transformer_sampling, len(transformer_config) + ) + + self.transformer_layers = nn.ModuleList() + + if transformer_input_dim != transformer_config[0][0]: + self.transformer_layers.append( + Linear(transformer_input_dim, transformer_config[0][0]) + ) + self.transformer_layers.append( + TransformerEncoderLayer( + prepare_transformer_encoder_params(*transformer_config[0]) + ) + ) + + for i in range(1, len(transformer_config)): + if transformer_config[i - 1][0] != transformer_config[i][0]: + self.transformer_layers.append( + Linear(transformer_config[i - 1][0], transformer_config[i][0]) + ) + self.transformer_layers.append( + TransformerEncoderLayer( + prepare_transformer_encoder_params(*transformer_config[i]) + ) + ) + + self.encoder_output_dim = encoder_output_dim + self.transformer_layers.extend( + [ + Linear(transformer_config[-1][0], encoder_output_dim), + LayerNorm(encoder_output_dim), + ] + ) + + def forward(self, src_tokens, src_lengths, **kwargs): + """ + src_tokens: padded tensor (B, T, C * feat) + src_lengths: tensor of original lengths of input utterances (B,) + """ + bsz, max_seq_len, _ = src_tokens.size() + x = src_tokens.view(bsz, max_seq_len, self.in_channels, self.input_dim) + x = x.transpose(1, 2).contiguous() + # (B, C, T, feat) + + for layer_idx in range(len(self.conv_layers)): + x = self.conv_layers[layer_idx](x) + + bsz, _, output_seq_len, _ = x.size() + + # (B, C, T, feat) -> (B, T, C, feat) -> (T, B, C, feat) -> (T, B, C * feat) + x = x.transpose(1, 2).transpose(0, 1) + x = x.contiguous().view(output_seq_len, bsz, -1) + + input_lengths = src_lengths.clone() + for s in 
self.pooling_kernel_sizes: + input_lengths = (input_lengths.float() / s).ceil().long() + + encoder_padding_mask, _ = lengths_to_encoder_padding_mask( + input_lengths, batch_first=True + ) + if not encoder_padding_mask.any(): + encoder_padding_mask = None + + subsampling_factor = int(max_seq_len * 1.0 / output_seq_len + 0.5) + attn_mask = self.lengths_to_attn_mask(input_lengths, subsampling_factor) + + transformer_layer_idx = 0 + + for layer_idx in range(len(self.transformer_layers)): + + if isinstance(self.transformer_layers[layer_idx], TransformerEncoderLayer): + x = self.transformer_layers[layer_idx]( + x, encoder_padding_mask, attn_mask + ) + + if self.transformer_sampling[transformer_layer_idx] != 1: + sampling_factor = self.transformer_sampling[transformer_layer_idx] + x, encoder_padding_mask, attn_mask = self.slice( + x, encoder_padding_mask, attn_mask, sampling_factor + ) + + transformer_layer_idx += 1 + + else: + x = self.transformer_layers[layer_idx](x) + + # encoder_padding_maks is a (T x B) tensor, its [t, b] elements indicate + # whether encoder_output[t, b] is valid or not (valid=0, invalid=1) + + return { + "encoder_out": x, # (T, B, C) + "encoder_padding_mask": encoder_padding_mask.t() + if encoder_padding_mask is not None + else None, + # (B, T) --> (T, B) + } + + def infer_conv_output_dim(self, in_channels, input_dim): + sample_seq_len = 200 + sample_bsz = 10 + x = torch.randn(sample_bsz, in_channels, sample_seq_len, input_dim) + for i, _ in enumerate(self.conv_layers): + x = self.conv_layers[i](x) + x = x.transpose(1, 2) + mb, seq = x.size()[:2] + return x.contiguous().view(mb, seq, -1).size(-1) + + def validate_transformer_config(self, transformer_config): + for config in transformer_config: + input_dim, num_heads = config[:2] + if input_dim % num_heads != 0: + msg = ( + "ERROR in transformer config {}: ".format(config) + + "input dimension {} ".format(input_dim) + + "not dividable by number of heads {}".format(num_heads) + ) + raise ValueError(msg) + + def parse_transformer_context(self, transformer_context): + """ + transformer_context can be the following: + - None; indicates no context is used, i.e., + transformer can access full context + - a tuple/list of two int; indicates left and right context, + any number <0 indicates infinite context + * e.g., (5, 6) indicates that for query at x_t, transformer can + access [t-5, t+6] (inclusive) + * e.g., (-1, 6) indicates that for query at x_t, transformer can + access [0, t+6] (inclusive) + """ + if transformer_context is None: + return None + + if not isinstance(transformer_context, Iterable): + raise ValueError("transformer context must be Iterable if it is not None") + + if len(transformer_context) != 2: + raise ValueError("transformer context must have length 2") + + left_context = transformer_context[0] + if left_context < 0: + left_context = None + + right_context = transformer_context[1] + if right_context < 0: + right_context = None + + if left_context is None and right_context is None: + return None + + return (left_context, right_context) + + def parse_transformer_sampling(self, transformer_sampling, num_layers): + """ + parsing transformer sampling configuration + + Args: + - transformer_sampling, accepted input: + * None, indicating no sampling + * an Iterable with int (>0) as element + - num_layers, expected number of transformer layers, must match with + the length of transformer_sampling if it is not None + + Returns: + - A tuple with length num_layers + """ + if transformer_sampling is None: + return (1,) * 
num_layers + + if not isinstance(transformer_sampling, Iterable): + raise ValueError( + "transformer_sampling must be an iterable if it is not None" + ) + + if len(transformer_sampling) != num_layers: + raise ValueError( + "transformer_sampling {} does not match with the number " + "of layers {}".format(transformer_sampling, num_layers) + ) + + for layer, value in enumerate(transformer_sampling): + if not isinstance(value, int): + raise ValueError("Invalid value in transformer_sampling: ") + if value < 1: + raise ValueError( + "{} layer's subsampling is {}.".format(layer, value) + + " This is not allowed! " + ) + return transformer_sampling + + def slice(self, embedding, padding_mask, attn_mask, sampling_factor): + """ + embedding is a (T, B, D) tensor + padding_mask is a (B, T) tensor or None + attn_mask is a (T, T) tensor or None + """ + embedding = embedding[::sampling_factor, :, :] + if padding_mask is not None: + padding_mask = padding_mask[:, ::sampling_factor] + if attn_mask is not None: + attn_mask = attn_mask[::sampling_factor, ::sampling_factor] + + return embedding, padding_mask, attn_mask + + def lengths_to_attn_mask(self, input_lengths, subsampling_factor=1): + """ + create attention mask according to sequence lengths and transformer + context + + Args: + - input_lengths: (B, )-shape Int/Long tensor; input_lengths[b] is + the length of b-th sequence + - subsampling_factor: int + * Note that the left_context and right_context is specified in + the input frame-level while input to transformer may already + go through subsampling (e.g., the use of striding in vggblock) + we use subsampling_factor to scale the left/right context + + Return: + - a (T, T) binary tensor or None, where T is max(input_lengths) + * if self.transformer_context is None, None + * if left_context is None, + * attn_mask[t, t + right_context + 1:] = 1 + * others = 0 + * if right_context is None, + * attn_mask[t, 0:t - left_context] = 1 + * others = 0 + * elsif + * attn_mask[t, t - left_context: t + right_context + 1] = 0 + * others = 1 + """ + if self.transformer_context is None: + return None + + maxT = torch.max(input_lengths).item() + attn_mask = torch.zeros(maxT, maxT) + + left_context = self.transformer_context[0] + right_context = self.transformer_context[1] + if left_context is not None: + left_context = math.ceil(self.transformer_context[0] / subsampling_factor) + if right_context is not None: + right_context = math.ceil(self.transformer_context[1] / subsampling_factor) + + for t in range(maxT): + if left_context is not None: + st = 0 + en = max(st, t - left_context) + attn_mask[t, st:en] = 1 + if right_context is not None: + st = t + right_context + 1 + st = min(st, maxT - 1) + attn_mask[t, st:] = 1 + + return attn_mask.to(input_lengths.device) + + def reorder_encoder_out(self, encoder_out, new_order): + encoder_out["encoder_out"] = encoder_out["encoder_out"].index_select( + 1, new_order + ) + if encoder_out["encoder_padding_mask"] is not None: + encoder_out["encoder_padding_mask"] = encoder_out[ + "encoder_padding_mask" + ].index_select(1, new_order) + return encoder_out + + +class TransformerDecoder(FairseqIncrementalDecoder): + """ + Transformer decoder consisting of *args.decoder_layers* layers. Each layer + is a :class:`TransformerDecoderLayer`. 
+ Args: + args (argparse.Namespace): parsed command-line arguments + dictionary (~fairseq.data.Dictionary): decoding dictionary + embed_tokens (torch.nn.Embedding): output embedding + no_encoder_attn (bool, optional): whether to attend to encoder outputs. + Default: ``False`` + left_pad (bool, optional): whether the input is left-padded. Default: + ``False`` + """ + + def __init__( + self, + dictionary, + embed_dim=512, + transformer_config=DEFAULT_ENC_TRANSFORMER_CONFIG, + conv_config=DEFAULT_DEC_CONV_CONFIG, + encoder_output_dim=512, + ): + + super().__init__(dictionary) + vocab_size = len(dictionary) + self.padding_idx = dictionary.pad() + self.embed_tokens = Embedding(vocab_size, embed_dim, self.padding_idx) + + self.conv_layers = nn.ModuleList() + for i in range(len(conv_config)): + out_channels, kernel_size, layer_norm = conv_config[i] + if i == 0: + conv_layer = LinearizedConv1d( + embed_dim, out_channels, kernel_size, padding=kernel_size - 1 + ) + else: + conv_layer = LinearizedConv1d( + conv_config[i - 1][0], + out_channels, + kernel_size, + padding=kernel_size - 1, + ) + self.conv_layers.append(conv_layer) + if layer_norm: + self.conv_layers.append(nn.LayerNorm(out_channels)) + self.conv_layers.append(nn.ReLU()) + + self.layers = nn.ModuleList() + if conv_config[-1][0] != transformer_config[0][0]: + self.layers.append(Linear(conv_config[-1][0], transformer_config[0][0])) + self.layers.append( + TransformerDecoderLayer( + prepare_transformer_decoder_params(*transformer_config[0]) + ) + ) + + for i in range(1, len(transformer_config)): + if transformer_config[i - 1][0] != transformer_config[i][0]: + self.layers.append( + Linear(transformer_config[i - 1][0], transformer_config[i][0]) + ) + self.layers.append( + TransformerDecoderLayer( + prepare_transformer_decoder_params(*transformer_config[i]) + ) + ) + self.fc_out = Linear(transformer_config[-1][0], vocab_size) + + def forward(self, prev_output_tokens, encoder_out=None, incremental_state=None): + """ + Args: + prev_output_tokens (LongTensor): previous decoder outputs of shape + `(batch, tgt_len)`, for input feeding/teacher forcing + encoder_out (Tensor, optional): output from the encoder, used for + encoder-side attention + incremental_state (dict): dictionary used for storing state during + :ref:`Incremental decoding` + Returns: + tuple: + - the last decoder layer's output of shape `(batch, tgt_len, + vocab)` + - the last decoder layer's attention weights of shape `(batch, + tgt_len, src_len)` + """ + target_padding_mask = ( + (prev_output_tokens == self.padding_idx).to(prev_output_tokens.device) + if incremental_state is None + else None + ) + + if incremental_state is not None: + prev_output_tokens = prev_output_tokens[:, -1:] + + # embed tokens + x = self.embed_tokens(prev_output_tokens) + + # B x T x C -> T x B x C + x = self._transpose_if_training(x, incremental_state) + + for layer in self.conv_layers: + if isinstance(layer, LinearizedConvolution): + x = layer(x, incremental_state) + else: + x = layer(x) + + # B x T x C -> T x B x C + x = self._transpose_if_inference(x, incremental_state) + + # decoder layers + for layer in self.layers: + if isinstance(layer, TransformerDecoderLayer): + x, *_ = layer( + x, + (encoder_out["encoder_out"] if encoder_out is not None else None), + ( + encoder_out["encoder_padding_mask"].t() + if encoder_out["encoder_padding_mask"] is not None + else None + ), + incremental_state, + self_attn_mask=( + self.buffered_future_mask(x) + if incremental_state is None + else None + ), + 
self_attn_padding_mask=( + target_padding_mask if incremental_state is None else None + ), + ) + else: + x = layer(x) + + # T x B x C -> B x T x C + x = x.transpose(0, 1) + + x = self.fc_out(x) + + return x, None + + def buffered_future_mask(self, tensor): + dim = tensor.size(0) + if ( + not hasattr(self, "_future_mask") + or self._future_mask is None + or self._future_mask.device != tensor.device + ): + self._future_mask = torch.triu( + utils.fill_with_neg_inf(tensor.new(dim, dim)), 1 + ) + if self._future_mask.size(0) < dim: + self._future_mask = torch.triu( + utils.fill_with_neg_inf(self._future_mask.resize_(dim, dim)), 1 + ) + return self._future_mask[:dim, :dim] + + def _transpose_if_training(self, x, incremental_state): + if incremental_state is None: + x = x.transpose(0, 1) + return x + + def _transpose_if_inference(self, x, incremental_state): + if incremental_state: + x = x.transpose(0, 1) + return x + + +@register_model("asr_vggtransformer_encoder") +class VGGTransformerEncoderModel(FairseqEncoderModel): + def __init__(self, encoder): + super().__init__(encoder) + + @staticmethod + def add_args(parser): + """Add model-specific arguments to the parser.""" + parser.add_argument( + "--input-feat-per-channel", + type=int, + metavar="N", + help="encoder input dimension per input channel", + ) + parser.add_argument( + "--vggblock-enc-config", + type=str, + metavar="EXPR", + help=""" + an array of tuples each containing the configuration of one vggblock + [(out_channels, conv_kernel_size, pooling_kernel_size,num_conv_layers), ...] + """, + ) + parser.add_argument( + "--transformer-enc-config", + type=str, + metavar="EXPR", + help=""" + a tuple containing the configuration of the Transformer layers + configurations: + [(input_dim, + num_heads, + ffn_dim, + normalize_before, + dropout, + attention_dropout, + relu_dropout), ]""", + ) + parser.add_argument( + "--enc-output-dim", + type=int, + metavar="N", + help="encoder output dimension, projecting the LSTM output", + ) + parser.add_argument( + "--in-channels", + type=int, + metavar="N", + help="number of encoder input channels", + ) + parser.add_argument( + "--transformer-context", + type=str, + metavar="EXPR", + help=""" + either None or a tuple of two ints, indicating left/right context a + transformer can have access to""", + ) + parser.add_argument( + "--transformer-sampling", + type=str, + metavar="EXPR", + help=""" + either None or a tuple of ints, indicating sampling factor in each layer""", + ) + + @classmethod + def build_model(cls, args, task): + """Build a new model instance.""" + base_architecture_enconly(args) + encoder = VGGTransformerEncoderOnly( + vocab_size=len(task.target_dictionary), + input_feat_per_channel=args.input_feat_per_channel, + vggblock_config=eval(args.vggblock_enc_config), + transformer_config=eval(args.transformer_enc_config), + encoder_output_dim=args.enc_output_dim, + in_channels=args.in_channels, + transformer_context=eval(args.transformer_context), + transformer_sampling=eval(args.transformer_sampling), + ) + return cls(encoder) + + def get_normalized_probs(self, net_output, log_probs, sample=None): + # net_output['encoder_out'] is a (T, B, D) tensor + lprobs = super().get_normalized_probs(net_output, log_probs, sample) + # lprobs is a (T, B, D) tensor + # we need to transoose to get (B, T, D) tensor + lprobs = lprobs.transpose(0, 1).contiguous() + lprobs.batch_first = True + return lprobs + + +class VGGTransformerEncoderOnly(VGGTransformerEncoder): + def __init__( + self, + vocab_size, + 
input_feat_per_channel, + vggblock_config=DEFAULT_ENC_VGGBLOCK_CONFIG, + transformer_config=DEFAULT_ENC_TRANSFORMER_CONFIG, + encoder_output_dim=512, + in_channels=1, + transformer_context=None, + transformer_sampling=None, + ): + super().__init__( + input_feat_per_channel=input_feat_per_channel, + vggblock_config=vggblock_config, + transformer_config=transformer_config, + encoder_output_dim=encoder_output_dim, + in_channels=in_channels, + transformer_context=transformer_context, + transformer_sampling=transformer_sampling, + ) + self.fc_out = Linear(self.encoder_output_dim, vocab_size) + + def forward(self, src_tokens, src_lengths, **kwargs): + """ + src_tokens: padded tensor (B, T, C * feat) + src_lengths: tensor of original lengths of input utterances (B,) + """ + + enc_out = super().forward(src_tokens, src_lengths) + x = self.fc_out(enc_out["encoder_out"]) + # x = F.log_softmax(x, dim=-1) + # Note: no need this line, because model.get_normalized_prob will call + # log_softmax + return { + "encoder_out": x, # (T, B, C) + "encoder_padding_mask": enc_out["encoder_padding_mask"], # (T, B) + } + + def max_positions(self): + """Maximum input length supported by the encoder.""" + return (1e6, 1e6) # an arbitrary large number + + +def Embedding(num_embeddings, embedding_dim, padding_idx): + m = nn.Embedding(num_embeddings, embedding_dim, padding_idx=padding_idx) + # nn.init.uniform_(m.weight, -0.1, 0.1) + # nn.init.constant_(m.weight[padding_idx], 0) + return m + + +def Linear(in_features, out_features, bias=True, dropout=0): + """Linear layer (input: N x T x C)""" + m = nn.Linear(in_features, out_features, bias=bias) + # m.weight.data.uniform_(-0.1, 0.1) + # if bias: + # m.bias.data.uniform_(-0.1, 0.1) + return m + + +def LinearizedConv1d(in_channels, out_channels, kernel_size, dropout=0, **kwargs): + """Weight-normalized Conv1d layer optimized for decoding""" + m = LinearizedConvolution(in_channels, out_channels, kernel_size, **kwargs) + std = math.sqrt((4 * (1.0 - dropout)) / (m.kernel_size[0] * in_channels)) + nn.init.normal_(m.weight, mean=0, std=std) + nn.init.constant_(m.bias, 0) + return nn.utils.weight_norm(m, dim=2) + + +def LayerNorm(embedding_dim): + m = nn.LayerNorm(embedding_dim) + return m + + +# seq2seq models +def base_architecture(args): + args.input_feat_per_channel = getattr(args, "input_feat_per_channel", 40) + args.vggblock_enc_config = getattr( + args, "vggblock_enc_config", DEFAULT_ENC_VGGBLOCK_CONFIG + ) + args.transformer_enc_config = getattr( + args, "transformer_enc_config", DEFAULT_ENC_TRANSFORMER_CONFIG + ) + args.enc_output_dim = getattr(args, "enc_output_dim", 512) + args.in_channels = getattr(args, "in_channels", 1) + args.tgt_embed_dim = getattr(args, "tgt_embed_dim", 128) + args.transformer_dec_config = getattr( + args, "transformer_dec_config", DEFAULT_ENC_TRANSFORMER_CONFIG + ) + args.conv_dec_config = getattr(args, "conv_dec_config", DEFAULT_DEC_CONV_CONFIG) + args.transformer_context = getattr(args, "transformer_context", "None") + + +@register_model_architecture("asr_vggtransformer", "vggtransformer_1") +def vggtransformer_1(args): + args.input_feat_per_channel = getattr(args, "input_feat_per_channel", 80) + args.vggblock_enc_config = getattr( + args, "vggblock_enc_config", "[(64, 3, 2, 2, True), (128, 3, 2, 2, True)]" + ) + args.transformer_enc_config = getattr( + args, + "transformer_enc_config", + "((1024, 16, 4096, True, 0.15, 0.15, 0.15),) * 14", + ) + args.enc_output_dim = getattr(args, "enc_output_dim", 1024) + args.tgt_embed_dim = getattr(args, 
"tgt_embed_dim", 128) + args.conv_dec_config = getattr(args, "conv_dec_config", "((256, 3, True),) * 4") + args.transformer_dec_config = getattr( + args, + "transformer_dec_config", + "((1024, 16, 4096, True, 0.15, 0.15, 0.15),) * 4", + ) + + +@register_model_architecture("asr_vggtransformer", "vggtransformer_2") +def vggtransformer_2(args): + args.input_feat_per_channel = getattr(args, "input_feat_per_channel", 80) + args.vggblock_enc_config = getattr( + args, "vggblock_enc_config", "[(64, 3, 2, 2, True), (128, 3, 2, 2, True)]" + ) + args.transformer_enc_config = getattr( + args, + "transformer_enc_config", + "((1024, 16, 4096, True, 0.15, 0.15, 0.15),) * 16", + ) + args.enc_output_dim = getattr(args, "enc_output_dim", 1024) + args.tgt_embed_dim = getattr(args, "tgt_embed_dim", 512) + args.conv_dec_config = getattr(args, "conv_dec_config", "((256, 3, True),) * 4") + args.transformer_dec_config = getattr( + args, + "transformer_dec_config", + "((1024, 16, 4096, True, 0.15, 0.15, 0.15),) * 6", + ) + + +@register_model_architecture("asr_vggtransformer", "vggtransformer_base") +def vggtransformer_base(args): + args.input_feat_per_channel = getattr(args, "input_feat_per_channel", 80) + args.vggblock_enc_config = getattr( + args, "vggblock_enc_config", "[(64, 3, 2, 2, True), (128, 3, 2, 2, True)]" + ) + args.transformer_enc_config = getattr( + args, "transformer_enc_config", "((512, 8, 2048, True, 0.15, 0.15, 0.15),) * 12" + ) + + args.enc_output_dim = getattr(args, "enc_output_dim", 512) + args.tgt_embed_dim = getattr(args, "tgt_embed_dim", 512) + args.conv_dec_config = getattr(args, "conv_dec_config", "((256, 3, True),) * 4") + args.transformer_dec_config = getattr( + args, "transformer_dec_config", "((512, 8, 2048, True, 0.15, 0.15, 0.15),) * 6" + ) + # Size estimations: + # Encoder: + # - vggblock param: 64*1*3*3 + 64*64*3*3 + 128*64*3*3 + 128*128*3 = 258K + # Transformer: + # - input dimension adapter: 2560 x 512 -> 1.31M + # - transformer_layers (x12) --> 37.74M + # * MultiheadAttention: 512*512*3 (in_proj) + 512*512 (out_proj) = 1.048M + # * FFN weight: 512*2048*2 = 2.097M + # - output dimension adapter: 512 x 512 -> 0.26 M + # Decoder: + # - LinearizedConv1d: 512 * 256 * 3 + 256 * 256 * 3 * 3 + # - transformer_layer: (x6) --> 25.16M + # * MultiheadAttention (self-attention): 512*512*3 + 512*512 = 1.048M + # * MultiheadAttention (encoder-attention): 512*512*3 + 512*512 = 1.048M + # * FFN: 512*2048*2 = 2.097M + # Final FC: + # - FC: 512*5000 = 256K (assuming vocab size 5K) + # In total: + # ~65 M + + +# CTC models +def base_architecture_enconly(args): + args.input_feat_per_channel = getattr(args, "input_feat_per_channel", 40) + args.vggblock_enc_config = getattr( + args, "vggblock_enc_config", "[(32, 3, 2, 2, True)] * 2" + ) + args.transformer_enc_config = getattr( + args, "transformer_enc_config", "((256, 4, 1024, True, 0.2, 0.2, 0.2),) * 2" + ) + args.enc_output_dim = getattr(args, "enc_output_dim", 512) + args.in_channels = getattr(args, "in_channels", 1) + args.transformer_context = getattr(args, "transformer_context", "None") + args.transformer_sampling = getattr(args, "transformer_sampling", "None") + + +@register_model_architecture("asr_vggtransformer_encoder", "vggtransformer_enc_1") +def vggtransformer_enc_1(args): + # vggtransformer_1 is the same as vggtransformer_enc_big, except the number + # of layers is increased to 16 + # keep it here for backward compatiablity purpose + args.input_feat_per_channel = getattr(args, "input_feat_per_channel", 80) + args.vggblock_enc_config = 
getattr( + args, "vggblock_enc_config", "[(64, 3, 2, 2, True), (128, 3, 2, 2, True)]" + ) + args.transformer_enc_config = getattr( + args, + "transformer_enc_config", + "((1024, 16, 4096, True, 0.15, 0.15, 0.15),) * 16", + ) + args.enc_output_dim = getattr(args, "enc_output_dim", 1024) diff --git a/SpeechT5/fairseq/examples/speech_recognition/models/w2l_conv_glu_enc.py b/SpeechT5/fairseq/examples/speech_recognition/models/w2l_conv_glu_enc.py new file mode 100644 index 0000000000000000000000000000000000000000..655a9b0d19d11e35511392a016f9d6b7d7aa2925 --- /dev/null +++ b/SpeechT5/fairseq/examples/speech_recognition/models/w2l_conv_glu_enc.py @@ -0,0 +1,177 @@ +#!/usr/bin/env python3 + +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +import math + +import torch +import torch.nn as nn +import torch.nn.functional as F +from fairseq.models import ( + FairseqEncoder, + FairseqEncoderModel, + register_model, + register_model_architecture, +) +from fairseq.modules.fairseq_dropout import FairseqDropout + + +default_conv_enc_config = """[ + (400, 13, 170, 0.2), + (440, 14, 0, 0.214), + (484, 15, 0, 0.22898), + (532, 16, 0, 0.2450086), + (584, 17, 0, 0.262159202), + (642, 18, 0, 0.28051034614), + (706, 19, 0, 0.30014607037), + (776, 20, 0, 0.321156295296), + (852, 21, 0, 0.343637235966), + (936, 22, 0, 0.367691842484), + (1028, 23, 0, 0.393430271458), + (1130, 24, 0, 0.42097039046), + (1242, 25, 0, 0.450438317792), + (1366, 26, 0, 0.481969000038), + (1502, 27, 0, 0.51570683004), + (1652, 28, 0, 0.551806308143), + (1816, 29, 0, 0.590432749713), +]""" + + +@register_model("asr_w2l_conv_glu_encoder") +class W2lConvGluEncoderModel(FairseqEncoderModel): + def __init__(self, encoder): + super().__init__(encoder) + + @staticmethod + def add_args(parser): + """Add model-specific arguments to the parser.""" + parser.add_argument( + "--input-feat-per-channel", + type=int, + metavar="N", + help="encoder input dimension per input channel", + ) + parser.add_argument( + "--in-channels", + type=int, + metavar="N", + help="number of encoder input channels", + ) + parser.add_argument( + "--conv-enc-config", + type=str, + metavar="EXPR", + help=""" + an array of tuples each containing the configuration of one conv layer + [(out_channels, kernel_size, padding, dropout), ...] 
+ """, + ) + + @classmethod + def build_model(cls, args, task): + """Build a new model instance.""" + conv_enc_config = getattr(args, "conv_enc_config", default_conv_enc_config) + encoder = W2lConvGluEncoder( + vocab_size=len(task.target_dictionary), + input_feat_per_channel=args.input_feat_per_channel, + in_channels=args.in_channels, + conv_enc_config=eval(conv_enc_config), + ) + return cls(encoder) + + def get_normalized_probs(self, net_output, log_probs, sample=None): + lprobs = super().get_normalized_probs(net_output, log_probs, sample) + lprobs.batch_first = False + return lprobs + + +class W2lConvGluEncoder(FairseqEncoder): + def __init__( + self, vocab_size, input_feat_per_channel, in_channels, conv_enc_config + ): + super().__init__(None) + + self.input_dim = input_feat_per_channel + if in_channels != 1: + raise ValueError("only 1 input channel is currently supported") + + self.conv_layers = nn.ModuleList() + self.linear_layers = nn.ModuleList() + self.dropouts = [] + cur_channels = input_feat_per_channel + + for out_channels, kernel_size, padding, dropout in conv_enc_config: + layer = nn.Conv1d(cur_channels, out_channels, kernel_size, padding=padding) + layer.weight.data.mul_(math.sqrt(3)) # match wav2letter init + self.conv_layers.append(nn.utils.weight_norm(layer)) + self.dropouts.append( + FairseqDropout(dropout, module_name=self.__class__.__name__) + ) + if out_channels % 2 != 0: + raise ValueError("odd # of out_channels is incompatible with GLU") + cur_channels = out_channels // 2 # halved by GLU + + for out_channels in [2 * cur_channels, vocab_size]: + layer = nn.Linear(cur_channels, out_channels) + layer.weight.data.mul_(math.sqrt(3)) + self.linear_layers.append(nn.utils.weight_norm(layer)) + cur_channels = out_channels // 2 + + def forward(self, src_tokens, src_lengths, **kwargs): + + """ + src_tokens: padded tensor (B, T, C * feat) + src_lengths: tensor of original lengths of input utterances (B,) + """ + B, T, _ = src_tokens.size() + x = src_tokens.transpose(1, 2).contiguous() # (B, feat, T) assuming C == 1 + + for layer_idx in range(len(self.conv_layers)): + x = self.conv_layers[layer_idx](x) + x = F.glu(x, dim=1) + x = self.dropouts[layer_idx](x) + + x = x.transpose(1, 2).contiguous() # (B, T, 908) + x = self.linear_layers[0](x) + x = F.glu(x, dim=2) + x = self.dropouts[-1](x) + x = self.linear_layers[1](x) + + assert x.size(0) == B + assert x.size(1) == T + + encoder_out = x.transpose(0, 1) # (T, B, vocab_size) + + # need to debug this -- find a simpler/elegant way in pytorch APIs + encoder_padding_mask = ( + torch.arange(T).view(1, T).expand(B, -1).to(x.device) + >= src_lengths.view(B, 1).expand(-1, T) + ).t() # (B x T) -> (T x B) + + return { + "encoder_out": encoder_out, # (T, B, vocab_size) + "encoder_padding_mask": encoder_padding_mask, # (T, B) + } + + def reorder_encoder_out(self, encoder_out, new_order): + encoder_out["encoder_out"] = encoder_out["encoder_out"].index_select( + 1, new_order + ) + encoder_out["encoder_padding_mask"] = encoder_out[ + "encoder_padding_mask" + ].index_select(1, new_order) + return encoder_out + + def max_positions(self): + """Maximum input length supported by the encoder.""" + return (1e6, 1e6) # an arbitrary large number + + +@register_model_architecture("asr_w2l_conv_glu_encoder", "w2l_conv_glu_enc") +def w2l_conv_glu_enc(args): + args.input_feat_per_channel = getattr(args, "input_feat_per_channel", 80) + args.in_channels = getattr(args, "in_channels", 1) + args.conv_enc_config = getattr(args, "conv_enc_config", 
default_conv_enc_config) diff --git a/SpeechT5/fairseq/examples/speech_recognition/new/README.md b/SpeechT5/fairseq/examples/speech_recognition/new/README.md new file mode 100644 index 0000000000000000000000000000000000000000..5fa0e97245d3ba6db69d11222261b0644960183d --- /dev/null +++ b/SpeechT5/fairseq/examples/speech_recognition/new/README.md @@ -0,0 +1,43 @@ +# Flashlight Decoder + +This script runs decoding for pre-trained speech recognition models. + +## Usage + +Assuming a few variables: + +```bash +checkpoint=<path-to-checkpoint> +data=<path-to-data-directory> +lm_model=<path-to-language-model> +lexicon=<path-to-lexicon> +``` + +Example usage for decoding a fine-tuned Wav2Vec model: + +```bash +python $FAIRSEQ_ROOT/examples/speech_recognition/new/infer.py --multirun \ + task=audio_pretraining \ + task.data=$data \ + task.labels=ltr \ + common_eval.path=$checkpoint \ + decoding.type=kenlm \ + decoding.lexicon=$lexicon \ + decoding.lmpath=$lm_model \ + dataset.gen_subset=dev_clean,dev_other,test_clean,test_other +``` + +Example usage for using Ax to sweep WER parameters (requires `pip install hydra-ax-sweeper`): + +```bash +python $FAIRSEQ_ROOT/examples/speech_recognition/new/infer.py --multirun \ + hydra/sweeper=ax \ + task=audio_pretraining \ + task.data=$data \ + task.labels=ltr \ + common_eval.path=$checkpoint \ + decoding.type=kenlm \ + decoding.lexicon=$lexicon \ + decoding.lmpath=$lm_model \ + dataset.gen_subset=dev_other +``` diff --git a/SpeechT5/fairseq/examples/speech_recognition/new/__init__.py b/SpeechT5/fairseq/examples/speech_recognition/new/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 diff --git a/SpeechT5/fairseq/examples/speech_recognition/new/conf/hydra/sweeper/ax.yaml b/SpeechT5/fairseq/examples/speech_recognition/new/conf/hydra/sweeper/ax.yaml new file mode 100644 index 0000000000000000000000000000000000000000..fbeff17ca6b5fb0a1b44de0abe0d1a3d3d2aeeb2 --- /dev/null +++ b/SpeechT5/fairseq/examples/speech_recognition/new/conf/hydra/sweeper/ax.yaml @@ -0,0 +1,26 @@ +# @package hydra.sweeper +_target_: hydra_plugins.hydra_ax_sweeper.ax_sweeper.AxSweeper +max_batch_size: null +ax_config: + max_trials: 128 + early_stop: + minimize: true + max_epochs_without_improvement: 32 + epsilon: 1.0e-05 + experiment: + name: ${dataset.gen_subset} + objective_name: wer + minimize: true + parameter_constraints: null + outcome_constraints: null + status_quo: null + client: + verbose_logging: false + random_seed: null + params: + decoding.lmweight: + type: range + bounds: [0.0, 5.0] + decoding.wordscore: + type: range + bounds: [-5.0, 5.0] diff --git a/SpeechT5/fairseq/examples/speech_recognition/new/conf/infer.yaml b/SpeechT5/fairseq/examples/speech_recognition/new/conf/infer.yaml new file mode 100644 index 0000000000000000000000000000000000000000..f176228082478fae0586a6da60a437e7b377b9ae --- /dev/null +++ b/SpeechT5/fairseq/examples/speech_recognition/new/conf/infer.yaml @@ -0,0 +1,25 @@ +# @package _group_ + +defaults: + - task: null + - model: null + +hydra: + run: + dir: ${common_eval.results_path}/${dataset.gen_subset} + sweep: + dir: ${common_eval.results_path} + subdir: ${dataset.gen_subset} +common_eval: + results_path: null + path: null + post_process: letter + quiet: true +dataset: + max_tokens: 1000000 + gen_subset: test +distributed_training: + distributed_world_size: 1 +decoding: + beam: 5 + type: viterbi diff --git a/SpeechT5/fairseq/examples/speech_recognition/new/decoders/__init__.py 
b/SpeechT5/fairseq/examples/speech_recognition/new/decoders/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 diff --git a/SpeechT5/fairseq/examples/speech_recognition/new/decoders/base_decoder.py b/SpeechT5/fairseq/examples/speech_recognition/new/decoders/base_decoder.py new file mode 100644 index 0000000000000000000000000000000000000000..a097969b3c0650cf8ea2ab5f8e96bbc68ea9b97f --- /dev/null +++ b/SpeechT5/fairseq/examples/speech_recognition/new/decoders/base_decoder.py @@ -0,0 +1,62 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +import itertools as it +from typing import Any, Dict, List + +import torch +from fairseq.data.dictionary import Dictionary +from fairseq.models.fairseq_model import FairseqModel + + +class BaseDecoder: + def __init__(self, tgt_dict: Dictionary) -> None: + self.tgt_dict = tgt_dict + self.vocab_size = len(tgt_dict) + + self.blank = ( + tgt_dict.index("<ctc_blank>") + if "<ctc_blank>" in tgt_dict.indices + else tgt_dict.bos() + ) + if "<sep>" in tgt_dict.indices: + self.silence = tgt_dict.index("<sep>") + elif "|" in tgt_dict.indices: + self.silence = tgt_dict.index("|") + else: + self.silence = tgt_dict.eos() + + def generate( + self, models: List[FairseqModel], sample: Dict[str, Any], **unused + ) -> List[List[Dict[str, torch.LongTensor]]]: + encoder_input = { + k: v for k, v in sample["net_input"].items() if k != "prev_output_tokens" + } + emissions = self.get_emissions(models, encoder_input) + return self.decode(emissions) + + def get_emissions( + self, + models: List[FairseqModel], + encoder_input: Dict[str, Any], + ) -> torch.FloatTensor: + model = models[0] + encoder_out = model(**encoder_input) + if hasattr(model, "get_logits"): + emissions = model.get_logits(encoder_out) + else: + emissions = model.get_normalized_probs(encoder_out, log_probs=True) + return emissions.transpose(0, 1).float().cpu().contiguous() + + def get_tokens(self, idxs: torch.IntTensor) -> torch.LongTensor: + idxs = (g[0] for g in it.groupby(idxs)) + idxs = filter(lambda x: x != self.blank, idxs) + return torch.LongTensor(list(idxs)) + + def decode( + self, + emissions: torch.FloatTensor, + ) -> List[List[Dict[str, torch.LongTensor]]]: + raise NotImplementedError diff --git a/SpeechT5/fairseq/examples/speech_recognition/new/decoders/decoder.py b/SpeechT5/fairseq/examples/speech_recognition/new/decoders/decoder.py new file mode 100644 index 0000000000000000000000000000000000000000..b5bec8cf707b53104ef7a45993a5db2893d3443b --- /dev/null +++ b/SpeechT5/fairseq/examples/speech_recognition/new/decoders/decoder.py @@ -0,0 +1,32 @@ +#!/usr/bin/env python3 + +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. 
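+#
+# Usage sketch (illustrative, not executed; `task`, `models` and `sample` are
+# assumed to come from a loaded fairseq task/checkpoint and its dataloader).
+# Decoder() below is a small factory that returns a BaseDecoder subclass
+# according to cfg.type ("viterbi", "kenlm" or "fairseqlm"):
+#
+#   from examples.speech_recognition.new.decoders.decoder import Decoder
+#   from examples.speech_recognition.new.decoders.decoder_config import DecoderConfig
+#
+#   decoder = Decoder(DecoderConfig(), task.target_dictionary)  # default type: viterbi
+#   hyps = decoder.generate(models, sample)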
+ +from typing import Union + +from fairseq.data.dictionary import Dictionary + +from .decoder_config import DecoderConfig, FlashlightDecoderConfig +from .base_decoder import BaseDecoder + + +def Decoder( + cfg: Union[DecoderConfig, FlashlightDecoderConfig], tgt_dict: Dictionary +) -> BaseDecoder: + + if cfg.type == "viterbi": + from .viterbi_decoder import ViterbiDecoder + + return ViterbiDecoder(tgt_dict) + if cfg.type == "kenlm": + from .flashlight_decoder import KenLMDecoder + + return KenLMDecoder(cfg, tgt_dict) + if cfg.type == "fairseqlm": + from .flashlight_decoder import FairseqLMDecoder + + return FairseqLMDecoder(cfg, tgt_dict) + raise NotImplementedError(f"Invalid decoder name: {cfg.name}") diff --git a/SpeechT5/fairseq/examples/speech_recognition/new/decoders/decoder_config.py b/SpeechT5/fairseq/examples/speech_recognition/new/decoders/decoder_config.py new file mode 100644 index 0000000000000000000000000000000000000000..659eb94a9b8187a7c126d7b439ac2742f9d72022 --- /dev/null +++ b/SpeechT5/fairseq/examples/speech_recognition/new/decoders/decoder_config.py @@ -0,0 +1,70 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +import math +from dataclasses import dataclass, field +from typing import Optional + +from fairseq.dataclass.configs import FairseqDataclass +from fairseq.dataclass.constants import ChoiceEnum +from omegaconf import MISSING + + +DECODER_CHOICES = ChoiceEnum(["viterbi", "kenlm", "fairseqlm"]) + + +@dataclass +class DecoderConfig(FairseqDataclass): + type: DECODER_CHOICES = field( + default="viterbi", + metadata={"help": "The type of decoder to use"}, + ) + + +@dataclass +class FlashlightDecoderConfig(FairseqDataclass): + nbest: int = field( + default=1, + metadata={"help": "Number of decodings to return"}, + ) + unitlm: bool = field( + default=False, + metadata={"help": "If set, use unit language model"}, + ) + lmpath: str = field( + default=MISSING, + metadata={"help": "Language model for KenLM decoder"}, + ) + lexicon: Optional[str] = field( + default=None, + metadata={"help": "Lexicon for Flashlight decoder"}, + ) + beam: int = field( + default=50, + metadata={"help": "Number of beams to use for decoding"}, + ) + beamthreshold: float = field( + default=50.0, + metadata={"help": "Threshold for beam search decoding"}, + ) + beamsizetoken: Optional[int] = field( + default=None, metadata={"help": "Beam size to use"} + ) + wordscore: float = field( + default=-1, + metadata={"help": "Word score for KenLM decoder"}, + ) + unkweight: float = field( + default=-math.inf, + metadata={"help": "Unknown weight for KenLM decoder"}, + ) + silweight: float = field( + default=0, + metadata={"help": "Silence weight for KenLM decoder"}, + ) + lmweight: float = field( + default=2, + metadata={"help": "Weight for LM while interpolating score"}, + ) diff --git a/SpeechT5/fairseq/examples/speech_recognition/new/decoders/flashlight_decoder.py b/SpeechT5/fairseq/examples/speech_recognition/new/decoders/flashlight_decoder.py new file mode 100644 index 0000000000000000000000000000000000000000..38c7ac492f390a367a64769d7a72fe228df097c7 --- /dev/null +++ b/SpeechT5/fairseq/examples/speech_recognition/new/decoders/flashlight_decoder.py @@ -0,0 +1,431 @@ +#!/usr/bin/env python3 + +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. 
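+# Lexicon-based beam-search decoders built on the flashlight text bindings.
+# KenLMDecoder scores words with an n-gram KenLM model; FairseqLMDecoder wraps
+# a neural fairseq language model in the FairseqLM adapter defined below so it
+# can be queried through flashlight's LM/LMState interface. When a lexicon is
+# given, both build a Trie of lexicon spellings, run LexiconDecoder over CTC
+# emissions and return n-best hypotheses with token ids, a score and, where
+# available, word strings. Without a lexicon they fall back to
+# LexiconFreeDecoder, which requires a unit language model (cfg.unitlm=True).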
+ +import gc +import os.path as osp +import warnings +from collections import deque, namedtuple +from typing import Any, Dict, Tuple + +import numpy as np +import torch +from fairseq import tasks +from fairseq.data.dictionary import Dictionary +from fairseq.dataclass.utils import convert_namespace_to_omegaconf +from fairseq.models.fairseq_model import FairseqModel +from fairseq.utils import apply_to_sample +from omegaconf import open_dict, OmegaConf + +from typing import List + +from .decoder_config import FlashlightDecoderConfig +from .base_decoder import BaseDecoder + +try: + from flashlight.lib.text.decoder import ( + LM, + CriterionType, + DecodeResult, + KenLM, + LexiconDecoder, + LexiconDecoderOptions, + LexiconFreeDecoder, + LexiconFreeDecoderOptions, + LMState, + SmearingMode, + Trie, + ) + from flashlight.lib.text.dictionary import create_word_dict, load_words +except ImportError: + warnings.warn( + "flashlight python bindings are required to use this functionality. " + "Please install from " + "https://github.com/facebookresearch/flashlight/tree/master/bindings/python" + ) + LM = object + LMState = object + + +class KenLMDecoder(BaseDecoder): + def __init__(self, cfg: FlashlightDecoderConfig, tgt_dict: Dictionary) -> None: + super().__init__(tgt_dict) + + self.nbest = cfg.nbest + self.unitlm = cfg.unitlm + + if cfg.lexicon: + self.lexicon = load_words(cfg.lexicon) + self.word_dict = create_word_dict(self.lexicon) + self.unk_word = self.word_dict.get_index("<unk>") + + self.lm = KenLM(cfg.lmpath, self.word_dict) + self.trie = Trie(self.vocab_size, self.silence) + + start_state = self.lm.start(False) + for word, spellings in self.lexicon.items(): + word_idx = self.word_dict.get_index(word) + _, score = self.lm.score(start_state, word_idx) + for spelling in spellings: + spelling_idxs = [tgt_dict.index(token) for token in spelling] + assert ( + tgt_dict.unk() not in spelling_idxs + ), f"{word} {spelling} {spelling_idxs}" + self.trie.insert(spelling_idxs, word_idx, score) + self.trie.smear(SmearingMode.MAX) + + self.decoder_opts = LexiconDecoderOptions( + beam_size=cfg.beam, + beam_size_token=cfg.beamsizetoken or len(tgt_dict), + beam_threshold=cfg.beamthreshold, + lm_weight=cfg.lmweight, + word_score=cfg.wordscore, + unk_score=cfg.unkweight, + sil_score=cfg.silweight, + log_add=False, + criterion_type=CriterionType.CTC, + ) + + self.decoder = LexiconDecoder( + self.decoder_opts, + self.trie, + self.lm, + self.silence, + self.blank, + self.unk_word, + [], + self.unitlm, + ) + else: + assert self.unitlm, "Lexicon-free decoding requires unit LM" + + d = {w: [[w]] for w in tgt_dict.symbols} + self.word_dict = create_word_dict(d) + self.lm = KenLM(cfg.lmpath, self.word_dict) + self.decoder_opts = LexiconFreeDecoderOptions( + beam_size=cfg.beam, + beam_size_token=cfg.beamsizetoken or len(tgt_dict), + beam_threshold=cfg.beamthreshold, + lm_weight=cfg.lmweight, + sil_score=cfg.silweight, + log_add=False, + criterion_type=CriterionType.CTC, + ) + self.decoder = LexiconFreeDecoder( + self.decoder_opts, self.lm, self.silence, self.blank, [] + ) + + def get_timesteps(self, token_idxs: List[int]) -> List[int]: + """Returns frame numbers corresponding to every non-blank token. + + Parameters + ---------- + token_idxs : List[int] + IDs of decoded tokens. + + Returns + ------- + List[int] + Frame numbers corresponding to every non-blank token. 
+ """ + timesteps = [] + for i, token_idx in enumerate(token_idxs): + if token_idx == self.blank: + continue + if i == 0 or token_idx != token_idxs[i-1]: + timesteps.append(i) + return timesteps + + def decode( + self, + emissions: torch.FloatTensor, + ) -> List[List[Dict[str, torch.LongTensor]]]: + B, T, N = emissions.size() + hypos = [] + for b in range(B): + emissions_ptr = emissions.data_ptr() + 4 * b * emissions.stride(0) + results = self.decoder.decode(emissions_ptr, T, N) + + nbest_results = results[: self.nbest] + hypos.append( + [ + { + "tokens": self.get_tokens(result.tokens), + "score": result.score, + "timesteps": self.get_timesteps(result.tokens), + "words": [ + self.word_dict.get_entry(x) for x in result.words if x >= 0 + ], + } + for result in nbest_results + ] + ) + return hypos + + +FairseqLMState = namedtuple( + "FairseqLMState", + [ + "prefix", + "incremental_state", + "probs", + ], +) + + +class FairseqLM(LM): + def __init__(self, dictionary: Dictionary, model: FairseqModel) -> None: + super().__init__() + + self.dictionary = dictionary + self.model = model + self.unk = self.dictionary.unk() + + self.save_incremental = False # this currently does not work properly + self.max_cache = 20_000 + + if torch.cuda.is_available(): + model.cuda() + model.eval() + model.make_generation_fast_() + + self.states = {} + self.stateq = deque() + + def start(self, start_with_nothing: bool) -> LMState: + state = LMState() + prefix = torch.LongTensor([[self.dictionary.eos()]]) + incremental_state = {} if self.save_incremental else None + with torch.no_grad(): + res = self.model(prefix.cuda(), incremental_state=incremental_state) + probs = self.model.get_normalized_probs(res, log_probs=True, sample=None) + + if incremental_state is not None: + incremental_state = apply_to_sample(lambda x: x.cpu(), incremental_state) + self.states[state] = FairseqLMState( + prefix.numpy(), incremental_state, probs[0, -1].cpu().numpy() + ) + self.stateq.append(state) + + return state + + def score( + self, + state: LMState, + token_index: int, + no_cache: bool = False, + ) -> Tuple[LMState, int]: + """ + Evaluate language model based on the current lm state and new word + Parameters: + ----------- + state: current lm state + token_index: index of the word + (can be lexicon index then you should store inside LM the + mapping between indices of lexicon and lm, or lm index of a word) + Returns: + -------- + (LMState, float): pair of (new state, score for the current word) + """ + curr_state = self.states[state] + + def trim_cache(targ_size: int) -> None: + while len(self.stateq) > targ_size: + rem_k = self.stateq.popleft() + rem_st = self.states[rem_k] + rem_st = FairseqLMState(rem_st.prefix, None, None) + self.states[rem_k] = rem_st + + if curr_state.probs is None: + new_incremental_state = ( + curr_state.incremental_state.copy() + if curr_state.incremental_state is not None + else None + ) + with torch.no_grad(): + if new_incremental_state is not None: + new_incremental_state = apply_to_sample( + lambda x: x.cuda(), new_incremental_state + ) + elif self.save_incremental: + new_incremental_state = {} + + res = self.model( + torch.from_numpy(curr_state.prefix).cuda(), + incremental_state=new_incremental_state, + ) + probs = self.model.get_normalized_probs( + res, log_probs=True, sample=None + ) + + if new_incremental_state is not None: + new_incremental_state = apply_to_sample( + lambda x: x.cpu(), new_incremental_state + ) + + curr_state = FairseqLMState( + curr_state.prefix, new_incremental_state, probs[0, 
-1].cpu().numpy() + ) + + if not no_cache: + self.states[state] = curr_state + self.stateq.append(state) + + score = curr_state.probs[token_index].item() + + trim_cache(self.max_cache) + + outstate = state.child(token_index) + if outstate not in self.states and not no_cache: + prefix = np.concatenate( + [curr_state.prefix, torch.LongTensor([[token_index]])], -1 + ) + incr_state = curr_state.incremental_state + + self.states[outstate] = FairseqLMState(prefix, incr_state, None) + + if token_index == self.unk: + score = float("-inf") + + return outstate, score + + def finish(self, state: LMState) -> Tuple[LMState, int]: + """ + Evaluate eos for language model based on the current lm state + Returns: + -------- + (LMState, float): pair of (new state, score for the current word) + """ + return self.score(state, self.dictionary.eos()) + + def empty_cache(self) -> None: + self.states = {} + self.stateq = deque() + gc.collect() + + +class FairseqLMDecoder(BaseDecoder): + def __init__(self, cfg: FlashlightDecoderConfig, tgt_dict: Dictionary) -> None: + super().__init__(tgt_dict) + + self.nbest = cfg.nbest + self.unitlm = cfg.unitlm + + self.lexicon = load_words(cfg.lexicon) if cfg.lexicon else None + self.idx_to_wrd = {} + + checkpoint = torch.load(cfg.lmpath, map_location="cpu") + + if "cfg" in checkpoint and checkpoint["cfg"] is not None: + lm_args = checkpoint["cfg"] + else: + lm_args = convert_namespace_to_omegaconf(checkpoint["args"]) + + if not OmegaConf.is_dict(lm_args): + lm_args = OmegaConf.create(lm_args) + + with open_dict(lm_args.task): + lm_args.task.data = osp.dirname(cfg.lmpath) + + task = tasks.setup_task(lm_args.task) + model = task.build_model(lm_args.model) + model.load_state_dict(checkpoint["model"], strict=False) + + self.trie = Trie(self.vocab_size, self.silence) + + self.word_dict = task.dictionary + self.unk_word = self.word_dict.unk() + self.lm = FairseqLM(self.word_dict, model) + + if self.lexicon: + start_state = self.lm.start(False) + for i, (word, spellings) in enumerate(self.lexicon.items()): + if self.unitlm: + word_idx = i + self.idx_to_wrd[i] = word + score = 0 + else: + word_idx = self.word_dict.index(word) + _, score = self.lm.score(start_state, word_idx, no_cache=True) + + for spelling in spellings: + spelling_idxs = [tgt_dict.index(token) for token in spelling] + assert ( + tgt_dict.unk() not in spelling_idxs + ), f"{spelling} {spelling_idxs}" + self.trie.insert(spelling_idxs, word_idx, score) + self.trie.smear(SmearingMode.MAX) + + self.decoder_opts = LexiconDecoderOptions( + beam_size=cfg.beam, + beam_size_token=cfg.beamsizetoken or len(tgt_dict), + beam_threshold=cfg.beamthreshold, + lm_weight=cfg.lmweight, + word_score=cfg.wordscore, + unk_score=cfg.unkweight, + sil_score=cfg.silweight, + log_add=False, + criterion_type=CriterionType.CTC, + ) + + self.decoder = LexiconDecoder( + self.decoder_opts, + self.trie, + self.lm, + self.silence, + self.blank, + self.unk_word, + [], + self.unitlm, + ) + else: + assert self.unitlm, "Lexicon-free decoding requires unit LM" + + d = {w: [[w]] for w in tgt_dict.symbols} + self.word_dict = create_word_dict(d) + self.lm = KenLM(cfg.lmpath, self.word_dict) + self.decoder_opts = LexiconFreeDecoderOptions( + beam_size=cfg.beam, + beam_size_token=cfg.beamsizetoken or len(tgt_dict), + beam_threshold=cfg.beamthreshold, + lm_weight=cfg.lmweight, + sil_score=cfg.silweight, + log_add=False, + criterion_type=CriterionType.CTC, + ) + self.decoder = LexiconFreeDecoder( + self.decoder_opts, self.lm, self.silence, self.blank, [] + ) + + 
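+    # decode() hands the raw float32 emission buffer of each utterance to the
+    # flashlight decoder (hence the `4 * b * stride` byte offset), keeps the
+    # cfg.nbest best results per utterance and clears the FairseqLM state
+    # cache after every item so memory use stays bounded over a long
+    # evaluation run.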
def decode( + self, + emissions: torch.FloatTensor, + ) -> List[List[Dict[str, torch.LongTensor]]]: + B, T, N = emissions.size() + hypos = [] + + def make_hypo(result: DecodeResult) -> Dict[str, Any]: + hypo = { + "tokens": self.get_tokens(result.tokens), + "score": result.score, + } + if self.lexicon: + hypo["words"] = [ + self.idx_to_wrd[x] if self.unitlm else self.word_dict[x] + for x in result.words + if x >= 0 + ] + return hypo + + for b in range(B): + emissions_ptr = emissions.data_ptr() + 4 * b * emissions.stride(0) + results = self.decoder.decode(emissions_ptr, T, N) + + nbest_results = results[: self.nbest] + hypos.append([make_hypo(result) for result in nbest_results]) + self.lm.empty_cache() + + return hypos diff --git a/SpeechT5/fairseq/examples/speech_recognition/new/decoders/viterbi_decoder.py b/SpeechT5/fairseq/examples/speech_recognition/new/decoders/viterbi_decoder.py new file mode 100644 index 0000000000000000000000000000000000000000..b1c47868fa3b4e21f939b0695ede8d14ba1b168d --- /dev/null +++ b/SpeechT5/fairseq/examples/speech_recognition/new/decoders/viterbi_decoder.py @@ -0,0 +1,24 @@ +#!/usr/bin/env python3 + +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +import torch + +from typing import List, Dict + +from .base_decoder import BaseDecoder + + +class ViterbiDecoder(BaseDecoder): + def decode( + self, + emissions: torch.FloatTensor, + ) -> List[List[Dict[str, torch.LongTensor]]]: + def get_pred(e): + toks = e.argmax(dim=-1).unique_consecutive() + return toks[toks != self.blank] + + return [[{"tokens": get_pred(x), "score": 0}] for x in emissions] diff --git a/SpeechT5/fairseq/examples/speech_recognition/new/infer.py b/SpeechT5/fairseq/examples/speech_recognition/new/infer.py new file mode 100644 index 0000000000000000000000000000000000000000..79afbc426d4655b6aa3eb4d12b2293fb57c9a568 --- /dev/null +++ b/SpeechT5/fairseq/examples/speech_recognition/new/infer.py @@ -0,0 +1,471 @@ +#!/usr/bin/env python -u +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. 
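+# Hydra-driven inference entry point for the decoders under ./decoders. It
+# loads the checkpoint ensemble, iterates over the chosen dataset split
+# (sharded across workers when distributed_world_size > 1), decodes every
+# batch with the configured Decoder, accumulates word edit distance against
+# the references, all-reduces the counts across workers and writes the final
+# WER together with the decoding config into a `wer` file (placed under
+# results_path when one is given). See the README next to this script for
+# ready-to-run command lines.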
+ +import ast +import hashlib +import logging +import os +import shutil +import sys +from dataclasses import dataclass, field, is_dataclass +from pathlib import Path +from typing import Any, Dict, List, Optional, Tuple, Union + +import editdistance +import torch +import torch.distributed as dist +from examples.speech_recognition.new.decoders.decoder_config import ( + DecoderConfig, + FlashlightDecoderConfig, +) +from examples.speech_recognition.new.decoders.decoder import Decoder +from fairseq import checkpoint_utils, distributed_utils, progress_bar, tasks, utils +from fairseq.data.data_utils import post_process +from fairseq.dataclass.configs import ( + CheckpointConfig, + CommonConfig, + CommonEvalConfig, + DatasetConfig, + DistributedTrainingConfig, + FairseqDataclass, +) +from fairseq.logging.meters import StopwatchMeter, TimeMeter +from fairseq.logging.progress_bar import BaseProgressBar +from fairseq.models.fairseq_model import FairseqModel +from omegaconf import OmegaConf + +import hydra +from hydra.core.config_store import ConfigStore + +logging.root.setLevel(logging.INFO) +logging.basicConfig(level=logging.INFO) +logger = logging.getLogger(__name__) + +config_path = Path(__file__).resolve().parent / "conf" + + +@dataclass +class DecodingConfig(DecoderConfig, FlashlightDecoderConfig): + unique_wer_file: bool = field( + default=False, + metadata={"help": "If set, use a unique file for storing WER"}, + ) + results_path: Optional[str] = field( + default=None, + metadata={ + "help": "If set, write hypothesis and reference sentences into this directory" + }, + ) + + +@dataclass +class InferConfig(FairseqDataclass): + task: Any = None + decoding: DecodingConfig = DecodingConfig() + common: CommonConfig = CommonConfig() + common_eval: CommonEvalConfig = CommonEvalConfig() + checkpoint: CheckpointConfig = CheckpointConfig() + distributed_training: DistributedTrainingConfig = DistributedTrainingConfig() + dataset: DatasetConfig = DatasetConfig() + is_ax: bool = field( + default=False, + metadata={ + "help": "if true, assumes we are using ax for tuning and returns a tuple for ax to consume" + }, + ) + + +def reset_logging(): + root = logging.getLogger() + for handler in root.handlers: + root.removeHandler(handler) + root.setLevel(os.environ.get("LOGLEVEL", "INFO").upper()) + handler = logging.StreamHandler(sys.stdout) + handler.setFormatter( + logging.Formatter( + fmt="%(asctime)s | %(levelname)s | %(name)s | %(message)s", + datefmt="%Y-%m-%d %H:%M:%S", + ) + ) + root.addHandler(handler) + + +class InferenceProcessor: + cfg: InferConfig + + def __init__(self, cfg: InferConfig) -> None: + self.cfg = cfg + self.task = tasks.setup_task(cfg.task) + self.tgt_dict = self.task.target_dictionary + + models, saved_cfg = self.load_model_ensemble() + self.models = models + self.saved_cfg = saved_cfg + + self.task.load_dataset( + self.cfg.dataset.gen_subset, + task_cfg=saved_cfg.task, + ) + self.generator = Decoder(cfg.decoding, self.tgt_dict) + self.gen_timer = StopwatchMeter() + self.wps_meter = TimeMeter() + self.num_sentences = 0 + self.total_errors = 0 + self.total_length = 0 + + self.hypo_words_file = None + self.hypo_units_file = None + self.ref_words_file = None + self.ref_units_file = None + + self.progress_bar = self.build_progress_bar() + + def __enter__(self) -> "InferenceProcessor": + if self.cfg.decoding.results_path is not None: + self.hypo_words_file = self.get_res_file("hypo.word") + self.hypo_units_file = self.get_res_file("hypo.units") + self.ref_words_file = 
self.get_res_file("ref.word") + self.ref_units_file = self.get_res_file("ref.units") + return self + + def __exit__(self, *exc) -> bool: + if self.cfg.decoding.results_path is not None: + self.hypo_words_file.close() + self.hypo_units_file.close() + self.ref_words_file.close() + self.ref_units_file.close() + return False + + def __iter__(self) -> Any: + for sample in self.progress_bar: + if not self.cfg.common.cpu: + sample = utils.move_to_cuda(sample) + + # Happens on the last batch. + if "net_input" not in sample: + continue + yield sample + + def log(self, *args, **kwargs): + self.progress_bar.log(*args, **kwargs) + + def print(self, *args, **kwargs): + self.progress_bar.print(*args, **kwargs) + + def get_res_file(self, fname: str) -> None: + fname = os.path.join(self.cfg.decoding.results_path, fname) + if self.data_parallel_world_size > 1: + fname = f"{fname}.{self.data_parallel_rank}" + return open(fname, "w", buffering=1) + + def merge_shards(self) -> None: + """Merges all shard files into shard 0, then removes shard suffix.""" + + shard_id = self.data_parallel_rank + num_shards = self.data_parallel_world_size + + if self.data_parallel_world_size > 1: + + def merge_shards_with_root(fname: str) -> None: + fname = os.path.join(self.cfg.decoding.results_path, fname) + logger.info("Merging %s on shard %d", fname, shard_id) + base_fpath = Path(f"{fname}.0") + with open(base_fpath, "a") as out_file: + for s in range(1, num_shards): + shard_fpath = Path(f"{fname}.{s}") + with open(shard_fpath, "r") as in_file: + for line in in_file: + out_file.write(line) + shard_fpath.unlink() + shutil.move(f"{fname}.0", fname) + + dist.barrier() # ensure all shards finished writing + if shard_id == (0 % num_shards): + merge_shards_with_root("hypo.word") + if shard_id == (1 % num_shards): + merge_shards_with_root("hypo.units") + if shard_id == (2 % num_shards): + merge_shards_with_root("ref.word") + if shard_id == (3 % num_shards): + merge_shards_with_root("ref.units") + dist.barrier() + + def optimize_model(self, model: FairseqModel) -> None: + model.make_generation_fast_() + if self.cfg.common.fp16: + model.half() + if not self.cfg.common.cpu: + model.cuda() + + def load_model_ensemble(self) -> Tuple[List[FairseqModel], FairseqDataclass]: + arg_overrides = ast.literal_eval(self.cfg.common_eval.model_overrides) + models, saved_cfg = checkpoint_utils.load_model_ensemble( + utils.split_paths(self.cfg.common_eval.path, separator="\\"), + arg_overrides=arg_overrides, + task=self.task, + suffix=self.cfg.checkpoint.checkpoint_suffix, + strict=(self.cfg.checkpoint.checkpoint_shard_count == 1), + num_shards=self.cfg.checkpoint.checkpoint_shard_count, + ) + for model in models: + self.optimize_model(model) + return models, saved_cfg + + def get_dataset_itr(self, disable_iterator_cache: bool = False) -> None: + return self.task.get_batch_iterator( + dataset=self.task.dataset(self.cfg.dataset.gen_subset), + max_tokens=self.cfg.dataset.max_tokens, + max_sentences=self.cfg.dataset.batch_size, + max_positions=(sys.maxsize, sys.maxsize), + ignore_invalid_inputs=self.cfg.dataset.skip_invalid_size_inputs_valid_test, + required_batch_size_multiple=self.cfg.dataset.required_batch_size_multiple, + seed=self.cfg.common.seed, + num_shards=self.data_parallel_world_size, + shard_id=self.data_parallel_rank, + num_workers=self.cfg.dataset.num_workers, + data_buffer_size=self.cfg.dataset.data_buffer_size, + disable_iterator_cache=disable_iterator_cache, + ).next_epoch_itr(shuffle=False) + + def build_progress_bar( + self, + epoch: 
Optional[int] = None, + prefix: Optional[str] = None, + default_log_format: str = "tqdm", + ) -> BaseProgressBar: + return progress_bar.progress_bar( + iterator=self.get_dataset_itr(), + log_format=self.cfg.common.log_format, + log_interval=self.cfg.common.log_interval, + epoch=epoch, + prefix=prefix, + tensorboard_logdir=self.cfg.common.tensorboard_logdir, + default_log_format=default_log_format, + ) + + @property + def data_parallel_world_size(self): + if self.cfg.distributed_training.distributed_world_size == 1: + return 1 + return distributed_utils.get_data_parallel_world_size() + + @property + def data_parallel_rank(self): + if self.cfg.distributed_training.distributed_world_size == 1: + return 0 + return distributed_utils.get_data_parallel_rank() + + def process_sentence( + self, + sample: Dict[str, Any], + hypo: Dict[str, Any], + sid: int, + batch_id: int, + ) -> Tuple[int, int]: + speaker = None # Speaker can't be parsed from dataset. + + if "target_label" in sample: + toks = sample["target_label"] + else: + toks = sample["target"] + toks = toks[batch_id, :] + + # Processes hypothesis. + hyp_pieces = self.tgt_dict.string(hypo["tokens"].int().cpu()) + if "words" in hypo: + hyp_words = " ".join(hypo["words"]) + else: + hyp_words = post_process(hyp_pieces, self.cfg.common_eval.post_process) + + # Processes target. + target_tokens = utils.strip_pad(toks, self.tgt_dict.pad()) + tgt_pieces = self.tgt_dict.string(target_tokens.int().cpu()) + tgt_words = post_process(tgt_pieces, self.cfg.common_eval.post_process) + + if self.cfg.decoding.results_path is not None: + print(f"{hyp_pieces} ({speaker}-{sid})", file=self.hypo_units_file) + print(f"{hyp_words} ({speaker}-{sid})", file=self.hypo_words_file) + print(f"{tgt_pieces} ({speaker}-{sid})", file=self.ref_units_file) + print(f"{tgt_words} ({speaker}-{sid})", file=self.ref_words_file) + + if not self.cfg.common_eval.quiet: + logger.info(f"HYPO: {hyp_words}") + logger.info(f"REF: {tgt_words}") + logger.info("---------------------") + + hyp_words, tgt_words = hyp_words.split(), tgt_words.split() + + return editdistance.eval(hyp_words, tgt_words), len(tgt_words) + + def process_sample(self, sample: Dict[str, Any]) -> None: + self.gen_timer.start() + hypos = self.task.inference_step( + generator=self.generator, + models=self.models, + sample=sample, + ) + num_generated_tokens = sum(len(h[0]["tokens"]) for h in hypos) + self.gen_timer.stop(num_generated_tokens) + self.wps_meter.update(num_generated_tokens) + + for batch_id, sample_id in enumerate(sample["id"].tolist()): + errs, length = self.process_sentence( + sample=sample, + sid=sample_id, + batch_id=batch_id, + hypo=hypos[batch_id][0], + ) + self.total_errors += errs + self.total_length += length + + self.log({"wps": round(self.wps_meter.avg)}) + if "nsentences" in sample: + self.num_sentences += sample["nsentences"] + else: + self.num_sentences += sample["id"].numel() + + def log_generation_time(self) -> None: + logger.info( + "Processed %d sentences (%d tokens) in %.1fs %.2f " + "sentences per second, %.2f tokens per second)", + self.num_sentences, + self.gen_timer.n, + self.gen_timer.sum, + self.num_sentences / self.gen_timer.sum, + 1.0 / self.gen_timer.avg, + ) + + +def parse_wer(wer_file: Path) -> float: + with open(wer_file, "r") as f: + return float(f.readline().strip().split(" ")[1]) + + +def get_wer_file(cfg: InferConfig) -> Path: + """Hashes the decoding parameters to a unique file ID.""" + base_path = "wer" + if cfg.decoding.results_path is not None: + base_path = 
os.path.join(cfg.decoding.results_path, base_path) + + if cfg.decoding.unique_wer_file: + yaml_str = OmegaConf.to_yaml(cfg.decoding) + fid = int(hashlib.md5(yaml_str.encode("utf-8")).hexdigest(), 16) + return Path(f"{base_path}.{fid % 1000000}") + else: + return Path(base_path) + + +def main(cfg: InferConfig) -> float: + """Entry point for main processing logic. + + Args: + cfg: The inferance configuration to use. + wer: Optional shared memory pointer for returning the WER. If not None, + the final WER value will be written here instead of being returned. + + Returns: + The final WER if `wer` is None, otherwise None. + """ + + yaml_str, wer_file = OmegaConf.to_yaml(cfg.decoding), get_wer_file(cfg) + + # Validates the provided configuration. + if cfg.dataset.max_tokens is None and cfg.dataset.batch_size is None: + cfg.dataset.max_tokens = 4000000 + if not cfg.common.cpu and not torch.cuda.is_available(): + raise ValueError("CUDA not found; set `cpu=True` to run without CUDA") + + with InferenceProcessor(cfg) as processor: + for sample in processor: + processor.process_sample(sample) + + processor.log_generation_time() + + if cfg.decoding.results_path is not None: + processor.merge_shards() + + errs_t, leng_t = processor.total_errors, processor.total_length + + if cfg.common.cpu: + logger.warning("Merging WER requires CUDA.") + elif processor.data_parallel_world_size > 1: + stats = torch.LongTensor([errs_t, leng_t]).cuda() + dist.all_reduce(stats, op=dist.ReduceOp.SUM) + errs_t, leng_t = stats[0].item(), stats[1].item() + + wer = errs_t * 100.0 / leng_t + + if distributed_utils.is_master(cfg.distributed_training): + with open(wer_file, "w") as f: + f.write( + ( + f"WER: {wer}\n" + f"err / num_ref_words = {errs_t} / {leng_t}\n\n" + f"{yaml_str}" + ) + ) + + return wer + + +@hydra.main(config_path=config_path, config_name="infer") +def hydra_main(cfg: InferConfig) -> Union[float, Tuple[float, Optional[float]]]: + container = OmegaConf.to_container(cfg, resolve=True, enum_to_str=True) + cfg = OmegaConf.create(container) + OmegaConf.set_struct(cfg, True) + + if cfg.common.reset_logging: + reset_logging() + + # logger.info("Config:\n%s", OmegaConf.to_yaml(cfg)) + wer = float("inf") + + try: + if cfg.common.profile: + with torch.cuda.profiler.profile(): + with torch.autograd.profiler.emit_nvtx(): + distributed_utils.call_main(cfg, main) + else: + distributed_utils.call_main(cfg, main) + + wer = parse_wer(get_wer_file(cfg)) + except BaseException as e: # pylint: disable=broad-except + if not cfg.common.suppress_crashes: + raise + else: + logger.error("Crashed! 
%s", str(e)) + + logger.info("Word error rate: %.4f", wer) + if cfg.is_ax: + return wer, None + + return wer + + +def cli_main() -> None: + try: + from hydra._internal.utils import ( + get_args, + ) # pylint: disable=import-outside-toplevel + + cfg_name = get_args().config_name or "infer" + except ImportError: + logger.warning("Failed to get config name from hydra args") + cfg_name = "infer" + + cs = ConfigStore.instance() + cs.store(name=cfg_name, node=InferConfig) + + for k in InferConfig.__dataclass_fields__: + if is_dataclass(InferConfig.__dataclass_fields__[k].type): + v = InferConfig.__dataclass_fields__[k].default + cs.store(name=k, node=v) + + hydra_main() # pylint: disable=no-value-for-parameter + + +if __name__ == "__main__": + cli_main() diff --git a/SpeechT5/fairseq/examples/speech_recognition/tasks/__init__.py b/SpeechT5/fairseq/examples/speech_recognition/tasks/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..7ac3b8dc69639c92cc129294356e9012745e3fb2 --- /dev/null +++ b/SpeechT5/fairseq/examples/speech_recognition/tasks/__init__.py @@ -0,0 +1,8 @@ +import importlib +import os + + +for file in sorted(os.listdir(os.path.dirname(__file__))): + if file.endswith(".py") and not file.startswith("_"): + task_name = file[: file.find(".py")] + importlib.import_module("examples.speech_recognition.tasks." + task_name) diff --git a/SpeechT5/fairseq/examples/speech_recognition/tasks/speech_recognition.py b/SpeechT5/fairseq/examples/speech_recognition/tasks/speech_recognition.py new file mode 100644 index 0000000000000000000000000000000000000000..d9f011d55ff4fdfeb4c04ca790c314d685708c3a --- /dev/null +++ b/SpeechT5/fairseq/examples/speech_recognition/tasks/speech_recognition.py @@ -0,0 +1,157 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +import json +import os +import re +import sys + +import torch +from examples.speech_recognition.data import AsrDataset +from examples.speech_recognition.data.replabels import replabel_symbol +from fairseq.data import Dictionary +from fairseq.tasks import LegacyFairseqTask, register_task + + +def get_asr_dataset_from_json(data_json_path, tgt_dict): + """ + Parse data json and create dataset. + See scripts/asr_prep_json.py which pack json from raw files + + Json example: + { + "utts": { + "4771-29403-0025": { + "input": { + "length_ms": 170, + "path": "/tmp/file1.flac" + }, + "output": { + "text": "HELLO \n", + "token": "HE LLO", + "tokenid": "4815, 861" + } + }, + "1564-142299-0096": { + ... 
+ } + } + """ + if not os.path.isfile(data_json_path): + raise FileNotFoundError("Dataset not found: {}".format(data_json_path)) + with open(data_json_path, "rb") as f: + data_samples = json.load(f)["utts"] + assert len(data_samples) != 0 + sorted_samples = sorted( + data_samples.items(), + key=lambda sample: int(sample[1]["input"]["length_ms"]), + reverse=True, + ) + aud_paths = [s[1]["input"]["path"] for s in sorted_samples] + ids = [s[0] for s in sorted_samples] + speakers = [] + for s in sorted_samples: + m = re.search("(.+?)-(.+?)-(.+?)", s[0]) + speakers.append(m.group(1) + "_" + m.group(2)) + frame_sizes = [s[1]["input"]["length_ms"] for s in sorted_samples] + tgt = [ + [int(i) for i in s[1]["output"]["tokenid"].split(", ")] + for s in sorted_samples + ] + # append eos + tgt = [[*t, tgt_dict.eos()] for t in tgt] + return AsrDataset(aud_paths, frame_sizes, tgt, tgt_dict, ids, speakers) + + +@register_task("speech_recognition") +class SpeechRecognitionTask(LegacyFairseqTask): + """ + Task for training speech recognition model. + """ + + @staticmethod + def add_args(parser): + """Add task-specific arguments to the parser.""" + parser.add_argument("data", help="path to data directory") + parser.add_argument( + "--silence-token", default="\u2581", help="token for silence (used by w2l)" + ) + parser.add_argument( + "--max-source-positions", + default=sys.maxsize, + type=int, + metavar="N", + help="max number of frames in the source sequence", + ) + parser.add_argument( + "--max-target-positions", + default=1024, + type=int, + metavar="N", + help="max number of tokens in the target sequence", + ) + + def __init__(self, args, tgt_dict): + super().__init__(args) + self.tgt_dict = tgt_dict + + @classmethod + def setup_task(cls, args, **kwargs): + """Setup the task (e.g., load dictionaries).""" + dict_path = os.path.join(args.data, "dict.txt") + if not os.path.isfile(dict_path): + raise FileNotFoundError("Dict not found: {}".format(dict_path)) + tgt_dict = Dictionary.load(dict_path) + + if args.criterion == "ctc_loss": + tgt_dict.add_symbol("<ctc_blank>") + elif args.criterion == "asg_loss": + for i in range(1, args.max_replabel + 1): + tgt_dict.add_symbol(replabel_symbol(i)) + + print("| dictionary: {} types".format(len(tgt_dict))) + return cls(args, tgt_dict) + + def load_dataset(self, split, combine=False, **kwargs): + """Load a given dataset split. 
+ + Args: + split (str): name of the split (e.g., train, valid, test) + """ + data_json_path = os.path.join(self.args.data, "{}.json".format(split)) + self.datasets[split] = get_asr_dataset_from_json(data_json_path, self.tgt_dict) + + def build_generator(self, models, args, **unused): + w2l_decoder = getattr(args, "w2l_decoder", None) + if w2l_decoder == "viterbi": + from examples.speech_recognition.w2l_decoder import W2lViterbiDecoder + + return W2lViterbiDecoder(args, self.target_dictionary) + elif w2l_decoder == "kenlm": + from examples.speech_recognition.w2l_decoder import W2lKenLMDecoder + + return W2lKenLMDecoder(args, self.target_dictionary) + elif w2l_decoder == "fairseqlm": + from examples.speech_recognition.w2l_decoder import W2lFairseqLMDecoder + + return W2lFairseqLMDecoder(args, self.target_dictionary) + else: + return super().build_generator(models, args) + + @property + def target_dictionary(self): + """Return the :class:`~fairseq.data.Dictionary` for the language + model.""" + return self.tgt_dict + + @property + def source_dictionary(self): + """Return the source :class:`~fairseq.data.Dictionary` (if applicable + for this task).""" + return None + + def max_positions(self): + """Return the max speech and sentence length allowed by the task.""" + return (self.args.max_source_positions, self.args.max_target_positions) diff --git a/SpeechT5/fairseq/examples/speech_recognition/utils/wer_utils.py b/SpeechT5/fairseq/examples/speech_recognition/utils/wer_utils.py new file mode 100644 index 0000000000000000000000000000000000000000..cf6f3d09ba41a46ad4d7968fb3c286dd53d15c38 --- /dev/null +++ b/SpeechT5/fairseq/examples/speech_recognition/utils/wer_utils.py @@ -0,0 +1,381 @@ +#!/usr/bin/env python3 + +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +from __future__ import absolute_import, division, print_function, unicode_literals + +import re +from collections import deque +from enum import Enum + +import numpy as np + + +""" + Utility modules for computation of Word Error Rate, + Alignments, as well as more granular metrics like + deletion, insersion and substitutions. 
+""" + + +class Code(Enum): + match = 1 + substitution = 2 + insertion = 3 + deletion = 4 + + +class Token(object): + def __init__(self, lbl="", st=np.nan, en=np.nan): + if np.isnan(st): + self.label, self.start, self.end = "", 0.0, 0.0 + else: + self.label, self.start, self.end = lbl, st, en + + +class AlignmentResult(object): + def __init__(self, refs, hyps, codes, score): + self.refs = refs # std::deque<int> + self.hyps = hyps # std::deque<int> + self.codes = codes # std::deque<Code> + self.score = score # float + + +def coordinate_to_offset(row, col, ncols): + return int(row * ncols + col) + + +def offset_to_row(offset, ncols): + return int(offset / ncols) + + +def offset_to_col(offset, ncols): + return int(offset % ncols) + + +def trimWhitespace(str): + return re.sub(" +", " ", re.sub(" *$", "", re.sub("^ *", "", str))) + + +def str2toks(str): + pieces = trimWhitespace(str).split(" ") + toks = [] + for p in pieces: + toks.append(Token(p, 0.0, 0.0)) + return toks + + +class EditDistance(object): + def __init__(self, time_mediated): + self.time_mediated_ = time_mediated + self.scores_ = np.nan # Eigen::Matrix<float, Eigen::Dynamic, Eigen::Dynamic> + self.backtraces_ = ( + np.nan + ) # Eigen::Matrix<size_t, Eigen::Dynamic, Eigen::Dynamic> backtraces_; + self.confusion_pairs_ = {} + + def cost(self, ref, hyp, code): + if self.time_mediated_: + if code == Code.match: + return abs(ref.start - hyp.start) + abs(ref.end - hyp.end) + elif code == Code.insertion: + return hyp.end - hyp.start + elif code == Code.deletion: + return ref.end - ref.start + else: # substitution + return abs(ref.start - hyp.start) + abs(ref.end - hyp.end) + 0.1 + else: + if code == Code.match: + return 0 + elif code == Code.insertion or code == Code.deletion: + return 3 + else: # substitution + return 4 + + def get_result(self, refs, hyps): + res = AlignmentResult(refs=deque(), hyps=deque(), codes=deque(), score=np.nan) + + num_rows, num_cols = self.scores_.shape + res.score = self.scores_[num_rows - 1, num_cols - 1] + + curr_offset = coordinate_to_offset(num_rows - 1, num_cols - 1, num_cols) + + while curr_offset != 0: + curr_row = offset_to_row(curr_offset, num_cols) + curr_col = offset_to_col(curr_offset, num_cols) + + prev_offset = self.backtraces_[curr_row, curr_col] + + prev_row = offset_to_row(prev_offset, num_cols) + prev_col = offset_to_col(prev_offset, num_cols) + + res.refs.appendleft(curr_row - 1) # Note: this was .push_front() in C++ + res.hyps.appendleft(curr_col - 1) + if curr_row - 1 == prev_row and curr_col == prev_col: + res.codes.appendleft(Code.deletion) + elif curr_row == prev_row and curr_col - 1 == prev_col: + res.codes.appendleft(Code.insertion) + else: + # assert(curr_row - 1 == prev_row and curr_col - 1 == prev_col) + ref_str = refs[res.refs[0]].label + hyp_str = hyps[res.hyps[0]].label + + if ref_str == hyp_str: + res.codes.appendleft(Code.match) + else: + res.codes.appendleft(Code.substitution) + + confusion_pair = "%s -> %s" % (ref_str, hyp_str) + if confusion_pair not in self.confusion_pairs_: + self.confusion_pairs_[confusion_pair] = 1 + else: + self.confusion_pairs_[confusion_pair] += 1 + + curr_offset = prev_offset + + return res + + def align(self, refs, hyps): + if len(refs) == 0 and len(hyps) == 0: + return np.nan + + # NOTE: we're not resetting the values in these matrices because every value + # will be overridden in the loop below. If this assumption doesn't hold, + # be sure to set all entries in self.scores_ and self.backtraces_ to 0. 
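+        # Standard Levenshtein-style dynamic program: scores_[i, j] holds the
+        # minimal alignment cost between the first i reference tokens and the
+        # first j hypothesis tokens, taking the cheapest of a match /
+        # substitution step from (i-1, j-1), an insertion from (i, j-1) or a
+        # deletion from (i-1, j). backtraces_[i, j] stores the flattened
+        # (row * ncols + col) offset of the predecessor cell so that
+        # get_result() can walk the optimal path backwards and emit one Code
+        # per aligned token pair.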
+ self.scores_ = np.zeros((len(refs) + 1, len(hyps) + 1)) + self.backtraces_ = np.zeros((len(refs) + 1, len(hyps) + 1)) + + num_rows, num_cols = self.scores_.shape + + for i in range(num_rows): + for j in range(num_cols): + if i == 0 and j == 0: + self.scores_[i, j] = 0.0 + self.backtraces_[i, j] = 0 + continue + + if i == 0: + self.scores_[i, j] = self.scores_[i, j - 1] + self.cost( + None, hyps[j - 1], Code.insertion + ) + self.backtraces_[i, j] = coordinate_to_offset(i, j - 1, num_cols) + continue + + if j == 0: + self.scores_[i, j] = self.scores_[i - 1, j] + self.cost( + refs[i - 1], None, Code.deletion + ) + self.backtraces_[i, j] = coordinate_to_offset(i - 1, j, num_cols) + continue + + # Below here both i and j are greater than 0 + ref = refs[i - 1] + hyp = hyps[j - 1] + best_score = self.scores_[i - 1, j - 1] + ( + self.cost(ref, hyp, Code.match) + if (ref.label == hyp.label) + else self.cost(ref, hyp, Code.substitution) + ) + + prev_row = i - 1 + prev_col = j - 1 + ins = self.scores_[i, j - 1] + self.cost(None, hyp, Code.insertion) + if ins < best_score: + best_score = ins + prev_row = i + prev_col = j - 1 + + delt = self.scores_[i - 1, j] + self.cost(ref, None, Code.deletion) + if delt < best_score: + best_score = delt + prev_row = i - 1 + prev_col = j + + self.scores_[i, j] = best_score + self.backtraces_[i, j] = coordinate_to_offset( + prev_row, prev_col, num_cols + ) + + return self.get_result(refs, hyps) + + +class WERTransformer(object): + def __init__(self, hyp_str, ref_str, verbose=True): + self.ed_ = EditDistance(False) + self.id2oracle_errs_ = {} + self.utts_ = 0 + self.words_ = 0 + self.insertions_ = 0 + self.deletions_ = 0 + self.substitutions_ = 0 + + self.process(["dummy_str", hyp_str, ref_str]) + + if verbose: + print("'%s' vs '%s'" % (hyp_str, ref_str)) + self.report_result() + + def process(self, input): # std::vector<std::string>&& input + if len(input) < 3: + print( + "Input must be of the form <id> ... 
<hypo> <ref> , got ", + len(input), + " inputs:", + ) + return None + + # Align + # std::vector<Token> hyps; + # std::vector<Token> refs; + + hyps = str2toks(input[-2]) + refs = str2toks(input[-1]) + + alignment = self.ed_.align(refs, hyps) + if alignment is None: + print("Alignment is null") + return np.nan + + # Tally errors + ins = 0 + dels = 0 + subs = 0 + for code in alignment.codes: + if code == Code.substitution: + subs += 1 + elif code == Code.insertion: + ins += 1 + elif code == Code.deletion: + dels += 1 + + # Output + row = input + row.append(str(len(refs))) + row.append(str(ins)) + row.append(str(dels)) + row.append(str(subs)) + # print(row) + + # Accumulate + kIdIndex = 0 + kNBestSep = "/" + + pieces = input[kIdIndex].split(kNBestSep) + + if len(pieces) == 0: + print( + "Error splitting ", + input[kIdIndex], + " on '", + kNBestSep, + "', got empty list", + ) + return np.nan + + id = pieces[0] + if id not in self.id2oracle_errs_: + self.utts_ += 1 + self.words_ += len(refs) + self.insertions_ += ins + self.deletions_ += dels + self.substitutions_ += subs + self.id2oracle_errs_[id] = [ins, dels, subs] + else: + curr_err = ins + dels + subs + prev_err = np.sum(self.id2oracle_errs_[id]) + if curr_err < prev_err: + self.id2oracle_errs_[id] = [ins, dels, subs] + + return 0 + + def report_result(self): + # print("---------- Summary ---------------") + if self.words_ == 0: + print("No words counted") + return + + # 1-best + best_wer = ( + 100.0 + * (self.insertions_ + self.deletions_ + self.substitutions_) + / self.words_ + ) + + print( + "\tWER = %0.2f%% (%i utts, %i words, %0.2f%% ins, " + "%0.2f%% dels, %0.2f%% subs)" + % ( + best_wer, + self.utts_, + self.words_, + 100.0 * self.insertions_ / self.words_, + 100.0 * self.deletions_ / self.words_, + 100.0 * self.substitutions_ / self.words_, + ) + ) + + def wer(self): + if self.words_ == 0: + wer = np.nan + else: + wer = ( + 100.0 + * (self.insertions_ + self.deletions_ + self.substitutions_) + / self.words_ + ) + return wer + + def stats(self): + if self.words_ == 0: + stats = {} + else: + wer = ( + 100.0 + * (self.insertions_ + self.deletions_ + self.substitutions_) + / self.words_ + ) + stats = dict( + { + "wer": wer, + "utts": self.utts_, + "numwords": self.words_, + "ins": self.insertions_, + "dels": self.deletions_, + "subs": self.substitutions_, + "confusion_pairs": self.ed_.confusion_pairs_, + } + ) + return stats + + +def calc_wer(hyp_str, ref_str): + t = WERTransformer(hyp_str, ref_str, verbose=0) + return t.wer() + + +def calc_wer_stats(hyp_str, ref_str): + t = WERTransformer(hyp_str, ref_str, verbose=0) + return t.stats() + + +def get_wer_alignment_codes(hyp_str, ref_str): + """ + INPUT: hypothesis string, reference string + OUTPUT: List of alignment codes (intermediate results from WER computation) + """ + t = WERTransformer(hyp_str, ref_str, verbose=0) + return t.ed_.align(str2toks(ref_str), str2toks(hyp_str)).codes + + +def merge_counts(x, y): + # Merge two hashes which have 'counts' as their values + # This can be used for example to merge confusion pair counts + # conf_pairs = merge_counts(conf_pairs, stats['confusion_pairs']) + for k, v in y.items(): + if k not in x: + x[k] = 0 + x[k] += v + return x diff --git a/SpeechT5/fairseq/examples/speech_recognition/w2l_decoder.py b/SpeechT5/fairseq/examples/speech_recognition/w2l_decoder.py new file mode 100644 index 0000000000000000000000000000000000000000..fbf2d3524ee40bd0d08b6a9560047d96e49b6045 --- /dev/null +++ 
b/SpeechT5/fairseq/examples/speech_recognition/w2l_decoder.py @@ -0,0 +1,486 @@ +#!/usr/bin/env python3 + +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +""" +Flashlight decoders. +""" + +import gc +import itertools as it +import os.path as osp +from typing import List +import warnings +from collections import deque, namedtuple + +import numpy as np +import torch +from examples.speech_recognition.data.replabels import unpack_replabels +from fairseq import tasks +from fairseq.utils import apply_to_sample +from omegaconf import open_dict +from fairseq.dataclass.utils import convert_namespace_to_omegaconf + + +try: + from flashlight.lib.text.dictionary import create_word_dict, load_words + from flashlight.lib.sequence.criterion import CpuViterbiPath, get_data_ptr_as_bytes + from flashlight.lib.text.decoder import ( + CriterionType, + LexiconDecoderOptions, + KenLM, + LM, + LMState, + SmearingMode, + Trie, + LexiconDecoder, + ) +except: + warnings.warn( + "flashlight python bindings are required to use this functionality. Please install from https://github.com/facebookresearch/flashlight/tree/master/bindings/python" + ) + LM = object + LMState = object + + +class W2lDecoder(object): + def __init__(self, args, tgt_dict): + self.tgt_dict = tgt_dict + self.vocab_size = len(tgt_dict) + self.nbest = args.nbest + + # criterion-specific init + self.criterion_type = CriterionType.CTC + self.blank = ( + tgt_dict.index("<ctc_blank>") + if "<ctc_blank>" in tgt_dict.indices + else tgt_dict.bos() + ) + if "<sep>" in tgt_dict.indices: + self.silence = tgt_dict.index("<sep>") + elif "|" in tgt_dict.indices: + self.silence = tgt_dict.index("|") + else: + self.silence = tgt_dict.eos() + self.asg_transitions = None + + def generate(self, models, sample, **unused): + """Generate a batch of inferences.""" + # model.forward normally channels prev_output_tokens into the decoder + # separately, but SequenceGenerator directly calls model.encoder + encoder_input = { + k: v for k, v in sample["net_input"].items() if k != "prev_output_tokens" + } + emissions = self.get_emissions(models, encoder_input) + return self.decode(emissions) + + def get_emissions(self, models, encoder_input): + """Run encoder and normalize emissions""" + model = models[0] + encoder_out = model(**encoder_input) + if hasattr(model, "get_logits"): + emissions = model.get_logits(encoder_out) # no need to normalize emissions + else: + emissions = model.get_normalized_probs(encoder_out, log_probs=True) + return emissions.transpose(0, 1).float().cpu().contiguous() + + def get_tokens(self, idxs): + """Normalize tokens by handling CTC blank, ASG replabels, etc.""" + idxs = (g[0] for g in it.groupby(idxs)) + idxs = filter(lambda x: x != self.blank, idxs) + return torch.LongTensor(list(idxs)) + + +class W2lViterbiDecoder(W2lDecoder): + def __init__(self, args, tgt_dict): + super().__init__(args, tgt_dict) + + def decode(self, emissions): + B, T, N = emissions.size() + hypos = [] + if self.asg_transitions is None: + transitions = torch.FloatTensor(N, N).zero_() + else: + transitions = torch.FloatTensor(self.asg_transitions).view(N, N) + viterbi_path = torch.IntTensor(B, T) + workspace = torch.ByteTensor(CpuViterbiPath.get_workspace_size(B, T, N)) + CpuViterbiPath.compute( + B, + T, + N, + get_data_ptr_as_bytes(emissions), + get_data_ptr_as_bytes(transitions), + get_data_ptr_as_bytes(viterbi_path), + 
get_data_ptr_as_bytes(workspace), + ) + return [ + [{"tokens": self.get_tokens(viterbi_path[b].tolist()), "score": 0}] + for b in range(B) + ] + + +class W2lKenLMDecoder(W2lDecoder): + def __init__(self, args, tgt_dict): + super().__init__(args, tgt_dict) + + self.unit_lm = getattr(args, "unit_lm", False) + + if args.lexicon: + self.lexicon = load_words(args.lexicon) + self.word_dict = create_word_dict(self.lexicon) + self.unk_word = self.word_dict.get_index("<unk>") + + self.lm = KenLM(args.kenlm_model, self.word_dict) + self.trie = Trie(self.vocab_size, self.silence) + + start_state = self.lm.start(False) + for i, (word, spellings) in enumerate(self.lexicon.items()): + word_idx = self.word_dict.get_index(word) + _, score = self.lm.score(start_state, word_idx) + for spelling in spellings: + spelling_idxs = [tgt_dict.index(token) for token in spelling] + assert ( + tgt_dict.unk() not in spelling_idxs + ), f"{spelling} {spelling_idxs}" + self.trie.insert(spelling_idxs, word_idx, score) + self.trie.smear(SmearingMode.MAX) + + self.decoder_opts = LexiconDecoderOptions( + beam_size=args.beam, + beam_size_token=int(getattr(args, "beam_size_token", len(tgt_dict))), + beam_threshold=args.beam_threshold, + lm_weight=args.lm_weight, + word_score=args.word_score, + unk_score=args.unk_weight, + sil_score=args.sil_weight, + log_add=False, + criterion_type=self.criterion_type, + ) + + if self.asg_transitions is None: + N = 768 + # self.asg_transitions = torch.FloatTensor(N, N).zero_() + self.asg_transitions = [] + + self.decoder = LexiconDecoder( + self.decoder_opts, + self.trie, + self.lm, + self.silence, + self.blank, + self.unk_word, + self.asg_transitions, + self.unit_lm, + ) + else: + assert args.unit_lm, "lexicon free decoding can only be done with a unit language model" + from flashlight.lib.text.decoder import LexiconFreeDecoder, LexiconFreeDecoderOptions + + d = {w: [[w]] for w in tgt_dict.symbols} + self.word_dict = create_word_dict(d) + self.lm = KenLM(args.kenlm_model, self.word_dict) + self.decoder_opts = LexiconFreeDecoderOptions( + beam_size=args.beam, + beam_size_token=int(getattr(args, "beam_size_token", len(tgt_dict))), + beam_threshold=args.beam_threshold, + lm_weight=args.lm_weight, + sil_score=args.sil_weight, + log_add=False, + criterion_type=self.criterion_type, + ) + self.decoder = LexiconFreeDecoder( + self.decoder_opts, self.lm, self.silence, self.blank, [] + ) + + def get_timesteps(self, token_idxs: List[int]) -> List[int]: + """Returns frame numbers corresponding to every non-blank token. + + Parameters + ---------- + token_idxs : List[int] + IDs of decoded tokens. + + Returns + ------- + List[int] + Frame numbers corresponding to every non-blank token. 
+ """ + timesteps = [] + for i, token_idx in enumerate(token_idxs): + if token_idx == self.blank: + continue + if i == 0 or token_idx != token_idxs[i-1]: + timesteps.append(i) + return timesteps + + def decode(self, emissions): + B, T, N = emissions.size() + hypos = [] + for b in range(B): + emissions_ptr = emissions.data_ptr() + 4 * b * emissions.stride(0) + results = self.decoder.decode(emissions_ptr, T, N) + + nbest_results = results[: self.nbest] + hypos.append( + [ + { + "tokens": self.get_tokens(result.tokens), + "score": result.score, + "timesteps": self.get_timesteps(result.tokens), + "words": [ + self.word_dict.get_entry(x) for x in result.words if x >= 0 + ], + } + for result in nbest_results + ] + ) + return hypos + + +FairseqLMState = namedtuple("FairseqLMState", ["prefix", "incremental_state", "probs"]) + + +class FairseqLM(LM): + def __init__(self, dictionary, model): + LM.__init__(self) + self.dictionary = dictionary + self.model = model + self.unk = self.dictionary.unk() + + self.save_incremental = False # this currently does not work properly + self.max_cache = 20_000 + + model.cuda() + model.eval() + model.make_generation_fast_() + + self.states = {} + self.stateq = deque() + + def start(self, start_with_nothing): + state = LMState() + prefix = torch.LongTensor([[self.dictionary.eos()]]) + incremental_state = {} if self.save_incremental else None + with torch.no_grad(): + res = self.model(prefix.cuda(), incremental_state=incremental_state) + probs = self.model.get_normalized_probs(res, log_probs=True, sample=None) + + if incremental_state is not None: + incremental_state = apply_to_sample(lambda x: x.cpu(), incremental_state) + self.states[state] = FairseqLMState( + prefix.numpy(), incremental_state, probs[0, -1].cpu().numpy() + ) + self.stateq.append(state) + + return state + + def score(self, state: LMState, token_index: int, no_cache: bool = False): + """ + Evaluate language model based on the current lm state and new word + Parameters: + ----------- + state: current lm state + token_index: index of the word + (can be lexicon index then you should store inside LM the + mapping between indices of lexicon and lm, or lm index of a word) + + Returns: + -------- + (LMState, float): pair of (new state, score for the current word) + """ + curr_state = self.states[state] + + def trim_cache(targ_size): + while len(self.stateq) > targ_size: + rem_k = self.stateq.popleft() + rem_st = self.states[rem_k] + rem_st = FairseqLMState(rem_st.prefix, None, None) + self.states[rem_k] = rem_st + + if curr_state.probs is None: + new_incremental_state = ( + curr_state.incremental_state.copy() + if curr_state.incremental_state is not None + else None + ) + with torch.no_grad(): + if new_incremental_state is not None: + new_incremental_state = apply_to_sample( + lambda x: x.cuda(), new_incremental_state + ) + elif self.save_incremental: + new_incremental_state = {} + + res = self.model( + torch.from_numpy(curr_state.prefix).cuda(), + incremental_state=new_incremental_state, + ) + probs = self.model.get_normalized_probs( + res, log_probs=True, sample=None + ) + + if new_incremental_state is not None: + new_incremental_state = apply_to_sample( + lambda x: x.cpu(), new_incremental_state + ) + + curr_state = FairseqLMState( + curr_state.prefix, new_incremental_state, probs[0, -1].cpu().numpy() + ) + + if not no_cache: + self.states[state] = curr_state + self.stateq.append(state) + + score = curr_state.probs[token_index].item() + + trim_cache(self.max_cache) + + outstate = state.child(token_index) 
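+        # The child LM state is cached with probs=None: only its token prefix
+        # is stored here, and the actual forward pass is deferred until a
+        # later score() call reaches this state (the `curr_state.probs is
+        # None` branch above). Scoring <unk> itself returns -inf so the
+        # lexicon decoder never commits to unknown words.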
+ if outstate not in self.states and not no_cache: + prefix = np.concatenate( + [curr_state.prefix, torch.LongTensor([[token_index]])], -1 + ) + incr_state = curr_state.incremental_state + + self.states[outstate] = FairseqLMState(prefix, incr_state, None) + + if token_index == self.unk: + score = float("-inf") + + return outstate, score + + def finish(self, state: LMState): + """ + Evaluate eos for language model based on the current lm state + + Returns: + -------- + (LMState, float): pair of (new state, score for the current word) + """ + return self.score(state, self.dictionary.eos()) + + def empty_cache(self): + self.states = {} + self.stateq = deque() + gc.collect() + + +class W2lFairseqLMDecoder(W2lDecoder): + def __init__(self, args, tgt_dict): + super().__init__(args, tgt_dict) + + self.unit_lm = getattr(args, "unit_lm", False) + + self.lexicon = load_words(args.lexicon) if args.lexicon else None + self.idx_to_wrd = {} + + checkpoint = torch.load(args.kenlm_model, map_location="cpu") + + if "cfg" in checkpoint and checkpoint["cfg"] is not None: + lm_args = checkpoint["cfg"] + else: + lm_args = convert_namespace_to_omegaconf(checkpoint["args"]) + + with open_dict(lm_args.task): + lm_args.task.data = osp.dirname(args.kenlm_model) + + task = tasks.setup_task(lm_args.task) + model = task.build_model(lm_args.model) + model.load_state_dict(checkpoint["model"], strict=False) + + self.trie = Trie(self.vocab_size, self.silence) + + self.word_dict = task.dictionary + self.unk_word = self.word_dict.unk() + self.lm = FairseqLM(self.word_dict, model) + + if self.lexicon: + start_state = self.lm.start(False) + for i, (word, spellings) in enumerate(self.lexicon.items()): + if self.unit_lm: + word_idx = i + self.idx_to_wrd[i] = word + score = 0 + else: + word_idx = self.word_dict.index(word) + _, score = self.lm.score(start_state, word_idx, no_cache=True) + + for spelling in spellings: + spelling_idxs = [tgt_dict.index(token) for token in spelling] + assert ( + tgt_dict.unk() not in spelling_idxs + ), f"{spelling} {spelling_idxs}" + self.trie.insert(spelling_idxs, word_idx, score) + self.trie.smear(SmearingMode.MAX) + + self.decoder_opts = LexiconDecoderOptions( + beam_size=args.beam, + beam_size_token=int(getattr(args, "beam_size_token", len(tgt_dict))), + beam_threshold=args.beam_threshold, + lm_weight=args.lm_weight, + word_score=args.word_score, + unk_score=args.unk_weight, + sil_score=args.sil_weight, + log_add=False, + criterion_type=self.criterion_type, + ) + + self.decoder = LexiconDecoder( + self.decoder_opts, + self.trie, + self.lm, + self.silence, + self.blank, + self.unk_word, + [], + self.unit_lm, + ) + else: + assert args.unit_lm, "lexicon free decoding can only be done with a unit language model" + from flashlight.lib.text.decoder import LexiconFreeDecoder, LexiconFreeDecoderOptions + + d = {w: [[w]] for w in tgt_dict.symbols} + self.word_dict = create_word_dict(d) + self.lm = KenLM(args.kenlm_model, self.word_dict) + self.decoder_opts = LexiconFreeDecoderOptions( + beam_size=args.beam, + beam_size_token=int(getattr(args, "beam_size_token", len(tgt_dict))), + beam_threshold=args.beam_threshold, + lm_weight=args.lm_weight, + sil_score=args.sil_weight, + log_add=False, + criterion_type=self.criterion_type, + ) + self.decoder = LexiconFreeDecoder( + self.decoder_opts, self.lm, self.silence, self.blank, [] + ) + + def decode(self, emissions): + B, T, N = emissions.size() + hypos = [] + + def idx_to_word(idx): + if self.unit_lm: + return self.idx_to_wrd[idx] + else: + return 
self.word_dict[idx] + + def make_hypo(result): + hypo = {"tokens": self.get_tokens(result.tokens), "score": result.score} + if self.lexicon: + hypo["words"] = [idx_to_word(x) for x in result.words if x >= 0] + return hypo + + for b in range(B): + emissions_ptr = emissions.data_ptr() + 4 * b * emissions.stride(0) + results = self.decoder.decode(emissions_ptr, T, N) + + nbest_results = results[: self.nbest] + hypos.append([make_hypo(result) for result in nbest_results]) + self.lm.empty_cache() + + return hypos diff --git a/SpeechT5/fairseq/examples/speech_to_text/README.md b/SpeechT5/fairseq/examples/speech_to_text/README.md new file mode 100644 index 0000000000000000000000000000000000000000..f639d300d342f8de1392c98bfc44ec8690188539 --- /dev/null +++ b/SpeechT5/fairseq/examples/speech_to_text/README.md @@ -0,0 +1,77 @@ +# Speech-to-Text (S2T) Modeling + +[https://www.aclweb.org/anthology/2020.aacl-demo.6](https://www.aclweb.org/anthology/2020.aacl-demo.6.pdf) + +Speech recognition (ASR) and speech-to-text translation (ST) with fairseq. + +## Data Preparation +S2T modeling data consists of source speech features, target text and other optional information +(source text, speaker id, etc.). Fairseq S2T uses per-dataset-split TSV manifest files +to store these information. Each data field is represented by a column in the TSV file. + +Unlike text token embeddings, speech features (e.g. log mel-scale filter banks) are usually fixed +during model training and can be pre-computed. The manifest file contains the path to +either the feature file in NumPy format or the WAV/FLAC audio file. For the latter, +features will be extracted on-the-fly by fairseq S2T. Optionally, feature/audio files can be packed +into uncompressed ZIP files (then accessed via byte offset and length) to improve I/O performance. + +Fairseq S2T also employs a YAML file for data related configurations: tokenizer type and dictionary path +for the target text, feature transforms such as CMVN (cepstral mean and variance normalization) and SpecAugment, +temperature-based resampling, etc. + +## Model Training +Fairseq S2T uses the unified `fairseq-train` interface for model training. It requires arguments `--task speech_to_text`, + `--arch <model architecture in fairseq.models.speech_to_text.*>` and `--config-yaml <config YAML filename>`. + +## Inference & Evaluation +Fairseq S2T uses the unified `fairseq-generate`/`fairseq-interactive` interface for inference and evaluation. It +requires arguments `--task speech_to_text` and `--config-yaml <config YAML filename>`. The interactive console takes +audio paths (one per line) as inputs. + + +## Examples +- [Speech Recognition (ASR) on LibriSpeech](docs/librispeech_example.md) + +- [Speech-to-Text Translation (ST) on MuST-C](docs/mustc_example.md) + +- [Speech-to-Text Translation (ST) on CoVoST 2](docs/covost_example.md) + +- [Speech-to-Text Translation (ST) on Multilingual TEDx](docs/mtedx_example.md) +- [Simultaneous Speech-to-Text Translation (SimulST) on MuST-C](docs/simulst_mustc_example.md) + +## Updates +- 02/04/2021: Added interactive decoding (`fairseq-interactive`) support. Examples: + [ASR (LibriSpeech)](docs/librispeech_example.md#interactive-decoding) + and [ST (CoVoST 2)](docs/covost_example.md#interactive-decoding). +- 01/08/2021: Several fixes for S2T Transformer model, inference-time de-tokenization, scorer configuration and data + preparation scripts. We also add pre-trained models to the examples and revise the instructions. 
+ Breaking changes: the data preparation scripts now extract filterbank features without CMVN. CMVN is instead applied + on-the-fly (defined in the config YAML). + +## What's Next +- We are migrating the old fairseq [ASR example](../speech_recognition) into this S2T framework and + merging the features from both sides. +- The following papers also base their experiments on fairseq S2T. We are adding more examples for replication. + - [Improving Cross-Lingual Transfer Learning for End-to-End Speech Recognition with Speech Translation (Wang et al., 2020)](https://arxiv.org/abs/2006.05474) + - [Self-Supervised Representations Improve End-to-End Speech Translation (Wu et al., 2020)](https://arxiv.org/abs/2006.12124) + - [Self-Training for End-to-End Speech Translation (Pino et al., 2020)](https://arxiv.org/abs/2006.02490) + - [CoVoST: A Diverse Multilingual Speech-To-Text Translation Corpus (Wang et al., 2020)](https://arxiv.org/abs/2002.01320) + - [Harnessing Indirect Training Data for End-to-End Automatic Speech Translation: Tricks of the Trade (Pino et al., 2019)](https://arxiv.org/abs/1909.06515) + +## Citation +Please cite as: +``` +@inproceedings{wang2020fairseqs2t, + title = {fairseq S2T: Fast Speech-to-Text Modeling with fairseq}, + author = {Changhan Wang and Yun Tang and Xutai Ma and Anne Wu and Dmytro Okhonko and Juan Pino}, + booktitle = {Proceedings of the 2020 Conference of the Asian Chapter of the Association for Computational Linguistics (AACL): System Demonstrations}, + year = {2020}, +} + +@inproceedings{ott2019fairseq, + title = {fairseq: A Fast, Extensible Toolkit for Sequence Modeling}, + author = {Myle Ott and Sergey Edunov and Alexei Baevski and Angela Fan and Sam Gross and Nathan Ng and David Grangier and Michael Auli}, + booktitle = {Proceedings of NAACL-HLT 2019: Demonstrations}, + year = {2019}, +} +``` diff --git a/SpeechT5/fairseq/examples/speech_to_text/data_utils.py b/SpeechT5/fairseq/examples/speech_to_text/data_utils.py new file mode 100644 index 0000000000000000000000000000000000000000..2bcff046f7e4daf7f6029f9e89936d2d0b708dae --- /dev/null +++ b/SpeechT5/fairseq/examples/speech_to_text/data_utils.py @@ -0,0 +1,339 @@ +#!/usr/bin/env python3 +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. 
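+# Shared helpers for S2T data preparation: SentencePiece vocabulary training,
+# log mel filter bank feature extraction, packing features into ZIP archives,
+# TSV manifest I/O, manifest filtering, and generation of the data config YAML.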
+ +import csv +from pathlib import Path +import zipfile +from functools import reduce +from multiprocessing import cpu_count +from typing import Any, Dict, List, Optional, Union + +import numpy as np +import pandas as pd +import sentencepiece as sp +from fairseq.data.audio.audio_utils import ( + _convert_to_mono, _get_kaldi_fbank, _get_torchaudio_fbank +) +import torch +from tqdm import tqdm + + +UNK_TOKEN, UNK_TOKEN_ID = "<unk>", 3 +BOS_TOKEN, BOS_TOKEN_ID = "<s>", 0 +EOS_TOKEN, EOS_TOKEN_ID = "</s>", 2 +PAD_TOKEN, PAD_TOKEN_ID = "<pad>", 1 + + +def gen_vocab( + input_path: Path, output_path_prefix: Path, model_type="bpe", + vocab_size=1000, special_symbols: Optional[List[str]] = None +): + # Train SentencePiece Model + arguments = [ + f"--input={input_path.as_posix()}", + f"--model_prefix={output_path_prefix.as_posix()}", + f"--model_type={model_type}", + f"--vocab_size={vocab_size}", + "--character_coverage=1.0", + f"--num_threads={cpu_count()}", + f"--unk_id={UNK_TOKEN_ID}", + f"--bos_id={BOS_TOKEN_ID}", + f"--eos_id={EOS_TOKEN_ID}", + f"--pad_id={PAD_TOKEN_ID}", + ] + if special_symbols is not None: + _special_symbols = ",".join(special_symbols) + arguments.append(f"--user_defined_symbols={_special_symbols}") + sp.SentencePieceTrainer.Train(" ".join(arguments)) + # Export fairseq dictionary + spm = sp.SentencePieceProcessor() + spm.Load(output_path_prefix.as_posix() + ".model") + vocab = {i: spm.IdToPiece(i) for i in range(spm.GetPieceSize())} + assert ( + vocab.get(UNK_TOKEN_ID) == UNK_TOKEN + and vocab.get(PAD_TOKEN_ID) == PAD_TOKEN + and vocab.get(BOS_TOKEN_ID) == BOS_TOKEN + and vocab.get(EOS_TOKEN_ID) == EOS_TOKEN + ) + vocab = { + i: s + for i, s in vocab.items() + if s not in {UNK_TOKEN, BOS_TOKEN, EOS_TOKEN, PAD_TOKEN} + } + with open(output_path_prefix.as_posix() + ".txt", "w") as f_out: + for _, s in sorted(vocab.items(), key=lambda x: x[0]): + f_out.write(f"{s} 1\n") + + +def extract_fbank_features( + waveform: torch.FloatTensor, + sample_rate: int, + output_path: Optional[Path] = None, + n_mel_bins: int = 80, + overwrite: bool = False, +): + if output_path is not None and output_path.is_file() and not overwrite: + return + + _waveform = _convert_to_mono(waveform, sample_rate) + _waveform = _waveform * (2 ** 15) # Kaldi compliance: 16-bit signed integers + _waveform = _waveform.numpy() + + features = _get_kaldi_fbank(_waveform, sample_rate, n_mel_bins) + if features is None: + features = _get_torchaudio_fbank(_waveform, sample_rate, n_mel_bins) + if features is None: + raise ImportError( + "Please install pyKaldi or torchaudio to enable fbank feature extraction" + ) + + if output_path is not None: + np.save(output_path.as_posix(), features) + else: + return features + + +def create_zip(data_root: Path, zip_path: Path): + paths = list(data_root.glob("*.npy")) + with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_STORED) as f: + for path in tqdm(paths): + f.write(path, arcname=path.name) + + +def is_npy_data(data: bytes) -> bool: + return data[0] == 147 and data[1] == 78 + + +def get_zip_manifest(zip_path: Path, zip_root: Optional[Path] = None): + _zip_path = zip_path if zip_root is None else Path.joinpath(zip_root, zip_path) + with zipfile.ZipFile(_zip_path, mode="r") as f: + info = f.infolist() + manifest = {} + for i in tqdm(info): + utt_id = Path(i.filename).stem + offset, file_size = i.header_offset + 30 + len(i.filename), i.file_size + manifest[utt_id] = f"{zip_path.as_posix()}:{offset}:{file_size}" + with open(_zip_path, "rb") as f: + f.seek(offset) + data = 
f.read(file_size) + assert len(data) > 1 and is_npy_data(data) + return manifest + + +def gen_config_yaml( + manifest_root: Path, + spm_filename: str, + yaml_filename: str = "config.yaml", + specaugment_policy: str = "lb", + prepend_tgt_lang_tag: bool = False, + sampling_alpha: float = 1.0, + audio_root: str = "", + cmvn_type: str = "utterance", + gcmvn_path: Optional[Path] = None, +): + manifest_root = manifest_root.absolute() + writer = S2TDataConfigWriter(manifest_root / yaml_filename) + writer.set_vocab_filename(spm_filename.replace(".model", ".txt")) + writer.set_input_channels(1) + writer.set_input_feat_per_channel(80) + specaugment_setters = { + "lb": writer.set_specaugment_lb_policy, + "ld": writer.set_specaugment_ld_policy, + "sm": writer.set_specaugment_sm_policy, + "ss": writer.set_specaugment_ss_policy, + } + specaugment_setter = specaugment_setters.get(specaugment_policy, None) + if specaugment_setter is not None: + specaugment_setter() + writer.set_bpe_tokenizer( + { + "bpe": "sentencepiece", + "sentencepiece_model": (manifest_root / spm_filename).as_posix(), + } + ) + if prepend_tgt_lang_tag: + writer.set_prepend_tgt_lang_tag(True) + writer.set_sampling_alpha(sampling_alpha) + + if cmvn_type not in ["global", "utterance"]: + raise NotImplementedError + + writer.set_feature_transforms("_train", [f"{cmvn_type}_cmvn", "specaugment"]) + writer.set_feature_transforms("*", [f"{cmvn_type}_cmvn"]) + + if cmvn_type == "global": + assert gcmvn_path is not None, ( + 'Please provide path of global cmvn file.' + ) + writer.set_global_cmvn(str(gcmvn_path)) + + if len(audio_root) > 0: + writer.set_audio_root(audio_root) + writer.flush() + + +def load_df_from_tsv(path: Union[str, Path]): + _path = path if isinstance(path, str) else path.as_posix() + return pd.read_csv( + _path, + sep="\t", + header=0, + encoding="utf-8", + escapechar="\\", + quoting=csv.QUOTE_NONE, + na_filter=False, + ) + + +def save_df_to_tsv(dataframe, path: Union[str, Path]): + _path = path if isinstance(path, str) else path.as_posix() + dataframe.to_csv( + _path, + sep="\t", + header=True, + index=False, + encoding="utf-8", + escapechar="\\", + quoting=csv.QUOTE_NONE, + ) + + +def filter_manifest_df( + df, is_train_split=False, extra_filters=None, min_n_frames=5, max_n_frames=3000 +): + filters = { + "no speech": df["audio"] == "", + f"short speech (<{min_n_frames} frames)": df["n_frames"] < min_n_frames, + "empty sentence": df["tgt_text"] == "", + } + if is_train_split: + filters[f"long speech (>{max_n_frames} frames)"] = df["n_frames"] > max_n_frames + if extra_filters is not None: + filters.update(extra_filters) + invalid = reduce(lambda x, y: x | y, filters.values()) + valid = ~invalid + print( + "| " + + ", ".join(f"{n}: {f.sum()}" for n, f in filters.items()) + + f", total {invalid.sum()} filtered, {valid.sum()} remained." 
+ ) + return df[valid] + + +def cal_gcmvn_stats(features_list): + features = np.concatenate(features_list) + square_sums = (features ** 2).sum(axis=0) + mean = features.mean(axis=0) + features = np.subtract(features, mean) + var = square_sums / features.shape[0] - mean ** 2 + std = np.sqrt(np.maximum(var, 1e-8)) + return {"mean": mean.astype("float32"), "std": std.astype("float32")} + + +class S2TDataConfigWriter(object): + DEFAULT_VOCAB_FILENAME = "dict.txt" + DEFAULT_INPUT_FEAT_PER_CHANNEL = 80 + DEFAULT_INPUT_CHANNELS = 1 + + def __init__(self, yaml_path: Path): + try: + import yaml + except ImportError: + print("Please install PyYAML for S2T data config YAML files") + self.yaml = yaml + self.yaml_path = yaml_path + self.config = {} + + def flush(self): + with open(self.yaml_path, "w") as f: + self.yaml.dump(self.config, f) + + def set_audio_root(self, audio_root=""): + self.config["audio_root"] = audio_root + + def set_vocab_filename(self, vocab_filename: str = "dict.txt"): + self.config["vocab_filename"] = vocab_filename + + def set_specaugment( + self, + time_wrap_w: int, + freq_mask_n: int, + freq_mask_f: int, + time_mask_n: int, + time_mask_t: int, + time_mask_p: float, + ): + self.config["specaugment"] = { + "time_wrap_W": time_wrap_w, + "freq_mask_N": freq_mask_n, + "freq_mask_F": freq_mask_f, + "time_mask_N": time_mask_n, + "time_mask_T": time_mask_t, + "time_mask_p": time_mask_p, + } + + def set_specaugment_lb_policy(self): + self.set_specaugment( + time_wrap_w=0, + freq_mask_n=1, + freq_mask_f=27, + time_mask_n=1, + time_mask_t=100, + time_mask_p=1.0, + ) + + def set_specaugment_ld_policy(self): + self.set_specaugment( + time_wrap_w=0, + freq_mask_n=2, + freq_mask_f=27, + time_mask_n=2, + time_mask_t=100, + time_mask_p=1.0, + ) + + def set_specaugment_sm_policy(self): + self.set_specaugment( + time_wrap_w=0, + freq_mask_n=2, + freq_mask_f=15, + time_mask_n=2, + time_mask_t=70, + time_mask_p=0.2, + ) + + def set_specaugment_ss_policy(self): + self.set_specaugment( + time_wrap_w=0, + freq_mask_n=2, + freq_mask_f=27, + time_mask_n=2, + time_mask_t=70, + time_mask_p=0.2, + ) + + def set_input_channels(self, input_channels: int = 1): + self.config["input_channels"] = input_channels + + def set_input_feat_per_channel(self, input_feat_per_channel: int = 80): + self.config["input_feat_per_channel"] = input_feat_per_channel + + def set_bpe_tokenizer(self, bpe_tokenizer: Dict[str, Any]): + self.config["bpe_tokenizer"] = bpe_tokenizer + + def set_global_cmvn(self, stats_npz_path: str): + self.config["global_cmvn"] = {"stats_npz_path": stats_npz_path} + + def set_feature_transforms(self, split: str, transforms: List[str]): + if "transforms" not in self.config: + self.config["transforms"] = {} + self.config["transforms"][split] = transforms + + def set_prepend_tgt_lang_tag(self, flag: bool = True): + self.config["prepend_tgt_lang_tag"] = flag + + def set_sampling_alpha(self, sampling_alpha: float = 1.0): + self.config["sampling_alpha"] = sampling_alpha diff --git a/SpeechT5/fairseq/examples/speech_to_text/docs/covost_example.md b/SpeechT5/fairseq/examples/speech_to_text/docs/covost_example.md new file mode 100644 index 0000000000000000000000000000000000000000..16447f041e4751f79d9f7848b33ef2ff943d63c2 --- /dev/null +++ b/SpeechT5/fairseq/examples/speech_to_text/docs/covost_example.md @@ -0,0 +1,102 @@ +[[Back]](..) 
+ +# S2T Example: ST on CoVoST +We replicate the experiments in +[CoVoST 2 and Massively Multilingual Speech-to-Text Translation (Wang et al., 2020)](https://arxiv.org/abs/2007.10310). + +## Data Preparation +[Download](https://commonvoice.mozilla.org/en/datasets) and unpack Common Voice v4 to a path +`${COVOST_ROOT}/${SOURCE_LANG_ID}`, then preprocess it with +```bash +# additional Python packages for S2T data processing/model training +pip install pandas torchaudio sentencepiece + +# En ASR +python examples/speech_to_text/prep_covost_data.py \ + --data-root ${COVOST_ROOT} --vocab-type char --src-lang en +# ST +python examples/speech_to_text/prep_covost_data.py \ + --data-root ${COVOST_ROOT} --vocab-type char \ + --src-lang fr --tgt-lang en +``` +The generated files (manifest, features, vocabulary and data configuration) will be added to +`${COVOST_ROOT}/${SOURCE_LANG_ID}`. + +Download our vocabulary files if you want to use our pre-trained models: +- ASR: [En](https://dl.fbaipublicfiles.com/fairseq/s2t/covost2_en_asr_vocab_char.zip) +- ST: [Fr-En](https://dl.fbaipublicfiles.com/fairseq/s2t/covost2_fr_en_st_vocab_char.zip), [De-En](https://dl.fbaipublicfiles.com/fairseq/s2t/covost2_de_en_st_vocab_char.zip), [Es-En](https://dl.fbaipublicfiles.com/fairseq/s2t/covost2_es_en_st_vocab_char.zip), [Ca-En](https://dl.fbaipublicfiles.com/fairseq/s2t/covost2_ca_en_st_vocab_char.zip), [En-De](https://dl.fbaipublicfiles.com/fairseq/s2t/covost2_en_de_st_vocab_char.zip), [En-Ca](https://dl.fbaipublicfiles.com/fairseq/s2t/covost2_en_ca_st_vocab_char.zip), [En-Fa](https://dl.fbaipublicfiles.com/fairseq/s2t/covost2_en_fa_st_vocab_char.zip), [En-Et](https://dl.fbaipublicfiles.com/fairseq/s2t/covost2_en_et_st_vocab_char.zip) + +## ASR +#### Training +We train an En ASR model for encoder pre-training of all ST models: +```bash +fairseq-train ${COVOST_ROOT}/en \ + --config-yaml config_asr_en.yaml --train-subset train_asr_en --valid-subset dev_asr_en \ + --save-dir ${ASR_SAVE_DIR} --num-workers 4 --max-tokens 50000 --max-update 60000 \ + --task speech_to_text --criterion label_smoothed_cross_entropy --label-smoothing 0.1 \ + --report-accuracy --arch s2t_transformer_s --dropout 0.15 --optimizer adam --lr 2e-3 \ + --lr-scheduler inverse_sqrt --warmup-updates 10000 --clip-norm 10.0 --seed 1 --update-freq 8 +``` +where `ASR_SAVE_DIR` is the checkpoint root path. We set `--update-freq 8` to simulate 8 GPUs with 1 GPU. +You may want to update it accordingly when using more than 1 GPU. 
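+
+As a rule of thumb, keep the product of the number of GPUs and `--update-freq` constant so that the effective batch size stays the same. A minimal sketch of that rule (the helper name is ours, not a fairseq API):
+```python
+def scaled_update_freq(n_gpus: int, reference_gpus: int = 8) -> int:
+    """Gradient-accumulation steps so that n_gpus * update_freq stays constant."""
+    return max(1, reference_gpus // n_gpus)
+
+# 1 GPU -> 8, 2 GPUs -> 4, 4 GPUs -> 2, 8 GPUs -> 1
+print({n: scaled_update_freq(n) for n in (1, 2, 4, 8)})
+```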
+ +#### Inference & Evaluation +```bash +CHECKPOINT_FILENAME=avg_last_10_checkpoint.pt +python scripts/average_checkpoints.py \ + --inputs ${ASR_SAVE_DIR} --num-epoch-checkpoints 10 \ + --output "${ASR_SAVE_DIR}/${CHECKPOINT_FILENAME}" +fairseq-generate ${COVOST_ROOT}/en \ + --config-yaml config_asr_en.yaml --gen-subset test_asr_en --task speech_to_text \ + --path ${ASR_SAVE_DIR}/${CHECKPOINT_FILENAME} --max-tokens 50000 --beam 5 \ + --scoring wer --wer-tokenizer 13a --wer-lowercase --wer-remove-punct +``` +#### Results +| --arch | Params | En | Model | +|---|---|---|---| +| s2t_transformer_s | 31M | 25.6 | [Download](https://dl.fbaipublicfiles.com/fairseq/s2t/covost2_en_asr_transformer_s.pt) | + +## ST +#### Training +Fr-En as example: +```bash +fairseq-train ${COVOST_ROOT}/fr \ + --config-yaml config_st_fr_en.yaml --train-subset train_st_fr_en --valid-subset dev_st_fr_en \ + --save-dir ${ST_SAVE_DIR} --num-workers 4 --max-update 30000 --max-tokens 40000 \ # --max-tokens 50000 for en-* + --task speech_to_text --criterion label_smoothed_cross_entropy --label-smoothing 0.1 --report-accuracy \ + --arch s2t_transformer_s --encoder-freezing-updates 1000 --optimizer adam --lr 2e-3 \ + --lr-scheduler inverse_sqrt --warmup-updates 10000 --clip-norm 10.0 --seed 1 --update-freq 8 \ + --load-pretrained-encoder-from ${ASR_SAVE_DIR}/${CHECKPOINT_FILENAME} +``` +where `ST_SAVE_DIR` is the checkpoint root path. The ST encoder is pre-trained by En ASR for faster training and better +performance: `--load-pretrained-encoder-from <ASR checkpoint path>`. We set `--update-freq 8` to simulate 8 GPUs with 1 GPU. +You may want to update it accordingly when using more than 1 GPU. + +#### Inference & Evaluation +Average the last 10 checkpoints and evaluate on test split: +```bash +CHECKPOINT_FILENAME=avg_last_10_checkpoint.pt +python scripts/average_checkpoints.py \ + --inputs ${ST_SAVE_DIR} --num-epoch-checkpoints 10 \ + --output "${ST_SAVE_DIR}/${CHECKPOINT_FILENAME}" +fairseq-generate ${COVOST_ROOT}/fr \ + --config-yaml config_st_fr_en.yaml --gen-subset test_st_fr_en --task speech_to_text \ + --path ${ST_SAVE_DIR}/${CHECKPOINT_FILENAME} \ + --max-tokens 50000 --beam 5 --scoring sacrebleu +``` + +## Interactive Decoding +Launch the interactive console via +```bash +fairseq-interactive ${COVOST_ROOT}/fr --config-yaml config_st_fr_en.yaml \ + --task speech_to_text --path ${SAVE_DIR}/${CHECKPOINT_FILENAME} \ + --max-tokens 50000 --beam 5 +``` +Type in WAV/FLAC/OGG audio paths (one per line) after the prompt. + +#### Results +| --arch | Params | Fr-En | De-En | Es-En | Ca-En | En-De | En-Ca | En-Fa | En-Et | Model | +|---|---|---|---|---|---|---|---|---|---|---| +| s2t_transformer_s | 31M | [27.2](https://dl.fbaipublicfiles.com/fairseq/s2t/covost2_fr_en_st_transformer_s.pt) | [17.7](https://dl.fbaipublicfiles.com/fairseq/s2t/covost2_de_en_st_transformer_s.pt) | [23.1](https://dl.fbaipublicfiles.com/fairseq/s2t/covost2_es_en_st_transformer_s.pt) | [19.3](https://dl.fbaipublicfiles.com/fairseq/s2t/covost2_ca_en_st_transformer_s.pt) | [16.1](https://dl.fbaipublicfiles.com/fairseq/s2t/covost2_en_de_st_transformer_s.pt) | [21.6](https://dl.fbaipublicfiles.com/fairseq/s2t/covost2_en_ca_st_transformer_s.pt) | [12.9](https://dl.fbaipublicfiles.com/fairseq/s2t/covost2_en_fa_st_transformer_s.pt) | [12.8](https://dl.fbaipublicfiles.com/fairseq/s2t/covost2_en_et_st_transformer_s.pt) | (<-Download) | + +[[Back]](..) 
diff --git a/SpeechT5/fairseq/examples/speech_to_text/docs/librispeech_example.md b/SpeechT5/fairseq/examples/speech_to_text/docs/librispeech_example.md new file mode 100644 index 0000000000000000000000000000000000000000..4040fda9426027537036ba987d087a43e734bfd9 --- /dev/null +++ b/SpeechT5/fairseq/examples/speech_to_text/docs/librispeech_example.md @@ -0,0 +1,69 @@ +[[Back]](..) + +# S2T Example: Speech Recognition (ASR) on LibriSpeech +[LibriSpeech](https://www.danielpovey.com/files/2015_icassp_librispeech.pdf) is a de-facto standard English ASR +benchmark. We provide competitive +vanilla [Transformer](https://papers.nips.cc/paper/2017/file/3f5ee243547dee91fbd053c1c4a845aa-Paper.pdf) baselines. + +## Data preparation +Download and preprocess LibriSpeech data with +```bash +# additional Python packages for S2T data processing/model training +pip install pandas torchaudio sentencepiece + +python examples/speech_to_text/prep_librispeech_data.py \ + --output-root ${LS_ROOT} --vocab-type unigram --vocab-size 10000 +``` +where `LS_ROOT` is the root path for downloaded data as well as generated files (manifest, features, vocabulary and +data configuration). + +[Download](https://dl.fbaipublicfiles.com/fairseq/s2t/librispeech_vocab_unigram10000.zip) our vocabulary files +if you want to use our pre-trained models. + +## Training +```bash +fairseq-train ${LS_ROOT} --save-dir ${SAVE_DIR} \ + --config-yaml config.yaml --train-subset train-clean-100,train-clean-360,train-other-500 --valid-subset dev-clean,dev-other \ + --num-workers 4 --max-tokens 40000 --max-update 300000 \ + --task speech_to_text --criterion label_smoothed_cross_entropy --label-smoothing 0.1 --report-accuracy \ + --arch s2t_transformer_s --share-decoder-input-output-embed \ + --optimizer adam --lr 2e-3 --lr-scheduler inverse_sqrt --warmup-updates 10000 \ + --clip-norm 10.0 --seed 1 --update-freq 8 +``` +where `SAVE_DIR` is the checkpoint root path. Here we use `--arch s2t_transformer_s` (31M parameters) as example. +For better performance, you may switch to `s2t_transformer_m` (71M, with `--lr 1e-3`) or `s2t_transformer_l` +(268M, with `--lr 5e-4`). We set `--update-freq 8` to simulate 8 GPUs with 1 GPU. You may want to update it accordingly +when using more than 1 GPU. + +## Inference & Evaluation +Average the last 10 checkpoints and evaluate on the 4 splits +(`dev-clean`, `dev-other`, `test-clean` and `test-other`): +```bash +CHECKPOINT_FILENAME=avg_last_10_checkpoint.pt +python scripts/average_checkpoints.py --inputs ${SAVE_DIR} \ + --num-epoch-checkpoints 10 \ + --output "${SAVE_DIR}/${CHECKPOINT_FILENAME}" +for SUBSET in dev-clean dev-other test-clean test-other; do + fairseq-generate ${LS_ROOT} --config-yaml config.yaml --gen-subset ${SUBSET} \ + --task speech_to_text --path ${SAVE_DIR}/${CHECKPOINT_FILENAME} \ + --max-tokens 50000 --beam 5 --scoring wer +done +``` + +## Interactive Decoding +Launch the interactive console via +```bash +fairseq-interactive ${LS_ROOT} --config-yaml config.yaml --task speech_to_text \ + --path ${SAVE_DIR}/${CHECKPOINT_FILENAME} --max-tokens 50000 --beam 5 +``` +Type in WAV/FLAC/OGG audio paths (one per line) after the prompt. 
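+
+To decode many files, it can be convenient to build the path list programmatically and pipe it into the console. A small sketch (the LibriSpeech path below is a placeholder):
+```python
+from pathlib import Path
+
+# Write one absolute audio path per line, as expected by fairseq-interactive.
+root = Path("/path/to/LibriSpeech/test-clean")
+with open("audio_list.txt", "w") as f:
+    for flac in sorted(root.rglob("*.flac")):
+        f.write(f"{flac.resolve()}\n")
+```
+The resulting `audio_list.txt` can then be fed to the console, e.g. `fairseq-interactive ... < audio_list.txt`.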
+ +## Results + +| --arch | Params | dev-clean | dev-other | test-clean | test-other | Model | +|---|---|---|---|---|---|---| +| s2t_transformer_s | 30M | 3.8 | 8.9 | 4.4 | 9.0 | [Download](https://dl.fbaipublicfiles.com/fairseq/s2t/librispeech_transformer_s.pt) | +| s2t_transformer_m | 71M | 3.2 | 8.0 | 3.4 | 7.9 | [Download](https://dl.fbaipublicfiles.com/fairseq/s2t/librispeech_transformer_m.pt) | +| s2t_transformer_l | 268M | 3.0 | 7.5 | 3.2 | 7.5 | [Download](https://dl.fbaipublicfiles.com/fairseq/s2t/librispeech_transformer_l.pt) | + +[[Back]](..) diff --git a/SpeechT5/fairseq/examples/speech_to_text/docs/mtedx_example.md b/SpeechT5/fairseq/examples/speech_to_text/docs/mtedx_example.md new file mode 100644 index 0000000000000000000000000000000000000000..25b4556affbf5bc141b103095d15fffef6225c0e --- /dev/null +++ b/SpeechT5/fairseq/examples/speech_to_text/docs/mtedx_example.md @@ -0,0 +1,200 @@ +[[Back]](..) + +# S2T Example: Speech Translation (ST) on Multilingual TEDx + +[Multilingual TEDx](https://arxiv.org/abs/2102.01757) is multilingual corpus for speech recognition and +speech translation. The data is derived from TEDx talks in 8 source languages +with translations to a subset of 5 target languages. + +## Data Preparation +[Download](http://openslr.org/100/) and unpack Multilingual TEDx data to a path +`${MTEDX_ROOT}/${LANG_PAIR}`, then preprocess it with +```bash +# additional Python packages for S2T data processing/model training +pip install pandas torchaudio soundfile sentencepiece + +# Generate TSV manifests, features, vocabulary +# and configuration for each language +python examples/speech_to_text/prep_mtedx_data.py \ + --data-root ${MTEDX_ROOT} --task asr \ + --vocab-type unigram --vocab-size 1000 +python examples/speech_to_text/prep_mtedx_data.py \ + --data-root ${MTEDX_ROOT} --task st \ + --vocab-type unigram --vocab-size 1000 + +# Add vocabulary and configuration for joint data +# (based on the manifests and features generated above) +python examples/speech_to_text/prep_mtedx_data.py \ + --data-root ${MTEDX_ROOT} --task asr --joint \ + --vocab-type unigram --vocab-size 8000 +python examples/speech_to_text/prep_mtedx_data.py \ + --data-root ${MTEDX_ROOT} --task st --joint \ + --vocab-type unigram --vocab-size 8000 +``` +The generated files (manifest, features, vocabulary and data configuration) will be added to +`${MTEDX_ROOT}/${LANG_PAIR}` (per-language data) and `MTEDX_ROOT` (joint data). 
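+
+Before training, the generated manifests can be sanity-checked directly, since they are plain TSV files. A quick sketch (the path and subset name are assumptions; adjust them to your setup):
+```python
+import csv
+import pandas as pd
+
+# Mirrors load_df_from_tsv() in examples/speech_to_text/data_utils.py.
+df = pd.read_csv(
+    "/path/to/mtedx/es-es/train_asr.tsv",
+    sep="\t", quoting=csv.QUOTE_NONE, na_filter=False,
+)
+print(df.columns.tolist())        # e.g. id, audio, n_frames, tgt_text, speaker
+print(df["n_frames"].describe())  # distribution of utterance lengths in frames
+```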
+ + +## ASR +#### Training +Spanish as example: +```bash +fairseq-train ${MTEDX_ROOT}/es-es \ + --config-yaml config_asr.yaml --train-subset train_asr --valid-subset valid_asr \ + --save-dir ${ASR_SAVE_DIR} --num-workers 4 --max-tokens 40000 --max-epoch 200 \ + --task speech_to_text --criterion label_smoothed_cross_entropy --report-accuracy \ + --arch s2t_transformer_xs --optimizer adam --lr 2e-3 --lr-scheduler inverse_sqrt \ + --warmup-updates 10000 --clip-norm 10.0 --seed 1 --dropout 0.3 --label-smoothing 0.1 \ + --load-pretrained-encoder-from ${PRETRAINED_ENCODER} \ + --skip-invalid-size-inputs-valid-test \ + --keep-last-epochs 10 --update-freq 8 --patience 10 +``` +For joint model (using ASR data from all 8 languages): +```bash +fairseq-train ${MTEDX_ROOT} \ + --config-yaml config_asr.yaml \ + --train-subset train_es-es_asr,train_fr-fr_asr,train_pt-pt_asr,train_it-it_asr,train_ru-ru_asr,train_el-el_asr,train_ar-ar_asr,train_de-de_asr \ + --valid-subset valid_es-es_asr,valid_fr-fr_asr,valid_pt-pt_asr,valid_it-it_asr,valid_ru-ru_asr,valid_el-el_asr,valid_ar-ar_asr,valid_de-de_asr \ + --save-dir ${MULTILINGUAL_ASR_SAVE_DIR} --num-workers 4 --max-tokens 40000 --max-epoch 200 \ + --task speech_to_text --criterion label_smoothed_cross_entropy --report-accuracy \ + --arch s2t_transformer_s --optimizer adam --lr 2e-3 --lr-scheduler inverse_sqrt \ + --warmup-updates 10000 --clip-norm 10.0 --seed 1 --dropout 0.3 --label-smoothing 0.1 \ + --skip-invalid-size-inputs-valid-test \ + --keep-last-epochs 10 --update-freq 8 --patience 10 \ + --ignore-prefix-size 1 +``` +where `MULTILINGUAL_ASR_SAVE_DIR` is the checkpoint root path. We set `--update-freq 8` to simulate 8 GPUs +with 1 GPU. You may want to update it accordingly when using more than 1 GPU. +For multilingual models, we prepend target language ID token as target BOS, which should be excluded from +the training loss via `--ignore-prefix-size 1`. 
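+
+Conceptually, `--ignore-prefix-size 1` simply drops the first target position (the language-ID token) before the loss is computed. A toy sketch of the idea (not fairseq's actual criterion code):
+```python
+import torch
+import torch.nn.functional as F
+
+ignore_prefix_size = 1
+logits = torch.randn(2, 5, 100)           # (batch, target_len, vocab)
+targets = torch.randint(0, 100, (2, 5))   # first column holds the language-ID token
+
+# Exclude the prefix from both predictions and targets, then score the rest.
+logits = logits[:, ignore_prefix_size:, :]
+targets = targets[:, ignore_prefix_size:]
+loss = F.cross_entropy(logits.reshape(-1, logits.size(-1)), targets.reshape(-1))
+```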
+ +#### Inference & Evaluation +```bash +CHECKPOINT_FILENAME=avg_last_10_checkpoint.pt +python scripts/average_checkpoints.py \ + --inputs ${ASR_SAVE_DIR} --num-epoch-checkpoints 10 \ + --output "${ASR_SAVE_DIR}/${CHECKPOINT_FILENAME}" + +fairseq-generate ${MTEDX_ROOT}/es-es \ + --config-yaml config_asr.yaml --gen-subset test --task speech_to_text \ + --path ${ASR_SAVE_DIR}/${CHECKPOINT_FILENAME} --max-tokens 50000 --beam 5 \ + --skip-invalid-size-inputs-valid-test \ + --scoring wer --wer-tokenizer 13a --wer-lowercase --wer-remove-punct --remove-bpe + +# For models trained on joint data +CHECKPOINT_FILENAME=avg_last_10_checkpoint.pt +python scripts/average_checkpoints.py \ + --inputs ${MULTILINGUAL_ASR_SAVE_DIR} --num-epoch-checkpoints 10 \ + --output "${MULTILINGUAL_ASR_SAVE_DIR}/${CHECKPOINT_FILENAME}" + +for LANG in es fr pt it ru el ar de; do + fairseq-generate ${MTEDX_ROOT} \ + --config-yaml config_asr.yaml --gen-subset test_${LANG}-${LANG}_asr --task speech_to_text \ + --prefix-size 1 --path ${MULTILINGUAL_ASR_SAVE_DIR}/${CHECKPOINT_FILENAME} \ + --max-tokens 40000 --beam 5 \ + --skip-invalid-size-inputs-valid-test \ + --scoring wer --wer-tokenizer 13a --wer-lowercase --wer-remove-punct --remove-bpe +done +``` +#### Results +| Data | --arch | Params | Es | Fr | Pt | It | Ru | El | Ar | De | +|--------------|--------------------|--------|------|------|------|------|------|-------|-------|-------| +| Monolingual | s2t_transformer_xs | 10M | 46.4 | 45.6 | 54.8 | 48.0 | 74.7 | 109.5 | 104.4 | 111.1 | + + +## ST +#### Training +Es-En as example: +```bash +fairseq-train ${MTEDX_ROOT}/es-en \ + --config-yaml config_st.yaml --train-subset train_st --valid-subset valid_st \ + --save-dir ${ST_SAVE_DIR} --num-workers 4 --max-tokens 40000 --max-epoch 200 \ + --task speech_to_text --criterion label_smoothed_cross_entropy --report-accuracy \ + --arch s2t_transformer_xs --optimizer adam --lr 2e-3 --lr-scheduler inverse_sqrt \ + --warmup-updates 10000 --clip-norm 10.0 --seed 1 --dropout 0.3 --label-smoothing 0.1 \ + --load-pretrained-encoder-from ${PRETRAINED_ENCODER} \ + --skip-invalid-size-inputs-valid-test \ + --keep-last-epochs 10 --update-freq 8 --patience 10 +``` +For multilingual model (all 12 directions): +```bash +fairseq-train ${MTEDX_ROOT} \ + --config-yaml config_st.yaml \ + --train-subset train_el-en_st,train_es-en_st,train_es-fr_st,train_es-it_st,train_es-pt_st,train_fr-en_st,train_fr-es_st,train_fr-pt_st,train_it-en_st,train_it-es_st,train_pt-en_st,train_pt-es_st,train_ru-en_st \ + --valid-subset valid_el-en_st,valid_es-en_st,valid_es-fr_st,valid_es-it_st,valid_es-pt_st,valid_fr-en_st,valid_fr-es_st,valid_fr-pt_st,valid_it-en_st,valid_it-es_st,valid_pt-en_st,valid_pt-es_st,valid_ru-en_st \ + --save-dir ${MULTILINGUAL_ST_SAVE_DIR} --num-workers 4 --max-tokens 40000 --max-epoch 200 \ + --task speech_to_text --criterion label_smoothed_cross_entropy --report-accuracy \ + --arch s2t_transformer_s --optimizer adam --lr 2e-3 --lr-scheduler inverse_sqrt \ + --warmup-updates 10000 --clip-norm 10.0 --seed 1 --dropout 0.3 --label-smoothing 0.1 \ + --skip-invalid-size-inputs-valid-test \ + --keep-last-epochs 10 --update-freq 8 --patience 10 \ + --ignore-prefix-size 1 \ + --load-pretrained-encoder-from ${PRETRAINED_ENCODER} +``` +where `ST_SAVE_DIR` (`MULTILINGUAL_ST_SAVE_DIR`) is the checkpoint root path. The ST encoder is pre-trained by ASR +for faster training and better performance: `--load-pretrained-encoder-from <(JOINT_)ASR checkpoint path>`. 
We set +`--update-freq 8` to simulate 8 GPUs with 1 GPU. You may want to update it accordingly when using more than 1 GPU. +For multilingual models, we prepend target language ID token as target BOS, which should be excluded from +the training loss via `--ignore-prefix-size 1`. + +#### Inference & Evaluation +Average the last 10 checkpoints and evaluate on the `test` split: +```bash +CHECKPOINT_FILENAME=avg_last_10_checkpoint.pt +python scripts/average_checkpoints.py \ + --inputs ${ST_SAVE_DIR} --num-epoch-checkpoints 10 \ + --output "${ST_SAVE_DIR}/${CHECKPOINT_FILENAME}" + +fairseq-generate ${MTEDX_ROOT}/es-en \ + --config-yaml config_st.yaml --gen-subset test --task speech_to_text \ + --path ${ST_SAVE_DIR}/${CHECKPOINT_FILENAME} \ + --max-tokens 50000 --beam 5 --scoring sacrebleu --remove-bpe + +# For multilingual models +python scripts/average_checkpoints.py \ + --inputs ${MULTILINGUAL_ST_SAVE_DIR} --num-epoch-checkpoints 10 \ + --output "${MULTILINGUAL_ST_SAVE_DIR}/${CHECKPOINT_FILENAME}" + +for LANGPAIR in es-en es-fr es-pt fr-en fr-es fr-pt pt-en pt-es it-en it-es ru-en el-en; do + fairseq-generate ${MTEDX_ROOT} \ + --config-yaml config_st.yaml --gen-subset test_${LANGPAIR}_st --task speech_to_text \ + --prefix-size 1 --path ${MULTILINGUAL_ST_SAVE_DIR}/${CHECKPOINT_FILENAME} \ + --max-tokens 40000 --beam 5 \ + --skip-invalid-size-inputs-valid-test \ + --scoring sacrebleu --remove-bpe +done +``` +For multilingual models, we force decoding from the target language ID token (as BOS) via `--prefix-size 1`. + +#### Results +| Data | --arch | Params | Es-En | Es-Pt | Es-Fr | Fr-En | Fr-Es | Fr-Pt | Pt-En | Pt-Es | It-En | It-Es | Ru-En | El-En | +|--------------|--------------------|-----|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------| +| Bilingual | s2t_transformer_xs | 10M | 7.0 | 12.2 | 1.7 | 8.9 | 10.6 | 7.9 | 8.1 | 8.7 | 6.4 | 1.0 | 0.7 | 0.6 | +| Multilingual | s2t_transformer_s | 31M | 12.3 | 17.4 | 6.1 | 12.0 | 13.6 | 13.2 | 12.0 | 13.7 | 10.7 | 13.1 | 0.6 | 0.8 | + + +## Citation +Please cite as: +``` +@misc{salesky2021mtedx, + title={Multilingual TEDx Corpus for Speech Recognition and Translation}, + author={Elizabeth Salesky and Matthew Wiesner and Jacob Bremerman and Roldano Cattoni and Matteo Negri and Marco Turchi and Douglas W. Oard and Matt Post}, + year={2021}, +} + +@inproceedings{wang2020fairseqs2t, + title = {fairseq S2T: Fast Speech-to-Text Modeling with fairseq}, + author = {Changhan Wang and Yun Tang and Xutai Ma and Anne Wu and Dmytro Okhonko and Juan Pino}, + booktitle = {Proceedings of the 2020 Conference of the Asian Chapter of the Association for Computational Linguistics (AACL): System Demonstrations}, + year = {2020}, +} + +@inproceedings{ott2019fairseq, + title = {fairseq: A Fast, Extensible Toolkit for Sequence Modeling}, + author = {Myle Ott and Sergey Edunov and Alexei Baevski and Angela Fan and Sam Gross and Nathan Ng and David Grangier and Michael Auli}, + booktitle = {Proceedings of NAACL-HLT 2019: Demonstrations}, + year = {2019}, +} +``` + +[[Back]](..) diff --git a/SpeechT5/fairseq/examples/speech_to_text/docs/mustc_example.md b/SpeechT5/fairseq/examples/speech_to_text/docs/mustc_example.md new file mode 100644 index 0000000000000000000000000000000000000000..c95ef3e15660107c3384f87c1680f005044e7f3b --- /dev/null +++ b/SpeechT5/fairseq/examples/speech_to_text/docs/mustc_example.md @@ -0,0 +1,155 @@ +[[Back]](..) 
+ +# S2T Example: Speech Translation (ST) on MuST-C + +[MuST-C](https://www.aclweb.org/anthology/N19-1202) is multilingual speech-to-text translation corpus with +8-language translations on English TED talks. We match the state-of-the-art performance in +[ESPNet-ST](https://arxiv.org/pdf/2004.10234.pdf) with a simpler model training pipeline. + +## Data Preparation +[Download](https://ict.fbk.eu/must-c) and unpack MuST-C data to a path +`${MUSTC_ROOT}/en-${TARGET_LANG_ID}`, then preprocess it with +```bash +# additional Python packages for S2T data processing/model training +pip install pandas torchaudio soundfile sentencepiece + +# Generate TSV manifests, features, vocabulary +# and configuration for each language +python examples/speech_to_text/prep_mustc_data.py \ + --data-root ${MUSTC_ROOT} --task asr \ + --vocab-type unigram --vocab-size 5000 +python examples/speech_to_text/prep_mustc_data.py \ + --data-root ${MUSTC_ROOT} --task st \ + --vocab-type unigram --vocab-size 8000 + +# Add vocabulary and configuration for joint data +# (based on the manifests and features generated above) +python examples/speech_to_text/prep_mustc_data.py \ + --data-root ${MUSTC_ROOT} --task asr --joint \ + --vocab-type unigram --vocab-size 10000 +python examples/speech_to_text/prep_mustc_data.py \ + --data-root ${MUSTC_ROOT} --task st --joint \ + --vocab-type unigram --vocab-size 10000 +``` +The generated files (manifest, features, vocabulary and data configuration) will be added to +`${MUSTC_ROOT}/en-${TARGET_LANG_ID}` (per-language data) and `MUSTC_ROOT` (joint data). + +Download our vocabulary files if you want to use our pre-trained models: +- ASR: [En-De](https://dl.fbaipublicfiles.com/fairseq/s2t/mustc_de_asr_vocab_unigram5000.zip), [En-Nl](https://dl.fbaipublicfiles.com/fairseq/s2t/mustc_nl_asr_vocab_unigram5000.zip), [En-Es](https://dl.fbaipublicfiles.com/fairseq/s2t/mustc_es_asr_vocab_unigram5000.zip), [En-Fr](https://dl.fbaipublicfiles.com/fairseq/s2t/mustc_fr_asr_vocab_unigram5000.zip), [En-It](https://dl.fbaipublicfiles.com/fairseq/s2t/mustc_it_asr_vocab_unigram5000.zip), [En-Pt](https://dl.fbaipublicfiles.com/fairseq/s2t/mustc_pt_asr_vocab_unigram5000.zip), [En-Ro](https://dl.fbaipublicfiles.com/fairseq/s2t/mustc_ro_asr_vocab_unigram5000.zip), [En-Ru](https://dl.fbaipublicfiles.com/fairseq/s2t/mustc_ru_asr_vocab_unigram5000.zip), [Joint](https://dl.fbaipublicfiles.com/fairseq/s2t/mustc_joint_asr_vocab_unigram10000.zip) +- ST: [En-De](https://dl.fbaipublicfiles.com/fairseq/s2t/mustc_de_st_vocab_unigram8000.zip), [En-Nl](https://dl.fbaipublicfiles.com/fairseq/s2t/mustc_nl_st_vocab_unigram8000.zip), [En-Es](https://dl.fbaipublicfiles.com/fairseq/s2t/mustc_es_st_vocab_unigram8000.zip), [En-Fr](https://dl.fbaipublicfiles.com/fairseq/s2t/mustc_fr_st_vocab_unigram8000.zip), [En-It](https://dl.fbaipublicfiles.com/fairseq/s2t/mustc_it_st_vocab_unigram8000.zip), [En-Pt](https://dl.fbaipublicfiles.com/fairseq/s2t/mustc_pt_st_vocab_unigram8000.zip), [En-Ro](https://dl.fbaipublicfiles.com/fairseq/s2t/mustc_ro_st_vocab_unigram8000.zip), [En-Ru](https://dl.fbaipublicfiles.com/fairseq/s2t/mustc_ru_st_vocab_unigram8000.zip), [Multilingual](https://dl.fbaipublicfiles.com/fairseq/s2t/mustc_multilingual_st_vocab_unigram10000.zip) + +## ASR +#### Training +En-De as example: +```bash +fairseq-train ${MUSTC_ROOT}/en-de \ + --config-yaml config_asr.yaml --train-subset train_asr --valid-subset dev_asr \ + --save-dir ${ASR_SAVE_DIR} --num-workers 4 --max-tokens 40000 --max-update 100000 \ + --task speech_to_text 
--criterion label_smoothed_cross_entropy --label-smoothing 0.1 --report-accuracy \ + --arch s2t_transformer_s --optimizer adam --lr 1e-3 --lr-scheduler inverse_sqrt \ + --warmup-updates 10000 --clip-norm 10.0 --seed 1 --update-freq 8 +``` +For joint model (using ASR data from all 8 directions): +```bash +fairseq-train ${MUSTC_ROOT} \ + --config-yaml config_asr.yaml \ + --train-subset train_de_asr,train_nl_asr,train_es_asr,train_fr_asr,train_it_asr,train_pt_asr,train_ro_asr,train_ru_asr \ + --valid-subset dev_de_asr,dev_nl_asr,dev_es_asr,dev_fr_asr,dev_it_asr,dev_pt_asr,dev_ro_asr,dev_ru_asr \ + --save-dir ${JOINT_ASR_SAVE_DIR} --num-workers 4 --max-tokens 40000 --max-update 100000 \ + --task speech_to_text --criterion label_smoothed_cross_entropy --label-smoothing 0.1 --report-accuracy \ + --arch s2t_transformer_s --optimizer adam --lr 1e-3 --lr-scheduler inverse_sqrt \ + --warmup-updates 10000 --clip-norm 10.0 --seed 1 --update-freq 8 +``` +where `ASR_SAVE_DIR` (`JOINT_ASR_SAVE_DIR`) is the checkpoint root path. We set `--update-freq 8` to simulate 8 GPUs +with 1 GPU. You may want to update it accordingly when using more than 1 GPU. + +#### Inference & Evaluation +```bash +CHECKPOINT_FILENAME=avg_last_10_checkpoint.pt +python scripts/average_checkpoints.py \ + --inputs ${ASR_SAVE_DIR} --num-epoch-checkpoints 10 \ + --output "${ASR_SAVE_DIR}/${CHECKPOINT_FILENAME}" +fairseq-generate ${MUSTC_ROOT}/en-de \ + --config-yaml config_asr.yaml --gen-subset tst-COMMON_asr --task speech_to_text \ + --path ${ASR_SAVE_DIR}/${CHECKPOINT_FILENAME} --max-tokens 50000 --beam 5 \ + --scoring wer --wer-tokenizer 13a --wer-lowercase --wer-remove-punct + +# For models trained on joint data +python scripts/average_checkpoints.py \ + --inputs ${JOINT_ASR_SAVE_DIR} --num-epoch-checkpoints 10 \ + --output "${JOINT_ASR_SAVE_DIR}/${CHECKPOINT_FILENAME}" +for LANG in de nl es fr it pt ro ru; do + fairseq-generate ${MUSTC_ROOT} \ + --config-yaml config_asr.yaml --gen-subset tst-COMMON_${LANG}_asr --task speech_to_text \ + --path ${JOINT_ASR_SAVE_DIR}/${CHECKPOINT_FILENAME} --max-tokens 50000 --beam 5 \ + --scoring wer --wer-tokenizer 13a --wer-lowercase --wer-remove-punct +done +``` +#### Results +| Data | --arch | Params | En-De | En-Nl | En-Es | En-Fr | En-It | En-Pt | En-Ro | En-Ru | Model | +|---|---|---|---|---|---|---|---|---|---|---|---| +| Single | s2t_transformer_s | 31M | [18.2](https://dl.fbaipublicfiles.com/fairseq/s2t/mustc_de_asr_transformer_s.pt) | [17.6](https://dl.fbaipublicfiles.com/fairseq/s2t/mustc_nl_asr_transformer_s.pt) | [17.7](https://dl.fbaipublicfiles.com/fairseq/s2t/mustc_es_asr_transformer_s.pt) | [17.2](https://dl.fbaipublicfiles.com/fairseq/s2t/mustc_fr_asr_transformer_s.pt) | [17.9](https://dl.fbaipublicfiles.com/fairseq/s2t/mustc_it_asr_transformer_s.pt) | [19.1](https://dl.fbaipublicfiles.com/fairseq/s2t/mustc_pt_asr_transformer_s.pt) | [18.1](https://dl.fbaipublicfiles.com/fairseq/s2t/mustc_ro_asr_transformer_s.pt) | [17.7](https://dl.fbaipublicfiles.com/fairseq/s2t/mustc_ru_asr_transformer_s.pt) | (<-Download) | +| Joint | s2t_transformer_m | 76M | 16.8 | 16.7 | 16.9 | 16.9 | 17.0 | 17.4 | 17.0 | 16.9 | [Download](https://dl.fbaipublicfiles.com/fairseq/s2t/mustc_joint_asr_transformer_m.pt) | + +## ST +#### Training +En-De as example: +```bash +fairseq-train ${MUSTC_ROOT}/en-de \ + --config-yaml config_st.yaml --train-subset train_st --valid-subset dev_st \ + --save-dir ${ST_SAVE_DIR} --num-workers 4 --max-tokens 40000 --max-update 100000 \ + --task speech_to_text --criterion 
label_smoothed_cross_entropy --label-smoothing 0.1 --report-accuracy \ + --arch s2t_transformer_s --optimizer adam --lr 2e-3 --lr-scheduler inverse_sqrt \ + --warmup-updates 10000 --clip-norm 10.0 --seed 1 --update-freq 8 \ + --load-pretrained-encoder-from ${ASR_SAVE_DIR}/${CHECKPOINT_FILENAME} +``` +For multilingual model (all 8 directions): +```bash +fairseq-train ${MUSTC_ROOT} \ + --config-yaml config_st.yaml \ + --train-subset train_de_st,train_nl_st,train_es_st,train_fr_st,train_it_st,train_pt_st,train_ro_st,train_ru_st \ + --valid-subset dev_de_st,dev_nl_st,dev_es_st,dev_fr_st,dev_it_st,dev_pt_st,dev_ro_st,dev_ru_st \ + --save-dir ${MULTILINGUAL_ST_SAVE_DIR} --num-workers 4 --max-tokens 40000 --max-update 100000 \ + --task speech_to_text --criterion label_smoothed_cross_entropy --label-smoothing 0.1 --report-accuracy \ + --arch s2t_transformer_s --ignore-prefix-size 1 --optimizer adam --lr 2e-3 --lr-scheduler inverse_sqrt \ + --warmup-updates 10000 --clip-norm 10.0 --seed 1 --update-freq 8 \ + --load-pretrained-encoder-from ${JOINT_ASR_SAVE_DIR}/${CHECKPOINT_FILENAME} +``` +where `ST_SAVE_DIR` (`MULTILINGUAL_ST_SAVE_DIR`) is the checkpoint root path. The ST encoder is pre-trained by ASR +for faster training and better performance: `--load-pretrained-encoder-from <(JOINT_)ASR checkpoint path>`. We set +`--update-freq 8` to simulate 8 GPUs with 1 GPU. You may want to update it accordingly when using more than 1 GPU. +For multilingual models, we prepend target language ID token as target BOS, which should be excluded from +the training loss via `--ignore-prefix-size 1`. + +#### Inference & Evaluation +Average the last 10 checkpoints and evaluate on the `tst-COMMON` split: +```bash +CHECKPOINT_FILENAME=avg_last_10_checkpoint.pt +python scripts/average_checkpoints.py \ + --inputs ${ST_SAVE_DIR} --num-epoch-checkpoints 10 \ + --output "${ST_SAVE_DIR}/${CHECKPOINT_FILENAME}" +fairseq-generate ${MUSTC_ROOT}/en-de \ + --config-yaml config_st.yaml --gen-subset tst-COMMON_st --task speech_to_text \ + --path ${ST_SAVE_DIR}/${CHECKPOINT_FILENAME} \ + --max-tokens 50000 --beam 5 --scoring sacrebleu + +# For multilingual models +python scripts/average_checkpoints.py \ + --inputs ${MULTILINGUAL_ST_SAVE_DIR} --num-epoch-checkpoints 10 \ + --output "${MULTILINGUAL_ST_SAVE_DIR}/${CHECKPOINT_FILENAME}" +for LANG in de nl es fr it pt ro ru; do + fairseq-generate ${MUSTC_ROOT} \ + --config-yaml config_st.yaml --gen-subset tst-COMMON_${LANG}_st --task speech_to_text \ + --prefix-size 1 --path ${MULTILINGUAL_ST_SAVE_DIR}/${CHECKPOINT_FILENAME} \ + --max-tokens 50000 --beam 5 --scoring sacrebleu +done +``` +For multilingual models, we force decoding from the target language ID token (as BOS) via `--prefix-size 1`. 
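+
+If you post-process the generated hypotheses yourself, the corpus BLEU reported by `--scoring sacrebleu` can be reproduced offline with the sacrebleu Python API (file names below are placeholders; inputs must be detokenized):
+```python
+import sacrebleu
+
+# One detokenized hypothesis/reference per line, in the same order.
+with open("hyp.detok.txt") as f:
+    hyps = [line.strip() for line in f]
+with open("ref.detok.txt") as f:
+    refs = [line.strip() for line in f]
+
+bleu = sacrebleu.corpus_bleu(hyps, [refs])
+print(round(bleu.score, 2))
+```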
+ +#### Results +| Data | --arch | Params | En-De | En-Nl | En-Es | En-Fr | En-It | En-Pt | En-Ro | En-Ru | Model | +|---|---|---|---|---|---|---|---|---|---|---|---| +| Bilingual | s2t_transformer_s | 31M | [22.7](https://dl.fbaipublicfiles.com/fairseq/s2t/mustc_de_st_transformer_s.pt) | [27.3](https://dl.fbaipublicfiles.com/fairseq/s2t/mustc_nl_st_transformer_s.pt) | [27.2](https://dl.fbaipublicfiles.com/fairseq/s2t/mustc_es_st_transformer_s.pt) | [32.9](https://dl.fbaipublicfiles.com/fairseq/s2t/mustc_fr_st_transformer_s.pt) | [22.7](https://dl.fbaipublicfiles.com/fairseq/s2t/mustc_it_st_transformer_s.pt) | [28.1](https://dl.fbaipublicfiles.com/fairseq/s2t/mustc_pt_st_transformer_s.pt) | [21.9](https://dl.fbaipublicfiles.com/fairseq/s2t/mustc_ro_st_transformer_s.pt) | [15.3](https://dl.fbaipublicfiles.com/fairseq/s2t/mustc_ru_st_transformer_s.pt) | (<-Download) | +| Multilingual | s2t_transformer_m | 76M | 24.5 | 28.6 | 28.2 | 34.9 | 24.6 | 31.1 | 23.8 | 16.0 | [Download](https://dl.fbaipublicfiles.com/fairseq/s2t/mustc_multilingual_st_transformer_m.pt) | + +[[Back]](..) diff --git a/SpeechT5/fairseq/examples/speech_to_text/docs/simulst_mustc_example.md b/SpeechT5/fairseq/examples/speech_to_text/docs/simulst_mustc_example.md new file mode 100644 index 0000000000000000000000000000000000000000..52ca9ac0625b0da6b5202111b247cd91cc531be7 --- /dev/null +++ b/SpeechT5/fairseq/examples/speech_to_text/docs/simulst_mustc_example.md @@ -0,0 +1,190 @@ +# Simultaneous Speech Translation (SimulST) on MuST-C + +This is a tutorial of training and evaluating a transformer *wait-k* simultaneous model on MUST-C English-Germen Dataset, from [SimulMT to SimulST: Adapting Simultaneous Text Translation to End-to-End Simultaneous Speech Translation](https://www.aclweb.org/anthology/2020.aacl-main.58.pdf). + +[MuST-C](https://www.aclweb.org/anthology/N19-1202) is multilingual speech-to-text translation corpus with 8-language translations on English TED talks. + +## Data Preparation +This section introduces the data preparation for training and evaluation. +If you only want to evaluate the model, please jump to [Inference & Evaluation](#inference-&-evaluation) + +[Download](https://ict.fbk.eu/must-c) and unpack MuST-C data to a path +`${MUSTC_ROOT}/en-${TARGET_LANG_ID}`, then preprocess it with +```bash +# Additional Python packages for S2T data processing/model training +pip install pandas torchaudio sentencepiece + +# Generate TSV manifests, features, vocabulary, +# global cepstral and mean estimation, +# and configuration for each language +cd fairseq + +python examples/speech_to_text/prep_mustc_data.py \ + --data-root ${MUSTC_ROOT} --task asr \ + --vocab-type unigram --vocab-size 10000 \ + --cmvn-type global + +python examples/speech_to_text/prep_mustc_data.py \ + --data-root ${MUSTC_ROOT} --task st \ + --vocab-type unigram --vocab-size 10000 \ + --cmvn-type global +``` + +## ASR Pretraining +We need a pretrained offline ASR model. Assuming the save directory of the ASR model is `${ASR_SAVE_DIR}`. +The following command (and the subsequent training commands in this tutorial) assume training on 1 GPU (you can also train on 8 GPUs and remove the `--update-freq 8` option). 
+
+```bash
+fairseq-train ${MUSTC_ROOT}/en-de \
+    --config-yaml config_asr.yaml --train-subset train_asr --valid-subset dev_asr \
+    --save-dir ${ASR_SAVE_DIR} --num-workers 4 --max-tokens 40000 --max-update 100000 \
+    --task speech_to_text --criterion label_smoothed_cross_entropy --report-accuracy \
+    --arch convtransformer_espnet --optimizer adam --lr 0.0005 --lr-scheduler inverse_sqrt \
+    --warmup-updates 10000 --clip-norm 10.0 --seed 1 --update-freq 8
+```
+A pretrained ASR checkpoint can be downloaded [here](https://dl.fbaipublicfiles.com/simultaneous_translation/must_c_v1_en_de_pretrained_asr).
+
+## Simultaneous Speech Translation Training
+
+### Wait-K with fixed pre-decision module
+Fixed pre-decision means that the model applies its simultaneous policy only at the boundaries of fixed-size chunks of encoder states.
+Here is an example with a fixed pre-decision ratio of 7 (a simultaneous decision is made every 7 encoder states) and
+a wait-3 policy. Assuming the save directory is `${ST_SAVE_DIR}`:
+```bash
+fairseq-train ${MUSTC_ROOT}/en-de \
+    --config-yaml config_st.yaml --train-subset train_st --valid-subset dev_st \
+    --save-dir ${ST_SAVE_DIR} --num-workers 8 \
+    --optimizer adam --lr 0.0001 --lr-scheduler inverse_sqrt --clip-norm 10.0 \
+    --criterion label_smoothed_cross_entropy \
+    --warmup-updates 4000 --max-update 100000 --max-tokens 40000 --seed 2 \
+    --load-pretrained-encoder-from ${ASR_SAVE_DIR}/checkpoint_best.pt \
+    --task speech_to_text \
+    --arch convtransformer_simul_trans_espnet \
+    --simul-type waitk_fixed_pre_decision \
+    --waitk-lagging 3 \
+    --fixed-pre-decision-ratio 7 \
+    --update-freq 8
+```
+### Monotonic multihead attention with fixed pre-decision module
+```bash
+fairseq-train ${MUSTC_ROOT}/en-de \
+    --config-yaml config_st.yaml --train-subset train_st --valid-subset dev_st \
+    --save-dir ${ST_SAVE_DIR} --num-workers 8 \
+    --optimizer adam --lr 0.0001 --lr-scheduler inverse_sqrt --clip-norm 10.0 \
+    --warmup-updates 4000 --max-update 100000 --max-tokens 40000 --seed 2 \
+    --load-pretrained-encoder-from ${ASR_SAVE_DIR}/${CHECKPOINT_FILENAME} \
+    --task speech_to_text \
+    --criterion latency_augmented_label_smoothed_cross_entropy \
+    --latency-weight-avg 0.1 \
+    --arch convtransformer_simul_trans_espnet \
+    --simul-type infinite_lookback_fixed_pre_decision \
+    --fixed-pre-decision-ratio 7 \
+    --update-freq 8
+```
+## Inference & Evaluation
+[SimulEval](https://github.com/facebookresearch/SimulEval) is used for evaluation:
+
+```bash
+git clone https://github.com/facebookresearch/SimulEval.git
+cd SimulEval
+pip install -e .
+
+simuleval \
+    --agent ${FAIRSEQ}/examples/speech_to_text/simultaneous_translation/agents/fairseq_simul_st_agent.py \
+    --source ${SRC_LIST_OF_AUDIO} \
+    --target ${TGT_FILE} \
+    --data-bin ${MUSTC_ROOT}/en-de \
+    --config config_st.yaml \
+    --model-path ${ST_SAVE_DIR}/${CHECKPOINT_FILENAME} \
+    --output ${OUTPUT} \
+    --scores
+```
+
+The source file `${SRC_LIST_OF_AUDIO}` is a list of paths to audio files. Assuming your audio files are stored at `/home/user/data`,
+it should look like this:
+
+```bash
+/home/user/data/audio-1.wav
+/home/user/data/audio-2.wav
+```
+
+Each line of the target file `${TGT_FILE}` is the translation for the corresponding audio file:
+```bash
+Translation_1
+Translation_2
+```
+The evaluation runs on the original MuST-C segmentation.
+The following command will generate the wav list and text file for an evaluation set `${SPLIT}` (chosen from `dev`, `tst-COMMON` and `tst-HE`) in MuST-C to `${EVAL_DATA}`.
+```bash +python ${FAIRSEQ}/examples/speech_to_text/seg_mustc_data.py \ + --data-root ${MUSTC_ROOT} --lang de \ + --split ${SPLIT} --task st \ + --output ${EVAL_DATA} +``` + +The `--data-bin` and `--config` should be the same in previous section if you prepare the data from the scratch. +If only for evaluation, a prepared data directory can be found [here](https://dl.fbaipublicfiles.com/simultaneous_translation/must_c_v1.0_en_de_databin.tgz). It contains +- `spm_unigram10000_st.model`: a sentencepiece model binary. +- `spm_unigram10000_st.txt`: the dictionary file generated by the sentencepiece model. +- `gcmvn.npz`: the binary for global cepstral mean and variance. +- `config_st.yaml`: the config yaml file. It looks like this. +You will need to set the absolute paths for `sentencepiece_model` and `stats_npz_path` if the data directory is downloaded. +```yaml +bpe_tokenizer: + bpe: sentencepiece + sentencepiece_model: ABS_PATH_TO_SENTENCEPIECE_MODEL +global_cmvn: + stats_npz_path: ABS_PATH_TO_GCMVN_FILE +input_channels: 1 +input_feat_per_channel: 80 +sampling_alpha: 1.0 +specaugment: + freq_mask_F: 27 + freq_mask_N: 1 + time_mask_N: 1 + time_mask_T: 100 + time_mask_p: 1.0 + time_wrap_W: 0 +transforms: + '*': + - global_cmvn + _train: + - global_cmvn + - specaugment +vocab_filename: spm_unigram10000_st.txt +``` + +Notice that once a `--data-bin` is set, the `--config` is the base name of the config yaml, not the full path. + +Set `--model-path` to the model checkpoint. +A pretrained checkpoint can be downloaded from [here](https://dl.fbaipublicfiles.com/simultaneous_translation/convtransformer_wait5_pre7), which is a wait-5 model with a pre-decision of 280 ms. + +The result of this model on `tst-COMMON` is: +```bash +{ + "Quality": { + "BLEU": 13.94974229366959 + }, + "Latency": { + "AL": 1751.8031870037803, + "AL_CA": 2338.5911762796536, + "AP": 0.7931395378788959, + "AP_CA": 0.9405103863210942, + "DAL": 1987.7811616943081, + "DAL_CA": 2425.2751560926167 + } +} +``` + +If `--output ${OUTPUT}` option is used, the detailed log and scores will be stored under the `${OUTPUT}` directory. + + +The quality is measured by detokenized BLEU. So make sure that the predicted words sent to the server are detokenized. + +The latency metrics are +* Average Proportion +* Average Lagging +* Differentiable Average Lagging + +Again they will also be evaluated on detokenized text. diff --git a/SpeechT5/fairseq/examples/speech_to_text/prep_covost_data.py b/SpeechT5/fairseq/examples/speech_to_text/prep_covost_data.py new file mode 100644 index 0000000000000000000000000000000000000000..af1d3fc6b854c961dc460b4db23e86ff4fcbcbf3 --- /dev/null +++ b/SpeechT5/fairseq/examples/speech_to_text/prep_covost_data.py @@ -0,0 +1,280 @@ +#!/usr/bin/env python3 +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. 
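+# Prepares CoVoST 2 ASR/ST data on top of a Common Voice download: extracts
+# 80-dim log mel filter bank features, packs them into fbank80.zip, writes
+# per-split TSV manifests, trains a SentencePiece vocabulary and generates
+# the data config YAML (see docs/covost_example.md).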
+ +import argparse +import logging +from pathlib import Path +import shutil +from tempfile import NamedTemporaryFile +from typing import Optional, Tuple + +import pandas as pd +import torchaudio +from examples.speech_to_text.data_utils import ( + create_zip, + extract_fbank_features, + filter_manifest_df, + gen_config_yaml, + gen_vocab, + get_zip_manifest, + load_df_from_tsv, + save_df_to_tsv, +) +from torch import Tensor +from torch.utils.data import Dataset +from torchaudio.datasets.utils import download_url, extract_archive +from tqdm import tqdm + + +log = logging.getLogger(__name__) + + +MANIFEST_COLUMNS = ["id", "audio", "n_frames", "tgt_text", "speaker"] + + +class CoVoST(Dataset): + """Create a Dataset for CoVoST (https://github.com/facebookresearch/covost). + + Args: + root (str): root path to the dataset and generated manifests/features + source_language (str): source (audio) language + target_language (str, optional): target (text) language, + None for no translation (default: None) + version (int, optional): CoVoST version. (default: 2) + download (bool, optional): Whether to download the dataset if it is not + found at root path. (default: ``False``). + """ + + COVOST_URL_TEMPLATE = ( + "https://dl.fbaipublicfiles.com/covost/" + "covost_v2.{src_lang}_{tgt_lang}.tsv.tar.gz" + ) + + VERSIONS = {2} + SPLITS = ["train", "dev", "test"] + + XX_EN_LANGUAGES = { + 1: ["fr", "de", "nl", "ru", "es", "it", "tr", "fa", "sv-SE", "mn", "zh-CN"], + 2: [ + "fr", + "de", + "es", + "ca", + "it", + "ru", + "zh-CN", + "pt", + "fa", + "et", + "mn", + "nl", + "tr", + "ar", + "sv-SE", + "lv", + "sl", + "ta", + "ja", + "id", + "cy", + ], + } + EN_XX_LANGUAGES = { + 1: [], + 2: [ + "de", + "tr", + "fa", + "sv-SE", + "mn", + "zh-CN", + "cy", + "ca", + "sl", + "et", + "id", + "ar", + "ta", + "lv", + "ja", + ], + } + + def __init__( + self, + root: str, + split: str, + source_language: str, + target_language: Optional[str] = None, + version: int = 2, + ) -> None: + assert version in self.VERSIONS and split in self.SPLITS + assert source_language is not None + self.no_translation = target_language is None + if not self.no_translation: + assert "en" in {source_language, target_language} + if source_language == "en": + assert target_language in self.EN_XX_LANGUAGES[version] + else: + assert source_language in self.XX_EN_LANGUAGES[version] + else: + # Hack here so that we can get "split" column from CoVoST TSV. + # Note that we use CoVoST train split for ASR which is an extension + # to Common Voice train split. 
+ target_language = "de" if source_language == "en" else "en" + + self.root: Path = Path(root) + + cv_tsv_path = self.root / "validated.tsv" + assert cv_tsv_path.is_file() + + covost_url = self.COVOST_URL_TEMPLATE.format( + src_lang=source_language, tgt_lang=target_language + ) + covost_archive = self.root / Path(covost_url).name + if not covost_archive.is_file(): + download_url(covost_url, self.root.as_posix(), hash_value=None) + extract_archive(covost_archive.as_posix()) + + cv_tsv = load_df_from_tsv(cv_tsv_path) + covost_tsv = load_df_from_tsv( + self.root / Path(covost_url).name.replace(".tar.gz", "") + ) + df = pd.merge( + left=cv_tsv[["path", "sentence", "client_id"]], + right=covost_tsv[["path", "translation", "split"]], + how="inner", + on="path", + ) + if split == "train": + df = df[(df["split"] == split) | (df["split"] == f"{split}_covost")] + else: + df = df[df["split"] == split] + data = df.to_dict(orient="index").items() + data = [v for k, v in sorted(data, key=lambda x: x[0])] + self.data = [] + for e in data: + try: + path = self.root / "clips" / e["path"] + _ = torchaudio.info(path.as_posix()) + self.data.append(e) + except RuntimeError: + pass + + def __getitem__( + self, n: int + ) -> Tuple[Tensor, int, str, str, Optional[str], str, str]: + """Load the n-th sample from the dataset. + + Args: + n (int): The index of the sample to be loaded + + Returns: + tuple: ``(waveform, sample_rate, sentence, translation, speaker_id, + sample_id)`` + """ + data = self.data[n] + path = self.root / "clips" / data["path"] + waveform, sample_rate = torchaudio.load(path) + sentence = data["sentence"] + translation = None if self.no_translation else data["translation"] + speaker_id = data["client_id"] + _id = data["path"].replace(".mp3", "") + return waveform, sample_rate, sentence, translation, speaker_id, _id + + def __len__(self) -> int: + return len(self.data) + + +def process(args): + root = Path(args.data_root).absolute() / args.src_lang + if not root.is_dir(): + raise NotADirectoryError(f"{root} does not exist") + # Extract features + feature_root = root / "fbank80" + feature_root.mkdir(exist_ok=True) + for split in CoVoST.SPLITS: + print(f"Fetching split {split}...") + dataset = CoVoST(root, split, args.src_lang, args.tgt_lang) + print("Extracting log mel filter bank features...") + for waveform, sample_rate, _, _, _, utt_id in tqdm(dataset): + extract_fbank_features( + waveform, sample_rate, feature_root / f"{utt_id}.npy" + ) + # Pack features into ZIP + zip_path = root / "fbank80.zip" + print("ZIPing features...") + create_zip(feature_root, zip_path) + print("Fetching ZIP manifest...") + zip_manifest = get_zip_manifest(zip_path) + # Generate TSV manifest + print("Generating manifest...") + train_text = [] + task = f"asr_{args.src_lang}" + if args.tgt_lang is not None: + task = f"st_{args.src_lang}_{args.tgt_lang}" + for split in CoVoST.SPLITS: + manifest = {c: [] for c in MANIFEST_COLUMNS} + dataset = CoVoST(root, split, args.src_lang, args.tgt_lang) + for wav, sr, src_utt, tgt_utt, speaker_id, utt_id in tqdm(dataset): + manifest["id"].append(utt_id) + manifest["audio"].append(zip_manifest[utt_id]) + duration_ms = int(wav.size(1) / sr * 1000) + manifest["n_frames"].append(int(1 + (duration_ms - 25) / 10)) + manifest["tgt_text"].append(src_utt if args.tgt_lang is None else tgt_utt) + manifest["speaker"].append(speaker_id) + is_train_split = split.startswith("train") + if is_train_split: + train_text.extend(manifest["tgt_text"]) + df = pd.DataFrame.from_dict(manifest) + df = 
filter_manifest_df(df, is_train_split=is_train_split) + save_df_to_tsv(df, root / f"{split}_{task}.tsv") + # Generate vocab + vocab_size_str = "" if args.vocab_type == "char" else str(args.vocab_size) + spm_filename_prefix = f"spm_{args.vocab_type}{vocab_size_str}_{task}" + with NamedTemporaryFile(mode="w") as f: + for t in train_text: + f.write(t + "\n") + gen_vocab( + Path(f.name), + root / spm_filename_prefix, + args.vocab_type, + args.vocab_size + ) + # Generate config YAML + gen_config_yaml( + root, + spm_filename_prefix + ".model", + yaml_filename=f"config_{task}.yaml", + specaugment_policy="lb", + ) + # Clean up + shutil.rmtree(feature_root) + + +def main(): + parser = argparse.ArgumentParser() + parser.add_argument( + "--data-root", "-d", required=True, type=str, + help="data root with sub-folders for each language <root>/<src_lang>" + ) + parser.add_argument( + "--vocab-type", + default="unigram", + required=True, + type=str, + choices=["bpe", "unigram", "char"], + ), + parser.add_argument("--vocab-size", default=1000, type=int) + parser.add_argument("--src-lang", "-s", required=True, type=str) + parser.add_argument("--tgt-lang", "-t", type=str) + args = parser.parse_args() + + process(args) + + +if __name__ == "__main__": + main() diff --git a/SpeechT5/fairseq/examples/speech_to_text/prep_librispeech_data.py b/SpeechT5/fairseq/examples/speech_to_text/prep_librispeech_data.py new file mode 100644 index 0000000000000000000000000000000000000000..7b08447190b8e7af4d81c49abdf42461fdd6760b --- /dev/null +++ b/SpeechT5/fairseq/examples/speech_to_text/prep_librispeech_data.py @@ -0,0 +1,118 @@ +#!/usr/bin/env python3 +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. 
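The `n_frames` column written by the CoVoST manifest loop above follows from the 25 ms window / 10 ms shift filterbank framing assumed by `extract_fbank_features`. A small self-contained check of that bookkeeping (synthetic waveform, assumed 16 kHz; not part of the prep script itself):

```python
import torch

# Stand-in for a loaded waveform of shape (channels, samples); 3 s at 16 kHz.
sample_rate = 16000
waveform = torch.zeros(1, 3 * sample_rate)

duration_ms = int(waveform.size(1) / sample_rate * 1000)
# Same formula as in the prep scripts: one 25 ms analysis window,
# then one additional frame for every 10 ms shift.
n_frames = int(1 + (duration_ms - 25) / 10)
print(duration_ms, n_frames)  # 3000 298
```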
+ +import argparse +import logging +from pathlib import Path +import shutil +from tempfile import NamedTemporaryFile + +import pandas as pd +from examples.speech_to_text.data_utils import ( + create_zip, + extract_fbank_features, + gen_config_yaml, + gen_vocab, + get_zip_manifest, + save_df_to_tsv, +) +from torchaudio.datasets import LIBRISPEECH +from tqdm import tqdm + + +log = logging.getLogger(__name__) + +SPLITS = [ + "train-clean-100", + "train-clean-360", + "train-other-500", + "dev-clean", + "dev-other", + "test-clean", + "test-other", +] + +MANIFEST_COLUMNS = ["id", "audio", "n_frames", "tgt_text", "speaker"] + + +def process(args): + out_root = Path(args.output_root).absolute() + out_root.mkdir(exist_ok=True) + # Extract features + feature_root = out_root / "fbank80" + feature_root.mkdir(exist_ok=True) + for split in SPLITS: + print(f"Fetching split {split}...") + dataset = LIBRISPEECH(out_root.as_posix(), url=split, download=True) + print("Extracting log mel filter bank features...") + for wav, sample_rate, _, spk_id, chapter_no, utt_no in tqdm(dataset): + sample_id = f"{spk_id}-{chapter_no}-{utt_no}" + extract_fbank_features( + wav, sample_rate, feature_root / f"{sample_id}.npy" + ) + # Pack features into ZIP + zip_path = out_root / "fbank80.zip" + print("ZIPing features...") + create_zip(feature_root, zip_path) + print("Fetching ZIP manifest...") + zip_manifest = get_zip_manifest(zip_path) + # Generate TSV manifest + print("Generating manifest...") + train_text = [] + for split in SPLITS: + manifest = {c: [] for c in MANIFEST_COLUMNS} + dataset = LIBRISPEECH(out_root.as_posix(), url=split) + for wav, sample_rate, utt, spk_id, chapter_no, utt_no in tqdm(dataset): + sample_id = f"{spk_id}-{chapter_no}-{utt_no}" + manifest["id"].append(sample_id) + manifest["audio"].append(zip_manifest[sample_id]) + duration_ms = int(wav.size(1) / sample_rate * 1000) + manifest["n_frames"].append(int(1 + (duration_ms - 25) / 10)) + manifest["tgt_text"].append(utt.lower()) + manifest["speaker"].append(spk_id) + save_df_to_tsv( + pd.DataFrame.from_dict(manifest), out_root / f"{split}.tsv" + ) + if split.startswith("train"): + train_text.extend(manifest["tgt_text"]) + # Generate vocab + vocab_size = "" if args.vocab_type == "char" else str(args.vocab_size) + spm_filename_prefix = f"spm_{args.vocab_type}{vocab_size}" + with NamedTemporaryFile(mode="w") as f: + for t in train_text: + f.write(t + "\n") + gen_vocab( + Path(f.name), + out_root / spm_filename_prefix, + args.vocab_type, + args.vocab_size, + ) + # Generate config YAML + gen_config_yaml( + out_root, spm_filename_prefix + ".model", specaugment_policy="ld" + ) + # Clean up + shutil.rmtree(feature_root) + + +def main(): + parser = argparse.ArgumentParser() + parser.add_argument("--output-root", "-o", required=True, type=str) + parser.add_argument( + "--vocab-type", + default="unigram", + required=True, + type=str, + choices=["bpe", "unigram", "char"], + ), + parser.add_argument("--vocab-size", default=10000, type=int) + args = parser.parse_args() + + process(args) + + +if __name__ == "__main__": + main() diff --git a/SpeechT5/fairseq/examples/speech_to_text/prep_mtedx_data.py b/SpeechT5/fairseq/examples/speech_to_text/prep_mtedx_data.py new file mode 100644 index 0000000000000000000000000000000000000000..34b1c398c8df5168cfc731cf68592b2ae8d5897b --- /dev/null +++ b/SpeechT5/fairseq/examples/speech_to_text/prep_mtedx_data.py @@ -0,0 +1,238 @@ +#!/usr/bin/env python3 +# Copyright (c) Facebook, Inc. and its affiliates. 
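Both the CoVoST and LibriSpeech prep scripts above write plain tab-separated manifests with the columns `id`, `audio`, `n_frames`, `tgt_text`, and `speaker`. A quick way to sanity-check a generated LibriSpeech split; the directory name is an assumption standing in for whatever was passed as `--output-root`, and `QUOTE_NONE` mirrors how `save_df_to_tsv` writes the files:

```python
import csv
from pathlib import Path

import pandas as pd

out_root = Path("librispeech")  # whatever was passed as --output-root
df = pd.read_csv(out_root / "dev-clean.tsv", sep="\t", quoting=csv.QUOTE_NONE)

print(df.columns.tolist())  # ['id', 'audio', 'n_frames', 'tgt_text', 'speaker']
print(df[["id", "n_frames"]].head())
# n_frames counts 10 ms filterbank frames, so this is roughly the split duration in hours.
print(df["n_frames"].sum() / 100 / 3600)
```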
+# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +import argparse +import logging +import os +from pathlib import Path +import shutil +from itertools import groupby +from tempfile import NamedTemporaryFile +from typing import Tuple + +import pandas as pd +import soundfile as sf +from examples.speech_to_text.data_utils import ( + create_zip, + extract_fbank_features, + filter_manifest_df, + gen_config_yaml, + gen_vocab, + get_zip_manifest, + load_df_from_tsv, + save_df_to_tsv, +) +import torch +from torch.utils.data import Dataset +from tqdm import tqdm + +from fairseq.data.audio.audio_utils import get_waveform + + +log = logging.getLogger(__name__) + + +MANIFEST_COLUMNS = ["id", "audio", "n_frames", "tgt_text", "speaker", "tgt_lang"] + + +class mTEDx(Dataset): + """ + Create a Dataset for Multilingual TEDx. + Each item is a tuple of the form: waveform, sample_rate, source utterance, + target utterance, speaker_id, utterance_id + """ + + SPLITS = ["train", "valid", "test"] + LANGPAIRS = ["es-es", "fr-fr", "pt-pt", "it-it", "ru-ru", "el-el", "ar-ar", "de-de", + "es-en", "es-fr", "es-pt", "es-it", "fr-en", "fr-es", "fr-pt", + "pt-en", "pt-es", "it-en", "it-es", "ru-en", "el-en"] + + def __init__(self, root: str, lang: str, split: str) -> None: + assert split in self.SPLITS and lang in self.LANGPAIRS + _root = Path(root) / f"{lang}" / "data" / split + wav_root, txt_root = _root / "wav", _root / "txt" + assert _root.is_dir() and wav_root.is_dir() and txt_root.is_dir() + # Load audio segments + try: + import yaml + except ImportError: + print("Please install PyYAML to load the Multilingual TEDx YAML files") + with open(txt_root / f"{split}.yaml") as f: + segments = yaml.load(f, Loader=yaml.BaseLoader) + # Load source and target utterances + src, tgt = lang.split("-") + for _lang in [src, tgt]: + with open(txt_root / f"{split}.{_lang}") as f: + utterances = [r.strip() for r in f] + assert len(segments) == len(utterances) + for i, u in enumerate(utterances): + segments[i][_lang] = u + # Gather info + self.data = [] + for wav_filename, _seg_group in groupby(segments, lambda x: x["wav"]): + wav_filename = wav_filename.replace(".wav", ".flac") + wav_path = wav_root / wav_filename + sample_rate = sf.info(wav_path.as_posix()).samplerate + seg_group = sorted(_seg_group, key=lambda x: float(x["offset"])) + for i, segment in enumerate(seg_group): + offset = int(float(segment["offset"]) * sample_rate) + n_frames = int(float(segment["duration"]) * sample_rate) + _id = f"{wav_path.stem}_{i}" + self.data.append( + ( + wav_path.as_posix(), + offset, + n_frames, + sample_rate, + segment[src], + segment[tgt], + segment["speaker_id"], + tgt, + _id, + ) + ) + + def __getitem__(self, n: int) -> Tuple[torch.Tensor, int, str, str, str, str, str]: + wav_path, offset, n_frames, sr, src_utt, tgt_utt, spk_id, tgt_lang, utt_id = self.data[n] + waveform, _ = get_waveform(wav_path, frames=n_frames, start=offset) + waveform = torch.from_numpy(waveform) + return waveform, sr, src_utt, tgt_utt, spk_id, tgt_lang, utt_id + + def __len__(self) -> int: + return len(self.data) + + +def process(args): + root = Path(args.data_root).absolute() + for lang in mTEDx.LANGPAIRS: + cur_root = root / f"{lang}" + if not cur_root.is_dir(): + print(f"{cur_root.as_posix()} does not exist. 
Skipped.") + continue + # Extract features + feature_root = cur_root / "fbank80" + feature_root.mkdir(exist_ok=True) + for split in mTEDx.SPLITS: + print(f"Fetching split {split}...") + dataset = mTEDx(root.as_posix(), lang, split) + print("Extracting log mel filter bank features...") + for waveform, sample_rate, _, _, _, _, utt_id in tqdm(dataset): + extract_fbank_features( + waveform, sample_rate, feature_root / f"{utt_id}.npy" + ) + # Pack features into ZIP + zip_path = cur_root / "fbank80.zip" + print("ZIPing features...") + create_zip(feature_root, zip_path) + print("Fetching ZIP manifest...") + zip_manifest = get_zip_manifest(zip_path) + # Generate TSV manifest + print("Generating manifest...") + train_text = [] + for split in mTEDx.SPLITS: + is_train_split = split.startswith("train") + manifest = {c: [] for c in MANIFEST_COLUMNS} + dataset = mTEDx(args.data_root, lang, split) + for wav, sr, src_utt, tgt_utt, speaker_id, tgt_lang, utt_id in tqdm(dataset): + manifest["id"].append(utt_id) + manifest["audio"].append(zip_manifest[utt_id]) + duration_ms = int(wav.size(1) / sr * 1000) + manifest["n_frames"].append(int(1 + (duration_ms - 25) / 10)) + manifest["tgt_text"].append(src_utt if args.task == "asr" else tgt_utt) + manifest["speaker"].append(speaker_id) + manifest["tgt_lang"].append(tgt_lang) + if is_train_split: + train_text.extend(manifest["tgt_text"]) + df = pd.DataFrame.from_dict(manifest) + df = filter_manifest_df(df, is_train_split=is_train_split) + save_df_to_tsv(df, cur_root / f"{split}_{args.task}.tsv") + # Generate vocab + v_size_str = "" if args.vocab_type == "char" else str(args.vocab_size) + spm_filename_prefix = f"spm_{args.vocab_type}{v_size_str}_{args.task}" + with NamedTemporaryFile(mode="w") as f: + for t in train_text: + f.write(t + "\n") + gen_vocab( + Path(f.name), + cur_root / spm_filename_prefix, + args.vocab_type, + args.vocab_size, + ) + # Generate config YAML + gen_config_yaml( + cur_root, + spm_filename_prefix + ".model", + yaml_filename=f"config_{args.task}.yaml", + specaugment_policy="lb", + ) + # Clean up + shutil.rmtree(feature_root) + + +def process_joint(args): + cur_root = Path(args.data_root) + assert all((cur_root / f"{lang}").is_dir() for lang in mTEDx.LANGPAIRS), \ + "do not have downloaded data available for all languages" + # Generate vocab + vocab_size_str = "" if args.vocab_type == "char" else str(args.vocab_size) + spm_filename_prefix = f"spm_{args.vocab_type}{vocab_size_str}_{args.task}" + with NamedTemporaryFile(mode="w") as f: + for lang in mTEDx.LANGPAIRS: + tsv_path = cur_root / f"{lang}" / f"train_{args.task}.tsv" + df = load_df_from_tsv(tsv_path) + for t in df["tgt_text"]: + f.write(t + "\n") + special_symbols = None + if args.joint: + # Add tgt_lang tags to dict + special_symbols = list({f'<lang:{lang.split("-")[1]}>' for lang in mTEDx.LANGPAIRS}) + gen_vocab( + Path(f.name), + cur_root / spm_filename_prefix, + args.vocab_type, + args.vocab_size, + special_symbols=special_symbols + ) + # Generate config YAML + gen_config_yaml( + cur_root, + spm_filename_prefix + ".model", + yaml_filename=f"config_{args.task}.yaml", + specaugment_policy="ld", + prepend_tgt_lang_tag=(args.joint), + ) + # Make symbolic links to manifests + for lang in mTEDx.LANGPAIRS: + for split in mTEDx.SPLITS: + src_path = cur_root / f"{lang}" / f"{split}_{args.task}.tsv" + desc_path = cur_root / f"{split}_{lang}_{args.task}.tsv" + if not desc_path.is_symlink(): + os.symlink(src_path, desc_path) + + +def main(): + parser = argparse.ArgumentParser() + 
parser.add_argument("--data-root", "-d", required=True, type=str) + parser.add_argument( + "--vocab-type", + default="unigram", + required=True, + type=str, + choices=["bpe", "unigram", "char"], + ), + parser.add_argument("--vocab-size", default=8000, type=int) + parser.add_argument("--task", type=str, choices=["asr", "st"]) + parser.add_argument("--joint", action="store_true", help="") + args = parser.parse_args() + + if args.joint: + process_joint(args) + else: + process(args) + + +if __name__ == "__main__": + main() diff --git a/SpeechT5/fairseq/examples/speech_to_text/prep_mustc_data.py b/SpeechT5/fairseq/examples/speech_to_text/prep_mustc_data.py new file mode 100644 index 0000000000000000000000000000000000000000..0ee204e651a4e5a305cd27274101740a2a35d7bc --- /dev/null +++ b/SpeechT5/fairseq/examples/speech_to_text/prep_mustc_data.py @@ -0,0 +1,264 @@ +#!/usr/bin/env python3 +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +import argparse +import logging +import os +from pathlib import Path +import shutil +from itertools import groupby +from tempfile import NamedTemporaryFile +from typing import Tuple + +import numpy as np +import pandas as pd +import soundfile as sf +from examples.speech_to_text.data_utils import ( + create_zip, + extract_fbank_features, + filter_manifest_df, + gen_config_yaml, + gen_vocab, + get_zip_manifest, + load_df_from_tsv, + save_df_to_tsv, + cal_gcmvn_stats, +) +import torch +from torch.utils.data import Dataset +from tqdm import tqdm + +from fairseq.data.audio.audio_utils import get_waveform + + +log = logging.getLogger(__name__) + + +MANIFEST_COLUMNS = ["id", "audio", "n_frames", "tgt_text", "speaker"] + + +class MUSTC(Dataset): + """ + Create a Dataset for MuST-C. 
Each item is a tuple of the form: + waveform, sample_rate, source utterance, target utterance, speaker_id, + utterance_id + """ + + SPLITS = ["train", "dev", "tst-COMMON", "tst-HE"] + LANGUAGES = ["de", "es", "fr", "it", "nl", "pt", "ro", "ru"] + + def __init__(self, root: str, lang: str, split: str) -> None: + assert split in self.SPLITS and lang in self.LANGUAGES + _root = Path(root) / f"en-{lang}" / "data" / split + wav_root, txt_root = _root / "wav", _root / "txt" + assert _root.is_dir() and wav_root.is_dir() and txt_root.is_dir() + # Load audio segments + try: + import yaml + except ImportError: + print("Please install PyYAML to load the MuST-C YAML files") + with open(txt_root / f"{split}.yaml") as f: + segments = yaml.load(f, Loader=yaml.BaseLoader) + # Load source and target utterances + for _lang in ["en", lang]: + with open(txt_root / f"{split}.{_lang}") as f: + utterances = [r.strip() for r in f] + assert len(segments) == len(utterances) + for i, u in enumerate(utterances): + segments[i][_lang] = u + # Gather info + self.data = [] + for wav_filename, _seg_group in groupby(segments, lambda x: x["wav"]): + wav_path = wav_root / wav_filename + sample_rate = sf.info(wav_path.as_posix()).samplerate + seg_group = sorted(_seg_group, key=lambda x: x["offset"]) + for i, segment in enumerate(seg_group): + offset = int(float(segment["offset"]) * sample_rate) + n_frames = int(float(segment["duration"]) * sample_rate) + _id = f"{wav_path.stem}_{i}" + self.data.append( + ( + wav_path.as_posix(), + offset, + n_frames, + sample_rate, + segment["en"], + segment[lang], + segment["speaker_id"], + _id, + ) + ) + + def __getitem__(self, n: int) -> Tuple[torch.Tensor, int, str, str, str, str]: + wav_path, offset, n_frames, sr, src_utt, tgt_utt, spk_id, utt_id = self.data[n] + waveform, _ = get_waveform(wav_path, frames=n_frames, start=offset) + waveform = torch.from_numpy(waveform) + return waveform, sr, src_utt, tgt_utt, spk_id, utt_id + + def __len__(self) -> int: + return len(self.data) + + +def process(args): + root = Path(args.data_root).absolute() + for lang in MUSTC.LANGUAGES: + cur_root = root / f"en-{lang}" + if not cur_root.is_dir(): + print(f"{cur_root.as_posix()} does not exist. 
Skipped.") + continue + # Extract features + feature_root = cur_root / "fbank80" + feature_root.mkdir(exist_ok=True) + for split in MUSTC.SPLITS: + print(f"Fetching split {split}...") + dataset = MUSTC(root.as_posix(), lang, split) + print("Extracting log mel filter bank features...") + if split == 'train' and args.cmvn_type == "global": + print("And estimating cepstral mean and variance stats...") + gcmvn_feature_list = [] + + for waveform, sample_rate, _, _, _, utt_id in tqdm(dataset): + features = extract_fbank_features(waveform, sample_rate) + + np.save( + (feature_root / f"{utt_id}.npy").as_posix(), + features + ) + + if split == 'train' and args.cmvn_type == "global": + if len(gcmvn_feature_list) < args.gcmvn_max_num: + gcmvn_feature_list.append(features) + + if split == 'train' and args.cmvn_type == "global": + # Estimate and save cmv + stats = cal_gcmvn_stats(gcmvn_feature_list) + with open(cur_root / "gcmvn.npz", "wb") as f: + np.savez(f, mean=stats["mean"], std=stats["std"]) + + # Pack features into ZIP + zip_path = cur_root / "fbank80.zip" + print("ZIPing features...") + create_zip(feature_root, zip_path) + print("Fetching ZIP manifest...") + zip_manifest = get_zip_manifest(zip_path) + # Generate TSV manifest + print("Generating manifest...") + train_text = [] + for split in MUSTC.SPLITS: + is_train_split = split.startswith("train") + manifest = {c: [] for c in MANIFEST_COLUMNS} + dataset = MUSTC(args.data_root, lang, split) + for wav, sr, src_utt, tgt_utt, speaker_id, utt_id in tqdm(dataset): + manifest["id"].append(utt_id) + manifest["audio"].append(zip_manifest[utt_id]) + duration_ms = int(wav.size(1) / sr * 1000) + manifest["n_frames"].append(int(1 + (duration_ms - 25) / 10)) + manifest["tgt_text"].append(src_utt if args.task == "asr" else tgt_utt) + manifest["speaker"].append(speaker_id) + if is_train_split: + train_text.extend(manifest["tgt_text"]) + df = pd.DataFrame.from_dict(manifest) + df = filter_manifest_df(df, is_train_split=is_train_split) + save_df_to_tsv(df, cur_root / f"{split}_{args.task}.tsv") + # Generate vocab + v_size_str = "" if args.vocab_type == "char" else str(args.vocab_size) + spm_filename_prefix = f"spm_{args.vocab_type}{v_size_str}_{args.task}" + with NamedTemporaryFile(mode="w") as f: + for t in train_text: + f.write(t + "\n") + gen_vocab( + Path(f.name), + cur_root / spm_filename_prefix, + args.vocab_type, + args.vocab_size, + ) + # Generate config YAML + gen_config_yaml( + cur_root, + spm_filename_prefix + ".model", + yaml_filename=f"config_{args.task}.yaml", + specaugment_policy="lb", + cmvn_type=args.cmvn_type, + gcmvn_path=( + cur_root / "gcmvn.npz" if args.cmvn_type == "global" + else None + ), + ) + # Clean up + shutil.rmtree(feature_root) + + +def process_joint(args): + cur_root = Path(args.data_root) + assert all((cur_root / f"en-{lang}").is_dir() for lang in MUSTC.LANGUAGES), \ + "do not have downloaded data available for all 8 languages" + # Generate vocab + vocab_size_str = "" if args.vocab_type == "char" else str(args.vocab_size) + spm_filename_prefix = f"spm_{args.vocab_type}{vocab_size_str}_{args.task}" + with NamedTemporaryFile(mode="w") as f: + for lang in MUSTC.LANGUAGES: + tsv_path = cur_root / f"en-{lang}" / f"train_{args.task}.tsv" + df = load_df_from_tsv(tsv_path) + for t in df["tgt_text"]: + f.write(t + "\n") + special_symbols = None + if args.task == 'st': + special_symbols = [f'<lang:{lang}>' for lang in MUSTC.LANGUAGES] + gen_vocab( + Path(f.name), + cur_root / spm_filename_prefix, + args.vocab_type, + args.vocab_size, + 
special_symbols=special_symbols + ) + # Generate config YAML + gen_config_yaml( + cur_root, + spm_filename_prefix + ".model", + yaml_filename=f"config_{args.task}.yaml", + specaugment_policy="ld", + prepend_tgt_lang_tag=(args.task == "st"), + ) + # Make symbolic links to manifests + for lang in MUSTC.LANGUAGES: + for split in MUSTC.SPLITS: + src_path = cur_root / f"en-{lang}" / f"{split}_{args.task}.tsv" + desc_path = cur_root / f"{split}_{lang}_{args.task}.tsv" + if not desc_path.is_symlink(): + os.symlink(src_path, desc_path) + + +def main(): + parser = argparse.ArgumentParser() + parser.add_argument("--data-root", "-d", required=True, type=str) + parser.add_argument( + "--vocab-type", + default="unigram", + required=True, + type=str, + choices=["bpe", "unigram", "char"], + ), + parser.add_argument("--vocab-size", default=8000, type=int) + parser.add_argument("--task", type=str, choices=["asr", "st"]) + parser.add_argument("--joint", action="store_true", help="") + parser.add_argument("--cmvn-type", default="utterance", + choices=["global", "utterance"], + help="The type of cepstral mean and variance normalization") + parser.add_argument("--gcmvn-max-num", default=150000, type=int, + help=( + "Maximum number of sentences to use to estimate" + "global mean and variance" + )) + args = parser.parse_args() + + if args.joint: + process_joint(args) + else: + process(args) + + +if __name__ == "__main__": + main() diff --git a/SpeechT5/fairseq/examples/speech_to_text/seg_mustc_data.py b/SpeechT5/fairseq/examples/speech_to_text/seg_mustc_data.py new file mode 100644 index 0000000000000000000000000000000000000000..1ee665d6399729afe17d790d872eff34de124900 --- /dev/null +++ b/SpeechT5/fairseq/examples/speech_to_text/seg_mustc_data.py @@ -0,0 +1,54 @@ +#!/usr/bin/env python3 +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +import argparse +import logging +from pathlib import Path +import soundfile as sf +from examples.speech_to_text.prep_mustc_data import ( + MUSTC +) + +from tqdm import tqdm + +log = logging.getLogger(__name__) + + +def main(args): + root = Path(args.data_root).absolute() + lang = args.lang + split = args.split + + cur_root = root / f"en-{lang}" + assert cur_root.is_dir(), ( + f"{cur_root.as_posix()} does not exist. Skipped." 
+ ) + + dataset = MUSTC(root.as_posix(), lang, split) + output = Path(args.output).absolute() + output.mkdir(exist_ok=True) + f_text = open(output / f"{split}.{lang}", "w") + f_wav_list = open(output / f"{split}.wav_list", "w") + for waveform, sample_rate, _, text, _, utt_id in tqdm(dataset): + sf.write( + output / f"{utt_id}.wav", + waveform.squeeze(0).numpy(), + samplerate=int(sample_rate) + ) + f_text.write(text + "\n") + f_wav_list.write(str(output / f"{utt_id}.wav") + "\n") + + +if __name__ == "__main__": + parser = argparse.ArgumentParser() + parser.add_argument("--data-root", "-d", required=True, type=str) + parser.add_argument("--task", required=True, type=str, choices=["asr", "st"]) + parser.add_argument("--lang", required=True, type=str) + parser.add_argument("--output", required=True, type=str) + parser.add_argument("--split", required=True, choices=MUSTC.SPLITS) + args = parser.parse_args() + + main(args) diff --git a/SpeechT5/fairseq/examples/speech_to_text/simultaneous_translation/agents/fairseq_simul_st_agent.py b/SpeechT5/fairseq/examples/speech_to_text/simultaneous_translation/agents/fairseq_simul_st_agent.py new file mode 100644 index 0000000000000000000000000000000000000000..61617a1739ce196abba1e9a6f9ad9e9f4b37b9c1 --- /dev/null +++ b/SpeechT5/fairseq/examples/speech_to_text/simultaneous_translation/agents/fairseq_simul_st_agent.py @@ -0,0 +1,363 @@ +import math +import os +import json +import numpy as np +import torch +import torchaudio.compliance.kaldi as kaldi +import yaml +from fairseq import checkpoint_utils, tasks +from fairseq.file_io import PathManager + +try: + from simuleval import READ_ACTION, WRITE_ACTION, DEFAULT_EOS + from simuleval.agents import SpeechAgent + from simuleval.states import ListEntry, SpeechStates +except ImportError: + print("Please install simuleval 'pip install simuleval'") + +SHIFT_SIZE = 10 +WINDOW_SIZE = 25 +SAMPLE_RATE = 16000 +FEATURE_DIM = 80 +BOW_PREFIX = "\u2581" + + +class OnlineFeatureExtractor: + """ + Extract speech feature on the fly. 
+ """ + + def __init__(self, args): + self.shift_size = args.shift_size + self.window_size = args.window_size + assert self.window_size >= self.shift_size + + self.sample_rate = args.sample_rate + self.feature_dim = args.feature_dim + self.num_samples_per_shift = int(self.shift_size * self.sample_rate / 1000) + self.num_samples_per_window = int(self.window_size * self.sample_rate / 1000) + self.len_ms_to_samples = lambda x: x * self.sample_rate / 1000 + self.previous_residual_samples = [] + self.global_cmvn = args.global_cmvn + + def clear_cache(self): + self.previous_residual_samples = [] + + def __call__(self, new_samples): + samples = self.previous_residual_samples + new_samples + if len(samples) < self.num_samples_per_window: + self.previous_residual_samples = samples + return + + # num_frames is the number of frames from the new segment + num_frames = math.floor( + (len(samples) - self.len_ms_to_samples(self.window_size - self.shift_size)) + / self.num_samples_per_shift + ) + + # the number of frames used for feature extraction + # including some part of thte previous segment + effective_num_samples = int( + num_frames * self.len_ms_to_samples(self.shift_size) + + self.len_ms_to_samples(self.window_size - self.shift_size) + ) + + input_samples = samples[:effective_num_samples] + self.previous_residual_samples = samples[ + num_frames * self.num_samples_per_shift: + ] + + torch.manual_seed(1) + output = kaldi.fbank( + torch.FloatTensor(input_samples).unsqueeze(0), + num_mel_bins=self.feature_dim, + frame_length=self.window_size, + frame_shift=self.shift_size, + ).numpy() + + output = self.transform(output) + + return torch.from_numpy(output) + + def transform(self, input): + if self.global_cmvn is None: + return input + + mean = self.global_cmvn["mean"] + std = self.global_cmvn["std"] + + x = np.subtract(input, mean) + x = np.divide(x, std) + return x + + +class TensorListEntry(ListEntry): + """ + Data structure to store a list of tensor. 
+ """ + + def append(self, value): + + if len(self.value) == 0: + self.value = value + return + + self.value = torch.cat([self.value] + [value], dim=0) + + def info(self): + return { + "type": str(self.new_value_type), + "length": self.__len__(), + "value": "" if type(self.value) is list else self.value.size(), + } + + +class FairseqSimulSTAgent(SpeechAgent): + + speech_segment_size = 40 # in ms, 4 pooling ratio * 10 ms step size + + def __init__(self, args): + super().__init__(args) + + self.eos = DEFAULT_EOS + + self.gpu = getattr(args, "gpu", False) + + self.args = args + + self.load_model_vocab(args) + + if getattr( + self.model.decoder.layers[0].encoder_attn, + 'pre_decision_ratio', + None + ) is not None: + self.speech_segment_size *= ( + self.model.decoder.layers[0].encoder_attn.pre_decision_ratio + ) + + args.global_cmvn = None + if args.config: + with open(os.path.join(args.data_bin, args.config), "r") as f: + config = yaml.load(f, Loader=yaml.BaseLoader) + + if "global_cmvn" in config: + args.global_cmvn = np.load(config["global_cmvn"]["stats_npz_path"]) + + if args.global_stats: + with PathManager.open(args.global_stats, "r") as f: + global_cmvn = json.loads(f.read()) + self.global_cmvn = {"mean": global_cmvn["mean"], "std": global_cmvn["stddev"]} + + self.feature_extractor = OnlineFeatureExtractor(args) + + self.max_len = args.max_len + + self.force_finish = args.force_finish + + torch.set_grad_enabled(False) + + def build_states(self, args, client, sentence_id): + # Initialize states here, for example add customized entry to states + # This function will be called at beginning of every new sentence + states = SpeechStates(args, client, sentence_id, self) + self.initialize_states(states) + return states + + def to_device(self, tensor): + if self.gpu: + return tensor.cuda() + else: + return tensor.cpu() + + @staticmethod + def add_args(parser): + # fmt: off + parser.add_argument('--model-path', type=str, required=True, + help='path to your pretrained model.') + parser.add_argument("--data-bin", type=str, required=True, + help="Path of data binary") + parser.add_argument("--config", type=str, default=None, + help="Path to config yaml file") + parser.add_argument("--global-stats", type=str, default=None, + help="Path to json file containing cmvn stats") + parser.add_argument("--tgt-splitter-type", type=str, default="SentencePiece", + help="Subword splitter type for target text") + parser.add_argument("--tgt-splitter-path", type=str, default=None, + help="Subword splitter model path for target text") + parser.add_argument("--user-dir", type=str, default="examples/simultaneous_translation", + help="User directory for simultaneous translation") + parser.add_argument("--max-len", type=int, default=200, + help="Max length of translation") + parser.add_argument("--force-finish", default=False, action="store_true", + help="Force the model to finish the hypothsis if the source is not finished") + parser.add_argument("--shift-size", type=int, default=SHIFT_SIZE, + help="Shift size of feature extraction window.") + parser.add_argument("--window-size", type=int, default=WINDOW_SIZE, + help="Window size of feature extraction window.") + parser.add_argument("--sample-rate", type=int, default=SAMPLE_RATE, + help="Sample rate") + parser.add_argument("--feature-dim", type=int, default=FEATURE_DIM, + help="Acoustic feature dimension.") + + # fmt: on + return parser + + def load_model_vocab(self, args): + + filename = args.model_path + if not os.path.exists(filename): + raise IOError("Model file 
not found: {}".format(filename)) + + state = checkpoint_utils.load_checkpoint_to_cpu(filename) + + task_args = state["cfg"]["task"] + task_args.data = args.data_bin + + if args.config is not None: + task_args.config_yaml = args.config + + task = tasks.setup_task(task_args) + + # build model for ensemble + state["cfg"]["model"].load_pretrained_encoder_from = None + state["cfg"]["model"].load_pretrained_decoder_from = None + self.model = task.build_model(state["cfg"]["model"]) + self.model.load_state_dict(state["model"], strict=True) + self.model.eval() + self.model.share_memory() + + if self.gpu: + self.model.cuda() + + # Set dictionary + self.dict = {} + self.dict["tgt"] = task.target_dictionary + + def initialize_states(self, states): + self.feature_extractor.clear_cache() + states.units.source = TensorListEntry() + states.units.target = ListEntry() + states.incremental_states = dict() + + def segment_to_units(self, segment, states): + # Convert speech samples to features + features = self.feature_extractor(segment) + if features is not None: + return [features] + else: + return [] + + def units_to_segment(self, units, states): + # Merge sub word to full word. + if self.model.decoder.dictionary.eos() == units[0]: + return DEFAULT_EOS + + segment = [] + if None in units.value: + units.value.remove(None) + + for index in units: + if index is None: + units.pop() + token = self.model.decoder.dictionary.string([index]) + if token.startswith(BOW_PREFIX): + if len(segment) == 0: + segment += [token.replace(BOW_PREFIX, "")] + else: + for j in range(len(segment)): + units.pop() + + string_to_return = ["".join(segment)] + + if self.model.decoder.dictionary.eos() == units[0]: + string_to_return += [DEFAULT_EOS] + + return string_to_return + else: + segment += [token.replace(BOW_PREFIX, "")] + + if ( + len(units) > 0 + and self.model.decoder.dictionary.eos() == units[-1] + or len(states.units.target) > self.max_len + ): + tokens = [self.model.decoder.dictionary.string([unit]) for unit in units] + return ["".join(tokens).replace(BOW_PREFIX, "")] + [DEFAULT_EOS] + + return None + + def update_model_encoder(self, states): + if len(states.units.source) == 0: + return + src_indices = self.to_device( + states.units.source.value.unsqueeze(0) + ) + src_lengths = self.to_device( + torch.LongTensor([states.units.source.value.size(0)]) + ) + + states.encoder_states = self.model.encoder(src_indices, src_lengths) + torch.cuda.empty_cache() + + def update_states_read(self, states): + # Happens after a read action. 
+ self.update_model_encoder(states) + + def policy(self, states): + if not getattr(states, "encoder_states", None): + return READ_ACTION + + tgt_indices = self.to_device( + torch.LongTensor( + [self.model.decoder.dictionary.eos()] + + [x for x in states.units.target.value if x is not None] + ).unsqueeze(0) + ) + + states.incremental_states["steps"] = { + "src": states.encoder_states["encoder_out"][0].size(0), + "tgt": 1 + len(states.units.target), + } + + states.incremental_states["online"] = {"only": torch.tensor(not states.finish_read())} + + x, outputs = self.model.decoder.forward( + prev_output_tokens=tgt_indices, + encoder_out=states.encoder_states, + incremental_state=states.incremental_states, + ) + + states.decoder_out = x + + states.decoder_out_extra = outputs + + torch.cuda.empty_cache() + + if outputs.action == 0: + return READ_ACTION + else: + return WRITE_ACTION + + def predict(self, states): + decoder_states = states.decoder_out + + lprobs = self.model.get_normalized_probs( + [decoder_states[:, -1:]], log_probs=True + ) + + index = lprobs.argmax(dim=-1) + + index = index[0, 0].item() + + if ( + self.force_finish + and index == self.model.decoder.dictionary.eos() + and not states.finish_read() + ): + # If we want to force finish the translation + # (don't stop before finish reading), return a None + # self.model.decoder.clear_cache(states.incremental_states) + index = None + + return index diff --git a/SpeechT5/fairseq/examples/stories/README.md b/SpeechT5/fairseq/examples/stories/README.md new file mode 100644 index 0000000000000000000000000000000000000000..588941eddc5f0280f5254affd40ef49de874c885 --- /dev/null +++ b/SpeechT5/fairseq/examples/stories/README.md @@ -0,0 +1,66 @@ +# Hierarchical Neural Story Generation (Fan et al., 2018) + +The following commands provide an example of pre-processing data, training a model, and generating text for story generation with the WritingPrompts dataset. + +## Pre-trained models + +Description | Dataset | Model | Test set(s) +---|---|---|--- +Stories with Convolutional Model <br> ([Fan et al., 2018](https://arxiv.org/abs/1805.04833)) | [WritingPrompts](https://dl.fbaipublicfiles.com/fairseq/data/writingPrompts.tar.gz) | [download (.tar.bz2)](https://dl.fbaipublicfiles.com/fairseq/models/stories_checkpoint.tar.bz2) | [download (.tar.bz2)](https://dl.fbaipublicfiles.com/fairseq/data/stories_test.tar.bz2) + +We provide sample stories generated by the [convolutional seq2seq model](https://dl.fbaipublicfiles.com/fairseq/data/seq2seq_stories.txt) and [fusion model](https://dl.fbaipublicfiles.com/fairseq/data/fusion_stories.txt) from [Fan et al., 2018](https://arxiv.org/abs/1805.04833). The corresponding prompts for the fusion model can be found [here](https://dl.fbaipublicfiles.com/fairseq/data/fusion_prompts.txt). Note that there are unk in the file, as we modeled a small full vocabulary (no BPE or pre-training). We did not use these unk prompts for human evaluation. + +## Dataset + +The dataset can be downloaded like this: + +```bash +cd examples/stories +curl https://dl.fbaipublicfiles.com/fairseq/data/writingPrompts.tar.gz | tar xvzf - +``` + +and contains a train, test, and valid split. The dataset is described here: https://arxiv.org/abs/1805.04833. We model only the first 1000 words of each story, including one newLine token. + +## Example usage + +First we will preprocess the dataset. Note that the dataset release is the full data, but the paper models the first 1000 words of each story. 
Here is example code that trims the dataset to the first 1000 words of each story: +```python +data = ["train", "test", "valid"] +for name in data: + with open(name + ".wp_target") as f: + stories = f.readlines() + stories = [" ".join(i.split()[0:1000]) for i in stories] + with open(name + ".wp_target", "w") as o: + for line in stories: + o.write(line.strip() + "\n") +``` + +Once we've trimmed the data we can binarize it and train our model: +```bash +# Binarize the dataset: +export TEXT=examples/stories/writingPrompts +fairseq-preprocess --source-lang wp_source --target-lang wp_target \ + --trainpref $TEXT/train --validpref $TEXT/valid --testpref $TEXT/test \ + --destdir data-bin/writingPrompts --padding-factor 1 --thresholdtgt 10 --thresholdsrc 10 + +# Train the model: +fairseq-train data-bin/writingPrompts -a fconv_self_att_wp --lr 0.25 --optimizer nag --clip-norm 0.1 --max-tokens 1500 --lr-scheduler reduce_lr_on_plateau --decoder-attention True --encoder-attention False --criterion label_smoothed_cross_entropy --weight-decay .0000001 --label-smoothing 0 --source-lang wp_source --target-lang wp_target --gated-attention True --self-attention True --project-input True --pretrained False + +# Train a fusion model: +# add the arguments: --pretrained True --pretrained-checkpoint path/to/checkpoint + +# Generate: +# Note: to load the pretrained model at generation time, you need to pass in a model-override argument to communicate to the fusion model at generation time where you have placed the pretrained checkpoint. By default, it will load the exact path of the fusion model's pretrained model from training time. You should use model-override if you have moved the pretrained model (or are using our provided models). If you are generating from a non-fusion model, the model-override argument is not necessary. + +fairseq-generate data-bin/writingPrompts --path /path/to/trained/model/checkpoint_best.pt --batch-size 32 --beam 1 --sampling --sampling-topk 10 --temperature 0.8 --nbest 1 --model-overrides "{'pretrained_checkpoint':'/path/to/pretrained/model/checkpoint'}" +``` + +## Citation +```bibtex +@inproceedings{fan2018hierarchical, + title = {Hierarchical Neural Story Generation}, + author = {Fan, Angela and Lewis, Mike and Dauphin, Yann}, + booktitle = {Conference of the Association for Computational Linguistics (ACL)}, + year = 2018, +} +``` diff --git a/SpeechT5/fairseq/examples/translation/README.md b/SpeechT5/fairseq/examples/translation/README.md new file mode 100644 index 0000000000000000000000000000000000000000..2941f5eb8482dab61dca5eca27a71abd7ee5bf5c --- /dev/null +++ b/SpeechT5/fairseq/examples/translation/README.md @@ -0,0 +1,301 @@ +# Neural Machine Translation + +This README contains instructions for [using pretrained translation models](#example-usage-torchhub) +as well as [training new models](#training-a-new-model). 
+ +## Pre-trained models + +Model | Description | Dataset | Download +---|---|---|--- +`conv.wmt14.en-fr` | Convolutional <br> ([Gehring et al., 2017](https://arxiv.org/abs/1705.03122)) | [WMT14 English-French](http://statmt.org/wmt14/translation-task.html#Download) | model: <br> [download (.tar.bz2)](https://dl.fbaipublicfiles.com/fairseq/models/wmt14.v2.en-fr.fconv-py.tar.bz2) <br> newstest2014: <br> [download (.tar.bz2)](https://dl.fbaipublicfiles.com/fairseq/data/wmt14.v2.en-fr.newstest2014.tar.bz2) <br> newstest2012/2013: <br> [download (.tar.bz2)](https://dl.fbaipublicfiles.com/fairseq/data/wmt14.v2.en-fr.ntst1213.tar.bz2) +`conv.wmt14.en-de` | Convolutional <br> ([Gehring et al., 2017](https://arxiv.org/abs/1705.03122)) | [WMT14 English-German](http://statmt.org/wmt14/translation-task.html#Download) | model: <br> [download (.tar.bz2)](https://dl.fbaipublicfiles.com/fairseq/models/wmt14.en-de.fconv-py.tar.bz2) <br> newstest2014: <br> [download (.tar.bz2)](https://dl.fbaipublicfiles.com/fairseq/data/wmt14.en-de.newstest2014.tar.bz2) +`conv.wmt17.en-de` | Convolutional <br> ([Gehring et al., 2017](https://arxiv.org/abs/1705.03122)) | [WMT17 English-German](http://statmt.org/wmt17/translation-task.html#Download) | model: <br> [download (.tar.bz2)](https://dl.fbaipublicfiles.com/fairseq/models/wmt17.v2.en-de.fconv-py.tar.bz2) <br> newstest2014: <br> [download (.tar.bz2)](https://dl.fbaipublicfiles.com/fairseq/data/wmt17.v2.en-de.newstest2014.tar.bz2) +`transformer.wmt14.en-fr` | Transformer <br> ([Ott et al., 2018](https://arxiv.org/abs/1806.00187)) | [WMT14 English-French](http://statmt.org/wmt14/translation-task.html#Download) | model: <br> [download (.tar.bz2)](https://dl.fbaipublicfiles.com/fairseq/models/wmt14.en-fr.joined-dict.transformer.tar.bz2) <br> newstest2014: <br> [download (.tar.bz2)](https://dl.fbaipublicfiles.com/fairseq/data/wmt14.en-fr.joined-dict.newstest2014.tar.bz2) +`transformer.wmt16.en-de` | Transformer <br> ([Ott et al., 2018](https://arxiv.org/abs/1806.00187)) | [WMT16 English-German](https://drive.google.com/uc?export=download&id=0B_bZck-ksdkpM25jRUN2X2UxMm8) | model: <br> [download (.tar.bz2)](https://dl.fbaipublicfiles.com/fairseq/models/wmt16.en-de.joined-dict.transformer.tar.bz2) <br> newstest2014: <br> [download (.tar.bz2)](https://dl.fbaipublicfiles.com/fairseq/data/wmt16.en-de.joined-dict.newstest2014.tar.bz2) +`transformer.wmt18.en-de` | Transformer <br> ([Edunov et al., 2018](https://arxiv.org/abs/1808.09381)) <br> WMT'18 winner | [WMT'18 English-German](http://www.statmt.org/wmt18/translation-task.html) | model: <br> [download (.tar.gz)](https://dl.fbaipublicfiles.com/fairseq/models/wmt18.en-de.ensemble.tar.gz) <br> See NOTE in the archive +`transformer.wmt19.en-de` | Transformer <br> ([Ng et al., 2019](https://arxiv.org/abs/1907.06616)) <br> WMT'19 winner | [WMT'19 English-German](http://www.statmt.org/wmt19/translation-task.html) | model: <br> [download (.tar.gz)](https://dl.fbaipublicfiles.com/fairseq/models/wmt19.en-de.joined-dict.ensemble.tar.gz) +`transformer.wmt19.de-en` | Transformer <br> ([Ng et al., 2019](https://arxiv.org/abs/1907.06616)) <br> WMT'19 winner | [WMT'19 German-English](http://www.statmt.org/wmt19/translation-task.html) | model: <br> [download (.tar.gz)](https://dl.fbaipublicfiles.com/fairseq/models/wmt19.de-en.joined-dict.ensemble.tar.gz) +`transformer.wmt19.en-ru` | Transformer <br> ([Ng et al., 2019](https://arxiv.org/abs/1907.06616)) <br> WMT'19 winner | [WMT'19 
English-Russian](http://www.statmt.org/wmt19/translation-task.html) | model: <br> [download (.tar.gz)](https://dl.fbaipublicfiles.com/fairseq/models/wmt19.en-ru.ensemble.tar.gz) +`transformer.wmt19.ru-en` | Transformer <br> ([Ng et al., 2019](https://arxiv.org/abs/1907.06616)) <br> WMT'19 winner | [WMT'19 Russian-English](http://www.statmt.org/wmt19/translation-task.html) | model: <br> [download (.tar.gz)](https://dl.fbaipublicfiles.com/fairseq/models/wmt19.ru-en.ensemble.tar.gz) + +## Example usage (torch.hub) + +We require a few additional Python dependencies for preprocessing: +```bash +pip install fastBPE sacremoses subword_nmt +``` + +Interactive translation via PyTorch Hub: +```python +import torch + +# List available models +torch.hub.list('pytorch/fairseq') # [..., 'transformer.wmt16.en-de', ... ] + +# Load a transformer trained on WMT'16 En-De +# Note: WMT'19 models use fastBPE instead of subword_nmt, see instructions below +en2de = torch.hub.load('pytorch/fairseq', 'transformer.wmt16.en-de', + tokenizer='moses', bpe='subword_nmt') +en2de.eval() # disable dropout + +# The underlying model is available under the *models* attribute +assert isinstance(en2de.models[0], fairseq.models.transformer.TransformerModel) + +# Move model to GPU for faster translation +en2de.cuda() + +# Translate a sentence +en2de.translate('Hello world!') +# 'Hallo Welt!' + +# Batched translation +en2de.translate(['Hello world!', 'The cat sat on the mat.']) +# ['Hallo Welt!', 'Die Katze saß auf der Matte.'] +``` + +Loading custom models: +```python +from fairseq.models.transformer import TransformerModel +zh2en = TransformerModel.from_pretrained( + '/path/to/checkpoints', + checkpoint_file='checkpoint_best.pt', + data_name_or_path='data-bin/wmt17_zh_en_full', + bpe='subword_nmt', + bpe_codes='data-bin/wmt17_zh_en_full/zh.code' +) +zh2en.translate('你好 世界') +# 'Hello World' +``` + +If you are using a `transformer.wmt19` models, you will need to set the `bpe` +argument to `'fastbpe'` and (optionally) load the 4-model ensemble: +```python +en2de = torch.hub.load('pytorch/fairseq', 'transformer.wmt19.en-de', + checkpoint_file='model1.pt:model2.pt:model3.pt:model4.pt', + tokenizer='moses', bpe='fastbpe') +en2de.eval() # disable dropout +``` + +## Example usage (CLI tools) + +Generation with the binarized test sets can be run in batch mode as follows, e.g. for WMT 2014 English-French on a GTX-1080ti: +```bash +mkdir -p data-bin +curl https://dl.fbaipublicfiles.com/fairseq/models/wmt14.v2.en-fr.fconv-py.tar.bz2 | tar xvjf - -C data-bin +curl https://dl.fbaipublicfiles.com/fairseq/data/wmt14.v2.en-fr.newstest2014.tar.bz2 | tar xvjf - -C data-bin +fairseq-generate data-bin/wmt14.en-fr.newstest2014 \ + --path data-bin/wmt14.en-fr.fconv-py/model.pt \ + --beam 5 --batch-size 128 --remove-bpe | tee /tmp/gen.out +# ... 
+# | Translated 3003 sentences (96311 tokens) in 166.0s (580.04 tokens/s) +# | Generate test with beam=5: BLEU4 = 40.83, 67.5/46.9/34.4/25.5 (BP=1.000, ratio=1.006, syslen=83262, reflen=82787) + +# Compute BLEU score +grep ^H /tmp/gen.out | cut -f3- > /tmp/gen.out.sys +grep ^T /tmp/gen.out | cut -f2- > /tmp/gen.out.ref +fairseq-score --sys /tmp/gen.out.sys --ref /tmp/gen.out.ref +# BLEU4 = 40.83, 67.5/46.9/34.4/25.5 (BP=1.000, ratio=1.006, syslen=83262, reflen=82787) +``` + +## Training a new model + +### IWSLT'14 German to English (Transformer) + +The following instructions can be used to train a Transformer model on the [IWSLT'14 German to English dataset](http://workshop2014.iwslt.org/downloads/proceeding.pdf). + +First download and preprocess the data: +```bash +# Download and prepare the data +cd examples/translation/ +bash prepare-iwslt14.sh +cd ../.. + +# Preprocess/binarize the data +TEXT=examples/translation/iwslt14.tokenized.de-en +fairseq-preprocess --source-lang de --target-lang en \ + --trainpref $TEXT/train --validpref $TEXT/valid --testpref $TEXT/test \ + --destdir data-bin/iwslt14.tokenized.de-en \ + --workers 20 +``` + +Next we'll train a Transformer translation model over this data: +```bash +CUDA_VISIBLE_DEVICES=0 fairseq-train \ + data-bin/iwslt14.tokenized.de-en \ + --arch transformer_iwslt_de_en --share-decoder-input-output-embed \ + --optimizer adam --adam-betas '(0.9, 0.98)' --clip-norm 0.0 \ + --lr 5e-4 --lr-scheduler inverse_sqrt --warmup-updates 4000 \ + --dropout 0.3 --weight-decay 0.0001 \ + --criterion label_smoothed_cross_entropy --label-smoothing 0.1 \ + --max-tokens 4096 \ + --eval-bleu \ + --eval-bleu-args '{"beam": 5, "max_len_a": 1.2, "max_len_b": 10}' \ + --eval-bleu-detok moses \ + --eval-bleu-remove-bpe \ + --eval-bleu-print-samples \ + --best-checkpoint-metric bleu --maximize-best-checkpoint-metric +``` + +Finally we can evaluate our trained model: +```bash +fairseq-generate data-bin/iwslt14.tokenized.de-en \ + --path checkpoints/checkpoint_best.pt \ + --batch-size 128 --beam 5 --remove-bpe +``` + +### WMT'14 English to German (Convolutional) + +The following instructions can be used to train a Convolutional translation model on the WMT English to German dataset. +See the [Scaling NMT README](../scaling_nmt/README.md) for instructions to train a Transformer translation model on this data. + +The WMT English to German dataset can be preprocessed using the `prepare-wmt14en2de.sh` script. +By default it will produce a dataset that was modeled after [Attention Is All You Need (Vaswani et al., 2017)](https://arxiv.org/abs/1706.03762), but with additional news-commentary-v12 data from WMT'17. + +To use only data available in WMT'14 or to replicate results obtained in the original [Convolutional Sequence to Sequence Learning (Gehring et al., 2017)](https://arxiv.org/abs/1705.03122) paper, please use the `--icml17` option. + +```bash +# Download and prepare the data +cd examples/translation/ +# WMT'17 data: +bash prepare-wmt14en2de.sh +# or to use WMT'14 data: +# bash prepare-wmt14en2de.sh --icml17 +cd ../.. 
+ +# Binarize the dataset +TEXT=examples/translation/wmt17_en_de +fairseq-preprocess \ + --source-lang en --target-lang de \ + --trainpref $TEXT/train --validpref $TEXT/valid --testpref $TEXT/test \ + --destdir data-bin/wmt17_en_de --thresholdtgt 0 --thresholdsrc 0 \ + --workers 20 + +# Train the model +mkdir -p checkpoints/fconv_wmt_en_de +fairseq-train \ + data-bin/wmt17_en_de \ + --arch fconv_wmt_en_de \ + --dropout 0.2 \ + --criterion label_smoothed_cross_entropy --label-smoothing 0.1 \ + --optimizer nag --clip-norm 0.1 \ + --lr 0.5 --lr-scheduler fixed --force-anneal 50 \ + --max-tokens 4000 \ + --save-dir checkpoints/fconv_wmt_en_de + +# Evaluate +fairseq-generate data-bin/wmt17_en_de \ + --path checkpoints/fconv_wmt_en_de/checkpoint_best.pt \ + --beam 5 --remove-bpe +``` + +### WMT'14 English to French +```bash +# Download and prepare the data +cd examples/translation/ +bash prepare-wmt14en2fr.sh +cd ../.. + +# Binarize the dataset +TEXT=examples/translation/wmt14_en_fr +fairseq-preprocess \ + --source-lang en --target-lang fr \ + --trainpref $TEXT/train --validpref $TEXT/valid --testpref $TEXT/test \ + --destdir data-bin/wmt14_en_fr --thresholdtgt 0 --thresholdsrc 0 \ + --workers 60 + +# Train the model +mkdir -p checkpoints/fconv_wmt_en_fr +fairseq-train \ + data-bin/wmt14_en_fr \ + --arch fconv_wmt_en_fr \ + --dropout 0.1 \ + --criterion label_smoothed_cross_entropy --label-smoothing 0.1 \ + --optimizer nag --clip-norm 0.1 \ + --lr 0.5 --lr-scheduler fixed --force-anneal 50 \ + --max-tokens 3000 \ + --save-dir checkpoints/fconv_wmt_en_fr + +# Evaluate +fairseq-generate \ + data-bin/fconv_wmt_en_fr \ + --path checkpoints/fconv_wmt_en_fr/checkpoint_best.pt \ + --beam 5 --remove-bpe +``` + +## Multilingual Translation + +We also support training multilingual translation models. In this example we'll +train a multilingual `{de,fr}-en` translation model using the IWSLT'17 datasets. + +Note that we use slightly different preprocessing here than for the IWSLT'14 +En-De data above. In particular we learn a joint BPE code for all three +languages and use fairseq-interactive and sacrebleu for scoring the test set. + +```bash +# First install sacrebleu and sentencepiece +pip install sacrebleu sentencepiece + +# Then download and preprocess the data +cd examples/translation/ +bash prepare-iwslt17-multilingual.sh +cd ../.. 
+ +# Binarize the de-en dataset +TEXT=examples/translation/iwslt17.de_fr.en.bpe16k +fairseq-preprocess --source-lang de --target-lang en \ + --trainpref $TEXT/train.bpe.de-en \ + --validpref $TEXT/valid0.bpe.de-en,$TEXT/valid1.bpe.de-en,$TEXT/valid2.bpe.de-en,$TEXT/valid3.bpe.de-en,$TEXT/valid4.bpe.de-en,$TEXT/valid5.bpe.de-en \ + --destdir data-bin/iwslt17.de_fr.en.bpe16k \ + --workers 10 + +# Binarize the fr-en dataset +# NOTE: it's important to reuse the en dictionary from the previous step +fairseq-preprocess --source-lang fr --target-lang en \ + --trainpref $TEXT/train.bpe.fr-en \ + --validpref $TEXT/valid0.bpe.fr-en,$TEXT/valid1.bpe.fr-en,$TEXT/valid2.bpe.fr-en,$TEXT/valid3.bpe.fr-en,$TEXT/valid4.bpe.fr-en,$TEXT/valid5.bpe.fr-en \ + --tgtdict data-bin/iwslt17.de_fr.en.bpe16k/dict.en.txt \ + --destdir data-bin/iwslt17.de_fr.en.bpe16k \ + --workers 10 + +# Train a multilingual transformer model +# NOTE: the command below assumes 1 GPU, but accumulates gradients from +# 8 fwd/bwd passes to simulate training on 8 GPUs +mkdir -p checkpoints/multilingual_transformer +CUDA_VISIBLE_DEVICES=0 fairseq-train data-bin/iwslt17.de_fr.en.bpe16k/ \ + --max-epoch 50 \ + --ddp-backend=legacy_ddp \ + --task multilingual_translation --lang-pairs de-en,fr-en \ + --arch multilingual_transformer_iwslt_de_en \ + --share-decoders --share-decoder-input-output-embed \ + --optimizer adam --adam-betas '(0.9, 0.98)' \ + --lr 0.0005 --lr-scheduler inverse_sqrt \ + --warmup-updates 4000 --warmup-init-lr '1e-07' \ + --label-smoothing 0.1 --criterion label_smoothed_cross_entropy \ + --dropout 0.3 --weight-decay 0.0001 \ + --save-dir checkpoints/multilingual_transformer \ + --max-tokens 4000 \ + --update-freq 8 + +# Generate and score the test set with sacrebleu +SRC=de +sacrebleu --test-set iwslt17 --language-pair ${SRC}-en --echo src \ + | python scripts/spm_encode.py --model examples/translation/iwslt17.de_fr.en.bpe16k/sentencepiece.bpe.model \ + > iwslt17.test.${SRC}-en.${SRC}.bpe +cat iwslt17.test.${SRC}-en.${SRC}.bpe \ + | fairseq-interactive data-bin/iwslt17.de_fr.en.bpe16k/ \ + --task multilingual_translation --lang-pairs de-en,fr-en \ + --source-lang ${SRC} --target-lang en \ + --path checkpoints/multilingual_transformer/checkpoint_best.pt \ + --buffer-size 2000 --batch-size 128 \ + --beam 5 --remove-bpe=sentencepiece \ + > iwslt17.test.${SRC}-en.en.sys +grep ^H iwslt17.test.${SRC}-en.en.sys | cut -f3 \ + | sacrebleu --test-set iwslt17 --language-pair ${SRC}-en +``` + +##### Argument format during inference + +During inference it is required to specify a single `--source-lang` and +`--target-lang`, which indicates the inference langauge direction. +`--lang-pairs`, `--encoder-langtok`, `--decoder-langtok` have to be set to +the same value as training. diff --git a/SpeechT5/fairseq/examples/translation/prepare-iwslt14.sh b/SpeechT5/fairseq/examples/translation/prepare-iwslt14.sh new file mode 100644 index 0000000000000000000000000000000000000000..2fb6643fbccb58701dcbb77d91430e68a821ba38 --- /dev/null +++ b/SpeechT5/fairseq/examples/translation/prepare-iwslt14.sh @@ -0,0 +1,115 @@ +#!/usr/bin/env bash +# +# Adapted from https://github.com/facebookresearch/MIXER/blob/master/prepareData.sh + +echo 'Cloning Moses github repository (for tokenization scripts)...' +git clone https://github.com/moses-smt/mosesdecoder.git + +echo 'Cloning Subword NMT repository (for BPE pre-processing)...' 
+git clone https://github.com/rsennrich/subword-nmt.git + +SCRIPTS=mosesdecoder/scripts +TOKENIZER=$SCRIPTS/tokenizer/tokenizer.perl +LC=$SCRIPTS/tokenizer/lowercase.perl +CLEAN=$SCRIPTS/training/clean-corpus-n.perl +BPEROOT=subword-nmt/subword_nmt +BPE_TOKENS=10000 + +URL="http://dl.fbaipublicfiles.com/fairseq/data/iwslt14/de-en.tgz" +GZ=de-en.tgz + +if [ ! -d "$SCRIPTS" ]; then + echo "Please set SCRIPTS variable correctly to point to Moses scripts." + exit +fi + +src=de +tgt=en +lang=de-en +prep=iwslt14.tokenized.de-en +tmp=$prep/tmp +orig=orig + +mkdir -p $orig $tmp $prep + +echo "Downloading data from ${URL}..." +cd $orig +wget "$URL" + +if [ -f $GZ ]; then + echo "Data successfully downloaded." +else + echo "Data not successfully downloaded." + exit +fi + +tar zxvf $GZ +cd .. + +echo "pre-processing train data..." +for l in $src $tgt; do + f=train.tags.$lang.$l + tok=train.tags.$lang.tok.$l + + cat $orig/$lang/$f | \ + grep -v '<url>' | \ + grep -v '<talkid>' | \ + grep -v '<keywords>' | \ + sed -e 's/<title>//g' | \ + sed -e 's/<\/title>//g' | \ + sed -e 's/<description>//g' | \ + sed -e 's/<\/description>//g' | \ + perl $TOKENIZER -threads 8 -l $l > $tmp/$tok + echo "" +done +perl $CLEAN -ratio 1.5 $tmp/train.tags.$lang.tok $src $tgt $tmp/train.tags.$lang.clean 1 175 +for l in $src $tgt; do + perl $LC < $tmp/train.tags.$lang.clean.$l > $tmp/train.tags.$lang.$l +done + +echo "pre-processing valid/test data..." +for l in $src $tgt; do + for o in `ls $orig/$lang/IWSLT14.TED*.$l.xml`; do + fname=${o##*/} + f=$tmp/${fname%.*} + echo $o $f + grep '<seg id' $o | \ + sed -e 's/<seg id="[0-9]*">\s*//g' | \ + sed -e 's/\s*<\/seg>\s*//g' | \ + sed -e "s/\’/\'/g" | \ + perl $TOKENIZER -threads 8 -l $l | \ + perl $LC > $f + echo "" + done +done + + +echo "creating train, valid, test..." +for l in $src $tgt; do + awk '{if (NR%23 == 0) print $0; }' $tmp/train.tags.de-en.$l > $tmp/valid.$l + awk '{if (NR%23 != 0) print $0; }' $tmp/train.tags.de-en.$l > $tmp/train.$l + + cat $tmp/IWSLT14.TED.dev2010.de-en.$l \ + $tmp/IWSLT14.TEDX.dev2012.de-en.$l \ + $tmp/IWSLT14.TED.tst2010.de-en.$l \ + $tmp/IWSLT14.TED.tst2011.de-en.$l \ + $tmp/IWSLT14.TED.tst2012.de-en.$l \ + > $tmp/test.$l +done + +TRAIN=$tmp/train.en-de +BPE_CODE=$prep/code +rm -f $TRAIN +for l in $src $tgt; do + cat $tmp/train.$l >> $TRAIN +done + +echo "learn_bpe.py on ${TRAIN}..." +python $BPEROOT/learn_bpe.py -s $BPE_TOKENS < $TRAIN > $BPE_CODE + +for L in $src $tgt; do + for f in train.$L valid.$L test.$L; do + echo "apply_bpe.py to ${f}..." + python $BPEROOT/apply_bpe.py -c $BPE_CODE < $tmp/$f > $prep/$f + done +done diff --git a/SpeechT5/fairseq/examples/translation/prepare-iwslt17-multilingual.sh b/SpeechT5/fairseq/examples/translation/prepare-iwslt17-multilingual.sh new file mode 100644 index 0000000000000000000000000000000000000000..23be87555322bc03b13e9d95951d88b1a442f97a --- /dev/null +++ b/SpeechT5/fairseq/examples/translation/prepare-iwslt17-multilingual.sh @@ -0,0 +1,133 @@ +#!/bin/bash +# Copyright (c) Facebook, Inc. and its affiliates. +# All rights reserved. +# +# This source code is licensed under the license found in the +# LICENSE file in the root directory of this source tree. 
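+#
+# Prepares the IWSLT'17 {de,fr}-en data for the multilingual translation
+# example: downloads the de-en and fr-en archives, strips XML/markup from the
+# raw text, learns a joint 16k sentencepiece BPE model over all training data,
+# and writes BPE-encoded train/valid files to iwslt17.de_fr.en.bpe16k/.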
+ +SRCS=( + "de" + "fr" +) +TGT=en + +ROOT=$(dirname "$0") +SCRIPTS=$ROOT/../../scripts +SPM_TRAIN=$SCRIPTS/spm_train.py +SPM_ENCODE=$SCRIPTS/spm_encode.py + +BPESIZE=16384 +ORIG=$ROOT/iwslt17_orig +DATA=$ROOT/iwslt17.de_fr.en.bpe16k +mkdir -p "$ORIG" "$DATA" + +TRAIN_MINLEN=1 # remove sentences with <1 BPE token +TRAIN_MAXLEN=250 # remove sentences with >250 BPE tokens + +URLS=( + "https://wit3.fbk.eu/archive/2017-01-trnted/texts/de/en/de-en.tgz" + "https://wit3.fbk.eu/archive/2017-01-trnted/texts/fr/en/fr-en.tgz" +) +ARCHIVES=( + "de-en.tgz" + "fr-en.tgz" +) +VALID_SETS=( + "IWSLT17.TED.dev2010.de-en IWSLT17.TED.tst2010.de-en IWSLT17.TED.tst2011.de-en IWSLT17.TED.tst2012.de-en IWSLT17.TED.tst2013.de-en IWSLT17.TED.tst2014.de-en IWSLT17.TED.tst2015.de-en" + "IWSLT17.TED.dev2010.fr-en IWSLT17.TED.tst2010.fr-en IWSLT17.TED.tst2011.fr-en IWSLT17.TED.tst2012.fr-en IWSLT17.TED.tst2013.fr-en IWSLT17.TED.tst2014.fr-en IWSLT17.TED.tst2015.fr-en" +) + +# download and extract data +for ((i=0;i<${#URLS[@]};++i)); do + ARCHIVE=$ORIG/${ARCHIVES[i]} + if [ -f "$ARCHIVE" ]; then + echo "$ARCHIVE already exists, skipping download" + else + URL=${URLS[i]} + wget -P "$ORIG" "$URL" + if [ -f "$ARCHIVE" ]; then + echo "$URL successfully downloaded." + else + echo "$URL not successfully downloaded." + exit 1 + fi + fi + FILE=${ARCHIVE: -4} + if [ -e "$FILE" ]; then + echo "$FILE already exists, skipping extraction" + else + tar -C "$ORIG" -xzvf "$ARCHIVE" + fi +done + +echo "pre-processing train data..." +for SRC in "${SRCS[@]}"; do + for LANG in "${SRC}" "${TGT}"; do + cat "$ORIG/${SRC}-${TGT}/train.tags.${SRC}-${TGT}.${LANG}" \ + | grep -v '<url>' \ + | grep -v '<talkid>' \ + | grep -v '<keywords>' \ + | grep -v '<speaker>' \ + | grep -v '<reviewer' \ + | grep -v '<translator' \ + | grep -v '<doc' \ + | grep -v '</doc>' \ + | sed -e 's/<title>//g' \ + | sed -e 's/<\/title>//g' \ + | sed -e 's/<description>//g' \ + | sed -e 's/<\/description>//g' \ + | sed 's/^\s*//g' \ + | sed 's/\s*$//g' \ + > "$DATA/train.${SRC}-${TGT}.${LANG}" + done +done + +echo "pre-processing valid data..." +for ((i=0;i<${#SRCS[@]};++i)); do + SRC=${SRCS[i]} + VALID_SET=(${VALID_SETS[i]}) + for ((j=0;j<${#VALID_SET[@]};++j)); do + FILE=${VALID_SET[j]} + for LANG in "$SRC" "$TGT"; do + grep '<seg id' "$ORIG/${SRC}-${TGT}/${FILE}.${LANG}.xml" \ + | sed -e 's/<seg id="[0-9]*">\s*//g' \ + | sed -e 's/\s*<\/seg>\s*//g' \ + | sed -e "s/\’/\'/g" \ + > "$DATA/valid${j}.${SRC}-${TGT}.${LANG}" + done + done +done + +# learn BPE with sentencepiece +TRAIN_FILES=$(for SRC in "${SRCS[@]}"; do echo $DATA/train.${SRC}-${TGT}.${SRC}; echo $DATA/train.${SRC}-${TGT}.${TGT}; done | tr "\n" ",") +echo "learning joint BPE over ${TRAIN_FILES}..." +python "$SPM_TRAIN" \ + --input=$TRAIN_FILES \ + --model_prefix=$DATA/sentencepiece.bpe \ + --vocab_size=$BPESIZE \ + --character_coverage=1.0 \ + --model_type=bpe + +# encode train/valid +echo "encoding train with learned BPE..." +for SRC in "${SRCS[@]}"; do + python "$SPM_ENCODE" \ + --model "$DATA/sentencepiece.bpe.model" \ + --output_format=piece \ + --inputs $DATA/train.${SRC}-${TGT}.${SRC} $DATA/train.${SRC}-${TGT}.${TGT} \ + --outputs $DATA/train.bpe.${SRC}-${TGT}.${SRC} $DATA/train.bpe.${SRC}-${TGT}.${TGT} \ + --min-len $TRAIN_MINLEN --max-len $TRAIN_MAXLEN +done + +echo "encoding valid with learned BPE..." 
+for ((i=0;i<${#SRCS[@]};++i)); do + SRC=${SRCS[i]} + VALID_SET=(${VALID_SETS[i]}) + for ((j=0;j<${#VALID_SET[@]};++j)); do + python "$SPM_ENCODE" \ + --model "$DATA/sentencepiece.bpe.model" \ + --output_format=piece \ + --inputs $DATA/valid${j}.${SRC}-${TGT}.${SRC} $DATA/valid${j}.${SRC}-${TGT}.${TGT} \ + --outputs $DATA/valid${j}.bpe.${SRC}-${TGT}.${SRC} $DATA/valid${j}.bpe.${SRC}-${TGT}.${TGT} + done +done diff --git a/SpeechT5/fairseq/examples/translation/prepare-wmt14en2de.sh b/SpeechT5/fairseq/examples/translation/prepare-wmt14en2de.sh new file mode 100644 index 0000000000000000000000000000000000000000..6702c88b568c9e680b525593ff0c9fb0a474825d --- /dev/null +++ b/SpeechT5/fairseq/examples/translation/prepare-wmt14en2de.sh @@ -0,0 +1,142 @@ +#!/bin/bash +# Adapted from https://github.com/facebookresearch/MIXER/blob/master/prepareData.sh + +echo 'Cloning Moses github repository (for tokenization scripts)...' +git clone https://github.com/moses-smt/mosesdecoder.git + +echo 'Cloning Subword NMT repository (for BPE pre-processing)...' +git clone https://github.com/rsennrich/subword-nmt.git + +SCRIPTS=mosesdecoder/scripts +TOKENIZER=$SCRIPTS/tokenizer/tokenizer.perl +CLEAN=$SCRIPTS/training/clean-corpus-n.perl +NORM_PUNC=$SCRIPTS/tokenizer/normalize-punctuation.perl +REM_NON_PRINT_CHAR=$SCRIPTS/tokenizer/remove-non-printing-char.perl +BPEROOT=subword-nmt/subword_nmt +BPE_TOKENS=40000 + +URLS=( + "http://statmt.org/wmt13/training-parallel-europarl-v7.tgz" + "http://statmt.org/wmt13/training-parallel-commoncrawl.tgz" + "http://data.statmt.org/wmt17/translation-task/training-parallel-nc-v12.tgz" + "http://data.statmt.org/wmt17/translation-task/dev.tgz" + "http://statmt.org/wmt14/test-full.tgz" +) +FILES=( + "training-parallel-europarl-v7.tgz" + "training-parallel-commoncrawl.tgz" + "training-parallel-nc-v12.tgz" + "dev.tgz" + "test-full.tgz" +) +CORPORA=( + "training/europarl-v7.de-en" + "commoncrawl.de-en" + "training/news-commentary-v12.de-en" +) + +# This will make the dataset compatible to the one used in "Convolutional Sequence to Sequence Learning" +# https://arxiv.org/abs/1705.03122 +if [ "$1" == "--icml17" ]; then + URLS[2]="http://statmt.org/wmt14/training-parallel-nc-v9.tgz" + FILES[2]="training-parallel-nc-v9.tgz" + CORPORA[2]="training/news-commentary-v9.de-en" + OUTDIR=wmt14_en_de +else + OUTDIR=wmt17_en_de +fi + +if [ ! -d "$SCRIPTS" ]; then + echo "Please set SCRIPTS variable correctly to point to Moses scripts." + exit +fi + +src=en +tgt=de +lang=en-de +prep=$OUTDIR +tmp=$prep/tmp +orig=orig +dev=dev/newstest2013 + +mkdir -p $orig $tmp $prep + +cd $orig + +for ((i=0;i<${#URLS[@]};++i)); do + file=${FILES[i]} + if [ -f $file ]; then + echo "$file already exists, skipping download" + else + url=${URLS[i]} + wget "$url" + if [ -f $file ]; then + echo "$url successfully downloaded." + else + echo "$url not successfully downloaded." + exit -1 + fi + if [ ${file: -4} == ".tgz" ]; then + tar zxvf $file + elif [ ${file: -4} == ".tar" ]; then + tar xvf $file + fi + fi +done +cd .. + +echo "pre-processing train data..." +for l in $src $tgt; do + rm $tmp/train.tags.$lang.tok.$l + for f in "${CORPORA[@]}"; do + cat $orig/$f.$l | \ + perl $NORM_PUNC $l | \ + perl $REM_NON_PRINT_CHAR | \ + perl $TOKENIZER -threads 8 -a -l $l >> $tmp/train.tags.$lang.tok.$l + done +done + +echo "pre-processing test data..." 
+for l in $src $tgt; do + if [ "$l" == "$src" ]; then + t="src" + else + t="ref" + fi + grep '<seg id' $orig/test-full/newstest2014-deen-$t.$l.sgm | \ + sed -e 's/<seg id="[0-9]*">\s*//g' | \ + sed -e 's/\s*<\/seg>\s*//g' | \ + sed -e "s/\’/\'/g" | \ + perl $TOKENIZER -threads 8 -a -l $l > $tmp/test.$l + echo "" +done + +echo "splitting train and valid..." +for l in $src $tgt; do + awk '{if (NR%100 == 0) print $0; }' $tmp/train.tags.$lang.tok.$l > $tmp/valid.$l + awk '{if (NR%100 != 0) print $0; }' $tmp/train.tags.$lang.tok.$l > $tmp/train.$l +done + +TRAIN=$tmp/train.de-en +BPE_CODE=$prep/code +rm -f $TRAIN +for l in $src $tgt; do + cat $tmp/train.$l >> $TRAIN +done + +echo "learn_bpe.py on ${TRAIN}..." +python $BPEROOT/learn_bpe.py -s $BPE_TOKENS < $TRAIN > $BPE_CODE + +for L in $src $tgt; do + for f in train.$L valid.$L test.$L; do + echo "apply_bpe.py to ${f}..." + python $BPEROOT/apply_bpe.py -c $BPE_CODE < $tmp/$f > $tmp/bpe.$f + done +done + +perl $CLEAN -ratio 1.5 $tmp/bpe.train $src $tgt $prep/train 1 250 +perl $CLEAN -ratio 1.5 $tmp/bpe.valid $src $tgt $prep/valid 1 250 + +for L in $src $tgt; do + cp $tmp/bpe.test.$L $prep/test.$L +done diff --git a/SpeechT5/fairseq/examples/translation/prepare-wmt14en2fr.sh b/SpeechT5/fairseq/examples/translation/prepare-wmt14en2fr.sh new file mode 100644 index 0000000000000000000000000000000000000000..2ac97a5b76fab255449493488ed8bd67350a7bac --- /dev/null +++ b/SpeechT5/fairseq/examples/translation/prepare-wmt14en2fr.sh @@ -0,0 +1,136 @@ +#!/bin/bash +# Adapted from https://github.com/facebookresearch/MIXER/blob/master/prepareData.sh + +echo 'Cloning Moses github repository (for tokenization scripts)...' +git clone https://github.com/moses-smt/mosesdecoder.git + +echo 'Cloning Subword NMT repository (for BPE pre-processing)...' +git clone https://github.com/rsennrich/subword-nmt.git + +SCRIPTS=mosesdecoder/scripts +TOKENIZER=$SCRIPTS/tokenizer/tokenizer.perl +CLEAN=$SCRIPTS/training/clean-corpus-n.perl +NORM_PUNC=$SCRIPTS/tokenizer/normalize-punctuation.perl +REM_NON_PRINT_CHAR=$SCRIPTS/tokenizer/remove-non-printing-char.perl +BPEROOT=subword-nmt/subword_nmt +BPE_TOKENS=40000 + +URLS=( + "http://statmt.org/wmt13/training-parallel-europarl-v7.tgz" + "http://statmt.org/wmt13/training-parallel-commoncrawl.tgz" + "http://statmt.org/wmt13/training-parallel-un.tgz" + "http://statmt.org/wmt14/training-parallel-nc-v9.tgz" + "http://statmt.org/wmt10/training-giga-fren.tar" + "http://statmt.org/wmt14/test-full.tgz" +) +FILES=( + "training-parallel-europarl-v7.tgz" + "training-parallel-commoncrawl.tgz" + "training-parallel-un.tgz" + "training-parallel-nc-v9.tgz" + "training-giga-fren.tar" + "test-full.tgz" +) +CORPORA=( + "training/europarl-v7.fr-en" + "commoncrawl.fr-en" + "un/undoc.2000.fr-en" + "training/news-commentary-v9.fr-en" + "giga-fren.release2.fixed" +) + +if [ ! -d "$SCRIPTS" ]; then + echo "Please set SCRIPTS variable correctly to point to Moses scripts." + exit +fi + +src=en +tgt=fr +lang=en-fr +prep=wmt14_en_fr +tmp=$prep/tmp +orig=orig + +mkdir -p $orig $tmp $prep + +cd $orig + +for ((i=0;i<${#URLS[@]};++i)); do + file=${FILES[i]} + if [ -f $file ]; then + echo "$file already exists, skipping download" + else + url=${URLS[i]} + wget "$url" + if [ -f $file ]; then + echo "$url successfully downloaded." + else + echo "$url not successfully downloaded." + exit -1 + fi + if [ ${file: -4} == ".tgz" ]; then + tar zxvf $file + elif [ ${file: -4} == ".tar" ]; then + tar xvf $file + fi + fi +done + +gunzip giga-fren.release2.fixed.*.gz +cd .. 
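+
+# The Giga-FrEn corpus ships as gzipped files, so it was decompressed above.
+# The loop below concatenates all corpora, normalizes punctuation, removes
+# non-printing characters and tokenizes each side with the Moses scripts.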
+ +echo "pre-processing train data..." +for l in $src $tgt; do + rm $tmp/train.tags.$lang.tok.$l + for f in "${CORPORA[@]}"; do + cat $orig/$f.$l | \ + perl $NORM_PUNC $l | \ + perl $REM_NON_PRINT_CHAR | \ + perl $TOKENIZER -threads 8 -a -l $l >> $tmp/train.tags.$lang.tok.$l + done +done + +echo "pre-processing test data..." +for l in $src $tgt; do + if [ "$l" == "$src" ]; then + t="src" + else + t="ref" + fi + grep '<seg id' $orig/test-full/newstest2014-fren-$t.$l.sgm | \ + sed -e 's/<seg id="[0-9]*">\s*//g' | \ + sed -e 's/\s*<\/seg>\s*//g' | \ + sed -e "s/\’/\'/g" | \ + perl $TOKENIZER -threads 8 -a -l $l > $tmp/test.$l + echo "" +done + +echo "splitting train and valid..." +for l in $src $tgt; do + awk '{if (NR%1333 == 0) print $0; }' $tmp/train.tags.$lang.tok.$l > $tmp/valid.$l + awk '{if (NR%1333 != 0) print $0; }' $tmp/train.tags.$lang.tok.$l > $tmp/train.$l +done + +TRAIN=$tmp/train.fr-en +BPE_CODE=$prep/code +rm -f $TRAIN +for l in $src $tgt; do + cat $tmp/train.$l >> $TRAIN +done + +echo "learn_bpe.py on ${TRAIN}..." +python $BPEROOT/learn_bpe.py -s $BPE_TOKENS < $TRAIN > $BPE_CODE + +for L in $src $tgt; do + for f in train.$L valid.$L test.$L; do + echo "apply_bpe.py to ${f}..." + python $BPEROOT/apply_bpe.py -c $BPE_CODE < $tmp/$f > $tmp/bpe.$f + done +done + +perl $CLEAN -ratio 1.5 $tmp/bpe.train $src $tgt $prep/train 1 250 +perl $CLEAN -ratio 1.5 $tmp/bpe.valid $src $tgt $prep/valid 1 250 + +for L in $src $tgt; do + cp $tmp/bpe.test.$L $prep/test.$L +done diff --git a/SpeechT5/fairseq/examples/translation_moe/README.md b/SpeechT5/fairseq/examples/translation_moe/README.md new file mode 100644 index 0000000000000000000000000000000000000000..2e5c8af617f410f64ca38d29447bd05b6af8c5a8 --- /dev/null +++ b/SpeechT5/fairseq/examples/translation_moe/README.md @@ -0,0 +1,89 @@ +# Mixture Models for Diverse Machine Translation: Tricks of the Trade (Shen et al., 2019) + +This page includes instructions for reproducing results from the paper [Mixture Models for Diverse Machine Translation: Tricks of the Trade (Shen et al., 2019)](https://arxiv.org/abs/1902.07816). + +## Download data + +First, follow the [instructions to download and preprocess the WMT'17 En-De dataset](../translation#prepare-wmt14en2desh). +Make sure to learn a joint vocabulary by passing the `--joined-dictionary` option to `fairseq-preprocess`. + +## Train a model + +Then we can train a mixture of experts model using the `translation_moe` task. +Use the `--method` flag to choose the MoE variant; we support hard mixtures with a learned or uniform prior (`--method hMoElp` and `hMoEup`, respectively) and soft mixures (`--method sMoElp` and `sMoEup`). +The model is trained with online responsibility assignment and shared parameterization. + +The following command will train a `hMoElp` model with `3` experts: +```bash +fairseq-train --ddp-backend='legacy_ddp' \ + data-bin/wmt17_en_de \ + --max-update 100000 \ + --task translation_moe --user-dir examples/translation_moe/translation_moe_src \ + --method hMoElp --mean-pool-gating-network \ + --num-experts 3 \ + --arch transformer_wmt_en_de --share-all-embeddings \ + --optimizer adam --adam-betas '(0.9, 0.98)' --clip-norm 0.0 \ + --lr-scheduler inverse_sqrt --warmup-init-lr 1e-07 --warmup-updates 4000 \ + --lr 0.0007 \ + --dropout 0.1 --weight-decay 0.0 --criterion cross_entropy \ + --max-tokens 3584 +``` + +## Translate + +Once a model is trained, we can generate translations from different experts using the `--gen-expert` option. 
+
+For example, to generate from expert 0:
+```bash
+fairseq-generate data-bin/wmt17_en_de \
+    --path checkpoints/checkpoint_best.pt \
+    --beam 1 --remove-bpe \
+    --task translation_moe --user-dir examples/translation_moe/translation_moe_src \
+    --method hMoElp --mean-pool-gating-network \
+    --num-experts 3 \
+    --gen-expert 0
+```
+
+## Evaluate
+
+First download a tokenized version of the WMT'14 En-De test set with multiple references:
+```bash
+wget dl.fbaipublicfiles.com/fairseq/data/wmt14-en-de.extra_refs.tok
+```
+
+Next apply BPE on the fly and run generation for each expert:
+```bash
+BPE_CODE=examples/translation/wmt17_en_de/code
+for EXPERT in $(seq 0 2); do \
+    cat wmt14-en-de.extra_refs.tok \
+    | grep ^S | cut -f 2 \
+    | fairseq-interactive data-bin/wmt17_en_de \
+        --path checkpoints/checkpoint_best.pt \
+        --beam 1 \
+        --bpe subword_nmt --bpe-codes $BPE_CODE \
+        --buffer-size 500 --max-tokens 6000 \
+        --task translation_moe --user-dir examples/translation_moe/translation_moe_src \
+        --method hMoElp --mean-pool-gating-network \
+        --num-experts 3 \
+        --gen-expert $EXPERT ; \
+done > wmt14-en-de.extra_refs.tok.gen.3experts
+```
+
+Finally use `score.py` to compute pairwise BLEU and average oracle BLEU:
+```bash
+python examples/translation_moe/score.py --sys wmt14-en-de.extra_refs.tok.gen.3experts --ref wmt14-en-de.extra_refs.tok
+# pairwise BLEU: 48.26
+# #refs covered: 2.11
+# multi-reference BLEU (leave-one-out): 59.46
+```
+This matches row 3 from Table 7 in the paper.
+
+## Citation
+
+```bibtex
+@article{shen2019mixture,
+  title = {Mixture Models for Diverse Machine Translation: Tricks of the Trade},
+  author = {Tianxiao Shen and Myle Ott and Michael Auli and Marc'Aurelio Ranzato},
+  journal = {International Conference on Machine Learning},
+  year = 2019,
+}
+```
diff --git a/SpeechT5/fairseq/examples/translation_moe/score.py b/SpeechT5/fairseq/examples/translation_moe/score.py
new file mode 100644
index 0000000000000000000000000000000000000000..9a529a985019710ea202cb6bf28ae071c0ce4135
--- /dev/null
+++ b/SpeechT5/fairseq/examples/translation_moe/score.py
@@ -0,0 +1,197 @@
+#!/usr/bin/env python3
+# Copyright (c) Facebook, Inc. and its affiliates.
+#
+# This source code is licensed under the MIT license found in the
+# LICENSE file in the root directory of this source tree.
+"""
+Scoring script for computing pairwise BLEU and multi-ref BLEU over a set of
+candidate hypotheses.
+
+See `"Mixture Models for Diverse Machine Translation: Tricks of the Trade"
+(Shen et al., 2019) <https://arxiv.org/abs/1902.07816>`_.
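+
+Example usage (see the README in this directory):
+
+    python examples/translation_moe/score.py \
+        --sys wmt14-en-de.extra_refs.tok.gen.3experts \
+        --ref wmt14-en-de.extra_refs.tok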
+""" + +import argparse +import random +import sys +from itertools import chain + +import numpy as np +from sacrebleu import compute_bleu, corpus_bleu as _corpus_bleu + + +def main(): + parser = argparse.ArgumentParser(sys.argv[0]) + parser.add_argument( + "--sys", nargs="*", default="", metavar="FILE", help="path to system output" + ) + parser.add_argument("--ref", default="", metavar="FILE", help="path to references") + parser.add_argument( + "--output", + default="", + metavar="FILE", + help="print outputs into a pretty format", + ) + args = parser.parse_args() + + if args.sys: + src, tgt, hypos, log_probs = load_sys(args.sys) + print("pairwise BLEU: %.2f" % pairwise(hypos)) + if args.output: + merge(src, tgt, hypos, log_probs, args.output) + + if args.ref: + _, _, refs = load_ref(args.ref) + if args.sys: + multi_ref(refs, hypos) + else: + intra_ref(refs) + + +def dictolist(d): + a = sorted(d.items(), key=lambda i: i[0]) + return [i[1] for i in a] + + +def load_sys(paths): + src, tgt, hypos, log_probs = {}, {}, {}, {} + for path in paths: + with open(path) as f: + for line in f: + line = line.rstrip() + # S: source + # T: target + # D: detokenized system output + if line.startswith(("S-", "T-", "D-")): + i = int(line[line.find("-") + 1 : line.find("\t")]) + if line.startswith("S-"): + src[i] = line.split("\t")[1] + if line.startswith("T-"): + tgt[i] = line.split("\t")[1] + if line.startswith("D-"): + if i not in hypos: + hypos[i] = [] + log_probs[i] = [] + hypos[i].append(line.split("\t")[2]) + log_probs[i].append(float(line.split("\t")[1])) + return dictolist(src), dictolist(tgt), dictolist(hypos), dictolist(log_probs) + + +def load_ref(path): + with open(path) as f: + lines = f.readlines() + src, tgt, refs = [], [], [] + i = 0 + while i < len(lines): + if lines[i].startswith("S-"): + src.append(lines[i].split("\t")[1].rstrip()) + i += 1 + elif lines[i].startswith("T-"): + tgt.append(lines[i].split("\t")[1].rstrip()) + i += 1 + else: + a = [] + while i < len(lines) and lines[i].startswith("R"): + a.append(lines[i].split("\t")[1].rstrip()) + i += 1 + refs.append(a) + return src, tgt, refs + + +def merge(src, tgt, hypos, log_probs, path): + with open(path, "w") as f: + for s, t, hs, lps in zip(src, tgt, hypos, log_probs): + f.write(s + "\n") + f.write(t + "\n") + f.write("\n") + for h, lp in zip(hs, lps): + f.write("\t%f\t%s\n" % (lp, h.strip())) + f.write("------------------------------------------------------\n") + + +def corpus_bleu(sys_stream, ref_streams): + bleu = _corpus_bleu(sys_stream, ref_streams, tokenize="none") + return bleu.score + + +def sentence_bleu(hypothesis, reference): + bleu = _corpus_bleu(hypothesis, reference) + for i in range(1, 4): + bleu.counts[i] += 1 + bleu.totals[i] += 1 + bleu = compute_bleu( + bleu.counts, + bleu.totals, + bleu.sys_len, + bleu.ref_len, + smooth_method="exp", + ) + return bleu.score + + +def pairwise(sents): + _ref, _hypo = [], [] + for s in sents: + for i in range(len(s)): + for j in range(len(s)): + if i != j: + _ref.append(s[i]) + _hypo.append(s[j]) + return corpus_bleu(_hypo, [_ref]) + + +def multi_ref(refs, hypos): + _ref, _hypo = [], [] + ref_cnt = 0 + assert len(refs) == len(hypos) + + # count number of refs covered + for rs, hs in zip(refs, hypos): + a = set() + for h in hs: + s = [sentence_bleu(h, r) for r in rs] + j = np.argmax(s) + _ref.append(rs[j]) + _hypo.append(h) + best = [k for k in range(len(rs)) if s[k] == s[j]] + a.add(random.choice(best)) + ref_cnt += len(a) + print("#refs covered: %.2f" % (ref_cnt / len(refs))) + + # 
transpose refs and hypos + refs = list(zip(*refs)) + hypos = list(zip(*hypos)) + + # compute multi-ref corpus BLEU (leave-one-out to be comparable to intra_ref) + k = len(hypos) + m = len(refs) + flat_hypos = [hypos[j][i] for i in range(len(hypos[0])) for j in range(k)] + duplicated_refs = [[ref for ref in refs_i for _ in range(k)] for refs_i in refs] + loo_bleus = [] + for held_out_ref in range(m): + remaining_refs = ( + duplicated_refs[:held_out_ref] + duplicated_refs[held_out_ref + 1 :] + ) + assert len(remaining_refs) == m - 1 + loo_bleus.append(corpus_bleu(flat_hypos, remaining_refs)) + print("average multi-reference BLEU (leave-one-out): %.2f" % np.mean(loo_bleus)) + + +def intra_ref(refs): + print("ref pairwise BLEU: %.2f" % pairwise(refs)) + refs = list(zip(*refs)) + m = len(refs) + concat_h = [] + concat_rest = [[] for j in range(m - 1)] + for i, h in enumerate(refs): + rest = refs[:i] + refs[i + 1 :] + concat_h.append(h) + for j in range(m - 1): + concat_rest[j].extend(rest[j]) + concat_h = list(chain.from_iterable(concat_h)) + bleu = corpus_bleu(concat_h, concat_rest) + print("multi-reference BLEU (leave-one-out): %.2f" % bleu) + + +if __name__ == "__main__": + main() diff --git a/SpeechT5/fairseq/examples/translation_moe/translation_moe_src/__init__.py b/SpeechT5/fairseq/examples/translation_moe/translation_moe_src/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..c0abe53e973b4bb31cfb062708965d002c79b6e7 --- /dev/null +++ b/SpeechT5/fairseq/examples/translation_moe/translation_moe_src/__init__.py @@ -0,0 +1,6 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +from . import translation_moe # noqa diff --git a/SpeechT5/fairseq/examples/translation_moe/translation_moe_src/logsumexp_moe.py b/SpeechT5/fairseq/examples/translation_moe/translation_moe_src/logsumexp_moe.py new file mode 100644 index 0000000000000000000000000000000000000000..fb299daecbc2b15fb66555bbfb8d1d983e481518 --- /dev/null +++ b/SpeechT5/fairseq/examples/translation_moe/translation_moe_src/logsumexp_moe.py @@ -0,0 +1,26 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +import torch + + +class LogSumExpMoE(torch.autograd.Function): + """Standard LogSumExp forward pass, but use *posterior* for the backward. + + See `"Mixture Models for Diverse Machine Translation: Tricks of the Trade" + (Shen et al., 2019) <https://arxiv.org/abs/1902.07816>`_. + """ + + @staticmethod + def forward(ctx, logp, posterior, dim=-1): + ctx.save_for_backward(posterior) + ctx.dim = dim + return torch.logsumexp(logp, dim=dim) + + @staticmethod + def backward(ctx, grad_output): + (posterior,) = ctx.saved_tensors + grad_logp = grad_output.unsqueeze(ctx.dim) * posterior + return grad_logp, None, None diff --git a/SpeechT5/fairseq/examples/translation_moe/translation_moe_src/mean_pool_gating_network.py b/SpeechT5/fairseq/examples/translation_moe/translation_moe_src/mean_pool_gating_network.py new file mode 100644 index 0000000000000000000000000000000000000000..efc7ae40bf8fed6c2384cbc6f94477c4caa4c10c --- /dev/null +++ b/SpeechT5/fairseq/examples/translation_moe/translation_moe_src/mean_pool_gating_network.py @@ -0,0 +1,50 @@ +# Copyright (c) Facebook, Inc. and its affiliates. 
+# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +import torch +import torch.nn.functional as F + + +class MeanPoolGatingNetwork(torch.nn.Module): + """A simple mean-pooling gating network for selecting experts. + + This module applies mean pooling over an encoder's output and returns + reponsibilities for each expert. The encoder format is expected to match + :class:`fairseq.models.transformer.TransformerEncoder`. + """ + + def __init__(self, embed_dim, num_experts, dropout=None): + super().__init__() + self.embed_dim = embed_dim + self.num_experts = num_experts + + self.fc1 = torch.nn.Linear(embed_dim, embed_dim) + self.dropout = torch.nn.Dropout(dropout) if dropout is not None else None + self.fc2 = torch.nn.Linear(embed_dim, num_experts) + + def forward(self, encoder_out): + if not ( + "encoder_out" in encoder_out + and "encoder_padding_mask" in encoder_out + and encoder_out["encoder_out"][0].size(2) == self.embed_dim + ): + raise ValueError("Unexpected format for encoder_out") + + # mean pooling over time + encoder_padding_mask = encoder_out["encoder_padding_mask"][0] # B x T + encoder_out = encoder_out["encoder_out"][0].transpose(0, 1) # B x T x C + if encoder_padding_mask is not None: + encoder_out = encoder_out.clone() # required because of transpose above + encoder_out[encoder_padding_mask] = 0 + ntokens = torch.sum(~encoder_padding_mask, dim=1, keepdim=True) + x = torch.sum(encoder_out, dim=1) / ntokens.type_as(encoder_out) + else: + x = torch.mean(encoder_out, dim=1) + + x = torch.tanh(self.fc1(x)) + if self.dropout is not None: + x = self.dropout(x) + x = self.fc2(x) + return F.log_softmax(x, dim=-1, dtype=torch.float32).type_as(x) diff --git a/SpeechT5/fairseq/examples/translation_moe/translation_moe_src/translation_moe.py b/SpeechT5/fairseq/examples/translation_moe/translation_moe_src/translation_moe.py new file mode 100644 index 0000000000000000000000000000000000000000..7f28c32dd6152f53d6922cdfccfa903e0bdc5829 --- /dev/null +++ b/SpeechT5/fairseq/examples/translation_moe/translation_moe_src/translation_moe.py @@ -0,0 +1,258 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. 
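+#
+# Implements the `translation_moe` task: a translation task that trains a
+# mixture-of-experts model (hard or soft mixtures, with a uniform or learned
+# prior) by first computing expert responsibilities without dropout and then
+# optimizing the hard-assigned or marginalized likelihood with dropout.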
+ +from dataclasses import dataclass, field +import torch +from omegaconf import II + +from fairseq import metrics, utils +from fairseq.dataclass import ChoiceEnum +from fairseq.tasks import register_task +from fairseq.tasks.translation import TranslationConfig, TranslationTask + +from .logsumexp_moe import LogSumExpMoE +from .mean_pool_gating_network import MeanPoolGatingNetwork + + +METHOD_CHOICES = ChoiceEnum(["sMoElp", "sMoEup", "hMoElp", "hMoEup"]) + + +@dataclass +class TranslationMoEConfig(TranslationConfig): + method: METHOD_CHOICES = field( + default="hMoEup", + metadata={"help": "MoE method"}, + ) + num_experts: int = field( + default=3, + metadata={"help": "number of experts"}, + ) + mean_pool_gating_network: bool = field( + default=False, + metadata={"help": "use a simple mean-pooling gating network"}, + ) + mean_pool_gating_network_dropout: float = field( + default=0, + metadata={"help": "dropout for mean-pooling gating network"}, + ) + mean_pool_gating_network_encoder_dim: int = field( + default=0, + metadata={"help": "encoder output dim for mean-pooling gating network"}, + ) + gen_expert: int = field( + default=0, + metadata={"help": "which expert to use for generation"}, + ) + sentence_avg: bool = II("optimization.sentence_avg") + + +@register_task("translation_moe", dataclass=TranslationMoEConfig) +class TranslationMoETask(TranslationTask): + """ + Translation task for Mixture of Experts (MoE) models. + + See `"Mixture Models for Diverse Machine Translation: Tricks of the Trade" + (Shen et al., 2019) <https://arxiv.org/abs/1902.07816>`_. + + Args: + src_dict (~fairseq.data.Dictionary): dictionary for the source language + tgt_dict (~fairseq.data.Dictionary): dictionary for the target language + + .. note:: + + The translation task is compatible with :mod:`fairseq-train`, + :mod:`fairseq-generate` and :mod:`fairseq-interactive`. + + The translation task provides the following additional command-line + arguments: + + .. 
argparse:: + :ref: fairseq.tasks.translation_parser + :prog: + """ + + cfg: TranslationMoEConfig + + def __init__(self, cfg: TranslationMoEConfig, src_dict, tgt_dict): + if cfg.method == "sMoElp": + # soft MoE with learned prior + self.uniform_prior = False + self.hard_selection = False + elif cfg.method == "sMoEup": + # soft MoE with uniform prior + self.uniform_prior = True + self.hard_selection = False + elif cfg.method == "hMoElp": + # hard MoE with learned prior + self.uniform_prior = False + self.hard_selection = True + elif cfg.method == "hMoEup": + # hard MoE with uniform prior + self.uniform_prior = True + self.hard_selection = True + + # add indicator tokens for each expert + for i in range(cfg.num_experts): + # add to both dictionaries in case we're sharing embeddings + src_dict.add_symbol("<expert_{}>".format(i)) + tgt_dict.add_symbol("<expert_{}>".format(i)) + + super().__init__(cfg, src_dict, tgt_dict) + + def build_model(self, cfg): + from fairseq import models + + model = models.build_model(cfg, self) + if not self.uniform_prior and not hasattr(model, "gating_network"): + if self.cfg.mean_pool_gating_network: + if self.cfg.mean_pool_gating_network_encoder_dim > 0: + encoder_dim = self.cfg.mean_pool_gating_network_encoder_dim + elif getattr(cfg, "encoder_embed_dim", None): + # assume that encoder_embed_dim is the encoder's output dimension + encoder_dim = cfg.encoder_embed_dim + else: + raise ValueError( + "Must specify --mean-pool-gating-network-encoder-dim" + ) + + if self.cfg.mean_pool_gating_network_dropout > 0: + dropout = self.cfg.mean_pool_gating_network_dropout + elif getattr(cfg, "dropout", None): + dropout = cfg.dropout + else: + raise ValueError("Must specify task.mean_pool_gating_network_dropout") + + model.gating_network = MeanPoolGatingNetwork( + encoder_dim, + self.cfg.num_experts, + dropout, + ) + else: + raise ValueError( + "translation_moe task with learned prior requires the model to " + "have a gating network; try using --mean-pool-gating-network" + ) + return model + + def expert_index(self, i): + return i + self.tgt_dict.index("<expert_0>") + + def _get_loss(self, sample, model, criterion): + assert hasattr( + criterion, "compute_loss" + ), "translation_moe task requires the criterion to implement the compute_loss() method" + + k = self.cfg.num_experts + bsz = sample["target"].size(0) + + def get_lprob_y(encoder_out, prev_output_tokens_k): + net_output = model.decoder( + prev_output_tokens=prev_output_tokens_k, + encoder_out=encoder_out, + ) + loss, _ = criterion.compute_loss(model, net_output, sample, reduce=False) + loss = loss.view(bsz, -1) + return -loss.sum(dim=1, keepdim=True) # -> B x 1 + + def get_lprob_yz(winners=None): + encoder_out = model.encoder( + src_tokens=sample["net_input"]["src_tokens"], + src_lengths=sample["net_input"]["src_lengths"], + ) + + if winners is None: + lprob_y = [] + for i in range(k): + prev_output_tokens_k = sample["net_input"][ + "prev_output_tokens" + ].clone() + assert not prev_output_tokens_k.requires_grad + prev_output_tokens_k[:, 0] = self.expert_index(i) + lprob_y.append(get_lprob_y(encoder_out, prev_output_tokens_k)) + lprob_y = torch.cat(lprob_y, dim=1) # -> B x K + else: + prev_output_tokens_k = sample["net_input"]["prev_output_tokens"].clone() + prev_output_tokens_k[:, 0] = self.expert_index(winners) + lprob_y = get_lprob_y(encoder_out, prev_output_tokens_k) # -> B + + if self.uniform_prior: + lprob_yz = lprob_y + else: + lprob_z = model.gating_network(encoder_out) # B x K + if winners is not None: + lprob_z 
= lprob_z.gather(dim=1, index=winners.unsqueeze(-1)) + lprob_yz = lprob_y + lprob_z.type_as(lprob_y) # B x K + + return lprob_yz + + # compute responsibilities without dropout + with utils.model_eval(model): # disable dropout + with torch.no_grad(): # disable autograd + lprob_yz = get_lprob_yz() # B x K + prob_z_xy = torch.nn.functional.softmax(lprob_yz, dim=1) + assert not prob_z_xy.requires_grad + + # compute loss with dropout + if self.hard_selection: + winners = prob_z_xy.max(dim=1)[1] + loss = -get_lprob_yz(winners) + else: + lprob_yz = get_lprob_yz() # B x K + loss = -LogSumExpMoE.apply(lprob_yz, prob_z_xy, 1) + + loss = loss.sum() + sample_size = ( + sample["target"].size(0) if self.cfg.sentence_avg else sample["ntokens"] + ) + logging_output = { + "loss": utils.item(loss.data), + "ntokens": sample["ntokens"], + "nsentences": bsz, + "sample_size": sample_size, + "posterior": prob_z_xy.float().sum(dim=0).cpu(), + } + return loss, sample_size, logging_output + + def train_step( + self, sample, model, criterion, optimizer, update_num, ignore_grad=False + ): + model.train() + loss, sample_size, logging_output = self._get_loss(sample, model, criterion) + if ignore_grad: + loss *= 0 + optimizer.backward(loss) + return loss, sample_size, logging_output + + def valid_step(self, sample, model, criterion): + model.eval() + with torch.no_grad(): + loss, sample_size, logging_output = self._get_loss(sample, model, criterion) + return loss, sample_size, logging_output + + def inference_step( + self, + generator, + models, + sample, + prefix_tokens=None, + expert=None, + constraints=None, + ): + expert = expert or self.cfg.gen_expert + with torch.no_grad(): + return generator.generate( + models, + sample, + prefix_tokens=prefix_tokens, + constraints=constraints, + bos_token=self.expert_index(expert), + ) + + def reduce_metrics(self, logging_outputs, criterion): + super().reduce_metrics(logging_outputs, criterion) + metrics.log_scalar( + "posterior", + sum(log["posterior"] for log in logging_outputs if "posterior" in log), + ) diff --git a/SpeechT5/fairseq/examples/truncated_bptt/README.md b/SpeechT5/fairseq/examples/truncated_bptt/README.md new file mode 100644 index 0000000000000000000000000000000000000000..86518c9d5ef09fbd4fed1512a52e9431b74f08fa --- /dev/null +++ b/SpeechT5/fairseq/examples/truncated_bptt/README.md @@ -0,0 +1,70 @@ +# Truncated Backpropagation Through Time (BPTT) + +Truncated BPTT is a useful technique for training language models on very long +sequences. Typically a long sequences is split into chunks and a language model +is trained over the chunks sequentially. The LM may condition on previous +chunks, but gradients only flow through the current chunk. This technique was +the basis for the paper: [Transformer-XL: Attentive Language Models Beyond a +Fixed-Length Context](https://arxiv.org/abs/1901.02860), which achieved +state-of-the-art language modeling results at the time of publication. + +It is slightly tricky to implement Truncated BPTT efficiently in fairseq, since +we need to iterate over the data sequentially and disable any batch shuffling +logic. The code provided in this example illustrates how to implement Truncated +BPTT in fairseq by overriding ``FairseqTask::get_batch_iterator`` to iterate +over the data sequentially. Crucially, this example supports batching and +multi-GPU (data parallel) training. + +##### 0. Setup + +First, see the general [language modeling README](README.md) for instructions on +preprocessing the WikiText-103 data. + +##### 1. 
Train a Transformer-XL model on WikiText-103 + +We will train a 16-layer Transformer-XL model following the [hyperparameters +used in the original +paper](https://github.com/kimiyoung/transformer-xl/blob/master/pytorch/run_wt103_base.sh). + +The following command assumes 4 GPUs, so that the total batch size is 60 +sequences (15 x 4). Training should take ~24 hours on 4 V100 GPUs: +```bash +CUDA_VISIBLE_DEVICES=0,1,2,3 fairseq-train \ + --user-dir examples/truncated_bptt \ + data-bin/wikitext-103/ \ + --task truncated_bptt_lm --tokens-per-sample 150 \ + --batch-size 15 --max-update 200000 \ + --arch transformer_xl --n-layer 16 --d-model 410 --n-head 10 \ + --d-head 41 --d-inner 2100 --dropout 0.1 --dropatt 0.0 --mem-len 150 \ + --optimizer adam --clip-norm 0.25 \ + --lr-scheduler cosine --warmup-updates 0 --min-lr 0.0 --lr 0.00025 \ + --log-format json --log-interval 25 \ + --fp16 +``` + +If training on a single GPU, set `--update-freq=4` to accumulate 4x gradients +and simulate training on 4 GPUs. + +##### 2. Evaluate + +```bash +fairseq-eval-lm data-bin/wikitext-103/ \ + --path checkpoints/checkpoint_best.pt \ + --user-dir examples/truncated_bptt/ \ + --task truncated_bptt_lm \ + --batch-size 1 --required-batch-size-multiple 1 \ + --model-overrides '{"mem_len":640,"clamp_len":400,"same_length":True}' \ + --tokens-per-sample 64 +# ... | INFO | fairseq_cli.eval_lm | num. model params: 151123537 +# ... | INFO | fairseq_cli.eval_lm | Evaluated 245569 tokens in 83.1s (2956.82 tokens/s) +# ... | INFO | fairseq_cli.eval_lm | Loss (base 2): 4.5668, Perplexity: 23.70 +# Compare to 24.0 test perplexity from the paper +``` + +*Note:* During training the model saw 150 tokens of context +(``--tokens-per-sample=150``) and 150 extra memory tokens (``--mem-len=150``). +During evaluation we measure perplexity on sequences of 64 tokens +(``--tokens-per-sample=64``) and increase the memory length +(``--model-overrides='{"mem_len":640}'``). These settings match the evaluation +settings from [the original +paper](https://github.com/kimiyoung/transformer-xl/blob/master/pytorch/run_wt103_base.sh). diff --git a/SpeechT5/fairseq/examples/truncated_bptt/__init__.py b/SpeechT5/fairseq/examples/truncated_bptt/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..eee484d427a68828462469d133144a8d7c052c40 --- /dev/null +++ b/SpeechT5/fairseq/examples/truncated_bptt/__init__.py @@ -0,0 +1,6 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +from . import transformer_xl_model, truncated_bptt_lm_task # noqa diff --git a/SpeechT5/fairseq/examples/truncated_bptt/transformer_xl_model.py b/SpeechT5/fairseq/examples/truncated_bptt/transformer_xl_model.py new file mode 100644 index 0000000000000000000000000000000000000000..a6c8b25a07276c2ee30c0aa5f0e4b0a2837ed5ca --- /dev/null +++ b/SpeechT5/fairseq/examples/truncated_bptt/transformer_xl_model.py @@ -0,0 +1,155 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. 
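+#
+# Wraps HuggingFace's TransfoXLLMHeadModel as a fairseq language model so it
+# can be trained with the truncated_bptt_lm task. Transformer-XL memories are
+# carried across chunks via *mems* (kept in the incremental state at
+# inference time).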
+ +import logging +from dataclasses import dataclass, field +from typing import Dict, List, Optional + +import torch +from fairseq.dataclass import FairseqDataclass +from fairseq.models import ( + FairseqIncrementalDecoder, + FairseqLanguageModel, + register_model, +) +from fairseq.modules.checkpoint_activations import checkpoint_wrapper +from omegaconf import II + + +logger = logging.getLogger(__name__) + + +@dataclass +class TransformerXLConfig(FairseqDataclass): + # defaults come from the original Transformer-XL code + cutoffs: List[int] = field(default_factory=lambda: [20000, 40000, 200000]) + d_model: int = 500 + n_head: int = 10 + d_head: int = 50 + d_inner: int = 1000 + div_val: int = 1 + n_layer: int = 12 + mem_len: int = 0 + clamp_len: int = -1 + same_length: bool = False + dropout: float = 0.0 + dropatt: float = 0.0 + checkpoint_activations: bool = False + offload_activations: bool = False + max_target_positions: int = II("task.max_target_positions") + + +@register_model("transformer_xl", dataclass=TransformerXLConfig) +class TransformerXLLanguageModel(FairseqLanguageModel): + @classmethod + def build_model(cls, cfg: TransformerXLConfig, task): + return cls(TransformerXLDecoder(cfg, task)) + + +class TransformerXLDecoder(FairseqIncrementalDecoder): + def __init__(self, cfg, task): + try: + from transformers.models.transfo_xl import ( + TransfoXLConfig, + TransfoXLLMHeadModel, + ) + except ImportError: + from transformers.configuration_transfo_xl import TransfoXLConfig + from transformers.modeling_transfo_xl import TransfoXLLMHeadModel + + super().__init__(task.target_dictionary) + self.cfg = cfg + + # remove any cutoffs larger than the vocab size + cutoffs = [ + cutoff for cutoff in cfg.cutoffs if cutoff < len(task.target_dictionary) + ] + + config = TransfoXLConfig( + vocab_size=len(task.target_dictionary), + cutoffs=cutoffs, + d_model=cfg.d_model, + d_embed=cfg.d_model, + n_head=cfg.n_head, + d_head=cfg.d_head, + d_inner=cfg.d_inner, + div_val=cfg.div_val, + n_layer=cfg.n_layer, + mem_len=cfg.mem_len, + clamp_len=cfg.clamp_len, + same_length=cfg.same_length, + dropout=cfg.dropout, + dropatt=cfg.dropatt, + ) + logger.info(config) + self.model = TransfoXLLMHeadModel(config) + + # Workaround a bug in huggingface's ``ProjectedAdaptiveLogSoftmax`` + # which adds ``None`` values to an ``nn.ParameterList``, which is not + # supported in PyTorch. Instead we can replace this with an + # ``nn.ModuleList``, which does support ``None`` values. 
+ try: + if all(p is None for p in self.model.crit.out_projs._parameters.values()): + self.model.crit.out_projs = torch.nn.ModuleList( + [None] * len(self.model.crit.out_projs._parameters) + ) + except Exception: + pass + + if cfg.checkpoint_activations or cfg.offload_activations: + for i in range(len(self.model.transformer.layers)): + self.model.transformer.layers[i] = checkpoint_wrapper( + self.model.transformer.layers[i], + offload_to_cpu=cfg.offload_activations, + ) + # TODO: may save mem to wrap(layer.pos_ff.CoreNet[3]) + + self._mems = None + + def forward( + self, + src_tokens, + src_lengths=None, # unused + incremental_state: Optional[Dict[str, List[torch.Tensor]]] = None, + encoder_out=None, + ): + if incremental_state is not None: # used during inference + mems = self.get_incremental_state(incremental_state, "mems") + src_tokens = src_tokens[:, -1:] # only keep the most recent token + else: + mems = self._mems + + output = self.model( + input_ids=src_tokens, + mems=mems, + return_dict=False, + ) + + if len(output) >= 2: + if incremental_state is not None: + self.set_incremental_state(incremental_state, "mems", output[1]) + else: + self._mems = output[1] + + return (output[0],) + + def max_positions(self): + return self.cfg.max_target_positions + + def reorder_incremental_state( + self, + incremental_state: Dict[str, Dict[str, Optional[torch.Tensor]]], + new_order: torch.Tensor, + ): + """Reorder incremental state. + + This will be called when the order of the input has changed from the + previous time step. A typical use case is beam search, where the input + order changes between time steps based on the selection of beams. + """ + mems = self.get_incremental_state(incremental_state, "mems") + if mems is not None: + new_mems = [mems_i.index_select(1, new_order) for mems_i in mems] + self.set_incremental_state(incremental_state, "mems", new_mems) diff --git a/SpeechT5/fairseq/examples/truncated_bptt/truncated_bptt_lm_task.py b/SpeechT5/fairseq/examples/truncated_bptt/truncated_bptt_lm_task.py new file mode 100644 index 0000000000000000000000000000000000000000..02be0e7fb4213b98798c85b79e9046e9990b97fc --- /dev/null +++ b/SpeechT5/fairseq/examples/truncated_bptt/truncated_bptt_lm_task.py @@ -0,0 +1,281 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +import logging +import os +from dataclasses import dataclass, field +from typing import List, Optional, Tuple + +import torch +from fairseq import utils +from fairseq.data import ( + Dictionary, + TokenBlockDataset, + data_utils, + iterators, +) +from fairseq.dataclass import FairseqDataclass +from fairseq.distributed import utils as dist_utils +from fairseq.tasks import FairseqTask, register_task +from omegaconf import II + + +logger = logging.getLogger(__name__) + + +@dataclass +class TruncatedBPTTLMConfig(FairseqDataclass): + data: str = field(default="???", metadata={"help": "path to data directory"}) + tokens_per_sample: int = field( + default=1024, + metadata={"help": "max number of tokens per sequence"}, + ) + batch_size: int = II("dataset.batch_size") + # Some models use *max_target_positions* to know how many positional + # embeddings to learn. We use II(...) to make it default to + # *tokens_per_sample*, but in principle there could be more positional + # embeddings than tokens in a single batch. This may also be irrelevant for + # custom model implementations. 
+ max_target_positions: int = II("task.tokens_per_sample") + # these will be populated automatically if not provided + data_parallel_rank: Optional[int] = None + data_parallel_size: Optional[int] = None + + +@register_task("truncated_bptt_lm", dataclass=TruncatedBPTTLMConfig) +class TruncatedBPTTLMTask(FairseqTask): + def __init__(self, cfg: TruncatedBPTTLMConfig): + super().__init__(cfg) + + if cfg.data_parallel_rank is None or cfg.data_parallel_size is None: + if torch.distributed.is_initialized(): + cfg.data_parallel_rank = dist_utils.get_data_parallel_rank() + cfg.data_parallel_size = dist_utils.get_data_parallel_world_size() + else: + cfg.data_parallel_rank = 0 + cfg.data_parallel_size = 1 + + # load the dictionary + paths = utils.split_paths(cfg.data) + assert len(paths) > 0 + self.dictionary = Dictionary.load(os.path.join(paths[0], "dict.txt")) + logger.info("dictionary: {} types".format(len(self.dictionary))) + + def load_dataset(self, split, epoch=1, combine=False, **kwargs): + """Load a given dataset split (e.g., train, valid, test)""" + + # support sharded datasets + paths = utils.split_paths(self.cfg.data) + assert len(paths) > 0 + data_path = paths[(epoch - 1) % len(paths)] + split_path = os.path.join(data_path, split) + + # each element of *data* will be a tensorized line from the original + # text dataset, similar to ``open(split_path).readlines()`` + data = data_utils.load_indexed_dataset( + split_path, self.dictionary, combine=combine + ) + if data is None: + raise FileNotFoundError( + "Dataset not found: {} ({})".format(split, split_path) + ) + + # this is similar to ``data.view(-1).split(tokens_per_sample)`` + data = TokenBlockDataset( + data, + data.sizes, + block_size=self.cfg.tokens_per_sample, + pad=None, # unused + eos=None, # unused + break_mode="none", + ) + + self.datasets[split] = TruncatedBPTTDataset( + data=data, + bsz_per_shard=self.cfg.batch_size, + shard_id=self.cfg.data_parallel_rank, + num_shards=self.cfg.data_parallel_size, + ) + + def dataset(self, split): + return self.datasets[split] + + def get_batch_iterator( + self, dataset, num_workers=0, epoch=1, data_buffer_size=0, **kwargs + ): + return iterators.EpochBatchIterator( + dataset=dataset, + collate_fn=self._collate_fn, + num_workers=num_workers, + epoch=epoch, + buffer_size=data_buffer_size, + # we don't use the batching functionality from EpochBatchIterator; + # instead every item in *dataset* is a whole batch + batch_sampler=[[i] for i in range(len(dataset))], + disable_shuffling=True, + ) + + def _collate_fn(self, items: List[List[torch.Tensor]]): + # we don't use fairseq's batching functionality, so we expect a single + # Tensor of type List[torch.Tensor] + assert len(items) == 1 + + # item will have shape B x T (the last batch may have length < T) + id, item = items[0] + item = data_utils.collate_tokens(item, pad_idx=self.source_dictionary.pad()) + B, T = item.size() + + # shift item one position over and append a padding token for the target + target = torch.nn.functional.pad( + item[:, 1:], (0, 1, 0, 0), value=self.target_dictionary.pad() + ) + + # fairseq expects batches to have the following structure + return { + "id": torch.tensor([id]*item.size(0)), + "net_input": { + "src_tokens": item, + }, + "target": target, + "nsentences": item.size(0), + "ntokens": item.numel(), + } + + def build_dataset_for_inference( + self, src_tokens: List[torch.Tensor], src_lengths: List[int], **kwargs + ) -> torch.utils.data.Dataset: + eos = self.source_dictionary.eos() + dataset = TokenBlockDataset( + 
src_tokens, + src_lengths, + block_size=None, # ignored for "eos" break mode + pad=self.source_dictionary.pad(), + eos=eos, + break_mode="eos", + ) + + class Dataset(torch.utils.data.Dataset): + def __getitem__(self, i): + item = dataset[i] + if item[-1] == eos: + # remove eos to support generating with a prefix + item = item[:-1] + return (i, [item]) + + def __len__(self): + return len(dataset) + + return Dataset() + + def inference_step( + self, generator, models, sample, prefix_tokens=None, constraints=None + ): + with torch.no_grad(): + if constraints is not None: + raise NotImplementedError + + # SequenceGenerator doesn't use *src_tokens* directly, we need to + # pass the *prefix_tokens* argument instead. + if prefix_tokens is None and sample["net_input"]["src_tokens"].nelement(): + prefix_tokens = sample["net_input"]["src_tokens"] + + # begin generation with the end-of-sentence token + bos_token = self.source_dictionary.eos() + + return generator.generate( + models, sample, prefix_tokens=prefix_tokens, bos_token=bos_token + ) + + def eval_lm_dataloader( + self, + dataset, + max_tokens: Optional[int] = 36000, + batch_size: Optional[int] = None, + max_positions: Optional[int] = None, + num_shards: int = 1, + shard_id: int = 0, + num_workers: int = 1, + data_buffer_size: int = 10, + context_window: int = 0, + ): + if context_window > 0: + raise NotImplementedError( + "Transformer-XL doesn't need --context-window, try " + "--model-overrides '{\"mem_len\":42}' instead " + ) + return self.get_batch_iterator( + dataset=dataset, + max_tokens=max_tokens, + max_sentences=batch_size, + max_positions=max_positions, + ignore_invalid_inputs=True, + num_shards=num_shards, + shard_id=shard_id, + num_workers=num_workers, + data_buffer_size=data_buffer_size, + ).next_epoch_itr(shuffle=False) + + @property + def source_dictionary(self): + return self.dictionary + + @property + def target_dictionary(self): + return self.dictionary + + +class TruncatedBPTTDataset(torch.utils.data.Dataset): + def __init__( + self, + data: List[torch.Tensor], # ordered list of items + bsz_per_shard, # number of items processed per GPUs per forward + shard_id, # current GPU ID + num_shards, # number of GPUs + ): + super().__init__() + self.data = data + + def batchify(data, bsz): + # Work out how cleanly we can divide the dataset into bsz parts. + nbatch = data.size(0) // bsz + # Trim off any extra elements that wouldn't cleanly fit (remainders). + data = data.narrow(0, 0, nbatch * bsz) + # Evenly divide the data across the bsz batches. 
+ data = data.view(bsz, -1).contiguous() + return data + + # total number of sequences processed by all GPUs in each forward pass + global_batch_size = bsz_per_shard * num_shards + + """ + With a 16 item dataset, bsz_per_shard=2 and num_shards=3, + *indices* might look like: + + indices = [[0, 1], + [2, 3], + [4, 5], + [6, 7], + [8, 9], + [10, 11]] + + The size of the TruncatedBPTTDataset instance will be 2, + and shard 1 will see items: + + [(0, [data[4], data[6]]), + (1, [data[5], data[7]])] + """ + indices = batchify(torch.arange(len(data)), global_batch_size) + assert indices.size(0) == global_batch_size + + self.my_indices = indices[ + shard_id * bsz_per_shard : (shard_id + 1) * bsz_per_shard + ] + assert self.my_indices.size(0) == bsz_per_shard + + def __len__(self): + return self.my_indices.size(1) + + def __getitem__(self, i) -> Tuple[int, List[torch.Tensor]]: + return (i, [self.data[idx] for idx in self.my_indices[:, i]]) diff --git a/SpeechT5/fairseq/examples/unsupervised_quality_estimation/README.md b/SpeechT5/fairseq/examples/unsupervised_quality_estimation/README.md new file mode 100644 index 0000000000000000000000000000000000000000..e86a0d13b883af0c37fdc2c1fee9b0b9dff4d18c --- /dev/null +++ b/SpeechT5/fairseq/examples/unsupervised_quality_estimation/README.md @@ -0,0 +1,126 @@ +# Unsupervised Quality Estimation for Neural Machine Translation (Fomicheva et al., 2020) + +This page includes instructions for reproducing results from the paper [Unsupervised Quality Estimation for Neural +Machine Translation (Fomicheva et al., 2020)](https://arxiv.org/abs/2005.10608) + +## Requirements: + +* mosesdecoder: https://github.com/moses-smt/mosesdecoder +* subword-nmt: https://github.com/rsennrich/subword-nmt +* flores: https://github.com/facebookresearch/flores + +## Download Models and Test Data + +Download translation models and test data from [MLQE dataset repository](https://github.com/facebookresearch/mlqe). + +## Set up: + +Given a testset consisting of source sentences and reference translations: + +* `SRC_LANG`: source language +* `TGT_LANG`: target language +* `INPUT`: input prefix, such that the file `$INPUT.$SRC_LANG` contains source sentences and `$INPUT.$TGT_LANG` +contains the reference sentences +* `OUTPUT_DIR`: output path to store results +* `MOSES_DECODER`: path to mosesdecoder installation +* `BPE_ROOT`: path to subword-nmt installation +* `BPE`: path to BPE model +* `MODEL_DIR`: directory containing the NMT model `.pt` file as well as the source and target vocabularies. +* `TMP`: directory for intermediate temporary files +* `GPU`: if translating with GPU, id of the GPU to use for inference +* `DROPOUT_N`: number of stochastic forward passes + +`$DROPOUT_N` is set to 30 in the experiments reported in the paper. However, we observed that increasing it beyond 10 +does not bring substantial improvements. 
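+
+As a concrete, purely illustrative sketch, the variables above could be exported along these lines before running the commands in the following sections; every path and value below is a placeholder to be replaced with your own tools, model files and data:
+
+```
+# Hypothetical values -- adjust everything to your setup
+export SRC_LANG=en
+export TGT_LANG=de                          # example language pair
+export INPUT=data/test20                    # expects data/test20.en and data/test20.de
+export OUTPUT_DIR=outputs/${SRC_LANG}-${TGT_LANG}
+export MOSES_DECODER=$HOME/tools/mosesdecoder
+export BPE_ROOT=$HOME/tools/subword-nmt/subword_nmt
+export MODEL_DIR=models/${SRC_LANG}-${TGT_LANG}
+export BPE=$MODEL_DIR/bpecodes              # assumed name of the BPE model file
+export TMP=$(mktemp -d)
+export GPU=0
+export DROPOUT_N=30
+mkdir -p "$OUTPUT_DIR"
+```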
+ +## Translate the data using standard decoding + +Preprocess the input data: +``` +for LANG in $SRC_LANG $TGT_LANG; do + perl $MOSES_DECODER/scripts/tokenizer/tokenizer.perl -threads 80 -a -l $LANG < $INPUT.$LANG > $TMP/preprocessed.tok.$LANG + python $BPE_ROOT/apply_bpe.py -c ${BPE} < $TMP/preprocessed.tok.$LANG > $TMP/preprocessed.tok.bpe.$LANG +done +``` + +Binarize the data for faster translation: + +``` +fairseq-preprocess --srcdict $MODEL_DIR/dict.$SRC_LANG.txt --tgtdict $MODEL_DIR/dict.$TGT_LANG.txt +--source-lang ${SRC_LANG} --target-lang ${TGT_LANG} --testpref $TMP/preprocessed.tok.bpe --destdir $TMP/bin --workers 4 +``` + +Translate + +``` +CUDA_VISIBLE_DEVICES=$GPU fairseq-generate $TMP/bin --path ${MODEL_DIR}/${SRC_LANG}-${TGT_LANG}.pt --beam 5 +--source-lang $SRC_LANG --target-lang $TGT_LANG --no-progress-bar --unkpen 5 > $TMP/fairseq.out +grep ^H $TMP/fairseq.out | cut -d- -f2- | sort -n | cut -f3- > $TMP/mt.out +``` + +Post-process + +``` +sed -r 's/(@@ )| (@@ ?$)//g' < $TMP/mt.out | perl $MOSES_DECODER/scripts/tokenizer/detokenizer.perl +-l $TGT_LANG > $OUTPUT_DIR/mt.out +``` + +## Produce uncertainty estimates + +### Scoring + +Make temporary files to store the translations repeated N times. + +``` +python ${SCRIPTS}/scripts/uncertainty/repeat_lines.py -i $TMP/preprocessed.tok.bpe.$SRC_LANG -n $DROPOUT_N +-o $TMP/repeated.$SRC_LANG +python ${SCRIPTS}/scripts/uncertainty/repeat_lines.py -i $TMP/mt.out -n $DROPOUT_N -o $TMP/repeated.$TGT_LANG + +fairseq-preprocess --srcdict ${MODEL_DIR}/dict.${SRC_LANG}.txt $TGT_DIC --source-lang ${SRC_LANG} +--target-lang ${TGT_LANG} --testpref ${TMP}/repeated --destdir ${TMP}/bin-repeated +``` + +Produce model scores for the generated translations using `--retain-dropout` option to apply dropout at inference time: + +``` +CUDA_VISIBLE_DEVICES=${GPU} fairseq-generate ${TMP}/bin-repeated --path ${MODEL_DIR}/${LP}.pt --beam 5 + --source-lang $SRC_LANG --target-lang $TGT_LANG --no-progress-bar --unkpen 5 --score-reference --retain-dropout + --retain-dropout-modules '["TransformerModel","TransformerEncoder","TransformerDecoder","TransformerEncoderLayer"]' + TransformerDecoderLayer --seed 46 > $TMP/dropout.scoring.out + +grep ^H $TMP/dropout.scoring.out | cut -d- -f2- | sort -n | cut -f2 > $TMP/dropout.scores + +``` + +Use `--retain-dropout-modules` to specify the modules. By default, dropout is applied in the same places +as for training. 
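+
+At this point `$TMP/dropout.scores` holds `$DROPOUT_N` consecutive scores per source sentence, one per stochastic forward pass, and the next step simply averages each such block. As an informal sanity check, assuming one score per line, an equivalent awk one-liner would be:
+
+```
+awk -v n=$DROPOUT_N '{ s += $1 } NR % n == 0 { print s / n; s = 0 }' $TMP/dropout.scores
+```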
+ +Compute the mean of the resulting output distribution: + +``` +python $SCRIPTS/scripts/uncertainty/aggregate_scores.py -i $TMP/dropout.scores -o $OUTPUT_DIR/dropout.scores.mean +-n $DROPOUT_N +``` + +### Generation + +Produce multiple translation hypotheses for the same source using `--retain-dropout` option: + +``` +CUDA_VISIBLE_DEVICES=${GPU} fairseq-generate ${TMP}/bin-repeated --path ${MODEL_DIR}/${LP}.pt + --beam 5 --source-lang $SRC_LANG --target-lang $TGT_LANG --no-progress-bar --retain-dropout + --unkpen 5 --retain-dropout-modules TransformerModel TransformerEncoder TransformerDecoder +TransformerEncoderLayer TransformerDecoderLayer --seed 46 > $TMP/dropout.generation.out + +grep ^H $TMP/dropout.generation.out | cut -d- -f2- | sort -n | cut -f3- > $TMP/dropout.hypotheses_ + +sed -r 's/(@@ )| (@@ ?$)//g' < $TMP/dropout.hypotheses_ | perl $MOSES_DECODER/scripts/tokenizer/detokenizer.perl +-l $TGT_LANG > $TMP/dropout.hypotheses +``` + +Compute similarity between multiple hypotheses corresponding to the same source sentence using Meteor +evaluation metric: +``` +python meteor.py -i $TMP/dropout.hypotheses -m <path_to_meteor_installation> -n $DROPOUT_N -o +$OUTPUT_DIR/dropout.gen.sim.meteor +``` diff --git a/SpeechT5/fairseq/examples/unsupervised_quality_estimation/aggregate_scores.py b/SpeechT5/fairseq/examples/unsupervised_quality_estimation/aggregate_scores.py new file mode 100644 index 0000000000000000000000000000000000000000..66d50d07ff2067b802b90a2aadd88df23153830a --- /dev/null +++ b/SpeechT5/fairseq/examples/unsupervised_quality_estimation/aggregate_scores.py @@ -0,0 +1,41 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +import argparse +import sys + +import numpy as np + + +aggregate_funcs = { + "std": np.std, + "var": np.var, + "median": np.median, + "mean": np.mean, + "min": np.min, + "max": np.max, +} + + +def main(): + parser = argparse.ArgumentParser() + parser.add_argument("-i", "--input_file", required=True, type=str) + parser.add_argument("-n", "--repeat_times", required=True, type=int) + parser.add_argument("-o", "--output_file", required=False) + parser.add_argument("-f", "--func", required=False, default="mean") + args = parser.parse_args() + + stream = open(args.output_file, "w") if args.output_file else sys.stdout + + segment_scores = [] + for line in open(args.input_file): + segment_scores.append(float(line.strip())) + if len(segment_scores) == args.repeat_times: + stream.write("{}\n".format(aggregate_funcs[args.func](segment_scores))) + segment_scores = [] + + +if __name__ == "__main__": + main() diff --git a/SpeechT5/fairseq/examples/unsupervised_quality_estimation/meteor.py b/SpeechT5/fairseq/examples/unsupervised_quality_estimation/meteor.py new file mode 100644 index 0000000000000000000000000000000000000000..2ee0448cf1f167f6f3ecee56ad807922cffb0956 --- /dev/null +++ b/SpeechT5/fairseq/examples/unsupervised_quality_estimation/meteor.py @@ -0,0 +1,109 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. 
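+
+# Summary: reads N translation hypotheses per source sentence (N = --repeat_times),
+# scores every pair of hypotheses within each group using the external Meteor jar
+# passed via -m, and writes the mean pairwise Meteor score per sentence to -o.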
+ +import argparse +import math +import os +import subprocess +import sys +import tempfile +from collections import defaultdict +from itertools import combinations + + +def read_translations(path, n_repeats): + segment_counter = 0 + segment_translations = [] + translations = defaultdict(list) + for line in open(path): + segment_translations.append(" ".join(line.split())) + if len(segment_translations) == n_repeats: + translations[segment_counter] = segment_translations + segment_translations = [] + segment_counter += 1 + return translations + + +def generate_input(translations, n_repeats): + _, ref_path = tempfile.mkstemp() + _, mt_path = tempfile.mkstemp() + ref_fh = open(ref_path, "w") + mt_fh = open(mt_path, "w") + for segid in sorted(translations.keys()): + assert len(translations[segid]) == n_repeats + indexes = combinations(range(n_repeats), 2) + for idx1, idx2 in indexes: + mt_fh.write(translations[segid][idx1].strip() + "\n") + ref_fh.write(translations[segid][idx2].strip() + "\n") + sys.stderr.write("\nSaved translations to %s and %s" % (ref_path, mt_path)) + return ref_path, mt_path + + +def run_meteor(ref_path, mt_path, metric_path, lang="en"): + _, out_path = tempfile.mkstemp() + subprocess.call( + [ + "java", + "-Xmx2G", + "-jar", + metric_path, + mt_path, + ref_path, + "-p", + "0.5 0.2 0.6 0.75", # default parameters, only changed alpha to give equal weight to P and R + "-norm", + "-l", + lang, + ], + stdout=open(out_path, "w"), + ) + os.remove(ref_path) + os.remove(mt_path) + sys.stderr.write("\nSaved Meteor output to %s" % out_path) + return out_path + + +def read_output(meteor_output_path, n_repeats): + n_combinations = math.factorial(n_repeats) / ( + math.factorial(2) * math.factorial(n_repeats - 2) + ) + raw_scores = [] + average_scores = [] + for line in open(meteor_output_path): + if not line.startswith("Segment "): + continue + score = float(line.strip().split("\t")[1]) + raw_scores.append(score) + if len(raw_scores) == n_combinations: + average_scores.append(sum(raw_scores) / n_combinations) + raw_scores = [] + os.remove(meteor_output_path) + return average_scores + + +def main(): + parser = argparse.ArgumentParser() + parser.add_argument("-i", "--infile") + parser.add_argument("-n", "--repeat_times", type=int) + parser.add_argument("-m", "--meteor") + parser.add_argument("-o", "--output") + args = parser.parse_args() + + translations = read_translations(args.infile, args.repeat_times) + sys.stderr.write("\nGenerating input for Meteor...") + ref_path, mt_path = generate_input(translations, args.repeat_times) + sys.stderr.write("\nRunning Meteor...") + out_path = run_meteor(ref_path, mt_path, args.meteor) + sys.stderr.write("\nReading output...") + scores = read_output(out_path, args.repeat_times) + sys.stderr.write("\nWriting results...") + with open(args.output, "w") as o: + for scr in scores: + o.write("{}\n".format(scr)) + o.close() + + +if __name__ == "__main__": + main() diff --git a/SpeechT5/fairseq/examples/unsupervised_quality_estimation/repeat_lines.py b/SpeechT5/fairseq/examples/unsupervised_quality_estimation/repeat_lines.py new file mode 100644 index 0000000000000000000000000000000000000000..5a04851a74624e9c8ebc259805b7aed6c638b0de --- /dev/null +++ b/SpeechT5/fairseq/examples/unsupervised_quality_estimation/repeat_lines.py @@ -0,0 +1,28 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. 
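+
+# Summary: writes every input line --repeat_times times in a row (after collapsing
+# repeated whitespace), producing the repeated source/hypothesis files expected by
+# the dropout-based scoring step described in the README.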
+ +import argparse +import sys + + +def _normalize_spaces(line): + return " ".join(line.split()) + + +def main(): + parser = argparse.ArgumentParser() + parser.add_argument("-i", "--input_file", required=True, type=str) + parser.add_argument("-n", "--repeat_times", required=True, type=int) + parser.add_argument("-o", "--output_file", required=False, type=str) + args = parser.parse_args() + stream = open(args.output_file, "w") if args.output_file else sys.stdout + + for line in open(args.input_file): + for _ in range(args.repeat_times): + stream.write(_normalize_spaces(line) + "\n") + + +if __name__ == "__main__": + main() diff --git a/SpeechT5/fairseq/examples/wav2vec/README.md b/SpeechT5/fairseq/examples/wav2vec/README.md new file mode 100644 index 0000000000000000000000000000000000000000..238639a9ba2474481cfb93bb94d42ac62613897d --- /dev/null +++ b/SpeechT5/fairseq/examples/wav2vec/README.md @@ -0,0 +1,369 @@ +# wav2vec 2.0 + +wav2vec 2.0 learns speech representations on unlabeled data as described in [wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations (Baevski et al., 2020)](https://arxiv.org/abs/2006.11477). + +We learned speech representations in multiple languages as well in [Unsupervised Cross-lingual Representation Learning for Speech Recognition (Conneau et al., 2020)](https://arxiv.org/abs/2006.13979). + +We also combined wav2vec 2.0 with self-training in [Self-training and Pre-training are Complementary for Speech Recognition (Xu et al., 2020)](https://arxiv.org/abs/2010.11430). + +## Pre-trained models + +Model | Finetuning split | Dataset | Model +|---|---|---|--- +Wav2Vec 2.0 Base | No finetuning | [Librispeech](http://www.openslr.org/12) | [download](https://dl.fbaipublicfiles.com/fairseq/wav2vec/wav2vec_small.pt) +Wav2Vec 2.0 Base | 10 minutes | [Librispeech](http://www.openslr.org/12) | [download](https://dl.fbaipublicfiles.com/fairseq/wav2vec/wav2vec_small_10m.pt) +Wav2Vec 2.0 Base | 100 hours | [Librispeech](http://www.openslr.org/12) | [download](https://dl.fbaipublicfiles.com/fairseq/wav2vec/wav2vec_small_100h.pt) +Wav2Vec 2.0 Base | 960 hours | [Librispeech](http://www.openslr.org/12) | [download](https://dl.fbaipublicfiles.com/fairseq/wav2vec/wav2vec_small_960h.pt) +Wav2Vec 2.0 Large | No finetuning | [Librispeech](http://www.openslr.org/12) | [download](https://dl.fbaipublicfiles.com/fairseq/wav2vec/libri960_big.pt) +Wav2Vec 2.0 Large | 10 minutes | [Librispeech](http://www.openslr.org/12) | [download](https://dl.fbaipublicfiles.com/fairseq/wav2vec/wav2vec_big_10m.pt) +Wav2Vec 2.0 Large | 100 hours | [Librispeech](http://www.openslr.org/12) | [download](https://dl.fbaipublicfiles.com/fairseq/wav2vec/wav2vec_big_100h.pt) +Wav2Vec 2.0 Large | 960 hours | [Librispeech](http://www.openslr.org/12) | [download](https://dl.fbaipublicfiles.com/fairseq/wav2vec/wav2vec_big_960h.pt) +Wav2Vec 2.0 Large (LV-60)* | No finetuning | [Libri-Light](https://github.com/facebookresearch/libri-light) | [download](https://dl.fbaipublicfiles.com/fairseq/wav2vec/wav2vec_vox_new.pt) +Wav2Vec 2.0 Large (LV-60)* | 10 minutes | [Libri-Light](https://github.com/facebookresearch/libri-light) + [Librispeech](http://www.openslr.org/12) | [download](https://dl.fbaipublicfiles.com/fairseq/wav2vec/wav2vec_vox_10m_new.pt) +Wav2Vec 2.0 Large (LV-60)* | 100 hours | [Libri-Light](https://github.com/facebookresearch/libri-light) + [Librispeech](http://www.openslr.org/12) | [download](https://dl.fbaipublicfiles.com/fairseq/wav2vec/wav2vec_vox_100h_new.pt) +Wav2Vec 2.0 Large 
(LV-60)* | 960 hours | [Libri-Light](https://github.com/facebookresearch/libri-light) + [Librispeech](http://www.openslr.org/12) | [download](https://dl.fbaipublicfiles.com/fairseq/wav2vec/wav2vec2_vox_960h_new.pt) +Wav2Vec 2.0 Large (LV-60) + Self Training * | 10 minutes | [Libri-Light](https://github.com/facebookresearch/libri-light) + [Librispeech](http://www.openslr.org/12) | [download](https://dl.fbaipublicfiles.com/fairseq/wav2vec/wav2vec_vox_10m_pl.pt) +Wav2Vec 2.0 Large (LV-60) + Self Training * | 100 hours | [Libri-Light](https://github.com/facebookresearch/libri-light) + [Librispeech](http://www.openslr.org/12) | [download](https://dl.fbaipublicfiles.com/fairseq/wav2vec/wav2vec_vox_100h_pl.pt) +Wav2Vec 2.0 Large (LV-60) + Self Training * | 960 hours | [Libri-Light](https://github.com/facebookresearch/libri-light) + [Librispeech](http://www.openslr.org/12) | [download](https://dl.fbaipublicfiles.com/fairseq/wav2vec/wav2vec_vox_960h_pl.pt) + +\* updated (Oct. 24, 2020) + +We also release multilingual pre-trained wav2vec 2.0 (XLSR) models: + +Model | Architecture | Hours | Languages | Datasets | Model +|---|---|---|---|---|--- +XLSR-53 | Large | 56k | 53 | MLS, CommonVoice, BABEL | [download](https://dl.fbaipublicfiles.com/fairseq/wav2vec/xlsr_53_56k.pt) + +The XLSR model uses the following datasets for multilingual pretraining: + +* **[MLS: Multilingual LibriSpeech](https://indico2.conference4me.psnc.pl/event/35/contributions/3585/attachments/1060/1101/Wed-2-6-10.pdf)** (8 languages, 50.7k hours): *Dutch, English, French, German, Italian, Polish, Portuguese, Spanish* + +* **[CommonVoice](https://commonvoice.mozilla.org/en/languages)** (36 languages, 3.6k hours): *Arabic, Basque, Breton, Chinese (CN), Chinese (HK), Chinese (TW), Chuvash, Dhivehi, Dutch, English, Esperanto, Estonian, French, German, Hakh-Chin, Indonesian, Interlingua, Irish, Italian, Japanese, Kabyle, Kinyarwanda, Kyrgyz, Latvian, Mongolian, Persian, Portuguese, Russian, Sakha, Slovenian, Spanish, Swedish, Tamil, Tatar, Turkish, Welsh* (see also [finetuning splits]([https://dl.fbaipublicfiles.com/cpc_audio/common_voices_splits.tar.gz]) from [this paper](https://arxiv.org/abs/2002.02848)). + +* **[Babel](https://catalog.ldc.upenn.edu/byyear)** (17 languages, 1.7k hours): *Assamese, Bengali, Cantonese, Cebuano, Georgian, Haitian, Kazakh, Kurmanji, Lao, Pashto, Swahili, Tagalog, Tamil, Tok, Turkish, Vietnamese, Zulu* + + +## Training a new model with the CLI tools + +Given a directory containing wav files to be used for pretraining (we recommend splitting each file into separate file 10 to 30 seconds in length) + +### Prepare training data manifest: + +First, install the `soundfile` library: +```shell script +pip install soundfile +``` + +Next, run: + +```shell script +$ python examples/wav2vec/wav2vec_manifest.py /path/to/waves --dest /manifest/path --ext $ext --valid-percent $valid +``` + +$ext should be set to flac, wav, or whatever format your dataset happens to use that soundfile can read. + +$valid should be set to some reasonable percentage (like 0.01) of training data to use for validation. +To use a pre-defined validation set (like dev-other from librispeech), set to it 0 and then overwrite valid.tsv with a +separately pre-processed manifest file. 
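+
+The resulting `train.tsv`/`valid.tsv` manifests are plain tab-separated files: the first line holds the root directory passed to the script, and each following line holds an audio path relative to that root plus its length in samples, roughly like this (paths and lengths are illustrative):
+
+```
+/path/to/waves
+speaker1/utt1.flac	97840
+speaker1/utt2.flac	163920
+speaker2/utt1.flac	54480
+```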
+ +### Train a wav2vec 2.0 base model: + +This configuration was used for the base model trained on the Librispeech dataset in the wav2vec 2.0 paper + +Note that the input is expected to be single channel, sampled at 16 kHz + +```shell script +$ fairseq-hydra-train \ + task.data=/path/to/data \ + --config-dir /path/to/fairseq-py/examples/wav2vec/config/pretraining \ + --config-name wav2vec2_base_librispeech +``` + +Note: you can simulate 64 GPUs by using k GPUs and adding command line parameters (before `--config-dir`) +`distributed_training.distributed_world_size=k` `+optimization.update_freq='[x]'` where x = 64/k + +### Train a wav2vec 2.0 large model: + +This configuration was used for the large model trained on the Libri-light dataset in the wav2vec 2.0 paper + +```shell script +$ fairseq-hydra-train \ + task.data=/path/to/data \ + --config-dir /path/to/fairseq-py/examples/wav2vec/config/pretraining \ + --config-name wav2vec2_large_librivox +``` + +Note: you can simulate 128 GPUs by using k GPUs and adding command line parameters (before `--config-dir`) +`distributed_training.distributed_world_size=k` `+optimization.update_freq='[x]'` where x = 128/k + +### Fine-tune a pre-trained model with CTC: + +Fine-tuning a model requires parallel audio and labels file, as well as a vocabulary file in fairseq format. +A letter vocabulary can be downloaded [here](https://dl.fbaipublicfiles.com/fairseq/wav2vec/dict.ltr.txt). +An example [script](libri_labels.py) that generates labels for the Librispeech dataset from the tsv file produced by wav2vec_manifest.py can be used as follows: + +```shell script +split=train +$ python libri_labels.py /path/to/tsv --output-dir /output/dir --output-name $split +``` + +Fine-tuning on 100h of Librispeech with letter targets: +```shell script +$ fairseq-hydra-train \ + distributed_training.distributed_port=$PORT \ + task.data=/path/to/data \ + model.w2v_path=/path/to/model.pt \ + --config-dir /path/to/fairseq-py/examples/wav2vec/config/finetuning \ + --config-name base_100h +``` + +There are other config files in the config/finetuning directory that can be used to fine-tune on other splits. +You can specify the right config via the `--config-name` parameter. + +Note: you can simulate 24 GPUs by using k GPUs and adding command line parameters (before `--config-dir`) +`distributed_training.distributed_world_size=k` `+optimization.update_freq='[x]'` where x = 24/k + +Decoding with a language model during training requires flashlight [python bindings](https://github.com/facebookresearch/flashlight/tree/master/bindings/python) (previously called [wav2letter](https://github.com/facebookresearch/wav2letter). +If you want to use a language model, add `+criterion.wer_args='[/path/to/kenlm, /path/to/lexicon, 2, -1]'` to the command line. + +### Evaluating a CTC model: + +Evaluating a CTC model with a language model requires [flashlight python bindings](https://github.com/facebookresearch/flashlight/tree/master/bindings/python) (previously called [wav2letter](https://github.com/facebookresearch/wav2letter) to be installed. + +Fairseq transformer language model used in the wav2vec 2.0 paper can be obtained from the [wav2letter model repository](https://github.com/facebookresearch/wav2letter/tree/master/recipes/sota/2019). +Be sure to upper-case the language model vocab after downloading it. + +Letter dictionary for pre-trained models can be found [here](https://dl.fbaipublicfiles.com/fairseq/wav2vec/dict.ltr.txt). 
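+
+For reference, `libri_labels.py` writes two label files per split: `$split.txt` with the plain word transcription and `$split.ltr` with the same line spelled out letter by letter, using `|` as the word boundary (the `ltr` labels are what the fine-tuning configs consume). For example:
+
+```
+# $split.txt
+A MAN SAID TO THE UNIVERSE I EXIST
+# $split.ltr
+A | M A N | S A I D | T O | T H E | U N I V E R S E | I | E X I S T |
+```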
+ +Next, run the evaluation command: + +```shell script +$subset=dev_other +python examples/speech_recognition/infer.py /checkpoint/abaevski/data/speech/libri/10h/wav2vec/raw --task audio_pretraining \ +--nbest 1 --path /path/to/model --gen-subset $subset --results-path /path/to/save/results/for/sclite --w2l-decoder kenlm \ +--lm-model /path/to/kenlm.bin --lm-weight 2 --word-score -1 --sil-weight 0 --criterion ctc --labels ltr --max-tokens 4000000 \ +--post-process letter +``` + +To get raw numbers, use --w2l-decoder viterbi and omit the lexicon. To use the transformer language model, use --w2l-decoder fairseqlm. + +## Use wav2vec 2.0 with 🤗Transformers: + +Wav2Vec2 is also available in the [🤗Transformers library](https://github.com/huggingface/transformers) since version 4.4. + +Pretrained Models can be found on the [hub](https://huggingface.co/models?filter=wav2vec2) +and documentation can be found [here](https://huggingface.co/transformers/master/model_doc/wav2vec2.html). + +Usage example: + +```python +# !pip install transformers +# !pip install datasets +import soundfile as sf +import torch +from datasets import load_dataset +from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor + +# load pretrained model +processor = Wav2Vec2Processor.from_pretrained("facebook/wav2vec2-base-960h") +model = Wav2Vec2ForCTC.from_pretrained("facebook/wav2vec2-base-960h") + + +librispeech_samples_ds = load_dataset("patrickvonplaten/librispeech_asr_dummy", "clean", split="validation") + +# load audio +audio_input, sample_rate = sf.read(librispeech_samples_ds[0]["file"]) + +# pad input values and return pt tensor +input_values = processor(audio_input, sampling_rate=sample_rate, return_tensors="pt").input_values + +# INFERENCE + +# retrieve logits & take argmax +logits = model(input_values).logits +predicted_ids = torch.argmax(logits, dim=-1) + +# transcribe +transcription = processor.decode(predicted_ids[0]) + +# FINE-TUNE + +target_transcription = "A MAN SAID TO THE UNIVERSE I EXIST" + +# encode labels +with processor.as_target_processor(): + labels = processor(target_transcription, return_tensors="pt").input_ids + +# compute loss by passing labels +loss = model(input_values, labels=labels).loss +loss.backward() +``` + +# wav2vec + +Example to train a wav2vec model as described in [wav2vec: Unsupervised Pre-training for Speech Recognition (Schneider et al., 2019)](https://arxiv.org/abs/1904.05862). 
+ +## Pre-trained models + +Description | Dataset | Model +---|---|--- +Wav2Vec large | [Librispeech](http://www.openslr.org/12) | [download](https://dl.fbaipublicfiles.com/fairseq/wav2vec/wav2vec_large.pt) + +#### Example usage: +```python +import torch +import fairseq + +cp_path = '/path/to/wav2vec.pt' +model, cfg, task = fairseq.checkpoint_utils.load_model_ensemble_and_task([cp_path]) +model = model[0] +model.eval() + +wav_input_16khz = torch.randn(1,10000) +z = model.feature_extractor(wav_input_16khz) +c = model.feature_aggregator(z) +``` + +## Training a new model with the CLI tools + +Given a directory containing wav files to be used for pretraining (we recommend splitting each file into separate files 10 to 30 seconds in length) + +### Prepare training data manifest: + +``` +$ python examples/wav2vec/wav2vec_manifest.py /path/to/waves --dest /manifest/path --ext wav +``` + +### Train a wav2vec model: + +``` +$ python train.py /manifest/path --save-dir /model/path --num-workers 6 --fp16 --max-update 400000 --save-interval 1 --no-epoch-checkpoints \ +--arch wav2vec --task audio_pretraining --min-lr 1e-06 --stop-min-lr 1e-09 --optimizer adam --lr 0.005 --lr-scheduler cosine \ +--conv-feature-layers [(512, 10, 5), (512, 8, 4), (512, 4, 2), (512, 4, 2), (512, 4, 2), (512, 1, 1), (512, 1, 1)] \ +--conv-aggregator-layers [(512, 2, 1), (512, 3, 1), (512, 4, 1), (512, 5, 1), (512, 6, 1), (512, 7, 1), (512, 8, 1), (512, 9, 1), (512, 10, 1), (512, 11, 1), (512, 12, 1), (512, 13, 1)] \ +--skip-connections-agg --residual-scale 0.5 --log-compression --warmup-updates 500 --warmup-init-lr 1e-07 --criterion wav2vec --num-negatives 10 \ +--max-sample-size 150000 --max-tokens 1500000 --skip-invalid-size-inputs-valid-test +``` + +### Run wav2vec2 pre-training on Google Cloud TPUs: + +Wav2Vec2 is now supported on TPUs! It's currently pre-training only. 
+ +#### Using hydra on a v3-8: + +``` +$ OMP_NUM_THREADS=1 fairseq-hydra-train \ + task.data=/manifest/path \ + --config-dir /PATH/TO/FAIRSEQ/examples/wav2vec/config/pretraining \ + --config-name wav2vec2_large_librivox_tpu.yaml +``` + +#### Using command line arguments on a v3-8: + +``` +$ OMP_NUM_THREADS=1 python train.py /manifest/path --save-dir /model/path --num-workers 6 --fp16 --max-update 400000 --save-interval 1 --no-epoch-checkpoints \ +--arch wav2vec2 --task audio_pretraining --min-lr 1e-06 --stop-min-lr 1e-09 --optimizer adam --lr 0.005 --lr-scheduler cosine \ +--conv-feature-layers [(512, 10, 5), (512, 8, 4), (512, 4, 2), (512, 4, 2), (512, 4, 2), (512, 1, 1), (512, 1, 1)] \ +--conv-aggregator-layers [(512, 2, 1), (512, 3, 1), (512, 4, 1), (512, 5, 1), (512, 6, 1), (512, 7, 1), (512, 8, 1), (512, 9, 1), (512, 10, 1), (512, 11, 1), (512, 12, 1), (512, 13, 1)] \ +--skip-connections-agg --residual-scale 0.5 --log-compression --warmup-updates 500 --warmup-init-lr 1e-07 --criterion wav2vec --num-negatives 10 \ +--max-sample-size 150000 --max-tokens 1500000 --skip-invalid-size-inputs-valid-test \ +--tpu --distributed-world-size 8 --num-batch-buckets 3 --enable-padding \ +--encoder-layerdrop 0 --mask-channel-prob 0.1 +``` + +#### Using hydra on a pod slice (v3-N with N > 8): + +``` +$ OMP_NUM_THREADS=1 fairseq-hydra-train \ + task.data=/manifest/path \ + --config-dir /PATH/TO/FAIRSEQ/examples/wav2vec/config/pretraining \ + --config-name wav2vec2_large_librivox_tpu-pod.yaml # edit distributed-world-size accordingly +``` + +#### Using command line arguments on a pod slice (v3-N with N > 8): + + +``` +$ python -m torch_xla.distributed.xla_dist \ + --tpu ${TPUNAME} --conda-env=torch-xla-${TORCH_XLA_VERSION} --env OMP_NUM_THREADS=1 \ + -- \ +python train.py /manifest/path --save-dir /model/path --num-workers 6 --fp16 --max-update 400000 --save-interval 1 --no-epoch-checkpoints \ +--arch wav2vec2 --task audio_pretraining --min-lr 1e-06 --stop-min-lr 1e-09 --optimizer adam --lr 0.005 --lr-scheduler cosine \ +--conv-feature-layers [(512, 10, 5), (512, 8, 4), (512, 4, 2), (512, 4, 2), (512, 4, 2), (512, 1, 1), (512, 1, 1)] \ +--conv-aggregator-layers [(512, 2, 1), (512, 3, 1), (512, 4, 1), (512, 5, 1), (512, 6, 1), (512, 7, 1), (512, 8, 1), (512, 9, 1), (512, 10, 1), (512, 11, 1), (512, 12, 1), (512, 13, 1)] \ +--skip-connections-agg --residual-scale 0.5 --log-compression --warmup-updates 500 --warmup-init-lr 1e-07 --criterion wav2vec --num-negatives 10 \ +--max-sample-size 150000 --max-tokens 1500000 --skip-invalid-size-inputs-valid-test \ +--tpu --distributed-world-size ${WORLD_SIZE} --num-batch-buckets 3 --enable-padding \ +--encoder-layerdrop 0 --mask-channel-prob 0.1 +``` + +### Extract embeddings from the downstream task data: + +``` +$ PYTHONPATH=/path/to/fairseq python examples/wav2vec/wav2vec_featurize.py --input /path/to/task/waves --output /path/to/output \ +--model /model/path/checkpoint_best.pt --split train valid test +``` + +# vq-wav2vec + +Example to train a vq-wav2vec model as described in [vq-wav2vec: Self-Supervised Learning of Discrete Speech Representations (Baevski et al., 2019)](https://arxiv.org/abs/1910.05453). + +These models are also used in [Effectiveness of self-supervised pre-training for speech recognition (Baevski et al., 2019)](https://arxiv.org/abs/1911.03912). 
+ +## Pre-trained models + +Description | Dataset | Model +---|---|--- +vq-wav2vec Gumbel | [Librispeech](http://www.openslr.org/12) | [download](https://dl.fbaipublicfiles.com/fairseq/wav2vec/vq-wav2vec.pt) +vq-wav2vec K-means | [Librispeech](http://www.openslr.org/12) | [download](https://dl.fbaipublicfiles.com/fairseq/wav2vec/vq-wav2vec_kmeans.pt) +Roberta on K-means codes | [Librispeech](http://www.openslr.org/12) | [download](https://dl.fbaipublicfiles.com/fairseq/wav2vec/bert_kmeans.tar) + +#### Example usage: +```python +import torch +import fairseq + +cp = torch.load('/path/to/vq-wav2vec.pt') +model, cfg, task = fairseq.checkpoint_utils.load_model_ensemble_and_task([cp]) +model = model[0] +model.eval() + +wav_input_16khz = torch.randn(1,10000) +z = model.feature_extractor(wav_input_16khz) +_, idxs = model.vector_quantizer.forward_idx(z) +print(idxs.shape) # output: torch.Size([1, 60, 2]), 60 timesteps with 2 indexes corresponding to 2 groups in the model +``` + +## Training a new model with the CLI tools + +Given a directory containing wav files to be used for pretraining (we recommend splitting each file into separate file 10 to 30 seconds in length) + +### Prepare training data manifest: + +``` +$ python examples/wav2vec/wav2vec_manifest.py /path/to/waves --dest /manifest/path --ext wav +``` + +### Train a gumbel vq-wav2vec model: + +``` +$ python train.py /manifest/path --save-dir /model/path --num-workers 6 --fp16 --max-update 400000 \ +--save-interval 1 --no-epoch-checkpoints --arch wav2vec --task audio_pretraining --min-lr 1e-06 --stop-min-lr 1e-09 \ +--optimizer adam --lr 1e-05 --lr-scheduler cosine \ +--conv-feature-layers [(512, 10, 5), (512, 8, 4), (512, 4, 2), (512, 4, 2), (512, 4, 2), (512, 1, 1), (512, 1, 1), (512, 1, 1)] \ +--conv-aggregator-layers [(512, 2, 1), (512, 3, 1), (512, 4, 1), (512, 5, 1), (512, 6, 1), (512, 7, 1), (512, 8, 1), (512, 9, 1), (512, 10, 1), (512, 11, 1), (512, 12, 1), (512, 13, 1)] \ +--activation gelu --offset auto --skip-connections-agg --residual-scale 0.5 \ +--log-keys ["prob_perplexity","code_perplexity","temp"] --vq-type gumbel --vq-groups 2 --vq-depth 2 \ +--combine-groups --vq-vars 320 --vq-temp (2,0.5,0.999995) --prediction-steps 12 --warmup-updates 1000 \ +--warmup-init-lr 1e-07 --criterion wav2vec --num-negatives 10 --max-sample-size 150000 \ +--max-tokens 300000 --cross-sample-negatives 0 --update-freq 1 --seed 2 --skip-invalid-size-inputs-valid-test +``` + +for k-means training, set vq-type with "kmeans" and add --loss-weights [1] argument. Pre-trained models were trained on 16 GPUs. + +### Tokenize audio data (e.g. 
for BERT training): + +``` +$ PYTHONPATH=/path/to/fairseq python examples/wav2vec/vq-wav2vec_featurize.py --data-dir /manifest/path --output-dir /path/to/output \ +--checkpoint /model/path/checkpoint_best.pt --split train valid test --extension tsv +``` diff --git a/SpeechT5/fairseq/examples/wav2vec/__init__.py b/SpeechT5/fairseq/examples/wav2vec/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 diff --git a/SpeechT5/fairseq/examples/wav2vec/config/finetuning/base_100h.yaml b/SpeechT5/fairseq/examples/wav2vec/config/finetuning/base_100h.yaml new file mode 100644 index 0000000000000000000000000000000000000000..539dabb047d02089c3e633c01960dba787134e53 --- /dev/null +++ b/SpeechT5/fairseq/examples/wav2vec/config/finetuning/base_100h.yaml @@ -0,0 +1,59 @@ +# @package _group_ + +common: + fp16: true + log_format: json + log_interval: 200 + +checkpoint: + no_epoch_checkpoints: true + best_checkpoint_metric: wer + +task: + _name: audio_pretraining + data: ??? + normalize: false + labels: ltr + +dataset: + num_workers: 6 + max_tokens: 3200000 + skip_invalid_size_inputs_valid_test: true + valid_subset: dev_other + +distributed_training: + ddp_backend: legacy_ddp + distributed_world_size: 2 + +criterion: + _name: ctc + zero_infinity: true + +optimization: + max_update: 80000 + lr: [0.00003] + sentence_avg: true + update_freq: [4] + +optimizer: + _name: adam + adam_betas: (0.9,0.98) + adam_eps: 1e-08 + +lr_scheduler: + _name: tri_stage + phase_ratio: [0.1, 0.4, 0.5] + final_lr_scale: 0.05 + +model: + _name: wav2vec_ctc + w2v_path: ??? + apply_mask: true + mask_prob: 0.65 + mask_channel_prob: 0.5 + mask_channel_length: 64 + layerdrop: 0.1 + activation_dropout: 0.1 + feature_grad_mult: 0.0 + freeze_finetune_updates: 0 + diff --git a/SpeechT5/fairseq/examples/wav2vec/config/finetuning/base_10h.yaml b/SpeechT5/fairseq/examples/wav2vec/config/finetuning/base_10h.yaml new file mode 100644 index 0000000000000000000000000000000000000000..16a3c4d96cf7f676b4314b3cd4632cec7ec2cebf --- /dev/null +++ b/SpeechT5/fairseq/examples/wav2vec/config/finetuning/base_10h.yaml @@ -0,0 +1,64 @@ +# @package _group_ + +common: + fp16: true + log_format: json + log_interval: 200 + +checkpoint: + save_interval: 50 + save_interval_updates: 10000 + keep_interval_updates: 1 + no_epoch_checkpoints: true + best_checkpoint_metric: wer + +task: + _name: audio_pretraining + data: ??? + normalize: false + labels: ltr + +dataset: + num_workers: 6 + max_tokens: 3200000 + skip_invalid_size_inputs_valid_test: true + validate_after_updates: 10000 + validate_interval: 50 + valid_subset: dev_other + +distributed_training: + ddp_backend: legacy_ddp + distributed_world_size: 2 + +criterion: + _name: ctc + zero_infinity: true + +optimization: + max_update: 20000 + lr: [0.00005] + sentence_avg: true + update_freq: [4] + +optimizer: + _name: adam + adam_betas: (0.9,0.98) + adam_eps: 1e-08 + +lr_scheduler: + _name: tri_stage + phase_ratio: [0.1, 0.4, 0.5] + final_lr_scale: 0.05 + +model: + _name: wav2vec_ctc + w2v_path: ??? 
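+  # ??? marks a mandatory value: override it on the command line, e.g. model.w2v_path=/path/to/model.pt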
+ apply_mask: true + mask_prob: 0.65 + mask_channel_prob: 0.5 + mask_channel_length: 64 + layerdrop: 0.05 + activation_dropout: 0.1 + feature_grad_mult: 0.0 + freeze_finetune_updates: 10000 + diff --git a/SpeechT5/fairseq/examples/wav2vec/config/finetuning/base_10m.yaml b/SpeechT5/fairseq/examples/wav2vec/config/finetuning/base_10m.yaml new file mode 100644 index 0000000000000000000000000000000000000000..3ceb77a252de06e51e960bbda5952e6db3ea13e2 --- /dev/null +++ b/SpeechT5/fairseq/examples/wav2vec/config/finetuning/base_10m.yaml @@ -0,0 +1,64 @@ +# @package _group_ + +common: + fp16: true + log_format: json + log_interval: 200 + +checkpoint: + save_interval: 1000 + save_interval_updates: 50 + keep_interval_updates: 1 + no_epoch_checkpoints: true + best_checkpoint_metric: wer + +task: + _name: audio_pretraining + data: ??? + normalize: false + labels: ltr + +dataset: + num_workers: 6 + max_tokens: 3200000 + skip_invalid_size_inputs_valid_test: true + validate_after_updates: 10000 + validate_interval: 1000 + valid_subset: dev_other + +distributed_training: + ddp_backend: legacy_ddp + distributed_world_size: 2 + +criterion: + _name: ctc + zero_infinity: true + +optimization: + max_update: 13000 + lr: [0.00005] + sentence_avg: true + update_freq: [4] + +optimizer: + _name: adam + adam_betas: (0.9,0.98) + adam_eps: 1e-08 + +lr_scheduler: + _name: tri_stage + phase_ratio: [0.1, 0.4, 0.5] + final_lr_scale: 0.05 + +model: + _name: wav2vec_ctc + w2v_path: ??? + apply_mask: true + mask_prob: 0.65 + mask_channel_prob: 0.25 + mask_channel_length: 64 + layerdrop: 0.1 + activation_dropout: 0.1 + feature_grad_mult: 0.0 + freeze_finetune_updates: 10000 + diff --git a/SpeechT5/fairseq/examples/wav2vec/config/finetuning/base_1h.yaml b/SpeechT5/fairseq/examples/wav2vec/config/finetuning/base_1h.yaml new file mode 100644 index 0000000000000000000000000000000000000000..3ceb77a252de06e51e960bbda5952e6db3ea13e2 --- /dev/null +++ b/SpeechT5/fairseq/examples/wav2vec/config/finetuning/base_1h.yaml @@ -0,0 +1,64 @@ +# @package _group_ + +common: + fp16: true + log_format: json + log_interval: 200 + +checkpoint: + save_interval: 1000 + save_interval_updates: 50 + keep_interval_updates: 1 + no_epoch_checkpoints: true + best_checkpoint_metric: wer + +task: + _name: audio_pretraining + data: ??? + normalize: false + labels: ltr + +dataset: + num_workers: 6 + max_tokens: 3200000 + skip_invalid_size_inputs_valid_test: true + validate_after_updates: 10000 + validate_interval: 1000 + valid_subset: dev_other + +distributed_training: + ddp_backend: legacy_ddp + distributed_world_size: 2 + +criterion: + _name: ctc + zero_infinity: true + +optimization: + max_update: 13000 + lr: [0.00005] + sentence_avg: true + update_freq: [4] + +optimizer: + _name: adam + adam_betas: (0.9,0.98) + adam_eps: 1e-08 + +lr_scheduler: + _name: tri_stage + phase_ratio: [0.1, 0.4, 0.5] + final_lr_scale: 0.05 + +model: + _name: wav2vec_ctc + w2v_path: ??? 
+ apply_mask: true + mask_prob: 0.65 + mask_channel_prob: 0.25 + mask_channel_length: 64 + layerdrop: 0.1 + activation_dropout: 0.1 + feature_grad_mult: 0.0 + freeze_finetune_updates: 10000 + diff --git a/SpeechT5/fairseq/examples/wav2vec/config/finetuning/base_960h.yaml b/SpeechT5/fairseq/examples/wav2vec/config/finetuning/base_960h.yaml new file mode 100644 index 0000000000000000000000000000000000000000..2d38211e919ddcec7cc9a24557fc11dc0f3f99cf --- /dev/null +++ b/SpeechT5/fairseq/examples/wav2vec/config/finetuning/base_960h.yaml @@ -0,0 +1,58 @@ +# @package _group_ + +common: + fp16: true + log_format: json + log_interval: 200 + +checkpoint: + no_epoch_checkpoints: true + best_checkpoint_metric: wer + +task: + _name: audio_pretraining + data: ??? + normalize: false + labels: ltr + +dataset: + num_workers: 6 + max_tokens: 3200000 + skip_invalid_size_inputs_valid_test: true + valid_subset: dev_other + +distributed_training: + ddp_backend: legacy_ddp + distributed_world_size: 8 + +criterion: + _name: ctc + zero_infinity: true + +optimization: + max_update: 320000 + lr: [0.0001] + sentence_avg: true + +optimizer: + _name: adam + adam_betas: (0.9,0.98) + adam_eps: 1e-08 + +lr_scheduler: + _name: tri_stage + phase_ratio: [0.1, 0.4, 0.5] + final_lr_scale: 0.05 + +model: + _name: wav2vec_ctc + w2v_path: ??? + apply_mask: true + mask_prob: 0.5 + mask_channel_prob: 0.1 + mask_channel_length: 64 + layerdrop: 0.1 + activation_dropout: 0.1 + feature_grad_mult: 0.0 + freeze_finetune_updates: 0 + diff --git a/SpeechT5/fairseq/examples/wav2vec/config/finetuning/vox_100h.yaml b/SpeechT5/fairseq/examples/wav2vec/config/finetuning/vox_100h.yaml new file mode 100644 index 0000000000000000000000000000000000000000..2fdb0c568c197186fc370cfc5b95c04d0f49b453 --- /dev/null +++ b/SpeechT5/fairseq/examples/wav2vec/config/finetuning/vox_100h.yaml @@ -0,0 +1,59 @@ +# @package _group_ + +common: + fp16: true + log_format: json + log_interval: 200 + +checkpoint: + no_epoch_checkpoints: true + best_checkpoint_metric: wer + +task: + _name: audio_pretraining + data: ??? + normalize: true + labels: ltr + +dataset: + num_workers: 6 + max_tokens: 1280000 + skip_invalid_size_inputs_valid_test: true + valid_subset: dev_other + +distributed_training: + ddp_backend: legacy_ddp + distributed_world_size: 4 + +criterion: + _name: ctc + zero_infinity: true + +optimization: + max_update: 80000 + lr: [0.00003] + sentence_avg: true + update_freq: [5] + +optimizer: + _name: adam + adam_betas: (0.9,0.98) + adam_eps: 1e-08 + +lr_scheduler: + _name: tri_stage + phase_ratio: [0.1, 0.4, 0.5] + final_lr_scale: 0.05 + +model: + _name: wav2vec_ctc + w2v_path: ??? + apply_mask: true + mask_prob: 0.5 + mask_channel_prob: 0.5 + mask_channel_length: 64 + layerdrop: 0.1 + activation_dropout: 0.1 + feature_grad_mult: 0.0 + freeze_finetune_updates: 10000 + diff --git a/SpeechT5/fairseq/examples/wav2vec/config/finetuning/vox_10h.yaml b/SpeechT5/fairseq/examples/wav2vec/config/finetuning/vox_10h.yaml new file mode 100644 index 0000000000000000000000000000000000000000..f1a979e05dad279f77463b7cb3cb62a7d0178d5c --- /dev/null +++ b/SpeechT5/fairseq/examples/wav2vec/config/finetuning/vox_10h.yaml @@ -0,0 +1,64 @@ +# @package _group_ + +common: + fp16: true + log_format: json + log_interval: 200 + +checkpoint: + save_interval: 50 + save_interval_updates: 10000 + keep_interval_updates: 1 + no_epoch_checkpoints: true + best_checkpoint_metric: wer + +task: + _name: audio_pretraining + data: ??? 
+ normalize: true + labels: ltr + +dataset: + num_workers: 6 + max_tokens: 1280000 + skip_invalid_size_inputs_valid_test: true + validate_after_updates: 10000 + validate_interval: 50 + valid_subset: dev_other + +distributed_training: + ddp_backend: legacy_ddp + distributed_world_size: 4 + +criterion: + _name: ctc + zero_infinity: true + +optimization: + max_update: 20000 + lr: [0.0001] + sentence_avg: true + update_freq: [5] + +optimizer: + _name: adam + adam_betas: (0.9,0.98) + adam_eps: 1e-08 + +lr_scheduler: + _name: tri_stage + phase_ratio: [0.1, 0.4, 0.5] + final_lr_scale: 0.05 + +model: + _name: wav2vec_ctc + w2v_path: ??? + apply_mask: true + mask_prob: 0.75 + mask_channel_prob: 0.25 + mask_channel_length: 64 + layerdrop: 0.1 + activation_dropout: 0.1 + feature_grad_mult: 0.0 + freeze_finetune_updates: 10000 + diff --git a/SpeechT5/fairseq/examples/wav2vec/config/finetuning/vox_10m.yaml b/SpeechT5/fairseq/examples/wav2vec/config/finetuning/vox_10m.yaml new file mode 100644 index 0000000000000000000000000000000000000000..d12439bb28cd4a5f0ecc255b4c21a77c64ae8b38 --- /dev/null +++ b/SpeechT5/fairseq/examples/wav2vec/config/finetuning/vox_10m.yaml @@ -0,0 +1,64 @@ +# @package _group_ + +common: + fp16: true + log_format: json + log_interval: 200 + +checkpoint: + save_interval: 1000 + save_interval_updates: 50 + keep_interval_updates: 1 + no_epoch_checkpoints: true + best_checkpoint_metric: wer + +task: + _name: audio_pretraining + data: ??? + normalize: true + labels: ltr + +dataset: + num_workers: 6 + max_tokens: 1280000 + skip_invalid_size_inputs_valid_test: true + validate_after_updates: 10000 + validate_interval: 1000 + valid_subset: dev_other + +distributed_training: + ddp_backend: legacy_ddp + distributed_world_size: 4 + +criterion: + _name: ctc + zero_infinity: true + +optimization: + max_update: 13000 + lr: [0.0001] + sentence_avg: true + update_freq: [5] + +optimizer: + _name: adam + adam_betas: (0.9,0.98) + adam_eps: 1e-08 + +lr_scheduler: + _name: tri_stage + phase_ratio: [0.1, 0.4, 0.5] + final_lr_scale: 0.05 + +model: + _name: wav2vec_ctc + w2v_path: ??? + apply_mask: true + mask_prob: 0.65 + mask_channel_prob: 0.25 + mask_channel_length: 64 + layerdrop: 0.1 + activation_dropout: 0.1 + feature_grad_mult: 0.0 + freeze_finetune_updates: 10000 + diff --git a/SpeechT5/fairseq/examples/wav2vec/config/finetuning/vox_1h.yaml b/SpeechT5/fairseq/examples/wav2vec/config/finetuning/vox_1h.yaml new file mode 100644 index 0000000000000000000000000000000000000000..7f3b04c0349b8900d00992da4a6bce8acac22449 --- /dev/null +++ b/SpeechT5/fairseq/examples/wav2vec/config/finetuning/vox_1h.yaml @@ -0,0 +1,64 @@ +# @package _group_ + +common: + fp16: true + log_format: json + log_interval: 200 + +checkpoint: + save_interval: 1000 + save_interval_updates: 50 + keep_interval_updates: 1 + no_epoch_checkpoints: true + best_checkpoint_metric: wer + +task: + _name: audio_pretraining + data: ??? 
+ normalize: true + labels: ltr + +dataset: + num_workers: 6 + max_tokens: 1280000 + skip_invalid_size_inputs_valid_test: true + validate_after_updates: 10000 + validate_interval: 1000 + valid_subset: dev_other + +distributed_training: + ddp_backend: legacy_ddp + distributed_world_size: 4 + +criterion: + _name: ctc + zero_infinity: true + +optimization: + max_update: 13000 + lr: [0.0003] + sentence_avg: true + update_freq: [5] + +optimizer: + _name: adam + adam_betas: (0.9,0.98) + adam_eps: 1e-08 + +lr_scheduler: + _name: tri_stage + phase_ratio: [0.1, 0.4, 0.5] + final_lr_scale: 0.05 + +model: + _name: wav2vec_ctc + w2v_path: ??? + apply_mask: true + mask_prob: 0.75 + mask_channel_prob: 0.25 + mask_channel_length: 64 + layerdrop: 0.1 + activation_dropout: 0.1 + feature_grad_mult: 0.0 + freeze_finetune_updates: 10000 + diff --git a/SpeechT5/fairseq/examples/wav2vec/config/finetuning/vox_960h.yaml b/SpeechT5/fairseq/examples/wav2vec/config/finetuning/vox_960h.yaml new file mode 100644 index 0000000000000000000000000000000000000000..0633915bb29a102a2275cbc49e45ac6e11bd5ad2 --- /dev/null +++ b/SpeechT5/fairseq/examples/wav2vec/config/finetuning/vox_960h.yaml @@ -0,0 +1,58 @@ +# @package _group_ + +common: + fp16: true + log_format: json + log_interval: 200 + +checkpoint: + no_epoch_checkpoints: true + best_checkpoint_metric: wer + +task: + _name: audio_pretraining + data: ??? + normalize: true + labels: ltr + +dataset: + num_workers: 6 + max_tokens: 1280000 + skip_invalid_size_inputs_valid_test: true + valid_subset: dev_other + +distributed_training: + ddp_backend: legacy_ddp + distributed_world_size: 24 + +criterion: + _name: ctc + zero_infinity: true + +optimization: + max_update: 320000 + lr: [0.00003] + sentence_avg: true + +optimizer: + _name: adam + adam_betas: (0.9,0.98) + adam_eps: 1e-08 + +lr_scheduler: + _name: tri_stage + phase_ratio: [0.1, 0.4, 0.5] + final_lr_scale: 0.05 + +model: + _name: wav2vec_ctc + w2v_path: ??? + apply_mask: true + mask_prob: 0.5 + mask_channel_prob: 0.25 + mask_channel_length: 64 + layerdrop: 0.1 + activation_dropout: 0.1 + feature_grad_mult: 0.0 + freeze_finetune_updates: 10000 + diff --git a/SpeechT5/fairseq/examples/wav2vec/config/pretraining/wav2vec2_base_librispeech.yaml b/SpeechT5/fairseq/examples/wav2vec/config/pretraining/wav2vec2_base_librispeech.yaml new file mode 100644 index 0000000000000000000000000000000000000000..b686e21ab1d367158fe7afa4197303a4ee74df66 --- /dev/null +++ b/SpeechT5/fairseq/examples/wav2vec/config/pretraining/wav2vec2_base_librispeech.yaml @@ -0,0 +1,57 @@ +# @package _group_ + +common: + fp16: true + log_format: json + log_interval: 200 + +checkpoint: + save_interval_updates: 25000 + keep_interval_updates: 1 + no_epoch_checkpoints: true + +task: + _name: audio_pretraining + data: ??? 
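+  # directory with the {train,valid}.tsv manifests from wav2vec_manifest.py; override via task.data=/path/to/data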
+ max_sample_size: 250000 + min_sample_size: 32000 + normalize: false + +dataset: + num_workers: 6 + max_tokens: 1400000 + skip_invalid_size_inputs_valid_test: true + +distributed_training: + distributed_world_size: 64 + ddp_backend: legacy_ddp + +criterion: + _name: wav2vec + infonce: true + log_keys: ["prob_perplexity","code_perplexity","temp"] + loss_weights: [0.1, 10] + +optimization: + max_update: 400000 + lr: [0.0005] + +optimizer: + _name: adam + adam_betas: (0.9,0.98) + adam_eps: 1e-06 + weight_decay: 0.01 + +lr_scheduler: + _name: polynomial_decay + warmup_updates: 32000 + +model: + _name: wav2vec2 + quantize_targets: true + final_dim: 256 + encoder_layerdrop: 0.05 + dropout_input: 0.1 + dropout_features: 0.1 + feature_grad_mult: 0.1 + encoder_embed_dim: 768 diff --git a/SpeechT5/fairseq/examples/wav2vec/config/pretraining/wav2vec2_large_librivox.yaml b/SpeechT5/fairseq/examples/wav2vec/config/pretraining/wav2vec2_large_librivox.yaml new file mode 100644 index 0000000000000000000000000000000000000000..bee41157a9984ea89f46dc89e6986ba6c73c3037 --- /dev/null +++ b/SpeechT5/fairseq/examples/wav2vec/config/pretraining/wav2vec2_large_librivox.yaml @@ -0,0 +1,69 @@ +# @package _group_ + +common: + fp16: true + log_format: json + log_interval: 200 + +checkpoint: + save_interval_updates: 25000 + keep_interval_updates: 1 + no_epoch_checkpoints: true + +task: + _name: audio_pretraining + data: ??? + max_sample_size: 320000 + min_sample_size: 32000 + normalize: true + +dataset: + num_workers: 6 + max_tokens: 1200000 + skip_invalid_size_inputs_valid_test: true + +distributed_training: + distributed_world_size: 128 + ddp_backend: legacy_ddp + +criterion: + _name: wav2vec + infonce: true + log_keys: ["prob_perplexity","code_perplexity","temp"] + loss_weights: [0.1, 0] + +optimization: + max_update: 1000000 + lr: [0.005] + +optimizer: + _name: adam + adam_betas: (0.9,0.98) + adam_eps: 1e-06 + weight_decay: 0.01 + +lr_scheduler: + _name: polynomial_decay + warmup_updates: 32000 + +model: + _name: wav2vec2 + quantize_targets: true + extractor_mode: layer_norm + layer_norm_first: true + final_dim: 768 + latent_temp: [2.0,0.1,0.999995] + encoder_layerdrop: 0.00 + dropout_input: 0.0 + dropout_features: 0.0 + dropout: 0.0 + attention_dropout: 0.0 + conv_bias: true + + encoder_layers: 24 + encoder_embed_dim: 1024 + encoder_ffn_embed_dim: 4096 + encoder_attention_heads: 16 + + feature_grad_mult: 1.0 + diff --git a/SpeechT5/fairseq/examples/wav2vec/config/pretraining/wav2vec2_large_librivox_tpu-pod.yaml b/SpeechT5/fairseq/examples/wav2vec/config/pretraining/wav2vec2_large_librivox_tpu-pod.yaml new file mode 100644 index 0000000000000000000000000000000000000000..ff35a95b6596b74215ef1bbdd2ec8d462d1d8542 --- /dev/null +++ b/SpeechT5/fairseq/examples/wav2vec/config/pretraining/wav2vec2_large_librivox_tpu-pod.yaml @@ -0,0 +1,72 @@ +# @package _group_ + +common: + tpu: true + fp16: false + log_format: json + log_interval: 10 + +checkpoint: + save_interval_updates: 25000 + keep_interval_updates: 1 + no_epoch_checkpoints: true + +task: + _name: audio_pretraining + data: ??? 
+ max_sample_size: 250000 + min_sample_size: 32000 + normalize: true + num_batch_buckets: 3 + precompute_mask_indices: true + enable_padding: true + +dataset: + num_workers: 6 + max_tokens: 1200000 + skip_invalid_size_inputs_valid_test: true + +distributed_training: + distributed_world_size: 128 + ddp_backend: legacy_ddp + +criterion: + _name: wav2vec + infonce: true + log_keys: ["prob_perplexity","code_perplexity","temp"] + loss_weights: [0.1, 0] + +optimization: + max_update: 1000000 + lr: [0.005] + +optimizer: + _name: adam + adam_betas: (0.9,0.98) + adam_eps: 1e-06 + weight_decay: 0.01 + +lr_scheduler: + _name: polynomial_decay + warmup_updates: 32000 + +model: + _name: wav2vec2 + quantize_targets: true + extractor_mode: layer_norm + layer_norm_first: true + final_dim: 768 + latent_temp: [2.0,0.1,0.999995] + encoder_layerdrop: 0.00 + dropout_input: 0.0 + dropout_features: 0.0 + dropout: 0.0 + attention_dropout: 0.0 + conv_bias: true + + encoder_layers: 24 + encoder_embed_dim: 1024 + encoder_ffn_embed_dim: 4096 + encoder_attention_heads: 16 + + feature_grad_mult: 1.0 diff --git a/SpeechT5/fairseq/examples/wav2vec/config/pretraining/wav2vec2_large_librivox_tpu.yaml b/SpeechT5/fairseq/examples/wav2vec/config/pretraining/wav2vec2_large_librivox_tpu.yaml new file mode 100644 index 0000000000000000000000000000000000000000..2036e23c6bd6ba896cdd2b055915c8f66944b3e4 --- /dev/null +++ b/SpeechT5/fairseq/examples/wav2vec/config/pretraining/wav2vec2_large_librivox_tpu.yaml @@ -0,0 +1,72 @@ +# @package _group_ + +common: + tpu: true + fp16: false + log_format: json + log_interval: 10 + +checkpoint: + save_interval_updates: 25000 + keep_interval_updates: 1 + no_epoch_checkpoints: true + +task: + _name: audio_pretraining + data: ??? + max_sample_size: 250000 + min_sample_size: 32000 + normalize: true + num_batch_buckets: 3 + precompute_mask_indices: true + enable_padding: true + +dataset: + num_workers: 6 + max_tokens: 1200000 + skip_invalid_size_inputs_valid_test: true + +distributed_training: + distributed_world_size: 8 + ddp_backend: legacy_ddp + +criterion: + _name: wav2vec + infonce: true + log_keys: ["prob_perplexity","code_perplexity","temp"] + loss_weights: [0.1, 0] + +optimization: + max_update: 1000000 + lr: [0.005] + +optimizer: + _name: adam + adam_betas: (0.9,0.98) + adam_eps: 1e-06 + weight_decay: 0.01 + +lr_scheduler: + _name: polynomial_decay + warmup_updates: 32000 + +model: + _name: wav2vec2 + quantize_targets: true + extractor_mode: layer_norm + layer_norm_first: true + final_dim: 768 + latent_temp: [2.0,0.1,0.999995] + encoder_layerdrop: 0.00 + dropout_input: 0.0 + dropout_features: 0.0 + dropout: 0.0 + attention_dropout: 0.0 + conv_bias: true + + encoder_layers: 24 + encoder_embed_dim: 1024 + encoder_ffn_embed_dim: 4096 + encoder_attention_heads: 16 + + feature_grad_mult: 1.0 diff --git a/SpeechT5/fairseq/examples/wav2vec/libri_labels.py b/SpeechT5/fairseq/examples/wav2vec/libri_labels.py new file mode 100644 index 0000000000000000000000000000000000000000..589bce57da9e473763b4e250c79b68e55dcfa1bd --- /dev/null +++ b/SpeechT5/fairseq/examples/wav2vec/libri_labels.py @@ -0,0 +1,56 @@ +#!/usr/bin/env python3 +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. 
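+
+# Summary: given the tsv manifest produced by wav2vec_manifest.py, looks up the
+# matching LibriSpeech *.trans.txt transcripts and writes <output-name>.txt (words)
+# and <output-name>.ltr (space-separated letters, '|' = word boundary) label files
+# for CTC fine-tuning into --output-dir.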
+ +""" +Helper script to pre-compute embeddings for a flashlight (previously called wav2letter++) dataset +""" + +import argparse +import os + + +def main(): + parser = argparse.ArgumentParser() + parser.add_argument("tsv") + parser.add_argument("--output-dir", required=True) + parser.add_argument("--output-name", required=True) + args = parser.parse_args() + + os.makedirs(args.output_dir, exist_ok=True) + + transcriptions = {} + + with open(args.tsv, "r") as tsv, open( + os.path.join(args.output_dir, args.output_name + ".ltr"), "w" + ) as ltr_out, open( + os.path.join(args.output_dir, args.output_name + ".txt"), "w" + ) as wrd_out: + root = next(tsv).strip() + for line in tsv: + line = line.strip() + dir = os.path.dirname(line) + if dir not in transcriptions: + parts = dir.split(os.path.sep) + trans_path = f"{parts[-2]}-{parts[-1]}.trans.txt" + path = os.path.join(root, dir, trans_path) + assert os.path.exists(path) + texts = {} + with open(path, "r") as trans_f: + for tline in trans_f: + items = tline.strip().split() + texts[items[0]] = " ".join(items[1:]) + transcriptions[dir] = texts + part = os.path.basename(line).split(".")[0] + assert part in transcriptions[dir] + print(transcriptions[dir][part], file=wrd_out) + print( + " ".join(list(transcriptions[dir][part].replace(" ", "|"))) + " |", + file=ltr_out, + ) + + +if __name__ == "__main__": + main() diff --git a/SpeechT5/fairseq/examples/wav2vec/scripts/binarize_manifest.sh b/SpeechT5/fairseq/examples/wav2vec/scripts/binarize_manifest.sh new file mode 100644 index 0000000000000000000000000000000000000000..6f201bdb524fad51a69d8c45889eaa1578efc62d --- /dev/null +++ b/SpeechT5/fairseq/examples/wav2vec/scripts/binarize_manifest.sh @@ -0,0 +1,33 @@ +#!/usr/bin/env bash + +# usage: bash binarize_manifest <dest_dir> <train_split> <valid_split> + +DEST_DIR=$1 +TRAIN_SPLIT=$2 +VALID_SPLIT=$3 +FAIRSEQ_ROOT=$4 + +mkdir -p $DEST_DIR + +# split file path and lengths into separate files +cut -f1 $TRAIN_SPLIT.tsv > $DEST_DIR/train_fnames.txt +cut -f1 $VALID_SPLIT.tsv > $DEST_DIR/valid_fnames.txt +cut -f2 $TRAIN_SPLIT.tsv > $DEST_DIR/train.lengths +cut -f2 $VALID_SPLIT.tsv > $DEST_DIR/valid.lengths + +# copy root directory +head -1 $TRAIN_SPLIT.tsv > $DEST_DIR/train.root +head -1 $VALID_SPLIT.tsv > $DEST_DIR/valid.root + +# remove root directory +sed -i '1d' $DEST_DIR/train_fnames.txt +sed -i '1d' $DEST_DIR/valid_fnames.txt +sed -i '1d' $DEST_DIR/train.lengths +sed -i '1d' $DEST_DIR/valid.lengths + +# insert spaces between characters +sed -i -e 's/\(.\)/\1 /g' $DEST_DIR/train_fnames.txt +sed -i -e 's/\(.\)/\1 /g' $DEST_DIR/valid_fnames.txt + +# run preprocessor +PYTHONPATH=$FAIRSEQ_ROOT python $FAIRSEQ_ROOT/fairseq_cli/preprocess.py --dataset-impl mmap --trainpref $DEST_DIR/train_fnames.txt --validpref $DEST_DIR/valid_fnames.txt --workers 60 --only-source --destdir $DEST_DIR diff --git a/SpeechT5/fairseq/examples/wav2vec/unsupervised/README.md b/SpeechT5/fairseq/examples/wav2vec/unsupervised/README.md new file mode 100644 index 0000000000000000000000000000000000000000..046202e01c7ca81aeed54b480f3ee0f3e172c654 --- /dev/null +++ b/SpeechT5/fairseq/examples/wav2vec/unsupervised/README.md @@ -0,0 +1,103 @@ +# wav2vec Unsupervised (wav2vec-U) + +Wav2vec Unsupervised (wav2vec-U) is a framework for building speech recognition systems without any labeled training data as described in [Unsupervised Speech Recognition (Baevski et al., 2021)](https://ai.facebook.com/research/publications/unsupervised-speech-recognition). 
The model takes as input wav2vec 2.0 or XLSR representations (see [pretrained models](https://github.com/pytorch/fairseq/blob/master/examples/wav2vec)) as well as unlabeled speech and text data. + + The wav2vec-U training procedure consists of three consecutive main steps: +* Preparation of speech representations and text data +* Generative adversarial training (GAN) +* Iterative self-training + Kaldi LM-decoding + +## Preparation of speech and text data +Similar to [wav2vec 2.0](https://github.com/pytorch/fairseq/blob/master/examples/wav2vec/README.md), data folders contain {train,valid,test}.{tsv,wrd,phn} files, where audio paths are stored in tsv files, and word, letter or phoneme transcriptions are stored in .{wrd,ltr,phn}. + +In **/path/to/data/with_silence** you need a *train.tsv* file as well as (optionally) *{valid,test}.{tsv,wrd,phn}*. It is nice to have *10h.{tsv,phn}* files there too for reproducing the ablation study on layer selection. In **/path/to/data/without_silence** you have the same files, except *.tsv* files contain audios with silences removed using rVAD. + +Pre-requisites: +* set FAIRSEQ_ROOT environmental variable to your fairseq installation +* set RVAD_ROOT environmental variable to a checkout of [rVADfast](https://github.com/zhenghuatan/rVADfast) +* set KENLM_ROOT environmental variable to the location of [KenLM](https://github.com/kpu/kenlm) binaries +* install [PyKaldi](https://github.com/pykaldi/pykaldi) and set KALDI_ROOT environmental variable to the location of your kaldi installation. To use the version bundled with PyKaldi, you can use /path/to/pykaldi/tools/kaldi + +Create new audio files without silences: +```shell +# create a manifest file for the set original of audio files +python $FAIRSEQ_ROOT/examples/wav2vec/wav2vec_manifest.py /dir/to/save/audio/files --ext wav --dest /path/to/new/train.tsv --valid-percent 0 + +python scripts/vads.py -r $RVAD_ROOT < /path/to/train.tsv > train.vads + +python scripts/remove_silence.py --tsv /path/to/train.tsv --vads train.vads --out /dir/to/save/audio/files + +python $FAIRSEQ_ROOT/examples/wav2vec/wav2vec_manifest.py /dir/to/save/audio/files --ext wav --dest /path/to/new/train.tsv --valid-percent 0.01 +``` + +Next, we need to preprocess the audio data to better match phonemized text data: + +```shell +zsh scripts/prepare_audio.sh /dir/with/{train,test,valid}.tsv /output/dir /path/to/wav2vec2/model.pt 512 14 +``` +Note that if you have splits different than train/valid/test, you will need to modify this script. The last two arguments are the PCA dimensionality and the 0-based index of the layer from which to extract representations. + +Now we need to prepare text data: +```shell +zsh scripts/prepare_text.sh language /path/to/text/file /output/dir 1000 espeak /path/to/fasttext/lid/model +``` + +The fourth argument is minimum number observations of phones to keep. If your text corpus is small, you might want to reduce this number. + +The fifth argument is which phonemizer to use. Supported values are [espeak](http://espeak.sourceforge.net/), [espeak-ng](https://github.com/espeak-ng/espeak-ng), and [G2P](https://github.com/Kyubyong/g2p) (english only). + +Pre-trained fasttext LID models can be downloaded [here](https://fasttext.cc/docs/en/language-identification.html). + +### Prepare TIMIT data +TIMIT transcripts include silence. Therefore VAD is not used for audio preprocessing, and we do not wrap transcripts with silences or insert random silence in between words. 
+ +To prepare TIMIT data for both the matched an unmatched setup: +```shell +bash scripts/prepare_timit.sh /dir/to/timit/raw/data /output/dir /path/to/wav2vec2/model.pt +``` + +Note that we assume the TIMIT distribution with capitalized directories and filenames are used (e.g., `TRAIN/DR1/FCJF0/SA1.PHN`). + +## Generative adversarial training (GAN) + +We then use a GAN model to build a first unsupervised ASR model. The data preparation above of both speech features and text data is a necessary procedure that enables the generator to match speech to text in an unsupervised way. + +Launching GAN training on top of preprocessed features, with default hyperparameters can be done with: + +``` +PREFIX=w2v_unsup_gan_xp +TASK_DATA=/path/to/features/precompute_unfiltered_pca512_cls128_mean_pooled +TEXT_DATA=/path/to/data/phones # path to fairseq-preprocessed GAN data (phones dir) +KENLM_PATH=/path/to/data/phones/kenlm.phn.o4.bin # KenLM 4-gram phoneme language model (LM data = GAN data here) + +PYTHONPATH=$FAIRSEQ_ROOT PREFIX=$PREFIX fairseq-hydra-train \ + -m --config-dir config/gan \ + --config-name w2vu \ + task.data=${TASK_DATA} \ + task.text_data=${TEXT_DATA} \ + task.kenlm_path=${KENLM_PATH} \ + common.user_dir=${FAIRSEQ_ROOT}/examples/wav2vec/unsupervised \ + model.code_penalty=2,4 model.gradient_penalty=1.5,2.0 \ + model.smoothness_weight=0.5,0.75,1.0 'common.seed=range(0,5)' +``` + + +Once we find the best checkpoint (chosen using unsupervised metric that combined language model perplexity and vocabulary usage), we can use it to generate phone labels (or word labels with an appropriate kaldi WFST): + +```shell +python w2vu_generate.py --config-dir config/generate --config-name viterbi \ +fairseq.common.user_dir=${FAIRSEQ_ROOT}/examples/wav2vec/unsupervised \ +fairseq.task.data=/path/to/dir/with/features \ +fairseq.common_eval.path=/path/to/gan/checkpoint \ +fairseq.dataset.gen_subset=valid results_path=/where/to/save/transcriptions +``` + +The decoding without LM works best on the same adjacent-mean-pooled features that the gan was trained on, while decoding with LM works better on features before the adjacent timestep mean-pooling step (without the "_pooled" suffix). + +## Iterative self-training + Kaldi LM-decoding +After the GAN training provides a first unsupervised model, we can then progressively refine the quality of transcriptions using several iterations of semi-supervised learning. We perform two iterations: first, pseudo-label the training data with the unsupervised GAN model and train an HMM on the pseudo-labels. Second, we relabel the training data with the HMM and then fine-tune the original wav2vec 2.0 model using the HMM pseudo-labels with a CTC loss. Note that HMM models use phonemes as output, while wav2vec 2.0 use letter. Both are decoded using WFST decoders into words. + + +Please see [this README](kaldi_self_train/README.md) for more instructions on how to do iterative self-training + Kaldi LM-decoding. 
+ +*** Note: these instructions are a work in progress and will be updated over the next few days diff --git a/SpeechT5/fairseq/examples/wav2vec/unsupervised/__init__.py b/SpeechT5/fairseq/examples/wav2vec/unsupervised/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 diff --git a/SpeechT5/fairseq/examples/wav2vec/unsupervised/config/finetuning/w2v_finetune.yaml b/SpeechT5/fairseq/examples/wav2vec/unsupervised/config/finetuning/w2v_finetune.yaml new file mode 100644 index 0000000000000000000000000000000000000000..e94da2ba4e46de564cb4619b5e5f955bddc103cc --- /dev/null +++ b/SpeechT5/fairseq/examples/wav2vec/unsupervised/config/finetuning/w2v_finetune.yaml @@ -0,0 +1,62 @@ +# @package _group_ + +common: + fp16: true + log_format: json + log_interval: 200 + tensorboard_logdir: tb + +checkpoint: + no_epoch_checkpoints: true + save_interval_updates: 20000 + +task: + _name: audio_pretraining + data: ??? + normalize: true + labels: ltr + +dataset: + num_workers: 6 + max_tokens: 800000 + skip_invalid_size_inputs_valid_test: true + train_subset: train + valid_subset: valid + +distributed_training: + ddp_backend: legacy_ddp + distributed_world_size: 8 + find_unused_parameters: True + +criterion: + _name: ctc + zero_infinity: true + post_process: letter + +optimization: + max_update: 80000 + lr: [0.00003] + sentence_avg: true + update_freq: [1] + +optimizer: + _name: adam + adam_betas: (0.9,0.98) + adam_eps: 1e-08 + +lr_scheduler: + _name: tri_stage + phase_ratio: [0.1, 0.4, 0.5] + final_lr_scale: 0.05 + +model: + _name: wav2vec_ctc + w2v_path: ??? + apply_mask: true + mask_prob: 0.25 + mask_channel_prob: 0.1 + mask_channel_length: 64 + layerdrop: 0.1 + activation_dropout: 0.1 + feature_grad_mult: 0.0 + freeze_finetune_updates: 0 diff --git a/SpeechT5/fairseq/examples/wav2vec/unsupervised/config/gan/w2vu.yaml b/SpeechT5/fairseq/examples/wav2vec/unsupervised/config/gan/w2vu.yaml new file mode 100644 index 0000000000000000000000000000000000000000..74f1829d1497560f6e1e006073f19716d36bc947 --- /dev/null +++ b/SpeechT5/fairseq/examples/wav2vec/unsupervised/config/gan/w2vu.yaml @@ -0,0 +1,115 @@ +# @package _group_ + +common: + fp16: false + fp16_no_flatten_grads: true + log_format: json + log_interval: 100 + tensorboard_logdir: tb + reset_logging: false + suppress_crashes: false + +checkpoint: + save_interval: 1000 + save_interval_updates: 1000 + no_epoch_checkpoints: true + best_checkpoint_metric: weighted_lm_ppl + save_dir: . + +distributed_training: + distributed_world_size: 1 + +task: + _name: unpaired_audio_text + data: ??? + text_data: ??? + labels: phn + sort_by_length: false + unfiltered: false + max_length: null + append_eos: false + kenlm_path: ??? 
+ +dataset: + num_workers: 6 + batch_size: 160 + skip_invalid_size_inputs_valid_test: true + valid_subset: valid + validate_interval: 1000 + validate_interval_updates: 1000 + +criterion: + _name: model + log_keys: + - accuracy_dense + - accuracy_token + - temp + - code_ppl + +optimization: + max_update: 150000 + clip_norm: 5.0 + lr: [0] + +optimizer: + _name: composite + groups: + generator: + lr: [0.0004] + lr_float: null + optimizer: + _name: adam + adam_betas: [0.5,0.98] + adam_eps: 1e-06 + weight_decay: 0 + amsgrad: false + lr_scheduler: + _name: fixed + warmup_updates: 0 + discriminator: + lr: [ 0.0005 ] + lr_float: null + optimizer: + _name: adam + adam_betas: [0.5,0.98] + adam_eps: 1e-06 + weight_decay: 0.0001 + amsgrad: false + lr_scheduler: + _name: fixed + warmup_updates: 0 + +lr_scheduler: pass_through + +model: + _name: wav2vec_u + + discriminator_dim: 384 + discriminator_depth: 2 + discriminator_kernel: 6 + discriminator_linear_emb: false + discriminator_causal: true + discriminator_max_pool: false + discriminator_act_after_linear: false + discriminator_dropout: 0.0 + discriminator_weight_norm: false + + generator_stride: 1 + generator_kernel: 4 + generator_bias: false + generator_dropout: 0.1 + + smoothness_weight: 0.5 + smoothing: 0 + smoothing_one_sided: false + gumbel: false + hard_gumbel: false + gradient_penalty: 1.5 + code_penalty: 4.0 + temp: [ 2,0.1,0.99995 ] + input_dim: 512 + + segmentation: + type: JOIN + mean_pool_join: false + remove_zeros: false diff --git a/SpeechT5/fairseq/examples/wav2vec/unsupervised/config/generate/viterbi.yaml b/SpeechT5/fairseq/examples/wav2vec/unsupervised/config/generate/viterbi.yaml new file mode 100644 index 0000000000000000000000000000000000000000..9c88beebcb15f9047195c8c7e79c21eac59db418 --- /dev/null +++ b/SpeechT5/fairseq/examples/wav2vec/unsupervised/config/generate/viterbi.yaml @@ -0,0 +1,21 @@ +# @package _group_ + +fairseq: + task: + _name: unpaired_audio_text + labels: phn + data: ??? + sort_by_length: false + shuffle: false + text_data: '' + + common_eval: + path: ??? 
+ quiet: true + + dataset: + gen_subset: valid + batch_size: 1 + +w2l_decoder: VITERBI +post_process: silence diff --git a/SpeechT5/fairseq/examples/wav2vec/unsupervised/config/timit_matched/test.uid b/SpeechT5/fairseq/examples/wav2vec/unsupervised/config/timit_matched/test.uid new file mode 100644 index 0000000000000000000000000000000000000000..401008246a1bc2cbf309d9d0aa56710f0ff643bc --- /dev/null +++ b/SpeechT5/fairseq/examples/wav2vec/unsupervised/config/timit_matched/test.uid @@ -0,0 +1,192 @@ +FDHC0_SI1559 +FDHC0_SI2189 +FDHC0_SI929 +FDHC0_SX119 +FDHC0_SX209 +FDHC0_SX29 +FDHC0_SX299 +FDHC0_SX389 +FELC0_SI1386 +FELC0_SI2016 +FELC0_SI756 +FELC0_SX126 +FELC0_SX216 +FELC0_SX306 +FELC0_SX36 +FELC0_SX396 +FJLM0_SI1043 +FJLM0_SI1673 +FJLM0_SI2303 +FJLM0_SX143 +FJLM0_SX233 +FJLM0_SX323 +FJLM0_SX413 +FJLM0_SX53 +FMGD0_SI1564 +FMGD0_SI2194 +FMGD0_SI934 +FMGD0_SX124 +FMGD0_SX214 +FMGD0_SX304 +FMGD0_SX34 +FMGD0_SX394 +FMLD0_SI2185 +FMLD0_SI822 +FMLD0_SI925 +FMLD0_SX115 +FMLD0_SX205 +FMLD0_SX25 +FMLD0_SX295 +FMLD0_SX385 +FNLP0_SI1308 +FNLP0_SI1938 +FNLP0_SI678 +FNLP0_SX138 +FNLP0_SX228 +FNLP0_SX318 +FNLP0_SX408 +FNLP0_SX48 +FPAS0_SI1272 +FPAS0_SI2204 +FPAS0_SI944 +FPAS0_SX134 +FPAS0_SX224 +FPAS0_SX314 +FPAS0_SX404 +FPAS0_SX44 +FPKT0_SI1538 +FPKT0_SI2168 +FPKT0_SI908 +FPKT0_SX188 +FPKT0_SX278 +FPKT0_SX368 +FPKT0_SX8 +FPKT0_SX98 +MBPM0_SI1577 +MBPM0_SI1584 +MBPM0_SI947 +MBPM0_SX137 +MBPM0_SX227 +MBPM0_SX317 +MBPM0_SX407 +MBPM0_SX47 +MCMJ0_SI1094 +MCMJ0_SI464 +MCMJ0_SI602 +MCMJ0_SX104 +MCMJ0_SX14 +MCMJ0_SX194 +MCMJ0_SX284 +MCMJ0_SX374 +MDAB0_SI1039 +MDAB0_SI1669 +MDAB0_SI2299 +MDAB0_SX139 +MDAB0_SX229 +MDAB0_SX319 +MDAB0_SX409 +MDAB0_SX49 +MGRT0_SI1450 +MGRT0_SI2080 +MGRT0_SI820 +MGRT0_SX10 +MGRT0_SX100 +MGRT0_SX190 +MGRT0_SX280 +MGRT0_SX370 +MJDH0_SI1354 +MJDH0_SI1984 +MJDH0_SI724 +MJDH0_SX184 +MJDH0_SX274 +MJDH0_SX364 +MJDH0_SX4 +MJDH0_SX94 +MJLN0_SI1449 +MJLN0_SI2079 +MJLN0_SI819 +MJLN0_SX189 +MJLN0_SX279 +MJLN0_SX369 +MJLN0_SX9 +MJLN0_SX99 +MJMP0_SI1535 +MJMP0_SI1791 +MJMP0_SI905 +MJMP0_SX185 +MJMP0_SX275 +MJMP0_SX365 +MJMP0_SX5 +MJMP0_SX95 +MKLT0_SI1213 +MKLT0_SI1843 +MKLT0_SI583 +MKLT0_SX133 +MKLT0_SX223 +MKLT0_SX313 +MKLT0_SX403 +MKLT0_SX43 +MLLL0_SI1363 +MLLL0_SI1993 +MLLL0_SI733 +MLLL0_SX103 +MLLL0_SX13 +MLLL0_SX193 +MLLL0_SX283 +MLLL0_SX373 +MLNT0_SI1574 +MLNT0_SI1902 +MLNT0_SI642 +MLNT0_SX102 +MLNT0_SX12 +MLNT0_SX192 +MLNT0_SX282 +MLNT0_SX372 +MNJM0_SI1580 +MNJM0_SI2210 +MNJM0_SI950 +MNJM0_SX140 +MNJM0_SX230 +MNJM0_SX320 +MNJM0_SX410 +MNJM0_SX50 +MPAM0_SI1189 +MPAM0_SI1819 +MPAM0_SI1961 +MPAM0_SX109 +MPAM0_SX19 +MPAM0_SX199 +MPAM0_SX289 +MPAM0_SX379 +MTAS1_SI1473 +MTAS1_SI2098 +MTAS1_SI838 +MTAS1_SX118 +MTAS1_SX208 +MTAS1_SX28 +MTAS1_SX298 +MTAS1_SX388 +MTLS0_SI1370 +MTLS0_SI2000 +MTLS0_SI740 +MTLS0_SX110 +MTLS0_SX20 +MTLS0_SX200 +MTLS0_SX290 +MTLS0_SX380 +MWBT0_SI1553 +MWBT0_SI2183 +MWBT0_SI923 +MWBT0_SX113 +MWBT0_SX203 +MWBT0_SX23 +MWBT0_SX293 +MWBT0_SX383 +MWEW0_SI1361 +MWEW0_SI1991 +MWEW0_SI731 +MWEW0_SX101 +MWEW0_SX11 +MWEW0_SX191 +MWEW0_SX281 +MWEW0_SX371 diff --git a/SpeechT5/fairseq/examples/wav2vec/unsupervised/config/timit_matched/train.uid b/SpeechT5/fairseq/examples/wav2vec/unsupervised/config/timit_matched/train.uid new file mode 100644 index 0000000000000000000000000000000000000000..c39fd0b91d51e0ae15caf1e9701d0d9ef51ee21b --- /dev/null +++ b/SpeechT5/fairseq/examples/wav2vec/unsupervised/config/timit_matched/train.uid @@ -0,0 +1,3696 @@ +FAEM0_SI1392 +FAEM0_SI2022 +FAEM0_SI762 +FAEM0_SX132 +FAEM0_SX222 +FAEM0_SX312 +FAEM0_SX402 +FAEM0_SX42 +FAJW0_SI1263 +FAJW0_SI1893 
+FAJW0_SI633 +FAJW0_SX183 +FAJW0_SX273 +FAJW0_SX3 +FAJW0_SX363 +FAJW0_SX93 +FALK0_SI1086 +FALK0_SI456 +FALK0_SI658 +FALK0_SX186 +FALK0_SX276 +FALK0_SX366 +FALK0_SX6 +FALK0_SX96 +FALR0_SI1325 +FALR0_SI1955 +FALR0_SI695 +FALR0_SX155 +FALR0_SX245 +FALR0_SX335 +FALR0_SX425 +FALR0_SX65 +FAPB0_SI1063 +FAPB0_SI1693 +FAPB0_SI2323 +FAPB0_SX163 +FAPB0_SX253 +FAPB0_SX343 +FAPB0_SX433 +FAPB0_SX73 +FBAS0_SI1387 +FBAS0_SI1472 +FBAS0_SI2066 +FBAS0_SX127 +FBAS0_SX217 +FBAS0_SX307 +FBAS0_SX37 +FBAS0_SX397 +FBCG1_SI1612 +FBCG1_SI2242 +FBCG1_SI982 +FBCG1_SX172 +FBCG1_SX262 +FBCG1_SX352 +FBCG1_SX442 +FBCG1_SX82 +FBCH0_SI1586 +FBCH0_SI956 +FBCH0_SI959 +FBCH0_SX146 +FBCH0_SX236 +FBCH0_SX326 +FBCH0_SX416 +FBCH0_SX56 +FBJL0_SI1552 +FBJL0_SI2182 +FBJL0_SI922 +FBJL0_SX112 +FBJL0_SX202 +FBJL0_SX22 +FBJL0_SX292 +FBJL0_SX382 +FBLV0_SI1058 +FBLV0_SI1688 +FBLV0_SI2318 +FBLV0_SX158 +FBLV0_SX248 +FBLV0_SX338 +FBLV0_SX428 +FBLV0_SX68 +FBMH0_SI1136 +FBMH0_SI1766 +FBMH0_SI970 +FBMH0_SX146 +FBMH0_SX236 +FBMH0_SX326 +FBMH0_SX416 +FBMH0_SX56 +FBMJ0_SI1776 +FBMJ0_SI516 +FBMJ0_SI815 +FBMJ0_SX156 +FBMJ0_SX246 +FBMJ0_SX336 +FBMJ0_SX426 +FBMJ0_SX66 +FCAG0_SI1503 +FCAG0_SI1641 +FCAG0_SI2133 +FCAG0_SX153 +FCAG0_SX243 +FCAG0_SX333 +FCAG0_SX423 +FCAG0_SX63 +FCAJ0_SI1479 +FCAJ0_SI1804 +FCAJ0_SI849 +FCAJ0_SX129 +FCAJ0_SX219 +FCAJ0_SX309 +FCAJ0_SX39 +FCAJ0_SX399 +FCDR1_SI1186 +FCDR1_SI1816 +FCDR1_SI556 +FCDR1_SX106 +FCDR1_SX16 +FCDR1_SX196 +FCDR1_SX286 +FCDR1_SX376 +FCEG0_SI1248 +FCEG0_SI1878 +FCEG0_SI618 +FCEG0_SX168 +FCEG0_SX258 +FCEG0_SX348 +FCEG0_SX438 +FCEG0_SX78 +FCJF0_SI1027 +FCJF0_SI1657 +FCJF0_SI648 +FCJF0_SX127 +FCJF0_SX217 +FCJF0_SX307 +FCJF0_SX37 +FCJF0_SX397 +FCJS0_SI1607 +FCJS0_SI2237 +FCJS0_SI977 +FCJS0_SX167 +FCJS0_SX257 +FCJS0_SX347 +FCJS0_SX437 +FCJS0_SX77 +FCKE0_SI1111 +FCKE0_SI1741 +FCKE0_SI481 +FCKE0_SX121 +FCKE0_SX211 +FCKE0_SX301 +FCKE0_SX31 +FCKE0_SX391 +FCLT0_SI1438 +FCLT0_SI2068 +FCLT0_SI808 +FCLT0_SX178 +FCLT0_SX268 +FCLT0_SX358 +FCLT0_SX448 +FCLT0_SX88 +FCMG0_SI1142 +FCMG0_SI1242 +FCMG0_SI1872 +FCMG0_SX162 +FCMG0_SX252 +FCMG0_SX342 +FCMG0_SX432 +FCMG0_SX72 +FCMM0_SI1083 +FCMM0_SI1957 +FCMM0_SI453 +FCMM0_SX183 +FCMM0_SX273 +FCMM0_SX363 +FCMM0_SX420 +FCMM0_SX93 +FCRZ0_SI1913 +FCRZ0_SI2053 +FCRZ0_SI793 +FCRZ0_SX163 +FCRZ0_SX253 +FCRZ0_SX343 +FCRZ0_SX433 +FCRZ0_SX73 +FCYL0_SI1297 +FCYL0_SI1927 +FCYL0_SI667 +FCYL0_SX127 +FCYL0_SX217 +FCYL0_SX349 +FCYL0_SX37 +FCYL0_SX397 +FDAS1_SI1461 +FDAS1_SI2091 +FDAS1_SI831 +FDAS1_SX111 +FDAS1_SX201 +FDAS1_SX21 +FDAS1_SX291 +FDAS1_SX381 +FDAW0_SI1271 +FDAW0_SI1406 +FDAW0_SI2036 +FDAW0_SX146 +FDAW0_SX236 +FDAW0_SX326 +FDAW0_SX416 +FDAW0_SX56 +FDFB0_SI1318 +FDFB0_SI1948 +FDFB0_SI2010 +FDFB0_SX148 +FDFB0_SX238 +FDFB0_SX328 +FDFB0_SX418 +FDFB0_SX58 +FDJH0_SI1565 +FDJH0_SI2195 +FDJH0_SI935 +FDJH0_SX125 +FDJH0_SX215 +FDJH0_SX305 +FDJH0_SX35 +FDJH0_SX395 +FDKN0_SI1081 +FDKN0_SI1202 +FDKN0_SI1711 +FDKN0_SX181 +FDKN0_SX271 +FDKN0_SX361 +FDKN0_SX451 +FDKN0_SX91 +FDML0_SI1149 +FDML0_SI1779 +FDML0_SI2075 +FDML0_SX159 +FDML0_SX249 +FDML0_SX339 +FDML0_SX429 +FDML0_SX69 +FDMY0_SI1197 +FDMY0_SI567 +FDMY0_SI714 +FDMY0_SX117 +FDMY0_SX207 +FDMY0_SX27 +FDMY0_SX297 +FDMY0_SX387 +FDNC0_SI1278 +FDNC0_SI1908 +FDNC0_SI2287 +FDNC0_SX108 +FDNC0_SX18 +FDNC0_SX198 +FDNC0_SX288 +FDNC0_SX378 +FDTD0_SI1561 +FDTD0_SI2191 +FDTD0_SI931 +FDTD0_SX121 +FDTD0_SX211 +FDTD0_SX301 +FDTD0_SX321 +FDTD0_SX391 +FDXW0_SI1511 +FDXW0_SI2141 +FDXW0_SI881 +FDXW0_SX161 +FDXW0_SX251 +FDXW0_SX341 +FDXW0_SX431 +FDXW0_SX71 +FEAC0_SI1245 +FEAC0_SI1875 +FEAC0_SI615 +FEAC0_SX165 +FEAC0_SX255 +FEAC0_SX345 +FEAC0_SX435 +FEAC0_SX75 
+FEAR0_SI1252 +FEAR0_SI1882 +FEAR0_SI622 +FEAR0_SX172 +FEAR0_SX262 +FEAR0_SX352 +FEAR0_SX442 +FEAR0_SX82 +FECD0_SI1418 +FECD0_SI2048 +FECD0_SI788 +FECD0_SX158 +FECD0_SX248 +FECD0_SX338 +FECD0_SX428 +FECD0_SX68 +FEEH0_SI1112 +FEEH0_SI1742 +FEEH0_SI471 +FEEH0_SX122 +FEEH0_SX212 +FEEH0_SX302 +FEEH0_SX32 +FEEH0_SX392 +FEME0_SI1505 +FEME0_SI2135 +FEME0_SI875 +FEME0_SX155 +FEME0_SX245 +FEME0_SX335 +FEME0_SX425 +FEME0_SX65 +FETB0_SI1148 +FETB0_SI1778 +FETB0_SI518 +FETB0_SX158 +FETB0_SX248 +FETB0_SX338 +FETB0_SX428 +FETB0_SX68 +FEXM0_SI1101 +FEXM0_SI1731 +FEXM0_SI482 +FEXM0_SX111 +FEXM0_SX201 +FEXM0_SX291 +FEXM0_SX366 +FEXM0_SX381 +FGCS0_SI1486 +FGCS0_SI2116 +FGCS0_SI856 +FGCS0_SX136 +FGCS0_SX226 +FGCS0_SX316 +FGCS0_SX406 +FGCS0_SX46 +FGDP0_SI1618 +FGDP0_SI2248 +FGDP0_SI988 +FGDP0_SX178 +FGDP0_SX268 +FGDP0_SX358 +FGDP0_SX448 +FGDP0_SX88 +FGMB0_SI1145 +FGMB0_SI1775 +FGMB0_SI515 +FGMB0_SX155 +FGMB0_SX245 +FGMB0_SX335 +FGMB0_SX425 +FGMB0_SX65 +FGRW0_SI1152 +FGRW0_SI1782 +FGRW0_SI1990 +FGRW0_SX162 +FGRW0_SX252 +FGRW0_SX342 +FGRW0_SX432 +FGRW0_SX72 +FHLM0_SI1560 +FHLM0_SI2190 +FHLM0_SI930 +FHLM0_SX120 +FHLM0_SX210 +FHLM0_SX300 +FHLM0_SX349 +FHLM0_SX390 +FHXS0_SI1075 +FHXS0_SI2302 +FHXS0_SI2335 +FHXS0_SX175 +FHXS0_SX265 +FHXS0_SX355 +FHXS0_SX445 +FHXS0_SX85 +FJDM2_SI1582 +FJDM2_SI1964 +FJDM2_SI2212 +FJDM2_SX142 +FJDM2_SX232 +FJDM2_SX322 +FJDM2_SX412 +FJDM2_SX52 +FJEN0_SI1047 +FJEN0_SI1677 +FJEN0_SI2307 +FJEN0_SX147 +FJEN0_SX237 +FJEN0_SX327 +FJEN0_SX417 +FJEN0_SX57 +FJHK0_SI1022 +FJHK0_SI1652 +FJHK0_SI2282 +FJHK0_SX122 +FJHK0_SX212 +FJHK0_SX302 +FJHK0_SX32 +FJHK0_SX392 +FJKL0_SI1562 +FJKL0_SI2192 +FJKL0_SI932 +FJKL0_SX122 +FJKL0_SX212 +FJKL0_SX302 +FJKL0_SX32 +FJKL0_SX392 +FJLG0_SI1506 +FJLG0_SI1889 +FJLG0_SI2306 +FJLG0_SX179 +FJLG0_SX269 +FJLG0_SX359 +FJLG0_SX449 +FJLG0_SX89 +FJLR0_SI1231 +FJLR0_SI1861 +FJLR0_SI601 +FJLR0_SX151 +FJLR0_SX241 +FJLR0_SX331 +FJLR0_SX421 +FJLR0_SX61 +FJRB0_SI1302 +FJRB0_SI1932 +FJRB0_SI672 +FJRB0_SX132 +FJRB0_SX222 +FJRB0_SX312 +FJRB0_SX402 +FJRB0_SX42 +FJRP1_SI1432 +FJRP1_SI2062 +FJRP1_SI802 +FJRP1_SX172 +FJRP1_SX262 +FJRP1_SX352 +FJRP1_SX442 +FJRP1_SX82 +FJSK0_SI1052 +FJSK0_SI1682 +FJSK0_SI2312 +FJSK0_SX152 +FJSK0_SX242 +FJSK0_SX332 +FJSK0_SX422 +FJSK0_SX62 +FJSP0_SI1434 +FJSP0_SI1763 +FJSP0_SI804 +FJSP0_SX174 +FJSP0_SX264 +FJSP0_SX354 +FJSP0_SX444 +FJSP0_SX84 +FJWB1_SI2055 +FJWB1_SI748 +FJWB1_SI795 +FJWB1_SX165 +FJWB1_SX255 +FJWB1_SX345 +FJWB1_SX435 +FJWB1_SX75 +FJXM0_SI1211 +FJXM0_SI1971 +FJXM0_SI581 +FJXM0_SX131 +FJXM0_SX221 +FJXM0_SX311 +FJXM0_SX401 +FJXM0_SX41 +FJXP0_SI1122 +FJXP0_SI1752 +FJXP0_SI492 +FJXP0_SX132 +FJXP0_SX222 +FJXP0_SX312 +FJXP0_SX402 +FJXP0_SX42 +FKAA0_SI1208 +FKAA0_SI1838 +FKAA0_SI578 +FKAA0_SX128 +FKAA0_SX218 +FKAA0_SX308 +FKAA0_SX38 +FKAA0_SX398 +FKDE0_SI1141 +FKDE0_SI1771 +FKDE0_SI2221 +FKDE0_SX151 +FKDE0_SX241 +FKDE0_SX331 +FKDE0_SX421 +FKDE0_SX61 +FKDW0_SI1207 +FKDW0_SI1891 +FKDW0_SI577 +FKDW0_SX127 +FKDW0_SX217 +FKDW0_SX307 +FKDW0_SX37 +FKDW0_SX397 +FKFB0_SI1608 +FKFB0_SI2238 +FKFB0_SI978 +FKFB0_SX168 +FKFB0_SX258 +FKFB0_SX348 +FKFB0_SX438 +FKFB0_SX78 +FKKH0_SI1290 +FKKH0_SI1920 +FKKH0_SI660 +FKKH0_SX120 +FKKH0_SX210 +FKKH0_SX30 +FKKH0_SX300 +FKKH0_SX390 +FKLC0_SI1615 +FKLC0_SI2245 +FKLC0_SI985 +FKLC0_SX175 +FKLC0_SX265 +FKLC0_SX355 +FKLC0_SX445 +FKLC0_SX85 +FKLC1_SI1048 +FKLC1_SI1678 +FKLC1_SI2308 +FKLC1_SX148 +FKLC1_SX238 +FKLC1_SX328 +FKLC1_SX418 +FKLC1_SX58 +FKLH0_SI1257 +FKLH0_SI1887 +FKLH0_SI627 +FKLH0_SX177 +FKLH0_SX267 +FKLH0_SX357 +FKLH0_SX447 +FKLH0_SX87 +FKSR0_SI1117 +FKSR0_SI1747 +FKSR0_SI487 +FKSR0_SX161 +FKSR0_SX217 +FKSR0_SX366 
+FKSR0_SX37 +FKSR0_SX397 +FLAC0_SI1339 +FLAC0_SI2161 +FLAC0_SI901 +FLAC0_SX181 +FLAC0_SX271 +FLAC0_SX361 +FLAC0_SX451 +FLAC0_SX91 +FLAG0_SI1464 +FLAG0_SI2094 +FLAG0_SI834 +FLAG0_SX114 +FLAG0_SX204 +FLAG0_SX24 +FLAG0_SX294 +FLAG0_SX384 +FLEH0_SI1051 +FLEH0_SI1681 +FLEH0_SI2311 +FLEH0_SX151 +FLEH0_SX241 +FLEH0_SX331 +FLEH0_SX421 +FLEH0_SX61 +FLET0_SI1137 +FLET0_SI1767 +FLET0_SI507 +FLET0_SX147 +FLET0_SX237 +FLET0_SX277 +FLET0_SX417 +FLET0_SX57 +FLHD0_SI1344 +FLHD0_SI1827 +FLHD0_SI1974 +FLHD0_SX174 +FLHD0_SX264 +FLHD0_SX354 +FLHD0_SX444 +FLHD0_SX84 +FLJA0_SI1078 +FLJA0_SI1708 +FLJA0_SI2338 +FLJA0_SX178 +FLJA0_SX268 +FLJA0_SX358 +FLJA0_SX448 +FLJA0_SX88 +FLJD0_SI1516 +FLJD0_SI2146 +FLJD0_SI886 +FLJD0_SX166 +FLJD0_SX256 +FLJD0_SX346 +FLJD0_SX436 +FLJD0_SX76 +FLJG0_SI1611 +FLJG0_SI2241 +FLJG0_SI981 +FLJG0_SX171 +FLJG0_SX261 +FLJG0_SX351 +FLJG0_SX441 +FLJG0_SX81 +FLKM0_SI1880 +FLKM0_SI620 +FLKM0_SI686 +FLKM0_SX116 +FLKM0_SX260 +FLKM0_SX350 +FLKM0_SX440 +FLKM0_SX80 +FLMA0_SI1243 +FLMA0_SI1873 +FLMA0_SI613 +FLMA0_SX163 +FLMA0_SX253 +FLMA0_SX343 +FLMA0_SX433 +FLMA0_SX73 +FLMC0_SI1372 +FLMC0_SI2002 +FLMC0_SI742 +FLMC0_SX112 +FLMC0_SX22 +FLMC0_SX292 +FLMC0_SX336 +FLMC0_SX382 +FLMK0_SI1035 +FLMK0_SI1229 +FLMK0_SI2295 +FLMK0_SX135 +FLMK0_SX225 +FLMK0_SX315 +FLMK0_SX405 +FLMK0_SX45 +FLOD0_SI1287 +FLOD0_SI1917 +FLOD0_SI657 +FLOD0_SX117 +FLOD0_SX171 +FLOD0_SX207 +FLOD0_SX297 +FLOD0_SX387 +FLTM0_SI1070 +FLTM0_SI1700 +FLTM0_SI2330 +FLTM0_SX170 +FLTM0_SX260 +FLTM0_SX350 +FLTM0_SX440 +FLTM0_SX80 +FMAH1_SI1509 +FMAH1_SI2139 +FMAH1_SI879 +FMAH1_SX159 +FMAH1_SX249 +FMAH1_SX339 +FMAH1_SX429 +FMAH1_SX69 +FMBG0_SI1160 +FMBG0_SI1790 +FMBG0_SI2264 +FMBG0_SX260 +FMBG0_SX3 +FMBG0_SX350 +FMBG0_SX440 +FMBG0_SX80 +FMEM0_SI1377 +FMEM0_SI2007 +FMEM0_SI747 +FMEM0_SX117 +FMEM0_SX207 +FMEM0_SX297 +FMEM0_SX333 +FMEM0_SX387 +FMJB0_SI1177 +FMJB0_SI1807 +FMJB0_SI547 +FMJB0_SX187 +FMJB0_SX277 +FMJB0_SX367 +FMJB0_SX7 +FMJB0_SX97 +FMJF0_SI1254 +FMJF0_SI1884 +FMJF0_SI624 +FMJF0_SX174 +FMJF0_SX264 +FMJF0_SX354 +FMJF0_SX444 +FMJF0_SX84 +FMJU0_SI1389 +FMJU0_SI2019 +FMJU0_SI759 +FMJU0_SX129 +FMJU0_SX219 +FMJU0_SX309 +FMJU0_SX39 +FMJU0_SX399 +FMKC0_SI1041 +FMKC0_SI1072 +FMKC0_SI1702 +FMKC0_SX172 +FMKC0_SX262 +FMKC0_SX352 +FMKC0_SX442 +FMKC0_SX82 +FMKF0_SI1018 +FMKF0_SI1536 +FMKF0_SI906 +FMKF0_SX186 +FMKF0_SX276 +FMKF0_SX366 +FMKF0_SX6 +FMKF0_SX96 +FMMH0_SI1537 +FMMH0_SI2167 +FMMH0_SI907 +FMMH0_SX187 +FMMH0_SX367 +FMMH0_SX420 +FMMH0_SX7 +FMMH0_SX97 +FMPG0_SI1602 +FMPG0_SI2232 +FMPG0_SI972 +FMPG0_SX162 +FMPG0_SX252 +FMPG0_SX342 +FMPG0_SX432 +FMPG0_SX72 +FNKL0_SI1522 +FNKL0_SI2152 +FNKL0_SI892 +FNKL0_SX172 +FNKL0_SX196 +FNKL0_SX262 +FNKL0_SX442 +FNKL0_SX82 +FNTB0_SI1203 +FNTB0_SI573 +FNTB0_SI679 +FNTB0_SX123 +FNTB0_SX213 +FNTB0_SX303 +FNTB0_SX33 +FNTB0_SX393 +FPAB1_SI1471 +FPAB1_SI2101 +FPAB1_SI841 +FPAB1_SX121 +FPAB1_SX211 +FPAB1_SX301 +FPAB1_SX31 +FPAB1_SX391 +FPAC0_SI1921 +FPAC0_SI2011 +FPAC0_SI661 +FPAC0_SX121 +FPAC0_SX211 +FPAC0_SX301 +FPAC0_SX31 +FPAC0_SX391 +FPAD0_SI1346 +FPAD0_SI1976 +FPAD0_SI716 +FPAD0_SX176 +FPAD0_SX266 +FPAD0_SX356 +FPAD0_SX446 +FPAD0_SX86 +FPAF0_SI1054 +FPAF0_SI1684 +FPAF0_SI2314 +FPAF0_SX154 +FPAF0_SX244 +FPAF0_SX334 +FPAF0_SX424 +FPAF0_SX64 +FPAZ0_SI1593 +FPAZ0_SI2223 +FPAZ0_SI963 +FPAZ0_SX153 +FPAZ0_SX243 +FPAZ0_SX27 +FPAZ0_SX423 +FPAZ0_SX63 +FPJF0_SI1046 +FPJF0_SI1259 +FPJF0_SI1676 +FPJF0_SX146 +FPJF0_SX236 +FPJF0_SX326 +FPJF0_SX352 +FPJF0_SX56 +FPLS0_SI1590 +FPLS0_SI2220 +FPLS0_SI960 +FPLS0_SX150 +FPLS0_SX240 +FPLS0_SX3 +FPLS0_SX330 +FPLS0_SX60 +FPMY0_SI1153 +FPMY0_SI1783 +FPMY0_SI523 +FPMY0_SX163 +FPMY0_SX196 
+FPMY0_SX253 +FPMY0_SX343 +FPMY0_SX73 +FREH0_SI1315 +FREH0_SI1945 +FREH0_SI685 +FREH0_SX145 +FREH0_SX235 +FREH0_SX325 +FREH0_SX415 +FREH0_SX55 +FRJB0_SI1427 +FRJB0_SI1470 +FRJB0_SI1794 +FRJB0_SX167 +FRJB0_SX257 +FRJB0_SX347 +FRJB0_SX437 +FRJB0_SX77 +FRLL0_SI1514 +FRLL0_SI805 +FRLL0_SI884 +FRLL0_SX164 +FRLL0_SX254 +FRLL0_SX344 +FRLL0_SX434 +FRLL0_SX74 +FSAG0_SI1323 +FSAG0_SI1953 +FSAG0_SI693 +FSAG0_SX153 +FSAG0_SX243 +FSAG0_SX333 +FSAG0_SX423 +FSAG0_SX63 +FSAH0_SI1244 +FSAH0_SI1874 +FSAH0_SI614 +FSAH0_SX164 +FSAH0_SX327 +FSAH0_SX344 +FSAH0_SX434 +FSAH0_SX74 +FSAK0_SI1300 +FSAK0_SI1930 +FSAK0_SI670 +FSAK0_SX130 +FSAK0_SX220 +FSAK0_SX310 +FSAK0_SX40 +FSAK0_SX400 +FSBK0_SI1069 +FSBK0_SI1699 +FSBK0_SI2329 +FSBK0_SX169 +FSBK0_SX259 +FSBK0_SX349 +FSBK0_SX439 +FSBK0_SX79 +FSCN0_SI1886 +FSCN0_SI626 +FSCN0_SI705 +FSCN0_SX176 +FSCN0_SX266 +FSCN0_SX356 +FSCN0_SX446 +FSCN0_SX86 +FSDC0_SI1312 +FSDC0_SI1942 +FSDC0_SI2234 +FSDC0_SX142 +FSDC0_SX232 +FSDC0_SX322 +FSDC0_SX412 +FSDC0_SX52 +FSDJ0_SI1115 +FSDJ0_SI1745 +FSDJ0_SI485 +FSDJ0_SX125 +FSDJ0_SX215 +FSDJ0_SX305 +FSDJ0_SX35 +FSDJ0_SX395 +FSGF0_SI1557 +FSGF0_SI2187 +FSGF0_SI927 +FSGF0_SX117 +FSGF0_SX207 +FSGF0_SX27 +FSGF0_SX297 +FSGF0_SX387 +FSJG0_SI1570 +FSJG0_SI2200 +FSJG0_SI940 +FSJG0_SX130 +FSJG0_SX220 +FSJG0_SX310 +FSJG0_SX40 +FSJG0_SX400 +FSJK1_SI1025 +FSJK1_SI2285 +FSJK1_SI696 +FSJK1_SX125 +FSJK1_SX215 +FSJK1_SX305 +FSJK1_SX35 +FSJK1_SX395 +FSJS0_SI1171 +FSJS0_SI1801 +FSJS0_SI541 +FSJS0_SX181 +FSJS0_SX271 +FSJS0_SX361 +FSJS0_SX451 +FSJS0_SX91 +FSJW0_SI1333 +FSJW0_SI1963 +FSJW0_SI703 +FSJW0_SX163 +FSJW0_SX253 +FSJW0_SX343 +FSJW0_SX433 +FSJW0_SX73 +FSKC0_SI1416 +FSKC0_SI2046 +FSKC0_SI786 +FSKC0_SX156 +FSKC0_SX246 +FSKC0_SX336 +FSKC0_SX426 +FSKC0_SX66 +FSKL0_SI1529 +FSKL0_SI2159 +FSKL0_SI899 +FSKL0_SX179 +FSKL0_SX269 +FSKL0_SX359 +FSKL0_SX449 +FSKL0_SX89 +FSKP0_SI1098 +FSKP0_SI1728 +FSKP0_SI468 +FSKP0_SX108 +FSKP0_SX18 +FSKP0_SX198 +FSKP0_SX288 +FSKP0_SX378 +FSLS0_SI1056 +FSLS0_SI1686 +FSLS0_SI2316 +FSLS0_SX156 +FSLS0_SX202 +FSLS0_SX246 +FSLS0_SX426 +FSLS0_SX66 +FSMA0_SI1621 +FSMA0_SI2251 +FSMA0_SI991 +FSMA0_SX181 +FSMA0_SX271 +FSMA0_SX361 +FSMA0_SX451 +FSMA0_SX91 +FSMM0_SI1314 +FSMM0_SI1944 +FSMM0_SI684 +FSMM0_SX144 +FSMM0_SX234 +FSMM0_SX324 +FSMM0_SX414 +FSMM0_SX54 +FSMS1_SI1504 +FSMS1_SI2134 +FSMS1_SI874 +FSMS1_SX154 +FSMS1_SX244 +FSMS1_SX334 +FSMS1_SX347 +FSMS1_SX64 +FSPM0_SI1241 +FSPM0_SI1871 +FSPM0_SI611 +FSPM0_SX161 +FSPM0_SX251 +FSPM0_SX341 +FSPM0_SX431 +FSPM0_SX71 +FSRH0_SI1719 +FSRH0_SI1931 +FSRH0_SI671 +FSRH0_SX131 +FSRH0_SX221 +FSRH0_SX311 +FSRH0_SX401 +FSRH0_SX41 +FSSB0_SI1082 +FSSB0_SI1712 +FSSB0_SI2342 +FSSB0_SX182 +FSSB0_SX272 +FSSB0_SX362 +FSSB0_SX452 +FSSB0_SX92 +FTAJ0_SI1329 +FTAJ0_SI474 +FTAJ0_SI699 +FTAJ0_SX159 +FTAJ0_SX249 +FTAJ0_SX339 +FTAJ0_SX429 +FTAJ0_SX69 +FTBR0_SI1402 +FTBR0_SI2181 +FTBR0_SI921 +FTBR0_SX111 +FTBR0_SX201 +FTBR0_SX21 +FTBR0_SX291 +FTBR0_SX381 +FTBW0_SI1345 +FTBW0_SI1975 +FTBW0_SI715 +FTBW0_SX175 +FTBW0_SX265 +FTBW0_SX355 +FTBW0_SX445 +FTBW0_SX85 +FTLG0_SI1743 +FTLG0_SI483 +FTLG0_SI840 +FTLG0_SX123 +FTLG0_SX213 +FTLG0_SX303 +FTLG0_SX33 +FTLG0_SX393 +FTMG0_SI1532 +FTMG0_SI2162 +FTMG0_SI902 +FTMG0_SX182 +FTMG0_SX272 +FTMG0_SX362 +FTMG0_SX452 +FTMG0_SX92 +FVFB0_SI1032 +FVFB0_SI1510 +FVFB0_SI2292 +FVFB0_SX132 +FVFB0_SX222 +FVFB0_SX312 +FVFB0_SX402 +FVFB0_SX42 +FVKB0_SI1159 +FVKB0_SI1789 +FVKB0_SI529 +FVKB0_SX169 +FVKB0_SX259 +FVKB0_SX349 +FVKB0_SX439 +FVKB0_SX79 +FVMH0_SI1466 +FVMH0_SI2096 +FVMH0_SI836 +FVMH0_SX116 +FVMH0_SX206 +FVMH0_SX26 +FVMH0_SX296 +FVMH0_SX386 +MABC0_SI1620 +MABC0_SI2041 +MABC0_SI781 
+MABC0_SX151 +MABC0_SX241 +MABC0_SX331 +MABC0_SX421 +MABC0_SX61 +MADC0_SI1367 +MADC0_SI1997 +MADC0_SI737 +MADC0_SX107 +MADC0_SX17 +MADC0_SX197 +MADC0_SX287 +MADC0_SX377 +MADD0_SI1295 +MADD0_SI1798 +MADD0_SI538 +MADD0_SX178 +MADD0_SX268 +MADD0_SX358 +MADD0_SX448 +MADD0_SX88 +MAEB0_SI1411 +MAEB0_SI2250 +MAEB0_SI990 +MAEB0_SX180 +MAEB0_SX270 +MAEB0_SX360 +MAEB0_SX450 +MAEB0_SX90 +MAEO0_SI1326 +MAEO0_SI1655 +MAEO0_SI1956 +MAEO0_SX156 +MAEO0_SX246 +MAEO0_SX336 +MAEO0_SX426 +MAEO0_SX66 +MAFM0_SI1569 +MAFM0_SI2199 +MAFM0_SI939 +MAFM0_SX129 +MAFM0_SX219 +MAFM0_SX309 +MAFM0_SX39 +MAFM0_SX399 +MAJP0_SI1074 +MAJP0_SI1704 +MAJP0_SI2334 +MAJP0_SX174 +MAJP0_SX264 +MAJP0_SX354 +MAJP0_SX444 +MAJP0_SX84 +MAKB0_SI1016 +MAKB0_SI1646 +MAKB0_SI2276 +MAKB0_SX116 +MAKB0_SX206 +MAKB0_SX26 +MAKB0_SX296 +MAKB0_SX386 +MAKR0_SI1352 +MAKR0_SI1982 +MAKR0_SI722 +MAKR0_SX182 +MAKR0_SX272 +MAKR0_SX362 +MAKR0_SX452 +MAKR0_SX92 +MAPV0_SI1293 +MAPV0_SI1923 +MAPV0_SI663 +MAPV0_SX123 +MAPV0_SX213 +MAPV0_SX303 +MAPV0_SX33 +MAPV0_SX393 +MARC0_SI1188 +MARC0_SI1818 +MARC0_SI558 +MARC0_SX108 +MARC0_SX18 +MARC0_SX198 +MARC0_SX288 +MARC0_SX378 +MARW0_SI1276 +MARW0_SI1906 +MARW0_SI646 +MARW0_SX106 +MARW0_SX16 +MARW0_SX286 +MARW0_SX349 +MARW0_SX376 +MBAR0_SI1319 +MBAR0_SI1949 +MBAR0_SI689 +MBAR0_SX149 +MBAR0_SX239 +MBAR0_SX329 +MBAR0_SX419 +MBAR0_SX59 +MBBR0_SI1055 +MBBR0_SI1685 +MBBR0_SI2315 +MBBR0_SX155 +MBBR0_SX245 +MBBR0_SX335 +MBBR0_SX425 +MBBR0_SX65 +MBCG0_SI2217 +MBCG0_SI486 +MBCG0_SI957 +MBCG0_SX147 +MBCG0_SX237 +MBCG0_SX327 +MBCG0_SX417 +MBCG0_SX57 +MBEF0_SI1281 +MBEF0_SI1911 +MBEF0_SI651 +MBEF0_SX111 +MBEF0_SX201 +MBEF0_SX21 +MBEF0_SX291 +MBEF0_SX381 +MBGT0_SI1341 +MBGT0_SI1841 +MBGT0_SI711 +MBGT0_SX171 +MBGT0_SX261 +MBGT0_SX351 +MBGT0_SX441 +MBGT0_SX81 +MBJV0_SI1247 +MBJV0_SI1877 +MBJV0_SI617 +MBJV0_SX167 +MBJV0_SX257 +MBJV0_SX347 +MBJV0_SX437 +MBJV0_SX77 +MBMA0_SI1222 +MBMA0_SI1852 +MBMA0_SI592 +MBMA0_SX142 +MBMA0_SX232 +MBMA0_SX322 +MBMA0_SX412 +MBMA0_SX52 +MBMA1_SI2207 +MBMA1_SI2214 +MBMA1_SI954 +MBMA1_SX144 +MBMA1_SX234 +MBMA1_SX324 +MBMA1_SX414 +MBMA1_SX54 +MBML0_SI1169 +MBML0_SI1799 +MBML0_SI539 +MBML0_SX179 +MBML0_SX269 +MBML0_SX359 +MBML0_SX449 +MBML0_SX89 +MBOM0_SI1014 +MBOM0_SI1644 +MBOM0_SI2274 +MBOM0_SX114 +MBOM0_SX204 +MBOM0_SX294 +MBOM0_SX311 +MBOM0_SX384 +MBSB0_SI1353 +MBSB0_SI1983 +MBSB0_SI723 +MBSB0_SX183 +MBSB0_SX273 +MBSB0_SX3 +MBSB0_SX363 +MBSB0_SX93 +MBTH0_SI2102 +MBTH0_SI505 +MBTH0_SI757 +MBTH0_SX122 +MBTH0_SX212 +MBTH0_SX302 +MBTH0_SX32 +MBTH0_SX392 +MBWP0_SI1531 +MBWP0_SI1969 +MBWP0_SI709 +MBWP0_SX169 +MBWP0_SX259 +MBWP0_SX349 +MBWP0_SX439 +MBWP0_SX79 +MCAE0_SI1447 +MCAE0_SI2077 +MCAE0_SI817 +MCAE0_SX187 +MCAE0_SX277 +MCAE0_SX367 +MCAE0_SX7 +MCAE0_SX97 +MCAL0_SI1138 +MCAL0_SI1768 +MCAL0_SI508 +MCAL0_SX148 +MCAL0_SX238 +MCAL0_SX328 +MCAL0_SX418 +MCAL0_SX58 +MCDC0_SI1292 +MCDC0_SI1922 +MCDC0_SI662 +MCDC0_SX122 +MCDC0_SX212 +MCDC0_SX302 +MCDC0_SX32 +MCDC0_SX392 +MCDD0_SI1513 +MCDD0_SI2143 +MCDD0_SI883 +MCDD0_SX163 +MCDD0_SX253 +MCDD0_SX343 +MCDD0_SX433 +MCDD0_SX73 +MCDR0_SI1154 +MCDR0_SI1784 +MCDR0_SI524 +MCDR0_SX164 +MCDR0_SX254 +MCDR0_SX344 +MCDR0_SX434 +MCDR0_SX74 +MCEF0_SI1135 +MCEF0_SI1765 +MCEF0_SI842 +MCEF0_SX145 +MCEF0_SX235 +MCEF0_SX325 +MCEF0_SX415 +MCEF0_SX55 +MCEW0_SI1442 +MCEW0_SI2072 +MCEW0_SI812 +MCEW0_SX182 +MCEW0_SX272 +MCEW0_SX362 +MCEW0_SX452 +MCEW0_SX92 +MCHL0_SI1347 +MCHL0_SI1404 +MCHL0_SI1977 +MCHL0_SX177 +MCHL0_SX267 +MCHL0_SX357 +MCHL0_SX447 +MCHL0_SX87 +MCLK0_SI1660 +MCLK0_SI2290 +MCLK0_SI650 +MCLK0_SX130 +MCLK0_SX220 +MCLK0_SX310 +MCLK0_SX40 +MCLK0_SX400 +MCLM0_SI1456 
+MCLM0_SI2086 +MCLM0_SI826 +MCLM0_SX106 +MCLM0_SX16 +MCLM0_SX196 +MCLM0_SX286 +MCLM0_SX376 +MCPM0_SI1194 +MCPM0_SI1824 +MCPM0_SI564 +MCPM0_SX114 +MCPM0_SX204 +MCPM0_SX24 +MCPM0_SX294 +MCPM0_SX384 +MCRE0_SI1121 +MCRE0_SI1725 +MCRE0_SI1751 +MCRE0_SX131 +MCRE0_SX221 +MCRE0_SX24 +MCRE0_SX401 +MCRE0_SX41 +MCSS0_SI1380 +MCSS0_SI688 +MCSS0_SI750 +MCSS0_SX120 +MCSS0_SX210 +MCSS0_SX30 +MCSS0_SX300 +MCSS0_SX390 +MCTH0_SI1209 +MCTH0_SI1839 +MCTH0_SI579 +MCTH0_SX129 +MCTH0_SX219 +MCTH0_SX309 +MCTH0_SX39 +MCTH0_SX399 +MCTM0_SI1350 +MCTM0_SI1980 +MCTM0_SI720 +MCTM0_SX180 +MCTM0_SX270 +MCTM0_SX360 +MCTM0_SX450 +MCTM0_SX90 +MCXM0_SI1351 +MCXM0_SI1981 +MCXM0_SI721 +MCXM0_SX181 +MCXM0_SX271 +MCXM0_SX361 +MCXM0_SX451 +MCXM0_SX91 +MDAC0_SI1261 +MDAC0_SI1837 +MDAC0_SI631 +MDAC0_SX181 +MDAC0_SX271 +MDAC0_SX361 +MDAC0_SX451 +MDAC0_SX91 +MDAS0_SI1266 +MDAS0_SI1896 +MDAS0_SI636 +MDAS0_SX186 +MDAS0_SX21 +MDAS0_SX276 +MDAS0_SX6 +MDAS0_SX96 +MDBB1_SI1006 +MDBB1_SI1636 +MDBB1_SI2056 +MDBB1_SX106 +MDBB1_SX16 +MDBB1_SX196 +MDBB1_SX286 +MDBB1_SX376 +MDBP0_SI1158 +MDBP0_SI1788 +MDBP0_SI528 +MDBP0_SX168 +MDBP0_SX258 +MDBP0_SX348 +MDBP0_SX438 +MDBP0_SX78 +MDCD0_SI1415 +MDCD0_SI2045 +MDCD0_SI785 +MDCD0_SX155 +MDCD0_SX245 +MDCD0_SX335 +MDCD0_SX425 +MDCD0_SX65 +MDCM0_SI1480 +MDCM0_SI2110 +MDCM0_SI850 +MDCM0_SX130 +MDCM0_SX220 +MDCM0_SX310 +MDCM0_SX40 +MDCM0_SX400 +MDDC0_SI1419 +MDDC0_SI2049 +MDDC0_SI789 +MDDC0_SX159 +MDDC0_SX249 +MDDC0_SX339 +MDDC0_SX429 +MDDC0_SX69 +MDED0_SI1170 +MDED0_SI1800 +MDED0_SI540 +MDED0_SX180 +MDED0_SX270 +MDED0_SX360 +MDED0_SX450 +MDED0_SX90 +MDEF0_SI1123 +MDEF0_SI1563 +MDEF0_SI2193 +MDEF0_SX123 +MDEF0_SX213 +MDEF0_SX303 +MDEF0_SX33 +MDEF0_SX393 +MDEM0_SI1868 +MDEM0_SI608 +MDEM0_SI800 +MDEM0_SX158 +MDEM0_SX248 +MDEM0_SX338 +MDEM0_SX428 +MDEM0_SX68 +MDHL0_SI1439 +MDHL0_SI2069 +MDHL0_SI809 +MDHL0_SX179 +MDHL0_SX269 +MDHL0_SX359 +MDHL0_SX449 +MDHL0_SX89 +MDHS0_SI1530 +MDHS0_SI2160 +MDHS0_SI900 +MDHS0_SX180 +MDHS0_SX270 +MDHS0_SX360 +MDHS0_SX450 +MDHS0_SX90 +MDJM0_SI1455 +MDJM0_SI2085 +MDJM0_SI825 +MDJM0_SX105 +MDJM0_SX15 +MDJM0_SX195 +MDJM0_SX285 +MDJM0_SX375 +MDKS0_SI1066 +MDKS0_SI1696 +MDKS0_SI2326 +MDKS0_SX166 +MDKS0_SX256 +MDKS0_SX346 +MDKS0_SX436 +MDKS0_SX76 +MDLB0_SI1306 +MDLB0_SI1936 +MDLB0_SI676 +MDLB0_SX136 +MDLB0_SX226 +MDLB0_SX316 +MDLB0_SX406 +MDLB0_SX46 +MDLC0_SI1395 +MDLC0_SI2025 +MDLC0_SI765 +MDLC0_SX135 +MDLC0_SX225 +MDLC0_SX315 +MDLC0_SX405 +MDLC0_SX45 +MDLC1_SI1435 +MDLC1_SI2065 +MDLC1_SI2144 +MDLC1_SX175 +MDLC1_SX265 +MDLC1_SX355 +MDLC1_SX445 +MDLC1_SX85 +MDLC2_SI1614 +MDLC2_SI2244 +MDLC2_SI984 +MDLC2_SX174 +MDLC2_SX264 +MDLC2_SX354 +MDLC2_SX444 +MDLC2_SX84 +MDLH0_SI1960 +MDLH0_SI574 +MDLH0_SI700 +MDLH0_SX160 +MDLH0_SX250 +MDLH0_SX340 +MDLH0_SX430 +MDLH0_SX70 +MDLM0_SI1234 +MDLM0_SI1864 +MDLM0_SI604 +MDLM0_SX154 +MDLM0_SX244 +MDLM0_SX334 +MDLM0_SX424 +MDLM0_SX64 +MDLR0_SI1233 +MDLR0_SI1863 +MDLR0_SI603 +MDLR0_SX153 +MDLR0_SX243 +MDLR0_SX333 +MDLR0_SX423 +MDLR0_SX63 +MDLR1_SI1299 +MDLR1_SI1929 +MDLR1_SI669 +MDLR1_SX129 +MDLR1_SX219 +MDLR1_SX309 +MDLR1_SX39 +MDLR1_SX399 +MDMA0_SI1238 +MDMA0_SI1430 +MDMA0_SI2060 +MDMA0_SX170 +MDMA0_SX260 +MDMA0_SX350 +MDMA0_SX440 +MDMA0_SX80 +MDMT0_SI1832 +MDMT0_SI2341 +MDMT0_SI572 +MDMT0_SX122 +MDMT0_SX212 +MDMT0_SX302 +MDMT0_SX32 +MDMT0_SX392 +MDNS0_SI1011 +MDNS0_SI2271 +MDNS0_SI873 +MDNS0_SX111 +MDNS0_SX201 +MDNS0_SX21 +MDNS0_SX291 +MDNS0_SX381 +MDPB0_SI1760 +MDPB0_SI2126 +MDPB0_SI866 +MDPB0_SX146 +MDPB0_SX236 +MDPB0_SX326 +MDPB0_SX416 +MDPB0_SX56 +MDPK0_SI1053 +MDPK0_SI1683 +MDPK0_SI552 +MDPK0_SX153 +MDPK0_SX243 +MDPK0_SX333 +MDPK0_SX423 +MDPK0_SX63 
+MDPS0_SI1651 +MDPS0_SI1979 +MDPS0_SI719 +MDPS0_SX179 +MDPS0_SX269 +MDPS0_SX359 +MDPS0_SX449 +MDPS0_SX89 +MDRD0_SI1382 +MDRD0_SI2012 +MDRD0_SI752 +MDRD0_SX122 +MDRD0_SX212 +MDRD0_SX302 +MDRD0_SX32 +MDRD0_SX392 +MDSJ0_SI1462 +MDSJ0_SI2092 +MDSJ0_SI832 +MDSJ0_SX112 +MDSJ0_SX22 +MDSJ0_SX292 +MDSJ0_SX382 +MDSJ0_SX438 +MDSS0_SI1881 +MDSS0_SI2087 +MDSS0_SI621 +MDSS0_SX171 +MDSS0_SX261 +MDSS0_SX351 +MDSS0_SX441 +MDSS0_SX81 +MDSS1_SI1327 +MDSS1_SI1713 +MDSS1_SI697 +MDSS1_SX157 +MDSS1_SX247 +MDSS1_SX337 +MDSS1_SX427 +MDSS1_SX67 +MDTB0_SI1200 +MDTB0_SI1830 +MDTB0_SI570 +MDTB0_SX120 +MDTB0_SX210 +MDTB0_SX300 +MDTB0_SX321 +MDTB0_SX390 +MDWD0_SI1260 +MDWD0_SI1890 +MDWD0_SI557 +MDWD0_SX180 +MDWD0_SX270 +MDWD0_SX360 +MDWD0_SX450 +MDWD0_SX90 +MDWH0_SI1168 +MDWH0_SI1925 +MDWH0_SI665 +MDWH0_SX125 +MDWH0_SX215 +MDWH0_SX305 +MDWH0_SX35 +MDWH0_SX395 +MDWM0_SI1546 +MDWM0_SI2176 +MDWM0_SI916 +MDWM0_SX106 +MDWM0_SX16 +MDWM0_SX286 +MDWM0_SX376 +MDWM0_SX433 +MEAL0_SI1547 +MEAL0_SI2177 +MEAL0_SI917 +MEAL0_SX107 +MEAL0_SX197 +MEAL0_SX287 +MEAL0_SX347 +MEAL0_SX377 +MEDR0_SI1374 +MEDR0_SI2004 +MEDR0_SI744 +MEDR0_SX114 +MEDR0_SX204 +MEDR0_SX24 +MEDR0_SX294 +MEDR0_SX384 +MEFG0_SI465 +MEFG0_SI491 +MEFG0_SI598 +MEFG0_SX105 +MEFG0_SX15 +MEFG0_SX195 +MEFG0_SX285 +MEFG0_SX375 +MEGJ0_SI1337 +MEGJ0_SI1967 +MEGJ0_SI707 +MEGJ0_SX167 +MEGJ0_SX257 +MEGJ0_SX3 +MEGJ0_SX437 +MEGJ0_SX77 +MEJL0_SI1592 +MEJL0_SI1654 +MEJL0_SI962 +MEJL0_SX152 +MEJL0_SX242 +MEJL0_SX332 +MEJL0_SX422 +MEJL0_SX62 +MEJS0_SI1240 +MEJS0_SI1870 +MEJS0_SI610 +MEJS0_SX160 +MEJS0_SX250 +MEJS0_SX340 +MEJS0_SX430 +MEJS0_SX70 +MESG0_SI1332 +MESG0_SI1962 +MESG0_SI702 +MESG0_SX162 +MESG0_SX252 +MESG0_SX342 +MESG0_SX432 +MESG0_SX72 +MESJ0_SI2039 +MESJ0_SI2257 +MESJ0_SI997 +MESJ0_SX187 +MESJ0_SX277 +MESJ0_SX367 +MESJ0_SX7 +MESJ0_SX97 +MEWM0_SI1348 +MEWM0_SI1978 +MEWM0_SI718 +MEWM0_SX178 +MEWM0_SX268 +MEWM0_SX358 +MEWM0_SX448 +MEWM0_SX88 +MFER0_SI1492 +MFER0_SI2122 +MFER0_SI862 +MFER0_SX142 +MFER0_SX232 +MFER0_SX322 +MFER0_SX412 +MFER0_SX52 +MFMC0_SI1132 +MFMC0_SI1762 +MFMC0_SI502 +MFMC0_SX142 +MFMC0_SX232 +MFMC0_SX322 +MFMC0_SX412 +MFMC0_SX52 +MFRM0_SI1155 +MFRM0_SI1717 +MFRM0_SI1785 +MFRM0_SX165 +MFRM0_SX255 +MFRM0_SX345 +MFRM0_SX435 +MFRM0_SX75 +MFWK0_SI1249 +MFWK0_SI1879 +MFWK0_SI619 +MFWK0_SX169 +MFWK0_SX259 +MFWK0_SX349 +MFWK0_SX439 +MFWK0_SX79 +MFXS0_SI1674 +MFXS0_SI2225 +MFXS0_SI2304 +MFXS0_SX144 +MFXS0_SX234 +MFXS0_SX324 +MFXS0_SX414 +MFXS0_SX54 +MFXV0_SI1005 +MFXV0_SI1342 +MFXV0_SI1635 +MFXV0_SX105 +MFXV0_SX15 +MFXV0_SX195 +MFXV0_SX285 +MFXV0_SX375 +MGAF0_SI1282 +MGAF0_SI1912 +MGAF0_SI652 +MGAF0_SX112 +MGAF0_SX202 +MGAF0_SX22 +MGAF0_SX292 +MGAF0_SX382 +MGAG0_SI1321 +MGAG0_SI645 +MGAG0_SI691 +MGAG0_SX151 +MGAG0_SX241 +MGAG0_SX331 +MGAG0_SX421 +MGAG0_SX61 +MGAK0_SI1036 +MGAK0_SI1666 +MGAK0_SI2296 +MGAK0_SX136 +MGAK0_SX226 +MGAK0_SX316 +MGAK0_SX406 +MGAK0_SX46 +MGAR0_SI1212 +MGAR0_SI1694 +MGAR0_SI1842 +MGAR0_SX132 +MGAR0_SX222 +MGAR0_SX312 +MGAR0_SX402 +MGAR0_SX42 +MGAW0_SI1165 +MGAW0_SI1802 +MGAW0_SI535 +MGAW0_SX175 +MGAW0_SX265 +MGAW0_SX355 +MGAW0_SX445 +MGAW0_SX85 +MGES0_SI1481 +MGES0_SI2111 +MGES0_SI851 +MGES0_SX131 +MGES0_SX221 +MGES0_SX311 +MGES0_SX401 +MGES0_SX41 +MGJC0_SI1256 +MGJC0_SI1335 +MGJC0_SI1965 +MGJC0_SX165 +MGJC0_SX255 +MGJC0_SX345 +MGJC0_SX435 +MGJC0_SX75 +MGRL0_SI1497 +MGRL0_SI2127 +MGRL0_SI867 +MGRL0_SX147 +MGRL0_SX237 +MGRL0_SX327 +MGRL0_SX417 +MGRL0_SX57 +MGRP0_SI1317 +MGRP0_SI1947 +MGRP0_SI687 +MGRP0_SX147 +MGRP0_SX237 +MGRP0_SX327 +MGRP0_SX417 +MGRP0_SX57 +MGSH0_SI1176 +MGSH0_SI1806 +MGSH0_SI546 +MGSH0_SX127 +MGSH0_SX186 +MGSH0_SX276 
+MGSH0_SX6 +MGSH0_SX96 +MGSL0_SI1164 +MGSL0_SI534 +MGSL0_SI797 +MGSL0_SX174 +MGSL0_SX264 +MGSL0_SX354 +MGSL0_SX444 +MGSL0_SX84 +MGXP0_SI1087 +MGXP0_SI457 +MGXP0_SI525 +MGXP0_SX187 +MGXP0_SX277 +MGXP0_SX367 +MGXP0_SX7 +MGXP0_SX97 +MHBS0_SI1575 +MHBS0_SI2205 +MHBS0_SI945 +MHBS0_SX135 +MHBS0_SX225 +MHBS0_SX315 +MHBS0_SX405 +MHBS0_SX45 +MHIT0_SI1613 +MHIT0_SI2243 +MHIT0_SI983 +MHIT0_SX173 +MHIT0_SX263 +MHIT0_SX353 +MHIT0_SX443 +MHIT0_SX83 +MHJB0_SI1017 +MHJB0_SI1647 +MHJB0_SI2277 +MHJB0_SX117 +MHJB0_SX207 +MHJB0_SX27 +MHJB0_SX297 +MHJB0_SX387 +MHMG0_SI1365 +MHMG0_SI1995 +MHMG0_SI735 +MHMG0_SX105 +MHMG0_SX15 +MHMG0_SX195 +MHMG0_SX285 +MHMG0_SX375 +MHMR0_SI1119 +MHMR0_SI1692 +MHMR0_SI489 +MHMR0_SX129 +MHMR0_SX219 +MHMR0_SX309 +MHMR0_SX39 +MHMR0_SX399 +MHRM0_SI1475 +MHRM0_SI2218 +MHRM0_SI958 +MHRM0_SX148 +MHRM0_SX238 +MHRM0_SX328 +MHRM0_SX418 +MHRM0_SX58 +MHXL0_SI1772 +MHXL0_SI512 +MHXL0_SI612 +MHXL0_SX152 +MHXL0_SX242 +MHXL0_SX332 +MHXL0_SX422 +MHXL0_SX62 +MILB0_SI2163 +MILB0_SI807 +MILB0_SI903 +MILB0_SX183 +MILB0_SX273 +MILB0_SX3 +MILB0_SX363 +MILB0_SX93 +MJAC0_SI1331 +MJAC0_SI2148 +MJAC0_SI701 +MJAC0_SX251 +MJAC0_SX307 +MJAC0_SX341 +MJAC0_SX431 +MJAC0_SX71 +MJAE0_SI1524 +MJAE0_SI1999 +MJAE0_SI2154 +MJAE0_SX174 +MJAE0_SX264 +MJAE0_SX354 +MJAE0_SX444 +MJAE0_SX84 +MJAI0_SI1604 +MJAI0_SI682 +MJAI0_SI710 +MJAI0_SX164 +MJAI0_SX254 +MJAI0_SX344 +MJAI0_SX434 +MJAI0_SX74 +MJBG0_SI1232 +MJBG0_SI1724 +MJBG0_SI1862 +MJBG0_SX152 +MJBG0_SX242 +MJBG0_SX332 +MJBG0_SX422 +MJBG0_SX62 +MJDA0_SI1031 +MJDA0_SI1661 +MJDA0_SI2291 +MJDA0_SX131 +MJDA0_SX221 +MJDA0_SX311 +MJDA0_SX401 +MJDA0_SX41 +MJDC0_SI1161 +MJDC0_SI2165 +MJDC0_SI531 +MJDC0_SX171 +MJDC0_SX261 +MJDC0_SX351 +MJDC0_SX441 +MJDC0_SX81 +MJDE0_SI1120 +MJDE0_SI463 +MJDE0_SI490 +MJDE0_SX130 +MJDE0_SX220 +MJDE0_SX310 +MJDE0_SX40 +MJDE0_SX400 +MJDG0_SI1042 +MJDG0_SI1672 +MJDG0_SI1705 +MJDG0_SX142 +MJDG0_SX232 +MJDG0_SX322 +MJDG0_SX412 +MJDG0_SX52 +MJDM0_SI1340 +MJDM0_SI1937 +MJDM0_SI974 +MJDM0_SX170 +MJDM0_SX260 +MJDM0_SX350 +MJDM0_SX440 +MJDM0_SX80 +MJEB0_SI1286 +MJEB0_SI1916 +MJEB0_SI656 +MJEB0_SX170 +MJEB0_SX206 +MJEB0_SX26 +MJEB0_SX296 +MJEB0_SX386 +MJEB1_SI1467 +MJEB1_SI2097 +MJEB1_SI837 +MJEB1_SX117 +MJEB1_SX207 +MJEB1_SX27 +MJEB1_SX297 +MJEB1_SX387 +MJEE0_SI1237 +MJEE0_SI1867 +MJEE0_SI607 +MJEE0_SX157 +MJEE0_SX247 +MJEE0_SX337 +MJEE0_SX427 +MJEE0_SX67 +MJFH0_SI1107 +MJFH0_SI1737 +MJFH0_SI477 +MJFH0_SX117 +MJFH0_SX207 +MJFH0_SX27 +MJFH0_SX297 +MJFH0_SX387 +MJFR0_SI1605 +MJFR0_SI2235 +MJFR0_SI975 +MJFR0_SX165 +MJFR0_SX255 +MJFR0_SX345 +MJFR0_SX435 +MJFR0_SX75 +MJHI0_SI1328 +MJHI0_SI555 +MJHI0_SI698 +MJHI0_SX158 +MJHI0_SX248 +MJHI0_SX338 +MJHI0_SX428 +MJHI0_SX68 +MJJB0_SI1139 +MJJB0_SI1277 +MJJB0_SI1769 +MJJB0_SX149 +MJJB0_SX239 +MJJB0_SX329 +MJJB0_SX419 +MJJB0_SX59 +MJJJ0_SI1163 +MJJJ0_SI1793 +MJJJ0_SI533 +MJJJ0_SX173 +MJJJ0_SX263 +MJJJ0_SX353 +MJJJ0_SX443 +MJJJ0_SX83 +MJJM0_SI1251 +MJJM0_SI1457 +MJJM0_SI827 +MJJM0_SX107 +MJJM0_SX17 +MJJM0_SX197 +MJJM0_SX287 +MJJM0_SX377 +MJKR0_SI1201 +MJKR0_SI1831 +MJKR0_SI571 +MJKR0_SX121 +MJKR0_SX211 +MJKR0_SX301 +MJKR0_SX31 +MJKR0_SX391 +MJLB0_SI1616 +MJLB0_SI2246 +MJLB0_SI986 +MJLB0_SX176 +MJLB0_SX266 +MJLB0_SX356 +MJLB0_SX446 +MJLB0_SX86 +MJLG1_SI1012 +MJLG1_SI1642 +MJLG1_SI2272 +MJLG1_SX112 +MJLG1_SX202 +MJLG1_SX22 +MJLG1_SX292 +MJLG1_SX382 +MJLS0_SI1096 +MJLS0_SI1726 +MJLS0_SI466 +MJLS0_SX106 +MJLS0_SX16 +MJLS0_SX196 +MJLS0_SX286 +MJLS0_SX376 +MJMA0_SI1495 +MJMA0_SI2125 +MJMA0_SI865 +MJMA0_SX145 +MJMA0_SX235 +MJMA0_SX325 +MJMA0_SX415 +MJMA0_SX55 +MJMD0_SI1028 +MJMD0_SI1658 +MJMD0_SI2288 +MJMD0_SX128 +MJMD0_SX218 
+MJMD0_SX308 +MJMD0_SX38 +MJMD0_SX398 +MJMM0_SI1255 +MJMM0_SI1885 +MJMM0_SI625 +MJMM0_SX175 +MJMM0_SX265 +MJMM0_SX355 +MJMM0_SX445 +MJMM0_SX85 +MJPG0_SI1191 +MJPG0_SI1821 +MJPG0_SI561 +MJPG0_SX111 +MJPG0_SX201 +MJPG0_SX21 +MJPG0_SX291 +MJPG0_SX381 +MJPM0_SI1368 +MJPM0_SI1998 +MJPM0_SI738 +MJPM0_SX108 +MJPM0_SX18 +MJPM0_SX198 +MJPM0_SX288 +MJPM0_SX378 +MJPM1_SI1897 +MJPM1_SI2280 +MJPM1_SI761 +MJPM1_SX131 +MJPM1_SX221 +MJPM1_SX311 +MJPM1_SX401 +MJPM1_SX41 +MJRA0_SI1236 +MJRA0_SI1866 +MJRA0_SI606 +MJRA0_SX156 +MJRA0_SX246 +MJRA0_SX336 +MJRA0_SX426 +MJRA0_SX66 +MJRG0_SI1366 +MJRG0_SI1996 +MJRG0_SI736 +MJRG0_SX106 +MJRG0_SX16 +MJRG0_SX286 +MJRG0_SX352 +MJRG0_SX376 +MJRH0_SI1125 +MJRH0_SI1755 +MJRH0_SI1840 +MJRH0_SX135 +MJRH0_SX225 +MJRH0_SX315 +MJRH0_SX405 +MJRH0_SX45 +MJRH1_SI1558 +MJRH1_SI1774 +MJRH1_SI514 +MJRH1_SX154 +MJRH1_SX244 +MJRH1_SX334 +MJRH1_SX424 +MJRH1_SX64 +MJRK0_SI1662 +MJRK0_SI2103 +MJRK0_SI880 +MJRK0_SX160 +MJRK0_SX250 +MJRK0_SX340 +MJRK0_SX430 +MJRK0_SX70 +MJRP0_SI1835 +MJRP0_SI1845 +MJRP0_SI585 +MJRP0_SX135 +MJRP0_SX225 +MJRP0_SX315 +MJRP0_SX405 +MJRP0_SX45 +MJSR0_SI1424 +MJSR0_SI2054 +MJSR0_SI794 +MJSR0_SX164 +MJSR0_SX254 +MJSR0_SX344 +MJSR0_SX434 +MJSR0_SX74 +MJWG0_SI2155 +MJWG0_SI813 +MJWG0_SI895 +MJWG0_SX175 +MJWG0_SX265 +MJWG0_SX355 +MJWG0_SX445 +MJWG0_SX85 +MJWS0_SI1143 +MJWS0_SI1773 +MJWS0_SI513 +MJWS0_SX153 +MJWS0_SX243 +MJWS0_SX333 +MJWS0_SX423 +MJWS0_SX63 +MJWT0_SI1291 +MJWT0_SI1381 +MJWT0_SI751 +MJWT0_SX121 +MJWT0_SX211 +MJWT0_SX301 +MJWT0_SX31 +MJWT0_SX391 +MJXA0_SI1507 +MJXA0_SI2137 +MJXA0_SI877 +MJXA0_SX157 +MJXA0_SX247 +MJXA0_SX337 +MJXA0_SX427 +MJXA0_SX67 +MJXL0_SI1172 +MJXL0_SI1795 +MJXL0_SI542 +MJXL0_SX182 +MJXL0_SX272 +MJXL0_SX362 +MJXL0_SX452 +MJXL0_SX92 +MKAG0_SI1609 +MKAG0_SI2239 +MKAG0_SI979 +MKAG0_SX169 +MKAG0_SX259 +MKAG0_SX30 +MKAG0_SX439 +MKAG0_SX79 +MKAH0_SI1528 +MKAH0_SI2158 +MKAH0_SI898 +MKAH0_SX178 +MKAH0_SX268 +MKAH0_SX358 +MKAH0_SX448 +MKAH0_SX88 +MKAJ0_SI1414 +MKAJ0_SI2044 +MKAJ0_SI784 +MKAJ0_SX154 +MKAJ0_SX244 +MKAJ0_SX334 +MKAJ0_SX424 +MKAJ0_SX64 +MKAM0_SI1250 +MKAM0_SI1316 +MKAM0_SI1465 +MKAM0_SX146 +MKAM0_SX236 +MKAM0_SX326 +MKAM0_SX416 +MKAM0_SX56 +MKDB0_SI2132 +MKDB0_SI588 +MKDB0_SI872 +MKDB0_SX152 +MKDB0_SX242 +MKDB0_SX332 +MKDB0_SX422 +MKDB0_SX62 +MKDD0_SI1567 +MKDD0_SI2197 +MKDD0_SI937 +MKDD0_SX127 +MKDD0_SX217 +MKDD0_SX307 +MKDD0_SX37 +MKDD0_SX397 +MKDT0_SI2153 +MKDT0_SI814 +MKDT0_SI893 +MKDT0_SX173 +MKDT0_SX263 +MKDT0_SX353 +MKDT0_SX443 +MKDT0_SX83 +MKES0_SI1253 +MKES0_SI1883 +MKES0_SI623 +MKES0_SX173 +MKES0_SX263 +MKES0_SX353 +MKES0_SX443 +MKES0_SX83 +MKJO0_SI1517 +MKJO0_SI2147 +MKJO0_SI887 +MKJO0_SX167 +MKJO0_SX257 +MKJO0_SX424 +MKJO0_SX437 +MKJO0_SX77 +MKLN0_SI1598 +MKLN0_SI2228 +MKLN0_SI968 +MKLN0_SX158 +MKLN0_SX248 +MKLN0_SX338 +MKLN0_SX428 +MKLN0_SX68 +MKLR0_SI1059 +MKLR0_SI1689 +MKLR0_SI2319 +MKLR0_SX159 +MKLR0_SX249 +MKLR0_SX339 +MKLR0_SX429 +MKLR0_SX69 +MKLS0_SI1437 +MKLS0_SI1533 +MKLS0_SI2067 +MKLS0_SX177 +MKLS0_SX267 +MKLS0_SX357 +MKLS0_SX447 +MKLS0_SX87 +MKLS1_SI1545 +MKLS1_SI2175 +MKLS1_SI915 +MKLS1_SX105 +MKLS1_SX15 +MKLS1_SX195 +MKLS1_SX285 +MKLS1_SX375 +MKLW0_SI1571 +MKLW0_SI1844 +MKLW0_SI2201 +MKLW0_SX131 +MKLW0_SX221 +MKLW0_SX311 +MKLW0_SX401 +MKLW0_SX41 +MKRG0_SI1491 +MKRG0_SI2121 +MKRG0_SI861 +MKRG0_SX141 +MKRG0_SX231 +MKRG0_SX31 +MKRG0_SX411 +MKRG0_SX51 +MKXL0_SI1185 +MKXL0_SI1815 +MKXL0_SI1958 +MKXL0_SX105 +MKXL0_SX15 +MKXL0_SX195 +MKXL0_SX285 +MKXL0_SX375 +MLBC0_SI1239 +MLBC0_SI1869 +MLBC0_SI609 +MLBC0_SX159 +MLBC0_SX249 +MLBC0_SX339 +MLBC0_SX429 +MLBC0_SX69 +MLEL0_SI1246 +MLEL0_SI1876 +MLEL0_SI616 
+MLEL0_SX166 +MLEL0_SX256 +MLEL0_SX346 +MLEL0_SX436 +MLEL0_SX76 +MLJC0_SI1225 +MLJC0_SI1855 +MLJC0_SI595 +MLJC0_SX145 +MLJC0_SX235 +MLJC0_SX325 +MLJC0_SX415 +MLJC0_SX55 +MLJH0_SI1324 +MLJH0_SI1422 +MLJH0_SI694 +MLJH0_SX154 +MLJH0_SX244 +MLJH0_SX334 +MLJH0_SX424 +MLJH0_SX64 +MLNS0_SI1407 +MLNS0_SI2037 +MLNS0_SI777 +MLNS0_SX147 +MLNS0_SX237 +MLNS0_SX327 +MLNS0_SX417 +MLNS0_SX57 +MLSH0_SI1417 +MLSH0_SI2047 +MLSH0_SI787 +MLSH0_SX157 +MLSH0_SX247 +MLSH0_SX337 +MLSH0_SX427 +MLSH0_SX67 +MMAA0_SI1588 +MMAA0_SI2105 +MMAA0_SI845 +MMAA0_SX125 +MMAA0_SX215 +MMAA0_SX305 +MMAA0_SX35 +MMAA0_SX395 +MMAB1_SI1494 +MMAB1_SI2124 +MMAB1_SI864 +MMAB1_SX144 +MMAB1_SX234 +MMAB1_SX324 +MMAB1_SX414 +MMAB1_SX54 +MMAG0_SI1126 +MMAG0_SI1756 +MMAG0_SI496 +MMAG0_SX136 +MMAG0_SX226 +MMAG0_SX316 +MMAG0_SX406 +MMAG0_SX46 +MMAM0_SI1597 +MMAM0_SI1668 +MMAM0_SI2227 +MMAM0_SX157 +MMAM0_SX247 +MMAM0_SX337 +MMAM0_SX427 +MMAM0_SX67 +MMAR0_SI1336 +MMAR0_SI1966 +MMAR0_SI706 +MMAR0_SX166 +MMAR0_SX256 +MMAR0_SX346 +MMAR0_SX436 +MMAR0_SX76 +MMBS0_SI1151 +MMBS0_SI1781 +MMBS0_SI521 +MMBS0_SX161 +MMBS0_SX251 +MMBS0_SX341 +MMBS0_SX431 +MMBS0_SX71 +MMCC0_SI1338 +MMCC0_SI1968 +MMCC0_SI708 +MMCC0_SX168 +MMCC0_SX258 +MMCC0_SX348 +MMCC0_SX438 +MMCC0_SX78 +MMDB0_SI1358 +MMDB0_SI1617 +MMDB0_SI987 +MMDB0_SX177 +MMDB0_SX267 +MMDB0_SX357 +MMDB0_SX447 +MMDB0_SX87 +MMDG0_SI1780 +MMDG0_SI2035 +MMDG0_SI520 +MMDG0_SX160 +MMDG0_SX250 +MMDG0_SX340 +MMDG0_SX430 +MMDG0_SX70 +MMDM0_SI1311 +MMDM0_SI1941 +MMDM0_SI681 +MMDM0_SX141 +MMDM0_SX231 +MMDM0_SX321 +MMDM0_SX411 +MMDM0_SX51 +MMDM1_SI1650 +MMDM1_SI2043 +MMDM1_SI783 +MMDM1_SX153 +MMDM1_SX243 +MMDM1_SX333 +MMDM1_SX423 +MMDM1_SX63 +MMDS0_SI1343 +MMDS0_SI1973 +MMDS0_SI713 +MMDS0_SX173 +MMDS0_SX263 +MMDS0_SX353 +MMDS0_SX443 +MMDS0_SX83 +MMEA0_SI1388 +MMEA0_SI2018 +MMEA0_SI758 +MMEA0_SX128 +MMEA0_SX218 +MMEA0_SX308 +MMEA0_SX38 +MMEA0_SX398 +MMEB0_SI1357 +MMEB0_SI1987 +MMEB0_SI727 +MMEB0_SX187 +MMEB0_SX327 +MMEB0_SX367 +MMEB0_SX7 +MMEB0_SX97 +MMGC0_SI1305 +MMGC0_SI1935 +MMGC0_SI2184 +MMGC0_SX135 +MMGC0_SX225 +MMGC0_SX315 +MMGC0_SX405 +MMGC0_SX45 +MMGG0_SI1079 +MMGG0_SI1709 +MMGG0_SI2339 +MMGG0_SX179 +MMGG0_SX269 +MMGG0_SX359 +MMGG0_SX449 +MMGG0_SX89 +MMGK0_SI1322 +MMGK0_SI1952 +MMGK0_SI692 +MMGK0_SX152 +MMGK0_SX242 +MMGK0_SX332 +MMGK0_SX422 +MMGK0_SX62 +MMJB1_SI1408 +MMJB1_SI2038 +MMJB1_SI778 +MMJB1_SX148 +MMJB1_SX238 +MMJB1_SX328 +MMJB1_SX418 +MMJB1_SX58 +MMLM0_SI1527 +MMLM0_SI2150 +MMLM0_SI897 +MMLM0_SX177 +MMLM0_SX267 +MMLM0_SX357 +MMLM0_SX447 +MMLM0_SX87 +MMPM0_SI1061 +MMPM0_SI1691 +MMPM0_SI2321 +MMPM0_SX161 +MMPM0_SX251 +MMPM0_SX341 +MMPM0_SX431 +MMPM0_SX71 +MMRP0_SI2034 +MMRP0_SI717 +MMRP0_SI774 +MMRP0_SX144 +MMRP0_SX234 +MMRP0_SX324 +MMRP0_SX414 +MMRP0_SX54 +MMSM0_SI1106 +MMSM0_SI1736 +MMSM0_SI476 +MMSM0_SX116 +MMSM0_SX206 +MMSM0_SX26 +MMSM0_SX296 +MMSM0_SX386 +MMVP0_SI1284 +MMVP0_SI1914 +MMVP0_SI654 +MMVP0_SX114 +MMVP0_SX204 +MMVP0_SX294 +MMVP0_SX347 +MMVP0_SX384 +MMWB0_SI1619 +MMWB0_SI2249 +MMWB0_SI989 +MMWB0_SX179 +MMWB0_SX269 +MMWB0_SX359 +MMWB0_SX449 +MMWB0_SX89 +MMWS0_SI1518 +MMWS0_SI559 +MMWS0_SI888 +MMWS0_SX168 +MMWS0_SX258 +MMWS0_SX348 +MMWS0_SX438 +MMWS0_SX78 +MMWS1_SI1071 +MMWS1_SI1701 +MMWS1_SI2331 +MMWS1_SX261 +MMWS1_SX27 +MMWS1_SX351 +MMWS1_SX441 +MMWS1_SX81 +MMXS0_SI2136 +MMXS0_SI629 +MMXS0_SI876 +MMXS0_SX156 +MMXS0_SX246 +MMXS0_SX336 +MMXS0_SX426 +MMXS0_SX66 +MNET0_SI1446 +MNET0_SI2076 +MNET0_SI816 +MNET0_SX186 +MNET0_SX276 +MNET0_SX366 +MNET0_SX6 +MNET0_SX96 +MNTW0_SI1068 +MNTW0_SI1698 +MNTW0_SI2328 +MNTW0_SX168 +MNTW0_SX202 +MNTW0_SX258 +MNTW0_SX348 +MNTW0_SX78 +MPAR0_SI1576 
+MPAR0_SI2206 +MPAR0_SI946 +MPAR0_SX136 +MPAR0_SX226 +MPAR0_SX316 +MPAR0_SX406 +MPAR0_SX46 +MPEB0_SI1034 +MPEB0_SI1860 +MPEB0_SI600 +MPEB0_SX150 +MPEB0_SX240 +MPEB0_SX330 +MPEB0_SX420 +MPEB0_SX60 +MPFU0_SI1258 +MPFU0_SI1888 +MPFU0_SI628 +MPFU0_SX178 +MPFU0_SX268 +MPFU0_SX358 +MPFU0_SX448 +MPFU0_SX88 +MPGH0_SI1554 +MPGH0_SI675 +MPGH0_SI924 +MPGH0_SX114 +MPGH0_SX204 +MPGH0_SX24 +MPGH0_SX294 +MPGH0_SX384 +MPGR0_SI1410 +MPGR0_SI2040 +MPGR0_SI780 +MPGR0_SX150 +MPGR0_SX240 +MPGR0_SX330 +MPGR0_SX420 +MPGR0_SX60 +MPGR1_SI1269 +MPGR1_SI1499 +MPGR1_SI2129 +MPGR1_SX149 +MPGR1_SX239 +MPGR1_SX329 +MPGR1_SX419 +MPGR1_SX59 +MPMB0_SI1501 +MPMB0_SI2131 +MPMB0_SI871 +MPMB0_SX151 +MPMB0_SX241 +MPMB0_SX331 +MPMB0_SX421 +MPMB0_SX61 +MPPC0_SI1412 +MPPC0_SI2042 +MPPC0_SI782 +MPPC0_SX152 +MPPC0_SX242 +MPPC0_SX332 +MPPC0_SX422 +MPPC0_SX62 +MPRB0_SI1205 +MPRB0_SI1215 +MPRB0_SI575 +MPRB0_SX125 +MPRB0_SX215 +MPRB0_SX305 +MPRB0_SX35 +MPRB0_SX395 +MPRD0_SI1431 +MPRD0_SI2061 +MPRD0_SI801 +MPRD0_SX171 +MPRD0_SX261 +MPRD0_SX351 +MPRD0_SX441 +MPRD0_SX81 +MPRK0_SI1097 +MPRK0_SI1727 +MPRK0_SI467 +MPRK0_SX107 +MPRK0_SX17 +MPRK0_SX197 +MPRK0_SX287 +MPRK0_SX377 +MPRT0_SI1210 +MPRT0_SI495 +MPRT0_SI580 +MPRT0_SX130 +MPRT0_SX220 +MPRT0_SX310 +MPRT0_SX40 +MPRT0_SX400 +MPSW0_SI1067 +MPSW0_SI1697 +MPSW0_SI2327 +MPSW0_SX167 +MPSW0_SX24 +MPSW0_SX257 +MPSW0_SX437 +MPSW0_SX77 +MRAB0_SI1224 +MRAB0_SI1854 +MRAB0_SI594 +MRAB0_SX144 +MRAB0_SX234 +MRAB0_SX324 +MRAB0_SX414 +MRAB0_SX54 +MRAB1_SI1478 +MRAB1_SI2108 +MRAB1_SI848 +MRAB1_SX128 +MRAB1_SX218 +MRAB1_SX308 +MRAB1_SX38 +MRAB1_SX398 +MRAI0_SI1954 +MRAI0_SI2052 +MRAI0_SI792 +MRAI0_SX162 +MRAI0_SX252 +MRAI0_SX342 +MRAI0_SX432 +MRAI0_SX72 +MRAM0_SI1275 +MRAM0_SI1905 +MRAM0_SI1951 +MRAM0_SX105 +MRAM0_SX15 +MRAM0_SX195 +MRAM0_SX285 +MRAM0_SX375 +MRAV0_SI1008 +MRAV0_SI1638 +MRAV0_SI2268 +MRAV0_SX108 +MRAV0_SX18 +MRAV0_SX198 +MRAV0_SX288 +MRAV0_SX378 +MRBC0_SI1665 +MRBC0_SI1859 +MRBC0_SI599 +MRBC0_SX149 +MRBC0_SX239 +MRBC0_SX329 +MRBC0_SX419 +MRBC0_SX59 +MRCG0_SI1428 +MRCG0_SI2058 +MRCG0_SI798 +MRCG0_SX168 +MRCG0_SX258 +MRCG0_SX348 +MRCG0_SX438 +MRCG0_SX78 +MRCW0_SI1371 +MRCW0_SI2001 +MRCW0_SI741 +MRCW0_SX111 +MRCW0_SX201 +MRCW0_SX21 +MRCW0_SX291 +MRCW0_SX381 +MRDD0_SI1050 +MRDD0_SI1680 +MRDD0_SI2310 +MRDD0_SX150 +MRDD0_SX240 +MRDD0_SX277 +MRDD0_SX330 +MRDD0_SX60 +MRDM0_SI1044 +MRDM0_SI1595 +MRDM0_SI965 +MRDM0_SX155 +MRDM0_SX245 +MRDM0_SX335 +MRDM0_SX425 +MRDM0_SX65 +MRDS0_SI1167 +MRDS0_SI1797 +MRDS0_SI537 +MRDS0_SX177 +MRDS0_SX267 +MRDS0_SX357 +MRDS0_SX447 +MRDS0_SX87 +MREE0_SI1104 +MREE0_SI1734 +MREE0_SI1959 +MREE0_SX114 +MREE0_SX204 +MREE0_SX24 +MREE0_SX294 +MREE0_SX384 +MREH1_SI1599 +MREH1_SI2229 +MREH1_SI969 +MREH1_SX159 +MREH1_SX249 +MREH1_SX339 +MREH1_SX429 +MREH1_SX69 +MREM0_SI1591 +MREM0_SI511 +MREM0_SI961 +MREM0_SX151 +MREM0_SX241 +MREM0_SX331 +MREM0_SX421 +MREM0_SX61 +MREW1_SI1500 +MREW1_SI2130 +MREW1_SI870 +MREW1_SX150 +MREW1_SX240 +MREW1_SX330 +MREW1_SX420 +MREW1_SX60 +MRFK0_SI1076 +MRFK0_SI1706 +MRFK0_SI2336 +MRFK0_SX176 +MRFK0_SX266 +MRFK0_SX356 +MRFK0_SX446 +MRFK0_SX86 +MRFL0_SI1156 +MRFL0_SI1786 +MRFL0_SI526 +MRFL0_SX166 +MRFL0_SX256 +MRFL0_SX346 +MRFL0_SX436 +MRFL0_SX76 +MRGM0_SI1162 +MRGM0_SI1792 +MRGM0_SI532 +MRGM0_SX172 +MRGM0_SX262 +MRGM0_SX416 +MRGM0_SX442 +MRGM0_SX82 +MRGS0_SI1356 +MRGS0_SI1986 +MRGS0_SI726 +MRGS0_SX186 +MRGS0_SX276 +MRGS0_SX366 +MRGS0_SX6 +MRGS0_SX96 +MRHL0_SI1515 +MRHL0_SI2145 +MRHL0_SI885 +MRHL0_SX165 +MRHL0_SX255 +MRHL0_SX345 +MRHL0_SX435 +MRHL0_SX75 +MRJB1_SI1020 +MRJB1_SI1413 +MRJB1_SI2021 +MRJB1_SX120 +MRJB1_SX210 +MRJB1_SX30 +MRJB1_SX300 
+MRJB1_SX390 +MRJH0_SI1519 +MRJH0_SI889 +MRJH0_SI914 +MRJH0_SX169 +MRJH0_SX259 +MRJH0_SX307 +MRJH0_SX439 +MRJH0_SX79 +MRJM0_SI1095 +MRJM0_SI1228 +MRJM0_SI1858 +MRJM0_SX148 +MRJM0_SX238 +MRJM0_SX328 +MRJM0_SX418 +MRJM0_SX58 +MRJM1_SI1298 +MRJM1_SI1928 +MRJM1_SI668 +MRJM1_SX128 +MRJM1_SX218 +MRJM1_SX308 +MRJM1_SX38 +MRJM1_SX398 +MRJT0_SI1498 +MRJT0_SI1805 +MRJT0_SI868 +MRJT0_SX148 +MRJT0_SX238 +MRJT0_SX328 +MRJT0_SX418 +MRJT0_SX58 +MRKM0_SI1267 +MRKM0_SI1391 +MRKM0_SI637 +MRKM0_SX187 +MRKM0_SX277 +MRKM0_SX367 +MRKM0_SX7 +MRKM0_SX97 +MRLD0_SI1594 +MRLD0_SI2224 +MRLD0_SI964 +MRLD0_SX154 +MRLD0_SX244 +MRLD0_SX334 +MRLD0_SX424 +MRLD0_SX64 +MRLJ0_SI1420 +MRLJ0_SI2050 +MRLJ0_SI790 +MRLJ0_SX160 +MRLJ0_SX250 +MRLJ0_SX340 +MRLJ0_SX430 +MRLJ0_SX70 +MRLJ1_SI1671 +MRLJ1_SI2301 +MRLJ1_SI2332 +MRLJ1_SX141 +MRLJ1_SX231 +MRLJ1_SX321 +MRLJ1_SX411 +MRLJ1_SX51 +MRLK0_SI1468 +MRLK0_SI2140 +MRLK0_SI843 +MRLK0_SX123 +MRLK0_SX213 +MRLK0_SX303 +MRLK0_SX33 +MRLK0_SX393 +MRLR0_SI1196 +MRLR0_SI1826 +MRLR0_SI566 +MRLR0_SX116 +MRLR0_SX206 +MRLR0_SX26 +MRLR0_SX296 +MRLR0_SX386 +MRMB0_SI1581 +MRMB0_SI2211 +MRMB0_SI951 +MRMB0_SX141 +MRMB0_SX231 +MRMB0_SX321 +MRMB0_SX411 +MRMB0_SX51 +MRMG0_SI1080 +MRMG0_SI1710 +MRMG0_SI2340 +MRMG0_SX180 +MRMG0_SX270 +MRMG0_SX360 +MRMG0_SX450 +MRMG0_SX90 +MRMH0_SI1021 +MRMH0_SI1349 +MRMH0_SI2281 +MRMH0_SX121 +MRMH0_SX211 +MRMH0_SX301 +MRMH0_SX31 +MRMH0_SX391 +MRML0_SI1421 +MRML0_SI2051 +MRML0_SI791 +MRML0_SX161 +MRML0_SX251 +MRML0_SX341 +MRML0_SX431 +MRML0_SX71 +MRMS0_SI1113 +MRMS0_SI2057 +MRMS0_SI2100 +MRMS0_SX120 +MRMS0_SX210 +MRMS0_SX30 +MRMS0_SX300 +MRMS0_SX390 +MRPC1_SI1482 +MRPC1_SI2026 +MRPC1_SI2112 +MRPC1_SX132 +MRPC1_SX222 +MRPC1_SX312 +MRPC1_SX402 +MRPC1_SX42 +MRRE0_SI1334 +MRRE0_SI704 +MRRE0_SI952 +MRRE0_SX164 +MRRE0_SX254 +MRRE0_SX344 +MRRE0_SX434 +MRRE0_SX74 +MRSO0_SI1206 +MRSO0_SI1659 +MRSO0_SI2289 +MRSO0_SX129 +MRSO0_SX219 +MRSO0_SX309 +MRSO0_SX39 +MRSO0_SX399 +MRSP0_SI1429 +MRSP0_SI2059 +MRSP0_SI799 +MRSP0_SX169 +MRSP0_SX196 +MRSP0_SX259 +MRSP0_SX439 +MRSP0_SX79 +MRTC0_SI1458 +MRTC0_SI2088 +MRTC0_SI828 +MRTC0_SX108 +MRTC0_SX18 +MRTC0_SX198 +MRTC0_SX288 +MRTC0_SX378 +MRTJ0_SI1551 +MRTJ0_SI2032 +MRTJ0_SI772 +MRTJ0_SX142 +MRTJ0_SX232 +MRTJ0_SX322 +MRTJ0_SX412 +MRTJ0_SX52 +MRVG0_SI1140 +MRVG0_SI1770 +MRVG0_SI510 +MRVG0_SX150 +MRVG0_SX240 +MRVG0_SX330 +MRVG0_SX420 +MRVG0_SX60 +MRWA0_SI1603 +MRWA0_SI2233 +MRWA0_SI973 +MRWA0_SX163 +MRWA0_SX253 +MRWA0_SX343 +MRWA0_SX433 +MRWA0_SX73 +MRWS0_SI1102 +MRWS0_SI1732 +MRWS0_SI472 +MRWS0_SX112 +MRWS0_SX202 +MRWS0_SX22 +MRWS0_SX292 +MRWS0_SX382 +MRXB0_SI1585 +MRXB0_SI2215 +MRXB0_SI955 +MRXB0_SX145 +MRXB0_SX235 +MRXB0_SX325 +MRXB0_SX415 +MRXB0_SX55 +MSAH1_SI1049 +MSAH1_SI1679 +MSAH1_SI2309 +MSAH1_SX149 +MSAH1_SX239 +MSAH1_SX329 +MSAH1_SX419 +MSAH1_SX59 +MSAS0_SI1376 +MSAS0_SI2006 +MSAS0_SI746 +MSAS0_SX116 +MSAS0_SX206 +MSAS0_SX26 +MSAS0_SX296 +MSAS0_SX386 +MSAT0_SI1526 +MSAT0_SI2156 +MSAT0_SI896 +MSAT0_SX176 +MSAT0_SX266 +MSAT0_SX356 +MSAT0_SX446 +MSAT0_SX86 +MSAT1_SI1073 +MSAT1_SI1703 +MSAT1_SI2333 +MSAT1_SX173 +MSAT1_SX263 +MSAT1_SX353 +MSAT1_SX443 +MSAT1_SX83 +MSDB0_SI1007 +MSDB0_SI1637 +MSDB0_SI2267 +MSDB0_SX107 +MSDB0_SX17 +MSDB0_SX197 +MSDB0_SX287 +MSDB0_SX377 +MSDH0_SI2113 +MSDH0_SI2240 +MSDH0_SI980 +MSDH0_SX170 +MSDH0_SX260 +MSDH0_SX350 +MSDH0_SX440 +MSDH0_SX80 +MSDS0_SI1077 +MSDS0_SI1707 +MSDS0_SI2337 +MSDS0_SX177 +MSDS0_SX267 +MSDS0_SX357 +MSDS0_SX447 +MSDS0_SX87 +MSEM1_SI1440 +MSEM1_SI2070 +MSEM1_SI810 +MSEM1_SX180 +MSEM1_SX270 +MSEM1_SX360 +MSEM1_SX450 +MSEM1_SX90 +MSES0_SI1589 +MSES0_SI2216 +MSES0_SI2219 +MSES0_SX149 +MSES0_SX239 
+MSES0_SX329 +MSES0_SX419 +MSES0_SX59 +MSFH0_SI1216 +MSFH0_SI1738 +MSFH0_SI586 +MSFH0_SX136 +MSFH0_SX226 +MSFH0_SX316 +MSFH0_SX406 +MSFH0_SX46 +MSFV0_SI1262 +MSFV0_SI1892 +MSFV0_SI632 +MSFV0_SX182 +MSFV0_SX272 +MSFV0_SX362 +MSFV0_SX452 +MSFV0_SX92 +MSJK0_SI1596 +MSJK0_SI2226 +MSJK0_SI966 +MSJK0_SX156 +MSJK0_SX246 +MSJK0_SX336 +MSJK0_SX426 +MSJK0_SX66 +MSMC0_SI1907 +MSMC0_SI509 +MSMC0_SI647 +MSMC0_SX107 +MSMC0_SX17 +MSMC0_SX197 +MSMC0_SX287 +MSMC0_SX377 +MSMR0_SI1150 +MSMR0_SI1405 +MSMR0_SI775 +MSMR0_SX145 +MSMR0_SX235 +MSMR0_SX325 +MSMR0_SX415 +MSMR0_SX55 +MSMS0_SI1433 +MSMS0_SI2063 +MSMS0_SI803 +MSMS0_SX173 +MSMS0_SX263 +MSMS0_SX353 +MSMS0_SX443 +MSMS0_SX83 +MSRG0_SI1221 +MSRG0_SI1851 +MSRG0_SI591 +MSRG0_SX141 +MSRG0_SX231 +MSRG0_SX321 +MSRG0_SX411 +MSRG0_SX51 +MSRR0_SI1131 +MSRR0_SI1761 +MSRR0_SI501 +MSRR0_SX141 +MSRR0_SX231 +MSRR0_SX30 +MSRR0_SX411 +MSRR0_SX51 +MSTF0_SI1396 +MSTF0_SI766 +MSTF0_SI852 +MSTF0_SX136 +MSTF0_SX226 +MSTF0_SX316 +MSTF0_SX406 +MSTF0_SX46 +MSVS0_SI1568 +MSVS0_SI2198 +MSVS0_SI938 +MSVS0_SX128 +MSVS0_SX218 +MSVS0_SX308 +MSVS0_SX38 +MSVS0_SX398 +MTAB0_SI1572 +MTAB0_SI2202 +MTAB0_SI942 +MTAB0_SX132 +MTAB0_SX222 +MTAB0_SX312 +MTAB0_SX402 +MTAB0_SX42 +MTAS0_SI1385 +MTAS0_SI2015 +MTAS0_SI755 +MTAS0_SX125 +MTAS0_SX215 +MTAS0_SX305 +MTAS0_SX35 +MTAS0_SX395 +MTAT0_SI1110 +MTAT0_SI1740 +MTAT0_SI811 +MTAT0_SX120 +MTAT0_SX210 +MTAT0_SX30 +MTAT0_SX300 +MTAT0_SX390 +MTAT1_SI1409 +MTAT1_SI1627 +MTAT1_SI779 +MTAT1_SX149 +MTAT1_SX239 +MTAT1_SX329 +MTAT1_SX419 +MTAT1_SX59 +MTBC0_SI1173 +MTBC0_SI1803 +MTBC0_SI543 +MTBC0_SX183 +MTBC0_SX273 +MTBC0_SX347 +MTBC0_SX363 +MTBC0_SX93 +MTCS0_SI1972 +MTCS0_SI2265 +MTCS0_SI712 +MTCS0_SX172 +MTCS0_SX262 +MTCS0_SX352 +MTCS0_SX442 +MTCS0_SX82 +MTDB0_SI1401 +MTDB0_SI2031 +MTDB0_SI771 +MTDB0_SX141 +MTDB0_SX231 +MTDB0_SX321 +MTDB0_SX411 +MTDB0_SX51 +MTDP0_SI1274 +MTDP0_SI1521 +MTDP0_SI2151 +MTDP0_SX171 +MTDP0_SX261 +MTDP0_SX351 +MTDP0_SX441 +MTDP0_SX81 +MTER0_SI1157 +MTER0_SI1787 +MTER0_SI527 +MTER0_SX167 +MTER0_SX17 +MTER0_SX257 +MTER0_SX437 +MTER0_SX77 +MTJG0_SI1520 +MTJG0_SI2157 +MTJG0_SI890 +MTJG0_SX170 +MTJG0_SX260 +MTJG0_SX350 +MTJG0_SX440 +MTJG0_SX80 +MTJM0_SI1226 +MTJM0_SI1856 +MTJM0_SI655 +MTJM0_SX146 +MTJM0_SX236 +MTJM0_SX326 +MTJM0_SX416 +MTJM0_SX56 +MTJS0_SI1192 +MTJS0_SI1822 +MTJS0_SI562 +MTJS0_SX112 +MTJS0_SX202 +MTJS0_SX22 +MTJS0_SX292 +MTJS0_SX382 +MTJU0_SI2020 +MTJU0_SI2269 +MTJU0_SI760 +MTJU0_SX130 +MTJU0_SX220 +MTJU0_SX310 +MTJU0_SX40 +MTJU0_SX400 +MTKD0_SI1187 +MTKD0_SI1817 +MTKD0_SI630 +MTKD0_SX107 +MTKD0_SX17 +MTKD0_SX197 +MTKD0_SX287 +MTKD0_SX377 +MTKP0_SI1023 +MTKP0_SI2283 +MTKP0_SI454 +MTKP0_SX123 +MTKP0_SX213 +MTKP0_SX303 +MTKP0_SX33 +MTKP0_SX393 +MTLB0_SI1134 +MTLB0_SI1764 +MTLB0_SI504 +MTLB0_SX144 +MTLB0_SX234 +MTLB0_SX324 +MTLB0_SX414 +MTLB0_SX54 +MTLC0_SI1313 +MTLC0_SI1477 +MTLC0_SI847 +MTLC0_SX127 +MTLC0_SX217 +MTLC0_SX307 +MTLC0_SX37 +MTLC0_SX397 +MTML0_SI1065 +MTML0_SI1695 +MTML0_SI2325 +MTML0_SX165 +MTML0_SX255 +MTML0_SX345 +MTML0_SX435 +MTML0_SX75 +MTMN0_SI1064 +MTMN0_SI2324 +MTMN0_SI582 +MTMN0_SX164 +MTMN0_SX254 +MTMN0_SX344 +MTMN0_SX434 +MTMN0_SX74 +MTMT0_SI1118 +MTMT0_SI1748 +MTMT0_SI488 +MTMT0_SX128 +MTMT0_SX218 +MTMT0_SX308 +MTMT0_SX38 +MTMT0_SX398 +MTPF0_SI1235 +MTPF0_SI1865 +MTPF0_SI605 +MTPF0_SX155 +MTPF0_SX245 +MTPF0_SX335 +MTPF0_SX425 +MTPF0_SX65 +MTPG0_SI1383 +MTPG0_SI2013 +MTPG0_SI753 +MTPG0_SX123 +MTPG0_SX213 +MTPG0_SX303 +MTPG0_SX33 +MTPG0_SX393 +MTPP0_SI1508 +MTPP0_SI2138 +MTPP0_SI878 +MTPP0_SX158 +MTPP0_SX248 +MTPP0_SX338 +MTPP0_SX428 +MTPP0_SX68 +MTPR0_SI1600 +MTPR0_SI2230 +MTPR0_SI506 +MTPR0_SX160 
+MTPR0_SX250 +MTPR0_SX340 +MTPR0_SX430 +MTPR0_SX70 +MTQC0_SI1441 +MTQC0_SI2071 +MTQC0_SI480 +MTQC0_SX181 +MTQC0_SX271 +MTQC0_SX361 +MTQC0_SX451 +MTQC0_SX91 +MTRC0_SI1623 +MTRC0_SI589 +MTRC0_SI993 +MTRC0_SX170 +MTRC0_SX183 +MTRC0_SX273 +MTRC0_SX363 +MTRC0_SX93 +MTRR0_SI1548 +MTRR0_SI2178 +MTRR0_SI918 +MTRR0_SX108 +MTRR0_SX18 +MTRR0_SX198 +MTRR0_SX288 +MTRR0_SX378 +MTRT0_SI1227 +MTRT0_SI1857 +MTRT0_SI597 +MTRT0_SX147 +MTRT0_SX237 +MTRT0_SX254 +MTRT0_SX417 +MTRT0_SX57 +MTWH1_SI1512 +MTWH1_SI2142 +MTWH1_SI882 +MTWH1_SX162 +MTWH1_SX252 +MTWH1_SX342 +MTWH1_SX432 +MTWH1_SX72 +MTXS0_SI1060 +MTXS0_SI1690 +MTXS0_SI2320 +MTXS0_SX160 +MTXS0_SX250 +MTXS0_SX340 +MTXS0_SX430 +MTXS0_SX70 +MVJH0_SI1556 +MVJH0_SI2186 +MVJH0_SI926 +MVJH0_SX116 +MVJH0_SX206 +MVJH0_SX26 +MVJH0_SX296 +MVJH0_SX386 +MVLO0_SI1147 +MVLO0_SI1777 +MVLO0_SI517 +MVLO0_SX157 +MVLO0_SX247 +MVLO0_SX337 +MVLO0_SX427 +MVLO0_SX67 +MVRW0_SI1485 +MVRW0_SI2115 +MVRW0_SI855 +MVRW0_SX135 +MVRW0_SX225 +MVRW0_SX315 +MVRW0_SX405 +MVRW0_SX45 +MWAC0_SI1601 +MWAC0_SI2231 +MWAC0_SI971 +MWAC0_SX161 +MWAC0_SX251 +MWAC0_SX341 +MWAC0_SX431 +MWAC0_SX71 +MWAD0_SI1062 +MWAD0_SI1749 +MWAD0_SI2322 +MWAD0_SX162 +MWAD0_SX252 +MWAD0_SX342 +MWAD0_SX432 +MWAD0_SX72 +MWAR0_SI1045 +MWAR0_SI1675 +MWAR0_SI2305 +MWAR0_SX145 +MWAR0_SX235 +MWAR0_SX325 +MWAR0_SX415 +MWAR0_SX55 +MWCH0_SI1622 +MWCH0_SI1895 +MWCH0_SI2252 +MWCH0_SX182 +MWCH0_SX272 +MWCH0_SX362 +MWCH0_SX452 +MWCH0_SX92 +MWDK0_SI1436 +MWDK0_SI2017 +MWDK0_SI806 +MWDK0_SX176 +MWDK0_SX266 +MWDK0_SX356 +MWDK0_SX446 +MWDK0_SX86 +MWEM0_SI1320 +MWEM0_SI1393 +MWEM0_SI1950 +MWEM0_SX150 +MWEM0_SX240 +MWEM0_SX330 +MWEM0_SX420 +MWEM0_SX60 +MWGR0_SI1606 +MWGR0_SI2236 +MWGR0_SI976 +MWGR0_SX166 +MWGR0_SX256 +MWGR0_SX346 +MWGR0_SX436 +MWGR0_SX76 +MWRE0_SI1057 +MWRE0_SI1687 +MWRE0_SI2317 +MWRE0_SX157 +MWRE0_SX247 +MWRE0_SX337 +MWRE0_SX427 +MWRE0_SX67 +MWRP0_SI1443 +MWRP0_SI1525 +MWRP0_SI2073 +MWRP0_SX183 +MWRP0_SX273 +MWRP0_SX3 +MWRP0_SX363 +MWRP0_SX93 +MWSB0_SI1626 +MWSB0_SI2256 +MWSB0_SI996 +MWSB0_SX186 +MWSB0_SX276 +MWSB0_SX366 +MWSB0_SX6 +MWSB0_SX96 +MWSH0_SI1426 +MWSH0_SI2266 +MWSH0_SI796 +MWSH0_SX166 +MWSH0_SX256 +MWSH0_SX346 +MWSH0_SX436 +MWSH0_SX76 +MZMB0_SI1166 +MZMB0_SI1796 +MZMB0_SI536 +MZMB0_SX176 +MZMB0_SX266 +MZMB0_SX356 +MZMB0_SX446 +MZMB0_SX86 diff --git a/SpeechT5/fairseq/examples/wav2vec/unsupervised/config/timit_matched/train_text.uid b/SpeechT5/fairseq/examples/wav2vec/unsupervised/config/timit_matched/train_text.uid new file mode 100644 index 0000000000000000000000000000000000000000..c39fd0b91d51e0ae15caf1e9701d0d9ef51ee21b --- /dev/null +++ b/SpeechT5/fairseq/examples/wav2vec/unsupervised/config/timit_matched/train_text.uid @@ -0,0 +1,3696 @@ +FAEM0_SI1392 +FAEM0_SI2022 +FAEM0_SI762 +FAEM0_SX132 +FAEM0_SX222 +FAEM0_SX312 +FAEM0_SX402 +FAEM0_SX42 +FAJW0_SI1263 +FAJW0_SI1893 +FAJW0_SI633 +FAJW0_SX183 +FAJW0_SX273 +FAJW0_SX3 +FAJW0_SX363 +FAJW0_SX93 +FALK0_SI1086 +FALK0_SI456 +FALK0_SI658 +FALK0_SX186 +FALK0_SX276 +FALK0_SX366 +FALK0_SX6 +FALK0_SX96 +FALR0_SI1325 +FALR0_SI1955 +FALR0_SI695 +FALR0_SX155 +FALR0_SX245 +FALR0_SX335 +FALR0_SX425 +FALR0_SX65 +FAPB0_SI1063 +FAPB0_SI1693 +FAPB0_SI2323 +FAPB0_SX163 +FAPB0_SX253 +FAPB0_SX343 +FAPB0_SX433 +FAPB0_SX73 +FBAS0_SI1387 +FBAS0_SI1472 +FBAS0_SI2066 +FBAS0_SX127 +FBAS0_SX217 +FBAS0_SX307 +FBAS0_SX37 +FBAS0_SX397 +FBCG1_SI1612 +FBCG1_SI2242 +FBCG1_SI982 +FBCG1_SX172 +FBCG1_SX262 +FBCG1_SX352 +FBCG1_SX442 +FBCG1_SX82 +FBCH0_SI1586 +FBCH0_SI956 +FBCH0_SI959 +FBCH0_SX146 +FBCH0_SX236 +FBCH0_SX326 +FBCH0_SX416 +FBCH0_SX56 +FBJL0_SI1552 +FBJL0_SI2182 +FBJL0_SI922 
+FBJL0_SX112 +FBJL0_SX202 +FBJL0_SX22 +FBJL0_SX292 +FBJL0_SX382 +FBLV0_SI1058 +FBLV0_SI1688 +FBLV0_SI2318 +FBLV0_SX158 +FBLV0_SX248 +FBLV0_SX338 +FBLV0_SX428 +FBLV0_SX68 +FBMH0_SI1136 +FBMH0_SI1766 +FBMH0_SI970 +FBMH0_SX146 +FBMH0_SX236 +FBMH0_SX326 +FBMH0_SX416 +FBMH0_SX56 +FBMJ0_SI1776 +FBMJ0_SI516 +FBMJ0_SI815 +FBMJ0_SX156 +FBMJ0_SX246 +FBMJ0_SX336 +FBMJ0_SX426 +FBMJ0_SX66 +FCAG0_SI1503 +FCAG0_SI1641 +FCAG0_SI2133 +FCAG0_SX153 +FCAG0_SX243 +FCAG0_SX333 +FCAG0_SX423 +FCAG0_SX63 +FCAJ0_SI1479 +FCAJ0_SI1804 +FCAJ0_SI849 +FCAJ0_SX129 +FCAJ0_SX219 +FCAJ0_SX309 +FCAJ0_SX39 +FCAJ0_SX399 +FCDR1_SI1186 +FCDR1_SI1816 +FCDR1_SI556 +FCDR1_SX106 +FCDR1_SX16 +FCDR1_SX196 +FCDR1_SX286 +FCDR1_SX376 +FCEG0_SI1248 +FCEG0_SI1878 +FCEG0_SI618 +FCEG0_SX168 +FCEG0_SX258 +FCEG0_SX348 +FCEG0_SX438 +FCEG0_SX78 +FCJF0_SI1027 +FCJF0_SI1657 +FCJF0_SI648 +FCJF0_SX127 +FCJF0_SX217 +FCJF0_SX307 +FCJF0_SX37 +FCJF0_SX397 +FCJS0_SI1607 +FCJS0_SI2237 +FCJS0_SI977 +FCJS0_SX167 +FCJS0_SX257 +FCJS0_SX347 +FCJS0_SX437 +FCJS0_SX77 +FCKE0_SI1111 +FCKE0_SI1741 +FCKE0_SI481 +FCKE0_SX121 +FCKE0_SX211 +FCKE0_SX301 +FCKE0_SX31 +FCKE0_SX391 +FCLT0_SI1438 +FCLT0_SI2068 +FCLT0_SI808 +FCLT0_SX178 +FCLT0_SX268 +FCLT0_SX358 +FCLT0_SX448 +FCLT0_SX88 +FCMG0_SI1142 +FCMG0_SI1242 +FCMG0_SI1872 +FCMG0_SX162 +FCMG0_SX252 +FCMG0_SX342 +FCMG0_SX432 +FCMG0_SX72 +FCMM0_SI1083 +FCMM0_SI1957 +FCMM0_SI453 +FCMM0_SX183 +FCMM0_SX273 +FCMM0_SX363 +FCMM0_SX420 +FCMM0_SX93 +FCRZ0_SI1913 +FCRZ0_SI2053 +FCRZ0_SI793 +FCRZ0_SX163 +FCRZ0_SX253 +FCRZ0_SX343 +FCRZ0_SX433 +FCRZ0_SX73 +FCYL0_SI1297 +FCYL0_SI1927 +FCYL0_SI667 +FCYL0_SX127 +FCYL0_SX217 +FCYL0_SX349 +FCYL0_SX37 +FCYL0_SX397 +FDAS1_SI1461 +FDAS1_SI2091 +FDAS1_SI831 +FDAS1_SX111 +FDAS1_SX201 +FDAS1_SX21 +FDAS1_SX291 +FDAS1_SX381 +FDAW0_SI1271 +FDAW0_SI1406 +FDAW0_SI2036 +FDAW0_SX146 +FDAW0_SX236 +FDAW0_SX326 +FDAW0_SX416 +FDAW0_SX56 +FDFB0_SI1318 +FDFB0_SI1948 +FDFB0_SI2010 +FDFB0_SX148 +FDFB0_SX238 +FDFB0_SX328 +FDFB0_SX418 +FDFB0_SX58 +FDJH0_SI1565 +FDJH0_SI2195 +FDJH0_SI935 +FDJH0_SX125 +FDJH0_SX215 +FDJH0_SX305 +FDJH0_SX35 +FDJH0_SX395 +FDKN0_SI1081 +FDKN0_SI1202 +FDKN0_SI1711 +FDKN0_SX181 +FDKN0_SX271 +FDKN0_SX361 +FDKN0_SX451 +FDKN0_SX91 +FDML0_SI1149 +FDML0_SI1779 +FDML0_SI2075 +FDML0_SX159 +FDML0_SX249 +FDML0_SX339 +FDML0_SX429 +FDML0_SX69 +FDMY0_SI1197 +FDMY0_SI567 +FDMY0_SI714 +FDMY0_SX117 +FDMY0_SX207 +FDMY0_SX27 +FDMY0_SX297 +FDMY0_SX387 +FDNC0_SI1278 +FDNC0_SI1908 +FDNC0_SI2287 +FDNC0_SX108 +FDNC0_SX18 +FDNC0_SX198 +FDNC0_SX288 +FDNC0_SX378 +FDTD0_SI1561 +FDTD0_SI2191 +FDTD0_SI931 +FDTD0_SX121 +FDTD0_SX211 +FDTD0_SX301 +FDTD0_SX321 +FDTD0_SX391 +FDXW0_SI1511 +FDXW0_SI2141 +FDXW0_SI881 +FDXW0_SX161 +FDXW0_SX251 +FDXW0_SX341 +FDXW0_SX431 +FDXW0_SX71 +FEAC0_SI1245 +FEAC0_SI1875 +FEAC0_SI615 +FEAC0_SX165 +FEAC0_SX255 +FEAC0_SX345 +FEAC0_SX435 +FEAC0_SX75 +FEAR0_SI1252 +FEAR0_SI1882 +FEAR0_SI622 +FEAR0_SX172 +FEAR0_SX262 +FEAR0_SX352 +FEAR0_SX442 +FEAR0_SX82 +FECD0_SI1418 +FECD0_SI2048 +FECD0_SI788 +FECD0_SX158 +FECD0_SX248 +FECD0_SX338 +FECD0_SX428 +FECD0_SX68 +FEEH0_SI1112 +FEEH0_SI1742 +FEEH0_SI471 +FEEH0_SX122 +FEEH0_SX212 +FEEH0_SX302 +FEEH0_SX32 +FEEH0_SX392 +FEME0_SI1505 +FEME0_SI2135 +FEME0_SI875 +FEME0_SX155 +FEME0_SX245 +FEME0_SX335 +FEME0_SX425 +FEME0_SX65 +FETB0_SI1148 +FETB0_SI1778 +FETB0_SI518 +FETB0_SX158 +FETB0_SX248 +FETB0_SX338 +FETB0_SX428 +FETB0_SX68 +FEXM0_SI1101 +FEXM0_SI1731 +FEXM0_SI482 +FEXM0_SX111 +FEXM0_SX201 +FEXM0_SX291 +FEXM0_SX366 +FEXM0_SX381 +FGCS0_SI1486 +FGCS0_SI2116 +FGCS0_SI856 +FGCS0_SX136 +FGCS0_SX226 +FGCS0_SX316 +FGCS0_SX406 +FGCS0_SX46 +FGDP0_SI1618 
+FGDP0_SI2248 +FGDP0_SI988 +FGDP0_SX178 +FGDP0_SX268 +FGDP0_SX358 +FGDP0_SX448 +FGDP0_SX88 +FGMB0_SI1145 +FGMB0_SI1775 +FGMB0_SI515 +FGMB0_SX155 +FGMB0_SX245 +FGMB0_SX335 +FGMB0_SX425 +FGMB0_SX65 +FGRW0_SI1152 +FGRW0_SI1782 +FGRW0_SI1990 +FGRW0_SX162 +FGRW0_SX252 +FGRW0_SX342 +FGRW0_SX432 +FGRW0_SX72 +FHLM0_SI1560 +FHLM0_SI2190 +FHLM0_SI930 +FHLM0_SX120 +FHLM0_SX210 +FHLM0_SX300 +FHLM0_SX349 +FHLM0_SX390 +FHXS0_SI1075 +FHXS0_SI2302 +FHXS0_SI2335 +FHXS0_SX175 +FHXS0_SX265 +FHXS0_SX355 +FHXS0_SX445 +FHXS0_SX85 +FJDM2_SI1582 +FJDM2_SI1964 +FJDM2_SI2212 +FJDM2_SX142 +FJDM2_SX232 +FJDM2_SX322 +FJDM2_SX412 +FJDM2_SX52 +FJEN0_SI1047 +FJEN0_SI1677 +FJEN0_SI2307 +FJEN0_SX147 +FJEN0_SX237 +FJEN0_SX327 +FJEN0_SX417 +FJEN0_SX57 +FJHK0_SI1022 +FJHK0_SI1652 +FJHK0_SI2282 +FJHK0_SX122 +FJHK0_SX212 +FJHK0_SX302 +FJHK0_SX32 +FJHK0_SX392 +FJKL0_SI1562 +FJKL0_SI2192 +FJKL0_SI932 +FJKL0_SX122 +FJKL0_SX212 +FJKL0_SX302 +FJKL0_SX32 +FJKL0_SX392 +FJLG0_SI1506 +FJLG0_SI1889 +FJLG0_SI2306 +FJLG0_SX179 +FJLG0_SX269 +FJLG0_SX359 +FJLG0_SX449 +FJLG0_SX89 +FJLR0_SI1231 +FJLR0_SI1861 +FJLR0_SI601 +FJLR0_SX151 +FJLR0_SX241 +FJLR0_SX331 +FJLR0_SX421 +FJLR0_SX61 +FJRB0_SI1302 +FJRB0_SI1932 +FJRB0_SI672 +FJRB0_SX132 +FJRB0_SX222 +FJRB0_SX312 +FJRB0_SX402 +FJRB0_SX42 +FJRP1_SI1432 +FJRP1_SI2062 +FJRP1_SI802 +FJRP1_SX172 +FJRP1_SX262 +FJRP1_SX352 +FJRP1_SX442 +FJRP1_SX82 +FJSK0_SI1052 +FJSK0_SI1682 +FJSK0_SI2312 +FJSK0_SX152 +FJSK0_SX242 +FJSK0_SX332 +FJSK0_SX422 +FJSK0_SX62 +FJSP0_SI1434 +FJSP0_SI1763 +FJSP0_SI804 +FJSP0_SX174 +FJSP0_SX264 +FJSP0_SX354 +FJSP0_SX444 +FJSP0_SX84 +FJWB1_SI2055 +FJWB1_SI748 +FJWB1_SI795 +FJWB1_SX165 +FJWB1_SX255 +FJWB1_SX345 +FJWB1_SX435 +FJWB1_SX75 +FJXM0_SI1211 +FJXM0_SI1971 +FJXM0_SI581 +FJXM0_SX131 +FJXM0_SX221 +FJXM0_SX311 +FJXM0_SX401 +FJXM0_SX41 +FJXP0_SI1122 +FJXP0_SI1752 +FJXP0_SI492 +FJXP0_SX132 +FJXP0_SX222 +FJXP0_SX312 +FJXP0_SX402 +FJXP0_SX42 +FKAA0_SI1208 +FKAA0_SI1838 +FKAA0_SI578 +FKAA0_SX128 +FKAA0_SX218 +FKAA0_SX308 +FKAA0_SX38 +FKAA0_SX398 +FKDE0_SI1141 +FKDE0_SI1771 +FKDE0_SI2221 +FKDE0_SX151 +FKDE0_SX241 +FKDE0_SX331 +FKDE0_SX421 +FKDE0_SX61 +FKDW0_SI1207 +FKDW0_SI1891 +FKDW0_SI577 +FKDW0_SX127 +FKDW0_SX217 +FKDW0_SX307 +FKDW0_SX37 +FKDW0_SX397 +FKFB0_SI1608 +FKFB0_SI2238 +FKFB0_SI978 +FKFB0_SX168 +FKFB0_SX258 +FKFB0_SX348 +FKFB0_SX438 +FKFB0_SX78 +FKKH0_SI1290 +FKKH0_SI1920 +FKKH0_SI660 +FKKH0_SX120 +FKKH0_SX210 +FKKH0_SX30 +FKKH0_SX300 +FKKH0_SX390 +FKLC0_SI1615 +FKLC0_SI2245 +FKLC0_SI985 +FKLC0_SX175 +FKLC0_SX265 +FKLC0_SX355 +FKLC0_SX445 +FKLC0_SX85 +FKLC1_SI1048 +FKLC1_SI1678 +FKLC1_SI2308 +FKLC1_SX148 +FKLC1_SX238 +FKLC1_SX328 +FKLC1_SX418 +FKLC1_SX58 +FKLH0_SI1257 +FKLH0_SI1887 +FKLH0_SI627 +FKLH0_SX177 +FKLH0_SX267 +FKLH0_SX357 +FKLH0_SX447 +FKLH0_SX87 +FKSR0_SI1117 +FKSR0_SI1747 +FKSR0_SI487 +FKSR0_SX161 +FKSR0_SX217 +FKSR0_SX366 +FKSR0_SX37 +FKSR0_SX397 +FLAC0_SI1339 +FLAC0_SI2161 +FLAC0_SI901 +FLAC0_SX181 +FLAC0_SX271 +FLAC0_SX361 +FLAC0_SX451 +FLAC0_SX91 +FLAG0_SI1464 +FLAG0_SI2094 +FLAG0_SI834 +FLAG0_SX114 +FLAG0_SX204 +FLAG0_SX24 +FLAG0_SX294 +FLAG0_SX384 +FLEH0_SI1051 +FLEH0_SI1681 +FLEH0_SI2311 +FLEH0_SX151 +FLEH0_SX241 +FLEH0_SX331 +FLEH0_SX421 +FLEH0_SX61 +FLET0_SI1137 +FLET0_SI1767 +FLET0_SI507 +FLET0_SX147 +FLET0_SX237 +FLET0_SX277 +FLET0_SX417 +FLET0_SX57 +FLHD0_SI1344 +FLHD0_SI1827 +FLHD0_SI1974 +FLHD0_SX174 +FLHD0_SX264 +FLHD0_SX354 +FLHD0_SX444 +FLHD0_SX84 +FLJA0_SI1078 +FLJA0_SI1708 +FLJA0_SI2338 +FLJA0_SX178 +FLJA0_SX268 +FLJA0_SX358 +FLJA0_SX448 +FLJA0_SX88 +FLJD0_SI1516 +FLJD0_SI2146 +FLJD0_SI886 +FLJD0_SX166 +FLJD0_SX256 +FLJD0_SX346 
+FLJD0_SX436 +FLJD0_SX76 +FLJG0_SI1611 +FLJG0_SI2241 +FLJG0_SI981 +FLJG0_SX171 +FLJG0_SX261 +FLJG0_SX351 +FLJG0_SX441 +FLJG0_SX81 +FLKM0_SI1880 +FLKM0_SI620 +FLKM0_SI686 +FLKM0_SX116 +FLKM0_SX260 +FLKM0_SX350 +FLKM0_SX440 +FLKM0_SX80 +FLMA0_SI1243 +FLMA0_SI1873 +FLMA0_SI613 +FLMA0_SX163 +FLMA0_SX253 +FLMA0_SX343 +FLMA0_SX433 +FLMA0_SX73 +FLMC0_SI1372 +FLMC0_SI2002 +FLMC0_SI742 +FLMC0_SX112 +FLMC0_SX22 +FLMC0_SX292 +FLMC0_SX336 +FLMC0_SX382 +FLMK0_SI1035 +FLMK0_SI1229 +FLMK0_SI2295 +FLMK0_SX135 +FLMK0_SX225 +FLMK0_SX315 +FLMK0_SX405 +FLMK0_SX45 +FLOD0_SI1287 +FLOD0_SI1917 +FLOD0_SI657 +FLOD0_SX117 +FLOD0_SX171 +FLOD0_SX207 +FLOD0_SX297 +FLOD0_SX387 +FLTM0_SI1070 +FLTM0_SI1700 +FLTM0_SI2330 +FLTM0_SX170 +FLTM0_SX260 +FLTM0_SX350 +FLTM0_SX440 +FLTM0_SX80 +FMAH1_SI1509 +FMAH1_SI2139 +FMAH1_SI879 +FMAH1_SX159 +FMAH1_SX249 +FMAH1_SX339 +FMAH1_SX429 +FMAH1_SX69 +FMBG0_SI1160 +FMBG0_SI1790 +FMBG0_SI2264 +FMBG0_SX260 +FMBG0_SX3 +FMBG0_SX350 +FMBG0_SX440 +FMBG0_SX80 +FMEM0_SI1377 +FMEM0_SI2007 +FMEM0_SI747 +FMEM0_SX117 +FMEM0_SX207 +FMEM0_SX297 +FMEM0_SX333 +FMEM0_SX387 +FMJB0_SI1177 +FMJB0_SI1807 +FMJB0_SI547 +FMJB0_SX187 +FMJB0_SX277 +FMJB0_SX367 +FMJB0_SX7 +FMJB0_SX97 +FMJF0_SI1254 +FMJF0_SI1884 +FMJF0_SI624 +FMJF0_SX174 +FMJF0_SX264 +FMJF0_SX354 +FMJF0_SX444 +FMJF0_SX84 +FMJU0_SI1389 +FMJU0_SI2019 +FMJU0_SI759 +FMJU0_SX129 +FMJU0_SX219 +FMJU0_SX309 +FMJU0_SX39 +FMJU0_SX399 +FMKC0_SI1041 +FMKC0_SI1072 +FMKC0_SI1702 +FMKC0_SX172 +FMKC0_SX262 +FMKC0_SX352 +FMKC0_SX442 +FMKC0_SX82 +FMKF0_SI1018 +FMKF0_SI1536 +FMKF0_SI906 +FMKF0_SX186 +FMKF0_SX276 +FMKF0_SX366 +FMKF0_SX6 +FMKF0_SX96 +FMMH0_SI1537 +FMMH0_SI2167 +FMMH0_SI907 +FMMH0_SX187 +FMMH0_SX367 +FMMH0_SX420 +FMMH0_SX7 +FMMH0_SX97 +FMPG0_SI1602 +FMPG0_SI2232 +FMPG0_SI972 +FMPG0_SX162 +FMPG0_SX252 +FMPG0_SX342 +FMPG0_SX432 +FMPG0_SX72 +FNKL0_SI1522 +FNKL0_SI2152 +FNKL0_SI892 +FNKL0_SX172 +FNKL0_SX196 +FNKL0_SX262 +FNKL0_SX442 +FNKL0_SX82 +FNTB0_SI1203 +FNTB0_SI573 +FNTB0_SI679 +FNTB0_SX123 +FNTB0_SX213 +FNTB0_SX303 +FNTB0_SX33 +FNTB0_SX393 +FPAB1_SI1471 +FPAB1_SI2101 +FPAB1_SI841 +FPAB1_SX121 +FPAB1_SX211 +FPAB1_SX301 +FPAB1_SX31 +FPAB1_SX391 +FPAC0_SI1921 +FPAC0_SI2011 +FPAC0_SI661 +FPAC0_SX121 +FPAC0_SX211 +FPAC0_SX301 +FPAC0_SX31 +FPAC0_SX391 +FPAD0_SI1346 +FPAD0_SI1976 +FPAD0_SI716 +FPAD0_SX176 +FPAD0_SX266 +FPAD0_SX356 +FPAD0_SX446 +FPAD0_SX86 +FPAF0_SI1054 +FPAF0_SI1684 +FPAF0_SI2314 +FPAF0_SX154 +FPAF0_SX244 +FPAF0_SX334 +FPAF0_SX424 +FPAF0_SX64 +FPAZ0_SI1593 +FPAZ0_SI2223 +FPAZ0_SI963 +FPAZ0_SX153 +FPAZ0_SX243 +FPAZ0_SX27 +FPAZ0_SX423 +FPAZ0_SX63 +FPJF0_SI1046 +FPJF0_SI1259 +FPJF0_SI1676 +FPJF0_SX146 +FPJF0_SX236 +FPJF0_SX326 +FPJF0_SX352 +FPJF0_SX56 +FPLS0_SI1590 +FPLS0_SI2220 +FPLS0_SI960 +FPLS0_SX150 +FPLS0_SX240 +FPLS0_SX3 +FPLS0_SX330 +FPLS0_SX60 +FPMY0_SI1153 +FPMY0_SI1783 +FPMY0_SI523 +FPMY0_SX163 +FPMY0_SX196 +FPMY0_SX253 +FPMY0_SX343 +FPMY0_SX73 +FREH0_SI1315 +FREH0_SI1945 +FREH0_SI685 +FREH0_SX145 +FREH0_SX235 +FREH0_SX325 +FREH0_SX415 +FREH0_SX55 +FRJB0_SI1427 +FRJB0_SI1470 +FRJB0_SI1794 +FRJB0_SX167 +FRJB0_SX257 +FRJB0_SX347 +FRJB0_SX437 +FRJB0_SX77 +FRLL0_SI1514 +FRLL0_SI805 +FRLL0_SI884 +FRLL0_SX164 +FRLL0_SX254 +FRLL0_SX344 +FRLL0_SX434 +FRLL0_SX74 +FSAG0_SI1323 +FSAG0_SI1953 +FSAG0_SI693 +FSAG0_SX153 +FSAG0_SX243 +FSAG0_SX333 +FSAG0_SX423 +FSAG0_SX63 +FSAH0_SI1244 +FSAH0_SI1874 +FSAH0_SI614 +FSAH0_SX164 +FSAH0_SX327 +FSAH0_SX344 +FSAH0_SX434 +FSAH0_SX74 +FSAK0_SI1300 +FSAK0_SI1930 +FSAK0_SI670 +FSAK0_SX130 +FSAK0_SX220 +FSAK0_SX310 +FSAK0_SX40 +FSAK0_SX400 +FSBK0_SI1069 +FSBK0_SI1699 +FSBK0_SI2329 +FSBK0_SX169 +FSBK0_SX259 
+FSBK0_SX349 +FSBK0_SX439 +FSBK0_SX79 +FSCN0_SI1886 +FSCN0_SI626 +FSCN0_SI705 +FSCN0_SX176 +FSCN0_SX266 +FSCN0_SX356 +FSCN0_SX446 +FSCN0_SX86 +FSDC0_SI1312 +FSDC0_SI1942 +FSDC0_SI2234 +FSDC0_SX142 +FSDC0_SX232 +FSDC0_SX322 +FSDC0_SX412 +FSDC0_SX52 +FSDJ0_SI1115 +FSDJ0_SI1745 +FSDJ0_SI485 +FSDJ0_SX125 +FSDJ0_SX215 +FSDJ0_SX305 +FSDJ0_SX35 +FSDJ0_SX395 +FSGF0_SI1557 +FSGF0_SI2187 +FSGF0_SI927 +FSGF0_SX117 +FSGF0_SX207 +FSGF0_SX27 +FSGF0_SX297 +FSGF0_SX387 +FSJG0_SI1570 +FSJG0_SI2200 +FSJG0_SI940 +FSJG0_SX130 +FSJG0_SX220 +FSJG0_SX310 +FSJG0_SX40 +FSJG0_SX400 +FSJK1_SI1025 +FSJK1_SI2285 +FSJK1_SI696 +FSJK1_SX125 +FSJK1_SX215 +FSJK1_SX305 +FSJK1_SX35 +FSJK1_SX395 +FSJS0_SI1171 +FSJS0_SI1801 +FSJS0_SI541 +FSJS0_SX181 +FSJS0_SX271 +FSJS0_SX361 +FSJS0_SX451 +FSJS0_SX91 +FSJW0_SI1333 +FSJW0_SI1963 +FSJW0_SI703 +FSJW0_SX163 +FSJW0_SX253 +FSJW0_SX343 +FSJW0_SX433 +FSJW0_SX73 +FSKC0_SI1416 +FSKC0_SI2046 +FSKC0_SI786 +FSKC0_SX156 +FSKC0_SX246 +FSKC0_SX336 +FSKC0_SX426 +FSKC0_SX66 +FSKL0_SI1529 +FSKL0_SI2159 +FSKL0_SI899 +FSKL0_SX179 +FSKL0_SX269 +FSKL0_SX359 +FSKL0_SX449 +FSKL0_SX89 +FSKP0_SI1098 +FSKP0_SI1728 +FSKP0_SI468 +FSKP0_SX108 +FSKP0_SX18 +FSKP0_SX198 +FSKP0_SX288 +FSKP0_SX378 +FSLS0_SI1056 +FSLS0_SI1686 +FSLS0_SI2316 +FSLS0_SX156 +FSLS0_SX202 +FSLS0_SX246 +FSLS0_SX426 +FSLS0_SX66 +FSMA0_SI1621 +FSMA0_SI2251 +FSMA0_SI991 +FSMA0_SX181 +FSMA0_SX271 +FSMA0_SX361 +FSMA0_SX451 +FSMA0_SX91 +FSMM0_SI1314 +FSMM0_SI1944 +FSMM0_SI684 +FSMM0_SX144 +FSMM0_SX234 +FSMM0_SX324 +FSMM0_SX414 +FSMM0_SX54 +FSMS1_SI1504 +FSMS1_SI2134 +FSMS1_SI874 +FSMS1_SX154 +FSMS1_SX244 +FSMS1_SX334 +FSMS1_SX347 +FSMS1_SX64 +FSPM0_SI1241 +FSPM0_SI1871 +FSPM0_SI611 +FSPM0_SX161 +FSPM0_SX251 +FSPM0_SX341 +FSPM0_SX431 +FSPM0_SX71 +FSRH0_SI1719 +FSRH0_SI1931 +FSRH0_SI671 +FSRH0_SX131 +FSRH0_SX221 +FSRH0_SX311 +FSRH0_SX401 +FSRH0_SX41 +FSSB0_SI1082 +FSSB0_SI1712 +FSSB0_SI2342 +FSSB0_SX182 +FSSB0_SX272 +FSSB0_SX362 +FSSB0_SX452 +FSSB0_SX92 +FTAJ0_SI1329 +FTAJ0_SI474 +FTAJ0_SI699 +FTAJ0_SX159 +FTAJ0_SX249 +FTAJ0_SX339 +FTAJ0_SX429 +FTAJ0_SX69 +FTBR0_SI1402 +FTBR0_SI2181 +FTBR0_SI921 +FTBR0_SX111 +FTBR0_SX201 +FTBR0_SX21 +FTBR0_SX291 +FTBR0_SX381 +FTBW0_SI1345 +FTBW0_SI1975 +FTBW0_SI715 +FTBW0_SX175 +FTBW0_SX265 +FTBW0_SX355 +FTBW0_SX445 +FTBW0_SX85 +FTLG0_SI1743 +FTLG0_SI483 +FTLG0_SI840 +FTLG0_SX123 +FTLG0_SX213 +FTLG0_SX303 +FTLG0_SX33 +FTLG0_SX393 +FTMG0_SI1532 +FTMG0_SI2162 +FTMG0_SI902 +FTMG0_SX182 +FTMG0_SX272 +FTMG0_SX362 +FTMG0_SX452 +FTMG0_SX92 +FVFB0_SI1032 +FVFB0_SI1510 +FVFB0_SI2292 +FVFB0_SX132 +FVFB0_SX222 +FVFB0_SX312 +FVFB0_SX402 +FVFB0_SX42 +FVKB0_SI1159 +FVKB0_SI1789 +FVKB0_SI529 +FVKB0_SX169 +FVKB0_SX259 +FVKB0_SX349 +FVKB0_SX439 +FVKB0_SX79 +FVMH0_SI1466 +FVMH0_SI2096 +FVMH0_SI836 +FVMH0_SX116 +FVMH0_SX206 +FVMH0_SX26 +FVMH0_SX296 +FVMH0_SX386 +MABC0_SI1620 +MABC0_SI2041 +MABC0_SI781 +MABC0_SX151 +MABC0_SX241 +MABC0_SX331 +MABC0_SX421 +MABC0_SX61 +MADC0_SI1367 +MADC0_SI1997 +MADC0_SI737 +MADC0_SX107 +MADC0_SX17 +MADC0_SX197 +MADC0_SX287 +MADC0_SX377 +MADD0_SI1295 +MADD0_SI1798 +MADD0_SI538 +MADD0_SX178 +MADD0_SX268 +MADD0_SX358 +MADD0_SX448 +MADD0_SX88 +MAEB0_SI1411 +MAEB0_SI2250 +MAEB0_SI990 +MAEB0_SX180 +MAEB0_SX270 +MAEB0_SX360 +MAEB0_SX450 +MAEB0_SX90 +MAEO0_SI1326 +MAEO0_SI1655 +MAEO0_SI1956 +MAEO0_SX156 +MAEO0_SX246 +MAEO0_SX336 +MAEO0_SX426 +MAEO0_SX66 +MAFM0_SI1569 +MAFM0_SI2199 +MAFM0_SI939 +MAFM0_SX129 +MAFM0_SX219 +MAFM0_SX309 +MAFM0_SX39 +MAFM0_SX399 +MAJP0_SI1074 +MAJP0_SI1704 +MAJP0_SI2334 +MAJP0_SX174 +MAJP0_SX264 +MAJP0_SX354 +MAJP0_SX444 +MAJP0_SX84 +MAKB0_SI1016 +MAKB0_SI1646 +MAKB0_SI2276 
+MAKB0_SX116 +MAKB0_SX206 +MAKB0_SX26 +MAKB0_SX296 +MAKB0_SX386 +MAKR0_SI1352 +MAKR0_SI1982 +MAKR0_SI722 +MAKR0_SX182 +MAKR0_SX272 +MAKR0_SX362 +MAKR0_SX452 +MAKR0_SX92 +MAPV0_SI1293 +MAPV0_SI1923 +MAPV0_SI663 +MAPV0_SX123 +MAPV0_SX213 +MAPV0_SX303 +MAPV0_SX33 +MAPV0_SX393 +MARC0_SI1188 +MARC0_SI1818 +MARC0_SI558 +MARC0_SX108 +MARC0_SX18 +MARC0_SX198 +MARC0_SX288 +MARC0_SX378 +MARW0_SI1276 +MARW0_SI1906 +MARW0_SI646 +MARW0_SX106 +MARW0_SX16 +MARW0_SX286 +MARW0_SX349 +MARW0_SX376 +MBAR0_SI1319 +MBAR0_SI1949 +MBAR0_SI689 +MBAR0_SX149 +MBAR0_SX239 +MBAR0_SX329 +MBAR0_SX419 +MBAR0_SX59 +MBBR0_SI1055 +MBBR0_SI1685 +MBBR0_SI2315 +MBBR0_SX155 +MBBR0_SX245 +MBBR0_SX335 +MBBR0_SX425 +MBBR0_SX65 +MBCG0_SI2217 +MBCG0_SI486 +MBCG0_SI957 +MBCG0_SX147 +MBCG0_SX237 +MBCG0_SX327 +MBCG0_SX417 +MBCG0_SX57 +MBEF0_SI1281 +MBEF0_SI1911 +MBEF0_SI651 +MBEF0_SX111 +MBEF0_SX201 +MBEF0_SX21 +MBEF0_SX291 +MBEF0_SX381 +MBGT0_SI1341 +MBGT0_SI1841 +MBGT0_SI711 +MBGT0_SX171 +MBGT0_SX261 +MBGT0_SX351 +MBGT0_SX441 +MBGT0_SX81 +MBJV0_SI1247 +MBJV0_SI1877 +MBJV0_SI617 +MBJV0_SX167 +MBJV0_SX257 +MBJV0_SX347 +MBJV0_SX437 +MBJV0_SX77 +MBMA0_SI1222 +MBMA0_SI1852 +MBMA0_SI592 +MBMA0_SX142 +MBMA0_SX232 +MBMA0_SX322 +MBMA0_SX412 +MBMA0_SX52 +MBMA1_SI2207 +MBMA1_SI2214 +MBMA1_SI954 +MBMA1_SX144 +MBMA1_SX234 +MBMA1_SX324 +MBMA1_SX414 +MBMA1_SX54 +MBML0_SI1169 +MBML0_SI1799 +MBML0_SI539 +MBML0_SX179 +MBML0_SX269 +MBML0_SX359 +MBML0_SX449 +MBML0_SX89 +MBOM0_SI1014 +MBOM0_SI1644 +MBOM0_SI2274 +MBOM0_SX114 +MBOM0_SX204 +MBOM0_SX294 +MBOM0_SX311 +MBOM0_SX384 +MBSB0_SI1353 +MBSB0_SI1983 +MBSB0_SI723 +MBSB0_SX183 +MBSB0_SX273 +MBSB0_SX3 +MBSB0_SX363 +MBSB0_SX93 +MBTH0_SI2102 +MBTH0_SI505 +MBTH0_SI757 +MBTH0_SX122 +MBTH0_SX212 +MBTH0_SX302 +MBTH0_SX32 +MBTH0_SX392 +MBWP0_SI1531 +MBWP0_SI1969 +MBWP0_SI709 +MBWP0_SX169 +MBWP0_SX259 +MBWP0_SX349 +MBWP0_SX439 +MBWP0_SX79 +MCAE0_SI1447 +MCAE0_SI2077 +MCAE0_SI817 +MCAE0_SX187 +MCAE0_SX277 +MCAE0_SX367 +MCAE0_SX7 +MCAE0_SX97 +MCAL0_SI1138 +MCAL0_SI1768 +MCAL0_SI508 +MCAL0_SX148 +MCAL0_SX238 +MCAL0_SX328 +MCAL0_SX418 +MCAL0_SX58 +MCDC0_SI1292 +MCDC0_SI1922 +MCDC0_SI662 +MCDC0_SX122 +MCDC0_SX212 +MCDC0_SX302 +MCDC0_SX32 +MCDC0_SX392 +MCDD0_SI1513 +MCDD0_SI2143 +MCDD0_SI883 +MCDD0_SX163 +MCDD0_SX253 +MCDD0_SX343 +MCDD0_SX433 +MCDD0_SX73 +MCDR0_SI1154 +MCDR0_SI1784 +MCDR0_SI524 +MCDR0_SX164 +MCDR0_SX254 +MCDR0_SX344 +MCDR0_SX434 +MCDR0_SX74 +MCEF0_SI1135 +MCEF0_SI1765 +MCEF0_SI842 +MCEF0_SX145 +MCEF0_SX235 +MCEF0_SX325 +MCEF0_SX415 +MCEF0_SX55 +MCEW0_SI1442 +MCEW0_SI2072 +MCEW0_SI812 +MCEW0_SX182 +MCEW0_SX272 +MCEW0_SX362 +MCEW0_SX452 +MCEW0_SX92 +MCHL0_SI1347 +MCHL0_SI1404 +MCHL0_SI1977 +MCHL0_SX177 +MCHL0_SX267 +MCHL0_SX357 +MCHL0_SX447 +MCHL0_SX87 +MCLK0_SI1660 +MCLK0_SI2290 +MCLK0_SI650 +MCLK0_SX130 +MCLK0_SX220 +MCLK0_SX310 +MCLK0_SX40 +MCLK0_SX400 +MCLM0_SI1456 +MCLM0_SI2086 +MCLM0_SI826 +MCLM0_SX106 +MCLM0_SX16 +MCLM0_SX196 +MCLM0_SX286 +MCLM0_SX376 +MCPM0_SI1194 +MCPM0_SI1824 +MCPM0_SI564 +MCPM0_SX114 +MCPM0_SX204 +MCPM0_SX24 +MCPM0_SX294 +MCPM0_SX384 +MCRE0_SI1121 +MCRE0_SI1725 +MCRE0_SI1751 +MCRE0_SX131 +MCRE0_SX221 +MCRE0_SX24 +MCRE0_SX401 +MCRE0_SX41 +MCSS0_SI1380 +MCSS0_SI688 +MCSS0_SI750 +MCSS0_SX120 +MCSS0_SX210 +MCSS0_SX30 +MCSS0_SX300 +MCSS0_SX390 +MCTH0_SI1209 +MCTH0_SI1839 +MCTH0_SI579 +MCTH0_SX129 +MCTH0_SX219 +MCTH0_SX309 +MCTH0_SX39 +MCTH0_SX399 +MCTM0_SI1350 +MCTM0_SI1980 +MCTM0_SI720 +MCTM0_SX180 +MCTM0_SX270 +MCTM0_SX360 +MCTM0_SX450 +MCTM0_SX90 +MCXM0_SI1351 +MCXM0_SI1981 +MCXM0_SI721 +MCXM0_SX181 +MCXM0_SX271 +MCXM0_SX361 +MCXM0_SX451 +MCXM0_SX91 +MDAC0_SI1261 +MDAC0_SI1837 
+MDAC0_SI631 +MDAC0_SX181 +MDAC0_SX271 +MDAC0_SX361 +MDAC0_SX451 +MDAC0_SX91 +MDAS0_SI1266 +MDAS0_SI1896 +MDAS0_SI636 +MDAS0_SX186 +MDAS0_SX21 +MDAS0_SX276 +MDAS0_SX6 +MDAS0_SX96 +MDBB1_SI1006 +MDBB1_SI1636 +MDBB1_SI2056 +MDBB1_SX106 +MDBB1_SX16 +MDBB1_SX196 +MDBB1_SX286 +MDBB1_SX376 +MDBP0_SI1158 +MDBP0_SI1788 +MDBP0_SI528 +MDBP0_SX168 +MDBP0_SX258 +MDBP0_SX348 +MDBP0_SX438 +MDBP0_SX78 +MDCD0_SI1415 +MDCD0_SI2045 +MDCD0_SI785 +MDCD0_SX155 +MDCD0_SX245 +MDCD0_SX335 +MDCD0_SX425 +MDCD0_SX65 +MDCM0_SI1480 +MDCM0_SI2110 +MDCM0_SI850 +MDCM0_SX130 +MDCM0_SX220 +MDCM0_SX310 +MDCM0_SX40 +MDCM0_SX400 +MDDC0_SI1419 +MDDC0_SI2049 +MDDC0_SI789 +MDDC0_SX159 +MDDC0_SX249 +MDDC0_SX339 +MDDC0_SX429 +MDDC0_SX69 +MDED0_SI1170 +MDED0_SI1800 +MDED0_SI540 +MDED0_SX180 +MDED0_SX270 +MDED0_SX360 +MDED0_SX450 +MDED0_SX90 +MDEF0_SI1123 +MDEF0_SI1563 +MDEF0_SI2193 +MDEF0_SX123 +MDEF0_SX213 +MDEF0_SX303 +MDEF0_SX33 +MDEF0_SX393 +MDEM0_SI1868 +MDEM0_SI608 +MDEM0_SI800 +MDEM0_SX158 +MDEM0_SX248 +MDEM0_SX338 +MDEM0_SX428 +MDEM0_SX68 +MDHL0_SI1439 +MDHL0_SI2069 +MDHL0_SI809 +MDHL0_SX179 +MDHL0_SX269 +MDHL0_SX359 +MDHL0_SX449 +MDHL0_SX89 +MDHS0_SI1530 +MDHS0_SI2160 +MDHS0_SI900 +MDHS0_SX180 +MDHS0_SX270 +MDHS0_SX360 +MDHS0_SX450 +MDHS0_SX90 +MDJM0_SI1455 +MDJM0_SI2085 +MDJM0_SI825 +MDJM0_SX105 +MDJM0_SX15 +MDJM0_SX195 +MDJM0_SX285 +MDJM0_SX375 +MDKS0_SI1066 +MDKS0_SI1696 +MDKS0_SI2326 +MDKS0_SX166 +MDKS0_SX256 +MDKS0_SX346 +MDKS0_SX436 +MDKS0_SX76 +MDLB0_SI1306 +MDLB0_SI1936 +MDLB0_SI676 +MDLB0_SX136 +MDLB0_SX226 +MDLB0_SX316 +MDLB0_SX406 +MDLB0_SX46 +MDLC0_SI1395 +MDLC0_SI2025 +MDLC0_SI765 +MDLC0_SX135 +MDLC0_SX225 +MDLC0_SX315 +MDLC0_SX405 +MDLC0_SX45 +MDLC1_SI1435 +MDLC1_SI2065 +MDLC1_SI2144 +MDLC1_SX175 +MDLC1_SX265 +MDLC1_SX355 +MDLC1_SX445 +MDLC1_SX85 +MDLC2_SI1614 +MDLC2_SI2244 +MDLC2_SI984 +MDLC2_SX174 +MDLC2_SX264 +MDLC2_SX354 +MDLC2_SX444 +MDLC2_SX84 +MDLH0_SI1960 +MDLH0_SI574 +MDLH0_SI700 +MDLH0_SX160 +MDLH0_SX250 +MDLH0_SX340 +MDLH0_SX430 +MDLH0_SX70 +MDLM0_SI1234 +MDLM0_SI1864 +MDLM0_SI604 +MDLM0_SX154 +MDLM0_SX244 +MDLM0_SX334 +MDLM0_SX424 +MDLM0_SX64 +MDLR0_SI1233 +MDLR0_SI1863 +MDLR0_SI603 +MDLR0_SX153 +MDLR0_SX243 +MDLR0_SX333 +MDLR0_SX423 +MDLR0_SX63 +MDLR1_SI1299 +MDLR1_SI1929 +MDLR1_SI669 +MDLR1_SX129 +MDLR1_SX219 +MDLR1_SX309 +MDLR1_SX39 +MDLR1_SX399 +MDMA0_SI1238 +MDMA0_SI1430 +MDMA0_SI2060 +MDMA0_SX170 +MDMA0_SX260 +MDMA0_SX350 +MDMA0_SX440 +MDMA0_SX80 +MDMT0_SI1832 +MDMT0_SI2341 +MDMT0_SI572 +MDMT0_SX122 +MDMT0_SX212 +MDMT0_SX302 +MDMT0_SX32 +MDMT0_SX392 +MDNS0_SI1011 +MDNS0_SI2271 +MDNS0_SI873 +MDNS0_SX111 +MDNS0_SX201 +MDNS0_SX21 +MDNS0_SX291 +MDNS0_SX381 +MDPB0_SI1760 +MDPB0_SI2126 +MDPB0_SI866 +MDPB0_SX146 +MDPB0_SX236 +MDPB0_SX326 +MDPB0_SX416 +MDPB0_SX56 +MDPK0_SI1053 +MDPK0_SI1683 +MDPK0_SI552 +MDPK0_SX153 +MDPK0_SX243 +MDPK0_SX333 +MDPK0_SX423 +MDPK0_SX63 +MDPS0_SI1651 +MDPS0_SI1979 +MDPS0_SI719 +MDPS0_SX179 +MDPS0_SX269 +MDPS0_SX359 +MDPS0_SX449 +MDPS0_SX89 +MDRD0_SI1382 +MDRD0_SI2012 +MDRD0_SI752 +MDRD0_SX122 +MDRD0_SX212 +MDRD0_SX302 +MDRD0_SX32 +MDRD0_SX392 +MDSJ0_SI1462 +MDSJ0_SI2092 +MDSJ0_SI832 +MDSJ0_SX112 +MDSJ0_SX22 +MDSJ0_SX292 +MDSJ0_SX382 +MDSJ0_SX438 +MDSS0_SI1881 +MDSS0_SI2087 +MDSS0_SI621 +MDSS0_SX171 +MDSS0_SX261 +MDSS0_SX351 +MDSS0_SX441 +MDSS0_SX81 +MDSS1_SI1327 +MDSS1_SI1713 +MDSS1_SI697 +MDSS1_SX157 +MDSS1_SX247 +MDSS1_SX337 +MDSS1_SX427 +MDSS1_SX67 +MDTB0_SI1200 +MDTB0_SI1830 +MDTB0_SI570 +MDTB0_SX120 +MDTB0_SX210 +MDTB0_SX300 +MDTB0_SX321 +MDTB0_SX390 +MDWD0_SI1260 +MDWD0_SI1890 +MDWD0_SI557 +MDWD0_SX180 +MDWD0_SX270 +MDWD0_SX360 +MDWD0_SX450 +MDWD0_SX90 
+MDWH0_SI1168 +MDWH0_SI1925 +MDWH0_SI665 +MDWH0_SX125 +MDWH0_SX215 +MDWH0_SX305 +MDWH0_SX35 +MDWH0_SX395 +MDWM0_SI1546 +MDWM0_SI2176 +MDWM0_SI916 +MDWM0_SX106 +MDWM0_SX16 +MDWM0_SX286 +MDWM0_SX376 +MDWM0_SX433 +MEAL0_SI1547 +MEAL0_SI2177 +MEAL0_SI917 +MEAL0_SX107 +MEAL0_SX197 +MEAL0_SX287 +MEAL0_SX347 +MEAL0_SX377 +MEDR0_SI1374 +MEDR0_SI2004 +MEDR0_SI744 +MEDR0_SX114 +MEDR0_SX204 +MEDR0_SX24 +MEDR0_SX294 +MEDR0_SX384 +MEFG0_SI465 +MEFG0_SI491 +MEFG0_SI598 +MEFG0_SX105 +MEFG0_SX15 +MEFG0_SX195 +MEFG0_SX285 +MEFG0_SX375 +MEGJ0_SI1337 +MEGJ0_SI1967 +MEGJ0_SI707 +MEGJ0_SX167 +MEGJ0_SX257 +MEGJ0_SX3 +MEGJ0_SX437 +MEGJ0_SX77 +MEJL0_SI1592 +MEJL0_SI1654 +MEJL0_SI962 +MEJL0_SX152 +MEJL0_SX242 +MEJL0_SX332 +MEJL0_SX422 +MEJL0_SX62 +MEJS0_SI1240 +MEJS0_SI1870 +MEJS0_SI610 +MEJS0_SX160 +MEJS0_SX250 +MEJS0_SX340 +MEJS0_SX430 +MEJS0_SX70 +MESG0_SI1332 +MESG0_SI1962 +MESG0_SI702 +MESG0_SX162 +MESG0_SX252 +MESG0_SX342 +MESG0_SX432 +MESG0_SX72 +MESJ0_SI2039 +MESJ0_SI2257 +MESJ0_SI997 +MESJ0_SX187 +MESJ0_SX277 +MESJ0_SX367 +MESJ0_SX7 +MESJ0_SX97 +MEWM0_SI1348 +MEWM0_SI1978 +MEWM0_SI718 +MEWM0_SX178 +MEWM0_SX268 +MEWM0_SX358 +MEWM0_SX448 +MEWM0_SX88 +MFER0_SI1492 +MFER0_SI2122 +MFER0_SI862 +MFER0_SX142 +MFER0_SX232 +MFER0_SX322 +MFER0_SX412 +MFER0_SX52 +MFMC0_SI1132 +MFMC0_SI1762 +MFMC0_SI502 +MFMC0_SX142 +MFMC0_SX232 +MFMC0_SX322 +MFMC0_SX412 +MFMC0_SX52 +MFRM0_SI1155 +MFRM0_SI1717 +MFRM0_SI1785 +MFRM0_SX165 +MFRM0_SX255 +MFRM0_SX345 +MFRM0_SX435 +MFRM0_SX75 +MFWK0_SI1249 +MFWK0_SI1879 +MFWK0_SI619 +MFWK0_SX169 +MFWK0_SX259 +MFWK0_SX349 +MFWK0_SX439 +MFWK0_SX79 +MFXS0_SI1674 +MFXS0_SI2225 +MFXS0_SI2304 +MFXS0_SX144 +MFXS0_SX234 +MFXS0_SX324 +MFXS0_SX414 +MFXS0_SX54 +MFXV0_SI1005 +MFXV0_SI1342 +MFXV0_SI1635 +MFXV0_SX105 +MFXV0_SX15 +MFXV0_SX195 +MFXV0_SX285 +MFXV0_SX375 +MGAF0_SI1282 +MGAF0_SI1912 +MGAF0_SI652 +MGAF0_SX112 +MGAF0_SX202 +MGAF0_SX22 +MGAF0_SX292 +MGAF0_SX382 +MGAG0_SI1321 +MGAG0_SI645 +MGAG0_SI691 +MGAG0_SX151 +MGAG0_SX241 +MGAG0_SX331 +MGAG0_SX421 +MGAG0_SX61 +MGAK0_SI1036 +MGAK0_SI1666 +MGAK0_SI2296 +MGAK0_SX136 +MGAK0_SX226 +MGAK0_SX316 +MGAK0_SX406 +MGAK0_SX46 +MGAR0_SI1212 +MGAR0_SI1694 +MGAR0_SI1842 +MGAR0_SX132 +MGAR0_SX222 +MGAR0_SX312 +MGAR0_SX402 +MGAR0_SX42 +MGAW0_SI1165 +MGAW0_SI1802 +MGAW0_SI535 +MGAW0_SX175 +MGAW0_SX265 +MGAW0_SX355 +MGAW0_SX445 +MGAW0_SX85 +MGES0_SI1481 +MGES0_SI2111 +MGES0_SI851 +MGES0_SX131 +MGES0_SX221 +MGES0_SX311 +MGES0_SX401 +MGES0_SX41 +MGJC0_SI1256 +MGJC0_SI1335 +MGJC0_SI1965 +MGJC0_SX165 +MGJC0_SX255 +MGJC0_SX345 +MGJC0_SX435 +MGJC0_SX75 +MGRL0_SI1497 +MGRL0_SI2127 +MGRL0_SI867 +MGRL0_SX147 +MGRL0_SX237 +MGRL0_SX327 +MGRL0_SX417 +MGRL0_SX57 +MGRP0_SI1317 +MGRP0_SI1947 +MGRP0_SI687 +MGRP0_SX147 +MGRP0_SX237 +MGRP0_SX327 +MGRP0_SX417 +MGRP0_SX57 +MGSH0_SI1176 +MGSH0_SI1806 +MGSH0_SI546 +MGSH0_SX127 +MGSH0_SX186 +MGSH0_SX276 +MGSH0_SX6 +MGSH0_SX96 +MGSL0_SI1164 +MGSL0_SI534 +MGSL0_SI797 +MGSL0_SX174 +MGSL0_SX264 +MGSL0_SX354 +MGSL0_SX444 +MGSL0_SX84 +MGXP0_SI1087 +MGXP0_SI457 +MGXP0_SI525 +MGXP0_SX187 +MGXP0_SX277 +MGXP0_SX367 +MGXP0_SX7 +MGXP0_SX97 +MHBS0_SI1575 +MHBS0_SI2205 +MHBS0_SI945 +MHBS0_SX135 +MHBS0_SX225 +MHBS0_SX315 +MHBS0_SX405 +MHBS0_SX45 +MHIT0_SI1613 +MHIT0_SI2243 +MHIT0_SI983 +MHIT0_SX173 +MHIT0_SX263 +MHIT0_SX353 +MHIT0_SX443 +MHIT0_SX83 +MHJB0_SI1017 +MHJB0_SI1647 +MHJB0_SI2277 +MHJB0_SX117 +MHJB0_SX207 +MHJB0_SX27 +MHJB0_SX297 +MHJB0_SX387 +MHMG0_SI1365 +MHMG0_SI1995 +MHMG0_SI735 +MHMG0_SX105 +MHMG0_SX15 +MHMG0_SX195 +MHMG0_SX285 +MHMG0_SX375 +MHMR0_SI1119 +MHMR0_SI1692 +MHMR0_SI489 +MHMR0_SX129 +MHMR0_SX219 +MHMR0_SX309 +MHMR0_SX39 
+MHMR0_SX399 +MHRM0_SI1475 +MHRM0_SI2218 +MHRM0_SI958 +MHRM0_SX148 +MHRM0_SX238 +MHRM0_SX328 +MHRM0_SX418 +MHRM0_SX58 +MHXL0_SI1772 +MHXL0_SI512 +MHXL0_SI612 +MHXL0_SX152 +MHXL0_SX242 +MHXL0_SX332 +MHXL0_SX422 +MHXL0_SX62 +MILB0_SI2163 +MILB0_SI807 +MILB0_SI903 +MILB0_SX183 +MILB0_SX273 +MILB0_SX3 +MILB0_SX363 +MILB0_SX93 +MJAC0_SI1331 +MJAC0_SI2148 +MJAC0_SI701 +MJAC0_SX251 +MJAC0_SX307 +MJAC0_SX341 +MJAC0_SX431 +MJAC0_SX71 +MJAE0_SI1524 +MJAE0_SI1999 +MJAE0_SI2154 +MJAE0_SX174 +MJAE0_SX264 +MJAE0_SX354 +MJAE0_SX444 +MJAE0_SX84 +MJAI0_SI1604 +MJAI0_SI682 +MJAI0_SI710 +MJAI0_SX164 +MJAI0_SX254 +MJAI0_SX344 +MJAI0_SX434 +MJAI0_SX74 +MJBG0_SI1232 +MJBG0_SI1724 +MJBG0_SI1862 +MJBG0_SX152 +MJBG0_SX242 +MJBG0_SX332 +MJBG0_SX422 +MJBG0_SX62 +MJDA0_SI1031 +MJDA0_SI1661 +MJDA0_SI2291 +MJDA0_SX131 +MJDA0_SX221 +MJDA0_SX311 +MJDA0_SX401 +MJDA0_SX41 +MJDC0_SI1161 +MJDC0_SI2165 +MJDC0_SI531 +MJDC0_SX171 +MJDC0_SX261 +MJDC0_SX351 +MJDC0_SX441 +MJDC0_SX81 +MJDE0_SI1120 +MJDE0_SI463 +MJDE0_SI490 +MJDE0_SX130 +MJDE0_SX220 +MJDE0_SX310 +MJDE0_SX40 +MJDE0_SX400 +MJDG0_SI1042 +MJDG0_SI1672 +MJDG0_SI1705 +MJDG0_SX142 +MJDG0_SX232 +MJDG0_SX322 +MJDG0_SX412 +MJDG0_SX52 +MJDM0_SI1340 +MJDM0_SI1937 +MJDM0_SI974 +MJDM0_SX170 +MJDM0_SX260 +MJDM0_SX350 +MJDM0_SX440 +MJDM0_SX80 +MJEB0_SI1286 +MJEB0_SI1916 +MJEB0_SI656 +MJEB0_SX170 +MJEB0_SX206 +MJEB0_SX26 +MJEB0_SX296 +MJEB0_SX386 +MJEB1_SI1467 +MJEB1_SI2097 +MJEB1_SI837 +MJEB1_SX117 +MJEB1_SX207 +MJEB1_SX27 +MJEB1_SX297 +MJEB1_SX387 +MJEE0_SI1237 +MJEE0_SI1867 +MJEE0_SI607 +MJEE0_SX157 +MJEE0_SX247 +MJEE0_SX337 +MJEE0_SX427 +MJEE0_SX67 +MJFH0_SI1107 +MJFH0_SI1737 +MJFH0_SI477 +MJFH0_SX117 +MJFH0_SX207 +MJFH0_SX27 +MJFH0_SX297 +MJFH0_SX387 +MJFR0_SI1605 +MJFR0_SI2235 +MJFR0_SI975 +MJFR0_SX165 +MJFR0_SX255 +MJFR0_SX345 +MJFR0_SX435 +MJFR0_SX75 +MJHI0_SI1328 +MJHI0_SI555 +MJHI0_SI698 +MJHI0_SX158 +MJHI0_SX248 +MJHI0_SX338 +MJHI0_SX428 +MJHI0_SX68 +MJJB0_SI1139 +MJJB0_SI1277 +MJJB0_SI1769 +MJJB0_SX149 +MJJB0_SX239 +MJJB0_SX329 +MJJB0_SX419 +MJJB0_SX59 +MJJJ0_SI1163 +MJJJ0_SI1793 +MJJJ0_SI533 +MJJJ0_SX173 +MJJJ0_SX263 +MJJJ0_SX353 +MJJJ0_SX443 +MJJJ0_SX83 +MJJM0_SI1251 +MJJM0_SI1457 +MJJM0_SI827 +MJJM0_SX107 +MJJM0_SX17 +MJJM0_SX197 +MJJM0_SX287 +MJJM0_SX377 +MJKR0_SI1201 +MJKR0_SI1831 +MJKR0_SI571 +MJKR0_SX121 +MJKR0_SX211 +MJKR0_SX301 +MJKR0_SX31 +MJKR0_SX391 +MJLB0_SI1616 +MJLB0_SI2246 +MJLB0_SI986 +MJLB0_SX176 +MJLB0_SX266 +MJLB0_SX356 +MJLB0_SX446 +MJLB0_SX86 +MJLG1_SI1012 +MJLG1_SI1642 +MJLG1_SI2272 +MJLG1_SX112 +MJLG1_SX202 +MJLG1_SX22 +MJLG1_SX292 +MJLG1_SX382 +MJLS0_SI1096 +MJLS0_SI1726 +MJLS0_SI466 +MJLS0_SX106 +MJLS0_SX16 +MJLS0_SX196 +MJLS0_SX286 +MJLS0_SX376 +MJMA0_SI1495 +MJMA0_SI2125 +MJMA0_SI865 +MJMA0_SX145 +MJMA0_SX235 +MJMA0_SX325 +MJMA0_SX415 +MJMA0_SX55 +MJMD0_SI1028 +MJMD0_SI1658 +MJMD0_SI2288 +MJMD0_SX128 +MJMD0_SX218 +MJMD0_SX308 +MJMD0_SX38 +MJMD0_SX398 +MJMM0_SI1255 +MJMM0_SI1885 +MJMM0_SI625 +MJMM0_SX175 +MJMM0_SX265 +MJMM0_SX355 +MJMM0_SX445 +MJMM0_SX85 +MJPG0_SI1191 +MJPG0_SI1821 +MJPG0_SI561 +MJPG0_SX111 +MJPG0_SX201 +MJPG0_SX21 +MJPG0_SX291 +MJPG0_SX381 +MJPM0_SI1368 +MJPM0_SI1998 +MJPM0_SI738 +MJPM0_SX108 +MJPM0_SX18 +MJPM0_SX198 +MJPM0_SX288 +MJPM0_SX378 +MJPM1_SI1897 +MJPM1_SI2280 +MJPM1_SI761 +MJPM1_SX131 +MJPM1_SX221 +MJPM1_SX311 +MJPM1_SX401 +MJPM1_SX41 +MJRA0_SI1236 +MJRA0_SI1866 +MJRA0_SI606 +MJRA0_SX156 +MJRA0_SX246 +MJRA0_SX336 +MJRA0_SX426 +MJRA0_SX66 +MJRG0_SI1366 +MJRG0_SI1996 +MJRG0_SI736 +MJRG0_SX106 +MJRG0_SX16 +MJRG0_SX286 +MJRG0_SX352 +MJRG0_SX376 +MJRH0_SI1125 +MJRH0_SI1755 +MJRH0_SI1840 +MJRH0_SX135 +MJRH0_SX225 
+MJRH0_SX315 +MJRH0_SX405 +MJRH0_SX45 +MJRH1_SI1558 +MJRH1_SI1774 +MJRH1_SI514 +MJRH1_SX154 +MJRH1_SX244 +MJRH1_SX334 +MJRH1_SX424 +MJRH1_SX64 +MJRK0_SI1662 +MJRK0_SI2103 +MJRK0_SI880 +MJRK0_SX160 +MJRK0_SX250 +MJRK0_SX340 +MJRK0_SX430 +MJRK0_SX70 +MJRP0_SI1835 +MJRP0_SI1845 +MJRP0_SI585 +MJRP0_SX135 +MJRP0_SX225 +MJRP0_SX315 +MJRP0_SX405 +MJRP0_SX45 +MJSR0_SI1424 +MJSR0_SI2054 +MJSR0_SI794 +MJSR0_SX164 +MJSR0_SX254 +MJSR0_SX344 +MJSR0_SX434 +MJSR0_SX74 +MJWG0_SI2155 +MJWG0_SI813 +MJWG0_SI895 +MJWG0_SX175 +MJWG0_SX265 +MJWG0_SX355 +MJWG0_SX445 +MJWG0_SX85 +MJWS0_SI1143 +MJWS0_SI1773 +MJWS0_SI513 +MJWS0_SX153 +MJWS0_SX243 +MJWS0_SX333 +MJWS0_SX423 +MJWS0_SX63 +MJWT0_SI1291 +MJWT0_SI1381 +MJWT0_SI751 +MJWT0_SX121 +MJWT0_SX211 +MJWT0_SX301 +MJWT0_SX31 +MJWT0_SX391 +MJXA0_SI1507 +MJXA0_SI2137 +MJXA0_SI877 +MJXA0_SX157 +MJXA0_SX247 +MJXA0_SX337 +MJXA0_SX427 +MJXA0_SX67 +MJXL0_SI1172 +MJXL0_SI1795 +MJXL0_SI542 +MJXL0_SX182 +MJXL0_SX272 +MJXL0_SX362 +MJXL0_SX452 +MJXL0_SX92 +MKAG0_SI1609 +MKAG0_SI2239 +MKAG0_SI979 +MKAG0_SX169 +MKAG0_SX259 +MKAG0_SX30 +MKAG0_SX439 +MKAG0_SX79 +MKAH0_SI1528 +MKAH0_SI2158 +MKAH0_SI898 +MKAH0_SX178 +MKAH0_SX268 +MKAH0_SX358 +MKAH0_SX448 +MKAH0_SX88 +MKAJ0_SI1414 +MKAJ0_SI2044 +MKAJ0_SI784 +MKAJ0_SX154 +MKAJ0_SX244 +MKAJ0_SX334 +MKAJ0_SX424 +MKAJ0_SX64 +MKAM0_SI1250 +MKAM0_SI1316 +MKAM0_SI1465 +MKAM0_SX146 +MKAM0_SX236 +MKAM0_SX326 +MKAM0_SX416 +MKAM0_SX56 +MKDB0_SI2132 +MKDB0_SI588 +MKDB0_SI872 +MKDB0_SX152 +MKDB0_SX242 +MKDB0_SX332 +MKDB0_SX422 +MKDB0_SX62 +MKDD0_SI1567 +MKDD0_SI2197 +MKDD0_SI937 +MKDD0_SX127 +MKDD0_SX217 +MKDD0_SX307 +MKDD0_SX37 +MKDD0_SX397 +MKDT0_SI2153 +MKDT0_SI814 +MKDT0_SI893 +MKDT0_SX173 +MKDT0_SX263 +MKDT0_SX353 +MKDT0_SX443 +MKDT0_SX83 +MKES0_SI1253 +MKES0_SI1883 +MKES0_SI623 +MKES0_SX173 +MKES0_SX263 +MKES0_SX353 +MKES0_SX443 +MKES0_SX83 +MKJO0_SI1517 +MKJO0_SI2147 +MKJO0_SI887 +MKJO0_SX167 +MKJO0_SX257 +MKJO0_SX424 +MKJO0_SX437 +MKJO0_SX77 +MKLN0_SI1598 +MKLN0_SI2228 +MKLN0_SI968 +MKLN0_SX158 +MKLN0_SX248 +MKLN0_SX338 +MKLN0_SX428 +MKLN0_SX68 +MKLR0_SI1059 +MKLR0_SI1689 +MKLR0_SI2319 +MKLR0_SX159 +MKLR0_SX249 +MKLR0_SX339 +MKLR0_SX429 +MKLR0_SX69 +MKLS0_SI1437 +MKLS0_SI1533 +MKLS0_SI2067 +MKLS0_SX177 +MKLS0_SX267 +MKLS0_SX357 +MKLS0_SX447 +MKLS0_SX87 +MKLS1_SI1545 +MKLS1_SI2175 +MKLS1_SI915 +MKLS1_SX105 +MKLS1_SX15 +MKLS1_SX195 +MKLS1_SX285 +MKLS1_SX375 +MKLW0_SI1571 +MKLW0_SI1844 +MKLW0_SI2201 +MKLW0_SX131 +MKLW0_SX221 +MKLW0_SX311 +MKLW0_SX401 +MKLW0_SX41 +MKRG0_SI1491 +MKRG0_SI2121 +MKRG0_SI861 +MKRG0_SX141 +MKRG0_SX231 +MKRG0_SX31 +MKRG0_SX411 +MKRG0_SX51 +MKXL0_SI1185 +MKXL0_SI1815 +MKXL0_SI1958 +MKXL0_SX105 +MKXL0_SX15 +MKXL0_SX195 +MKXL0_SX285 +MKXL0_SX375 +MLBC0_SI1239 +MLBC0_SI1869 +MLBC0_SI609 +MLBC0_SX159 +MLBC0_SX249 +MLBC0_SX339 +MLBC0_SX429 +MLBC0_SX69 +MLEL0_SI1246 +MLEL0_SI1876 +MLEL0_SI616 +MLEL0_SX166 +MLEL0_SX256 +MLEL0_SX346 +MLEL0_SX436 +MLEL0_SX76 +MLJC0_SI1225 +MLJC0_SI1855 +MLJC0_SI595 +MLJC0_SX145 +MLJC0_SX235 +MLJC0_SX325 +MLJC0_SX415 +MLJC0_SX55 +MLJH0_SI1324 +MLJH0_SI1422 +MLJH0_SI694 +MLJH0_SX154 +MLJH0_SX244 +MLJH0_SX334 +MLJH0_SX424 +MLJH0_SX64 +MLNS0_SI1407 +MLNS0_SI2037 +MLNS0_SI777 +MLNS0_SX147 +MLNS0_SX237 +MLNS0_SX327 +MLNS0_SX417 +MLNS0_SX57 +MLSH0_SI1417 +MLSH0_SI2047 +MLSH0_SI787 +MLSH0_SX157 +MLSH0_SX247 +MLSH0_SX337 +MLSH0_SX427 +MLSH0_SX67 +MMAA0_SI1588 +MMAA0_SI2105 +MMAA0_SI845 +MMAA0_SX125 +MMAA0_SX215 +MMAA0_SX305 +MMAA0_SX35 +MMAA0_SX395 +MMAB1_SI1494 +MMAB1_SI2124 +MMAB1_SI864 +MMAB1_SX144 +MMAB1_SX234 +MMAB1_SX324 +MMAB1_SX414 +MMAB1_SX54 +MMAG0_SI1126 +MMAG0_SI1756 +MMAG0_SI496 
+MMAG0_SX136 +MMAG0_SX226 +MMAG0_SX316 +MMAG0_SX406 +MMAG0_SX46 +MMAM0_SI1597 +MMAM0_SI1668 +MMAM0_SI2227 +MMAM0_SX157 +MMAM0_SX247 +MMAM0_SX337 +MMAM0_SX427 +MMAM0_SX67 +MMAR0_SI1336 +MMAR0_SI1966 +MMAR0_SI706 +MMAR0_SX166 +MMAR0_SX256 +MMAR0_SX346 +MMAR0_SX436 +MMAR0_SX76 +MMBS0_SI1151 +MMBS0_SI1781 +MMBS0_SI521 +MMBS0_SX161 +MMBS0_SX251 +MMBS0_SX341 +MMBS0_SX431 +MMBS0_SX71 +MMCC0_SI1338 +MMCC0_SI1968 +MMCC0_SI708 +MMCC0_SX168 +MMCC0_SX258 +MMCC0_SX348 +MMCC0_SX438 +MMCC0_SX78 +MMDB0_SI1358 +MMDB0_SI1617 +MMDB0_SI987 +MMDB0_SX177 +MMDB0_SX267 +MMDB0_SX357 +MMDB0_SX447 +MMDB0_SX87 +MMDG0_SI1780 +MMDG0_SI2035 +MMDG0_SI520 +MMDG0_SX160 +MMDG0_SX250 +MMDG0_SX340 +MMDG0_SX430 +MMDG0_SX70 +MMDM0_SI1311 +MMDM0_SI1941 +MMDM0_SI681 +MMDM0_SX141 +MMDM0_SX231 +MMDM0_SX321 +MMDM0_SX411 +MMDM0_SX51 +MMDM1_SI1650 +MMDM1_SI2043 +MMDM1_SI783 +MMDM1_SX153 +MMDM1_SX243 +MMDM1_SX333 +MMDM1_SX423 +MMDM1_SX63 +MMDS0_SI1343 +MMDS0_SI1973 +MMDS0_SI713 +MMDS0_SX173 +MMDS0_SX263 +MMDS0_SX353 +MMDS0_SX443 +MMDS0_SX83 +MMEA0_SI1388 +MMEA0_SI2018 +MMEA0_SI758 +MMEA0_SX128 +MMEA0_SX218 +MMEA0_SX308 +MMEA0_SX38 +MMEA0_SX398 +MMEB0_SI1357 +MMEB0_SI1987 +MMEB0_SI727 +MMEB0_SX187 +MMEB0_SX327 +MMEB0_SX367 +MMEB0_SX7 +MMEB0_SX97 +MMGC0_SI1305 +MMGC0_SI1935 +MMGC0_SI2184 +MMGC0_SX135 +MMGC0_SX225 +MMGC0_SX315 +MMGC0_SX405 +MMGC0_SX45 +MMGG0_SI1079 +MMGG0_SI1709 +MMGG0_SI2339 +MMGG0_SX179 +MMGG0_SX269 +MMGG0_SX359 +MMGG0_SX449 +MMGG0_SX89 +MMGK0_SI1322 +MMGK0_SI1952 +MMGK0_SI692 +MMGK0_SX152 +MMGK0_SX242 +MMGK0_SX332 +MMGK0_SX422 +MMGK0_SX62 +MMJB1_SI1408 +MMJB1_SI2038 +MMJB1_SI778 +MMJB1_SX148 +MMJB1_SX238 +MMJB1_SX328 +MMJB1_SX418 +MMJB1_SX58 +MMLM0_SI1527 +MMLM0_SI2150 +MMLM0_SI897 +MMLM0_SX177 +MMLM0_SX267 +MMLM0_SX357 +MMLM0_SX447 +MMLM0_SX87 +MMPM0_SI1061 +MMPM0_SI1691 +MMPM0_SI2321 +MMPM0_SX161 +MMPM0_SX251 +MMPM0_SX341 +MMPM0_SX431 +MMPM0_SX71 +MMRP0_SI2034 +MMRP0_SI717 +MMRP0_SI774 +MMRP0_SX144 +MMRP0_SX234 +MMRP0_SX324 +MMRP0_SX414 +MMRP0_SX54 +MMSM0_SI1106 +MMSM0_SI1736 +MMSM0_SI476 +MMSM0_SX116 +MMSM0_SX206 +MMSM0_SX26 +MMSM0_SX296 +MMSM0_SX386 +MMVP0_SI1284 +MMVP0_SI1914 +MMVP0_SI654 +MMVP0_SX114 +MMVP0_SX204 +MMVP0_SX294 +MMVP0_SX347 +MMVP0_SX384 +MMWB0_SI1619 +MMWB0_SI2249 +MMWB0_SI989 +MMWB0_SX179 +MMWB0_SX269 +MMWB0_SX359 +MMWB0_SX449 +MMWB0_SX89 +MMWS0_SI1518 +MMWS0_SI559 +MMWS0_SI888 +MMWS0_SX168 +MMWS0_SX258 +MMWS0_SX348 +MMWS0_SX438 +MMWS0_SX78 +MMWS1_SI1071 +MMWS1_SI1701 +MMWS1_SI2331 +MMWS1_SX261 +MMWS1_SX27 +MMWS1_SX351 +MMWS1_SX441 +MMWS1_SX81 +MMXS0_SI2136 +MMXS0_SI629 +MMXS0_SI876 +MMXS0_SX156 +MMXS0_SX246 +MMXS0_SX336 +MMXS0_SX426 +MMXS0_SX66 +MNET0_SI1446 +MNET0_SI2076 +MNET0_SI816 +MNET0_SX186 +MNET0_SX276 +MNET0_SX366 +MNET0_SX6 +MNET0_SX96 +MNTW0_SI1068 +MNTW0_SI1698 +MNTW0_SI2328 +MNTW0_SX168 +MNTW0_SX202 +MNTW0_SX258 +MNTW0_SX348 +MNTW0_SX78 +MPAR0_SI1576 +MPAR0_SI2206 +MPAR0_SI946 +MPAR0_SX136 +MPAR0_SX226 +MPAR0_SX316 +MPAR0_SX406 +MPAR0_SX46 +MPEB0_SI1034 +MPEB0_SI1860 +MPEB0_SI600 +MPEB0_SX150 +MPEB0_SX240 +MPEB0_SX330 +MPEB0_SX420 +MPEB0_SX60 +MPFU0_SI1258 +MPFU0_SI1888 +MPFU0_SI628 +MPFU0_SX178 +MPFU0_SX268 +MPFU0_SX358 +MPFU0_SX448 +MPFU0_SX88 +MPGH0_SI1554 +MPGH0_SI675 +MPGH0_SI924 +MPGH0_SX114 +MPGH0_SX204 +MPGH0_SX24 +MPGH0_SX294 +MPGH0_SX384 +MPGR0_SI1410 +MPGR0_SI2040 +MPGR0_SI780 +MPGR0_SX150 +MPGR0_SX240 +MPGR0_SX330 +MPGR0_SX420 +MPGR0_SX60 +MPGR1_SI1269 +MPGR1_SI1499 +MPGR1_SI2129 +MPGR1_SX149 +MPGR1_SX239 +MPGR1_SX329 +MPGR1_SX419 +MPGR1_SX59 +MPMB0_SI1501 +MPMB0_SI2131 +MPMB0_SI871 +MPMB0_SX151 +MPMB0_SX241 +MPMB0_SX331 +MPMB0_SX421 +MPMB0_SX61 +MPPC0_SI1412 
+MPPC0_SI2042 +MPPC0_SI782 +MPPC0_SX152 +MPPC0_SX242 +MPPC0_SX332 +MPPC0_SX422 +MPPC0_SX62 +MPRB0_SI1205 +MPRB0_SI1215 +MPRB0_SI575 +MPRB0_SX125 +MPRB0_SX215 +MPRB0_SX305 +MPRB0_SX35 +MPRB0_SX395 +MPRD0_SI1431 +MPRD0_SI2061 +MPRD0_SI801 +MPRD0_SX171 +MPRD0_SX261 +MPRD0_SX351 +MPRD0_SX441 +MPRD0_SX81 +MPRK0_SI1097 +MPRK0_SI1727 +MPRK0_SI467 +MPRK0_SX107 +MPRK0_SX17 +MPRK0_SX197 +MPRK0_SX287 +MPRK0_SX377 +MPRT0_SI1210 +MPRT0_SI495 +MPRT0_SI580 +MPRT0_SX130 +MPRT0_SX220 +MPRT0_SX310 +MPRT0_SX40 +MPRT0_SX400 +MPSW0_SI1067 +MPSW0_SI1697 +MPSW0_SI2327 +MPSW0_SX167 +MPSW0_SX24 +MPSW0_SX257 +MPSW0_SX437 +MPSW0_SX77 +MRAB0_SI1224 +MRAB0_SI1854 +MRAB0_SI594 +MRAB0_SX144 +MRAB0_SX234 +MRAB0_SX324 +MRAB0_SX414 +MRAB0_SX54 +MRAB1_SI1478 +MRAB1_SI2108 +MRAB1_SI848 +MRAB1_SX128 +MRAB1_SX218 +MRAB1_SX308 +MRAB1_SX38 +MRAB1_SX398 +MRAI0_SI1954 +MRAI0_SI2052 +MRAI0_SI792 +MRAI0_SX162 +MRAI0_SX252 +MRAI0_SX342 +MRAI0_SX432 +MRAI0_SX72 +MRAM0_SI1275 +MRAM0_SI1905 +MRAM0_SI1951 +MRAM0_SX105 +MRAM0_SX15 +MRAM0_SX195 +MRAM0_SX285 +MRAM0_SX375 +MRAV0_SI1008 +MRAV0_SI1638 +MRAV0_SI2268 +MRAV0_SX108 +MRAV0_SX18 +MRAV0_SX198 +MRAV0_SX288 +MRAV0_SX378 +MRBC0_SI1665 +MRBC0_SI1859 +MRBC0_SI599 +MRBC0_SX149 +MRBC0_SX239 +MRBC0_SX329 +MRBC0_SX419 +MRBC0_SX59 +MRCG0_SI1428 +MRCG0_SI2058 +MRCG0_SI798 +MRCG0_SX168 +MRCG0_SX258 +MRCG0_SX348 +MRCG0_SX438 +MRCG0_SX78 +MRCW0_SI1371 +MRCW0_SI2001 +MRCW0_SI741 +MRCW0_SX111 +MRCW0_SX201 +MRCW0_SX21 +MRCW0_SX291 +MRCW0_SX381 +MRDD0_SI1050 +MRDD0_SI1680 +MRDD0_SI2310 +MRDD0_SX150 +MRDD0_SX240 +MRDD0_SX277 +MRDD0_SX330 +MRDD0_SX60 +MRDM0_SI1044 +MRDM0_SI1595 +MRDM0_SI965 +MRDM0_SX155 +MRDM0_SX245 +MRDM0_SX335 +MRDM0_SX425 +MRDM0_SX65 +MRDS0_SI1167 +MRDS0_SI1797 +MRDS0_SI537 +MRDS0_SX177 +MRDS0_SX267 +MRDS0_SX357 +MRDS0_SX447 +MRDS0_SX87 +MREE0_SI1104 +MREE0_SI1734 +MREE0_SI1959 +MREE0_SX114 +MREE0_SX204 +MREE0_SX24 +MREE0_SX294 +MREE0_SX384 +MREH1_SI1599 +MREH1_SI2229 +MREH1_SI969 +MREH1_SX159 +MREH1_SX249 +MREH1_SX339 +MREH1_SX429 +MREH1_SX69 +MREM0_SI1591 +MREM0_SI511 +MREM0_SI961 +MREM0_SX151 +MREM0_SX241 +MREM0_SX331 +MREM0_SX421 +MREM0_SX61 +MREW1_SI1500 +MREW1_SI2130 +MREW1_SI870 +MREW1_SX150 +MREW1_SX240 +MREW1_SX330 +MREW1_SX420 +MREW1_SX60 +MRFK0_SI1076 +MRFK0_SI1706 +MRFK0_SI2336 +MRFK0_SX176 +MRFK0_SX266 +MRFK0_SX356 +MRFK0_SX446 +MRFK0_SX86 +MRFL0_SI1156 +MRFL0_SI1786 +MRFL0_SI526 +MRFL0_SX166 +MRFL0_SX256 +MRFL0_SX346 +MRFL0_SX436 +MRFL0_SX76 +MRGM0_SI1162 +MRGM0_SI1792 +MRGM0_SI532 +MRGM0_SX172 +MRGM0_SX262 +MRGM0_SX416 +MRGM0_SX442 +MRGM0_SX82 +MRGS0_SI1356 +MRGS0_SI1986 +MRGS0_SI726 +MRGS0_SX186 +MRGS0_SX276 +MRGS0_SX366 +MRGS0_SX6 +MRGS0_SX96 +MRHL0_SI1515 +MRHL0_SI2145 +MRHL0_SI885 +MRHL0_SX165 +MRHL0_SX255 +MRHL0_SX345 +MRHL0_SX435 +MRHL0_SX75 +MRJB1_SI1020 +MRJB1_SI1413 +MRJB1_SI2021 +MRJB1_SX120 +MRJB1_SX210 +MRJB1_SX30 +MRJB1_SX300 +MRJB1_SX390 +MRJH0_SI1519 +MRJH0_SI889 +MRJH0_SI914 +MRJH0_SX169 +MRJH0_SX259 +MRJH0_SX307 +MRJH0_SX439 +MRJH0_SX79 +MRJM0_SI1095 +MRJM0_SI1228 +MRJM0_SI1858 +MRJM0_SX148 +MRJM0_SX238 +MRJM0_SX328 +MRJM0_SX418 +MRJM0_SX58 +MRJM1_SI1298 +MRJM1_SI1928 +MRJM1_SI668 +MRJM1_SX128 +MRJM1_SX218 +MRJM1_SX308 +MRJM1_SX38 +MRJM1_SX398 +MRJT0_SI1498 +MRJT0_SI1805 +MRJT0_SI868 +MRJT0_SX148 +MRJT0_SX238 +MRJT0_SX328 +MRJT0_SX418 +MRJT0_SX58 +MRKM0_SI1267 +MRKM0_SI1391 +MRKM0_SI637 +MRKM0_SX187 +MRKM0_SX277 +MRKM0_SX367 +MRKM0_SX7 +MRKM0_SX97 +MRLD0_SI1594 +MRLD0_SI2224 +MRLD0_SI964 +MRLD0_SX154 +MRLD0_SX244 +MRLD0_SX334 +MRLD0_SX424 +MRLD0_SX64 +MRLJ0_SI1420 +MRLJ0_SI2050 +MRLJ0_SI790 +MRLJ0_SX160 +MRLJ0_SX250 +MRLJ0_SX340 +MRLJ0_SX430 
+MRLJ0_SX70 +MRLJ1_SI1671 +MRLJ1_SI2301 +MRLJ1_SI2332 +MRLJ1_SX141 +MRLJ1_SX231 +MRLJ1_SX321 +MRLJ1_SX411 +MRLJ1_SX51 +MRLK0_SI1468 +MRLK0_SI2140 +MRLK0_SI843 +MRLK0_SX123 +MRLK0_SX213 +MRLK0_SX303 +MRLK0_SX33 +MRLK0_SX393 +MRLR0_SI1196 +MRLR0_SI1826 +MRLR0_SI566 +MRLR0_SX116 +MRLR0_SX206 +MRLR0_SX26 +MRLR0_SX296 +MRLR0_SX386 +MRMB0_SI1581 +MRMB0_SI2211 +MRMB0_SI951 +MRMB0_SX141 +MRMB0_SX231 +MRMB0_SX321 +MRMB0_SX411 +MRMB0_SX51 +MRMG0_SI1080 +MRMG0_SI1710 +MRMG0_SI2340 +MRMG0_SX180 +MRMG0_SX270 +MRMG0_SX360 +MRMG0_SX450 +MRMG0_SX90 +MRMH0_SI1021 +MRMH0_SI1349 +MRMH0_SI2281 +MRMH0_SX121 +MRMH0_SX211 +MRMH0_SX301 +MRMH0_SX31 +MRMH0_SX391 +MRML0_SI1421 +MRML0_SI2051 +MRML0_SI791 +MRML0_SX161 +MRML0_SX251 +MRML0_SX341 +MRML0_SX431 +MRML0_SX71 +MRMS0_SI1113 +MRMS0_SI2057 +MRMS0_SI2100 +MRMS0_SX120 +MRMS0_SX210 +MRMS0_SX30 +MRMS0_SX300 +MRMS0_SX390 +MRPC1_SI1482 +MRPC1_SI2026 +MRPC1_SI2112 +MRPC1_SX132 +MRPC1_SX222 +MRPC1_SX312 +MRPC1_SX402 +MRPC1_SX42 +MRRE0_SI1334 +MRRE0_SI704 +MRRE0_SI952 +MRRE0_SX164 +MRRE0_SX254 +MRRE0_SX344 +MRRE0_SX434 +MRRE0_SX74 +MRSO0_SI1206 +MRSO0_SI1659 +MRSO0_SI2289 +MRSO0_SX129 +MRSO0_SX219 +MRSO0_SX309 +MRSO0_SX39 +MRSO0_SX399 +MRSP0_SI1429 +MRSP0_SI2059 +MRSP0_SI799 +MRSP0_SX169 +MRSP0_SX196 +MRSP0_SX259 +MRSP0_SX439 +MRSP0_SX79 +MRTC0_SI1458 +MRTC0_SI2088 +MRTC0_SI828 +MRTC0_SX108 +MRTC0_SX18 +MRTC0_SX198 +MRTC0_SX288 +MRTC0_SX378 +MRTJ0_SI1551 +MRTJ0_SI2032 +MRTJ0_SI772 +MRTJ0_SX142 +MRTJ0_SX232 +MRTJ0_SX322 +MRTJ0_SX412 +MRTJ0_SX52 +MRVG0_SI1140 +MRVG0_SI1770 +MRVG0_SI510 +MRVG0_SX150 +MRVG0_SX240 +MRVG0_SX330 +MRVG0_SX420 +MRVG0_SX60 +MRWA0_SI1603 +MRWA0_SI2233 +MRWA0_SI973 +MRWA0_SX163 +MRWA0_SX253 +MRWA0_SX343 +MRWA0_SX433 +MRWA0_SX73 +MRWS0_SI1102 +MRWS0_SI1732 +MRWS0_SI472 +MRWS0_SX112 +MRWS0_SX202 +MRWS0_SX22 +MRWS0_SX292 +MRWS0_SX382 +MRXB0_SI1585 +MRXB0_SI2215 +MRXB0_SI955 +MRXB0_SX145 +MRXB0_SX235 +MRXB0_SX325 +MRXB0_SX415 +MRXB0_SX55 +MSAH1_SI1049 +MSAH1_SI1679 +MSAH1_SI2309 +MSAH1_SX149 +MSAH1_SX239 +MSAH1_SX329 +MSAH1_SX419 +MSAH1_SX59 +MSAS0_SI1376 +MSAS0_SI2006 +MSAS0_SI746 +MSAS0_SX116 +MSAS0_SX206 +MSAS0_SX26 +MSAS0_SX296 +MSAS0_SX386 +MSAT0_SI1526 +MSAT0_SI2156 +MSAT0_SI896 +MSAT0_SX176 +MSAT0_SX266 +MSAT0_SX356 +MSAT0_SX446 +MSAT0_SX86 +MSAT1_SI1073 +MSAT1_SI1703 +MSAT1_SI2333 +MSAT1_SX173 +MSAT1_SX263 +MSAT1_SX353 +MSAT1_SX443 +MSAT1_SX83 +MSDB0_SI1007 +MSDB0_SI1637 +MSDB0_SI2267 +MSDB0_SX107 +MSDB0_SX17 +MSDB0_SX197 +MSDB0_SX287 +MSDB0_SX377 +MSDH0_SI2113 +MSDH0_SI2240 +MSDH0_SI980 +MSDH0_SX170 +MSDH0_SX260 +MSDH0_SX350 +MSDH0_SX440 +MSDH0_SX80 +MSDS0_SI1077 +MSDS0_SI1707 +MSDS0_SI2337 +MSDS0_SX177 +MSDS0_SX267 +MSDS0_SX357 +MSDS0_SX447 +MSDS0_SX87 +MSEM1_SI1440 +MSEM1_SI2070 +MSEM1_SI810 +MSEM1_SX180 +MSEM1_SX270 +MSEM1_SX360 +MSEM1_SX450 +MSEM1_SX90 +MSES0_SI1589 +MSES0_SI2216 +MSES0_SI2219 +MSES0_SX149 +MSES0_SX239 +MSES0_SX329 +MSES0_SX419 +MSES0_SX59 +MSFH0_SI1216 +MSFH0_SI1738 +MSFH0_SI586 +MSFH0_SX136 +MSFH0_SX226 +MSFH0_SX316 +MSFH0_SX406 +MSFH0_SX46 +MSFV0_SI1262 +MSFV0_SI1892 +MSFV0_SI632 +MSFV0_SX182 +MSFV0_SX272 +MSFV0_SX362 +MSFV0_SX452 +MSFV0_SX92 +MSJK0_SI1596 +MSJK0_SI2226 +MSJK0_SI966 +MSJK0_SX156 +MSJK0_SX246 +MSJK0_SX336 +MSJK0_SX426 +MSJK0_SX66 +MSMC0_SI1907 +MSMC0_SI509 +MSMC0_SI647 +MSMC0_SX107 +MSMC0_SX17 +MSMC0_SX197 +MSMC0_SX287 +MSMC0_SX377 +MSMR0_SI1150 +MSMR0_SI1405 +MSMR0_SI775 +MSMR0_SX145 +MSMR0_SX235 +MSMR0_SX325 +MSMR0_SX415 +MSMR0_SX55 +MSMS0_SI1433 +MSMS0_SI2063 +MSMS0_SI803 +MSMS0_SX173 +MSMS0_SX263 +MSMS0_SX353 +MSMS0_SX443 +MSMS0_SX83 +MSRG0_SI1221 +MSRG0_SI1851 +MSRG0_SI591 +MSRG0_SX141 +MSRG0_SX231 
+MSRG0_SX321 +MSRG0_SX411 +MSRG0_SX51 +MSRR0_SI1131 +MSRR0_SI1761 +MSRR0_SI501 +MSRR0_SX141 +MSRR0_SX231 +MSRR0_SX30 +MSRR0_SX411 +MSRR0_SX51 +MSTF0_SI1396 +MSTF0_SI766 +MSTF0_SI852 +MSTF0_SX136 +MSTF0_SX226 +MSTF0_SX316 +MSTF0_SX406 +MSTF0_SX46 +MSVS0_SI1568 +MSVS0_SI2198 +MSVS0_SI938 +MSVS0_SX128 +MSVS0_SX218 +MSVS0_SX308 +MSVS0_SX38 +MSVS0_SX398 +MTAB0_SI1572 +MTAB0_SI2202 +MTAB0_SI942 +MTAB0_SX132 +MTAB0_SX222 +MTAB0_SX312 +MTAB0_SX402 +MTAB0_SX42 +MTAS0_SI1385 +MTAS0_SI2015 +MTAS0_SI755 +MTAS0_SX125 +MTAS0_SX215 +MTAS0_SX305 +MTAS0_SX35 +MTAS0_SX395 +MTAT0_SI1110 +MTAT0_SI1740 +MTAT0_SI811 +MTAT0_SX120 +MTAT0_SX210 +MTAT0_SX30 +MTAT0_SX300 +MTAT0_SX390 +MTAT1_SI1409 +MTAT1_SI1627 +MTAT1_SI779 +MTAT1_SX149 +MTAT1_SX239 +MTAT1_SX329 +MTAT1_SX419 +MTAT1_SX59 +MTBC0_SI1173 +MTBC0_SI1803 +MTBC0_SI543 +MTBC0_SX183 +MTBC0_SX273 +MTBC0_SX347 +MTBC0_SX363 +MTBC0_SX93 +MTCS0_SI1972 +MTCS0_SI2265 +MTCS0_SI712 +MTCS0_SX172 +MTCS0_SX262 +MTCS0_SX352 +MTCS0_SX442 +MTCS0_SX82 +MTDB0_SI1401 +MTDB0_SI2031 +MTDB0_SI771 +MTDB0_SX141 +MTDB0_SX231 +MTDB0_SX321 +MTDB0_SX411 +MTDB0_SX51 +MTDP0_SI1274 +MTDP0_SI1521 +MTDP0_SI2151 +MTDP0_SX171 +MTDP0_SX261 +MTDP0_SX351 +MTDP0_SX441 +MTDP0_SX81 +MTER0_SI1157 +MTER0_SI1787 +MTER0_SI527 +MTER0_SX167 +MTER0_SX17 +MTER0_SX257 +MTER0_SX437 +MTER0_SX77 +MTJG0_SI1520 +MTJG0_SI2157 +MTJG0_SI890 +MTJG0_SX170 +MTJG0_SX260 +MTJG0_SX350 +MTJG0_SX440 +MTJG0_SX80 +MTJM0_SI1226 +MTJM0_SI1856 +MTJM0_SI655 +MTJM0_SX146 +MTJM0_SX236 +MTJM0_SX326 +MTJM0_SX416 +MTJM0_SX56 +MTJS0_SI1192 +MTJS0_SI1822 +MTJS0_SI562 +MTJS0_SX112 +MTJS0_SX202 +MTJS0_SX22 +MTJS0_SX292 +MTJS0_SX382 +MTJU0_SI2020 +MTJU0_SI2269 +MTJU0_SI760 +MTJU0_SX130 +MTJU0_SX220 +MTJU0_SX310 +MTJU0_SX40 +MTJU0_SX400 +MTKD0_SI1187 +MTKD0_SI1817 +MTKD0_SI630 +MTKD0_SX107 +MTKD0_SX17 +MTKD0_SX197 +MTKD0_SX287 +MTKD0_SX377 +MTKP0_SI1023 +MTKP0_SI2283 +MTKP0_SI454 +MTKP0_SX123 +MTKP0_SX213 +MTKP0_SX303 +MTKP0_SX33 +MTKP0_SX393 +MTLB0_SI1134 +MTLB0_SI1764 +MTLB0_SI504 +MTLB0_SX144 +MTLB0_SX234 +MTLB0_SX324 +MTLB0_SX414 +MTLB0_SX54 +MTLC0_SI1313 +MTLC0_SI1477 +MTLC0_SI847 +MTLC0_SX127 +MTLC0_SX217 +MTLC0_SX307 +MTLC0_SX37 +MTLC0_SX397 +MTML0_SI1065 +MTML0_SI1695 +MTML0_SI2325 +MTML0_SX165 +MTML0_SX255 +MTML0_SX345 +MTML0_SX435 +MTML0_SX75 +MTMN0_SI1064 +MTMN0_SI2324 +MTMN0_SI582 +MTMN0_SX164 +MTMN0_SX254 +MTMN0_SX344 +MTMN0_SX434 +MTMN0_SX74 +MTMT0_SI1118 +MTMT0_SI1748 +MTMT0_SI488 +MTMT0_SX128 +MTMT0_SX218 +MTMT0_SX308 +MTMT0_SX38 +MTMT0_SX398 +MTPF0_SI1235 +MTPF0_SI1865 +MTPF0_SI605 +MTPF0_SX155 +MTPF0_SX245 +MTPF0_SX335 +MTPF0_SX425 +MTPF0_SX65 +MTPG0_SI1383 +MTPG0_SI2013 +MTPG0_SI753 +MTPG0_SX123 +MTPG0_SX213 +MTPG0_SX303 +MTPG0_SX33 +MTPG0_SX393 +MTPP0_SI1508 +MTPP0_SI2138 +MTPP0_SI878 +MTPP0_SX158 +MTPP0_SX248 +MTPP0_SX338 +MTPP0_SX428 +MTPP0_SX68 +MTPR0_SI1600 +MTPR0_SI2230 +MTPR0_SI506 +MTPR0_SX160 +MTPR0_SX250 +MTPR0_SX340 +MTPR0_SX430 +MTPR0_SX70 +MTQC0_SI1441 +MTQC0_SI2071 +MTQC0_SI480 +MTQC0_SX181 +MTQC0_SX271 +MTQC0_SX361 +MTQC0_SX451 +MTQC0_SX91 +MTRC0_SI1623 +MTRC0_SI589 +MTRC0_SI993 +MTRC0_SX170 +MTRC0_SX183 +MTRC0_SX273 +MTRC0_SX363 +MTRC0_SX93 +MTRR0_SI1548 +MTRR0_SI2178 +MTRR0_SI918 +MTRR0_SX108 +MTRR0_SX18 +MTRR0_SX198 +MTRR0_SX288 +MTRR0_SX378 +MTRT0_SI1227 +MTRT0_SI1857 +MTRT0_SI597 +MTRT0_SX147 +MTRT0_SX237 +MTRT0_SX254 +MTRT0_SX417 +MTRT0_SX57 +MTWH1_SI1512 +MTWH1_SI2142 +MTWH1_SI882 +MTWH1_SX162 +MTWH1_SX252 +MTWH1_SX342 +MTWH1_SX432 +MTWH1_SX72 +MTXS0_SI1060 +MTXS0_SI1690 +MTXS0_SI2320 +MTXS0_SX160 +MTXS0_SX250 +MTXS0_SX340 +MTXS0_SX430 +MTXS0_SX70 +MVJH0_SI1556 +MVJH0_SI2186 +MVJH0_SI926 
+MVJH0_SX116 +MVJH0_SX206 +MVJH0_SX26 +MVJH0_SX296 +MVJH0_SX386 +MVLO0_SI1147 +MVLO0_SI1777 +MVLO0_SI517 +MVLO0_SX157 +MVLO0_SX247 +MVLO0_SX337 +MVLO0_SX427 +MVLO0_SX67 +MVRW0_SI1485 +MVRW0_SI2115 +MVRW0_SI855 +MVRW0_SX135 +MVRW0_SX225 +MVRW0_SX315 +MVRW0_SX405 +MVRW0_SX45 +MWAC0_SI1601 +MWAC0_SI2231 +MWAC0_SI971 +MWAC0_SX161 +MWAC0_SX251 +MWAC0_SX341 +MWAC0_SX431 +MWAC0_SX71 +MWAD0_SI1062 +MWAD0_SI1749 +MWAD0_SI2322 +MWAD0_SX162 +MWAD0_SX252 +MWAD0_SX342 +MWAD0_SX432 +MWAD0_SX72 +MWAR0_SI1045 +MWAR0_SI1675 +MWAR0_SI2305 +MWAR0_SX145 +MWAR0_SX235 +MWAR0_SX325 +MWAR0_SX415 +MWAR0_SX55 +MWCH0_SI1622 +MWCH0_SI1895 +MWCH0_SI2252 +MWCH0_SX182 +MWCH0_SX272 +MWCH0_SX362 +MWCH0_SX452 +MWCH0_SX92 +MWDK0_SI1436 +MWDK0_SI2017 +MWDK0_SI806 +MWDK0_SX176 +MWDK0_SX266 +MWDK0_SX356 +MWDK0_SX446 +MWDK0_SX86 +MWEM0_SI1320 +MWEM0_SI1393 +MWEM0_SI1950 +MWEM0_SX150 +MWEM0_SX240 +MWEM0_SX330 +MWEM0_SX420 +MWEM0_SX60 +MWGR0_SI1606 +MWGR0_SI2236 +MWGR0_SI976 +MWGR0_SX166 +MWGR0_SX256 +MWGR0_SX346 +MWGR0_SX436 +MWGR0_SX76 +MWRE0_SI1057 +MWRE0_SI1687 +MWRE0_SI2317 +MWRE0_SX157 +MWRE0_SX247 +MWRE0_SX337 +MWRE0_SX427 +MWRE0_SX67 +MWRP0_SI1443 +MWRP0_SI1525 +MWRP0_SI2073 +MWRP0_SX183 +MWRP0_SX273 +MWRP0_SX3 +MWRP0_SX363 +MWRP0_SX93 +MWSB0_SI1626 +MWSB0_SI2256 +MWSB0_SI996 +MWSB0_SX186 +MWSB0_SX276 +MWSB0_SX366 +MWSB0_SX6 +MWSB0_SX96 +MWSH0_SI1426 +MWSH0_SI2266 +MWSH0_SI796 +MWSH0_SX166 +MWSH0_SX256 +MWSH0_SX346 +MWSH0_SX436 +MWSH0_SX76 +MZMB0_SI1166 +MZMB0_SI1796 +MZMB0_SI536 +MZMB0_SX176 +MZMB0_SX266 +MZMB0_SX356 +MZMB0_SX446 +MZMB0_SX86 diff --git a/SpeechT5/fairseq/examples/wav2vec/unsupervised/config/timit_matched/valid.uid b/SpeechT5/fairseq/examples/wav2vec/unsupervised/config/timit_matched/valid.uid new file mode 100644 index 0000000000000000000000000000000000000000..ab5ef381ab9319aa9aefa9054e0a5128aec6f5e1 --- /dev/null +++ b/SpeechT5/fairseq/examples/wav2vec/unsupervised/config/timit_matched/valid.uid @@ -0,0 +1,400 @@ +FADG0_SI1279 +FADG0_SI1909 +FADG0_SI649 +FADG0_SX109 +FADG0_SX19 +FADG0_SX199 +FADG0_SX289 +FADG0_SX379 +FAKS0_SI1573 +FAKS0_SI2203 +FAKS0_SI943 +FAKS0_SX133 +FAKS0_SX223 +FAKS0_SX313 +FAKS0_SX403 +FAKS0_SX43 +FCAL1_SI1403 +FCAL1_SI2033 +FCAL1_SI773 +FCAL1_SX143 +FCAL1_SX233 +FCAL1_SX323 +FCAL1_SX413 +FCAL1_SX53 +FCMH0_SI1454 +FCMH0_SI2084 +FCMH0_SI824 +FCMH0_SX104 +FCMH0_SX14 +FCMH0_SX194 +FCMH0_SX284 +FCMH0_SX374 +FDAC1_SI1474 +FDAC1_SI2104 +FDAC1_SI844 +FDAC1_SX124 +FDAC1_SX214 +FDAC1_SX304 +FDAC1_SX34 +FDAC1_SX394 +FDMS0_SI1218 +FDMS0_SI1502 +FDMS0_SI1848 +FDMS0_SX138 +FDMS0_SX228 +FDMS0_SX318 +FDMS0_SX408 +FDMS0_SX48 +FDRW0_SI1283 +FDRW0_SI1423 +FDRW0_SI653 +FDRW0_SX113 +FDRW0_SX203 +FDRW0_SX23 +FDRW0_SX293 +FDRW0_SX383 +FEDW0_SI1084 +FEDW0_SI1653 +FEDW0_SI1714 +FEDW0_SX184 +FEDW0_SX274 +FEDW0_SX364 +FEDW0_SX4 +FEDW0_SX94 +FGJD0_SI1179 +FGJD0_SI549 +FGJD0_SI818 +FGJD0_SX189 +FGJD0_SX279 +FGJD0_SX369 +FGJD0_SX9 +FGJD0_SX99 +FJEM0_SI1264 +FJEM0_SI1894 +FJEM0_SI634 +FJEM0_SX184 +FJEM0_SX274 +FJEM0_SX364 +FJEM0_SX4 +FJEM0_SX94 +FJMG0_SI1181 +FJMG0_SI1811 +FJMG0_SI551 +FJMG0_SX101 +FJMG0_SX11 +FJMG0_SX191 +FJMG0_SX281 +FJMG0_SX371 +FJSJ0_SI1484 +FJSJ0_SI2114 +FJSJ0_SI854 +FJSJ0_SX134 +FJSJ0_SX224 +FJSJ0_SX314 +FJSJ0_SX404 +FJSJ0_SX44 +FKMS0_SI1490 +FKMS0_SI2120 +FKMS0_SI860 +FKMS0_SX140 +FKMS0_SX230 +FKMS0_SX320 +FKMS0_SX410 +FKMS0_SX50 +FMAH0_SI1289 +FMAH0_SI1919 +FMAH0_SI659 +FMAH0_SX119 +FMAH0_SX209 +FMAH0_SX29 +FMAH0_SX299 +FMAH0_SX389 +FMML0_SI1040 +FMML0_SI1670 +FMML0_SI2300 +FMML0_SX140 +FMML0_SX230 +FMML0_SX320 +FMML0_SX410 +FMML0_SX50 +FNMR0_SI1399 +FNMR0_SI2029 +FNMR0_SI769 
+FNMR0_SX139 +FNMR0_SX229 +FNMR0_SX319 +FNMR0_SX409 +FNMR0_SX49 +FREW0_SI1030 +FREW0_SI1280 +FREW0_SI1910 +FREW0_SX110 +FREW0_SX20 +FREW0_SX200 +FREW0_SX290 +FREW0_SX380 +FSEM0_SI1198 +FSEM0_SI1828 +FSEM0_SI568 +FSEM0_SX118 +FSEM0_SX208 +FSEM0_SX28 +FSEM0_SX298 +FSEM0_SX388 +MAJC0_SI1946 +MAJC0_SI2095 +MAJC0_SI835 +MAJC0_SX115 +MAJC0_SX205 +MAJC0_SX25 +MAJC0_SX295 +MAJC0_SX385 +MBDG0_SI1463 +MBDG0_SI2093 +MBDG0_SI833 +MBDG0_SX113 +MBDG0_SX203 +MBDG0_SX23 +MBDG0_SX293 +MBDG0_SX383 +MBNS0_SI1220 +MBNS0_SI1850 +MBNS0_SI590 +MBNS0_SX140 +MBNS0_SX230 +MBNS0_SX320 +MBNS0_SX410 +MBNS0_SX50 +MBWM0_SI1304 +MBWM0_SI1934 +MBWM0_SI674 +MBWM0_SX134 +MBWM0_SX224 +MBWM0_SX314 +MBWM0_SX404 +MBWM0_SX44 +MCSH0_SI1549 +MCSH0_SI2179 +MCSH0_SI919 +MCSH0_SX109 +MCSH0_SX19 +MCSH0_SX199 +MCSH0_SX289 +MCSH0_SX379 +MDLF0_SI1583 +MDLF0_SI2213 +MDLF0_SI953 +MDLF0_SX143 +MDLF0_SX233 +MDLF0_SX323 +MDLF0_SX413 +MDLF0_SX53 +MDLS0_SI1628 +MDLS0_SI2258 +MDLS0_SI998 +MDLS0_SX188 +MDLS0_SX278 +MDLS0_SX368 +MDLS0_SX8 +MDLS0_SX98 +MDVC0_SI2174 +MDVC0_SI2196 +MDVC0_SI936 +MDVC0_SX126 +MDVC0_SX216 +MDVC0_SX306 +MDVC0_SX36 +MDVC0_SX396 +MERS0_SI1019 +MERS0_SI1649 +MERS0_SI497 +MERS0_SX119 +MERS0_SX209 +MERS0_SX29 +MERS0_SX299 +MERS0_SX389 +MGJF0_SI1901 +MGJF0_SI641 +MGJF0_SI776 +MGJF0_SX101 +MGJF0_SX11 +MGJF0_SX191 +MGJF0_SX281 +MGJF0_SX371 +MGLB0_SI1534 +MGLB0_SI2164 +MGLB0_SI904 +MGLB0_SX184 +MGLB0_SX274 +MGLB0_SX364 +MGLB0_SX4 +MGLB0_SX94 +MGWT0_SI1539 +MGWT0_SI2169 +MGWT0_SI909 +MGWT0_SX189 +MGWT0_SX279 +MGWT0_SX369 +MGWT0_SX9 +MGWT0_SX99 +MJAR0_SI1988 +MJAR0_SI2247 +MJAR0_SI728 +MJAR0_SX188 +MJAR0_SX278 +MJAR0_SX368 +MJAR0_SX8 +MJAR0_SX98 +MJFC0_SI1033 +MJFC0_SI1663 +MJFC0_SI2293 +MJFC0_SX133 +MJFC0_SX223 +MJFC0_SX313 +MJFC0_SX403 +MJFC0_SX43 +MJSW0_SI1010 +MJSW0_SI1640 +MJSW0_SI2270 +MJSW0_SX110 +MJSW0_SX20 +MJSW0_SX200 +MJSW0_SX290 +MJSW0_SX380 +MMDB1_SI1625 +MMDB1_SI2255 +MMDB1_SI995 +MMDB1_SX185 +MMDB1_SX275 +MMDB1_SX365 +MMDB1_SX5 +MMDB1_SX95 +MMDM2_SI1452 +MMDM2_SI1555 +MMDM2_SI2082 +MMDM2_SX102 +MMDM2_SX12 +MMDM2_SX192 +MMDM2_SX282 +MMDM2_SX372 +MMJR0_SI1648 +MMJR0_SI2166 +MMJR0_SI2278 +MMJR0_SX118 +MMJR0_SX208 +MMJR0_SX28 +MMJR0_SX298 +MMJR0_SX388 +MMWH0_SI1089 +MMWH0_SI1301 +MMWH0_SI459 +MMWH0_SX189 +MMWH0_SX279 +MMWH0_SX369 +MMWH0_SX9 +MMWH0_SX99 +MPDF0_SI1542 +MPDF0_SI2172 +MPDF0_SI912 +MPDF0_SX102 +MPDF0_SX12 +MPDF0_SX192 +MPDF0_SX282 +MPDF0_SX372 +MRCS0_SI1223 +MRCS0_SI1853 +MRCS0_SI593 +MRCS0_SX143 +MRCS0_SX233 +MRCS0_SX323 +MRCS0_SX413 +MRCS0_SX53 +MREB0_SI1375 +MREB0_SI2005 +MREB0_SI745 +MREB0_SX115 +MREB0_SX205 +MREB0_SX25 +MREB0_SX295 +MREB0_SX385 +MRJM4_SI1489 +MRJM4_SI2119 +MRJM4_SI859 +MRJM4_SX139 +MRJM4_SX229 +MRJM4_SX319 +MRJM4_SX409 +MRJM4_SX49 +MRJR0_SI1182 +MRJR0_SI1812 +MRJR0_SI2313 +MRJR0_SX102 +MRJR0_SX12 +MRJR0_SX192 +MRJR0_SX282 +MRJR0_SX372 +MROA0_SI1307 +MROA0_SI1970 +MROA0_SI677 +MROA0_SX137 +MROA0_SX227 +MROA0_SX317 +MROA0_SX407 +MROA0_SX47 +MRTK0_SI1093 +MRTK0_SI1723 +MRTK0_SI1750 +MRTK0_SX103 +MRTK0_SX13 +MRTK0_SX193 +MRTK0_SX283 +MRTK0_SX373 +MRWS1_SI1130 +MRWS1_SI1496 +MRWS1_SI500 +MRWS1_SX140 +MRWS1_SX230 +MRWS1_SX320 +MRWS1_SX410 +MRWS1_SX50 +MTAA0_SI1285 +MTAA0_SI1915 +MTAA0_SI596 +MTAA0_SX115 +MTAA0_SX205 +MTAA0_SX25 +MTAA0_SX295 +MTAA0_SX385 +MTDT0_SI1994 +MTDT0_SI2254 +MTDT0_SI994 +MTDT0_SX184 +MTDT0_SX274 +MTDT0_SX364 +MTDT0_SX4 +MTDT0_SX94 +MTEB0_SI1133 +MTEB0_SI2064 +MTEB0_SI503 +MTEB0_SX143 +MTEB0_SX233 +MTEB0_SX323 +MTEB0_SX413 +MTEB0_SX53 +MTHC0_SI1015 +MTHC0_SI1645 +MTHC0_SI2275 +MTHC0_SX115 +MTHC0_SX205 +MTHC0_SX25 +MTHC0_SX295 +MTHC0_SX385 +MWJG0_SI1124 +MWJG0_SI1754 
+MWJG0_SI494 +MWJG0_SX134 +MWJG0_SX224 +MWJG0_SX314 +MWJG0_SX404 +MWJG0_SX44 diff --git a/SpeechT5/fairseq/examples/wav2vec/unsupervised/config/timit_unmatched/test.uid b/SpeechT5/fairseq/examples/wav2vec/unsupervised/config/timit_unmatched/test.uid new file mode 100644 index 0000000000000000000000000000000000000000..e3967e42423d4d82d159cb395514d41c13316da2 --- /dev/null +++ b/SpeechT5/fairseq/examples/wav2vec/unsupervised/config/timit_unmatched/test.uid @@ -0,0 +1,1680 @@ +FADG0_SA1 +FADG0_SA2 +FADG0_SI1279 +FADG0_SI1909 +FADG0_SI649 +FADG0_SX109 +FADG0_SX19 +FADG0_SX199 +FADG0_SX289 +FADG0_SX379 +FAKS0_SA1 +FAKS0_SA2 +FAKS0_SI1573 +FAKS0_SI2203 +FAKS0_SI943 +FAKS0_SX133 +FAKS0_SX223 +FAKS0_SX313 +FAKS0_SX403 +FAKS0_SX43 +FASW0_SA1 +FASW0_SA2 +FASW0_SI1550 +FASW0_SI2180 +FASW0_SI920 +FASW0_SX110 +FASW0_SX20 +FASW0_SX200 +FASW0_SX290 +FASW0_SX380 +FAWF0_SA1 +FAWF0_SA2 +FAWF0_SI1000 +FAWF0_SI1630 +FAWF0_SI2260 +FAWF0_SX10 +FAWF0_SX100 +FAWF0_SX190 +FAWF0_SX280 +FAWF0_SX370 +FCAL1_SA1 +FCAL1_SA2 +FCAL1_SI1403 +FCAL1_SI2033 +FCAL1_SI773 +FCAL1_SX143 +FCAL1_SX233 +FCAL1_SX323 +FCAL1_SX413 +FCAL1_SX53 +FCAU0_SA1 +FCAU0_SA2 +FCAU0_SI1037 +FCAU0_SI1667 +FCAU0_SI2297 +FCAU0_SX137 +FCAU0_SX227 +FCAU0_SX317 +FCAU0_SX407 +FCAU0_SX47 +FCFT0_SA1 +FCFT0_SA2 +FCFT0_SI1178 +FCFT0_SI1808 +FCFT0_SI548 +FCFT0_SX188 +FCFT0_SX278 +FCFT0_SX368 +FCFT0_SX8 +FCFT0_SX98 +FCMH0_SA1 +FCMH0_SA2 +FCMH0_SI1454 +FCMH0_SI2084 +FCMH0_SI824 +FCMH0_SX104 +FCMH0_SX14 +FCMH0_SX194 +FCMH0_SX284 +FCMH0_SX374 +FCMH1_SA1 +FCMH1_SA2 +FCMH1_SI1493 +FCMH1_SI2123 +FCMH1_SI863 +FCMH1_SX143 +FCMH1_SX233 +FCMH1_SX323 +FCMH1_SX413 +FCMH1_SX53 +FCMR0_SA1 +FCMR0_SA2 +FCMR0_SI1105 +FCMR0_SI1735 +FCMR0_SI475 +FCMR0_SX115 +FCMR0_SX205 +FCMR0_SX25 +FCMR0_SX295 +FCMR0_SX385 +FCRH0_SA1 +FCRH0_SA2 +FCRH0_SI1088 +FCRH0_SI1718 +FCRH0_SI458 +FCRH0_SX188 +FCRH0_SX278 +FCRH0_SX368 +FCRH0_SX8 +FCRH0_SX98 +FDAC1_SA1 +FDAC1_SA2 +FDAC1_SI1474 +FDAC1_SI2104 +FDAC1_SI844 +FDAC1_SX124 +FDAC1_SX214 +FDAC1_SX304 +FDAC1_SX34 +FDAC1_SX394 +FDHC0_SA1 +FDHC0_SA2 +FDHC0_SI1559 +FDHC0_SI2189 +FDHC0_SI929 +FDHC0_SX119 +FDHC0_SX209 +FDHC0_SX29 +FDHC0_SX299 +FDHC0_SX389 +FDMS0_SA1 +FDMS0_SA2 +FDMS0_SI1218 +FDMS0_SI1502 +FDMS0_SI1848 +FDMS0_SX138 +FDMS0_SX228 +FDMS0_SX318 +FDMS0_SX408 +FDMS0_SX48 +FDRD1_SA1 +FDRD1_SA2 +FDRD1_SI1544 +FDRD1_SI1566 +FDRD1_SI2149 +FDRD1_SX104 +FDRD1_SX14 +FDRD1_SX194 +FDRD1_SX284 +FDRD1_SX374 +FDRW0_SA1 +FDRW0_SA2 +FDRW0_SI1283 +FDRW0_SI1423 +FDRW0_SI653 +FDRW0_SX113 +FDRW0_SX203 +FDRW0_SX23 +FDRW0_SX293 +FDRW0_SX383 +FEDW0_SA1 +FEDW0_SA2 +FEDW0_SI1084 +FEDW0_SI1653 +FEDW0_SI1714 +FEDW0_SX184 +FEDW0_SX274 +FEDW0_SX364 +FEDW0_SX4 +FEDW0_SX94 +FELC0_SA1 +FELC0_SA2 +FELC0_SI1386 +FELC0_SI2016 +FELC0_SI756 +FELC0_SX126 +FELC0_SX216 +FELC0_SX306 +FELC0_SX36 +FELC0_SX396 +FGJD0_SA1 +FGJD0_SA2 +FGJD0_SI1179 +FGJD0_SI549 +FGJD0_SI818 +FGJD0_SX189 +FGJD0_SX279 +FGJD0_SX369 +FGJD0_SX9 +FGJD0_SX99 +FGMD0_SA1 +FGMD0_SA2 +FGMD0_SI1943 +FGMD0_SI2107 +FGMD0_SI683 +FGMD0_SX143 +FGMD0_SX233 +FGMD0_SX323 +FGMD0_SX413 +FGMD0_SX53 +FGWR0_SA1 +FGWR0_SA2 +FGWR0_SI1578 +FGWR0_SI2208 +FGWR0_SI948 +FGWR0_SX138 +FGWR0_SX228 +FGWR0_SX318 +FGWR0_SX408 +FGWR0_SX48 +FHES0_SA1 +FHES0_SA2 +FHES0_SI1109 +FHES0_SI1739 +FHES0_SI479 +FHES0_SX119 +FHES0_SX209 +FHES0_SX29 +FHES0_SX299 +FHES0_SX389 +FHEW0_SA1 +FHEW0_SA2 +FHEW0_SI2023 +FHEW0_SI690 +FHEW0_SI763 +FHEW0_SX133 +FHEW0_SX223 +FHEW0_SX313 +FHEW0_SX403 +FHEW0_SX43 +FISB0_SA1 +FISB0_SA2 +FISB0_SI1579 +FISB0_SI2209 +FISB0_SI949 +FISB0_SX139 +FISB0_SX229 +FISB0_SX319 +FISB0_SX409 +FISB0_SX49 +FJAS0_SA1 +FJAS0_SA2 
+FJAS0_SI1400 +FJAS0_SI2030 +FJAS0_SI770 +FJAS0_SX140 +FJAS0_SX230 +FJAS0_SX320 +FJAS0_SX410 +FJAS0_SX50 +FJCS0_SA1 +FJCS0_SA2 +FJCS0_SI1309 +FJCS0_SI1833 +FJCS0_SI1939 +FJCS0_SX139 +FJCS0_SX229 +FJCS0_SX319 +FJCS0_SX409 +FJCS0_SX49 +FJEM0_SA1 +FJEM0_SA2 +FJEM0_SI1264 +FJEM0_SI1894 +FJEM0_SI634 +FJEM0_SX184 +FJEM0_SX274 +FJEM0_SX364 +FJEM0_SX4 +FJEM0_SX94 +FJLM0_SA1 +FJLM0_SA2 +FJLM0_SI1043 +FJLM0_SI1673 +FJLM0_SI2303 +FJLM0_SX143 +FJLM0_SX233 +FJLM0_SX323 +FJLM0_SX413 +FJLM0_SX53 +FJMG0_SA1 +FJMG0_SA2 +FJMG0_SI1181 +FJMG0_SI1811 +FJMG0_SI551 +FJMG0_SX101 +FJMG0_SX11 +FJMG0_SX191 +FJMG0_SX281 +FJMG0_SX371 +FJRE0_SA1 +FJRE0_SA2 +FJRE0_SI1116 +FJRE0_SI1587 +FJRE0_SI1746 +FJRE0_SX126 +FJRE0_SX216 +FJRE0_SX306 +FJRE0_SX36 +FJRE0_SX396 +FJSA0_SA1 +FJSA0_SA2 +FJSA0_SI1379 +FJSA0_SI2009 +FJSA0_SI749 +FJSA0_SX119 +FJSA0_SX209 +FJSA0_SX29 +FJSA0_SX299 +FJSA0_SX389 +FJSJ0_SA1 +FJSJ0_SA2 +FJSJ0_SI1484 +FJSJ0_SI2114 +FJSJ0_SI854 +FJSJ0_SX134 +FJSJ0_SX224 +FJSJ0_SX314 +FJSJ0_SX404 +FJSJ0_SX44 +FJWB0_SA1 +FJWB0_SA2 +FJWB0_SI1265 +FJWB0_SI635 +FJWB0_SI992 +FJWB0_SX185 +FJWB0_SX275 +FJWB0_SX365 +FJWB0_SX5 +FJWB0_SX95 +FKMS0_SA1 +FKMS0_SA2 +FKMS0_SI1490 +FKMS0_SI2120 +FKMS0_SI860 +FKMS0_SX140 +FKMS0_SX230 +FKMS0_SX320 +FKMS0_SX410 +FKMS0_SX50 +FLAS0_SA1 +FLAS0_SA2 +FLAS0_SI1026 +FLAS0_SI1488 +FLAS0_SI858 +FLAS0_SX138 +FLAS0_SX228 +FLAS0_SX318 +FLAS0_SX408 +FLAS0_SX48 +FLBW0_SA1 +FLBW0_SA2 +FLBW0_SI1219 +FLBW0_SI1849 +FLBW0_SI2253 +FLBW0_SX139 +FLBW0_SX229 +FLBW0_SX319 +FLBW0_SX409 +FLBW0_SX49 +FLKD0_SA1 +FLKD0_SA2 +FLKD0_SI1369 +FLKD0_SI739 +FLKD0_SI894 +FLKD0_SX109 +FLKD0_SX19 +FLKD0_SX199 +FLKD0_SX289 +FLKD0_SX379 +FLNH0_SA1 +FLNH0_SA2 +FLNH0_SI1214 +FLNH0_SI584 +FLNH0_SI941 +FLNH0_SX134 +FLNH0_SX224 +FLNH0_SX314 +FLNH0_SX404 +FLNH0_SX44 +FMAF0_SA1 +FMAF0_SA2 +FMAF0_SI1459 +FMAF0_SI2089 +FMAF0_SI829 +FMAF0_SX109 +FMAF0_SX19 +FMAF0_SX199 +FMAF0_SX289 +FMAF0_SX379 +FMAH0_SA1 +FMAH0_SA2 +FMAH0_SI1289 +FMAH0_SI1919 +FMAH0_SI659 +FMAH0_SX119 +FMAH0_SX209 +FMAH0_SX29 +FMAH0_SX299 +FMAH0_SX389 +FMCM0_SA1 +FMCM0_SA2 +FMCM0_SI1180 +FMCM0_SI1810 +FMCM0_SI550 +FMCM0_SX10 +FMCM0_SX100 +FMCM0_SX190 +FMCM0_SX280 +FMCM0_SX370 +FMGD0_SA1 +FMGD0_SA2 +FMGD0_SI1564 +FMGD0_SI2194 +FMGD0_SI934 +FMGD0_SX124 +FMGD0_SX214 +FMGD0_SX304 +FMGD0_SX34 +FMGD0_SX394 +FMLD0_SA1 +FMLD0_SA2 +FMLD0_SI2185 +FMLD0_SI822 +FMLD0_SI925 +FMLD0_SX115 +FMLD0_SX205 +FMLD0_SX25 +FMLD0_SX295 +FMLD0_SX385 +FMML0_SA1 +FMML0_SA2 +FMML0_SI1040 +FMML0_SI1670 +FMML0_SI2300 +FMML0_SX140 +FMML0_SX230 +FMML0_SX320 +FMML0_SX410 +FMML0_SX50 +FNLP0_SA1 +FNLP0_SA2 +FNLP0_SI1308 +FNLP0_SI1938 +FNLP0_SI678 +FNLP0_SX138 +FNLP0_SX228 +FNLP0_SX318 +FNLP0_SX408 +FNLP0_SX48 +FNMR0_SA1 +FNMR0_SA2 +FNMR0_SI1399 +FNMR0_SI2029 +FNMR0_SI769 +FNMR0_SX139 +FNMR0_SX229 +FNMR0_SX319 +FNMR0_SX409 +FNMR0_SX49 +FPAS0_SA1 +FPAS0_SA2 +FPAS0_SI1272 +FPAS0_SI2204 +FPAS0_SI944 +FPAS0_SX134 +FPAS0_SX224 +FPAS0_SX314 +FPAS0_SX404 +FPAS0_SX44 +FPKT0_SA1 +FPKT0_SA2 +FPKT0_SI1538 +FPKT0_SI2168 +FPKT0_SI908 +FPKT0_SX188 +FPKT0_SX278 +FPKT0_SX368 +FPKT0_SX8 +FPKT0_SX98 +FRAM1_SA1 +FRAM1_SA2 +FRAM1_SI1360 +FRAM1_SI522 +FRAM1_SI730 +FRAM1_SX10 +FRAM1_SX100 +FRAM1_SX190 +FRAM1_SX280 +FRAM1_SX370 +FREW0_SA1 +FREW0_SA2 +FREW0_SI1030 +FREW0_SI1280 +FREW0_SI1910 +FREW0_SX110 +FREW0_SX20 +FREW0_SX200 +FREW0_SX290 +FREW0_SX380 +FRNG0_SA1 +FRNG0_SA2 +FRNG0_SI1355 +FRNG0_SI1985 +FRNG0_SI725 +FRNG0_SX185 +FRNG0_SX275 +FRNG0_SX365 +FRNG0_SX5 +FRNG0_SX95 +FSEM0_SA1 +FSEM0_SA2 +FSEM0_SI1198 +FSEM0_SI1828 +FSEM0_SI568 +FSEM0_SX118 +FSEM0_SX208 +FSEM0_SX28 +FSEM0_SX298 +FSEM0_SX388 +FSLB1_SA1 +FSLB1_SA2 
+FSLB1_SI1904 +FSLB1_SI644 +FSLB1_SI891 +FSLB1_SX104 +FSLB1_SX14 +FSLB1_SX194 +FSLB1_SX284 +FSLB1_SX374 +FSXA0_SA1 +FSXA0_SA2 +FSXA0_SI1108 +FSXA0_SI1846 +FSXA0_SI478 +FSXA0_SX118 +FSXA0_SX208 +FSXA0_SX28 +FSXA0_SX298 +FSXA0_SX388 +FTLH0_SA1 +FTLH0_SA2 +FTLH0_SI1009 +FTLH0_SI1390 +FTLH0_SI1639 +FTLH0_SX109 +FTLH0_SX19 +FTLH0_SX199 +FTLH0_SX289 +FTLH0_SX379 +FUTB0_SA1 +FUTB0_SA2 +FUTB0_SI1204 +FUTB0_SI1330 +FUTB0_SI1834 +FUTB0_SX124 +FUTB0_SX214 +FUTB0_SX304 +FUTB0_SX34 +FUTB0_SX394 +MABW0_SA1 +MABW0_SA2 +MABW0_SI1230 +MABW0_SI1664 +MABW0_SI2294 +MABW0_SX134 +MABW0_SX224 +MABW0_SX314 +MABW0_SX404 +MABW0_SX44 +MAHH0_SA1 +MAHH0_SA2 +MAHH0_SI1294 +MAHH0_SI1924 +MAHH0_SI664 +MAHH0_SX124 +MAHH0_SX214 +MAHH0_SX304 +MAHH0_SX34 +MAHH0_SX394 +MAJC0_SA1 +MAJC0_SA2 +MAJC0_SI1946 +MAJC0_SI2095 +MAJC0_SI835 +MAJC0_SX115 +MAJC0_SX205 +MAJC0_SX25 +MAJC0_SX295 +MAJC0_SX385 +MBDG0_SA1 +MBDG0_SA2 +MBDG0_SI1463 +MBDG0_SI2093 +MBDG0_SI833 +MBDG0_SX113 +MBDG0_SX203 +MBDG0_SX23 +MBDG0_SX293 +MBDG0_SX383 +MBJK0_SA1 +MBJK0_SA2 +MBJK0_SI1175 +MBJK0_SI2128 +MBJK0_SI545 +MBJK0_SX185 +MBJK0_SX275 +MBJK0_SX365 +MBJK0_SX5 +MBJK0_SX95 +MBNS0_SA1 +MBNS0_SA2 +MBNS0_SI1220 +MBNS0_SI1850 +MBNS0_SI590 +MBNS0_SX140 +MBNS0_SX230 +MBNS0_SX320 +MBNS0_SX410 +MBNS0_SX50 +MBPM0_SA1 +MBPM0_SA2 +MBPM0_SI1577 +MBPM0_SI1584 +MBPM0_SI947 +MBPM0_SX137 +MBPM0_SX227 +MBPM0_SX317 +MBPM0_SX407 +MBPM0_SX47 +MBWM0_SA1 +MBWM0_SA2 +MBWM0_SI1304 +MBWM0_SI1934 +MBWM0_SI674 +MBWM0_SX134 +MBWM0_SX224 +MBWM0_SX314 +MBWM0_SX404 +MBWM0_SX44 +MCCS0_SA1 +MCCS0_SA2 +MCCS0_SI1469 +MCCS0_SI2099 +MCCS0_SI839 +MCCS0_SX119 +MCCS0_SX209 +MCCS0_SX29 +MCCS0_SX299 +MCCS0_SX389 +MCEM0_SA1 +MCEM0_SA2 +MCEM0_SI1398 +MCEM0_SI2028 +MCEM0_SI768 +MCEM0_SX138 +MCEM0_SX228 +MCEM0_SX318 +MCEM0_SX408 +MCEM0_SX48 +MCHH0_SA1 +MCHH0_SA2 +MCHH0_SI1004 +MCHH0_SI1634 +MCHH0_SI530 +MCHH0_SX104 +MCHH0_SX14 +MCHH0_SX194 +MCHH0_SX284 +MCHH0_SX374 +MCMB0_SA1 +MCMB0_SA2 +MCMB0_SI1268 +MCMB0_SI1898 +MCMB0_SI638 +MCMB0_SX188 +MCMB0_SX278 +MCMB0_SX368 +MCMB0_SX8 +MCMB0_SX98 +MCMJ0_SA1 +MCMJ0_SA2 +MCMJ0_SI1094 +MCMJ0_SI464 +MCMJ0_SI602 +MCMJ0_SX104 +MCMJ0_SX14 +MCMJ0_SX194 +MCMJ0_SX284 +MCMJ0_SX374 +MCRC0_SA1 +MCRC0_SA2 +MCRC0_SI1092 +MCRC0_SI1722 +MCRC0_SI462 +MCRC0_SX102 +MCRC0_SX12 +MCRC0_SX192 +MCRC0_SX282 +MCRC0_SX372 +MCSH0_SA1 +MCSH0_SA2 +MCSH0_SI1549 +MCSH0_SI2179 +MCSH0_SI919 +MCSH0_SX109 +MCSH0_SX19 +MCSH0_SX199 +MCSH0_SX289 +MCSH0_SX379 +MCTT0_SA1 +MCTT0_SA2 +MCTT0_SI1144 +MCTT0_SI2188 +MCTT0_SI928 +MCTT0_SX118 +MCTT0_SX208 +MCTT0_SX28 +MCTT0_SX298 +MCTT0_SX388 +MCTW0_SA1 +MCTW0_SA2 +MCTW0_SI1373 +MCTW0_SI2003 +MCTW0_SI743 +MCTW0_SX113 +MCTW0_SX203 +MCTW0_SX23 +MCTW0_SX293 +MCTW0_SX383 +MDAB0_SA1 +MDAB0_SA2 +MDAB0_SI1039 +MDAB0_SI1669 +MDAB0_SI2299 +MDAB0_SX139 +MDAB0_SX229 +MDAB0_SX319 +MDAB0_SX409 +MDAB0_SX49 +MDAC2_SA1 +MDAC2_SA2 +MDAC2_SI2259 +MDAC2_SI560 +MDAC2_SI999 +MDAC2_SX189 +MDAC2_SX279 +MDAC2_SX369 +MDAC2_SX9 +MDAC2_SX99 +MDAW1_SA1 +MDAW1_SA2 +MDAW1_SI1453 +MDAW1_SI2083 +MDAW1_SI823 +MDAW1_SX103 +MDAW1_SX13 +MDAW1_SX193 +MDAW1_SX283 +MDAW1_SX373 +MDBB0_SA1 +MDBB0_SA2 +MDBB0_SI1195 +MDBB0_SI1825 +MDBB0_SI565 +MDBB0_SX115 +MDBB0_SX205 +MDBB0_SX25 +MDBB0_SX295 +MDBB0_SX385 +MDLD0_SA1 +MDLD0_SA2 +MDLD0_SI1543 +MDLD0_SI2173 +MDLD0_SI913 +MDLD0_SX103 +MDLD0_SX13 +MDLD0_SX193 +MDLD0_SX283 +MDLD0_SX373 +MDLF0_SA1 +MDLF0_SA2 +MDLF0_SI1583 +MDLF0_SI2213 +MDLF0_SI953 +MDLF0_SX143 +MDLF0_SX233 +MDLF0_SX323 +MDLF0_SX413 +MDLF0_SX53 +MDLS0_SA1 +MDLS0_SA2 +MDLS0_SI1628 +MDLS0_SI2258 +MDLS0_SI998 +MDLS0_SX188 +MDLS0_SX278 +MDLS0_SX368 +MDLS0_SX8 +MDLS0_SX98 +MDRB0_SA1 +MDRB0_SA2 
+MDRB0_SI1174 +MDRB0_SI2109 +MDRB0_SI544 +MDRB0_SX184 +MDRB0_SX274 +MDRB0_SX364 +MDRB0_SX4 +MDRB0_SX94 +MDRM0_SA1 +MDRM0_SA2 +MDRM0_SI1013 +MDRM0_SI1643 +MDRM0_SI2273 +MDRM0_SX113 +MDRM0_SX203 +MDRM0_SX23 +MDRM0_SX293 +MDRM0_SX383 +MDSC0_SA1 +MDSC0_SA2 +MDSC0_SI1038 +MDSC0_SI2298 +MDSC0_SI967 +MDSC0_SX138 +MDSC0_SX228 +MDSC0_SX318 +MDSC0_SX408 +MDSC0_SX48 +MDVC0_SA1 +MDVC0_SA2 +MDVC0_SI2174 +MDVC0_SI2196 +MDVC0_SI936 +MDVC0_SX126 +MDVC0_SX216 +MDVC0_SX306 +MDVC0_SX36 +MDVC0_SX396 +MDWA0_SA1 +MDWA0_SA2 +MDWA0_SI1146 +MDWA0_SI1445 +MDWA0_SI519 +MDWA0_SX185 +MDWA0_SX275 +MDWA0_SX365 +MDWA0_SX5 +MDWA0_SX95 +MDWK0_SA1 +MDWK0_SA2 +MDWK0_SI1540 +MDWK0_SI2170 +MDWK0_SI910 +MDWK0_SX10 +MDWK0_SX100 +MDWK0_SX190 +MDWK0_SX280 +MDWK0_SX370 +MERS0_SA1 +MERS0_SA2 +MERS0_SI1019 +MERS0_SI1649 +MERS0_SI497 +MERS0_SX119 +MERS0_SX209 +MERS0_SX29 +MERS0_SX299 +MERS0_SX389 +MESD0_SA1 +MESD0_SA2 +MESD0_SI1002 +MESD0_SI1632 +MESD0_SI2262 +MESD0_SX102 +MESD0_SX12 +MESD0_SX192 +MESD0_SX282 +MESD0_SX372 +MFGK0_SA1 +MFGK0_SA2 +MFGK0_SI1451 +MFGK0_SI1744 +MFGK0_SI484 +MFGK0_SX124 +MFGK0_SX214 +MFGK0_SX304 +MFGK0_SX34 +MFGK0_SX394 +MGJF0_SA1 +MGJF0_SA2 +MGJF0_SI1901 +MGJF0_SI641 +MGJF0_SI776 +MGJF0_SX101 +MGJF0_SX11 +MGJF0_SX191 +MGJF0_SX281 +MGJF0_SX371 +MGLB0_SA1 +MGLB0_SA2 +MGLB0_SI1534 +MGLB0_SI2164 +MGLB0_SI904 +MGLB0_SX184 +MGLB0_SX274 +MGLB0_SX364 +MGLB0_SX4 +MGLB0_SX94 +MGMM0_SA1 +MGMM0_SA2 +MGMM0_SI1129 +MGMM0_SI1759 +MGMM0_SI499 +MGMM0_SX139 +MGMM0_SX229 +MGMM0_SX319 +MGMM0_SX409 +MGMM0_SX49 +MGRT0_SA1 +MGRT0_SA2 +MGRT0_SI1450 +MGRT0_SI2080 +MGRT0_SI820 +MGRT0_SX10 +MGRT0_SX100 +MGRT0_SX190 +MGRT0_SX280 +MGRT0_SX370 +MGWT0_SA1 +MGWT0_SA2 +MGWT0_SI1539 +MGWT0_SI2169 +MGWT0_SI909 +MGWT0_SX189 +MGWT0_SX279 +MGWT0_SX369 +MGWT0_SX9 +MGWT0_SX99 +MHPG0_SA1 +MHPG0_SA2 +MHPG0_SI1090 +MHPG0_SI1720 +MHPG0_SI460 +MHPG0_SX10 +MHPG0_SX100 +MHPG0_SX190 +MHPG0_SX280 +MHPG0_SX370 +MJAR0_SA1 +MJAR0_SA2 +MJAR0_SI1988 +MJAR0_SI2247 +MJAR0_SI728 +MJAR0_SX188 +MJAR0_SX278 +MJAR0_SX368 +MJAR0_SX8 +MJAR0_SX98 +MJBR0_SA1 +MJBR0_SA2 +MJBR0_SI1001 +MJBR0_SI1631 +MJBR0_SI2261 +MJBR0_SX101 +MJBR0_SX11 +MJBR0_SX191 +MJBR0_SX281 +MJBR0_SX371 +MJDH0_SA1 +MJDH0_SA2 +MJDH0_SI1354 +MJDH0_SI1984 +MJDH0_SI724 +MJDH0_SX184 +MJDH0_SX274 +MJDH0_SX364 +MJDH0_SX4 +MJDH0_SX94 +MJDM1_SA1 +MJDM1_SA2 +MJDM1_SI1085 +MJDM1_SI1715 +MJDM1_SI455 +MJDM1_SX185 +MJDM1_SX275 +MJDM1_SX365 +MJDM1_SX5 +MJDM1_SX95 +MJES0_SA1 +MJES0_SA2 +MJES0_SI1384 +MJES0_SI2014 +MJES0_SI754 +MJES0_SX124 +MJES0_SX214 +MJES0_SX304 +MJES0_SX34 +MJES0_SX394 +MJFC0_SA1 +MJFC0_SA2 +MJFC0_SI1033 +MJFC0_SI1663 +MJFC0_SI2293 +MJFC0_SX133 +MJFC0_SX223 +MJFC0_SX313 +MJFC0_SX403 +MJFC0_SX43 +MJJG0_SA1 +MJJG0_SA2 +MJJG0_SI1003 +MJJG0_SI1633 +MJJG0_SI2263 +MJJG0_SX103 +MJJG0_SX13 +MJJG0_SX193 +MJJG0_SX283 +MJJG0_SX373 +MJLN0_SA1 +MJLN0_SA2 +MJLN0_SI1449 +MJLN0_SI2079 +MJLN0_SI819 +MJLN0_SX189 +MJLN0_SX279 +MJLN0_SX369 +MJLN0_SX9 +MJLN0_SX99 +MJMP0_SA1 +MJMP0_SA2 +MJMP0_SI1535 +MJMP0_SI1791 +MJMP0_SI905 +MJMP0_SX185 +MJMP0_SX275 +MJMP0_SX365 +MJMP0_SX5 +MJMP0_SX95 +MJRF0_SA1 +MJRF0_SA2 +MJRF0_SI1114 +MJRF0_SI2081 +MJRF0_SI821 +MJRF0_SX101 +MJRF0_SX11 +MJRF0_SX191 +MJRF0_SX281 +MJRF0_SX371 +MJSW0_SA1 +MJSW0_SA2 +MJSW0_SI1010 +MJSW0_SI1640 +MJSW0_SI2270 +MJSW0_SX110 +MJSW0_SX20 +MJSW0_SX200 +MJSW0_SX290 +MJSW0_SX380 +MJTC0_SA1 +MJTC0_SA2 +MJTC0_SI1460 +MJTC0_SI2090 +MJTC0_SI830 +MJTC0_SX110 +MJTC0_SX20 +MJTC0_SX200 +MJTC0_SX290 +MJTC0_SX380 +MJTH0_SA1 +MJTH0_SA2 +MJTH0_SI1296 +MJTH0_SI1926 +MJTH0_SI666 +MJTH0_SX126 +MJTH0_SX216 +MJTH0_SX306 +MJTH0_SX36 +MJTH0_SX396 +MJVW0_SA1 +MJVW0_SA2 
+MJVW0_SI1733 +MJVW0_SI1758 +MJVW0_SI473 +MJVW0_SX113 +MJVW0_SX203 +MJVW0_SX23 +MJVW0_SX293 +MJVW0_SX383 +MKCH0_SA1 +MKCH0_SA2 +MKCH0_SI1378 +MKCH0_SI1425 +MKCH0_SI2008 +MKCH0_SX118 +MKCH0_SX208 +MKCH0_SX28 +MKCH0_SX298 +MKCH0_SX388 +MKCL0_SA1 +MKCL0_SA2 +MKCL0_SI1091 +MKCL0_SI1721 +MKCL0_SI461 +MKCL0_SX101 +MKCL0_SX11 +MKCL0_SX191 +MKCL0_SX281 +MKCL0_SX371 +MKDR0_SA1 +MKDR0_SA2 +MKDR0_SI1273 +MKDR0_SI1903 +MKDR0_SI643 +MKDR0_SX103 +MKDR0_SX13 +MKDR0_SX193 +MKDR0_SX283 +MKDR0_SX373 +MKJL0_SA1 +MKJL0_SA2 +MKJL0_SI1100 +MKJL0_SI1730 +MKJL0_SI470 +MKJL0_SX110 +MKJL0_SX20 +MKJL0_SX200 +MKJL0_SX290 +MKJL0_SX380 +MKLT0_SA1 +MKLT0_SA2 +MKLT0_SI1213 +MKLT0_SI1843 +MKLT0_SI583 +MKLT0_SX133 +MKLT0_SX223 +MKLT0_SX313 +MKLT0_SX403 +MKLT0_SX43 +MLIH0_SA1 +MLIH0_SA2 +MLIH0_SI1183 +MLIH0_SI1813 +MLIH0_SI553 +MLIH0_SX103 +MLIH0_SX13 +MLIH0_SX193 +MLIH0_SX283 +MLIH0_SX373 +MLJB0_SA1 +MLJB0_SA2 +MLJB0_SI1310 +MLJB0_SI1940 +MLJB0_SI680 +MLJB0_SX140 +MLJB0_SX230 +MLJB0_SX320 +MLJB0_SX410 +MLJB0_SX50 +MLLL0_SA1 +MLLL0_SA2 +MLLL0_SI1363 +MLLL0_SI1993 +MLLL0_SI733 +MLLL0_SX103 +MLLL0_SX13 +MLLL0_SX193 +MLLL0_SX283 +MLLL0_SX373 +MLNT0_SA1 +MLNT0_SA2 +MLNT0_SI1574 +MLNT0_SI1902 +MLNT0_SI642 +MLNT0_SX102 +MLNT0_SX12 +MLNT0_SX192 +MLNT0_SX282 +MLNT0_SX372 +MMAB0_SA1 +MMAB0_SA2 +MMAB0_SI1362 +MMAB0_SI1992 +MMAB0_SI732 +MMAB0_SX102 +MMAB0_SX12 +MMAB0_SX192 +MMAB0_SX282 +MMAB0_SX372 +MMDB1_SA1 +MMDB1_SA2 +MMDB1_SI1625 +MMDB1_SI2255 +MMDB1_SI995 +MMDB1_SX185 +MMDB1_SX275 +MMDB1_SX365 +MMDB1_SX5 +MMDB1_SX95 +MMDH0_SA1 +MMDH0_SA2 +MMDH0_SI1656 +MMDH0_SI2118 +MMDH0_SI2286 +MMDH0_SX126 +MMDH0_SX216 +MMDH0_SX306 +MMDH0_SX36 +MMDH0_SX396 +MMDM2_SA1 +MMDM2_SA2 +MMDM2_SI1452 +MMDM2_SI1555 +MMDM2_SI2082 +MMDM2_SX102 +MMDM2_SX12 +MMDM2_SX192 +MMDM2_SX282 +MMDM2_SX372 +MMJR0_SA1 +MMJR0_SA2 +MMJR0_SI1648 +MMJR0_SI2166 +MMJR0_SI2278 +MMJR0_SX118 +MMJR0_SX208 +MMJR0_SX28 +MMJR0_SX298 +MMJR0_SX388 +MMWH0_SA1 +MMWH0_SA2 +MMWH0_SI1089 +MMWH0_SI1301 +MMWH0_SI459 +MMWH0_SX189 +MMWH0_SX279 +MMWH0_SX369 +MMWH0_SX9 +MMWH0_SX99 +MNJM0_SA1 +MNJM0_SA2 +MNJM0_SI1580 +MNJM0_SI2210 +MNJM0_SI950 +MNJM0_SX140 +MNJM0_SX230 +MNJM0_SX320 +MNJM0_SX410 +MNJM0_SX50 +MNLS0_SA1 +MNLS0_SA2 +MNLS0_SI1483 +MNLS0_SI1610 +MNLS0_SI853 +MNLS0_SX133 +MNLS0_SX223 +MNLS0_SX313 +MNLS0_SX403 +MNLS0_SX43 +MPAB0_SA1 +MPAB0_SA2 +MPAB0_SI1103 +MPAB0_SI1128 +MPAB0_SI498 +MPAB0_SX138 +MPAB0_SX228 +MPAB0_SX318 +MPAB0_SX408 +MPAB0_SX48 +MPAM0_SA1 +MPAM0_SA2 +MPAM0_SI1189 +MPAM0_SI1819 +MPAM0_SI1961 +MPAM0_SX109 +MPAM0_SX19 +MPAM0_SX199 +MPAM0_SX289 +MPAM0_SX379 +MPAM1_SA1 +MPAM1_SA2 +MPAM1_SI1029 +MPAM1_SI1836 +MPAM1_SI576 +MPAM1_SX126 +MPAM1_SX216 +MPAM1_SX306 +MPAM1_SX36 +MPAM1_SX396 +MPCS0_SA1 +MPCS0_SA2 +MPCS0_SI1359 +MPCS0_SI1989 +MPCS0_SI729 +MPCS0_SX189 +MPCS0_SX279 +MPCS0_SX369 +MPCS0_SX9 +MPCS0_SX99 +MPDF0_SA1 +MPDF0_SA2 +MPDF0_SI1542 +MPDF0_SI2172 +MPDF0_SI912 +MPDF0_SX102 +MPDF0_SX12 +MPDF0_SX192 +MPDF0_SX282 +MPDF0_SX372 +MPGL0_SA1 +MPGL0_SA2 +MPGL0_SI1099 +MPGL0_SI1729 +MPGL0_SI469 +MPGL0_SX109 +MPGL0_SX19 +MPGL0_SX199 +MPGL0_SX289 +MPGL0_SX379 +MPLB0_SA1 +MPLB0_SA2 +MPLB0_SI1394 +MPLB0_SI2024 +MPLB0_SI764 +MPLB0_SX134 +MPLB0_SX224 +MPLB0_SX314 +MPLB0_SX404 +MPLB0_SX44 +MPWM0_SA1 +MPWM0_SA2 +MPWM0_SI1127 +MPWM0_SI1757 +MPWM0_SI2279 +MPWM0_SX137 +MPWM0_SX227 +MPWM0_SX317 +MPWM0_SX407 +MPWM0_SX47 +MRCS0_SA1 +MRCS0_SA2 +MRCS0_SI1223 +MRCS0_SI1853 +MRCS0_SI593 +MRCS0_SX143 +MRCS0_SX233 +MRCS0_SX323 +MRCS0_SX413 +MRCS0_SX53 +MRCZ0_SA1 +MRCZ0_SA2 +MRCZ0_SI1541 +MRCZ0_SI2171 +MRCZ0_SI911 +MRCZ0_SX101 +MRCZ0_SX11 +MRCZ0_SX191 +MRCZ0_SX281 +MRCZ0_SX371 +MREB0_SA1 
+MREB0_SA2 +MREB0_SI1375 +MREB0_SI2005 +MREB0_SI745 +MREB0_SX115 +MREB0_SX205 +MREB0_SX25 +MREB0_SX295 +MREB0_SX385 +MRES0_SA1 +MRES0_SA2 +MRES0_SI1217 +MRES0_SI1847 +MRES0_SI587 +MRES0_SX137 +MRES0_SX227 +MRES0_SX317 +MRES0_SX407 +MRES0_SX47 +MRGG0_SA1 +MRGG0_SA2 +MRGG0_SI1199 +MRGG0_SI1829 +MRGG0_SI569 +MRGG0_SX119 +MRGG0_SX209 +MRGG0_SX29 +MRGG0_SX299 +MRGG0_SX389 +MRJM3_SA1 +MRJM3_SA2 +MRJM3_SI1448 +MRJM3_SI1809 +MRJM3_SI2078 +MRJM3_SX188 +MRJM3_SX278 +MRJM3_SX368 +MRJM3_SX8 +MRJM3_SX98 +MRJM4_SA1 +MRJM4_SA2 +MRJM4_SI1489 +MRJM4_SI2119 +MRJM4_SI859 +MRJM4_SX139 +MRJM4_SX229 +MRJM4_SX319 +MRJM4_SX409 +MRJM4_SX49 +MRJO0_SA1 +MRJO0_SA2 +MRJO0_SI1364 +MRJO0_SI1624 +MRJO0_SI734 +MRJO0_SX104 +MRJO0_SX14 +MRJO0_SX194 +MRJO0_SX284 +MRJO0_SX374 +MRJR0_SA1 +MRJR0_SA2 +MRJR0_SI1182 +MRJR0_SI1812 +MRJR0_SI2313 +MRJR0_SX102 +MRJR0_SX12 +MRJR0_SX192 +MRJR0_SX282 +MRJR0_SX372 +MRJS0_SA1 +MRJS0_SA2 +MRJS0_SI1444 +MRJS0_SI1523 +MRJS0_SI2074 +MRJS0_SX184 +MRJS0_SX274 +MRJS0_SX364 +MRJS0_SX4 +MRJS0_SX94 +MRKO0_SA1 +MRKO0_SA2 +MRKO0_SI1397 +MRKO0_SI2027 +MRKO0_SI767 +MRKO0_SX137 +MRKO0_SX227 +MRKO0_SX317 +MRKO0_SX407 +MRKO0_SX47 +MRMS1_SA1 +MRMS1_SA2 +MRMS1_SI1487 +MRMS1_SI2117 +MRMS1_SI857 +MRMS1_SX137 +MRMS1_SX227 +MRMS1_SX317 +MRMS1_SX407 +MRMS1_SX47 +MROA0_SA1 +MROA0_SA2 +MROA0_SI1307 +MROA0_SI1970 +MROA0_SI677 +MROA0_SX137 +MROA0_SX227 +MROA0_SX317 +MROA0_SX407 +MROA0_SX47 +MRPC0_SA1 +MRPC0_SA2 +MRPC0_SI1753 +MRPC0_SI493 +MRPC0_SI933 +MRPC0_SX133 +MRPC0_SX223 +MRPC0_SX313 +MRPC0_SX403 +MRPC0_SX43 +MRPP0_SA1 +MRPP0_SA2 +MRPP0_SI1184 +MRPP0_SI1814 +MRPP0_SI554 +MRPP0_SX104 +MRPP0_SX14 +MRPP0_SX194 +MRPP0_SX284 +MRPP0_SX374 +MRRK0_SA1 +MRRK0_SA2 +MRRK0_SI1288 +MRRK0_SI1716 +MRRK0_SI1918 +MRRK0_SX118 +MRRK0_SX208 +MRRK0_SX28 +MRRK0_SX298 +MRRK0_SX388 +MRTK0_SA1 +MRTK0_SA2 +MRTK0_SI1093 +MRTK0_SI1723 +MRTK0_SI1750 +MRTK0_SX103 +MRTK0_SX13 +MRTK0_SX193 +MRTK0_SX283 +MRTK0_SX373 +MRWS1_SA1 +MRWS1_SA2 +MRWS1_SI1130 +MRWS1_SI1496 +MRWS1_SI500 +MRWS1_SX140 +MRWS1_SX230 +MRWS1_SX320 +MRWS1_SX410 +MRWS1_SX50 +MSFH1_SA1 +MSFH1_SA2 +MSFH1_SI1270 +MSFH1_SI1900 +MSFH1_SI640 +MSFH1_SX10 +MSFH1_SX100 +MSFH1_SX190 +MSFH1_SX280 +MSFH1_SX370 +MSJS1_SA1 +MSJS1_SA2 +MSJS1_SI1899 +MSJS1_SI639 +MSJS1_SI869 +MSJS1_SX189 +MSJS1_SX279 +MSJS1_SX369 +MSJS1_SX9 +MSJS1_SX99 +MSLB0_SA1 +MSLB0_SA2 +MSLB0_SI1193 +MSLB0_SI1823 +MSLB0_SI563 +MSLB0_SX113 +MSLB0_SX203 +MSLB0_SX23 +MSLB0_SX293 +MSLB0_SX383 +MSTK0_SA1 +MSTK0_SA2 +MSTK0_SI1024 +MSTK0_SI2222 +MSTK0_SI2284 +MSTK0_SX124 +MSTK0_SX214 +MSTK0_SX304 +MSTK0_SX34 +MSTK0_SX394 +MTAA0_SA1 +MTAA0_SA2 +MTAA0_SI1285 +MTAA0_SI1915 +MTAA0_SI596 +MTAA0_SX115 +MTAA0_SX205 +MTAA0_SX25 +MTAA0_SX295 +MTAA0_SX385 +MTAS1_SA1 +MTAS1_SA2 +MTAS1_SI1473 +MTAS1_SI2098 +MTAS1_SI838 +MTAS1_SX118 +MTAS1_SX208 +MTAS1_SX28 +MTAS1_SX298 +MTAS1_SX388 +MTDT0_SA1 +MTDT0_SA2 +MTDT0_SI1994 +MTDT0_SI2254 +MTDT0_SI994 +MTDT0_SX184 +MTDT0_SX274 +MTDT0_SX364 +MTDT0_SX4 +MTDT0_SX94 +MTEB0_SA1 +MTEB0_SA2 +MTEB0_SI1133 +MTEB0_SI2064 +MTEB0_SI503 +MTEB0_SX143 +MTEB0_SX233 +MTEB0_SX323 +MTEB0_SX413 +MTEB0_SX53 +MTHC0_SA1 +MTHC0_SA2 +MTHC0_SI1015 +MTHC0_SI1645 +MTHC0_SI2275 +MTHC0_SX115 +MTHC0_SX205 +MTHC0_SX25 +MTHC0_SX295 +MTHC0_SX385 +MTLS0_SA1 +MTLS0_SA2 +MTLS0_SI1370 +MTLS0_SI2000 +MTLS0_SI740 +MTLS0_SX110 +MTLS0_SX20 +MTLS0_SX200 +MTLS0_SX290 +MTLS0_SX380 +MTMR0_SA1 +MTMR0_SA2 +MTMR0_SI1303 +MTMR0_SI1933 +MTMR0_SI673 +MTMR0_SX133 +MTMR0_SX223 +MTMR0_SX313 +MTMR0_SX403 +MTMR0_SX43 +MTWH0_SA1 +MTWH0_SA2 +MTWH0_SI1190 +MTWH0_SI1629 +MTWH0_SI1820 +MTWH0_SX110 +MTWH0_SX20 +MTWH0_SX200 +MTWH0_SX290 +MTWH0_SX380 +MWBT0_SA1 
+MWBT0_SA2 +MWBT0_SI1553 +MWBT0_SI2183 +MWBT0_SI923 +MWBT0_SX113 +MWBT0_SX203 +MWBT0_SX23 +MWBT0_SX293 +MWBT0_SX383 +MWEW0_SA1 +MWEW0_SA2 +MWEW0_SI1361 +MWEW0_SI1991 +MWEW0_SI731 +MWEW0_SX101 +MWEW0_SX11 +MWEW0_SX191 +MWEW0_SX281 +MWEW0_SX371 +MWJG0_SA1 +MWJG0_SA2 +MWJG0_SI1124 +MWJG0_SI1754 +MWJG0_SI494 +MWJG0_SX134 +MWJG0_SX224 +MWJG0_SX314 +MWJG0_SX404 +MWJG0_SX44 +MWVW0_SA1 +MWVW0_SA2 +MWVW0_SI1476 +MWVW0_SI2106 +MWVW0_SI846 +MWVW0_SX126 +MWVW0_SX216 +MWVW0_SX306 +MWVW0_SX36 +MWVW0_SX396 diff --git a/SpeechT5/fairseq/examples/wav2vec/unsupervised/config/timit_unmatched/train.uid b/SpeechT5/fairseq/examples/wav2vec/unsupervised/config/timit_unmatched/train.uid new file mode 100644 index 0000000000000000000000000000000000000000..35b02e7f82cc788f59860befad083ba8cfc899c0 --- /dev/null +++ b/SpeechT5/fairseq/examples/wav2vec/unsupervised/config/timit_unmatched/train.uid @@ -0,0 +1,3000 @@ +FAEM0_SA1 +FAEM0_SA2 +FAEM0_SI2022 +FAEM0_SX132 +FAEM0_SX222 +FAEM0_SX312 +FAEM0_SX402 +FAJW0_SA2 +FAJW0_SI1893 +FAJW0_SX183 +FAJW0_SX273 +FAJW0_SX363 +FALK0_SA1 +FALK0_SA2 +FALK0_SI1086 +FALK0_SI456 +FALK0_SX276 +FALK0_SX366 +FALK0_SX96 +FALR0_SA1 +FALR0_SA2 +FALR0_SI1955 +FALR0_SI695 +FALR0_SX155 +FALR0_SX245 +FALR0_SX425 +FALR0_SX65 +FAPB0_SA1 +FAPB0_SA2 +FAPB0_SI1693 +FAPB0_SX163 +FAPB0_SX253 +FAPB0_SX343 +FAPB0_SX73 +FBAS0_SA2 +FBAS0_SI1387 +FBAS0_SX127 +FBAS0_SX307 +FBAS0_SX37 +FBAS0_SX397 +FBCG1_SA2 +FBCG1_SI1612 +FBCG1_SI2242 +FBCG1_SI982 +FBCG1_SX262 +FBCG1_SX82 +FBCH0_SA1 +FBCH0_SA2 +FBCH0_SI1586 +FBCH0_SI956 +FBCH0_SX146 +FBCH0_SX326 +FBCH0_SX56 +FBJL0_SA1 +FBJL0_SA2 +FBJL0_SI1552 +FBJL0_SI2182 +FBJL0_SX112 +FBJL0_SX202 +FBJL0_SX22 +FBJL0_SX292 +FBJL0_SX382 +FBLV0_SA2 +FBLV0_SI2318 +FBLV0_SX158 +FBLV0_SX248 +FBLV0_SX428 +FBMH0_SA2 +FBMH0_SI1766 +FBMH0_SX146 +FBMH0_SX236 +FBMH0_SX326 +FBMH0_SX416 +FBMH0_SX56 +FBMJ0_SA2 +FBMJ0_SX156 +FBMJ0_SX246 +FBMJ0_SX426 +FBMJ0_SX66 +FCAG0_SA2 +FCAG0_SI1503 +FCAG0_SI1641 +FCAG0_SI2133 +FCAG0_SX333 +FCAG0_SX423 +FCAG0_SX63 +FCAJ0_SA1 +FCAJ0_SA2 +FCAJ0_SI1804 +FCAJ0_SI849 +FCAJ0_SX129 +FCAJ0_SX219 +FCAJ0_SX39 +FCAJ0_SX399 +FCDR1_SA1 +FCDR1_SA2 +FCDR1_SX16 +FCDR1_SX376 +FCEG0_SA1 +FCEG0_SI1248 +FCEG0_SI1878 +FCEG0_SI618 +FCEG0_SX168 +FCEG0_SX258 +FCEG0_SX348 +FCEG0_SX438 +FCEG0_SX78 +FCJF0_SA2 +FCJF0_SI1027 +FCJF0_SI1657 +FCJF0_SI648 +FCJF0_SX217 +FCJF0_SX307 +FCJF0_SX37 +FCJF0_SX397 +FCJS0_SA1 +FCJS0_SA2 +FCJS0_SI977 +FCJS0_SX167 +FCJS0_SX347 +FCJS0_SX437 +FCJS0_SX77 +FCKE0_SA1 +FCKE0_SI1111 +FCKE0_SX211 +FCKE0_SX301 +FCKE0_SX31 +FCKE0_SX391 +FCLT0_SA1 +FCLT0_SA2 +FCLT0_SI1438 +FCLT0_SX178 +FCLT0_SX268 +FCLT0_SX358 +FCMG0_SA1 +FCMG0_SI1242 +FCMG0_SX162 +FCMG0_SX252 +FCMG0_SX342 +FCMM0_SI1083 +FCMM0_SI453 +FCMM0_SX273 +FCMM0_SX363 +FCMM0_SX93 +FCRZ0_SA1 +FCRZ0_SA2 +FCRZ0_SI1913 +FCRZ0_SI793 +FCRZ0_SX163 +FCRZ0_SX253 +FCRZ0_SX343 +FCRZ0_SX73 +FCYL0_SA2 +FCYL0_SI1297 +FCYL0_SI1927 +FCYL0_SX127 +FCYL0_SX217 +FCYL0_SX397 +FDAS1_SA1 +FDAS1_SA2 +FDAS1_SX111 +FDAS1_SX21 +FDAS1_SX291 +FDAW0_SA1 +FDAW0_SA2 +FDAW0_SX146 +FDAW0_SX236 +FDAW0_SX326 +FDAW0_SX416 +FDAW0_SX56 +FDFB0_SI1318 +FDFB0_SI1948 +FDFB0_SX148 +FDFB0_SX238 +FDFB0_SX328 +FDFB0_SX418 +FDJH0_SA1 +FDJH0_SA2 +FDJH0_SI1565 +FDJH0_SI2195 +FDJH0_SX125 +FDJH0_SX215 +FDJH0_SX35 +FDJH0_SX395 +FDKN0_SA1 +FDKN0_SA2 +FDKN0_SI1081 +FDKN0_SI1711 +FDKN0_SX271 +FDKN0_SX361 +FDKN0_SX91 +FDML0_SA1 +FDML0_SI1149 +FDML0_SI1779 +FDML0_SI2075 +FDML0_SX339 +FDML0_SX69 +FDMY0_SI1197 +FDMY0_SX117 +FDMY0_SX207 +FDMY0_SX297 +FDNC0_SA1 +FDNC0_SA2 +FDNC0_SI2287 +FDNC0_SX108 +FDNC0_SX18 +FDNC0_SX378 +FDTD0_SA2 +FDTD0_SI1561 
+FDTD0_SI2191 +FDTD0_SI931 +FDTD0_SX121 +FDTD0_SX301 +FDTD0_SX391 +FDXW0_SA2 +FDXW0_SI1511 +FDXW0_SI2141 +FDXW0_SI881 +FDXW0_SX161 +FDXW0_SX431 +FEAC0_SA1 +FEAC0_SA2 +FEAC0_SI1245 +FEAC0_SI1875 +FEAC0_SX255 +FEAC0_SX345 +FEAC0_SX435 +FEAR0_SA1 +FEAR0_SA2 +FEAR0_SI1252 +FEAR0_SI1882 +FEAR0_SX172 +FEAR0_SX262 +FEAR0_SX442 +FEAR0_SX82 +FECD0_SA2 +FECD0_SI2048 +FECD0_SX158 +FECD0_SX248 +FECD0_SX338 +FECD0_SX428 +FEEH0_SA2 +FEEH0_SI1112 +FEEH0_SX212 +FEEH0_SX302 +FEEH0_SX32 +FEEH0_SX392 +FEME0_SA2 +FEME0_SI1505 +FEME0_SI2135 +FEME0_SX245 +FEME0_SX425 +FETB0_SA2 +FETB0_SI1778 +FETB0_SI518 +FETB0_SX248 +FETB0_SX338 +FETB0_SX428 +FETB0_SX68 +FEXM0_SA2 +FEXM0_SI1731 +FEXM0_SX111 +FEXM0_SX201 +FEXM0_SX291 +FEXM0_SX381 +FGCS0_SA1 +FGCS0_SA2 +FGCS0_SI1486 +FGCS0_SI2116 +FGCS0_SI856 +FGCS0_SX46 +FGDP0_SA2 +FGDP0_SI1618 +FGDP0_SI2248 +FGDP0_SX178 +FGDP0_SX268 +FGDP0_SX358 +FGDP0_SX448 +FGMB0_SA1 +FGMB0_SA2 +FGMB0_SI515 +FGMB0_SX155 +FGMB0_SX425 +FGMB0_SX65 +FGRW0_SA2 +FGRW0_SI1782 +FGRW0_SI1990 +FGRW0_SX252 +FGRW0_SX342 +FGRW0_SX72 +FHLM0_SA1 +FHLM0_SA2 +FHLM0_SI1560 +FHLM0_SI2190 +FHLM0_SI930 +FHLM0_SX210 +FHLM0_SX300 +FHXS0_SI2335 +FHXS0_SX265 +FHXS0_SX355 +FHXS0_SX85 +FJDM2_SI1582 +FJDM2_SI1964 +FJDM2_SI2212 +FJDM2_SX322 +FJDM2_SX412 +FJEN0_SA2 +FJEN0_SI1047 +FJEN0_SI1677 +FJEN0_SI2307 +FJEN0_SX147 +FJEN0_SX237 +FJEN0_SX57 +FJHK0_SA1 +FJHK0_SA2 +FJHK0_SI1022 +FJHK0_SI1652 +FJHK0_SX122 +FJHK0_SX212 +FJHK0_SX32 +FJHK0_SX392 +FJKL0_SA1 +FJKL0_SA2 +FJKL0_SI1562 +FJKL0_SI2192 +FJKL0_SX122 +FJKL0_SX302 +FJKL0_SX32 +FJLG0_SA1 +FJLG0_SA2 +FJLG0_SI1506 +FJLG0_SX179 +FJLG0_SX269 +FJLG0_SX359 +FJLG0_SX449 +FJLG0_SX89 +FJLR0_SA2 +FJLR0_SI1861 +FJLR0_SI601 +FJLR0_SX151 +FJLR0_SX241 +FJLR0_SX331 +FJLR0_SX421 +FJLR0_SX61 +FJRB0_SA1 +FJRB0_SA2 +FJRB0_SI1302 +FJRB0_SI1932 +FJRB0_SI672 +FJRB0_SX132 +FJRB0_SX222 +FJRB0_SX312 +FJRB0_SX42 +FJRP1_SA2 +FJRP1_SI802 +FJRP1_SX172 +FJRP1_SX442 +FJSK0_SA2 +FJSK0_SI1682 +FJSK0_SI2312 +FJSK0_SX152 +FJSK0_SX242 +FJSK0_SX332 +FJSK0_SX422 +FJSK0_SX62 +FJSP0_SA1 +FJSP0_SA2 +FJSP0_SI1763 +FJSP0_SI804 +FJSP0_SX174 +FJSP0_SX84 +FJWB1_SA2 +FJWB1_SI2055 +FJWB1_SI795 +FJWB1_SX165 +FJWB1_SX255 +FJWB1_SX75 +FJXM0_SA2 +FJXM0_SI1211 +FJXM0_SI1971 +FJXM0_SX131 +FJXM0_SX221 +FJXP0_SA2 +FJXP0_SI492 +FJXP0_SX222 +FJXP0_SX312 +FJXP0_SX402 +FJXP0_SX42 +FKAA0_SA2 +FKAA0_SI1208 +FKAA0_SI1838 +FKAA0_SI578 +FKAA0_SX218 +FKAA0_SX308 +FKAA0_SX38 +FKDE0_SA2 +FKDE0_SI2221 +FKDE0_SX331 +FKDW0_SA1 +FKDW0_SA2 +FKDW0_SI577 +FKDW0_SX127 +FKDW0_SX217 +FKDW0_SX307 +FKDW0_SX37 +FKFB0_SA1 +FKFB0_SI2238 +FKFB0_SI978 +FKFB0_SX168 +FKFB0_SX258 +FKKH0_SI660 +FKKH0_SX210 +FKKH0_SX30 +FKKH0_SX300 +FKLC0_SA1 +FKLC0_SA2 +FKLC0_SI1615 +FKLC0_SI2245 +FKLC0_SX265 +FKLC0_SX445 +FKLC0_SX85 +FKLC1_SA1 +FKLC1_SA2 +FKLC1_SI1678 +FKLC1_SX148 +FKLC1_SX58 +FKLH0_SA1 +FKLH0_SI1887 +FKLH0_SI627 +FKLH0_SX267 +FKLH0_SX357 +FKLH0_SX447 +FKLH0_SX87 +FKSR0_SI1117 +FKSR0_SX161 +FKSR0_SX37 +FKSR0_SX397 +FLAC0_SA1 +FLAC0_SA2 +FLAC0_SI2161 +FLAC0_SI901 +FLAC0_SX181 +FLAC0_SX271 +FLAC0_SX361 +FLAC0_SX91 +FLAG0_SA1 +FLAG0_SI2094 +FLAG0_SX294 +FLEH0_SA1 +FLEH0_SA2 +FLEH0_SX151 +FLEH0_SX241 +FLEH0_SX421 +FLEH0_SX61 +FLET0_SA2 +FLET0_SI1137 +FLET0_SI1767 +FLET0_SX147 +FLET0_SX237 +FLET0_SX277 +FLET0_SX417 +FLET0_SX57 +FLHD0_SA1 +FLHD0_SA2 +FLHD0_SI1344 +FLHD0_SI1974 +FLHD0_SX174 +FLHD0_SX264 +FLHD0_SX444 +FLHD0_SX84 +FLJA0_SA2 +FLJA0_SI1708 +FLJA0_SX268 +FLJA0_SX358 +FLJA0_SX448 +FLJA0_SX88 +FLJD0_SA1 +FLJD0_SA2 +FLJD0_SI2146 +FLJD0_SX166 +FLJD0_SX256 +FLJD0_SX346 +FLJD0_SX436 +FLJG0_SA1 +FLJG0_SI1611 +FLJG0_SI2241 +FLJG0_SX261 +FLJG0_SX441 
+FLJG0_SX81 +FLKM0_SI1880 +FLKM0_SX116 +FLMA0_SA2 +FLMA0_SI1243 +FLMA0_SI1873 +FLMA0_SX163 +FLMA0_SX253 +FLMA0_SX343 +FLMC0_SA1 +FLMC0_SA2 +FLMC0_SI2002 +FLMC0_SI742 +FLMC0_SX112 +FLMC0_SX292 +FLMC0_SX336 +FLMC0_SX382 +FLMK0_SA2 +FLMK0_SI2295 +FLMK0_SX135 +FLMK0_SX225 +FLMK0_SX45 +FLOD0_SA1 +FLOD0_SA2 +FLOD0_SI1287 +FLOD0_SI657 +FLOD0_SX207 +FLOD0_SX387 +FLTM0_SA2 +FLTM0_SI1700 +FLTM0_SX260 +FLTM0_SX80 +FMAH1_SA1 +FMAH1_SI1509 +FMAH1_SI2139 +FMAH1_SX249 +FMAH1_SX339 +FMAH1_SX429 +FMAH1_SX69 +FMBG0_SA1 +FMBG0_SI1790 +FMBG0_SX260 +FMBG0_SX3 +FMBG0_SX350 +FMBG0_SX440 +FMBG0_SX80 +FMEM0_SA2 +FMEM0_SI1377 +FMEM0_SI2007 +FMEM0_SX117 +FMEM0_SX207 +FMEM0_SX297 +FMJB0_SA1 +FMJB0_SA2 +FMJB0_SI1807 +FMJB0_SX187 +FMJB0_SX277 +FMJB0_SX367 +FMJB0_SX7 +FMJF0_SA1 +FMJF0_SI1254 +FMJF0_SI1884 +FMJF0_SX264 +FMJF0_SX354 +FMJF0_SX444 +FMJU0_SA1 +FMJU0_SA2 +FMJU0_SI2019 +FMJU0_SI759 +FMJU0_SX129 +FMJU0_SX219 +FMJU0_SX39 +FMKC0_SA1 +FMKC0_SA2 +FMKC0_SI1072 +FMKC0_SX172 +FMKC0_SX262 +FMKC0_SX352 +FMKF0_SA1 +FMKF0_SA2 +FMKF0_SI1536 +FMKF0_SI906 +FMKF0_SX276 +FMKF0_SX366 +FMKF0_SX6 +FMKF0_SX96 +FMMH0_SA1 +FMMH0_SA2 +FMMH0_SI1537 +FMMH0_SI2167 +FMMH0_SI907 +FMMH0_SX187 +FMMH0_SX367 +FMMH0_SX420 +FMMH0_SX7 +FMMH0_SX97 +FMPG0_SI1602 +FMPG0_SI2232 +FMPG0_SX252 +FMPG0_SX72 +FNKL0_SA1 +FNKL0_SA2 +FNKL0_SI2152 +FNKL0_SX172 +FNKL0_SX196 +FNKL0_SX262 +FNKL0_SX442 +FNKL0_SX82 +FNTB0_SA1 +FNTB0_SA2 +FNTB0_SX123 +FNTB0_SX213 +FNTB0_SX33 +FNTB0_SX393 +FPAB1_SA2 +FPAB1_SX121 +FPAB1_SX301 +FPAB1_SX31 +FPAB1_SX391 +FPAC0_SA1 +FPAC0_SI2011 +FPAC0_SX121 +FPAC0_SX211 +FPAC0_SX301 +FPAC0_SX31 +FPAC0_SX391 +FPAD0_SA1 +FPAD0_SI1346 +FPAD0_SI1976 +FPAD0_SX266 +FPAD0_SX446 +FPAF0_SI1684 +FPAF0_SI2314 +FPAF0_SX244 +FPAF0_SX334 +FPAF0_SX424 +FPAF0_SX64 +FPAZ0_SI1593 +FPAZ0_SX153 +FPAZ0_SX27 +FPAZ0_SX423 +FPAZ0_SX63 +FPJF0_SA2 +FPJF0_SI1046 +FPJF0_SI1676 +FPJF0_SX236 +FPJF0_SX326 +FPLS0_SA1 +FPLS0_SA2 +FPLS0_SI2220 +FPLS0_SX150 +FPLS0_SX240 +FPLS0_SX3 +FPLS0_SX60 +FPMY0_SA2 +FPMY0_SI1783 +FPMY0_SX163 +FPMY0_SX196 +FPMY0_SX253 +FPMY0_SX73 +FREH0_SI1315 +FREH0_SI685 +FREH0_SX145 +FREH0_SX235 +FREH0_SX325 +FREH0_SX55 +FRJB0_SA1 +FRJB0_SA2 +FRJB0_SI1427 +FRJB0_SI1470 +FRJB0_SI1794 +FRJB0_SX167 +FRJB0_SX257 +FRJB0_SX437 +FRJB0_SX77 +FRLL0_SA1 +FRLL0_SA2 +FRLL0_SI1514 +FRLL0_SI884 +FRLL0_SX164 +FRLL0_SX254 +FRLL0_SX344 +FRLL0_SX74 +FSAG0_SA2 +FSAG0_SI1953 +FSAG0_SI693 +FSAG0_SX63 +FSAH0_SI1244 +FSAH0_SI1874 +FSAH0_SX344 +FSAH0_SX74 +FSAK0_SA1 +FSAK0_SA2 +FSAK0_SI1930 +FSAK0_SI670 +FSAK0_SX130 +FSAK0_SX220 +FSAK0_SX310 +FSAK0_SX40 +FSAK0_SX400 +FSBK0_SA1 +FSBK0_SI1699 +FSBK0_SI2329 +FSBK0_SX259 +FSBK0_SX439 +FSBK0_SX79 +FSCN0_SI1886 +FSCN0_SX356 +FSDC0_SA1 +FSDC0_SI1942 +FSDC0_SI2234 +FSDC0_SX232 +FSDC0_SX412 +FSDJ0_SA1 +FSDJ0_SA2 +FSDJ0_SI1745 +FSDJ0_SX125 +FSDJ0_SX35 +FSGF0_SA1 +FSGF0_SA2 +FSGF0_SI1557 +FSGF0_SX207 +FSGF0_SX27 +FSGF0_SX297 +FSGF0_SX387 +FSJG0_SI1570 +FSJG0_SI2200 +FSJG0_SX310 +FSJK1_SA1 +FSJK1_SI1025 +FSJK1_SI2285 +FSJK1_SI696 +FSJK1_SX215 +FSJK1_SX305 +FSJK1_SX395 +FSJS0_SA2 +FSJS0_SI1171 +FSJS0_SI1801 +FSJS0_SI541 +FSJS0_SX271 +FSJS0_SX361 +FSJS0_SX91 +FSJW0_SA1 +FSJW0_SA2 +FSJW0_SI703 +FSJW0_SX163 +FSJW0_SX253 +FSJW0_SX343 +FSJW0_SX73 +FSKC0_SA1 +FSKC0_SA2 +FSKC0_SI2046 +FSKC0_SX156 +FSKC0_SX336 +FSKC0_SX426 +FSKC0_SX66 +FSKL0_SA1 +FSKL0_SA2 +FSKL0_SI2159 +FSKL0_SI899 +FSKL0_SX179 +FSKL0_SX269 +FSKL0_SX359 +FSKL0_SX89 +FSKP0_SA1 +FSKP0_SI1728 +FSKP0_SI468 +FSKP0_SX108 +FSKP0_SX18 +FSKP0_SX198 +FSKP0_SX288 +FSKP0_SX378 +FSLS0_SA1 +FSLS0_SA2 +FSLS0_SI1056 +FSLS0_SI1686 +FSLS0_SI2316 +FSLS0_SX202 +FSLS0_SX246 +FSLS0_SX66 
+FSMA0_SA1 +FSMA0_SI1621 +FSMA0_SI2251 +FSMA0_SX271 +FSMA0_SX361 +FSMA0_SX91 +FSMM0_SA1 +FSMM0_SA2 +FSMM0_SI1314 +FSMM0_SI1944 +FSMM0_SI684 +FSMM0_SX414 +FSMM0_SX54 +FSMS1_SA1 +FSMS1_SA2 +FSMS1_SI1504 +FSMS1_SI2134 +FSMS1_SI874 +FSMS1_SX154 +FSMS1_SX334 +FSMS1_SX64 +FSPM0_SA1 +FSPM0_SI1871 +FSPM0_SI611 +FSPM0_SX341 +FSPM0_SX431 +FSRH0_SA1 +FSRH0_SA2 +FSRH0_SI1719 +FSRH0_SX131 +FSRH0_SX41 +FSSB0_SA1 +FSSB0_SA2 +FSSB0_SI1082 +FSSB0_SI2342 +FSSB0_SX182 +FSSB0_SX272 +FSSB0_SX452 +FSSB0_SX92 +FTAJ0_SA1 +FTAJ0_SA2 +FTAJ0_SI1329 +FTAJ0_SI474 +FTAJ0_SX339 +FTAJ0_SX69 +FTBR0_SA1 +FTBR0_SA2 +FTBR0_SI2181 +FTBR0_SX111 +FTBR0_SX201 +FTBR0_SX291 +FTBR0_SX381 +FTBW0_SA2 +FTBW0_SI1345 +FTBW0_SI1975 +FTBW0_SX265 +FTBW0_SX355 +FTBW0_SX445 +FTBW0_SX85 +FTLG0_SA1 +FTLG0_SA2 +FTLG0_SI840 +FTLG0_SX123 +FTLG0_SX213 +FTLG0_SX303 +FTLG0_SX33 +FTLG0_SX393 +FTMG0_SA1 +FTMG0_SA2 +FTMG0_SX182 +FTMG0_SX272 +FTMG0_SX362 +FTMG0_SX92 +FVFB0_SA1 +FVFB0_SI1032 +FVFB0_SI2292 +FVFB0_SX222 +FVFB0_SX312 +FVFB0_SX402 +FVKB0_SA2 +FVKB0_SI1159 +FVKB0_SI1789 +FVKB0_SI529 +FVKB0_SX169 +FVKB0_SX259 +FVKB0_SX439 +FVKB0_SX79 +FVMH0_SA1 +FVMH0_SI2096 +FVMH0_SX206 +FVMH0_SX296 +FVMH0_SX386 +MABC0_SA1 +MABC0_SA2 +MABC0_SX151 +MABC0_SX241 +MABC0_SX331 +MABC0_SX421 +MABC0_SX61 +MADC0_SA1 +MADC0_SA2 +MADC0_SI1997 +MADC0_SX17 +MADC0_SX197 +MADC0_SX287 +MADD0_SA1 +MADD0_SI1798 +MADD0_SI538 +MADD0_SX358 +MADD0_SX448 +MAEB0_SA1 +MAEB0_SA2 +MAEB0_SI2250 +MAEB0_SI990 +MAEB0_SX180 +MAEB0_SX270 +MAEB0_SX360 +MAEB0_SX90 +MAEO0_SA2 +MAEO0_SI1655 +MAEO0_SI1956 +MAEO0_SX156 +MAEO0_SX246 +MAEO0_SX336 +MAEO0_SX426 +MAEO0_SX66 +MAFM0_SA1 +MAFM0_SA2 +MAFM0_SI1569 +MAFM0_SI2199 +MAFM0_SX219 +MAFM0_SX39 +MAFM0_SX399 +MAJP0_SA1 +MAJP0_SI1074 +MAJP0_SI2334 +MAJP0_SX264 +MAJP0_SX354 +MAJP0_SX444 +MAJP0_SX84 +MAKB0_SA1 +MAKB0_SX206 +MAKB0_SX296 +MAKR0_SA1 +MAKR0_SA2 +MAKR0_SI1352 +MAKR0_SI1982 +MAKR0_SI722 +MAKR0_SX182 +MAKR0_SX272 +MAKR0_SX452 +MAPV0_SA1 +MAPV0_SA2 +MAPV0_SI1923 +MAPV0_SX123 +MAPV0_SX303 +MAPV0_SX33 +MAPV0_SX393 +MARC0_SA1 +MARC0_SI1188 +MARC0_SI1818 +MARC0_SI558 +MARC0_SX288 +MARC0_SX378 +MARW0_SA1 +MARW0_SA2 +MARW0_SI1276 +MARW0_SI646 +MARW0_SX106 +MARW0_SX16 +MARW0_SX376 +MBAR0_SA2 +MBAR0_SI1319 +MBAR0_SI1949 +MBAR0_SI689 +MBAR0_SX149 +MBAR0_SX239 +MBAR0_SX329 +MBBR0_SA1 +MBBR0_SA2 +MBBR0_SI1685 +MBBR0_SX155 +MBBR0_SX245 +MBBR0_SX425 +MBCG0_SA2 +MBCG0_SI2217 +MBCG0_SX147 +MBCG0_SX237 +MBCG0_SX417 +MBCG0_SX57 +MBEF0_SA1 +MBEF0_SA2 +MBEF0_SX111 +MBEF0_SX201 +MBEF0_SX291 +MBGT0_SA1 +MBGT0_SI1341 +MBGT0_SI711 +MBGT0_SX81 +MBJV0_SA2 +MBJV0_SI1247 +MBJV0_SI1877 +MBJV0_SX167 +MBJV0_SX257 +MBJV0_SX437 +MBJV0_SX77 +MBMA0_SA1 +MBMA0_SA2 +MBMA0_SI1852 +MBMA0_SX142 +MBMA0_SX322 +MBMA0_SX412 +MBMA1_SA1 +MBMA1_SA2 +MBMA1_SI2207 +MBMA1_SX144 +MBMA1_SX234 +MBMA1_SX414 +MBML0_SA1 +MBML0_SI1799 +MBML0_SI539 +MBML0_SX179 +MBML0_SX269 +MBML0_SX359 +MBML0_SX449 +MBOM0_SA1 +MBOM0_SI1014 +MBOM0_SI1644 +MBOM0_SX114 +MBOM0_SX204 +MBOM0_SX311 +MBOM0_SX384 +MBSB0_SA2 +MBSB0_SI1353 +MBSB0_SI1983 +MBSB0_SI723 +MBSB0_SX183 +MBSB0_SX273 +MBSB0_SX363 +MBSB0_SX93 +MBTH0_SA1 +MBTH0_SI505 +MBTH0_SI757 +MBTH0_SX212 +MBTH0_SX302 +MBTH0_SX392 +MBWP0_SA1 +MBWP0_SA2 +MBWP0_SI1531 +MBWP0_SI1969 +MBWP0_SI709 +MBWP0_SX169 +MBWP0_SX259 +MBWP0_SX439 +MBWP0_SX79 +MCAE0_SA1 +MCAE0_SA2 +MCAE0_SX187 +MCAE0_SX367 +MCAE0_SX7 +MCAE0_SX97 +MCAL0_SA1 +MCAL0_SI508 +MCAL0_SX148 +MCAL0_SX238 +MCAL0_SX328 +MCAL0_SX418 +MCAL0_SX58 +MCDC0_SA2 +MCDC0_SI1292 +MCDC0_SI1922 +MCDC0_SI662 +MCDC0_SX122 +MCDC0_SX302 +MCDC0_SX32 +MCDC0_SX392 +MCDD0_SA1 +MCDD0_SI1513 +MCDD0_SI2143 +MCDD0_SX163 +MCDD0_SX343 
+MCDD0_SX73 +MCDR0_SA1 +MCDR0_SA2 +MCDR0_SX164 +MCDR0_SX254 +MCDR0_SX344 +MCDR0_SX434 +MCDR0_SX74 +MCEF0_SA1 +MCEF0_SA2 +MCEF0_SI1135 +MCEF0_SI1765 +MCEF0_SX145 +MCEF0_SX325 +MCEF0_SX55 +MCEW0_SI1442 +MCEW0_SX182 +MCEW0_SX272 +MCEW0_SX92 +MCHL0_SA1 +MCHL0_SA2 +MCHL0_SI1977 +MCHL0_SX177 +MCHL0_SX267 +MCHL0_SX357 +MCHL0_SX447 +MCLK0_SA1 +MCLK0_SA2 +MCLK0_SI1660 +MCLK0_SX130 +MCLK0_SX220 +MCLK0_SX40 +MCLK0_SX400 +MCLM0_SA2 +MCLM0_SI1456 +MCLM0_SX106 +MCLM0_SX16 +MCLM0_SX196 +MCLM0_SX286 +MCLM0_SX376 +MCPM0_SA2 +MCPM0_SI1194 +MCPM0_SI564 +MCPM0_SX204 +MCPM0_SX24 +MCRE0_SA1 +MCRE0_SA2 +MCRE0_SI1121 +MCRE0_SI1725 +MCRE0_SI1751 +MCRE0_SX131 +MCRE0_SX221 +MCRE0_SX24 +MCRE0_SX401 +MCRE0_SX41 +MCSS0_SA1 +MCSS0_SA2 +MCSS0_SX120 +MCSS0_SX210 +MCSS0_SX30 +MCSS0_SX300 +MCSS0_SX390 +MCTH0_SA2 +MCTH0_SI1209 +MCTH0_SI1839 +MCTH0_SI579 +MCTH0_SX129 +MCTH0_SX219 +MCTH0_SX309 +MCTH0_SX399 +MCTM0_SA1 +MCTM0_SA2 +MCTM0_SI720 +MCTM0_SX180 +MCTM0_SX270 +MCTM0_SX360 +MCTM0_SX450 +MCTM0_SX90 +MCXM0_SA1 +MCXM0_SA2 +MCXM0_SI1351 +MCXM0_SI1981 +MCXM0_SI721 +MCXM0_SX181 +MCXM0_SX271 +MCXM0_SX361 +MCXM0_SX451 +MDAC0_SA2 +MDAC0_SI1261 +MDAC0_SI1837 +MDAC0_SX271 +MDAC0_SX451 +MDAC0_SX91 +MDAS0_SA1 +MDAS0_SA2 +MDAS0_SI1266 +MDAS0_SX186 +MDAS0_SX21 +MDAS0_SX276 +MDAS0_SX96 +MDBB1_SA1 +MDBB1_SA2 +MDBB1_SI1006 +MDBB1_SI1636 +MDBB1_SI2056 +MDBB1_SX196 +MDBB1_SX286 +MDBP0_SA1 +MDBP0_SA2 +MDBP0_SI1158 +MDBP0_SI1788 +MDBP0_SX258 +MDBP0_SX348 +MDBP0_SX78 +MDCD0_SA1 +MDCD0_SA2 +MDCD0_SI2045 +MDCD0_SX155 +MDCD0_SX65 +MDCM0_SA1 +MDCM0_SA2 +MDCM0_SI2110 +MDCM0_SI850 +MDCM0_SX130 +MDCM0_SX220 +MDCM0_SX310 +MDDC0_SA1 +MDDC0_SA2 +MDDC0_SX249 +MDDC0_SX339 +MDDC0_SX429 +MDED0_SI1170 +MDED0_SI1800 +MDED0_SX180 +MDED0_SX270 +MDED0_SX360 +MDED0_SX450 +MDED0_SX90 +MDEF0_SA1 +MDEF0_SA2 +MDEF0_SI1563 +MDEF0_SI2193 +MDEF0_SX213 +MDEF0_SX33 +MDEF0_SX393 +MDEM0_SA2 +MDEM0_SI1868 +MDEM0_SX158 +MDEM0_SX248 +MDEM0_SX338 +MDEM0_SX68 +MDHL0_SA1 +MDHL0_SA2 +MDHL0_SI2069 +MDHL0_SI809 +MDHL0_SX179 +MDHL0_SX359 +MDHL0_SX89 +MDHS0_SX180 +MDHS0_SX270 +MDHS0_SX360 +MDHS0_SX450 +MDHS0_SX90 +MDJM0_SA1 +MDJM0_SA2 +MDJM0_SI2085 +MDJM0_SI825 +MDJM0_SX195 +MDJM0_SX285 +MDJM0_SX375 +MDKS0_SA1 +MDKS0_SA2 +MDKS0_SI1066 +MDKS0_SI1696 +MDKS0_SI2326 +MDKS0_SX256 +MDKS0_SX76 +MDLB0_SA1 +MDLB0_SI1936 +MDLB0_SI676 +MDLB0_SX226 +MDLB0_SX316 +MDLB0_SX46 +MDLC0_SA1 +MDLC0_SA2 +MDLC0_SI765 +MDLC0_SX135 +MDLC0_SX225 +MDLC0_SX315 +MDLC0_SX45 +MDLC1_SA1 +MDLC1_SX175 +MDLC1_SX265 +MDLC1_SX355 +MDLC1_SX85 +MDLC2_SA1 +MDLC2_SA2 +MDLC2_SI1614 +MDLC2_SI984 +MDLC2_SX174 +MDLC2_SX264 +MDLC2_SX444 +MDLC2_SX84 +MDLH0_SA1 +MDLH0_SI1960 +MDLH0_SI574 +MDLH0_SI700 +MDLH0_SX250 +MDLH0_SX340 +MDLH0_SX70 +MDLM0_SA1 +MDLM0_SA2 +MDLM0_SX244 +MDLM0_SX334 +MDLM0_SX64 +MDLR0_SI1233 +MDLR0_SX243 +MDLR0_SX423 +MDLR0_SX63 +MDLR1_SI1299 +MDLR1_SI1929 +MDLR1_SX129 +MDLR1_SX219 +MDLR1_SX309 +MDLR1_SX39 +MDLR1_SX399 +MDMA0_SA1 +MDMA0_SA2 +MDMA0_SI1238 +MDMA0_SI2060 +MDMT0_SI2341 +MDMT0_SI572 +MDMT0_SX212 +MDMT0_SX302 +MDMT0_SX392 +MDNS0_SA1 +MDNS0_SX111 +MDNS0_SX291 +MDNS0_SX381 +MDPB0_SA1 +MDPB0_SA2 +MDPB0_SI2126 +MDPB0_SX146 +MDPB0_SX236 +MDPB0_SX326 +MDPB0_SX56 +MDPK0_SA1 +MDPK0_SA2 +MDPK0_SI1683 +MDPK0_SI552 +MDPK0_SX153 +MDPK0_SX243 +MDPK0_SX63 +MDPS0_SA1 +MDPS0_SA2 +MDPS0_SI1651 +MDPS0_SI1979 +MDPS0_SX179 +MDPS0_SX269 +MDPS0_SX449 +MDPS0_SX89 +MDRD0_SA2 +MDRD0_SI1382 +MDRD0_SI2012 +MDRD0_SX122 +MDRD0_SX212 +MDRD0_SX302 +MDRD0_SX392 +MDSJ0_SA1 +MDSJ0_SA2 +MDSJ0_SI832 +MDSJ0_SX112 +MDSJ0_SX22 +MDSJ0_SX292 +MDSJ0_SX382 +MDSS0_SA1 +MDSS0_SI1881 +MDSS0_SI2087 +MDSS0_SI621 +MDSS0_SX171 +MDSS0_SX261 
+MDSS0_SX351 +MDSS0_SX81 +MDSS1_SA2 +MDSS1_SI1713 +MDSS1_SX247 +MDSS1_SX337 +MDSS1_SX427 +MDTB0_SA1 +MDTB0_SA2 +MDTB0_SI570 +MDTB0_SX210 +MDTB0_SX300 +MDTB0_SX321 +MDTB0_SX390 +MDWD0_SA1 +MDWD0_SI1890 +MDWD0_SI557 +MDWD0_SX180 +MDWD0_SX360 +MDWD0_SX450 +MDWH0_SA2 +MDWH0_SI1925 +MDWH0_SX125 +MDWH0_SX35 +MDWH0_SX395 +MDWM0_SI1546 +MDWM0_SI2176 +MDWM0_SX106 +MDWM0_SX376 +MDWM0_SX433 +MEAL0_SA1 +MEAL0_SI1547 +MEAL0_SI917 +MEAL0_SX197 +MEAL0_SX287 +MEAL0_SX377 +MEDR0_SI744 +MEDR0_SX114 +MEDR0_SX204 +MEDR0_SX24 +MEDR0_SX294 +MEDR0_SX384 +MEFG0_SA2 +MEFG0_SI465 +MEFG0_SX105 +MEFG0_SX15 +MEFG0_SX195 +MEFG0_SX285 +MEFG0_SX375 +MEGJ0_SI1967 +MEGJ0_SX437 +MEGJ0_SX77 +MEJL0_SA2 +MEJL0_SI1592 +MEJL0_SI1654 +MEJL0_SI962 +MEJL0_SX332 +MEJL0_SX422 +MEJL0_SX62 +MEJS0_SA1 +MEJS0_SA2 +MEJS0_SI1870 +MEJS0_SX250 +MEJS0_SX430 +MEJS0_SX70 +MESG0_SA1 +MESG0_SA2 +MESG0_SI1332 +MESG0_SI1962 +MESG0_SX162 +MESG0_SX252 +MESG0_SX342 +MESG0_SX72 +MESJ0_SA1 +MESJ0_SA2 +MESJ0_SI2257 +MESJ0_SI997 +MESJ0_SX277 +MESJ0_SX367 +MESJ0_SX7 +MEWM0_SA1 +MEWM0_SA2 +MEWM0_SI1348 +MEWM0_SI1978 +MEWM0_SX268 +MEWM0_SX358 +MEWM0_SX448 +MFER0_SA1 +MFER0_SA2 +MFER0_SI1492 +MFER0_SI2122 +MFER0_SX232 +MFER0_SX322 +MFER0_SX412 +MFER0_SX52 +MFMC0_SA1 +MFMC0_SA2 +MFMC0_SI1132 +MFMC0_SI1762 +MFMC0_SI502 +MFMC0_SX142 +MFMC0_SX232 +MFMC0_SX322 +MFMC0_SX412 +MFMC0_SX52 +MFRM0_SA1 +MFRM0_SA2 +MFRM0_SI1155 +MFRM0_SI1717 +MFRM0_SI1785 +MFRM0_SX165 +MFRM0_SX255 +MFRM0_SX75 +MFWK0_SA1 +MFWK0_SA2 +MFWK0_SI1249 +MFWK0_SI619 +MFWK0_SX259 +MFWK0_SX439 +MFWK0_SX79 +MFXS0_SA1 +MFXS0_SA2 +MFXS0_SI1674 +MFXS0_SI2225 +MFXS0_SI2304 +MFXS0_SX144 +MFXS0_SX234 +MFXS0_SX414 +MFXV0_SA1 +MFXV0_SI1635 +MFXV0_SX15 +MFXV0_SX195 +MFXV0_SX285 +MFXV0_SX375 +MGAF0_SA2 +MGAF0_SI1912 +MGAF0_SI652 +MGAF0_SX112 +MGAF0_SX202 +MGAF0_SX292 +MGAG0_SA1 +MGAG0_SI1321 +MGAG0_SI645 +MGAG0_SX151 +MGAG0_SX241 +MGAG0_SX331 +MGAG0_SX421 +MGAG0_SX61 +MGAK0_SA1 +MGAK0_SA2 +MGAK0_SI1666 +MGAK0_SI2296 +MGAK0_SX316 +MGAK0_SX406 +MGAR0_SA1 +MGAR0_SA2 +MGAR0_SI1212 +MGAR0_SI1694 +MGAR0_SI1842 +MGAR0_SX222 +MGAR0_SX402 +MGAR0_SX42 +MGAW0_SA1 +MGAW0_SA2 +MGAW0_SI1802 +MGAW0_SX265 +MGAW0_SX355 +MGAW0_SX445 +MGAW0_SX85 +MGES0_SA2 +MGES0_SI1481 +MGES0_SX131 +MGES0_SX221 +MGES0_SX401 +MGES0_SX41 +MGJC0_SA1 +MGJC0_SI1256 +MGJC0_SI1335 +MGJC0_SI1965 +MGJC0_SX165 +MGJC0_SX255 +MGJC0_SX345 +MGRL0_SA1 +MGRL0_SA2 +MGRL0_SI1497 +MGRL0_SX237 +MGRL0_SX417 +MGRL0_SX57 +MGRP0_SA1 +MGRP0_SI1947 +MGRP0_SI687 +MGRP0_SX147 +MGRP0_SX237 +MGRP0_SX417 +MGRP0_SX57 +MGSH0_SA1 +MGSH0_SX186 +MGSH0_SX96 +MGSL0_SA2 +MGSL0_SI1164 +MGSL0_SX174 +MGSL0_SX354 +MGSL0_SX444 +MGSL0_SX84 +MGXP0_SA1 +MGXP0_SA2 +MGXP0_SI457 +MGXP0_SX277 +MGXP0_SX367 +MGXP0_SX97 +MHBS0_SA1 +MHBS0_SA2 +MHBS0_SI1575 +MHBS0_SI2205 +MHBS0_SX135 +MHBS0_SX225 +MHBS0_SX405 +MHIT0_SA2 +MHIT0_SI1613 +MHIT0_SI2243 +MHIT0_SX173 +MHIT0_SX263 +MHIT0_SX353 +MHIT0_SX443 +MHIT0_SX83 +MHJB0_SA2 +MHJB0_SI1647 +MHJB0_SI2277 +MHJB0_SX117 +MHJB0_SX207 +MHJB0_SX27 +MHJB0_SX297 +MHJB0_SX387 +MHMG0_SA1 +MHMG0_SA2 +MHMG0_SI1365 +MHMG0_SI1995 +MHMG0_SX105 +MHMG0_SX15 +MHMG0_SX285 +MHMG0_SX375 +MHMR0_SA2 +MHMR0_SI1119 +MHMR0_SX129 +MHMR0_SX219 +MHMR0_SX309 +MHMR0_SX39 +MHMR0_SX399 +MHRM0_SA2 +MHRM0_SI1475 +MHRM0_SI2218 +MHRM0_SX238 +MHRM0_SX328 +MHRM0_SX418 +MHXL0_SA1 +MHXL0_SA2 +MHXL0_SI512 +MHXL0_SI612 +MHXL0_SX152 +MHXL0_SX332 +MHXL0_SX422 +MHXL0_SX62 +MILB0_SA1 +MILB0_SI2163 +MILB0_SI807 +MILB0_SX183 +MILB0_SX273 +MILB0_SX3 +MILB0_SX363 +MILB0_SX93 +MJAC0_SA1 +MJAC0_SA2 +MJAC0_SI1331 +MJAC0_SI2148 +MJAC0_SX341 +MJAC0_SX431 +MJAE0_SA1 +MJAE0_SA2 +MJAE0_SI1524 +MJAE0_SI1999 
+MJAE0_SI2154 +MJAE0_SX264 +MJAE0_SX354 +MJAE0_SX444 +MJAI0_SI1604 +MJAI0_SX164 +MJAI0_SX254 +MJAI0_SX344 +MJAI0_SX434 +MJAI0_SX74 +MJBG0_SA1 +MJBG0_SA2 +MJBG0_SI1232 +MJBG0_SI1724 +MJBG0_SI1862 +MJBG0_SX152 +MJBG0_SX242 +MJBG0_SX332 +MJBG0_SX422 +MJDA0_SA1 +MJDA0_SA2 +MJDA0_SI1661 +MJDA0_SI2291 +MJDA0_SX131 +MJDA0_SX221 +MJDA0_SX401 +MJDA0_SX41 +MJDC0_SA1 +MJDC0_SA2 +MJDC0_SI1161 +MJDC0_SI2165 +MJDC0_SX171 +MJDC0_SX261 +MJDC0_SX351 +MJDC0_SX441 +MJDC0_SX81 +MJDE0_SA2 +MJDE0_SX130 +MJDE0_SX310 +MJDE0_SX40 +MJDE0_SX400 +MJDG0_SA1 +MJDG0_SI1672 +MJDG0_SX142 +MJDG0_SX232 +MJDG0_SX322 +MJDG0_SX412 +MJDG0_SX52 +MJDM0_SA2 +MJDM0_SI1937 +MJDM0_SX260 +MJDM0_SX440 +MJDM0_SX80 +MJEB0_SA1 +MJEB0_SA2 +MJEB0_SI1286 +MJEB0_SI1916 +MJEB0_SX206 +MJEB0_SX26 +MJEB0_SX386 +MJEB1_SA1 +MJEB1_SI2097 +MJEB1_SX117 +MJEB1_SX27 +MJEB1_SX297 +MJEE0_SA2 +MJEE0_SI1237 +MJEE0_SI1867 +MJEE0_SI607 +MJEE0_SX157 +MJEE0_SX427 +MJEE0_SX67 +MJFH0_SA1 +MJFH0_SI1737 +MJFH0_SI477 +MJFH0_SX117 +MJFH0_SX207 +MJFH0_SX27 +MJFH0_SX297 +MJFH0_SX387 +MJFR0_SA2 +MJFR0_SI1605 +MJFR0_SI2235 +MJFR0_SI975 +MJFR0_SX165 +MJFR0_SX255 +MJFR0_SX345 +MJHI0_SA2 +MJHI0_SI555 +MJHI0_SI698 +MJHI0_SX248 +MJHI0_SX338 +MJHI0_SX428 +MJHI0_SX68 +MJJB0_SA2 +MJJB0_SI1139 +MJJB0_SI1277 +MJJB0_SI1769 +MJJB0_SX149 +MJJB0_SX329 +MJJB0_SX419 +MJJB0_SX59 +MJJJ0_SA1 +MJJJ0_SA2 +MJJJ0_SI1793 +MJJJ0_SI533 +MJJJ0_SX173 +MJJJ0_SX263 +MJJJ0_SX353 +MJJJ0_SX83 +MJJM0_SA1 +MJJM0_SI1457 +MJJM0_SX17 +MJJM0_SX197 +MJJM0_SX287 +MJJM0_SX377 +MJKR0_SA2 +MJKR0_SI1201 +MJKR0_SI1831 +MJKR0_SX121 +MJKR0_SX211 +MJKR0_SX301 +MJKR0_SX31 +MJKR0_SX391 +MJLB0_SA1 +MJLB0_SA2 +MJLB0_SI2246 +MJLB0_SI986 +MJLB0_SX266 +MJLB0_SX356 +MJLB0_SX446 +MJLB0_SX86 +MJLG1_SA1 +MJLG1_SA2 +MJLG1_SI1012 +MJLG1_SI1642 +MJLG1_SI2272 +MJLG1_SX112 +MJLG1_SX202 +MJLG1_SX22 +MJLG1_SX382 +MJLS0_SA1 +MJLS0_SA2 +MJLS0_SI1096 +MJLS0_SI466 +MJLS0_SX16 +MJLS0_SX196 +MJLS0_SX286 +MJLS0_SX376 +MJMA0_SI1495 +MJMA0_SI865 +MJMA0_SX145 +MJMA0_SX235 +MJMA0_SX325 +MJMA0_SX415 +MJMA0_SX55 +MJMD0_SA1 +MJMD0_SI1028 +MJMD0_SI1658 +MJMD0_SX128 +MJMD0_SX218 +MJMD0_SX398 +MJMM0_SA1 +MJMM0_SA2 +MJMM0_SI1885 +MJMM0_SI625 +MJMM0_SX265 +MJMM0_SX355 +MJMM0_SX445 +MJPG0_SA1 +MJPG0_SA2 +MJPG0_SI561 +MJPG0_SX291 +MJPG0_SX381 +MJPM0_SA1 +MJPM0_SI1998 +MJPM0_SI738 +MJPM0_SX108 +MJPM0_SX18 +MJPM0_SX198 +MJPM0_SX288 +MJPM1_SA1 +MJPM1_SA2 +MJPM1_SI1897 +MJPM1_SI761 +MJPM1_SX131 +MJPM1_SX221 +MJPM1_SX41 +MJRA0_SI606 +MJRA0_SX156 +MJRA0_SX246 +MJRA0_SX66 +MJRG0_SA1 +MJRG0_SA2 +MJRG0_SX106 +MJRG0_SX16 +MJRG0_SX286 +MJRH0_SA1 +MJRH0_SA2 +MJRH0_SI1125 +MJRH0_SI1755 +MJRH0_SX135 +MJRH0_SX315 +MJRH0_SX405 +MJRH0_SX45 +MJRH1_SA2 +MJRH1_SI1774 +MJRH1_SX334 +MJRH1_SX64 +MJRK0_SI2103 +MJRK0_SX340 +MJRK0_SX70 +MJRP0_SI1835 +MJRP0_SI585 +MJRP0_SX135 +MJRP0_SX315 +MJRP0_SX405 +MJRP0_SX45 +MJSR0_SA2 +MJSR0_SX164 +MJSR0_SX254 +MJSR0_SX434 +MJSR0_SX74 +MJWG0_SA2 +MJWG0_SI2155 +MJWG0_SX355 +MJWG0_SX445 +MJWG0_SX85 +MJWS0_SA1 +MJWS0_SA2 +MJWS0_SI1143 +MJWS0_SI1773 +MJWS0_SX243 +MJWS0_SX423 +MJWT0_SA2 +MJWT0_SI751 +MJXA0_SA1 +MJXA0_SA2 +MJXA0_SI1507 +MJXA0_SI2137 +MJXA0_SI877 +MJXA0_SX157 +MJXA0_SX247 +MJXA0_SX337 +MJXA0_SX67 +MJXL0_SA1 +MJXL0_SA2 +MJXL0_SI1795 +MJXL0_SX182 +MJXL0_SX272 +MJXL0_SX362 +MJXL0_SX452 +MJXL0_SX92 +MKAG0_SA2 +MKAG0_SI1609 +MKAG0_SI2239 +MKAG0_SX169 +MKAG0_SX30 +MKAG0_SX439 +MKAG0_SX79 +MKAH0_SA1 +MKAH0_SA2 +MKAH0_SI1528 +MKAH0_SI2158 +MKAH0_SI898 +MKAH0_SX268 +MKAH0_SX358 +MKAH0_SX448 +MKAH0_SX88 +MKAJ0_SA1 +MKAJ0_SI1414 +MKAJ0_SI2044 +MKAJ0_SI784 +MKAJ0_SX244 +MKAJ0_SX334 +MKAJ0_SX424 +MKAJ0_SX64 +MKAM0_SA2 +MKAM0_SI1316 +MKAM0_SX236 
+MKAM0_SX416 +MKDB0_SI2132 +MKDB0_SI588 +MKDB0_SI872 +MKDB0_SX242 +MKDB0_SX332 +MKDB0_SX422 +MKDB0_SX62 +MKDD0_SA1 +MKDD0_SX127 +MKDD0_SX217 +MKDD0_SX307 +MKDD0_SX37 +MKDD0_SX397 +MKDT0_SA1 +MKDT0_SA2 +MKDT0_SI2153 +MKDT0_SI893 +MKDT0_SX173 +MKDT0_SX263 +MKDT0_SX353 +MKDT0_SX443 +MKDT0_SX83 +MKES0_SA2 +MKES0_SX263 +MKES0_SX353 +MKES0_SX443 +MKES0_SX83 +MKJO0_SA1 +MKJO0_SA2 +MKJO0_SI2147 +MKJO0_SX167 +MKJO0_SX257 +MKJO0_SX424 +MKJO0_SX77 +MKLN0_SA1 +MKLN0_SA2 +MKLN0_SI1598 +MKLN0_SI2228 +MKLN0_SX158 +MKLN0_SX338 +MKLN0_SX428 +MKLN0_SX68 +MKLR0_SA1 +MKLR0_SI1059 +MKLR0_SI2319 +MKLR0_SX159 +MKLR0_SX249 +MKLR0_SX339 +MKLR0_SX429 +MKLR0_SX69 +MKLS0_SA2 +MKLS0_SI1533 +MKLS0_SX177 +MKLS0_SX267 +MKLS0_SX447 +MKLS1_SI1545 +MKLS1_SI2175 +MKLS1_SX105 +MKLS1_SX15 +MKLS1_SX195 +MKLS1_SX285 +MKLW0_SA2 +MKLW0_SI1844 +MKLW0_SI2201 +MKLW0_SX131 +MKLW0_SX221 +MKLW0_SX401 +MKLW0_SX41 +MKRG0_SA1 +MKRG0_SA2 +MKRG0_SI1491 +MKRG0_SI2121 +MKRG0_SX141 +MKRG0_SX231 +MKRG0_SX31 +MKRG0_SX51 +MKXL0_SA1 +MKXL0_SI1185 +MKXL0_SX105 +MKXL0_SX195 +MKXL0_SX285 +MLBC0_SA2 +MLBC0_SI609 +MLBC0_SX159 +MLBC0_SX339 +MLBC0_SX429 +MLBC0_SX69 +MLEL0_SI1876 +MLEL0_SX346 +MLEL0_SX76 +MLJC0_SA1 +MLJC0_SA2 +MLJC0_SI1855 +MLJC0_SI595 +MLJC0_SX235 +MLJC0_SX325 +MLJC0_SX55 +MLJH0_SI1324 +MLJH0_SX154 +MLJH0_SX334 +MLJH0_SX424 +MLNS0_SA1 +MLNS0_SA2 +MLNS0_SI1407 +MLNS0_SI777 +MLNS0_SX147 +MLNS0_SX237 +MLNS0_SX327 +MLNS0_SX417 +MLNS0_SX57 +MLSH0_SA1 +MLSH0_SA2 +MLSH0_SI2047 +MLSH0_SI787 +MLSH0_SX157 +MLSH0_SX337 +MLSH0_SX427 +MLSH0_SX67 +MMAA0_SI2105 +MMAA0_SX125 +MMAA0_SX215 +MMAA0_SX305 +MMAA0_SX395 +MMAB1_SA1 +MMAB1_SA2 +MMAB1_SI2124 +MMAB1_SX144 +MMAB1_SX414 +MMAB1_SX54 +MMAG0_SI496 +MMAG0_SX226 +MMAG0_SX406 +MMAG0_SX46 +MMAM0_SA1 +MMAM0_SA2 +MMAM0_SI1597 +MMAM0_SI1668 +MMAM0_SX247 +MMAM0_SX337 +MMAM0_SX67 +MMAR0_SA1 +MMAR0_SA2 +MMAR0_SI1336 +MMAR0_SI706 +MMAR0_SX436 +MMAR0_SX76 +MMBS0_SA1 +MMBS0_SA2 +MMBS0_SI1151 +MMBS0_SX251 +MMBS0_SX341 +MMBS0_SX431 +MMBS0_SX71 +MMCC0_SA1 +MMCC0_SI1968 +MMCC0_SI708 +MMCC0_SX168 +MMCC0_SX258 +MMCC0_SX348 +MMCC0_SX438 +MMCC0_SX78 +MMDB0_SA1 +MMDB0_SA2 +MMDB0_SI1358 +MMDB0_SI1617 +MMDB0_SX267 +MMDB0_SX357 +MMDB0_SX447 +MMDB0_SX87 +MMDG0_SI2035 +MMDG0_SX340 +MMDG0_SX430 +MMDG0_SX70 +MMDM0_SA1 +MMDM0_SA2 +MMDM0_SX231 +MMDM0_SX321 +MMDM0_SX411 +MMDM0_SX51 +MMDM1_SA1 +MMDM1_SI1650 +MMDM1_SI783 +MMDM1_SX243 +MMDS0_SA2 +MMDS0_SI1343 +MMDS0_SI1973 +MMDS0_SI713 +MMDS0_SX173 +MMDS0_SX263 +MMDS0_SX353 +MMDS0_SX443 +MMDS0_SX83 +MMEA0_SA2 +MMEA0_SI1388 +MMEA0_SI2018 +MMEA0_SI758 +MMEA0_SX218 +MMEA0_SX308 +MMEA0_SX38 +MMEB0_SA1 +MMEB0_SI1357 +MMEB0_SI1987 +MMEB0_SI727 +MMEB0_SX7 +MMEB0_SX97 +MMGC0_SA1 +MMGC0_SI1935 +MMGC0_SI2184 +MMGC0_SX315 +MMGC0_SX405 +MMGC0_SX45 +MMGG0_SA1 +MMGG0_SA2 +MMGG0_SI1709 +MMGG0_SI2339 +MMGG0_SX179 +MMGG0_SX359 +MMGG0_SX89 +MMGK0_SA1 +MMGK0_SA2 +MMGK0_SI1322 +MMGK0_SI1952 +MMGK0_SI692 +MMGK0_SX152 +MMGK0_SX242 +MMGK0_SX422 +MMJB1_SA1 +MMJB1_SI1408 +MMJB1_SI2038 +MMJB1_SI778 +MMJB1_SX148 +MMJB1_SX238 +MMJB1_SX328 +MMJB1_SX418 +MMJB1_SX58 +MMLM0_SA1 +MMLM0_SA2 +MMLM0_SI1527 +MMLM0_SI897 +MMLM0_SX177 +MMLM0_SX267 +MMLM0_SX357 +MMLM0_SX447 +MMLM0_SX87 +MMPM0_SA1 +MMPM0_SA2 +MMPM0_SI1061 +MMPM0_SI1691 +MMPM0_SI2321 +MMPM0_SX251 +MMPM0_SX341 +MMPM0_SX431 +MMPM0_SX71 +MMRP0_SA1 +MMRP0_SI2034 +MMRP0_SI717 +MMRP0_SI774 +MMRP0_SX234 +MMRP0_SX414 +MMRP0_SX54 +MMSM0_SA1 +MMSM0_SA2 +MMSM0_SI1736 +MMSM0_SX26 +MMSM0_SX296 +MMSM0_SX386 +MMVP0_SI1284 +MMVP0_SI1914 +MMVP0_SX114 +MMVP0_SX204 +MMVP0_SX294 +MMVP0_SX384 +MMWB0_SA2 +MMWB0_SI1619 +MMWB0_SX179 +MMWB0_SX269 +MMWS0_SA1 +MMWS0_SI1518 +MMWS0_SI559 
+MMWS0_SI888 +MMWS0_SX258 +MMWS0_SX78 +MMWS1_SA1 +MMWS1_SA2 +MMWS1_SI1071 +MMWS1_SI2331 +MMWS1_SX261 +MMWS1_SX27 +MMWS1_SX351 +MMWS1_SX441 +MMWS1_SX81 +MMXS0_SA1 +MMXS0_SA2 +MMXS0_SI629 +MMXS0_SI876 +MMXS0_SX156 +MMXS0_SX336 +MMXS0_SX66 +MNET0_SA1 +MNET0_SA2 +MNET0_SI1446 +MNET0_SI2076 +MNET0_SX186 +MNET0_SX276 +MNET0_SX366 +MNET0_SX96 +MNTW0_SA1 +MNTW0_SI2328 +MNTW0_SX202 +MNTW0_SX258 +MNTW0_SX348 +MPAR0_SA1 +MPAR0_SA2 +MPAR0_SI1576 +MPAR0_SX226 +MPAR0_SX406 +MPAR0_SX46 +MPEB0_SA1 +MPEB0_SA2 +MPEB0_SX150 +MPEB0_SX420 +MPEB0_SX60 +MPFU0_SA1 +MPFU0_SA2 +MPFU0_SI1888 +MPFU0_SX178 +MPFU0_SX268 +MPFU0_SX358 +MPFU0_SX88 +MPGH0_SA1 +MPGH0_SA2 +MPGH0_SI1554 +MPGH0_SI924 +MPGH0_SX204 +MPGH0_SX294 +MPGH0_SX384 +MPGR0_SA1 +MPGR0_SA2 +MPGR0_SI2040 +MPGR0_SI780 +MPGR0_SX150 +MPGR0_SX420 +MPGR0_SX60 +MPGR1_SA1 +MPGR1_SA2 +MPGR1_SI1269 +MPGR1_SI2129 +MPGR1_SX239 +MPGR1_SX329 +MPGR1_SX419 +MPGR1_SX59 +MPMB0_SX241 +MPPC0_SA2 +MPPC0_SI2042 +MPPC0_SI782 +MPPC0_SX152 +MPPC0_SX242 +MPPC0_SX332 +MPPC0_SX422 +MPPC0_SX62 +MPRB0_SA1 +MPRB0_SA2 +MPRB0_SI1205 +MPRB0_SX125 +MPRB0_SX215 +MPRB0_SX305 +MPRB0_SX35 +MPRB0_SX395 +MPRD0_SA2 +MPRD0_SI1431 +MPRD0_SI2061 +MPRK0_SA2 +MPRK0_SX17 +MPRK0_SX197 +MPRT0_SA2 +MPRT0_SI1210 +MPRT0_SI495 +MPRT0_SI580 +MPRT0_SX130 +MPRT0_SX220 +MPRT0_SX40 +MPRT0_SX400 +MPSW0_SA1 +MPSW0_SA2 +MPSW0_SI1697 +MPSW0_SI2327 +MPSW0_SX24 +MPSW0_SX257 +MPSW0_SX77 +MRAB0_SA1 +MRAB0_SA2 +MRAB0_SI1224 +MRAB0_SI594 +MRAB0_SX144 +MRAB0_SX234 +MRAB0_SX324 +MRAB0_SX414 +MRAB0_SX54 +MRAB1_SA1 +MRAB1_SA2 +MRAB1_SI1478 +MRAB1_SI2108 +MRAB1_SX218 +MRAB1_SX38 +MRAB1_SX398 +MRAI0_SI1954 +MRAI0_SX162 +MRAI0_SX252 +MRAI0_SX342 +MRAM0_SI1275 +MRAM0_SI1905 +MRAM0_SX105 +MRAM0_SX195 +MRAM0_SX285 +MRAM0_SX375 +MRAV0_SA1 +MRAV0_SA2 +MRAV0_SI1008 +MRAV0_SI1638 +MRAV0_SI2268 +MRAV0_SX108 +MRAV0_SX18 +MRAV0_SX198 +MRAV0_SX288 +MRAV0_SX378 +MRBC0_SA1 +MRBC0_SA2 +MRBC0_SI1665 +MRBC0_SI599 +MRBC0_SX149 +MRBC0_SX239 +MRBC0_SX59 +MRCG0_SA1 +MRCG0_SI2058 +MRCG0_SX258 +MRCG0_SX78 +MRCW0_SA2 +MRCW0_SI1371 +MRCW0_SI2001 +MRCW0_SX111 +MRCW0_SX201 +MRCW0_SX21 +MRCW0_SX381 +MRDD0_SA1 +MRDD0_SA2 +MRDD0_SI1050 +MRDD0_SI2310 +MRDD0_SX240 +MRDD0_SX330 +MRDM0_SA1 +MRDM0_SA2 +MRDM0_SI965 +MRDM0_SX155 +MRDM0_SX245 +MRDM0_SX425 +MRDS0_SA2 +MRDS0_SI1167 +MRDS0_SI1797 +MRDS0_SI537 +MRDS0_SX177 +MRDS0_SX267 +MRDS0_SX357 +MRDS0_SX447 +MRDS0_SX87 +MREE0_SA1 +MREE0_SA2 +MREE0_SI1734 +MREE0_SX114 +MREE0_SX204 +MREE0_SX294 +MREE0_SX384 +MREH1_SA2 +MREH1_SI2229 +MREH1_SX159 +MREH1_SX339 +MREH1_SX429 +MREM0_SA1 +MREM0_SI1591 +MREM0_SI961 +MREM0_SX151 +MREM0_SX241 +MREM0_SX331 +MREM0_SX421 +MREM0_SX61 +MREW1_SA1 +MREW1_SA2 +MREW1_SI1500 +MREW1_SI2130 +MREW1_SX150 +MREW1_SX240 +MREW1_SX330 +MREW1_SX420 +MREW1_SX60 +MRFK0_SA1 +MRFK0_SA2 +MRFK0_SI1706 +MRFK0_SI2336 +MRFK0_SX176 +MRFK0_SX266 +MRFK0_SX356 +MRFK0_SX86 +MRFL0_SA2 +MRFL0_SI1786 +MRFL0_SX346 +MRGM0_SA1 +MRGM0_SI1162 +MRGM0_SI1792 +MRGM0_SX416 +MRGM0_SX82 +MRGS0_SA1 +MRGS0_SI1986 +MRGS0_SX276 +MRGS0_SX366 +MRGS0_SX96 +MRHL0_SA1 +MRHL0_SA2 +MRHL0_SI1515 +MRHL0_SI2145 +MRHL0_SX165 +MRHL0_SX255 +MRHL0_SX75 +MRJB1_SI1020 +MRJB1_SX300 +MRJH0_SA1 +MRJH0_SI914 +MRJH0_SX259 +MRJH0_SX439 +MRJM0_SA1 +MRJM0_SA2 +MRJM0_SI1095 +MRJM0_SI1228 +MRJM0_SI1858 +MRJM0_SX238 +MRJM0_SX328 +MRJM0_SX418 +MRJM0_SX58 +MRJM1_SA1 +MRJM1_SI668 +MRJM1_SX218 +MRJM1_SX308 +MRJM1_SX38 +MRJM1_SX398 +MRJT0_SA1 +MRJT0_SI1805 +MRJT0_SX148 +MRJT0_SX238 +MRKM0_SA1 +MRKM0_SX187 +MRKM0_SX277 +MRKM0_SX7 +MRKM0_SX97 +MRLD0_SA1 +MRLD0_SI1594 +MRLD0_SI964 +MRLD0_SX244 +MRLD0_SX334 +MRLD0_SX64 +MRLJ0_SA2 +MRLJ0_SI1420 +MRLJ0_SI2050 
+MRLJ0_SX160 +MRLJ0_SX430 +MRLJ0_SX70 +MRLJ1_SI1671 +MRLJ1_SI2332 +MRLJ1_SX141 +MRLJ1_SX231 +MRLJ1_SX411 +MRLJ1_SX51 +MRLK0_SA1 +MRLK0_SA2 +MRLK0_SI2140 +MRLK0_SX303 +MRLK0_SX33 +MRLK0_SX393 +MRLR0_SA1 +MRLR0_SA2 +MRLR0_SI1826 +MRLR0_SI566 +MRLR0_SX116 +MRLR0_SX206 +MRLR0_SX26 +MRLR0_SX296 +MRLR0_SX386 +MRMB0_SA1 +MRMB0_SI2211 +MRMB0_SI951 +MRMB0_SX141 +MRMB0_SX231 +MRMB0_SX321 +MRMB0_SX51 +MRMG0_SA2 +MRMG0_SI1710 +MRMG0_SI2340 +MRMG0_SX180 +MRMG0_SX270 +MRMG0_SX360 +MRMG0_SX90 +MRMH0_SA1 +MRMH0_SA2 +MRMH0_SI1021 +MRMH0_SX211 +MRMH0_SX301 +MRMH0_SX31 +MRMH0_SX391 +MRML0_SI2051 +MRML0_SI791 +MRML0_SX431 +MRML0_SX71 +MRMS0_SA1 +MRMS0_SA2 +MRMS0_SI1113 +MRMS0_SI2100 +MRMS0_SX120 +MRMS0_SX210 +MRMS0_SX30 +MRMS0_SX300 +MRMS0_SX390 +MRPC1_SA1 +MRPC1_SA2 +MRPC1_SI1482 +MRPC1_SI2026 +MRPC1_SX132 +MRPC1_SX222 +MRPC1_SX312 +MRPC1_SX402 +MRPC1_SX42 +MRRE0_SI704 +MRRE0_SX254 +MRRE0_SX434 +MRSO0_SA1 +MRSO0_SA2 +MRSO0_SI1659 +MRSO0_SI2289 +MRSO0_SX219 +MRSO0_SX309 +MRSO0_SX399 +MRSP0_SA1 +MRSP0_SA2 +MRSP0_SI2059 +MRSP0_SI799 +MRSP0_SX169 +MRSP0_SX196 +MRSP0_SX439 +MRSP0_SX79 +MRTC0_SA1 +MRTC0_SA2 +MRTC0_SI2088 +MRTC0_SI828 +MRTC0_SX108 +MRTC0_SX18 +MRTC0_SX198 +MRTC0_SX288 +MRTJ0_SA2 +MRTJ0_SI1551 +MRTJ0_SI2032 +MRTJ0_SX322 +MRTJ0_SX412 +MRVG0_SA1 +MRVG0_SA2 +MRVG0_SI1770 +MRVG0_SI510 +MRVG0_SX150 +MRVG0_SX330 +MRVG0_SX420 +MRVG0_SX60 +MRWA0_SA1 +MRWA0_SA2 +MRWA0_SI1603 +MRWA0_SI2233 +MRWA0_SX253 +MRWA0_SX343 +MRWA0_SX433 +MRWS0_SA1 +MRWS0_SA2 +MRWS0_SX112 +MRWS0_SX202 +MRWS0_SX292 +MRXB0_SA1 +MRXB0_SI1585 +MRXB0_SX145 +MRXB0_SX235 +MRXB0_SX325 +MRXB0_SX55 +MSAH1_SA1 +MSAH1_SA2 +MSAH1_SI1049 +MSAH1_SI2309 +MSAH1_SX149 +MSAH1_SX239 +MSAH1_SX329 +MSAH1_SX419 +MSAH1_SX59 +MSAS0_SA1 +MSAS0_SA2 +MSAS0_SI2006 +MSAS0_SX26 +MSAS0_SX296 +MSAT0_SA2 +MSAT0_SI1526 +MSAT0_SI2156 +MSAT0_SI896 +MSAT0_SX176 +MSAT0_SX266 +MSAT0_SX356 +MSAT0_SX446 +MSAT0_SX86 +MSAT1_SA1 +MSAT1_SA2 +MSAT1_SI1073 +MSAT1_SI1703 +MSAT1_SI2333 +MSAT1_SX173 +MSAT1_SX353 +MSDB0_SA1 +MSDB0_SA2 +MSDB0_SI1007 +MSDB0_SI1637 +MSDB0_SI2267 +MSDB0_SX107 +MSDB0_SX17 +MSDH0_SA1 +MSDH0_SA2 +MSDH0_SI2113 +MSDH0_SX260 +MSDH0_SX350 +MSDS0_SA2 +MSDS0_SI1707 +MSDS0_SI2337 +MSDS0_SX177 +MSDS0_SX447 +MSDS0_SX87 +MSEM1_SA1 +MSEM1_SA2 +MSEM1_SX360 +MSEM1_SX450 +MSEM1_SX90 +MSES0_SA1 +MSES0_SA2 +MSES0_SI2216 +MSES0_SI2219 +MSES0_SX149 +MSES0_SX329 +MSES0_SX59 +MSFH0_SA2 +MSFH0_SI1216 +MSFH0_SI586 +MSFH0_SX226 +MSFH0_SX46 +MSFV0_SA1 +MSFV0_SA2 +MSFV0_SI1262 +MSFV0_SX182 +MSFV0_SX272 +MSFV0_SX452 +MSJK0_SA1 +MSJK0_SA2 +MSJK0_SI2226 +MSJK0_SI966 +MSJK0_SX156 +MSJK0_SX246 +MSJK0_SX426 +MSJK0_SX66 +MSMC0_SA1 +MSMC0_SA2 +MSMC0_SI1907 +MSMC0_SI647 +MSMC0_SX107 +MSMC0_SX17 +MSMC0_SX197 +MSMC0_SX287 +MSMC0_SX377 +MSMR0_SA1 +MSMR0_SA2 +MSMR0_SI1405 +MSMR0_SI775 +MSMR0_SX145 +MSMR0_SX235 +MSMR0_SX325 +MSMR0_SX55 +MSMS0_SA2 +MSMS0_SI2063 +MSMS0_SI803 +MSMS0_SX263 +MSMS0_SX353 +MSMS0_SX443 +MSRG0_SA2 +MSRG0_SI1851 +MSRG0_SI591 +MSRG0_SX141 +MSRG0_SX231 +MSRG0_SX321 +MSRG0_SX411 +MSRG0_SX51 +MSRR0_SA1 +MSRR0_SA2 +MSRR0_SI1131 +MSRR0_SX141 +MSRR0_SX231 +MSRR0_SX30 +MSRR0_SX411 +MSRR0_SX51 +MSTF0_SA1 +MSTF0_SA2 +MSTF0_SI1396 +MSTF0_SX136 +MSTF0_SX226 +MSTF0_SX406 +MSVS0_SA1 +MSVS0_SI1568 +MSVS0_SX128 +MSVS0_SX218 +MSVS0_SX38 +MTAB0_SA1 +MTAB0_SA2 +MTAB0_SI2202 +MTAB0_SI942 +MTAB0_SX132 +MTAB0_SX222 +MTAB0_SX402 +MTAB0_SX42 +MTAS0_SA1 +MTAS0_SA2 +MTAS0_SI1385 +MTAS0_SI2015 +MTAS0_SI755 +MTAS0_SX125 +MTAS0_SX305 +MTAT0_SA2 +MTAT0_SI1740 +MTAT0_SX120 +MTAT0_SX210 +MTAT0_SX30 +MTAT0_SX300 +MTAT1_SA1 +MTAT1_SA2 +MTAT1_SI1409 +MTAT1_SI1627 +MTAT1_SX239 +MTAT1_SX419 +MTBC0_SA1 
+MTBC0_SA2 +MTBC0_SI1173 +MTBC0_SX183 +MTBC0_SX273 +MTBC0_SX347 +MTBC0_SX363 +MTBC0_SX93 +MTCS0_SA1 +MTCS0_SI1972 +MTCS0_SX172 +MTCS0_SX262 +MTCS0_SX352 +MTCS0_SX442 +MTDB0_SA1 +MTDB0_SA2 +MTDB0_SI2031 +MTDB0_SX141 +MTDB0_SX231 +MTDB0_SX321 +MTDB0_SX411 +MTDB0_SX51 +MTDP0_SI1274 +MTDP0_SI2151 +MTDP0_SX261 +MTDP0_SX441 +MTDP0_SX81 +MTER0_SI527 +MTER0_SX167 +MTER0_SX17 +MTER0_SX257 +MTER0_SX77 +MTJG0_SA2 +MTJG0_SI1520 +MTJG0_SI890 +MTJG0_SX350 +MTJG0_SX440 +MTJG0_SX80 +MTJM0_SA1 +MTJM0_SA2 +MTJM0_SI1226 +MTJM0_SI655 +MTJM0_SX236 +MTJM0_SX326 +MTJM0_SX416 +MTJM0_SX56 +MTJS0_SA1 +MTJS0_SI1192 +MTJS0_SX112 +MTJS0_SX202 +MTJS0_SX22 +MTJS0_SX292 +MTJU0_SA1 +MTJU0_SA2 +MTJU0_SI2269 +MTJU0_SI760 +MTJU0_SX220 +MTJU0_SX310 +MTJU0_SX40 +MTKD0_SA1 +MTKD0_SA2 +MTKD0_SI1187 +MTKD0_SI1817 +MTKD0_SX17 +MTKD0_SX197 +MTKD0_SX377 +MTKP0_SA1 +MTKP0_SA2 +MTKP0_SX123 +MTKP0_SX213 +MTKP0_SX303 +MTKP0_SX33 +MTKP0_SX393 +MTLB0_SA2 +MTLB0_SI1764 +MTLB0_SI504 +MTLB0_SX144 +MTLB0_SX414 +MTLB0_SX54 +MTLC0_SA2 +MTLC0_SI847 +MTLC0_SX127 +MTLC0_SX217 +MTLC0_SX307 +MTLC0_SX37 +MTLC0_SX397 +MTML0_SA1 +MTML0_SA2 +MTML0_SI1065 +MTML0_SI1695 +MTML0_SX255 +MTML0_SX345 +MTML0_SX75 +MTMN0_SA1 +MTMN0_SX164 +MTMN0_SX254 +MTMN0_SX344 +MTMN0_SX74 +MTMT0_SA1 +MTMT0_SI1118 +MTMT0_SX128 +MTMT0_SX218 +MTMT0_SX308 +MTMT0_SX38 +MTMT0_SX398 +MTPF0_SA1 +MTPF0_SA2 +MTPF0_SI1235 +MTPF0_SI1865 +MTPF0_SI605 +MTPF0_SX155 +MTPF0_SX245 +MTPF0_SX335 +MTPF0_SX425 +MTPG0_SA1 +MTPG0_SA2 +MTPG0_SI2013 +MTPG0_SX123 +MTPG0_SX213 +MTPG0_SX33 +MTPG0_SX393 +MTPP0_SA1 +MTPP0_SA2 +MTPP0_SI2138 +MTPP0_SI878 +MTPP0_SX158 +MTPP0_SX248 +MTPP0_SX428 +MTPP0_SX68 +MTPR0_SA1 +MTPR0_SA2 +MTPR0_SI1600 +MTPR0_SI506 +MTPR0_SX250 +MTPR0_SX70 +MTQC0_SA2 +MTQC0_SI2071 +MTQC0_SX271 +MTQC0_SX361 +MTRC0_SA1 +MTRC0_SA2 +MTRC0_SI1623 +MTRC0_SI993 +MTRC0_SX170 +MTRC0_SX183 +MTRC0_SX273 +MTRC0_SX363 +MTRC0_SX93 +MTRR0_SA1 +MTRR0_SA2 +MTRR0_SI1548 +MTRR0_SI2178 +MTRR0_SX108 +MTRR0_SX18 +MTRR0_SX378 +MTRT0_SA1 +MTRT0_SI1857 +MTRT0_SI597 +MTRT0_SX147 +MTRT0_SX237 +MTRT0_SX417 +MTWH1_SA1 +MTWH1_SA2 +MTWH1_SI1512 +MTWH1_SI2142 +MTWH1_SI882 +MTWH1_SX162 +MTWH1_SX252 +MTWH1_SX342 +MTWH1_SX432 +MTXS0_SI1690 +MTXS0_SX250 +MTXS0_SX340 +MTXS0_SX70 +MVJH0_SA1 +MVJH0_SA2 +MVJH0_SI2186 +MVJH0_SX116 +MVJH0_SX26 +MVJH0_SX386 +MVLO0_SA2 +MVLO0_SI1147 +MVLO0_SI1777 +MVLO0_SX157 +MVLO0_SX247 +MVLO0_SX337 +MVLO0_SX427 +MVLO0_SX67 +MVRW0_SA1 +MVRW0_SI1485 +MVRW0_SI2115 +MVRW0_SI855 +MVRW0_SX315 +MVRW0_SX405 +MVRW0_SX45 +MWAC0_SA1 +MWAC0_SI2231 +MWAC0_SI971 +MWAC0_SX71 +MWAD0_SA1 +MWAD0_SA2 +MWAD0_SI1062 +MWAD0_SI1749 +MWAD0_SI2322 +MWAD0_SX162 +MWAD0_SX252 +MWAD0_SX342 +MWAR0_SA2 +MWAR0_SI2305 +MWAR0_SX145 +MWAR0_SX235 +MWAR0_SX325 +MWAR0_SX415 +MWAR0_SX55 +MWCH0_SA1 +MWCH0_SA2 +MWCH0_SI1622 +MWCH0_SX272 +MWCH0_SX362 +MWCH0_SX92 +MWDK0_SX266 +MWDK0_SX356 +MWDK0_SX446 +MWEM0_SA1 +MWEM0_SI1950 +MWEM0_SX240 +MWEM0_SX330 +MWEM0_SX60 +MWGR0_SA1 +MWGR0_SA2 +MWGR0_SI1606 +MWGR0_SI2236 +MWGR0_SI976 +MWGR0_SX166 +MWGR0_SX256 +MWGR0_SX436 +MWGR0_SX76 +MWRE0_SA1 +MWRE0_SI1687 +MWRE0_SI2317 +MWRE0_SX157 +MWRP0_SA2 +MWRP0_SI1525 +MWRP0_SI2073 +MWRP0_SX183 +MWRP0_SX3 +MWRP0_SX93 +MWSB0_SA1 +MWSB0_SA2 +MWSB0_SI1626 +MWSB0_SI2256 +MWSB0_SX186 +MWSB0_SX366 +MWSB0_SX6 +MWSB0_SX96 +MWSH0_SA1 +MWSH0_SA2 +MWSH0_SI2266 +MWSH0_SX346 +MWSH0_SX436 +MZMB0_SA2 +MZMB0_SI1166 +MZMB0_SI1796 +MZMB0_SI536 +MZMB0_SX176 +MZMB0_SX266 +MZMB0_SX356 +MZMB0_SX446 +MZMB0_SX86 diff --git a/SpeechT5/fairseq/examples/wav2vec/unsupervised/config/timit_unmatched/train_text.uid 
b/SpeechT5/fairseq/examples/wav2vec/unsupervised/config/timit_unmatched/train_text.uid new file mode 100644 index 0000000000000000000000000000000000000000..0e0c2517c9415ce76d5863781f621402cd15b911 --- /dev/null +++ b/SpeechT5/fairseq/examples/wav2vec/unsupervised/config/timit_unmatched/train_text.uid @@ -0,0 +1,1000 @@ +FAEM0_SI762 +FAEM0_SX42 +FAJW0_SA1 +FAJW0_SX3 +FAJW0_SX93 +FALK0_SX186 +FALK0_SX6 +FALR0_SI1325 +FBAS0_SA1 +FBAS0_SX217 +FBCG1_SA1 +FBCG1_SX172 +FBCG1_SX442 +FBCH0_SX236 +FBCH0_SX416 +FBLV0_SA1 +FBLV0_SI1058 +FBLV0_SX338 +FBLV0_SX68 +FBMH0_SA1 +FBMJ0_SI815 +FCAG0_SA1 +FCAG0_SX153 +FCAG0_SX243 +FCAJ0_SI1479 +FCAJ0_SX309 +FCDR1_SX106 +FCDR1_SX196 +FCEG0_SA2 +FCJF0_SA1 +FCJF0_SX127 +FCJS0_SI1607 +FCJS0_SI2237 +FCJS0_SX257 +FCKE0_SA2 +FCKE0_SX121 +FCLT0_SI2068 +FCLT0_SX448 +FCLT0_SX88 +FCMG0_SA2 +FCMG0_SI1872 +FCMG0_SX72 +FCMM0_SA1 +FCMM0_SA2 +FCMM0_SX183 +FCRZ0_SI2053 +FCRZ0_SX433 +FCYL0_SA1 +FCYL0_SX37 +FDAS1_SI2091 +FDAS1_SX201 +FDAS1_SX381 +FDAW0_SI1406 +FDFB0_SA1 +FDFB0_SA2 +FDFB0_SI2010 +FDFB0_SX58 +FDJH0_SX305 +FDML0_SA2 +FDML0_SX159 +FDML0_SX249 +FDML0_SX429 +FDMY0_SA2 +FDMY0_SX27 +FDNC0_SX198 +FDNC0_SX288 +FDTD0_SX211 +FDXW0_SA1 +FDXW0_SX251 +FDXW0_SX341 +FDXW0_SX71 +FEAC0_SX165 +FEAC0_SX75 +FEAR0_SI622 +FECD0_SX68 +FEEH0_SA1 +FEEH0_SI1742 +FEEH0_SI471 +FEEH0_SX122 +FEME0_SA1 +FEME0_SX155 +FEME0_SX65 +FETB0_SA1 +FETB0_SI1148 +FETB0_SX158 +FEXM0_SI1101 +FGCS0_SX136 +FGCS0_SX226 +FGCS0_SX316 +FGCS0_SX406 +FGDP0_SA1 +FGMB0_SI1775 +FGMB0_SX245 +FHLM0_SX390 +FHXS0_SA2 +FHXS0_SX445 +FJDM2_SA1 +FJDM2_SX232 +FJDM2_SX52 +FJHK0_SX302 +FJKL0_SX212 +FJKL0_SX392 +FJLG0_SI2306 +FJLR0_SA1 +FJRP1_SI2062 +FJRP1_SX82 +FJSK0_SA1 +FJSP0_SX264 +FJSP0_SX354 +FJSP0_SX444 +FJWB1_SA1 +FJWB1_SX345 +FJWB1_SX435 +FJXM0_SA1 +FJXM0_SI581 +FJXM0_SX401 +FJXP0_SA1 +FJXP0_SI1122 +FJXP0_SX132 +FKAA0_SX128 +FKAA0_SX398 +FKDE0_SA1 +FKDE0_SX151 +FKDE0_SX241 +FKDE0_SX421 +FKDE0_SX61 +FKDW0_SX397 +FKFB0_SA2 +FKFB0_SX348 +FKFB0_SX78 +FKKH0_SA1 +FKKH0_SA2 +FKKH0_SX120 +FKKH0_SX390 +FKLC0_SX355 +FKLC1_SI2308 +FKLC1_SX238 +FKLC1_SX328 +FKLC1_SX418 +FKLH0_SA2 +FKLH0_SX177 +FKSR0_SA1 +FKSR0_SA2 +FKSR0_SI1747 +FKSR0_SI487 +FKSR0_SX217 +FLAC0_SX451 +FLAG0_SA2 +FLAG0_SX114 +FLAG0_SX204 +FLAG0_SX24 +FLAG0_SX384 +FLEH0_SI1681 +FLEH0_SI2311 +FLEH0_SX331 +FLET0_SA1 +FLHD0_SI1827 +FLHD0_SX354 +FLJA0_SA1 +FLJA0_SI2338 +FLJD0_SI886 +FLJD0_SX76 +FLJG0_SA2 +FLKM0_SA2 +FLKM0_SI686 +FLKM0_SX260 +FLKM0_SX80 +FLMA0_SA1 +FLMA0_SI613 +FLMA0_SX433 +FLMA0_SX73 +FLMC0_SX22 +FLMK0_SI1035 +FLMK0_SX315 +FLMK0_SX405 +FLOD0_SI1917 +FLOD0_SX117 +FLOD0_SX171 +FLOD0_SX297 +FLTM0_SA1 +FLTM0_SI1070 +FLTM0_SI2330 +FMAH1_SA2 +FMAH1_SX159 +FMBG0_SA2 +FMBG0_SI2264 +FMEM0_SI747 +FMEM0_SX387 +FMJB0_SI547 +FMJB0_SX97 +FMJF0_SA2 +FMJU0_SX309 +FMJU0_SX399 +FMKC0_SI1702 +FMKC0_SX442 +FMKC0_SX82 +FMKF0_SX186 +FMPG0_SA2 +FNKL0_SI1522 +FNTB0_SI1203 +FNTB0_SI573 +FNTB0_SX303 +FPAB1_SI1471 +FPAB1_SX211 +FPAC0_SA2 +FPAD0_SA2 +FPAD0_SX356 +FPAD0_SX86 +FPAF0_SA2 +FPAF0_SX154 +FPAZ0_SA1 +FPAZ0_SA2 +FPAZ0_SX243 +FPJF0_SA1 +FPJF0_SX146 +FPJF0_SX56 +FPLS0_SI1590 +FPLS0_SX330 +FPMY0_SA1 +FPMY0_SX343 +FREH0_SA1 +FREH0_SA2 +FREH0_SX415 +FRJB0_SX347 +FRLL0_SX434 +FSAG0_SA1 +FSAG0_SX243 +FSAH0_SA1 +FSAH0_SA2 +FSAH0_SX164 +FSAH0_SX434 +FSBK0_SA2 +FSBK0_SI1069 +FSBK0_SX169 +FSCN0_SA2 +FSCN0_SI626 +FSCN0_SX266 +FSCN0_SX446 +FSCN0_SX86 +FSDC0_SA2 +FSDC0_SX142 +FSDC0_SX322 +FSDC0_SX52 +FSDJ0_SI485 +FSDJ0_SX215 +FSDJ0_SX305 +FSDJ0_SX395 +FSGF0_SX117 +FSJG0_SX130 +FSJK1_SA2 +FSJK1_SX125 +FSJK1_SX35 +FSJS0_SX181 +FSJW0_SI1963 +FSJW0_SX433 +FSKC0_SI1416 +FSKC0_SI786 +FSKC0_SX246 
+FSKL0_SI1529 +FSKL0_SX449 +FSKP0_SA2 +FSLS0_SX156 +FSLS0_SX426 +FSMA0_SA2 +FSMA0_SX181 +FSMM0_SX144 +FSMM0_SX234 +FSMS1_SX244 +FSMS1_SX347 +FSPM0_SA2 +FSPM0_SX161 +FSPM0_SX71 +FSRH0_SI1931 +FSRH0_SI671 +FSRH0_SX221 +FSRH0_SX401 +FTAJ0_SI699 +FTAJ0_SX159 +FTAJ0_SX249 +FTAJ0_SX429 +FTBR0_SX21 +FTBW0_SA1 +FTMG0_SI1532 +FTMG0_SI2162 +FTMG0_SX452 +FVFB0_SA2 +FVFB0_SX132 +FVFB0_SX42 +FVKB0_SA1 +FVMH0_SA2 +FVMH0_SX116 +FVMH0_SX26 +MABC0_SI1620 +MABC0_SI2041 +MABC0_SI781 +MADC0_SX107 +MADC0_SX377 +MADD0_SA2 +MADD0_SI1295 +MADD0_SX178 +MADD0_SX268 +MADD0_SX88 +MAEB0_SX450 +MAEO0_SA1 +MAFM0_SI939 +MAFM0_SX129 +MAFM0_SX309 +MAJP0_SA2 +MAKB0_SI1646 +MAKB0_SX26 +MAKB0_SX386 +MAKR0_SX362 +MAKR0_SX92 +MAPV0_SX213 +MARC0_SA2 +MARC0_SX108 +MARC0_SX18 +MARC0_SX198 +MARW0_SI1906 +MBAR0_SA1 +MBAR0_SX419 +MBAR0_SX59 +MBBR0_SI2315 +MBBR0_SX65 +MBCG0_SA1 +MBCG0_SI486 +MBEF0_SI1281 +MBEF0_SI1911 +MBEF0_SI651 +MBEF0_SX21 +MBEF0_SX381 +MBGT0_SA2 +MBGT0_SX261 +MBGT0_SX351 +MBGT0_SX441 +MBJV0_SA1 +MBJV0_SI617 +MBJV0_SX347 +MBMA0_SI592 +MBMA0_SX232 +MBMA0_SX52 +MBMA1_SI2214 +MBMA1_SX54 +MBML0_SA2 +MBML0_SI1169 +MBML0_SX89 +MBOM0_SA2 +MBOM0_SI2274 +MBOM0_SX294 +MBSB0_SA1 +MBSB0_SX3 +MBTH0_SA2 +MBTH0_SX122 +MBTH0_SX32 +MCAE0_SX277 +MCAL0_SA2 +MCAL0_SI1768 +MCDC0_SA1 +MCDC0_SX212 +MCDD0_SA2 +MCDD0_SI883 +MCDD0_SX253 +MCDD0_SX433 +MCDR0_SI1154 +MCEF0_SX235 +MCEF0_SX415 +MCEW0_SA2 +MCHL0_SX87 +MCLK0_SX310 +MCLM0_SA1 +MCLM0_SI2086 +MCLM0_SI826 +MCPM0_SA1 +MCPM0_SX114 +MCPM0_SX294 +MCPM0_SX384 +MCSS0_SI750 +MCTH0_SA1 +MCTH0_SX39 +MCXM0_SX91 +MDAC0_SA1 +MDAC0_SX181 +MDAC0_SX361 +MDAS0_SX6 +MDBB1_SX106 +MDBB1_SX16 +MDBB1_SX376 +MDBP0_SX168 +MDCD0_SI1415 +MDCD0_SX245 +MDCD0_SX425 +MDCM0_SX40 +MDCM0_SX400 +MDDC0_SI2049 +MDDC0_SI789 +MDDC0_SX159 +MDDC0_SX69 +MDED0_SA1 +MDED0_SA2 +MDEF0_SX123 +MDEF0_SX303 +MDHL0_SI1439 +MDHL0_SX269 +MDHL0_SX449 +MDHS0_SA1 +MDHS0_SA2 +MDHS0_SI1530 +MDHS0_SI2160 +MDJM0_SX105 +MDJM0_SX15 +MDKS0_SX436 +MDLB0_SA2 +MDLC0_SX405 +MDLC1_SA2 +MDLC1_SI2065 +MDLC1_SI2144 +MDLC1_SX445 +MDLC2_SI2244 +MDLC2_SX354 +MDLH0_SA2 +MDLM0_SI1234 +MDLM0_SI1864 +MDLM0_SX154 +MDLM0_SX424 +MDLR0_SA1 +MDLR0_SA2 +MDLR0_SI1863 +MDLR0_SI603 +MDLR0_SX153 +MDLR1_SA1 +MDLR1_SA2 +MDMA0_SI1430 +MDMA0_SX260 +MDMA0_SX80 +MDMT0_SA1 +MDMT0_SA2 +MDMT0_SI1832 +MDMT0_SX122 +MDMT0_SX32 +MDNS0_SA2 +MDNS0_SI2271 +MDNS0_SX201 +MDNS0_SX21 +MDPB0_SX416 +MDPK0_SI1053 +MDPK0_SX333 +MDPK0_SX423 +MDPS0_SI719 +MDPS0_SX359 +MDRD0_SA1 +MDRD0_SX32 +MDSJ0_SI2092 +MDSS0_SA2 +MDSS0_SX441 +MDSS1_SA1 +MDSS1_SI1327 +MDSS1_SI697 +MDSS1_SX157 +MDSS1_SX67 +MDTB0_SI1200 +MDTB0_SI1830 +MDTB0_SX120 +MDWD0_SA2 +MDWD0_SX270 +MDWD0_SX90 +MDWH0_SX215 +MDWH0_SX305 +MDWM0_SA1 +MDWM0_SA2 +MDWM0_SX16 +MDWM0_SX286 +MEAL0_SA2 +MEAL0_SI2177 +MEAL0_SX107 +MEAL0_SX347 +MEDR0_SA1 +MEDR0_SA2 +MEDR0_SI1374 +MEFG0_SA1 +MEGJ0_SA2 +MEGJ0_SX257 +MEGJ0_SX3 +MEJL0_SA1 +MEJL0_SX152 +MEJL0_SX242 +MEJS0_SI610 +MEJS0_SX160 +MEJS0_SX340 +MESG0_SX432 +MESJ0_SX187 +MESJ0_SX97 +MEWM0_SI718 +MEWM0_SX178 +MEWM0_SX88 +MFER0_SI862 +MFER0_SX142 +MFRM0_SX345 +MFRM0_SX435 +MFWK0_SI1879 +MFWK0_SX169 +MFXS0_SX54 +MFXV0_SA2 +MFXV0_SX105 +MGAF0_SA1 +MGAF0_SX22 +MGAF0_SX382 +MGAG0_SA2 +MGAK0_SX226 +MGAK0_SX46 +MGAR0_SX132 +MGAW0_SI535 +MGAW0_SX175 +MGES0_SA1 +MGES0_SI2111 +MGES0_SI851 +MGJC0_SA2 +MGJC0_SX75 +MGRL0_SI2127 +MGRL0_SI867 +MGRL0_SX147 +MGRP0_SA2 +MGSH0_SA2 +MGSH0_SI1806 +MGSH0_SX127 +MGSH0_SX276 +MGSH0_SX6 +MGSL0_SA1 +MGSL0_SI534 +MGSL0_SX264 +MGXP0_SX187 +MGXP0_SX7 +MHBS0_SX315 +MHBS0_SX45 +MHIT0_SA1 +MHJB0_SA1 +MHJB0_SI1017 +MHMG0_SX195 +MHMR0_SA1 +MHMR0_SI489 +MHRM0_SA1 +MHRM0_SI958 
+MHRM0_SX148 +MHRM0_SX58 +MHXL0_SI1772 +MHXL0_SX242 +MILB0_SA2 +MJAC0_SX307 +MJAC0_SX71 +MJAE0_SX174 +MJAI0_SA1 +MJAI0_SA2 +MJBG0_SX62 +MJDA0_SI1031 +MJDA0_SX311 +MJDE0_SI463 +MJDG0_SA2 +MJDG0_SI1042 +MJDG0_SI1705 +MJDM0_SA1 +MJDM0_SI974 +MJEB0_SI656 +MJEB0_SX296 +MJEB1_SA2 +MJEB1_SX207 +MJEB1_SX387 +MJEE0_SA1 +MJEE0_SX247 +MJEE0_SX337 +MJFH0_SA2 +MJFH0_SI1107 +MJFR0_SX75 +MJHI0_SA1 +MJHI0_SX158 +MJJB0_SA1 +MJJB0_SX239 +MJJJ0_SX443 +MJJM0_SA2 +MJJM0_SI827 +MJJM0_SX107 +MJKR0_SA1 +MJKR0_SI571 +MJLB0_SX176 +MJLG1_SX292 +MJLS0_SX106 +MJMA0_SA1 +MJMA0_SA2 +MJMD0_SA2 +MJMD0_SX308 +MJMD0_SX38 +MJMM0_SX85 +MJPG0_SI1191 +MJPG0_SX111 +MJPG0_SX201 +MJPG0_SX21 +MJPM0_SA2 +MJPM0_SX378 +MJPM1_SI2280 +MJPM1_SX401 +MJRA0_SA1 +MJRA0_SA2 +MJRA0_SI1236 +MJRA0_SI1866 +MJRA0_SX426 +MJRG0_SI1366 +MJRG0_SI1996 +MJRG0_SX376 +MJRH0_SX225 +MJRH1_SA1 +MJRH1_SI514 +MJRH1_SX154 +MJRH1_SX244 +MJRH1_SX424 +MJRK0_SA1 +MJRK0_SA2 +MJRK0_SI1662 +MJRK0_SX160 +MJRK0_SX250 +MJRK0_SX430 +MJRP0_SA1 +MJRP0_SA2 +MJRP0_SX225 +MJSR0_SA1 +MJSR0_SI1424 +MJSR0_SX344 +MJWG0_SA1 +MJWG0_SX265 +MJWS0_SI513 +MJWS0_SX153 +MJWS0_SX63 +MJWT0_SA1 +MJWT0_SX121 +MJWT0_SX211 +MJWT0_SX301 +MJWT0_SX31 +MJWT0_SX391 +MJXA0_SX427 +MJXL0_SI542 +MKAG0_SA1 +MKAG0_SX259 +MKAJ0_SA2 +MKAJ0_SX154 +MKAM0_SA1 +MKAM0_SX146 +MKAM0_SX326 +MKAM0_SX56 +MKDB0_SA1 +MKDB0_SA2 +MKDB0_SX152 +MKDD0_SA2 +MKES0_SA1 +MKES0_SI1253 +MKES0_SI1883 +MKES0_SX173 +MKJO0_SI1517 +MKJO0_SI887 +MKJO0_SX437 +MKLN0_SI968 +MKLN0_SX248 +MKLR0_SA2 +MKLR0_SI1689 +MKLS0_SA1 +MKLS0_SX357 +MKLS0_SX87 +MKLS1_SA1 +MKLS1_SA2 +MKLS1_SX375 +MKLW0_SA1 +MKRG0_SX411 +MKXL0_SA2 +MKXL0_SX15 +MKXL0_SX375 +MLBC0_SA1 +MLBC0_SI1869 +MLBC0_SX249 +MLEL0_SA1 +MLEL0_SA2 +MLEL0_SI1246 +MLEL0_SX256 +MLEL0_SX436 +MLJC0_SX145 +MLJC0_SX415 +MLJH0_SX64 +MLNS0_SI2037 +MMAA0_SA1 +MMAA0_SA2 +MMAA0_SX35 +MMAB1_SI1494 +MMAB1_SX234 +MMAG0_SA2 +MMAG0_SI1126 +MMAG0_SX316 +MMAM0_SI2227 +MMAM0_SX157 +MMAM0_SX427 +MMAR0_SX256 +MMBS0_SI1781 +MMCC0_SA2 +MMDB0_SX177 +MMDG0_SA1 +MMDG0_SA2 +MMDG0_SI520 +MMDG0_SX160 +MMDG0_SX250 +MMDM0_SI1941 +MMDM0_SI681 +MMDM0_SX141 +MMDM1_SA2 +MMDM1_SI2043 +MMDM1_SX423 +MMDM1_SX63 +MMDS0_SA1 +MMEA0_SA1 +MMEA0_SX128 +MMEA0_SX398 +MMEB0_SA2 +MMEB0_SX187 +MMEB0_SX367 +MMGC0_SA2 +MMGC0_SX135 +MMGC0_SX225 +MMGG0_SX269 +MMGK0_SX332 +MMGK0_SX62 +MMJB1_SA2 +MMRP0_SA2 +MMRP0_SX144 +MMSM0_SX116 +MMSM0_SX206 +MMVP0_SA1 +MMVP0_SA2 +MMWB0_SI989 +MMWB0_SX89 +MMWS0_SA2 +MMWS0_SX168 +MMWS0_SX348 +MMWS0_SX438 +MMWS1_SI1701 +MMXS0_SI2136 +MMXS0_SX246 +MMXS0_SX426 +MNET0_SI816 +MNET0_SX6 +MNTW0_SA2 +MNTW0_SX168 +MNTW0_SX78 +MPAR0_SI2206 +MPAR0_SI946 +MPAR0_SX136 +MPAR0_SX316 +MPEB0_SI1034 +MPEB0_SI1860 +MPEB0_SX240 +MPEB0_SX330 +MPFU0_SI628 +MPFU0_SX448 +MPGH0_SX114 +MPGH0_SX24 +MPGR0_SX240 +MPGR0_SX330 +MPGR1_SX149 +MPPC0_SA1 +MPRD0_SA1 +MPRD0_SX261 +MPRD0_SX351 +MPRD0_SX441 +MPRD0_SX81 +MPRK0_SI1727 +MPRK0_SX107 +MPRK0_SX377 +MPRT0_SA1 +MPRT0_SX310 +MPSW0_SI1067 +MPSW0_SX167 +MPSW0_SX437 +MRAB1_SX128 +MRAB1_SX308 +MRAI0_SA1 +MRAI0_SA2 +MRAI0_SX72 +MRAM0_SA1 +MRAM0_SA2 +MRAM0_SX15 +MRBC0_SI1859 +MRBC0_SX329 +MRBC0_SX419 +MRCG0_SI798 +MRCG0_SX168 +MRCW0_SA1 +MRCW0_SX291 +MRDD0_SI1680 +MRDD0_SX150 +MRDD0_SX277 +MRDD0_SX60 +MRDM0_SI1595 +MRDM0_SX65 +MRDS0_SA1 +MREE0_SX24 +MREH1_SX249 +MREH1_SX69 +MREM0_SA2 +MREW1_SI870 +MRFK0_SX446 +MRFL0_SA1 +MRFL0_SX256 +MRFL0_SX436 +MRFL0_SX76 +MRGM0_SA2 +MRGM0_SX262 +MRGS0_SA2 +MRGS0_SX186 +MRHL0_SI885 +MRHL0_SX345 +MRHL0_SX435 +MRJB1_SA1 +MRJB1_SA2 +MRJB1_SX210 +MRJB1_SX30 +MRJB1_SX390 +MRJH0_SA2 +MRJH0_SX307 +MRJH0_SX79 +MRJM0_SX148 +MRJM1_SA2 +MRJM1_SI1298 +MRJM1_SI1928 
+MRJM1_SX128 +MRJT0_SA2 +MRJT0_SI1498 +MRJT0_SX328 +MRJT0_SX418 +MRKM0_SA2 +MRKM0_SX367 +MRLD0_SA2 +MRLD0_SI2224 +MRLD0_SX154 +MRLD0_SX424 +MRLJ0_SA1 +MRLJ0_SX250 +MRLJ0_SX340 +MRLJ1_SA1 +MRLJ1_SA2 +MRLJ1_SX321 +MRLK0_SI843 +MRLK0_SX123 +MRLK0_SX213 +MRMB0_SA2 +MRMB0_SI1581 +MRMB0_SX411 +MRMG0_SA1 +MRMG0_SI1080 +MRMG0_SX450 +MRMH0_SI1349 +MRMH0_SI2281 +MRMH0_SX121 +MRML0_SA2 +MRML0_SX341 +MRPC1_SI2112 +MRRE0_SA2 +MRRE0_SX164 +MRRE0_SX344 +MRRE0_SX74 +MRSO0_SX129 +MRSO0_SX39 +MRSP0_SX259 +MRTC0_SX378 +MRVG0_SI1140 +MRVG0_SX240 +MRWA0_SI973 +MRWA0_SX163 +MRWA0_SX73 +MRWS0_SI1732 +MRWS0_SI472 +MRWS0_SX22 +MRWS0_SX382 +MRXB0_SA2 +MRXB0_SX415 +MSAH1_SI1679 +MSAS0_SX116 +MSAS0_SX206 +MSAS0_SX386 +MSAT0_SA1 +MSAT1_SX263 +MSAT1_SX443 +MSAT1_SX83 +MSDB0_SX197 +MSDB0_SX287 +MSDB0_SX377 +MSDH0_SI2240 +MSDH0_SX440 +MSDH0_SX80 +MSDS0_SA1 +MSEM1_SI1440 +MSEM1_SX180 +MSEM1_SX270 +MSES0_SI1589 +MSES0_SX239 +MSES0_SX419 +MSFH0_SX316 +MSFV0_SI1892 +MSFV0_SX362 +MSFV0_SX92 +MSMR0_SX415 +MSMS0_SA1 +MSMS0_SX173 +MSMS0_SX83 +MSRG0_SA1 +MSRG0_SI1221 +MSTF0_SI766 +MSTF0_SX316 +MSTF0_SX46 +MSVS0_SA2 +MSVS0_SX308 +MTAS0_SX215 +MTAS0_SX35 +MTAS0_SX395 +MTAT0_SX390 +MTAT1_SX59 +MTBC0_SI1803 +MTCS0_SA2 +MTCS0_SI2265 +MTCS0_SX82 +MTDP0_SA2 +MTER0_SA2 +MTER0_SI1787 +MTJG0_SA1 +MTJG0_SI2157 +MTJG0_SX260 +MTJM0_SI1856 +MTJM0_SX146 +MTJU0_SX130 +MTJU0_SX400 +MTKD0_SX107 +MTKD0_SX287 +MTKP0_SI1023 +MTLB0_SA1 +MTLB0_SX234 +MTLC0_SA1 +MTML0_SI2325 +MTML0_SX165 +MTMN0_SA2 +MTMN0_SI1064 +MTMN0_SI2324 +MTMN0_SX434 +MTMT0_SA2 +MTMT0_SI1748 +MTPF0_SX65 +MTPG0_SI1383 +MTPG0_SI753 +MTPG0_SX303 +MTPP0_SX338 +MTPR0_SX340 +MTQC0_SI480 +MTQC0_SX91 +MTRR0_SX198 +MTRR0_SX288 +MTRT0_SA2 +MTRT0_SX254 +MTRT0_SX57 +MTWH1_SX72 +MTXS0_SA1 +MTXS0_SA2 +MVJH0_SI926 +MVJH0_SX206 +MVJH0_SX296 +MVLO0_SA1 +MVRW0_SA2 +MVRW0_SX135 +MVRW0_SX225 +MWAC0_SA2 +MWAC0_SX341 +MWAC0_SX431 +MWAD0_SX432 +MWAD0_SX72 +MWAR0_SA1 +MWAR0_SI1675 +MWCH0_SI1895 +MWCH0_SI2252 +MWCH0_SX182 +MWCH0_SX452 +MWDK0_SA1 +MWDK0_SA2 +MWDK0_SI2017 +MWDK0_SI806 +MWDK0_SX176 +MWDK0_SX86 +MWEM0_SA2 +MWEM0_SI1320 +MWEM0_SI1393 +MWEM0_SX150 +MWGR0_SX346 +MWRE0_SX247 +MWRE0_SX337 +MWRE0_SX427 +MWRP0_SA1 +MWRP0_SX273 +MWRP0_SX363 +MWSB0_SX276 +MWSH0_SX256 +MWSH0_SX76 +MZMB0_SA1 diff --git a/SpeechT5/fairseq/examples/wav2vec/unsupervised/config/timit_unmatched/valid.uid b/SpeechT5/fairseq/examples/wav2vec/unsupervised/config/timit_unmatched/valid.uid new file mode 100644 index 0000000000000000000000000000000000000000..e99edfe937854a5f47a2f0384f0e067487336883 --- /dev/null +++ b/SpeechT5/fairseq/examples/wav2vec/unsupervised/config/timit_unmatched/valid.uid @@ -0,0 +1,620 @@ +FAEM0_SI1392 +FAJW0_SI1263 +FAJW0_SI633 +FALK0_SI658 +FALR0_SX335 +FAPB0_SI1063 +FAPB0_SI2323 +FAPB0_SX433 +FBAS0_SI1472 +FBAS0_SI2066 +FBCG1_SX352 +FBCH0_SI959 +FBJL0_SI922 +FBLV0_SI1688 +FBMH0_SI1136 +FBMH0_SI970 +FBMJ0_SA1 +FBMJ0_SI1776 +FBMJ0_SI516 +FBMJ0_SX336 +FCDR1_SI1186 +FCDR1_SI1816 +FCDR1_SI556 +FCDR1_SX286 +FCKE0_SI1741 +FCKE0_SI481 +FCLT0_SI808 +FCMG0_SI1142 +FCMG0_SX432 +FCMM0_SI1957 +FCMM0_SX420 +FCYL0_SI667 +FCYL0_SX349 +FDAS1_SI1461 +FDAS1_SI831 +FDAW0_SI1271 +FDAW0_SI2036 +FDJH0_SI935 +FDKN0_SI1202 +FDKN0_SX181 +FDKN0_SX451 +FDMY0_SA1 +FDMY0_SI567 +FDMY0_SI714 +FDMY0_SX387 +FDNC0_SI1278 +FDNC0_SI1908 +FDTD0_SA1 +FDTD0_SX321 +FEAC0_SI615 +FEAR0_SX352 +FECD0_SA1 +FECD0_SI1418 +FECD0_SI788 +FEME0_SI875 +FEME0_SX335 +FEXM0_SA1 +FEXM0_SI482 +FEXM0_SX366 +FGDP0_SI988 +FGDP0_SX88 +FGMB0_SI1145 +FGMB0_SX335 +FGRW0_SA1 +FGRW0_SI1152 +FGRW0_SX162 +FGRW0_SX432 +FHLM0_SX120 +FHLM0_SX349 +FHXS0_SA1 +FHXS0_SI1075 
+FHXS0_SI2302 +FHXS0_SX175 +FJDM2_SA2 +FJDM2_SX142 +FJEN0_SA1 +FJEN0_SX327 +FJEN0_SX417 +FJHK0_SI2282 +FJKL0_SI932 +FJLG0_SI1889 +FJLR0_SI1231 +FJRB0_SX402 +FJRP1_SA1 +FJRP1_SI1432 +FJRP1_SX262 +FJRP1_SX352 +FJSK0_SI1052 +FJSP0_SI1434 +FJWB1_SI748 +FJXM0_SX311 +FJXM0_SX41 +FJXP0_SI1752 +FKAA0_SA1 +FKDE0_SI1141 +FKDE0_SI1771 +FKDW0_SI1207 +FKDW0_SI1891 +FKFB0_SI1608 +FKFB0_SX438 +FKKH0_SI1290 +FKKH0_SI1920 +FKLC0_SI985 +FKLC0_SX175 +FKLC1_SI1048 +FKLH0_SI1257 +FKSR0_SX366 +FLAC0_SI1339 +FLAG0_SI1464 +FLAG0_SI834 +FLEH0_SI1051 +FLET0_SI507 +FLJA0_SI1078 +FLJA0_SX178 +FLJD0_SI1516 +FLJG0_SI981 +FLJG0_SX171 +FLJG0_SX351 +FLKM0_SA1 +FLKM0_SI620 +FLKM0_SX350 +FLKM0_SX440 +FLMC0_SI1372 +FLMK0_SA1 +FLMK0_SI1229 +FLTM0_SX170 +FLTM0_SX350 +FLTM0_SX440 +FMAH1_SI879 +FMBG0_SI1160 +FMEM0_SA1 +FMEM0_SX333 +FMJB0_SI1177 +FMJF0_SI624 +FMJF0_SX174 +FMJF0_SX84 +FMJU0_SI1389 +FMKC0_SI1041 +FMKF0_SI1018 +FMPG0_SA1 +FMPG0_SI972 +FMPG0_SX162 +FMPG0_SX342 +FMPG0_SX432 +FNKL0_SI892 +FNTB0_SI679 +FPAB1_SA1 +FPAB1_SI2101 +FPAB1_SI841 +FPAC0_SI1921 +FPAC0_SI661 +FPAD0_SI716 +FPAD0_SX176 +FPAF0_SA1 +FPAF0_SI1054 +FPAZ0_SI2223 +FPAZ0_SI963 +FPJF0_SI1259 +FPJF0_SX352 +FPLS0_SI960 +FPMY0_SI1153 +FPMY0_SI523 +FREH0_SI1945 +FRLL0_SI805 +FSAG0_SI1323 +FSAG0_SX153 +FSAG0_SX333 +FSAG0_SX423 +FSAH0_SI614 +FSAH0_SX327 +FSAK0_SI1300 +FSBK0_SX349 +FSCN0_SA1 +FSCN0_SI705 +FSCN0_SX176 +FSDC0_SI1312 +FSDJ0_SI1115 +FSGF0_SI2187 +FSGF0_SI927 +FSJG0_SA1 +FSJG0_SA2 +FSJG0_SI940 +FSJG0_SX220 +FSJG0_SX40 +FSJG0_SX400 +FSJS0_SA1 +FSJS0_SX451 +FSJW0_SI1333 +FSKP0_SI1098 +FSMA0_SI991 +FSMA0_SX451 +FSMM0_SX324 +FSPM0_SI1241 +FSPM0_SX251 +FSRH0_SX311 +FSSB0_SI1712 +FSSB0_SX362 +FTBR0_SI1402 +FTBR0_SI921 +FTBW0_SI715 +FTBW0_SX175 +FTLG0_SI1743 +FTLG0_SI483 +FTMG0_SI902 +FVFB0_SI1510 +FVKB0_SX349 +FVMH0_SI1466 +FVMH0_SI836 +MADC0_SI1367 +MADC0_SI737 +MAEB0_SI1411 +MAEO0_SI1326 +MAJP0_SI1704 +MAJP0_SX174 +MAKB0_SA2 +MAKB0_SI1016 +MAKB0_SI2276 +MAKB0_SX116 +MAPV0_SI1293 +MAPV0_SI663 +MARW0_SX286 +MARW0_SX349 +MBBR0_SI1055 +MBBR0_SX335 +MBCG0_SI957 +MBCG0_SX327 +MBGT0_SI1841 +MBGT0_SX171 +MBMA0_SI1222 +MBMA1_SI954 +MBMA1_SX324 +MBTH0_SI2102 +MBWP0_SX349 +MCAE0_SI1447 +MCAE0_SI2077 +MCAE0_SI817 +MCAL0_SI1138 +MCDR0_SI1784 +MCDR0_SI524 +MCEF0_SI842 +MCEW0_SA1 +MCEW0_SI2072 +MCEW0_SI812 +MCEW0_SX362 +MCEW0_SX452 +MCHL0_SI1347 +MCHL0_SI1404 +MCLK0_SI2290 +MCLK0_SI650 +MCPM0_SI1824 +MCSS0_SI1380 +MCSS0_SI688 +MCTM0_SI1350 +MCTM0_SI1980 +MDAC0_SI631 +MDAS0_SI1896 +MDAS0_SI636 +MDBP0_SI528 +MDBP0_SX438 +MDCD0_SI785 +MDCD0_SX335 +MDCM0_SI1480 +MDDC0_SI1419 +MDED0_SI540 +MDEF0_SI1123 +MDEM0_SA1 +MDEM0_SI608 +MDEM0_SI800 +MDEM0_SX428 +MDHS0_SI900 +MDJM0_SI1455 +MDKS0_SX166 +MDKS0_SX346 +MDLB0_SI1306 +MDLB0_SX136 +MDLB0_SX406 +MDLC0_SI1395 +MDLC0_SI2025 +MDLC1_SI1435 +MDLH0_SX160 +MDLH0_SX430 +MDLM0_SI604 +MDLR0_SX333 +MDLR1_SI669 +MDMA0_SX170 +MDMA0_SX350 +MDMA0_SX440 +MDNS0_SI1011 +MDNS0_SI873 +MDPB0_SI1760 +MDPB0_SI866 +MDRD0_SI752 +MDSJ0_SI1462 +MDSJ0_SX438 +MDWD0_SI1260 +MDWH0_SA1 +MDWH0_SI1168 +MDWH0_SI665 +MDWM0_SI916 +MEDR0_SI2004 +MEFG0_SI491 +MEFG0_SI598 +MEGJ0_SA1 +MEGJ0_SI1337 +MEGJ0_SI707 +MEGJ0_SX167 +MEJS0_SI1240 +MESG0_SI702 +MESJ0_SI2039 +MFWK0_SX349 +MFXS0_SX324 +MFXV0_SI1005 +MFXV0_SI1342 +MGAF0_SI1282 +MGAG0_SI691 +MGAK0_SI1036 +MGAK0_SX136 +MGAR0_SX312 +MGAW0_SI1165 +MGES0_SX311 +MGJC0_SX435 +MGRL0_SX327 +MGRP0_SI1317 +MGRP0_SX327 +MGSH0_SI1176 +MGSH0_SI546 +MGSL0_SI797 +MGXP0_SI1087 +MGXP0_SI525 +MHBS0_SI945 +MHIT0_SI983 +MHMG0_SI735 +MHMR0_SI1692 +MILB0_SI903 +MJAC0_SI701 +MJAC0_SX251 +MJAE0_SX84 +MJAI0_SI682 +MJAI0_SI710 +MJDC0_SI531 
+MJDE0_SA1 +MJDE0_SI1120 +MJDE0_SI490 +MJDE0_SX220 +MJDM0_SI1340 +MJDM0_SX170 +MJDM0_SX350 +MJEB0_SX170 +MJEB1_SI1467 +MJEB1_SI837 +MJFR0_SA1 +MJFR0_SX435 +MJHI0_SI1328 +MJJJ0_SI1163 +MJJM0_SI1251 +MJLB0_SI1616 +MJLS0_SI1726 +MJMA0_SI2125 +MJMD0_SI2288 +MJMM0_SI1255 +MJMM0_SX175 +MJPG0_SI1821 +MJPM0_SI1368 +MJPM1_SX311 +MJRA0_SX336 +MJRG0_SI736 +MJRG0_SX352 +MJRH0_SI1840 +MJRH1_SI1558 +MJRK0_SI880 +MJRP0_SI1845 +MJSR0_SI2054 +MJSR0_SI794 +MJWG0_SI813 +MJWG0_SI895 +MJWG0_SX175 +MJWS0_SX333 +MJWT0_SI1291 +MJWT0_SI1381 +MJXL0_SI1172 +MKAG0_SI979 +MKAH0_SX178 +MKAM0_SI1250 +MKAM0_SI1465 +MKDD0_SI1567 +MKDD0_SI2197 +MKDD0_SI937 +MKDT0_SI814 +MKES0_SI623 +MKLS0_SI1437 +MKLS0_SI2067 +MKLS1_SI915 +MKLW0_SI1571 +MKLW0_SX311 +MKRG0_SI861 +MKXL0_SI1815 +MKXL0_SI1958 +MLBC0_SI1239 +MLEL0_SI616 +MLEL0_SX166 +MLJC0_SI1225 +MLJH0_SA1 +MLJH0_SA2 +MLJH0_SI1422 +MLJH0_SI694 +MLJH0_SX244 +MLSH0_SI1417 +MLSH0_SX247 +MMAA0_SI1588 +MMAA0_SI845 +MMAB1_SI864 +MMAB1_SX324 +MMAG0_SA1 +MMAG0_SI1756 +MMAG0_SX136 +MMAR0_SI1966 +MMAR0_SX166 +MMAR0_SX346 +MMBS0_SI521 +MMBS0_SX161 +MMCC0_SI1338 +MMDB0_SI987 +MMDG0_SI1780 +MMDM0_SI1311 +MMDM1_SX153 +MMDM1_SX333 +MMEB0_SX327 +MMGC0_SI1305 +MMGG0_SI1079 +MMGG0_SX449 +MMLM0_SI2150 +MMPM0_SX161 +MMRP0_SX324 +MMSM0_SI1106 +MMSM0_SI476 +MMVP0_SI654 +MMVP0_SX347 +MMWB0_SA1 +MMWB0_SI2249 +MMWB0_SX359 +MMWB0_SX449 +MNTW0_SI1068 +MNTW0_SI1698 +MPEB0_SI600 +MPFU0_SI1258 +MPGH0_SI675 +MPGR0_SI1410 +MPGR1_SI1499 +MPMB0_SA1 +MPMB0_SA2 +MPMB0_SI1501 +MPMB0_SI2131 +MPMB0_SI871 +MPMB0_SX151 +MPMB0_SX331 +MPMB0_SX421 +MPMB0_SX61 +MPPC0_SI1412 +MPRB0_SI1215 +MPRB0_SI575 +MPRD0_SI801 +MPRD0_SX171 +MPRK0_SA1 +MPRK0_SI1097 +MPRK0_SI467 +MPRK0_SX287 +MRAB0_SI1854 +MRAB1_SI848 +MRAI0_SI2052 +MRAI0_SI792 +MRAI0_SX432 +MRAM0_SI1951 +MRCG0_SA2 +MRCG0_SI1428 +MRCG0_SX348 +MRCG0_SX438 +MRCW0_SI741 +MRDM0_SI1044 +MRDM0_SX335 +MREE0_SI1104 +MREE0_SI1959 +MREH1_SA1 +MREH1_SI1599 +MREH1_SI969 +MREM0_SI511 +MRFK0_SI1076 +MRFL0_SI1156 +MRFL0_SI526 +MRFL0_SX166 +MRGM0_SI532 +MRGM0_SX172 +MRGM0_SX442 +MRGS0_SI1356 +MRGS0_SI726 +MRGS0_SX6 +MRJB1_SI1413 +MRJB1_SI2021 +MRJB1_SX120 +MRJH0_SI1519 +MRJH0_SI889 +MRJH0_SX169 +MRJT0_SI868 +MRJT0_SX58 +MRKM0_SI1267 +MRKM0_SI1391 +MRKM0_SI637 +MRLJ0_SI790 +MRLJ1_SI2301 +MRLK0_SI1468 +MRLR0_SI1196 +MRML0_SA1 +MRML0_SI1421 +MRML0_SX161 +MRML0_SX251 +MRMS0_SI2057 +MRRE0_SA1 +MRRE0_SI1334 +MRRE0_SI952 +MRSO0_SI1206 +MRSP0_SI1429 +MRTC0_SI1458 +MRTJ0_SA1 +MRTJ0_SI772 +MRTJ0_SX142 +MRTJ0_SX232 +MRTJ0_SX52 +MRWS0_SI1102 +MRXB0_SI2215 +MRXB0_SI955 +MSAS0_SI1376 +MSAS0_SI746 +MSDH0_SI980 +MSDH0_SX170 +MSDS0_SI1077 +MSDS0_SX267 +MSDS0_SX357 +MSEM1_SI2070 +MSEM1_SI810 +MSFH0_SA1 +MSFH0_SI1738 +MSFH0_SX136 +MSFH0_SX406 +MSFV0_SI632 +MSJK0_SI1596 +MSJK0_SX336 +MSMC0_SI509 +MSMR0_SI1150 +MSMS0_SI1433 +MSRR0_SI1761 +MSRR0_SI501 +MSTF0_SI852 +MSVS0_SI2198 +MSVS0_SI938 +MSVS0_SX398 +MTAB0_SI1572 +MTAB0_SX312 +MTAT0_SA1 +MTAT0_SI1110 +MTAT0_SI811 +MTAT1_SI779 +MTAT1_SX149 +MTAT1_SX329 +MTBC0_SI543 +MTCS0_SI712 +MTDB0_SI1401 +MTDB0_SI771 +MTDP0_SA1 +MTDP0_SI1521 +MTDP0_SX171 +MTDP0_SX351 +MTER0_SA1 +MTER0_SI1157 +MTER0_SX437 +MTJG0_SX170 +MTJS0_SA2 +MTJS0_SI1822 +MTJS0_SI562 +MTJS0_SX382 +MTJU0_SI2020 +MTKD0_SI630 +MTKP0_SI2283 +MTKP0_SI454 +MTLB0_SI1134 +MTLB0_SX324 +MTLC0_SI1313 +MTLC0_SI1477 +MTML0_SX435 +MTMN0_SI582 +MTMT0_SI488 +MTPP0_SI1508 +MTPR0_SI2230 +MTPR0_SX160 +MTPR0_SX430 +MTQC0_SA1 +MTQC0_SI1441 +MTQC0_SX181 +MTQC0_SX451 +MTRC0_SI589 +MTRR0_SI918 +MTRT0_SI1227 +MTXS0_SI1060 +MTXS0_SI2320 +MTXS0_SX160 +MTXS0_SX430 +MVJH0_SI1556 +MVLO0_SI517 +MWAC0_SI1601 +MWAC0_SX161 
+MWAC0_SX251 +MWAR0_SI1045 +MWDK0_SI1436 +MWEM0_SX420 +MWRE0_SA2 +MWRE0_SI1057 +MWRE0_SX67 +MWRP0_SI1443 +MWSB0_SI996 +MWSH0_SI1426 +MWSH0_SI796 +MWSH0_SX166 diff --git a/SpeechT5/fairseq/examples/wav2vec/unsupervised/data/__init__.py b/SpeechT5/fairseq/examples/wav2vec/unsupervised/data/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..d0545627efc9a6f9bb180e351ead519a2cb6dea7 --- /dev/null +++ b/SpeechT5/fairseq/examples/wav2vec/unsupervised/data/__init__.py @@ -0,0 +1,13 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +from .extracted_features_dataset import ExtractedFeaturesDataset +from .random_input_dataset import RandomInputDataset + + +__all__ = [ + "ExtractedFeaturesDataset", + "RandomInputDataset", +] diff --git a/SpeechT5/fairseq/examples/wav2vec/unsupervised/data/extracted_features_dataset.py b/SpeechT5/fairseq/examples/wav2vec/unsupervised/data/extracted_features_dataset.py new file mode 100644 index 0000000000000000000000000000000000000000..d6ee9c4a3602be9db8ddfe67d41ce8a96a98ad1e --- /dev/null +++ b/SpeechT5/fairseq/examples/wav2vec/unsupervised/data/extracted_features_dataset.py @@ -0,0 +1,144 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + + +import logging +import os +import contextlib + +import numpy as np +import torch + +from fairseq.data import FairseqDataset, data_utils + + +logger = logging.getLogger(__name__) + + +class ExtractedFeaturesDataset(FairseqDataset): + def __init__( + self, + path, + split, + min_length=3, + max_length=None, + labels=None, + label_dict=None, + shuffle=True, + sort_by_length=True, + ): + super().__init__() + + self.min_length = min_length + self.max_length = max_length + self.shuffle = shuffle + self.sort_by_length = sort_by_length + self.label_dict = label_dict + + if labels is not None: + assert label_dict is not None + + self.sizes = [] + self.offsets = [] + self.labels = [] + + path = os.path.join(path, split) + data_path = path + self.data = np.load(data_path + ".npy", mmap_mode="r") + + offset = 0 + skipped = 0 + + if not os.path.exists(path + f".{labels}"): + labels = None + + with open(data_path + ".lengths", "r") as len_f, open( + path + f".{labels}", "r" + ) if labels is not None else contextlib.ExitStack() as lbl_f: + for line in len_f: + length = int(line.rstrip()) + lbl = None if labels is None else next(lbl_f).rstrip().split() + if length >= min_length and ( + max_length is None or length <= max_length + ): + self.sizes.append(length) + self.offsets.append(offset) + if lbl is not None: + self.labels.append(lbl) + offset += length + + self.sizes = np.asarray(self.sizes) + self.offsets = np.asarray(self.offsets) + + logger.info(f"loaded {len(self.offsets)}, skipped {skipped} samples") + + def __getitem__(self, index): + offset = self.offsets[index] + end = self.sizes[index] + offset + feats = torch.from_numpy(self.data[offset:end].copy()).float() + + res = {"id": index, "features": feats} + if len(self.labels) > 0: + res["target"] = self.label_dict.encode_line( + self.labels[index], + line_tokenizer=lambda x: x, + append_eos=False, + ) + + return res + + def __len__(self): + return len(self.sizes) + + def collater(self, samples): + if len(samples) == 0: + return {} + + features = [s["features"] for s in samples] + sizes = 
[len(s) for s in features] + + target_size = max(sizes) + + collated_features = features[0].new_zeros( + len(features), target_size, features[0].size(-1) + ) + padding_mask = torch.BoolTensor(collated_features.shape[:-1]).fill_(False) + for i, (f, size) in enumerate(zip(features, sizes)): + collated_features[i, :size] = f + padding_mask[i, size:] = True + + res = { + "id": torch.LongTensor([s["id"] for s in samples]), + "net_input": {"features": collated_features, "padding_mask": padding_mask}, + } + + if len(self.labels) > 0: + target = data_utils.collate_tokens( + [s["target"] for s in samples], + pad_idx=self.label_dict.pad(), + left_pad=False, + ) + res["target"] = target + return res + + def num_tokens(self, index): + return self.size(index) + + def size(self, index): + return self.sizes[index] + + def ordered_indices(self): + """Return an ordered list of indices. Batches will be constructed based + on this order.""" + if self.shuffle: + order = [np.random.permutation(len(self))] + else: + order = [np.arange(len(self))] + + if self.sort_by_length: + order.append(self.sizes) + return np.lexsort(order)[::-1] + else: + return order[0] diff --git a/SpeechT5/fairseq/examples/wav2vec/unsupervised/data/random_input_dataset.py b/SpeechT5/fairseq/examples/wav2vec/unsupervised/data/random_input_dataset.py new file mode 100644 index 0000000000000000000000000000000000000000..886505616cc7f7a515ecebf34fae5c2bc541de03 --- /dev/null +++ b/SpeechT5/fairseq/examples/wav2vec/unsupervised/data/random_input_dataset.py @@ -0,0 +1,62 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +import random +from typing import List + +from fairseq.data import BaseWrapperDataset, data_utils + + +class RandomInputDataset(BaseWrapperDataset): + def __init__( + self, + dataset, + random_input_dataset, + input_key_path: List[str], + add_to_input, + pad_idx, + ): + super().__init__(dataset) + self.random_input_dataset = random_input_dataset + if isinstance(input_key_path, str): + input_key_path = [input_key_path] + assert len(input_key_path) > 0 + self.input_key_path = input_key_path + self.add_to_input = add_to_input + self.pad_idx = pad_idx + + def get_target(self, item): + target_loc = item + for p in self.input_key_path[:-1]: + target_loc = target_loc[p] + return self.input_key_path[-1], target_loc + + def get_target_value(self, item): + k, target_loc = self.get_target(item) + return target_loc[k] + + def __getitem__(self, index): + item = self.dataset[index] + k, target_loc = self.get_target(item) + target_loc[k] = random.choice(self.random_input_dataset) + return item + + def collater(self, samples): + collated = self.dataset.collater(samples) + if len(collated) == 0: + return collated + indices = set(collated["id"].tolist()) + + random_inputs = data_utils.collate_tokens( + [self.get_target_value(s) for s in samples if s["id"] in indices], + pad_idx=self.pad_idx, + left_pad=False, + ) + k, target_loc = self.get_target( + collated if not self.add_to_input else collated["net_input"] + ) + target_loc[k] = random_inputs + + return collated diff --git a/SpeechT5/fairseq/examples/wav2vec/unsupervised/kaldi_self_train/README.md b/SpeechT5/fairseq/examples/wav2vec/unsupervised/kaldi_self_train/README.md new file mode 100644 index 0000000000000000000000000000000000000000..314984fcbb6825169193b21bd6bb3fca5fd2503b --- /dev/null +++ 
b/SpeechT5/fairseq/examples/wav2vec/unsupervised/kaldi_self_train/README.md @@ -0,0 +1,56 @@ +# Self-Training with Kaldi HMM Models +This folder contains recipes for self-training on pseudo phone transcripts and +decoding into phones or words with [kaldi](https://github.com/kaldi-asr/kaldi). + +To start, download and install kaldi following its instructions, and place this +folder in `path/to/kaldi/egs`. + +## Training +Assuming the following has been prepared: +- `w2v_dir`: contains features `{train,valid}.{npy,lengths}`, real transcripts `{train,valid}.${label}`, and dict `dict.${label}.txt` +- `lab_dir`: contains pseudo labels `{train,valid}.txt` +- `arpa_lm`: Arpa-format n-gram phone LM for decoding +- `arpa_lm_bin`: Arpa-format n-gram phone LM for unsupervised model selection to be used with KenLM + +Set these variables in `train.sh`, as well as `out_dir`, the output directory, +and then run it. + +The output will be: +``` +==== WER w.r.t. real transcript (select based on unsupervised metric) +INFO:root:./out/exp/mono/decode_valid/scoring/14.0.0.tra.txt: score 0.9178 wer 28.71% lm_ppl 24.4500 gt_wer 25.57% +INFO:root:./out/exp/tri1/decode_valid/scoring/17.1.0.tra.txt: score 0.9257 wer 26.99% lm_ppl 30.8494 gt_wer 21.90% +INFO:root:./out/exp/tri2b/decode_valid/scoring/8.0.0.tra.txt: score 0.7506 wer 23.15% lm_ppl 25.5944 gt_wer 15.78% +``` +where `wer` is the word error rate with respect to the pseudo label, `gt_wer` to +the ground truth label, `lm_ppl` the language model perplexity of HMM predicted +transcripts, and `score` is the unsupervised metric for model selection. We +choose the model and the LM parameter with the lowest score. In the +example above, it is `tri2b`, `8.0.0`. + + +## Decoding into Phones +In `decode_phone.sh`, set `out_dir` the same as used in `train.sh`, set +`dec_exp` and `dec_lmparam` to the selected model and LM parameter (e.g. +`tri2b` and `8.0.0` in the above example). `dec_script` needs to be set +according to `dec_exp`: for mono/tri1/tri2b, use `decode.sh`; for tri3b, use +`decode_fmllr.sh`. + +The output will be saved at `out_dir/dec_data`. + + +## Decoding into Words +`decode_word_step1.sh` prepares WFSTs for word decoding. Besides the variables +mentioned above, set +- `wrd_arpa_lm`: Arpa-format n-gram word LM for decoding +- `wrd_arpa_lm_bin`: Arpa-format n-gram word LM for unsupervised model selection + +`decode_word_step1.sh` decodes the `train` and `valid` splits into words and runs +unsupervised model selection using the `valid` split. The output looks like: +``` +INFO:root:./out/exp/tri2b/decodeword_valid/scoring/17.0.0.tra.txt: score 1.8693 wer 24.97% lm_ppl 1785.5333 gt_wer 31.45% +``` + +After determining the LM parameter (`17.0.0` in the example above), set it in +`decode_word_step2.sh` and run it. The output will be saved at +`out_dir/dec_data_word`. diff --git a/SpeechT5/fairseq/examples/wav2vec/unsupervised/kaldi_self_train/st/cmd.sh b/SpeechT5/fairseq/examples/wav2vec/unsupervised/kaldi_self_train/st/cmd.sh new file mode 100644 index 0000000000000000000000000000000000000000..e74953194d41f0d93855d41b2acef08556d92477 --- /dev/null +++ b/SpeechT5/fairseq/examples/wav2vec/unsupervised/kaldi_self_train/st/cmd.sh @@ -0,0 +1,15 @@ +# you can change cmd.sh depending on what type of queue you are using. +# If you have no queueing system and want to run on a local machine, you +# can change all instances 'queue.pl' to run.pl (but be careful and run +# commands one by one: most recipes will exhaust the memory on your +# machine).
queue.pl works with GridEngine (qsub). slurm.pl works +# with slurm. Different queues are configured differently, with different +# queue names and different ways of specifying things like memory; +# to account for these differences you can create and edit the file +# conf/queue.conf to match your queue's configuration. Search for +# conf/queue.conf in http://kaldi-asr.org/doc/queue.html for more information, +# or search for the string 'default_config' in utils/queue.pl or utils/slurm.pl. + +export train_cmd="run.pl --mem 2G" +export decode_cmd="run.pl --mem 4G" +export mkgraph_cmd="run.pl --mem 8G" diff --git a/SpeechT5/fairseq/examples/wav2vec/unsupervised/kaldi_self_train/st/decode_phone.sh b/SpeechT5/fairseq/examples/wav2vec/unsupervised/kaldi_self_train/st/decode_phone.sh new file mode 100644 index 0000000000000000000000000000000000000000..947342a0b7d8f50bcf4164b284ef3303a1247b64 --- /dev/null +++ b/SpeechT5/fairseq/examples/wav2vec/unsupervised/kaldi_self_train/st/decode_phone.sh @@ -0,0 +1,33 @@ +#!/bin/bash + +# decode into phones (and prepare a new data directory for HMM outputs) + +. ./path.sh + +set -eu + +out_dir= # same as in train.sh +dec_lmparam= # LM hyperparameters (e.g., 7.0.0) +dec_exp= +dec_script= +dec_splits="train valid" +dec_data_dir=$out_dir/dec_data # where to write HMM output + +data_dir=${out_dir}/data + +local/decode.sh --nj 40 --graph_name graph \ + --val_sets "$dec_splits" --decode_script $dec_script \ + $out_dir/exp/$dec_exp $data_dir $data_dir/lang_test + +if [ ! -z $dec_lmparam ]; then + for x in $dec_splits; do + mkdir -p $dec_data_dir/$x + cp $data_dir/$x/{feats.scp,cmvn.scp,utt2spk,spk2utt} $dec_data_dir/$x/ + + tra=$out_dir/exp/$dec_exp/decode_${x}/scoring/${dec_lmparam}.tra + cat $tra | utils/int2sym.pl -f 2- $data_dir/lang/words.txt | \ + sed 's:<UNK>::g' | sed 's:<SIL>::g' > $dec_data_dir/${x}/text + utils/fix_data_dir.sh $dec_data_dir/${x} + echo "WER on ${x} is" $(compute-wer ark:$data_dir/${x}_gt/text ark:$dec_data_dir/$x/text | cut -d" " -f2-) + done +fi diff --git a/SpeechT5/fairseq/examples/wav2vec/unsupervised/kaldi_self_train/st/decode_word_step1.sh b/SpeechT5/fairseq/examples/wav2vec/unsupervised/kaldi_self_train/st/decode_word_step1.sh new file mode 100644 index 0000000000000000000000000000000000000000..c1276bbe4d0e02deb984c7c10d6c0486dce09a5f --- /dev/null +++ b/SpeechT5/fairseq/examples/wav2vec/unsupervised/kaldi_self_train/st/decode_word_step1.sh @@ -0,0 +1,46 @@ +#!/bin/bash + +# prepare word WFSTs, reference data, and decode + +set -eu + +w2v_dir= # same as in train.sh +out_dir= # same as in train.sh +lexicon= # word to phone mapping +wrd_arpa_lm= # word LM +wrd_arpa_lm_bin= # word LM for KenLM, used in unsupervised selection + +dec_exp= # what HMM stage to decode (e.g., tri3b) +dec_script= # what decoding script to use (e.g., steps/decode_fmllr.sh) +phn_label=phnc +wrd_label=wrd +dec_suffix=word +dec_splits="train valid" +valid_split="valid" + +data_dir=$out_dir/data +wrd_data_dir=$out_dir/data_word + +lexicon_clean=$(mktemp) +cat $lexicon | sort | uniq > $lexicon_clean +local/prepare_lang_word.sh $w2v_dir/dict.${phn_label}.txt $data_dir $lexicon_clean && rm $lexicon_clean +local/prepare_lm.sh --langdir $data_dir/lang_word --lmdir $data_dir/lang_test_word $wrd_arpa_lm $data_dir + +for x in $dec_splits; do + x_gt=${x}_gt + mkdir -p $wrd_data_dir/$x_gt + cp $data_dir/$x_gt/{feats.scp,cmvn.scp,utt2spk,spk2utt} $wrd_data_dir/$x_gt/ + python local/copy_aligned_text.py < $w2v_dir/$x.$wrd_label > $wrd_data_dir/$x_gt/text +done + 
+local/decode.sh --nj 40 --graph_name graph${dec_suffix} --decode_suffix $dec_suffix \ + --val_sets "$dec_splits" --decode_script $dec_script \ + $out_dir/exp/$dec_exp $data_dir $data_dir/lang_test_word + +local/unsup_select_decode_word.sh \ + --split $valid_split --kenlm_path $wrd_arpa_lm_bin \ + --ref_txt $wrd_data_dir/${valid_split}_gt/text \ + --psd_txt $data_dir/${valid_split}/text \ + --dec_name decode${dec_suffix} --graph_name graph${dec_suffix} \ + --phonemize_lexicon $data_dir/local/dict_word/lexicon.txt \ + $out_dir/exp diff --git a/SpeechT5/fairseq/examples/wav2vec/unsupervised/kaldi_self_train/st/decode_word_step2.sh b/SpeechT5/fairseq/examples/wav2vec/unsupervised/kaldi_self_train/st/decode_word_step2.sh new file mode 100644 index 0000000000000000000000000000000000000000..59a6cbb12539cf62658f8344f7be7cecf2e3380f --- /dev/null +++ b/SpeechT5/fairseq/examples/wav2vec/unsupervised/kaldi_self_train/st/decode_word_step2.sh @@ -0,0 +1,30 @@ +#!/bin/bash + +# prepare a new data directory of HMM word output + +. ./path.sh + +set -eu + +out_dir= # same as in train.sh +dec_lmparam= # LM hyperparameters (e.g., 7.0.0) + +dec_exp=tri3b # what HMM stage to decode (e.g., tri3b) +dec_suffix=word +dec_splits="train valid" +dec_data_dir=$out_dir/dec_data_word # where to write HMM output + +data_dir=$out_dir/data +wrd_data_dir=$out_dir/data_word + +for x in $dec_splits; do + mkdir -p $dec_data_dir/$x + cp $data_dir/$x/{feats.scp,cmvn.scp,utt2spk,spk2utt} $dec_data_dir/$x/ + + tra=$out_dir/exp/$dec_exp/decode${dec_suffix}_${x}/scoring/${dec_lmparam}.tra + cat $tra | utils/int2sym.pl -f 2- $data_dir/lang_word/words.txt | \ + sed 's:<UNK>::g' | sed 's:<SIL>::g' > $dec_data_dir/$x/text + utils/fix_data_dir.sh $dec_data_dir/$x + echo "WER on $x is" $(compute-wer ark:$wrd_data_dir/${x}_gt/text ark:$dec_data_dir/$x/text | cut -d" " -f2-) +done + diff --git a/SpeechT5/fairseq/examples/wav2vec/unsupervised/kaldi_self_train/st/local/copy_aligned_text.py b/SpeechT5/fairseq/examples/wav2vec/unsupervised/kaldi_self_train/st/local/copy_aligned_text.py new file mode 100644 index 0000000000000000000000000000000000000000..5f4faa99218b0b30c980cad167c52b2297cd92c3 --- /dev/null +++ b/SpeechT5/fairseq/examples/wav2vec/unsupervised/kaldi_self_train/st/local/copy_aligned_text.py @@ -0,0 +1,4 @@ +import sys + +for idx, line in enumerate(sys.stdin): + print(f"utt{idx:010d} {line}", end='') \ No newline at end of file diff --git a/SpeechT5/fairseq/examples/wav2vec/unsupervised/kaldi_self_train/st/local/decode.sh b/SpeechT5/fairseq/examples/wav2vec/unsupervised/kaldi_self_train/st/local/decode.sh new file mode 100644 index 0000000000000000000000000000000000000000..811cb63c88bb7cdd03b0a250ef2db32b5eaa50df --- /dev/null +++ b/SpeechT5/fairseq/examples/wav2vec/unsupervised/kaldi_self_train/st/local/decode.sh @@ -0,0 +1,38 @@ +#!/bin/bash + +set -u + +val_sets="dev_other" +graph_name=graph +decode_suffix="" +decode_script="steps/decode_fmllr.sh" +decode_args="" +nj=60 + +. ./cmd.sh +. ./path.sh +. parse_options.sh + +set -x +exp_dir=$1 +data_root=$2 +lang_test=$3 + +graph=$exp_dir/$graph_name + +if [ ! -d $graph ]; then + utils/mkgraph.sh $lang_test $exp_dir $graph +fi + +for part in $val_sets; do + dec_dir=$exp_dir/decode${decode_suffix}_${part} + if [ ! -d $dec_dir ]; then + echo "decoding $part for $exp_dir" + $decode_script --nj $nj --cmd "$decode_cmd" $decode_args \ + $graph $data_root/$part $dec_dir & + else + echo "$dec_dir exists. 
skip" + fi +done + +wait diff --git a/SpeechT5/fairseq/examples/wav2vec/unsupervised/kaldi_self_train/st/local/prepare_data_from_w2v.py b/SpeechT5/fairseq/examples/wav2vec/unsupervised/kaldi_self_train/st/local/prepare_data_from_w2v.py new file mode 100644 index 0000000000000000000000000000000000000000..66954ea5c9f3f3330e3230860229c7c4046a5d6a --- /dev/null +++ b/SpeechT5/fairseq/examples/wav2vec/unsupervised/kaldi_self_train/st/local/prepare_data_from_w2v.py @@ -0,0 +1,56 @@ +import kaldi_io +import numpy as np +import os + + +def get_parser(): + import argparse + parser = argparse.ArgumentParser() + parser.add_argument("w2v_dir", help="wav2vec feature and text directory") + parser.add_argument("tar_root", help="output data directory in kaldi's format") + parser.add_argument("split", help="name of the subset") + parser.add_argument("--label", default="", help="if specified, copy labels too") + return parser + +def main(): + parser = get_parser() + args = parser.parse_args() + + tar_dir = os.path.join(args.tar_root, args.split) + os.makedirs(tar_dir, exist_ok=True) + + lengths_path = os.path.join(args.w2v_dir, f"{args.split}.lengths") + with open(lengths_path) as f: + lengths = [int(line.rstrip()) for line in f] + offsets = [0] + np.cumsum(lengths[:-1]).tolist() + feats = np.load( + os.path.join(args.w2v_dir, f"{args.split}.npy"), + mmap_mode="r" + ) + assert feats.shape[0] == sum(lengths), \ + f"lengths mismatch {feats.shape[0]} != {sum(lengths)}" + + ark_path = os.path.join(tar_dir, "feats.ark") + scp_path = os.path.join(tar_dir, "feats.scp") + wspec = f"ark:| copy-feats --compress=true ark:- ark,scp:{ark_path},{scp_path}" + with kaldi_io.open_or_fd(wspec, "wb") as f: + for idx, (offset, length) in enumerate(zip(offsets, lengths)): + feat = feats[offset:offset+length] + kaldi_io.write_mat(f, feat, key=f"utt{idx:010d}") + + u2s_path = os.path.join(tar_dir, "utt2spk") + s2u_path = os.path.join(tar_dir, "spk2utt") + with open(u2s_path, "w") as f_u2s, open(s2u_path, "w") as f_s2u: + for idx in range(len(lengths)): + f_u2s.write(f"utt{idx:010d} utt{idx:010d}\n") + f_s2u.write(f"utt{idx:010d} utt{idx:010d}\n") + + if bool(args.label): + lab_path = os.path.join(args.w2v_dir, f"{args.split}.{args.label}") + txt_path = os.path.join(tar_dir, "text") + with open(lab_path) as f_lab, open(txt_path, "w") as f_txt: + for idx, line in enumerate(f_lab): + f_txt.write(f"utt{idx:010d} {line}") + +if __name__ == "__main__": + main() diff --git a/SpeechT5/fairseq/examples/wav2vec/unsupervised/kaldi_self_train/st/local/prepare_lang.sh b/SpeechT5/fairseq/examples/wav2vec/unsupervised/kaldi_self_train/st/local/prepare_lang.sh new file mode 100644 index 0000000000000000000000000000000000000000..e9a80001eb47d5af863d6aab11a59362a59cef61 --- /dev/null +++ b/SpeechT5/fairseq/examples/wav2vec/unsupervised/kaldi_self_train/st/local/prepare_lang.sh @@ -0,0 +1,37 @@ +#!/bin/bash + +sil_prob=0.5 +num_sil_states=3 +num_nonsil_states=1 + +. ./cmd.sh +. ./path.sh +. 
parse_options.sh + +set -eux + +dict=$1 +data_dir=$2 + +dict_dir=$data_dir/local/dict +tmplm_dir=$data_dir/local/lang_tmp +lm_dir=$data_dir/lang + +mkdir -p $dict_dir $tmplm_dir $lm_dir + +# prepare dict +echo "SIL" > $dict_dir/silence_phones.txt +echo "SIL" > $dict_dir/optional_silence.txt +awk '{print $1}' $dict > $dict_dir/nonsilence_phones.txt + +echo "SIL SIL" > $dict_dir/lexicon.txt +echo "<UNK> SIL" >> $dict_dir/lexicon.txt +awk '{print $1" "$1}' $dict >> $dict_dir/lexicon.txt + +echo "SIL" > $dict_dir/extra_questions.txt +awk '{printf $1" "} END {printf "\n"}' $dict >> $dict_dir/extra_questions.txt + +# prepare lang +utils/prepare_lang.sh --sil-prob $sil_prob --position-dependent-phones false \ + --num_sil_states $num_sil_states --num_nonsil_states $num_nonsil_states \ + $dict_dir "<UNK>" $tmplm_dir $lm_dir diff --git a/SpeechT5/fairseq/examples/wav2vec/unsupervised/kaldi_self_train/st/local/prepare_lang_word.sh b/SpeechT5/fairseq/examples/wav2vec/unsupervised/kaldi_self_train/st/local/prepare_lang_word.sh new file mode 100644 index 0000000000000000000000000000000000000000..a7ea3877beefe1d4d53f9f7e32b004d8ce01e22a --- /dev/null +++ b/SpeechT5/fairseq/examples/wav2vec/unsupervised/kaldi_self_train/st/local/prepare_lang_word.sh @@ -0,0 +1,35 @@ +#!/bin/bash + +num_sil_states=3 +num_nonsil_states=1 + +. ./cmd.sh +. ./path.sh +. parse_options.sh + +set -eux + +dict=$1 +data_dir=$2 +lexicon=$3 + +dict_dir=$data_dir/local/dict_word +tmplm_dir=$data_dir/local/lang_tmp_word +lm_dir=$data_dir/lang_word + +mkdir -p $dict_dir $tmplm_dir $lm_dir + +# prepare dict +echo "SIL" > $dict_dir/silence_phones.txt +echo "SIL" > $dict_dir/optional_silence.txt +awk '{print $1}' $dict > $dict_dir/nonsilence_phones.txt + +(echo "!SIL SIL"; echo "<UNK> SIL";) | cat - $lexicon > $dict_dir/lexicon.txt + +echo "SIL" > $dict_dir/extra_questions.txt +awk '{printf $1" "} END {printf "\n"}' $dict >> $dict_dir/extra_questions.txt + +# prepare lang +utils/prepare_lang.sh --position-dependent-phones false \ + --num_sil_states $num_sil_states --num_nonsil_states $num_nonsil_states \ + $dict_dir "<UNK>" $tmplm_dir $lm_dir diff --git a/SpeechT5/fairseq/examples/wav2vec/unsupervised/kaldi_self_train/st/local/prepare_lm.sh b/SpeechT5/fairseq/examples/wav2vec/unsupervised/kaldi_self_train/st/local/prepare_lm.sh new file mode 100644 index 0000000000000000000000000000000000000000..c2edcefede2da3b6a991b9c8fbc78c96d46d27cb --- /dev/null +++ b/SpeechT5/fairseq/examples/wav2vec/unsupervised/kaldi_self_train/st/local/prepare_lm.sh @@ -0,0 +1,35 @@ +#!/usr/bin/env bash + +langdir="" +lmdir="" + +. ./cmd.sh +. ./path.sh +. parse_options.sh + +arpa_lm=$1 +data=$2 + +if [ -z $langdir ]; then + langdir=$data/lang +fi +if [ -z $lmdir ]; then + lmdir=$data/lang_test +fi + +if [ ! -d $langdir ]; then + echo "$langdir not found. 
run local/prepare_lang.sh first" && exit 1 +fi + +mkdir -p $lmdir +cp -r $langdir/* $lmdir + +if [[ "$arpa_lm" == *.gz ]]; then + gunzip -c $arpa_lm | arpa2fst --disambig-symbol=#0 --read-symbol-table=$lmdir/words.txt - $lmdir/G.fst +else + arpa2fst --disambig-symbol=#0 --read-symbol-table=$lmdir/words.txt $arpa_lm $lmdir/G.fst +fi +fstisstochastic $lmdir/G.fst +utils/validate_lang.pl $lmdir || exit 1 + +echo "done preparing lm ($lmdir)" diff --git a/SpeechT5/fairseq/examples/wav2vec/unsupervised/kaldi_self_train/st/local/score.sh b/SpeechT5/fairseq/examples/wav2vec/unsupervised/kaldi_self_train/st/local/score.sh new file mode 100644 index 0000000000000000000000000000000000000000..cb5bbb7277bfb9f2d5440da0514bf7b16da8140d --- /dev/null +++ b/SpeechT5/fairseq/examples/wav2vec/unsupervised/kaldi_self_train/st/local/score.sh @@ -0,0 +1,63 @@ +#!/usr/bin/env bash +# Copyright 2012 Johns Hopkins University (Author: Daniel Povey) +# 2014 Guoguo Chen +# Apache 2.0 + +[ -f ./path.sh ] && . ./path.sh + +# begin configuration section. +cmd=run.pl +stage=0 +decode_mbr=true +word_ins_penalty=0.0,0.5,1.0 +min_lmwt=7 +max_lmwt=17 +iter=final +#end configuration section. + +[ -f ./path.sh ] && . ./path.sh +. parse_options.sh || exit 1; + +if [ $# -ne 3 ]; then + echo "Usage: local/score.sh [--cmd (run.pl|queue.pl...)] <data-dir> <lang-dir|graph-dir> <decode-dir>" + echo " Options:" + echo " --cmd (run.pl|queue.pl...) # specify how to run the sub-processes." + echo " --stage (0|1|2) # start scoring script from part-way through." + echo " --decode_mbr (true/false) # maximum bayes risk decoding (confusion network)." + echo " --min_lmwt <int> # minumum LM-weight for lattice rescoring " + echo " --max_lmwt <int> # maximum LM-weight for lattice rescoring " + exit 1; +fi + +data=$1 +lang_or_graph=$2 +dir=$3 + +symtab=$lang_or_graph/words.txt + +for f in $symtab $dir/lat.1.gz $data/text; do + [ ! -f $f ] && echo "score.sh: no such file $f" && exit 1; +done + +mkdir -p $dir/scoring/log + +cat $data/text | sed 's:<NOISE>::g' | sed 's:<SPOKEN_NOISE>::g' > $dir/scoring/test_filt.txt + +for wip in $(echo $word_ins_penalty | sed 's/,/ /g'); do + $cmd LMWT=$min_lmwt:$max_lmwt $dir/scoring/log/best_path.LMWT.$wip.log \ + lattice-scale --inv-acoustic-scale=LMWT "ark:gunzip -c $dir/lat.*.gz|" ark:- \| \ + lattice-add-penalty --word-ins-penalty=$wip ark:- ark:- \| \ + lattice-best-path --word-symbol-table=$symtab \ + ark:- ark,t:$dir/scoring/LMWT.$wip.tra || exit 1; +done + +# Note: the double level of quoting for the sed command +for wip in $(echo $word_ins_penalty | sed 's/,/ /g'); do + $cmd LMWT=$min_lmwt:$max_lmwt $dir/scoring/log/score.LMWT.$wip.log \ + cat $dir/scoring/LMWT.$wip.tra \| \ + utils/int2sym.pl -f 2- $symtab \| sed 's:\<UNK\>::g' \| \ + compute-wer --text --mode=present \ + ark:$dir/scoring/test_filt.txt ark,p:- ">&" $dir/wer_LMWT_$wip || exit 1; +done + +exit 0; diff --git a/SpeechT5/fairseq/examples/wav2vec/unsupervised/kaldi_self_train/st/local/show_wer.sh b/SpeechT5/fairseq/examples/wav2vec/unsupervised/kaldi_self_train/st/local/show_wer.sh new file mode 100644 index 0000000000000000000000000000000000000000..9ecf1690c67f8a019009ef32d973fbd45b56c7ca --- /dev/null +++ b/SpeechT5/fairseq/examples/wav2vec/unsupervised/kaldi_self_train/st/local/show_wer.sh @@ -0,0 +1,52 @@ +#!/bin/bash + +split="dev_other" +ref_data="" +get_best_wer=true +dec_name="decode" +graph_name="graph" + +. ./cmd.sh +. ./path.sh +. parse_options.sh + +exp_root=$1 + +set -eu + +echo "==== WER w.r.t. 
pseudo transcript" +for x in $exp_root/*/${dec_name}_${split}*; do grep WER $x/wer_* 2>/dev/null | utils/best_wer.sh; done + + +if [ ! -z $ref_data ]; then + echo "==== WER w.r.t. real transcript (select based on pseudo WER)" + ref_txt=$ref_data/$split/text + for x in $exp_root/*/${dec_name}_${split}*; do + lang=$(dirname $x)/$graph_name + + lmwt=$( + grep WER $x/wer_* 2>/dev/null | utils/best_wer.sh | + sed 's/.*wer_\(.*\)$/\1/g' | sed 's/_/./g' + ) + tra=$x/scoring/$lmwt.tra + cat $tra | utils/int2sym.pl -f 2- $lang/words.txt | sed 's:<UNK>::g' | sed 's:<SIL>::g' | \ + compute-wer --text --mode=present \ + ark:$ref_txt ark,p:- 2> /dev/null | grep WER | xargs -I{} echo {} $tra + done +fi + +if [ ! -z $ref_data ] && $get_best_wer; then + echo "==== WER w.r.t. real transcript (select based on true WER)" + ref_txt=$ref_data/$split/text + for x in $exp_root/*/${dec_name}_${split}*; do + lang=$(dirname $x)/$graph_name + + for tra in $x/scoring/*.tra; do + cat $tra | utils/int2sym.pl -f 2- $lang/words.txt | sed 's:<UNK>::g' | sed 's:<SIL>::g' | \ + compute-wer --text --mode=present \ + ark:$ref_txt ark,p:- 2> /dev/null | grep WER | xargs -I{} echo {} $tra + done | sort -k2n | head -n1 + done +fi + +exit 0; diff --git a/SpeechT5/fairseq/examples/wav2vec/unsupervised/kaldi_self_train/st/local/train_subset_lgbeam.sh b/SpeechT5/fairseq/examples/wav2vec/unsupervised/kaldi_self_train/st/local/train_subset_lgbeam.sh new file mode 100644 index 0000000000000000000000000000000000000000..913c1d8e4357c146026b86e78f0b16f921776441 --- /dev/null +++ b/SpeechT5/fairseq/examples/wav2vec/unsupervised/kaldi_self_train/st/local/train_subset_lgbeam.sh @@ -0,0 +1,129 @@ +#!/usr/bin/env bash + +out_root=/tmp +out_name=train_${RANDOM} +num_nonsil_states=1 + +valid="dev_other" +train="train" +mono_size="-1" # 2000 +tri1_size="-1" # 5000 +tri2b_size="-1" # 10000 +tri3b_size="-1" # 10000 + +# Acoustic model parameters +numLeavesTri1=2000 +numGaussTri1=10000 +numLeavesMLLT=2500 +numGaussMLLT=15000 +numLeavesSAT=2500 +numGaussSAT=15000 + +stage=1 +max_stage=1 + +. ./cmd.sh +. ./path.sh +. parse_options.sh + +data=$1 +lang=$2 +lang_test=$3 + +exp_root=$out_root/$out_name + +# you might not want to do this for interactive shells. +set -e + + +if [ $stage -le 1 ] && [ $max_stage -ge 1 ]; then + # train a monophone system + if [ ! $mono_size -eq -1 ]; then + utils/subset_data_dir.sh $data/$train $mono_size $data/${train}_${mono_size} + mono_train=${train}_${mono_size} + else + mono_train=${train} + fi + + steps/train_mono.sh --boost-silence 1.25 --nj 20 --cmd "$train_cmd" \ + --initial-beam 40 --regular-beam 60 --retry-beam 120 \ + $data/$mono_train $lang $exp_root/mono + + utils/mkgraph.sh $lang_test $exp_root/mono $exp_root/mono/graph + steps/decode.sh --nj 20 --cmd "$decode_cmd" \ + $exp_root/mono/graph $data/$valid $exp_root/mono/decode_$valid & +fi + + +if [ $stage -le 2 ] && [ $max_stage -ge 2 ]; then + # train a first delta + delta-delta triphone system on a subset of 5000 utterances + if [ ! 
$tri1_size -eq -1 ]; then + utils/subset_data_dir.sh $data/$train $tri1_size $data/${train}_${tri1_size} + tri1_train=${train}_${tri1_size} + else + tri1_train=${train} + fi + + steps/align_si.sh --boost-silence 1.25 --nj 10 --cmd "$train_cmd" \ + $data/$tri1_train $lang \ + $exp_root/mono $exp_root/mono_ali_${tri1_train} + + steps_gan/train_deltas.sh --boost-silence 1.25 --cmd "$train_cmd" \ + --num_nonsil_states $num_nonsil_states $numLeavesTri1 $numGaussTri1 \ + $data/$tri1_train $lang \ + $exp_root/mono_ali_${tri1_train} $exp_root/tri1 + + utils/mkgraph.sh $lang_test $exp_root/tri1 $exp_root/tri1/graph + steps/decode.sh --nj 20 --cmd "$decode_cmd" \ + $exp_root/tri1/graph $data/$valid $exp_root/tri1/decode_$valid & +fi + +if [ $stage -le 3 ] && [ $max_stage -ge 3 ]; then + # train an LDA+MLLT system. + if [ ! $tri2b_size -eq -1 ]; then + utils/subset_data_dir.sh $data/$train $tri2b_size $data/${train}_${tri2b_size} + tri2b_train=${train}_${tri2b_size} + else + tri2b_train=${train} + fi + + steps/align_si.sh --nj 10 --cmd "$train_cmd" \ + $data/$tri2b_train $lang \ + $exp_root/tri1 $exp_root/tri1_ali_${tri2b_train} + + steps_gan/train_lda_mllt.sh --cmd "$train_cmd" \ + --num_nonsil_states $num_nonsil_states \ + --splice-opts "--left-context=3 --right-context=3" $numLeavesMLLT $numGaussMLLT \ + $data/$tri2b_train $lang \ + $exp_root/tri1_ali_${tri2b_train} $exp_root/tri2b + + utils/mkgraph.sh $lang_test $exp_root/tri2b $exp_root/tri2b/graph + steps/decode.sh --nj 20 --cmd "$decode_cmd" \ + $exp_root/tri2b/graph $data/$valid $exp_root/tri2b/decode_$valid & +fi + + +if [ $stage -le 4 ] && [ $max_stage -ge 4 ]; then + # Train tri3b, which is LDA+MLLT+SAT on 10k utts + if [ ! $tri3b_size -eq -1 ]; then + utils/subset_data_dir.sh $data/$train $tri3b_size $data/${train}_${tri3b_size} + tri3b_train=${train}_${tri3b_size} + else + tri3b_train=${train} + fi + + steps/align_si.sh --nj 10 --cmd "$train_cmd" --use-graphs true \ + $data/$tri3b_train $lang \ + $exp_root/tri2b $exp_root/tri2b_ali_${tri2b_train} + + steps_gan/train_sat.sh --cmd "$train_cmd" \ + --num_nonsil_states $num_nonsil_states $numLeavesSAT $numGaussSAT \ + $data/$tri3b_train $lang \ + $exp_root/tri2b_ali_${tri2b_train} $exp_root/tri3b + + utils/mkgraph.sh $lang_test $exp_root/tri3b $exp_root/tri3b/graph + steps/decode_fmllr.sh --nj 20 --cmd "$decode_cmd" \ + $exp_root/tri3b/graph $data/$valid $exp_root/tri3b/decode_$valid & +fi + +wait diff --git a/SpeechT5/fairseq/examples/wav2vec/unsupervised/kaldi_self_train/st/local/unsup_select.py b/SpeechT5/fairseq/examples/wav2vec/unsupervised/kaldi_self_train/st/local/unsup_select.py new file mode 100644 index 0000000000000000000000000000000000000000..1122c88c1964d8beead63bc8dfe21d41602b83bc --- /dev/null +++ b/SpeechT5/fairseq/examples/wav2vec/unsupervised/kaldi_self_train/st/local/unsup_select.py @@ -0,0 +1,135 @@ +""" +Implement unsupervised metric for decoding hyperparameter selection: + $$ alpha * LM_PPL + ViterbitUER(%) * 100 $$ +""" +import argparse +import logging +import math +import sys + +import kenlm +import editdistance +from g2p_en import G2p + +logging.root.setLevel(logging.INFO) +logging.basicConfig(stream=sys.stdout, level=logging.INFO) +logger = logging.getLogger(__name__) + + +def get_parser(): + parser = argparse.ArgumentParser() + parser.add_argument("ref_tra", help="reference pseudo labels") + parser.add_argument("hyp_tra", help="decoded pseudo labels to be assess") + parser.add_argument("--kenlm_path", 
default="/checkpoint/abaevski/data/speech/libri/librispeech_lm_novox.phnc_o5.bin", help="") + parser.add_argument("--uppercase", action="store_true", help="") + parser.add_argument("--skipwords", default="", help="") + parser.add_argument("--gt_tra", default="", help="ground truth pseudo labels for computing oracle WER") + parser.add_argument("--min_vt_uer", default=0.0, type=float) + parser.add_argument("--phonemize", action="store_true", help="phonemize word hypotheses, used when reference is phone transcript") + parser.add_argument("--phonemize_lexicon", default="", type=str, help="use a lexicon for phonemizing") + return parser + +def load_tra(tra_path): + with open(tra_path, "r") as f: + uid_to_tra = {} + for line in f: + toks = line.rstrip().split() + uid, tra = toks[0], " ".join(toks[1:]) + uid_to_tra[uid] = tra + logger.debug(f"loaded {len(uid_to_tra)} utterances from {tra_path}") + return uid_to_tra + +def load_lex(lex_path): + with open(lex_path, "r") as f: + w2p = {} + for line in f: + w, p = line.rstrip().split(None, 1) + w2p[w] = p.split() + return w2p + +def compute_wer(ref_uid_to_tra, hyp_uid_to_tra, g2p, g2p_dict): + d_cnt = 0 + w_cnt = 0 + w_cnt_h = 0 + for uid in hyp_uid_to_tra: + ref = ref_uid_to_tra[uid].split() + if g2p_dict is not None: + hyp = [] + for word in hyp_uid_to_tra[uid].split(): + if word in g2p_dict: + hyp = hyp + g2p_dict[word] + else: + logger.warning(f"{word} not in g2p_dict") + elif g2p is not None: + hyp = g2p(hyp_uid_to_tra[uid]) + hyp = [p for p in hyp if p != "'" and p != " "] + hyp = [p[:-1] if p[-1].isnumeric() else p for p in hyp] + else: + hyp = hyp_uid_to_tra[uid].split() + logger.debug(( + f"======================\n" + f"HYP: {' '.join(hyp)}\n" + f"REF: {' '.join(ref)}" + )) + d_cnt += editdistance.eval(ref, hyp) + w_cnt += len(ref) + w_cnt_h += len(hyp) + wer = float(d_cnt) / w_cnt + logger.debug(( + f"wer = {wer*100:.2f}%; num. of ref words = {w_cnt}; " + f"num. of hyp words = {w_cnt_h}; num. of sentences = {len(ref_uid_to_tra)}" + )) + return wer + +def compute_lm_ppl(hyp_uid_to_tra, score_fn): + lm_score = 0. + w_cnt = 0 + for hyp in hyp_uid_to_tra.values(): + cur_score = score_fn(hyp) + cur_cnt = len(hyp.split()) + 1 # plus one for </s> + lm_score += cur_score + w_cnt += cur_cnt + logger.debug(( + f"======================\n" + f"score sum/avg = {cur_score:.2f}/{cur_score/cur_cnt:.2f}\n" + f"hyp = {hyp}" + )) + lm_ppl = math.pow(10, -lm_score / w_cnt) + logger.debug(f"lm ppl = {lm_ppl:.2f}; num. 
of words = {w_cnt}") + return lm_ppl + +def main(): + args = get_parser().parse_args() + logger.debug(f"Args: {args}") + + ref_uid_to_tra = load_tra(args.ref_tra) + hyp_uid_to_tra = load_tra(args.hyp_tra) + assert not bool(set(hyp_uid_to_tra.keys()) - set(ref_uid_to_tra.keys())) + + lm = kenlm.Model(args.kenlm_path) + skipwords = set(args.skipwords.split(",")) + def compute_lm_score(s): + s = " ".join(w for w in s.split() if w not in skipwords) + s = s.upper() if args.uppercase else s + return lm.score(s) + + g2p, g2p_dict = None, None + if args.phonemize: + if args.phonemize_lexicon: + g2p_dict = load_lex(args.phonemize_lexicon) + else: + g2p = G2p() + + wer = compute_wer(ref_uid_to_tra, hyp_uid_to_tra, g2p, g2p_dict) + lm_ppl = compute_lm_ppl(hyp_uid_to_tra, compute_lm_score) + + gt_wer = -math.inf + if args.gt_tra: + gt_uid_to_tra = load_tra(args.gt_tra) + gt_wer = compute_wer(gt_uid_to_tra, hyp_uid_to_tra, None, None) + + score = math.log(lm_ppl) * max(wer, args.min_vt_uer) + logging.info(f"{args.hyp_tra}: score={score:.4f}; wer={wer*100:.2f}%; lm_ppl={lm_ppl:.4f}; gt_wer={gt_wer*100:.2f}%") + +if __name__ == "__main__": + main() diff --git a/SpeechT5/fairseq/examples/wav2vec/unsupervised/kaldi_self_train/st/local/unsup_select_decode.sh b/SpeechT5/fairseq/examples/wav2vec/unsupervised/kaldi_self_train/st/local/unsup_select_decode.sh new file mode 100644 index 0000000000000000000000000000000000000000..b34c5b6e0688914a53515162f817a93617b609e5 --- /dev/null +++ b/SpeechT5/fairseq/examples/wav2vec/unsupervised/kaldi_self_train/st/local/unsup_select_decode.sh @@ -0,0 +1,37 @@ +#!/bin/bash + +split="dev_other" +ref_txt="" # ground truth transcript path +psd_txt="" # pseudo transcript path +get_best_wer=true +dec_name="decode" +graph_name="graph" +kenlm_path=/checkpoint/abaevski/data/speech/libri/librispeech_lm_novox.phnc_o6.bin + +. ./cmd.sh +. ./path.sh +. parse_options.sh + +exp_root=$1 +unsup_args="" +if [ $# -ge 2 ]; then + unsup_args=$2 +fi + +set -eu + +if [ ! -z $ref_txt ] && $get_best_wer; then + echo "==== WER w.r.t. real transcript (select based on unsupervised metric)" + for x in $exp_root/*/${dec_name}_${split}*; do + lang=$(dirname $x)/$graph_name + + ( + for tra in $x/scoring/*.tra; do + cat $tra | utils/int2sym.pl -f 2- $lang/words.txt | sed 's:<UNK>::g' | sed 's:<SIL>::g' > $tra.txt + python local/unsup_select.py $psd_txt $tra.txt --kenlm_path $kenlm_path --gt_tra $ref_txt $unsup_args + done 2>/dev/null | grep "score=" | sed 's/=/ /g' | sed 's/;//g' | sort -k3n | head -n1 + ) & + done +fi +wait + diff --git a/SpeechT5/fairseq/examples/wav2vec/unsupervised/kaldi_self_train/st/local/unsup_select_decode_word.sh b/SpeechT5/fairseq/examples/wav2vec/unsupervised/kaldi_self_train/st/local/unsup_select_decode_word.sh new file mode 100644 index 0000000000000000000000000000000000000000..c10a6b8809b77bca2b2c02df8b8702725bdd51c7 --- /dev/null +++ b/SpeechT5/fairseq/examples/wav2vec/unsupervised/kaldi_self_train/st/local/unsup_select_decode_word.sh @@ -0,0 +1,35 @@ +#!/bin/bash + +split="dev_other" +ref_txt="" # ground truth transcript path +psd_txt="" # pseudo transcript path +get_best_wer=true +dec_name="decode" +graph_name="graph" +kenlm_path=/checkpoint/abaevski/data/speech/libri/librispeech_lm_novox.phnc_o6.bin +phonemize_lexicon="" + +. ./cmd.sh +. ./path.sh +. parse_options.sh +. /private/home/wnhsu/unsup_asr/fairseq-py-unsup/env.sh + +exp_root=$1 + +set -eu + +if [ ! -z $ref_txt ] && $get_best_wer; then + echo "==== WER w.r.t. 
real transcript (select based on unsupervised metric)" + for x in $exp_root/*/${dec_name}_${split}*; do + lang=$(dirname $x)/$graph_name + + for tra in $x/scoring/*.tra; do + cat $tra | utils/int2sym.pl -f 2- $lang/words.txt | sed 's:\<UNK\>::g' > $tra.txt + python local/unsup_select.py $psd_txt $tra.txt \ + --kenlm_path $kenlm_path --gt_tra $ref_txt --phonemize \ + --phonemize_lexicon "$phonemize_lexicon" + done | grep "score=" | sed 's/=/ /g' | sed 's/;//g' | sort -k3n | head -n1 + done +fi + + diff --git a/SpeechT5/fairseq/examples/wav2vec/unsupervised/kaldi_self_train/st/path.sh b/SpeechT5/fairseq/examples/wav2vec/unsupervised/kaldi_self_train/st/path.sh new file mode 100644 index 0000000000000000000000000000000000000000..1a6fb5f891b55d9fd978cfe54565f112f7eedce7 --- /dev/null +++ b/SpeechT5/fairseq/examples/wav2vec/unsupervised/kaldi_self_train/st/path.sh @@ -0,0 +1,5 @@ +export KALDI_ROOT=`pwd`/../../.. +export PATH=$PWD/utils/:$KALDI_ROOT/tools/openfst/bin:$PWD:$PATH +[ ! -f $KALDI_ROOT/tools/config/common_path.sh ] && echo >&2 "The standard file $KALDI_ROOT/tools/config/common_path.sh is not present -> Exit!" && exit 1 +. $KALDI_ROOT/tools/config/common_path.sh +export LC_ALL=C diff --git a/SpeechT5/fairseq/examples/wav2vec/unsupervised/kaldi_self_train/st/steps_gan/train_deltas.sh b/SpeechT5/fairseq/examples/wav2vec/unsupervised/kaldi_self_train/st/steps_gan/train_deltas.sh new file mode 100644 index 0000000000000000000000000000000000000000..af68715ab0d87ae40666596d9d877d593684f8e2 --- /dev/null +++ b/SpeechT5/fairseq/examples/wav2vec/unsupervised/kaldi_self_train/st/steps_gan/train_deltas.sh @@ -0,0 +1,175 @@ +#!/usr/bin/env bash + +# Copyright 2012 Johns Hopkins University (Author: Daniel Povey) +# Apache 2.0 + +# Begin configuration. +stage=-4 # This allows restarting after partway, when something when wrong. +config= +cmd=run.pl +scale_opts="--transition-scale=1.0 --acoustic-scale=0.1 --self-loop-scale=0.1" +realign_iters="10 20 30"; +num_iters=35 # Number of iterations of training +max_iter_inc=25 # Last iter to increase #Gauss on. +beam=10 +careful=false +retry_beam=40 +boost_silence=1.0 # Factor by which to boost silence likelihoods in alignment +power=0.25 # Exponent for number of gaussians according to occurrence counts +cluster_thresh=-1 # for build-tree control final bottom-up clustering of leaves +norm_vars=false # deprecated. Prefer --cmvn-opts "--norm-vars=true" + # use the option --cmvn-opts "--norm-means=false" +cmvn_opts= +delta_opts= +context_opts= # use"--context-width=5 --central-position=2" for quinphone +num_nonsil_states=3 +# End configuration. + +echo "$0 $@" # Print the command line for logging + +[ -f path.sh ] && . ./path.sh; +. parse_options.sh || exit 1; + +if [ $# != 6 ]; then + echo "Usage: steps/train_deltas.sh <num-leaves> <tot-gauss> <data-dir> <lang-dir> <alignment-dir> <exp-dir>" + echo "e.g.: steps/train_deltas.sh 2000 10000 data/train_si84_half data/lang exp/mono_ali exp/tri1" + echo "main options (for others, see top of script file)" + echo " --cmd (utils/run.pl|utils/queue.pl <queue opts>) # how to run jobs." + echo " --config <config-file> # config containing options" + echo " --stage <stage> # stage to do partial re-run from." + exit 1; +fi + +numleaves=$1 +totgauss=$2 +data=$3 +lang=$4 +alidir=$5 +dir=$6 + +for f in $alidir/final.mdl $alidir/ali.1.gz $data/feats.scp $lang/phones.txt; do + [ ! 
-f $f ] && echo "train_deltas.sh: no such file $f" && exit 1; +done + +numgauss=$numleaves +incgauss=$[($totgauss-$numgauss)/$max_iter_inc] # per-iter increment for #Gauss +oov=`cat $lang/oov.int` || exit 1; +ciphonelist=`cat $lang/phones/context_indep.csl` || exit 1; +nj=`cat $alidir/num_jobs` || exit 1; +mkdir -p $dir/log +echo $nj > $dir/num_jobs + +utils/lang/check_phones_compatible.sh $lang/phones.txt $alidir/phones.txt || exit 1; +cp $lang/phones.txt $dir || exit 1; + +sdata=$data/split$nj; +split_data.sh $data $nj || exit 1; + + +[ $(cat $alidir/cmvn_opts 2>/dev/null | wc -c) -gt 1 ] && [ -z "$cmvn_opts" ] && \ + echo "$0: warning: ignoring CMVN options from source directory $alidir" +$norm_vars && cmvn_opts="--norm-vars=true $cmvn_opts" +echo $cmvn_opts > $dir/cmvn_opts # keep track of options to CMVN. +[ ! -z $delta_opts ] && echo $delta_opts > $dir/delta_opts + +feats="ark,s,cs:apply-cmvn $cmvn_opts --utt2spk=ark:$sdata/JOB/utt2spk scp:$sdata/JOB/cmvn.scp scp:$sdata/JOB/feats.scp ark:- | add-deltas $delta_opts ark:- ark:- |" + +rm $dir/.error 2>/dev/null + +if [ $stage -le -3 ]; then + echo "$0: accumulating tree stats" + $cmd JOB=1:$nj $dir/log/acc_tree.JOB.log \ + acc-tree-stats $context_opts \ + --ci-phones=$ciphonelist $alidir/final.mdl "$feats" \ + "ark:gunzip -c $alidir/ali.JOB.gz|" $dir/JOB.treeacc || exit 1; + sum-tree-stats $dir/treeacc $dir/*.treeacc 2>$dir/log/sum_tree_acc.log || exit 1; + rm $dir/*.treeacc +fi + +if [ $stage -le -2 ]; then + echo "$0: getting questions for tree-building, via clustering" + # preparing questions, roots file... + cluster-phones --pdf-class-list=$(($num_nonsil_states / 2)) $context_opts \ + $dir/treeacc $lang/phones/sets.int \ + $dir/questions.int 2> $dir/log/questions.log || exit 1; + cat $lang/phones/extra_questions.int >> $dir/questions.int + compile-questions $context_opts $lang/topo $dir/questions.int \ + $dir/questions.qst 2>$dir/log/compile_questions.log || exit 1; + + echo "$0: building the tree" + $cmd $dir/log/build_tree.log \ + build-tree $context_opts --verbose=1 --max-leaves=$numleaves \ + --cluster-thresh=$cluster_thresh $dir/treeacc $lang/phones/roots.int \ + $dir/questions.qst $lang/topo $dir/tree || exit 1; + + $cmd $dir/log/init_model.log \ + gmm-init-model --write-occs=$dir/1.occs \ + $dir/tree $dir/treeacc $lang/topo $dir/1.mdl || exit 1; + if grep 'no stats' $dir/log/init_model.log; then + echo "** The warnings above about 'no stats' generally mean you have phones **" + echo "** (or groups of phones) in your phone set that had no corresponding data. **" + echo "** You should probably figure out whether something went wrong, **" + echo "** or whether your data just doesn't happen to have examples of those **" + echo "** phones. **" + fi + + gmm-mixup --mix-up=$numgauss $dir/1.mdl $dir/1.occs $dir/1.mdl 2>$dir/log/mixup.log || exit 1; + rm $dir/treeacc +fi + +if [ $stage -le -1 ]; then + # Convert the alignments. 
+ echo "$0: converting alignments from $alidir to use current tree" + $cmd JOB=1:$nj $dir/log/convert.JOB.log \ + convert-ali $alidir/final.mdl $dir/1.mdl $dir/tree \ + "ark:gunzip -c $alidir/ali.JOB.gz|" "ark:|gzip -c >$dir/ali.JOB.gz" || exit 1; +fi + +if [ $stage -le 0 ]; then + echo "$0: compiling graphs of transcripts" + $cmd JOB=1:$nj $dir/log/compile_graphs.JOB.log \ + compile-train-graphs --read-disambig-syms=$lang/phones/disambig.int $dir/tree $dir/1.mdl $lang/L.fst \ + "ark:utils/sym2int.pl --map-oov $oov -f 2- $lang/words.txt < $sdata/JOB/text |" \ + "ark:|gzip -c >$dir/fsts.JOB.gz" || exit 1; +fi + +x=1 +while [ $x -lt $num_iters ]; do + echo "$0: training pass $x" + if [ $stage -le $x ]; then + if echo $realign_iters | grep -w $x >/dev/null; then + echo "$0: aligning data" + mdl="gmm-boost-silence --boost=$boost_silence `cat $lang/phones/optional_silence.csl` $dir/$x.mdl - |" + $cmd JOB=1:$nj $dir/log/align.$x.JOB.log \ + gmm-align-compiled $scale_opts --beam=$beam --retry-beam=$retry_beam --careful=$careful "$mdl" \ + "ark:gunzip -c $dir/fsts.JOB.gz|" "$feats" \ + "ark:|gzip -c >$dir/ali.JOB.gz" || exit 1; + fi + $cmd JOB=1:$nj $dir/log/acc.$x.JOB.log \ + gmm-acc-stats-ali $dir/$x.mdl "$feats" \ + "ark,s,cs:gunzip -c $dir/ali.JOB.gz|" $dir/$x.JOB.acc || exit 1; + $cmd $dir/log/update.$x.log \ + gmm-est --mix-up=$numgauss --power=$power \ + --write-occs=$dir/$[$x+1].occs $dir/$x.mdl \ + "gmm-sum-accs - $dir/$x.*.acc |" $dir/$[$x+1].mdl || exit 1; + rm $dir/$x.mdl $dir/$x.*.acc + rm $dir/$x.occs + fi + [ $x -le $max_iter_inc ] && numgauss=$[$numgauss+$incgauss]; + x=$[$x+1]; +done + +rm $dir/final.mdl $dir/final.occs 2>/dev/null +ln -s $x.mdl $dir/final.mdl +ln -s $x.occs $dir/final.occs + +steps/diagnostic/analyze_alignments.sh --cmd "$cmd" $lang $dir + +# Summarize warning messages... +utils/summarize_warnings.pl $dir/log + +steps/info/gmm_dir_info.pl $dir + +echo "$0: Done training system with delta+delta-delta features in $dir" + +exit 0 diff --git a/SpeechT5/fairseq/examples/wav2vec/unsupervised/kaldi_self_train/st/steps_gan/train_lda_mllt.sh b/SpeechT5/fairseq/examples/wav2vec/unsupervised/kaldi_self_train/st/steps_gan/train_lda_mllt.sh new file mode 100644 index 0000000000000000000000000000000000000000..9d8c319ce848e431ec47a3548156347ae3b50ced --- /dev/null +++ b/SpeechT5/fairseq/examples/wav2vec/unsupervised/kaldi_self_train/st/steps_gan/train_lda_mllt.sh @@ -0,0 +1,239 @@ +#!/usr/bin/env bash + +# Copyright 2012 Johns Hopkins University (Author: Daniel Povey) +# +# LDA+MLLT refers to the way we transform the features after computing +# the MFCCs: we splice across several frames, reduce the dimension (to 40 +# by default) using Linear Discriminant Analysis), and then later estimate, +# over multiple iterations, a diagonalizing transform known as MLLT or STC. +# See http://kaldi-asr.org/doc/transform.html for more explanation. +# +# Apache 2.0. + +# Begin configuration. +cmd=run.pl +config= +stage=-5 +scale_opts="--transition-scale=1.0 --acoustic-scale=0.1 --self-loop-scale=0.1" +realign_iters="10 20 30"; +mllt_iters="2 4 6 12"; +num_iters=35 # Number of iterations of training +max_iter_inc=25 # Last iter to increase #Gauss on. +dim=40 +beam=10 +retry_beam=40 +careful=false +boost_silence=1.0 # Factor by which to boost silence likelihoods in alignment +power=0.25 # Exponent for number of gaussians according to occurrence counts +randprune=4.0 # This is approximately the ratio by which we will speed up the + # LDA and MLLT calculations via randomized pruning. 
+splice_opts= +cluster_thresh=-1 # for build-tree control final bottom-up clustering of leaves +norm_vars=false # deprecated. Prefer --cmvn-opts "--norm-vars=false" +cmvn_opts= +context_opts= # use "--context-width=5 --central-position=2" for quinphone. +# End configuration. +train_tree=true # if false, don't actually train the tree. +use_lda_mat= # If supplied, use this LDA[+MLLT] matrix. +num_nonsil_states=3 + +echo "$0 $@" # Print the command line for logging + +[ -f path.sh ] && . ./path.sh +. parse_options.sh || exit 1; + +if [ $# != 6 ]; then + echo "Usage: steps/train_lda_mllt.sh [options] <#leaves> <#gauss> <data> <lang> <alignments> <dir>" + echo " e.g.: steps/train_lda_mllt.sh 2500 15000 data/train_si84 data/lang exp/tri1_ali_si84 exp/tri2b" + echo "Main options (for others, see top of script file)" + echo " --cmd (utils/run.pl|utils/queue.pl <queue opts>) # how to run jobs." + echo " --config <config-file> # config containing options" + echo " --stage <stage> # stage to do partial re-run from." + exit 1; +fi + +numleaves=$1 +totgauss=$2 +data=$3 +lang=$4 +alidir=$5 +dir=$6 + +for f in $alidir/final.mdl $alidir/ali.1.gz $data/feats.scp $lang/phones.txt; do + [ ! -f $f ] && echo "train_lda_mllt.sh: no such file $f" && exit 1; +done + +numgauss=$numleaves +incgauss=$[($totgauss-$numgauss)/$max_iter_inc] # per-iter #gauss increment +oov=`cat $lang/oov.int` || exit 1; +nj=`cat $alidir/num_jobs` || exit 1; +silphonelist=`cat $lang/phones/silence.csl` || exit 1; +ciphonelist=`cat $lang/phones/context_indep.csl` || exit 1; + +mkdir -p $dir/log + +utils/lang/check_phones_compatible.sh $lang/phones.txt $alidir/phones.txt || exit 1; +cp $lang/phones.txt $dir || exit 1; + +echo $nj >$dir/num_jobs +echo "$splice_opts" >$dir/splice_opts # keep track of frame-splicing options + # so that later stages of system building can know what they were. + + +[ $(cat $alidir/cmvn_opts 2>/dev/null | wc -c) -gt 1 ] && [ -z "$cmvn_opts" ] && \ + echo "$0: warning: ignoring CMVN options from source directory $alidir" +$norm_vars && cmvn_opts="--norm-vars=true $cmvn_opts" +echo $cmvn_opts > $dir/cmvn_opts # keep track of options to CMVN. + +sdata=$data/split$nj; +split_data.sh $data $nj || exit 1; + +splicedfeats="ark,s,cs:apply-cmvn $cmvn_opts --utt2spk=ark:$sdata/JOB/utt2spk scp:$sdata/JOB/cmvn.scp scp:$sdata/JOB/feats.scp ark:- | splice-feats $splice_opts ark:- ark:- |" +# Note: $feats gets overwritten later in the script. +feats="$splicedfeats transform-feats $dir/0.mat ark:- ark:- |" + + + +if [ $stage -le -5 ]; then + if [ -z "$use_lda_mat" ]; then + echo "$0: Accumulating LDA statistics." + rm $dir/lda.*.acc 2>/dev/null + $cmd JOB=1:$nj $dir/log/lda_acc.JOB.log \ + ali-to-post "ark:gunzip -c $alidir/ali.JOB.gz|" ark:- \| \ + weight-silence-post 0.0 $silphonelist $alidir/final.mdl ark:- ark:- \| \ + acc-lda --rand-prune=$randprune $alidir/final.mdl "$splicedfeats" ark,s,cs:- \ + $dir/lda.JOB.acc || exit 1; + est-lda --write-full-matrix=$dir/full.mat --dim=$dim $dir/0.mat $dir/lda.*.acc \ + 2>$dir/log/lda_est.log || exit 1; + rm $dir/lda.*.acc + else + echo "$0: Using supplied LDA matrix $use_lda_mat" + cp $use_lda_mat $dir/0.mat || exit 1; + [ ! 
-z "$mllt_iters" ] && \ + echo "$0: Warning: using supplied LDA matrix $use_lda_mat but we will do MLLT," && \ + echo " which you might not want; to disable MLLT, specify --mllt-iters ''" && \ + sleep 5 + fi +fi + +cur_lda_iter=0 + +if [ $stage -le -4 ] && $train_tree; then + echo "$0: Accumulating tree stats" + $cmd JOB=1:$nj $dir/log/acc_tree.JOB.log \ + acc-tree-stats $context_opts \ + --ci-phones=$ciphonelist $alidir/final.mdl "$feats" \ + "ark:gunzip -c $alidir/ali.JOB.gz|" $dir/JOB.treeacc || exit 1; + [ `ls $dir/*.treeacc | wc -w` -ne "$nj" ] && echo "$0: Wrong #tree-accs" && exit 1; + $cmd $dir/log/sum_tree_acc.log \ + sum-tree-stats $dir/treeacc $dir/*.treeacc || exit 1; + rm $dir/*.treeacc +fi + + +if [ $stage -le -3 ] && $train_tree; then + echo "$0: Getting questions for tree clustering." + # preparing questions, roots file... + cluster-phones --pdf-class-list=$(($num_nonsil_states / 2)) $context_opts $dir/treeacc $lang/phones/sets.int \ + $dir/questions.int 2> $dir/log/questions.log || exit 1; + cat $lang/phones/extra_questions.int >> $dir/questions.int + compile-questions $context_opts $lang/topo $dir/questions.int \ + $dir/questions.qst 2>$dir/log/compile_questions.log || exit 1; + + echo "$0: Building the tree" + $cmd $dir/log/build_tree.log \ + build-tree $context_opts --verbose=1 --max-leaves=$numleaves \ + --cluster-thresh=$cluster_thresh $dir/treeacc $lang/phones/roots.int \ + $dir/questions.qst $lang/topo $dir/tree || exit 1; +fi + +if [ $stage -le -2 ]; then + echo "$0: Initializing the model" + if $train_tree; then + gmm-init-model --write-occs=$dir/1.occs \ + $dir/tree $dir/treeacc $lang/topo $dir/1.mdl 2> $dir/log/init_model.log || exit 1; + grep 'no stats' $dir/log/init_model.log && echo "This is a bad warning."; + rm $dir/treeacc + else + cp $alidir/tree $dir/ || exit 1; + $cmd JOB=1 $dir/log/init_model.log \ + gmm-init-model-flat $dir/tree $lang/topo $dir/1.mdl \ + "$feats subset-feats ark:- ark:-|" || exit 1; + fi +fi + + +if [ $stage -le -1 ]; then + # Convert the alignments. 
+ echo "$0: Converting alignments from $alidir to use current tree" + $cmd JOB=1:$nj $dir/log/convert.JOB.log \ + convert-ali $alidir/final.mdl $dir/1.mdl $dir/tree \ + "ark:gunzip -c $alidir/ali.JOB.gz|" "ark:|gzip -c >$dir/ali.JOB.gz" || exit 1; +fi + +if [ $stage -le 0 ] && [ "$realign_iters" != "" ]; then + echo "$0: Compiling graphs of transcripts" + $cmd JOB=1:$nj $dir/log/compile_graphs.JOB.log \ + compile-train-graphs --read-disambig-syms=$lang/phones/disambig.int $dir/tree $dir/1.mdl $lang/L.fst \ + "ark:utils/sym2int.pl --map-oov $oov -f 2- $lang/words.txt < $data/split$nj/JOB/text |" \ + "ark:|gzip -c >$dir/fsts.JOB.gz" || exit 1; +fi + + +x=1 +while [ $x -lt $num_iters ]; do + echo Training pass $x + if echo $realign_iters | grep -w $x >/dev/null && [ $stage -le $x ]; then + echo Aligning data + mdl="gmm-boost-silence --boost=$boost_silence `cat $lang/phones/optional_silence.csl` $dir/$x.mdl - |" + $cmd JOB=1:$nj $dir/log/align.$x.JOB.log \ + gmm-align-compiled $scale_opts --beam=$beam --retry-beam=$retry_beam --careful=$careful "$mdl" \ + "ark:gunzip -c $dir/fsts.JOB.gz|" "$feats" \ + "ark:|gzip -c >$dir/ali.JOB.gz" || exit 1; + fi + if echo $mllt_iters | grep -w $x >/dev/null; then + if [ $stage -le $x ]; then + echo "$0: Estimating MLLT" + $cmd JOB=1:$nj $dir/log/macc.$x.JOB.log \ + ali-to-post "ark:gunzip -c $dir/ali.JOB.gz|" ark:- \| \ + weight-silence-post 0.0 $silphonelist $dir/$x.mdl ark:- ark:- \| \ + gmm-acc-mllt --rand-prune=$randprune $dir/$x.mdl "$feats" ark:- $dir/$x.JOB.macc \ + || exit 1; + est-mllt $dir/$x.mat.new $dir/$x.*.macc 2> $dir/log/mupdate.$x.log || exit 1; + gmm-transform-means $dir/$x.mat.new $dir/$x.mdl $dir/$x.mdl \ + 2> $dir/log/transform_means.$x.log || exit 1; + compose-transforms --print-args=false $dir/$x.mat.new $dir/$cur_lda_iter.mat $dir/$x.mat || exit 1; + rm $dir/$x.*.macc + fi + feats="$splicedfeats transform-feats $dir/$x.mat ark:- ark:- |" + cur_lda_iter=$x + fi + + if [ $stage -le $x ]; then + $cmd JOB=1:$nj $dir/log/acc.$x.JOB.log \ + gmm-acc-stats-ali $dir/$x.mdl "$feats" \ + "ark,s,cs:gunzip -c $dir/ali.JOB.gz|" $dir/$x.JOB.acc || exit 1; + $cmd $dir/log/update.$x.log \ + gmm-est --write-occs=$dir/$[$x+1].occs --mix-up=$numgauss --power=$power \ + $dir/$x.mdl "gmm-sum-accs - $dir/$x.*.acc |" $dir/$[$x+1].mdl || exit 1; + rm $dir/$x.mdl $dir/$x.*.acc $dir/$x.occs + fi + [ $x -le $max_iter_inc ] && numgauss=$[$numgauss+$incgauss]; + x=$[$x+1]; +done + +rm $dir/final.{mdl,mat,occs} 2>/dev/null +ln -s $x.mdl $dir/final.mdl +ln -s $x.occs $dir/final.occs +ln -s $cur_lda_iter.mat $dir/final.mat + +steps/diagnostic/analyze_alignments.sh --cmd "$cmd" $lang $dir + +# Summarize warning messages... +utils/summarize_warnings.pl $dir/log + +steps/info/gmm_dir_info.pl $dir + +echo "$0: Done training system with LDA+MLLT features in $dir" + +exit 0 diff --git a/SpeechT5/fairseq/examples/wav2vec/unsupervised/kaldi_self_train/st/steps_gan/train_sat.sh b/SpeechT5/fairseq/examples/wav2vec/unsupervised/kaldi_self_train/st/steps_gan/train_sat.sh new file mode 100644 index 0000000000000000000000000000000000000000..f75afafb1c4ad04ee71ab8541064ab0477430616 --- /dev/null +++ b/SpeechT5/fairseq/examples/wav2vec/unsupervised/kaldi_self_train/st/steps_gan/train_sat.sh @@ -0,0 +1,281 @@ +#!/usr/bin/env bash +# Copyright 2012 Johns Hopkins University (Author: Daniel Povey). Apache 2.0. + + +# This does Speaker Adapted Training (SAT), i.e. train on +# fMLLR-adapted features. It can be done on top of either LDA+MLLT, or +# delta and delta-delta features. 
If there are no transforms supplied +# in the alignment directory, it will estimate transforms itself before +# building the tree (and in any case, it estimates transforms a number +# of times during training). + + +# Begin configuration section. +stage=-5 +exit_stage=-100 # you can use this to require it to exit at the + # beginning of a specific stage. Not all values are + # supported. +fmllr_update_type=full +cmd=run.pl +scale_opts="--transition-scale=1.0 --acoustic-scale=0.1 --self-loop-scale=0.1" +beam=10 +retry_beam=40 +careful=false +boost_silence=1.0 # Factor by which to boost silence likelihoods in alignment +context_opts= # e.g. set this to "--context-width 5 --central-position 2" for quinphone. +realign_iters="10 20 30"; +fmllr_iters="2 4 6 12"; +silence_weight=0.0 # Weight on silence in fMLLR estimation. +num_iters=35 # Number of iterations of training +max_iter_inc=25 # Last iter to increase #Gauss on. +power=0.2 # Exponent for number of gaussians according to occurrence counts +cluster_thresh=-1 # for build-tree control final bottom-up clustering of leaves +phone_map= +train_tree=true +tree_stats_opts= +cluster_phones_opts= +compile_questions_opts= +# End configuration section. +num_nonsil_states=3 + +echo "$0 $@" # Print the command line for logging + +[ -f path.sh ] && . ./path.sh +. parse_options.sh || exit 1; + +if [ $# != 6 ]; then + echo "Usage: steps/train_sat.sh <#leaves> <#gauss> <data> <lang> <ali-dir> <exp-dir>" + echo " e.g.: steps/train_sat.sh 2500 15000 data/train_si84 data/lang exp/tri2b_ali_si84 exp/tri3b" + echo "Main options (for others, see top of script file)" + echo " --cmd (utils/run.pl|utils/queue.pl <queue opts>) # how to run jobs." + echo " --config <config-file> # config containing options" + echo " --stage <stage> # stage to do partial re-run from." + exit 1; +fi + +numleaves=$1 +totgauss=$2 +data=$3 +lang=$4 +alidir=$5 +dir=$6 + +for f in $data/feats.scp $lang/phones.txt $alidir/final.mdl $alidir/ali.1.gz; do + [ ! -f $f ] && echo "train_sat.sh: no such file $f" && exit 1; +done + +numgauss=$numleaves +incgauss=$[($totgauss-$numgauss)/$max_iter_inc] # per-iter #gauss increment +oov=`cat $lang/oov.int` +nj=`cat $alidir/num_jobs` || exit 1; +silphonelist=`cat $lang/phones/silence.csl` +ciphonelist=`cat $lang/phones/context_indep.csl` || exit 1; +sdata=$data/split$nj; +splice_opts=`cat $alidir/splice_opts 2>/dev/null` # frame-splicing options. +cmvn_opts=`cat $alidir/cmvn_opts 2>/dev/null` +delta_opts=`cat $alidir/delta_opts 2>/dev/null` +phone_map_opt= +[ ! -z "$phone_map" ] && phone_map_opt="--phone-map='$phone_map'" + +mkdir -p $dir/log +cp $alidir/splice_opts $dir 2>/dev/null # frame-splicing options. +cp $alidir/cmvn_opts $dir 2>/dev/null # cmn/cmvn option. +cp $alidir/delta_opts $dir 2>/dev/null # delta option. + +utils/lang/check_phones_compatible.sh $lang/phones.txt $alidir/phones.txt || exit 1; +cp $lang/phones.txt $dir || exit 1; + +echo $nj >$dir/num_jobs +[[ -d $sdata && $data/feats.scp -ot $sdata ]] || split_data.sh $data $nj || exit 1; + +# Set up features. + +if [ -f $alidir/final.mat ]; then feat_type=lda; else feat_type=delta; fi +echo "$0: feature type is $feat_type" + +## Set up speaker-independent features. 
+case $feat_type in + delta) sifeats="ark,s,cs:apply-cmvn $cmvn_opts --utt2spk=ark:$sdata/JOB/utt2spk scp:$sdata/JOB/cmvn.scp scp:$sdata/JOB/feats.scp ark:- | add-deltas $delta_opts ark:- ark:- |";; + lda) sifeats="ark,s,cs:apply-cmvn $cmvn_opts --utt2spk=ark:$sdata/JOB/utt2spk scp:$sdata/JOB/cmvn.scp scp:$sdata/JOB/feats.scp ark:- | splice-feats $splice_opts ark:- ark:- | transform-feats $alidir/final.mat ark:- ark:- |" + cp $alidir/final.mat $dir + cp $alidir/full.mat $dir 2>/dev/null + ;; + *) echo "$0: invalid feature type $feat_type" && exit 1; +esac + +## Get initial fMLLR transforms (possibly from alignment dir) +if [ -f $alidir/trans.1 ]; then + echo "$0: Using transforms from $alidir" + feats="$sifeats transform-feats --utt2spk=ark:$sdata/JOB/utt2spk ark,s,cs:$alidir/trans.JOB ark:- ark:- |" + cur_trans_dir=$alidir +else + if [ $stage -le -5 ]; then + echo "$0: obtaining initial fMLLR transforms since not present in $alidir" + # The next line is necessary because of $silphonelist otherwise being incorrect; would require + # old $lang dir which would require another option. Not needed anyway. + [ ! -z "$phone_map" ] && \ + echo "$0: error: you must provide transforms if you use the --phone-map option." && exit 1; + $cmd JOB=1:$nj $dir/log/fmllr.0.JOB.log \ + ali-to-post "ark:gunzip -c $alidir/ali.JOB.gz|" ark:- \| \ + weight-silence-post $silence_weight $silphonelist $alidir/final.mdl ark:- ark:- \| \ + gmm-est-fmllr --fmllr-update-type=$fmllr_update_type \ + --spk2utt=ark:$sdata/JOB/spk2utt $alidir/final.mdl "$sifeats" \ + ark:- ark:$dir/trans.JOB || exit 1; + fi + feats="$sifeats transform-feats --utt2spk=ark:$sdata/JOB/utt2spk ark,s,cs:$dir/trans.JOB ark:- ark:- |" + cur_trans_dir=$dir +fi + +if [ $stage -le -4 ] && $train_tree; then + # Get tree stats. + echo "$0: Accumulating tree stats" + $cmd JOB=1:$nj $dir/log/acc_tree.JOB.log \ + acc-tree-stats $context_opts $tree_stats_opts $phone_map_opt --ci-phones=$ciphonelist $alidir/final.mdl "$feats" \ + "ark:gunzip -c $alidir/ali.JOB.gz|" $dir/JOB.treeacc || exit 1; + [ "`ls $dir/*.treeacc | wc -w`" -ne "$nj" ] && echo "$0: Wrong #tree-accs" && exit 1; + $cmd $dir/log/sum_tree_acc.log \ + sum-tree-stats $dir/treeacc $dir/*.treeacc || exit 1; + rm $dir/*.treeacc +fi + +if [ $stage -le -3 ] && $train_tree; then + echo "$0: Getting questions for tree clustering." + # preparing questions, roots file... 
+ cluster-phones --pdf-class-list=$(($num_nonsil_states / 2)) \ + $cluster_phones_opts $context_opts \ + $dir/treeacc $lang/phones/sets.int $dir/questions.int 2>$dir/log/questions.log || exit 1; + cat $lang/phones/extra_questions.int >> $dir/questions.int + compile-questions $context_opts $compile_questions_opts $lang/topo $dir/questions.int $dir/questions.qst 2>$dir/log/compile_questions.log || exit 1; + + echo "$0: Building the tree" + $cmd $dir/log/build_tree.log \ + build-tree $context_opts --verbose=1 --max-leaves=$numleaves \ + --cluster-thresh=$cluster_thresh $dir/treeacc $lang/phones/roots.int \ + $dir/questions.qst $lang/topo $dir/tree || exit 1; +fi + +if [ $stage -le -2 ]; then + echo "$0: Initializing the model" + if $train_tree; then + gmm-init-model --write-occs=$dir/1.occs \ + $dir/tree $dir/treeacc $lang/topo $dir/1.mdl 2> $dir/log/init_model.log || exit 1; + grep 'no stats' $dir/log/init_model.log && echo "This is a bad warning."; + rm $dir/treeacc + else + cp $alidir/tree $dir/ || exit 1; + $cmd JOB=1 $dir/log/init_model.log \ + gmm-init-model-flat $dir/tree $lang/topo $dir/1.mdl \ + "$feats subset-feats ark:- ark:-|" || exit 1; + fi +fi + +if [ $stage -le -1 ]; then + # Convert the alignments. + echo "$0: Converting alignments from $alidir to use current tree" + $cmd JOB=1:$nj $dir/log/convert.JOB.log \ + convert-ali $phone_map_opt $alidir/final.mdl $dir/1.mdl $dir/tree \ + "ark:gunzip -c $alidir/ali.JOB.gz|" "ark:|gzip -c >$dir/ali.JOB.gz" || exit 1; +fi + +[ "$exit_stage" -eq 0 ] && echo "$0: Exiting early: --exit-stage $exit_stage" && exit 0; + +if [ $stage -le 0 ] && [ "$realign_iters" != "" ]; then + echo "$0: Compiling graphs of transcripts" + $cmd JOB=1:$nj $dir/log/compile_graphs.JOB.log \ + compile-train-graphs --read-disambig-syms=$lang/phones/disambig.int $dir/tree $dir/1.mdl $lang/L.fst \ + "ark:utils/sym2int.pl --map-oov $oov -f 2- $lang/words.txt < $sdata/JOB/text |" \ + "ark:|gzip -c >$dir/fsts.JOB.gz" || exit 1; +fi + +x=1 +while [ $x -lt $num_iters ]; do + echo Pass $x + if echo $realign_iters | grep -w $x >/dev/null && [ $stage -le $x ]; then + echo Aligning data + mdl="gmm-boost-silence --boost=$boost_silence `cat $lang/phones/optional_silence.csl` $dir/$x.mdl - |" + $cmd JOB=1:$nj $dir/log/align.$x.JOB.log \ + gmm-align-compiled $scale_opts --beam=$beam --retry-beam=$retry_beam --careful=$careful "$mdl" \ + "ark:gunzip -c $dir/fsts.JOB.gz|" "$feats" \ + "ark:|gzip -c >$dir/ali.JOB.gz" || exit 1; + fi + + if echo $fmllr_iters | grep -w $x >/dev/null; then + if [ $stage -le $x ]; then + echo Estimating fMLLR transforms + # We estimate a transform that's additional to the previous transform; + # we'll compose them. + $cmd JOB=1:$nj $dir/log/fmllr.$x.JOB.log \ + ali-to-post "ark:gunzip -c $dir/ali.JOB.gz|" ark:- \| \ + weight-silence-post $silence_weight $silphonelist $dir/$x.mdl ark:- ark:- \| \ + gmm-est-fmllr --fmllr-update-type=$fmllr_update_type \ + --spk2utt=ark:$sdata/JOB/spk2utt $dir/$x.mdl \ + "$feats" ark:- ark:$dir/tmp_trans.JOB || exit 1; + for n in `seq $nj`; do + ! 
( compose-transforms --b-is-affine=true \ + ark:$dir/tmp_trans.$n ark:$cur_trans_dir/trans.$n ark:$dir/composed_trans.$n \ + && mv $dir/composed_trans.$n $dir/trans.$n && \ + rm $dir/tmp_trans.$n ) 2>$dir/log/compose_transforms.$x.log \ + && echo "$0: Error composing transforms" && exit 1; + done + fi + feats="$sifeats transform-feats --utt2spk=ark:$sdata/JOB/utt2spk ark:$dir/trans.JOB ark:- ark:- |" + cur_trans_dir=$dir + fi + + if [ $stage -le $x ]; then + $cmd JOB=1:$nj $dir/log/acc.$x.JOB.log \ + gmm-acc-stats-ali $dir/$x.mdl "$feats" \ + "ark,s,cs:gunzip -c $dir/ali.JOB.gz|" $dir/$x.JOB.acc || exit 1; + [ `ls $dir/$x.*.acc | wc -w` -ne "$nj" ] && echo "$0: Wrong #accs" && exit 1; + $cmd $dir/log/update.$x.log \ + gmm-est --power=$power --write-occs=$dir/$[$x+1].occs --mix-up=$numgauss $dir/$x.mdl \ + "gmm-sum-accs - $dir/$x.*.acc |" $dir/$[$x+1].mdl || exit 1; + rm $dir/$x.mdl $dir/$x.*.acc + rm $dir/$x.occs + fi + [ $x -le $max_iter_inc ] && numgauss=$[$numgauss+$incgauss]; + x=$[$x+1]; +done + + +if [ $stage -le $x ]; then + # Accumulate stats for "alignment model"-- this model is + # computed with the speaker-independent features, but matches Gaussian-for-Gaussian + # with the final speaker-adapted model. + $cmd JOB=1:$nj $dir/log/acc_alimdl.JOB.log \ + ali-to-post "ark:gunzip -c $dir/ali.JOB.gz|" ark:- \| \ + gmm-acc-stats-twofeats $dir/$x.mdl "$feats" "$sifeats" \ + ark,s,cs:- $dir/$x.JOB.acc || exit 1; + [ `ls $dir/$x.*.acc | wc -w` -ne "$nj" ] && echo "$0: Wrong #accs" && exit 1; + # Update model. + $cmd $dir/log/est_alimdl.log \ + gmm-est --power=$power --remove-low-count-gaussians=false $dir/$x.mdl \ + "gmm-sum-accs - $dir/$x.*.acc|" $dir/$x.alimdl || exit 1; + rm $dir/$x.*.acc +fi + +rm $dir/final.{mdl,alimdl,occs} 2>/dev/null +ln -s $x.mdl $dir/final.mdl +ln -s $x.occs $dir/final.occs +ln -s $x.alimdl $dir/final.alimdl + + +steps/diagnostic/analyze_alignments.sh --cmd "$cmd" $lang $dir + +utils/summarize_warnings.pl $dir/log +( + echo "$0: Likelihood evolution:" + for x in `seq $[$num_iters-1]`; do + tail -n 30 $dir/log/acc.$x.*.log | awk '/Overall avg like/{l += $(NF-3)*$(NF-1); t += $(NF-1); } + /Overall average logdet/{d += $(NF-3)*$(NF-1); t2 += $(NF-1);} + END{ d /= t2; l /= t; printf("%s ", d+l); } ' + done + echo +) | tee $dir/log/summary.log + + +steps/info/gmm_dir_info.pl $dir + +echo "$0: done training SAT system in $dir" + +exit 0 diff --git a/SpeechT5/fairseq/examples/wav2vec/unsupervised/kaldi_self_train/st/train.sh b/SpeechT5/fairseq/examples/wav2vec/unsupervised/kaldi_self_train/st/train.sh new file mode 100644 index 0000000000000000000000000000000000000000..f3a3d3fc7cc98a38d8e9d523a0b43c0c8ea51bf9 --- /dev/null +++ b/SpeechT5/fairseq/examples/wav2vec/unsupervised/kaldi_self_train/st/train.sh @@ -0,0 +1,43 @@ +#!/bin/bash + +set -eu + +w2v_dir= # contains features `{train,valid}.{npy,lengths}`, real transcripts `{train,valid}.${label}`, and dict `dict.${label}.txt` +lab_dir= # contains pseudo labels `{train,valid}.txt` +out_dir= # output root +arpa_lm= # phone LM +arpa_lm_bin= # (binary) phone LM for KenLM, used in unsupervised selection + +label=phnc +train_name="train" +valid_name="valid" +data_dir=${out_dir}/data + +mkdir -p ${out_dir}/exp +local/prepare_lang.sh $w2v_dir/dict.${label}.txt $data_dir +local/prepare_lm.sh $arpa_lm $data_dir + +for x in $train_name $valid_name; do + x_gt=${x}_gt + + # prepare pseudo data + python local/prepare_data_from_w2v.py $w2v_dir $data_dir $x + steps/compute_cmvn_stats.sh $data_dir/$x $out_dir/exp/make_feat/$x 
$out_dir/feats/$x + python local/copy_aligned_text.py < $lab_dir/$x.txt > $data_dir/$x/text + + # prepare ground truth data + mkdir $data_dir/$x_gt + cp $data_dir/$x/{feats.scp,cmvn.scp,utt2spk,spk2utt} $data_dir/$x_gt/ + python local/copy_aligned_text.py < $w2v_dir/$x.$label > $data_dir/$x_gt/text +done + +local/train_subset_lgbeam.sh \ + --out_root ${out_dir} --out_name exp --train $train_name --valid $valid_name \ + --mono_size 2000 --tri1_size 5000 --tri2b_size -1 --tri3b_size -1 \ + --stage 1 --max_stage 3 $data_dir $data_dir/lang $data_dir/lang_test + +local/unsup_select_decode.sh \ + --split $valid_name --kenlm_path $arpa_lm_bin \ + --ref_txt $data_dir/${valid_name}_gt/text \ + --psd_txt $data_dir/${valid_name}/text \ + $out_dir/exp diff --git a/SpeechT5/fairseq/examples/wav2vec/unsupervised/models/__init__.py b/SpeechT5/fairseq/examples/wav2vec/unsupervised/models/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..3e3039b7081a9e3228c8abefb6391a75b4864439 --- /dev/null +++ b/SpeechT5/fairseq/examples/wav2vec/unsupervised/models/__init__.py @@ -0,0 +1,11 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +from .wav2vec_u import Wav2vec_U + + +__all__ = [ + "Wav2vec_U", +] diff --git a/SpeechT5/fairseq/examples/wav2vec/unsupervised/models/wav2vec_u.py b/SpeechT5/fairseq/examples/wav2vec/unsupervised/models/wav2vec_u.py new file mode 100644 index 0000000000000000000000000000000000000000..27792ebda842057e33fed3dc53dd9d8a594d0483 --- /dev/null +++ b/SpeechT5/fairseq/examples/wav2vec/unsupervised/models/wav2vec_u.py @@ -0,0 +1,637 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. 
+ +from dataclasses import dataclass +from enum import Enum, auto +import math +import numpy as np +from typing import Tuple, List, Optional, Dict + +import torch +import torch.nn as nn +import torch.nn.functional as F +from torch import autograd + +from fairseq import checkpoint_utils, utils +from fairseq.dataclass import FairseqDataclass +from fairseq.models import BaseFairseqModel, register_model +from fairseq.modules import ( + SamePad, + TransposeLast, +) + + +class SegmentationType(Enum): + NONE = auto() + RANDOM = auto() + UNIFORM_RANDOM = auto() + UNIFORM_RANDOM_JOIN = auto() + JOIN = auto() + + +@dataclass +class SegmentationConfig(FairseqDataclass): + type: SegmentationType = SegmentationType.NONE + subsample_rate: float = 0.25 + mean_pool: bool = True + mean_pool_join: bool = False + remove_zeros: bool = False + + +@dataclass +class Wav2vec_UConfig(FairseqDataclass): + + discriminator_kernel: int = 3 + discriminator_dilation: int = 1 + discriminator_dim: int = 256 + discriminator_causal: bool = True + discriminator_linear_emb: bool = False + discriminator_depth: int = 1 + discriminator_max_pool: bool = False + discriminator_act_after_linear: bool = False + discriminator_dropout: float = 0.0 + discriminator_spectral_norm: bool = False + discriminator_weight_norm: bool = False + + generator_kernel: int = 4 + generator_dilation: int = 1 + generator_stride: int = 1 + generator_bias: bool = False + generator_dropout: float = 0.0 + + blank_weight: float = 0 + blank_mode: str = "add" + blank_is_sil: bool = False + no_softmax: bool = False + + smoothness_weight: float = 0.0 + smoothing: float = 0.0 + smoothing_one_sided: bool = False + gradient_penalty: float = 0.0 + probabilistic_grad_penalty_slicing: bool = False + code_penalty: float = 0.0 + gumbel: bool = False + hard_gumbel: bool = True + temp: Tuple[float, float, float] = (2, 0.1, 0.99995) + input_dim: int = 128 + + segmentation: SegmentationConfig = SegmentationConfig() + + +class Segmenter(nn.Module): + cfg: SegmentationConfig + + def __init__(self, cfg: SegmentationConfig): + super().__init__() + self.cfg = cfg + self.subsample_rate = cfg.subsample_rate + + def pre_segment(self, dense_x, dense_padding_mask): + return dense_x, dense_padding_mask + + def logit_segment(self, logits, padding_mask): + return logits, padding_mask + + +class RandomSegmenter(Segmenter): + def pre_segment(self, dense_x, dense_padding_mask): + target_num = math.ceil(dense_x.size(1) * self.subsample_rate) + ones = torch.ones(dense_x.shape[:-1], device=dense_x.device) + indices, _ = ones.multinomial(target_num).sort(dim=-1) + indices_ld = indices.unsqueeze(-1).expand(-1, -1, dense_x.size(-1)) + dense_x = dense_x.gather(1, indices_ld) + dense_padding_mask = dense_padding_mask.gather(1, index=indices) + return dense_x, dense_padding_mask + + +class UniformRandomSegmenter(Segmenter): + def pre_segment(self, dense_x, dense_padding_mask): + bsz, tsz, fsz = dense_x.shape + + target_num = math.ceil(tsz * self.subsample_rate) + + rem = tsz % target_num + + if rem > 0: + dense_x = F.pad(dense_x, [0, 0, 0, target_num - rem]) + dense_padding_mask = F.pad( + dense_padding_mask, [0, target_num - rem], value=True + ) + + dense_x = dense_x.view(bsz, target_num, -1, fsz) + dense_padding_mask = dense_padding_mask.view(bsz, target_num, -1) + + if self.cfg.mean_pool: + dense_x = dense_x.mean(dim=-2) + dense_padding_mask = dense_padding_mask.all(dim=-1) + else: + ones = torch.ones((bsz, dense_x.size(2)), device=dense_x.device) + indices = ones.multinomial(1) + indices = 
indices.unsqueeze(-1).expand(-1, target_num, -1) + indices_ld = indices.unsqueeze(-1).expand(-1, -1, -1, fsz) + dense_x = dense_x.gather(2, indices_ld).reshape(bsz, -1, fsz) + dense_padding_mask = dense_padding_mask.gather(2, index=indices).reshape( + bsz, -1 + ) + return dense_x, dense_padding_mask + + +class JoinSegmenter(Segmenter): + def logit_segment(self, logits, padding_mask): + preds = logits.argmax(dim=-1) + + if padding_mask.any(): + preds[padding_mask] = -1 # mark pad + uniques = [] + + bsz, tsz, csz = logits.shape + + for p in preds: + uniques.append( + p.cpu().unique_consecutive(return_inverse=True, return_counts=True) + ) + + new_tsz = max(u[0].numel() for u in uniques) + new_logits = logits.new_zeros(bsz, new_tsz, csz) + new_pad = padding_mask.new_zeros(bsz, new_tsz) + + for b in range(bsz): + u, idx, c = uniques[b] + keep = u != -1 + + if self.cfg.remove_zeros: + keep.logical_and_(u != 0) + + if self.training and not self.cfg.mean_pool_join: + u[0] = 0 + u[1:] = c.cumsum(0)[:-1] + m = c > 1 + r = torch.rand(m.sum()) + o = (c[m] * r).long() + u[m] += o + new_logits[b, : u.numel()] = logits[b, u] + else: + new_logits[b].index_add_( + dim=0, index=idx.to(new_logits.device), source=logits[b] + ) + new_logits[b, : c.numel()] /= c.unsqueeze(-1).to(new_logits.device) + + new_sz = keep.sum() + if not keep.all(): + kept_logits = new_logits[b, : c.numel()][keep] + new_logits[b, :new_sz] = kept_logits + + if new_sz < new_tsz: + pad = new_tsz - new_sz + new_logits[b, -pad:] = 0 + new_pad[b, -pad:] = True + + return new_logits, new_pad + + +class UniformRandomJoinSegmenter(UniformRandomSegmenter, JoinSegmenter): + pass + + +SEGMENT_FACTORY = { + SegmentationType.NONE: Segmenter, + SegmentationType.RANDOM: RandomSegmenter, + SegmentationType.UNIFORM_RANDOM: UniformRandomSegmenter, + SegmentationType.UNIFORM_RANDOM_JOIN: UniformRandomJoinSegmenter, + SegmentationType.JOIN: JoinSegmenter, +} + + +class Discriminator(nn.Module): + def __init__(self, dim, cfg: Wav2vec_UConfig): + super().__init__() + + inner_dim = cfg.discriminator_dim + kernel = cfg.discriminator_kernel + dilation = cfg.discriminator_dilation + self.max_pool = cfg.discriminator_max_pool + + if cfg.discriminator_causal: + padding = kernel - 1 + else: + padding = kernel // 2 + + def make_conv(in_d, out_d, k, p=0, has_dilation=True): + conv = nn.Conv1d( + in_d, + out_d, + kernel_size=k, + padding=p, + dilation=dilation if has_dilation else 1, + ) + if cfg.discriminator_spectral_norm: + conv = nn.utils.spectral_norm(conv) + elif cfg.discriminator_weight_norm: + conv = nn.utils.weight_norm(conv) + return conv + + inner_net = [ + nn.Sequential( + make_conv(inner_dim, inner_dim, kernel, padding), + SamePad(kernel_size=kernel, causal=cfg.discriminator_causal), + nn.Dropout(cfg.discriminator_dropout), + nn.GELU(), + ) + for _ in range(cfg.discriminator_depth - 1) + ] + [ + make_conv(inner_dim, 1, kernel, padding, has_dilation=False), + SamePad(kernel_size=kernel, causal=cfg.discriminator_causal), + ] + + if cfg.discriminator_linear_emb: + emb_net = [make_conv(dim, inner_dim, 1)] + else: + emb_net = [ + make_conv(dim, inner_dim, kernel, padding), + SamePad(kernel_size=kernel, causal=cfg.discriminator_causal), + ] + + if cfg.discriminator_act_after_linear: + emb_net.append(nn.GELU()) + + self.net = nn.Sequential( + *emb_net, + nn.Dropout(cfg.discriminator_dropout), + *inner_net, + ) + + def forward(self, x, padding_mask): + x = x.transpose(1, 2) # BTC -> BCT + x = self.net(x) + x = x.transpose(1, 2) + x_sz = x.size(1) + if 
padding_mask is not None and padding_mask.any() and padding_mask.dim() > 1: + padding_mask = padding_mask[:, : x.size(1)] + x[padding_mask] = float("-inf") if self.max_pool else 0 + x_sz = x_sz - padding_mask.sum(dim=-1) + x = x.squeeze(-1) + if self.max_pool: + x, _ = x.max(dim=-1) + else: + x = x.sum(dim=-1) + x = x / x_sz + return x + + +class Generator(nn.Module): + def __init__(self, input_dim, output_dim, cfg: Wav2vec_UConfig): + super().__init__() + + self.cfg = cfg + self.output_dim = output_dim + self.stride = cfg.generator_stride + self.dropout = nn.Dropout(cfg.generator_dropout) + + padding = cfg.generator_kernel // 2 + self.proj = nn.Sequential( + TransposeLast(), + nn.Conv1d( + input_dim, + output_dim, + kernel_size=cfg.generator_kernel, + stride=cfg.generator_stride, + dilation=cfg.generator_dilation, + padding=padding, + bias=cfg.generator_bias, + ), + TransposeLast(), + ) + + def forward(self, dense_x, tokens, dense_padding_mask): + dense_x = self.dropout(dense_x) + + dense_x = self.proj(dense_x) + if self.stride > 1: + dense_padding_mask = dense_padding_mask[:, :: self.stride] + + if dense_padding_mask.size(1) != dense_x.size(1): + new_padding = dense_padding_mask.new_zeros(dense_x.shape[:-1]) + diff = new_padding.size(1) - dense_padding_mask.size(1) + assert ( + diff > 0 + ), f"{new_padding.shape}, {dense_padding_mask.shape}, {dense_x.shape}, {diff}" + if diff > 0: + new_padding[:, diff:] = dense_padding_mask + else: + assert diff < 0 + new_padding = dense_padding_mask[:, :diff] + + dense_padding_mask = new_padding + + result = {} + + token_x = None + if tokens is not None: + token_x = dense_x.new_zeros(tokens.numel(), self.output_dim) + token_x.scatter_(1, tokens.view(-1, 1).long(), 1) + token_x = token_x.view(tokens.shape + (self.output_dim,)) + + result["dense_x"] = dense_x + result["token_x"] = token_x + result["dense_padding_mask"] = dense_padding_mask + + return result + + +@register_model("wav2vec_u", dataclass=Wav2vec_UConfig) +class Wav2vec_U(BaseFairseqModel): + def calc_gradient_penalty(self, real_data, fake_data): + + b_size = min(real_data.size(0), fake_data.size(0)) + t_size = min(real_data.size(1), fake_data.size(1)) + + if self.cfg.probabilistic_grad_penalty_slicing: + + def get_slice(data, dim, target_size): + + size = data.size(dim) + diff = size - target_size + if diff <= 0: + return data + + start = np.random.randint(0, diff + 1) + return data.narrow(dim=dim, start=start, length=target_size) + + real_data = get_slice(real_data, 0, b_size) + real_data = get_slice(real_data, 1, t_size) + fake_data = get_slice(fake_data, 0, b_size) + fake_data = get_slice(fake_data, 1, t_size) + + else: + real_data = real_data[:b_size, :t_size] + fake_data = fake_data[:b_size, :t_size] + + alpha = torch.rand(real_data.size(0), 1, 1) + alpha = alpha.expand(real_data.size()) + alpha = alpha.to(real_data.device) + + interpolates = alpha * real_data + ((1 - alpha) * fake_data) + + disc_interpolates = self.discriminator(interpolates, None) + + gradients = autograd.grad( + outputs=disc_interpolates, + inputs=interpolates, + grad_outputs=torch.ones(disc_interpolates.size(), device=real_data.device), + create_graph=True, + retain_graph=True, + only_inputs=True, + )[0] + + gradient_penalty = (gradients.norm(2, dim=1) - 1) ** 2 + return gradient_penalty + + def set_num_updates(self, num_updates): + super().set_num_updates(num_updates) + self.update_num = num_updates + self.curr_temp = max( + self.max_temp * self.temp_decay ** num_updates, self.min_temp + ) + + def 
discrim_step(self, num_updates): + return num_updates % 2 == 1 + + def get_groups_for_update(self, num_updates): + return "discriminator" if self.discrim_step(num_updates) else "generator" + + def __init__(self, cfg: Wav2vec_UConfig, target_dict): + super().__init__() + + self.cfg = cfg + self.zero_index = target_dict.index("<SIL>") if "<SIL>" in target_dict else 0 + self.smoothness_weight = cfg.smoothness_weight + + output_size = len(target_dict) + self.pad = target_dict.pad() + self.eos = target_dict.eos() + self.smoothing = cfg.smoothing + self.smoothing_one_sided = cfg.smoothing_one_sided + self.no_softmax = cfg.no_softmax + self.gumbel = cfg.gumbel + self.hard_gumbel = cfg.hard_gumbel + self.last_acc = None + + self.gradient_penalty = cfg.gradient_penalty + self.code_penalty = cfg.code_penalty + self.blank_weight = cfg.blank_weight + self.blank_mode = cfg.blank_mode + self.blank_index = target_dict.index("<SIL>") if cfg.blank_is_sil else 0 + assert self.blank_index != target_dict.unk() + + self.discriminator = Discriminator(output_size, cfg) + for p in self.discriminator.parameters(): + p.param_group = "discriminator" + + self.pca_A = self.pca_b = None + d = cfg.input_dim + + self.segmenter = SEGMENT_FACTORY[cfg.segmentation.type](cfg.segmentation) + + self.generator = Generator(d, output_size, cfg) + + for p in self.generator.parameters(): + p.param_group = "generator" + + for p in self.segmenter.parameters(): + p.param_group = "generator" + + self.max_temp, self.min_temp, self.temp_decay = cfg.temp + self.curr_temp = self.max_temp + self.update_num = 0 + + @classmethod + def build_model(cls, cfg, task): + return cls(cfg, task.target_dictionary) + + def get_logits( + self, + net_output: Optional[Dict[str, List[Optional[torch.Tensor]]]], + normalize: bool = False, + ): + logits = net_output["logits"] + + if self.blank_weight != 0: + if self.blank_mode == "add": + logits[..., self.blank_index] += self.blank_weight + elif self.blank_mode == "set": + logits[..., self.blank_index] = self.blank_weight + else: + raise Exception(f"invalid blank mode {self.blank_mode}") + + padding = net_output["padding_mask"] + if padding.any(): + logits[padding] = float("-inf") + logits[padding][..., self.blank_index] = float("inf") + + if normalize: + logits = utils.log_softmax(logits.float(), dim=-1) + + return logits.transpose(0, 1) + + def get_normalized_probs( + self, + net_output: Tuple[ + torch.Tensor, Optional[Dict[str, List[Optional[torch.Tensor]]]] + ], + log_probs: bool, + sample: Optional[Dict[str, torch.Tensor]] = None, + ): + logits = self.get_logits(net_output) + + probs = super().get_normalized_probs(logits, log_probs, sample) + # BTC -> TBC for ctc + probs = probs.transpose(0, 1) + return probs + + def normalize(self, dense_x): + + bsz, tsz, csz = dense_x.shape + + if dense_x.numel() == 0: + raise Exception(dense_x.shape) + _, k = dense_x.max(-1) + hard_x = ( + dense_x.new_zeros(bsz * tsz, csz) + .scatter_(-1, k.view(-1, 1), 1.0) + .view(-1, csz) + ) + hard_probs = torch.mean(hard_x.float(), dim=0) + code_perplexity = torch.exp( + -torch.sum(hard_probs * torch.log(hard_probs + 1e-7), dim=-1) + ) + + avg_probs = torch.softmax(dense_x.reshape(-1, csz).float(), dim=-1).mean(dim=0) + prob_perplexity = torch.exp( + -torch.sum(avg_probs * torch.log(avg_probs + 1e-7), dim=-1) + ) + + if not self.no_softmax: + if self.training and self.gumbel: + dense_x = F.gumbel_softmax( + dense_x.float(), tau=self.curr_temp, hard=self.hard_gumbel + ).type_as(dense_x) + else: + dense_x = dense_x.softmax(-1) + + 
return dense_x, code_perplexity, prob_perplexity + + def forward( + self, + features, + padding_mask, + random_label=None, + dense_x_only=False, + segment=True, + ): + if segment: + features, padding_mask = self.segmenter.pre_segment(features, padding_mask) + + orig_size = features.size(0) * features.size(1) - padding_mask.sum() + + gen_result = self.generator(features, random_label, padding_mask) + + orig_dense_x, token_x = gen_result["dense_x"], gen_result["token_x"] + orig_dense_padding_mask = gen_result["dense_padding_mask"] + + if segment: + dense_x, dense_padding_mask = self.segmenter.logit_segment( + orig_dense_x, orig_dense_padding_mask + ) + else: + dense_x = orig_dense_x + dense_padding_mask = orig_dense_padding_mask + + dense_logits = dense_x + prob_perplexity = None + code_perplexity = None + + if not (self.no_softmax and dense_x_only): + dense_x, code_perplexity, prob_perplexity = self.normalize(dense_logits) + + if dense_x_only or self.discriminator is None: + return { + "logits": dense_x, + "padding_mask": dense_padding_mask, + } + + token_padding_mask = random_label == self.pad + + dense_y = self.discriminator(dense_x, dense_padding_mask) + token_y = self.discriminator(token_x, token_padding_mask) + + sample_size = features.size(0) + + d_step = self.discrim_step(self.update_num) + + fake_smooth = self.smoothing + real_smooth = self.smoothing + if self.smoothing_one_sided: + fake_smooth = 0 + + zero_loss = None + smoothness_loss = None + code_pen = None + + if d_step: + loss_dense = F.binary_cross_entropy_with_logits( + dense_y, + dense_y.new_ones(dense_y.shape) - fake_smooth, + reduction="sum", + ) + loss_token = F.binary_cross_entropy_with_logits( + token_y, + token_y.new_zeros(token_y.shape) + real_smooth, + reduction="sum", + ) + if self.training and self.gradient_penalty > 0: + grad_pen = self.calc_gradient_penalty(token_x, dense_x) + grad_pen = grad_pen.sum() * self.gradient_penalty + else: + grad_pen = None + else: + grad_pen = None + loss_token = None + loss_dense = F.binary_cross_entropy_with_logits( + dense_y, + dense_y.new_zeros(dense_y.shape) + fake_smooth, + reduction="sum", + ) + num_vars = dense_x.size(-1) + if prob_perplexity is not None: + code_pen = (num_vars - prob_perplexity) / num_vars + code_pen = code_pen * sample_size * self.code_penalty + + if self.smoothness_weight > 0: + smoothness_loss = F.mse_loss( + dense_logits[:, :-1], dense_logits[:, 1:], reduction="none" + ) + smoothness_loss[dense_padding_mask[:, 1:]] = 0 + smoothness_loss = ( + smoothness_loss.mean() * sample_size * self.smoothness_weight + ) + + result = { + "losses": { + "grad_pen": grad_pen, + "code_pen": code_pen, + "smoothness": smoothness_loss, + }, + "temp": self.curr_temp, + "code_ppl": code_perplexity, + "prob_ppl": prob_perplexity, + "d_steps": int(d_step), + "sample_size": sample_size, + } + + suff = "_d" if d_step else "_g" + result["losses"]["dense" + suff] = loss_dense + result["losses"]["token" + suff] = loss_token + + return result diff --git a/SpeechT5/fairseq/examples/wav2vec/unsupervised/scripts/apply_pca.py b/SpeechT5/fairseq/examples/wav2vec/unsupervised/scripts/apply_pca.py new file mode 100644 index 0000000000000000000000000000000000000000..10ad6ce47cfdf0a87ba089b299fe9551b29fa167 --- /dev/null +++ b/SpeechT5/fairseq/examples/wav2vec/unsupervised/scripts/apply_pca.py @@ -0,0 +1,76 @@ +#!/usr/bin/env python3 -u +# Copyright (c) Facebook, Inc. and its affiliates. 
+# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +import argparse +import os +import os.path as osp +import math +import numpy as np +import tqdm +import torch +from shutil import copyfile + +from npy_append_array import NpyAppendArray + + +def get_parser(): + parser = argparse.ArgumentParser( + description="transforms features via a given pca and stored them in target dir" + ) + # fmt: off + parser.add_argument('source', help='directory with features') + parser.add_argument('--split', help='which split to read', required=True) + parser.add_argument('--save-dir', help='where to save the output', required=True) + parser.add_argument('--pca-path', type=str, help='pca location. will append _A.npy and _b.npy', required=True) + parser.add_argument('--batch-size', type=int, default=2048000, help='batch size') + parser.add_argument('--unfiltered', action='store_true', help='process the unfiltered version') + # fmt: on + + return parser + + +def main(): + parser = get_parser() + args = parser.parse_args() + + source_path = osp.join(args.source, args.split) + data_poth = source_path + "_unfiltered" if args.unfiltered else source_path + + print(f"data path: {data_poth}") + + features = np.load(data_poth + ".npy", mmap_mode="r") + pca_A = torch.from_numpy(np.load(args.pca_path + "_A.npy")).cuda() + pca_b = torch.from_numpy(np.load(args.pca_path + "_b.npy")).cuda() + + os.makedirs(args.save_dir, exist_ok=True) + save_path = osp.join(args.save_dir, args.split) + + copyfile(source_path + ".tsv", save_path + ".tsv") + copyfile(data_poth + ".lengths", save_path + ".lengths") + + if osp.exists(source_path + ".phn"): + copyfile(source_path + ".phn", save_path + ".phn") + + if osp.exists(source_path + ".wrd"): + copyfile(source_path + ".wrd", save_path + ".wrd") + + if osp.exists(save_path + ".npy"): + os.remove(save_path + ".npy") + npaa = NpyAppendArray(save_path + ".npy") + + batches = math.ceil(features.shape[0] / args.batch_size) + + with torch.no_grad(): + for b in tqdm.trange(batches): + start = b * args.batch_size + end = start + args.batch_size + x = torch.from_numpy(features[start:end]).cuda() + x = torch.matmul(x, pca_A) + pca_b + npaa.append(x.cpu().numpy()) + + +if __name__ == "__main__": + main() diff --git a/SpeechT5/fairseq/examples/wav2vec/unsupervised/scripts/copy_labels.py b/SpeechT5/fairseq/examples/wav2vec/unsupervised/scripts/copy_labels.py new file mode 100644 index 0000000000000000000000000000000000000000..989868388eefccc37c82d7602f709632035c7aa1 --- /dev/null +++ b/SpeechT5/fairseq/examples/wav2vec/unsupervised/scripts/copy_labels.py @@ -0,0 +1,10 @@ +#!/usr/bin/env python3 -u +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +import sys + +for idx, line in enumerate(sys.stdin): + print(f"utt{idx:010d} {line}", end="") diff --git a/SpeechT5/fairseq/examples/wav2vec/unsupervised/scripts/filter_lexicon.py b/SpeechT5/fairseq/examples/wav2vec/unsupervised/scripts/filter_lexicon.py new file mode 100644 index 0000000000000000000000000000000000000000..5bf3e51e7a50ac3f07cc41739198cde946dc79aa --- /dev/null +++ b/SpeechT5/fairseq/examples/wav2vec/unsupervised/scripts/filter_lexicon.py @@ -0,0 +1,40 @@ +#!/usr/bin/env python3 -u +# Copyright (c) Facebook, Inc. and its affiliates. 
+# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +import argparse +import sys + +from fairseq.data import Dictionary + + +def get_parser(): + parser = argparse.ArgumentParser( + description="filters a lexicon given a unit dictionary" + ) + parser.add_argument("-d", "--unit-dict", help="unit dictionary", required=True) + return parser + + +def main(): + parser = get_parser() + args = parser.parse_args() + + d = Dictionary.load(args.unit_dict) + symbols = set(d.symbols) + + for line in sys.stdin: + items = line.rstrip().split() + skip = len(items) < 2 + for x in items[1:]: + if x not in symbols: + skip = True + break + if not skip: + print(line, end="") + + +if __name__ == "__main__": + main() diff --git a/SpeechT5/fairseq/examples/wav2vec/unsupervised/scripts/filter_tsv.py b/SpeechT5/fairseq/examples/wav2vec/unsupervised/scripts/filter_tsv.py new file mode 100644 index 0000000000000000000000000000000000000000..a09d79acf31414ea3eae82db59cf9f105aefcdf1 --- /dev/null +++ b/SpeechT5/fairseq/examples/wav2vec/unsupervised/scripts/filter_tsv.py @@ -0,0 +1,37 @@ +#!/usr/bin/env python3 -u +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +import os +import argparse +import sys + + +parser = argparse.ArgumentParser() +parser.add_argument("--tsv", required=True, type=str) +parser.add_argument("--no-skip", action="store_true") +parser.add_argument("--keep", action="store_true") +params = parser.parse_args() + + +def get_fname(line): + p = os.path.basename(line.split("\t")[0]) + p = os.path.splitext(p)[0] + return p + + +# filenames to exclude +seen = set() +with open(params.tsv) as f: + if not params.no_skip: + root = next(f).rstrip() + for line in f: + seen.add(get_fname(line)) + +for i, line in enumerate(sys.stdin): + exists = get_fname(line) in seen + keep = (exists and params.keep) or (not exists and not params.keep) + if i == 0 or keep: + print(line, end="") diff --git a/SpeechT5/fairseq/examples/wav2vec/unsupervised/scripts/g2p_wrd_to_phn.py b/SpeechT5/fairseq/examples/wav2vec/unsupervised/scripts/g2p_wrd_to_phn.py new file mode 100644 index 0000000000000000000000000000000000000000..2e31c307bd67d10941150160c7fb8c9e085ac5d9 --- /dev/null +++ b/SpeechT5/fairseq/examples/wav2vec/unsupervised/scripts/g2p_wrd_to_phn.py @@ -0,0 +1,45 @@ +#!/usr/bin/env python3 -u +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. 
+ +import argparse +import sys + +from g2p_en import G2p + + +def main(): + parser = argparse.ArgumentParser() + parser.add_argument( + "--compact", + action="store_true", + help="if set, compacts phones", + ) + args = parser.parse_args() + + compact = args.compact + + wrd_to_phn = {} + g2p = G2p() + for line in sys.stdin: + words = line.strip().split() + phones = [] + for w in words: + if w not in wrd_to_phn: + wrd_to_phn[w] = g2p(w) + if compact: + wrd_to_phn[w] = [ + p[:-1] if p[-1].isnumeric() else p for p in wrd_to_phn[w] + ] + phones.extend(wrd_to_phn[w]) + try: + print(" ".join(phones)) + except: + print(wrd_to_phn, words, phones, file=sys.stderr) + raise + + +if __name__ == "__main__": + main() diff --git a/SpeechT5/fairseq/examples/wav2vec/unsupervised/scripts/ltr_to_wrd.py b/SpeechT5/fairseq/examples/wav2vec/unsupervised/scripts/ltr_to_wrd.py new file mode 100644 index 0000000000000000000000000000000000000000..36c85d1e2f60487494a92207feb4685e78db8aa2 --- /dev/null +++ b/SpeechT5/fairseq/examples/wav2vec/unsupervised/scripts/ltr_to_wrd.py @@ -0,0 +1,16 @@ +#!/usr/bin/env python3 -u +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +import sys + + +def main(): + for line in sys.stdin: + print(line.replace(" ", "").replace("|", " ").strip()) + + +if __name__ == "__main__": + main() diff --git a/SpeechT5/fairseq/examples/wav2vec/unsupervised/scripts/mean_pool.py b/SpeechT5/fairseq/examples/wav2vec/unsupervised/scripts/mean_pool.py new file mode 100644 index 0000000000000000000000000000000000000000..4eea048ef3455cb3c897e74c18778c78fdc9fcbf --- /dev/null +++ b/SpeechT5/fairseq/examples/wav2vec/unsupervised/scripts/mean_pool.py @@ -0,0 +1,99 @@ +#!/usr/bin/env python3 -u +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. 
+ +import argparse +import os +import os.path as osp +import math +import numpy as np +import tqdm +import torch +import torch.nn.functional as F +from shutil import copyfile + +from npy_append_array import NpyAppendArray + + +def get_parser(): + parser = argparse.ArgumentParser( + description="mean pools representations by compressing uniform splits of the data" + ) + # fmt: off + parser.add_argument('source', help='directory with features') + parser.add_argument('--split', help='which split to read', required=True) + parser.add_argument('--save-dir', help='where to save the output', required=True) + parser.add_argument('--subsample-rate', type=float, default=0.5, help='size to subsample data to') + + parser.add_argument('--remove-extra', action='store_true', help='if true, removes extra states that cant be pooled, otherwise pads with 0s') + # fmt: on + + return parser + + +def main(): + parser = get_parser() + args = parser.parse_args() + + source_path = osp.join(args.source, args.split) + + print(f"data path: {source_path}") + + features = np.load(source_path + ".npy", mmap_mode="r") + + os.makedirs(args.save_dir, exist_ok=True) + save_path = osp.join(args.save_dir, args.split) + + copyfile(source_path + ".tsv", save_path + ".tsv") + + if os.path.exists(source_path + ".phn"): + copyfile(source_path + ".phn", save_path + ".phn") + if os.path.exists(source_path + ".wrd"): + copyfile(source_path + ".wrd", save_path + ".wrd") + + if os.path.exists(osp.join(args.source, "dict.phn.txt")): + copyfile( + osp.join(args.source, "dict.phn.txt"), + osp.join(args.save_dir, "dict.phn.txt"), + ) + + if osp.exists(save_path + ".npy"): + os.remove(save_path + ".npy") + npaa = NpyAppendArray(save_path + ".npy") + + with open(source_path + ".lengths", "r") as lf: + lengths = lf.readlines() + + fsz = features.shape[-1] + start = 0 + with torch.no_grad(): + with open(save_path + ".lengths", "w") as lengths_out: + for length in tqdm.tqdm(lengths): + length = int(length) + end = start + length + feats = features[start:end] + start += length + x = torch.from_numpy(feats).cuda() + target_num = math.ceil(length * args.subsample_rate) + rem = length % target_num + + if rem > 0: + if args.remove_extra: + to_rem = target_num - rem + target_num -= 1 + x = x[:-to_rem] + else: + to_add = target_num - rem + x = F.pad(x, [0, 0, 0, to_add]) + x[-to_add:] = x[-to_add - 1] + + x = x.view(target_num, -1, fsz) + x = x.mean(dim=-2) + print(target_num, file=lengths_out) + npaa.append(x.cpu().numpy()) + + +if __name__ == "__main__": + main() diff --git a/SpeechT5/fairseq/examples/wav2vec/unsupervised/scripts/merge_clusters.py b/SpeechT5/fairseq/examples/wav2vec/unsupervised/scripts/merge_clusters.py new file mode 100644 index 0000000000000000000000000000000000000000..2780f9d971d847b3ad0b59e9a33780553ebce902 --- /dev/null +++ b/SpeechT5/fairseq/examples/wav2vec/unsupervised/scripts/merge_clusters.py @@ -0,0 +1,114 @@ +#!/usr/bin/env python3 -u +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. 
+ +import argparse +import os +import os.path as osp +import numpy as np +import tqdm +import torch +import random +from shutil import copyfile + +from npy_append_array import NpyAppendArray + + +def get_parser(): + parser = argparse.ArgumentParser( + description="transforms features via a given pca and stored them in target dir" + ) + # fmt: off + parser.add_argument('source', help='directory with features') + parser.add_argument('--split', help='which split to read', required=True) + parser.add_argument('--save-dir', help='where to save the output', required=True) + parser.add_argument('--cluster-dir', help='where the clusters are') + parser.add_argument('--pooling', type=str, default='mean', choices=['mean', 'sample'], help='how to pool') + # fmt: on + + return parser + + +def main(): + parser = get_parser() + args = parser.parse_args() + + source_path = osp.join(args.source, args.split) + cluster_path = osp.join(args.cluster_dir, args.split + ".src") + print(f"data path: {source_path}") + + features = np.load(source_path + ".npy", mmap_mode="r") + sizes = [] + offsets = [] + offset = 0 + with open(source_path + ".lengths", "r") as len_f: + for line in len_f: + length = int(line.rstrip()) + sizes.append(length) + offsets.append(offset) + offset += length + + clusters = [] + with open(cluster_path, "r") as cf: + for line in cf: + line = line.rstrip() + items = line.split() + items = list(map(int, items)) + clusters.append(items) + + os.makedirs(args.save_dir, exist_ok=True) + save_path = osp.join(args.save_dir, args.split) + + copyfile(source_path + ".tsv", save_path + ".tsv") + + if os.path.exists(source_path + ".phn"): + copyfile(source_path + ".phn", save_path + ".phn") + if os.path.exists(osp.join(args.source, "dict.phn.txt")): + copyfile( + osp.join(args.source, "dict.phn.txt"), + osp.join(args.save_dir, "dict.phn.txt"), + ) + if os.path.exists(source_path + ".wrd"): + copyfile(source_path + ".wrd", save_path + ".wrd") + + if osp.exists(save_path + ".npy"): + os.remove(save_path + ".npy") + npaa = NpyAppendArray(save_path + ".npy") + + def merge(feats, clust): + feats = torch.from_numpy(feats.copy()) + clust = torch.LongTensor(clust) + _, counts = clust.unique_consecutive(return_counts=True) + curr = 0 + + merged = [] + for c in counts: + c = c.item() + start = curr + end = curr + c + curr += c + if args.pooling == "mean": + new_x = feats[start:end].mean(dim=0) + elif args.pooling == "sample": + new_x = feats[start + int(random.random() * c)] + else: + raise NotImplementedError() + merged.append(new_x) + + return torch.stack(merged, dim=0).numpy() + + with open(save_path + ".lengths", "w") as l_f: + for size, offset, clust in tqdm.tqdm( + zip(sizes, offsets, clusters), total=len(sizes) + ): + end = size + offset + feats = features[offset:end] + feats = merge(feats, clust) + print(len(feats), file=l_f) + npaa.append(feats) + + +if __name__ == "__main__": + main() diff --git a/SpeechT5/fairseq/examples/wav2vec/unsupervised/scripts/normalize_and_filter_text.py b/SpeechT5/fairseq/examples/wav2vec/unsupervised/scripts/normalize_and_filter_text.py new file mode 100644 index 0000000000000000000000000000000000000000..c2bd16efb530af5af3f72ab0edb3044b4e9fcd5c --- /dev/null +++ b/SpeechT5/fairseq/examples/wav2vec/unsupervised/scripts/normalize_and_filter_text.py @@ -0,0 +1,72 @@ +#!/usr/bin/env python3 +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. 
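+# Reads raw text from stdin, strips everything except letters, digits, marks, apostrophes, spaces and
+# hyphens, and keeps a line only if the fastText LID model's top label is --lang or the --lang
+# probability is at least --lid-threshold (no filtering if the model file is missing).
+# Illustrative usage (placeholder file names, as in prepare_text.sh below):
+#   python normalize_and_filter_text.py --lang en --fasttext-model lid.187.bin < corpus.txt > lm.upper.lid.txt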
+ +import argparse +import fasttext as ft +import os +import regex +import sys + + +def get_parser(): + parser = argparse.ArgumentParser( + description="reads text from stdin and outputs normalized, lid-filtered version to stdout" + ) + parser.add_argument( + "--fasttext-model", + help="path to fasttext model", + default="lid.187.bin", + ) + parser.add_argument("--lang", help="language id", required=True) + parser.add_argument( + "--lid-threshold", + type=float, + help="threshold for this lang id probability", + default=0.4, + ) + + return parser + + +def main(): + parser = get_parser() + args = parser.parse_args() + filter_r = regex.compile(r"[^\p{L}\p{N}\p{M}\' \-]") + + lg = args.lang.lower() + lg_label = f"__label__{lg}" + thresh = args.lid_threshold + + if os.path.exists(args.fasttext_model): + model = ft.load_model(args.fasttext_model) + else: + print( + f"fasttext language id model {args.fasttext_model} not found. Proceeding without language filtering. " + f"To enable language filtering, please download the latest language id model " + f"from https://fasttext.cc/docs/en/language-identification.html", + file=sys.stderr, + ) + model = None + + for line in sys.stdin: + line = line.strip() + line = filter_r.sub(" ", line) + line = " ".join(line.split()) + + if model is not None: + lid, prob = model.predict(line, k=100) + try: + target_idx = lid.index(lg_label) + except ValueError: + continue + if target_idx == 0 or prob[target_idx] >= thresh: + print(line) + else: + print(line) + + +if __name__ == "__main__": + main() diff --git a/SpeechT5/fairseq/examples/wav2vec/unsupervised/scripts/normalize_text.py b/SpeechT5/fairseq/examples/wav2vec/unsupervised/scripts/normalize_text.py new file mode 100644 index 0000000000000000000000000000000000000000..9d0ffeb27d038a6b82aaf0f6bdf208af565663f6 --- /dev/null +++ b/SpeechT5/fairseq/examples/wav2vec/unsupervised/scripts/normalize_text.py @@ -0,0 +1,22 @@ +#!/usr/bin/env python3 +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +import regex +import sys + + +def main(): + filter_r = regex.compile(r"[^\p{L}\p{N}\p{M}\' \-]") + + for line in sys.stdin: + line = line.strip() + line = filter_r.sub(" ", line) + line = " ".join(line.split()) + print(line) + + +if __name__ == "__main__": + main() diff --git a/SpeechT5/fairseq/examples/wav2vec/unsupervised/scripts/pca.py b/SpeechT5/fairseq/examples/wav2vec/unsupervised/scripts/pca.py new file mode 100644 index 0000000000000000000000000000000000000000..948cf5319fd86ba1bccff65270b2881048faf9b1 --- /dev/null +++ b/SpeechT5/fairseq/examples/wav2vec/unsupervised/scripts/pca.py @@ -0,0 +1,53 @@ +#!/usr/bin/env python3 -u +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. 
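+# Fits a faiss PCA transform on a saved feature matrix and stores the projection as
+# <dim>_pca_A.npy / <dim>_pca_b.npy. Illustrative usage (placeholder paths, as in prepare_audio.sh):
+#   python pca.py /path/to/feat/train.npy --output /path/to/feat/pca --dim 512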
+ +import argparse +import os +import os.path as osp +import numpy as np + +import faiss + + + +def get_parser(): + parser = argparse.ArgumentParser( + description="compute a pca matrix given an array of numpy features" + ) + # fmt: off + parser.add_argument('data', help='numpy file containing features') + parser.add_argument('--output', help='where to save the pca matrix', required=True) + parser.add_argument('--dim', type=int, help='dim for pca reduction', required=True) + parser.add_argument('--eigen-power', type=float, default=0, help='eigen power, -0.5 for whitening') + + return parser + + +def main(): + parser = get_parser() + args = parser.parse_args() + + print("Reading features") + x = np.load(args.data, mmap_mode="r") + + print("Computing PCA") + pca = faiss.PCAMatrix(x.shape[-1], args.dim, args.eigen_power) + pca.train(x) + b = faiss.vector_to_array(pca.b) + A = faiss.vector_to_array(pca.A).reshape(pca.d_out, pca.d_in) + + os.makedirs(args.output, exist_ok=True) + + prefix = str(args.dim) + if args.eigen_power != 0: + prefix += f"_{args.eigen_power}" + + np.save(osp.join(args.output, f"{prefix}_pca_A"), A.T) + np.save(osp.join(args.output, f"{prefix}_pca_b"), b) + + +if __name__ == "__main__": + main() diff --git a/SpeechT5/fairseq/examples/wav2vec/unsupervised/scripts/phonemize_with_sil.py b/SpeechT5/fairseq/examples/wav2vec/unsupervised/scripts/phonemize_with_sil.py new file mode 100644 index 0000000000000000000000000000000000000000..c6512d7322def67b27aba46e9e36da171db6963b --- /dev/null +++ b/SpeechT5/fairseq/examples/wav2vec/unsupervised/scripts/phonemize_with_sil.py @@ -0,0 +1,83 @@ +#!/usr/bin/env python3 -u +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. 
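+# Converts word transcripts read from stdin into phone sequences via --lexicon, inserting <SIL>
+# between words with probability --sil-prob and around each utterance when --surround is set;
+# utterances containing out-of-lexicon words are skipped. Illustrative usage (placeholder file
+# names, matching the call in prepare_text.sh):
+#   python phonemize_with_sil.py -s 0.25 --surround --lexicon lexicon_filtered.lst < lm.upper.lid.txt > lm.phones.filtered.txt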
+ +import argparse +import numpy as np +import sys + + +def get_parser(): + parser = argparse.ArgumentParser( + description="converts words to phones adding optional silences around in between words" + ) + parser.add_argument( + "--sil-prob", + "-s", + type=float, + default=0, + help="probability of inserting silence between each word", + ) + parser.add_argument( + "--surround", + action="store_true", + help="if set, surrounds each example with silence", + ) + parser.add_argument( + "--lexicon", + help="lexicon to convert to phones", + required=True, + ) + + return parser + + +def main(): + parser = get_parser() + args = parser.parse_args() + + sil_prob = args.sil_prob + surround = args.surround + sil = "<SIL>" + + wrd_to_phn = {} + + with open(args.lexicon, "r") as lf: + for line in lf: + items = line.rstrip().split() + assert len(items) > 1, line + assert items[0] not in wrd_to_phn, items + wrd_to_phn[items[0]] = items[1:] + + for line in sys.stdin: + words = line.strip().split() + + if not all(w in wrd_to_phn for w in words): + continue + + phones = [] + if surround: + phones.append(sil) + + sample_sil_probs = None + if sil_prob > 0 and len(words) > 1: + sample_sil_probs = np.random.random(len(words) - 1) + + for i, w in enumerate(words): + phones.extend(wrd_to_phn[w]) + if ( + sample_sil_probs is not None + and i < len(sample_sil_probs) + and sample_sil_probs[i] < sil_prob + ): + phones.append(sil) + + if surround: + phones.append(sil) + print(" ".join(phones)) + + +if __name__ == "__main__": + main() diff --git a/SpeechT5/fairseq/examples/wav2vec/unsupervised/scripts/prepare_audio.sh b/SpeechT5/fairseq/examples/wav2vec/unsupervised/scripts/prepare_audio.sh new file mode 100644 index 0000000000000000000000000000000000000000..013f7a9b055a7693a29f9c5ba1e4003a9a25850e --- /dev/null +++ b/SpeechT5/fairseq/examples/wav2vec/unsupervised/scripts/prepare_audio.sh @@ -0,0 +1,78 @@ +#!/usr/bin/env zsh +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. 
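+# Usage sketch (positional arguments as parsed below; paths are placeholders):
+#   zsh prepare_audio.sh /path/to/manifest_dir /path/to/output_dir /path/to/wav2vec_checkpoint.pt [pca_dim] [layer]
+# Defaults are pca_dim=512 and layer=14; $FAIRSEQ_ROOT must point at a fairseq checkout.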
+ +source_dir=$1 +tgt_dir=$2 +model=$3 + +if [ -z "$4" ] + then + dim=512 + else + dim=$4 +fi + +echo "using $dim dim for PCA" + +if [ -z "$5" ] + then + layer=14 + else + layer=$5 +fi + +echo "extracting from layer $layer" + +train_split=train +valid_split=valid +test_split=test + +all_splits=($train_split) + +if [[ -f "$source_dir/valid.tsv" ]]; then + all_splits+=('valid') +fi + +if [[ -f "$source_dir/test.tsv" ]]; then + all_splits+=('test') +fi + +echo "processing splits: $all_splits" + +mkdir -p $tgt_dir + +cp $source_dir/*.tsv $tgt_dir +cp $source_dir/*.wrd $tgt_dir +cp $source_dir/*.ltr $tgt_dir +cp $source_dir/*.phn $tgt_dir +cp $source_dir/dict* $tgt_dir + +setopt shwordsplit + +for split in $all_splits; do + python $FAIRSEQ_ROOT/examples/wav2vec/unsupervised/scripts/wav2vec_extract_features.py $source_dir --split $split \ + --save-dir $tgt_dir --checkpoint $model --layer $layer +done + +python $FAIRSEQ_ROOT/examples/wav2vec/unsupervised/scripts/wav2vec_cluster_faiss.py $tgt_dir/${train_split}.tsv \ +--checkpoint $model --save-dir $tgt_dir -f "CLUS128" --sample-pct 1.0 + +for split in $all_splits; do + python $FAIRSEQ_ROOT/examples/wav2vec/unsupervised/scripts/wav2vec_apply_cluster_faiss.py $tgt_dir \ + --checkpoint $model --path $tgt_dir/CLUS128 --split $split +done + +python $FAIRSEQ_ROOT/examples/wav2vec/unsupervised/scripts/pca.py $tgt_dir/${train_split}.npy --output $tgt_dir/pca --dim $dim + +for split in $all_splits; do + python $FAIRSEQ_ROOT/examples/wav2vec/unsupervised/scripts/apply_pca.py $tgt_dir --split $split --save-dir $tgt_dir/precompute_pca$dim --pca-path $tgt_dir/pca/${dim}_pca --batch-size 1048000 + + python $FAIRSEQ_ROOT/examples/wav2vec/unsupervised/scripts/merge_clusters.py $tgt_dir/precompute_pca$dim --cluster-dir $tgt_dir/CLUS128 \ + --split $split --save-dir $tgt_dir/precompute_pca${dim}_cls128_mean --pooling mean + + python $FAIRSEQ_ROOT/examples/wav2vec/unsupervised/scripts/mean_pool.py $tgt_dir/precompute_pca${dim}_cls128_mean \ + --save-dir $tgt_dir/precompute_pca${dim}_cls128_mean_pooled --split $split +done diff --git a/SpeechT5/fairseq/examples/wav2vec/unsupervised/scripts/prepare_text.sh b/SpeechT5/fairseq/examples/wav2vec/unsupervised/scripts/prepare_text.sh new file mode 100644 index 0000000000000000000000000000000000000000..1caf13cb6a2a0bd84e5322c92124b2fa37368f9a --- /dev/null +++ b/SpeechT5/fairseq/examples/wav2vec/unsupervised/scripts/prepare_text.sh @@ -0,0 +1,82 @@ +#!/usr/bin/env zsh +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +lg=$1 +text_path=$2 +target_dir=$3 +min_phones=$4 +phonemizer=$5 +lid_path=$6 + +if [ -z "$lid_path" ]; then + lid_path="lid.187.bin" +fi + +ph_lg=${lg:l} +if test "$lg" = 'fr'; then + ph_lg='fr-fr' +elif test "$lg" = 'en'; then + ph_lg='en-us' +elif test "$lg" = 'pt'; then + ph_lg='pt-br' +fi + +ESPEAK_PATH='' +if test "$phonemizer" = 'espeak'; then + ESPEAK_PATH=$(which espeak) +elif test "$phonemizer" = 'espeak-ng'; then + ESPEAK_PATH=$(which espeak-ng) +elif test "$phonemizer" = 'G2P'; then + ESPEAK_PATH='' +else + echo "Unknown phonemizer $phonemizer. 
Valid options are espeak, espean-ng and G2P" + exit 1 +fi + +echo $lg +echo $ph_lg +echo $text_path +echo $target_dir +echo "min phone seen threshold is $min_phones" + +mkdir -p $target_dir +python $FAIRSEQ_ROOT/examples/wav2vec/unsupervised/scripts/normalize_and_filter_text.py --lang $lg --fasttext-model $lid_path < $text_path | grep -v '\-\-\-' >! $target_dir/lm.upper.lid.txt +python $FAIRSEQ_ROOT/fairseq_cli/preprocess.py --dataset-impl mmap --trainpref $target_dir/lm.upper.lid.txt --only-source --destdir $target_dir --thresholdsrc 2 --padding-factor 1 --dict-only +cut -f1 -d' ' $target_dir/dict.txt | grep -v -x '[[:punct:]]*' | grep -Pv '\d\d\d\d\d+' >! $target_dir/words.txt + + +if [ -z "$ESPEAK_PATH" ]; then + python $FAIRSEQ_ROOT/examples/wav2vec/unsupervised/scripts/g2p_wrd_to_phn.py --compact < $target_dir/words.txt > $target_dir/phones.txt +else + # echoing 1 into corpus will prevent the mismatch lines between lexicon and phones in case the phonemizer fails + one=$(echo "1" | PHONEMIZER_ESPEAK_PATH=$ESPEAK_PATH phonemize -p ' ' -w '' -l $ph_lg --language-switch remove-flags) + sed 's/$/ 1/' $target_dir/words.txt | PHONEMIZER_ESPEAK_PATH=$ESPEAK_PATH phonemize -o $target_dir/phones.txt -p ' ' -w '' -l $ph_lg -j 70 --language-switch remove-flags + echo "one is ${one}" + sed -i "s/${one}$//" $target_dir/phones.txt +fi + +paste $target_dir/words.txt $target_dir/phones.txt >! $target_dir/lexicon.lst + +python $FAIRSEQ_ROOT/fairseq_cli/preprocess.py --dataset-impl mmap --trainpref $target_dir/phones.txt --only-source --destdir $target_dir/phones --thresholdsrc $min_phones --padding-factor 1 --dict-only + +python $FAIRSEQ_ROOT/examples/wav2vec/unsupervised/scripts/filter_lexicon.py -d $target_dir/phones/dict.txt < $target_dir/lexicon.lst >! $target_dir/lexicon_filtered.lst +python $FAIRSEQ_ROOT/examples/wav2vec/unsupervised/scripts/phonemize_with_sil.py -s 0.25 --surround --lexicon $target_dir/lexicon_filtered.lst < $target_dir/lm.upper.lid.txt >! $target_dir/phones/lm.phones.filtered.txt +cp $target_dir/phones/dict.txt $target_dir/phones/dict.phn.txt +echo "<SIL> 0" >> $target_dir/phones/dict.phn.txt +python $FAIRSEQ_ROOT/fairseq_cli/preprocess.py --dataset-impl mmap --trainpref $target_dir/phones/lm.phones.filtered.txt --workers 70 --only-source --destdir $target_dir/phones --srcdict $target_dir/phones/dict.phn.txt + +$KENLM_ROOT/lmplz -o 4 < $target_dir/lm.upper.lid.txt --discount_fallback --prune 0 0 0 3 >! $target_dir/kenlm.wrd.o40003.arpa +$KENLM_ROOT/build_binary $target_dir/kenlm.wrd.o40003.arpa $target_dir/kenlm.wrd.o40003.bin + +lg=$lg python $FAIRSEQ_ROOT/examples/speech_recognition/kaldi/kaldi_initializer.py kaldi_root=$KALDI_ROOT fst_dir=$target_dir/fst/phn_to_words_sil lm_arpa=$target_dir/kenlm.wrd.o40003.arpa wav2letter_lexicon=$target_dir/lexicon_filtered.lst data_dir=$target_dir/phones in_labels=phn "blank_symbol='<SIL>'" +lg=$lg python $FAIRSEQ_ROOT/examples/speech_recognition/kaldi/kaldi_initializer.py kaldi_root=$KALDI_ROOT fst_dir=$target_dir/fst/phn_to_words lm_arpa=$target_dir/kenlm.wrd.o40003.arpa wav2letter_lexicon=$target_dir/lexicon_filtered.lst data_dir=$target_dir/phones in_labels=phn + +$KENLM_ROOT/lmplz -o 4 < $target_dir/phones/lm.phones.filtered.txt --discount_fallback >! $target_dir/phones/lm.phones.filtered.04.arpa +$KENLM_ROOT/build_binary $target_dir/phones/lm.phones.filtered.04.arpa $target_dir/phones/lm.phones.filtered.04.bin +$KENLM_ROOT/lmplz -o 6 < $target_dir/phones/lm.phones.filtered.txt --discount_fallback >! 
$target_dir/phones/lm.phones.filtered.06.arpa +$KENLM_ROOT/build_binary $target_dir/phones/lm.phones.filtered.06.arpa $target_dir/phones/lm.phones.filtered.06.bin + +lg=$lg python $FAIRSEQ_ROOT/examples/speech_recognition/kaldi/kaldi_initializer.py kaldi_root=$KALDI_ROOT fst_dir=$target_dir/fst/phn_to_phn_sil lm_arpa=$target_dir/phones/lm.phones.filtered.06.arpa data_dir=$target_dir/phones in_labels=phn "blank_symbol='<SIL>'" diff --git a/SpeechT5/fairseq/examples/wav2vec/unsupervised/scripts/prepare_timit.sh b/SpeechT5/fairseq/examples/wav2vec/unsupervised/scripts/prepare_timit.sh new file mode 100644 index 0000000000000000000000000000000000000000..d8f5d596b4b4ec55f11a82dbbf83bad4a22c0b6c --- /dev/null +++ b/SpeechT5/fairseq/examples/wav2vec/unsupervised/scripts/prepare_timit.sh @@ -0,0 +1,79 @@ +#!/bin/bash +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +timit_root=$1 # assume it is the upper-cased version +tgt_dir=$2 +model=$3 + +set -eu + +setups="matched unmatched" +splits="test valid train train_text" + +tgt_dir=$(realpath $tgt_dir) +sph2wav=$KALDI_ROOT/tools/sph2pipe_v2.5/sph2pipe +wav_dir=$tgt_dir/wav + + +mkdir -p $tgt_dir $wav_dir +find $timit_root/{TRAIN,TEST} -iname "*.WAV" > $tgt_dir/all_sph.flist +cat $tgt_dir/all_sph.flist | sed -e 's#//*#/#g' -e 's#.*/\([^/]*\)/\([^/]*\).WAV#\1_\2#g' > $tgt_dir/all.uid +paste -d' ' $tgt_dir/{all_sph.flist,all.uid} | \ + awk -v sph2wav=$sph2wav -v wav_dir=$wav_dir '{print sph2wav " -f wav " $1 " > " wav_dir "/" $2 ".wav"}' \ + > $tgt_dir/sph2wav.sh +bash $tgt_dir/sph2wav.sh +cat $tgt_dir/all.uid | awk -v wav_dir=$(pwd)/$wav_dir '{print $1" "wav_dir"/"$1".wav"}' | sort > $tgt_dir/all_wav.scp +cut -d' ' -f2 $tgt_dir/all_wav.scp | xargs -I{} soxi -s {} > $tgt_dir/all.dur +paste -d' ' $tgt_dir/{all_wav.scp,all.dur} > $tgt_dir/all_wav_dur.scp +rm $tgt_dir/{all.uid,all_sph.flist,sph2wav.sh} + +find $timit_root/{TRAIN,TEST} -iname "*.PHN" > $tgt_dir/all_phn60.flist +while read line; do + if [ ! 
-f $line ]; then + >&2 echo "Cannot find transcription file '$line'" && exit 1; + fi + cut -f3 -d' ' "$line" | tr '\n' ' ' | perl -ape 's: *$:\n:;' +done < $tgt_dir/all_phn60.flist > $tgt_dir/all.phn60 +cat $tgt_dir/all_phn60.flist | sed -e 's#//*#/#g' -e 's#.*/\([^/]*\)/\([^/]*\).PHN#\1_\2#g' | \ + paste -d' ' - $tgt_dir/all.phn60 | \ + $KALDI_ROOT/egs/timit/s5/local/timit_norm_trans.pl -i - -m $KALDI_ROOT/egs/timit/s5/conf/phones.60-48-39.map -to 39 | \ + sort > $tgt_dir/all.phn +echo "done preparing wav and 39-phone transcripts" + + +for s in $setups; do + mkdir -p $tgt_dir/$s + for x in $splits; do + uid_path=config/timit_${s}/${x}.uid + grep -w -f $uid_path $tgt_dir/all.phn | cut -d' ' -f2- > $tgt_dir/$s/$x.phn + ln -sf $(realpath $tgt_dir/$s/$x.phn) $tgt_dir/$s/$x.wrd + + echo "/" > $tgt_dir/$s/$x.tsv && grep -w -f $uid_path $tgt_dir/all_wav_dur.scp | cut -d' ' -f2- | sed 's# #\t#' >> $tgt_dir/$s/$x.tsv + done + + for x in $splits; do + cat $tgt_dir/$s/$x.phn + done | tr ' ' '\n' | sort -u | awk '{print $1" "1}' > $tgt_dir/$s/dict.phn.txt + ln -sf $(realpath $tgt_dir/$s/dict.phn.txt) $tgt_dir/$s/dict.wrd.txt +done +echo "done preparing unmatched and matched setups for TIMIT" + + +for s in $setups; do + zsh scripts/prepare_audio.sh $tgt_dir/$s $tgt_dir/$s/feat $model + + lm_dir=$tgt_dir/$s/phones + fst_dir=$tgt_dir/$s/fst/phn_to_phn + + python $FAIRSEQ_ROOT/fairseq_cli/preprocess.py --dataset-impl mmap --trainpref $tgt_dir/$s/train_text.phn --workers 10 --only-source --destdir $lm_dir --srcdict $tgt_dir/$s/dict.phn.txt + $KENLM_ROOT/lmplz -o 3 < $tgt_dir/$s/train_text.phn --discount_fallback >$lm_dir/train_text_phn.03.arpa + $KENLM_ROOT/build_binary $lm_dir/train_text_phn.03.arpa $lm_dir/train_text_phn.03.bin + $KENLM_ROOT/lmplz -o 4 < $tgt_dir/$s/train_text.phn --discount_fallback >$lm_dir/train_text_phn.04.arpa + $KENLM_ROOT/build_binary $lm_dir/train_text_phn.04.arpa $lm_dir/train_text_phn.04.bin + + python $FAIRSEQ_ROOT/examples/speech_recognition/kaldi/kaldi_initializer.py kaldi_root=$KALDI_ROOT fst_dir=$fst_dir lm_arpa=$lm_dir/train_text_phn.03.arpa data_dir=$tgt_dir/$s in_labels=phn +done +echo "done preprocessing audio and text for wav2vec-U" diff --git a/SpeechT5/fairseq/examples/wav2vec/unsupervised/scripts/remove_silence.py b/SpeechT5/fairseq/examples/wav2vec/unsupervised/scripts/remove_silence.py new file mode 100644 index 0000000000000000000000000000000000000000..fac88b989703262a84b242b2761df621bf02c739 --- /dev/null +++ b/SpeechT5/fairseq/examples/wav2vec/unsupervised/scripts/remove_silence.py @@ -0,0 +1,63 @@ +#!/usr/bin/env python3 -u +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. 
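+# Note that the argparse flags defined below are --tsv / --vads / --out; the --paths flag shown in
+# the docstring does not exist. Illustrative usage (placeholder paths):
+#   python remove_silence.py --tsv shards/train.tsv --vads shards/train.vads --out /path/to/out_dir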
+ +""" +get intervals from .vads file, specify output data, and this script removes silences and saves the audio data in out path folder +paths=shards/train.tsv +vads=shards/train.vads +python remove_silence.py --paths $paths --vads $vads +""" + +import os +import argparse +import torch +import torchaudio +import tqdm + + +parser = argparse.ArgumentParser() +parser.add_argument("--tsv", default="", type=str) +parser.add_argument("--vads", default="", type=str) +parser.add_argument("--out", type=str) +params = parser.parse_args() + +# load paths +paths = [] +with open(params.tsv) as f: + root = next(f).rstrip() + for line in f: + paths.append(os.path.join(root, line.rstrip().split("\t")[0])) + +# load vads +list_intervals = [] +with open(params.vads) as f: + for line in f: + interval = [ + [int(w.split(":")[0]), int(w.split(":")[1])] for w in line.rstrip().split() + ] + list_intervals.append(interval) + + +# load audio and keep only intervals (i.e. remove silences) +for i in tqdm.trange(len(paths)): + data, _ = torchaudio.load(paths[i]) + if len(list_intervals[i]) > 0: + data_filtered = torch.cat( + [data[0][int(it[0]) : int(it[1])] for it in list_intervals[i]] + ).unsqueeze(0) + else: + data_filtered = data + + # YOU MAY NEED TO MODIFY THIS TO GET THE RIGHT SUBPATH + # outpath = params.out + '/'.join(paths[i].split('/')[-1]) + outpath = params.out + "/" + "/".join(paths[i].split("/")[-2:]) + + if not os.path.isdir("/".join(outpath.split("/")[:-1])): + os.makedirs("/".join(outpath.split("/")[:-1])) + if not os.path.exists(outpath): + torchaudio.save(outpath, data_filtered, sample_rate=16000) + else: + print(outpath, "exists!") diff --git a/SpeechT5/fairseq/examples/wav2vec/unsupervised/scripts/vads.py b/SpeechT5/fairseq/examples/wav2vec/unsupervised/scripts/vads.py new file mode 100644 index 0000000000000000000000000000000000000000..2398da97d8c44b8f3f270b22d5508a003482b4d6 --- /dev/null +++ b/SpeechT5/fairseq/examples/wav2vec/unsupervised/scripts/vads.py @@ -0,0 +1,98 @@ +#!/usr/bin/env python3 -u +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. 
+ +import argparse +import sys + +from copy import deepcopy +from scipy.signal import lfilter + +import numpy as np +from tqdm import tqdm +import soundfile as sf +import os.path as osp + + +def get_parser(): + parser = argparse.ArgumentParser(description="compute vad segments") + parser.add_argument( + "--rvad-home", + "-r", + help="path to rvad home (see https://github.com/zhenghuatan/rVADfast)", + required=True, + ) + + return parser + + +def rvad(speechproc, path): + winlen, ovrlen, pre_coef, nfilter, nftt = 0.025, 0.01, 0.97, 20, 512 + ftThres = 0.5 + vadThres = 0.4 + opts = 1 + + data, fs = sf.read(path) + assert fs == 16_000, "sample rate must be 16khz" + ft, flen, fsh10, nfr10 = speechproc.sflux(data, fs, winlen, ovrlen, nftt) + + # --spectral flatness -- + pv01 = np.zeros(ft.shape[0]) + pv01[np.less_equal(ft, ftThres)] = 1 + pitch = deepcopy(ft) + + pvblk = speechproc.pitchblockdetect(pv01, pitch, nfr10, opts) + + # --filtering-- + ENERGYFLOOR = np.exp(-50) + b = np.array([0.9770, -0.9770]) + a = np.array([1.0000, -0.9540]) + fdata = lfilter(b, a, data, axis=0) + + # --pass 1-- + noise_samp, noise_seg, n_noise_samp = speechproc.snre_highenergy( + fdata, nfr10, flen, fsh10, ENERGYFLOOR, pv01, pvblk + ) + + # sets noisy segments to zero + for j in range(n_noise_samp): + fdata[range(int(noise_samp[j, 0]), int(noise_samp[j, 1]) + 1)] = 0 + + vad_seg = speechproc.snre_vad( + fdata, nfr10, flen, fsh10, ENERGYFLOOR, pv01, pvblk, vadThres + ) + return vad_seg, data + + +def main(): + parser = get_parser() + args = parser.parse_args() + + sys.path.append(args.rvad_home) + import speechproc + + stride = 160 + lines = sys.stdin.readlines() + root = lines[0].rstrip() + for fpath in tqdm(lines[1:]): + path = osp.join(root, fpath.split()[0]) + vads, wav = rvad(speechproc, path) + + start = None + vad_segs = [] + for i, v in enumerate(vads): + if start is None and v == 1: + start = i * stride + elif start is not None and v == 0: + vad_segs.append((start, i * stride)) + start = None + if start is not None: + vad_segs.append((start, len(wav))) + + print(" ".join(f"{v[0]}:{v[1]}" for v in vad_segs)) + + +if __name__ == "__main__": + main() diff --git a/SpeechT5/fairseq/examples/wav2vec/unsupervised/scripts/wav2vec_apply_cluster_faiss.py b/SpeechT5/fairseq/examples/wav2vec/unsupervised/scripts/wav2vec_apply_cluster_faiss.py new file mode 100644 index 0000000000000000000000000000000000000000..a5dd7ae6c15b358206e067385be260c94021bf20 --- /dev/null +++ b/SpeechT5/fairseq/examples/wav2vec/unsupervised/scripts/wav2vec_apply_cluster_faiss.py @@ -0,0 +1,128 @@ +#!/usr/bin/env python3 -u +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. 
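+# Assigns every feature frame to its nearest centroid from a cluster directory produced by
+# wav2vec_cluster_faiss.py, writing <split>.src next to the centroids. Illustrative usage
+# (placeholder paths; prepare_audio.sh uses the same call):
+#   python wav2vec_apply_cluster_faiss.py /path/to/manifest_dir --checkpoint /path/to/wav2vec_checkpoint.pt \
+#     --path /path/to/CLUS128 --split train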
+ +import argparse +import os +import os.path as osp +import numpy as np +import tqdm +import torch +import sys + +import faiss +import torch.nn.functional as F + +from wav2vec_cluster_faiss import parse_faiss_specs, Wav2VecFeatureReader + + +def get_parser(): + parser = argparse.ArgumentParser(description="apply clusters") + # fmt: off + parser.add_argument('data', help='location of tsv files') + parser.add_argument('--split', help='split to process', required=True) + parser.add_argument('--labels', help='split to process', default="phn") + parser.add_argument('--path', help='path to pca and centroids', required=True) + parser.add_argument('--checkpoint', type=str, help='checkpoint for wav2vec model (if using wav2vec features)', required=True) + parser.add_argument('--layer', '-l', type=int, help='which layer to read', default=14) + parser.add_argument('--max-tsz', type=int, help='batch kmeans up to this much', default=14) + # fmt: on + + return parser + + +def get_iterator(args): + label_path = osp.join(args.data, f"{args.split}.{args.labels}") + if osp.exists(label_path): + lp = open(label_path, "r") + else: + lp = None + + with open(osp.join(args.data, f"{args.split}.tsv"), "r") as fp: + lines = fp.read().split("\n") + root = lines.pop(0).strip() + files = [line.rstrip() for line in lines if len(line) > 0] + + if lp is not None: + lbls = [line.rstrip() for line in lp] + else: + lbls = [None] * len(files) + + num = len(files) + reader = Wav2VecFeatureReader(args.checkpoint, args.layer) + + def iterate(): + for fname, lbl in zip(files, lbls): + file = osp.join(root, fname.split("\t")[0]) + feats = reader.get_feats(file) + yield feats.data, fname, lbl + + return iterate, num, root + + +def main(): + parser = get_parser() + args = parser.parse_args() + + spec = osp.basename(args.path) + + try: + faiss_spec = parse_faiss_specs(spec.rstrip("/"))[0] + except: + print(spec) + raise + + print("Faiss Spec:", faiss_spec, file=sys.stderr) + + if faiss_spec.pca: + A = torch.from_numpy(np.load(osp.join(args.path, "pca_A.npy"))).cuda() + b = torch.from_numpy(np.load(osp.join(args.path, "pca_b.npy"))).cuda() + print("Loaded PCA", file=sys.stderr) + + centroids = np.load(osp.join(args.path, "centroids.npy")) + print("Loaded centroids", centroids.shape, file=sys.stderr) + + res = faiss.StandardGpuResources() + index_flat = ( + faiss.IndexFlatL2(centroids.shape[1]) + if not faiss_spec.sphere + else faiss.IndexFlatIP(centroids.shape[1]) + ) + faiss_index = faiss.index_cpu_to_gpu(res, 0, index_flat) + faiss_index.add(centroids) + + generator, num, root = get_iterator(args) + iterator = generator() + + had_labels = False + label_path = osp.join(args.path, f"{args.split}.{args.labels}") + + with torch.no_grad(): + with open(osp.join(args.path, f"{args.split}.src"), "w") as fp, open( + osp.join(args.path, f"{args.split}.tsv"), "w" + ) as pp, open(label_path, "w") as lp: + print(root, file=pp) + for f, fname, lbl in tqdm.tqdm(iterator, total=num): + if faiss_spec.pca: + f = torch.mm(f, A) + b + if faiss_spec.norm: + f = F.normalize(f, p=2, dim=-1) + + f = f.cpu().numpy() + + _, z = faiss_index.search(f, 1) + + print(" ".join(str(x.item()) for x in z), file=fp) + print(fname, file=pp) + + if lbl is not None: + print(lbl, file=lp) + had_labels = True + if not had_labels: + os.remove(label_path) + + +if __name__ == "__main__": + main() diff --git a/SpeechT5/fairseq/examples/wav2vec/unsupervised/scripts/wav2vec_cluster_faiss.py b/SpeechT5/fairseq/examples/wav2vec/unsupervised/scripts/wav2vec_cluster_faiss.py new 
file mode 100644 index 0000000000000000000000000000000000000000..632a69e9f4bd98d33abb689c15557c818d0e35ea --- /dev/null +++ b/SpeechT5/fairseq/examples/wav2vec/unsupervised/scripts/wav2vec_cluster_faiss.py @@ -0,0 +1,210 @@ +#!/usr/bin/env python3 -u +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +import argparse +import gc +import os +import os.path as osp +import random +import numpy as np +import tqdm +import torch + +from collections import namedtuple + +import faiss + +import fairseq +import soundfile as sf + + +def get_parser(): + parser = argparse.ArgumentParser( + description="compute kmeans codebook from kaldi-computed feats" + ) + # fmt: off + parser.add_argument('data', help='location of tsv files') + parser.add_argument('--save-dir', help='where to save the output', required=True) + parser.add_argument('--checkpoint', type=str, help='checkpoint for wav2vec model (if using wav2vec features)', required=True) + parser.add_argument('--sample-pct', '-r', type=float, help='percentage of timesteps to sample', default=0) + parser.add_argument('--layer', '-l', type=int, help='which layer to read', default=14) + parser.add_argument('--faiss-specs', '-f', type=str, + help='faiss index specs; separated by space ' + 'format is: PCAx_NORM_CLUSx_SPHERICAL -> ' + 'PCAx if exists first apply PCA ' + 'NORM if exists, normalize the vector by L2 norm ' + 'CLUSx must exist, cluster to x clusters ' + 'SPEHRICAL if exists, apply spherical kmeans', + default='l2') + # fmt: on + + return parser + + +faiss_spec = namedtuple("faiss_spec", ["pca", "norm", "n_clus", "sphere", "spec_str"]) + + +def parse_faiss_specs(specs_str): + specs = [] + for ss in specs_str.split(): + comps = ss.split("_") + pca = 0 + norm = False + n_clus = 0 + sphere = False + for c in comps: + if c.startswith("PCA"): + pca = int(c[3:]) + elif c == "NORM": + norm = True + elif c.startswith("CLUS"): + n_clus = int(c[4:]) + elif c == "SPHERICAL": + sphere = True + assert n_clus > 0 + specs.append( + faiss_spec(pca=pca, norm=norm, n_clus=n_clus, sphere=sphere, spec_str=ss) + ) + return specs + + +class Wav2VecFeatureReader(object): + def __init__(self, cp_file, layer): + state = fairseq.checkpoint_utils.load_checkpoint_to_cpu(cp_file) + + self.layer = layer + + if "cfg" in state: + w2v_args = state["cfg"] + task = fairseq.tasks.setup_task(w2v_args.task) + model = task.build_model(w2v_args.model) + else: + w2v_args = state["args"] + task = fairseq.tasks.setup_task(w2v_args) + model = task.build_model(w2v_args) + model.load_state_dict(state["model"], strict=True) + model.eval() + model.cuda() + self.model = model + + def read_audio(self, fname): + """Load an audio file and return PCM along with the sample rate""" + wav, sr = sf.read(fname) + assert sr == 16e3 + + return wav + + def get_feats(self, loc): + x = self.read_audio(loc) + with torch.no_grad(): + source = torch.from_numpy(x).view(1, -1).float().cuda() + res = self.model( + source=source, mask=False, features_only=True, layer=self.layer + ) + return res["layer_results"][self.layer][0].squeeze(1) + + +def get_iterator(args): + with open(args.data, "r") as fp: + lines = fp.read().split("\n") + root = lines.pop(0).strip() + files = [osp.join(root, line.split("\t")[0]) for line in lines if len(line) > 0] + + if getattr(args, "sample_pct", 0) > 0: + files = random.sample(files, int(args.sample_pct * len(files))) + num = len(files) + reader = 
Wav2VecFeatureReader(args.checkpoint, args.layer) + + def iterate(): + for fname in files: + feats = reader.get_feats(fname) + yield feats.cpu().numpy() + + return iterate, num + + +def main(): + parser = get_parser() + args = parser.parse_args() + + faiss_specs = parse_faiss_specs(args.faiss_specs) + print("Faiss Specs:", faiss_specs) + + feat_path = osp.join(args.save_dir, "features") + if osp.exists(feat_path + ".npy"): + feats = np.load(feat_path + ".npy") + else: + generator, num = get_iterator(args) + iterator = generator() + + feats = [] + for f in tqdm.tqdm(iterator, total=num): + feats.append(f) + + del iterator + del generator + + feats = np.concatenate(feats) + + print(feats.shape) + + os.makedirs(args.save_dir, exist_ok=True) + # np.save(feat_path, feats) + + gc.collect() + torch.cuda.empty_cache() + + reload = False + for spec in faiss_specs: + print("Processing spec", spec) + + if reload: + print("Reloading...") + del feats + gc.collect() + feats = np.load(feat_path + ".npy") + + save_path = osp.join(args.save_dir, spec.spec_str) + os.makedirs(save_path, exist_ok=True) + d = feats.shape[-1] + x = feats + if spec.pca > 0: + print("Computing PCA") + pca = faiss.PCAMatrix(d, spec.pca) + pca.train(x) + d = spec.pca + b = faiss.vector_to_array(pca.b) + A = faiss.vector_to_array(pca.A).reshape(pca.d_out, pca.d_in) + np.save(osp.join(save_path, "pca_A"), A.T) + np.save(osp.join(save_path, "pca_b"), b) + print("Applying PCA") + x = pca.apply_py(x) + + if spec.norm: + reload = spec.pca <= 0 + print("Normalizing") + faiss.normalize_L2(x) + + print("Computing kmeans") + kmeans = faiss.Kmeans( + d, + spec.n_clus, + niter=50, + verbose=True, + spherical=spec.sphere, + max_points_per_centroid=feats.shape[0], + gpu=True, + nredo=3, + ) + kmeans.train(x) + np.save(osp.join(save_path, "centroids"), kmeans.centroids) + del kmeans + del x + gc.collect() + + +if __name__ == "__main__": + main() diff --git a/SpeechT5/fairseq/examples/wav2vec/unsupervised/scripts/wav2vec_extract_features.py b/SpeechT5/fairseq/examples/wav2vec/unsupervised/scripts/wav2vec_extract_features.py new file mode 100644 index 0000000000000000000000000000000000000000..b07e274d202414ce40d00aa64a27cf97bb49c1c3 --- /dev/null +++ b/SpeechT5/fairseq/examples/wav2vec/unsupervised/scripts/wav2vec_extract_features.py @@ -0,0 +1,119 @@ +#!/usr/bin/env python3 -u +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. 
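+# Dumps hidden states from one transformer layer of a wav2vec checkpoint for every utterance listed
+# in <data>/<split>.tsv, appending them to <save-dir>/<split>.npy with a matching .lengths file.
+# Illustrative usage (placeholder paths, as in prepare_audio.sh):
+#   python wav2vec_extract_features.py /path/to/manifest_dir --split train \
+#     --save-dir /path/to/feat --checkpoint /path/to/wav2vec_checkpoint.pt --layer 14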
+ +import argparse +import os +import os.path as osp +import tqdm +import torch +import torch.nn.functional as F +from shutil import copyfile + +from npy_append_array import NpyAppendArray + +import fairseq +import soundfile as sf + + +def get_parser(): + parser = argparse.ArgumentParser( + description="compute kmeans codebook from kaldi-computed feats" + ) + # fmt: off + parser.add_argument('data', help='location of tsv files') + parser.add_argument('--split', help='which split to read', required=True) + parser.add_argument('--save-dir', help='where to save the output', required=True) + parser.add_argument('--checkpoint', type=str, help='checkpoint for wav2vec ctc model', required=True) + parser.add_argument('--layer', type=int, default=14, help='which layer to use') + # fmt: on + + return parser + + +class Wav2VecFeatureReader(object): + def __init__(self, cp_file, layer): + model, cfg, task = fairseq.checkpoint_utils.load_model_ensemble_and_task( + [cp_file] + ) + model = model[0] + model.eval() + model.cuda() + self.model = model + self.task = task + self.layer = layer + + def read_audio(self, fname): + """Load an audio file and return PCM along with the sample rate""" + wav, sr = sf.read(fname) + assert sr == 16e3 + + return wav + + def get_feats(self, loc): + x = self.read_audio(loc) + with torch.no_grad(): + source = torch.from_numpy(x).float().cuda() + if self.task.cfg.normalize: + assert source.dim() == 1, source.dim() + with torch.no_grad(): + source = F.layer_norm(source, source.shape) + source = source.view(1, -1) + + m_res = self.model(source=source, mask=False, features_only=True, layer=self.layer) + return m_res["x"].squeeze(0).cpu() + + +def get_iterator(args): + with open(osp.join(args.data, args.split) + ".tsv", "r") as fp: + lines = fp.read().split("\n") + root = lines.pop(0).strip() + files = [osp.join(root, line.split("\t")[0]) for line in lines if len(line) > 0] + + num = len(files) + reader = Wav2VecFeatureReader(args.checkpoint, args.layer) + + def iterate(): + for fname in files: + w2v_feats = reader.get_feats(fname) + yield w2v_feats + + return iterate, num + + +def main(): + parser = get_parser() + args = parser.parse_args() + + os.makedirs(args.save_dir, exist_ok=True) + + def create_files(dest): + copyfile(osp.join(args.data, args.split) + ".tsv", dest + ".tsv") + if osp.exists(osp.join(args.data, args.split) + ".wrd"): + copyfile(osp.join(args.data, args.split) + ".wrd", dest + ".wrd") + if osp.exists(osp.join(args.data, args.split) + ".phn"): + copyfile(osp.join(args.data, args.split) + ".phn", dest + ".phn") + + if osp.exists(dest + ".npy"): + os.remove(dest + ".npy") + npaa = NpyAppendArray(dest + ".npy") + return npaa + + save_path = osp.join(args.save_dir, args.split) + npaa = create_files(save_path) + + generator, num = get_iterator(args) + iterator = generator() + + with open(save_path + ".lengths", "w") as l_f: + for w2v_feats in tqdm.tqdm(iterator, total=num): + print(len(w2v_feats), file=l_f) + + if len(w2v_feats) > 0: + npaa.append(w2v_feats.numpy()) + + +if __name__ == "__main__": + main() diff --git a/SpeechT5/fairseq/examples/wav2vec/unsupervised/scripts/wer.py b/SpeechT5/fairseq/examples/wav2vec/unsupervised/scripts/wer.py new file mode 100644 index 0000000000000000000000000000000000000000..613ab50d39019f6edf67c56c2353646be2a2f17d --- /dev/null +++ b/SpeechT5/fairseq/examples/wav2vec/unsupervised/scripts/wer.py @@ -0,0 +1,82 @@ +#!/usr/bin/env python3 -u +# Copyright (c) Facebook, Inc. and its affiliates. 
+# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +""" +Implement unsupervised metric for decoding hyperparameter selection: + $$ alpha * LM_PPL + ViterbitUER(%) * 100 $$ +""" +import argparse +import logging +import sys + +import editdistance + +logging.root.setLevel(logging.INFO) +logging.basicConfig(stream=sys.stdout, level=logging.INFO) +logger = logging.getLogger(__name__) + + +def get_parser(): + parser = argparse.ArgumentParser() + parser.add_argument("-s", "--hypo", help="hypo transcription", required=True) + parser.add_argument( + "-r", "--reference", help="reference transcription", required=True + ) + return parser + + +def compute_wer(ref_uid_to_tra, hyp_uid_to_tra, g2p): + d_cnt = 0 + w_cnt = 0 + w_cnt_h = 0 + for uid in hyp_uid_to_tra: + ref = ref_uid_to_tra[uid].split() + if g2p is not None: + hyp = g2p(hyp_uid_to_tra[uid]) + hyp = [p for p in hyp if p != "'" and p != " "] + hyp = [p[:-1] if p[-1].isnumeric() else p for p in hyp] + else: + hyp = hyp_uid_to_tra[uid].split() + d_cnt += editdistance.eval(ref, hyp) + w_cnt += len(ref) + w_cnt_h += len(hyp) + wer = float(d_cnt) / w_cnt + logger.debug( + ( + f"wer = {wer * 100:.2f}%; num. of ref words = {w_cnt}; " + f"num. of hyp words = {w_cnt_h}; num. of sentences = {len(ref_uid_to_tra)}" + ) + ) + return wer + + +def main(): + args = get_parser().parse_args() + + errs = 0 + count = 0 + with open(args.hypo, "r") as hf, open(args.reference, "r") as rf: + for h, r in zip(hf, rf): + h = h.rstrip().split() + r = r.rstrip().split() + errs += editdistance.eval(r, h) + count += len(r) + + logger.info(f"UER: {errs / count * 100:.2f}%") + + +if __name__ == "__main__": + main() + + +def load_tra(tra_path): + with open(tra_path, "r") as f: + uid_to_tra = {} + for line in f: + uid, tra = line.split(None, 1) + uid_to_tra[uid] = tra + logger.debug(f"loaded {len(uid_to_tra)} utterances from {tra_path}") + return uid_to_tra diff --git a/SpeechT5/fairseq/examples/wav2vec/unsupervised/scripts/wrd_to_ltr.py b/SpeechT5/fairseq/examples/wav2vec/unsupervised/scripts/wrd_to_ltr.py new file mode 100644 index 0000000000000000000000000000000000000000..f83471409a434556cab70086ca9e2d72d4bdddd5 --- /dev/null +++ b/SpeechT5/fairseq/examples/wav2vec/unsupervised/scripts/wrd_to_ltr.py @@ -0,0 +1,16 @@ +#!/usr/bin/env python3 -u +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +import sys + + +def main(): + for line in sys.stdin: + print(" ".join(list(line.strip().replace(" ", "|"))) + " |") + + +if __name__ == "__main__": + main() diff --git a/SpeechT5/fairseq/examples/wav2vec/unsupervised/tasks/__init__.py b/SpeechT5/fairseq/examples/wav2vec/unsupervised/tasks/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..6d7dd625e09451be671908578f93148f371f53cd --- /dev/null +++ b/SpeechT5/fairseq/examples/wav2vec/unsupervised/tasks/__init__.py @@ -0,0 +1,11 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. 
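+# Importing the task module here runs its @register_task("unpaired_audio_text") decorator, which is
+# what makes the task name visible to fairseq (e.g. when this example directory is supplied as a user dir).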
+ +from .unpaired_audio_text import UnpairedAudioText + + +__all__ = [ + "UnpairedAudioText", +] diff --git a/SpeechT5/fairseq/examples/wav2vec/unsupervised/tasks/unpaired_audio_text.py b/SpeechT5/fairseq/examples/wav2vec/unsupervised/tasks/unpaired_audio_text.py new file mode 100644 index 0000000000000000000000000000000000000000..5f292528f80d6bb51f16a4324d97342d28fce942 --- /dev/null +++ b/SpeechT5/fairseq/examples/wav2vec/unsupervised/tasks/unpaired_audio_text.py @@ -0,0 +1,447 @@ +# Copyright (c) 2017-present, Facebook, Inc. +# All rights reserved. +# +# This source code is licensed under the license found in the LICENSE file in +# the root directory of this source tree. An additional grant of patent rights +# can be found in the PATENTS file in the same directory. + +from dataclasses import dataclass, field +import logging +import math +import os +from typing import Optional +import torch + +from fairseq.logging import metrics +from fairseq.tasks import FairseqTask, register_task +from ..data import ExtractedFeaturesDataset, RandomInputDataset + +from fairseq.data import ( + Dictionary, + data_utils, + StripTokenDataset, +) +from fairseq.dataclass import FairseqDataclass +from fairseq.distributed.utils import get_data_parallel_world_size +from omegaconf import MISSING + +from examples.speech_recognition.kaldi.kaldi_decoder import ( + KaldiDecoder, + KaldiDecoderConfig, +) + + +logger = logging.getLogger(__name__) + + +@dataclass +class DecodingConfig(FairseqDataclass): + kenlm_path: Optional[str] = None + lm_weight: float = 0 + blank_weight: float = 0 + + +@dataclass +class UnpairedAudioTextConfig(FairseqDataclass): + data: str = field( + default=MISSING, metadata={"help": "path to data directory containing audio"} + ) + text_data: str = field( + default=MISSING, metadata={"help": "path to data directory containing text"} + ) + max_length: Optional[int] = None + labels: Optional[str] = field( + default=None, + metadata={"help": "extension of the label file to load, used for fine-tuning"}, + ) + unfiltered: bool = field( + default=False, metadata={"help": "load data with _unfiltered suffix"} + ) + ctc_eval: bool = field( + default=False, metadata={"help": "eval UER as if computed by CTC"} + ) + sort_by_length: bool = field( + default=True, metadata={"help": "sort examples by length of audio timesteps"} + ) + shuffle: bool = field(default=True, metadata={"help": "shuffle examples"}) + append_eos: bool = field(default=False, metadata={"help": "append eos"}) + uppercase: Optional[bool] = field( + default=False, metadata={"help": "uppercase for LM score computation"} + ) + skipwords: Optional[str] = field( + default="", + metadata={ + "help": "comma-separated words to be removed for LM score computation" + }, + ) + kenlm_path: Optional[str] = None + vocab_usage_power: float = 2 + + word_decoder_config: Optional[KaldiDecoderConfig] = None + word_kenlm_path: Optional[str] = None + + decoding_config: DecodingConfig = DecodingConfig() + + +@register_task("unpaired_audio_text", dataclass=UnpairedAudioTextConfig) +class UnpairedAudioText(FairseqTask): + """ """ + + cfg: UnpairedAudioTextConfig + + def __init__( + self, + cfg: UnpairedAudioTextConfig, + source_dictionary=None, + target_dictionary=None, + ): + super().__init__(cfg) + + self._target_dictionary = target_dictionary + self._source_dictionary = source_dictionary + self.num_symbols = ( + len([s for s in target_dictionary.symbols if not s.startswith("madeup")]) + - target_dictionary.nspecial + ) + self.sil_id = ( + 
target_dictionary.index("<SIL>") if "<SIL>" in target_dictionary else -1 + ) + self.kenlm = None + if cfg.kenlm_path is not None: + import kenlm + + self.kenlm = kenlm.Model(cfg.kenlm_path) + + self.word_kenlm = None + if cfg.word_kenlm_path is not None: + import kenlm + + self.word_kenlm = kenlm.Model(cfg.word_kenlm_path) + + self.uppercase = cfg.uppercase + self.skipwords = set(cfg.skipwords.split(",")) + + def str_postprocess(s): + s = " ".join(w for w in s.split() if w not in self.skipwords) + s = s.upper() if self.uppercase else s + return s + + self.str_postprocess = str_postprocess + self.compute_lm_score = lambda s: self.kenlm.score(self.str_postprocess(s)) + + self.compute_word_score = None + if cfg.word_decoder_config is not None: + self.kaldi_decoder = KaldiDecoder(cfg.word_decoder_config, beam=10) + + def compute_word_score(logits, padding): + res = self.kaldi_decoder.decode(logits, padding) + for r in res: + r = r.result() + assert len(r) == 1 + r = r[0] + yield r["score"], r["words"] + + self.compute_word_score = compute_word_score + + @classmethod + def setup_task(cls, cfg: UnpairedAudioTextConfig, **kwargs): + """Setup the task (e.g., load dictionaries). + + Args: + cfg (AudioPretrainingConfig): configuration of this task + """ + + dict_path = os.path.join(cfg.text_data, "dict.txt") + if os.path.exists(dict_path): + target_dictionary = Dictionary.load(dict_path) + else: + dict_path = os.path.join(cfg.data, f"dict.{cfg.labels}.txt") + target_dictionary = Dictionary.load(dict_path) + + return cls(cfg, target_dictionary=target_dictionary) + + def optimizer_step(self, optimizer, model, update_num): + if hasattr(model, "get_groups_for_update"): + groups = model.get_groups_for_update(update_num) + optimizer.step(groups={groups}) + else: + optimizer.step() + + def valid_step(self, sample, model, criterion): + res = model( + **sample["net_input"], + dense_x_only=True, + ) + + dense_x = res["logits"] + padding_mask = res["padding_mask"] + + word_scores = None + if self.compute_word_score is not None: + word_scores = self.compute_word_score(dense_x.cpu(), padding_mask.cpu()) + + z = dense_x.argmax(-1) + z[padding_mask] = self.target_dictionary.pad() + + vocab_seen = torch.zeros(self.num_symbols, dtype=torch.bool) + + import editdistance + + c_err = 0 + c_len = 0 + pred_c_len = 0 + lm_score_sum = 0 + for i, (x, t, id) in enumerate( + zip( + z, + sample["target"] if "target" in sample else [None] * len(z), + sample["id"], + ) + ): + + if t is not None: + t = t[(t >= self.target_dictionary.nspecial)] + x = x[ + (x >= self.target_dictionary.nspecial) + & (x < (self.num_symbols + self.target_dictionary.nspecial)) + ] + if self.sil_id >= 0: + x = x[x != self.sil_id] + + vocab_seen[x - self.target_dictionary.nspecial] = True + + pred_units_arr = x + if self.cfg.ctc_eval: + pred_units_arr = pred_units_arr.unique_consecutive() + pred_units_arr = pred_units_arr[pred_units_arr != 0] + + if id == 0: + if t is not None: + logger.info(f"REF: {self.target_dictionary.string(t)}") + logger.info(f"HYP: {self.target_dictionary.string(pred_units_arr)}") + + if self.kenlm is not None: + if t is not None: + ref_lm_s = self.compute_lm_score( + self.target_dictionary.string(t) + ) + logger.info( + f"LM [REF]: {ref_lm_s}, {math.pow(10, -ref_lm_s / (len(t) + 1))}" + ) + + hyp_lm_s = self.compute_lm_score( + self.target_dictionary.string(pred_units_arr) + ) + logger.info( + f"LM [HYP]: {hyp_lm_s}, {math.pow(10, -hyp_lm_s / (len(pred_units_arr) + 1))}" + ) + + pred_units_arr = pred_units_arr.tolist() + + 
pred_c_len += len(pred_units_arr) + + if t is not None: + t = t.tolist() + c_err += editdistance.eval(pred_units_arr, t) + c_len += len(t) + else: + c_len = pred_c_len + + if self.kenlm is not None: + pred_str = self.target_dictionary.string(pred_units_arr) + lm_score = self.compute_lm_score(pred_str) + lm_score_sum += lm_score + + kaldi_score_sum = 0 + word_lm_sum = 0 + num_words = 0 + if word_scores is not None: + for score, words in word_scores: + kaldi_score_sum += score + num_words += len(words) + if self.word_kenlm is not None: + word_lm_sum += self.kenlm.score(" ".join(words)) + + try: + world_size = get_data_parallel_world_size() + except: + world_size = 1 + + logging_output = { + "loss": c_err, + "_num_char_errors": c_err, + "_num_chars": c_len, + "_num_pred_chars": pred_c_len, + "ntokens": c_len, + "nsentences": z.size(0), + "sample_size": c_len, + "_world_size": world_size, + "_lm_score_sum": lm_score_sum, + "_kaldi_score_sum": kaldi_score_sum, + "_word_lm_sum": word_lm_sum, + "_num_words": num_words, + "_vocab_seen": vocab_seen, + } + + return c_err, c_len, logging_output + + def load_dataset(self, split: str, task_cfg: FairseqDataclass = None, **kwargs): + data_path = self.cfg.data + task_cfg = task_cfg or self.cfg + + has_unpaired_text = os.path.exists( + os.path.join(self.cfg.text_data, f"{split}.idx") + ) + + self.datasets[split] = ExtractedFeaturesDataset( + path=data_path, + split=split, + min_length=3, + max_length=task_cfg.max_length, + labels=None if has_unpaired_text else task_cfg.labels, + label_dict=self.target_dictionary, + shuffle=getattr(task_cfg, "shuffle", True), + sort_by_length=task_cfg.sort_by_length, + ) + + logger.info(f"split {split} has unpaired text? {has_unpaired_text}") + if has_unpaired_text: + text_dataset = data_utils.load_indexed_dataset( + os.path.join(self.cfg.text_data, split), self.target_dictionary + ) + text_dataset = StripTokenDataset(text_dataset, self.target_dictionary.eos()) + self.datasets[split] = RandomInputDataset( + self.datasets[split], + text_dataset, + ["random_label"], + add_to_input=True, + pad_idx=self.target_dictionary.pad(), + ) + + @property + def source_dictionary(self): + return self._source_dictionary + + @property + def target_dictionary(self): + """Return the :class:`~fairseq.data.Dictionary` for the language + model.""" + return self._target_dictionary + + def max_positions(self): + """Maximum input length supported by the encoder.""" + return None + + def reduce_metrics(self, logging_outputs, criterion): + super().reduce_metrics(logging_outputs, criterion) + + zero = torch.scalar_tensor(0.0) + num_char_errors = sum( + log.get("_num_char_errors", zero) for log in logging_outputs + ) + num_chars = sum(log.get("_num_chars", zero) for log in logging_outputs) + num_word_errors = sum( + log.get("_num_word_errors", zero) for log in logging_outputs + ) + num_words = sum(log.get("_num_words", zero) for log in logging_outputs) + num_pred_chars = sum( + log.get("_num_pred_chars", zero) for log in logging_outputs + ) + + lm_score_sum = sum(log.get("_lm_score_sum", zero) for log in logging_outputs) + vocab_seen = ( + sum(log.get("_vocab_seen", zero) for log in logging_outputs) + .bool() + .sum() + .item() + ) + kaldi_score_sum = sum( + log.get("_kaldi_score_sum", zero) for log in logging_outputs + ) + word_lm_sum = sum(log.get("_word_lm_sum", zero) for log in logging_outputs) + + metrics.log_scalar_sum("_num_char_errors", num_char_errors) + metrics.log_scalar_sum("_num_chars", num_chars) + 
metrics.log_scalar_sum("_num_word_errors", num_word_errors) + metrics.log_scalar_sum("_num_words", num_words) + + metrics.log_scalar_sum("lm_score_sum", lm_score_sum) + metrics.log_scalar_sum("num_pred_chars", num_pred_chars) + + if self.cfg.word_kenlm_path is not None: + metrics.log_scalar_sum("kaldi_score_sum", kaldi_score_sum) + metrics.log_scalar_sum("word_lm_sum", word_lm_sum) + + if num_chars > 0: + metrics.log_derived( + "uer", + lambda meters: meters["_num_char_errors"].sum + * 100.0 + / meters["_num_chars"].sum + if meters["_num_chars"].sum > 0 + else float("nan"), + ) + + if lm_score_sum < 0 and vocab_seen > 0: + metrics.log_scalar("vocab_seen_pct", vocab_seen / self.num_symbols) + + metrics.log_derived( + "weighted_lm_ppl", + lambda meters: math.pow( + 10, + -meters["lm_score_sum"].sum + / ( + meters["num_pred_chars"].sum + meters["nsentences"].sum + ), # account for </s> + ) + / meters["vocab_seen_pct"].avg ** self.cfg.vocab_usage_power, + ) + + metrics.log_derived( + "lm_ppl", + lambda meters: math.pow( + 10, + -meters["lm_score_sum"].sum + / ( + meters["num_pred_chars"].sum + meters["nsentences"].sum + ), # account for </s> + ), + ) + else: + metrics.log_derived("weighted_lm_ppl", lambda meters: float("inf")) + + if num_words > 0: + if word_lm_sum != 0: + metrics.log_derived( + "word_lm_ppl", + lambda meters: math.pow( + 10, + -meters["word_lm_sum"].sum + / ( + meters["_num_words"].sum + meters["nsentences"].sum + ), # account for </s> + ), + ) + metrics.log_derived( + "weighted_word_lm_ppl", + lambda meters: math.pow( + 10, + -meters["word_lm_sum"].sum + / ( + meters["_num_words"].sum + meters["nsentences"].sum + ), # account for </s> + ) + / meters["vocab_seen_pct"].avg ** self.cfg.vocab_usage_power, + ) + + if self.cfg.word_kenlm_path is not None: + metrics.log_derived( + "kaldi_score", + lambda meters: meters["kaldi_score_sum"].sum + / meters["nsentences"].sum, + ) + + def build_model(self, cfg: FairseqDataclass): + model = super().build_model(cfg) + + return model diff --git a/SpeechT5/fairseq/examples/wav2vec/unsupervised/w2vu_generate.py b/SpeechT5/fairseq/examples/wav2vec/unsupervised/w2vu_generate.py new file mode 100644 index 0000000000000000000000000000000000000000..2bad873616e33274357a993633a281dbc8b9e168 --- /dev/null +++ b/SpeechT5/fairseq/examples/wav2vec/unsupervised/w2vu_generate.py @@ -0,0 +1,709 @@ +#!/usr/bin/env python3 -u +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +""" +Run inference for pre-processed data with a trained model. 
+""" + +import ast +from collections import namedtuple +from dataclasses import dataclass, field +from enum import Enum, auto +import hydra +from hydra.core.config_store import ConfigStore +import logging +import math +import os +from omegaconf import OmegaConf +from typing import Optional +import sys + +import editdistance +import torch + +from hydra.core.hydra_config import HydraConfig + +from fairseq import checkpoint_utils, progress_bar, tasks, utils +from fairseq.data.data_utils import post_process +from fairseq.dataclass.configs import FairseqDataclass, FairseqConfig +from fairseq.logging.meters import StopwatchMeter +from omegaconf import open_dict + +from examples.speech_recognition.kaldi.kaldi_decoder import KaldiDecoderConfig + +logging.root.setLevel(logging.INFO) +logging.basicConfig(stream=sys.stdout, level=logging.INFO) +logger = logging.getLogger(__name__) + + +class DecoderType(Enum): + VITERBI = auto() + KENLM = auto() + FAIRSEQ = auto() + KALDI = auto() + + +@dataclass +class UnsupGenerateConfig(FairseqDataclass): + fairseq: FairseqConfig = FairseqConfig() + lm_weight: float = field( + default=2.0, + metadata={"help": "language model weight"}, + ) + w2l_decoder: DecoderType = field( + default=DecoderType.VITERBI, + metadata={"help": "type of decoder to use"}, + ) + kaldi_decoder_config: Optional[KaldiDecoderConfig] = None + lexicon: Optional[str] = field( + default=None, + metadata={ + "help": "path to lexicon. This is also used to 'phonemize' for unsupvised param tuning" + }, + ) + lm_model: Optional[str] = field( + default=None, + metadata={"help": "path to language model (kenlm or fairseq)"}, + ) + unit_lm: bool = field( + default=False, + metadata={"help": "whether to use unit lm"}, + ) + beam_threshold: float = field( + default=50.0, + metadata={"help": "beam score threshold"}, + ) + beam_size_token: float = field( + default=100.0, + metadata={"help": "max tokens per beam"}, + ) + beam: int = field( + default=5, + metadata={"help": "decoder beam size"}, + ) + nbest: int = field( + default=1, + metadata={"help": "number of results to return"}, + ) + word_score: float = field( + default=1.0, + metadata={"help": "word score to add at end of word"}, + ) + unk_weight: float = field( + default=-math.inf, + metadata={"help": "unknown token weight"}, + ) + sil_weight: float = field( + default=0.0, + metadata={"help": "silence token weight"}, + ) + targets: Optional[str] = field( + default=None, + metadata={"help": "extension of ground truth labels to compute UER"}, + ) + results_path: Optional[str] = field( + default=None, + metadata={"help": "where to store results"}, + ) + post_process: Optional[str] = field( + default=None, + metadata={"help": "how to post process results"}, + ) + vocab_usage_power: float = field( + default=2, + metadata={"help": "for unsupervised param tuning"}, + ) + + viterbi_transcript: Optional[str] = field( + default=None, + metadata={"help": "for unsupervised param tuning"}, + ) + min_lm_ppl: float = field( + default=0, + metadata={"help": "for unsupervised param tuning"}, + ) + min_vt_uer: float = field( + default=0, + metadata={"help": "for unsupervised param tuning"}, + ) + + blank_weight: float = field( + default=0, + metadata={"help": "value to add or set for blank emission"}, + ) + blank_mode: str = field( + default="set", + metadata={ + "help": "can be add or set, how to modify blank emission with blank weight" + }, + ) + sil_is_blank: bool = field( + default=False, + metadata={"help": "if true, <SIL> token is same as blank token"}, + ) + + 
unsupervised_tuning: bool = field( + default=False, + metadata={ + "help": "if true, returns a score based on unsupervised param selection metric instead of UER" + }, + ) + is_ax: bool = field( + default=False, + metadata={ + "help": "if true, assumes we are using ax for tuning and returns a tuple for ax to consume" + }, + ) + + +def get_dataset_itr(cfg, task): + return task.get_batch_iterator( + dataset=task.dataset(cfg.fairseq.dataset.gen_subset), + max_tokens=cfg.fairseq.dataset.max_tokens, + max_sentences=cfg.fairseq.dataset.batch_size, + max_positions=(sys.maxsize, sys.maxsize), + ignore_invalid_inputs=cfg.fairseq.dataset.skip_invalid_size_inputs_valid_test, + required_batch_size_multiple=cfg.fairseq.dataset.required_batch_size_multiple, + num_shards=cfg.fairseq.dataset.num_shards, + shard_id=cfg.fairseq.dataset.shard_id, + num_workers=cfg.fairseq.dataset.num_workers, + data_buffer_size=cfg.fairseq.dataset.data_buffer_size, + ).next_epoch_itr(shuffle=False) + + +def process_predictions( + cfg: UnsupGenerateConfig, + hypos, + tgt_dict, + target_tokens, + res_files, +): + retval = [] + word_preds = [] + transcriptions = [] + dec_scores = [] + + for i, hypo in enumerate(hypos[: min(len(hypos), cfg.nbest)]): + if torch.is_tensor(hypo["tokens"]): + tokens = hypo["tokens"].int().cpu() + tokens = tokens[tokens >= tgt_dict.nspecial] + hyp_pieces = tgt_dict.string(tokens) + else: + hyp_pieces = " ".join(hypo["tokens"]) + + if "words" in hypo and len(hypo["words"]) > 0: + hyp_words = " ".join(hypo["words"]) + else: + hyp_words = post_process(hyp_pieces, cfg.post_process) + + to_write = {} + if res_files is not None: + to_write[res_files["hypo.units"]] = hyp_pieces + to_write[res_files["hypo.words"]] = hyp_words + + tgt_words = "" + if target_tokens is not None: + if isinstance(target_tokens, str): + tgt_pieces = tgt_words = target_tokens + else: + tgt_pieces = tgt_dict.string(target_tokens) + tgt_words = post_process(tgt_pieces, cfg.post_process) + + if res_files is not None: + to_write[res_files["ref.units"]] = tgt_pieces + to_write[res_files["ref.words"]] = tgt_words + + if not cfg.fairseq.common_eval.quiet: + logger.info(f"HYPO {i}:" + hyp_words) + if tgt_words: + logger.info("TARGET:" + tgt_words) + + if "am_score" in hypo and "lm_score" in hypo: + logger.info( + f"DECODER AM SCORE: {hypo['am_score']}, DECODER LM SCORE: {hypo['lm_score']}, DECODER SCORE: {hypo['score']}" + ) + elif "score" in hypo: + logger.info(f"DECODER SCORE: {hypo['score']}") + + logger.info("___________________") + + hyp_words_arr = hyp_words.split() + tgt_words_arr = tgt_words.split() + + retval.append( + ( + editdistance.eval(hyp_words_arr, tgt_words_arr), + len(hyp_words_arr), + len(tgt_words_arr), + hyp_pieces, + hyp_words, + ) + ) + word_preds.append(hyp_words_arr) + transcriptions.append(to_write) + dec_scores.append(-hypo.get("score", 0)) # negate cuz kaldi returns NLL + + if len(retval) > 1: + best = None + for r, t in zip(retval, transcriptions): + if best is None or r[0] < best[0][0]: + best = r, t + for dest, tran in best[1].items(): + print(tran, file=dest) + dest.flush() + return best[0] + + assert len(transcriptions) == 1 + for dest, tran in transcriptions[0].items(): + print(tran, file=dest) + + return retval[0] + + +def prepare_result_files(cfg: UnsupGenerateConfig): + def get_res_file(file_prefix): + if cfg.fairseq.dataset.num_shards > 1: + file_prefix = f"{cfg.fairseq.dataset.shard_id}_{file_prefix}" + path = os.path.join( + cfg.results_path, + "{}{}.txt".format( + cfg.fairseq.dataset.gen_subset, + 
file_prefix, + ), + ) + return open(path, "w", buffering=1) + + if not cfg.results_path: + return None + + return { + "hypo.words": get_res_file(""), + "hypo.units": get_res_file("_units"), + "ref.words": get_res_file("_ref"), + "ref.units": get_res_file("_ref_units"), + "hypo.nbest.words": get_res_file("_nbest_words"), + } + + +def optimize_models(cfg: UnsupGenerateConfig, use_cuda, models): + """Optimize ensemble for generation""" + for model in models: + model.eval() + if cfg.fairseq.common.fp16: + model.half() + if use_cuda: + model.cuda() + + +GenResult = namedtuple( + "GenResult", + [ + "count", + "errs_t", + "gen_timer", + "lengths_hyp_unit_t", + "lengths_hyp_t", + "lengths_t", + "lm_score_t", + "num_feats", + "num_sentences", + "num_symbols", + "vt_err_t", + "vt_length_t", + ], +) + + +def generate(cfg: UnsupGenerateConfig, models, saved_cfg, use_cuda): + task = tasks.setup_task(cfg.fairseq.task) + saved_cfg.task.labels = cfg.fairseq.task.labels + task.load_dataset(cfg.fairseq.dataset.gen_subset, task_cfg=saved_cfg.task) + # Set dictionary + tgt_dict = task.target_dictionary + logger.info( + "| {} {} {} examples".format( + cfg.fairseq.task.data, + cfg.fairseq.dataset.gen_subset, + len(task.dataset(cfg.fairseq.dataset.gen_subset)), + ) + ) + # Load dataset (possibly sharded) + itr = get_dataset_itr(cfg, task) + # Initialize generator + gen_timer = StopwatchMeter() + + def build_generator(cfg: UnsupGenerateConfig): + w2l_decoder = cfg.w2l_decoder + if w2l_decoder == DecoderType.VITERBI: + from examples.speech_recognition.w2l_decoder import W2lViterbiDecoder + + return W2lViterbiDecoder(cfg, task.target_dictionary) + elif w2l_decoder == DecoderType.KENLM: + from examples.speech_recognition.w2l_decoder import W2lKenLMDecoder + + return W2lKenLMDecoder(cfg, task.target_dictionary) + elif w2l_decoder == DecoderType.FAIRSEQ: + from examples.speech_recognition.w2l_decoder import W2lFairseqLMDecoder + + return W2lFairseqLMDecoder(cfg, task.target_dictionary) + elif w2l_decoder == DecoderType.KALDI: + from examples.speech_recognition.kaldi.kaldi_decoder import KaldiDecoder + + assert cfg.kaldi_decoder_config is not None + + return KaldiDecoder( + cfg.kaldi_decoder_config, + cfg.beam, + ) + else: + raise NotImplementedError( + "only wav2letter decoders with (viterbi, kenlm, fairseqlm) options are supported at the moment but found " + + str(w2l_decoder) + ) + + generator = build_generator(cfg) + + kenlm = None + fairseq_lm = None + if cfg.lm_model is not None: + import kenlm + + kenlm = kenlm.Model(cfg.lm_model) + + num_sentences = 0 + if cfg.results_path is not None and not os.path.exists(cfg.results_path): + os.makedirs(cfg.results_path) + + res_files = prepare_result_files(cfg) + errs_t = 0 + lengths_hyp_t = 0 + lengths_hyp_unit_t = 0 + lengths_t = 0 + count = 0 + num_feats = 0 + all_hyp_pieces = [] + all_hyp_words = [] + + num_symbols = ( + len([s for s in tgt_dict.symbols if not s.startswith("madeup")]) + - tgt_dict.nspecial + ) + targets = None + if cfg.targets is not None: + tgt_path = os.path.join( + cfg.fairseq.task.data, cfg.fairseq.dataset.gen_subset + "." 
+ cfg.targets + ) + if os.path.exists(tgt_path): + with open(tgt_path, "r") as f: + targets = f.read().splitlines() + viterbi_transcript = None + if cfg.viterbi_transcript is not None and len(cfg.viterbi_transcript) > 0: + logger.info(f"loading viterbi transcript from {cfg.viterbi_transcript}") + with open(cfg.viterbi_transcript, "r") as vf: + viterbi_transcript = vf.readlines() + viterbi_transcript = [v.rstrip().split() for v in viterbi_transcript] + + gen_timer.start() + + start = 0 + end = len(itr) + + hypo_futures = None + if cfg.w2l_decoder == DecoderType.KALDI: + logger.info("Extracting features") + hypo_futures = [] + samples = [] + with progress_bar.build_progress_bar(cfg.fairseq.common, itr) as t: + for i, sample in enumerate(t): + if "net_input" not in sample or i < start or i >= end: + continue + if "padding_mask" not in sample["net_input"]: + sample["net_input"]["padding_mask"] = None + + hypos, num_feats = gen_hypos( + generator, models, num_feats, sample, task, use_cuda + ) + hypo_futures.append(hypos) + samples.append(sample) + if cfg.debug: + break + itr = list(zip(hypo_futures, samples)) + start = 0 + end = len(itr) + logger.info("Finished extracting features") + + with progress_bar.build_progress_bar(cfg.fairseq.common, itr) as t: + for i, sample in enumerate(t): + if i < start or i >= end: + continue + + if hypo_futures is not None: + hypos, sample = sample + hypos = [h.result() for h in hypos] + else: + if "net_input" not in sample: + continue + + hypos, num_feats = gen_hypos( + generator, models, num_feats, sample, task, use_cuda + ) + + for i, sample_id in enumerate(sample["id"].tolist()): + if targets is not None: + target_tokens = targets[sample_id] + elif "target" in sample or "target_label" in sample: + toks = ( + sample["target"][i, :] + if "target_label" not in sample + else sample["target_label"][i, :] + ) + + target_tokens = utils.strip_pad(toks, tgt_dict.pad()).int().cpu() + else: + target_tokens = None + + # Process top predictions + ( + errs, + length_hyp, + length, + hyp_pieces, + hyp_words, + ) = process_predictions( + cfg, + hypos[i], + tgt_dict, + target_tokens, + res_files, + ) + errs_t += errs + lengths_hyp_t += length_hyp + lengths_hyp_unit_t += ( + len(hyp_pieces) if len(hyp_pieces) > 0 else len(hyp_words) + ) + lengths_t += length + count += 1 + all_hyp_pieces.append(hyp_pieces) + all_hyp_words.append(hyp_words) + + num_sentences += ( + sample["nsentences"] if "nsentences" in sample else sample["id"].numel() + ) + + lm_score_sum = 0 + if kenlm is not None: + + if cfg.unit_lm: + lm_score_sum = sum(kenlm.score(w) for w in all_hyp_pieces) + else: + lm_score_sum = sum(kenlm.score(w) for w in all_hyp_words) + elif fairseq_lm is not None: + lm_score_sum = sum(fairseq_lm.score([h.split() for h in all_hyp_words])[0]) + + vt_err_t = 0 + vt_length_t = 0 + if viterbi_transcript is not None: + unit_hyps = [] + if cfg.targets is not None and cfg.lexicon is not None: + lex = {} + with open(cfg.lexicon, "r") as lf: + for line in lf: + items = line.rstrip().split() + lex[items[0]] = items[1:] + for h in all_hyp_pieces: + hyp_ws = [] + for w in h.split(): + assert w in lex, w + hyp_ws.extend(lex[w]) + unit_hyps.append(hyp_ws) + + else: + unit_hyps.extend([h.split() for h in all_hyp_words]) + + vt_err_t = sum( + editdistance.eval(vt, h) for vt, h in zip(viterbi_transcript, unit_hyps) + ) + + vt_length_t = sum(len(h) for h in viterbi_transcript) + + if res_files is not None: + for r in res_files.values(): + r.close() + + gen_timer.stop(lengths_hyp_t) + + return 
GenResult( + count, + errs_t, + gen_timer, + lengths_hyp_unit_t, + lengths_hyp_t, + lengths_t, + lm_score_sum, + num_feats, + num_sentences, + num_symbols, + vt_err_t, + vt_length_t, + ) + + +def gen_hypos(generator, models, num_feats, sample, task, use_cuda): + sample = utils.move_to_cuda(sample) if use_cuda else sample + + if "features" in sample["net_input"]: + sample["net_input"]["dense_x_only"] = True + num_feats += ( + sample["net_input"]["features"].shape[0] + * sample["net_input"]["features"].shape[1] + ) + hypos = task.inference_step(generator, models, sample, None) + return hypos, num_feats + + +def main(cfg: UnsupGenerateConfig, model=None): + if ( + cfg.fairseq.dataset.max_tokens is None + and cfg.fairseq.dataset.batch_size is None + ): + cfg.fairseq.dataset.max_tokens = 1024000 + + use_cuda = torch.cuda.is_available() and not cfg.fairseq.common.cpu + + task = tasks.setup_task(cfg.fairseq.task) + + overrides = ast.literal_eval(cfg.fairseq.common_eval.model_overrides) + + if cfg.fairseq.task._name == "unpaired_audio_text": + overrides["model"] = { + "blank_weight": cfg.blank_weight, + "blank_mode": cfg.blank_mode, + "blank_is_sil": cfg.sil_is_blank, + "no_softmax": True, + "segmentation": { + "type": "NONE", + }, + } + else: + overrides["model"] = { + "blank_weight": cfg.blank_weight, + "blank_mode": cfg.blank_mode, + } + + if model is None: + # Load ensemble + logger.info("| loading model(s) from {}".format(cfg.fairseq.common_eval.path)) + models, saved_cfg = checkpoint_utils.load_model_ensemble( + cfg.fairseq.common_eval.path.split("\\"), + arg_overrides=overrides, + task=task, + suffix=cfg.fairseq.checkpoint.checkpoint_suffix, + strict=(cfg.fairseq.checkpoint.checkpoint_shard_count == 1), + num_shards=cfg.fairseq.checkpoint.checkpoint_shard_count, + ) + optimize_models(cfg, use_cuda, models) + else: + models = [model] + saved_cfg = cfg.fairseq + + with open_dict(saved_cfg.task): + saved_cfg.task.shuffle = False + saved_cfg.task.sort_by_length = False + + gen_result = generate(cfg, models, saved_cfg, use_cuda) + + wer = None + if gen_result.lengths_t > 0: + wer = gen_result.errs_t * 100.0 / gen_result.lengths_t + logger.info(f"WER: {wer}") + + lm_ppl = float("inf") + + if gen_result.lm_score_t != 0 and gen_result.lengths_hyp_t > 0: + hyp_len = gen_result.lengths_hyp_t + lm_ppl = math.pow( + 10, -gen_result.lm_score_t / (hyp_len + gen_result.num_sentences) + ) + logger.info(f"LM PPL: {lm_ppl}") + + logger.info( + "| Processed {} sentences ({} tokens) in {:.1f}s ({:.2f}" + " sentences/s, {:.2f} tokens/s)".format( + gen_result.num_sentences, + gen_result.gen_timer.n, + gen_result.gen_timer.sum, + gen_result.num_sentences / gen_result.gen_timer.sum, + 1.0 / gen_result.gen_timer.avg, + ) + ) + + vt_diff = None + if gen_result.vt_length_t > 0: + vt_diff = gen_result.vt_err_t / gen_result.vt_length_t + vt_diff = max(cfg.min_vt_uer, vt_diff) + + lm_ppl = max(cfg.min_lm_ppl, lm_ppl) + + if not cfg.unsupervised_tuning == 0: + weighted_score = wer + else: + weighted_score = math.log(lm_ppl) * (vt_diff or 1.0) + + res = ( + f"| Generate {cfg.fairseq.dataset.gen_subset} with beam={cfg.beam}, " + f"lm_weight={cfg.kaldi_decoder_config.acoustic_scale if cfg.kaldi_decoder_config else cfg.lm_weight}, " + f"word_score={cfg.word_score}, sil_weight={cfg.sil_weight}, blank_weight={cfg.blank_weight}, " + f"WER: {wer}, LM_PPL: {lm_ppl}, num feats: {gen_result.num_feats}, " + f"length: {gen_result.lengths_hyp_t}, UER to viterbi: {(vt_diff or 0) * 100}, score: {weighted_score}" + ) + + logger.info(res) 
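+    # Depending on `unsupervised_tuning`, the score returned below is either the
+    # plain WER or an unsupervised model-selection proxy derived from the LM
+    # perplexity and the distance to the Viterbi transcript (see above).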
+ # print(res) + + return task, weighted_score + + +@hydra.main( + config_path=os.path.join("../../..", "fairseq", "config"), config_name="config" +) +def hydra_main(cfg): + with open_dict(cfg): + # make hydra logging work with ddp (see # see https://github.com/facebookresearch/hydra/issues/1126) + cfg.job_logging_cfg = OmegaConf.to_container( + HydraConfig.get().job_logging, resolve=True + ) + + cfg = OmegaConf.create( + OmegaConf.to_container(cfg, resolve=False, enum_to_str=False) + ) + OmegaConf.set_struct(cfg, True) + logger.info(cfg) + + utils.import_user_module(cfg.fairseq.common) + + _, score = main(cfg) + + if cfg.is_ax: + return score, None + return score + + +def cli_main(): + try: + from hydra._internal.utils import get_args + + cfg_name = get_args().config_name or "config" + except: + logger.warning("Failed to get config name from hydra args") + cfg_name = "config" + + cs = ConfigStore.instance() + cs.store(name=cfg_name, node=UnsupGenerateConfig) + hydra_main() + + +if __name__ == "__main__": + cli_main() diff --git a/SpeechT5/fairseq/examples/wav2vec/vq-wav2vec_featurize.py b/SpeechT5/fairseq/examples/wav2vec/vq-wav2vec_featurize.py new file mode 100644 index 0000000000000000000000000000000000000000..627072ee174c22831209e00984b945eb9dc2c279 --- /dev/null +++ b/SpeechT5/fairseq/examples/wav2vec/vq-wav2vec_featurize.py @@ -0,0 +1,250 @@ +#!/usr/bin/env python3 +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +""" +Helper script to pre-compute embeddings for a flashlight (previously called wav2letter++) dataset +""" + +import argparse +import glob +import os +import os.path as osp +import pprint + +import soundfile as sf +import torch +import fairseq +from torch import nn +from torch.utils.data import DataLoader + + +try: + import tqdm +except: + print("Install tqdm to use --log-format=tqdm") + + +class FilesDataset: + def __init__(self, files, labels): + self.files = files + if labels and osp.exists(labels): + with open(labels, "r") as lbl_f: + self.labels = [line.rstrip() for line in lbl_f] + else: + self.labels = labels + + def __len__(self): + return len(self.files) + + def __getitem__(self, index): + fname = self.files[index] + + wav, sr = sf.read(fname) + assert sr == 16000 + + wav = torch.from_numpy(wav).float() + lbls = None + if self.labels: + if isinstance(self.labels, str): + lbl_file = osp.splitext(fname)[0] + "." 
+ self.labels + with open(lbl_file, "r") as lblf: + lbls = lblf.readline() + assert lbls is not None + else: + lbls = self.labels[index] + return wav, lbls + + def collate(self, batch): + return batch + + +class ArgTypes: + @staticmethod + def existing_path(arg): + arg = str(arg) + assert osp.exists(arg), f"File {arg} does not exist" + return arg + + @staticmethod + def mkdir(arg): + arg = str(arg) + os.makedirs(arg, exist_ok=True) + return arg + + +class DatasetWriter: + def __init__(self): + + self.args = self.load_config() + pprint.pprint(self.args.__dict__) + + self.model = self.load_model() + + def __getattr__(self, attr): + return getattr(self.args, attr) + + def read_manifest(self, fname): + + with open(fname, "r") as fp: + lines = fp.read().split("\n") + root = lines.pop(0).strip() + fnames = [ + osp.join(root, line.split("\t")[0]) for line in lines if len(line) > 0 + ] + + return fnames + + def process_splits(self): + + if self.args.shard is not None or self.args.num_shards is not None: + assert self.args.shard is not None and self.args.num_shards is not None + + for split in self.splits: + print(split) + + if self.extension == "tsv": + datadir = osp.join(self.data_dir, f"{split}.{self.extension}") + print("Reading manifest file: ", datadir) + files = self.read_manifest(datadir) + else: + datadir = osp.join(self.data_dir, split, f"**/*.{self.extension}") + files = glob.glob(datadir, recursive=True) + + assert len(files) > 0 + + if self.args.shard is not None: + files = files[self.args.shard :: self.args.num_shards] + + lbls = [] + with open(self.data_file(split), "w") as srcf: + for line, lbl in self.iterate(files): + print(line, file=srcf) + if self.args.labels: + lbls.append(lbl + "\n") + + if self.args.labels: + assert all(a is not None for a in lbls) + with open(self.lbl_file(split), "w") as lblf: + lblf.writelines(lbls) + + def iterate(self, files): + + data = self.load_data(files) + for samples in tqdm.tqdm(data, total=len(files) // 32): + + for wav, lbl in samples: + x = wav.unsqueeze(0).float().cuda() + + div = 1 + while x.size(-1) // div > self.args.max_size: + div += 1 + + xs = x.chunk(div, dim=-1) + + result = [] + for x in xs: + torch.cuda.empty_cache() + x = self.model.feature_extractor(x) + if self.quantize_location == "encoder": + with torch.no_grad(): + _, idx = self.model.vector_quantizer.forward_idx(x) + idx = idx.squeeze(0).cpu() + else: + with torch.no_grad(): + z = self.model.feature_aggregator(x) + _, idx = self.model.vector_quantizer.forward_idx(z) + idx = idx.squeeze(0).cpu() + result.append(idx) + + idx = torch.cat(result, dim=0) + yield " ".join("-".join(map(str, a.tolist())) for a in idx), lbl + + def lbl_file(self, name): + shard_part = "" if self.args.shard is None else f".{self.args.shard}" + return osp.join(self.output_dir, f"{name}.lbl{shard_part}") + + def data_file(self, name): + shard_part = "" if self.args.shard is None else f".{self.args.shard}" + return osp.join(self.output_dir, f"{name}.src{shard_part}") + + def var_file(self): + return osp.join(self.output_dir, f"vars.pt") + + def load_config(self): + + parser = argparse.ArgumentParser("Vector Quantized wav2vec features") + + # Model Arguments + parser.add_argument("--checkpoint", type=ArgTypes.existing_path, required=True) + parser.add_argument("--data-parallel", action="store_true") + + # Output Arguments + parser.add_argument("--output-dir", type=ArgTypes.mkdir, required=True) + + # Data Arguments + parser.add_argument("--data-dir", type=ArgTypes.existing_path, required=True) + 
parser.add_argument("--splits", type=str, nargs="+", required=True) + parser.add_argument("--extension", type=str, required=True) + parser.add_argument("--labels", type=str, required=False) + + parser.add_argument("--shard", type=int, default=None) + parser.add_argument("--num-shards", type=int, default=None) + parser.add_argument("--max-size", type=int, default=1300000) + + # Logger Arguments + parser.add_argument( + "--log-format", type=str, choices=["none", "simple", "tqdm"] + ) + + return parser.parse_args() + + def load_data(self, fnames): + + dataset = FilesDataset(fnames, self.args.labels) + loader = DataLoader( + dataset, batch_size=32, collate_fn=dataset.collate, num_workers=8 + ) + return loader + + def load_model(self): + model, cfg, task = fairseq.checkpoint_utils.load_model_ensemble_and_task([self.checkpoint]) + model = model[0] + + self.quantize_location = getattr(cfg.model, "vq", "encoder") + + model.eval().float() + model.cuda() + + if self.data_parallel: + model = nn.DataParallel(model) + + return model + + def __call__(self): + + self.process_splits() + + if hasattr(self.model.feature_extractor, "vars") and ( + self.args.shard is None or self.args.shard == 0 + ): + vars = ( + self.model.feature_extractor.vars.view( + self.model.feature_extractor.banks, + self.model.feature_extractor.num_vars, + -1, + ) + .cpu() + .detach() + ) + print("writing learned latent variable embeddings: ", vars.shape) + torch.save(vars, self.var_file()) + + +if __name__ == "__main__": + write_data = DatasetWriter() + + write_data() + print("Done.") diff --git a/SpeechT5/fairseq/examples/wav2vec/wav2vec_featurize.py b/SpeechT5/fairseq/examples/wav2vec/wav2vec_featurize.py new file mode 100644 index 0000000000000000000000000000000000000000..588268b7080cbd3400ac144604b2d75cef2876dd --- /dev/null +++ b/SpeechT5/fairseq/examples/wav2vec/wav2vec_featurize.py @@ -0,0 +1,249 @@ +#!/usr/bin/env python3 +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. 
+ +""" +Helper script to pre-compute embeddings for a flashlight (previously called wav2letter++) dataset +""" + +import argparse +import glob +import os +from shutil import copy + +import h5py +import numpy as np +import soundfile as sf +import torch +import tqdm +import fairseq +from torch import nn + + +def read_audio(fname): + """ Load an audio file and return PCM along with the sample rate """ + + wav, sr = sf.read(fname) + assert sr == 16e3 + + return wav, 16e3 + + +class PretrainedWav2VecModel(nn.Module): + def __init__(self, fname): + super().__init__() + + model, cfg, task = fairseq.checkpoint_utils.load_model_ensemble_and_task([fname]) + model = model[0] + model.eval() + + self.model = model + + def forward(self, x): + with torch.no_grad(): + z = self.model.feature_extractor(x) + if isinstance(z, tuple): + z = z[0] + c = self.model.feature_aggregator(z) + return z, c + + +class EmbeddingWriterConfig(argparse.ArgumentParser): + def __init__(self): + super().__init__("Pre-compute embeddings for flashlight datasets") + + kwargs = {"action": "store", "type": str, "required": True} + + self.add_argument("--input", "-i", help="Input Directory", **kwargs) + self.add_argument("--output", "-o", help="Output Directory", **kwargs) + self.add_argument("--model", help="Path to model checkpoint", **kwargs) + self.add_argument("--split", help="Dataset Splits", nargs="+", **kwargs) + self.add_argument( + "--ext", default="wav", required=False, help="Audio file extension" + ) + + self.add_argument( + "--no-copy-labels", + action="store_true", + help="Do not copy label files. Useful for large datasets, use --targetdir in flashlight then.", + ) + self.add_argument( + "--use-feat", + action="store_true", + help="Use the feature vector ('z') instead of context vector ('c') for features", + ) + self.add_argument("--gpu", help="GPU to use", default=0, type=int) + + +class Prediction: + """ Lightweight wrapper around a fairspeech embedding model """ + + def __init__(self, fname, gpu=0): + self.gpu = gpu + self.model = PretrainedWav2VecModel(fname).cuda(gpu) + + def __call__(self, x): + x = torch.from_numpy(x).float().cuda(self.gpu) + with torch.no_grad(): + z, c = self.model(x.unsqueeze(0)) + + return z.squeeze(0).cpu().numpy(), c.squeeze(0).cpu().numpy() + + +class H5Writer: + """ Write features as hdf5 file in flashlight compatible format """ + + def __init__(self, fname): + self.fname = fname + os.makedirs(os.path.dirname(self.fname), exist_ok=True) + + def write(self, data): + channel, T = data.shape + + with h5py.File(self.fname, "w") as out_ds: + data = data.T.flatten() + out_ds["features"] = data + out_ds["info"] = np.array([16e3 // 160, T, channel]) + + +class EmbeddingDatasetWriter(object): + """Given a model and a flashlight dataset, pre-compute and store embeddings + + Args: + input_root, str : + Path to the flashlight dataset + output_root, str : + Desired output directory. 
Will be created if non-existent + split, str : + Dataset split + """ + + def __init__( + self, + input_root, + output_root, + split, + model_fname, + extension="wav", + gpu=0, + verbose=False, + use_feat=False, + ): + + assert os.path.exists(model_fname) + + self.model_fname = model_fname + self.model = Prediction(self.model_fname, gpu) + + self.input_root = input_root + self.output_root = output_root + self.split = split + self.verbose = verbose + self.extension = extension + self.use_feat = use_feat + + assert os.path.exists(self.input_path), "Input path '{}' does not exist".format( + self.input_path + ) + + def _progress(self, iterable, **kwargs): + if self.verbose: + return tqdm.tqdm(iterable, **kwargs) + return iterable + + def require_output_path(self, fname=None): + path = self.get_output_path(fname) + os.makedirs(path, exist_ok=True) + + @property + def input_path(self): + return self.get_input_path() + + @property + def output_path(self): + return self.get_output_path() + + def get_input_path(self, fname=None): + if fname is None: + return os.path.join(self.input_root, self.split) + return os.path.join(self.get_input_path(), fname) + + def get_output_path(self, fname=None): + if fname is None: + return os.path.join(self.output_root, self.split) + return os.path.join(self.get_output_path(), fname) + + def copy_labels(self): + self.require_output_path() + + labels = list( + filter( + lambda x: self.extension not in x, glob.glob(self.get_input_path("*")) + ) + ) + for fname in tqdm.tqdm(labels): + copy(fname, self.output_path) + + @property + def input_fnames(self): + return sorted(glob.glob(self.get_input_path("*.{}".format(self.extension)))) + + def __len__(self): + return len(self.input_fnames) + + def write_features(self): + + paths = self.input_fnames + + fnames_context = map( + lambda x: os.path.join( + self.output_path, x.replace("." + self.extension, ".h5context") + ), + map(os.path.basename, paths), + ) + + for name, target_fname in self._progress( + zip(paths, fnames_context), total=len(self) + ): + wav, sr = read_audio(name) + z, c = self.model(wav) + feat = z if self.use_feat else c + writer = H5Writer(target_fname) + writer.write(feat) + + def __repr__(self): + + return "EmbeddingDatasetWriter ({n_files} files)\n\tinput:\t{input_root}\n\toutput:\t{output_root}\n\tsplit:\t{split})".format( + n_files=len(self), **self.__dict__ + ) + + +if __name__ == "__main__": + + args = EmbeddingWriterConfig().parse_args() + + for split in args.split: + + writer = EmbeddingDatasetWriter( + input_root=args.input, + output_root=args.output, + split=split, + model_fname=args.model, + gpu=args.gpu, + extension=args.ext, + use_feat=args.use_feat, + ) + + print(writer) + writer.require_output_path() + + print("Writing Features...") + writer.write_features() + print("Done.") + + if not args.no_copy_labels: + print("Copying label data...") + writer.copy_labels() + print("Done.") diff --git a/SpeechT5/fairseq/examples/wav2vec/wav2vec_manifest.py b/SpeechT5/fairseq/examples/wav2vec/wav2vec_manifest.py new file mode 100644 index 0000000000000000000000000000000000000000..9b8aa180e88d9ee98bdca7089aed5046ec0d9cb9 --- /dev/null +++ b/SpeechT5/fairseq/examples/wav2vec/wav2vec_manifest.py @@ -0,0 +1,87 @@ +#!/usr/bin/env python3 +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. +""" +Data pre-processing: build vocabularies and binarize training data. 
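+
+Example invocation (illustrative; the audio root and destination paths are
+placeholders):
+
+    python wav2vec_manifest.py /path/to/audio --dest /path/to/manifest \
+        --ext flac --valid-percent 0.01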
+""" + +import argparse +import glob +import os +import random + +import soundfile + + +def get_parser(): + parser = argparse.ArgumentParser() + parser.add_argument( + "root", metavar="DIR", help="root directory containing flac files to index" + ) + parser.add_argument( + "--valid-percent", + default=0.01, + type=float, + metavar="D", + help="percentage of data to use as validation set (between 0 and 1)", + ) + parser.add_argument( + "--dest", default=".", type=str, metavar="DIR", help="output directory" + ) + parser.add_argument( + "--ext", default="flac", type=str, metavar="EXT", help="extension to look for" + ) + parser.add_argument("--seed", default=42, type=int, metavar="N", help="random seed") + parser.add_argument( + "--path-must-contain", + default=None, + type=str, + metavar="FRAG", + help="if set, path must contain this substring for a file to be included in the manifest", + ) + return parser + + +def main(args): + assert args.valid_percent >= 0 and args.valid_percent <= 1.0 + + if not os.path.exists(args.dest): + os.makedirs(args.dest) + + dir_path = os.path.realpath(args.root) + search_path = os.path.join(dir_path, "**/*." + args.ext) + rand = random.Random(args.seed) + + valid_f = ( + open(os.path.join(args.dest, "valid.tsv"), "w") + if args.valid_percent > 0 + else None + ) + + with open(os.path.join(args.dest, "train.tsv"), "w") as train_f: + print(dir_path, file=train_f) + + if valid_f is not None: + print(dir_path, file=valid_f) + + for fname in glob.iglob(search_path, recursive=True): + file_path = os.path.realpath(fname) + + if args.path_must_contain and args.path_must_contain not in file_path: + continue + + frames = soundfile.info(fname).frames + dest = train_f if rand.random() > args.valid_percent else valid_f + print( + "{}\t{}".format(os.path.relpath(file_path, dir_path), frames), file=dest + ) + if valid_f is not None: + valid_f.close() + + +if __name__ == "__main__": + parser = get_parser() + args = parser.parse_args() + main(args) diff --git a/SpeechT5/fairseq/examples/wmt19/README.md b/SpeechT5/fairseq/examples/wmt19/README.md new file mode 100644 index 0000000000000000000000000000000000000000..5c90d0e6c4ae8d043ca622e70c5828dca6f9c2f2 --- /dev/null +++ b/SpeechT5/fairseq/examples/wmt19/README.md @@ -0,0 +1,85 @@ +# WMT 19 + +This page provides pointers to the models of Facebook-FAIR's WMT'19 news translation task submission [(Ng et al., 2019)](https://arxiv.org/abs/1907.06616). 
+ +## Pre-trained models + +Model | Description | Download +---|---|--- +`transformer.wmt19.en-de` | En->De Ensemble | [download (.tar.gz)](https://dl.fbaipublicfiles.com/fairseq/models/wmt19.en-de.joined-dict.ensemble.tar.gz) +`transformer.wmt19.de-en` | De->En Ensemble | [download (.tar.gz)](https://dl.fbaipublicfiles.com/fairseq/models/wmt19.de-en.joined-dict.ensemble.tar.gz) +`transformer.wmt19.en-ru` | En->Ru Ensemble | [download (.tar.gz)](https://dl.fbaipublicfiles.com/fairseq/models/wmt19.en-ru.ensemble.tar.gz) +`transformer.wmt19.ru-en` | Ru->En Ensemble | [download (.tar.gz)](https://dl.fbaipublicfiles.com/fairseq/models/wmt19.ru-en.ensemble.tar.gz) +`transformer_lm.wmt19.en` | En Language Model | [download (.tar.gz)](https://dl.fbaipublicfiles.com/fairseq/models/lm/wmt19.en.tar.gz) +`transformer_lm.wmt19.de` | De Language Model | [download (.tar.gz)](https://dl.fbaipublicfiles.com/fairseq/models/lm/wmt19.de.tar.gz) +`transformer_lm.wmt19.ru` | Ru Language Model | [download (.tar.gz)](https://dl.fbaipublicfiles.com/fairseq/models/lm/wmt19.ru.tar.gz) + +## Pre-trained single models before finetuning + +Model | Description | Download +---|---|--- +`transformer.wmt19.en-de` | En->De Single, no finetuning | [download (.tar.gz)](https://dl.fbaipublicfiles.com/fairseq/models/wmt19.en-de.ffn8192.tar.gz) +`transformer.wmt19.de-en` | De->En Single, no finetuning | [download (.tar.gz)](https://dl.fbaipublicfiles.com/fairseq/models/wmt19.de-en.ffn8192.tar.gz) +`transformer.wmt19.en-ru` | En->Ru Single, no finetuning | [download (.tar.gz)](https://dl.fbaipublicfiles.com/fairseq/models/wmt19.en-ru.ffn8192.tar.gz) +`transformer.wmt19.ru-en` | Ru->En Single, no finetuning | [download (.tar.gz)](https://dl.fbaipublicfiles.com/fairseq/models/wmt19.ru-en.ffn8192.tar.gz) + +## Example usage (torch.hub) + +#### Requirements + +We require a few additional Python dependencies for preprocessing: +```bash +pip install fastBPE sacremoses +``` + +#### Translation + +```python +import torch + +# English to German translation +en2de = torch.hub.load('pytorch/fairseq', 'transformer.wmt19.en-de', checkpoint_file='model1.pt:model2.pt:model3.pt:model4.pt', + tokenizer='moses', bpe='fastbpe') +en2de.translate("Machine learning is great!") # 'Maschinelles Lernen ist großartig!' + +# German to English translation +de2en = torch.hub.load('pytorch/fairseq', 'transformer.wmt19.de-en', checkpoint_file='model1.pt:model2.pt:model3.pt:model4.pt', + tokenizer='moses', bpe='fastbpe') +de2en.translate("Maschinelles Lernen ist großartig!") # 'Machine learning is great!' + +# English to Russian translation +en2ru = torch.hub.load('pytorch/fairseq', 'transformer.wmt19.en-ru', checkpoint_file='model1.pt:model2.pt:model3.pt:model4.pt', + tokenizer='moses', bpe='fastbpe') +en2ru.translate("Machine learning is great!") # 'Машинное обучение - это здорово!' + +# Russian to English translation +ru2en = torch.hub.load('pytorch/fairseq', 'transformer.wmt19.ru-en', checkpoint_file='model1.pt:model2.pt:model3.pt:model4.pt', + tokenizer='moses', bpe='fastbpe') +ru2en.translate("Машинное обучение - это здорово!") # 'Machine learning is great!' +``` + +#### Language Modeling + +```python +# Sample from the English LM +en_lm = torch.hub.load('pytorch/fairseq', 'transformer_lm.wmt19.en', tokenizer='moses', bpe='fastbpe') +en_lm.sample("Machine learning is") # 'Machine learning is the future of computing, says Microsoft boss Satya Nadella ...' 
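+
+# The LM hub interface can also score text. Illustrative sketch (assumes the
+# standard fairseq hub interface): `positional_scores` holds per-token
+# log-probabilities, so the expression below converts them into a perplexity.
+ppl = en_lm.score("Machine learning is great!")['positional_scores'].mean().neg().exp()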
+ +# Sample from the German LM +de_lm = torch.hub.load('pytorch/fairseq', 'transformer_lm.wmt19.de', tokenizer='moses', bpe='fastbpe') +de_lm.sample("Maschinelles lernen ist") # 'Maschinelles lernen ist das A und O (neues-deutschland.de) Die Arbeitsbedingungen für Lehrerinnen und Lehrer sind seit Jahren verbesserungswürdig ...' + +# Sample from the Russian LM +ru_lm = torch.hub.load('pytorch/fairseq', 'transformer_lm.wmt19.ru', tokenizer='moses', bpe='fastbpe') +ru_lm.sample("машинное обучение это") # 'машинное обучение это то, что мы называем "искусственным интеллектом".' +``` + +## Citation +```bibtex +@inproceedings{ng2019facebook}, + title = {Facebook FAIR's WMT19 News Translation Task Submission}, + author = {Ng, Nathan and Yee, Kyra and Baevski, Alexei and Ott, Myle and Auli, Michael and Edunov, Sergey}, + booktitle = {Proc. of WMT}, + year = 2019, +} +``` diff --git a/SpeechT5/fairseq/examples/wmt20/README.md b/SpeechT5/fairseq/examples/wmt20/README.md new file mode 100644 index 0000000000000000000000000000000000000000..b4f2874652f8be19998a65faa1d9276d8017ec59 --- /dev/null +++ b/SpeechT5/fairseq/examples/wmt20/README.md @@ -0,0 +1,72 @@ +# WMT 20 + +This page provides pointers to the models of Facebook-FAIR's WMT'20 news translation task submission [(Chen et al., 2020)](https://arxiv.org/abs/2011.08298). + +## Single best MT models (after finetuning on part of WMT20 news dev set) + +Model | Description | Download +---|---|--- +`transformer.wmt20.ta-en` | Ta->En | [download (.tar.gz)](https://dl.fbaipublicfiles.com/fairseq/models/wmt20.ta-en.single.tar.gz) +`transformer.wmt20.en-ta` | En->Ta | [download (.tar.gz)](https://dl.fbaipublicfiles.com/fairseq/models/wmt20.en-ta.single.tar.gz) +`transformer.wmt20.iu-en.news` | Iu->En (News domain) | [download (.tar.gz)](https://dl.fbaipublicfiles.com/fairseq/models/wmt20.iu-en.news.single.tar.gz) +`transformer.wmt20.en-iu.news` | En->Iu (News domain) | [download (.tar.gz)](https://dl.fbaipublicfiles.com/fairseq/models/wmt20.en-iu.news.single.tar.gz) +`transformer.wmt20.iu-en.nh` | Iu->En (Nunavut Hansard domain) | [download (.tar.gz)](https://dl.fbaipublicfiles.com/fairseq/models/wmt20.iu-en.nh.single.tar.gz) +`transformer.wmt20.en-iu.nh` | En->Iu (Nunavut Hansard domain) | [download (.tar.gz)](https://dl.fbaipublicfiles.com/fairseq/models/wmt20.en-iu.nh.single.tar.gz) + +## Language models +Model | Description | Download +---|---|--- +`transformer_lm.wmt20.en` | En Language Model | [download (.tar.gz)](https://dl.fbaipublicfiles.com/fairseq/models/wmt20.en.tar.gz) +`transformer_lm.wmt20.ta` | Ta Language Model | [download (.tar.gz)](https://dl.fbaipublicfiles.com/fairseq/models/wmt20.ta.tar.gz) +`transformer_lm.wmt20.iu.news` | Iu Language Model (News domain) | [download (.tar.gz)](https://dl.fbaipublicfiles.com/fairseq/models/wmt20.iu.news.tar.gz) +`transformer_lm.wmt20.iu.nh` | Iu Language Model (Nunavut Hansard domain) | [download (.tar.gz)](https://dl.fbaipublicfiles.com/fairseq/models/wmt20.iu.nh.tar.gz) + +## Example usage (torch.hub) + +#### Translation + +```python +import torch + +# English to Tamil translation +en2ta = torch.hub.load('pytorch/fairseq', 'transformer.wmt20.en-ta') +en2ta.translate("Machine learning is great!") # 'இயந்திரக் கற்றல் அருமை!' + +# Tamil to English translation +ta2en = torch.hub.load('pytorch/fairseq', 'transformer.wmt20.ta-en') +ta2en.translate("இயந்திரக் கற்றல் அருமை!") # 'Machine learning is great!' 
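+
+# Generation options can also be passed through the hub interface. Illustrative
+# sketch (assumes the standard fairseq hub interface); uncomment to try:
+# en2ta.cuda()                                            # run the model on GPU
+# en2ta.translate("Machine learning is great!", beam=10)  # wider beam search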
+ +# English to Inuktitut translation +en2iu = torch.hub.load('pytorch/fairseq', 'transformer.wmt20.en-iu.news') +en2iu.translate("machine learning is great!") # 'ᖃᒧᑕᐅᔭᓄᑦ ᐃᓕᓐᓂᐊᕐᓂᖅ ᐱᐅᔪᒻᒪᕆᒃ!' + +# Inuktitut to English translation +iu2en = torch.hub.load('pytorch/fairseq', 'transformer.wmt20.iu-en.news') +iu2en.translate("ᖃᒧᑕᐅᔭᓄᑦ ᐃᓕᓐᓂᐊᕐᓂᖅ ᐱᐅᔪᒻᒪᕆᒃ!") # 'Machine learning excellence!' +``` + +#### Language Modeling + +```python +# Sample from the English LM +en_lm = torch.hub.load('pytorch/fairseq', 'transformer_lm.wmt20.en') +en_lm.sample("Machine learning is") # 'Machine learning is a type of artificial intelligence that uses machine learning to learn from data and make predictions.' + +# Sample from the Tamil LM +ta_lm = torch.hub.load('pytorch/fairseq', 'transformer_lm.wmt20.ta') +ta_lm.sample("இயந்திரக் கற்றல் என்பது செயற்கை நுண்ணறிவின்") # 'இயந்திரக் கற்றல் என்பது செயற்கை நுண்ணறிவின் ஒரு பகுதியாகும்.' + +# Sample from the Inuktitut LM +iu_lm = torch.hub.load('pytorch/fairseq', 'transformer_lm.wmt20.iu.news') +iu_lm.sample("ᖃᒧᑕᐅᔭᓄᑦ ᐃᓕᓐᓂᐊᕐᓂᖅ") # 'ᖃᒧᑕᐅᔭᓄᑦ ᐃᓕᓐᓂᐊᕐᓂᖅ, ᐊᒻᒪᓗ ᓯᓚᐅᑉ ᐊᓯᙳᖅᐸᓪᓕᐊᓂᖓᓄᑦ ᖃᓄᐃᓕᐅᕈᑎᒃᓴᑦ, ᐃᓚᖃᖅᖢᑎᒃ ᐅᑯᓂᖓ:' +``` + +## Citation +```bibtex +@inproceedings{chen2020facebook + title={Facebook AI's WMT20 News Translation Task Submission}, + author={Peng-Jen Chen and Ann Lee and Changhan Wang and Naman Goyal and Angela Fan and Mary Williamson and Jiatao Gu}, + booktitle={Proc. of WMT}, + year={2020}, +} +``` diff --git a/SpeechT5/fairseq/examples/xlmr/README.md b/SpeechT5/fairseq/examples/xlmr/README.md new file mode 100644 index 0000000000000000000000000000000000000000..b95bfe15d3fe6d03951453679135c2e9187d73c7 --- /dev/null +++ b/SpeechT5/fairseq/examples/xlmr/README.md @@ -0,0 +1,144 @@ +# Unsupervised Cross-lingual Representation Learning at Scale (XLM-RoBERTa) +https://arxiv.org/pdf/1911.02116.pdf + +# Larger-Scale Transformers for Multilingual Masked Language Modeling +https://arxiv.org/pdf/2105.00572.pdf + + +## What's New: +- June 2021: `XLMR-XL` AND `XLMR-XXL` models released. + +## Introduction + +`XLM-R` (`XLM-RoBERTa`) is a generic cross lingual sentence encoder that obtains state-of-the-art results on many cross-lingual understanding (XLU) benchmarks. It is trained on `2.5T` of filtered CommonCrawl data in 100 languages (list below). 
+ + Language | Language|Language |Language | Language +---|---|---|---|--- +Afrikaans | Albanian | Amharic | Arabic | Armenian +Assamese | Azerbaijani | Basque | Belarusian | Bengali +Bengali Romanize | Bosnian | Breton | Bulgarian | Burmese +Burmese zawgyi font | Catalan | Chinese (Simplified) | Chinese (Traditional) | Croatian +Czech | Danish | Dutch | English | Esperanto +Estonian | Filipino | Finnish | French | Galician +Georgian | German | Greek | Gujarati | Hausa +Hebrew | Hindi | Hindi Romanize | Hungarian | Icelandic +Indonesian | Irish | Italian | Japanese | Javanese +Kannada | Kazakh | Khmer | Korean | Kurdish (Kurmanji) +Kyrgyz | Lao | Latin | Latvian | Lithuanian +Macedonian | Malagasy | Malay | Malayalam | Marathi +Mongolian | Nepali | Norwegian | Oriya | Oromo +Pashto | Persian | Polish | Portuguese | Punjabi +Romanian | Russian | Sanskrit | Scottish Gaelic | Serbian +Sindhi | Sinhala | Slovak | Slovenian | Somali +Spanish | Sundanese | Swahili | Swedish | Tamil +Tamil Romanize | Telugu | Telugu Romanize | Thai | Turkish +Ukrainian | Urdu | Urdu Romanize | Uyghur | Uzbek +Vietnamese | Welsh | Western Frisian | Xhosa | Yiddish + +## Pre-trained models + +Model | Description | #params | vocab size | Download +---|---|---|---|--- +`xlmr.base` | XLM-R using the BERT-base architecture | 250M | 250k | [xlm.base.tar.gz](https://dl.fbaipublicfiles.com/fairseq/models/xlmr.base.tar.gz) +`xlmr.large` | XLM-R using the BERT-large architecture | 560M | 250k | [xlm.large.tar.gz](https://dl.fbaipublicfiles.com/fairseq/models/xlmr.large.tar.gz) +`xlmr.xl` | XLM-R (`layers=36, model_dim=2560`) | 3.5B | 250k | [xlm.xl.tar.gz](https://dl.fbaipublicfiles.com/fairseq/models/xlmr/xlmr.xl.tar.gz) +`xlmr.xxl` | XLM-R (`layers=48, model_dim=4096`) | 10.7B | 250k | [xlm.xxl.tar.gz](https://dl.fbaipublicfiles.com/fairseq/models/xlmr/xlmr.xxl.tar.gz) + +## Results + +**[XNLI (Conneau et al., 2018)](https://arxiv.org/abs/1809.05053)** + +Model | average | en | fr | es | de | el | bg | ru | tr | ar | vi | th | zh | hi | sw | ur +---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|--- +`roberta.large.mnli` _(TRANSLATE-TEST)_ | 77.8 | 91.3 | 82.9 | 84.3 | 81.2 | 81.7 | 83.1 | 78.3 | 76.8 | 76.6 | 74.2 | 74.1 | 77.5 | 70.9 | 66.7 | 66.8 +`xlmr.large` _(TRANSLATE-TRAIN-ALL)_ | 83.6 | 89.1 | 85.1 | 86.6 | 85.7 | 85.3 | 85.9 | 83.5 | 83.2 | 83.1 | 83.7 | 81.5 | 83.7 | 81.6 | 78.0 | 78.1 +`xlmr.xl` _(TRANSLATE-TRAIN-ALL)_ | 85.4 | 91.1 | 87.2 | 88.1 | 87.0 | 87.4 | 87.8 | 85.3 | 85.2 | 85.3 | 86.2 | 83.8 | 85.3 | 83.1 | 79.8 | 78.2 | 85.4 +`xlmr.xxl` _(TRANSLATE-TRAIN-ALL)_ | 86.0 | 91.5 | 87.6 | 88.7 | 87.8 | 87.4 | 88.2 | 85.6 | 85.1 | 85.8 | 86.3 | 83.9 | 85.6 | 84.6 | 81.7 | 80.6 + +**[MLQA (Lewis et al., 2018)](https://arxiv.org/abs/1910.07475)** + +Model | average | en | es | de | ar | hi | vi | zh +---|---|---|---|---|---|---|---|--- +`BERT-large` | - | 80.2/67.4 | - | - | - | - | - | - +`mBERT` | 57.7 / 41.6 | 77.7 / 65.2 | 64.3 / 46.6 | 57.9 / 44.3 | 45.7 / 29.8| 43.8 / 29.7 | 57.1 / 38.6 | 57.5 / 37.3 +`xlmr.large` | 70.7 / 52.7 | 80.6 / 67.8 | 74.1 / 56.0 | 68.5 / 53.6 | 63.1 / 43.5 | 69.2 / 51.6 | 71.3 / 50.9 | 68.0 / 45.4 +`xlmr.xl` | 73.4 / 55.3 | 85.1 / 72.6 | 66.7 / 46.2 | 70.5 / 55.5 | 74.3 / 56.9 | 72.2 / 54.7 | 74.4 / 52.9 | 70.9 / 48.5 +`xlmr.xxl` | 74.8 / 56.6 | 85.5 / 72.4 | 68.6 / 48.4 | 72.7 / 57.8 | 75.4 / 57.6 | 73.7 / 55.8 | 76.0 / 55.0 | 71.7 / 48.9 + + +## Example usage + +##### Load XLM-R from torch.hub (PyTorch >= 1.1): +```python +import torch +xlmr = 
torch.hub.load('pytorch/fairseq', 'xlmr.large') +xlmr.eval() # disable dropout (or leave in train mode to finetune) +``` + +##### Load XLM-R (for PyTorch 1.0 or custom models): +```python +# Download xlmr.large model +wget https://dl.fbaipublicfiles.com/fairseq/models/xlmr.large.tar.gz +tar -xzvf xlmr.large.tar.gz + +# Load the model in fairseq +from fairseq.models.roberta import XLMRModel +xlmr = XLMRModel.from_pretrained('/path/to/xlmr.large', checkpoint_file='model.pt') +xlmr.eval() # disable dropout (or leave in train mode to finetune) +``` + +##### Apply sentence-piece-model (SPM) encoding to input text: +```python +en_tokens = xlmr.encode('Hello world!') +assert en_tokens.tolist() == [0, 35378, 8999, 38, 2] +xlmr.decode(en_tokens) # 'Hello world!' + +zh_tokens = xlmr.encode('你好,世界') +assert zh_tokens.tolist() == [0, 6, 124084, 4, 3221, 2] +xlmr.decode(zh_tokens) # '你好,世界' + +hi_tokens = xlmr.encode('नमस्ते दुनिया') +assert hi_tokens.tolist() == [0, 68700, 97883, 29405, 2] +xlmr.decode(hi_tokens) # 'नमस्ते दुनिया' + +ar_tokens = xlmr.encode('مرحبا بالعالم') +assert ar_tokens.tolist() == [0, 665, 193478, 258, 1705, 77796, 2] +xlmr.decode(ar_tokens) # 'مرحبا بالعالم' + +fr_tokens = xlmr.encode('Bonjour le monde') +assert fr_tokens.tolist() == [0, 84602, 95, 11146, 2] +xlmr.decode(fr_tokens) # 'Bonjour le monde' +``` + +##### Extract features from XLM-R: +```python +# Extract the last layer's features +last_layer_features = xlmr.extract_features(zh_tokens) +assert last_layer_features.size() == torch.Size([1, 6, 1024]) + +# Extract all layer's features (layer 0 is the embedding layer) +all_layers = xlmr.extract_features(zh_tokens, return_all_hiddens=True) +assert len(all_layers) == 25 +assert torch.all(all_layers[-1] == last_layer_features) +``` + +## Citation + +```bibtex +@article{conneau2019unsupervised, + title={Unsupervised Cross-lingual Representation Learning at Scale}, + author={Conneau, Alexis and Khandelwal, Kartikay and Goyal, Naman and Chaudhary, Vishrav and Wenzek, Guillaume and Guzm{\'a}n, Francisco and Grave, Edouard and Ott, Myle and Zettlemoyer, Luke and Stoyanov, Veselin}, + journal={arXiv preprint arXiv:1911.02116}, + year={2019} +} +``` + + +```bibtex +@article{goyal2021larger, + title={Larger-Scale Transformers for Multilingual Masked Language Modeling}, + author={Goyal, Naman and Du, Jingfei and Ott, Myle and Anantharaman, Giri and Conneau, Alexis}, + journal={arXiv preprint arXiv:2105.00572}, + year={2021} +} +``` diff --git a/SpeechT5/fairseq/fairseq/__init__.py b/SpeechT5/fairseq/fairseq/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..dc9fd1886d55756b5bdfeccf1ad329bd419a706e --- /dev/null +++ b/SpeechT5/fairseq/fairseq/__init__.py @@ -0,0 +1,44 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. 
+"""isort:skip_file""" + +import os +import sys + +try: + from .version import __version__ # noqa +except ImportError: + version_txt = os.path.join(os.path.dirname(__file__), "version.txt") + with open(version_txt) as f: + __version__ = f.read().strip() + +__all__ = ["pdb"] + +# backwards compatibility to support `from fairseq.X import Y` +from fairseq.distributed import utils as distributed_utils +from fairseq.logging import meters, metrics, progress_bar # noqa + +sys.modules["fairseq.distributed_utils"] = distributed_utils +sys.modules["fairseq.meters"] = meters +sys.modules["fairseq.metrics"] = metrics +sys.modules["fairseq.progress_bar"] = progress_bar + +# initialize hydra +from fairseq.dataclass.initialize import hydra_init +hydra_init() + +import fairseq.criterions # noqa +import fairseq.distributed # noqa +import fairseq.models # noqa +import fairseq.modules # noqa +import fairseq.optim # noqa +import fairseq.optim.lr_scheduler # noqa +import fairseq.pdb # noqa +import fairseq.scoring # noqa +import fairseq.tasks # noqa +import fairseq.token_generation_constraints # noqa + +import fairseq.benchmark # noqa +import fairseq.model_parallel # noqa diff --git a/SpeechT5/fairseq/fairseq/benchmark/__init__.py b/SpeechT5/fairseq/fairseq/benchmark/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..0317d5c623778fe40b7bf07b77769cd10c243244 --- /dev/null +++ b/SpeechT5/fairseq/fairseq/benchmark/__init__.py @@ -0,0 +1,7 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +# import models/tasks to register them +from . import dummy_dataset, dummy_lm, dummy_masked_lm, dummy_model, dummy_mt # noqa diff --git a/SpeechT5/fairseq/fairseq/benchmark/dummy_dataset.py b/SpeechT5/fairseq/fairseq/benchmark/dummy_dataset.py new file mode 100644 index 0000000000000000000000000000000000000000..2f051754af55966e26850e94c121e0ff439bfd28 --- /dev/null +++ b/SpeechT5/fairseq/fairseq/benchmark/dummy_dataset.py @@ -0,0 +1,36 @@ +import numpy as np +from fairseq.data import FairseqDataset + + +class DummyDataset(FairseqDataset): + def __init__(self, batch, num_items, item_size): + super().__init__() + self.batch = batch + self.num_items = num_items + self.item_size = item_size + + def __getitem__(self, index): + return index + + def __len__(self): + return self.num_items + + def collater(self, samples): + return self.batch + + @property + def sizes(self): + return np.array([self.item_size] * self.num_items) + + def num_tokens(self, index): + return self.item_size + + def size(self, index): + return self.item_size + + def ordered_indices(self): + return np.arange(self.num_items) + + @property + def supports_prefetch(self): + return False diff --git a/SpeechT5/fairseq/fairseq/benchmark/dummy_lm.py b/SpeechT5/fairseq/fairseq/benchmark/dummy_lm.py new file mode 100644 index 0000000000000000000000000000000000000000..c6246a0c0e338fa36244b3aa4fb57f189fbffcb6 --- /dev/null +++ b/SpeechT5/fairseq/fairseq/benchmark/dummy_lm.py @@ -0,0 +1,83 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. 
+ +import logging +from dataclasses import dataclass, field +from typing import Optional + +import torch +from .dummy_dataset import DummyDataset +from fairseq.data import Dictionary +from fairseq.dataclass import FairseqDataclass +from fairseq.tasks import FairseqTask, register_task +from omegaconf import II + + +logger = logging.getLogger(__name__) + + +@dataclass +class DummyLMConfig(FairseqDataclass): + dict_size: int = 49996 + dataset_size: int = 100000 + tokens_per_sample: int = field( + default=512, metadata={"help": "max sequence length"} + ) + add_bos_token: bool = False + batch_size: Optional[int] = II("dataset.batch_size") + max_tokens: Optional[int] = II("dataset.max_tokens") + max_target_positions: int = II("task.tokens_per_sample") + + +@register_task("dummy_lm", dataclass=DummyLMConfig) +class DummyLMTask(FairseqTask): + def __init__(self, cfg: DummyLMConfig): + super().__init__(cfg) + + # load dictionary + self.dictionary = Dictionary() + for i in range(cfg.dict_size): + self.dictionary.add_symbol("word{}".format(i)) + self.dictionary.pad_to_multiple_(8) # often faster if divisible by 8 + logger.info("dictionary: {} types".format(len(self.dictionary))) + + seq = torch.arange(cfg.tokens_per_sample + 1) + self.dictionary.pad() + 1 + + self.dummy_src = seq[:-1] + self.dummy_tgt = seq[1:] + + def load_dataset(self, split, epoch=1, combine=False, **kwargs): + """Load a given dataset split. + Args: + split (str): name of the split (e.g., train, valid, test) + """ + if self.cfg.batch_size is not None: + bsz = self.cfg.batch_size + else: + bsz = max(1, self.cfg.max_tokens // self.cfg.tokens_per_sample) + self.datasets[split] = DummyDataset( + { + "id": 1, + "net_input": { + "src_tokens": torch.stack([self.dummy_src for _ in range(bsz)]), + "src_lengths": torch.full( + (bsz,), self.cfg.tokens_per_sample, dtype=torch.long + ), + }, + "target": torch.stack([self.dummy_tgt for _ in range(bsz)]), + "nsentences": bsz, + "ntokens": bsz * self.cfg.tokens_per_sample, + }, + num_items=self.cfg.dataset_size, + item_size=self.cfg.tokens_per_sample, + ) + + @property + def source_dictionary(self): + return self.dictionary + + @property + def target_dictionary(self): + return self.dictionary diff --git a/SpeechT5/fairseq/fairseq/benchmark/dummy_masked_lm.py b/SpeechT5/fairseq/fairseq/benchmark/dummy_masked_lm.py new file mode 100644 index 0000000000000000000000000000000000000000..12b9c5d0f55993bf8750564882a351fc3f8055f0 --- /dev/null +++ b/SpeechT5/fairseq/fairseq/benchmark/dummy_masked_lm.py @@ -0,0 +1,94 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. 
+ +import logging +from dataclasses import dataclass, field +from typing import Optional + +import torch +from omegaconf import II + +from .dummy_dataset import DummyDataset +from fairseq.data import Dictionary +from fairseq.dataclass import FairseqDataclass +from fairseq.tasks import FairseqTask, register_task + +logger = logging.getLogger(__name__) + + +@dataclass +class DummyMaskedLMConfig(FairseqDataclass): + dict_size: int = 49996 + dataset_size: int = 100000 + tokens_per_sample: int = field( + default=512, + metadata={ + "help": "max number of total tokens over all" + " segments per sample for BERT dataset" + }, + ) + batch_size: Optional[int] = II("dataset.batch_size") + max_tokens: Optional[int] = II("dataset.max_tokens") + max_target_positions: int = II("task.tokens_per_sample") + + +@register_task("dummy_masked_lm", dataclass=DummyMaskedLMConfig) +class DummyMaskedLMTask(FairseqTask): + def __init__(self, cfg: DummyMaskedLMConfig): + super().__init__(cfg) + + self.dictionary = Dictionary() + for i in range(cfg.dict_size): + self.dictionary.add_symbol("word{}".format(i)) + logger.info("dictionary: {} types".format(len(self.dictionary))) + # add mask token + self.mask_idx = self.dictionary.add_symbol("<mask>") + self.dictionary.pad_to_multiple_(8) # often faster if divisible by 8 + + mask_idx = 0 + pad_idx = 1 + seq = torch.arange(cfg.tokens_per_sample) + pad_idx + 1 + mask = torch.arange(2, cfg.tokens_per_sample, 7) # ~15% + src = seq.clone() + src[mask] = mask_idx + tgt = torch.full_like(seq, pad_idx) + tgt[mask] = seq[mask] + + self.dummy_src = src + self.dummy_tgt = tgt + + def load_dataset(self, split, epoch=1, combine=False, **kwargs): + """Load a given dataset split. + Args: + split (str): name of the split (e.g., train, valid, test) + """ + if self.cfg.batch_size is not None: + bsz = self.cfg.batch_size + else: + bsz = max(1, self.cfg.max_tokens // self.cfg.tokens_per_sample) + self.datasets[split] = DummyDataset( + { + "id": 1, + "net_input": { + "src_tokens": torch.stack([self.dummy_src for _ in range(bsz)]), + "src_lengths": torch.full( + (bsz,), self.cfg.tokens_per_sample, dtype=torch.long + ), + }, + "target": torch.stack([self.dummy_tgt for _ in range(bsz)]), + "nsentences": bsz, + "ntokens": bsz * self.cfg.tokens_per_sample, + }, + num_items=self.cfg.dataset_size, + item_size=self.cfg.tokens_per_sample, + ) + + @property + def source_dictionary(self): + return self.dictionary + + @property + def target_dictionary(self): + return self.dictionary diff --git a/SpeechT5/fairseq/fairseq/benchmark/dummy_model.py b/SpeechT5/fairseq/fairseq/benchmark/dummy_model.py new file mode 100644 index 0000000000000000000000000000000000000000..ff26e4fe655d8e8d7f9942c4bd3df7cd267405fb --- /dev/null +++ b/SpeechT5/fairseq/fairseq/benchmark/dummy_model.py @@ -0,0 +1,96 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. 
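+
+# Benchmark-only "dummy_model": stacks of LayerNorm + Linear layers sized like a
+# Transformer's attention projections and feed-forward blocks, but with the
+# actual self-attention skipped, so it mainly exercises the dense compute path.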
+ +import torch.nn as nn +import torch.nn.functional as F +from fairseq.data import Dictionary +from fairseq.models import ( + FairseqDecoder, + FairseqLanguageModel, + register_model, + register_model_architecture, +) + + +@register_model("dummy_model") +class DummyModel(FairseqLanguageModel): + def __init__(self, args, encoder): + super().__init__(encoder) + self.args = args + + @staticmethod + def add_args(parser): + parser.add_argument("--num-layers", type=int, default=24) + parser.add_argument("--embed-dim", type=int, default=1024) + + @classmethod + def build_model(cls, args, task): + encoder = DummyEncoder( + num_embed=len(task.target_dictionary), + embed_dim=args.embed_dim, + num_layers=args.num_layers, + ) + return cls(args, encoder) + + def forward(self, src_tokens, masked_tokens=None, **kwargs): + return self.decoder(src_tokens, masked_tokens=masked_tokens) + + +class DummyEncoder(FairseqDecoder): + def __init__(self, num_embed=50000, embed_dim=1024, num_layers=24): + super().__init__(Dictionary()) + self.embed = nn.Embedding( + num_embeddings=num_embed, embedding_dim=embed_dim, padding_idx=0 + ) + self.layers_a = nn.ModuleList( + [ + nn.Sequential( + nn.LayerNorm(embed_dim), + nn.Linear(embed_dim, 3 * embed_dim), # q, k, v input projection + nn.Linear(3 * embed_dim, embed_dim), # skip self-attention + nn.Linear(embed_dim, embed_dim), # output projection + nn.Dropout(), + ) + for i in range(num_layers) + ] + ) + self.layers_b = nn.ModuleList( + [ + nn.Sequential( + nn.LayerNorm(embed_dim), + nn.Linear(embed_dim, 4 * embed_dim), # FFN + nn.ReLU(), + nn.Linear(4 * embed_dim, embed_dim), # FFN + nn.Dropout(0.1), + ) + for i in range(num_layers) + ] + ) + self.out_proj = nn.Linear(embed_dim, num_embed) + + def forward(self, tokens, masked_tokens=None): + x = self.embed(tokens) + for layer_a, layer_b in zip(self.layers_a, self.layers_b): + x = x + layer_a(x) + x = x + layer_b(x) + x = self.out_proj(x) + if masked_tokens is not None: + x = x[masked_tokens] + return (x,) + + def max_positions(self): + return 1024 + + def get_normalized_probs(self, net_output, log_probs, sample=None): + logits = net_output[0].float() + if log_probs: + return F.log_softmax(logits, dim=-1) + else: + return F.softmax(logits, dim=-1) + + +@register_model_architecture("dummy_model", "dummy_model") +def base_architecture(args): + pass diff --git a/SpeechT5/fairseq/fairseq/benchmark/dummy_mt.py b/SpeechT5/fairseq/fairseq/benchmark/dummy_mt.py new file mode 100644 index 0000000000000000000000000000000000000000..4ca7be93a38d8d2b47685b74b4f8b8f9dcb03d2e --- /dev/null +++ b/SpeechT5/fairseq/fairseq/benchmark/dummy_mt.py @@ -0,0 +1,119 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. 
+ +import logging + +import numpy as np +import torch +from fairseq.data import Dictionary, FairseqDataset +from fairseq.tasks import LegacyFairseqTask, register_task + + +logger = logging.getLogger(__name__) + + +@register_task("dummy_mt") +class DummyMTTask(LegacyFairseqTask): + @staticmethod + def add_args(parser): + """Add task-specific arguments to the parser.""" + parser.add_argument("--dict-size", default=49996, type=int) + parser.add_argument("--dataset-size", default=100000, type=int) + parser.add_argument("--src-len", default=30, type=int) + parser.add_argument("--tgt-len", default=30, type=int) + + def __init__(self, args, dictionary): + super().__init__(args) + self.dictionary = dictionary + self.seed = args.seed + + dictionary.pad_to_multiple_(8) # often faster if divisible by 8 + + self.dummy_src = torch.arange(args.src_len + 1) + dictionary.pad() + 1 + self.dummy_tgt = torch.arange(args.tgt_len + 1) + dictionary.pad() + 1 + + @classmethod + def setup_task(cls, args, **kwargs): + """Setup the task. """ + dictionary = Dictionary() + for i in range(args.dict_size): + dictionary.add_symbol("word{}".format(i)) + logger.info("dictionary: {} types".format(len(dictionary))) + + args.max_source_positions = args.src_len + dictionary.pad() + 2 + args.max_target_positions = args.tgt_len + dictionary.pad() + 2 + + return cls(args, dictionary) + + def load_dataset(self, split, epoch=1, combine=False, **kwargs): + """Load a given dataset split. + Args: + split (str): name of the split (e.g., train, valid, test) + """ + item_size = max(self.args.src_len, self.args.tgt_len) + if self.args.batch_size is not None: + bsz = self.args.batch_size + else: + bsz = max(1, self.args.max_tokens // item_size) + tgt = torch.stack([self.dummy_tgt for _ in range(bsz)]) + self.datasets[split] = DummyDataset( + { + "id": 1, + "net_input": { + "src_tokens": torch.stack([self.dummy_src for _ in range(bsz)]), + "src_lengths": torch.full( + (bsz,), self.args.src_len, dtype=torch.long + ), + "prev_output_tokens": tgt.clone(), + }, + "target": tgt, + "nsentences": bsz, + "ntokens": bsz * self.args.tgt_len, + }, + num_items=self.args.dataset_size, + item_size=item_size, + ) + + @property + def source_dictionary(self): + return self.dictionary + + @property + def target_dictionary(self): + return self.dictionary + + +class DummyDataset(FairseqDataset): + def __init__(self, batch, num_items, item_size): + super().__init__() + self.batch = batch + self.num_items = num_items + self.item_size = item_size + + def __getitem__(self, index): + return index + + def __len__(self): + return self.num_items + + def collater(self, samples): + return self.batch + + @property + def sizes(self): + return np.array([self.item_size] * self.num_items) + + def num_tokens(self, index): + return self.item_size + + def size(self, index): + return self.item_size + + def ordered_indices(self): + return np.arange(self.num_items) + + @property + def supports_prefetch(self): + return False diff --git a/SpeechT5/fairseq/fairseq/binarizer.py b/SpeechT5/fairseq/fairseq/binarizer.py new file mode 100644 index 0000000000000000000000000000000000000000..18ae67bf25868095e101e7068962c78ee5d12aca --- /dev/null +++ b/SpeechT5/fairseq/fairseq/binarizer.py @@ -0,0 +1,114 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. 
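# A minimal sketch of how the DummyDataset above behaves (assumes fairseq is importable
# for the FairseqDataset base class; the batch dict is abbreviated). The dataset exists
# purely for benchmarking: collater() ignores the sampled indices and always returns the
# one pre-built batch, so data loading is effectively free and throughput numbers measure
# the model rather than the input pipeline.
batch = {"id": 1, "ntokens": 8 * 30}      # stand-in for the full batch dict built above
ds = DummyDataset(batch, num_items=100000, item_size=30)
assert len(ds) == 100000
assert ds.collater([0, 1, 2]) is batch    # the identical object for every batch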
+ +import os +from collections import Counter + +import torch +from fairseq.file_io import PathManager +from fairseq.tokenizer import tokenize_line +from typing import List, Dict + + +def safe_readline(f): + pos = f.tell() + while True: + try: + return f.readline() + except UnicodeDecodeError: + pos -= 1 + f.seek(pos) # search where this character begins + + +class Binarizer: + @staticmethod + def binarize( + filename, + dict, + consumer, + tokenize=tokenize_line, + append_eos=True, + reverse_order=False, + offset=0, + end=-1, + already_numberized=False, + ) -> Dict[str, int]: + nseq, ntok = 0, 0 + replaced = Counter() + + def replaced_consumer(word, idx): + if idx == dict.unk_index and word != dict.unk_word: + replaced.update([word]) + + with open(PathManager.get_local_path(filename), "r", encoding="utf-8") as f: + f.seek(offset) + # next(f) breaks f.tell(), hence readline() must be used + line = safe_readline(f) + while line: + # f.tell() does not always give the byte position in the file + # sometimes it skips to a very large number + # it is unlikely that through a normal read we go from + # end bytes to end + 2**32 bytes (4 GB) and this makes it unlikely + # that the procedure breaks by the undeterministic behavior of + # f.tell() + if end > 0 and f.tell() > end and f.tell() < end + 2 ** 32: + break + if already_numberized: + id_strings = line.strip().split() + id_list = [int(id_string) for id_string in id_strings] + if reverse_order: + id_list.reverse() + if append_eos: + id_list.append(dict.eos()) + ids = torch.IntTensor(id_list) + else: + ids = dict.encode_line( + line=line, + line_tokenizer=tokenize, + add_if_not_exist=False, + consumer=replaced_consumer, + append_eos=append_eos, + reverse_order=reverse_order, + ) + nseq += 1 + ntok += len(ids) + consumer(ids) + line = f.readline() + return { + "nseq": nseq, + "nunk": sum(replaced.values()), + "ntok": ntok, + "replaced": replaced, + } + + @staticmethod + def binarize_alignments( + filename, alignment_parser, consumer, offset=0, end=-1 + ) -> Dict[str, int]: + nseq = 0 + + with open(PathManager.get_local_path(filename), "r") as f: + f.seek(offset) + line = safe_readline(f) + while line: + if end > 0 and f.tell() > end: + break + ids = alignment_parser(line) + nseq += 1 + consumer(ids) + line = f.readline() + return {"nseq": nseq} + + @staticmethod + def find_offsets(filename, num_chunks) -> List[int]: + with open(PathManager.get_local_path(filename), "r", encoding="utf-8") as f: + size = os.fstat(f.fileno()).st_size + chunk_size = size // num_chunks + offsets = [0 for _ in range(num_chunks + 1)] + for i in range(1, num_chunks): + f.seek(chunk_size * i) + safe_readline(f) + offsets[i] = f.tell() + return offsets diff --git a/SpeechT5/fairseq/fairseq/checkpoint_utils.py b/SpeechT5/fairseq/fairseq/checkpoint_utils.py new file mode 100644 index 0000000000000000000000000000000000000000..627f14160d2e4040f4dfe4e793f0986f53d8d39b --- /dev/null +++ b/SpeechT5/fairseq/fairseq/checkpoint_utils.py @@ -0,0 +1,798 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. 
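# A minimal sketch of chunked binarization with the Binarizer above (assumes fairseq is
# installed; "corpus.txt" and "dict.txt" are hypothetical paths). find_offsets() returns
# num_chunks + 1 byte positions aligned to line starts (its final entry stays 0, which
# binarize() treats as "read to end of file"), so each chunk of the corpus can be encoded
# independently; the real preprocessing code hands the chunks to separate worker
# processes, while this sketch just loops over them sequentially.
from fairseq.data import Dictionary

d = Dictionary.load("dict.txt")
num_chunks = 4
offsets = Binarizer.find_offsets("corpus.txt", num_chunks)

encoded = [[] for _ in range(num_chunks)]
for chunk_id in range(num_chunks):
    stats = Binarizer.binarize(
        "corpus.txt",
        d,
        consumer=encoded[chunk_id].append,   # one IntTensor of token ids per input line
        offset=offsets[chunk_id],
        end=offsets[chunk_id + 1],
    )
    # stats reports "nseq", "ntok", "nunk" and a Counter of words replaced by <unk>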
+ +import ast +import collections +import contextlib +import logging +import os +import re +import time +import traceback +from collections import OrderedDict +from typing import Any, Dict, Optional, Union +from random import randint + +import torch +from fairseq.dataclass.configs import CheckpointConfig +from fairseq.dataclass.utils import ( + convert_namespace_to_omegaconf, + overwrite_args_by_name, +) +from fairseq.distributed.fully_sharded_data_parallel import FSDP, has_FSDP +from fairseq.file_io import PathManager +from fairseq.models import FairseqDecoder, FairseqEncoder +from omegaconf import DictConfig, open_dict, OmegaConf + + +logger = logging.getLogger(__name__) + + +def save_checkpoint(cfg: CheckpointConfig, trainer, epoch_itr, val_loss): + from fairseq import meters + + # only one worker should attempt to create the required dir + if trainer.data_parallel_rank == 0: + os.makedirs(cfg.save_dir, exist_ok=True) + + prev_best = getattr(save_checkpoint, "best", val_loss) + if val_loss is not None: + best_function = max if cfg.maximize_best_checkpoint_metric else min + save_checkpoint.best = best_function(val_loss, prev_best) + + if cfg.no_save: + return + + trainer.consolidate_optimizer() # TODO(SS): do we need this if no_save_optimizer_state + + if not trainer.should_save_checkpoint_on_current_rank: + if trainer.always_call_state_dict_during_save_checkpoint: + trainer.state_dict() + return + + write_timer = meters.StopwatchMeter() + write_timer.start() + + epoch = epoch_itr.epoch + end_of_epoch = epoch_itr.end_of_epoch() + updates = trainer.get_num_updates() + + logger.info(f"Preparing to save checkpoint for epoch {epoch} @ {updates} updates") + + def is_better(a, b): + return a >= b if cfg.maximize_best_checkpoint_metric else a <= b + + suffix = trainer.checkpoint_suffix + checkpoint_conds = collections.OrderedDict() + checkpoint_conds["checkpoint{}{}.pt".format(epoch, suffix)] = ( + end_of_epoch and not cfg.no_epoch_checkpoints and epoch % cfg.save_interval == 0 + ) + checkpoint_conds["checkpoint_{}_{}{}.pt".format(epoch, updates, suffix)] = ( + not end_of_epoch + and cfg.save_interval_updates > 0 + and updates % cfg.save_interval_updates == 0 + ) + checkpoint_conds["checkpoint_best{}.pt".format(suffix)] = val_loss is not None and ( + not hasattr(save_checkpoint, "best") + or is_better(val_loss, save_checkpoint.best) + ) + if val_loss is not None and cfg.keep_best_checkpoints > 0: + worst_best = getattr(save_checkpoint, "best", None) + chkpts = checkpoint_paths( + cfg.save_dir, + pattern=r"checkpoint\.best_{}_(\d+\.?\d*)\.pt".format( + cfg.best_checkpoint_metric + ), + ) + if len(chkpts) > 0: + p = chkpts[-1] if cfg.maximize_best_checkpoint_metric else chkpts[0] + worst_best = float(p.rsplit("_")[-1].replace(".pt", "")) + # add random digits to resolve ties + rand_sfx = randint(0, cfg.keep_best_checkpoints) + checkpoint_conds[ + "checkpoint.best_{}_{:.3f}{}.pt".format(cfg.best_checkpoint_metric, + val_loss, rand_sfx) + ] = worst_best is None or is_better(val_loss, worst_best) + checkpoint_conds[ + "checkpoint_last{}.pt".format(suffix) + ] = not cfg.no_last_checkpoints + + extra_state = {"train_iterator": epoch_itr.state_dict(), "val_loss": val_loss} + if hasattr(save_checkpoint, "best"): + extra_state.update({"best": save_checkpoint.best}) + + checkpoints = [ + os.path.join(cfg.save_dir, fn) for fn, cond in checkpoint_conds.items() if cond + ] + if len(checkpoints) > 0: + trainer.save_checkpoint(checkpoints[0], extra_state) + for cp in checkpoints[1:]: + if 
cfg.write_checkpoints_asynchronously: + # TODO[ioPath]: Need to implement a delayed asynchronous + # file copying/moving feature. + logger.warning( + f"ioPath is not copying {checkpoints[0]} to {cp} " + "since async write mode is on." + ) + else: + assert PathManager.copy( + checkpoints[0], cp, overwrite=True + ), f"Failed to copy {checkpoints[0]} to {cp}" + + write_timer.stop() + logger.info( + "Saved checkpoint {} (epoch {} @ {} updates, score {}) (writing took {} seconds)".format( + checkpoints[0], epoch, updates, val_loss, write_timer.sum + ) + ) + + if not end_of_epoch and cfg.keep_interval_updates > 0: + # remove old checkpoints; checkpoints are sorted in descending order + if cfg.keep_interval_updates_pattern == -1: + checkpoints = checkpoint_paths( + cfg.save_dir, pattern=r"checkpoint_\d+_(\d+){}\.pt".format(suffix) + ) + else: + checkpoints = checkpoint_paths( + cfg.save_dir, + pattern=r"checkpoint_\d+_(\d+){}\.pt".format(suffix), + keep_match=True, + ) + checkpoints = [ + x[0] + for x in checkpoints + if x[1] % cfg.keep_interval_updates_pattern != 0 + ] + + for old_chk in checkpoints[cfg.keep_interval_updates :]: + if os.path.lexists(old_chk): + os.remove(old_chk) + elif PathManager.exists(old_chk): + PathManager.rm(old_chk) + + if cfg.keep_last_epochs > 0: + # remove old epoch checkpoints; checkpoints are sorted in descending order + checkpoints = checkpoint_paths( + cfg.save_dir, pattern=r"checkpoint(\d+){}\.pt".format(suffix) + ) + for old_chk in checkpoints[cfg.keep_last_epochs :]: + if os.path.lexists(old_chk): + os.remove(old_chk) + + if cfg.keep_best_checkpoints > 0: + # only keep the best N checkpoints according to validation metric + checkpoints = checkpoint_paths( + cfg.save_dir, + pattern=r"checkpoint\.best_{}_(\d+\.?\d*){}\.pt".format( + cfg.best_checkpoint_metric, suffix + ), + ) + if not cfg.maximize_best_checkpoint_metric: + checkpoints = checkpoints[::-1] + for old_chk in checkpoints[cfg.keep_best_checkpoints :]: + if os.path.lexists(old_chk): + os.remove(old_chk) + + +def load_checkpoint(cfg: CheckpointConfig, trainer, **passthrough_args): + """ + Load a checkpoint and restore the training iterator. + + *passthrough_args* will be passed through to + ``trainer.get_train_iterator``. + """ + + reset_optimizer = cfg.reset_optimizer + reset_lr_scheduler = cfg.reset_lr_scheduler + optimizer_overrides = ast.literal_eval(cfg.optimizer_overrides) + reset_meters = cfg.reset_meters + reset_dataloader = cfg.reset_dataloader + + if cfg.finetune_from_model is not None and ( + reset_optimizer or reset_lr_scheduler or reset_meters or reset_dataloader + ): + raise ValueError( + "--finetune-from-model can not be set together with either --reset-optimizer" + " or reset_lr_scheduler or reset_meters or reset_dataloader" + ) + + suffix = trainer.checkpoint_suffix + if ( + cfg.restore_file == "checkpoint_last.pt" + ): # default value of restore_file is 'checkpoint_last.pt' + checkpoint_path = os.path.join( + cfg.save_dir, "checkpoint_last{}.pt".format(suffix) + ) + first_launch = not PathManager.exists(checkpoint_path) + if cfg.finetune_from_model is not None and first_launch: + # if there is no last checkpoint to restore, start the finetune from pretrained model + # else just use usual logic to load checkpoint, e.g. restart from last checkpoint and etc. 
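            # Note: in this branch only the model weights are taken from the pretrained
            # checkpoint; the reset_* flags below are forced on so the optimizer, LR
            # scheduler, meters and dataloader all start fresh for fine-tuning.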
+ if PathManager.exists(cfg.finetune_from_model): + checkpoint_path = cfg.finetune_from_model + reset_optimizer = True + reset_lr_scheduler = True + reset_meters = True + reset_dataloader = True + logger.info( + f"loading pretrained model from {checkpoint_path}: " + "optimizer, lr scheduler, meters, dataloader will be reset" + ) + else: + raise ValueError( + f"--funetune-from-model {cfg.finetune_from_model} does not exist" + ) + elif suffix is not None: + checkpoint_path = cfg.restore_file.replace(".pt", suffix + ".pt") + else: + checkpoint_path = cfg.restore_file + + if cfg.restore_file != "checkpoint_last.pt" and cfg.finetune_from_model: + raise ValueError( + "--finetune-from-model and --restore-file (non-default value) " + "can not be specified together: " + str(cfg) + ) + + extra_state = trainer.load_checkpoint( + checkpoint_path, + reset_optimizer, + reset_lr_scheduler, + optimizer_overrides, + reset_meters=reset_meters, + ) + + if ( + extra_state is not None + and "best" in extra_state + and not reset_optimizer + and not reset_meters + ): + save_checkpoint.best = extra_state["best"] + + if extra_state is not None and not reset_dataloader: + # restore iterator from checkpoint + itr_state = extra_state["train_iterator"] + epoch_itr = trainer.get_train_iterator( + epoch=itr_state["epoch"], load_dataset=True, **passthrough_args + ) + epoch_itr.load_state_dict(itr_state) + else: + epoch_itr = trainer.get_train_iterator( + epoch=1, load_dataset=True, **passthrough_args + ) + + trainer.lr_step(epoch_itr.epoch) + + return extra_state, epoch_itr + + +def load_checkpoint_to_cpu(path, arg_overrides=None, load_on_all_ranks=False): + """Loads a checkpoint to CPU (with upgrading for backward compatibility). + + If doing single-GPU training or if the checkpoint is only being loaded by at + most one process on each node (current default behavior is for only rank 0 + to read the checkpoint from disk), load_on_all_ranks should be False to + avoid errors from torch.distributed not having been initialized or + torch.distributed.barrier() hanging. + + If all processes on each node may be loading the checkpoint + simultaneously, load_on_all_ranks should be set to True to avoid I/O + conflicts. + + There's currently no support for > 1 but < all processes loading the + checkpoint on each node. + """ + local_path = PathManager.get_local_path(path) + # The locally cached file returned by get_local_path() may be stale for + # remote files that are periodically updated/overwritten (ex: + # checkpoint_last.pt) - so we remove the local copy, sync across processes + # (if needed), and then download a fresh copy. + if local_path != path and PathManager.path_requires_pathmanager(path): + try: + os.remove(local_path) + except FileNotFoundError: + # With potentially multiple processes removing the same file, the + # file being missing is benign (missing_ok isn't available until + # Python 3.8). + pass + if load_on_all_ranks: + torch.distributed.barrier() + local_path = PathManager.get_local_path(path) + + with open(local_path, "rb") as f: + state = torch.load(f, map_location=torch.device("cpu")) + + if "args" in state and state["args"] is not None and arg_overrides is not None: + args = state["args"] + for arg_name, arg_val in arg_overrides.items(): + setattr(args, arg_name, arg_val) + + if "cfg" in state and state["cfg"] is not None: + + # hack to be able to set Namespace in dict config. 
this should be removed when we update to newer + # omegaconf version that supports object flags, or when we migrate all existing models + from omegaconf import _utils + + old_primitive = _utils.is_primitive_type + _utils.is_primitive_type = lambda _: True + + state["cfg"] = OmegaConf.create(state["cfg"]) + + _utils.is_primitive_type = old_primitive + OmegaConf.set_struct(state["cfg"], True) + + if arg_overrides is not None: + overwrite_args_by_name(state["cfg"], arg_overrides) + + state = _upgrade_state_dict(state) + return state + + +def load_model_ensemble( + filenames, + arg_overrides: Optional[Dict[str, Any]] = None, + task=None, + strict=True, + suffix="", + num_shards=1, + state=None, +): + """Loads an ensemble of models. + + Args: + filenames (List[str]): checkpoint files to load + arg_overrides (Dict[str,Any], optional): override model args that + were used during model training + task (fairseq.tasks.FairseqTask, optional): task to use for loading + """ + assert not ( + strict and num_shards > 1 + ), "Cannot load state dict with strict=True and checkpoint shards > 1" + ensemble, args, _task = load_model_ensemble_and_task( + filenames, + arg_overrides, + task, + strict, + suffix, + num_shards, + state, + ) + return ensemble, args + + +def get_maybe_sharded_checkpoint_filename( + filename: str, suffix: str, shard_idx: int, num_shards: int +) -> str: + orig_filename = filename + filename = filename.replace(".pt", suffix + ".pt") + fsdp_filename = filename[:-3] + f"-shard{shard_idx}.pt" + model_parallel_filename = orig_filename[:-3] + f"_part{shard_idx}.pt" + if PathManager.exists(fsdp_filename): + return fsdp_filename + elif num_shards > 1: + return model_parallel_filename + else: + return filename + + +def load_model_ensemble_and_task( + filenames, + arg_overrides: Optional[Dict[str, Any]] = None, + task=None, + strict=True, + suffix="", + num_shards=1, + state=None, +): + assert state is None or len(filenames) == 1 + + from fairseq import tasks + + assert not ( + strict and num_shards > 1 + ), "Cannot load state dict with strict=True and checkpoint shards > 1" + ensemble = [] + cfg = None + for filename in filenames: + orig_filename = filename + model_shard_state = {"shard_weights": [], "shard_metadata": []} + assert num_shards > 0 + st = time.time() + for shard_idx in range(num_shards): + filename = get_maybe_sharded_checkpoint_filename( + orig_filename, suffix, shard_idx, num_shards + ) + + if not PathManager.exists(filename): + raise IOError("Model file not found: {}".format(filename)) + if state is None: + state = load_checkpoint_to_cpu(filename, arg_overrides) + if "args" in state and state["args"] is not None: + cfg = convert_namespace_to_omegaconf(state["args"]) + elif "cfg" in state and state["cfg"] is not None: + cfg = state["cfg"] + else: + raise RuntimeError( + f"Neither args nor cfg exist in state keys = {state.keys()}" + ) + + if task is None: + task = tasks.setup_task(cfg.task) + + if "task_state" in state: + task.load_state_dict(state["task_state"]) + + if "fsdp_metadata" in state and num_shards > 1: + model_shard_state["shard_weights"].append(state["model"]) + model_shard_state["shard_metadata"].append(state["fsdp_metadata"]) + # check FSDP import before the code goes too far + if not has_FSDP: + raise ImportError( + "Cannot find FullyShardedDataParallel. 
" + "Please install fairscale with: pip install fairscale" + ) + if shard_idx == num_shards - 1: + consolidated_model_state = FSDP.consolidate_shard_weights( + shard_weights=model_shard_state["shard_weights"], + shard_metadata=model_shard_state["shard_metadata"], + ) + model = task.build_model(cfg.model) + model.load_state_dict( + consolidated_model_state, strict=strict, model_cfg=cfg.model + ) + else: + # model parallel checkpoint or unsharded checkpoint + model = task.build_model(cfg.model) + model.load_state_dict( + state["model"], strict=strict, model_cfg=cfg.model + ) + + # reset state so it gets loaded for the next model in ensemble + state = None + if shard_idx % 10 == 0 and shard_idx > 0: + elapsed = time.time() - st + logger.info(f"Loaded {shard_idx} shards in {elapsed:.2f}s, {elapsed / (shard_idx+1):.2f}s/shard") + + # build model for ensemble + ensemble.append(model) + return ensemble, cfg, task + + +def checkpoint_paths(path, pattern=r"checkpoint(\d+)\.pt", keep_match=False): + """Retrieves all checkpoints found in `path` directory. + + Checkpoints are identified by matching filename to the specified pattern. If + the pattern contains groups, the result will be sorted by the first group in + descending order. + """ + pt_regexp = re.compile(pattern) + files = PathManager.ls(path) + + entries = [] + for i, f in enumerate(files): + m = pt_regexp.fullmatch(f) + if m is not None: + idx = float(m.group(1)) if len(m.groups()) > 0 else i + entries.append((idx, m.group(0))) + if keep_match: + return [(os.path.join(path, x[1]), x[0]) for x in sorted(entries, reverse=True)] + else: + return [os.path.join(path, x[1]) for x in sorted(entries, reverse=True)] + + +def torch_persistent_save(obj, filename, async_write: bool = False): + if async_write: + with PathManager.opena(filename, "wb") as f: + _torch_persistent_save(obj, f) + else: + if PathManager.supports_rename(filename): + # do atomic save + with PathManager.open(filename + ".tmp", "wb") as f: + _torch_persistent_save(obj, f) + PathManager.rename(filename + ".tmp", filename) + else: + # fallback to non-atomic save + with PathManager.open(filename, "wb") as f: + _torch_persistent_save(obj, f) + + +def _torch_persistent_save(obj, f): + if isinstance(f, str): + with PathManager.open(f, "wb") as h: + torch_persistent_save(obj, h) + return + for i in range(3): + try: + return torch.save(obj, f) + except Exception: + if i == 2: + logger.error(traceback.format_exc()) + + +def _upgrade_state_dict(state): + """Helper for upgrading old model checkpoints.""" + + # add optimizer_history + if "optimizer_history" not in state: + state["optimizer_history"] = [ + {"criterion_name": "CrossEntropyCriterion", "best_loss": state["best_loss"]} + ] + state["last_optimizer_state"] = state["optimizer"] + del state["optimizer"] + del state["best_loss"] + # move extra_state into sub-dictionary + if "epoch" in state and "extra_state" not in state: + state["extra_state"] = { + "epoch": state["epoch"], + "batch_offset": state["batch_offset"], + "val_loss": state["val_loss"], + } + del state["epoch"] + del state["batch_offset"] + del state["val_loss"] + # reduce optimizer history's memory usage (only keep the last state) + if "optimizer" in state["optimizer_history"][-1]: + state["last_optimizer_state"] = state["optimizer_history"][-1]["optimizer"] + for optim_hist in state["optimizer_history"]: + del optim_hist["optimizer"] + # record the optimizer class name + if "optimizer_name" not in state["optimizer_history"][-1]: + 
state["optimizer_history"][-1]["optimizer_name"] = "FairseqNAG" + # move best_loss into lr_scheduler_state + if "lr_scheduler_state" not in state["optimizer_history"][-1]: + state["optimizer_history"][-1]["lr_scheduler_state"] = { + "best": state["optimizer_history"][-1]["best_loss"] + } + del state["optimizer_history"][-1]["best_loss"] + # keep track of number of updates + if "num_updates" not in state["optimizer_history"][-1]: + state["optimizer_history"][-1]["num_updates"] = 0 + # old model checkpoints may not have separate source/target positions + if ( + "args" in state + and hasattr(state["args"], "max_positions") + and not hasattr(state["args"], "max_source_positions") + ): + state["args"].max_source_positions = state["args"].max_positions + state["args"].max_target_positions = state["args"].max_positions + # use stateful training data iterator + if "train_iterator" not in state["extra_state"]: + state["extra_state"]["train_iterator"] = { + "epoch": state["extra_state"]["epoch"], + "iterations_in_epoch": state["extra_state"].get("batch_offset", 0), + } + + # backward compatibility, cfg updates + if "args" in state and state["args"] is not None: + # default to translation task + if not hasattr(state["args"], "task"): + state["args"].task = "translation" + # --raw-text and --lazy-load are deprecated + if getattr(state["args"], "raw_text", False): + state["args"].dataset_impl = "raw" + elif getattr(state["args"], "lazy_load", False): + state["args"].dataset_impl = "lazy" + # epochs start at 1 + if state["extra_state"]["train_iterator"] is not None: + state["extra_state"]["train_iterator"]["epoch"] = max( + state["extra_state"]["train_iterator"].get("epoch", 1), 1 + ) + # --remove-bpe ==> --postprocess + if hasattr(state["args"], "remove_bpe"): + state["args"].post_process = state["args"].remove_bpe + # --min-lr ==> --stop-min-lr + if hasattr(state["args"], "min_lr"): + state["args"].stop_min_lr = state["args"].min_lr + del state["args"].min_lr + # binary_cross_entropy / kd_binary_cross_entropy => wav2vec criterion + if ( + hasattr(state["args"], "criterion") + and state["args"].criterion in [ + "binary_cross_entropy", + "kd_binary_cross_entropy", + ] + ): + state["args"].criterion = "wav2vec" + # remove log_keys if it's None (criteria will supply a default value of []) + if hasattr(state["args"], "log_keys") and state["args"].log_keys is None: + delattr(state["args"], "log_keys") + # speech_pretraining => audio pretraining + if ( + hasattr(state["args"], "task") + and state["args"].task == "speech_pretraining" + ): + state["args"].task = "audio_pretraining" + # audio_cpc => wav2vec + if hasattr(state["args"], "arch") and state["args"].arch == "audio_cpc": + state["args"].arch = "wav2vec" + # convert legacy float learning rate to List[float] + if hasattr(state["args"], "lr") and isinstance(state["args"].lr, float): + state["args"].lr = [state["args"].lr] + # convert task data arg to a string instead of List[string] + if ( + hasattr(state["args"], "data") + and isinstance(state["args"].data, list) + and len(state["args"].data) > 0 + ): + state["args"].data = state["args"].data[0] + # remove keys in state["args"] related to teacher-student learning + for key in [ + "static_teachers", + "static_teacher_weights", + "dynamic_teachers", + "dynamic_teacher_weights", + ]: + if key in state["args"]: + delattr(state["args"], key) + + state["cfg"] = convert_namespace_to_omegaconf(state["args"]) + + if "cfg" in state and state["cfg"] is not None: + cfg = state["cfg"] + with open_dict(cfg): + # any 
upgrades for Hydra-based configs + if ( + "task" in cfg + and "eval_wer_config" in cfg.task + and isinstance(cfg.task.eval_wer_config.print_alignment, bool) + ): + cfg.task.eval_wer_config.print_alignment = "hard" + if "generation" in cfg and isinstance(cfg.generation.print_alignment, bool): + cfg.generation.print_alignment = "hard" + if ( + "model" in cfg + and "w2v_args" in cfg.model + and cfg.model.w2v_args is not None + and ( + hasattr(cfg.model.w2v_args, "task") or "task" in cfg.model.w2v_args + ) + and hasattr(cfg.model.w2v_args.task, "eval_wer_config") + and cfg.model.w2v_args.task.eval_wer_config is not None + and isinstance( + cfg.model.w2v_args.task.eval_wer_config.print_alignment, bool + ) + ): + cfg.model.w2v_args.task.eval_wer_config.print_alignment = "hard" + + return state + + +def prune_state_dict(state_dict, model_cfg: Optional[DictConfig]): + """Prune the given state_dict if desired for LayerDrop + (https://arxiv.org/abs/1909.11556). + + Training with LayerDrop allows models to be robust to pruning at inference + time. This function prunes state_dict to allow smaller models to be loaded + from a larger model and re-maps the existing state_dict for this to occur. + + It's called by functions that load models from checkpoints and does not + need to be called directly. + """ + arch = None + if model_cfg is not None: + arch = ( + model_cfg._name + if isinstance(model_cfg, DictConfig) + else getattr(model_cfg, "arch", None) + ) + + if not model_cfg or arch is None or arch == "ptt_transformer": + # args should not be none, but don't crash if it is. + return state_dict + + encoder_layers_to_keep = getattr(model_cfg, "encoder_layers_to_keep", None) + decoder_layers_to_keep = getattr(model_cfg, "decoder_layers_to_keep", None) + + if not encoder_layers_to_keep and not decoder_layers_to_keep: + return state_dict + + # apply pruning + logger.info( + "Pruning model to specified layer configuration - this works best if the model was trained with LayerDrop" + ) + + def create_pruning_pass(layers_to_keep, layer_name): + keep_layers = sorted( + int(layer_string) for layer_string in layers_to_keep.split(",") + ) + mapping_dict = {} + for i in range(len(keep_layers)): + mapping_dict[str(keep_layers[i])] = str(i) + + regex = re.compile(r"^{layer}.*\.layers\.(\d+)".format(layer=layer_name)) + return {"substitution_regex": regex, "mapping_dict": mapping_dict} + + pruning_passes = [] + if encoder_layers_to_keep: + pruning_passes.append(create_pruning_pass(encoder_layers_to_keep, "encoder")) + if decoder_layers_to_keep: + pruning_passes.append(create_pruning_pass(decoder_layers_to_keep, "decoder")) + + new_state_dict = {} + for layer_name in state_dict.keys(): + match = re.search(r"\.layers\.(\d+)\.", layer_name) + # if layer has no number in it, it is a supporting layer, such as an + # embedding + if not match: + new_state_dict[layer_name] = state_dict[layer_name] + continue + + # otherwise, layer should be pruned. 
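        # e.g. with encoder_layers_to_keep="0,2,4" the mapping_dict is
        # {"0": "0", "2": "1", "4": "2"}, so a key like "encoder.layers.2.fc1.weight"
        # is renamed to "encoder.layers.1.fc1.weight", while keys for layers 1 and 3
        # match no mapping entry and are simply dropped from the new state dict.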
+ original_layer_number = match.group(1) + # figure out which mapping dict to replace from + for pruning_pass in pruning_passes: + if original_layer_number in pruning_pass["mapping_dict"] and pruning_pass[ + "substitution_regex" + ].search(layer_name): + new_layer_number = pruning_pass["mapping_dict"][original_layer_number] + substitution_match = pruning_pass["substitution_regex"].search( + layer_name + ) + new_state_key = ( + layer_name[: substitution_match.start(1)] + + new_layer_number + + layer_name[substitution_match.end(1) :] + ) + new_state_dict[new_state_key] = state_dict[layer_name] + + # Since layers are now pruned, *_layers_to_keep are no longer needed. + # This is more of "It would make it work fix" rather than a proper fix. + if isinstance(model_cfg, DictConfig): + context = open_dict(model_cfg) + else: + context = contextlib.ExitStack() + with context: + if hasattr(model_cfg, "encoder_layers_to_keep"): + model_cfg.encoder_layers_to_keep = None + if hasattr(model_cfg, "decoder_layers_to_keep"): + model_cfg.decoder_layers_to_keep = None + + return new_state_dict + + +def load_pretrained_component_from_model( + component: Union[FairseqEncoder, FairseqDecoder], checkpoint: str +): + """ + Load a pretrained FairseqEncoder or FairseqDecoder from checkpoint into the + provided `component` object. If state_dict fails to load, there may be a + mismatch in the architecture of the corresponding `component` found in the + `checkpoint` file. + """ + if not PathManager.exists(checkpoint): + raise IOError("Model file not found: {}".format(checkpoint)) + state = load_checkpoint_to_cpu(checkpoint) + if isinstance(component, FairseqEncoder): + component_type = "encoder" + elif isinstance(component, FairseqDecoder): + component_type = "decoder" + else: + raise ValueError( + "component to load must be either a FairseqEncoder or " + "FairseqDecoder. Loading other component types are not supported." + ) + component_state_dict = OrderedDict() + for key in state["model"].keys(): + if key.startswith(component_type): + # encoder.input_layers.0.0.weight --> input_layers.0.0.weight + component_subkey = key[len(component_type) + 1 :] + component_state_dict[component_subkey] = state["model"][key] + component.load_state_dict(component_state_dict, strict=True) + return component + + +def verify_checkpoint_directory(save_dir: str) -> None: + if not os.path.exists(save_dir): + os.makedirs(save_dir, exist_ok=True) + temp_file_path = os.path.join(save_dir, "dummy") + try: + with open(temp_file_path, "w"): + pass + except OSError as e: + logger.warning( + "Unable to access checkpoint save directory: {}".format(save_dir) + ) + raise e + else: + os.remove(temp_file_path) diff --git a/SpeechT5/fairseq/fairseq/clib/cuda/ngram_repeat_block_cuda.cpp b/SpeechT5/fairseq/fairseq/clib/cuda/ngram_repeat_block_cuda.cpp new file mode 100644 index 0000000000000000000000000000000000000000..4199cd6ea86b019cb688b20c07e85905b2244fa0 --- /dev/null +++ b/SpeechT5/fairseq/fairseq/clib/cuda/ngram_repeat_block_cuda.cpp @@ -0,0 +1,47 @@ +/* +Copyright (c) Microsoft Corporation. +Licensed under the MIT License. 
+*/ + +#include <torch/extension.h> +#include <vector> + +/* +CPP Binding for CUDA OP +*/ + +// CUDA forward declarations +torch::Tensor ngram_repeat_block_cuda_forward(torch::Tensor tokens, + torch::Tensor lprobs, int bsz, + int step, int beam_size, + int no_repeat_ngram_size); + +#define CHECK_CUDA(x) \ + TORCH_CHECK(x.type().is_cuda(), #x " must be a CUDA tensor") +#define CHECK_CONTIGUOUS(x) \ + TORCH_CHECK(x.is_contiguous(), #x " must be contiguous") +#define CHECK_INPUT(x) \ + CHECK_CUDA(x); \ + CHECK_CONTIGUOUS(x) + +// Input check and call to CUDA OP +// Backward method not required +torch::Tensor ngram_repeat_block_forward(torch::Tensor tokens, + torch::Tensor lprobs, int bsz, + int step, int beam_size, + int no_repeat_ngram_size) { + CHECK_INPUT(tokens); + CHECK_INPUT(lprobs); + assert(bsz > 0); + assert(step >= 0); + assert(beam_size > 0); + assert(no_repeat_ngram_size > 0); + + return ngram_repeat_block_cuda_forward(tokens, lprobs, bsz, step, beam_size, + no_repeat_ngram_size); +} + +PYBIND11_MODULE(TORCH_EXTENSION_NAME, m) { + m.def("forward", &ngram_repeat_block_forward, + "No Repeat Ngram Block forward (CUDA)"); +} diff --git a/SpeechT5/fairseq/fairseq/clib/cuda/ngram_repeat_block_cuda_kernel.cu b/SpeechT5/fairseq/fairseq/clib/cuda/ngram_repeat_block_cuda_kernel.cu new file mode 100644 index 0000000000000000000000000000000000000000..b458b0916a7b38d1662377ce95559e7562e4c65d --- /dev/null +++ b/SpeechT5/fairseq/fairseq/clib/cuda/ngram_repeat_block_cuda_kernel.cu @@ -0,0 +1,76 @@ +/* +Copyright (c) Microsoft Corporation. +Licensed under the MIT License. +*/ + +/* +Kernel implementation for blocking repeated n-grams. +*/ + +#include <cuda.h> +#include <cuda_runtime.h> +#include <math.h> +#include <torch/extension.h> +#include <vector> + +// Ban repeated ngrams of length = 'no_repeat_ngram_size' +__global__ void banRepeatedTokens(long* __restrict__ tokens, + float* __restrict__ lprobs, + int max_predict_len, int vocab_size, + int no_repeat_ngram_size) { + auto row = blockIdx.x; + auto col = threadIdx.x; + auto start = row * (max_predict_len) + col; + // Each thread compares ngram starting from + // thread index with final ngram starting from + // step - no_repeat_ngram_size +2 + auto check_start_pos = blockDim.x; + auto lprob_start = row * vocab_size; + bool is_banned = true; + extern __shared__ long tokens_shm[]; + tokens_shm[col] = tokens[start]; + if (col == blockDim.x - 1) { + for (int i=1; i<no_repeat_ngram_size; i++){ + if (col+i < max_predict_len){ + tokens_shm[col + i] = tokens[start + i]; + } + } + } + __syncthreads(); + + for (int k = 0; k < no_repeat_ngram_size - 1; k++) { + if (tokens_shm[col + k] != tokens_shm[check_start_pos + k]) { + is_banned = false; + } + } + if (is_banned == true) { + auto token_to_be_banned = tokens_shm[col + no_repeat_ngram_size - 1]; + lprobs[lprob_start + token_to_be_banned] = -INFINITY; + } +} + +// Allocate blocks and threads based on +// batch size and sequence length and launch +// kernel +torch::Tensor ngram_repeat_block_cuda_forward(const torch::Tensor tokens, + torch::Tensor lprobs, int bsz, + int step, int beam_size, + int no_repeat_ngram_size) { + int threads = step - no_repeat_ngram_size + 2; + if (threads <= 0) return lprobs; + int max_predict_len = tokens.size(1); + int vocab_size = lprobs.size(1); + auto token_ptr = tokens.data_ptr<long>(); + auto lprob_ptr = lprobs.data_ptr<float>(); + int blocks = bsz * beam_size; + int shared_mem_size = (step + 1) * sizeof(long); + + // Launching N blocks where N is number of samples in a 
batch (beams*bsz) + // Launching T threads where T is number of previous ngrams in a sample + // Allocating shared mem per block for fastser access of input tokens since + // each token will be accessed N times to compare with current Ngram where + // N is Ngram size. + banRepeatedTokens<<<blocks, threads, shared_mem_size>>>( + token_ptr, lprob_ptr, max_predict_len, vocab_size, no_repeat_ngram_size); + return lprobs; +} diff --git a/SpeechT5/fairseq/fairseq/clib/libbase/balanced_assignment.cpp b/SpeechT5/fairseq/fairseq/clib/libbase/balanced_assignment.cpp new file mode 100644 index 0000000000000000000000000000000000000000..296f03b6aeb87a11db92e5342d8dab90f1fc3867 --- /dev/null +++ b/SpeechT5/fairseq/fairseq/clib/libbase/balanced_assignment.cpp @@ -0,0 +1,95 @@ +/** + * Copyright 2017-present, Facebook, Inc. + * All rights reserved. + * + * This source code is licensed under the license found in the + * LICENSE file in the root directory of this source tree. + */ + +/* +C++ code for solving the linear assignment problem. +Based on the Auction Algorithm from https://dspace.mit.edu/bitstream/handle/1721.1/3265/P-2108-26912652.pdf and the implementation from: +https://github.com/bkj/auction-lap +Adapted to be more efficient when each worker is looking for k jobs instead of 1. +*/ +#include <torch/extension.h> +#include <iostream> +using namespace torch::indexing; +torch::Tensor balanced_assignment(torch::Tensor job_and_worker_to_score) { + int max_iterations = 100; + torch::Tensor epsilon = (job_and_worker_to_score.max() - job_and_worker_to_score.min()) / 50; + epsilon.clamp_min_(1e-04); + torch::Tensor worker_and_job_to_score = job_and_worker_to_score.detach().transpose(0,1).contiguous(); + int num_workers = worker_and_job_to_score.size(0); + int num_jobs = worker_and_job_to_score.size(1); + auto device = worker_and_job_to_score.device(); + int jobs_per_worker = num_jobs / num_workers; + torch::Tensor value = worker_and_job_to_score.clone(); + int counter = 0; + torch::Tensor max_value = worker_and_job_to_score.max(); + + torch::Tensor bid_indices; + torch::Tensor cost = worker_and_job_to_score.new_zeros({1, num_jobs}); + torch::Tensor bids = worker_and_job_to_score.new_empty({num_workers, num_jobs}); + torch::Tensor bid_increments = worker_and_job_to_score.new_empty({num_workers, jobs_per_worker}); + torch::Tensor top_values = worker_and_job_to_score.new_empty({num_workers, jobs_per_worker + 1}); + torch::Tensor high_bids = worker_and_job_to_score.new_empty({num_jobs}); + + torch::Tensor top_index = top_values.to(torch::kLong); + torch::Tensor high_bidders = top_index.new_empty({num_jobs}); + torch::Tensor have_bids = high_bidders.to(torch::kBool); + torch::Tensor jobs_indices = torch::arange({num_jobs}, torch::dtype(torch::kLong).device(device)); + torch::Tensor true_tensor = torch::ones({1}, torch::dtype(torch::kBool).device(device)); + + while (true) { + bids.zero_(); + torch::topk_out(top_values, top_index, value, jobs_per_worker + 1, 1); + + // Each worker bids the difference in value between that job and the k+1th job + torch::sub_out(bid_increments, + top_values.index({Slice(None, None), Slice(0, jobs_per_worker)}), + top_values.index({Slice(None, None), jobs_per_worker}).unsqueeze(1)); + + bid_increments.add_(epsilon); + bids.scatter_(1, + top_index.index({Slice(None, None),Slice(0, jobs_per_worker)}), + bid_increments); + + if (counter < max_iterations && counter > 0) { + // Put in a minimal bid to retain items from the last round if no-one else bids for them this round + 
bids.view(-1).index_put_({bid_indices}, epsilon); + } + + // Find the highest bidding worker per job + torch::max_out(high_bids, high_bidders, bids, 0); + torch::gt_out(have_bids, high_bids, 0); + + if (have_bids.all().item<bool>()) { + // All jobs were bid for + break; + } + + // Make popular items more expensive + cost.add_(high_bids); + torch::sub_out(value, worker_and_job_to_score, cost); + + bid_indices = ((high_bidders * num_jobs) + jobs_indices).index({have_bids}); + + if (counter < max_iterations) { + // Make sure that this item will be in the winning worker's top-k next time. + value.view(-1).index_put_({bid_indices}, max_value); + } + else { + // Suboptimal approximation that converges quickly from current solution + value.view(-1).index_put_({bid_indices}, worker_and_job_to_score.view(-1).index({bid_indices})); + } + + counter += 1; + } + + return top_index.index({Slice(None, None), Slice(0, jobs_per_worker)}).reshape(-1); +} + +PYBIND11_MODULE(TORCH_EXTENSION_NAME, m) { + m.def("balanced_assignment", &balanced_assignment, "Balanced Assignment"); +} diff --git a/SpeechT5/fairseq/fairseq/clib/libbleu/libbleu.cpp b/SpeechT5/fairseq/fairseq/clib/libbleu/libbleu.cpp new file mode 100644 index 0000000000000000000000000000000000000000..3cf2d65b6d16e19ea299ebe43c9c25e3481d4524 --- /dev/null +++ b/SpeechT5/fairseq/fairseq/clib/libbleu/libbleu.cpp @@ -0,0 +1,141 @@ +/** + * Copyright 2017-present, Facebook, Inc. + * All rights reserved. + * + * This source code is licensed under the license found in the + * LICENSE file in the root directory of this source tree. + */ + +#include <map> +#include <array> +#include <cstring> +#include <cstdio> + +typedef struct +{ + size_t reflen; + size_t predlen; + size_t match1; + size_t count1; + size_t match2; + size_t count2; + size_t match3; + size_t count3; + size_t match4; + size_t count4; +} bleu_stat; + +// left trim (remove pad) +void bleu_ltrim(size_t* len, int** sent, int pad) { + size_t start = 0; + while(start < *len) { + if (*(*sent + start) != pad) { break; } + start++; + } + *sent += start; + *len -= start; +} + +// right trim remove (eos) +void bleu_rtrim(size_t* len, int** sent, int pad, int eos) { + size_t end = *len - 1; + while (end > 0) { + if (*(*sent + end) != eos && *(*sent + end) != pad) { break; } + end--; + } + *len = end + 1; +} + +// left and right trim +void bleu_trim(size_t* len, int** sent, int pad, int eos) { + bleu_ltrim(len, sent, pad); + bleu_rtrim(len, sent, pad, eos); +} + +size_t bleu_hash(int len, int* data) { + size_t h = 14695981039346656037ul; + size_t prime = 0x100000001b3; + char* b = (char*) data; + size_t blen = sizeof(int) * len; + + while (blen-- > 0) { + h ^= *b++; + h *= prime; + } + + return h; +} + +void bleu_addngram( + size_t *ntotal, size_t *nmatch, size_t n, + size_t reflen, int* ref, size_t predlen, int* pred) { + + if (predlen < n) { return; } + + predlen = predlen - n + 1; + (*ntotal) += predlen; + + if (reflen < n) { return; } + + reflen = reflen - n + 1; + + std::map<size_t, size_t> count; + while (predlen > 0) { + size_t w = bleu_hash(n, pred++); + count[w]++; + predlen--; + } + + while (reflen > 0) { + size_t w = bleu_hash(n, ref++); + if (count[w] > 0) { + (*nmatch)++; + count[w] -=1; + } + reflen--; + } +} + +extern "C" { + +#ifdef _WIN64 +__declspec(dllexport) +#endif +void bleu_zero_init(bleu_stat* stat) { + std::memset(stat, 0, sizeof(bleu_stat)); +} + +#ifdef _WIN64 +__declspec(dllexport) +#endif +void bleu_one_init(bleu_stat* stat) { + bleu_zero_init(stat); + stat->count1 = 0; + 
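  /* The statistics below start higher-order n-gram counts and matches at 1 instead of 0
     (while the unigram statistics stay at 0), presumably as a simple add-one smoothing
     so that sentence-level BLEU remains non-zero for short hypotheses. */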
stat->count2 = 1; + stat->count3 = 1; + stat->count4 = 1; + stat->match1 = 0; + stat->match2 = 1; + stat->match3 = 1; + stat->match4 = 1; +} + +#ifdef _WIN64 +__declspec(dllexport) +#endif +void bleu_add( + bleu_stat* stat, + size_t reflen, int* ref, size_t predlen, int* pred, int pad, int eos) { + + bleu_trim(&reflen, &ref, pad, eos); + bleu_trim(&predlen, &pred, pad, eos); + stat->reflen += reflen; + stat->predlen += predlen; + + bleu_addngram(&stat->count1, &stat->match1, 1, reflen, ref, predlen, pred); + bleu_addngram(&stat->count2, &stat->match2, 2, reflen, ref, predlen, pred); + bleu_addngram(&stat->count3, &stat->match3, 3, reflen, ref, predlen, pred); + bleu_addngram(&stat->count4, &stat->match4, 4, reflen, ref, predlen, pred); +} + +} diff --git a/SpeechT5/fairseq/fairseq/clib/libbleu/module.cpp b/SpeechT5/fairseq/fairseq/clib/libbleu/module.cpp new file mode 100644 index 0000000000000000000000000000000000000000..8ed9a84b1c028bfe9ed1d45be6857b6e79b3459f --- /dev/null +++ b/SpeechT5/fairseq/fairseq/clib/libbleu/module.cpp @@ -0,0 +1,37 @@ +/** + * Copyright 2017-present, Facebook, Inc. + * All rights reserved. + * + * This source code is licensed under the license found in the + * LICENSE file in the root directory of this source tree. + */ + +#include <Python.h> + + +static PyMethodDef method_def[] = { + {NULL, NULL, 0, NULL} +}; + +static struct PyModuleDef module_def = { + PyModuleDef_HEAD_INIT, + "libbleu", /* name of module */ + NULL, /* module documentation, may be NULL */ + -1, /* size of per-interpreter state of the module, + or -1 if the module keeps state in global variables. */ + method_def +}; + + +#if PY_MAJOR_VERSION == 2 +PyMODINIT_FUNC init_libbleu() +#else +PyMODINIT_FUNC PyInit_libbleu() +#endif +{ + PyObject *m = PyModule_Create(&module_def); + if (!m) { + return NULL; + } + return m; +} diff --git a/SpeechT5/fairseq/fairseq/clib/libnat/edit_dist.cpp b/SpeechT5/fairseq/fairseq/clib/libnat/edit_dist.cpp new file mode 100644 index 0000000000000000000000000000000000000000..6bc6a937d6abde0cd49769c4def69ac0560096bc --- /dev/null +++ b/SpeechT5/fairseq/fairseq/clib/libnat/edit_dist.cpp @@ -0,0 +1,231 @@ +/** + * Copyright 2017-present, Facebook, Inc. + * All rights reserved. + * + * This source code is licensed under the license found in the + * LICENSE file in the root directory of this source tree. + */ + +#include <torch/torch.h> // @manual=//caffe2:torch_extension +#include <pybind11/detail/common.h> +#include <pybind11/pybind11.h> +#include <vector> +#include <algorithm> +#include <cstdint> +#include <iosfwd> +#include <memory> +#include <new> +#include <string> +#include <utility> + +using namespace ::std; + +vector<vector<uint32_t>> edit_distance2_with_dp( + vector<uint32_t>& x, + vector<uint32_t>& y) { + uint32_t lx = x.size(); + uint32_t ly = y.size(); + vector<vector<uint32_t>> d(lx + 1, vector<uint32_t>(ly + 1)); + for (uint32_t i = 0; i < lx + 1; i++) { + d[i][0] = i; + } + for (uint32_t j = 0; j < ly + 1; j++) { + d[0][j] = j; + } + for (uint32_t i = 1; i < lx + 1; i++) { + for (uint32_t j = 1; j < ly + 1; j++) { + d[i][j] = + min(min(d[i - 1][j], d[i][j - 1]) + 1, + d[i - 1][j - 1] + 2 * (x.at(i - 1) == y.at(j - 1) ? 
0 : 1)); + } + } + return d; +} + +vector<vector<uint32_t>> edit_distance2_backtracking( + vector<vector<uint32_t>>& d, + vector<uint32_t>& x, + vector<uint32_t>& y, + uint32_t terminal_symbol) { + vector<uint32_t> seq; + vector<vector<uint32_t>> edit_seqs(x.size() + 2, vector<uint32_t>()); + /* + edit_seqs: + 0~x.size() cell is the insertion sequences + last cell is the delete sequence + */ + + if (x.size() == 0) { + edit_seqs.at(0) = y; + return edit_seqs; + } + + uint32_t i = d.size() - 1; + uint32_t j = d.at(0).size() - 1; + + while ((i >= 0) && (j >= 0)) { + if ((i == 0) && (j == 0)) { + break; + } + + if ((j > 0) && (d.at(i).at(j - 1) < d.at(i).at(j))) { + seq.push_back(1); // insert + seq.push_back(y.at(j - 1)); + j--; + } else if ((i > 0) && (d.at(i - 1).at(j) < d.at(i).at(j))) { + seq.push_back(2); // delete + seq.push_back(x.at(i - 1)); + i--; + } else { + seq.push_back(3); // keep + seq.push_back(x.at(i - 1)); + i--; + j--; + } + } + + uint32_t prev_op, op, s, word; + prev_op = 0, s = 0; + for (uint32_t k = 0; k < seq.size() / 2; k++) { + op = seq.at(seq.size() - 2 * k - 2); + word = seq.at(seq.size() - 2 * k - 1); + if (prev_op != 1) { + s++; + } + if (op == 1) // insert + { + edit_seqs.at(s - 1).push_back(word); + } else if (op == 2) // delete + { + edit_seqs.at(x.size() + 1).push_back(1); + } else { + edit_seqs.at(x.size() + 1).push_back(0); + } + + prev_op = op; + } + + for (uint32_t k = 0; k < edit_seqs.size(); k++) { + if (edit_seqs[k].size() == 0) { + edit_seqs[k].push_back(terminal_symbol); + } + } + return edit_seqs; +} + +vector<vector<uint32_t>> edit_distance2_backtracking_with_delete( + vector<vector<uint32_t>>& d, + vector<uint32_t>& x, + vector<uint32_t>& y, + uint32_t terminal_symbol, + uint32_t deletion_symbol) { + vector<uint32_t> seq; + vector<vector<uint32_t>> edit_seqs(x.size() + 1, vector<uint32_t>()); + /* + edit_seqs: + 0~x.size() cell is the insertion sequences + last cell is the delete sequence + */ + + if (x.size() == 0) { + edit_seqs.at(0) = y; + return edit_seqs; + } + + uint32_t i = d.size() - 1; + uint32_t j = d.at(0).size() - 1; + + while ((i >= 0) && (j >= 0)) { + if ((i == 0) && (j == 0)) { + break; + } + + if ((j > 0) && (d.at(i).at(j - 1) < d.at(i).at(j))) { + seq.push_back(1); // insert + seq.push_back(y.at(j - 1)); + j--; + } else if ((i > 0) && (d.at(i - 1).at(j) < d.at(i).at(j))) { + seq.push_back(2); // delete + seq.push_back(x.at(i - 1)); + i--; + } else { + seq.push_back(3); // keep + seq.push_back(x.at(i - 1)); + i--; + j--; + } + } + + uint32_t prev_op, op, s, word; + prev_op = 0, s = 0; + for (uint32_t k = 0; k < seq.size() / 2; k++) { + op = seq.at(seq.size() - 2 * k - 2); + word = seq.at(seq.size() - 2 * k - 1); + if (prev_op != 1) { + s++; + } + if (op == 1) // insert + { + edit_seqs.at(s - 1).push_back(word); + } else if (op == 2) // delete + { + edit_seqs.at(s - 1).push_back(deletion_symbol); + } + + prev_op = op; + } + + for (uint32_t k = 0; k < edit_seqs.size(); k++) { + if (edit_seqs.at(k).size() == 0) { + edit_seqs.at(k).push_back(terminal_symbol); + } + } + return edit_seqs; +} + +vector<uint32_t> compute_ed2( + vector<vector<uint32_t>>& xs, + vector<vector<uint32_t>>& ys) { + vector<uint32_t> distances(xs.size()); + for (uint32_t i = 0; i < xs.size(); i++) { + vector<vector<uint32_t>> d = edit_distance2_with_dp(xs.at(i), ys.at(i)); + distances.at(i) = d.at(xs.at(i).size()).at(ys.at(i).size()); + } + return distances; +} + +vector<vector<vector<uint32_t>>> suggested_ed2_path( + vector<vector<uint32_t>>& xs, + 
vector<vector<uint32_t>>& ys, + uint32_t terminal_symbol) { + vector<vector<vector<uint32_t>>> seq(xs.size()); + for (uint32_t i = 0; i < xs.size(); i++) { + vector<vector<uint32_t>> d = edit_distance2_with_dp(xs.at(i), ys.at(i)); + seq.at(i) = + edit_distance2_backtracking(d, xs.at(i), ys.at(i), terminal_symbol); + } + return seq; +} + +vector<vector<vector<uint32_t>>> suggested_ed2_path_with_delete( + vector<vector<uint32_t>>& xs, + vector<vector<uint32_t>>& ys, + uint32_t terminal_symbol, + uint32_t deletion_symbol) { + vector<vector<vector<uint32_t>>> seq(xs.size()); + for (uint32_t i = 0; i < xs.size(); i++) { + vector<vector<uint32_t>> d = edit_distance2_with_dp(xs.at(i), ys.at(i)); + seq.at(i) = edit_distance2_backtracking_with_delete( + d, xs.at(i), ys.at(i), terminal_symbol, deletion_symbol); + } + return seq; +} + +PYBIND11_MODULE(libnat, m) { + m.def("compute_ed2", &compute_ed2, "compute_ed2"); + m.def("suggested_ed2_path", &suggested_ed2_path, "suggested_ed2_path"); + m.def( + "suggested_ed2_path_with_delete", + &suggested_ed2_path_with_delete, + "suggested_ed2_path_with_delete"); +} diff --git a/SpeechT5/fairseq/fairseq/clib/libnat_cuda/binding.cpp b/SpeechT5/fairseq/fairseq/clib/libnat_cuda/binding.cpp new file mode 100644 index 0000000000000000000000000000000000000000..aaa6244d5c6819acfae5f408280205661a3389ae --- /dev/null +++ b/SpeechT5/fairseq/fairseq/clib/libnat_cuda/binding.cpp @@ -0,0 +1,60 @@ +/** + * Copyright 2017-present, Facebook, Inc. + * All rights reserved. + * + * This source code is licensed under the license found in the + * LICENSE file in the root directory of this source tree. + */ + +/* + This code is partially adpoted from https://github.com/1ytic/pytorch-edit-distance + */ + +#include "edit_dist.h" +#include <torch/types.h> + +#ifndef TORCH_CHECK +#define TORCH_CHECK AT_CHECK +#endif + +#define CHECK_CUDA(x) TORCH_CHECK(x.type().is_cuda(), #x " must be a CUDA tensor") +#define CHECK_CONTIGUOUS(x) TORCH_CHECK(x.is_contiguous(), #x " must be contiguous") +#define CHECK_INPUT(x) CHECK_CUDA(x); CHECK_CONTIGUOUS(x) + + +torch::Tensor LevenshteinDistance( + torch::Tensor source, + torch::Tensor target, + torch::Tensor source_length, + torch::Tensor target_length) { + + CHECK_INPUT(source); + CHECK_INPUT(target); + CHECK_INPUT(source_length); + CHECK_INPUT(target_length); + return LevenshteinDistanceCuda(source, target, source_length, target_length); +} + +torch::Tensor GenerateDeletionLabel( + torch::Tensor source, + torch::Tensor operations) { + + CHECK_INPUT(source); + CHECK_INPUT(operations); + return GenerateDeletionLabelCuda(source, operations); +} + +std::pair<torch::Tensor, torch::Tensor> GenerateInsertionLabel( + torch::Tensor target, + torch::Tensor operations) { + + CHECK_INPUT(target); + CHECK_INPUT(operations); + return GenerateInsertionLabelCuda(target, operations); +} + +PYBIND11_MODULE(TORCH_EXTENSION_NAME, m) { + m.def("levenshtein_distance", &LevenshteinDistance, "Levenshtein distance"); + m.def("generate_deletion_labels", &GenerateDeletionLabel, "Generate Deletion Label"); + m.def("generate_insertion_labels", &GenerateInsertionLabel, "Generate Insertion Label"); +} diff --git a/SpeechT5/fairseq/fairseq/clib/libnat_cuda/edit_dist.cu b/SpeechT5/fairseq/fairseq/clib/libnat_cuda/edit_dist.cu new file mode 100644 index 0000000000000000000000000000000000000000..22de16b270851227348c43d6adfb763c8a325df6 --- /dev/null +++ b/SpeechT5/fairseq/fairseq/clib/libnat_cuda/edit_dist.cu @@ -0,0 +1,332 @@ +/** +* Copyright 2017-present, Facebook, Inc. 
+* All rights reserved. +* +* This source code is licensed under the license found in the +* LICENSE file in the root directory of this source tree. +*/ + +#include "edit_dist.h" +#include <THC/THC.h> +#include <cuda.h> +#include <cuda_runtime.h> +#include <device_launch_parameters.h> +#include <utility> // std::pair + +template <typename scalar_t> +__global__ void generate_deletion_label_kernel( + const scalar_t* __restrict__ source, + const size_t source_size, + const size_t operation_size, + int* __restrict__ operations, + int* __restrict__ labels) { + + const int index = blockIdx.x; + const int offset = index * operation_size; + const int offset_label = index * source_size; + + for (int i = 0; i < source_size; i++) { + labels[offset_label + i] = 0; + } + + int k = 0; + for (int i = 0; i < operation_size; i++){ + if (operations[offset + i] == 0){ + break; + } else if (operations[offset + i] == 1){ + continue; + } else { + labels[offset_label + k] = 3 - operations[offset + i]; + k++; + } + } +} + +template <typename scalar_t> +__global__ void generate_insertion_label_kernel( + const scalar_t* __restrict__ target, + const size_t target_size, + const size_t operation_size, + int* __restrict__ operations, + int* __restrict__ labels, + int* __restrict__ masks) { + + const int index = blockIdx.x; + const int offset = index * operation_size; + const int offset_label = index * target_size; + + int k = 0; + int u = 0; + int m = 0; + + for (int i = 0; i < target_size; i++) { + labels[offset_label + i] = 0; + masks[offset_label + i] = 0; + } + + for (int i = 0; i < operation_size-1; i++){ + if (operations[offset + i] == 0){ + break; + } else if (operations[offset + i] == 2){ + continue; + } else if (operations[offset + i] == 1){ + masks[offset_label + m] = 1; + u++; m++; + } else { + labels[offset_label + k] = u; + masks[offset_label + m] = 0; + k++; m++; + u = 0; + } + } +} + +template <typename scalar_t> +__global__ void levenshtein_distance_kernel( + const scalar_t* __restrict__ source, + const scalar_t* __restrict__ target, + const int* __restrict__ source_length, + const int* __restrict__ target_length, + const size_t source_size, + const size_t target_size, + int* __restrict__ operations, + int* __restrict__ errors_curr) { + + const int index = blockIdx.x; + const int offset = index * (source_size + target_size); + const int d = index * (source_size + 1) * (target_size + 1); + const int t = target_size + 1; + + auto err_idx = [d, t](int i, int j) { return d + i * t + j; }; + auto opt_idx = [offset](int k) { return offset + k; }; + + const int hyp_len = source_length[index]; + const int ref_len = target_length[index]; + const scalar_t* hyp_begin = source + index * source_size; + const scalar_t* ref_begin = target + index * target_size; + + // dynamic programming + for (int i = 0; i <= hyp_len; i++){ + errors_curr[err_idx(i, 0)] = i; + } + for (int j = 0; j <= ref_len; j++){ + errors_curr[err_idx(0, j)] = j; + } + for (int i = 1; i <= hyp_len; i++){ + for (int j = 1; j <= ref_len; j++){ + errors_curr[err_idx(i, j)] = min( + min( + errors_curr[err_idx(i-1, j)], + errors_curr[err_idx(i, j-1)] + ) + 1, + errors_curr[err_idx(i-1, j-1)] + 2 * ( + *(hyp_begin+i-1) == *(ref_begin+j-1) ? 
0 : 1 + ) + ); + } + } + + // back-tracing + int i = hyp_len; + int j = ref_len; + int o = hyp_len + ref_len; + + for (int k = 0; k < source_size + target_size; k++) { + operations[opt_idx(k)] = 0; + } + + while ((i >= 0) && (j >= 0)) { + if ((i == 0) && (j == 0)) { + break; + } + + if ((j > 0) && (errors_curr[err_idx(i, j-1)] < errors_curr[err_idx(i, j)])) { + o--; operations[opt_idx(o)] = 1; j--; // insertion + } else if ((i > 0) && (errors_curr[err_idx(i-1, j)] < errors_curr[err_idx(i, j)])) { + o--; operations[opt_idx(o)] = 2; i--; // deletion + } else { + o--; operations[opt_idx(o)] = 3; i--; j--; // do nothing + } + } + + // moving to the left + for (int k = 0; k < hyp_len + ref_len; k++) { + if (k + o < hyp_len + ref_len){ + operations[opt_idx(k)] = operations[opt_idx(k+o)]; + } else{ + operations[opt_idx(k)] = 0; // padding + } + } + +} + +template <typename scalar_t> +__global__ void faster_levenshtein_distance_kernel( + const scalar_t* __restrict__ source, + const scalar_t* __restrict__ target, + const int* __restrict__ source_length, + const int* __restrict__ target_length, + const size_t source_size, + const size_t target_size, + int* __restrict__ operations) { + + extern __shared__ short errors[]; + auto errors_curr = errors; + + const int index = blockIdx.x; + const int offset = index * (source_size + target_size); + const int t = target_size + 1; + + auto err_idx = [t](int i, int j) { return i * t + j; }; + auto opt_idx = [offset](int k) { return offset + k; }; + + const int hyp_len = source_length[index]; + const int ref_len = target_length[index]; + const scalar_t* hyp_begin = source + index * source_size; + const scalar_t* ref_begin = target + index * target_size; + + // dynamic programming + for (int i = 0; i <= hyp_len; i++){ + errors_curr[err_idx(i, 0)] = i; + } + for (int j = 0; j <= ref_len; j++){ + errors_curr[err_idx(0, j)] = j; + } + for (int i = 1; i <= hyp_len; i++){ + for (int j = 1; j <= ref_len; j++){ + errors_curr[err_idx(i, j)] = min( + min( + errors_curr[err_idx(i-1, j)], + errors_curr[err_idx(i, j-1)] + ) + 1, + errors_curr[err_idx(i-1, j-1)] + 2 * ( + *(hyp_begin+i-1) == *(ref_begin+j-1) ? 
0 : 1 + ) + ); + } + } + + // back-tracing + int i = hyp_len; + int j = ref_len; + int o = hyp_len + ref_len; + + for (int k = 0; k < source_size + target_size; k++) { + operations[opt_idx(k)] = 0; + } + + while ((i >= 0) && (j >= 0)) { + if ((i == 0) && (j == 0)) { + break; + } + + if ((j > 0) && (errors_curr[err_idx(i, j-1)] < errors_curr[err_idx(i, j)])) { + o--; operations[opt_idx(o)] = 1; j--; // insertion + } else if ((i > 0) && (errors_curr[err_idx(i-1, j)] < errors_curr[err_idx(i, j)])) { + o--; operations[opt_idx(o)] = 2; i--; // deletion + } else { + o--; operations[opt_idx(o)] = 3; i--; j--; // do nothing + } + } + + // moving to the left + for (int k = 0; k < hyp_len + ref_len; k++) { + if (k + o < hyp_len + ref_len){ + operations[opt_idx(k)] = operations[opt_idx(k+o)]; + } else{ + operations[opt_idx(k)] = 0; // padding + } + } + +} + + +torch::Tensor GenerateDeletionLabelCuda( + torch::Tensor source, + torch::Tensor operations) { + + const auto batch_size = source.size(0); + at::TensorOptions options(source.device()); + options = options.dtype(at::ScalarType::Int); + auto labels = torch::empty({batch_size, source.size(1)}, options); + auto stream = at::cuda::getCurrentCUDAStream(source.device().index()); + + AT_DISPATCH_ALL_TYPES(source.scalar_type(), "generate_deletion_labels", ([&] { + generate_deletion_label_kernel<scalar_t><<<batch_size, 1, 0, stream>>>( + source.data_ptr<scalar_t>(), + source.size(1), + operations.size(1), + operations.data_ptr<int>(), + labels.data_ptr<int>()); + })); + + return labels; +} + +std::pair<torch::Tensor, torch::Tensor> GenerateInsertionLabelCuda( + torch::Tensor target, + torch::Tensor operations) { + +const auto batch_size = target.size(0); +at::TensorOptions options(target.device()); +options = options.dtype(at::ScalarType::Int); +auto labels = torch::empty({batch_size, target.size(1)}, options); +auto masks = torch::empty({batch_size, target.size(1)}, options); +auto stream = at::cuda::getCurrentCUDAStream(target.device().index()); + +AT_DISPATCH_ALL_TYPES(target.scalar_type(), "generate_insertion_labels", ([&] { + generate_insertion_label_kernel<scalar_t><<<batch_size, 1, 0, stream>>>( + target.data_ptr<scalar_t>(), + target.size(1), + operations.size(1), + operations.data_ptr<int>(), + labels.data_ptr<int>(), + masks.data_ptr<int>()); +})); + +return std::make_pair(labels, masks); +} + + +torch::Tensor LevenshteinDistanceCuda( + torch::Tensor source, + torch::Tensor target, + torch::Tensor source_length, + torch::Tensor target_length) { + + const auto batch_size = source.size(0); + const auto shared_size = (source.size(1) + 1) * (target.size(1) + 1) * sizeof(short); + + at::TensorOptions options(source.device()); + options = options.dtype(at::ScalarType::Int); + auto operations = torch::empty({batch_size, source.size(1) + target.size(1)}, options); + auto stream = at::cuda::getCurrentCUDAStream(source.device().index()); + + if (shared_size > 40000) { + auto distances = torch::empty({batch_size, (source.size(1) + 1) * (target.size(1) + 1)}, options); + AT_DISPATCH_ALL_TYPES(source.scalar_type(), "levenshtein_distance", ([&] { + levenshtein_distance_kernel<scalar_t><<<batch_size, 1, 0, stream>>>( + source.data_ptr<scalar_t>(), + target.data_ptr<scalar_t>(), + source_length.data_ptr<int>(), + target_length.data_ptr<int>(), + source.size(1), + target.size(1), + operations.data_ptr<int>(), + distances.data_ptr<int>()); + })); + } else { + AT_DISPATCH_ALL_TYPES(source.scalar_type(), "faster_levenshtein_distance", ([&] { + 
faster_levenshtein_distance_kernel<scalar_t><<<batch_size, 1, shared_size, stream>>>( + source.data_ptr<scalar_t>(), + target.data_ptr<scalar_t>(), + source_length.data_ptr<int>(), + target_length.data_ptr<int>(), + source.size(1), + target.size(1), + operations.data_ptr<int>()); + })); + } + + return operations; +} diff --git a/SpeechT5/fairseq/fairseq/clib/libnat_cuda/edit_dist.h b/SpeechT5/fairseq/fairseq/clib/libnat_cuda/edit_dist.h new file mode 100644 index 0000000000000000000000000000000000000000..e3506cd34ddaa35bb724fe64a459bad8046b9a34 --- /dev/null +++ b/SpeechT5/fairseq/fairseq/clib/libnat_cuda/edit_dist.h @@ -0,0 +1,25 @@ +/** + * Copyright 2017-present, Facebook, Inc. + * All rights reserved. + * + * This source code is licensed under the license found in the + * LICENSE file in the root directory of this source tree. + */ + +#pragma once + +#include <torch/extension.h> + +torch::Tensor LevenshteinDistanceCuda( + torch::Tensor source, + torch::Tensor target, + torch::Tensor source_length, + torch::Tensor target_length); + +torch::Tensor GenerateDeletionLabelCuda( + torch::Tensor source, + torch::Tensor operations); + +std::pair<torch::Tensor, torch::Tensor> GenerateInsertionLabelCuda( + torch::Tensor source, + torch::Tensor operations); diff --git a/SpeechT5/fairseq/fairseq/config/__init__.py b/SpeechT5/fairseq/fairseq/config/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..6264236915a7269a4d920ee8213004374dd86a9a --- /dev/null +++ b/SpeechT5/fairseq/fairseq/config/__init__.py @@ -0,0 +1,4 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. diff --git a/SpeechT5/fairseq/fairseq/config/config.yaml b/SpeechT5/fairseq/fairseq/config/config.yaml new file mode 100644 index 0000000000000000000000000000000000000000..e20d914b9b1620b21e702f1114aaf1131c3f6c55 --- /dev/null +++ b/SpeechT5/fairseq/fairseq/config/config.yaml @@ -0,0 +1,18 @@ +# @package _group_ + +hydra: + run: + dir: . 
+ +defaults: + - task: null + - model: null + - criterion: cross_entropy + - optimizer: null + - lr_scheduler: fixed + - bpe: null + - tokenizer: null + - scoring: null + - generation: null + - common_eval: null + - eval_lm: null diff --git a/SpeechT5/fairseq/fairseq/config/model/transformer_lm/transformer_lm_baevski_gbw.yaml b/SpeechT5/fairseq/fairseq/config/model/transformer_lm/transformer_lm_baevski_gbw.yaml new file mode 100644 index 0000000000000000000000000000000000000000..30b1a4f1e0f5e7f7c2671ff8ec995cc32363f10f --- /dev/null +++ b/SpeechT5/fairseq/fairseq/config/model/transformer_lm/transformer_lm_baevski_gbw.yaml @@ -0,0 +1,36 @@ +# @package _group_ +activation_fn: "relu" +dropout: 0.1 +attention_dropout: 0.1 +activation_dropout: 0.0 +relu_dropout: 0.0 +decoder_embed_dim: 512 +decoder_output_dim: 512 +decoder_input_dim: 512 +decoder_ffn_embed_dim: 4096 +decoder_layers: 12 +decoder_attention_heads: 16 +decoder_normalize_before: true +no_decoder_final_norm: true +adaptive_softmax_cutoff: null +adaptive_softmax_dropout: 0 +adaptive_softmax_factor: 4 +no_token_positional_embeddings: false +share_decoder_input_output_embed: false +character_embeddings: false +character_filters: "[(1, 64), (2, 128), (3, 192), (4, 256), (5, 256), (6, 256), (7, 256)]" +character_embedding_dim: 4 +char_embedder_highway_layers: 2 +adaptive_input: false +adaptive_input_factor: 4 +adaptive_input_cutoff: null +tie_adaptive_weights: false +tie_adaptive_proj: false +decoder_learned_pos: false +decoder_layerdrop: 0 +decoder_layers_to_keep: null +layernorm_embedding: false +no_scale_embedding: false +quant_noise_pq: 0 +quant_noise_pq_block_size: 8 +quant_noise_scalar: 0 diff --git a/SpeechT5/fairseq/fairseq/config/model/transformer_lm/transformer_lm_baevski_wiki103.yaml b/SpeechT5/fairseq/fairseq/config/model/transformer_lm/transformer_lm_baevski_wiki103.yaml new file mode 100644 index 0000000000000000000000000000000000000000..1154cfa660ee5ce6a272cd1a0049eead1e92c117 --- /dev/null +++ b/SpeechT5/fairseq/fairseq/config/model/transformer_lm/transformer_lm_baevski_wiki103.yaml @@ -0,0 +1,36 @@ +# @package _group_ +activation_fn: "relu" +dropout: 0.3 +attention_dropout: 0.1 +activation_dropout: 0.1 +relu_dropout: 0.1 +decoder_embed_dim: 1024 +decoder_output_dim: 1024 +decoder_input_dim: 1024 +decoder_ffn_embed_dim: 4096 +decoder_layers: 16 +decoder_attention_heads: 8 +decoder_normalize_before: true +no_decoder_final_norm: true +adaptive_softmax_cutoff: "20000,60000" +adaptive_softmax_dropout: 0.2 +adaptive_softmax_factor: 4 +no_token_positional_embeddings: false +share_decoder_input_output_embed: false +character_embeddings: false +character_filters: "[(1, 64), (2, 128), (3, 192), (4, 256), (5, 256), (6, 256), (7, 256)]" +character_embedding_dim: 4 +char_embedder_highway_layers: 2 +adaptive_input: true +adaptive_input_factor: 4 +adaptive_input_cutoff: "20000,60000" +tie_adaptive_weights: true +tie_adaptive_proj: true +decoder_learned_pos: false +decoder_layerdrop: 0 +decoder_layers_to_keep: null +layernorm_embedding: false +no_scale_embedding: false +quant_noise_pq: 0 +quant_noise_pq_block_size: 8 +quant_noise_scalar: 0 diff --git a/SpeechT5/fairseq/fairseq/config/model/transformer_lm/transformer_lm_big.yaml b/SpeechT5/fairseq/fairseq/config/model/transformer_lm/transformer_lm_big.yaml new file mode 100644 index 0000000000000000000000000000000000000000..309575310bfc5d9c5cde31563073bef18abc646e --- /dev/null +++ b/SpeechT5/fairseq/fairseq/config/model/transformer_lm/transformer_lm_big.yaml @@ -0,0 +1,36 @@ +# @package 
_group_ +activation_fn: "relu" +dropout: 0.1 +attention_dropout: 0.0 +activation_dropout: 0.0 +relu_dropout: 0.0 +decoder_embed_dim: 1024 +decoder_output_dim: 1024 +decoder_input_dim: 1024 +decoder_ffn_embed_dim: 4096 +decoder_layers: 12 +decoder_attention_heads: 16 +decoder_normalize_before: true +no_decoder_final_norm: false +adaptive_softmax_cutoff: null +adaptive_softmax_dropout: 0 +adaptive_softmax_factor: 4 +no_token_positional_embeddings: false +share_decoder_input_output_embed: false +character_embeddings: false +character_filters: "[(1, 64), (2, 128), (3, 192), (4, 256), (5, 256), (6, 256), (7, 256)]" +character_embedding_dim: 4 +char_embedder_highway_layers: 2 +adaptive_input: false +adaptive_input_factor: 4 +adaptive_input_cutoff: null +tie_adaptive_weights: false +tie_adaptive_proj: false +decoder_learned_pos: false +decoder_layerdrop: 0 +decoder_layers_to_keep: null +layernorm_embedding: false +no_scale_embedding: false +quant_noise_pq: 0 +quant_noise_pq_block_size: 8 +quant_noise_scalar: 0 diff --git a/SpeechT5/fairseq/fairseq/config/model/transformer_lm/transformer_lm_gbw.yaml b/SpeechT5/fairseq/fairseq/config/model/transformer_lm/transformer_lm_gbw.yaml new file mode 100644 index 0000000000000000000000000000000000000000..30b1a4f1e0f5e7f7c2671ff8ec995cc32363f10f --- /dev/null +++ b/SpeechT5/fairseq/fairseq/config/model/transformer_lm/transformer_lm_gbw.yaml @@ -0,0 +1,36 @@ +# @package _group_ +activation_fn: "relu" +dropout: 0.1 +attention_dropout: 0.1 +activation_dropout: 0.0 +relu_dropout: 0.0 +decoder_embed_dim: 512 +decoder_output_dim: 512 +decoder_input_dim: 512 +decoder_ffn_embed_dim: 4096 +decoder_layers: 12 +decoder_attention_heads: 16 +decoder_normalize_before: true +no_decoder_final_norm: true +adaptive_softmax_cutoff: null +adaptive_softmax_dropout: 0 +adaptive_softmax_factor: 4 +no_token_positional_embeddings: false +share_decoder_input_output_embed: false +character_embeddings: false +character_filters: "[(1, 64), (2, 128), (3, 192), (4, 256), (5, 256), (6, 256), (7, 256)]" +character_embedding_dim: 4 +char_embedder_highway_layers: 2 +adaptive_input: false +adaptive_input_factor: 4 +adaptive_input_cutoff: null +tie_adaptive_weights: false +tie_adaptive_proj: false +decoder_learned_pos: false +decoder_layerdrop: 0 +decoder_layers_to_keep: null +layernorm_embedding: false +no_scale_embedding: false +quant_noise_pq: 0 +quant_noise_pq_block_size: 8 +quant_noise_scalar: 0 diff --git a/SpeechT5/fairseq/fairseq/config/model/transformer_lm/transformer_lm_gpt.yaml b/SpeechT5/fairseq/fairseq/config/model/transformer_lm/transformer_lm_gpt.yaml new file mode 100644 index 0000000000000000000000000000000000000000..2c6cb7be3801115371566932ffc78651c9ac6c0f --- /dev/null +++ b/SpeechT5/fairseq/fairseq/config/model/transformer_lm/transformer_lm_gpt.yaml @@ -0,0 +1,36 @@ +# @package _group_ +activation_fn: "gelu" +dropout: 0.1 +attention_dropout: 0.1 +activation_dropout: 0.0 +relu_dropout: 0.0 +decoder_embed_dim: 768 +decoder_output_dim: 768 +decoder_input_dim: 768 +decoder_ffn_embed_dim: 3072 +decoder_layers: 12 +decoder_attention_heads: 12 +decoder_normalize_before: true +no_decoder_final_norm: false +adaptive_softmax_cutoff: null +adaptive_softmax_dropout: 0 +adaptive_softmax_factor: 4 +no_token_positional_embeddings: false +share_decoder_input_output_embed: false +character_embeddings: false +character_filters: "[(1, 64), (2, 128), (3, 192), (4, 256), (5, 256), (6, 256), (7, 256)]" +character_embedding_dim: 4 +char_embedder_highway_layers: 2 +adaptive_input: false 
+adaptive_input_factor: 4 +adaptive_input_cutoff: null +tie_adaptive_weights: false +tie_adaptive_proj: false +decoder_learned_pos: false +decoder_layerdrop: 0 +decoder_layers_to_keep: null +layernorm_embedding: false +no_scale_embedding: false +quant_noise_pq: 0 +quant_noise_pq_block_size: 8 +quant_noise_scalar: 0 diff --git a/SpeechT5/fairseq/fairseq/config/model/transformer_lm/transformer_lm_gpt2_big.yaml b/SpeechT5/fairseq/fairseq/config/model/transformer_lm/transformer_lm_gpt2_big.yaml new file mode 100644 index 0000000000000000000000000000000000000000..a08769a1781abdb13302bf57bf1338bcaf68a0ec --- /dev/null +++ b/SpeechT5/fairseq/fairseq/config/model/transformer_lm/transformer_lm_gpt2_big.yaml @@ -0,0 +1,36 @@ +# @package _group_ +activation_fn: "gelu" +dropout: 0.1 +attention_dropout: 0.1 +activation_dropout: 0.0 +relu_dropout: 0.0 +decoder_embed_dim: 1600 +decoder_output_dim: 1600 +decoder_input_dim: 1600 +decoder_ffn_embed_dim: 6400 +decoder_layers: 48 +decoder_attention_heads: 25 +decoder_normalize_before: true +no_decoder_final_norm: false +adaptive_softmax_cutoff: null +adaptive_softmax_dropout: 0 +adaptive_softmax_factor: 4 +no_token_positional_embeddings: false +share_decoder_input_output_embed: false +character_embeddings: false +character_filters: "[(1, 64), (2, 128), (3, 192), (4, 256), (5, 256), (6, 256), (7, 256)]" +character_embedding_dim: 4 +char_embedder_highway_layers: 2 +adaptive_input: false +adaptive_input_factor: 4 +adaptive_input_cutoff: null +tie_adaptive_weights: false +tie_adaptive_proj: false +decoder_learned_pos: false +decoder_layerdrop: 0 +decoder_layers_to_keep: null +layernorm_embedding: false +no_scale_embedding: false +quant_noise_pq: 0 +quant_noise_pq_block_size: 8 +quant_noise_scalar: 0 diff --git a/SpeechT5/fairseq/fairseq/config/model/transformer_lm/transformer_lm_gpt2_medium.yaml b/SpeechT5/fairseq/fairseq/config/model/transformer_lm/transformer_lm_gpt2_medium.yaml new file mode 100644 index 0000000000000000000000000000000000000000..64261d793c0f1ae091c9bf5c8c77093a07326137 --- /dev/null +++ b/SpeechT5/fairseq/fairseq/config/model/transformer_lm/transformer_lm_gpt2_medium.yaml @@ -0,0 +1,36 @@ +# @package _group_ +activation_fn: "gelu" +dropout: 0.1 +attention_dropout: 0.1 +activation_dropout: 0.0 +relu_dropout: 0.0 +decoder_embed_dim: 1280 +decoder_output_dim: 1280 +decoder_input_dim: 1280 +decoder_ffn_embed_dim: 5120 +decoder_layers: 36 +decoder_attention_heads: 20 +decoder_normalize_before: true +no_decoder_final_norm: false +adaptive_softmax_cutoff: null +adaptive_softmax_dropout: 0 +adaptive_softmax_factor: 4 +no_token_positional_embeddings: false +share_decoder_input_output_embed: false +character_embeddings: false +character_filters: "[(1, 64), (2, 128), (3, 192), (4, 256), (5, 256), (6, 256), (7, 256)]" +character_embedding_dim: 4 +char_embedder_highway_layers: 2 +adaptive_input: false +adaptive_input_factor: 4 +adaptive_input_cutoff: null +tie_adaptive_weights: false +tie_adaptive_proj: false +decoder_learned_pos: false +decoder_layerdrop: 0 +decoder_layers_to_keep: null +layernorm_embedding: false +no_scale_embedding: false +quant_noise_pq: 0 +quant_noise_pq_block_size: 8 +quant_noise_scalar: 0 diff --git a/SpeechT5/fairseq/fairseq/config/model/transformer_lm/transformer_lm_gpt2_small.yaml b/SpeechT5/fairseq/fairseq/config/model/transformer_lm/transformer_lm_gpt2_small.yaml new file mode 100644 index 0000000000000000000000000000000000000000..702e81f466c82edf40433589d389edbe0a7b96db --- /dev/null +++ 
b/SpeechT5/fairseq/fairseq/config/model/transformer_lm/transformer_lm_gpt2_small.yaml @@ -0,0 +1,36 @@ +# @package _group_ +activation_fn: "gelu" +dropout: 0.1 +attention_dropout: 0.1 +activation_dropout: 0.0 +relu_dropout: 0.0 +decoder_embed_dim: 1024 +decoder_output_dim: 1024 +decoder_input_dim: 1024 +decoder_ffn_embed_dim: 4096 +decoder_layers: 24 +decoder_attention_heads: 16 +decoder_normalize_before: true +no_decoder_final_norm: false +adaptive_softmax_cutoff: null +adaptive_softmax_dropout: 0 +adaptive_softmax_factor: 4 +no_token_positional_embeddings: false +share_decoder_input_output_embed: false +character_embeddings: false +character_filters: "[(1, 64), (2, 128), (3, 192), (4, 256), (5, 256), (6, 256), (7, 256)]" +character_embedding_dim: 4 +char_embedder_highway_layers: 2 +adaptive_input: false +adaptive_input_factor: 4 +adaptive_input_cutoff: null +tie_adaptive_weights: false +tie_adaptive_proj: false +decoder_learned_pos: false +decoder_layerdrop: 0 +decoder_layers_to_keep: null +layernorm_embedding: false +no_scale_embedding: false +quant_noise_pq: 0 +quant_noise_pq_block_size: 8 +quant_noise_scalar: 0 diff --git a/SpeechT5/fairseq/fairseq/config/model/transformer_lm/transformer_lm_wiki103.yaml b/SpeechT5/fairseq/fairseq/config/model/transformer_lm/transformer_lm_wiki103.yaml new file mode 100644 index 0000000000000000000000000000000000000000..1154cfa660ee5ce6a272cd1a0049eead1e92c117 --- /dev/null +++ b/SpeechT5/fairseq/fairseq/config/model/transformer_lm/transformer_lm_wiki103.yaml @@ -0,0 +1,36 @@ +# @package _group_ +activation_fn: "relu" +dropout: 0.3 +attention_dropout: 0.1 +activation_dropout: 0.1 +relu_dropout: 0.1 +decoder_embed_dim: 1024 +decoder_output_dim: 1024 +decoder_input_dim: 1024 +decoder_ffn_embed_dim: 4096 +decoder_layers: 16 +decoder_attention_heads: 8 +decoder_normalize_before: true +no_decoder_final_norm: true +adaptive_softmax_cutoff: "20000,60000" +adaptive_softmax_dropout: 0.2 +adaptive_softmax_factor: 4 +no_token_positional_embeddings: false +share_decoder_input_output_embed: false +character_embeddings: false +character_filters: "[(1, 64), (2, 128), (3, 192), (4, 256), (5, 256), (6, 256), (7, 256)]" +character_embedding_dim: 4 +char_embedder_highway_layers: 2 +adaptive_input: true +adaptive_input_factor: 4 +adaptive_input_cutoff: "20000,60000" +tie_adaptive_weights: true +tie_adaptive_proj: true +decoder_learned_pos: false +decoder_layerdrop: 0 +decoder_layers_to_keep: null +layernorm_embedding: false +no_scale_embedding: false +quant_noise_pq: 0 +quant_noise_pq_block_size: 8 +quant_noise_scalar: 0 diff --git a/SpeechT5/fairseq/fairseq/config/model/wav2vec/vq_wav2vec_gumbel.yaml b/SpeechT5/fairseq/fairseq/config/model/wav2vec/vq_wav2vec_gumbel.yaml new file mode 100644 index 0000000000000000000000000000000000000000..ee1329bf4612d8bb295c6cc3d8bc0a3bcef1777d --- /dev/null +++ b/SpeechT5/fairseq/fairseq/config/model/wav2vec/vq_wav2vec_gumbel.yaml @@ -0,0 +1,5 @@ +# @package _group_ +activation: gelu +vq_type: gumbel +vq_depth: 2 +combine_groups: true diff --git a/SpeechT5/fairseq/fairseq/config/model/wav2vec2/wav2vec2_base.yaml b/SpeechT5/fairseq/fairseq/config/model/wav2vec2/wav2vec2_base.yaml new file mode 100644 index 0000000000000000000000000000000000000000..ce65499b808b9a3821cee4ca87c36e84d09005a1 --- /dev/null +++ b/SpeechT5/fairseq/fairseq/config/model/wav2vec2/wav2vec2_base.yaml @@ -0,0 +1,8 @@ +# @package _group_ + +quantize_targets: true +final_dim: 256 +encoder_layerdrop: 0.05 +dropout_input: 0.1 +dropout_features: 0.1 +feature_grad_mult: 
0.1 diff --git a/SpeechT5/fairseq/fairseq/config/model/wav2vec2/wav2vec2_large.yaml b/SpeechT5/fairseq/fairseq/config/model/wav2vec2/wav2vec2_large.yaml new file mode 100644 index 0000000000000000000000000000000000000000..5846f75243f27f201c85bfe6820815c015971275 --- /dev/null +++ b/SpeechT5/fairseq/fairseq/config/model/wav2vec2/wav2vec2_large.yaml @@ -0,0 +1,20 @@ +# @package _group_ + +quantize_targets: true +extractor_mode: layer_norm +layer_norm_first: true +final_dim: 768 +latent_temp: [2.0,0.1,0.999995] +encoder_layerdrop: 0.0 +dropout_input: 0.0 +dropout_features: 0.0 +dropout: 0.0 +attention_dropout: 0.0 +conv_bias: true + +encoder_layers: 24 +encoder_embed_dim: 1024 +encoder_ffn_embed_dim: 4096 +encoder_attention_heads: 16 + +feature_grad_mult: 1.0 diff --git a/SpeechT5/fairseq/fairseq/criterions/__init__.py b/SpeechT5/fairseq/fairseq/criterions/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..4dbf46a1cb31ce65c4224ae79cbc2d7cf9e4d111 --- /dev/null +++ b/SpeechT5/fairseq/fairseq/criterions/__init__.py @@ -0,0 +1,36 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. +"""isort:skip_file""" + +import importlib +import os + +from fairseq import registry +from fairseq.criterions.fairseq_criterion import ( # noqa + FairseqCriterion, + LegacyFairseqCriterion, +) +from omegaconf import DictConfig + + +( + build_criterion_, + register_criterion, + CRITERION_REGISTRY, + CRITERION_DATACLASS_REGISTRY, +) = registry.setup_registry( + "--criterion", base_class=FairseqCriterion, default="cross_entropy" +) + + +def build_criterion(cfg: DictConfig, task): + return build_criterion_(cfg, task) + + +# automatically import any Python files in the criterions/ directory +for file in sorted(os.listdir(os.path.dirname(__file__))): + if file.endswith(".py") and not file.startswith("_"): + file_name = file[: file.find(".py")] + importlib.import_module("fairseq.criterions." + file_name) diff --git a/SpeechT5/fairseq/fairseq/criterions/adaptive_loss.py b/SpeechT5/fairseq/fairseq/criterions/adaptive_loss.py new file mode 100644 index 0000000000000000000000000000000000000000..6209ceaedb6d8120ad820c11b55c13596447933c --- /dev/null +++ b/SpeechT5/fairseq/fairseq/criterions/adaptive_loss.py @@ -0,0 +1,123 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. 
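For orientation while reading the `AdaptiveLoss` criterion added in this file: its forward pass asks the model's adaptive softmax for per-cluster `(logits, target)` pairs and sums a cross-entropy term over every non-empty cluster. A minimal sketch of that accumulation, using made-up cluster shapes in place of fairseq's `AdaptiveSoftmax` output (shapes and class counts here are illustrative only):

```python
import torch
import torch.nn.functional as F

# Hypothetical adaptive-softmax output: one (logits, targets) pair per cluster.
# A (None, None) entry means no token in the batch was routed to that cluster.
head = (torch.randn(6, 12), torch.randint(0, 12, (6,)))   # head cluster: 6 tokens, 12 classes
tail = (torch.randn(2, 50), torch.randint(0, 50, (2,)))   # one tail cluster: 2 tokens, 50 classes
clusters = [head, tail, (None, None)]

loss = torch.zeros(())
for logits, target in clusters:
    if target is not None:
        # Same reduction as AdaptiveLoss.forward below: summed cross-entropy per cluster.
        loss = loss + F.cross_entropy(logits, target, reduction="sum")

print(loss.item())
```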
+ +import math +from dataclasses import dataclass + +import torch.nn.functional as F +from fairseq import metrics, utils +from fairseq.criterions import FairseqCriterion, register_criterion +from fairseq.dataclass import FairseqDataclass +from fairseq.dataclass.constants import DDP_BACKEND_CHOICES +from omegaconf import II + + +@dataclass +class AdaptiveLossConfig(FairseqDataclass): + sentence_avg: bool = II("optimization.sentence_avg") + ddp_backend: DDP_BACKEND_CHOICES = II("distributed_training.ddp_backend") + + +@register_criterion("adaptive_loss", dataclass=AdaptiveLossConfig) +class AdaptiveLoss(FairseqCriterion): + """This is an implementation of the loss function accompanying the adaptive softmax approximation for + graphical processing units (GPU), described in the paper "Efficient softmax approximation for GPUs" + (http://arxiv.org/abs/1609.04309).""" + + def __init__(self, task, sentence_avg): + super().__init__(task) + self.sentence_avg = sentence_avg + + @classmethod + def build_criterion(cls, cfg: AdaptiveLossConfig, task): + if cfg.ddp_backend in {"c10d", "pytorch_ddp"}: + raise Exception( + "AdaptiveLoss is not compatible with the PyTorch " + "version of DistributedDataParallel. Please use " + "`--ddp-backend=legacy_ddp` instead." + ) + return cls(task, cfg.sentence_avg) + + def forward(self, model, sample, reduce=True): + """Compute the loss for the given sample. + + Returns a tuple with three elements: + 1) the loss + 2) the sample size, which is used as the denominator for the gradient + 3) logging outputs to display while training + """ + + assert ( + hasattr(model.decoder, "adaptive_softmax") + and model.decoder.adaptive_softmax is not None + ) + adaptive_softmax = model.decoder.adaptive_softmax + + net_output = model(**sample["net_input"]) + orig_target = model.get_targets(sample, net_output) + + nsentences = orig_target.size(0) + orig_target = orig_target.view(-1) + + bsz = orig_target.size(0) + + logits, target = adaptive_softmax(net_output[0], orig_target) + assert len(target) == len(logits) + + loss = net_output[0].new(1 if reduce else bsz).zero_() + + for i in range(len(target)): + if target[i] is not None: + assert target[i].min() >= 0 and target[i].max() <= logits[i].size(1) + loss += F.cross_entropy( + logits[i], + target[i], + ignore_index=self.padding_idx, + reduction="sum" if reduce else "none", + ) + + orig = utils.strip_pad(orig_target, self.padding_idx) + ntokens = orig.numel() + sample_size = sample["target"].size(0) if self.sentence_avg else ntokens + logging_output = { + "loss": loss.data, + "ntokens": ntokens, + "nsentences": nsentences, + "sample_size": sample_size, + } + return loss, sample_size, logging_output + + @staticmethod + def reduce_metrics(logging_outputs) -> None: + """Aggregate logging outputs from data parallel training.""" + loss_sum = utils.item(sum(log.get("loss", 0) for log in logging_outputs)) + ntokens = utils.item(sum(log.get("ntokens", 0) for log in logging_outputs)) + sample_size = utils.item( + sum(log.get("sample_size", 0) for log in logging_outputs) + ) + + metrics.log_scalar( + "loss", loss_sum / sample_size / math.log(2), sample_size, round=3 + ) + if sample_size != ntokens: + metrics.log_scalar( + "nll_loss", loss_sum / ntokens / math.log(2), ntokens, round=3 + ) + metrics.log_derived( + "ppl", lambda meters: utils.get_perplexity(meters["nll_loss"].avg) + ) + else: + metrics.log_derived( + "ppl", lambda meters: utils.get_perplexity(meters["loss"].avg) + ) + + @staticmethod + def logging_outputs_can_be_summed() -> bool: + 
""" + Whether the logging outputs returned by `forward` can be summed + across workers prior to calling `reduce_metrics`. Setting this + to True will improves distributed training speed. + """ + return True diff --git a/SpeechT5/fairseq/fairseq/criterions/composite_loss.py b/SpeechT5/fairseq/fairseq/criterions/composite_loss.py new file mode 100644 index 0000000000000000000000000000000000000000..98e835fa6e4c0bcad062df9c519701bf795c98be --- /dev/null +++ b/SpeechT5/fairseq/fairseq/criterions/composite_loss.py @@ -0,0 +1,100 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +from fairseq import utils +from fairseq.criterions import LegacyFairseqCriterion, register_criterion +from torch import nn + + +@register_criterion("composite_loss") +class CompositeLoss(LegacyFairseqCriterion): + """This is a composite loss that, given a list of model outputs and a list of targets, + computes an average of losses for each output-target pair""" + + def __init__(self, args, task): + super().__init__(args, task) + self.underlying_criterion = args.underlying_criterion + + @staticmethod + def add_args(parser): + """Add criterion-specific arguments to the parser.""" + # fmt: off + parser.add_argument('--underlying-criterion', type=str, metavar='VAL', required=True, + help='underlying criterion to use for the composite loss') + # fmt: on + + @staticmethod + def build_underlying_criterion(args, task): + saved_criterion = args.criterion + args.criterion = args.underlying_criterion + assert saved_criterion != args.underlying_criterion + underlying_criterion = task.build_criterion(args) + args.criterion = saved_criterion + return underlying_criterion + + @classmethod + def build_criterion(cls, args, task): + underlying_criterion = CompositeLoss.build_underlying_criterion(args, task) + + class FakeModel(nn.Module): + def __init__(self, model, net_out, target): + super().__init__() + self.model = model + self.net_out = net_out + self.target = target + + def forward(self, **unused): + return self.net_out + + def get_normalized_probs(self, net_output, log_probs, sample=None): + return self.model.get_normalized_probs( + net_output, log_probs, sample=sample + ) + + def get_targets(self, *unused): + return self.target + + @property + def decoder(self): + return self.model.decoder + + class _CompositeLoss(LegacyFairseqCriterion): + def __init__(self, args, task, underlying_criterion): + super().__init__(args, task) + self.underlying_criterion = underlying_criterion + + def forward(self, model, sample, reduce=True): + net_outputs = model(**sample["net_input"]) + targets = sample["target"] + + bsz = targets[0].size(0) + loss = net_outputs[0][0].new(1 if reduce else bsz).float().zero_() + + sample_size = 0 + logging_output = {} + for o, t in zip(net_outputs[0], targets): + m = FakeModel(model, (o, net_outputs[1]), t) + sample["target"] = t + l, ss, logging_output = self.underlying_criterion(m, sample, reduce) + loss += l + sample_size += ss + + loss.div_(len(targets)) + sample_size /= len(targets) + + logging_output["loss"] = utils.item(loss.data) if reduce else loss.data + return loss, sample_size, logging_output + + @staticmethod + def aggregate_logging_outputs(logging_outputs): + return underlying_criterion.__class__.aggregate_logging_outputs( + logging_outputs + ) + + @staticmethod + def reduce_metrics(logging_outputs) -> None: + 
underlying_criterion.__class__.reduce_metrics(logging_outputs) + + return _CompositeLoss(args, task, underlying_criterion) diff --git a/SpeechT5/fairseq/fairseq/criterions/cross_entropy.py b/SpeechT5/fairseq/fairseq/criterions/cross_entropy.py new file mode 100644 index 0000000000000000000000000000000000000000..fe461064716b38ecf2eb610daddbb609a1884e6b --- /dev/null +++ b/SpeechT5/fairseq/fairseq/criterions/cross_entropy.py @@ -0,0 +1,90 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +import math +from dataclasses import dataclass + +import torch.nn.functional as F +from fairseq import metrics, utils +from fairseq.criterions import FairseqCriterion, register_criterion +from fairseq.dataclass import FairseqDataclass +from omegaconf import II + + +@dataclass +class CrossEntropyCriterionConfig(FairseqDataclass): + sentence_avg: bool = II("optimization.sentence_avg") + + +@register_criterion("cross_entropy", dataclass=CrossEntropyCriterionConfig) +class CrossEntropyCriterion(FairseqCriterion): + def __init__(self, task, sentence_avg): + super().__init__(task) + self.sentence_avg = sentence_avg + + def forward(self, model, sample, reduce=True): + """Compute the loss for the given sample. + + Returns a tuple with three elements: + 1) the loss + 2) the sample size, which is used as the denominator for the gradient + 3) logging outputs to display while training + """ + net_output = model(**sample["net_input"]) + loss, _ = self.compute_loss(model, net_output, sample, reduce=reduce) + sample_size = ( + sample["target"].size(0) if self.sentence_avg else sample["ntokens"] + ) + logging_output = { + "loss": loss.data, + "ntokens": sample["ntokens"], + "nsentences": sample["target"].size(0), + "sample_size": sample_size, + } + return loss, sample_size, logging_output + + def compute_loss(self, model, net_output, sample, reduce=True): + lprobs = model.get_normalized_probs(net_output, log_probs=True) + lprobs = lprobs.view(-1, lprobs.size(-1)) + target = model.get_targets(sample, net_output).view(-1) + loss = F.nll_loss( + lprobs, + target, + ignore_index=self.padding_idx, + reduction="sum" if reduce else "none", + ) + return loss, loss + + @staticmethod + def reduce_metrics(logging_outputs) -> None: + """Aggregate logging outputs from data parallel training.""" + loss_sum = sum(log.get("loss", 0) for log in logging_outputs) + ntokens = sum(log.get("ntokens", 0) for log in logging_outputs) + sample_size = sum(log.get("sample_size", 0) for log in logging_outputs) + + # we divide by log(2) to convert the loss from base e to base 2 + metrics.log_scalar( + "loss", loss_sum / sample_size / math.log(2), sample_size, round=3 + ) + if sample_size != ntokens: + metrics.log_scalar( + "nll_loss", loss_sum / ntokens / math.log(2), ntokens, round=3 + ) + metrics.log_derived( + "ppl", lambda meters: utils.get_perplexity(meters["nll_loss"].avg) + ) + else: + metrics.log_derived( + "ppl", lambda meters: utils.get_perplexity(meters["loss"].avg) + ) + + @staticmethod + def logging_outputs_can_be_summed() -> bool: + """ + Whether the logging outputs returned by `forward` can be summed + across workers prior to calling `reduce_metrics`. Setting this + to True will improves distributed training speed. 
+ """ + return True diff --git a/SpeechT5/fairseq/fairseq/criterions/ctc.py b/SpeechT5/fairseq/fairseq/criterions/ctc.py new file mode 100644 index 0000000000000000000000000000000000000000..10e3618382c86a84466cb4264d62f31537980251 --- /dev/null +++ b/SpeechT5/fairseq/fairseq/criterions/ctc.py @@ -0,0 +1,295 @@ +# All rights reserved. +# +# This source code is licensed under the license found in the LICENSE file in +# the root directory of this source tree. An additional grant of patent rights +# can be found in the PATENTS file in the same directory. + +import math +from argparse import Namespace +from dataclasses import dataclass, field +from omegaconf import II +from typing import Optional + +import torch +import torch.nn.functional as F +from fairseq import metrics, utils +from fairseq.criterions import FairseqCriterion, register_criterion +from fairseq.dataclass import FairseqDataclass +from fairseq.data.data_utils import post_process +from fairseq.tasks import FairseqTask +from fairseq.logging.meters import safe_round + + +@dataclass +class CtcCriterionConfig(FairseqDataclass): + zero_infinity: bool = field( + default=False, + metadata={"help": "zero inf loss when source length <= target length"}, + ) + sentence_avg: bool = II("optimization.sentence_avg") + post_process: str = field( + default="letter", + metadata={ + "help": "how to post process predictions into words. can be letter, " + "wordpiece, BPE symbols, etc. " + "See fairseq.data.data_utils.post_process() for full list of options" + }, + ) + wer_kenlm_model: Optional[str] = field( + default=None, + metadata={ + "help": "if this is provided, use kenlm to compute wer (along with other wer_* args)" + }, + ) + wer_lexicon: Optional[str] = field( + default=None, + metadata={"help": "lexicon to use with wer_kenlm_model"}, + ) + wer_lm_weight: float = field( + default=2.0, + metadata={"help": "lm weight to use with wer_kenlm_model"}, + ) + wer_word_score: float = field( + default=-1.0, + metadata={"help": "lm word score to use with wer_kenlm_model"}, + ) + + wer_args: Optional[str] = field( + default=None, + metadata={ + "help": "DEPRECATED: tuple of (wer_kenlm_model, wer_lexicon, wer_lm_weight, wer_word_score)" + }, + ) + + +@register_criterion("ctc", dataclass=CtcCriterionConfig) +class CtcCriterion(FairseqCriterion): + def __init__(self, cfg: CtcCriterionConfig, task: FairseqTask): + super().__init__(task) + self.blank_idx = ( + task.target_dictionary.index(task.blank_symbol) + if hasattr(task, "blank_symbol") + else 0 + ) + self.pad_idx = task.target_dictionary.pad() + self.eos_idx = task.target_dictionary.eos() + self.post_process = cfg.post_process + + if cfg.wer_args is not None: + ( + cfg.wer_kenlm_model, + cfg.wer_lexicon, + cfg.wer_lm_weight, + cfg.wer_word_score, + ) = eval(cfg.wer_args) + + if cfg.wer_kenlm_model is not None: + from examples.speech_recognition.w2l_decoder import W2lKenLMDecoder + + dec_args = Namespace() + dec_args.nbest = 1 + dec_args.criterion = "ctc" + dec_args.kenlm_model = cfg.wer_kenlm_model + dec_args.lexicon = cfg.wer_lexicon + dec_args.beam = 50 + dec_args.beam_size_token = min(50, len(task.target_dictionary)) + dec_args.beam_threshold = min(50, len(task.target_dictionary)) + dec_args.lm_weight = cfg.wer_lm_weight + dec_args.word_score = cfg.wer_word_score + dec_args.unk_weight = -math.inf + dec_args.sil_weight = 0 + + self.w2l_decoder = W2lKenLMDecoder(dec_args, task.target_dictionary) + else: + self.w2l_decoder = None + + self.zero_infinity = cfg.zero_infinity + self.sentence_avg = 
cfg.sentence_avg + + def forward(self, model, sample, reduce=True): + net_output = model(**sample["net_input"]) + lprobs = model.get_normalized_probs( + net_output, log_probs=True + ).contiguous() # (T, B, C) from the encoder + + if "src_lengths" in sample["net_input"]: + input_lengths = sample["net_input"]["src_lengths"] + else: + if net_output["padding_mask"] is not None: + non_padding_mask = ~net_output["padding_mask"] + input_lengths = non_padding_mask.long().sum(-1) + else: + input_lengths = lprobs.new_full( + (lprobs.size(1),), lprobs.size(0), dtype=torch.long + ) + + pad_mask = (sample["target"] != self.pad_idx) & ( + sample["target"] != self.eos_idx + ) + targets_flat = sample["target"].masked_select(pad_mask) + if "target_lengths" in sample: + target_lengths = sample["target_lengths"] + else: + target_lengths = pad_mask.sum(-1) + + with torch.backends.cudnn.flags(enabled=False): + loss = F.ctc_loss( + lprobs, + targets_flat, + input_lengths, + target_lengths, + blank=self.blank_idx, + reduction="sum", + zero_infinity=self.zero_infinity, + ) + + ntokens = ( + sample["ntokens"] if "ntokens" in sample else target_lengths.sum().item() + ) + + sample_size = sample["target"].size(0) if self.sentence_avg else ntokens + logging_output = { + "loss": utils.item(loss.data), # * sample['ntokens'], + "ntokens": ntokens, + "nsentences": sample["id"].numel(), + "sample_size": sample_size, + } + + if not model.training: + import editdistance + + with torch.no_grad(): + lprobs_t = lprobs.transpose(0, 1).float().contiguous().cpu() + + c_err = 0 + c_len = 0 + w_errs = 0 + w_len = 0 + wv_errs = 0 + for lp, t, inp_l in zip( + lprobs_t, + sample["target_label"] + if "target_label" in sample + else sample["target"], + input_lengths, + ): + lp = lp[:inp_l].unsqueeze(0) + + decoded = None + if self.w2l_decoder is not None: + decoded = self.w2l_decoder.decode(lp) + if len(decoded) < 1: + decoded = None + else: + decoded = decoded[0] + if len(decoded) < 1: + decoded = None + else: + decoded = decoded[0] + + p = (t != self.task.target_dictionary.pad()) & ( + t != self.task.target_dictionary.eos() + ) + targ = t[p] + targ_units = self.task.target_dictionary.string(targ) + targ_units_arr = targ.tolist() + + toks = lp.argmax(dim=-1).unique_consecutive() + pred_units_arr = toks[toks != self.blank_idx].tolist() + + c_err += editdistance.eval(pred_units_arr, targ_units_arr) + c_len += len(targ_units_arr) + + targ_words = post_process(targ_units, self.post_process).split() + + pred_units = self.task.target_dictionary.string(pred_units_arr) + pred_words_raw = post_process(pred_units, self.post_process).split() + + if decoded is not None and "words" in decoded: + pred_words = decoded["words"] + w_errs += editdistance.eval(pred_words, targ_words) + wv_errs += editdistance.eval(pred_words_raw, targ_words) + else: + dist = editdistance.eval(pred_words_raw, targ_words) + w_errs += dist + wv_errs += dist + + w_len += len(targ_words) + + logging_output["wv_errors"] = wv_errs + logging_output["w_errors"] = w_errs + logging_output["w_total"] = w_len + logging_output["c_errors"] = c_err + logging_output["c_total"] = c_len + + return loss, sample_size, logging_output + + @staticmethod + def reduce_metrics(logging_outputs) -> None: + """Aggregate logging outputs from data parallel training.""" + + loss_sum = utils.item(sum(log.get("loss", 0) for log in logging_outputs)) + ntokens = utils.item(sum(log.get("ntokens", 0) for log in logging_outputs)) + nsentences = utils.item( + sum(log.get("nsentences", 0) for log in 
logging_outputs) + ) + sample_size = utils.item( + sum(log.get("sample_size", 0) for log in logging_outputs) + ) + + metrics.log_scalar( + "loss", loss_sum / sample_size / math.log(2), sample_size, round=3 + ) + metrics.log_scalar("ntokens", ntokens) + metrics.log_scalar("nsentences", nsentences) + if sample_size != ntokens: + metrics.log_scalar( + "nll_loss", loss_sum / ntokens / math.log(2), ntokens, round=3 + ) + + c_errors = sum(log.get("c_errors", 0) for log in logging_outputs) + metrics.log_scalar("_c_errors", c_errors) + c_total = sum(log.get("c_total", 0) for log in logging_outputs) + metrics.log_scalar("_c_total", c_total) + w_errors = sum(log.get("w_errors", 0) for log in logging_outputs) + metrics.log_scalar("_w_errors", w_errors) + wv_errors = sum(log.get("wv_errors", 0) for log in logging_outputs) + metrics.log_scalar("_wv_errors", wv_errors) + w_total = sum(log.get("w_total", 0) for log in logging_outputs) + metrics.log_scalar("_w_total", w_total) + + if c_total > 0: + metrics.log_derived( + "uer", + lambda meters: safe_round( + meters["_c_errors"].sum * 100.0 / meters["_c_total"].sum, 3 + ) + if meters["_c_total"].sum > 0 + else float("nan"), + ) + if w_total > 0: + metrics.log_derived( + "wer", + lambda meters: safe_round( + meters["_w_errors"].sum * 100.0 / meters["_w_total"].sum, 3 + ) + if meters["_w_total"].sum > 0 + else float("nan"), + ) + metrics.log_derived( + "raw_wer", + lambda meters: safe_round( + meters["_wv_errors"].sum * 100.0 / meters["_w_total"].sum, 3 + ) + if meters["_w_total"].sum > 0 + else float("nan"), + ) + + @staticmethod + def logging_outputs_can_be_summed() -> bool: + """ + Whether the logging outputs returned by `forward` can be summed + across workers prior to calling `reduce_metrics`. Setting this + to True will improves distributed training speed. + """ + return True diff --git a/SpeechT5/fairseq/fairseq/criterions/fairseq_criterion.py b/SpeechT5/fairseq/fairseq/criterions/fairseq_criterion.py new file mode 100644 index 0000000000000000000000000000000000000000..ff4beb02503ea48a6c09596630aad4c710be94b6 --- /dev/null +++ b/SpeechT5/fairseq/fairseq/criterions/fairseq_criterion.py @@ -0,0 +1,120 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +import inspect +from typing import Any, Dict, List + +from fairseq import metrics, utils +from fairseq.dataclass import FairseqDataclass +from fairseq.dataclass.utils import gen_parser_from_dataclass +from torch.nn.modules.loss import _Loss + + +class FairseqCriterion(_Loss): + def __init__(self, task): + super().__init__() + self.task = task + if hasattr(task, "target_dictionary"): + tgt_dict = task.target_dictionary + self.padding_idx = tgt_dict.pad() if tgt_dict is not None else -100 + + @classmethod + def add_args(cls, parser): + """Add criterion-specific arguments to the parser.""" + dc = getattr(cls, "__dataclass", None) + if dc is not None: + gen_parser_from_dataclass(parser, dc()) + + @classmethod + def build_criterion(cls, cfg: FairseqDataclass, task): + """Construct a criterion from command-line args.""" + # arguments in the __init__. 
+ init_args = {} + for p in inspect.signature(cls).parameters.values(): + if ( + p.kind == p.POSITIONAL_ONLY + or p.kind == p.VAR_POSITIONAL + or p.kind == p.VAR_KEYWORD + ): + # we haven't implemented inference for these argument types, + # but PRs welcome :) + raise NotImplementedError("{} not supported".format(p.kind)) + + assert p.kind in {p.POSITIONAL_OR_KEYWORD, p.KEYWORD_ONLY} + + if p.name == "task": + init_args["task"] = task + elif p.name == "cfg": + init_args["cfg"] = cfg + elif hasattr(cfg, p.name): + init_args[p.name] = getattr(cfg, p.name) + elif p.default != p.empty: + pass # we'll use the default value + else: + raise NotImplementedError( + "Unable to infer Criterion arguments, please implement " + "{}.build_criterion".format(cls.__name__) + ) + return cls(**init_args) + + def forward(self, model, sample, reduce=True): + """Compute the loss for the given sample. + + Returns a tuple with three elements: + 1) the loss + 2) the sample size, which is used as the denominator for the gradient + 3) logging outputs to display while training + """ + raise NotImplementedError + + @staticmethod + def aggregate_logging_outputs( + logging_outputs: List[Dict[str, Any]] + ) -> Dict[str, Any]: + """Aggregate logging outputs from data parallel training.""" + utils.deprecation_warning( + "The aggregate_logging_outputs API is deprecated. " + "Please use the reduce_metrics API instead." + ) + raise NotImplementedError + + @classmethod + def reduce_metrics(cls, logging_outputs: List[Dict[str, Any]]) -> None: + """Aggregate logging outputs from data parallel training.""" + utils.deprecation_warning( + "Criterions should implement the reduce_metrics API. " + "Falling back to deprecated aggregate_logging_outputs API." + ) + agg_logging_outputs = cls.aggregate_logging_outputs(logging_outputs) + for k, v in agg_logging_outputs.items(): + if k in {"nsentences", "ntokens", "sample_size"}: + continue + metrics.log_scalar(k, v) + + @staticmethod + def logging_outputs_can_be_summed() -> bool: + """ + Whether the logging outputs returned by `forward` can be summed + across workers prior to calling `reduce_metrics`. Setting this + to True will improves distributed training speed. + """ + return False + + +class LegacyFairseqCriterion(FairseqCriterion): + def __init__(self, args, task): + super().__init__(task=task) + self.args = args + + utils.deprecation_warning( + "Criterions should take explicit arguments instead of an " + "argparse.Namespace object, please update your criterion by " + "extending FairseqCriterion instead of LegacyFairseqCriterion." + ) + + @classmethod + def build_criterion(cls, args, task): + """Construct a criterion from command-line args.""" + return cls(args, task) diff --git a/SpeechT5/fairseq/fairseq/criterions/hubert_criterion.py b/SpeechT5/fairseq/fairseq/criterions/hubert_criterion.py new file mode 100644 index 0000000000000000000000000000000000000000..68cb24e6f142c46e108c53479fd4027a741f5f92 --- /dev/null +++ b/SpeechT5/fairseq/fairseq/criterions/hubert_criterion.py @@ -0,0 +1,177 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. 
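For context on the `HubertCriterion` defined in this file: it computes a summed cross-entropy over masked frames and another over unmasked frames, scales them by `pred_masked_weight` and `pred_nomask_weight`, and counts the corresponding targets toward the sample size. A rough sketch of that weighting with invented frame-level logits and pseudo-label targets (in the real criterion these come from the model and its clustering targets; the class count `K` below is arbitrary):

```python
import torch
import torch.nn.functional as F

pred_masked_weight, pred_nomask_weight = 1.0, 0.0  # defaults in HubertCriterionConfig

K = 256  # arbitrary number of pseudo-label classes for this sketch
logp_m, targ_m = torch.randn(30, K), torch.randint(0, K, (30,))   # masked frames
logp_u, targ_u = torch.randn(70, K), torch.randint(0, K, (70,))   # unmasked frames

loss = torch.zeros(())
sample_size = 0
if pred_masked_weight > 0:
    loss = loss + pred_masked_weight * F.cross_entropy(logp_m, targ_m, reduction="sum")
    sample_size += targ_m.numel()
if pred_nomask_weight > 0:
    loss = loss + pred_nomask_weight * F.cross_entropy(logp_u, targ_u, reduction="sum")
    sample_size += targ_u.numel()

print(loss.item(), sample_size)
```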
+ +import math +import re +from dataclasses import dataclass, field +from typing import List, Optional + +import torch +import torch.nn.functional as F +from fairseq import metrics, utils +from fairseq.criterions import FairseqCriterion, register_criterion +from fairseq.dataclass import FairseqDataclass + + +@dataclass +class HubertCriterionConfig(FairseqDataclass): + pred_masked_weight: float = field( + default=1.0, + metadata={"help": "weight for predictive loss for masked frames"}, + ) + pred_nomask_weight: float = field( + default=0.0, + metadata={"help": "weight for predictive loss for unmasked frames"}, + ) + loss_weights: Optional[List[float]] = field( + default=None, + metadata={"help": "weights for additional loss terms (not first one)"}, + ) + log_keys: List[str] = field( + default_factory=lambda: [], + metadata={"help": "output keys to log"}, + ) + + +@register_criterion("hubert", dataclass=HubertCriterionConfig) +class HubertCriterion(FairseqCriterion): + def __init__(self, task, pred_masked_weight, pred_nomask_weight, loss_weights=None, log_keys=None): + super().__init__(task) + self.pred_masked_weight = pred_masked_weight + self.pred_nomask_weight = pred_nomask_weight + self.loss_weights = loss_weights + self.log_keys = [] if log_keys is None else log_keys + + def forward(self, model, sample, reduce=True, log_pred=False): + """Compute the loss for the given sample. + Returns a tuple with three elements: + 1) the loss + 2) the sample size, which is used as the denominator for the gradient + 3) logging outputs to display while training + """ + net_output = model(target_list=sample["target_list"], **sample["net_input"]) + loss = 0. + sample_size = 0 + logging_output = {} + reduction = "sum" if reduce else "none" + + loss_m_list = [] + logp_m_list = model.get_logits(net_output, True) + targ_m_list = model.get_targets(net_output, True) + assert self.pred_masked_weight == 0 or len(logp_m_list) > 0 + for i, (logp_m, targ_m) in enumerate(zip(logp_m_list, targ_m_list)): + loss_m = F.cross_entropy(logp_m, targ_m, reduction=reduction) + loss_m_list.append(loss_m) + logging_output[f"loss_m_{i}"] = loss_m.detach().item() + if self.pred_masked_weight > 0: + loss += self.pred_masked_weight * sum(loss_m_list) + sample_size += targ_m_list[0].numel() + + loss_u_list = [] + logp_u_list = model.get_logits(net_output, False) + targ_u_list = model.get_targets(net_output, False) + assert self.pred_nomask_weight == 0 or len(logp_u_list) > 0 + for i, (logp_u, targ_u) in enumerate(zip(logp_u_list, targ_u_list)): + loss_u = F.cross_entropy(logp_u, targ_u, reduction=reduction) + loss_u_list.append(loss_u) + logging_output[f"loss_u_{i}"] = loss_u.detach().item() + if self.pred_nomask_weight > 0: + loss += self.pred_nomask_weight * sum(loss_u_list) + sample_size += targ_u_list[0].numel() + + if self.loss_weights is not None: + assert hasattr(model, "get_extra_losses") + extra_losses, names = model.get_extra_losses(net_output) + if torch.is_tensor(extra_losses): + extra_losses = [extra_losses] + names = [names] + if len(self.loss_weights) == 1 and len(extra_losses) != 1: + self.loss_weights = [self.loss_weights[0]] * len(extra_losses) + assert len(extra_losses) == len(self.loss_weights), f"{len(extra_losses)}, {len(self.loss_weights)}" + for p, n, coef in zip(extra_losses, names, self.loss_weights): + if coef != 0 and p is not None: + p = coef * p.float() * sample_size + loss += p + logging_output[f"loss_{n}"] = p.item() + + logging_output = { + "loss": loss.item() if reduce else loss, + "ntokens": 
sample_size, + "nsentences": sample["id"].numel(), + "sample_size": sample_size, + **logging_output, + } + + for lk in self.log_keys: + if lk in net_output: + logging_output[lk] = float((net_output[lk])) + + def compute_correct(logits): + if logits.numel() == 0: + return 0, 0 + else: + assert logits.dim() > 1, logits.shape + max = logits.argmax(-1) == 0 + min = logits.argmin(-1) == 0 + both = max & min + corr = max.long().sum().item() - both.long().sum().item() + count = max.numel() + return corr, count + + with torch.no_grad(): + for i, logp_m in enumerate(logp_m_list): + corr_m, count_m = compute_correct(logp_m) + logging_output[f"correct_m_{i}"] = corr_m + logging_output[f"count_m_{i}"] = count_m + + for i, logp_u in enumerate(logp_u_list): + corr_u, count_u = compute_correct(logp_u) + logging_output[f"correct_u_{i}"] = corr_u + logging_output[f"count_u_{i}"] = count_u + + return loss, sample_size, logging_output + + @staticmethod + def reduce_metrics(logging_outputs) -> None: + """Aggregate logging outputs from data parallel training (copied from normal cross entropy).""" + loss_sum = sum(log.get("loss", 0) for log in logging_outputs) + ntokens = sum(log.get("ntokens", 0) for log in logging_outputs) + sample_size = sum(log.get("sample_size", 0) for log in logging_outputs) + + metrics.log_scalar("loss", loss_sum / sample_size / math.log(2), sample_size, round=3) + if sample_size != ntokens: + metrics.log_scalar("nll_loss", loss_sum / ntokens / math.log(2), ntokens, round=3) + metrics.log_derived("ppl", lambda meters: utils.get_perplexity(meters["nll_loss"].avg)) + else: + metrics.log_derived("ppl", lambda meters: utils.get_perplexity(meters["loss"].avg)) + + counts = {} + for lk in logging_outputs[0].keys(): + if lk.startswith("count_"): + val = sum(log[lk] for log in logging_outputs) + metrics.log_scalar(lk, val) + counts[lk] = val + + for lk in logging_outputs[0].keys(): + if lk.startswith("loss_"): + val = sum(log[lk] for log in logging_outputs) + metrics.log_scalar(lk, val / sample_size / math.log(2), round=3) + elif lk.startswith("correct_"): + val = sum(log[lk] for log in logging_outputs) + metrics.log_scalar(lk, val / counts[re.sub("correct", "count", lk)]) + + @staticmethod + def aggregate_logging_outputs(logging_outputs): + """Aggregate logging outputs from data parallel training.""" + raise NotImplementedError() + + @staticmethod + def logging_outputs_can_be_summed() -> bool: + """ + Whether the logging outputs returned by `forward` can be summed + across workers prior to calling `reduce_metrics`. Setting this + to True will improves distributed training speed. + """ + return False diff --git a/SpeechT5/fairseq/fairseq/criterions/label_smoothed_cross_entropy.py b/SpeechT5/fairseq/fairseq/criterions/label_smoothed_cross_entropy.py new file mode 100644 index 0000000000000000000000000000000000000000..56d63e3e1b5a036e0adf32480e2b66f371738013 --- /dev/null +++ b/SpeechT5/fairseq/fairseq/criterions/label_smoothed_cross_entropy.py @@ -0,0 +1,170 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. 
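The `label_smoothed_nll_loss` helper defined in this file mixes the standard NLL term with a uniform smoothing term: with smoothing `eps` over `V` classes it returns `(1 - eps - eps_i) * nll + eps_i * smooth`, where `eps_i = eps / (V - 1)` and `smooth` is the summed negative log-probability over all classes. A toy, always-reduced version without the padding-mask handling, on random log-probabilities:

```python
import torch

def toy_label_smoothed_nll(lprobs, target, epsilon):
    # lprobs: (N, V) log-probabilities, target: (N,) gold class indices.
    nll = -lprobs.gather(dim=-1, index=target.unsqueeze(-1)).squeeze(-1).sum()
    smooth = -lprobs.sum(dim=-1).sum()
    eps_i = epsilon / (lprobs.size(-1) - 1)
    return (1.0 - epsilon - eps_i) * nll + eps_i * smooth, nll

lprobs = torch.log_softmax(torch.randn(8, 100), dim=-1)   # 8 tokens, 100-word vocabulary
target = torch.randint(0, 100, (8,))
loss, nll = toy_label_smoothed_nll(lprobs, target, epsilon=0.1)
print(loss.item(), nll.item())   # smoothed loss next to the plain NLL on the same toy batch
```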
+ +import math +from dataclasses import dataclass, field + +import torch +from fairseq import metrics, utils +from fairseq.criterions import FairseqCriterion, register_criterion +from fairseq.dataclass import FairseqDataclass +from omegaconf import II + + +@dataclass +class LabelSmoothedCrossEntropyCriterionConfig(FairseqDataclass): + label_smoothing: float = field( + default=0.0, + metadata={"help": "epsilon for label smoothing, 0 means no label smoothing"}, + ) + report_accuracy: bool = field( + default=False, + metadata={"help": "report accuracy metric"}, + ) + ignore_prefix_size: int = field( + default=0, + metadata={"help": "Ignore first N tokens"}, + ) + sentence_avg: bool = II("optimization.sentence_avg") + + +def label_smoothed_nll_loss(lprobs, target, epsilon, ignore_index=None, reduce=True): + if target.dim() == lprobs.dim() - 1: + target = target.unsqueeze(-1) + nll_loss = -lprobs.gather(dim=-1, index=target) + smooth_loss = -lprobs.sum(dim=-1, keepdim=True) + if ignore_index is not None: + pad_mask = target.eq(ignore_index) + nll_loss.masked_fill_(pad_mask, 0.0) + smooth_loss.masked_fill_(pad_mask, 0.0) + else: + nll_loss = nll_loss.squeeze(-1) + smooth_loss = smooth_loss.squeeze(-1) + if reduce: + nll_loss = nll_loss.sum() + smooth_loss = smooth_loss.sum() + eps_i = epsilon / (lprobs.size(-1) - 1) + loss = (1.0 - epsilon - eps_i) * nll_loss + eps_i * smooth_loss + return loss, nll_loss + + +@register_criterion( + "label_smoothed_cross_entropy", dataclass=LabelSmoothedCrossEntropyCriterionConfig +) +class LabelSmoothedCrossEntropyCriterion(FairseqCriterion): + def __init__( + self, + task, + sentence_avg, + label_smoothing, + ignore_prefix_size=0, + report_accuracy=False, + ): + super().__init__(task) + self.sentence_avg = sentence_avg + self.eps = label_smoothing + self.ignore_prefix_size = ignore_prefix_size + self.report_accuracy = report_accuracy + + def forward(self, model, sample, reduce=True): + """Compute the loss for the given sample. 
+ + Returns a tuple with three elements: + 1) the loss + 2) the sample size, which is used as the denominator for the gradient + 3) logging outputs to display while training + """ + net_output = model(**sample["net_input"]) + loss, nll_loss = self.compute_loss(model, net_output, sample, reduce=reduce) + sample_size = ( + sample["target"].size(0) if self.sentence_avg else sample["ntokens"] + ) + logging_output = { + "loss": loss.data, + "nll_loss": nll_loss.data, + "ntokens": sample["ntokens"], + "nsentences": sample["target"].size(0), + "sample_size": sample_size, + } + if self.report_accuracy: + n_correct, total = self.compute_accuracy(model, net_output, sample) + logging_output["n_correct"] = utils.item(n_correct.data) + logging_output["total"] = utils.item(total.data) + return loss, sample_size, logging_output + + def get_lprobs_and_target(self, model, net_output, sample): + lprobs = model.get_normalized_probs(net_output, log_probs=True) + target = model.get_targets(sample, net_output) + if self.ignore_prefix_size > 0: + if getattr(lprobs, "batch_first", False): + lprobs = lprobs[:, self.ignore_prefix_size :, :].contiguous() + target = target[:, self.ignore_prefix_size :].contiguous() + else: + lprobs = lprobs[self.ignore_prefix_size :, :, :].contiguous() + target = target[self.ignore_prefix_size :, :].contiguous() + return lprobs.view(-1, lprobs.size(-1)), target.view(-1) + + def compute_loss(self, model, net_output, sample, reduce=True): + lprobs, target = self.get_lprobs_and_target(model, net_output, sample) + loss, nll_loss = label_smoothed_nll_loss( + lprobs, + target, + self.eps, + ignore_index=self.padding_idx, + reduce=reduce, + ) + return loss, nll_loss + + def compute_accuracy(self, model, net_output, sample): + lprobs, target = self.get_lprobs_and_target(model, net_output, sample) + mask = target.ne(self.padding_idx) + n_correct = torch.sum( + lprobs.argmax(1).masked_select(mask).eq(target.masked_select(mask)) + ) + total = torch.sum(mask) + return n_correct, total + + @classmethod + def reduce_metrics(cls, logging_outputs) -> None: + """Aggregate logging outputs from data parallel training.""" + loss_sum = sum(log.get("loss", 0) for log in logging_outputs) + nll_loss_sum = sum(log.get("nll_loss", 0) for log in logging_outputs) + ntokens = sum(log.get("ntokens", 0) for log in logging_outputs) + sample_size = sum(log.get("sample_size", 0) for log in logging_outputs) + + metrics.log_scalar( + "loss", loss_sum / sample_size / math.log(2), sample_size, round=3 + ) + metrics.log_scalar( + "nll_loss", nll_loss_sum / ntokens / math.log(2), ntokens, round=3 + ) + metrics.log_derived( + "ppl", lambda meters: utils.get_perplexity(meters["nll_loss"].avg) + ) + + total = utils.item(sum(log.get("total", 0) for log in logging_outputs)) + if total > 0: + metrics.log_scalar("total", total) + n_correct = utils.item( + sum(log.get("n_correct", 0) for log in logging_outputs) + ) + metrics.log_scalar("n_correct", n_correct) + metrics.log_derived( + "accuracy", + lambda meters: round( + meters["n_correct"].sum * 100.0 / meters["total"].sum, 3 + ) + if meters["total"].sum > 0 + else float("nan"), + ) + + @staticmethod + def logging_outputs_can_be_summed() -> bool: + """ + Whether the logging outputs returned by `forward` can be summed + across workers prior to calling `reduce_metrics`. Setting this + to True will improves distributed training speed. 
+ """ + return True diff --git a/SpeechT5/fairseq/fairseq/criterions/label_smoothed_cross_entropy_latency_augmented.py b/SpeechT5/fairseq/fairseq/criterions/label_smoothed_cross_entropy_latency_augmented.py new file mode 100644 index 0000000000000000000000000000000000000000..051785238fdc4d18230de49ddd735f154ed5a3e7 --- /dev/null +++ b/SpeechT5/fairseq/fairseq/criterions/label_smoothed_cross_entropy_latency_augmented.py @@ -0,0 +1,108 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +from fairseq.criterions import register_criterion +from fairseq.criterions.label_smoothed_cross_entropy import ( + LabelSmoothedCrossEntropyCriterion, +) + + +@register_criterion("latency_augmented_label_smoothed_cross_entropy") +class LatencyAugmentedLabelSmoothedCrossEntropyCriterion( + LabelSmoothedCrossEntropyCriterion +): + def __init__( + self, + task, + sentence_avg, + label_smoothing, + ignore_prefix_size, + report_accuracy, + latency_weight_avg, + latency_weight_avg_type, + latency_weight_var, + latency_weight_var_type, + mass_preservation, + average_method, + ): + super().__init__( + task, sentence_avg, label_smoothing, ignore_prefix_size, report_accuracy + ) + from examples.simultaneous_translation.utils.latency import LatencyTraining + self.eps = label_smoothing + self.latency_weight_avg = latency_weight_avg + self.latency_weight_avg_type = latency_weight_avg_type + self.latency_weight_var = latency_weight_var + self.latency_weight_var_type = latency_weight_var_type + self.mass_preservation = mass_preservation + self.average_method = average_method + self.latency_train = LatencyTraining( + self.latency_weight_avg, + self.latency_weight_var, + self.latency_weight_avg_type, + self.latency_weight_var_type, + self.mass_preservation, + self.average_method, + ) + + @staticmethod + def add_args(parser): + super( + LatencyAugmentedLabelSmoothedCrossEntropyCriterion, + LatencyAugmentedLabelSmoothedCrossEntropyCriterion, + ).add_args(parser) + # fmt: off + + """Add criterion-specific arguments to the parser.""" + parser.add_argument( + "--label-smoothing", + default=0.0, + type=float, + metavar="D", + help="epsilon for label smoothing, 0 means no label smoothing", + ) + parser.add_argument( + "--ignore_prefix_size", + default=0, + type=int, + help="ignore first N tokens", + ) + parser.add_argument( + "--report-accuracy", + default=False, + type=bool, + help="report accuracy metric", + ) + parser.add_argument("--latency-weight-avg", default=0., type=float, metavar='D', + help="Average loss weight") + parser.add_argument("--latency-weight-var", default=0., type=float, metavar='D', + help="Variance loss weight") + parser.add_argument("--latency-weight-avg-type", default="differentiable_average_lagging", + help="Statistics for Average loss type") + parser.add_argument("--latency-weight-var-type", default="variance_delay", + help="Statistics for variance loss type") + parser.add_argument("--average-method", default="weighted_average", + help="Average loss type") + # fmt: on + + def compute_loss(self, model, net_output, sample, reduce=True): + # Compute cross entropy loss first + loss, nll_loss = super().compute_loss(model, net_output, sample, reduce) + + # Obtain the expected alignment + attn_list = [item["alpha"] for item in net_output[-1]["attn_list"]] + + target_padding_mask = model.get_targets(sample, net_output).eq(self.padding_idx) + + source_padding_mask = 
net_output[-1].get("encoder_padding_mask", None) + + # Get latency loss + latency_loss = self.latency_train.loss( + attn_list, source_padding_mask, target_padding_mask + ) + + loss += latency_loss + + return loss, nll_loss diff --git a/SpeechT5/fairseq/fairseq/criterions/label_smoothed_cross_entropy_with_alignment.py b/SpeechT5/fairseq/fairseq/criterions/label_smoothed_cross_entropy_with_alignment.py new file mode 100644 index 0000000000000000000000000000000000000000..73cfa05310e51d9a5f349cc30b8406002d25861b --- /dev/null +++ b/SpeechT5/fairseq/fairseq/criterions/label_smoothed_cross_entropy_with_alignment.py @@ -0,0 +1,125 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +import math + +from fairseq import metrics, utils +from fairseq.criterions import register_criterion + +from .label_smoothed_cross_entropy import LabelSmoothedCrossEntropyCriterion + + +@register_criterion("label_smoothed_cross_entropy_with_alignment") +class LabelSmoothedCrossEntropyCriterionWithAlignment( + LabelSmoothedCrossEntropyCriterion +): + def __init__(self, task, sentence_avg, label_smoothing, alignment_lambda): + super().__init__(task, sentence_avg, label_smoothing) + self.alignment_lambda = alignment_lambda + + @staticmethod + def add_args(parser): + """Add criterion-specific arguments to the parser.""" + LabelSmoothedCrossEntropyCriterion.add_args(parser) + parser.add_argument( + "--alignment-lambda", + default=0.05, + type=float, + metavar="D", + help="weight for the alignment loss", + ) + + def forward(self, model, sample, reduce=True): + """Compute the loss for the given sample. + + Returns a tuple with three elements: + 1) the loss + 2) the sample size, which is used as the denominator for the gradient + 3) logging outputs to display while training + """ + net_output = model(**sample["net_input"]) + loss, nll_loss = self.compute_loss(model, net_output, sample, reduce=reduce) + sample_size = ( + sample["target"].size(0) if self.sentence_avg else sample["ntokens"] + ) + logging_output = { + "loss": utils.item(loss.data) if reduce else loss.data, + "nll_loss": utils.item(nll_loss.data) if reduce else nll_loss.data, + "ntokens": sample["ntokens"], + "nsentences": sample["target"].size(0), + "sample_size": sample_size, + } + + alignment_loss = None + + # Compute alignment loss only for training set and non dummy batches. + if "alignments" in sample and sample["alignments"] is not None: + alignment_loss = self.compute_alignment_loss(sample, net_output) + + if alignment_loss is not None: + logging_output["alignment_loss"] = utils.item(alignment_loss.data) + loss += self.alignment_lambda * alignment_loss + + return loss, sample_size, logging_output + + def compute_alignment_loss(self, sample, net_output): + attn_prob = net_output[1]["attn"][0] + bsz, tgt_sz, src_sz = attn_prob.shape + attn = attn_prob.view(bsz * tgt_sz, src_sz) + + align = sample["alignments"] + align_weights = sample["align_weights"].float() + + if len(align) > 0: + # Alignment loss computation. align (shape [:, 2]) contains the src-tgt index pairs corresponding to + # the alignments. align_weights (shape [:]) contains the 1 / frequency of a tgt index for normalizing. 
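+            # In the indexing below, align[:, 1] picks rows of the flattened
+            # (bsz * tgt_len, src_len) attention matrix and align[:, 0] picks the
+            # source column, so each selected term is the negative log of the attention
+            # weight placed on that source position at that target step, scaled by its
+            # 1/frequency weight.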
+ loss = -( + (attn[align[:, 1][:, None], align[:, 0][:, None]]).log() + * align_weights[:, None] + ).sum() + else: + return None + + return loss + + @staticmethod + def reduce_metrics(logging_outputs) -> None: + """Aggregate logging outputs from data parallel training.""" + loss_sum = utils.item(sum(log.get("loss", 0) for log in logging_outputs)) + nll_loss_sum = utils.item( + sum(log.get("nll_loss", 0) for log in logging_outputs) + ) + alignment_loss_sum = utils.item( + sum(log.get("alignment_loss", 0) for log in logging_outputs) + ) + ntokens = utils.item(sum(log.get("ntokens", 0) for log in logging_outputs)) + sample_size = utils.item( + sum(log.get("sample_size", 0) for log in logging_outputs) + ) + + metrics.log_scalar( + "loss", loss_sum / sample_size / math.log(2), sample_size, round=3 + ) + metrics.log_scalar( + "nll_loss", nll_loss_sum / ntokens / math.log(2), ntokens, round=3 + ) + metrics.log_scalar( + "alignment_loss", + alignment_loss_sum / sample_size / math.log(2), + sample_size, + round=3, + ) + metrics.log_derived( + "ppl", lambda meters: utils.get_perplexity(meters["nll_loss"].avg) + ) + + @staticmethod + def logging_outputs_can_be_summed() -> bool: + """ + Whether the logging outputs returned by `forward` can be summed + across workers prior to calling `reduce_metrics`. Setting this + to True will improves distributed training speed. + """ + return True diff --git a/SpeechT5/fairseq/fairseq/criterions/legacy_masked_lm.py b/SpeechT5/fairseq/fairseq/criterions/legacy_masked_lm.py new file mode 100644 index 0000000000000000000000000000000000000000..c70608c5a143b7b4fbd8c58dfcf9f873639d379c --- /dev/null +++ b/SpeechT5/fairseq/fairseq/criterions/legacy_masked_lm.py @@ -0,0 +1,177 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +import math + +import torch +import torch.nn.functional as F +from fairseq import metrics, utils +from fairseq.criterions import FairseqCriterion, register_criterion + + +def compute_cross_entropy_loss(logits, targets, ignore_index=-100): + """ + Function to compute the cross entropy loss. The default value of + ignore_index is the same as the default value for F.cross_entropy in + pytorch. + """ + assert logits.size(0) == targets.size( + -1 + ), "Logits and Targets tensor shapes don't match up" + + loss = F.nll_loss( + F.log_softmax(logits, -1, dtype=torch.float32), + targets, + reduction="sum", + ignore_index=ignore_index, + ) + return loss + + +@register_criterion("legacy_masked_lm_loss") +class LegacyMaskedLmLoss(FairseqCriterion): + """ + Implementation for the loss used in masked language model (MLM) training. + This optionally also computes the next sentence prediction (NSP) loss and + adds it to the overall loss based on the specified args. There are three + cases to consider: + 1) Generic MLM training without NSP loss. In this case sentence_targets + and sentence_logits are both None. + 2) BERT training without NSP loss. In this case sentence_targets is + not None but sentence_logits is None and we should not be computing + a sentence level loss. + 3) BERT training with NSP loss. In this case both sentence_targets and + sentence_logits are not None and we should be computing a sentence + level loss. The weight of the sentence level loss is specified as + an argument. 
+ """ + + def __init__(self, task, masked_lm_only, nsp_loss_weight): + super().__init__(task) + self.masked_lm_only = masked_lm_only + self.nsp_loss_weight = nsp_loss_weight + + @staticmethod + def add_args(parser): + """Args for MaskedLM Loss""" + # Default for masked_lm_only is False so as to not break BERT training + parser.add_argument( + "--masked-lm-only", + default=False, + action="store_true", + help="compute MLM loss only", + ) + parser.add_argument( + "--nsp-loss-weight", + default=1.0, + type=float, + help="weight for next sentence prediction" " loss (default 1)", + ) + + def forward(self, model, sample, reduce=True): + """Compute the loss for the given sample. + Returns a tuple with three elements: + 1) the loss + 2) the sample size, which is used as the denominator for the gradient + 3) logging outputs to display while training + """ + lm_logits, output_metadata = model(**sample["net_input"]) + + # reshape lm_logits from (N,T,C) to (N*T,C) + lm_logits = lm_logits.view(-1, lm_logits.size(-1)) + lm_targets = sample["lm_target"].view(-1) + lm_loss = compute_cross_entropy_loss(lm_logits, lm_targets, self.padding_idx) + + # compute the number of tokens for which loss is computed. This is used + # to normalize the loss + ntokens = utils.strip_pad(lm_targets, self.padding_idx).numel() + loss = lm_loss / ntokens + nsentences = sample["nsentences"] + # nsentences = 0 + + # Compute sentence loss if masked_lm_only is False + sentence_loss = None + if not self.masked_lm_only: + sentence_logits = output_metadata["sentence_logits"] + sentence_targets = sample["sentence_target"].view(-1) + # This needs to be recomputed due to some differences between + # TokenBlock and BlockPair dataset. This can be resolved with a + # refactor of BERTModel which we will do in the future. + # TODO: Remove this after refactor of BERTModel + nsentences = sentence_targets.size(0) + + # Check for logits being none which can happen when remove_heads + # is set to true in the BERT model. Ideally we should set + # masked_lm_only to true in this case, but that requires some + # refactor in the BERT model. 
+ if sentence_logits is not None: + sentence_loss = compute_cross_entropy_loss( + sentence_logits, sentence_targets + ) + + loss += self.nsp_loss_weight * (sentence_loss / nsentences) + + # NOTE: as we are summing up per token mlm loss and per sentence nsp loss + # we don't need to use sample_size as denominator for the gradient + # here sample_size is just used for logging + sample_size = 1 + logging_output = { + "loss": utils.item(loss.data) if reduce else loss.data, + "lm_loss": utils.item(lm_loss.data) if reduce else lm_loss.data, + # sentence loss is not always computed + "sentence_loss": ( + (utils.item(sentence_loss.data) if reduce else sentence_loss.data) + if sentence_loss is not None + else 0.0 + ), + "ntokens": ntokens, + "nsentences": nsentences, + "sample_size": sample_size, + } + return loss, sample_size, logging_output + + @staticmethod + def reduce_metrics(logging_outputs) -> None: + """Aggregate logging outputs from data parallel training.""" + lm_loss_sum = sum(log.get("lm_loss", 0) for log in logging_outputs) + sentence_loss_sum = sum(log.get("sentence_loss", 0) for log in logging_outputs) + ntokens = sum(log.get("ntokens", 0) for log in logging_outputs) + nsentences = sum(log.get("nsentences", 0) for log in logging_outputs) + sample_size = sum(log.get("sample_size", 0) for log in logging_outputs) + agg_loss = sum(log.get("loss", 0) for log in logging_outputs) + + metrics.log_scalar( + "loss", + agg_loss / sample_size / math.log(2) if sample_size > 0 else 0.0, + sample_size, + round=3, + ) + metrics.log_scalar( + "lm_loss", + lm_loss_sum / ntokens / math.log(2) if ntokens > 0 else 0.0, + ntokens, + round=3, + ) + metrics.log_scalar( + "sentence_loss", + sentence_loss_sum / nsentences / math.log(2) if nsentences > 0 else 0.0, + nsentences, + round=3, + ) + metrics.log_scalar( + "nll_loss", + lm_loss_sum / ntokens / math.log(2) if ntokens > 0 else 0.0, + ntokens, + round=3, + ) + + @staticmethod + def logging_outputs_can_be_summed() -> bool: + """ + Whether the logging outputs returned by `forward` can be summed + across workers prior to calling `reduce_metrics`. Setting this + to True will improves distributed training speed. + """ + return True diff --git a/SpeechT5/fairseq/fairseq/criterions/masked_lm.py b/SpeechT5/fairseq/fairseq/criterions/masked_lm.py new file mode 100644 index 0000000000000000000000000000000000000000..b04cfbff6dcbfacb91156bb10a7c8cdbb9e76d37 --- /dev/null +++ b/SpeechT5/fairseq/fairseq/criterions/masked_lm.py @@ -0,0 +1,91 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +import math + +import torch +import torch.nn.functional as F +from fairseq import metrics, modules, utils +from fairseq.criterions import FairseqCriterion, register_criterion + + +@register_criterion("masked_lm") +class MaskedLmLoss(FairseqCriterion): + """ + Implementation for the loss used in masked language model (MLM) training. + """ + + def __init__(self, task, tpu=False): + super().__init__(task) + self.tpu = tpu + + def forward(self, model, sample, reduce=True): + """Compute the loss for the given sample. + + Returns a tuple with three elements: + 1) the loss + 2) the sample size, which is used as the denominator for the gradient + 3) logging outputs to display while training + """ + masked_tokens = sample["target"].ne(self.padding_idx) + sample_size = masked_tokens.int().sum() + + # Rare: when all tokens are masked, project all tokens. 
+ # We use torch.where to avoid device-to-host transfers, + # except on CPU where torch.where is not well supported + # (see github.com/pytorch/pytorch/issues/26247). + if self.tpu: + masked_tokens = None # always project all tokens on TPU + elif masked_tokens.device == torch.device("cpu"): + if not masked_tokens.any(): + masked_tokens = None + else: + masked_tokens = torch.where( + masked_tokens.any(), + masked_tokens, + masked_tokens.new([True]), + ) + + logits = model(**sample["net_input"], masked_tokens=masked_tokens)[0] + targets = model.get_targets(sample, [logits]) + if masked_tokens is not None: + targets = targets[masked_tokens] + + loss = modules.cross_entropy( + logits.view(-1, logits.size(-1)), + targets.view(-1), + reduction="sum", + ignore_index=self.padding_idx, + ) + + logging_output = { + "loss": loss if self.tpu else loss.data, + "ntokens": sample["ntokens"], + "nsentences": sample["nsentences"], + "sample_size": sample_size, + } + return loss, sample_size, logging_output + + @staticmethod + def reduce_metrics(logging_outputs) -> None: + """Aggregate logging outputs from data parallel training.""" + loss_sum = sum(log.get("loss", 0) for log in logging_outputs) + sample_size = sum(log.get("sample_size", 0) for log in logging_outputs) + + metrics.log_scalar( + "loss", loss_sum / sample_size / math.log(2), sample_size, round=3 + ) + metrics.log_derived( + "ppl", lambda meters: utils.get_perplexity(meters["loss"].avg) + ) + + @staticmethod + def logging_outputs_can_be_summed() -> bool: + """ + Whether the logging outputs returned by `forward` can be summed + across workers prior to calling `reduce_metrics`. Setting this + to True will improves distributed training speed. + """ + return True diff --git a/SpeechT5/fairseq/fairseq/criterions/model_criterion.py b/SpeechT5/fairseq/fairseq/criterions/model_criterion.py new file mode 100644 index 0000000000000000000000000000000000000000..30350f13b1c00498de6784579250d6b342ced7dd --- /dev/null +++ b/SpeechT5/fairseq/fairseq/criterions/model_criterion.py @@ -0,0 +1,138 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +import logging +from dataclasses import dataclass, field +from typing import Dict, List + +from fairseq import metrics, utils +from fairseq.criterions import FairseqCriterion, register_criterion +from fairseq.dataclass import FairseqDataclass + + +logger = logging.getLogger(__name__) + + +@dataclass +class ModelCriterionConfig(FairseqDataclass): + loss_weights: Dict[str, float] = field( + default_factory=dict, + metadata={"help": "weights for the loss terms"}, + ) + log_keys: List[str] = field( + default_factory=list, + metadata={"help": "additional output keys to log"}, + ) + + +@register_criterion("model", dataclass=ModelCriterionConfig) +class ModelCriterion(FairseqCriterion): + """ + This criterion relies on the model to supply losses. + The losses should be a dictionary of name -> scalar returned by + the model either by including it in the net_output dict or by + implementing a get_losses(net_output, sample) method. The final loss is + a scaled sum of all losses according to weights in loss_weights. + If no weights are provided, then all losses are scaled by 1.0. + + The losses will be automatically logged. Additional keys from + net_output dict can be logged via the log_keys parameter. 
+ """ + + def __init__(self, task, loss_weights=None, log_keys=None): + super().__init__(task) + self.loss_weights = loss_weights + self.log_keys = log_keys + + def forward(self, model, sample, reduce=True): + net_output = model(**sample["net_input"]) + + sample_size = net_output["sample_size"] + scaled_losses = {} + + if hasattr(model, "get_losses"): + losses = model.get_losses(net_output, sample) + elif isinstance(net_output, dict) and "losses" in net_output: + losses = net_output["losses"] + else: + raise Exception("Could not retrieve losses") + + for lk, p in losses.items(): + try: + coef = 1.0 if len(self.loss_weights) == 0 else self.loss_weights[lk] + except KeyError: + logger.error( + f"weight for loss {lk} is not in loss_weights ({self.loss_weights})" + ) + raise + if coef != 0 and p is not None: + scaled_losses[lk] = coef * p.float() + + loss = sum(scaled_losses.values()) + if reduce and loss.numel() > 1: + loss = loss.sum() + + logging_output = { + "loss": loss.data, + "ntokens": sample_size, + "nsentences": sample["id"].numel(), + "sample_size": sample_size, + "_world_size": 1, + } + + for lk in self.log_keys: + if lk in net_output and net_output[lk] is not None: + logging_output[lk] = float(net_output[lk]) + + if len(scaled_losses) > 1: + for lk, l in scaled_losses.items(): + logging_output[f"loss_{lk}"] = l.item() + + return loss, sample_size, logging_output + + @staticmethod + def reduce_metrics(logging_outputs) -> None: + """Aggregate logging outputs from data parallel training.""" + loss_sum = utils.item(sum(log.get("loss", 0) for log in logging_outputs)) + ntokens = utils.item(sum(log.get("ntokens", 0) for log in logging_outputs)) + nsentences = utils.item( + sum(log.get("nsentences", 0) for log in logging_outputs) + ) + sample_size = utils.item( + sum(log.get("sample_size", 0) for log in logging_outputs) + ) + + metrics.log_scalar("loss", loss_sum / sample_size, sample_size, round=3) + metrics.log_scalar("ntokens", ntokens) + metrics.log_scalar("nsentences", nsentences) + + builtin_keys = { + "loss", + "ntokens", + "nsentences", + "sample_size", + "_world_size", + } + + world_size = utils.item( + sum(log.get("_world_size", 0) for log in logging_outputs) + ) + + for k in logging_outputs[0]: + if k not in builtin_keys: + val = sum(log.get(k, 0) for log in logging_outputs) + if k.startswith("loss_"): + metrics.log_scalar(k, val / sample_size, sample_size, round=3) + else: + metrics.log_scalar(k, val / world_size, round=3) + + @staticmethod + def logging_outputs_can_be_summed() -> bool: + """ + Whether the logging outputs returned by `forward` can be summed + across workers prior to calling `reduce_metrics`. Setting this + to True will improves distributed training speed. + """ + return True diff --git a/SpeechT5/fairseq/fairseq/criterions/nat_loss.py b/SpeechT5/fairseq/fairseq/criterions/nat_loss.py new file mode 100644 index 0000000000000000000000000000000000000000..cdc7da861d7d5d5af183a78fdde51f49eb0cf5e7 --- /dev/null +++ b/SpeechT5/fairseq/fairseq/criterions/nat_loss.py @@ -0,0 +1,180 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. 
+ +import math + +import torch +import torch.nn.functional as F +from fairseq import metrics, utils +from fairseq.criterions import FairseqCriterion, register_criterion +from torch import Tensor + + +@register_criterion("nat_loss") +class LabelSmoothedDualImitationCriterion(FairseqCriterion): + def __init__(self, task, label_smoothing): + super().__init__(task) + self.label_smoothing = label_smoothing + + @staticmethod + def add_args(parser): + """Add criterion-specific arguments to the parser.""" + parser.add_argument( + "--label-smoothing", + default=0.0, + type=float, + metavar="D", + help="epsilon for label smoothing, 0 means no label smoothing", + ) + + def _compute_loss( + self, outputs, targets, masks=None, label_smoothing=0.0, name="loss", factor=1.0 + ): + """ + outputs: batch x len x d_model + targets: batch x len + masks: batch x len + + policy_logprob: if there is some policy + depends on the likelihood score as rewards. + """ + + def mean_ds(x: Tensor, dim=None) -> Tensor: + return ( + x.float().mean().type_as(x) + if dim is None + else x.float().mean(dim).type_as(x) + ) + + if masks is not None: + outputs, targets = outputs[masks], targets[masks] + + if masks is not None and not masks.any(): + nll_loss = torch.tensor(0) + loss = nll_loss + else: + logits = F.log_softmax(outputs, dim=-1) + if targets.dim() == 1: + losses = F.nll_loss(logits, targets.to(logits.device), reduction="none") + + else: # soft-labels + losses = F.kl_div(logits, targets.to(logits.device), reduction="none") + losses = losses.sum(-1) + + nll_loss = mean_ds(losses) + if label_smoothing > 0: + loss = ( + nll_loss * (1 - label_smoothing) - mean_ds(logits) * label_smoothing + ) + else: + loss = nll_loss + + loss = loss * factor + return {"name": name, "loss": loss, "nll_loss": nll_loss, "factor": factor} + + def _custom_loss(self, loss, name="loss", factor=1.0): + return {"name": name, "loss": loss, "factor": factor} + + def forward(self, model, sample, reduce=True): + """Compute the loss for the given sample. 
+ Returns a tuple with three elements: + 1) the loss + 2) the sample size, which is used as the denominator for the gradient + 3) logging outputs to display while training + """ + nsentences, ntokens = sample["nsentences"], sample["ntokens"] + + # B x T + src_tokens, src_lengths = ( + sample["net_input"]["src_tokens"], + sample["net_input"]["src_lengths"], + ) + tgt_tokens, prev_output_tokens = sample["target"], sample["prev_target"] + + outputs = model(src_tokens, src_lengths, prev_output_tokens, tgt_tokens) + losses, nll_loss = [], [] + + for obj in outputs: + if outputs[obj].get("loss", None) is None: + _losses = self._compute_loss( + outputs[obj].get("out"), + outputs[obj].get("tgt"), + outputs[obj].get("mask", None), + outputs[obj].get("ls", 0.0), + name=obj + "-loss", + factor=outputs[obj].get("factor", 1.0), + ) + else: + _losses = self._custom_loss( + outputs[obj].get("loss"), + name=obj + "-loss", + factor=outputs[obj].get("factor", 1.0), + ) + + losses += [_losses] + if outputs[obj].get("nll_loss", False): + nll_loss += [_losses.get("nll_loss", 0.0)] + + loss = sum(l["loss"] for l in losses) + nll_loss = sum(l for l in nll_loss) if len(nll_loss) > 0 else loss.new_tensor(0) + + # NOTE: + # we don't need to use sample_size as denominator for the gradient + # here sample_size is just used for logging + sample_size = 1 + logging_output = { + "loss": loss.data, + "nll_loss": nll_loss.data, + "ntokens": ntokens, + "nsentences": nsentences, + "sample_size": sample_size, + } + + for l in losses: + logging_output[l["name"]] = ( + utils.item(l["loss"].data / l["factor"]) + if reduce + else l[["loss"]].data / l["factor"] + ) + + return loss, sample_size, logging_output + + @staticmethod + def reduce_metrics(logging_outputs) -> None: + """Aggregate logging outputs from data parallel training.""" + sample_size = utils.item( + sum(log.get("sample_size", 0) for log in logging_outputs) + ) + loss = utils.item(sum(log.get("loss", 0) for log in logging_outputs)) + nll_loss = utils.item(sum(log.get("nll_loss", 0) for log in logging_outputs)) + + metrics.log_scalar( + "loss", loss / sample_size / math.log(2), sample_size, round=3 + ) + metrics.log_scalar( + "nll_loss", nll_loss / sample_size / math.log(2), sample_size, round=3 + ) + metrics.log_derived( + "ppl", lambda meters: utils.get_perplexity(meters["loss"].avg) + ) + + for key in logging_outputs[0]: + if key[-5:] == "-loss": + val = sum(log.get(key, 0) for log in logging_outputs) + metrics.log_scalar( + key[:-5], + val / sample_size / math.log(2) if sample_size > 0 else 0.0, + sample_size, + round=3, + ) + + @staticmethod + def logging_outputs_can_be_summed() -> bool: + """ + Whether the logging outputs returned by `forward` can be summed + across workers prior to calling `reduce_metrics`. Setting this + to True will improves distributed training speed. + """ + return True diff --git a/SpeechT5/fairseq/fairseq/criterions/sentence_prediction.py b/SpeechT5/fairseq/fairseq/criterions/sentence_prediction.py new file mode 100644 index 0000000000000000000000000000000000000000..9519fdc56d7de86b727f74ef5b18db520382e562 --- /dev/null +++ b/SpeechT5/fairseq/fairseq/criterions/sentence_prediction.py @@ -0,0 +1,99 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. 
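As a worked illustration of the smoothing performed in `_compute_loss` above (shapes and an epsilon of 0.1 are assumed), the hard-target branch mixes the mean NLL with the negative mean log-probability over the whole output distribution:

```python
import torch
import torch.nn.functional as F

eps = 0.1                            # label smoothing strength (assumed)
outputs = torch.randn(5, 8)          # already masked and flattened: positions x vocab
targets = torch.randint(0, 8, (5,))  # hard targets

logits = F.log_softmax(outputs, dim=-1)
nll_loss = F.nll_loss(logits, targets, reduction="none").float().mean()
loss = nll_loss * (1 - eps) - logits.float().mean() * eps
```

The second term rewards spreading probability mass (the mean log-probability is largest under a uniform distribution), which plays the same role as the `eps_i * smooth_loss` term of `label_smoothed_nll_loss` earlier in this diff; the soft-label branch replaces the NLL with a KL divergence against the soft targets, and the result is finally multiplied by each output's `factor`.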
+ +import math + +import torch +import torch.nn.functional as F +from fairseq import metrics, utils +from fairseq.criterions import FairseqCriterion, register_criterion + + +@register_criterion("sentence_prediction") +class SentencePredictionCriterion(FairseqCriterion): + def __init__(self, task, classification_head_name, regression_target): + super().__init__(task) + self.classification_head_name = classification_head_name + self.regression_target = regression_target + + @staticmethod + def add_args(parser): + # fmt: off + parser.add_argument('--classification-head-name', + default='sentence_classification_head', + help='name of the classification head to use') + # fmt: on + + def forward(self, model, sample, reduce=True): + """Compute the loss for the given sample. + + Returns a tuple with three elements: + 1) the loss + 2) the sample size, which is used as the denominator for the gradient + 3) logging outputs to display while training + """ + assert ( + hasattr(model, "classification_heads") + and self.classification_head_name in model.classification_heads + ), "model must provide sentence classification head for --criterion=sentence_prediction" + + logits, _ = model( + **sample["net_input"], + features_only=True, + classification_head_name=self.classification_head_name, + ) + targets = model.get_targets(sample, [logits]).view(-1) + sample_size = targets.numel() + + if not self.regression_target: + lprobs = F.log_softmax(logits, dim=-1, dtype=torch.float32) + loss = F.nll_loss(lprobs, targets, reduction="sum") + else: + logits = logits.view(-1).float() + targets = targets.float() + loss = F.mse_loss(logits, targets, reduction="sum") + + logging_output = { + "loss": loss.data, + "ntokens": sample["ntokens"], + "nsentences": sample_size, + "sample_size": sample_size, + } + if not self.regression_target: + preds = logits.argmax(dim=1) + logging_output["ncorrect"] = (preds == targets).sum() + + return loss, sample_size, logging_output + + @staticmethod + def reduce_metrics(logging_outputs) -> None: + """Aggregate logging outputs from data parallel training.""" + loss_sum = sum(log.get("loss", 0) for log in logging_outputs) + ntokens = sum(log.get("ntokens", 0) for log in logging_outputs) + nsentences = sum(log.get("nsentences", 0) for log in logging_outputs) + sample_size = sum(log.get("sample_size", 0) for log in logging_outputs) + + metrics.log_scalar( + "loss", loss_sum / sample_size / math.log(2), sample_size, round=3 + ) + if sample_size != ntokens: + metrics.log_scalar( + "nll_loss", loss_sum / ntokens / math.log(2), ntokens, round=3 + ) + + if len(logging_outputs) > 0 and "ncorrect" in logging_outputs[0]: + ncorrect = sum(log.get("ncorrect", 0) for log in logging_outputs) + metrics.log_scalar( + "accuracy", 100.0 * ncorrect / nsentences, nsentences, round=1 + ) + + @staticmethod + def logging_outputs_can_be_summed() -> bool: + """ + Whether the logging outputs returned by `forward` can be summed + across workers prior to calling `reduce_metrics`. Setting this + to True will improves distributed training speed. + """ + return True diff --git a/SpeechT5/fairseq/fairseq/criterions/sentence_ranking.py b/SpeechT5/fairseq/fairseq/criterions/sentence_ranking.py new file mode 100644 index 0000000000000000000000000000000000000000..d4c76341d4d87e6d0da21ac89e833ce0bda13a0c --- /dev/null +++ b/SpeechT5/fairseq/fairseq/criterions/sentence_ranking.py @@ -0,0 +1,120 @@ +# Copyright (c) Facebook, Inc. and its affiliates. 
+# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +import math + +import torch +import torch.nn.functional as F +from fairseq import metrics, utils +from fairseq.criterions import FairseqCriterion, register_criterion + + +@register_criterion("sentence_ranking") +class SentenceRankingCriterion(FairseqCriterion): + def __init__(self, task, ranking_head_name, save_predictions, num_classes): + super().__init__(task) + self.ranking_head_name = ranking_head_name + if save_predictions is not None: + self.prediction_h = open(save_predictions, "w") + else: + self.prediction_h = None + self.num_classes = num_classes + + def __del__(self): + if self.prediction_h is not None: + self.prediction_h.close() + + @staticmethod + def add_args(parser): + # fmt: off + parser.add_argument('--save-predictions', metavar='FILE', + help='file to save predictions to') + parser.add_argument('--ranking-head-name', + default='sentence_classification_head', + help='name of the ranking head to use') + # fmt: on + + def forward(self, model, sample, reduce=True): + """Compute ranking loss for the given sample. + + Returns a tuple with three elements: + 1) the loss + 2) the sample size, which is used as the denominator for the gradient + 3) logging outputs to display while training + """ + assert ( + hasattr(model, "classification_heads") + and self.ranking_head_name in model.classification_heads + ), "model must provide sentence ranking head for --criterion=sentence_ranking" + + scores = [] + for idx in range(self.num_classes): + score, _ = model( + **sample["net_input{idx}".format(idx=idx + 1)], + classification_head_name=self.ranking_head_name, + ) + scores.append(score) + + logits = torch.cat(scores, dim=1) + sample_size = logits.size(0) + + if "target" in sample: + targets = model.get_targets(sample, [logits]).view(-1) + lprobs = F.log_softmax(logits, dim=-1, dtype=torch.float32) + loss = F.nll_loss(lprobs, targets, reduction="sum") + else: + targets = None + loss = torch.tensor(0.0, requires_grad=True) + + if self.prediction_h is not None: + preds = logits.argmax(dim=1) + for i, (id, pred) in enumerate(zip(sample["id"].tolist(), preds.tolist())): + if targets is not None: + label = targets[i].item() + print("{}\t{}\t{}".format(id, pred, label), file=self.prediction_h) + else: + print("{}\t{}".format(id, pred), file=self.prediction_h) + + logging_output = { + "loss": loss.data, + "ntokens": sample["ntokens"], + "nsentences": sample_size, + "sample_size": sample_size, + } + if targets is not None: + logging_output["ncorrect"] = (logits.argmax(dim=1) == targets).sum() + + return loss, sample_size, logging_output + + @staticmethod + def reduce_metrics(logging_outputs) -> None: + """Aggregate logging outputs from data parallel training.""" + loss_sum = sum(log.get("loss", 0) for log in logging_outputs) + ntokens = sum(log.get("ntokens", 0) for log in logging_outputs) + nsentences = sum(log.get("nsentences", 0) for log in logging_outputs) + sample_size = sum(log.get("sample_size", 0) for log in logging_outputs) + + metrics.log_scalar( + "loss", loss_sum / sample_size / math.log(2), sample_size, round=3 + ) + if sample_size != ntokens: + metrics.log_scalar( + "nll_loss", loss_sum / ntokens / math.log(2), ntokens, round=3 + ) + + if len(logging_outputs) > 0 and "ncorrect" in logging_outputs[0]: + ncorrect = sum(log.get("ncorrect", 0) for log in logging_outputs) + metrics.log_scalar( + "accuracy", 100.0 * ncorrect / nsentences, nsentences, round=1 
+ ) + + @staticmethod + def logging_outputs_can_be_summed() -> bool: + """ + Whether the logging outputs returned by `forward` can be summed + across workers prior to calling `reduce_metrics`. Setting this + to True will improves distributed training speed. + """ + return True diff --git a/SpeechT5/fairseq/fairseq/criterions/wav2vec_criterion.py b/SpeechT5/fairseq/fairseq/criterions/wav2vec_criterion.py new file mode 100644 index 0000000000000000000000000000000000000000..e04786cc3b75517cefd06303f98f8536f9279311 --- /dev/null +++ b/SpeechT5/fairseq/fairseq/criterions/wav2vec_criterion.py @@ -0,0 +1,229 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +import math +from dataclasses import dataclass, field +from typing import List, Optional + +import torch +import torch.nn.functional as F +from fairseq import metrics, utils +from fairseq.criterions import FairseqCriterion, register_criterion +from fairseq.dataclass import FairseqDataclass +from fairseq.logging.meters import safe_round +from fairseq.utils import is_xla_tensor + + +@dataclass +class Wav2VecCriterionConfig(FairseqDataclass): + infonce: bool = field( + default=False, + metadata={ + "help": "if set, uses cross entropy instead of binary cross entropy (i.e. InfoNCE loss)" + }, + ) + loss_weights: Optional[List[float]] = field( + default=None, + metadata={"help": "weights for additional loss terms (not first one)"}, + ) + log_keys: List[str] = field( + default_factory=lambda: [], + metadata={"help": "output keys to log"}, + ) + +@register_criterion("wav2vec", dataclass=Wav2VecCriterionConfig) +class Wav2vecCriterion(FairseqCriterion): + def __init__(self, task, infonce=False, loss_weights=None, log_keys=None): + super().__init__(task) + self.infonce = infonce + self.loss_weights = loss_weights + self.log_keys = [] if log_keys is None else log_keys + + def forward(self, model, sample, reduce=True): + """Compute the loss for the given sample. + + Returns a tuple with three elements: + 1) the loss + 2) the sample size, which is used as the denominator for the gradient + 3) logging outputs to display while training + """ + net_output = model(**sample["net_input"]) + logits = model.get_logits(net_output).float() + target = model.get_targets(sample, net_output) + self.xla = is_xla_tensor(logits) + + # XXX: handle weights on xla. + weights = None + if hasattr(model, "get_target_weights") and not self.infonce: + weights = model.get_target_weights(target, net_output) + if torch.is_tensor(weights): + weights = weights.float() + + losses = [] + + reduction = "none" if ((not reduce) or self.xla) else "sum" + if self.infonce: + loss = F.cross_entropy(logits, target, reduction=reduction) + else: + loss = F.binary_cross_entropy_with_logits( + logits, target.float(), weights, reduction=reduction + ) + + if self.xla: + # tpu-comment: since dynamic shapes lead to recompilations on xla, + # we don't shrink tensors using mask_indices. + # Instead, we use mask indices to adjust loss. 
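+            # (Equivalently: rather than indexing out only the masked positions, which
+            #  would change tensor shapes between steps, the per-position losses are kept
+            #  at full length and multiplied below by the 0/1 mask, so positions that were
+            #  never masked contribute nothing to the final sum.)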
+ mi = ( + sample['net_input']['mask_indices'] + .transpose(0, 1) # logits are transposed in `model.get_logits` + .reshape(logits.size(0)) + ) + loss = (loss * mi).sum() if reduce else (loss * mi) + + if 'sample_size' in sample: + sample_size = sample['sample_size'] + elif 'mask_indices' in sample['net_input']: + sample_size = sample['net_input']['mask_indices'].sum() + else: + sample_size = target.numel() if self.infonce else target.long().sum().item() + losses.append(loss.detach().clone()) + + if self.loss_weights is not None: + assert hasattr(model, "get_extra_losses") + extra_losses = model.get_extra_losses(net_output) + if torch.is_tensor(extra_losses): + extra_losses = [extra_losses] + if len(self.loss_weights) == 1 and len(extra_losses) != 1: + self.loss_weights = [self.loss_weights[0]] * len(extra_losses) + assert len(extra_losses) == len( + self.loss_weights + ), f"{len(extra_losses)}, {len(self.loss_weights)}" + for p, coef in zip(extra_losses, self.loss_weights): + if coef != 0 and p is not None: + p = coef * p.float() * sample_size + loss += p + losses.append(p) + + logging_output = { + "loss": loss.item() if (reduce and not self.xla) else loss.detach(), + "ntokens": sample_size, + "nsentences": sample["id"].numel(), + "sample_size": sample_size, + } + + for lk in self.log_keys: + # Only store "logits" and "target" for computing MAP and MAUC + # during validation + if lk == "logits": + if not self.training: + logging_output["logits"] = logits.cpu().numpy() + elif lk == "target": + if not self.training: + # If the targets have been mixed with the predictions of + # teacher models, find the original targets + if hasattr(model, "get_original_targets"): + original_target = model.get_original_targets(sample, net_output) + else: + original_target = target + logging_output["target"] = original_target.cpu().numpy() + elif lk in net_output: + value = net_output[lk] + if not is_xla_tensor(value): + value = float(value) + logging_output[lk] = value + + if len(losses) > 1: + for i, l in enumerate(losses): + logging_output[f"loss_{i}"] = l.item() if not self.xla else l.detach() + + if self.infonce: + with torch.no_grad(): + if logits.numel() == 0: + corr = 0 + count = 0 + else: + assert logits.dim() > 1, logits.shape + max = logits.argmax(-1) == 0 + min = logits.argmin(-1) == 0 + if is_xla_tensor(logits): + max, min = max * mi, min * mi + both = max & min + corr = max.long().sum() - both.long().sum() + count = mi.sum() + else: + both = max & min + corr = max.long().sum().item() - both.long().sum().item() + count = float(max.numel()) + + logging_output["correct"] = corr + logging_output["count"] = count + + return loss, sample_size, logging_output + + @staticmethod + def reduce_metrics(logging_outputs) -> None: + """Aggregate logging outputs from data parallel training.""" + loss_sum = utils.item(sum(log.get("loss", 0) for log in logging_outputs)) + ntokens = utils.item(sum(log.get("ntokens", 0) for log in logging_outputs)) + nsentences = utils.item( + sum(log.get("nsentences", 0) for log in logging_outputs) + ) + sample_size = utils.item( + sum(log.get("sample_size", 0) for log in logging_outputs) + ) + + metrics.log_scalar( + "loss", loss_sum / (sample_size or 1) / math.log(2), sample_size, round=3 + ) + metrics.log_scalar("ntokens", ntokens) + metrics.log_scalar("nsentences", nsentences) + + correct = sum(log.get("correct", 0) for log in logging_outputs) + metrics.log_scalar("_correct", correct) + + total = sum(log.get("count", 0) for log in logging_outputs) + metrics.log_scalar("_total", 
total) + + if total > 0: + metrics.log_derived( + "accuracy", + lambda meters: safe_round( + meters["_correct"].sum / meters["_total"].sum, 5 + ) + if meters["_total"].sum > 0 + else float("nan"), + ) + + builtin_keys = { + "loss", + "ntokens", + "nsentences", + "sample_size", + "correct", + "count", + } + + for k in logging_outputs[0]: + if k not in builtin_keys: + val = sum(log.get(k, 0) for log in logging_outputs) + if k.startswith("loss"): + metrics.log_scalar( + k, val / (sample_size or 1) / math.log(2), sample_size, round=3 + ) + else: + metrics.log_scalar(k, val / len(logging_outputs), round=3) + + # FIXME: revert when gather based xla reduction is implemented + #@staticmethod + #def logging_outputs_can_be_summed() -> bool: + def logging_outputs_can_be_summed(self) -> bool: + """ + Whether the logging outputs returned by `forward` can be summed + across workers prior to calling `reduce_metrics`. Setting this + to True will improves distributed training speed. + """ + # XXX: Gather based reduction not implemented for xla yet. + # So we fall to sum based reduction for xla. + return self.xla diff --git a/SpeechT5/fairseq/fairseq/data/__init__.py b/SpeechT5/fairseq/fairseq/data/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..8b7eb2ec4fc5190c4dcdfe34b0259e6f448e18a9 --- /dev/null +++ b/SpeechT5/fairseq/fairseq/data/__init__.py @@ -0,0 +1,128 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. +"""isort:skip_file""" + +from .dictionary import Dictionary, TruncatedDictionary + +from .fairseq_dataset import FairseqDataset, FairseqIterableDataset + +from .base_wrapper_dataset import BaseWrapperDataset + +from .add_target_dataset import AddTargetDataset +from .append_token_dataset import AppendTokenDataset +from .audio.raw_audio_dataset import BinarizedAudioDataset, FileAudioDataset +from .audio.hubert_dataset import HubertDataset +from .backtranslation_dataset import BacktranslationDataset +from .bucket_pad_length_dataset import BucketPadLengthDataset +from .colorize_dataset import ColorizeDataset +from .concat_dataset import ConcatDataset +from .concat_sentences_dataset import ConcatSentencesDataset +from .denoising_dataset import DenoisingDataset +from .id_dataset import IdDataset +from .indexed_dataset import ( + IndexedCachedDataset, + IndexedDataset, + IndexedRawTextDataset, + MMapIndexedDataset, +) +from .language_pair_dataset import LanguagePairDataset +from .list_dataset import ListDataset +from .lm_context_window_dataset import LMContextWindowDataset +from .lru_cache_dataset import LRUCacheDataset +from .mask_tokens_dataset import MaskTokensDataset +from .monolingual_dataset import MonolingualDataset +from .multi_corpus_sampled_dataset import MultiCorpusSampledDataset +from .nested_dictionary_dataset import NestedDictionaryDataset +from .noising import NoisingDataset +from .numel_dataset import NumelDataset +from .num_samples_dataset import NumSamplesDataset +from .offset_tokens_dataset import OffsetTokensDataset +from .pad_dataset import LeftPadDataset, PadDataset, RightPadDataset +from .prepend_dataset import PrependDataset +from .prepend_token_dataset import PrependTokenDataset +from .raw_label_dataset import RawLabelDataset +from .replace_dataset import ReplaceDataset +from .resampling_dataset import ResamplingDataset +from .roll_dataset import RollDataset +from .round_robin_zip_datasets import RoundRobinZipDatasets 
+from .sort_dataset import SortDataset +from .strip_token_dataset import StripTokenDataset +from .subsample_dataset import SubsampleDataset +from .token_block_dataset import TokenBlockDataset +from .transform_eos_dataset import TransformEosDataset +from .transform_eos_lang_pair_dataset import TransformEosLangPairDataset +from .shorten_dataset import TruncateDataset, RandomCropDataset +from .multilingual.sampled_multi_dataset import SampledMultiDataset +from .multilingual.sampled_multi_epoch_dataset import SampledMultiEpochDataset +from .fasta_dataset import FastaDataset, EncodedFastaDataset + +from .iterators import ( + CountingIterator, + EpochBatchIterator, + GroupedIterator, + ShardedIterator, +) + +__all__ = [ + "AddTargetDataset", + "AppendTokenDataset", + "BacktranslationDataset", + "BaseWrapperDataset", + "BinarizedAudioDataset", + "BucketPadLengthDataset", + "ColorizeDataset", + "ConcatDataset", + "ConcatSentencesDataset", + "CountingIterator", + "DenoisingDataset", + "Dictionary", + "EncodedFastaDataset", + "EpochBatchIterator", + "FairseqDataset", + "FairseqIterableDataset", + "FastaDataset", + "FileAudioDataset", + "GroupedIterator", + "HubertDataset", + "IdDataset", + "IndexedCachedDataset", + "IndexedDataset", + "IndexedRawTextDataset", + "LanguagePairDataset", + "LeftPadDataset", + "ListDataset", + "LMContextWindowDataset", + "LRUCacheDataset", + "MaskTokensDataset", + "MMapIndexedDataset", + "MonolingualDataset", + "MultiCorpusSampledDataset", + "NestedDictionaryDataset", + "NoisingDataset", + "NumelDataset", + "NumSamplesDataset", + "OffsetTokensDataset", + "PadDataset", + "PrependDataset", + "PrependTokenDataset", + "RandomCropDataset", + "RawLabelDataset", + "ResamplingDataset", + "ReplaceDataset", + "RightPadDataset", + "RollDataset", + "RoundRobinZipDatasets", + "SampledMultiDataset", + "SampledMultiEpochDataset", + "ShardedIterator", + "SortDataset", + "StripTokenDataset", + "SubsampleDataset", + "TokenBlockDataset", + "TransformEosDataset", + "TransformEosLangPairDataset", + "TruncateDataset", + "TruncatedDictionary", +] diff --git a/SpeechT5/fairseq/fairseq/data/add_target_dataset.py b/SpeechT5/fairseq/fairseq/data/add_target_dataset.py new file mode 100644 index 0000000000000000000000000000000000000000..9ef467058b89d9d74f703acbe5b45cb5ef9b2b69 --- /dev/null +++ b/SpeechT5/fairseq/fairseq/data/add_target_dataset.py @@ -0,0 +1,70 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +import torch + +from . 
import BaseWrapperDataset, data_utils + + +class AddTargetDataset(BaseWrapperDataset): + def __init__( + self, + dataset, + labels, + pad, + eos, + batch_targets, + process_label=None, + add_to_input=False, + ): + super().__init__(dataset) + self.labels = labels + self.batch_targets = batch_targets + self.pad = pad + self.eos = eos + self.process_label = process_label + self.add_to_input = add_to_input + + def get_label(self, index): + return ( + self.labels[index] + if self.process_label is None + else self.process_label(self.labels[index]) + ) + + def __getitem__(self, index): + item = self.dataset[index] + item["label"] = self.get_label(index) + return item + + def size(self, index): + sz = self.dataset.size(index) + own_sz = len(self.get_label(index)) + return (sz, own_sz) + + def collater(self, samples): + collated = self.dataset.collater(samples) + if len(collated) == 0: + return collated + indices = set(collated["id"].tolist()) + target = [s["label"] for s in samples if s["id"] in indices] + + if self.batch_targets: + collated["target_lengths"] = torch.LongTensor([len(t) for t in target]) + target = data_utils.collate_tokens(target, pad_idx=self.pad, left_pad=False) + collated["ntokens"] = collated["target_lengths"].sum().item() + else: + collated["ntokens"] = sum([len(t) for t in target]) + + collated["target"] = target + + if self.add_to_input: + eos = target.new_full((target.size(0), 1), self.eos) + collated["target"] = torch.cat([target, eos], dim=-1).long() + collated["net_input"]["prev_output_tokens"] = torch.cat( + [eos, target], dim=-1 + ).long() + collated["ntokens"] += target.size(0) + return collated diff --git a/SpeechT5/fairseq/fairseq/data/append_token_dataset.py b/SpeechT5/fairseq/fairseq/data/append_token_dataset.py new file mode 100644 index 0000000000000000000000000000000000000000..87695bd0f5fcb6b10247e3b743340623e6438cc1 --- /dev/null +++ b/SpeechT5/fairseq/fairseq/data/append_token_dataset.py @@ -0,0 +1,41 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +import numpy as np +import torch + +from . 
import BaseWrapperDataset + + +class AppendTokenDataset(BaseWrapperDataset): + def __init__(self, dataset, token=None): + super().__init__(dataset) + self.token = token + if token is not None: + self._sizes = np.array(dataset.sizes) + 1 + else: + self._sizes = dataset.sizes + + def __getitem__(self, idx): + item = self.dataset[idx] + if self.token is not None: + item = torch.cat([item, item.new([self.token])]) + return item + + @property + def sizes(self): + return self._sizes + + def num_tokens(self, index): + n = self.dataset.num_tokens(index) + if self.token is not None: + n += 1 + return n + + def size(self, index): + n = self.dataset.size(index) + if self.token is not None: + n += 1 + return n diff --git a/SpeechT5/fairseq/fairseq/data/audio/__init__.py b/SpeechT5/fairseq/fairseq/data/audio/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 diff --git a/SpeechT5/fairseq/fairseq/data/audio/audio_utils.py b/SpeechT5/fairseq/fairseq/data/audio/audio_utils.py new file mode 100644 index 0000000000000000000000000000000000000000..f51cb0cddc29c732a4573b7b3e915844a80cd2f3 --- /dev/null +++ b/SpeechT5/fairseq/fairseq/data/audio/audio_utils.py @@ -0,0 +1,174 @@ +from pathlib import Path +from typing import BinaryIO, Optional, Tuple, Union, List + +import numpy as np +import torch + + +SF_AUDIO_FILE_EXTENSIONS = {".wav", ".flac", ".ogg"} +FEATURE_OR_SF_AUDIO_FILE_EXTENSIONS = {".npy", ".wav", ".flac", ".ogg"} + + +def _convert_to_mono( + waveform: torch.FloatTensor, sample_rate: int +) -> torch.FloatTensor: + if waveform.shape[0] > 1: + try: + import torchaudio.sox_effects as ta_sox + except ImportError: + raise ImportError( + "Please install torchaudio to convert multi-channel audios" + ) + effects = [['channels', '1']] + return ta_sox.apply_effects_tensor(waveform, sample_rate, effects)[0] + return waveform + + +def convert_to_mono(waveform: np.ndarray, sample_rate: int) -> np.ndarray: + if waveform.shape[0] > 1: + _waveform = torch.from_numpy(waveform) + return _convert_to_mono(_waveform, sample_rate).numpy() + return waveform + + +def get_waveform( + path_or_fp: Union[str, BinaryIO], normalization=True, mono=True, + frames=-1, start=0, always_2d=True +) -> Tuple[np.ndarray, int]: + """Get the waveform and sample rate of a 16-bit WAV/FLAC/OGG Vorbis audio. + + Args: + path_or_fp (str or BinaryIO): the path or file-like object + normalization (bool): Normalize values to [-1, 1] (Default: True) + mono (bool): convert multi-channel audio to mono-channel one + frames (int): the number of frames to read. (-1 for reading all) + start (int): Where to start reading. A negative value counts from the end. 
+ always_2d (bool): always return 2D array even for mono-channel audios + Returns: + waveform (numpy.ndarray): 1D or 2D waveform (channels x length) + sample_rate (float): sample rate + """ + if isinstance(path_or_fp, str): + ext = Path(path_or_fp).suffix + if ext not in SF_AUDIO_FILE_EXTENSIONS: + raise ValueError(f"Unsupported audio format: {ext}") + + try: + import soundfile as sf + except ImportError: + raise ImportError( + "Please install soundfile to load WAV/FLAC/OGG Vorbis audios" + ) + + waveform, sample_rate = sf.read( + path_or_fp, dtype="float32", always_2d=True, frames=frames, start=start + ) + waveform = waveform.T # T x C -> C x T + if mono and waveform.shape[0] > 1: + waveform = convert_to_mono(waveform, sample_rate) + if not normalization: + waveform *= 2 ** 15 # denormalized to 16-bit signed integers + if not always_2d: + waveform = waveform.squeeze(axis=0) + return waveform, sample_rate + + +def _get_kaldi_fbank( + waveform: np.ndarray, sample_rate: int, n_bins=80 +) -> Optional[np.ndarray]: + """Get mel-filter bank features via PyKaldi.""" + try: + from kaldi.feat.mel import MelBanksOptions + from kaldi.feat.fbank import FbankOptions, Fbank + from kaldi.feat.window import FrameExtractionOptions + from kaldi.matrix import Vector + + mel_opts = MelBanksOptions() + mel_opts.num_bins = n_bins + frame_opts = FrameExtractionOptions() + frame_opts.samp_freq = sample_rate + opts = FbankOptions() + opts.mel_opts = mel_opts + opts.frame_opts = frame_opts + fbank = Fbank(opts=opts) + features = fbank.compute(Vector(waveform.squeeze()), 1.0).numpy() + return features + except ImportError: + return None + + +def _get_torchaudio_fbank( + waveform: np.ndarray, sample_rate, n_bins=80 +) -> Optional[np.ndarray]: + """Get mel-filter bank features via TorchAudio.""" + try: + import torchaudio.compliance.kaldi as ta_kaldi + waveform = torch.from_numpy(waveform) + features = ta_kaldi.fbank( + waveform, num_mel_bins=n_bins, sample_frequency=sample_rate + ) + return features.numpy() + except ImportError: + return None + + +def get_fbank(path_or_fp: Union[str, BinaryIO], n_bins=80) -> np.ndarray: + """Get mel-filter bank features via PyKaldi or TorchAudio. Prefer PyKaldi + (faster CPP implementation) to TorchAudio (Python implementation). Note that + Kaldi/TorchAudio requires 16-bit signed integers as inputs and hence the + waveform should not be normalized.""" + waveform, sample_rate = get_waveform(path_or_fp, normalization=False) + + features = _get_kaldi_fbank(waveform, sample_rate, n_bins) + if features is None: + features = _get_torchaudio_fbank(waveform, sample_rate, n_bins) + if features is None: + raise ImportError( + "Please install pyKaldi or torchaudio to enable " + "online filterbank feature extraction" + ) + + return features + + +def is_npy_data(data: bytes) -> bool: + return data[0] == 147 and data[1] == 78 + + +def is_sf_audio_data(data: bytes) -> bool: + is_wav = (data[0] == 82 and data[1] == 73 and data[2] == 70) + is_flac = (data[0] == 102 and data[1] == 76 and data[2] == 97) + is_ogg = (data[0] == 79 and data[1] == 103 and data[2] == 103) + return is_wav or is_flac or is_ogg + + +def read_from_stored_zip(zip_path: str, offset: int, file_size: int) -> bytes: + with open(zip_path, "rb") as f: + f.seek(offset) + data = f.read(file_size) + return data + + +def parse_path(path: str) -> Tuple[str, List[int]]: + """Parse data path which is either a path to + 1. a .npy/.wav/.flac/.ogg file + 2. 
a stored ZIP file with slicing info: "[zip_path]:[offset]:[length]" + + Args: + path (str): the data path to parse + + Returns: + file_path (str): the file path + slice_ptr (list of int): empty in case 1; + byte offset and length for the slice in case 2 + """ + + if Path(path).suffix in FEATURE_OR_SF_AUDIO_FILE_EXTENSIONS: + _path, slice_ptr = path, [] + else: + _path, *slice_ptr = path.split(":") + if not Path(_path).is_file(): + raise FileNotFoundError(f"File not found: {_path}") + assert len(slice_ptr) in {0, 2}, f"Invalid path: {path}" + slice_ptr = [int(i) for i in slice_ptr] + return _path, slice_ptr diff --git a/SpeechT5/fairseq/fairseq/data/audio/feature_transforms/__init__.py b/SpeechT5/fairseq/fairseq/data/audio/feature_transforms/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..359fa069716cba0dd615ce0959368b20828c31f7 --- /dev/null +++ b/SpeechT5/fairseq/fairseq/data/audio/feature_transforms/__init__.py @@ -0,0 +1,82 @@ +import importlib +import os +from abc import ABC, abstractmethod +from typing import Dict, Optional + + +class AudioFeatureTransform(ABC): + @classmethod + @abstractmethod + def from_config_dict(cls, config: Optional[Dict] = None): + pass + + +AUDIO_FEATURE_TRANSFORM_REGISTRY = {} +AUDIO_FEATURE_TRANSFORM_CLASS_NAMES = set() + + +def register_audio_feature_transform(name): + def register_audio_feature_transform_cls(cls): + if name in AUDIO_FEATURE_TRANSFORM_REGISTRY: + raise ValueError(f"Cannot register duplicate transform ({name})") + if not issubclass(cls, AudioFeatureTransform): + raise ValueError( + f"Transform ({name}: {cls.__name__}) must extend " + "AudioFeatureTransform" + ) + if cls.__name__ in AUDIO_FEATURE_TRANSFORM_CLASS_NAMES: + raise ValueError( + f"Cannot register audio feature transform with duplicate " + f"class name ({cls.__name__})" + ) + AUDIO_FEATURE_TRANSFORM_REGISTRY[name] = cls + AUDIO_FEATURE_TRANSFORM_CLASS_NAMES.add(cls.__name__) + return cls + + return register_audio_feature_transform_cls + + +def get_audio_feature_transform(name): + return AUDIO_FEATURE_TRANSFORM_REGISTRY[name] + + +transforms_dir = os.path.dirname(__file__) +for file in os.listdir(transforms_dir): + path = os.path.join(transforms_dir, file) + if ( + not file.startswith("_") + and not file.startswith(".") + and (file.endswith(".py") or os.path.isdir(path)) + ): + name = file[: file.find(".py")] if file.endswith(".py") else file + importlib.import_module("fairseq.data.audio.feature_transforms." 
+ name) + + +class CompositeAudioFeatureTransform(AudioFeatureTransform): + @classmethod + def from_config_dict(cls, config=None): + _config = {} if config is None else config + _transforms = _config.get("transforms") + if _transforms is None: + return None + transforms = [ + get_audio_feature_transform(_t).from_config_dict(_config.get(_t)) + for _t in _transforms + ] + return CompositeAudioFeatureTransform(transforms) + + def __init__(self, transforms): + self.transforms = [t for t in transforms if t is not None] + + def __call__(self, x): + for t in self.transforms: + x = t(x) + return x + + def __repr__(self): + format_string = ( + [self.__class__.__name__ + "("] + + [f" {t.__repr__()}" for t in self.transforms] + + [")"] + ) + return "\n".join(format_string) diff --git a/SpeechT5/fairseq/fairseq/data/audio/feature_transforms/global_cmvn.py b/SpeechT5/fairseq/fairseq/data/audio/feature_transforms/global_cmvn.py new file mode 100644 index 0000000000000000000000000000000000000000..e457ff176fee3b996da11f47e7dc61b81c445ba3 --- /dev/null +++ b/SpeechT5/fairseq/fairseq/data/audio/feature_transforms/global_cmvn.py @@ -0,0 +1,29 @@ +import numpy as np +from fairseq.data.audio.feature_transforms import ( + AudioFeatureTransform, + register_audio_feature_transform, +) + + +@register_audio_feature_transform("global_cmvn") +class GlobalCMVN(AudioFeatureTransform): + """Global CMVN (cepstral mean and variance normalization). The global mean + and variance need to be pre-computed and stored in NumPy format (.npz).""" + + @classmethod + def from_config_dict(cls, config=None): + _config = {} if config is None else config + return GlobalCMVN(_config.get("stats_npz_path")) + + def __init__(self, stats_npz_path): + self.stats_npz_path = stats_npz_path + stats = np.load(stats_npz_path) + self.mean, self.std = stats["mean"], stats["std"] + + def __repr__(self): + return self.__class__.__name__ + f'(stats_npz_path="{self.stats_npz_path}")' + + def __call__(self, x): + x = np.subtract(x, self.mean) + x = np.divide(x, self.std) + return x diff --git a/SpeechT5/fairseq/fairseq/data/audio/feature_transforms/specaugment.py b/SpeechT5/fairseq/fairseq/data/audio/feature_transforms/specaugment.py new file mode 100644 index 0000000000000000000000000000000000000000..ce5802b41a903ea8f3e3e8a169d5048b4e908f99 --- /dev/null +++ b/SpeechT5/fairseq/fairseq/data/audio/feature_transforms/specaugment.py @@ -0,0 +1,131 @@ +import math +import numbers +from typing import Optional + +import numpy as np +from fairseq.data.audio.feature_transforms import ( + AudioFeatureTransform, + register_audio_feature_transform, +) + + +@register_audio_feature_transform("specaugment") +class SpecAugmentTransform(AudioFeatureTransform): + """SpecAugment (https://arxiv.org/abs/1904.08779)""" + + @classmethod + def from_config_dict(cls, config=None): + _config = {} if config is None else config + return SpecAugmentTransform( + _config.get("time_warp_W", 0), + _config.get("freq_mask_N", 0), + _config.get("freq_mask_F", 0), + _config.get("time_mask_N", 0), + _config.get("time_mask_T", 0), + _config.get("time_mask_p", 0.0), + _config.get("mask_value", None), + ) + + def __init__( + self, + time_warp_w: int = 0, + freq_mask_n: int = 0, + freq_mask_f: int = 0, + time_mask_n: int = 0, + time_mask_t: int = 0, + time_mask_p: float = 0.0, + mask_value: Optional[float] = 0.0, + ): + # Sanity checks + assert mask_value is None or isinstance( + mask_value, numbers.Number + ), f"mask_value (type: {type(mask_value)}) must be None or a number" + if freq_mask_n > 
0: + assert freq_mask_f > 0, ( + f"freq_mask_F ({freq_mask_f}) " + f"must be larger than 0 when doing freq masking." + ) + if time_mask_n > 0: + assert time_mask_t > 0, ( + f"time_mask_T ({time_mask_t}) must be larger than 0 when " + f"doing time masking." + ) + + self.time_warp_w = time_warp_w + self.freq_mask_n = freq_mask_n + self.freq_mask_f = freq_mask_f + self.time_mask_n = time_mask_n + self.time_mask_t = time_mask_t + self.time_mask_p = time_mask_p + self.mask_value = mask_value + + def __repr__(self): + return ( + self.__class__.__name__ + + "(" + + ", ".join( + [ + f"time_warp_w={self.time_warp_w}", + f"freq_mask_n={self.freq_mask_n}", + f"freq_mask_f={self.freq_mask_f}", + f"time_mask_n={self.time_mask_n}", + f"time_mask_t={self.time_mask_t}", + f"time_mask_p={self.time_mask_p}", + ] + ) + + ")" + ) + + def __call__(self, spectrogram): + assert len(spectrogram.shape) == 2, "spectrogram must be a 2-D tensor." + + distorted = spectrogram.copy() # make a copy of input spectrogram. + num_frames = spectrogram.shape[0] # or 'tau' in the paper. + num_freqs = spectrogram.shape[1] # or 'miu' in the paper. + mask_value = self.mask_value + + if mask_value is None: # if no value was specified, use local mean. + mask_value = spectrogram.mean() + + if num_frames == 0: + return spectrogram + + if num_freqs < self.freq_mask_f: + return spectrogram + + if self.time_warp_w > 0: + if 2 * self.time_warp_w < num_frames: + import cv2 + + w0 = np.random.randint(self.time_warp_w, num_frames - self.time_warp_w) + w = np.random.randint(-self.time_warp_w + 1, self.time_warp_w) + upper, lower = distorted[:w0, :], distorted[w0:, :] + upper = cv2.resize( + upper, dsize=(num_freqs, w0 + w), interpolation=cv2.INTER_LINEAR + ) + lower = cv2.resize( + lower, + dsize=(num_freqs, num_frames - w0 - w), + interpolation=cv2.INTER_LINEAR, + ) + distorted = np.concatenate((upper, lower), axis=0) + + for _i in range(self.freq_mask_n): + f = np.random.randint(0, self.freq_mask_f) + f0 = np.random.randint(0, num_freqs - f) + if f != 0: + distorted[:, f0 : f0 + f] = mask_value + + max_time_mask_t = min( + self.time_mask_t, math.floor(num_frames * self.time_mask_p) + ) + if max_time_mask_t < 1: + return distorted + + for _i in range(self.time_mask_n): + t = np.random.randint(0, max_time_mask_t) + t0 = np.random.randint(0, num_frames - t) + if t != 0: + distorted[t0 : t0 + t, :] = mask_value + + return distorted diff --git a/SpeechT5/fairseq/fairseq/data/audio/feature_transforms/utterance_cmvn.py b/SpeechT5/fairseq/fairseq/data/audio/feature_transforms/utterance_cmvn.py new file mode 100644 index 0000000000000000000000000000000000000000..6bbd0ae821b42ab693f4141e7c161d6d7cb0b15a --- /dev/null +++ b/SpeechT5/fairseq/fairseq/data/audio/feature_transforms/utterance_cmvn.py @@ -0,0 +1,40 @@ +import numpy as np +from fairseq.data.audio.feature_transforms import ( + AudioFeatureTransform, + register_audio_feature_transform, +) + + +@register_audio_feature_transform("utterance_cmvn") +class UtteranceCMVN(AudioFeatureTransform): + """Utterance-level CMVN (cepstral mean and variance normalization)""" + + @classmethod + def from_config_dict(cls, config=None): + _config = {} if config is None else config + return UtteranceCMVN( + _config.get("norm_means", True), + _config.get("norm_vars", True), + ) + + def __init__(self, norm_means=True, norm_vars=True): + self.norm_means, self.norm_vars = norm_means, norm_vars + + def __repr__(self): + return ( + self.__class__.__name__ + + f"(norm_means={self.norm_means}, norm_vars={self.norm_vars})" 
+ ) + + def __call__(self, x): + mean = x.mean(axis=0) + square_sums = (x ** 2).sum(axis=0) + + if self.norm_means: + x = np.subtract(x, mean) + if self.norm_vars: + var = square_sums / x.shape[0] - mean ** 2 + std = np.sqrt(np.maximum(var, 1e-10)) + x = np.divide(x, std) + + return x diff --git a/SpeechT5/fairseq/fairseq/data/audio/hubert_dataset.py b/SpeechT5/fairseq/fairseq/data/audio/hubert_dataset.py new file mode 100644 index 0000000000000000000000000000000000000000..f00fe301a64a8740ed3ce07e44f6774edb933926 --- /dev/null +++ b/SpeechT5/fairseq/fairseq/data/audio/hubert_dataset.py @@ -0,0 +1,358 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +import itertools +import logging +import os +import sys +from typing import Any, List, Optional, Union + +import numpy as np + +import torch +import torch.nn.functional as F +from fairseq.data import data_utils +from fairseq.data.fairseq_dataset import FairseqDataset + +logger = logging.getLogger(__name__) + + +def load_audio(manifest_path, max_keep, min_keep): + n_long, n_short = 0, 0 + names, inds, sizes = [], [], [] + with open(manifest_path) as f: + root = f.readline().strip() + for ind, line in enumerate(f): + items = line.strip().split("\t") + assert len(items) == 2, line + sz = int(items[1]) + if min_keep is not None and sz < min_keep: + n_short += 1 + elif max_keep is not None and sz > max_keep: + n_long += 1 + else: + names.append(items[0]) + inds.append(ind) + sizes.append(sz) + tot = ind + 1 + logger.info( + ( + f"max_keep={max_keep}, min_keep={min_keep}, " + f"loaded {len(names)}, skipped {n_short} short and {n_long} long, " + f"longest-loaded={max(sizes)}, shortest-loaded={min(sizes)}" + ) + ) + return root, names, inds, tot, sizes + + +def load_label(label_path, inds, tot): + with open(label_path) as f: + labels = [line.rstrip() for line in f] + assert ( + len(labels) == tot + ), f"number of labels does not match ({len(labels)} != {tot})" + labels = [labels[i] for i in inds] + return labels + + +def load_label_offset(label_path, inds, tot): + with open(label_path) as f: + code_lengths = [len(line.encode("utf-8")) for line in f] + assert ( + len(code_lengths) == tot + ), f"number of labels does not match ({len(code_lengths)} != {tot})" + offsets = list(itertools.accumulate([0] + code_lengths)) + offsets = [(offsets[i], offsets[i + 1]) for i in inds] + return offsets + + +def verify_label_lengths( + audio_sizes, + audio_rate, + label_path, + label_rate, + inds, + tot, + tol=0.1, # tolerance in seconds +): + if label_rate < 0: + logger.info(f"{label_path} is sequence label. skipped") + return + + with open(label_path) as f: + lengths = [len(line.rstrip().split()) for line in f] + assert len(lengths) == tot + lengths = [lengths[i] for i in inds] + num_invalid = 0 + for i, ind in enumerate(inds): + dur_from_audio = audio_sizes[i] / audio_rate + dur_from_label = lengths[i] / label_rate + if abs(dur_from_audio - dur_from_label) > tol: + logger.warning( + ( + f"audio and label duration differ too much " + f"(|{dur_from_audio} - {dur_from_label}| > {tol}) " + f"in line {ind+1} of {label_path}. Check if `label_rate` " + f"is correctly set (currently {label_rate}). " + f"num. 
of samples = {audio_sizes[i]}; " + f"label length = {lengths[i]}" + ) + ) + num_invalid += 1 + if num_invalid > 0: + logger.warning( + f"total {num_invalid} (audio, label) pairs with mismatched lengths" + ) + + +class HubertDataset(FairseqDataset): + def __init__( + self, + manifest_path: str, + sample_rate: float, + label_paths: List[str], + label_rates: Union[List[float], float], # -1 for sequence labels + pad_list: List[str], + eos_list: List[str], + label_processors: Optional[List[Any]] = None, + max_keep_sample_size: Optional[int] = None, + min_keep_sample_size: Optional[int] = None, + max_sample_size: Optional[int] = None, + shuffle: bool = True, + pad_audio: bool = False, + normalize: bool = False, + store_labels: bool = True, + random_crop: bool = False, + single_target: bool = False, + ): + self.audio_root, self.audio_names, inds, tot, self.sizes = load_audio( + manifest_path, max_keep_sample_size, min_keep_sample_size + ) + self.sample_rate = sample_rate + self.shuffle = shuffle + self.random_crop = random_crop + + self.num_labels = len(label_paths) + self.pad_list = pad_list + self.eos_list = eos_list + self.label_processors = label_processors + self.single_target = single_target + self.label_rates = ( + [label_rates for _ in range(len(label_paths))] + if isinstance(label_rates, int) + else label_rates + ) + self.store_labels = store_labels + if store_labels: + self.label_list = [load_label(p, inds, tot) for p in label_paths] + else: + self.label_paths = label_paths + self.label_offsets_list = [ + load_label_offset(p, inds, tot) for p in label_paths + ] + assert ( + label_processors is None + or len(label_processors) == self.num_labels + ) + for label_path, label_rate in zip(label_paths, self.label_rates): + verify_label_lengths( + self.sizes, sample_rate, label_path, label_rate, inds, tot + ) + + self.max_sample_size = ( + max_sample_size if max_sample_size is not None else sys.maxsize + ) + self.pad_audio = pad_audio + self.normalize = normalize + logger.info( + f"pad_audio={pad_audio}, random_crop={random_crop}, " + f"normalize={normalize}, max_sample_size={self.max_sample_size}" + ) + + def get_audio(self, index): + import soundfile as sf + + wav_path = os.path.join(self.audio_root, self.audio_names[index]) + wav, cur_sample_rate = sf.read(wav_path) + wav = torch.from_numpy(wav).float() + wav = self.postprocess(wav, cur_sample_rate) + return wav + + def get_label(self, index, label_idx): + if self.store_labels: + label = self.label_list[label_idx][index] + else: + with open(self.label_paths[label_idx]) as f: + offset_s, offset_e = self.label_offsets_list[label_idx][index] + f.seek(offset_s) + label = f.read(offset_e - offset_s) + + if self.label_processors is not None: + label = self.label_processors[label_idx](label) + return label + + def get_labels(self, index): + return [self.get_label(index, i) for i in range(self.num_labels)] + + def __getitem__(self, index): + wav = self.get_audio(index) + labels = self.get_labels(index) + return {"id": index, "source": wav, "label_list": labels} + + def __len__(self): + return len(self.sizes) + + def crop_to_max_size(self, wav, target_size): + size = len(wav) + diff = size - target_size + if diff <= 0: + return wav, 0 + + start, end = 0, target_size + if self.random_crop: + start = np.random.randint(0, diff + 1) + end = size - diff + start + return wav[start:end], start + + def collater(self, samples): + # target = max(sizes) -> random_crop not used + # target = max_sample_size -> random_crop used for long + samples = [s for s in 
samples if s["source"] is not None] + if len(samples) == 0: + return {} + + audios = [s["source"] for s in samples] + audio_sizes = [len(s) for s in audios] + if self.pad_audio: + audio_size = min(max(audio_sizes), self.max_sample_size) + else: + audio_size = min(min(audio_sizes), self.max_sample_size) + collated_audios, padding_mask, audio_starts = self.collater_audio( + audios, audio_size + ) + + targets_by_label = [ + [s["label_list"][i] for s in samples] + for i in range(self.num_labels) + ] + targets_list, lengths_list, ntokens_list = self.collater_label( + targets_by_label, audio_size, audio_starts + ) + + net_input = {"source": collated_audios, "padding_mask": padding_mask} + batch = { + "id": torch.LongTensor([s["id"] for s in samples]), + "net_input": net_input, + } + + if self.single_target: + batch["target_lengths"] = lengths_list[0] + batch["ntokens"] = ntokens_list[0] + batch["target"] = targets_list[0] + else: + batch["target_lengths_list"] = lengths_list + batch["ntokens_list"] = ntokens_list + batch["target_list"] = targets_list + return batch + + def collater_audio(self, audios, audio_size): + collated_audios = audios[0].new_zeros(len(audios), audio_size) + padding_mask = ( + torch.BoolTensor(collated_audios.shape).fill_(False) + # if self.pad_audio else None + ) + audio_starts = [0 for _ in audios] + for i, audio in enumerate(audios): + diff = len(audio) - audio_size + if diff == 0: + collated_audios[i] = audio + elif diff < 0: + assert self.pad_audio + collated_audios[i] = torch.cat( + [audio, audio.new_full((-diff,), 0.0)] + ) + padding_mask[i, diff:] = True + else: + collated_audios[i], audio_starts[i] = self.crop_to_max_size( + audio, audio_size + ) + return collated_audios, padding_mask, audio_starts + + def collater_frm_label( + self, targets, audio_size, audio_starts, label_rate, pad + ): + assert label_rate > 0 + s2f = label_rate / self.sample_rate + frm_starts = [int(round(s * s2f)) for s in audio_starts] + frm_size = int(round(audio_size * s2f)) + if not self.pad_audio: + rem_size = [len(t) - s for t, s in zip(targets, frm_starts)] + frm_size = min(frm_size, *rem_size) + targets = [t[s: s + frm_size] for t, s in zip(targets, frm_starts)] + logger.debug(f"audio_starts={audio_starts}") + logger.debug(f"frame_starts={frm_starts}") + logger.debug(f"frame_size={frm_size}") + + lengths = torch.LongTensor([len(t) for t in targets]) + ntokens = lengths.sum().item() + targets = data_utils.collate_tokens( + targets, pad_idx=pad, left_pad=False + ) + return targets, lengths, ntokens + + def collater_seq_label(self, targets, pad): + lengths = torch.LongTensor([len(t) for t in targets]) + ntokens = lengths.sum().item() + targets = data_utils.collate_tokens( + targets, pad_idx=pad, left_pad=False + ) + return targets, lengths, ntokens + + def collater_label(self, targets_by_label, audio_size, audio_starts): + targets_list, lengths_list, ntokens_list = [], [], [] + itr = zip(targets_by_label, self.label_rates, self.pad_list) + for targets, label_rate, pad in itr: + if label_rate == -1: + targets, lengths, ntokens = self.collater_seq_label( + targets, pad + ) + else: + targets, lengths, ntokens = self.collater_frm_label( + targets, audio_size, audio_starts, label_rate, pad + ) + targets_list.append(targets) + lengths_list.append(lengths) + ntokens_list.append(ntokens) + return targets_list, lengths_list, ntokens_list + + def num_tokens(self, index): + return self.size(index) + + def size(self, index): + if self.pad_audio: + return self.sizes[index] + return 
min(self.sizes[index], self.max_sample_size) + + def ordered_indices(self): + if self.shuffle: + order = [np.random.permutation(len(self))] + else: + order = [np.arange(len(self))] + + order.append(self.sizes) + return np.lexsort(order)[::-1] + + def postprocess(self, wav, cur_sample_rate): + if wav.dim() == 2: + wav = wav.mean(-1) + assert wav.dim() == 1, wav.dim() + + if cur_sample_rate != self.sample_rate: + raise Exception(f"sr {cur_sample_rate} != {self.sample_rate}") + + if self.normalize: + with torch.no_grad(): + wav = F.layer_norm(wav, wav.shape) + return wav diff --git a/SpeechT5/fairseq/fairseq/data/audio/raw_audio_dataset.py b/SpeechT5/fairseq/fairseq/data/audio/raw_audio_dataset.py new file mode 100644 index 0000000000000000000000000000000000000000..9ce3f7e39d55860f38b3332fe79917c8d38724fe --- /dev/null +++ b/SpeechT5/fairseq/fairseq/data/audio/raw_audio_dataset.py @@ -0,0 +1,386 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + + +import logging +import os +import sys +import io + +import numpy as np +import torch +import torch.nn.functional as F + +from .. import FairseqDataset +from ..data_utils import compute_mask_indices, get_buckets, get_bucketed_sizes +from fairseq.data.audio.audio_utils import ( + parse_path, + read_from_stored_zip, + is_sf_audio_data, +) + + +logger = logging.getLogger(__name__) + + +class RawAudioDataset(FairseqDataset): + def __init__( + self, + sample_rate, + max_sample_size=None, + min_sample_size=0, + shuffle=True, + pad=False, + normalize=False, + compute_mask_indices=False, + **mask_compute_kwargs, + ): + super().__init__() + + self.sample_rate = sample_rate + self.sizes = [] + self.max_sample_size = ( + max_sample_size if max_sample_size is not None else sys.maxsize + ) + self.min_sample_size = min_sample_size + self.pad = pad + self.shuffle = shuffle + self.normalize = normalize + self.compute_mask_indices = compute_mask_indices + if self.compute_mask_indices: + self.mask_compute_kwargs = mask_compute_kwargs + self._features_size_map = {} + self._C = mask_compute_kwargs["encoder_embed_dim"] + self._conv_feature_layers = eval(mask_compute_kwargs["conv_feature_layers"]) + + def __getitem__(self, index): + raise NotImplementedError() + + def __len__(self): + return len(self.sizes) + + def postprocess(self, feats, curr_sample_rate): + if feats.dim() == 2: + feats = feats.mean(-1) + + if curr_sample_rate != self.sample_rate: + raise Exception(f"sample rate: {curr_sample_rate}, need {self.sample_rate}") + + assert feats.dim() == 1, feats.dim() + + if self.normalize: + with torch.no_grad(): + feats = F.layer_norm(feats, feats.shape) + return feats + + def crop_to_max_size(self, wav, target_size): + size = len(wav) + diff = size - target_size + if diff <= 0: + return wav + + start = np.random.randint(0, diff + 1) + end = size - diff + start + return wav[start:end] + + def _compute_mask_indices(self, dims, padding_mask): + B, T, C = dims + mask_indices, mask_channel_indices = None, None + if self.mask_compute_kwargs["mask_prob"] > 0: + mask_indices = compute_mask_indices( + (B, T), + padding_mask, + self.mask_compute_kwargs["mask_prob"], + self.mask_compute_kwargs["mask_length"], + self.mask_compute_kwargs["mask_selection"], + self.mask_compute_kwargs["mask_other"], + min_masks=2, + no_overlap=self.mask_compute_kwargs["no_mask_overlap"], + min_space=self.mask_compute_kwargs["mask_min_space"], + ) + mask_indices = 
torch.from_numpy(mask_indices) + if self.mask_compute_kwargs["mask_channel_prob"] > 0: + mask_channel_indices = compute_mask_indices( + (B, C), + None, + self.mask_compute_kwargs["mask_channel_prob"], + self.mask_compute_kwargs["mask_channel_length"], + self.mask_compute_kwargs["mask_channel_selection"], + self.mask_compute_kwargs["mask_channel_other"], + no_overlap=self.mask_compute_kwargs["no_mask_channel_overlap"], + min_space=self.mask_compute_kwargs["mask_channel_min_space"], + ) + mask_channel_indices = ( + torch.from_numpy(mask_channel_indices).unsqueeze(1).expand(-1, T, -1) + ) + + return mask_indices, mask_channel_indices + + @staticmethod + def _bucket_tensor(tensor, num_pad, value): + return F.pad(tensor, (0, num_pad), value=value) + + def collater(self, samples): + samples = [s for s in samples if s["source"] is not None] + if len(samples) == 0: + return {} + + sources = [s["source"] for s in samples] + sizes = [len(s) for s in sources] + + if self.pad: + target_size = min(max(sizes), self.max_sample_size) + else: + target_size = min(min(sizes), self.max_sample_size) + + collated_sources = sources[0].new_zeros(len(sources), target_size) + padding_mask = ( + torch.BoolTensor(collated_sources.shape).fill_(False) if self.pad else None + ) + for i, (source, size) in enumerate(zip(sources, sizes)): + diff = size - target_size + if diff == 0: + collated_sources[i] = source + elif diff < 0: + assert self.pad + collated_sources[i] = torch.cat( + [source, source.new_full((-diff,), 0.0)] + ) + padding_mask[i, diff:] = True + else: + collated_sources[i] = self.crop_to_max_size(source, target_size) + + input = {"source": collated_sources} + out = {"id": torch.LongTensor([s["id"] for s in samples])} + if self.pad: + input["padding_mask"] = padding_mask + + if hasattr(self, "num_buckets") and self.num_buckets > 0: + assert self.pad, "Cannot bucket without padding first." + bucket = max(self._bucketed_sizes[s["id"]] for s in samples) + num_pad = bucket - collated_sources.size(-1) + if num_pad: + input["source"] = self._bucket_tensor(collated_sources, num_pad, 0) + input["padding_mask"] = self._bucket_tensor(padding_mask, num_pad, True) + + if self.compute_mask_indices: + B = input["source"].size(0) + T = self._get_mask_indices_dims(input["source"].size(-1)) + padding_mask_reshaped = input["padding_mask"].clone() + extra = padding_mask_reshaped.size(1) % T + if extra > 0: + padding_mask_reshaped = padding_mask_reshaped[:, :-extra] + padding_mask_reshaped = padding_mask_reshaped.view( + padding_mask_reshaped.size(0), T, -1 + ) + padding_mask_reshaped = padding_mask_reshaped.all(-1) + input["padding_count"] = padding_mask_reshaped.sum(-1).max().item() + mask_indices, mask_channel_indices = self._compute_mask_indices( + (B, T, self._C), + padding_mask_reshaped, + ) + input["mask_indices"] = mask_indices + input["mask_channel_indices"] = mask_channel_indices + out["sample_size"] = mask_indices.sum().item() + + out["net_input"] = input + return out + + def _get_mask_indices_dims(self, size, padding=0, dilation=1): + if size not in self._features_size_map: + L_in = size + for (_, kernel_size, stride) in self._conv_feature_layers: + L_out = L_in + 2 * padding - dilation * (kernel_size - 1) - 1 + L_out = 1 + L_out // stride + L_in = L_out + self._features_size_map[size] = L_out + return self._features_size_map[size] + + def num_tokens(self, index): + return self.size(index) + + def size(self, index): + """Return an example's size as a float or tuple. 
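The frame count used for mask computation in `_get_mask_indices_dims` follows the standard strided-convolution length formula; a small sketch of that arithmetic, assuming the commonly used wav2vec 2.0 feature-extractor stack for `conv_feature_layers`:

    # (dim, kernel_size, stride) tuples; this particular stack is an assumption taken
    # from the usual wav2vec 2.0 setup, not something fixed by the dataset class.
    conv_layers = [(512, 10, 5)] + [(512, 3, 2)] * 4 + [(512, 2, 2)] * 2

    def conv_out_len(n_samples, layers):
        for _, k, s in layers:
            n_samples = 1 + (n_samples - k) // s  # padding=0, dilation=1, as in the code above
        return n_samples

    print(conv_out_len(16000, conv_layers))  # -> 49 frames for one second of 16 kHz audio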
This value is used when + filtering a dataset with ``--max-positions``.""" + if self.pad: + return self.sizes[index] + return min(self.sizes[index], self.max_sample_size) + + def ordered_indices(self): + """Return an ordered list of indices. Batches will be constructed based + on this order.""" + + if self.shuffle: + order = [np.random.permutation(len(self))] + order.append( + np.minimum( + np.array(self.sizes), + self.max_sample_size, + ) + ) + return np.lexsort(order)[::-1] + else: + return np.arange(len(self)) + + def set_bucket_info(self, num_buckets): + self.num_buckets = num_buckets + if self.num_buckets > 0: + self._collated_sizes = np.minimum( + np.array(self.sizes), + self.max_sample_size, + ) + self.buckets = get_buckets( + self._collated_sizes, + self.num_buckets, + ) + self._bucketed_sizes = get_bucketed_sizes( + self._collated_sizes, self.buckets + ) + logger.info( + f"{len(self.buckets)} bucket(s) for the audio dataset: " + f"{self.buckets}" + ) + + +class FileAudioDataset(RawAudioDataset): + def __init__( + self, + manifest_path, + sample_rate, + max_sample_size=None, + min_sample_size=0, + shuffle=True, + pad=False, + normalize=False, + num_buckets=0, + compute_mask_indices=False, + **mask_compute_kwargs, + ): + super().__init__( + sample_rate=sample_rate, + max_sample_size=max_sample_size, + min_sample_size=min_sample_size, + shuffle=shuffle, + pad=pad, + normalize=normalize, + compute_mask_indices=compute_mask_indices, + **mask_compute_kwargs, + ) + + skipped = 0 + self.fnames = [] + sizes = [] + self.skipped_indices = set() + + with open(manifest_path, "r") as f: + self.root_dir = f.readline().strip() + for i, line in enumerate(f): + items = line.strip().split("\t") + assert len(items) == 2, line + sz = int(items[1]) + if min_sample_size is not None and sz < min_sample_size: + skipped += 1 + self.skipped_indices.add(i) + continue + self.fnames.append(items[0]) + sizes.append(sz) + logger.info(f"loaded {len(self.fnames)}, skipped {skipped} samples") + + self.sizes = np.array(sizes, dtype=np.int64) + + try: + import pyarrow + + self.fnames = pyarrow.array(self.fnames) + except: + logger.debug( + "Could not create a pyarrow array. 
Please install pyarrow for better performance" + ) + pass + + self.set_bucket_info(num_buckets) + + def __getitem__(self, index): + import soundfile as sf + + path_or_fp = os.path.join(self.root_dir, str(self.fnames[index])) + _path, slice_ptr = parse_path(path_or_fp) + if len(slice_ptr) == 2: + byte_data = read_from_stored_zip(_path, slice_ptr[0], slice_ptr[1]) + assert is_sf_audio_data(byte_data) + path_or_fp = io.BytesIO(byte_data) + + wav, curr_sample_rate = sf.read(path_or_fp, dtype="float32") + + feats = torch.from_numpy(wav).float() + feats = self.postprocess(feats, curr_sample_rate) + return {"id": index, "source": feats} + + +class BinarizedAudioDataset(RawAudioDataset): + def __init__( + self, + data_dir, + split, + sample_rate, + max_sample_size=None, + min_sample_size=0, + shuffle=True, + pad=False, + normalize=False, + num_buckets=0, + compute_mask_indices=False, + **mask_compute_kwargs, + ): + super().__init__( + sample_rate=sample_rate, + max_sample_size=max_sample_size, + min_sample_size=min_sample_size, + shuffle=shuffle, + pad=pad, + normalize=normalize, + compute_mask_indices=compute_mask_indices, + **mask_compute_kwargs, + ) + + from fairseq.data import data_utils, Dictionary + + self.fnames_dict = Dictionary.load(os.path.join(data_dir, "dict.txt")) + + root_path = os.path.join(data_dir, f"{split}.root") + if os.path.exists(root_path): + with open(root_path, "r") as f: + self.root_dir = next(f).strip() + else: + self.root_dir = None + + fnames_path = os.path.join(data_dir, split) + self.fnames = data_utils.load_indexed_dataset(fnames_path, self.fnames_dict) + lengths_path = os.path.join(data_dir, f"{split}.lengths") + + with open(lengths_path, "r") as f: + for line in f: + sz = int(line.rstrip()) + assert ( + sz >= min_sample_size + ), f"Min sample size is not supported for binarized dataset, but found a sample with size {sz}" + self.sizes.append(sz) + + self.sizes = np.array(self.sizes, dtype=np.int64) + + self.set_bucket_info(num_buckets) + logger.info(f"loaded {len(self.fnames)} samples") + + def __getitem__(self, index): + import soundfile as sf + + fname = self.fnames_dict.string(self.fnames[index], separator="") + if self.root_dir: + fname = os.path.join(self.root_dir, fname) + + wav, curr_sample_rate = sf.read(fname) + feats = torch.from_numpy(wav).float() + feats = self.postprocess(feats, curr_sample_rate) + return {"id": index, "source": feats} diff --git a/SpeechT5/fairseq/fairseq/data/audio/speech_to_text_dataset.py b/SpeechT5/fairseq/fairseq/data/audio/speech_to_text_dataset.py new file mode 100644 index 0000000000000000000000000000000000000000..d4b5668d8f9d4bc93fcbda73d867554d8f1b3107 --- /dev/null +++ b/SpeechT5/fairseq/fairseq/data/audio/speech_to_text_dataset.py @@ -0,0 +1,511 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. 
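A hedged sketch of how the dataset defined in this file is usually constructed from a TSV manifest plus a YAML data config; the data root, config file name, and split name below are placeholders, not values fixed by the code:

    import os.path as op
    from fairseq.data import Dictionary
    from fairseq.data.audio.speech_to_text_dataset import (
        S2TDataConfig,
        SpeechToTextDatasetCreator,
    )

    data_root = "/path/to/s2t_data"                          # placeholder
    cfg = S2TDataConfig(op.join(data_root, "config.yaml"))   # wraps the YAML data config
    tgt_dict = Dictionary.load(op.join(data_root, cfg.vocab_filename))
    dataset = SpeechToTextDatasetCreator.from_tsv(
        data_root, cfg, "dev", tgt_dict,
        pre_tokenizer=None, bpe_tokenizer=None,
        is_train_split=False, epoch=1, seed=1,
    )
    idx, features, target = dataset[0]  # features default to 80-dim filter banks per frame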
+ +import csv +import io +import logging +import os.path as op +import re +from typing import Dict, List, Optional, Tuple + +import numpy as np +import torch +from fairseq.data import ( + ConcatDataset, + Dictionary, + FairseqDataset, + ResamplingDataset, + data_utils as fairseq_data_utils, +) +from fairseq.data.audio.audio_utils import ( + get_fbank, get_waveform, read_from_stored_zip, is_npy_data, + is_sf_audio_data, parse_path, FEATURE_OR_SF_AUDIO_FILE_EXTENSIONS +) +from fairseq.data.audio.feature_transforms import CompositeAudioFeatureTransform + + +logger = logging.getLogger(__name__) + + +class S2TDataConfig(object): + """Wrapper class for data config YAML""" + + def __init__(self, yaml_path): + try: + import yaml + except ImportError: + print("Please install PyYAML to load YAML files for " "S2T data config") + self.config = {} + if op.isfile(yaml_path): + try: + with open(yaml_path) as f: + self.config = yaml.load(f, Loader=yaml.FullLoader) + except Exception as e: + raise Exception(f"Failed to load config from {yaml_path}: {e}") + else: + raise FileNotFoundError(f"{yaml_path} not found") + + @property + def vocab_filename(self): + """fairseq vocabulary file under data root""" + return self.config.get("vocab_filename", "dict.txt") + + @property + def shuffle(self) -> bool: + """Shuffle dataset samples before batching""" + return self.config.get("shuffle", False) + + @property + def pre_tokenizer(self) -> Dict: + """Pre-tokenizer to apply before subword tokenization. Returning + a dictionary with `tokenizer` providing the tokenizer name and + the other items providing the tokenizer-specific arguments. + Tokenizers are defined in `fairseq.data.encoders.*`""" + return self.config.get("pre_tokenizer", {"tokenizer": None}) + + @property + def bpe_tokenizer(self) -> Dict: + """Subword tokenizer to apply after pre-tokenization. Returning + a dictionary with `bpe` providing the tokenizer name and + the other items providing the tokenizer-specific arguments. + Tokenizers are defined in `fairseq.data.encoders.*`""" + return self.config.get("bpe_tokenizer", {"bpe": None}) + + @property + def prepend_tgt_lang_tag(self) -> bool: + """Prepend target lang ID token as the target BOS (e.g. for to-many + multilingual setting). During inference, this requires `--prefix-size 1` + to force BOS to be lang ID token.""" + return self.config.get("prepend_tgt_lang_tag", False) + + @property + def input_feat_per_channel(self): + """The dimension of input features (per audio channel)""" + return self.config.get("input_feat_per_channel", 80) + + @property + def input_channels(self): + """The number of channels in the input audio""" + return self.config.get("input_channels", 1) + + @property + def sampling_alpha(self): + """Hyper-parameter alpha = 1/T for temperature-based resampling. + (alpha = 1 for no resampling)""" + return self.config.get("sampling_alpha", 1.0) + + @property + def use_audio_input(self): + """Needed by the dataset loader to see if the model requires + raw audio as inputs.""" + return self.config.get("use_audio_input", False) + + @property + def audio_root(self): + """Audio paths in the manifest TSV can be relative and this provides + the root path. Set this to empty string when using absolute paths.""" + return self.config.get("audio_root", "") + + def get_feature_transforms(self, split, is_train): + """Split-specific feature transforms. 
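As an illustration of the wildcard lookup described here, the `transforms` section of the data config could look like the following, written as the equivalent Python dict (the particular transform names and parameter values are assumptions, not defaults):

    config = {
        "transforms": {
            "*": ["utterance_cmvn"],                      # fallback for every split
            "_train": ["utterance_cmvn", "specaugment"],  # takes precedence on train splits
        },
        # per-transform options live at the top level, keyed by transform name
        "specaugment": {
            "freq_mask_N": 2, "freq_mask_F": 27,
            "time_mask_N": 2, "time_mask_T": 100, "time_mask_p": 1.0,
        },
        "utterance_cmvn": {"norm_means": True, "norm_vars": True},
    }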
Allowing train set wildcard `_train`, + evaluation set wildcard `_eval` and general wildcard `*` for matching.""" + from copy import deepcopy + + cfg = deepcopy(self.config) + _cur = cfg.get("transforms", {}) + cur = _cur.get(split) + cur = _cur.get("_train") if cur is None and is_train else cur + cur = _cur.get("_eval") if cur is None and not is_train else cur + cur = _cur.get("*") if cur is None else cur + cfg["transforms"] = cur + return cfg + + +def get_features_from_npy_or_audio(path): + ext = op.splitext(op.basename(path))[1] + if ext not in FEATURE_OR_SF_AUDIO_FILE_EXTENSIONS: + raise ValueError(f'Unsupported file format for "{path}"') + return np.load(path) if ext == ".npy" else get_fbank(path) + + +def get_features_or_waveform_from_stored_zip( + path, byte_offset, byte_size, need_waveform=False +): + assert path.endswith(".zip") + data = read_from_stored_zip(path, byte_offset, byte_size) + f = io.BytesIO(data) + if is_npy_data(data): + features_or_waveform = np.load(f) + elif is_sf_audio_data(data): + features_or_waveform = \ + get_waveform(f, always_2d=False)[0] if need_waveform else get_fbank(f) + else: + raise ValueError(f'Unknown file format for "{path}"') + return features_or_waveform + + +def get_features_or_waveform(path: str, need_waveform=False): + """Get speech features from .npy file or waveform from .wav/.flac file. + The file may be inside an uncompressed ZIP file and is accessed via byte + offset and length. + + Args: + path (str): File path in the format of "<.npy/.wav/.flac path>" or + "<zip path>:<byte offset>:<byte length>". + need_waveform (bool): return waveform instead of features. + + Returns: + features_or_waveform (numpy.ndarray): speech features or waveform. + """ + _path, slice_ptr = parse_path(path) + if len(slice_ptr) == 0: + if need_waveform: + return get_waveform(_path, always_2d=False) + return get_features_from_npy_or_audio(_path) + elif len(slice_ptr) == 2: + features_or_waveform = get_features_or_waveform_from_stored_zip( + _path, slice_ptr[0], slice_ptr[1], need_waveform=need_waveform + ) + else: + raise ValueError(f"Invalid path: {path}") + + return features_or_waveform + + +def _collate_frames( + frames: List[torch.Tensor], is_audio_input: bool = False +) -> torch.Tensor: + """ + Convert a list of 2D frames into a padded 3D tensor + Args: + frames (list): list of 2D frames of size L[i]*f_dim. 
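A minimal sketch of the padding behaviour of this collation helper (the tensors are dummies):

    import torch
    from fairseq.data.audio.speech_to_text_dataset import _collate_frames

    frames = [torch.ones(3, 80), torch.ones(5, 80)]
    out = _collate_frames(frames)  # shape (2, 5, 80); the shorter item is zero-padded at the end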
Where L[i] is + length of i-th frame and f_dim is static dimension of features + Returns: + 3D tensor of size len(frames)*len_max*f_dim where len_max is max of L[i] + """ + max_len = max(frame.size(0) for frame in frames) + if is_audio_input: + out = frames[0].new_zeros((len(frames), max_len)) + else: + out = frames[0].new_zeros((len(frames), max_len, frames[0].size(1))) + for i, v in enumerate(frames): + out[i, : v.size(0)] = v + return out + + +class SpeechToTextDataset(FairseqDataset): + LANG_TAG_TEMPLATE = "<lang:{}>" + + def __init__( + self, + split: str, + is_train_split: bool, + data_cfg: S2TDataConfig, + audio_paths: List[str], + n_frames: List[int], + src_texts: Optional[List[str]] = None, + tgt_texts: Optional[List[str]] = None, + speakers: Optional[List[str]] = None, + src_langs: Optional[List[str]] = None, + tgt_langs: Optional[List[str]] = None, + ids: Optional[List[str]] = None, + tgt_dict: Optional[Dictionary] = None, + pre_tokenizer=None, + bpe_tokenizer=None, + ): + self.split, self.is_train_split = split, is_train_split + self.data_cfg = data_cfg + self.audio_paths, self.n_frames = audio_paths, n_frames + self.n_samples = len(audio_paths) + assert len(n_frames) == self.n_samples > 0 + assert src_texts is None or len(src_texts) == self.n_samples + assert tgt_texts is None or len(tgt_texts) == self.n_samples + assert speakers is None or len(speakers) == self.n_samples + assert src_langs is None or len(src_langs) == self.n_samples + assert tgt_langs is None or len(tgt_langs) == self.n_samples + assert ids is None or len(ids) == self.n_samples + assert (tgt_dict is None and tgt_texts is None) or ( + tgt_dict is not None and tgt_texts is not None + ) + self.src_texts, self.tgt_texts = src_texts, tgt_texts + self.src_langs, self.tgt_langs = src_langs, tgt_langs + self.tgt_dict = tgt_dict + self.check_tgt_lang_tag() + self.ids = ids + self.shuffle = data_cfg.shuffle if is_train_split else False + + self.feature_transforms = CompositeAudioFeatureTransform.from_config_dict( + self.data_cfg.get_feature_transforms(split, is_train_split) + ) + + self.pre_tokenizer = pre_tokenizer + self.bpe_tokenizer = bpe_tokenizer + + logger.info(self.__repr__()) + + def __repr__(self): + return ( + self.__class__.__name__ + + f'(split="{self.split}", n_samples={self.n_samples}, ' + f"prepend_tgt_lang_tag={self.data_cfg.prepend_tgt_lang_tag}, " + f"shuffle={self.shuffle}, transforms={self.feature_transforms})" + ) + + @classmethod + def is_lang_tag(cls, token): + pattern = cls.LANG_TAG_TEMPLATE.replace("{}", "(.*)") + return re.match(pattern, token) + + def check_tgt_lang_tag(self): + if self.data_cfg.prepend_tgt_lang_tag: + assert self.tgt_langs is not None and self.tgt_dict is not None + tgt_lang_tags = [ + self.LANG_TAG_TEMPLATE.format(t) for t in set(self.tgt_langs) + ] + assert all(t in self.tgt_dict for t in tgt_lang_tags) + + def tokenize_text(self, text: str): + if self.pre_tokenizer is not None: + text = self.pre_tokenizer.encode(text) + if self.bpe_tokenizer is not None: + text = self.bpe_tokenizer.encode(text) + return text + + def __getitem__( + self, index: int + ) -> Tuple[int, torch.Tensor, Optional[torch.Tensor]]: + source = get_features_or_waveform( + self.audio_paths[index], need_waveform=self.data_cfg.use_audio_input + ) + if self.feature_transforms is not None: + assert not self.data_cfg.use_audio_input + source = self.feature_transforms(source) + source = torch.from_numpy(source).float() + + target = None + if self.tgt_texts is not None: + tokenized = 
self.tokenize_text(self.tgt_texts[index]) + target = self.tgt_dict.encode_line( + tokenized, add_if_not_exist=False, append_eos=True + ).long() + if self.data_cfg.prepend_tgt_lang_tag: + lang_tag = self.LANG_TAG_TEMPLATE.format(self.tgt_langs[index]) + lang_tag_idx = self.tgt_dict.index(lang_tag) + target = torch.cat((torch.LongTensor([lang_tag_idx]), target), 0) + return index, source, target + + def __len__(self): + return self.n_samples + + def collater(self, samples: List[Tuple[int, torch.Tensor, torch.Tensor]]) -> Dict: + if len(samples) == 0: + return {} + indices = torch.tensor([i for i, _, _ in samples], dtype=torch.long) + frames = _collate_frames( + [s for _, s, _ in samples], self.data_cfg.use_audio_input + ) + # sort samples by descending number of frames + n_frames = torch.tensor([s.size(0) for _, s, _ in samples], dtype=torch.long) + n_frames, order = n_frames.sort(descending=True) + indices = indices.index_select(0, order) + frames = frames.index_select(0, order) + + target, target_lengths = None, None + prev_output_tokens = None + ntokens = None + if self.tgt_texts is not None: + target = fairseq_data_utils.collate_tokens( + [t for _, _, t in samples], + self.tgt_dict.pad(), + self.tgt_dict.eos(), + left_pad=False, + move_eos_to_beginning=False, + ) + target = target.index_select(0, order) + target_lengths = torch.tensor( + [t.size(0) for _, _, t in samples], dtype=torch.long + ).index_select(0, order) + prev_output_tokens = fairseq_data_utils.collate_tokens( + [t for _, _, t in samples], + self.tgt_dict.pad(), + self.tgt_dict.eos(), + left_pad=False, + move_eos_to_beginning=True, + ) + prev_output_tokens = prev_output_tokens.index_select(0, order) + ntokens = sum(t.size(0) for _, _, t in samples) + + out = { + "id": indices, + "net_input": { + "src_tokens": frames, + "src_lengths": n_frames, + "prev_output_tokens": prev_output_tokens, + }, + "target": target, + "target_lengths": target_lengths, + "ntokens": ntokens, + "nsentences": len(samples), + } + return out + + def num_tokens(self, index): + return self.n_frames[index] + + def size(self, index): + t_len = 0 + if self.tgt_texts is not None: + tokenized = self.tokenize_text(self.tgt_texts[index]) + t_len = len(tokenized.split(" ")) + return self.n_frames[index], t_len + + @property + def sizes(self): + return np.array(self.n_frames) + + @property + def can_reuse_epoch_itr_across_epochs(self): + return True + + def ordered_indices(self): + if self.shuffle: + order = [np.random.permutation(len(self))] + else: + order = [np.arange(len(self))] + # first by descending order of # of frames then by original/random order + order.append([-n for n in self.n_frames]) + return np.lexsort(order) + + def prefetch(self, indices): + raise False + + +class SpeechToTextDatasetCreator(object): + # mandatory columns + KEY_ID, KEY_AUDIO, KEY_N_FRAMES = "id", "audio", "n_frames" + KEY_TGT_TEXT = "tgt_text" + # optional columns + KEY_SPEAKER, KEY_SRC_TEXT = "speaker", "src_text" + KEY_SRC_LANG, KEY_TGT_LANG = "src_lang", "tgt_lang" + # default values + DEFAULT_SPEAKER = DEFAULT_SRC_TEXT = DEFAULT_LANG = "" + + @classmethod + def _from_list( + cls, + split_name: str, + is_train_split, + samples: List[List[Dict]], + data_cfg: S2TDataConfig, + tgt_dict, + pre_tokenizer, + bpe_tokenizer, + ) -> SpeechToTextDataset: + audio_paths, n_frames, src_texts, tgt_texts, ids = [], [], [], [], [] + speakers, src_langs, tgt_langs = [], [], [] + for s in samples: + ids.extend([ss[cls.KEY_ID] for ss in s]) + audio_paths.extend( + [op.join(data_cfg.audio_root, 
ss[cls.KEY_AUDIO]) for ss in s] + ) + n_frames.extend([int(ss[cls.KEY_N_FRAMES]) for ss in s]) + tgt_texts.extend([ss[cls.KEY_TGT_TEXT] for ss in s]) + src_texts.extend( + [ss.get(cls.KEY_SRC_TEXT, cls.DEFAULT_SRC_TEXT) for ss in s] + ) + speakers.extend([ss.get(cls.KEY_SPEAKER, cls.DEFAULT_SPEAKER) for ss in s]) + src_langs.extend([ss.get(cls.KEY_SRC_LANG, cls.DEFAULT_LANG) for ss in s]) + tgt_langs.extend([ss.get(cls.KEY_TGT_LANG, cls.DEFAULT_LANG) for ss in s]) + return SpeechToTextDataset( + split_name, + is_train_split, + data_cfg, + audio_paths, + n_frames, + src_texts, + tgt_texts, + speakers, + src_langs, + tgt_langs, + ids, + tgt_dict, + pre_tokenizer, + bpe_tokenizer, + ) + + @classmethod + def _get_size_ratios(cls, ids: List[str], sizes: List[int], alpha: float = 1.0): + """Size ratios for temperature-based sampling + (https://arxiv.org/abs/1907.05019)""" + _sizes = np.array(sizes) + prob = _sizes / _sizes.sum() + smoothed_prob = prob ** alpha + smoothed_prob = smoothed_prob / smoothed_prob.sum() + size_ratio = (smoothed_prob * _sizes.sum()) / _sizes + + o_str = str({_i: f"{prob[i]:.3f}" for i, _i in enumerate(ids)}) + logger.info(f"original sampling probability: {o_str}") + p_str = str({_i: f"{smoothed_prob[i]:.3f}" for i, _i in enumerate(ids)}) + logger.info(f"balanced sampling probability: {p_str}") + sr_str = str({_id: f"{size_ratio[i]:.3f}" for i, _id in enumerate(ids)}) + logger.info(f"balanced sampling size ratio: {sr_str}") + return size_ratio.tolist() + + @classmethod + def from_tsv( + cls, + root: str, + data_cfg: S2TDataConfig, + splits: str, + tgt_dict, + pre_tokenizer, + bpe_tokenizer, + is_train_split: bool, + epoch: int, + seed: int, + ) -> SpeechToTextDataset: + samples = [] + _splits = splits.split(",") + for split in _splits: + tsv_path = op.join(root, f"{split}.tsv") + if not op.isfile(tsv_path): + raise FileNotFoundError(f"Dataset not found: {tsv_path}") + with open(tsv_path) as f: + reader = csv.DictReader( + f, + delimiter="\t", + quotechar=None, + doublequote=False, + lineterminator="\n", + quoting=csv.QUOTE_NONE, + ) + samples.append([dict(e) for e in reader]) + assert len(samples) > 0 + + datasets = [ + cls._from_list( + name, + is_train_split, + [s], + data_cfg, + tgt_dict, + pre_tokenizer, + bpe_tokenizer, + ) + for name, s in zip(_splits, samples) + ] + + if is_train_split and len(_splits) > 1 and data_cfg.sampling_alpha != 1.0: + # temperature-based sampling + size_ratios = cls._get_size_ratios( + _splits, [len(s) for s in samples], alpha=data_cfg.sampling_alpha + ) + datasets = [ + ResamplingDataset( + d, size_ratio=r, seed=seed, epoch=epoch, replace=(r >= 1.0) + ) + for d, r in zip(datasets, size_ratios) + ] + return ConcatDataset(datasets) diff --git a/SpeechT5/fairseq/fairseq/data/backtranslation_dataset.py b/SpeechT5/fairseq/fairseq/data/backtranslation_dataset.py new file mode 100644 index 0000000000000000000000000000000000000000..8f70c90df3d237077537993e125d366c95292f1a --- /dev/null +++ b/SpeechT5/fairseq/fairseq/data/backtranslation_dataset.py @@ -0,0 +1,165 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +import torch +from fairseq import utils + +from . import FairseqDataset + + +def backtranslate_samples(samples, collate_fn, generate_fn, cuda=True): + """Backtranslate a list of samples. 
+ + Given an input (*samples*) of the form: + + [{'id': 1, 'source': 'hallo welt'}] + + this will return: + + [{'id': 1, 'source': 'hello world', 'target': 'hallo welt'}] + + Args: + samples (List[dict]): samples to backtranslate. Individual samples are + expected to have a 'source' key, which will become the 'target' + after backtranslation. + collate_fn (callable): function to collate samples into a mini-batch + generate_fn (callable): function to generate backtranslations + cuda (bool): use GPU for generation (default: ``True``) + + Returns: + List[dict]: an updated list of samples with a backtranslated source + """ + collated_samples = collate_fn(samples) + s = utils.move_to_cuda(collated_samples) if cuda else collated_samples + generated_sources = generate_fn(s) + + id_to_src = {sample["id"]: sample["source"] for sample in samples} + + # Go through each tgt sentence in batch and its corresponding best + # generated hypothesis and create a backtranslation data pair + # {id: id, source: generated backtranslation, target: original tgt} + return [ + { + "id": id.item(), + "target": id_to_src[id.item()], + "source": hypos[0]["tokens"].cpu(), + } + for id, hypos in zip(collated_samples["id"], generated_sources) + ] + + +class BacktranslationDataset(FairseqDataset): + """ + Sets up a backtranslation dataset which takes a tgt batch, generates + a src using a tgt-src backtranslation function (*backtranslation_fn*), + and returns the corresponding `{generated src, input tgt}` batch. + + Args: + tgt_dataset (~fairseq.data.FairseqDataset): the dataset to be + backtranslated. Only the source side of this dataset will be used. + After backtranslation, the source sentences in this dataset will be + returned as the targets. + src_dict (~fairseq.data.Dictionary): the dictionary of backtranslated + sentences. + tgt_dict (~fairseq.data.Dictionary, optional): the dictionary of + sentences to be backtranslated. + backtranslation_fn (callable, optional): function to call to generate + backtranslations. This is typically the `generate` method of a + :class:`~fairseq.sequence_generator.SequenceGenerator` object. + Pass in None when it is not available at initialization time, and + use set_backtranslation_fn function to set it when available. + output_collater (callable, optional): function to call on the + backtranslated samples to create the final batch + (default: ``tgt_dataset.collater``). + cuda: use GPU for generation + """ + + def __init__( + self, + tgt_dataset, + src_dict, + tgt_dict=None, + backtranslation_fn=None, + output_collater=None, + cuda=True, + **kwargs + ): + self.tgt_dataset = tgt_dataset + self.backtranslation_fn = backtranslation_fn + self.output_collater = ( + output_collater if output_collater is not None else tgt_dataset.collater + ) + self.cuda = cuda if torch.cuda.is_available() else False + self.src_dict = src_dict + self.tgt_dict = tgt_dict + + def __getitem__(self, index): + """ + Returns a single sample from *tgt_dataset*. Note that backtranslation is + not applied in this step; use :func:`collater` instead to backtranslate + a batch of samples. + """ + return self.tgt_dataset[index] + + def __len__(self): + return len(self.tgt_dataset) + + def set_backtranslation_fn(self, backtranslation_fn): + self.backtranslation_fn = backtranslation_fn + + def collater(self, samples): + """Merge and backtranslate a list of samples to form a mini-batch. + + Using the samples from *tgt_dataset*, load a collated target sample to + feed to the backtranslation model. 
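A hedged usage sketch of this dataset; `monolingual_tgt_dataset`, `src_dict`, `tgt_dict`, `generator`, and `tgt_src_model` are placeholders for objects built elsewhere in a training setup:

    bt_dataset = BacktranslationDataset(
        tgt_dataset=monolingual_tgt_dataset,  # target-side sentences to be back-translated
        src_dict=src_dict,
        tgt_dict=tgt_dict,
        cuda=True,
    )
    # The tgt->src generator typically only exists once the model has been built,
    # so it is attached afterwards:
    bt_dataset.set_backtranslation_fn(
        lambda sample: generator.generate([tgt_src_model], sample)
    )
    batch = bt_dataset.collater([bt_dataset[i] for i in range(8)])
    # With a typical language-pair output collater, batch["net_input"]["src_tokens"]
    # holds the generated sources and batch["target"] the original sentences.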
Then take the backtranslation with + the best score as the source and the original input as the target. + + Note: we expect *tgt_dataset* to provide a function `collater()` that + will collate samples into the format expected by *backtranslation_fn*. + After backtranslation, we will feed the new list of samples (i.e., the + `(backtranslated source, original source)` pairs) to *output_collater* + and return the result. + + Args: + samples (List[dict]): samples to backtranslate and collate + + Returns: + dict: a mini-batch with keys coming from *output_collater* + """ + if samples[0].get("is_dummy", False): + return samples + samples = backtranslate_samples( + samples=samples, + collate_fn=self.tgt_dataset.collater, + generate_fn=(lambda net_input: self.backtranslation_fn(net_input)), + cuda=self.cuda, + ) + return self.output_collater(samples) + + def num_tokens(self, index): + """Just use the tgt dataset num_tokens""" + return self.tgt_dataset.num_tokens(index) + + def ordered_indices(self): + """Just use the tgt dataset ordered_indices""" + return self.tgt_dataset.ordered_indices() + + def size(self, index): + """Return an example's size as a float or tuple. This value is used + when filtering a dataset with ``--max-positions``. + + Note: we use *tgt_dataset* to approximate the length of the source + sentence, since we do not know the actual length until after + backtranslation. + """ + tgt_size = self.tgt_dataset.size(index)[0] + return (tgt_size, tgt_size) + + @property + def supports_prefetch(self): + return getattr(self.tgt_dataset, "supports_prefetch", False) + + def prefetch(self, indices): + return self.tgt_dataset.prefetch(indices) diff --git a/SpeechT5/fairseq/fairseq/data/base_wrapper_dataset.py b/SpeechT5/fairseq/fairseq/data/base_wrapper_dataset.py new file mode 100644 index 0000000000000000000000000000000000000000..134d398b47dc73c8807759188504aee205b3b34d --- /dev/null +++ b/SpeechT5/fairseq/fairseq/data/base_wrapper_dataset.py @@ -0,0 +1,78 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +from torch.utils.data.dataloader import default_collate + +from . 
import FairseqDataset + + +class BaseWrapperDataset(FairseqDataset): + def __init__(self, dataset): + super().__init__() + self.dataset = dataset + + def __getitem__(self, index): + return self.dataset[index] + + def __len__(self): + return len(self.dataset) + + def collater(self, samples): + if hasattr(self.dataset, "collater"): + return self.dataset.collater(samples) + else: + return default_collate(samples) + + @property + def sizes(self): + return self.dataset.sizes + + def num_tokens(self, index): + return self.dataset.num_tokens(index) + + def size(self, index): + return self.dataset.size(index) + + def ordered_indices(self): + return self.dataset.ordered_indices() + + @property + def supports_prefetch(self): + return getattr(self.dataset, "supports_prefetch", False) + + def attr(self, attr: str, index: int): + return self.dataset.attr(attr, index) + + def prefetch(self, indices): + self.dataset.prefetch(indices) + + def get_batch_shapes(self): + return self.dataset.get_batch_shapes() + + def batch_by_size( + self, + indices, + max_tokens=None, + max_sentences=None, + required_batch_size_multiple=1, + ): + return self.dataset.batch_by_size( + indices, + max_tokens=max_tokens, + max_sentences=max_sentences, + required_batch_size_multiple=required_batch_size_multiple, + ) + + def filter_indices_by_size(self, indices, max_sizes): + return self.dataset.filter_indices_by_size(indices, max_sizes) + + @property + def can_reuse_epoch_itr_across_epochs(self): + return self.dataset.can_reuse_epoch_itr_across_epochs + + def set_epoch(self, epoch): + super().set_epoch(epoch) + if hasattr(self.dataset, "set_epoch"): + self.dataset.set_epoch(epoch) diff --git a/SpeechT5/fairseq/fairseq/data/bucket_pad_length_dataset.py b/SpeechT5/fairseq/fairseq/data/bucket_pad_length_dataset.py new file mode 100644 index 0000000000000000000000000000000000000000..0f9410014845873bb0344fca6478c231c88e9dea --- /dev/null +++ b/SpeechT5/fairseq/fairseq/data/bucket_pad_length_dataset.py @@ -0,0 +1,78 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +import numpy as np +import torch.nn.functional as F +from fairseq.data import BaseWrapperDataset +from fairseq.data.data_utils import get_buckets, get_bucketed_sizes + + +class BucketPadLengthDataset(BaseWrapperDataset): + """ + Bucket and pad item lengths to the nearest bucket size. This can be used to + reduce the number of unique batch shapes, which is important on TPUs since + each new batch shape requires a recompilation. 
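A short sketch of the intended use (the wrapped `token_dataset`, its `sizes` array, and `pad_idx` are placeholders for an existing dataset and dictionary):

    bucketed = BucketPadLengthDataset(
        token_dataset,
        sizes=token_dataset.sizes,  # original per-item lengths
        num_buckets=8,              # items now come in at most 8 distinct padded lengths
        pad_idx=pad_idx,
        left_pad=False,
    )
    item = bucketed[0]              # right-padded up to its bucket size (>= the original length)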
+ + Args: + dataset (FairseqDatset): dataset to bucket + sizes (List[int]): all item sizes + num_buckets (int): number of buckets to create + pad_idx (int): padding symbol + left_pad (bool): if True, pad on the left; otherwise right pad + """ + + def __init__( + self, + dataset, + sizes, + num_buckets, + pad_idx, + left_pad, + tensor_key=None, + ): + super().__init__(dataset) + self.pad_idx = pad_idx + self.left_pad = left_pad + + assert num_buckets > 0 + self.buckets = get_buckets(sizes, num_buckets) + self._bucketed_sizes = get_bucketed_sizes(sizes, self.buckets) + self._tensor_key = tensor_key + + def _set_tensor(self, item, val): + if self._tensor_key is None: + return val + item[self._tensor_key] = val + return item + + def _get_tensor(self, item): + if self._tensor_key is None: + return item + return item[self._tensor_key] + + def _pad(self, tensor, bucket_size, dim=-1): + num_pad = bucket_size - tensor.size(dim) + return F.pad( + tensor, + (num_pad if self.left_pad else 0, 0 if self.left_pad else num_pad), + value=self.pad_idx, + ) + + def __getitem__(self, index): + item = self.dataset[index] + bucket_size = self._bucketed_sizes[index] + tensor = self._get_tensor(item) + padded = self._pad(tensor, bucket_size) + return self._set_tensor(item, padded) + + @property + def sizes(self): + return self._bucketed_sizes + + def num_tokens(self, index): + return self._bucketed_sizes[index] + + def size(self, index): + return self._bucketed_sizes[index] diff --git a/SpeechT5/fairseq/fairseq/data/colorize_dataset.py b/SpeechT5/fairseq/fairseq/data/colorize_dataset.py new file mode 100644 index 0000000000000000000000000000000000000000..6ef097bff1a013f4944b1cb55e1e7e4e2480b3a6 --- /dev/null +++ b/SpeechT5/fairseq/fairseq/data/colorize_dataset.py @@ -0,0 +1,25 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +import torch + +from . import BaseWrapperDataset + + +class ColorizeDataset(BaseWrapperDataset): + """ Adds 'colors' property to net input that is obtained from the provided color getter for use by models """ + + def __init__(self, dataset, color_getter): + super().__init__(dataset) + self.color_getter = color_getter + + def collater(self, samples): + base_collate = super().collater(samples) + if len(base_collate) > 0: + base_collate["net_input"]["colors"] = torch.tensor( + list(self.color_getter(self.dataset, s["id"]) for s in samples), + dtype=torch.long, + ) + return base_collate diff --git a/SpeechT5/fairseq/fairseq/data/concat_dataset.py b/SpeechT5/fairseq/fairseq/data/concat_dataset.py new file mode 100644 index 0000000000000000000000000000000000000000..01a4078bb159fa44b2d1062b9a971fe7f1abd1c2 --- /dev/null +++ b/SpeechT5/fairseq/fairseq/data/concat_dataset.py @@ -0,0 +1,124 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +import bisect + +import numpy as np +from torch.utils.data.dataloader import default_collate + +from . 
import FairseqDataset + + +class ConcatDataset(FairseqDataset): + @staticmethod + def cumsum(sequence, sample_ratios): + r, s = [], 0 + for e, ratio in zip(sequence, sample_ratios): + curr_len = int(ratio * len(e)) + r.append(curr_len + s) + s += curr_len + return r + + def __init__(self, datasets, sample_ratios=1): + super(ConcatDataset, self).__init__() + assert len(datasets) > 0, "datasets should not be an empty iterable" + self.datasets = list(datasets) + if isinstance(sample_ratios, int): + sample_ratios = [sample_ratios] * len(self.datasets) + self.sample_ratios = sample_ratios + self.cumulative_sizes = self.cumsum(self.datasets, sample_ratios) + self.real_sizes = [len(d) for d in self.datasets] + + def __len__(self): + return self.cumulative_sizes[-1] + + def __getitem__(self, idx): + dataset_idx, sample_idx = self._get_dataset_and_sample_index(idx) + return self.datasets[dataset_idx][sample_idx] + + def _get_dataset_and_sample_index(self, idx: int): + dataset_idx = bisect.bisect_right(self.cumulative_sizes, idx) + if dataset_idx == 0: + sample_idx = idx + else: + sample_idx = idx - self.cumulative_sizes[dataset_idx - 1] + sample_idx = sample_idx % self.real_sizes[dataset_idx] + return dataset_idx, sample_idx + + def collater(self, samples, **extra_args): + # For now only supports datasets with same underlying collater implementations + if hasattr(self.datasets[0], "collater"): + return self.datasets[0].collater(samples, **extra_args) + else: + return default_collate(samples, **extra_args) + + def size(self, idx: int): + """ + Return an example's size as a float or tuple. + """ + dataset_idx, sample_idx = self._get_dataset_and_sample_index(idx) + return self.datasets[dataset_idx].size(sample_idx) + + def num_tokens(self, index: int): + return np.max(self.size(index)) + + def attr(self, attr: str, index: int): + dataset_idx = bisect.bisect_right(self.cumulative_sizes, index) + return getattr(self.datasets[dataset_idx], attr, None) + + @property + def sizes(self): + _dataset_sizes = [] + for ds, sr in zip(self.datasets, self.sample_ratios): + if isinstance(ds.sizes, np.ndarray): + _dataset_sizes.append(np.tile(ds.sizes, sr)) + else: + # Only support underlying dataset with single size array. + assert isinstance(ds.sizes, list) + _dataset_sizes.append(np.tile(ds.sizes[0], sr)) + return np.concatenate(_dataset_sizes) + + @property + def supports_prefetch(self): + return all(d.supports_prefetch for d in self.datasets) + + def ordered_indices(self): + """ + Returns indices sorted by length. So less padding is needed. 
+ """ + if isinstance(self.sizes, np.ndarray) and len(self.sizes.shape) > 1: + # special handling for concatenating lang_pair_datasets + indices = np.arange(len(self)) + sizes = self.sizes + tgt_sizes = ( + sizes[:, 1] if len(sizes.shape) > 0 and sizes.shape[1] > 1 else None + ) + src_sizes = ( + sizes[:, 0] if len(sizes.shape) > 0 and sizes.shape[1] > 1 else sizes + ) + # sort by target length, then source length + if tgt_sizes is not None: + indices = indices[np.argsort(tgt_sizes[indices], kind="mergesort")] + return indices[np.argsort(src_sizes[indices], kind="mergesort")] + else: + return np.argsort(self.sizes) + + def prefetch(self, indices): + frm = 0 + for to, ds in zip(self.cumulative_sizes, self.datasets): + real_size = len(ds) + if getattr(ds, "supports_prefetch", False): + ds.prefetch([(i - frm) % real_size for i in indices if frm <= i < to]) + frm = to + + @property + def can_reuse_epoch_itr_across_epochs(self): + return all(d.can_reuse_epoch_itr_across_epochs for d in self.datasets) + + def set_epoch(self, epoch): + super().set_epoch(epoch) + for ds in self.datasets: + if hasattr(ds, "set_epoch"): + ds.set_epoch(epoch) diff --git a/SpeechT5/fairseq/fairseq/data/concat_sentences_dataset.py b/SpeechT5/fairseq/fairseq/data/concat_sentences_dataset.py new file mode 100644 index 0000000000000000000000000000000000000000..625a29370e90f9d1d7274024afb902ed83a22325 --- /dev/null +++ b/SpeechT5/fairseq/fairseq/data/concat_sentences_dataset.py @@ -0,0 +1,54 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +import torch + +from . import FairseqDataset + + +class ConcatSentencesDataset(FairseqDataset): + def __init__(self, *datasets): + super().__init__() + self.datasets = datasets + assert all( + len(ds) == len(datasets[0]) for ds in datasets + ), "datasets must have the same length" + + def __getitem__(self, index): + return torch.cat([ds[index] for ds in self.datasets]) + + def __len__(self): + return len(self.datasets[0]) + + def collater(self, samples): + return self.datasets[0].collater(samples) + + @property + def sizes(self): + return sum(ds.sizes for ds in self.datasets) + + def num_tokens(self, index): + return sum(ds.num_tokens(index) for ds in self.datasets) + + def size(self, index): + return sum(ds.size(index) for ds in self.datasets) + + def ordered_indices(self): + return self.datasets[0].ordered_indices() + + @property + def supports_prefetch(self): + return any(getattr(ds, "supports_prefetch", False) for ds in self.datasets) + + def prefetch(self, indices): + for ds in self.datasets: + if getattr(ds, "supports_prefetch", False): + ds.prefetch(indices) + + def set_epoch(self, epoch): + super().set_epoch(epoch) + for ds in self.datasets: + if hasattr(ds, "set_epoch"): + ds.set_epoch(epoch) diff --git a/SpeechT5/fairseq/fairseq/data/data_utils.py b/SpeechT5/fairseq/fairseq/data/data_utils.py new file mode 100644 index 0000000000000000000000000000000000000000..b3de57681e0fb6b026003eff19f7745caf6799d3 --- /dev/null +++ b/SpeechT5/fairseq/fairseq/data/data_utils.py @@ -0,0 +1,595 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. 
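The `ConcatDataset` added above maps a flat index back to a (dataset, sample) pair via `cumsum` plus a modulo over each dataset's real size, which is what lets `sample_ratios` up-sample a smaller corpus. A minimal sketch of that index arithmetic, assuming the fairseq package vendored by this diff is importable; plain Python lists stand in for real `FairseqDataset` instances here because only `len()` and indexing are exercised:

```python
from fairseq.data.concat_dataset import ConcatDataset

small = [10, 11, 12]         # toy stand-in for a 3-example dataset
large = list(range(10))      # toy stand-in for a 10-example dataset

ds = ConcatDataset([small, large], sample_ratios=[2, 1])
print(len(ds))   # 2 * 3 + 10 = 16: the small corpus is counted twice
print(ds[4])     # still inside the up-sampled block: 4 % 3 -> small[1] == 11
print(ds[6])     # first index past that block -> large[0] == 0
```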
+ +try: + from collections.abc import Iterable +except ImportError: + from collections import Iterable +import contextlib +import itertools +import logging +import re +import warnings +from typing import Optional, Tuple + +import numpy as np +import torch + +from fairseq.file_io import PathManager +from fairseq import utils +import os + +logger = logging.getLogger(__name__) + + +def infer_language_pair(path): + """Infer language pair from filename: <split>.<lang1>-<lang2>.(...).idx""" + src, dst = None, None + for filename in PathManager.ls(path): + parts = filename.split(".") + if len(parts) >= 3 and len(parts[1].split("-")) == 2: + return parts[1].split("-") + return src, dst + + +def collate_tokens( + values, + pad_idx, + eos_idx=None, + left_pad=False, + move_eos_to_beginning=False, + pad_to_length=None, + pad_to_multiple=1, + pad_to_bsz=None, +): + """Convert a list of 1d tensors into a padded 2d tensor.""" + size = max(v.size(0) for v in values) + size = size if pad_to_length is None else max(size, pad_to_length) + if pad_to_multiple != 1 and size % pad_to_multiple != 0: + size = int(((size - 0.1) // pad_to_multiple + 1) * pad_to_multiple) + + batch_size = len(values) if pad_to_bsz is None else max(len(values), pad_to_bsz) + res = values[0].new(batch_size, size).fill_(pad_idx) + + def copy_tensor(src, dst): + assert dst.numel() == src.numel() + if move_eos_to_beginning: + if eos_idx is None: + # if no eos_idx is specified, then use the last token in src + dst[0] = src[-1] + else: + dst[0] = eos_idx + dst[1:] = src[:-1] + else: + dst.copy_(src) + + for i, v in enumerate(values): + copy_tensor(v, res[i][size - len(v) :] if left_pad else res[i][: len(v)]) + return res + +def load_indexed_dataset( + path, dictionary=None, dataset_impl=None, combine=False, default="cached" +): + """A helper function for loading indexed datasets. + + Args: + path (str): path to indexed dataset (e.g., 'data-bin/train') + dictionary (~fairseq.data.Dictionary): data dictionary + dataset_impl (str, optional): which dataset implementation to use. If + not provided, it will be inferred automatically. For legacy indexed + data we use the 'cached' implementation by default. + combine (bool, optional): automatically load and combine multiple + datasets. For example, if *path* is 'data-bin/train', then we will + combine 'data-bin/train', 'data-bin/train1', ... and return a + single ConcatDataset instance. 
+ """ + import fairseq.data.indexed_dataset as indexed_dataset + from fairseq.data.concat_dataset import ConcatDataset + + datasets = [] + for k in itertools.count(): + path_k = path + (str(k) if k > 0 else "") + try: + path_k = indexed_dataset.get_indexed_dataset_to_local(path_k) + except Exception as e: + if "StorageException: [404] Path not found" in str(e): + logger.warning(f"path_k: {e} not found") + else: + raise e + + dataset_impl_k = dataset_impl + if dataset_impl_k is None: + dataset_impl_k = indexed_dataset.infer_dataset_impl(path_k) + dataset = indexed_dataset.make_dataset( + path_k, + impl=dataset_impl_k or default, + fix_lua_indexing=True, + dictionary=dictionary, + ) + if dataset is None: + break + logger.info("loaded {:,} examples from: {}".format(len(dataset), path_k)) + datasets.append(dataset) + if not combine: + break + if len(datasets) == 0: + return None + elif len(datasets) == 1: + return datasets[0] + else: + return ConcatDataset(datasets) + + +@contextlib.contextmanager +def numpy_seed(seed, *addl_seeds): + """Context manager which seeds the NumPy PRNG with the specified seed and + restores the state afterward""" + if seed is None: + yield + return + if len(addl_seeds) > 0: + seed = int(hash((seed, *addl_seeds)) % 1e6) + state = np.random.get_state() + np.random.seed(seed) + try: + yield + finally: + np.random.set_state(state) + + +def collect_filtered(function, iterable, filtered): + """ + Similar to :func:`filter` but collects filtered elements in ``filtered``. + + Args: + function (callable): function that returns ``False`` for elements that + should be filtered + iterable (iterable): iterable to filter + filtered (list): list to store filtered elements + """ + for el in iterable: + if function(el): + yield el + else: + filtered.append(el) + + +def _filter_by_size_dynamic(indices, size_fn, max_positions, raise_exception=False): + def compare_leq(a, b): + return a <= b if not isinstance(a, tuple) else max(a) <= b + + def check_size(idx): + if isinstance(max_positions, float) or isinstance(max_positions, int): + return size_fn(idx) <= max_positions + elif isinstance(max_positions, dict): + idx_size = size_fn(idx) + assert isinstance(idx_size, dict) + intersect_keys = set(max_positions.keys()) & set(idx_size.keys()) + return all( + all( + a is None or b is None or a <= b + for a, b in zip(idx_size[key], max_positions[key]) + ) + for key in intersect_keys + ) + else: + # For MultiCorpusSampledDataset, will generalize it later + if not isinstance(size_fn(idx), Iterable): + return all(size_fn(idx) <= b for b in max_positions) + return all( + a is None or b is None or a <= b + for a, b in zip(size_fn(idx), max_positions) + ) + + ignored = [] + itr = collect_filtered(check_size, indices, ignored) + indices = np.fromiter(itr, dtype=np.int64, count=-1) + return indices, ignored + + +def filter_by_size(indices, dataset, max_positions, raise_exception=False): + """ + [deprecated] Filter indices based on their size. + Use `FairseqDataset::filter_indices_by_size` instead. + + Args: + indices (List[int]): ordered list of dataset indices + dataset (FairseqDataset): fairseq dataset instance + max_positions (tuple): filter elements larger than this size. + Comparisons are done component-wise. + raise_exception (bool, optional): if ``True``, raise an exception if + any elements are filtered (default: False). + """ + warnings.warn( + "data_utils.filter_by_size is deprecated. 
" + "Use `FairseqDataset::filter_indices_by_size` instead.", + stacklevel=2, + ) + if isinstance(max_positions, float) or isinstance(max_positions, int): + if hasattr(dataset, "sizes") and isinstance(dataset.sizes, np.ndarray): + ignored = indices[dataset.sizes[indices] > max_positions].tolist() + indices = indices[dataset.sizes[indices] <= max_positions] + elif ( + hasattr(dataset, "sizes") + and isinstance(dataset.sizes, list) + and len(dataset.sizes) == 1 + ): + ignored = indices[dataset.sizes[0][indices] > max_positions].tolist() + indices = indices[dataset.sizes[0][indices] <= max_positions] + else: + indices, ignored = _filter_by_size_dynamic( + indices, dataset.size, max_positions + ) + else: + indices, ignored = _filter_by_size_dynamic(indices, dataset.size, max_positions) + + if len(ignored) > 0 and raise_exception: + raise Exception( + ( + "Size of sample #{} is invalid (={}) since max_positions={}, " + "skip this example with --skip-invalid-size-inputs-valid-test" + ).format(ignored[0], dataset.size(ignored[0]), max_positions) + ) + if len(ignored) > 0: + logger.warning( + ( + "{} samples have invalid sizes and will be skipped, " + "max_positions={}, first few sample ids={}" + ).format(len(ignored), max_positions, ignored[:10]) + ) + return indices + + +def filter_paired_dataset_indices_by_size(src_sizes, tgt_sizes, indices, max_sizes): + """Filter a list of sample indices. Remove those that are longer + than specified in max_sizes. + + Args: + indices (np.array): original array of sample indices + max_sizes (int or list[int] or tuple[int]): max sample size, + can be defined separately for src and tgt (then list or tuple) + + Returns: + np.array: filtered sample array + list: list of removed indices + """ + if max_sizes is None: + return indices, [] + if type(max_sizes) in (int, float): + max_src_size, max_tgt_size = max_sizes, max_sizes + else: + max_src_size, max_tgt_size = max_sizes + if tgt_sizes is None: + ignored = indices[src_sizes[indices] > max_src_size] + else: + ignored = indices[ + (src_sizes[indices] > max_src_size) | (tgt_sizes[indices] > max_tgt_size) + ] + if len(ignored) > 0: + if tgt_sizes is None: + indices = indices[src_sizes[indices] <= max_src_size] + else: + indices = indices[ + (src_sizes[indices] <= max_src_size) + & (tgt_sizes[indices] <= max_tgt_size) + ] + return indices, ignored.tolist() + + +def batch_by_size( + indices, + num_tokens_fn, + num_tokens_vec=None, + max_tokens=None, + max_sentences=None, + required_batch_size_multiple=1, + fixed_shapes=None, +): + """ + Yield mini-batches of indices bucketed by size. Batches may contain + sequences of different lengths. + + Args: + indices (List[int]): ordered list of dataset indices + num_tokens_fn (callable): function that returns the number of tokens at + a given index + num_tokens_vec (List[int], optional): precomputed vector of the number + of tokens for each index in indices (to enable faster batch generation) + max_tokens (int, optional): max number of tokens in each batch + (default: None). + max_sentences (int, optional): max number of sentences in each + batch (default: None). + required_batch_size_multiple (int, optional): require batch size to + be less than N or a multiple of N (default: 1). + fixed_shapes (List[Tuple[int, int]], optional): if given, batches will + only be created with the given shapes. *max_sentences* and + *required_batch_size_multiple* will be ignored (default: None). 
+ """ + try: + from fairseq.data.data_utils_fast import ( + batch_by_size_fn, + batch_by_size_vec, + batch_fixed_shapes_fast, + ) + except ImportError: + raise ImportError( + "Please build Cython components with: " + "`python setup.py build_ext --inplace`" + ) + except ValueError: + raise ValueError( + "Please build (or rebuild) Cython components with `python setup.py build_ext --inplace`." + ) + + # added int() to avoid TypeError: an integer is required + max_tokens = ( + int(max_tokens) if max_tokens is not None else -1 + ) + max_sentences = max_sentences if max_sentences is not None else -1 + bsz_mult = required_batch_size_multiple + + if not isinstance(indices, np.ndarray): + indices = np.fromiter(indices, dtype=np.int64, count=-1) + + if num_tokens_vec is not None and not isinstance(num_tokens_vec, np.ndarray): + num_tokens_vec = np.fromiter(num_tokens_vec, dtype=np.int64, count=-1) + + if fixed_shapes is None: + if num_tokens_vec is None: + return batch_by_size_fn( + indices, + num_tokens_fn, + max_tokens, + max_sentences, + bsz_mult, + ) + else: + return batch_by_size_vec( + indices, + num_tokens_vec, + max_tokens, + max_sentences, + bsz_mult, + ) + + else: + fixed_shapes = np.array(fixed_shapes, dtype=np.int64) + sort_order = np.lexsort( + [ + fixed_shapes[:, 1].argsort(), # length + fixed_shapes[:, 0].argsort(), # bsz + ] + ) + fixed_shapes_sorted = fixed_shapes[sort_order] + return batch_fixed_shapes_fast(indices, num_tokens_fn, fixed_shapes_sorted) + + +def post_process(sentence: str, symbol: str): + if symbol == "sentencepiece": + sentence = sentence.replace(" ", "").replace("\u2581", " ").strip() + elif symbol == "wordpiece": + sentence = sentence.replace(" ", "").replace("_", " ").strip() + elif symbol == "letter": + sentence = sentence.replace(" ", "").replace("|", " ").strip() + elif symbol == "silence": + import re + sentence = sentence.replace("<SIL>", "") + sentence = re.sub(' +', ' ', sentence).strip() + elif symbol == "_EOW": + sentence = sentence.replace(" ", "").replace("_EOW", " ").strip() + elif symbol in {"subword_nmt", "@@ ", "@@"}: + if symbol == "subword_nmt": + symbol = "@@ " + sentence = (sentence + " ").replace(symbol, "").rstrip() + elif symbol == "none": + pass + elif symbol is not None: + raise NotImplementedError(f"Unknown post_process option: {symbol}") + return sentence + + +def compute_mask_indices( + shape: Tuple[int, int], + padding_mask: Optional[torch.Tensor], + mask_prob: float, + mask_length: int, + mask_type: str = "static", + mask_other: float = 0.0, + min_masks: int = 0, + no_overlap: bool = False, + min_space: int = 0, +) -> np.ndarray: + """ + Computes random mask spans for a given shape + + Args: + shape: the the shape for which to compute masks. + should be of size 2 where first element is batch size and 2nd is timesteps + padding_mask: optional padding mask of the same size as shape, which will prevent masking padded elements + mask_prob: probability for each token to be chosen as start of the span to be masked. this will be multiplied by + number of timesteps divided by length of mask span to mask approximately this percentage of all elements. + however due to overlaps, the actual number will be smaller (unless no_overlap is True) + mask_type: how to compute mask lengths + static = fixed size + uniform = sample from uniform distribution [mask_other, mask_length*2] + normal = sample from normal distribution with mean mask_length and stdev mask_other. 
mask is min 1 element + poisson = sample from possion distribution with lambda = mask length + min_masks: minimum number of masked spans + no_overlap: if false, will switch to an alternative recursive algorithm that prevents spans from overlapping + min_space: only used if no_overlap is True, this is how many elements to keep unmasked between spans + """ + + bsz, all_sz = shape + mask = np.full((bsz, all_sz), False) + + all_num_mask = int( + # add a random number for probabilistic rounding + mask_prob * all_sz / float(mask_length) + + np.random.rand() + ) + + all_num_mask = max(min_masks, all_num_mask) + + mask_idcs = [] + for i in range(bsz): + if padding_mask is not None: + sz = all_sz - padding_mask[i].long().sum().item() + num_mask = int( + # add a random number for probabilistic rounding + mask_prob * sz / float(mask_length) + + np.random.rand() + ) + num_mask = max(min_masks, num_mask) + else: + sz = all_sz + num_mask = all_num_mask + + if mask_type == "static": + lengths = np.full(num_mask, mask_length) + elif mask_type == "uniform": + lengths = np.random.randint(mask_other, mask_length * 2 + 1, size=num_mask) + elif mask_type == "normal": + lengths = np.random.normal(mask_length, mask_other, size=num_mask) + lengths = [max(1, int(round(x))) for x in lengths] + elif mask_type == "poisson": + lengths = np.random.poisson(mask_length, size=num_mask) + lengths = [int(round(x)) for x in lengths] + else: + raise Exception("unknown mask selection " + mask_type) + + if sum(lengths) == 0: + lengths[0] = min(mask_length, sz - 1) + + if no_overlap: + mask_idc = [] + + def arrange(s, e, length, keep_length): + span_start = np.random.randint(s, e - length) + mask_idc.extend(span_start + i for i in range(length)) + + new_parts = [] + if span_start - s - min_space >= keep_length: + new_parts.append((s, span_start - min_space + 1)) + if e - span_start - keep_length - min_space > keep_length: + new_parts.append((span_start + length + min_space, e)) + return new_parts + + parts = [(0, sz)] + min_length = min(lengths) + for length in sorted(lengths, reverse=True): + lens = np.fromiter( + (e - s if e - s >= length + min_space else 0 for s, e in parts), + np.int, + ) + l_sum = np.sum(lens) + if l_sum == 0: + break + probs = lens / np.sum(lens) + c = np.random.choice(len(parts), p=probs) + s, e = parts.pop(c) + parts.extend(arrange(s, e, length, min_length)) + mask_idc = np.asarray(mask_idc) + else: + min_len = min(lengths) + if sz - min_len <= num_mask: + min_len = sz - num_mask - 1 + + mask_idc = np.random.choice(sz - min_len, num_mask, replace=False) + + mask_idc = np.asarray( + [ + mask_idc[j] + offset + for j in range(len(mask_idc)) + for offset in range(lengths[j]) + ] + ) + + mask_idcs.append(np.unique(mask_idc[mask_idc < sz])) + + min_len = min([len(m) for m in mask_idcs]) + for i, mask_idc in enumerate(mask_idcs): + if len(mask_idc) > min_len: + mask_idc = np.random.choice(mask_idc, min_len, replace=False) + mask[i, mask_idc] = True + + return mask + + +def get_mem_usage(): + try: + import psutil + + mb = 1024 * 1024 + return f"used={psutil.virtual_memory().used / mb}Mb; avail={psutil.virtual_memory().available / mb}Mb" + except ImportError: + return "N/A" + + +# lens: torch.LongTensor +# returns: torch.BoolTensor +def lengths_to_padding_mask(lens): + bsz, max_lens = lens.size(0), torch.max(lens).item() + mask = torch.arange(max_lens).to(lens.device).view(1, max_lens) + mask = mask.expand(bsz, -1) >= lens.view(bsz, 1).expand(-1, max_lens) + return mask + + +# lens: torch.LongTensor +# returns: 
torch.BoolTensor +def lengths_to_mask(lens): + return ~lengths_to_padding_mask(lens) + + +def get_buckets(sizes, num_buckets): + buckets = np.unique( + np.percentile( + sizes, + np.linspace(0, 100, num_buckets + 1), + interpolation='lower', + )[1:] + ) + return buckets + + +def get_bucketed_sizes(orig_sizes, buckets): + sizes = np.copy(orig_sizes) + assert np.min(sizes) >= 0 + start_val = -1 + for end_val in buckets: + mask = (sizes > start_val) & (sizes <= end_val) + sizes[mask] = end_val + start_val = end_val + return sizes + + + +def _find_extra_valid_paths(dataset_path: str) -> set: + paths = utils.split_paths(dataset_path) + all_valid_paths = set() + for sub_dir in paths: + contents = PathManager.ls(sub_dir) + valid_paths = [c for c in contents if re.match("valid*[0-9].*", c) is not None] + all_valid_paths |= {os.path.basename(p) for p in valid_paths} + # Remove .bin, .idx etc + roots = {os.path.splitext(p)[0] for p in all_valid_paths} + return roots + + +def raise_if_valid_subsets_unintentionally_ignored(train_cfg) -> None: + """Raises if there are paths matching 'valid*[0-9].*' which are not combined or ignored.""" + if ( + train_cfg.dataset.ignore_unused_valid_subsets + or train_cfg.dataset.combine_valid_subsets + or train_cfg.dataset.disable_validation + or not hasattr(train_cfg.task, "data") + ): + return + other_paths = _find_extra_valid_paths(train_cfg.task.data) + specified_subsets = train_cfg.dataset.valid_subset.split(",") + ignored_paths = [p for p in other_paths if p not in specified_subsets] + if ignored_paths: + advice = "Set --combine-val to combine them or --ignore-unused-valid-subsets to ignore them." + msg = f"Valid paths {ignored_paths} will be ignored. {advice}" + raise ValueError(msg) diff --git a/SpeechT5/fairseq/fairseq/data/data_utils_fast.pyx b/SpeechT5/fairseq/fairseq/data/data_utils_fast.pyx new file mode 100644 index 0000000000000000000000000000000000000000..c61f31d6b2113d4c6a03d6553335997098ba0c20 --- /dev/null +++ b/SpeechT5/fairseq/fairseq/data/data_utils_fast.pyx @@ -0,0 +1,178 @@ +# cython: language_level=3 +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. 
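`compute_mask_indices` defined above is the span-masking helper used for speech pretraining; a small usage sketch, assuming this module is importable as `fairseq.data.data_utils` (the numbers are illustrative, not prescribed defaults):

```python
from fairseq.data.data_utils import compute_mask_indices

# Mask spans of 10 timesteps so that roughly 65% of each 100-step utterance is covered.
mask = compute_mask_indices(
    shape=(2, 100),        # (batch, timesteps)
    padding_mask=None,     # optionally a tensor flagging padded frames so they are excluded
    mask_prob=0.65,
    mask_length=10,
    mask_type="static",    # fixed span length; "uniform"/"normal"/"poisson" sample lengths instead
)
print(mask.shape)          # (2, 100) boolean numpy array
print(mask.sum(axis=-1))   # around 60-70 masked steps per row, minus any overlap between spans
```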
+ +import numpy as np + +cimport cython +cimport numpy as np + +from libc.stdint cimport int32_t, int64_t +from libcpp cimport bool as bool_t + +ctypedef int64_t DTYPE_t + +@cython.cdivision(True) +@cython.boundscheck(False) +@cython.wraparound(False) +cpdef list batch_by_size_vec( + np.ndarray[int64_t, ndim=1] indices, + np.ndarray[int64_t, ndim=1] num_tokens_vec, + int64_t max_tokens, + int64_t max_sentences, + int32_t bsz_mult, +): + if indices.shape[0] == 0: + return [] + + assert max_tokens <= 0 or np.max(num_tokens_vec) <= max_tokens, ( + f"Sentences lengths should not exceed max_tokens={max_tokens}" + ) + + cdef int32_t indices_len = indices.shape[0] + cdef np.ndarray[int32_t, ndim=1] batches_ends = \ + np.zeros(indices_len, dtype=np.int32) + cdef int32_t[:] batches_ends_view = batches_ends + cdef int64_t[:] num_tokens_view = num_tokens_vec + + cdef int32_t pos = 0 + cdef int32_t new_batch_end = 0 + + cdef int64_t new_batch_max_tokens = 0 + cdef int32_t new_batch_sentences = 0 + cdef int64_t new_batch_num_tokens = 0 + + cdef bool_t overflow = False + cdef bool_t size_matches_with_bsz_mult = False + + cdef int32_t batches_count = 0 + cdef int32_t batch_start = 0 + cdef int64_t tail_max_tokens = 0 + cdef int64_t batch_max_tokens = 0 + + for pos in range(indices_len): + # At every pos we keep stats about the last complete batch [batch_start:batch_end), + # and tail [batch_end:pos]. + # 1) Every time when (batch + tail) forms a valid batch + # (according to max_tokens, max_sentences and bsz_mult) we append tail to batch. + # 2) When (batch+tail) violates max_tokens or max_sentences constraints + # we finalize running batch, and tail becomes a new batch. + # 3) There is a corner case when tail also violates constraints. + # In that situation [batch_end:pos-1] (tail without the current pos) + # gets added to the finalized batches, while [pos:pos] becomes a new tail. + # + # Important: For the sake of performance try to avoid using function calls within this loop. 
+ + tail_max_tokens = tail_max_tokens \ + if tail_max_tokens > num_tokens_view[pos] \ + else num_tokens_view[pos] + new_batch_end = pos + 1 + new_batch_max_tokens = batch_max_tokens \ + if batch_max_tokens > tail_max_tokens \ + else tail_max_tokens + new_batch_sentences = new_batch_end - batch_start + new_batch_num_tokens = new_batch_sentences * new_batch_max_tokens + + overflow = (new_batch_sentences > max_sentences > 0 or + new_batch_num_tokens > max_tokens > 0) + size_matches_with_bsz_mult = (new_batch_sentences < bsz_mult or + new_batch_sentences % bsz_mult == 0) + + if overflow: + tail_num_tokens = tail_max_tokens * \ + (new_batch_end - batches_ends_view[batches_count]) + tail_overflow = tail_num_tokens > max_tokens > 0 + # In case of a tail overflow finalize two batches + if tail_overflow: + batches_count += 1 + batches_ends_view[batches_count] = pos + tail_max_tokens = num_tokens_view[pos] + batch_start = batches_ends_view[batches_count] + batches_count += 1 + new_batch_max_tokens = tail_max_tokens + + if overflow or size_matches_with_bsz_mult: + batches_ends_view[batches_count] = new_batch_end + batch_max_tokens = new_batch_max_tokens + tail_max_tokens = 0 + if batches_ends_view[batches_count] != indices_len: + batches_count += 1 + # Memory and time-efficient split + return np.split(indices, batches_ends[:batches_count]) + + +@cython.boundscheck(False) +@cython.wraparound(False) +cpdef list batch_by_size_fn( + np.ndarray[DTYPE_t, ndim=1] indices, + num_tokens_fn, + int64_t max_tokens, + int64_t max_sentences, + int32_t bsz_mult, +): + cdef int32_t indices_len = indices.shape[0] + cdef np.ndarray[int64_t, ndim=1] num_tokens_vec = np.zeros(indices_len, + dtype=np.int64) + cdef DTYPE_t[:] indices_view = indices + cdef DTYPE_t[:] num_tokens_vec_view = num_tokens_vec + cdef int64_t pos + for pos in range(indices_len): + num_tokens_vec[pos] = num_tokens_fn(indices_view[pos]) + return batch_by_size_vec(indices, num_tokens_vec, max_tokens, + max_sentences, bsz_mult,) + + +cdef _find_valid_shape( + DTYPE_t[:, :] shapes_view, + int64_t num_sentences, + int64_t num_tokens, +): + """Return index of first valid shape of -1 if none is found.""" + for i in range(shapes_view.shape[0]): + if num_sentences <= shapes_view[i][0] and num_tokens <= shapes_view[i][1]: + return i + return -1 + + +@cython.cdivision(True) +cpdef list batch_fixed_shapes_fast( + np.ndarray[DTYPE_t, ndim=1] indices, + num_tokens_fn, + np.ndarray[DTYPE_t, ndim=2] fixed_shapes_sorted, +): + cdef int64_t sample_len = 0 + cdef list sample_lens = [] + cdef list batch = [] + cdef list batches = [] + cdef int64_t mod_len + cdef int64_t i + cdef int64_t idx + cdef int64_t num_tokens + cdef DTYPE_t[:] indices_view = indices + cdef DTYPE_t[:, :] shapes_view = fixed_shapes_sorted + + for i in range(len(indices_view)): + idx = indices_view[i] + num_tokens = num_tokens_fn(idx) + sample_lens.append(num_tokens) + sample_len = max(sample_len, num_tokens) + + shape_idx = _find_valid_shape(shapes_view, len(batch) + 1, sample_len) + if shape_idx == -1: + batches.append(batch) + batch = [] + sample_lens = [] + sample_len = 0 + shapes_view = fixed_shapes_sorted + elif shape_idx > 0: + # small optimization for the next call to _find_valid_shape + shapes_view = shapes_view[shape_idx:] + + batch.append(idx) + + if len(batch) > 0: + batches.append(batch) + + return batches diff --git a/SpeechT5/fairseq/fairseq/data/denoising_dataset.py b/SpeechT5/fairseq/fairseq/data/denoising_dataset.py new file mode 100644 index 
0000000000000000000000000000000000000000..bdb62c8d5db9c8755c72db4d0d8083c936f18dc8 --- /dev/null +++ b/SpeechT5/fairseq/fairseq/data/denoising_dataset.py @@ -0,0 +1,436 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +import math + +import numpy as np +import torch + +from . import FairseqDataset, data_utils + + +def collate( + samples, + pad_idx, + eos_idx, + vocab, + left_pad_source=False, + left_pad_target=False, + input_feeding=True, + pad_to_length=None, +): + assert input_feeding + if len(samples) == 0: + return {} + + def merge(key, left_pad, move_eos_to_beginning=False, pad_to_length=None): + return data_utils.collate_tokens( + [s[key] for s in samples], + pad_idx, + eos_idx=None, # use eos_idx of each sample instead of vocab.eos() + left_pad=left_pad, + move_eos_to_beginning=move_eos_to_beginning, + pad_to_length=pad_to_length, + ) + + id = torch.LongTensor([s["id"] for s in samples]) + src_tokens = merge( + "source", + left_pad=left_pad_source, + pad_to_length=pad_to_length["source"] if pad_to_length is not None else None, + ) + # sort by descending source length + src_lengths = torch.LongTensor([s["source"].numel() for s in samples]) + src_lengths, sort_order = src_lengths.sort(descending=True) + id = id.index_select(0, sort_order) + src_tokens = src_tokens.index_select(0, sort_order) + + prev_output_tokens = None + target = None + if samples[0].get("target", None) is not None: + target = merge( + "target", + left_pad=left_pad_target, + pad_to_length=pad_to_length["target"] + if pad_to_length is not None + else None, + ) + target = target.index_select(0, sort_order) + ntokens = sum(len(s["target"]) for s in samples) + + if input_feeding: + # we create a shifted version of targets for feeding the + # previous output token(s) into the next decoder step + prev_output_tokens = merge( + "target", + left_pad=left_pad_target, + move_eos_to_beginning=True, + pad_to_length=pad_to_length["target"] + if pad_to_length is not None + else None, + ) + prev_output_tokens = prev_output_tokens.index_select(0, sort_order) + else: + ntokens = sum(len(s["source"]) for s in samples) + + batch = { + "id": id, + "ntokens": ntokens, + "net_input": { + "src_tokens": src_tokens, + "src_lengths": src_lengths, + }, + "target": target, + "nsentences": samples[0]["source"].size(0), + "sort_order": sort_order, + } + if prev_output_tokens is not None: + batch["net_input"]["prev_output_tokens"] = prev_output_tokens + + return batch + + +class DenoisingDataset(FairseqDataset): + """ + A wrapper around TokenBlockDataset for BART dataset. + + Args: + dataset (TokenBlockDataset): dataset to wrap + sizes (List[int]): sentence lengths + vocab (~fairseq.data.Dictionary): vocabulary + mask_idx (int): dictionary index used for masked token + mask_whole_words: only mask whole words. This should be a byte mask + over vocab indices, indicating whether it is the beginning of a + word. We will extend any mask to encompass the whole word. + shuffle (bool, optional): shuffle the elements before batching. + Default: ``True`` + seed: Seed for random number generator for reproducibility. + args: argparse arguments. 
+ """ + + def __init__( + self, + dataset, + sizes, + vocab, + mask_idx, + mask_whole_words, + shuffle, + seed, + args, + eos=None, + item_transform_func=None, + ): + self.dataset = dataset + + self.sizes = sizes + + self.vocab = vocab + self.shuffle = shuffle + self.seed = seed + self.mask_idx = mask_idx + self.mask_whole_word = mask_whole_words + self.mask_ratio = args.mask + self.random_ratio = args.mask_random + self.insert_ratio = args.insert + self.rotate_ratio = args.rotate + self.permute_sentence_ratio = args.permute_sentences + self.eos = eos if eos is not None else vocab.eos() + self.item_transform_func = item_transform_func + + if args.bpe != "gpt2": + self.full_stop_index = self.vocab.eos() + else: + assert args.bpe == "gpt2" + self.full_stop_index = self.vocab.index("13") + + self.replace_length = args.replace_length + if self.replace_length not in [-1, 0, 1]: + raise ValueError(f"invalid arg: replace_length={self.replace_length}") + if args.mask_length not in ["subword", "word", "span-poisson"]: + raise ValueError(f"invalid arg: mask-length={args.mask_length}") + if args.mask_length == "subword" and args.replace_length not in [0, 1]: + raise ValueError(f"if using subwords, use replace-length=1 or 0") + + self.mask_span_distribution = None + if args.mask_length == "span-poisson": + _lambda = args.poisson_lambda + + lambda_to_the_k = 1 + e_to_the_minus_lambda = math.exp(-_lambda) + k_factorial = 1 + ps = [] + for k in range(0, 128): + ps.append(e_to_the_minus_lambda * lambda_to_the_k / k_factorial) + lambda_to_the_k *= _lambda + k_factorial *= k + 1 + if ps[-1] < 0.0000001: + break + ps = torch.FloatTensor(ps) + self.mask_span_distribution = torch.distributions.Categorical(ps) + + self.epoch = 0 + + @property + def can_reuse_epoch_itr_across_epochs(self): + return True # only the noise changes, not item sizes + + def set_epoch(self, epoch, **unused): + self.epoch = epoch + + def __getitem__(self, index): + with data_utils.numpy_seed(self.seed, self.epoch, index): + tokens = self.dataset[index] + assert tokens[-1] == self.eos + source, target = tokens, tokens.clone() + + if self.permute_sentence_ratio > 0.0: + source = self.permute_sentences(source, self.permute_sentence_ratio) + + if self.mask_ratio > 0: + source = self.add_whole_word_mask(source, self.mask_ratio) + + if self.insert_ratio > 0: + source = self.add_insertion_noise(source, self.insert_ratio) + + if self.rotate_ratio > 0.0 and np.random.random() < self.rotate_ratio: + source = self.add_rolling_noise(source) + # there can additional changes to make: + if self.item_transform_func is not None: + source, target = self.item_transform_func(source, target) + + assert (source >= 0).all() + assert (source[1:-1] >= 1).all() + assert (source <= len(self.vocab)).all() + assert source[0] == self.vocab.bos() + assert source[-1] == self.eos + return { + "id": index, + "source": source, + "target": target, + } + + def __len__(self): + return len(self.dataset) + + def permute_sentences(self, source, p=1.0): + full_stops = source == self.full_stop_index + # Pretend it ends with a full stop so last span is a sentence + full_stops[-2] = 1 + + # Tokens that are full stops, where the previous token is not + sentence_ends = (full_stops[1:] * ~full_stops[:-1]).nonzero(as_tuple=False) + 2 + result = source.clone() + + num_sentences = sentence_ends.size(0) + num_to_permute = math.ceil((num_sentences * 2 * p) / 2.0) + substitutions = torch.randperm(num_sentences)[:num_to_permute] + ordering = torch.arange(0, num_sentences) + 
ordering[substitutions] = substitutions[torch.randperm(num_to_permute)] + + # Ignore <bos> at start + index = 1 + for i in ordering: + sentence = source[(sentence_ends[i - 1] if i > 0 else 1) : sentence_ends[i]] + result[index : index + sentence.size(0)] = sentence + index += sentence.size(0) + return result + + def word_starts(self, source): + if self.mask_whole_word is not None: + is_word_start = self.mask_whole_word.gather(0, source) + else: + is_word_start = torch.ones(source.size()) + is_word_start[0] = 0 + is_word_start[-1] = 0 + return is_word_start + + def add_whole_word_mask(self, source, p): + is_word_start = self.word_starts(source) + num_to_mask = int(math.ceil(is_word_start.float().sum() * p)) + num_inserts = 0 + if num_to_mask == 0: + return source + + if self.mask_span_distribution is not None: + lengths = self.mask_span_distribution.sample(sample_shape=(num_to_mask,)) + + # Make sure we have enough to mask + cum_length = torch.cumsum(lengths, 0) + while cum_length[-1] < num_to_mask: + lengths = torch.cat( + [ + lengths, + self.mask_span_distribution.sample(sample_shape=(num_to_mask,)), + ], + dim=0, + ) + cum_length = torch.cumsum(lengths, 0) + + # Trim to masking budget + i = 0 + while cum_length[i] < num_to_mask: + i += 1 + lengths[i] = num_to_mask - (0 if i == 0 else cum_length[i - 1]) + num_to_mask = i + 1 + lengths = lengths[:num_to_mask] + + # Handle 0-length mask (inserts) separately + lengths = lengths[lengths > 0] + num_inserts = num_to_mask - lengths.size(0) + num_to_mask -= num_inserts + if num_to_mask == 0: + return self.add_insertion_noise(source, num_inserts / source.size(0)) + + assert (lengths > 0).all() + else: + lengths = torch.ones((num_to_mask,)).long() + assert is_word_start[-1] == 0 + word_starts = is_word_start.nonzero(as_tuple=False) + indices = word_starts[ + torch.randperm(word_starts.size(0))[:num_to_mask] + ].squeeze(1) + mask_random = torch.FloatTensor(num_to_mask).uniform_() < self.random_ratio + + source_length = source.size(0) + assert source_length - 1 not in indices + to_keep = torch.ones(source_length, dtype=torch.bool) + is_word_start[ + -1 + ] = 255 # acts as a long length, so spans don't go over the end of doc + if self.replace_length == 0: + to_keep[indices] = 0 + else: + # keep index, but replace it with [MASK] + source[indices] = self.mask_idx + source[indices[mask_random]] = torch.randint( + 1, len(self.vocab), size=(mask_random.sum(),) + ) + + if self.mask_span_distribution is not None: + assert len(lengths.size()) == 1 + assert lengths.size() == indices.size() + lengths -= 1 + while indices.size(0) > 0: + assert lengths.size() == indices.size() + lengths -= is_word_start[indices + 1].long() + uncompleted = lengths >= 0 + indices = indices[uncompleted] + 1 + mask_random = mask_random[uncompleted] + lengths = lengths[uncompleted] + if self.replace_length != -1: + # delete token + to_keep[indices] = 0 + else: + # keep index, but replace it with [MASK] + source[indices] = self.mask_idx + source[indices[mask_random]] = torch.randint( + 1, len(self.vocab), size=(mask_random.sum(),) + ) + else: + # A bit faster when all lengths are 1 + while indices.size(0) > 0: + uncompleted = is_word_start[indices + 1] == 0 + indices = indices[uncompleted] + 1 + mask_random = mask_random[uncompleted] + if self.replace_length != -1: + # delete token + to_keep[indices] = 0 + else: + # keep index, but replace it with [MASK] + source[indices] = self.mask_idx + source[indices[mask_random]] = torch.randint( + 1, len(self.vocab), size=(mask_random.sum(),) + ) 
+ + assert source_length - 1 not in indices + + source = source[to_keep] + + if num_inserts > 0: + source = self.add_insertion_noise(source, num_inserts / source.size(0)) + + return source + + def add_permuted_noise(self, tokens, p): + num_words = len(tokens) + num_to_permute = math.ceil(((num_words * 2) * p) / 2.0) + substitutions = torch.randperm(num_words - 2)[:num_to_permute] + 1 + tokens[substitutions] = tokens[substitutions[torch.randperm(num_to_permute)]] + return tokens + + def add_rolling_noise(self, tokens): + offset = np.random.randint(1, max(1, tokens.size(-1) - 1) + 1) + tokens = torch.cat( + (tokens[0:1], tokens[offset:-1], tokens[1:offset], tokens[-1:]), + dim=0, + ) + return tokens + + def add_insertion_noise(self, tokens, p): + if p == 0.0: + return tokens + + num_tokens = len(tokens) + n = int(math.ceil(num_tokens * p)) + + noise_indices = torch.randperm(num_tokens + n - 2)[:n] + 1 + noise_mask = torch.zeros(size=(num_tokens + n,), dtype=torch.bool) + noise_mask[noise_indices] = 1 + result = torch.LongTensor(n + len(tokens)).fill_(-1) + + num_random = int(math.ceil(n * self.random_ratio)) + result[noise_indices[num_random:]] = self.mask_idx + result[noise_indices[:num_random]] = torch.randint( + low=1, high=len(self.vocab), size=(num_random,) + ) + + result[~noise_mask] = tokens + + assert (result >= 0).all() + return result + + def collater(self, samples, pad_to_length=None): + """Merge a list of samples to form a mini-batch. + Args: + samples (List[dict]): samples to collate + Returns: + dict: a mini-batch of data + """ + return collate( + samples, self.vocab.pad(), self.eos, self.vocab, pad_to_length=pad_to_length + ) + + def num_tokens(self, index): + """Return the number of tokens in a sample. This value is used to + enforce ``--max-tokens`` during batching.""" + return self.sizes[index] + + def size(self, index): + """Return an example's size as a float or tuple. This value is used when + filtering a dataset with ``--max-positions``.""" + return self.sizes[index] + + def ordered_indices(self): + """Return an ordered list of indices. Batches will be constructed based + on this order.""" + if self.shuffle: + indices = np.random.permutation(len(self)) + else: + indices = np.arange(len(self)) + return indices[np.argsort(self.sizes[indices], kind="mergesort")] + + def prefetch(self, indices): + self.src.prefetch(indices) + self.tgt.prefetch(indices) + + @property + def supports_prefetch(self): + return ( + hasattr(self.src, "supports_prefetch") + and self.src.supports_prefetch + and hasattr(self.tgt, "supports_prefetch") + and self.tgt.supports_prefetch + ) diff --git a/SpeechT5/fairseq/fairseq/data/dictionary.py b/SpeechT5/fairseq/fairseq/data/dictionary.py new file mode 100644 index 0000000000000000000000000000000000000000..0d8308a811c5e558d9024a18d8545804dc0ecdfd --- /dev/null +++ b/SpeechT5/fairseq/fairseq/data/dictionary.py @@ -0,0 +1,395 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. 
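The `span-poisson` branch of `DenoisingDataset.__init__` above builds its span-length sampler by accumulating e^-λ·λ^k/k! terms until the tail mass becomes negligible. A standalone sketch of the same construction; λ = 3.5 is only an illustrative stand-in for `args.poisson_lambda`:

```python
import math
import torch

_lambda = 3.5                          # illustrative value for args.poisson_lambda
lambda_to_the_k = 1.0
e_to_the_minus_lambda = math.exp(-_lambda)
k_factorial = 1.0
ps = []
for k in range(0, 128):
    # P(K = k) = e^-lambda * lambda^k / k!
    ps.append(e_to_the_minus_lambda * lambda_to_the_k / k_factorial)
    lambda_to_the_k *= _lambda
    k_factorial *= k + 1
    if ps[-1] < 1e-7:                  # truncate once the remaining mass is negligible
        break

mask_span_distribution = torch.distributions.Categorical(torch.FloatTensor(ps))
lengths = mask_span_distribution.sample(sample_shape=(8,))
print(lengths)                         # sampled span lengths; zeros later become pure insertions
```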
+ +import os +from collections import Counter +from multiprocessing import Pool + +import torch +from fairseq import utils +from fairseq.binarizer import safe_readline +from fairseq.data import data_utils +from fairseq.file_io import PathManager +from fairseq.tokenizer import tokenize_line + + +class Dictionary: + """A mapping from symbols to consecutive integers""" + + def __init__( + self, + *, # begin keyword-only arguments + bos="<s>", + pad="<pad>", + eos="</s>", + unk="<unk>", + extra_special_symbols=None, + ): + self.bos_word, self.unk_word, self.pad_word, self.eos_word = bos, unk, pad, eos + self.symbols = [] + self.count = [] + self.indices = {} + self.bos_index = self.add_symbol(bos) + self.pad_index = self.add_symbol(pad) + self.eos_index = self.add_symbol(eos) + self.unk_index = self.add_symbol(unk) + if extra_special_symbols: + for s in extra_special_symbols: + self.add_symbol(s) + self.nspecial = len(self.symbols) + + def __eq__(self, other): + return self.indices == other.indices + + def __getitem__(self, idx): + if idx < len(self.symbols): + return self.symbols[idx] + return self.unk_word + + def __len__(self): + """Returns the number of symbols in the dictionary""" + return len(self.symbols) + + def __contains__(self, sym): + return sym in self.indices + + def index(self, sym): + """Returns the index of the specified symbol""" + assert isinstance(sym, str) + if sym in self.indices: + return self.indices[sym] + return self.unk_index + + def string( + self, + tensor, + bpe_symbol=None, + escape_unk=False, + extra_symbols_to_ignore=None, + unk_string=None, + include_eos=False, + separator=" ", + ): + """Helper for converting a tensor of token indices to a string. + + Can optionally remove BPE symbols or escape <unk> words. + """ + if torch.is_tensor(tensor) and tensor.dim() == 2: + return "\n".join( + self.string(t, bpe_symbol, escape_unk, extra_symbols_to_ignore, include_eos=include_eos) + for t in tensor + ) + + extra_symbols_to_ignore = set(extra_symbols_to_ignore or []) + extra_symbols_to_ignore.add(self.eos()) + + def token_string(i): + if i == self.unk(): + if unk_string is not None: + return unk_string + else: + return self.unk_string(escape_unk) + else: + return self[i] + + if hasattr(self, "bos_index"): + extra_symbols_to_ignore.add(self.bos()) + + sent = separator.join( + token_string(i) + for i in tensor + if utils.item(i) not in extra_symbols_to_ignore + ) + + return data_utils.post_process(sent, bpe_symbol) + + def unk_string(self, escape=False): + """Return unknown string, optionally escaped as: <<unk>>""" + if escape: + return "<{}>".format(self.unk_word) + else: + return self.unk_word + + def add_symbol(self, word, n=1, overwrite=False): + """Adds a word to the dictionary""" + if word in self.indices and not overwrite: + idx = self.indices[word] + self.count[idx] = self.count[idx] + n + return idx + else: + idx = len(self.symbols) + self.indices[word] = idx + self.symbols.append(word) + self.count.append(n) + return idx + + def update(self, new_dict): + """Updates counts from new dictionary.""" + for word in new_dict.symbols: + idx2 = new_dict.indices[word] + if word in self.indices: + idx = self.indices[word] + self.count[idx] = self.count[idx] + new_dict.count[idx2] + else: + idx = len(self.symbols) + self.indices[word] = idx + self.symbols.append(word) + self.count.append(new_dict.count[idx2]) + + def finalize(self, threshold=-1, nwords=-1, padding_factor=8): + """Sort symbols by frequency in descending order, ignoring special ones. 
+ + Args: + - threshold defines the minimum word count + - nwords defines the total number of words in the final dictionary, + including special symbols + - padding_factor can be used to pad the dictionary size to be a + multiple of 8, which is important on some hardware (e.g., Nvidia + Tensor Cores). + """ + if nwords <= 0: + nwords = len(self) + + new_indices = dict(zip(self.symbols[: self.nspecial], range(self.nspecial))) + new_symbols = self.symbols[: self.nspecial] + new_count = self.count[: self.nspecial] + + c = Counter( + dict( + sorted(zip(self.symbols[self.nspecial :], self.count[self.nspecial :])) + ) + ) + for symbol, count in c.most_common(nwords - self.nspecial): + if count >= threshold: + new_indices[symbol] = len(new_symbols) + new_symbols.append(symbol) + new_count.append(count) + else: + break + + assert len(new_symbols) == len(new_indices) + + self.count = list(new_count) + self.symbols = list(new_symbols) + self.indices = new_indices + + self.pad_to_multiple_(padding_factor) + + def pad_to_multiple_(self, padding_factor): + """Pad Dictionary size to be a multiple of *padding_factor*.""" + if padding_factor > 1: + i = 0 + while len(self) % padding_factor != 0: + symbol = "madeupword{:04d}".format(i) + self.add_symbol(symbol, n=0) + i += 1 + + def bos(self): + """Helper to get index of beginning-of-sentence symbol""" + return self.bos_index + + def pad(self): + """Helper to get index of pad symbol""" + return self.pad_index + + def eos(self): + """Helper to get index of end-of-sentence symbol""" + return self.eos_index + + def unk(self): + """Helper to get index of unk symbol""" + return self.unk_index + + @classmethod + def load(cls, f): + """Loads the dictionary from a text file with the format: + + ``` + <symbol0> <count0> + <symbol1> <count1> + ... + ``` + """ + d = cls() + d.add_from_file(f) + return d + + def add_from_file(self, f): + """ + Loads a pre-existing dictionary from a text file and adds its symbols + to this instance. + """ + if isinstance(f, str): + try: + with open(PathManager.get_local_path(f), "r", encoding="utf-8") as fd: + self.add_from_file(fd) + except FileNotFoundError as fnfe: + raise fnfe + except UnicodeError: + raise Exception( + "Incorrect encoding detected in {}, please " + "rebuild the dataset".format(f) + ) + return + + lines = f.readlines() + indices_start_line = self._load_meta(lines) + + for line in lines[indices_start_line:]: + try: + line, field = line.rstrip().rsplit(" ", 1) + if field == "#fairseq:overwrite": + overwrite = True + line, field = line.rsplit(" ", 1) + else: + overwrite = False + count = int(field) + word = line + if word in self and not overwrite: + raise RuntimeError( + "Duplicate word found when loading Dictionary: '{}'. " + "Duplicate words can overwrite earlier ones by adding the " + "#fairseq:overwrite flag at the end of the corresponding row " + "in the dictionary file. 
If using the Camembert model, please " + "download an updated copy of the model file.".format(word) + ) + self.add_symbol(word, n=count, overwrite=overwrite) + except ValueError: + raise ValueError( + "Incorrect dictionary format, expected '<token> <cnt> [flags]'" + ) + + def _save(self, f, kv_iterator): + if isinstance(f, str): + PathManager.mkdirs(os.path.dirname(f)) + with PathManager.open(f, "w", encoding="utf-8") as fd: + return self.save(fd) + for k, v in kv_iterator: + print("{} {}".format(k, v), file=f) + + def _get_meta(self): + return [], [] + + def _load_meta(self, lines): + return 0 + + def save(self, f): + """Stores dictionary into a text file""" + ex_keys, ex_vals = self._get_meta() + self._save( + f, + zip( + ex_keys + self.symbols[self.nspecial :], + ex_vals + self.count[self.nspecial :], + ), + ) + + def dummy_sentence(self, length): + t = torch.Tensor(length).uniform_(self.nspecial + 1, len(self)).long() + t[-1] = self.eos() + return t + + def encode_line( + self, + line, + line_tokenizer=tokenize_line, + add_if_not_exist=True, + consumer=None, + append_eos=True, + reverse_order=False, + ) -> torch.IntTensor: + words = line_tokenizer(line) + if reverse_order: + words = list(reversed(words)) + nwords = len(words) + ids = torch.IntTensor(nwords + 1 if append_eos else nwords) + + for i, word in enumerate(words): + if add_if_not_exist: + idx = self.add_symbol(word) + else: + idx = self.index(word) + if consumer is not None: + consumer(word, idx) + ids[i] = idx + if append_eos: + ids[nwords] = self.eos_index + return ids + + @staticmethod + def _add_file_to_dictionary_single_worker( + filename, tokenize, eos_word, worker_id=0, num_workers=1 + ): + counter = Counter() + with open(PathManager.get_local_path(filename), "r", encoding="utf-8") as f: + size = os.fstat(f.fileno()).st_size + chunk_size = size // num_workers + offset = worker_id * chunk_size + end = offset + chunk_size + f.seek(offset) + if offset > 0: + safe_readline(f) # drop first incomplete line + line = f.readline() + while line: + for word in tokenize(line): + counter.update([word]) + counter.update([eos_word]) + # f.tell() returns only an opaque number which can + # return to the position in the file via f.seek() + # and does not necessarily represent a byte position + # in the file. However, f.tell() is faithful to the + # byte position _most of the time_. Thus we can just + # check against the file size to prevent early exit. 
+ if f.tell() > end and f.tell() < size: + break + line = f.readline() + return counter + + @staticmethod + def add_file_to_dictionary(filename, dict, tokenize, num_workers): + def merge_result(counter): + for w, c in sorted(counter.items()): + dict.add_symbol(w, c) + + if num_workers > 1: + pool = Pool(processes=num_workers) + results = [] + for worker_id in range(num_workers): + results.append( + pool.apply_async( + Dictionary._add_file_to_dictionary_single_worker, + (filename, tokenize, dict.eos_word, worker_id, num_workers), + ) + ) + pool.close() + pool.join() + for r in results: + merge_result(r.get()) + else: + merge_result( + Dictionary._add_file_to_dictionary_single_worker( + filename, tokenize, dict.eos_word + ) + ) + + +class TruncatedDictionary(object): + def __init__(self, wrapped_dict, length): + self.__class__ = type( + wrapped_dict.__class__.__name__, + (self.__class__, wrapped_dict.__class__), + {}, + ) + self.__dict__ = wrapped_dict.__dict__ + self.wrapped_dict = wrapped_dict + self.length = min(len(self.wrapped_dict), length) + + def __len__(self): + return self.length + + def __getitem__(self, i): + if i < self.length: + return self.wrapped_dict[i] + return self.wrapped_dict.unk() diff --git a/SpeechT5/fairseq/fairseq/data/encoders/__init__.py b/SpeechT5/fairseq/fairseq/data/encoders/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..7cbe00a10520331709441e5e77991bd2edca8c06 --- /dev/null +++ b/SpeechT5/fairseq/fairseq/data/encoders/__init__.py @@ -0,0 +1,29 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + + +import importlib +import os + +from fairseq import registry + + +build_tokenizer, register_tokenizer, TOKENIZER_REGISTRY, _ = registry.setup_registry( + "--tokenizer", + default=None, +) + + +build_bpe, register_bpe, BPE_REGISTRY, _ = registry.setup_registry( + "--bpe", + default=None, +) + + +# automatically import any Python files in the encoders/ directory +for file in sorted(os.listdir(os.path.dirname(__file__))): + if file.endswith(".py") and not file.startswith("_"): + module = file[: file.find(".py")] + importlib.import_module("fairseq.data.encoders." + module) diff --git a/SpeechT5/fairseq/fairseq/data/encoders/byte_bpe.py b/SpeechT5/fairseq/fairseq/data/encoders/byte_bpe.py new file mode 100644 index 0000000000000000000000000000000000000000..31e3a0627827f19ca7f0b58da45e46d40a80c3bf --- /dev/null +++ b/SpeechT5/fairseq/fairseq/data/encoders/byte_bpe.py @@ -0,0 +1,48 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. 
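The `Dictionary` class above is the symbol table shared by these datasets; a quick round-trip sketch, assuming `Dictionary` is re-exported from `fairseq.data` as in upstream fairseq:

```python
from fairseq.data import Dictionary

d = Dictionary()                          # starts with <s>, <pad>, </s>, <unk>
for word in "speech text speech".split():
    d.add_symbol(word)
d.finalize(padding_factor=1)              # sort by count; skip size padding for this toy case

ids = d.encode_line("speech text", add_if_not_exist=False, append_eos=True)
print(ids)                                # IntTensor of symbol indices ending with eos()
print(d.string(ids))                      # "speech text"  (bos/eos are stripped by string())
```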
+ + +from dataclasses import dataclass, field + +from fairseq import file_utils +from fairseq.data.encoders import register_bpe +from fairseq.data.encoders.byte_utils import ( + SPACE, + SPACE_ESCAPE, + byte_encode, + smart_byte_decode, +) +from fairseq.dataclass import FairseqDataclass + + +@dataclass +class ByteBpeConfig(FairseqDataclass): + sentencepiece_model_path: str = field( + default="???", metadata={"help": "path to sentencepiece model"} + ) + + +@register_bpe("byte_bpe", dataclass=ByteBpeConfig) +class ByteBPE(object): + def __init__(self, cfg): + vocab = file_utils.cached_path(cfg.sentencepiece_model_path) + try: + import sentencepiece as spm + + self.sp = spm.SentencePieceProcessor() + self.sp.Load(vocab) + except ImportError: + raise ImportError( + "Please install sentencepiece with: pip install sentencepiece" + ) + + def encode(self, x: str) -> str: + byte_encoded = byte_encode(x) + return SPACE.join(self.sp.EncodeAsPieces(byte_encoded)) + + @staticmethod + def decode(x: str) -> str: + unescaped = x.replace(SPACE, "").replace(SPACE_ESCAPE, SPACE) + return smart_byte_decode(unescaped) diff --git a/SpeechT5/fairseq/fairseq/data/encoders/byte_utils.py b/SpeechT5/fairseq/fairseq/data/encoders/byte_utils.py new file mode 100644 index 0000000000000000000000000000000000000000..a305c080926c2d094b7e8ae48f5331da82025a75 --- /dev/null +++ b/SpeechT5/fairseq/fairseq/data/encoders/byte_utils.py @@ -0,0 +1,51 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +import re + + +WHITESPACE_NORMALIZER = re.compile(r"\s+") +SPACE = chr(32) +SPACE_ESCAPE = chr(9601) +# excluding non-breaking space (160) here +PRINTABLE_LATIN = set( + list(range(32, 126 + 1)) + list(range(161, 172 + 1)) + list(range(174, 255 + 1)) +) +BYTE_TO_BCHAR = { + b: chr(b) if b in PRINTABLE_LATIN else chr(256 + b) for b in range(256) +} +BCHAR_TO_BYTE = {bc: b for b, bc in BYTE_TO_BCHAR.items()} + + +def byte_encode(x: str) -> str: + normalized = WHITESPACE_NORMALIZER.sub(SPACE, x) + return "".join([BYTE_TO_BCHAR[b] for b in normalized.encode("utf-8")]) + + +def byte_decode(x: str) -> str: + try: + return bytes([BCHAR_TO_BYTE[bc] for bc in x]).decode("utf-8") + except ValueError: + return "" + + +def smart_byte_decode(x: str) -> str: + output = byte_decode(x) + if output == "": + # DP the best recovery (max valid chars) if it's broken + n_bytes = len(x) + f = [0 for _ in range(n_bytes + 1)] + pt = [0 for _ in range(n_bytes + 1)] + for i in range(1, n_bytes + 1): + f[i], pt[i] = f[i - 1], i - 1 + for j in range(1, min(4, i) + 1): + if f[i - j] + 1 > f[i] and len(byte_decode(x[i - j : i])) > 0: + f[i], pt[i] = f[i - j] + 1, i - j + cur_pt = n_bytes + while cur_pt > 0: + if f[cur_pt] == f[pt[cur_pt]] + 1: + output = byte_decode(x[pt[cur_pt] : cur_pt]) + output + cur_pt = pt[cur_pt] + return output diff --git a/SpeechT5/fairseq/fairseq/data/encoders/bytes.py b/SpeechT5/fairseq/fairseq/data/encoders/bytes.py new file mode 100644 index 0000000000000000000000000000000000000000..f88f8f6929f5b6bdb0db470be9ebedf8fe1f752d --- /dev/null +++ b/SpeechT5/fairseq/fairseq/data/encoders/bytes.py @@ -0,0 +1,34 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. 
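+
+# The "bytes" scheme below tokenizes into individual byte characters (no BPE
+# merges at all); it relies on the byte_utils helpers defined just above.
+# A minimal round trip, following encode()/decode() as written:
+#     Bytes.encode("hi")  -> "h i"   (spaces in the input are escaped to U+2581)
+#     Bytes.decode("h i") -> "hi"    (smart_byte_decode repairs invalid UTF-8 tails)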
+ + +from fairseq.data.encoders import register_bpe +from fairseq.data.encoders.byte_utils import ( + SPACE, + SPACE_ESCAPE, + byte_encode, + smart_byte_decode, +) + + +@register_bpe("bytes") +class Bytes(object): + def __init__(self, *unused): + pass + + @staticmethod + def add_args(parser): + pass + + @staticmethod + def encode(x: str) -> str: + encoded = byte_encode(x) + escaped = encoded.replace(SPACE, SPACE_ESCAPE) + return SPACE.join(list(escaped)) + + @staticmethod + def decode(x: str) -> str: + unescaped = x.replace(SPACE, "").replace(SPACE_ESCAPE, SPACE) + return smart_byte_decode(unescaped) diff --git a/SpeechT5/fairseq/fairseq/data/encoders/characters.py b/SpeechT5/fairseq/fairseq/data/encoders/characters.py new file mode 100644 index 0000000000000000000000000000000000000000..494ea219392716dc75d2c1e19d71cd55b9b2f4ba --- /dev/null +++ b/SpeechT5/fairseq/fairseq/data/encoders/characters.py @@ -0,0 +1,30 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + + +from fairseq.data.encoders import register_bpe + + +SPACE = chr(32) +SPACE_ESCAPE = chr(9601) + + +@register_bpe("characters") +class Characters(object): + def __init__(self, *unused): + pass + + @staticmethod + def add_args(parser): + pass + + @staticmethod + def encode(x: str) -> str: + escaped = x.replace(SPACE, SPACE_ESCAPE) + return SPACE.join(list(escaped)) + + @staticmethod + def decode(x: str) -> str: + return x.replace(SPACE, "").replace(SPACE_ESCAPE, SPACE) diff --git a/SpeechT5/fairseq/fairseq/data/encoders/fastbpe.py b/SpeechT5/fairseq/fairseq/data/encoders/fastbpe.py new file mode 100644 index 0000000000000000000000000000000000000000..f7c21039549ea002e73d1ad7cde5735f215f11ee --- /dev/null +++ b/SpeechT5/fairseq/fairseq/data/encoders/fastbpe.py @@ -0,0 +1,36 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +from dataclasses import dataclass, field + +from fairseq import file_utils +from fairseq.data.encoders import register_bpe +from fairseq.dataclass import FairseqDataclass + + +@dataclass +class fastBPEConfig(FairseqDataclass): + bpe_codes: str = field(default="???", metadata={"help": "path to fastBPE BPE"}) + + +@register_bpe("fastbpe", dataclass=fastBPEConfig) +class fastBPE(object): + def __init__(self, cfg): + if cfg.bpe_codes is None: + raise ValueError("--bpe-codes is required for --bpe=fastbpe") + codes = file_utils.cached_path(cfg.bpe_codes) + try: + import fastBPE + + self.bpe = fastBPE.fastBPE(codes) + self.bpe_symbol = "@@ " + except ImportError: + raise ImportError("Please install fastBPE with: pip install fastBPE") + + def encode(self, x: str) -> str: + return self.bpe.apply([x])[0] + + def decode(self, x: str) -> str: + return (x + " ").replace(self.bpe_symbol, "").rstrip() diff --git a/SpeechT5/fairseq/fairseq/data/encoders/gpt2_bpe.py b/SpeechT5/fairseq/fairseq/data/encoders/gpt2_bpe.py new file mode 100644 index 0000000000000000000000000000000000000000..e661426a73c7e735f7054bcb04281bf1649bb46c --- /dev/null +++ b/SpeechT5/fairseq/fairseq/data/encoders/gpt2_bpe.py @@ -0,0 +1,45 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. 
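+
+# GPT2BPE wraps the original OpenAI GPT-2 byte-level BPE (see gpt2_bpe_utils.py
+# below). encode() returns the BPE token ids as a space-separated string, and
+# decode() maps them back, keeping the literal "<unk>"/"<mask>" strings as-is.
+# is_beginning_of_word() decodes a single token and checks for a leading space,
+# since GPT-2 merges the preceding space into word-initial tokens.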
+ +from dataclasses import dataclass, field + +from fairseq import file_utils +from fairseq.data.encoders import register_bpe +from fairseq.dataclass import FairseqDataclass + +from .gpt2_bpe_utils import get_encoder + + +DEFAULT_ENCODER_JSON = "https://dl.fbaipublicfiles.com/fairseq/gpt2_bpe/encoder.json" +DEFAULT_VOCAB_BPE = "https://dl.fbaipublicfiles.com/fairseq/gpt2_bpe/vocab.bpe" + + +@dataclass +class GPT2BPEConfig(FairseqDataclass): + gpt2_encoder_json: str = field( + default=DEFAULT_ENCODER_JSON, metadata={"help": "path to encoder.json"} + ) + gpt2_vocab_bpe: str = field( + default=DEFAULT_VOCAB_BPE, metadata={"help": "path to vocab.bpe"} + ) + + +@register_bpe("gpt2", dataclass=GPT2BPEConfig) +class GPT2BPE(object): + def __init__(self, cfg): + encoder_json = file_utils.cached_path(cfg.gpt2_encoder_json) + vocab_bpe = file_utils.cached_path(cfg.gpt2_vocab_bpe) + self.bpe = get_encoder(encoder_json, vocab_bpe) + + def encode(self, x: str) -> str: + return " ".join(map(str, self.bpe.encode(x))) + + def decode(self, x: str) -> str: + return self.bpe.decode( + [int(tok) if tok not in {"<unk>", "<mask>"} else tok for tok in x.split()] + ) + + def is_beginning_of_word(self, x: str) -> bool: + return self.decode(x).startswith(" ") diff --git a/SpeechT5/fairseq/fairseq/data/encoders/gpt2_bpe_utils.py b/SpeechT5/fairseq/fairseq/data/encoders/gpt2_bpe_utils.py new file mode 100644 index 0000000000000000000000000000000000000000..688d4e36e358df2dcc432d37d3e57bd81e2f1ed1 --- /dev/null +++ b/SpeechT5/fairseq/fairseq/data/encoders/gpt2_bpe_utils.py @@ -0,0 +1,140 @@ +""" +Byte pair encoding utilities from GPT-2. + +Original source: https://github.com/openai/gpt-2/blob/master/src/encoder.py +Original license: MIT +""" + +import json +from functools import lru_cache + + +@lru_cache() +def bytes_to_unicode(): + """ + Returns list of utf-8 byte and a corresponding list of unicode strings. + The reversible bpe codes work on unicode strings. + This means you need a large # of unicode characters in your vocab if you want to avoid UNKs. + When you're at something like a 10B token dataset you end up needing around 5K for decent coverage. + This is a signficant percentage of your normal, say, 32K bpe vocab. + To avoid that, we want lookup tables between utf-8 bytes and unicode strings. + And avoids mapping to whitespace/control characters the bpe code barfs on. + """ + bs = ( + list(range(ord("!"), ord("~") + 1)) + + list(range(ord("¡"), ord("¬") + 1)) + + list(range(ord("®"), ord("ÿ") + 1)) + ) + cs = bs[:] + n = 0 + for b in range(2 ** 8): + if b not in bs: + bs.append(b) + cs.append(2 ** 8 + n) + n += 1 + cs = [chr(n) for n in cs] + return dict(zip(bs, cs)) + + +def get_pairs(word): + """Return set of symbol pairs in a word. + Word is represented as tuple of symbols (symbols being variable-length strings). 
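+
+    For example, the word ("l", "o", "w") yields its adjacent symbol pairs::
+
+        >>> sorted(get_pairs(("l", "o", "w")))
+        [('l', 'o'), ('o', 'w')]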
+ """ + pairs = set() + prev_char = word[0] + for char in word[1:]: + pairs.add((prev_char, char)) + prev_char = char + return pairs + + +class Encoder: + def __init__(self, encoder, bpe_merges, errors="replace"): + self.encoder = encoder + self.decoder = {v: k for k, v in self.encoder.items()} + self.errors = errors # how to handle errors in decoding + self.byte_encoder = bytes_to_unicode() + self.byte_decoder = {v: k for k, v in self.byte_encoder.items()} + self.bpe_ranks = dict(zip(bpe_merges, range(len(bpe_merges)))) + self.cache = {} + + try: + import regex as re + + self.re = re + except ImportError: + raise ImportError("Please install regex with: pip install regex") + + # Should haved added re.IGNORECASE so BPE merges can happen for capitalized versions of contractions + self.pat = self.re.compile( + r"""'s|'t|'re|'ve|'m|'ll|'d| ?\p{L}+| ?\p{N}+| ?[^\s\p{L}\p{N}]+|\s+(?!\S)|\s+""" + ) + + def bpe(self, token): + if token in self.cache: + return self.cache[token] + word = tuple(token) + pairs = get_pairs(word) + + if not pairs: + return token + + while True: + bigram = min(pairs, key=lambda pair: self.bpe_ranks.get(pair, float("inf"))) + if bigram not in self.bpe_ranks: + break + first, second = bigram + new_word = [] + i = 0 + while i < len(word): + try: + j = word.index(first, i) + new_word.extend(word[i:j]) + i = j + except: + new_word.extend(word[i:]) + break + + if word[i] == first and i < len(word) - 1 and word[i + 1] == second: + new_word.append(first + second) + i += 2 + else: + new_word.append(word[i]) + i += 1 + new_word = tuple(new_word) + word = new_word + if len(word) == 1: + break + else: + pairs = get_pairs(word) + word = " ".join(word) + self.cache[token] = word + return word + + def encode(self, text): + bpe_tokens = [] + for token in self.re.findall(self.pat, text): + token = "".join(self.byte_encoder[b] for b in token.encode("utf-8")) + bpe_tokens.extend( + self.encoder[bpe_token] for bpe_token in self.bpe(token).split(" ") + ) + return bpe_tokens + + def decode(self, tokens): + text = "".join([self.decoder.get(token, token) for token in tokens]) + text = bytearray([self.byte_decoder[c] for c in text]).decode( + "utf-8", errors=self.errors + ) + return text + + +def get_encoder(encoder_json_path, vocab_bpe_path): + with open(encoder_json_path, "r") as f: + encoder = json.load(f) + with open(vocab_bpe_path, "r", encoding="utf-8") as f: + bpe_data = f.read() + bpe_merges = [tuple(merge_str.split()) for merge_str in bpe_data.split("\n")[1:-1]] + return Encoder( + encoder=encoder, + bpe_merges=bpe_merges, + ) diff --git a/SpeechT5/fairseq/fairseq/data/encoders/hf_bert_bpe.py b/SpeechT5/fairseq/fairseq/data/encoders/hf_bert_bpe.py new file mode 100644 index 0000000000000000000000000000000000000000..a41c059343ec7e2914b2c9d2f53f526c33f9659d --- /dev/null +++ b/SpeechT5/fairseq/fairseq/data/encoders/hf_bert_bpe.py @@ -0,0 +1,50 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. 
+ +from dataclasses import dataclass, field +from typing import Optional + +from fairseq.data.encoders import register_bpe +from fairseq.dataclass import FairseqDataclass + + +@dataclass +class BertBPEConfig(FairseqDataclass): + bpe_cased: bool = field(default=False, metadata={"help": "set for cased BPE"}) + bpe_vocab_file: Optional[str] = field( + default=None, metadata={"help": "bpe vocab file"} + ) + + +@register_bpe("bert", dataclass=BertBPEConfig) +class BertBPE(object): + def __init__(self, cfg): + try: + from transformers import BertTokenizer + except ImportError: + raise ImportError( + "Please install transformers with: pip install transformers" + ) + + if cfg.bpe_vocab_file: + self.bert_tokenizer = BertTokenizer( + cfg.bpe_vocab_file, do_lower_case=not cfg.bpe_cased + ) + else: + vocab_file_name = ( + "bert-base-cased" if cfg.bpe_cased else "bert-base-uncased" + ) + self.bert_tokenizer = BertTokenizer.from_pretrained(vocab_file_name) + + def encode(self, x: str) -> str: + return " ".join(self.bert_tokenizer.tokenize(x)) + + def decode(self, x: str) -> str: + return self.bert_tokenizer.clean_up_tokenization( + self.bert_tokenizer.convert_tokens_to_string(x.split(" ")) + ) + + def is_beginning_of_word(self, x: str) -> bool: + return not x.startswith("##") diff --git a/SpeechT5/fairseq/fairseq/data/encoders/hf_byte_bpe.py b/SpeechT5/fairseq/fairseq/data/encoders/hf_byte_bpe.py new file mode 100644 index 0000000000000000000000000000000000000000..c508578d41bf6b7ce0a847e0797d71b19beb393d --- /dev/null +++ b/SpeechT5/fairseq/fairseq/data/encoders/hf_byte_bpe.py @@ -0,0 +1,50 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +from dataclasses import dataclass, field + +from fairseq.data.encoders import register_bpe +from fairseq.dataclass import FairseqDataclass +from fairseq import file_utils + + +@dataclass +class HuggingFaceByteLevelBPEConfig(FairseqDataclass): + bpe_merges: str = field(default="???", metadata={"help": "path to merges.txt"}) + bpe_vocab: str = field(default="???", metadata={"help": "path to vocab.json"}) + bpe_add_prefix_space: bool = field( + default=False, metadata={"help": "add prefix space before encoding"} + ) + + +@register_bpe("hf_byte_bpe", dataclass=HuggingFaceByteLevelBPEConfig) +class HuggingFaceByteLevelBPE(object): + def __init__(self, cfg): + try: + from tokenizers import ByteLevelBPETokenizer + except ImportError: + raise ImportError( + "Please install huggingface/tokenizers with: " "pip install tokenizers" + ) + + bpe_vocab = file_utils.cached_path(cfg.bpe_vocab) + bpe_merges = file_utils.cached_path(cfg.bpe_merges) + + self.bpe = ByteLevelBPETokenizer( + bpe_vocab, + bpe_merges, + add_prefix_space=cfg.bpe_add_prefix_space, + ) + + def encode(self, x: str) -> str: + return " ".join(map(str, self.bpe.encode(x).ids)) + + def decode(self, x: str) -> str: + return self.bpe.decode( + [int(tok) if tok not in {"<unk>", "<mask>"} else tok for tok in x.split()] + ) + + def is_beginning_of_word(self, x: str) -> bool: + return self.decode(x).startswith(" ") diff --git a/SpeechT5/fairseq/fairseq/data/encoders/moses_tokenizer.py b/SpeechT5/fairseq/fairseq/data/encoders/moses_tokenizer.py new file mode 100644 index 0000000000000000000000000000000000000000..e236dad167a037a8ed95f7fc8292b27b10d580b0 --- /dev/null +++ b/SpeechT5/fairseq/fairseq/data/encoders/moses_tokenizer.py @@ -0,0 +1,49 @@ +# Copyright (c) Facebook, Inc. 
and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +from dataclasses import dataclass, field + +from fairseq.data.encoders import register_tokenizer +from fairseq.dataclass import FairseqDataclass + + +@dataclass +class MosesTokenizerConfig(FairseqDataclass): + source_lang: str = field(default="en", metadata={"help": "source language"}) + target_lang: str = field(default="en", metadata={"help": "target language"}) + moses_no_dash_splits: bool = field( + default=False, metadata={"help": "don't apply dash split rules"} + ) + moses_no_escape: bool = field( + default=False, + metadata={"help": "don't perform HTML escaping on apostrophe, quotes, etc."}, + ) + + +@register_tokenizer("moses", dataclass=MosesTokenizerConfig) +class MosesTokenizer(object): + def __init__(self, cfg: MosesTokenizerConfig): + self.cfg = cfg + + try: + from sacremoses import MosesTokenizer, MosesDetokenizer + + self.tok = MosesTokenizer(cfg.source_lang) + self.detok = MosesDetokenizer(cfg.target_lang) + except ImportError: + raise ImportError( + "Please install Moses tokenizer with: pip install sacremoses" + ) + + def encode(self, x: str) -> str: + return self.tok.tokenize( + x, + aggressive_dash_splits=(not self.cfg.moses_no_dash_splits), + return_str=True, + escape=(not self.cfg.moses_no_escape), + ) + + def decode(self, x: str) -> str: + return self.detok.detokenize(x.split()) diff --git a/SpeechT5/fairseq/fairseq/data/encoders/nltk_tokenizer.py b/SpeechT5/fairseq/fairseq/data/encoders/nltk_tokenizer.py new file mode 100644 index 0000000000000000000000000000000000000000..0ab92377b3a23bb48384c3f7acf299612e8b0775 --- /dev/null +++ b/SpeechT5/fairseq/fairseq/data/encoders/nltk_tokenizer.py @@ -0,0 +1,24 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +from fairseq.data.encoders import register_tokenizer +from fairseq.dataclass import FairseqDataclass + + +@register_tokenizer("nltk", dataclass=FairseqDataclass) +class NLTKTokenizer(object): + def __init__(self, *unused): + try: + from nltk.tokenize import word_tokenize + + self.word_tokenize = word_tokenize + except ImportError: + raise ImportError("Please install nltk with: pip install nltk") + + def encode(self, x: str) -> str: + return " ".join(self.word_tokenize(x)) + + def decode(self, x: str) -> str: + return x diff --git a/SpeechT5/fairseq/fairseq/data/encoders/sentencepiece_bpe.py b/SpeechT5/fairseq/fairseq/data/encoders/sentencepiece_bpe.py new file mode 100644 index 0000000000000000000000000000000000000000..a76d46a2014e81eff72b19f6c13084a855fcd477 --- /dev/null +++ b/SpeechT5/fairseq/fairseq/data/encoders/sentencepiece_bpe.py @@ -0,0 +1,48 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. 
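+
+# SentencepieceBPE joins the raw SentencePiece pieces with spaces on encode().
+# SentencePiece marks word boundaries with U+2581 ("\u2581"), so decode() simply
+# deletes the literal spaces between pieces and turns each U+2581 back into a
+# space. is_beginning_of_word() uses the same marker, with a special case for
+# tokens like "<unk>"/"<s>" that may also appear inside the sentencepiece vocab.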
+ +from dataclasses import dataclass, field + +from fairseq import file_utils +from fairseq.data.encoders import register_bpe +from fairseq.dataclass import FairseqDataclass + + +@dataclass +class SentencepieceConfig(FairseqDataclass): + sentencepiece_model: str = field( + default="???", metadata={"help": "path to sentencepiece model"} + ) + + +@register_bpe("sentencepiece", dataclass=SentencepieceConfig) +class SentencepieceBPE(object): + def __init__(self, cfg): + sentencepiece_model = file_utils.cached_path(cfg.sentencepiece_model) + try: + import sentencepiece as spm + + self.sp = spm.SentencePieceProcessor() + self.sp.Load(sentencepiece_model) + except ImportError: + raise ImportError( + "Please install sentencepiece with: pip install sentencepiece" + ) + + def encode(self, x: str) -> str: + return " ".join(self.sp.EncodeAsPieces(x)) + + def decode(self, x: str) -> str: + return x.replace(" ", "").replace("\u2581", " ").strip() + + def is_beginning_of_word(self, x: str) -> bool: + if x in ["<unk>", "<s>", "</s>", "<pad>"]: + # special elements are always considered beginnings + # HACK: this logic is already present in fairseq/tasks/masked_lm.py + # but these special tokens are also contained in the sentencepiece + # vocabulary which causes duplicate special tokens. This hack makes + # sure that they are all taken into account. + return True + return x.startswith("\u2581") diff --git a/SpeechT5/fairseq/fairseq/data/encoders/space_tokenizer.py b/SpeechT5/fairseq/fairseq/data/encoders/space_tokenizer.py new file mode 100644 index 0000000000000000000000000000000000000000..925ad41b7c1aee6738c63938c36bd3ee16dca812 --- /dev/null +++ b/SpeechT5/fairseq/fairseq/data/encoders/space_tokenizer.py @@ -0,0 +1,21 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +import re + +from fairseq.data.encoders import register_tokenizer +from fairseq.dataclass import FairseqDataclass + + +@register_tokenizer("space", dataclass=FairseqDataclass) +class SpaceTokenizer(object): + def __init__(self, *unused): + self.space_tok = re.compile(r"\s+") + + def encode(self, x: str) -> str: + return self.space_tok.sub(" ", x) + + def decode(self, x: str) -> str: + return x diff --git a/SpeechT5/fairseq/fairseq/data/encoders/subword_nmt_bpe.py b/SpeechT5/fairseq/fairseq/data/encoders/subword_nmt_bpe.py new file mode 100644 index 0000000000000000000000000000000000000000..5d724d2730a5895ca55af2998c2ced471625b516 --- /dev/null +++ b/SpeechT5/fairseq/fairseq/data/encoders/subword_nmt_bpe.py @@ -0,0 +1,54 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. 
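+
+# SubwordNMTBPE drives the reference subword-nmt implementation. Word-internal
+# splits are marked with the configurable "@@" separator, so decoding is just a
+# string replace, e.g. "un@@ fortun@@ ately" -> "unfortunately".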
+ +from dataclasses import dataclass, field + +from fairseq import file_utils +from fairseq.data.encoders import register_bpe +from fairseq.dataclass import FairseqDataclass + + +@dataclass +class SubwordNMTBPEConfig(FairseqDataclass): + bpe_codes: str = field(default="???", metadata={"help": "path to subword NMT BPE"}) + bpe_separator: str = field(default="@@", metadata={"help": "BPE separator"}) + + +@register_bpe("subword_nmt", dataclass=SubwordNMTBPEConfig) +class SubwordNMTBPE(object): + def __init__(self, cfg): + if cfg.bpe_codes is None: + raise ValueError("--bpe-codes is required for --bpe=subword_nmt") + codes = file_utils.cached_path(cfg.bpe_codes) + try: + from subword_nmt import apply_bpe + + bpe_parser = apply_bpe.create_parser() + bpe_args = bpe_parser.parse_args( + [ + "--codes", + codes, + "--separator", + cfg.bpe_separator, + ] + ) + self.bpe = apply_bpe.BPE( + bpe_args.codes, + bpe_args.merges, + bpe_args.separator, + None, + bpe_args.glossaries, + ) + self.bpe_symbol = bpe_args.separator + " " + except ImportError: + raise ImportError( + "Please install subword_nmt with: pip install subword-nmt" + ) + + def encode(self, x: str) -> str: + return self.bpe.process_line(x) + + def decode(self, x: str) -> str: + return (x + " ").replace(self.bpe_symbol, "").rstrip() diff --git a/SpeechT5/fairseq/fairseq/data/encoders/utils.py b/SpeechT5/fairseq/fairseq/data/encoders/utils.py new file mode 100644 index 0000000000000000000000000000000000000000..d93eb532ef84f0e2bc708b777229ab2cb76ca14b --- /dev/null +++ b/SpeechT5/fairseq/fairseq/data/encoders/utils.py @@ -0,0 +1,30 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +import torch +from fairseq.data import encoders + + +def get_whole_word_mask(args, dictionary): + bpe = encoders.build_bpe(args) + if bpe is not None: + + def is_beginning_of_word(i): + if i < dictionary.nspecial: + # special elements are always considered beginnings + return True + tok = dictionary[i] + if tok.startswith("madeupword"): + return True + try: + return bpe.is_beginning_of_word(tok) + except ValueError: + return True + + mask_whole_words = torch.ByteTensor( + list(map(is_beginning_of_word, range(len(dictionary)))) + ) + return mask_whole_words + return None diff --git a/SpeechT5/fairseq/fairseq/data/fairseq_dataset.py b/SpeechT5/fairseq/fairseq/data/fairseq_dataset.py new file mode 100644 index 0000000000000000000000000000000000000000..23e6992dbaf34e52f2fdcd0c8fc418c93744ea4e --- /dev/null +++ b/SpeechT5/fairseq/fairseq/data/fairseq_dataset.py @@ -0,0 +1,205 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +import logging +import numpy as np +import torch.utils.data +from fairseq.data import data_utils + +logger = logging.getLogger(__name__) + + +class EpochListening: + """Mixin for receiving updates whenever the epoch increments.""" + + @property + def can_reuse_epoch_itr_across_epochs(self): + """ + Whether we can reuse the :class:`fairseq.data.EpochBatchIterator` for + this dataset across epochs. + + This needs to return ``False`` if the sample sizes can change across + epochs, in which case we may need to regenerate batches at each epoch. + If your dataset relies in ``set_epoch`` then you should consider setting + this to ``False``. 
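+
+        For example, a dataset that re-samples or re-orders its examples inside
+        ``set_epoch`` would typically override this property to return ``False``::
+
+            @property
+            def can_reuse_epoch_itr_across_epochs(self):
+                return False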
+ """ + return True + + def set_epoch(self, epoch): + """Will receive the updated epoch number at the beginning of the epoch.""" + pass + + +class FairseqDataset(torch.utils.data.Dataset, EpochListening): + """A dataset that provides helpers for batching.""" + + def __getitem__(self, index): + raise NotImplementedError + + def __len__(self): + raise NotImplementedError + + def collater(self, samples): + """Merge a list of samples to form a mini-batch. + + Args: + samples (List[dict]): samples to collate + + Returns: + dict: a mini-batch suitable for forwarding with a Model + """ + raise NotImplementedError + + def num_tokens(self, index): + """Return the number of tokens in a sample. This value is used to + enforce ``--max-tokens`` during batching.""" + raise NotImplementedError + + def num_tokens_vec(self, indices): + """Return the number of tokens for a set of positions defined by indices. + This value is used to enforce ``--max-tokens`` during batching.""" + raise NotImplementedError + + def size(self, index): + """Return an example's size as a float or tuple. This value is used when + filtering a dataset with ``--max-positions``.""" + raise NotImplementedError + + def ordered_indices(self): + """Return an ordered list of indices. Batches will be constructed based + on this order.""" + return np.arange(len(self), dtype=np.int64) + + @property + def supports_prefetch(self): + """Whether this dataset supports prefetching.""" + return False + + def attr(self, attr: str, index: int): + return getattr(self, attr, None) + + def prefetch(self, indices): + """Prefetch the data required for this epoch.""" + raise NotImplementedError + + def get_batch_shapes(self): + """ + Return a list of valid batch shapes, for example:: + + [(8, 512), (16, 256), (32, 128)] + + The first dimension of each tuple is the batch size and can be ``None`` + to automatically infer the max batch size based on ``--max-tokens``. + The second dimension of each tuple is the max supported length as given + by :func:`fairseq.data.FairseqDataset.num_tokens`. + + This will be used by :func:`fairseq.data.FairseqDataset.batch_by_size` + to restrict batch shapes. This is useful on TPUs to avoid too many + dynamic shapes (and recompilations). + """ + return None + + def batch_by_size( + self, + indices, + max_tokens=None, + max_sentences=None, + required_batch_size_multiple=1, + ): + """ + Given an ordered set of indices, return batches according to + *max_tokens*, *max_sentences* and *required_batch_size_multiple*. 
+ """ + from fairseq.data import data_utils + + fixed_shapes = self.get_batch_shapes() + if fixed_shapes is not None: + + def adjust_bsz(bsz, num_tokens): + if bsz is None: + assert max_tokens is not None, "Must specify --max-tokens" + bsz = max_tokens // num_tokens + if max_sentences is not None: + bsz = min(bsz, max_sentences) + elif ( + bsz >= required_batch_size_multiple + and bsz % required_batch_size_multiple != 0 + ): + bsz -= bsz % required_batch_size_multiple + return bsz + + fixed_shapes = np.array( + [ + [adjust_bsz(bsz, num_tokens), num_tokens] + for (bsz, num_tokens) in fixed_shapes + ] + ) + + try: + num_tokens_vec = self.num_tokens_vec(indices).astype('int64') + except NotImplementedError: + num_tokens_vec = None + + return data_utils.batch_by_size( + indices, + num_tokens_fn=self.num_tokens, + num_tokens_vec=num_tokens_vec, + max_tokens=max_tokens, + max_sentences=max_sentences, + required_batch_size_multiple=required_batch_size_multiple, + fixed_shapes=fixed_shapes, + ) + + def filter_indices_by_size(self, indices, max_sizes): + """ + Filter a list of sample indices. Remove those that are longer than + specified in *max_sizes*. + + WARNING: don't update, override method in child classes + + Args: + indices (np.array): original array of sample indices + max_sizes (int or list[int] or tuple[int]): max sample size, + can be defined separately for src and tgt (then list or tuple) + + Returns: + np.array: filtered sample array + list: list of removed indices + """ + if isinstance(max_sizes, float) or isinstance(max_sizes, int): + if hasattr(self, "sizes") and isinstance(self.sizes, np.ndarray): + ignored = indices[self.sizes[indices] > max_sizes].tolist() + indices = indices[self.sizes[indices] <= max_sizes] + elif ( + hasattr(self, "sizes") + and isinstance(self.sizes, list) + and len(self.sizes) == 1 + ): + ignored = indices[self.sizes[0][indices] > max_sizes].tolist() + indices = indices[self.sizes[0][indices] <= max_sizes] + else: + indices, ignored = data_utils._filter_by_size_dynamic( + indices, self.size, max_sizes + ) + else: + indices, ignored = data_utils._filter_by_size_dynamic( + indices, self.size, max_sizes + ) + return indices, ignored + + @property + def supports_fetch_outside_dataloader(self): + """Whether this dataset supports fetching outside the workers of the dataloader.""" + return True + + +class FairseqIterableDataset(torch.utils.data.IterableDataset, EpochListening): + """ + For datasets that need to be read sequentially, usually because the data is + being streamed or otherwise can't be manipulated on a single machine. + """ + + def __iter__(self): + raise NotImplementedError diff --git a/SpeechT5/fairseq/fairseq/data/fasta_dataset.py b/SpeechT5/fairseq/fairseq/data/fasta_dataset.py new file mode 100644 index 0000000000000000000000000000000000000000..007011974a997fd7446dd29d7eba097d7513bab0 --- /dev/null +++ b/SpeechT5/fairseq/fairseq/data/fasta_dataset.py @@ -0,0 +1,107 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. 
+ +import os +import subprocess +import threading +from pathlib import Path + +import numpy as np +import torch + + +def fasta_file_path(prefix_path): + return prefix_path + ".fasta" + + +class FastaDataset(torch.utils.data.Dataset): + """ + For loading protein sequence datasets in the common FASTA data format + """ + + def __init__(self, path: str, cache_indices=False): + self.fn = fasta_file_path(path) + self.threadlocal = threading.local() + self.cache = Path(f"{path}.fasta.idx.npy") + if cache_indices: + if self.cache.exists(): + self.offsets, self.sizes = np.load(self.cache) + else: + self.offsets, self.sizes = self._build_index(path) + np.save(self.cache, np.stack([self.offsets, self.sizes])) + else: + self.offsets, self.sizes = self._build_index(path) + + def _get_file(self): + if not hasattr(self.threadlocal, "f"): + self.threadlocal.f = open(self.fn, "r") + return self.threadlocal.f + + def __getitem__(self, idx): + f = self._get_file() + f.seek(self.offsets[idx]) + desc = f.readline().strip() + line = f.readline() + seq = "" + while line != "" and line[0] != ">": + seq += line.strip() + line = f.readline() + return desc, seq + + def __len__(self): + return self.offsets.size + + def _build_index(self, path: str): + # Use grep and awk to get 100M/s on local SSD. + # Should process your enormous 100G fasta in ~10 min single core... + path = fasta_file_path(path) + bytes_offsets = subprocess.check_output( + f"cat {path} | tqdm --bytes --total $(wc -c < {path})" + "| grep --byte-offset '^>' -o | cut -d: -f1", + shell=True, + ) + fasta_lengths = subprocess.check_output( + f"cat {path} | tqdm --bytes --total $(wc -c < {path})" + "| awk '/^>/ {print \"\";next;} { printf(\"%s\",$0);}' | tail -n+2 | awk '{print length($1)}'", + shell=True, + ) + bytes_np = np.fromstring(bytes_offsets, dtype=np.int64, sep=" ") + sizes_np = np.fromstring(fasta_lengths, dtype=np.int64, sep=" ") + return bytes_np, sizes_np + + def __setstate__(self, state): + self.__dict__ = state + self.threadlocal = threading.local() + + def __getstate__(self): + d = {} + for i, v in self.__dict__.items(): + if i != "threadlocal": + d[i] = v + return d + + def __del__(self): + if hasattr(self.threadlocal, "f"): + self.threadlocal.f.close() + del self.threadlocal.f + + @staticmethod + def exists(path): + return os.path.exists(fasta_file_path(path)) + + +class EncodedFastaDataset(FastaDataset): + """ + The FastaDataset returns raw sequences - this allows us to return + indices with a dictionary instead. + """ + + def __init__(self, path, dictionary): + super().__init__(path, cache_indices=True) + self.dictionary = dictionary + + def __getitem__(self, idx): + desc, seq = super().__getitem__(idx) + return self.dictionary.encode_line(seq, line_tokenizer=list).long() diff --git a/SpeechT5/fairseq/fairseq/data/id_dataset.py b/SpeechT5/fairseq/fairseq/data/id_dataset.py new file mode 100644 index 0000000000000000000000000000000000000000..3e4d7969cf2a26e852b466f165a6fadabae3b35f --- /dev/null +++ b/SpeechT5/fairseq/fairseq/data/id_dataset.py @@ -0,0 +1,19 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +import torch + +from . 
import FairseqDataset + + +class IdDataset(FairseqDataset): + def __getitem__(self, index): + return index + + def __len__(self): + return 0 + + def collater(self, samples): + return torch.tensor(samples) diff --git a/SpeechT5/fairseq/fairseq/data/indexed_dataset.py b/SpeechT5/fairseq/fairseq/data/indexed_dataset.py new file mode 100644 index 0000000000000000000000000000000000000000..802e37a7ff849e435d4fa89ad7609c17cedd1980 --- /dev/null +++ b/SpeechT5/fairseq/fairseq/data/indexed_dataset.py @@ -0,0 +1,576 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +import shutil +import struct +from functools import lru_cache + +import numpy as np +import torch +from fairseq.dataclass.constants import DATASET_IMPL_CHOICES +from fairseq.data.fasta_dataset import FastaDataset +from fairseq.file_io import PathManager + +from . import FairseqDataset + +from typing import Union + + +def best_fitting_int_dtype( + max_int_to_represent, +) -> Union[np.uint16, np.uint32, np.int64]: + + if max_int_to_represent is None: + return np.uint32 # Safe guess + elif max_int_to_represent < 65500: + return np.uint16 + elif max_int_to_represent < 4294967295: + return np.uint32 + else: + return np.int64 + # we avoid np.uint64 because it doesn't save space and its type promotion behaves unexpectedly + # https://github.com/numpy/numpy/issues/5745 + + +def get_available_dataset_impl(): + return list(map(str, DATASET_IMPL_CHOICES)) + + +def infer_dataset_impl(path): + if IndexedRawTextDataset.exists(path): + return "raw" + elif IndexedDataset.exists(path): + with open(index_file_path(path), "rb") as f: + magic = f.read(8) + if magic == IndexedDataset._HDR_MAGIC: + return "cached" + elif magic == MMapIndexedDataset.Index._HDR_MAGIC[:8]: + return "mmap" + else: + return None + elif FastaDataset.exists(path): + return "fasta" + else: + return None + + +def make_builder(out_file, impl, vocab_size=None): + if impl == "mmap": + return MMapIndexedDatasetBuilder( + out_file, dtype=best_fitting_int_dtype(vocab_size) + ) + elif impl == "fasta": + raise NotImplementedError + else: + return IndexedDatasetBuilder(out_file) + + +def make_dataset(path, impl, fix_lua_indexing=False, dictionary=None): + if impl == "raw" and IndexedRawTextDataset.exists(path): + assert dictionary is not None + return IndexedRawTextDataset(path, dictionary) + elif impl == "lazy" and IndexedDataset.exists(path): + return IndexedDataset(path, fix_lua_indexing=fix_lua_indexing) + elif impl == "cached" and IndexedDataset.exists(path): + return IndexedCachedDataset(path, fix_lua_indexing=fix_lua_indexing) + elif impl == "mmap" and MMapIndexedDataset.exists(path): + return MMapIndexedDataset(path) + elif impl == "fasta" and FastaDataset.exists(path): + from fairseq.data.fasta_dataset import EncodedFastaDataset + + return EncodedFastaDataset(path, dictionary) + return None + + +def dataset_exists(path, impl): + if impl == "raw": + return IndexedRawTextDataset.exists(path) + elif impl == "mmap": + return MMapIndexedDataset.exists(path) + else: + return IndexedDataset.exists(path) + + +def read_longs(f, n): + a = np.empty(n, dtype=np.int64) + f.readinto(a) + return a + + +def write_longs(f, a): + f.write(np.array(a, dtype=np.int64)) + + +_code_to_dtype = { + 1: np.uint8, + 2: np.int8, + 3: np.int16, + 4: np.int32, + 5: np.int64, + 6: np.float, + 7: np.double, + 8: np.uint16, + 9: np.uint32, + 10: np.uint64, +} + + +def 
_dtype_header_code(dtype) -> int: + for k in _code_to_dtype.keys(): + if _code_to_dtype[k] == dtype: + return k + raise ValueError(dtype) + + +def index_file_path(prefix_path): + return prefix_path + ".idx" + + +def data_file_path(prefix_path): + return prefix_path + ".bin" + + +class IndexedDataset(FairseqDataset): + """Loader for TorchNet IndexedDataset""" + + _HDR_MAGIC = b"TNTIDX\x00\x00" + + def __init__(self, path, fix_lua_indexing=False): + super().__init__() + self.path = path + self.fix_lua_indexing = fix_lua_indexing + self.data_file = None + self.read_index(path) + + def read_index(self, path): + with open(index_file_path(path), "rb") as f: + magic = f.read(8) + assert magic == self._HDR_MAGIC, ( + "Index file doesn't match expected format. " + "Make sure that --dataset-impl is configured properly." + ) + version = f.read(8) + assert struct.unpack("<Q", version) == (1,) + code, self.element_size = struct.unpack("<QQ", f.read(16)) + self.dtype = _code_to_dtype[code] + self._len, self.s = struct.unpack("<QQ", f.read(16)) + self.dim_offsets = read_longs(f, self._len + 1) + self.data_offsets = read_longs(f, self._len + 1) + self.sizes = read_longs(f, self.s) + + def read_data(self, path): + self.data_file = open(data_file_path(path), "rb", buffering=0) + + def check_index(self, i): + if i < 0 or i >= self._len: + raise IndexError("index out of range") + + def __del__(self): + if self.data_file: + self.data_file.close() + + @lru_cache(maxsize=8) + def __getitem__(self, i) -> torch.Tensor: + if not self.data_file: + self.read_data(self.path) + self.check_index(i) + tensor_size = self.sizes[self.dim_offsets[i] : self.dim_offsets[i + 1]] + a = np.empty(tensor_size, dtype=self.dtype) + self.data_file.seek(self.data_offsets[i] * self.element_size) + self.data_file.readinto(a) + item = torch.from_numpy(a).long() + if self.fix_lua_indexing: + item -= 1 # subtract 1 for 0-based indexing + return item + + def __len__(self): + return self._len + + def num_tokens(self, index): + return self.sizes[index] + + def size(self, index): + return self.sizes[index] + + @staticmethod + def exists(path): + return PathManager.exists(index_file_path(path)) and PathManager.exists( + data_file_path(path) + ) + + @property + def supports_prefetch(self): + return False # avoid prefetching to save memory + + +class IndexedCachedDataset(IndexedDataset): + def __init__(self, path, fix_lua_indexing=False): + super().__init__(path, fix_lua_indexing=fix_lua_indexing) + self.cache = None + self.cache_index = {} + + @property + def supports_prefetch(self): + return True + + def prefetch(self, indices): + if all(i in self.cache_index for i in indices): + return + if not self.data_file: + self.read_data(self.path) + indices = sorted(set(indices)) + total_size = 0 + for i in indices: + total_size += self.data_offsets[i + 1] - self.data_offsets[i] + self.cache = np.empty(total_size, dtype=self.dtype) + ptx = 0 + self.cache_index.clear() + for i in indices: + self.cache_index[i] = ptx + size = self.data_offsets[i + 1] - self.data_offsets[i] + a = self.cache[ptx : ptx + size] + self.data_file.seek(self.data_offsets[i] * self.element_size) + self.data_file.readinto(a) + ptx += size + if self.data_file: + # close and delete data file after prefetch so we can pickle + self.data_file.close() + self.data_file = None + + @lru_cache(maxsize=8) + def __getitem__(self, i): + self.check_index(i) + tensor_size = self.sizes[self.dim_offsets[i] : self.dim_offsets[i + 1]] + a = np.empty(tensor_size, dtype=self.dtype) + ptx = 
self.cache_index[i] + np.copyto(a, self.cache[ptx : ptx + a.size]) + item = torch.from_numpy(a).long() + if self.fix_lua_indexing: + item -= 1 # subtract 1 for 0-based indexing + return item + + +class IndexedRawTextDataset(FairseqDataset): + """Takes a text file as input and binarizes it in memory at instantiation. + Original lines are also kept in memory""" + + def __init__(self, path, dictionary, append_eos=True, reverse_order=False): + self.tokens_list = [] + self.lines = [] + self.sizes = [] + self.append_eos = append_eos + self.reverse_order = reverse_order + self.read_data(path, dictionary) + self.size = len(self.tokens_list) + + def read_data(self, path, dictionary): + with open(path, "r", encoding="utf-8") as f: + for line in f: + self.lines.append(line.strip("\n")) + tokens = dictionary.encode_line( + line, + add_if_not_exist=False, + append_eos=self.append_eos, + reverse_order=self.reverse_order, + ).long() + self.tokens_list.append(tokens) + self.sizes.append(len(tokens)) + self.sizes = np.array(self.sizes) + + def check_index(self, i): + if i < 0 or i >= self.size: + raise IndexError("index out of range") + + @lru_cache(maxsize=8) + def __getitem__(self, i): + self.check_index(i) + return self.tokens_list[i] + + def get_original_text(self, i): + self.check_index(i) + return self.lines[i] + + def __del__(self): + pass + + def __len__(self): + return self.size + + def num_tokens(self, index): + return self.sizes[index] + + def size(self, index): + return self.sizes[index] + + @staticmethod + def exists(path): + return PathManager.exists(path) + + +class IndexedDatasetBuilder: + element_sizes = { + np.uint8: 1, + np.int8: 1, + np.int16: 2, + np.int32: 4, + np.int64: 8, + np.float: 4, + np.double: 8, + } + + def __init__(self, out_file, dtype=np.int32): + self.out_file = open(out_file, "wb") + self.dtype = dtype + self.data_offsets = [0] + self.dim_offsets = [0] + self.sizes = [] + self.element_size = self.element_sizes[self.dtype] + + def add_item(self, tensor): + # +1 for Lua compatibility + bytes = self.out_file.write(np.array(tensor.numpy() + 1, dtype=self.dtype)) + self.data_offsets.append(self.data_offsets[-1] + bytes / self.element_size) + for s in tensor.size(): + self.sizes.append(s) + self.dim_offsets.append(self.dim_offsets[-1] + len(tensor.size())) + + def merge_file_(self, another_file): + index = IndexedDataset(another_file) + assert index.dtype == self.dtype + + begin = self.data_offsets[-1] + for offset in index.data_offsets[1:]: + self.data_offsets.append(begin + offset) + self.sizes.extend(index.sizes) + begin = self.dim_offsets[-1] + for dim_offset in index.dim_offsets[1:]: + self.dim_offsets.append(begin + dim_offset) + + with open(data_file_path(another_file), "rb") as f: + while True: + data = f.read(1024) + if data: + self.out_file.write(data) + else: + break + + def finalize(self, index_file): + self.out_file.close() + index = open(index_file, "wb") + index.write(b"TNTIDX\x00\x00") + index.write(struct.pack("<Q", 1)) + index.write( + struct.pack("<QQ", _dtype_header_code(self.dtype), self.element_size) + ) + index.write(struct.pack("<QQ", len(self.data_offsets) - 1, len(self.sizes))) + write_longs(index, self.dim_offsets) + write_longs(index, self.data_offsets) + write_longs(index, self.sizes) + index.close() + + +def _warmup_mmap_file(path): + with open(path, "rb") as stream: + while stream.read(100 * 1024 * 1024): + pass + + +class MMapIndexedDataset(torch.utils.data.Dataset): + class Index: + _HDR_MAGIC = b"MMIDIDX\x00\x00" + + @classmethod + def 
writer(cls, path, dtype): + class _Writer: + def __enter__(self): + self._file = open(path, "wb") + + self._file.write(cls._HDR_MAGIC) + self._file.write(struct.pack("<Q", 1)) + self._file.write(struct.pack("<B", _dtype_header_code(dtype))) + + return self + + @staticmethod + def _get_pointers(sizes): + dtype_size = dtype().itemsize + address = 0 + pointers = [] + + for size in sizes: + pointers.append(address) + address += size * dtype_size + + return pointers + + def write(self, sizes): + pointers = self._get_pointers(sizes) + + self._file.write(struct.pack("<Q", len(sizes))) + + sizes = np.array(sizes, dtype=np.int32) + self._file.write(sizes.tobytes(order="C")) + del sizes + + pointers = np.array(pointers, dtype=np.int64) + self._file.write(pointers.tobytes(order="C")) + del pointers + + def __exit__(self, exc_type, exc_val, exc_tb): + self._file.close() + + return _Writer() + + def __init__(self, path): + with open(path, "rb") as stream: + magic_test = stream.read(9) + assert self._HDR_MAGIC == magic_test, ( + "Index file doesn't match expected format. " + "Make sure that --dataset-impl is configured properly." + ) + version = struct.unpack("<Q", stream.read(8)) + assert (1,) == version + + (dtype_code,) = struct.unpack("<B", stream.read(1)) + self._dtype = _code_to_dtype[dtype_code] + self._dtype_size = self._dtype().itemsize + + self._len = struct.unpack("<Q", stream.read(8))[0] + offset = stream.tell() + + _warmup_mmap_file(path) + + self._bin_buffer_mmap = np.memmap(path, mode="r", order="C") + self._bin_buffer = memoryview(self._bin_buffer_mmap) + self._sizes = np.frombuffer( + self._bin_buffer, dtype=np.int32, count=self._len, offset=offset + ) + self._pointers = np.frombuffer( + self._bin_buffer, + dtype=np.int64, + count=self._len, + offset=offset + self._sizes.nbytes, + ) + + def __del__(self): + self._bin_buffer_mmap._mmap.close() + del self._bin_buffer_mmap + + @property + def dtype(self): + return self._dtype + + @property + def sizes(self): + return self._sizes + + @lru_cache(maxsize=8) + def __getitem__(self, i): + return self._pointers[i], self._sizes[i] + + def __len__(self): + return self._len + + def __init__(self, path): + super().__init__() + + self._path = None + self._index = None + self._bin_buffer = None + + self._do_init(path) + + def __getstate__(self): + return self._path + + def __setstate__(self, state): + self._do_init(state) + + def _do_init(self, path): + self._path = path + self._index = self.Index(index_file_path(self._path)) + + _warmup_mmap_file(data_file_path(self._path)) + self._bin_buffer_mmap = np.memmap( + data_file_path(self._path), mode="r", order="C" + ) + self._bin_buffer = memoryview(self._bin_buffer_mmap) + + def __del__(self): + self._bin_buffer_mmap._mmap.close() + del self._bin_buffer_mmap + del self._index + + def __len__(self): + return len(self._index) + + @lru_cache(maxsize=8) + def __getitem__(self, i): + ptr, size = self._index[i] + np_array = np.frombuffer( + self._bin_buffer, dtype=self._index.dtype, count=size, offset=ptr + ) + if self._index.dtype != np.int64: + np_array = np_array.astype(np.int64) + + return torch.from_numpy(np_array) + + @property + def sizes(self): + return self._index.sizes + + @property + def supports_prefetch(self): + return False + + @staticmethod + def exists(path): + return PathManager.exists(index_file_path(path)) and PathManager.exists( + data_file_path(path) + ) + + +def get_indexed_dataset_to_local(path) -> str: + local_index_path = PathManager.get_local_path(index_file_path(path)) + 
local_data_path = PathManager.get_local_path(data_file_path(path)) + + assert local_index_path.endswith(".idx") and local_data_path.endswith(".bin"), ( + "PathManager.get_local_path does not return files with expected patterns: " + f"{local_index_path} and {local_data_path}" + ) + + local_path = local_data_path[:-4] # stripping surfix ".bin" + assert local_path == local_index_path[:-4] # stripping surfix ".idx" + return local_path + + +class MMapIndexedDatasetBuilder: + def __init__(self, out_file, dtype=np.int64): + self._data_file = open(out_file, "wb") + self._dtype = dtype + self._sizes = [] + + def add_item(self, tensor): + np_array = np.array(tensor.numpy(), dtype=self._dtype) + self._data_file.write(np_array.tobytes(order="C")) + self._sizes.append(np_array.size) + + def merge_file_(self, another_file): + # Concatenate index + index = MMapIndexedDataset.Index(index_file_path(another_file)) + assert index.dtype == self._dtype + + for size in index.sizes: + self._sizes.append(size) + + # Concatenate data + with open(data_file_path(another_file), "rb") as f: + shutil.copyfileobj(f, self._data_file) + + def finalize(self, index_file): + self._data_file.close() + + with MMapIndexedDataset.Index.writer(index_file, self._dtype) as index: + index.write(self._sizes) diff --git a/SpeechT5/fairseq/fairseq/data/iterators.py b/SpeechT5/fairseq/fairseq/data/iterators.py new file mode 100644 index 0000000000000000000000000000000000000000..86f6d0553371c3195aaa780778e7e830e7e27e1a --- /dev/null +++ b/SpeechT5/fairseq/fairseq/data/iterators.py @@ -0,0 +1,640 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +import itertools +import logging +import math +import operator +import os +import queue +import time +from threading import Thread + +import numpy as np +import torch +from fairseq.data import data_utils + + +logger = logging.getLogger(__name__) + +# Object used by _background_consumer to signal the source is exhausted +# to the main thread. +_sentinel = object() + + +class CountingIterator(object): + """Wrapper around an iterable that maintains the iteration count. + + Args: + iterable (iterable): iterable to wrap + start (int): starting iteration count. Note that this doesn't + actually advance the iterator. + total (int): override the iterator length returned by ``__len``. + This can be used to truncate *iterator*. 
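+
+    For example::
+
+        >>> itr = CountingIterator(range(5))
+        >>> list(itr)
+        [0, 1, 2, 3, 4]
+        >>> itr.n
+        5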
+ + Attributes: + n (int): number of elements consumed from this iterator + """ + + def __init__(self, iterable, start=None, total=None): + self._itr = iter(iterable) + self.n = start or getattr(iterable, "n", 0) + self.total = total or self.n + len(iterable) + + def __len__(self): + return self.total + + def __iter__(self): + return self + + def __next__(self): + if not self.has_next(): + raise StopIteration + try: + x = next(self._itr) + except StopIteration: + raise IndexError(f"Iterator expected to have length {self.total}, " + "but exhausted at position {self.n}.") + self.n += 1 + return x + + def has_next(self): + """Whether the iterator has been exhausted.""" + return self.n < self.total + + def skip(self, n): + """Fast-forward the iterator by skipping n elements.""" + for _ in range(n): + next(self) + return self + + def take(self, n): + """Truncate the iterator to n elements at most.""" + self.total = min(self.total, n) + # Propagate this change to the underlying iterator + if hasattr(self._itr, "take"): + self._itr.take(max(n - self.n, 0)) + return self + + +class EpochBatchIterating(object): + def __len__(self) -> int: + raise NotImplementedError + + @property + def next_epoch_idx(self): + raise NotImplementedError + + def next_epoch_itr( + self, shuffle=True, fix_batches_to_gpus=False, set_dataset_epoch=True + ): + """Return a new iterator over the dataset. + + Args: + shuffle (bool, optional): shuffle batches before returning the + iterator (default: True). + fix_batches_to_gpus (bool, optional): ensure that batches are always + allocated to the same shards across epochs. Requires + that :attr:`dataset` supports prefetching (default: False). + set_dataset_epoch (bool, optional): update the wrapped Dataset with + the new epoch number (default: True). + """ + raise NotImplementedError + + def end_of_epoch(self) -> bool: + """Returns whether the most recent epoch iterator has been exhausted""" + raise NotImplementedError + + @property + def iterations_in_epoch(self) -> int: + """The number of consumed batches in the current epoch.""" + raise NotImplementedError + + def state_dict(self): + """Returns a dictionary containing a whole state of the iterator.""" + raise NotImplementedError + + def load_state_dict(self, state_dict): + """Copies the state of the iterator from the given *state_dict*.""" + raise NotImplementedError + + @property + def first_batch(self): + return "DUMMY" + + +class StreamingEpochBatchIterator(EpochBatchIterating): + """A steaming-style iterator over a :class:`torch.utils.data.IterableDataset`. + + Args: + dataset (~torch.utils.data.Dataset): dataset from which to load the data + max_sentences: batch size + collate_fn (callable): merges a list of samples to form a mini-batch + num_workers (int, optional): how many subprocesses to use for data + loading. 0 means the data will be loaded in the main process + (default: 0). + epoch (int, optional): the epoch to start the iterator from + (default: 1). + buffer_size (int, optional): the number of batches to keep ready in the + queue. Helps speeding up dataloading. When buffer_size is zero, the + default torch.utils.data.DataLoader preloading is used. + timeout (int, optional): if positive, the timeout value for collecting a batch + from workers. Should always be non-negative (default: ``0``). 
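+
+    Unlike :class:`EpochBatchIterator` below, no batch sampler is involved:
+    batches are formed on the fly by a :class:`torch.utils.data.DataLoader`
+    with ``batch_size=max_sentences`` over the iterable dataset.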
+ """ + + def __init__( + self, + dataset, + max_sentences=1, + collate_fn=None, + epoch=1, + num_workers=0, + buffer_size=0, + timeout=0, + ): + assert isinstance(dataset, torch.utils.data.IterableDataset) + self.dataset = dataset + self.max_sentences = max_sentences + self.collate_fn = collate_fn + self.epoch = max(epoch, 1) # we use 1-based indexing for epochs + self.num_workers = num_workers + # This upper limit here is to prevent people from abusing this feature + # in a shared computing environment. + self.buffer_size = min(buffer_size, 20) + self.timeout = timeout + + self._current_epoch_iterator = None + + @property + def next_epoch_idx(self): + """Return the epoch index after *next_epoch_itr* is called.""" + if self._current_epoch_iterator is not None and self.end_of_epoch(): + return self.epoch + 1 + else: + return self.epoch + + def next_epoch_itr( + self, shuffle=True, fix_batches_to_gpus=False, set_dataset_epoch=True + ): + self.epoch = self.next_epoch_idx + if set_dataset_epoch and hasattr(self.dataset, "set_epoch"): + self.dataset.set_epoch(self.epoch) + self._current_epoch_iterator = self._get_iterator_for_epoch(self.epoch, shuffle) + return self._current_epoch_iterator + + def end_of_epoch(self) -> bool: + return not self._current_epoch_iterator.has_next() + + @property + def iterations_in_epoch(self) -> int: + if self._current_epoch_iterator is not None: + return self._current_epoch_iterator.n + return 0 + + def state_dict(self): + return { + "epoch": self.epoch, + } + + def load_state_dict(self, state_dict): + self.epoch = state_dict["epoch"] + + def _get_iterator_for_epoch(self, epoch, shuffle, offset=0): + if self.num_workers > 0: + os.environ["PYTHONWARNINGS"] = "ignore:semaphore_tracker:UserWarning" + + # Create data loader + worker_init_fn = getattr(self.dataset, "worker_init_fn", None) + itr = torch.utils.data.DataLoader( + self.dataset, + batch_size=self.max_sentences, + collate_fn=self.collate_fn, + num_workers=self.num_workers, + timeout=self.timeout, + worker_init_fn=worker_init_fn, + pin_memory=True, + ) + + # Wrap with a BufferedIterator if needed + if self.buffer_size > 0: + itr = BufferedIterator(self.buffer_size, itr) + + # Wrap with CountingIterator + itr = CountingIterator(itr, start=offset) + + return itr + + +class EpochBatchIterator(EpochBatchIterating): + """A multi-epoch iterator over a :class:`torch.utils.data.Dataset`. + + Compared to :class:`torch.utils.data.DataLoader`, this iterator: + + - can be reused across multiple epochs with the :func:`next_epoch_itr` + method (optionally shuffled between epochs) + - can be serialized/deserialized with the :func:`state_dict` and + :func:`load_state_dict` methods + - supports sharding with the *num_shards* and *shard_id* arguments + + Args: + dataset (~torch.utils.data.Dataset): dataset from which to load the data + collate_fn (callable): merges a list of samples to form a mini-batch + batch_sampler (~torch.utils.data.Sampler or a callable): an iterator over batches of + indices, or a callable to create such an iterator (~torch.utils.data.Sampler). + A callable batch_sampler will be called for each epoch to enable per epoch dynamic + batch iterators defined by this callable batch_sampler. + seed (int, optional): seed for random number generator for + reproducibility (default: 1). + num_shards (int, optional): shard the data iterator into N + shards (default: 1). + shard_id (int, optional): which shard of the data iterator to + return (default: 0). 
+ num_workers (int, optional): how many subprocesses to use for data + loading. 0 means the data will be loaded in the main process + (default: 0). + epoch (int, optional): the epoch to start the iterator from + (default: 1). + buffer_size (int, optional): the number of batches to keep ready in the + queue. Helps speeding up dataloading. When buffer_size is zero, the + default torch.utils.data.DataLoader preloading is used. + timeout (int, optional): if positive, the timeout value for collecting a batch + from workers. Should always be non-negative (default: ``0``). + disable_shuffling (bool, optional): force disable shuffling + (default: ``False``). + """ + + def __init__( + self, + dataset, + collate_fn, + batch_sampler, + seed=1, + num_shards=1, + shard_id=0, + num_workers=0, + epoch=1, + buffer_size=0, + timeout=0, + disable_shuffling=False, + ): + assert isinstance(dataset, torch.utils.data.Dataset) + self.dataset = dataset + self.collate_fn = collate_fn + self.batch_sampler = batch_sampler + self._frozen_batches = ( + tuple(batch_sampler) if not callable(batch_sampler) else None + ) + self.seed = seed + self.num_shards = num_shards + self.shard_id = shard_id + self.num_workers = num_workers + # This upper limit here is to prevent people from abusing this feature + # in a shared computing environment. + self.buffer_size = min(buffer_size, 20) + self.timeout = timeout + self.disable_shuffling = disable_shuffling + + self.epoch = max(epoch, 1) # we use 1-based indexing for epochs + self.shuffle = not disable_shuffling + self._cur_epoch_itr = None + self._next_epoch_itr = None + self._supports_prefetch = getattr(dataset, "supports_prefetch", False) + + @property + def frozen_batches(self): + if self._frozen_batches is None: + self._frozen_batches = tuple(self.batch_sampler(self.dataset, self.epoch)) + return self._frozen_batches + + @property + def first_batch(self): + if len(self.frozen_batches) == 0: + raise Exception( + "The dataset is empty. This could indicate " + "that all elements in the dataset have been skipped. " + "Try increasing the max number of allowed tokens or using " + "a larger dataset." + ) + + if getattr(self.dataset, "supports_fetch_outside_dataloader", True): + return self.collate_fn([self.dataset[i] for i in self.frozen_batches[0]]) + else: + return "DUMMY" + + def __len__(self): + return int(math.ceil(len(self.frozen_batches) / float(self.num_shards))) + + @property + def n(self): + return self.iterations_in_epoch + + @property + def next_epoch_idx(self): + """Return the epoch index after *next_epoch_itr* is called.""" + if self._next_epoch_itr is not None: + return self.epoch + elif self._cur_epoch_itr is not None and self.end_of_epoch(): + return self.epoch + 1 + else: + return self.epoch + + def next_epoch_itr( + self, shuffle=True, fix_batches_to_gpus=False, set_dataset_epoch=True + ): + """Return a new iterator over the dataset. + + Args: + shuffle (bool, optional): shuffle batches before returning the + iterator (default: True). + fix_batches_to_gpus (bool, optional): ensure that batches are always + allocated to the same shards across epochs. Requires + that :attr:`dataset` supports prefetching (default: False). + set_dataset_epoch (bool, optional): update the wrapped Dataset with + the new epoch number (default: True). 
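+
+        Example (an illustrative sketch added for clarity, not from the
+        upstream fairseq docstring; uses a toy map-style dataset and a fixed
+        list of index batches)::
+
+            >>> import torch
+            >>> class Ints(torch.utils.data.Dataset):
+            ...     def __getitem__(self, i):
+            ...         return i
+            ...     def __len__(self):
+            ...         return 6
+            >>> epoch_iter = EpochBatchIterator(
+            ...     Ints(), collate_fn=torch.tensor,
+            ...     batch_sampler=[[0, 1], [2, 3], [4, 5]])
+            >>> itr = epoch_iter.next_epoch_itr(shuffle=False)
+            >>> [b.tolist() for b in itr]
+            [[0, 1], [2, 3], [4, 5]]
+            >>> epoch_iter.end_of_epoch()
+            True
+            >>> epoch_iter.next_epoch_idx
+            2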
+ """ + if self.disable_shuffling: + shuffle = False + prev_epoch = self.epoch + self.epoch = self.next_epoch_idx + if set_dataset_epoch and hasattr(self.dataset, "set_epoch"): + self.dataset.set_epoch(self.epoch) + if self._next_epoch_itr is not None: + self._cur_epoch_itr = self._next_epoch_itr + self._next_epoch_itr = None + else: + if callable(self.batch_sampler) and prev_epoch != self.epoch: + # reset _frozen_batches to refresh the next epoch + self._frozen_batches = None + self._cur_epoch_itr = self._get_iterator_for_epoch( + self.epoch, + shuffle, + fix_batches_to_gpus=fix_batches_to_gpus, + ) + self.shuffle = shuffle + return self._cur_epoch_itr + + def end_of_epoch(self) -> bool: + """Returns whether the most recent epoch iterator has been exhausted""" + return not self._cur_epoch_itr.has_next() + + @property + def iterations_in_epoch(self): + """The number of consumed batches in the current epoch.""" + if self._cur_epoch_itr is not None: + return self._cur_epoch_itr.n + elif self._next_epoch_itr is not None: + return self._next_epoch_itr.n + return 0 + + def state_dict(self): + """Returns a dictionary containing a whole state of the iterator.""" + if self.end_of_epoch(): + epoch = self.epoch + 1 + iter_in_epoch = 0 + else: + epoch = self.epoch + iter_in_epoch = self.iterations_in_epoch + return { + "version": 2, + "epoch": epoch, + "iterations_in_epoch": iter_in_epoch, + "shuffle": self.shuffle, + } + + def load_state_dict(self, state_dict): + """Copies the state of the iterator from the given *state_dict*.""" + self.epoch = state_dict["epoch"] + itr_pos = state_dict.get("iterations_in_epoch", 0) + version = state_dict.get("version", 1) + if itr_pos > 0: + # fast-forward epoch iterator + self._next_epoch_itr = self._get_iterator_for_epoch( + self.epoch, + shuffle=state_dict.get("shuffle", True), + offset=itr_pos, + ) + if self._next_epoch_itr is None: + if version == 1: + # legacy behavior: we finished the epoch, increment epoch counter + self.epoch += 1 + else: + raise RuntimeError( + "Cannot resume training due to dataloader mismatch, please " + "report this to the fairseq developers. You can relaunch " + "training with `--reset-dataloader` and it should work." 
+ ) + else: + self._next_epoch_itr = None + + def _get_iterator_for_epoch( + self, epoch, shuffle, fix_batches_to_gpus=False, offset=0 + ): + def shuffle_batches(batches, seed): + with data_utils.numpy_seed(seed): + np.random.shuffle(batches) + return batches + + if self._supports_prefetch: + batches = self.frozen_batches + + if shuffle and not fix_batches_to_gpus: + batches = shuffle_batches(list(batches), self.seed + epoch) + + batches = list( + ShardedIterator(batches, self.num_shards, self.shard_id, fill_value=[]) + ) + self.dataset.prefetch([i for s in batches for i in s]) + + if shuffle and fix_batches_to_gpus: + batches = shuffle_batches(batches, self.seed + epoch + self.shard_id) + else: + if shuffle: + batches = shuffle_batches(list(self.frozen_batches), self.seed + epoch) + else: + batches = self.frozen_batches + batches = list( + ShardedIterator(batches, self.num_shards, self.shard_id, fill_value=[]) + ) + + if offset > 0 and offset >= len(batches): + return None + + if self.num_workers > 0: + os.environ["PYTHONWARNINGS"] = "ignore:semaphore_tracker:UserWarning" + + # Create data loader + itr = torch.utils.data.DataLoader( + self.dataset, + collate_fn=self.collate_fn, + batch_sampler=batches[offset:], + num_workers=self.num_workers, + timeout=self.timeout, + pin_memory=True, + ) + + # Wrap with a BufferedIterator if needed + if self.buffer_size > 0: + itr = BufferedIterator(self.buffer_size, itr) + + # Wrap with CountingIterator + itr = CountingIterator(itr, start=offset) + return itr + + +class GroupedIterator(CountingIterator): + """Wrapper around an iterable that returns groups (chunks) of items. + + Args: + iterable (iterable): iterable to wrap + chunk_size (int): size of each chunk + + Attributes: + n (int): number of elements consumed from this iterator + """ + + def __init__(self, iterable, chunk_size): + itr = _chunk_iterator(iterable, chunk_size) + super().__init__( + itr, + start=int(math.ceil(getattr(iterable, "n", 0) / float(chunk_size))), + total=int(math.ceil(len(iterable) / float(chunk_size))), + ) + self.chunk_size = chunk_size + + +def _chunk_iterator(itr, chunk_size): + chunk = [] + for x in itr: + chunk.append(x) + if len(chunk) == chunk_size: + yield chunk + chunk = [] + if len(chunk) > 0: + yield chunk + + +class ShardedIterator(CountingIterator): + """A sharded wrapper around an iterable, padded to length. + + Args: + iterable (iterable): iterable to wrap + num_shards (int): number of shards to split the iterable into + shard_id (int): which shard to iterator over + fill_value (Any, optional): padding value when the iterable doesn't + evenly divide *num_shards* (default: None). 
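+
+    Example (illustrative, not part of the upstream docstring; shard 3 is
+    padded with *fill_value* because 10 elements do not divide evenly into
+    4 shards)::
+
+        >>> list(ShardedIterator(list(range(10)), num_shards=4, shard_id=1))
+        [1, 5, 9]
+        >>> list(ShardedIterator(list(range(10)), num_shards=4, shard_id=3))
+        [3, 7, None]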
+ + Attributes: + n (int): number of elements consumed from this iterator + """ + + def __init__(self, iterable, num_shards, shard_id, fill_value=None): + if shard_id < 0 or shard_id >= num_shards: + raise ValueError("shard_id must be between 0 and num_shards") + sharded_len = int(math.ceil(len(iterable) / float(num_shards))) + itr = map( + operator.itemgetter(1), + itertools.zip_longest( + range(sharded_len), + itertools.islice(iterable, shard_id, len(iterable), num_shards), + fillvalue=fill_value, + ), + ) + super().__init__( + itr, + start=int(math.ceil(getattr(iterable, "n", 0) / float(num_shards))), + total=sharded_len, + ) + + +class BackgroundConsumer(Thread): + def __init__(self, queue, source, max_len, cuda_device): + Thread.__init__(self) + + self._queue = queue + self._source = source + self._max_len = max_len + self.count = 0 + self.cuda_device = cuda_device + + def run(self): + # set_device to avoid creation of GPU0 context when using pin_memory + if self.cuda_device is not None: + torch.cuda.set_device(self.cuda_device) + + try: + for item in self._source: + self._queue.put(item) + + # Stop if we reached the maximum length + self.count += 1 + if self._max_len is not None and self.count >= self._max_len: + break + + # Signal the consumer we are done. + self._queue.put(_sentinel) + except Exception as e: + self._queue.put(e) + + +class BufferedIterator(object): + def __init__(self, size, iterable): + self._queue = queue.Queue(size) + self._iterable = iterable + self._consumer = None + + self.start_time = time.time() + self.warning_time = None + + self.total = len(iterable) + + def _create_consumer(self): + self._consumer = BackgroundConsumer( + self._queue, + self._iterable, + self.total, + torch.cuda.current_device() if torch.cuda.is_available() else None + ) + self._consumer.daemon = True + self._consumer.start() + + def __iter__(self): + return self + + def __len__(self): + return self.total + + def take(self, n): + self.total = min(self.total, n) + # Propagate this change to the underlying iterator + if hasattr(self._iterable, "take"): + self._iterable.take(n) + return self + + def __next__(self): + # Create consumer if not created yet + if self._consumer is None: + self._create_consumer() + + # Notify the user if there is a data loading bottleneck + if self._queue.qsize() < min(2, max(1, self._queue.maxsize // 2)): + if time.time() - self.start_time > 5 * 60: + if ( + self.warning_time is None + or time.time() - self.warning_time > 15 * 60 + ): + logger.debug( + "Data loading buffer is empty or nearly empty. This may " + "indicate a data loading bottleneck, and increasing the " + "number of workers (--num-workers) may help." + ) + self.warning_time = time.time() + + # Get next example + item = self._queue.get(True) + if isinstance(item, Exception): + raise item + if item is _sentinel: + raise StopIteration() + return item diff --git a/SpeechT5/fairseq/fairseq/data/language_pair_dataset.py b/SpeechT5/fairseq/fairseq/data/language_pair_dataset.py new file mode 100644 index 0000000000000000000000000000000000000000..ff3e14bf14770638524ef6067b558e455dbe5f2b --- /dev/null +++ b/SpeechT5/fairseq/fairseq/data/language_pair_dataset.py @@ -0,0 +1,471 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. 
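+
+# Illustrative note (added in this edited document, not part of the upstream
+# fairseq source): ``LanguagePairDataset`` below is a map-style dataset whose
+# ``collater`` builds the ``net_input``/``target`` dictionaries consumed by
+# fairseq models.  A typical, hypothetical wiring with the iterators shown
+# above would look roughly like:
+#
+#     epoch_iter = iterators.EpochBatchIterator(
+#         dataset=pair_ds,                  # an instance of LanguagePairDataset
+#         collate_fn=pair_ds.collater,
+#         batch_sampler=[[0, 1], [2, 3]],   # toy fixed batches of indices
+#     )
+#     for sample in epoch_iter.next_epoch_itr(shuffle=True):
+#         sample["net_input"]["src_tokens"]  # (bsz, src_len) LongTensor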
+ +import logging + +import numpy as np +import torch +from fairseq.data import FairseqDataset, data_utils + + +logger = logging.getLogger(__name__) + + +def collate( + samples, + pad_idx, + eos_idx, + left_pad_source=True, + left_pad_target=False, + input_feeding=True, + pad_to_length=None, + pad_to_multiple=1, +): + if len(samples) == 0: + return {} + + def merge(key, left_pad, move_eos_to_beginning=False, pad_to_length=None): + return data_utils.collate_tokens( + [s[key] for s in samples], + pad_idx, + eos_idx, + left_pad, + move_eos_to_beginning, + pad_to_length=pad_to_length, + pad_to_multiple=pad_to_multiple, + ) + + def check_alignment(alignment, src_len, tgt_len): + if alignment is None or len(alignment) == 0: + return False + if ( + alignment[:, 0].max().item() >= src_len - 1 + or alignment[:, 1].max().item() >= tgt_len - 1 + ): + logger.warning("alignment size mismatch found, skipping alignment!") + return False + return True + + def compute_alignment_weights(alignments): + """ + Given a tensor of shape [:, 2] containing the source-target indices + corresponding to the alignments, a weight vector containing the + inverse frequency of each target index is computed. + For e.g. if alignments = [[5, 7], [2, 3], [1, 3], [4, 2]], then + a tensor containing [1., 0.5, 0.5, 1] should be returned (since target + index 3 is repeated twice) + """ + align_tgt = alignments[:, 1] + _, align_tgt_i, align_tgt_c = torch.unique( + align_tgt, return_inverse=True, return_counts=True + ) + align_weights = align_tgt_c[align_tgt_i[np.arange(len(align_tgt))]] + return 1.0 / align_weights.float() + + id = torch.LongTensor([s["id"] for s in samples]) + src_tokens = merge( + "source", + left_pad=left_pad_source, + pad_to_length=pad_to_length["source"] if pad_to_length is not None else None, + ) + # sort by descending source length + src_lengths = torch.LongTensor( + [s["source"].ne(pad_idx).long().sum() for s in samples] + ) + src_lengths, sort_order = src_lengths.sort(descending=True) + id = id.index_select(0, sort_order) + src_tokens = src_tokens.index_select(0, sort_order) + + prev_output_tokens = None + target = None + if samples[0].get("target", None) is not None: + target = merge( + "target", + left_pad=left_pad_target, + pad_to_length=pad_to_length["target"] + if pad_to_length is not None + else None, + ) + target = target.index_select(0, sort_order) + tgt_lengths = torch.LongTensor( + [s["target"].ne(pad_idx).long().sum() for s in samples] + ).index_select(0, sort_order) + ntokens = tgt_lengths.sum().item() + + if samples[0].get("prev_output_tokens", None) is not None: + prev_output_tokens = merge("prev_output_tokens", left_pad=left_pad_target) + elif input_feeding: + # we create a shifted version of targets for feeding the + # previous output token(s) into the next decoder step + prev_output_tokens = merge( + "target", + left_pad=left_pad_target, + move_eos_to_beginning=True, + pad_to_length=pad_to_length["target"] + if pad_to_length is not None + else None, + ) + else: + ntokens = src_lengths.sum().item() + + batch = { + "id": id, + "nsentences": len(samples), + "ntokens": ntokens, + "net_input": {"src_tokens": src_tokens, "src_lengths": src_lengths,}, + "target": target, + } + if prev_output_tokens is not None: + batch["net_input"]["prev_output_tokens"] = prev_output_tokens.index_select( + 0, sort_order + ) + + if samples[0].get("alignment", None) is not None: + bsz, tgt_sz = batch["target"].shape + src_sz = batch["net_input"]["src_tokens"].shape[1] + + offsets = torch.zeros((len(sort_order), 2), 
dtype=torch.long) + offsets[:, 1] += torch.arange(len(sort_order), dtype=torch.long) * tgt_sz + if left_pad_source: + offsets[:, 0] += src_sz - src_lengths + if left_pad_target: + offsets[:, 1] += tgt_sz - tgt_lengths + + alignments = [ + alignment + offset + for align_idx, offset, src_len, tgt_len in zip( + sort_order, offsets, src_lengths, tgt_lengths + ) + for alignment in [samples[align_idx]["alignment"].view(-1, 2)] + if check_alignment(alignment, src_len, tgt_len) + ] + + if len(alignments) > 0: + alignments = torch.cat(alignments, dim=0) + align_weights = compute_alignment_weights(alignments) + + batch["alignments"] = alignments + batch["align_weights"] = align_weights + + if samples[0].get("constraints", None) is not None: + # Collate the packed constraints across the samples, padding to + # the length of the longest sample. + lens = [sample.get("constraints").size(0) for sample in samples] + max_len = max(lens) + constraints = torch.zeros((len(samples), max(lens))).long() + for i, sample in enumerate(samples): + constraints[i, 0 : lens[i]] = samples[i].get("constraints") + batch["constraints"] = constraints.index_select(0, sort_order) + + return batch + + +class LanguagePairDataset(FairseqDataset): + """ + A pair of torch.utils.data.Datasets. + + Args: + src (torch.utils.data.Dataset): source dataset to wrap + src_sizes (List[int]): source sentence lengths + src_dict (~fairseq.data.Dictionary): source vocabulary + tgt (torch.utils.data.Dataset, optional): target dataset to wrap + tgt_sizes (List[int], optional): target sentence lengths + tgt_dict (~fairseq.data.Dictionary, optional): target vocabulary + left_pad_source (bool, optional): pad source tensors on the left side + (default: True). + left_pad_target (bool, optional): pad target tensors on the left side + (default: False). + shuffle (bool, optional): shuffle dataset elements before batching + (default: True). + input_feeding (bool, optional): create a shifted version of the targets + to be passed into the model for teacher forcing (default: True). + remove_eos_from_source (bool, optional): if set, removes eos from end + of source if it's present (default: False). + append_eos_to_target (bool, optional): if set, appends eos to end of + target if it's absent (default: False). + align_dataset (torch.utils.data.Dataset, optional): dataset + containing alignments. + constraints (Tensor, optional): 2d tensor with a concatenated, zero- + delimited list of constraints for each sentence. + append_bos (bool, optional): if set, appends bos to the beginning of + source/target sentence. + num_buckets (int, optional): if set to a value greater than 0, then + batches will be bucketed into the given number of batch shapes. + src_lang_id (int, optional): source language ID, if set, the collated batch + will contain a field 'src_lang_id' in 'net_input' which indicates the + source language of the samples. + tgt_lang_id (int, optional): target language ID, if set, the collated batch + will contain a field 'tgt_lang_id' which indicates the target language + of the samples. 
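+
+    Example (an illustrative sketch added for clarity, not from the upstream
+    docstring; builds a tiny vocabulary and a single source "sentence" by
+    hand, with no target side)::
+
+        >>> from fairseq.data import Dictionary
+        >>> d = Dictionary()
+        >>> ids = [d.add_symbol(w) for w in ("hello", "world")]
+        >>> src = [torch.LongTensor(ids + [d.eos()])]
+        >>> ds = LanguagePairDataset(src, [len(src[0])], d)
+        >>> batch = ds.collater([ds[0]])
+        >>> sorted(batch["net_input"].keys())
+        ['src_lengths', 'src_tokens']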
+ """ + + def __init__( + self, + src, + src_sizes, + src_dict, + tgt=None, + tgt_sizes=None, + tgt_dict=None, + left_pad_source=True, + left_pad_target=False, + shuffle=True, + input_feeding=True, + remove_eos_from_source=False, + append_eos_to_target=False, + align_dataset=None, + constraints=None, + append_bos=False, + eos=None, + num_buckets=0, + src_lang_id=None, + tgt_lang_id=None, + pad_to_multiple=1, + ): + if tgt_dict is not None: + assert src_dict.pad() == tgt_dict.pad() + assert src_dict.eos() == tgt_dict.eos() + assert src_dict.unk() == tgt_dict.unk() + if tgt is not None: + assert len(src) == len( + tgt + ), "Source and target must contain the same number of examples" + self.src = src + self.tgt = tgt + self.src_sizes = np.array(src_sizes) + self.tgt_sizes = np.array(tgt_sizes) if tgt_sizes is not None else None + self.sizes = ( + np.vstack((self.src_sizes, self.tgt_sizes)).T + if self.tgt_sizes is not None + else self.src_sizes + ) + self.src_dict = src_dict + self.tgt_dict = tgt_dict + self.left_pad_source = left_pad_source + self.left_pad_target = left_pad_target + self.shuffle = shuffle + self.input_feeding = input_feeding + self.remove_eos_from_source = remove_eos_from_source + self.append_eos_to_target = append_eos_to_target + self.align_dataset = align_dataset + if self.align_dataset is not None: + assert ( + self.tgt_sizes is not None + ), "Both source and target needed when alignments are provided" + self.constraints = constraints + self.append_bos = append_bos + self.eos = eos if eos is not None else src_dict.eos() + self.src_lang_id = src_lang_id + self.tgt_lang_id = tgt_lang_id + if num_buckets > 0: + from fairseq.data import BucketPadLengthDataset + + self.src = BucketPadLengthDataset( + self.src, + sizes=self.src_sizes, + num_buckets=num_buckets, + pad_idx=self.src_dict.pad(), + left_pad=self.left_pad_source, + ) + self.src_sizes = self.src.sizes + logger.info("bucketing source lengths: {}".format(list(self.src.buckets))) + if self.tgt is not None: + self.tgt = BucketPadLengthDataset( + self.tgt, + sizes=self.tgt_sizes, + num_buckets=num_buckets, + pad_idx=self.tgt_dict.pad(), + left_pad=self.left_pad_target, + ) + self.tgt_sizes = self.tgt.sizes + logger.info( + "bucketing target lengths: {}".format(list(self.tgt.buckets)) + ) + + # determine bucket sizes using self.num_tokens, which will return + # the padded lengths (thanks to BucketPadLengthDataset) + num_tokens = np.vectorize(self.num_tokens, otypes=[np.compat.long]) + self.bucketed_num_tokens = num_tokens(np.arange(len(self.src))) + self.buckets = [ + (None, num_tokens) for num_tokens in np.unique(self.bucketed_num_tokens) + ] + else: + self.buckets = None + self.pad_to_multiple = pad_to_multiple + + def get_batch_shapes(self): + return self.buckets + + def __getitem__(self, index): + tgt_item = self.tgt[index] if self.tgt is not None else None + src_item = self.src[index] + # Append EOS to end of tgt sentence if it does not have an EOS and remove + # EOS from end of src sentence if it exists. 
This is useful when we use + # use existing datasets for opposite directions i.e., when we want to + # use tgt_dataset as src_dataset and vice versa + if self.append_eos_to_target: + eos = self.tgt_dict.eos() if self.tgt_dict else self.src_dict.eos() + if self.tgt and self.tgt[index][-1] != eos: + tgt_item = torch.cat([self.tgt[index], torch.LongTensor([eos])]) + + if self.append_bos: + bos = self.tgt_dict.bos() if self.tgt_dict else self.src_dict.bos() + if self.tgt and self.tgt[index][0] != bos: + tgt_item = torch.cat([torch.LongTensor([bos]), self.tgt[index]]) + + bos = self.src_dict.bos() + if self.src[index][0] != bos: + src_item = torch.cat([torch.LongTensor([bos]), self.src[index]]) + + if self.remove_eos_from_source: + eos = self.src_dict.eos() + if self.src[index][-1] == eos: + src_item = self.src[index][:-1] + + example = { + "id": index, + "source": src_item, + "target": tgt_item, + } + if self.align_dataset is not None: + example["alignment"] = self.align_dataset[index] + if self.constraints is not None: + example["constraints"] = self.constraints[index] + return example + + def __len__(self): + return len(self.src) + + def collater(self, samples, pad_to_length=None): + """Merge a list of samples to form a mini-batch. + + Args: + samples (List[dict]): samples to collate + pad_to_length (dict, optional): a dictionary of + {'source': source_pad_to_length, 'target': target_pad_to_length} + to indicate the max length to pad to in source and target respectively. + + Returns: + dict: a mini-batch with the following keys: + + - `id` (LongTensor): example IDs in the original input order + - `ntokens` (int): total number of tokens in the batch + - `net_input` (dict): the input to the Model, containing keys: + + - `src_tokens` (LongTensor): a padded 2D Tensor of tokens in + the source sentence of shape `(bsz, src_len)`. Padding will + appear on the left if *left_pad_source* is ``True``. + - `src_lengths` (LongTensor): 1D Tensor of the unpadded + lengths of each source sentence of shape `(bsz)` + - `prev_output_tokens` (LongTensor): a padded 2D Tensor of + tokens in the target sentence, shifted right by one + position for teacher forcing, of shape `(bsz, tgt_len)`. + This key will not be present if *input_feeding* is + ``False``. Padding will appear on the left if + *left_pad_target* is ``True``. + - `src_lang_id` (LongTensor): a long Tensor which contains source + language IDs of each sample in the batch + + - `target` (LongTensor): a padded 2D Tensor of tokens in the + target sentence of shape `(bsz, tgt_len)`. Padding will appear + on the left if *left_pad_target* is ``True``. + - `tgt_lang_id` (LongTensor): a long Tensor which contains target language + IDs of each sample in the batch + """ + res = collate( + samples, + pad_idx=self.src_dict.pad(), + eos_idx=self.eos, + left_pad_source=self.left_pad_source, + left_pad_target=self.left_pad_target, + input_feeding=self.input_feeding, + pad_to_length=pad_to_length, + pad_to_multiple=self.pad_to_multiple, + ) + if self.src_lang_id is not None or self.tgt_lang_id is not None: + src_tokens = res["net_input"]["src_tokens"] + bsz = src_tokens.size(0) + if self.src_lang_id is not None: + res["net_input"]["src_lang_id"] = ( + torch.LongTensor([[self.src_lang_id]]).expand(bsz, 1).to(src_tokens) + ) + if self.tgt_lang_id is not None: + res["tgt_lang_id"] = ( + torch.LongTensor([[self.tgt_lang_id]]).expand(bsz, 1).to(src_tokens) + ) + return res + + def num_tokens(self, index): + """Return the number of tokens in a sample. 
This value is used to + enforce ``--max-tokens`` during batching.""" + return max( + self.src_sizes[index], + self.tgt_sizes[index] if self.tgt_sizes is not None else 0, + ) + + def num_tokens_vec(self, indices): + """Return the number of tokens for a set of positions defined by indices. + This value is used to enforce ``--max-tokens`` during batching.""" + sizes = self.src_sizes[indices] + if self.tgt_sizes is not None: + sizes = np.maximum(sizes, self.tgt_sizes[indices]) + return sizes + + def size(self, index): + """Return an example's size as a float or tuple. This value is used when + filtering a dataset with ``--max-positions``.""" + return ( + self.src_sizes[index], + self.tgt_sizes[index] if self.tgt_sizes is not None else 0, + ) + + def ordered_indices(self): + """Return an ordered list of indices. Batches will be constructed based + on this order.""" + if self.shuffle: + indices = np.random.permutation(len(self)).astype(np.int64) + else: + indices = np.arange(len(self), dtype=np.int64) + if self.buckets is None: + # sort by target length, then source length + if self.tgt_sizes is not None: + indices = indices[np.argsort(self.tgt_sizes[indices], kind="mergesort")] + return indices[np.argsort(self.src_sizes[indices], kind="mergesort")] + else: + # sort by bucketed_num_tokens, which is: + # max(padded_src_len, padded_tgt_len) + return indices[ + np.argsort(self.bucketed_num_tokens[indices], kind="mergesort") + ] + + @property + def supports_prefetch(self): + return getattr(self.src, "supports_prefetch", False) and ( + getattr(self.tgt, "supports_prefetch", False) or self.tgt is None + ) + + def prefetch(self, indices): + self.src.prefetch(indices) + if self.tgt is not None: + self.tgt.prefetch(indices) + if self.align_dataset is not None: + self.align_dataset.prefetch(indices) + + def filter_indices_by_size(self, indices, max_sizes): + """Filter a list of sample indices. Remove those that are longer + than specified in max_sizes. + + Args: + indices (np.array): original array of sample indices + max_sizes (int or list[int] or tuple[int]): max sample size, + can be defined separately for src and tgt (then list or tuple) + + Returns: + np.array: filtered sample array + list: list of removed indices + """ + return data_utils.filter_paired_dataset_indices_by_size( + self.src_sizes, self.tgt_sizes, indices, max_sizes, + ) diff --git a/SpeechT5/fairseq/fairseq/data/legacy/__init__.py b/SpeechT5/fairseq/fairseq/data/legacy/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..9bd5c72b5e9d7f67fb7e4ef10808d7ec08967ff4 --- /dev/null +++ b/SpeechT5/fairseq/fairseq/data/legacy/__init__.py @@ -0,0 +1,16 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +from .block_pair_dataset import BlockPairDataset +from .masked_lm_dataset import MaskedLMDataset +from .masked_lm_dictionary import BertDictionary, MaskedLMDictionary + + +__all__ = [ + "BertDictionary", + "BlockPairDataset", + "MaskedLMDataset", + "MaskedLMDictionary", +] diff --git a/SpeechT5/fairseq/fairseq/data/legacy/block_pair_dataset.py b/SpeechT5/fairseq/fairseq/data/legacy/block_pair_dataset.py new file mode 100644 index 0000000000000000000000000000000000000000..ba069b46052286c531b4f9706d96788732cd2ad2 --- /dev/null +++ b/SpeechT5/fairseq/fairseq/data/legacy/block_pair_dataset.py @@ -0,0 +1,311 @@ +# Copyright (c) Facebook, Inc. and its affiliates. 
+# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +import math + +import numpy as np +import torch +from fairseq.data import FairseqDataset + + +class BlockPairDataset(FairseqDataset): + """Break a Dataset of tokens into sentence pair blocks for next sentence + prediction as well as masked language model. + + High-level logics are: + 1. break input tensor to tensor blocks + 2. pair the blocks with 50% next sentence and 50% random sentence + 3. return paired blocks as well as related segment labels + + Args: + dataset (~torch.utils.data.Dataset): dataset to break into blocks + sizes: array of sentence lengths + dictionary: dictionary for the task + block_size: maximum block size + break_mode: mode for breaking copurs into block pairs. currently we support + 2 modes + doc: respect document boundaries and each part of the pair should belong to on document + none: don't respect any boundary and cut tokens evenly + short_seq_prob: probability for generating shorter block pairs + doc_break_size: Size for empty line separating documents. Typically 1 if + the sentences have eos, 0 otherwise. + """ + + def __init__( + self, + dataset, + dictionary, + sizes, + block_size, + break_mode="doc", + short_seq_prob=0.1, + doc_break_size=1, + ): + super().__init__() + self.dataset = dataset + self.pad = dictionary.pad() + self.eos = dictionary.eos() + self.cls = dictionary.cls() + self.mask = dictionary.mask() + self.sep = dictionary.sep() + self.break_mode = break_mode + self.dictionary = dictionary + self.short_seq_prob = short_seq_prob + self.block_indices = [] + + assert len(dataset) == len(sizes) + + if break_mode == "doc": + cur_doc = [] + for sent_id, sz in enumerate(sizes): + assert doc_break_size == 0 or sz != 0, ( + "when doc_break_size is non-zero, we expect documents to be" + "separated by a blank line with a single eos." + ) + # empty line as document separator + if sz == doc_break_size: + if len(cur_doc) == 0: + continue + self.block_indices.append(cur_doc) + cur_doc = [] + else: + cur_doc.append(sent_id) + max_num_tokens = block_size - 3 # Account for [CLS], [SEP], [SEP] + self.sent_pairs = [] + self.sizes = [] + for doc_id, doc in enumerate(self.block_indices): + self._generate_sentence_pair(doc, doc_id, max_num_tokens, sizes) + elif break_mode is None or break_mode == "none": + # each block should have half of the block size since we are constructing block pair + sent_length = (block_size - 3) // 2 + total_len = sum(dataset.sizes) + length = math.ceil(total_len / sent_length) + + def block_at(i): + start = i * sent_length + end = min(start + sent_length, total_len) + return (start, end) + + sent_indices = np.array([block_at(i) for i in range(length)]) + sent_sizes = np.array([e - s for s, e in sent_indices]) + dataset_index = self._sent_to_dataset_index(sent_sizes) + + # pair sentences + self._pair_sentences(dataset_index) + else: + raise ValueError("Invalid break_mode: " + break_mode) + + def _pair_sentences(self, dataset_index): + """ + Give a list of evenly cut blocks/sentences, pair these sentences with 50% + consecutive sentences and 50% random sentences. 
+ This is used for none break mode + """ + # pair sentences + for sent_id, sent in enumerate(dataset_index): + next_sent_label = ( + 1 if np.random.rand() > 0.5 and sent_id != len(dataset_index) - 1 else 0 + ) + if next_sent_label: + next_sent = dataset_index[sent_id + 1] + else: + next_sent = dataset_index[ + self._skip_sampling(len(dataset_index), [sent_id, sent_id + 1]) + ] + self.sent_pairs.append((sent, next_sent, next_sent_label)) + + # The current blocks don't include the special tokens but the + # sizes already account for this + self.sizes.append(3 + sent[3] + next_sent[3]) + + def _sent_to_dataset_index(self, sent_sizes): + """ + Build index mapping block indices to the underlying dataset indices + """ + dataset_index = [] + ds_idx, ds_remaining = -1, 0 + for to_consume in sent_sizes: + sent_size = to_consume + if ds_remaining == 0: + ds_idx += 1 + ds_remaining = sent_sizes[ds_idx] + start_ds_idx = ds_idx + start_offset = sent_sizes[ds_idx] - ds_remaining + while to_consume > ds_remaining: + to_consume -= ds_remaining + ds_idx += 1 + ds_remaining = sent_sizes[ds_idx] + ds_remaining -= to_consume + dataset_index.append( + ( + start_ds_idx, # starting index in dataset + start_offset, # starting offset within starting index + ds_idx, # ending index in dataset + sent_size, # sentence length + ) + ) + assert ds_remaining == 0 + assert ds_idx == len(self.dataset) - 1 + return dataset_index + + def _generate_sentence_pair(self, doc, doc_id, max_num_tokens, sizes): + """ + Go through a single document and genrate sentence paris from it + """ + current_chunk = [] + current_length = 0 + curr = 0 + # To provide more randomness, we decrease target seq length for parts of + # samples (10% by default). Note that max_num_tokens is the hard threshold + # for batching and will never be changed. 
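+        # (Added illustrative note: with the default short_seq_prob=0.1 and a
+        # block_size of 128, max_num_tokens is 125, so roughly one pair in ten
+        # gets a random target length drawn from [2, 125).)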
+ target_seq_length = max_num_tokens + if np.random.random() < self.short_seq_prob: + target_seq_length = np.random.randint(2, max_num_tokens) + # loop through all sentences in document + while curr < len(doc): + sent_id = doc[curr] + current_chunk.append(sent_id) + current_length = sum(sizes[current_chunk]) + # split chunk and generate pair when exceed target_seq_length or + # finish the loop + if curr == len(doc) - 1 or current_length >= target_seq_length: + # split the chunk into 2 parts + a_end = 1 + if len(current_chunk) > 2: + a_end = np.random.randint(1, len(current_chunk) - 1) + sent_a = current_chunk[:a_end] + len_a = sum(sizes[sent_a]) + # generate next sentence label, note that if there is only 1 sentence + # in current chunk, label is always 0 + next_sent_label = ( + 1 if np.random.rand() > 0.5 and len(current_chunk) != 1 else 0 + ) + if not next_sent_label: + # if next sentence label is 0, sample sent_b from a random doc + target_b_length = target_seq_length - len_a + rand_doc_id = self._skip_sampling(len(self.block_indices), [doc_id]) + random_doc = self.block_indices[rand_doc_id] + random_start = np.random.randint(0, len(random_doc)) + sent_b = [] + len_b = 0 + for j in range(random_start, len(random_doc)): + sent_b.append(random_doc[j]) + len_b = sum(sizes[sent_b]) + if len_b >= target_b_length: + break + # return the second part of the chunk since it's not used + num_unused_segments = len(current_chunk) - a_end + curr -= num_unused_segments + else: + # if next sentence label is 1, use the second part of chunk as sent_B + sent_b = current_chunk[a_end:] + len_b = sum(sizes[sent_b]) + # currently sent_a and sent_B may be longer than max_num_tokens, + # truncate them and return block idx and offsets for them + sent_a, sent_b = self._truncate_sentences( + sent_a, sent_b, max_num_tokens + ) + self.sent_pairs.append((sent_a, sent_b, next_sent_label)) + self.sizes.append(3 + sent_a[3] + sent_b[3]) + current_chunk = [] + curr += 1 + + def _skip_sampling(self, total, skip_ids): + """ + Generate a random integer which is not in skip_ids. Sample range is [0, total) + TODO: ids in skip_ids should be consecutive, we can extend it to more generic version later + """ + rand_id = np.random.randint(total - len(skip_ids)) + return rand_id if rand_id < min(skip_ids) else rand_id + len(skip_ids) + + def _truncate_sentences(self, sent_a, sent_b, max_num_tokens): + """ + Trancate a pair of sentence to limit total length under max_num_tokens + Logics: + 1. Truncate longer sentence + 2. 
Tokens to be truncated could be at the beginning or the end of the sentnce + Returns: + Truncated sentences represented by dataset idx + """ + len_a, len_b = sum(self.dataset.sizes[sent_a]), sum(self.dataset.sizes[sent_b]) + front_cut_a = front_cut_b = end_cut_a = end_cut_b = 0 + + while True: + total_length = ( + len_a + len_b - front_cut_a - front_cut_b - end_cut_a - end_cut_b + ) + if total_length <= max_num_tokens: + break + + if len_a - front_cut_a - end_cut_a > len_b - front_cut_b - end_cut_b: + if np.random.rand() < 0.5: + front_cut_a += 1 + else: + end_cut_a += 1 + else: + if np.random.rand() < 0.5: + front_cut_b += 1 + else: + end_cut_b += 1 + + # calculate ds indices as well as offsets and return + truncated_sent_a = self._cut_sentence(sent_a, front_cut_a, end_cut_a) + truncated_sent_b = self._cut_sentence(sent_b, front_cut_b, end_cut_b) + return truncated_sent_a, truncated_sent_b + + def _cut_sentence(self, sent, front_cut, end_cut): + """ + Cut a sentence based on the numbers of tokens to be cut from beginning and end + Represent the sentence as dataset idx and return + """ + start_ds_idx, end_ds_idx, offset = sent[0], sent[-1], 0 + target_len = sum(self.dataset.sizes[sent]) - front_cut - end_cut + while front_cut > 0: + if self.dataset.sizes[start_ds_idx] > front_cut: + offset += front_cut + break + else: + front_cut -= self.dataset.sizes[start_ds_idx] + start_ds_idx += 1 + while end_cut > 0: + if self.dataset.sizes[end_ds_idx] > end_cut: + break + else: + end_cut -= self.dataset.sizes[end_ds_idx] + end_ds_idx -= 1 + return start_ds_idx, offset, end_ds_idx, target_len + + def _fetch_block(self, start_ds_idx, offset, end_ds_idx, length): + """ + Fetch a block of tokens based on its dataset idx + """ + buffer = torch.cat( + [self.dataset[idx] for idx in range(start_ds_idx, end_ds_idx + 1)] + ) + s, e = offset, offset + length + return buffer[s:e] + + def __getitem__(self, index): + block1, block2, next_sent_label = self.sent_pairs[index] + block1 = self._fetch_block(*block1) + block2 = self._fetch_block(*block2) + return block1, block2, next_sent_label + + def __len__(self): + return len(self.sizes) + + @property + def supports_prefetch(self): + return getattr(self.dataset, "supports_prefetch", False) + + def prefetch(self, indices): + prefetch_idx = set() + for index in indices: + for block1, block2, _ in [self.sent_pairs[index]]: + for ds_idx in range(block1[0], block1[2] + 1): + prefetch_idx.add(ds_idx) + for ds_idx in range(block2[0], block2[2] + 1): + prefetch_idx.add(ds_idx) + self.dataset.prefetch(prefetch_idx) diff --git a/SpeechT5/fairseq/fairseq/data/legacy/masked_lm_dataset.py b/SpeechT5/fairseq/fairseq/data/legacy/masked_lm_dataset.py new file mode 100644 index 0000000000000000000000000000000000000000..dd8ea2c60aff306ab3a756223a298a28d41a4991 --- /dev/null +++ b/SpeechT5/fairseq/fairseq/data/legacy/masked_lm_dataset.py @@ -0,0 +1,303 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +import math +from typing import Dict, List, Tuple + +import numpy as np +import torch +from fairseq.data import Dictionary, FairseqDataset, data_utils +from fairseq.data.concat_dataset import ConcatDataset +from fairseq.data.legacy.block_pair_dataset import BlockPairDataset +from fairseq.data.token_block_dataset import TokenBlockDataset + + +class MaskedLMDataset(FairseqDataset): + """ + A wrapper Dataset for masked language modelling. 
The dataset + wraps around TokenBlockDataset or BlockedPairDataset and creates a batch + where the input blocks are masked according to the specified masking + probability. Additionally the batch can also contain sentence level targets + if this is specified. + + Args: + dataset: Dataset which generates blocks of data. Only BlockPairDataset + and TokenBlockDataset are supported. + sizes: Sentence lengths + vocab: Dictionary with the vocabulary and special tokens. + pad_idx: Id of padding token in dictionary + mask_idx: Id of mask token in dictionary + classif_token_idx: Id of classification token in dictionary. This is the + token associated with the sentence embedding (Eg: CLS for BERT) + sep_token_idx: Id of separator token in dictionary + (Eg: SEP in BERT) + seed: Seed for random number generator for reproducibility. + shuffle: Shuffle the elements before batching. + has_pairs: Specifies whether the underlying dataset + generates a pair of blocks along with a sentence_target or not. + Setting it to True assumes that the underlying dataset generates a + label for the pair of sentences which is surfaced as + sentence_target. The default value assumes a single block with no + sentence target. + segment_id: An optional segment id for filling in the segment labels + when we are in the single block setting (Eg: XLM). Default is 0. + masking_ratio: specifies what percentage of the blocks should be masked. + masking_prob: specifies the probability of a given token being + replaced with the "MASK" token. + random_token_prob: specifies the probability of a given token being + replaced by a random token from the vocabulary. + """ + + def __init__( + self, + dataset: FairseqDataset, + sizes: np.ndarray, + vocab: Dictionary, + pad_idx: int, + mask_idx: int, + classif_token_idx: int, + sep_token_idx: int, + seed: int = 1, + shuffle: bool = True, + has_pairs: bool = True, + segment_id: int = 0, + masking_ratio: float = 0.15, + masking_prob: float = 0.8, + random_token_prob: float = 0.1, + ): + # Make sure the input datasets are the ones supported + assert ( + isinstance(dataset, TokenBlockDataset) + or isinstance(dataset, BlockPairDataset) + or isinstance(dataset, ConcatDataset) + ), ( + "MaskedLMDataset only wraps TokenBlockDataset or BlockPairDataset or " + "ConcatDataset" + ) + + self.dataset = dataset + self.sizes = np.array(sizes) + self.vocab = vocab + self.pad_idx = pad_idx + self.mask_idx = mask_idx + self.classif_token_idx = classif_token_idx + self.sep_token_idx = sep_token_idx + self.shuffle = shuffle + self.seed = seed + self.has_pairs = has_pairs + self.segment_id = segment_id + self.masking_ratio = masking_ratio + self.masking_prob = masking_prob + self.random_token_prob = random_token_prob + + # If we have only one block then sizes needs to be updated to include + # the classification token + if not has_pairs: + self.sizes = self.sizes + 1 + + def __getitem__(self, index: int): + # if has_pairs, then expect 2 blocks and a sentence target + if self.has_pairs: + (block_one, block_two, sentence_target) = self.dataset[index] + else: + block_one = self.dataset[index] + + return { + "id": index, + "block_one": block_one, + "block_two": block_two if self.has_pairs else None, + "sentence_target": sentence_target if self.has_pairs else None, + } + + def __len__(self): + return len(self.dataset) + + def _mask_block( + self, + sentence: np.ndarray, + mask_idx: int, + pad_idx: int, + dictionary_token_range: Tuple, + ): + """ + Mask tokens for Masked Language Model training + Samples mask_ratio 
tokens that will be predicted by LM. + + Note:This function may not be efficient enough since we had multiple + conversions between np and torch, we can replace them with torch + operators later. + + Args: + sentence: 1d tensor to be masked + mask_idx: index to use for masking the sentence + pad_idx: index to use for masking the target for tokens we aren't + predicting + dictionary_token_range: range of indices in dictionary which can + be used for random word replacement + (e.g. without special characters) + Return: + masked_sent: masked sentence + target: target with words which we are not predicting replaced + by pad_idx + """ + masked_sent = np.copy(sentence) + sent_length = len(sentence) + mask_num = math.ceil(sent_length * self.masking_ratio) + mask = np.random.choice(sent_length, mask_num, replace=False) + target = np.copy(sentence) + + for i in range(sent_length): + if i in mask: + rand = np.random.random() + + # replace with mask if probability is less than masking_prob + # (Eg: 0.8) + if rand < self.masking_prob: + masked_sent[i] = mask_idx + + # replace with random token if probability is less than + # masking_prob + random_token_prob (Eg: 0.9) + elif rand < (self.masking_prob + self.random_token_prob): + # sample random token from dictionary + masked_sent[i] = np.random.randint( + dictionary_token_range[0], dictionary_token_range[1] + ) + else: + target[i] = pad_idx + + return masked_sent, target + + def _collate(self, samples: List[Dict], pad_idx: int, eos_idx: int): + """ + Does the heavy lifting for creating a batch from the input list of + examples. The logic is as follows: + 1. Mask the input blocks. In case has_pair is True then we have 2 + blocks to mask. + 2. Prepend the first masked block tensor with the special token + used as sentence embedding. Eg: CLS in BERT. This happens + irrespective of the value of has_pair. + 3. If has_pair is True, then append the first masked block with the + special separator token (eg: SEP for BERT) and compute segment + label accordingly. In this case, also append the second masked + block with this special separator token and compute its segment + label. + 4. For the targets tensor, prepend and append with padding index + accordingly. + 5. Concatenate all tensors. + """ + if len(samples) == 0: + return {} + # To ensure determinism, we reset the state of the PRNG after every + # batch based on the seed and the first id of the batch. This ensures + # that across epochs we get the same mask for the same example. This + # is needed for reproducibility and is how BERT does masking + # TODO: Can we add deteminism without this constraint? + with data_utils.numpy_seed(self.seed + samples[0]["id"]): + for s in samples: + + # token range is needed for replacing with random token during + # masking + token_range = (self.vocab.nspecial, len(self.vocab)) + + # mask according to specified probabilities. + masked_blk_one, masked_tgt_one = self._mask_block( + s["block_one"], + self.mask_idx, + self.pad_idx, + token_range, + ) + + tokens = np.concatenate([[self.classif_token_idx], masked_blk_one]) + targets = np.concatenate([[self.pad_idx], masked_tgt_one]) + segments = np.ones(len(tokens)) * self.segment_id + + # if has_pairs is True then we need to add the SEP token to both + # the blocks after masking and re-compute segments based on the new + # lengths. 
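+                # (Added illustrative note: the resulting pair layout is
+                #    tokens:   [CLS] a_1 ... a_n [SEP] b_1 ... b_m [SEP]
+                #    segments:   0    0  ...  0    0    1  ...  1    1
+                #  with the LM targets padded out at the CLS and SEP positions.)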
+ if self.has_pairs: + tokens_one = np.concatenate([tokens, [self.sep_token_idx]]) + targets_one = np.concatenate([targets, [self.pad_idx]]) + + masked_blk_two, masked_tgt_two = self._mask_block( + s["block_two"], self.mask_idx, self.pad_idx, token_range + ) + tokens_two = np.concatenate([masked_blk_two, [self.sep_token_idx]]) + targets_two = np.concatenate([masked_tgt_two, [self.pad_idx]]) + + # block + 1 sep + 1 special (CLS) + segments_one = np.zeros(len(tokens_one)) + # block + 1 sep + segments_two = np.ones(len(tokens_two)) + + tokens = np.concatenate([tokens_one, tokens_two]) + targets = np.concatenate([targets_one, targets_two]) + segments = np.concatenate([segments_one, segments_two]) + + s["source"] = torch.LongTensor(tokens) + s["segment_labels"] = torch.LongTensor(segments) + s["lm_target"] = torch.LongTensor(targets) + + def merge(key): + return data_utils.collate_tokens( + [s[key] for s in samples], pad_idx, eos_idx, left_pad=False + ) + + return { + "id": torch.LongTensor([s["id"] for s in samples]), + "ntokens": sum(len(s["source"]) for s in samples), + "net_input": { + "src_tokens": merge("source"), + "segment_labels": merge("segment_labels"), + }, + "lm_target": merge("lm_target"), + "sentence_target": torch.LongTensor([s["sentence_target"] for s in samples]) + if self.has_pairs + else None, + "nsentences": len(samples), + } + + def collater(self, samples: List[Dict]): + """Merge a list of samples to form a mini-batch. + + Args: + samples (List[dict]): samples to collate + + Returns: + dict: a mini-batch of data + """ + return self._collate(samples, self.vocab.pad(), self.vocab.eos()) + + def num_tokens(self, index: int): + """ + Return the number of tokens in a sample. This value is used to + enforce max-tokens during batching. + """ + return self.sizes[index] + + def size(self, index: int): + """ + Return an example's size as a float or tuple. This value is used when + filtering a dataset with max-positions. + """ + return self.sizes[index] + + def ordered_indices(self): + """ + Return an ordered list of indices. Batches will be constructed based + on this order. + """ + if self.shuffle: + return np.random.permutation(len(self)) + else: + order = [np.arange(len(self))] + order.append(self.sizes) + return np.lexsort(order) + + @property + def supports_prefetch(self): + return getattr(self.dataset, "supports_prefetch", False) + + def prefetch(self, indices): + self.dataset.prefetch(indices) diff --git a/SpeechT5/fairseq/fairseq/data/legacy/masked_lm_dictionary.py b/SpeechT5/fairseq/fairseq/data/legacy/masked_lm_dictionary.py new file mode 100644 index 0000000000000000000000000000000000000000..dee88f7a3ed72ea465ea4e8ffe7b1c01ff6f57f1 --- /dev/null +++ b/SpeechT5/fairseq/fairseq/data/legacy/masked_lm_dictionary.py @@ -0,0 +1,60 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +from fairseq.data import Dictionary + + +class MaskedLMDictionary(Dictionary): + """ + Dictionary for Masked Language Modelling tasks. This extends Dictionary by + adding the mask symbol. 
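+
+    Example (illustrative, not part of the upstream docstring)::
+
+        >>> d = MaskedLMDictionary()
+        >>> d.mask() == d.index("<mask>")
+        True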
+ """ + + def __init__( + self, + pad="<pad>", + eos="</s>", + unk="<unk>", + mask="<mask>", + ): + super().__init__(pad=pad, eos=eos, unk=unk) + self.mask_word = mask + self.mask_index = self.add_symbol(mask) + self.nspecial = len(self.symbols) + + def mask(self): + """Helper to get index of mask symbol""" + return self.mask_index + + +class BertDictionary(MaskedLMDictionary): + """ + Dictionary for BERT task. This extends MaskedLMDictionary by adding support + for cls and sep symbols. + """ + + def __init__( + self, + pad="<pad>", + eos="</s>", + unk="<unk>", + mask="<mask>", + cls="<cls>", + sep="<sep>", + ): + super().__init__(pad=pad, eos=eos, unk=unk, mask=mask) + self.cls_word = cls + self.sep_word = sep + self.cls_index = self.add_symbol(cls) + self.sep_index = self.add_symbol(sep) + self.nspecial = len(self.symbols) + + def cls(self): + """Helper to get index of cls symbol""" + return self.cls_index + + def sep(self): + """Helper to get index of sep symbol""" + return self.sep_index diff --git a/SpeechT5/fairseq/fairseq/data/list_dataset.py b/SpeechT5/fairseq/fairseq/data/list_dataset.py new file mode 100644 index 0000000000000000000000000000000000000000..12f00aa43661d6bad701c9e72653ba8779136906 --- /dev/null +++ b/SpeechT5/fairseq/fairseq/data/list_dataset.py @@ -0,0 +1,32 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +from . import BaseWrapperDataset + + +class ListDataset(BaseWrapperDataset): + def __init__(self, dataset, sizes=None): + super().__init__(dataset) + self._sizes = sizes + + def __iter__(self): + for x in self.dataset: + yield x + + def collater(self, samples): + return samples + + @property + def sizes(self): + return self._sizes + + def num_tokens(self, index): + return self.sizes[index] + + def size(self, index): + return self.sizes[index] + + def set_epoch(self, epoch): + pass diff --git a/SpeechT5/fairseq/fairseq/data/lm_context_window_dataset.py b/SpeechT5/fairseq/fairseq/data/lm_context_window_dataset.py new file mode 100644 index 0000000000000000000000000000000000000000..1a945927cf0d96719003685676a990737a3762b2 --- /dev/null +++ b/SpeechT5/fairseq/fairseq/data/lm_context_window_dataset.py @@ -0,0 +1,97 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +import numpy as np +import torch +from typing import Dict + +from fairseq.data.monolingual_dataset import MonolingualDataset + +from . import FairseqDataset + + +class LMContextWindowDataset(FairseqDataset): + """ + Wraps a MonolingualDataset and provides more context for evaluation. + + Each item in the new dataset will have a maximum size of + ``tokens_per_sample + context_window``. 
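+
+    For example (an added illustration of the ``collater`` below, not part of
+    the upstream docstring): with ``tokens_per_sample=512`` and
+    ``context_window=256``, up to 256 tokens carried over from previously
+    collated samples are prepended to each item, the targets at those context
+    positions are set to ``pad_idx`` so they are not scored, and the returned
+    batch records where each item's own tokens begin in
+    ``sample["start_indices"]``.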
+ + Args: + dataset: dataset to wrap + tokens_per_sample (int): the max number of tokens in each dataset item + context_window (int): the number of accumulated tokens to add to each + dataset item + pad_idx (int): padding symbol + """ + + def __init__( + self, + dataset: MonolingualDataset, + tokens_per_sample: int, + context_window: int, + pad_idx: int, + ): + assert context_window > 0 + self.dataset = dataset + self.tokens_per_sample = tokens_per_sample + self.context_window = context_window + self.pad_idx = pad_idx + self.prev_tokens = np.empty([0]) + + def __getitem__(self, index): + return self.dataset[index] + + def __len__(self): + return len(self.dataset) + + def collater(self, samples) -> Dict: + sample = self.dataset.collater(samples) + + pad = self.pad_idx + max_sample_len = self.tokens_per_sample + self.context_window + + bsz, tsz = sample["net_input"]["src_tokens"].shape + start_idxs = [0] * bsz + toks = sample["net_input"]["src_tokens"] + lengths = sample["net_input"]["src_lengths"] + tgt = sample["target"] + new_toks = np.empty([bsz, tsz + self.context_window], dtype=np.int64) + new_tgt = np.full([bsz, tsz + self.context_window], pad, dtype=np.int64) + sample_lens = toks.ne(pad).long().sum(dim=1).cpu() + for i in range(bsz): + sample_len = sample_lens[i] + extra = len(self.prev_tokens) + sample_len - max_sample_len + if extra > 0: + self.prev_tokens = self.prev_tokens[extra:] + pads = np.full(self.context_window - len(self.prev_tokens), pad) + new_toks[i] = np.concatenate([self.prev_tokens, toks[i].numpy(), pads]) + new_tgt[ + i, len(self.prev_tokens) : len(self.prev_tokens) + len(tgt[i]) + ] = tgt[i] + start_idxs[i] = len(self.prev_tokens) + lengths[i] += len(self.prev_tokens) + self.prev_tokens = new_toks[i][new_toks[i] != pad][-self.context_window :] + sample["net_input"]["src_tokens"] = torch.from_numpy(new_toks) + sample["target"] = torch.from_numpy(new_tgt) + sample["start_indices"] = start_idxs + return sample + + def num_tokens(self, index): + return self.dataset.num_tokens(index) + + def size(self, index): + return self.dataset.size(index) + + def ordered_indices(self): + # NOTE we don't shuffle the data to retain access to the previous dataset elements + return np.arange(len(self.dataset)) + + @property + def supports_prefetch(self): + return getattr(self.dataset, "supports_prefetch", False) + + def prefetch(self, indices): + return self.dataset.prefetch(indices) diff --git a/SpeechT5/fairseq/fairseq/data/lru_cache_dataset.py b/SpeechT5/fairseq/fairseq/data/lru_cache_dataset.py new file mode 100644 index 0000000000000000000000000000000000000000..a7854ac1701392754ce5795cafe9c634671aebdf --- /dev/null +++ b/SpeechT5/fairseq/fairseq/data/lru_cache_dataset.py @@ -0,0 +1,21 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +from functools import lru_cache + +from . 
import BaseWrapperDataset + + +class LRUCacheDataset(BaseWrapperDataset): + def __init__(self, dataset, token=None): + super().__init__(dataset) + + @lru_cache(maxsize=8) + def __getitem__(self, index): + return self.dataset[index] + + @lru_cache(maxsize=8) + def collater(self, samples): + return self.dataset.collater(samples) diff --git a/SpeechT5/fairseq/fairseq/data/mask_tokens_dataset.py b/SpeechT5/fairseq/fairseq/data/mask_tokens_dataset.py new file mode 100644 index 0000000000000000000000000000000000000000..9123235594c3977994a3ae8a03ab4c9e395cc5de --- /dev/null +++ b/SpeechT5/fairseq/fairseq/data/mask_tokens_dataset.py @@ -0,0 +1,220 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +from functools import lru_cache + +import numpy as np +import torch +from fairseq.data import Dictionary, data_utils + +from . import BaseWrapperDataset, LRUCacheDataset + + +class MaskTokensDataset(BaseWrapperDataset): + """ + A wrapper Dataset for masked language modeling. + + Input items are masked according to the specified masking probability. + + Args: + dataset: Dataset to wrap. + sizes: Sentence lengths + vocab: Dictionary with the vocabulary and special tokens. + pad_idx: Id of pad token in vocab + mask_idx: Id of mask token in vocab + return_masked_tokens: controls whether to return the non-masked tokens + (the default) or to return a tensor with the original masked token + IDs (and *pad_idx* elsewhere). The latter is useful as targets for + masked LM training. + seed: Seed for random number generator for reproducibility. + mask_prob: probability of replacing a token with *mask_idx*. + leave_unmasked_prob: probability that a masked token is unmasked. + random_token_prob: probability of replacing a masked token with a + random token from the vocabulary. + freq_weighted_replacement: sample random replacement words based on + word frequencies in the vocab. + mask_whole_words: only mask whole words. This should be a byte mask + over vocab indices, indicating whether it is the beginning of a + word. We will extend any mask to encompass the whole word. + bpe: BPE to use for whole-word masking. + mask_multiple_length : repeat each mask index multiple times. Default + value is 1. + mask_stdev : standard deviation of masks distribution in case of + multiple masking. Default value is 0. 
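+
+    Example (an illustrative sketch, not part of the upstream docstring; any
+    map-style dataset whose items are 1-D LongTensors over *vocab* will do,
+    a plain Python list is used here only for brevity)::
+
+        >>> from fairseq.data import Dictionary
+        >>> vocab = Dictionary()
+        >>> ids = [vocab.add_symbol(w) for w in ("a", "b", "c", "d")]
+        >>> mask_idx = vocab.add_symbol("<mask>")
+        >>> src_ds, tgt_ds = MaskTokensDataset.apply_mask(
+        ...     [torch.LongTensor(ids)],
+        ...     vocab,
+        ...     pad_idx=vocab.pad(),
+        ...     mask_idx=mask_idx,
+        ...     seed=1,
+        ... )
+        >>> # src_ds[0] is the sentence with ~mask_prob of its positions
+        >>> # replaced by <mask> (or kept / randomized according to
+        >>> # leave_unmasked_prob and random_token_prob); tgt_ds[0] keeps the
+        >>> # original ids at the masked positions and pad_idx everywhere else.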
+ """ + + @classmethod + def apply_mask(cls, dataset: torch.utils.data.Dataset, *args, **kwargs): + """Return the source and target datasets for masked LM training.""" + dataset = LRUCacheDataset(dataset) + return ( + LRUCacheDataset(cls(dataset, *args, **kwargs, return_masked_tokens=False)), + LRUCacheDataset(cls(dataset, *args, **kwargs, return_masked_tokens=True)), + ) + + def __init__( + self, + dataset: torch.utils.data.Dataset, + vocab: Dictionary, + pad_idx: int, + mask_idx: int, + return_masked_tokens: bool = False, + seed: int = 1, + mask_prob: float = 0.15, + leave_unmasked_prob: float = 0.1, + random_token_prob: float = 0.1, + freq_weighted_replacement: bool = False, + mask_whole_words: torch.Tensor = None, + mask_multiple_length: int = 1, + mask_stdev: float = 0.0, + ): + assert 0.0 < mask_prob < 1.0 + assert 0.0 <= random_token_prob <= 1.0 + assert 0.0 <= leave_unmasked_prob <= 1.0 + assert random_token_prob + leave_unmasked_prob <= 1.0 + assert mask_multiple_length >= 1 + assert mask_stdev >= 0.0 + + self.dataset = dataset + self.vocab = vocab + self.pad_idx = pad_idx + self.mask_idx = mask_idx + self.return_masked_tokens = return_masked_tokens + self.seed = seed + self.mask_prob = mask_prob + self.leave_unmasked_prob = leave_unmasked_prob + self.random_token_prob = random_token_prob + self.mask_whole_words = mask_whole_words + self.mask_multiple_length = mask_multiple_length + self.mask_stdev = mask_stdev + + if random_token_prob > 0.0: + if freq_weighted_replacement: + weights = np.array(self.vocab.count) + else: + weights = np.ones(len(self.vocab)) + weights[: self.vocab.nspecial] = 0 + self.weights = weights / weights.sum() + + self.epoch = 0 + + @property + def can_reuse_epoch_itr_across_epochs(self): + return True # only the noise changes, not item sizes + + def set_epoch(self, epoch, **unused): + super().set_epoch(epoch) + self.epoch = epoch + + def __getitem__(self, index: int): + return self.__getitem_cached__(self.seed, self.epoch, index) + + @lru_cache(maxsize=8) + def __getitem_cached__(self, seed: int, epoch: int, index: int): + with data_utils.numpy_seed(self.seed, self.epoch, index): + item = self.dataset[index] + sz = len(item) + + assert ( + self.mask_idx not in item + ), "Dataset contains mask_idx (={}), this is not expected!".format( + self.mask_idx, + ) + + if self.mask_whole_words is not None: + word_begins_mask = self.mask_whole_words.gather(0, item) + word_begins_idx = word_begins_mask.nonzero().view(-1) + sz = len(word_begins_idx) + words = np.split(word_begins_mask, word_begins_idx)[1:] + assert len(words) == sz + word_lens = list(map(len, words)) + + # decide elements to mask + mask = np.full(sz, False) + num_mask = int( + # add a random number for probabilistic rounding + self.mask_prob * sz / float(self.mask_multiple_length) + + np.random.rand() + ) + + # multiple masking as described in the vq-wav2vec paper (https://arxiv.org/abs/1910.05453) + mask_idc = np.random.choice(sz, num_mask, replace=False) + if self.mask_stdev > 0.0: + lengths = np.random.normal( + self.mask_multiple_length, self.mask_stdev, size=num_mask + ) + lengths = [max(0, int(round(x))) for x in lengths] + mask_idc = np.asarray( + [ + mask_idc[j] + offset + for j in range(len(mask_idc)) + for offset in range(lengths[j]) + ], + dtype=np.int64, + ) + else: + mask_idc = np.concatenate( + [mask_idc + i for i in range(self.mask_multiple_length)] + ) + mask_idc = mask_idc[mask_idc < len(mask)] + try: + mask[mask_idc] = True + except: # something wrong + print( + "Assigning mask indexes 
{} to mask {} failed!".format( + mask_idc, mask + ) + ) + raise + + if self.return_masked_tokens: + # exit early if we're just returning the masked tokens + # (i.e., the targets for masked LM training) + if self.mask_whole_words is not None: + mask = np.repeat(mask, word_lens) + new_item = np.full(len(mask), self.pad_idx) + new_item[mask] = item[torch.from_numpy(mask.astype(np.uint8)) == 1] + return torch.from_numpy(new_item) + + # decide unmasking and random replacement + rand_or_unmask_prob = self.random_token_prob + self.leave_unmasked_prob + if rand_or_unmask_prob > 0.0: + rand_or_unmask = mask & (np.random.rand(sz) < rand_or_unmask_prob) + if self.random_token_prob == 0.0: + unmask = rand_or_unmask + rand_mask = None + elif self.leave_unmasked_prob == 0.0: + unmask = None + rand_mask = rand_or_unmask + else: + unmask_prob = self.leave_unmasked_prob / rand_or_unmask_prob + decision = np.random.rand(sz) < unmask_prob + unmask = rand_or_unmask & decision + rand_mask = rand_or_unmask & (~decision) + else: + unmask = rand_mask = None + + if unmask is not None: + mask = mask ^ unmask + + if self.mask_whole_words is not None: + mask = np.repeat(mask, word_lens) + + new_item = np.copy(item) + new_item[mask] = self.mask_idx + if rand_mask is not None: + num_rand = rand_mask.sum() + if num_rand > 0: + if self.mask_whole_words is not None: + rand_mask = np.repeat(rand_mask, word_lens) + num_rand = rand_mask.sum() + + new_item[rand_mask] = np.random.choice( + len(self.vocab), + num_rand, + p=self.weights, + ) + + return torch.from_numpy(new_item) diff --git a/SpeechT5/fairseq/fairseq/data/monolingual_dataset.py b/SpeechT5/fairseq/fairseq/data/monolingual_dataset.py new file mode 100644 index 0000000000000000000000000000000000000000..54fd583b64a3a475324ade6eaaeccf593d747fdc --- /dev/null +++ b/SpeechT5/fairseq/fairseq/data/monolingual_dataset.py @@ -0,0 +1,253 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +import numpy as np +import torch + +from . import FairseqDataset, data_utils + + +def collate(samples, pad_idx, eos_idx, fixed_pad_length=None, pad_to_bsz=None): + if len(samples) == 0: + return {} + + def merge(key, is_list=False): + if is_list: + res = [] + for i in range(len(samples[0][key])): + res.append( + data_utils.collate_tokens( + [s[key][i] for s in samples], + pad_idx, + eos_idx, + left_pad=False, + pad_to_length=fixed_pad_length, + pad_to_bsz=pad_to_bsz, + ) + ) + return res + else: + return data_utils.collate_tokens( + [s[key] for s in samples], + pad_idx, + eos_idx, + left_pad=False, + pad_to_length=fixed_pad_length, + pad_to_bsz=pad_to_bsz, + ) + + src_tokens = merge("source") + if samples[0]["target"] is not None: + is_target_list = isinstance(samples[0]["target"], list) + target = merge("target", is_target_list) + else: + target = src_tokens + + return { + "id": torch.LongTensor([s["id"] for s in samples]), + "nsentences": len(samples), + "ntokens": sum(len(s["source"]) for s in samples), + "net_input": { + "src_tokens": src_tokens, + "src_lengths": torch.LongTensor([s["source"].numel() for s in samples]), + }, + "target": target, + } + + +class MonolingualDataset(FairseqDataset): + """ + A wrapper around torch.utils.data.Dataset for monolingual data. 
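As a side note (illustrative only, not part of the diff), the module-level `collate()` helper above can be exercised on its own to see the mini-batch layout it produces; the pad/eos indices below assume fairseq's usual `Dictionary` defaults.

```python
import torch
from fairseq.data.monolingual_dataset import collate

# Two toy samples; ids 4-9 stand in for real vocabulary indices.
samples = [
    {"id": 0, "source": torch.LongTensor([4, 5, 2]), "target": torch.LongTensor([5, 2, 1])},
    {"id": 1, "source": torch.LongTensor([6, 7, 8, 9, 2]), "target": torch.LongTensor([7, 8, 9, 2, 1])},
]
batch = collate(samples, pad_idx=1, eos_idx=2)
print(batch["net_input"]["src_tokens"].shape)  # torch.Size([2, 5]); right-padded
print(batch["net_input"]["src_lengths"])       # tensor([3, 5])
print(batch["ntokens"])                        # 8
```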
+ + Args: + dataset (torch.utils.data.Dataset): dataset to wrap + sizes (List[int]): sentence lengths + vocab (~fairseq.data.Dictionary): vocabulary + shuffle (bool, optional): shuffle the elements before batching + (default: True). + """ + + def __init__( + self, + dataset, + sizes, + src_vocab, + tgt_vocab=None, + add_eos_for_other_targets=False, + shuffle=False, + targets=None, + add_bos_token=False, + fixed_pad_length=None, + pad_to_bsz=None, + src_lang_idx=None, + tgt_lang_idx=None, + ): + self.dataset = dataset + self.sizes = np.array(sizes) + self.vocab = src_vocab + self.tgt_vocab = tgt_vocab or src_vocab + self.add_eos_for_other_targets = add_eos_for_other_targets + self.shuffle = shuffle + self.add_bos_token = add_bos_token + self.fixed_pad_length = fixed_pad_length + self.pad_to_bsz = pad_to_bsz + self.src_lang_idx = src_lang_idx + self.tgt_lang_idx = tgt_lang_idx + + assert targets is None or all( + t in {"self", "future", "past"} for t in targets + ), "targets must be none or one of 'self', 'future', 'past'" + if targets is not None and len(targets) == 0: + targets = None + self.targets = targets + + def __getitem__(self, index): + if self.targets is not None: + # *future_target* is the original sentence + # *source* is shifted right by 1 (maybe left-padded with eos) + # *past_target* is shifted right by 2 (left-padded as needed) + # + # Left-to-right language models should condition on *source* and + # predict *future_target*. + # Right-to-left language models should condition on *source* and + # predict *past_target*. + source, future_target, past_target = self.dataset[index] + source, target = self._make_source_target( + source, future_target, past_target + ) + else: + source = self.dataset[index] + target = None + source, target = self._maybe_add_bos(source, target) + return {"id": index, "source": source, "target": target} + + def __len__(self): + return len(self.dataset) + + def _make_source_target(self, source, future_target, past_target): + if self.targets is not None: + target = [] + + if ( + self.add_eos_for_other_targets + and (("self" in self.targets) or ("past" in self.targets)) + and source[-1] != self.vocab.eos() + ): + # append eos at the end of source + source = torch.cat([source, source.new([self.vocab.eos()])]) + + if "future" in self.targets: + future_target = torch.cat( + [future_target, future_target.new([self.vocab.pad()])] + ) + if "past" in self.targets: + # first token is before the start of sentence which is only used in "none" break mode when + # add_eos_for_other_targets is False + past_target = torch.cat( + [ + past_target.new([self.vocab.pad()]), + past_target[1:], + source[-2, None], + ] + ) + + for t in self.targets: + if t == "self": + target.append(source) + elif t == "future": + target.append(future_target) + elif t == "past": + target.append(past_target) + else: + raise Exception("invalid target " + t) + + if len(target) == 1: + target = target[0] + else: + target = future_target + + return source, self._filter_vocab(target) + + def _maybe_add_bos(self, source, target): + if self.add_bos_token: + source = torch.cat([source.new([self.vocab.bos()]), source]) + if target is not None: + target = torch.cat([target.new([self.tgt_vocab.bos()]), target]) + return source, target + + def num_tokens_vec(self, indices): + """Return the number of tokens for a set of positions defined by indices. 
+ This value is used to enforce ``--max-tokens`` during batching.""" + return self.sizes[indices] + + def _filter_vocab(self, target): + if len(self.tgt_vocab) != len(self.vocab): + + def _filter(target): + mask = target.ge(len(self.tgt_vocab)) + if mask.any(): + target[mask] = self.tgt_vocab.unk() + return target + + if isinstance(target, list): + return [_filter(t) for t in target] + return _filter(target) + return target + + def collater(self, samples): + """Merge a list of samples to form a mini-batch. + + Args: + samples (List[dict]): samples to collate + + Returns: + dict: a mini-batch with the following keys: + + - `id` (LongTensor): example IDs in the original input order + - `ntokens` (int): total number of tokens in the batch + - `net_input` (dict): the input to the Model, containing keys: + + - `src_tokens` (LongTensor): a padded 2D Tensor of tokens in + the source sentence of shape `(bsz, src_len)`. Padding will + appear on the right. + + - `target` (LongTensor): a padded 2D Tensor of tokens in the + target sentence of shape `(bsz, tgt_len)`. Padding will appear + on the right. + """ + return collate( + samples, + self.vocab.pad(), + self.vocab.eos(), + self.fixed_pad_length, + self.pad_to_bsz, + ) + + def num_tokens(self, index): + """Return the number of tokens in a sample. This value is used to + enforce ``--max-tokens`` during batching.""" + return self.sizes[index] + + def size(self, index): + """Return an example's size as a float or tuple. This value is used when + filtering a dataset with ``--max-positions``.""" + return self.sizes[index] + + def ordered_indices(self): + """Return an ordered list of indices. Batches will be constructed based + on this order.""" + if self.shuffle: + order = [np.random.permutation(len(self))] + else: + order = [np.arange(len(self))] + order.append(self.sizes) + return np.lexsort(order) + + @property + def supports_prefetch(self): + return getattr(self.dataset, "supports_prefetch", False) + + def prefetch(self, indices): + self.dataset.prefetch(indices) diff --git a/SpeechT5/fairseq/fairseq/data/multi_corpus_dataset.py b/SpeechT5/fairseq/fairseq/data/multi_corpus_dataset.py new file mode 100644 index 0000000000000000000000000000000000000000..1bd61c32ebcc57759c210f320dd8ac7386c6193d --- /dev/null +++ b/SpeechT5/fairseq/fairseq/data/multi_corpus_dataset.py @@ -0,0 +1,240 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +import logging +import time +from collections import OrderedDict +from typing import Dict, List + +import numpy as np +from fairseq.data import data_utils + +from . import FairseqDataset + +logger = logging.getLogger(__name__) + + +class MultiCorpusDataset(FairseqDataset): + """ + Stores multiple instances of FairseqDataset together. Requires each instance + to be the same dataset, as the collate method needs to work on batches with + samples from each dataset. + + Allows specifying a distribution over the datasets to use. Note that unlike + MultiCorpusSampledDataset, this distribution allows sampling for each item, + rather than on a batch level. + + Each time ordered_indices() is called, a new sample is generated with + the specified distribution. + + Args: + datasets: a OrderedDict of FairseqDataset instances. 
+ distribution: a List containing the probability of getting an utterance from + corresponding dataset + seed: random seed for sampling the datsets + sort_indices: if true, will sort the ordered indices by size + batch_sample: if true, will ensure each batch is from a single dataset + """ + + def __init__( + self, + datasets: Dict[str, FairseqDataset], + distribution: List[float], + seed: int, + sort_indices: bool = False, + batch_sample: bool = False, + distributed_rank=None, + ): + super().__init__() + assert isinstance(datasets, OrderedDict) + assert len(datasets) == len(distribution) + assert sum(distribution) == 1 + self.datasets = datasets + self.distribution = distribution + self.seed = seed + self.sort_indices = sort_indices + self.batch_sample = batch_sample + self.distributed_rank = distributed_rank + + # Avoid repeated conversions to list later + self.dataset_list = list(datasets.values()) + self.total_num_instances = 0 + + first_dataset = list(self.datasets.values())[0] + + self.dataset_offsets = [] + for dataset in datasets.values(): + assert isinstance(dataset, FairseqDataset) + assert type(dataset) is type(first_dataset) + self.dataset_offsets.append(self.total_num_instances) + self.total_num_instances += len(dataset) + + def ordered_indices(self): + start = time.time() + with data_utils.numpy_seed(self.seed, self.epoch): + logger.info(f"sampling new dataset with seed {self.seed} epoch {self.epoch}") + sampled_indices = [] + num_selected_instances = 0 + + # For each dataset i, sample self.distribution[i] * self.total_num_instances + for i, key in enumerate(self.datasets): + + if i < len(self.datasets) - 1: + num_instances = int(self.distribution[i] * self.total_num_instances) + high = self.dataset_offsets[i + 1] + else: + num_instances = self.total_num_instances - num_selected_instances + high = self.total_num_instances + + logger.info(f"sampling {num_instances} from {key} dataset") + num_selected_instances += num_instances + + # First, add k copies of the dataset where k = num_instances // len(dataset). + # This ensures an equal distribution of the data points as much as possible. + # For the remaining entries randomly sample them + dataset_size = len(self.datasets[key]) + num_copies = num_instances // dataset_size + dataset_indices = ( + np.random.permutation(high - self.dataset_offsets[i]) + + self.dataset_offsets[i] + )[: num_instances - num_copies * dataset_size] + if num_copies > 0: + sampled_indices += list( + np.concatenate( + ( + np.repeat( + np.arange(self.dataset_offsets[i], high), num_copies + ), + dataset_indices, + ) + ) + ) + else: + sampled_indices += list(dataset_indices) + + assert ( + len(sampled_indices) == self.total_num_instances + ), f"{len(sampled_indices)} vs {self.total_num_instances}" + + np.random.shuffle(sampled_indices) + if self.sort_indices: + sampled_indices.sort(key=lambda i: self.num_tokens(i)) + + logger.info( + "multi_corpus_dataset ordered_indices took {}s".format( + time.time() - start + ) + ) + return np.array(sampled_indices, dtype=np.int64) + + def _map_index(self, index: int): + """ + If dataset A has length N and dataset B has length M + then index 1 maps to index 1 of dataset A, and index N + 1 + maps to index 1 of B. 
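To make the flat-index mapping described above concrete, here is a self-contained sketch of the same logic with made-up dataset lengths (not taken from the diff):

```python
from collections import OrderedDict

lengths = OrderedDict([("A", 5), ("B", 3)])  # dataset A owns indices 0-4, B owns 5-7

def map_index(index):
    counter = 0
    for key, n in lengths.items():
        if index < counter + n:
            return index - counter, key
        counter += n
    raise ValueError(f"Invalid index: {index}, max: {counter}")

print(map_index(1))  # (1, 'A')
print(map_index(5))  # (0, 'B') -- the first item of the second dataset
print(map_index(7))  # (2, 'B') -- the last valid flat index
```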
+ """ + counter = 0 + for key, dataset in self.datasets.items(): + if index < counter + len(dataset): + return index - counter, key + counter += len(dataset) + raise ValueError( + "Invalid index: {}, max: {}".format(index, self.total_num_instances) + ) + + def __len__(self): + """ + Length of this dataset is the sum of individual datasets + """ + return self.total_num_instances + + def __getitem__(self, index): + new_index, key = self._map_index(index) + try: + item = self.datasets[key][new_index] + item["full_id"] = index + return item + except Exception as e: + e.args = (f"Error from {key} dataset", *e.args) + raise + + def collater(self, samples): + """ + If we are doing batch sampling, then pick the right collater to use. + + Otherwise we assume all collaters are the same. + """ + if len(samples) == 0: + return None + if "full_id" in samples[0]: + _, key = self._map_index(samples[0]["full_id"]) + return self.datasets[key].collater(samples) + else: + # Subclasses may override __getitem__ to not specify full_id + return list(self.datasets.values())[0].collater(samples) + + def num_tokens(self, index: int): + index, key = self._map_index(index) + return self.datasets[key].num_tokens(index) + + def size(self, index: int): + index, key = self._map_index(index) + return self.datasets[key].size(index) + + @property + def can_reuse_epoch_itr_across_epochs(self): + return False + + def set_epoch(self, epoch, **unused): + super().set_epoch(epoch) + logger.info(f"setting epoch of multi_corpus_dataset to {epoch}") + self.epoch = epoch + + @property + def supports_prefetch(self): + return False + + @property + def supports_fetch_outside_dataloader(self): + return all( + self.datasets[key].supports_fetch_outside_dataloader + for key in self.datasets + ) + + def batch_by_size( + self, + indices, + max_tokens=None, + max_sentences=None, + required_batch_size_multiple=1, + ): + if not self.batch_sample: + return super().batch_by_size( + indices, max_tokens, max_sentences, required_batch_size_multiple + ) + + dataset_indices = {key: [] for key in self.datasets} + for i in indices: + _, key = self._map_index(i) + dataset_indices[key].append(i) + + batches = [] + for key in dataset_indices: + cur_batches = super().batch_by_size( + np.array(dataset_indices[key], dtype=np.int64), + max_tokens, + max_sentences, + required_batch_size_multiple, + ) + logger.info(f"Created {len(cur_batches)} batches for dataset {key}") + batches += cur_batches + + # If this dataset is used in a distributed training setup, + # then shuffle such that the order is seeded by the distributed rank + # as well + if self.distributed_rank is not None: + with data_utils.numpy_seed(self.seed, self.epoch, self.distributed_rank): + np.random.shuffle(batches) + return batches diff --git a/SpeechT5/fairseq/fairseq/data/multi_corpus_sampled_dataset.py b/SpeechT5/fairseq/fairseq/data/multi_corpus_sampled_dataset.py new file mode 100644 index 0000000000000000000000000000000000000000..e2e9fdf004dd1da519a170a5e8bc225775776f72 --- /dev/null +++ b/SpeechT5/fairseq/fairseq/data/multi_corpus_sampled_dataset.py @@ -0,0 +1,152 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +from collections import OrderedDict +from typing import Callable, Dict, List + +import numpy as np + +from . 
import FairseqDataset + + +def uniform_sampler(x): + # Sample from uniform distribution + return np.random.choice(x, 1).item() + + +class MultiCorpusSampledDataset(FairseqDataset): + """ + Stores multiple instances of FairseqDataset together and in every iteration + creates a batch by first sampling a dataset according to a specified + probability distribution and then getting instances from that dataset. + + Args: + datasets: an OrderedDict of FairseqDataset instances. + sampling_func: A function for sampling over list of dataset keys. + The default strategy is to sample uniformly. + """ + + def __init__( + self, + datasets: Dict[str, FairseqDataset], + sampling_func: Callable[[List], int] = None, + ): + super().__init__() + assert isinstance(datasets, OrderedDict) + self.datasets = datasets + if sampling_func is None: + sampling_func = uniform_sampler + self.sampling_func = sampling_func + + self.total_num_instances = 0 + for _, dataset in datasets.items(): + assert isinstance(dataset, FairseqDataset) + self.total_num_instances += len(dataset) + + self._ordered_indices = None + + def __len__(self): + """ + Length of this dataset is the sum of individual datasets + """ + return self.total_num_instances + + def ordered_indices(self): + """ + Ordered indices for batching. Here we call the underlying + dataset's ordered_indices() so that we get the same random ordering + as we would have from using the underlying dataset directly. + """ + if self._ordered_indices is None: + self._ordered_indices = OrderedDict( + [ + (key, dataset.ordered_indices()) + for key, dataset in self.datasets.items() + ] + ) + return np.arange(len(self)) + + def _map_index_to_dataset(self, key: int, index: int): + """ + Different underlying datasets have different lengths. In order to ensure + we are not accessing an index outside the range of the current dataset + size, we wrap around. This function should be called after we have + created an ordering for this and all underlying datasets. + """ + assert ( + self._ordered_indices is not None + ), "Must call MultiCorpusSampledDataset.ordered_indices() first" + mapped_index = index % len(self.datasets[key]) + return self._ordered_indices[key][mapped_index] + + def __getitem__(self, index: int): + """ + Get the item associated with index from each underlying dataset. + Since index is in the range of [0, TotalNumInstances], we need to + map the index to the dataset before retrieving the item. + """ + return OrderedDict( + [ + (key, dataset[self._map_index_to_dataset(key, index)]) + for key, dataset in self.datasets.items() + ] + ) + + def collater(self, samples: List[Dict]): + """ + Generate a mini-batch for this dataset. + To convert this into a regular mini-batch we use the following + logic: + 1. Select a dataset using the specified probability distribution. + 2. Call the collater function of the selected dataset. + """ + if len(samples) == 0: + return None + + selected_key = self.sampling_func(list(self.datasets.keys())) + selected_samples = [sample[selected_key] for sample in samples] + return self.datasets[selected_key].collater(selected_samples) + + def num_tokens(self, index: int): + """ + Return an example's length (number of tokens), used for batching. Here + we return the max across all examples at index across all underlying + datasets. + """ + return max( + dataset.num_tokens(self._map_index_to_dataset(key, index)) + for key, dataset in self.datasets.items() + ) + + def size(self, index: int): + """ + Return an example's size as a float or tuple. 
Here we return the max + across all underlying datasets. This value is used when filtering a + dataset with max-positions. + """ + return max( + dataset.size(self._map_index_to_dataset(key, index)) + for key, dataset in self.datasets.items() + ) + + @property + def supports_prefetch(self): + return all( + getattr(dataset, "supports_prefetch", False) + for dataset in self.datasets.values() + ) + + def prefetch(self, indices): + for key, dataset in self.datasets.items(): + dataset.prefetch( + [self._map_index_to_dataset(key, index) for index in indices] + ) + + @property + def supports_fetch_outside_dataloader(self): + return all( + self.datasets[key].supports_fetch_outside_dataloader + for key in self.datasets + ) diff --git a/SpeechT5/fairseq/fairseq/data/multilingual/__init__.py b/SpeechT5/fairseq/fairseq/data/multilingual/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..6264236915a7269a4d920ee8213004374dd86a9a --- /dev/null +++ b/SpeechT5/fairseq/fairseq/data/multilingual/__init__.py @@ -0,0 +1,4 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. diff --git a/SpeechT5/fairseq/fairseq/data/multilingual/multilingual_data_manager.py b/SpeechT5/fairseq/fairseq/data/multilingual/multilingual_data_manager.py new file mode 100644 index 0000000000000000000000000000000000000000..a2fae5bf520153a6f41c4bd3410c691c239f9521 --- /dev/null +++ b/SpeechT5/fairseq/fairseq/data/multilingual/multilingual_data_manager.py @@ -0,0 +1,1131 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. 
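Before moving into the multilingual data manager, a short usage sketch for the `MultiCorpusSampledDataset` defined above. The two datasets and the 80/20 weighting are assumptions for illustration; any `FairseqDataset` instances would do.

```python
from collections import OrderedDict
import numpy as np
from fairseq.data.multi_corpus_sampled_dataset import MultiCorpusSampledDataset

def weighted_sampler(keys):
    # `keys` is the list of dataset names; pick one dataset per mini-batch.
    return np.random.choice(keys, 1, p=[0.8, 0.2]).item()

mixed = MultiCorpusSampledDataset(
    OrderedDict([("news", news_dataset), ("web", web_dataset)]),  # assumed datasets
    sampling_func=weighted_sampler,
)
indices = mixed.ordered_indices()  # must be called once before batching
# Each collated batch now comes entirely from "news" (p=0.8) or "web" (p=0.2).
```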
+ +import itertools +import json +import logging +import math +import os +from collections import OrderedDict, defaultdict + +from fairseq import utils +from fairseq.data import ( + AppendTokenDataset, + ConcatDataset, + Dictionary, + LanguagePairDataset, + PrependTokenDataset, + SampledMultiDataset, + SampledMultiEpochDataset, + StripTokenDataset, + TransformEosLangPairDataset, + TruncateDataset, + data_utils, + indexed_dataset, +) +from fairseq.data.multilingual.multilingual_utils import ( + EncoderLangtok, + LangTokSpec, + LangTokStyle, + augment_dictionary, + get_lang_tok, +) +from fairseq.data.multilingual.sampled_multi_dataset import CollateFormat +from fairseq.file_io import PathManager +from fairseq.utils import FileContentsAction, csv_str_list, eval_str_dict + + +logger = logging.getLogger(__name__) + +SRC_DICT_NAME = 'src' +TGT_DICT_NAME = 'tgt' + + +def _lang_id(dic: Dictionary, lang: str): + """Return language ID index.""" + idx = dic.index(lang) + assert idx != dic.unk_index, "cannot find language ID for lang {}".format(lang) + return idx + + +def load_sampling_weights(from_file): + with open(from_file) as f: + weights = json.load(f) + return weights + + +class MultilingualDatasetManager(object): + def __init__(self, args, lang_pairs, langs, dicts, sampling_method): + super().__init__() + self.args = args + self.seed = args.seed + self.lang_pairs = lang_pairs + self.extra_lang_pairs = ( + list( + {p for _, v in args.extra_lang_pairs.items() for p in v.split(",")} + ) + if args.extra_lang_pairs + else [] + ) + self.src_langs = {p.split("-")[0] for p in args.lang_pairs + self.extra_lang_pairs} + self.tgt_langs = {p.split("-")[1] for p in args.lang_pairs + self.extra_lang_pairs} + self.langs = langs + self.dicts = dicts + self.lang_dict = self.create_lang_dictionary(self.langs) + self.sampling_method = sampling_method + self.sampling_scheduler = None + self._has_sharded_data = False + self._num_shards_dict = {} + self._training_data_sizes = defaultdict(lambda: {}) + + @classmethod + def setup_data_manager(cls, args, lang_pairs, langs, dicts, sampling_method): + return MultilingualDatasetManager( + args, lang_pairs, langs, dicts, sampling_method + ) + + @staticmethod + def add_args(parser): + parser.add_argument( + "data", + help="colon separated path to data directories list, \ + will be iterated upon during epochs in round-robin manner", + action=FileContentsAction, + ) + parser.add_argument( + "--langs", + default=None, + type=csv_str_list, + help="a list of languages comma sperated languages which can appear in lang-pairs; " + "note that the ordering determines language token IDs", + ) + parser.add_argument( + "--lang-dict", + default=None, + type=str, + help="an external file which contains a list of " + "languages which can appear in lang-pairs; " + "note that the ordering determines language token IDs; " + "--langs and --lang-dict are two exclusive options", + ) + parser.add_argument('--source-dict', default=None, type=str, + help='path to source dictionary; if specified it will override per language dictionary loading') + parser.add_argument('--target-dict', default=None, type=str, + help='path to target dictionary; if specified it will override per language dictionary loading') + parser.add_argument( + "--lang-tok-style", + default=LangTokStyle.multilingual.value, + type=str, + choices=[LangTokStyle.multilingual.value, LangTokStyle.mbart.value], + help="language token styles", + ) + + parser.add_argument( + "--load-alignments", + action="store_true", + help="load the 
binarized alignments", + ) + parser.add_argument( + "--left-pad-source", + default="True", + type=str, + metavar="BOOL", + help="pad the source on the left", + ) + parser.add_argument( + "--left-pad-target", + default="False", + type=str, + metavar="BOOL", + help="pad the target on the left", + ) + parser.add_argument( + "--max-source-positions", + default=1024, + type=int, + metavar="N", + help="max number of tokens in the source sequence", + ) + parser.add_argument( + "--max-target-positions", + default=1024, + type=int, + metavar="N", + help="max number of tokens in the target sequence", + ) + parser.add_argument( + "--upsample-primary", + default=1, + type=int, + help="amount to upsample primary dataset", + ) + parser.add_argument( + "--truncate-source", + action="store_true", + default=False, + help="truncate source to max-source-positions", + ) + parser.add_argument( + "--encoder-langtok", + default=None, + type=str, + choices=[EncoderLangtok.src.value, EncoderLangtok.tgt.value], + metavar="SRCTGT", + help="prepend to the beginning of source sentence the source or target " + "language token. (src/tgt)", + ) + parser.add_argument( + "--decoder-langtok", + action="store_true", + help="prepend to the beginning of target sentence the target language token", + ) + parser.add_argument( + "--lang-tok-replacing-bos-eos", action="store_true", default=False + ) + parser.add_argument( + "--enable-lang-ids", + default=False, + action="store_true", + help="whether to include language IDs in samples", + ) + parser.add_argument( + "--enable-reservsed-directions-shared-datasets", + default=False, + action="store_true", + help="whether to allow datasets be used in reversed directions", + ) + + parser.add_argument( + "--extra-data", + help='a dictionary of data name to this path, \ + e.g. {"mined", path_to_mined_data, "denoised": path_to_denoised_data}', + type=lambda uf: eval_str_dict(uf, type=str), + default=None, + ) + parser.add_argument( + "--extra-lang-pairs", + help='a dictionary of data name to the language pairs they serve, \ + e.g. {"mined": comma-separated-lang-pairs, "denoised": comma-separated-lang-pairs}', + type=lambda uf: eval_str_dict(uf, type=str), + default=None, + ) + parser.add_argument( + "--fixed-dictionary", + help="Fixed dictionary to use with model path", + default=None, + type=str, + ) + parser.add_argument( + "--langtoks-specs", + help='a list of comma separated data types that a set of language tokens to be specialized for, \ + e.g. "main,dae,mined". There will be a set of language tokens added to the vocab to \ + distinguish languages in different training data types. If not specified, default language \ + tokens per languages will be added', + default=LangTokSpec.main.value, + type=csv_str_list, + ) + parser.add_argument( + "--langtoks", + help='a dictionary of how to add language tokens, \ + e.g. {"mined": (None, "tgt"), "mono_dae": ("src.dae", "tgt"), "main": \ + ("src", "tgt")}, or {"mined": ("src.mined", "tgt")}', + default=None, + type=lambda uf: eval_str_dict(uf, type=str), + ) + parser.add_argument( + "--sampling-weights-from-file", + help='a file contain a python dictionary of how to sample data sets, \ + e.g. { "main:en_XX-es_XX": 0.2, "mined:en_XX-pt_XX": 0.5, \ + "mono_dae:es_XX-es_XX: 0.3, "main:en_xx-fr_XX": 0.8 }', + default=None, + type=str, + ) + parser.add_argument( + "--sampling-weights", + help='a dictionary of how to sample data sets, \ + e.g. 
{ "main:en_XX-es_XX": 0.2, "mined:en_XX-pt_XX": 0.5, \ + "mono_dae:es_XX-es_XX: 0.3, "main:en_xx-fr_XX": 0.8 }', + default=None, + type=lambda uf: eval_str_dict(uf, type=str), + ) + parser.add_argument( + "--virtual-epoch-size", + default=None, + type=int, + help="virtual epoch size to speed up data loading", + ) + parser.add_argument( + "--virtual-data-size", + default=None, + type=int, + help="virtual data size of the whole joint dataset to speed" + "up data loading and have specific dynamic sampling strategy interval", + ) + + @classmethod + def load_langs(cls, args, **kwargs): + if args.lang_dict and args.langs: + raise ValueError("--langs and --lang-dict can not both be specified") + if args.lang_dict is None and args.langs is None: + logger.warning( + "External language dictionary is not provided; " + "use lang-pairs to infer the set of supported languages. " + "The language ordering is not stable which might cause " + "misalignment in pretraining and finetuning." + ) + # infer from lang_pairs as it is + langs = list( + {x for lang_pair in args.lang_pairs for x in lang_pair.split("-")} + ) + langs = sorted(langs) + logger.info(f"inferred language list: {langs}") + elif args.lang_dict: + with open( + PathManager.get_local_path(args.lang_dict), "r", encoding="utf-8" + ) as f: + langs = [lang.strip() for lang in f.readlines() if lang.strip()] + logger.info( + f"loaded language list from {args.lang_dict} as they are ordered in file" + ) + elif args.langs: + langs = args.langs + logger.info( + f"parsed the language list as they are ordered in the option: {langs}" + ) + return langs + + def has_sharded_data(self, split): + return self._has_sharded_data and split == getattr( + self.args, "train_subset", None + ) + + def _shared_collater(self): + return not (self.args.extra_data and "mono_dae" in self.args.extra_data) and ( + not self.args.lang_tok_replacing_bos_eos + ) + + def estimate_global_pass_epoch(self, epoch): + if self.args.virtual_epoch_size is None or self.args.virtual_data_size is None: + return None + # one epoch more for remaining data in each shard + virtual_epochs_per_shard = math.ceil( + self.args.virtual_data_size / self.args.virtual_epoch_size + ) + # note that fairseq epoch / shard_epoch starts from 1 + shard_epoch = (epoch - 1) // virtual_epochs_per_shard + 1 + return shard_epoch + + @classmethod + def prepare(cls, load_dictionary, args, **kargs): + args.left_pad_source = utils.eval_bool(args.left_pad_source) + args.left_pad_target = utils.eval_bool(args.left_pad_target) + + if not hasattr(args, "shuffle_instance"): + args.shuffle_instance = False + if args.langtoks is None: + args.langtoks = {} + if "main" not in args.langtoks: + src_langtok_spec = args.encoder_langtok if args.encoder_langtok else None + tgt_langtok_spec = "tgt" if args.decoder_langtok else None + args.langtoks["main"] = (src_langtok_spec, tgt_langtok_spec) + + def check_langs(langs, pairs): + messages = [] + for src, tgt in pairs: + if src not in langs or tgt not in langs: + messages.append( + f"language pair {src}-{tgt} contains languages " + "that are not in the language dictionary" + ) + if len(messages) > 0: + raise ValueError(" ".join(messages) + f"; langs: {langs}") + + if args.lang_pairs is None: + raise ValueError( + "--lang-pairs is required. List all the language pairs in the training objective." 
+ ) + if isinstance(args.lang_pairs, str): + args.lang_pairs = args.lang_pairs.split(",") + if args.source_lang is not None or args.target_lang is not None: + training = False + else: + training = True + language_list = cls.load_langs(args, **kargs) + check_langs( + language_list, + ( + [p.split("-") for p in args.lang_pairs] + if training + else [(args.source_lang, args.target_lang)] + ), + ) + + def load_dictionary_and_postproc(path): + d = load_dictionary(path) + augment_dictionary( + dictionary=d, + language_list=language_list, + lang_tok_style=args.lang_tok_style, + langtoks_specs=args.langtoks_specs, + extra_data=args.extra_data, + ) + return d + + dicts = cls.load_all_dictionaries(args, language_list, load_dictionary_and_postproc, training) + return language_list, dicts, training + + @classmethod + def load_all_dictionaries(cls, args, language_list, load_dictionary, training): + dicts = OrderedDict() + if args.source_dict is not None: + dicts[SRC_DICT_NAME] = load_dictionary(args.source_dict) + if args.target_dict is not None: + dicts[TGT_DICT_NAME] = load_dictionary(args.target_dict) + + if training: + extra_lang_pairs = ( + list( + {p for _, v in args.extra_lang_pairs.items() for p in v.split(",")} + ) + if args.extra_lang_pairs + else [] + ) + src_langs_to_load_dicts = sorted( + {p.split("-")[0] for p in (args.lang_pairs + extra_lang_pairs)} + ) + tgt_langs_to_load_dicts = sorted( + {p.split("-")[1] for p in (args.lang_pairs + extra_lang_pairs)} + ) + else: + src_langs_to_load_dicts = [args.source_lang] + tgt_langs_to_load_dicts = [args.target_lang] + + paths = utils.split_paths(args.data) + assert len(paths) > 0 + + def load_dicts(langs_to_load_dicts): + for lang in langs_to_load_dicts: + dicts[lang] = load_dictionary( + os.path.join(paths[0], "dict.{}.txt".format(lang)) + ) + if len(dicts) > 0: + dict0 = next(iter(dicts.values())) + assert dicts[lang].pad() == dict0.pad() + assert dicts[lang].eos() == dict0.eos() + assert dicts[lang].unk() == dict0.unk() + logger.info("[{}] dictionary: {} types".format(lang, len(dicts[lang]))) + + if args.fixed_dictionary is not None: + fixed_dict = load_dictionary(args.fixed_dictionary) + dicts = {lang: fixed_dict for lang in src_langs_to_load_dicts + tgt_langs_to_load_dicts} + else: + if args.source_dict is None: + load_dicts(src_langs_to_load_dicts) + if args.target_dict is None: + load_dicts(tgt_langs_to_load_dicts) + return dicts + + def get_source_dictionary(self, lang): + if self.args.source_dict is not None: + return self.dicts[SRC_DICT_NAME] + else: + return self.dicts[lang] + + def get_target_dictionary(self, lang): + if self.args.target_dict is not None: + return self.dicts[TGT_DICT_NAME] + else: + return self.dicts[lang] + + @classmethod + def create_lang_dictionary(cls, langs): + unk = "<unk>" + # hack to remove symbols other than unk as they are not needed by lang dict + lang_dict = Dictionary(pad=unk, eos=unk, unk=unk, bos=unk) + for lang in langs: + lang_dict.add_symbol(lang) + return lang_dict + + @classmethod + def get_langtok_index(cls, lang_tok, dic): + idx = dic.index(lang_tok) + assert ( + idx != dic.unk_index + ), "cannot find language token {} in the dictionary".format(lang_tok) + return idx + + def get_encoder_langtok(self, src_lang, tgt_lang, spec=None): + if spec is None: + return None + if spec and spec.startswith("src"): + if src_lang is None: + return None + langtok = get_lang_tok( + lang=src_lang, lang_tok_style=self.args.lang_tok_style, spec=spec + ) + else: + if tgt_lang is None: + return None + langtok = 
get_lang_tok( + lang=tgt_lang, lang_tok_style=self.args.lang_tok_style, spec=spec + ) + return self.get_langtok_index( + langtok, self.get_source_dictionary(src_lang) if src_lang else self.get_target_dictionary(tgt_lang) + ) + + def get_decoder_langtok(self, tgt_lang, spec=None): + if spec is None: + return None + langtok = get_lang_tok( + lang=tgt_lang, lang_tok_style=self.args.lang_tok_style, spec=spec + ) + return self.get_langtok_index(langtok, self.get_target_dictionary(tgt_lang)) + + @classmethod + def load_data(cls, path, vdict, impl): + dataset = data_utils.load_indexed_dataset(path, vdict, impl) + return dataset + + @classmethod + def split_exists(cls, split, src, tgt, lang, data_path, dataset_impl): + filename = os.path.join(data_path, "{}.{}-{}.{}".format(split, src, tgt, lang)) + return indexed_dataset.dataset_exists(filename, impl=dataset_impl) + + def load_lang_dataset( + self, + data_path, + split, + src, + src_dict, + tgt, + tgt_dict, + combine, + dataset_impl, + upsample_primary, + max_source_positions, + prepend_bos=False, + load_alignments=False, + truncate_source=False, + ): + + src_datasets = [] + tgt_datasets = [] + + for k in itertools.count(): + split_k = split + (str(k) if k > 0 else "") + + # infer langcode + if self.split_exists(split_k, src, tgt, src, data_path, dataset_impl): + prefix = os.path.join(data_path, "{}.{}-{}.".format(split_k, src, tgt)) + elif self.split_exists(split_k, tgt, src, src, data_path, dataset_impl): + prefix = os.path.join(data_path, "{}.{}-{}.".format(split_k, tgt, src)) + else: + if k > 0: + break + else: + logger.error( + f"Dataset not found: {data_path}, {split_k}, {src}, {tgt}" + ) + raise FileNotFoundError( + "Dataset not found: {} ({})".format(split, data_path) + ) + + src_dataset = self.load_data(prefix + src, src_dict, dataset_impl) + if truncate_source: + src_dataset = AppendTokenDataset( + TruncateDataset( + StripTokenDataset(src_dataset, src_dict.eos()), + max_source_positions - 1, + ), + src_dict.eos(), + ) + src_datasets.append(src_dataset) + tgt_datasets.append(self.load_data(prefix + tgt, tgt_dict, dataset_impl)) + + logger.info( + "{} {} {}-{} {} examples".format( + data_path, split_k, src, tgt, len(src_datasets[-1]) + ) + ) + + if not combine: + break + + assert len(src_datasets) == len(tgt_datasets) + + if len(src_datasets) == 1: + src_dataset, tgt_dataset = src_datasets[0], tgt_datasets[0] + else: + sample_ratios = [1] * len(src_datasets) + sample_ratios[0] = upsample_primary + src_dataset = ConcatDataset(src_datasets, sample_ratios) + tgt_dataset = ConcatDataset(tgt_datasets, sample_ratios) + + if prepend_bos: + assert hasattr(src_dict, "bos_index") and hasattr(tgt_dict, "bos_index") + src_dataset = PrependTokenDataset(src_dataset, src_dict.bos()) + tgt_dataset = PrependTokenDataset(tgt_dataset, tgt_dict.bos()) + + align_dataset = None + if load_alignments: + align_path = os.path.join( + data_path, "{}.align.{}-{}".format(split, src, tgt) + ) + if indexed_dataset.dataset_exists(align_path, impl=dataset_impl): + align_dataset = data_utils.load_indexed_dataset( + align_path, None, dataset_impl + ) + + return src_dataset, tgt_dataset, align_dataset + + def load_langpair_dataset( + self, + data_path, + split, + src, + src_dict, + tgt, + tgt_dict, + combine, + dataset_impl, + upsample_primary, + left_pad_source, + left_pad_target, + max_source_positions, + max_target_positions, + prepend_bos=False, + load_alignments=False, + truncate_source=False, + src_dataset_transform_func=lambda dataset: dataset, + 
tgt_dataset_transform_func=lambda dataset: dataset, + src_lang_id=None, + tgt_lang_id=None, + langpairs_sharing_datasets=None, + ): + norm_direction = "-".join(sorted([src, tgt])) + if langpairs_sharing_datasets is not None: + src_dataset = langpairs_sharing_datasets.get( + (data_path, split, norm_direction, src), "NotInCache" + ) + tgt_dataset = langpairs_sharing_datasets.get( + (data_path, split, norm_direction, tgt), "NotInCache" + ) + align_dataset = langpairs_sharing_datasets.get( + (data_path, split, norm_direction, src, tgt), "NotInCache" + ) + + # a hack: any one is not in cache, we need to reload them + if ( + langpairs_sharing_datasets is None + or src_dataset == "NotInCache" + or tgt_dataset == "NotInCache" + or align_dataset == "NotInCache" + or split != getattr(self.args, "train_subset", None) + ): + # source and target datasets can be reused in reversed directions to save memory + # reversed directions of valid and test data will not share source and target datasets + src_dataset, tgt_dataset, align_dataset = self.load_lang_dataset( + data_path, + split, + src, + src_dict, + tgt, + tgt_dict, + combine, + dataset_impl, + upsample_primary, + max_source_positions=max_source_positions, + prepend_bos=prepend_bos, + load_alignments=load_alignments, + truncate_source=truncate_source, + ) + src_dataset = src_dataset_transform_func(src_dataset) + tgt_dataset = tgt_dataset_transform_func(tgt_dataset) + if langpairs_sharing_datasets is not None: + langpairs_sharing_datasets[ + (data_path, split, norm_direction, src) + ] = src_dataset + langpairs_sharing_datasets[ + (data_path, split, norm_direction, tgt) + ] = tgt_dataset + langpairs_sharing_datasets[ + (data_path, split, norm_direction, src, tgt) + ] = align_dataset + if align_dataset is None: + # no align data so flag the reverse direction as well in sharing + langpairs_sharing_datasets[ + (data_path, split, norm_direction, tgt, src) + ] = align_dataset + else: + logger.info( + f"Reusing source and target datasets of [{split}] {tgt}-{src} for reversed direction: " + f"[{split}] {src}-{tgt}: src length={len(src_dataset)}; tgt length={len(tgt_dataset)}" + ) + + return LanguagePairDataset( + src_dataset, + src_dataset.sizes, + src_dict, + tgt_dataset, + tgt_dataset.sizes if tgt_dataset is not None else None, + tgt_dict, + left_pad_source=left_pad_source, + left_pad_target=left_pad_target, + align_dataset=align_dataset, + src_lang_id=src_lang_id, + tgt_lang_id=tgt_lang_id, + ) + + def src_dataset_tranform_func(self, src_lang, tgt_lang, dataset, spec=None): + if self.args.lang_tok_replacing_bos_eos: + # it is handled by self.alter_dataset_langtok + # TODO: Unifiy with alter_dataset_langtok + return dataset + if spec is None: + return dataset + tok = self.get_encoder_langtok(src_lang, tgt_lang, spec) + if tok: + return PrependTokenDataset(dataset, tok) + return dataset + + def tgt_dataset_tranform_func(self, source_lang, target_lang, dataset, spec=None): + if dataset is None: + # note that target dataset can be None during inference time + return None + if self.args.lang_tok_replacing_bos_eos: + # TODO: Unifiy with alter_dataset_langtok + # It is handled by self.alter_dataset_langtok. + # The complication in self.alter_dataset_langtok + # makes a unified framework difficult. 
+ return dataset + # if not self.args.decoder_langtok: + if not spec: + return dataset + tok = self.get_decoder_langtok(target_lang, spec) + if tok: + return PrependTokenDataset(dataset, tok) + return dataset + + def alter_dataset_langtok( + self, + lang_pair_dataset, + src_eos=None, + src_lang=None, + tgt_eos=None, + tgt_lang=None, + src_langtok_spec=None, + tgt_langtok_spec=None, + ): + if src_langtok_spec is None and tgt_langtok_spec is None: + return lang_pair_dataset + + new_src_eos = None + if ( + src_langtok_spec is not None + and src_eos is not None + and (src_lang is not None or tgt_lang is not None) + ): + new_src_eos = self.get_encoder_langtok(src_lang, tgt_lang, src_langtok_spec) + else: + src_eos = None + + new_tgt_bos = None + if tgt_langtok_spec and tgt_eos is not None and tgt_lang is not None: + new_tgt_bos = self.get_decoder_langtok(tgt_lang, tgt_langtok_spec) + else: + tgt_eos = None + + return TransformEosLangPairDataset( + lang_pair_dataset, + src_eos=src_eos, + new_src_eos=new_src_eos, + tgt_bos=tgt_eos, + new_tgt_bos=new_tgt_bos, + ) + + def load_a_dataset( + self, + split, + data_path, + src, + src_dict, + tgt, + tgt_dict, + combine, + prepend_bos=False, + langpairs_sharing_datasets=None, + data_category=None, + **extra_kwargs, + ): + dataset_impl = self.args.dataset_impl + upsample_primary = self.args.upsample_primary + left_pad_source = self.args.left_pad_source + left_pad_target = self.args.left_pad_target + max_source_positions = self.args.max_source_positions + max_target_positions = self.args.max_target_positions + load_alignments = self.args.load_alignments + truncate_source = self.args.truncate_source + src_dataset_transform_func = self.src_dataset_tranform_func + tgt_dataset_transform_func = self.tgt_dataset_tranform_func + enable_lang_ids = self.args.enable_lang_ids + lang_dictionary = self.lang_dict + src_langtok_spec, tgt_langtok_spec = extra_kwargs["langtok_spec"] + + src_langtok = self.get_encoder_langtok(src, tgt, src_langtok_spec) + tgt_langtok = self.get_decoder_langtok(tgt, tgt_langtok_spec) + logger.info( + f"{data_category}:{src}-{tgt} src_langtok: {src_langtok}; tgt_langtok: {tgt_langtok}" + ) + + langpair_ds = self.load_langpair_dataset( + data_path, + split, + src, + src_dict, + tgt, + tgt_dict, + combine, + dataset_impl, + upsample_primary, + left_pad_source, + left_pad_target, + max_source_positions, + max_target_positions, + prepend_bos, + load_alignments, + truncate_source, + src_dataset_transform_func=lambda dataset: src_dataset_transform_func( + src, tgt, dataset, src_langtok_spec + ), + tgt_dataset_transform_func=lambda dataset: tgt_dataset_transform_func( + src, tgt, dataset, tgt_langtok_spec + ), + src_lang_id=_lang_id(lang_dictionary, src) + if enable_lang_ids and lang_dictionary is not None + else None, + tgt_lang_id=_lang_id(lang_dictionary, tgt) + if enable_lang_ids and lang_dictionary is not None + else None, + langpairs_sharing_datasets=langpairs_sharing_datasets, + ) + # TODO: handle modified lang toks for mined data and dae data + if self.args.lang_tok_replacing_bos_eos: + ds = self.alter_dataset_langtok( + langpair_ds, + src_eos=self.get_source_dictionary(src).eos() if src else self.get_target_dictionary(tgt).eos(), + src_lang=src, + tgt_eos=self.get_target_dictionary(tgt).eos(), + tgt_lang=tgt, + src_langtok_spec=src_langtok_spec, + tgt_langtok_spec=tgt_langtok_spec, + ) + else: + ds = langpair_ds + return ds + + def load_split_langpair_datasets(self, split, data_param_list): + datasets = [] + langpairs_sharing_datasets = ( + 
{} if self.args.enable_reservsed_directions_shared_datasets else None + ) + for param in data_param_list: + ds = self.load_a_dataset( + split=split, + langpairs_sharing_datasets=langpairs_sharing_datasets, + **param, + ) + datasets.append(ds) + return datasets + + def get_data_paths_and_lang_pairs(self, split): + datapaths = {"main": self.args.data} + lang_pairs = {"main": self.lang_pairs} + if split == getattr(self.args, "train_subset", None): + # only training data can have extra data and extra language pairs + if self.args.extra_data: + extra_datapaths = self.args.extra_data + datapaths.update(extra_datapaths) + if self.args.extra_lang_pairs: + extra_lang_pairs = { + k: v.split(",") for k, v in self.args.extra_lang_pairs.items() + } + lang_pairs.update(extra_lang_pairs) + return datapaths, lang_pairs + + @classmethod + def get_dataset_key(cls, data_category, src, tgt): + return f"{data_category}:{src}-{tgt}" + + @classmethod + def _get_shard_num_dict(cls, split, paths): + shards = defaultdict(int) + for path in paths: + files = PathManager.ls(path) + directions = set() + for f in files: + if f.startswith(split) and f.endswith(".idx"): + # idx files of the form "{split}.{src}-{tgt}.{lang}.idx" + direction = f.split(".")[-3] + directions.add(direction) + for direction in directions: + shards[direction] += 1 + return shards + + def get_split_num_data_shards(self, split): + if split in self._num_shards_dict: + return self._num_shards_dict[split] + num_shards_dict = {} + data_paths, lang_pairs = self.get_data_paths_and_lang_pairs(split) + + for data_category, paths in data_paths.items(): + if data_category not in lang_pairs: + continue + paths = utils.split_paths(paths) + shards_dict = self._get_shard_num_dict(split, paths) + lang_dirs = [ + lang_pair.split("-") for lang_pair in lang_pairs[data_category] + ] + lang_dirs = [x if len(x) > 1 else (x[0], x[0]) for x in lang_dirs] + for src, tgt in lang_dirs: + key = self.get_dataset_key(data_category, src, tgt) + if "mono_" in data_category: + # monolingual data requires tgt only + assert src is None or src == tgt, ( + f"error: src={src}, " + "tgt={tgt} for data_category={data_category}" + ) + num_shards_dict[key] = shards_dict[tgt] + else: + if f"{src}-{tgt}" in shards_dict: + num_shards_dict[key] = shards_dict[f"{src}-{tgt}"] + elif f"{tgt}-{src}" in shards_dict: + # follow the fairseq tradition to use reversed direction data if it is not available + num_shards_dict[key] = shards_dict[f"{tgt}-{src}"] + self._num_shards_dict[split] = num_shards_dict + logger.info(f"[{split}] num of shards: {num_shards_dict}") + return num_shards_dict + + @classmethod + def get_shard_id(cls, num_shards, epoch, shard_epoch=None): + shard = epoch if shard_epoch is None else shard_epoch + shard = (shard - 1) % num_shards + return shard + + def get_split_data_path(self, paths, epoch, shard_epoch, num_shards): + path = paths[self.get_shard_id(num_shards, epoch, shard_epoch)] + return path + + def get_split_data_param_list(self, split, epoch, shard_epoch=None): + # TODO: to extend with extra datasets and keys and loop over different shard data paths + param_list = [] + data_paths, lang_pairs = self.get_data_paths_and_lang_pairs(split) + logger.info(f"langtoks settings: {self.args.langtoks}") + split_num_shards_dict = self.get_split_num_data_shards(split) + for data_category, paths in data_paths.items(): + if data_category not in lang_pairs: + continue + paths = utils.split_paths(paths) + assert len(paths) > 0 + if len(paths) > 1: + self._has_sharded_data = True + if 
split != getattr(self.args, "train_subset", None): + # if not training data set, use the first shard for valid and test + paths = paths[:1] + + if data_category in self.args.langtoks: + lang_tok_spec = self.args.langtoks[data_category] + else: + # default to None + lang_tok_spec = (None, None) + + # infer langcode + lang_dirs = [ + lang_pair.split("-") for lang_pair in lang_pairs[data_category] + ] + lang_dirs = [x if len(x) > 1 else (x[0], x[0]) for x in lang_dirs] + for src, tgt in lang_dirs: + assert src is not None or data_category == "mono_dae", ( + f"error: src={src}, " "tgt={tgt} for data_category={data_category}" + ) + # logger.info(f"preparing param for {data_category}: {src} - {tgt}") + key = self.get_dataset_key(data_category, src, tgt) + data_path = self.get_split_data_path( + paths, epoch, shard_epoch, split_num_shards_dict[key] + ) + param_list.append( + { + "key": key, + "data_path": data_path, + "split": split, + "src": src, + "src_dict": self.get_source_dictionary(src) + if src and data_category != "mono_dae" + else None, + "tgt": tgt, + "tgt_dict": self.get_target_dictionary(tgt), + "data_category": data_category, + "langtok_spec": lang_tok_spec, + } + ) + return param_list + + def get_train_dataset_sizes( + self, data_param_list, datasets, epoch, shard_epoch=None + ): + num_shards = [ + self.get_split_num_data_shards(param["split"])[param["key"]] + for param in data_param_list + ] + data_sizes = [] + for (key, d), num_shard in zip(datasets, num_shards): + my_data_sizes = self._training_data_sizes[key] + shard_ind = self.get_shard_id(num_shard, epoch, shard_epoch) + if shard_ind not in my_data_sizes: + my_data_sizes[shard_ind] = len(d) + known_size = max(my_data_sizes.values()) + data_sizes.append( + # If we don't know the data size of the shard yet, + # use the the max known data size to approximate. + # Note that we preprocess shards by a designated shard size + # and put any remaining data at the end into the last shard so + # the max shard size approximation is almost correct before loading + # the last shard; after loading the last shard, it will have the + # exact data sizes of the whole data size. + (key, sum(my_data_sizes.get(i, known_size) for i in range(num_shard))) + ) + logger.info( + f"estimated total data sizes of all shards used in sampling ratios: {data_sizes}. 
" + "Note that if the data a shard has not been loaded yet, use the max known data size to approximate" + ) + return [s for _, s in data_sizes] + + def get_train_sampling_ratios( + self, data_param_list, datasets, epoch=1, shard_epoch=None + ): + data_sizes = self.get_train_dataset_sizes( + data_param_list, datasets, epoch, shard_epoch + ) + sampling_func = self.sampling_method.sampling_method_selector() + sample_ratios = sampling_func(data_sizes) if sampling_func is not None else None + return sample_ratios + + def get_sampling_ratios(self, data_param_list, datasets, epoch, shard_epoch=None): + if self.args.sampling_weights_from_file: + weights = load_sampling_weights(self.args.sampling_weights_from_file) + sample_ratios = [weights[k] for k, _ in datasets] + logger.info( + "| ignoring --sampling-weights when loadding sampling weights " + f"from file {self.args.sampling_weights_from_file}" + ) + elif self.args.sampling_weights: + sample_ratios = [self.args.sampling_weights[k] for k, _ in datasets] + else: + sample_ratios = self.get_train_sampling_ratios( + data_param_list, datasets, epoch, shard_epoch + ) + + if sample_ratios is not None: + logger.info( + "| Upsample ratios: {}".format( + list(zip(map(lambda x: x["key"], data_param_list), sample_ratios)) + ) + ) + assert len(sample_ratios) == len(datasets) + return sample_ratios + + def load_split_datasets( + self, split, training, epoch=1, combine=False, shard_epoch=None, **kwargs + ): + data_param_list = self.get_split_data_param_list( + split, epoch, shard_epoch=shard_epoch + ) + langpairs_sharing_datasets = ( + {} if self.args.enable_reservsed_directions_shared_datasets else None + ) + datasets = [ + ( + param["key"], + self.load_a_dataset( + combine=combine, + langpairs_sharing_datasets=langpairs_sharing_datasets, + **param, + ), + ) + for param in data_param_list + ] + return datasets, data_param_list + + def load_into_concat_dataset(self, split, datasets, data_param_list): + if self.args.lang_tok_replacing_bos_eos: + # TODO: to investigate why TransformEosLangPairDataset doesn't work with ConcatDataset + return SampledMultiDataset( + OrderedDict(datasets), + sampling_ratios=None, + eval_key=None, + collate_format=CollateFormat.single, + virtual_size=None, + split=split, + ) + return ConcatDataset([d for _, d in datasets]) + + def load_sampled_multi_epoch_dataset( + self, split, training, epoch=0, combine=False, shard_epoch=None, **kwargs + ): + datasets, data_param_list = self.load_split_datasets( + split, training, epoch, combine, shard_epoch=shard_epoch, **kwargs + ) + if training and split == getattr(self.args, "train_subset", None): + sample_ratios = self.get_sampling_ratios(data_param_list, datasets, epoch) + return SampledMultiEpochDataset( + OrderedDict(datasets), + epoch=epoch, + shard_epoch=shard_epoch, + # valid and test datasets will be degenerate to concating datasets: + sampling_ratios=sample_ratios, + eval_key=None, + collate_format=CollateFormat.single, + virtual_size=self.args.virtual_data_size, + split=split, + virtual_epoch_size=self.args.virtual_epoch_size, + # if not using lang_tok altering, simplified to use the same collater + shared_collater=self._shared_collater(), + ) + else: + return self.load_into_concat_dataset(split, datasets, data_param_list) + + def load_sampled_multi_dataset( + self, split, training, epoch=0, combine=False, shard_epoch=None, **kwargs + ): + datasets, data_param_list = self.load_split_datasets( + split, training, epoch, combine, shard_epoch=shard_epoch, **kwargs + ) + if training and 
split == getattr(self.args, "train_subset", None): + sample_ratios = self.get_sampling_ratios(data_param_list, datasets, epoch) + return SampledMultiDataset( + OrderedDict(datasets), + epoch=epoch, + # valid and test datasets will be degerate to concating datasets: + sampling_ratios=sample_ratios, + eval_key=None, + collate_format=CollateFormat.single, + virtual_size=self.args.virtual_data_size, + split=split, + # if not using lang_tok altering, simplified to use the same collater + shared_collater=self._shared_collater(), + ) + else: + return self.load_into_concat_dataset(split, datasets, data_param_list) + + def load_dataset( + self, split, training, epoch=0, combine=False, shard_epoch=None, **kwargs + ): + if self.args.virtual_epoch_size is None: + return self.load_sampled_multi_dataset( + split, training, epoch, combine, shard_epoch, **kwargs + ) + else: + return self.load_sampled_multi_epoch_dataset( + split, training, epoch, combine, shard_epoch, **kwargs + ) diff --git a/SpeechT5/fairseq/fairseq/data/multilingual/multilingual_utils.py b/SpeechT5/fairseq/fairseq/data/multilingual/multilingual_utils.py new file mode 100644 index 0000000000000000000000000000000000000000..b4e0f9828cabfdbe375d05d9152b58bdbd6de7dc --- /dev/null +++ b/SpeechT5/fairseq/fairseq/data/multilingual/multilingual_utils.py @@ -0,0 +1,63 @@ +from enum import Enum +from typing import Dict, List, Optional, Sequence + +import torch +from fairseq.data import Dictionary + + +class EncoderLangtok(Enum): + """ + Prepend to the beginning of source sentence either the + source or target language token. (src/tgt). + """ + + src = "src" + tgt = "tgt" + + +class LangTokSpec(Enum): + main = "main" + mono_dae = "mono_dae" + + +class LangTokStyle(Enum): + multilingual = "multilingual" + mbart = "mbart" + + +@torch.jit.export +def get_lang_tok( + lang: str, lang_tok_style: str, spec: str = LangTokSpec.main.value +) -> str: + # TOKEN_STYLES can't be defined outside this fn since it needs to be + # TorchScriptable. + TOKEN_STYLES: Dict[str, str] = { + LangTokStyle.mbart.value: "[{}]", + LangTokStyle.multilingual.value: "__{}__", + } + + if spec.endswith("dae"): + lang = f"{lang}_dae" + elif spec.endswith("mined"): + lang = f"{lang}_mined" + style = TOKEN_STYLES[lang_tok_style] + return style.format(lang) + + +def augment_dictionary( + dictionary: Dictionary, + language_list: List[str], + lang_tok_style: str, + langtoks_specs: Sequence[str] = (LangTokSpec.main.value,), + extra_data: Optional[Dict[str, str]] = None, +) -> None: + for spec in langtoks_specs: + for language in language_list: + dictionary.add_symbol( + get_lang_tok(lang=language, lang_tok_style=lang_tok_style, spec=spec) + ) + + if lang_tok_style == LangTokStyle.mbart.value or ( + extra_data is not None and LangTokSpec.mono_dae.value in extra_data + ): + dictionary.add_symbol("<mask>") diff --git a/SpeechT5/fairseq/fairseq/data/multilingual/sampled_multi_dataset.py b/SpeechT5/fairseq/fairseq/data/multilingual/sampled_multi_dataset.py new file mode 100644 index 0000000000000000000000000000000000000000..b0a617424ee3c5923b37796773da4c97851a16c5 --- /dev/null +++ b/SpeechT5/fairseq/fairseq/data/multilingual/sampled_multi_dataset.py @@ -0,0 +1,467 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. 
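The multilingual_utils.py hunk above formats per-language tokens (mBART style `[xx]` vs. multilingual style `__xx__`, with a `_dae`/`_mined` suffix for those specs) before adding them to the dictionary. As an illustration only, not part of the patch, the sketch below mirrors that formatting logic in plain Python; `format_lang_tok` is a made-up name, not a fairseq API.

```python
# Illustrative sketch of the formatting done by get_lang_tok() in
# multilingual_utils.py above. `format_lang_tok` is a hypothetical stand-in.
TOKEN_STYLES = {
    "mbart": "[{}]",           # mBART-style token, e.g. "[de]"
    "multilingual": "__{}__",  # multilingual-style token, e.g. "__de__"
}

def format_lang_tok(lang: str, lang_tok_style: str, spec: str = "main") -> str:
    # "dae"/"mined" specs get a suffix appended before formatting,
    # matching the original function.
    if spec.endswith("dae"):
        lang = f"{lang}_dae"
    elif spec.endswith("mined"):
        lang = f"{lang}_mined"
    return TOKEN_STYLES[lang_tok_style].format(lang)

print(format_lang_tok("de", "mbart"))                     # [de]
print(format_lang_tok("de", "multilingual"))              # __de__
print(format_lang_tok("de", "multilingual", "mono_dae"))  # __de_dae__
```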
+ +import datetime +import hashlib +import logging +import time +from bisect import bisect_right +from collections import OrderedDict, defaultdict +from enum import Enum +from typing import List + +import numpy as np +import torch +from fairseq.data import FairseqDataset, data_utils +from fairseq.distributed import utils as distributed_utils + + +def get_time_gap(s, e): + return ( + datetime.datetime.fromtimestamp(e) - datetime.datetime.fromtimestamp(s) + ).__str__() + + +logger = logging.getLogger(__name__) + + +def default_virtual_size_func(datasets, ratios, max_scale_up=1.5): + sizes = [len(d) for d in datasets] + if ratios is None: + return sum(sizes) + largest_idx = np.argmax(sizes) + largest_r = ratios[largest_idx] + largest_s = sizes[largest_idx] + # set virtual sizes relative to the largest dataset + virtual_sizes = [(r / largest_r) * largest_s for r in ratios] + vsize = sum(virtual_sizes) + max_size = sum(sizes) * max_scale_up + return int(vsize if vsize < max_size else max_size) + + +class CollateFormat(Enum): + single = 1 + ordered_dict = 2 + + +class SampledMultiDataset(FairseqDataset): + """Samples from multiple sub-datasets according to given sampling ratios. + Args: + datasets ( + List[~torch.utils.data.Dataset] + or OrderedDict[str, ~torch.utils.data.Dataset] + ): datasets + sampling_ratios (List[float]): list of probability of each dataset to be sampled + (default: None, which corresponds to concatenating all dataset together). + seed (int): RNG seed to use (default: 2). + epoch (int): starting epoch number (default: 1). + eval_key (str, optional): a key used at evaluation time that causes + this instance to pass-through batches from *datasets[eval_key]*. + collate_format (CollateFormat): collater output format, either CollateFormat.ordered_dict or + CollateFormat.single (default: CollateFormat.single) where CollateFormat.single configures + the collater to output batches of data mixed from all sub-datasets, + and CollateFormat.ordered_dict configures the collater to output a dictionary of batches indexed by keys + of sub-datasets. + Note that not all sub-datasets will present in a single batch in both formats. + virtual_size (int, or callable): the expected virtual size of the dataset (default: default_virtual_size_func). + split (str): the split of the data, e.g. 'train', 'valid' or 'test'. + shared_collater (bool): whether or not to all sub-datasets have the same collater. + shuffle (bool): whether or not to shuffle data (default: True). 
+ """ + + def __init__( + self, + datasets, + sampling_ratios=None, + seed=2, + epoch=1, + eval_key=None, + collate_format=CollateFormat.single, + virtual_size=default_virtual_size_func, + split="", + shared_collater=False, + shuffle=True, + ): + super().__init__() + self.shared_collater = shared_collater + self.shuffle = shuffle + + if isinstance(datasets, OrderedDict): + self.keys = list(datasets.keys()) + datasets = list(datasets.values()) + elif isinstance(datasets, List): + self.keys = list(range(len(datasets))) + else: + raise AssertionError() + self.datasets = datasets + self.split = split + + self.eval_key = eval_key + if self.eval_key is not None: + self.collate_format = CollateFormat.single + else: + self.collate_format = collate_format + + self.seed = seed + self._cur_epoch = None + + self.cumulated_sizes = None + # self.datasets[k][self._cur_indices[i]] is the data item i in this sampled dataset + # namely, data item i is sampled from the kth sub-dataset self.datasets[k] + # where self.cumulated_sizes[k-1] <= i < self.cumulated_sizes[k] + self._cur_indices = None + + self._sizes = None + self.virtual_size_per_dataset = None + # caching properties + self._reset_cached_properties() + self.setup_sampling(sampling_ratios, virtual_size) + self.set_epoch(epoch) + + def _clean_if_not_none(self, var_list): + for v in var_list: + if v is not None: + del v + + def _reset_cached_properties(self): + self._clean_if_not_none([self._sizes, self._cur_indices]) + self._sizes = None + self._cur_indices = None + + def setup_sampling(self, sample_ratios, virtual_size): + sizes = [len(d) for d in self.datasets] + if sample_ratios is None: + # default back to concating datasets + self.sample_ratios = None + self.virtual_size = sum(sizes) + else: + if not isinstance(sample_ratios, np.ndarray): + sample_ratios = np.array(sample_ratios) + self.sample_ratios = sample_ratios + virtual_size = ( + default_virtual_size_func if virtual_size is None else virtual_size + ) + self.virtual_size = ( + virtual_size(self.datasets, self.sample_ratios) + if callable(virtual_size) + else virtual_size + ) + + def adjust_sampling(self, epoch, sampling_ratios, virtual_size): + if sampling_ratios is not None: + sampling_ratios = self._sync_sample_ratios(sampling_ratios) + self.setup_sampling(sampling_ratios, virtual_size) + + def _sync_sample_ratios(self, ratios): + # in case the ratios are not precisely the same across processes + # also to ensure every procresses update the ratios in the same pace + ratios = torch.DoubleTensor(ratios) + if torch.distributed.is_initialized(): + if torch.cuda.is_available(): + distributed_utils.all_reduce( + ratios.cuda(), group=distributed_utils.get_data_parallel_group() + ) + else: + distributed_utils.all_reduce( + ratios, group=distributed_utils.get_data_parallel_group() + ) + ret = ratios.cpu() + ret = ret.numpy() + return ret + + def random_choice_in_dataset(self, rng, dataset, choice_size): + if hasattr(dataset, "random_choice_in_dataset"): + return dataset.random_choice_in_dataset(rng, choice_size) + dataset_size = len(dataset) + return rng.choice( + dataset_size, choice_size, replace=(choice_size > dataset_size) + ) + + def get_virtual_indices(self, rng, datasets, sample_ratios, virtual_size): + def get_counts(sample_ratios): + counts = np.array([virtual_size * r for r in sample_ratios], dtype=np.int64) + diff = virtual_size - counts.sum() + assert diff >= 0 + # due to round-offs, the size might not match the desired sizes + if diff > 0: + dataset_indices = rng.choice( + 
len(sample_ratios), size=diff, p=sample_ratios + ) + for i in dataset_indices: + counts[i] += 1 + return counts + + def get_in_dataset_indices(datasets, sizes, sample_ratios): + counts = get_counts(sample_ratios) + # uniformally sample desired counts for each dataset + # if the desired counts are large, sample with replacement: + indices = [ + self.random_choice_in_dataset(rng, d, c) + for c, d in zip(counts, datasets) + ] + return indices + + sizes = [len(d) for d in datasets] + if sample_ratios is None: + # default back to concating datasets + in_dataset_indices = [list(range(s)) for s in sizes] + virtual_sizes_per_dataset = sizes + else: + ratios = sample_ratios / sample_ratios.sum() + in_dataset_indices = get_in_dataset_indices(datasets, sizes, ratios) + virtual_sizes_per_dataset = [len(d) for d in in_dataset_indices] + virtual_sizes_per_dataset = np.array(virtual_sizes_per_dataset, np.int64) + cumulative_sizes = np.cumsum(virtual_sizes_per_dataset) + assert sum(virtual_sizes_per_dataset) == virtual_size + assert cumulative_sizes[-1] == virtual_size + if virtual_size < sum(sizes): + logger.warning( + f"virtual data size ({virtual_size}) is less than real data size ({sum(sizes)})." + " If virtual size << real data size, there could be data coverage issue." + ) + in_dataset_indices = np.hstack(in_dataset_indices) + return in_dataset_indices, cumulative_sizes, virtual_sizes_per_dataset + + def _get_dataset_and_index(self, index): + i = bisect_right(self.cumulated_sizes, index) + return i, self._cur_indices[index] + + def __getitem__(self, index): + # self.__getitem__(index) returns self.datasets[k][self._cur_indices[index]] + # where k satisfies self.cumulated_sizes[k - 1] <= k < self.cumulated_sizes[k] + ds_idx, ds_sample_idx = self._get_dataset_and_index(index) + ret = (ds_idx, self.datasets[ds_idx][ds_sample_idx]) + return ret + + def num_tokens(self, index): + return self.sizes[index].max() + + def num_tokens_vec(self, indices): + sizes_vec = self.sizes[np.array(indices)] + # max across all dimensions but first one + return np.amax(sizes_vec, axis=tuple(range(1, len(sizes_vec.shape)))) + + def size(self, index): + return self.sizes[index] + + def __len__(self): + return self.virtual_size + + def collater(self, samples, **extra_args): + """Merge a list of samples to form a mini-batch.""" + if len(samples) == 0: + return None + if self.collate_format == "ordered_dict": + collect_samples = [[] for _ in range(len(self.datasets))] + for (i, sample) in samples: + collect_samples[i].append(sample) + batch = OrderedDict( + [ + (self.keys[i], dataset.collater(collect_samples[i])) + for i, (key, dataset) in enumerate(zip(self.keys, self.datasets)) + if len(collect_samples[i]) > 0 + ] + ) + elif self.shared_collater: + batch = self.datasets[0].collater([s for _, s in samples]) + else: + samples_dict = defaultdict(list) + pad_to_length = ( + defaultdict(int) + if "pad_to_length" not in extra_args + else extra_args["pad_to_length"] + ) + for ds_idx, s in samples: + pad_to_length["source"] = max( + pad_to_length["source"], s["source"].size(0) + ) + if s["target"] is not None: + pad_to_length["target"] = max( + pad_to_length["target"], s["target"].size(0) + ) + samples_dict[ds_idx].append(s) + batches = [ + self.datasets[i].collater(samples_dict[i], pad_to_length=pad_to_length) + for i in range(len(self.datasets)) + if len(samples_dict[i]) > 0 + ] + + def straight_data(tensors): + batch = torch.cat(tensors, dim=0) + return batch + + src_lengths = straight_data( + [b["net_input"]["src_lengths"] for b 
in batches] + ) + src_lengths, sort_order = src_lengths.sort(descending=True) + + def straight_order(tensors): + batch = straight_data(tensors) + return batch.index_select(0, sort_order) + + batch = { + "id": straight_order([b["id"] for b in batches]), + "nsentences": sum(b["nsentences"] for b in batches), + "ntokens": sum(b["ntokens"] for b in batches), + "net_input": { + "src_tokens": straight_order( + [b["net_input"]["src_tokens"] for b in batches] + ), + "src_lengths": src_lengths, + }, + "target": straight_order([b["target"] for b in batches]) + if batches[0]["target"] is not None + else None, + } + if "prev_output_tokens" in batches[0]["net_input"]: + batch["net_input"]["prev_output_tokens"] = straight_order( + [b["net_input"]["prev_output_tokens"] for b in batches] + ) + if "src_lang_id" in batches[0]["net_input"]: + batch["net_input"]["src_lang_id"] = straight_order( + [b["net_input"]["src_lang_id"] for b in batches] + ) + if "tgt_lang_id" in batches[0]: + batch["tgt_lang_id"] = straight_order( + [b["tgt_lang_id"] for b in batches] + ) + return batch + + @property + def sizes(self): + if self._sizes is not None: + return self._sizes + start_time = time.time() + in_sub_dataset_indices = [ + self._cur_indices[ + 0 if i == 0 else self.cumulated_sizes[i - 1] : self.cumulated_sizes[i] + ] + for i in range(len(self.datasets)) + ] + sub_dataset_sizes = [ + d.sizes[indices] + for d, indices in zip(self.datasets, in_sub_dataset_indices) + ] + self._sizes = np.vstack(sub_dataset_sizes) + logger.info(f"sizes() calling time: {get_time_gap(start_time, time.time())}") + return self._sizes + + def ordered_indices(self): + if self.shuffle: + indices = np.random.permutation(len(self)) + else: + indices = np.arange(len(self)) + + sizes = self.sizes + tgt_sizes = sizes[:, 1] if len(sizes.shape) > 0 and sizes.shape[1] > 1 else None + src_sizes = ( + sizes[:, 0] if len(sizes.shape) > 0 and sizes.shape[1] > 1 else sizes + ) + + # sort by target length, then source length + if tgt_sizes is not None: + indices = indices[np.argsort(tgt_sizes[indices], kind="mergesort")] + sort_indices = indices[np.argsort(src_sizes[indices], kind="mergesort")] + return sort_indices + + def prefetch(self, indices): + prefetch_indices = [[] for _ in range(len(self.datasets))] + for i in indices: + ds_idx, ds_sample_idx = self._get_dataset_and_index(i) + prefetch_indices[ds_idx].append(ds_sample_idx) + for i in range(len(prefetch_indices)): + self.datasets[i].prefetch(prefetch_indices[i]) + + @property + def can_reuse_epoch_itr_across_epochs(self): + return False + + def set_epoch(self, epoch): + super().set_epoch(epoch) + if epoch == self._cur_epoch: + # re-enter so return + return + for d in self.datasets: + if hasattr(d, "set_epoch"): + d.set_epoch(epoch) + self._cur_epoch = epoch + self._establish_virtual_datasets() + + def _establish_virtual_datasets(self): + if self.sample_ratios is None and self._cur_indices is not None: + # not a samping dataset, no need to resample if indices are already established + return + self._reset_cached_properties() + + start_time = time.time() + # Generate a weighted sample of indices as a function of the + # random seed and the current epoch. 
+ rng = np.random.RandomState( + [ + int( + hashlib.sha1( + str(self.__class__.__name__).encode("utf-8") + ).hexdigest(), + 16, + ) + % (2 ** 32), + self.seed % (2 ** 32), # global seed + self._cur_epoch, # epoch index, + ] + ) + self._clean_if_not_none( + [self.cumulated_sizes, self.virtual_size_per_dataset, self._sizes] + ) + self._sizes = None + + indices, cumulated_sizes, virtual_size_per_dataset = self.get_virtual_indices( + rng, self.datasets, self.sample_ratios, self.virtual_size + ) + self._cur_indices = indices + self.cumulated_sizes = cumulated_sizes + self.virtual_size_per_dataset = virtual_size_per_dataset + + raw_sizes = [len(d) for d in self.datasets] + sampled_sizes = self.virtual_size_per_dataset + logger.info( + f"[{self.split}] Raw sizes: {str(dict(zip(self.keys, raw_sizes)))}; " + f"raw total size: {sum(raw_sizes)}" + ) + logger.info( + f"[{self.split}] Resampled sizes: {str(dict(zip(self.keys, sampled_sizes)))}; " + f"resampled total size: {sum(sampled_sizes)}" + ) + if self.sample_ratios is not None: + logger.info( + f"[{self.split}] Upsampling ratios: {str(dict(zip(self.keys, self.sample_ratios)))}" + ) + else: + logger.info(f"[{self.split}] A concat dataset") + logger.info( + f"[{self.split}] virtual dataset established time: {get_time_gap(start_time, time.time())}" + ) + + def filter_indices_by_size(self, indices, max_sizes): + """Filter a list of sample indices. Remove those that are longer + than specified in max_sizes. + + Args: + indices (np.array): original array of sample indices + max_sizes (int or list[int] or tuple[int]): max sample size, + can be defined separately for src and tgt (then list or tuple) + + Returns: + np.array: filtered sample array + list: list of removed indices + """ + sizes = self.sizes + tgt_sizes = sizes[:, 1] if len(sizes.shape) > 0 and sizes.shape[1] > 1 else None + src_sizes = ( + sizes[:, 0] if len(sizes.shape) > 0 and sizes.shape[1] > 1 else sizes + ) + + return data_utils.filter_paired_dataset_indices_by_size( + src_sizes, tgt_sizes, indices, max_sizes + ) diff --git a/SpeechT5/fairseq/fairseq/data/multilingual/sampled_multi_epoch_dataset.py b/SpeechT5/fairseq/fairseq/data/multilingual/sampled_multi_epoch_dataset.py new file mode 100644 index 0000000000000000000000000000000000000000..17387b2f85c0ee76db1a003091331b46de8d8def --- /dev/null +++ b/SpeechT5/fairseq/fairseq/data/multilingual/sampled_multi_epoch_dataset.py @@ -0,0 +1,199 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +import hashlib +import logging +import math + +import numpy as np +from fairseq.data import SampledMultiDataset + +from .sampled_multi_dataset import CollateFormat, default_virtual_size_func + + +logger = logging.getLogger(__name__) + + +class SampledMultiEpochDataset(SampledMultiDataset): + """Samples from multiple sub-datasets according to sampling ratios + using virtual epoch sizes to speed up dataloading. + Args: + datasets ( + List[~torch.utils.data.Dataset] + or OrderedDict[str, ~torch.utils.data.Dataset] + ): datasets + sampling_ratios (List[float]): list of probability of each dataset to be sampled + (default: None, which corresponds to concating all dataset together). + seed (int): RNG seed to use (default: 2). + epoch (int): starting epoch number (default: 1). + eval_key (str, optional): a key used at evaluation time that causes + this instance to pass-through batches from *datasets[eval_key]*. 
+ collate_format (CollateFormat): collater output format, either CollateFormat.ordered_dict or + CollateFormat.single (default: CollateFormat.single) where CollateFormat.single configures + the collater to output batches of data mixed from all sub-datasets, + and CollateFormat.ordered_dict configures the collater to output a dictionary of batches indexed by keys + of sub-datasets. + Note that not all sub-datasets will present in a single batch in both formats. + virtual_size (int, or callable): the expected virtual size of the dataset (default: default_virtual_size_func). + split (str): the split of the data, e.g. 'train', 'valid' or 'test'. + virtual_epoch_size (int): virtual epoch size, the dataset will go through the data by + this virtual epoch size one by one to speed up data loading, e.g. indicing and filtering + can be performed whenever a virtual epoch is loaded without waiting for the whole dataset to be loaded. + shared_collater (bool): whether or not to all sub-datasets have the same collater. + shard_epoch (int): the real epoch number for shard selection. + shuffle (bool): whether or not to shuffle data (default: True). + """ + + def __init__( + self, + datasets, + sampling_ratios=None, + seed=2, + epoch=1, + eval_key=None, + collate_format=CollateFormat.single, + virtual_size=default_virtual_size_func, + split="", + virtual_epoch_size=None, + shared_collater=False, + shard_epoch=1, + shuffle=True, + ): + self.virtual_epoch_size = virtual_epoch_size + self._current_epoch_start_index = None + self._random_global_indices = None + self.shard_epoch = shard_epoch if shard_epoch is not None else 1 + self.load_next_shard = None + self._epoch_sizes = None + super().__init__( + datasets=datasets, + sampling_ratios=sampling_ratios, + seed=seed, + epoch=epoch, + eval_key=eval_key, + collate_format=collate_format, + virtual_size=virtual_size, + split=split, + shared_collater=shared_collater, + shuffle=shuffle, + ) + + def _setup(self, epoch): + self.virtual_epoch_size = ( + self.virtual_epoch_size + if self.virtual_epoch_size is not None + else self.virtual_size + ) + if self.virtual_epoch_size > self.virtual_size: + logger.warning( + f"virtual epoch size {self.virtual_epoch_size} " + f"is greater than virtual dataset size {self.virtual_size}" + ) + self.virtual_epoch_size = self.virtual_size + self.num_virtual_epochs = math.ceil(self.virtual_size / self.virtual_epoch_size) + self._current_epoch_start_index = self._get_epoch_start_index(epoch) + logger.info( + f"virtual epoch size {self.virtual_epoch_size}; virtual dataset size {self.virtual_size}" + ) + + def _map_epoch_index_to_global(self, index): + index = self._current_epoch_start_index + index + # add randomness + return self._random_global_indices[index] + + @property + def sizes(self): + if self._epoch_sizes is not None: + return self._epoch_sizes + _sizes = super().sizes + indices = self._random_global_indices[ + self._current_epoch_start_index : self._current_epoch_start_index + + len(self) + ] + self._epoch_sizes = _sizes[indices] + # del super()._sizes to save memory + del self._sizes + self._sizes = None + return self._epoch_sizes + + def _get_dataset_and_index(self, index): + i = self._map_epoch_index_to_global(index) + return super()._get_dataset_and_index(i) + + def __len__(self): + return ( + self.virtual_epoch_size + if self._current_epoch_start_index + self.virtual_epoch_size + < self.virtual_size + else self.virtual_size - self._current_epoch_start_index + ) + + def set_epoch(self, epoch): + if 
self._current_epoch_start_index is None: + # initializing epoch idnices of a virtual dataset + self._setup(epoch) + self._next_virtual_epoch(epoch) + else: + # working on already intialized epoch indices + if epoch == self._cur_epoch: + # re-enter so return + return + self._next_virtual_epoch(epoch) + + def _get_epoch_start_index(self, epoch): + assert epoch >= 1 # fairseq is using 1-based epoch everywhere + return ((epoch - 1) % self.num_virtual_epochs) * self.virtual_epoch_size + + def _next_global_indices(self, epoch): + rng = np.random.RandomState( + [ + int( + hashlib.sha1( + str(self.__class__.__name__).encode("utf-8") + ).hexdigest(), + 16, + ) + % (2 ** 32), + self.seed % (2 ** 32), # global seed + epoch, # epoch index, + ] + ) + del self._random_global_indices + self._random_global_indices = rng.choice( + self.virtual_size, self.virtual_size, replace=False + ) + if self.load_next_shard is None: + self.load_next_shard = False + else: + # increase shard epoch for next loading + self.shard_epoch += 1 + self.load_next_shard = True + logger.info( + "to load next epoch/shard in next load_dataset: " + f"epoch={epoch}/shard_epoch={self.shard_epoch}" + ) + + def _next_virtual_epoch(self, epoch): + index = self._get_epoch_start_index(epoch) + if index == 0 or self._random_global_indices is None: + # need to start from the beginning, + # so call super().set_epoch(epoch) to establish the global virtual indices + logger.info( + "establishing a new set of global virtual indices for " + f"epoch={epoch}/shard_epoch={self.shard_epoch}" + ) + super().set_epoch(epoch) + self._next_global_indices(epoch) + else: + self._cur_epoch = epoch + + # reset cache sizes and ordered_indices for the epoch after moving to a new epoch + self._clean_if_not_none( + [ + self._epoch_sizes, + ] + ) + self._epoch_sizes = None + self._current_epoch_start_index = index diff --git a/SpeechT5/fairseq/fairseq/data/multilingual/sampling_method.py b/SpeechT5/fairseq/fairseq/data/multilingual/sampling_method.py new file mode 100644 index 0000000000000000000000000000000000000000..140c68f01d60e902ef88f11f30f8813dc15fc681 --- /dev/null +++ b/SpeechT5/fairseq/fairseq/data/multilingual/sampling_method.py @@ -0,0 +1,78 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. 
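The SampledMultiEpochDataset hunk above walks the virtual dataset in fixed-size virtual epochs: the start index wraps around via `((epoch - 1) % num_virtual_epochs) * virtual_epoch_size`, and only the last virtual epoch is shorter than `virtual_epoch_size`. A standalone sketch of that bookkeeping follows; the numbers are illustrative, the real values come from `--virtual-data-size` and `--virtual-epoch-size`.

```python
import math

# Illustrative sizes, not taken from the patch.
virtual_size = 10_000
virtual_epoch_size = 3_000
num_virtual_epochs = math.ceil(virtual_size / virtual_epoch_size)  # 4

def epoch_start_index(epoch: int) -> int:
    # fairseq epochs are 1-based; the start index wraps over the virtual epochs.
    assert epoch >= 1
    return ((epoch - 1) % num_virtual_epochs) * virtual_epoch_size

def epoch_length(epoch: int) -> int:
    # Every virtual epoch is full-sized except the last, which takes the remainder.
    start = epoch_start_index(epoch)
    return min(virtual_epoch_size, virtual_size - start)

for e in range(1, 6):
    print(e, epoch_start_index(e), epoch_length(e))
# 1 0 3000 / 2 3000 3000 / 3 6000 3000 / 4 9000 1000 / 5 0 3000 (wrapped)
```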
+ +import logging +from typing import List + + +logger = logging.getLogger(__name__) + + +def uniform(dataset_sizes: List[int]): + return [1.0] * len(dataset_sizes) + + +def temperature_sampling(dataset_sizes, temp): + total_size = sum(dataset_sizes) + return [(size / total_size) ** (1.0 / temp) for size in dataset_sizes] + + +def make_temperature_sampling(temp=1.0): + def sampling_func(dataset_sizes): + return temperature_sampling(dataset_sizes, temp) + + return sampling_func + + +def make_ratio_sampling(ratios): + def sampling_func(dataset_sizes): + return ratios + + return sampling_func + + +class SamplingMethod: + @staticmethod + def add_arguments(parser): + parser.add_argument( + "--sampling-method", + choices=[ + "uniform", + "temperature", + "concat", + "RoundRobin", + ], + type=str, + default="concat", + help="The method to sample data per language pairs", + ) + parser.add_argument( + "--sampling-temperature", + default=1.5, + type=float, + help="only work with --sampling-method temperature", + ) + + @staticmethod + def build_sampler(args, task): + return SamplingMethod(args, task) + + def __init__(self, args, task): + self.args = args + self.task = task + + def is_adaptive(self): + return False + + def sampling_method_selector(self): + args = self.args + logger.info(f"selected sampler: {args.sampling_method}") + if args.sampling_method == "uniform": + return uniform + elif args.sampling_method == "temperature" or self.is_adaptive(): + return make_temperature_sampling(float(args.sampling_temperature)) + else: + # default to concating all data set together + return None diff --git a/SpeechT5/fairseq/fairseq/data/nested_dictionary_dataset.py b/SpeechT5/fairseq/fairseq/data/nested_dictionary_dataset.py new file mode 100644 index 0000000000000000000000000000000000000000..52e74abddacc923c5e29b0a0c41d7efc85482d3b --- /dev/null +++ b/SpeechT5/fairseq/fairseq/data/nested_dictionary_dataset.py @@ -0,0 +1,125 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +from collections import OrderedDict + +import torch +from torch.utils.data.dataloader import default_collate + +from . import FairseqDataset + + +def _flatten(dico, prefix=None): + """Flatten a nested dictionary.""" + new_dico = OrderedDict() + if isinstance(dico, dict): + prefix = prefix + "." 
if prefix is not None else "" + for k, v in dico.items(): + if v is None: + continue + new_dico.update(_flatten(v, prefix + k)) + elif isinstance(dico, list): + for i, v in enumerate(dico): + new_dico.update(_flatten(v, prefix + ".[" + str(i) + "]")) + else: + new_dico = OrderedDict({prefix: dico}) + return new_dico + + +def _unflatten(dico): + """Unflatten a flattened dictionary into a nested dictionary.""" + new_dico = OrderedDict() + for full_k, v in dico.items(): + full_k = full_k.split(".") + node = new_dico + for k in full_k[:-1]: + if k.startswith("[") and k.endswith("]"): + k = int(k[1:-1]) + if k not in node: + node[k] = OrderedDict() + node = node[k] + node[full_k[-1]] = v + return new_dico + + +class NestedDictionaryDataset(FairseqDataset): + def __init__(self, defn, sizes=None): + super().__init__() + self.defn = _flatten(defn) + self.sizes = [sizes] if not isinstance(sizes, (list, tuple)) else sizes + + first = None + for v in self.defn.values(): + if not isinstance( + v, + ( + FairseqDataset, + torch.utils.data.Dataset, + ), + ): + raise ValueError("Expected Dataset but found: {}".format(v.__class__)) + first = first or v + if len(v) > 0: + assert len(v) == len(first), "dataset lengths must match" + + self._len = len(first) + + def __getitem__(self, index): + return OrderedDict((k, ds[index]) for k, ds in self.defn.items()) + + def __len__(self): + return self._len + + def collater(self, samples): + """Merge a list of samples to form a mini-batch. + + Args: + samples (List[dict]): samples to collate + + Returns: + dict: a mini-batch suitable for forwarding with a Model + """ + if len(samples) == 0: + return {} + sample = OrderedDict() + for k, ds in self.defn.items(): + try: + sample[k] = ds.collater([s[k] for s in samples]) + except NotImplementedError: + sample[k] = default_collate([s[k] for s in samples]) + return _unflatten(sample) + + def num_tokens(self, index): + """Return the number of tokens in a sample. This value is used to + enforce ``--max-tokens`` during batching.""" + return max(s[index] for s in self.sizes) + + def size(self, index): + """Return an example's size as a float or tuple. This value is used when + filtering a dataset with ``--max-positions``.""" + if len(self.sizes) == 1: + return self.sizes[0][index] + else: + return (s[index] for s in self.sizes) + + @property + def supports_prefetch(self): + """Whether this dataset supports prefetching.""" + return any(ds.supports_prefetch for ds in self.defn.values()) + + def prefetch(self, indices): + """Prefetch the data required for this epoch.""" + for ds in self.defn.values(): + if getattr(ds, "supports_prefetch", False): + ds.prefetch(indices) + + @property + def can_reuse_epoch_itr_across_epochs(self): + return all(ds.can_reuse_epoch_itr_across_epochs for ds in self.defn.values()) + + def set_epoch(self, epoch): + super().set_epoch(epoch) + for ds in self.defn.values(): + ds.set_epoch(epoch) diff --git a/SpeechT5/fairseq/fairseq/data/noising.py b/SpeechT5/fairseq/fairseq/data/noising.py new file mode 100644 index 0000000000000000000000000000000000000000..2b1cc347203bfbdc9f1cba29e2e36427b7b5be57 --- /dev/null +++ b/SpeechT5/fairseq/fairseq/data/noising.py @@ -0,0 +1,335 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. 
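The NestedDictionaryDataset hunk above keeps its definition as a flat OrderedDict whose dotted keys are produced by `_flatten` and rebuilt by `_unflatten`. The snippet below is a dict-only sketch of that key scheme, not part of the patch; the real `_flatten` also handles lists and skips `None` values.

```python
from collections import OrderedDict

def flatten(dico, prefix=None):
    # Dict-only sketch of _flatten(): nested keys become dotted paths.
    out = OrderedDict()
    if isinstance(dico, dict):
        prefix = prefix + "." if prefix is not None else ""
        for k, v in dico.items():
            out.update(flatten(v, prefix + k))
    else:
        out[prefix] = dico
    return out

nested = {"net_input": {"src_tokens": "ds1", "src_lengths": "ds2"}, "target": "ds3"}
print(flatten(nested))
# OrderedDict([('net_input.src_tokens', 'ds1'),
#              ('net_input.src_lengths', 'ds2'),
#              ('target', 'ds3')])
```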
+ +import numpy as np +import torch +from fairseq.data import data_utils + + +class WordNoising(object): + """Generate a noisy version of a sentence, without changing words themselves.""" + + def __init__(self, dictionary, bpe_cont_marker="@@", bpe_end_marker=None): + self.dictionary = dictionary + self.bpe_end = None + if bpe_cont_marker: + self.bpe_end = np.array( + [ + not self.dictionary[i].endswith(bpe_cont_marker) + for i in range(len(self.dictionary)) + ] + ) + elif bpe_end_marker: + self.bpe_end = np.array( + [ + self.dictionary[i].endswith(bpe_end_marker) + for i in range(len(self.dictionary)) + ] + ) + + self.get_word_idx = ( + self._get_bpe_word_idx if self.bpe_end is not None else self._get_token_idx + ) + + def noising(self, x, lengths, noising_prob=0.0): + raise NotImplementedError() + + def _get_bpe_word_idx(self, x): + """ + Given a list of BPE tokens, for every index in the tokens list, + return the index of the word grouping that it belongs to. + For example, for input x corresponding to ["how", "are", "y@@", "ou"], + return [[0], [1], [2], [2]]. + """ + # x: (T x B) + bpe_end = self.bpe_end[x] + + if x.size(0) == 1 and x.size(1) == 1: + # Special case when we only have one word in x. If x = [[N]], + # bpe_end is a scalar (bool) instead of a 2-dim array of bools, + # which makes the sum operation below fail. + return np.array([[0]]) + + # do a reduce front sum to generate word ids + word_idx = bpe_end[::-1].cumsum(0)[::-1] + word_idx = word_idx.max(0)[None, :] - word_idx + return word_idx + + def _get_token_idx(self, x): + """ + This is to extend noising functions to be able to apply to non-bpe + tokens, e.g. word or characters. + """ + x = torch.t(x) + word_idx = np.array([range(len(x_i)) for x_i in x]) + return np.transpose(word_idx) + + +class WordDropout(WordNoising): + """Randomly drop input words. If not passing blank_idx (default is None), + then dropped words will be removed. Otherwise, it will be replaced by the + blank_idx.""" + + def __init__( + self, + dictionary, + default_dropout_prob=0.1, + bpe_cont_marker="@@", + bpe_end_marker=None, + ): + super().__init__(dictionary, bpe_cont_marker, bpe_end_marker) + self.default_dropout_prob = default_dropout_prob + + def noising(self, x, lengths, dropout_prob=None, blank_idx=None): + if dropout_prob is None: + dropout_prob = self.default_dropout_prob + # x: (T x B), lengths: B + if dropout_prob == 0: + return x, lengths + + assert 0 < dropout_prob < 1 + + # be sure to drop entire words + word_idx = self.get_word_idx(x) + sentences = [] + modified_lengths = [] + for i in range(lengths.size(0)): + # Since dropout probabilities need to apply over non-pad tokens, + # it is not trivial to generate the keep mask without consider + # input lengths; otherwise, this could be done outside the loop + + # We want to drop whole words based on word_idx grouping + num_words = max(word_idx[:, i]) + 1 + + # ith example: [x0, x1, ..., eos, pad, ..., pad] + # We should only generate keep probs for non-EOS tokens. Thus if the + # input sentence ends in EOS, the last word idx is not included in + # the dropout mask generation and we append True to always keep EOS. + # Otherwise, just generate the dropout mask for all word idx + # positions. + has_eos = x[lengths[i] - 1, i] == self.dictionary.eos() + if has_eos: # has eos? 
+ keep = np.random.rand(num_words - 1) >= dropout_prob + keep = np.append(keep, [True]) # keep EOS symbol + else: + keep = np.random.rand(num_words) >= dropout_prob + + words = x[: lengths[i], i].tolist() + + # TODO: speed up the following loop + # drop words from the input according to keep + new_s = [ + w if keep[word_idx[j, i]] else blank_idx for j, w in enumerate(words) + ] + new_s = [w for w in new_s if w is not None] + # we need to have at least one word in the sentence (more than the + # start / end sentence symbols) + if len(new_s) <= 1: + # insert at beginning in case the only token left is EOS + # EOS should be at end of list. + new_s.insert(0, words[np.random.randint(0, len(words))]) + assert len(new_s) >= 1 and ( + not has_eos # Either don't have EOS at end or last token is EOS + or (len(new_s) >= 2 and new_s[-1] == self.dictionary.eos()) + ), "New sentence is invalid." + sentences.append(new_s) + modified_lengths.append(len(new_s)) + # re-construct input + modified_lengths = torch.LongTensor(modified_lengths) + modified_x = torch.LongTensor( + modified_lengths.max(), modified_lengths.size(0) + ).fill_(self.dictionary.pad()) + for i in range(modified_lengths.size(0)): + modified_x[: modified_lengths[i], i].copy_(torch.LongTensor(sentences[i])) + + return modified_x, modified_lengths + + +class WordShuffle(WordNoising): + """Shuffle words by no more than k positions.""" + + def __init__( + self, + dictionary, + default_max_shuffle_distance=3, + bpe_cont_marker="@@", + bpe_end_marker=None, + ): + super().__init__(dictionary, bpe_cont_marker, bpe_end_marker) + self.default_max_shuffle_distance = 3 + + def noising(self, x, lengths, max_shuffle_distance=None): + if max_shuffle_distance is None: + max_shuffle_distance = self.default_max_shuffle_distance + # x: (T x B), lengths: B + if max_shuffle_distance == 0: + return x, lengths + + # max_shuffle_distance < 1 will return the same sequence + assert max_shuffle_distance > 1 + + # define noise word scores + noise = np.random.uniform( + 0, + max_shuffle_distance, + size=(x.size(0), x.size(1)), + ) + noise[0] = -1 # do not move start sentence symbol + # be sure to shuffle entire words + word_idx = self.get_word_idx(x) + x2 = x.clone() + for i in range(lengths.size(0)): + length_no_eos = lengths[i] + if x[lengths[i] - 1, i] == self.dictionary.eos(): + length_no_eos = lengths[i] - 1 + # generate a random permutation + scores = word_idx[:length_no_eos, i] + noise[word_idx[:length_no_eos, i], i] + # ensure no reordering inside a word + scores += 1e-6 * np.arange(length_no_eos.item()) + permutation = scores.argsort() + # shuffle words + x2[:length_no_eos, i].copy_( + x2[:length_no_eos, i][torch.from_numpy(permutation)] + ) + return x2, lengths + + +class UnsupervisedMTNoising(WordNoising): + """ + Implements the default configuration for noising in UnsupervisedMT + (github.com/facebookresearch/UnsupervisedMT) + """ + + def __init__( + self, + dictionary, + max_word_shuffle_distance, + word_dropout_prob, + word_blanking_prob, + bpe_cont_marker="@@", + bpe_end_marker=None, + ): + super().__init__(dictionary) + self.max_word_shuffle_distance = max_word_shuffle_distance + self.word_dropout_prob = word_dropout_prob + self.word_blanking_prob = word_blanking_prob + + self.word_dropout = WordDropout( + dictionary=dictionary, + bpe_cont_marker=bpe_cont_marker, + bpe_end_marker=bpe_end_marker, + ) + self.word_shuffle = WordShuffle( + dictionary=dictionary, + bpe_cont_marker=bpe_cont_marker, + bpe_end_marker=bpe_end_marker, + ) + + def noising(self, x, 
lengths): + # 1. Word Shuffle + noisy_src_tokens, noisy_src_lengths = self.word_shuffle.noising( + x=x, + lengths=lengths, + max_shuffle_distance=self.max_word_shuffle_distance, + ) + # 2. Word Dropout + noisy_src_tokens, noisy_src_lengths = self.word_dropout.noising( + x=noisy_src_tokens, + lengths=noisy_src_lengths, + dropout_prob=self.word_dropout_prob, + ) + # 3. Word Blanking + noisy_src_tokens, noisy_src_lengths = self.word_dropout.noising( + x=noisy_src_tokens, + lengths=noisy_src_lengths, + dropout_prob=self.word_blanking_prob, + blank_idx=self.dictionary.unk(), + ) + + return noisy_src_tokens + + +class NoisingDataset(torch.utils.data.Dataset): + def __init__( + self, + src_dataset, + src_dict, + seed, + noiser=None, + noising_class=UnsupervisedMTNoising, + **kwargs + ): + """ + Wrap a :class:`~torch.utils.data.Dataset` and apply noise to the + samples based on the supplied noising configuration. + + Args: + src_dataset (~torch.utils.data.Dataset): dataset to wrap. + to build self.src_dataset -- + a LanguagePairDataset with src dataset as the source dataset and + None as the target dataset. Should NOT have padding so that + src_lengths are accurately calculated by language_pair_dataset + collate function. + We use language_pair_dataset here to encapsulate the tgt_dataset + so we can re-use the LanguagePairDataset collater to format the + batches in the structure that SequenceGenerator expects. + src_dict (~fairseq.data.Dictionary): source dictionary + seed (int): seed to use when generating random noise + noiser (WordNoising): a pre-initialized :class:`WordNoising` + instance. If this is None, a new instance will be created using + *noising_class* and *kwargs*. + noising_class (class, optional): class to use to initialize a + default :class:`WordNoising` instance. + kwargs (dict, optional): arguments to initialize the default + :class:`WordNoising` instance given by *noiser*. + """ + self.src_dataset = src_dataset + self.src_dict = src_dict + self.seed = seed + self.noiser = ( + noiser + if noiser is not None + else noising_class( + dictionary=src_dict, + **kwargs, + ) + ) + self.sizes = src_dataset.sizes + + + def __getitem__(self, index): + """ + Returns a single noisy sample. Multiple samples are fed to the collater + create a noising dataset batch. + """ + src_tokens = self.src_dataset[index] + src_lengths = torch.LongTensor([len(src_tokens)]) + src_tokens = src_tokens.unsqueeze(0) + + # Transpose src tokens to fit expected shape of x in noising function + # (batch size, sequence length) -> (sequence length, batch size) + src_tokens_t = torch.t(src_tokens) + + with data_utils.numpy_seed(self.seed + index): + noisy_src_tokens = self.noiser.noising(src_tokens_t, src_lengths) + + # Transpose back to expected src_tokens format + # (sequence length, 1) -> (1, sequence length) + noisy_src_tokens = torch.t(noisy_src_tokens) + return noisy_src_tokens[0] + + def __len__(self): + """ + The length of the noising dataset is the length of src. 
+ """ + return len(self.src_dataset) + + @property + def supports_prefetch(self): + return self.src_dataset.supports_prefetch + + def prefetch(self, indices): + if self.src_dataset.supports_prefetch: + self.src_dataset.prefetch(indices) diff --git a/SpeechT5/fairseq/fairseq/data/num_samples_dataset.py b/SpeechT5/fairseq/fairseq/data/num_samples_dataset.py new file mode 100644 index 0000000000000000000000000000000000000000..99a17495c701d8a05e0268f98bf453905e11d078 --- /dev/null +++ b/SpeechT5/fairseq/fairseq/data/num_samples_dataset.py @@ -0,0 +1,17 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +from . import FairseqDataset + + +class NumSamplesDataset(FairseqDataset): + def __getitem__(self, index): + return 1 + + def __len__(self): + return 0 + + def collater(self, samples): + return sum(samples) diff --git a/SpeechT5/fairseq/fairseq/data/numel_dataset.py b/SpeechT5/fairseq/fairseq/data/numel_dataset.py new file mode 100644 index 0000000000000000000000000000000000000000..ac86dfd2f1d89055de909656d61d6aca85523f00 --- /dev/null +++ b/SpeechT5/fairseq/fairseq/data/numel_dataset.py @@ -0,0 +1,31 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +import numpy as np +import torch + +from . import BaseWrapperDataset + + +class NumelDataset(BaseWrapperDataset): + def __init__(self, dataset, reduce=False): + super().__init__(dataset) + self.reduce = reduce + + def __getitem__(self, index): + item = self.dataset[index] + if torch.is_tensor(item): + return torch.numel(item) + else: + return np.size(item) + + def __len__(self): + return len(self.dataset) + + def collater(self, samples): + if self.reduce: + return sum(samples) + else: + return torch.tensor(samples) diff --git a/SpeechT5/fairseq/fairseq/data/offset_tokens_dataset.py b/SpeechT5/fairseq/fairseq/data/offset_tokens_dataset.py new file mode 100644 index 0000000000000000000000000000000000000000..6fabbdcdaa1a8f70d8d8c07db4cd53754503c194 --- /dev/null +++ b/SpeechT5/fairseq/fairseq/data/offset_tokens_dataset.py @@ -0,0 +1,15 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +from . import BaseWrapperDataset + + +class OffsetTokensDataset(BaseWrapperDataset): + def __init__(self, dataset, offset): + super().__init__(dataset) + self.offset = offset + + def __getitem__(self, idx): + return self.dataset[idx] + self.offset diff --git a/SpeechT5/fairseq/fairseq/data/pad_dataset.py b/SpeechT5/fairseq/fairseq/data/pad_dataset.py new file mode 100644 index 0000000000000000000000000000000000000000..8075bba6a9efc5f8421368ee0b2ae66afe3f5009 --- /dev/null +++ b/SpeechT5/fairseq/fairseq/data/pad_dataset.py @@ -0,0 +1,28 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +from fairseq.data import data_utils + +from . 
import BaseWrapperDataset + + +class PadDataset(BaseWrapperDataset): + def __init__(self, dataset, pad_idx, left_pad): + super().__init__(dataset) + self.pad_idx = pad_idx + self.left_pad = left_pad + + def collater(self, samples): + return data_utils.collate_tokens(samples, self.pad_idx, left_pad=self.left_pad) + + +class LeftPadDataset(PadDataset): + def __init__(self, dataset, pad_idx): + super().__init__(dataset, pad_idx, left_pad=True) + + +class RightPadDataset(PadDataset): + def __init__(self, dataset, pad_idx): + super().__init__(dataset, pad_idx, left_pad=False) diff --git a/SpeechT5/fairseq/fairseq/data/plasma_utils.py b/SpeechT5/fairseq/fairseq/data/plasma_utils.py new file mode 100644 index 0000000000000000000000000000000000000000..b9fab3b739db46b685fa6859a2f851a14eef8407 --- /dev/null +++ b/SpeechT5/fairseq/fairseq/data/plasma_utils.py @@ -0,0 +1,197 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + + +import subprocess +import json +import tempfile +import hashlib +from typing import Hashable + +try: + import pyarrow.plasma as plasma + + PYARROW_AVAILABLE = True +except ImportError: + plasma = None + PYARROW_AVAILABLE = False + + +class PlasmaArray: + """ + Wrapper around numpy arrays that automatically moves the data to shared + memory upon serialization. This is particularly helpful when passing numpy + arrays through multiprocessing, so that data is not unnecessarily + duplicated or pickled. + """ + + def __init__(self, array): + super().__init__() + self.array = array + self.disable = array.nbytes < 134217728 # disable for arrays <128MB + self.object_id = None + self.path = None + + # variables with underscores shouldn't be pickled + self._client = None + self._server = None + self._server_tmp = None + self._plasma = None + + @property + def plasma(self): + if self._plasma is None and not self.disable: + self._plasma = plasma + return self._plasma + + def start_server(self): + if self.plasma is None or self._server is not None: + return + assert self.object_id is None + assert self.path is None + self._server_tmp = tempfile.NamedTemporaryFile() + self.path = self._server_tmp.name + self._server = subprocess.Popen( + ["plasma_store", "-m", str(int(1.05 * self.array.nbytes)), "-s", self.path] + ) + + @property + def client(self): + if self._client is None: + assert self.path is not None + self._client = self.plasma.connect(self.path, num_retries=200) + return self._client + + def __getstate__(self): + """Called on pickle load""" + if self.plasma is None: + return self.__dict__ + if self.object_id is None: + self.start_server() + self.object_id = self.client.put(self.array) + state = self.__dict__.copy() + del state["array"] + state["_client"] = None + state["_server"] = None + state["_server_tmp"] = None + state["_plasma"] = None + return state + + def __setstate__(self, state): + """Called on pickle save""" + self.__dict__.update(state) + if self.plasma is None: + return + self.array = self.client.get(self.object_id) + + def __del__(self): + if self._server is not None: + self._server.kill() + self._server = None + self._server_tmp.close() + self._server_tmp = None + + +DEFAULT_PLASMA_PATH = "/tmp/plasma" + + +class PlasmaView: + """Interface to write and read from shared memory. 
Whereas PlasmaArray writes to plasma on serialization, + PlasmaView writes to shared memory on instantiation.""" + + def __init__(self, array, split_path: str, hash_data: Hashable, plasma_path=None): + """ + Args: + array: numpy array to store. This can be read with ``PlasmaView().array`` + split_path: the path whence the data was read, used for hashing + hash_data: other metadata about the array that can be used to create a unique key. + as of writing, the 3 callers in ``TokenBlockDataset`` use:: + + hash_data = ((block_size, document_sep_len, str(break_mode), len(dataset)), 0|1|2) + + + """ + assert PYARROW_AVAILABLE + assert split_path is not None + if plasma_path is None: + plasma_path = DEFAULT_PLASMA_PATH + + self.path = plasma_path + self.split_path = split_path + self._client = None # Initialize lazily for pickle. plasma clients should not be deep copied or serialized. + self._n = None + + self.object_id = self.get_object_id(self.split_path, hash_data) + try: + self.client.put(array, object_id=self.object_id) + except plasma.PlasmaObjectExists: + pass + + @property + def client(self): + if self._client is None: + self._client = plasma.connect(self.path, num_retries=200) + return self._client + + @property + def array(self): + """Fetch a read only view of an np.array, stored in plasma.""" + ret = self.client.get(self.object_id) + return ret + + @staticmethod + def get_object_id(split_path: str, hash_data: Hashable): + """Returns plasma.ObjectID from hashing split_path and object_num.""" + hash = hashlib.blake2b(bytes(split_path, "utf-8"), digest_size=20) + harg = json.dumps(hash_data).encode("utf-8") + hash.update(harg) + return plasma.ObjectID(hash.digest()) + + def __getstate__(self): + """Called on pickle save""" + self.disconnect() + state = self.__dict__.copy() + assert state["_client"] is None + assert "object_id" in state + return state + + def __setstate__(self, state): + """Called on pickle load""" + self.__dict__.update(state) + + def __del__(self): + self.disconnect() + + def disconnect(self): + if self._client is not None: + self._client.disconnect() + self._client = None + + def __len__(self): + """Save reads by caching len""" + if self._n is None: + self._n = len(self.array) + return self._n + + +GB100 = (1024 ** 3) * 100 + + +class PlasmaStore: + def __init__(self, path=DEFAULT_PLASMA_PATH, nbytes: int = GB100): + + self.server = self.start(path, nbytes) + + def __del__(self): + self.server.kill() + + @staticmethod + def start(path=DEFAULT_PLASMA_PATH, nbytes: int = GB100) -> subprocess.Popen: + if not PYARROW_AVAILABLE: + raise ImportError("please run pip install pyarrow to use --use_plasma_view") + # best practice is to allocate more space than we need. The limitation seems to be the size of /dev/shm + _server = subprocess.Popen(["plasma_store", "-m", str(nbytes), "-s", path]) + plasma.connect(path, num_retries=200) # If we can't connect we fail immediately + return _server diff --git a/SpeechT5/fairseq/fairseq/data/prepend_dataset.py b/SpeechT5/fairseq/fairseq/data/prepend_dataset.py new file mode 100644 index 0000000000000000000000000000000000000000..ad74784d2d7920e4a6225282d95543ce16ea50d9 --- /dev/null +++ b/SpeechT5/fairseq/fairseq/data/prepend_dataset.py @@ -0,0 +1,28 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +import numpy as np +import torch + +from . 
import BaseWrapperDataset + + +class PrependDataset(BaseWrapperDataset): + def __init__(self, dataset, prepend_getter, ensure_first_token_is=None): + super().__init__(dataset) + self.prepend_getter = prepend_getter + self.ensure_first_token = ensure_first_token_is + + def __getitem__(self, idx): + item = self.dataset[idx] + is_tuple = isinstance(item, tuple) + src = item[0] if is_tuple else item + + assert self.ensure_first_token is None or src[0] == self.ensure_first_token + prepend_idx = self.prepend_getter(self.dataset, idx) + assert isinstance(prepend_idx, int) + src[0] = prepend_idx + item = tuple((src,) + item[1:]) if is_tuple else src + return item diff --git a/SpeechT5/fairseq/fairseq/data/prepend_token_dataset.py b/SpeechT5/fairseq/fairseq/data/prepend_token_dataset.py new file mode 100644 index 0000000000000000000000000000000000000000..fd1331f4c44c1595eb9bb78baa0cf5cf3bcce9ad --- /dev/null +++ b/SpeechT5/fairseq/fairseq/data/prepend_token_dataset.py @@ -0,0 +1,41 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +import numpy as np +import torch + +from . import BaseWrapperDataset + + +class PrependTokenDataset(BaseWrapperDataset): + def __init__(self, dataset, token=None): + super().__init__(dataset) + self.token = token + if token is not None: + self._sizes = np.array(dataset.sizes) + 1 + else: + self._sizes = dataset.sizes + + def __getitem__(self, idx): + item = self.dataset[idx] + if self.token is not None: + item = torch.cat([item.new([self.token]), item]) + return item + + @property + def sizes(self): + return self._sizes + + def num_tokens(self, index): + n = self.dataset.num_tokens(index) + if self.token is not None: + n += 1 + return n + + def size(self, index): + n = self.dataset.size(index) + if self.token is not None: + n += 1 + return n diff --git a/SpeechT5/fairseq/fairseq/data/raw_label_dataset.py b/SpeechT5/fairseq/fairseq/data/raw_label_dataset.py new file mode 100644 index 0000000000000000000000000000000000000000..d054904f419bd64855d33a2a770b43f671c7c8d8 --- /dev/null +++ b/SpeechT5/fairseq/fairseq/data/raw_label_dataset.py @@ -0,0 +1,23 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +import torch + +from . import FairseqDataset + + +class RawLabelDataset(FairseqDataset): + def __init__(self, labels): + super().__init__() + self.labels = labels + + def __getitem__(self, index): + return self.labels[index] + + def __len__(self): + return len(self.labels) + + def collater(self, samples): + return torch.tensor(samples) diff --git a/SpeechT5/fairseq/fairseq/data/replace_dataset.py b/SpeechT5/fairseq/fairseq/data/replace_dataset.py new file mode 100644 index 0000000000000000000000000000000000000000..5aac2ba96bee0a8bb65f4c9e56fa0b17248ee1d9 --- /dev/null +++ b/SpeechT5/fairseq/fairseq/data/replace_dataset.py @@ -0,0 +1,36 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +from . 
import BaseWrapperDataset + + +class ReplaceDataset(BaseWrapperDataset): + """Replaces tokens found in the dataset by a specified replacement token + + Args: + dataset (~torch.utils.data.Dataset): dataset to replace tokens in + replace_map(Dictionary[int,int]): map of token to replace -> replacement token + offsets (List[int]): do not replace tokens before (from left if pos, right if neg) this offset. should be + as many as the number of objects returned by the underlying dataset __getitem__ method. + """ + + def __init__(self, dataset, replace_map, offsets): + super().__init__(dataset) + assert len(replace_map) > 0 + self.replace_map = replace_map + self.offsets = offsets + + def __getitem__(self, index): + item = self.dataset[index] + is_tuple = isinstance(item, tuple) + srcs = item if is_tuple else [item] + + for offset, src in zip(self.offsets, srcs): + for k, v in self.replace_map.items(): + src_off = src[offset:] if offset >= 0 else src[:offset] + src_off.masked_fill_(src_off == k, v) + + item = srcs if is_tuple else srcs[0] + return item diff --git a/SpeechT5/fairseq/fairseq/data/resampling_dataset.py b/SpeechT5/fairseq/fairseq/data/resampling_dataset.py new file mode 100644 index 0000000000000000000000000000000000000000..3d3b993164dc3962df48bacff26714328e843e80 --- /dev/null +++ b/SpeechT5/fairseq/fairseq/data/resampling_dataset.py @@ -0,0 +1,139 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +import logging + +import numpy as np +from fairseq.data import BaseWrapperDataset, plasma_utils + + +logger = logging.getLogger(__name__) + + +class ResamplingDataset(BaseWrapperDataset): + """Randomly samples from a given dataset at each epoch. + + Sampling is done with or without replacement, depending on the "replace" + parameter. + + Optionally, the epoch size can be rescaled. This is potentially desirable + to increase per-epoch coverage of the base dataset (since sampling with + replacement means that many items in the dataset will be left out). In the + case of sampling without replacement, size_ratio should be strictly less + than 1. + + Args: + dataset (~torch.utils.data.Dataset): dataset on which to sample. + weights (List[float]): list of probability weights + (default: None, which corresponds to uniform sampling). + replace (bool): sampling mode; True for "with replacement", or False + for "without replacement" (default: True) + size_ratio (float): the ratio to subsample to; must be positive + (default: 1.0). + batch_by_size (bool): whether or not to batch by sequence length + (default: True). + seed (int): RNG seed to use (default: 0). + epoch (int): starting epoch number (default: 1). 
+ """ + + def __init__( + self, + dataset, + weights=None, + replace=True, + size_ratio=1.0, + batch_by_size=True, + seed=0, + epoch=1, + ): + super().__init__(dataset) + + if weights is None: + self.weights = None + + else: + assert len(weights) == len(dataset) + weights_arr = np.array(weights, dtype=np.float64) + weights_arr /= weights_arr.sum() + self.weights = plasma_utils.PlasmaArray(weights_arr) + + self.replace = replace + + assert size_ratio > 0.0 + if not self.replace: + assert size_ratio < 1.0 + self.size_ratio = float(size_ratio) + self.actual_size = np.ceil(len(dataset) * self.size_ratio).astype(int) + + self.batch_by_size = batch_by_size + self.seed = seed + + self._cur_epoch = None + self._cur_indices = None + + self.set_epoch(epoch) + + def __getitem__(self, index): + return self.dataset[self._cur_indices.array[index]] + + def __len__(self): + return self.actual_size + + @property + def sizes(self): + if isinstance(self.dataset.sizes, list): + return [s[self._cur_indices.array] for s in self.dataset.sizes] + return self.dataset.sizes[self._cur_indices.array] + + def num_tokens(self, index): + return self.dataset.num_tokens(self._cur_indices.array[index]) + + def size(self, index): + return self.dataset.size(self._cur_indices.array[index]) + + def ordered_indices(self): + if self.batch_by_size: + order = [ + np.arange(len(self)), + self.sizes, + ] # No need to handle `self.shuffle == True` + return np.lexsort(order) + else: + return np.arange(len(self)) + + def prefetch(self, indices): + self.dataset.prefetch(self._cur_indices.array[indices]) + + @property + def can_reuse_epoch_itr_across_epochs(self): + return False + + def set_epoch(self, epoch): + logger.debug("ResamplingDataset.set_epoch: {}".format(epoch)) + super().set_epoch(epoch) + + if epoch == self._cur_epoch: + return + + self._cur_epoch = epoch + + # Generate a weighted sample of indices as a function of the + # random seed and the current epoch. + + rng = np.random.RandomState( + [ + 42, # magic number + self.seed % (2 ** 32), # global seed + self._cur_epoch, # epoch index + ] + ) + self._cur_indices = plasma_utils.PlasmaArray( + rng.choice( + len(self.dataset), + self.actual_size, + replace=self.replace, + p=(None if self.weights is None else self.weights.array), + ) + ) diff --git a/SpeechT5/fairseq/fairseq/data/roll_dataset.py b/SpeechT5/fairseq/fairseq/data/roll_dataset.py new file mode 100644 index 0000000000000000000000000000000000000000..a2915eeb3e8fb4dfb4b2bb33e0464ad0783d854c --- /dev/null +++ b/SpeechT5/fairseq/fairseq/data/roll_dataset.py @@ -0,0 +1,18 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +import torch + +from . import BaseWrapperDataset + + +class RollDataset(BaseWrapperDataset): + def __init__(self, dataset, shifts): + super().__init__(dataset) + self.shifts = shifts + + def __getitem__(self, index): + item = self.dataset[index] + return torch.roll(item, self.shifts) diff --git a/SpeechT5/fairseq/fairseq/data/round_robin_zip_datasets.py b/SpeechT5/fairseq/fairseq/data/round_robin_zip_datasets.py new file mode 100644 index 0000000000000000000000000000000000000000..2cb7447ea955a7c3ae7372f09ee426c08acd430e --- /dev/null +++ b/SpeechT5/fairseq/fairseq/data/round_robin_zip_datasets.py @@ -0,0 +1,160 @@ +# Copyright (c) Facebook, Inc. and its affiliates. 
+# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +import logging +from collections import OrderedDict +from typing import Dict, Sequence + +import numpy as np + +from . import FairseqDataset, LanguagePairDataset + +logger = logging.getLogger(__name__) + + +class RoundRobinZipDatasets(FairseqDataset): + """Zip multiple :class:`~fairseq.data.FairseqDataset` instances together. + + Shorter datasets are repeated in a round-robin fashion to match the length + of the longest one. + + Args: + datasets (Dict[~fairseq.data.FairseqDataset]): a dictionary of + :class:`~fairseq.data.FairseqDataset` instances. + eval_key (str, optional): a key used at evaluation time that causes + this instance to pass-through batches from *datasets[eval_key]*. + """ + + def __init__(self, datasets, eval_key=None): + super().__init__() + if isinstance(datasets, dict): + datasets = OrderedDict(datasets) + assert isinstance(datasets, OrderedDict) + assert datasets, "Can't make a RoundRobinZipDatasets out of nothing" + for dataset in datasets.values(): + assert isinstance(dataset, FairseqDataset) + + self.datasets = datasets + self.eval_key = eval_key + + self.longest_dataset_key = max(datasets, key=lambda k: len(datasets[k])) + self.longest_dataset = datasets[self.longest_dataset_key] + self._ordered_indices: Dict[str, Sequence[int]] = None + + def _map_index(self, key, index): + assert ( + self._ordered_indices is not None + ), "Must call RoundRobinZipDatasets.ordered_indices() first" + o = self._ordered_indices[key] + return o[index % len(o)] + + def __getitem__(self, index): + if self.eval_key is None: + return OrderedDict( + [ + (key, dataset[self._map_index(key, index)]) + for key, dataset in self.datasets.items() + ] + ) + else: + # at evaluation time it's useful to pass-through batches from a single key + return self.datasets[self.eval_key][self._map_index(self.eval_key, index)] + + def __len__(self): + if self._ordered_indices is not None: + return len(self._ordered_indices[self.longest_dataset_key]) + return len(self.longest_dataset) + + def collater(self, samples): + """Merge a list of samples to form a mini-batch.""" + if len(samples) == 0: + return None + if self.eval_key is None: + return OrderedDict( + [ + (key, dataset.collater([sample[key] for sample in samples])) + for key, dataset in self.datasets.items() + ] + ) + else: + # at evaluation time it's useful to pass-through batches from a single key + return self.datasets[self.eval_key].collater(samples) + + def num_tokens(self, index): + """Return an example's length (number of tokens), used for batching.""" + # TODO make it configurable whether to use max() or sum() here + return max( + dataset.num_tokens(self._map_index(key, index)) + for key, dataset in self.datasets.items() + ) + + def size(self, index): + """Return an example's size as a float or tuple. This value is used when + filtering a dataset with ``--max-positions``.""" + return { + key: dataset.size(self._map_index(key, index)) + for key, dataset in self.datasets.items() + } + + def ordered_indices(self): + """Ordered indices for batching.""" + if self._ordered_indices is None: + # Call the underlying dataset's ordered_indices() here, so that we + # get the same random ordering as we would have from using the + # underlying sub-datasets directly. 
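# Once these per-dataset orderings exist, _map_index() above reuses them
# modulo their length, which is what makes the shorter datasets repeat
# round-robin (e.g. index 3 of a 2-item sub-dataset maps to position 3 % 2 == 1).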
+ self._ordered_indices = OrderedDict( + [ + (key, dataset.ordered_indices()) + for key, dataset in self.datasets.items() + ] + ) + return np.arange(len(self)) + + def filter_indices_by_size(self, indices, max_positions=None): + """ + Filter each sub-dataset independently, then update the round robin to work + on the filtered sub-datasets. + """ + + def _deep_until_language_pair(dataset): + if isinstance(dataset, LanguagePairDataset): + return dataset + if hasattr(dataset, "tgt_dataset"): + return _deep_until_language_pair(dataset.tgt_dataset) + if hasattr(dataset, "dataset"): + return _deep_until_language_pair(dataset.dataset) + raise Exception(f"Don't know how to unwrap this dataset: {dataset}") + + if not isinstance(max_positions, dict): + max_positions = {k: max_positions for k in self.datasets.keys()} + ignored_some = False + for key, dataset in self.datasets.items(): + dataset = _deep_until_language_pair(dataset) + self._ordered_indices[key], ignored = dataset.filter_indices_by_size( + self._ordered_indices[key], max_positions[key] + ) + if len(ignored) > 0: + ignored_some = True + logger.warning( + f"{len(ignored)} samples from {key} have invalid sizes and will be skipped, " + f"max_positions={max_positions[key]}, first few sample ids={ignored[:10]}" + ) + # Since we are modifying in place the _ordered_indices, + # it's not possible anymore to return valid ignored indices. + # Hopefully the extra debug information print above should be enough to debug. + # Ideally we would receive ignore_invalid_inputs so that we could have + # a proper error message. + return (np.arange(len(self)), [0] if ignored_some else []) + + @property + def supports_prefetch(self): + return all( + getattr(dataset, "supports_prefetch", False) + for dataset in self.datasets.values() + ) + + def prefetch(self, indices): + for key, dataset in self.datasets.items(): + dataset.prefetch([self._map_index(key, index) for index in indices]) diff --git a/SpeechT5/fairseq/fairseq/data/shorten_dataset.py b/SpeechT5/fairseq/fairseq/data/shorten_dataset.py new file mode 100644 index 0000000000000000000000000000000000000000..6ebb5d88feb3f29d1512a0873df304915d051209 --- /dev/null +++ b/SpeechT5/fairseq/fairseq/data/shorten_dataset.py @@ -0,0 +1,78 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +import numpy as np +from fairseq.data import data_utils + +from . 
import BaseWrapperDataset + + +class TruncateDataset(BaseWrapperDataset): + """Truncate a sequence by returning the first truncation_length tokens""" + + def __init__(self, dataset, truncation_length): + super().__init__(dataset) + assert truncation_length is not None + self.truncation_length = truncation_length + self.dataset = dataset + + def __getitem__(self, index): + item = self.dataset[index] + item_len = item.size(0) + if item_len > self.truncation_length: + item = item[: self.truncation_length] + return item + + @property + def sizes(self): + return np.minimum(self.dataset.sizes, self.truncation_length) + + def __len__(self): + return len(self.dataset) + + +class RandomCropDataset(TruncateDataset): + """Truncate a sequence by returning a random crop of truncation_length tokens""" + + def __init__(self, dataset, truncation_length, seed=1): + super().__init__(dataset, truncation_length) + self.seed = seed + self.epoch = 0 + + @property + def can_reuse_epoch_itr_across_epochs(self): + return True # only the crop changes, not item sizes + + def set_epoch(self, epoch, **unused): + super().set_epoch(epoch) + self.epoch = epoch + + def __getitem__(self, index): + with data_utils.numpy_seed(self.seed, self.epoch, index): + item = self.dataset[index] + item_len = item.size(0) + excess = item_len - self.truncation_length + if excess > 0: + start_idx = np.random.randint(0, excess) + item = item[start_idx : start_idx + self.truncation_length] + return item + + +def maybe_shorten_dataset( + dataset, + split, + shorten_data_split_list, + shorten_method, + tokens_per_sample, + seed, +): + truncate_split = ( + split in shorten_data_split_list.split(",") or len(shorten_data_split_list) == 0 + ) + if shorten_method == "truncate" and truncate_split: + dataset = TruncateDataset(dataset, tokens_per_sample) + elif shorten_method == "random_crop" and truncate_split: + dataset = RandomCropDataset(dataset, tokens_per_sample, seed) + return dataset diff --git a/SpeechT5/fairseq/fairseq/data/sort_dataset.py b/SpeechT5/fairseq/fairseq/data/sort_dataset.py new file mode 100644 index 0000000000000000000000000000000000000000..b3890e7279e1f26db2e48ec0a91c639e9299d60f --- /dev/null +++ b/SpeechT5/fairseq/fairseq/data/sort_dataset.py @@ -0,0 +1,21 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +import numpy as np + +from . import BaseWrapperDataset + + +class SortDataset(BaseWrapperDataset): + def __init__(self, dataset, sort_order): + super().__init__(dataset) + if not isinstance(sort_order, (list, tuple)): + sort_order = [sort_order] + self.sort_order = sort_order + + assert all(len(so) == len(dataset) for so in sort_order) + + def ordered_indices(self): + return np.lexsort(self.sort_order) diff --git a/SpeechT5/fairseq/fairseq/data/strip_token_dataset.py b/SpeechT5/fairseq/fairseq/data/strip_token_dataset.py new file mode 100644 index 0000000000000000000000000000000000000000..cae39ba4d2f8106398eccd7eb0cf5c2194ec0db5 --- /dev/null +++ b/SpeechT5/fairseq/fairseq/data/strip_token_dataset.py @@ -0,0 +1,20 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +from . 
import BaseWrapperDataset + + +class StripTokenDataset(BaseWrapperDataset): + def __init__(self, dataset, id_to_strip): + super().__init__(dataset) + self.id_to_strip = id_to_strip + + def __getitem__(self, index): + item = self.dataset[index] + while len(item) > 0 and item[-1] == self.id_to_strip: + item = item[:-1] + while len(item) > 0 and item[0] == self.id_to_strip: + item = item[1:] + return item diff --git a/SpeechT5/fairseq/fairseq/data/subsample_dataset.py b/SpeechT5/fairseq/fairseq/data/subsample_dataset.py new file mode 100644 index 0000000000000000000000000000000000000000..48feaf883f87dc95f8637c24d3c96f3f9fd8bd1d --- /dev/null +++ b/SpeechT5/fairseq/fairseq/data/subsample_dataset.py @@ -0,0 +1,72 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +import logging + +import numpy as np + +from . import BaseWrapperDataset + + +logger = logging.getLogger(__name__) + + +class SubsampleDataset(BaseWrapperDataset): + """Subsamples a given dataset by a specified ratio. Subsampling is done on the number of examples + + Args: + dataset (~torch.utils.data.Dataset): dataset to subsample + size_ratio(float): the ratio to subsample to. must be between 0 and 1 (exclusive) + """ + + def __init__(self, dataset, size_ratio, shuffle=False): + super().__init__(dataset) + assert size_ratio < 1 + self.actual_size = np.ceil(len(dataset) * size_ratio).astype(int) + self.indices = np.random.choice( + list(range(len(self.dataset))), self.actual_size, replace=False + ) + self.shuffle = shuffle + logger.info( + "subsampled dataset from {} to {} (ratio={})".format( + len(self.dataset), self.actual_size, size_ratio + ) + ) + + def __getitem__(self, index): + return self.dataset[self.indices[index]] + + def __len__(self): + return self.actual_size + + def collater(self, samples): + return self.dataset.collater(samples) + + @property + def sizes(self): + return self.dataset.sizes[self.indices] + + @property + def name(self): + return self.dataset.name + + def num_tokens(self, index): + return self.dataset.num_tokens(self.indices[index]) + + def size(self, index): + return self.dataset.size(self.indices[index]) + + def ordered_indices(self): + """Return an ordered list of indices. Batches will be constructed based + on this order.""" + if self.shuffle: + order = [np.random.permutation(len(self))] + else: + order = [np.arange(len(self))] + order.append(self.sizes) + return np.lexsort(order) + + def prefetch(self, indices): + self.dataset.prefetch(self.indices[indices]) diff --git a/SpeechT5/fairseq/fairseq/data/token_block_dataset.py b/SpeechT5/fairseq/fairseq/data/token_block_dataset.py new file mode 100644 index 0000000000000000000000000000000000000000..d2c65fd7e058072911c3aa60bfc760288a0f83e5 --- /dev/null +++ b/SpeechT5/fairseq/fairseq/data/token_block_dataset.py @@ -0,0 +1,202 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +import numpy as np +import torch +from fairseq.data import FairseqDataset, plasma_utils +from fairseq.data.indexed_dataset import best_fitting_int_dtype +from typing import Tuple + + +class TokenBlockDataset(FairseqDataset): + """Break a Dataset of tokens into blocks. 
+ + Args: + dataset (~torch.utils.data.Dataset): dataset to break into blocks + sizes (List[int]): sentence lengths (required for 'complete' and 'eos') + block_size (int): maximum block size (ignored in 'eos' break mode) + break_mode (str, optional): Mode used for breaking tokens. Values can + be one of: + - 'none': break tokens into equally sized blocks (up to block_size) + - 'complete': break tokens into blocks (up to block_size) such that + blocks contains complete sentences, although block_size may be + exceeded if some sentences exceed block_size + - 'complete_doc': similar to 'complete' mode, but do not + cross document boundaries + - 'eos': each block contains one sentence (block_size is ignored) + include_targets (bool, optional): return next tokens as targets + (default: False). + document_sep_len (int, optional): document separator size (required for + 'complete_doc' break mode). Typically 1 if the sentences have eos + and 0 otherwise. + """ + + def __init__( + self, + dataset, + sizes, + block_size, + pad, + eos, + break_mode=None, + include_targets=False, + document_sep_len=1, + use_plasma_view=False, + split_path=None, + plasma_path=None, + ): + + super().__init__() + self.dataset = dataset + self.pad = pad + self.eos = eos + self.include_targets = include_targets + + assert len(dataset) > 0 + + assert len(dataset) == len(sizes) + _sizes, block_to_dataset_index, slice_indices = self._build_slice_indices( + sizes, break_mode, document_sep_len, block_size + ) + if use_plasma_view: + plasma_id = (block_size, document_sep_len, str(break_mode), len(dataset)) + self._slice_indices = plasma_utils.PlasmaView( + slice_indices, split_path, (plasma_id, 0), plasma_path=plasma_path + ) + self._sizes = plasma_utils.PlasmaView( + _sizes, split_path, (plasma_id, 1), plasma_path=plasma_path + ) + self._block_to_dataset_index = plasma_utils.PlasmaView( + block_to_dataset_index, split_path, (plasma_id, 2), plasma_path=plasma_path, + ) + else: + self._slice_indices = plasma_utils.PlasmaArray(slice_indices) + self._sizes = plasma_utils.PlasmaArray(_sizes) + self._block_to_dataset_index = plasma_utils.PlasmaArray( + block_to_dataset_index + ) + + @staticmethod + def _build_slice_indices( + sizes, break_mode, document_sep_len, block_size + ) -> Tuple[np.ndarray]: + """Use token_block_utils_fast to build arrays for indexing into self.dataset""" + try: + from fairseq.data.token_block_utils_fast import ( + _get_slice_indices_fast, + _get_block_to_dataset_index_fast, + ) + except ImportError: + raise ImportError( + "Please build Cython components with: `pip install --editable .` " + "or `python setup.py build_ext --inplace`" + ) + + if isinstance(sizes, list): + sizes = np.array(sizes, dtype=np.int64) + else: + if torch.is_tensor(sizes): + sizes = sizes.numpy() + sizes = sizes.astype(np.int64) + + break_mode = break_mode if break_mode is not None else "none" + + # For "eos" break-mode, block_size is not required parameters. 
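# An illustrative sketch of the fallback below (assuming TokenBlockDataset is
# re-exported from fairseq.data and the Cython helpers are built; a plain list
# of tensors stands in for a real indexed dataset): with break_mode="eos" each
# block is exactly one sentence, so block_size can simply be omitted.
#
#     >>> import torch
#     >>> from fairseq.data import TokenBlockDataset
#     >>> sents = [torch.LongTensor([10, 11, 2]),
#     ...          torch.LongTensor([12, 2]),
#     ...          torch.LongTensor([13, 14, 15, 2])]
#     >>> ds = TokenBlockDataset(sents, sizes=[3, 2, 4], block_size=None,
#     ...                        pad=1, eos=2, break_mode="eos")
#     >>> len(ds)
#     3
#     >>> ds[1]
#     tensor([12,  2])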
+ if break_mode == "eos" and block_size is None: + block_size = 0 + + slice_indices = _get_slice_indices_fast( + sizes, str(break_mode), block_size, document_sep_len + ) + _sizes = slice_indices[:, 1] - slice_indices[:, 0] + + # build index mapping block indices to the underlying dataset indices + if break_mode == "eos": + # much faster version for eos break mode + block_to_dataset_index = np.stack( + [ + np.arange(len(sizes)), # starting index in dataset + np.zeros( + len(sizes), dtype=np.compat.long + ), # starting offset within starting index + np.arange(len(sizes)), # ending index in dataset + ], + 1, + ) + else: + block_to_dataset_index = _get_block_to_dataset_index_fast( + sizes, slice_indices, + ) + size_dtype = np.uint16 if block_size < 65535 else np.uint32 + num_tokens = slice_indices[-1].max() + slice_indices_dtype = best_fitting_int_dtype(num_tokens) + slice_indices = slice_indices.astype(slice_indices_dtype) + _sizes = _sizes.astype(size_dtype) + block_to_dataset_index = block_to_dataset_index.astype(slice_indices_dtype) + return _sizes, block_to_dataset_index, slice_indices + + @property + def slice_indices(self): + return self._slice_indices.array + + @property + def sizes(self): + return self._sizes.array + + @property + def block_to_dataset_index(self): + return self._block_to_dataset_index.array + + def attr(self, attr: str, index: int): + start_ds_idx, _, _ = self.block_to_dataset_index[index] + return self.dataset.attr(attr, start_ds_idx) + + def __getitem__(self, index): + start_ds_idx, start_offset, end_ds_idx = self.block_to_dataset_index[index] + + buffer = torch.cat( + [self.dataset[idx] for idx in range(start_ds_idx, end_ds_idx + 1)] + ) + slice_s, slice_e = self.slice_indices[index] + length = slice_e - slice_s + s, e = start_offset, start_offset + length + item = buffer[s:e] + + if self.include_targets: + # *target* is the original sentence (=item) + # *source* is shifted right by 1 (maybe left-padded with eos) + # *past_target* is shifted right by 2 (left-padded as needed) + if s == 0: + source = torch.cat([item.new([self.eos]), buffer[0 : e - 1]]) + past_target = torch.cat( + [item.new([self.pad, self.eos]), buffer[0 : e - 2]] + ) + else: + source = buffer[s - 1 : e - 1] + if s == 1: + past_target = torch.cat([item.new([self.eos]), buffer[0 : e - 2]]) + else: + past_target = buffer[s - 2 : e - 2] + + return source, item, past_target + + return item + + def __len__(self): + return len(self.slice_indices) + + @property + def supports_prefetch(self): + return getattr(self.dataset, "supports_prefetch", False) + + def prefetch(self, indices): + self.dataset.prefetch( + { + ds_idx + for index in indices + for start_ds_idx, _, end_ds_idx in [self.block_to_dataset_index[index]] + for ds_idx in range(start_ds_idx, end_ds_idx + 1) + } + ) diff --git a/SpeechT5/fairseq/fairseq/data/token_block_utils_fast.pyx b/SpeechT5/fairseq/fairseq/data/token_block_utils_fast.pyx new file mode 100644 index 0000000000000000000000000000000000000000..08af4f30613a7b6ffa965a7c7084acabec8f8749 --- /dev/null +++ b/SpeechT5/fairseq/fairseq/data/token_block_utils_fast.pyx @@ -0,0 +1,187 @@ +# cython: language_level=3 +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. 
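# An illustrative sketch of the two most common break modes implemented in this
# module (assuming the extension has been compiled so that
# _get_slice_indices_fast is importable):
#
#     >>> import numpy as np
#     >>> from fairseq.data.token_block_utils_fast import _get_slice_indices_fast
#     >>> sizes = np.array([3, 4, 2], dtype=np.int64)
#     >>> # "none": fixed-size blocks that may split sentences
#     >>> _get_slice_indices_fast(sizes, "none", 5, 1).tolist()
#     [[0, 5], [5, 9]]
#     >>> # "complete": blocks only end on sentence boundaries
#     >>> _get_slice_indices_fast(sizes, "complete", 5, 1).tolist()
#     [[0, 3], [3, 7], [7, 9]]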
+ +import numpy as np +import torch +from itertools import chain +from libc.math cimport ceil + +cimport cython +cimport numpy as np + +from libc.stdint cimport int32_t, int64_t + +DTYPE = np.int64 +ctypedef int64_t DTYPE_t + + +@cython.boundscheck(False) +@cython.wraparound(False) +@cython.nonecheck(False) +cdef np.ndarray[DTYPE_t, ndim=2] _get_slice_indices_none_mode(np.ndarray[DTYPE_t, ndim=1] sizes, int block_size): + cdef DTYPE_t total_size = sizes.sum() + cdef DTYPE_t length = <DTYPE_t> ceil(total_size / <double> block_size) + cdef np.ndarray[DTYPE_t, ndim=2] slice_indices = np.zeros([length, 2], dtype=DTYPE) + cdef DTYPE_t[:, :] slice_indices_view = slice_indices + cdef DTYPE_t i + cdef DTYPE_t start + cdef DTYPE_t end + for i in range(length): + start = i * block_size + end = min(start + block_size, total_size) + slice_indices_view[i][0] = start + slice_indices_view[i][1] = end + return slice_indices + + +cdef np.ndarray[DTYPE_t, ndim=2] _fast_convert_to_np_array(list list_of_list): + """ + Faster function to convert DTYPE_t list of list. + Only fast when there are huge number of rows and low number of columns. + """ + cdef np.ndarray[DTYPE_t, ndim=1] flat = np.fromiter(chain.from_iterable(list_of_list), DTYPE, -1) + return flat.reshape((len(list_of_list), -1)) + + +@cython.boundscheck(False) +@cython.wraparound(False) +@cython.nonecheck(False) +cpdef np.ndarray[DTYPE_t, ndim=2] _get_slice_indices_fast(np.ndarray[DTYPE_t, ndim=1] sizes, str break_mode, int block_size, int document_sep_len): + cdef DTYPE_t tok_idx = 0 + cdef DTYPE_t sz_idx = 0 + cdef DTYPE_t curr_size = 0 + cdef DTYPE_t i = 0 + cdef DTYPE_t length + cdef DTYPE_t total_size + cdef DTYPE_t[:] sizes_view = sizes + cdef np.ndarray[DTYPE_t, ndim=2] slice_indices + cdef list slice_indices_list = [] + + if break_mode is None or break_mode == 'none': + slice_indices = _get_slice_indices_none_mode(sizes, block_size) + elif break_mode == 'complete': + while sz_idx < len(sizes_view): + if curr_size + sizes_view[sz_idx] <= block_size or curr_size == 0: + curr_size += sizes_view[sz_idx] + sz_idx += 1 + else: + slice_indices_list.append((tok_idx, tok_idx + curr_size)) + tok_idx += curr_size + curr_size = 0 + if curr_size > 0: + slice_indices_list.append((tok_idx, tok_idx + curr_size)) + slice_indices = _fast_convert_to_np_array(slice_indices_list) + elif break_mode == 'complete_doc': + while sz_idx < len(sizes_view): + if ( + (curr_size + sizes_view[sz_idx] <= block_size or curr_size == 0) + # an empty sentence indicates end-of-document: + and sizes_view[sz_idx] != document_sep_len + ): + curr_size += sizes_view[sz_idx] + sz_idx += 1 + else: + # Only keep non-empty documents. 
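# An illustrative trace of this branch (assuming the extension is compiled):
# with document_sep_len=1, sizes=[3, 2, 1, 4, 1] describes two documents
# separated by single-token "empty" sentences; the separators are skipped and
# never form a block of their own.
#
#     >>> import numpy as np
#     >>> from fairseq.data.token_block_utils_fast import _get_slice_indices_fast
#     >>> _get_slice_indices_fast(np.array([3, 2, 1, 4, 1], dtype=np.int64),
#     ...                         "complete_doc", 512, 1).tolist()
#     [[0, 5], [6, 10]]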
+ if curr_size > 1: + slice_indices_list.append((tok_idx, tok_idx + curr_size)) + tok_idx += curr_size + curr_size = 0 + if sizes_view[sz_idx] == document_sep_len: + tok_idx += sizes_view[sz_idx] + sz_idx += 1 + if curr_size > 1: + slice_indices_list.append((tok_idx, tok_idx + curr_size)) + slice_indices = _fast_convert_to_np_array(slice_indices_list) + elif break_mode == 'eos': + slice_indices = np.zeros((len(sizes), 2), dtype=DTYPE) + cumsum = sizes.cumsum(axis=0) + slice_indices[1:, 0] = cumsum[:cumsum.shape[0] - 1] + slice_indices[:, 1] = cumsum + else: + raise ValueError('Invalid break_mode: ' + break_mode) + return slice_indices + + +@cython.boundscheck(False) +@cython.wraparound(False) +@cython.nonecheck(False) +cpdef np.ndarray[DTYPE_t, ndim=2] _get_block_to_dataset_index_fast(np.ndarray[DTYPE_t, ndim=1] sizes, np.ndarray[DTYPE_t, ndim=2] slice_indices): + cdef DTYPE_t start_ds_idx + cdef DTYPE_t start_offset + cdef DTYPE_t end_ds_idx + cdef DTYPE_t i + cdef DTYPE_t s + cdef DTYPE_t e + cdef DatasetSearcher ds = DatasetSearcher(sizes) + cdef np.ndarray[DTYPE_t, ndim=2] block_to_dataset_index = np.zeros([len(slice_indices), 3], dtype=DTYPE) + cdef DTYPE_t[:, :] block_to_dataset_index_view = block_to_dataset_index + cdef DTYPE_t[:, :] slice_indices_view = slice_indices + cdef Py_ssize_t x_max = slice_indices.shape[0] + + for i in range(x_max): + s = slice_indices_view[i][0] + e = slice_indices_view[i][1] + ds.seek(s) + start_ds_idx = ds.current_index + start_offset = ds.current_offset + if e <= s: + end_ds_idx = start_ds_idx + else: + ds.seek(e - 1) + end_ds_idx = ds.current_index + block_to_dataset_index_view[i][0] = start_ds_idx # starting index in dataset + block_to_dataset_index_view[i][1] = start_offset # starting offset within starting index + block_to_dataset_index_view[i][2] = end_ds_idx # ending index in dataset + return block_to_dataset_index + + +cdef class DatasetSearcher(object): + """Helper for mapping "flat" indices to indices and offsets in an + underlying dataset.""" + cdef DTYPE_t current_i + cdef DTYPE_t current_offset + cdef DTYPE_t current_index + cdef DTYPE_t[:] sizes + + def __init__(self, DTYPE_t[:] sizes): + self.sizes = sizes + self.reset() + + cdef reset(self): + self.current_offset = 0 # offset within current index in underlying dataset + self.current_i = 0 # "flat" index + self.current_index = 0 # index in underlying dataset + + @cython.boundscheck(False) + @cython.wraparound(False) + @cython.nonecheck(False) + cdef int step(self, DTYPE_t i): + cdef DTYPE_t to_consume + cdef DTYPE_t remaining + if i < self.current_i: + self.reset() + if i > self.current_i: + to_consume = i - self.current_i + remaining = self.sizes[self.current_index] - self.current_offset + if remaining > to_consume: + self.current_offset += to_consume + self.current_i += to_consume + else: + assert remaining >= 0 + self.current_i += remaining + self.current_index += 1 + self.current_offset = 0 + return 1 + return 0 + + @cython.boundscheck(False) + @cython.wraparound(False) + @cython.nonecheck(False) + cdef seek(self, DTYPE_t i): + cdef int not_done = 1 + while not_done == 1: + not_done = self.step(i) + assert self.current_i == i diff --git a/SpeechT5/fairseq/fairseq/data/transform_eos_dataset.py b/SpeechT5/fairseq/fairseq/data/transform_eos_dataset.py new file mode 100644 index 0000000000000000000000000000000000000000..fb14ff018edf13b20f5d0e486692dfb0a37ec6d1 --- /dev/null +++ b/SpeechT5/fairseq/fairseq/data/transform_eos_dataset.py @@ -0,0 +1,120 @@ +# Copyright (c) Facebook, Inc. 
and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +import torch + +from . import FairseqDataset + + +class TransformEosDataset(FairseqDataset): + """A :class:`~fairseq.data.FairseqDataset` wrapper that appends/prepends/strips EOS. + + Note that the transformation is applied in :func:`collater`. + + Args: + dataset (~fairseq.data.FairseqDataset): dataset to wrap + eos (int): index of the end-of-sentence symbol + append_eos_to_src (bool, optional): append EOS to the end of src + remove_eos_from_src (bool, optional): remove EOS from the end of src + append_eos_to_tgt (bool, optional): append EOS to the end of tgt + remove_eos_from_tgt (bool, optional): remove EOS from the end of tgt + """ + + def __init__( + self, + dataset, + eos, + append_eos_to_src=False, + remove_eos_from_src=False, + append_eos_to_tgt=False, + remove_eos_from_tgt=False, + has_target=True, + ): + if not isinstance(dataset, FairseqDataset): + raise ValueError("dataset must be an instance of FairseqDataset") + if append_eos_to_src and remove_eos_from_src: + raise ValueError("cannot combine append_eos_to_src and remove_eos_from_src") + if append_eos_to_tgt and remove_eos_from_tgt: + raise ValueError("cannot combine append_eos_to_tgt and remove_eos_from_tgt") + + self.dataset = dataset + self.eos = torch.LongTensor([eos]) + self.append_eos_to_src = append_eos_to_src + self.remove_eos_from_src = remove_eos_from_src + self.append_eos_to_tgt = append_eos_to_tgt + self.remove_eos_from_tgt = remove_eos_from_tgt + self.has_target = has_target + + # precompute how we should adjust the reported sizes + self._src_delta = 0 + self._src_delta += 1 if append_eos_to_src else 0 + self._src_delta -= 1 if remove_eos_from_src else 0 + self._tgt_delta = 0 + self._tgt_delta += 1 if append_eos_to_tgt else 0 + self._tgt_delta -= 1 if remove_eos_from_tgt else 0 + + self._checked_src = False + self._checked_tgt = False + + def _check_src(self, src, expect_eos): + if not self._checked_src: + assert (src[-1] == self.eos[0]) == expect_eos + self._checked_src = True + + def _check_tgt(self, tgt, expect_eos): + if self.has_target and not self._checked_tgt: + assert (tgt[-1] == self.eos[0]) == expect_eos + self._checked_tgt = True + + def __getitem__(self, index): + return self.dataset[index] + + def __len__(self): + return len(self.dataset) + + def collater(self, samples): + def transform(item): + if self.append_eos_to_src: + self.eos = self.eos.to(device=item["source"].device) + self._check_src(item["source"], expect_eos=False) + item["source"] = torch.cat([item["source"], self.eos]) + if self.remove_eos_from_src: + self.eos = self.eos.to(device=item["source"].device) + self._check_src(item["source"], expect_eos=True) + item["source"] = item["source"][:-1] + if self.append_eos_to_tgt: + self.eos = self.eos.to(device=item["target"].device) + self._check_tgt(item["target"], expect_eos=False) + item["target"] = torch.cat([item["target"], self.eos]) + if self.remove_eos_from_tgt: + self.eos = self.eos.to(device=item["target"].device) + self._check_tgt(item["target"], expect_eos=True) + item["target"] = item["target"][:-1] + return item + + samples = list(map(transform, samples)) + return self.dataset.collater(samples) + + def num_tokens(self, index): + return self.dataset.num_tokens(index) + + def size(self, index): + if self.has_target: + src_len, tgt_len = self.dataset.size(index) + return (src_len + self._src_delta, tgt_len + self._tgt_delta) + else: + 
return self.dataset.size(index) + + def ordered_indices(self): + # NOTE: we assume that the ordering does not change based on the + # addition or removal of eos + return self.dataset.ordered_indices() + + @property + def supports_prefetch(self): + return getattr(self.dataset, "supports_prefetch", False) + + def prefetch(self, indices): + return self.dataset.prefetch(indices) diff --git a/SpeechT5/fairseq/fairseq/data/transform_eos_lang_pair_dataset.py b/SpeechT5/fairseq/fairseq/data/transform_eos_lang_pair_dataset.py new file mode 100644 index 0000000000000000000000000000000000000000..07ebdd5f3882b50cca39665715fd2b2af45f0825 --- /dev/null +++ b/SpeechT5/fairseq/fairseq/data/transform_eos_lang_pair_dataset.py @@ -0,0 +1,111 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + + +from typing import Optional + +import torch + +from . import FairseqDataset + + +class TransformEosLangPairDataset(FairseqDataset): + """A :class:`~fairseq.data.FairseqDataset` wrapper that transform bos on + collated samples of language pair dataset. + + Note that the transformation is applied in :func:`collater`. + + Args: + dataset (~fairseq.data.FairseqDataset): dataset that collates sample into + LanguagePairDataset schema + src_eos (int): original source end-of-sentence symbol index to be replaced + new_src_eos (int, optional): new end-of-sentence symbol index to replace source eos symbol + tgt_bos (int, optional): original target beginning-of-sentence symbol index to be replaced + new_tgt_bos (int, optional): new beginning-of-sentence symbol index to replace at the + beginning of 'prev_output_tokens' + """ + + def __init__( + self, + dataset: FairseqDataset, + src_eos: int, + new_src_eos: Optional[int] = None, + tgt_bos: Optional[int] = None, + new_tgt_bos: Optional[int] = None, + ): + self.dataset = dataset + self.src_eos = src_eos + self.new_src_eos = new_src_eos + self.tgt_bos = tgt_bos + self.new_tgt_bos = new_tgt_bos + + def __getitem__(self, index): + return self.dataset[index] + + def __len__(self): + return len(self.dataset) + + def collater(self, samples, **extra_args): + samples = self.dataset.collater(samples, **extra_args) + + if 'net_input' not in samples: + return samples + + if self.new_src_eos is not None: + if self.dataset.left_pad_source: + assert ( + samples["net_input"]["src_tokens"][:, -1] != self.src_eos + ).sum() == 0 + samples["net_input"]["src_tokens"][:, -1] = self.new_src_eos + else: + eos_idx = samples["net_input"]["src_lengths"] - 1 + assert ( + samples["net_input"]["src_tokens"][ + torch.arange(eos_idx.size(0)), eos_idx + ] + != self.src_eos + ).sum() == 0 + eos_idx = eos_idx.resize_(len(samples["net_input"]["src_lengths"]), 1) + samples["net_input"]["src_tokens"].scatter_( + 1, eos_idx, self.new_src_eos + ) + + if ( + self.new_tgt_bos is not None + and "prev_output_tokens" in samples["net_input"] + ): + if self.dataset.left_pad_target: + # TODO: support different padding direction on target side + raise NotImplementedError( + "TransformEosLangPairDataset does not implement --left-pad-target True option" + ) + else: + assert ( + samples["net_input"]["prev_output_tokens"][:, 0] != self.tgt_bos + ).sum() == 0 + samples["net_input"]["prev_output_tokens"][:, 0] = self.new_tgt_bos + + return samples + + def num_tokens(self, index): + return self.dataset.num_tokens(index) + + def size(self, index): + return self.dataset.size(index) + + @property + def 
sizes(self): + # dataset.sizes can be a dynamically computed sizes: + return self.dataset.sizes + + def ordered_indices(self): + return self.dataset.ordered_indices() + + @property + def supports_prefetch(self): + return getattr(self.dataset, "supports_prefetch", False) + + def prefetch(self, indices): + return self.dataset.prefetch(indices) diff --git a/SpeechT5/fairseq/fairseq/dataclass/__init__.py b/SpeechT5/fairseq/fairseq/dataclass/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..25408d28ec44cee56eb5fb3ab0c817dc04159e95 --- /dev/null +++ b/SpeechT5/fairseq/fairseq/dataclass/__init__.py @@ -0,0 +1,13 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +from .configs import FairseqDataclass +from .constants import ChoiceEnum + + +__all__ = [ + "FairseqDataclass", + "ChoiceEnum", +] diff --git a/SpeechT5/fairseq/fairseq/dataclass/configs.py b/SpeechT5/fairseq/fairseq/dataclass/configs.py new file mode 100644 index 0000000000000000000000000000000000000000..b0146fa4c7332c9f8b1f6bcff7977399dfc46f08 --- /dev/null +++ b/SpeechT5/fairseq/fairseq/dataclass/configs.py @@ -0,0 +1,990 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +import sys +from dataclasses import _MISSING_TYPE, dataclass, field +from typing import Any, List, Optional + +import torch + +from fairseq.dataclass.constants import ( + DATASET_IMPL_CHOICES, + DDP_BACKEND_CHOICES, + DDP_COMM_HOOK_CHOICES, + GENERATION_CONSTRAINTS_CHOICES, + GENERATION_DECODING_FORMAT_CHOICES, + LOG_FORMAT_CHOICES, + PIPELINE_CHECKPOINT_CHOICES, + PRINT_ALIGNMENT_CHOICES, + ZERO_SHARDING_CHOICES, +) + +from omegaconf import II, MISSING + + +@dataclass +class FairseqDataclass: + """fairseq base dataclass that supported fetching attributes and metas""" + + _name: Optional[str] = None + + @staticmethod + def name(): + return None + + def _get_all_attributes(self) -> List[str]: + return [k for k in self.__dataclass_fields__.keys()] + + def _get_meta( + self, attribute_name: str, meta: str, default: Optional[Any] = None + ) -> Any: + return self.__dataclass_fields__[attribute_name].metadata.get(meta, default) + + def _get_name(self, attribute_name: str) -> str: + return self.__dataclass_fields__[attribute_name].name + + def _get_default(self, attribute_name: str) -> Any: + if hasattr(self, attribute_name): + if str(getattr(self, attribute_name)).startswith("${"): + return str(getattr(self, attribute_name)) + elif str(self.__dataclass_fields__[attribute_name].default).startswith( + "${" + ): + return str(self.__dataclass_fields__[attribute_name].default) + elif ( + getattr(self, attribute_name) + != self.__dataclass_fields__[attribute_name].default + ): + return getattr(self, attribute_name) + + f = self.__dataclass_fields__[attribute_name] + if not isinstance(f.default_factory, _MISSING_TYPE): + return f.default_factory() + return f.default + + def _get_type(self, attribute_name: str) -> Any: + return self.__dataclass_fields__[attribute_name].type + + def _get_help(self, attribute_name: str) -> Any: + return self._get_meta(attribute_name, "help") + + def _get_argparse_const(self, attribute_name: str) -> Any: + return self._get_meta(attribute_name, "argparse_const") + + def _get_argparse_alias(self, attribute_name: str) -> Any: + return 
self._get_meta(attribute_name, "argparse_alias") + + def _get_choices(self, attribute_name: str) -> Any: + return self._get_meta(attribute_name, "choices") + + +@dataclass +class CommonConfig(FairseqDataclass): + # This is the core dataclass including common parameters shared by all different jobs. Please append your params to other dataclasses if they were + # used for a particular purpose or task, such as those dedicated for `distributed training`, `optimization`, etc. + no_progress_bar: bool = field( + default=False, metadata={"help": "disable progress bar"} + ) + log_interval: int = field( + default=100, + metadata={ + "help": "log progress every N batches (when progress bar is disabled)" + }, + ) + log_format: Optional[LOG_FORMAT_CHOICES] = field( + default=None, metadata={"help": "log format to use"} + ) + log_file: Optional[str] = field( + default=None, metadata={"help": "log file to copy metrics to."} + ) + tensorboard_logdir: Optional[str] = field( + default=None, + metadata={ + "help": "path to save logs for tensorboard, should match --logdir " + "of running tensorboard (default: no tensorboard logging)" + }, + ) + wandb_project: Optional[str] = field( + default=None, + metadata={"help": "Weights and Biases project name to use for logging"}, + ) + azureml_logging: Optional[bool] = field( + default=False, metadata={"help": "Log scalars to AzureML context"}, + ) + seed: int = field( + default=1, metadata={"help": "pseudo random number generator seed"} + ) + cpu: bool = field(default=False, metadata={"help": "use CPU instead of CUDA"}) + tpu: bool = field(default=False, metadata={"help": "use TPU instead of CUDA"}) + bf16: bool = field(default=False, metadata={"help": "use bfloat16; implies --tpu"}) + memory_efficient_bf16: bool = field( + default=False, + metadata={ + "help": "use a memory-efficient version of BF16 training; implies --bf16" + }, + ) + fp16: bool = field(default=False, metadata={"help": "use FP16"}) + memory_efficient_fp16: bool = field( + default=False, + metadata={ + "help": "use a memory-efficient version of FP16 training; implies --fp16" + }, + ) + fp16_no_flatten_grads: bool = field( + default=False, metadata={"help": "don't flatten FP16 grads tensor"} + ) + fp16_init_scale: int = field( + default=2 ** 7, metadata={"help": "default FP16 loss scale"} + ) + fp16_scale_window: Optional[int] = field( + default=None, + metadata={"help": "number of updates before increasing loss scale"}, + ) + fp16_scale_tolerance: float = field( + default=0.0, + metadata={ + "help": "pct of updates that can overflow before decreasing the loss scale" + }, + ) + on_cpu_convert_precision: bool = field( + default=False, + metadata={ + "help": "if set, the floating point conversion to fp16/bf16 runs on CPU. " + "This reduces bus transfer time and GPU memory usage." 
+ } + ) + min_loss_scale: float = field( + default=1e-4, + metadata={"help": "minimum FP16/AMP loss scale, after which training is stopped"}, + ) + threshold_loss_scale: Optional[float] = field( + default=None, metadata={"help": "threshold FP16 loss scale from below"} + ) + amp: bool = field(default=False, metadata={"help": "use automatic mixed precision"}) + amp_batch_retries: int = field( + default=2, + metadata={"help": "number of retries of same batch after reducing loss scale with AMP"}, + ) + amp_init_scale: int = field( + default=2 ** 7, metadata={"help": "default AMP loss scale"} + ) + amp_scale_window: Optional[int] = field( + default=None, + metadata={"help": "number of updates before increasing AMP loss scale"}, + ) + user_dir: Optional[str] = field( + default=None, + metadata={ + "help": "path to a python module containing custom extensions (tasks and/or architectures)" + }, + ) + empty_cache_freq: int = field( + default=0, + metadata={"help": "how often to clear the PyTorch CUDA cache (0 to disable)"}, + ) + all_gather_list_size: int = field( + default=16384, + metadata={"help": "number of bytes reserved for gathering stats from workers"}, + ) + model_parallel_size: int = field( + default=1, metadata={"help": "total number of GPUs to parallelize model over"} + ) + quantization_config_path: Optional[str] = field( + default=None, metadata={"help": "path to quantization config file"} + ) + profile: bool = field( + default=False, metadata={"help": "enable autograd profiler emit_nvtx"} + ) + reset_logging: bool = field( + default=False, + metadata={ + "help": "when using Hydra, reset the logging at the beginning of training" + }, + ) + suppress_crashes: bool = field( + default=False, + metadata={ + "help": "suppress crashes when training with the hydra_train entry point so that the " + "main method can return a value (useful for sweeps)" + }, + ) + use_plasma_view: bool = field( + default=False, metadata={"help": "Store indices and sizes in shared memory"} + ) + plasma_path: Optional[str] = field( + default="/tmp/plasma", + metadata={ + "help": "path to run plasma_store, defaults to /tmp/plasma. Paths outside /tmp tend to fail." 
+ }, + ) + + +@dataclass +class DistributedTrainingConfig(FairseqDataclass): + distributed_world_size: int = field( + default=max(1, torch.cuda.device_count()), + metadata={ + "help": "total number of GPUs across all nodes (default: all visible GPUs)" + }, + ) + distributed_num_procs: Optional[int] = field( + default=max(1, torch.cuda.device_count()), + metadata={ + "help": "total number of processes to fork (default: all visible GPUs)" + }, + ) + distributed_rank: Optional[int] = field( + default=0, metadata={"help": "rank of the current worker"} + ) + distributed_backend: str = field( + default="nccl", metadata={"help": "distributed backend"} + ) + distributed_init_method: Optional[str] = field( + default=None, + metadata={ + "help": "typically tcp://hostname:port that will be used to " + "establish initial connetion" + }, + ) + distributed_port: int = field( + default=-1, + metadata={ + "help": "port number (not required if using --distributed-init-method)" + }, + ) + device_id: int = field( + default=0, + metadata={ + "help": "which GPU to use (usually configured automatically)", + "argparse_alias": "--local_rank", + }, + ) + distributed_no_spawn: bool = field( + default=False, + metadata={ + "help": "do not spawn multiple processes even if multiple GPUs are visible" + }, + ) + ddp_backend: DDP_BACKEND_CHOICES = field( + default="pytorch_ddp", metadata={"help": "DistributedDataParallel backend"} + ) + ddp_comm_hook: DDP_COMM_HOOK_CHOICES = field( + default="none", metadata={"help": "communication hook"} + ) + bucket_cap_mb: int = field( + default=25, metadata={"help": "bucket size for reduction"} + ) + fix_batches_to_gpus: bool = field( + default=False, + metadata={ + "help": "don't shuffle batches between GPUs; this reduces overall " + "randomness and may affect precision but avoids the cost of re-reading the data" + }, + ) + find_unused_parameters: bool = field( + default=False, + metadata={ + "help": "disable unused parameter detection (not applicable to " + "--ddp-backend=legacy_ddp)" + }, + ) + fast_stat_sync: bool = field( + default=False, + metadata={"help": "[deprecated] this is now defined per Criterion"}, + ) + heartbeat_timeout: int = field( + default=-1, + metadata={ + "help": "kill the job if no progress is made in N seconds; " + "set to -1 to disable" + }, + ) + broadcast_buffers: bool = field( + default=False, + metadata={ + "help": "Copy non-trainable parameters between GPUs, such as " + "batchnorm population statistics" + }, + ) + slowmo_momentum: Optional[float] = field( + default=None, + metadata={ + "help": "SlowMo momentum term; by default use 0.0 for 16 GPUs, " + "0.2 for 32 GPUs; 0.5 for 64 GPUs, 0.6 for > 64 GPUs" + }, + ) + slowmo_algorithm: str = field( + default="LocalSGD", metadata={"help": "whether to use LocalSGD or SGP"} + ) + localsgd_frequency: int = field( + default=3, metadata={"help": "Local SGD allreduce frequency"} + ) + nprocs_per_node: int = field( + default=max(1, torch.cuda.device_count()), + metadata={ + "help": "number of GPUs in each node. An allreduce operation across GPUs in " + "a node is very fast. Hence, we do allreduce across GPUs in a node, " + "and gossip across different nodes" + }, + ) + pipeline_model_parallel: bool = field( + default=False, + metadata={"help": "if set, use pipeline model parallelism across GPUs"}, + ) + pipeline_balance: Optional[str] = field( + default=None, + metadata={ + "help": "partition the model into N_K pieces, where each piece " + "contains N_i layers. 
The sum(args.pipeline_balance) " + "should equal the total number of layers in the model" + }, + ) + pipeline_devices: Optional[str] = field( + default=None, + metadata={ + "help": "a list of device indices indicating which device to place " + "each of the N_K partitions. The length of this list should " + "equal the length of the --pipeline-balance argument" + }, + ) + pipeline_chunks: Optional[int] = field( + default=0, metadata={"help": "microbatch count for pipeline model parallelism"} + ) + pipeline_encoder_balance: Optional[str] = field( + default=None, + metadata={ + "help": "partition the pipeline parallel encoder into N_K pieces, where each piece " + "contains N_i layers. The sum(args.pipeline_encoder_balance) " + "should equal the total number of encoder layers in the model" + }, + ) + pipeline_encoder_devices: Optional[str] = field( + default=None, + metadata={ + "help": "a list of device indices indicating which device to place " + "each of the N_K partitions. The length of this list should " + "equal the length of the --pipeline-encoder-balance argument" + }, + ) + pipeline_decoder_balance: Optional[str] = field( + default=None, + metadata={ + "help": "partition the pipeline parallel decoder into N_K pieces, where each piece " + "contains N_i layers. The sum(args.pipeline_decoder_balance) " + "should equal the total number of decoder layers in the model" + }, + ) + pipeline_decoder_devices: Optional[str] = field( + default=None, + metadata={ + "help": "a list of device indices indicating which device to place " + "each of the N_K partitions. The length of this list should " + "equal the length of the --pipeline-decoder-balance argument" + }, + ) + pipeline_checkpoint: PIPELINE_CHECKPOINT_CHOICES = field( + default="never", + metadata={"help": "checkpointing mode for pipeline model parallelism"}, + ) + zero_sharding: ZERO_SHARDING_CHOICES = field( + default="none", metadata={"help": "ZeRO sharding"} + ) + fp16: bool = II("common.fp16") + memory_efficient_fp16: bool = II("common.memory_efficient_fp16") + tpu: bool = II("common.tpu") + # configuration for --ddp-backend=fully_sharded + no_reshard_after_forward: bool = field( + default=False, metadata={"help": "don't reshard parameters after forward pass"}, + ) + fp32_reduce_scatter: bool = field( + default=False, metadata={"help": "reduce-scatter grads in FP32"}, + ) + cpu_offload: bool = field( + default=False, metadata={"help": "offload FP32 params to CPU"} + ) + use_sharded_state: bool = field( + default=False, metadata={"help": "use sharded checkpoint files"}, + ) + + +@dataclass +class DatasetConfig(FairseqDataclass): + num_workers: int = field( + default=1, metadata={"help": "how many subprocesses to use for data loading"} + ) + skip_invalid_size_inputs_valid_test: bool = field( + default=False, + metadata={"help": "ignore too long or too short lines in valid and test set"}, + ) + max_tokens: Optional[int] = field( + default=None, metadata={"help": "maximum number of tokens in a batch"} + ) + batch_size: Optional[int] = field( + default=None, + metadata={ + "help": "number of examples in a batch", + "argparse_alias": "--max-sentences", + }, + ) + required_batch_size_multiple: int = field( + default=8, metadata={"help": "batch size will be a multiplier of this value"} + ) + required_seq_len_multiple: int = field( + default=1, + metadata={ + "help": "maximum sequence length in batch will be a multiplier of this value" + }, + ) + dataset_impl: Optional[DATASET_IMPL_CHOICES] = field( + default=None, metadata={"help": "output 
dataset implementation"} + ) + data_buffer_size: int = field( + default=10, metadata={"help": "Number of batches to preload"} + ) + train_subset: str = field( + default="train", + metadata={"help": "data subset to use for training (e.g. train, valid, test)"}, + ) + valid_subset: str = field( + default="valid", + metadata={ + "help": "comma separated list of data subsets to use for validation" + " (e.g. train, valid, test)" + }, + ) + combine_valid_subsets: Optional[bool] = field( + default=None, + metadata={ + "help": "comma separated list of data subsets to use for validation" + " (e.g. train, valid, test)", + "argparse_alias": "--combine-val", + }, + ) + ignore_unused_valid_subsets: Optional[bool] = field( + default=False, + metadata={"help": "do not raise error if valid subsets are ignored"}, + ) + + validate_interval: int = field( + default=1, metadata={"help": "validate every N epochs"} + ) + validate_interval_updates: int = field( + default=0, metadata={"help": "validate every N updates"} + ) + validate_after_updates: int = field( + default=0, metadata={"help": "dont validate until reaching this many updates"} + ) + fixed_validation_seed: Optional[int] = field( + default=None, metadata={"help": "specified random seed for validation"} + ) + disable_validation: bool = field( + default=False, metadata={"help": "disable validation"} + ) + max_tokens_valid: Optional[int] = field( + default=II("dataset.max_tokens"), + metadata={ + "help": "maximum number of tokens in a validation batch" + " (defaults to --max-tokens)" + }, + ) + batch_size_valid: Optional[int] = field( + default=II("dataset.batch_size"), + metadata={ + "help": "batch size of the validation batch (defaults to --batch-size)", + "argparse_alias": "--max-sentences-valid", + }, + ) + max_valid_steps: Optional[int] = field(default=None, metadata={'help': 'How many batches to evaluate', + "argparse_alias": "--nval"}) + curriculum: int = field( + default=0, metadata={"help": "don't shuffle batches for first N epochs"} + ) + gen_subset: str = field( + default="test", + metadata={"help": "data subset to generate (train, valid, test)"}, + ) + num_shards: int = field( + default=1, metadata={"help": "shard generation over N shards"} + ) + shard_id: int = field( + default=0, metadata={"help": "id of the shard to generate (id < num_shards)"} + ) + + +@dataclass +class OptimizationConfig(FairseqDataclass): + max_epoch: int = field( + default=0, metadata={"help": "force stop training at specified epoch"} + ) + max_update: int = field( + default=0, metadata={"help": "force stop training at specified update"} + ) + stop_time_hours: float = field( + default=0, + metadata={ + "help": "force stop training after specified cumulative time (if >0)" + }, + ) + clip_norm: float = field( + default=0.0, metadata={"help": "clip threshold of gradients"} + ) + sentence_avg: bool = field( + default=False, + metadata={ + "help": "normalize gradients by the number of sentences in a batch" + " (default is to normalize by number of tokens)" + }, + ) + update_freq: List[int] = field( + default_factory=lambda: [1], + metadata={"help": "update parameters every N_i batches, when in epoch i"}, + ) + lr: List[float] = field( + default_factory=lambda: [0.25], + metadata={ + "help": "learning rate for the first N epochs; all epochs >N using LR_N" + " (note: this may be interpreted differently depending on --lr-scheduler)" + }, + ) + stop_min_lr: float = field( + default=-1.0, + metadata={"help": "stop training when the learning rate reaches this minimum"}, + ) + 
use_bmuf: bool = field( + default=False, + metadata={ + "help": "specify global optimizer for syncing models on different GPUs/shards" + }, + ) + + +@dataclass +class CheckpointConfig(FairseqDataclass): + save_dir: str = field( + default="checkpoints", metadata={"help": "path to save checkpoints"} + ) + restore_file: str = field( + default="checkpoint_last.pt", + metadata={ + "help": "filename from which to load checkpoint " + "(default: <save-dir>/checkpoint_last.pt" + }, + ) + finetune_from_model: Optional[str] = field( + default=None, + metadata={ + "help": "finetune from a pretrained model; note that meters and lr scheduler will be reset" + }, + ) + reset_dataloader: bool = field( + default=False, + metadata={ + "help": "if set, does not reload dataloader state from the checkpoint" + }, + ) + reset_lr_scheduler: bool = field( + default=False, + metadata={ + "help": "if set, does not load lr scheduler state from the checkpoint" + }, + ) + reset_meters: bool = field( + default=False, + metadata={"help": "if set, does not load meters from the checkpoint"}, + ) + reset_optimizer: bool = field( + default=False, + metadata={"help": "if set, does not load optimizer state from the checkpoint"}, + ) + optimizer_overrides: str = field( + default="{}", + metadata={ + "help": "a dictionary used to override optimizer args when loading a checkpoint" + }, + ) + save_interval: int = field( + default=1, metadata={"help": "save a checkpoint every N epochs"} + ) + save_interval_updates: int = field( + default=0, metadata={"help": "save a checkpoint (and validate) every N updates"} + ) + keep_interval_updates: int = field( + default=-1, + metadata={ + "help": "keep the last N checkpoints saved with --save-interval-updates" + }, + ) + keep_interval_updates_pattern: int = field( + default=-1, + metadata={ + "help": "when used with --keep-interval-updates, skips deleting " + "any checkpoints with update X where " + "X %% keep_interval_updates_pattern == 0" + }, + ) + keep_last_epochs: int = field( + default=-1, metadata={"help": "keep last N epoch checkpoints"} + ) + keep_best_checkpoints: int = field( + default=-1, metadata={"help": "keep best N checkpoints based on scores"} + ) + no_save: bool = field( + default=False, metadata={"help": "don't save models or checkpoints"} + ) + no_epoch_checkpoints: bool = field( + default=False, metadata={"help": "only store last and best checkpoints"} + ) + no_last_checkpoints: bool = field( + default=False, metadata={"help": "don't store last checkpoints"} + ) + no_save_optimizer_state: bool = field( + default=False, + metadata={"help": "don't save optimizer-state as part of checkpoint"}, + ) + best_checkpoint_metric: str = field( + default="loss", metadata={"help": 'metric to use for saving "best" checkpoints'} + ) + maximize_best_checkpoint_metric: bool = field( + default=False, + metadata={ + "help": 'select the largest metric value for saving "best" checkpoints' + }, + ) + patience: int = field( + default=-1, + metadata={ + "help": ( + "early stop training if valid performance doesn't " + "improve for N consecutive validation runs; note " + "that this is influenced by --validate-interval" + ) + }, + ) + checkpoint_suffix: str = field( + default="", metadata={"help": "suffix to add to the checkpoint file name"} + ) + checkpoint_shard_count: int = field( + default=1, + metadata={ + "help": "Number of shards containing the checkpoint - " + "if the checkpoint is over 300GB, it is preferable " + "to split it into shards to prevent OOM on CPU while loading " + "the 
checkpoint" + }, + ) + load_checkpoint_on_all_dp_ranks: bool = field( + default=False, + metadata={ + "help": "load checkpoints on all data parallel devices " + "(default: only load on rank 0 and broadcast to other devices)" + }, + ) + write_checkpoints_asynchronously: bool = field( + default=False, + metadata={ + "help": ( + "Write checkpoints asynchronously in a separate " + "thread. NOTE: This feature is currently being tested." + ), + "argparse_alias": "--save-async", + }, + ) + model_parallel_size: int = II("common.model_parallel_size") + + +@dataclass +class FairseqBMUFConfig(FairseqDataclass): + block_lr: float = field( + default=1, metadata={"help": "block learning rate for bmuf"} + ) + block_momentum: float = field( + default=0.875, metadata={"help": "block momentum for bmuf"} + ) + global_sync_iter: int = field( + default=50, metadata={"help": "Iteration for syncing global model"} + ) + warmup_iterations: int = field( + default=500, metadata={"help": "warmup iterations for model to broadcast"} + ) + use_nbm: bool = field( + default=False, + metadata={"help": "Specify whether you want to use classical BM / Nesterov BM"}, + ) + average_sync: bool = field( + default=False, + metadata={ + "help": "Specify whether you want to average the local momentum after each sync" + }, + ) + distributed_world_size: int = II("distributed_training.distributed_world_size") + + +@dataclass +class GenerationConfig(FairseqDataclass): + beam: int = field( + default=5, metadata={"help": "beam size"}, + ) + nbest: int = field( + default=1, metadata={"help": "number of hypotheses to output"}, + ) + max_len_a: float = field( + default=0, + metadata={ + "help": "generate sequences of maximum length ax + b, where x is the source length" + }, + ) + max_len_b: int = field( + default=200, + metadata={ + "help": "generate sequences of maximum length ax + b, where x is the source length" + }, + ) + min_len: int = field( + default=1, metadata={"help": "minimum generation length"}, + ) + match_source_len: bool = field( + default=False, metadata={"help": "generations should match the source length"}, + ) + unnormalized: bool = field( + default=False, metadata={"help": "compare unnormalized hypothesis scores"}, + ) + no_early_stop: bool = field( + default=False, metadata={"help": "deprecated"}, + ) + no_beamable_mm: bool = field( + default=False, metadata={"help": "don't use BeamableMM in attention layers"}, + ) + lenpen: float = field( + default=1, + metadata={ + "help": "length penalty: <1.0 favors shorter, >1.0 favors longer sentences" + }, + ) + unkpen: float = field( + default=0, + metadata={ + "help": "unknown word penalty: <0 produces more unks, >0 produces fewer" + }, + ) + replace_unk: Optional[str] = field( + default=None, + metadata={ + "help": "perform unknown replacement (optionally with alignment dictionary)", + "argparse_const": "@@ ", + }, + ) + sacrebleu: bool = field( + default=False, metadata={"help": "score with sacrebleu"}, + ) + score_reference: bool = field( + default=False, metadata={"help": "just score the reference translation"}, + ) + prefix_size: int = field( + default=0, + metadata={"help": "initialize generation by target prefix of given length"}, + ) + no_repeat_ngram_size: int = field( + default=0, + metadata={ + "help": "ngram blocking such that this size ngram cannot be repeated in the generation" + }, + ) + sampling: bool = field( + default=False, + metadata={"help": "sample hypotheses instead of using beam search"}, + ) + sampling_topk: int = field( + default=-1, + metadata={"help": 
"sample from top K likely next words instead of all words"}, + ) + sampling_topp: float = field( + default=-1.0, + metadata={ + "help": "sample from the smallest set whose cumulative probability mass exceeds p for next words" + }, + ) + constraints: Optional[GENERATION_CONSTRAINTS_CHOICES] = field( + default=None, + metadata={ + "help": "enables lexically constrained decoding", + "argparse_const": "ordered", + }, + ) + temperature: float = field( + default=1.0, metadata={"help": "temperature for generation"}, + ) + diverse_beam_groups: int = field( + default=-1, metadata={"help": "number of groups for Diverse Beam Search"}, + ) + diverse_beam_strength: float = field( + default=0.5, + metadata={"help": "strength of diversity penalty for Diverse Beam Search"}, + ) + diversity_rate: float = field( + default=-1.0, + metadata={"help": "strength of diversity penalty for Diverse Siblings Search"}, + ) + print_alignment: Optional[PRINT_ALIGNMENT_CHOICES] = field( + default=None, + metadata={ + "help": "if set, uses attention feedback to compute and print alignment to source tokens " + "(valid options are: hard, soft, otherwise treated as hard alignment)", + "argparse_const": "hard", + }, + ) + print_step: bool = field( + default=False, metadata={"help": "print steps"}, + ) + lm_path: Optional[str] = field( + default=None, metadata={"help": "path to lm checkpoint for lm fusion"}, + ) + lm_weight: float = field( + default=0.0, metadata={"help": "weight for lm probs for lm fusion"}, + ) + + # arguments for iterative refinement generator + iter_decode_eos_penalty: float = field( + default=0.0, + metadata={"help": "if > 0.0, it penalized early-stopping in decoding."}, + ) + iter_decode_max_iter: int = field( + default=10, metadata={"help": "maximum iterations for iterative refinement."}, + ) + iter_decode_force_max_iter: bool = field( + default=False, + metadata={ + "help": "if set, run exact the maximum number of iterations without early stop" + }, + ) + iter_decode_with_beam: int = field( + default=1, + metadata={ + "help": "if > 1, model will generate translations varying by the lengths." + }, + ) + iter_decode_with_external_reranker: bool = field( + default=False, + metadata={ + "help": "if set, the last checkpoint are assumed to be a reranker to rescore the translations" + }, + ) + retain_iter_history: bool = field( + default=False, + metadata={ + "help": "if set, decoding returns the whole history of iterative refinement" + }, + ) + retain_dropout: bool = field( + default=False, metadata={"help": "Use dropout at inference time"}, + ) + # temporarily set to Any until https://github.com/facebookresearch/hydra/issues/1117 is fixed + # retain_dropout_modules: Optional[List[str]] = field( + retain_dropout_modules: Any = field( + default=None, + metadata={ + "help": "if set, only retain dropout for the specified modules; " + "if not set, then dropout will be retained for all modules" + }, + ) + # special decoding format for advanced decoding. 
+ decoding_format: Optional[GENERATION_DECODING_FORMAT_CHOICES] = field( + default=None, + metadata={"help": "special decoding format for advanced decoding."}, + ) + no_seed_provided: bool = field( + default=False, + metadata={"help": "if set, dont use seed for initializing random generators"}, + ) + + +@dataclass +class CommonEvalConfig(FairseqDataclass): + path: Optional[str] = field( + default=None, metadata={"help": "path(s) to model file(s), colon separated"}, + ) + post_process: Optional[str] = field( + default=None, + metadata={ + "help": ( + "post-process text by removing BPE, letter segmentation, etc. " + "Valid options can be found in fairseq.data.utils.post_process." + ), + "argparse_const": "subword_nmt", + "argparse_alias": "--remove-bpe", + }, + ) + quiet: bool = field(default=False, metadata={"help": "only print final scores"}) + model_overrides: str = field( + default="{}", + metadata={ + "help": "a dictionary used to override model args at generation that were used during model training" + }, + ) + results_path: Optional[str] = field( + default=None, metadata={"help": "path to save eval results (optional)"} + ) + + +@dataclass +class EvalLMConfig(FairseqDataclass): + output_word_probs: bool = field( + default=False, + metadata={ + "help": "if set, outputs words and their predicted log probabilities to standard output" + }, + ) + output_word_stats: bool = field( + default=False, + metadata={ + "help": "if set, outputs word statistics such as word count, average probability, etc" + }, + ) + context_window: int = field( + default=0, + metadata={ + "help": "ensures that every evaluated token has access to a context of at least this size, if possible" + }, + ) + softmax_batch: int = field( + default=sys.maxsize, + metadata={ + "help": "if BxT is more than this, will batch the softmax over vocab to this amount of tokens, in order to fit into GPU memory" + }, + ) + + +@dataclass +class InteractiveConfig(FairseqDataclass): + buffer_size: int = field( + default=0, + metadata={ + "help": "read this many sentences into a buffer before processing them" + }, + ) + input: str = field( + default="-", metadata={"help": "file to read from; use - for stdin"}, + ) + + +@dataclass +class FairseqConfig(FairseqDataclass): + common: CommonConfig = CommonConfig() + common_eval: CommonEvalConfig = CommonEvalConfig() + distributed_training: DistributedTrainingConfig = DistributedTrainingConfig() + dataset: DatasetConfig = DatasetConfig() + optimization: OptimizationConfig = OptimizationConfig() + checkpoint: CheckpointConfig = CheckpointConfig() + bmuf: FairseqBMUFConfig = FairseqBMUFConfig() + generation: GenerationConfig = GenerationConfig() + eval_lm: EvalLMConfig = EvalLMConfig() + interactive: InteractiveConfig = InteractiveConfig() + model: Any = MISSING + task: Any = None + criterion: Any = None + optimizer: Any = None + lr_scheduler: Any = None + scoring: Any = None + bpe: Any = None + tokenizer: Any = None diff --git a/SpeechT5/fairseq/fairseq/dataclass/constants.py b/SpeechT5/fairseq/fairseq/dataclass/constants.py new file mode 100644 index 0000000000000000000000000000000000000000..442c25982b55f680880147feb64b9c2e6756142c --- /dev/null +++ b/SpeechT5/fairseq/fairseq/dataclass/constants.py @@ -0,0 +1,54 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. 
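+# Fixed string-choice sets (log formats, DDP backends, dataset implementations,
+# generation options, ...) referenced by the dataclass configs, built with the
+# ChoiceEnum helper defined below.
+#
+# Illustrative sketch of how a ChoiceEnum behaves (the names here are examples,
+# not part of this file):
+#
+#   >>> COLORS = ChoiceEnum(["red", "blue"])
+#   >>> str(COLORS.red)
+#   'red'
+#   >>> COLORS.red == "red"   # StrEnum members compare equal to their plain string value
+#   True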
+ +from enum import Enum, EnumMeta +from typing import List + + +class StrEnumMeta(EnumMeta): + # this is workaround for submitit pickling leading to instance checks failing in hydra for StrEnum, see + # https://github.com/facebookresearch/hydra/issues/1156 + @classmethod + def __instancecheck__(cls, other): + return "enum" in str(type(other)) + + +class StrEnum(Enum, metaclass=StrEnumMeta): + def __str__(self): + return self.value + + def __eq__(self, other: str): + return self.value == other + + def __repr__(self): + return self.value + + def __hash__(self): + return hash(str(self)) + + +def ChoiceEnum(choices: List[str]): + """return the Enum class used to enforce list of choices""" + return StrEnum("Choices", {k: k for k in choices}) + + +LOG_FORMAT_CHOICES = ChoiceEnum(["json", "none", "simple", "tqdm"]) +DDP_BACKEND_CHOICES = ChoiceEnum([ + "c10d", # alias for pytorch_ddp + "fully_sharded", # FullyShardedDataParallel from fairscale + "legacy_ddp", + "no_c10d", # alias for legacy_ddp + "pytorch_ddp", + "slow_mo", +]) +DDP_COMM_HOOK_CHOICES = ChoiceEnum(["none", "fp16"]) +DATASET_IMPL_CHOICES = ChoiceEnum(["raw", "lazy", "cached", "mmap", "fasta"]) +GENERATION_CONSTRAINTS_CHOICES = ChoiceEnum(["ordered", "unordered"]) +GENERATION_DECODING_FORMAT_CHOICES = ChoiceEnum( + ["unigram", "ensemble", "vote", "dp", "bs"] +) +ZERO_SHARDING_CHOICES = ChoiceEnum(["none", "os"]) +PIPELINE_CHECKPOINT_CHOICES = ChoiceEnum(["always", "never", "except_last"]) +PRINT_ALIGNMENT_CHOICES = ChoiceEnum(["hard", "soft"]) diff --git a/SpeechT5/fairseq/fairseq/dataclass/initialize.py b/SpeechT5/fairseq/fairseq/dataclass/initialize.py new file mode 100644 index 0000000000000000000000000000000000000000..479aeb8b16ad230c424da353a689fe3505b449e5 --- /dev/null +++ b/SpeechT5/fairseq/fairseq/dataclass/initialize.py @@ -0,0 +1,61 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. 
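+# hydra_init() registers FairseqConfig (and each of its top-level groups) with
+# Hydra's ConfigStore so a structured "config" can be composed; add_defaults()
+# then merges in the dataclass defaults of the task/model/criterion/etc.
+# selected through their "_name" field.
+#
+# Rough usage sketch (this is roughly what fairseq's hydra entry point does;
+# illustrative only):
+#
+#   >>> from fairseq.dataclass.initialize import hydra_init
+#   >>> hydra_init("config")   # must run before Hydra composes the "config" node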
+"""isort:skip_file""" + +import logging +from hydra.core.config_store import ConfigStore +from fairseq.dataclass.configs import FairseqConfig +from omegaconf import DictConfig, OmegaConf + + +logger = logging.getLogger(__name__) + + +def hydra_init(cfg_name="config") -> None: + + cs = ConfigStore.instance() + cs.store(name=cfg_name, node=FairseqConfig) + + for k in FairseqConfig.__dataclass_fields__: + v = FairseqConfig.__dataclass_fields__[k].default + try: + cs.store(name=k, node=v) + except BaseException: + logger.error(f"{k} - {v}") + raise + + +def add_defaults(cfg: DictConfig) -> None: + """This function adds default values that are stored in dataclasses that hydra doesn't know about """ + + from fairseq.registry import REGISTRIES + from fairseq.tasks import TASK_DATACLASS_REGISTRY + from fairseq.models import ARCH_MODEL_NAME_REGISTRY, MODEL_DATACLASS_REGISTRY + from fairseq.dataclass.utils import merge_with_parent + from typing import Any + + OmegaConf.set_struct(cfg, False) + + for k, v in FairseqConfig.__dataclass_fields__.items(): + field_cfg = cfg.get(k) + if field_cfg is not None and v.type == Any: + dc = None + + if isinstance(field_cfg, str): + field_cfg = DictConfig({"_name": field_cfg}) + field_cfg.__dict__["_parent"] = field_cfg.__dict__["_parent"] + + name = field_cfg.get("_name") + + if k == "task": + dc = TASK_DATACLASS_REGISTRY.get(name) + elif k == "model": + name = ARCH_MODEL_NAME_REGISTRY.get(name, name) + dc = MODEL_DATACLASS_REGISTRY.get(name) + elif k in REGISTRIES: + dc = REGISTRIES[k]["dataclass_registry"].get(name) + + if dc is not None: + cfg[k] = merge_with_parent(dc, field_cfg) diff --git a/SpeechT5/fairseq/fairseq/dataclass/utils.py b/SpeechT5/fairseq/fairseq/dataclass/utils.py new file mode 100644 index 0000000000000000000000000000000000000000..89206125d1d50ccc4b4d56394a76bc07bb32927a --- /dev/null +++ b/SpeechT5/fairseq/fairseq/dataclass/utils.py @@ -0,0 +1,476 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. 
+ +import ast +import inspect +import logging +import os +import re +from argparse import ArgumentError, ArgumentParser, Namespace +from dataclasses import _MISSING_TYPE, MISSING, is_dataclass +from enum import Enum +from typing import Any, Dict, List, Optional, Tuple, Type + +from fairseq.dataclass import FairseqDataclass +from fairseq.dataclass.configs import FairseqConfig +from hydra.core.global_hydra import GlobalHydra +from hydra.experimental import compose, initialize +from omegaconf import DictConfig, OmegaConf, open_dict + +logger = logging.getLogger(__name__) + + +def eval_str_list(x, x_type=float): + if x is None: + return None + if isinstance(x, str): + if len(x) == 0: + return [] + x = ast.literal_eval(x) + try: + return list(map(x_type, x)) + except TypeError: + return [x_type(x)] + + +def interpret_dc_type(field_type): + if isinstance(field_type, str): + raise RuntimeError("field should be a type") + + if field_type == Any: + return str + + typestring = str(field_type) + if re.match( + r"(typing.|^)Union\[(.*), NoneType\]$", typestring + ) or typestring.startswith("typing.Optional"): + return field_type.__args__[0] + return field_type + + +def gen_parser_from_dataclass( + parser: ArgumentParser, + dataclass_instance: FairseqDataclass, + delete_default: bool = False, +) -> None: + """convert a dataclass instance to tailing parser arguments""" + + def argparse_name(name: str): + if name == "data": + # normally data is positional args + return name + if name == "_name": + # private member, skip + return None + return "--" + name.replace("_", "-") + + def get_kwargs_from_dc( + dataclass_instance: FairseqDataclass, k: str + ) -> Dict[str, Any]: + """k: dataclass attributes""" + + kwargs = {} + + field_type = dataclass_instance._get_type(k) + inter_type = interpret_dc_type(field_type) + + field_default = dataclass_instance._get_default(k) + + if isinstance(inter_type, type) and issubclass(inter_type, Enum): + field_choices = [t.value for t in list(inter_type)] + else: + field_choices = None + + field_help = dataclass_instance._get_help(k) + field_const = dataclass_instance._get_argparse_const(k) + + if isinstance(field_default, str) and field_default.startswith("${"): + kwargs["default"] = field_default + else: + if field_default is MISSING: + kwargs["required"] = True + if field_choices is not None: + kwargs["choices"] = field_choices + if ( + isinstance(inter_type, type) + and (issubclass(inter_type, List) or issubclass(inter_type, Tuple)) + ) or ("List" in str(inter_type) or "Tuple" in str(inter_type)): + if "int" in str(inter_type): + kwargs["type"] = lambda x: eval_str_list(x, int) + elif "float" in str(inter_type): + kwargs["type"] = lambda x: eval_str_list(x, float) + elif "str" in str(inter_type): + kwargs["type"] = lambda x: eval_str_list(x, str) + else: + raise NotImplementedError( + "parsing of type " + str(inter_type) + " is not implemented" + ) + if field_default is not MISSING: + kwargs["default"] = ( + ",".join(map(str, field_default)) + if field_default is not None + else None + ) + elif ( + isinstance(inter_type, type) and issubclass(inter_type, Enum) + ) or "Enum" in str(inter_type): + kwargs["type"] = str + if field_default is not MISSING: + if isinstance(field_default, Enum): + kwargs["default"] = field_default.value + else: + kwargs["default"] = field_default + elif inter_type is bool: + kwargs["action"] = ( + "store_false" if field_default is True else "store_true" + ) + kwargs["default"] = field_default + else: + kwargs["type"] = inter_type + if field_default 
is not MISSING: + kwargs["default"] = field_default + + kwargs["help"] = field_help + if field_const is not None: + kwargs["const"] = field_const + kwargs["nargs"] = "?" + + return kwargs + + for k in dataclass_instance._get_all_attributes(): + field_name = argparse_name(dataclass_instance._get_name(k)) + field_type = dataclass_instance._get_type(k) + if field_name is None: + continue + elif inspect.isclass(field_type) and issubclass(field_type, FairseqDataclass): + gen_parser_from_dataclass(parser, field_type(), delete_default) + continue + + kwargs = get_kwargs_from_dc(dataclass_instance, k) + + field_args = [field_name] + alias = dataclass_instance._get_argparse_alias(k) + if alias is not None: + field_args.append(alias) + + if "default" in kwargs: + if isinstance(kwargs["default"], str) and kwargs["default"].startswith( + "${" + ): + if kwargs["help"] is None: + # this is a field with a name that will be added elsewhere + continue + else: + del kwargs["default"] + if delete_default and "default" in kwargs: + del kwargs["default"] + try: + parser.add_argument(*field_args, **kwargs) + except ArgumentError: + pass + + +def _set_legacy_defaults(args, cls): + """Helper to set default arguments based on *add_args*.""" + if not hasattr(cls, "add_args"): + return + + import argparse + + parser = argparse.ArgumentParser( + argument_default=argparse.SUPPRESS, allow_abbrev=False + ) + cls.add_args(parser) + # copied from argparse.py: + defaults = argparse.Namespace() + for action in parser._actions: + if action.dest is not argparse.SUPPRESS: + if not hasattr(defaults, action.dest): + if action.default is not argparse.SUPPRESS: + setattr(defaults, action.dest, action.default) + for key, default_value in vars(defaults).items(): + if not hasattr(args, key): + setattr(args, key, default_value) + + +def _override_attr( + sub_node: str, data_class: Type[FairseqDataclass], args: Namespace +) -> List[str]: + overrides = [] + + if not inspect.isclass(data_class) or not issubclass(data_class, FairseqDataclass): + return overrides + + def get_default(f): + if not isinstance(f.default_factory, _MISSING_TYPE): + return f.default_factory() + return f.default + + for k, v in data_class.__dataclass_fields__.items(): + if k.startswith("_"): + # private member, skip + continue + + val = get_default(v) if not hasattr(args, k) else getattr(args, k) + + field_type = interpret_dc_type(v.type) + if ( + isinstance(val, str) + and not val.startswith("${") # not interpolation + and field_type != str + and ( + not inspect.isclass(field_type) or not issubclass(field_type, Enum) + ) # not choices enum + ): + # upgrade old models that stored complex parameters as string + val = ast.literal_eval(val) + + if isinstance(val, tuple): + val = list(val) + + v_type = getattr(v.type, "__origin__", None) + if ( + (v_type is List or v_type is list or v_type is Optional) + # skip interpolation + and not (isinstance(val, str) and val.startswith("${")) + ): + # if type is int but val is float, then we will crash later - try to convert here + if hasattr(v.type, "__args__"): + t_args = v.type.__args__ + if len(t_args) == 1 and (t_args[0] is float or t_args[0] is int): + val = list(map(t_args[0], val)) + elif val is not None and ( + field_type is int or field_type is bool or field_type is float + ): + try: + val = field_type(val) + except: + pass # ignore errors here, they are often from interpolation args + + if val is None: + overrides.append("{}.{}=null".format(sub_node, k)) + elif val == "": + overrides.append("{}.{}=''".format(sub_node, 
k)) + elif isinstance(val, str): + val = val.replace("'", r"\'") + overrides.append("{}.{}='{}'".format(sub_node, k, val)) + elif isinstance(val, FairseqDataclass): + overrides += _override_attr(f"{sub_node}.{k}", type(val), args) + elif isinstance(val, Namespace): + sub_overrides, _ = override_module_args(val) + for so in sub_overrides: + overrides.append(f"{sub_node}.{k}.{so}") + else: + overrides.append("{}.{}={}".format(sub_node, k, val)) + + return overrides + + +def migrate_registry( + name, value, registry, args, overrides, deletes, use_name_as_val=False +): + if value in registry: + overrides.append("{}={}".format(name, value)) + overrides.append("{}._name={}".format(name, value)) + overrides.extend(_override_attr(name, registry[value], args)) + elif use_name_as_val and value is not None: + overrides.append("{}={}".format(name, value)) + else: + deletes.append(name) + + +def override_module_args(args: Namespace) -> Tuple[List[str], List[str]]: + """use the field in args to overrides those in cfg""" + overrides = [] + deletes = [] + + for k in FairseqConfig.__dataclass_fields__.keys(): + overrides.extend( + _override_attr(k, FairseqConfig.__dataclass_fields__[k].type, args) + ) + + if args is not None: + if hasattr(args, "task"): + from fairseq.tasks import TASK_DATACLASS_REGISTRY + + migrate_registry( + "task", args.task, TASK_DATACLASS_REGISTRY, args, overrides, deletes + ) + else: + deletes.append("task") + + # these options will be set to "None" if they have not yet been migrated + # so we can populate them with the entire flat args + CORE_REGISTRIES = {"criterion", "optimizer", "lr_scheduler"} + + from fairseq.registry import REGISTRIES + + for k, v in REGISTRIES.items(): + if hasattr(args, k): + migrate_registry( + k, + getattr(args, k), + v["dataclass_registry"], + args, + overrides, + deletes, + use_name_as_val=k not in CORE_REGISTRIES, + ) + else: + deletes.append(k) + + no_dc = True + if hasattr(args, "arch"): + from fairseq.models import ARCH_MODEL_REGISTRY, ARCH_MODEL_NAME_REGISTRY + + if args.arch in ARCH_MODEL_REGISTRY: + m_cls = ARCH_MODEL_REGISTRY[args.arch] + dc = getattr(m_cls, "__dataclass", None) + if dc is not None: + m_name = ARCH_MODEL_NAME_REGISTRY[args.arch] + overrides.append("model={}".format(m_name)) + overrides.append("model._name={}".format(args.arch)) + # override model params with those exist in args + overrides.extend(_override_attr("model", dc, args)) + no_dc = False + if no_dc: + deletes.append("model") + + return overrides, deletes + + +def convert_namespace_to_omegaconf(args: Namespace) -> DictConfig: + """Convert a flat argparse.Namespace to a structured DictConfig.""" + + # Here we are using field values provided in args to override counterparts inside config object + overrides, deletes = override_module_args(args) + + # configs will be in fairseq/config after installation + config_path = os.path.join("..", "config") + + GlobalHydra.instance().clear() + + with initialize(config_path=config_path): + try: + composed_cfg = compose("config", overrides=overrides, strict=False) + except: + logger.error("Error when composing. Overrides: " + str(overrides)) + raise + + for k in deletes: + composed_cfg[k] = None + + cfg = OmegaConf.create( + OmegaConf.to_container(composed_cfg, resolve=True, enum_to_str=True) + ) + + # hack to be able to set Namespace in dict config. 
this should be removed when we update to newer + # omegaconf version that supports object flags, or when we migrate all existing models + from omegaconf import _utils + + old_primitive = _utils.is_primitive_type + _utils.is_primitive_type = lambda _: True + + if cfg.task is None and getattr(args, "task", None): + cfg.task = Namespace(**vars(args)) + from fairseq.tasks import TASK_REGISTRY + + _set_legacy_defaults(cfg.task, TASK_REGISTRY[args.task]) + cfg.task._name = args.task + if cfg.model is None and getattr(args, "arch", None): + cfg.model = Namespace(**vars(args)) + from fairseq.models import ARCH_MODEL_REGISTRY + + _set_legacy_defaults(cfg.model, ARCH_MODEL_REGISTRY[args.arch]) + cfg.model._name = args.arch + if cfg.optimizer is None and getattr(args, "optimizer", None): + cfg.optimizer = Namespace(**vars(args)) + from fairseq.optim import OPTIMIZER_REGISTRY + + _set_legacy_defaults(cfg.optimizer, OPTIMIZER_REGISTRY[args.optimizer]) + cfg.optimizer._name = args.optimizer + if cfg.lr_scheduler is None and getattr(args, "lr_scheduler", None): + cfg.lr_scheduler = Namespace(**vars(args)) + from fairseq.optim.lr_scheduler import LR_SCHEDULER_REGISTRY + + _set_legacy_defaults(cfg.lr_scheduler, LR_SCHEDULER_REGISTRY[args.lr_scheduler]) + cfg.lr_scheduler._name = args.lr_scheduler + if cfg.criterion is None and getattr(args, "criterion", None): + cfg.criterion = Namespace(**vars(args)) + from fairseq.criterions import CRITERION_REGISTRY + + _set_legacy_defaults(cfg.criterion, CRITERION_REGISTRY[args.criterion]) + cfg.criterion._name = args.criterion + + _utils.is_primitive_type = old_primitive + OmegaConf.set_struct(cfg, True) + return cfg + + +def populate_dataclass( + dataclass: FairseqDataclass, + args: Namespace, +) -> FairseqDataclass: + for k in dataclass.__dataclass_fields__.keys(): + if k.startswith("_"): + # private member, skip + continue + if hasattr(args, k): + setattr(dataclass, k, getattr(args, k)) + + return dataclass + + +def overwrite_args_by_name(cfg: DictConfig, overrides: Dict[str, any]): + # this will be deprecated when we get rid of argparse and model_overrides logic + + from fairseq.registry import REGISTRIES + + with open_dict(cfg): + for k in cfg.keys(): + # "k in cfg" will return false if its a "mandatory value (e.g. 
???)" + if k in cfg and isinstance(cfg[k], DictConfig): + if k in overrides and isinstance(overrides[k], dict): + for ok, ov in overrides[k].items(): + if isinstance(ov, dict) and cfg[k][ok] is not None: + overwrite_args_by_name(cfg[k][ok], ov) + else: + cfg[k][ok] = ov + else: + overwrite_args_by_name(cfg[k], overrides) + elif k in cfg and isinstance(cfg[k], Namespace): + for override_key, val in overrides.items(): + setattr(cfg[k], override_key, val) + elif k in overrides: + if ( + k in REGISTRIES + and overrides[k] in REGISTRIES[k]["dataclass_registry"] + ): + cfg[k] = DictConfig( + REGISTRIES[k]["dataclass_registry"][overrides[k]] + ) + overwrite_args_by_name(cfg[k], overrides) + cfg[k]._name = overrides[k] + else: + cfg[k] = overrides[k] + + +def merge_with_parent(dc: FairseqDataclass, cfg: DictConfig, remove_missing=True): + if remove_missing: + + if is_dataclass(dc): + target_keys = set(dc.__dataclass_fields__.keys()) + else: + target_keys = set(dc.keys()) + + with open_dict(cfg): + for k in list(cfg.keys()): + if k not in target_keys: + del cfg[k] + + merged_cfg = OmegaConf.merge(dc, cfg) + merged_cfg.__dict__["_parent"] = cfg.__dict__["_parent"] + OmegaConf.set_struct(merged_cfg, True) + return merged_cfg diff --git a/SpeechT5/fairseq/fairseq/distributed/__init__.py b/SpeechT5/fairseq/fairseq/distributed/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..d0b96b734c4b5e7cd5d295238d0764c05093dc27 --- /dev/null +++ b/SpeechT5/fairseq/fairseq/distributed/__init__.py @@ -0,0 +1,21 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +from .distributed_timeout_wrapper import DistributedTimeoutWrapper +from .fully_sharded_data_parallel import fsdp_enable_wrap, fsdp_wrap, FullyShardedDataParallel +from .legacy_distributed_data_parallel import LegacyDistributedDataParallel +from .module_proxy_wrapper import ModuleProxyWrapper +from .tpu_distributed_data_parallel import TPUDistributedDataParallel + + +__all__ = [ + "DistributedTimeoutWrapper", + "fsdp_enable_wrap", + "fsdp_wrap", + "FullyShardedDataParallel", + "LegacyDistributedDataParallel", + "ModuleProxyWrapper", + "TPUDistributedDataParallel", +] diff --git a/SpeechT5/fairseq/fairseq/distributed/distributed_timeout_wrapper.py b/SpeechT5/fairseq/fairseq/distributed/distributed_timeout_wrapper.py new file mode 100644 index 0000000000000000000000000000000000000000..18107ef27ea837b8c72dcaa49db18fd8e64267b1 --- /dev/null +++ b/SpeechT5/fairseq/fairseq/distributed/distributed_timeout_wrapper.py @@ -0,0 +1,94 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +import logging +import os +import signal +import threading + +from torch import nn + + +logger = logging.getLogger(__name__) + + +class DistributedTimeoutWrapper(nn.Module): + """ + A wrapper that kills the process if no progress is made within a given + *timeout*. The timer is reset every time :func:`forward` is called. 
+ + Usage:: + + module = DistributedTimeoutWrapper(module, timeout=30) + x = module(input) + time.sleep(20) # safe + x = module(input) + time.sleep(45) # job will be killed before this returns + + Args: + module (nn.Module): module to wrap + timeout (int): number of seconds before killing the process + (set to a value <= 0 to disable the timeout) + signal (Optional): signal to send once timeout is triggered + """ + def __init__(self, module: nn.Module, timeout: int, signal=signal.SIGINT): + super().__init__() + self.module = module + self.timeout = timeout + self.signal = signal + + if timeout > 0: + self._heartbeat = threading.Event() + self._heartbeat_thread = threading.Thread( + target=self._check_heartbeat, + args=(os.getpid(),), + daemon=True, + ) + self._heartbeat_thread.start() + self._terminated = False + else: + self._heartbeat = None + self._heartbeat_thread = None + + def __del__(self): + self.stop_timeout() + + def __getattr__(self, name): + """Forward missing attributes to wrapped module.""" + try: + return super().__getattr__(name) # defer to nn.Module's logic + except AttributeError: + return getattr(self.module, name) + + def stop_timeout(self): + if self._heartbeat_thread is not None: + self._terminated = True + self._heartbeat_thread.join() + + def state_dict(self, *args, **kwargs): + return self.module.state_dict(*args, **kwargs) + + def load_state_dict(self, *args, **kwargs): + return self.module.load_state_dict(*args, **kwargs) + + def forward(self, *args, **kwargs): + if self._heartbeat is not None: + self._heartbeat.set() + return self.module(*args, **kwargs) + + def _check_heartbeat(self, parent_pid): + self._heartbeat.wait() # wait for the first forward pass + while True: + self._heartbeat.clear() + success = self._heartbeat.wait(timeout=self.timeout) + if self._terminated: + break + elif not success: + logger.error(( + "Killing job for not making progress in {} seconds. " + "Set --heartbeat-timeout=-1 to disable this timeout." + ).format(int(self.timeout))) + os.kill(parent_pid, self.signal) + return diff --git a/SpeechT5/fairseq/fairseq/distributed/fully_sharded_data_parallel.py b/SpeechT5/fairseq/fairseq/distributed/fully_sharded_data_parallel.py new file mode 100644 index 0000000000000000000000000000000000000000..8a96bfc76516682ac8e2b7e2c3bc2e6aa3d8ef0c --- /dev/null +++ b/SpeechT5/fairseq/fairseq/distributed/fully_sharded_data_parallel.py @@ -0,0 +1,135 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +import contextlib +from typing import Optional + +import torch +from fairseq.dataclass.configs import DistributedTrainingConfig +from fairseq.distributed import utils as dist_utils + + +try: + from fairscale.nn.data_parallel import FullyShardedDataParallel as FSDP + + has_FSDP = True +except ImportError: + FSDP = torch.nn.Module + has_FSDP = False + + +class FullyShardedDataParallel(FSDP): + """ + A small wrapper around fairscale's FullyShardedDataParallel (FSDP) with some + fairseq-specific checkpoint saving/loading logic. + + Args: + use_sharded_state (bool): if True, then ``state_dict`` will return + ``FSDP.local_state_dict`` and ``load_state_dict`` will call + ``FSDP.load_local_state_dict``. Otherwise, ``state_dict`` will + return the full model weights on data parallel rank 0 (empty on + other ranks) and ``load_state_dict`` will broadcast model weights + from rank 0 to other ranks. 
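+
+    Usage (illustrative sketch; fairseq normally goes through the
+    ``fsdp_enable_wrap``/``fsdp_wrap`` helpers defined below rather than
+    constructing this class directly, and ``build_model`` here is just a
+    placeholder)::
+
+        with fsdp_enable_wrap(cfg.distributed_training):
+            model = fsdp_wrap(build_model(cfg))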
+ """ + + def __init__(self, *args, use_sharded_state: bool = False, **kwargs): + if not has_FSDP: + raise ImportError( + "Cannot find FullyShardedDataParallel. " + "Please install fairscale with: pip install fairscale" + ) + super().__init__(*args, **kwargs) + self.use_sharded_state = use_sharded_state + + @property + def unwrapped_module(self) -> torch.nn.Module: + if self.flatten_parameters: + return self.module.module + else: + return self.module + + def state_dict(self, destination=None, prefix="", keep_vars=False): + if self.use_sharded_state: + return super().local_state_dict( + destination=destination, prefix=prefix, keep_vars=keep_vars + ) + else: + if self.rank == 0: + return super().state_dict( + destination=destination, prefix=prefix, keep_vars=keep_vars + ) + else: + # We must call state_dict() due to use of communication + # primitives. But we don't use the result. + super().state_dict() + return destination or {} + + def load_state_dict(self, state_dict, strict=True, model_cfg=None): + if self.use_sharded_state: + return super().load_local_state_dict(state_dict, strict=strict) + else: + state_dict = dist_utils.broadcast_object( + state_dict, src_rank=0, group=self.process_group + ) + return super().load_state_dict(state_dict, strict=strict) + + +@contextlib.contextmanager +def fsdp_enable_wrap(cfg: DistributedTrainingConfig): + try: + from fairscale.nn import enable_wrap + except ImportError: + raise ImportError( + "Cannot find FullyShardedDataParallel. " + "Please install fairscale with: pip install fairscale" + ) + if cfg.memory_efficient_fp16: + assert cfg.fp16 # memory_efficient_fp16 should imply fp16 + group = dist_utils.get_data_parallel_group() + if group is None and cfg.distributed_world_size == 1: + from fairscale.utils.testing import DummyProcessGroup + + group = DummyProcessGroup(rank=0, size=1) + fsdp_config = { + "process_group": group, + "reshard_after_forward": not cfg.no_reshard_after_forward, + "mixed_precision": cfg.fp16 and not cfg.memory_efficient_fp16, + "fp32_reduce_scatter": cfg.fp32_reduce_scatter, + "flatten_parameters": True, + "cpu_offload": cfg.cpu_offload, + "compute_dtype": torch.float16 if cfg.fp16 else torch.float32, + "bucket_cap_mb": cfg.bucket_cap_mb, + "state_dict_device": torch.device("cpu"), # reduce GPU mem usage + } + with enable_wrap( + wrapper_cls=FullyShardedDataParallel, + use_sharded_state=cfg.use_sharded_state, + **fsdp_config, + ): + yield + + +def fsdp_wrap(module, min_num_params: Optional[int] = None, **kwargs): + """ + Helper to wrap layers/modules in FSDP. This falls back to a no-op if + fairscale is not available. + + Args: + module (nn.Module): module to (maybe) wrap + min_num_params (int, Optional): minimum number of layer params to wrap + """ + try: + from fairscale.nn import wrap + + if min_num_params is not None: + num_params = sum(p.numel() for p in module.parameters()) + if num_params >= min_num_params: + return wrap(module, **kwargs) + else: + return module + else: + return wrap(module, **kwargs) + except ImportError: + return module diff --git a/SpeechT5/fairseq/fairseq/distributed/legacy_distributed_data_parallel.py b/SpeechT5/fairseq/fairseq/distributed/legacy_distributed_data_parallel.py new file mode 100644 index 0000000000000000000000000000000000000000..f2308f87c5233625a3fe1b27104f5ead003ae3cb --- /dev/null +++ b/SpeechT5/fairseq/fairseq/distributed/legacy_distributed_data_parallel.py @@ -0,0 +1,165 @@ +# Copyright (c) Facebook, Inc. and its affiliates. 
+# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +""" +A modified version of the legacy DistributedDataParallel module that uses c10d +communication primitives. This version is simpler than the latest PyTorch +version and is useful for debugging. Notably it does not overlap gradient +communication with the backward pass, which makes it slower but more robust +than the PyTorch version. + +This version also supports the *no_sync* context manager, which allows faster +training with `--update-freq`. +""" + +from collections import OrderedDict +from contextlib import contextmanager + +import torch +from torch import nn + +from fairseq.distributed import utils + + +class LegacyDistributedDataParallel(nn.Module): + """Implements distributed data parallelism at the module level. + + A simplified version of :class:`torch.nn.parallel.DistributedDataParallel`. + This version uses a c10d process group for communication and does not + broadcast buffers. + + Args: + module (~torch.nn.Module): module to be parallelized + process_group: the c10d process group to be used for distributed data + parallel all-reduction. + buffer_size (int, optional): number of elements to buffer before + performing all-reduce (default: 256M). + """ + + def __init__(self, module, process_group, buffer_size=2 ** 28): + super().__init__() + + self.module = module + self.process_group = process_group + self.world_size = utils.get_world_size(self.process_group) + + # Never use a bigger buffer than the number of model params + self.buffer_size = min(buffer_size, sum(p.numel() for p in module.parameters())) + self.buffer = None + + # We can also forcibly accumulate grads locally and only do the + # all-reduce at some later time + self.accumulate_grads = False + + # make per-device lists of parameters + paramlists = OrderedDict() + for param in self.module.parameters(): + device = param.device + if paramlists.get(device) is None: + paramlists[device] = [] + paramlists[device] += [param] + self.per_device_params = list(paramlists.values()) + + @contextmanager + def no_sync(self): + """A context manager to disable gradient synchronization.""" + old_accumulate_grads = self.accumulate_grads + self.accumulate_grads = True + yield + self.accumulate_grads = old_accumulate_grads + + def forward(self, *inputs, **kwargs): + return self.module(*inputs, **kwargs) + + def all_reduce_grads(self): + """ + This function must be called explicitly after backward to reduce + gradients. There is no automatic hook like c10d. 
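+        Typical call order (illustrative): run ``loss.backward()`` for one or
+        more micro-batches (under ``no_sync()`` for all but the last one), then
+        call ``all_reduce_grads()`` once before ``optimizer.step()``.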
+ """ + + def all_reduce_params(params): + buffer = self.buffer + nonzero_buffer = False + if len(params) > 1: + offset = 0 + for p in params: + sz = p.numel() + if p.grad is not None: + buffer[offset : offset + sz].copy_(p.grad.data.view(-1)) + nonzero_buffer = True + else: + buffer[offset : offset + sz].zero_() + offset += sz + else: + # we only have a single grad to all-reduce + p = params[0] + if p.grad is not None: + buffer = p.grad.data + nonzero_buffer = True + elif p.numel() <= self.buffer.numel(): + buffer = buffer[: p.numel()] + buffer.zero_() + else: + buffer = torch.zeros_like(p) + + if nonzero_buffer: + buffer.div_(self.world_size) + + utils.all_reduce(buffer, self.process_group) + + # copy all-reduced grads back into their original place + offset = 0 + for p in params: + sz = p.numel() + if p.grad is not None: + p.grad.data.copy_(buffer[offset : offset + sz].view_as(p)) + else: + p.grad = buffer[offset : offset + sz].view_as(p).clone() + offset += sz + + def reduction_fn(): + # This function only needs to be called once + if self.accumulate_grads: + return + + if self.buffer is None: + self.buffer = next(self.module.parameters()).new(self.buffer_size) + + for params in self.per_device_params: + # All-reduce the gradients in buckets + offset = 0 + buffered_params = [] + for param in params: + if not param.requires_grad: + continue + if param.grad is None: + param.grad = torch.zeros_like(param) + + if hasattr(param, 'expert'): + # Skip gradient sync for unshared parameters + continue + + if param.grad.requires_grad: + raise RuntimeError( + "DistributedDataParallel only works " + "with gradients that don't require " + "grad" + ) + sz = param.numel() + if sz > self.buffer.numel(): + # all-reduce big params directly + all_reduce_params([param]) + else: + if offset + sz > self.buffer.numel(): + all_reduce_params(buffered_params) + offset = 0 + buffered_params.clear() + buffered_params.append(param) + offset += sz + + if len(buffered_params) > 0: + all_reduce_params(buffered_params) + + reduction_fn() diff --git a/SpeechT5/fairseq/fairseq/distributed/module_proxy_wrapper.py b/SpeechT5/fairseq/fairseq/distributed/module_proxy_wrapper.py new file mode 100644 index 0000000000000000000000000000000000000000..fc2c6f8c718f2ac8ece308e50f7ba74a05474f4a --- /dev/null +++ b/SpeechT5/fairseq/fairseq/distributed/module_proxy_wrapper.py @@ -0,0 +1,55 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +from torch import nn + + +class ModuleProxyWrapper(nn.Module): + """ + Wrap a DistributedDataParallel module and forward requests for missing + attributes to the module wrapped by DDP (the twice-wrapped module). + Also forward calls to :func:`state_dict` and :func:`load_state_dict`. 
+ + Usage:: + + module.xyz = "hello world" + wrapped_module = DistributedDataParallel(module, **ddp_args) + wrapped_module = ModuleProxyWrapper(wrapped_module) + assert wrapped_module.xyz == "hello world" + assert wrapped_module.state_dict().keys() == module.state_dict().keys() + + Args: + module (nn.Module): module to wrap + """ + + def __init__(self, module: nn.Module): + super().__init__() + assert hasattr(module, "module"), \ + "ModuleProxyWrapper expects input to wrap another module" + self.module = module + + def __getattr__(self, name): + """Forward missing attributes to twice-wrapped module.""" + try: + # defer to nn.Module's logic + return super().__getattr__(name) + except AttributeError: + try: + # forward to the once-wrapped module + return getattr(self.module, name) + except AttributeError: + # forward to the twice-wrapped module + return getattr(self.module.module, name) + + def state_dict(self, *args, **kwargs): + """Forward to the twice-wrapped module.""" + return self.module.module.state_dict(*args, **kwargs) + + def load_state_dict(self, *args, **kwargs): + """Forward to the twice-wrapped module.""" + return self.module.module.load_state_dict(*args, **kwargs) + + def forward(self, *args, **kwargs): + return self.module(*args, **kwargs) diff --git a/SpeechT5/fairseq/fairseq/distributed/tpu_distributed_data_parallel.py b/SpeechT5/fairseq/fairseq/distributed/tpu_distributed_data_parallel.py new file mode 100644 index 0000000000000000000000000000000000000000..e971cf07c57c4e864726781092a690dd4d7d3e46 --- /dev/null +++ b/SpeechT5/fairseq/fairseq/distributed/tpu_distributed_data_parallel.py @@ -0,0 +1,43 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +import torch +from torch import nn + +from fairseq.distributed import utils + + +class TPUDistributedDataParallel(nn.Module): + + def __init__(self, module, process_group): + super().__init__() + self.module = module + self.process_group = process_group + self.world_size = utils.get_world_size(self.process_group) + + def forward(self, *inputs, **kwargs): + return self.module(*inputs, **kwargs) + + def all_reduce_grads(self): + gradients = [] + for p in self.parameters(): + if not p.requires_grad: + continue + if p.grad is None: + p.grad = torch.zeros_like(p) + if p.grad.requires_grad: + raise RuntimeError( + "TPUDistributedDataParallel only works with gradients that don't " + "require grad" + ) + gradients.append(p.grad) + + import torch_xla.core.xla_model as xm + xm.all_reduce( + 'sum', + gradients, + scale=1. / self.world_size, + groups=self.process_group[1], + ) diff --git a/SpeechT5/fairseq/fairseq/distributed/utils.py b/SpeechT5/fairseq/fairseq/distributed/utils.py new file mode 100644 index 0000000000000000000000000000000000000000..b7736116f97bd2b9f3a72339e179f06be5c33cfd --- /dev/null +++ b/SpeechT5/fairseq/fairseq/distributed/utils.py @@ -0,0 +1,805 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. 
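+# Process-group setup and collective-communication helpers shared across
+# fairseq: inferring the init method (torch.distributed.launch, SLURM or a
+# single node), distributed_init()/call_main() for bootstrapping workers, and
+# thin wrappers such as all_reduce/all_gather/all_gather_list that also cover
+# the TPU (XLA) and Megatron model-parallel cases.
+#
+# Illustrative sketch of gathering small Python objects from every worker
+# (the values shown are made up):
+#
+#   >>> group = get_data_parallel_group()
+#   >>> all_gather_list({"ntokens": 1024}, group=group, max_size=16384)
+#   [{'ntokens': 1024}, {'ntokens': 980}, ...]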
+ +import io +import logging +import os +import pickle +import random +import socket +import struct +import subprocess +import warnings +from argparse import Namespace +from collections import OrderedDict +from dataclasses import dataclass +from typing import Any, Dict, List, Mapping, Optional + +import torch +import torch.distributed as dist +from fairseq.dataclass.configs import DistributedTrainingConfig, FairseqConfig +from omegaconf import open_dict + +try: + import torch_xla.core.xla_model as xm +except ImportError: + xm = None + + +# Flag to indicate if we're using Megatron +# NOTE: this is a temporary hack until we move away from Megatron's model parallel init +_USE_MEGATRON = False + +# Whether to use XLA ops (e.g., on TPUs) instead of CUDA ops. +_USE_XLA = False + + +logger = logging.getLogger(__name__) + + +def is_master(cfg: DistributedTrainingConfig): + return cfg.distributed_rank == 0 + + +def infer_init_method(cfg: DistributedTrainingConfig, force_distributed=False): + if cfg.distributed_init_method is not None or cfg.tpu: + return + + num_pipelines_per_node = None + if cfg.pipeline_model_parallel: + num_pipeline_devices, num_pipelines_per_node = _pipeline_parallel_pre_init(cfg) + + if all( + key in os.environ + for key in ["MASTER_ADDR", "MASTER_PORT", "WORLD_SIZE", "RANK"] + ): + # support torch.distributed.launch + _infer_torch_distributed_launch_init(cfg) + elif cfg.distributed_port > 0: + # we can determine the init method automatically for Slurm + _infer_slurm_init(cfg, num_pipelines_per_node) + elif cfg.distributed_world_size > 1 or force_distributed: + # fallback for single node with multiple GPUs + _infer_single_node_init(cfg) + + if cfg.pipeline_model_parallel: + _pipeline_parallel_post_init(cfg, num_pipeline_devices, num_pipelines_per_node) + elif not cfg.distributed_no_spawn: + with open_dict(cfg): + cfg.distributed_num_procs = min( + torch.cuda.device_count(), cfg.distributed_world_size + ) + + +def _infer_torch_distributed_launch_init(cfg: DistributedTrainingConfig): + cfg.distributed_init_method = "env://" + cfg.distributed_world_size = int(os.environ["WORLD_SIZE"]) + cfg.distributed_rank = int(os.environ["RANK"]) + # processes are created by torch.distributed.launch + cfg.distributed_no_spawn = True + + +def _infer_slurm_init(cfg: DistributedTrainingConfig, num_pipelines_per_node): + node_list = os.environ.get("SLURM_STEP_NODELIST") + if node_list is None: + node_list = os.environ.get("SLURM_JOB_NODELIST") + if node_list is not None: + try: + hostnames = subprocess.check_output( + ["scontrol", "show", "hostnames", node_list] + ) + cfg.distributed_init_method = "tcp://{host}:{port}".format( + host=hostnames.split()[0].decode("utf-8"), + port=cfg.distributed_port, + ) + nnodes = int(os.environ.get("SLURM_NNODES")) + ntasks_per_node = os.environ.get("SLURM_NTASKS_PER_NODE") + if ntasks_per_node is not None: + ntasks_per_node = int(ntasks_per_node) + else: + ntasks = int(os.environ.get("SLURM_NTASKS")) + nnodes = int(os.environ.get("SLURM_NNODES")) + assert ntasks % nnodes == 0 + ntasks_per_node = int(ntasks / nnodes) + if ntasks_per_node == 1: + gpus_per_node = torch.cuda.device_count() + node_id = int(os.environ.get("SLURM_NODEID")) + cfg.distributed_rank = node_id * gpus_per_node + cfg.distributed_world_size = nnodes * gpus_per_node + elif cfg.pipeline_model_parallel: + assert ntasks_per_node == num_pipelines_per_node, ( + "SLURM --ntasks-per-node must match number of pipelines per " + "node (={})".format(num_pipelines_per_node) + ) + cfg.distributed_no_spawn = 
True + # For 4-way MP on nodes with 8 GPUs, ranks will be [0, 1] on + # the first node, [1, 2] on the second node, etc. This + # matches torch.distributed.launch. + node_id = int(os.environ.get("SLURM_NODEID")) + local_id = int(os.environ.get("SLURM_LOCALID")) + cfg.distributed_rank = node_id * num_pipelines_per_node + local_id + # In the above example, device_id will always be in [0, 1], + # which also matches torch.distributed.launch. + cfg.device_id = local_id + # We also want to set distributed_world_size to be the total + # number of pipelines across all nodes. + cfg.distributed_world_size = nnodes * num_pipelines_per_node + else: + assert ntasks_per_node == cfg.distributed_world_size // nnodes + cfg.distributed_no_spawn = True + cfg.distributed_rank = int(os.environ.get("SLURM_PROCID")) + cfg.device_id = int(os.environ.get("SLURM_LOCALID")) + except subprocess.CalledProcessError as e: # scontrol failed + raise e + except FileNotFoundError: # Slurm is not installed + pass + + +def _infer_single_node_init(cfg: DistributedTrainingConfig): + assert ( + cfg.distributed_world_size <= torch.cuda.device_count() + ), f"world size is {cfg.distributed_world_size} but have {torch.cuda.device_count()} available devices" + port = random.randint(10000, 20000) + cfg.distributed_init_method = "tcp://localhost:{port}".format(port=port) + + +def _pipeline_parallel_pre_init(cfg: DistributedTrainingConfig): + from fairseq import utils + + balance_exists = ( + cfg.pipeline_balance is not None + or cfg.pipeline_encoder_balance is not None + or cfg.pipeline_decoder_balance is not None + ) + devices_exist = ( + cfg.pipeline_devices is not None + or cfg.pipeline_encoder_devices is not None + or cfg.pipeline_decoder_devices is not None + ) + if not balance_exists: + raise ValueError( + "--pipeline-balance is currently required for pipeline model parallelism" + ) + if not devices_exist: + raise ValueError( + "--pipeline-devices is currently required for pipeline model parallelism" + ) + + cfg.pipeline_balance = utils.eval_str_list(cfg.pipeline_balance, type=int) + if cfg.pipeline_devices is not None: + cfg.pipeline_devices = utils.eval_str_list(cfg.pipeline_devices, type=int) + num_pipeline_devices = len(set(cfg.pipeline_devices)) + else: + cfg.pipeline_encoder_devices = utils.eval_str_list( + cfg.pipeline_encoder_devices, type=int + ) + cfg.pipeline_decoder_devices = utils.eval_str_list( + cfg.pipeline_decoder_devices, type=int + ) + num_pipeline_devices = len( + set(cfg.pipeline_encoder_devices + cfg.pipeline_decoder_devices) + ) + gpus_per_node = torch.cuda.device_count() + assert ( + gpus_per_node >= num_pipeline_devices + and gpus_per_node % num_pipeline_devices == 0 + ), ( + "the number of unique device IDs in --pipeline-devices must evenly divide " + "the number of GPUs per node (multi-node pipelining is not yet supported)" + ) + num_pipelines_per_node = gpus_per_node // num_pipeline_devices + return num_pipeline_devices, num_pipelines_per_node + + +def _pipeline_parallel_post_init( + cfg: DistributedTrainingConfig, num_pipeline_devices, num_pipelines_per_node +): + if not cfg.distributed_no_spawn: + # When distributed_no_spawn is False, we expect distributed_rank and + # distributed_world_size to be based on the total number of GPUs, so + # we need to correct them to be based on the number of pipelines. 
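+        # (For example, with 4 pipeline devices per pipeline, a GPU-based world
+        # size of 16 corresponds to a pipeline-based world size of 4.)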
+ assert cfg.distributed_world_size % num_pipeline_devices == 0 + cfg.distributed_world_size = ( + cfg.distributed_world_size // num_pipeline_devices + ) + # In the case of 4-way MP on nodes with 8 GPUs, we want + # distributed_rank to be the starting GPU index for each pipeline + # i.e., 0, 2, ... + gpus_per_node = torch.cuda.device_count() + assert cfg.distributed_rank % gpus_per_node == 0 + assert cfg.distributed_rank % num_pipeline_devices == 0 + + with open_dict(cfg): + cfg.distributed_rank = cfg.distributed_rank // num_pipeline_devices + # launch one process per pipeline + cfg.distributed_num_procs = num_pipelines_per_node + + # if we have 4-way MP on a node with 8 GPUs, we want device_ids to be 0 + # and 4, indicating the starting device IDs for each pipeline + cfg.device_id *= num_pipeline_devices + + if cfg.device_id > 0: + # if there's multiple pipelines on a node (e.g., 4-way MP on an 8 + # GPU node), we need to adjust pipeline_devices accordingly + logger.debug( + "setting CUDA device={} on rank {}".format( + cfg.device_id, cfg.distributed_rank + ) + ) + torch.cuda.set_device(cfg.device_id) + with open_dict(cfg): + cfg.pipeline_devices = [cfg.device_id + d for d in cfg.pipeline_devices] + logger.info( + "setting pipeline_devices={} on rank {}".format( + cfg.pipeline_devices, cfg.distributed_rank + ) + ) + + +def distributed_init(cfg: FairseqConfig): + if isinstance(cfg, Namespace): + from fairseq.dataclass.utils import convert_namespace_to_omegaconf + + cfg = convert_namespace_to_omegaconf(cfg) + + if not cfg.common.tpu: + if torch.distributed.is_available() and torch.distributed.is_initialized(): + warnings.warn( + "Distributed is already initialized, cannot initialize twice!" + ) + else: + logger.info( + "distributed init (rank {}): {}".format( + cfg.distributed_training.distributed_rank, + cfg.distributed_training.distributed_init_method, + ) + ) + dist.init_process_group( + backend=cfg.distributed_training.distributed_backend, + init_method=cfg.distributed_training.distributed_init_method, + world_size=cfg.distributed_training.distributed_world_size, + rank=cfg.distributed_training.distributed_rank, + ) + logger.info( + "initialized host {} as rank {}".format( + socket.gethostname(), + cfg.distributed_training.distributed_rank, + ) + ) + + # perform a dummy all-reduce to initialize the NCCL communicator + if torch.cuda.is_available(): + dist.all_reduce(torch.zeros(1).cuda()) + + cfg.distributed_training.distributed_rank = torch.distributed.get_rank() + else: + assert xm.xrt_world_size() == cfg.distributed_training.distributed_world_size + global _USE_XLA + _USE_XLA = True + cfg.distributed_training.device_id = xm.get_local_ordinal() + cfg.distributed_training.distributed_rank = xm.get_ordinal() + xm.rendezvous("distributed_init") # wait for all workers + + if is_master(cfg.distributed_training): + logging.getLogger().setLevel(logging.INFO) + else: + logging.getLogger().setLevel(logging.WARNING) + + if cfg.common.model_parallel_size > 1: + try: + from fairseq.model_parallel.megatron.mpu import ( + initialize_model_parallel, + model_parallel_cuda_manual_seed, + ) + except ImportError: + raise ImportError( + "\n\nPlease install the megatron submodule:" + "\n\n git submodule update --init " + "fairseq/model_parallel/megatron" + ) + global _USE_MEGATRON + _USE_MEGATRON = True + initialize_model_parallel(cfg.common.model_parallel_size) + model_parallel_cuda_manual_seed(cfg.common.seed) + model_part_number = get_model_parallel_rank() + cfg.checkpoint.checkpoint_suffix += 
"-model_part-{0}".format(model_part_number) + + if hasattr(cfg, "model") and getattr(cfg.model, "base_layers", 0) > 0: + cfg.checkpoint.checkpoint_suffix = f"-rank-{cfg.distributed_training.distributed_rank}" + + return cfg.distributed_training.distributed_rank + + +def distributed_main(i, main, cfg: FairseqConfig, kwargs): + cfg.distributed_training.device_id = i + if torch.cuda.is_available() and not cfg.common.cpu and not cfg.common.tpu: + torch.cuda.set_device(cfg.distributed_training.device_id) + if cfg.distributed_training.distributed_rank is None: # torch.multiprocessing.spawn + cfg.distributed_training.distributed_rank = kwargs.pop("start_rank", 0) + i + + cfg.distributed_training.distributed_rank = distributed_init(cfg) + + after_distributed_init_fn = kwargs.pop("after_distributed_init_fn", None) + if after_distributed_init_fn: + cfg = after_distributed_init_fn(cfg) + + main(cfg, **kwargs) + + if torch.distributed.is_initialized(): + torch.distributed.barrier(get_global_group()) + + +def call_main(cfg: FairseqConfig, main, **kwargs): + if cfg.distributed_training.distributed_init_method is None: + infer_init_method(cfg.distributed_training) + + if cfg.distributed_training.distributed_init_method is not None: + # distributed training + if not cfg.distributed_training.distributed_no_spawn: + start_rank = cfg.distributed_training.distributed_rank + cfg.distributed_training.distributed_rank = None # assign automatically + kwargs["start_rank"] = start_rank + torch.multiprocessing.spawn( + fn=distributed_main, + args=(main, cfg, kwargs), + nprocs=min( + torch.cuda.device_count(), + cfg.distributed_training.distributed_world_size, + ), + join=True, + ) + else: + distributed_main(cfg.distributed_training.device_id, main, cfg, kwargs) + elif cfg.common.tpu and cfg.distributed_training.distributed_world_size > 1: + import torch_xla.distributed.xla_multiprocessing as xmp + + torch.multiprocessing.set_sharing_strategy("file_system") + xmp.spawn( + fn=distributed_main, + args=(main, cfg, kwargs), + # tpu-comment: + # 8 devices in one TPU VM, is the max processes to be spawned. 
+ # The rest is driven by xm.distributed.xla_dist + nprocs=min(cfg.distributed_training.distributed_world_size, 8), + ) + else: + # single GPU main + main(cfg, **kwargs) + + +def use_xla(): + global _USE_XLA + return _USE_XLA + + +def new_groups(grouped_ranks: List[List[int]]): + if use_xla(): + return ("tpu", grouped_ranks) + else: + groups = [dist.new_group(g) for g in grouped_ranks] + my_group_idx = _find_my_group_index(grouped_ranks) + return groups[my_group_idx] + + +def _find_my_group_index(grouped_ranks): + my_rank = get_global_rank() + for i, group in enumerate(grouped_ranks): + if my_rank in group: + return i + raise RuntimeError + + +def _find_my_group(grouped_ranks): + index = _find_my_group_index(grouped_ranks) + return grouped_ranks[index] + + +def get_rank(group): + if use_xla(): + assert group[0] == "tpu" + my_group = _find_my_group(group[1]) + return my_group.index(get_global_rank()) + else: + return dist.get_rank(group=group) + + +def get_world_size(group): + if use_xla(): + assert group[0] == "tpu" + my_group = _find_my_group(group[1]) + return len(my_group) + elif torch.distributed.is_initialized(): + return dist.get_world_size(group=group) + else: + return 1 + + +def get_global_group(): + if use_xla(): + return new_groups([list(range(get_global_world_size()))]) + elif torch.distributed.is_initialized(): + if not hasattr(get_global_group, "_global_group"): + # ideally we could use torch.distributed.group.WORLD, but it seems + # to cause random NCCL hangs in some cases + get_global_group._global_group = dist.new_group() + return get_global_group._global_group + else: + return None + + +def get_global_rank(): + if use_xla(): + return xm.get_ordinal() + elif torch.distributed.is_initialized(): + return torch.distributed.get_rank() + else: + return 0 + + +def get_global_world_size(): + if use_xla(): + return xm.xrt_world_size() + elif torch.distributed.is_initialized(): + return torch.distributed.get_world_size() + else: + return 1 + + +def get_data_parallel_group(): + """Get the data parallel group the caller rank belongs to.""" + global _USE_MEGATRON + if _USE_MEGATRON: + from fairseq.model_parallel.megatron import mpu + + return mpu.get_data_parallel_group() + else: + return get_global_group() + + +def get_data_parallel_rank(): + """Return my rank for the data parallel group.""" + return get_rank(get_data_parallel_group()) + + +def get_data_parallel_world_size(): + """Return world size for the data parallel group.""" + return get_world_size(get_data_parallel_group()) + + +def get_model_parallel_group(): + global _USE_MEGATRON + if _USE_MEGATRON: + from fairseq.model_parallel.megatron import mpu + + return mpu.get_model_parallel_group() + else: + return None + + +def get_model_parallel_rank(): + """Return my rank for the model parallel group.""" + return get_rank(get_model_parallel_group()) + + +def get_model_parallel_world_size(): + """Return world size for the model parallel group.""" + return get_world_size(get_model_parallel_group()) + + +def all_reduce(tensor, group, op="sum"): + if use_xla(): + assert isinstance(group, tuple) and group[0] == "tpu" + tensor = [tensor] # wrap in a list to make xm.all_reduce in-place + return xm.all_reduce(op, tensor, groups=group[1])[0] + else: + if op == "sum": + op = dist.ReduceOp.SUM + elif op == "max": + op = dist.ReduceOp.MAX + else: + raise NotImplementedError + dist.all_reduce(tensor, op=op, group=group) + return tensor + + +def broadcast(tensor, src, group): + if use_xla(): + # XLA doesn't support broadcast, hack it with 
all_reduce + if get_rank(group) != src: + tensor.zero_() + all_reduce(tensor, group) + else: + dist.broadcast(tensor, src=src, group=group) + + +def all_to_all(tensor, group): + """Perform an all-to-all operation on a 1D Tensor.""" + assert tensor.dim() == 1 + split_count = get_world_size(group=group) + assert tensor.numel() % split_count == 0 + if use_xla(): + assert isinstance(group, tuple) and group[0] == "tpu" + return xm.all_to_all( + tensor, + split_dimension=0, + concat_dimension=0, + split_count=split_count, + groups=group[1], + ) + else: + output = torch.zeros_like(tensor) + dist.all_to_all_single(output, tensor, group=group) + return output + + +def all_gather(tensor, group, return_tensor=False): + """Perform an all-gather operation.""" + if use_xla(): + result = xm.all_gather(tensor, groups=group[1]) + world_size = get_world_size(group=group) + result = result.view(world_size, *tensor.size()) + if return_tensor: + return result + else: + return [result[i] for i in range(world_size)] + else: + world_size = get_world_size(group=group) + rank = get_rank(group=group) + tensor_list = [ + tensor if i == rank else torch.empty_like(tensor) for i in range(world_size) + ] + dist.all_gather(tensor_list, tensor, group=group) + if return_tensor: + return torch.stack(tensor_list, dim=0) + else: + return tensor_list + + +def all_gather_list(data, group=None, max_size=16384): + """Gathers arbitrary data from all nodes into a list. + + Similar to :func:`~torch.distributed.all_gather` but for arbitrary Python + data. Note that *data* must be picklable and any CUDA tensors will be moved + to CPU and returned on CPU as well. + + Args: + data (Any): data from the local worker to be gathered on other workers + group: group of the collective + max_size (int, optional): maximum size of the data to be gathered + across workers + """ + from fairseq import utils + + if group is None: + group = get_global_group() + rank = get_rank(group=group) + world_size = get_world_size(group=group) + + buffer_size = max_size * world_size + if ( + not hasattr(all_gather_list, "_buffer") + or all_gather_list._buffer.numel() < buffer_size + ): + all_gather_list._buffer = torch.cuda.ByteTensor(buffer_size) + all_gather_list._cpu_buffer = torch.ByteTensor(max_size).pin_memory() + buffer = all_gather_list._buffer + buffer.zero_() + cpu_buffer = all_gather_list._cpu_buffer + + data = utils.move_to_cpu(data) + enc = pickle.dumps(data) + enc_size = len(enc) + header_size = 4 # size of header that contains the length of the encoded data + size = header_size + enc_size + if size > max_size: + raise ValueError( + "encoded data size ({}) exceeds max_size ({})".format(size, max_size) + ) + + header = struct.pack(">I", enc_size) + cpu_buffer[:size] = torch.ByteTensor(list(header + enc)) + start = rank * max_size + buffer[start : start + size].copy_(cpu_buffer[:size]) + + all_reduce(buffer, group=group) + + buffer = buffer.cpu() + try: + result = [] + for i in range(world_size): + out_buffer = buffer[i * max_size : (i + 1) * max_size] + (enc_size,) = struct.unpack(">I", bytes(out_buffer[:header_size].tolist())) + if enc_size > 0: + result.append( + pickle.loads( + bytes(out_buffer[header_size : header_size + enc_size].tolist()) + ) + ) + return result + except pickle.UnpicklingError: + raise Exception( + "Unable to unpickle data from other workers. all_gather_list requires all " + "workers to enter the function together, so this error usually indicates " + "that the workers have fallen out of sync somehow. 
Workers can fall out of " + "sync if one of them runs out of memory, or if there are other conditions " + "in your training script that can cause one worker to finish an epoch " + "while other workers are still iterating over their portions of the data. " + "Try rerunning with --ddp-backend=legacy_ddp and see if that helps." + ) + + +def all_reduce_dict(data: Mapping[str, Any], device, group) -> Dict[str, Any]: + """ + AllReduce a dictionary of values across workers. We separately + reduce items that are already on the device and items on CPU for + better performance. + + Args: + data (Mapping[str, Any]): dictionary of data to all-reduce, but + cannot be a nested dictionary + device (torch.device): device for the reduction + group: group of the collective + """ + data_keys = list(data.keys()) + + # We want to separately reduce items that are already on the + # device and items on CPU for performance reasons. + cpu_data = OrderedDict() + device_data = OrderedDict() + for k in data_keys: + t = data[k] + if not torch.is_tensor(t): + cpu_data[k] = torch.tensor(t, dtype=torch.double) + elif t.device.type != device.type: + cpu_data[k] = t.to(dtype=torch.double) + else: + device_data[k] = t.to(dtype=torch.double) + + def _all_reduce_dict(data: OrderedDict): + if len(data) == 0: + return data + buf = torch.cat([t.view(-1) for t in data.values()]).to(device=device) + all_reduce(buf, group=group) + split_buf = torch.split(buf, [t.numel() for t in data.values()]) + reduced_data = [t.view_as(orig) for t, orig in zip(split_buf, data.values())] + return OrderedDict(zip(data.keys(), reduced_data)) + + cpu_data = _all_reduce_dict(cpu_data) + device_data = _all_reduce_dict(device_data) + + def get_from_stack(key): + if key in cpu_data: + return cpu_data[key] + elif key in device_data: + return device_data[key] + raise KeyError + + return OrderedDict([(key, get_from_stack(key)) for key in data_keys]) + + +def broadcast_tensors( + tensors: Optional[List[torch.Tensor]], + src_rank: int, + group: object, + dist_device: Optional[torch.device] = None, +) -> List[torch.Tensor]: + """ + Broadcasts a list of tensors without other (non-src) ranks needing to know + the dtypes/shapes of the tensors. 
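+
+    A minimal usage sketch (assumes ``torch.distributed`` is already
+    initialized, rank 0 is the source, and the tensor shape is illustrative
+    only)::
+
+        group = get_global_group()
+        if get_rank(group) == 0:
+            out = broadcast_tensors([torch.ones(2, 3)], src_rank=0, group=group)
+        else:
+            out = broadcast_tensors(None, src_rank=0, group=group)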
+ """ + if dist_device is None: + if torch.distributed.get_backend(group) == "nccl": + dist_device = torch.device("cuda") + else: + dist_device = torch.device("cpu") + + # share metadata first to simplify transfer + is_src_rank = (get_rank(group) == src_rank) + if is_src_rank: + metadata = [ + {"size": t.size(), "dtype": t.dtype, "device": t.device} for t in tensors + ] + metadata = _broadcast_object_slow(metadata, src_rank, group, dist_device) + else: + metadata = _broadcast_object_slow(None, src_rank, group, dist_device) + + out_tensors = [] + for i, meta in enumerate(metadata): + if is_src_rank: + tensor = tensors[i] + broadcast(tensors[i].to(dist_device), src=src_rank, group=group) + else: + tensor = torch.zeros( + [meta["size"].numel()], dtype=meta["dtype"], device=dist_device + ) + broadcast(tensor, src=src_rank, group=group) + tensor = tensor.view(meta["size"]).to(meta["device"]) + out_tensors.append(tensor) + return out_tensors + + +def broadcast_object( + obj: Any, + src_rank: int, + group: object, + dist_device: Optional[torch.device] = None, +) -> Any: + """Broadcast an arbitrary Python object to other workers.""" + if dist_device is None: + if torch.distributed.get_backend(group) == "nccl": + dist_device = torch.device("cuda") + else: + dist_device = torch.device("cpu") + + if get_rank(group) == src_rank: + # split the tensors from the non-tensors so we can broadcast them + # directly, avoiding unnecessary serialization/deserialization + tensors = [] + obj = _split_tensors_from_obj(obj, tensors) + obj = _broadcast_object_slow(obj, src_rank, group, dist_device) + tensors = broadcast_tensors(tensors, src_rank, group, dist_device) + else: + obj = _broadcast_object_slow(None, src_rank, group, dist_device) + tensors = broadcast_tensors(None, src_rank, group, dist_device) + return _put_tensors_in_obj(obj, tensors) + + +def _broadcast_object_slow( + obj: Any, src_rank: int, group: object, dist_device: torch.device, +) -> Any: + if get_rank(group) == src_rank: + # Emit data + buffer = io.BytesIO() + torch.save(obj, buffer) + buffer = torch.ByteTensor(buffer.getbuffer()).to(dist_device) + length = torch.LongTensor([len(buffer)]).to(dist_device) + broadcast(length, src=src_rank, group=group) + broadcast(buffer, src=src_rank, group=group) + else: + # Fetch from the source + length = torch.LongTensor([0]).to(dist_device) + broadcast(length, src=src_rank, group=group) + buffer = torch.ByteTensor(int(length.item())).to(dist_device) + broadcast(buffer, src=src_rank, group=group) + buffer = io.BytesIO(buffer.cpu().numpy()) + obj = torch.load(buffer, map_location="cpu") + return obj + + +@dataclass(frozen=True) +class _TensorPlaceholder: + index: int + + +def _split_tensors_from_obj(obj: Any, tensors: List[torch.Tensor]) -> Any: + if torch.is_tensor(obj): + placeholder = _TensorPlaceholder(index=len(tensors)) + tensors.append(obj) + return placeholder + elif isinstance(obj, dict): + return {k: _split_tensors_from_obj(v, tensors) for k, v in obj.items()} + elif isinstance(obj, list): + return [_split_tensors_from_obj(v, tensors) for v in obj] + elif isinstance(obj, tuple): + return tuple(_split_tensors_from_obj(v, tensors) for v in obj) + elif isinstance(obj, set): + return {_split_tensors_from_obj(v, tensors) for v in obj} + else: + return obj + + +def _put_tensors_in_obj(obj: Any, tensors: List[torch.Tensor]) -> Any: + if isinstance(obj, _TensorPlaceholder): + return tensors[obj.index] + elif isinstance(obj, dict): + return {k: _put_tensors_in_obj(v, tensors) for k, v in obj.items()} + elif 
isinstance(obj, list): + return [_put_tensors_in_obj(v, tensors) for v in obj] + elif isinstance(obj, tuple): + return tuple(_put_tensors_in_obj(v, tensors) for v in obj) + elif isinstance(obj, set): + return {_put_tensors_in_obj(v, tensors) for v in obj} + else: + return obj diff --git a/SpeechT5/fairseq/fairseq/file_io.py b/SpeechT5/fairseq/fairseq/file_io.py new file mode 100644 index 0000000000000000000000000000000000000000..dba663d4aafeb925ddffa50f5055933d6531a069 --- /dev/null +++ b/SpeechT5/fairseq/fairseq/file_io.py @@ -0,0 +1,194 @@ +#!/usr/bin/env python3 + +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +import logging +import os +import shutil +from typing import List, Optional + + +logger = logging.getLogger(__file__) + + +try: + from iopath.common.file_io import g_pathmgr as IOPathManager + + try: + # [FB only - for now] AWS PathHandler for PathManager + from .fb_pathhandlers import S3PathHandler + + IOPathManager.register_handler(S3PathHandler()) + except KeyError: + logging.warning("S3PathHandler already registered.") + except ImportError: + logging.debug( + "S3PathHandler couldn't be imported. Either missing fb-only files, or boto3 module." + ) + +except ImportError: + IOPathManager = None + + +class PathManager: + """ + Wrapper for insulating OSS I/O (using Python builtin operations) from + iopath's PathManager abstraction (for transparently handling various + internal backends). + """ + + @staticmethod + def open( + path: str, + mode: str = "r", + buffering: int = -1, + encoding: Optional[str] = None, + errors: Optional[str] = None, + newline: Optional[str] = None, + ): + if IOPathManager: + return IOPathManager.open( + path=path, + mode=mode, + buffering=buffering, + encoding=encoding, + errors=errors, + newline=newline, + ) + return open( + path, + mode=mode, + buffering=buffering, + encoding=encoding, + errors=errors, + newline=newline, + ) + + @staticmethod + def copy(src_path: str, dst_path: str, overwrite: bool = False) -> bool: + if IOPathManager: + return IOPathManager.copy( + src_path=src_path, dst_path=dst_path, overwrite=overwrite + ) + return shutil.copyfile(src_path, dst_path) + + @staticmethod + def get_local_path(path: str, **kwargs) -> str: + if IOPathManager: + return IOPathManager.get_local_path(path, **kwargs) + return path + + @staticmethod + def exists(path: str) -> bool: + if IOPathManager: + return IOPathManager.exists(path) + return os.path.exists(path) + + @staticmethod + def isfile(path: str) -> bool: + if IOPathManager: + return IOPathManager.isfile(path) + return os.path.isfile(path) + + @staticmethod + def ls(path: str) -> List[str]: + if IOPathManager: + return IOPathManager.ls(path) + return os.listdir(path) + + @staticmethod + def mkdirs(path: str) -> None: + if IOPathManager: + return IOPathManager.mkdirs(path) + os.makedirs(path, exist_ok=True) + + @staticmethod + def rm(path: str) -> None: + if IOPathManager: + return IOPathManager.rm(path) + os.remove(path) + + @staticmethod + def chmod(path: str, mode: int) -> None: + if not PathManager.path_requires_pathmanager(path): + os.chmod(path, mode) + + @staticmethod + def register_handler(handler) -> None: + if IOPathManager: + return IOPathManager.register_handler(handler=handler) + + @staticmethod + def copy_from_local( + local_path: str, dst_path: str, overwrite: bool = False, **kwargs + ) -> None: + if IOPathManager: + return 
IOPathManager.copy_from_local( + local_path=local_path, dst_path=dst_path, overwrite=overwrite, **kwargs + ) + return shutil.copyfile(local_path, dst_path) + + @staticmethod + def path_requires_pathmanager(path: str) -> bool: + """Do we require PathManager to access given path?""" + if IOPathManager: + for p in IOPathManager._path_handlers.keys(): + if path.startswith(p): + return True + return False + + @staticmethod + def supports_rename(path: str) -> bool: + # PathManager doesn't yet support renames + return not PathManager.path_requires_pathmanager(path) + + @staticmethod + def rename(src: str, dst: str): + os.rename(src, dst) + + """ + ioPath async PathManager methods: + """ + @staticmethod + def opena( + path: str, + mode: str = "r", + buffering: int = -1, + encoding: Optional[str] = None, + errors: Optional[str] = None, + newline: Optional[str] = None, + ): + """ + Return file descriptor with asynchronous write operations. + """ + global IOPathManager + if not IOPathManager: + logging.info("ioPath is initializing PathManager.") + try: + from iopath.common.file_io import PathManager + IOPathManager = PathManager() + except Exception: + logging.exception("Failed to initialize ioPath PathManager object.") + return IOPathManager.opena( + path=path, + mode=mode, + buffering=buffering, + encoding=encoding, + errors=errors, + newline=newline, + ) + + @staticmethod + def async_close() -> bool: + """ + Wait for files to be written and clean up asynchronous PathManager. + NOTE: `PathManager.async_close()` must be called at the end of any + script that uses `PathManager.opena(...)`. + """ + global IOPathManager + if IOPathManager: + return IOPathManager.async_close() + return False diff --git a/SpeechT5/fairseq/fairseq/file_utils.py b/SpeechT5/fairseq/fairseq/file_utils.py new file mode 100644 index 0000000000000000000000000000000000000000..d1d5ea65746682881264e4a9c462854dcfb3413f --- /dev/null +++ b/SpeechT5/fairseq/fairseq/file_utils.py @@ -0,0 +1,369 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +""" +Utilities for working with the local dataset cache. +This file is adapted from `AllenNLP <https://github.com/allenai/allennlp>`_. +and `huggingface <https://github.com/huggingface>`_. 
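+
+The usual entry point is :func:`cached_path`, which resolves a URL or local
+path to a local file, downloading and caching it if needed (the URL below is
+illustrative only)::
+
+    local_file = cached_path("https://example.com/archive.tar.gz")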
+""" + +import fnmatch +import json +import logging +import os +import shutil +import tarfile +import tempfile +from functools import partial, wraps +from hashlib import sha256 +from io import open + + +try: + from torch.hub import _get_torch_home + + torch_cache_home = _get_torch_home() +except ImportError: + torch_cache_home = os.path.expanduser( + os.getenv( + "TORCH_HOME", os.path.join(os.getenv("XDG_CACHE_HOME", "~/.cache"), "torch") + ) + ) +default_cache_path = os.path.join(torch_cache_home, "pytorch_fairseq") + +try: + from urllib.parse import urlparse +except ImportError: + from urlparse import urlparse + +try: + from pathlib import Path + + PYTORCH_FAIRSEQ_CACHE = Path(os.getenv("PYTORCH_FAIRSEQ_CACHE", default_cache_path)) +except (AttributeError, ImportError): + PYTORCH_FAIRSEQ_CACHE = os.getenv("PYTORCH_FAIRSEQ_CACHE", default_cache_path) + +CONFIG_NAME = "config.json" +WEIGHTS_NAME = "pytorch_model.bin" + +logger = logging.getLogger(__name__) # pylint: disable=invalid-name + + +def load_archive_file(archive_file): + # redirect to the cache, if necessary + try: + resolved_archive_file = cached_path(archive_file, cache_dir=None) + except EnvironmentError: + logger.info( + "Archive name '{}' was not found in archive name list. " + "We assumed '{}' was a path or URL but couldn't find any file " + "associated to this path or URL.".format( + archive_file, + archive_file, + ) + ) + return None + + if resolved_archive_file == archive_file: + logger.info("loading archive file {}".format(archive_file)) + else: + logger.info( + "loading archive file {} from cache at {}".format( + archive_file, resolved_archive_file + ) + ) + + # Extract archive to temp dir and replace .tar.bz2 if necessary + tempdir = None + if not os.path.isdir(resolved_archive_file): + tempdir = tempfile.mkdtemp() + logger.info( + "extracting archive file {} to temp dir {}".format( + resolved_archive_file, tempdir + ) + ) + ext = os.path.splitext(archive_file)[1][1:] + with tarfile.open(resolved_archive_file, "r:" + ext) as archive: + top_dir = os.path.commonprefix(archive.getnames()) + archive.extractall(tempdir) + os.remove(resolved_archive_file) + shutil.move(os.path.join(tempdir, top_dir), resolved_archive_file) + shutil.rmtree(tempdir) + + return resolved_archive_file + + +def url_to_filename(url, etag=None): + """ + Convert `url` into a hashed filename in a repeatable way. + If `etag` is specified, append its hash to the URL's, delimited + by a period. + """ + url_bytes = url.encode("utf-8") + url_hash = sha256(url_bytes) + filename = url_hash.hexdigest() + + if etag: + etag_bytes = etag.encode("utf-8") + etag_hash = sha256(etag_bytes) + filename += "." + etag_hash.hexdigest() + + return filename + + +def filename_to_url(filename, cache_dir=None): + """ + Return the url and etag (which may be ``None``) stored for `filename`. + Raise ``EnvironmentError`` if `filename` or its stored metadata do not exist. 
+ """ + if cache_dir is None: + cache_dir = PYTORCH_FAIRSEQ_CACHE + if isinstance(cache_dir, Path): + cache_dir = str(cache_dir) + + cache_path = os.path.join(cache_dir, filename) + if not os.path.exists(cache_path): + raise EnvironmentError("file {} not found".format(cache_path)) + + meta_path = cache_path + ".json" + if not os.path.exists(meta_path): + raise EnvironmentError("file {} not found".format(meta_path)) + + with open(meta_path, encoding="utf-8") as meta_file: + metadata = json.load(meta_file) + url = metadata["url"] + etag = metadata["etag"] + + return url, etag + + +def cached_path_from_pm(url_or_filename): + """ + Tries to cache the specified URL using PathManager class. + Returns the cached path if success otherwise failure. + """ + try: + from fairseq.file_io import PathManager + local_path = PathManager.get_local_path(url_or_filename) + return local_path + except Exception: + return None + + +def cached_path(url_or_filename, cache_dir=None): + """ + Given something that might be a URL (or might be a local path), + determine which. If it's a URL, download the file and cache it, and + return the path to the cached file. If it's already a local path, + make sure the file exists and then return the path. + """ + if cache_dir is None: + cache_dir = PYTORCH_FAIRSEQ_CACHE + if isinstance(url_or_filename, Path): + url_or_filename = str(url_or_filename) + if isinstance(cache_dir, Path): + cache_dir = str(cache_dir) + + parsed = urlparse(url_or_filename) + + if parsed.scheme in ("http", "https", "s3"): + # URL, so get it from the cache (downloading if necessary) + return get_from_cache(url_or_filename, cache_dir) + elif os.path.exists(url_or_filename): + # File, and it exists. + return url_or_filename + elif parsed.scheme == "": + # File, but it doesn't exist. + raise EnvironmentError("file {} not found".format(url_or_filename)) + else: + cached_path = cached_path_from_pm(url_or_filename) + if cached_path: + return cached_path + # Something unknown + raise ValueError( + "unable to parse {} as a URL or as a local path".format(url_or_filename) + ) + + +def split_s3_path(url): + """Split a full s3 path into the bucket name and path.""" + parsed = urlparse(url) + if not parsed.netloc or not parsed.path: + raise ValueError("bad s3 path {}".format(url)) + bucket_name = parsed.netloc + s3_path = parsed.path + # Remove '/' at beginning of path. + if s3_path.startswith("/"): + s3_path = s3_path[1:] + return bucket_name, s3_path + + +def s3_request(func): + """ + Wrapper function for s3 requests in order to create more helpful error + messages. 
+ """ + + @wraps(func) + def wrapper(url, *args, **kwargs): + from botocore.exceptions import ClientError + + try: + return func(url, *args, **kwargs) + except ClientError as exc: + if int(exc.response["Error"]["Code"]) == 404: + raise EnvironmentError("file {} not found".format(url)) + else: + raise + + return wrapper + + +@s3_request +def s3_etag(url): + """Check ETag on S3 object.""" + import boto3 + + s3_resource = boto3.resource("s3") + bucket_name, s3_path = split_s3_path(url) + s3_object = s3_resource.Object(bucket_name, s3_path) + return s3_object.e_tag + + +@s3_request +def s3_get(url, temp_file): + """Pull a file directly from S3.""" + import boto3 + + s3_resource = boto3.resource("s3") + bucket_name, s3_path = split_s3_path(url) + s3_resource.Bucket(bucket_name).download_fileobj(s3_path, temp_file) + + +def request_wrap_timeout(func, url): + import requests + + for attempt, timeout in enumerate([10, 20, 40, 60, 60]): + try: + return func(timeout=timeout) + except requests.exceptions.Timeout as e: + logger.warning( + "Request for %s timed-out (attempt %d). Retrying with a timeout of %d secs", + url, + attempt, + timeout, + exc_info=e, + ) + continue + raise RuntimeError(f"Unable to fetch file {url}") + + +def http_get(url, temp_file): + import requests + from tqdm import tqdm + + req = request_wrap_timeout(partial(requests.get, url, stream=True), url) + content_length = req.headers.get("Content-Length") + total = int(content_length) if content_length is not None else None + progress = tqdm(unit="B", total=total) + for chunk in req.iter_content(chunk_size=1024): + if chunk: # filter out keep-alive new chunks + progress.update(len(chunk)) + temp_file.write(chunk) + progress.close() + + +def get_from_cache(url, cache_dir=None): + """ + Given a URL, look for the corresponding dataset in the local cache. + If it's not there, download it. Then return the path to the cached file. + """ + if cache_dir is None: + cache_dir = PYTORCH_FAIRSEQ_CACHE + if isinstance(cache_dir, Path): + cache_dir = str(cache_dir) + + if not os.path.exists(cache_dir): + os.makedirs(cache_dir) + + # Get eTag to add to filename, if it exists. + if url.startswith("s3://"): + etag = s3_etag(url) + else: + try: + import requests + + response = request_wrap_timeout( + partial(requests.head, url, allow_redirects=True), url + ) + if response.status_code != 200: + etag = None + else: + etag = response.headers.get("ETag") + except RuntimeError: + etag = None + + filename = url_to_filename(url, etag) + + # get cache path to put the file + cache_path = os.path.join(cache_dir, filename) + + # If we don't have a connection (etag is None) and can't identify the file + # try to get the last downloaded one + if not os.path.exists(cache_path) and etag is None: + matching_files = fnmatch.filter(os.listdir(cache_dir), filename + ".*") + matching_files = list(filter(lambda s: not s.endswith(".json"), matching_files)) + if matching_files: + cache_path = os.path.join(cache_dir, matching_files[-1]) + + if not os.path.exists(cache_path): + # Download to temporary file, then copy to cache dir once finished. + # Otherwise you get corrupt cache entries if the download gets interrupted. 
+ with tempfile.NamedTemporaryFile() as temp_file: + logger.info("%s not found in cache, downloading to %s", url, temp_file.name) + + # GET file object + if url.startswith("s3://"): + s3_get(url, temp_file) + else: + http_get(url, temp_file) + + # we are copying the file before closing it, so flush to avoid truncation + temp_file.flush() + # shutil.copyfileobj() starts at the current position, so go to the start + temp_file.seek(0) + + logger.info("copying %s to cache at %s", temp_file.name, cache_path) + with open(cache_path, "wb") as cache_file: + shutil.copyfileobj(temp_file, cache_file) + + logger.info("creating metadata file for %s", cache_path) + meta = {"url": url, "etag": etag} + meta_path = cache_path + ".json" + with open(meta_path, "w") as meta_file: + output_string = json.dumps(meta) + meta_file.write(output_string) + + logger.info("removing temp file %s", temp_file.name) + + return cache_path + + +def read_set_from_file(filename): + """ + Extract a de-duped collection (set) of text from a file. + Expected file format is one item per line. + """ + collection = set() + with open(filename, "r", encoding="utf-8") as file_: + for line in file_: + collection.add(line.rstrip()) + return collection + + +def get_file_extension(path, dot=True, lower=True): + ext = os.path.splitext(path)[1] + ext = ext if dot else ext[1:] + return ext.lower() if lower else ext diff --git a/SpeechT5/fairseq/fairseq/hub_utils.py b/SpeechT5/fairseq/fairseq/hub_utils.py new file mode 100644 index 0000000000000000000000000000000000000000..d74470d2ecba2825221a2efa2ce21a9b698340df --- /dev/null +++ b/SpeechT5/fairseq/fairseq/hub_utils.py @@ -0,0 +1,303 @@ +#!/usr/bin/env python3 -u +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. 
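+
+# Usage sketch (hedged): the archive name below is illustrative and is normally
+# resolved through a model's `archive_map`; see `from_pretrained` below.
+#
+#     bundle = from_pretrained("some-model-archive", checkpoint_file="model.pt")
+#     hub = GeneratorHubInterface(bundle["args"], bundle["task"], bundle["models"])
+#     hub.translate(["Hello world!"])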
+ +import argparse +import copy +import logging +import os +from typing import Any, Dict, Iterator, List + +import torch +from fairseq import utils +from fairseq.data import encoders +from omegaconf import open_dict +from torch import nn + + +logger = logging.getLogger(__name__) + + +def from_pretrained( + model_name_or_path, + checkpoint_file="model.pt", + data_name_or_path=".", + archive_map=None, + **kwargs +): + from fairseq import checkpoint_utils, file_utils + + if archive_map is not None: + if model_name_or_path in archive_map: + model_name_or_path = archive_map[model_name_or_path] + if data_name_or_path is not None and data_name_or_path in archive_map: + data_name_or_path = archive_map[data_name_or_path] + + # allow archive_map to set default arg_overrides (e.g., tokenizer, bpe) + # for each model + if isinstance(model_name_or_path, dict): + for k, v in model_name_or_path.items(): + if k == "checkpoint_file": + checkpoint_file = v + elif ( + k != "path" + # only set kwargs that don't already have overrides + and k not in kwargs + ): + kwargs[k] = v + model_name_or_path = model_name_or_path["path"] + + model_path = file_utils.load_archive_file(model_name_or_path) + + # convenience hack for loading data and BPE codes from model archive + if data_name_or_path.startswith("."): + kwargs["data"] = os.path.abspath(os.path.join(model_path, data_name_or_path)) + else: + kwargs["data"] = file_utils.load_archive_file(data_name_or_path) + for file, arg in { + "code": "bpe_codes", + "bpecodes": "bpe_codes", + "sentencepiece.bpe.model": "sentencepiece_model", + "merges.txt": "bpe_merges", + "vocab.json": "bpe_vocab", + }.items(): + path = os.path.join(model_path, file) + if os.path.exists(path): + kwargs[arg] = path + + if "user_dir" in kwargs: + utils.import_user_module(argparse.Namespace(user_dir=kwargs["user_dir"])) + + models, args, task = checkpoint_utils.load_model_ensemble_and_task( + [os.path.join(model_path, cpt) for cpt in checkpoint_file.split(os.pathsep)], + arg_overrides=kwargs, + ) + + return { + "args": args, + "task": task, + "models": models, + } + + +class GeneratorHubInterface(nn.Module): + """ + PyTorch Hub interface for generating sequences from a pre-trained + translation or language model. 
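+
+    ``translate``/``sample`` run the full pipeline: ``encode`` (tokenize ->
+    BPE -> binarize), beam search via ``generate``, then ``decode`` (string ->
+    remove BPE -> detokenize).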
+ """ + + def __init__(self, cfg, task, models): + super().__init__() + self.cfg = cfg + self.task = task + self.models = nn.ModuleList(models) + self.src_dict = task.source_dictionary + self.tgt_dict = task.target_dictionary + + # optimize model for generation + for model in self.models: + model.prepare_for_inference_(cfg) + + # Load alignment dictionary for unknown word replacement + # (None if no unknown word replacement, empty if no path to align dictionary) + self.align_dict = utils.load_align_dict(cfg.generation.replace_unk) + + self.tokenizer = encoders.build_tokenizer(cfg.tokenizer) + self.bpe = encoders.build_bpe(cfg.bpe) + + self.max_positions = utils.resolve_max_positions( + self.task.max_positions(), *[model.max_positions() for model in models] + ) + + # this is useful for determining the device + self.register_buffer("_float_tensor", torch.tensor([0], dtype=torch.float)) + + @property + def device(self): + return self._float_tensor.device + + def translate( + self, sentences: List[str], beam: int = 5, verbose: bool = False, **kwargs + ) -> List[str]: + return self.sample(sentences, beam, verbose, **kwargs) + + def sample( + self, sentences: List[str], beam: int = 1, verbose: bool = False, **kwargs + ) -> List[str]: + if isinstance(sentences, str): + return self.sample([sentences], beam=beam, verbose=verbose, **kwargs)[0] + tokenized_sentences = [self.encode(sentence) for sentence in sentences] + batched_hypos = self.generate(tokenized_sentences, beam, verbose, **kwargs) + return [self.decode(hypos[0]["tokens"]) for hypos in batched_hypos] + + def score(self, sentences: List[str], **kwargs): + if isinstance(sentences, str): + return self.score([sentences], **kwargs)[0] + # NOTE: this doesn't support translation tasks currently + tokenized_sentences = [self.encode(sentence) for sentence in sentences] + return [ + hypos[0] + for hypos in self.generate( + tokenized_sentences, score_reference=True, **kwargs + ) + ] + + def generate( + self, + tokenized_sentences: List[torch.LongTensor], + beam: int = 5, + verbose: bool = False, + skip_invalid_size_inputs=False, + inference_step_args=None, + prefix_allowed_tokens_fn=None, + **kwargs + ) -> List[List[Dict[str, torch.Tensor]]]: + if torch.is_tensor(tokenized_sentences) and tokenized_sentences.dim() == 1: + return self.generate( + tokenized_sentences.unsqueeze(0), beam=beam, verbose=verbose, **kwargs + )[0] + + # build generator using current args as well as any kwargs + gen_args = copy.deepcopy(self.cfg.generation) + with open_dict(gen_args): + gen_args.beam = beam + for k, v in kwargs.items(): + setattr(gen_args, k, v) + generator = self.task.build_generator( + self.models, + gen_args, + prefix_allowed_tokens_fn=prefix_allowed_tokens_fn, + ) + + inference_step_args = inference_step_args or {} + results = [] + for batch in self._build_batches(tokenized_sentences, skip_invalid_size_inputs): + batch = utils.apply_to_sample(lambda t: t.to(self.device), batch) + translations = self.task.inference_step( + generator, self.models, batch, **inference_step_args + ) + for id, hypos in zip(batch["id"].tolist(), translations): + results.append((id, hypos)) + + # sort output to match input order + outputs = [hypos for _, hypos in sorted(results, key=lambda x: x[0])] + + if verbose: + + def getarg(name, default): + return getattr(gen_args, name, getattr(self.cfg, name, default)) + + for source_tokens, target_hypotheses in zip(tokenized_sentences, outputs): + src_str_with_unk = self.string(source_tokens) + 
logger.info("S\t{}".format(src_str_with_unk)) + for hypo in target_hypotheses: + hypo_str = self.decode(hypo["tokens"]) + logger.info("H\t{}\t{}".format(hypo["score"], hypo_str)) + logger.info( + "P\t{}".format( + " ".join( + map( + lambda x: "{:.4f}".format(x), + hypo["positional_scores"].tolist(), + ) + ) + ) + ) + if hypo["alignment"] is not None and getarg( + "print_alignment", False + ): + logger.info( + "A\t{}".format( + " ".join( + [ + "{}-{}".format(src_idx, tgt_idx) + for src_idx, tgt_idx in hypo["alignment"] + ] + ) + ) + ) + return outputs + + def encode(self, sentence: str) -> torch.LongTensor: + sentence = self.tokenize(sentence) + sentence = self.apply_bpe(sentence) + return self.binarize(sentence) + + def decode(self, tokens: torch.LongTensor) -> str: + sentence = self.string(tokens) + sentence = self.remove_bpe(sentence) + return self.detokenize(sentence) + + def tokenize(self, sentence: str) -> str: + if self.tokenizer is not None: + sentence = self.tokenizer.encode(sentence) + return sentence + + def detokenize(self, sentence: str) -> str: + if self.tokenizer is not None: + sentence = self.tokenizer.decode(sentence) + return sentence + + def apply_bpe(self, sentence: str) -> str: + if self.bpe is not None: + sentence = self.bpe.encode(sentence) + return sentence + + def remove_bpe(self, sentence: str) -> str: + if self.bpe is not None: + sentence = self.bpe.decode(sentence) + return sentence + + def binarize(self, sentence: str) -> torch.LongTensor: + return self.src_dict.encode_line(sentence, add_if_not_exist=False).long() + + def string(self, tokens: torch.LongTensor) -> str: + return self.tgt_dict.string(tokens) + + def _build_batches( + self, tokens: List[List[int]], skip_invalid_size_inputs: bool + ) -> Iterator[Dict[str, Any]]: + lengths = torch.LongTensor([t.numel() for t in tokens]) + batch_iterator = self.task.get_batch_iterator( + dataset=self.task.build_dataset_for_inference(tokens, lengths), + max_tokens=self.cfg.dataset.max_tokens, + max_sentences=self.cfg.dataset.batch_size, + max_positions=self.max_positions, + ignore_invalid_inputs=skip_invalid_size_inputs, + disable_iterator_cache=True, + ).next_epoch_itr(shuffle=False) + return batch_iterator + + +class BPEHubInterface(object): + """PyTorch Hub interface for Byte-Pair Encoding (BPE).""" + + def __init__(self, bpe, **kwargs): + super().__init__() + args = argparse.Namespace(bpe=bpe, **kwargs) + self.bpe = encoders.build_bpe(args) + assert self.bpe is not None + + def encode(self, sentence: str) -> str: + return self.bpe.encode(sentence) + + def decode(self, sentence: str) -> str: + return self.bpe.decode(sentence) + + +class TokenizerHubInterface(object): + """PyTorch Hub interface for tokenization.""" + + def __init__(self, tokenizer, **kwargs): + super().__init__() + args = argparse.Namespace(tokenizer=tokenizer, **kwargs) + self.tokenizer = encoders.build_tokenizer(args) + assert self.tokenizer is not None + + def encode(self, sentence: str) -> str: + return self.tokenizer.encode(sentence) + + def decode(self, sentence: str) -> str: + return self.tokenizer.decode(sentence) diff --git a/SpeechT5/fairseq/fairseq/incremental_decoding_utils.py b/SpeechT5/fairseq/fairseq/incremental_decoding_utils.py new file mode 100644 index 0000000000000000000000000000000000000000..b26e6cd01cd4cbdffa23d88b354eb4a55a94189b --- /dev/null +++ b/SpeechT5/fairseq/fairseq/incremental_decoding_utils.py @@ -0,0 +1,51 @@ +# Copyright (c) Facebook, Inc. and its affiliates. 
+# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +import uuid +from typing import Dict, Optional + +from torch import Tensor + + +class FairseqIncrementalState(object): + def __init__(self, *args, **kwargs): + super().__init__(*args, **kwargs) + self.init_incremental_state() + + def init_incremental_state(self): + self._incremental_state_id = str(uuid.uuid4()) + + def _get_full_incremental_state_key(self, key: str) -> str: + return "{}.{}".format(self._incremental_state_id, key) + + def get_incremental_state( + self, + incremental_state: Optional[Dict[str, Dict[str, Optional[Tensor]]]], + key: str, + ) -> Optional[Dict[str, Optional[Tensor]]]: + """Helper for getting incremental state for an nn.Module.""" + full_key = self._get_full_incremental_state_key(key) + if incremental_state is None or full_key not in incremental_state: + return None + return incremental_state[full_key] + + def set_incremental_state( + self, + incremental_state: Optional[Dict[str, Dict[str, Optional[Tensor]]]], + key: str, + value: Dict[str, Optional[Tensor]], + ) -> Optional[Dict[str, Dict[str, Optional[Tensor]]]]: + """Helper for setting incremental state for an nn.Module.""" + if incremental_state is not None: + full_key = self._get_full_incremental_state_key(key) + incremental_state[full_key] = value + return incremental_state + + +def with_incremental_state(cls): + cls.__bases__ = (FairseqIncrementalState,) + tuple( + b for b in cls.__bases__ if b != FairseqIncrementalState + ) + return cls diff --git a/SpeechT5/fairseq/fairseq/iterative_refinement_generator.py b/SpeechT5/fairseq/fairseq/iterative_refinement_generator.py new file mode 100644 index 0000000000000000000000000000000000000000..4fb0946f499329ceb130761b59675d761df1c158 --- /dev/null +++ b/SpeechT5/fairseq/fairseq/iterative_refinement_generator.py @@ -0,0 +1,359 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +from collections import namedtuple + +import numpy as np +import torch +from fairseq import utils + + +DecoderOut = namedtuple( + "IterativeRefinementDecoderOut", + ["output_tokens", "output_scores", "attn", "step", "max_step", "history"], +) + + +class IterativeRefinementGenerator(object): + def __init__( + self, + tgt_dict, + models=None, + eos_penalty=0.0, + max_iter=10, + max_ratio=2, + beam_size=1, + decoding_format=None, + retain_dropout=False, + adaptive=True, + retain_history=False, + reranking=False, + ): + """ + Generates translations based on iterative refinement. 
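+
+        At each refinement step the model re-predicts the output tokens; with
+        ``adaptive=True`` a hypothesis is finalized as soon as it stops
+        changing between steps, otherwise only after *max_iter* iterations.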
+ + Args: + tgt_dict: target dictionary + eos_penalty: if > 0.0, it penalized early-stopping in decoding + max_iter: maximum number of refinement iterations + max_ratio: generate sequences of maximum length ax, where x is the source length + decoding_format: decoding mode in {'unigram', 'ensemble', 'vote', 'dp', 'bs'} + retain_dropout: retaining dropout in the inference + adaptive: decoding with early stop + """ + self.bos = tgt_dict.bos() + self.pad = tgt_dict.pad() + self.unk = tgt_dict.unk() + self.eos = tgt_dict.eos() + self.vocab_size = len(tgt_dict) + self.eos_penalty = eos_penalty + self.max_iter = max_iter + self.max_ratio = max_ratio + self.beam_size = beam_size + self.reranking = reranking + self.decoding_format = decoding_format + self.retain_dropout = retain_dropout + self.retain_history = retain_history + self.adaptive = adaptive + self.models = models + + def generate_batched_itr( + self, + data_itr, + maxlen_a=None, + maxlen_b=None, + cuda=False, + timer=None, + prefix_size=0, + ): + """Iterate over a batched dataset and yield individual translations. + + Args: + maxlen_a/b: generate sequences of maximum length ax + b, + where x is the source sentence length. + cuda: use GPU for generation + timer: StopwatchMeter for timing generations. + """ + + for sample in data_itr: + if "net_input" not in sample: + continue + if timer is not None: + timer.start() + with torch.no_grad(): + hypos = self.generate( + self.models, + sample, + prefix_tokens=sample["target"][:, :prefix_size] + if prefix_size > 0 + else None, + ) + if timer is not None: + timer.stop(sample["ntokens"]) + for i, id in enumerate(sample["id"]): + # remove padding + src = utils.strip_pad(sample["net_input"]["src_tokens"][i, :], self.pad) + ref = utils.strip_pad(sample["target"][i, :], self.pad) + yield id, src, ref, hypos[i] + + @torch.no_grad() + def generate(self, models, sample, prefix_tokens=None, constraints=None): + if constraints is not None: + raise NotImplementedError( + "Constrained decoding with the IterativeRefinementGenerator is not supported" + ) + + # TODO: iterative refinement generator does not support ensemble for now. + if not self.retain_dropout: + for model in models: + model.eval() + + model, reranker = models[0], None + if self.reranking: + assert len(models) > 1, "Assuming the last checkpoint is the reranker" + assert ( + self.beam_size > 1 + ), "Reranking requires multiple translation for each example" + + reranker = models[-1] + models = models[:-1] + + if len(models) > 1 and hasattr(model, "enable_ensemble"): + assert model.allow_ensemble, "{} does not support ensembling".format( + model.__class__.__name__ + ) + model.enable_ensemble(models) + + # TODO: better encoder inputs? 
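+        # High-level flow of the loop below: encode the source once, let the
+        # model propose initial output tokens, then repeatedly call
+        # forward_decoder; in adaptive mode a hypothesis is finalized as soon
+        # as it stops changing, otherwise when max_iter is reached.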
+ src_tokens = sample["net_input"]["src_tokens"] + src_lengths = sample["net_input"]["src_lengths"] + bsz, src_len = src_tokens.size() + + # initialize + encoder_out = model.forward_encoder([src_tokens, src_lengths]) + prev_decoder_out = model.initialize_output_tokens(encoder_out, src_tokens) + + if self.beam_size > 1: + assert ( + model.allow_length_beam + ), "{} does not support decoding with length beam.".format( + model.__class__.__name__ + ) + + # regenerate data based on length-beam + length_beam_order = ( + utils.new_arange(src_tokens, self.beam_size, bsz).t().reshape(-1) + ) + encoder_out = model.encoder.reorder_encoder_out( + encoder_out, length_beam_order + ) + prev_decoder_out = model.regenerate_length_beam( + prev_decoder_out, self.beam_size + ) + bsz = bsz * self.beam_size + + sent_idxs = torch.arange(bsz) + prev_output_tokens = prev_decoder_out.output_tokens.clone() + + if self.retain_history: + prev_decoder_out = prev_decoder_out._replace(history=[prev_output_tokens]) + + finalized = [[] for _ in range(bsz)] + + def is_a_loop(x, y, s, a): + b, l_x, l_y = x.size(0), x.size(1), y.size(1) + if l_x > l_y: + y = torch.cat([y, x.new_zeros(b, l_x - l_y).fill_(self.pad)], 1) + s = torch.cat([s, s.new_zeros(b, l_x - l_y)], 1) + if a is not None: + a = torch.cat([a, a.new_zeros(b, l_x - l_y, a.size(2))], 1) + elif l_x < l_y: + x = torch.cat([x, y.new_zeros(b, l_y - l_x).fill_(self.pad)], 1) + return (x == y).all(1), y, s, a + + def finalized_hypos(step, prev_out_token, prev_out_score, prev_out_attn): + cutoff = prev_out_token.ne(self.pad) + tokens = prev_out_token[cutoff] + if prev_out_score is None: + scores, score = None, None + else: + scores = prev_out_score[cutoff] + score = scores.mean() + + if prev_out_attn is None: + hypo_attn, alignment = None, None + else: + hypo_attn = prev_out_attn[cutoff] + alignment = hypo_attn.max(dim=1)[1] + return { + "steps": step, + "tokens": tokens, + "positional_scores": scores, + "score": score, + "hypo_attn": hypo_attn, + "alignment": alignment, + } + + for step in range(self.max_iter + 1): + + decoder_options = { + "eos_penalty": self.eos_penalty, + "max_ratio": self.max_ratio, + "decoding_format": self.decoding_format, + } + prev_decoder_out = prev_decoder_out._replace( + step=step, + max_step=self.max_iter + 1, + ) + + decoder_out = model.forward_decoder( + prev_decoder_out, encoder_out, **decoder_options + ) + + if self.adaptive: + # terminate if there is a loop + terminated, out_tokens, out_scores, out_attn = is_a_loop( + prev_output_tokens, + decoder_out.output_tokens, + decoder_out.output_scores, + decoder_out.attn, + ) + decoder_out = decoder_out._replace( + output_tokens=out_tokens, + output_scores=out_scores, + attn=out_attn, + ) + + else: + terminated = decoder_out.output_tokens.new_zeros( + decoder_out.output_tokens.size(0) + ).bool() + + if step == self.max_iter: # reach last iteration, terminate + terminated.fill_(1) + + # collect finalized sentences + finalized_idxs = sent_idxs[terminated] + finalized_tokens = decoder_out.output_tokens[terminated] + finalized_scores = decoder_out.output_scores[terminated] + finalized_attn = ( + None + if (decoder_out.attn is None or decoder_out.attn.size(0) == 0) + else decoder_out.attn[terminated] + ) + + if self.retain_history: + finalized_history_tokens = [h[terminated] for h in decoder_out.history] + + for i in range(finalized_idxs.size(0)): + finalized[finalized_idxs[i]] = [ + finalized_hypos( + step, + finalized_tokens[i], + finalized_scores[i], + None if finalized_attn is None else 
finalized_attn[i], + ) + ] + + if self.retain_history: + finalized[finalized_idxs[i]][0]["history"] = [] + for j in range(len(finalized_history_tokens)): + finalized[finalized_idxs[i]][0]["history"].append( + finalized_hypos( + step, finalized_history_tokens[j][i], None, None + ) + ) + + # check if all terminated + if terminated.sum() == terminated.size(0): + break + + # for next step + not_terminated = ~terminated + prev_decoder_out = decoder_out._replace( + output_tokens=decoder_out.output_tokens[not_terminated], + output_scores=decoder_out.output_scores[not_terminated], + attn=decoder_out.attn[not_terminated] + if (decoder_out.attn is not None and decoder_out.attn.size(0) > 0) + else None, + history=[h[not_terminated] for h in decoder_out.history] + if decoder_out.history is not None + else None, + ) + encoder_out = model.encoder.reorder_encoder_out( + encoder_out, not_terminated.nonzero(as_tuple=False).squeeze() + ) + sent_idxs = sent_idxs[not_terminated] + prev_output_tokens = prev_decoder_out.output_tokens.clone() + + if self.beam_size > 1: + if reranker is not None: + finalized = self.rerank( + reranker, finalized, [src_tokens, src_lengths], self.beam_size + ) + + # aggregate information from length beam + finalized = [ + finalized[ + np.argmax( + [ + finalized[self.beam_size * i + j][0]["score"] + for j in range(self.beam_size) + ] + ) + + self.beam_size * i + ] + for i in range(len(finalized) // self.beam_size) + ] + + return finalized + + def rerank(self, reranker, finalized, encoder_input, beam_size): + def rebuild_batch(finalized): + finalized_tokens = [f[0]["tokens"] for f in finalized] + finalized_maxlen = max(f.size(0) for f in finalized_tokens) + final_output_tokens = ( + finalized_tokens[0] + .new_zeros(len(finalized_tokens), finalized_maxlen) + .fill_(self.pad) + ) + for i, f in enumerate(finalized_tokens): + final_output_tokens[i, : f.size(0)] = f + return final_output_tokens + + final_output_tokens = rebuild_batch(finalized) + final_output_tokens[ + :, 0 + ] = self.eos # autoregressive model assumes starting with EOS + + reranker_encoder_out = reranker.encoder(*encoder_input) + length_beam_order = ( + utils.new_arange( + final_output_tokens, beam_size, reranker_encoder_out.encoder_out.size(1) + ) + .t() + .reshape(-1) + ) + reranker_encoder_out = reranker.encoder.reorder_encoder_out( + reranker_encoder_out, length_beam_order + ) + reranking_scores = reranker.get_normalized_probs( + reranker.decoder(final_output_tokens[:, :-1], reranker_encoder_out), + True, + None, + ) + reranking_scores = reranking_scores.gather(2, final_output_tokens[:, 1:, None]) + reranking_masks = final_output_tokens[:, 1:].ne(self.pad) + reranking_scores = ( + reranking_scores[:, :, 0].masked_fill_(~reranking_masks, 0).sum(1) + ) + reranking_scores = reranking_scores / reranking_masks.sum(1).type_as( + reranking_scores + ) + + for i in range(len(finalized)): + finalized[i][0]["score"] = reranking_scores[i] + + return finalized diff --git a/SpeechT5/fairseq/fairseq/logging/__init__.py b/SpeechT5/fairseq/fairseq/logging/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 diff --git a/SpeechT5/fairseq/fairseq/logging/meters.py b/SpeechT5/fairseq/fairseq/logging/meters.py new file mode 100644 index 0000000000000000000000000000000000000000..2100b1fa0b2704b1c585f59e9349655bba0cc9e6 --- /dev/null +++ b/SpeechT5/fairseq/fairseq/logging/meters.py @@ -0,0 +1,323 @@ +# Copyright (c) Facebook, Inc. and its affiliates. 
+# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +import bisect +import time +from collections import OrderedDict +from typing import Dict, Optional + + +try: + import torch + + def type_as(a, b): + if torch.is_tensor(a) and torch.is_tensor(b): + return a.to(b) + else: + return a + + +except ImportError: + torch = None + + def type_as(a, b): + return a + + +try: + import numpy as np +except ImportError: + np = None + + +class Meter(object): + """Base class for Meters.""" + + def __init__(self): + pass + + def state_dict(self): + return {} + + def load_state_dict(self, state_dict): + pass + + def reset(self): + raise NotImplementedError + + @property + def smoothed_value(self) -> float: + """Smoothed value used for logging.""" + raise NotImplementedError + + +def safe_round(number, ndigits): + if hasattr(number, "__round__"): + return round(number, ndigits) + elif torch is not None and torch.is_tensor(number) and number.numel() == 1: + return safe_round(number.item(), ndigits) + elif np is not None and np.ndim(number) == 0 and hasattr(number, "item"): + return safe_round(number.item(), ndigits) + else: + return number + + +class AverageMeter(Meter): + """Computes and stores the average and current value""" + + def __init__(self, round: Optional[int] = None): + self.round = round + self.reset() + + def reset(self): + self.val = None # most recent update + self.sum = 0 # sum from all updates + self.count = 0 # total n from all updates + + def update(self, val, n=1): + if val is not None: + self.val = val + if n > 0: + self.sum = type_as(self.sum, val) + (val * n) + self.count = type_as(self.count, n) + n + + def state_dict(self): + return { + "val": self.val, + "sum": self.sum, + "count": self.count, + "round": self.round, + } + + def load_state_dict(self, state_dict): + self.val = state_dict["val"] + self.sum = state_dict["sum"] + self.count = state_dict["count"] + self.round = state_dict.get("round", None) + + @property + def avg(self): + return self.sum / self.count if self.count > 0 else self.val + + @property + def smoothed_value(self) -> float: + val = self.avg + if self.round is not None and val is not None: + val = safe_round(val, self.round) + return val + + +class SumMeter(Meter): + """Computes and stores the sum""" + + def __init__(self, round: Optional[int] = None): + self.round = round + self.reset() + + def reset(self): + self.sum = 0 # sum from all updates + + def update(self, val): + if val is not None: + self.sum = type_as(self.sum, val) + val + + def state_dict(self): + return { + "sum": self.sum, + "round": self.round, + } + + def load_state_dict(self, state_dict): + self.sum = state_dict["sum"] + self.round = state_dict.get("round", None) + + @property + def smoothed_value(self) -> float: + val = self.sum + if self.round is not None and val is not None: + val = safe_round(val, self.round) + return val + + +class TimeMeter(Meter): + """Computes the average occurrence of some event per second""" + + def __init__( + self, + init: int = 0, + n: int = 0, + round: Optional[int] = None, + ): + self.round = round + self.reset(init, n) + + def reset(self, init=0, n=0): + self.init = init + self.start = time.perf_counter() + self.n = n + self.i = 0 + + def update(self, val=1): + self.n = type_as(self.n, val) + val + self.i += 1 + + def state_dict(self): + return { + "init": self.elapsed_time, + "n": self.n, + "round": self.round, + } + + def load_state_dict(self, state_dict): + if "start" in 
state_dict: + # backwards compatibility for old state_dicts + self.reset(init=state_dict["init"]) + else: + self.reset(init=state_dict["init"], n=state_dict["n"]) + self.round = state_dict.get("round", None) + + @property + def avg(self): + return self.n / self.elapsed_time + + @property + def elapsed_time(self): + return self.init + (time.perf_counter() - self.start) + + @property + def smoothed_value(self) -> float: + val = self.avg + if self.round is not None and val is not None: + val = safe_round(val, self.round) + return val + + +class StopwatchMeter(Meter): + """Computes the sum/avg duration of some event in seconds""" + + def __init__(self, round: Optional[int] = None): + self.round = round + self.sum = 0 + self.n = 0 + self.start_time = None + + def start(self): + self.start_time = time.perf_counter() + + def stop(self, n=1, prehook=None): + if self.start_time is not None: + if prehook is not None: + prehook() + delta = time.perf_counter() - self.start_time + self.sum = self.sum + delta + self.n = type_as(self.n, n) + n + + def reset(self): + self.sum = 0 # cumulative time during which stopwatch was active + self.n = 0 # total n across all start/stop + self.start() + + def state_dict(self): + return { + "sum": self.sum, + "n": self.n, + "round": self.round, + } + + def load_state_dict(self, state_dict): + self.sum = state_dict["sum"] + self.n = state_dict["n"] + self.start_time = None + self.round = state_dict.get("round", None) + + @property + def avg(self): + return self.sum / self.n if self.n > 0 else self.sum + + @property + def elapsed_time(self): + if self.start_time is None: + return 0.0 + return time.perf_counter() - self.start_time + + @property + def smoothed_value(self) -> float: + val = self.avg if self.sum > 0 else self.elapsed_time + if self.round is not None and val is not None: + val = safe_round(val, self.round) + return val + + +class MetersDict(OrderedDict): + """A sorted dictionary of :class:`Meters`. + + Meters are sorted according to a priority that is given when the + meter is first added to the dictionary. 
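+
+    Example (sketch)::
+
+        meters = MetersDict()
+        meters.add_meter("loss", AverageMeter(round=3), priority=10)
+        meters["loss"].update(0.5)
+        meters.get_smoothed_values()  # OrderedDict([('loss', 0.5)])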
+ """ + + def __init__(self, *args, **kwargs): + super().__init__(*args, **kwargs) + self.priorities = [] + + def __setitem__(self, key, value): + assert key not in self, "MetersDict doesn't support reassignment" + priority, value = value + bisect.insort(self.priorities, (priority, len(self.priorities), key)) + super().__setitem__(key, value) + for _, _, key in self.priorities: # reorder dict to match priorities + self.move_to_end(key) + + def add_meter(self, key, meter, priority): + self.__setitem__(key, (priority, meter)) + + def state_dict(self): + return [ + (pri, key, self[key].__class__.__name__, self[key].state_dict()) + for pri, _, key in self.priorities + # can't serialize DerivedMeter instances + if not isinstance(self[key], MetersDict._DerivedMeter) + ] + + def load_state_dict(self, state_dict): + self.clear() + self.priorities.clear() + for pri, key, meter_cls, meter_state in state_dict: + meter = globals()[meter_cls]() + meter.load_state_dict(meter_state) + self.add_meter(key, meter, pri) + + def get_smoothed_value(self, key: str) -> float: + """Get a single smoothed value.""" + meter = self[key] + if isinstance(meter, MetersDict._DerivedMeter): + return meter.fn(self) + else: + return meter.smoothed_value + + def get_smoothed_values(self) -> Dict[str, float]: + """Get all smoothed values.""" + return OrderedDict( + [ + (key, self.get_smoothed_value(key)) + for key in self.keys() + if not key.startswith("_") + ] + ) + + def reset(self): + """Reset Meter instances.""" + for meter in self.values(): + if isinstance(meter, MetersDict._DerivedMeter): + continue + meter.reset() + + class _DerivedMeter(Meter): + """A Meter whose values are derived from other Meters.""" + + def __init__(self, fn): + self.fn = fn + + def reset(self): + pass diff --git a/SpeechT5/fairseq/fairseq/logging/metrics.py b/SpeechT5/fairseq/fairseq/logging/metrics.py new file mode 100644 index 0000000000000000000000000000000000000000..58c2fb64e186ed9d5e9a06c73194d98a21bb7560 --- /dev/null +++ b/SpeechT5/fairseq/fairseq/logging/metrics.py @@ -0,0 +1,314 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. +""" +A standalone module for aggregating metrics. + +Metrics can be logged from anywhere using the `log_*` functions defined +in this module. The logged values will be aggregated dynamically based +on the aggregation context in which the logging occurs. See the +:func:`aggregate` context manager for more details. +""" + +import contextlib +import uuid +from collections import defaultdict +from typing import Callable, List, Optional + +from .meters import * + + +# Aggregation contexts are considered "active" when inside the scope +# created by the :func:`aggregate` context manager. +_aggregators = OrderedDict() +_active_aggregators = OrderedDict() +_active_aggregators_cnt = defaultdict(lambda: 0) + + +def reset() -> None: + """Reset all metrics aggregators.""" + _aggregators.clear() + _active_aggregators.clear() + _active_aggregators_cnt.clear() + + # The "default" aggregator observes all logged values. + _aggregators["default"] = MetersDict() + _active_aggregators["default"] = _aggregators["default"] + _active_aggregators_cnt["default"] = 1 + + +reset() + + +@contextlib.contextmanager +def aggregate(name: Optional[str] = None, new_root: bool = False): + """Context manager to aggregate metrics under a given name. + + Aggregations can be nested. 
If *new_root* is ``False``, then logged + metrics will be recorded along the entire stack of nested + aggregators, including a global "default" aggregator. If *new_root* + is ``True``, then this aggregator will be the root of a new + aggregation stack, thus bypassing any parent aggregators. + + Note that aggregation contexts are uniquely identified by their + *name* (e.g., train, valid). Creating a context with an existing + name will reuse the corresponding :class:`MetersDict` instance. + If no name is given, then a temporary aggregator will be created. + + Usage:: + + with metrics.aggregate("train"): + for step, batch in enumerate(epoch): + with metrics.aggregate("train_inner") as agg: + metrics.log_scalar("loss", get_loss(batch)) + if step % log_interval == 0: + print(agg.get_smoothed_value("loss")) + agg.reset() + print(metrics.get_smoothed_values("train")["loss"]) + + Args: + name (str): name of the aggregation. Defaults to a + random/temporary name if not given explicitly. + new_root (bool): make this aggregation the root of a new + aggregation stack. + """ + if name is None: + # generate a temporary name + name = str(uuid.uuid4()) + assert name not in _aggregators + agg = MetersDict() + else: + assert name != "default" + agg = _aggregators.setdefault(name, MetersDict()) + + if new_root: + backup_aggregators = _active_aggregators.copy() + _active_aggregators.clear() + backup_aggregators_cnt = _active_aggregators_cnt.copy() + _active_aggregators_cnt.clear() + + _active_aggregators[name] = agg + _active_aggregators_cnt[name] += 1 + + yield agg + + _active_aggregators_cnt[name] -= 1 + if _active_aggregators_cnt[name] == 0 and name in _active_aggregators: + del _active_aggregators[name] + + if new_root: + _active_aggregators.clear() + _active_aggregators.update(backup_aggregators) + _active_aggregators_cnt.clear() + _active_aggregators_cnt.update(backup_aggregators_cnt) + + +def get_active_aggregators() -> List[MetersDict]: + return list(_active_aggregators.values()) + + +def log_scalar( + key: str, + value: float, + weight: float = 1, + priority: int = 10, + round: Optional[int] = None, +): + """Log a scalar value. + + Args: + key (str): name of the field to log + value (float): value to log + weight (float): weight that this value contributes to the average. + A weight of 0 will always log the latest value. + priority (int): smaller values are logged earlier in the output + round (Optional[int]): number of digits to round to when displaying + """ + for agg in get_active_aggregators(): + if key not in agg: + agg.add_meter(key, AverageMeter(round=round), priority) + agg[key].update(value, weight) + +def log_scalar_sum( + key: str, + value: float, + priority: int = 10, + round: Optional[int] = None, +): + """Log a scalar value that is summed for reporting. + + Args: + key (str): name of the field to log + value (float): value to log + priority (int): smaller values are logged earlier in the output + round (Optional[int]): number of digits to round to when displaying + """ + for agg in get_active_aggregators(): + if key not in agg: + agg.add_meter(key, SumMeter(round=round), priority) + agg[key].update(value) + + +def log_derived(key: str, fn: Callable[[MetersDict], float], priority: int = 20): + """Log a scalar value derived from other meters. 
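A minimal usage sketch combining `log_scalar` with `log_derived` under a named aggregation; the keys `loss` and `ppl` and the base-2 perplexity formula are illustrative, and it assumes the `fairseq.logging` package in this diff is on the import path.

```python
from fairseq.logging import metrics

with metrics.aggregate("valid") as agg:
    for loss in (2.0, 1.5, 1.0):
        metrics.log_scalar("loss", loss, weight=1, round=3)
    # "ppl" is recomputed on demand from the aggregated "loss" meter.
    metrics.log_derived("ppl", lambda meters: 2 ** meters["loss"].avg)
    print(agg.get_smoothed_values())   # e.g. {'loss': 1.5, 'ppl': ~2.83}
```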
+ + Args: + key (str): name of the field to log + fn (Callable[[MetersDict], float]): function that takes a single + argument *meters* and returns the derived value + priority (int): smaller values are logged earlier in the output + """ + for agg in get_active_aggregators(): + if key not in agg: + agg.add_meter(key, MetersDict._DerivedMeter(fn), priority) + + +def log_speed( + key: str, + value: float, + priority: int = 30, + round: Optional[int] = None, +): + """Log the rate of some quantity per second. + + Args: + key (str): name of the field to log + value (float): value to log + priority (int): smaller values are logged earlier in the output + round (Optional[int]): number of digits to round to when displaying + """ + for agg in get_active_aggregators(): + if key not in agg: + agg.add_meter(key, TimeMeter(round=round), priority) + agg[key].reset() # reset meter on the first call + else: + agg[key].update(value) + + +def log_start_time(key: str, priority: int = 40, round: Optional[int] = None): + """Log the duration of some event in seconds. + + The duration will be computed once :func:`log_stop_time` is called. + + Args: + key (str): name of the field to log + priority (int): smaller values are logged earlier in the output + round (Optional[int]): number of digits to round to when displaying + """ + for agg in get_active_aggregators(): + if key not in agg: + agg.add_meter(key, StopwatchMeter(round=round), priority) + agg[key].start() + + +def log_stop_time(key: str, weight: float = 0.0, prehook=None): + """Log the duration of some event in seconds. + + The duration will be computed since :func:`log_start_time` was called. + Set weight > 0 to report the average time instead of the sum. + + Args: + key (str): name of the field to log + weight (float): weight that this time contributes to the average + prehook (function, no arguments): will be called before the timer + is stopped. For example, use prehook=torch.cuda.synchronize to + make sure all gpu operations are done before timer is stopped. + """ + for agg in get_active_aggregators(): + if key in agg: + agg[key].stop(weight, prehook) + + +def log_custom( + new_meter_fn: Callable[[], Meter], + key: str, + *args, + priority: int = 50, + **kwargs, +): + """Log using a custom Meter. + + Any extra *args* or *kwargs* will be passed through to the Meter's + *update* method. + + Args: + new_meter_fn (Callable[[], Meter]): function that returns a new + Meter instance + key (str): name of the field to log + priority (int): smaller values are logged earlier in the output + """ + for agg in get_active_aggregators(): + if key not in agg: + agg.add_meter(key, new_meter_fn(), priority) + agg[key].update(*args, **kwargs) + + +def reset_meter(name: str, key: str) -> None: + """Reset Meter instance aggregated under a given *name* and *key*.""" + meter = get_meter(name, key) + if meter is not None: + meter.reset() + + +def reset_meters(name: str) -> None: + """Reset Meter instances aggregated under a given *name*.""" + meters = get_meters(name) + if meters is not None: + meters.reset() + + +def get_meter(name: str, key: str) -> Meter: + """Get a single Meter instance aggregated under *name* and *key*. + + Returns: + Meter or None if no metrics have been logged under *name* and *key*. + """ + if name not in _aggregators: + return None + return _aggregators[name].get(key, None) + + +def get_meters(name: str) -> MetersDict: + """Get Meter instances aggregated under a given *name*. 
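The lookup helpers here pair naturally with the timing helpers above; a short sketch, assuming `fairseq.logging` is importable and using the illustrative key `step_time`.

```python
import time
from fairseq.logging import metrics

with metrics.aggregate("train"):
    metrics.log_start_time("step_time", priority=40, round=2)
    time.sleep(0.1)                      # stand-in for one training step
    metrics.log_stop_time("step_time")   # default weight=0 reports the summed time

# Meters logged under "train" stay addressable by name afterwards.
meter = metrics.get_meter("train", "step_time")
print(meter.sum)                             # accumulated seconds
print(metrics.get_smoothed_values("train"))  # {'step_time': ...}
```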
+ + Returns: + MetersDict or None if no metrics have been logged under *name*. + """ + return _aggregators.get(name, None) + + +def get_smoothed_value(name: str, key: str) -> float: + """Get a single smoothed value. + + Raises: + KeyError: if no metrics have been logged under *name* and *key*. + """ + return _aggregators[name].get_smoothed_value(key) + + +def get_smoothed_values(name: str) -> Dict[str, float]: + """Get smoothed values aggregated under a given *name*. + + Raises: + KeyError: if no metrics have been logged under *name*. + """ + return _aggregators[name].get_smoothed_values() + + +def state_dict(): + return OrderedDict([(name, agg.state_dict()) for name, agg in _aggregators.items()]) + + +def load_state_dict(state_dict): + for name, agg_state in state_dict.items(): + _aggregators[name] = MetersDict() + _aggregators[name].load_state_dict(agg_state) + + +def xla_metrics_report(): + try: + import torch_xla.debug.metrics as met + print(met.metrics_report()) + except ImportError: + return diff --git a/SpeechT5/fairseq/fairseq/logging/progress_bar.py b/SpeechT5/fairseq/fairseq/logging/progress_bar.py new file mode 100644 index 0000000000000000000000000000000000000000..061082caefe542c5f0f87e04d9472583874126a3 --- /dev/null +++ b/SpeechT5/fairseq/fairseq/logging/progress_bar.py @@ -0,0 +1,490 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +""" +Wrapper around various loggers and progress bars (e.g., tqdm). +""" + +import atexit +import json +import logging +import os +import sys +from collections import OrderedDict +from contextlib import contextmanager +from numbers import Number +from typing import Optional + +import torch + +from .meters import AverageMeter, StopwatchMeter, TimeMeter + + +logger = logging.getLogger(__name__) + + +def progress_bar( + iterator, + log_format: Optional[str] = None, + log_interval: int = 100, + log_file: Optional[str] = None, + epoch: Optional[int] = None, + prefix: Optional[str] = None, + tensorboard_logdir: Optional[str] = None, + default_log_format: str = "tqdm", + wandb_project: Optional[str] = None, + wandb_run_name: Optional[str] = None, + azureml_logging: Optional[bool] = False, +): + if log_format is None: + log_format = default_log_format + if log_file is not None: + handler = logging.FileHandler(filename=log_file) + logger.addHandler(handler) + + if log_format == "tqdm" and not sys.stderr.isatty(): + log_format = "simple" + + if log_format == "json": + bar = JsonProgressBar(iterator, epoch, prefix, log_interval) + elif log_format == "none": + bar = NoopProgressBar(iterator, epoch, prefix) + elif log_format == "simple": + bar = SimpleProgressBar(iterator, epoch, prefix, log_interval) + elif log_format == "tqdm": + bar = TqdmProgressBar(iterator, epoch, prefix) + else: + raise ValueError("Unknown log format: {}".format(log_format)) + + if tensorboard_logdir: + try: + # [FB only] custom wrapper for TensorBoard + import palaas # noqa + from .fb_tbmf_wrapper import FbTbmfWrapper + + bar = FbTbmfWrapper(bar, log_interval) + except ImportError: + bar = TensorboardProgressBarWrapper(bar, tensorboard_logdir) + + if wandb_project: + bar = WandBProgressBarWrapper(bar, wandb_project, run_name=wandb_run_name) + + if azureml_logging: + bar = AzureMLProgressBarWrapper(bar) + + return bar + + +def build_progress_bar( + args, + iterator, + epoch: Optional[int] = None, + prefix: Optional[str] = None, + default: str = 
"tqdm", + no_progress_bar: str = "none", +): + """Legacy wrapper that takes an argparse.Namespace.""" + if getattr(args, "no_progress_bar", False): + default = no_progress_bar + if getattr(args, "distributed_rank", 0) == 0: + tensorboard_logdir = getattr(args, "tensorboard_logdir", None) + else: + tensorboard_logdir = None + return progress_bar( + iterator, + log_format=args.log_format, + log_interval=args.log_interval, + epoch=epoch, + prefix=prefix, + tensorboard_logdir=tensorboard_logdir, + default_log_format=default, + ) + + +def format_stat(stat): + if isinstance(stat, Number): + stat = "{:g}".format(stat) + elif isinstance(stat, AverageMeter): + stat = "{:.3f}".format(stat.avg) + elif isinstance(stat, TimeMeter): + stat = "{:g}".format(round(stat.avg)) + elif isinstance(stat, StopwatchMeter): + stat = "{:g}".format(round(stat.sum)) + elif torch.is_tensor(stat): + stat = stat.tolist() + return stat + + +class BaseProgressBar(object): + """Abstract class for progress bars.""" + + def __init__(self, iterable, epoch=None, prefix=None): + self.iterable = iterable + self.n = getattr(iterable, "n", 0) + self.epoch = epoch + self.prefix = "" + if epoch is not None: + self.prefix += "epoch {:03d}".format(epoch) + if prefix is not None: + self.prefix += (" | " if self.prefix != "" else "") + prefix + + def __len__(self): + return len(self.iterable) + + def __enter__(self): + return self + + def __exit__(self, *exc): + return False + + def __iter__(self): + raise NotImplementedError + + def log(self, stats, tag=None, step=None): + """Log intermediate stats according to log_interval.""" + raise NotImplementedError + + def print(self, stats, tag=None, step=None): + """Print end-of-epoch stats.""" + raise NotImplementedError + + def update_config(self, config): + """Log latest configuration.""" + pass + + def _str_commas(self, stats): + return ", ".join(key + "=" + stats[key].strip() for key in stats.keys()) + + def _str_pipes(self, stats): + return " | ".join(key + " " + stats[key].strip() for key in stats.keys()) + + def _format_stats(self, stats): + postfix = OrderedDict(stats) + # Preprocess stats according to datatype + for key in postfix.keys(): + postfix[key] = str(format_stat(postfix[key])) + return postfix + + +@contextmanager +def rename_logger(logger, new_name): + old_name = logger.name + if new_name is not None: + logger.name = new_name + yield logger + logger.name = old_name + + +class JsonProgressBar(BaseProgressBar): + """Log output in JSON format.""" + + def __init__(self, iterable, epoch=None, prefix=None, log_interval=1000): + super().__init__(iterable, epoch, prefix) + self.log_interval = log_interval + self.i = None + self.size = None + + def __iter__(self): + self.size = len(self.iterable) + for i, obj in enumerate(self.iterable, start=self.n): + self.i = i + yield obj + + def log(self, stats, tag=None, step=None): + """Log intermediate stats according to log_interval.""" + step = step or self.i or 0 + if step > 0 and self.log_interval is not None and step % self.log_interval == 0: + update = ( + self.epoch - 1 + (self.i + 1) / float(self.size) + if self.epoch is not None + else None + ) + stats = self._format_stats(stats, epoch=self.epoch, update=update) + with rename_logger(logger, tag): + logger.info(json.dumps(stats)) + + def print(self, stats, tag=None, step=None): + """Print end-of-epoch stats.""" + self.stats = stats + if tag is not None: + self.stats = OrderedDict( + [(tag + "_" + k, v) for k, v in self.stats.items()] + ) + stats = self._format_stats(self.stats, 
epoch=self.epoch) + with rename_logger(logger, tag): + logger.info(json.dumps(stats)) + + def _format_stats(self, stats, epoch=None, update=None): + postfix = OrderedDict() + if epoch is not None: + postfix["epoch"] = epoch + if update is not None: + postfix["update"] = round(update, 3) + # Preprocess stats according to datatype + for key in stats.keys(): + postfix[key] = format_stat(stats[key]) + return postfix + + +class NoopProgressBar(BaseProgressBar): + """No logging.""" + + def __init__(self, iterable, epoch=None, prefix=None): + super().__init__(iterable, epoch, prefix) + + def __iter__(self): + for obj in self.iterable: + yield obj + + def log(self, stats, tag=None, step=None): + """Log intermediate stats according to log_interval.""" + pass + + def print(self, stats, tag=None, step=None): + """Print end-of-epoch stats.""" + pass + + +class SimpleProgressBar(BaseProgressBar): + """A minimal logger for non-TTY environments.""" + + def __init__(self, iterable, epoch=None, prefix=None, log_interval=1000): + super().__init__(iterable, epoch, prefix) + self.log_interval = log_interval + self.i = None + self.size = None + + def __iter__(self): + self.size = len(self.iterable) + for i, obj in enumerate(self.iterable, start=self.n): + self.i = i + yield obj + + def log(self, stats, tag=None, step=None): + """Log intermediate stats according to log_interval.""" + step = step or self.i or 0 + if step > 0 and self.log_interval is not None and step % self.log_interval == 0: + stats = self._format_stats(stats) + postfix = self._str_commas(stats) + with rename_logger(logger, tag): + logger.info( + "{}: {:5d} / {:d} {}".format( + self.prefix, self.i + 1, self.size, postfix + ) + ) + + def print(self, stats, tag=None, step=None): + """Print end-of-epoch stats.""" + postfix = self._str_pipes(self._format_stats(stats)) + with rename_logger(logger, tag): + logger.info("{} | {}".format(self.prefix, postfix)) + + +class TqdmProgressBar(BaseProgressBar): + """Log to tqdm.""" + + def __init__(self, iterable, epoch=None, prefix=None): + super().__init__(iterable, epoch, prefix) + from tqdm import tqdm + + self.tqdm = tqdm( + iterable, + self.prefix, + leave=False, + disable=(logger.getEffectiveLevel() > logging.INFO), + ) + + def __iter__(self): + return iter(self.tqdm) + + def log(self, stats, tag=None, step=None): + """Log intermediate stats according to log_interval.""" + self.tqdm.set_postfix(self._format_stats(stats), refresh=False) + + def print(self, stats, tag=None, step=None): + """Print end-of-epoch stats.""" + postfix = self._str_pipes(self._format_stats(stats)) + with rename_logger(logger, tag): + logger.info("{} | {}".format(self.prefix, postfix)) + + +try: + _tensorboard_writers = {} + from torch.utils.tensorboard import SummaryWriter +except ImportError: + try: + from tensorboardX import SummaryWriter + except ImportError: + SummaryWriter = None + + +def _close_writers(): + for w in _tensorboard_writers.values(): + w.close() + + +atexit.register(_close_writers) + + +class TensorboardProgressBarWrapper(BaseProgressBar): + """Log to tensorboard.""" + + def __init__(self, wrapped_bar, tensorboard_logdir): + self.wrapped_bar = wrapped_bar + self.tensorboard_logdir = tensorboard_logdir + + if SummaryWriter is None: + logger.warning( + "tensorboard not found, please install with: pip install tensorboard" + ) + + def _writer(self, key): + if SummaryWriter is None: + return None + _writers = _tensorboard_writers + if key not in _writers: + _writers[key] = 
SummaryWriter(os.path.join(self.tensorboard_logdir, key)) + _writers[key].add_text("sys.argv", " ".join(sys.argv)) + return _writers[key] + + def __iter__(self): + return iter(self.wrapped_bar) + + def log(self, stats, tag=None, step=None): + """Log intermediate stats to tensorboard.""" + self._log_to_tensorboard(stats, tag, step) + self.wrapped_bar.log(stats, tag=tag, step=step) + + def print(self, stats, tag=None, step=None): + """Print end-of-epoch stats.""" + self._log_to_tensorboard(stats, tag, step) + self.wrapped_bar.print(stats, tag=tag, step=step) + + def update_config(self, config): + """Log latest configuration.""" + # TODO add hparams to Tensorboard + self.wrapped_bar.update_config(config) + + def _log_to_tensorboard(self, stats, tag=None, step=None): + writer = self._writer(tag or "") + if writer is None: + return + if step is None: + step = stats["num_updates"] + for key in stats.keys() - {"num_updates"}: + if isinstance(stats[key], AverageMeter): + writer.add_scalar(key, stats[key].val, step) + elif isinstance(stats[key], Number): + writer.add_scalar(key, stats[key], step) + elif torch.is_tensor(stats[key]) and stats[key].numel() == 1: + writer.add_scalar(key, stats[key].item(), step) + writer.flush() + + +try: + import wandb +except ImportError: + wandb = None + + +class WandBProgressBarWrapper(BaseProgressBar): + """Log to Weights & Biases.""" + + def __init__(self, wrapped_bar, wandb_project, run_name=None): + self.wrapped_bar = wrapped_bar + if wandb is None: + logger.warning("wandb not found, pip install wandb") + return + + # reinit=False to ensure if wandb.init() is called multiple times + # within one process it still references the same run + wandb.init(project=wandb_project, reinit=False, name=run_name) + + def __iter__(self): + return iter(self.wrapped_bar) + + def log(self, stats, tag=None, step=None): + """Log intermediate stats to tensorboard.""" + self._log_to_wandb(stats, tag, step) + self.wrapped_bar.log(stats, tag=tag, step=step) + + def print(self, stats, tag=None, step=None): + """Print end-of-epoch stats.""" + self._log_to_wandb(stats, tag, step) + self.wrapped_bar.print(stats, tag=tag, step=step) + + def update_config(self, config): + """Log latest configuration.""" + if wandb is not None: + wandb.config.update(config) + self.wrapped_bar.update_config(config) + + def _log_to_wandb(self, stats, tag=None, step=None): + if wandb is None: + return + if step is None: + step = stats["num_updates"] + + prefix = "" if tag is None else tag + "/" + + for key in stats.keys() - {"num_updates"}: + if isinstance(stats[key], AverageMeter): + wandb.log({prefix + key: stats[key].val}, step=step) + elif isinstance(stats[key], Number): + wandb.log({prefix + key: stats[key]}, step=step) + + +try: + from azureml.core import Run +except ImportError: + Run = None + + +class AzureMLProgressBarWrapper(BaseProgressBar): + """Log to Azure ML""" + + def __init__(self, wrapped_bar): + self.wrapped_bar = wrapped_bar + if Run is None: + logger.warning("azureml.core not found, pip install azureml-core") + return + self.run = Run.get_context() + + def __exit__(self, *exc): + if Run is not None: + self.run.complete() + return False + + def __iter__(self): + return iter(self.wrapped_bar) + + def log(self, stats, tag=None, step=None): + """Log intermediate stats to AzureML""" + self._log_to_azureml(stats, tag, step) + self.wrapped_bar.log(stats, tag=tag, step=step) + + def print(self, stats, tag=None, step=None): + """Print end-of-epoch stats""" + self._log_to_azureml(stats, tag, step) + 
self.wrapped_bar.print(stats, tag=tag, step=step) + + def update_config(self, config): + """Log latest configuration.""" + self.wrapped_bar.update_config(config) + + def _log_to_azureml(self, stats, tag=None, step=None): + if Run is None: + return + if step is None: + step = stats["num_updates"] + + prefix = "" if tag is None else tag + "/" + + for key in stats.keys() - {"num_updates"}: + name = prefix + key + if isinstance(stats[key], AverageMeter): + self.run.log_row(name=name, **{"step": step, key: stats[key].val}) + elif isinstance(stats[key], Number): + self.run.log_row(name=name, **{"step": step, key: stats[key]}) diff --git a/SpeechT5/fairseq/fairseq/model_parallel/__init__.py b/SpeechT5/fairseq/fairseq/model_parallel/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..69f21684872f72ae8ee26d9ff7d2d2b6e6d526c3 --- /dev/null +++ b/SpeechT5/fairseq/fairseq/model_parallel/__init__.py @@ -0,0 +1,6 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +from . import criterions, models, modules # noqa diff --git a/SpeechT5/fairseq/fairseq/model_parallel/criterions/__init__.py b/SpeechT5/fairseq/fairseq/model_parallel/criterions/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..5fae7bd4c2cfa7b4f64ad62dd9b9082f59f0e50d --- /dev/null +++ b/SpeechT5/fairseq/fairseq/model_parallel/criterions/__init__.py @@ -0,0 +1,14 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +import importlib +import os + + +# automatically import any Python files in the criterions/ directory +for file in sorted(os.listdir(os.path.dirname(__file__))): + if file.endswith(".py") and not file.startswith("_"): + module = file[: file.find(".py")] + importlib.import_module("fairseq.model_parallel.criterions." + module) diff --git a/SpeechT5/fairseq/fairseq/model_parallel/criterions/vocab_parallel_cross_entropy.py b/SpeechT5/fairseq/fairseq/model_parallel/criterions/vocab_parallel_cross_entropy.py new file mode 100644 index 0000000000000000000000000000000000000000..35c50ee1521963c5cb6dfb7036ccf43401c6c6ac --- /dev/null +++ b/SpeechT5/fairseq/fairseq/model_parallel/criterions/vocab_parallel_cross_entropy.py @@ -0,0 +1,87 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +import math + +from fairseq import metrics, utils +from fairseq.criterions import FairseqCriterion, register_criterion + + +try: + from fairseq.model_parallel.megatron.mpu.cross_entropy import ( + vocab_parallel_cross_entropy, + ) + + has_megatron_submodule = True +except (ImportError, ModuleNotFoundError): + has_megatron_submodule = False + + +@register_criterion("vocab_parallel_cross_entropy") +class VocabParallelCrossEntropyCriterion(FairseqCriterion): + def __init__(self, task, sentence_avg): + super().__init__(task) + self.sentence_avg = sentence_avg + if not has_megatron_submodule: + raise ImportError( + "\n\nPlease install the megatron submodule:" + "\n\n git submodule update --init " + "fairseq/model_parallel/megatron" + ) + + def forward(self, model, sample, reduce=True): + """Compute the loss for the given sample. 
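For intuition, a plain-PyTorch sketch (no Megatron submodule needed) of the padding masking and token counting used in the body below; the shapes and `padding_idx` value are illustrative.

```python
import torch

padding_idx = 1
target = torch.tensor([[4, 5, 1, 1],      # two real tokens, two pads
                       [7, 8, 9, 1]])     # three real tokens, one pad
per_token_loss = torch.rand(2, 4)         # stand-in for the parallel CE output

# Padded positions contribute nothing to the summed loss.
loss = (per_token_loss * (target != padding_idx)).sum()
ntokens = int((target != padding_idx).sum())   # denominator when sentence_avg is False
print(loss, ntokens)                           # scalar loss over 5 real tokens
```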
+ + Returns a tuple with three elements: + 1) the loss + 2) the sample size, which is used as the denominator for the gradient + 3) logging outputs to display while training + """ + net_output = model(**sample["net_input"]) + target = sample["target"] + + loss = vocab_parallel_cross_entropy(net_output[0].float(), target) + loss = (loss * (target != self.padding_idx)).sum() + sample_size = ( + sample["target"].size(0) if self.sentence_avg else sample["ntokens"] + ) + logging_output = { + "loss": utils.item(loss.data) if reduce else loss.data, + "ntokens": sample["ntokens"], + "nsentences": sample["target"].size(0), + "sample_size": sample_size, + } + return loss, sample_size, logging_output + + @staticmethod + def reduce_metrics(logging_outputs) -> None: + """Aggregate logging outputs from data parallel training.""" + loss_sum = sum(log.get("loss", 0) for log in logging_outputs) + ntokens = sum(log.get("ntokens", 0) for log in logging_outputs) + sample_size = sum(log.get("sample_size", 0) for log in logging_outputs) + + metrics.log_scalar( + "loss", loss_sum / sample_size / math.log(2), sample_size, round=3 + ) + if sample_size != ntokens: + metrics.log_scalar( + "nll_loss", loss_sum / ntokens / math.log(2), ntokens, round=3 + ) + metrics.log_derived( + "ppl", lambda meters: utils.get_perplexity(meters["nll_loss"].avg) + ) + else: + metrics.log_derived( + "ppl", lambda meters: utils.get_perplexity(meters["loss"].avg) + ) + + @staticmethod + def logging_outputs_can_be_summed() -> bool: + """ + Whether the logging outputs returned by `forward` can be summed + across workers prior to calling `reduce_metrics`. Setting this + to True will improves distributed training speed. + """ + return True diff --git a/SpeechT5/fairseq/fairseq/model_parallel/megatron_trainer.py b/SpeechT5/fairseq/fairseq/model_parallel/megatron_trainer.py new file mode 100644 index 0000000000000000000000000000000000000000..8ab4657f73c6cda91e95637921edb84ccb76b3d0 --- /dev/null +++ b/SpeechT5/fairseq/fairseq/model_parallel/megatron_trainer.py @@ -0,0 +1,71 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +""" +Train a network across multiple GPUs. 
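A small sketch of the gradient-norm aggregation that `MegatronTrainer.clip_grad_norm` performs below: each model-parallel partition contributes its squared local norm, the squares are summed (the `all_reduce` step), and the square root gives the global norm. Plain tensors stand in for a real process group.

```python
import torch

local_norms = [torch.tensor(3.0), torch.tensor(4.0)]   # one per model-parallel rank
total = sum(n ** 2 for n in local_norms)               # what all_reduce would produce
global_norm = total ** 0.5
print(global_norm)   # tensor(5.), matching _aggregate_model_parallel_grad_norm
```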
+""" + +from fairseq.dataclass.configs import FairseqConfig +from fairseq.distributed import utils as distributed_utils +from fairseq.trainer import Trainer + +try: + from fairseq.model_parallel.megatron.mpu import ( + get_data_parallel_rank, + get_data_parallel_world_size, + get_model_parallel_src_rank, + get_cuda_rng_tracker, + ) + + has_megatron_submodule = True +except (ImportError, ModuleNotFoundError): + has_megatron_submodule = False + + +class MegatronTrainer(Trainer): + """Main class for model parallel with data parallel training.""" + + def __init__(self, cfg: FairseqConfig, task, model, criterion, **kwargs): + if not has_megatron_submodule: + raise ImportError( + "\n\nPlease install the megatron submodule:" + "\n\n git submodule update --init " + "fairseq/model_parallel/megatron" + ) + super().__init__(cfg, task, model, criterion, **kwargs) + + def clip_grad_norm(self, clip_norm): + def _aggregate_model_parallel_grad_norm(total_norm): + total_norm = total_norm ** 2 + distributed_utils.all_reduce( + total_norm, group=distributed_utils.get_model_parallel_group() + ) + total_norm = total_norm ** 0.5 + return total_norm + + return self.optimizer.clip_grad_norm( + clip_norm, + aggregate_norm_fn=_aggregate_model_parallel_grad_norm, + ) + + def save_checkpoint(self, filename, extra_state): + """Save all training state in a checkpoint file.""" + extra_state['rng_tracker_states'] \ + = get_cuda_rng_tracker().get_states() + super().save_checkpoint(filename, extra_state) + + def load_checkpoint( + self, + filename, + reset_optimizer=False, + reset_lr_scheduler=False, + optimizer_overrides=None, + reset_meters=False, + ): + extra_state = super().load_checkpoint(filename, reset_optimizer=reset_optimizer, reset_lr_scheduler=reset_lr_scheduler, optimizer_overrides=optimizer_overrides, reset_meters=reset_meters) + if extra_state is not None and 'rng_tracker_states' in extra_state: + get_cuda_rng_tracker().set_states( + extra_state['rng_tracker_states']) + return extra_state diff --git a/SpeechT5/fairseq/fairseq/model_parallel/models/__init__.py b/SpeechT5/fairseq/fairseq/model_parallel/models/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..3532479e52a0e1f1ba204c6f5d51c71c98ee5df0 --- /dev/null +++ b/SpeechT5/fairseq/fairseq/model_parallel/models/__init__.py @@ -0,0 +1,20 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +import importlib +import os + + +# automatically import any Python files in the models/ directory +models_dir = os.path.dirname(__file__) +for file in os.listdir(models_dir): + path = os.path.join(models_dir, file) + if ( + not file.startswith("_") + and not file.startswith(".") + and (file.endswith(".py") or os.path.isdir(path)) + ): + model_name = file[: file.find(".py")] if file.endswith(".py") else file + module = importlib.import_module("fairseq.model_parallel.models." + model_name) diff --git a/SpeechT5/fairseq/fairseq/model_parallel/models/pipeline_parallel_transformer/__init__.py b/SpeechT5/fairseq/fairseq/model_parallel/models/pipeline_parallel_transformer/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..117827c3e9c176477f33e3a6fd7fe19a922411a2 --- /dev/null +++ b/SpeechT5/fairseq/fairseq/model_parallel/models/pipeline_parallel_transformer/__init__.py @@ -0,0 +1,6 @@ +# Copyright (c) Facebook, Inc. and its affiliates. 
+# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +from .model import * # noqa diff --git a/SpeechT5/fairseq/fairseq/model_parallel/models/pipeline_parallel_transformer/layers.py b/SpeechT5/fairseq/fairseq/model_parallel/models/pipeline_parallel_transformer/layers.py new file mode 100644 index 0000000000000000000000000000000000000000..eb81ded341257ba0a43c4d0867e8f3c83f276bc7 --- /dev/null +++ b/SpeechT5/fairseq/fairseq/model_parallel/models/pipeline_parallel_transformer/layers.py @@ -0,0 +1,600 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +import math +from collections import namedtuple + +import torch +import torch.nn as nn +import torch.nn.functional as F +from fairseq import options, utils +from fairseq.modules import ( + AdaptiveSoftmax, + LayerNorm, + MultiheadAttention, + PositionalEmbedding, +) + + +EncoderOut = namedtuple( + "TransformerEncoderOut", + [ + "encoder_out", # T x B x C + "encoder_padding_mask", # B x T + "encoder_embedding", # B x T x C + "encoder_states", # List[T x B x C] + ], +) + + +class TransformerEncoderEmbedding(nn.Module): + """ Encoder Embedding + Positional Embedding """ + + def __init__(self, args, embed_tokens): + super().__init__() + self.dropout = args.dropout + self.max_source_positions = args.max_source_positions + self.embed_tokens = embed_tokens + if isinstance(embed_tokens, nn.ModuleList): + self.padding_idx = embed_tokens[0].padding_idx + embed_dim = sum(e.embedding_dim for e in embed_tokens) + else: + self.padding_idx = embed_tokens.padding_idx + embed_dim = embed_tokens.embedding_dim + self.embed_scale = math.sqrt(embed_dim) + self.embed_positions = ( + PositionalEmbedding( + args.max_source_positions, + embed_dim, + self.padding_idx, + learned=args.encoder_learned_pos, + ) + if not args.no_token_positional_embeddings + else None + ) + if getattr(args, "layernorm_embedding", False): + self.layernorm_embedding = LayerNorm(embed_dim) + else: + self.layernorm_embedding = None + + def forward(self, input): + # embed tokens and positions + src_tokens = input[0] + prev_output_tokens = input[2] + if isinstance(self.embed_tokens, nn.ModuleList): + x_embed_list = [] + for embed_tokens_part in self.embed_tokens: + x_embed_list.append(embed_tokens_part(src_tokens)) + + embedded = torch.cat(x_embed_list, dim=-1) + else: + embedded = self.embed_tokens(src_tokens) + x = embed = self.embed_scale * embedded + if self.embed_positions is not None: + x = embed + self.embed_positions(src_tokens) + if self.layernorm_embedding: + x = self.layernorm_embedding(x) + x = F.dropout(x, p=self.dropout, training=self.training) + # B x T x C -> T x B x C + x = x.transpose(0, 1) + + # compute padding mask + encoder_padding_mask = src_tokens.eq(self.padding_idx) + return (x, encoder_padding_mask, prev_output_tokens) + + +class TransformerEncoderLayerNorm(nn.Module): + """ + Layer norm at the the end of all encoder layers if + args.encoder_enormalize_before = True + """ + + def __init__(self, args, embed_dim): + super().__init__() + if args.encoder_normalize_before: + self.layer_norm = LayerNorm(embed_dim) + else: + self.layer_norm = None + + def forward(self, input): + x = input[0] + encoder_padding_mask = input[1] + prev_output_tokens = input[2] + if self.layer_norm: + x = self.layer_norm(x) + # keeping track of the incremental_state is not supported yet + 
return (x, encoder_padding_mask, prev_output_tokens) + + +class TransformerDecoderEmbedding(nn.Module): + """ Decoder Embedding + Positional Embedding """ + + def __init__(self, args, embed_tokens): + super().__init__() + self.dropout = args.dropout + self.share_input_output_embed = args.share_decoder_input_output_embed + input_embed_dim = ( + sum(e.embedding_dim for e in embed_tokens) + if isinstance(embed_tokens, nn.ModuleList) + else embed_tokens.embedding_dim + ) + embed_dim = args.decoder_embed_dim + self.output_embed_dim = args.decoder_output_dim + + padding_idx = ( + embed_tokens[0].padding_idx + if isinstance(embed_tokens, nn.ModuleList) + else embed_tokens.padding_idx + ) + self.max_target_positions = args.max_target_positions + + self.embed_tokens = embed_tokens + self.embed_scale = math.sqrt(embed_dim) # todo: try with input_embed_dim + + self.project_in_dim = ( + Linear(input_embed_dim, embed_dim, bias=False) + if embed_dim != input_embed_dim + else None + ) + + self.embed_positions = ( + PositionalEmbedding( + args.max_target_positions, + embed_dim, + padding_idx, + learned=args.decoder_learned_pos, + ) + if not args.no_token_positional_embeddings + else None + ) + + def forward(self, input): + mt_task = False + if isinstance(input, tuple): + if len(input) == 3: + encoder_out = input[0] + encoder_padding_mask = input[1] + prev_output_tokens = input[2] + incremental_state = None # Hardcoding to avoid passing of None objects + mt_task = True + else: + # HACK for now, need to fix (TODO sidgoyal) + prev_output_tokens = input[0] + # discard "src_lengths" + encoder_out = None + encoder_padding_mask = None + incremental_state = None + + else: + prev_output_tokens = input + encoder_out = None + encoder_padding_mask = None + incremental_state = None + + positions = ( + self.embed_positions( + prev_output_tokens, + incremental_state=incremental_state, + ) + if self.embed_positions is not None + else None + ) + + if incremental_state is not None: + prev_output_tokens = prev_output_tokens[:, -1:] + if positions is not None: + positions = positions[:, -1:] + + # embed tokens and positions + + if isinstance(self.embed_tokens, nn.ModuleList): + x_embed_list = [] + for embed_tokens_part in self.embed_tokens: + x_embed_list.append(embed_tokens_part(prev_output_tokens)) + + x = self.embed_scale * torch.cat(x_embed_list, dim=-1) + else: + x = self.embed_scale * self.embed_tokens(prev_output_tokens) + + if self.project_in_dim is not None: + x = self.project_in_dim(x) + + if positions is not None: + x += positions + x = F.dropout(x, p=self.dropout, training=self.training) + + # B x T x C -> T x B x C + x = x.transpose(0, 1) + if mt_task: + return (x, encoder_out, encoder_padding_mask) + return x + + +class TransformerDecoderOutputLayer(nn.Module): + def __init__(self, args, embed_tokens, dictionary): + super().__init__() + self.share_input_output_embed = args.share_decoder_input_output_embed + self.embed_tokens = embed_tokens + self.output_embed_dim = args.decoder_output_dim + embed_dim = args.decoder_embed_dim + + self.project_out_dim = ( + Linear(embed_dim, self.output_embed_dim, bias=False) + if embed_dim != self.output_embed_dim and not args.tie_adaptive_weights + else None + ) + self.adaptive_softmax = None + if args.adaptive_softmax_cutoff is not None: + assert not isinstance(embed_tokens, nn.ModuleList) + self.adaptive_softmax = AdaptiveSoftmax( + len(dictionary), + self.output_embed_dim, + options.eval_str_list(args.adaptive_softmax_cutoff, type=int), + 
dropout=args.adaptive_softmax_dropout, + adaptive_inputs=embed_tokens if args.tie_adaptive_weights else None, + factor=args.adaptive_softmax_factor, + tie_proj=args.tie_adaptive_proj, + ) + elif not self.share_input_output_embed: + self.embed_tokens = nn.Parameter( + torch.Tensor(len(dictionary), self.output_embed_dim) + ) + nn.init.normal_( + self.embed_tokens, mean=0, std=self.output_embed_dim ** -0.5 + ) + + if args.decoder_normalize_before and not getattr( + args, "no_decoder_final_norm", False + ): + self.layer_norm = LayerNorm(embed_dim) + else: + self.layer_norm = None + + def forward(self, input, apply_final_proj=True): + if isinstance(input, tuple): + x = input[0] + else: + x = input + + if self.layer_norm: + x = self.layer_norm(x) + + # T x B x C -> B x T x C + x = x.transpose(0, 1) + + if self.project_out_dim is not None: + x = self.project_out_dim(x) + if apply_final_proj: + x = self.output_layer(x) + return x + + def output_layer(self, features, **kwargs): + """Project features to the vocabulary size.""" + if self.adaptive_softmax is None: + # project back to size of vocabulary + if self.share_input_output_embed: + if isinstance(self.embed_tokens, nn.ModuleList): + output = None + for i, emb in enumerate(self.embed_tokens): + sidx = i * emb.embedding_dim + eidx = (i + 1) * emb.embedding_dim + if output is None: + output = F.linear(features[:, :, sidx:eidx], emb.weight) + else: + output += F.linear(features[:, :, sidx:eidx], emb.weight) + + return output + else: + return F.linear(features, self.embed_tokens.weight) + else: + return F.linear(features, self.embed_tokens) + else: + return features + + +class TransformerEncoderLayer(nn.Module): + """Encoder layer block. + In the original paper each operation (multi-head attention or FFN) is + postprocessed with: `dropout -> add residual -> layernorm`. In the + tensor2tensor code they suggest that learning is more robust when + preprocessing each layer with layernorm and postprocessing with: + `dropout -> add residual`. We default to the approach in the paper, but the + tensor2tensor approach can be enabled by setting + *args.encoder_normalize_before* to ``True``. 
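A minimal sketch of the pre-norm/post-norm switch that `maybe_layer_norm` implements further down; the standalone helper mirrors its XOR logic and is purely illustrative.

```python
import torch
import torch.nn as nn

def maybe_layer_norm(layer_norm, x, normalize_before, before=False, after=False):
    # The XOR picks exactly one of the two call sites per sublayer.
    assert before ^ after
    return layer_norm(x) if (after ^ normalize_before) else x

ln = nn.LayerNorm(4)
x = torch.randn(2, 4)
# Post-norm (paper default): only the after=True call applies the norm.
y = maybe_layer_norm(ln, x, normalize_before=False, before=True)   # unchanged
y = maybe_layer_norm(ln, y, normalize_before=False, after=True)    # normalized
```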
+ + Args: + args (argparse.Namespace): parsed command-line arguments + """ + + def __init__(self, args): + super().__init__() + self.embed_dim = args.encoder_embed_dim + self.self_attn = MultiheadAttention( + self.embed_dim, + args.encoder_attention_heads, + dropout=args.attention_dropout, + self_attention=True, + ) + self.self_attn_layer_norm = LayerNorm(self.embed_dim) + self.dropout = args.dropout + self.activation_fn = utils.get_activation_fn( + activation=getattr(args, "activation_fn", "relu") + ) + self.activation_dropout = getattr(args, "activation_dropout", 0) + if self.activation_dropout == 0: + # for backwards compatibility with models that use args.relu_dropout + self.activation_dropout = getattr(args, "relu_dropout", 0) + self.normalize_before = args.encoder_normalize_before + self.fc1 = Linear(self.embed_dim, args.encoder_ffn_embed_dim) + self.fc2 = Linear(args.encoder_ffn_embed_dim, self.embed_dim) + self.final_layer_norm = LayerNorm(self.embed_dim) + + def upgrade_state_dict_named(self, state_dict, name): + """ + Rename layer norm states from `...layer_norms.0.weight` to + `...self_attn_layer_norm.weight` and `...layer_norms.1.weight` to + `...final_layer_norm.weight` + """ + layer_norm_map = {"0": "self_attn_layer_norm", "1": "final_layer_norm"} + for old, new in layer_norm_map.items(): + for m in ("weight", "bias"): + k = "{}.layer_norms.{}.{}".format(name, old, m) + if k in state_dict: + state_dict["{}.{}.{}".format(name, new, m)] = state_dict[k] + del state_dict[k] + + def forward(self, input): + """ + Args: + input (Tuple): + input[0] (Tensor): input to the layer of shape `(seq_len, batch, embed_dim)` + input[1] (ByteTensor/FloatTensor): encoder padding mask - + binary ByteTensor of shape `(batch, src_len)` where padding elements + are indicated by ``1``. + input[2] (LongTensor): previous decoder outputs of shape + `(batch, tgt_len)`, for teacher forcing) + Returns: + output (Tuple): + output[0] (Tensor): encoded output of shape `(batch, src_len, embed_dim)` + output[1] (ByteTensor/FloatTensor): encoder padding mask + output[2] (LongTensor): previous decoder outputs + """ + x = input[0] + encoder_padding_mask = input[1] + prev_output_tokens = input[2] + residual = x + x = self.maybe_layer_norm(self.self_attn_layer_norm, x, before=True) + x, _ = self.self_attn( + query=x, key=x, value=x, key_padding_mask=encoder_padding_mask + ) + x = F.dropout(x, p=self.dropout, training=self.training) + x = residual + x + x = self.maybe_layer_norm(self.self_attn_layer_norm, x, after=True) + + residual = x + x = self.maybe_layer_norm(self.final_layer_norm, x, before=True) + x = self.activation_fn(self.fc1(x)) + x = F.dropout(x, p=self.activation_dropout, training=self.training) + x = self.fc2(x) + x = F.dropout(x, p=self.dropout, training=self.training) + x = residual + x + x = self.maybe_layer_norm(self.final_layer_norm, x, after=True) + return (x, encoder_padding_mask, prev_output_tokens) + + def maybe_layer_norm(self, layer_norm, x, before=False, after=False): + assert before ^ after + if after ^ self.normalize_before: + return layer_norm(x) + else: + return x + + +class TransformerDecoderLayer(nn.Module): + """Decoder layer block. + + In the original paper each operation (multi-head attention, encoder + attention or FFN) is postprocessed with: `dropout -> add residual -> + layernorm`. In the tensor2tensor code they suggest that learning is more + robust when preprocessing each layer with layernorm and postprocessing with: + `dropout -> add residual`. 
We default to the approach in the paper, but the + tensor2tensor approach can be enabled by setting + *args.decoder_normalize_before* to ``True``. + + Args: + args (argparse.Namespace): parsed command-line arguments + no_encoder_attn (bool, optional): whether to attend to encoder outputs + (default: False). + """ + + def __init__( + self, args, no_encoder_attn=False, add_bias_kv=False, add_zero_attn=False + ): + super().__init__() + self.embed_dim = args.decoder_embed_dim + self.self_attn = MultiheadAttention( + embed_dim=self.embed_dim, + num_heads=args.decoder_attention_heads, + dropout=args.attention_dropout, + add_bias_kv=add_bias_kv, + add_zero_attn=add_zero_attn, + self_attention=True, + ) + self.dropout = args.dropout + self.activation_fn = utils.get_activation_fn( + activation=getattr(args, "activation_fn", "relu") + ) + self.activation_dropout = getattr(args, "activation_dropout", 0) + if self.activation_dropout == 0: + # for backwards compatibility with models that use args.relu_dropout + self.activation_dropout = getattr(args, "relu_dropout", 0) + self.normalize_before = args.decoder_normalize_before + + # use layerNorm rather than FusedLayerNorm for exporting. + # char_inputs can be used to determint this. + # TODO remove this once we update apex with the fix + export = getattr(args, "char_inputs", False) + self.self_attn_layer_norm = LayerNorm(self.embed_dim, export=export) + + if no_encoder_attn: + self.encoder_attn = None + self.encoder_attn_layer_norm = None + else: + self.encoder_attn = MultiheadAttention( + self.embed_dim, + args.decoder_attention_heads, + kdim=getattr(args, "encoder_embed_dim", None), + vdim=getattr(args, "encoder_embed_dim", None), + dropout=args.attention_dropout, + encoder_decoder_attention=True, + ) + self.encoder_attn_layer_norm = LayerNorm(self.embed_dim, export=export) + + self.fc1 = Linear(self.embed_dim, args.decoder_ffn_embed_dim) + self.fc2 = Linear(args.decoder_ffn_embed_dim, self.embed_dim) + + self.final_layer_norm = LayerNorm(self.embed_dim, export=export) + self.need_attn = True + + self.onnx_trace = False + + def prepare_for_onnx_export_(self): + self.onnx_trace = True + + def forward(self, input): + """ + Args: + input (Tuple): + input[0] (Tensor): input to the layer of shape `(seq_len, batch, embed_dim)` + input[1] (Tensor): encoder output of shape `(batch, src_len, embed_dim)` + input[2] (ByteTensor/FloatTensor): encoder padding mask - + binary ByteTensor of shape `(batch, src_len)` where padding elements + are indicated by ``1``. 
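For reference, a short sketch of how such a mask is produced upstream in `TransformerEncoderEmbedding.forward` via `src_tokens.eq(padding_idx)`; token ids and shapes are illustrative.

```python
import torch

padding_idx = 1
src_tokens = torch.tensor([[5, 6, 7, 1, 1],
                           [8, 9, 1, 1, 1]])          # B x src_len
encoder_padding_mask = src_tokens.eq(padding_idx)     # True/1 at padded positions
print(encoder_padding_mask)
```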
+ Returns: + output (Tuple): + output[0] (Tensor): encoded output of shape `(batch, src_len, embed_dim)` + output[1] (ByteTensor/FloatTensor): encoder padding mask + output[2] (LongTensor): previous decoder outputs + """ + # Note: incremental state is not yet supported + mt_task = False + if isinstance(input, tuple): + x = input[0] + encoder_out = input[1] + encoder_padding_mask = input[2] + incremental_state = None + mt_task = True + else: + x = input + encoder_out = None + encoder_padding_mask = None + incremental_state = None + + if incremental_state is None: + self_attn_mask = self.buffered_future_mask(x) + else: + self_attn_mask = None + + # TODO: add back prev_self_attn_state, prev_attn_state, + # self_attn_padding_mask + prev_self_attn_state = None + prev_attn_state = None + self_attn_padding_mask = None + + residual = x + x = self.maybe_layer_norm(self.self_attn_layer_norm, x, before=True) + if prev_self_attn_state is not None: + if incremental_state is None: + incremental_state = {} + prev_key, prev_value = prev_self_attn_state + saved_state = {"prev_key": prev_key, "prev_value": prev_value} + self.self_attn._set_input_buffer(incremental_state, saved_state) + x, attn = self.self_attn( + query=x, + key=x, + value=x, + key_padding_mask=self_attn_padding_mask, + incremental_state=incremental_state, + need_weights=False, + attn_mask=self_attn_mask, + ) + x = F.dropout(x, p=self.dropout, training=self.training) + x = residual + x + x = self.maybe_layer_norm(self.self_attn_layer_norm, x, after=True) + + if self.encoder_attn is not None: + residual = x + x = self.maybe_layer_norm(self.encoder_attn_layer_norm, x, before=True) + if prev_attn_state is not None: + if incremental_state is None: + incremental_state = {} + prev_key, prev_value = prev_attn_state + saved_state = {"prev_key": prev_key, "prev_value": prev_value} + self.encoder_attn._set_input_buffer(incremental_state, saved_state) + x, attn = self.encoder_attn( + query=x, + key=encoder_out, + value=encoder_out, + key_padding_mask=encoder_padding_mask, + incremental_state=incremental_state, + static_kv=True, + need_weights=(not self.training and self.need_attn), + ) + x = F.dropout(x, p=self.dropout, training=self.training) + x = residual + x + x = self.maybe_layer_norm(self.encoder_attn_layer_norm, x, after=True) + + residual = x + x = self.maybe_layer_norm(self.final_layer_norm, x, before=True) + x = self.activation_fn(self.fc1(x)) + x = F.dropout(x, p=self.activation_dropout, training=self.training) + x = self.fc2(x) + x = F.dropout(x, p=self.dropout, training=self.training) + x = residual + x + x = self.maybe_layer_norm(self.final_layer_norm, x, after=True) + + if mt_task: + return (x, encoder_out, encoder_padding_mask) + return x + + def buffered_future_mask(self, tensor): + dim = tensor.size(0) + if ( + not hasattr(self, "_future_mask") + or self._future_mask is None + or self._future_mask.device != tensor.device + ): + self._future_mask = torch.triu( + utils.fill_with_neg_inf(tensor.new(dim, dim)), 1 + ) + if self._future_mask.size(0) < dim: + self._future_mask = torch.triu( + utils.fill_with_neg_inf(self._future_mask.resize_(dim, dim)), 1 + ) + return self._future_mask[:dim, :dim] + + def maybe_layer_norm(self, layer_norm, x, before=False, after=False): + assert before ^ after + if after ^ self.normalize_before: + return layer_norm(x) + else: + return x + + def make_generation_fast_(self, need_attn=False, **kwargs): + self.need_attn = need_attn + + +def Embedding(num_embeddings, embedding_dim, padding_idx): + m = 
nn.Embedding(num_embeddings, embedding_dim, padding_idx=padding_idx) + nn.init.normal_(m.weight, mean=0, std=embedding_dim ** -0.5) + nn.init.constant_(m.weight[padding_idx], 0) + return m + + +def Linear(in_features, out_features, bias=True): + m = nn.Linear(in_features, out_features, bias) + nn.init.xavier_uniform_(m.weight) + if bias: + nn.init.constant_(m.bias, 0.0) + return m diff --git a/SpeechT5/fairseq/fairseq/model_parallel/models/pipeline_parallel_transformer/model.py b/SpeechT5/fairseq/fairseq/model_parallel/models/pipeline_parallel_transformer/model.py new file mode 100644 index 0000000000000000000000000000000000000000..7f30dd98bb19b7bc414790787053efb231855129 --- /dev/null +++ b/SpeechT5/fairseq/fairseq/model_parallel/models/pipeline_parallel_transformer/model.py @@ -0,0 +1,767 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +import logging + +import torch +import torch.nn as nn +import torch.nn.functional as F +from fairseq import utils +from fairseq.model_parallel.models.pipeline_parallel_transformer.layers import ( + Embedding, + TransformerDecoderEmbedding, + TransformerDecoderLayer, + TransformerDecoderOutputLayer, + TransformerEncoderEmbedding, + TransformerEncoderLayer, + TransformerEncoderLayerNorm, +) +from fairseq.models import ( + BaseFairseqModel, + FairseqDecoder, + FairseqEncoder, + register_model, + register_model_architecture, +) +from fairseq.models.fairseq_encoder import EncoderOut +from fairseq.models.transformer import ( + base_architecture, + transformer_iwslt_de_en, + transformer_wmt_en_de_big, +) +from fairseq.modules import SinusoidalPositionalEmbedding + + +logger = logging.getLogger(__name__) + + +DEFAULT_MAX_SOURCE_POSITIONS = 1024 +DEFAULT_MAX_TARGET_POSITIONS = 1024 +TORCH_PIPE = False +RPC_INIT = False + +def import_pipe(): + global TORCH_PIPE + global RPC_INIT + try: + from torch.distributed.pipeline.sync import Pipe # noqa + global Pipe + from torch.distributed.pipeline.sync.utils import partition_model + global partition_model + from torch.distributed import rpc + import tempfile + TORCH_PIPE = True + # Initialize single process RPC agent since TORCH_PIPE requires + # RRef. RRef depends on RPC being initialized and as a result we initialize + # RPC with a single node. 
+ tmpfile = tempfile.NamedTemporaryFile() + if not RPC_INIT: + rpc.init_rpc( + name="worker", + rank=0, + world_size=1, + rpc_backend_options=rpc.TensorPipeRpcBackendOptions( + init_method="file://{}".format(tmpfile.name), + ) + ) + RPC_INIT = True + logger.info('Using torch pipe') + except ImportError: + try: + from fairscale.nn import Pipe # noqa + logger.info('Using fairscale pipe') + except ImportError: + raise ImportError("Please install fairscale with: pip install fairscale") + + +@register_model("pipeline_parallel_transformer") +class PipelineParallelTransformerModel(BaseFairseqModel): + def __init__(self, encoder, decoder, balance, devices, chunks, checkpoint): + import_pipe() + super().__init__() + assert isinstance(encoder, FairseqEncoder) + assert isinstance(decoder, FairseqDecoder) + encoder_module_list = ( + [encoder.embedding_layer] + + list(encoder.encoder_layers) + + [encoder.final_layer_norm] + ) + self.num_encoder_modules = len(encoder_module_list) + decoder_module_list = ( + [decoder.embedding_layer] + + list(decoder.decoder_layers) + + [decoder.decoder_output_layer] + ) + self.num_decoder_modules = len(decoder_module_list) + module_list = encoder_module_list + decoder_module_list + self.devices = devices + if TORCH_PIPE: + self.model = Pipe( + partition_model(nn.Sequential(*module_list), balance, devices), + chunks=chunks, + checkpoint=checkpoint, + ) + else: + self.model = Pipe( + nn.Sequential(*module_list), + balance=balance, + devices=devices, + chunks=chunks, + checkpoint=checkpoint, + ) + self.encoder_max_positions = self.max_positions_helper( + encoder.embedding_layer, "max_source_positions" + ) + self.decoder_max_positions = self.max_positions_helper( + decoder.embedding_layer, "max_target_positions" + ) + self.adaptive_softmax = getattr(decoder, "adaptive_softmax", None) + # Note: To be populated during inference + self.encoder = None + self.decoder = None + + def forward(self, src_tokens, src_lengths, prev_output_tokens): + if self.training: + input_lst = [src_tokens, src_lengths, prev_output_tokens] + input = tuple(i.to(self.devices[0], non_blocking=True) for i in input_lst) + if TORCH_PIPE: + return self.model(input).local_value() + else: + return self.model(input) + else: + assert self.encoder is not None and self.decoder is not None, ( + "encoder and decoder need to be initialized by " + + "calling the `prepare_for_inference_()` method" + ) + encoder_output_tuple = self.encoder(input) + return self.decoder(encoder_output_tuple) + + def prepare_for_inference_(self, cfg): + if self.encoder is not None and self.decoder is not None: + logger.info("Encoder and Decoder already initialized") + return + encoder_module_list = [] + decoder_module_list = [] + module_count = 0 + for partition in self.model.partitions: + for module in partition: + if module_count < self.num_encoder_modules: + encoder_module_list.append(module) + else: + decoder_module_list.append(module) + module_count += 1 + self.model = None + self.encoder = TransformerEncoder(cfg.distributed_training, None, None, encoder_module_list) + self.decoder = TransformerDecoder( + cfg.distributed_training, None, None, decoder_module_list=decoder_module_list + ) + + @staticmethod + def add_args(parser): + """Add model-specific arguments to the parser.""" + # fmt: off + parser.add_argument('--activation-fn', + choices=utils.get_available_activation_fns(), + help='activation function to use') + parser.add_argument('--dropout', type=float, metavar='D', + help='dropout probability') + 
parser.add_argument('--attention-dropout', type=float, metavar='D', + help='dropout probability for attention weights') + parser.add_argument('--activation-dropout', '--relu-dropout', type=float, metavar='D', + help='dropout probability after activation in FFN.') + parser.add_argument('--encoder-embed-path', type=str, metavar='STR', + help='path to pre-trained encoder embedding') + parser.add_argument('--encoder-embed-dim', type=int, metavar='N', + help='encoder embedding dimension') + parser.add_argument('--encoder-ffn-embed-dim', type=int, metavar='N', + help='encoder embedding dimension for FFN') + parser.add_argument('--encoder-layers', type=int, metavar='N', + help='num encoder layers') + parser.add_argument('--encoder-attention-heads', type=int, metavar='N', + help='num encoder attention heads') + parser.add_argument('--encoder-normalize-before', action='store_true', + help='apply layernorm before each encoder block') + parser.add_argument('--encoder-learned-pos', action='store_true', + help='use learned positional embeddings in the encoder') + parser.add_argument('--decoder-embed-path', type=str, metavar='STR', + help='path to pre-trained decoder embedding') + parser.add_argument('--decoder-embed-dim', type=int, metavar='N', + help='decoder embedding dimension') + parser.add_argument('--decoder-ffn-embed-dim', type=int, metavar='N', + help='decoder embedding dimension for FFN') + parser.add_argument('--decoder-layers', type=int, metavar='N', + help='num decoder layers') + parser.add_argument('--decoder-attention-heads', type=int, metavar='N', + help='num decoder attention heads') + parser.add_argument('--decoder-learned-pos', action='store_true', + help='use learned positional embeddings in the decoder') + parser.add_argument('--decoder-normalize-before', action='store_true', + help='apply layernorm before each decoder block') + parser.add_argument('--share-decoder-input-output-embed', action='store_true', + help='share decoder input and output embeddings') + parser.add_argument('--share-all-embeddings', action='store_true', + help='share encoder, decoder and output embeddings' + ' (requires shared dictionary and embed dim)') + parser.add_argument('--no-token-positional-embeddings', default=False, action='store_true', + help='if set, disables positional embeddings (outside self attention)') + parser.add_argument('--adaptive-softmax-cutoff', metavar='EXPR', + help='comma separated list of adaptive softmax cutoff points. 
' + 'Must be used with adaptive_loss criterion'), + parser.add_argument('--adaptive-softmax-dropout', type=float, metavar='D', + help='sets adaptive softmax dropout for the tail projections') + parser.add_argument('--num-embedding-chunks', type=int, metavar='N', default=1, + help='Number of embedding layer chunks (enables more even distribution' + 'of optimizer states across data parallel nodes' + 'when using optimizer state sharding and' + 'a big embedding vocabulary)') + # fmt: on + + @classmethod + def build_model_base(cls, args, task): + """Build a new model instance.""" + + # make sure all arguments are present in older models + base_architecture(args) + + if not hasattr(args, "max_source_positions"): + args.max_source_positions = DEFAULT_MAX_SOURCE_POSITIONS + if not hasattr(args, "max_target_positions"): + args.max_target_positions = DEFAULT_MAX_TARGET_POSITIONS + + src_dict, tgt_dict = task.source_dictionary, task.target_dictionary + + def build_embedding(dictionary, embed_dim, path=None, num_embed_chunks=1): + assert embed_dim % num_embed_chunks == 0, ( + f"Number of embedding chunks = {num_embed_chunks} should be " + + f"divisible by the embedding dimension = {embed_dim}" + ) + assert path is None or num_embed_chunks == 1, ( + "Loading embedding from a path with number of embedding chunks > 1" + + " is not yet supported" + ) + num_embeddings = len(dictionary) + padding_idx = dictionary.pad() + # if provided, load from preloaded dictionaries + if path: + emb = Embedding(num_embeddings, embed_dim, padding_idx) + embed_dict = utils.parse_embedding(path) + utils.load_embedding(embed_dict, dictionary, emb) + else: + embed_chunk_dim = embed_dim // num_embed_chunks + emb = nn.ModuleList() + for i in range(num_embed_chunks): + emb.append(Embedding(num_embeddings, embed_chunk_dim, padding_idx)) + return emb + + num_embed_chunks = args.num_embedding_chunks + if args.share_all_embeddings: + if src_dict != tgt_dict: + raise ValueError("--share-all-embeddings requires a joined dictionary") + if args.encoder_embed_dim != args.decoder_embed_dim: + raise ValueError( + "--share-all-embeddings requires --encoder-embed-dim to match --decoder-embed-dim" + ) + if args.decoder_embed_path and ( + args.decoder_embed_path != args.encoder_embed_path + ): + raise ValueError( + "--share-all-embeddings not compatible with --decoder-embed-path" + ) + encoder_embed_tokens = build_embedding( + src_dict, + args.encoder_embed_dim, + args.encoder_embed_path, + num_embed_chunks, + ) + decoder_embed_tokens = encoder_embed_tokens + args.share_decoder_input_output_embed = True + else: + assert args.share_decoder_input_output_embed or num_embed_chunks == 1, ( + "Not sharing decoder I/O embeddings is not yet supported with number of " + + "embedding chunks > 1" + ) + encoder_embed_tokens = build_embedding( + src_dict, + args.encoder_embed_dim, + args.encoder_embed_path, + num_embed_chunks, + ) + decoder_embed_tokens = build_embedding( + tgt_dict, + args.decoder_embed_dim, + args.decoder_embed_path, + num_embed_chunks, + ) + + encoder = cls.build_encoder(args, src_dict, encoder_embed_tokens) + decoder = cls.build_decoder(args, tgt_dict, decoder_embed_tokens) + return (encoder, decoder) + + @classmethod + def build_encoder(cls, args, src_dict, embed_tokens): + return TransformerEncoder(args, src_dict, embed_tokens) + + @classmethod + def build_decoder(cls, args, tgt_dict, embed_tokens): + return TransformerDecoder(args, tgt_dict, embed_tokens) + + @classmethod + def build_model(cls, args, task): + encoder, decoder = 
cls.build_model_base(args, task) + return PipelineParallelTransformerModel( + encoder=encoder, + decoder=decoder, + balance=utils.eval_str_list(args.pipeline_balance, type=int), + devices=utils.eval_str_list(args.pipeline_devices, type=int), + chunks=args.pipeline_chunks, + checkpoint=args.pipeline_checkpoint, + ) + + def output_layer(self, features, **kwargs): + """Project features to the default output size (typically vocabulary size).""" + return self.decoder.output_layer(features, **kwargs) + + def max_positions(self): + """Maximum length supported by the model.""" + return (self.encoder_max_positions, self.decoder_max_positions) + + def max_positions_helper( + self, embedding_layer, max_positions_field="max_source_positions" + ): + """Maximum input length supported by the encoder or decoder.""" + if embedding_layer.embed_positions is None: + return getattr(embedding_layer, max_positions_field) + return min( + getattr(embedding_layer, max_positions_field), + embedding_layer.embed_positions.max_positions, + ) + + def get_normalized_probs(self, net_output, log_probs, sample=None): + """Get normalized probabilities (or log probs) from a net's output.""" + + if hasattr(self, "adaptive_softmax") and self.adaptive_softmax is not None: + if sample is not None: + assert "target" in sample + target = sample["target"] + else: + target = None + out = self.adaptive_softmax.get_log_prob(net_output, target=target) + return out.exp_() if not log_probs else out + + # A Pipe() module returns a tuple of tensors as the output. + # In this case, the tuple has one element - the output tensor of logits + logits = net_output if isinstance(net_output, torch.Tensor) else net_output[0] + if log_probs: + return utils.log_softmax(logits, dim=-1, onnx_trace=False) + else: + return utils.softmax(logits, dim=-1, onnx_trace=False) + + def max_decoder_positions(self): + """Maximum length supported by the decoder.""" + return self.decoder_max_positions + + def load_state_dict(self, state_dict, strict=True, model_cfg=None): + """Copies parameters and buffers from *state_dict* into this module and + its descendants. + + Overrides the method in :class:`nn.Module`. Compared with that method + this additionally "upgrades" *state_dicts* from old checkpoints. 
+ """ + self.upgrade_state_dict(state_dict) + is_regular_transformer = not any("model.partitions" in k for k in state_dict) + if is_regular_transformer: + state_dict = self.convert_to_pipeline_parallel_state_dict(state_dict) + return super().load_state_dict(state_dict, strict) + + def convert_to_pipeline_parallel_state_dict(self, state_dict): + new_state_dict = self.state_dict() + encoder_layer_idx = 0 + decoder_layer_idx = 0 + encoder_key_suffixes = [ + "self_attn.k_proj.weight", + "self_attn.k_proj.bias", + "self_attn.v_proj.weight", + "self_attn.v_proj.bias", + "self_attn.q_proj.weight", + "self_attn.q_proj.bias", + "self_attn.out_proj.weight", + "self_attn.out_proj.bias", + "self_attn_layer_norm.weight", + "self_attn_layer_norm.bias", + "fc1.weight", + "fc1.bias", + "fc2.weight", + "fc2.bias", + "final_layer_norm.weight", + "final_layer_norm.bias", + ] + decoder_key_suffixes = [ + "self_attn.k_proj.weight", + "self_attn.k_proj.bias", + "self_attn.v_proj.weight", + "self_attn.v_proj.bias", + "self_attn.q_proj.weight", + "self_attn.q_proj.bias", + "self_attn.out_proj.weight", + "self_attn.out_proj.bias", + "self_attn_layer_norm.weight", + "self_attn_layer_norm.bias", + "encoder_attn.k_proj.weight", + "encoder_attn.k_proj.bias", + "encoder_attn.v_proj.weight", + "encoder_attn.v_proj.bias", + "encoder_attn.q_proj.weight", + "encoder_attn.q_proj.bias", + "encoder_attn.out_proj.weight", + "encoder_attn.out_proj.bias", + "encoder_attn_layer_norm.weight", + "encoder_attn_layer_norm.bias", + "fc1.weight", + "fc1.bias", + "fc2.weight", + "fc2.bias", + "final_layer_norm.weight", + "final_layer_norm.bias", + ] + for pid, partition in enumerate(self.model.partitions): + logger.info(f"Begin Partition {pid}") + for mid, module in enumerate(partition): + # fmt: off + if isinstance(module, TransformerEncoderEmbedding): + new_state_dict[f'model.partitions.{pid}.{mid}.embed_tokens.weight'] = state_dict['encoder.embed_tokens.weight'] + new_state_dict[f'model.partitions.{pid}.{mid}.embed_positions._float_tensor'] = state_dict['encoder.embed_positions._float_tensor'] + if isinstance(module, TransformerEncoderLayer): + for suffix in encoder_key_suffixes: + new_state_dict[f'model.partitions.{pid}.{mid}.{suffix}'] = state_dict[f'encoder.layers.{encoder_layer_idx}.{suffix}'] + encoder_layer_idx += 1 + if isinstance(module, TransformerDecoderLayer): + for suffix in decoder_key_suffixes: + new_state_dict[f'model.partitions.{pid}.{mid}.{suffix}'] = state_dict[f'decoder.layers.{decoder_layer_idx}.{suffix}'] + decoder_layer_idx += 1 + if isinstance(module, TransformerEncoderLayerNorm): + if 'encoder.layer_norm.weight' in state_dict: + new_state_dict[f'model.partitions.{pid}.{mid}.layer_norm.weight'] = state_dict['encoder.layer_norm.weight'] + new_state_dict[f'model.partitions.{pid}.{mid}.layer_norm.bias'] = state_dict['encoder.layer_norm.bias'] + if isinstance(module, TransformerDecoderEmbedding): + new_state_dict[f'model.partitions.{pid}.{mid}.embed_tokens.weight'] = state_dict['decoder.embed_tokens.weight'] + new_state_dict[f'model.partitions.{pid}.{mid}.embed_positions._float_tensor'] = state_dict['decoder.embed_positions._float_tensor'] + if isinstance(module, TransformerDecoderOutputLayer): + new_state_dict[f'model.partitions.{pid}.{mid}.output_projection.weight'] = state_dict['decoder.output_projection.weight'] + # fmt: on + return new_state_dict + + +class TransformerEncoder(FairseqEncoder): + """ + Transformer encoder consisting of *args.encoder_layers* layers. 
Each layer + is a :class:`TransformerEncoderLayer`. + + Args: + args (argparse.Namespace): parsed command-line arguments + dictionary (~fairseq.data.Dictionary): encoding dictionary + embed_tokens (torch.nn.Embedding): input embedding + """ + + def __init__(self, args, dictionary, embed_tokens, encoder_module_list=None): + super().__init__(dictionary) + self.register_buffer("version", torch.Tensor([3])) + import_pipe() + self.use_pipeline = encoder_module_list is not None + if not self.use_pipeline: + self.embedding_layer = TransformerEncoderEmbedding(args, embed_tokens) + self.encoder_layers = nn.Sequential(*[TransformerEncoderLayer(args) for i in range(args.encoder_layers)]) + if isinstance(embed_tokens, nn.ModuleList): + emb_dim = sum(e.embedding_dim for e in embed_tokens) + else: + emb_dim = embed_tokens.embedding_dim + self.final_layer_norm = TransformerEncoderLayerNorm(args, emb_dim) + else: + encoder_balance = utils.eval_str_list( + args.pipeline_encoder_balance, type=int + ) + encoder_devices = utils.eval_str_list( + args.pipeline_encoder_devices, type=int + ) + assert sum(encoder_balance) == len(encoder_module_list), ( + f"Sum of encoder_balance={encoder_balance} is not equal " + + f"to num_encoder_modules={len(encoder_module_list)}" + ) + if TORCH_PIPE: + self.model = Pipe( + module=partition_model(nn.Sequential(*encoder_module_list), encoder_balance, encoder_devices), + chunks=args.pipeline_chunks, + checkpoint=args.pipeline_checkpoint, + ) + else: + self.model = Pipe( + module=nn.Sequential(*encoder_module_list), + balance=encoder_balance, + devices=encoder_devices, + chunks=args.pipeline_chunks, + checkpoint=args.pipeline_checkpoint, + ) + + def forward(self, src_tokens, src_lengths): + """ + Args: + input_tuple( + src_tokens (LongTensor): tokens in the source language of shape + `(batch, src_len)` + src_lengths (torch.LongTensor): lengths of each source sentence of + shape `(batch)` + ) + + Returns: + output_tuple( + - **encoder_out** (Tensor): the last encoder layer's output of + shape `(src_len, batch, embed_dim)` + - **encoder_padding_mask** (ByteTensor): the positions of + padding elements of shape `(batch, src_len)` + - prev_output_tokens + - **encoder_states** (List[Tensor]): all intermediate + hidden states of shape `(src_len, batch, embed_dim)`. + Only populated if *return_all_hiddens* is True. + ) + """ + dummy_prev_output_tokens = torch.zeros( + 1, dtype=src_tokens.dtype, device=src_tokens.device + ) + input_tuple = (src_tokens, src_lengths, dummy_prev_output_tokens) + if self.use_pipeline: + input_tuple = tuple(i.to(self.model.devices[0]) for i in input_tuple) + if TORCH_PIPE: + encoder_out = self.model(input_tuple).local_value() + else: + encoder_out = self.model(input_tuple) + else: + encoder_embed_output_tuple = self.embedding_layer(input_tuple) + encoder_layers_output = self.encoder_layers(encoder_embed_output_tuple) + encoder_out = self.final_layer_norm(encoder_layers_output) + # first element is the encoder output + # second element is the encoder padding mask + # the remaining elements of EncoderOut are not computed by + # the PipelineParallelTransformer + return EncoderOut(encoder_out[0], encoder_out[1], None, None, None, None) + + def reorder_encoder_out(self, encoder_out, new_order): + """ + Reorder encoder output according to *new_order*. 
+ + Args: + encoder_out: output from the ``forward()`` method + new_order (LongTensor): desired order + + Returns: + *encoder_out* rearranged according to *new_order* + """ + if encoder_out.encoder_out is not None: + encoder_out = encoder_out._replace( + encoder_out=encoder_out.encoder_out.index_select(1, new_order) + ) + if encoder_out.encoder_padding_mask is not None: + encoder_out = encoder_out._replace( + encoder_padding_mask=encoder_out.encoder_padding_mask.index_select( + 0, new_order + ) + ) + if encoder_out.encoder_embedding is not None: + encoder_out = encoder_out._replace( + encoder_embedding=encoder_out.encoder_embedding.index_select( + 0, new_order + ) + ) + if encoder_out.encoder_states is not None: + for idx, state in enumerate(encoder_out.encoder_states): + encoder_out.encoder_states[idx] = state.index_select(1, new_order) + return encoder_out + + def max_positions(self): + """Maximum input length supported by the encoder.""" + if self.embedding_layer.embed_positions is None: + return self.embedding_layer.max_source_positions + return min( + self.embedding_layer.max_source_positions, + self.embedding_layer.embed_positions.max_positions, + ) + + +class TransformerDecoder(FairseqDecoder): + """ + Transformer decoder consisting of *args.decoder_layers* layers. Each layer + is a :class:`TransformerDecoderLayer`. + + Args: + args (argparse.Namespace): parsed command-line arguments + dictionary (~fairseq.data.Dictionary): decoding dictionary + embed_tokens (torch.nn.Embedding): output embedding + no_encoder_attn (bool, optional): whether to attend to encoder outputs + (default: False). + """ + + def __init__( + self, + args, + dictionary, + embed_tokens, + no_encoder_attn=False, + decoder_module_list=None, + ): + super().__init__(dictionary) + self.register_buffer("version", torch.Tensor([3])) + import_pipe() + self.use_pipeline = decoder_module_list is not None + if not self.use_pipeline: + self.embedding_layer = TransformerDecoderEmbedding(args, embed_tokens) + self.decoder_layers = nn.Sequential(*[ + TransformerDecoderLayer(args, no_encoder_attn) + for _ in range(args.decoder_layers) + ]) + self.decoder_output_layer = TransformerDecoderOutputLayer( + args, embed_tokens, dictionary + ) + else: + decoder_balance = utils.eval_str_list( + args.pipeline_decoder_balance, type=int + ) + decoder_devices = utils.eval_str_list( + args.pipeline_decoder_devices, type=int + ) + assert sum(decoder_balance) == len(decoder_module_list), ( + f"Sum of decoder_balance={decoder_balance} is not equal " + + f"to num_decoder_modules={len(decoder_module_list)}" + ) + if TORCH_PIPE: + self.model = Pipe( + module=partition_model(nn.Sequential(*decoder_module_list), decoder_balance, decoder_devices), + chunks=args.pipeline_chunks, + checkpoint=args.pipeline_checkpoint, + ) + else: + self.model = Pipe( + module=nn.Sequential(*decoder_module_list), + balance=decoder_balance, + devices=decoder_devices, + chunks=args.pipeline_chunks, + checkpoint=args.pipeline_checkpoint, + ) + + def forward( + self, + prev_output_tokens, + encoder_out=None, + ): + """ + Args: + prev_output_tokens (LongTensor): previous decoder outputs of shape + `(batch, tgt_len)`, for teacher forcing + encoder_out (optional): output from the encoder, used for + encoder-side attention + incremental_state (dict): dictionary used for storing state during + :ref:`Incremental decoding` + features_only (bool, optional): only return features without + applying output layer (default: False). 
+ + Returns: + tuple: + - the decoder's output of shape `(batch, tgt_len, vocab)` + - a dictionary with any model-specific outputs + """ + input_tuple = ( + encoder_out.encoder_out, + encoder_out.encoder_padding_mask, + prev_output_tokens, + ) + if self.use_pipeline: + input_tuple = tuple(i.to(self.model.devices[0]) for i in input_tuple) + if TORCH_PIPE: + return (self.model(input_tuple).local_value(),) + else: + return (self.model(input_tuple),) + else: + embed_layer_output = self.embedding_layer(input_tuple) + state = self.decoder_layers(embed_layer_output) + return (self.decoder_output_layer(state),) + + def output_layer(self, features, **kwargs): + """Project features to the vocabulary size.""" + if self.adaptive_softmax is None: + # project back to size of vocabulary + if self.share_input_output_embed: + return F.linear(features, self.embed_tokens.weight) + else: + return F.linear(features, self.embed_out) + else: + return features + + def max_positions(self): + """Maximum output length supported by the decoder.""" + if self.embedding_layer.embed_positions is None: + return self.embedding_layer.max_target_positions + return min( + self.embedding_layer.max_target_positions, + self.embedding_layer.embed_positions.max_positions, + ) + + def buffered_future_mask(self, tensor): + dim = tensor.size(0) + if ( + not hasattr(self, "_future_mask") + or self._future_mask is None + or self._future_mask.device != tensor.device + or self._future_mask.size(0) < dim + ): + self._future_mask = torch.triu( + utils.fill_with_neg_inf(tensor.new(dim, dim)), 1 + ) + return self._future_mask[:dim, :dim] + + def upgrade_state_dict_named(self, state_dict, name): + """Upgrade a (possibly old) state dict for new versions of fairseq.""" + if isinstance(self.embed_positions, SinusoidalPositionalEmbedding): + weights_key = "{}.embed_positions.weights".format(name) + if weights_key in state_dict: + del state_dict[weights_key] + state_dict[ + "{}.embed_positions._float_tensor".format(name) + ] = torch.FloatTensor(1) + + for i in range(len(self.layers)): + # update layer norms + layer_norm_map = { + "0": "self_attn_layer_norm", + "1": "encoder_attn_layer_norm", + "2": "final_layer_norm", + } + for old, new in layer_norm_map.items(): + for m in ("weight", "bias"): + k = "{}.layers.{}.layer_norms.{}.{}".format(name, i, old, m) + if k in state_dict: + state_dict[ + "{}.layers.{}.{}.{}".format(name, i, new, m) + ] = state_dict[k] + del state_dict[k] + + version_key = "{}.version".format(name) + if utils.item(state_dict.get(version_key, torch.Tensor([1]))[0]) <= 2: + # earlier checkpoints did not normalize after the stack of layers + self.layer_norm = None + self.normalize = False + state_dict[version_key] = torch.Tensor([1]) + + return state_dict + + +@register_model_architecture( + "pipeline_parallel_transformer", "transformer_iwslt_de_en_pipeline_parallel" +) +def transformer_iwslt_de_en_dist(args): + transformer_iwslt_de_en(args) + + +@register_model_architecture( + "pipeline_parallel_transformer", "transformer_wmt_en_de_big_pipeline_parallel" +) +def transformer_wmt_en_de_big_dist(args): + transformer_wmt_en_de_big(args) diff --git a/SpeechT5/fairseq/fairseq/model_parallel/models/roberta/__init__.py b/SpeechT5/fairseq/fairseq/model_parallel/models/roberta/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..117827c3e9c176477f33e3a6fd7fe19a922411a2 --- /dev/null +++ b/SpeechT5/fairseq/fairseq/model_parallel/models/roberta/__init__.py @@ -0,0 +1,6 @@ +# Copyright (c) Facebook, Inc. 
and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +from .model import * # noqa diff --git a/SpeechT5/fairseq/fairseq/model_parallel/models/roberta/model.py b/SpeechT5/fairseq/fairseq/model_parallel/models/roberta/model.py new file mode 100644 index 0000000000000000000000000000000000000000..77a80ef72057219110b34678a38705549910edd3 --- /dev/null +++ b/SpeechT5/fairseq/fairseq/model_parallel/models/roberta/model.py @@ -0,0 +1,225 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. +""" +RoBERTa: A Robustly Optimized BERT Pretraining Approach. +""" + +import logging + +import torch +import torch.nn as nn +import torch.nn.functional as F +from fairseq import utils +from fairseq.model_parallel.models.transformer import ModelParallelTransformerEncoder +from fairseq.models import register_model, register_model_architecture +from fairseq.models.roberta import ( + roberta_base_architecture, + roberta_prenorm_architecture, + RobertaEncoder, + RobertaModel, +) +from fairseq.modules import LayerNorm + + +try: + from fairseq.model_parallel.megatron.mpu import ( + copy_to_model_parallel_region, + gather_from_model_parallel_region, + ColumnParallelLinear, + VocabParallelEmbedding, + ) + + has_megatron_submodule = True +except (ImportError, ModuleNotFoundError): + has_megatron_submodule = False + +logger = logging.getLogger(__name__) + + +@register_model("model_parallel_roberta") +class ModelParallelRobertaModel(RobertaModel): + def __init__(self, args, encoder): + super().__init__(args, encoder) + + self.classification_heads = nn.ModuleDict() + + @staticmethod + def add_args(parser): + RobertaModel.add_args(parser) + parser.add_argument( + "--no-final-layer-norm", + action="store_true", + help=( + "don't add final layernorm (only applicable when " + "--encoder-normalize-before=True" + ), + ) + + @classmethod + def build_model(cls, args, task): + """Build a new model instance.""" + + # make sure all arguments are present + base_architecture(args) + + task.source_dictionary.pad_to_multiple_(args.model_parallel_size * 8) + task.target_dictionary.pad_to_multiple_(args.model_parallel_size * 8) + + if not hasattr(args, "max_positions"): + args.max_positions = args.tokens_per_sample + + if getattr(args, "untie_weights_roberta", False): + raise NotImplementedError( + "--untie-weights-roberta is not supported in model parallel mode" + ) + + encoder = ModelParallelRobertaEncoder(args, task.source_dictionary) + return cls(args, encoder) + + def forward( + self, + src_tokens, + features_only=False, + return_all_hiddens=False, + classification_head_name=None, + **kwargs + ): + if classification_head_name is not None: + features_only = True + + x, extra = self.encoder(src_tokens, features_only, return_all_hiddens, **kwargs) + + if classification_head_name is not None: + x = self.classification_heads[classification_head_name](x) + return x, extra + + def register_classification_head( + self, name, num_classes=None, inner_dim=None, **kwargs + ): + """Register a classification head.""" + if name in self.classification_heads: + prev_num_classes = self.classification_heads[name].out_proj.out_features + prev_inner_dim = self.classification_heads[name].dense.out_features + if num_classes != prev_num_classes or inner_dim != prev_inner_dim: + logger.warning( + 're-registering head "{}" with 
num_classes {} (prev: {}) ' + "and inner_dim {} (prev: {})".format( + name, num_classes, prev_num_classes, inner_dim, prev_inner_dim + ) + ) + self.classification_heads[name] = ModelParallelRobertaClassificationHead( + self.args.encoder_embed_dim, + inner_dim or self.args.encoder_embed_dim, + num_classes, + self.args.pooler_activation_fn, + self.args.pooler_dropout, + ) + + +class ModelParallelRobertaLMHead(nn.Module): + """Head for masked language modeling.""" + + def __init__(self, embed_dim, output_dim, activation_fn, weight=None): + super().__init__() + self.dense = ColumnParallelLinear(embed_dim, embed_dim, gather_output=True) + self.activation_fn = utils.get_activation_fn(activation_fn) + self.layer_norm = LayerNorm(embed_dim) + + if weight is None: + weight = nn.Linear(embed_dim, output_dim, bias=False).weight + self.weight = weight + self.bias = nn.Parameter(torch.zeros(output_dim)) + + def forward(self, features, masked_tokens=None, **kwargs): + # Only project the unmasked tokens while training, + # saves both memory and computation + if masked_tokens is not None: + features = features[masked_tokens, :] + + x = self.dense(features) + x = self.activation_fn(x) + x = self.layer_norm(x) + + x = copy_to_model_parallel_region(x) + # project back to size of vocabulary with bias + x = F.linear(x, self.weight) + x = gather_from_model_parallel_region(x).contiguous() + x = x + self.bias + return x + + +class ModelParallelRobertaClassificationHead(nn.Module): + """Head for sentence-level classification tasks.""" + + def __init__( + self, input_dim, inner_dim, num_classes, activation_fn, pooler_dropout + ): + super().__init__() + self.dense = ColumnParallelLinear(input_dim, inner_dim, gather_output=True) + self.activation_fn = utils.get_activation_fn(activation_fn) + self.dropout = nn.Dropout(p=pooler_dropout) + self.out_proj = nn.Linear(inner_dim, num_classes) + + def forward(self, features, **kwargs): + x = features[:, 0, :] # take <s> token (equiv. 
to [CLS]) + x = self.dropout(x) + x = self.dense(x) + x = self.activation_fn(x) + x = self.dropout(x) + x = self.out_proj(x) + return x + + +class ModelParallelRobertaEncoder(RobertaEncoder): + """RoBERTa encoder.""" + + def __init__(self, args, dictionary): + super().__init__(args, dictionary) + assert not self.args.untie_weights_roberta + + def build_embedding(self, vocab_size, embedding_dim, padding_idx): + return VocabParallelEmbedding(vocab_size, embedding_dim, padding_idx) + + def build_encoder(self, args, dictionary, embed_tokens): + return ModelParallelTransformerEncoder(args, dictionary, embed_tokens) + + def build_lm_head(self, embed_dim, output_dim, activation_fn, weight): + return ModelParallelRobertaLMHead(embed_dim, output_dim, activation_fn, weight) + + +@register_model_architecture("model_parallel_roberta", "model_parallel_roberta") +def base_architecture(args): + args.no_final_layer_norm = getattr(args, "no_final_layer_norm", False) + # model parallel RoBERTa defaults to "Pre-LN" formulation + roberta_prenorm_architecture(args) + + +# earlier versions of model parallel RoBERTa removed the final layer norm +@register_model_architecture("model_parallel_roberta", "model_parallel_roberta_v1") +def model_parallel_roberta_v1_architecture(args): + args.no_final_layer_norm = getattr(args, "no_final_layer_norm", True) + base_architecture(args) + + +@register_model_architecture( + "model_parallel_roberta", "model_parallel_roberta_postnorm" +) +def model_parallel_roberta_postnorm_architecture(args): + # the original BERT/RoBERTa uses the "Post-LN" formulation + roberta_base_architecture(args) + + +@register_model_architecture("model_parallel_roberta", "model_parallel_roberta_base") +def model_parallel_roberta_base_architecture(args): + base_architecture(args) + + +@register_model_architecture("model_parallel_roberta", "model_parallel_roberta_large") +def model_parallel_roberta_large_architecture(args): + args.encoder_layers = getattr(args, "encoder_layers", 24) + args.encoder_embed_dim = getattr(args, "encoder_embed_dim", 1024) + args.encoder_ffn_embed_dim = getattr(args, "encoder_ffn_embed_dim", 4096) + args.encoder_attention_heads = getattr(args, "encoder_attention_heads", 16) + base_architecture(args) diff --git a/SpeechT5/fairseq/fairseq/model_parallel/models/transformer.py b/SpeechT5/fairseq/fairseq/model_parallel/models/transformer.py new file mode 100644 index 0000000000000000000000000000000000000000..6b330ef1b7f7a506e7e8176f20a0e722b5fd5149 --- /dev/null +++ b/SpeechT5/fairseq/fairseq/model_parallel/models/transformer.py @@ -0,0 +1,121 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. 
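# Illustrative sketch (editorial note, not part of the patch): the idea behind the
# ColumnParallelLinear used in the RoBERTa heads above, simulated in one process.
# The output features of a Linear are split into `world_size` shards; each "rank"
# holds one weight shard, and gather_output=True corresponds to concatenating the
# per-shard outputs back into the full feature dimension.
import torch

def column_parallel_forward(x, weight_shards, bias_shards):
    # each shard computes its own slice of the output features
    outs = [x @ w.t() + b for w, b in zip(weight_shards, bias_shards)]
    # gather_output=True: concatenate the slices along the feature dimension
    return torch.cat(outs, dim=-1)

torch.manual_seed(0)
world_size, in_dim, out_dim = 2, 8, 4
full = torch.nn.Linear(in_dim, out_dim)
w_shards = full.weight.chunk(world_size, dim=0)  # rows of W correspond to output features
b_shards = full.bias.chunk(world_size, dim=0)
x = torch.randn(3, in_dim)
assert torch.allclose(column_parallel_forward(x, w_shards, b_shards), full(x), atol=1e-6)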
+ +import logging + +import torch.nn as nn +from fairseq.model_parallel.modules import ( + ModelParallelTransformerDecoderLayer, + ModelParallelTransformerEncoderLayer, +) +from fairseq.models import register_model +from fairseq.models.transformer import ( + TransformerDecoder, + TransformerEncoder, + TransformerModel, +) + + +try: + from fairseq.model_parallel.megatron.mpu import ( + copy_to_model_parallel_region, + gather_from_model_parallel_region, + VocabParallelEmbedding, + ) + + has_megatron_submodule = True +except (ImportError, ModuleNotFoundError): + has_megatron_submodule = False + + +logger = logging.getLogger(__name__) + + +@register_model("model_parallel_transformer") +class ModelParallelTransformerModel(TransformerModel): + """ + Model parallel Transformer model. + """ + + @classmethod + def build_embedding(cls, args, dictionary, embed_dim, path=None): + if not has_megatron_submodule: + raise ImportError( + "\n\nPlease install the megatron submodule:" + "\n\n git submodule update --init " + "fairseq/model_parallel/megatron" + ) + dictionary.pad_to_multiple_(args.model_parallel_size * 8) + num_embeddings = len(dictionary) + padding_idx = dictionary.pad() + + def _vocab_init(tensor, **kwargs): + nn.init.normal_(tensor, mean=0, std=num_embeddings ** -0.5) + nn.init.constant_(tensor[1], 0) + + emb = VocabParallelEmbedding( + num_embeddings, embed_dim, padding_idx, init_method=_vocab_init + ) + # if provided, load from preloaded dictionaries + if path: + raise NotImplementedError( + "Loading of embedding from path is not supported for model parallel" + ) + return emb + + @classmethod + def build_encoder(cls, args, src_dict, embed_tokens): + return ModelParallelTransformerEncoder(args, src_dict, embed_tokens) + + @classmethod + def build_decoder(cls, args, tgt_dict, embed_tokens): + return ModelParallelTransformerDecoder( + args, + tgt_dict, + embed_tokens, + no_encoder_attn=getattr(args, "no_cross_attention", False), + ) + + +class ModelParallelTransformerEncoder(TransformerEncoder): + """ + Model parallel Transformer encoder consisting of *args.encoder_layers* layers. Each layer + is a :class:`ModelParallelTransformerEncoderLayer`. + """ + + def __init__(self, args, dictionary, embed_tokens): + super().__init__(args, dictionary, embed_tokens) + + if args.no_final_layer_norm: + self.layer_norm = None + + def build_encoder_layer(self, args): + return ModelParallelTransformerEncoderLayer(args) + + +class ModelParallelTransformerDecoder(TransformerDecoder): + """ + Model Parallel Transformer decoder consisting of *args.decoder_layers* layers. Each layer + is a :class:`ModelParallelTransformerDecoderLayer`. 
+ """ + + def build_decoder_layer(self, args, no_encoder_attn=False): + return ModelParallelTransformerDecoderLayer(args, no_encoder_attn) + + def output_layer(self, features, **kwargs): + """Project features to the vocabulary size.""" + if not self.share_input_output_embed: + raise NotImplementedError( + "Model parallel training currently requires --share-decoder-input-output-embed" + ) + + features = copy_to_model_parallel_region(features) + + # project back to size of vocabulary + x = self.output_projection(features) + + if getattr(self.args, "criterion") != "vocab_parallel_cross_entropy": + x = gather_from_model_parallel_region(x).contiguous() + return x diff --git a/SpeechT5/fairseq/fairseq/model_parallel/models/transformer_lm.py b/SpeechT5/fairseq/fairseq/model_parallel/models/transformer_lm.py new file mode 100644 index 0000000000000000000000000000000000000000..dc52f6e8dd3899b6bf9bebae7415cee20baf9884 --- /dev/null +++ b/SpeechT5/fairseq/fairseq/model_parallel/models/transformer_lm.py @@ -0,0 +1,174 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +import torch.nn as nn +from fairseq.model_parallel.models.transformer import ModelParallelTransformerDecoder +from fairseq.models import register_model, register_model_architecture +from fairseq.models.transformer_lm import TransformerLanguageModel + + +try: + from fairseq.model_parallel.megatron.mpu import VocabParallelEmbedding + + has_megatron_submodule = True +except (ImportError, ModuleNotFoundError): + has_megatron_submodule = False + + +DEFAULT_MAX_TARGET_POSITIONS = 1024 + + +@register_model("model_parallel_transformer_lm") +class ModelParallelTransformerLanguageModel(TransformerLanguageModel): + + @staticmethod + def add_args(parser): + TransformerLanguageModel.add_args(parser) + + @classmethod + def build_model(cls, args, task): + """Build a new model instance.""" + if not has_megatron_submodule: + raise ImportError( + "\n\nPlease install the megatron submodule:" + "\n\n git submodule update --init " + "fairseq/model_parallel/megatron" + ) + + # make sure all arguments are present in older models + base_lm_architecture(args) + + task.source_dictionary.pad_to_multiple_(args.model_parallel_size * 8) + task.target_dictionary.pad_to_multiple_(args.model_parallel_size * 8) + + if args.decoder_layers_to_keep: + args.decoder_layers = len(args.decoder_layers_to_keep.split(",")) + + if getattr(args, "max_target_positions", None) is None: + args.max_target_positions = getattr( + args, "tokens_per_sample", DEFAULT_MAX_TARGET_POSITIONS + ) + + if args.character_embeddings: + raise NotImplementedError( + "Character embeddings is not supported for model parallel" + ) + elif args.adaptive_input: + raise NotImplementedError( + "Adaptive input is not supported for model parallel" + ) + else: + embed_tokens = cls.build_embedding( + args, task.source_dictionary, args.decoder_input_dim + ) + + decoder = ModelParallelTransformerDecoder( + args, + task.target_dictionary, + embed_tokens, + no_encoder_attn=True, + ) + return cls(decoder) + + @staticmethod + def add_args(parser): + TransformerLanguageModel.add_args(parser) + + @classmethod + def build_embedding(cls, args, dictionary, embed_dim, path=None): + def _vocab_init(tensor, **kwargs): + nn.init.normal_(tensor, mean=0, std=embed_dim ** -0.5) + nn.init.constant_(tensor[1], 0) + + embed_tokens = VocabParallelEmbedding( + len(dictionary), embed_dim, 
dictionary.pad(), init_method=_vocab_init + ) + return embed_tokens + + +def base_lm_architecture(args): + # backward compatibility for older model checkpoints + if hasattr(args, "no_tie_adaptive_proj"): + # previous models defined --no-tie-adaptive-proj, so use the existence of + # that option to determine if this is an "old" model checkpoint + args.no_decoder_final_norm = True # old models always set this to True + if args.no_tie_adaptive_proj is False: + args.tie_adaptive_proj = True + if hasattr(args, "decoder_final_norm"): + args.no_decoder_final_norm = not args.decoder_final_norm + + args.activation_fn = getattr(args, "activation_fn", "relu") + args.dropout = getattr(args, "dropout", 0.1) + args.attention_dropout = getattr(args, "attention_dropout", 0.0) + args.activation_dropout = getattr(args, "activation_dropout", 0.0) + args.relu_dropout = getattr(args, "relu_dropout", 0.0) + args.decoder_embed_dim = getattr(args, "decoder_embed_dim", 512) + args.decoder_output_dim = getattr( + args, "decoder_output_dim", args.decoder_embed_dim + ) + args.decoder_input_dim = getattr(args, "decoder_input_dim", args.decoder_embed_dim) + args.decoder_ffn_embed_dim = getattr(args, "decoder_ffn_embed_dim", 2048) + args.decoder_layers = getattr(args, "decoder_layers", 6) + args.decoder_attention_heads = getattr(args, "decoder_attention_heads", 8) + # Model training is not stable without this + args.decoder_normalize_before = True + args.no_decoder_final_norm = getattr(args, "no_decoder_final_norm", False) + args.adaptive_softmax_cutoff = getattr(args, "adaptive_softmax_cutoff", None) + args.adaptive_softmax_dropout = getattr(args, "adaptive_softmax_dropout", 0) + args.adaptive_softmax_factor = getattr(args, "adaptive_softmax_factor", 4) + args.no_token_positional_embeddings = getattr( + args, "no_token_positional_embeddings", False + ) + args.share_decoder_input_output_embed = getattr( + args, "share_decoder_input_output_embed", False + ) + args.character_embeddings = getattr(args, "character_embeddings", False) + args.character_filters = getattr( + args, + "character_filters", + "[(1, 64), (2, 128), (3, 192), (4, 256), (5, 256), (6, 256), (7, 256)]", + ) + args.character_embedding_dim = getattr(args, "character_embedding_dim", 4) + args.char_embedder_highway_layers = getattr(args, "char_embedder_highway_layers", 2) + args.adaptive_input = getattr(args, "adaptive_input", False) + args.adaptive_input_factor = getattr(args, "adaptive_input_factor", 4) + args.adaptive_input_cutoff = getattr(args, "adaptive_input_cutoff", None) + args.tie_adaptive_weights = getattr(args, "tie_adaptive_weights", False) + args.tie_adaptive_proj = getattr(args, "tie_adaptive_proj", False) + args.decoder_learned_pos = getattr(args, "decoder_learned_pos", False) + args.decoder_layerdrop = getattr(args, "decoder_layerdrop", 0.0) + args.decoder_layers_to_keep = getattr(args, "decoder_layers_to_keep", None) + args.layernorm_embedding = getattr(args, "layernorm_embedding", False) + args.no_scale_embedding = getattr(args, "no_scale_embedding", False) + args.quant_noise_pq = getattr(args, "quant_noise_pq", 0.0) + args.quant_noise_pq_block_size = getattr(args, "quant_noise_pq_block_size", 8) + args.quant_noise_scalar = getattr(args, "quant_noise_scalar", 0.0) + args.add_bos_token = getattr(args, "add_bos_token", False) + + +@register_model_architecture("model_parallel_transformer_lm", "transformer_lm_megatron") +def transformer_lm_megatron(args): + args.decoder_embed_dim = getattr(args, "decoder_embed_dim", 3072) + 
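# Illustrative sketch (editorial note, not part of the patch): the getattr pattern
# used throughout base_lm_architecture and the *_architecture functions here means
# "keep a value the user already set, otherwise apply this architecture's default".
# The helper and numbers below are hypothetical, purely for illustration.
from argparse import Namespace

def toy_architecture(args):
    args.decoder_layers = getattr(args, "decoder_layers", 72)
    args.dropout = getattr(args, "dropout", 0.1)

ns = Namespace(dropout=0.3)  # pretend the user passed --dropout 0.3 on the command line
toy_architecture(ns)
assert ns.decoder_layers == 72 and ns.dropout == 0.3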
args.decoder_ffn_embed_dim = getattr(args, "decoder_ffn_embed_dim", 3072 * 4) + args.decoder_layers = getattr(args, "decoder_layers", 72) + args.decoder_attention_heads = getattr(args, "decoder_attention_heads", 32) + args.dropout = getattr(args, "dropout", 0.1) + args.attention_dropout = getattr(args, "attention_dropout", 0.1) + args.activation_fn = getattr(args, "activation_fn", "gelu") + base_lm_architecture(args) + + +@register_model_architecture( + "model_parallel_transformer_lm", "transformer_lm_megatron_11b" +) +def transformer_lm_megatron_11b(args): + args.decoder_embed_dim = getattr(args, "decoder_embed_dim", 3072) + args.decoder_ffn_embed_dim = getattr(args, "decoder_ffn_embed_dim", 3072 * 6) + args.decoder_layers = getattr(args, "decoder_layers", 72) + args.decoder_attention_heads = getattr(args, "decoder_attention_heads", 32) + args.dropout = getattr(args, "dropout", 0.1) + args.attention_dropout = getattr(args, "attention_dropout", 0.1) + args.activation_fn = getattr(args, "activation_fn", "gelu") + base_lm_architecture(args) diff --git a/SpeechT5/fairseq/fairseq/model_parallel/modules/__init__.py b/SpeechT5/fairseq/fairseq/model_parallel/modules/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..11603217a188f420ea849ae0fde19979736ba208 --- /dev/null +++ b/SpeechT5/fairseq/fairseq/model_parallel/modules/__init__.py @@ -0,0 +1,17 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. +"""isort:skip_file""" + +from .multihead_attention import ModelParallelMultiheadAttention +from .transformer_layer import ( + ModelParallelTransformerEncoderLayer, + ModelParallelTransformerDecoderLayer, +) + +__all__ = [ + "ModelParallelMultiheadAttention", + "ModelParallelTransformerEncoderLayer", + "ModelParallelTransformerDecoderLayer", +] diff --git a/SpeechT5/fairseq/fairseq/model_parallel/modules/multihead_attention.py b/SpeechT5/fairseq/fairseq/model_parallel/modules/multihead_attention.py new file mode 100644 index 0000000000000000000000000000000000000000..8eb9d09dad37ab132295166d691873beec63eaf1 --- /dev/null +++ b/SpeechT5/fairseq/fairseq/model_parallel/modules/multihead_attention.py @@ -0,0 +1,349 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +from typing import Dict, Optional, Tuple + +import torch +import torch.nn.functional as F +from fairseq import utils +from fairseq.incremental_decoding_utils import with_incremental_state +from fairseq.modules.fairseq_dropout import FairseqDropout +from torch import Tensor, nn + + +try: + from fairseq.model_parallel.megatron.mpu import ( + get_cuda_rng_tracker, + get_model_parallel_world_size, + ColumnParallelLinear, + RowParallelLinear, + ) + + has_megatron_submodule = True +except (ImportError, ModuleNotFoundError): + has_megatron_submodule = False + + +@with_incremental_state +class ModelParallelMultiheadAttention(nn.Module): + """Model parallel Multi-headed attention. + This performs the Multi-headed attention over multiple gpus. + + See "Megatron-LM: https://arxiv.org/pdf/1909.08053.pdf" for more details. 
+ """ + + def __init__( + self, + embed_dim, + num_heads, + kdim=None, + vdim=None, + dropout=0.0, + bias=True, + self_attention=False, + encoder_decoder_attention=False, + ): + super().__init__() + if not has_megatron_submodule: + raise ImportError( + "\n\nPlease install the megatron submodule:" + "\n\n git submodule update --init " + "fairseq/model_parallel/megatron" + ) + self.embed_dim = embed_dim + self.kdim = kdim if kdim is not None else embed_dim + self.vdim = vdim if vdim is not None else embed_dim + self.qkv_same_dim = self.kdim == embed_dim and self.vdim == embed_dim + + self.model_parallel_size = get_model_parallel_world_size() + + self.num_heads_partition = num_heads // self.model_parallel_size + assert ( + self.num_heads_partition * self.model_parallel_size == num_heads + ), "Number of heads must be divisible by model parallel size" + + self.dropout_module = FairseqDropout( + dropout, module_name=self.__class__.__name__ + ) + self.head_dim = embed_dim // num_heads + assert ( + self.head_dim * num_heads == self.embed_dim + ), "embed_dim must be divisible by num_heads" + self.scaling = self.head_dim ** -0.5 + + self.self_attention = self_attention + self.encoder_decoder_attention = encoder_decoder_attention + + assert ( + not self.self_attention or self.qkv_same_dim + ), "Self-attention requires query, key and value to be of the same size" + + self.k_proj = ColumnParallelLinear( + self.kdim, embed_dim, bias=bias, gather_output=False + ) + self.v_proj = ColumnParallelLinear( + self.vdim, embed_dim, bias=bias, gather_output=False + ) + self.q_proj = ColumnParallelLinear( + embed_dim, embed_dim, bias=bias, gather_output=False + ) + self.out_proj = RowParallelLinear( + embed_dim, embed_dim, bias=bias, input_is_parallel=True + ) + + def forward( + self, + query, + key: Optional[Tensor], + value: Optional[Tensor], + key_padding_mask: Optional[Tensor] = None, + incremental_state: Optional[Dict[str, Dict[str, Optional[Tensor]]]] = None, + static_kv: bool = False, + attn_mask: Optional[Tensor] = None, + **unused_kwargs, + ) -> Tuple[Tensor, Optional[Tensor]]: + """Input shape: Time x Batch x Channel + + Args: + key_padding_mask (ByteTensor, optional): mask to exclude + keys that are pads, of shape `(batch, src_len)`, where + padding elements are indicated by 1s. + attn_mask (ByteTensor, optional): typically used to + implement causal attention, where the mask prevents the + attention from looking forward in time (default: None). 
+ """ + tgt_len, bsz, embed_dim = query.size() + assert embed_dim == self.embed_dim + assert list(query.size()) == [tgt_len, bsz, embed_dim] + + is_tpu = query.device.type == "xla" + + if incremental_state is not None: + saved_state = self._get_input_buffer(incremental_state) + if saved_state is not None and "prev_key" in saved_state: + # previous time steps are cached - no need to recompute + # key and value if they are static + if static_kv: + assert self.encoder_decoder_attention and not self.self_attention + key = value = None + else: + saved_state = None + + if self.self_attention: + q = self.q_proj(query) + k = self.k_proj(query) + v = self.v_proj(query) + elif self.encoder_decoder_attention: + # encoder-decoder attention + q = self.q_proj(query) + if key is None: + assert value is None + k = v = None + else: + k = self.k_proj(key) + v = self.v_proj(key) + + else: + assert key is not None and value is not None + q = self.q_proj(query) + k = self.k_proj(key) + v = self.v_proj(value) + q *= self.scaling + + q = ( + q.contiguous() + .view(tgt_len, bsz * self.num_heads_partition, self.head_dim) + .transpose(0, 1) + ) + if k is not None: + k = ( + k.contiguous() + .view(-1, bsz * self.num_heads_partition, self.head_dim) + .transpose(0, 1) + ) + if v is not None: + v = ( + v.contiguous() + .view(-1, bsz * self.num_heads_partition, self.head_dim) + .transpose(0, 1) + ) + + if saved_state is not None: + # saved states are stored with shape (bsz, num_heads_partition, seq_len, head_dim) + if "prev_key" in saved_state: + _prev_key = saved_state["prev_key"] + assert _prev_key is not None + prev_key = _prev_key.view( + bsz * self.num_heads_partition, -1, self.head_dim + ) + if static_kv: + k = prev_key + else: + assert k is not None + k = torch.cat([prev_key, k], dim=1) + if "prev_value" in saved_state: + _prev_value = saved_state["prev_value"] + assert _prev_value is not None + prev_value = _prev_value.view( + bsz * self.num_heads_partition, -1, self.head_dim + ) + if static_kv: + v = prev_value + else: + assert v is not None + v = torch.cat([prev_value, v], dim=1) + prev_key_padding_mask: Optional[Tensor] = None + if "prev_key_padding_mask" in saved_state: + prev_key_padding_mask = saved_state["prev_key_padding_mask"] + assert k is not None and v is not None + key_padding_mask = ( + ModelParallelMultiheadAttention._append_prev_key_padding_mask( + key_padding_mask=key_padding_mask, + prev_key_padding_mask=prev_key_padding_mask, + batch_size=bsz, + src_len=k.size(1), + static_kv=static_kv, + ) + ) + + saved_state["prev_key"] = k.view( + bsz, self.num_heads_partition, -1, self.head_dim + ) + saved_state["prev_value"] = v.view( + bsz, self.num_heads_partition, -1, self.head_dim + ) + saved_state["prev_key_padding_mask"] = key_padding_mask + # In this branch incremental_state is never None + assert incremental_state is not None + incremental_state = self._set_input_buffer(incremental_state, saved_state) + assert k is not None + src_len = k.size(1) + + # This is part of a workaround to get around fork/join parallelism + # not supporting Optional types. 
+ if key_padding_mask is not None and key_padding_mask.dim() == 0: + key_padding_mask = None + + if key_padding_mask is not None: + assert key_padding_mask.size(0) == bsz + assert key_padding_mask.size(1) == src_len + + attn_weights = torch.bmm(q, k.transpose(1, 2)) + + assert list(attn_weights.size()) == [ + bsz * self.num_heads_partition, + tgt_len, + src_len, + ] + + if attn_mask is not None: + attn_mask = attn_mask.unsqueeze(0) + attn_weights += attn_mask + + if key_padding_mask is not None: + # don't attend to padding symbols + attn_weights = attn_weights.view( + bsz, self.num_heads_partition, tgt_len, src_len + ) + if not is_tpu: + attn_weights = attn_weights.masked_fill( + key_padding_mask.unsqueeze(1).unsqueeze(2).to(torch.bool), + float("-inf"), + ) + else: + attn_weights = attn_weights.transpose(0, 2) + attn_weights = attn_weights.masked_fill(key_padding_mask, float("-inf")) + attn_weights = attn_weights.transpose(0, 2) + attn_weights = attn_weights.view( + bsz * self.num_heads_partition, tgt_len, src_len + ) + + attn_weights_float = utils.softmax(attn_weights, dim=-1) + attn_weights = attn_weights_float.type_as(attn_weights) + + with get_cuda_rng_tracker().fork(): + attn_probs = self.dropout_module(attn_weights) + + assert v is not None + attn = torch.bmm(attn_probs, v) + assert list(attn.size()) == [ + bsz * self.num_heads_partition, + tgt_len, + self.head_dim, + ] + embed_dim_partition = embed_dim // self.model_parallel_size + attn = attn.transpose(0, 1).contiguous().view(tgt_len, bsz, embed_dim_partition) + attn = self.out_proj(attn) + # return attn_weights None to keep the return type same as single gpu multihead attention + # This will be deprecated. + attn_weights: Optional[Tensor] = None + + return attn, attn_weights + + @staticmethod + def _append_prev_key_padding_mask( + key_padding_mask: Optional[Tensor], + prev_key_padding_mask: Optional[Tensor], + batch_size: int, + src_len: int, + static_kv: bool, + ) -> Optional[Tensor]: + # saved key padding masks have shape (bsz, seq_len) + if prev_key_padding_mask is not None and static_kv: + new_key_padding_mask = prev_key_padding_mask + elif prev_key_padding_mask is not None and key_padding_mask is not None: + new_key_padding_mask = torch.cat( + [prev_key_padding_mask.float(), key_padding_mask.float()], dim=1 + ) + # During incremental decoding, as the padding token enters and + # leaves the frame, there will be a time when prev or current + # is None + elif prev_key_padding_mask is not None: + + filler = torch.zeros(batch_size, src_len - prev_key_padding_mask.size(1)) + if prev_key_padding_mask.is_cuda: + filler = filler.cuda() + new_key_padding_mask = torch.cat( + [prev_key_padding_mask.float(), filler.float()], dim=1 + ) + elif key_padding_mask is not None: + filler = torch.zeros(batch_size, src_len - key_padding_mask.size(1)) + if key_padding_mask.is_cuda: + filler = filler.cuda() + new_key_padding_mask = torch.cat( + [filler.float(), key_padding_mask.float()], dim=1 + ) + else: + new_key_padding_mask = prev_key_padding_mask + return new_key_padding_mask + + def reorder_incremental_state( + self, incremental_state: Dict[str, Dict[str, Optional[Tensor]]], new_order + ): + """Reorder buffered internal state (for incremental generation).""" + input_buffer = self._get_input_buffer(incremental_state) + if input_buffer is not None: + for k in input_buffer.keys(): + if input_buffer[k] is not None: + input_buffer[k] = input_buffer[k].index_select(0, new_order) + incremental_state = self._set_input_buffer(incremental_state, 
input_buffer) + return incremental_state + + def _get_input_buffer( + self, incremental_state: Optional[Dict[str, Dict[str, Optional[Tensor]]]] + ) -> Dict[str, Optional[Tensor]]: + result = self.get_incremental_state(incremental_state, "attn_state") + if result is not None: + return result + else: + empty_result: Dict[str, Optional[Tensor]] = {} + return empty_result + + def _set_input_buffer( + self, + incremental_state: Dict[str, Dict[str, Optional[Tensor]]], + buffer: Dict[str, Optional[Tensor]], + ): + return self.set_incremental_state(incremental_state, "attn_state", buffer) diff --git a/SpeechT5/fairseq/fairseq/model_parallel/modules/transformer_layer.py b/SpeechT5/fairseq/fairseq/model_parallel/modules/transformer_layer.py new file mode 100644 index 0000000000000000000000000000000000000000..7ab53c6e5f12f15562717effb86ab8cb8d6b4fa3 --- /dev/null +++ b/SpeechT5/fairseq/fairseq/model_parallel/modules/transformer_layer.py @@ -0,0 +1,78 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +from fairseq.model_parallel.modules import ModelParallelMultiheadAttention +from fairseq.modules import TransformerDecoderLayer, TransformerEncoderLayer + + +try: + from fairseq.model_parallel.megatron.mpu import ( + ColumnParallelLinear, + RowParallelLinear, + ) + + has_megatron_submodule = True +except (ImportError, ModuleNotFoundError): + has_megatron_submodule = False + + +class ModelParallelTransformerEncoderLayer(TransformerEncoderLayer): + """Encoder layer block over multiple gpus. + + See "Megatron-LM: https://arxiv.org/pdf/1909.08053.pdf" for more details. + """ + + def build_fc1(self, input_dim, output_dim, q_noise, qn_block_size): + if q_noise > 0: + raise NotImplementedError + return ColumnParallelLinear(input_dim, output_dim, gather_output=False) + + def build_fc2(self, input_dim, output_dim, q_noise, qn_block_size): + if q_noise > 0: + raise NotImplementedError + return RowParallelLinear(input_dim, output_dim, input_is_parallel=True) + + def build_self_attention(self, embed_dim, args, **unused_kwargs): + return ModelParallelMultiheadAttention( + embed_dim, + args.encoder_attention_heads, + dropout=args.attention_dropout, + self_attention=True, + ) + + +class ModelParallelTransformerDecoderLayer(TransformerDecoderLayer): + """Decoder layer block. + + See "Megatron-LM: https://arxiv.org/pdf/1909.08053.pdf" for more details. 
+ """ + + def build_fc1(self, input_dim, output_dim, q_noise, qn_block_size): + if q_noise > 0: + raise NotImplementedError + return ColumnParallelLinear(input_dim, output_dim, gather_output=False) + + def build_fc2(self, input_dim, output_dim, q_noise, qn_block_size): + if q_noise > 0: + raise NotImplementedError + return RowParallelLinear(input_dim, output_dim, input_is_parallel=True) + + def build_self_attention(self, embed_dim, args, **unused_kwargs): + return ModelParallelMultiheadAttention( + embed_dim=embed_dim, + num_heads=args.decoder_attention_heads, + dropout=args.attention_dropout, + self_attention=not getattr(args, "cross_self_attention", False), + ) + + def build_encoder_attention(self, embed_dim, args, **unused_kwargs): + return ModelParallelMultiheadAttention( + embed_dim=embed_dim, + num_heads=args.decoder_attention_heads, + kdim=getattr(args, "encoder_embed_dim", None), + vdim=getattr(args, "encoder_embed_dim", None), + dropout=args.attention_dropout, + encoder_decoder_attention=True, + ) diff --git a/SpeechT5/fairseq/fairseq/models/__init__.py b/SpeechT5/fairseq/fairseq/models/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..61425c8ef5e386c035d97a7ddaf773ff39dde61c --- /dev/null +++ b/SpeechT5/fairseq/fairseq/models/__init__.py @@ -0,0 +1,225 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. +"""isort:skip_file""" + +import argparse +import importlib +import os + +from fairseq.dataclass import FairseqDataclass +from fairseq.dataclass.utils import merge_with_parent, populate_dataclass +from hydra.core.config_store import ConfigStore + +from .composite_encoder import CompositeEncoder +from .distributed_fairseq_model import DistributedFairseqModel +from .fairseq_decoder import FairseqDecoder +from .fairseq_encoder import FairseqEncoder +from .fairseq_incremental_decoder import FairseqIncrementalDecoder +from .fairseq_model import ( + BaseFairseqModel, + FairseqEncoderDecoderModel, + FairseqEncoderModel, + FairseqLanguageModel, + FairseqModel, + FairseqMultiModel, +) + + +MODEL_REGISTRY = {} +MODEL_DATACLASS_REGISTRY = {} +ARCH_MODEL_REGISTRY = {} +ARCH_MODEL_NAME_REGISTRY = {} +ARCH_MODEL_INV_REGISTRY = {} +ARCH_CONFIG_REGISTRY = {} + + +__all__ = [ + "BaseFairseqModel", + "CompositeEncoder", + "DistributedFairseqModel", + "FairseqDecoder", + "FairseqEncoder", + "FairseqEncoderDecoderModel", + "FairseqEncoderModel", + "FairseqIncrementalDecoder", + "FairseqLanguageModel", + "FairseqModel", + "FairseqMultiModel", +] + + +def build_model(cfg: FairseqDataclass, task): + + model = None + model_type = getattr(cfg, "_name", None) or getattr(cfg, "arch", None) + + if not model_type and len(cfg) == 1: + # this is hit if config object is nested in directory that is named after model type + + model_type = next(iter(cfg)) + if model_type in MODEL_DATACLASS_REGISTRY: + cfg = cfg[model_type] + else: + raise Exception( + "Could not infer model type from directory. Please add _name field to indicate model type. " + "Available models: " + + str(MODEL_DATACLASS_REGISTRY.keys()) + + " Requested model type: " + + model_type + ) + + if model_type in ARCH_MODEL_REGISTRY: + # case 1: legacy models + model = ARCH_MODEL_REGISTRY[model_type] + elif model_type in MODEL_DATACLASS_REGISTRY: + # case 2: config-driven models + model = MODEL_REGISTRY[model_type] + + if model_type in MODEL_DATACLASS_REGISTRY: + # set defaults from dataclass. 
note that arch name and model name can be the same + dc = MODEL_DATACLASS_REGISTRY[model_type] + if isinstance(cfg, argparse.Namespace): + cfg = populate_dataclass(dc(), cfg) + else: + cfg = merge_with_parent(dc(), cfg) + + assert model is not None, ( + f"Could not infer model type from {cfg}. " + f"Available models: " + + str(MODEL_DATACLASS_REGISTRY.keys()) + + " Requested model type: " + + model_type + ) + + return model.build_model(cfg, task) + + +def register_model(name, dataclass=None): + """ + New model types can be added to fairseq with the :func:`register_model` + function decorator. + + For example:: + + @register_model('lstm') + class LSTM(FairseqEncoderDecoderModel): + (...) + + .. note:: All models must implement the :class:`BaseFairseqModel` interface. + Typically you will extend :class:`FairseqEncoderDecoderModel` for + sequence-to-sequence tasks or :class:`FairseqLanguageModel` for + language modeling tasks. + + Args: + name (str): the name of the model + """ + + def register_model_cls(cls): + if name in MODEL_REGISTRY: + raise ValueError("Cannot register duplicate model ({})".format(name)) + if not issubclass(cls, BaseFairseqModel): + raise ValueError( + "Model ({}: {}) must extend BaseFairseqModel".format(name, cls.__name__) + ) + MODEL_REGISTRY[name] = cls + if dataclass is not None and not issubclass(dataclass, FairseqDataclass): + raise ValueError( + "Dataclass {} must extend FairseqDataclass".format(dataclass) + ) + + cls.__dataclass = dataclass + if dataclass is not None: + MODEL_DATACLASS_REGISTRY[name] = dataclass + + cs = ConfigStore.instance() + node = dataclass() + node._name = name + cs.store(name=name, group="model", node=node, provider="fairseq") + + @register_model_architecture(name, name) + def noop(_): + pass + + return cls + + return register_model_cls + + +def register_model_architecture(model_name, arch_name): + """ + New model architectures can be added to fairseq with the + :func:`register_model_architecture` function decorator. After registration, + model architectures can be selected with the ``--arch`` command-line + argument. + + For example:: + + @register_model_architecture('lstm', 'lstm_luong_wmt_en_de') + def lstm_luong_wmt_en_de(cfg): + args.encoder_embed_dim = getattr(cfg.model, 'encoder_embed_dim', 1000) + (...) + + The decorated function should take a single argument *cfg*, which is a + :class:`omegaconf.DictConfig`. The decorated function should modify these + arguments in-place to match the desired architecture. 
+ + Args: + model_name (str): the name of the Model (Model must already be + registered) + arch_name (str): the name of the model architecture (``--arch``) + """ + + def register_model_arch_fn(fn): + if model_name not in MODEL_REGISTRY: + raise ValueError( + "Cannot register model architecture for unknown model type ({})".format( + model_name + ) + ) + if arch_name in ARCH_MODEL_REGISTRY: + raise ValueError( + "Cannot register duplicate model architecture ({})".format(arch_name) + ) + if not callable(fn): + raise ValueError( + "Model architecture must be callable ({})".format(arch_name) + ) + ARCH_MODEL_REGISTRY[arch_name] = MODEL_REGISTRY[model_name] + ARCH_MODEL_NAME_REGISTRY[arch_name] = model_name + ARCH_MODEL_INV_REGISTRY.setdefault(model_name, []).append(arch_name) + ARCH_CONFIG_REGISTRY[arch_name] = fn + return fn + + return register_model_arch_fn + + +def import_models(models_dir, namespace): + for file in os.listdir(models_dir): + path = os.path.join(models_dir, file) + if ( + not file.startswith("_") + and not file.startswith(".") + and (file.endswith(".py") or os.path.isdir(path)) + ): + model_name = file[: file.find(".py")] if file.endswith(".py") else file + importlib.import_module(namespace + "." + model_name) + + # extra `model_parser` for sphinx + if model_name in MODEL_REGISTRY: + parser = argparse.ArgumentParser(add_help=False) + group_archs = parser.add_argument_group("Named architectures") + group_archs.add_argument( + "--arch", choices=ARCH_MODEL_INV_REGISTRY[model_name] + ) + group_args = parser.add_argument_group( + "Additional command-line arguments" + ) + MODEL_REGISTRY[model_name].add_args(group_args) + globals()[model_name + "_parser"] = parser + + +# automatically import any Python files in the models/ directory +models_dir = os.path.dirname(__file__) +import_models(models_dir, "fairseq.models") diff --git a/SpeechT5/fairseq/fairseq/models/bart/__init__.py b/SpeechT5/fairseq/fairseq/models/bart/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..a701923f7e5a2a8aa9b75e5580ddea22907f53ee --- /dev/null +++ b/SpeechT5/fairseq/fairseq/models/bart/__init__.py @@ -0,0 +1,7 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +from .hub_interface import * # noqa +from .model import * # noqa diff --git a/SpeechT5/fairseq/fairseq/models/bart/hub_interface.py b/SpeechT5/fairseq/fairseq/models/bart/hub_interface.py new file mode 100644 index 0000000000000000000000000000000000000000..9afe385b9d93e29f81709b088c945b73639bf583 --- /dev/null +++ b/SpeechT5/fairseq/fairseq/models/bart/hub_interface.py @@ -0,0 +1,208 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +import copy +import logging +from typing import Dict, List + +import numpy as np +import torch +import torch.nn as nn +import torch.nn.functional as F +from fairseq import utils +from fairseq.data import encoders +from fairseq.hub_utils import GeneratorHubInterface +from omegaconf import open_dict + + +logger = logging.getLogger(__name__) + + +class BARTHubInterface(GeneratorHubInterface): + """A simple PyTorch Hub interface to BART. 
+ + Usage: https://github.com/pytorch/fairseq/tree/master/examples/bart + """ + + def __init__(self, cfg, task, model): + super().__init__(cfg, task, [model]) + self.model = self.models[0] + + def encode( + self, sentence: str, *addl_sentences, no_separator=True + ) -> torch.LongTensor: + """ + BPE-encode a sentence (or multiple sentences). + + Every sequence begins with a beginning-of-sentence (`<s>`) symbol. + Every sentence ends with an end-of-sentence (`</s>`). + + Example (single sentence): `<s> a b c </s>` + Example (sentence pair): `<s> d e f </s> 1 2 3 </s>` + + The BPE encoding follows GPT-2. One subtle detail is that the GPT-2 BPE + requires leading spaces. For example:: + + >>> bart.encode('Hello world').tolist() + [0, 31414, 232, 2] + >>> bart.encode(' world').tolist() + [0, 232, 2] + >>> bart.encode('world').tolist() + [0, 8331, 2] + """ + tokens = self.bpe.encode(sentence) + if len(tokens.split(" ")) > min(self.max_positions) - 2: + tokens = " ".join(tokens.split(" ")[: min(self.max_positions) - 2]) + bpe_sentence = "<s> " + tokens + " </s>" + for s in addl_sentences: + bpe_sentence += " </s>" if not no_separator else "" + bpe_sentence += " " + self.bpe.encode(s) + " </s>" + tokens = self.task.source_dictionary.encode_line(bpe_sentence, append_eos=False) + return tokens.long() + + def decode(self, tokens: torch.LongTensor): + assert tokens.dim() == 1 + tokens = tokens.cpu().numpy() + if tokens[0] == self.task.source_dictionary.bos(): + tokens = tokens[1:] # remove <s> + eos_mask = tokens == self.task.source_dictionary.eos() + doc_mask = eos_mask[1:] & eos_mask[:-1] + sentences = np.split(tokens, doc_mask.nonzero()[0] + 1) + sentences = [ + self.bpe.decode(self.task.source_dictionary.string(s)) for s in sentences + ] + if len(sentences) == 1: + return sentences[0] + return sentences + + def _build_sample(self, src_tokens: List[torch.LongTensor]): + # assert torch.is_tensor(src_tokens) + dataset = self.task.build_dataset_for_inference( + src_tokens, + [x.numel() for x in src_tokens], + ) + sample = dataset.collater(dataset) + sample = utils.apply_to_sample(lambda tensor: tensor.to(self.device), sample) + return sample + + def generate( + self, + tokenized_sentences: List[torch.LongTensor], + *args, + inference_step_args=None, + skip_invalid_size_inputs=False, + **kwargs + ) -> List[List[Dict[str, torch.Tensor]]]: + inference_step_args = inference_step_args or {} + if "prefix_tokens" in inference_step_args: + raise NotImplementedError("prefix generation not implemented for BART") + res = [] + for batch in self._build_batches(tokenized_sentences, skip_invalid_size_inputs): + src_tokens = batch['net_input']['src_tokens'] + inference_step_args["prefix_tokens"] =src_tokens.new_full( + (src_tokens.size(0), 1), fill_value=self.task.source_dictionary.bos() + ).to(device=self.device) + results = super().generate( + src_tokens, + *args, + inference_step_args=inference_step_args, + skip_invalid_size_inputs=skip_invalid_size_inputs, + **kwargs + ) + for id, hypos in zip(batch['id'].tolist(), results): + res.append((id, hypos)) + res = [hypos for _, hypos in sorted(res, key=lambda x: x[0])] + return res + + def extract_features( + self, tokens: torch.LongTensor, return_all_hiddens: bool = False + ) -> torch.Tensor: + if tokens.dim() == 1: + tokens = tokens.unsqueeze(0) + if tokens.size(-1) > min(self.model.max_positions()): + raise ValueError( + "tokens exceeds maximum length: {} > {}".format( + tokens.size(-1), self.model.max_positions() + ) + ) + tokens.to(device=self.device), + 
prev_output_tokens = tokens.clone() + + prev_output_tokens[:, 0] = tokens.gather( + 1, + (tokens.ne(self.task.source_dictionary.pad()).sum(dim=1) - 1).unsqueeze(-1), + ).squeeze() + + prev_output_tokens[:, 1:] = tokens[:, :-1] + features, extra = self.model( + src_tokens=tokens, + src_lengths=None, + prev_output_tokens=prev_output_tokens, + features_only=True, + return_all_hiddens=return_all_hiddens, + ) + if return_all_hiddens: + # convert from T x B x C -> B x T x C + inner_states = extra["inner_states"] + return [inner_state.transpose(0, 1) for inner_state in inner_states] + else: + return features # just the last layer's features + + def register_classification_head( + self, name: str, num_classes: int = None, embedding_size: int = None, **kwargs + ): + self.model.register_classification_head( + name, num_classes=num_classes, embedding_size=embedding_size, **kwargs + ) + + def predict(self, head: str, tokens: torch.LongTensor, return_logits: bool = False): + if tokens.dim() == 1: + tokens = tokens.unsqueeze(0) + features = self.extract_features(tokens.to(device=self.device)) + sentence_representation = features[ + tokens.eq(self.task.source_dictionary.eos()), : + ].view(features.size(0), -1, features.size(-1))[:, -1, :] + + logits = self.model.classification_heads[head](sentence_representation) + if return_logits: + return logits + return F.log_softmax(logits, dim=-1) + + def fill_mask( + self, + masked_inputs: List[str], + topk: int = 5, + match_source_len: bool = True, + **generate_kwargs + ): + masked_token = '<mask>' + batch_tokens = [] + for masked_input in masked_inputs: + assert masked_token in masked_input, \ + "please add one {} token for the input".format(masked_token) + + text_spans = masked_input.split(masked_token) + text_spans_bpe = (' {0} '.format(masked_token)).join( + [self.bpe.encode(text_span.rstrip()) for text_span in text_spans] + ).strip() + tokens = self.task.source_dictionary.encode_line( + '<s> ' + text_spans_bpe + ' </s>', + append_eos=False, + add_if_not_exist=False, + ).long() + batch_tokens.append(tokens) + + # ensure beam size is at least as big as topk + generate_kwargs['beam'] = max( + topk, + generate_kwargs.get('beam', -1), + ) + generate_kwargs['match_source_len'] = match_source_len + batch_hypos = self.generate(batch_tokens, **generate_kwargs) + + return [ + [(self.decode(hypo['tokens']), hypo['score']) for hypo in hypos[:topk]] + for hypos in batch_hypos + ] diff --git a/SpeechT5/fairseq/fairseq/models/bart/model.py b/SpeechT5/fairseq/fairseq/models/bart/model.py new file mode 100644 index 0000000000000000000000000000000000000000..71d0b27cd2c0655fe3b00479b672d6d042a4d5ed --- /dev/null +++ b/SpeechT5/fairseq/fairseq/models/bart/model.py @@ -0,0 +1,384 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. 
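Before the model definition below, here is a hedged usage sketch of the BARTHubInterface methods shown above (encode, decode, fill_mask). It assumes a fairseq installation with network access, since `torch.hub.load` downloads the pretrained `bart.large` checkpoint on first use; the printed token ids simply restate the example given in the encode() docstring.

```python
import torch

# Returns a BARTHubInterface wrapping the pretrained bart.large checkpoint.
bart = torch.hub.load("pytorch/fairseq", "bart.large")
bart.eval()

tokens = bart.encode("Hello world")
print(tokens.tolist())      # [0, 31414, 232, 2], per the encode() docstring
print(bart.decode(tokens))  # "Hello world"

# fill_mask() beam-searches completions for each <mask> and returns, per input,
# a list of (decoded text, score) pairs for the top-k hypotheses.
print(bart.fill_mask(["The cat <mask> on the mat."], topk=3))
```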
+""" +BART: Denoising Sequence-to-Sequence Pre-training for +Natural Language Generation, Translation, and Comprehension +""" +from typing import Optional + +import logging + +import torch +import torch.nn as nn +from fairseq import utils +from fairseq.models import register_model, register_model_architecture +from fairseq.models.transformer import TransformerModel +from fairseq.modules.transformer_sentence_encoder import init_bert_params + +from .hub_interface import BARTHubInterface + + +logger = logging.getLogger(__name__) + + +@register_model("bart") +class BARTModel(TransformerModel): + __jit_unused_properties__ = ["supported_targets"] + + @classmethod + def hub_models(cls): + return { + "bart.base": "http://dl.fbaipublicfiles.com/fairseq/models/bart.base.tar.gz", + "bart.large": "http://dl.fbaipublicfiles.com/fairseq/models/bart.large.tar.gz", + "bart.large.mnli": "http://dl.fbaipublicfiles.com/fairseq/models/bart.large.mnli.tar.gz", + "bart.large.cnn": "http://dl.fbaipublicfiles.com/fairseq/models/bart.large.cnn.tar.gz", + "bart.large.xsum": "http://dl.fbaipublicfiles.com/fairseq/models/bart.large.xsum.tar.gz", + } + + def __init__(self, args, encoder, decoder): + super().__init__(args, encoder, decoder) + + # We follow BERT's random weight initialization + self.apply(init_bert_params) + + self.classification_heads = nn.ModuleDict() + if hasattr(self.encoder, "dictionary"): + self.eos: int = self.encoder.dictionary.eos() + + @staticmethod + def add_args(parser): + super(BARTModel, BARTModel).add_args(parser) + parser.add_argument( + "--pooler-dropout", + type=float, + metavar="D", + help="dropout probability in the masked_lm pooler layers", + ) + parser.add_argument( + "--pooler-activation-fn", + choices=utils.get_available_activation_fns(), + help="activation function to use for pooler layer", + ) + parser.add_argument( + "--spectral-norm-classification-head", + action="store_true", + help="Apply spectral normalization on the classification head", + ) + + @property + def supported_targets(self): + return {"self"} + + def forward( + self, + src_tokens, + src_lengths, + prev_output_tokens, + features_only: bool = False, + classification_head_name: Optional[str] = None, + token_embeddings: Optional[torch.Tensor] = None, + return_all_hiddens: bool = True, + alignment_layer: Optional[int] = None, + alignment_heads: Optional[int] = None, + ): + if classification_head_name is not None: + features_only = True + + encoder_out = self.encoder( + src_tokens, + src_lengths=src_lengths, + token_embeddings=token_embeddings, + return_all_hiddens=return_all_hiddens + ) + x, extra = self.decoder( + prev_output_tokens, + encoder_out=encoder_out, + features_only=features_only, + alignment_layer=alignment_layer, + alignment_heads=alignment_heads, + src_lengths=src_lengths, + return_all_hiddens=return_all_hiddens, + ) + eos: int = self.eos + if classification_head_name is not None: + sentence_representation = x[ + src_tokens.eq(eos), : + ].view(x.size(0), -1, x.size(-1))[:, -1, :] + for k, head in self.classification_heads.items(): + # for torch script only supports iteration + if k == classification_head_name: + x = head(sentence_representation) + break + return x, extra + + @classmethod + def from_pretrained( + cls, + model_name_or_path, + checkpoint_file="model.pt", + data_name_or_path=".", + bpe="gpt2", + sample_break_mode="eos", + **kwargs, + ): + from fairseq import hub_utils + + x = hub_utils.from_pretrained( + model_name_or_path, + checkpoint_file, + data_name_or_path, + 
archive_map=cls.hub_models(), + bpe=bpe, + load_checkpoint_heads=True, + sample_break_mode=sample_break_mode, + **kwargs, + ) + return BARTHubInterface(x["args"], x["task"], x["models"][0]) + + def register_classification_head( + self, name, num_classes=None, inner_dim=None, **kwargs + ): + """Register a classification head.""" + logger.info("Registering classification head: {0}".format(name)) + if name in self.classification_heads: + prev_num_classes = self.classification_heads[name].out_proj.out_features + prev_inner_dim = self.classification_heads[name].dense.out_features + if num_classes != prev_num_classes or inner_dim != prev_inner_dim: + logger.warning( + 're-registering head "{}" with num_classes {} (prev: {}) ' + "and inner_dim {} (prev: {})".format( + name, num_classes, prev_num_classes, inner_dim, prev_inner_dim + ) + ) + self.classification_heads[name] = BARTClassificationHead( + input_dim=self.args.encoder_embed_dim, + inner_dim=inner_dim or self.args.encoder_embed_dim, + num_classes=num_classes, + activation_fn=self.args.pooler_activation_fn, + pooler_dropout=self.args.pooler_dropout, + do_spectral_norm=getattr( + self.args, "spectral_norm_classification_head", False + ), + ) + + def upgrade_state_dict_named(self, state_dict, name): + super().upgrade_state_dict_named(state_dict, name) + + prefix = name + "." if name != "" else "" + current_head_names = ( + [] + if not hasattr(self, "classification_heads") + else self.classification_heads.keys() + ) + + # Handle new classification heads present in the state dict. + keys_to_delete = [] + for k in state_dict.keys(): + if not k.startswith(prefix + "classification_heads."): + continue + + head_name = k[len(prefix + "classification_heads.") :].split(".")[0] + num_classes = state_dict[ + prefix + "classification_heads." + head_name + ".out_proj.weight" + ].size(0) + inner_dim = state_dict[ + prefix + "classification_heads." + head_name + ".dense.weight" + ].size(0) + + if getattr(self.args, "load_checkpoint_heads", False): + if head_name not in current_head_names: + self.register_classification_head(head_name, num_classes, inner_dim) + else: + if head_name not in current_head_names: + logger.warning( + "deleting classification head ({}) from checkpoint " + "not present in current model: {}".format(head_name, k) + ) + keys_to_delete.append(k) + elif ( + num_classes + != self.classification_heads[head_name].out_proj.out_features + or inner_dim + != self.classification_heads[head_name].dense.out_features + ): + logger.warning( + "deleting classification head ({}) from checkpoint " + "with different dimensions than current model: {}".format( + head_name, k + ) + ) + keys_to_delete.append(k) + for k in keys_to_delete: + del state_dict[k] + + def truncate_emb(key): + if key in state_dict: + state_dict[key] = state_dict[key][:-1, :] + + # When finetuning on translation task, remove last row of + # embedding matrix that corresponds to mask_idx token. + loaded_dict_size = state_dict["encoder.embed_tokens.weight"].size(0) + if ( + loaded_dict_size == len(self.encoder.dictionary) + 1 + and "<mask>" not in self.encoder.dictionary + ): + truncate_emb("encoder.embed_tokens.weight") + truncate_emb("decoder.embed_tokens.weight") + truncate_emb("encoder.output_projection.weight") + truncate_emb("decoder.output_projection.weight") + + # When continued pretraining on new set of languages for mbart, + # add extra lang embeddings at the end of embed_tokens. + # Note: newly added languages are assumed to have been added at the end. 
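Setting the checkpoint-upgrade details aside for a moment, the `hub_models()` / `from_pretrained()` entry points above combine with the hub interface's `predict()` roughly as follows. This is a sketch only: it assumes the `bart.large.mnli` checkpoint can be downloaded, and the 0/1/2 label order (contradiction/neutral/entailment) is the one documented upstream.

```python
import torch

# Sentence-pair classification with the MNLI-finetuned checkpoint from hub_models().
bart = torch.hub.load("pytorch/fairseq", "bart.large.mnli")
bart.eval()

tokens = bart.encode("BART is a denoising autoencoder.", "BART is not a model.")
label = bart.predict("mnli", tokens).argmax().item()
print(label)  # 0 = contradiction, 1 = neutral, 2 = entailment (per upstream docs)

# For a locally fine-tuned checkpoint, BARTModel.from_pretrained() above plays
# the same role, e.g. BARTModel.from_pretrained("/path/to/ckpts", "model.pt").
```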
+ if self.args.task == "multilingual_denoising" and loaded_dict_size < len( + self.encoder.dictionary + ): + logger.info( + "Adding extra language embeddings not found in pretrained model for " + "continued pretraining of MBART on new set of languages." + ) + loaded_mask_token_embedding = state_dict["encoder.embed_tokens.weight"][ + -1, : + ] + + num_langids_to_add = len(self.encoder.dictionary) - loaded_dict_size + embed_dim = state_dict["encoder.embed_tokens.weight"].size(1) + + new_lang_embed_to_add = torch.zeros(num_langids_to_add, embed_dim) + nn.init.normal_(new_lang_embed_to_add, mean=0, std=embed_dim ** -0.5) + new_lang_embed_to_add = new_lang_embed_to_add.to( + dtype=state_dict["encoder.embed_tokens.weight"].dtype, + ) + + state_dict["encoder.embed_tokens.weight"] = torch.cat( + [ + state_dict["encoder.embed_tokens.weight"][ + : loaded_dict_size - 1, : + ], + new_lang_embed_to_add, + loaded_mask_token_embedding.unsqueeze(0), + ] + ) + state_dict["decoder.embed_tokens.weight"] = torch.cat( + [ + state_dict["decoder.embed_tokens.weight"][ + : loaded_dict_size - 1, : + ], + new_lang_embed_to_add, + loaded_mask_token_embedding.unsqueeze(0), + ] + ) + + # Copy any newly-added classification heads into the state dict + # with their current weights. + if hasattr(self, "classification_heads"): + cur_state = self.classification_heads.state_dict() + for k, v in cur_state.items(): + if prefix + "classification_heads." + k not in state_dict: + logger.info("Overwriting " + prefix + "classification_heads." + k) + state_dict[prefix + "classification_heads." + k] = v + + +class BARTClassificationHead(nn.Module): + """Head for sentence-level classification tasks.""" + + def __init__( + self, + input_dim, + inner_dim, + num_classes, + activation_fn, + pooler_dropout, + do_spectral_norm=False, + ): + super().__init__() + self.dense = nn.Linear(input_dim, inner_dim) + self.activation_fn = utils.get_activation_fn(activation_fn) + self.dropout = nn.Dropout(p=pooler_dropout) + self.out_proj = nn.Linear(inner_dim, num_classes) + + if do_spectral_norm: + self.out_proj = torch.nn.utils.spectral_norm(self.out_proj) + + def forward(self, features, **kwargs): + x = features + x = self.dropout(x) + x = self.dense(x) + x = self.activation_fn(x) + x = self.dropout(x) + x = self.out_proj(x) + return x + + +@register_model_architecture("bart", "bart_large") +def bart_large_architecture(args): + args.encoder_embed_path = getattr(args, "encoder_embed_path", None) + args.encoder_embed_dim = getattr(args, "encoder_embed_dim", 1024) + args.encoder_ffn_embed_dim = getattr(args, "encoder_ffn_embed_dim", 4 * 1024) + args.encoder_layers = getattr(args, "encoder_layers", 12) + args.encoder_attention_heads = getattr(args, "encoder_attention_heads", 16) + args.encoder_normalize_before = getattr(args, "encoder_normalize_before", False) + args.encoder_learned_pos = getattr(args, "encoder_learned_pos", True) + args.decoder_embed_path = getattr(args, "decoder_embed_path", None) + args.decoder_embed_dim = getattr(args, "decoder_embed_dim", args.encoder_embed_dim) + args.decoder_ffn_embed_dim = getattr( + args, "decoder_ffn_embed_dim", args.encoder_ffn_embed_dim + ) + args.decoder_layers = getattr(args, "decoder_layers", 12) + args.decoder_attention_heads = getattr(args, "decoder_attention_heads", 16) + args.decoder_normalize_before = getattr(args, "decoder_normalize_before", False) + args.decoder_learned_pos = getattr(args, "decoder_learned_pos", True) + args.attention_dropout = getattr(args, "attention_dropout", 0.0) + 
args.relu_dropout = getattr(args, "relu_dropout", 0.0) + args.dropout = getattr(args, "dropout", 0.1) + args.max_target_positions = getattr(args, "max_target_positions", 1024) + args.max_source_positions = getattr(args, "max_source_positions", 1024) + args.adaptive_softmax_cutoff = getattr(args, "adaptive_softmax_cutoff", None) + args.adaptive_softmax_dropout = getattr(args, "adaptive_softmax_dropout", 0) + args.share_decoder_input_output_embed = getattr( + args, "share_decoder_input_output_embed", True + ) + args.share_all_embeddings = getattr(args, "share_all_embeddings", True) + + args.decoder_output_dim = getattr( + args, "decoder_output_dim", args.decoder_embed_dim + ) + args.decoder_input_dim = getattr(args, "decoder_input_dim", args.decoder_embed_dim) + + args.no_scale_embedding = getattr(args, "no_scale_embedding", True) + args.layernorm_embedding = getattr(args, "layernorm_embedding", True) + + args.activation_fn = getattr(args, "activation_fn", "gelu") + args.pooler_activation_fn = getattr(args, "pooler_activation_fn", "tanh") + args.pooler_dropout = getattr(args, "pooler_dropout", 0.0) + + +@register_model_architecture("bart", "bart_base") +def bart_base_architecture(args): + args.encoder_embed_dim = getattr(args, "encoder_embed_dim", 768) + args.encoder_ffn_embed_dim = getattr(args, "encoder_ffn_embed_dim", 4 * 768) + args.encoder_layers = getattr(args, "encoder_layers", 6) + args.encoder_attention_heads = getattr(args, "encoder_attention_heads", 12) + args.decoder_layers = getattr(args, "decoder_layers", 6) + args.decoder_attention_heads = getattr(args, "decoder_attention_heads", 12) + bart_large_architecture(args) + + +@register_model_architecture("bart", "mbart_large") +def mbart_large_architecture(args): + args.no_scale_embedding = getattr(args, "no_scale_embedding", False) + bart_large_architecture(args) + + +@register_model_architecture("bart", "mbart_base") +def mbart_base_architecture(args): + args.no_scale_embedding = getattr(args, "no_scale_embedding", False) + bart_base_architecture(args) + + +@register_model_architecture("bart", "mbart_base_wmt20") +def mbart_base_wmt20_architecture(args): + args.layernorm_embedding = getattr(args, "layernorm_embedding", False) + mbart_base_architecture(args) diff --git a/SpeechT5/fairseq/fairseq/models/composite_encoder.py b/SpeechT5/fairseq/fairseq/models/composite_encoder.py new file mode 100644 index 0000000000000000000000000000000000000000..4e20fe3a833a2d87876cbec294ad2bebfba7f591 --- /dev/null +++ b/SpeechT5/fairseq/fairseq/models/composite_encoder.py @@ -0,0 +1,57 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +from .fairseq_encoder import FairseqEncoder + + +class CompositeEncoder(FairseqEncoder): + """ + A wrapper around a dictionary of :class:`FairseqEncoder` objects. + + We run forward on each encoder and return a dictionary of outputs. The first + encoder's dictionary is used for initialization. + + Args: + encoders (dict): a dictionary of :class:`FairseqEncoder` objects. 
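A toy sketch of the wrapper described in this docstring (its implementation continues just below). The sub-encoder names and dimensions are made up, and `ToyEncoder` is a hypothetical stand-in for a real FairseqEncoder.

```python
import torch
import torch.nn as nn

from fairseq.data import Dictionary
from fairseq.models import CompositeEncoder, FairseqEncoder


class ToyEncoder(FairseqEncoder):
    """Minimal encoder: embeds tokens and returns them as 'encoder_out'."""

    def __init__(self, dictionary, embed_dim=8):
        super().__init__(dictionary)
        self.embed = nn.Embedding(len(dictionary), embed_dim, padding_idx=dictionary.pad())

    def forward(self, src_tokens, src_lengths=None, **kwargs):
        return {"encoder_out": self.embed(src_tokens)}

    def max_positions(self):
        return 1024


d = Dictionary()
for sym in ["hello", "world"]:
    d.add_symbol(sym)

# CompositeEncoder runs every sub-encoder and returns a dict keyed by name.
enc = CompositeEncoder({"char": ToyEncoder(d), "word": ToyEncoder(d, embed_dim=16)})
tokens = torch.tensor([[d.index("hello"), d.index("world")]])
out = enc(tokens, src_lengths=torch.tensor([2]))
print(out["char"]["encoder_out"].shape, out["word"]["encoder_out"].shape)
```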
+ """ + + def __init__(self, encoders): + super().__init__(next(iter(encoders.values())).dictionary) + self.encoders = encoders + for key in self.encoders: + self.add_module(key, self.encoders[key]) + + def forward(self, src_tokens, src_lengths): + """ + Args: + src_tokens (LongTensor): tokens in the source language of shape + `(batch, src_len)` + src_lengths (LongTensor): lengths of each source sentence of shape + `(batch)` + + Returns: + dict: + the outputs from each Encoder + """ + encoder_out = {} + for key in self.encoders: + encoder_out[key] = self.encoders[key](src_tokens, src_lengths) + return encoder_out + + def reorder_encoder_out(self, encoder_out, new_order): + """Reorder encoder output according to new_order.""" + for key in self.encoders: + encoder_out[key] = self.encoders[key].reorder_encoder_out( + encoder_out[key], new_order + ) + return encoder_out + + def max_positions(self): + return min(self.encoders[key].max_positions() for key in self.encoders) + + def upgrade_state_dict(self, state_dict): + for key in self.encoders: + self.encoders[key].upgrade_state_dict(state_dict) + return state_dict diff --git a/SpeechT5/fairseq/fairseq/models/distributed_fairseq_model.py b/SpeechT5/fairseq/fairseq/models/distributed_fairseq_model.py new file mode 100644 index 0000000000000000000000000000000000000000..06905455fd615ea962d8478c6093e7b4bbcc83c4 --- /dev/null +++ b/SpeechT5/fairseq/fairseq/models/distributed_fairseq_model.py @@ -0,0 +1,145 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +import logging +import os +import signal +import threading + +import torch +import torch.nn as nn +from torch.nn.parallel import DistributedDataParallel + +from fairseq.distributed import ( + DistributedTimeoutWrapper, + LegacyDistributedDataParallel, + ModuleProxyWrapper, + TPUDistributedDataParallel, +) + + +logger = logging.getLogger(__name__) + + +_GOSSIP_DISABLED = False +try: + import gossip +except ImportError: + _GOSSIP_DISABLED = True + + +def DistributedFairseqModel(args, model, process_group, device): + """ + Wrap a *model* to support distributed data parallel training. + + This is similar to the built-in DistributedDataParallel, but allows + additional configuration of the DistributedDataParallel class to + use, and also provides easier access to the wrapped model by + forwarding requests for missing attributes to the wrapped model. + + Args: + args (argparse.Namespace): fairseq args + model (BaseFairseqModel): model to wrap + process_group: the c10d process group to be used for distributed data + parallel all-reduction. 
+ device: device to move model to + """ + assert isinstance(model, nn.Module) + if args.tpu: + wrapped_model = TPUDistributedDataParallel( + module=model.to(device), + process_group=process_group, + ) + # forward missing getattr and state_dict/load_state_dict to orig model + wrapped_model = ModuleProxyWrapper(wrapped_model) + elif args.ddp_backend in {"c10d", "pytorch_ddp"}: + wrapped_model = DistributedDataParallel( + module=model.to(device), + device_ids=[args.device_id], + output_device=args.device_id, + broadcast_buffers=args.broadcast_buffers, + bucket_cap_mb=args.bucket_cap_mb, + process_group=process_group, + find_unused_parameters=args.find_unused_parameters, + ) + if args.ddp_comm_hook == "fp16": + logger.info("enable fp16 communication hook in DDP") + try: + from torch.distributed.algorithms.ddp_comm_hooks import ( + register_ddp_comm_hook, + DDPCommHookType, + ) + except: + logger.error( + "Could not import from torch.distributed.algorithms.ddp_comm_hooks; you may need to update your pytorch version" + ) + raise + + register_ddp_comm_hook(DDPCommHookType.FP16_COMPRESS, wrapped_model) + # forward missing getattr and state_dict/load_state_dict to orig model + wrapped_model = ModuleProxyWrapper(wrapped_model) + elif args.ddp_backend in {"no_c10d", "legacy_ddp"}: + wrapped_model = LegacyDistributedDataParallel( + module=model.to(device), + buffer_size=2 ** 28, + process_group=process_group, + ) + # forward missing getattr and state_dict/load_state_dict to orig model + wrapped_model = ModuleProxyWrapper(wrapped_model) + elif args.ddp_backend == "slow_mo": + if _GOSSIP_DISABLED: + raise ImportError( + "Cannot find gossip library. Please install from: " + "github.com/facebookresearch/stochastic_gradient_push" + ) + + # The values of slowmo_momentum below were obtained by tuning on the + # En-De 16 dataset by training the transformer_wmt_en_de_large model + if args.slowmo_momentum is None: + if args.distributed_world_size <= 16: + args.slowmo_momentum = 0.0 + elif args.distributed_world_size <= 32: + args.slowmo_momentum = 0.2 + elif args.distributed_world_size <= 64: + args.slowmo_momentum = 0.5 + else: + args.slowmo_momentum = 0.6 + + wrapped_model = gossip.GossipDataParallel( + module=model.to(device), + device_ids=[args.device_id], + output_device=args.device_id, + broadcast_buffers=args.broadcast_buffers, + nprocs_per_node=args.nprocs_per_node, + slowmo_momentum=args.slowmo_momentum, + localsgd=(args.slowmo_algorithm == "LocalSGD"), + localsgd_frequency=args.localsgd_frequency, + ) + # forward missing getattr and state_dict/load_state_dict to orig model + wrapped_model = ModuleProxyWrapper(wrapped_model) + elif args.ddp_backend == "fully_sharded": + try: + from fairscale.nn.data_parallel import FullyShardedDataParallel as FSDP + except ImportError: + raise ImportError( + "Cannot find FullyShardedDataParallel. 
" + "Please install fairscale with: pip install fairscale" + ) + assert isinstance(model, FSDP), "expected model to already be wrapped in FSDP" + wrapped_model = model + if args.memory_efficient_fp16: + wrapped_model = wrapped_model.half() + if not args.cpu_offload: + wrapped_model = wrapped_model.to(device=device) + else: + raise ValueError("Unknown --ddp-backend: " + args.ddp_backend) + + # kill hung distributed jobs after a timeout + if getattr(args, "heartbeat_timeout", -1) > 0: + wrapped_model = DistributedTimeoutWrapper( + wrapped_model, timeout=getattr(args, "heartbeat_timeout", -1) + ) + + return wrapped_model diff --git a/SpeechT5/fairseq/fairseq/models/fairseq_decoder.py b/SpeechT5/fairseq/fairseq/models/fairseq_decoder.py new file mode 100644 index 0000000000000000000000000000000000000000..4f1e8b52a2e0a50199050f11cc613ab02ca9febe --- /dev/null +++ b/SpeechT5/fairseq/fairseq/models/fairseq_decoder.py @@ -0,0 +1,105 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +from typing import Dict, List, Optional, Tuple + +import torch.nn as nn +from fairseq import utils +from torch import Tensor + + +class FairseqDecoder(nn.Module): + """Base class for decoders.""" + + def __init__(self, dictionary): + super().__init__() + self.dictionary = dictionary + self.onnx_trace = False + self.adaptive_softmax = None + + + def forward(self, prev_output_tokens, encoder_out=None, **kwargs): + """ + Args: + prev_output_tokens (LongTensor): shifted output tokens of shape + `(batch, tgt_len)`, for teacher forcing + encoder_out (dict, optional): output from the encoder, used for + encoder-side attention + + Returns: + tuple: + - the decoder's output of shape `(batch, tgt_len, vocab)` + - a dictionary with any model-specific outputs + """ + x, extra = self.extract_features( + prev_output_tokens, encoder_out=encoder_out, **kwargs + ) + x = self.output_layer(x) + return x, extra + + def extract_features(self, prev_output_tokens, encoder_out=None, **kwargs): + """ + Returns: + tuple: + - the decoder's features of shape `(batch, tgt_len, embed_dim)` + - a dictionary with any model-specific outputs + """ + raise NotImplementedError + + def output_layer(self, features, **kwargs): + """ + Project features to the default output size, e.g., vocabulary size. + + Args: + features (Tensor): features returned by *extract_features*. + """ + raise NotImplementedError + + def get_normalized_probs( + self, + net_output: Tuple[Tensor, Optional[Dict[str, List[Optional[Tensor]]]]], + log_probs: bool, + sample: Optional[Dict[str, Tensor]] = None, + ): + """Get normalized probabilities (or log probs) from a net's output.""" + return self.get_normalized_probs_scriptable(net_output, log_probs, sample) + + # TorchScript doesn't support super() method so that the scriptable Subclass + # can't access the base class model in Torchscript. + # Current workaround is to add a helper function with different name and + # call the helper function from scriptable Subclass. 
+ def get_normalized_probs_scriptable( + self, + net_output: Tuple[Tensor, Optional[Dict[str, List[Optional[Tensor]]]]], + log_probs: bool, + sample: Optional[Dict[str, Tensor]] = None, + ): + """Get normalized probabilities (or log probs) from a net's output.""" + + if hasattr(self, "adaptive_softmax") and self.adaptive_softmax is not None: + if sample is not None: + assert "target" in sample + target = sample["target"] + else: + target = None + out = self.adaptive_softmax.get_log_prob(net_output[0], target=target) + return out.exp_() if not log_probs else out + + logits = net_output[0] + if log_probs: + return utils.log_softmax(logits, dim=-1, onnx_trace=self.onnx_trace) + else: + return utils.softmax(logits, dim=-1, onnx_trace=self.onnx_trace) + + def max_positions(self): + """Maximum input length supported by the decoder.""" + return 1e6 # an arbitrary large number + + def upgrade_state_dict_named(self, state_dict, name): + """Upgrade old state dicts to work with newer code.""" + return state_dict + + def prepare_for_onnx_export_(self): + self.onnx_trace = True diff --git a/SpeechT5/fairseq/fairseq/models/fairseq_encoder.py b/SpeechT5/fairseq/fairseq/models/fairseq_encoder.py new file mode 100644 index 0000000000000000000000000000000000000000..08cbde15a46e9b6d58e11c2f6052e7cf2d0cc8b2 --- /dev/null +++ b/SpeechT5/fairseq/fairseq/models/fairseq_encoder.py @@ -0,0 +1,92 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +from typing import Dict, List, NamedTuple, Optional + +import torch +import torch.nn as nn +from torch import Tensor + + +EncoderOut = NamedTuple( + "EncoderOut", + [ + ("encoder_out", Tensor), # T x B x C + ("encoder_padding_mask", Optional[Tensor]), # B x T + ("encoder_embedding", Optional[Tensor]), # B x T x C + ("encoder_states", Optional[List[Tensor]]), # List[T x B x C] + ("src_tokens", Optional[Tensor]), # B x T + ("src_lengths", Optional[Tensor]), # B x 1 + ], +) + + +class FairseqEncoder(nn.Module): + """Base class for encoders.""" + + def __init__(self, dictionary): + super().__init__() + self.dictionary = dictionary + + def forward(self, src_tokens, src_lengths=None, **kwargs): + """ + Args: + src_tokens (LongTensor): tokens in the source language of shape + `(batch, src_len)` + src_lengths (LongTensor): lengths of each source sentence of shape + `(batch)` + """ + raise NotImplementedError + + def forward_torchscript(self, net_input: Dict[str, Tensor]): + """A TorchScript-compatible version of forward. + + Encoders which use additional arguments may want to override + this method for TorchScript compatibility. + """ + if torch.jit.is_scripting(): + return self.forward( + src_tokens=net_input["src_tokens"], + src_lengths=net_input["src_lengths"], + ) + else: + return self.forward_non_torchscript(net_input) + + @torch.jit.unused + def forward_non_torchscript(self, net_input: Dict[str, Tensor]): + encoder_input = { + k: v for k, v in net_input.items() if k != "prev_output_tokens" + } + return self.forward(**encoder_input) + + def reorder_encoder_out(self, encoder_out, new_order): + """ + Reorder encoder output according to `new_order`. 
+ + Args: + encoder_out: output from the ``forward()`` method + new_order (LongTensor): desired order + + Returns: + `encoder_out` rearranged according to `new_order` + """ + raise NotImplementedError + + def max_positions(self): + """Maximum input length supported by the encoder.""" + return 1e6 # an arbitrary large number + + def upgrade_state_dict_named(self, state_dict, name): + """Upgrade old state dicts to work with newer code.""" + return state_dict + + def set_num_updates(self, num_updates): + """State from trainer to pass along to model at every update.""" + + def _apply(m): + if hasattr(m, "set_num_updates") and m != self: + m.set_num_updates(num_updates) + + self.apply(_apply) diff --git a/SpeechT5/fairseq/fairseq/models/fairseq_incremental_decoder.py b/SpeechT5/fairseq/fairseq/models/fairseq_incremental_decoder.py new file mode 100644 index 0000000000000000000000000000000000000000..cc72a0f8f3da238a8ce846240e5008d91ce1bc1a --- /dev/null +++ b/SpeechT5/fairseq/fairseq/models/fairseq_incremental_decoder.py @@ -0,0 +1,118 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +import logging +from typing import Dict, Optional + +from fairseq.incremental_decoding_utils import with_incremental_state +from fairseq.models import FairseqDecoder +from torch import Tensor + + +logger = logging.getLogger(__name__) + + +@with_incremental_state +class FairseqIncrementalDecoder(FairseqDecoder): + """Base class for incremental decoders. + + Incremental decoding is a special mode at inference time where the Model + only receives a single timestep of input corresponding to the previous + output token (for teacher forcing) and must produce the next output + *incrementally*. Thus the model must cache any long-term state that is + needed about the sequence, e.g., hidden states, convolutional states, etc. + + Compared to the standard :class:`FairseqDecoder` interface, the incremental + decoder interface allows :func:`forward` functions to take an extra keyword + argument (*incremental_state*) that can be used to cache state across + time-steps. + + The :class:`FairseqIncrementalDecoder` interface also defines the + :func:`reorder_incremental_state` method, which is used during beam search + to select and reorder the incremental state based on the selection of beams. + + To learn more about how incremental decoding works, refer to `this blog + <http://www.telesens.co/2019/04/21/understanding-incremental-decoding-in-fairseq/>`_. 
+ """ + + def __init__(self, dictionary): + super().__init__(dictionary) + + def forward( + self, prev_output_tokens, encoder_out=None, incremental_state=None, **kwargs + ): + """ + Args: + prev_output_tokens (LongTensor): shifted output tokens of shape + `(batch, tgt_len)`, for teacher forcing + encoder_out (dict, optional): output from the encoder, used for + encoder-side attention + incremental_state (dict, optional): dictionary used for storing + state during :ref:`Incremental decoding` + + Returns: + tuple: + - the decoder's output of shape `(batch, tgt_len, vocab)` + - a dictionary with any model-specific outputs + """ + raise NotImplementedError + + def extract_features( + self, prev_output_tokens, encoder_out=None, incremental_state=None, **kwargs + ): + """ + Returns: + tuple: + - the decoder's features of shape `(batch, tgt_len, embed_dim)` + - a dictionary with any model-specific outputs + """ + raise NotImplementedError + + def reorder_incremental_state( + self, + incremental_state: Dict[str, Dict[str, Optional[Tensor]]], + new_order: Tensor, + ): + """Reorder incremental state. + + This will be called when the order of the input has changed from the + previous time step. A typical use case is beam search, where the input + order changes between time steps based on the selection of beams. + """ + pass + + def reorder_incremental_state_scripting( + self, + incremental_state: Dict[str, Dict[str, Optional[Tensor]]], + new_order: Tensor, + ): + """Main entry point for reordering the incremental state. + + Due to limitations in TorchScript, we call this function in + :class:`fairseq.sequence_generator.SequenceGenerator` instead of + calling :func:`reorder_incremental_state` directly. + """ + for module in self.modules(): + if hasattr(module, "reorder_incremental_state"): + result = module.reorder_incremental_state(incremental_state, new_order) + if result is not None: + incremental_state = result + + def set_beam_size(self, beam_size): + """Sets the beam size in the decoder and all children.""" + if getattr(self, "_beam_size", -1) != beam_size: + seen = set() + + def apply_set_beam_size(module): + if ( + module != self + and hasattr(module, "set_beam_size") + and module not in seen + ): + seen.add(module) + module.set_beam_size(beam_size) + + self.apply(apply_set_beam_size) + self._beam_size = beam_size diff --git a/SpeechT5/fairseq/fairseq/models/fairseq_model.py b/SpeechT5/fairseq/fairseq/models/fairseq_model.py new file mode 100644 index 0000000000000000000000000000000000000000..e55c7ba1ad90f4e2f12db6c814d04a90c4e3b77c --- /dev/null +++ b/SpeechT5/fairseq/fairseq/models/fairseq_model.py @@ -0,0 +1,569 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. +""" +Base classes for various fairseq models. 
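The caching pattern described in the FairseqIncrementalDecoder docstring above can be made concrete with a toy subclass. This is a sketch under assumptions: the cache key `"toy_state"` and the "running sum of embeddings" state are purely illustrative, and the `get_incremental_state` / `set_incremental_state` helpers are the ones added by the `with_incremental_state` decorator.

```python
from typing import Optional

import torch.nn as nn
from torch import Tensor

from fairseq.data import Dictionary
from fairseq.models import FairseqIncrementalDecoder


class ToyIncrementalDecoder(FairseqIncrementalDecoder):
    """Caches a running sum of output embeddings as stand-in long-term state."""

    def __init__(self, dictionary: Dictionary, embed_dim: int = 16):
        super().__init__(dictionary)
        self.embed = nn.Embedding(len(dictionary), embed_dim, padding_idx=dictionary.pad())
        self.proj = nn.Linear(embed_dim, len(dictionary))

    def forward(self, prev_output_tokens, encoder_out=None, incremental_state=None, **kwargs):
        if incremental_state is not None:
            # Incremental mode: only the newest token matters; fetch the cached
            # state, advance it by one step, and write it back.
            cached = self.get_incremental_state(incremental_state, "toy_state")
            prev_sum: Optional[Tensor] = cached["sum"] if cached is not None else None
            step = self.embed(prev_output_tokens[:, -1:])
            state = step if prev_sum is None else prev_sum + step
            self.set_incremental_state(incremental_state, "toy_state", {"sum": state})
        else:
            # Full (teacher-forced) mode: process the whole prefix at once.
            state = self.embed(prev_output_tokens).cumsum(dim=1)
        return self.proj(state), None

    def reorder_incremental_state(self, incremental_state, new_order):
        # Called by beam search when hypotheses are re-ranked between steps.
        cached = self.get_incremental_state(incremental_state, "toy_state")
        if cached is not None and cached["sum"] is not None:
            cached["sum"] = cached["sum"].index_select(0, new_order)
            self.set_incremental_state(incremental_state, "toy_state", cached)
```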
+""" + +import logging +from argparse import Namespace +from typing import Dict, List, Optional, Tuple + +import torch +import torch.nn as nn +import torch.nn.functional as F +from fairseq import utils +from fairseq.data import Dictionary +from fairseq.dataclass.utils import ( + convert_namespace_to_omegaconf, + gen_parser_from_dataclass, +) +from fairseq.models import FairseqDecoder, FairseqEncoder +from omegaconf import DictConfig +from torch import Tensor + + +logger = logging.getLogger(__name__) + + +def check_type(module, expected_type): + if hasattr(module, "unwrapped_module"): + assert isinstance(module.unwrapped_module, expected_type), \ + f"{type(module.unwrapped_module)} != {expected_type}" + else: + assert isinstance(module, expected_type), f"{type(module)} != {expected_type}" + + +class BaseFairseqModel(nn.Module): + """Base class for fairseq models.""" + + def __init__(self): + super().__init__() + self._is_generation_fast = False + + @classmethod + def add_args(cls, parser): + """Add model-specific arguments to the parser.""" + dc = getattr(cls, "__dataclass", None) + if dc is not None: + # do not set defaults so that settings defaults from various architectures still works + gen_parser_from_dataclass(parser, dc(), delete_default=True) + + @classmethod + def build_model(cls, args, task): + """Build a new model instance.""" + raise NotImplementedError("Model must implement the build_model method") + + def get_targets(self, sample, net_output): + """Get targets from either the sample or the net's output.""" + return sample["target"] + + def get_normalized_probs( + self, + net_output: Tuple[Tensor, Optional[Dict[str, List[Optional[Tensor]]]]], + log_probs: bool, + sample: Optional[Dict[str, Tensor]] = None, + ): + """Get normalized probabilities (or log probs) from a net's output.""" + return self.get_normalized_probs_scriptable(net_output, log_probs, sample) + + # TorchScript doesn't support super() method so that the scriptable Subclass + # can't access the base class model in Torchscript. + # Current workaround is to add a helper function with different name and + # call the helper function from scriptable Subclass. + def get_normalized_probs_scriptable( + self, + net_output: Tuple[Tensor, Optional[Dict[str, List[Optional[Tensor]]]]], + log_probs: bool, + sample: Optional[Dict[str, Tensor]] = None, + ): + """Scriptable helper function for get_normalized_probs in ~BaseFairseqModel""" + if hasattr(self, "decoder"): + return self.decoder.get_normalized_probs(net_output, log_probs, sample) + elif torch.is_tensor(net_output): + # syntactic sugar for simple models which don't have a decoder + # (e.g., the classification tutorial) + logits = net_output.float() + if log_probs: + return F.log_softmax(logits, dim=-1) + else: + return F.softmax(logits, dim=-1) + raise NotImplementedError + + def extract_features(self, *args, **kwargs): + """Similar to *forward* but only return features.""" + return self(*args, **kwargs) + + def max_positions(self): + """Maximum length supported by the model.""" + return None + + def load_state_dict( + self, + state_dict, + strict=True, + model_cfg: Optional[DictConfig] = None, + args: Optional[Namespace] = None, + ): + """Copies parameters and buffers from *state_dict* into this module and + its descendants. + + Overrides the method in :class:`nn.Module`. Compared with that method + this additionally "upgrades" *state_dicts* from old checkpoints. 
+ """ + + if model_cfg is None and args is not None: + logger.warn("using 'args' is deprecated, please update your code to use dataclass config") + model_cfg = convert_namespace_to_omegaconf(args).model + + self.upgrade_state_dict(state_dict) + + from fairseq.checkpoint_utils import prune_state_dict + + new_state_dict = prune_state_dict(state_dict, model_cfg) + return super().load_state_dict(new_state_dict, strict) + + def upgrade_state_dict(self, state_dict): + """Upgrade old state dicts to work with newer code.""" + self.upgrade_state_dict_named(state_dict, "") + + def upgrade_state_dict_named(self, state_dict, name): + """Upgrade old state dicts to work with newer code. + + Args: + state_dict (dict): state dictionary to upgrade, in place + name (str): the state dict key corresponding to the current module + """ + assert state_dict is not None + + def do_upgrade(m, prefix): + if len(prefix) > 0: + prefix += "." + + for n, c in m.named_children(): + name = prefix + n + if hasattr(c, "upgrade_state_dict_named"): + c.upgrade_state_dict_named(state_dict, name) + elif hasattr(c, "upgrade_state_dict"): + c.upgrade_state_dict(state_dict) + do_upgrade(c, name) + + do_upgrade(self, name) + + def set_num_updates(self, num_updates): + """State from trainer to pass along to model at every update.""" + for m in self.modules(): + if hasattr(m, "set_num_updates") and m != self: + m.set_num_updates(num_updates) + + def prepare_for_inference_(self, cfg: DictConfig): + """Prepare model for inference.""" + kwargs = {} + kwargs["beamable_mm_beam_size"] = ( + None + if getattr(cfg.generation, "no_beamable_mm", False) + else getattr(cfg.generation, "beam", 5) + ) + kwargs["need_attn"] = getattr(cfg.generation, "print_alignment", False) + if getattr(cfg.generation, "retain_dropout", False): + kwargs["retain_dropout"] = cfg.generation.retain_dropout + kwargs["retain_dropout_modules"] = cfg.generation.retain_dropout_modules + self.make_generation_fast_(**kwargs) + + def make_generation_fast_(self, **kwargs): + """ + Legacy entry point to optimize model for faster generation. + Prefer prepare_for_inference_. + """ + if self._is_generation_fast: + return # only apply once + self._is_generation_fast = True + + # remove weight norm from all modules in the network + def apply_remove_weight_norm(module): + try: + nn.utils.remove_weight_norm(module) + except (AttributeError, ValueError): # this module didn't have weight norm + return + + self.apply(apply_remove_weight_norm) + + def apply_make_generation_fast_(module, prefix): + if len(prefix) > 0: + prefix += "." 
+ + base_func = BaseFairseqModel.make_generation_fast_ + for n, m in module.named_modules(): + if ( + m != self + and hasattr(m, "make_generation_fast_") + # don't call this implementation again, e.g., if + # children modules also inherit from BaseFairseqModel + and m.make_generation_fast_.__func__ is not base_func + ): + name = prefix + n + m.make_generation_fast_(name=name, **kwargs) + + apply_make_generation_fast_(self, "") + + def train(mode=True): + if mode: + raise RuntimeError("cannot train after make_generation_fast") + + # this model should no longer be used for training + self.eval() + self.train = train + + def prepare_for_onnx_export_(self, **kwargs): + """Make model exportable via ONNX trace.""" + seen = set() + + def apply_prepare_for_onnx_export_(module): + if ( + module != self + and hasattr(module, "prepare_for_onnx_export_") + and module not in seen + ): + seen.add(module) + module.prepare_for_onnx_export_(**kwargs) + + self.apply(apply_prepare_for_onnx_export_) + + @classmethod + def from_pretrained( + cls, + model_name_or_path, + checkpoint_file="model.pt", + data_name_or_path=".", + **kwargs, + ): + """ + Load a :class:`~fairseq.models.FairseqModel` from a pre-trained model + file. Downloads and caches the pre-trained model file if needed. + + The base implementation returns a + :class:`~fairseq.hub_utils.GeneratorHubInterface`, which can be used to + generate translations or sample from language models. The underlying + :class:`~fairseq.models.FairseqModel` can be accessed via the + *generator.models* attribute. + + Other models may override this to implement custom hub interfaces. + + Args: + model_name_or_path (str): either the name of a pre-trained model to + load or a path/URL to a pre-trained model state dict + checkpoint_file (str, optional): colon-separated list of checkpoint + files in the model archive to ensemble (default: 'model.pt') + data_name_or_path (str, optional): point args.data to the archive + at the given path/URL. Can start with '.' or './' to reuse the + model archive path. + """ + from fairseq import hub_utils + + x = hub_utils.from_pretrained( + model_name_or_path, + checkpoint_file, + data_name_or_path, + archive_map=cls.hub_models(), + **kwargs, + ) + logger.info(x["args"]) + return hub_utils.GeneratorHubInterface(x["args"], x["task"], x["models"]) + + @classmethod + def hub_models(cls): + return {} + + +class FairseqEncoderDecoderModel(BaseFairseqModel): + """Base class for encoder-decoder models. + + Args: + encoder (FairseqEncoder): the encoder + decoder (FairseqDecoder): the decoder + """ + + def __init__(self, encoder, decoder): + super().__init__() + + self.encoder = encoder + self.decoder = decoder + + check_type(self.encoder, FairseqEncoder) + check_type(self.decoder, FairseqDecoder) + + def forward(self, src_tokens, src_lengths, prev_output_tokens, **kwargs): + """ + Run the forward pass for an encoder-decoder model. + + First feed a batch of source tokens through the encoder. 
Then, feed the + encoder output and previous decoder outputs (i.e., teacher forcing) to + the decoder to produce the next outputs:: + + encoder_out = self.encoder(src_tokens, src_lengths) + return self.decoder(prev_output_tokens, encoder_out) + + Args: + src_tokens (LongTensor): tokens in the source language of shape + `(batch, src_len)` + src_lengths (LongTensor): source sentence lengths of shape `(batch)` + prev_output_tokens (LongTensor): previous decoder outputs of shape + `(batch, tgt_len)`, for teacher forcing + + Returns: + tuple: + - the decoder's output of shape `(batch, tgt_len, vocab)` + - a dictionary with any model-specific outputs + """ + encoder_out = self.encoder(src_tokens, src_lengths=src_lengths, **kwargs) + decoder_out = self.decoder( + prev_output_tokens, encoder_out=encoder_out, **kwargs + ) + return decoder_out + + def forward_decoder(self, prev_output_tokens, **kwargs): + return self.decoder(prev_output_tokens, **kwargs) + + def extract_features(self, src_tokens, src_lengths, prev_output_tokens, **kwargs): + """ + Similar to *forward* but only return features. + + Returns: + tuple: + - the decoder's features of shape `(batch, tgt_len, embed_dim)` + - a dictionary with any model-specific outputs + """ + encoder_out = self.encoder(src_tokens, src_lengths=src_lengths, **kwargs) + features = self.decoder.extract_features( + prev_output_tokens, encoder_out=encoder_out, **kwargs + ) + return features + + def output_layer(self, features, **kwargs): + """Project features to the default output size (typically vocabulary size).""" + return self.decoder.output_layer(features, **kwargs) + + def max_positions(self): + """Maximum length supported by the model.""" + return (self.encoder.max_positions(), self.decoder.max_positions()) + + def max_decoder_positions(self): + """Maximum length supported by the decoder.""" + return self.decoder.max_positions() + + +class FairseqModel(FairseqEncoderDecoderModel): + def __init__(self, *args, **kwargs): + super().__init__(*args, **kwargs) + utils.deprecation_warning( + "FairseqModel is deprecated, please use FairseqEncoderDecoderModel " + "or BaseFairseqModel instead", + stacklevel=4, + ) + + +class FairseqMultiModel(BaseFairseqModel): + """Base class for combining multiple encoder-decoder models.""" + + def __init__(self, encoders, decoders): + super().__init__() + assert encoders.keys() == decoders.keys() + self.keys = list(encoders.keys()) + for key in self.keys: + check_type(encoders[key], FairseqEncoder) + check_type(decoders[key], FairseqDecoder) + + self.models = nn.ModuleDict( + { + key: FairseqEncoderDecoderModel(encoders[key], decoders[key]) + for key in self.keys + } + ) + + @staticmethod + def build_shared_embeddings( + dicts: Dict[str, Dictionary], + langs: List[str], + embed_dim: int, + build_embedding: callable, + pretrained_embed_path: Optional[str] = None, + ): + """ + Helper function to build shared embeddings for a set of languages after + checking that all dicts corresponding to those languages are equivalent. 
+ + Args: + dicts: Dict of lang_id to its corresponding Dictionary + langs: languages that we want to share embeddings for + embed_dim: embedding dimension + build_embedding: callable function to actually build the embedding + pretrained_embed_path: Optional path to load pretrained embeddings + """ + shared_dict = dicts[langs[0]] + if any(dicts[lang] != shared_dict for lang in langs): + raise ValueError( + "--share-*-embeddings requires a joined dictionary: " + "--share-encoder-embeddings requires a joined source " + "dictionary, --share-decoder-embeddings requires a joined " + "target dictionary, and --share-all-embeddings requires a " + "joint source + target dictionary." + ) + return build_embedding(shared_dict, embed_dim, pretrained_embed_path) + + def forward(self, src_tokens, src_lengths, prev_output_tokens, **kwargs): + raise NotImplementedError + + def max_positions(self): + """Maximum length supported by the model.""" + return { + key: ( + self.models[key].encoder.max_positions(), + self.models[key].decoder.max_positions(), + ) + for key in self.keys + } + + def max_decoder_positions(self): + """Maximum length supported by the decoder.""" + return min(model.decoder.max_positions() for model in self.models.values()) + + @property + def encoder(self): + return self.models[self.keys[0]].encoder + + @property + def decoder(self): + return self.models[self.keys[0]].decoder + + def forward_decoder(self, prev_output_tokens, **kwargs): + return self.decoder(prev_output_tokens, **kwargs) + + def load_state_dict( + self, + state_dict, + strict=True, + model_cfg=None, + args: Optional[Namespace] = None, + ): + """Copies parameters and buffers from *state_dict* into this module and + its descendants. + + Overrides the method in :class:`nn.Module`. Compared with that method + this additionally "upgrades" *state_dicts* from old checkpoints. + """ + + if model_cfg is None and args is not None: + logger.warn("using 'args' is deprecated, please update your code to use dataclass config") + model_cfg = convert_namespace_to_omegaconf(args).model + + self.upgrade_state_dict(state_dict) + + from fairseq.checkpoint_utils import prune_state_dict + + new_state_dict = prune_state_dict(state_dict, model_cfg) + return super().load_state_dict(new_state_dict, strict) + + +class FairseqLanguageModel(BaseFairseqModel): + """Base class for decoder-only models. + + Args: + decoder (FairseqDecoder): the decoder + """ + + def __init__(self, decoder): + super().__init__() + self.decoder = decoder + check_type(self.decoder, FairseqDecoder) + + def forward(self, src_tokens, **kwargs): + """ + Run the forward pass for a decoder-only model. + + Feeds a batch of tokens through the decoder to predict the next tokens. + + Args: + src_tokens (LongTensor): tokens on which to condition the decoder, + of shape `(batch, tgt_len)` + src_lengths (LongTensor): source sentence lengths of shape `(batch)` + + Returns: + tuple: + - the decoder's output of shape `(batch, seq_len, vocab)` + - a dictionary with any model-specific outputs + """ + return self.decoder(src_tokens, **kwargs) + + def forward_decoder(self, prev_output_tokens, **kwargs): + return self.decoder(prev_output_tokens, **kwargs) + + def extract_features(self, src_tokens, **kwargs): + """ + Similar to *forward* but only return features. 
+ + Returns: + tuple: + - the decoder's features of shape `(batch, seq_len, embed_dim)` + - a dictionary with any model-specific outputs + """ + return self.decoder.extract_features(src_tokens, **kwargs) + + def output_layer(self, features, **kwargs): + """Project features to the default output size (typically vocabulary size).""" + return self.decoder.output_layer(features, **kwargs) + + def max_positions(self): + """Maximum length supported by the model.""" + return self.decoder.max_positions() + + def max_decoder_positions(self): + """Maximum length supported by the decoder.""" + return self.decoder.max_positions() + + @property + def supported_targets(self): + return {"future"} + + +class FairseqEncoderModel(BaseFairseqModel): + """Base class for encoder-only models. + + Args: + encoder (FairseqEncoder): the encoder + """ + + def __init__(self, encoder): + super().__init__() + self.encoder = encoder + check_type(self.encoder, FairseqEncoder) + + def forward(self, src_tokens, src_lengths, **kwargs): + """ + Run the forward pass for a encoder-only model. + + Feeds a batch of tokens through the encoder to generate features. + + Args: + src_tokens (LongTensor): input tokens of shape `(batch, src_len)` + src_lengths (LongTensor): source sentence lengths of shape `(batch)` + + Returns: + the encoder's output, typically of shape `(batch, src_len, features)` + """ + return self.encoder(src_tokens, src_lengths, **kwargs) + + def get_normalized_probs(self, net_output, log_probs, sample=None): + """Get normalized probabilities (or log probs) from a net's output.""" + encoder_out = net_output["encoder_out"] + if torch.is_tensor(encoder_out): + logits = encoder_out.float() + if log_probs: + return F.log_softmax(logits, dim=-1) + else: + return F.softmax(logits, dim=-1) + raise NotImplementedError + + def max_positions(self): + """Maximum length supported by the model.""" + return self.encoder.max_positions() diff --git a/SpeechT5/fairseq/fairseq/models/fconv.py b/SpeechT5/fairseq/fairseq/models/fconv.py new file mode 100644 index 0000000000000000000000000000000000000000..c99a2151014d816ec9aff6f4b27d71224dd7b4cf --- /dev/null +++ b/SpeechT5/fairseq/fairseq/models/fconv.py @@ -0,0 +1,756 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +import math + +import torch +import torch.nn as nn +import torch.nn.functional as F +from fairseq import utils +from fairseq.models import ( + FairseqEncoder, + FairseqEncoderDecoderModel, + FairseqIncrementalDecoder, + register_model, + register_model_architecture, +) +from fairseq.modules import ( + AdaptiveSoftmax, + BeamableMM, + FairseqDropout, + GradMultiply, + LearnedPositionalEmbedding, + LinearizedConvolution, +) + + +@register_model("fconv") +class FConvModel(FairseqEncoderDecoderModel): + """ + A fully convolutional model, i.e. a convolutional encoder and a + convolutional decoder, as described in `"Convolutional Sequence to Sequence + Learning" (Gehring et al., 2017) <https://arxiv.org/abs/1705.03122>`_. + + Args: + encoder (FConvEncoder): the encoder + decoder (FConvDecoder): the decoder + + The Convolutional model provides the following named architectures and + command-line arguments: + + .. 
argparse:: + :ref: fairseq.models.fconv_parser + :prog: + """ + + @classmethod + def hub_models(cls): + def moses_subword(path): + return { + "path": path, + "tokenizer": "moses", + "bpe": "subword_nmt", + } + + return { + "conv.wmt14.en-fr": moses_subword( + "https://dl.fbaipublicfiles.com/fairseq/models/wmt14.v2.en-fr.fconv-py.tar.bz2" + ), + "conv.wmt14.en-de": moses_subword( + "https://dl.fbaipublicfiles.com/fairseq/models/wmt14.en-de.fconv-py.tar.bz2" + ), + "conv.wmt17.en-de": moses_subword( + "https://dl.fbaipublicfiles.com/fairseq/models/wmt17.v2.en-de.fconv-py.tar.bz2" + ), + } + + def __init__(self, encoder, decoder): + super().__init__(encoder, decoder) + self.encoder.num_attention_layers = sum( + layer is not None for layer in decoder.attention + ) + + @staticmethod + def add_args(parser): + """Add model-specific arguments to the parser.""" + # fmt: off + parser.add_argument('--dropout', type=float, metavar='D', + help='dropout probability') + parser.add_argument('--encoder-embed-dim', type=int, metavar='N', + help='encoder embedding dimension') + parser.add_argument('--encoder-embed-path', type=str, metavar='STR', + help='path to pre-trained encoder embedding') + parser.add_argument('--encoder-layers', type=str, metavar='EXPR', + help='encoder layers [(dim, kernel_size), ...]') + parser.add_argument('--decoder-embed-dim', type=int, metavar='N', + help='decoder embedding dimension') + parser.add_argument('--decoder-embed-path', type=str, metavar='STR', + help='path to pre-trained decoder embedding') + parser.add_argument('--decoder-layers', type=str, metavar='EXPR', + help='decoder layers [(dim, kernel_size), ...]') + parser.add_argument('--decoder-out-embed-dim', type=int, metavar='N', + help='decoder output embedding dimension') + parser.add_argument('--decoder-attention', type=str, metavar='EXPR', + help='decoder attention [True, ...]') + parser.add_argument('--share-input-output-embed', action='store_true', + help='share input and output embeddings (requires' + ' --decoder-out-embed-dim and --decoder-embed-dim' + ' to be equal)') + # fmt: on + + @classmethod + def build_model(cls, args, task): + """Build a new model instance.""" + # make sure that all args are properly defaulted (in case there are any new ones) + base_architecture(args) + + encoder_embed_dict = None + if args.encoder_embed_path: + encoder_embed_dict = utils.parse_embedding(args.encoder_embed_path) + utils.print_embed_overlap(encoder_embed_dict, task.source_dictionary) + + decoder_embed_dict = None + if args.decoder_embed_path: + decoder_embed_dict = utils.parse_embedding(args.decoder_embed_path) + utils.print_embed_overlap(decoder_embed_dict, task.target_dictionary) + + encoder = FConvEncoder( + dictionary=task.source_dictionary, + embed_dim=args.encoder_embed_dim, + embed_dict=encoder_embed_dict, + convolutions=eval(args.encoder_layers), + dropout=args.dropout, + max_positions=args.max_source_positions, + ) + decoder = FConvDecoder( + dictionary=task.target_dictionary, + embed_dim=args.decoder_embed_dim, + embed_dict=decoder_embed_dict, + convolutions=eval(args.decoder_layers), + out_embed_dim=args.decoder_out_embed_dim, + attention=eval(args.decoder_attention), + dropout=args.dropout, + max_positions=args.max_target_positions, + share_embed=args.share_input_output_embed, + ) + return FConvModel(encoder, decoder) + + +class FConvEncoder(FairseqEncoder): + """ + Convolutional encoder consisting of `len(convolutions)` layers. 
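+    For example, the default ``convolutions=((512, 3),) * 20`` stacks 20
+    convolutional layers, each with 512 output channels and a kernel width of 3.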
+ + Args: + dictionary (~fairseq.data.Dictionary): encoding dictionary + embed_dim (int, optional): embedding dimension + embed_dict (str, optional): filename from which to load pre-trained + embeddings + max_positions (int, optional): maximum supported input sequence length + convolutions (list, optional): the convolutional layer structure. Each + list item `i` corresponds to convolutional layer `i`. Layers are + given as ``(out_channels, kernel_width, [residual])``. Residual + connections are added between layers when ``residual=1`` (which is + the default behavior). + dropout (float, optional): dropout to be applied before each conv layer + """ + + def __init__( + self, + dictionary, + embed_dim=512, + embed_dict=None, + max_positions=1024, + convolutions=((512, 3),) * 20, + dropout=0.1, + ): + super().__init__(dictionary) + self.dropout_module = FairseqDropout( + dropout, module_name=self.__class__.__name__ + ) + self.num_attention_layers = None + + num_embeddings = len(dictionary) + self.padding_idx = dictionary.pad() + self.embed_tokens = Embedding(num_embeddings, embed_dim, self.padding_idx) + if embed_dict: + self.embed_tokens = utils.load_embedding( + embed_dict, self.dictionary, self.embed_tokens + ) + + self.embed_positions = PositionalEmbedding( + max_positions, + embed_dim, + self.padding_idx, + ) + + convolutions = extend_conv_spec(convolutions) + in_channels = convolutions[0][0] + self.fc1 = Linear(embed_dim, in_channels, dropout=dropout) + self.projections = nn.ModuleList() + self.convolutions = nn.ModuleList() + self.residuals = [] + + layer_in_channels = [in_channels] + for _, (out_channels, kernel_size, residual) in enumerate(convolutions): + if residual == 0: + residual_dim = out_channels + else: + residual_dim = layer_in_channels[-residual] + self.projections.append( + Linear(residual_dim, out_channels) + if residual_dim != out_channels + else None + ) + if kernel_size % 2 == 1: + padding = kernel_size // 2 + else: + padding = 0 + self.convolutions.append( + ConvTBC( + in_channels, + out_channels * 2, + kernel_size, + dropout=dropout, + padding=padding, + ) + ) + self.residuals.append(residual) + in_channels = out_channels + layer_in_channels.append(out_channels) + self.fc2 = Linear(in_channels, embed_dim) + + def forward(self, src_tokens, src_lengths): + """ + Args: + src_tokens (LongTensor): tokens in the source language of shape + `(batch, src_len)` + src_lengths (LongTensor): lengths of each source sentence of shape + `(batch)` + + Returns: + dict: + - **encoder_out** (tuple): a tuple with two elements, where the + first element is the last encoder layer's output and the + second element is the same quantity summed with the input + embedding (used for attention). The shape of both tensors is + `(batch, src_len, embed_dim)`. 
+ - **encoder_padding_mask** (ByteTensor): the positions of + padding elements of shape `(batch, src_len)` + """ + # embed tokens and positions + x = self.embed_tokens(src_tokens) + self.embed_positions(src_tokens) + x = self.dropout_module(x) + input_embedding = x + + # project to size of convolution + x = self.fc1(x) + + # used to mask padding in input + encoder_padding_mask = src_tokens.eq(self.padding_idx).t() # -> T x B + if not encoder_padding_mask.any(): + encoder_padding_mask = None + + # B x T x C -> T x B x C + x = x.transpose(0, 1) + + residuals = [x] + # temporal convolutions + for proj, conv, res_layer in zip( + self.projections, self.convolutions, self.residuals + ): + if res_layer > 0: + residual = residuals[-res_layer] + residual = residual if proj is None else proj(residual) + else: + residual = None + + if encoder_padding_mask is not None: + x = x.masked_fill(encoder_padding_mask.unsqueeze(-1), 0) + + x = self.dropout_module(x) + if conv.kernel_size[0] % 2 == 1: + # padding is implicit in the conv + x = conv(x) + else: + padding_l = (conv.kernel_size[0] - 1) // 2 + padding_r = conv.kernel_size[0] // 2 + x = F.pad(x, (0, 0, 0, 0, padding_l, padding_r)) + x = conv(x) + x = F.glu(x, dim=2) + + if residual is not None: + x = (x + residual) * math.sqrt(0.5) + residuals.append(x) + + # T x B x C -> B x T x C + x = x.transpose(1, 0) + + # project back to size of embedding + x = self.fc2(x) + + if encoder_padding_mask is not None: + encoder_padding_mask = encoder_padding_mask.t() # -> B x T + x = x.masked_fill(encoder_padding_mask.unsqueeze(-1), 0) + + # scale gradients (this only affects backward, not forward) + x = GradMultiply.apply(x, 1.0 / (2.0 * self.num_attention_layers)) + + # add output to input embedding for attention + y = (x + input_embedding) * math.sqrt(0.5) + + return { + "encoder_out": (x, y), + "encoder_padding_mask": encoder_padding_mask, # B x T + } + + def reorder_encoder_out(self, encoder_out, new_order): + if encoder_out["encoder_out"] is not None: + encoder_out["encoder_out"] = ( + encoder_out["encoder_out"][0].index_select(0, new_order), + encoder_out["encoder_out"][1].index_select(0, new_order), + ) + if encoder_out["encoder_padding_mask"] is not None: + encoder_out["encoder_padding_mask"] = encoder_out[ + "encoder_padding_mask" + ].index_select(0, new_order) + return encoder_out + + def max_positions(self): + """Maximum input length supported by the encoder.""" + return self.embed_positions.max_positions + + +class AttentionLayer(nn.Module): + def __init__(self, conv_channels, embed_dim, bmm=None): + super().__init__() + # projects from output of convolution to embedding dimension + self.in_projection = Linear(conv_channels, embed_dim) + # projects from embedding dimension to convolution size + self.out_projection = Linear(embed_dim, conv_channels) + + self.bmm = bmm if bmm is not None else torch.bmm + + def forward(self, x, target_embedding, encoder_out, encoder_padding_mask): + residual = x + + # attention + x = (self.in_projection(x) + target_embedding) * math.sqrt(0.5) + x = self.bmm(x, encoder_out[0]) + + # don't attend over padding + if encoder_padding_mask is not None: + x = ( + x.float() + .masked_fill(encoder_padding_mask.unsqueeze(1), float("-inf")) + .type_as(x) + ) # FP16 support: cast to float and back + + # softmax over last dim + sz = x.size() + x = F.softmax(x.view(sz[0] * sz[1], sz[2]), dim=1) + x = x.view(sz) + attn_scores = x + + x = self.bmm(x, encoder_out[1]) + + # scale attention output (respecting potentially different lengths) + s 
= encoder_out[1].size(1) + if encoder_padding_mask is None: + x = x * (s * math.sqrt(1.0 / s)) + else: + s = s - encoder_padding_mask.type_as(x).sum( + dim=1, keepdim=True + ) # exclude padding + s = s.unsqueeze(-1) + x = x * (s * s.rsqrt()) + + # project back + x = (self.out_projection(x) + residual) * math.sqrt(0.5) + return x, attn_scores + + def make_generation_fast_(self, beamable_mm_beam_size=None, **kwargs): + """Replace torch.bmm with BeamableMM.""" + if beamable_mm_beam_size is not None: + del self.bmm + self.add_module("bmm", BeamableMM(beamable_mm_beam_size)) + + +class FConvDecoder(FairseqIncrementalDecoder): + """Convolutional decoder""" + + def __init__( + self, + dictionary, + embed_dim=512, + embed_dict=None, + out_embed_dim=256, + max_positions=1024, + convolutions=((512, 3),) * 20, + attention=True, + dropout=0.1, + share_embed=False, + positional_embeddings=True, + adaptive_softmax_cutoff=None, + adaptive_softmax_dropout=0.0, + ): + super().__init__(dictionary) + self.register_buffer("version", torch.Tensor([2])) + self.dropout_module = FairseqDropout( + dropout, module_name=self.__class__.__name__ + ) + self.need_attn = True + + convolutions = extend_conv_spec(convolutions) + in_channels = convolutions[0][0] + if isinstance(attention, bool): + # expand True into [True, True, ...] and do the same with False + attention = [attention] * len(convolutions) + if not isinstance(attention, list) or len(attention) != len(convolutions): + raise ValueError( + "Attention is expected to be a list of booleans of " + "length equal to the number of layers." + ) + + num_embeddings = len(dictionary) + padding_idx = dictionary.pad() + self.embed_tokens = Embedding(num_embeddings, embed_dim, padding_idx) + if embed_dict: + self.embed_tokens = utils.load_embedding( + embed_dict, self.dictionary, self.embed_tokens + ) + + self.embed_positions = ( + PositionalEmbedding( + max_positions, + embed_dim, + padding_idx, + ) + if positional_embeddings + else None + ) + + self.fc1 = Linear(embed_dim, in_channels, dropout=dropout) + self.projections = nn.ModuleList() + self.convolutions = nn.ModuleList() + self.attention = nn.ModuleList() + self.residuals = [] + + layer_in_channels = [in_channels] + for i, (out_channels, kernel_size, residual) in enumerate(convolutions): + if residual == 0: + residual_dim = out_channels + else: + residual_dim = layer_in_channels[-residual] + self.projections.append( + Linear(residual_dim, out_channels) + if residual_dim != out_channels + else None + ) + self.convolutions.append( + LinearizedConv1d( + in_channels, + out_channels * 2, + kernel_size, + padding=(kernel_size - 1), + dropout=dropout, + ) + ) + self.attention.append( + AttentionLayer(out_channels, embed_dim) if attention[i] else None + ) + self.residuals.append(residual) + in_channels = out_channels + layer_in_channels.append(out_channels) + + self.adaptive_softmax = None + self.fc2 = self.fc3 = None + + if adaptive_softmax_cutoff is not None: + assert not share_embed + self.adaptive_softmax = AdaptiveSoftmax( + num_embeddings, + in_channels, + adaptive_softmax_cutoff, + dropout=adaptive_softmax_dropout, + ) + else: + self.fc2 = Linear(in_channels, out_embed_dim) + if share_embed: + assert out_embed_dim == embed_dim, ( + "Shared embed weights implies same dimensions " + " out_embed_dim={} vs embed_dim={}".format(out_embed_dim, embed_dim) + ) + self.fc3 = nn.Linear(out_embed_dim, num_embeddings) + self.fc3.weight = self.embed_tokens.weight + else: + self.fc3 = Linear(out_embed_dim, num_embeddings, 
dropout=dropout) + + def forward( + self, prev_output_tokens, encoder_out=None, incremental_state=None, **unused + ): + if encoder_out is not None: + encoder_padding_mask = encoder_out["encoder_padding_mask"] + encoder_out = encoder_out["encoder_out"] + + # split and transpose encoder outputs + encoder_a, encoder_b = self._split_encoder_out( + encoder_out, incremental_state + ) + + if self.embed_positions is not None: + pos_embed = self.embed_positions(prev_output_tokens, incremental_state) + else: + pos_embed = 0 + + if incremental_state is not None: + prev_output_tokens = prev_output_tokens[:, -1:] + x = self._embed_tokens(prev_output_tokens, incremental_state) + + # embed tokens and combine with positional embeddings + x += pos_embed + x = self.dropout_module(x) + target_embedding = x + + # project to size of convolution + x = self.fc1(x) + + # B x T x C -> T x B x C + x = self._transpose_if_training(x, incremental_state) + + # temporal convolutions + avg_attn_scores = None + num_attn_layers = len(self.attention) + residuals = [x] + for proj, conv, attention, res_layer in zip( + self.projections, self.convolutions, self.attention, self.residuals + ): + if res_layer > 0: + residual = residuals[-res_layer] + residual = residual if proj is None else proj(residual) + else: + residual = None + + x = self.dropout_module(x) + x = conv(x, incremental_state) + x = F.glu(x, dim=2) + + # attention + if attention is not None: + x = self._transpose_if_training(x, incremental_state) + + x, attn_scores = attention( + x, target_embedding, (encoder_a, encoder_b), encoder_padding_mask + ) + + if not self.training and self.need_attn: + attn_scores = attn_scores / num_attn_layers + if avg_attn_scores is None: + avg_attn_scores = attn_scores + else: + avg_attn_scores.add_(attn_scores) + + x = self._transpose_if_training(x, incremental_state) + + # residual + if residual is not None: + x = (x + residual) * math.sqrt(0.5) + residuals.append(x) + + # T x B x C -> B x T x C + x = self._transpose_if_training(x, incremental_state) + + # project back to size of vocabulary if not using adaptive softmax + if self.fc2 is not None and self.fc3 is not None: + x = self.fc2(x) + x = self.dropout_module(x) + x = self.fc3(x) + + return x, avg_attn_scores + + def reorder_incremental_state(self, incremental_state, new_order): + super().reorder_incremental_state(incremental_state, new_order) + encoder_out = utils.get_incremental_state( + self, incremental_state, "encoder_out" + ) + if encoder_out is not None: + encoder_out = tuple(eo.index_select(0, new_order) for eo in encoder_out) + utils.set_incremental_state( + self, incremental_state, "encoder_out", encoder_out + ) + + def max_positions(self): + """Maximum output length supported by the decoder.""" + return ( + self.embed_positions.max_positions + if self.embed_positions is not None + else float("inf") + ) + + def upgrade_state_dict(self, state_dict): + if utils.item(state_dict.get("decoder.version", torch.Tensor([1]))[0]) < 2: + # old models use incorrect weight norm dimension + for i, conv in enumerate(self.convolutions): + # reconfigure weight norm + nn.utils.remove_weight_norm(conv) + self.convolutions[i] = nn.utils.weight_norm(conv, dim=0) + state_dict["decoder.version"] = torch.Tensor([1]) + return state_dict + + def make_generation_fast_(self, need_attn=False, **kwargs): + self.need_attn = need_attn + + def _embed_tokens(self, tokens, incremental_state): + if incremental_state is not None: + # keep only the last token for incremental forward pass + tokens = 
tokens[:, -1:] + return self.embed_tokens(tokens) + + def _split_encoder_out(self, encoder_out, incremental_state): + """Split and transpose encoder outputs. + + This is cached when doing incremental inference. + """ + cached_result = utils.get_incremental_state( + self, incremental_state, "encoder_out" + ) + if cached_result is not None: + return cached_result + + # transpose only once to speed up attention layers + encoder_a, encoder_b = encoder_out + encoder_a = encoder_a.transpose(1, 2).contiguous() + result = (encoder_a, encoder_b) + + if incremental_state is not None: + utils.set_incremental_state(self, incremental_state, "encoder_out", result) + return result + + def _transpose_if_training(self, x, incremental_state): + if incremental_state is None: + x = x.transpose(0, 1) + return x + + +def extend_conv_spec(convolutions): + """ + Extends convolutional spec that is a list of tuples of 2 or 3 parameters + (kernel size, dim size and optionally how many layers behind to look for residual) + to default the residual propagation param if it is not specified + """ + extended = [] + for spec in convolutions: + if len(spec) == 3: + extended.append(spec) + elif len(spec) == 2: + extended.append(spec + (1,)) + else: + raise Exception( + "invalid number of parameters in convolution spec " + + str(spec) + + ". expected 2 or 3" + ) + return tuple(extended) + + +def Embedding(num_embeddings, embedding_dim, padding_idx): + m = nn.Embedding(num_embeddings, embedding_dim, padding_idx=padding_idx) + nn.init.normal_(m.weight, 0, 0.1) + nn.init.constant_(m.weight[padding_idx], 0) + return m + + +def PositionalEmbedding(num_embeddings, embedding_dim, padding_idx): + m = LearnedPositionalEmbedding(num_embeddings, embedding_dim, padding_idx) + nn.init.normal_(m.weight, 0, 0.1) + nn.init.constant_(m.weight[padding_idx], 0) + return m + + +def Linear(in_features, out_features, dropout=0.0): + """Weight-normalized Linear layer (input: N x T x C)""" + m = nn.Linear(in_features, out_features) + nn.init.normal_(m.weight, mean=0, std=math.sqrt((1 - dropout) / in_features)) + nn.init.constant_(m.bias, 0) + return nn.utils.weight_norm(m) + + +def LinearizedConv1d(in_channels, out_channels, kernel_size, dropout=0.0, **kwargs): + """Weight-normalized Conv1d layer optimized for decoding""" + m = LinearizedConvolution(in_channels, out_channels, kernel_size, **kwargs) + std = math.sqrt((4 * (1.0 - dropout)) / (m.kernel_size[0] * in_channels)) + nn.init.normal_(m.weight, mean=0, std=std) + nn.init.constant_(m.bias, 0) + return nn.utils.weight_norm(m, dim=2) + + +def ConvTBC(in_channels, out_channels, kernel_size, dropout=0.0, **kwargs): + """Weight-normalized Conv1d layer""" + from fairseq.modules import ConvTBC + + m = ConvTBC(in_channels, out_channels, kernel_size, **kwargs) + std = math.sqrt((4 * (1.0 - dropout)) / (m.kernel_size[0] * in_channels)) + nn.init.normal_(m.weight, mean=0, std=std) + nn.init.constant_(m.bias, 0) + return nn.utils.weight_norm(m, dim=2) + + +@register_model_architecture("fconv", "fconv") +def base_architecture(args): + args.dropout = getattr(args, "dropout", 0.1) + args.encoder_embed_dim = getattr(args, "encoder_embed_dim", 512) + args.encoder_embed_path = getattr(args, "encoder_embed_path", None) + args.encoder_layers = getattr(args, "encoder_layers", "[(512, 3)] * 20") + args.decoder_embed_dim = getattr(args, "decoder_embed_dim", 512) + args.decoder_embed_path = getattr(args, "decoder_embed_path", None) + args.decoder_layers = getattr(args, "decoder_layers", "[(512, 3)] * 20") + 
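+    # NOTE: the layer specs above are Python expressions that build_model() eval()s,
+    # e.g. "[(512, 3)] * 20" -> 20 conv layers with 512 channels and kernel width 3;
+    # extend_conv_spec() then fills in the default residual distance of 1.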
args.decoder_out_embed_dim = getattr(args, "decoder_out_embed_dim", 256) + args.decoder_attention = getattr(args, "decoder_attention", "True") + args.share_input_output_embed = getattr(args, "share_input_output_embed", False) + + +@register_model_architecture("fconv", "fconv_iwslt_de_en") +def fconv_iwslt_de_en(args): + args.encoder_embed_dim = getattr(args, "encoder_embed_dim", 256) + args.encoder_layers = getattr(args, "encoder_layers", "[(256, 3)] * 4") + args.decoder_embed_dim = getattr(args, "decoder_embed_dim", 256) + args.decoder_layers = getattr(args, "decoder_layers", "[(256, 3)] * 3") + args.decoder_out_embed_dim = getattr(args, "decoder_out_embed_dim", 256) + base_architecture(args) + + +@register_model_architecture("fconv", "fconv_wmt_en_ro") +def fconv_wmt_en_ro(args): + args.decoder_out_embed_dim = getattr(args, "decoder_out_embed_dim", 512) + base_architecture(args) + + +@register_model_architecture("fconv", "fconv_wmt_en_de") +def fconv_wmt_en_de(args): + convs = "[(512, 3)] * 9" # first 9 layers have 512 units + convs += " + [(1024, 3)] * 4" # next 4 layers have 1024 units + convs += " + [(2048, 1)] * 2" # final 2 layers use 1x1 convolutions + + args.encoder_embed_dim = getattr(args, "encoder_embed_dim", 768) + args.encoder_layers = getattr(args, "encoder_layers", convs) + args.decoder_embed_dim = getattr(args, "decoder_embed_dim", 768) + args.decoder_layers = getattr(args, "decoder_layers", convs) + args.decoder_out_embed_dim = getattr(args, "decoder_out_embed_dim", 512) + base_architecture(args) + + +@register_model_architecture("fconv", "fconv_wmt_en_fr") +def fconv_wmt_en_fr(args): + convs = "[(512, 3)] * 6" # first 6 layers have 512 units + convs += " + [(768, 3)] * 4" # next 4 layers have 768 units + convs += " + [(1024, 3)] * 3" # next 3 layers have 1024 units + convs += " + [(2048, 1)] * 1" # next 1 layer uses 1x1 convolutions + convs += " + [(4096, 1)] * 1" # final 1 layer uses 1x1 convolutions + + args.encoder_embed_dim = getattr(args, "encoder_embed_dim", 768) + args.encoder_layers = getattr(args, "encoder_layers", convs) + args.decoder_embed_dim = getattr(args, "decoder_embed_dim", 768) + args.decoder_layers = getattr(args, "decoder_layers", convs) + args.decoder_out_embed_dim = getattr(args, "decoder_out_embed_dim", 512) + base_architecture(args) diff --git a/SpeechT5/fairseq/fairseq/models/fconv_lm.py b/SpeechT5/fairseq/fairseq/models/fconv_lm.py new file mode 100644 index 0000000000000000000000000000000000000000..07391eaa2908eacd2709176942d920c483c4f066 --- /dev/null +++ b/SpeechT5/fairseq/fairseq/models/fconv_lm.py @@ -0,0 +1,135 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. 
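+
+# Decoder-only convolutional LM: it reuses FConvDecoder from fconv.py with
+# positional embeddings disabled; when trained with the adaptive_loss criterion,
+# the output projection is replaced by an AdaptiveSoftmax tail.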
+ +from fairseq import utils +from fairseq.models import ( + FairseqLanguageModel, + register_model, + register_model_architecture, +) +from fairseq.models.fconv import FConvDecoder + + +@register_model("fconv_lm") +class FConvLanguageModel(FairseqLanguageModel): + def __init__(self, decoder): + super().__init__(decoder) + + @staticmethod + def add_args(parser): + """Add model-specific arguments to the parser.""" + parser.add_argument( + "--dropout", type=float, metavar="D", help="dropout probability" + ) + parser.add_argument( + "--decoder-embed-dim", + type=int, + metavar="N", + help="decoder embedding dimension", + ) + parser.add_argument( + "--decoder-layers", + type=str, + metavar="EXPR", + help="decoder layers [(dim, kernel_size), ...]", + ) + parser.add_argument( + "--decoder-out-embed-dim", + type=int, + metavar="N", + help="decoder output embedding dimension", + ) + parser.add_argument( + "--adaptive-softmax-cutoff", + metavar="EXPR", + help="comma separated list of adaptive softmax cutoff points. " + "Must be used with adaptive_loss criterion", + ) + parser.add_argument( + "--adaptive-softmax-dropout", + type=float, + metavar="D", + help="sets adaptive softmax dropout for the tail projections", + ) + parser.add_argument( + "--decoder-attention", + type=str, + metavar="EXPR", + help="decoder attention [True, ...]", + ) + + @classmethod + def build_model(cls, args, task): + """Build a new model instance.""" + # make sure all arguments are present in older models + base_lm_architecture(args) + + if hasattr(args, "max_target_positions") and not hasattr( + args, "tokens_per_sample" + ): + args.tokens_per_sample = args.max_target_positions + + decoder = FConvDecoder( + dictionary=task.target_dictionary, + embed_dim=args.decoder_embed_dim, + convolutions=eval(args.decoder_layers), + out_embed_dim=args.decoder_embed_dim, + attention=eval(args.decoder_attention), + dropout=args.dropout, + max_positions=args.tokens_per_sample, + share_embed=False, + positional_embeddings=False, + adaptive_softmax_cutoff=( + utils.eval_str_list(args.adaptive_softmax_cutoff, type=int) + if args.criterion == "adaptive_loss" + else None + ), + adaptive_softmax_dropout=args.adaptive_softmax_dropout, + ) + return FConvLanguageModel(decoder) + + +@register_model_architecture("fconv_lm", "fconv_lm") +def base_lm_architecture(args): + args.dropout = getattr(args, "dropout", 0.1) + args.decoder_embed_dim = getattr(args, "decoder_embed_dim", 128) + args.decoder_layers = getattr(args, "decoder_layers", "[(1268, 4)] * 13") + args.decoder_attention = getattr(args, "decoder_attention", "False") + args.adaptive_softmax_cutoff = getattr(args, "adaptive_softmax_cutoff", None) + args.adaptive_softmax_dropout = getattr(args, "adaptive_softmax_dropout", 0) + + +@register_model_architecture("fconv_lm", "fconv_lm_dauphin_wikitext103") +def fconv_lm_dauphin_wikitext103(args): + layers = "[(850, 6)] * 3" + layers += " + [(850, 1)] * 1" + layers += " + [(850, 5)] * 4" + layers += " + [(850, 1)] * 1" + layers += " + [(850, 4)] * 3" + layers += " + [(1024, 4)] * 1" + layers += " + [(2048, 4)] * 1" + args.decoder_embed_dim = getattr(args, "decoder_embed_dim", 280) + args.decoder_layers = getattr(args, "decoder_layers", layers) + args.decoder_attention = getattr(args, "decoder_attention", "False") + args.adaptive_softmax_cutoff = getattr( + args, "adaptive_softmax_cutoff", "10000,20000,200000" + ) + base_lm_architecture(args) + + +@register_model_architecture("fconv_lm", "fconv_lm_dauphin_gbw") +def fconv_lm_dauphin_gbw(args): + layers 
= "[(512, 5)]" + layers += " + [(128, 1, 0), (128, 5, 0), (512, 1, 3)] * 3" + layers += " + [(512, 1, 0), (512, 5, 0), (1024, 1, 3)] * 3" + layers += " + [(1024, 1, 0), (1024, 5, 0), (2048, 1, 3)] * 6" + layers += " + [(1024, 1, 0), (1024, 5, 0), (4096, 1, 3)]" + args.decoder_embed_dim = getattr(args, "decoder_embed_dim", 128) + args.decoder_layers = getattr(args, "decoder_layers", layers) + args.decoder_attention = getattr(args, "decoder_attention", "False") + args.adaptive_softmax_cutoff = getattr( + args, "adaptive_softmax_cutoff", "10000,50000,200000" + ) + base_lm_architecture(args) diff --git a/SpeechT5/fairseq/fairseq/models/fconv_self_att.py b/SpeechT5/fairseq/fairseq/models/fconv_self_att.py new file mode 100644 index 0000000000000000000000000000000000000000..8357ef7847ed25a62345e219c41906156828c233 --- /dev/null +++ b/SpeechT5/fairseq/fairseq/models/fconv_self_att.py @@ -0,0 +1,674 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +import logging +import math +import os + +import torch +import torch.nn as nn +import torch.nn.functional as F +from fairseq import checkpoint_utils +from fairseq.incremental_decoding_utils import with_incremental_state +from fairseq.models import ( + CompositeEncoder, + FairseqDecoder, + FairseqEncoder, + FairseqEncoderDecoderModel, + register_model, + register_model_architecture, +) +from fairseq.modules import ( + DownsampledMultiHeadAttention, + FairseqDropout, + GradMultiply, + LayerNorm, + LearnedPositionalEmbedding, + LinearizedConvolution, +) + + +logger = logging.getLogger(__name__) + + +@register_model("fconv_self_att") +class FConvModelSelfAtt(FairseqEncoderDecoderModel): + @classmethod + def hub_models(cls): + return { + "conv.stories.pretrained": { + "path": "https://dl.fbaipublicfiles.com/fairseq/models/stories_checkpoint.tar.gz", + "checkpoint_file": "pretrained_checkpoint.pt", + "tokenizer": "nltk", + }, + "conv.stories": { + "path": "https://dl.fbaipublicfiles.com/fairseq/models/stories_checkpoint.tar.gz", + "checkpoint_file": "fusion_checkpoint.pt", + "tokenizer": "nltk", + "pretrained": "True", + "pretrained_checkpoint": "./pretrained_checkpoint.pt", + }, + # Test set containing dictionaries + "data.stories": "https://dl.fbaipublicfiles.com/fairseq/data/stories_test.tar.bz2", + } + + def __init__(self, encoder, decoder, pretrained_encoder=None): + super().__init__(encoder, decoder) + self.encoder.num_attention_layers = sum( + layer is not None for layer in decoder.attention + ) + self.pretrained_encoder = pretrained_encoder + if self.pretrained_encoder is None: + encoders = {"encoder": encoder} + else: + encoders = {"encoder": encoder, "pretrained": self.pretrained_encoder} + # for fusion model, CompositeEncoder contains both pretrained and training encoders + # these are forwarded and then combined in the decoder + self.encoder = CompositeEncoder(encoders) + + @staticmethod + def add_args(parser): + """Add model-specific arguments to the parser.""" + # fmt: off + parser.add_argument('--dropout', type=float, metavar='D', + help='dropout probability') + parser.add_argument('--encoder-embed-dim', type=int, metavar='N', + help='encoder embedding dimension') + parser.add_argument('--encoder-layers', type=str, metavar='EXPR', + help='encoder layers [(dim, kernel_size), ...]') + parser.add_argument('--decoder-embed-dim', type=int, metavar='N', + help='decoder embedding dimension') + 
                            help='decoder embedding dimension')
parser.add_argument('--decoder-layers', type=str, metavar='EXPR', + help='decoder layers [(dim, kernel_size), ...]') + parser.add_argument('--decoder-out-embed-dim', type=int, metavar='N', + help='decoder output embedding dimension') + parser.add_argument('--decoder-attention', type=str, metavar='EXPR', + help='decoder attention [True, ...]') + parser.add_argument('--self-attention', type=str, metavar='EXPR', + help='decoder self-attention layers, ex: [True] + [False]*5') + parser.add_argument('--multihead-attention-nheads', type=int, + help='Number of heads to use in attention') + parser.add_argument('--multihead-self-attention-nheads', type=int, + help='Number of heads to use in self-attention') + parser.add_argument('--encoder-attention', type=str, metavar='EXPR', + help='encoder attention [True, ...]') + parser.add_argument('--encoder-attention-nheads', type=int, + help='Number of heads to use in encoder attention') + parser.add_argument('--project-input', type=str, metavar='EXPR', + help='Use projections in self-attention [True, ...]') + parser.add_argument('--gated-attention', type=str, metavar='EXPR', + help='Use GLU layers in self-attention projections [True, ...]') + parser.add_argument('--downsample', type=str, metavar='EXPR', + help='Use downsampling in self-attention [True, ...]') + parser.add_argument('--pretrained-checkpoint', metavar='DIR', + help='path to load checkpoint from pretrained model') + parser.add_argument('--pretrained', type=str, metavar='EXPR', + help='use pretrained model when training [True, ...]') + # fmt: on + + @classmethod + def build_model(cls, args, task): + """Build a new model instance.""" + trained_encoder, trained_decoder = None, None + pretrained = eval(args.pretrained) + if pretrained: + logger.info("loading pretrained model") + if not os.path.exists(args.pretrained_checkpoint): + new_pretrained_checkpoint = os.path.join( + args.data, args.pretrained_checkpoint + ) + if os.path.exists(new_pretrained_checkpoint): + args.pretrained_checkpoint = new_pretrained_checkpoint + trained_model = checkpoint_utils.load_model_ensemble( + filenames=[args.pretrained_checkpoint], + task=task, + )[0][0] + trained_decoder = list(trained_model.children())[1] + trained_encoder = list(trained_model.children())[0] + + # freeze pretrained model + for param in trained_decoder.parameters(): + param.requires_grad = False + for param in trained_encoder.parameters(): + param.requires_grad = False + + encoder = FConvEncoder( + task.source_dictionary, + embed_dim=args.encoder_embed_dim, + convolutions=eval(args.encoder_layers), + dropout=args.dropout, + max_positions=args.max_source_positions, + attention=eval(args.encoder_attention), + attention_nheads=args.encoder_attention_nheads, + ) + + decoder = FConvDecoder( + task.target_dictionary, + embed_dim=args.decoder_embed_dim, + convolutions=eval(args.decoder_layers), + out_embed_dim=args.decoder_out_embed_dim, + attention=eval(args.decoder_attention), + dropout=args.dropout, + max_positions=args.max_target_positions, + selfattention=eval(args.self_attention), + attention_nheads=args.multihead_attention_nheads, + selfattention_nheads=args.multihead_self_attention_nheads, + project_input=eval(args.project_input), + gated_attention=eval(args.gated_attention), + downsample=eval(args.downsample), + pretrained=pretrained, + trained_decoder=trained_decoder, + ) + model = FConvModelSelfAtt(encoder, decoder, trained_encoder) + + return model + + @property + def pretrained(self): + return self.pretrained_encoder is not None + + +class 
FConvEncoder(FairseqEncoder): + """Convolutional encoder""" + + def __init__( + self, + dictionary, + embed_dim=512, + max_positions=1024, + convolutions=((512, 3),) * 20, + dropout=0.1, + attention=False, + attention_nheads=1, + ): + super().__init__(dictionary) + self.dropout_module = FairseqDropout( + dropout, module_name=self.__class__.__name__ + ) + self.num_attention_layers = None + + num_embeddings = len(dictionary) + self.padding_idx = dictionary.pad() + self.embed_tokens = Embedding(num_embeddings, embed_dim, self.padding_idx) + self.embed_positions = PositionalEmbedding( + max_positions, + embed_dim, + self.padding_idx, + ) + + def expand_bool_array(val): + if isinstance(val, bool): + # expand True into [True, True, ...] and do the same with False + return [val] * len(convolutions) + return val + + attention = expand_bool_array(attention) + + in_channels = convolutions[0][0] + self.fc1 = Linear(embed_dim, in_channels, dropout=dropout) + self.projections = nn.ModuleList() + self.convolutions = nn.ModuleList() + self.attention = nn.ModuleList() + self.attproj = nn.ModuleList() + for i, (out_channels, kernel_size) in enumerate(convolutions): + self.projections.append( + Linear(in_channels, out_channels) + if in_channels != out_channels + else None + ) + self.convolutions.append( + ConvTBC(in_channels, out_channels * 2, kernel_size, dropout=dropout) + ) + + self.attention.append( + SelfAttention(out_channels, embed_dim, attention_nheads) + if attention[i] + else None + ) + in_channels = out_channels + + self.fc2 = Linear(in_channels, embed_dim) + + def forward(self, src_tokens, src_lengths): + # embed tokens and positions + x = self.embed_tokens(src_tokens) + self.embed_positions(src_tokens) + x = self.dropout_module(x) + input_embedding = x.transpose(0, 1) + + # project to size of convolution + x = self.fc1(x) + + encoder_padding_mask = src_tokens.eq(self.padding_idx).t() # -> T x B + if not encoder_padding_mask.any(): + encoder_padding_mask = None + + # B x T x C -> T x B x C + x = x.transpose(0, 1) + + # temporal convolutions + for proj, conv, attention in zip( + self.projections, self.convolutions, self.attention + ): + residual = x if proj is None else proj(x) + + if encoder_padding_mask is not None: + x = x.masked_fill(encoder_padding_mask.unsqueeze(-1), 0) + + x = self.dropout_module(x) + padding_l = (conv.kernel_size[0] - 1) // 2 + padding_r = conv.kernel_size[0] // 2 + x = F.pad(x, (0, 0, 0, 0, padding_l, padding_r)) + x = conv(x) + x = F.glu(x, dim=2) + if attention is not None: + x = attention(x) + x = (x + residual) * math.sqrt(0.5) + + # T x B x C -> B x T x C + x = x.transpose(1, 0) + + # project back to size of embedding + x = self.fc2(x) + + if encoder_padding_mask is not None: + encoder_padding_mask = encoder_padding_mask.t() # -> B x T + x = x.masked_fill(encoder_padding_mask.unsqueeze(-1), 0) + + # scale gradients (this only affects backward, not forward) + x = GradMultiply.apply(x, 1.0 / (2.0 * self.num_attention_layers)) + + # add output to input embedding for attention + y = (x + input_embedding.transpose(0, 1)) * math.sqrt(0.5) + + return { + "encoder_out": (x, y), + "encoder_padding_mask": encoder_padding_mask, # B x T + } + + def reorder_encoder_out(self, encoder_out, new_order): + encoder_out["encoder_out"] = tuple( + eo.index_select(0, new_order) for eo in encoder_out["encoder_out"] + ) + + if encoder_out["encoder_padding_mask"] is not None: + encoder_out["encoder_padding_mask"] = encoder_out[ + "encoder_padding_mask" + ].index_select(0, new_order) + + if 
"pretrained" in encoder_out: + encoder_out["pretrained"]["encoder_out"] = tuple( + eo.index_select(0, new_order) + for eo in encoder_out["pretrained"]["encoder_out"] + ) + + return encoder_out + + def max_positions(self): + """Maximum input length supported by the encoder.""" + return self.embed_positions.max_positions + + +@with_incremental_state +class FConvDecoder(FairseqDecoder): + """Convolutional decoder""" + + def __init__( + self, + dictionary, + embed_dim=512, + out_embed_dim=256, + max_positions=1024, + convolutions=((512, 3),) * 8, + attention=True, + dropout=0.1, + selfattention=False, + attention_nheads=1, + selfattention_nheads=1, + project_input=False, + gated_attention=False, + downsample=False, + pretrained=False, + trained_decoder=None, + ): + super().__init__(dictionary) + self.register_buffer("version", torch.Tensor([2])) + self.pretrained = pretrained + self.pretrained_decoder = trained_decoder + self.dropout_module = FairseqDropout( + dropout, module_name=self.__class__.__name__ + ) + self.need_attn = True + in_channels = convolutions[0][0] + + def expand_bool_array(val): + if isinstance(val, bool): + # expand True into [True, True, ...] and do the same with False + return [val] * len(convolutions) + return val + + attention = expand_bool_array(attention) + selfattention = expand_bool_array(selfattention) + + if not isinstance(attention, list) or len(attention) != len(convolutions): + raise ValueError( + "Attention is expected to be a list of booleans of " + "length equal to the number of layers." + ) + + num_embeddings = len(dictionary) + padding_idx = dictionary.pad() + self.embed_tokens = Embedding(num_embeddings, embed_dim, padding_idx) + + self.embed_positions = PositionalEmbedding( + max_positions, + embed_dim, + padding_idx, + ) + + self.fc1 = Linear(embed_dim, in_channels, dropout=dropout) + self.projections = nn.ModuleList() + self.convolutions = nn.ModuleList() + self.attention = nn.ModuleList() + self.selfattention = nn.ModuleList() + self.attproj = nn.ModuleList() + for i, (out_channels, kernel_size) in enumerate(convolutions): + self.projections.append( + Linear(in_channels, out_channels) + if in_channels != out_channels + else None + ) + self.convolutions.append( + LinearizedConv1d( + in_channels, + out_channels * 2, + kernel_size, + padding=(kernel_size - 1), + dropout=dropout, + ) + ) + + self.attention.append( + DownsampledMultiHeadAttention( + out_channels, + embed_dim, + attention_nheads, + project_input=project_input, + gated=False, + downsample=False, + ) + if attention[i] + else None + ) + + self.attproj.append( + Linear(out_channels, embed_dim, dropout=dropout) + if attention[i] + else None + ) + self.selfattention.append( + SelfAttention( + out_channels, + embed_dim, + selfattention_nheads, + project_input=project_input, + gated=gated_attention, + downsample=downsample, + ) + if selfattention[i] + else None + ) + in_channels = out_channels + + self.fc2 = Linear(in_channels, out_embed_dim) + self.fc3 = Linear(out_embed_dim, num_embeddings, dropout=dropout) + + # model fusion + if self.pretrained: + # independent gates are learned from the concatenated input + self.gate1 = nn.Sequential( + Linear(out_embed_dim * 2, out_embed_dim), nn.Sigmoid() + ) + self.gate2 = nn.Sequential( + Linear(out_embed_dim * 2, out_embed_dim), nn.Sigmoid() + ) + # pretrained and trained models are joined + self.joining = nn.Sequential( + Linear(out_embed_dim * 2, out_embed_dim * 2), + LayerNorm(out_embed_dim * 2), + nn.GLU(), + Linear(out_embed_dim, out_embed_dim * 2), 
+ LayerNorm(out_embed_dim * 2), + nn.GLU(), + Linear(out_embed_dim, out_embed_dim), + LayerNorm(out_embed_dim), + ) + # pretrained model contains an output layer that is nhid -> vocab size + # but the models are combined in their hidden state + # the hook stores the output of the pretrained model forward + self.pretrained_outputs = {} + + def save_output(): + def hook(a, b, output): + self.pretrained_outputs["out"] = output + + return hook + + self.pretrained_decoder.fc2.register_forward_hook(save_output()) + + def forward(self, prev_output_tokens, encoder_out): + trained_encoder_out = encoder_out["pretrained"] if self.pretrained else None + encoder_out = encoder_out["encoder"]["encoder_out"] + + encoder_a, encoder_b = self._split_encoder_out(encoder_out) + + # embed positions + positions = self.embed_positions(prev_output_tokens) + + # embed tokens and positions + x = self.embed_tokens(prev_output_tokens) + positions + x = self.dropout_module(x) + target_embedding = x.transpose(0, 1) + + # project to size of convolution + x = self.fc1(x) + + # B x T x C -> T x B x C + x = x.transpose(0, 1) + + # temporal convolutions + avg_attn_scores = None + for proj, conv, attention, selfattention, attproj in zip( + self.projections, + self.convolutions, + self.attention, + self.selfattention, + self.attproj, + ): + residual = x if proj is None else proj(x) + + x = self.dropout_module(x) + x = conv(x) + x = F.glu(x, dim=2) + + # attention + if attention is not None: + r = x + x, attn_scores = attention( + attproj(x) + target_embedding, encoder_a, encoder_b + ) + x = x + r + if not self.training and self.need_attn: + if avg_attn_scores is None: + avg_attn_scores = attn_scores + else: + avg_attn_scores.add_(attn_scores) + + if selfattention is not None: + x = selfattention(x) + + x = (x + residual) * math.sqrt(0.5) + + # T x B x C -> B x T x C + x = x.transpose(0, 1) + + # project back to size of vocabulary + x = self.fc2(x) + x = self.dropout_module(x) + if not self.pretrained: + x = self.fc3(x) + + # fusion gating + if self.pretrained: + trained_x, _ = self.pretrained_decoder.forward( + prev_output_tokens, trained_encoder_out + ) + y = torch.cat([x, self.pretrained_outputs["out"]], dim=-1) + gate1 = self.gate1(y) + gate2 = self.gate2(y) + gated_x1 = gate1 * x + gated_x2 = gate2 * self.pretrained_outputs["out"] + fusion = torch.cat([gated_x1, gated_x2], dim=-1) + fusion = self.joining(fusion) + fusion_output = self.fc3(fusion) + return fusion_output, avg_attn_scores + else: + return x, avg_attn_scores + + def max_positions(self): + """Maximum output length supported by the decoder.""" + return self.embed_positions.max_positions + + def make_generation_fast_(self, need_attn=False, **kwargs): + self.need_attn = need_attn + + def _split_encoder_out(self, encoder_out): + """Split and transpose encoder outputs.""" + # transpose only once to speed up attention layers + encoder_a, encoder_b = encoder_out + encoder_a = encoder_a.transpose(0, 1).contiguous() + encoder_b = encoder_b.transpose(0, 1).contiguous() + result = (encoder_a, encoder_b) + return result + + +class SelfAttention(nn.Module): + def __init__( + self, + out_channels, + embed_dim, + num_heads, + project_input=False, + gated=False, + downsample=False, + ): + super().__init__() + self.attention = DownsampledMultiHeadAttention( + out_channels, + embed_dim, + num_heads, + dropout=0, + bias=True, + project_input=project_input, + gated=gated, + downsample=downsample, + ) + self.in_proj_q = Linear(out_channels, embed_dim) + self.in_proj_k = 
Linear(out_channels, embed_dim) + self.in_proj_v = Linear(out_channels, embed_dim) + self.ln = LayerNorm(out_channels) + + def forward(self, x): + residual = x + query = self.in_proj_q(x) + key = self.in_proj_k(x) + value = self.in_proj_v(x) + x, _ = self.attention( + query, key, value, mask_future_timesteps=True, use_scalar_bias=True + ) + return self.ln(x + residual) + + +def Embedding(num_embeddings, embedding_dim, padding_idx): + m = nn.Embedding(num_embeddings, embedding_dim, padding_idx=padding_idx) + m.weight.data.normal_(0, 0.1) + return m + + +def PositionalEmbedding(num_embeddings, embedding_dim, padding_idx): + m = LearnedPositionalEmbedding(num_embeddings, embedding_dim, padding_idx) + m.weight.data.normal_(0, 0.1) + return m + + +def Linear(in_features, out_features, dropout=0.0): + """Weight-normalized Linear layer (input: N x T x C)""" + m = nn.Linear(in_features, out_features) + m.weight.data.normal_(mean=0, std=math.sqrt((1 - dropout) / in_features)) + m.bias.data.zero_() + return m + + +def LinearizedConv1d(in_channels, out_channels, kernel_size, dropout=0.0, **kwargs): + """Weight-normalized Conv1d layer optimized for decoding""" + m = LinearizedConvolution(in_channels, out_channels, kernel_size, **kwargs) + std = math.sqrt((4 * (1.0 - dropout)) / (m.kernel_size[0] * in_channels)) + m.weight.data.normal_(mean=0, std=std) + m.bias.data.zero_() + return m + + +def ConvTBC(in_channels, out_channels, kernel_size, dropout=0.0, **kwargs): + """Weight-normalized Conv1d layer""" + from fairseq.modules import ConvTBC + + m = ConvTBC(in_channels, out_channels, kernel_size, **kwargs) + std = math.sqrt((4 * (1.0 - dropout)) / (m.kernel_size[0] * in_channels)) + m.weight.data.normal_(mean=0, std=std) + m.bias.data.zero_() + return m + + +@register_model_architecture("fconv_self_att", "fconv_self_att") +def base_architecture(args): + args.dropout = getattr(args, "dropout", 0.1) + args.encoder_embed_dim = getattr(args, "encoder_embed_dim", 512) + args.encoder_layers = getattr(args, "encoder_layers", "[(512, 3)] * 3") + args.decoder_embed_dim = getattr(args, "decoder_embed_dim", 512) + args.decoder_layers = getattr(args, "decoder_layers", "[(512, 3)] * 8") + args.decoder_out_embed_dim = getattr(args, "decoder_out_embed_dim", 256) + args.decoder_attention = getattr(args, "decoder_attention", "True") + args.self_attention = getattr(args, "self_attention", "False") + args.encoder_attention = getattr(args, "encoder_attention", "False") + args.multihead_attention_nheads = getattr(args, "multihead_attention_nheads", 1) + args.multihead_self_attention_nheads = getattr( + args, "multihead_self_attention_nheads", 1 + ) + args.encoder_attention_nheads = getattr(args, "encoder_attention_nheads", 1) + args.project_input = getattr(args, "project_input", "False") + args.gated_attention = getattr(args, "gated_attention", "False") + args.downsample = getattr(args, "downsample", "False") + args.pretrained_checkpoint = getattr(args, "pretrained_checkpoint", "") + args.pretrained = getattr(args, "pretrained", "False") + + +@register_model_architecture("fconv_self_att", "fconv_self_att_wp") +def fconv_self_att_wp(args): + args.encoder_embed_dim = getattr(args, "encoder_embed_dim", 256) + args.encoder_layers = getattr( + args, "encoder_layers", "[(128, 3)] * 2 + [(512,3)] * 1" + ) + args.decoder_embed_dim = getattr(args, "decoder_embed_dim", 256) + args.decoder_layers = getattr( + args, "decoder_layers", "[(512, 4)] * 4 + [(768, 4)] * 2 + [(1024, 4)] * 1" + ) + args.decoder_out_embed_dim = getattr(args, 
"decoder_out_embed_dim", 256) + args.self_attention = getattr(args, "self_attention", "True") + args.multihead_self_attention_nheads = getattr( + args, "multihead_self_attention_nheads", 4 + ) + args.project_input = getattr(args, "project_input", "True") + args.gated_attention = getattr(args, "gated_attention", "True") + args.downsample = getattr(args, "downsample", "True") + base_architecture(args) diff --git a/SpeechT5/fairseq/fairseq/models/hubert/__init__.py b/SpeechT5/fairseq/fairseq/models/hubert/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..a1b0eabbdbcaf12b15bb96b329ab1e276256f79a --- /dev/null +++ b/SpeechT5/fairseq/fairseq/models/hubert/__init__.py @@ -0,0 +1,7 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +from .hubert import * # noqa +from .hubert_asr import * # noqa diff --git a/SpeechT5/fairseq/fairseq/models/hubert/hubert.py b/SpeechT5/fairseq/fairseq/models/hubert/hubert.py new file mode 100644 index 0000000000000000000000000000000000000000..232a5e402a146023e5c93f3c2574ecec98faf9d5 --- /dev/null +++ b/SpeechT5/fairseq/fairseq/models/hubert/hubert.py @@ -0,0 +1,563 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +import logging +from typing import Dict, List, Optional, Tuple + +import numpy as np + +import torch +import torch.nn as nn +from dataclasses import dataclass, field +from fairseq import utils +from fairseq.data.data_utils import compute_mask_indices +from fairseq.data.dictionary import Dictionary +from fairseq.dataclass import ChoiceEnum, FairseqDataclass +from fairseq.models import BaseFairseqModel, register_model +from fairseq.models.wav2vec.wav2vec2 import ( + ConvFeatureExtractionModel, + TransformerEncoder, +) +from fairseq.modules import GradMultiply, LayerNorm +from fairseq.tasks.hubert_pretraining import ( + HubertPretrainingConfig, + HubertPretrainingTask, +) +from omegaconf import II + +logger = logging.getLogger(__name__) + +EXTRACTOR_MODE_CHOICES = ChoiceEnum(["default", "layer_norm"]) +MASKING_DISTRIBUTION_CHOICES = ChoiceEnum( + ["static", "uniform", "normal", "poisson"] +) + + +@dataclass +class HubertConfig(FairseqDataclass): + label_rate: int = II("task.label_rate") + + extractor_mode: EXTRACTOR_MODE_CHOICES = field( + default="default", + metadata={ + "help": "mode for feature extractor. 
default has a single group " + "norm with d groups in the first conv block, whereas layer_norm " + "has layer norms in every block (meant to use with normalize=True)" + }, + ) + encoder_layers: int = field( + default=12, metadata={"help": "num encoder layers in the transformer"} + ) + encoder_embed_dim: int = field( + default=768, metadata={"help": "encoder embedding dimension"} + ) + encoder_ffn_embed_dim: int = field( + default=3072, metadata={"help": "encoder embedding dimension for FFN"} + ) + encoder_attention_heads: int = field( + default=12, metadata={"help": "num encoder attention heads"} + ) + activation_fn: ChoiceEnum(utils.get_available_activation_fns()) = field( + default="gelu", metadata={"help": "activation function to use"} + ) + + # dropouts + dropout: float = field( + default=0.1, + metadata={"help": "dropout probability for the transformer"}, + ) + attention_dropout: float = field( + default=0.1, + metadata={"help": "dropout probability for attention weights"}, + ) + activation_dropout: float = field( + default=0.0, + metadata={"help": "dropout probability after activation in FFN"}, + ) + encoder_layerdrop: float = field( + default=0.0, + metadata={"help": "probability of dropping a tarnsformer layer"}, + ) + dropout_input: float = field( + default=0.0, + metadata={"help": "dropout to apply to the input (after feat extr)"}, + ) + dropout_features: float = field( + default=0.0, + metadata={ + "help": "dropout to apply to the features (after feat extr)" + }, + ) + + final_dim: int = field( + default=0, + metadata={ + "help": "project final representations and targets to this many " + "dimensions. set to encoder_embed_dim is <= 0" + }, + ) + untie_final_proj: bool = field( + default=False, + metadata={"help": "use separate projection for each target"}, + ) + layer_norm_first: bool = field( + default=False, + metadata={"help": "apply layernorm first in the transformer"}, + ) + conv_feature_layers: str = field( + default="[(512,10,5)] + [(512,3,2)] * 4 + [(512,2,2)] * 2", + metadata={ + "help": "string describing convolutional feature extraction " + "layers in form of a python list that contains " + "[(dim, kernel_size, stride), ...]" + }, + ) + conv_bias: bool = field( + default=False, metadata={"help": "include bias in conv encoder"} + ) + logit_temp: float = field( + default=0.1, metadata={"help": "temperature to divide logits by"} + ) + target_glu: bool = field( + default=False, metadata={"help": "adds projection + glu to targets"} + ) + feature_grad_mult: float = field( + default=1.0, + metadata={"help": "multiply feature extractor var grads by this"}, + ) + + # masking + mask_length: int = field(default=10, metadata={"help": "mask length"}) + mask_prob: float = field( + default=0.65, + metadata={"help": "probability of replacing a token with mask"}, + ) + mask_selection: MASKING_DISTRIBUTION_CHOICES = field( + default="static", metadata={"help": "how to choose mask length"} + ) + mask_other: float = field( + default=0, + metadata={ + "help": "secondary mask argument " + "(used for more complex distributions), " + "see help in compute_mask_indicesh" + }, + ) + no_mask_overlap: bool = field( + default=False, metadata={"help": "whether to allow masks to overlap"} + ) + mask_min_space: int = field( + default=1, + metadata={ + "help": "min space between spans (if no overlap is enabled)" + }, + ) + + # channel masking + mask_channel_length: int = field( + default=10, + metadata={"help": "length of the mask for features (channels)"}, + ) + mask_channel_prob: float = field( + 
default=0.0, + metadata={"help": "probability of replacing a feature with 0"}, + ) + mask_channel_selection: MASKING_DISTRIBUTION_CHOICES = field( + default="static", + metadata={"help": "how to choose mask length for channel masking"}, + ) + mask_channel_other: float = field( + default=0, + metadata={ + "help": "secondary mask argument " + "(used for more complex distributions), " + "see help in compute_mask_indicesh" + }, + ) + no_mask_channel_overlap: bool = field( + default=False, + metadata={"help": "whether to allow channel masks to overlap"}, + ) + mask_channel_min_space: int = field( + default=1, + metadata={ + "help": "min space between spans (if no overlap is enabled)" + }, + ) + + # positional embeddings + conv_pos: int = field( + default=128, + metadata={ + "help": "number of filters for convolutional positional embeddings" + }, + ) + conv_pos_groups: int = field( + default=16, + metadata={ + "help": "number of groups for convolutional positional embedding" + }, + ) + + latent_temp: Tuple[float, float, float] = field( + default=(2, 0.5, 0.999995), + metadata={"help": "legacy (to be removed)"}, + ) + + # loss computation + skip_masked: bool = field( + default=False, + metadata={"help": "skip computing losses over masked frames"}, + ) + skip_nomask: bool = field( + default=False, + metadata={"help": "skip computing losses over unmasked frames"}, + ) + + +@register_model("hubert", dataclass=HubertConfig) +class HubertModel(BaseFairseqModel): + def __init__( + self, + cfg: HubertConfig, + task_cfg: HubertPretrainingConfig, + dictionaries: List[Dictionary], + ) -> None: + super().__init__() + logger.info(f"HubertModel Config: {cfg}") + + feature_enc_layers = eval(cfg.conv_feature_layers) # noqa + self.embed = feature_enc_layers[-1][0] + + self.feature_extractor = ConvFeatureExtractionModel( + conv_layers=feature_enc_layers, + dropout=0.0, + mode=cfg.extractor_mode, + conv_bias=cfg.conv_bias, + ) + feature_ds_rate = np.prod([s for _, _, s in feature_enc_layers]) + self.feat2tar_ratio = ( + cfg.label_rate * feature_ds_rate / task_cfg.sample_rate + ) + + self.post_extract_proj = ( + nn.Linear(self.embed, cfg.encoder_embed_dim) + if self.embed != cfg.encoder_embed_dim + else None + ) + + self.mask_prob = cfg.mask_prob + self.mask_selection = cfg.mask_selection + self.mask_other = cfg.mask_other + self.mask_length = cfg.mask_length + self.no_mask_overlap = cfg.no_mask_overlap + self.mask_min_space = cfg.mask_min_space + + self.mask_channel_prob = cfg.mask_channel_prob + self.mask_channel_selection = cfg.mask_channel_selection + self.mask_channel_other = cfg.mask_channel_other + self.mask_channel_length = cfg.mask_channel_length + self.no_mask_channel_overlap = cfg.no_mask_channel_overlap + self.mask_channel_min_space = cfg.mask_channel_min_space + + self.dropout_input = nn.Dropout(cfg.dropout_input) + self.dropout_features = nn.Dropout(cfg.dropout_features) + + self.feature_grad_mult = cfg.feature_grad_mult + self.logit_temp = cfg.logit_temp + self.skip_masked = cfg.skip_masked + self.skip_nomask = cfg.skip_nomask + + final_dim = ( + cfg.final_dim if cfg.final_dim > 0 else cfg.encoder_embed_dim + ) + + self.mask_emb = nn.Parameter( + torch.FloatTensor(cfg.encoder_embed_dim).uniform_() + ) + + self.encoder = TransformerEncoder(cfg) + self.layer_norm = LayerNorm(self.embed) + + self.target_glu = None + if cfg.target_glu: + self.target_glu = nn.Sequential( + nn.Linear(final_dim, final_dim * 2), nn.GLU() + ) + + self.untie_final_proj = cfg.untie_final_proj + if self.untie_final_proj: + 
self.final_proj = nn.Linear( + cfg.encoder_embed_dim, final_dim * len(dictionaries) + ) + else: + self.final_proj = nn.Linear(cfg.encoder_embed_dim, final_dim) + + # modules below are not needed during fine-tuning + if any([d is None for d in dictionaries]): + logger.info( + "cannot find dictionary. assume will be used for fine-tuning" + ) + else: + self.num_classes = [len(d) for d in dictionaries] + self.label_embs_concat = nn.Parameter( + torch.FloatTensor(sum(self.num_classes), final_dim) + ) + nn.init.uniform_(self.label_embs_concat) + + def upgrade_state_dict_named(self, state_dict, name): + """Upgrade a (possibly old) state dict for new versions of fairseq.""" + + super().upgrade_state_dict_named(state_dict, name) + return state_dict + + @classmethod + def build_model(cls, cfg: HubertConfig, task: HubertPretrainingTask): + """Build a new model instance.""" + + model = HubertModel(cfg, task.cfg, task.dictionaries) + return model + + def apply_mask(self, x, padding_mask, target_list): + B, T, C = x.shape + if self.mask_prob > 0: + mask_indices = compute_mask_indices( + (B, T), + padding_mask, + self.mask_prob, + self.mask_length, + self.mask_selection, + self.mask_other, + min_masks=2, + no_overlap=self.no_mask_overlap, + min_space=self.mask_min_space, + ) + mask_indices = torch.from_numpy(mask_indices).to(x.device) + x[mask_indices] = self.mask_emb + else: + mask_indices = None + + if self.mask_channel_prob > 0: + mask_channel_indices = compute_mask_indices( + (B, C), + None, + self.mask_channel_prob, + self.mask_channel_length, + self.mask_channel_selection, + self.mask_channel_other, + no_overlap=self.no_mask_channel_overlap, + min_space=self.mask_channel_min_space, + ) + mask_channel_indices = ( + torch.from_numpy(mask_channel_indices) + .to(x.device) + .unsqueeze(1) + .expand(-1, T, -1) + ) + x[mask_channel_indices] = 0 + + return x, mask_indices + + def compute_nce(self, x, pos, negs): + neg_is_pos = (pos == negs).all(-1) + pos = pos.unsqueeze(0) + targets = torch.cat([pos, negs], dim=0) + + logits = torch.cosine_similarity( + x.float(), targets.float(), dim=-1 + ).type_as(x) + logits /= self.logit_temp + if neg_is_pos.any(): + logits[1:][neg_is_pos] = float("-inf") + logits = logits.transpose(0, 1) # (num_x, num_cls+1) + return logits + + def forward_features(self, source: torch.Tensor) -> torch.Tensor: + if self.feature_grad_mult > 0: + features = self.feature_extractor(source) + if self.feature_grad_mult != 1.0: + features = GradMultiply.apply(features, self.feature_grad_mult) + else: + with torch.no_grad(): + features = self.feature_extractor(source) + return features + + def forward_targets( + self, features: torch.Tensor, target_list: List[torch.Tensor], + ) -> Tuple[torch.Tensor, torch.Tensor]: + # Trim features to ensure labels exist and then get aligned labels + feat_tsz = features.size(2) + targ_tsz = min([t.size(1) for t in target_list]) + if self.feat2tar_ratio * feat_tsz > targ_tsz: + feat_tsz = int(targ_tsz / self.feat2tar_ratio) + features = features[..., :feat_tsz] + target_inds = torch.arange(feat_tsz).float() * self.feat2tar_ratio + target_list = [t[:, target_inds.long()] for t in target_list] + return features, target_list + + def forward_padding_mask( + self, features: torch.Tensor, padding_mask: torch.Tensor, + ) -> torch.Tensor: + extra = padding_mask.size(1) % features.size(1) + if extra > 0: + padding_mask = padding_mask[:, :-extra] + padding_mask = padding_mask.view( + padding_mask.size(0), features.size(1), -1 + ) + padding_mask = padding_mask.all(-1) + 
return padding_mask + + def forward( + self, + source: torch.Tensor, + target_list: Optional[List[torch.Tensor]] = None, + padding_mask: Optional[torch.Tensor] = None, + mask: bool = True, + features_only: bool = False, + output_layer: Optional[int] = None, + ) -> Dict[str, torch.Tensor]: + """output layer is 1-based""" + features = self.forward_features(source) + if target_list is not None: + features, target_list = self.forward_targets(features, target_list) + + features_pen = features.float().pow(2).mean() + + features = features.transpose(1, 2) + features = self.layer_norm(features) + unmasked_features = features.clone() + + if padding_mask is not None: + padding_mask = self.forward_padding_mask(features, padding_mask) + + if self.post_extract_proj is not None: + features = self.post_extract_proj(features) + + features = self.dropout_input(features) + unmasked_features = self.dropout_features(unmasked_features) + + if mask: + x, mask_indices = self.apply_mask( + features, padding_mask, target_list + ) + else: + x = features + mask_indices = None + + # feature: (B, T, D), float + # target: (B, T), long + # x: (B, T, D), float + # padding_mask: (B, T), bool + # mask_indices: (B, T), bool + x, _ = self.encoder( + x, + padding_mask=padding_mask, + layer=None if output_layer is None else output_layer - 1 + ) + + if features_only: + return {"x": x, "padding_mask": padding_mask, "features": features} + + def compute_pred(proj_x, target, label_embs): + # compute logits for the i-th label set + y = torch.index_select(label_embs, 0, target.long()) + negs = label_embs.unsqueeze(1).expand(-1, proj_x.size(0), -1) + if self.target_glu: + y = self.target_glu(y) + negs = self.target_glu(negs) + # proj_x: (S, D) + # y: (S, D) + # negs: (Neg, S, D) + return self.compute_nce(proj_x, y, negs) + + label_embs_list = self.label_embs_concat.split(self.num_classes, 0) + + if not self.skip_masked: + masked_indices = torch.logical_and(~padding_mask, mask_indices) + proj_x_m = self.final_proj(x[masked_indices]) + if self.untie_final_proj: + proj_x_m_list = proj_x_m.chunk(len(target_list), dim=-1) + else: + proj_x_m_list = [proj_x_m for _ in range(len(target_list))] + logit_m_list = [ + compute_pred(proj_x_m, t[masked_indices], label_embs_list[i]) + for i, (proj_x_m, t) in enumerate( + zip(proj_x_m_list, target_list) + ) + ] + else: + logit_m_list = [None for _ in target_list] + + if not self.skip_nomask: + nomask_indices = torch.logical_and(~padding_mask, ~mask_indices) + proj_x_u = self.final_proj(x[nomask_indices]) + if self.untie_final_proj: + proj_x_u_list = proj_x_u.chunk(len(target_list), dim=-1) + else: + proj_x_u_list = [proj_x_u for _ in range(len(target_list))] + + logit_u_list = [ + compute_pred(proj_x_u, t[nomask_indices], label_embs_list[i]) + for i, (proj_x_u, t) in enumerate( + zip(proj_x_u_list, target_list) + ) + ] + else: + logit_u_list = [None for _ in target_list] + + result = { + "logit_m_list": logit_m_list, + "logit_u_list": logit_u_list, + "padding_mask": padding_mask, + "features_pen": features_pen, + } + return result + + def extract_features( + self, + source: torch.Tensor, + padding_mask: Optional[torch.Tensor] = None, + mask: bool = False, + ret_conv: bool = False, + output_layer: Optional[int] = None, + ) -> Tuple[torch.Tensor, torch.Tensor]: + res = self.forward( + source, + padding_mask=padding_mask, + mask=mask, + features_only=True, + output_layer=output_layer, + ) + feature = res["features"] if ret_conv else res["x"] + return feature, res["padding_mask"] + + def get_logits(self, 
net_output, is_masked=True): + if is_masked: + logits_list = net_output["logit_m_list"] + else: + logits_list = net_output["logit_u_list"] + logits_list = [x.float() for x in logits_list if x is not None] + return logits_list + + def get_targets(self, net_output, is_masked=True): + logits_list = self.get_logits(net_output, is_masked) + targets_list = [ + x.new_zeros(x.size(0), dtype=torch.long) for x in logits_list + ] + return targets_list + + def get_extra_losses(self, net_output): + extra_losses = [] + names = [] + + if "features_pen" in net_output: + extra_losses.append(net_output["features_pen"]) + names.append("features_pen") + + return extra_losses, names + + def remove_pretraining_modules(self): + self.target_glu = None + self.final_proj = None diff --git a/SpeechT5/fairseq/fairseq/models/hubert/hubert_asr.py b/SpeechT5/fairseq/fairseq/models/hubert/hubert_asr.py new file mode 100644 index 0000000000000000000000000000000000000000..4cb3fb71537643b560b493ff1c7fc17843e1e49e --- /dev/null +++ b/SpeechT5/fairseq/fairseq/models/hubert/hubert_asr.py @@ -0,0 +1,373 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +import contextlib +from argparse import Namespace +from typing import Any + +import torch +import torch.nn as nn +from dataclasses import dataclass, field +from fairseq import checkpoint_utils, tasks, utils +from fairseq.dataclass import FairseqDataclass +from fairseq.dataclass.utils import convert_namespace_to_omegaconf +from fairseq.models import BaseFairseqModel, FairseqEncoder, register_model +from fairseq.models.hubert.hubert import MASKING_DISTRIBUTION_CHOICES +from fairseq.tasks import FairseqTask +from omegaconf import II, MISSING + + +@dataclass +class HubertAsrConfig(FairseqDataclass): + w2v_path: str = field( + default=MISSING, metadata={"help": "path to hubert model"} + ) + no_pretrained_weights: bool = field( + default=False, + metadata={"help": "if true, does not load pretrained weights"}, + ) + dropout_input: float = field( + default=0.0, + metadata={"help": "dropout to apply to the input (after feat extr)"}, + ) + final_dropout: float = field( + default=0.0, + metadata={ + "help": "dropout after transformer and before final projection" + }, + ) + dropout: float = field( + default=0.0, + metadata={"help": "dropout probability inside hubert model"}, + ) + attention_dropout: float = field( + default=0.0, + metadata={ + "help": "dropout probability for attention weights " + "inside hubert model" + }, + ) + activation_dropout: float = field( + default=0.0, + metadata={ + "help": "dropout probability after activation in FFN " + "inside hubert model" + }, + ) + + # masking + apply_mask: bool = field( + default=False, metadata={"help": "apply masking during fine-tuning"} + ) + mask_length: int = field( + default=10, metadata={"help": "repeat the mask indices multiple times"} + ) + mask_prob: float = field( + default=0.5, + metadata={ + "help": "probability of replacing a token with mask " + "(normalized by length)" + }, + ) + mask_selection: MASKING_DISTRIBUTION_CHOICES = field( + default="static", metadata={"help": "how to choose masks"} + ) + mask_other: float = field( + default=0, + metadata={ + "help": "secondary mask argument " + "(used for more complex distributions), " + "see help in compute_mask_indices" + }, + ) + no_mask_overlap: bool = field( + default=False, metadata={"help": "whether to allow masks to overlap"} + ) + + # 
channel masking + mask_channel_length: int = field( + default=10, + metadata={"help": "length of the mask for features (channels)"}, + ) + mask_channel_prob: float = field( + default=0.0, + metadata={"help": "probability of replacing a feature with 0"}, + ) + mask_channel_selection: MASKING_DISTRIBUTION_CHOICES = field( + default="static", + metadata={"help": "how to choose mask length for channel masking"}, + ) + mask_channel_other: float = field( + default=0, + metadata={ + "help": "secondary mask argument " + "(used for more complex distributions), " + "see help in compute_mask_indices" + }, + ) + no_mask_channel_overlap: bool = field( + default=False, + metadata={"help": "whether to allow channel masks to overlap"}, + ) + freeze_finetune_updates: int = field( + default=0, + metadata={"help": "dont finetune hubert for this many updates"}, + ) + feature_grad_mult: float = field( + default=0.0, + metadata={"help": "reset feature grad mult in hubert to this"}, + ) + layerdrop: float = field( + default=0.0, + metadata={"help": "probability of dropping a layer in hubert"}, + ) + normalize: bool = II("task.normalize") + data: str = II("task.data") + + # this holds the loaded hubert args + w2v_args: Any = None + + +@dataclass +class HubertCtcConfig(HubertAsrConfig): + pass + + +@register_model("hubert_ctc", dataclass=HubertCtcConfig) +class HubertCtc(BaseFairseqModel): + def __init__(self, cfg: HubertCtcConfig, w2v_encoder: BaseFairseqModel): + super().__init__() + self.cfg = cfg + self.w2v_encoder = w2v_encoder + + def upgrade_state_dict_named(self, state_dict, name): + super().upgrade_state_dict_named(state_dict, name) + return state_dict + + @classmethod + def build_model(cls, cfg: HubertCtcConfig, task: FairseqTask): + """Build a new model instance.""" + w2v_encoder = HubertEncoder(cfg, task.target_dictionary) + return cls(cfg, w2v_encoder) + + def get_normalized_probs(self, net_output, log_probs): + """Get normalized probabilities (or log probs) from a net's output.""" + + logits = net_output["encoder_out"] + if log_probs: + return utils.log_softmax(logits.float(), dim=-1) + else: + return utils.softmax(logits.float(), dim=-1) + + def get_logits(self, net_output): + logits = net_output["encoder_out"] + padding = net_output["encoder_padding_mask"] + if padding is not None and padding.any(): + padding = padding.T + logits[padding][..., 0] = 0 + logits[padding][..., 1:] = float("-inf") + + return logits + + def forward(self, **kwargs): + x = self.w2v_encoder(**kwargs) + return x + + +@dataclass +class HubertSeq2SeqConfig(HubertAsrConfig): + decoder_embed_dim: int = field( + default=768, metadata={"help": "decoder embedding dimension"} + ) + decoder_ffn_embed_dim: int = field( + default=3072, metadata={"help": "decoder embedding dimension for FFN"} + ) + decoder_layers: int = field( + default=6, metadata={"help": "num of decoder layers"} + ) + decoder_layerdrop: float = field( + default=0.0, metadata={"help": "decoder layerdrop chance"} + ) + decoder_attention_heads: int = field( + default=4, metadata={"help": "num decoder attention heads"} + ) + decoder_learned_pos: bool = field( + default=False, + metadata={"help": "use learned positional embeddings in the decoder"}, + ) + decoder_normalize_before: bool = field( + default=False, + metadata={"help": "apply layernorm before each decoder block"}, + ) + no_token_positional_embeddings: bool = field( + default=False, + metadata={ + "help": "if set, disables positional embeddings " + "(outside self attention)" + }, + ) + decoder_dropout: float = 
field( + default=0.0, metadata={"help": "dropout probability in the decoder"} + ) + decoder_attention_dropout: float = field( + default=0.0, + metadata={ + "help": "dropout probability for attention weights " + "inside the decoder" + }, + ) + decoder_activation_dropout: float = field( + default=0.0, + metadata={ + "help": "dropout probability after activation in FFN " + "inside the decoder" + }, + ) + max_target_positions: int = field( + default=2048, metadata={"help": "max target positions"} + ) + share_decoder_input_output_embed: bool = field( + default=False, + metadata={"help": "share decoder input and output embeddings"}, + ) + + +class HubertEncoder(FairseqEncoder): + def __init__(self, cfg: HubertAsrConfig, tgt_dict=None): + self.apply_mask = cfg.apply_mask + + arg_overrides = { + "dropout": cfg.dropout, + "activation_dropout": cfg.activation_dropout, + "dropout_input": cfg.dropout_input, + "attention_dropout": cfg.attention_dropout, + "mask_length": cfg.mask_length, + "mask_prob": cfg.mask_prob, + "mask_selection": cfg.mask_selection, + "mask_other": cfg.mask_other, + "no_mask_overlap": cfg.no_mask_overlap, + "mask_channel_length": cfg.mask_channel_length, + "mask_channel_prob": cfg.mask_channel_prob, + "mask_channel_selection": cfg.mask_channel_selection, + "mask_channel_other": cfg.mask_channel_other, + "no_mask_channel_overlap": cfg.no_mask_channel_overlap, + "encoder_layerdrop": cfg.layerdrop, + "feature_grad_mult": cfg.feature_grad_mult, + } + + if cfg.w2v_args is None: + state = checkpoint_utils.load_checkpoint_to_cpu( + cfg.w2v_path, arg_overrides + ) + w2v_args = state.get("cfg", None) + if w2v_args is None: + w2v_args = convert_namespace_to_omegaconf(state["args"]) + cfg.w2v_args = w2v_args + else: + state = None + w2v_args = cfg.w2v_args + if isinstance(w2v_args, Namespace): + cfg.w2v_args = w2v_args = convert_namespace_to_omegaconf( + w2v_args + ) + + assert cfg.normalize == w2v_args.task.normalize, ( + "Fine-tuning works best when data normalization is the same. 
" + "Please check that --normalize is set or unset for " + "both pre-training and here" + ) + + w2v_args.task.data = cfg.data + task = tasks.setup_task(w2v_args.task) + model = task.build_model(w2v_args.model) + + if state is not None and not cfg.no_pretrained_weights: + # set strict=False because we omit some modules + model.load_state_dict(state["model"], strict=False) + + model.remove_pretraining_modules() + + super().__init__(task.source_dictionary) + + d = w2v_args.model.encoder_embed_dim + + self.w2v_model = model + + self.final_dropout = nn.Dropout(cfg.final_dropout) + self.freeze_finetune_updates = cfg.freeze_finetune_updates + self.num_updates = 0 + + if tgt_dict is not None: + self.proj = Linear(d, len(tgt_dict)) + elif getattr(cfg, "decoder_embed_dim", d) != d: + self.proj = Linear(d, cfg.decoder_embed_dim) + else: + self.proj = None + + def set_num_updates(self, num_updates): + """Set the number of parameters updates.""" + super().set_num_updates(num_updates) + self.num_updates = num_updates + + def forward(self, source, padding_mask, tbc=True, **kwargs): + + w2v_args = { + "source": source, + "padding_mask": padding_mask, + "mask": self.apply_mask and self.training, + } + + ft = self.freeze_finetune_updates <= self.num_updates + + with torch.no_grad() if not ft else contextlib.ExitStack(): + x, padding_mask = self.w2v_model.extract_features(**w2v_args) + + if tbc: + # B x T x C -> T x B x C + x = x.transpose(0, 1) + + x = self.final_dropout(x) + + if self.proj: + x = self.proj(x) + + return { + "encoder_out": x, # T x B x C + "encoder_padding_mask": padding_mask, # B x T + "padding_mask": padding_mask, + } + + def reorder_encoder_out(self, encoder_out, new_order): + if encoder_out["encoder_out"] is not None: + encoder_out["encoder_out"] = encoder_out[ + "encoder_out" + ].index_select(1, new_order) + if encoder_out["encoder_padding_mask"] is not None: + encoder_out["encoder_padding_mask"] = encoder_out[ + "encoder_padding_mask" + ].index_select(0, new_order) + return encoder_out + + def max_positions(self): + """Maximum input length supported by the encoder.""" + return None + + def upgrade_state_dict_named(self, state_dict, name): + return state_dict + + +def Embedding(num_embeddings, embedding_dim, padding_idx): + m = nn.Embedding(num_embeddings, embedding_dim, padding_idx=padding_idx) + nn.init.normal_(m.weight, mean=0, std=embedding_dim ** -0.5) + nn.init.constant_(m.weight[padding_idx], 0) + return m + + +def Linear(in_features, out_features, bias=True): + m = nn.Linear(in_features, out_features, bias) + nn.init.xavier_uniform_(m.weight) + if bias: + nn.init.constant_(m.bias, 0.0) + return m diff --git a/SpeechT5/fairseq/fairseq/models/huggingface/__init__.py b/SpeechT5/fairseq/fairseq/models/huggingface/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..f7911c2c8edf516855023a285b18935e5389ec02 --- /dev/null +++ b/SpeechT5/fairseq/fairseq/models/huggingface/__init__.py @@ -0,0 +1,20 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. 
+ +import importlib +import os + + +# automatically import any Python files in the models/huggingface/ directory +models_dir = os.path.dirname(__file__) +for file in os.listdir(models_dir): + path = os.path.join(models_dir, file) + if ( + not file.startswith("_") + and not file.startswith(".") + and (file.endswith(".py") or os.path.isdir(path)) + ): + model_name = file[: file.find(".py")] if file.endswith(".py") else file + module = importlib.import_module("fairseq.models.huggingface." + model_name) diff --git a/SpeechT5/fairseq/fairseq/models/huggingface/hf_gpt2.py b/SpeechT5/fairseq/fairseq/models/huggingface/hf_gpt2.py new file mode 100644 index 0000000000000000000000000000000000000000..3a8eb78198f5808557092f814e92f1c9d72933ec --- /dev/null +++ b/SpeechT5/fairseq/fairseq/models/huggingface/hf_gpt2.py @@ -0,0 +1,168 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +import logging +import os +import sys +from typing import Dict, List, Optional + +import torch +from fairseq.models import ( + FairseqIncrementalDecoder, + FairseqLanguageModel, + register_model, + register_model_architecture, +) + + +logger = logging.getLogger(__name__) + + +DEFAULT_MAX_TARGET_POSITIONS = 1024 + + +@register_model("hf_gpt2") +class HuggingFaceGPT2LanguageModel(FairseqLanguageModel): + def __init__(self, decoder): + super().__init__(decoder) + + @staticmethod + def add_args(parser): + """Add model-specific arguments to the parser.""" + # fmt: off + parser.add_argument('--embed-dim', type=int, metavar='N', + help='embedding dimension') + parser.add_argument('--num-attention-heads', type=int, metavar='N', + help='num attention heads') + parser.add_argument('--num-layers', type=int, metavar='N', + help='num layers') + parser.add_argument('--dropout', type=float, metavar='D', + help='dropout probability for all fully connected layers ' + 'in the embeddings, encoder, and pooler') + parser.add_argument('--attention-dropout', type=float, metavar='D', + help='dropout probability for attention weights') + # fmt: on + + @classmethod + def build_model(cls, args, task): + """Build a new model instance.""" + default_architecture(args) + return cls(HuggingFaceGPT2Decoder(args, task)) + + +class HuggingFaceGPT2Decoder(FairseqIncrementalDecoder): + def __init__(self, args, task): + try: + from transformers import GPT2Config, GPT2LMHeadModel + except ImportError: + raise ImportError( + "\n\nPlease install huggingface/transformers with:" + "\n\n pip install transformers" + ) + + super().__init__(task.target_dictionary) + + config = GPT2Config( + vocab_size=len(task.target_dictionary), + n_positions=args.max_target_positions + 1, + n_ctx=args.max_target_positions, + n_embd=args.embed_dim, + n_layer=args.num_layers, + n_head=args.num_attention_heads, + resid_pdrop=args.dropout, + embd_pdrop=args.dropout, + attn_pdrop=args.attention_dropout, + layer_norm_epsilon=1e-6, + ) + self.model = GPT2LMHeadModel(config) + + # set zero embedding for padding symbol + self.pad_idx = task.target_dictionary.pad() + self.model.transformer.wte.weight.data[self.pad_idx].zero_() + self.model.transformer.wpe.weight.data[0].zero_() + + def forward( + self, + prev_output_tokens, + src_lengths=None, + incremental_state: Optional[Dict[str, List[torch.Tensor]]] = None, + encoder_out=None, + ): + features = self.extract_features(prev_output_tokens, incremental_state) + lm_logits = self.model.lm_head(features) + return 
(lm_logits,) + + def extract_features( + self, + prev_output_tokens, + incremental_state: Optional[Dict[str, List[torch.Tensor]]] = None, + ): + if incremental_state: + past = self.get_incremental_state("past") + else: + past = None + + # don't attend to padding symbols + attention_mask = prev_output_tokens.ne(self.pad_idx).int() + + # set position ids to exclude padding symbols + position_ids = attention_mask * ( + torch.arange(1, 1 + prev_output_tokens.size(1)) + .to(prev_output_tokens) + .repeat(prev_output_tokens.size(0), 1) + ) + + outputs = self.model.transformer( + input_ids=prev_output_tokens, + past=past, + attention_mask=attention_mask, + position_ids=position_ids, + ) + last_hidden_states = outputs[0] + + if incremental_state: + self.set_incremental_state(incremental_state, "past", outputs[1]) + + return last_hidden_states + + def max_positions(self): + return self.model.config.n_positions - 1 + + +@register_model_architecture("hf_gpt2", "hf_gpt2") +def default_architecture(args): + if getattr(args, "max_target_positions", None) is None: + args.max_target_positions = getattr( + args, "tokens_per_sample", DEFAULT_MAX_TARGET_POSITIONS + ) + args.embed_dim = getattr(args, "embed_dim", 768) + args.num_attention_heads = getattr(args, "num_attention_heads", 12) + args.num_layers = getattr(args, "num_layers", 12) + args.dropout = getattr(args, "dropout", 0.1) + args.attention_dropout = getattr(args, "attention_dropout", 0.1) + + +@register_model_architecture("hf_gpt2", "hf_gpt2_medium") +def hf_gpt2_medium(args): + args.embed_dim = getattr(args, "embed_dim", 1024) + args.num_attention_heads = getattr(args, "num_attention_heads", 16) + args.num_layers = getattr(args, "num_layers", 24) + default_architecture(args) + + +@register_model_architecture("hf_gpt2", "hf_gpt2_large") +def hf_gpt2_large(args): + args.embed_dim = getattr(args, "embed_dim", 1280) + args.num_attention_heads = getattr(args, "num_attention_heads", 20) + args.num_layers = getattr(args, "num_layers", 36) + default_architecture(args) + + +@register_model_architecture("hf_gpt2", "hf_gpt2_xl") +def hf_gpt2_xl(args): + args.embed_dim = getattr(args, "embed_dim", 1600) + args.num_attention_heads = getattr(args, "num_attention_heads", 25) + args.num_layers = getattr(args, "num_layers", 48) + default_architecture(args) diff --git a/SpeechT5/fairseq/fairseq/models/lightconv.py b/SpeechT5/fairseq/fairseq/models/lightconv.py new file mode 100644 index 0000000000000000000000000000000000000000..b614da366513091132c8b6bd8b8e170cce33a1c4 --- /dev/null +++ b/SpeechT5/fairseq/fairseq/models/lightconv.py @@ -0,0 +1,1018 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +import math + +import torch +import torch.nn as nn +import torch.nn.functional as F +from fairseq import utils +from fairseq.models import ( + FairseqEncoder, + FairseqEncoderDecoderModel, + FairseqIncrementalDecoder, + register_model, + register_model_architecture, +) +from fairseq.modules import ( + AdaptiveSoftmax, + DynamicConv, + FairseqDropout, + LayerNorm, + LightweightConv, + MultiheadAttention, + PositionalEmbedding, +) + + +@register_model("lightconv") +class LightConvModel(FairseqEncoderDecoderModel): + """ + LightConv and DynamicConv model from `"Pay Less Attention with Lightweight and Dynamic Convolutions" (Wu, et al, 2019) + <https://openreview.net/pdf?id=SkVhlh09tX>`_. 
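For intuition, the core operation wrapped by ``LightweightConv``/``DynamicConv`` in the layers of this file is a depthwise 1-D convolution whose kernel is softmax-normalized over positions and shared across channel heads. A hedged, self-contained sketch (simplified to batch-first tensors; the actual fairseq modules also provide the dynamic, position-dependent variant and CUDA kernels):

```python
import torch
import torch.nn.functional as F


def lightweight_conv(x: torch.Tensor, weight: torch.Tensor, padding_l: int) -> torch.Tensor:
    """Sketch of lightweight convolution (Wu et al., 2019).

    x:      (batch, channels, time)
    weight: (num_heads, kernel_size), with num_heads dividing channels
    """
    B, C, T = x.shape
    H, K = weight.shape
    w = F.softmax(weight, dim=-1)              # normalize each head's kernel over positions
    w = w.repeat_interleave(C // H, dim=0)     # each head's kernel is shared by C // H channels
    w = w.unsqueeze(1)                         # (C, 1, K) depthwise kernels
    x = F.pad(x, (padding_l, K - 1 - padding_l))
    return F.conv1d(x, w, groups=C)            # depthwise (grouped) convolution, output (B, C, T)


y = lightweight_conv(torch.randn(2, 8, 16), torch.randn(4, 3), padding_l=1)
print(y.shape)  # torch.Size([2, 8, 16])
```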
+ To use LightConv please set ``--encoder-conv-type lightweight --decoder-conv-type lightweight`` + To use DynamicConv please set ``--encoder-conv-type dynamic --decoder-conv-type dynamic`` + + Args: + encoder (LightConvEncoder): the encoder + decoder (LightConvDecoder): the decoder + + The LightConv model provides the following named architectures and + command-line arguments: + + .. argparse:: + :ref: fairseq.models.lightconv_parser + :prog: + """ + + @classmethod + def hub_models(cls): + # fmt: off + + def moses_subword(path): + return { + 'path': path, + 'tokenizer': 'moses', + 'bpe': 'subword_nmt', + } + + return { + 'lightconv.no_glu.iwslt14.de-en': moses_subword('https://dl.fbaipublicfiles.com/fairseq/models/dynamicconv/iwslt14.de-en.lightconv.tar.gz'), + 'dynamicconv.no_glu.iwslt14.de-en': moses_subword('https://dl.fbaipublicfiles.com/fairseq/models/dynamicconv/iwslt14.de-en.dynamicconv.tar.gz'), + 'lightconv.no_glu.wmt16.en-de': moses_subword('https://dl.fbaipublicfiles.com/fairseq/models/dynamicconv/wmt16.en-de.joined-dict.lightconv.tar.gz'), + 'dynamicconv.no_glu.wmt16.en-de': moses_subword('https://dl.fbaipublicfiles.com/fairseq/models/dynamicconv/wmt16.en-de.joined-dict.dynamicconv.tar.gz'), + 'lightconv.glu.wmt16.en-de': moses_subword('https://dl.fbaipublicfiles.com/fairseq/models/dynamicconv/wmt16.en-de.joined-dict.lightconv-glu.tar.gz'), + 'dynamicconv.glu.wmt16.en-de': moses_subword('https://dl.fbaipublicfiles.com/fairseq/models/dynamicconv/wmt16.en-de.joined-dict.dynamicconv-glu.tar.gz'), + 'lightconv.glu.wmt17.en-de': moses_subword('https://dl.fbaipublicfiles.com/fairseq/models/dynamicconv/wmt16.en-de.joined-dict.lightconv-glu.tar.gz'), + 'dynamicconv.glu.wmt17.en-de': moses_subword('https://dl.fbaipublicfiles.com/fairseq/models/dynamicconv/wmt16.en-de.joined-dict.dynamicconv-glu.tar.gz'), + 'lightconv.glu.wmt14.en-fr': moses_subword('https://dl.fbaipublicfiles.com/fairseq/models/dynamicconv/wmt14.en-fr.joined-dict.lightconv-glu.tar.gz'), + 'dynamicconv.glu.wmt14.en-fr': moses_subword('https://dl.fbaipublicfiles.com/fairseq/models/dynamicconv/wmt14.en-fr.joined-dict.dynamicconv-glu.tar.gz'), + 'lightconv.glu.wmt17.zh-en': moses_subword('https://dl.fbaipublicfiles.com/fairseq/models/dynamicconv/wmt17.zh-en.lightconv-glu.tar.gz'), + 'dynamicconv.glu.wmt17.zh-en': moses_subword('https://dl.fbaipublicfiles.com/fairseq/models/dynamicconv/wmt17.zh-en.dynamicconv-glu.tar.gz'), + } + # fmt: on + + def __init__(self, encoder, decoder): + super().__init__(encoder, decoder) + + @staticmethod + def add_args(parser): + """Add model-specific arguments to the parser.""" + parser.add_argument( + "--dropout", type=float, metavar="D", help="dropout probability" + ) + parser.add_argument( + "--attention-dropout", + type=float, + metavar="D", + help="dropout probability for attention weights", + ) + parser.add_argument( + "--relu-dropout", + type=float, + metavar="D", + help="dropout probability after ReLU in FFN", + ) + parser.add_argument( + "--input-dropout", + type=float, + metavar="D", + help="dropout probability of the inputs", + ) + parser.add_argument( + "--encoder-embed-path", + type=str, + metavar="STR", + help="path to pre-trained encoder embedding", + ) + parser.add_argument( + "--encoder-embed-dim", + type=int, + metavar="N", + help="encoder embedding dimension", + ) + parser.add_argument( + "--encoder-conv-dim", + type=int, + metavar="N", + help="encoder embedding dimension", + ) + parser.add_argument( + "--encoder-ffn-embed-dim", + type=int, + metavar="N", + help="encoder 
embedding dimension for FFN", + ) + parser.add_argument( + "--encoder-layers", type=int, metavar="N", help="num encoder layers" + ) + parser.add_argument( + "--encoder-attention-heads", + type=int, + metavar="N", + help="num encoder attention heads or LightConv/DynamicConv heads", + ) + parser.add_argument( + "--encoder-normalize-before", + action="store_true", + help="apply layernorm before each encoder block", + ) + parser.add_argument( + "--encoder-learned-pos", + action="store_true", + help="use learned positional embeddings in the encoder", + ) + parser.add_argument( + "--decoder-embed-path", + type=str, + metavar="STR", + help="path to pre-trained decoder embedding", + ) + parser.add_argument( + "--decoder-embed-dim", + type=int, + metavar="N", + help="decoder embedding dimension", + ) + parser.add_argument( + "--decoder-conv-dim", + type=int, + metavar="N", + help="decoder embedding dimension", + ) + parser.add_argument( + "--decoder-ffn-embed-dim", + type=int, + metavar="N", + help="decoder embedding dimension for FFN", + ) + parser.add_argument( + "--decoder-layers", type=int, metavar="N", help="num decoder layers" + ) + parser.add_argument( + "--decoder-attention-heads", + type=int, + metavar="N", + help="num decoder attention heads or LightConv/DynamicConv heads", + ) + parser.add_argument( + "--decoder-learned-pos", + action="store_true", + help="use learned positional embeddings in the decoder", + ) + parser.add_argument( + "--decoder-normalize-before", + action="store_true", + help="apply layernorm before each decoder block", + ) + parser.add_argument( + "--share-decoder-input-output-embed", + action="store_true", + help="share decoder input and output embeddings", + ) + parser.add_argument( + "--share-all-embeddings", + action="store_true", + help="share encoder, decoder and output embeddings" + " (requires shared dictionary and embed dim)", + ) + parser.add_argument( + "--adaptive-softmax-cutoff", + metavar="EXPR", + help="comma separated list of adaptive softmax cutoff points. 
" + "Must be used with adaptive_loss criterion", + ), + parser.add_argument( + "--adaptive-softmax-dropout", + type=float, + metavar="D", + help="sets adaptive softmax dropout for the tail projections", + ) + + """LightConv and DynamicConv arguments""" + parser.add_argument( + "--encoder-kernel-size-list", + type=lambda x: utils.eval_str_list(x, int), + help='list of kernel size (default: "[3,7,15,31,31,31,31]")', + ) + parser.add_argument( + "--decoder-kernel-size-list", + type=lambda x: utils.eval_str_list(x, int), + help='list of kernel size (default: "[3,7,15,31,31,31]")', + ) + parser.add_argument( + "--encoder-glu", type=utils.eval_bool, help="glu after in proj" + ) + parser.add_argument( + "--decoder-glu", type=utils.eval_bool, help="glu after in proj" + ) + parser.add_argument( + "--encoder-conv-type", + default="dynamic", + type=str, + choices=["dynamic", "lightweight"], + help="type of convolution", + ) + parser.add_argument( + "--decoder-conv-type", + default="dynamic", + type=str, + choices=["dynamic", "lightweight"], + help="type of convolution", + ) + parser.add_argument("--weight-softmax", default=True, type=utils.eval_bool) + parser.add_argument( + "--weight-dropout", + type=float, + metavar="D", + help="dropout probability for conv weights", + ) + + @classmethod + def build_model(cls, args, task): + """Build a new model instance.""" + + # make sure all arguments are present in older models + base_architecture(args) + + if not hasattr(args, "max_source_positions"): + args.max_source_positions = 1024 + if not hasattr(args, "max_target_positions"): + args.max_target_positions = 1024 + + src_dict, tgt_dict = task.source_dictionary, task.target_dictionary + + def build_embedding(dictionary, embed_dim, path=None): + num_embeddings = len(dictionary) + padding_idx = dictionary.pad() + emb = Embedding(num_embeddings, embed_dim, padding_idx) + # if provided, load from preloaded dictionaries + if path: + embed_dict = utils.parse_embedding(path) + utils.load_embedding(embed_dict, dictionary, emb) + return emb + + if args.share_all_embeddings: + if src_dict != tgt_dict: + raise RuntimeError( + "--share-all-embeddings requires a joined dictionary" + ) + if args.encoder_embed_dim != args.decoder_embed_dim: + raise RuntimeError( + "--share-all-embeddings requires --encoder-embed-dim to match --decoder-embed-dim" + ) + if args.decoder_embed_path and ( + args.decoder_embed_path != args.encoder_embed_path + ): + raise RuntimeError( + "--share-all-embeddings not compatible with --decoder-embed-path" + ) + encoder_embed_tokens = build_embedding( + src_dict, args.encoder_embed_dim, args.encoder_embed_path + ) + decoder_embed_tokens = encoder_embed_tokens + args.share_decoder_input_output_embed = True + else: + encoder_embed_tokens = build_embedding( + src_dict, args.encoder_embed_dim, args.encoder_embed_path + ) + decoder_embed_tokens = build_embedding( + tgt_dict, args.decoder_embed_dim, args.decoder_embed_path + ) + + encoder = LightConvEncoder(args, src_dict, encoder_embed_tokens) + decoder = LightConvDecoder(args, tgt_dict, decoder_embed_tokens) + return LightConvModel(encoder, decoder) + + +class LightConvEncoder(FairseqEncoder): + """ + LightConv encoder consisting of *args.encoder_layers* layers. Each layer + is a :class:`LightConvEncoderLayer`. 
+ + Args: + args (argparse.Namespace): parsed command-line arguments + dictionary (~fairseq.data.Dictionary): encoding dictionary + embed_tokens (torch.nn.Embedding): input embedding + """ + + def __init__(self, args, dictionary, embed_tokens): + super().__init__(dictionary) + self.dropout_module = FairseqDropout( + args.dropout, module_name=self.__class__.__name__ + ) + + embed_dim = embed_tokens.embedding_dim + self.padding_idx = embed_tokens.padding_idx + self.max_source_positions = args.max_source_positions + + self.embed_tokens = embed_tokens + self.embed_scale = math.sqrt(embed_dim) + self.embed_positions = ( + PositionalEmbedding( + args.max_source_positions, + embed_dim, + self.padding_idx, + learned=args.encoder_learned_pos, + ) + if not args.no_token_positional_embeddings + else None + ) + + self.layers = nn.ModuleList([]) + self.layers.extend( + [ + LightConvEncoderLayer( + args, kernel_size=args.encoder_kernel_size_list[i] + ) + for i in range(args.encoder_layers) + ] + ) + self.register_buffer("version", torch.Tensor([2])) + self.normalize = args.encoder_normalize_before + if self.normalize: + self.layer_norm = LayerNorm(embed_dim) + + def forward(self, src_tokens, **unused): + """ + Args: + src_tokens (LongTensor): tokens in the source language of shape + `(batch, src_len)` + + Returns: + dict: + - **encoder_out** (Tensor): the last encoder layer's output of + shape `(src_len, batch, embed_dim)` + - **encoder_padding_mask** (ByteTensor): the positions of + padding elements of shape `(batch, src_len)` + """ + # embed tokens and positions + x = self.embed_scale * self.embed_tokens(src_tokens) + if self.embed_positions is not None: + x += self.embed_positions(src_tokens) + x = self.dropout_module(x) + + # B x T x C -> T x B x C + x = x.transpose(0, 1) + + # compute padding mask + encoder_padding_mask = src_tokens.eq(self.padding_idx) + if not encoder_padding_mask.any(): + encoder_padding_mask = None + + # encoder layers + for layer in self.layers: + x = layer(x, encoder_padding_mask) + + if self.normalize: + x = self.layer_norm(x) + + return { + "encoder_out": x, # T x B x C + "encoder_padding_mask": encoder_padding_mask, # B x T + } + + def reorder_encoder_out(self, encoder_out, new_order): + """ + Reorder encoder output according to *new_order*. + + Args: + encoder_out: output from the ``forward()`` method + new_order (LongTensor): desired order + + Returns: + *encoder_out* rearranged according to *new_order* + """ + if encoder_out["encoder_out"] is not None: + encoder_out["encoder_out"] = encoder_out["encoder_out"].index_select( + 1, new_order + ) + if encoder_out["encoder_padding_mask"] is not None: + encoder_out["encoder_padding_mask"] = encoder_out[ + "encoder_padding_mask" + ].index_select(0, new_order) + return encoder_out + + def max_positions(self): + """Maximum input length supported by the encoder.""" + if self.embed_positions is None: + return self.max_source_positions + return min(self.max_source_positions, self.embed_positions.max_positions) + + +class LightConvDecoder(FairseqIncrementalDecoder): + """ + LightConv decoder consisting of *args.decoder_layers* layers. Each layer + is a :class:`LightConvDecoderLayer`. + + Args: + args (argparse.Namespace): parsed command-line arguments + dictionary (~fairseq.data.Dictionary): decoding dictionary + embed_tokens (torch.nn.Embedding): output embedding + no_encoder_attn (bool, optional): whether to attend to encoder outputs. 
+ Default: ``False`` + """ + + def __init__( + self, args, dictionary, embed_tokens, no_encoder_attn=False, final_norm=True + ): + super().__init__(dictionary) + self.dropout_module = FairseqDropout( + args.dropout, module_name=self.__class__.__name__ + ) + self.share_input_output_embed = args.share_decoder_input_output_embed + + input_embed_dim = embed_tokens.embedding_dim + embed_dim = args.decoder_embed_dim + output_embed_dim = args.decoder_output_dim + + padding_idx = embed_tokens.padding_idx + self.max_target_positions = args.max_target_positions + + self.embed_tokens = embed_tokens + self.embed_scale = math.sqrt(embed_dim) # todo: try with input_embed_dim + + self.project_in_dim = ( + Linear(input_embed_dim, embed_dim, bias=False) + if embed_dim != input_embed_dim + else None + ) + + self.embed_positions = ( + PositionalEmbedding( + args.max_target_positions, + embed_dim, + padding_idx, + learned=args.decoder_learned_pos, + ) + if not args.no_token_positional_embeddings + else None + ) + + self.layers = nn.ModuleList([]) + self.layers.extend( + [ + LightConvDecoderLayer( + args, no_encoder_attn, kernel_size=args.decoder_kernel_size_list[i] + ) + for i in range(args.decoder_layers) + ] + ) + + self.adaptive_softmax = None + + self.project_out_dim = ( + Linear(embed_dim, output_embed_dim, bias=False) + if embed_dim != output_embed_dim and not args.tie_adaptive_weights + else None + ) + + if args.adaptive_softmax_cutoff is not None: + self.adaptive_softmax = AdaptiveSoftmax( + len(dictionary), + output_embed_dim, + utils.eval_str_list(args.adaptive_softmax_cutoff, type=int), + dropout=args.adaptive_softmax_dropout, + adaptive_inputs=embed_tokens if args.tie_adaptive_weights else None, + factor=args.adaptive_softmax_factor, + tie_proj=args.tie_adaptive_proj, + ) + elif not self.share_input_output_embed: + self.embed_out = nn.Parameter( + torch.Tensor(len(dictionary), output_embed_dim) + ) + nn.init.normal_(self.embed_out, mean=0, std=output_embed_dim ** -0.5) + self.register_buffer("version", torch.Tensor([2])) + self.normalize = args.decoder_normalize_before and final_norm + if self.normalize: + self.layer_norm = LayerNorm(embed_dim) + + def forward( + self, prev_output_tokens, encoder_out=None, incremental_state=None, **kwargs + ): + """ + Args: + prev_output_tokens (LongTensor): previous decoder outputs of shape + `(batch, tgt_len)`, for teacher forcing + encoder_out (Tensor, optional): output from the encoder, used for + encoder-side attention + incremental_state (dict): dictionary used for storing state during + :ref:`Incremental decoding` + + Returns: + tuple: + - the last decoder layer's output of shape `(batch, tgt_len, + vocab)` + - the last decoder layer's attention weights of shape `(batch, + tgt_len, src_len)` + """ + # embed positions + positions = ( + self.embed_positions( + prev_output_tokens, + incremental_state=incremental_state, + ) + if self.embed_positions is not None + else None + ) + + if incremental_state is not None: + prev_output_tokens = prev_output_tokens[:, -1:] + if positions is not None: + positions = positions[:, -1:] + + # embed tokens and positions + x = self.embed_scale * self.embed_tokens(prev_output_tokens) + + if self.project_in_dim is not None: + x = self.project_in_dim(x) + + if positions is not None: + x += positions + x = self.dropout_module(x) + + # B x T x C -> T x B x C + x = x.transpose(0, 1) + attn = None + + inner_states = [x] + + # decoder layers + for layer in self.layers: + x, attn = layer( + x, + encoder_out["encoder_out"] if encoder_out 
is not None else None, + encoder_out["encoder_padding_mask"] + if encoder_out is not None + else None, + incremental_state, + ) + inner_states.append(x) + + if self.normalize: + x = self.layer_norm(x) + + # T x B x C -> B x T x C + x = x.transpose(0, 1) + + if self.project_out_dim is not None: + x = self.project_out_dim(x) + + if self.adaptive_softmax is None: + # project back to size of vocabulary + if self.share_input_output_embed: + x = F.linear(x, self.embed_tokens.weight) + else: + x = F.linear(x, self.embed_out) + + return x, {"attn": attn, "inner_states": inner_states} + + def max_positions(self): + """Maximum output length supported by the decoder.""" + if self.embed_positions is None: + return self.max_target_positions + return min(self.max_target_positions, self.embed_positions.max_positions) + + def buffered_future_mask(self, tensor): + dim = tensor.size(0) + if ( + not hasattr(self, "_future_mask") + or self._future_mask is None + or self._future_mask.device != tensor.device + ): + self._future_mask = torch.triu( + utils.fill_with_neg_inf(tensor.new(dim, dim)), 1 + ) + if self._future_mask.size(0) < dim: + self._future_mask = torch.triu( + utils.fill_with_neg_inf(self._future_mask.resize_(dim, dim)), 1 + ) + return self._future_mask[:dim, :dim] + + +class LightConvEncoderLayer(nn.Module): + """Encoder layer block. + + Args: + args (argparse.Namespace): parsed command-line arguments + kernel_size: kernel size of the convolution + """ + + def __init__(self, args, kernel_size=0): + super().__init__() + self.embed_dim = args.encoder_embed_dim + self.conv_dim = args.encoder_conv_dim + padding_l = ( + kernel_size // 2 + if kernel_size % 2 == 1 + else ((kernel_size - 1) // 2, kernel_size // 2) + ) + + if args.encoder_glu: + self.linear1 = Linear(self.embed_dim, 2 * self.conv_dim) + self.act = nn.GLU() + else: + self.linear1 = Linear(self.embed_dim, self.conv_dim) + self.act = None + if args.encoder_conv_type == "lightweight": + self.conv = LightweightConv( + self.conv_dim, + kernel_size, + padding_l=padding_l, + weight_softmax=args.weight_softmax, + num_heads=args.encoder_attention_heads, + weight_dropout=args.weight_dropout, + ) + elif args.encoder_conv_type == "dynamic": + self.conv = DynamicConv( + self.conv_dim, + kernel_size, + padding_l=padding_l, + weight_softmax=args.weight_softmax, + num_heads=args.encoder_attention_heads, + weight_dropout=args.weight_dropout, + ) + else: + raise NotImplementedError + self.linear2 = Linear(self.conv_dim, self.embed_dim) + + self.dropout_module = FairseqDropout( + args.dropout, module_name=self.__class__.__name__ + ) + self.relu_dropout_module = FairseqDropout( + args.relu_dropout, module_name=self.__class__.__name__ + ) + self.input_dropout_module = FairseqDropout( + args.input_dropout, module_name=self.__class__.__name__ + ) + self.normalize_before = args.encoder_normalize_before + self.fc1 = Linear(self.embed_dim, args.encoder_ffn_embed_dim) + self.fc2 = Linear(args.encoder_ffn_embed_dim, self.embed_dim) + self.layer_norms = nn.ModuleList([LayerNorm(self.embed_dim) for _ in range(2)]) + + def forward(self, x, encoder_padding_mask): + """ + Args: + x (Tensor): input to the layer of shape `(seq_len, batch, embed_dim)` + encoder_padding_mask (ByteTensor): binary ByteTensor of shape + `(batch, src_len)` where padding elements are indicated by ``1``. 
+ + Returns: + encoded output of shape `(batch, src_len, embed_dim)` + """ + residual = x + x = self.maybe_layer_norm(0, x, before=True) + x = self.input_dropout_module(x) + x = self.linear1(x) + if self.act is not None: + x = self.act(x) + if encoder_padding_mask is not None: + x = x.masked_fill(encoder_padding_mask.transpose(0, 1).unsqueeze(2), 0) + x = self.conv(x) + x = self.linear2(x) + x = self.dropout_module(x) + x = residual + x + x = self.maybe_layer_norm(0, x, after=True) + + residual = x + x = self.maybe_layer_norm(1, x, before=True) + x = F.relu(self.fc1(x)) + x = self.relu_dropout_module(x) + x = self.fc2(x) + x = self.dropout_module(x) + x = residual + x + x = self.maybe_layer_norm(1, x, after=True) + return x + + def maybe_layer_norm(self, i, x, before=False, after=False): + assert before ^ after + if after ^ self.normalize_before: + return self.layer_norms[i](x) + else: + return x + + def extra_repr(self): + return ( + "dropout={}, relu_dropout={}, input_dropout={}, normalize_before={}".format( + self.dropout_module.p, + self.relu_dropout_module.p, + self.input_dropout_module.p, + self.normalize_before, + ) + ) + + +class LightConvDecoderLayer(nn.Module): + """Decoder layer block. + + Args: + args (argparse.Namespace): parsed command-line arguments + no_encoder_attn (bool, optional): whether to attend to encoder outputs. + Default: ``False`` + kernel_size: kernel size of the convolution + """ + + def __init__(self, args, no_encoder_attn=False, kernel_size=0): + super().__init__() + self.embed_dim = args.decoder_embed_dim + self.conv_dim = args.decoder_conv_dim + if args.decoder_glu: + self.linear1 = Linear(self.embed_dim, 2 * self.conv_dim) + self.act = nn.GLU() + else: + self.linear1 = Linear(self.embed_dim, self.conv_dim) + self.act = None + if args.decoder_conv_type == "lightweight": + self.conv = LightweightConv( + self.conv_dim, + kernel_size, + padding_l=kernel_size - 1, + weight_softmax=args.weight_softmax, + num_heads=args.decoder_attention_heads, + weight_dropout=args.weight_dropout, + ) + elif args.decoder_conv_type == "dynamic": + self.conv = DynamicConv( + self.conv_dim, + kernel_size, + padding_l=kernel_size - 1, + weight_softmax=args.weight_softmax, + num_heads=args.decoder_attention_heads, + weight_dropout=args.weight_dropout, + ) + else: + raise NotImplementedError + self.linear2 = Linear(self.conv_dim, self.embed_dim) + + self.dropout_module = FairseqDropout( + args.dropout, module_name=self.__class__.__name__ + ) + self.relu_dropout_module = FairseqDropout( + args.relu_dropout, module_name=self.__class__.__name__ + ) + self.input_dropout_module = FairseqDropout( + args.input_dropout, module_name=self.__class__.__name__ + ) + self.normalize_before = args.decoder_normalize_before + + self.conv_layer_norm = LayerNorm(self.embed_dim) + + if no_encoder_attn: + self.encoder_attn = None + self.encoder_attn_layer_norm = None + else: + self.encoder_attn = MultiheadAttention( + self.embed_dim, + args.decoder_attention_heads, + dropout=args.attention_dropout, + encoder_decoder_attention=True, + ) + self.encoder_attn_layer_norm = LayerNorm(self.embed_dim) + + self.fc1 = Linear(self.embed_dim, args.decoder_ffn_embed_dim) + self.fc2 = Linear(args.decoder_ffn_embed_dim, self.embed_dim) + + self.final_layer_norm = LayerNorm(self.embed_dim) + self.need_attn = True + + def forward( + self, + x, + encoder_out, + encoder_padding_mask, + incremental_state, + prev_conv_state=None, + prev_attn_state=None, + conv_mask=None, + conv_padding_mask=None, + ): + """ + Args: + x 
(Tensor): input to the layer of shape `(seq_len, batch, embed_dim)` + encoder_padding_mask (ByteTensor): binary ByteTensor of shape + `(batch, src_len)` where padding elements are indicated by ``1``. + + Returns: + encoded output of shape `(batch, src_len, embed_dim)` + """ + residual = x + x = self.maybe_layer_norm(self.conv_layer_norm, x, before=True) + if prev_conv_state is not None: + if incremental_state is None: + incremental_state = {} + self.conv._set_input_buffer(incremental_state, prev_conv_state) + x = self.input_dropout_module(x) + x = self.linear1(x) + if self.act is not None: + x = self.act(x) + x = self.conv(x, incremental_state=incremental_state) + x = self.linear2(x) + x = self.dropout_module(x) + x = residual + x + x = self.maybe_layer_norm(self.conv_layer_norm, x, after=True) + + attn = None + if self.encoder_attn is not None: + residual = x + x = self.maybe_layer_norm(self.encoder_attn_layer_norm, x, before=True) + if prev_attn_state is not None: + if incremental_state is None: + incremental_state = {} + prev_key, prev_value = prev_attn_state + saved_state = {"prev_key": prev_key, "prev_value": prev_value} + self.encoder_attn._set_input_buffer(incremental_state, saved_state) + x, attn = self.encoder_attn( + query=x, + key=encoder_out, + value=encoder_out, + key_padding_mask=encoder_padding_mask, + incremental_state=incremental_state, + static_kv=True, + need_weights=(not self.training and self.need_attn), + ) + x = self.dropout_module(x) + x = residual + x + x = self.maybe_layer_norm(self.encoder_attn_layer_norm, x, after=True) + + residual = x + x = self.maybe_layer_norm(self.final_layer_norm, x, before=True) + x = F.relu(self.fc1(x)) + x = self.relu_dropout_module(x) + x = self.fc2(x) + x = self.dropout_module(x) + x = residual + x + x = self.maybe_layer_norm(self.final_layer_norm, x, after=True) + return x, attn + + def maybe_layer_norm(self, layer_norm, x, before=False, after=False): + assert before ^ after + if after ^ self.normalize_before: + return layer_norm(x) + else: + return x + + def make_generation_fast_(self, need_attn=False, **kwargs): + self.need_attn = need_attn + + def extra_repr(self): + return ( + "dropout={}, relu_dropout={}, input_dropout={}, normalize_before={}".format( + self.dropout_module.p, + self.relu_dropout_module.p, + self.input_dropout_module.p, + self.normalize_before, + ) + ) + + +def Embedding(num_embeddings, embedding_dim, padding_idx): + m = nn.Embedding(num_embeddings, embedding_dim, padding_idx=padding_idx) + nn.init.normal_(m.weight, mean=0, std=embedding_dim ** -0.5) + nn.init.constant_(m.weight[padding_idx], 0) + return m + + +def Linear(in_features, out_features, bias=True): + m = nn.Linear(in_features, out_features, bias) + nn.init.xavier_uniform_(m.weight) + if bias: + nn.init.constant_(m.bias, 0.0) + return m + + +@register_model_architecture("lightconv", "lightconv") +def base_architecture(args): + args.encoder_embed_path = getattr(args, "encoder_embed_path", None) + args.encoder_embed_dim = getattr(args, "encoder_embed_dim", 512) + args.encoder_ffn_embed_dim = getattr(args, "encoder_ffn_embed_dim", 2048) + args.encoder_layers = getattr(args, "encoder_layers", 7) + args.encoder_attention_heads = getattr(args, "encoder_attention_heads", 8) + args.encoder_normalize_before = getattr(args, "encoder_normalize_before", False) + args.encoder_learned_pos = getattr(args, "encoder_learned_pos", False) + args.decoder_embed_path = getattr(args, "decoder_embed_path", None) + args.decoder_embed_dim = getattr(args, "decoder_embed_dim", 
args.encoder_embed_dim) + args.decoder_ffn_embed_dim = getattr( + args, "decoder_ffn_embed_dim", args.encoder_ffn_embed_dim + ) + args.decoder_layers = getattr(args, "decoder_layers", 6) + args.decoder_attention_heads = getattr(args, "decoder_attention_heads", 8) + args.decoder_normalize_before = getattr(args, "decoder_normalize_before", False) + args.decoder_learned_pos = getattr(args, "decoder_learned_pos", False) + args.attention_dropout = getattr(args, "attention_dropout", 0.0) + args.relu_dropout = getattr(args, "relu_dropout", 0.0) + args.dropout = getattr(args, "dropout", 0.1) + args.adaptive_softmax_cutoff = getattr(args, "adaptive_softmax_cutoff", None) + args.adaptive_softmax_dropout = getattr(args, "adaptive_softmax_dropout", 0) + args.share_decoder_input_output_embed = getattr( + args, "share_decoder_input_output_embed", False + ) + args.share_all_embeddings = getattr(args, "share_all_embeddings", False) + args.no_token_positional_embeddings = getattr( + args, "no_token_positional_embeddings", False + ) + + args.decoder_output_dim = getattr( + args, "decoder_output_dim", args.decoder_embed_dim + ) + args.decoder_input_dim = getattr(args, "decoder_input_dim", args.decoder_embed_dim) + + args.encoder_conv_dim = getattr(args, "encoder_conv_dim", args.encoder_embed_dim) + args.decoder_conv_dim = getattr(args, "decoder_conv_dim", args.decoder_embed_dim) + + args.encoder_kernel_size_list = getattr( + args, "encoder_kernel_size_list", [3, 7, 15, 31, 31, 31, 31] + ) + args.decoder_kernel_size_list = getattr( + args, "decoder_kernel_size_list", [3, 7, 15, 31, 31, 31] + ) + if len(args.encoder_kernel_size_list) == 1: + args.encoder_kernel_size_list = ( + args.encoder_kernel_size_list * args.encoder_layers + ) + if len(args.decoder_kernel_size_list) == 1: + args.decoder_kernel_size_list = ( + args.decoder_kernel_size_list * args.decoder_layers + ) + assert ( + len(args.encoder_kernel_size_list) == args.encoder_layers + ), "encoder_kernel_size_list doesn't match encoder_layers" + assert ( + len(args.decoder_kernel_size_list) == args.decoder_layers + ), "decoder_kernel_size_list doesn't match decoder_layers" + args.encoder_glu = getattr(args, "encoder_glu", True) + args.decoder_glu = getattr(args, "decoder_glu", True) + args.input_dropout = getattr(args, "input_dropout", 0.1) + args.weight_dropout = getattr(args, "weight_dropout", args.attention_dropout) + + +@register_model_architecture("lightconv", "lightconv_iwslt_de_en") +def lightconv_iwslt_de_en(args): + args.encoder_embed_dim = getattr(args, "encoder_embed_dim", 512) + args.encoder_ffn_embed_dim = getattr(args, "encoder_ffn_embed_dim", 1024) + args.encoder_attention_heads = getattr(args, "encoder_attention_heads", 4) + args.encoder_layers = getattr(args, "encoder_layers", 7) + args.decoder_embed_dim = getattr(args, "decoder_embed_dim", 512) + args.decoder_ffn_embed_dim = getattr(args, "decoder_ffn_embed_dim", 1024) + args.decoder_attention_heads = getattr(args, "decoder_attention_heads", 4) + args.decoder_layers = getattr(args, "decoder_layers", 6) + args.attention_dropout = getattr(args, "attention_dropout", 0.1) + args.weight_dropout = getattr(args, "weight_dropout", 0.1) + args.encoder_glu = getattr(args, "encoder_glu", False) + args.decoder_glu = getattr(args, "decoder_glu", False) + args.input_dropout = getattr(args, "input_dropout", 0.0) + base_architecture(args) + + +@register_model_architecture("lightconv", "lightconv_wmt_en_de") +def lightconv_wmt_en_de(args): + base_architecture(args) + + 
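The ``@register_model_architecture`` presets above and below all rely on the same ``getattr(args, name, default)`` idiom as ``base_architecture``: values given on the command line win, the preset fills in what is missing, and ``base_architecture`` runs last for everything else. A toy illustration with hypothetical architecture names:

```python
from argparse import Namespace


def toy_base_architecture(args):
    args.encoder_layers = getattr(args, "encoder_layers", 7)
    args.dropout = getattr(args, "dropout", 0.1)


def toy_big_architecture(args):
    # preset defaults apply only where the user did not pass a value
    args.encoder_embed_dim = getattr(args, "encoder_embed_dim", 1024)
    args.dropout = getattr(args, "dropout", 0.3)
    toy_base_architecture(args)


args = Namespace(dropout=0.2)            # pretend --dropout 0.2 came from the CLI
toy_big_architecture(args)
print(args.dropout, args.encoder_embed_dim, args.encoder_layers)   # 0.2 1024 7
```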
+@register_model_architecture("lightconv", "lightconv_wmt_en_de_big") +def lightconv_wmt_en_de_big(args): + args.attention_dropout = getattr(args, "attention_dropout", 0.1) + args.encoder_embed_dim = getattr(args, "encoder_embed_dim", 1024) + args.encoder_ffn_embed_dim = getattr(args, "encoder_ffn_embed_dim", 4096) + args.encoder_attention_heads = getattr(args, "encoder_attention_heads", 16) + args.encoder_normalize_before = getattr(args, "encoder_normalize_before", False) + args.decoder_embed_dim = getattr(args, "decoder_embed_dim", 1024) + args.decoder_ffn_embed_dim = getattr(args, "decoder_ffn_embed_dim", 4096) + args.decoder_attention_heads = getattr(args, "decoder_attention_heads", 16) + args.dropout = getattr(args, "dropout", 0.3) + base_architecture(args) + + +@register_model_architecture("lightconv", "lightconv_wmt_en_fr_big") +def lightconv_wmt_en_fr_big(args): + args.dropout = getattr(args, "dropout", 0.1) + lightconv_wmt_en_de_big(args) + + +@register_model_architecture("lightconv", "lightconv_wmt_zh_en_big") +def lightconv_wmt_zh_en_big(args): + args.dropout = getattr(args, "dropout", 0.2) + args.attention_dropout = getattr(args, "attention_dropout", 0.2) + args.weight_dropout = getattr(args, "weight_dropout", 0.2) + lightconv_wmt_en_de_big(args) diff --git a/SpeechT5/fairseq/fairseq/models/lightconv_lm.py b/SpeechT5/fairseq/fairseq/models/lightconv_lm.py new file mode 100644 index 0000000000000000000000000000000000000000..1d9efc4e42a5ecc1b83338055f18ade5a83ea666 --- /dev/null +++ b/SpeechT5/fairseq/fairseq/models/lightconv_lm.py @@ -0,0 +1,306 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +from fairseq import utils +from fairseq.models import ( + FairseqLanguageModel, + register_model, + register_model_architecture, +) +from fairseq.models.lightconv import Embedding, LightConvDecoder +from fairseq.modules import AdaptiveInput, CharacterTokenEmbedder + + +@register_model("lightconv_lm") +class LightConvLanguageModel(FairseqLanguageModel): + def __init__(self, decoder): + super().__init__(decoder) + + @staticmethod + def add_args(parser): + """Add model-specific arguments to the parser.""" + parser.add_argument( + "--dropout", + default=0.1, + type=float, + metavar="D", + help="dropout probability", + ) + parser.add_argument( + "--attention-dropout", + default=0.0, + type=float, + metavar="D", + help="dropout probability for attention weights", + ) + parser.add_argument( + "--relu-dropout", + default=0.0, + type=float, + metavar="D", + help="dropout probability after ReLU in FFN", + ) + parser.add_argument( + "--input-dropout", + type=float, + metavar="D", + help="dropout probability of the inputs", + ) + parser.add_argument( + "--decoder-embed-dim", + type=int, + metavar="N", + help="decoder embedding dimension", + ) + parser.add_argument( + "--decoder-output-dim", + type=int, + metavar="N", + help="decoder output dimension", + ) + parser.add_argument( + "--decoder-input-dim", type=int, metavar="N", help="decoder input dimension" + ) + parser.add_argument( + "--decoder-ffn-embed-dim", + type=int, + metavar="N", + help="decoder embedding dimension for FFN", + ) + parser.add_argument( + "--decoder-layers", type=int, metavar="N", help="num decoder layers" + ) + parser.add_argument( + "--decoder-attention-heads", + type=int, + metavar="N", + help="num decoder attention heads or LightConv/DynamicConv heads", + ) + parser.add_argument( + 
"--decoder-normalize-before", + default=False, + action="store_true", + help="apply layernorm before each decoder block", + ) + parser.add_argument( + "--adaptive-softmax-cutoff", + metavar="EXPR", + help="comma separated list of adaptive softmax cutoff points. " + "Must be used with adaptive_loss criterion", + ) + parser.add_argument( + "--adaptive-softmax-dropout", + type=float, + metavar="D", + help="sets adaptive softmax dropout for the tail projections", + ) + parser.add_argument( + "--adaptive-softmax-factor", + type=float, + metavar="N", + help="adaptive input factor", + ) + parser.add_argument( + "--no-token-positional-embeddings", + default=False, + action="store_true", + help="if set, disables positional embeddings (outside self attention)", + ) + parser.add_argument( + "--share-decoder-input-output-embed", + default=False, + action="store_true", + help="share decoder input and output embeddings", + ) + parser.add_argument( + "--character-embeddings", + default=False, + action="store_true", + help="if set, uses character embedding convolutions to produce token embeddings", + ) + parser.add_argument( + "--character-filters", + type=str, + metavar="LIST", + default="[(1, 64), (2, 128), (3, 192), (4, 256), (5, 256), (6, 256), (7, 256)]", + help="size of character embeddings", + ) + parser.add_argument( + "--character-embedding-dim", + type=int, + metavar="N", + default=4, + help="size of character embeddings", + ) + parser.add_argument( + "--char-embedder-highway-layers", + type=int, + metavar="N", + default=2, + help="number of highway layers for character token embeddder", + ) + parser.add_argument( + "--adaptive-input", + default=False, + action="store_true", + help="if set, uses adaptive input", + ) + parser.add_argument( + "--adaptive-input-factor", + type=float, + metavar="N", + help="adaptive input factor", + ) + parser.add_argument( + "--adaptive-input-cutoff", + metavar="EXPR", + help="comma separated list of adaptive input cutoff points.", + ) + parser.add_argument( + "--tie-adaptive-weights", + action="store_true", + help="if set, ties the weights of adaptive softmax and adaptive input", + ) + parser.add_argument( + "--tie-adaptive-proj", + action="store_true", + help="if set, ties the projection weights of adaptive softmax and adaptive input", + ) + parser.add_argument( + "--decoder-learned-pos", + action="store_true", + help="use learned positional embeddings in the decoder", + ) + + """LightConv and DynamicConv arguments""" + parser.add_argument( + "--decoder-kernel-size-list", + type=lambda x: utils.eval_str_list(x, int), + help='list of kernel size (default: "[3,7,15,31,31,31]")', + ) + parser.add_argument( + "--decoder-glu", type=utils.eval_bool, help="glu after in proj" + ) + parser.add_argument( + "--decoder-conv-type", + default="dynamic", + type=str, + choices=["dynamic", "lightweight"], + help="type of convolution", + ) + parser.add_argument("--weight-softmax", default=True, type=utils.eval_bool) + parser.add_argument( + "--weight-dropout", + type=float, + metavar="D", + help="dropout probability for conv weights", + ) + + @classmethod + def build_model(cls, args, task): + """Build a new model instance.""" + + # make sure all arguments are present in older models + base_lm_architecture(args) + + if getattr(args, "max_source_positions", None) is None: + args.max_source_positions = args.tokens_per_sample + if getattr(args, "max_target_positions", None) is None: + args.max_target_positions = args.tokens_per_sample + + if args.character_embeddings: + embed_tokens = 
CharacterTokenEmbedder( + task.dictionary, + eval(args.character_filters), + args.character_embedding_dim, + args.decoder_embed_dim, + args.char_embedder_highway_layers, + ) + elif args.adaptive_input: + embed_tokens = AdaptiveInput( + len(task.dictionary), + task.dictionary.pad(), + args.decoder_input_dim, + args.adaptive_input_factor, + args.decoder_embed_dim, + utils.eval_str_list(args.adaptive_input_cutoff, type=int), + ) + else: + embed_tokens = Embedding( + len(task.dictionary), args.decoder_input_dim, task.dictionary.pad() + ) + + if args.tie_adaptive_weights: + assert args.adaptive_input + assert args.adaptive_input_factor == args.adaptive_softmax_factor + assert ( + args.adaptive_softmax_cutoff == args.adaptive_input_cutoff + ), "{} != {}".format( + args.adaptive_softmax_cutoff, args.adaptive_input_cutoff + ) + assert args.decoder_input_dim == args.decoder_output_dim + + decoder = LightConvDecoder( + args, + task.output_dictionary, + embed_tokens, + no_encoder_attn=True, + final_norm=False, + ) + return LightConvLanguageModel(decoder) + + +@register_model_architecture("lightconv_lm", "lightconv_lm") +def base_lm_architecture(args): + args.decoder_embed_dim = getattr(args, "decoder_embed_dim", 512) + args.decoder_ffn_embed_dim = getattr(args, "decoder_ffn_embed_dim", 2048) + args.decoder_layers = getattr(args, "decoder_layers", 6) + args.decoder_attention_heads = getattr(args, "decoder_attention_heads", 8) + args.adaptive_softmax_cutoff = getattr(args, "adaptive_softmax_cutoff", None) + args.adaptive_softmax_dropout = getattr(args, "adaptive_softmax_dropout", 0) + args.adaptive_softmax_factor = getattr(args, "adaptive_softmax_factor", 4) + args.decoder_learned_pos = getattr(args, "decoder_learned_pos", False) + + args.character_embeddings = getattr(args, "character_embeddings", False) + + args.decoder_output_dim = getattr( + args, "decoder_output_dim", args.decoder_embed_dim + ) + args.decoder_input_dim = getattr(args, "decoder_input_dim", args.decoder_embed_dim) + args.decoder_conv_dim = getattr(args, "decoder_conv_dim", args.decoder_embed_dim) + + # The model training is not stable without this + args.decoder_normalize_before = True + + args.adaptive_input = getattr(args, "adaptive_input", False) + args.adaptive_input_factor = getattr(args, "adaptive_input_factor", 4) + args.adaptive_input_cutoff = getattr(args, "adaptive_input_cutoff", None) + + args.tie_adaptive_weights = getattr(args, "tie_adaptive_weights", False) + args.tie_adaptive_proj = getattr(args, "tie_adaptive_proj", False) + + args.decoder_kernel_size_list = getattr( + args, "decoder_kernel_size_list", [3, 7, 15, 31, 31, 31] + ) + if len(args.decoder_kernel_size_list) == 1: + args.decoder_kernel_size_list = ( + args.decoder_kernel_size_list * args.decoder_layers + ) + assert ( + len(args.decoder_kernel_size_list) == args.decoder_layers + ), "decoder_kernel_size_list doesn't match decoder_layers" + args.decoder_glu = getattr(args, "decoder_glu", True) + args.input_dropout = getattr(args, "input_dropout", 0.1) + args.weight_dropout = getattr(args, "weight_dropout", args.attention_dropout) + + +@register_model_architecture("lightconv_lm", "lightconv_lm_gbw") +def lightconv_lm_gbw(args): + args.decoder_embed_dim = getattr(args, "decoder_embed_dim", 512) + args.dropout = getattr(args, "dropout", 0.1) + args.attention_dropout = getattr(args, "attention_dropout", 0.1) + args.decoder_ffn_embed_dim = getattr(args, "decoder_ffn_embed_dim", 4096) + args.decoder_attention_heads = getattr(args, "decoder_attention_heads", 16) + 
base_lm_architecture(args) diff --git a/SpeechT5/fairseq/fairseq/models/lstm.py b/SpeechT5/fairseq/fairseq/models/lstm.py new file mode 100644 index 0000000000000000000000000000000000000000..12e3aff85dc02604a380546cd654719e4ab445f7 --- /dev/null +++ b/SpeechT5/fairseq/fairseq/models/lstm.py @@ -0,0 +1,753 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +from typing import Dict, List, Optional, Tuple + +import torch +import torch.nn as nn +import torch.nn.functional as F +from fairseq import utils +from fairseq.models import ( + FairseqEncoder, + FairseqEncoderDecoderModel, + FairseqIncrementalDecoder, + register_model, + register_model_architecture, +) +from fairseq.modules import AdaptiveSoftmax, FairseqDropout +from torch import Tensor + + +DEFAULT_MAX_SOURCE_POSITIONS = 1e5 +DEFAULT_MAX_TARGET_POSITIONS = 1e5 + + +@register_model("lstm") +class LSTMModel(FairseqEncoderDecoderModel): + def __init__(self, encoder, decoder): + super().__init__(encoder, decoder) + + @staticmethod + def add_args(parser): + """Add model-specific arguments to the parser.""" + # fmt: off + parser.add_argument('--dropout', type=float, metavar='D', + help='dropout probability') + parser.add_argument('--encoder-embed-dim', type=int, metavar='N', + help='encoder embedding dimension') + parser.add_argument('--encoder-embed-path', type=str, metavar='STR', + help='path to pre-trained encoder embedding') + parser.add_argument('--encoder-freeze-embed', action='store_true', + help='freeze encoder embeddings') + parser.add_argument('--encoder-hidden-size', type=int, metavar='N', + help='encoder hidden size') + parser.add_argument('--encoder-layers', type=int, metavar='N', + help='number of encoder layers') + parser.add_argument('--encoder-bidirectional', action='store_true', + help='make all layers of encoder bidirectional') + parser.add_argument('--decoder-embed-dim', type=int, metavar='N', + help='decoder embedding dimension') + parser.add_argument('--decoder-embed-path', type=str, metavar='STR', + help='path to pre-trained decoder embedding') + parser.add_argument('--decoder-freeze-embed', action='store_true', + help='freeze decoder embeddings') + parser.add_argument('--decoder-hidden-size', type=int, metavar='N', + help='decoder hidden size') + parser.add_argument('--decoder-layers', type=int, metavar='N', + help='number of decoder layers') + parser.add_argument('--decoder-out-embed-dim', type=int, metavar='N', + help='decoder output embedding dimension') + parser.add_argument('--decoder-attention', type=str, metavar='BOOL', + help='decoder attention') + parser.add_argument('--adaptive-softmax-cutoff', metavar='EXPR', + help='comma separated list of adaptive softmax cutoff points. 
' + 'Must be used with adaptive_loss criterion') + parser.add_argument('--share-decoder-input-output-embed', default=False, + action='store_true', + help='share decoder input and output embeddings') + parser.add_argument('--share-all-embeddings', default=False, action='store_true', + help='share encoder, decoder and output embeddings' + ' (requires shared dictionary and embed dim)') + + # Granular dropout settings (if not specified these default to --dropout) + parser.add_argument('--encoder-dropout-in', type=float, metavar='D', + help='dropout probability for encoder input embedding') + parser.add_argument('--encoder-dropout-out', type=float, metavar='D', + help='dropout probability for encoder output') + parser.add_argument('--decoder-dropout-in', type=float, metavar='D', + help='dropout probability for decoder input embedding') + parser.add_argument('--decoder-dropout-out', type=float, metavar='D', + help='dropout probability for decoder output') + # fmt: on + + @classmethod + def build_model(cls, args, task): + """Build a new model instance.""" + # make sure that all args are properly defaulted (in case there are any new ones) + base_architecture(args) + + if args.encoder_layers != args.decoder_layers: + raise ValueError("--encoder-layers must match --decoder-layers") + + max_source_positions = getattr( + args, "max_source_positions", DEFAULT_MAX_SOURCE_POSITIONS + ) + max_target_positions = getattr( + args, "max_target_positions", DEFAULT_MAX_TARGET_POSITIONS + ) + + def load_pretrained_embedding_from_file(embed_path, dictionary, embed_dim): + num_embeddings = len(dictionary) + padding_idx = dictionary.pad() + embed_tokens = Embedding(num_embeddings, embed_dim, padding_idx) + embed_dict = utils.parse_embedding(embed_path) + utils.print_embed_overlap(embed_dict, dictionary) + return utils.load_embedding(embed_dict, dictionary, embed_tokens) + + if args.encoder_embed_path: + pretrained_encoder_embed = load_pretrained_embedding_from_file( + args.encoder_embed_path, task.source_dictionary, args.encoder_embed_dim + ) + else: + num_embeddings = len(task.source_dictionary) + pretrained_encoder_embed = Embedding( + num_embeddings, args.encoder_embed_dim, task.source_dictionary.pad() + ) + + if args.share_all_embeddings: + # double check all parameters combinations are valid + if task.source_dictionary != task.target_dictionary: + raise ValueError("--share-all-embeddings requires a joint dictionary") + if args.decoder_embed_path and ( + args.decoder_embed_path != args.encoder_embed_path + ): + raise ValueError( + "--share-all-embed not compatible with --decoder-embed-path" + ) + if args.encoder_embed_dim != args.decoder_embed_dim: + raise ValueError( + "--share-all-embeddings requires --encoder-embed-dim to " + "match --decoder-embed-dim" + ) + pretrained_decoder_embed = pretrained_encoder_embed + args.share_decoder_input_output_embed = True + else: + # separate decoder input embeddings + pretrained_decoder_embed = None + if args.decoder_embed_path: + pretrained_decoder_embed = load_pretrained_embedding_from_file( + args.decoder_embed_path, + task.target_dictionary, + args.decoder_embed_dim, + ) + # one last double check of parameter combinations + if args.share_decoder_input_output_embed and ( + args.decoder_embed_dim != args.decoder_out_embed_dim + ): + raise ValueError( + "--share-decoder-input-output-embeddings requires " + "--decoder-embed-dim to match --decoder-out-embed-dim" + ) + + if args.encoder_freeze_embed: + pretrained_encoder_embed.weight.requires_grad = False + if 
args.decoder_freeze_embed: + pretrained_decoder_embed.weight.requires_grad = False + + encoder = LSTMEncoder( + dictionary=task.source_dictionary, + embed_dim=args.encoder_embed_dim, + hidden_size=args.encoder_hidden_size, + num_layers=args.encoder_layers, + dropout_in=args.encoder_dropout_in, + dropout_out=args.encoder_dropout_out, + bidirectional=args.encoder_bidirectional, + pretrained_embed=pretrained_encoder_embed, + max_source_positions=max_source_positions, + ) + decoder = LSTMDecoder( + dictionary=task.target_dictionary, + embed_dim=args.decoder_embed_dim, + hidden_size=args.decoder_hidden_size, + out_embed_dim=args.decoder_out_embed_dim, + num_layers=args.decoder_layers, + dropout_in=args.decoder_dropout_in, + dropout_out=args.decoder_dropout_out, + attention=utils.eval_bool(args.decoder_attention), + encoder_output_units=encoder.output_units, + pretrained_embed=pretrained_decoder_embed, + share_input_output_embed=args.share_decoder_input_output_embed, + adaptive_softmax_cutoff=( + utils.eval_str_list(args.adaptive_softmax_cutoff, type=int) + if args.criterion == "adaptive_loss" + else None + ), + max_target_positions=max_target_positions, + residuals=False, + ) + return cls(encoder, decoder) + + def forward( + self, + src_tokens, + src_lengths, + prev_output_tokens, + incremental_state: Optional[Dict[str, Dict[str, Optional[Tensor]]]] = None, + ): + encoder_out = self.encoder(src_tokens, src_lengths=src_lengths) + decoder_out = self.decoder( + prev_output_tokens, + encoder_out=encoder_out, + incremental_state=incremental_state, + ) + return decoder_out + + +class LSTMEncoder(FairseqEncoder): + """LSTM encoder.""" + + def __init__( + self, + dictionary, + embed_dim=512, + hidden_size=512, + num_layers=1, + dropout_in=0.1, + dropout_out=0.1, + bidirectional=False, + left_pad=True, + pretrained_embed=None, + padding_idx=None, + max_source_positions=DEFAULT_MAX_SOURCE_POSITIONS, + ): + super().__init__(dictionary) + self.num_layers = num_layers + self.dropout_in_module = FairseqDropout( + dropout_in, module_name=self.__class__.__name__ + ) + self.dropout_out_module = FairseqDropout( + dropout_out, module_name=self.__class__.__name__ + ) + self.bidirectional = bidirectional + self.hidden_size = hidden_size + self.max_source_positions = max_source_positions + + num_embeddings = len(dictionary) + self.padding_idx = padding_idx if padding_idx is not None else dictionary.pad() + if pretrained_embed is None: + self.embed_tokens = Embedding(num_embeddings, embed_dim, self.padding_idx) + else: + self.embed_tokens = pretrained_embed + + self.lstm = LSTM( + input_size=embed_dim, + hidden_size=hidden_size, + num_layers=num_layers, + dropout=self.dropout_out_module.p if num_layers > 1 else 0.0, + bidirectional=bidirectional, + ) + self.left_pad = left_pad + + self.output_units = hidden_size + if bidirectional: + self.output_units *= 2 + + def forward( + self, + src_tokens: Tensor, + src_lengths: Tensor, + enforce_sorted: bool = True, + ): + """ + Args: + src_tokens (LongTensor): tokens in the source language of + shape `(batch, src_len)` + src_lengths (LongTensor): lengths of each source sentence of + shape `(batch)` + enforce_sorted (bool, optional): if True, `src_tokens` is + expected to contain sequences sorted by length in a + decreasing order. If False, this condition is not + required. Default: True. 
+ """ + if self.left_pad: + # nn.utils.rnn.pack_padded_sequence requires right-padding; + # convert left-padding to right-padding + src_tokens = utils.convert_padding_direction( + src_tokens, + torch.zeros_like(src_tokens).fill_(self.padding_idx), + left_to_right=True, + ) + + bsz, seqlen = src_tokens.size() + + # embed tokens + x = self.embed_tokens(src_tokens) + x = self.dropout_in_module(x) + + # B x T x C -> T x B x C + x = x.transpose(0, 1) + + # pack embedded source tokens into a PackedSequence + packed_x = nn.utils.rnn.pack_padded_sequence( + x, src_lengths.cpu(), enforce_sorted=enforce_sorted + ) + + # apply LSTM + if self.bidirectional: + state_size = 2 * self.num_layers, bsz, self.hidden_size + else: + state_size = self.num_layers, bsz, self.hidden_size + h0 = x.new_zeros(*state_size) + c0 = x.new_zeros(*state_size) + packed_outs, (final_hiddens, final_cells) = self.lstm(packed_x, (h0, c0)) + + # unpack outputs and apply dropout + x, _ = nn.utils.rnn.pad_packed_sequence( + packed_outs, padding_value=self.padding_idx * 1.0 + ) + x = self.dropout_out_module(x) + assert list(x.size()) == [seqlen, bsz, self.output_units] + + if self.bidirectional: + final_hiddens = self.combine_bidir(final_hiddens, bsz) + final_cells = self.combine_bidir(final_cells, bsz) + + encoder_padding_mask = src_tokens.eq(self.padding_idx).t() + + return tuple( + ( + x, # seq_len x batch x hidden + final_hiddens, # num_layers x batch x num_directions*hidden + final_cells, # num_layers x batch x num_directions*hidden + encoder_padding_mask, # seq_len x batch + ) + ) + + def combine_bidir(self, outs, bsz: int): + out = outs.view(self.num_layers, 2, bsz, -1).transpose(1, 2).contiguous() + return out.view(self.num_layers, bsz, -1) + + def reorder_encoder_out(self, encoder_out, new_order): + return tuple( + ( + encoder_out[0].index_select(1, new_order), + encoder_out[1].index_select(1, new_order), + encoder_out[2].index_select(1, new_order), + encoder_out[3].index_select(1, new_order), + ) + ) + + def max_positions(self): + """Maximum input length supported by the encoder.""" + return self.max_source_positions + + +class AttentionLayer(nn.Module): + def __init__(self, input_embed_dim, source_embed_dim, output_embed_dim, bias=False): + super().__init__() + + self.input_proj = Linear(input_embed_dim, source_embed_dim, bias=bias) + self.output_proj = Linear( + input_embed_dim + source_embed_dim, output_embed_dim, bias=bias + ) + + def forward(self, input, source_hids, encoder_padding_mask): + # input: bsz x input_embed_dim + # source_hids: srclen x bsz x source_embed_dim + + # x: bsz x source_embed_dim + x = self.input_proj(input) + + # compute attention + attn_scores = (source_hids * x.unsqueeze(0)).sum(dim=2) + + # don't attend over padding + if encoder_padding_mask is not None: + attn_scores = ( + attn_scores.float() + .masked_fill_(encoder_padding_mask, float("-inf")) + .type_as(attn_scores) + ) # FP16 support: cast to float and back + + attn_scores = F.softmax(attn_scores, dim=0) # srclen x bsz + + # sum weighted sources + x = (attn_scores.unsqueeze(2) * source_hids).sum(dim=0) + + x = torch.tanh(self.output_proj(torch.cat((x, input), dim=1))) + return x, attn_scores + + +class LSTMDecoder(FairseqIncrementalDecoder): + """LSTM decoder.""" + + def __init__( + self, + dictionary, + embed_dim=512, + hidden_size=512, + out_embed_dim=512, + num_layers=1, + dropout_in=0.1, + dropout_out=0.1, + attention=True, + encoder_output_units=512, + pretrained_embed=None, + share_input_output_embed=False, + 
adaptive_softmax_cutoff=None, + max_target_positions=DEFAULT_MAX_TARGET_POSITIONS, + residuals=False, + ): + super().__init__(dictionary) + self.dropout_in_module = FairseqDropout( + dropout_in, module_name=self.__class__.__name__ + ) + self.dropout_out_module = FairseqDropout( + dropout_out, module_name=self.__class__.__name__ + ) + self.hidden_size = hidden_size + self.share_input_output_embed = share_input_output_embed + self.need_attn = True + self.max_target_positions = max_target_positions + self.residuals = residuals + self.num_layers = num_layers + + self.adaptive_softmax = None + num_embeddings = len(dictionary) + padding_idx = dictionary.pad() + if pretrained_embed is None: + self.embed_tokens = Embedding(num_embeddings, embed_dim, padding_idx) + else: + self.embed_tokens = pretrained_embed + + self.encoder_output_units = encoder_output_units + if encoder_output_units != hidden_size and encoder_output_units != 0: + self.encoder_hidden_proj = Linear(encoder_output_units, hidden_size) + self.encoder_cell_proj = Linear(encoder_output_units, hidden_size) + else: + self.encoder_hidden_proj = self.encoder_cell_proj = None + + # disable input feeding if there is no encoder + # input feeding is described in arxiv.org/abs/1508.04025 + input_feed_size = 0 if encoder_output_units == 0 else hidden_size + self.layers = nn.ModuleList( + [ + LSTMCell( + input_size=input_feed_size + embed_dim + if layer == 0 + else hidden_size, + hidden_size=hidden_size, + ) + for layer in range(num_layers) + ] + ) + + if attention: + # TODO make bias configurable + self.attention = AttentionLayer( + hidden_size, encoder_output_units, hidden_size, bias=False + ) + else: + self.attention = None + + if hidden_size != out_embed_dim: + self.additional_fc = Linear(hidden_size, out_embed_dim) + + if adaptive_softmax_cutoff is not None: + # setting adaptive_softmax dropout to dropout_out for now but can be redefined + self.adaptive_softmax = AdaptiveSoftmax( + num_embeddings, + hidden_size, + adaptive_softmax_cutoff, + dropout=dropout_out, + ) + elif not self.share_input_output_embed: + self.fc_out = Linear(out_embed_dim, num_embeddings, dropout=dropout_out) + + def forward( + self, + prev_output_tokens, + encoder_out: Optional[Tuple[Tensor, Tensor, Tensor, Tensor]] = None, + incremental_state: Optional[Dict[str, Dict[str, Optional[Tensor]]]] = None, + src_lengths: Optional[Tensor] = None, + ): + x, attn_scores = self.extract_features( + prev_output_tokens, encoder_out, incremental_state + ) + return self.output_layer(x), attn_scores + + def extract_features( + self, + prev_output_tokens, + encoder_out: Optional[Tuple[Tensor, Tensor, Tensor, Tensor]] = None, + incremental_state: Optional[Dict[str, Dict[str, Optional[Tensor]]]] = None, + ): + """ + Similar to *forward* but only return features. 
+ """ + # get outputs from encoder + if encoder_out is not None: + encoder_outs = encoder_out[0] + encoder_hiddens = encoder_out[1] + encoder_cells = encoder_out[2] + encoder_padding_mask = encoder_out[3] + else: + encoder_outs = torch.empty(0) + encoder_hiddens = torch.empty(0) + encoder_cells = torch.empty(0) + encoder_padding_mask = torch.empty(0) + srclen = encoder_outs.size(0) + + if incremental_state is not None and len(incremental_state) > 0: + prev_output_tokens = prev_output_tokens[:, -1:] + + bsz, seqlen = prev_output_tokens.size() + + # embed tokens + x = self.embed_tokens(prev_output_tokens) + x = self.dropout_in_module(x) + + # B x T x C -> T x B x C + x = x.transpose(0, 1) + + # initialize previous states (or get from cache during incremental generation) + if incremental_state is not None and len(incremental_state) > 0: + prev_hiddens, prev_cells, input_feed = self.get_cached_state( + incremental_state + ) + elif encoder_out is not None: + # setup recurrent cells + prev_hiddens = [encoder_hiddens[i] for i in range(self.num_layers)] + prev_cells = [encoder_cells[i] for i in range(self.num_layers)] + if self.encoder_hidden_proj is not None: + prev_hiddens = [self.encoder_hidden_proj(y) for y in prev_hiddens] + prev_cells = [self.encoder_cell_proj(y) for y in prev_cells] + input_feed = x.new_zeros(bsz, self.hidden_size) + else: + # setup zero cells, since there is no encoder + zero_state = x.new_zeros(bsz, self.hidden_size) + prev_hiddens = [zero_state for i in range(self.num_layers)] + prev_cells = [zero_state for i in range(self.num_layers)] + input_feed = None + + assert ( + srclen > 0 or self.attention is None + ), "attention is not supported if there are no encoder outputs" + attn_scores: Optional[Tensor] = ( + x.new_zeros(srclen, seqlen, bsz) if self.attention is not None else None + ) + outs = [] + for j in range(seqlen): + # input feeding: concatenate context vector from previous time step + if input_feed is not None: + input = torch.cat((x[j, :, :], input_feed), dim=1) + else: + input = x[j] + + for i, rnn in enumerate(self.layers): + # recurrent cell + hidden, cell = rnn(input, (prev_hiddens[i], prev_cells[i])) + + # hidden state becomes the input to the next layer + input = self.dropout_out_module(hidden) + if self.residuals: + input = input + prev_hiddens[i] + + # save state for next time step + prev_hiddens[i] = hidden + prev_cells[i] = cell + + # apply attention using the last layer's hidden state + if self.attention is not None: + assert attn_scores is not None + out, attn_scores[:, j, :] = self.attention( + hidden, encoder_outs, encoder_padding_mask + ) + else: + out = hidden + out = self.dropout_out_module(out) + + # input feeding + if input_feed is not None: + input_feed = out + + # save final output + outs.append(out) + + # Stack all the necessary tensors together and store + prev_hiddens_tensor = torch.stack(prev_hiddens) + prev_cells_tensor = torch.stack(prev_cells) + cache_state = torch.jit.annotate( + Dict[str, Optional[Tensor]], + { + "prev_hiddens": prev_hiddens_tensor, + "prev_cells": prev_cells_tensor, + "input_feed": input_feed, + }, + ) + self.set_incremental_state(incremental_state, "cached_state", cache_state) + + # collect outputs across time steps + x = torch.cat(outs, dim=0).view(seqlen, bsz, self.hidden_size) + + # T x B x C -> B x T x C + x = x.transpose(1, 0) + + if hasattr(self, "additional_fc") and self.adaptive_softmax is None: + x = self.additional_fc(x) + x = self.dropout_out_module(x) + # srclen x tgtlen x bsz -> bsz x tgtlen x srclen + 
if not self.training and self.need_attn and self.attention is not None: + assert attn_scores is not None + attn_scores = attn_scores.transpose(0, 2) + else: + attn_scores = None + return x, attn_scores + + def output_layer(self, x): + """Project features to the vocabulary size.""" + if self.adaptive_softmax is None: + if self.share_input_output_embed: + x = F.linear(x, self.embed_tokens.weight) + else: + x = self.fc_out(x) + return x + + def get_cached_state( + self, + incremental_state: Dict[str, Dict[str, Optional[Tensor]]], + ) -> Tuple[List[Tensor], List[Tensor], Optional[Tensor]]: + cached_state = self.get_incremental_state(incremental_state, "cached_state") + assert cached_state is not None + prev_hiddens_ = cached_state["prev_hiddens"] + assert prev_hiddens_ is not None + prev_cells_ = cached_state["prev_cells"] + assert prev_cells_ is not None + prev_hiddens = [prev_hiddens_[i] for i in range(self.num_layers)] + prev_cells = [prev_cells_[j] for j in range(self.num_layers)] + input_feed = cached_state[ + "input_feed" + ] # can be None for decoder-only language models + return prev_hiddens, prev_cells, input_feed + + def reorder_incremental_state( + self, + incremental_state: Dict[str, Dict[str, Optional[Tensor]]], + new_order: Tensor, + ): + if incremental_state is None or len(incremental_state) == 0: + return + prev_hiddens, prev_cells, input_feed = self.get_cached_state(incremental_state) + prev_hiddens = [p.index_select(0, new_order) for p in prev_hiddens] + prev_cells = [p.index_select(0, new_order) for p in prev_cells] + if input_feed is not None: + input_feed = input_feed.index_select(0, new_order) + cached_state_new = torch.jit.annotate( + Dict[str, Optional[Tensor]], + { + "prev_hiddens": torch.stack(prev_hiddens), + "prev_cells": torch.stack(prev_cells), + "input_feed": input_feed, + }, + ) + self.set_incremental_state(incremental_state, "cached_state", cached_state_new), + return + + def max_positions(self): + """Maximum output length supported by the decoder.""" + return self.max_target_positions + + def make_generation_fast_(self, need_attn=False, **kwargs): + self.need_attn = need_attn + + +def Embedding(num_embeddings, embedding_dim, padding_idx): + m = nn.Embedding(num_embeddings, embedding_dim, padding_idx=padding_idx) + nn.init.uniform_(m.weight, -0.1, 0.1) + nn.init.constant_(m.weight[padding_idx], 0) + return m + + +def LSTM(input_size, hidden_size, **kwargs): + m = nn.LSTM(input_size, hidden_size, **kwargs) + for name, param in m.named_parameters(): + if "weight" in name or "bias" in name: + param.data.uniform_(-0.1, 0.1) + return m + + +def LSTMCell(input_size, hidden_size, **kwargs): + m = nn.LSTMCell(input_size, hidden_size, **kwargs) + for name, param in m.named_parameters(): + if "weight" in name or "bias" in name: + param.data.uniform_(-0.1, 0.1) + return m + + +def Linear(in_features, out_features, bias=True, dropout=0.0): + """Linear layer (input: N x T x C)""" + m = nn.Linear(in_features, out_features, bias=bias) + m.weight.data.uniform_(-0.1, 0.1) + if bias: + m.bias.data.uniform_(-0.1, 0.1) + return m + + +@register_model_architecture("lstm", "lstm") +def base_architecture(args): + args.dropout = getattr(args, "dropout", 0.1) + args.encoder_embed_dim = getattr(args, "encoder_embed_dim", 512) + args.encoder_embed_path = getattr(args, "encoder_embed_path", None) + args.encoder_freeze_embed = getattr(args, "encoder_freeze_embed", False) + args.encoder_hidden_size = getattr( + args, "encoder_hidden_size", args.encoder_embed_dim + ) + args.encoder_layers = 
getattr(args, "encoder_layers", 1) + args.encoder_bidirectional = getattr(args, "encoder_bidirectional", False) + args.encoder_dropout_in = getattr(args, "encoder_dropout_in", args.dropout) + args.encoder_dropout_out = getattr(args, "encoder_dropout_out", args.dropout) + args.decoder_embed_dim = getattr(args, "decoder_embed_dim", 512) + args.decoder_embed_path = getattr(args, "decoder_embed_path", None) + args.decoder_freeze_embed = getattr(args, "decoder_freeze_embed", False) + args.decoder_hidden_size = getattr( + args, "decoder_hidden_size", args.decoder_embed_dim + ) + args.decoder_layers = getattr(args, "decoder_layers", 1) + args.decoder_out_embed_dim = getattr(args, "decoder_out_embed_dim", 512) + args.decoder_attention = getattr(args, "decoder_attention", "1") + args.decoder_dropout_in = getattr(args, "decoder_dropout_in", args.dropout) + args.decoder_dropout_out = getattr(args, "decoder_dropout_out", args.dropout) + args.share_decoder_input_output_embed = getattr( + args, "share_decoder_input_output_embed", False + ) + args.share_all_embeddings = getattr(args, "share_all_embeddings", False) + args.adaptive_softmax_cutoff = getattr( + args, "adaptive_softmax_cutoff", "10000,50000,200000" + ) + + +@register_model_architecture("lstm", "lstm_wiseman_iwslt_de_en") +def lstm_wiseman_iwslt_de_en(args): + args.dropout = getattr(args, "dropout", 0.1) + args.encoder_embed_dim = getattr(args, "encoder_embed_dim", 256) + args.encoder_dropout_in = getattr(args, "encoder_dropout_in", 0) + args.encoder_dropout_out = getattr(args, "encoder_dropout_out", 0) + args.decoder_embed_dim = getattr(args, "decoder_embed_dim", 256) + args.decoder_out_embed_dim = getattr(args, "decoder_out_embed_dim", 256) + args.decoder_dropout_in = getattr(args, "decoder_dropout_in", 0) + args.decoder_dropout_out = getattr(args, "decoder_dropout_out", args.dropout) + base_architecture(args) + + +@register_model_architecture("lstm", "lstm_luong_wmt_en_de") +def lstm_luong_wmt_en_de(args): + args.encoder_embed_dim = getattr(args, "encoder_embed_dim", 1000) + args.encoder_layers = getattr(args, "encoder_layers", 4) + args.encoder_dropout_out = getattr(args, "encoder_dropout_out", 0) + args.decoder_embed_dim = getattr(args, "decoder_embed_dim", 1000) + args.decoder_layers = getattr(args, "decoder_layers", 4) + args.decoder_out_embed_dim = getattr(args, "decoder_out_embed_dim", 1000) + args.decoder_dropout_out = getattr(args, "decoder_dropout_out", 0) + base_architecture(args) diff --git a/SpeechT5/fairseq/fairseq/models/lstm_lm.py b/SpeechT5/fairseq/fairseq/models/lstm_lm.py new file mode 100644 index 0000000000000000000000000000000000000000..454f0ac36fab78bf02a8e2f07ed9607d1da87e34 --- /dev/null +++ b/SpeechT5/fairseq/fairseq/models/lstm_lm.py @@ -0,0 +1,142 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. 
+ +from fairseq import utils +from fairseq.models import ( + FairseqLanguageModel, + register_model, + register_model_architecture, +) +from fairseq.models.lstm import Embedding, LSTMDecoder + + +DEFAULT_MAX_TARGET_POSITIONS = 1e5 + + +@register_model("lstm_lm") +class LSTMLanguageModel(FairseqLanguageModel): + def __init__(self, decoder): + super().__init__(decoder) + + @staticmethod + def add_args(parser): + """Add model-specific arguments to the parser.""" + # fmt: off + parser.add_argument('--dropout', type=float, metavar='D', + help='dropout probability') + parser.add_argument('--decoder-embed-dim', type=int, metavar='N', + help='decoder embedding dimension') + parser.add_argument('--decoder-embed-path', type=str, metavar='STR', + help='path to pre-trained decoder embedding') + parser.add_argument('--decoder-hidden-size', type=int, metavar='N', + help='decoder hidden size') + parser.add_argument('--decoder-layers', type=int, metavar='N', + help='number of decoder layers') + parser.add_argument('--decoder-out-embed-dim', type=int, metavar='N', + help='decoder output embedding dimension') + parser.add_argument('--decoder-attention', type=str, metavar='BOOL', + help='decoder attention') + parser.add_argument('--adaptive-softmax-cutoff', metavar='EXPR', + help='comma separated list of adaptive softmax cutoff points. ' + 'Must be used with adaptive_loss criterion') + parser.add_argument('--residuals', default=False, + action='store_true', + help='applying residuals between LSTM layers') + + # Granular dropout settings (if not specified these default to --dropout) + parser.add_argument('--decoder-dropout-in', type=float, metavar='D', + help='dropout probability for decoder input embedding') + parser.add_argument('--decoder-dropout-out', type=float, metavar='D', + help='dropout probability for decoder output') + parser.add_argument('--share-decoder-input-output-embed', default=False, + action='store_true', + help='share decoder input and output embeddings') + # fmt: on + + @classmethod + def build_model(cls, args, task): + """Build a new model instance.""" + + # make sure all arguments are present in older models + base_architecture(args) + + if getattr(args, "max_target_positions", None) is not None: + max_target_positions = args.max_target_positions + else: + max_target_positions = getattr( + args, "tokens_per_sample", DEFAULT_MAX_TARGET_POSITIONS + ) + + def load_pretrained_embedding_from_file(embed_path, dictionary, embed_dim): + num_embeddings = len(dictionary) + padding_idx = dictionary.pad() + embed_tokens = Embedding(num_embeddings, embed_dim, padding_idx) + embed_dict = utils.parse_embedding(embed_path) + utils.print_embed_overlap(embed_dict, dictionary) + return utils.load_embedding(embed_dict, dictionary, embed_tokens) + + pretrained_decoder_embed = None + if args.decoder_embed_path: + pretrained_decoder_embed = load_pretrained_embedding_from_file( + args.decoder_embed_path, task.target_dictionary, args.decoder_embed_dim + ) + + if args.share_decoder_input_output_embed: + # double check all parameters combinations are valid + if task.source_dictionary != task.target_dictionary: + raise ValueError( + "--share-decoder-input-output-embeddings requires a joint dictionary" + ) + + if args.decoder_embed_dim != args.decoder_out_embed_dim: + raise ValueError( + "--share-decoder-input-output-embeddings requires " + "--decoder-embed-dim to match --decoder-out-embed-dim" + ) + + decoder = LSTMDecoder( + dictionary=task.dictionary, + embed_dim=args.decoder_embed_dim, + 
hidden_size=args.decoder_hidden_size, + out_embed_dim=args.decoder_out_embed_dim, + num_layers=args.decoder_layers, + dropout_in=args.decoder_dropout_in, + dropout_out=args.decoder_dropout_out, + attention=False, # decoder-only language model doesn't support attention + encoder_output_units=0, + pretrained_embed=pretrained_decoder_embed, + share_input_output_embed=args.share_decoder_input_output_embed, + adaptive_softmax_cutoff=( + utils.eval_str_list(args.adaptive_softmax_cutoff, type=int) + if args.criterion == "adaptive_loss" + else None + ), + max_target_positions=max_target_positions, + residuals=args.residuals, + ) + + return cls(decoder) + + +@register_model_architecture("lstm_lm", "lstm_lm") +def base_architecture(args): + args.dropout = getattr(args, "dropout", 0.1) + args.decoder_embed_dim = getattr(args, "decoder_embed_dim", 512) + args.decoder_embed_path = getattr(args, "decoder_embed_path", None) + args.decoder_hidden_size = getattr( + args, "decoder_hidden_size", args.decoder_embed_dim + ) + args.decoder_layers = getattr(args, "decoder_layers", 1) + args.decoder_out_embed_dim = getattr(args, "decoder_out_embed_dim", 512) + args.decoder_attention = getattr(args, "decoder_attention", "0") + args.decoder_dropout_in = getattr(args, "decoder_dropout_in", args.dropout) + args.decoder_dropout_out = getattr(args, "decoder_dropout_out", args.dropout) + args.share_decoder_input_output_embed = getattr( + args, "share_decoder_input_output_embed", False + ) + args.adaptive_softmax_cutoff = getattr( + args, "adaptive_softmax_cutoff", "10000,50000,200000" + ) + args.residuals = getattr(args, "residuals", False) diff --git a/SpeechT5/fairseq/fairseq/models/masked_lm.py b/SpeechT5/fairseq/fairseq/models/masked_lm.py new file mode 100644 index 0000000000000000000000000000000000000000..c786de9125551f7247618b0a1d0867477894c755 --- /dev/null +++ b/SpeechT5/fairseq/fairseq/models/masked_lm.py @@ -0,0 +1,403 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +import logging + +import torch +import torch.nn as nn +import torch.nn.functional as F +from fairseq import utils +from fairseq.models import ( + FairseqEncoder, + FairseqEncoderModel, + register_model, + register_model_architecture, +) +from fairseq.modules import ( + LayerNorm, + SinusoidalPositionalEmbedding, + TransformerSentenceEncoder, +) +from fairseq.modules.transformer_sentence_encoder import init_bert_params + + +logger = logging.getLogger(__name__) + + +@register_model("masked_lm") +class MaskedLMModel(FairseqEncoderModel): + """ + Class for training a Masked Language Model. It also supports an + additional sentence level prediction if the sent-loss argument is set. + """ + + def __init__(self, args, encoder): + super().__init__(encoder) + self.args = args + + # if specified then apply bert initialization on the model. 
We need + # to explictly call this to make sure that the output embeddings + # and projection layers are also correctly initialized + if getattr(args, "apply_bert_init", False): + self.apply(init_bert_params) + + @staticmethod + def add_args(parser): + """Add model-specific arguments to the parser.""" + # Arguments related to dropout + parser.add_argument( + "--dropout", type=float, metavar="D", help="dropout probability" + ) + parser.add_argument( + "--attention-dropout", + type=float, + metavar="D", + help="dropout probability for" " attention weights", + ) + parser.add_argument( + "--act-dropout", + type=float, + metavar="D", + help="dropout probability after" " activation in FFN", + ) + + # Arguments related to hidden states and self-attention + parser.add_argument( + "--encoder-ffn-embed-dim", + type=int, + metavar="N", + help="encoder embedding dimension for FFN", + ) + parser.add_argument( + "--encoder-layers", type=int, metavar="N", help="num encoder layers" + ) + parser.add_argument( + "--encoder-attention-heads", + type=int, + metavar="N", + help="num encoder attention heads", + ) + + # Arguments related to input and output embeddings + parser.add_argument( + "--encoder-embed-dim", + type=int, + metavar="N", + help="encoder embedding dimension", + ) + parser.add_argument( + "--share-encoder-input-output-embed", + action="store_true", + help="share encoder input" " and output embeddings", + ) + parser.add_argument( + "--encoder-learned-pos", + action="store_true", + help="use learned positional embeddings in the encoder", + ) + parser.add_argument( + "--no-token-positional-embeddings", + action="store_true", + help="if set, disables positional embeddings" " (outside self attention)", + ) + parser.add_argument( + "--num-segment", type=int, metavar="N", help="num segment in the input" + ) + parser.add_argument( + "--max-positions", type=int, help="number of positional embeddings to learn" + ) + + # Arguments related to sentence level prediction + parser.add_argument( + "--sentence-class-num", + type=int, + metavar="N", + help="number of classes for sentence task", + ) + parser.add_argument( + "--sent-loss", + action="store_true", + help="if set," " calculate sentence level predictions", + ) + + # Arguments related to parameter initialization + parser.add_argument( + "--apply-bert-init", + action="store_true", + help="use custom param initialization for BERT", + ) + + # misc params + parser.add_argument( + "--activation-fn", + choices=utils.get_available_activation_fns(), + help="activation function to use", + ) + parser.add_argument( + "--pooler-activation-fn", + choices=utils.get_available_activation_fns(), + help="Which activation function to use for pooler layer.", + ) + parser.add_argument( + "--encoder-normalize-before", + action="store_true", + help="apply layernorm before each encoder block", + ) + + def forward(self, src_tokens, segment_labels=None, **kwargs): + return self.encoder(src_tokens, segment_labels=segment_labels, **kwargs) + + def max_positions(self): + return self.encoder.max_positions + + @classmethod + def build_model(cls, args, task): + """Build a new model instance.""" + # make sure all arguments are present in older models + base_architecture(args) + + if not hasattr(args, "max_positions"): + args.max_positions = args.tokens_per_sample + + logger.info(args) + + encoder = MaskedLMEncoder(args, task.dictionary) + return cls(args, encoder) + + +class MaskedLMEncoder(FairseqEncoder): + """ + Encoder for Masked Language Modelling. 
+ """ + + def __init__(self, args, dictionary): + super().__init__(dictionary) + + self.padding_idx = dictionary.pad() + self.vocab_size = dictionary.__len__() + self.max_positions = args.max_positions + + self.sentence_encoder = TransformerSentenceEncoder( + padding_idx=self.padding_idx, + vocab_size=self.vocab_size, + num_encoder_layers=args.encoder_layers, + embedding_dim=args.encoder_embed_dim, + ffn_embedding_dim=args.encoder_ffn_embed_dim, + num_attention_heads=args.encoder_attention_heads, + dropout=args.dropout, + attention_dropout=args.attention_dropout, + activation_dropout=args.act_dropout, + max_seq_len=self.max_positions, + num_segments=args.num_segment, + use_position_embeddings=not args.no_token_positional_embeddings, + encoder_normalize_before=args.encoder_normalize_before, + apply_bert_init=args.apply_bert_init, + activation_fn=args.activation_fn, + learned_pos_embedding=args.encoder_learned_pos, + ) + + self.share_input_output_embed = args.share_encoder_input_output_embed + self.embed_out = None + self.sentence_projection_layer = None + self.sentence_out_dim = args.sentence_class_num + self.lm_output_learned_bias = None + + # Remove head is set to true during fine-tuning + self.load_softmax = not getattr(args, "remove_head", False) + + self.masked_lm_pooler = nn.Linear( + args.encoder_embed_dim, args.encoder_embed_dim + ) + self.pooler_activation = utils.get_activation_fn(args.pooler_activation_fn) + + self.lm_head_transform_weight = nn.Linear( + args.encoder_embed_dim, args.encoder_embed_dim + ) + self.activation_fn = utils.get_activation_fn(args.activation_fn) + self.layer_norm = LayerNorm(args.encoder_embed_dim) + + self.lm_output_learned_bias = None + if self.load_softmax: + self.lm_output_learned_bias = nn.Parameter(torch.zeros(self.vocab_size)) + + if not self.share_input_output_embed: + self.embed_out = nn.Linear( + args.encoder_embed_dim, self.vocab_size, bias=False + ) + + if args.sent_loss: + self.sentence_projection_layer = nn.Linear( + args.encoder_embed_dim, self.sentence_out_dim, bias=False + ) + + def forward(self, src_tokens, segment_labels=None, masked_tokens=None, **unused): + """ + Forward pass for Masked LM encoder. This first computes the token + embedding using the token embedding matrix, position embeddings (if + specified) and segment embeddings (if specified). + + Here we assume that the sentence representation corresponds to the + output of the classification_token (see bert_task or cross_lingual_lm + task for more details). + Args: + - src_tokens: B x T matrix representing sentences + - segment_labels: B x T matrix representing segment label for tokens + Returns: + - a tuple of the following: + - logits for predictions in format B x T x C to be used in + softmax afterwards + - a dictionary of additional data, where 'pooled_output' contains + the representation for classification_token and 'inner_states' + is a list of internal model states used to compute the + predictions (similar in ELMO). 'sentence_logits' + is the prediction logit for NSP task and is only computed if + this is specified in the input arguments. 
+ """ + + inner_states, sentence_rep = self.sentence_encoder( + src_tokens, + segment_labels=segment_labels, + ) + + x = inner_states[-1].transpose(0, 1) + # project masked tokens only + if masked_tokens is not None: + x = x[masked_tokens, :] + x = self.layer_norm(self.activation_fn(self.lm_head_transform_weight(x))) + + pooled_output = self.pooler_activation(self.masked_lm_pooler(sentence_rep)) + + # project back to size of vocabulary + if self.share_input_output_embed and hasattr( + self.sentence_encoder.embed_tokens, "weight" + ): + x = F.linear(x, self.sentence_encoder.embed_tokens.weight) + elif self.embed_out is not None: + x = self.embed_out(x) + if self.lm_output_learned_bias is not None: + x = x + self.lm_output_learned_bias + sentence_logits = None + if self.sentence_projection_layer: + sentence_logits = self.sentence_projection_layer(pooled_output) + + return x, { + "inner_states": inner_states, + "pooled_output": pooled_output, + "sentence_logits": sentence_logits, + } + + def max_positions(self): + """Maximum output length supported by the encoder.""" + return self.max_positions + + def upgrade_state_dict_named(self, state_dict, name): + if isinstance( + self.sentence_encoder.embed_positions, SinusoidalPositionalEmbedding + ): + state_dict[ + name + ".sentence_encoder.embed_positions._float_tensor" + ] = torch.FloatTensor(1) + if not self.load_softmax: + for k in list(state_dict.keys()): + if ( + "embed_out.weight" in k + or "sentence_projection_layer.weight" in k + or "lm_output_learned_bias" in k + ): + del state_dict[k] + return state_dict + + +@register_model_architecture("masked_lm", "masked_lm") +def base_architecture(args): + args.dropout = getattr(args, "dropout", 0.1) + args.attention_dropout = getattr(args, "attention_dropout", 0.1) + args.act_dropout = getattr(args, "act_dropout", 0.0) + + args.encoder_ffn_embed_dim = getattr(args, "encoder_ffn_embed_dim", 4096) + args.encoder_layers = getattr(args, "encoder_layers", 6) + args.encoder_attention_heads = getattr(args, "encoder_attention_heads", 8) + + args.encoder_embed_dim = getattr(args, "encoder_embed_dim", 1024) + args.share_encoder_input_output_embed = getattr( + args, "share_encoder_input_output_embed", False + ) + args.encoder_learned_pos = getattr(args, "encoder_learned_pos", False) + args.no_token_positional_embeddings = getattr( + args, "no_token_positional_embeddings", False + ) + args.num_segment = getattr(args, "num_segment", 2) + + args.sentence_class_num = getattr(args, "sentence_class_num", 2) + args.sent_loss = getattr(args, "sent_loss", False) + + args.apply_bert_init = getattr(args, "apply_bert_init", False) + + args.activation_fn = getattr(args, "activation_fn", "relu") + args.pooler_activation_fn = getattr(args, "pooler_activation_fn", "tanh") + args.encoder_normalize_before = getattr(args, "encoder_normalize_before", False) + + +@register_model_architecture("masked_lm", "bert_base") +def bert_base_architecture(args): + args.encoder_embed_dim = getattr(args, "encoder_embed_dim", 768) + args.share_encoder_input_output_embed = getattr( + args, "share_encoder_input_output_embed", True + ) + args.no_token_positional_embeddings = getattr( + args, "no_token_positional_embeddings", False + ) + args.encoder_learned_pos = getattr(args, "encoder_learned_pos", True) + args.num_segment = getattr(args, "num_segment", 2) + + args.encoder_layers = getattr(args, "encoder_layers", 12) + + args.encoder_attention_heads = getattr(args, "encoder_attention_heads", 12) + args.encoder_ffn_embed_dim = getattr(args, 
"encoder_ffn_embed_dim", 3072) + + args.sentence_class_num = getattr(args, "sentence_class_num", 2) + args.sent_loss = getattr(args, "sent_loss", True) + + args.apply_bert_init = getattr(args, "apply_bert_init", True) + + args.activation_fn = getattr(args, "activation_fn", "gelu") + args.pooler_activation_fn = getattr(args, "pooler_activation_fn", "tanh") + args.encoder_normalize_before = getattr(args, "encoder_normalize_before", True) + base_architecture(args) + + +@register_model_architecture("masked_lm", "bert_large") +def bert_large_architecture(args): + args.encoder_embed_dim = getattr(args, "encoder_embed_dim", 1024) + args.encoder_layers = getattr(args, "encoder_layers", 24) + args.encoder_attention_heads = getattr(args, "encoder_attention_heads", 16) + args.encoder_ffn_embed_dim = getattr(args, "encoder_ffn_embed_dim", 4096) + bert_base_architecture(args) + + +@register_model_architecture("masked_lm", "xlm_base") +def xlm_architecture(args): + args.encoder_embed_dim = getattr(args, "encoder_embed_dim", 1024) + args.share_encoder_input_output_embed = getattr( + args, "share_encoder_input_output_embed", True + ) + args.no_token_positional_embeddings = getattr( + args, "no_token_positional_embeddings", False + ) + args.encoder_learned_pos = getattr(args, "encoder_learned_pos", True) + args.num_segment = getattr(args, "num_segment", 1) + + args.encoder_layers = getattr(args, "encoder_layers", 6) + + args.encoder_attention_heads = getattr(args, "encoder_attention_heads", 8) + args.encoder_ffn_embed_dim = getattr(args, "encoder_ffn_embed_dim", 4096) + + args.sent_loss = getattr(args, "sent_loss", False) + + args.activation_fn = getattr(args, "activation_fn", "gelu") + args.encoder_normalize_before = getattr(args, "encoder_normalize_before", False) + args.pooler_activation_fn = getattr(args, "pooler_activation_fn", "tanh") + args.apply_bert_init = getattr(args, "apply_bert_init", True) + base_architecture(args) diff --git a/SpeechT5/fairseq/fairseq/models/model_utils.py b/SpeechT5/fairseq/fairseq/models/model_utils.py new file mode 100644 index 0000000000000000000000000000000000000000..732d66b1d5f695151c26d29eb7f6b53179c269f1 --- /dev/null +++ b/SpeechT5/fairseq/fairseq/models/model_utils.py @@ -0,0 +1,92 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. 
+ +from typing import List, Optional + +import torch +from torch import Tensor + + +@torch.jit.script +def script_skip_tensor_list(x: List[Tensor], mask): + res = [xi[mask] if xi.size(0) == mask.size(0) else xi[:, mask] for xi in x] + outputs = [] + for i, t in enumerate(res): + if t.numel() != 0: + outputs.append(t) + else: + outputs.append(x[i]) + return outputs + + +@torch.jit.script +def script_skip_tensor(x: Tensor, mask): + # None case + if x.size(0) == 0: + return x + res = x[mask] if x.size(0) == mask.size(0) else x[:, mask] + if res.numel() == 0: + return x + else: + return res + + +@torch.jit.script +def expand_2d_or_3d_tensor(x, trg_dim: int, padding_idx: int): + """ + Expand 2D/3D tensor on dim=1 + """ + if x is None: + return None + + assert x.dim() == 2 or x.dim() == 3 + assert trg_dim >= x.size(1), (trg_dim, x.size()) + if trg_dim == x.size(1): + return x + + dims = [x.size(0), trg_dim - x.size(1)] + if x.dim() == 3: + dims.append(x.size(2)) + x = torch.cat([x, torch.zeros(dims).to(x).fill_(padding_idx)], 1) + + return x + + +@torch.jit.script +def coalesce(x: Optional[Tensor], y: Tensor) -> Tensor: + return x if x is not None else y + + +@torch.jit.script +def fill_tensors( + x: Optional[Tensor], mask, y: Optional[Tensor], padding_idx: int +) -> Optional[Tensor]: + """ + Filling tensor x with y at masked positions (dim=0). + """ + if x is None or x.size()[0] == 0 or y is None: + return x + assert x.dim() == y.dim() and mask.size(0) == x.size(0) + assert x.dim() == 2 or (x.dim() == 3 and x.size(2) == y.size(2)) + + n_selected = mask.sum() + if n_selected == 0: + return x + assert n_selected == y.size(0) + if n_selected == x.size(0): + return y + + if x.size(1) < y.size(1): + x = expand_2d_or_3d_tensor(x, y.size(1), padding_idx) + x[mask] = y + elif x.size(1) > y.size(1): + x[mask] = torch.tensor(padding_idx).type_as(x) + if x.dim() == 2: + x[mask, : y.size(1)] = y + else: + x[mask, : y.size(1), :] = y + else: + x[mask] = y + return x diff --git a/SpeechT5/fairseq/fairseq/models/multilingual_transformer.py b/SpeechT5/fairseq/fairseq/models/multilingual_transformer.py new file mode 100644 index 0000000000000000000000000000000000000000..2e1f86f36e01a2dd105c13f2e69b0eb25caa9fca --- /dev/null +++ b/SpeechT5/fairseq/fairseq/models/multilingual_transformer.py @@ -0,0 +1,228 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +from collections import OrderedDict + +from fairseq import utils +from fairseq.models import ( + FairseqMultiModel, + register_model, + register_model_architecture, +) +from fairseq.models.transformer import ( + Embedding, + TransformerDecoder, + TransformerEncoder, + TransformerModel, + base_architecture, +) + + +@register_model("multilingual_transformer") +class MultilingualTransformerModel(FairseqMultiModel): + """Train Transformer models for multiple language pairs simultaneously. + + Requires `--task multilingual_translation`. + + We inherit all arguments from TransformerModel and assume that all language + pairs use a single Transformer architecture. In addition, we provide several + options that are specific to the multilingual setting. + + Args: + --share-encoder-embeddings: share encoder embeddings across all source languages + --share-decoder-embeddings: share decoder embeddings across all target languages + --share-encoders: share all encoder params (incl. 
embeddings) across all source languages + --share-decoders: share all decoder params (incl. embeddings) across all target languages + """ + + def __init__(self, encoders, decoders): + super().__init__(encoders, decoders) + + @staticmethod + def add_args(parser): + """Add model-specific arguments to the parser.""" + TransformerModel.add_args(parser) + parser.add_argument( + "--share-encoder-embeddings", + action="store_true", + help="share encoder embeddings across languages", + ) + parser.add_argument( + "--share-decoder-embeddings", + action="store_true", + help="share decoder embeddings across languages", + ) + parser.add_argument( + "--share-encoders", + action="store_true", + help="share encoders across languages", + ) + parser.add_argument( + "--share-decoders", + action="store_true", + help="share decoders across languages", + ) + + @classmethod + def build_model(cls, args, task): + """Build a new model instance.""" + from fairseq.tasks.multilingual_translation import MultilingualTranslationTask + + assert isinstance(task, MultilingualTranslationTask) + + # make sure all arguments are present in older models + base_multilingual_architecture(args) + + if not hasattr(args, "max_source_positions"): + args.max_source_positions = 1024 + if not hasattr(args, "max_target_positions"): + args.max_target_positions = 1024 + + src_langs = [lang_pair.split("-")[0] for lang_pair in task.model_lang_pairs] + tgt_langs = [lang_pair.split("-")[1] for lang_pair in task.model_lang_pairs] + + if args.share_encoders: + args.share_encoder_embeddings = True + if args.share_decoders: + args.share_decoder_embeddings = True + + def build_embedding(dictionary, embed_dim, path=None): + num_embeddings = len(dictionary) + padding_idx = dictionary.pad() + emb = Embedding(num_embeddings, embed_dim, padding_idx) + # if provided, load from preloaded dictionaries + if path: + embed_dict = utils.parse_embedding(path) + utils.load_embedding(embed_dict, dictionary, emb) + return emb + + # build shared embeddings (if applicable) + shared_encoder_embed_tokens, shared_decoder_embed_tokens = None, None + if args.share_all_embeddings: + if args.encoder_embed_dim != args.decoder_embed_dim: + raise ValueError( + "--share-all-embeddings requires --encoder-embed-dim to match --decoder-embed-dim" + ) + if args.decoder_embed_path and ( + args.decoder_embed_path != args.encoder_embed_path + ): + raise ValueError( + "--share-all-embeddings not compatible with --decoder-embed-path" + ) + shared_encoder_embed_tokens = FairseqMultiModel.build_shared_embeddings( + dicts=task.dicts, + langs=task.langs, + embed_dim=args.encoder_embed_dim, + build_embedding=build_embedding, + pretrained_embed_path=args.encoder_embed_path, + ) + shared_decoder_embed_tokens = shared_encoder_embed_tokens + args.share_decoder_input_output_embed = True + else: + if args.share_encoder_embeddings: + shared_encoder_embed_tokens = FairseqMultiModel.build_shared_embeddings( + dicts=task.dicts, + langs=src_langs, + embed_dim=args.encoder_embed_dim, + build_embedding=build_embedding, + pretrained_embed_path=args.encoder_embed_path, + ) + if args.share_decoder_embeddings: + shared_decoder_embed_tokens = FairseqMultiModel.build_shared_embeddings( + dicts=task.dicts, + langs=tgt_langs, + embed_dim=args.decoder_embed_dim, + build_embedding=build_embedding, + pretrained_embed_path=args.decoder_embed_path, + ) + + # encoders/decoders for each language + lang_encoders, lang_decoders = {}, {} + + def get_encoder(lang): + if lang not in lang_encoders: + if 
shared_encoder_embed_tokens is not None: + encoder_embed_tokens = shared_encoder_embed_tokens + else: + encoder_embed_tokens = build_embedding( + task.dicts[lang], + args.encoder_embed_dim, + args.encoder_embed_path, + ) + lang_encoders[lang] = cls._get_module_class( + True, args, task.dicts[lang], encoder_embed_tokens, src_langs + ) + return lang_encoders[lang] + + def get_decoder(lang): + if lang not in lang_decoders: + if shared_decoder_embed_tokens is not None: + decoder_embed_tokens = shared_decoder_embed_tokens + else: + decoder_embed_tokens = build_embedding( + task.dicts[lang], + args.decoder_embed_dim, + args.decoder_embed_path, + ) + lang_decoders[lang] = cls._get_module_class( + False, args, task.dicts[lang], decoder_embed_tokens, tgt_langs + ) + return lang_decoders[lang] + + # shared encoders/decoders (if applicable) + shared_encoder, shared_decoder = None, None + if args.share_encoders: + shared_encoder = get_encoder(src_langs[0]) + if args.share_decoders: + shared_decoder = get_decoder(tgt_langs[0]) + + encoders, decoders = OrderedDict(), OrderedDict() + for lang_pair, src, tgt in zip(task.model_lang_pairs, src_langs, tgt_langs): + encoders[lang_pair] = ( + shared_encoder if shared_encoder is not None else get_encoder(src) + ) + decoders[lang_pair] = ( + shared_decoder if shared_decoder is not None else get_decoder(tgt) + ) + + return MultilingualTransformerModel(encoders, decoders) + + @classmethod + def _get_module_class(cls, is_encoder, args, lang_dict, embed_tokens, langs): + module_class = TransformerEncoder if is_encoder else TransformerDecoder + return module_class(args, lang_dict, embed_tokens) + + def load_state_dict(self, state_dict, strict=True, model_cfg=None): + state_dict_subset = state_dict.copy() + for k, _ in state_dict.items(): + assert k.startswith("models.") + lang_pair = k.split(".")[1] + if lang_pair not in self.models: + del state_dict_subset[k] + super().load_state_dict(state_dict_subset, strict=strict, model_cfg=model_cfg) + + +@register_model_architecture("multilingual_transformer", "multilingual_transformer") +def base_multilingual_architecture(args): + base_architecture(args) + args.share_encoder_embeddings = getattr(args, "share_encoder_embeddings", False) + args.share_decoder_embeddings = getattr(args, "share_decoder_embeddings", False) + args.share_encoders = getattr(args, "share_encoders", False) + args.share_decoders = getattr(args, "share_decoders", False) + + +@register_model_architecture( + "multilingual_transformer", "multilingual_transformer_iwslt_de_en" +) +def multilingual_transformer_iwslt_de_en(args): + args.encoder_embed_dim = getattr(args, "encoder_embed_dim", 512) + args.encoder_ffn_embed_dim = getattr(args, "encoder_ffn_embed_dim", 1024) + args.encoder_attention_heads = getattr(args, "encoder_attention_heads", 4) + args.encoder_layers = getattr(args, "encoder_layers", 6) + args.decoder_embed_dim = getattr(args, "decoder_embed_dim", 512) + args.decoder_ffn_embed_dim = getattr(args, "decoder_ffn_embed_dim", 1024) + args.decoder_attention_heads = getattr(args, "decoder_attention_heads", 4) + args.decoder_layers = getattr(args, "decoder_layers", 6) + base_multilingual_architecture(args) diff --git a/SpeechT5/fairseq/fairseq/models/nat/__init__.py b/SpeechT5/fairseq/fairseq/models/nat/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..05fe822487c3bcde8346648d5826f1669c6bc1ca --- /dev/null +++ b/SpeechT5/fairseq/fairseq/models/nat/__init__.py @@ -0,0 +1,13 @@ +# Copyright (c) Facebook, Inc. 
and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. +"""isort:skip_file""" + +from .fairseq_nat_model import * +from .nonautoregressive_transformer import * +from .nat_crf_transformer import * +from .iterative_nonautoregressive_transformer import * +from .cmlm_transformer import * +from .levenshtein_transformer import * +from .insertion_transformer import * diff --git a/SpeechT5/fairseq/fairseq/models/nat/cmlm_transformer.py b/SpeechT5/fairseq/fairseq/models/nat/cmlm_transformer.py new file mode 100644 index 0000000000000000000000000000000000000000..c876e9453c101c00bd8e93e6e6f1fb48dc26f993 --- /dev/null +++ b/SpeechT5/fairseq/fairseq/models/nat/cmlm_transformer.py @@ -0,0 +1,162 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +""" +This file implements: +Ghazvininejad, Marjan, et al. +"Constant-time machine translation with conditional masked language models." +arXiv preprint arXiv:1904.09324 (2019). +""" + +from fairseq.models import register_model, register_model_architecture +from fairseq.models.nat import NATransformerModel +from fairseq.utils import new_arange + + +def _skeptical_unmasking(output_scores, output_masks, p): + sorted_index = output_scores.sort(-1)[1] + boundary_len = ( + (output_masks.sum(1, keepdim=True).type_as(output_scores) - 2) * p + ).long() + skeptical_mask = new_arange(output_masks) < boundary_len + return skeptical_mask.scatter(1, sorted_index, skeptical_mask) + + +@register_model("cmlm_transformer") +class CMLMNATransformerModel(NATransformerModel): + @staticmethod + def add_args(parser): + NATransformerModel.add_args(parser) + + def forward( + self, src_tokens, src_lengths, prev_output_tokens, tgt_tokens, **kwargs + ): + assert not self.decoder.src_embedding_copy, "do not support embedding copy." + + # encoding + encoder_out = self.encoder(src_tokens, src_lengths=src_lengths, **kwargs) + # length prediction + length_out = self.decoder.forward_length( + normalize=False, encoder_out=encoder_out + ) + length_tgt = self.decoder.forward_length_prediction( + length_out, encoder_out, tgt_tokens + ) + + # decoding + word_ins_out = self.decoder( + normalize=False, + prev_output_tokens=prev_output_tokens, + encoder_out=encoder_out, + ) + word_ins_mask = prev_output_tokens.eq(self.unk) + + return { + "word_ins": { + "out": word_ins_out, + "tgt": tgt_tokens, + "mask": word_ins_mask, + "ls": self.args.label_smoothing, + "nll_loss": True, + }, + "length": { + "out": length_out, + "tgt": length_tgt, + "factor": self.decoder.length_loss_factor, + }, + } + + def forward_decoder(self, decoder_out, encoder_out, decoding_format=None, **kwargs): + + step = decoder_out.step + max_step = decoder_out.max_step + + output_tokens = decoder_out.output_tokens + output_scores = decoder_out.output_scores + history = decoder_out.history + + # execute the decoder + output_masks = output_tokens.eq(self.unk) + _scores, _tokens = self.decoder( + normalize=True, + prev_output_tokens=output_tokens, + encoder_out=encoder_out, + ).max(-1) + output_tokens.masked_scatter_(output_masks, _tokens[output_masks]) + output_scores.masked_scatter_(output_masks, _scores[output_masks]) + + if history is not None: + history.append(output_tokens.clone()) + + # skeptical decoding (depend on the maximum decoding steps.) 
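+ # Mask-predict re-masking (descriptive note): except on the final step, the
+ # lowest-scoring non-pad predictions are set back to <unk> so they can be
+ # re-predicted in the next iteration; the re-masked fraction decays roughly
+ # linearly with the step, p = 1 - (step + 1) / max_step of the target length.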
+ if (step + 1) < max_step: + skeptical_mask = _skeptical_unmasking( + output_scores, output_tokens.ne(self.pad), 1 - (step + 1) / max_step + ) + + output_tokens.masked_fill_(skeptical_mask, self.unk) + output_scores.masked_fill_(skeptical_mask, 0.0) + + if history is not None: + history.append(output_tokens.clone()) + + return decoder_out._replace( + output_tokens=output_tokens, + output_scores=output_scores, + attn=None, + history=history, + ) + + +@register_model_architecture("cmlm_transformer", "cmlm_transformer") +def cmlm_base_architecture(args): + args.encoder_embed_path = getattr(args, "encoder_embed_path", None) + args.encoder_embed_dim = getattr(args, "encoder_embed_dim", 512) + args.encoder_ffn_embed_dim = getattr(args, "encoder_ffn_embed_dim", 2048) + args.encoder_layers = getattr(args, "encoder_layers", 6) + args.encoder_attention_heads = getattr(args, "encoder_attention_heads", 8) + args.encoder_normalize_before = getattr(args, "encoder_normalize_before", False) + args.encoder_learned_pos = getattr(args, "encoder_learned_pos", False) + args.decoder_embed_path = getattr(args, "decoder_embed_path", None) + args.decoder_embed_dim = getattr(args, "decoder_embed_dim", args.encoder_embed_dim) + args.decoder_ffn_embed_dim = getattr( + args, "decoder_ffn_embed_dim", args.encoder_ffn_embed_dim + ) + args.decoder_layers = getattr(args, "decoder_layers", 6) + args.decoder_attention_heads = getattr(args, "decoder_attention_heads", 8) + args.decoder_normalize_before = getattr(args, "decoder_normalize_before", False) + args.decoder_learned_pos = getattr(args, "decoder_learned_pos", False) + args.attention_dropout = getattr(args, "attention_dropout", 0.0) + args.activation_dropout = getattr(args, "activation_dropout", 0.0) + args.activation_fn = getattr(args, "activation_fn", "relu") + args.dropout = getattr(args, "dropout", 0.1) + args.adaptive_softmax_cutoff = getattr(args, "adaptive_softmax_cutoff", None) + args.adaptive_softmax_dropout = getattr(args, "adaptive_softmax_dropout", 0) + args.share_decoder_input_output_embed = getattr( + args, "share_decoder_input_output_embed", False + ) + args.share_all_embeddings = getattr(args, "share_all_embeddings", True) + args.no_token_positional_embeddings = getattr( + args, "no_token_positional_embeddings", False + ) + args.adaptive_input = getattr(args, "adaptive_input", False) + args.apply_bert_init = getattr(args, "apply_bert_init", False) + + args.decoder_output_dim = getattr( + args, "decoder_output_dim", args.decoder_embed_dim + ) + args.decoder_input_dim = getattr(args, "decoder_input_dim", args.decoder_embed_dim) + + # --- special arguments --- + args.sg_length_pred = getattr(args, "sg_length_pred", False) + args.pred_length_offset = getattr(args, "pred_length_offset", False) + args.length_loss_factor = getattr(args, "length_loss_factor", 0.1) + args.ngram_predictor = getattr(args, "ngram_predictor", 1) + args.src_embedding_copy = getattr(args, "src_embedding_copy", False) + + +@register_model_architecture("cmlm_transformer", "cmlm_transformer_wmt_en_de") +def cmlm_wmt_en_de(args): + cmlm_base_architecture(args) diff --git a/SpeechT5/fairseq/fairseq/models/nat/fairseq_nat_model.py b/SpeechT5/fairseq/fairseq/models/nat/fairseq_nat_model.py new file mode 100644 index 0000000000000000000000000000000000000000..b09394112f57d9e82f2a4cbc371af888281b9e8a --- /dev/null +++ b/SpeechT5/fairseq/fairseq/models/nat/fairseq_nat_model.py @@ -0,0 +1,170 @@ +# Copyright (c) Facebook, Inc. and its affiliates. 
+# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +import math + +import torch +from fairseq.models.transformer import ( + TransformerDecoder, + TransformerEncoder, + TransformerModel, +) +from fairseq.modules.transformer_sentence_encoder import init_bert_params + + +def ensemble_encoder(func): + def wrapper(self, *args, **kwargs): + if self.ensemble_models is None or len(self.ensemble_models) == 1: + return func(self, *args, **kwargs) + encoder_outs = [func(model, *args, **kwargs, return_all_hiddens=True) for model in self.ensemble_models] + _encoder_out = encoder_outs[0].copy() + + def stack(key): + outs = [e[key][0] for e in encoder_outs] + return [torch.stack(outs, -1) if outs[0] is not None else None] + + _encoder_out["encoder_out"] = stack("encoder_out") + _encoder_out["encoder_embedding"] = stack("encoder_embedding") + + num_layers = len(_encoder_out["encoder_states"]) + if num_layers > 0: + _encoder_out["encoder_states"] = [ + torch.stack([e["encoder_states"][i] for e in encoder_outs], -1) + for i in range(num_layers) + ] + return _encoder_out + + return wrapper + + +def ensemble_decoder(func): + def wrapper(self, normalize=False, encoder_out=None, *args, **kwargs): + if self.ensemble_models is None or len(self.ensemble_models) == 1: + return func( + self, normalize=normalize, encoder_out=encoder_out, *args, **kwargs + ) + + def _replace(encoder_out, new_val): + new_encoder_out = encoder_out.copy() + new_encoder_out["encoder_out"] = [new_val] + return new_encoder_out + + action_outs = [ + func( + model, + normalize=normalize, + encoder_out=_replace( + encoder_out, + encoder_out["encoder_out"][0][:, :, :, i] + ), + *args, + **kwargs + ) + for i, model in enumerate(self.ensemble_models) + ] + + if not isinstance(action_outs[0], tuple): # return multiple values + action_outs = [[a] for a in action_outs] + else: + action_outs = [list(a) for a in action_outs] + + ensembled_outs = [] + for i in range(len(action_outs[0])): + if i == 0 and normalize: + ensembled_outs += [ + torch.logsumexp( + torch.stack([a[i] for a in action_outs], -1), dim=-1 + ) + - math.log(len(self.ensemble_models)) + ] + elif action_outs[0][i] is not None: + ensembled_outs += [torch.stack([a[i] for a in action_outs], -1)] + else: + ensembled_outs += [None] + + if len(ensembled_outs) == 1: + return ensembled_outs[0] + return tuple(ensembled_outs) + + return wrapper + + +class FairseqNATModel(TransformerModel): + """ + Abstract class for all nonautoregressive-based models + """ + + def __init__(self, args, encoder, decoder): + super().__init__(args, encoder, decoder) + self.tgt_dict = decoder.dictionary + self.bos = decoder.dictionary.bos() + self.eos = decoder.dictionary.eos() + self.pad = decoder.dictionary.pad() + self.unk = decoder.dictionary.unk() + + self.ensemble_models = None + + @property + def allow_length_beam(self): + return False + + @property + def allow_ensemble(self): + return True + + def enable_ensemble(self, models): + self.encoder.ensemble_models = [m.encoder for m in models] + self.decoder.ensemble_models = [m.decoder for m in models] + + @staticmethod + def add_args(parser): + TransformerModel.add_args(parser) + parser.add_argument( + "--apply-bert-init", + action="store_true", + help="use custom param initialization for BERT", + ) + + @classmethod + def build_decoder(cls, args, tgt_dict, embed_tokens): + decoder = FairseqNATDecoder(args, tgt_dict, embed_tokens) + if getattr(args, "apply_bert_init", False): + 
decoder.apply(init_bert_params) + return decoder + + @classmethod + def build_encoder(cls, args, src_dict, embed_tokens): + encoder = FairseqNATEncoder(args, src_dict, embed_tokens) + if getattr(args, "apply_bert_init", False): + encoder.apply(init_bert_params) + return encoder + + def forward_encoder(self, encoder_inputs): + return self.encoder(*encoder_inputs) + + def forward_decoder(self, *args, **kwargs): + return NotImplementedError + + def initialize_output_tokens(self, *args, **kwargs): + return NotImplementedError + + def forward(self, *args, **kwargs): + return NotImplementedError + + +class FairseqNATEncoder(TransformerEncoder): + def __init__(self, args, dictionary, embed_tokens): + super().__init__(args, dictionary, embed_tokens) + self.ensemble_models = None + + @ensemble_encoder + def forward(self, *args, **kwargs): + return super().forward(*args, **kwargs) + + +class FairseqNATDecoder(TransformerDecoder): + def __init__(self, args, dictionary, embed_tokens, no_encoder_attn=False): + super().__init__(args, dictionary, embed_tokens, no_encoder_attn) + self.ensemble_models = None diff --git a/SpeechT5/fairseq/fairseq/models/nat/insertion_transformer.py b/SpeechT5/fairseq/fairseq/models/nat/insertion_transformer.py new file mode 100644 index 0000000000000000000000000000000000000000..bc28000f59a3b9e8098f9fe710cc8335d39eea3e --- /dev/null +++ b/SpeechT5/fairseq/fairseq/models/nat/insertion_transformer.py @@ -0,0 +1,280 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +import numpy as np +import torch +import torch.nn.functional as F +from fairseq.models import register_model, register_model_architecture +from fairseq.models.nat import ( + FairseqNATModel, + LevenshteinTransformerDecoder, + LevenshteinTransformerModel, + ensemble_decoder, +) +from fairseq.models.transformer import Linear +from fairseq.modules.transformer_sentence_encoder import init_bert_params +from fairseq.utils import new_arange + + +class NegativeDistanceScore(object): + def __init__(self): + + # pre-compute some values + self.scores = {} + + self.scores[0.5] = self.compute_score_full(50, 0.5) + self.scores[1.0] = self.compute_score_full(50, 1.0) + self.scores[2.0] = self.compute_score_full(50, 2.0) + + def __call__(self, i, L, tau): + if (tau is None) or (tau > 1000): + return 1 / L + + if tau in self.scores: + if L < self.scores[tau].shape[0]: + return self.scores[tau][L - 1, i] + return self.compute_score(L, tau)[i] + + def compute_score(self, L, tau): + s = np.array([-abs(L / 2 - i) / tau for i in range(L)]) + s = np.exp(s - s.max()) + return s / s.sum() + + def compute_score_full(self, L, tau): + s = -abs(np.arange(0, L - 1)[:, None] / 2 - np.arange(L)[None, :]) / tau + s = np.tril(s, 0) + np.triu(s - float("inf"), 1) + s = np.exp(s - s.max(1, keepdims=True)) + return s / s.sum(1, keepdims=True) + + +neg_scorer = NegativeDistanceScore() + + +def _get_ins_targets(in_tokens, out_tokens, padding_idx, unk_idx, vocab_size, tau=None): + try: + from fairseq import libnat + except ImportError as e: + import sys + + sys.stderr.write("ERROR: missing libnat. 
run `pip install --editable .`\n") + raise e + + B = in_tokens.size(0) + T = in_tokens.size(1) + V = vocab_size + + with torch.cuda.device_of(in_tokens): + in_tokens_list = [ + [t for t in s if t != padding_idx] for i, s in enumerate(in_tokens.tolist()) + ] + out_tokens_list = [ + [t for t in s if t != padding_idx] + for i, s in enumerate(out_tokens.tolist()) + ] + + full_labels = libnat.suggested_ed2_path( + in_tokens_list, out_tokens_list, padding_idx + ) + insert_labels = [a[:-1] for a in full_labels] + + # numericalize1 + insert_label_tensors = in_tokens.new_zeros(B * (T - 1) * V).float() + insert_index, insert_labels = zip( + *[ + (w + (j + i * (T - 1)) * V, neg_scorer(k, len(label), tau)) + for i, labels in enumerate(insert_labels) + for j, label in enumerate(labels[1:-1]) + for k, w in enumerate(label) + ] + ) # HACK 1:-1 + insert_index, insert_labels = [ + torch.tensor(list(a), device=in_tokens.device) + for a in [insert_index, insert_labels] + ] + insert_label_tensors.scatter_(0, insert_index.long(), insert_labels) + insert_label_tensors = insert_label_tensors.view(B, T - 1, V) + + return insert_label_tensors + + +def _apply_ins_words(in_tokens, in_scores, word_ins_pred, word_ins_scores, padding_idx): + + padding_masks = in_tokens[:, 1:].eq(padding_idx) + word_ins_scores.masked_fill_(padding_masks, 0.0) + word_ins_pred.masked_fill_(padding_masks, padding_idx) + + in_coords = new_arange(in_tokens).type_as(in_scores) + + # shift all padding predictions to infinite + out_coords = (in_coords[:, 1:] - 0.5).masked_fill( + word_ins_pred.eq(padding_idx), float("inf") + ) + out_coords = torch.cat([in_coords, out_coords], 1).sort(-1)[1] + out_tokens = torch.cat([in_tokens, word_ins_pred], 1).gather(1, out_coords) + out_scores = torch.cat([in_scores, word_ins_scores], 1).gather(1, out_coords) + return out_tokens, out_scores + + +@register_model("insertion_transformer") +class InsertionTransformerModel(LevenshteinTransformerModel): + def __init__(self, args, encoder, decoder): + super().__init__(args, encoder, decoder) + + @staticmethod + def add_args(parser): + FairseqNATModel.add_args(parser) + parser.add_argument("--label-tau", default=None, type=float) + + @classmethod + def build_decoder(cls, args, tgt_dict, embed_tokens): + decoder = InsertionTransformerDecoder(args, tgt_dict, embed_tokens) + if getattr(args, "apply_bert_init", False): + decoder.apply(init_bert_params) + return decoder + + def forward( + self, src_tokens, src_lengths, prev_output_tokens, tgt_tokens, **kwargs + ): + + assert tgt_tokens is not None, "forward function only supports training." 
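+ # Training sketch (descriptive note): encode the source once, then score every
+ # slot between adjacent tokens of prev_output_tokens. _get_ins_targets() turns
+ # the edit path to tgt_tokens into a soft distribution over the vocabulary for
+ # each slot; with --label-tau set, tokens nearer the middle of a missing span
+ # receive higher weight (see NegativeDistanceScore above).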
+ + # encoding + encoder_out = self.encoder(src_tokens, src_lengths=src_lengths, **kwargs) + + # generate training labels for insertion + word_ins_out = self.decoder.forward_word_ins( + normalize=False, + prev_output_tokens=prev_output_tokens, + encoder_out=encoder_out, + ) + + word_ins_tgt = _get_ins_targets( + prev_output_tokens, + tgt_tokens, + self.pad, + self.unk, + len(self.tgt_dict), + tau=self.decoder.label_tau, + ).type_as(word_ins_out) + word_ins_masks = prev_output_tokens[:, 1:].ne(self.pad) + + return { + "word_ins": { + "out": word_ins_out, + "tgt": word_ins_tgt, + "mask": word_ins_masks, + "ls": self.args.label_smoothing, + "nll_loss": True, + } + } + + def forward_decoder( + self, decoder_out, encoder_out, eos_penalty=0.0, max_ratio=None, **kwargs + ): + + output_tokens = decoder_out.output_tokens + output_scores = decoder_out.output_scores + history = decoder_out.history + + # TODO: decoding for InsertionTransformer + word_ins_score = self.decoder.forward_word_ins( + normalize=True, prev_output_tokens=output_tokens, encoder_out=encoder_out + ) + + if eos_penalty > 0.0: + word_ins_score[:, :, self.pad] -= eos_penalty + word_ins_score, word_ins_pred = word_ins_score.max(-1) + output_tokens, output_scores = _apply_ins_words( + output_tokens, output_scores, word_ins_pred, word_ins_score, self.pad + ) + + # delete some unnecessary paddings + cut_off = output_tokens.ne(self.pad).sum(1).max() + output_tokens = output_tokens[:, :cut_off] + output_scores = output_scores[:, :cut_off] + + if history is not None: + history.append(output_tokens.clone()) + + return decoder_out._replace( + output_tokens=output_tokens, + output_scores=output_scores, + attn=None, + history=history, + ) + + +class InsertionTransformerDecoder(LevenshteinTransformerDecoder): + def __init__(self, args, dictionary, embed_tokens, no_encoder_attn=False): + # use the TransformerDecoder's __init__ + super(LevenshteinTransformerDecoder, self).__init__( + args, dictionary, embed_tokens, no_encoder_attn=no_encoder_attn + ) + + self.dictionary = dictionary + self.bos = dictionary.bos() + self.unk = dictionary.unk() + self.eos = dictionary.eos() + self.pool_out = Linear(self.output_embed_dim * 2, self.output_embed_dim) + + self.label_tau = getattr(args, "label_tau", None) + + @ensemble_decoder + def forward_word_ins(self, normalize, encoder_out, prev_output_tokens): + features = self.extract_features(prev_output_tokens, encoder_out=encoder_out)[0] + features = self.pool_out( + torch.cat([features[:, :-1, :], features[:, 1:, :]], 2) + ) + decoder_out = self.output_layer(features) + return F.log_softmax(decoder_out, -1) if normalize else decoder_out + + def forward_mask_ins(self, *args, **kwargs): + raise NotImplementedError + + def forward_word_del(self, *args, **kwargs): + raise NotImplementedError + + +@register_model_architecture("insertion_transformer", "insertion_transformer") +def insertion_base_architecture(args): + args.encoder_embed_path = getattr(args, "encoder_embed_path", None) + args.encoder_embed_dim = getattr(args, "encoder_embed_dim", 512) + args.encoder_ffn_embed_dim = getattr(args, "encoder_ffn_embed_dim", 2048) + args.encoder_layers = getattr(args, "encoder_layers", 6) + args.encoder_attention_heads = getattr(args, "encoder_attention_heads", 8) + args.encoder_normalize_before = getattr(args, "encoder_normalize_before", False) + args.encoder_learned_pos = getattr(args, "encoder_learned_pos", False) + args.decoder_embed_path = getattr(args, "decoder_embed_path", None) + args.decoder_embed_dim = 
getattr(args, "decoder_embed_dim", args.encoder_embed_dim) + args.decoder_ffn_embed_dim = getattr( + args, "decoder_ffn_embed_dim", args.encoder_ffn_embed_dim + ) + args.decoder_layers = getattr(args, "decoder_layers", 6) + args.decoder_attention_heads = getattr(args, "decoder_attention_heads", 8) + args.decoder_normalize_before = getattr(args, "decoder_normalize_before", False) + args.decoder_learned_pos = getattr(args, "decoder_learned_pos", False) + args.attention_dropout = getattr(args, "attention_dropout", 0.0) + args.activation_dropout = getattr(args, "activation_dropout", 0.0) + args.activation_fn = getattr(args, "activation_fn", "relu") + args.dropout = getattr(args, "dropout", 0.1) + args.adaptive_softmax_cutoff = getattr(args, "adaptive_softmax_cutoff", None) + args.adaptive_softmax_dropout = getattr(args, "adaptive_softmax_dropout", 0) + args.share_decoder_input_output_embed = getattr( + args, "share_decoder_input_output_embed", False + ) + args.share_all_embeddings = getattr(args, "share_all_embeddings", False) + args.no_token_positional_embeddings = getattr( + args, "no_token_positional_embeddings", False + ) + args.adaptive_input = getattr(args, "adaptive_input", False) + args.apply_bert_init = getattr(args, "apply_bert_init", False) + + args.decoder_output_dim = getattr( + args, "decoder_output_dim", args.decoder_embed_dim + ) + args.decoder_input_dim = getattr(args, "decoder_input_dim", args.decoder_embed_dim) + + # special for insertion transformer + args.label_tau = getattr(args, "label_tau", None) diff --git a/SpeechT5/fairseq/fairseq/models/nat/iterative_nonautoregressive_transformer.py b/SpeechT5/fairseq/fairseq/models/nat/iterative_nonautoregressive_transformer.py new file mode 100644 index 0000000000000000000000000000000000000000..bc39509980a80eb8c21e0bfdb304649ad3acc4d0 --- /dev/null +++ b/SpeechT5/fairseq/fairseq/models/nat/iterative_nonautoregressive_transformer.py @@ -0,0 +1,228 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. 
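+# Iterative-refinement NAT (descriptive note): during training the decoder is
+# run for --train-step passes; after each pass its argmax (or Gumbel-noised,
+# with --stochastic-approx) prediction is fed back as the next decoder input,
+# and with probability --dae-ratio that input is instead a corrupted copy of
+# the target produced by _sequential_poisoning, adding a denoising
+# auto-encoder signal.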
+ +import torch +from fairseq.models import register_model, register_model_architecture +from fairseq.models.nat import NATransformerModel + + +def _sequential_poisoning(s, V, beta=0.33, bos=2, eos=3, pad=1): + # s: input batch + # V: vocabulary size + rand_words = torch.randint(low=4, high=V, size=s.size(), device=s.device) + choices = torch.rand(size=s.size(), device=s.device) + choices.masked_fill_((s == pad) | (s == bos) | (s == eos), 1) + + replace = choices < beta / 3 + repeat = (choices >= beta / 3) & (choices < beta * 2 / 3) + swap = (choices >= beta * 2 / 3) & (choices < beta) + safe = choices >= beta + + for i in range(s.size(1) - 1): + rand_word = rand_words[:, i] + next_word = s[:, i + 1] + self_word = s[:, i] + + replace_i = replace[:, i] + swap_i = swap[:, i] & (next_word != 3) + repeat_i = repeat[:, i] & (next_word != 3) + safe_i = safe[:, i] | ((next_word == 3) & (~replace_i)) + + s[:, i] = ( + self_word * (safe_i | repeat_i).long() + + next_word * swap_i.long() + + rand_word * replace_i.long() + ) + s[:, i + 1] = ( + next_word * (safe_i | replace_i).long() + + self_word * (swap_i | repeat_i).long() + ) + return s + + +def gumbel_noise(input, TINY=1e-8): + return ( + input.new_zeros(*input.size()) + .uniform_() + .add_(TINY) + .log_() + .neg_() + .add_(TINY) + .log_() + .neg_() + ) + + +@register_model("iterative_nonautoregressive_transformer") +class IterNATransformerModel(NATransformerModel): + @staticmethod + def add_args(parser): + NATransformerModel.add_args(parser) + parser.add_argument( + "--train-step", + type=int, + help="number of refinement iterations during training", + ) + parser.add_argument( + "--dae-ratio", + type=float, + help="the probability of switching to the denoising auto-encoder loss", + ) + parser.add_argument( + "--stochastic-approx", + action="store_true", + help="sampling from the decoder as the inputs for next iteration", + ) + + @classmethod + def build_model(cls, args, task): + model = super().build_model(args, task) + model.train_step = getattr(args, "train_step", 4) + model.dae_ratio = getattr(args, "dae_ratio", 0.5) + model.stochastic_approx = getattr(args, "stochastic_approx", False) + return model + + def forward( + self, src_tokens, src_lengths, prev_output_tokens, tgt_tokens, **kwargs + ): + + B, T = prev_output_tokens.size() + + # encoding + encoder_out = self.encoder(src_tokens, src_lengths=src_lengths, **kwargs) + + # length prediction + length_out = self.decoder.forward_length( + normalize=False, encoder_out=encoder_out + ) + length_tgt = self.decoder.forward_length_prediction( + length_out, encoder_out, tgt_tokens + ) + + # decoding + word_ins_outs, word_ins_tgts, word_ins_masks = [], [], [] + for t in range(self.train_step): + word_ins_out = self.decoder( + normalize=False, + prev_output_tokens=prev_output_tokens, + encoder_out=encoder_out, + step=t, + ) + word_ins_tgt = tgt_tokens + word_ins_mask = word_ins_tgt.ne(self.pad) + + word_ins_outs.append(word_ins_out) + word_ins_tgts.append(word_ins_tgt) + word_ins_masks.append(word_ins_mask) + + if t < (self.train_step - 1): + # prediction for next iteration + if self.stochastic_approx: + word_ins_prediction = ( + word_ins_out + gumbel_noise(word_ins_out) + ).max(-1)[1] + else: + word_ins_prediction = word_ins_out.max(-1)[1] + + prev_output_tokens = prev_output_tokens.masked_scatter( + word_ins_mask, word_ins_prediction[word_ins_mask] + ) + + if self.dae_ratio > 0: + # we do not perform denoising for the first iteration + corrputed = ( + torch.rand(size=(B,), 
device=prev_output_tokens.device) + < self.dae_ratio + ) + corrputed_tokens = _sequential_poisoning( + tgt_tokens[corrputed], + len(self.tgt_dict), + 0.33, + self.bos, + self.eos, + self.pad, + ) + prev_output_tokens[corrputed] = corrputed_tokens + + # concat everything + word_ins_out = torch.cat(word_ins_outs, 0) + word_ins_tgt = torch.cat(word_ins_tgts, 0) + word_ins_mask = torch.cat(word_ins_masks, 0) + + return { + "word_ins": { + "out": word_ins_out, + "tgt": word_ins_tgt, + "mask": word_ins_mask, + "ls": self.args.label_smoothing, + "nll_loss": True, + }, + "length": { + "out": length_out, + "tgt": length_tgt, + "factor": self.decoder.length_loss_factor, + }, + } + + +@register_model_architecture( + "iterative_nonautoregressive_transformer", "iterative_nonautoregressive_transformer" +) +def inat_base_architecture(args): + args.encoder_embed_path = getattr(args, "encoder_embed_path", None) + args.encoder_embed_dim = getattr(args, "encoder_embed_dim", 512) + args.encoder_ffn_embed_dim = getattr(args, "encoder_ffn_embed_dim", 2048) + args.encoder_layers = getattr(args, "encoder_layers", 6) + args.encoder_attention_heads = getattr(args, "encoder_attention_heads", 8) + args.encoder_normalize_before = getattr(args, "encoder_normalize_before", False) + args.encoder_learned_pos = getattr(args, "encoder_learned_pos", False) + args.decoder_embed_path = getattr(args, "decoder_embed_path", None) + args.decoder_embed_dim = getattr(args, "decoder_embed_dim", args.encoder_embed_dim) + args.decoder_ffn_embed_dim = getattr( + args, "decoder_ffn_embed_dim", args.encoder_ffn_embed_dim + ) + args.decoder_layers = getattr(args, "decoder_layers", 6) + args.decoder_attention_heads = getattr(args, "decoder_attention_heads", 8) + args.decoder_normalize_before = getattr(args, "decoder_normalize_before", False) + args.decoder_learned_pos = getattr(args, "decoder_learned_pos", False) + args.attention_dropout = getattr(args, "attention_dropout", 0.0) + args.activation_dropout = getattr(args, "activation_dropout", 0.0) + args.activation_fn = getattr(args, "activation_fn", "relu") + args.dropout = getattr(args, "dropout", 0.1) + args.adaptive_softmax_cutoff = getattr(args, "adaptive_softmax_cutoff", None) + args.adaptive_softmax_dropout = getattr(args, "adaptive_softmax_dropout", 0) + args.share_decoder_input_output_embed = getattr( + args, "share_decoder_input_output_embed", False + ) + args.share_all_embeddings = getattr(args, "share_all_embeddings", False) + args.no_token_positional_embeddings = getattr( + args, "no_token_positional_embeddings", False + ) + args.adaptive_input = getattr(args, "adaptive_input", False) + args.apply_bert_init = getattr(args, "apply_bert_init", False) + + args.decoder_output_dim = getattr( + args, "decoder_output_dim", args.decoder_embed_dim + ) + args.decoder_input_dim = getattr(args, "decoder_input_dim", args.decoder_embed_dim) + + # --- special arguments --- + args.sg_length_pred = getattr(args, "sg_length_pred", False) + args.pred_length_offset = getattr(args, "pred_length_offset", False) + args.length_loss_factor = getattr(args, "length_loss_factor", 0.1) + args.ngram_predictor = getattr(args, "ngram_predictor", 1) + args.src_embedding_copy = getattr(args, "src_embedding_copy", False) + + args.train_step = getattr(args, "train_step", 4) + args.dae_ratio = getattr(args, "dae_ratio", 0.5) + args.stochastic_approx = getattr(args, "stochastic_approx", False) + + +@register_model_architecture( + "iterative_nonautoregressive_transformer", + 
"iterative_nonautoregressive_transformer_wmt_en_de", +) +def iter_nat_wmt_en_de(args): + inat_base_architecture(args) diff --git a/SpeechT5/fairseq/fairseq/models/nat/levenshtein_transformer.py b/SpeechT5/fairseq/fairseq/models/nat/levenshtein_transformer.py new file mode 100644 index 0000000000000000000000000000000000000000..9377c3c7f5ad6b298eedfb2dc11f1a7a52d1cf26 --- /dev/null +++ b/SpeechT5/fairseq/fairseq/models/nat/levenshtein_transformer.py @@ -0,0 +1,509 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +import torch +import torch.nn as nn +import torch.nn.functional as F +from fairseq.iterative_refinement_generator import DecoderOut +from fairseq.models import register_model, register_model_architecture +from fairseq.models.nat import FairseqNATDecoder, FairseqNATModel, ensemble_decoder +from fairseq.models.transformer import Embedding, TransformerDecoderLayer +from fairseq.modules.transformer_sentence_encoder import init_bert_params + +from .levenshtein_utils import ( + _apply_del_words, + _apply_ins_masks, + _apply_ins_words, + _fill, + _get_del_targets, + _get_ins_targets, + _skip, + _skip_encoder_out, +) + + +@register_model("levenshtein_transformer") +class LevenshteinTransformerModel(FairseqNATModel): + @property + def allow_length_beam(self): + return False + + @staticmethod + def add_args(parser): + FairseqNATModel.add_args(parser) + parser.add_argument( + "--early-exit", + default="6,6,6", + type=str, + help="number of decoder layers before word_del, mask_ins, word_ins", + ) + parser.add_argument( + "--no-share-discriminator", + action="store_true", + help="separate parameters for discriminator", + ) + parser.add_argument( + "--no-share-maskpredictor", + action="store_true", + help="separate parameters for mask-predictor", + ) + parser.add_argument( + "--share-discriminator-maskpredictor", + action="store_true", + help="share the parameters for both mask-predictor and discriminator", + ) + parser.add_argument( + "--sampling-for-deletion", + action="store_true", + help="instead of argmax, use sampling to predict the tokens", + ) + + @classmethod + def build_decoder(cls, args, tgt_dict, embed_tokens): + decoder = LevenshteinTransformerDecoder(args, tgt_dict, embed_tokens) + if getattr(args, "apply_bert_init", False): + decoder.apply(init_bert_params) + return decoder + + def forward( + self, src_tokens, src_lengths, prev_output_tokens, tgt_tokens, **kwargs + ): + + assert tgt_tokens is not None, "forward function only supports training." 
+ + # encoding + encoder_out = self.encoder(src_tokens, src_lengths=src_lengths, **kwargs) + + # generate training labels for insertion + masked_tgt_masks, masked_tgt_tokens, mask_ins_targets = _get_ins_targets( + prev_output_tokens, tgt_tokens, self.pad, self.unk + ) + mask_ins_targets = mask_ins_targets.clamp(min=0, max=255) # for safe prediction + mask_ins_masks = prev_output_tokens[:, 1:].ne(self.pad) + + mask_ins_out, _ = self.decoder.forward_mask_ins( + normalize=False, + prev_output_tokens=prev_output_tokens, + encoder_out=encoder_out, + ) + word_ins_out, _ = self.decoder.forward_word_ins( + normalize=False, + prev_output_tokens=masked_tgt_tokens, + encoder_out=encoder_out, + ) + + # make online prediction + if self.decoder.sampling_for_deletion: + word_predictions = torch.multinomial( + F.softmax(word_ins_out, -1).view(-1, word_ins_out.size(-1)), 1 + ).view(word_ins_out.size(0), -1) + else: + word_predictions = F.log_softmax(word_ins_out, dim=-1).max(2)[1] + + word_predictions.masked_scatter_( + ~masked_tgt_masks, tgt_tokens[~masked_tgt_masks] + ) + + # generate training labels for deletion + word_del_targets = _get_del_targets(word_predictions, tgt_tokens, self.pad) + word_del_out, _ = self.decoder.forward_word_del( + normalize=False, + prev_output_tokens=word_predictions, + encoder_out=encoder_out, + ) + word_del_masks = word_predictions.ne(self.pad) + + return { + "mask_ins": { + "out": mask_ins_out, + "tgt": mask_ins_targets, + "mask": mask_ins_masks, + "ls": 0.01, + }, + "word_ins": { + "out": word_ins_out, + "tgt": tgt_tokens, + "mask": masked_tgt_masks, + "ls": self.args.label_smoothing, + "nll_loss": True, + }, + "word_del": { + "out": word_del_out, + "tgt": word_del_targets, + "mask": word_del_masks, + }, + } + + def forward_decoder( + self, decoder_out, encoder_out, eos_penalty=0.0, max_ratio=None, **kwargs + ): + + output_tokens = decoder_out.output_tokens + output_scores = decoder_out.output_scores + attn = decoder_out.attn + history = decoder_out.history + + bsz = output_tokens.size(0) + if max_ratio is None: + max_lens = torch.zeros_like(output_tokens).fill_(255) + else: + if not encoder_out["encoder_padding_mask"]: + max_src_len = encoder_out["encoder_out"].size(0) + src_lens = encoder_out["encoder_out"].new(bsz).fill_(max_src_len) + else: + src_lens = (~encoder_out["encoder_padding_mask"][0]).sum(1) + max_lens = (src_lens * max_ratio).clamp(min=10).long() + + # delete words + # do not delete tokens if it is <s> </s> + can_del_word = output_tokens.ne(self.pad).sum(1) > 2 + if can_del_word.sum() != 0: # we cannot delete, skip + word_del_score, word_del_attn = self.decoder.forward_word_del( + normalize=True, + prev_output_tokens=_skip(output_tokens, can_del_word), + encoder_out=_skip_encoder_out(self.encoder, encoder_out, can_del_word), + ) + word_del_pred = word_del_score.max(-1)[1].bool() + + _tokens, _scores, _attn = _apply_del_words( + output_tokens[can_del_word], + output_scores[can_del_word], + word_del_attn, + word_del_pred, + self.pad, + self.bos, + self.eos, + ) + output_tokens = _fill(output_tokens, can_del_word, _tokens, self.pad) + output_scores = _fill(output_scores, can_del_word, _scores, 0) + attn = _fill(attn, can_del_word, _attn, 0.0) + + if history is not None: + history.append(output_tokens.clone()) + + # insert placeholders + can_ins_mask = output_tokens.ne(self.pad).sum(1) < max_lens + if can_ins_mask.sum() != 0: + mask_ins_score, _ = self.decoder.forward_mask_ins( + normalize=True, + prev_output_tokens=_skip(output_tokens, can_ins_mask), + 
encoder_out=_skip_encoder_out(self.encoder, encoder_out, can_ins_mask), + ) + if eos_penalty > 0.0: + mask_ins_score[:, :, 0] = mask_ins_score[:, :, 0] - eos_penalty + mask_ins_pred = mask_ins_score.max(-1)[1] + mask_ins_pred = torch.min( + mask_ins_pred, max_lens[can_ins_mask, None].expand_as(mask_ins_pred) + ) + + _tokens, _scores = _apply_ins_masks( + output_tokens[can_ins_mask], + output_scores[can_ins_mask], + mask_ins_pred, + self.pad, + self.unk, + self.eos, + ) + output_tokens = _fill(output_tokens, can_ins_mask, _tokens, self.pad) + output_scores = _fill(output_scores, can_ins_mask, _scores, 0) + + if history is not None: + history.append(output_tokens.clone()) + + # insert words + can_ins_word = output_tokens.eq(self.unk).sum(1) > 0 + if can_ins_word.sum() != 0: + word_ins_score, word_ins_attn = self.decoder.forward_word_ins( + normalize=True, + prev_output_tokens=_skip(output_tokens, can_ins_word), + encoder_out=_skip_encoder_out(self.encoder, encoder_out, can_ins_word), + ) + word_ins_score, word_ins_pred = word_ins_score.max(-1) + _tokens, _scores = _apply_ins_words( + output_tokens[can_ins_word], + output_scores[can_ins_word], + word_ins_pred, + word_ins_score, + self.unk, + ) + + output_tokens = _fill(output_tokens, can_ins_word, _tokens, self.pad) + output_scores = _fill(output_scores, can_ins_word, _scores, 0) + attn = _fill(attn, can_ins_word, word_ins_attn, 0.0) + + if history is not None: + history.append(output_tokens.clone()) + + # delete some unnecessary paddings + cut_off = output_tokens.ne(self.pad).sum(1).max() + output_tokens = output_tokens[:, :cut_off] + output_scores = output_scores[:, :cut_off] + attn = None if attn is None else attn[:, :cut_off, :] + + return decoder_out._replace( + output_tokens=output_tokens, + output_scores=output_scores, + attn=attn, + history=history, + ) + + def initialize_output_tokens(self, encoder_out, src_tokens): + initial_output_tokens = src_tokens.new_zeros(src_tokens.size(0), 2) + initial_output_tokens[:, 0] = self.bos + initial_output_tokens[:, 1] = self.eos + + initial_output_scores = initial_output_tokens.new_zeros( + *initial_output_tokens.size() + ).type_as(encoder_out["encoder_out"][0]) + + return DecoderOut( + output_tokens=initial_output_tokens, + output_scores=initial_output_scores, + attn=None, + step=0, + max_step=0, + history=None, + ) + + +class LevenshteinTransformerDecoder(FairseqNATDecoder): + def __init__(self, args, dictionary, embed_tokens, no_encoder_attn=False): + super().__init__( + args, dictionary, embed_tokens, no_encoder_attn=no_encoder_attn + ) + self.dictionary = dictionary + self.bos = dictionary.bos() + self.unk = dictionary.unk() + self.eos = dictionary.eos() + self.sampling_for_deletion = getattr(args, "sampling_for_deletion", False) + self.embed_mask_ins = Embedding(256, self.output_embed_dim * 2, None) + self.embed_word_del = Embedding(2, self.output_embed_dim, None) + + # del_word, ins_mask, ins_word + self.early_exit = [int(i) for i in args.early_exit.split(",")] + assert len(self.early_exit) == 3 + + # copy layers for mask-predict/deletion + self.layers_msk = None + if getattr(args, "no_share_maskpredictor", False): + self.layers_msk = nn.ModuleList( + [ + TransformerDecoderLayer(args, no_encoder_attn) + for _ in range(self.early_exit[1]) + ] + ) + self.layers_del = None + if getattr(args, "no_share_discriminator", False): + self.layers_del = nn.ModuleList( + [ + TransformerDecoderLayer(args, no_encoder_attn) + for _ in range(self.early_exit[0]) + ] + ) + + if getattr(args, 
"share_discriminator_maskpredictor", False): + assert getattr( + args, "no_share_discriminator", False + ), "must set saperate discriminator" + self.layers_msk = self.layers_del + + def extract_features( + self, + prev_output_tokens, + encoder_out=None, + early_exit=None, + layers=None, + **unused + ): + """ + Similar to *forward* but only return features. + Inputs: + prev_output_tokens: Tensor(B, T) + encoder_out: a dictionary of hidden states and masks + + Returns: + tuple: + - the decoder's features of shape `(batch, tgt_len, embed_dim)` + - a dictionary with any model-specific outputs + the LevenshteinTransformer decoder has full-attention to all generated tokens + """ + # embed positions + positions = ( + self.embed_positions(prev_output_tokens) + if self.embed_positions is not None + else None + ) + + # embed tokens and positions + x = self.embed_scale * self.embed_tokens(prev_output_tokens) + if self.project_in_dim is not None: + x = self.project_in_dim(x) + + if positions is not None: + x += positions + x = self.dropout_module(x) + + # B x T x C -> T x B x C + x = x.transpose(0, 1) + attn = None + inner_states = [x] + + # decoder layers + decoder_padding_mask = prev_output_tokens.eq(self.padding_idx) + layers = self.layers if layers is None else layers + early_exit = len(layers) if early_exit is None else early_exit + for _, layer in enumerate(layers[:early_exit]): + x, attn, _ = layer( + x, + encoder_out["encoder_out"][0] + if (encoder_out is not None and len(encoder_out["encoder_out"]) > 0) + else None, + encoder_out["encoder_padding_mask"][0] + if ( + encoder_out is not None + and len(encoder_out["encoder_padding_mask"]) > 0 + ) + else None, + self_attn_mask=None, + self_attn_padding_mask=decoder_padding_mask, + ) + inner_states.append(x) + + if self.layer_norm: + x = self.layer_norm(x) + + # T x B x C -> B x T x C + x = x.transpose(0, 1) + + if self.project_out_dim is not None: + x = self.project_out_dim(x) + + return x, {"attn": attn, "inner_states": inner_states} + + @ensemble_decoder + def forward_mask_ins(self, normalize, encoder_out, prev_output_tokens, **unused): + features, extra = self.extract_features( + prev_output_tokens, + encoder_out=encoder_out, + early_exit=self.early_exit[1], + layers=self.layers_msk, + **unused + ) + features_cat = torch.cat([features[:, :-1, :], features[:, 1:, :]], 2) + decoder_out = F.linear(features_cat, self.embed_mask_ins.weight) + if normalize: + return F.log_softmax(decoder_out, -1), extra["attn"] + return decoder_out, extra["attn"] + + @ensemble_decoder + def forward_word_ins(self, normalize, encoder_out, prev_output_tokens, **unused): + features, extra = self.extract_features( + prev_output_tokens, + encoder_out=encoder_out, + early_exit=self.early_exit[2], + layers=self.layers, + **unused + ) + decoder_out = self.output_layer(features) + if normalize: + return F.log_softmax(decoder_out, -1), extra["attn"] + return decoder_out, extra["attn"] + + @ensemble_decoder + def forward_word_del(self, normalize, encoder_out, prev_output_tokens, **unused): + features, extra = self.extract_features( + prev_output_tokens, + encoder_out=encoder_out, + early_exit=self.early_exit[0], + layers=self.layers_del, + **unused + ) + decoder_out = F.linear(features, self.embed_word_del.weight) + if normalize: + return F.log_softmax(decoder_out, -1), extra["attn"] + return decoder_out, extra["attn"] + + +@register_model_architecture("levenshtein_transformer", "levenshtein_transformer") +def levenshtein_base_architecture(args): + args.encoder_embed_path = 
getattr(args, "encoder_embed_path", None) + args.encoder_embed_dim = getattr(args, "encoder_embed_dim", 512) + args.encoder_ffn_embed_dim = getattr(args, "encoder_ffn_embed_dim", 2048) + args.encoder_layers = getattr(args, "encoder_layers", 6) + args.encoder_attention_heads = getattr(args, "encoder_attention_heads", 8) + args.encoder_normalize_before = getattr(args, "encoder_normalize_before", False) + args.encoder_learned_pos = getattr(args, "encoder_learned_pos", False) + args.decoder_embed_path = getattr(args, "decoder_embed_path", None) + args.decoder_embed_dim = getattr(args, "decoder_embed_dim", args.encoder_embed_dim) + args.decoder_ffn_embed_dim = getattr( + args, "decoder_ffn_embed_dim", args.encoder_ffn_embed_dim + ) + args.decoder_layers = getattr(args, "decoder_layers", 6) + args.decoder_attention_heads = getattr(args, "decoder_attention_heads", 8) + args.decoder_normalize_before = getattr(args, "decoder_normalize_before", False) + args.decoder_learned_pos = getattr(args, "decoder_learned_pos", False) + args.attention_dropout = getattr(args, "attention_dropout", 0.0) + args.activation_dropout = getattr(args, "activation_dropout", 0.0) + args.activation_fn = getattr(args, "activation_fn", "relu") + args.dropout = getattr(args, "dropout", 0.1) + args.adaptive_softmax_cutoff = getattr(args, "adaptive_softmax_cutoff", None) + args.adaptive_softmax_dropout = getattr(args, "adaptive_softmax_dropout", 0) + args.share_decoder_input_output_embed = getattr( + args, "share_decoder_input_output_embed", False + ) + args.share_all_embeddings = getattr(args, "share_all_embeddings", False) + args.no_token_positional_embeddings = getattr( + args, "no_token_positional_embeddings", False + ) + args.adaptive_input = getattr(args, "adaptive_input", False) + args.apply_bert_init = getattr(args, "apply_bert_init", False) + + args.decoder_output_dim = getattr( + args, "decoder_output_dim", args.decoder_embed_dim + ) + args.sampling_for_deletion = getattr(args, "sampling_for_deletion", False) + args.decoder_input_dim = getattr(args, "decoder_input_dim", args.decoder_embed_dim) + args.early_exit = getattr(args, "early_exit", "6,6,6") + args.no_share_discriminator = getattr(args, "no_share_discriminator", False) + args.no_share_maskpredictor = getattr(args, "no_share_maskpredictor", False) + args.share_discriminator_maskpredictor = getattr( + args, "share_discriminator_maskpredictor", False + ) + args.no_share_last_layer = getattr(args, "no_share_last_layer", False) + + +@register_model_architecture( + "levenshtein_transformer", "levenshtein_transformer_wmt_en_de" +) +def levenshtein_transformer_wmt_en_de(args): + levenshtein_base_architecture(args) + + +# similar parameters used in the "Attention Is All You Need" paper (Vaswani et al., 2017) +@register_model_architecture( + "levenshtein_transformer", "levenshtein_transformer_vaswani_wmt_en_de_big" +) +def levenshtein_transformer_vaswani_wmt_en_de_big(args): + args.encoder_embed_dim = getattr(args, "encoder_embed_dim", 1024) + args.encoder_ffn_embed_dim = getattr(args, "encoder_ffn_embed_dim", 4096) + args.encoder_attention_heads = getattr(args, "encoder_attention_heads", 16) + args.encoder_normalize_before = getattr(args, "encoder_normalize_before", False) + args.decoder_embed_dim = getattr(args, "decoder_embed_dim", 1024) + args.decoder_ffn_embed_dim = getattr(args, "decoder_ffn_embed_dim", 4096) + args.decoder_attention_heads = getattr(args, "decoder_attention_heads", 16) + args.dropout = getattr(args, "dropout", 0.3) + 
levenshtein_base_architecture(args) + + +# default parameters used in tensor2tensor implementation +@register_model_architecture( + "levenshtein_transformer", "levenshtein_transformer_wmt_en_de_big" +) +def levenshtein_transformer_wmt_en_de_big_t2t(args): + args.encoder_normalize_before = getattr(args, "encoder_normalize_before", True) + args.decoder_normalize_before = getattr(args, "decoder_normalize_before", True) + args.attention_dropout = getattr(args, "attention_dropout", 0.1) + args.activation_dropout = getattr(args, "activation_dropout", 0.1) + levenshtein_transformer_vaswani_wmt_en_de_big(args) diff --git a/SpeechT5/fairseq/fairseq/models/nat/levenshtein_utils.py b/SpeechT5/fairseq/fairseq/models/nat/levenshtein_utils.py new file mode 100644 index 0000000000000000000000000000000000000000..375a98c2e11354de085f0a7926f407bd1a6a2ad4 --- /dev/null +++ b/SpeechT5/fairseq/fairseq/models/nat/levenshtein_utils.py @@ -0,0 +1,293 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +import torch +from fairseq.utils import new_arange + + +# -------------- Helper Functions --------------------------------------------------- # + + +def load_libnat(): + try: + from fairseq import libnat_cuda + + return libnat_cuda, True + + except ImportError as e: + print(str(e) + "... fall back to CPU version") + + try: + from fairseq import libnat + + return libnat, False + + except ImportError as e: + import sys + + sys.stderr.write( + "ERROR: missing libnat_cuda. run `python setup.py build_ext --inplace`\n" + ) + raise e + + +def _get_ins_targets(in_tokens, out_tokens, padding_idx, unk_idx): + libnat, use_cuda = load_libnat() + + def _get_ins_targets_cuda(in_tokens, out_tokens, padding_idx, unk_idx): + in_masks = in_tokens.ne(padding_idx) + out_masks = out_tokens.ne(padding_idx) + mask_ins_targets, masked_tgt_masks = libnat.generate_insertion_labels( + out_tokens.int(), + libnat.levenshtein_distance( + in_tokens.int(), + out_tokens.int(), + in_masks.sum(1).int(), + out_masks.sum(1).int(), + ), + ) + masked_tgt_masks = masked_tgt_masks.bool() & out_masks + mask_ins_targets = mask_ins_targets.type_as(in_tokens)[ + :, 1 : in_masks.size(1) + ].masked_fill_(~in_masks[:, 1:], 0) + masked_tgt_tokens = out_tokens.masked_fill(masked_tgt_masks, unk_idx) + return masked_tgt_masks, masked_tgt_tokens, mask_ins_targets + + def _get_ins_targets_cpu(in_tokens, out_tokens, padding_idx, unk_idx): + in_seq_len, out_seq_len = in_tokens.size(1), out_tokens.size(1) + + in_tokens_list = [ + [t for t in s if t != padding_idx] for i, s in enumerate(in_tokens.tolist()) + ] + out_tokens_list = [ + [t for t in s if t != padding_idx] + for i, s in enumerate(out_tokens.tolist()) + ] + + full_labels = libnat.suggested_ed2_path( + in_tokens_list, out_tokens_list, padding_idx + ) + mask_inputs = [ + [len(c) if c[0] != padding_idx else 0 for c in a[:-1]] for a in full_labels + ] + + # generate labels + masked_tgt_masks = [] + for mask_input in mask_inputs: + mask_label = [] + for beam_size in mask_input[1:-1]: # HACK 1:-1 + mask_label += [0] + [1 for _ in range(beam_size)] + masked_tgt_masks.append( + mask_label + [0 for _ in range(out_seq_len - len(mask_label))] + ) + mask_ins_targets = [ + mask_input[1:-1] + + [0 for _ in range(in_seq_len - 1 - len(mask_input[1:-1]))] + for mask_input in mask_inputs + ] + + # transform to tensor + masked_tgt_masks = torch.tensor( + masked_tgt_masks, device=out_tokens.device + 
).bool() + mask_ins_targets = torch.tensor(mask_ins_targets, device=in_tokens.device) + masked_tgt_tokens = out_tokens.masked_fill(masked_tgt_masks, unk_idx) + return masked_tgt_masks, masked_tgt_tokens, mask_ins_targets + + if use_cuda: + return _get_ins_targets_cuda(in_tokens, out_tokens, padding_idx, unk_idx) + return _get_ins_targets_cpu(in_tokens, out_tokens, padding_idx, unk_idx) + + +def _get_del_targets(in_tokens, out_tokens, padding_idx): + libnat, use_cuda = load_libnat() + + def _get_del_targets_cuda(in_tokens, out_tokens, padding_idx): + in_masks = in_tokens.ne(padding_idx) + out_masks = out_tokens.ne(padding_idx) + + word_del_targets = libnat.generate_deletion_labels( + in_tokens.int(), + libnat.levenshtein_distance( + in_tokens.int(), + out_tokens.int(), + in_masks.sum(1).int(), + out_masks.sum(1).int(), + ), + ) + word_del_targets = word_del_targets.type_as(in_tokens).masked_fill_( + ~in_masks, 0 + ) + return word_del_targets + + def _get_del_targets_cpu(in_tokens, out_tokens, padding_idx): + out_seq_len = out_tokens.size(1) + with torch.cuda.device_of(in_tokens): + in_tokens_list = [ + [t for t in s if t != padding_idx] + for i, s in enumerate(in_tokens.tolist()) + ] + out_tokens_list = [ + [t for t in s if t != padding_idx] + for i, s in enumerate(out_tokens.tolist()) + ] + + full_labels = libnat.suggested_ed2_path( + in_tokens_list, out_tokens_list, padding_idx + ) + word_del_targets = [b[-1] for b in full_labels] + word_del_targets = [ + labels + [0 for _ in range(out_seq_len - len(labels))] + for labels in word_del_targets + ] + + # transform to tensor + word_del_targets = torch.tensor(word_del_targets, device=out_tokens.device) + return word_del_targets + + if use_cuda: + return _get_del_targets_cuda(in_tokens, out_tokens, padding_idx) + return _get_del_targets_cpu(in_tokens, out_tokens, padding_idx) + + +def _apply_ins_masks( + in_tokens, in_scores, mask_ins_pred, padding_idx, unk_idx, eos_idx +): + + in_masks = in_tokens.ne(padding_idx) + in_lengths = in_masks.sum(1) + + # HACK: hacky way to shift all the paddings to eos first. 
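+ # mask_ins_pred[b, i] is the number of <unk> placeholders to insert after
+ # input position i; cumsum(mask_ins_pred + 1) over non-pad positions gives
+ # the new index of each surviving input token, and the gaps in between stay
+ # <unk> until the word-insertion head fills them.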
+ in_tokens.masked_fill_(~in_masks, eos_idx) + mask_ins_pred.masked_fill_(~in_masks[:, 1:], 0) + + out_lengths = in_lengths + mask_ins_pred.sum(1) + out_max_len = out_lengths.max() + out_masks = new_arange(out_lengths, out_max_len)[None, :] < out_lengths[:, None] + + reordering = (mask_ins_pred + in_masks[:, 1:].long()).cumsum(1) + out_tokens = ( + in_tokens.new_zeros(in_tokens.size(0), out_max_len) + .fill_(padding_idx) + .masked_fill_(out_masks, unk_idx) + ) + out_tokens[:, 0] = in_tokens[:, 0] + out_tokens.scatter_(1, reordering, in_tokens[:, 1:]) + + out_scores = None + if in_scores is not None: + in_scores.masked_fill_(~in_masks, 0) + out_scores = in_scores.new_zeros(*out_tokens.size()) + out_scores[:, 0] = in_scores[:, 0] + out_scores.scatter_(1, reordering, in_scores[:, 1:]) + + return out_tokens, out_scores + + +def _apply_ins_words(in_tokens, in_scores, word_ins_pred, word_ins_scores, unk_idx): + word_ins_masks = in_tokens.eq(unk_idx) + out_tokens = in_tokens.masked_scatter(word_ins_masks, word_ins_pred[word_ins_masks]) + + if in_scores is not None: + out_scores = in_scores.masked_scatter( + word_ins_masks, word_ins_scores[word_ins_masks] + ) + else: + out_scores = None + + return out_tokens, out_scores + + +def _apply_del_words( + in_tokens, in_scores, in_attn, word_del_pred, padding_idx, bos_idx, eos_idx +): + # apply deletion to a tensor + in_masks = in_tokens.ne(padding_idx) + bos_eos_masks = in_tokens.eq(bos_idx) | in_tokens.eq(eos_idx) + + max_len = in_tokens.size(1) + word_del_pred.masked_fill_(~in_masks, 1) + word_del_pred.masked_fill_(bos_eos_masks, 0) + + reordering = new_arange(in_tokens).masked_fill_(word_del_pred, max_len).sort(1)[1] + + out_tokens = in_tokens.masked_fill(word_del_pred, padding_idx).gather(1, reordering) + + out_scores = None + if in_scores is not None: + out_scores = in_scores.masked_fill(word_del_pred, 0).gather(1, reordering) + + out_attn = None + if in_attn is not None: + _mask = word_del_pred[:, :, None].expand_as(in_attn) + _reordering = reordering[:, :, None].expand_as(in_attn) + out_attn = in_attn.masked_fill(_mask, 0.0).gather(1, _reordering) + + return out_tokens, out_scores, out_attn + + +def _skip(x, mask): + """ + Getting sliced (dim=0) tensor by mask. Supporting tensor and list/dict of tensors. + """ + if isinstance(x, int): + return x + + if x is None: + return None + + if isinstance(x, torch.Tensor): + if x.size(0) == mask.size(0): + return x[mask] + elif x.size(1) == mask.size(0): + return x[:, mask] + + if isinstance(x, list): + return [_skip(x_i, mask) for x_i in x] + + if isinstance(x, dict): + return {k: _skip(v, mask) for k, v in x.items()} + + raise NotImplementedError + + +def _skip_encoder_out(encoder, encoder_out, mask): + if not mask.any(): + return encoder_out + else: + return encoder.reorder_encoder_out( + encoder_out, mask.nonzero(as_tuple=False).squeeze() + ) + + +def _fill(x, mask, y, padding_idx): + """ + Filling tensor x with y at masked positions (dim=0). 
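+ If y is wider than x along dim=1, x is first right-padded with padding_idx;
+ if y is narrower, the selected rows of x are reset to padding_idx and only
+ their first y.size(1) columns are overwritten with y.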
+ """ + if x is None: + return y + assert x.dim() == y.dim() and mask.size(0) == x.size(0) + assert x.dim() == 2 or (x.dim() == 3 and x.size(2) == y.size(2)) + n_selected = mask.sum() + assert n_selected == y.size(0) + + if n_selected == x.size(0): + return y + + if x.size(1) < y.size(1): + dims = [x.size(0), y.size(1) - x.size(1)] + if x.dim() == 3: + dims.append(x.size(2)) + x = torch.cat([x, x.new_zeros(*dims).fill_(padding_idx)], 1) + x[mask] = y + elif x.size(1) > y.size(1): + x[mask] = padding_idx + if x.dim() == 2: + x[mask, : y.size(1)] = y + else: + x[mask, : y.size(1), :] = y + else: + x[mask] = y + return x diff --git a/SpeechT5/fairseq/fairseq/models/nat/nat_crf_transformer.py b/SpeechT5/fairseq/fairseq/models/nat/nat_crf_transformer.py new file mode 100644 index 0000000000000000000000000000000000000000..d4b3cd931ceb077eb30db73df1d5d6cd714a86c2 --- /dev/null +++ b/SpeechT5/fairseq/fairseq/models/nat/nat_crf_transformer.py @@ -0,0 +1,121 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + + +from fairseq.models import register_model, register_model_architecture +from fairseq.models.nat import NATransformerModel, base_architecture +from fairseq.modules import DynamicCRF + + +@register_model("nacrf_transformer") +class NACRFTransformerModel(NATransformerModel): + def __init__(self, args, encoder, decoder): + super().__init__(args, encoder, decoder) + self.crf_layer = DynamicCRF( + num_embedding=len(self.tgt_dict), + low_rank=args.crf_lowrank_approx, + beam_size=args.crf_beam_approx, + ) + + @property + def allow_ensemble(self): + return False + + @staticmethod + def add_args(parser): + NATransformerModel.add_args(parser) + parser.add_argument( + "--crf-lowrank-approx", + type=int, + help="the dimension of low-rank approximation of transition", + ) + parser.add_argument( + "--crf-beam-approx", + type=int, + help="the beam size for apporixmating the normalizing factor", + ) + parser.add_argument( + "--word-ins-loss-factor", + type=float, + help="weights on NAT loss used to co-training with CRF loss.", + ) + + def forward( + self, src_tokens, src_lengths, prev_output_tokens, tgt_tokens, **kwargs + ): + # encoding + encoder_out = self.encoder(src_tokens, src_lengths=src_lengths, **kwargs) + + # length prediction + length_out = self.decoder.forward_length( + normalize=False, encoder_out=encoder_out + ) + length_tgt = self.decoder.forward_length_prediction( + length_out, encoder_out, tgt_tokens + ) + + # decoding + word_ins_out = self.decoder( + normalize=False, + prev_output_tokens=prev_output_tokens, + encoder_out=encoder_out, + ) + word_ins_tgt, word_ins_mask = tgt_tokens, tgt_tokens.ne(self.pad) + + # compute the log-likelihood of CRF + crf_nll = -self.crf_layer(word_ins_out, word_ins_tgt, word_ins_mask) + crf_nll = (crf_nll / word_ins_mask.type_as(crf_nll).sum(-1)).mean() + + return { + "word_ins": { + "out": word_ins_out, + "tgt": word_ins_tgt, + "mask": word_ins_mask, + "ls": self.args.label_smoothing, + "nll_loss": True, + "factor": self.args.word_ins_loss_factor, + }, + "word_crf": {"loss": crf_nll}, + "length": { + "out": length_out, + "tgt": length_tgt, + "factor": self.decoder.length_loss_factor, + }, + } + + def forward_decoder(self, decoder_out, encoder_out, decoding_format=None, **kwargs): + output_tokens = decoder_out.output_tokens + output_scores = decoder_out.output_scores + history = decoder_out.history + + # execute the decoder and get 
emission scores + output_masks = output_tokens.ne(self.pad) + word_ins_out = self.decoder( + normalize=False, prev_output_tokens=output_tokens, encoder_out=encoder_out + ) + + # run viterbi decoding through CRF + _scores, _tokens = self.crf_layer.forward_decoder(word_ins_out, output_masks) + output_tokens.masked_scatter_(output_masks, _tokens[output_masks]) + output_scores.masked_scatter_(output_masks, _scores[output_masks]) + if history is not None: + history.append(output_tokens.clone()) + + return decoder_out._replace( + output_tokens=output_tokens, + output_scores=output_scores, + attn=None, + history=history, + ) + + +@register_model_architecture("nacrf_transformer", "nacrf_transformer") +def nacrf_base_architecture(args): + args.crf_lowrank_approx = getattr(args, "crf_lowrank_approx", 32) + args.crf_beam_approx = getattr(args, "crf_beam_approx", 64) + args.word_ins_loss_factor = getattr(args, "word_ins_loss_factor", 0.5) + args.encoder_normalize_before = getattr(args, "encoder_normalize_before", True) + args.decoder_normalize_before = getattr(args, "decoder_normalize_before", True) + base_architecture(args) diff --git a/SpeechT5/fairseq/fairseq/models/nat/nonautoregressive_ensembles.py b/SpeechT5/fairseq/fairseq/models/nat/nonautoregressive_ensembles.py new file mode 100644 index 0000000000000000000000000000000000000000..705a04fb49658c91114a26efd411b4653c65b943 --- /dev/null +++ b/SpeechT5/fairseq/fairseq/models/nat/nonautoregressive_ensembles.py @@ -0,0 +1,253 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +import math + +import torch +import torch.nn.functional as F +from fairseq.models.nat import ( + _apply_del_words, + _apply_ins_masks, + _apply_ins_words, + _fill, + _skip, + _skip_encoder_out, +) + + +class _EnsembleModelEncoder(object): + def __init__(self, models): + self.models = models + + def reorder_encoder_out(self, encoder_outs, new_order): + encoder_outs = [ + model.encoder.reorder_encoder_out(encoder_out, new_order) + for model, encoder_out in zip(self.models, encoder_outs) + ] + return encoder_outs + + +class BasicEnsembleModel(torch.nn.Module): + """A wrapper around an ensemble of models.""" + + def __init__(self, models): + super().__init__() + self.models = torch.nn.ModuleList(models) + self.bos = self.models[0].decoder.dictionary.bos() + self.eos = self.models[0].decoder.dictionary.eos() + self.pad = self.models[0].decoder.dictionary.pad() + self.unk = self.models[0].decoder.dictionary.unk() + self.encoder = _EnsembleModelEncoder(self.models) + + def has_encoder(self): + return hasattr(self.models[0], "encoder") + + def max_decoder_positions(self): + return min(m.max_decoder_positions() for m in self.models) + + @torch.no_grad() + def forward_encoder(self, encoder_input): + if not self.has_encoder(): + return None + return [model.forward_encoder(encoder_input) for model in self.models] + + @torch.no_grad() + def forward_decoder(self, *inputs): + raise NotImplementedError + + def initialize_output_tokens(self, *inputs): + raise NotImplementedError + + +class EnsembleLevT(BasicEnsembleModel): + """A wrapper around an ensemble of models.""" + + def __init__(self, models): + super().__init__(models) + + @torch.no_grad() + def forward_decoder( + self, decoder_out, encoder_outs, eos_penalty=0.0, max_ratio=None, **kwargs + ): + # LevT ensembling + # A pipeline of three steps: deletion, placeholder, and word insertion. 
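+        # Each of the three steps pools the models' predictions by averaging
+        # their probabilities, computed in log space below as
+        #   logsumexp(stack(log_probs), dim=0) - log(num_models)
+        # so the ensemble agrees on one edit operation before moving on.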
+ # We need to average scores in each step in a pipeline way because of dependence. + # deletion + output_tokens = decoder_out.output_tokens + output_scores = decoder_out.output_scores + attn = decoder_out.attn + + bsz = output_tokens.size(0) + if max_ratio is None: + max_lens = output_tokens.new().fill_(255) + else: + if not encoder_outs[0]["encoder_padding_mask"]: + src_lens = ( + encoder_outs[0]["encoder_out"][0].new(bsz) + .fill_(encoder_outs[0]["encoder_out"][0].size(1)) + ) + else: + src_lens = (~encoder_outs[0]["encoder_padding_mask"][0]).sum(1) + max_lens = (src_lens * max_ratio).clamp(min=10).long() + + # delete words + # do not delete tokens if it is <s> </s> + can_del_word = output_tokens.ne(self.pad).sum(1) > 2 + if can_del_word.sum() != 0: # we cannot delete, skip + output_tokens, output_scores, attn = self.forward_word_del( + encoder_outs, + output_tokens, + output_scores, + attn, + can_del_word, + ) + + # insert placeholders + can_ins_mask = output_tokens.ne(self.pad).sum(1) < max_lens + if can_ins_mask.sum() != 0: + output_tokens, output_scores = self.forward_mask_ins( + encoder_outs, + output_tokens, + output_scores, + can_ins_mask, + eos_penalty, + max_lens, + ) + + # insert words + can_ins_word = output_tokens.eq(self.unk).sum(1) > 0 + if can_ins_word.sum() != 0: + output_tokens, output_scores, attn = self.forward_word_ins( + encoder_outs, + output_tokens, + output_scores, + attn, + can_ins_word, + ) + + # delete some unnecessary paddings + cut_off = output_tokens.ne(self.pad).sum(1).max() + output_tokens = output_tokens[:, :cut_off] + output_scores = output_scores[:, :cut_off] + attn = None if attn is None else attn[:, :cut_off, :] + return decoder_out._replace( + output_tokens=output_tokens, + output_scores=output_scores, + attn=attn, + history=None, + ) + + def forward_word_del( + self, encoder_outs, output_tokens, output_scores, attn, can_del_word + ): + word_del_score_avg = [] + word_del_attn_avg = [] + for model, encoder_out in zip(self.models, encoder_outs): + word_del_out, word_del_attn = model.decoder.forward_word_del( + _skip(output_tokens, can_del_word), + _skip_encoder_out(model.encoder, encoder_out, can_del_word), + ) + word_del_score = F.log_softmax(word_del_out, 2) + word_del_score_avg.append(word_del_score) + word_del_attn_avg.append(word_del_attn) + word_del_score_avg = torch.logsumexp( + torch.stack(word_del_score_avg, dim=0), dim=0 + ) - math.log(len(self.models)) + word_del_pred = word_del_score_avg.max(-1)[1].bool() + if word_del_attn_avg[0] is not None: + word_del_attn_avg = torch.stack(word_del_attn_avg, dim=0) / len(self.models) + else: + word_del_attn_avg = None + + _tokens, _scores, _attn = _apply_del_words( + output_tokens[can_del_word], + output_scores[can_del_word], + word_del_attn_avg, + word_del_pred, + self.pad, + self.bos, + self.eos, + ) + output_tokens = _fill(output_tokens, can_del_word, _tokens, self.pad) + output_scores = _fill(output_scores, can_del_word, _scores, 0) + attn = _fill(attn, can_del_word, _attn, 0.0) + return output_tokens, output_scores, attn + + def forward_mask_ins( + self, + encoder_outs, + output_tokens, + output_scores, + can_ins_mask, + eos_penalty, + max_lens, + ): + mask_ins_score_avg = [] + for model, encoder_out in zip(self.models, encoder_outs): + mask_ins_out, _ = model.decoder.forward_mask_ins( + _skip(output_tokens, can_ins_mask), + _skip_encoder_out(model.encoder, encoder_out, can_ins_mask), + ) + mask_ins_score = F.log_softmax(mask_ins_out, 2) + if eos_penalty > 0.0: + mask_ins_score[:, :, 0] -= 
eos_penalty + mask_ins_score_avg.append(mask_ins_score) + mask_ins_score_avg = torch.logsumexp( + torch.stack(mask_ins_score_avg, dim=0), dim=0 + ) - math.log(len(self.models)) + mask_ins_pred = mask_ins_score_avg.max(-1)[1] + mask_ins_pred = torch.min( + mask_ins_pred, max_lens[can_ins_mask, None].expand_as(mask_ins_pred) + ) + _tokens, _scores = _apply_ins_masks( + output_tokens[can_ins_mask], + output_scores[can_ins_mask], + mask_ins_pred, + self.pad, + self.unk, + self.eos, + ) + output_tokens = _fill(output_tokens, can_ins_mask, _tokens, self.pad) + output_scores = _fill(output_scores, can_ins_mask, _scores, 0) + return output_tokens, output_scores + + def forward_word_ins( + self, encoder_outs, output_tokens, output_scores, attn, can_ins_word + ): + word_ins_score_avg = [] + word_ins_attn_avg = [] + for model, encoder_out in zip(self.models, encoder_outs): + word_ins_out, word_ins_attn = model.decoder.forward_word_ins( + _skip(output_tokens, can_ins_word), + _skip_encoder_out(model.encoder, encoder_out, can_ins_word), + ) + word_ins_score = F.log_softmax(word_ins_out, 2) + word_ins_score_avg.append(word_ins_score) + word_ins_attn_avg.append(word_ins_attn) + word_ins_score_avg = torch.logsumexp( + torch.stack(word_ins_score_avg, dim=0), dim=0 + ) - math.log(len(self.models)) + if word_ins_attn_avg[0] is not None: + word_ins_attn_avg = torch.stack(word_ins_attn_avg, dim=0) / len(self.models) + else: + word_ins_attn_avg = None + word_ins_score_max, word_ins_pred = word_ins_score_avg.max(-1) + + _tokens, _scores = _apply_ins_words( + output_tokens[can_ins_word], + output_scores[can_ins_word], + word_ins_pred, + word_ins_score_max, + self.unk, + ) + + output_tokens = _fill(output_tokens, can_ins_word, _tokens, self.pad) + output_scores = _fill(output_scores, can_ins_word, _scores, 0) + attn = _fill(attn, can_ins_word, word_ins_attn, 0.0) + return output_tokens, output_scores, attn + + def initialize_output_tokens(self, encoder_outs, src_tokens): + # LevT doesn't do length prediction. + return self.models[0].initialize_output_tokens(encoder_outs[0], src_tokens) diff --git a/SpeechT5/fairseq/fairseq/models/nat/nonautoregressive_transformer.py b/SpeechT5/fairseq/fairseq/models/nat/nonautoregressive_transformer.py new file mode 100644 index 0000000000000000000000000000000000000000..d114202d25fbd1dca66c7abebb0b0a8bffbe094d --- /dev/null +++ b/SpeechT5/fairseq/fairseq/models/nat/nonautoregressive_transformer.py @@ -0,0 +1,456 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. 
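+
+# Vanilla non-autoregressive Transformer (NAT): the decoder predicts the target
+# length from mean-pooled encoder states, fills the output with that many <unk>
+# placeholders (between <s> and </s>), and then predicts every target token in
+# a single parallel pass, optionally copying source embeddings as its input.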
+ +import torch +import torch.nn.functional as F +from fairseq import utils +from fairseq.iterative_refinement_generator import DecoderOut +from fairseq.models import register_model, register_model_architecture +from fairseq.models.nat import FairseqNATDecoder, FairseqNATModel, ensemble_decoder +from fairseq.models.transformer import Embedding +from fairseq.modules.transformer_sentence_encoder import init_bert_params + + +def _mean_pooling(enc_feats, src_masks): + # enc_feats: T x B x C + # src_masks: B x T or None + if src_masks is None: + enc_feats = enc_feats.mean(0) + else: + src_masks = (~src_masks).transpose(0, 1).type_as(enc_feats) + enc_feats = ( + (enc_feats / src_masks.sum(0)[None, :, None]) * src_masks[:, :, None] + ).sum(0) + return enc_feats + + +def _argmax(x, dim): + return (x == x.max(dim, keepdim=True)[0]).type_as(x) + + +def _uniform_assignment(src_lens, trg_lens): + max_trg_len = trg_lens.max() + steps = (src_lens.float() - 1) / (trg_lens.float() - 1) # step-size + # max_trg_len + index_t = utils.new_arange(trg_lens, max_trg_len).float() + index_t = steps[:, None] * index_t[None, :] # batch_size X max_trg_len + index_t = torch.round(index_t).long().detach() + return index_t + + +@register_model("nonautoregressive_transformer") +class NATransformerModel(FairseqNATModel): + @property + def allow_length_beam(self): + return True + + @staticmethod + def add_args(parser): + FairseqNATModel.add_args(parser) + + # length prediction + parser.add_argument( + "--src-embedding-copy", + action="store_true", + help="copy encoder word embeddings as the initial input of the decoder", + ) + parser.add_argument( + "--pred-length-offset", + action="store_true", + help="predicting the length difference between the target and source sentences", + ) + parser.add_argument( + "--sg-length-pred", + action="store_true", + help="stop the gradients back-propagated from the length predictor", + ) + parser.add_argument( + "--length-loss-factor", + type=float, + help="weights on the length prediction loss", + ) + + @classmethod + def build_decoder(cls, args, tgt_dict, embed_tokens): + decoder = NATransformerDecoder(args, tgt_dict, embed_tokens) + if getattr(args, "apply_bert_init", False): + decoder.apply(init_bert_params) + return decoder + + def forward( + self, src_tokens, src_lengths, prev_output_tokens, tgt_tokens, **kwargs + ): + # encoding + encoder_out = self.encoder(src_tokens, src_lengths=src_lengths, **kwargs) + + # length prediction + length_out = self.decoder.forward_length( + normalize=False, encoder_out=encoder_out + ) + length_tgt = self.decoder.forward_length_prediction( + length_out, encoder_out, tgt_tokens + ) + + # decoding + word_ins_out = self.decoder( + normalize=False, + prev_output_tokens=prev_output_tokens, + encoder_out=encoder_out, + ) + + return { + "word_ins": { + "out": word_ins_out, + "tgt": tgt_tokens, + "mask": tgt_tokens.ne(self.pad), + "ls": self.args.label_smoothing, + "nll_loss": True, + }, + "length": { + "out": length_out, + "tgt": length_tgt, + "factor": self.decoder.length_loss_factor, + }, + } + + def forward_decoder(self, decoder_out, encoder_out, decoding_format=None, **kwargs): + step = decoder_out.step + output_tokens = decoder_out.output_tokens + output_scores = decoder_out.output_scores + history = decoder_out.history + + # execute the decoder + output_masks = output_tokens.ne(self.pad) + _scores, _tokens = self.decoder( + normalize=True, + prev_output_tokens=output_tokens, + encoder_out=encoder_out, + step=step, + ).max(-1) + + 
output_tokens.masked_scatter_(output_masks, _tokens[output_masks]) + output_scores.masked_scatter_(output_masks, _scores[output_masks]) + if history is not None: + history.append(output_tokens.clone()) + + return decoder_out._replace( + output_tokens=output_tokens, + output_scores=output_scores, + attn=None, + history=history, + ) + + def initialize_output_tokens(self, encoder_out, src_tokens): + # length prediction + length_tgt = self.decoder.forward_length_prediction( + self.decoder.forward_length(normalize=True, encoder_out=encoder_out), + encoder_out=encoder_out, + ) + + max_length = length_tgt.clamp_(min=2).max() + idx_length = utils.new_arange(src_tokens, max_length) + + initial_output_tokens = src_tokens.new_zeros( + src_tokens.size(0), max_length + ).fill_(self.pad) + initial_output_tokens.masked_fill_( + idx_length[None, :] < length_tgt[:, None], self.unk + ) + initial_output_tokens[:, 0] = self.bos + initial_output_tokens.scatter_(1, length_tgt[:, None] - 1, self.eos) + + initial_output_scores = initial_output_tokens.new_zeros( + *initial_output_tokens.size() + ).type_as(encoder_out["encoder_out"][0]) + + return DecoderOut( + output_tokens=initial_output_tokens, + output_scores=initial_output_scores, + attn=None, + step=0, + max_step=0, + history=None, + ) + + def regenerate_length_beam(self, decoder_out, beam_size): + output_tokens = decoder_out.output_tokens + length_tgt = output_tokens.ne(self.pad).sum(1) + length_tgt = ( + length_tgt[:, None] + + utils.new_arange(length_tgt, 1, beam_size) + - beam_size // 2 + ) + length_tgt = length_tgt.view(-1).clamp_(min=2) + max_length = length_tgt.max() + idx_length = utils.new_arange(length_tgt, max_length) + + initial_output_tokens = output_tokens.new_zeros( + length_tgt.size(0), max_length + ).fill_(self.pad) + initial_output_tokens.masked_fill_( + idx_length[None, :] < length_tgt[:, None], self.unk + ) + initial_output_tokens[:, 0] = self.bos + initial_output_tokens.scatter_(1, length_tgt[:, None] - 1, self.eos) + + initial_output_scores = initial_output_tokens.new_zeros( + *initial_output_tokens.size() + ).type_as(decoder_out.output_scores) + + return decoder_out._replace( + output_tokens=initial_output_tokens, output_scores=initial_output_scores + ) + + +class NATransformerDecoder(FairseqNATDecoder): + def __init__(self, args, dictionary, embed_tokens, no_encoder_attn=False): + super().__init__( + args, dictionary, embed_tokens, no_encoder_attn=no_encoder_attn + ) + self.dictionary = dictionary + self.bos = dictionary.bos() + self.unk = dictionary.unk() + self.eos = dictionary.eos() + + self.encoder_embed_dim = args.encoder_embed_dim + self.sg_length_pred = getattr(args, "sg_length_pred", False) + self.pred_length_offset = getattr(args, "pred_length_offset", False) + self.length_loss_factor = getattr(args, "length_loss_factor", 0.1) + self.src_embedding_copy = getattr(args, "src_embedding_copy", False) + self.embed_length = Embedding(256, self.encoder_embed_dim, None) + + @ensemble_decoder + def forward(self, normalize, encoder_out, prev_output_tokens, step=0, **unused): + features, _ = self.extract_features( + prev_output_tokens, + encoder_out=encoder_out, + embedding_copy=(step == 0) & self.src_embedding_copy, + ) + decoder_out = self.output_layer(features) + return F.log_softmax(decoder_out, -1) if normalize else decoder_out + + @ensemble_decoder + def forward_length(self, normalize, encoder_out): + enc_feats = encoder_out["encoder_out"][0] # T x B x C + if len(encoder_out["encoder_padding_mask"]) > 0: + src_masks = 
encoder_out["encoder_padding_mask"][0] # B x T + else: + src_masks = None + enc_feats = _mean_pooling(enc_feats, src_masks) + if self.sg_length_pred: + enc_feats = enc_feats.detach() + length_out = F.linear(enc_feats, self.embed_length.weight) + return F.log_softmax(length_out, -1) if normalize else length_out + + def extract_features( + self, + prev_output_tokens, + encoder_out=None, + early_exit=None, + embedding_copy=False, + **unused + ): + """ + Similar to *forward* but only return features. + + Inputs: + prev_output_tokens: Tensor(B, T) + encoder_out: a dictionary of hidden states and masks + + Returns: + tuple: + - the decoder's features of shape `(batch, tgt_len, embed_dim)` + - a dictionary with any model-specific outputs + the LevenshteinTransformer decoder has full-attention to all generated tokens + """ + # embedding + if embedding_copy: + src_embd = encoder_out["encoder_embedding"][0] + if len(encoder_out["encoder_padding_mask"]) > 0: + src_mask = encoder_out["encoder_padding_mask"][0] + else: + src_mask = None + src_mask = ( + ~src_mask + if src_mask is not None + else prev_output_tokens.new_ones(*src_embd.size()[:2]).bool() + ) + + x, decoder_padding_mask = self.forward_embedding( + prev_output_tokens, + self.forward_copying_source( + src_embd, src_mask, prev_output_tokens.ne(self.padding_idx) + ), + ) + + else: + + x, decoder_padding_mask = self.forward_embedding(prev_output_tokens) + + # B x T x C -> T x B x C + x = x.transpose(0, 1) + attn = None + inner_states = [x] + + # decoder layers + for i, layer in enumerate(self.layers): + + # early exit from the decoder. + if (early_exit is not None) and (i >= early_exit): + break + + x, attn, _ = layer( + x, + encoder_out["encoder_out"][0] + if (encoder_out is not None and len(encoder_out["encoder_out"]) > 0) + else None, + encoder_out["encoder_padding_mask"][0] + if ( + encoder_out is not None + and len(encoder_out["encoder_padding_mask"]) > 0 + ) + else None, + self_attn_mask=None, + self_attn_padding_mask=decoder_padding_mask, + ) + inner_states.append(x) + + if self.layer_norm: + x = self.layer_norm(x) + + # T x B x C -> B x T x C + x = x.transpose(0, 1) + + if self.project_out_dim is not None: + x = self.project_out_dim(x) + + return x, {"attn": attn, "inner_states": inner_states} + + def forward_embedding(self, prev_output_tokens, states=None): + # embed positions + positions = ( + self.embed_positions(prev_output_tokens) + if self.embed_positions is not None + else None + ) + + # embed tokens and positions + if states is None: + x = self.embed_scale * self.embed_tokens(prev_output_tokens) + if self.project_in_dim is not None: + x = self.project_in_dim(x) + else: + x = states + + if positions is not None: + x += positions + x = self.dropout_module(x) + decoder_padding_mask = prev_output_tokens.eq(self.padding_idx) + return x, decoder_padding_mask + + def forward_copying_source(self, src_embeds, src_masks, tgt_masks): + length_sources = src_masks.sum(1) + length_targets = tgt_masks.sum(1) + mapped_inputs = _uniform_assignment(length_sources, length_targets).masked_fill( + ~tgt_masks, 0 + ) + copied_embedding = torch.gather( + src_embeds, + 1, + mapped_inputs.unsqueeze(-1).expand( + *mapped_inputs.size(), src_embeds.size(-1) + ), + ) + return copied_embedding + + def forward_length_prediction(self, length_out, encoder_out, tgt_tokens=None): + enc_feats = encoder_out["encoder_out"][0] # T x B x C + if len(encoder_out["encoder_padding_mask"]) > 0: + src_masks = encoder_out["encoder_padding_mask"][0] # B x T + else: + src_masks 
= None + if self.pred_length_offset: + if src_masks is None: + src_lengs = enc_feats.new_ones(enc_feats.size(1)).fill_( + enc_feats.size(0) + ) + else: + src_lengs = (~src_masks).transpose(0, 1).type_as(enc_feats).sum(0) + src_lengs = src_lengs.long() + + if tgt_tokens is not None: + # obtain the length target + tgt_lengs = tgt_tokens.ne(self.padding_idx).sum(1).long() + if self.pred_length_offset: + length_tgt = tgt_lengs - src_lengs + 128 + else: + length_tgt = tgt_lengs + length_tgt = length_tgt.clamp(min=0, max=255) + + else: + # predict the length target (greedy for now) + # TODO: implementing length-beam + pred_lengs = length_out.max(-1)[1] + if self.pred_length_offset: + length_tgt = pred_lengs - 128 + src_lengs + else: + length_tgt = pred_lengs + + return length_tgt + + +@register_model_architecture( + "nonautoregressive_transformer", "nonautoregressive_transformer" +) +def base_architecture(args): + args.encoder_embed_path = getattr(args, "encoder_embed_path", None) + args.encoder_embed_dim = getattr(args, "encoder_embed_dim", 512) + args.encoder_ffn_embed_dim = getattr(args, "encoder_ffn_embed_dim", 2048) + args.encoder_layers = getattr(args, "encoder_layers", 6) + args.encoder_attention_heads = getattr(args, "encoder_attention_heads", 8) + args.encoder_normalize_before = getattr(args, "encoder_normalize_before", False) + args.encoder_learned_pos = getattr(args, "encoder_learned_pos", False) + args.decoder_embed_path = getattr(args, "decoder_embed_path", None) + args.decoder_embed_dim = getattr(args, "decoder_embed_dim", args.encoder_embed_dim) + args.decoder_ffn_embed_dim = getattr( + args, "decoder_ffn_embed_dim", args.encoder_ffn_embed_dim + ) + args.decoder_layers = getattr(args, "decoder_layers", 6) + args.decoder_attention_heads = getattr(args, "decoder_attention_heads", 8) + args.decoder_normalize_before = getattr(args, "decoder_normalize_before", False) + args.decoder_learned_pos = getattr(args, "decoder_learned_pos", False) + args.attention_dropout = getattr(args, "attention_dropout", 0.0) + args.activation_dropout = getattr(args, "activation_dropout", 0.0) + args.activation_fn = getattr(args, "activation_fn", "relu") + args.dropout = getattr(args, "dropout", 0.1) + args.adaptive_softmax_cutoff = getattr(args, "adaptive_softmax_cutoff", None) + args.adaptive_softmax_dropout = getattr(args, "adaptive_softmax_dropout", 0) + args.share_decoder_input_output_embed = getattr( + args, "share_decoder_input_output_embed", False + ) + args.share_all_embeddings = getattr(args, "share_all_embeddings", False) + args.no_token_positional_embeddings = getattr( + args, "no_token_positional_embeddings", False + ) + args.adaptive_input = getattr(args, "adaptive_input", False) + args.apply_bert_init = getattr(args, "apply_bert_init", False) + + args.decoder_output_dim = getattr( + args, "decoder_output_dim", args.decoder_embed_dim + ) + args.decoder_input_dim = getattr(args, "decoder_input_dim", args.decoder_embed_dim) + + # --- special arguments --- + args.sg_length_pred = getattr(args, "sg_length_pred", False) + args.pred_length_offset = getattr(args, "pred_length_offset", False) + args.length_loss_factor = getattr(args, "length_loss_factor", 0.1) + args.src_embedding_copy = getattr(args, "src_embedding_copy", False) + + +@register_model_architecture( + "nonautoregressive_transformer", "nonautoregressive_transformer_wmt_en_de" +) +def nonautoregressive_transformer_wmt_en_de(args): + base_architecture(args) diff --git a/SpeechT5/fairseq/fairseq/models/roberta/__init__.py 
b/SpeechT5/fairseq/fairseq/models/roberta/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..4cd723ae96aec8e3182773483f123109d23b620e --- /dev/null +++ b/SpeechT5/fairseq/fairseq/models/roberta/__init__.py @@ -0,0 +1,11 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +from .hub_interface import * # noqa +from .model import * # noqa +from .enc_dec import * # noqa +from .model_camembert import * # noqa +from .model_gottbert import * # noqa +from .model_xlmr import * # noqa diff --git a/SpeechT5/fairseq/fairseq/models/roberta/alignment_utils.py b/SpeechT5/fairseq/fairseq/models/roberta/alignment_utils.py new file mode 100644 index 0000000000000000000000000000000000000000..ccc7f74cb94d5b8baa2d4e9dfd44f653d47ee43e --- /dev/null +++ b/SpeechT5/fairseq/fairseq/models/roberta/alignment_utils.py @@ -0,0 +1,118 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +from collections import Counter +from typing import List + +import torch + + +def align_bpe_to_words(roberta, bpe_tokens: torch.LongTensor, other_tokens: List[str]): + """ + Helper to align GPT-2 BPE to other tokenization formats (e.g., spaCy). + + Args: + roberta (RobertaHubInterface): RoBERTa instance + bpe_tokens (torch.LongTensor): GPT-2 BPE tokens of shape `(T_bpe)` + other_tokens (List[str]): other tokens of shape `(T_words)` + + Returns: + List[str]: mapping from *other_tokens* to corresponding *bpe_tokens*. + """ + assert bpe_tokens.dim() == 1 + assert bpe_tokens[0] == 0 + + def clean(text): + return text.strip() + + # remove whitespaces to simplify alignment + bpe_tokens = [roberta.task.source_dictionary.string([x]) for x in bpe_tokens] + bpe_tokens = [ + clean(roberta.bpe.decode(x) if x not in {"<s>", ""} else x) for x in bpe_tokens + ] + other_tokens = [clean(str(o)) for o in other_tokens] + + # strip leading <s> + bpe_tokens = bpe_tokens[1:] + assert "".join(bpe_tokens) == "".join(other_tokens) + + # create alignment from every word to a list of BPE tokens + alignment = [] + bpe_toks = filter(lambda item: item[1] != "", enumerate(bpe_tokens, start=1)) + j, bpe_tok = next(bpe_toks) + for other_tok in other_tokens: + bpe_indices = [] + while True: + if other_tok.startswith(bpe_tok): + bpe_indices.append(j) + other_tok = other_tok[len(bpe_tok) :] + try: + j, bpe_tok = next(bpe_toks) + except StopIteration: + j, bpe_tok = None, None + elif bpe_tok.startswith(other_tok): + # other_tok spans multiple BPE tokens + bpe_indices.append(j) + bpe_tok = bpe_tok[len(other_tok) :] + other_tok = "" + else: + raise Exception('Cannot align "{}" and "{}"'.format(other_tok, bpe_tok)) + if other_tok == "": + break + assert len(bpe_indices) > 0 + alignment.append(bpe_indices) + assert len(alignment) == len(other_tokens) + + return alignment + + +def align_features_to_words(roberta, features, alignment): + """ + Align given features to words. + + Args: + roberta (RobertaHubInterface): RoBERTa instance + features (torch.Tensor): features to align of shape `(T_bpe x C)` + alignment: alignment between BPE tokens and words returned by + func:`align_bpe_to_words`. 
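+
+    Returns:
+        torch.Tensor: word-level features, typically `(T_words + 2) x C`
+        (including the `<s>` and `</s>` positions); each word's feature is
+        the sum of its count-normalized BPE features, so the total feature
+        mass is preserved.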
+ """ + assert features.dim() == 2 + + bpe_counts = Counter(j for bpe_indices in alignment for j in bpe_indices) + assert bpe_counts[0] == 0 # <s> shouldn't be aligned + denom = features.new([bpe_counts.get(j, 1) for j in range(len(features))]) + weighted_features = features / denom.unsqueeze(-1) + + output = [weighted_features[0]] + largest_j = -1 + for bpe_indices in alignment: + output.append(weighted_features[bpe_indices].sum(dim=0)) + largest_j = max(largest_j, *bpe_indices) + for j in range(largest_j + 1, len(features)): + output.append(weighted_features[j]) + output = torch.stack(output) + assert torch.all(torch.abs(output.sum(dim=0) - features.sum(dim=0)) < 1e-4) + return output + + +def spacy_nlp(): + if getattr(spacy_nlp, "_nlp", None) is None: + try: + from spacy.lang.en import English + + spacy_nlp._nlp = English() + except ImportError: + raise ImportError("Please install spacy with: pip install spacy") + return spacy_nlp._nlp + + +def spacy_tokenizer(): + if getattr(spacy_tokenizer, "_tokenizer", None) is None: + try: + nlp = spacy_nlp() + spacy_tokenizer._tokenizer = nlp.Defaults.create_tokenizer(nlp) + except ImportError: + raise ImportError("Please install spacy with: pip install spacy") + return spacy_tokenizer._tokenizer diff --git a/SpeechT5/fairseq/fairseq/models/roberta/enc_dec.py b/SpeechT5/fairseq/fairseq/models/roberta/enc_dec.py new file mode 100644 index 0000000000000000000000000000000000000000..e538dee0aa5984b1a3d02ce81117d2046c030593 --- /dev/null +++ b/SpeechT5/fairseq/fairseq/models/roberta/enc_dec.py @@ -0,0 +1,192 @@ +import argparse +import logging + +import torch.nn as nn +import fairseq.checkpoint_utils +from fairseq.models import ( + FairseqEncoderDecoderModel, + register_model, + register_model_architecture, +) +from fairseq.models.transformer import TransformerDecoder +from fairseq.models.roberta import model as roberta + +logger = logging.getLogger(__name__) + + +@register_model("roberta_enc_dec") +class RobertaEncDecModel(FairseqEncoderDecoderModel): + @staticmethod + def add_args(parser): + parser.add_argument( + "--pretrained-mlm-checkpoint", + default=None, + type=str, + metavar="PRETRAINED", + help="path to pretrained mlm checkpoint", + ) + parser.add_argument( + "--pretrained-decoder", action="store_true", help="reload decoder" + ) + parser.add_argument( + "--hack-layernorm-embedding", + action="store_true", + help="hack to reload old models trained with encoder-normalize-before=False (no equivalent to encoder-normalize-before=False and layernorm_embedding=False", + ) + parser.add_argument( + "--share-decoder-input-output-embed", + action="store_true", + help="share decoder input and output embeddings", + ) + parser.add_argument( + "--share-all-embeddings", + action="store_true", + help="share encoder, decoder and output embeddings" + " (requires shared dictionary and embed dim)", + ) + + @classmethod + def build_model(cls, args, task): + """Build a new model instance.""" + + # make sure all arguments are present + base_enc_dec_architecture(args) + if args.pretrained_mlm_checkpoint: + arg_overrides = None + if args.hack_layernorm_embedding: + arg_overrides = {"layernorm_embedding": False} + loaded = fairseq.checkpoint_utils.load_model_ensemble_and_task( + [args.pretrained_mlm_checkpoint], arg_overrides=arg_overrides + ) + ([roberta_enc], _cfg, _task) = loaded + else: + # Do we need to edit untie_weights here ? 
+ share_in_out = ( + args.share_decoder_input_output_embed or args.share_all_embeddings + ) + args.untie_weights_roberta = not share_in_out + if args.hack_layernorm_embedding: + args.layernorm_embedding = False + args.encoder_normalize_before = False + roberta_enc = roberta.RobertaModel.build_model(args, task) + + return cls.from_roberta(roberta_enc, args, task.source_dictionary) + + @staticmethod + def from_roberta(roberta_enc: roberta.RobertaModel, args, dictionary): + encoder = roberta_enc.encoder.sentence_encoder + vocab_size, embed_dim = encoder.embed_tokens.weight.shape + + if args.share_all_embeddings: + lm_head = roberta_enc.encoder.lm_head + assert encoder.embed_tokens.weight is lm_head.weight, ( + "Can't use --share-all-embeddings with a model " + "that was pretraiend with --untie-weights-roberta_enc" + ) + else: + lm_head = roberta.RobertaLMHead( + embed_dim, vocab_size, roberta_enc.args.activation_fn + ) + + dec_embs = nn.Embedding(vocab_size, embed_dim, dictionary.pad()) + if args.share_all_embeddings or args.share_decoder_input_output_embed: + # Note: I wasn't able to use Embedding _weight parameter to achive this sharing. + dec_embs.weight = lm_head.weight + + decoder = TransformerDecoder( + RobertaEncDecModel.read_args_from_roberta(roberta_enc.args), + dictionary, + dec_embs, + no_encoder_attn=False, + output_projection=lm_head, + ) + if getattr(args, "pretrained_decoder", False): + decoder_dict = encoder.state_dict() + + # TODO: hide setting "encoder_attn" layers behind a flag. + for k, w in list(decoder_dict.items()): + if ".self_attn" in k: + k_enc_attn = k.replace(".self_attn", ".encoder_attn") + decoder_dict[k_enc_attn] = w.detach().clone() + + for k, w in lm_head.state_dict().items(): + decoder_dict["output_projection." + k] = w + + missing_keys, unexpected_keys = decoder.load_state_dict( + decoder_dict, strict=False + ) + # missing_keys = [m for m in missing_keys if ".encoder_attn" not in m] + assert not missing_keys and not unexpected_keys, ( + "Failed to load state dict. " + f"Missing keys: {missing_keys}. " + f"Unexpected keys: {unexpected_keys}." + ) + + if args.share_all_embeddings: + assert decoder.output_projection.weight is decoder.embed_tokens.weight + assert encoder.embed_tokens.weight is decoder.embed_tokens.weight + elif args.share_decoder_input_output_embed: + assert decoder.output_projection.weight is decoder.embed_tokens.weight + assert encoder.embed_tokens.weight is not decoder.embed_tokens.weight + else: + assert decoder.output_projection.weight is not decoder.embed_tokens.weight + assert encoder.embed_tokens.weight is not decoder.embed_tokens.weight + + return RobertaEncDecModel(encoder, decoder) + + @staticmethod + def read_args_from_roberta(roberta_args: argparse.Namespace): + # TODO: this would become easier if encoder/decoder where using a similar + # TransformerConfig object + args = argparse.Namespace(**vars(roberta_args)) + attr_map = [ + ("encoder_attention_heads", "decoder_attention_heads"), + ("encoder_embed_dim", "decoder_embed_dim"), + ("encoder_embed_dim", "decoder_output_dim"), + ("encoder_normalize_before", "decoder_normalize_before"), + ("encoder_layers_to_keep", "decoder_layers_to_keep"), + ("encoder_ffn_embed_dim", "decoder_ffn_embed_dim"), + ("encoder_layerdrop", "decoder_layerdrop"), + ("encoder_layers", "decoder_layers"), + ("encoder_learned_pos", "decoder_learned_pos"), + # should this be set from here ? 
+ ("max_positions", "max_target_positions"), + ] + for k1, k2 in attr_map: + setattr(args, k2, getattr(roberta_args, k1)) + + args.adaptive_softmax_cutoff = getattr(args, "adaptive_softmax_cutoff", None) + args.adaptive_softmax_dropout = getattr(args, "adaptive_softmax_dropout", 0) + args.share_decoder_input_output_embed = not roberta_args.untie_weights_roberta + return args + + def upgrade_state_dict_named(self, state_dict, name): + prefix = name + "." if name != "" else "" + super().upgrade_state_dict_named(state_dict, name) + old_keys = list(state_dict.keys()) + + # rename decoder -> encoder before upgrading children modules + for k in old_keys: + if k.startswith(prefix + "encoder.lm_head"): + state_dict.pop(k) + continue + new_k = k + new_k = new_k.replace(".sentence_encoder.", ".") + new_k = new_k.replace("decoder.lm_head.", "decoder.output_projection.") + if k == new_k: + continue + # print(k, "->", new_k) + state_dict[new_k] = state_dict.pop(k) + + +@register_model_architecture("roberta_enc_dec", "roberta_enc_dec") +def base_enc_dec_architecture(args): + args.hack_layernorm_embedding = getattr(args, "hack_layernorm_embedding", False) + args.pretrained_mlm_checkpoint = getattr(args, "pretrained_mlm_checkpoint", None) + args.pretrained_decoder = getattr(args, "pretrained_decoder", None) + args.share_all_embeddings = getattr(args, "share_all_embeddings", False) + args.share_decoder_input_output_embed = getattr( + args, "share_decoder_input_output_embed", False + ) + + roberta.base_architecture(args) diff --git a/SpeechT5/fairseq/fairseq/models/roberta/hub_interface.py b/SpeechT5/fairseq/fairseq/models/roberta/hub_interface.py new file mode 100644 index 0000000000000000000000000000000000000000..c9af434bde61f399a4eebaafd5811be9a37d538e --- /dev/null +++ b/SpeechT5/fairseq/fairseq/models/roberta/hub_interface.py @@ -0,0 +1,235 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +import numpy as np +import torch +import torch.nn as nn +import torch.nn.functional as F +from fairseq import utils +from fairseq.data import encoders + + +class RobertaHubInterface(nn.Module): + """A simple PyTorch Hub interface to RoBERTa. + + Usage: https://github.com/pytorch/fairseq/tree/master/examples/roberta + """ + + def __init__(self, cfg, task, model): + super().__init__() + self.cfg = cfg + self.task = task + self.model = model + + self.bpe = encoders.build_bpe(cfg.bpe) + + # this is useful for determining the device + self.register_buffer("_float_tensor", torch.tensor([0], dtype=torch.float)) + + @property + def device(self): + return self._float_tensor.device + + def encode( + self, sentence: str, *addl_sentences, no_separator=False + ) -> torch.LongTensor: + """ + BPE-encode a sentence (or multiple sentences). + + Every sequence begins with a beginning-of-sentence (`<s>`) symbol. + Every sentence ends with an end-of-sentence (`</s>`) and we use an + extra end-of-sentence (`</s>`) as a separator. + + Example (single sentence): `<s> a b c </s>` + Example (sentence pair): `<s> d e f </s> </s> 1 2 3 </s>` + + The BPE encoding follows GPT-2. One subtle detail is that the GPT-2 BPE + requires leading spaces. 
For example:: + + >>> roberta.encode('Hello world').tolist() + [0, 31414, 232, 2] + >>> roberta.encode(' world').tolist() + [0, 232, 2] + >>> roberta.encode('world').tolist() + [0, 8331, 2] + """ + bpe_sentence = "<s> " + self.bpe.encode(sentence) + " </s>" + for s in addl_sentences: + bpe_sentence += " </s>" if not no_separator else "" + bpe_sentence += " " + self.bpe.encode(s) + " </s>" + tokens = self.task.source_dictionary.encode_line( + bpe_sentence, append_eos=False, add_if_not_exist=False + ) + return tokens.long() + + def decode(self, tokens: torch.LongTensor): + assert tokens.dim() == 1 + tokens = tokens.numpy() + if tokens[0] == self.task.source_dictionary.bos(): + tokens = tokens[1:] # remove <s> + eos_mask = tokens == self.task.source_dictionary.eos() + doc_mask = eos_mask[1:] & eos_mask[:-1] + sentences = np.split(tokens, doc_mask.nonzero()[0] + 1) + sentences = [ + self.bpe.decode(self.task.source_dictionary.string(s)) for s in sentences + ] + if len(sentences) == 1: + return sentences[0] + return sentences + + def extract_features( + self, tokens: torch.LongTensor, return_all_hiddens: bool = False + ) -> torch.Tensor: + if tokens.dim() == 1: + tokens = tokens.unsqueeze(0) + if tokens.size(-1) > self.model.max_positions(): + raise ValueError( + "tokens exceeds maximum length: {} > {}".format( + tokens.size(-1), self.model.max_positions() + ) + ) + features, extra = self.model( + tokens.to(device=self.device), + features_only=True, + return_all_hiddens=return_all_hiddens, + ) + if return_all_hiddens: + # convert from T x B x C -> B x T x C + inner_states = extra["inner_states"] + return [inner_state.transpose(0, 1) for inner_state in inner_states] + else: + return features # just the last layer's features + + def register_classification_head( + self, name: str, num_classes: int = None, embedding_size: int = None, **kwargs + ): + self.model.register_classification_head( + name, num_classes=num_classes, embedding_size=embedding_size, **kwargs + ) + + def predict(self, head: str, tokens: torch.LongTensor, return_logits: bool = False): + features = self.extract_features(tokens.to(device=self.device)) + logits = self.model.classification_heads[head](features) + if return_logits: + return logits + return F.log_softmax(logits, dim=-1) + + def extract_features_aligned_to_words( + self, sentence: str, return_all_hiddens: bool = False + ) -> torch.Tensor: + """Extract RoBERTa features, aligned to spaCy's word-level tokenizer.""" + from fairseq.models.roberta import alignment_utils + from spacy.tokens import Doc + + nlp = alignment_utils.spacy_nlp() + tokenizer = alignment_utils.spacy_tokenizer() + + # tokenize both with GPT-2 BPE and spaCy + bpe_toks = self.encode(sentence) + spacy_toks = tokenizer(sentence) + spacy_toks_ws = [t.text_with_ws for t in tokenizer(sentence)] + alignment = alignment_utils.align_bpe_to_words(self, bpe_toks, spacy_toks_ws) + + # extract features and align them + features = self.extract_features( + bpe_toks, return_all_hiddens=return_all_hiddens + ) + features = features.squeeze(0) + aligned_feats = alignment_utils.align_features_to_words( + self, features, alignment + ) + + # wrap in spaCy Doc + doc = Doc( + nlp.vocab, + words=["<s>"] + [x.text for x in spacy_toks] + ["</s>"], + spaces=[True] + + [x.endswith(" ") for x in spacy_toks_ws[:-1]] + + [True, False], + ) + assert len(doc) == aligned_feats.size(0) + doc.user_token_hooks["vector"] = lambda token: aligned_feats[token.i] + return doc + + def fill_mask(self, masked_input: str, topk: int = 5): + 
masked_token = "<mask>" + assert ( + masked_token in masked_input and masked_input.count(masked_token) == 1 + ), "Please add one {0} token for the input, eg: 'He is a {0} guy'".format( + masked_token + ) + + text_spans = masked_input.split(masked_token) + text_spans_bpe = ( + (" {0} ".format(masked_token)) + .join([self.bpe.encode(text_span.rstrip()) for text_span in text_spans]) + .strip() + ) + tokens = self.task.source_dictionary.encode_line( + "<s> " + text_spans_bpe + " </s>", + append_eos=False, + add_if_not_exist=False, + ) + + masked_index = (tokens == self.task.mask_idx).nonzero(as_tuple=False) + if tokens.dim() == 1: + tokens = tokens.unsqueeze(0) + + with utils.model_eval(self.model): + features, extra = self.model( + tokens.long().to(device=self.device), + features_only=False, + return_all_hiddens=False, + ) + logits = features[0, masked_index, :].squeeze() + prob = logits.softmax(dim=0) + values, index = prob.topk(k=topk, dim=0) + topk_predicted_token_bpe = self.task.source_dictionary.string(index) + + topk_filled_outputs = [] + for index, predicted_token_bpe in enumerate( + topk_predicted_token_bpe.split(" ") + ): + predicted_token = self.bpe.decode(predicted_token_bpe) + # Quick hack to fix https://github.com/pytorch/fairseq/issues/1306 + if predicted_token_bpe.startswith("\u2581"): + predicted_token = " " + predicted_token + if " {0}".format(masked_token) in masked_input: + topk_filled_outputs.append( + ( + masked_input.replace( + " {0}".format(masked_token), predicted_token + ), + values[index].item(), + predicted_token, + ) + ) + else: + topk_filled_outputs.append( + ( + masked_input.replace(masked_token, predicted_token), + values[index].item(), + predicted_token, + ) + ) + return topk_filled_outputs + + def disambiguate_pronoun(self, sentence: str) -> bool: + """ + Usage:: + + >>> disambiguate_pronoun('The _trophy_ would not fit in the brown suitcase because [it] was too big.') + True + + >>> disambiguate_pronoun('The trophy would not fit in the brown suitcase because [it] was too big.') + 'The trophy' + """ + assert hasattr( + self.task, "disambiguate_pronoun" + ), "roberta.disambiguate_pronoun() requires a model trained with the WSC task." + with utils.model_eval(self.model): + return self.task.disambiguate_pronoun( + self.model, sentence, use_cuda=self.device.type == "cuda" + ) diff --git a/SpeechT5/fairseq/fairseq/models/roberta/model.py b/SpeechT5/fairseq/fairseq/models/roberta/model.py new file mode 100644 index 0000000000000000000000000000000000000000..d9d0f324cf708a49a9d97ef05621dd1eb9bdefc8 --- /dev/null +++ b/SpeechT5/fairseq/fairseq/models/roberta/model.py @@ -0,0 +1,582 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. +""" +RoBERTa: A Robustly Optimized BERT Pretraining Approach. 
+""" + +import logging + +import torch +import torch.nn as nn +import torch.nn.functional as F +from fairseq import utils +from fairseq.models import ( + FairseqEncoder, + FairseqEncoderModel, + register_model, + register_model_architecture, +) +from fairseq.models.transformer import DEFAULT_MIN_PARAMS_TO_WRAP, TransformerEncoder +from fairseq.modules import LayerNorm +from fairseq.modules.quant_noise import quant_noise as apply_quant_noise_ +from fairseq.modules.transformer_sentence_encoder import init_bert_params + +from .hub_interface import RobertaHubInterface + + +logger = logging.getLogger(__name__) + + +@register_model("roberta") +class RobertaModel(FairseqEncoderModel): + @classmethod + def hub_models(cls): + return { + "roberta.base": "http://dl.fbaipublicfiles.com/fairseq/models/roberta.base.tar.gz", + "roberta.large": "http://dl.fbaipublicfiles.com/fairseq/models/roberta.large.tar.gz", + "roberta.large.mnli": "http://dl.fbaipublicfiles.com/fairseq/models/roberta.large.mnli.tar.gz", + "roberta.large.wsc": "http://dl.fbaipublicfiles.com/fairseq/models/roberta.large.wsc.tar.gz", + } + + def __init__(self, args, encoder): + super().__init__(encoder) + self.args = args + + # We follow BERT's random weight initialization + self.apply(init_bert_params) + + self.classification_heads = nn.ModuleDict() + + @staticmethod + def add_args(parser): + """Add model-specific arguments to the parser.""" + parser.add_argument( + "--encoder-layers", type=int, metavar="L", help="num encoder layers" + ) + parser.add_argument( + "--encoder-embed-dim", + type=int, + metavar="H", + help="encoder embedding dimension", + ) + parser.add_argument( + "--encoder-ffn-embed-dim", + type=int, + metavar="F", + help="encoder embedding dimension for FFN", + ) + parser.add_argument( + "--encoder-attention-heads", + type=int, + metavar="A", + help="num encoder attention heads", + ) + parser.add_argument( + "--activation-fn", + choices=utils.get_available_activation_fns(), + help="activation function to use", + ) + parser.add_argument( + "--pooler-activation-fn", + choices=utils.get_available_activation_fns(), + help="activation function to use for pooler layer", + ) + parser.add_argument( + "--encoder-normalize-before", + action="store_true", + help="apply layernorm before each encoder block", + ) + parser.add_argument( + "--layernorm-embedding", + action="store_true", + help="add layernorm to embedding", + ) + parser.add_argument( + "--dropout", type=float, metavar="D", help="dropout probability" + ) + parser.add_argument( + "--attention-dropout", + type=float, + metavar="D", + help="dropout probability for attention weights", + ) + parser.add_argument( + "--activation-dropout", + type=float, + metavar="D", + help="dropout probability after activation in FFN", + ) + parser.add_argument( + "--pooler-dropout", + type=float, + metavar="D", + help="dropout probability in the masked_lm pooler layers", + ) + parser.add_argument( + "--max-positions", type=int, help="number of positional embeddings to learn" + ) + parser.add_argument( + "--load-checkpoint-heads", + action="store_true", + help="(re-)register and load heads when loading checkpoints", + ) + parser.add_argument( + "--untie-weights-roberta", + action="store_true", + help="Untie weights between embeddings and classifiers in RoBERTa", + ) + # args for "Reducing Transformer Depth on Demand with Structured Dropout" (Fan et al., 2019) + parser.add_argument( + "--encoder-layerdrop", + type=float, + metavar="D", + default=0, + help="LayerDrop probability for encoder", + 
) + parser.add_argument( + "--encoder-layers-to-keep", + default=None, + help="which layers to *keep* when pruning as a comma-separated list", + ) + # args for Training with Quantization Noise for Extreme Model Compression ({Fan*, Stock*} et al., 2020) + parser.add_argument( + "--quant-noise-pq", + type=float, + metavar="D", + default=0, + help="iterative PQ quantization noise at training time", + ) + parser.add_argument( + "--quant-noise-pq-block-size", + type=int, + metavar="D", + default=8, + help="block size of quantization noise at training time", + ) + parser.add_argument( + "--quant-noise-scalar", + type=float, + metavar="D", + default=0, + help="scalar quantization noise and scalar quantization at training time", + ) + # args for "Better Fine-Tuning by Reducing Representational Collapse" (Aghajanyan et al. 2020) + parser.add_argument( + "--spectral-norm-classification-head", + action="store_true", + default=False, + help="Apply spectral normalization on the classification head", + ) + # args for Fully Sharded Data Parallel (FSDP) training + parser.add_argument( + "--min-params-to-wrap", + type=int, + metavar="D", + default=DEFAULT_MIN_PARAMS_TO_WRAP, + help=( + "minimum number of params for a layer to be wrapped with FSDP() when " + "training with --ddp-backend=fully_sharded. Smaller values will " + "improve memory efficiency, but may make torch.distributed " + "communication less efficient due to smaller input sizes. This option " + "is set to 0 (i.e., always wrap) when --checkpoint-activations or " + "--offload-activations are passed." + ) + ) + + @classmethod + def build_model(cls, args, task): + """Build a new model instance.""" + + # make sure all arguments are present + base_architecture(args) + + if not hasattr(args, "max_positions"): + args.max_positions = args.tokens_per_sample + + encoder = RobertaEncoder(args, task.source_dictionary) + return cls(args, encoder) + + def forward( + self, + src_tokens, + features_only=False, + return_all_hiddens=False, + classification_head_name=None, + **kwargs, + ): + if classification_head_name is not None: + features_only = True + + x, extra = self.encoder(src_tokens, features_only, return_all_hiddens, **kwargs) + + if classification_head_name is not None: + x = self.classification_heads[classification_head_name](x) + return x, extra + + def get_normalized_probs(self, net_output, log_probs, sample=None): + """Get normalized probabilities (or log probs) from a net's output.""" + logits = net_output[0].float() + if log_probs: + return F.log_softmax(logits, dim=-1) + else: + return F.softmax(logits, dim=-1) + + def register_classification_head( + self, name, num_classes=None, inner_dim=None, **kwargs + ): + """Register a classification head.""" + if name in self.classification_heads: + prev_num_classes = self.classification_heads[name].out_proj.out_features + prev_inner_dim = self.classification_heads[name].dense.out_features + if num_classes != prev_num_classes or inner_dim != prev_inner_dim: + logger.warning( + 're-registering head "{}" with num_classes {} (prev: {}) ' + "and inner_dim {} (prev: {})".format( + name, num_classes, prev_num_classes, inner_dim, prev_inner_dim + ) + ) + self.classification_heads[name] = RobertaClassificationHead( + input_dim=self.args.encoder_embed_dim, + inner_dim=inner_dim or self.args.encoder_embed_dim, + num_classes=num_classes, + activation_fn=self.args.pooler_activation_fn, + pooler_dropout=self.args.pooler_dropout, + q_noise=self.args.quant_noise_pq, + qn_block_size=self.args.quant_noise_pq_block_size, 
+ do_spectral_norm=self.args.spectral_norm_classification_head, + ) + + @property + def supported_targets(self): + return {"self"} + + @classmethod + def from_pretrained( + cls, + model_name_or_path, + checkpoint_file="model.pt", + data_name_or_path=".", + bpe="gpt2", + **kwargs, + ): + from fairseq import hub_utils + + x = hub_utils.from_pretrained( + model_name_or_path, + checkpoint_file, + data_name_or_path, + archive_map=cls.hub_models(), + bpe=bpe, + load_checkpoint_heads=True, + **kwargs, + ) + + logger.info(x["args"]) + return RobertaHubInterface(x["args"], x["task"], x["models"][0]) + + def upgrade_state_dict_named(self, state_dict, name): + prefix = name + "." if name != "" else "" + + # rename decoder -> encoder before upgrading children modules + for k in list(state_dict.keys()): + if k.startswith(prefix + "decoder"): + new_k = prefix + "encoder" + k[len(prefix + "decoder") :] + state_dict[new_k] = state_dict[k] + del state_dict[k] + + # rename emb_layer_norm -> layernorm_embedding + for k in list(state_dict.keys()): + if ".emb_layer_norm." in k: + new_k = k.replace(".emb_layer_norm.", ".layernorm_embedding.") + state_dict[new_k] = state_dict[k] + del state_dict[k] + + # upgrade children modules + super().upgrade_state_dict_named(state_dict, name) + + # Handle new classification heads present in the state dict. + current_head_names = ( + [] + if not hasattr(self, "classification_heads") + else self.classification_heads.keys() + ) + keys_to_delete = [] + for k in state_dict.keys(): + if not k.startswith(prefix + "classification_heads."): + continue + + head_name = k[len(prefix + "classification_heads.") :].split(".")[0] + num_classes = state_dict[ + prefix + "classification_heads." + head_name + ".out_proj.weight" + ].size(0) + inner_dim = state_dict[ + prefix + "classification_heads." + head_name + ".dense.weight" + ].size(0) + + if getattr(self.args, "load_checkpoint_heads", False): + if head_name not in current_head_names: + self.register_classification_head(head_name, num_classes, inner_dim) + else: + if head_name not in current_head_names: + logger.warning( + "deleting classification head ({}) from checkpoint " + "not present in current model: {}".format(head_name, k) + ) + keys_to_delete.append(k) + elif ( + num_classes + != self.classification_heads[head_name].out_proj.out_features + or inner_dim + != self.classification_heads[head_name].dense.out_features + ): + logger.warning( + "deleting classification head ({}) from checkpoint " + "with different dimensions than current model: {}".format( + head_name, k + ) + ) + keys_to_delete.append(k) + for k in keys_to_delete: + del state_dict[k] + + # Copy any newly-added classification heads into the state dict + # with their current weights. + if hasattr(self, "classification_heads"): + cur_state = self.classification_heads.state_dict() + for k, v in cur_state.items(): + if prefix + "classification_heads." + k not in state_dict: + logger.info("Overwriting " + prefix + "classification_heads." + k) + state_dict[prefix + "classification_heads." 
+ k] = v + + +class RobertaLMHead(nn.Module): + """Head for masked language modeling.""" + + def __init__(self, embed_dim, output_dim, activation_fn, weight=None): + super().__init__() + self.dense = nn.Linear(embed_dim, embed_dim) + self.activation_fn = utils.get_activation_fn(activation_fn) + self.layer_norm = LayerNorm(embed_dim) + + if weight is None: + weight = nn.Linear(embed_dim, output_dim, bias=False).weight + self.weight = weight + self.bias = nn.Parameter(torch.zeros(output_dim)) + + def forward(self, features, masked_tokens=None, **kwargs): + # Only project the masked tokens while training, + # saves both memory and computation + if masked_tokens is not None: + features = features[masked_tokens, :] + + x = self.dense(features) + x = self.activation_fn(x) + x = self.layer_norm(x) + # project back to size of vocabulary with bias + x = F.linear(x, self.weight) + self.bias + return x + + +class RobertaClassificationHead(nn.Module): + """Head for sentence-level classification tasks.""" + + def __init__( + self, + input_dim, + inner_dim, + num_classes, + activation_fn, + pooler_dropout, + q_noise=0, + qn_block_size=8, + do_spectral_norm=False, + ): + super().__init__() + self.dense = nn.Linear(input_dim, inner_dim) + self.activation_fn = utils.get_activation_fn(activation_fn) + self.dropout = nn.Dropout(p=pooler_dropout) + self.out_proj = apply_quant_noise_( + nn.Linear(inner_dim, num_classes), q_noise, qn_block_size + ) + if do_spectral_norm: + if q_noise != 0: + raise NotImplementedError( + "Attempting to use Spectral Normalization with Quant Noise. This is not officially supported" + ) + self.out_proj = torch.nn.utils.spectral_norm(self.out_proj) + + def forward(self, features, **kwargs): + x = features[:, 0, :] # take <s> token (equiv. to [CLS]) + x = self.dropout(x) + x = self.dense(x) + x = self.activation_fn(x) + x = self.dropout(x) + x = self.out_proj(x) + return x + + +class RobertaEncoder(FairseqEncoder): + """RoBERTa encoder.""" + + def __init__(self, args, dictionary): + super().__init__(dictionary) + + # set any missing default values + base_architecture(args) + self.args = args + + if args.encoder_layers_to_keep: + args.encoder_layers = len(args.encoder_layers_to_keep.split(",")) + + embed_tokens = self.build_embedding( + len(dictionary), args.encoder_embed_dim, dictionary.pad() + ) + + self.sentence_encoder = self.build_encoder(args, dictionary, embed_tokens) + + self.lm_head = self.build_lm_head( + embed_dim=args.encoder_embed_dim, + output_dim=len(dictionary), + activation_fn=args.activation_fn, + weight=( + self.sentence_encoder.embed_tokens.weight + if not args.untie_weights_roberta + else None + ), + ) + + def build_embedding(self, vocab_size, embedding_dim, padding_idx): + return nn.Embedding(vocab_size, embedding_dim, padding_idx) + + def build_encoder(self, args, dictionary, embed_tokens): + encoder = TransformerEncoder(args, dictionary, embed_tokens) + encoder.apply(init_bert_params) + return encoder + + def build_lm_head(self, embed_dim, output_dim, activation_fn, weight): + return RobertaLMHead(embed_dim, output_dim, activation_fn, weight) + + def forward( + self, + src_tokens, + features_only=False, + return_all_hiddens=False, + masked_tokens=None, + **unused, + ): + """ + Args: + src_tokens (LongTensor): input tokens of shape `(batch, src_len)` + features_only (bool, optional): skip LM head and just return + features. If True, the output will be of shape + `(batch, src_len, embed_dim)`. 
+ return_all_hiddens (bool, optional): also return all of the + intermediate hidden states (default: False). + + Returns: + tuple: + - the LM output of shape `(batch, src_len, vocab)` + - a dictionary of additional data, where 'inner_states' + is a list of hidden states. Note that the hidden + states have shape `(src_len, batch, vocab)`. + """ + x, extra = self.extract_features( + src_tokens, return_all_hiddens=return_all_hiddens + ) + if not features_only: + x = self.output_layer(x, masked_tokens=masked_tokens) + return x, extra + + def extract_features(self, src_tokens, return_all_hiddens=False, **kwargs): + encoder_out = self.sentence_encoder( + src_tokens, + return_all_hiddens=return_all_hiddens, + token_embeddings=kwargs.get("token_embeddings", None), + ) + # T x B x C -> B x T x C + features = encoder_out["encoder_out"][0].transpose(0, 1) + inner_states = encoder_out["encoder_states"] if return_all_hiddens else None + return features, {"inner_states": inner_states} + + def output_layer(self, features, masked_tokens=None, **unused): + return self.lm_head(features, masked_tokens) + + def max_positions(self): + """Maximum output length supported by the encoder.""" + return self.args.max_positions + + +@register_model_architecture("roberta", "roberta") +def base_architecture(args): + args.encoder_layers = getattr(args, "encoder_layers", 12) + args.encoder_embed_dim = getattr(args, "encoder_embed_dim", 768) + args.encoder_ffn_embed_dim = getattr(args, "encoder_ffn_embed_dim", 3072) + args.encoder_attention_heads = getattr(args, "encoder_attention_heads", 12) + + args.dropout = getattr(args, "dropout", 0.1) + args.attention_dropout = getattr(args, "attention_dropout", 0.1) + args.activation_dropout = getattr(args, "activation_dropout", 0.0) + args.pooler_dropout = getattr(args, "pooler_dropout", 0.0) + + args.max_source_positions = getattr(args, "max_positions", 512) + args.no_token_positional_embeddings = getattr( + args, "no_token_positional_embeddings", False + ) + + # BERT has a few structural differences compared to the original Transformer + args.encoder_learned_pos = getattr(args, "encoder_learned_pos", True) + args.layernorm_embedding = getattr(args, "layernorm_embedding", True) + args.no_scale_embedding = getattr(args, "no_scale_embedding", True) + args.activation_fn = getattr(args, "activation_fn", "gelu") + args.encoder_normalize_before = getattr(args, "encoder_normalize_before", False) + args.pooler_activation_fn = getattr(args, "pooler_activation_fn", "tanh") + args.untie_weights_roberta = getattr(args, "untie_weights_roberta", False) + + # Adaptive input config + args.adaptive_input = getattr(args, "adaptive_input", False) + + # LayerDrop config + args.encoder_layerdrop = getattr(args, "encoder_layerdrop", 0.0) + args.encoder_layers_to_keep = getattr(args, "encoder_layers_to_keep", None) + + # Quantization noise config + args.quant_noise_pq = getattr(args, "quant_noise_pq", 0) + args.quant_noise_pq_block_size = getattr(args, "quant_noise_pq_block_size", 8) + args.quant_noise_scalar = getattr(args, "quant_noise_scalar", 0) + + # R4F config + args.spectral_norm_classification_head = getattr( + args, "spectral_norm_classification_head", False + ) + + +@register_model_architecture("roberta", "roberta_prenorm") +def roberta_prenorm_architecture(args): + args.layernorm_embedding = getattr(args, "layernorm_embedding", False) + args.encoder_normalize_before = getattr(args, "encoder_normalize_before", True) + base_architecture(args) + + +@register_model_architecture("roberta", 
"roberta_base") +def roberta_base_architecture(args): + base_architecture(args) + + +@register_model_architecture("roberta", "roberta_large") +def roberta_large_architecture(args): + args.encoder_layers = getattr(args, "encoder_layers", 24) + args.encoder_embed_dim = getattr(args, "encoder_embed_dim", 1024) + args.encoder_ffn_embed_dim = getattr(args, "encoder_ffn_embed_dim", 4096) + args.encoder_attention_heads = getattr(args, "encoder_attention_heads", 16) + base_architecture(args) + + +@register_model_architecture("roberta", "xlm") +def xlm_architecture(args): + args.encoder_layers = getattr(args, "encoder_layers", 16) + args.encoder_embed_dim = getattr(args, "encoder_embed_dim", 1280) + args.encoder_ffn_embed_dim = getattr(args, "encoder_ffn_embed_dim", 1280 * 4) + args.encoder_attention_heads = getattr(args, "encoder_attention_heads", 16) + base_architecture(args) diff --git a/SpeechT5/fairseq/fairseq/models/roberta/model_camembert.py b/SpeechT5/fairseq/fairseq/models/roberta/model_camembert.py new file mode 100644 index 0000000000000000000000000000000000000000..46447546fafb4a0a887b481022cac07631047c80 --- /dev/null +++ b/SpeechT5/fairseq/fairseq/models/roberta/model_camembert.py @@ -0,0 +1,50 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. +""" +CamemBERT: a Tasty French Language Model +""" + +from fairseq.models import register_model + +from .hub_interface import RobertaHubInterface +from .model import RobertaModel + + +@register_model("camembert") +class CamembertModel(RobertaModel): + @classmethod + def hub_models(cls): + return { + "camembert": "http://dl.fbaipublicfiles.com/fairseq/models/camembert-base.tar.gz", + "camembert.v0": "http://dl.fbaipublicfiles.com/fairseq/models/camembert-base.tar.gz", + "camembert-base": "http://dl.fbaipublicfiles.com/fairseq/models/camembert-base.tar.gz", + "camembert-large": "http://dl.fbaipublicfiles.com/fairseq/models/camembert-large.tar.gz", + "camembert-base-ccnet": "http://dl.fbaipublicfiles.com/fairseq/models/camembert-base-ccnet.tar.gz", + "camembert-base-ccnet-4gb": "http://dl.fbaipublicfiles.com/fairseq/models/camembert-base-ccnet-4gb.tar.gz", + "camembert-base-wikipedia-4gb": "http://dl.fbaipublicfiles.com/fairseq/models/camembert-base-wikipedia-4gb.tar.gz", + "camembert-base-oscar-4gb": "http://dl.fbaipublicfiles.com/fairseq/models/camembert-base-oscar-4gb.tar.gz", + } + + @classmethod + def from_pretrained( + cls, + model_name_or_path, + checkpoint_file="model.pt", + data_name_or_path=".", + bpe="sentencepiece", + **kwargs + ): + from fairseq import hub_utils + + x = hub_utils.from_pretrained( + model_name_or_path, + checkpoint_file, + data_name_or_path, + archive_map=cls.hub_models(), + bpe=bpe, + load_checkpoint_heads=True, + **kwargs, + ) + return RobertaHubInterface(x["args"], x["task"], x["models"][0]) diff --git a/SpeechT5/fairseq/fairseq/models/roberta/model_gottbert.py b/SpeechT5/fairseq/fairseq/models/roberta/model_gottbert.py new file mode 100644 index 0000000000000000000000000000000000000000..2e8c66354ac7ce7309226bb091a7baa4776fbfdc --- /dev/null +++ b/SpeechT5/fairseq/fairseq/models/roberta/model_gottbert.py @@ -0,0 +1,49 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. 
+""" +GottBERT: a pure German Language Model +""" + +from fairseq.models import register_model + +from .hub_interface import RobertaHubInterface +from .model import RobertaModel + + +@register_model('gottbert') +class GottbertModel(RobertaModel): + + @classmethod + def hub_models(cls): + return { + 'gottbert-base': 'https://dl.gottbert.de/fairseq/models/gottbert-base.tar.gz', + } + + @classmethod + def from_pretrained(cls, + model_name_or_path, + checkpoint_file='model.pt', + data_name_or_path='.', + bpe='hf_byte_bpe', + bpe_vocab='vocab.json', + bpe_merges='merges.txt', + bpe_add_prefix_space=False, + **kwargs + ): + from fairseq import hub_utils + + x = hub_utils.from_pretrained( + model_name_or_path, + checkpoint_file, + data_name_or_path, + archive_map=cls.hub_models(), + bpe=bpe, + load_checkpoint_heads=True, + bpe_vocab=bpe_vocab, + bpe_merges=bpe_merges, + bpe_add_prefix_space=bpe_add_prefix_space, + **kwargs, + ) + return RobertaHubInterface(x['args'], x['task'], x['models'][0]) diff --git a/SpeechT5/fairseq/fairseq/models/roberta/model_xlmr.py b/SpeechT5/fairseq/fairseq/models/roberta/model_xlmr.py new file mode 100644 index 0000000000000000000000000000000000000000..cf6e354d53b918dd4c7c78bfcd38ac0d63cab3bd --- /dev/null +++ b/SpeechT5/fairseq/fairseq/models/roberta/model_xlmr.py @@ -0,0 +1,46 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. +""" +Unsupervised Cross-lingual Representation Learning at Scale +""" + +from fairseq.models import register_model + +from .hub_interface import RobertaHubInterface +from .model import RobertaModel + + +@register_model("xlmr") +class XLMRModel(RobertaModel): + @classmethod + def hub_models(cls): + return { + "xlmr.base": "http://dl.fbaipublicfiles.com/fairseq/models/xlmr.base.tar.gz", + "xlmr.large": "http://dl.fbaipublicfiles.com/fairseq/models/xlmr.large.tar.gz", + "xlmr.xl": "http://dl.fbaipublicfiles.com/fairseq/models/xlmr/xlmr.xl.tar.gz", + "xlmr.xxl": "http://dl.fbaipublicfiles.com/fairseq/models/xlmr/xlmr.xxl.tar.gz", + } + + @classmethod + def from_pretrained( + cls, + model_name_or_path, + checkpoint_file="model.pt", + data_name_or_path=".", + bpe="sentencepiece", + **kwargs + ): + from fairseq import hub_utils + + x = hub_utils.from_pretrained( + model_name_or_path, + checkpoint_file, + data_name_or_path, + archive_map=cls.hub_models(), + bpe=bpe, + load_checkpoint_heads=True, + **kwargs, + ) + return RobertaHubInterface(x["args"], x["task"], x["models"][0]) diff --git a/SpeechT5/fairseq/fairseq/models/speech_to_text/__init__.py b/SpeechT5/fairseq/fairseq/models/speech_to_text/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..c6ae9b17ba37a228163fddcb6fed199e61ef02c8 --- /dev/null +++ b/SpeechT5/fairseq/fairseq/models/speech_to_text/__init__.py @@ -0,0 +1,8 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. 
+ +from .berard import * # noqa +from .convtransformer import * # noqa +from .s2t_transformer import * # noqa diff --git a/SpeechT5/fairseq/fairseq/models/speech_to_text/berard.py b/SpeechT5/fairseq/fairseq/models/speech_to_text/berard.py new file mode 100644 index 0000000000000000000000000000000000000000..c505e3acaa84e5f3263ccbfaf9556f77123f09fc --- /dev/null +++ b/SpeechT5/fairseq/fairseq/models/speech_to_text/berard.py @@ -0,0 +1,606 @@ +#!/usr/bin/env python3 + +from ast import literal_eval +from typing import List, Tuple + +import torch +import torch.nn as nn +import torch.nn.functional as F +from fairseq import checkpoint_utils, utils +from fairseq.data.data_utils import lengths_to_padding_mask +from fairseq.models import ( + FairseqEncoder, + FairseqEncoderDecoderModel, + FairseqIncrementalDecoder, + register_model, + register_model_architecture, +) + + +@register_model("s2t_berard") +class BerardModel(FairseqEncoderDecoderModel): + """Implementation of a model similar to https://arxiv.org/abs/1802.04200 + + Paper title: End-to-End Automatic Speech Translation of Audiobooks + An implementation is available in tensorflow at + https://github.com/eske/seq2seq + Relevant files in this implementation are the config + (https://github.com/eske/seq2seq/blob/master/config/LibriSpeech/AST.yaml) + and the model code + (https://github.com/eske/seq2seq/blob/master/translate/models.py). + The encoder and decoder try to be close to the original implementation. + The attention is an MLP as in Bahdanau et al. + (https://arxiv.org/abs/1409.0473). + There is no state initialization by averaging the encoder outputs. + """ + + def __init__(self, encoder, decoder): + super().__init__(encoder, decoder) + + @staticmethod + def add_args(parser): + parser.add_argument( + "--input-layers", + type=str, + metavar="EXPR", + help="List of linear layer dimensions. These " + "layers are applied to the input features and " + "are followed by tanh and possibly dropout.", + ) + parser.add_argument( + "--dropout", + type=float, + metavar="D", + help="Dropout probability to use in the encoder/decoder. " + "Note that this parameters control dropout in various places, " + "there is no fine-grained control for dropout for embeddings " + "vs LSTM layers for example.", + ) + parser.add_argument( + "--in-channels", + type=int, + metavar="N", + help="Number of encoder input channels. " "Typically value is 1.", + ) + parser.add_argument( + "--conv-layers", + type=str, + metavar="EXPR", + help="List of conv layers " "(format: (channels, kernel, stride)).", + ) + parser.add_argument( + "--num-blstm-layers", + type=int, + metavar="N", + help="Number of encoder bi-LSTM layers.", + ) + parser.add_argument( + "--lstm-size", type=int, metavar="N", help="LSTM hidden size." 
+ ) + parser.add_argument( + "--decoder-embed-dim", + type=int, + metavar="N", + help="Embedding dimension of the decoder target tokens.", + ) + parser.add_argument( + "--decoder-hidden-dim", + type=int, + metavar="N", + help="Decoder LSTM hidden dimension.", + ) + parser.add_argument( + "--decoder-num-layers", + type=int, + metavar="N", + help="Number of decoder LSTM layers.", + ) + parser.add_argument( + "--attention-dim", + type=int, + metavar="N", + help="Hidden layer dimension in MLP attention.", + ) + parser.add_argument( + "--output-layer-dim", + type=int, + metavar="N", + help="Hidden layer dim for linear layer prior to output projection.", + ) + parser.add_argument( + "--load-pretrained-encoder-from", + type=str, + metavar="STR", + help="model to take encoder weights from (for initialization)", + ) + parser.add_argument( + "--load-pretrained-decoder-from", + type=str, + metavar="STR", + help="model to take decoder weights from (for initialization)", + ) + + @classmethod + def build_encoder(cls, args, task): + encoder = BerardEncoder( + input_layers=literal_eval(args.input_layers), + conv_layers=literal_eval(args.conv_layers), + in_channels=args.input_channels, + input_feat_per_channel=args.input_feat_per_channel, + num_blstm_layers=args.num_blstm_layers, + lstm_size=args.lstm_size, + dropout=args.dropout, + ) + if getattr(args, "load_pretrained_encoder_from", None): + encoder = checkpoint_utils.load_pretrained_component_from_model( + component=encoder, checkpoint=args.load_pretrained_encoder_from + ) + return encoder + + @classmethod + def build_decoder(cls, args, task): + decoder = LSTMDecoder( + dictionary=task.target_dictionary, + embed_dim=args.decoder_embed_dim, + num_layers=args.decoder_num_layers, + hidden_size=args.decoder_hidden_dim, + dropout=args.dropout, + encoder_output_dim=2 * args.lstm_size, # bidirectional + attention_dim=args.attention_dim, + output_layer_dim=args.output_layer_dim, + ) + if getattr(args, "load_pretrained_decoder_from", None): + decoder = checkpoint_utils.load_pretrained_component_from_model( + component=decoder, checkpoint=args.load_pretrained_decoder_from + ) + return decoder + + @classmethod + def build_model(cls, args, task): + """Build a new model instance.""" + encoder = cls.build_encoder(args, task) + decoder = cls.build_decoder(args, task) + + return cls(encoder, decoder) + + def get_normalized_probs(self, net_output, log_probs, sample=None): + # net_output['encoder_out'] is a (B, T, D) tensor + lprobs = super().get_normalized_probs(net_output, log_probs, sample) + # lprobs is a (B, T, D) tensor + lprobs.batch_first = True + return lprobs + + +class BerardEncoder(FairseqEncoder): + def __init__( + self, + input_layers: List[int], + conv_layers: List[Tuple[int]], + in_channels: int, + input_feat_per_channel: int, + num_blstm_layers: int, + lstm_size: int, + dropout: float, + ): + """ + Args: + input_layers: list of linear layer dimensions. These layers are + applied to the input features and are followed by tanh and + possibly dropout. + conv_layers: list of conv2d layer configurations. A configuration is + a tuple (out_channels, conv_kernel_size, stride). + in_channels: number of input channels. + input_feat_per_channel: number of input features per channel. These + are speech features, typically 40 or 80. + num_blstm_layers: number of bidirectional LSTM layers. + lstm_size: size of the LSTM hidden (and cell) size. + dropout: dropout probability. 
Dropout can be applied after the + linear layers and LSTM layers but not to the convolutional + layers. + """ + super().__init__(None) + + self.input_layers = nn.ModuleList() + in_features = input_feat_per_channel + for out_features in input_layers: + if dropout > 0: + self.input_layers.append( + nn.Sequential( + nn.Linear(in_features, out_features), nn.Dropout(p=dropout) + ) + ) + else: + self.input_layers.append(nn.Linear(in_features, out_features)) + in_features = out_features + + self.in_channels = in_channels + self.input_dim = input_feat_per_channel + self.conv_kernel_sizes_and_strides = [] + self.conv_layers = nn.ModuleList() + lstm_input_dim = input_layers[-1] + for conv_layer in conv_layers: + out_channels, conv_kernel_size, conv_stride = conv_layer + self.conv_layers.append( + nn.Conv2d( + in_channels, + out_channels, + conv_kernel_size, + stride=conv_stride, + padding=conv_kernel_size // 2, + ) + ) + self.conv_kernel_sizes_and_strides.append((conv_kernel_size, conv_stride)) + in_channels = out_channels + lstm_input_dim //= conv_stride + + lstm_input_dim *= conv_layers[-1][0] + self.lstm_size = lstm_size + self.num_blstm_layers = num_blstm_layers + self.lstm = nn.LSTM( + input_size=lstm_input_dim, + hidden_size=lstm_size, + num_layers=num_blstm_layers, + dropout=dropout, + bidirectional=True, + ) + self.output_dim = 2 * lstm_size # bidirectional + if dropout > 0: + self.dropout = nn.Dropout(p=dropout) + else: + self.dropout = None + + def forward(self, src_tokens, src_lengths=None, **kwargs): + """ + Args + src_tokens: padded tensor (B, T, C * feat) + src_lengths: tensor of original lengths of input utterances (B,) + """ + bsz, max_seq_len, _ = src_tokens.size() + # (B, C, T, feat) + x = ( + src_tokens.view(bsz, max_seq_len, self.in_channels, self.input_dim) + .transpose(1, 2) + .contiguous() + ) + + for input_layer in self.input_layers: + x = input_layer(x) + x = torch.tanh(x) + + for conv_layer in self.conv_layers: + x = conv_layer(x) + + bsz, _, output_seq_len, _ = x.size() + + # (B, C, T, feat) -> (B, T, C, feat) -> (T, B, C, feat) -> + # (T, B, C * feat) + x = x.transpose(1, 2).transpose(0, 1).contiguous().view(output_seq_len, bsz, -1) + + input_lengths = src_lengths.clone() + for k, s in self.conv_kernel_sizes_and_strides: + p = k // 2 + input_lengths = (input_lengths.float() + 2 * p - k) / s + 1 + input_lengths = input_lengths.floor().long() + + packed_x = nn.utils.rnn.pack_padded_sequence(x, input_lengths) + + h0 = x.new(2 * self.num_blstm_layers, bsz, self.lstm_size).zero_() + c0 = x.new(2 * self.num_blstm_layers, bsz, self.lstm_size).zero_() + packed_outs, _ = self.lstm(packed_x, (h0, c0)) + + # unpack outputs and apply dropout + x, output_lengths = nn.utils.rnn.pad_packed_sequence(packed_outs) + if self.dropout is not None: + x = self.dropout(x) + + encoder_padding_mask = ( + lengths_to_padding_mask(output_lengths).to(src_tokens.device).t() + ) + + return { + "encoder_out": x, # (T, B, C) + "encoder_padding_mask": encoder_padding_mask, # (T, B) + } + + def reorder_encoder_out(self, encoder_out, new_order): + encoder_out["encoder_out"] = encoder_out["encoder_out"].index_select( + 1, new_order + ) + encoder_out["encoder_padding_mask"] = encoder_out[ + "encoder_padding_mask" + ].index_select(1, new_order) + return encoder_out + + +class MLPAttention(nn.Module): + """The original attention from Badhanau et al. (2014) + + https://arxiv.org/abs/1409.0473, based on a Multi-Layer Perceptron. 
+ The attention score between position i in the encoder and position j in the + decoder is: alpha_ij = V_a * tanh(W_ae * enc_i + W_ad * dec_j + b_a) + """ + + def __init__(self, decoder_hidden_state_dim, context_dim, attention_dim): + super().__init__() + + self.context_dim = context_dim + self.attention_dim = attention_dim + # W_ae and b_a + self.encoder_proj = nn.Linear(context_dim, self.attention_dim, bias=True) + # W_ad + self.decoder_proj = nn.Linear( + decoder_hidden_state_dim, self.attention_dim, bias=False + ) + # V_a + self.to_scores = nn.Linear(self.attention_dim, 1, bias=False) + + def forward(self, decoder_state, source_hids, encoder_padding_mask): + """The expected input dimensions are: + decoder_state: bsz x decoder_hidden_state_dim + source_hids: src_len x bsz x context_dim + encoder_padding_mask: src_len x bsz + """ + src_len, bsz, _ = source_hids.size() + # (src_len*bsz) x context_dim (to feed through linear) + flat_source_hids = source_hids.view(-1, self.context_dim) + # (src_len*bsz) x attention_dim + encoder_component = self.encoder_proj(flat_source_hids) + # src_len x bsz x attention_dim + encoder_component = encoder_component.view(src_len, bsz, self.attention_dim) + # 1 x bsz x attention_dim + decoder_component = self.decoder_proj(decoder_state).unsqueeze(0) + # Sum with broadcasting and apply the non linearity + # src_len x bsz x attention_dim + hidden_att = torch.tanh( + (decoder_component + encoder_component).view(-1, self.attention_dim) + ) + # Project onto the reals to get attentions scores (src_len x bsz) + attn_scores = self.to_scores(hidden_att).view(src_len, bsz) + + # Mask + softmax (src_len x bsz) + if encoder_padding_mask is not None: + attn_scores = ( + attn_scores.float() + .masked_fill_(encoder_padding_mask, float("-inf")) + .type_as(attn_scores) + ) # FP16 support: cast to float and back + # srclen x bsz + normalized_masked_attn_scores = F.softmax(attn_scores, dim=0) + + # Sum weighted sources (bsz x context_dim) + attn_weighted_context = ( + source_hids * normalized_masked_attn_scores.unsqueeze(2) + ).sum(dim=0) + + return attn_weighted_context, normalized_masked_attn_scores + + +class LSTMDecoder(FairseqIncrementalDecoder): + def __init__( + self, + dictionary, + embed_dim, + num_layers, + hidden_size, + dropout, + encoder_output_dim, + attention_dim, + output_layer_dim, + ): + """ + Args: + dictionary: target text dictionary. + embed_dim: embedding dimension for target tokens. + num_layers: number of LSTM layers. + hidden_size: hidden size for LSTM layers. + dropout: dropout probability. Dropout can be applied to the + embeddings, the LSTM layers, and the context vector. + encoder_output_dim: encoder output dimension (hidden size of + encoder LSTM). + attention_dim: attention dimension for MLP attention. + output_layer_dim: size of the linear layer prior to output + projection. 
+ """ + super().__init__(dictionary) + self.num_layers = num_layers + self.hidden_size = hidden_size + num_embeddings = len(dictionary) + padding_idx = dictionary.pad() + self.embed_tokens = nn.Embedding(num_embeddings, embed_dim, padding_idx) + if dropout > 0: + self.dropout = nn.Dropout(p=dropout) + else: + self.dropout = None + + self.layers = nn.ModuleList() + for layer_id in range(num_layers): + input_size = embed_dim if layer_id == 0 else encoder_output_dim + self.layers.append( + nn.LSTMCell(input_size=input_size, hidden_size=hidden_size) + ) + + self.context_dim = encoder_output_dim + self.attention = MLPAttention( + decoder_hidden_state_dim=hidden_size, + context_dim=encoder_output_dim, + attention_dim=attention_dim, + ) + + self.deep_output_layer = nn.Linear( + hidden_size + encoder_output_dim + embed_dim, output_layer_dim + ) + self.output_projection = nn.Linear(output_layer_dim, num_embeddings) + + def forward( + self, prev_output_tokens, encoder_out=None, incremental_state=None, **kwargs + ): + encoder_padding_mask = encoder_out["encoder_padding_mask"] + encoder_outs = encoder_out["encoder_out"] + + if incremental_state is not None: + prev_output_tokens = prev_output_tokens[:, -1:] + bsz, seqlen = prev_output_tokens.size() + + srclen = encoder_outs.size(0) + + # embed tokens + embeddings = self.embed_tokens(prev_output_tokens) + x = embeddings + if self.dropout is not None: + x = self.dropout(x) + + # B x T x C -> T x B x C + x = x.transpose(0, 1) + + # initialize previous states (or get from cache during incremental + # generation) + cached_state = utils.get_incremental_state( + self, incremental_state, "cached_state" + ) + if cached_state is not None: + prev_hiddens, prev_cells = cached_state + else: + prev_hiddens = [encoder_out["encoder_out"].mean(dim=0)] * self.num_layers + prev_cells = [x.new_zeros(bsz, self.hidden_size)] * self.num_layers + + attn_scores = x.new_zeros(bsz, srclen) + attention_outs = [] + outs = [] + for j in range(seqlen): + input = x[j, :, :] + attention_out = None + for i, layer in enumerate(self.layers): + # the previous state is one layer below except for the bottom + # layer where the previous state is the state emitted by the + # top layer + hidden, cell = layer( + input, + ( + prev_hiddens[(i - 1) % self.num_layers], + prev_cells[(i - 1) % self.num_layers], + ), + ) + if self.dropout is not None: + hidden = self.dropout(hidden) + prev_hiddens[i] = hidden + prev_cells[i] = cell + if attention_out is None: + attention_out, attn_scores = self.attention( + hidden, encoder_outs, encoder_padding_mask + ) + if self.dropout is not None: + attention_out = self.dropout(attention_out) + attention_outs.append(attention_out) + input = attention_out + + # collect the output of the top layer + outs.append(hidden) + + # cache previous states (no-op except during incremental generation) + utils.set_incremental_state( + self, incremental_state, "cached_state", (prev_hiddens, prev_cells) + ) + + # collect outputs across time steps + x = torch.cat(outs, dim=0).view(seqlen, bsz, self.hidden_size) + attention_outs_concat = torch.cat(attention_outs, dim=0).view( + seqlen, bsz, self.context_dim + ) + + # T x B x C -> B x T x C + x = x.transpose(0, 1) + attention_outs_concat = attention_outs_concat.transpose(0, 1) + + # concat LSTM output, attention output and embedding + # before output projection + x = torch.cat((x, attention_outs_concat, embeddings), dim=2) + x = self.deep_output_layer(x) + x = torch.tanh(x) + if self.dropout is not None: + x = self.dropout(x) + # 
project back to size of vocabulary + x = self.output_projection(x) + + # to return the full attn_scores tensor, we need to fix the decoder + # to account for subsampling input frames + # return x, attn_scores + return x, None + + def reorder_incremental_state(self, incremental_state, new_order): + super().reorder_incremental_state(incremental_state, new_order) + cached_state = utils.get_incremental_state( + self, incremental_state, "cached_state" + ) + if cached_state is None: + return + + def reorder_state(state): + if isinstance(state, list): + return [reorder_state(state_i) for state_i in state] + return state.index_select(0, new_order) + + new_state = tuple(map(reorder_state, cached_state)) + utils.set_incremental_state(self, incremental_state, "cached_state", new_state) + + +@register_model_architecture(model_name="s2t_berard", arch_name="s2t_berard") +def berard(args): + """The original version: "End-to-End Automatic Speech Translation of + Audiobooks" (https://arxiv.org/abs/1802.04200) + """ + args.input_layers = getattr(args, "input_layers", "[256, 128]") + args.conv_layers = getattr(args, "conv_layers", "[(16, 3, 2), (16, 3, 2)]") + args.num_blstm_layers = getattr(args, "num_blstm_layers", 3) + args.lstm_size = getattr(args, "lstm_size", 256) + args.dropout = getattr(args, "dropout", 0.2) + args.decoder_embed_dim = getattr(args, "decoder_embed_dim", 128) + args.decoder_num_layers = getattr(args, "decoder_num_layers", 2) + args.decoder_hidden_dim = getattr(args, "decoder_hidden_dim", 512) + args.attention_dim = getattr(args, "attention_dim", 512) + args.output_layer_dim = getattr(args, "output_layer_dim", 128) + args.load_pretrained_encoder_from = getattr( + args, "load_pretrained_encoder_from", None + ) + args.load_pretrained_decoder_from = getattr( + args, "load_pretrained_decoder_from", None + ) + + +@register_model_architecture(model_name="s2t_berard", arch_name="s2t_berard_256_3_3") +def berard_256_3_3(args): + """Used in + * "Harnessing Indirect Training Data for End-to-End Automatic Speech + Translation: Tricks of the Trade" (https://arxiv.org/abs/1909.06515) + * "CoVoST: A Diverse Multilingual Speech-To-Text Translation Corpus" + (https://arxiv.org/pdf/2002.01320.pdf) + * "Self-Supervised Representations Improve End-to-End Speech Translation" + (https://arxiv.org/abs/2006.12124) + """ + args.decoder_num_layers = getattr(args, "decoder_num_layers", 3) + berard(args) + + +@register_model_architecture(model_name="s2t_berard", arch_name="s2t_berard_512_3_2") +def berard_512_3_2(args): + args.num_blstm_layers = getattr(args, "num_blstm_layers", 3) + args.lstm_size = getattr(args, "lstm_size", 512) + args.dropout = getattr(args, "dropout", 0.3) + args.decoder_embed_dim = getattr(args, "decoder_embed_dim", 256) + args.decoder_num_layers = getattr(args, "decoder_num_layers", 2) + args.decoder_hidden_dim = getattr(args, "decoder_hidden_dim", 1024) + args.attention_dim = getattr(args, "attention_dim", 512) + args.output_layer_dim = getattr(args, "output_layer_dim", 256) + berard(args) + + +@register_model_architecture(model_name="s2t_berard", arch_name="s2t_berard_512_5_3") +def berard_512_5_3(args): + args.num_blstm_layers = getattr(args, "num_blstm_layers", 5) + args.lstm_size = getattr(args, "lstm_size", 512) + args.dropout = getattr(args, "dropout", 0.3) + args.decoder_embed_dim = getattr(args, "decoder_embed_dim", 256) + args.decoder_num_layers = getattr(args, "decoder_num_layers", 3) + args.decoder_hidden_dim = getattr(args, "decoder_hidden_dim", 1024) + args.attention_dim = 
getattr(args, "attention_dim", 512) + args.output_layer_dim = getattr(args, "output_layer_dim", 256) + berard(args) diff --git a/SpeechT5/fairseq/fairseq/models/speech_to_text/convtransformer.py b/SpeechT5/fairseq/fairseq/models/speech_to_text/convtransformer.py new file mode 100644 index 0000000000000000000000000000000000000000..eba000d7b0826d2ecf5dc471156f8f8cc9f5e402 --- /dev/null +++ b/SpeechT5/fairseq/fairseq/models/speech_to_text/convtransformer.py @@ -0,0 +1,448 @@ +#!/usr/bin/env python3 + +import logging +import math +from typing import Dict, List, Optional, Tuple + +import torch +import torch.nn as nn +import torch.nn.functional as F +from fairseq import checkpoint_utils, utils +from fairseq.data.data_utils import lengths_to_padding_mask +from fairseq.models import ( + FairseqEncoder, + FairseqEncoderDecoderModel, + register_model, + register_model_architecture, +) +from fairseq.models.transformer import Embedding, TransformerDecoder +from fairseq.modules import LayerNorm, PositionalEmbedding, TransformerEncoderLayer +from torch import Tensor + +logger = logging.getLogger(__name__) + + +@register_model("convtransformer") +class ConvTransformerModel(FairseqEncoderDecoderModel): + """ + Transformer-based Speech translation model from ESPNet-ST + https://arxiv.org/abs/2004.10234 + """ + + def __init__(self, encoder, decoder): + super().__init__(encoder, decoder) + + @staticmethod + def add_args(parser): + """Add model-specific arguments to the parser.""" + parser.add_argument( + "--input-feat-per-channel", + type=int, + metavar="N", + help="encoder input dimension per input channel", + ) + parser.add_argument( + "--activation-fn", + choices=utils.get_available_activation_fns(), + help="activation function to use", + ) + parser.add_argument( + "--dropout", type=float, metavar="D", help="dropout probability" + ) + parser.add_argument( + "--attention-dropout", + type=float, + metavar="D", + help="dropout probability for attention weights", + ) + parser.add_argument( + "--activation-dropout", + "--relu-dropout", + type=float, + metavar="D", + help="dropout probability after activation in FFN.", + ) + parser.add_argument( + "--encoder-embed-dim", + type=int, + metavar="N", + help="encoder embedding dimension", + ) + parser.add_argument( + "--encoder-ffn-embed-dim", + type=int, + metavar="N", + help="encoder embedding dimension for FFN", + ) + parser.add_argument( + "--encoder-layers", type=int, metavar="N", help="num encoder layers" + ) + parser.add_argument( + "--encoder-attention-heads", + type=int, + metavar="N", + help="num encoder attention heads", + ) + parser.add_argument( + "--encoder-normalize-before", + action="store_true", + help="apply layernorm before each encoder block", + ) + parser.add_argument( + "--decoder-embed-dim", + type=int, + metavar="N", + help="decoder embedding dimension", + ) + parser.add_argument( + "--decoder-ffn-embed-dim", + type=int, + metavar="N", + help="decoder embedding dimension for FFN", + ) + parser.add_argument( + "--decoder-layers", type=int, metavar="N", help="num decoder layers" + ) + parser.add_argument( + "--decoder-attention-heads", + type=int, + metavar="N", + help="num decoder attention heads", + ) + parser.add_argument( + "--decoder-normalize-before", + action="store_true", + help="apply layernorm before each decoder block", + ) + parser.add_argument( + "--decoder-output-dim", + type=int, + metavar="N", + help="decoder output dimension (extra linear layer if different from decoder embed dim)", + ) + parser.add_argument( + 
"--share-decoder-input-output-embed", + action="store_true", + help="share decoder input and output embeddings", + ) + parser.add_argument( + "--layernorm-embedding", + action="store_true", + help="add layernorm to embedding", + ) + parser.add_argument( + "--no-scale-embedding", + action="store_true", + help="if True, dont scale embeddings", + ) + parser.add_argument( + "--load-pretrained-encoder-from", + type=str, + metavar="STR", + help="model to take encoder weights from (for initialization)", + ) + parser.add_argument( + "--load-pretrained-decoder-from", + type=str, + metavar="STR", + help="model to take decoder weights from (for initialization)", + ) + parser.add_argument( + "--conv-out-channels", + type=int, + metavar="INT", + help="the number of output channels of conv layer", + ) + + @classmethod + def build_encoder(cls, args): + encoder = ConvTransformerEncoder(args) + if getattr(args, "load_pretrained_encoder_from", None): + encoder = checkpoint_utils.load_pretrained_component_from_model( + component=encoder, checkpoint=args.load_pretrained_encoder_from + ) + return encoder + + @classmethod + def build_decoder(cls, args, task, embed_tokens): + decoder = TransformerDecoderNoExtra(args, task.target_dictionary, embed_tokens) + if getattr(args, "load_pretrained_decoder_from", None): + decoder = checkpoint_utils.load_pretrained_component_from_model( + component=decoder, checkpoint=args.load_pretrained_decoder_from + ) + return decoder + + @classmethod + def build_model(cls, args, task): + """Build a new model instance.""" + + # make sure all arguments are present in older models + base_architecture(args) + + def build_embedding(dictionary, embed_dim): + num_embeddings = len(dictionary) + padding_idx = dictionary.pad() + return Embedding(num_embeddings, embed_dim, padding_idx) + + decoder_embed_tokens = build_embedding( + task.target_dictionary, args.decoder_embed_dim + ) + encoder = cls.build_encoder(args) + decoder = cls.build_decoder(args, task, decoder_embed_tokens) + return cls(encoder, decoder) + + @staticmethod + @torch.jit.unused + def set_batch_first(lprobs): + lprobs.batch_first = True + + def get_normalized_probs( + self, + net_output: Tuple[Tensor, Optional[Dict[str, List[Optional[Tensor]]]]], + log_probs: bool, + sample: Optional[Dict[str, Tensor]] = None, + ): + # net_output['encoder_out'] is a (B, T, D) tensor + lprobs = self.get_normalized_probs_scriptable(net_output, log_probs, sample) + if self.training: + self.set_batch_first(lprobs) + return lprobs + + def output_layout(self): + return "BTD" + + """ + The forward method inherited from the base class has a **kwargs argument in + its input, which is not supported in torchscript. This method overrites the forward + method definition without **kwargs. 
+ """ + + def forward(self, src_tokens, src_lengths, prev_output_tokens): + encoder_out = self.encoder(src_tokens=src_tokens, src_lengths=src_lengths) + decoder_out = self.decoder( + prev_output_tokens=prev_output_tokens, encoder_out=encoder_out + ) + return decoder_out + + +class ConvTransformerEncoder(FairseqEncoder): + """Conv + Transformer encoder""" + + def __init__(self, args): + """Construct an Encoder object.""" + super().__init__(None) + + self.dropout = args.dropout + self.embed_scale = ( + 1.0 if args.no_scale_embedding else math.sqrt(args.encoder_embed_dim) + ) + self.padding_idx = 1 + self.in_channels = 1 + self.input_dim = args.input_feat_per_channel + self.conv = torch.nn.Sequential( + torch.nn.Conv2d(1, args.conv_out_channels, 3, stride=2, padding=3 // 2), + torch.nn.ReLU(), + torch.nn.Conv2d( + args.conv_out_channels, + args.conv_out_channels, + 3, + stride=2, + padding=3 // 2, + ), + torch.nn.ReLU(), + ) + transformer_input_dim = self.infer_conv_output_dim( + self.in_channels, self.input_dim, args.conv_out_channels + ) + self.out = torch.nn.Linear(transformer_input_dim, args.encoder_embed_dim) + self.embed_positions = PositionalEmbedding( + args.max_source_positions, + args.encoder_embed_dim, + self.padding_idx, + learned=False, + ) + + self.transformer_layers = nn.ModuleList([]) + self.transformer_layers.extend( + [TransformerEncoderLayer(args) for i in range(args.encoder_layers)] + ) + if args.encoder_normalize_before: + self.layer_norm = LayerNorm(args.encoder_embed_dim) + else: + self.layer_norm = None + + def pooling_ratio(self): + return 4 + + def infer_conv_output_dim(self, in_channels, input_dim, out_channels): + sample_seq_len = 200 + sample_bsz = 10 + x = torch.randn(sample_bsz, in_channels, sample_seq_len, input_dim) + x = torch.nn.Conv2d(1, out_channels, 3, stride=2, padding=3 // 2)(x) + x = torch.nn.Conv2d(out_channels, out_channels, 3, stride=2, padding=3 // 2)(x) + x = x.transpose(1, 2) + mb, seq = x.size()[:2] + return x.contiguous().view(mb, seq, -1).size(-1) + + def forward(self, src_tokens, src_lengths): + """Encode input sequence. 
+ :param torch.Tensor xs: input tensor + :param torch.Tensor masks: input mask + :return: position embedded tensor and mask + :rtype Tuple[torch.Tensor, torch.Tensor]: + """ + bsz, max_seq_len, _ = src_tokens.size() + x = ( + src_tokens.view(bsz, max_seq_len, self.in_channels, self.input_dim) + .transpose(1, 2) + .contiguous() + ) + x = self.conv(x) + bsz, _, output_seq_len, _ = x.size() + x = x.transpose(1, 2).transpose(0, 1).contiguous().view(output_seq_len, bsz, -1) + x = self.out(x) + x = self.embed_scale * x + + subsampling_factor = int(max_seq_len * 1.0 / output_seq_len + 0.5) + input_len_0 = (src_lengths.float() / subsampling_factor).ceil().long() + input_len_1 = x.size(0) * torch.ones([src_lengths.size(0)]).long().to( + input_len_0.device + ) + input_lengths = torch.min(input_len_0, input_len_1) + + encoder_padding_mask = lengths_to_padding_mask(input_lengths) + + positions = self.embed_positions(encoder_padding_mask).transpose(0, 1) + x += positions + x = F.dropout(x, p=self.dropout, training=self.training) + + for layer in self.transformer_layers: + x = layer(x, encoder_padding_mask) + + if not encoder_padding_mask.any(): + maybe_encoder_padding_mask = None + else: + maybe_encoder_padding_mask = encoder_padding_mask + + return { + "encoder_out": [x], + "encoder_padding_mask": [maybe_encoder_padding_mask] + if maybe_encoder_padding_mask is not None + else [], + "encoder_embedding": [], + "encoder_states": [], + "src_tokens": [], + "src_lengths": [], + } + + @torch.jit.export + def reorder_encoder_out(self, encoder_out: Dict[str, List[Tensor]], new_order): + """ + Reorder encoder output according to *new_order*. + + Args: + encoder_out: output from the ``forward()`` method + new_order (LongTensor): desired order + + Returns: + *encoder_out* rearranged according to *new_order* + """ + new_encoder_out = [encoder_out["encoder_out"][0].index_select(1, new_order)] + if len(encoder_out["encoder_padding_mask"]) == 0: + new_encoder_padding_mask = [] + else: + new_encoder_padding_mask = [ + (encoder_out["encoder_padding_mask"][0]).index_select(0, new_order) + ] + if len(encoder_out["encoder_embedding"]) == 0: + new_encoder_embedding = [] + else: + new_encoder_embedding = [ + (encoder_out["encoder_embedding"][0]).index_select(0, new_order) + ] + encoder_states = encoder_out["encoder_states"] + if len(encoder_states) > 0: + for idx, state in enumerate(encoder_states): + encoder_states[idx] = state.index_select(1, new_order) + + return { + "encoder_out": new_encoder_out, + "encoder_padding_mask": new_encoder_padding_mask, + "encoder_embedding": new_encoder_embedding, + "encoder_states": encoder_states, + "src_tokens": [], + "src_lengths": [], + } + + +class TransformerDecoderNoExtra(TransformerDecoder): + def extract_features( + self, + prev_output_tokens, + encoder_out: Optional[Dict[str, List[Tensor]]], + incremental_state: Optional[Dict[str, Dict[str, Optional[Tensor]]]] = None, + full_context_alignment: bool = False, + alignment_layer: Optional[int] = None, + alignment_heads: Optional[int] = None, + ): + # call scriptable method from parent class + x, _ = self.extract_features_scriptable( + prev_output_tokens, + encoder_out, + incremental_state, + full_context_alignment, + alignment_layer, + alignment_heads, + ) + return x, None + + +@register_model_architecture(model_name="convtransformer", arch_name="convtransformer") +def base_architecture(args): + args.input_feat_per_channel = getattr(args, "input_feat_per_channel", 80) + args.encoder_embed_dim = getattr(args, "encoder_embed_dim", 512) 
+ args.encoder_ffn_embed_dim = getattr(args, "encoder_ffn_embed_dim", 2048) + args.encoder_layers = getattr(args, "encoder_layers", 6) + args.encoder_attention_heads = getattr(args, "encoder_attention_heads", 8) + args.encoder_normalize_before = getattr(args, "encoder_normalize_before", False) + args.decoder_embed_dim = getattr(args, "decoder_embed_dim", args.encoder_embed_dim) + args.decoder_ffn_embed_dim = getattr( + args, "decoder_ffn_embed_dim", args.encoder_ffn_embed_dim + ) + args.decoder_layers = getattr(args, "decoder_layers", 6) + args.decoder_attention_heads = getattr(args, "decoder_attention_heads", 8) + args.decoder_normalize_before = getattr(args, "decoder_normalize_before", False) + args.decoder_learned_pos = getattr(args, "decoder_learned_pos", False) + args.attention_dropout = getattr(args, "attention_dropout", 0.0) + args.activation_dropout = getattr(args, "activation_dropout", 0.0) + args.activation_fn = getattr(args, "activation_fn", "relu") + args.dropout = getattr(args, "dropout", 0.1) + args.adaptive_softmax_cutoff = getattr(args, "adaptive_softmax_cutoff", None) + args.adaptive_softmax_dropout = getattr(args, "adaptive_softmax_dropout", 0) + args.share_decoder_input_output_embed = getattr( + args, "share_decoder_input_output_embed", False + ) + args.no_token_positional_embeddings = getattr( + args, "no_token_positional_embeddings", False + ) + args.adaptive_input = getattr(args, "adaptive_input", False) + args.decoder_layerdrop = getattr(args, "decoder_layerdrop", 0.0) + + args.decoder_output_dim = getattr( + args, "decoder_output_dim", args.decoder_embed_dim + ) + args.decoder_input_dim = getattr(args, "decoder_input_dim", args.decoder_embed_dim) + args.no_scale_embedding = getattr(args, "no_scale_embedding", False) + args.quant_noise_pq = getattr(args, "quant_noise_pq", 0) + args.max_source_positions = getattr(args, "max_source_positions", 3000) + args.max_target_positions = getattr(args, "max_target_positions", 1024) + args.tie_adaptive_weights = getattr(args, "tie_adaptive_weights", False) + args.conv_out_channels = getattr(args, "conv_out_channels", args.encoder_embed_dim) + + +@register_model_architecture("convtransformer", "convtransformer_espnet") +def convtransformer_espnet(args): + args.encoder_embed_dim = getattr(args, "encoder_embed_dim", 256) + args.encoder_layers = getattr(args, "encoder_layers", 12) + args.encoder_attention_heads = getattr(args, "encoder_attention_heads", 4) + args.decoder_attention_heads = getattr(args, "decoder_attention_heads", 4) diff --git a/SpeechT5/fairseq/fairseq/models/speech_to_text/modules/augmented_memory_attention.py b/SpeechT5/fairseq/fairseq/models/speech_to_text/modules/augmented_memory_attention.py new file mode 100644 index 0000000000000000000000000000000000000000..e7465bc889fd1ba6ca2c60905a2eb6ff5cc62b9d --- /dev/null +++ b/SpeechT5/fairseq/fairseq/models/speech_to_text/modules/augmented_memory_attention.py @@ -0,0 +1,488 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. 
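The two stride-2 convolutions in `ConvTransformerEncoder` shrink the time axis by roughly a factor of four (`pooling_ratio()` returns 4), and `forward()` re-derives the valid output lengths from the observed input/output length ratio instead of tracing padding through the convolutions. A small stand-alone sketch of that length bookkeeping, mirroring the formula in `forward()` with made-up shapes:

import torch
from fairseq.data.data_utils import lengths_to_padding_mask

# Pretend the conv front-end mapped 200 padded input frames to 50 output frames.
max_seq_len, output_seq_len = 200, 50
src_lengths = torch.tensor([200, 173, 96])  # true (unpadded) frame counts per utterance

subsampling_factor = int(max_seq_len * 1.0 / output_seq_len + 0.5)      # -> 4
input_len_0 = (src_lengths.float() / subsampling_factor).ceil().long()  # [50, 44, 24]
input_len_1 = output_seq_len * torch.ones_like(src_lengths)             # clamp to T_out
input_lengths = torch.min(input_len_0, input_len_1)

encoder_padding_mask = lengths_to_padding_mask(input_lengths)  # (B, T_out) bool, True at padding
print(input_lengths.tolist(), encoder_padding_mask.shape)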
+ +from typing import Tuple, List + +import torch +import torch.nn.functional as F +from fairseq.models import FairseqEncoder +from fairseq.models.speech_to_text import ( + ConvTransformerEncoder, +) +from fairseq.models.speech_to_text.utils import attention_suppression +from fairseq.models.speech_to_text.utils import ( + lengths_to_encoder_padding_mask, + segments_to_sequence, + sequence_to_segments, +) +from fairseq.modules import MultiheadAttention, TransformerEncoderLayer +from torch import nn, Tensor + +# ------------------------------------------------------------------------------ +# AugmentedMemoryConvTransformerEncoder +# ------------------------------------------------------------------------------ + + +class AugmentedMemoryConvTransformerEncoder(ConvTransformerEncoder): + def __init__(self, args): + super().__init__(args) + + args.encoder_stride = self.stride() + + self.left_context = args.left_context // args.encoder_stride + + self.right_context = args.right_context // args.encoder_stride + + self.left_context_after_stride = args.left_context // args.encoder_stride + self.right_context_after_stride = args.right_context // args.encoder_stride + + self.transformer_layers = nn.ModuleList([]) + self.transformer_layers.extend( + [ + AugmentedMemoryTransformerEncoderLayer(args) + for i in range(args.encoder_layers) + ] + ) + + def stride(self): + # Hard coded here. Should infer from convs in future + stride = 4 + return stride + + def forward(self, src_tokens, src_lengths, states=None): + """Encode input sequence. + :param torch.Tensor xs: input tensor + :param torch.Tensor masks: input mask + :return: position embedded tensor and mask + :rtype Tuple[torch.Tensor, torch.Tensor]: + """ + bsz, max_seq_len, _ = src_tokens.size() + x = ( + src_tokens.view(bsz, max_seq_len, self.in_channels, self.input_dim) + .transpose(1, 2) + .contiguous() + ) + x = self.conv(x) + bsz, _, output_seq_len, _ = x.size() + x = x.transpose(1, 2).transpose(0, 1).contiguous().view(output_seq_len, bsz, -1) + x = self.out(x) + x = self.embed_scale * x + + subsampling_factor = 1.0 * max_seq_len / output_seq_len + input_lengths = torch.max( + (src_lengths.float() / subsampling_factor).ceil().long(), + x.size(0) * src_lengths.new_ones([src_lengths.size(0)]).long(), + ) + + encoder_padding_mask, _ = lengths_to_encoder_padding_mask( + input_lengths, batch_first=True + ) + + # TODO: fix positional embedding + positions = self.embed_positions(encoder_padding_mask).transpose(0, 1) + + x += positions + x = F.dropout(x, p=self.dropout, training=self.training) + + # State to store memory banks etc. 
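+        # (Explanatory note: each transformer layer keeps its own state dict.
+        # "memory_banks" accumulates one summarization vector per already-processed
+        # segment, appended at the end of AugmentedMemoryMultiheadAttention.forward,
+        # and "encoder_states" caches the layer output with the left/right context
+        # frames stripped off, as filled in by the loop below.)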
+ if states is None: + states = [ + {"memory_banks": None, "encoder_states": None} + for i in range(len(self.transformer_layers)) + ] + + for i, layer in enumerate(self.transformer_layers): + # x size: + # (self.left_size + self.segment_size + self.right_size) + # / self.stride, num_heads, dim + # TODO: Consider mask here + x = layer(x, states[i]) + states[i]["encoder_states"] = x[ + self.left_context_after_stride : -self.right_context_after_stride + ] + + lengths = ( + ( + ~encoder_padding_mask[ + :, self.left_context_after_stride : -self.right_context_after_stride + ] + ) + .sum(dim=1, keepdim=True) + .long() + ) + + return states[-1]["encoder_states"], lengths, states + + +# ------------------------------------------------------------------------------ +# AugmentedMemoryTransformerEncoderLayer +# ------------------------------------------------------------------------------ +class AugmentedMemoryTransformerEncoderLayer(TransformerEncoderLayer): + def __init__(self, args): + super().__init__(args) + + self.left_context = args.left_context // args.encoder_stride + self.right_context = args.right_context // args.encoder_stride + + def forward(self, x, state): + + length, batch_size, x_dim = x.size() + + residual = x + + if self.normalize_before: + x = self.self_attn_layer_norm(x) + + # init_state + if state.get("memory_banks", None) is None: + state["memory_banks"] = [] + + # TODO reseach new sum_query method + seg_start = self.left_context + seg_end = length - self.right_context + if seg_start < seg_end: + summarization_query = torch.mean(x[seg_start:seg_end], keepdim=True, dim=0) + else: + summarization_query = x.new_zeros(1, batch_size, x_dim) + + x = torch.cat([x, summarization_query], dim=0) + + x = self.self_attn(input_and_summary=x, state=state) + + x = self.dropout_module(x) + x = residual + x + + if not self.normalize_before: + x = self.self_attn_layer_norm(x) + + residual = x + if self.normalize_before: + x = self.final_layer_norm(x) + + x = self.activation_fn(self.fc1(x)) + x = self.activation_dropout_module(x) + x = self.fc2(x) + x = self.dropout_module(x) + x = residual + x + if not self.normalize_before: + x = self.final_layer_norm(x) + + return x + + def build_self_attention(self, embed_dim, args): + return AugmentedMemoryMultiheadAttention( + embed_dim=embed_dim, + num_heads=args.encoder_attention_heads, + dropout=args.attention_dropout, + self_attention=True, + q_noise=self.quant_noise, + qn_block_size=self.quant_noise_block_size, + tanh_on_mem=True, + max_memory_size=args.max_memory_size, + ) + + +# ------------------------------------------------------------------------------ +# AugmentedMemoryMultiheadAttention +# ------------------------------------------------------------------------------ +class AugmentedMemoryMultiheadAttention(MultiheadAttention): + """ + Augmented Memory Attention from + Streaming Transformer-based Acoustic Models + Using Self-attention with Augmented Memory + https://arxiv.org/abs/2005.08042 + """ + + def __init__( + self, + embed_dim, + num_heads, + kdim=None, + vdim=None, + dropout=0.0, + bias=True, + add_bias_kv=False, + add_zero_attn=False, + self_attention=False, + encoder_decoder_attention=False, + q_noise=0.0, + qn_block_size=8, + tanh_on_mem=False, + memory_dim=None, + std_scale=0.5, # 0.5 based on https://arxiv.org/abs/2005.09137 + max_memory_size=-1, + disable_mem_on_mem_attn=True, + ): + super().__init__( + embed_dim, + num_heads, + kdim, + vdim, + dropout, + bias, + add_bias_kv, + add_zero_attn, + self_attention, + 
encoder_decoder_attention, + q_noise, + qn_block_size, + ) + + self.memory_dim = memory_dim if memory_dim is not None else embed_dim + self.std_scale = std_scale + self.disable_mem_on_mem_attn = disable_mem_on_mem_attn + + # This Operator was used for factorization in PySpeech + self.v2e = lambda x: x + + if tanh_on_mem: + self.squash_mem = torch.tanh + self.nonlinear_squash_mem = True + else: + self.squash_mem = lambda x: x + self.nonlinear_squash_mem = False + + self.max_memory_size = max_memory_size + + def forward(self, input_and_summary, state): + """ + input: Encoder states of current segment with left or right context, + plus one summarization query + + """ + + length, batch_size, _ = input_and_summary.shape + length = length - 1 # not include sum_query, last index + + memory = state["memory_banks"] + # TODO: positional embedding on memory + + if self.max_memory_size > -1 and len(memory) > self.max_memory_size: + # TODO: need to fix here + if self.max_memory_size == 0: + memory = memory.new_zeros(1, memory.size(1), self.memory_dim) + else: + memory = memory[-self.max_memory_size :] + + memory_and_input = torch.cat(memory + [input_and_summary[:-1]], dim=0) + input_and_sum_query = input_and_summary + + q = self.q_proj(self.v2e(input_and_sum_query)) + k = self.k_proj(self.v2e(memory_and_input)) + v = self.v_proj(self.v2e(memory_and_input)) + + q = ( + q.contiguous() + .view(-1, batch_size * self.num_heads, self.head_dim) + .transpose(0, 1) + * self.scaling + ) + k = ( + k.contiguous() + .view(-1, batch_size * self.num_heads, self.head_dim) + .transpose(0, 1) + ) + + v = ( + v.contiguous() + .view(-1, batch_size * self.num_heads, self.head_dim) + .transpose(0, 1) + ) + + attention_weights = torch.bmm(q, k.transpose(1, 2)) + + if self.disable_mem_on_mem_attn: + attention_weights = self.suppress_mem_on_mem_attention( + batch_size, self.num_heads, len(memory), attention_weights + ) + + if self.std_scale is not None: + attention_weights = attention_suppression(attention_weights, self.std_scale) + + assert list(attention_weights.shape) == [ + batch_size * self.num_heads, + length + 1, + length + len(memory), + ] + + attention_weights = torch.nn.functional.softmax( + attention_weights.float(), dim=-1 + ).type_as(attention_weights) + + attention_probs = self.dropout_module(attention_weights) + + # [T, T, B, n_head] + [T, B, n_head, d_head] -> [T, B, n_head, d_head] + attention = torch.bmm(attention_probs, v) + + assert list(attention.shape) == [ + batch_size * self.num_heads, + length + 1, + self.head_dim, + ] + + attention = ( + attention.transpose(0, 1) + .contiguous() + .view(length + 1, batch_size, self.embed_dim) + ) + + output_and_memory = self.out_proj(attention) + + next_m = output_and_memory[-1:] + next_m = self.squash_mem(next_m) + output = output_and_memory[:-1] + + state["memory_banks"].append(next_m) + + return output + + def suppress_mem_on_mem_attention( + self, B: int, num_heads: int, mem_size: int, attention_weight: Tensor + ): + """ + Arguments: + - B: batch size + - num_heads: number of attention heads + - mem_size: size of memory bank + - attention_weight: a [B*num_heads, T + 1, T + mem_size] vector + + Return: + modified attention_weight with [B*num_heads, -1, :mem_size] = -inf + """ + attention_weight[:, -1, :mem_size] = float("-inf") + return attention_weight + + +# ------------------------------------------------------------------------------ +# SequenceEncoder +# ------------------------------------------------------------------------------ +class 
SequenceEncoder(FairseqEncoder): + """ + SequenceEncoder encodes sequences. + + More specifically, `src_tokens` and `src_lengths` in `forward()` should + describe a batch of "complete" sequences rather than segments. + + Segment-by-segment inference can be triggered by `segment_size`: + 1) `segment_size` is None: + SequenceEncoder treats the input sequence as one single segment. + 2) `segment_size` is not None (some int instead): + SequenceEncoder does the following: + 1. breaks the input sequence into several segments + 2. inference on each segment and collect the outputs + 3. concatanete segment outputs into the output sequence. + Note that `segment_size` here shouldn't include additional left/right + contexts needed, for example if we wish to infer with LC-BLSTM where the + middle chunk size is 100 and right context is 20, `segment_size` should be + 100. + """ + + def __init__(self, args, module): + super().__init__(None) + + self.module = module + self.input_time_axis = 1 + self.output_time_axis = 0 + self.segment_size = args.segment_size + self.left_context = args.left_context + self.right_context = args.right_context + + def forward( + self, + src_tokens: Tensor, + src_lengths: Tensor, + states=None, + ): + + seg_src_tokens_lengths = sequence_to_segments( + sequence=src_tokens, + time_axis=self.input_time_axis, + lengths=src_lengths, + segment_size=self.segment_size, + extra_left_context=self.left_context, + extra_right_context=self.right_context, + ) + + seg_encoder_states_lengths: List[Tuple[Tensor, Tensor]] = [] + + for seg_src_tokens, seg_src_lengths in seg_src_tokens_lengths: + (seg_encoder_states, seg_enc_lengths, states) = self.module( + seg_src_tokens, + seg_src_lengths, + states=states, + ) + + seg_encoder_states_lengths.append((seg_encoder_states, seg_enc_lengths)) + + encoder_out, enc_lengths = segments_to_sequence( + segments=seg_encoder_states_lengths, time_axis=self.output_time_axis + ) + + encoder_padding_mask, _ = lengths_to_encoder_padding_mask( + enc_lengths, batch_first=True + ) + + if not encoder_padding_mask.any(): + encoder_padding_mask = None + + return { + "encoder_out": [encoder_out], + "encoder_padding_mask": [encoder_padding_mask], + "encoder_embedding": [], + "encoder_states": [states], + "src_tokens": [], + "src_lengths": [], + } + + def incremental_encode( + self, + seg_src_tokens: Tensor, + seg_src_lengths: Tensor, + states=None, + ): + """ + Different from forward function, this function takes segmented speech + as input, and append encoder states to previous states + """ + (seg_encoder_states, seg_enc_lengths, states) = self.module( + seg_src_tokens, + seg_src_lengths, + states=states, + ) + return seg_encoder_states, seg_enc_lengths, states + + +# ------------------------------------------------------------------------------ +# Augmented memory model decorator +# ------------------------------------------------------------------------------ +def augmented_memory(klass): + class StreamSeq2SeqModel(klass): + @staticmethod + def add_args(parser): + super(StreamSeq2SeqModel, StreamSeq2SeqModel).add_args(parser) + parser.add_argument( + "--segment-size", type=int, required=True, help="Length of the segment." 
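+                # Hypothetical example flags (values are illustrative, not from the
+                # original source): a model wrapped with @augmented_memory might be
+                # launched with, e.g.,
+                #   --segment-size 40 --left-context 32 --right-context 16 --max-memory-size 5
+                # where the context sizes are given in frames before the encoder
+                # stride is applied.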
+ ) + parser.add_argument( + "--left-context", + type=int, + default=0, + help="Left context for the segment.", + ) + parser.add_argument( + "--right-context", + type=int, + default=0, + help="Right context for the segment.", + ) + parser.add_argument( + "--max-memory-size", + type=int, + default=-1, + help="Right context for the segment.", + ) + + StreamSeq2SeqModel.__name__ = klass.__name__ + return StreamSeq2SeqModel diff --git a/SpeechT5/fairseq/fairseq/models/speech_to_text/modules/emformer.py b/SpeechT5/fairseq/fairseq/models/speech_to_text/modules/emformer.py new file mode 100644 index 0000000000000000000000000000000000000000..6ef76bd012ba40b0395fec2ca9ae9e9c136ffe40 --- /dev/null +++ b/SpeechT5/fairseq/fairseq/models/speech_to_text/modules/emformer.py @@ -0,0 +1,1837 @@ +#!/usr/bin/env python3 +# Copyright (c) 2017-present, Facebook, Inc. +# All rights reserved. +# +# This source code is licensed under the license found in the LICENSE file in +# the root directory of this source tree. An additional grant of patent rights +# can be found in the PATENTS file in the same directory. + + +import math +import re +from functools import partial +from typing import List, Optional, Tuple + +import torch +import torch.nn as nn +from fairseq.models import ( + FairseqEncoder, +) +from fairseq.models.speech_to_text.utils import ( + NoOp, + lengths_to_padding_mask, + segments_to_sequence, +) +from fairseq.models.speech_to_text.utils import ( + attention_suppression, + layer_norm_backward_hook, +) +from torch import Tensor, device as Device +from torch.quantization.qconfig import ( + default_dynamic_qconfig, + per_channel_dynamic_qconfig, +) + + +class RelativePositionEmbedding(nn.Module): + """ + Implementation according to https://arxiv.org/abs/1803.02155 + """ + + def __init__(self, head_dim, max_position, norm_init=True): + super().__init__() + self.head_dim = head_dim + self.max_position = max_position + self.embeddings = nn.Parameter(torch.Tensor(max_position * 2 + 1, head_dim)) + if norm_init: + nn.init.xavier_normal_(self.embeddings) + else: + nn.init.xavier_uniform_(self.embeddings) + + def forward(self, input: Tensor): + output = nn.functional.embedding(input.long(), self.embeddings) + return output + + +class Fp32LayerNorm(nn.Module): + def __init__( + self, + input_dim, + clamp_grad=True, + max_grad_value=256, + eps=1e-5, + elementwise_affine=True, + ): + super().__init__() + self.torch_module = torch.nn.LayerNorm( + input_dim, eps=eps, elementwise_affine=elementwise_affine + ) + if clamp_grad: + hook = partial(layer_norm_backward_hook, clamp_value=max_grad_value) + self.torch_module.register_backward_hook(hook) + + def forward(self, input): + output = torch.nn.functional.layer_norm( + input.float(), + self.torch_module.normalized_shape, + self.torch_module.weight.float() + if self.torch_module.weight is not None + else None, + self.torch_module.bias.float() + if self.torch_module.bias is not None + else None, + self.torch_module.eps, + ).type_as(input) + return output + + +# ------------------------------------------------------------------------------ +# PositionwiseFF +# ------------------------------------------------------------------------------ + + +class PositionwiseFF(nn.Module): + """ + FFN layer in transformer. + + Args: + input_dim: input embedding dimension + ffn_dim: FFN layer inner dimension + dropout_on_fc1: dropout for first linear layer + dropout_on_fc2: dropout fr second linear layer + activation_fn: activation function used after first linear layer. 
\ + Only relu or gelu is supported. + + """ + + def __init__( + self, input_dim, ffn_dim, dropout_on_fc1, dropout_on_fc2, activation_fn + ): + super(PositionwiseFF, self).__init__() + + self.input_dim = input_dim + self.ffn_dim = ffn_dim + if activation_fn == "relu": + ac = nn.ReLU() + elif activation_fn == "gelu": + ac = nn.GELU() + else: + raise ValueError("Unsupported activation_fn = ({})".format(activation_fn)) + + # fc1 -> ac -> dropout -> fc2 -> dropout + self.module = nn.Sequential( + nn.Linear(input_dim, ffn_dim), + ac, + nn.Dropout(dropout_on_fc1), + nn.Linear(ffn_dim, input_dim), + nn.Dropout(dropout_on_fc2), + ) + + self.layer_norm = Fp32LayerNorm(input_dim) + + def forward(self, input): + module_out = self.module(self.layer_norm(input)) + output = module_out + input + + return output + + def quantize_(self, params=None): + if params and "per_channel" in params and params["per_channel"]: + qconfig = per_channel_dynamic_qconfig + else: + qconfig = default_dynamic_qconfig + torch.quantization.quantize_dynamic( + self, {torch.nn.Linear: qconfig}, dtype=torch.qint8, inplace=True + ) + return self + + +# ------------------------------------------------------------------------------ +# SummarizationLayer +# ------------------------------------------------------------------------------ + + +class SummarizationLayer(nn.Module): + def __init__(self, method, segment_size, embedding_dim): + super(SummarizationLayer, self).__init__() + self.segment_size = segment_size + self.embedding_dim = embedding_dim + nonlin_match = re.match(r"nonlinear\((?P<act>[a-z]+),(?P<dim>[0-9]+)\)", method) + self.method = method + if method == "mean": + self.module = nn.AvgPool1d( + kernel_size=segment_size, + stride=segment_size, + ceil_mode=True, + ) + elif method == "max": + self.module = nn.MaxPool1d( + kernel_size=segment_size, + stride=segment_size, + ceil_mode=True, + ) + elif method == "linear": + self.module = nn.Linear(segment_size, 1) + elif nonlin_match: + nonlin_args = nonlin_match.groupdict() + act_type = nonlin_args["act"] + hid_dim = int(nonlin_args["dim"]) + if act_type == "relu": + act = nn.ReLU() + elif act_type == "gelu": + act = nn.GELU() + else: + raise ValueError("Unsupported activation_fn = ({})".format(act_type)) + self.module = nn.Sequential( + nn.Linear(segment_size, hid_dim), + act, + nn.Linear(hid_dim, 1), + ) + else: + raise ValueError("Unsupported summarization method = ({})".format(method)) + + def forward(self, input): + # T, B, D -> B, D, T + input = input.permute(1, 2, 0) + + if self.method == "mean" or self.method == "max": + output = self.module(input) + output = output.permute(2, 0, 1) + return output + + full_seg_length = input.size(2) // self.segment_size * self.segment_size + if full_seg_length > 0: + # at least one seg is full + B = input.size(0) + D = input.size(1) + input_todo = ( + input[:, :, :full_seg_length] + .contiguous() + .view(B, -1, self.segment_size) + ) + output = self.module(input_todo) + output = output.view(B, D, -1) + else: + output = input.new_zeros(input.size(0), input.size(1), 0) + left = input.size(2) - full_seg_length + if left > 0: + # when last seg is not full, use zeros as last memory placeholder + zeros = input.new_zeros(input.size(0), input.size(1), 1) + output = torch.cat([output, zeros], dim=2) + output = output.permute(2, 0, 1) + return output + + +# ------------------------------------------------------------------------------ +# NoSegAugmentedMemoryMultiheadAttentionBmm +# 
------------------------------------------------------------------------------ + + +class NoSegAugmentedMemoryMultiheadAttentionBmm(nn.Module): + """ + Whole utterance augmented memory multihead attention using BMM. + + Different with previous augmented memory multihead attention where + the utterance is chunked into segments. Here we use attention mask + achieve so. The input embedding [right_context, utterance, summary] + is a concatenation of right context, utterance and summary. + + Right context block is the concatenation of all the right context for + each segments. [right_context_0, right_context_1, ..., right_context_n] + For example, if we have utterance = [v0, v1, v2, ...., v20]. segment + size 8, right_context size 4. Then the right context blocks = + [v8, v9, v10, v11, v16, v17, v18, v19, 0, 0, 0, 0], where v8, v9, v10, + and v11 are the right context for first segment. v16, v17, v18 and v19 + are the right context for second segment. 0, 0, 0 and 0 are right context + for the last segment. + + utterance is corresponding to input embedding sequence + + summary is concatenation of average of each segments. [summary_0, + summary_1, ..., ]. + + In augmented memory multihead attention, the query is [right_context, + utterance, summary], key is [memory, right_context, utterance]. Different + with AugmentedMemoryMultiheadAttentionBmm, memory here is passed from + previous attention layer. For the first attention layer, memory is average + of each segment. + + Memory is a concatenation of memory from each segments in previous attention + layer. For example, current layer is i, then memory is [m_0, m_1, ..., m_n]. + Each m_k is the output from seg_k in layer i-1. + + args: + input_dim: input embedding dimension + num_heads: number of heads in multihead self-attention + dropout: attention dropout + std_scale: if std_scale is not None. The weak attention suppression is + turned on. For std_scale = 0.5, all the attention smaller than + mean + 0.5 * std will be suppressed. + scaled_init: whether to use scaled init for linear weight + tanh_on_mem: whether to use tanh on memory output + use_mem: whether to use memory or not. When max_memory_size is 0, then + we don't have memory anymore. + layer_index: current self-attention layer index that is used in depth + initialization + max_relative_position: max relative position used in relative position + embedding + rpe_old_option: To be compatible with previous model. The previous model + was trained with attention += attention + rpe. 
The correct equation + should be attention = attention + rpe + + """ + + def __init__( + self, + input_dim, + num_heads, + dropout=0.0, + std_scale=None, + scaled_init=False, + tanh_on_mem=False, + use_mem=True, + mini_batches=False, + negative_inf="-inf", + layer_index=-1, + max_relative_position=0, + rpe_old_option=True, + ): + if input_dim % num_heads: + raise ValueError( + "input_dim ({}) must be divisible by num_heads ({})".format( + input_dim, num_heads + ) + ) + + super().__init__() + + embed_dim = input_dim + self.e2h_kv = torch.nn.Linear(input_dim, 2 * input_dim, bias=True) + self.e2h_q = torch.nn.Linear(input_dim, input_dim, bias=True) + self.rpe_old_option = rpe_old_option + if max_relative_position > 0: + self.use_rpe = True + self.rpe_k = RelativePositionEmbedding( + head_dim=input_dim // num_heads, + max_position=max_relative_position, + ) + self.rpe_v = RelativePositionEmbedding( + head_dim=input_dim // num_heads, + max_position=max_relative_position, + ) + else: + self.use_rpe = False + self.rpe_k = None + self.rpe_v = None + if scaled_init: + if layer_index == -1: + gain = 1.0 / math.sqrt(2) + else: + # https://arxiv.org/abs/2005.09684 depthwise initialization + # stablize the training greatly. Use depthwise initialization to + # replace incremental loss. + gain = 1.0 / math.sqrt(layer_index + 1) + torch.nn.init.xavier_uniform_(self.e2h_kv.weight, gain=gain) + torch.nn.init.xavier_uniform_(self.e2h_q.weight, gain=gain) + + self.out_proj = torch.nn.Linear(embed_dim, embed_dim, bias=True) + + self.embed_dim = embed_dim + self.num_heads = num_heads + self.dropout = dropout + + self.head_dim = embed_dim // num_heads + self.scaling = self.head_dim ** -0.5 + + self.std_scale = std_scale + self.use_mem = use_mem + self.mini_batches = mini_batches + self.negative_inf = negative_inf + + if tanh_on_mem: + self.squash_mem = torch.tanh + self.nonlinear_squash_mem = True + else: + self.squash_mem = NoOp() + self.nonlinear_squash_mem = False + + def prepare_qkv( + self, + input: Tensor, + mems: Tensor, + lengths: Tensor, + summary_length: int, + lc_length: int, + ): + # T: right_context length + utterance_length + summary_length + T, B, D = input.shape + mem_length = mems.size(0) + utterance_length = torch.max(lengths) + + right_context_blocks_length = T - utterance_length - summary_length + rc_block = input[:right_context_blocks_length, :, :] + utterance_block = input[right_context_blocks_length : T - summary_length, :, :] + + if B == 1: + padding_mask = None + else: + klengths = lengths + mem_length + right_context_blocks_length + lc_length + padding_mask = lengths_to_padding_mask(lengths=klengths) + + mem_rc_input = torch.cat([mems, rc_block, utterance_block], dim=0) + + # In training lc_length = 0 + key_length = mem_rc_input.size(0) + lc_length + rc_input_sum = input + q = self.e2h_q(rc_input_sum) + kv = self.e2h_kv(mem_rc_input) + k, v = kv.chunk(chunks=2, dim=2) + result_qkv = (q, k, v) + input_shape = (T, B, D) + result_lengths_info = ( + mem_length, + utterance_length, + right_context_blocks_length, + key_length, + ) + if padding_mask is not None: + assert padding_mask.size(0) == B + assert padding_mask.size(1) == key_length + + return result_qkv, input_shape, result_lengths_info, padding_mask + + def prepare_attention_weights( + self, + q: Tensor, + new_k: Tensor, + new_v: Tensor, + input_shape: Tuple[int, int, int], + rpe: Optional[Tensor], + ) -> Tuple[Tensor, Tensor, Tensor]: + T, B, D = input_shape + q = ( + q.contiguous().view(-1, B * self.num_heads, 
self.head_dim).transpose(0, 1) + * self.scaling + ) + + k = ( + new_k.contiguous() + .view(-1, B * self.num_heads, self.head_dim) + .transpose(0, 1) + ) + + v = ( + new_v.contiguous() + .view(-1, B * self.num_heads, self.head_dim) + .transpose(0, 1) + ) + + attention_weights = torch.bmm(q, k.transpose(1, 2)) + if self.use_rpe and rpe is not None and self.rpe_v is not None: + r_k = self.rpe_k(rpe) + # [q, B*h, d] * [q, k, d] -> [B*h, q, k] + attention_weights_rpe = torch.matmul( + q.transpose(0, 1), r_k.transpose(1, 2) + ).transpose(0, 1) + attention_weights = attention_weights + attention_weights_rpe + attention_weights_float = attention_weights.float() + + return attention_weights, attention_weights_float, v + + def prepare_attention_output( + self, + attention_weights: Tensor, + attention_weights_float: Tensor, + v: Tensor, + input_shape: Tuple[int, int, int], + key_length: int, + padding_mask: Optional[Tensor], + rpe: Optional[Tensor], + ) -> Tensor: + T, B, D = input_shape + if padding_mask is not None: + attention_weights_float = attention_weights_float.view( + B, self.num_heads, T, key_length + ) + attention_weights_float = attention_weights_float.masked_fill( + padding_mask.unsqueeze(1).unsqueeze(2).to(torch.bool), float("-inf") + ) + attention_weights_float = attention_weights_float.view( + B * self.num_heads, T, key_length + ) + + if self.std_scale is not None: + attention_weights_float = attention_suppression( + attention_weights_float, self.std_scale + ) + + attention_weights_float = torch.nn.functional.softmax( + attention_weights_float, dim=-1 + ) + attention_weights = attention_weights_float.type_as(attention_weights) + + attention_probs = torch.nn.functional.dropout( + attention_weights, p=self.dropout, training=self.training + ) + + # [T, key_length, B, n_head]+ [key_length, B, n_head, d_head] + # -> [T, B, n_head, d_head] + attention = torch.bmm(attention_probs, v) + if self.use_rpe and rpe is not None and self.rpe_v is not None: + r_v = self.rpe_v(rpe) + attention_rpe = torch.matmul( + attention_probs.transpose(0, 1), r_v + ).transpose(0, 1) + + if self.rpe_old_option: + attention += attention + attention_rpe + else: + attention = attention + attention_rpe + + assert list(attention.shape) == [B * self.num_heads, T, self.head_dim] + + attention = attention.transpose(0, 1).contiguous().view(T, B, self.embed_dim) + + rc_output_memory = self.out_proj(attention) + return rc_output_memory + + @torch.jit.unused + def forward( + self, + input: Tensor, + lengths: Tensor, + mems: Tensor, + attention_mask: Tensor, + pre_mems: Optional[Tensor] = None, + left_context_key: Optional[Tensor] = None, + left_context_val: Optional[Tensor] = None, + rpe: Optional[Tensor] = None, + ) -> Tuple[Tensor, Tensor, Tensor, Tensor]: + """ + forward function for NoSegAugmentedMemoryMultiheadAttentionBmm in training. + + args: + input: formed in the following way + [right_context_0, right_contex_1, ..., seg_0, seg_1, + ..., summary_0, summary_1,..] + lengths: the length of query which is [seg_0, seg_1, ....] + mems: [mem_0, mem_1, ...]. + attention_mask: attention mask for query = [right_context, query, summary] + key = [mem, right_context, query]. This is only used for traing. 
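+
+        Shape sketch (illustrative numbers only): with segment_size=4,
+        right_context=2, an utterance of 3 segments and no left-context cache,
+        input is [6 right-context frames, 12 utterance frames, 3 summary
+        vectors] = 21 x B x D, memory from the previous layer holds 2 vectors,
+        and the key is [2 memory, 6 right-context, 12 utterance] = 20 x B x D,
+        giving attention weights of shape [B * num_heads, 21, 20].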
+ + """ + if self.use_mem: + mem_length = mems.size(0) + summary_length = mem_length + 1 + if pre_mems is not None: + mems = torch.cat([pre_mems, mems], dim=0) + else: + mem_length = 0 + summary_length = 0 + + # In training, lc_length = 0 + if left_context_key is not None: + lc_length = left_context_key.size(0) + else: + lc_length = 0 + results = self.prepare_qkv( + input=input, + mems=mems, + lengths=lengths, + summary_length=summary_length, + lc_length=lc_length, + ) + result_qkv, input_shape, result_lengths_info, padding_mask = results + q, k, v = result_qkv + ( + mem_length, + utterance_length, + right_context_blocks_length, + key_length, + ) = result_lengths_info + + if left_context_key is not None: + # add the cache key and value + new_k = torch.cat( + [ + k[: mem_length + right_context_blocks_length, :, :], + left_context_key, + k[-utterance_length:, :, :], + ], + dim=0, + ) + new_v = torch.cat( + [ + v[: mem_length + right_context_blocks_length, :, :], + left_context_val, + v[-utterance_length:, :, :], + ], + dim=0, + ) + next_k = new_k[mem_length + right_context_blocks_length :, :, :] + next_v = new_v[mem_length + right_context_blocks_length :, :, :] + else: + new_k = k + new_v = v + next_k = None + next_v = None + + attention_weights, attention_weights_float, v = self.prepare_attention_weights( + q=q, + new_k=new_k, + new_v=new_v, + input_shape=input_shape, + rpe=rpe, + ) + + # mask attention + attention_mask = attention_mask.unsqueeze(0) + attention_weights_float = attention_weights_float.masked_fill( + attention_mask, float(self.negative_inf) + ) + + rc_output_memory = self.prepare_attention_output( + attention_weights=attention_weights, + attention_weights_float=attention_weights_float, + v=v, + input_shape=input_shape, + key_length=key_length, + padding_mask=padding_mask, + rpe=rpe, + ) + + if self.use_mem: + # next_m length equals to summary length - 1 + # last memory is ignored + if self.mini_batches: + next_m = rc_output_memory[-summary_length:] + else: + next_m = rc_output_memory[-summary_length:-1] + + next_m = self.squash_mem(next_m) + # rc and output + rc_output = rc_output_memory[:-summary_length] + if not self.nonlinear_squash_mem: + next_m = torch.clamp(next_m, min=-10, max=10) + else: + next_m = mems + rc_output = rc_output_memory + + return rc_output, next_m, next_k, next_v + + @torch.jit.export + def forward_jit( + self, + input: Tensor, + lengths: Tensor, + mems: Tensor, + left_context_key: Tensor, + left_context_val: Tensor, + rpe: Optional[Tensor], + ) -> Tuple[Tensor, Tensor, Tensor, Tensor]: + """ + forward function for NoSegAugmentedMemoryMultiheadAttentionBmm in decoding. + + args: + input: formed in the following way + [right_context_0, right_contex_1, ..., seg_0, seg_1, + ..., summary_0, summary_1,..] + lengths: the length of query which is [seg_0, seg_1, ....] + mems: [mem_0, mem_1, ...]. + left_context_key: left_context for key part. This is only used for online + decoding. In training, this is empty tensor + left_context_val: left_context for value part. This is only used for online + decoding. 
In training, this is empty tensor + + """ + lc_length = left_context_key.size(0) + + # In decoding, summary_length = 1 or 0 + if self.use_mem: + summary_length = 1 + else: + summary_length = 0 + + results = self.prepare_qkv( + input=input, + mems=mems, + lengths=lengths, + summary_length=summary_length, + lc_length=lc_length, + ) + result_qkv, input_shape, result_lengths_info, padding_mask = results + q, k, v = result_qkv + ( + mem_length, + utterance_length, + right_context_blocks_length, + key_length, + ) = result_lengths_info + + # add the cache key and value + new_k = torch.cat( + [ + k[: mem_length + right_context_blocks_length, :, :], + left_context_key, + k[-utterance_length:, :, :], + ], + dim=0, + ) + new_v = torch.cat( + [ + v[: mem_length + right_context_blocks_length, :, :], + left_context_val, + v[-utterance_length:, :, :], + ], + dim=0, + ) + next_k = new_k[mem_length + right_context_blocks_length :, :, :] + next_v = new_v[mem_length + right_context_blocks_length :, :, :] + + attention_weights, attention_weights_float, v = self.prepare_attention_weights( + q=q, + new_k=new_k, + new_v=new_v, + input_shape=input_shape, + rpe=rpe, + ) + # In online decoding, we don't have attention mask. But we still need + # to disable the attention from summary query to memory + attention_weights_float[:, -1, :mem_length] = float(self.negative_inf) + rc_output_memory = self.prepare_attention_output( + attention_weights=attention_weights, + attention_weights_float=attention_weights_float, + v=v, + input_shape=input_shape, + key_length=key_length, + padding_mask=padding_mask, + rpe=rpe, + ) + + # In decoding, summary length is 1 + if self.use_mem: + next_m = rc_output_memory[-1:] + next_m = self.squash_mem(next_m) + # rc and output + rc_output = rc_output_memory[:-1] + if not self.nonlinear_squash_mem: + next_m = torch.clamp(next_m, min=-10, max=10) + else: + rc_output = rc_output_memory + # empty tensor as input mems + next_m = mems + + return rc_output, next_m, next_k, next_v + + def quantize_(self, params=None): + if params and "per_channel" in params and params["per_channel"]: + qconfig = per_channel_dynamic_qconfig + else: + qconfig = default_dynamic_qconfig + torch.quantization.quantize_dynamic( + self, {torch.nn.Linear: qconfig}, dtype=torch.qint8, inplace=True + ) + return self + + +class NoSegAugmentedMemoryTransformer(nn.Module): + """ + Whole utterance augmented memory transformer. + + This is not pyspeech nn layer. It is used as a module in a master layer where + multiple transformers is used. 
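+
+    A note inferred from the forward functions below: each call on one segment
+    returns a tuple (output, next_memory, updated_right_context_blocks,
+    next_key, next_value); the master layer feeds the updated right-context
+    block output and the new memory on to the next transformer in the stack.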
+ """ + + def __init__( + self, + input_dim, + num_heads, + ffn_dim, + dropout_in_attn=0.0, + dropout_on_attn=None, + dropout_on_fc1=None, + dropout_on_fc2=None, + activation_fn="relu", + tanh_on_mem=False, + std_scale=None, + scaled_init=False, + segment_size=128, + use_mem=True, + mini_batches=False, + negative_inf="-inf", + layer_index=-1, + summarization_method="mean", + max_relative_position=0, + rpe_old_option=True, + ): + super(NoSegAugmentedMemoryTransformer, self).__init__() + + self.attention = NoSegAugmentedMemoryMultiheadAttentionBmm( + input_dim=input_dim, + num_heads=num_heads, + dropout=dropout_in_attn, + scaled_init=scaled_init, + tanh_on_mem=tanh_on_mem, + std_scale=std_scale, + use_mem=use_mem, + mini_batches=mini_batches, + negative_inf=negative_inf, + layer_index=layer_index, + max_relative_position=max_relative_position, + ) + self.dropout = nn.Dropout(dropout_on_attn) + self.pos_ff = PositionwiseFF( + input_dim=input_dim, + ffn_dim=ffn_dim, + dropout_on_fc1=dropout_on_fc1, + dropout_on_fc2=dropout_on_fc2, + activation_fn=activation_fn, + ) + self.layer_norm_pre = Fp32LayerNorm(input_dim) + self.layer_norm = Fp32LayerNorm(input_dim) + self.segment_size = segment_size + self.use_mem = use_mem + + self.memory_op = SummarizationLayer( + summarization_method, segment_size, input_dim + ) + + def set_mini_batches(self, mini_batches): + self.attention.mini_batches = mini_batches + + def gen_summary_queries(self, input): + sum_input = self.memory_op(input) + return sum_input + + def pre_attention_ops(self, input, right_context_blocks): + rc_length = right_context_blocks.size(0) + input_length = input.size(0) + + rc_and_input = torch.cat([right_context_blocks, input], dim=0) + residual_input = rc_and_input + rc_and_input = self.layer_norm_pre(rc_and_input) + + query_input = rc_and_input[-input_length:, :, :] + return rc_length, input_length, residual_input, query_input, rc_and_input + + def after_attention_ops(self, attention_output, residual_input): + output = self.dropout(attention_output) + output = output + residual_input + output = self.pos_ff(output) + output = self.layer_norm(output) + return output + + @torch.jit.export + def forward_jit( + self, + input: Tensor, + lengths: Tensor, + mems: Tensor, + left_context_key: Tensor, + left_context_val: Tensor, + right_context_blocks: Tensor, + rpe: Optional[Tensor], + ) -> Tuple[Tensor, Tensor, Tensor, Tensor, Tensor]: + + results = self.pre_attention_ops(input, right_context_blocks) + rc_length, input_length, residual_input, query_input, rc_and_input = results + + # In online decoding, the summary query size is always 1 or 0 + if self.use_mem: + summary_query = self.gen_summary_queries(query_input) + summary_query = summary_query[0:1, :, :] + rc_qu_su = torch.cat([rc_and_input, summary_query], dim=0) + else: + rc_qu_su = rc_and_input + + rc_output, next_m, next_k, next_v = self.attention.forward_jit( + input=rc_qu_su, + lengths=lengths, + mems=mems, + left_context_key=left_context_key, + left_context_val=left_context_val, + rpe=rpe, + ) + rc_output = self.after_attention_ops(rc_output, residual_input) + results = ( + rc_output[-input_length:, :, :], + next_m, + rc_output[0:rc_length, :, :], + next_k, + next_v, + ) + return results + + @torch.jit.unused + def forward( + self, + input, + lengths, + mems, + right_context_blocks, + attention_mask, + pre_mems, + left_context_key, + left_context_val, + rpe, + ): + + results = self.pre_attention_ops(input, right_context_blocks) + rc_length, input_length, residual_input, query_input, 
rc_and_input = results + if self.use_mem: + summary_query = self.gen_summary_queries(query_input) + rc_qu_su = torch.cat([rc_and_input, summary_query], dim=0) + else: + rc_qu_su = rc_and_input + + rc_output, next_m, next_k, next_v = self.attention( + input=rc_qu_su, + lengths=lengths, + mems=mems, + attention_mask=attention_mask, + pre_mems=pre_mems, + left_context_key=left_context_key, + left_context_val=left_context_val, + rpe=rpe, + ) + + # [TODO] Note memory did not go through pos_ff. What happen if we pass + # memory through the pos_ff as well? + rc_output = self.after_attention_ops(rc_output, residual_input) + results = ( + rc_output[-input_length:, :, :], + next_m, + rc_output[0:rc_length, :, :], + next_k, + next_v, + ) + + return results + + +class NoSegAugmentedMemoryTransformerEncoderLayer(FairseqEncoder): + """ + Whole utterance augmented memory transformer encoder layer. This is a master layer + where we can define multiple augmented memory transformers. There are two reasons + to setup the master layer. + 1. We only need to define once about the attention mask. All the layers in the master + layer share the same mask. + 2. pyspeech nn layer has special input and output format. Defining one master layer is + easier to passing memory between different layes inside the master layer + + args: + input_dim: input embedding dimension + num_heads: number of heads in multihead self-attention + ffn_dim: ffn dimension in FFN layer + num_layers: number of augmented memory transformer layers + dropout_in_attn: dropout used in multi-head self-attention + dropout_on_attn: dropout used for output from te multihead self-attention + dropout_on_fc1: dropout used in FFN layer for the first linear layer + dropout_on_fc2: dropout used in FFN layer for the second linear layer + segment_size: segment size for each segment + context_config: (left_context_size, right_context_size) defines the surround context size + for each segment + max_memory_size: maximum memory size used for each segment + scaled_init: whether use scaled init for weight initialization in attention layer + std_scale: if std_scale is not None. The weak attention suppression is + turned on. For std_scale = 0.5, all the attention smaller than + mean + 0.5 * std will be suppressed. + activation_fn: activation function used in FFN layer. [ReLU, GELU] supported + tanh_on_mem: whether use tanh on memory + mini_batches: use mini-btach training + negative_inf: the negative infinity value used in attention masking. default is "-inf". + For some situation, e.g. LM. it is better to use "-1e8" to avoid nan issue. + summarization_method: method to generate segment summrization embedding + max_relative_position: max relatie position for relative position embedding + rpe_old_option: To be compatible with previous model. The previous model + was trained with attention += attention + rpe. The correct equation + should be attention = attention + rpe + [TODO]: remove the rpe_old_option by the end of 2021 Q1. 
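+
+    Example (an illustrative construction only; every value below is
+    hypothetical):
+
+        encoder = NoSegAugmentedMemoryTransformerEncoderLayer(
+            input_dim=512,
+            num_heads=8,
+            ffn_dim=2048,
+            num_layers=12,
+            segment_size=128,
+            context_config=(0, 32),
+            max_memory_size=4,
+            summarization_method="mean",
+        )
+
+    forward() then expects input of shape (T, B, input_dim) with the right
+    context already appended to the utterance, plus a boolean padding mask of
+    shape (B, T).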
+ + """ + + def __init__( + self, + input_dim, + num_heads, + ffn_dim, + num_layers=1, + dropout_in_attn=0.0, + dropout_on_attn=0.0, + dropout_on_fc1=0.0, + dropout_on_fc2=0.0, + segment_size=128, + context_config=(0, 0), + max_memory_size=0, + scaled_init=True, + std_scale=None, + activation_fn="relu", + tanh_on_mem=False, + mini_batches=False, + negative_inf="-inf", + deep_init=True, + summarization_method="mean", + max_relative_position=0, + rpe_old_option=True, + ): + super().__init__(None) + if input_dim % num_heads: + raise ValueError( + "input_dim ({}) must be divisible by num_heads ({})".format( + input_dim, num_heads + ) + ) + + # we used to support growing memory size. However, it will cause + # cross stream batching failure. Now we need to have exact max memory size + if max_memory_size < 0: + raise ValueError("max_memory_size must be >= 0") + + # Only assign right_context. In decoding, left context will be cached. + # No need to let the online decoder to re-assign the left context + self.left_context, self.right_context = context_config + self.segment_size = segment_size + self.memory_dim = input_dim + self.max_memory_size = max_memory_size + self.mini_batches = mini_batches + if self.max_memory_size != 0: + self.use_mem = True + else: + self.use_mem = False + + self.memory_op = SummarizationLayer( + summarization_method, segment_size, input_dim + ) + + self.layers = torch.nn.ModuleList() + self.num_layers = num_layers + self.max_relative_position = max_relative_position + if self.max_relative_position > 0: + self.use_rpe = True + else: + self.use_rpe = False + for i in range(self.num_layers): + if deep_init: + layer_index = i + else: + layer_index = -1 + + self.layers.append( + NoSegAugmentedMemoryTransformer( + num_heads=num_heads, + input_dim=input_dim, + ffn_dim=ffn_dim, + dropout_in_attn=dropout_in_attn, + dropout_on_attn=dropout_on_attn, + dropout_on_fc1=dropout_on_fc1, + dropout_on_fc2=dropout_on_fc2, + segment_size=segment_size, + std_scale=std_scale, + activation_fn=activation_fn, + tanh_on_mem=tanh_on_mem, + scaled_init=scaled_init, + use_mem=self.use_mem, + mini_batches=mini_batches, + negative_inf=negative_inf, + layer_index=layer_index, + summarization_method=summarization_method, + max_relative_position=max_relative_position, + rpe_old_option=rpe_old_option, + ) + ) + + def set_mini_batches(self, mini_batches): + # handy function only used for unit test + self.mini_batches = mini_batches + for layer in self.layers: + layer.set_mini_batches(mini_batches) + + def _get_relative_position( + self, + input: Tensor, + max_relative_position: int, + left_context_length: int, + past_length: int, + is_decoding: bool, + ): + # For training, we copy the right context to the start of the utterance + # First dimension in distance is corresponding to query. + # [right context, utterance, summary vector] + # Second dimension in distance is corresponding to key. + # [Memory bank, right context, utterance] + # For summary vector in query part, the distance with + # all other position is 2*max_position. For memory bank in key, + # the distance with all other positions is 0. + + T, B, D = input.shape + num_segs = math.ceil((T - self.right_context) / self.segment_size) + + # utterance + u_st = past_length * self.segment_size + u_ed = u_st + T + utterance_ranges = torch.arange(u_st, u_ed - self.right_context) + + # left context. 
Only in minibatch or decoding + left_context_ranges = torch.arange(u_st - left_context_length, u_st) + + # Right context block + # right context + utterance + right_context_blocks = [] + for i in range(0, num_segs - 1): + st = (i + 1) * self.segment_size + u_st + ed = st + self.right_context + assert ed < u_ed + temp = torch.arange(st, ed) + right_context_blocks.append(temp) + right_context_blocks.append(torch.arange(u_ed - self.right_context, u_ed)) + right_context_ranges = torch.cat(right_context_blocks) + + if self.use_mem: + # Memory bank + # The position for memory -n, .., -1 + if is_decoding: + memory_size = min(past_length, self.max_memory_size) + else: + memory_size = num_segs + past_length - 1 + memory_bank_ranges = torch.arange( + -max_relative_position - 1, -max_relative_position - 1 - memory_size, -1 + ) + + # summary vector + # The position for summary vector as the T+max_relative_position+1. + # After the clamping, the relative position is max_relative_position + summary_pos_st = u_ed + max_relative_position + 1 + summary_vector_ranges = torch.arange( + summary_pos_st, summary_pos_st + num_segs + ) + + key_ranges = torch.cat( + [ + memory_bank_ranges, + right_context_ranges, + left_context_ranges, + utterance_ranges, + ] + ) + + query_ranges = torch.cat( + [right_context_ranges, utterance_ranges, summary_vector_ranges] + ) + else: + key_ranges = torch.cat( + [right_context_ranges, left_context_ranges, utterance_ranges] + ) + + query_ranges = torch.cat([right_context_ranges, utterance_ranges]) + + distance = key_ranges[None, :] - query_ranges[:, None] + distance_clamp = ( + torch.clamp(distance, -max_relative_position, max_relative_position) + + max_relative_position + ) + distance_clamp = distance_clamp.to(input.device).long().detach() + return distance_clamp + + def _get_attention_mask(self, input, past_length=0, left_context_cache=0): + # attention mask for each query contains three parts: + # 1. memory part + # 2. left_context + segment + # 3. right_context_block + # so for each segment and its correspoinding right context block, + # the attention matrix is formed by 9 parts: + # [0, m, 0, 0, right_context, 0, 0, seg, 0] + # [before memory, memory, after memory, before right context, right_context, + # after right context, before seg, seg, after seg] + # + # Query is formed in the way as [right_context_blocks, utterance, summary] + # + # Note: put m and right_context before segment is convenient + # for padding_mask operation. + # Key lengths = m_length + right_context_block_length + lengths + utterance_length, batch_size, _ = input.shape + summary_length = math.ceil(utterance_length / self.segment_size) + num_segs = summary_length + rc_length = self.right_context * num_segs + rc = self.right_context + lc = self.left_context + + # using mini-batches, there is left context cache available for current + # sequence. 
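+        # A concrete sizing with purely illustrative numbers: with
+        # utterance_length=16, segment_size=8, right_context=4, left_context=0
+        # and memory enabled, num_segs=2, so the query axis has 2*4
+        # right-context rows + 16 utterance rows + 2 summary rows = 26 rows,
+        # while the key axis has (num_segs - 1 + past_length) memory columns
+        # + 8 right-context columns + 16 utterance columns.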
+ lcc = left_context_cache + + # max_memory_size is 0 then we don't have memory and summary + # past_length is the memory carry from previous sequence + if self.use_mem: + mem_length = num_segs - 1 + past_length + else: + mem_length = 0 + rc_mask = [] + query_mask = [] + summary_mask = [] + for j in range(0, num_segs): + ssize = min(self.segment_size, utterance_length - j * self.segment_size) + + rc_size = rc + rc_mat = [] + q_mat = [] + s_mat = [] + m_start = max(j + past_length - self.max_memory_size, 0) + + # max_memory_size is 0, then we don't use memory + if self.use_mem: + # part 0: before memory + rc_mat.append(input.new_zeros(rc_size, m_start)) + q_mat.append(input.new_zeros(ssize, m_start)) + s_mat.append(input.new_zeros(1, m_start)) + + # part 1: memory + col_1 = j + past_length - m_start + rc_mat.append(torch.ones(rc_size, col_1, device=input.device)) + q_mat.append(torch.ones(ssize, col_1, device=input.device)) + # based on D22875746, disable summary query attention + # on memeory is better for long form utterance + s_mat.append(input.new_zeros(1, col_1)) + + # part 2: after memory + col_2 = mem_length - (j + past_length) + rc_mat.append(input.new_zeros(rc_size, col_2)) + q_mat.append(input.new_zeros(ssize, col_2)) + s_mat.append(input.new_zeros(1, col_2)) + + # part 3: before right context + rc_start = j * rc + rc_mat.append(input.new_zeros(rc_size, rc_start)) + q_mat.append(input.new_zeros(ssize, rc_start)) + s_mat.append(input.new_zeros(1, rc_start)) + + # part 4: right context + rc_end = rc_start + rc + col_4 = rc + rc_mat.append(torch.ones(rc_size, col_4, device=input.device)) + q_mat.append(torch.ones(ssize, col_4, device=input.device)) + s_mat.append(torch.ones(1, col_4, device=input.device)) + + # part 5: after right context + col_5 = rc_length - rc_end + rc_mat.append(input.new_zeros(rc_size, col_5)) + q_mat.append(input.new_zeros(ssize, col_5)) + s_mat.append(input.new_zeros(1, col_5)) + + # part 6: before query segment + seg_start = max(j * self.segment_size + lcc - lc, 0) + rc_mat.append(input.new_zeros(rc_size, seg_start)) + q_mat.append(input.new_zeros(ssize, seg_start)) + s_mat.append(input.new_zeros(1, seg_start)) + + # part 7: query segment + # note: right context is put in right context block + # here we only need to consider about left context + seg_end = min((j + 1) * self.segment_size + lcc, utterance_length + lcc) + col_7 = seg_end - seg_start + rc_mat.append(torch.ones(rc_size, col_7, device=input.device)) + q_mat.append(torch.ones(ssize, col_7, device=input.device)) + s_mat.append(torch.ones(1, col_7, device=input.device)) + + # part 8: after query segment + col_8 = utterance_length + lcc - seg_end + rc_mat.append(input.new_zeros(rc_size, col_8)) + q_mat.append(input.new_zeros(ssize, col_8)) + s_mat.append(input.new_zeros(1, col_8)) + + rc_mask.append(torch.cat(rc_mat, dim=1)) + query_mask.append(torch.cat(q_mat, dim=1)) + summary_mask.append(torch.cat(s_mat, dim=1)) + + # no memory, then we don't need summary either + if self.use_mem: + attention_mask = ( + 1 + - torch.cat( + [ + torch.cat(rc_mask, dim=0), + torch.cat(query_mask, dim=0), + torch.cat(summary_mask, dim=0), + ], + dim=0, + ) + ).to(torch.bool) + else: + attention_mask = ( + 1 + - torch.cat( + [torch.cat(rc_mask, dim=0), torch.cat(query_mask, dim=0)], dim=0 + ) + ).to(torch.bool) + + return attention_mask + + @torch.jit.export + def init_state( + self, batch_size: int, device: Optional[Device] = None + ) -> List[Tensor]: + empty_memory = torch.zeros( + self.num_layers, + 
self.max_memory_size, + batch_size, + self.memory_dim, + device=device, + ) + left_context_key = torch.zeros( + self.num_layers, + self.left_context, + batch_size, + self.memory_dim, + device=device, + ) + left_context_val = torch.zeros( + self.num_layers, + self.left_context, + batch_size, + self.memory_dim, + device=device, + ) + past_length = torch.zeros(1, batch_size, dtype=torch.int32, device=device) + + return [empty_memory, left_context_key, left_context_val, past_length] + + @torch.jit.export + def batch_state(self, states: List[List[Tensor]]) -> List[Tensor]: + if len(states) == 0: + return [] + batched_m = [] + batched_lc_key = [] + batched_lc_val = [] + batched_past_length = [] + for state in states: + if len(state) == 0: + continue + m, lc_key, lc_val, past_length = state + batched_m.append(m) + batched_lc_key.append(lc_key) + batched_lc_val.append(lc_val) + batched_past_length.append(past_length) + + if ( + (len(batched_m) == 0) + or (len(batched_lc_key) == 0) + or (len(batched_lc_val) == 0) + or (len(batched_past_length) == 0) + ): + return [ + torch.tensor([]), + torch.tensor([]), + torch.tensor([]), + torch.tensor([]), + ] + + batched_m = torch.cat(batched_m, dim=2) + batched_lc_key = torch.cat(batched_lc_key, dim=2) + batched_lc_val = torch.cat(batched_lc_val, dim=2) + batched_past_length = torch.cat(batched_past_length, dim=1) + return [batched_m, batched_lc_key, batched_lc_val, batched_past_length] + + @torch.jit.export + def reorder_state(self, state: List[Tensor], indices: Tensor) -> List[Tensor]: + if len(state) == 0: + return [] + m, lc_key, lc_val, past_length = state + indices = indices.to(device=m.device) + reord_m = torch.index_select(m, 2, indices) + reord_lc_key = torch.index_select(lc_key, 2, indices) + reord_lc_val = torch.index_select(lc_val, 2, indices) + reord_past_length = torch.index_select(past_length, 1, indices) + return [reord_m, reord_lc_key, reord_lc_val, reord_past_length] + + @torch.jit.export + def reset_state(self, state: List[Tensor], indices: Tensor) -> List[Tensor]: + m, lc_key, lc_val, past_length = state + m = m.index_fill(dim=2, index=indices, value=0.0) + lc_key = lc_key.index_fill(dim=2, index=indices, value=0.0) + lc_val = lc_val.index_fill(dim=2, index=indices, value=0.0) + past_length = past_length.index_fill(dim=1, index=indices, value=0) + + return [m, lc_key, lc_val, past_length] + + @torch.jit.export + def state_size(self) -> int: + return 4 + + @torch.jit.export + def batch_size_in_state( + self, state: Optional[List[Tensor]], sloppy: bool = True + ) -> Optional[int]: + if state is None: + return None + return state[0].size(2) + + def gen_summary_queries(self, input): + sum_input = self.memory_op(input) + return sum_input + + def _gen_right_context_padded_input(self, input): + # This function deals with input that is already + # padded with right context (e.g. 
minibatch training) + right_context_blocks = [] + T, B, D = input.shape + num_segs = math.ceil((T - self.right_context) / self.segment_size) + for i in range(0, num_segs - 1): + st = (i + 1) * self.segment_size + ed = st + self.right_context + assert ed < T + temp = input[st:ed, :, :] + right_context_blocks.append(temp) + + # last segment right context is already available + right_context_blocks.append(input[T - self.right_context :, :, :]) + return torch.cat(right_context_blocks, dim=0) + + def _gen_segs_right_context(self, input, lengths): + segments = [] + T, B, D = input.size() + nT = T - self.right_context + + # assume input is right context padded + num_segs = math.ceil(nT / self.segment_size) + # pad zeros to the utterance to make sure each + # segment has the same right context. For the + for i in range(0, num_segs - 1): + st = i * self.segment_size + ed = min(T, st + self.segment_size + self.right_context) + temp = input[st:ed, :, :] + rest_lengths = torch.clamp( + lengths - self.segment_size, min=0, max=nT - (i + 1) * self.segment_size + ) + segments.append((temp, lengths - rest_lengths + self.right_context)) + lengths = rest_lengths + + last_seg = input[st + self.segment_size :, :, :] + segments.append((last_seg, rest_lengths + self.right_context)) + + return segments + + @torch.jit.unused + def forward( + self, input: Tensor, padding_masks: Tensor, state: Optional[List[Tensor]] = None + ) -> Tuple[Tensor, Tensor, List[Tensor], List[Tensor]]: + # Xutai: originally the second argument is lengths. + lengths = (~padding_masks).sum(dim=1).long() + # mini batch training. + if self.mini_batches: + return self.forward_mini_batches(input, lengths, state) + + # regular full sequence training. Note, assume the right context in provided + # in the input. + T, B, D = input.size() + right_context_blocks = self._gen_right_context_padded_input(input) + + # generate the relative positional embedding + if self.use_rpe: + rpe = self._get_relative_position( + input=input, + max_relative_position=self.max_relative_position, + left_context_length=0, + past_length=0, + is_decoding=False, + ) + else: + rpe = None + input = input[: T - self.right_context, :, :] + + attention_mask = self._get_attention_mask(input) + + # firt layer use each segment mean as memory + # ignore the last one seg average + if self.use_mem: + mems = self.gen_summary_queries(input)[:-1, :, :] + else: + mems = torch.zeros(0, input.size(1), input.size(2), device=input.device) + mems = mems.type_as(input) + + output = input + all_outputs = [] + + for layer in self.layers: + output, mems, right_context_blocks, _, _ = layer( + input=output, + lengths=lengths, + attention_mask=attention_mask, + mems=mems, + right_context_blocks=right_context_blocks, + pre_mems=None, + left_context_key=None, + left_context_val=None, + rpe=rpe, + ) + all_outputs.append(output) + return output, padding_masks, [], all_outputs + + def forward_jit_mini_batch_init( + self, + seg: Tensor, + state: Optional[List[Tensor]] = None, + is_decoding: bool = False, + ): + # Prepare state. In whole sequence training, state is ignored. + # For minibatch training, we need to prepare state + if state is None: + state = self.init_state(batch_size=seg.size(1), device=seg.device) + if seg.dtype == torch.half: + state = [state[0].half(), state[1].half(), state[2].half(), state[3]] + + if self.use_mem: + # note input average only on seg, not on right context + # first layer use each segmetn mean as memory. 
the last + # one segment average is used in state + full_mems = self.gen_summary_queries(seg) + if is_decoding: + mems = full_mems[0:1, :, :] + state_mems = torch.cat([state[0][0], mems], dim=0) + else: + mems = full_mems[:-1, :, :] + state_mems = torch.cat([state[0][0], full_mems], dim=0) + else: + mems = state[0][0] + state_mems = mems + + # track processed segment number or memory number + # the same batch as the same bumber of past length + past_length = state[3][0][0].item() + past_left_context = min(past_length * self.segment_size, self.left_context) + past_length = min(self.max_memory_size, past_length) + + return state, mems, state_mems, past_length, past_left_context + + def state_update_before( + self, layer: int, state: List[Tensor], past_length: int, past_left_context: int + ): + pre_mems = state[0][layer][self.max_memory_size - past_length :, :, :] + lc_key = state[1][layer][self.left_context - past_left_context :, :, :] + lc_val = state[2][layer][self.left_context - past_left_context :, :, :] + return pre_mems, lc_key, lc_val + + def state_update_after( + self, + layer: int, + state: List[Tensor], + mems: Tensor, + next_key: Tensor, + next_val: Tensor, + mems_list: List[Tensor], + lc_key_list: List[Tensor], + lc_val_list: List[Tensor], + ): + # mems is used for next layer + if layer < self.num_layers - 1: + state_mems = torch.cat([state[0][layer + 1], mems], dim=0) + mems_list.append(state_mems[-self.max_memory_size :, :, :]) + + # when mems pass to next sequence, we need the last memory. when mems + # use for the next layer, we can ignore the last memory + mems = mems[:-1, :, :] + + # note state[1][i] and state[2][i] original length equals to self.left_context + new_k = torch.cat([state[1][layer], next_key], dim=0) + new_v = torch.cat([state[2][layer], next_val], dim=0) + lc_key_list.append(new_k[-self.left_context :, :, :]) + lc_val_list.append(new_v[-self.left_context :, :, :]) + return mems_list, lc_key_list, lc_val_list, mems + + def state_update_after_loop( + self, + state: List[Tensor], + mems_list: List[Tensor], + lc_key_list: List[Tensor], + lc_val_list: List[Tensor], + update_length: int, + ): + state[0] = torch.stack(mems_list, dim=0) + state[1] = torch.stack(lc_key_list, dim=0) + state[2] = torch.stack(lc_val_list, dim=0) + state[3] = state[3] + update_length + return state + + @torch.jit.unused + def forward_mini_batches( + self, input: Tensor, lengths: Tensor, state: Optional[List[Tensor]] = None + ) -> Tuple[Tensor, Tensor, List[Tensor], List[Tensor]]: + T, B, D = input.size() + + # input without right context + seg = input[: T - self.right_context, :, :] + + # get right context blocks + right_context_blocks = self._gen_right_context_padded_input(input) + + mems_list = [] + lc_key_list = [] + lc_val_list = [] + results = self.forward_jit_mini_batch_init(seg, state, False) + state, mems, state_mems, past_length, past_left_context = results + + # relative position embedding + if self.use_rpe: + rpe = self._get_relative_position( + input=input, + max_relative_position=self.max_relative_position, + left_context_length=past_left_context, + past_length=past_length, + is_decoding=False, + ) + else: + rpe = None + + # get attention mask based on seg (not include right context) and available + # left context + attention_mask = self._get_attention_mask(seg, past_length, past_left_context) + mems_list.append(state_mems[-self.max_memory_size :, :, :]) + output = seg + i = 0 + all_outputs = [] + for layer in self.layers: + # In order to make cross stream batching work, mem, 
left context key + # and left context value in the state should always be the same shape. + # We use the past length to track the processed segment number. In this + # way, we take out the essential memory, left context key and left + # context val from the state. After finish the forward for current segment + # we add the new memory, left context key and left context value into the + # staate and trim out the oldest part to keep the shape consistent. + pre_mems, lc_key, lc_val = self.state_update_before( + i, state, past_length, past_left_context + ) + + output, mems, right_context_blocks, next_key, next_val = layer.forward( + input=output, + lengths=lengths, + attention_mask=attention_mask, + mems=mems, + right_context_blocks=right_context_blocks, + pre_mems=pre_mems, + left_context_key=lc_key, + left_context_val=lc_val, + rpe=rpe, + ) + all_outputs.append(output) + mems_list, lc_key_list, lc_val_list, mems = self.state_update_after( + layer=i, + state=state, + mems=mems, + next_key=next_key, + next_val=next_val, + mems_list=mems_list, + lc_key_list=lc_key_list, + lc_val_list=lc_val_list, + ) + + i += 1 + + # update state + update_length = math.ceil((T - self.right_context) / self.segment_size) + state = self.state_update_after_loop( + state=state, + mems_list=mems_list, + lc_key_list=lc_key_list, + lc_val_list=lc_val_list, + update_length=update_length, + ) + + return output, lengths, state, all_outputs + + def forward_jit_test( + self, input: Tensor, lengths: Tensor, state: Optional[List[Tensor]] = None + ) -> Tuple[Tensor, Tensor, List[Tensor]]: + """ + This one simulate sequence encoder forward jit. This is for unit test purpose. + It is not used in training or decoding. Note, extra_right_context is set in + the model. In unit test, input = [utterance, right_context], lengths = + [utterance_length]. + args: + input: input utterance + lengths: utterance input length + state: None here. input is whole utterance + """ + # [TODO] sequence_to_segment has bug in lengths. + seg_src_tokens_lengths = self._gen_segs_right_context(input, lengths) + + seg_enc_tokens_lengths: List[Tuple[Tensor, Tensor]] = [] + state: Optional[List[Tensor]] = None + for seg_src_tokens, seg_src_lengths in seg_src_tokens_lengths: + seg_enc_tokens, seg_enc_lengths, state = self.forward_jit( + input=seg_src_tokens, lengths=seg_src_lengths, state=state + ) + seg_enc_tokens_lengths.append((seg_enc_tokens, seg_enc_lengths)) + + enc_tokens, enc_lengths = segments_to_sequence( + segments=seg_enc_tokens_lengths, time_axis=0 + ) + + state = [] # returns trivial state + + return enc_tokens, enc_lengths, state + + @torch.jit.export + def forward_jit( + self, input: Tensor, lengths: Tensor, state: Optional[List[Tensor]] = None + ) -> Tuple[Tensor, Tensor, List[Tensor]]: + """ + Forward helper for online decoding. + + args: + input: [seg, right_context]. We assume in online we + always padding the right context to the preset right context size. + For the last segment, we may have short segment size, but right + context size is the same as other segments + lengths: utterance input length is the utterance segment length and + right context size + state: [memory, left_context_key, left_context_val]. 
To improve throughput, + in addition to memory, we also cache key and value for left_context in + multihead self-attention + """ + # In online decoding, input = [segment, right_context] + # Lengths = [segment_length, right_context_length] + # so we need strip right context in output + T, B, D = input.size() + rc_str = T - self.right_context + rc_end = T + right_context_blocks = input[rc_str:rc_end, :, :] + seg = input[:rc_str, :, :] + lengths = torch.clamp(lengths - self.right_context, min=0) + mems_list = [] + lc_key_list = [] + lc_val_list = [] + + results = self.forward_jit_mini_batch_init(seg, state, True) + state, mems, state_mems, past_length, past_left_context = results + + # relative position embedding + if self.use_rpe: + rpe = self._get_relative_position( + input=input, + max_relative_position=self.max_relative_position, + left_context_length=past_left_context, + past_length=past_length, + is_decoding=True, + ) + else: + rpe = None + + # memory for first layer. + mems_list.append(state_mems[-self.max_memory_size :, :, :]) + output = seg + i = 0 + for layer in self.layers: + # In order to make cross stream batching work, mem, left context key + # and left context value in the state should always be the same shape. + # We use the past length to track the processed segment number. In this + # way, we take out the essential memory, left context key and left + # context val from the state. After finish the forward for current segment + # we add the new memory, left context key and left context value into the + # staate and trim out the oldest part to keep the shape consistent. + true_mems, lc_key, lc_val = self.state_update_before( + layer=i, + state=state, + past_length=past_length, + past_left_context=past_left_context, + ) + + output, mems, right_context_blocks, next_key, next_val = layer.forward_jit( + input=output, + lengths=lengths, + mems=true_mems, + right_context_blocks=right_context_blocks, + left_context_key=lc_key, + left_context_val=lc_val, + rpe=rpe, + ) + # mems is used for next layer + mems_list, lc_key_list, lc_val_list, _ = self.state_update_after( + layer=i, + state=state, + mems_list=mems_list, + mems=mems, + next_key=next_key, + next_val=next_val, + lc_key_list=lc_key_list, + lc_val_list=lc_val_list, + ) + i += 1 + + # update state + state = self.state_update_after_loop( + state=state, + mems_list=mems_list, + lc_key_list=lc_key_list, + lc_val_list=lc_val_list, + update_length=1, + ) + + return output, lengths, state + + def quantize_(self, params=None): + if params and "per_channel" in params and params["per_channel"]: + qconfig = per_channel_dynamic_qconfig + else: + qconfig = default_dynamic_qconfig + torch.quantization.quantize_dynamic( + self, {torch.nn.Linear: qconfig}, dtype=torch.qint8, inplace=True + ) + return self + + +# ------------------------------------------------------------------------------ +# Emformer encoder for seq2seq model +# This is a wrapper over the original emformer +# ------------------------------------------------------------------------------ +def emformer_encoder(klass): + class SpeechEncoder(klass): + def __init__(self, args): + super().__init__(args) + stride = SpeechEncoder.conv_layer_stride(args) + trf_left_context = args.segment_left_context // stride + trf_right_context = args.segment_right_context // stride + context_config = [trf_left_context, trf_right_context] + self.transformer_layers = nn.ModuleList( + [ + NoSegAugmentedMemoryTransformerEncoderLayer( + input_dim=args.encoder_embed_dim, + 
num_heads=args.encoder_attention_heads, + ffn_dim=args.encoder_ffn_embed_dim, + num_layers=args.encoder_layers, + dropout_in_attn=args.dropout, + dropout_on_attn=args.dropout, + dropout_on_fc1=args.dropout, + dropout_on_fc2=args.dropout, + activation_fn=args.activation_fn, + context_config=context_config, + segment_size=args.segment_length, + max_memory_size=args.max_memory_size, + scaled_init=True, # TODO: use constant for now. + tanh_on_mem=args.amtrf_tanh_on_mem, + ) + ] + ) + + def forward(self, src_tokens, src_lengths): + encoder_out = super().forward(src_tokens, src_lengths) + output = encoder_out["encoder_out"][0] + encoder_padding_masks = encoder_out["encoder_padding_mask"][0] + + # This is because that in the original implementation + # the output didn't consider the last segment as right context. + encoder_padding_masks = encoder_padding_masks[:, : output.size(0)] + + return { + "encoder_out": [output], + "encoder_padding_mask": [encoder_padding_masks], + "encoder_embedding": [], + "encoder_states": [], + "src_tokens": [], + "src_lengths": [], + } + + @staticmethod + def conv_layer_stride(args): + # TODO: make it configurable from the args + return 4 + + SpeechEncoder.__name__ = klass.__name__ + return SpeechEncoder diff --git a/SpeechT5/fairseq/fairseq/models/speech_to_text/s2t_transformer.py b/SpeechT5/fairseq/fairseq/models/speech_to_text/s2t_transformer.py new file mode 100644 index 0000000000000000000000000000000000000000..5c935efaf5ef5fbf03479db6280f60aeeea5e6eb --- /dev/null +++ b/SpeechT5/fairseq/fairseq/models/speech_to_text/s2t_transformer.py @@ -0,0 +1,496 @@ +#!/usr/bin/env python3 + +import logging +import math +from typing import Dict, List, Optional, Tuple +from pathlib import Path + +import torch +import torch.nn as nn +from fairseq import checkpoint_utils, utils +from fairseq.data.data_utils import lengths_to_padding_mask +from fairseq.models import ( + FairseqEncoder, + FairseqEncoderDecoderModel, + register_model, + register_model_architecture, +) +from fairseq.models.transformer import Embedding, TransformerDecoder +from fairseq.modules import ( + FairseqDropout, + LayerNorm, + PositionalEmbedding, + TransformerEncoderLayer, +) +from torch import Tensor + + +logger = logging.getLogger(__name__) + + +class Conv1dSubsampler(nn.Module): + """Convolutional subsampler: a stack of 1D convolution (along temporal + dimension) followed by non-linear activation via gated linear units + (https://arxiv.org/abs/1911.08460) + + Args: + in_channels (int): the number of input channels + mid_channels (int): the number of intermediate channels + out_channels (int): the number of output channels + kernel_sizes (List[int]): the kernel size for each convolutional layer + """ + + def __init__( + self, + in_channels: int, + mid_channels: int, + out_channels: int, + kernel_sizes: List[int] = (3, 3), + ): + super(Conv1dSubsampler, self).__init__() + self.n_layers = len(kernel_sizes) + self.conv_layers = nn.ModuleList( + nn.Conv1d( + in_channels if i == 0 else mid_channels // 2, + mid_channels if i < self.n_layers - 1 else out_channels * 2, + k, + stride=2, + padding=k // 2, + ) + for i, k in enumerate(kernel_sizes) + ) + + def get_out_seq_lens_tensor(self, in_seq_lens_tensor): + out = in_seq_lens_tensor.clone() + for _ in range(self.n_layers): + out = ((out.float() - 1) / 2 + 1).floor().long() + return out + + def forward(self, src_tokens, src_lengths): + bsz, in_seq_len, _ = src_tokens.size() # B x T x (C x D) + x = src_tokens.transpose(1, 2).contiguous() # -> B x (C x D) x T + for 
conv in self.conv_layers: + x = conv(x) + x = nn.functional.glu(x, dim=1) + _, _, out_seq_len = x.size() + x = x.transpose(1, 2).transpose(0, 1).contiguous() # -> T x B x (C x D) + return x, self.get_out_seq_lens_tensor(src_lengths) + + +@register_model("s2t_transformer") +class S2TTransformerModel(FairseqEncoderDecoderModel): + """Adapted Transformer model (https://arxiv.org/abs/1706.03762) for + speech-to-text tasks. The Transformer encoder/decoder remains the same. + A trainable input subsampler is prepended to the Transformer encoder to + project inputs into the encoder dimension as well as downsample input + sequence for computational efficiency.""" + + def __init__(self, encoder, decoder): + super().__init__(encoder, decoder) + + @staticmethod + def add_args(parser): + """Add model-specific arguments to the parser.""" + # input + parser.add_argument( + "--conv-kernel-sizes", + type=str, + metavar="N", + help="kernel sizes of Conv1d subsampling layers", + ) + parser.add_argument( + "--conv-channels", + type=int, + metavar="N", + help="# of channels in Conv1d subsampling layers", + ) + # Transformer + parser.add_argument( + "--activation-fn", + type=str, + default="relu", + choices=utils.get_available_activation_fns(), + help="activation function to use", + ) + parser.add_argument( + "--dropout", type=float, metavar="D", help="dropout probability" + ) + parser.add_argument( + "--attention-dropout", + type=float, + metavar="D", + help="dropout probability for attention weights", + ) + parser.add_argument( + "--activation-dropout", + "--relu-dropout", + type=float, + metavar="D", + help="dropout probability after activation in FFN.", + ) + parser.add_argument( + "--encoder-embed-dim", + type=int, + metavar="N", + help="encoder embedding dimension", + ) + parser.add_argument( + "--encoder-ffn-embed-dim", + type=int, + metavar="N", + help="encoder embedding dimension for FFN", + ) + parser.add_argument( + "--encoder-layers", type=int, metavar="N", help="num encoder layers" + ) + parser.add_argument( + "--encoder-attention-heads", + type=int, + metavar="N", + help="num encoder attention heads", + ) + parser.add_argument( + "--encoder-normalize-before", + action="store_true", + help="apply layernorm before each encoder block", + ) + parser.add_argument( + "--decoder-embed-dim", + type=int, + metavar="N", + help="decoder embedding dimension", + ) + parser.add_argument( + "--decoder-ffn-embed-dim", + type=int, + metavar="N", + help="decoder embedding dimension for FFN", + ) + parser.add_argument( + "--decoder-layers", type=int, metavar="N", help="num decoder layers" + ) + parser.add_argument( + "--decoder-attention-heads", + type=int, + metavar="N", + help="num decoder attention heads", + ) + parser.add_argument( + "--decoder-normalize-before", + action="store_true", + help="apply layernorm before each decoder block", + ) + parser.add_argument( + "--share-decoder-input-output-embed", + action="store_true", + help="share decoder input and output embeddings", + ) + parser.add_argument( + "--layernorm-embedding", + action="store_true", + help="add layernorm to embedding", + ) + parser.add_argument( + "--no-scale-embedding", + action="store_true", + help="if True, dont scale embeddings", + ) + parser.add_argument( + "--load-pretrained-encoder-from", + type=str, + metavar="STR", + help="model to take encoder weights from (for initialization)", + ) + parser.add_argument( + '--encoder-freezing-updates', + type=int, + metavar='N', + help='freeze encoder for first N updates' + ) + + @classmethod + def 
build_encoder(cls, args): + encoder = S2TTransformerEncoder(args) + pretraining_path = getattr(args, "load_pretrained_encoder_from", None) + if pretraining_path is not None: + if not Path(pretraining_path).exists(): + logger.warning( + f"skipped pretraining because {pretraining_path} does not exist" + ) + else: + encoder = checkpoint_utils.load_pretrained_component_from_model( + component=encoder, checkpoint=pretraining_path + ) + logger.info(f"loaded pretrained encoder from: {pretraining_path}") + return encoder + + @classmethod + def build_decoder(cls, args, task, embed_tokens): + return TransformerDecoderScriptable(args, task.target_dictionary, embed_tokens) + + @classmethod + def build_model(cls, args, task): + """Build a new model instance.""" + + # make sure all arguments are present in older models + base_architecture(args) + + def build_embedding(dictionary, embed_dim): + num_embeddings = len(dictionary) + padding_idx = dictionary.pad() + return Embedding(num_embeddings, embed_dim, padding_idx) + + decoder_embed_tokens = build_embedding( + task.target_dictionary, args.decoder_embed_dim + ) + encoder = cls.build_encoder(args) + decoder = cls.build_decoder(args, task, decoder_embed_tokens) + return cls(encoder, decoder) + + def get_normalized_probs( + self, + net_output: Tuple[Tensor, Optional[Dict[str, List[Optional[Tensor]]]]], + log_probs: bool, + sample: Optional[Dict[str, Tensor]] = None, + ): + # net_output['encoder_out'] is a (B, T, D) tensor + lprobs = self.get_normalized_probs_scriptable(net_output, log_probs, sample) + lprobs.batch_first = True + return lprobs + + def forward(self, src_tokens, src_lengths, prev_output_tokens): + """ + The forward method inherited from the base class has a **kwargs + argument in its input, which is not supported in torchscript. This + method overwrites the forward method definition without **kwargs. 
+ """ + encoder_out = self.encoder(src_tokens=src_tokens, src_lengths=src_lengths) + decoder_out = self.decoder( + prev_output_tokens=prev_output_tokens, encoder_out=encoder_out + ) + return decoder_out + + +class S2TTransformerEncoder(FairseqEncoder): + """Speech-to-text Transformer encoder that consists of input subsampler and + Transformer encoder.""" + + def __init__(self, args): + super().__init__(None) + + self.encoder_freezing_updates = args.encoder_freezing_updates + self.num_updates = 0 + + self.dropout_module = FairseqDropout( + p=args.dropout, module_name=self.__class__.__name__ + ) + self.embed_scale = math.sqrt(args.encoder_embed_dim) + if args.no_scale_embedding: + self.embed_scale = 1.0 + self.padding_idx = 1 + + self.subsample = Conv1dSubsampler( + args.input_feat_per_channel * args.input_channels, + args.conv_channels, + args.encoder_embed_dim, + [int(k) for k in args.conv_kernel_sizes.split(",")], + ) + + self.embed_positions = PositionalEmbedding( + args.max_source_positions, args.encoder_embed_dim, self.padding_idx + ) + + self.transformer_layers = nn.ModuleList( + [TransformerEncoderLayer(args) for _ in range(args.encoder_layers)] + ) + if args.encoder_normalize_before: + self.layer_norm = LayerNorm(args.encoder_embed_dim) + else: + self.layer_norm = None + + def _forward(self, src_tokens, src_lengths): + x, input_lengths = self.subsample(src_tokens, src_lengths) + x = self.embed_scale * x + + encoder_padding_mask = lengths_to_padding_mask(input_lengths) + positions = self.embed_positions(encoder_padding_mask).transpose(0, 1) + x += positions + x = self.dropout_module(x) + + for layer in self.transformer_layers: + x = layer(x, encoder_padding_mask) + + if self.layer_norm is not None: + x = self.layer_norm(x) + + return { + "encoder_out": [x], # T x B x C + "encoder_padding_mask": [encoder_padding_mask] if encoder_padding_mask.any() else [], # B x T + "encoder_embedding": [], # B x T x C + "encoder_states": [], # List[T x B x C] + "src_tokens": [], + "src_lengths": [], + } + + def forward(self, src_tokens, src_lengths): + if self.num_updates < self.encoder_freezing_updates: + with torch.no_grad(): + x = self._forward(src_tokens, src_lengths) + else: + x = self._forward(src_tokens, src_lengths) + return x + + def reorder_encoder_out(self, encoder_out, new_order): + new_encoder_out = ( + [] if len(encoder_out["encoder_out"]) == 0 + else [x.index_select(1, new_order) for x in encoder_out["encoder_out"]] + ) + + new_encoder_padding_mask = ( + [] if len(encoder_out["encoder_padding_mask"]) == 0 + else [x.index_select(0, new_order) for x in encoder_out["encoder_padding_mask"]] + ) + + new_encoder_embedding = ( + [] if len(encoder_out["encoder_embedding"]) == 0 + else [x.index_select(0, new_order) for x in encoder_out["encoder_embedding"]] + ) + + encoder_states = encoder_out["encoder_states"] + if len(encoder_states) > 0: + for idx, state in enumerate(encoder_states): + encoder_states[idx] = state.index_select(1, new_order) + + return { + "encoder_out": new_encoder_out, # T x B x C + "encoder_padding_mask": new_encoder_padding_mask, # B x T + "encoder_embedding": new_encoder_embedding, # B x T x C + "encoder_states": encoder_states, # List[T x B x C] + "src_tokens": [], # B x T + "src_lengths": [], # B x 1 + } + + def set_num_updates(self, num_updates): + super().set_num_updates(num_updates) + self.num_updates = num_updates + + +class TransformerDecoderScriptable(TransformerDecoder): + def extract_features( + self, + prev_output_tokens, + encoder_out: Optional[Dict[str, 
List[Tensor]]] = None, + incremental_state: Optional[Dict[str, Dict[str, Optional[Tensor]]]] = None, + full_context_alignment: bool = False, + alignment_layer: Optional[int] = None, + alignment_heads: Optional[int] = None, + ): + # call scriptable method from parent class + x, _ = self.extract_features_scriptable( + prev_output_tokens, + encoder_out, + incremental_state, + full_context_alignment, + alignment_layer, + alignment_heads, + ) + return x, None + + +@register_model_architecture(model_name="s2t_transformer", arch_name="s2t_transformer") +def base_architecture(args): + args.encoder_freezing_updates = getattr(args, "encoder_freezing_updates", 0) + # Convolutional subsampler + args.conv_kernel_sizes = getattr(args, "conv_kernel_sizes", "5,5") + args.conv_channels = getattr(args, "conv_channels", 1024) + # Transformer + args.encoder_embed_dim = getattr(args, "encoder_embed_dim", 512) + args.encoder_ffn_embed_dim = getattr(args, "encoder_ffn_embed_dim", 2048) + args.encoder_layers = getattr(args, "encoder_layers", 12) + args.encoder_attention_heads = getattr(args, "encoder_attention_heads", 8) + args.encoder_normalize_before = getattr(args, "encoder_normalize_before", True) + args.decoder_embed_dim = getattr(args, "decoder_embed_dim", args.encoder_embed_dim) + args.decoder_ffn_embed_dim = getattr( + args, "decoder_ffn_embed_dim", args.encoder_ffn_embed_dim + ) + args.decoder_layers = getattr(args, "decoder_layers", 6) + args.decoder_attention_heads = getattr(args, "decoder_attention_heads", 8) + args.decoder_normalize_before = getattr(args, "decoder_normalize_before", True) + args.decoder_learned_pos = getattr(args, "decoder_learned_pos", False) + args.dropout = getattr(args, "dropout", 0.1) + args.attention_dropout = getattr(args, "attention_dropout", args.dropout) + args.activation_dropout = getattr(args, "activation_dropout", args.dropout) + args.activation_fn = getattr(args, "activation_fn", "relu") + args.adaptive_softmax_cutoff = getattr(args, "adaptive_softmax_cutoff", None) + args.adaptive_softmax_dropout = getattr(args, "adaptive_softmax_dropout", 0) + args.share_decoder_input_output_embed = getattr( + args, "share_decoder_input_output_embed", False + ) + args.no_token_positional_embeddings = getattr( + args, "no_token_positional_embeddings", False + ) + args.adaptive_input = getattr(args, "adaptive_input", False) + args.decoder_layerdrop = getattr(args, "decoder_layerdrop", 0.0) + args.decoder_output_dim = getattr( + args, "decoder_output_dim", args.decoder_embed_dim + ) + args.decoder_input_dim = getattr(args, "decoder_input_dim", args.decoder_embed_dim) + args.no_scale_embedding = getattr(args, "no_scale_embedding", False) + args.quant_noise_pq = getattr(args, "quant_noise_pq", 0) + + +@register_model_architecture("s2t_transformer", "s2t_transformer_s") +def s2t_transformer_s(args): + args.encoder_embed_dim = getattr(args, "encoder_embed_dim", 256) + args.encoder_ffn_embed_dim = getattr(args, "encoder_ffn_embed_dim", 256 * 8) + args.encoder_attention_heads = getattr(args, "encoder_attention_heads", 4) + args.decoder_attention_heads = getattr(args, "decoder_attention_heads", 4) + args.dropout = getattr(args, "dropout", 0.1) + base_architecture(args) + + +@register_model_architecture("s2t_transformer", "s2t_transformer_xs") +def s2t_transformer_xs(args): + args.encoder_layers = getattr(args, "encoder_layers", 6) + args.decoder_layers = getattr(args, "decoder_layers", 3) + args.encoder_ffn_embed_dim = getattr(args, "encoder_ffn_embed_dim", 256 * 4) + args.dropout = 
getattr(args, "dropout", 0.3) + s2t_transformer_s(args) + + +@register_model_architecture("s2t_transformer", "s2t_transformer_sp") +def s2t_transformer_sp(args): + args.encoder_layers = getattr(args, "encoder_layers", 16) + s2t_transformer_s(args) + + +@register_model_architecture("s2t_transformer", "s2t_transformer_m") +def s2t_transformer_m(args): + args.encoder_embed_dim = getattr(args, "encoder_embed_dim", 512) + args.encoder_ffn_embed_dim = getattr(args, "encoder_ffn_embed_dim", 512 * 4) + args.encoder_attention_heads = getattr(args, "encoder_attention_heads", 8) + args.decoder_attention_heads = getattr(args, "decoder_attention_heads", 8) + args.dropout = getattr(args, "dropout", 0.15) + base_architecture(args) + + +@register_model_architecture("s2t_transformer", "s2t_transformer_mp") +def s2t_transformer_mp(args): + args.encoder_layers = getattr(args, "encoder_layers", 16) + s2t_transformer_m(args) + + +@register_model_architecture("s2t_transformer", "s2t_transformer_l") +def s2t_transformer_l(args): + args.encoder_embed_dim = getattr(args, "encoder_embed_dim", 1024) + args.encoder_ffn_embed_dim = getattr(args, "encoder_ffn_embed_dim", 1024 * 4) + args.encoder_attention_heads = getattr(args, "encoder_attention_heads", 16) + args.decoder_attention_heads = getattr(args, "decoder_attention_heads", 16) + args.dropout = getattr(args, "dropout", 0.2) + base_architecture(args) + + +@register_model_architecture("s2t_transformer", "s2t_transformer_lp") +def s2t_transformer_lp(args): + args.encoder_layers = getattr(args, "encoder_layers", 16) + s2t_transformer_l(args) diff --git a/SpeechT5/fairseq/fairseq/models/speech_to_text/utils.py b/SpeechT5/fairseq/fairseq/models/speech_to_text/utils.py new file mode 100644 index 0000000000000000000000000000000000000000..573f8537c9b5940f3eff1fef5e732c6ae7e7fdc0 --- /dev/null +++ b/SpeechT5/fairseq/fairseq/models/speech_to_text/utils.py @@ -0,0 +1,564 @@ +#!/usr/bin/env python3 +# Copyright (c) 2017-present, Facebook, Inc. +# All rights reserved. +# +# This source code is licensed under the license found in the LICENSE file in +# the root directory of this source tree. An additional grant of patent rights +# can be found in the PATENTS file in the same directory. 
+ + +import logging +from collections.abc import Iterable +from itertools import repeat +from typing import List, Optional, Tuple + +import torch +from torch import Tensor + + +# ------------------------------------------------------------------------------ +# assert_equal() +# ------------------------------------------------------------------------------ + + +def assert_equal(value1, value2, name1=None, name2=None): + """Asserts two values are equal otherwise raise an error.""" + + str_name1 = "" if name1 is None else "{} ".format(name1) + str_name2 = "" if name2 is None else "{} ".format(name2) + if value1 != value2: + str_value1 = "{}" if name1 is None else "({})" + str_value1 = str_value1.format(value1) + str_value2 = "{}" if name2 is None else "({})" + str_value2 = str_value2.format(value2) + raise ValueError( + "Expected {}{} == {}{}".format(str_name1, str_value1, str_name2, str_value2) + ) + + +def fill_config(config, key, value): + if value is not None: + if key not in config or config[key] is None: + config[key] = value + assert_equal(value, config[key], "value", f'config["{key}"]') + + +# ------------------------------------------------------------------------------ +# check_and_return_expected() +# ------------------------------------------------------------------------------ + + +def check_and_return_expected(value, undefined_value, expected_value, name=None): + """ + Return the expected value while checking if the given value is undefined or + equal to the expected value. + """ + if (undefined_value is None and value is None) or (undefined_value == value): + return expected_value + if value != expected_value: + str_name = "" if name is None else "{} ".format(name) + str_value = "{}" if name is None else "({})" + str_value = str_value.format(value) + raise ValueError( + "Expected {}{} == {}".format(str_name, str_value, expected_value) + ) + return expected_value + + +# ------------------------------------------------------------------------------ +# get_time_axis() +# ------------------------------------------------------------------------------ + + +def get_time_axis(layout): + """ + Extract the time axis from the layout, for example for breaking sequence into + segments. + """ + if layout in ["TB", "TBD"]: + return 0 + if layout in ["BT", "BTD"]: + return 1 + if layout in ["BCTD"]: + return 2 + raise ValueError("Unsupported layout = {}".format(layout)) + + +# ------------------------------------------------------------------------------ +# get_batch_axis() +# ------------------------------------------------------------------------------ + + +def get_batch_axis(layout): + """ + Extract the batch axis from the layout + """ + if layout in ["TB", "TBD"]: + return 1 + if layout in ["BT", "BTD", "BCTD"]: + return 0 + raise ValueError("Unsupported layout = {}".format(layout)) + + +# ------------------------------------------------------------------------------ +# monotonically_increasing_and_bounded() +# ------------------------------------------------------------------------------ + + +def monotonically_increasing_and_bounded(iterable, min=None, max=None): + """ + Check if the elements in the given iterable are monotonically increasing and + bounded by upper/lower bounds. 
+ """ + if not isinstance(iterable, Iterable): + raise TypeError( + "Expected iterable to be of type Iterable, got ({})".format( + iterable.__class__.__name__ + ) + ) + for i in range(len(iterable)): + if min is not None and iterable[i] < min: + return False + if max is not None and iterable[i] > max: + return False + if i > 0 and iterable[i] <= iterable[i - 1]: + return False + return True + + +# ------------------------------------------------------------------------------ +# to_pair() +# ------------------------------------------------------------------------------ + + +def to_pair(value, name): + """Make a pair (of type tuple) of given value.""" + if isinstance(value, Iterable): + if len(value) != 2: + raise ValueError( + "Expected `{}` to have exactly 2 elements, got: ({})".format( + name, value + ) + ) + return value + return tuple(repeat(value, 2)) + + +# ------------------------------------------------------------------------------ +# infer_conv_output_attrs() +# ------------------------------------------------------------------------------ + + +# TODO(cfyeh): figure out if we can get `output_dim` without calling the module. +def infer_conv_output_attrs( + module, input_channels, input_dim, batch_size=1, max_length=8 +): + """Get output attributes of a module with input.""" + input = torch.randn(batch_size, input_channels, max_length, input_dim) + output = module(input) + output_channels = output.shape[1] + output_dim = output.shape[-1] + return output_channels, output_dim + + +# ------------------------------------------------------------------------------ +# NoOp +# ------------------------------------------------------------------------------ + + +class NoOp(torch.nn.Module): + """ + NoOp simply passes the input as the output. + """ + + def __init__(self): + super().__init__() + + def forward(self, input: Tensor) -> Tensor: + return input + + +# ------------------------------------------------------------------------------ +# Permute: a torch.nn.Module applies permutation on the input tensor. +# ------------------------------------------------------------------------------ + + +class Permute(torch.nn.Module): + def __init__(self, dims): + super().__init__() + self.dims = dims + + def forward(self, input: Tensor) -> Tensor: + return input.permute(self.dims).contiguous() + + +# ------------------------------------------------------------------------------ +# lengths_to_padding_mask() +# ------------------------------------------------------------------------------ + + +def lengths_to_padding_mask(lengths: Tensor) -> Tensor: + """Convert lengths of shape (B, ) to padding mask.""" + batch_size = lengths.shape[0] + max_length = int(torch.max(lengths).item()) + padding_mask = torch.arange( # [0, ..., T-1] + max_length, device=lengths.device, dtype=lengths.dtype + ).expand(batch_size, max_length) >= lengths.unsqueeze(1) + + return padding_mask + + +# ------------------------------------------------------------------------------ +# lengths_to_attention_mask() +# ------------------------------------------------------------------------------ + + +def lengths_to_attention_mask( + lengths: Tensor, + left_context: Optional[int] = None, + right_context: Optional[int] = None, +) -> Optional[Tensor]: + """ + Generate attention mask based on (lengths, left_context, right_context). + left_context is None means unlimited left context. + right_context is None means unlimited right context. 
+ """ + + if left_context is None and right_context is None: + return None + + max_length = int(torch.max(lengths).item()) + + # For example, with `max_length` == 5, + # indices = tensor([ + # [ 0, 1, 2, 3, 4, 5], + # [-1, 0, 1, 2, 3, 4], + # [-2, -1, 0, 1, 2, 3], + # [-3, -2, -1, 0, 1, 2], + # [-4, -3, -2, -1, 0, 1], + # [-5, -4, -3, -2, -1, 0], + # ]) + + # In some cases the second torch.arange is created on cpu which causes a + # failure. Adding the device option to guard against it. + indices = torch.arange( + max_length, device=lengths.device, dtype=lengths.dtype + ).expand(max_length, max_length) - torch.arange( + max_length, device=lengths.device + ).view( + max_length, -1 + ) + + # For example, with `max_length` == 5, + # bool_mask = tensor([ + # [True, True, True, True, True], + # [True, True, True, True, True], + # [True, True, True, True, True], + # [True, True, True, True, True], + # [True, True, True, True, True], + # ]) + bool_mask = ( + torch.tensor([True]).to(device=lengths.device).expand(max_length, max_length) + ) + + # For example, with `max_length` == 5, left_context == 2 + # left_mask = tensor([ + # [ True, True, True, True, True], + # [ True, True, True, True, True], + # [ True, True, True, True, True], + # [False, True, True, True, True], + # [False, False, True, True, True], + # ]) + if left_context is not None: + left_mask = indices >= -left_context + bool_mask = bool_mask & left_mask + + # For example, with `max_length` == 5, right_context == 1 + # right_mask = tensor([ + # [True, True, False, False, False], + # [True, True, True, False, False], + # [True, True, True, True, False], + # [True, True, True, True, True], + # [True, True, True, True, True], + # ]) + if right_context is not None: + right_mask = indices <= right_context + bool_mask = bool_mask & right_mask + + bool_mask = (~bool_mask).to(device=lengths.device) + return bool_mask + + +# ------------------------------------------------------------------------------ +# infer_output_norm() +# ------------------------------------------------------------------------------ + + +def infer_output_norm(module, output_norm=None): + """ + Infer the output norm (string and module) needed on the module gvien desired + output normalization. + """ + if output_norm == module.output_norm(): + # output_norm already matches module.output_norm(). 
+ return (None, NoOp()) + + if output_norm is None and module.output_norm() is not None: + logger = logging.getLogger("infer_output_norm()") + logger.warning( + "trying to set output_norm ({}) ".format(output_norm) + + "but got module.output_norm() ({}), ".format(module.output_norm()) + + "the combined output_norm() will be ({})".format(module.output_norm()) + ) + return (None, NoOp()) + + if output_norm == "log_softmax": + if module.output_norm() is not None: + raise ValueError( + "incompatible output_norm ({}) ".format(output_norm) + + "and module.output_norm() ({})".format(module.output_norm()) + ) + else: + return ("log_softmax", torch.nn.LogSoftmax(dim=-1)) + + if output_norm == "softmax": + if module.output_norm() is not None: + raise ValueError( + "incompatible output_norm ({}) ".format(output_norm) + + "and module.output_norm() ({})".format(module.output_norm()) + ) + else: + return ("softmax", torch.nn.Softmax(dim=-1)) + + raise ValueError( + "output_norm ({}) not in ".format(output_norm) + + "supported list = [None, softmax, log_softmax]" + ) + + +# ------------------------------------------------------------------------------ +# infer_channels_from_layout() +# ------------------------------------------------------------------------------ + + +def infer_channels_from_layout(layout, channels): + """Extract the number of channels from the layout.""" + if layout in ("TBD", "BTD"): + if channels is not None and channels != 1: + raise ValueError( + "Expected channels ({}) to be 1 for layout = {}".format( + channels, layout + ) + ) + if channels is None: + return 1 + return channels + + +# ------------------------------------------------------------------------------ +# pad_sequence() +# ------------------------------------------------------------------------------ + + +@torch.jit.export +def pad_sequence( + sequence: Tensor, + time_axis: int, + extra_left_context: int = 0, + extra_right_context: int = 0, +) -> Tensor: + """Pad extra left/right contexts to the sequence.""" + + if extra_left_context == 0 and extra_right_context == 0: + return sequence + + tensors_to_concat = [] + + if extra_left_context: + size = (extra_left_context,) + fill_value = 0 + indices = torch.full( + size=size, + fill_value=fill_value, + dtype=torch.long, + device=sequence.device, + ) + left_padding = torch.index_select(sequence, time_axis, indices) + tensors_to_concat.append(left_padding) + + tensors_to_concat.append(sequence) + + # NOTE(cfyeh): for efficiency reason we pad 0 instead of the last frame for + # extra right contexts. 
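+ # (The extra left context above, in contrast, replicates frame 0 of the
+ # sequence via index_select with an all-zero index tensor.)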
+ if extra_right_context: + size = list(sequence.shape) + size[time_axis] = extra_right_context + right_padding = torch.zeros(size, dtype=sequence.dtype, device=sequence.device) + tensors_to_concat.append(right_padding) + + padded_sequence = torch.cat(tensors_to_concat, dim=time_axis) + return padded_sequence + + +# ------------------------------------------------------------------------------ +# sequence_to_segments() +# ------------------------------------------------------------------------------ + + +@torch.jit.export +def sequence_to_segments( + sequence: Tensor, + time_axis: int, + lengths: Tensor, + segment_size: Optional[int] = None, + extra_left_context: int = 0, + extra_right_context: int = 0, +) -> List[Tuple[Tensor, Tensor]]: + """Breaks sequence into segments.""" + + sequence = pad_sequence( + sequence=sequence, + time_axis=time_axis, + extra_left_context=extra_left_context, + extra_right_context=extra_right_context, + ) + + lengths = lengths + extra_left_context + extra_right_context + + segments: List[Tuple[Tensor, Tensor]] = [] + + if segment_size is None: + segments.append((sequence, lengths)) + return segments + + offset = 0 + end = sequence.shape[time_axis] + step = segment_size + size = extra_left_context + segment_size + extra_right_context + + while offset + extra_left_context + extra_right_context < end: + clamped_size = min(size, end - offset) + segment_lengths = torch.clamp(lengths - offset, min=0, max=clamped_size) + indices = torch.arange( + start=offset, + end=(offset + clamped_size), + step=1, + dtype=torch.long, + device=sequence.device, + ) + segment_tensor = torch.index_select(sequence, time_axis, indices) + segments.append((segment_tensor, segment_lengths)) + offset = offset + step + + return segments + + +# ------------------------------------------------------------------------------ +# segments_to_sequence() +# ------------------------------------------------------------------------------ + + +@torch.jit.export +def segments_to_sequence( + segments: List[Tuple[Tensor, Tensor]], time_axis: int +) -> Tuple[Tensor, Tensor]: + """Concatenate segments into a full sequence.""" + if len(segments) == 1: + return segments[0] + + tensors_to_concat: List[Tensor] = [] + lengths_to_stack: List[Tensor] = [] + + for tensor, lengths in segments: + tensors_to_concat.append(tensor) + lengths_to_stack.append(lengths) + + sequence = torch.cat(tensors_to_concat, dim=time_axis) + lengths = torch.stack(lengths_to_stack, dim=0) + lengths = torch.sum(lengths, dim=0) + + return sequence, lengths + + +def lengths_to_encoder_padding_mask(lengths, batch_first: bool = False): + """ + convert lengths (a 1-D Long/Int tensor) to 2-D binary tensor + + Args: + lengths: a (B, )-shaped tensor + batch_first: whether to return a (B, T) tensor + + Return: + max_length: maximum length of B sequences + encoder_padding_mask: a (max_length, B) binary mask, where + [t, b] = False for t < lengths[b] and True otherwise + + TODO: + kernelize this function if benchmarking shows this function is slow + """ + max_lengths = torch.max(lengths).item() + bsz = lengths.size(0) + encoder_padding_mask = torch.arange( + max_lengths + ).to( # a (T, ) tensor with [0, ..., T-1] + lengths.device + ).view( # move to the right device + 1, max_lengths + ).expand( # reshape to (1, T)-shaped tensor + bsz, -1 + ) > lengths.view( # expand to (B, T)-shaped tensor + bsz, 1 + ).expand( + -1, max_lengths + ) + if not batch_first: + return encoder_padding_mask.t(), max_lengths + else: + return encoder_padding_mask, max_lengths 
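+# ------------------------------------------------------------------------------
+# Usage sketch (not part of the original file): round-tripping a tensor through
+# sequence_to_segments() / segments_to_sequence(). The shapes, the "TBD" layout
+# (time_axis=0) and the segment size below are illustrative assumptions only.
+# ------------------------------------------------------------------------------
+#
+#   seq = torch.randn(10, 2, 4)             # T x B x D
+#   lengths = torch.tensor([10, 7])
+#   segs = sequence_to_segments(
+#       sequence=seq, time_axis=0, lengths=lengths, segment_size=4
+#   )                                        # 3 segments of 4, 4 and 2 frames
+#   rebuilt, rebuilt_lengths = segments_to_sequence(segments=segs, time_axis=0)
+#   # with zero extra left/right context, `rebuilt` equals `seq` and
+#   # `rebuilt_lengths` equals `lengths`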
+ + +# ------------------------------------------------------------------------------ +# attention suppression +# ------------------------------------------------------------------------------ + + +def attention_suppression(attention_weights: Tensor, scale: float): + # B, H, qlen, klen -> B, H, qlen, 1 + attention_prob = torch.nn.functional.softmax(attention_weights.float(), dim=-1) + attention_nozeros = attention_prob.to(torch.bool) + nozeros_sum = torch.sum(attention_nozeros.to(torch.float), dim=-1, keepdim=True) + + # For very sparse situation, we need get round about 0s + key_sum = torch.sum(attention_prob, dim=-1, keepdim=True) + + # nozeros_sum should > 1 + key_mean = key_sum / (nozeros_sum + 1e-8) + + # std calculation + dis = (attention_prob - key_mean) * (attention_prob - key_mean) + + # if attention_prob[i] < threshold, then dis_masked[i] = 0; for all i + dis_masked = torch.where( + attention_nozeros, dis, attention_prob.new_zeros(attention_prob.size()) + ) + + key_var = torch.sum(dis_masked, dim=-1, keepdim=True) + key_var = key_var / (nozeros_sum - 1.0 + 1e-8) + key_std = torch.sqrt(key_var) + key_thread = key_mean - scale * key_std + + # if attention_prob[i] >= key_thread, then attention_prob[i] + # , otherwise "-inf" + inf_tensor = attention_prob.new_zeros(attention_prob.size()).detach() + inf_tensor[:] = float("-inf") + attention_weights_float = torch.where( + attention_prob < key_thread, + inf_tensor, + attention_weights.float(), + ) + + return attention_weights_float.type_as(attention_weights) + + +def layer_norm_backward_hook(module, grad_input, grad_output, clamp_value): + return tuple(torch.clamp(v, min=-clamp_value, max=clamp_value) for v in grad_input) diff --git a/SpeechT5/fairseq/fairseq/models/transformer.py b/SpeechT5/fairseq/fairseq/models/transformer.py new file mode 100644 index 0000000000000000000000000000000000000000..f4f6bea27bb4c021aaea33e86f5e481edbb3facc --- /dev/null +++ b/SpeechT5/fairseq/fairseq/models/transformer.py @@ -0,0 +1,1187 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +import math +from typing import Any, Dict, List, Optional, Tuple + +import torch +import torch.nn as nn +from fairseq import utils +from fairseq.distributed import fsdp_wrap +from fairseq.models import ( + FairseqEncoder, + FairseqEncoderDecoderModel, + FairseqIncrementalDecoder, + register_model, + register_model_architecture, +) +from fairseq.modules import ( + AdaptiveSoftmax, + BaseLayer, + FairseqDropout, + LayerDropModuleList, + LayerNorm, + PositionalEmbedding, + SinusoidalPositionalEmbedding, + TransformerDecoderLayer, + TransformerEncoderLayer, +) +from fairseq.modules.checkpoint_activations import checkpoint_wrapper +from fairseq.modules.quant_noise import quant_noise as apply_quant_noise_ +from torch import Tensor + + +DEFAULT_MAX_SOURCE_POSITIONS = 1024 +DEFAULT_MAX_TARGET_POSITIONS = 1024 + + +DEFAULT_MIN_PARAMS_TO_WRAP = int(1e8) + + +@register_model("transformer") +class TransformerModel(FairseqEncoderDecoderModel): + """ + Transformer model from `"Attention Is All You Need" (Vaswani, et al, 2017) + <https://arxiv.org/abs/1706.03762>`_. + + Args: + encoder (TransformerEncoder): the encoder + decoder (TransformerDecoder): the decoder + + The Transformer model provides the following named architectures and + command-line arguments: + + .. 
argparse:: + :ref: fairseq.models.transformer_parser + :prog: + """ + + @classmethod + def hub_models(cls): + # fmt: off + + def moses_subword(path): + return { + 'path': path, + 'tokenizer': 'moses', + 'bpe': 'subword_nmt', + } + + def moses_fastbpe(path): + return { + 'path': path, + 'tokenizer': 'moses', + 'bpe': 'fastbpe', + } + + def spm(path): + return { + 'path': path, + 'bpe': 'sentencepiece', + 'tokenizer': 'space', + } + + return { + 'transformer.wmt14.en-fr': moses_subword('https://dl.fbaipublicfiles.com/fairseq/models/wmt14.en-fr.joined-dict.transformer.tar.bz2'), + 'transformer.wmt16.en-de': 'https://dl.fbaipublicfiles.com/fairseq/models/wmt16.en-de.joined-dict.transformer.tar.bz2', + 'transformer.wmt18.en-de': moses_subword('https://dl.fbaipublicfiles.com/fairseq/models/wmt18.en-de.ensemble.tar.gz'), + 'transformer.wmt19.en-de': moses_fastbpe('https://dl.fbaipublicfiles.com/fairseq/models/wmt19.en-de.joined-dict.ensemble.tar.gz'), + 'transformer.wmt19.en-ru': moses_fastbpe('https://dl.fbaipublicfiles.com/fairseq/models/wmt19.en-ru.ensemble.tar.gz'), + 'transformer.wmt19.de-en': moses_fastbpe('https://dl.fbaipublicfiles.com/fairseq/models/wmt19.de-en.joined-dict.ensemble.tar.gz'), + 'transformer.wmt19.ru-en': moses_fastbpe('https://dl.fbaipublicfiles.com/fairseq/models/wmt19.ru-en.ensemble.tar.gz'), + 'transformer.wmt19.en-de.single_model': moses_fastbpe('https://dl.fbaipublicfiles.com/fairseq/models/wmt19.en-de.joined-dict.single_model.tar.gz'), + 'transformer.wmt19.en-ru.single_model': moses_fastbpe('https://dl.fbaipublicfiles.com/fairseq/models/wmt19.en-ru.single_model.tar.gz'), + 'transformer.wmt19.de-en.single_model': moses_fastbpe('https://dl.fbaipublicfiles.com/fairseq/models/wmt19.de-en.joined-dict.single_model.tar.gz'), + 'transformer.wmt19.ru-en.single_model': moses_fastbpe('https://dl.fbaipublicfiles.com/fairseq/models/wmt19.ru-en.single_model.tar.gz'), + 'transformer.wmt20.en-ta': spm('https://dl.fbaipublicfiles.com/fairseq/models/wmt20.en-ta.single.tar.gz'), + 'transformer.wmt20.en-iu.news': spm('https://dl.fbaipublicfiles.com/fairseq/models/wmt20.en-iu.news.single.tar.gz'), + 'transformer.wmt20.en-iu.nh': spm('https://dl.fbaipublicfiles.com/fairseq/models/wmt20.en-iu.nh.single.tar.gz'), + 'transformer.wmt20.ta-en': spm('https://dl.fbaipublicfiles.com/fairseq/models/wmt20.ta-en.single.tar.gz'), + 'transformer.wmt20.iu-en.news': spm('https://dl.fbaipublicfiles.com/fairseq/models/wmt20.iu-en.news.single.tar.gz'), + 'transformer.wmt20.iu-en.nh': spm('https://dl.fbaipublicfiles.com/fairseq/models/wmt20.iu-en.nh.single.tar.gz'), + 'transformer.flores101.mm100.615M': spm('https://dl.fbaipublicfiles.com/flores101/pretrained_models/flores101_mm100_615M.tar.gz'), + 'transformer.flores101.mm100.175M': spm('https://dl.fbaipublicfiles.com/flores101/pretrained_models/flores101_mm100_175M.tar.gz'), + } + # fmt: on + + def __init__(self, args, encoder, decoder): + super().__init__(encoder, decoder) + self.args = args + self.supports_align_args = True + + @staticmethod + def add_args(parser): + """Add model-specific arguments to the parser.""" + # fmt: off + parser.add_argument('--activation-fn', + choices=utils.get_available_activation_fns(), + help='activation function to use') + parser.add_argument('--dropout', type=float, metavar='D', + help='dropout probability') + parser.add_argument('--attention-dropout', type=float, metavar='D', + help='dropout probability for attention weights') + parser.add_argument('--activation-dropout', '--relu-dropout', type=float, metavar='D', + 
help='dropout probability after activation in FFN.') + parser.add_argument('--encoder-embed-path', type=str, metavar='STR', + help='path to pre-trained encoder embedding') + parser.add_argument('--encoder-embed-dim', type=int, metavar='N', + help='encoder embedding dimension') + parser.add_argument('--encoder-ffn-embed-dim', type=int, metavar='N', + help='encoder embedding dimension for FFN') + parser.add_argument('--encoder-layers', type=int, metavar='N', + help='num encoder layers') + parser.add_argument('--encoder-attention-heads', type=int, metavar='N', + help='num encoder attention heads') + parser.add_argument('--encoder-normalize-before', action='store_true', + help='apply layernorm before each encoder block') + parser.add_argument('--encoder-learned-pos', action='store_true', + help='use learned positional embeddings in the encoder') + parser.add_argument('--decoder-embed-path', type=str, metavar='STR', + help='path to pre-trained decoder embedding') + parser.add_argument('--decoder-embed-dim', type=int, metavar='N', + help='decoder embedding dimension') + parser.add_argument('--decoder-ffn-embed-dim', type=int, metavar='N', + help='decoder embedding dimension for FFN') + parser.add_argument('--decoder-layers', type=int, metavar='N', + help='num decoder layers') + parser.add_argument('--decoder-attention-heads', type=int, metavar='N', + help='num decoder attention heads') + parser.add_argument('--decoder-learned-pos', action='store_true', + help='use learned positional embeddings in the decoder') + parser.add_argument('--decoder-normalize-before', action='store_true', + help='apply layernorm before each decoder block') + parser.add_argument('--decoder-output-dim', type=int, metavar='N', + help='decoder output dimension (extra linear layer ' + 'if different from decoder embed dim') + parser.add_argument('--share-decoder-input-output-embed', action='store_true', + help='share decoder input and output embeddings') + parser.add_argument('--share-all-embeddings', action='store_true', + help='share encoder, decoder and output embeddings' + ' (requires shared dictionary and embed dim)') + parser.add_argument('--no-token-positional-embeddings', default=False, action='store_true', + help='if set, disables positional embeddings (outside self attention)') + parser.add_argument('--adaptive-softmax-cutoff', metavar='EXPR', + help='comma separated list of adaptive softmax cutoff points. ' + 'Must be used with adaptive_loss criterion'), + parser.add_argument('--adaptive-softmax-dropout', type=float, metavar='D', + help='sets adaptive softmax dropout for the tail projections') + parser.add_argument('--layernorm-embedding', action='store_true', + help='add layernorm to embedding') + parser.add_argument('--no-scale-embedding', action='store_true', + help='if True, dont scale embeddings') + parser.add_argument('--checkpoint-activations', action='store_true', + help='checkpoint activations at each layer, which saves GPU ' + 'memory usage at the cost of some additional compute') + parser.add_argument('--offload-activations', action='store_true', + help='checkpoint activations at each layer, then save to gpu. 
Sets --checkpoint-activations.') + # args for "Cross+Self-Attention for Transformer Models" (Peitz et al., 2019) + parser.add_argument('--no-cross-attention', default=False, action='store_true', + help='do not perform cross-attention') + parser.add_argument('--cross-self-attention', default=False, action='store_true', + help='perform cross+self-attention') + # args for "Reducing Transformer Depth on Demand with Structured Dropout" (Fan et al., 2019) + parser.add_argument('--encoder-layerdrop', type=float, metavar='D', default=0, + help='LayerDrop probability for encoder') + parser.add_argument('--decoder-layerdrop', type=float, metavar='D', default=0, + help='LayerDrop probability for decoder') + parser.add_argument('--encoder-layers-to-keep', default=None, + help='which layers to *keep* when pruning as a comma-separated list') + parser.add_argument('--decoder-layers-to-keep', default=None, + help='which layers to *keep* when pruning as a comma-separated list') + # args for Training with Quantization Noise for Extreme Model Compression ({Fan*, Stock*} et al., 2020) + parser.add_argument('--quant-noise-pq', type=float, metavar='D', default=0, + help='iterative PQ quantization noise at training time') + parser.add_argument('--quant-noise-pq-block-size', type=int, metavar='D', default=8, + help='block size of quantization noise at training time') + parser.add_argument('--quant-noise-scalar', type=float, metavar='D', default=0, + help='scalar quantization noise and scalar quantization at training time') + # args for Fully Sharded Data Parallel (FSDP) training + parser.add_argument( + '--min-params-to-wrap', type=int, metavar='D', default=DEFAULT_MIN_PARAMS_TO_WRAP, + help=( + 'minimum number of params for a layer to be wrapped with FSDP() when ' + 'training with --ddp-backend=fully_sharded. Smaller values will ' + 'improve memory efficiency, but may make torch.distributed ' + 'communication less efficient due to smaller input sizes. This option ' + 'is set to 0 (i.e., always wrap) when --checkpoint-activations or ' + '--offload-activations are passed.' 
+ ) + ) + # fmt: on + + @classmethod + def build_model(cls, args, task): + """Build a new model instance.""" + + # make sure all arguments are present in older models + base_architecture(args) + + if args.encoder_layers_to_keep: + args.encoder_layers = len(args.encoder_layers_to_keep.split(",")) + if args.decoder_layers_to_keep: + args.decoder_layers = len(args.decoder_layers_to_keep.split(",")) + + if getattr(args, "max_source_positions", None) is None: + args.max_source_positions = DEFAULT_MAX_SOURCE_POSITIONS + if getattr(args, "max_target_positions", None) is None: + args.max_target_positions = DEFAULT_MAX_TARGET_POSITIONS + + src_dict, tgt_dict = task.source_dictionary, task.target_dictionary + + if args.share_all_embeddings: + if src_dict != tgt_dict: + raise ValueError("--share-all-embeddings requires a joined dictionary") + if args.encoder_embed_dim != args.decoder_embed_dim: + raise ValueError( + "--share-all-embeddings requires --encoder-embed-dim to match --decoder-embed-dim" + ) + if args.decoder_embed_path and ( + args.decoder_embed_path != args.encoder_embed_path + ): + raise ValueError( + "--share-all-embeddings not compatible with --decoder-embed-path" + ) + encoder_embed_tokens = cls.build_embedding( + args, src_dict, args.encoder_embed_dim, args.encoder_embed_path + ) + decoder_embed_tokens = encoder_embed_tokens + args.share_decoder_input_output_embed = True + else: + encoder_embed_tokens = cls.build_embedding( + args, src_dict, args.encoder_embed_dim, args.encoder_embed_path + ) + decoder_embed_tokens = cls.build_embedding( + args, tgt_dict, args.decoder_embed_dim, args.decoder_embed_path + ) + if getattr(args, "offload_activations", False): + args.checkpoint_activations = True # offloading implies checkpointing + encoder = cls.build_encoder(args, src_dict, encoder_embed_tokens) + decoder = cls.build_decoder(args, tgt_dict, decoder_embed_tokens) + if not args.share_all_embeddings: + min_params_to_wrap = getattr( + args, "min_params_to_wrap", DEFAULT_MIN_PARAMS_TO_WRAP + ) + # fsdp_wrap is a no-op when --ddp-backend != fully_sharded + encoder = fsdp_wrap(encoder, min_num_params=min_params_to_wrap) + decoder = fsdp_wrap(decoder, min_num_params=min_params_to_wrap) + return cls(args, encoder, decoder) + + @classmethod + def build_embedding(cls, args, dictionary, embed_dim, path=None): + num_embeddings = len(dictionary) + padding_idx = dictionary.pad() + + emb = Embedding(num_embeddings, embed_dim, padding_idx) + # if provided, load from preloaded dictionaries + if path: + embed_dict = utils.parse_embedding(path) + utils.load_embedding(embed_dict, dictionary, emb) + return emb + + @classmethod + def build_encoder(cls, args, src_dict, embed_tokens): + return TransformerEncoder(args, src_dict, embed_tokens) + + @classmethod + def build_decoder(cls, args, tgt_dict, embed_tokens): + return TransformerDecoder( + args, + tgt_dict, + embed_tokens, + no_encoder_attn=getattr(args, "no_cross_attention", False), + ) + + # TorchScript doesn't support optional arguments with variable length (**kwargs). + # Current workaround is to add union of all arguments in child classes. + def forward( + self, + src_tokens, + src_lengths, + prev_output_tokens, + return_all_hiddens: bool = True, + features_only: bool = False, + alignment_layer: Optional[int] = None, + alignment_heads: Optional[int] = None, + ): + """ + Run the forward pass for an encoder-decoder model. + + Copied from the base class, but without ``**kwargs``, + which are not supported by TorchScript. 
+ """ + encoder_out = self.encoder( + src_tokens, src_lengths=src_lengths, return_all_hiddens=return_all_hiddens + ) + decoder_out = self.decoder( + prev_output_tokens, + encoder_out=encoder_out, + features_only=features_only, + alignment_layer=alignment_layer, + alignment_heads=alignment_heads, + src_lengths=src_lengths, + return_all_hiddens=return_all_hiddens, + ) + return decoder_out + + # Since get_normalized_probs is in the Fairseq Model which is not scriptable, + # I rewrite the get_normalized_probs from Base Class to call the + # helper function in the Base Class. + @torch.jit.export + def get_normalized_probs( + self, + net_output: Tuple[Tensor, Optional[Dict[str, List[Optional[Tensor]]]]], + log_probs: bool, + sample: Optional[Dict[str, Tensor]] = None, + ): + """Get normalized probabilities (or log probs) from a net's output.""" + return self.get_normalized_probs_scriptable(net_output, log_probs, sample) + + +class TransformerEncoder(FairseqEncoder): + """ + Transformer encoder consisting of *args.encoder_layers* layers. Each layer + is a :class:`TransformerEncoderLayer`. + + Args: + args (argparse.Namespace): parsed command-line arguments + dictionary (~fairseq.data.Dictionary): encoding dictionary + embed_tokens (torch.nn.Embedding): input embedding + """ + + def __init__(self, args, dictionary, embed_tokens): + self.args = args + super().__init__(dictionary) + self.register_buffer("version", torch.Tensor([3])) + + self.dropout_module = FairseqDropout( + args.dropout, module_name=self.__class__.__name__ + ) + self.encoder_layerdrop = args.encoder_layerdrop + + embed_dim = embed_tokens.embedding_dim + self.padding_idx = embed_tokens.padding_idx + self.max_source_positions = args.max_source_positions + + self.embed_tokens = embed_tokens + + self.embed_scale = 1.0 if args.no_scale_embedding else math.sqrt(embed_dim) + + self.embed_positions = ( + PositionalEmbedding( + args.max_source_positions, + embed_dim, + self.padding_idx, + learned=args.encoder_learned_pos, + ) + if not args.no_token_positional_embeddings + else None + ) + export = getattr(args, "export", False) + if getattr(args, "layernorm_embedding", False): + self.layernorm_embedding = LayerNorm(embed_dim, export=export) + else: + self.layernorm_embedding = None + + if not args.adaptive_input and args.quant_noise_pq > 0: + self.quant_noise = apply_quant_noise_( + nn.Linear(embed_dim, embed_dim, bias=False), + args.quant_noise_pq, + args.quant_noise_pq_block_size, + ) + else: + self.quant_noise = None + + if self.encoder_layerdrop > 0.0: + self.layers = LayerDropModuleList(p=self.encoder_layerdrop) + else: + self.layers = nn.ModuleList([]) + self.layers.extend( + [self.build_encoder_layer(args) for i in range(args.encoder_layers)] + ) + self.num_layers = len(self.layers) + + if args.encoder_normalize_before: + self.layer_norm = LayerNorm(embed_dim, export=export) + else: + self.layer_norm = None + + def build_encoder_layer(self, args): + layer = TransformerEncoderLayer(args) + checkpoint = getattr(args, "checkpoint_activations", False) + if checkpoint: + offload_to_cpu = getattr(args, "offload_activations", False) + layer = checkpoint_wrapper(layer, offload_to_cpu=offload_to_cpu) + # if we are checkpointing, enforce that FSDP always wraps the + # checkpointed layer, regardless of layer size + min_params_to_wrap = ( + getattr(args, "min_params_to_wrap", DEFAULT_MIN_PARAMS_TO_WRAP) + if not checkpoint + else 0 + ) + layer = fsdp_wrap(layer, min_num_params=min_params_to_wrap) + return layer + + def forward_embedding( + self, 
src_tokens, token_embedding: Optional[torch.Tensor] = None + ): + # embed tokens and positions + if token_embedding is None: + token_embedding = self.embed_tokens(src_tokens) + x = embed = self.embed_scale * token_embedding + if self.embed_positions is not None: + x = embed + self.embed_positions(src_tokens) + if self.layernorm_embedding is not None: + x = self.layernorm_embedding(x) + x = self.dropout_module(x) + if self.quant_noise is not None: + x = self.quant_noise(x) + return x, embed + + def forward( + self, + src_tokens, + src_lengths: Optional[torch.Tensor] = None, + return_all_hiddens: bool = False, + token_embeddings: Optional[torch.Tensor] = None, + ): + """ + Args: + src_tokens (LongTensor): tokens in the source language of shape + `(batch, src_len)` + src_lengths (torch.LongTensor): lengths of each source sentence of + shape `(batch)` + return_all_hiddens (bool, optional): also return all of the + intermediate hidden states (default: False). + token_embeddings (torch.Tensor, optional): precomputed embeddings + default `None` will recompute embeddings + + Returns: + dict: + - **encoder_out** (Tensor): the last encoder layer's output of + shape `(src_len, batch, embed_dim)` + - **encoder_padding_mask** (ByteTensor): the positions of + padding elements of shape `(batch, src_len)` + - **encoder_embedding** (Tensor): the (scaled) embedding lookup + of shape `(batch, src_len, embed_dim)` + - **encoder_states** (List[Tensor]): all intermediate + hidden states of shape `(src_len, batch, embed_dim)`. + Only populated if *return_all_hiddens* is True. + """ + return self.forward_scriptable( + src_tokens, src_lengths, return_all_hiddens, token_embeddings + ) + + # TorchScript doesn't support super() method so that the scriptable Subclass + # can't access the base class model in Torchscript. + # Current workaround is to add a helper function with different name and + # call the helper function from scriptable Subclass. + def forward_scriptable( + self, + src_tokens, + src_lengths: Optional[torch.Tensor] = None, + return_all_hiddens: bool = False, + token_embeddings: Optional[torch.Tensor] = None, + ): + """ + Args: + src_tokens (LongTensor): tokens in the source language of shape + `(batch, src_len)` + src_lengths (torch.LongTensor): lengths of each source sentence of + shape `(batch)` + return_all_hiddens (bool, optional): also return all of the + intermediate hidden states (default: False). + token_embeddings (torch.Tensor, optional): precomputed embeddings + default `None` will recompute embeddings + + Returns: + dict: + - **encoder_out** (Tensor): the last encoder layer's output of + shape `(src_len, batch, embed_dim)` + - **encoder_padding_mask** (ByteTensor): the positions of + padding elements of shape `(batch, src_len)` + - **encoder_embedding** (Tensor): the (scaled) embedding lookup + of shape `(batch, src_len, embed_dim)` + - **encoder_states** (List[Tensor]): all intermediate + hidden states of shape `(src_len, batch, embed_dim)`. + Only populated if *return_all_hiddens* is True. 
+ """ + # compute padding mask + encoder_padding_mask = src_tokens.eq(self.padding_idx) + has_pads = src_tokens.device.type == "xla" or encoder_padding_mask.any() + + x, encoder_embedding = self.forward_embedding(src_tokens, token_embeddings) + + # account for padding while computing the representation + if has_pads: + x = x * (1 - encoder_padding_mask.unsqueeze(-1).type_as(x)) + + # B x T x C -> T x B x C + x = x.transpose(0, 1) + + encoder_states = [] + + if return_all_hiddens: + encoder_states.append(x) + + # encoder layers + for layer in self.layers: + x = layer( + x, encoder_padding_mask=encoder_padding_mask if has_pads else None + ) + if return_all_hiddens: + assert encoder_states is not None + encoder_states.append(x) + + if self.layer_norm is not None: + x = self.layer_norm(x) + + # The Pytorch Mobile lite interpreter does not supports returning NamedTuple in + # `forward` so we use a dictionary instead. + # TorchScript does not support mixed values so the values are all lists. + # The empty list is equivalent to None. + return { + "encoder_out": [x], # T x B x C + "encoder_padding_mask": [encoder_padding_mask], # B x T + "encoder_embedding": [encoder_embedding], # B x T x C + "encoder_states": encoder_states, # List[T x B x C] + "src_tokens": [], + "src_lengths": [], + } + + @torch.jit.export + def reorder_encoder_out(self, encoder_out: Dict[str, List[Tensor]], new_order): + """ + Reorder encoder output according to *new_order*. + + Args: + encoder_out: output from the ``forward()`` method + new_order (LongTensor): desired order + + Returns: + *encoder_out* rearranged according to *new_order* + """ + if len(encoder_out["encoder_out"]) == 0: + new_encoder_out = [] + else: + new_encoder_out = [encoder_out["encoder_out"][0].index_select(1, new_order)] + if len(encoder_out["encoder_padding_mask"]) == 0: + new_encoder_padding_mask = [] + else: + new_encoder_padding_mask = [ + encoder_out["encoder_padding_mask"][0].index_select(0, new_order) + ] + if len(encoder_out["encoder_embedding"]) == 0: + new_encoder_embedding = [] + else: + new_encoder_embedding = [ + encoder_out["encoder_embedding"][0].index_select(0, new_order) + ] + + if len(encoder_out["src_tokens"]) == 0: + src_tokens = [] + else: + src_tokens = [(encoder_out["src_tokens"][0]).index_select(0, new_order)] + + if len(encoder_out["src_lengths"]) == 0: + src_lengths = [] + else: + src_lengths = [(encoder_out["src_lengths"][0]).index_select(0, new_order)] + + encoder_states = encoder_out["encoder_states"] + if len(encoder_states) > 0: + for idx, state in enumerate(encoder_states): + encoder_states[idx] = state.index_select(1, new_order) + + return { + "encoder_out": new_encoder_out, # T x B x C + "encoder_padding_mask": new_encoder_padding_mask, # B x T + "encoder_embedding": new_encoder_embedding, # B x T x C + "encoder_states": encoder_states, # List[T x B x C] + "src_tokens": src_tokens, # B x T + "src_lengths": src_lengths, # B x 1 + } + + def max_positions(self): + """Maximum input length supported by the encoder.""" + if self.embed_positions is None: + return self.max_source_positions + return min(self.max_source_positions, self.embed_positions.max_positions) + + def upgrade_state_dict_named(self, state_dict, name): + """Upgrade a (possibly old) state dict for new versions of fairseq.""" + if isinstance(self.embed_positions, SinusoidalPositionalEmbedding): + weights_key = "{}.embed_positions.weights".format(name) + if weights_key in state_dict: + print("deleting {0}".format(weights_key)) + del state_dict[weights_key] + 
state_dict[ + "{}.embed_positions._float_tensor".format(name) + ] = torch.FloatTensor(1) + for i in range(self.num_layers): + # update layer norms + self.layers[i].upgrade_state_dict_named( + state_dict, "{}.layers.{}".format(name, i) + ) + + version_key = "{}.version".format(name) + if utils.item(state_dict.get(version_key, torch.Tensor([1]))[0]) < 2: + # earlier checkpoints did not normalize after the stack of layers + self.layer_norm = None + self.normalize = False + state_dict[version_key] = torch.Tensor([1]) + return state_dict + + +class TransformerDecoder(FairseqIncrementalDecoder): + """ + Transformer decoder consisting of *args.decoder_layers* layers. Each layer + is a :class:`TransformerDecoderLayer`. + + Args: + args (argparse.Namespace): parsed command-line arguments + dictionary (~fairseq.data.Dictionary): decoding dictionary + embed_tokens (torch.nn.Embedding): output embedding + no_encoder_attn (bool, optional): whether to attend to encoder outputs + (default: False). + """ + + def __init__( + self, + args, + dictionary, + embed_tokens, + no_encoder_attn=False, + output_projection=None, + ): + self.args = args + super().__init__(dictionary) + self.register_buffer("version", torch.Tensor([3])) + self._future_mask = torch.empty(0) + + self.dropout_module = FairseqDropout( + args.dropout, module_name=self.__class__.__name__ + ) + self.decoder_layerdrop = args.decoder_layerdrop + self.share_input_output_embed = args.share_decoder_input_output_embed + + input_embed_dim = embed_tokens.embedding_dim + embed_dim = args.decoder_embed_dim + self.embed_dim = embed_dim + self.output_embed_dim = args.decoder_output_dim + + self.padding_idx = embed_tokens.padding_idx + self.max_target_positions = args.max_target_positions + + self.embed_tokens = embed_tokens + + self.embed_scale = 1.0 if args.no_scale_embedding else math.sqrt(embed_dim) + + if not args.adaptive_input and args.quant_noise_pq > 0: + self.quant_noise = apply_quant_noise_( + nn.Linear(embed_dim, embed_dim, bias=False), + args.quant_noise_pq, + args.quant_noise_pq_block_size, + ) + else: + self.quant_noise = None + + self.project_in_dim = ( + Linear(input_embed_dim, embed_dim, bias=False) + if embed_dim != input_embed_dim + else None + ) + self.embed_positions = ( + PositionalEmbedding( + self.max_target_positions, + embed_dim, + self.padding_idx, + learned=args.decoder_learned_pos, + ) + if not args.no_token_positional_embeddings + else None + ) + export = getattr(args, "export", False) + if getattr(args, "layernorm_embedding", False): + self.layernorm_embedding = LayerNorm(embed_dim, export=export) + else: + self.layernorm_embedding = None + + self.cross_self_attention = getattr(args, "cross_self_attention", False) + + if self.decoder_layerdrop > 0.0: + self.layers = LayerDropModuleList(p=self.decoder_layerdrop) + else: + self.layers = nn.ModuleList([]) + self.layers.extend( + [ + self.build_decoder_layer(args, no_encoder_attn) + for _ in range(args.decoder_layers) + ] + ) + self.num_layers = len(self.layers) + + if args.decoder_normalize_before and not getattr( + args, "no_decoder_final_norm", False + ): + self.layer_norm = LayerNorm(embed_dim, export=export) + else: + self.layer_norm = None + + self.project_out_dim = ( + Linear(embed_dim, self.output_embed_dim, bias=False) + if embed_dim != self.output_embed_dim and not args.tie_adaptive_weights + else None + ) + + self.adaptive_softmax = None + self.output_projection = output_projection + if self.output_projection is None: + self.build_output_projection(args, dictionary, 
embed_tokens) + + def build_output_projection(self, args, dictionary, embed_tokens): + if args.adaptive_softmax_cutoff is not None: + self.adaptive_softmax = AdaptiveSoftmax( + len(dictionary), + self.output_embed_dim, + utils.eval_str_list(args.adaptive_softmax_cutoff, type=int), + dropout=args.adaptive_softmax_dropout, + adaptive_inputs=embed_tokens if args.tie_adaptive_weights else None, + factor=args.adaptive_softmax_factor, + tie_proj=args.tie_adaptive_proj, + ) + elif self.share_input_output_embed: + self.output_projection = nn.Linear( + self.embed_tokens.weight.shape[1], + self.embed_tokens.weight.shape[0], + bias=False, + ) + self.output_projection.weight = self.embed_tokens.weight + else: + self.output_projection = nn.Linear( + self.output_embed_dim, len(dictionary), bias=False + ) + nn.init.normal_( + self.output_projection.weight, mean=0, std=self.output_embed_dim ** -0.5 + ) + num_base_layers = getattr(args, "base_layers", 0) + for i in range(num_base_layers): + self.layers.insert( + ((i + 1) * args.decoder_layers) // (num_base_layers + 1), + BaseLayer(args), + ) + + def build_decoder_layer(self, args, no_encoder_attn=False): + layer = TransformerDecoderLayer(args, no_encoder_attn) + checkpoint = getattr(args, "checkpoint_activations", False) + if checkpoint: + offload_to_cpu = getattr(args, "offload_activations", False) + layer = checkpoint_wrapper(layer, offload_to_cpu=offload_to_cpu) + # if we are checkpointing, enforce that FSDP always wraps the + # checkpointed layer, regardless of layer size + min_params_to_wrap = ( + getattr(args, "min_params_to_wrap", DEFAULT_MIN_PARAMS_TO_WRAP) + if not checkpoint + else 0 + ) + layer = fsdp_wrap(layer, min_num_params=min_params_to_wrap) + return layer + + def forward( + self, + prev_output_tokens, + encoder_out: Optional[Dict[str, List[Tensor]]] = None, + incremental_state: Optional[Dict[str, Dict[str, Optional[Tensor]]]] = None, + features_only: bool = False, + full_context_alignment: bool = False, + alignment_layer: Optional[int] = None, + alignment_heads: Optional[int] = None, + src_lengths: Optional[Any] = None, + return_all_hiddens: bool = False, + ): + """ + Args: + prev_output_tokens (LongTensor): previous decoder outputs of shape + `(batch, tgt_len)`, for teacher forcing + encoder_out (optional): output from the encoder, used for + encoder-side attention, should be of size T x B x C + incremental_state (dict): dictionary used for storing state during + :ref:`Incremental decoding` + features_only (bool, optional): only return features without + applying output layer (default: False). + full_context_alignment (bool, optional): don't apply + auto-regressive mask to self-attention (default: False). 
+ + Returns: + tuple: + - the decoder's output of shape `(batch, tgt_len, vocab)` + - a dictionary with any model-specific outputs + """ + + x, extra = self.extract_features( + prev_output_tokens, + encoder_out=encoder_out, + incremental_state=incremental_state, + full_context_alignment=full_context_alignment, + alignment_layer=alignment_layer, + alignment_heads=alignment_heads, + ) + + if not features_only: + x = self.output_layer(x) + return x, extra + + def extract_features( + self, + prev_output_tokens, + encoder_out: Optional[Dict[str, List[Tensor]]], + incremental_state: Optional[Dict[str, Dict[str, Optional[Tensor]]]] = None, + full_context_alignment: bool = False, + alignment_layer: Optional[int] = None, + alignment_heads: Optional[int] = None, + ): + return self.extract_features_scriptable( + prev_output_tokens, + encoder_out, + incremental_state, + full_context_alignment, + alignment_layer, + alignment_heads, + ) + + """ + A scriptable subclass of this class has an extract_features method and calls + super().extract_features, but super() is not supported in torchscript. A copy of + this function is made to be used in the subclass instead. + """ + + def extract_features_scriptable( + self, + prev_output_tokens, + encoder_out: Optional[Dict[str, List[Tensor]]], + incremental_state: Optional[Dict[str, Dict[str, Optional[Tensor]]]] = None, + full_context_alignment: bool = False, + alignment_layer: Optional[int] = None, + alignment_heads: Optional[int] = None, + ): + """ + Similar to *forward* but only return features. + + Includes several features from "Jointly Learning to Align and + Translate with Transformer Models" (Garg et al., EMNLP 2019). + + Args: + full_context_alignment (bool, optional): don't apply + auto-regressive mask to self-attention (default: False). + alignment_layer (int, optional): return mean alignment over + heads at this layer (default: last layer). + alignment_heads (int, optional): only average alignment over + this many heads (default: all heads). 
+ + Returns: + tuple: + - the decoder's features of shape `(batch, tgt_len, embed_dim)` + - a dictionary with any model-specific outputs + """ + bs, slen = prev_output_tokens.size() + if alignment_layer is None: + alignment_layer = self.num_layers - 1 + + enc: Optional[Tensor] = None + padding_mask: Optional[Tensor] = None + if encoder_out is not None and len(encoder_out["encoder_out"]) > 0: + enc = encoder_out["encoder_out"][0] + assert ( + enc.size()[1] == bs + ), f"Expected enc.shape == (t, {bs}, c) got {enc.shape}" + if encoder_out is not None and len(encoder_out["encoder_padding_mask"]) > 0: + padding_mask = encoder_out["encoder_padding_mask"][0] + + # embed positions + positions = None + if self.embed_positions is not None: + positions = self.embed_positions( + prev_output_tokens, incremental_state=incremental_state + ) + + if incremental_state is not None: + prev_output_tokens = prev_output_tokens[:, -1:] + if positions is not None: + positions = positions[:, -1:] + + # embed tokens and positions + x = self.embed_scale * self.embed_tokens(prev_output_tokens) + + if self.quant_noise is not None: + x = self.quant_noise(x) + + if self.project_in_dim is not None: + x = self.project_in_dim(x) + + if positions is not None: + x += positions + + if self.layernorm_embedding is not None: + x = self.layernorm_embedding(x) + + x = self.dropout_module(x) + + # B x T x C -> T x B x C + x = x.transpose(0, 1) + + self_attn_padding_mask: Optional[Tensor] = None + if self.cross_self_attention or prev_output_tokens.eq(self.padding_idx).any(): + self_attn_padding_mask = prev_output_tokens.eq(self.padding_idx) + + # decoder layers + attn: Optional[Tensor] = None + inner_states: List[Optional[Tensor]] = [x] + for idx, layer in enumerate(self.layers): + if incremental_state is None and not full_context_alignment: + self_attn_mask = self.buffered_future_mask(x) + else: + self_attn_mask = None + + x, layer_attn, _ = layer( + x, + enc, + padding_mask, + incremental_state, + self_attn_mask=self_attn_mask, + self_attn_padding_mask=self_attn_padding_mask, + need_attn=bool((idx == alignment_layer)), + need_head_weights=bool((idx == alignment_layer)), + ) + inner_states.append(x) + if layer_attn is not None and idx == alignment_layer: + attn = layer_attn.float().to(x) + + if attn is not None: + if alignment_heads is not None: + attn = attn[:alignment_heads] + + # average probabilities over heads + attn = attn.mean(dim=0) + + if self.layer_norm is not None: + x = self.layer_norm(x) + + # T x B x C -> B x T x C + x = x.transpose(0, 1) + + if self.project_out_dim is not None: + x = self.project_out_dim(x) + + return x, {"attn": [attn], "inner_states": inner_states} + + def output_layer(self, features): + """Project features to the vocabulary size.""" + if self.adaptive_softmax is None: + # project back to size of vocabulary + return self.output_projection(features) + else: + return features + + def max_positions(self): + """Maximum output length supported by the decoder.""" + if self.embed_positions is None: + return self.max_target_positions + return min(self.max_target_positions, self.embed_positions.max_positions) + + def buffered_future_mask(self, tensor): + dim = tensor.size(0) + # self._future_mask.device != tensor.device is not working in TorchScript. This is a workaround. 
+ if ( + self._future_mask.size(0) == 0 + or (not self._future_mask.device == tensor.device) + or self._future_mask.size(0) < dim + ): + self._future_mask = torch.triu( + utils.fill_with_neg_inf(torch.zeros([dim, dim])), 1 + ) + self._future_mask = self._future_mask.to(tensor) + return self._future_mask[:dim, :dim] + + def upgrade_state_dict_named(self, state_dict, name): + """Upgrade a (possibly old) state dict for new versions of fairseq.""" + if isinstance(self.embed_positions, SinusoidalPositionalEmbedding): + weights_key = "{}.embed_positions.weights".format(name) + if weights_key in state_dict: + del state_dict[weights_key] + state_dict[ + "{}.embed_positions._float_tensor".format(name) + ] = torch.FloatTensor(1) + + if f"{name}.output_projection.weight" not in state_dict: + if self.share_input_output_embed: + embed_out_key = f"{name}.embed_tokens.weight" + else: + embed_out_key = f"{name}.embed_out" + if embed_out_key in state_dict: + state_dict[f"{name}.output_projection.weight"] = state_dict[ + embed_out_key + ] + if not self.share_input_output_embed: + del state_dict[embed_out_key] + + for i in range(self.num_layers): + # update layer norms + layer_norm_map = { + "0": "self_attn_layer_norm", + "1": "encoder_attn_layer_norm", + "2": "final_layer_norm", + } + for old, new in layer_norm_map.items(): + for m in ("weight", "bias"): + k = "{}.layers.{}.layer_norms.{}.{}".format(name, i, old, m) + if k in state_dict: + state_dict[ + "{}.layers.{}.{}.{}".format(name, i, new, m) + ] = state_dict[k] + del state_dict[k] + + version_key = "{}.version".format(name) + if utils.item(state_dict.get(version_key, torch.Tensor([1]))[0]) <= 2: + # earlier checkpoints did not normalize after the stack of layers + self.layer_norm = None + self.normalize = False + state_dict[version_key] = torch.Tensor([1]) + + return state_dict + + +def Embedding(num_embeddings, embedding_dim, padding_idx): + m = nn.Embedding(num_embeddings, embedding_dim, padding_idx=padding_idx) + nn.init.normal_(m.weight, mean=0, std=embedding_dim ** -0.5) + nn.init.constant_(m.weight[padding_idx], 0) + return m + + +def Linear(in_features, out_features, bias=True): + m = nn.Linear(in_features, out_features, bias) + nn.init.xavier_uniform_(m.weight) + if bias: + nn.init.constant_(m.bias, 0.0) + return m + + +@register_model_architecture("transformer", "transformer_tiny") +def tiny_architecture(args): + args.encoder_embed_dim = getattr(args, "encoder_embed_dim", 64) + args.encoder_ffn_embed_dim = getattr(args, "encoder_ffn_embed_dim", 64) + args.encoder_layers = getattr(args, "encoder_layers", 2) + args.encoder_attention_heads = getattr(args, "encoder_attention_heads", 2) + args.decoder_layers = getattr(args, "decoder_layers", 2) + args.decoder_attention_heads = getattr(args, "decoder_attention_heads", 2) + return base_architecture(args) + + +@register_model_architecture("transformer", "transformer") +def base_architecture(args): + args.encoder_embed_path = getattr(args, "encoder_embed_path", None) + args.encoder_embed_dim = getattr(args, "encoder_embed_dim", 512) + args.encoder_ffn_embed_dim = getattr(args, "encoder_ffn_embed_dim", 2048) + args.encoder_layers = getattr(args, "encoder_layers", 6) + args.encoder_attention_heads = getattr(args, "encoder_attention_heads", 8) + args.encoder_normalize_before = getattr(args, "encoder_normalize_before", False) + args.encoder_learned_pos = getattr(args, "encoder_learned_pos", False) + args.decoder_embed_path = getattr(args, "decoder_embed_path", None) + args.decoder_embed_dim = 
getattr(args, "decoder_embed_dim", args.encoder_embed_dim) + args.decoder_ffn_embed_dim = getattr( + args, "decoder_ffn_embed_dim", args.encoder_ffn_embed_dim + ) + args.decoder_layers = getattr(args, "decoder_layers", 6) + args.decoder_attention_heads = getattr(args, "decoder_attention_heads", 8) + args.decoder_normalize_before = getattr(args, "decoder_normalize_before", False) + args.decoder_learned_pos = getattr(args, "decoder_learned_pos", False) + args.attention_dropout = getattr(args, "attention_dropout", 0.0) + args.activation_dropout = getattr(args, "activation_dropout", 0.0) + args.activation_fn = getattr(args, "activation_fn", "relu") + args.dropout = getattr(args, "dropout", 0.1) + args.adaptive_softmax_cutoff = getattr(args, "adaptive_softmax_cutoff", None) + args.adaptive_softmax_dropout = getattr(args, "adaptive_softmax_dropout", 0) + args.share_decoder_input_output_embed = getattr( + args, "share_decoder_input_output_embed", False + ) + args.share_all_embeddings = getattr(args, "share_all_embeddings", False) + args.no_token_positional_embeddings = getattr( + args, "no_token_positional_embeddings", False + ) + args.adaptive_input = getattr(args, "adaptive_input", False) + args.no_cross_attention = getattr(args, "no_cross_attention", False) + args.cross_self_attention = getattr(args, "cross_self_attention", False) + + args.decoder_output_dim = getattr( + args, "decoder_output_dim", args.decoder_embed_dim + ) + args.decoder_input_dim = getattr(args, "decoder_input_dim", args.decoder_embed_dim) + + args.no_scale_embedding = getattr(args, "no_scale_embedding", False) + args.layernorm_embedding = getattr(args, "layernorm_embedding", False) + args.tie_adaptive_weights = getattr(args, "tie_adaptive_weights", False) + args.checkpoint_activations = getattr(args, "checkpoint_activations", False) + args.offload_activations = getattr(args, "offload_activations", False) + if args.offload_activations: + args.checkpoint_activations = True + args.encoder_layers_to_keep = getattr(args, "encoder_layers_to_keep", None) + args.decoder_layers_to_keep = getattr(args, "decoder_layers_to_keep", None) + args.encoder_layerdrop = getattr(args, "encoder_layerdrop", 0) + args.decoder_layerdrop = getattr(args, "decoder_layerdrop", 0) + args.quant_noise_pq = getattr(args, "quant_noise_pq", 0) + args.quant_noise_pq_block_size = getattr(args, "quant_noise_pq_block_size", 8) + args.quant_noise_scalar = getattr(args, "quant_noise_scalar", 0) + + +@register_model_architecture("transformer", "transformer_iwslt_de_en") +def transformer_iwslt_de_en(args): + args.encoder_embed_dim = getattr(args, "encoder_embed_dim", 512) + args.encoder_ffn_embed_dim = getattr(args, "encoder_ffn_embed_dim", 1024) + args.encoder_attention_heads = getattr(args, "encoder_attention_heads", 4) + args.encoder_layers = getattr(args, "encoder_layers", 6) + args.decoder_embed_dim = getattr(args, "decoder_embed_dim", 512) + args.decoder_ffn_embed_dim = getattr(args, "decoder_ffn_embed_dim", 1024) + args.decoder_attention_heads = getattr(args, "decoder_attention_heads", 4) + args.decoder_layers = getattr(args, "decoder_layers", 6) + base_architecture(args) + + +@register_model_architecture("transformer", "transformer_wmt_en_de") +def transformer_wmt_en_de(args): + base_architecture(args) + + +# parameters used in the "Attention Is All You Need" paper (Vaswani et al., 2017) +@register_model_architecture("transformer", "transformer_vaswani_wmt_en_de_big") +def transformer_vaswani_wmt_en_de_big(args): + args.encoder_embed_dim = getattr(args, 
"encoder_embed_dim", 1024) + args.encoder_ffn_embed_dim = getattr(args, "encoder_ffn_embed_dim", 4096) + args.encoder_attention_heads = getattr(args, "encoder_attention_heads", 16) + args.encoder_normalize_before = getattr(args, "encoder_normalize_before", False) + args.decoder_embed_dim = getattr(args, "decoder_embed_dim", 1024) + args.decoder_ffn_embed_dim = getattr(args, "decoder_ffn_embed_dim", 4096) + args.decoder_attention_heads = getattr(args, "decoder_attention_heads", 16) + args.dropout = getattr(args, "dropout", 0.3) + base_architecture(args) + + +@register_model_architecture("transformer", "transformer_vaswani_wmt_en_fr_big") +def transformer_vaswani_wmt_en_fr_big(args): + args.dropout = getattr(args, "dropout", 0.1) + transformer_vaswani_wmt_en_de_big(args) + + +@register_model_architecture("transformer", "transformer_wmt_en_de_big") +def transformer_wmt_en_de_big(args): + args.attention_dropout = getattr(args, "attention_dropout", 0.1) + transformer_vaswani_wmt_en_de_big(args) + + +# default parameters used in tensor2tensor implementation +@register_model_architecture("transformer", "transformer_wmt_en_de_big_t2t") +def transformer_wmt_en_de_big_t2t(args): + args.encoder_normalize_before = getattr(args, "encoder_normalize_before", True) + args.decoder_normalize_before = getattr(args, "decoder_normalize_before", True) + args.attention_dropout = getattr(args, "attention_dropout", 0.1) + args.activation_dropout = getattr(args, "activation_dropout", 0.1) + transformer_vaswani_wmt_en_de_big(args) diff --git a/SpeechT5/fairseq/fairseq/models/transformer_align.py b/SpeechT5/fairseq/fairseq/models/transformer_align.py new file mode 100644 index 0000000000000000000000000000000000000000..eaf585bd10e630ae6cd89920f197cd165f55ad58 --- /dev/null +++ b/SpeechT5/fairseq/fairseq/models/transformer_align.py @@ -0,0 +1,93 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +from fairseq.models import register_model, register_model_architecture +from fairseq.models.transformer import ( + TransformerModel, + base_architecture, + transformer_wmt_en_de_big, +) + + +@register_model("transformer_align") +class TransformerAlignModel(TransformerModel): + """ + See "Jointly Learning to Align and Translate with Transformer + Models" (Garg et al., EMNLP 2019). + """ + + def __init__(self, encoder, decoder, args): + super().__init__(args, encoder, decoder) + self.alignment_heads = args.alignment_heads + self.alignment_layer = args.alignment_layer + self.full_context_alignment = args.full_context_alignment + + @staticmethod + def add_args(parser): + # fmt: off + super(TransformerAlignModel, TransformerAlignModel).add_args(parser) + parser.add_argument('--alignment-heads', type=int, metavar='D', + help='Number of cross attention heads per layer to supervised with alignments') + parser.add_argument('--alignment-layer', type=int, metavar='D', + help='Layer number which has to be supervised. 
0 corresponding to the bottommost layer.') + parser.add_argument('--full-context-alignment', action='store_true', + help='Whether or not alignment is supervised conditioned on the full target context.') + # fmt: on + + @classmethod + def build_model(cls, args, task): + # set any default arguments + transformer_align(args) + + transformer_model = TransformerModel.build_model(args, task) + return TransformerAlignModel( + transformer_model.encoder, transformer_model.decoder, args + ) + + def forward(self, src_tokens, src_lengths, prev_output_tokens): + encoder_out = self.encoder(src_tokens, src_lengths) + return self.forward_decoder(prev_output_tokens, encoder_out) + + def forward_decoder( + self, + prev_output_tokens, + encoder_out=None, + incremental_state=None, + features_only=False, + **extra_args, + ): + attn_args = { + "alignment_layer": self.alignment_layer, + "alignment_heads": self.alignment_heads, + } + decoder_out = self.decoder(prev_output_tokens, encoder_out, **attn_args) + + if self.full_context_alignment: + attn_args["full_context_alignment"] = self.full_context_alignment + _, alignment_out = self.decoder( + prev_output_tokens, + encoder_out, + features_only=True, + **attn_args, + **extra_args, + ) + decoder_out[1]["attn"] = alignment_out["attn"] + + return decoder_out + + +@register_model_architecture("transformer_align", "transformer_align") +def transformer_align(args): + args.alignment_heads = getattr(args, "alignment_heads", 1) + args.alignment_layer = getattr(args, "alignment_layer", 4) + args.full_context_alignment = getattr(args, "full_context_alignment", False) + base_architecture(args) + + +@register_model_architecture("transformer_align", "transformer_wmt_en_de_big_align") +def transformer_wmt_en_de_big_align(args): + args.alignment_heads = getattr(args, "alignment_heads", 1) + args.alignment_layer = getattr(args, "alignment_layer", 4) + transformer_wmt_en_de_big(args) diff --git a/SpeechT5/fairseq/fairseq/models/transformer_from_pretrained_xlm.py b/SpeechT5/fairseq/fairseq/models/transformer_from_pretrained_xlm.py new file mode 100644 index 0000000000000000000000000000000000000000..236d9942e1fb0238cc92e2b4f160520b5cdd6504 --- /dev/null +++ b/SpeechT5/fairseq/fairseq/models/transformer_from_pretrained_xlm.py @@ -0,0 +1,152 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. 
+ +import os +from typing import Any, Dict + +from fairseq import checkpoint_utils +from fairseq.data.legacy.masked_lm_dictionary import MaskedLMDictionary +from fairseq.models import register_model, register_model_architecture +from fairseq.models.transformer import ( + TransformerDecoder, + TransformerEncoder, + TransformerModel, + base_architecture as transformer_base_architecture, +) + + +@register_model("transformer_from_pretrained_xlm") +class TransformerFromPretrainedXLMModel(TransformerModel): + @staticmethod + def add_args(parser): + """Add model-specific arguments to the parser.""" + TransformerModel.add_args(parser) + parser.add_argument( + "--pretrained-xlm-checkpoint", + type=str, + metavar="STR", + help="XLM model to use for initializing transformer encoder and/or decoder", + ) + parser.add_argument( + "--init-encoder-only", + action="store_true", + help="if set, don't load the XLM weights and embeddings into decoder", + ) + parser.add_argument( + "--init-decoder-only", + action="store_true", + help="if set, don't load the XLM weights and embeddings into encoder", + ) + + @classmethod + def build_model(self, args, task, cls_dictionary=MaskedLMDictionary): + assert hasattr(args, "pretrained_xlm_checkpoint"), ( + "You must specify a path for --pretrained-xlm-checkpoint to use " + "--arch transformer_from_pretrained_xlm" + ) + assert isinstance(task.source_dictionary, cls_dictionary) and isinstance( + task.target_dictionary, cls_dictionary + ), ( + "You should use a MaskedLMDictionary when using --arch " + "transformer_from_pretrained_xlm because the pretrained XLM model " + "was trained using data binarized with MaskedLMDictionary. " + "For translation, you may want to use --task " + "translation_from_pretrained_xlm" + ) + assert not ( + getattr(args, "init_encoder_only", False) + and getattr(args, "init_decoder_only", False) + ), "Only one of --init-encoder-only and --init-decoder-only can be set." + return super().build_model(args, task) + + @classmethod + def build_encoder(cls, args, src_dict, embed_tokens): + return TransformerEncoderFromPretrainedXLM(args, src_dict, embed_tokens) + + @classmethod + def build_decoder(cls, args, tgt_dict, embed_tokens): + return TransformerDecoderFromPretrainedXLM(args, tgt_dict, embed_tokens) + + +def upgrade_state_dict_with_xlm_weights( + state_dict: Dict[str, Any], pretrained_xlm_checkpoint: str +) -> Dict[str, Any]: + """ + Load XLM weights into a Transformer encoder or decoder model. + + Args: + state_dict: state dict for either TransformerEncoder or + TransformerDecoder + pretrained_xlm_checkpoint: checkpoint to load XLM weights from + + Raises: + AssertionError: If architecture (num layers, attention heads, etc.) + does not match between the current Transformer encoder or + decoder and the pretrained_xlm_checkpoint + """ + if not os.path.exists(pretrained_xlm_checkpoint): + raise IOError("Model file not found: {}".format(pretrained_xlm_checkpoint)) + + state = checkpoint_utils.load_checkpoint_to_cpu(pretrained_xlm_checkpoint) + xlm_state_dict = state["model"] + for key in xlm_state_dict.keys(): + + for search_key in ["embed_tokens", "embed_positions", "layers"]: + if search_key in key: + subkey = key[key.find(search_key) :] + assert subkey in state_dict, ( + "{} Transformer encoder / decoder " + "state_dict does not contain {}. 
Cannot " + "load {} from pretrained XLM checkpoint " + "{} into Transformer.".format( + str(state_dict.keys()), subkey, key, pretrained_xlm_checkpoint + ) + ) + + state_dict[subkey] = xlm_state_dict[key] + return state_dict + + +class TransformerEncoderFromPretrainedXLM(TransformerEncoder): + def __init__(self, args, dictionary, embed_tokens): + super().__init__(args, dictionary, embed_tokens) + if getattr(args, "init_decoder_only", False): + # Don't load XLM weights for encoder if --init-decoder-only + return + + assert hasattr(args, "pretrained_xlm_checkpoint"), ( + "--pretrained-xlm-checkpoint must be specified to load Transformer " + "encoder from pretrained XLM" + ) + xlm_loaded_state_dict = upgrade_state_dict_with_xlm_weights( + state_dict=self.state_dict(), + pretrained_xlm_checkpoint=args.pretrained_xlm_checkpoint, + ) + self.load_state_dict(xlm_loaded_state_dict, strict=True) + + +class TransformerDecoderFromPretrainedXLM(TransformerDecoder): + def __init__(self, args, dictionary, embed_tokens, no_encoder_attn=False): + super().__init__(args, dictionary, embed_tokens, no_encoder_attn) + if getattr(args, "init_encoder_only", False): + # Don't load XLM weights for decoder if --init-encoder-only + return + assert hasattr(args, "pretrained_xlm_checkpoint"), ( + "--pretrained-xlm-checkpoint must be specified to load Transformer " + "decoder from pretrained XLM" + ) + + xlm_loaded_state_dict = upgrade_state_dict_with_xlm_weights( + state_dict=self.state_dict(), + pretrained_xlm_checkpoint=args.pretrained_xlm_checkpoint, + ) + self.load_state_dict(xlm_loaded_state_dict, strict=True) + + +@register_model_architecture( + "transformer_from_pretrained_xlm", "transformer_from_pretrained_xlm" +) +def base_architecture(args): + transformer_base_architecture(args) diff --git a/SpeechT5/fairseq/fairseq/models/transformer_lm.py b/SpeechT5/fairseq/fairseq/models/transformer_lm.py new file mode 100644 index 0000000000000000000000000000000000000000..a546776912b24f8aec4011f52b5ac1884112634e --- /dev/null +++ b/SpeechT5/fairseq/fairseq/models/transformer_lm.py @@ -0,0 +1,544 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. 
+ + +from dataclasses import dataclass, field +from typing import Optional + +from fairseq import options, utils +from fairseq.dataclass import ChoiceEnum, FairseqDataclass +from fairseq.models import ( + FairseqLanguageModel, + register_model, + register_model_architecture, +) +from fairseq.models.transformer import ( + DEFAULT_MIN_PARAMS_TO_WRAP, Embedding, TransformerDecoder +) +from fairseq.modules import AdaptiveInput, CharacterTokenEmbedder +from omegaconf import II + + +DEFAULT_MAX_TARGET_POSITIONS = 1024 + + +@dataclass +class TransformerLanguageModelConfig(FairseqDataclass): + activation_fn: ChoiceEnum(utils.get_available_activation_fns()) = field( + default="relu", metadata={"help": "activation function to use"} + ) + dropout: float = field(default=0.1, metadata={"help": "dropout probability"}) + attention_dropout: float = field( + default=0.0, metadata={"help": "dropout probability for attention weights"} + ) + activation_dropout: float = field( + default=0.0, metadata={"help": "dropout probability after activation in FFN."} + ) + relu_dropout: float = field( + default=0.0, metadata={"help": "dropout probability after activation in FFN."} + ) + decoder_embed_dim: int = field( + default=512, metadata={"help": "decoder embedding dimension"} + ) + decoder_output_dim: int = field( + default=512, metadata={"help": "decoder output dimension"} + ) + decoder_input_dim: int = field( + default=512, metadata={"help": "decoder input dimension"} + ) + decoder_ffn_embed_dim: int = field( + default=2048, metadata={"help": "decoder embedding dimension for FFN"} + ) + decoder_layers: int = field(default=6, metadata={"help": "num decoder layers"}) + decoder_attention_heads: int = field( + default=8, metadata={"help": "num decoder attention heads"} + ) + decoder_normalize_before: bool = field( + default=False, metadata={"help": "apply layernorm before each decoder block"} + ) + no_decoder_final_norm: bool = field( + default=False, + metadata={"help": "don't add an extra layernorm after the last decoder block"}, + ) + adaptive_softmax_cutoff: Optional[str] = field( + default=None, + metadata={ + "help": "comma separated list of adaptive softmax cutoff points. 
" + "Must be used with adaptive_loss criterion" + }, + ) + adaptive_softmax_dropout: float = field( + default=0, + metadata={"help": "sets adaptive softmax dropout for the tail projections"}, + ) + adaptive_softmax_factor: float = field( + default=4, metadata={"help": "adaptive input factor"} + ) + no_token_positional_embeddings: bool = field( + default=False, + metadata={ + "help": "if set, disables positional embeddings (outside self attention)" + }, + ) + share_decoder_input_output_embed: bool = field( + default=False, metadata={"help": "share decoder input and output embeddings"} + ) + character_embeddings: bool = field( + default=False, + metadata={ + "help": "if set, uses character embedding convolutions to produce token embeddings" + }, + ) + character_filters: str = field( + default="[(1, 64), (2, 128), (3, 192), (4, 256), (5, 256), (6, 256), (7, 256)]", + metadata={"help": "size of character embeddings"}, + ) + character_embedding_dim: int = field( + default=4, metadata={"help": "size of character embeddings"} + ) + char_embedder_highway_layers: int = field( + default=2, + metadata={"help": "number of highway layers for character token embeddder"}, + ) + adaptive_input: bool = field( + default=False, metadata={"help": "if set, uses adaptive input"} + ) + adaptive_input_factor: float = field( + default=4, metadata={"help": "adaptive input factor"} + ) + adaptive_input_cutoff: Optional[str] = field( + default=None, + metadata={"help": "comma separated list of adaptive input cutoff points."}, + ) + tie_adaptive_weights: bool = field( + default=False, + metadata={ + "help": "if set, ties the weights of adaptive softmax and adaptive input" + }, + ) + tie_adaptive_proj: bool = field( + default=False, + metadata={ + "help": "if set, ties the projection weights of adaptive softmax and adaptive input" + }, + ) + decoder_learned_pos: bool = field( + default=False, + metadata={"help": "use learned positional embeddings in the decoder"}, + ) + layernorm_embedding: bool = field( + default=False, metadata={"help": "add layernorm to embedding"} + ) + no_scale_embedding: bool = field( + default=False, metadata={"help": "if True, dont scale embeddings"} + ) + checkpoint_activations: bool = field( + default=False, metadata={"help": "checkpoint activations at each layer"} + ) + offload_activations: bool = field( + default=False, + metadata={"help": "move checkpointed activations to CPU after they are used."}, + ) + # config for "Reducing Transformer Depth on Demand with Structured Dropout" (Fan et al., 2019) + decoder_layerdrop: float = field( + default=0.0, metadata={"help": "LayerDrop probability for decoder"} + ) + decoder_layers_to_keep: Optional[str] = field( + default=None, + metadata={ + "help": "which layers to *keep* when pruning as a comma-separated list" + }, + ) + # config for Training with Quantization Noise for Extreme Model Compression ({Fan*, Stock*} et al., 2020) + quant_noise_pq: float = field( + default=0.0, + metadata={"help": "iterative PQ quantization noise at training time"}, + ) + quant_noise_pq_block_size: int = field( + default=8, + metadata={"help": "block size of quantization noise at training time"}, + ) + quant_noise_scalar: float = field( + default=0.0, + metadata={ + "help": "scalar quantization noise and scalar quantization at training time" + }, + ) + # config for Fully Sharded Data Parallel (FSDP) training + min_params_to_wrap: int = field( + default=DEFAULT_MIN_PARAMS_TO_WRAP, + metadata={ + "help": ( + "minimum number of params for a layer to be wrapped with 
FSDP() when " + "training with --ddp-backend=fully_sharded. Smaller values will " + "improve memory efficiency, but may make torch.distributed " + "communication less efficient due to smaller input sizes. This option " + "is set to 0 (i.e., always wrap) when --checkpoint-activations or " + "--offload-activations are passed." + ) + } + ) + # config for "BASE Layers: Simplifying Training of Large, Sparse Models" + base_layers: Optional[int] = field( + default=0, metadata={"help": "number of BASE layers in total"} + ) + base_sublayers: Optional[int] = field( + default=1, metadata={"help": "number of sublayers in each BASE layer"} + ) + base_shuffle: Optional[int] = field( + default=1, metadata={"help": "shuffle tokens between workers before computing assignment"} + ) + # options from other parts of the config + add_bos_token: bool = II("task.add_bos_token") + tokens_per_sample: int = II("task.tokens_per_sample") + max_target_positions: Optional[int] = II("task.max_target_positions") + tpu: bool = II("common.tpu") + + +@register_model("transformer_lm", dataclass=TransformerLanguageModelConfig) +class TransformerLanguageModel(FairseqLanguageModel): + @classmethod + def hub_models(cls): + def moses_fastbpe(path): + return {"path": path, "tokenizer": "moses", "bpe": "fastbpe"} + + def spm(path): + return {"path": path, "tokenizer": "space", "bpe": "sentencepiece"} + + return { + "transformer_lm.gbw.adaptive_huge": "https://dl.fbaipublicfiles.com/fairseq/models/lm/adaptive_lm_gbw_huge.tar.bz2", + "transformer_lm.wiki103.adaptive": "https://dl.fbaipublicfiles.com/fairseq/models/lm/adaptive_lm_wiki103.v2.tar.bz2", + "transformer_lm.wmt19.en": moses_fastbpe( + "https://dl.fbaipublicfiles.com/fairseq/models/lm/wmt19.en.tar.bz2" + ), + "transformer_lm.wmt19.de": moses_fastbpe( + "https://dl.fbaipublicfiles.com/fairseq/models/lm/wmt19.de.tar.bz2" + ), + "transformer_lm.wmt19.ru": moses_fastbpe( + "https://dl.fbaipublicfiles.com/fairseq/models/lm/wmt19.ru.tar.bz2" + ), + "transformer_lm.wmt20.en": spm( + "https://dl.fbaipublicfiles.com/fairseq/models/lm/wmt20.en.tar.gz" + ), + "transformer_lm.wmt20.ta": spm( + "https://dl.fbaipublicfiles.com/fairseq/models/lm/wmt20.ta.tar.gz" + ), + "transformer_lm.wmt20.iu.news": spm( + "https://dl.fbaipublicfiles.com/fairseq/models/lm/wmt20.iu.news.tar.gz" + ), + "transformer_lm.wmt20.iu.nh": spm( + "https://dl.fbaipublicfiles.com/fairseq/models/lm/wmt20.iu.nh.tar.gz" + ), + } + + def __init__(self, decoder): + super().__init__(decoder) + + @classmethod + def build_model(cls, args, task): + """Build a new model instance.""" + + if args.decoder_layers_to_keep: + args.decoder_layers = len(args.decoder_layers_to_keep.split(",")) + + if getattr(args, "max_target_positions", None) is None: + args.max_target_positions = getattr( + args, "tokens_per_sample", DEFAULT_MAX_TARGET_POSITIONS + ) + + if args.character_embeddings: + embed_tokens = CharacterTokenEmbedder( + task.source_dictionary, + eval(args.character_filters), + args.character_embedding_dim, + args.decoder_embed_dim, + args.char_embedder_highway_layers, + ) + elif args.adaptive_input: + embed_tokens = AdaptiveInput( + len(task.source_dictionary), + task.source_dictionary.pad(), + args.decoder_input_dim, + args.adaptive_input_factor, + args.decoder_embed_dim, + options.eval_str_list(args.adaptive_input_cutoff, type=int), + args.quant_noise_pq, + args.quant_noise_pq_block_size, + ) + else: + embed_tokens = cls.build_embedding( + args, task.source_dictionary, args.decoder_input_dim + ) + + if 
args.tie_adaptive_weights: + assert args.adaptive_input + assert args.adaptive_input_factor == args.adaptive_softmax_factor + assert ( + args.adaptive_softmax_cutoff == args.adaptive_input_cutoff + ), "{} != {}".format( + args.adaptive_softmax_cutoff, args.adaptive_input_cutoff + ) + assert args.decoder_input_dim == args.decoder_output_dim + + decoder = TransformerDecoder( + args, task.target_dictionary, embed_tokens, no_encoder_attn=True + ) + return cls(decoder) + + @classmethod + def build_embedding(cls, args, dictionary, embed_dim, path=None): + embed_tokens = Embedding(len(dictionary), embed_dim, dictionary.pad()) + return embed_tokens + + +def base_lm_architecture(args): + # backward compatibility for older model checkpoints + if hasattr(args, "no_tie_adaptive_proj"): + # previous models defined --no-tie-adaptive-proj, so use the existence of + # that option to determine if this is an "old" model checkpoint + args.no_decoder_final_norm = True # old models always set this to True + if args.no_tie_adaptive_proj is False: + args.tie_adaptive_proj = True + if hasattr(args, "decoder_final_norm"): + args.no_decoder_final_norm = not args.decoder_final_norm + + args.dropout = getattr(args, "dropout", 0.1) + args.attention_dropout = getattr(args, "attention_dropout", 0.0) + + args.decoder_embed_dim = getattr(args, "decoder_embed_dim", 512) + args.decoder_ffn_embed_dim = getattr(args, "decoder_ffn_embed_dim", 2048) + args.decoder_layers = getattr(args, "decoder_layers", 6) + args.decoder_attention_heads = getattr(args, "decoder_attention_heads", 8) + args.adaptive_softmax_cutoff = getattr(args, "adaptive_softmax_cutoff", None) + args.adaptive_softmax_dropout = getattr(args, "adaptive_softmax_dropout", 0) + args.adaptive_softmax_factor = getattr(args, "adaptive_softmax_factor", 4) + args.decoder_learned_pos = getattr(args, "decoder_learned_pos", False) + args.activation_fn = getattr(args, "activation_fn", "relu") + + args.decoder_layerdrop = getattr(args, "decoder_layerdrop", 0) + args.decoder_layers_to_keep = getattr(args, "decoder_layers_to_keep", None) + args.quant_noise_pq = getattr(args, "quant_noise_pq", 0) + args.quant_noise_pq_block_size = getattr(args, "quant_noise_pq_block_size", 8) + args.quant_noise_scalar = getattr(args, "quant_noise_scalar", 0) + + args.base_layers = getattr(args, "base_layers", 0) + args.base_sublayers = getattr(args, "base_sublayers", 1) + args.base_shuffle = getattr(args, "base_shuffle", False) + + args.add_bos_token = getattr(args, "add_bos_token", False) + args.no_token_positional_embeddings = getattr( + args, "no_token_positional_embeddings", False + ) + args.share_decoder_input_output_embed = getattr( + args, "share_decoder_input_output_embed", False + ) + args.character_embeddings = getattr(args, "character_embeddings", False) + + args.decoder_output_dim = getattr( + args, "decoder_output_dim", args.decoder_embed_dim + ) + args.decoder_input_dim = getattr(args, "decoder_input_dim", args.decoder_embed_dim) + + # Model training is not stable without this + args.decoder_normalize_before = True + args.no_decoder_final_norm = getattr(args, "no_decoder_final_norm", False) + + args.adaptive_input = getattr(args, "adaptive_input", False) + args.adaptive_input_factor = getattr(args, "adaptive_input_factor", 4) + args.adaptive_input_cutoff = getattr(args, "adaptive_input_cutoff", None) + + args.tie_adaptive_weights = getattr(args, "tie_adaptive_weights", False) + args.tie_adaptive_proj = getattr(args, "tie_adaptive_proj", False) + + args.no_scale_embedding = 
getattr(args, "no_scale_embedding", False) + args.layernorm_embedding = getattr(args, "layernorm_embedding", False) + args.checkpoint_activations = getattr(args, "checkpoint_activations", False) + args.offload_activations = getattr(args, "offload_activations", False) + if args.offload_activations: + args.checkpoint_activations = True + + +@register_model_architecture("transformer_lm", "transformer_lm_big") +def transformer_lm_big(args): + args.decoder_layers = getattr(args, "decoder_layers", 12) + args.decoder_embed_dim = getattr(args, "decoder_embed_dim", 1024) + args.decoder_ffn_embed_dim = getattr(args, "decoder_ffn_embed_dim", 4096) + args.decoder_attention_heads = getattr(args, "decoder_attention_heads", 16) + base_lm_architecture(args) + + +@register_model_architecture("transformer_lm", "transformer_lm_wiki103") +@register_model_architecture("transformer_lm", "transformer_lm_baevski_wiki103") +def transformer_lm_baevski_wiki103(args): + args.decoder_layers = getattr(args, "decoder_layers", 16) + args.decoder_attention_heads = getattr(args, "decoder_attention_heads", 8) + args.dropout = getattr(args, "dropout", 0.3) + args.adaptive_input = getattr(args, "adaptive_input", True) + args.tie_adaptive_weights = getattr(args, "tie_adaptive_weights", True) + args.adaptive_input_cutoff = getattr(args, "adaptive_input_cutoff", "20000,60000") + args.adaptive_softmax_cutoff = getattr( + args, "adaptive_softmax_cutoff", "20000,60000" + ) + args.adaptive_softmax_dropout = getattr(args, "adaptive_softmax_dropout", 0.2) + args.attention_dropout = getattr(args, "attention_dropout", 0.1) + args.activation_dropout = getattr(args, "activation_dropout", 0.1) + args.no_decoder_final_norm = getattr(args, "no_decoder_final_norm", True) + args.tie_adaptive_proj = getattr(args, "tie_adaptive_proj", True) + transformer_lm_big(args) + + +@register_model_architecture("transformer_lm", "transformer_lm_gbw") +@register_model_architecture("transformer_lm", "transformer_lm_baevski_gbw") +def transformer_lm_baevski_gbw(args): + args.decoder_embed_dim = getattr(args, "decoder_embed_dim", 512) + args.dropout = getattr(args, "dropout", 0.1) + args.attention_dropout = getattr(args, "attention_dropout", 0.1) + args.no_decoder_final_norm = getattr(args, "no_decoder_final_norm", True) + transformer_lm_big(args) + + +@register_model_architecture("transformer_lm", "transformer_lm_gpt") +def transformer_lm_gpt(args): + args.decoder_embed_dim = getattr(args, "decoder_embed_dim", 768) + args.decoder_ffn_embed_dim = getattr(args, "decoder_ffn_embed_dim", 3072) + args.decoder_layers = getattr(args, "decoder_layers", 12) + args.decoder_attention_heads = getattr(args, "decoder_attention_heads", 12) + args.dropout = getattr(args, "dropout", 0.1) + args.attention_dropout = getattr(args, "attention_dropout", 0.1) + args.activation_fn = getattr(args, "activation_fn", "gelu") + base_lm_architecture(args) + + +@register_model_architecture("transformer_lm", "transformer_lm_gpt2_small") +def transformer_lm_gpt2_small(args): + args.decoder_embed_dim = getattr(args, "decoder_embed_dim", 1024) + args.decoder_ffn_embed_dim = getattr(args, "decoder_ffn_embed_dim", 4096) + args.decoder_layers = getattr(args, "decoder_layers", 24) + args.decoder_attention_heads = getattr(args, "decoder_attention_heads", 16) + args.dropout = getattr(args, "dropout", 0.1) + args.attention_dropout = getattr(args, "attention_dropout", 0.1) + args.activation_fn = getattr(args, "activation_fn", "gelu") + base_lm_architecture(args) + + 
+@register_model_architecture("transformer_lm", "transformer_lm_gpt2_tiny") +def transformer_lm_gpt2_tiny(args): + args.decoder_embed_dim = getattr(args, "decoder_embed_dim", 64) + args.decoder_ffn_embed_dim = getattr(args, "decoder_ffn_embed_dim", 64) + args.decoder_layers = getattr(args, "decoder_layers", 2) + args.decoder_attention_heads = getattr(args, "decoder_attention_heads", 1) + args.dropout = getattr(args, "dropout", 0.1) + args.attention_dropout = getattr(args, "attention_dropout", 0.1) + args.activation_fn = getattr(args, "activation_fn", "gelu") + base_lm_architecture(args) + + +@register_model_architecture("transformer_lm", "transformer_lm_gpt2_medium") +def transformer_lm_gpt2_medium(args): + args.decoder_embed_dim = getattr(args, "decoder_embed_dim", 1280) + args.decoder_ffn_embed_dim = getattr(args, "decoder_ffn_embed_dim", 5120) + args.decoder_layers = getattr(args, "decoder_layers", 36) + args.decoder_attention_heads = getattr(args, "decoder_attention_heads", 20) + args.dropout = getattr(args, "dropout", 0.1) + args.attention_dropout = getattr(args, "attention_dropout", 0.1) + args.activation_fn = getattr(args, "activation_fn", "gelu") + base_lm_architecture(args) + + +@register_model_architecture("transformer_lm", "transformer_lm_gpt2_big") +def transformer_lm_gpt2_big(args): + args.decoder_embed_dim = getattr(args, "decoder_embed_dim", 1600) + args.decoder_ffn_embed_dim = getattr(args, "decoder_ffn_embed_dim", 6400) + args.decoder_layers = getattr(args, "decoder_layers", 48) + args.decoder_attention_heads = getattr(args, "decoder_attention_heads", 25) + args.dropout = getattr(args, "dropout", 0.1) + args.attention_dropout = getattr(args, "attention_dropout", 0.1) + args.activation_fn = getattr(args, "activation_fn", "gelu") + base_lm_architecture(args) + + +def base_gpt3_architecture(args): + args.decoder_input_dim = args.decoder_embed_dim + args.decoder_output_dim = args.decoder_embed_dim + args.decoder_ffn_embed_dim = getattr(args, "decoder_ffn_embed_dim", args.decoder_embed_dim * 4) + # GPT-3 used learned positional embeddings, rather than sinusoidal + args.decoder_learned_pos = getattr(args, "decoder_learned_pos", True) + args.dropout = getattr(args, "dropout", 0.0) + args.attention_dropout = getattr(args, "attention_dropout", 0.0) + args.activation_fn = getattr(args, "activation_fn", "gelu") + args.share_decoder_input_output_embed = True + base_lm_architecture(args) + + +@register_model_architecture("transformer_lm", "transformer_lm_gpt3_small") +def transformer_lm_gpt3_small(args): + # 125M params + args.decoder_layers = getattr(args, "decoder_layers", 12) + args.decoder_embed_dim = getattr(args, "decoder_embed_dim", 768) + args.decoder_attention_heads = getattr(args, "decoder_attention_heads", 12) + base_gpt3_architecture(args) + + +@register_model_architecture("transformer_lm", "transformer_lm_gpt3_medium") +def transformer_lm_gpt3_medium(args): + # 350M params + args.decoder_layers = getattr(args, "decoder_layers", 24) + args.decoder_embed_dim = getattr(args, "decoder_embed_dim", 1024) + args.decoder_attention_heads = getattr(args, "decoder_attention_heads", 16) + base_gpt3_architecture(args) + + +@register_model_architecture("transformer_lm", "transformer_lm_gpt3_large") +def transformer_lm_gpt3_large(args): + # 760M params + args.decoder_layers = getattr(args, "decoder_layers", 24) + args.decoder_embed_dim = getattr(args, "decoder_embed_dim", 1536) + args.decoder_attention_heads = getattr(args, "decoder_attention_heads", 16) + base_gpt3_architecture(args) + 
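The `# ...M params` comments on these GPT-3 style presets line up with the usual 12·L·d² transformer rule of thumb plus a tied embedding table. A quick back-of-the-envelope check (the ~50k GPT-2 style vocabulary size is an assumption; biases and layer norms are ignored):

```python
# Rough parameter estimate for the decoder-only presets above:
# each block has ~4*d^2 attention weights (q, k, v, out projections)
# plus ~8*d^2 FFN weights (ffn_dim = 4*d in base_gpt3_architecture),
# and the tied embedding adds vocab_size * d.
def approx_lm_params(embed_dim: int, layers: int, vocab_size: int = 50257) -> int:
    per_layer = 12 * embed_dim ** 2
    return layers * per_layer + vocab_size * embed_dim

print(f"{approx_lm_params(768, 12) / 1e6:.0f}M")   # ~124M, transformer_lm_gpt3_small (listed as 125M)
print(f"{approx_lm_params(1024, 24) / 1e6:.0f}M")  # ~353M, transformer_lm_gpt3_medium (listed as 350M)
print(f"{approx_lm_params(1536, 24) / 1e6:.0f}M")  # ~757M, transformer_lm_gpt3_large (listed as 760M)
```
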
+ +@register_model_architecture("transformer_lm", "transformer_lm_gpt3_xl") +def transformer_lm_gpt3_xl(args): + # 1.3B params + args.decoder_layers = getattr(args, "decoder_layers", 24) + args.decoder_embed_dim = getattr(args, "decoder_embed_dim", 2048) + args.decoder_attention_heads = getattr(args, "decoder_attention_heads", 32) + base_gpt3_architecture(args) + + +@register_model_architecture("transformer_lm", "transformer_lm_gpt3_2_7") +def transformer_lm_gpt3_2_7(args): + # 2.7B params + args.decoder_layers = getattr(args, "decoder_layers", 32) + args.decoder_embed_dim = getattr(args, "decoder_embed_dim", 2560) + args.decoder_attention_heads = getattr(args, "decoder_attention_heads", 32) + base_gpt3_architecture(args) + + +@register_model_architecture("transformer_lm", "transformer_lm_gpt3_6_7") +def transformer_lm_gpt3_6_7(args): + # 6.7B params + args.decoder_layers = getattr(args, "decoder_layers", 32) + args.decoder_embed_dim = getattr(args, "decoder_embed_dim", 4096) + args.decoder_attention_heads = getattr(args, "decoder_attention_heads", 32) + base_gpt3_architecture(args) + + +@register_model_architecture("transformer_lm", "transformer_lm_gpt3_13") +def transformer_lm_gpt3_13(args): + # 13B params + args.decoder_layers = getattr(args, "decoder_layers", 40) + args.decoder_embed_dim = getattr(args, "decoder_embed_dim", 5120) + args.decoder_attention_heads = getattr(args, "decoder_attention_heads", 40) + base_gpt3_architecture(args) + + +@register_model_architecture("transformer_lm", "transformer_lm_gpt3_175") +def transformer_lm_gpt3_175(args): + # 175B params + args.decoder_layers = getattr(args, "decoder_layers", 96) + args.decoder_embed_dim = getattr(args, "decoder_embed_dim", 12288) + args.decoder_attention_heads = getattr(args, "decoder_attention_heads", 96) + base_gpt3_architecture(args) diff --git a/SpeechT5/fairseq/fairseq/models/wav2vec/__init__.py b/SpeechT5/fairseq/fairseq/models/wav2vec/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..06cec18183ca14cd534d14558e8b44e25f3e69d5 --- /dev/null +++ b/SpeechT5/fairseq/fairseq/models/wav2vec/__init__.py @@ -0,0 +1,8 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +from .wav2vec import * # noqa +from .wav2vec2 import * # noqa +from .wav2vec2_asr import * # noqa diff --git a/SpeechT5/fairseq/fairseq/models/wav2vec/wav2vec.py b/SpeechT5/fairseq/fairseq/models/wav2vec/wav2vec.py new file mode 100644 index 0000000000000000000000000000000000000000..af6604da10f504baabff50bf14a6eb2214bffef3 --- /dev/null +++ b/SpeechT5/fairseq/fairseq/models/wav2vec/wav2vec.py @@ -0,0 +1,630 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. 
+ +from dataclasses import dataclass, field +import logging +import math +from typing import Optional, Tuple +from omegaconf import II +import sys + +import torch +import torch.nn as nn +import torch.nn.functional as F +from fairseq.dataclass import ChoiceEnum, FairseqDataclass +from fairseq.models import BaseFairseqModel, register_model +from fairseq.modules import ( + Fp32GroupNorm, + Fp32LayerNorm, + GumbelVectorQuantizer, + KmeansVectorQuantizer, + TransposeLast, +) +from fairseq.tasks import FairseqTask +from fairseq.utils import buffered_arange + + +logger = logging.getLogger(__name__) + + +AGGREGATOR_CHOICES = ChoiceEnum(["cnn", "gru"]) +PROJECT_FEATURES_CHOICES = ChoiceEnum(["none", "same", "new"]) +ACTIVATION_CHOICES = ChoiceEnum(["relu", "gelu"]) +VQ_TYPE_CHOICES = ChoiceEnum(["none", "gumbel", "kmeans"]) + + +@dataclass +class Wav2VecConfig(FairseqDataclass): + prediction_steps: int = field( + default=12, metadata={"help": "number of steps ahead to predict"} + ) + sample_distance: Optional[int] = field( + default=None, + metadata={ + "help": "sample distance from target. does not work properly with cross-sampling" + }, + ) + cross_sample_negatives: int = field( + default=0, metadata={"help": "num of cross sampled negatives"} + ) + num_negatives: int = field( + default=10, metadata={"help": "num of sampled negatives"} + ) + conv_feature_layers: str = field( + default="[(512, 10, 5), (512, 8, 4), (512, 4, 2), (512, 4, 2), (512, 4, 2), (512, 1, 1), (512, 1, 1), (512, 1, 1)]", + metadata={ + "help": "convolutional feature extraction layers [(dim, kernel_size, stride), ...]" + }, + ) + conv_aggregator_layers: str = field( + default="[(512, 2, 1), (512, 3, 1), (512, 4, 1), (512, 5, 1), (512, 6, 1), (512, 7, 1), (512, 8, 1), (512, 9, 1), (512, 10, 1), (512, 11, 1), (512, 12, 1), (512, 13, 1)]", + metadata={ + "help": "convolutional aggregator layers [(dim, kernel_size, stride), ...]" + }, + ) + dropout: float = field( + default=0.0, metadata={"help": "dropout to apply within the model"} + ) + dropout_features: float = field( + default=0.0, metadata={"help": "dropout to apply to the features"} + ) + dropout_agg: float = field( + default=0.0, metadata={"help": "dropout to apply after aggregation step"} + ) + aggregator: AGGREGATOR_CHOICES = field( + default="cnn", metadata={"help": "type of aggregator to use"} + ) + gru_dim: int = field(default=512, metadata={"help": "GRU dimensionality"}) + no_conv_bias: bool = field( + default=False, metadata={"help": "if set, does not learn bias for conv layers"} + ) + agg_zero_pad: bool = field( + default=False, + metadata={"help": "if set, zero pads in aggregator instead of repl pad"}, + ) + skip_connections_feat: bool = field( + default=False, + metadata={"help": "if set, adds skip connections to the feature extractor"}, + ) + skip_connections_agg: bool = field( + default=True, + metadata={"help": "if set, adds skip connections to the aggregator"}, + ) + residual_scale: float = field( + default=0.5, metadata={"help": "scales residual by sqrt(value)"} + ) + log_compression: bool = field( + default=True, + metadata={"help": "if set, adds a log compression to feature extractor"}, + ) + balanced_classes: bool = field( + default=False, + metadata={"help": "if set, loss is scaled to balance for number of negatives"}, + ) + project_features: PROJECT_FEATURES_CHOICES = field( + default="none", + metadata={ + "help": "if not none, features are projected using the (same or new) aggregator" + }, + ) + non_affine_group_norm: bool = field( + default=False, 
metadata={"help": "if set, group norm is not affine"} + ) + offset: str = field( + default="auto", + metadata={ + "help": "if set to 'auto', it is computed automatically from the receptive field, else set to int value" + }, + ) + activation: ACTIVATION_CHOICES = field( + default="relu", + metadata={ + "help": "if set to 'auto', it is computed automatically from the receptive field, else set to int value" + }, + ) + vq_type: VQ_TYPE_CHOICES = field( + default="none", metadata={"help": "which type of quantizer to use"} + ) + vq_vars: int = field( + default=320, + metadata={"help": "project to this many vector quantized variables per group"}, + ) + vq_groups: int = field( + default=2, metadata={"help": "number of groups of latent variables"} + ) + vq_dim: int = field( + default=0, + metadata={ + "help": "uses this dimensionality for quantized vectors. 0 to use model dim // groups" + }, + ) + vq_depth: int = field( + default=1, metadata={"help": "number of layers for vq weight projection"} + ) + combine_groups: bool = field( + default=False, metadata={"help": "if set, variables are shared among groups"} + ) + vq_temp: Tuple[float, float, float] = field( + default=(2.0, 0.5, 0.999995), + metadata={ + "help": "temperature for latent variable sampling with gumbel softmax. should be a tuple of 3 values (start, end, decay)" + }, + ) + vq_gamma: float = field( + default=0.25, + metadata={"help": "gamma parameter for kmeans style vector quantization"}, + ) + infonce: bool = II("criterion.infonce") + + +@register_model("wav2vec", dataclass=Wav2VecConfig) +class Wav2VecModel(BaseFairseqModel): + @classmethod + def build_model(cls, cfg: Wav2VecConfig, task: FairseqTask): + """Build a new model instance.""" + + model = Wav2VecModel(cfg) + logger.info(model) + return model + + def __init__(self, cfg: Wav2VecConfig): + super().__init__() + + self.prediction_steps = cfg.prediction_steps + offset = cfg.offset + + if cfg.activation == "relu": + activation = nn.ReLU() + elif cfg.activation == "gelu": + activation = nn.GELU() + else: + raise Exception("unknown activation " + cfg.activation) + + feature_enc_layers = eval(cfg.conv_feature_layers) + self.feature_extractor = ConvFeatureExtractionModel( + conv_layers=feature_enc_layers, + dropout=0.0, + log_compression=cfg.log_compression, + skip_connections=cfg.skip_connections_feat, + residual_scale=cfg.residual_scale, + non_affine_group_norm=cfg.non_affine_group_norm, + activation=activation, + ) + embed = feature_enc_layers[-1][0] + + self.vector_quantizer = None + if cfg.vq_type == "gumbel": + self.vector_quantizer = GumbelVectorQuantizer( + dim=embed, + num_vars=cfg.vq_vars, + temp=cfg.vq_temp, + groups=cfg.vq_groups, + combine_groups=cfg.combine_groups, + vq_dim=cfg.vq_dim if cfg.vq_dim > 0 else embed, + time_first=False, + activation=activation, + weight_proj_depth=cfg.vq_depth, + weight_proj_factor=2, + ) + elif cfg.vq_type == "kmeans": + self.vector_quantizer = KmeansVectorQuantizer( + dim=embed, + num_vars=cfg.vq_vars, + groups=cfg.vq_groups, + combine_groups=cfg.combine_groups, + vq_dim=cfg.vq_dim if cfg.vq_dim > 0 else embed, + time_first=False, + gamma=cfg.vq_gamma, + ) + else: + assert ( + cfg.vq_type == "none" or cfg.vq_type is None + ), "Unknown quantizer type" + + if cfg.offset == "auto": + jin = 0 + rin = 0 + for _, k, stride in feature_enc_layers: + if rin == 0: + rin = k + rin = rin + (k - 1) * jin + if jin == 0: + jin = stride + else: + jin *= stride + offset = math.ceil(rin / jin) + + offset = int(offset) + + def make_aggregator(): + if 
cfg.aggregator == "cnn": + agg_layers = eval(cfg.conv_aggregator_layers) + agg_dim = agg_layers[-1][0] + feature_aggregator = ConvAggegator( + conv_layers=agg_layers, + embed=embed, + dropout=cfg.dropout, + skip_connections=cfg.skip_connections_agg, + residual_scale=cfg.residual_scale, + non_affine_group_norm=cfg.non_affine_group_norm, + conv_bias=not cfg.no_conv_bias, + zero_pad=cfg.agg_zero_pad, + activation=activation, + ) + elif cfg.aggregator == "gru": + agg_dim = cfg.gru_dim + feature_aggregator = nn.Sequential( + TransposeLast(), + nn.GRU( + input_size=embed, + hidden_size=agg_dim, + num_layers=1, + dropout=cfg.dropout, + ), + TransposeLast(deconstruct_idx=0), + ) + else: + raise Exception("unknown aggregator type " + cfg.aggregator) + + return feature_aggregator, agg_dim + + self.feature_aggregator, agg_dim = make_aggregator() + + self.wav2vec_predictions = Wav2VecPredictionsModel( + in_dim=agg_dim, + out_dim=embed, + prediction_steps=cfg.prediction_steps, + n_negatives=cfg.num_negatives, + cross_sample_negatives=cfg.cross_sample_negatives, + sample_distance=cfg.sample_distance, + dropout=cfg.dropout, + offset=offset, + balanced_classes=cfg.balanced_classes, + infonce=cfg.infonce, + ) + + self.dropout_feats = nn.Dropout(p=cfg.dropout_features) + self.dropout_agg = nn.Dropout(p=cfg.dropout_agg) + + if cfg.project_features == "none": + self.project_features = None + elif cfg.project_features == "same": + self.project_features = self.feature_aggregator + elif cfg.project_features == "new": + self.project_features, _ = make_aggregator() + + def forward(self, source): + result = {} + + features = self.feature_extractor(source) + if self.vector_quantizer: + q_res = self.vector_quantizer(features) + features = q_res["x"] + for k in q_res.keys(): + if k != "x": + result[k] = q_res[k] + + x = self.dropout_feats(features) + x = self.feature_aggregator(x) + x = self.dropout_agg(x) + + if self.project_features is not None: + features = self.project_features(features) + x, targets = self.wav2vec_predictions(x, features) + result["cpc_logits"] = x + result["cpc_targets"] = targets + + return result + + def upgrade_state_dict_named(self, state_dict, name): + super().upgrade_state_dict_named(state_dict, name) + + def max_positions(self): + """Maximum length supported by the model.""" + return sys.maxsize + + def get_logits(self, net_output): + logits = net_output["cpc_logits"] + return logits + + def get_targets(self, sample, net_output): + t = net_output["cpc_targets"] + if isinstance(t, tuple): + t = t[0] + return t.contiguous() + + def get_target_weights(self, targets, net_output): + targets = net_output["cpc_targets"] + if isinstance(targets, tuple) and targets[-1] is not None: + return targets[-1] + return None + + def get_extra_losses(self, net_output): + loss = None + if "prob_perplexity" in net_output: + loss = net_output["num_vars"] - net_output["prob_perplexity"] + elif "kmeans_loss" in net_output: + loss = net_output["kmeans_loss"] + + return loss + + +def norm_block(is_layer_norm, dim, affine=True): + if is_layer_norm: + mod = nn.Sequential( + TransposeLast(), + Fp32LayerNorm(dim, elementwise_affine=affine), + TransposeLast(), + ) + else: + mod = Fp32GroupNorm(1, dim, affine=affine) + + return mod + + +class ConvFeatureExtractionModel(nn.Module): + def __init__( + self, + conv_layers, + dropout, + log_compression, + skip_connections, + residual_scale, + non_affine_group_norm, + activation, + ): + super().__init__() + + def block(n_in, n_out, k, stride): + return nn.Sequential( + 
nn.Conv1d(n_in, n_out, k, stride=stride, bias=False), + nn.Dropout(p=dropout), + norm_block( + is_layer_norm=False, dim=n_out, affine=not non_affine_group_norm + ), + activation, + ) + + in_d = 1 + self.conv_layers = nn.ModuleList() + for dim, k, stride in conv_layers: + self.conv_layers.append(block(in_d, dim, k, stride)) + in_d = dim + + self.log_compression = log_compression + self.skip_connections = skip_connections + self.residual_scale = math.sqrt(residual_scale) + + def forward(self, x): + # BxT -> BxCxT + x = x.unsqueeze(1) + + for conv in self.conv_layers: + residual = x + x = conv(x) + if self.skip_connections and x.size(1) == residual.size(1): + tsz = x.size(2) + r_tsz = residual.size(2) + residual = residual[..., :: r_tsz // tsz][..., :tsz] + x = (x + residual) * self.residual_scale + + if self.log_compression: + x = x.abs() + x = x + 1 + x = x.log() + + return x + + +class ZeroPad1d(nn.Module): + def __init__(self, pad_left, pad_right): + super().__init__() + self.pad_left = pad_left + self.pad_right = pad_right + + def forward(self, x): + return F.pad(x, (self.pad_left, self.pad_right)) + + +class ConvAggegator(nn.Module): + def __init__( + self, + conv_layers, + embed, + dropout, + skip_connections, + residual_scale, + non_affine_group_norm, + conv_bias, + zero_pad, + activation, + ): + super().__init__() + + def block(n_in, n_out, k, stride): + # padding dims only really make sense for stride = 1 + ka = k // 2 + kb = ka - 1 if k % 2 == 0 else ka + + pad = ( + ZeroPad1d(ka + kb, 0) if zero_pad else nn.ReplicationPad1d((ka + kb, 0)) + ) + + return nn.Sequential( + pad, + nn.Conv1d(n_in, n_out, k, stride=stride, bias=conv_bias), + nn.Dropout(p=dropout), + norm_block(False, n_out, affine=not non_affine_group_norm), + activation, + ) + + in_d = embed + self.conv_layers = nn.ModuleList() + self.residual_proj = nn.ModuleList() + for dim, k, stride in conv_layers: + if in_d != dim and skip_connections: + self.residual_proj.append(nn.Conv1d(in_d, dim, 1, bias=False)) + else: + self.residual_proj.append(None) + + self.conv_layers.append(block(in_d, dim, k, stride)) + in_d = dim + self.conv_layers = nn.Sequential(*self.conv_layers) + self.skip_connections = skip_connections + self.residual_scale = math.sqrt(residual_scale) + + def forward(self, x): + for rproj, conv in zip(self.residual_proj, self.conv_layers): + residual = x + x = conv(x) + if self.skip_connections: + if rproj is not None: + residual = rproj(residual) + x = (x + residual) * self.residual_scale + return x + + +class Wav2VecPredictionsModel(nn.Module): + def __init__( + self, + in_dim, + out_dim, + prediction_steps, + n_negatives, + cross_sample_negatives, + sample_distance, + dropout, + offset, + balanced_classes, + infonce, + ): + super().__init__() + + self.n_negatives = n_negatives + self.cross_sample_negatives = cross_sample_negatives + self.sample_distance = sample_distance + self.project_to_steps = nn.ConvTranspose2d( + in_dim, out_dim, (1, prediction_steps) + ) + self.dropout = nn.Dropout(p=dropout) + self.offset = offset + self.balanced_classes = balanced_classes + self.infonce = infonce + + def sample_negatives(self, y): + bsz, fsz, tsz = y.shape + + y = y.transpose(0, 1) # BCT -> CBT + y = y.contiguous().view(fsz, -1) # CBT => C(BxT) + + cross_high = tsz * bsz + high = tsz if self.sample_distance is None else min(tsz, self.sample_distance) + assert high > 1 + + neg_idxs = torch.randint(low=0, high=high, size=(bsz, self.n_negatives * tsz)) + + with torch.no_grad(): + if self.n_negatives > 0: + tszs = ( + 
buffered_arange(tsz) + .unsqueeze(-1) + .expand(-1, self.n_negatives) + .flatten() + ) + + neg_idxs = torch.randint( + low=0, high=high - 1, size=(bsz, self.n_negatives * tsz) + ) + neg_idxs[neg_idxs >= tszs] += 1 + + if self.cross_sample_negatives > 0: + tszs = ( + buffered_arange(tsz) + .unsqueeze(-1) + .expand(-1, self.cross_sample_negatives) + .flatten() + ) + + cross_neg_idxs = torch.randint( + low=0, + high=cross_high - 1, + size=(bsz, self.cross_sample_negatives * tsz), + ) + cross_neg_idxs[cross_neg_idxs >= tszs] += 1 + + if self.n_negatives > 0: + for i in range(1, bsz): + neg_idxs[i] += i * high + else: + neg_idxs = cross_neg_idxs + + if self.cross_sample_negatives > 0 and self.n_negatives > 0: + neg_idxs = torch.cat([neg_idxs, cross_neg_idxs], dim=1) + + negs = y[..., neg_idxs.view(-1)] + negs = negs.view( + fsz, bsz, self.n_negatives + self.cross_sample_negatives, tsz + ).permute( + 2, 1, 0, 3 + ) # to NxBxCxT + + return negs + + def forward(self, x, y): + + x = x.unsqueeze(-1) + x = self.project_to_steps(x) # BxCxTxS + x = self.dropout(x) + + negatives = self.sample_negatives(y) + y = y.unsqueeze(0) + targets = torch.cat([y, negatives], dim=0) # Copies x B x C x T + + copies = targets.size(0) + bsz, dim, tsz, steps = x.shape + steps = min(steps, tsz - self.offset) + + predictions = x.new( + bsz * copies * (tsz - self.offset + 1) * steps + - ((steps + 1) * steps // 2) * copies * bsz + ) + if self.infonce: + labels = predictions.new_full( + (predictions.shape[0] // copies,), 0, dtype=torch.long + ) + else: + labels = torch.zeros_like(predictions) + weights = ( + torch.full_like(labels, 1 / self.n_negatives) + if self.balanced_classes and not self.infonce + else None + ) + + start = end = 0 + for i in range(steps): + offset = i + self.offset + end = start + (tsz - offset) * bsz * copies + if self.infonce: + predictions[start:end] = torch.einsum( + "bct,nbct->tbn", x[..., :-offset, i], targets[..., offset:] + ).flatten() + else: + pos_num = (end - start) // copies + predictions[start:end] = torch.einsum( + "bct,nbct->nbt", x[..., :-offset, i], targets[..., offset:] + ).flatten() + labels[start : start + pos_num] = 1.0 + if weights is not None: + weights[start : start + pos_num] = 1.0 + start = end + assert end == predictions.numel(), "{} != {}".format(end, predictions.numel()) + + if self.infonce: + predictions = predictions.view(-1, copies) + else: + if weights is not None: + labels = (labels, weights) + + return predictions, labels diff --git a/SpeechT5/fairseq/fairseq/models/wav2vec/wav2vec2.py b/SpeechT5/fairseq/fairseq/models/wav2vec/wav2vec2.py new file mode 100644 index 0000000000000000000000000000000000000000..714fd3ab50443b8d15715b1cf5abd4eb517298c4 --- /dev/null +++ b/SpeechT5/fairseq/fairseq/models/wav2vec/wav2vec2.py @@ -0,0 +1,1016 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. 
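+
+# NOTE: this file adds the wav2vec 2.0 pre-training model: `Wav2Vec2Config` and
+# `Wav2Vec2Model` (convolutional feature encoder + Transformer context network
+# + Gumbel-softmax quantizer that produces the contrastive targets), plus the
+# encoder blocks reused by the fine-tuning models. A minimal usage sketch, kept
+# as a comment so the module stays import-only; the batch of 1-second, 16 kHz
+# waveforms below is an illustrative assumption, not part of this file:
+#
+#     cfg = Wav2Vec2Config(quantize_targets=True)
+#     model = Wav2Vec2Model.build_model(cfg, task=None)
+#     wav = torch.randn(2, 16000)                       # (batch, samples)
+#     out = model(wav)                                  # out["x"]: contrastive logits
+#     feats = model.extract_features(wav, padding_mask=None)["x"]   # (B, T', 768)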
+ +import math +from dataclasses import dataclass, field +from typing import List, Tuple + +import numpy as np +import torch +import torch.nn as nn +import torch.nn.functional as F +from fairseq import utils +from fairseq.data.data_utils import compute_mask_indices +from fairseq.dataclass import ChoiceEnum, FairseqDataclass +from fairseq.models import BaseFairseqModel, register_model +from fairseq.modules import ( + Fp32GroupNorm, + Fp32LayerNorm, + GradMultiply, + GumbelVectorQuantizer, + LayerNorm, + MultiheadAttention, + SamePad, + TransposeLast, +) +from fairseq.modules.transformer_sentence_encoder import init_bert_params +from fairseq.utils import buffered_arange, index_put, is_xla_tensor + + +EXTRACTOR_MODE_CHOICES = ChoiceEnum(["default", "layer_norm"]) +MASKING_DISTRIBUTION_CHOICES = ChoiceEnum(["static", "uniform", "normal", "poisson"]) + + +@dataclass +class Wav2Vec2Config(FairseqDataclass): + extractor_mode: EXTRACTOR_MODE_CHOICES = field( + default="default", + metadata={ + "help": "mode for feature extractor. default has a single group norm with d " + "groups in the first conv block, whereas layer_norm has layer norms in " + "every block (meant to use with normalize=True)" + }, + ) + encoder_layers: int = field( + default=12, metadata={"help": "num encoder layers in the transformer"} + ) + encoder_embed_dim: int = field( + default=768, metadata={"help": "encoder embedding dimension"} + ) + encoder_ffn_embed_dim: int = field( + default=3072, metadata={"help": "encoder embedding dimension for FFN"} + ) + encoder_attention_heads: int = field( + default=12, metadata={"help": "num encoder attention heads"} + ) + activation_fn: ChoiceEnum(utils.get_available_activation_fns()) = field( + default="gelu", metadata={"help": "activation function to use"} + ) + + # dropouts + dropout: float = field( + default=0.1, metadata={"help": "dropout probability for the transformer"} + ) + attention_dropout: float = field( + default=0.1, metadata={"help": "dropout probability for attention weights"} + ) + activation_dropout: float = field( + default=0.0, metadata={"help": "dropout probability after activation in FFN"} + ) + encoder_layerdrop: float = field( + default=0.0, metadata={"help": "probability of dropping a tarnsformer layer"} + ) + dropout_input: float = field( + default=0.0, + metadata={"help": "dropout to apply to the input (after feat extr)"}, + ) + dropout_features: float = field( + default=0.0, + metadata={"help": "dropout to apply to the features (after feat extr)"}, + ) + + final_dim: int = field( + default=0, + metadata={ + "help": "project final representations and targets to this many dimensions." 
+ "set to encoder_embed_dim is <= 0" + }, + ) + layer_norm_first: bool = field( + default=False, metadata={"help": "apply layernorm first in the transformer"} + ) + conv_feature_layers: str = field( + default="[(512, 10, 5)] + [(512, 3, 2)] * 4 + [(512,2,2)] + [(512,2,2)]", + metadata={ + "help": "string describing convolutional feature extraction layers in form of a python list that contains " + "[(dim, kernel_size, stride), ...]" + }, + ) + conv_bias: bool = field( + default=False, metadata={"help": "include bias in conv encoder"} + ) + logit_temp: float = field( + default=0.1, metadata={"help": "temperature to divide logits by"} + ) + quantize_targets: bool = field( + default=False, metadata={"help": "use quantized targets"} + ) + quantize_input: bool = field( + default=False, metadata={"help": "use quantized inputs"} + ) + same_quantizer: bool = field( + default=False, metadata={"help": "use same quantizer for inputs and targets"} + ) + target_glu: bool = field( + default=False, metadata={"help": "adds projection + glu to targets"} + ) + feature_grad_mult: float = field( + default=1.0, metadata={"help": "multiply feature extractor var grads by this"} + ) + quantizer_depth: int = field( + default=1, + metadata={"help": "number of quantizer layers"}, + ) + quantizer_factor: int = field( + default=3, + metadata={ + "help": "dimensionality increase for inner quantizer layers (if depth > 1)" + }, + ) + latent_vars: int = field( + default=320, + metadata={"help": "number of latent variables V in each group of the codebook"}, + ) + latent_groups: int = field( + default=2, + metadata={"help": "number of groups G of latent variables in the codebook"}, + ) + latent_dim: int = field( + default=0, + metadata={ + "help": "if > 0, uses this dimensionality for latent variables. 
" + "otherwise uses final_dim / latent_groups" + }, + ) + + # masking + mask_length: int = field(default=10, metadata={"help": "mask length"}) + mask_prob: float = field( + default=0.65, metadata={"help": "probability of replacing a token with mask"} + ) + mask_selection: MASKING_DISTRIBUTION_CHOICES = field( + default="static", metadata={"help": "how to choose mask length"} + ) + mask_other: float = field( + default=0, + metadata={ + "help": "secondary mask argument (used for more complex distributions), " + "see help in compute_mask_indices" + }, + ) + no_mask_overlap: bool = field( + default=False, metadata={"help": "whether to allow masks to overlap"} + ) + mask_min_space: int = field( + default=1, + metadata={"help": "min space between spans (if no overlap is enabled)"}, + ) + + # channel masking + mask_channel_length: int = field( + default=10, metadata={"help": "length of the mask for features (channels)"} + ) + mask_channel_prob: float = field( + default=0.0, metadata={"help": "probability of replacing a feature with 0"} + ) + mask_channel_before: bool = False + mask_channel_selection: MASKING_DISTRIBUTION_CHOICES = field( + default="static", + metadata={"help": "how to choose mask length for channel masking"}, + ) + mask_channel_other: float = field( + default=0, + metadata={ + "help": "secondary mask argument (used for more complex distributions), " + "see help in compute_mask_indicesh" + }, + ) + no_mask_channel_overlap: bool = field( + default=False, metadata={"help": "whether to allow channel masks to overlap"} + ) + mask_channel_min_space: int = field( + default=1, + metadata={"help": "min space between spans (if no overlap is enabled)"}, + ) + + # negative selection + num_negatives: int = field( + default=100, + metadata={"help": "number of negative examples from the same sample"}, + ) + negatives_from_everywhere: bool = field( + default=False, + metadata={"help": "sample negatives from everywhere, not just masked states"}, + ) + cross_sample_negatives: int = field( + default=0, metadata={"help": "number of negative examples from the any sample"} + ) + codebook_negatives: int = field( + default=0, metadata={"help": "number of negative examples codebook"} + ) + + # positional embeddings + conv_pos: int = field( + default=128, + metadata={"help": "number of filters for convolutional positional embeddings"}, + ) + conv_pos_groups: int = field( + default=16, + metadata={"help": "number of groups for convolutional positional embedding"}, + ) + + latent_temp: Tuple[float, float, float] = field( + default=(2, 0.5, 0.999995), + metadata={ + "help": "temperature for latent variable sampling. 
" + "can be tuple of 3 values (start, end, decay)" + }, + ) + + +@register_model("wav2vec2", dataclass=Wav2Vec2Config) +class Wav2Vec2Model(BaseFairseqModel): + def __init__(self, cfg: Wav2Vec2Config): + super().__init__() + self.cfg = cfg + + feature_enc_layers = eval(cfg.conv_feature_layers) + self.embed = feature_enc_layers[-1][0] + + self.feature_extractor = ConvFeatureExtractionModel( + conv_layers=feature_enc_layers, + dropout=0.0, + mode=cfg.extractor_mode, + conv_bias=cfg.conv_bias, + ) + + self.post_extract_proj = ( + nn.Linear(self.embed, cfg.encoder_embed_dim) + if self.embed != cfg.encoder_embed_dim and not cfg.quantize_input + else None + ) + + self.mask_prob = cfg.mask_prob + self.mask_selection = cfg.mask_selection + self.mask_other = cfg.mask_other + self.mask_length = cfg.mask_length + self.no_mask_overlap = cfg.no_mask_overlap + self.mask_min_space = cfg.mask_min_space + + self.mask_channel_prob = cfg.mask_channel_prob + self.mask_channel_before = cfg.mask_channel_before + self.mask_channel_selection = cfg.mask_channel_selection + self.mask_channel_other = cfg.mask_channel_other + self.mask_channel_length = cfg.mask_channel_length + self.no_mask_channel_overlap = cfg.no_mask_channel_overlap + self.mask_channel_min_space = cfg.mask_channel_min_space + + self.dropout_input = nn.Dropout(cfg.dropout_input) + self.dropout_features = nn.Dropout(cfg.dropout_features) + + self.feature_grad_mult = cfg.feature_grad_mult + + self.quantizer = None + self.input_quantizer = None + + self.n_negatives = cfg.num_negatives + self.cross_sample_negatives = cfg.cross_sample_negatives + self.codebook_negatives = cfg.codebook_negatives + self.negatives_from_everywhere = cfg.negatives_from_everywhere + + self.logit_temp = cfg.logit_temp + + final_dim = cfg.final_dim if cfg.final_dim > 0 else cfg.encoder_embed_dim + + if cfg.quantize_targets: + vq_dim = cfg.latent_dim if cfg.latent_dim > 0 else final_dim + self.quantizer = GumbelVectorQuantizer( + dim=self.embed, + num_vars=cfg.latent_vars, + temp=cfg.latent_temp, + groups=cfg.latent_groups, + combine_groups=False, + vq_dim=vq_dim, + time_first=True, + weight_proj_depth=cfg.quantizer_depth, + weight_proj_factor=cfg.quantizer_factor, + ) + self.project_q = nn.Linear(vq_dim, final_dim) + else: + self.project_q = nn.Linear(self.embed, final_dim) + + if cfg.quantize_input: + if cfg.same_quantizer and self.quantizer is not None: + vq_dim = final_dim + self.input_quantizer = self.quantizer + else: + vq_dim = cfg.latent_dim if cfg.latent_dim > 0 else cfg.encoder_embed_dim + self.input_quantizer = GumbelVectorQuantizer( + dim=self.embed, + num_vars=cfg.latent_vars, + temp=cfg.latent_temp, + groups=cfg.latent_groups, + combine_groups=False, + vq_dim=vq_dim, + time_first=True, + weight_proj_depth=cfg.quantizer_depth, + weight_proj_factor=cfg.quantizer_factor, + ) + self.project_inp = nn.Linear(vq_dim, cfg.encoder_embed_dim) + + self.mask_emb = nn.Parameter( + torch.FloatTensor(cfg.encoder_embed_dim).uniform_() + ) + + self.encoder = TransformerEncoder(cfg) + self.layer_norm = LayerNorm(self.embed) + + self.target_glu = None + if cfg.target_glu: + self.target_glu = nn.Sequential( + nn.Linear(final_dim, final_dim * 2), nn.GLU() + ) + + self.final_proj = nn.Linear(cfg.encoder_embed_dim, final_dim) + + def upgrade_state_dict_named(self, state_dict, name): + super().upgrade_state_dict_named(state_dict, name) + """Upgrade a (possibly old) state dict for new versions of fairseq.""" + return state_dict + + @classmethod + def build_model(cls, cfg: Wav2Vec2Config, 
task=None): + """Build a new model instance.""" + + return cls(cfg) + + def apply_mask( + self, + x, + padding_mask, + mask_indices=None, + mask_channel_indices=None, + ): + B, T, C = x.shape + + if self.mask_channel_prob > 0 and self.mask_channel_before: + mask_channel_indices = compute_mask_indices( + (B, C), + None, + self.mask_channel_prob, + self.mask_channel_length, + self.mask_channel_selection, + self.mask_channel_other, + no_overlap=self.no_mask_channel_overlap, + min_space=self.mask_channel_min_space, + ) + mask_channel_indices = ( + torch.from_numpy(mask_channel_indices) + .to(x.device) + .unsqueeze(1) + .expand(-1, T, -1) + ) + x[mask_channel_indices] = 0 + + if self.mask_prob > 0: + if mask_indices is None: + mask_indices = compute_mask_indices( + (B, T), + padding_mask, + self.mask_prob, + self.mask_length, + self.mask_selection, + self.mask_other, + min_masks=2, + no_overlap=self.no_mask_overlap, + min_space=self.mask_min_space, + ) + mask_indices = torch.from_numpy(mask_indices).to(x.device) + x = index_put(x, mask_indices, self.mask_emb) + else: + mask_indices = None + + if self.mask_channel_prob > 0 and not self.mask_channel_before: + if mask_channel_indices is None: + mask_channel_indices = compute_mask_indices( + (B, C), + None, + self.mask_channel_prob, + self.mask_channel_length, + self.mask_channel_selection, + self.mask_channel_other, + no_overlap=self.no_mask_channel_overlap, + min_space=self.mask_channel_min_space, + ) + mask_channel_indices = ( + torch.from_numpy(mask_channel_indices) + .to(x.device) + .unsqueeze(1) + .expand(-1, T, -1) + ) + x = index_put(x, mask_channel_indices, 0) + + return x, mask_indices + + def sample_negatives(self, y, num, padding_count=None): + + if self.n_negatives == 0 and self.cross_sample_negatives == 0: + return y.new(0) + + bsz, tsz, fsz = y.shape + y = y.view(-1, fsz) # BTC => (BxT)C + + # FIXME: what happens if padding_count is specified? 
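+        # NOTE: negatives are drawn uniformly from the same utterance and, if
+        # cross_sample_negatives > 0, additionally from the whole batch. For the
+        # same-utterance case, sampling from `high - 1` values and bumping every
+        # index >= the current timestep guarantees the positive frame itself is
+        # never selected; the `i * high` offsets below then map per-utterance
+        # indices into the flattened (B*T, C) view of `y`.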
+ cross_high = tsz * bsz + high = tsz - (padding_count or 0) + with torch.no_grad(): + assert high > 1, f"{bsz,tsz,fsz}" + + if self.n_negatives > 0: + tszs = ( + buffered_arange(num) + .unsqueeze(-1) + .expand(-1, self.n_negatives) + .flatten() + ) + + neg_idxs = torch.randint( + low=0, high=high - 1, size=(bsz, self.n_negatives * num) + ) + neg_idxs[neg_idxs >= tszs] += 1 + + if self.cross_sample_negatives > 0: + tszs = ( + buffered_arange(num) + .unsqueeze(-1) + .expand(-1, self.cross_sample_negatives) + .flatten() + ) + + cross_neg_idxs = torch.randint( + low=0, + high=cross_high - 1, + size=(bsz, self.cross_sample_negatives * num), + ) + cross_neg_idxs[cross_neg_idxs >= tszs] += 1 + + if self.n_negatives > 0: + for i in range(1, bsz): + neg_idxs[i] += i * high + else: + neg_idxs = cross_neg_idxs + + if self.cross_sample_negatives > 0 and self.n_negatives > 0: + neg_idxs = torch.cat([neg_idxs, cross_neg_idxs], dim=1) + + negs = y[neg_idxs.view(-1)] + negs = negs.view( + bsz, num, self.n_negatives + self.cross_sample_negatives, fsz + ).permute( + 2, 0, 1, 3 + ) # to NxBxTxC + return negs, neg_idxs + + def compute_preds(self, x, y, negatives): + + neg_is_pos = (y == negatives).all(-1) + y = y.unsqueeze(0) + targets = torch.cat([y, negatives], dim=0) + + logits = torch.cosine_similarity(x.float(), targets.float(), dim=-1).type_as(x) + + logits = logits / self.logit_temp + + if is_xla_tensor(logits) or neg_is_pos.any(): + fillval = -float(2 ** 30) + if not hasattr(self, "_inftensor"): + self._inftensor = ( + torch.tensor(fillval).to(x.device) + if is_xla_tensor(logits) + else float("-inf") + ) + logits[1:] = index_put(logits[1:], neg_is_pos, self._inftensor) + + return logits + + def _get_feat_extract_output_lengths(self, input_lengths: torch.LongTensor): + """ + Computes the output length of the convolutional layers + """ + + def _conv_out_length(input_length, kernel_size, stride): + return torch.floor((input_length - kernel_size) / stride + 1) + + conv_cfg_list = eval(self.cfg.conv_feature_layers) + + for i in range(len(conv_cfg_list)): + input_lengths = _conv_out_length( + input_lengths, conv_cfg_list[i][1], conv_cfg_list[i][2] + ) + + return input_lengths.to(torch.long) + + def forward( + self, + source, + padding_mask=None, + mask=True, + features_only=False, + layer=None, + mask_indices=None, + mask_channel_indices=None, + padding_count=None, + ): + + if self.feature_grad_mult > 0: + features = self.feature_extractor(source) + if self.feature_grad_mult != 1.0: + features = GradMultiply.apply(features, self.feature_grad_mult) + else: + with torch.no_grad(): + features = self.feature_extractor(source) + + features_pen = features.float().pow(2).mean() + + features = features.transpose(1, 2) + features = self.layer_norm(features) + unmasked_features = features.clone() + + if padding_mask is not None and padding_mask.any(): + input_lengths = (1 - padding_mask.long()).sum(-1) + # apply conv formula to get real output_lengths + output_lengths = self._get_feat_extract_output_lengths(input_lengths) + + padding_mask = torch.zeros( + features.shape[:2], dtype=features.dtype, device=features.device + ) + + # these two operations makes sure that all values + # before the output lengths indices are attended to + padding_mask[ + ( + torch.arange(padding_mask.shape[0], device=padding_mask.device), + output_lengths - 1, + ) + ] = 1 + padding_mask = (1 - padding_mask.flip([-1]).cumsum(-1).flip([-1])).bool() + else: + padding_mask = None + + if self.post_extract_proj is not None: + features = 
self.post_extract_proj(features) + + features = self.dropout_input(features) + unmasked_features = self.dropout_features(unmasked_features) + + num_vars = None + code_ppl = None + prob_ppl = None + curr_temp = None + + if self.input_quantizer: + q = self.input_quantizer(features, produce_targets=False) + features = q["x"] + num_vars = q["num_vars"] + code_ppl = q["code_perplexity"] + prob_ppl = q["prob_perplexity"] + curr_temp = q["temp"] + features = self.project_inp(features) + + if mask: + x, mask_indices = self.apply_mask( + features, + padding_mask, + mask_indices=mask_indices, + mask_channel_indices=mask_channel_indices, + ) + if not is_xla_tensor(x) and mask_indices is not None: + # tpu-comment: reducing the size in a dynamic way causes + # too many recompilations on xla. + y = unmasked_features[mask_indices].view( + unmasked_features.size(0), -1, unmasked_features.size(-1) + ) + else: + y = unmasked_features + else: + x = features + y = unmasked_features + mask_indices = None + + x, layer_results = self.encoder(x, padding_mask=padding_mask, layer=layer) + + if features_only: + return { + "x": x, + "padding_mask": padding_mask, + "features": unmasked_features, + "layer_results": layer_results, + } + + if self.quantizer: + q = self.quantizer(y, produce_targets=False) + y = q["x"] + num_vars = q["num_vars"] + code_ppl = q["code_perplexity"] + prob_ppl = q["prob_perplexity"] + curr_temp = q["temp"] + + y = self.project_q(y) + + if self.negatives_from_everywhere: + neg_cands = self.quantizer(unmasked_features, produce_targets=False)[ + "x" + ] + negs, _ = self.sample_negatives( + neg_cands, + y.size(1), + padding_count=padding_count, + ) + negs = self.project_q(negs) + + else: + negs, _ = self.sample_negatives( + y, + y.size(1), + padding_count=padding_count, + ) + + if self.codebook_negatives > 0: + cb_negs = self.quantizer.sample_from_codebook( + y.size(0) * y.size(1), self.codebook_negatives + ) + cb_negs = cb_negs.view( + self.codebook_negatives, y.size(0), y.size(1), -1 + ) # order doesnt matter + cb_negs = self.project_q(cb_negs) + negs = torch.cat([negs, cb_negs], dim=0) + else: + y = self.project_q(y) + + if self.negatives_from_everywhere: + negs, _ = self.sample_negatives( + unmasked_features, + y.size(1), + padding_count=padding_count, + ) + negs = self.project_q(negs) + else: + negs, _ = self.sample_negatives( + y, + y.size(1), + padding_count=padding_count, + ) + + if not is_xla_tensor(x): + # tpu-comment: reducing the size in a dynamic way causes + # too many recompilations on xla. 
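+                # NOTE: only the masked timesteps of the encoder output are kept
+                # here, so the final projection and the logits from
+                # compute_preds() are computed for masked positions only, each
+                # compared against its quantized target and the sampled negatives.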
+ x = x[mask_indices].view(x.size(0), -1, x.size(-1)) + + if self.target_glu: + y = self.target_glu(y) + negs = self.target_glu(negs) + + x = self.final_proj(x) + x = self.compute_preds(x, y, negs) + + result = { + "x": x, + "padding_mask": padding_mask, + "features_pen": features_pen, + } + + if prob_ppl is not None: + result["prob_perplexity"] = prob_ppl + result["code_perplexity"] = code_ppl + result["num_vars"] = num_vars + result["temp"] = curr_temp + + return result + + def quantize(self, x): + assert self.quantizer is not None + x = self.feature_extractor(x) + x = x.transpose(1, 2) + x = self.layer_norm(x) + return self.quantizer.forward_idx(x) + + def extract_features(self, source, padding_mask, mask=False, layer=None): + res = self.forward( + source, padding_mask, mask=mask, features_only=True, layer=layer + ) + return res + + def get_logits(self, net_output): + logits = net_output["x"] + logits = logits.transpose(0, 2) + logits = logits.reshape(-1, logits.size(-1)) + return logits + + def get_targets(self, sample, net_output, expand_steps=True): + x = net_output["x"] + return x.new_zeros(x.size(1) * x.size(2), dtype=torch.long) + + def get_extra_losses(self, net_output): + pen = [] + + if "prob_perplexity" in net_output: + pen.append( + (net_output["num_vars"] - net_output["prob_perplexity"]) + / net_output["num_vars"] + ) + + if "features_pen" in net_output: + pen.append(net_output["features_pen"]) + + return pen + + def remove_pretraining_modules(self): + self.quantizer = None + self.project_q = None + self.target_glu = None + self.final_proj = None + + +class ConvFeatureExtractionModel(nn.Module): + def __init__( + self, + conv_layers: List[Tuple[int, int, int]], + dropout: float = 0.0, + mode: str = "default", + conv_bias: bool = False, + ): + super().__init__() + + assert mode in {"default", "layer_norm"} + + def block( + n_in, + n_out, + k, + stride, + is_layer_norm=False, + is_group_norm=False, + conv_bias=False, + ): + def make_conv(): + conv = nn.Conv1d(n_in, n_out, k, stride=stride, bias=conv_bias) + nn.init.kaiming_normal_(conv.weight) + return conv + + assert ( + is_layer_norm and is_group_norm + ) == False, "layer norm and group norm are exclusive" + + if is_layer_norm: + return nn.Sequential( + make_conv(), + nn.Dropout(p=dropout), + nn.Sequential( + TransposeLast(), + Fp32LayerNorm(dim, elementwise_affine=True), + TransposeLast(), + ), + nn.GELU(), + ) + elif is_group_norm: + return nn.Sequential( + make_conv(), + nn.Dropout(p=dropout), + Fp32GroupNorm(dim, dim, affine=True), + nn.GELU(), + ) + else: + return nn.Sequential(make_conv(), nn.Dropout(p=dropout), nn.GELU()) + + in_d = 1 + self.conv_layers = nn.ModuleList() + for i, cl in enumerate(conv_layers): + assert len(cl) == 3, "invalid conv definition: " + str(cl) + (dim, k, stride) = cl + + self.conv_layers.append( + block( + in_d, + dim, + k, + stride, + is_layer_norm=mode == "layer_norm", + is_group_norm=mode == "default" and i == 0, + conv_bias=conv_bias, + ) + ) + in_d = dim + + def forward(self, x): + + # BxT -> BxCxT + x = x.unsqueeze(1) + + for conv in self.conv_layers: + x = conv(x) + + return x + + +class TransformerEncoder(nn.Module): + def __init__(self, args): + super().__init__() + + self.dropout = args.dropout + self.embedding_dim = args.encoder_embed_dim + + self.pos_conv = nn.Conv1d( + self.embedding_dim, + self.embedding_dim, + kernel_size=args.conv_pos, + padding=args.conv_pos // 2, + groups=args.conv_pos_groups, + ) + dropout = 0 + std = math.sqrt((4 * (1.0 - dropout)) / (args.conv_pos * 
self.embedding_dim)) + nn.init.normal_(self.pos_conv.weight, mean=0, std=std) + nn.init.constant_(self.pos_conv.bias, 0) + + self.pos_conv = nn.utils.weight_norm(self.pos_conv, name="weight", dim=2) + self.pos_conv = nn.Sequential(self.pos_conv, SamePad(args.conv_pos), nn.GELU()) + + self.layers = nn.ModuleList( + [ + TransformerSentenceEncoderLayer( + embedding_dim=self.embedding_dim, + ffn_embedding_dim=args.encoder_ffn_embed_dim, + num_attention_heads=args.encoder_attention_heads, + dropout=self.dropout, + attention_dropout=args.attention_dropout, + activation_dropout=args.activation_dropout, + activation_fn=args.activation_fn, + layer_norm_first=args.layer_norm_first, + ) + for _ in range(args.encoder_layers) + ] + ) + + self.layer_norm_first = args.layer_norm_first + self.layer_norm = LayerNorm(self.embedding_dim) + self.layerdrop = args.encoder_layerdrop + + self.apply(init_bert_params) + + def forward(self, x, padding_mask=None, layer=None): + x, layer_results = self.extract_features(x, padding_mask, layer) + + if self.layer_norm_first and layer is None: + x = self.layer_norm(x) + + return x, layer_results + + def extract_features(self, x, padding_mask=None, tgt_layer=None): + + if padding_mask is not None: + x = index_put(x, padding_mask, 0) + + x_conv = self.pos_conv(x.transpose(1, 2)) + x_conv = x_conv.transpose(1, 2) + x = x + x_conv + + if not self.layer_norm_first: + x = self.layer_norm(x) + + x = F.dropout(x, p=self.dropout, training=self.training) + + # B x T x C -> T x B x C + x = x.transpose(0, 1) + + layer_results = [] + r = None + for i, layer in enumerate(self.layers): + dropout_probability = np.random.random() + if not self.training or (dropout_probability > self.layerdrop): + x, z = layer(x, self_attn_padding_mask=padding_mask, need_weights=False) + if tgt_layer is not None: + layer_results.append((x, z)) + if i == tgt_layer: + r = x + break + + if r is not None: + x = r + + # T x B x C -> B x T x C + x = x.transpose(0, 1) + + return x, layer_results + + def max_positions(self): + """Maximum output length supported by the encoder.""" + return self.args.max_positions + + def upgrade_state_dict_named(self, state_dict, name): + """Upgrade a (possibly old) state dict for new versions of fairseq.""" + return state_dict + + +class TransformerSentenceEncoderLayer(nn.Module): + """ + Implements a Transformer Encoder Layer used in BERT/XLM style pre-trained + models. 
+ """ + + def __init__( + self, + embedding_dim: float = 768, + ffn_embedding_dim: float = 3072, + num_attention_heads: float = 8, + dropout: float = 0.1, + attention_dropout: float = 0.1, + activation_dropout: float = 0.1, + activation_fn: str = "relu", + layer_norm_first: bool = False, + ) -> None: + + super().__init__() + # Initialize parameters + self.embedding_dim = embedding_dim + self.dropout = dropout + self.activation_dropout = activation_dropout + + # Initialize blocks + self.activation_fn = utils.get_activation_fn(activation_fn) + self.self_attn = MultiheadAttention( + self.embedding_dim, + num_attention_heads, + dropout=attention_dropout, + self_attention=True, + ) + + self.dropout1 = nn.Dropout(dropout) + self.dropout2 = nn.Dropout(self.activation_dropout) + self.dropout3 = nn.Dropout(dropout) + + self.layer_norm_first = layer_norm_first + + # layer norm associated with the self attention layer + self.self_attn_layer_norm = LayerNorm(self.embedding_dim) + self.fc1 = nn.Linear(self.embedding_dim, ffn_embedding_dim) + self.fc2 = nn.Linear(ffn_embedding_dim, self.embedding_dim) + + # layer norm associated with the position wise feed-forward NN + self.final_layer_norm = LayerNorm(self.embedding_dim) + + def forward( + self, + x: torch.Tensor, + self_attn_mask: torch.Tensor = None, + self_attn_padding_mask: torch.Tensor = None, + need_weights: bool = False, + att_args=None, + ): + """ + LayerNorm is applied either before or after the self-attention/ffn + modules similar to the original Transformer imlementation. + """ + residual = x + + if self.layer_norm_first: + x = self.self_attn_layer_norm(x) + x, attn = self.self_attn( + query=x, + key=x, + value=x, + key_padding_mask=self_attn_padding_mask, + attn_mask=self_attn_mask, + ) + x = self.dropout1(x) + x = residual + x + + residual = x + x = self.final_layer_norm(x) + x = self.activation_fn(self.fc1(x)) + x = self.dropout2(x) + x = self.fc2(x) + x = self.dropout3(x) + x = residual + x + else: + x, attn = self.self_attn( + query=x, + key=x, + value=x, + key_padding_mask=self_attn_padding_mask, + ) + + x = self.dropout1(x) + x = residual + x + + x = self.self_attn_layer_norm(x) + + residual = x + x = self.activation_fn(self.fc1(x)) + x = self.dropout2(x) + x = self.fc2(x) + x = self.dropout3(x) + x = residual + x + x = self.final_layer_norm(x) + + return x, attn diff --git a/SpeechT5/fairseq/fairseq/models/wav2vec/wav2vec2_asr.py b/SpeechT5/fairseq/fairseq/models/wav2vec/wav2vec2_asr.py new file mode 100644 index 0000000000000000000000000000000000000000..405d1e613a9bbf8294302c4526267f1330ffc5cd --- /dev/null +++ b/SpeechT5/fairseq/fairseq/models/wav2vec/wav2vec2_asr.py @@ -0,0 +1,655 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. 
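+
+# NOTE: this file adds the fine-tuning models built on top of a pre-trained
+# wav2vec 2.0 checkpoint: `Wav2VecCtc` (Wav2VecEncoder plus a linear projection
+# to the target dictionary, trained with CTC) and `Wav2Vec2Seq2SeqModel` (the
+# same encoder paired with a Transformer decoder). The checkpoint is loaded from
+# `cfg.w2v_path`, and `freeze_finetune_updates` keeps it frozen for the first N
+# updates. A hedged sketch of reading CTC posteriors; `sample` is an assumed
+# fairseq batch, not defined in this file:
+#
+#     net_output = model(**sample["net_input"])
+#     log_probs = model.get_normalized_probs(net_output, log_probs=True)   # (T, B, V)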
+ +from argparse import Namespace +import contextlib +import copy +import math +import numpy as np +import torch +import torch.nn as nn +import torch.nn.functional as F +from dataclasses import dataclass, field +from omegaconf import MISSING, II, open_dict +from typing import Any, Optional + +from fairseq import checkpoint_utils, tasks, utils +from fairseq.dataclass import FairseqDataclass +from fairseq.dataclass.utils import convert_namespace_to_omegaconf +from fairseq.tasks import FairseqTask +from fairseq.models import ( + BaseFairseqModel, + FairseqEncoder, + FairseqEncoderDecoderModel, + FairseqIncrementalDecoder, + register_model, +) +from fairseq.models.wav2vec.wav2vec2 import MASKING_DISTRIBUTION_CHOICES +from fairseq.modules import ( + LayerNorm, + PositionalEmbedding, + TransformerDecoderLayer, +) + + +@dataclass +class Wav2Vec2AsrConfig(FairseqDataclass): + w2v_path: str = field( + default=MISSING, metadata={"help": "path to wav2vec 2.0 model"} + ) + no_pretrained_weights: bool = field( + default=False, metadata={"help": "if true, does not load pretrained weights"} + ) + dropout_input: float = field( + default=0.0, + metadata={"help": "dropout to apply to the input (after feat extr)"}, + ) + final_dropout: float = field( + default=0.0, + metadata={"help": "dropout after transformer and before final projection"}, + ) + dropout: float = field( + default=0.0, metadata={"help": "dropout probability inside wav2vec 2.0 model"} + ) + attention_dropout: float = field( + default=0.0, + metadata={ + "help": "dropout probability for attention weights inside wav2vec 2.0 model" + }, + ) + activation_dropout: float = field( + default=0.0, + metadata={ + "help": "dropout probability after activation in FFN inside wav2vec 2.0 model" + }, + ) + + # masking + apply_mask: bool = field( + default=False, metadata={"help": "apply masking during fine-tuning"} + ) + mask_length: int = field( + default=10, metadata={"help": "repeat the mask indices multiple times"} + ) + mask_prob: float = field( + default=0.5, + metadata={ + "help": "probability of replacing a token with mask (normalized by length)" + }, + ) + mask_selection: MASKING_DISTRIBUTION_CHOICES = field( + default="static", metadata={"help": "how to choose masks"} + ) + mask_other: float = field( + default=0, + metadata={ + "help": "secondary mask argument (used for more complex distributions), " + "see help in compute_mask_indices" + }, + ) + no_mask_overlap: bool = field( + default=False, metadata={"help": "whether to allow masks to overlap"} + ) + + # channel masking + mask_channel_length: int = field( + default=10, metadata={"help": "length of the mask for features (channels)"} + ) + mask_channel_prob: float = field( + default=0.0, metadata={"help": "probability of replacing a feature with 0"} + ) + mask_channel_selection: MASKING_DISTRIBUTION_CHOICES = field( + default="static", + metadata={"help": "how to choose mask length for channel masking"}, + ) + mask_channel_other: float = field( + default=0, + metadata={ + "help": "secondary mask argument (used for more complex distributions), " + "see help in compute_mask_indicesh" + }, + ) + no_mask_channel_overlap: bool = field( + default=False, metadata={"help": "whether to allow channel masks to overlap"} + ) + freeze_finetune_updates: int = field( + default=0, metadata={"help": "dont finetune wav2vec for this many updates"} + ) + feature_grad_mult: float = field( + default=0.0, metadata={"help": "reset feature grad mult in wav2vec 2.0 to this"} + ) + layerdrop: float = field( + default=0.0, 
metadata={"help": "probability of dropping a layer in wav2vec 2.0"} + ) + mask_channel_before: bool = False + normalize: bool = II("task.normalize") + data: str = II("task.data") + # this holds the loaded wav2vec args + w2v_args: Any = None + + +@dataclass +class Wav2Vec2CtcConfig(Wav2Vec2AsrConfig): + blank_weight: float = 0 + blank_mode: str = "add" + mask_min_space: Optional[int] = field( + default=1, + metadata={"help": "min space between spans (if no overlap is enabled)"}, + ) + mask_channel_min_space: Optional[int] = field( + default=1, + metadata={"help": "min space between spans (if no overlap is enabled)"}, + ) + conv_feature_layers: Optional[str] = field( + default="[(512, 10, 5)] + [(512, 3, 2)] * 4 + [(512,2,2)] + [(512,2,2)]", + metadata={ + "help": ( + "string describing convolutional feature extraction " + "layers in form of a python list that contains " + "[(dim, kernel_size, stride), ...]" + ), + }, + ) + encoder_embed_dim: Optional[int] = field( + default=768, metadata={"help": "encoder embedding dimension"} + ) + + +@register_model("wav2vec_ctc", dataclass=Wav2Vec2CtcConfig) +class Wav2VecCtc(BaseFairseqModel): + def __init__(self, cfg: Wav2Vec2CtcConfig, w2v_encoder: BaseFairseqModel): + super().__init__() + self.cfg = cfg + self.w2v_encoder = w2v_encoder + self.blank_weight = cfg.blank_weight + self.blank_mode = cfg.blank_mode + + def upgrade_state_dict_named(self, state_dict, name): + super().upgrade_state_dict_named(state_dict, name) + return state_dict + + @classmethod + def build_model(cls, cfg: Wav2Vec2CtcConfig, task: FairseqTask): + """Build a new model instance.""" + w2v_encoder = Wav2VecEncoder(cfg, len(task.target_dictionary)) + return cls(cfg, w2v_encoder) + + def get_logits(self, net_output, normalize=False): + logits = net_output["encoder_out"] + if self.blank_weight != 0: + if self.blank_mode == "add": + logits[..., 0] += self.blank_weight + elif self.blank_mode == "set": + logits[..., 0] = self.blank_weight + else: + raise Exception(f"invalid blank mode {self.blank_mode}") + + if net_output["padding_mask"] is not None and net_output["padding_mask"].any(): + logits[net_output["padding_mask"].T][..., 0] = float("inf") + logits[net_output["padding_mask"].T][..., 1:] = float("-inf") + + if normalize: + logits = utils.log_softmax(logits.float(), dim=-1) + + return logits + + def get_normalized_probs(self, net_output, log_probs): + """Get normalized probabilities (or log probs) from a net's output.""" + + logits = self.get_logits(net_output) + + if log_probs: + return utils.log_softmax(logits.float(), dim=-1) + else: + return utils.softmax(logits.float(), dim=-1) + + def forward(self, **kwargs): + x = self.w2v_encoder(**kwargs) + return x + + +@dataclass +class Wav2Vec2Seq2SeqConfig(Wav2Vec2AsrConfig): + decoder_embed_dim: int = field( + default=768, metadata={"help": "decoder embedding dimension"} + ) + decoder_ffn_embed_dim: int = field( + default=3072, metadata={"help": "decoder embedding dimension for FFN"} + ) + decoder_layers: int = field(default=6, metadata={"help": "num of decoder layers"}) + decoder_layerdrop: float = field( + default=0.0, metadata={"help": "decoder layerdrop chance"} + ) + decoder_attention_heads: int = field( + default=4, metadata={"help": "num decoder attention heads"} + ) + decoder_learned_pos: bool = field( + default=False, + metadata={"help": "use learned positional embeddings in the decoder"}, + ) + decoder_normalize_before: bool = field( + default=False, metadata={"help": "apply layernorm before each decoder block"} + ) + 
no_token_positional_embeddings: bool = field( + default=False, + metadata={ + "help": "if set, disables positional embeddings (outside self attention)" + }, + ) + decoder_dropout: float = field( + default=0.0, metadata={"help": "dropout probability in the decoder"} + ) + decoder_attention_dropout: float = field( + default=0.0, + metadata={ + "help": "dropout probability for attention weights inside the decoder" + }, + ) + decoder_activation_dropout: float = field( + default=0.0, + metadata={ + "help": "dropout probability after activation in FFN inside the decoder" + }, + ) + max_target_positions: int = field( + default=2048, metadata={"help": "max target positions"} + ) + share_decoder_input_output_embed: bool = field( + default=False, metadata={"help": "share decoder input and output embeddings"} + ) + autoregressive: bool = II("task.autoregressive") + + +@register_model("wav2vec_seq2seq", dataclass=Wav2Vec2Seq2SeqConfig) +class Wav2Vec2Seq2SeqModel(FairseqEncoderDecoderModel): + def __init__(self, encoder, decoder): + super().__init__(encoder, decoder) + + @classmethod + def build_model(cls, cfg: Wav2Vec2Seq2SeqConfig, task: FairseqTask): + """Build a new model instance.""" + + assert ( + cfg.autoregressive + ), "Please set task.autoregressive=true for seq2seq asr models" + + src_dict, tgt_dict = task.source_dictionary, task.target_dictionary + + def build_embedding(dictionary, embed_dim): + num_embeddings = len(dictionary) + padding_idx = dictionary.pad() + emb = Embedding(num_embeddings, embed_dim, padding_idx) + return emb + + decoder_embed_tokens = build_embedding(tgt_dict, cfg.decoder_embed_dim) + + encoder = cls.build_encoder(cfg) + decoder = cls.build_decoder(cfg, tgt_dict, decoder_embed_tokens) + + return Wav2Vec2Seq2SeqModel(encoder, decoder) + + @classmethod + def build_encoder(cls, cfg: Wav2Vec2AsrConfig): + return Wav2VecEncoder(cfg) + + @classmethod + def build_decoder(cls, cfg: Wav2Vec2Seq2SeqConfig, tgt_dict, embed_tokens): + return TransformerDecoder(cfg, tgt_dict, embed_tokens) + + def forward(self, **kwargs): + encoder_out = self.encoder(tbc=False, **kwargs) + decoder_out = self.decoder(encoder_out=encoder_out, **kwargs) + return decoder_out + + def upgrade_state_dict_named(self, state_dict, name): + super().upgrade_state_dict_named(state_dict, name) + return state_dict + + +class Wav2VecEncoder(FairseqEncoder): + def __init__(self, cfg: Wav2Vec2AsrConfig, output_size=None): + self.apply_mask = cfg.apply_mask + + arg_overrides = { + "dropout": cfg.dropout, + "activation_dropout": cfg.activation_dropout, + "dropout_input": cfg.dropout_input, + "attention_dropout": cfg.attention_dropout, + "mask_length": cfg.mask_length, + "mask_prob": cfg.mask_prob, + "mask_selection": cfg.mask_selection, + "mask_other": cfg.mask_other, + "no_mask_overlap": cfg.no_mask_overlap, + "mask_channel_length": cfg.mask_channel_length, + "mask_channel_prob": cfg.mask_channel_prob, + "mask_channel_before": cfg.mask_channel_before, + "mask_channel_selection": cfg.mask_channel_selection, + "mask_channel_other": cfg.mask_channel_other, + "no_mask_channel_overlap": cfg.no_mask_channel_overlap, + "encoder_layerdrop": cfg.layerdrop, + "feature_grad_mult": cfg.feature_grad_mult, + } + + if cfg.w2v_args is None: + state = checkpoint_utils.load_checkpoint_to_cpu(cfg.w2v_path, arg_overrides) + w2v_args = state.get("cfg", None) + if w2v_args is None: + w2v_args = convert_namespace_to_omegaconf(state["args"]) + cfg.w2v_args = w2v_args + else: + state = None + w2v_args = cfg.w2v_args + if isinstance(w2v_args, 
Namespace): + cfg.w2v_args = w2v_args = convert_namespace_to_omegaconf(w2v_args) + + assert cfg.normalize == w2v_args.task.normalize, ( + "Fine-tuning works best when data normalization is the same. " + "Please check that --normalize is set or unset for both pre-training and here" + ) + + w2v_args.task.data = cfg.data + task = tasks.setup_task(w2v_args.task) + model = task.build_model(w2v_args.model) + + if state is not None and not cfg.no_pretrained_weights: + model.load_state_dict(state["model"], strict=True) + + model.remove_pretraining_modules() + + super().__init__(task.source_dictionary) + + d = w2v_args.model.encoder_embed_dim + + self.w2v_model = model + + self.final_dropout = nn.Dropout(cfg.final_dropout) + self.freeze_finetune_updates = cfg.freeze_finetune_updates + self.num_updates = 0 + + targ_d = None + self.proj = None + + if output_size is not None: + targ_d = output_size + elif getattr(cfg, "decoder_embed_dim", d) != d: + targ_d = cfg.decoder_embed_dim + + if targ_d is not None: + self.proj = Linear(d, targ_d) + + def set_num_updates(self, num_updates): + """Set the number of parameters updates.""" + super().set_num_updates(num_updates) + self.num_updates = num_updates + + def forward(self, source, padding_mask, tbc=True, **kwargs): + w2v_args = { + "source": source, + "padding_mask": padding_mask, + "mask": self.apply_mask and self.training, + } + + ft = self.freeze_finetune_updates <= self.num_updates + + with torch.no_grad() if not ft else contextlib.ExitStack(): + res = self.w2v_model.extract_features(**w2v_args) + + x = res["x"] + padding_mask = res["padding_mask"] + + if tbc: + # BTC -> TBC + x = x.transpose(0, 1) + + x = self.final_dropout(x) + + if self.proj: + x = self.proj(x) + + return { + "encoder_out": x, # T x B x C + "encoder_padding_mask": padding_mask.transpose(0, 1) + if padding_mask is not None + else None, # T x B + "padding_mask": padding_mask, + "layer_results": res["layer_results"], + } + + def reorder_encoder_out(self, encoder_out, new_order): + if encoder_out["encoder_out"] is not None: + encoder_out["encoder_out"] = encoder_out["encoder_out"].index_select( + 1, new_order + ) + if encoder_out["encoder_padding_mask"] is not None: + encoder_out["encoder_padding_mask"] = encoder_out[ + "encoder_padding_mask" + ].index_select(0, new_order) + return encoder_out + + def max_positions(self): + """Maximum input length supported by the encoder.""" + return None + + def upgrade_state_dict_named(self, state_dict, name): + return state_dict + + +class TransformerDecoder(FairseqIncrementalDecoder): + """ + Transformer decoder consisting of *args.decoder_layers* layers. Each layer + is a :class:`TransformerDecoderLayer`. + + Args: + args (argparse.Namespace): parsed command-line arguments + dictionary (~fairseq.data.Dictionary): decoding dictionary + embed_tokens (torch.nn.Embedding): output embedding + no_encoder_attn (bool, optional): whether to attend to encoder outputs + (default: False). 
+ """ + + def __init__( + self, + cfg: Wav2Vec2Seq2SeqConfig, + dictionary, + embed_tokens, + no_encoder_attn=False, + ): + super().__init__(dictionary) + + self.dropout = cfg.decoder_dropout + self.share_input_output_embed = cfg.share_decoder_input_output_embed + + input_embed_dim = embed_tokens.embedding_dim + embed_dim = cfg.decoder_embed_dim + self.output_embed_dim = cfg.decoder_embed_dim + + self.layerdrop = cfg.decoder_layerdrop + + padding_idx = embed_tokens.padding_idx + self.max_target_positions = cfg.max_target_positions + + self.embed_tokens = embed_tokens + self.embed_scale = math.sqrt(embed_dim) # todo: try with input_embed_dim + + self.project_in_dim = ( + Linear(input_embed_dim, embed_dim, bias=False) + if embed_dim != input_embed_dim + else None + ) + + self.embed_positions = ( + PositionalEmbedding( + cfg.max_target_positions, + embed_dim, + padding_idx, + learned=cfg.decoder_learned_pos, + ) + if not cfg.no_token_positional_embeddings + else None + ) + + # TODO: update this when transformer gets converted to dataclass configs + transformer_cfg = copy.deepcopy(cfg) + with open_dict(transformer_cfg): + transformer_cfg.dropout = transformer_cfg.decoder_dropout + transformer_cfg.attention_dropout = ( + transformer_cfg.decoder_attention_dropout + ) + transformer_cfg.activation_dropout = ( + transformer_cfg.decoder_activation_dropout + ) + + self.layers = nn.ModuleList([]) + self.layers.extend( + [ + TransformerDecoderLayer(transformer_cfg, no_encoder_attn) + for _ in range(transformer_cfg.decoder_layers) + ] + ) + + if not self.share_input_output_embed: + self.embed_out = nn.Parameter( + torch.Tensor(len(dictionary), self.output_embed_dim) + ) + nn.init.normal_(self.embed_out, mean=0, std=self.output_embed_dim ** -0.5) + + if transformer_cfg.decoder_normalize_before: + self.layer_norm = LayerNorm(embed_dim) + else: + self.layer_norm = None + + def forward( + self, prev_output_tokens, encoder_out=None, incremental_state=None, **unused + ): + """ + Args: + prev_output_tokens (LongTensor): previous decoder outputs of shape + `(batch, tgt_len)`, for teacher forcing + encoder_out (Tensor, optional): output from the encoder, used for + encoder-side attention + incremental_state (dict): dictionary used for storing state during + :ref:`Incremental decoding` + + Returns: + tuple: + - the decoder's output of shape `(batch, tgt_len, vocab)` + - a dictionary with any model-specific outputs + """ + prev_output_tokens = prev_output_tokens.long() + x, extra = self.extract_features( + prev_output_tokens, encoder_out, incremental_state + ) + x = self.output_layer(x) + return x, extra + + def extract_features( + self, prev_output_tokens, encoder_out=None, incremental_state=None, **unused + ): + """ + Similar to *forward* but only return features. 
+ + Returns: + tuple: + - the decoder's features of shape `(batch, tgt_len, embed_dim)` + - a dictionary with any model-specific outputs + """ + + # embed positions + positions = ( + self.embed_positions( + prev_output_tokens, incremental_state=incremental_state + ) + if self.embed_positions is not None + else None + ) + + if incremental_state is not None: + prev_output_tokens = prev_output_tokens[:, -1:] + if positions is not None: + positions = positions[:, -1:] + + # embed tokens and positions + x = self.embed_scale * self.embed_tokens(prev_output_tokens) + + if self.project_in_dim is not None: + x = self.project_in_dim(x) + + if positions is not None: + x += positions + x = F.dropout(x, p=self.dropout, training=self.training) + + # B x T x C -> T x B x C + x = x.transpose(0, 1) + attn = None + + inner_states = [x] + + # decoder layers + for layer in self.layers: + dropout_probability = np.random.random() + if not self.training or (dropout_probability > self.layerdrop): + x, attn, _ = layer( + x, + encoder_out["encoder_out"] if encoder_out is not None else None, + encoder_out["padding_mask"] if encoder_out is not None else None, + incremental_state, + self_attn_mask=self.buffered_future_mask(x) + if incremental_state is None + else None, + ) + inner_states.append(x) + + if self.layer_norm: + x = self.layer_norm(x) + + # T x B x C -> B x T x C + x = x.transpose(0, 1) + + return x, {"attn": attn, "inner_states": inner_states} + + def output_layer(self, features, **kwargs): + """Project features to the vocabulary size.""" + # project back to size of vocabulary + if self.share_input_output_embed: + return F.linear(features, self.embed_tokens.weight) + else: + return F.linear(features, self.embed_out) + + def max_positions(self): + """Maximum output length supported by the decoder.""" + if self.embed_positions is None: + return self.max_target_positions + return min(self.max_target_positions, self.embed_positions.max_positions) + + def buffered_future_mask(self, tensor): + dim = tensor.size(0) + if ( + not hasattr(self, "_future_mask") + or self._future_mask is None + or self._future_mask.device != tensor.device + or self._future_mask.size(0) < dim + ): + self._future_mask = torch.triu( + utils.fill_with_neg_inf(tensor.new(dim, dim)), 1 + ) + return self._future_mask[:dim, :dim] + + def upgrade_state_dict_named(self, state_dict, name): + return state_dict + + +def Embedding(num_embeddings, embedding_dim, padding_idx): + m = nn.Embedding(num_embeddings, embedding_dim, padding_idx=padding_idx) + nn.init.normal_(m.weight, mean=0, std=embedding_dim ** -0.5) + nn.init.constant_(m.weight[padding_idx], 0) + return m + + +def Linear(in_features, out_features, bias=True): + m = nn.Linear(in_features, out_features, bias) + nn.init.xavier_uniform_(m.weight) + if bias: + nn.init.constant_(m.bias, 0.0) + return m diff --git a/SpeechT5/fairseq/fairseq/modules/__init__.py b/SpeechT5/fairseq/fairseq/modules/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..81930aa71c00ab8a6c36e362e8de3d356d0cf30a --- /dev/null +++ b/SpeechT5/fairseq/fairseq/modules/__init__.py @@ -0,0 +1,78 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. 
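+
+# NOTE: this package __init__ re-exports the building blocks used by the models
+# above (multi-head attention, positional embeddings, layer/group norms, the
+# Gumbel and k-means vector quantizers, etc.) so they can be imported directly
+# from `fairseq.modules`.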
+"""isort:skip_file""" + +from .adaptive_input import AdaptiveInput +from .adaptive_softmax import AdaptiveSoftmax +from .base_layer import BaseLayer +from .beamable_mm import BeamableMM +from .character_token_embedder import CharacterTokenEmbedder +from .conv_tbc import ConvTBC +from .cross_entropy import cross_entropy +from .downsampled_multihead_attention import DownsampledMultiHeadAttention +from .dynamic_convolution import DynamicConv, DynamicConv1dTBC +from .dynamic_crf_layer import DynamicCRF +from .fairseq_dropout import FairseqDropout +from .fp32_group_norm import Fp32GroupNorm +from .gelu import gelu, gelu_accurate +from .grad_multiply import GradMultiply +from .gumbel_vector_quantizer import GumbelVectorQuantizer +from .kmeans_vector_quantizer import KmeansVectorQuantizer +from .layer_drop import LayerDropModuleList +from .layer_norm import Fp32LayerNorm, LayerNorm +from .learned_positional_embedding import LearnedPositionalEmbedding +from .lightweight_convolution import LightweightConv, LightweightConv1dTBC +from .linearized_convolution import LinearizedConvolution +from .multihead_attention import MultiheadAttention +from .positional_embedding import PositionalEmbedding +from .same_pad import SamePad +from .scalar_bias import ScalarBias +from .sinusoidal_positional_embedding import SinusoidalPositionalEmbedding +from .transformer_sentence_encoder_layer import TransformerSentenceEncoderLayer +from .transformer_sentence_encoder import TransformerSentenceEncoder +from .transpose_last import TransposeLast +from .unfold import unfold1d +from .transformer_layer import TransformerDecoderLayer, TransformerEncoderLayer +from .vggblock import VGGBlock + +__all__ = [ + "AdaptiveInput", + "AdaptiveSoftmax", + "BaseLayer", + "BeamableMM", + "CharacterTokenEmbedder", + "ConvTBC", + "cross_entropy", + "DownsampledMultiHeadAttention", + "DynamicConv1dTBC", + "DynamicConv", + "DynamicCRF", + "FairseqDropout", + "Fp32GroupNorm", + "Fp32LayerNorm", + "gelu", + "gelu_accurate", + "GradMultiply", + "GumbelVectorQuantizer", + "KmeansVectorQuantizer", + "LayerDropModuleList", + "LayerNorm", + "LearnedPositionalEmbedding", + "LightweightConv1dTBC", + "LightweightConv", + "LinearizedConvolution", + "MultiheadAttention", + "PositionalEmbedding", + "SamePad", + "ScalarBias", + "SinusoidalPositionalEmbedding", + "TransformerSentenceEncoderLayer", + "TransformerSentenceEncoder", + "TransformerDecoderLayer", + "TransformerEncoderLayer", + "TransposeLast", + "VGGBlock", + "unfold1d", +] diff --git a/SpeechT5/fairseq/fairseq/modules/adaptive_input.py b/SpeechT5/fairseq/fairseq/modules/adaptive_input.py new file mode 100644 index 0000000000000000000000000000000000000000..446534a9f8b87337a4dd752944ea386ff7cf7965 --- /dev/null +++ b/SpeechT5/fairseq/fairseq/modules/adaptive_input.py @@ -0,0 +1,80 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. 
+ + +from typing import List + +import torch +from fairseq.modules.quant_noise import quant_noise +from torch import nn + + +class AdaptiveInput(nn.Module): + def __init__( + self, + vocab_size: int, + padding_idx: int, + initial_dim: int, + factor: float, + output_dim: int, + cutoff: List[int], + q_noise: float = 0, + qn_block_size: int = 8, + ): + super().__init__() + + if vocab_size > cutoff[-1]: + cutoff = cutoff + [vocab_size] + else: + assert ( + vocab_size == cutoff[-1] + ), "cannot specify cutoff larger than vocab size" + + self.cutoff = cutoff + self.embedding_dim = output_dim + self.padding_idx = padding_idx + + self.embeddings = nn.ModuleList() + for i in range(len(self.cutoff)): + prev = self.cutoff[i - 1] if i > 0 else 0 + size = self.cutoff[i] - prev + dim = int(initial_dim // (factor ** i)) + seq = nn.Sequential( + nn.Embedding(size, dim, self.padding_idx), + quant_noise( + nn.Linear(dim, output_dim, bias=False), q_noise, qn_block_size + ), + ) + + self.embeddings.append(seq) + self.padding_idx = None + self.padding_idx = padding_idx + + def init_weights(m): + if isinstance(m, nn.Embedding): + nn.init.normal_(m.weight, mean=0, std=m.weight.shape[1] ** -0.5) + nn.init.constant_(m.weight[padding_idx], 0) + elif hasattr(m, "weight"): + nn.init.xavier_uniform_(m.weight) + + self.apply(init_weights) + + self.register_buffer("_float_tensor", torch.FloatTensor(1)) + + def weights_for_band(self, band: int): + return self.embeddings[band][0].weight, self.embeddings[band][1].weight + + def forward(self, input: torch.Tensor): + result = self._float_tensor.new(input.shape + (self.embedding_dim,)) + for i in range(len(self.cutoff)): + mask = input.lt(self.cutoff[i]) + if i > 0: + mask.mul_(input.ge(self.cutoff[i - 1])) + chunk_input = input[mask] - self.cutoff[i - 1] + else: + chunk_input = input[mask] + if mask.any(): + result[mask] = self.embeddings[i](chunk_input) + return result diff --git a/SpeechT5/fairseq/fairseq/modules/adaptive_softmax.py b/SpeechT5/fairseq/fairseq/modules/adaptive_softmax.py new file mode 100644 index 0000000000000000000000000000000000000000..ae0c77ba0f6ee98501306d66cbc4a948b4ade0f7 --- /dev/null +++ b/SpeechT5/fairseq/fairseq/modules/adaptive_softmax.py @@ -0,0 +1,268 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. 
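+
+# Example (illustrative sketch, not part of the upstream file): ``AdaptiveSoftmax``
+# factors the output distribution into a head over the most frequent words plus
+# one cluster token per tail band, so that for a tail word
+# log p(w | h) = log p(cluster | h) + log p(w | cluster, h).
+#
+#     import torch
+#     from fairseq.modules import AdaptiveSoftmax
+#
+#     asm = AdaptiveSoftmax(vocab_size=10000, input_dim=512,
+#                           cutoff=[2000, 6000], dropout=0.1)
+#     hidden = torch.randn(8, 32, 512)             # (batch, time, dim)
+#     target = torch.randint(0, 10000, (8, 32))
+#     outputs, new_targets = asm(hidden, target)   # per-band logits + remapped targets
+#     log_probs = asm.get_log_prob(hidden, None)   # -> (8, 32, 10000)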
+ +import functools +import operator + +import torch +import torch.nn.functional as F +from fairseq.modules.fairseq_dropout import FairseqDropout +from fairseq.modules.quant_noise import quant_noise +from torch import nn + + +class TiedLinear(nn.Module): + def __init__(self, weight, transpose): + super().__init__() + self.weight = weight + self.transpose = transpose + + def forward(self, input): + return F.linear(input, self.weight.t() if self.transpose else self.weight) + + +class TiedHeadModule(nn.Module): + def __init__(self, weights, input_dim, num_classes, q_noise, qn_block_size): + super().__init__() + tied_emb, _ = weights + self.num_words, emb_dim = tied_emb.size() + + self.word_proj = quant_noise( + TiedLinear(tied_emb, transpose=False), q_noise, qn_block_size + ) + if input_dim != emb_dim: + self.word_proj = nn.Sequential( + quant_noise( + nn.Linear(input_dim, emb_dim, bias=False), q_noise, qn_block_size + ), + self.word_proj, + ) + + self.class_proj = quant_noise( + nn.Linear(input_dim, num_classes, bias=False), q_noise, qn_block_size + ) + self.out_dim = self.num_words + num_classes + + self.register_buffer("_float_tensor", torch.FloatTensor(1)) + + def forward(self, input): + inp_sz = functools.reduce(operator.mul, input.shape[:-1], 1) + out = self._float_tensor.new(inp_sz, self.out_dim) + out[:, : self.num_words] = self.word_proj(input.view(inp_sz, -1)) + out[:, self.num_words :] = self.class_proj(input.view(inp_sz, -1)) + return out + + +class AdaptiveSoftmax(nn.Module): + """ + This is an implementation of the efficient softmax approximation for + graphical processing units (GPU), described in the paper "Efficient softmax + approximation for GPUs" (http://arxiv.org/abs/1609.04309). + """ + + def __init__( + self, + vocab_size, + input_dim, + cutoff, + dropout, + factor=4.0, + adaptive_inputs=None, + tie_proj=False, + q_noise=0, + qn_block_size=8, + ): + super().__init__() + + if vocab_size > cutoff[-1]: + cutoff = cutoff + [vocab_size] + else: + assert ( + vocab_size == cutoff[-1] + ), "cannot specify cutoff larger than vocab size" + + output_dim = cutoff[0] + len(cutoff) - 1 + + self.vocab_size = vocab_size + self.cutoff = cutoff + self.dropout_module = FairseqDropout( + dropout, module_name=self.__class__.__name__ + ) + self.input_dim = input_dim + self.factor = factor + self.q_noise = q_noise + self.qn_block_size = qn_block_size + + self.lsm = nn.LogSoftmax(dim=1) + + if adaptive_inputs is not None: + self.head = TiedHeadModule( + adaptive_inputs.weights_for_band(0), + input_dim, + len(cutoff) - 1, + self.q_noise, + self.qn_block_size, + ) + else: + self.head = quant_noise( + nn.Linear(input_dim, output_dim, bias=False), + self.q_noise, + self.qn_block_size, + ) + + self._make_tail(adaptive_inputs, tie_proj) + + def init_weights(m): + if ( + hasattr(m, "weight") + and not isinstance(m, TiedLinear) + and not isinstance(m, TiedHeadModule) + ): + nn.init.xavier_uniform_(m.weight) + + self.apply(init_weights) + + self.register_buffer("version", torch.LongTensor([1])) + + def _make_tail(self, adaptive_inputs=None, tie_proj=False): + self.tail = nn.ModuleList() + for i in range(len(self.cutoff) - 1): + dim = int(self.input_dim // self.factor ** (i + 1)) + + tied_emb, tied_proj = ( + adaptive_inputs.weights_for_band(i + 1) + if adaptive_inputs is not None + else (None, None) + ) + + if tied_proj is not None: + if tie_proj: + proj = quant_noise( + TiedLinear(tied_proj, transpose=True), + self.q_noise, + self.qn_block_size, + ) + else: + proj = quant_noise( + 
nn.Linear(tied_proj.size(0), tied_proj.size(1), bias=False), + self.q_noise, + self.qn_block_size, + ) + else: + proj = quant_noise( + nn.Linear(self.input_dim, dim, bias=False), + self.q_noise, + self.qn_block_size, + ) + + if tied_emb is None: + out_proj = nn.Linear( + dim, self.cutoff[i + 1] - self.cutoff[i], bias=False + ) + else: + out_proj = TiedLinear(tied_emb, transpose=False) + + m = nn.Sequential( + proj, + nn.Dropout(self.dropout_module.p), + quant_noise(out_proj, self.q_noise, self.qn_block_size), + ) + + self.tail.append(m) + + def upgrade_state_dict_named(self, state_dict, name): + version_name = name + ".version" + if version_name not in state_dict: + raise Exception("This version of the model is no longer supported") + + def adapt_target(self, target): + """ + In order to be efficient, the AdaptiveSoftMax does not compute the + scores for all the word of the vocabulary for all the examples. It is + thus necessary to call the method adapt_target of the AdaptiveSoftMax + layer inside each forward pass. + """ + + target = target.view(-1) + new_target = [target.clone()] + target_idxs = [] + + for i in range(len(self.cutoff) - 1): + mask = target.ge(self.cutoff[i]).mul(target.lt(self.cutoff[i + 1])) + new_target[0][mask] = self.cutoff[0] + i + + if mask.any(): + target_idxs.append(mask.nonzero(as_tuple=False).squeeze(1)) + new_target.append(target[mask].add(-self.cutoff[i])) + else: + target_idxs.append(None) + new_target.append(None) + + return new_target, target_idxs + + def forward(self, input, target): + """ + Args: + input: (b x t x d) + target: (b x t) + Returns: + 2 lists: output for each cutoff section and new targets by cut off + """ + + input = input.contiguous().view(-1, input.size(-1)) + input = self.dropout_module(input) + + new_target, target_idxs = self.adapt_target(target) + output = [self.head(input)] + + for i in range(len(target_idxs)): + if target_idxs[i] is not None: + output.append(self.tail[i](input.index_select(0, target_idxs[i]))) + else: + output.append(None) + + return output, new_target + + def get_log_prob(self, input, target): + """ + Computes the log probabilities for all the words of the vocabulary, + given a 2D tensor of hidden vectors. 
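+
+        For a word in tail band ``i`` the returned value adds the head
+        log-probability of that band's cluster token to the within-band
+        log-probability, i.e.
+        ``log p(w | h) = log p(cluster_i | h) + log p(w | cluster_i, h)``,
+        and the result has shape ``(batch, length, vocab_size)``.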
+ """ + + bsz, length, dim = input.size() + input = input.contiguous().view(-1, dim) + + if target is not None: + _, target_idxs = self.adapt_target(target) + else: + target_idxs = None + + head_y = self.head(input) + log_probs = head_y.new_zeros(input.size(0), self.vocab_size) + + head_sz = self.cutoff[0] + len(self.tail) + log_probs[:, :head_sz] = self.lsm(head_y) + tail_priors = log_probs[:, self.cutoff[0] : head_sz].clone() + + for i in range(len(self.tail)): + start = self.cutoff[i] + end = self.cutoff[i + 1] + + if target_idxs is None: + tail_out = log_probs[:, start:end] + tail_out.copy_(self.tail[i](input)) + log_probs[:, start:end] = self.lsm(tail_out).add_( + tail_priors[:, i, None] + ) + elif target_idxs[i] is not None: + idxs = target_idxs[i] + tail_out = log_probs[idxs, start:end] + tail_out.copy_(self.tail[i](input[idxs])) + log_probs[idxs, start:end] = self.lsm(tail_out).add_( + tail_priors[idxs, i, None] + ) + + log_probs = log_probs.view(bsz, length, -1) + return log_probs diff --git a/SpeechT5/fairseq/fairseq/modules/base_layer.py b/SpeechT5/fairseq/fairseq/modules/base_layer.py new file mode 100644 index 0000000000000000000000000000000000000000..e7ef155b25fc73e74780879f665288c9bc95fd80 --- /dev/null +++ b/SpeechT5/fairseq/fairseq/modules/base_layer.py @@ -0,0 +1,135 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +import torch.nn as nn +import torch +import sys +from fairseq import utils +from fairseq.distributed import utils as distributed_utils +from fairseq.modules.layer_norm import LayerNorm + + +class BaseLayer(nn.Module): + + def __init__(self, args): + super().__init__() + self.num_workers = distributed_utils.get_data_parallel_world_size() + expert_centroids = torch.empty(self.num_workers, args.decoder_embed_dim) + torch.nn.init.orthogonal_(expert_centroids, gain=0.1) + self.register_parameter("expert_centroids", torch.nn.Parameter(expert_centroids)) + self.expert_network = nn.Sequential(*([BaseSublayer(args) for _ in range(args.base_sublayers)])) + self.expert_id = distributed_utils.get_data_parallel_rank() + self.shuffle = args.base_shuffle + self.cpp = self.load_assignment() + + # Add a special attribute to the expert parameters, so we know not to sync their gradients + for param in self.expert_network.parameters(): + param.expert = True + + def forward(self, input_features, *args, **kwargs): + features = input_features.reshape(-1, input_features.size(-1)) + is_training = input_features.requires_grad + + if self.shuffle and is_training: + # Send each token to a random worker, to break correlations within the batch + shuffle_sort = torch.randperm(features.size(0), device=features.device) + features = All2All.apply(features[shuffle_sort]) + + with torch.no_grad(): + # Compute similarity of each token to each expert, for routing + token_expert_affinities = features.matmul(self.expert_centroids.transpose(0, 1)) + + # Compute which token goes to which expert + sort_by_expert, input_splits, output_splits = self.balanced_assignment(token_expert_affinities) \ + if is_training else self.greedy_assignment(token_expert_affinities) + # Swap these tokens for the right ones for our expert + routed_features = All2All.apply(features[sort_by_expert], output_splits, input_splits) + + if routed_features.size(0) > 0: + # Mix in the expert network based on how appropriate it is for these tokens + alpha = 
torch.sigmoid(routed_features.mv(self.expert_centroids[self.expert_id])).unsqueeze(1) + routed_features = alpha * self.expert_network(routed_features) + (1 - alpha) * routed_features + # Return to original worker and ordering + result = All2All.apply(routed_features, input_splits, output_splits)[self.inverse_sort(sort_by_expert)] + + if self.shuffle and is_training: + # Undo shuffling + result = All2All.apply(result)[self.inverse_sort(shuffle_sort)] + + # Return additional Nones for compatibility with TransformerDecoderLayer + return result.view(input_features.size()), None, None + + def inverse_sort(self, order): + # Creates an index that undoes a sort: xs==xs[order][inverse_sort(order)] + return torch.empty_like(order).scatter_(0, order, torch.arange(0, order.size(0), device=order.device)) + + def balanced_assignment(self, scores): + ok = scores.isfinite() + if not ok.all(): + # NaNs here can break the assignment algorithm + scores[~ok] = scores[ok].min() + return self.cpp.balanced_assignment(scores), None, None + + # Assigns each token to the top k experts + def greedy_assignment(self, scores, k=1): + token_to_workers = torch.topk(scores, dim=1, k=k, largest=True).indices.view(-1) + token_to_workers, sort_ordering = torch.sort(token_to_workers) + worker2token = sort_ordering // k + + # Find how many tokens we're sending to each other worker (being careful for sending 0 tokens to some workers) + output_splits = torch.zeros((self.num_workers,), dtype=torch.long, device=scores.device) + workers, counts = torch.unique_consecutive(token_to_workers, return_counts=True) + output_splits[workers] = counts + # Tell other workers how many tokens to expect from us + input_splits = All2All.apply(output_splits) + return worker2token, input_splits.tolist(), output_splits.tolist() + + def load_assignment(self): + try: + from fairseq import libbase + + return libbase + + except ImportError as e: + sys.stderr.write( + "ERROR: missing libbase. 
run `python setup.py build_ext --inplace`\n" + ) + raise e + + +class BaseSublayer(nn.Module): + def __init__(self, args): + super().__init__() + self.activation_fn = utils.get_activation_fn( + activation=getattr(args, 'activation_fn', 'relu') or "relu" + ) + self.norm = LayerNorm(args.decoder_embed_dim, export=False) + self.ff1 = torch.nn.Linear(args.decoder_embed_dim, args.decoder_ffn_embed_dim) + self.ff2 = torch.nn.Linear(args.decoder_ffn_embed_dim, args.decoder_embed_dim) + self.ff2.weight.data.zero_() + + def forward(self, xs): + return xs + self.ff2(self.activation_fn(self.ff1(self.norm(xs)))) + + +# Wraps torch.distributed.all_to_all_single as a function that supports autograd +class All2All(torch.autograd.Function): + @staticmethod + def forward(ctx, xs, input_splits=None, output_splits=None): + ctx.input_splits = input_splits + ctx.output_splits = output_splits + + ys = torch.empty_like(xs) if output_splits is None else \ + xs.new_empty(size=[sum(output_splits)] + list(xs.size()[1:])) + torch.distributed.all_to_all_single(ys, xs, output_split_sizes=output_splits, input_split_sizes=input_splits) + return ys + + @staticmethod + def backward(ctx, grad_output): + result = torch.empty_like(grad_output) if ctx.input_splits is None else \ + grad_output.new_empty(size=[sum(ctx.input_splits)] + list(grad_output.size()[1:])) + torch.distributed.all_to_all_single(result, grad_output, + output_split_sizes=ctx.input_splits, input_split_sizes=ctx.output_splits) + return result, None, None diff --git a/SpeechT5/fairseq/fairseq/modules/beamable_mm.py b/SpeechT5/fairseq/fairseq/modules/beamable_mm.py new file mode 100644 index 0000000000000000000000000000000000000000..eff1a4607f600c71210e6b914985dc48731aae86 --- /dev/null +++ b/SpeechT5/fairseq/fairseq/modules/beamable_mm.py @@ -0,0 +1,49 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +import torch +import torch.nn as nn + + +class BeamableMM(nn.Module): + """This module provides an optimized MM for beam decoding with attention. + + It leverage the fact that the source-side of the input is replicated beam + times and the target-side of the input is of width one. This layer speeds up + inference by replacing the inputs {(bsz x 1 x nhu), (bsz x sz2 x nhu)} + with smaller inputs {(bsz/beam x beam x nhu), (bsz/beam x sz2 x nhu)}. 
+ """ + + def __init__(self, beam_size=None): + super(BeamableMM, self).__init__() + self.beam_size = beam_size + + def forward(self, input1, input2): + if ( + not self.training + and self.beam_size is not None # test mode + and input1.dim() == 3 # beam size is set + and input1.size(1) # only support batched input + == 1 # single time step update + ): + bsz, beam = input1.size(0), self.beam_size + + # bsz x 1 x nhu --> bsz/beam x beam x nhu + input1 = input1[:, 0, :].unfold(0, beam, beam).transpose(2, 1) + + # bsz x sz2 x nhu --> bsz/beam x sz2 x nhu + input2 = input2.unfold(0, beam, beam)[:, :, :, 0] + + # use non batched operation if bsz = beam + if input1.size(0) == 1: + output = torch.mm(input1[0, :, :], input2[0, :, :]) + else: + output = input1.bmm(input2) + return output.view(bsz, 1, -1) + else: + return input1.bmm(input2) + + def set_beam_size(self, beam_size): + self.beam_size = beam_size diff --git a/SpeechT5/fairseq/fairseq/modules/character_token_embedder.py b/SpeechT5/fairseq/fairseq/modules/character_token_embedder.py new file mode 100644 index 0000000000000000000000000000000000000000..181221b61b9f76453b67e3b848b198620dce912c --- /dev/null +++ b/SpeechT5/fairseq/fairseq/modules/character_token_embedder.py @@ -0,0 +1,214 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +import logging +from typing import List, Tuple + +import torch +import torch.nn.functional as F +from fairseq.data import Dictionary +from torch import nn + + +CHAR_PAD_IDX = 0 +CHAR_EOS_IDX = 257 + + +logger = logging.getLogger(__name__) + + +class CharacterTokenEmbedder(torch.nn.Module): + def __init__( + self, + vocab: Dictionary, + filters: List[Tuple[int, int]], + char_embed_dim: int, + word_embed_dim: int, + highway_layers: int, + max_char_len: int = 50, + char_inputs: bool = False, + ): + super(CharacterTokenEmbedder, self).__init__() + + self.onnx_trace = False + self.embedding_dim = word_embed_dim + self.max_char_len = max_char_len + self.char_embeddings = nn.Embedding(257, char_embed_dim, padding_idx=0) + self.symbol_embeddings = nn.Parameter(torch.FloatTensor(2, word_embed_dim)) + self.eos_idx, self.unk_idx = 0, 1 + self.char_inputs = char_inputs + + self.convolutions = nn.ModuleList() + for width, out_c in filters: + self.convolutions.append( + nn.Conv1d(char_embed_dim, out_c, kernel_size=width) + ) + + last_dim = sum(f[1] for f in filters) + + self.highway = Highway(last_dim, highway_layers) if highway_layers > 0 else None + + self.projection = nn.Linear(last_dim, word_embed_dim) + + assert ( + vocab is not None or char_inputs + ), "vocab must be set if not using char inputs" + self.vocab = None + if vocab is not None: + self.set_vocab(vocab, max_char_len) + + self.reset_parameters() + + def prepare_for_onnx_export_(self): + self.onnx_trace = True + + def set_vocab(self, vocab, max_char_len): + word_to_char = torch.LongTensor(len(vocab), max_char_len) + + truncated = 0 + for i in range(len(vocab)): + if i < vocab.nspecial: + char_idxs = [0] * max_char_len + else: + chars = vocab[i].encode() + # +1 for padding + char_idxs = [c + 1 for c in chars] + [0] * (max_char_len - len(chars)) + if len(char_idxs) > max_char_len: + truncated += 1 + char_idxs = char_idxs[:max_char_len] + word_to_char[i] = torch.LongTensor(char_idxs) + + if truncated > 0: + logger.info( + "truncated {} words longer than {} characters".format( + truncated, max_char_len + ) + ) + + self.vocab = 
vocab + self.word_to_char = word_to_char + + @property + def padding_idx(self): + return Dictionary().pad() if self.vocab is None else self.vocab.pad() + + def reset_parameters(self): + nn.init.xavier_normal_(self.char_embeddings.weight) + nn.init.xavier_normal_(self.symbol_embeddings) + nn.init.xavier_uniform_(self.projection.weight) + + nn.init.constant_( + self.char_embeddings.weight[self.char_embeddings.padding_idx], 0.0 + ) + nn.init.constant_(self.projection.bias, 0.0) + + def forward( + self, + input: torch.Tensor, + ): + if self.char_inputs: + chars = input.view(-1, self.max_char_len) + pads = chars[:, 0].eq(CHAR_PAD_IDX) + eos = chars[:, 0].eq(CHAR_EOS_IDX) + if eos.any(): + if self.onnx_trace: + chars = torch.where(eos.unsqueeze(1), chars.new_zeros(1), chars) + else: + chars[eos] = 0 + + unk = None + else: + flat_words = input.view(-1) + chars = self.word_to_char[flat_words.type_as(self.word_to_char)].type_as( + input + ) + pads = flat_words.eq(self.vocab.pad()) + eos = flat_words.eq(self.vocab.eos()) + unk = flat_words.eq(self.vocab.unk()) + + word_embs = self._convolve(chars) + if self.onnx_trace: + if pads.any(): + word_embs = torch.where( + pads.unsqueeze(1), word_embs.new_zeros(1), word_embs + ) + if eos.any(): + word_embs = torch.where( + eos.unsqueeze(1), self.symbol_embeddings[self.eos_idx], word_embs + ) + if unk is not None and unk.any(): + word_embs = torch.where( + unk.unsqueeze(1), self.symbol_embeddings[self.unk_idx], word_embs + ) + else: + if pads.any(): + word_embs[pads] = 0 + if eos.any(): + word_embs[eos] = self.symbol_embeddings[self.eos_idx] + if unk is not None and unk.any(): + word_embs[unk] = self.symbol_embeddings[self.unk_idx] + + return word_embs.view(input.size()[:2] + (-1,)) + + def _convolve( + self, + char_idxs: torch.Tensor, + ): + char_embs = self.char_embeddings(char_idxs) + char_embs = char_embs.transpose(1, 2) # BTC -> BCT + + conv_result = [] + + for conv in self.convolutions: + x = conv(char_embs) + x, _ = torch.max(x, -1) + x = F.relu(x) + conv_result.append(x) + + x = torch.cat(conv_result, dim=-1) + + if self.highway is not None: + x = self.highway(x) + x = self.projection(x) + + return x + + +class Highway(torch.nn.Module): + """ + A `Highway layer <https://arxiv.org/abs/1505.00387>`_. + Adopted from the AllenNLP implementation. + """ + + def __init__(self, input_dim: int, num_layers: int = 1): + super(Highway, self).__init__() + self.input_dim = input_dim + self.layers = nn.ModuleList( + [nn.Linear(input_dim, input_dim * 2) for _ in range(num_layers)] + ) + self.activation = nn.ReLU() + + self.reset_parameters() + + def reset_parameters(self): + for layer in self.layers: + # As per comment in AllenNLP: + # We should bias the highway layer to just carry its input forward. We do that by + # setting the bias on `B(x)` to be positive, because that means `g` will be biased to + # be high, so we will carry the input forward. The bias on `B(x)` is the second half + # of the bias vector in each Linear layer. 
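+        # Concretely, each layer below computes (see ``forward``):
+        #     h, g_pre = layer(x).chunk(2, dim=-1)
+        #     x <- sigmoid(g_pre) * x + (1 - sigmoid(g_pre)) * relu(h)
+        # The gate comes from the second half of the layer's output, which is
+        # why only ``bias[input_dim:]`` is initialised to 1 below.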
+ nn.init.constant_(layer.bias[self.input_dim :], 1) + + nn.init.constant_(layer.bias[: self.input_dim], 0) + nn.init.xavier_normal_(layer.weight) + + def forward(self, x: torch.Tensor): + for layer in self.layers: + projection = layer(x) + proj_x, gate = projection.chunk(2, dim=-1) + proj_x = self.activation(proj_x) + gate = torch.sigmoid(gate) + x = gate * x + (gate.new_tensor([1]) - gate) * proj_x + return x diff --git a/SpeechT5/fairseq/fairseq/modules/checkpoint_activations.py b/SpeechT5/fairseq/fairseq/modules/checkpoint_activations.py new file mode 100644 index 0000000000000000000000000000000000000000..b44fc346cec1ab24d8056075b3df14020a86214b --- /dev/null +++ b/SpeechT5/fairseq/fairseq/modules/checkpoint_activations.py @@ -0,0 +1,236 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +import functools +from typing import Any, Dict, List, Tuple, Union + +import torch +import torch.utils.checkpoint as checkpoint +from fairseq import utils + + +def checkpoint_wrapper(m, offload_to_cpu=False): + """ + A friendlier wrapper for performing activation checkpointing. + + Compared to the PyTorch version, this version: + - wraps an nn.Module, so that all subsequent calls will use checkpointing + - handles keyword arguments in the forward + - handles non-Tensor outputs from the forward + + Usage:: + + checkpointed_module = checkpoint_wrapper(my_module, offload_to_cpu=True) + a, b = checkpointed_module(x, y=3, z=torch.Tensor([1])) + """ + # should I check whether original_forward has already been set? + assert not hasattr( + m, "precheckpoint_forward" + ), "checkpoint function has already been applied?" + m.precheckpoint_forward = m.forward + m.forward = functools.partial( + _checkpointed_forward, + m.precheckpoint_forward, # original_forward + offload_to_cpu, + ) + return m + + +def unwrap_checkpoint(m: torch.nn.Module): + """ + unwrap a module and its children from checkpoint_wrapper + """ + for module in m.modules(): + if hasattr(module, "precheckpoint_forward"): + module.forward = module.precheckpoint_forward + del module.precheckpoint_forward + return m + + +def _checkpointed_forward(original_forward, offload_to_cpu, *args, **kwargs): + # Autograd Functions in PyTorch work best with positional args, since + # the backward must return gradients (or None) for every input argument. + # We can flatten keyword arguments to make this easier. 
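+    # For example, a call forwarded as (x, y=3) is flattened to
+    # kwarg_keys=["y"], flat_args=[x, 3] and rebuilt with unpack_kwargs inside
+    # CheckpointFunction.forward, so the autograd Function only ever receives
+    # positional arguments.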
+ kwarg_keys, flat_args = pack_kwargs(*args, **kwargs) + parent_ctx_dict = {"offload": offload_to_cpu} + output = CheckpointFunction.apply( + original_forward, parent_ctx_dict, kwarg_keys, *flat_args + ) + if isinstance(output, torch.Tensor): + return output + else: + packed_non_tensor_outputs = parent_ctx_dict["packed_non_tensor_outputs"] + if packed_non_tensor_outputs: + output = unpack_non_tensors(output, packed_non_tensor_outputs) + return output + + +def pack_kwargs(*args, **kwargs) -> Tuple[List[str], List[Any]]: + """ + Usage:: + + kwarg_keys, flat_args = pack_kwargs(1, 2, a=3, b=4) + args, kwargs = unpack_kwargs(kwarg_keys, flat_args) + assert args == [1, 2] + assert kwargs == {"a": 3, "b": 4} + """ + kwarg_keys = [] + flat_args = list(args) + for k, v in kwargs.items(): + kwarg_keys.append(k) + flat_args.append(v) + return kwarg_keys, flat_args + + +def unpack_kwargs( + kwarg_keys: List[str], flat_args: List[Any] +) -> Tuple[List[Any], Dict[str, Any]]: + if len(kwarg_keys) == 0: + return flat_args, {} + args = flat_args[: -len(kwarg_keys)] + kwargs = {k: v for k, v in zip(kwarg_keys, flat_args[-len(kwarg_keys) :])} + return args, kwargs + + +def split_non_tensors( + mixed: Union[torch.Tensor, Tuple[Any]] +) -> Tuple[Tuple[torch.Tensor], Dict[str, List[Any]]]: + """ + Usage:: + + x = torch.Tensor([1]) + y = torch.Tensor([2]) + tensors, packed_non_tensors = split_non_tensors((x, y, None, 3)) + recon = unpack_non_tensors(tensors, packed_non_tensors) + assert recon == (x, y, None, 3) + """ + if isinstance(mixed, torch.Tensor): + return (mixed,), None + tensors = [] + packed_non_tensors = {"is_tensor": [], "objects": []} + for o in mixed: + if isinstance(o, torch.Tensor): + packed_non_tensors["is_tensor"].append(True) + tensors.append(o) + else: + packed_non_tensors["is_tensor"].append(False) + packed_non_tensors["objects"].append(o) + return tuple(tensors), packed_non_tensors + + +def unpack_non_tensors( + tensors: Tuple[torch.Tensor], + packed_non_tensors: Dict[str, List[Any]], +) -> Tuple[Any]: + if packed_non_tensors is None: + return tensors + assert isinstance(packed_non_tensors, dict) + mixed = [] + is_tensor_list = packed_non_tensors["is_tensor"] + objects = packed_non_tensors["objects"] + assert len(tensors) + len(objects) == len(is_tensor_list) + obj_i = tnsr_i = 0 + for is_tensor in is_tensor_list: + if is_tensor: + mixed.append(tensors[tnsr_i]) + tnsr_i += 1 + else: + mixed.append(objects[obj_i]) + obj_i += 1 + return tuple(mixed) + + +class CheckpointFunction(torch.autograd.Function): + """Similar to the torch version, but support non-Tensor outputs. + + The caller is expected to provide a dict (*parent_ctx_dict*) that will hold + the non-Tensor outputs. These should be combined with the Tensor *outputs* + by calling ``unpack_non_tensors``. 
+ """ + + @staticmethod + def forward(ctx, run_function, parent_ctx_dict, kwarg_keys, *args): + if torch.is_grad_enabled(): # grad may be disabled, e.g., during validation + checkpoint.check_backward_validity(args) + + ctx.run_function = run_function + ctx.kwarg_keys = kwarg_keys + ctx.fwd_rng_state = utils.get_rng_state() + + tensor_inputs, packed_non_tensor_inputs = split_non_tensors(args) + if parent_ctx_dict["offload"]: + ctx.fwd_device = tuple(x.device for x in tensor_inputs) + ctx.grad_requirements = tuple(x.requires_grad for x in tensor_inputs) + tensor_inputs = tuple(x.cpu() for x in tensor_inputs) + + else: + ctx.fwd_device, ctx.grad_requirements = None, None + + ctx.save_for_backward(*tensor_inputs) + ctx.packed_non_tensor_inputs = packed_non_tensor_inputs + + with torch.no_grad(): + unpacked_args, unpacked_kwargs = unpack_kwargs(kwarg_keys, args) + outputs = run_function(*unpacked_args, **unpacked_kwargs) + + if isinstance(outputs, torch.Tensor): + return outputs + else: + # Autograd Functions don't like non-Tensor outputs. We can split the + # non-Tensor and Tensor outputs, returning the former by reference + # through *parent_ctx_dict* and returning the latter directly. + outputs, packed_non_tensor_outputs = split_non_tensors(outputs) + parent_ctx_dict["packed_non_tensor_outputs"] = packed_non_tensor_outputs + return outputs + + @staticmethod + def backward(ctx, *args): + if not torch.autograd._is_checkpoint_valid(): + raise RuntimeError( + "Checkpointing is not compatible with .grad(), please use .backward() if possible" + ) + + tensor_inputs: Tuple = ctx.saved_tensors + tensor_inputs = checkpoint.detach_variable(tensor_inputs) + if ctx.fwd_device is not None: + tensor_inputs = [ + t.to(ctx.fwd_device[i]) for i, t in enumerate(tensor_inputs) + ] + for i, need_grad in enumerate(ctx.grad_requirements): + tensor_inputs[i].requires_grad = need_grad + inputs = unpack_non_tensors(tensor_inputs, ctx.packed_non_tensor_inputs) + + # Store the current states. + bwd_rng_state = utils.get_rng_state() + + # Set the states to what it used to be before the forward pass. + utils.set_rng_state(ctx.fwd_rng_state) + + with torch.enable_grad(): + unpacked_args, unpacked_kwargs = unpack_kwargs(ctx.kwarg_keys, inputs) + outputs = ctx.run_function(*unpacked_args, **unpacked_kwargs) + tensor_outputs, _ = split_non_tensors(outputs) + # Set the states back to what it was at the start of this function. + utils.set_rng_state(bwd_rng_state) + + # Run backward() with only Tensors that require grad + outputs_with_grad = [] + args_with_grad = [] + for i in range(len(tensor_outputs)): + if tensor_outputs[i].requires_grad: + outputs_with_grad.append(tensor_outputs[i]) + args_with_grad.append(args[i]) + if len(outputs_with_grad) == 0: + raise RuntimeError( + "None of the outputs have requires_grad=True, " + "this checkpoint() is not necessary" + ) + + torch.autograd.backward(outputs_with_grad, args_with_grad) + + grads = tuple( + inp.grad if isinstance(inp, torch.Tensor) else None for inp in inputs + ) + return (None, None, None) + grads diff --git a/SpeechT5/fairseq/fairseq/modules/conv_tbc.py b/SpeechT5/fairseq/fairseq/modules/conv_tbc.py new file mode 100644 index 0000000000000000000000000000000000000000..65e17ec94f7e595cb657b3d2daaa1052a95d0677 --- /dev/null +++ b/SpeechT5/fairseq/fairseq/modules/conv_tbc.py @@ -0,0 +1,53 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. 
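+
+# Example (illustrative sketch, not part of the upstream file): ``ConvTBC``
+# expects time-major input of shape (time, batch, channel).
+#
+#     import torch
+#     from fairseq.modules import ConvTBC
+#
+#     conv = ConvTBC(in_channels=256, out_channels=256, kernel_size=3, padding=1)
+#     x = torch.randn(50, 8, 256)    # (time, batch, channel)
+#     y = conv(x)                    # -> (50, 8, 256)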
+ +import torch +from torch import nn +from torch.nn.modules.utils import _single +from torch import Tensor + + +class ConvTBC(torch.nn.Module): + """1D convolution over an input of shape (time x batch x channel) + + The implementation uses gemm to perform the convolution. This implementation + is faster than cuDNN for small kernel sizes. + """ + + def __init__(self, in_channels, out_channels, kernel_size, padding=0): + super(ConvTBC, self).__init__() + self.in_channels = in_channels + self.out_channels = out_channels + self.kernel_size = _single(kernel_size) + self.padding = _single(padding) + + self.weight = torch.nn.Parameter( + torch.Tensor(self.kernel_size[0], in_channels, out_channels) + ) + self.bias = torch.nn.Parameter(torch.Tensor(out_channels)) + + self.reset_parameters() + + def reset_parameters(self): + nn.init.xavier_normal_(self.weight) + nn.init.zeros_(self.bias) + + def conv_tbc(self, input: Tensor): + return torch.conv_tbc( + input.contiguous(), self.weight, self.bias, self.padding[0] + ) + + def forward(self, input: Tensor): + return self.conv_tbc(input) + + def __repr__(self): + s = ( + "{name}({in_channels}, {out_channels}, kernel_size={kernel_size}" + ", padding={padding}" + ) + if self.bias is None: + s += ", bias=False" + s += ")" + return s.format(name=self.__class__.__name__, **self.__dict__) diff --git a/SpeechT5/fairseq/fairseq/modules/cross_entropy.py b/SpeechT5/fairseq/fairseq/modules/cross_entropy.py new file mode 100644 index 0000000000000000000000000000000000000000..6f33c24cb56e25f91595009af38e63784c2263a0 --- /dev/null +++ b/SpeechT5/fairseq/fairseq/modules/cross_entropy.py @@ -0,0 +1,61 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +import logging + +import torch +import torch.nn.functional as F + + +logger = logging.getLogger(__name__) + + +def _cross_entropy_pytorch(logits, target, ignore_index=None, reduction="mean"): + lprobs = F.log_softmax(logits, dim=-1, dtype=torch.float32) + return F.nll_loss( + lprobs, + target, + ignore_index=ignore_index, + reduction=reduction, + ) + + +try: + import xentropy_cuda + from apex.contrib import xentropy + + def cross_entropy(logits, target, ignore_index=-100, reduction="mean"): + if logits.device == torch.device("cpu"): + return _cross_entropy_pytorch(logits, target, ignore_index, reduction) + else: + if not getattr(cross_entropy, "_has_logged_once", False): + logger.info("using fused cross entropy") + cross_entropy._has_logged_once = True + + half_to_float = logits.dtype == torch.half + losses = xentropy.SoftmaxCrossEntropyLoss.apply( + logits, + target, + 0.0, + ignore_index, + half_to_float, + ) + if reduction == "sum": + return losses.sum() + elif reduction == "mean": + if ignore_index >= 0: + return losses.sum() / target.ne(ignore_index).sum() + else: + return losses.mean() + elif reduction == "none": + return losses + else: + raise NotImplementedError + + +except ImportError: + + def cross_entropy(logits, target, ignore_index=-100, reduction="mean"): + return _cross_entropy_pytorch(logits, target, ignore_index, reduction) diff --git a/SpeechT5/fairseq/fairseq/modules/cuda_utils.cu b/SpeechT5/fairseq/fairseq/modules/cuda_utils.cu new file mode 100644 index 0000000000000000000000000000000000000000..516f1d92440e9e2c092f122e45d81b45cb135602 --- /dev/null +++ b/SpeechT5/fairseq/fairseq/modules/cuda_utils.cu @@ -0,0 +1,203 @@ +/** + * Copyright (c) Facebook, Inc. 
and its affiliates. + * + * This source code is licensed under the MIT license found in the + * LICENSE file in the root directory of this source tree. + */ + + +template <typename U, typename V> +constexpr __host__ __device__ auto divUp(U a, V b) -> decltype(a + b) { + return (a + b - 1) / b; +} + + +template<int FS, int SB, int padding_l, typename scalar_t> +__inline__ __device__ +void zeroSharedMem(scalar_t* data) { + /* + Given an array of length FS + SB, zero out the first padding_l and last + (FS - padding_l) values in the array + */ + + int tid = threadIdx.x; + + if (FS < SB) { + + // zero all if we have enough threads in a block to do all of them + if (tid < padding_l || tid > SB - FS + padding_l - 1) { + data[tid] = scalar_t(0.0); + } + } else { + + // otherwise zero out one block at a time + const int numIterations = divUp<int, int>(FS, SB); + for (int i = 0; i < numIterations; i++) { + int offset = i * SB; + if (tid + offset < padding_l) { + data[tid + offset] = scalar_t(0.0); + } else if (tid + offset < FS) { + data[SB + tid + offset] = scalar_t(0.0); + } + } + } +} + +template<typename scalar_t> +__inline__ __device__ +scalar_t warpReduce(scalar_t data) { + /* + Reduce an array within each warp. After processing all values in warp will + caontain the sum of all original values in that warp. + + data - pointer to data to reduce + */ + data += __shfl_xor_sync(SHFL_MASK, data, 16); + data += __shfl_xor_sync(SHFL_MASK, data, 8); + data += __shfl_xor_sync(SHFL_MASK, data, 4); + data += __shfl_xor_sync(SHFL_MASK, data, 2); + data += __shfl_xor_sync(SHFL_MASK, data, 1); + return data; +} + +template<typename scalar_t> +__inline__ __device__ +scalar_t blockReduce(scalar_t data) { + /* + Reduce an entire array on the block level. After processing, the + first value in the array will contain the reduced sum. + + data - pointer to data to reduce + */ + + static __shared__ scalar_t warpSum[32]; + const int tid = threadIdx.x; + int wid = tid / 32; + int lane = tid % 32; + + __syncthreads(); + + // reduce each warp then write to shared memory + scalar_t sum = warpReduce(data); + if (lane == 0) { + warpSum[wid] = sum; + } + + __syncthreads(); + + scalar_t v; + // perform final sum of partial warp sums + if (tid < blockDim.x / 32) { + v = warpSum[lane]; + } else { + v = scalar_t(0.0); + } + + if (wid == 0) { + v = warpReduce(v); + } + __syncthreads(); + + return v; +} + +void checkCudaStatus(cudaError_t status, int lineNumber = -1) { + + if (status != cudaSuccess) { + std::cout << cudaGetErrorString(status) + << " at line " << lineNumber << std::endl; + std::cout << "Exiting" << std::endl; + exit(1); + } +} + +template<int FS, int SB, int padding_l, typename scalar_t> +__device__ +void load_input_to_shared(const scalar_t* input, // global memory + int inputOffset, int sequenceLength, + int iteration, int numIterations, + bool no_prev, scalar_t* output /* shared memory */) { + /* + Load a block size of input into shared memory with + right and left overhang of total size FS. 
If previously + loaded memory, overlap will be shifted over to reduce + global memory access + + input - pointer to start of channel sequence + inputOffset - how far in the sequence to start loading + sequenceLength - total length of sequence + iteration - which block of sequence we are loading + numIterations - total number of blocks to load + no_prev - whether to load the whole block if the previous block + wasn't loaded + output - shared memory to write input to + */ + + const int tid = threadIdx.x; + + // Load the left "overhang" of input + if (iteration > 0) { + if (padding_l < SB) { + + // load all at once + if (tid < padding_l) { + output[tid] = (no_prev) ? input[inputOffset - padding_l + tid] : output[tid + SB]; + } + } else { + + // load in chunks of size SB + int numIterations = divUp<int, int>(padding_l, SB); + for (int i = 0; i < numIterations; i++) { + int offset = i * SB; + if ((tid + offset) < padding_l) { + output[tid + offset] = (no_prev) ? input[inputOffset - padding_l + tid + offset] : output[tid + offset + SB]; + } + } + } + } + + // Load the right "overhang" of input + if (iteration < (numIterations - 1)) { + const int elementsLeft = sequenceLength - (iteration+1) * SB; + + if ((FS - padding_l) < SB) { + + // load all at once + if (tid < (FS - padding_l)) { + output[padding_l + SB + tid] = (tid < elementsLeft) ? input[inputOffset + SB + tid] : scalar_t(0.0); + } + } else { + + // load in chunks of size SB + int numIterations = divUp<int, int>(FS - padding_l, SB); + for (int i = 0; i < numIterations; i++) { + int offset = i * SB; + if ((tid + offset) < (FS - padding_l)) { + output[padding_l + SB + tid + offset] = ((tid + offset) < elementsLeft) ? input[inputOffset + SB + tid + offset] : scalar_t(0.0); + } + } + } + } + + // We should also clear out the right "overhang" + if (iteration == (numIterations - 1)) { + if ((FS - padding_l) < SB) { + + // clear out all at once + if (tid < (FS - padding_l)) { + output[padding_l + SB + tid] = scalar_t(0.0); + } + } else { + + // clear in chunks of size SB + int numIterations = divUp<int, int>(FS - padding_l, SB); + for (int i = 0; i < numIterations; i++) { + int offset = i * SB; + if ((tid + offset) < (FS - padding_l)) { + output[padding_l + SB + tid + offset] = scalar_t(0.0); + } + } + } + } + output[tid + padding_l] = ((inputOffset + tid) < sequenceLength) ? input[inputOffset + tid] : scalar_t(0.0); +} diff --git a/SpeechT5/fairseq/fairseq/modules/downsampled_multihead_attention.py b/SpeechT5/fairseq/fairseq/modules/downsampled_multihead_attention.py new file mode 100644 index 0000000000000000000000000000000000000000..2cdece3f7fca2b830eb72999ce93f58667ed595b --- /dev/null +++ b/SpeechT5/fairseq/fairseq/modules/downsampled_multihead_attention.py @@ -0,0 +1,316 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. 
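+
+# Example (illustrative sketch, not part of the upstream file): inputs are
+# time-major (time, batch, channel); the module returns the attention output
+# and the head-averaged attention weights.
+#
+#     import torch
+#     from fairseq.modules import DownsampledMultiHeadAttention
+#
+#     attn = DownsampledMultiHeadAttention(out_channels=512, embed_dim=512,
+#                                          num_heads=8, dropout=0.1)
+#     x = torch.randn(20, 4, 512)      # (time, batch, channel)
+#     out, weights = attn(x, x, x)     # out: (20, 4, 512), weights: (4, 20, 20)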
+# + +import math + +import torch +import torch.nn as nn +import torch.nn.functional as F +from fairseq.modules.fairseq_dropout import FairseqDropout +from fairseq.modules.scalar_bias import scalar_bias + + +class SingleHeadAttention(nn.Module): + """ + Single-head attention that supports Gating and Downsampling + """ + + def __init__( + self, + out_channels, + embed_dim, + head_dim, + head_index, + dropout=0.0, + bias=True, + project_input=True, + gated=False, + downsample=False, + num_heads=1, + ): + super().__init__() + self.embed_dim = embed_dim + self.dropout_module = FairseqDropout( + dropout, module_name=self.__class__.__name__ + ) + self.head_index = head_index + self.head_dim = head_dim + self.project_input = project_input + self.gated = gated + self.downsample = downsample + self.num_heads = num_heads + self.projection = None + + k_layers = [] + v_layers = [] + if self.downsample: + k_layers.append(Downsample(self.head_index)) + v_layers.append(Downsample(self.head_index)) + out_proj_size = self.head_dim + else: + out_proj_size = self.head_dim * self.num_heads + if self.gated: + k_layers.append(GatedLinear(self.embed_dim, out_proj_size, bias=bias)) + self.in_proj_q = GatedLinear(self.embed_dim, out_proj_size, bias=bias) + v_layers.append(GatedLinear(self.embed_dim, out_proj_size, bias=bias)) + else: + k_layers.append(Linear(self.embed_dim, out_proj_size, bias=bias)) + self.in_proj_q = Linear(self.embed_dim, out_proj_size, bias=bias) + v_layers.append(Linear(self.embed_dim, out_proj_size, bias=bias)) + + self.in_proj_k = nn.Sequential(*k_layers) + self.in_proj_v = nn.Sequential(*v_layers) + + if self.downsample: + self.out_proj = Linear(out_proj_size, self.head_dim, bias=bias) + else: + self.out_proj = Linear(out_proj_size, out_channels, bias=bias) + + self.scaling = self.head_dim ** -0.5 + + def forward( + self, + query, + key, + value, + mask_future_timesteps=False, + key_padding_mask=None, + use_scalar_bias=False, + ): + """Input shape: Time x Batch x Channel + Self-attention can be implemented by passing in the same arguments for + query, key and value. Future timesteps can be masked with the + `mask_future_timesteps` argument. Padding elements can be excluded from + the key by passing a binary ByteTensor (`key_padding_mask`) with shape: + batch x src_len, where padding elements are indicated by 1s. 
+ """ + src_len, bsz, out_channels = key.size() + tgt_len = query.size(0) + assert list(query.size()) == [tgt_len, bsz, out_channels] + assert key.size() == value.size() + + if key_padding_mask is not None: + assert key_padding_mask.size(0) == bsz + assert key_padding_mask.size(1) == src_len + + if self.downsample: + size = bsz + else: + size = bsz * self.num_heads + + k = key + v = value + q = query + if self.project_input: + q = self.in_proj_q(q) + k = self.in_proj_k(k) + v = self.in_proj_v(v) + src_len = k.size()[0] + q *= self.scaling + + if not self.downsample: + q = q.view(tgt_len, size, self.head_dim) + k = k.view(src_len, size, self.head_dim) + v = v.view(src_len, size, self.head_dim) + + q = q.transpose(0, 1) + k = k.transpose(0, 1) + v = v.transpose(0, 1) + + attn_weights = torch.bmm(q, k.transpose(1, 2)) + if mask_future_timesteps: + assert ( + query.size() == key.size() + ), "mask_future_timesteps only applies to self-attention" + attn_weights *= torch.tril( + attn_weights.data.new([1]).expand(tgt_len, tgt_len).clone(), + diagonal=-1, + )[:, :: self.head_index + 1 if self.downsample else 1].unsqueeze(0) + attn_weights += torch.triu( + attn_weights.data.new([-math.inf]).expand(tgt_len, tgt_len).clone(), + diagonal=0, + )[:, :: self.head_index + 1 if self.downsample else 1].unsqueeze(0) + tgt_size = tgt_len + if use_scalar_bias: + attn_weights = scalar_bias(attn_weights, 2) + v = scalar_bias(v, 1) + tgt_size += 1 + + if key_padding_mask is not None: + # don't attend to padding symbols + if key_padding_mask.max() > 0: + if self.downsample: + attn_weights = attn_weights.view(bsz, 1, tgt_len, src_len) + else: + attn_weights = attn_weights.view( + size, self.num_heads, tgt_len, src_len + ) + attn_weights = attn_weights.masked_fill( + key_padding_mask.unsqueeze(1).unsqueeze(2), + -math.inf, + ) + attn_weights = attn_weights.view(size, tgt_len, src_len) + attn_weights = F.softmax(attn_weights, dim=-1) + attn_weights = self.dropout_module(attn_weights) + + attn = torch.bmm(attn_weights, v) + if self.downsample: + attn = attn.transpose(0, 1).contiguous().view(tgt_len, bsz, self.head_dim) + else: + attn = attn.transpose(0, 1).contiguous().view(tgt_len, bsz, self.embed_dim) + + attn = self.out_proj(attn) + + return attn, attn_weights + + +class DownsampledMultiHeadAttention(nn.ModuleList): + """ + Multi-headed attention with Gating and Downsampling + """ + + def __init__( + self, + out_channels, + embed_dim, + num_heads, + dropout=0.0, + bias=True, + project_input=True, + gated=False, + downsample=False, + ): + self.embed_dim = embed_dim + self.num_heads = num_heads + self.head_dim = embed_dim // num_heads + self.downsample = downsample + self.gated = gated + self.project_input = project_input + assert self.head_dim * num_heads == embed_dim + + if self.downsample: + attention_heads = [] + for index in range(self.num_heads): + attention_heads.append( + SingleHeadAttention( + out_channels, + self.embed_dim, + self.head_dim, + index, + dropout, + bias, + self.project_input, + self.gated, + self.downsample, + self.num_heads, + ) + ) + super().__init__(modules=attention_heads) + self.out_proj = Linear(embed_dim, out_channels, bias=bias) + else: + # either we have a list of attention heads, or just one attention head + # if not being downsampled, we can do the heads with one linear layer instead of separate ones + super().__init__() + self.attention_module = SingleHeadAttention( + out_channels, + self.embed_dim, + self.head_dim, + 1, + dropout, + bias, + self.project_input, + self.gated, + 
self.downsample, + self.num_heads, + ) + + def forward( + self, + query, + key, + value, + mask_future_timesteps=False, + key_padding_mask=None, + use_scalar_bias=False, + ): + src_len, bsz, embed_dim = key.size() + tgt_len = query.size(0) + assert embed_dim == self.embed_dim + assert list(query.size()) == [tgt_len, bsz, embed_dim] + assert key.size() == value.size() + + tgt_size = tgt_len + if use_scalar_bias: + tgt_size += 1 + + attn = [] + attn_weights = [] + if self.downsample: + for attention_head_number in range(self.num_heads): + # call the forward of each attention head + _attn, _attn_weight = self[attention_head_number]( + query, + key, + value, + mask_future_timesteps, + key_padding_mask, + use_scalar_bias, + ) + attn.append(_attn) + attn_weights.append(_attn_weight) + full_attn = torch.cat(attn, dim=2) + full_attn = self.out_proj(full_attn) + return full_attn, attn_weights[0].clone() + else: + _attn, _attn_weight = self.attention_module( + query, + key, + value, + mask_future_timesteps, + key_padding_mask, + use_scalar_bias, + ) + attn.append(_attn) + attn_weights.append(_attn_weight) + full_attn = torch.cat(attn, dim=2) + full_attn_weights = torch.cat(attn_weights) + full_attn_weights = full_attn_weights.view( + bsz, self.num_heads, tgt_size, src_len + ) + full_attn_weights = full_attn_weights.sum(dim=1) / self.num_heads + return full_attn, full_attn_weights + + +class Downsample(nn.Module): + """ + Selects every nth element, where n is the index + """ + + def __init__(self, index): + super().__init__() + self.index = index + + def forward(self, x): + return x[:: self.index + 1] + + +def Linear(in_features, out_features, dropout=0.0, bias=True): + """Weight-normalized Linear layer (input: B x T x C)""" + m = nn.Linear(in_features, out_features, bias=bias) + m.weight.data.normal_(mean=0, std=math.sqrt((1 - dropout) / in_features)) + m.bias.data.zero_() + return nn.utils.weight_norm(m) + + +def GatedLinear(in_features, out_features, dropout=0.0, bias=True): + """Weight-normalized Linear layer (input: B x T x C) with interspersed GLU units""" + return nn.Sequential( + Linear(in_features, out_features * 4, dropout, bias), + nn.GLU(), + Linear(out_features * 2, out_features * 2, dropout, bias), + nn.GLU(), + Linear(out_features, out_features, dropout, bias), + ) diff --git a/SpeechT5/fairseq/fairseq/modules/dynamic_convolution.py b/SpeechT5/fairseq/fairseq/modules/dynamic_convolution.py new file mode 100644 index 0000000000000000000000000000000000000000..0121d453b9e026f5128dd41fce691aa1b4486448 --- /dev/null +++ b/SpeechT5/fairseq/fairseq/modules/dynamic_convolution.py @@ -0,0 +1,310 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. 
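+
+# Example (illustrative sketch, not part of the upstream file): the pure-PyTorch
+# ``DynamicConv1dTBC`` predicts a (softmax-normalised) filter of length
+# ``kernel_size`` per time step and head from the input itself, then applies it
+# over the time dimension, sharing each head's filter across that head's channels.
+#
+#     import torch
+#     from fairseq.modules import DynamicConv1dTBC
+#
+#     conv = DynamicConv1dTBC(input_size=512, kernel_size=3, padding_l=2,
+#                             num_heads=8, weight_softmax=True)
+#     x = torch.randn(20, 4, 512)    # (time, batch, channel)
+#     y = conv(x)                    # -> (20, 4, 512)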
+ +import torch +import torch.nn as nn +import torch.nn.functional as F +from fairseq import utils +from fairseq.incremental_decoding_utils import with_incremental_state +from fairseq.modules.fairseq_dropout import FairseqDropout + +from .unfold import unfold1d + + +def DynamicConv( + input_size, + kernel_size=1, + padding_l=None, + num_heads=1, + weight_dropout=0.0, + weight_softmax=False, + renorm_padding=False, + bias=False, + conv_bias=False, + query_size=None, + in_proj=False, +): + if torch.cuda.is_available(): + try: + from fairseq.modules.dynamicconv_layer import DynamicconvLayer + + return DynamicconvLayer( + input_size, + kernel_size=kernel_size, + padding_l=padding_l, + num_heads=num_heads, + weight_dropout=weight_dropout, + weight_softmax=weight_softmax, + renorm_padding=renorm_padding, + bias=bias, + conv_bias=conv_bias, + query_size=query_size, + ) + except ImportError as e: + print(e) + return DynamicConv1dTBC( + input_size, + kernel_size=kernel_size, + padding_l=padding_l, + num_heads=num_heads, + weight_dropout=weight_dropout, + weight_softmax=weight_softmax, + renorm_padding=renorm_padding, + bias=bias, + conv_bias=conv_bias, + query_size=query_size, + ) + + +def Linear(in_features, out_features, bias=True): + m = nn.Linear(in_features, out_features, bias) + nn.init.xavier_uniform_(m.weight) + if bias: + nn.init.constant_(m.bias, 0.0) + return m + + +@with_incremental_state +class DynamicConv1dTBC(nn.Module): + """Dynamic lightweight convolution taking T x B x C inputs + Args: + input_size: # of channels of the input + kernel_size: convolution channels + padding_l: padding to the left when using "same" padding + num_heads: number of heads used. The weight is of shape (num_heads, 1, kernel_size) + weight_dropout: the drop rate of the DropConnect to drop the weight + weight_softmax: normalize the weight with softmax before the convolution + renorm_padding: re-normalize the filters to ignore the padded part (only the non-padding parts sum up to 1) + bias: use bias + conv_bias: bias of the convolution + query_size: specified when feeding a different input as the query + in_proj: project the input and generate the filter together + + Shape: + Input: TxBxC, i.e. (timesteps, batch_size, input_size) + Output: TxBxC, i.e. 
(timesteps, batch_size, input_size) + + Attributes: + weight: the learnable weights of the module of shape + `(num_heads, 1, kernel_size)` + bias: the learnable bias of the module of shape `(input_size)` + """ + + def __init__( + self, + input_size, + kernel_size=1, + padding_l=None, + num_heads=1, + weight_dropout=0.0, + weight_softmax=False, + renorm_padding=False, + bias=False, + conv_bias=False, + query_size=None, + in_proj=False, + ): + super().__init__() + self.input_size = input_size + self.query_size = input_size if query_size is None else query_size + self.kernel_size = kernel_size + self.padding_l = padding_l + self.num_heads = num_heads + self.weight_dropout_module = FairseqDropout( + weight_dropout, module_name=self.__class__.__name__ + ) + self.weight_softmax = weight_softmax + self.renorm_padding = renorm_padding + + if in_proj: + self.weight_linear = Linear( + self.input_size, self.input_size + num_heads * kernel_size * 1 + ) + else: + self.weight_linear = Linear( + self.query_size, num_heads * kernel_size * 1, bias=bias + ) + if conv_bias: + self.conv_bias = nn.Parameter(torch.Tensor(input_size)) + else: + self.conv_bias = None + self.reset_parameters() + + @property + def in_proj(self): + return ( + self.weight_linear.out_features + == self.input_size + self.num_heads * self.kernel_size + ) + + def reset_parameters(self): + self.weight_linear.reset_parameters() + if self.conv_bias is not None: + nn.init.constant_(self.conv_bias, 0.0) + + def forward(self, x, incremental_state=None, query=None, unfold=None): + """Assuming the input, x, of the shape T x B x C and producing an output in the shape T x B x C + args: + x: Input of shape T x B x C, i.e. (timesteps, batch_size, input_size) + incremental_state: A dict to keep the state + unfold: unfold the input or not. If not, we use the matrix trick instead + query: use the specified query to predict the conv filters + """ + unfold = ( + x.size(0) > 512 if unfold is None else unfold + ) # use unfold mode as default for long sequence to save memory + unfold = unfold or (incremental_state is not None) + assert query is None or not self.in_proj + + if query is None: + query = x + if unfold: + output = self._forward_unfolded(x, incremental_state, query) + else: + output = self._forward_expanded(x, incremental_state, query) + + if self.conv_bias is not None: + output = output + self.conv_bias.view(1, 1, -1) + return output + + def _forward_unfolded(self, x, incremental_state, query): + """The conventional implementation of convolutions. 
+ Unfolding the input by having a window shifting to the right.""" + T, B, C = x.size() + K, H = self.kernel_size, self.num_heads + R = C // H + assert R * H == C == self.input_size + + if self.in_proj: + proj = self.weight_linear(x) + x = proj.narrow(2, 0, self.input_size).contiguous() + weight = ( + proj.narrow(2, self.input_size, H * K).contiguous().view(T * B * H, -1) + ) + else: + weight = self.weight_linear(query).view(T * B * H, -1) + + # renorm_padding is only implemented in _forward_expanded + assert not self.renorm_padding or incremental_state is not None + + if incremental_state is not None: + input_buffer = self._get_input_buffer(incremental_state) + if input_buffer is None: + input_buffer = x.new() + x_unfold = torch.cat([input_buffer, x.unsqueeze(3)], dim=3) + if self.kernel_size > 1: + self._set_input_buffer( + incremental_state, x_unfold[:, :, :, -self.kernel_size + 1 :] + ) + x_unfold = x_unfold.view(T * B * H, R, -1) + else: + padding_l = self.padding_l + if K > T and padding_l == K - 1: + weight = weight.narrow(1, K - T, T) + K, padding_l = T, T - 1 + # unfold the input: T x B x C --> T' x B x C x K + x_unfold = unfold1d(x, K, padding_l, 0) + x_unfold = x_unfold.view(T * B * H, R, K) + + if self.weight_softmax and not self.renorm_padding: + weight = F.softmax(weight, dim=1) + weight = weight.narrow(1, 0, K) + + if incremental_state is not None: + weight = weight[:, -x_unfold.size(2) :] + K = weight.size(1) + + if self.weight_softmax and self.renorm_padding: + weight = F.softmax(weight, dim=1) + + weight = self.weight_dropout_module(weight, inplace=False) + + output = torch.bmm(x_unfold, weight.unsqueeze(2)) # T*B*H x R x 1 + output = output.view(T, B, C) + return output + + def _forward_expanded(self, x, incremental_stat, query): + """Turn the convolution filters into band matrices and do matrix multiplication. + This is faster when the sequence is short, but less memory efficient. + This is not used in the decoder during inference. 
+ """ + T, B, C = x.size() + K, H = self.kernel_size, self.num_heads + R = C // H + assert R * H == C == self.input_size + if self.in_proj: + proj = self.weight_linear(x) + x = proj.narrow(2, 0, self.input_size).contiguous() + weight = ( + proj.narrow(2, self.input_size, H * K).contiguous().view(T * B * H, -1) + ) + else: + weight = self.weight_linear(query).view(T * B * H, -1) + + if not self.renorm_padding: + if self.weight_softmax: + weight = F.softmax(weight, dim=1) + weight = self.weight_dropout_module(weight, inplace=False) + weight = weight.narrow(1, 0, K).contiguous() + weight = weight.view(T, B * H, K).transpose(0, 1) + + x = x.view(T, B * H, R).transpose(0, 1) + if self.weight_softmax and self.renorm_padding: + # turn the convolution filters into band matrices + weight_expanded = weight.new(B * H, T, T + K - 1).fill_(float("-inf")) + weight_expanded.as_strided( + (B * H, T, K), (T * (T + K - 1), T + K, 1) + ).copy_(weight) + weight_expanded = weight_expanded.narrow(2, self.padding_l, T) + # normalize the weight over valid positions like self-attention + weight_expanded = F.softmax(weight_expanded, dim=2) + weight_expanded = self.weight_dropout_module(weight_expanded, inplace=False) + else: + P = self.padding_l + # For efficiency, we cut the kernel size and reduce the padding when the kernel is larger than the length + if K > T and P == K - 1: + weight = weight.narrow(2, K - T, T) + K, P = T, T - 1 + # turn the convolution filters into band matrices + weight_expanded = weight.new_zeros(B * H, T, T + K - 1, requires_grad=False) + weight_expanded.as_strided( + (B * H, T, K), (T * (T + K - 1), T + K, 1) + ).copy_(weight) + weight_expanded = weight_expanded.narrow(2, P, T) # B*H x T x T + output = torch.bmm(weight_expanded, x) + output = output.transpose(0, 1).contiguous().view(T, B, C) + return output + + def reorder_incremental_state(self, incremental_state, new_order): + input_buffer = self._get_input_buffer(incremental_state) + if input_buffer is not None: + input_buffer = input_buffer.index_select(1, new_order) + self._set_input_buffer(incremental_state, input_buffer) + + def _get_input_buffer(self, incremental_state): + return utils.get_incremental_state(self, incremental_state, "input_buffer") + + def _set_input_buffer(self, incremental_state, new_buffer): + return utils.set_incremental_state( + self, incremental_state, "input_buffer", new_buffer + ) + + def extra_repr(self): + s = "{}, kernel_size={}, padding_l={}, num_heads={}, weight_softmax={}, conv_bias={}, renorm_padding={}, in_proj={}".format( + self.input_size, + self.kernel_size, + self.padding_l, + self.num_heads, + self.weight_softmax, + self.conv_bias is not None, + self.renorm_padding, + self.in_proj, + ) + + if self.query_size != self.input_size: + s += ", query_size={}".format(self.query_size) + if self.weight_dropout_module.p > 0.0: + s += ", weight_dropout={}".format(self.weight_dropout_module.p) + return s diff --git a/SpeechT5/fairseq/fairseq/modules/dynamic_crf_layer.py b/SpeechT5/fairseq/fairseq/modules/dynamic_crf_layer.py new file mode 100644 index 0000000000000000000000000000000000000000..8fcc6b8d2672d2eacc6d01b9688bac44d5e1ce26 --- /dev/null +++ b/SpeechT5/fairseq/fairseq/modules/dynamic_crf_layer.py @@ -0,0 +1,189 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. 
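# ---------------------------------------------------------------------------
# Example usage of the DynamicConv1dTBC module defined above -- a minimal
# sketch, assuming fairseq from this tree is installed so the class can be
# imported from fairseq.modules. It feeds a T x B x C tensor through both the
# band-matrix ("expanded") path used for short sequences and the unfold path,
# which should agree up to floating-point tolerance.

import torch
from fairseq.modules import DynamicConv1dTBC

T, B, C, H, K = 10, 2, 64, 4, 3
conv = DynamicConv1dTBC(C, kernel_size=K, padding_l=K - 1, num_heads=H, weight_softmax=True)

x = torch.randn(T, B, C)            # timesteps x batch x channels, as documented above
y = conv(x)                         # short sequence -> _forward_expanded (band matrices)
print(y.shape)                      # torch.Size([10, 2, 64])

y_unfold = conv(x, unfold=True)     # force the memory-friendly _forward_unfolded path
assert torch.allclose(y, y_unfold, atol=1e-5)
# ---------------------------------------------------------------------------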
+ +""" +This file is to re-implemented the low-rank and beam approximation of CRF layer +Proposed by: + +Sun, Zhiqing, et al. +Fast Structured Decoding for Sequence Models +https://arxiv.org/abs/1910.11555 + +The CRF implementation is mainly borrowed from +https://github.com/kmkurn/pytorch-crf/blob/master/torchcrf/__init__.py + +""" + +import numpy as np +import torch +import torch.nn as nn + + +def logsumexp(x, dim=1): + return torch.logsumexp(x.float(), dim=dim).type_as(x) + + +class DynamicCRF(nn.Module): + """Dynamic CRF layer is used to approximate the traditional + Conditional Random Fields (CRF) + $P(y | x) = 1/Z(x) exp(sum_i s(y_i, x) + sum_i t(y_{i-1}, y_i, x))$ + + where in this function, we assume the emition scores (s) are given, + and the transition score is a |V| x |V| matrix $M$ + + in the following two aspects: + (1) it used a low-rank approximation for the transition matrix: + $M = E_1 E_2^T$ + (2) it used a beam to estimate the normalizing factor Z(x) + """ + + def __init__(self, num_embedding, low_rank=32, beam_size=64): + super().__init__() + + self.E1 = nn.Embedding(num_embedding, low_rank) + self.E2 = nn.Embedding(num_embedding, low_rank) + + self.vocb = num_embedding + self.rank = low_rank + self.beam = beam_size + + def extra_repr(self): + return "vocab_size={}, low_rank={}, beam_size={}".format( + self.vocb, self.rank, self.beam + ) + + def forward(self, emissions, targets, masks, beam=None): + """ + Compute the conditional log-likelihood of a sequence of target tokens given emission scores + + Args: + emissions (`~torch.Tensor`): Emission score are usually the unnormalized decoder output + ``(batch_size, seq_len, vocab_size)``. We assume batch-first + targets (`~torch.LongTensor`): Sequence of target token indices + ``(batch_size, seq_len) + masks (`~torch.ByteTensor`): Mask tensor with the same size as targets + + Returns: + `~torch.Tensor`: approximated log-likelihood + """ + numerator = self._compute_score(emissions, targets, masks) + denominator = self._compute_normalizer(emissions, targets, masks, beam) + return numerator - denominator + + def forward_decoder(self, emissions, masks=None, beam=None): + """ + Find the most likely output sequence using Viterbi algorithm. + + Args: + emissions (`~torch.Tensor`): Emission score are usually the unnormalized decoder output + ``(batch_size, seq_len, vocab_size)``. We assume batch-first + masks (`~torch.ByteTensor`): Mask tensor with the same size as targets + + Returns: + `~torch.LongTensor`: decoded sequence from the CRF model + """ + return self._viterbi_decode(emissions, masks, beam) + + def _compute_score(self, emissions, targets, masks=None): + batch_size, seq_len = targets.size() + emission_scores = emissions.gather(2, targets[:, :, None])[:, :, 0] # B x T + transition_scores = (self.E1(targets[:, :-1]) * self.E2(targets[:, 1:])).sum(2) + + scores = emission_scores + scores[:, 1:] += transition_scores + + if masks is not None: + scores = scores * masks.type_as(scores) + return scores.sum(-1) + + def _compute_normalizer(self, emissions, targets=None, masks=None, beam=None): + # HACK: we include "target" which is a hueristic for training + # HACK: we use a beam of tokens to approximate the normalizing factor (which is bad?) 
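        # In other words: instead of summing over the full vocabulary, Z(x) is
        # approximated by summing only over the top-`beam` candidates at each
        # position; during training the gold target is first scattered to +inf
        # so it is guaranteed to fall inside the beam, keeping the numerator
        # path covered by the (approximate) denominator.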
+ + beam = beam if beam is not None else self.beam + batch_size, seq_len = emissions.size()[:2] + if targets is not None: + _emissions = emissions.scatter(2, targets[:, :, None], np.float("inf")) + beam_targets = _emissions.topk(beam, 2)[1] + beam_emission_scores = emissions.gather(2, beam_targets) + else: + beam_emission_scores, beam_targets = emissions.topk(beam, 2) + beam_transition_score1 = self.E1(beam_targets[:, :-1]) # B x (T-1) x K x D + beam_transition_score2 = self.E2(beam_targets[:, 1:]) # B x (T-1) x K x D + beam_transition_matrix = torch.bmm( + beam_transition_score1.view(-1, beam, self.rank), + beam_transition_score2.view(-1, beam, self.rank).transpose(1, 2), + ) + beam_transition_matrix = beam_transition_matrix.view(batch_size, -1, beam, beam) + + # compute the normalizer in the log-space + score = beam_emission_scores[:, 0] # B x K + for i in range(1, seq_len): + next_score = score[:, :, None] + beam_transition_matrix[:, i - 1] + next_score = logsumexp(next_score, dim=1) + beam_emission_scores[:, i] + + if masks is not None: + score = torch.where(masks[:, i : i + 1], next_score, score) + else: + score = next_score + + # Sum (log-sum-exp) over all possible tags + return logsumexp(score, dim=1) + + def _viterbi_decode(self, emissions, masks=None, beam=None): + # HACK: we use a beam of tokens to approximate the normalizing factor (which is bad?) + + beam = beam if beam is not None else self.beam + batch_size, seq_len = emissions.size()[:2] + beam_emission_scores, beam_targets = emissions.topk(beam, 2) + beam_transition_score1 = self.E1(beam_targets[:, :-1]) # B x (T-1) x K x D + beam_transition_score2 = self.E2(beam_targets[:, 1:]) # B x (T-1) x K x D + beam_transition_matrix = torch.bmm( + beam_transition_score1.view(-1, beam, self.rank), + beam_transition_score2.view(-1, beam, self.rank).transpose(1, 2), + ) + beam_transition_matrix = beam_transition_matrix.view(batch_size, -1, beam, beam) + + traj_tokens, traj_scores = [], [] + finalized_tokens, finalized_scores = [], [] + + # compute the normalizer in the log-space + score = beam_emission_scores[:, 0] # B x K + dummy = ( + torch.arange(beam, device=score.device).expand(*score.size()).contiguous() + ) + + for i in range(1, seq_len): + traj_scores.append(score) + _score = score[:, :, None] + beam_transition_matrix[:, i - 1] + _score, _index = _score.max(dim=1) + _score = _score + beam_emission_scores[:, i] + + if masks is not None: + score = torch.where(masks[:, i : i + 1], _score, score) + index = torch.where(masks[:, i : i + 1], _index, dummy) + else: + score, index = _score, _index + traj_tokens.append(index) + + # now running the back-tracing and find the best + best_score, best_index = score.max(dim=1) + finalized_tokens.append(best_index[:, None]) + finalized_scores.append(best_score[:, None]) + + for idx, scs in zip(reversed(traj_tokens), reversed(traj_scores)): + previous_index = finalized_tokens[-1] + finalized_tokens.append(idx.gather(1, previous_index)) + finalized_scores.append(scs.gather(1, previous_index)) + + finalized_tokens.reverse() + finalized_tokens = torch.cat(finalized_tokens, 1) + finalized_tokens = beam_targets.gather(2, finalized_tokens[:, :, None])[:, :, 0] + + finalized_scores.reverse() + finalized_scores = torch.cat(finalized_scores, 1) + finalized_scores[:, 1:] = finalized_scores[:, 1:] - finalized_scores[:, :-1] + + return finalized_scores, finalized_tokens diff --git a/SpeechT5/fairseq/fairseq/modules/dynamicconv_layer/__init__.py b/SpeechT5/fairseq/fairseq/modules/dynamicconv_layer/__init__.py 
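# ---------------------------------------------------------------------------
# Example usage of the DynamicCRF layer defined above -- a minimal sketch,
# assuming fairseq from this tree is installed; shapes are illustrative.
# Note that _compute_normalizer calls np.float, so the training path as
# written assumes NumPy < 1.24.

import torch
from fairseq.modules.dynamic_crf_layer import DynamicCRF

vocab, batch, seq_len = 1000, 2, 7
crf = DynamicCRF(num_embedding=vocab, low_rank=32, beam_size=64)

emissions = torch.randn(batch, seq_len, vocab)           # unnormalized decoder scores, batch-first
targets = torch.randint(0, vocab, (batch, seq_len))
masks = torch.ones(batch, seq_len, dtype=torch.bool)

log_likelihood = crf(emissions, targets, masks)          # (batch,) approximate log P(y | x)
loss = -log_likelihood.mean()                            # minimise the negative log-likelihood

scores, tokens = crf.forward_decoder(emissions, masks)   # beam-approximated Viterbi decoding
print(tokens.shape)                                       # torch.Size([2, 7])
# ---------------------------------------------------------------------------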
new file mode 100644 index 0000000000000000000000000000000000000000..22dc6f403d2a0ecdb1b9e7e69ed96bd560e93b2c --- /dev/null +++ b/SpeechT5/fairseq/fairseq/modules/dynamicconv_layer/__init__.py @@ -0,0 +1,6 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +from .dynamicconv_layer import DynamicconvLayer # noqa diff --git a/SpeechT5/fairseq/fairseq/modules/dynamicconv_layer/cuda_function_gen.py b/SpeechT5/fairseq/fairseq/modules/dynamicconv_layer/cuda_function_gen.py new file mode 100644 index 0000000000000000000000000000000000000000..9304f99eb8169a614f39babc830c84cac80e080b --- /dev/null +++ b/SpeechT5/fairseq/fairseq/modules/dynamicconv_layer/cuda_function_gen.py @@ -0,0 +1,223 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + + +def gen_forward(): + + kernels = [3, 5, 7, 15, 31, 63, 127, 255] + blocks = [32, 64, 128, 256] + + head = """ +/** + * Copyright (c) Facebook, Inc. and its affiliates. + * + * This source code is licensed under the MIT license found in the + * LICENSE file in the root directory of this source tree. + */ + +#include "dynamicconv_cuda.cuh" + +std::vector<at::Tensor> dynamicconv_cuda_forward(at::Tensor input, at::Tensor weight, int padding_l) { + + at::DeviceGuard g(input.device()); + const auto minibatch = input.size(0); + const auto numFeatures = input.size(1); + const auto sequenceLength = input.size(2); + + const auto numHeads = weight.size(1); + const auto filterSize = weight.size(2); + + const auto numFiltersInBlock = numFeatures / numHeads; + const dim3 blocks(minibatch, numFeatures); + + auto output = at::zeros_like(input); + auto stream = at::cuda::getCurrentCUDAStream(); +""" + + switch = """ + switch(filterSize) { +""" + + case_k = """ + case {k}: +""" + + main_block = """ + if (padding_l == {pad}) {{ + AT_DISPATCH_FLOATING_TYPES_AND_HALF(input.scalar_type(), "dynamicconv_forward", ([&] {{ + dynamicconv_forward_kernel<{k}, {b_size}, {pad}, scalar_t> + <<<blocks, {b_size}, 0, stream>>>( + input.data<scalar_t>(), + weight.data<scalar_t>(), + minibatch, + sequenceLength, + numFeatures, + numFiltersInBlock, + numHeads, + output.data<scalar_t>()); + }})); + }} else +""" + + bad_padding = """ + { + std::cout << "WARNING: Unsupported padding size - skipping forward pass" << std::endl; + } + break;\n +""" + + end = """ + default: + std::cout << "WARNING: Unsupported filter length passed - skipping forward pass" << std::endl; + } + + return {output}; +} +""" + + with open("dynamicconv_cuda_forward.cu", "w") as forward: + forward.write(head) + forward.write(switch) + for k in kernels: + b_size = 32 + for b in blocks: + if b > k: + b_size = b + break + forward.write(case_k.format(k=k)) + for pad in [k // 2, k - 1]: + forward.write(main_block.format(k=k, b_size=b_size, pad=pad)) + forward.write(bad_padding) + forward.write(end) + + +def gen_backward(): + + kernels = [3, 5, 7, 15, 31, 63, 127, 255] + thresh = [512, 512, 512, 512, 512, 380, 256, 256] + min_block = [64, 64, 64, 64, 64, 64, 128, 256] + seqs = [32 * x for x in [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16]] + + head = """ +/** + * Copyright (c) Facebook, Inc. and its affiliates. + * + * This source code is licensed under the MIT license found in the + * LICENSE file in the root directory of this source tree. 
+ */ + +#include "dynamicconv_cuda.cuh" + +std::vector<at::Tensor> dynamicconv_cuda_backward(at::Tensor gradOutput, int padding_l, at::Tensor input, at::Tensor weight) { + + at::DeviceGuard g(input.device()); + const auto minibatch = input.size(0); + const auto numFeatures = input.size(1); + const auto sequenceLength = input.size(2); + + const auto numHeads = weight.size(1); + const auto filterSize = weight.size(2); + + const auto numFiltersInBlock = numFeatures / numHeads; + auto numChunks = 1; + + auto gradInput = at::zeros_like(input); + auto gradWeight = at::zeros_like(weight); + auto stream = at::cuda::getCurrentCUDAStream(); + + dim3 blocks(minibatch, numHeads, numChunks); +""" + + sequence_if = """ + if (sequenceLength < {seq}) {{ + switch(filterSize) {{ +""" + + case_k = """ + case {k}: +""" + + chunks_reset = """ + numChunks = int(ceilf(sequenceLength/float({b_size}))); + blocks = dim3(minibatch, numHeads, numChunks); +""" + + main_block = """ + if (padding_l == {p}) {{ + AT_DISPATCH_FLOATING_TYPES_AND_HALF(gradOutput.scalar_type(), "dynamicconv_backward", ([&] {{ + dynamicconv_backward_kernel<{k}, {b_size}, {p}, scalar_t> + <<<blocks, {b_size}, 0, stream>>>( + gradOutput.data<scalar_t>(), + input.data<scalar_t>(), + weight.data<scalar_t>(), + minibatch, + sequenceLength, + numFeatures, + numFiltersInBlock, + numHeads, + gradWeight.data<scalar_t>(), + gradInput.data<scalar_t>()); + }})); + }} else +""" + + bad_padding = """ + { + std::cout << "WARNING: Unsupported padding size - skipping backward pass" << std::endl; + } + break;\n +""" + + bad_filter = """ + default: + std::cout << "WARNING: Unsupported filter length passed - skipping backward pass" << std::endl; + } +""" + + con_else = """ + } else +""" + + final_else = """ + { + switch(filterSize) { +""" + + last_return = """ + } + return {gradInput, gradWeight}; +} +""" + + with open("dynamicconv_cuda_backward.cu", "w") as backward: + backward.write(head) + for seq in seqs: + backward.write(sequence_if.format(seq=seq)) + for k, t, m in zip(kernels, thresh, min_block): + backward.write(case_k.format(k=k)) + if seq <= t: + b_size = seq + else: + b_size = m + backward.write(chunks_reset.format(b_size=b_size)) + for p in [k // 2, k - 1]: + backward.write(main_block.format(k=k, b_size=b_size, p=p)) + backward.write(bad_padding) + backward.write(bad_filter) + backward.write(con_else) + backward.write(final_else) + for k, m in zip(kernels, min_block): + backward.write(case_k.format(k=k)) + backward.write(chunks_reset.format(b_size=m)) + for p in [k // 2, k - 1]: + backward.write(main_block.format(k=k, b_size=m, p=p)) + backward.write(bad_padding) + backward.write(bad_filter) + backward.write(last_return) + + +if __name__ == "__main__": + gen_forward() + gen_backward() diff --git a/SpeechT5/fairseq/fairseq/modules/dynamicconv_layer/dynamicconv_cuda.cpp b/SpeechT5/fairseq/fairseq/modules/dynamicconv_layer/dynamicconv_cuda.cpp new file mode 100644 index 0000000000000000000000000000000000000000..ebd4df0e9608d769f31eadc6e0b487505f11b279 --- /dev/null +++ b/SpeechT5/fairseq/fairseq/modules/dynamicconv_layer/dynamicconv_cuda.cpp @@ -0,0 +1,56 @@ +/** + * Copyright (c) Facebook, Inc. and its affiliates. + * + * This source code is licensed under the MIT license found in the + * LICENSE file in the root directory of this source tree. 
+ */ + +#include <torch/extension.h> +#include <vector> + +std::vector<at::Tensor> dynamicconv_cuda_forward( + at::Tensor input, + at::Tensor filters, + int padding_l); + +std::vector<at::Tensor> dynamicconv_cuda_backward( + at::Tensor gradOutput, + int padding_l, + at::Tensor input, + at::Tensor filters); + + +#define CHECK_CUDA(x) AT_ASSERTM(x.type().is_cuda(), #x " must be a CUDA tensor") +#define CHECK_CONTIGUOUS(x) AT_ASSERTM(x.is_contiguous(), #x " must be contiguous") +#define CHECK_INPUT(x) CHECK_CUDA(x); CHECK_CONTIGUOUS(x) + +std::vector<at::Tensor> dynamicconv_forward( + at::Tensor input, + at::Tensor filters, + int padding_l) { + + CHECK_INPUT(input); + CHECK_INPUT(filters); + + return dynamicconv_cuda_forward(input, filters, + padding_l); +} + +std::vector<at::Tensor> dynamicconv_backward( + at::Tensor gradOutput, + int padding_l, + at::Tensor input, + at::Tensor filters) { + + CHECK_INPUT(gradOutput); + CHECK_INPUT(input); + CHECK_INPUT(filters); + + return dynamicconv_cuda_backward(gradOutput, padding_l, + input, filters); +} + +PYBIND11_MODULE(TORCH_EXTENSION_NAME, m) { + m.def("forward", &dynamicconv_forward, "dynamicconv forward (CUDA)"); + m.def("backward", &dynamicconv_backward, "dynamicconv backward (CUDA)"); +} diff --git a/SpeechT5/fairseq/fairseq/modules/dynamicconv_layer/dynamicconv_cuda.cuh b/SpeechT5/fairseq/fairseq/modules/dynamicconv_layer/dynamicconv_cuda.cuh new file mode 100644 index 0000000000000000000000000000000000000000..2196259433aefc88f96cd5bbcae57740a9a8c2dc --- /dev/null +++ b/SpeechT5/fairseq/fairseq/modules/dynamicconv_layer/dynamicconv_cuda.cuh @@ -0,0 +1,51 @@ +/** + * Copyright (c) Facebook, Inc. and its affiliates. + * + * This source code is licensed under the MIT license found in the + * LICENSE file in the root directory of this source tree. + */ + +#include <ATen/ATen.h> +#include <c10/cuda/CUDAStream.h> + +#include <cuda.h> +#include <cuda_fp16.h> +#include <cuda_runtime.h> + +#include <algorithm> +#include <functional> +#include <iostream> +#include <stdexcept> +#include <utility> +#include <vector> + +#include <stdlib.h> +#include <assert.h> +#include <math.h> + +#define SHFL_MASK 0xffffffff + +template<int FS, int SB, int padding_l, typename scalar_t> +__global__ +void dynamicconv_forward_kernel(const scalar_t* input, + const scalar_t* weight, + int minibatch, + int sequenceLength, + int numFeatures, + int numFiltersInBlock, + int numHeads, + scalar_t* output); + +template<int FS, int SB, int padding_l, typename scalar_t> +__global__ +void dynamicconv_backward_kernel( + const scalar_t* gradOutput, // B * C * T + const scalar_t* input, // B * C * T + const scalar_t* weight, + int minibatch, + int sequenceLength, + int numFeatures, + int numFiltersInBlock, + int numHeads, + scalar_t* gradWeight, + scalar_t* gradInput); // B * H * k * T diff --git a/SpeechT5/fairseq/fairseq/modules/dynamicconv_layer/dynamicconv_cuda_kernel.cu b/SpeechT5/fairseq/fairseq/modules/dynamicconv_layer/dynamicconv_cuda_kernel.cu new file mode 100644 index 0000000000000000000000000000000000000000..300d35b6478080a9594a22e335988c321d43127f --- /dev/null +++ b/SpeechT5/fairseq/fairseq/modules/dynamicconv_layer/dynamicconv_cuda_kernel.cu @@ -0,0 +1,168 @@ +/** + * Copyright (c) Facebook, Inc. and its affiliates. + * + * This source code is licensed under the MIT license found in the + * LICENSE file in the root directory of this source tree. 
+ */ + +#include "dynamicconv_cuda.cuh" +#include "dynamicconv_cuda_forward.cu" +#include "dynamicconv_cuda_backward.cu" +#include "../cuda_utils.cu" + +// FS is filter size and kernels are specialized for filter sizes +template<int FS, int SB, int padding_l, typename scalar_t> +__global__ +void dynamicconv_forward_kernel(const scalar_t* input, + const scalar_t* weight, + int minibatch, + int sequenceLength, + int numFeatures, + int numFiltersInBlock, + int numHeads, + scalar_t* output) { + assert(blockDim.x == SB); + + const int tid = threadIdx.x; + const int batchIdx = blockIdx.x; + const int featureIdx = blockIdx.y; + const int head = featureIdx / numFiltersInBlock; + + const int IOOffset = batchIdx * numFeatures * sequenceLength + + featureIdx * sequenceLength; + const scalar_t* inputFeature = &input[IOOffset]; + scalar_t* outputFeature = &output[IOOffset]; + + scalar_t filter[FS]; + + __shared__ scalar_t tempInput[SB + FS]; + zeroSharedMem<FS, SB, padding_l>(tempInput); + + const int numIterations = divUp<int, int>(sequenceLength, SB); + + for (int i = 0; i < numIterations; ++i) { + __syncthreads(); + const int inputOffset = i * SB; + load_input_to_shared<FS, SB, padding_l>(inputFeature, inputOffset, + sequenceLength, i, + numIterations, false, tempInput); + __syncthreads(); + if (inputOffset + tid < sequenceLength) { + + #pragma unroll + for (int k = 0; k < FS; ++k) { + const int filterOffset = batchIdx * numHeads * FS * sequenceLength + + head * FS * sequenceLength + + k * sequenceLength + + i * SB + tid; + filter[k] = weight[filterOffset]; + } + + scalar_t out = scalar_t(0.0); + #pragma unroll + for (int k = 0; k < FS; ++k) { + out += filter[k] * tempInput[tid + k]; + } + + outputFeature[inputOffset + tid] = out; + + } + } +} + +template<int FS, int SB, int padding_l, typename scalar_t> +__global__ +void dynamicconv_backward_kernel( + const scalar_t* gradOutput, // B * C * T + const scalar_t* input, // B * C * T + const scalar_t* weight, + int minibatch, + int sequenceLength, + int numFeatures, + int numFiltersInBlock, + int numHeads, + scalar_t* gradWeight, + scalar_t* gradInput) { // B * H * k * T + + assert(blockDim.x == SB); + + // each block operates on a single batch and filter head + const int tid = threadIdx.x; + const int batchIdx = blockIdx.x; + const int headIdx = blockIdx.y; + const int chunkIdx = blockIdx.z; + + const int numChunks = divUp<int, int>(sequenceLength, SB); + const int inputOffset = chunkIdx * SB; + + // initialize shared memory for output gradient and input + __shared__ scalar_t tempGradOutput[SB + FS]; + __shared__ scalar_t tempInput[SB + FS]; + const int padding = FS - padding_l - 1; + + zeroSharedMem<FS, SB, padding>(tempGradOutput); + zeroSharedMem<FS, SB, padding_l>(tempInput); + + // initialize local filter and weight gradient sum arrays + scalar_t tempGradSum[FS]; + scalar_t bfilter[FS]; + for (int k = 0; k < FS; ++k) { + tempGradSum[k] = scalar_t(0.0); + + int idxOffset = inputOffset + tid + k - padding; + if (idxOffset >= 0 && idxOffset < sequenceLength) { + int bfilterOffset = batchIdx * numHeads * FS * sequenceLength + + headIdx * FS * sequenceLength + + (FS - k - 1) * sequenceLength + + idxOffset; + bfilter[k] = weight[bfilterOffset]; + } else { + bfilter[k] = scalar_t(0.0); + } + } + + + // iterate over filter block + for (int featureIdx = 0; featureIdx < numFiltersInBlock; ++featureIdx) { + __syncthreads(); + + // load input and output gradient for this channel and chunk + const int IOOffset = batchIdx * numFeatures * sequenceLength + + 
(headIdx * numFiltersInBlock + featureIdx) * sequenceLength; + const scalar_t* inputFeature = &input[IOOffset]; + const scalar_t* gradOutputFeature = &gradOutput[IOOffset]; + scalar_t* gradInputFeature = &gradInput[IOOffset]; + + load_input_to_shared<FS, SB, padding>(gradOutputFeature, inputOffset, + sequenceLength, chunkIdx, + numChunks, true, tempGradOutput); + load_input_to_shared<FS, SB, padding_l>(inputFeature, inputOffset, + sequenceLength, chunkIdx, + numChunks, true, tempInput); + __syncthreads(); + + // sum input and weight gradients + scalar_t out = scalar_t(0.0); + #pragma unroll + for (int k = 0; k < FS; ++k) { + tempGradSum[k] += tempInput[tid + k] * tempGradOutput[tid + padding]; + out += bfilter[k] * tempGradOutput[tid + k]; + } + + if (inputOffset + tid < sequenceLength) { + gradInputFeature[inputOffset + tid] = out; + } + } + + const int gradOffset = batchIdx * numHeads * FS * sequenceLength + + headIdx * FS * sequenceLength; + scalar_t *gradWeightFeature = &gradWeight[gradOffset]; + + // write weight gradient + if (inputOffset + tid < sequenceLength) { + for (int k = 0; k < FS; ++k) { + const int outputOffset = k * sequenceLength + inputOffset + tid; + gradWeightFeature[outputOffset] = tempGradSum[k]; + } + } +} diff --git a/SpeechT5/fairseq/fairseq/modules/dynamicconv_layer/dynamicconv_layer.py b/SpeechT5/fairseq/fairseq/modules/dynamicconv_layer/dynamicconv_layer.py new file mode 100644 index 0000000000000000000000000000000000000000..711ed03483f4089dbe91964a89021b49eeffbedc --- /dev/null +++ b/SpeechT5/fairseq/fairseq/modules/dynamicconv_layer/dynamicconv_layer.py @@ -0,0 +1,227 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. 
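# ---------------------------------------------------------------------------
# A standalone sketch (illustration only) of calling the compiled kernels
# above directly, assuming the dynamicconv_cuda extension has been built from
# setup.py in this directory and a GPU is available. The forward kernel
# expects the input as B x C x T and the filters as B x H x K x T, where K is
# one of the generated filter sizes (3, 5, 7, 15, 31, 63, 127, 255) and
# padding_l is either K // 2 or K - 1.

import torch
import dynamicconv_cuda

B, C, T, H, K = 2, 64, 50, 4, 3
x = torch.randn(B, C, T, device="cuda").contiguous()
w = torch.softmax(torch.randn(B, H, K, T, device="cuda"), dim=2).contiguous()

out = dynamicconv_cuda.forward(x, w, K - 1)[0]   # padding_l = K - 1 gives left-only ("causal") padding
print(out.shape)                                  # torch.Size([2, 64, 50])
# ---------------------------------------------------------------------------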
+ +import dynamicconv_cuda +import torch +import torch.nn.functional as F +from fairseq import utils +from fairseq.incremental_decoding_utils import with_incremental_state +from fairseq.modules.fairseq_dropout import FairseqDropout +from fairseq.modules.unfold import unfold1d +from torch import nn +from torch.autograd import Function + + +class dynamicconvFunction(Function): + @staticmethod + def forward(ctx, x, weights, padding_l): + ctx.padding_l = padding_l + outputs = dynamicconv_cuda.forward(x, weights, padding_l) + variables = [x, weights] + ctx.save_for_backward(*variables) + return outputs[0] + + @staticmethod + def backward(ctx, grad_output): + outputs = dynamicconv_cuda.backward( + grad_output.contiguous(), ctx.padding_l, *ctx.saved_tensors + ) + grad_input, grad_weights = outputs + return grad_input, grad_weights, None + + +@with_incremental_state +class DynamicconvLayer(nn.Module): + def __init__( + self, + input_size, + kernel_size=1, + padding_l=None, + weight_softmax=False, + num_heads=1, + weight_dropout=0.0, + bias=False, + renorm_padding=False, + conv_bias=False, + query_size=None, + ): + + super(DynamicconvLayer, self).__init__() + self.input_size = input_size + self.query_size = input_size if query_size is None else query_size + self.kernel_size = kernel_size + self.padding_l = padding_l + self.num_heads = num_heads + self.weight_softmax = weight_softmax + self.weight_dropout_module = FairseqDropout( + weight_dropout, module_name=self.__class__.__name__ + ) + self.renorm_padding = renorm_padding + self.bias = bias + + self.weight_linear = nn.Linear(input_size, num_heads * kernel_size, bias) + if conv_bias: + self.conv_bias = nn.Parameter(torch.Tensor(input_size)) + else: + self.conv_bias = None + self.reset_parameters() + + def reset_parameters(self): + nn.init.xavier_uniform_(self.weight_linear.weight) + if self.conv_bias is not None: + nn.init.constant_(self.conv_bias, 0.0) + nn.init.constant_(self.weight_linaer.bias, 0.0) + + def forward(self, x, incremental_state=None, query=None, unfold=None): + + T, B, C = x.size() + K, H = self.kernel_size, self.num_heads + # R = C // H + + # during inference time, incremental BMM is faster + if incremental_state is not None: + unfold = ( + x.size(0) > 512 if unfold is None else unfold + ) # use unfold mode as default for long sequence to save memory + unfold = unfold or (incremental_state is not None) + assert query is None + + if query is None: + query = x + if unfold: + output = self._forward_unfolded(x, incremental_state, query) + else: + output = self._forward_expanded(x, incremental_state, query) + + if self.conv_bias is not None: + output = output + self.conv_bias.view(1, 1, -1) + + return output + + # during training time, use CUDA kernel + else: + weight = self.weight_linear(x).view(T, B, H, K) + if self.weight_softmax: + weight = F.softmax(weight, dim=-1) + if self.weight_dropout_module.p: + weight = self.weight_dropout_module(weight) + + weight = weight.permute(1, 2, 3, 0).contiguous() + self.filters = weight + x = x.permute(1, 2, 0).contiguous() + output = dynamicconvFunction.apply(x, weight, self.padding_l).permute( + 2, 0, 1 + ) + if self.conv_bias is not None: + output = output + self.conv_bias.view(1, 1, -1) + return output + + def reorder_incremental_state(self, incremental_state, new_order): + input_buffer = self._get_input_buffer(incremental_state) + if input_buffer is not None: + input_buffer = input_buffer.index_select(1, new_order) + self._set_input_buffer(incremental_state, input_buffer) + + def 
_get_input_buffer(self, incremental_state): + return utils.get_incremental_state(self, incremental_state, "input_buffer") + + def _set_input_buffer(self, incremental_state, new_buffer): + return utils.set_incremental_state( + self, incremental_state, "input_buffer", new_buffer + ) + + def _forward_unfolded(self, x, incremental_state, query): + """The conventional implementation of convolutions. + Unfolding the input by having a window shifting to the right.""" + T, B, C = x.size() + K, H = self.kernel_size, self.num_heads + R = C // H + assert R * H == C == self.input_size + + weight = self.weight_linear(query).view(T * B * H, -1) + + # renorm_padding is only implemented in _forward_expanded + assert not self.renorm_padding or incremental_state is not None + + if incremental_state is not None: + input_buffer = self._get_input_buffer(incremental_state) + if input_buffer is None: + input_buffer = x.new() + x_unfold = torch.cat([input_buffer, x.unsqueeze(3)], dim=3) + if self.kernel_size > 1: + self._set_input_buffer( + incremental_state, x_unfold[:, :, :, -self.kernel_size + 1 :] + ) + x_unfold = x_unfold.view(T * B * H, R, -1) + else: + padding_l = self.padding_l + if K > T and padding_l == K - 1: + weight = weight.narrow(1, K - T, T) + K, padding_l = T, T - 1 + # unfold the input: T x B x C --> T' x B x C x K + x_unfold = unfold1d(x, K, padding_l, 0) + x_unfold = x_unfold.view(T * B * H, R, K) + + if self.weight_softmax and not self.renorm_padding: + weight = F.softmax(weight, dim=1) + weight = weight.narrow(1, 0, K) + + if incremental_state is not None: + weight = weight[:, -x_unfold.size(2) :] + K = weight.size(1) + + if self.weight_softmax and self.renorm_padding: + weight = F.softmax(weight, dim=1) + + weight = self.weight_dropout_module(weight, inplace=False) + + output = torch.bmm(x_unfold, weight.unsqueeze(2)) # T*B*H x R x 1 + output = output.view(T, B, C) + return output + + def _forward_expanded(self, x, incremental_stat, query): + """Turn the convolution filters into band matrices and do matrix multiplication. + This is faster when the sequence is short, but less memory efficient. + This is not used in the decoder during inference. 
+ """ + T, B, C = x.size() + K, H = self.kernel_size, self.num_heads + R = C // H + assert R * H == C == self.input_size + weight = self.weight_linear(query).view(T * B * H, -1) + + if not self.renorm_padding: + if self.weight_softmax: + weight = F.softmax(weight, dim=1) + weight = self.weight_dropout_module(weight, inplace=False) + weight = weight.narrow(1, 0, K).contiguous() + weight = weight.view(T, B * H, K).transpose(0, 1) + + x = x.view(T, B * H, R).transpose(0, 1) + if self.weight_softmax and self.renorm_padding: + # turn the convolution filters into band matrices + weight_expanded = weight.new(B * H, T, T + K - 1).fill_(float("-inf")) + weight_expanded.as_strided( + (B * H, T, K), (T * (T + K - 1), T + K, 1) + ).copy_(weight) + weight_expanded = weight_expanded.narrow(2, self.padding_l, T) + # normalize the weight over valid positions like self-attention + weight_expanded = F.softmax(weight_expanded, dim=2) + weight_expanded = self.weight_dropout_module(weight_expanded, inplace=False) + else: + P = self.padding_l + # For efficiency, we cut the kernel size and reduce the padding when the kernel is larger than the length + if K > T and P == K - 1: + weight = weight.narrow(2, K - T, T) + K, P = T, T - 1 + # turn the convolution filters into band matrices + weight_expanded = weight.new_zeros(B * H, T, T + K - 1, requires_grad=False) + weight_expanded.as_strided( + (B * H, T, K), (T * (T + K - 1), T + K, 1) + ).copy_(weight) + weight_expanded = weight_expanded.narrow(2, P, T) # B*H x T x T + output = torch.bmm(weight_expanded, x) + output = output.transpose(0, 1).contiguous().view(T, B, C) + return output diff --git a/SpeechT5/fairseq/fairseq/modules/dynamicconv_layer/dynamiconv_cpu.cpp b/SpeechT5/fairseq/fairseq/modules/dynamicconv_layer/dynamiconv_cpu.cpp new file mode 100644 index 0000000000000000000000000000000000000000..8a6af4285da3c40a01383541acf1f455ffc060fb --- /dev/null +++ b/SpeechT5/fairseq/fairseq/modules/dynamicconv_layer/dynamiconv_cpu.cpp @@ -0,0 +1,35 @@ +#include <torch/torch.h> +#include <vector> + +std::vector<float*> dynamicconv_cpu_forward( + float* input, + float* filters, + int padding_l); + +std::vector<float*> dynamicconv_cpu_backward( + float* gradOutput, + int padding_l, + float* input, + float* filters); + +std::vector<float*> dynamicconv_forward( + float* input, + float* filters, + int padding_l) { + + return dynamicconv_cpu_forward(input, filters, padding_l); +} + +std::vector<float*> dynamicconv_backward( + float* gradOutput, + int padding_l, + float* input, + float* filters) { + + return dynamicconv_cpu_backward(gradOutput, padding_l, input, filters); +} + +PYBIND11_MODULE(TORCH_EXTENSION_NAME, m) { + m.def("forward", &dynamicconv_forward, "dynamicconv forward (CPU)"); + m.def("backward", &dynamicconv_backward, "dynamicconv backward (CPU)"); +} diff --git a/SpeechT5/fairseq/fairseq/modules/dynamicconv_layer/setup.py b/SpeechT5/fairseq/fairseq/modules/dynamicconv_layer/setup.py new file mode 100644 index 0000000000000000000000000000000000000000..6a21f7e2ee0840a3b251522275a0b32a856951d7 --- /dev/null +++ b/SpeechT5/fairseq/fairseq/modules/dynamicconv_layer/setup.py @@ -0,0 +1,23 @@ +#!/usr/bin/env python3 +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. 
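# Build note (an assumption based on standard torch.utils.cpp_extension usage,
# not documented in this tree): the dynamicconv_cuda extension declared below
# is typically compiled in place with `python setup.py build_ext --inplace`
# run from this directory, after which `import dynamicconv_cuda` in
# dynamicconv_layer.py resolves.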
+ +from setuptools import setup +from torch.utils.cpp_extension import BuildExtension, CUDAExtension + + +setup( + name="dynamicconv_layer", + ext_modules=[ + CUDAExtension( + name="dynamicconv_cuda", + sources=[ + "dynamicconv_cuda.cpp", + "dynamicconv_cuda_kernel.cu", + ], + ), + ], + cmdclass={"build_ext": BuildExtension}, +) diff --git a/SpeechT5/fairseq/fairseq/modules/fairseq_dropout.py b/SpeechT5/fairseq/fairseq/modules/fairseq_dropout.py new file mode 100644 index 0000000000000000000000000000000000000000..3cddca77186f5ddd5cfb9c0ed6def9bafdf3bf1e --- /dev/null +++ b/SpeechT5/fairseq/fairseq/modules/fairseq_dropout.py @@ -0,0 +1,51 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +import logging +from typing import List, Optional + +import torch.nn as nn +import torch.nn.functional as F + + +logger = logging.getLogger(__name__) + + +class FairseqDropout(nn.Module): + def __init__(self, p, module_name=None): + super().__init__() + self.p = p + self.module_name = module_name + self.apply_during_inference = False + + def forward(self, x, inplace: bool = False): + if self.p > 0 and (self.training or self.apply_during_inference): + return F.dropout(x, p=self.p, training=True, inplace=inplace) + else: + return x + + def make_generation_fast_( + self, + name: str, + retain_dropout: bool = False, + retain_dropout_modules: Optional[List[str]] = None, + **kwargs + ): + if retain_dropout: + if retain_dropout_modules is not None and self.module_name is None: + logger.warning( + "Cannot enable dropout during inference for module {} " + "because module_name was not set".format(name) + ) + elif ( + retain_dropout_modules is None # if None, apply to all modules + or self.module_name in retain_dropout_modules + ): + logger.info( + "Enabling dropout during inference for module: {}".format(name) + ) + self.apply_during_inference = True + else: + logger.info("Disabling dropout for module: {}".format(name)) diff --git a/SpeechT5/fairseq/fairseq/modules/fp32_group_norm.py b/SpeechT5/fairseq/fairseq/modules/fp32_group_norm.py new file mode 100644 index 0000000000000000000000000000000000000000..d03aac022e30c8c14a600062d1d86429504ba003 --- /dev/null +++ b/SpeechT5/fairseq/fairseq/modules/fp32_group_norm.py @@ -0,0 +1,25 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. +""" +Layer norm done in fp32 (for fp16 training) +""" + +import torch.nn as nn +import torch.nn.functional as F + + +class Fp32GroupNorm(nn.GroupNorm): + def __init__(self, *args, **kwargs): + super().__init__(*args, **kwargs) + + def forward(self, input): + output = F.group_norm( + input.float(), + self.num_groups, + self.weight.float() if self.weight is not None else None, + self.bias.float() if self.bias is not None else None, + self.eps, + ) + return output.type_as(input) diff --git a/SpeechT5/fairseq/fairseq/modules/gelu.py b/SpeechT5/fairseq/fairseq/modules/gelu.py new file mode 100644 index 0000000000000000000000000000000000000000..a2f1ecff4a3ae3de3eb7d327b9163c46b18a15ed --- /dev/null +++ b/SpeechT5/fairseq/fairseq/modules/gelu.py @@ -0,0 +1,25 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. 
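# ---------------------------------------------------------------------------
# Example usage of the FairseqDropout module defined in fairseq_dropout.py
# above -- a minimal sketch, assuming fairseq from this tree is installed.
# Dropout is applied only in training mode unless the module is explicitly
# opted back in for inference via make_generation_fast_.

import torch
from fairseq.modules.fairseq_dropout import FairseqDropout

drop = FairseqDropout(p=0.1, module_name="demo")
x = torch.ones(4, 8)

drop.train()
y_train = drop(x)                # elements zeroed with prob. 0.1, survivors scaled by 1 / 0.9

drop.eval()
assert torch.equal(drop(x), x)   # identity: dropout is skipped outside training

# opt back in at inference time, e.g. for Monte Carlo dropout
drop.make_generation_fast_("demo", retain_dropout=True, retain_dropout_modules=["demo"])
y_mc = drop(x)                   # dropout is applied even though the module is in eval mode
# ---------------------------------------------------------------------------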
+""" +See "Gaussian Error Linear Units (GELUs)" by Dan Hendrycks and Kevin Gimpel with +the corresponding GitHub repo: https://github.com/hendrycks/GELUs +""" + +import math + +import torch +import torch.nn as nn + + +def gelu_accurate(x): + if not hasattr(gelu_accurate, "_a"): + gelu_accurate._a = math.sqrt(2 / math.pi) + return ( + 0.5 * x * (1 + torch.tanh(gelu_accurate._a * (x + 0.044715 * torch.pow(x, 3)))) + ) + + +def gelu(x: torch.Tensor) -> torch.Tensor: + return torch.nn.functional.gelu(x.float()).type_as(x) diff --git a/SpeechT5/fairseq/fairseq/modules/grad_multiply.py b/SpeechT5/fairseq/fairseq/modules/grad_multiply.py new file mode 100644 index 0000000000000000000000000000000000000000..08d15f55dfda9c61a1cf8641ea31424fe1d97f57 --- /dev/null +++ b/SpeechT5/fairseq/fairseq/modules/grad_multiply.py @@ -0,0 +1,18 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +import torch + + +class GradMultiply(torch.autograd.Function): + @staticmethod + def forward(ctx, x, scale): + ctx.scale = scale + res = x.new(x) + return res + + @staticmethod + def backward(ctx, grad): + return grad * ctx.scale, None diff --git a/SpeechT5/fairseq/fairseq/modules/gumbel_vector_quantizer.py b/SpeechT5/fairseq/fairseq/modules/gumbel_vector_quantizer.py new file mode 100644 index 0000000000000000000000000000000000000000..71134388889d7f224655957256e78fd6c02d72a3 --- /dev/null +++ b/SpeechT5/fairseq/fairseq/modules/gumbel_vector_quantizer.py @@ -0,0 +1,202 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +import torch +import torch.nn as nn +import torch.nn.functional as F + + +class GumbelVectorQuantizer(nn.Module): + def __init__( + self, + dim, + num_vars, + temp, + groups, + combine_groups, + vq_dim, + time_first, + activation=nn.GELU(), + weight_proj_depth=1, + weight_proj_factor=1, + ): + """Vector quantization using gumbel softmax + + Args: + dim: input dimension (channels) + num_vars: number of quantized vectors per group + temp: temperature for training. this should be a tuple of 3 elements: (start, stop, decay factor) + groups: number of groups for vector quantization + combine_groups: whether to use the vectors for all groups + vq_dim: dimensionality of the resulting quantized vector + time_first: if true, expect input in BxTxC format, otherwise in BxCxT + activation: what activation to use (should be a module). this is only used if weight_proj_depth is > 1 + weight_proj_depth: number of layers (with activation in between) to project input before computing logits + weight_proj_factor: this is used only if weight_proj_depth is > 1. 
scales the inner dimensionality of + projections by this factor + """ + super().__init__() + + self.groups = groups + self.combine_groups = combine_groups + self.input_dim = dim + self.num_vars = num_vars + self.time_first = time_first + + assert ( + vq_dim % groups == 0 + ), f"dim {vq_dim} must be divisible by groups {groups} for concatenation" + + var_dim = vq_dim // groups + num_groups = groups if not combine_groups else 1 + + self.vars = nn.Parameter(torch.FloatTensor(1, num_groups * num_vars, var_dim)) + nn.init.uniform_(self.vars) + + if weight_proj_depth > 1: + + def block(input_dim, output_dim): + return nn.Sequential(nn.Linear(input_dim, output_dim), activation) + + inner_dim = self.input_dim * weight_proj_factor + self.weight_proj = nn.Sequential( + *[ + block(self.input_dim if i == 0 else inner_dim, inner_dim) + for i in range(weight_proj_depth - 1) + ], + nn.Linear(inner_dim, groups * num_vars), + ) + else: + self.weight_proj = nn.Linear(self.input_dim, groups * num_vars) + nn.init.normal_(self.weight_proj.weight, mean=0, std=1) + nn.init.zeros_(self.weight_proj.bias) + + if isinstance(temp, str): + import ast + temp = ast.literal_eval(temp) + assert len(temp) == 3, f"{temp}, {len(temp)}" + + self.max_temp, self.min_temp, self.temp_decay = temp + self.curr_temp = self.max_temp + self.codebook_indices = None + + def set_num_updates(self, num_updates): + self.curr_temp = max( + self.max_temp * self.temp_decay ** num_updates, self.min_temp + ) + + def get_codebook_indices(self): + if self.codebook_indices is None: + from itertools import product + + p = [range(self.num_vars)] * self.groups + inds = list(product(*p)) + self.codebook_indices = torch.tensor( + inds, dtype=torch.long, device=self.vars.device + ).flatten() + + if not self.combine_groups: + self.codebook_indices = self.codebook_indices.view( + self.num_vars ** self.groups, -1 + ) + for b in range(1, self.groups): + self.codebook_indices[:, b] += self.num_vars * b + self.codebook_indices = self.codebook_indices.flatten() + return self.codebook_indices + + def codebook(self): + indices = self.get_codebook_indices() + return ( + self.vars.squeeze(0) + .index_select(0, indices) + .view(self.num_vars ** self.groups, -1) + ) + + def sample_from_codebook(self, b, n): + indices = self.get_codebook_indices() + indices = indices.view(-1, self.groups) + cb_size = indices.size(0) + assert ( + n < cb_size + ), f"sample size {n} is greater than size of codebook {cb_size}" + sample_idx = torch.randint(low=0, high=cb_size, size=(b * n,)) + indices = indices[sample_idx] + + z = self.vars.squeeze(0).index_select(0, indices.flatten()).view(b, n, -1) + return z + + def to_codebook_index(self, indices): + res = indices.new_full(indices.shape[:-1], 0) + for i in range(self.groups): + exponent = self.groups - i - 1 + res += indices[..., i] * (self.num_vars ** exponent) + return res + + def forward_idx(self, x): + res = self.forward(x, produce_targets=True) + return res["x"], res["targets"] + + def forward(self, x, produce_targets=False): + + result = {"num_vars": self.num_vars * self.groups} + + if not self.time_first: + x = x.transpose(1, 2) + + bsz, tsz, fsz = x.shape + x = x.reshape(-1, fsz) + x = self.weight_proj(x) + x = x.view(bsz * tsz * self.groups, -1) + + _, k = x.max(-1) + hard_x = ( + x.new_zeros(*x.shape) + .scatter_(-1, k.view(-1, 1), 1.0) + .view(bsz * tsz, self.groups, -1) + ) + hard_probs = torch.mean(hard_x.float(), dim=0) + result["code_perplexity"] = torch.exp( + -torch.sum(hard_probs * torch.log(hard_probs + 1e-7), 
dim=-1) + ).sum() + + avg_probs = torch.softmax( + x.view(bsz * tsz, self.groups, -1).float(), dim=-1 + ).mean(dim=0) + result["prob_perplexity"] = torch.exp( + -torch.sum(avg_probs * torch.log(avg_probs + 1e-7), dim=-1) + ).sum() + + result["temp"] = self.curr_temp + + if self.training: + x = F.gumbel_softmax(x.float(), tau=self.curr_temp, hard=True).type_as(x) + else: + x = hard_x + + x = x.view(bsz * tsz, -1) + + vars = self.vars + if self.combine_groups: + vars = vars.repeat(1, self.groups, 1) + + if produce_targets: + result["targets"] = ( + x.view(bsz * tsz * self.groups, -1) + .argmax(dim=-1) + .view(bsz, tsz, self.groups) + .detach() + ) + + x = x.unsqueeze(-1) * vars + x = x.view(bsz * tsz, self.groups, self.num_vars, -1) + x = x.sum(-2) + x = x.view(bsz, tsz, -1) + + if not self.time_first: + x = x.transpose(1, 2) # BTC -> BCT + + result["x"] = x + + return result diff --git a/SpeechT5/fairseq/fairseq/modules/kmeans_attention.py b/SpeechT5/fairseq/fairseq/modules/kmeans_attention.py new file mode 100644 index 0000000000000000000000000000000000000000..11a7debcf2ac025fb02ba5e672987f87dbbc49a4 --- /dev/null +++ b/SpeechT5/fairseq/fairseq/modules/kmeans_attention.py @@ -0,0 +1,609 @@ +import torch +import torch.nn as nn +import torch.nn.functional as F +import math +from inspect import isfunction +from operator import mul +from functools import reduce, wraps + +from aml.multimodal_video.utils.einops.lib import rearrange, repeat +from aml.multimodal_video.utils.einops.lib.layers.torch import Rearrange + +from fairseq.modules.local_attention import LocalAttention + +# constants + +TOKEN_SELF_ATTN_VALUE = -5e4 +KMEAN_INIT_ITERS = 10 + +# helper functions + + +def exists(val): + return val is not None + + +def identity(x, *args, **kwargs): + return x + + +def default(x, d): + if not exists(x): + return d if not isfunction(d) else d() + return x + + +def cast_tuple(x): + return x if isinstance(x, tuple) else (x,) + + +def cache_fn(f): + cache = None + + @wraps(f) + def cached_fn(*args, **kwargs): + nonlocal cache + if exists(cache): + return cache + cache = f(*args, **kwargs) + return cache + return cached_fn + + +def to(t): + return {'device': t.device, 'dtype': t.dtype} + + +def find_modules(nn_module, type): + return [module for module in nn_module.modules() if isinstance(module, type)] + + +def is_empty(t): + return t.nelement() == 0 + + +def max_neg_value(tensor): + return -torch.finfo(tensor.dtype).max + + +def batched_index_select(values, indices): + last_dim = values.shape[-1] + return values.gather(2, expand_dim(indices, -1, last_dim)) + + +def merge_dims(ind_from, ind_to, tensor): + shape = list(tensor.shape) + arr_slice = slice(ind_from, ind_to + 1) + shape[arr_slice] = [reduce(mul, shape[arr_slice])] + return tensor.reshape(*shape) + + +def expand_dim(t, dim, k): + t = t.unsqueeze(dim) + expand_shape = [-1] * len(t.shape) + expand_shape[dim] = k + return t.expand(*expand_shape) + + +def scatter_mean(src, t, index, dim, eps=1e-5): + numer = src.scatter_add(dim, index, t) + denom = src.scatter_add(dim, index, torch.ones_like(t)) + return numer / (denom + eps) + + +def split_at_index(dim, index, t): + pre_slices = (slice(None),) * dim + l = (*pre_slices, slice(None, index)) + r = (*pre_slices, slice(index, None)) + return t[l], t[r] + + +def reshape_dim(t, dim, split_dims): + shape = list(t.shape) + num_dims = len(shape) + dim = (dim + num_dims) % num_dims + shape[dim:dim+1] = split_dims + return t.reshape(shape) + + +def ema(old, new, decay): + if not exists(old): + return new + 
return old * decay + new * (1 - decay) + + +def ema_inplace(moving_avg, new, decay): + if is_empty(moving_avg): + moving_avg.data.copy_(new) + return + moving_avg.data.mul_(decay).add_(new, alpha=(1 - decay)) + +# helper classes + + +def map_first_tuple_or_el(x, fn): + if isinstance(x, tuple): + return (fn(x[0]),) + x[1:] + return fn(x) + + +class Chunk(nn.Module): + def __init__(self, chunks, fn, along_dim=-1): + super().__init__() + self.dim = along_dim + self.chunks = chunks + self.fn = fn + + def forward(self, x, **kwargs): + if self.chunks <= 1: + return self.fn(x, **kwargs) + chunks = x.chunk(self.chunks, dim=self.dim) + return torch.cat([self.fn(c, **kwargs) for c in chunks], dim=self.dim) + + +class PreNorm(nn.ModuleList): + def __init__(self, norm_class, dim, fn): + super().__init__() + self.norm = norm_class(dim) + self.fn = fn + + def forward(self, x, **kwargs): + x = self.norm(x) + return self.fn(x, **kwargs) + + +class ReZero(nn.Module): + def __init__(self, fn): + super().__init__() + self.residual_weight = nn.Parameter(torch.zeros(1)) + self.fn = fn + + def forward(self, x, **kwargs): + x = self.fn(x, **kwargs) + return map_first_tuple_or_el(x, lambda t: t * self.residual_weight) + + +class ScaleNorm(nn.Module): + def __init__(self, dim, eps=1e-5): + super().__init__() + self.g = nn.Parameter(torch.ones(1)) + self.eps = eps + + def forward(self, x): + def norm(t): + n = torch.norm(t, dim=-1, keepdim=True).clamp(min=self.eps) + return t / n * self.g + return map_first_tuple_or_el(x, norm) + + +class ProjectInOut(nn.Module): + def __init__(self, fn, dim_in, dim_out, project_out=True): + super().__init__() + self.fn = fn + self.project_in = nn.Linear(dim_in, dim_out) + self.project_out = nn.Linear(dim_out, dim_in) if project_out else identity + + def forward(self, x, **kwargs): + x = self.project_in(x) + x, loss = self.fn(x, **kwargs) + x = self.project_out(x) + return x, loss + + +class MatrixMultiply(nn.Module): + def __init__(self, tensor, transpose=False): + super().__init__() + self.tensor = tensor + self.transpose = transpose + + def forward(self, x): + tensor = self.tensor + if self.transpose: + tensor = tensor.t() + return x @ tensor + +# positional embeddings + + +class DepthWiseConv1d(nn.Module): + def __init__(self, dim_in, dim_out, kernel_size, stride=1, bias=True, causal=False): + super().__init__() + self.padding = ((kernel_size - 1), 0) if causal else (kernel_size // 2, kernel_size // 2) + + self.net = nn.Sequential( + nn.Conv1d(dim_in, dim_in, kernel_size=kernel_size, groups=dim_in, stride=stride, bias=bias), + nn.Conv1d(dim_in, dim_out, 1, bias=bias) + ) + + def forward(self, x): + x = F.pad(x, self.padding, value=0.) + return self.net(x) + + +class FixedPositionalEmbedding(nn.Module): + def __init__(self, dim, max_seq_len): + super().__init__() + inv_freq = 1. / (10000 ** (torch.arange(0, dim, 2).float() / dim)) + position = torch.arange(0, max_seq_len, dtype=torch.float) + sinusoid_inp = torch.einsum("i,j->ij", position, inv_freq) + emb = torch.cat((sinusoid_inp.sin(), sinusoid_inp.cos()), dim=-1) + self.register_buffer('emb', emb) + + def forward(self, x): + return self.emb[None, :x.shape[1], :].to(x) + + +def rotate_every_two(x): + x = rearrange(x, '... (d j) -> ... d j', j=2) + x1, x2 = x.unbind(dim=-1) + x = torch.stack((-x2, x1), dim=-1) + return rearrange(x, '... d j -> ... 
(d j)') + + +def apply_rotary_pos_emb(q, k, sinu_pos): + sinu_pos = rearrange(sinu_pos, '() n (j d) -> n j d', j=2) + sin, cos = sinu_pos.unbind(dim=-2) + sin, cos = map(lambda t: repeat(t, 'b n -> b (n j)', j=2), (sin, cos)) + q, k = map(lambda t: (t * cos) + (rotate_every_two(t) * sin), (q, k)) + return q, k + +# kmeans related function and class + + +def update_kmeans_on_backwards(module): + module.kmean_modules = find_modules(module, Kmeans) + + def hook(_, grad_in, grad_out): + for m in module.kmean_modules: + m.update() + + return module.register_backward_hook(hook) + + +def similarity(x, means): + return torch.einsum('bhld,hcd->bhlc', x, means) + + +def dists_and_buckets(x, means): + dists = similarity(x, means) + _, buckets = torch.max(dists, dim=-1) + return dists, buckets + + +def batched_bincount(index, num_classes, dim=-1): + shape = list(index.shape) + shape[dim] = num_classes + out = index.new_zeros(shape) + out.scatter_add_(dim, index, torch.ones_like(index, dtype=index.dtype)) + return out + + +def kmeans_iter(x, means, buckets=None): + b, h, _, d, dtype, num_clusters = *x.shape, x.dtype, means.shape[1] + + if not exists(buckets): + _, buckets = dists_and_buckets(x, means) + + bins = batched_bincount(buckets, num_clusters).sum(0, keepdim=True) + zero_mask = bins.long() == 0 + + means_ = buckets.new_zeros(b, h, num_clusters, d, dtype=dtype) + means_.scatter_add_(-2, expand_dim(buckets, -1, d), x) + means_ = F.normalize(means_.sum(0, keepdim=True), dim=-1).type(dtype) + + means = torch.where(zero_mask.unsqueeze(-1), means, means_) + means = means.squeeze(0) + return means + + +def distribution(dists, window_size): + _, topk_indices = dists.topk(k=window_size, dim=-2) + indices = topk_indices.transpose(-2, -1) + return indices.reshape(*indices.size()[:2], -1) + + +class Kmeans(nn.Module): + def __init__(self, num_heads, head_dim, num_clusters, ema_decay=0.999, commitment=1e-4): + super().__init__() + self.commitment = commitment + self.ema_decay = ema_decay + + self.register_buffer('means', torch.randn(num_heads, num_clusters, head_dim)) + self.register_buffer('initted', torch.tensor(False)) + self.num_new_means = 0 + self.new_means = None + + @torch.no_grad() + def init(self, x): + if self.initted: + return + _, h, _, d, device, _ = *x.shape, x.device, x.dtype + + num_clusters = self.means.shape[1] + + means = x.transpose(0, 1).contiguous().view(h, -1, d) + num_samples = means.shape[1] + + if num_samples >= num_clusters: + indices = torch.randperm(num_samples, device=device)[:num_clusters] + else: + indices = torch.randint(0, num_samples, (num_clusters,), device=device) + + means = means[:, indices] + + for _ in range(KMEAN_INIT_ITERS): + means = kmeans_iter(x, means) + + self.num_new_means = 0 + self.means.data.copy_(means) + self.initted.data.copy_(torch.tensor(True)) + + @torch.no_grad() + def update(self, new_means=None): + new_means = default(new_means, self.new_means) + assert exists(new_means), 'new kmeans has not been supplied' + ema_inplace(self.means, new_means, self.ema_decay) + + del self.new_means + self.new_means = None + self.num_new_means = 0 + + def forward(self, x, update_means=False): + self.init(x) + + b, dtype = x.shape[0], x.dtype + means = self.means.type(dtype) + x = F.normalize(x, 2, dim=-1).type(dtype) + + with torch.no_grad(): + dists, buckets = dists_and_buckets(x, means) + + routed_means = batched_index_select(expand_dim(means, 0, b), buckets) + loss = F.mse_loss(x, routed_means) * self.commitment + + if update_means: + with torch.no_grad(): + 
means = kmeans_iter(x, means, buckets) + self.new_means = ema(self.new_means, means, self.num_new_means / (self.num_new_means + 1)) + self.num_new_means += 1 + + return dists, loss + +# kmeans attention class + + +class KmeansAttention(nn.Module): + def __init__(self, num_clusters, window_size, num_heads, head_dim, causal=False, dropout=0., ema_decay=0.999, commitment=1e-4, context_window_size=None, receives_context=False, num_mem_kv=0, shared_qk=False): + super().__init__() + self.num_heads = num_heads + self.num_clusters = num_clusters + self.head_dim = head_dim + + self.window_size = window_size + self.context_window_size = default(context_window_size, window_size) + self.causal = causal + + self.shared_qk = shared_qk + self.receives_context = receives_context + self.kmeans = Kmeans(num_heads, head_dim, num_clusters, ema_decay, commitment) + self.dropout = nn.Dropout(dropout) + + self.num_mem_kv = max(num_mem_kv, 1 if causal and not shared_qk else 0) + self.mem_key = nn.Parameter(torch.randn(num_heads, num_clusters, self.num_mem_kv, head_dim)) + self.mem_value = nn.Parameter(torch.randn(num_heads, num_clusters, self.num_mem_kv, head_dim)) + + def forward(self, q, k, v, query_mask=None, key_mask=None, **kwargs): + b, h, t, d, kv_t, wsz, c_wsz, nc, device, dtype = *q.shape, k.shape[2], self.window_size, self.context_window_size, self.num_clusters, q.device, q.dtype + is_reverse = kwargs.pop('_reverse', False) + + out = torch.zeros_like(q, dtype=dtype) + + update_kmeans = self.training and not is_reverse + + key_mask = default(key_mask, query_mask) if not self.receives_context else key_mask + kv_wsz = wsz if not self.receives_context else c_wsz + + wsz = min(wsz, t) + kv_wsz = min(kv_wsz, kv_t) + + if not self.shared_qk or self.receives_context: + dists, aux_loss = self.kmeans(torch.cat((q, k), dim=2), update_kmeans) + q_dists, k_dists = split_at_index(2, t, dists) + indices = distribution(q_dists, wsz) + kv_indices = distribution(k_dists, kv_wsz) + else: + dists, aux_loss = self.kmeans(q, update_kmeans) + k = F.normalize(k, dim=-1).to(q) + indices = distribution(dists, wsz) + kv_indices = indices + + q = batched_index_select(q, indices) + k = batched_index_select(k, kv_indices) + v = batched_index_select(v, kv_indices) + + reshape_with_window = lambda x: x.reshape(b, h, nc, -1, d) + q, k, v = map(reshape_with_window, (q, k, v)) + + m_k, m_v = map(lambda x: expand_dim(x, 0, b).to(q), (self.mem_key, self.mem_value)) + k, v = map(lambda x: torch.cat(x, dim=3), ((m_k, k), (m_v, v))) + + dots = torch.einsum('bhnid,bhnjd->bhnij', q, k) * (d ** -0.5) + + mask_value = max_neg_value(dots) + + if exists(query_mask) or exists(key_mask): + query_mask = default(query_mask, lambda: torch.ones((b, t), device=device).bool()) + key_mask = default(key_mask, lambda: torch.ones((b, kv_t), device=device).bool()) + + q_mask = expand_dim(query_mask, 1, h).gather(2, indices) + kv_mask = expand_dim(key_mask, 1, h).gather(2, kv_indices) + q_mask, kv_mask = map(lambda t: t.reshape(b, h, nc, -1), (q_mask, kv_mask)) + mask = q_mask[:, :, :, :, None] * kv_mask[:, :, :, None, :] + mask = F.pad(mask, (self.num_mem_kv, 0), value=1) + dots.masked_fill_(~mask, mask_value) + del mask + + if self.causal: + q_mask, kv_mask = map(lambda t: t.reshape(b, h, nc, -1), (indices, kv_indices)) + mask = q_mask[:, :, :, :, None] >= kv_mask[:, :, :, None, :] + mask = F.pad(mask, (self.num_mem_kv, 0), value=1) + dots.masked_fill_(~mask, mask_value) + del mask + + if self.shared_qk: + q_mask, kv_mask = map(lambda t: t.reshape(b, h, nc, 
-1), (indices, kv_indices)) + mask = q_mask[:, :, :, :, None] == kv_mask[:, :, :, None, :] + mask = F.pad(mask, (self.num_mem_kv, 0), value=0) + dots.masked_fill_(mask, TOKEN_SELF_ATTN_VALUE) + del mask + + dots = dots.softmax(dim=-1) + dots = self.dropout(dots) + + bo = torch.einsum('bhcij,bhcjd->bhcid', dots, v) + so = torch.reshape(bo, (b, h, -1, bo.shape[-1])).type(dtype) + out = scatter_mean(out, so, indices.unsqueeze(-1).expand_as(so), -2) + return out, aux_loss + +# feedforward + + +class GELU_(nn.Module): + def forward(self, x): + return 0.5 * x * (1 + torch.tanh(math.sqrt(2 / math.pi) * (x + 0.044715 * torch.pow(x, 3)))) + + +GELU = nn.GELU if hasattr(nn, 'GELU') else GELU_ + + +class FeedForward(nn.Module): + def __init__(self, dim, mult=4, dropout=0., activation=None, glu=False): + super().__init__() + activation = default(activation, GELU) + + self.glu = glu + self.w1 = nn.Linear(dim, dim * mult * (2 if glu else 1)) + self.act = activation() + self.dropout = nn.Dropout(dropout) + self.w2 = nn.Linear(dim * mult, dim) + + def forward(self, x, **kwargs): + if not self.glu: + x = self.w1(x) + x = self.act(x) + else: + x, v = self.w1(x).chunk(2, dim=-1) + x = self.act(x) * v + + x = self.dropout(x) + x = self.w2(x) + return x + +# self attention + + +class SelfAttention(nn.Module): + def __init__(self, dim, max_seq_len, heads, local_attn_heads, window_size, dim_head=None, local_attn_window_size=None, local_attn_radius_blocks=1, causal=False, attn_dropout=0., dropout=0., kmeans_ema_decay=0.999, commitment_factor=1e-4, receives_context=False, context_window_size=None, rel_pos_emb=True, num_mem_kv=0, shared_qk=False, conv_query_kernel=9): + super().__init__() + assert dim_head or (dim % heads) == 0, 'hidden dimension must be divisible by number of heads' + assert (max_seq_len % window_size) == 0, 'maximum sequence length must be divisible by the target window size' + assert local_attn_heads <= heads, 'number of local attention heads must be less than total heads' + assert not (receives_context and local_attn_heads > 0), 'local attention cannot be used for self attention with context' + assert not (receives_context and causal), 'contextual attention layer cannot be causal' + + local_attn_window_size = default(local_attn_window_size, window_size) + context_window_size = default(context_window_size, window_size) + + self.shared_qk = shared_qk + self.receives_context = receives_context + self.heads = heads + self.local_attn_heads = local_attn_heads + self.global_attn_heads = heads - local_attn_heads + + self.causal = causal + self.window_size = window_size + + dim_head = default(dim_head, dim // heads) + dim_heads = dim_head * heads + self.dim_head = dim_head + + num_clusters = max_seq_len // window_size + + # local + + local_dim_heads = dim_head * self.local_attn_heads + + if self.local_attn_heads > 0: + rel_pos_emb_config = (dim_head, local_attn_heads) if rel_pos_emb else None + self.local_attn = LocalAttention(dim=dim_head, window_size=local_attn_window_size, causal=causal, dropout=attn_dropout, rel_pos_emb_config=rel_pos_emb_config, look_backward=local_attn_radius_blocks, look_forward=0 if causal else local_attn_radius_blocks) + self.local_to_qkv = nn.Linear(dim, 3 * local_dim_heads) + + # global + + global_dim_heads = dim_head * self.global_attn_heads + + if self.global_attn_heads > 0: + self.global_attn = KmeansAttention(num_clusters, window_size, self.global_attn_heads, dim_head, causal=causal, dropout=attn_dropout, ema_decay=kmeans_ema_decay, commitment=commitment_factor, 
receives_context=receives_context, num_mem_kv=num_mem_kv, shared_qk=shared_qk) + + self.to_q = nn.Sequential( + Rearrange('b n c -> b c n'), + DepthWiseConv1d(dim, global_dim_heads, conv_query_kernel, causal=causal), + Rearrange('b c n -> b n c') + ) + + self.to_v = nn.Linear(dim, global_dim_heads, bias=False) + + if not self.shared_qk: + self.to_k = nn.Linear(dim, global_dim_heads, bias=False) + + # out + + self.to_out = nn.Linear(dim_heads, dim, bias=False) + self.dropout = nn.Dropout(dropout) + + def forward(self, query, key, value, context=None, key_padding_mask=None, context_mask=None, pos_emb=None, **kwargs): + assert not (self.receives_context and not exists(context)), 'context must be passed if self attention is set to receive context' + input_mask = key_padding_mask + x = query.transpose(0, 1) + b, t, _, h, dh = *x.shape, self.heads, self.dim_head + has_local, has_global = map(lambda x: x > 0, (self.local_attn_heads, self.global_attn_heads)) + + split_heads = lambda v: reshape_dim(v, -1, (-1, dh)).transpose(1, 2).contiguous() + + if has_local: + local_qkv = self.local_to_qkv(x).chunk(3, dim=-1) + lq, lk, lv = map(split_heads, local_qkv) + + if has_global: + kv_input = x if not self.receives_context else context + + q, v = self.to_q(x), self.to_v(kv_input) + + if not self.shared_qk: + k = self.to_k(kv_input) + else: + k = self.to_q(kv_input) if self.receives_context else q + + q, k, v = map(split_heads, (q, k, v)) + + out = [] + total_loss = torch.tensor(0., requires_grad=True, **to(x)) + + if has_local: + local_out = self.local_attn(lq, lk, lv, input_mask=input_mask) + out.append(local_out) + + if has_global: + if not self.receives_context and exists(pos_emb): + q, k = apply_rotary_pos_emb(q, k, pos_emb) + + global_out, loss = self.global_attn(q, k, v, query_mask=input_mask, key_mask=context_mask) + total_loss = total_loss + loss + + out.append(global_out) + + out = torch.cat(out, dim=1) + out = out.reshape(b, h, t, -1).transpose(1, 2).reshape(b, t, -1) + out = self.dropout(out.transpose(0, 1)) + # out = self.to_out(out) + return out, total_loss diff --git a/SpeechT5/fairseq/fairseq/modules/kmeans_vector_quantizer.py b/SpeechT5/fairseq/fairseq/modules/kmeans_vector_quantizer.py new file mode 100644 index 0000000000000000000000000000000000000000..040db1e83e775a3bb59d5263d22aae9276a83f22 --- /dev/null +++ b/SpeechT5/fairseq/fairseq/modules/kmeans_vector_quantizer.py @@ -0,0 +1,127 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +import torch +import torch.nn as nn +from fairseq.modules import Fp32GroupNorm + + +class KmeansVectorQuantizer(nn.Module): + def __init__( + self, dim, num_vars, groups, combine_groups, vq_dim, time_first, gamma=0.25 + ): + """Vector quantization using straight pass-through estimator (i.e. 
kmeans) + + Args: + dim: input dimension (channels) + num_vars: number of quantized vectors per group + groups: number of groups for vector quantization + combine_groups: whether to use the vectors for all groups + vq_dim: dimensionality of the resulting quantized vector + time_first: if true, expect input in BxTxC format, otherwise in BxCxT + gamma: commitment loss coefficient + """ + super().__init__() + + self.groups = groups + self.combine_groups = combine_groups + self.input_dim = dim + self.num_vars = num_vars + self.vq_dim = vq_dim + self.time_first = time_first + + assert ( + vq_dim % groups == 0 + ), f"dim {vq_dim} must be divisible by groups {groups} for concatenation" + + self.var_dim = vq_dim // groups + num_groups = groups if not combine_groups else 1 + + self.embedding = nn.Parameter( + 0.01 * torch.randn(num_vars, num_groups, self.var_dim) + ) + self.projection = nn.Sequential( + nn.Conv1d(dim, dim, kernel_size=1, groups=groups, bias=False), + Fp32GroupNorm(groups, dim), + ) + self.gamma = gamma + self.mse_mean = nn.MSELoss(reduction="mean") + + def _pass_grad(self, x, y): + """Manually set gradient for backward pass. + for y = f(x), ensure that during the backward pass, + dL/dy = dL/dx regardless of f(x). + Returns: + y, with the gradient forced to be dL/dy = dL/dx. + """ + + return y.detach() + (x - x.detach()) + + @property + def expand_embedding(self): + if self.combine_groups: + return self.embedding.expand(self.num_vars, self.groups, self.var_dim) + return self.embedding + + def forward_idx(self, x): + res = self.forward(x, produce_targets=True) + return res["x"], res["targets"] + + def forward(self, x, produce_targets=False): + + result = {"num_vars": self.num_vars} + + if self.time_first: + x = x.transpose(1, 2) + + bsz, fsz, tsz = x.shape + + ze = self.projection(x) + ze_ = ze.view(bsz, self.groups, self.var_dim, tsz).permute(0, 3, 1, 2) + d = ( + (ze_.unsqueeze(0) - self.expand_embedding.unsqueeze(1).unsqueeze(1)) + .view(self.num_vars, bsz, tsz, self.groups, -1) + .norm(dim=-1, p=2) + ) + idx = d.argmin(dim=0) + zq = ( + torch.stack( + [ + self.expand_embedding[idx[..., group], group] + for group in range(self.groups) + ], + dim=-2, + ) + .view(bsz, tsz, self.groups * self.var_dim) + .permute(0, 2, 1) + ) + assert ze.shape == zq.shape, (ze.shape, zq.shape) + x = self._pass_grad(ze, zq) + + hard_x = ( + idx.new_zeros(bsz * tsz * self.groups, self.num_vars) + .scatter_(-1, idx.view(-1, 1), 1.0) + .view(bsz * tsz, self.groups, -1) + ) + hard_probs = torch.mean(hard_x.float(), dim=0) + result["code_perplexity"] = torch.exp( + -torch.sum(hard_probs * torch.log(hard_probs + 1e-7), dim=-1) + ).sum() + + if produce_targets: + result["targets"] = idx + + if self.time_first: + x = x.transpose(1, 2) # BCT -> BTC + result["x"] = x + + ze = ze.float() + zq = zq.float() + latent_loss = self.mse_mean(zq, ze.detach()) + commitment_loss = self.mse_mean(ze, zq.detach()) + + result["kmeans_loss"] = latent_loss + self.gamma * commitment_loss + + return result diff --git a/SpeechT5/fairseq/fairseq/modules/layer_drop.py b/SpeechT5/fairseq/fairseq/modules/layer_drop.py new file mode 100644 index 0000000000000000000000000000000000000000..8961d8bcbc492c40c6b30973234416ce5a414f5a --- /dev/null +++ b/SpeechT5/fairseq/fairseq/modules/layer_drop.py @@ -0,0 +1,44 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. 
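The `_pass_grad` helper in the `KmeansVectorQuantizer` above is the core of the straight-through estimator: the forward pass returns the quantized codewords `zq`, while the backward pass routes `dL/dzq` unchanged into the encoder output `ze`, as if no quantization had happened. Below is a minimal sketch of that behaviour, not part of the diff; the tensors are hypothetical stand-ins for the projected features and the nearest-codeword lookup.

import torch

def pass_grad(x, y):
    # forward value is y; backward gradient flows to x as if y were x
    return y.detach() + (x - x.detach())

ze = torch.randn(2, 4, requires_grad=True)  # stand-in for the projected encoder output
zq = torch.round(ze)                        # stand-in for the nearest-codeword lookup
out = pass_grad(ze, zq)

out.sum().backward()
assert torch.equal(out, zq)                       # forward value equals the quantized tensor
assert torch.equal(ze.grad, torch.ones_like(ze))  # gradient passed straight through to ze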
+""" +LayerDrop as described in https://arxiv.org/abs/1909.11556. +""" + +import torch +import torch.nn as nn + + +class LayerDropModuleList(nn.ModuleList): + """ + A LayerDrop implementation based on :class:`torch.nn.ModuleList`. + + We refresh the choice of which layers to drop every time we iterate + over the LayerDropModuleList instance. During evaluation we always + iterate over all layers. + + Usage:: + + layers = LayerDropList(p=0.5, modules=[layer1, layer2, layer3]) + for layer in layers: # this might iterate over layers 1 and 3 + x = layer(x) + for layer in layers: # this might iterate over all layers + x = layer(x) + for layer in layers: # this might not iterate over any layers + x = layer(x) + + Args: + p (float): probability of dropping out each layer + modules (iterable, optional): an iterable of modules to add + """ + + def __init__(self, p, modules=None): + super().__init__(modules) + self.p = p + + def __iter__(self): + dropout_probs = torch.empty(len(self)).uniform_() + for i, m in enumerate(super().__iter__()): + if not self.training or (dropout_probs[i] > self.p): + yield m diff --git a/SpeechT5/fairseq/fairseq/modules/layer_norm.py b/SpeechT5/fairseq/fairseq/modules/layer_norm.py new file mode 100644 index 0000000000000000000000000000000000000000..234609d9e213a650e0032aaa0ca0462a818bfead --- /dev/null +++ b/SpeechT5/fairseq/fairseq/modules/layer_norm.py @@ -0,0 +1,50 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +import torch +import torch.nn as nn +import torch.nn.functional as F + + +try: + from apex.normalization import FusedLayerNorm as _FusedLayerNorm + + has_fused_layernorm = True + + class FusedLayerNorm(_FusedLayerNorm): + @torch.jit.unused + def forward(self, x): + if not x.is_cuda: + return super().forward(x) + else: + with torch.cuda.device(x.device): + return super().forward(x) + + +except ImportError: + has_fused_layernorm = False + + +def LayerNorm(normalized_shape, eps=1e-5, elementwise_affine=True, export=False): + if torch.jit.is_scripting(): + export = True + if not export and torch.cuda.is_available() and has_fused_layernorm: + return FusedLayerNorm(normalized_shape, eps, elementwise_affine) + return torch.nn.LayerNorm(normalized_shape, eps, elementwise_affine) + + +class Fp32LayerNorm(nn.LayerNorm): + def __init__(self, *args, **kwargs): + super().__init__(*args, **kwargs) + + def forward(self, input): + output = F.layer_norm( + input.float(), + self.normalized_shape, + self.weight.float() if self.weight is not None else None, + self.bias.float() if self.bias is not None else None, + self.eps, + ) + return output.type_as(input) diff --git a/SpeechT5/fairseq/fairseq/modules/learned_positional_embedding.py b/SpeechT5/fairseq/fairseq/modules/learned_positional_embedding.py new file mode 100644 index 0000000000000000000000000000000000000000..378d0f707183dd344dbb9288dda394b11053acf0 --- /dev/null +++ b/SpeechT5/fairseq/fairseq/modules/learned_positional_embedding.py @@ -0,0 +1,61 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. 
+ +from typing import Dict, Optional + +import torch +import torch.nn as nn +import torch.nn.functional as F +from fairseq import utils +from torch import Tensor + + +class LearnedPositionalEmbedding(nn.Embedding): + """ + This module learns positional embeddings up to a fixed maximum size. + Padding ids are ignored by either offsetting based on padding_idx + or by setting padding_idx to None and ensuring that the appropriate + position ids are passed to the forward function. + """ + + def __init__(self, num_embeddings: int, embedding_dim: int, padding_idx: int): + super().__init__(num_embeddings, embedding_dim, padding_idx) + self.onnx_trace = False + if self.padding_idx is not None: + self.max_positions = self.num_embeddings - self.padding_idx - 1 + else: + self.max_positions = self.num_embeddings + + def forward( + self, + input: Tensor, + incremental_state: Optional[Dict[str, Dict[str, Optional[Tensor]]]] = None, + positions: Optional[Tensor] = None, + ): + """Input is expected to be of size [bsz x seqlen].""" + assert (positions is None) or ( + self.padding_idx is None + ), "If positions is pre-computed then padding_idx should not be set." + + if positions is None: + if incremental_state is not None: + # positions is the same for every token when decoding a single step + # Without the int() cast, it doesn't work in some cases when exporting to ONNX + positions = torch.zeros( + (1, 1), device=input.device, dtype=input.dtype + ).fill_(int(self.padding_idx + input.size(1))) + else: + positions = utils.make_positions( + input, self.padding_idx, onnx_trace=self.onnx_trace + ) + return F.embedding( + positions, + self.weight, + self.padding_idx, + self.max_norm, + self.norm_type, + self.scale_grad_by_freq, + self.sparse, + ) diff --git a/SpeechT5/fairseq/fairseq/modules/lightconv_layer/__init__.py b/SpeechT5/fairseq/fairseq/modules/lightconv_layer/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..3b2a99c1227f827768911e5e22e79f6865ffbfd3 --- /dev/null +++ b/SpeechT5/fairseq/fairseq/modules/lightconv_layer/__init__.py @@ -0,0 +1,6 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +from .lightconv_layer import LightconvLayer # noqa diff --git a/SpeechT5/fairseq/fairseq/modules/lightconv_layer/cuda_function_gen.py b/SpeechT5/fairseq/fairseq/modules/lightconv_layer/cuda_function_gen.py new file mode 100644 index 0000000000000000000000000000000000000000..a25433dd8edae2f0b52d7d0eeeb829cabc6b4b89 --- /dev/null +++ b/SpeechT5/fairseq/fairseq/modules/lightconv_layer/cuda_function_gen.py @@ -0,0 +1,289 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + + +def gen_forward(): + + kernels = [3, 5, 7, 15, 31, 63, 127, 255] + seqs = [32 * x for x in [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16]] + + head = """ +/** + * Copyright (c) Facebook, Inc. and its affiliates. + * + * This source code is licensed under the MIT license found in the + * LICENSE file in the root directory of this source tree. 
+ */ + +#include "lightconv_cuda.cuh" + +std::vector<at::Tensor> lightconv_cuda_forward(at::Tensor input, at::Tensor filters, int padding_l) { + + at::DeviceGuard g(input.device()); + const auto minibatch = input.size(0); + const auto numFeatures = input.size(1); + const auto sequenceLength = input.size(2); + + const auto numHeads = filters.size(0); + const auto filterSize = filters.size(1); + + const auto numFiltersInBlock = numFeatures / numHeads; + + const dim3 blocks(minibatch, numFeatures); + + auto output = at::zeros_like(input); + auto stream = at::cuda::getCurrentCUDAStream(); +""" + + sequence_if = """ + if (sequenceLength <= {seq}) {{ + switch(filterSize) {{ +""" + + case_k = """ + case {k}: +""" + + main_block = """ + if (padding_l == {pad}) {{ + AT_DISPATCH_FLOATING_TYPES_AND_HALF(input.scalar_type(), "lightconv_forward", ([&] {{ + lightconv_forward_kernel<{k}, {b_size}, {pad}, scalar_t> + <<<blocks, {b_size}, 0, stream>>>( + input.data<scalar_t>(), + filters.data<scalar_t>(), + minibatch, + sequenceLength, + numFeatures, + numFiltersInBlock, + output.data<scalar_t>()); + }})); + }} else +""" + + bad_padding = """ + { + std::cout << "WARNING: Unsupported padding size - skipping forward pass" << std::endl; + } + break; +""" + + bad_filter = """ + default: + std::cout << "WARNING: Unsupported filter length passed - skipping forward pass" << std::endl; + } +""" + + con_else = """ + } else +""" + + final_else = """ + { + switch(filterSize) { +""" + + final_return = """ + } + + return {output}; +} +""" + + with open("lightconv_cuda_forward.cu", "w") as forward: + forward.write(head) + for seq in seqs: + forward.write(sequence_if.format(seq=seq)) + for k in kernels: + forward.write(case_k.format(k=k)) + for pad in [k // 2, k - 1]: + forward.write(main_block.format(k=k, b_size=seq, pad=pad)) + forward.write(bad_padding) + forward.write(bad_filter) + forward.write(con_else) + + forward.write(final_else) + for k in kernels: + forward.write(case_k.format(k=k)) + for pad in [k // 2, k - 1]: + forward.write(main_block.format(k=k, b_size=seq, pad=pad)) + forward.write(bad_padding) + forward.write(bad_filter) + forward.write(final_return) + + +def gen_backward(): + + head = """ +/** + * Copyright (c) Facebook, Inc. and its affiliates. + * + * This source code is licensed under the MIT license found in the + * LICENSE file in the root directory of this source tree. 
+ */ + +#include "lightconv_cuda.cuh" + +std::vector<at::Tensor> lightconv_cuda_backward( + at::Tensor gradOutput, + int padding_l, + at::Tensor input, + at::Tensor filters) { + + // gradWrtInput + const int minibatch = input.size(0); + const int numFeatures = input.size(1); + const int sequenceLength = input.size(2); + + const int numHeads = filters.size(0); + const int filterSize = filters.size(1); + + const dim3 gradBlocks(minibatch, numFeatures); + const dim3 weightGradFirstpassShortBlocks(minibatch, numHeads); + const dim3 weightGradSecondpassBlocks(numHeads, filterSize); + + const int numFiltersInBlock = numFeatures / numHeads; + + auto gradInput = at::zeros_like(input); + auto gradFilters = at::zeros_like(filters); + + at::DeviceGuard g(input.device()); + auto stream = at::cuda::getCurrentCUDAStream(); + + switch(filterSize) { +""" + + sequence_if = """ + if (sequenceLength <= {seq}) {{ +""" + + case_k = """ + case {k}: +""" + + main_block = """ + if (padding_l == {p}) {{ + AT_DISPATCH_FLOATING_TYPES_AND_HALF(input.scalar_type(), "lightconv_backward", ([&] {{ + lightconv_grad_wrt_input_kernel<{k}, {b_size}, {p}, scalar_t> + <<<gradBlocks, {b_size}, 0, stream>>>( + gradOutput.data<scalar_t>(), + filters.data<scalar_t>(), + minibatch, + sequenceLength, + numFeatures, + numFiltersInBlock, + gradInput.data<scalar_t>()); + +""" + + weight_grad_short = """ + at::Tensor tempSumGradFilters = at::zeros({{minibatch, numHeads, filterSize}}, input.options().dtype(at::kFloat)); + lightconv_grad_wrt_weights_firstpass_short_kernel<{k}, {b_size}, {p}, scalar_t> + <<<weightGradFirstpassShortBlocks, {b_size}, 0, stream>>>( + input.data<scalar_t>(), + gradOutput.data<scalar_t>(), + minibatch, + sequenceLength, + numFeatures, + numFiltersInBlock, + numHeads, + tempSumGradFilters.data<float>() + ); + + lightconv_grad_wrt_weights_secondpass_short_kernel<{k}, {b_size}, scalar_t> + <<<weightGradSecondpassBlocks, {b_size}, 0, stream>>>( + tempSumGradFilters.data<float>(), + minibatch, + numFiltersInBlock, + gradFilters.data<scalar_t>() + ); + }})); + }} else +""" + + weight_grad = """ + at::Tensor tempSumGradFilters = at::zeros({{minibatch, numFeatures, filterSize}}, input.options().dtype(at::kFloat)); + lightconv_grad_wrt_weights_firstpass_kernel<{k}, {b_size}, {p}, scalar_t> + <<<gradBlocks, {b_size}, 0, stream>>>( + input.data<scalar_t>(), + gradOutput.data<scalar_t>(), + minibatch, + sequenceLength, + numFeatures, + numFiltersInBlock, + tempSumGradFilters.data<float>() + ); + + lightconv_grad_wrt_weights_secondpass_kernel<{k}, {b_size}, scalar_t> + <<<weightGradSecondpassBlocks, {b_size}, 0, stream>>>( + tempSumGradFilters.data<float>(), + minibatch, + numFiltersInBlock, + gradFilters.data<scalar_t>() + ); + }})); + }} else +""" + + bad_padding = """ + { + std::cout << "WARNING: Unsupported padding size - skipping backward pass" << std::endl; + } +""" + + breakout = """ + break; +""" + + bad_filter = """ + default: + std::cout << "WARNING: Unsupported filter length passed - skipping backward pass" << std::endl; +""" + + con_else = """ + } else +""" + + final_else = """ + { + switch(filterSize) { +""" + + last_return = """ + } + return {gradInput, gradFilters}; +} +""" + + kernels = [3, 5, 7, 15, 31, 63, 127, 255] + seqs = [32 * x for x in [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16]] + thresh = [32, 32, 64, 128, 256, -1, -1, -1] + max_mem = [-1, -1, -1, -1, -1, 192, 96, 64] + + with open("lightconv_cuda_backward.cu", "w") as backward: + backward.write(head) + for (k, t, mem) in zip(kernels, 
thresh, max_mem): + backward.write(case_k.format(k=k)) + for seq in seqs: + if (t == -1 or seq <= t) and (mem == -1 or seq < mem): + backward.write(sequence_if.format(seq=seq)) + for p in [k // 2, k - 1]: + backward.write(main_block.format(k=k, b_size=seq, p=p)) + backward.write(weight_grad_short.format(k=k, b_size=seq, p=p)) + backward.write(bad_padding) + else: + for p in [k // 2, k - 1]: + backward.write(main_block.format(k=k, b_size=32, p=p)) + backward.write(weight_grad.format(k=k, b_size=32, p=p)) + backward.write(bad_padding) + backward.write(breakout) + break + backward.write(con_else) + backward.write(bad_filter) + backward.write(last_return) + + +if __name__ == "__main__": + gen_forward() + gen_backward() diff --git a/SpeechT5/fairseq/fairseq/modules/lightconv_layer/lightconv_cuda.cpp b/SpeechT5/fairseq/fairseq/modules/lightconv_layer/lightconv_cuda.cpp new file mode 100644 index 0000000000000000000000000000000000000000..4bf6b5ad365d604bd91eda384bb422857b640744 --- /dev/null +++ b/SpeechT5/fairseq/fairseq/modules/lightconv_layer/lightconv_cuda.cpp @@ -0,0 +1,54 @@ +/** + * Copyright (c) Facebook, Inc. and its affiliates. + * + * This source code is licensed under the MIT license found in the + * LICENSE file in the root directory of this source tree. + */ + +#include <torch/extension.h> +#include <vector> + +std::vector<at::Tensor> lightconv_cuda_forward( + at::Tensor input, + at::Tensor filters, + int padding_l); + +std::vector<at::Tensor> lightconv_cuda_backward( + at::Tensor gradOutput, + int padding_l, + at::Tensor input, + at::Tensor filters); + + +#define CHECK_CUDA(x) AT_ASSERTM(x.type().is_cuda(), #x " must be a CUDA tensor") +#define CHECK_CONTIGUOUS(x) AT_ASSERTM(x.is_contiguous(), #x " must be contiguous") +#define CHECK_INPUT(x) CHECK_CUDA(x); CHECK_CONTIGUOUS(x) + +std::vector<at::Tensor> lightconv_forward( + at::Tensor input, + at::Tensor filters, + int padding_l) { + + CHECK_INPUT(input); + CHECK_INPUT(filters); + + return lightconv_cuda_forward(input, filters, padding_l); +} + +std::vector<at::Tensor> lightconv_backward( + at::Tensor gradOutput, + int padding_l, + at::Tensor input, + at::Tensor filters) { + + CHECK_INPUT(gradOutput); + CHECK_INPUT(input); + CHECK_INPUT(filters); + + return lightconv_cuda_backward(gradOutput, padding_l, input, filters); +} + +PYBIND11_MODULE(TORCH_EXTENSION_NAME, m) { + m.def("forward", &lightconv_forward, "lighconv forward (CUDA)"); + m.def("backward", &lightconv_backward, "lighconv backward (CUDA)"); +} diff --git a/SpeechT5/fairseq/fairseq/modules/lightconv_layer/lightconv_cuda.cuh b/SpeechT5/fairseq/fairseq/modules/lightconv_layer/lightconv_cuda.cuh new file mode 100644 index 0000000000000000000000000000000000000000..3cae57b68fc96872a5047a7a0d081b78456e8fae --- /dev/null +++ b/SpeechT5/fairseq/fairseq/modules/lightconv_layer/lightconv_cuda.cuh @@ -0,0 +1,83 @@ +/** + * Copyright (c) Facebook, Inc. and its affiliates. + * + * This source code is licensed under the MIT license found in the + * LICENSE file in the root directory of this source tree. 
+ */ + +#include <ATen/ATen.h> +#include <c10/cuda/CUDAStream.h> + +#include <cuda.h> +#include <cuda_runtime.h> + +#include <algorithm> +#include <functional> +#include <iostream> +#include <stdexcept> +#include <utility> +#include <vector> + +#include <stdlib.h> +#include <assert.h> + +#define SHFL_MASK 0xffffffff + +template<int FS, int SB, int padding_l, typename scalar_t> +__global__ +void lightconv_forward_kernel(const scalar_t* input, + const scalar_t* filters, + int minibatch, int sequenceLength, + int numFeatures, int numFiltersInBlock, + scalar_t* output); + +template<int FS, int SB, int padding_l, typename scalar_t> +__global__ +void lightconv_grad_wrt_input_kernel( + const scalar_t* input, + const scalar_t* filters, + int minibatch, + int sequenceLength, + int numFeatures, + int numFiltersInBlock, + scalar_t* output); + +template<int FS, int SB, int padding_l, typename scalar_t> +__global__ +void lightconv_grad_wrt_weights_firstpass_short_kernel( + const scalar_t* input, + const scalar_t* gradInput, + int minibatch, + int sequenceLength, + int numFeatures, + int numFiltersInBlock, + int numHeads, + float* output); + +template<int FS, int SB, typename scalar_t> +__global__ +void lightconv_grad_wrt_weights_secondpass_short_kernel( + const float* input, + const int minibatch, + const int numFiltersInBlock, + scalar_t* output); + +template<int FS, int SB, int padding_l, typename scalar_t> +__global__ +void lightconv_grad_wrt_weights_firstpass_kernel( + const scalar_t* input, + const scalar_t* gradInput, + int minibatch, + int sequenceLength, + int numFeatures, + int numFiltersInBlock, + float* output); + +template<int FS, int SB, typename scalar_t> +__global__ +void lightconv_grad_wrt_weights_secondpass_kernel( + const float* input, + const int minibatch, + const int numFiltersInBlock, + scalar_t* output); + diff --git a/SpeechT5/fairseq/fairseq/modules/lightconv_layer/lightconv_cuda_kernel.cu b/SpeechT5/fairseq/fairseq/modules/lightconv_layer/lightconv_cuda_kernel.cu new file mode 100644 index 0000000000000000000000000000000000000000..8ee83a56c89754c2abbe717b269d07ca9e64eef2 --- /dev/null +++ b/SpeechT5/fairseq/fairseq/modules/lightconv_layer/lightconv_cuda_kernel.cu @@ -0,0 +1,375 @@ +/** + * Copyright (c) Facebook, Inc. and its affiliates. + * + * This source code is licensed under the MIT license found in the + * LICENSE file in the root directory of this source tree. 
+ */ + +#include "lightconv_cuda.cuh" +#include "lightconv_cuda_forward.cu" +#include "lightconv_cuda_backward.cu" +#include "../cuda_utils.cu" + +template<int FS, int SB, int padding_l, typename scalar_t> +__global__ +void lightconv_forward_kernel(const scalar_t* input, + const scalar_t* filters, + int minibatch, int sequenceLength, + int numFeatures, int numFiltersInBlock, + scalar_t* output) { + + const int tid = threadIdx.x; + const int batchIdx = blockIdx.x; + const int featureIdx = blockIdx.y; + const int filterIdx = featureIdx / numFiltersInBlock; + + const int IOOffset = numFeatures * sequenceLength * batchIdx + featureIdx * sequenceLength; + const scalar_t* inputFeature = &input[IOOffset]; + scalar_t* outputFeature = &output[IOOffset]; + const scalar_t* inputFilter = &filters[filterIdx * FS]; + + assert(blockDim.x == SB); + + scalar_t filter[FS]; + #pragma unroll + for (int i = 0; i < FS; ++i) { + filter[i] = inputFilter[i]; + } + + __shared__ scalar_t temp[SB + FS]; + zeroSharedMem<FS, SB, padding_l>(temp); + + const int numIterations = divUp<int, int>(sequenceLength, SB); + + for (int i = 0; i < numIterations; ++i) { + // Read input into shared memory + const int inputOffset = i * SB; + + load_input_to_shared<FS, SB, padding_l>(inputFeature, inputOffset, sequenceLength, + i, numIterations, (numIterations == 1), temp); + + __syncthreads(); + + scalar_t out = 0; + #pragma unroll + for (int j = 0; j < FS; ++j) { + out += filter[j] * temp[tid + j]; + } + + // Write output + const int outputOffset = inputOffset; + if ((outputOffset + tid) < sequenceLength) { + outputFeature[outputOffset + tid] = out; + } + + __syncthreads(); + } +} + +template<int FS, int SB, int padding_l, typename scalar_t> +__global__ +void lightconv_grad_wrt_input_kernel( + const scalar_t* input, + const scalar_t* filters, + int minibatch, + int sequenceLength, + int numFeatures, + int numFiltersInBlock, + scalar_t* output) { + + // input grad kernel is similar to forward kernel + const int tid = threadIdx.x; + const int batchIdx = blockIdx.x; + const int featureIdx = blockIdx.y; + const int filterIdx = featureIdx / numFiltersInBlock; + + const int IOOffset = numFeatures * sequenceLength * batchIdx + featureIdx * sequenceLength; + const scalar_t* inputFeature = &input[IOOffset]; + scalar_t* outputFeature = &output[IOOffset]; + const scalar_t* inputFilter = &filters[filterIdx * FS]; + + assert(blockDim.x == SB); + + scalar_t filter[FS]; + + // The only change is loading the filter in reverse + #pragma unroll + for (int i = 0; i < FS; ++i) { + filter[i] = inputFilter[FS - i - 1]; + } + + __shared__ scalar_t temp[SB + FS]; + const int padding = FS - padding_l - 1; + zeroSharedMem<FS, SB, padding>(temp); + + __syncthreads(); + + const int numIterations = divUp<int, int>(sequenceLength, SB); + + for (int i = 0; i < numIterations; ++i) { + // Read input into shared memory + const int inputOffset = i * SB; + + load_input_to_shared<FS, SB, padding>(inputFeature, inputOffset, sequenceLength, + i, numIterations, false, temp); + + __syncthreads(); + + scalar_t out = 0; + #pragma unroll + for (int j = 0; j < FS; ++j) { + out += filter[j] * temp[tid + j]; + } + + // Write output + const int outputOffset = inputOffset; + if ((outputOffset + tid) < sequenceLength) { + outputFeature[outputOffset + tid] = out; + } + + __syncthreads(); + } +} + +// This is by far the most expensive kernel in terms of time taken. 
+// Can be 16x slower than the forward or grad_wrt_input when filter size is 31 +template<int FS, int SB, int padding_l, typename scalar_t> +__global__ +void lightconv_grad_wrt_weights_firstpass_short_kernel( + const scalar_t* input, + const scalar_t* gradInput, + int minibatch, + int sequenceLength, + int numFeatures, + int numFiltersInBlock, + int numHeads, + float* output) { + + const int tid = threadIdx.x; + const int batchIdx = blockIdx.x; + const int filterIdx = blockIdx.y; + + const int numIterations = divUp<int, int>(sequenceLength, SB); + + float* tempOutputGradWeight = &output[filterIdx * FS * minibatch]; + + assert(blockDim.x == SB); + + __shared__ scalar_t tempInput[SB + FS]; + __shared__ scalar_t tempGradInput[SB + FS]; + + // local weight accumulation + float accumWeights[FS]; + + // Initialize memory + for (int i = 0; i < FS; ++i) { + accumWeights[i] = float(0.0); + } + + + // loop over each sequence within filterblock + for (int idxInFilterBlock = 0; idxInFilterBlock < numFiltersInBlock; ++idxInFilterBlock) { + + const int featureOffset = batchIdx * numFeatures * sequenceLength + (filterIdx * numFiltersInBlock + idxInFilterBlock) * sequenceLength; + const scalar_t* inputFeature = &input[featureOffset]; + const scalar_t* gradInputFeature = &gradInput[featureOffset]; + + zeroSharedMem<FS, SB, padding_l>(tempInput); + zeroSharedMem<FS, SB, (FS/2)>(tempGradInput); + __syncthreads(); + + for (int i = 0; i < numIterations; ++i) { + + const int inputOffset = i * SB; + + load_input_to_shared<FS, SB, padding_l>(inputFeature, inputOffset, sequenceLength, + i, numIterations, false, tempInput); + load_input_to_shared<FS, SB, (FS/2)>(gradInputFeature, inputOffset, sequenceLength, + i, numIterations, false, tempGradInput); + + __syncthreads(); + + const int gradIndex = (FS/2) + tid; + scalar_t tempGrad = tempGradInput[gradIndex]; + + #pragma unroll + for (int j = 0; j < FS; j++) { + const int inputIndex = tid + j; + accumWeights[j] += tempInput[inputIndex] * tempGrad; + } + + __syncthreads(); + + } + + } + + // Row-major sum + for (int filterWeightIdx = 0; filterWeightIdx < FS; ++filterWeightIdx) { + + float temp; + if (tid < sequenceLength) { + temp = accumWeights[filterWeightIdx]; + } else { + temp = float(0.0); + } + + const int outputOffset = filterWeightIdx * minibatch + batchIdx; + + temp = blockReduce(temp); + + if (tid == 0) { + tempOutputGradWeight[outputOffset] = temp; + } + } +} + +template<int FS, int SB, typename scalar_t> +__global__ +void lightconv_grad_wrt_weights_secondpass_short_kernel( + const float* input, + const int minibatch, + const int numFiltersInBlock, + scalar_t* output) { + + assert(blockDim.x == SB); + + const int tid = threadIdx.x; + + const int filterIdx = blockIdx.x; + const int filterWeightIdx = blockIdx.y; + + const int inputOffset = filterIdx * FS * minibatch + + filterWeightIdx * minibatch; + const float* tempInput = &input[inputOffset]; + + // read into shared memory for reduction + int readIndex = tid; + + float sum = 0.0; + while (readIndex < minibatch) { + sum += tempInput[readIndex]; + readIndex += SB; + } + + float temp = blockReduce(sum); + + if (tid == 0) { + output[blockIdx.x * FS + blockIdx.y] = temp; + } +} + +// This is by far the most expensive kernel in terms of time taken. 
+// Can be 16x slower than the forward or grad_wrt_input when filter size is 31 +template<int FS, int SB, int padding_l, typename scalar_t> +__global__ +void lightconv_grad_wrt_weights_firstpass_kernel( + const scalar_t* input, + const scalar_t* gradInput, + int minibatch, + int sequenceLength, + int numFeatures, + int numFiltersInBlock, + float* output) { + + assert(blockDim.x == SB); + + const int tid = threadIdx.x; + const int batchIdx = blockIdx.x; + const int featureIdx = blockIdx.y; + const int filterIdx = featureIdx / numFiltersInBlock; + const int idxInFilterBlock = featureIdx % numFiltersInBlock; + + const int numIterations = divUp<int, int>(sequenceLength, SB); + + float temp; + + __shared__ scalar_t tempInput[SB + FS]; + __shared__ scalar_t tempGradInput[SB + FS]; + zeroSharedMem<FS, SB, padding_l>(tempInput); + zeroSharedMem<FS, SB, (FS/2)>(tempGradInput); + __syncthreads(); + + float accumWeights[FS]; + + for (int i = 0; i < FS; ++i) { + accumWeights[i] = float(0.0); + } + + const int IOOffset = batchIdx * numFeatures * sequenceLength + featureIdx * sequenceLength; + const scalar_t* inputFeature = &input[IOOffset]; + const scalar_t* gradInputFeature = &gradInput[IOOffset]; + float* tempOutputGradWeight = &output[filterIdx * FS * minibatch * numFiltersInBlock]; + + for (int i = 0; i < numIterations; ++i) { + const int inputOffset = i * SB; + + load_input_to_shared<FS, SB, padding_l>(inputFeature, inputOffset, sequenceLength, + i, numIterations, false, tempInput); + load_input_to_shared<FS, SB, (FS/2)>(gradInputFeature, inputOffset, sequenceLength, + i, numIterations, false, tempGradInput); + __syncthreads(); + + #pragma unroll + for (int j = 0; j < FS; ++j) { + accumWeights[j] += tempInput[tid + j] * tempGradInput[tid + (FS/2)]; + } + + __syncthreads(); + } + + // Row-major sum + for (int filterWeightIdx = 0; filterWeightIdx < FS; ++filterWeightIdx) { + + // Write to shared memory before reduction + if (tid < sequenceLength) { + temp = accumWeights[filterWeightIdx]; + } else { + temp = float(0.0); + } + + temp = blockReduce(temp); + + const int outputOffset = filterWeightIdx * minibatch * numFiltersInBlock + + batchIdx * numFiltersInBlock + + idxInFilterBlock; + + if (tid == 0) { + tempOutputGradWeight[outputOffset] = temp; + } + } +} + +template<int FS, int SB, typename scalar_t> +__global__ +void lightconv_grad_wrt_weights_secondpass_kernel( + const float* input, + const int minibatch, + const int numFiltersInBlock, + scalar_t* output) { + + assert(blockDim.x == SB); + const int tid = threadIdx.x; + + // What is the id within a minibatch + const int filterIdx = blockIdx.x; + const int filterWeightIdx = blockIdx.y; + + const int inputOffset = filterIdx * FS * minibatch * numFiltersInBlock + + filterWeightIdx * minibatch * numFiltersInBlock; + const float* tempInput = &input[inputOffset]; + + int readIndex = tid; + + float sum = float(0.0); + while (readIndex < (minibatch * numFiltersInBlock)) { + sum += tempInput[readIndex]; + readIndex += SB; + } + + float temp = blockReduce(sum); + + if (tid == 0) { + output[blockIdx.x * FS + blockIdx.y] = temp; + } +} diff --git a/SpeechT5/fairseq/fairseq/modules/lightconv_layer/lightconv_layer.py b/SpeechT5/fairseq/fairseq/modules/lightconv_layer/lightconv_layer.py new file mode 100644 index 0000000000000000000000000000000000000000..e7e597f4749c591b057d776aacec39b44d99c037 --- /dev/null +++ b/SpeechT5/fairseq/fairseq/modules/lightconv_layer/lightconv_layer.py @@ -0,0 +1,137 @@ +# Copyright (c) Facebook, Inc. and its affiliates. 
+# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +import lightconv_cuda +import torch +import torch.nn.functional as F +from fairseq import utils +from fairseq.incremental_decoding_utils import with_incremental_state +from fairseq.modules.fairseq_dropout import FairseqDropout +from torch import nn +from torch.autograd import Function + + +class lightconvFunction(Function): + @staticmethod + def forward(ctx, x, weights, padding_l): + ctx.padding_l = padding_l + outputs = lightconv_cuda.forward(x, weights, padding_l) + variables = [x, weights] + ctx.save_for_backward(*variables) + return outputs[0] + + @staticmethod + def backward(ctx, grad_output): + outputs = lightconv_cuda.backward( + grad_output.contiguous(), ctx.padding_l, *ctx.saved_tensors + ) + grad_input, grad_weights = outputs + return grad_input, grad_weights, None + + +@with_incremental_state +class LightconvLayer(nn.Module): + def __init__( + self, + input_size, + kernel_size=1, + padding_l=None, + weight_softmax=False, + num_heads=1, + weight_dropout=0.0, + bias=False, + ): + super(LightconvLayer, self).__init__() + self.input_size = input_size + self.kernel_size = kernel_size + self.padding_l = padding_l + self.num_heads = num_heads + self.weight_softmax = weight_softmax + self.weight_dropout_module = FairseqDropout( + weight_dropout, module_name=self.__class__.__name__ + ) + + self.weight = nn.Parameter(torch.Tensor(num_heads, kernel_size)) + if bias: + self.bias = nn.Parameter(torch.Tensor(input_size)) + else: + self.bias = None + self.reset_parameters() + + def upgrade_state_dict_named(self, state_dict, name): + prefix = name + "." if name != "" else "" + for k, v in state_dict.items(): + if k.endswith(prefix + "weight"): + if v.dim() == 3 and v.size(1) == 1: + state_dict[k] = v.squeeze(1) + + def reset_parameters(self): + nn.init.xavier_uniform_(self.weight) + if self.bias is not None: + nn.init.constant_(self.bias, 0.0) + + def forward(self, x, incremental_state=None): + + # during inference time, incremental BMM is faster + if incremental_state is not None: + T, B, C = x.size() + K, H = self.kernel_size, self.num_heads + R = C // H + input_buffer = self._get_input_buffer(incremental_state) + if input_buffer is None: + input_buffer = x.new() + x_unfold = torch.cat([input_buffer, x.unsqueeze(3)], dim=3) + if self.kernel_size > 1: + self._set_input_buffer( + incremental_state, x_unfold[:, :, :, -self.kernel_size + 1 :] + ) + x_unfold = x_unfold.view(T * B * H, R, -1) + + weight = self.weight + if self.weight_softmax: + weight = F.softmax(weight.float(), dim=1).type_as(weight) + + weight = weight[:, -x_unfold.size(2) :] + + K = weight.size(1) + + weight = ( + weight.view(1, H, K) + .expand(T * B, H, K) + .contiguous() + .view(T * B * H, K, 1) + ) + + weight = self.weight_dropout_module(weight) + output = torch.bmm(x_unfold, weight) # T*B*H x R x 1 + output = output.view(T, B, C) + return output + + # during training time, use CUDA kernel + else: + x = x.permute(1, 2, 0).contiguous() + weight = self.weight + if self.weight_softmax: + weight = F.softmax(self.weight, -1) + if self.weight_dropout_module.p: + weight = self.weight_dropout_module(weight) + return lightconvFunction.apply(x, weight, self.padding_l).permute(2, 0, 1) + + def reorder_incremental_state(self, incremental_state, new_order): + input_buffer = self._get_input_buffer(incremental_state) + if input_buffer is not None: + input_buffer = input_buffer.index_select(1, new_order) + 
self._set_input_buffer(incremental_state, input_buffer) + + def _get_input_buffer(self, incremental_state): + return utils.get_incremental_state(self, incremental_state, "input_buffer") + + def _set_input_buffer(self, incremental_state, new_buffer): + return utils.set_incremental_state( + self, incremental_state, "input_buffer", new_buffer + ) + + def half(self): + return self._apply(lambda t: t.half() if t.is_floating_point() else t) diff --git a/SpeechT5/fairseq/fairseq/modules/lightconv_layer/setup.py b/SpeechT5/fairseq/fairseq/modules/lightconv_layer/setup.py new file mode 100644 index 0000000000000000000000000000000000000000..052635be79b466d0ad56cf5cf607bd10c2297ecf --- /dev/null +++ b/SpeechT5/fairseq/fairseq/modules/lightconv_layer/setup.py @@ -0,0 +1,23 @@ +#!/usr/bin/env python3 +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +from setuptools import setup +from torch.utils.cpp_extension import BuildExtension, CUDAExtension + + +setup( + name="lightconv_layer", + ext_modules=[ + CUDAExtension( + "lightconv_cuda", + [ + "lightconv_cuda.cpp", + "lightconv_cuda_kernel.cu", + ], + ), + ], + cmdclass={"build_ext": BuildExtension}, +) diff --git a/SpeechT5/fairseq/fairseq/modules/lightweight_convolution.py b/SpeechT5/fairseq/fairseq/modules/lightweight_convolution.py new file mode 100644 index 0000000000000000000000000000000000000000..ec11a9507951c9e8f3564753841dd9c74a4900e0 --- /dev/null +++ b/SpeechT5/fairseq/fairseq/modules/lightweight_convolution.py @@ -0,0 +1,310 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +import torch +import torch.nn as nn +import torch.nn.functional as F +from fairseq import utils +from fairseq.incremental_decoding_utils import with_incremental_state +from fairseq.modules.fairseq_dropout import FairseqDropout +from fairseq.modules.unfold import unfold1d + + +def LightweightConv( + input_size, + kernel_size=1, + padding_l=None, + num_heads=1, + weight_dropout=0.0, + weight_softmax=False, + bias=False, +): + if torch.cuda.is_available(): + try: + from fairseq.modules.lightconv_layer import LightconvLayer + + return LightconvLayer( + input_size, + kernel_size=kernel_size, + padding_l=padding_l, + num_heads=num_heads, + weight_dropout=weight_dropout, + weight_softmax=weight_softmax, + bias=bias, + ) + except ImportError as e: + print(e) + return LightweightConv1dTBC( + input_size, + kernel_size=kernel_size, + padding_l=padding_l, + num_heads=num_heads, + weight_dropout=weight_dropout, + weight_softmax=weight_softmax, + bias=bias, + ) + + +class LightweightConv1d(nn.Module): + """Lightweight Convolution assuming the input is BxCxT + This is just an example that explains LightConv clearer than the TBC version. + We don't use this module in the model. + + Args: + input_size: # of channels of the input and output + kernel_size: convolution channels + padding: padding + num_heads: number of heads used. The weight is of shape + `(num_heads, 1, kernel_size)` + weight_softmax: normalize the weight with softmax before the convolution + + Shape: + Input: BxCxT, i.e. (batch_size, input_size, timesteps) + Output: BxCxT, i.e. 
(batch_size, input_size, timesteps) + + Attributes: + weight: the learnable weights of the module of shape + `(num_heads, 1, kernel_size)` + bias: the learnable bias of the module of shape `(input_size)` + """ + + def __init__( + self, + input_size, + kernel_size=1, + padding=0, + num_heads=1, + weight_softmax=False, + bias=False, + weight_dropout=0.0, + ): + super().__init__() + self.input_size = input_size + self.kernel_size = kernel_size + self.num_heads = num_heads + self.padding = padding + self.weight_softmax = weight_softmax + self.weight = nn.Parameter(torch.Tensor(num_heads, 1, kernel_size)) + + if bias: + self.bias = nn.Parameter(torch.Tensor(input_size)) + else: + self.bias = None + self.weight_dropout_module = FairseqDropout( + weight_dropout, module_name=self.__class__.__name__ + ) + self.reset_parameters() + + def reset_parameters(self): + nn.init.xavier_uniform_(self.weight) + if self.bias is not None: + nn.init.constant_(self.bias, 0.0) + + def forward(self, input): + """ + input size: B x C x T + output size: B x C x T + """ + B, C, T = input.size() + H = self.num_heads + + weight = self.weight + if self.weight_softmax: + weight = F.softmax(weight, dim=-1) + + weight = self.weight_dropout_module(weight) + # Merge every C/H entries into the batch dimension (C = self.input_size) + # B x C x T -> (B * C/H) x H x T + # One can also expand the weight to C x 1 x K by a factor of C/H + # and do not reshape the input instead, which is slow though + input = input.view(-1, H, T) + output = F.conv1d(input, weight, padding=self.padding, groups=self.num_heads) + output = output.view(B, C, T) + if self.bias is not None: + output = output + self.bias.view(1, -1, 1) + + return output + + +@with_incremental_state +class LightweightConv1dTBC(nn.Module): + """Lightweight Convolution assuming the input is TxBxC + Args: + input_size: # of channels of the input + kernel_size: convolution channels + padding_l: padding to the left when using "same" padding + num_heads: number of heads used. The weight is of shape (num_heads, 1, kernel_size) + weight_dropout: the drop rate of the DropConnect to drop the weight + weight_softmax: normalize the weight with softmax before the convolution + bias: use bias + + Shape: + Input: TxBxC, i.e. (timesteps, batch_size, input_size) + Output: TxBxC, i.e. (timesteps, batch_size, input_size) + + Attributes: + weight: the learnable weights of the module of shape + `(num_heads, 1, kernel_size)` + bias: the learnable bias of the module of shape `(input_size)` + """ + + def __init__( + self, + input_size, + kernel_size=1, + padding_l=None, + num_heads=1, + weight_dropout=0.0, + weight_softmax=False, + bias=False, + ): + super().__init__() + self.input_size = input_size + self.kernel_size = kernel_size + self.padding_l = padding_l + self.num_heads = num_heads + self.weight_dropout_module = FairseqDropout( + weight_dropout, module_name=self.__class__.__name__ + ) + self.weight_softmax = weight_softmax + + self.weight = nn.Parameter(torch.Tensor(num_heads, 1, kernel_size)) + if bias: + self.bias = nn.Parameter(torch.Tensor(input_size)) + else: + self.bias = None + + self.reset_parameters() + self.onnx_trace = False + + def reset_parameters(self): + nn.init.xavier_uniform_(self.weight) + if self.bias is not None: + nn.init.constant_(self.bias, 0.0) + + def forward(self, x, incremental_state=None, unfold=False): + """Assuming the input, x, of the shape T x B x C and producing an output in the shape T x B x C + args: + x: Input of shape T x B x C, i.e. 
(timesteps, batch_size, input_size) + incremental_state: A dict to keep the state + unfold: unfold the input or not. If not, we use the matrix trick instead + """ + unfold = unfold or (incremental_state is not None) + + if unfold: + output = self._forward_unfolded(x, incremental_state) + else: + output = self._forward_expanded(x, incremental_state) + + if self.bias is not None: + output = output + self.bias.view(1, 1, -1) + return output + + def prepare_for_onnx_export_(self): + self.onnx_trace = True + + def _forward_unfolded(self, x, incremental_state): + """The conventional implementation of convolutions. + Unfolding the input by having a window shifting to the right.""" + T, B, C = x.size() + K, H = self.kernel_size, self.num_heads + R = C // H + assert R * H == C == self.input_size + + weight = self.weight.view(H, K) + if incremental_state is not None: + input_buffer = self._get_input_buffer(incremental_state) + if input_buffer is None: + input_buffer = x.new() + x_unfold = torch.cat([input_buffer, x.unsqueeze(3)], dim=3) + if self.kernel_size > 1: + self._set_input_buffer( + incremental_state, x_unfold[:, :, :, -self.kernel_size + 1 :] + ) + x_unfold = x_unfold.view(T * B * H, R, -1) + else: + # unfold the input: T x B x C --> T' x B x C x K + x_unfold = unfold1d(x, self.kernel_size, self.padding_l, 0) + x_unfold = x_unfold.view(T * B * H, R, K) + + if self.weight_softmax: + weight = utils.softmax(weight, dim=1, onnx_trace=self.onnx_trace).type_as( + weight + ) + + if incremental_state is not None: + weight = weight[:, -x_unfold.size(2) :] + K = weight.size(1) + + weight = ( + weight.view(1, H, K).expand(T * B, H, K).contiguous().view(T * B * H, K, 1) + ) + + weight = self.weight_dropout_module(weight) + output = torch.bmm(x_unfold, weight) # T*B*H x R x 1 + output = output.view(T, B, C) + return output + + def _forward_expanded(self, x, incremental_state): + """Turn the convolution filters into band matrices and do matrix multiplication. + This is faster when the sequence is short, but less memory efficient. + This is not used in the decoder during inference. 
+ """ + T, B, C = x.size() + K, H = self.kernel_size, self.num_heads + R = C // H + assert R * H == C == self.input_size + + weight = self.weight.view(H, K) + if self.weight_softmax: + weight = utils.softmax(weight, dim=1, onnx_trace=self.onnx_trace).type_as( + weight + ) + weight = weight.view(1, H, K).expand(T * B, H, K).contiguous() + weight = weight.view(T, B * H, K).transpose(0, 1) + + x = x.view(T, B * H, R).transpose(0, 1) + P = self.padding_l + if K > T and P == K - 1: + weight = weight.narrow(2, K - T, T) + K, P = T, T - 1 + # turn the convolution filters into band matrices + weight_expanded = weight.new_zeros(B * H, T, T + K - 1, requires_grad=False) + weight_expanded.as_strided((B * H, T, K), (T * (T + K - 1), T + K, 1)).copy_( + weight + ) + weight_expanded = weight_expanded.narrow(2, P, T) + weight_expanded = self.weight_dropout_module(weight_expanded) + + output = torch.bmm(weight_expanded, x) + output = output.transpose(0, 1).contiguous().view(T, B, C) + return output + + def reorder_incremental_state(self, incremental_state, new_order): + input_buffer = self._get_input_buffer(incremental_state) + if input_buffer is not None: + input_buffer = input_buffer.index_select(1, new_order) + self._set_input_buffer(incremental_state, input_buffer) + + def _get_input_buffer(self, incremental_state): + return utils.get_incremental_state(self, incremental_state, "input_buffer") + + def _set_input_buffer(self, incremental_state, new_buffer): + return utils.set_incremental_state( + self, incremental_state, "input_buffer", new_buffer + ) + + def extra_repr(self): + s = "{}, kernel_size={}, padding_l={}, num_heads={}, weight_softmax={}, bias={}".format( + self.input_size, + self.kernel_size, + self.padding_l, + self.num_heads, + self.weight_softmax, + self.bias is not None, + ) + if self.weight_dropout_module.p > 0.0: + s += ", weight_dropout={}".format(self.weight_dropout_module.p) + return s diff --git a/SpeechT5/fairseq/fairseq/modules/linearized_convolution.py b/SpeechT5/fairseq/fairseq/modules/linearized_convolution.py new file mode 100644 index 0000000000000000000000000000000000000000..f7e156cb0c75cb375447859c8b6749311372c35e --- /dev/null +++ b/SpeechT5/fairseq/fairseq/modules/linearized_convolution.py @@ -0,0 +1,110 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +import torch +import torch.nn.functional as F +from fairseq import utils +from fairseq.incremental_decoding_utils import with_incremental_state + +from .conv_tbc import ConvTBC + +from typing import Dict, Optional +from torch import Tensor + +@with_incremental_state +class LinearizedConvolution(ConvTBC): + """An optimized version of nn.Conv1d. + + At training time, this module uses ConvTBC, which is an optimized version + of Conv1d. At inference time, it optimizes incremental generation (i.e., + one time step at a time) by replacing the convolutions with linear layers. + Note that the input order changes from training to inference. 
+ """ + + def __init__(self, in_channels, out_channels, kernel_size, **kwargs): + super().__init__(in_channels, out_channels, kernel_size, **kwargs) + self._linearized_weight = None + self.register_backward_hook(self._clear_linearized_weight) + + def state_dict(self, destination=None, prefix="", keep_vars=False): + state = ConvTBC.state_dict(self, destination, prefix, keep_vars=keep_vars) + # don't store redundant _linearized_weight in checkpoints + if prefix + "_linearized_weight" in state: + del state[prefix + "_linearized_weight"] + return state + + def upgrade_state_dict_named(self, state_dict, name): + prefix = name + "." if name != "" else "" + if prefix + "_linearized_weight" in state_dict: + del state_dict[prefix + "_linearized_weight"] + + @torch.jit.export + def forward(self, input, incremental_state: Optional[Dict[str, Dict[str, Optional[Tensor]]]] = None): + """ + Args: + incremental_state: Used to buffer signal; if not None, then input is + expected to contain a single frame. If the input order changes + between time steps, call reorder_incremental_state. + Input: + Time x Batch x Channel during training + Batch x Time x Channel during inference + """ + if incremental_state is None: + output = self.conv_tbc(input) + if self.kernel_size[0] > 1 and self.padding[0] > 0: + # remove future timesteps added by padding + output = output[: -self.padding[0], :, :] + return output + + # reshape weight + weight = self._get_linearized_weight() + kw = self.kernel_size[0] + + bsz = input.size(0) # input: bsz x len x dim + if kw > 1: + input = input.data + input_buffer = self._get_input_buffer(incremental_state) + if input_buffer is None: + input_buffer = input.new(bsz, kw, input.size(2)).zero_() + self._set_input_buffer(incremental_state, input_buffer) + else: + # shift buffer + input_buffer[:, :-1, :] = input_buffer[:, 1:, :].clone() + # append next input + input_buffer[:, -1, :] = input[:, -1, :] + input = input_buffer + with torch.no_grad(): + output = F.linear(input.view(bsz, -1), weight, self.bias) + return output.view(bsz, 1, -1) + + @torch.jit.unused + def reorder_incremental_state(self, incremental_state: Optional[Dict[str, Dict[str, Optional[Tensor]]]], new_order): + input_buffer = self._get_input_buffer(incremental_state) + if input_buffer is not None: + input_buffer = input_buffer.index_select(0, new_order) + self._set_input_buffer(incremental_state, input_buffer) + + @torch.jit.unused + def _get_input_buffer(self, incremental_state: Optional[Dict[str, Dict[str, Optional[Tensor]]]]): + return utils.get_incremental_state(self, incremental_state, "input_buffer") + + @torch.jit.unused + def _set_input_buffer(self, incremental_state: Optional[Dict[str, Dict[str, Optional[Tensor]]]], new_buffer): + return utils.set_incremental_state( + self, incremental_state, "input_buffer", new_buffer + ) + + @torch.jit.unused + def _get_linearized_weight(self): + if self._linearized_weight is None: + kw = self.kernel_size[0] + weight = self.weight.transpose(2, 1).transpose(1, 0).contiguous() + assert weight.size() == (self.out_channels, kw, self.in_channels) + return weight.view(self.out_channels, -1) + return self._linearized_weight + + @torch.jit.unused + def _clear_linearized_weight(self, *args): + self._linearized_weight = None diff --git a/SpeechT5/fairseq/fairseq/modules/multihead_attention.py b/SpeechT5/fairseq/fairseq/modules/multihead_attention.py new file mode 100644 index 0000000000000000000000000000000000000000..9bdca0f6af43a0a89e9225594ba5b6fbc5ee04c1 --- /dev/null +++ 
b/SpeechT5/fairseq/fairseq/modules/multihead_attention.py @@ -0,0 +1,500 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +import math +from typing import Dict, Optional, Tuple + +import torch +import torch.nn.functional as F +from fairseq import utils +from fairseq.incremental_decoding_utils import with_incremental_state +from fairseq.modules.fairseq_dropout import FairseqDropout +from fairseq.modules.quant_noise import quant_noise +from torch import Tensor, nn +from torch.nn import Parameter + + +@with_incremental_state +class MultiheadAttention(nn.Module): + """Multi-headed attention. + + See "Attention Is All You Need" for more details. + """ + + def __init__( + self, + embed_dim, + num_heads, + kdim=None, + vdim=None, + dropout=0.0, + bias=True, + add_bias_kv=False, + add_zero_attn=False, + self_attention=False, + encoder_decoder_attention=False, + q_noise=0.0, + qn_block_size=8, + ): + super().__init__() + self.embed_dim = embed_dim + self.kdim = kdim if kdim is not None else embed_dim + self.vdim = vdim if vdim is not None else embed_dim + self.qkv_same_dim = self.kdim == embed_dim and self.vdim == embed_dim + + self.num_heads = num_heads + self.dropout_module = FairseqDropout( + dropout, module_name=self.__class__.__name__ + ) + + self.head_dim = embed_dim // num_heads + assert ( + self.head_dim * num_heads == self.embed_dim + ), "embed_dim must be divisible by num_heads" + self.scaling = self.head_dim ** -0.5 + + self.self_attention = self_attention + self.encoder_decoder_attention = encoder_decoder_attention + + assert not self.self_attention or self.qkv_same_dim, ( + "Self-attention requires query, key and " "value to be of the same size" + ) + + self.k_proj = quant_noise( + nn.Linear(self.kdim, embed_dim, bias=bias), q_noise, qn_block_size + ) + self.v_proj = quant_noise( + nn.Linear(self.vdim, embed_dim, bias=bias), q_noise, qn_block_size + ) + self.q_proj = quant_noise( + nn.Linear(embed_dim, embed_dim, bias=bias), q_noise, qn_block_size + ) + + self.out_proj = quant_noise( + nn.Linear(embed_dim, embed_dim, bias=bias), q_noise, qn_block_size + ) + + if add_bias_kv: + self.bias_k = Parameter(torch.Tensor(1, 1, embed_dim)) + self.bias_v = Parameter(torch.Tensor(1, 1, embed_dim)) + else: + self.bias_k = self.bias_v = None + + self.add_zero_attn = add_zero_attn + + self.reset_parameters() + + self.onnx_trace = False + + def prepare_for_onnx_export_(self): + self.onnx_trace = True + + def reset_parameters(self): + if self.qkv_same_dim: + # Empirically observed the convergence to be much better with + # the scaled initialization + nn.init.xavier_uniform_(self.k_proj.weight, gain=1 / math.sqrt(2)) + nn.init.xavier_uniform_(self.v_proj.weight, gain=1 / math.sqrt(2)) + nn.init.xavier_uniform_(self.q_proj.weight, gain=1 / math.sqrt(2)) + else: + nn.init.xavier_uniform_(self.k_proj.weight) + nn.init.xavier_uniform_(self.v_proj.weight) + nn.init.xavier_uniform_(self.q_proj.weight) + + nn.init.xavier_uniform_(self.out_proj.weight) + if self.out_proj.bias is not None: + nn.init.constant_(self.out_proj.bias, 0.0) + if self.bias_k is not None: + nn.init.xavier_normal_(self.bias_k) + if self.bias_v is not None: + nn.init.xavier_normal_(self.bias_v) + + def forward( + self, + query, + key: Optional[Tensor], + value: Optional[Tensor], + key_padding_mask: Optional[Tensor] = None, + incremental_state: Optional[Dict[str, Dict[str, Optional[Tensor]]]] = None, + 
need_weights: bool = True, + static_kv: bool = False, + attn_mask: Optional[Tensor] = None, + before_softmax: bool = False, + need_head_weights: bool = False, + ) -> Tuple[Tensor, Optional[Tensor]]: + """Input shape: Time x Batch x Channel + + Args: + key_padding_mask (ByteTensor, optional): mask to exclude + keys that are pads, of shape `(batch, src_len)`, where + padding elements are indicated by 1s. + need_weights (bool, optional): return the attention weights, + averaged over heads (default: False). + attn_mask (ByteTensor, optional): typically used to + implement causal attention, where the mask prevents the + attention from looking forward in time (default: None). + before_softmax (bool, optional): return the raw attention + weights and values before the attention softmax. + need_head_weights (bool, optional): return the attention + weights for each head. Implies *need_weights*. Default: + return the average attention weights over all heads. + """ + if need_head_weights: + need_weights = True + + is_tpu = query.device.type == "xla" + + tgt_len, bsz, embed_dim = query.size() + src_len = tgt_len + assert embed_dim == self.embed_dim + assert list(query.size()) == [tgt_len, bsz, embed_dim] + if key is not None: + src_len, key_bsz, _ = key.size() + if not torch.jit.is_scripting(): + assert key_bsz == bsz + assert value is not None + assert src_len, bsz == value.shape[:2] + + if ( + not self.onnx_trace + and not is_tpu # don't use PyTorch version on TPUs + and incremental_state is None + and not static_kv + # A workaround for quantization to work. Otherwise JIT compilation + # treats bias in linear module as method. + and not torch.jit.is_scripting() + ): + assert key is not None and value is not None + return F.multi_head_attention_forward( + query, + key, + value, + self.embed_dim, + self.num_heads, + torch.empty([0]), + torch.cat((self.q_proj.bias, self.k_proj.bias, self.v_proj.bias)), + self.bias_k, + self.bias_v, + self.add_zero_attn, + self.dropout_module.p, + self.out_proj.weight, + self.out_proj.bias, + self.training or self.dropout_module.apply_during_inference, + key_padding_mask, + need_weights, + attn_mask, + use_separate_proj_weight=True, + q_proj_weight=self.q_proj.weight, + k_proj_weight=self.k_proj.weight, + v_proj_weight=self.v_proj.weight, + ) + + if incremental_state is not None: + saved_state = self._get_input_buffer(incremental_state) + if saved_state is not None and "prev_key" in saved_state: + # previous time steps are cached - no need to recompute + # key and value if they are static + if static_kv: + assert self.encoder_decoder_attention and not self.self_attention + key = value = None + else: + saved_state = None + + if self.self_attention: + q = self.q_proj(query) + k = self.k_proj(query) + v = self.v_proj(query) + elif self.encoder_decoder_attention: + # encoder-decoder attention + q = self.q_proj(query) + if key is None: + assert value is None + k = v = None + else: + k = self.k_proj(key) + v = self.v_proj(key) + + else: + assert key is not None and value is not None + q = self.q_proj(query) + k = self.k_proj(key) + v = self.v_proj(value) + q *= self.scaling + + if self.bias_k is not None: + assert self.bias_v is not None + k = torch.cat([k, self.bias_k.repeat(1, bsz, 1)]) + v = torch.cat([v, self.bias_v.repeat(1, bsz, 1)]) + if attn_mask is not None: + attn_mask = torch.cat( + [attn_mask, attn_mask.new_zeros(attn_mask.size(0), 1)], dim=1 + ) + if key_padding_mask is not None: + key_padding_mask = torch.cat( + [ + key_padding_mask, + 
key_padding_mask.new_zeros(key_padding_mask.size(0), 1), + ], + dim=1, + ) + + q = ( + q.contiguous() + .view(tgt_len, bsz * self.num_heads, self.head_dim) + .transpose(0, 1) + ) + if k is not None: + k = ( + k.contiguous() + .view(-1, bsz * self.num_heads, self.head_dim) + .transpose(0, 1) + ) + if v is not None: + v = ( + v.contiguous() + .view(-1, bsz * self.num_heads, self.head_dim) + .transpose(0, 1) + ) + + if saved_state is not None: + # saved states are stored with shape (bsz, num_heads, seq_len, head_dim) + if "prev_key" in saved_state: + _prev_key = saved_state["prev_key"] + assert _prev_key is not None + prev_key = _prev_key.view(bsz * self.num_heads, -1, self.head_dim) + if static_kv: + k = prev_key + else: + assert k is not None + k = torch.cat([prev_key, k], dim=1) + src_len = k.size(1) + if "prev_value" in saved_state: + _prev_value = saved_state["prev_value"] + assert _prev_value is not None + prev_value = _prev_value.view(bsz * self.num_heads, -1, self.head_dim) + if static_kv: + v = prev_value + else: + assert v is not None + v = torch.cat([prev_value, v], dim=1) + prev_key_padding_mask: Optional[Tensor] = None + if "prev_key_padding_mask" in saved_state: + prev_key_padding_mask = saved_state["prev_key_padding_mask"] + assert k is not None and v is not None + key_padding_mask = MultiheadAttention._append_prev_key_padding_mask( + key_padding_mask=key_padding_mask, + prev_key_padding_mask=prev_key_padding_mask, + batch_size=bsz, + src_len=k.size(1), + static_kv=static_kv, + ) + + saved_state["prev_key"] = k.view(bsz, self.num_heads, -1, self.head_dim) + saved_state["prev_value"] = v.view(bsz, self.num_heads, -1, self.head_dim) + saved_state["prev_key_padding_mask"] = key_padding_mask + # In this branch incremental_state is never None + assert incremental_state is not None + incremental_state = self._set_input_buffer(incremental_state, saved_state) + assert k is not None + assert k.size(1) == src_len + + # This is part of a workaround to get around fork/join parallelism + # not supporting Optional types. 
+ if key_padding_mask is not None and key_padding_mask.dim() == 0: + key_padding_mask = None + + if key_padding_mask is not None: + assert key_padding_mask.size(0) == bsz + assert key_padding_mask.size(1) == src_len + + if self.add_zero_attn: + assert v is not None + src_len += 1 + k = torch.cat([k, k.new_zeros((k.size(0), 1) + k.size()[2:])], dim=1) + v = torch.cat([v, v.new_zeros((v.size(0), 1) + v.size()[2:])], dim=1) + if attn_mask is not None: + attn_mask = torch.cat( + [attn_mask, attn_mask.new_zeros(attn_mask.size(0), 1)], dim=1 + ) + if key_padding_mask is not None: + key_padding_mask = torch.cat( + [ + key_padding_mask, + torch.zeros(key_padding_mask.size(0), 1).type_as( + key_padding_mask + ), + ], + dim=1, + ) + + attn_weights = torch.bmm(q, k.transpose(1, 2)) + attn_weights = self.apply_sparse_mask(attn_weights, tgt_len, src_len, bsz) + + assert list(attn_weights.size()) == [bsz * self.num_heads, tgt_len, src_len] + + if attn_mask is not None: + attn_mask = attn_mask.unsqueeze(0) + if self.onnx_trace: + attn_mask = attn_mask.repeat(attn_weights.size(0), 1, 1) + attn_weights += attn_mask + + if key_padding_mask is not None: + # don't attend to padding symbols + attn_weights = attn_weights.view(bsz, self.num_heads, tgt_len, src_len) + if not is_tpu: + attn_weights = attn_weights.masked_fill( + key_padding_mask.unsqueeze(1).unsqueeze(2).to(torch.bool), + float("-inf"), + ) + else: + attn_weights = attn_weights.transpose(0, 2) + attn_weights = attn_weights.masked_fill(key_padding_mask, float("-inf")) + attn_weights = attn_weights.transpose(0, 2) + attn_weights = attn_weights.view(bsz * self.num_heads, tgt_len, src_len) + + if before_softmax: + return attn_weights, v + + attn_weights_float = utils.softmax( + attn_weights, dim=-1, onnx_trace=self.onnx_trace + ) + attn_weights = attn_weights_float.type_as(attn_weights) + attn_probs = self.dropout_module(attn_weights) + + assert v is not None + attn = torch.bmm(attn_probs, v) + assert list(attn.size()) == [bsz * self.num_heads, tgt_len, self.head_dim] + if self.onnx_trace and attn.size(1) == 1: + # when ONNX tracing a single decoder step (sequence length == 1) + # the transpose is a no-op copy before view, thus unnecessary + attn = attn.contiguous().view(tgt_len, bsz, embed_dim) + else: + attn = attn.transpose(0, 1).contiguous().view(tgt_len, bsz, embed_dim) + attn = self.out_proj(attn) + attn_weights: Optional[Tensor] = None + if need_weights: + attn_weights = attn_weights_float.view( + bsz, self.num_heads, tgt_len, src_len + ).transpose(1, 0) + if not need_head_weights: + # average attention weights over heads + attn_weights = attn_weights.mean(dim=0) + + return attn, attn_weights + + @staticmethod + def _append_prev_key_padding_mask( + key_padding_mask: Optional[Tensor], + prev_key_padding_mask: Optional[Tensor], + batch_size: int, + src_len: int, + static_kv: bool, + ) -> Optional[Tensor]: + # saved key padding masks have shape (bsz, seq_len) + if prev_key_padding_mask is not None and static_kv: + new_key_padding_mask = prev_key_padding_mask + elif prev_key_padding_mask is not None and key_padding_mask is not None: + new_key_padding_mask = torch.cat( + [prev_key_padding_mask.float(), key_padding_mask.float()], dim=1 + ) + # During incremental decoding, as the padding token enters and + # leaves the frame, there will be a time when prev or current + # is None + elif prev_key_padding_mask is not None: + if src_len > prev_key_padding_mask.size(1): + filler = torch.zeros( + (batch_size, src_len - prev_key_padding_mask.size(1)), + 
device=prev_key_padding_mask.device, + ) + new_key_padding_mask = torch.cat( + [prev_key_padding_mask.float(), filler.float()], dim=1 + ) + else: + new_key_padding_mask = prev_key_padding_mask.float() + elif key_padding_mask is not None: + if src_len > key_padding_mask.size(1): + filler = torch.zeros( + (batch_size, src_len - key_padding_mask.size(1)), + device=key_padding_mask.device, + ) + new_key_padding_mask = torch.cat( + [filler.float(), key_padding_mask.float()], dim=1 + ) + else: + new_key_padding_mask = key_padding_mask.float() + else: + new_key_padding_mask = prev_key_padding_mask + return new_key_padding_mask + + @torch.jit.export + def reorder_incremental_state( + self, + incremental_state: Dict[str, Dict[str, Optional[Tensor]]], + new_order: Tensor, + ): + """Reorder buffered internal state (for incremental generation).""" + input_buffer = self._get_input_buffer(incremental_state) + if input_buffer is not None: + for k in input_buffer.keys(): + input_buffer_k = input_buffer[k] + if input_buffer_k is not None: + if self.encoder_decoder_attention and input_buffer_k.size( + 0 + ) == new_order.size(0): + break + input_buffer[k] = input_buffer_k.index_select(0, new_order) + incremental_state = self._set_input_buffer(incremental_state, input_buffer) + return incremental_state + + def _get_input_buffer( + self, incremental_state: Optional[Dict[str, Dict[str, Optional[Tensor]]]] + ) -> Dict[str, Optional[Tensor]]: + result = self.get_incremental_state(incremental_state, "attn_state") + if result is not None: + return result + else: + empty_result: Dict[str, Optional[Tensor]] = {} + return empty_result + + def _set_input_buffer( + self, + incremental_state: Dict[str, Dict[str, Optional[Tensor]]], + buffer: Dict[str, Optional[Tensor]], + ): + return self.set_incremental_state(incremental_state, "attn_state", buffer) + + def apply_sparse_mask(self, attn_weights, tgt_len: int, src_len: int, bsz: int): + return attn_weights + + def upgrade_state_dict_named(self, state_dict, name): + prefix = name + "." if name != "" else "" + items_to_add = {} + keys_to_remove = [] + for k in state_dict.keys(): + if k.endswith(prefix + "in_proj_weight"): + # in_proj_weight used to be q + k + v with same dimensions + dim = int(state_dict[k].shape[0] / 3) + items_to_add[prefix + "q_proj.weight"] = state_dict[k][:dim] + items_to_add[prefix + "k_proj.weight"] = state_dict[k][dim : 2 * dim] + items_to_add[prefix + "v_proj.weight"] = state_dict[k][2 * dim :] + + keys_to_remove.append(k) + + k_bias = prefix + "in_proj_bias" + if k_bias in state_dict.keys(): + dim = int(state_dict[k].shape[0] / 3) + items_to_add[prefix + "q_proj.bias"] = state_dict[k_bias][:dim] + items_to_add[prefix + "k_proj.bias"] = state_dict[k_bias][ + dim : 2 * dim + ] + items_to_add[prefix + "v_proj.bias"] = state_dict[k_bias][2 * dim :] + + keys_to_remove.append(prefix + "in_proj_bias") + + for k in keys_to_remove: + del state_dict[k] + + for key, value in items_to_add.items(): + state_dict[key] = value diff --git a/SpeechT5/fairseq/fairseq/modules/positional_embedding.py b/SpeechT5/fairseq/fairseq/modules/positional_embedding.py new file mode 100644 index 0000000000000000000000000000000000000000..8e94e35edb46bf9dea911fe74577d8ecbe9b5ff1 --- /dev/null +++ b/SpeechT5/fairseq/fairseq/modules/positional_embedding.py @@ -0,0 +1,35 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. 
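Editor's aside (not part of the patch): the prev_key / prev_value bookkeeping in the multi-head attention module above is easier to follow in a stripped-down, single-head sketch. Decoding one frame at a time while concatenating cached keys and values gives the same result as full-sequence causal attention; the projections and names below are invented for the example.

```python
import torch

torch.manual_seed(0)
T, D = 5, 8
q_proj, k_proj, v_proj = (torch.nn.Linear(D, D) for _ in range(3))
x = torch.randn(T, 1, D)                     # Time x Batch x Channel, one head

# full-sequence causal attention
mask = torch.triu(torch.full((T, T), float("-inf")), diagonal=1)
qf, kf, vf = q_proj(x)[:, 0], k_proj(x)[:, 0], v_proj(x)[:, 0]
full = torch.softmax(qf @ kf.T / D ** 0.5 + mask, dim=-1) @ vf

# incremental decoding: cache prev_key / prev_value and feed one frame per step
cache = {"prev_key": None, "prev_value": None}
steps = []
for t in range(T):
    frame = x[t:t + 1, 0]
    q, k, v = q_proj(frame), k_proj(frame), v_proj(frame)
    if cache["prev_key"] is not None:
        k = torch.cat([cache["prev_key"], k], dim=0)
        v = torch.cat([cache["prev_value"], v], dim=0)
    cache["prev_key"], cache["prev_value"] = k, v
    steps.append(torch.softmax(q @ k.T / D ** 0.5, dim=-1) @ v)
incremental = torch.cat(steps, dim=0)

print(torch.allclose(full, incremental, atol=1e-5))            # True
```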
+ +import torch.nn as nn + +from .learned_positional_embedding import LearnedPositionalEmbedding +from .sinusoidal_positional_embedding import SinusoidalPositionalEmbedding + + +def PositionalEmbedding( + num_embeddings: int, + embedding_dim: int, + padding_idx: int, + learned: bool = False, +): + if learned: + # if padding_idx is specified then offset the embedding ids by + # this index and adjust num_embeddings appropriately + # TODO: The right place for this offset would be inside + # LearnedPositionalEmbedding. Move this there for a cleaner implementation. + if padding_idx is not None: + num_embeddings = num_embeddings + padding_idx + 1 + m = LearnedPositionalEmbedding(num_embeddings, embedding_dim, padding_idx) + nn.init.normal_(m.weight, mean=0, std=embedding_dim ** -0.5) + if padding_idx is not None: + nn.init.constant_(m.weight[padding_idx], 0) + else: + m = SinusoidalPositionalEmbedding( + embedding_dim, + padding_idx, + init_size=num_embeddings + padding_idx + 1, + ) + return m diff --git a/SpeechT5/fairseq/fairseq/modules/quant_noise.py b/SpeechT5/fairseq/fairseq/modules/quant_noise.py new file mode 100644 index 0000000000000000000000000000000000000000..d777dfbb6c1bf6a9b769dfdaec35d5ef084c8a8b --- /dev/null +++ b/SpeechT5/fairseq/fairseq/modules/quant_noise.py @@ -0,0 +1,107 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +import torch +import torch.nn as nn + + +def quant_noise(module, p, block_size): + """ + Wraps modules and applies quantization noise to the weights for + subsequent quantization with Iterative Product Quantization as + described in "Training with Quantization Noise for Extreme Model Compression" + + Args: + - module: nn.Module + - p: amount of Quantization Noise + - block_size: size of the blocks for subsequent quantization with iPQ + + Remarks: + - Module weights must have the right sizes wrt the block size + - Only Linear, Embedding and Conv2d modules are supported for the moment + - For more detail on how to quantize by blocks with convolutional weights, + see "And the Bit Goes Down: Revisiting the Quantization of Neural Networks" + - We implement the simplest form of noise here as stated in the paper + which consists in randomly dropping blocks + """ + + # if no quantization noise, don't register hook + if p <= 0: + return module + + # supported modules + assert isinstance(module, (nn.Linear, nn.Embedding, nn.Conv2d)) + + # test whether module.weight has the right sizes wrt block_size + is_conv = module.weight.ndim == 4 + + # 2D matrix + if not is_conv: + assert ( + module.weight.size(1) % block_size == 0 + ), "Input features must be a multiple of block sizes" + + # 4D matrix + else: + # 1x1 convolutions + if module.kernel_size == (1, 1): + assert ( + module.in_channels % block_size == 0 + ), "Input channels must be a multiple of block sizes" + # regular convolutions + else: + k = module.kernel_size[0] * module.kernel_size[1] + assert k % block_size == 0, "Kernel size must be a multiple of block size" + + def _forward_pre_hook(mod, input): + # no noise for evaluation + if mod.training: + if not is_conv: + # gather weight and sizes + weight = mod.weight + in_features = weight.size(1) + out_features = weight.size(0) + + # split weight matrix into blocks and randomly drop selected blocks + mask = torch.zeros( + in_features // block_size * out_features, device=weight.device + ) + mask.bernoulli_(p) + mask = 
mask.repeat_interleave(block_size, -1).view(-1, in_features) + + else: + # gather weight and sizes + weight = mod.weight + in_channels = mod.in_channels + out_channels = mod.out_channels + + # split weight matrix into blocks and randomly drop selected blocks + if mod.kernel_size == (1, 1): + mask = torch.zeros( + int(in_channels // block_size * out_channels), + device=weight.device, + ) + mask.bernoulli_(p) + mask = mask.repeat_interleave(block_size, -1).view(-1, in_channels) + else: + mask = torch.zeros( + weight.size(0), weight.size(1), device=weight.device + ) + mask.bernoulli_(p) + mask = ( + mask.unsqueeze(2) + .unsqueeze(3) + .repeat(1, 1, mod.kernel_size[0], mod.kernel_size[1]) + ) + + # scale weights and apply mask + mask = mask.to( + torch.bool + ) # x.bool() is not currently supported in TorchScript + s = 1 / (1 - p) + mod.weight.data = s * weight.masked_fill(mask, 0) + + module.register_forward_pre_hook(_forward_pre_hook) + return module diff --git a/SpeechT5/fairseq/fairseq/modules/quantization/__init__.py b/SpeechT5/fairseq/fairseq/modules/quantization/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 diff --git a/SpeechT5/fairseq/fairseq/modules/quantization/pq/__init__.py b/SpeechT5/fairseq/fairseq/modules/quantization/pq/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..5b10b51b1b0ca21aaec96344f86a0ab9df0c22f8 --- /dev/null +++ b/SpeechT5/fairseq/fairseq/modules/quantization/pq/__init__.py @@ -0,0 +1,6 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +from .utils import SizeTracker, quantize_model_ # NOQA diff --git a/SpeechT5/fairseq/fairseq/modules/quantization/pq/em.py b/SpeechT5/fairseq/fairseq/modules/quantization/pq/em.py new file mode 100644 index 0000000000000000000000000000000000000000..6f15c3e46bd052b1e00929e7ece9355fb03846c7 --- /dev/null +++ b/SpeechT5/fairseq/fairseq/modules/quantization/pq/em.py @@ -0,0 +1,211 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +import logging +import os +import random +from collections import Counter + +import torch + + +class EM: + """ + EM algorithm used to quantize the columns of W to minimize + + ||W - W_hat||^2 + + Args: + - W: weight matrix of size (in_features x out_features) + - n_iter: number of k-means iterations + - n_centroids: number of centroids (size of codebook) + - eps: for cluster reassignment when an empty cluster is found + - max_tentatives for cluster reassignment when an empty cluster is found + - verbose: print error after each iteration + + Remarks: + - If one cluster is empty, the most populated cluster is split into + two clusters + - All the relevant dimensions are specified in the code + """ + + def __init__( + self, W, n_centroids=256, n_iter=20, eps=1e-6, max_tentatives=30, verbose=True + ): + self.W = W + self.n_centroids = n_centroids + self.n_iter = n_iter + self.eps = eps + self.max_tentatives = max_tentatives + self.verbose = verbose + self.centroids = torch.Tensor() + self.assignments = torch.Tensor() + self.objective = [] + + def initialize_centroids(self): + """ + Initializes the centroids by sampling random columns from W. 
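Editor's aside (not part of the patch): the core of the quant_noise pre-hook shown above is simply "zero out a random subset of contiguous weight blocks and rescale the survivors by 1 / (1 - p)". A standalone restatement using the same mask construction as the 2-D branch:

```python
import torch

torch.manual_seed(0)
p, block_size = 0.5, 4
weight = torch.randn(6, 8)                   # out_features x in_features

out_features, in_features = weight.size()
# one Bernoulli(p) draw per block of block_size consecutive input features
mask = torch.zeros(in_features // block_size * out_features)
mask.bernoulli_(p)
mask = mask.repeat_interleave(block_size, -1).view(-1, in_features).bool()

# drop the selected blocks and rescale the rest, as in the hook above
noised = weight.masked_fill(mask, 0) / (1 - p)
print((noised == 0).float().mean())          # roughly p of the weights are zeroed
```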
+ """ + + in_features, out_features = self.W.size() + indices = torch.randint( + low=0, high=out_features, size=(self.n_centroids,) + ).long() + self.centroids = self.W[:, indices].t() # (n_centroids x in_features) + + def step(self, i): + """ + There are two standard steps for each iteration: expectation (E) and + minimization (M). The E-step (assignment) is performed with an exhaustive + search and the M-step (centroid computation) is performed with + the exact solution. + + Args: + - i: step number + + Remarks: + - The E-step heavily uses PyTorch broadcasting to speed up computations + and reduce the memory overhead + """ + + # assignments (E-step) + distances = self.compute_distances() # (n_centroids x out_features) + self.assignments = torch.argmin(distances, dim=0) # (out_features) + n_empty_clusters = self.resolve_empty_clusters() + + # centroids (M-step) + for k in range(self.n_centroids): + W_k = self.W[:, self.assignments == k] # (in_features x size_of_cluster_k) + self.centroids[k] = W_k.mean(dim=1) # (in_features) + + # book-keeping + obj = (self.centroids[self.assignments].t() - self.W).norm(p=2).item() + self.objective.append(obj) + if self.verbose: + logging.info( + f"Iteration: {i},\t" + f"objective: {obj:.6f},\t" + f"resolved empty clusters: {n_empty_clusters}" + ) + + def resolve_empty_clusters(self): + """ + If one cluster is empty, the most populated cluster is split into + two clusters by shifting the respective centroids. This is done + iteratively for a fixed number of tentatives. + """ + + # empty clusters + counts = Counter(map(lambda x: x.item(), self.assignments)) + empty_clusters = set(range(self.n_centroids)) - set(counts.keys()) + n_empty_clusters = len(empty_clusters) + + tentatives = 0 + while len(empty_clusters) > 0: + # given an empty cluster, find most populated cluster and split it into two + k = random.choice(list(empty_clusters)) + m = counts.most_common(1)[0][0] + e = torch.randn_like(self.centroids[m]) * self.eps + self.centroids[k] = self.centroids[m].clone() + self.centroids[k] += e + self.centroids[m] -= e + + # recompute assignments + distances = self.compute_distances() # (n_centroids x out_features) + self.assignments = torch.argmin(distances, dim=0) # (out_features) + + # check for empty clusters + counts = Counter(map(lambda x: x.item(), self.assignments)) + empty_clusters = set(range(self.n_centroids)) - set(counts.keys()) + + # increment tentatives + if tentatives == self.max_tentatives: + logging.info( + f"Could not resolve all empty clusters, {len(empty_clusters)} remaining" + ) + raise EmptyClusterResolveError + tentatives += 1 + + return n_empty_clusters + + def compute_distances(self): + """ + For every centroid m, computes + + ||M - m[None, :]||_2 + + Remarks: + - We rely on PyTorch's broadcasting to speed up computations + and reduce the memory overhead + - Without chunking, the sizes in the broadcasting are modified as: + (n_centroids x n_samples x out_features) -> (n_centroids x out_features) + - The broadcasting computation is automatically chunked so that + the tensors fit into the memory of the GPU + """ + + nb_centroids_chunks = 1 + + while True: + try: + return torch.cat( + [ + (self.W[None, :, :] - centroids_c[:, :, None]).norm(p=2, dim=1) + for centroids_c in self.centroids.chunk( + nb_centroids_chunks, dim=0 + ) + ], + dim=0, + ) + except RuntimeError: + nb_centroids_chunks *= 2 + + def assign(self): + """ + Assigns each column of W to its closest centroid, thus essentially + performing the E-step in train(). 
+ + Remarks: + - The function must be called after train() or after loading + centroids using self.load(), otherwise it will return empty tensors + """ + + distances = self.compute_distances() # (n_centroids x out_features) + self.assignments = torch.argmin(distances, dim=0) # (out_features) + + def save(self, path, layer): + """ + Saves centroids and assignments. + + Args: + - path: folder used to save centroids and assignments + """ + + torch.save(self.centroids, os.path.join(path, "{}_centroids.pth".format(layer))) + torch.save( + self.assignments, os.path.join(path, "{}_assignments.pth".format(layer)) + ) + torch.save(self.objective, os.path.join(path, "{}_objective.pth".format(layer))) + + def load(self, path, layer): + """ + Loads centroids and assignments from a given path + + Args: + - path: folder use to load centroids and assignments + """ + + self.centroids = torch.load( + os.path.join(path, "{}_centroids.pth".format(layer)) + ) + self.assignments = torch.load( + os.path.join(path, "{}_assignments.pth".format(layer)) + ) + self.objective = torch.load( + os.path.join(path, "{}_objective.pth".format(layer)) + ) + + +class EmptyClusterResolveError(Exception): + pass diff --git a/SpeechT5/fairseq/fairseq/modules/quantization/pq/modules/__init__.py b/SpeechT5/fairseq/fairseq/modules/quantization/pq/modules/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..b67c8e8ad691aa01e9e10e904d69d94595387668 --- /dev/null +++ b/SpeechT5/fairseq/fairseq/modules/quantization/pq/modules/__init__.py @@ -0,0 +1,8 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +from .qconv import PQConv2d # NOQA +from .qemb import PQEmbedding # NOQA +from .qlinear import PQLinear # NOQA diff --git a/SpeechT5/fairseq/fairseq/modules/quantization/pq/modules/qconv.py b/SpeechT5/fairseq/fairseq/modules/quantization/pq/modules/qconv.py new file mode 100644 index 0000000000000000000000000000000000000000..d15ec192e8cda6265a198e583a9bf7fb194dd129 --- /dev/null +++ b/SpeechT5/fairseq/fairseq/modules/quantization/pq/modules/qconv.py @@ -0,0 +1,115 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +import numpy as np +import torch +import torch.nn as nn +import torch.nn.functional as F +from torch.nn.modules.utils import _pair + + +class PQConv2d(nn.Module): + """ + Quantized counterpart of nn.Conv2d module. Stores the centroid, the assignments + and the non-quantized biases. The full weight is re-instantiated at each forward + pass and autograd automatically computes the gradients with respect to the + centroids. + + Args: + - centroids: centroids of size n_centroids x block_size + - assignments: assignments of the centroids to the subvectors + of size self.out_channels x n_blocks + - bias: the non-quantized bias, must be either torch.Tensor or None + + Remarks: + - We refer the reader to the official documentation of the nn.Conv2d module + for the other arguments and the behavior of the module. + - Performance tests on GPU show that this implementation is 10% slower than + the non-quantized nn.Conv2d module for a standard training loop. + - During the backward, the gradients are averaged by cluster and not summed. + This explains the hook registered to the centroids. 
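Editor's aside (not part of the patch): the gradient-averaging hook mentioned in the remark above can be shown in isolation. Dividing the incoming gradient by the per-centroid block counts turns summed contributions into a mean.

```python
import torch

n_centroids, block_size = 4, 2
centroids = torch.randn(n_centroids, block_size, requires_grad=True)
assignments = torch.tensor([0, 0, 1, 2, 2, 3])
counts = torch.bincount(assignments).float()

# average (rather than sum) each centroid's gradient over the blocks that use it,
# mirroring the hook registered on self.centroids above
centroids.register_hook(lambda g: g / counts[:, None])

weight = centroids[assignments]              # the re-instantiated 6 x block_size weight
weight.sum().backward()
print(centroids.grad)                        # all ones, regardless of cluster size
```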
+ """ + + def __init__( + self, + centroids, + assignments, + bias, + in_channels, + out_channels, + kernel_size, + stride=1, + padding=0, + dilation=1, + groups=1, + padding_mode="zeros", + ): + super(PQConv2d, self).__init__() + self.block_size = centroids.size(1) + self.n_centroids = centroids.size(0) + self.in_channels = in_channels + self.out_channels = out_channels + self.kernel_size = _pair(kernel_size) + self.stride = _pair(stride) + self.padding = _pair(padding) + self.dilation = _pair(dilation) + self.groups = groups + self.padding_mode = padding_mode + # check compatibility + if in_channels // groups * np.prod(self.kernel_size) % self.block_size != 0: + raise ValueError("Wrong PQ sizes") + if len(assignments) % out_channels != 0: + raise ValueError("Wrong PQ sizes") + if in_channels % groups != 0: + raise ValueError("in_channels must be divisible by groups") + if out_channels % groups != 0: + raise ValueError("out_channels must be divisible by groups") + # define parameters + self.centroids = nn.Parameter(centroids, requires_grad=True) + self.register_buffer("assignments", assignments) + self.register_buffer("counts", torch.bincount(assignments).type_as(centroids)) + if bias is not None: + self.bias = nn.Parameter(bias) + else: + self.register_parameter("bias", None) + # register hook for averaging gradients per centroids instead of summing + self.centroids.register_hook(lambda x: x / self.counts[:, None]) + + @property + def weight(self): + return ( + self.centroids[self.assignments] + .reshape(-1, self.out_channels, self.block_size) + .permute(1, 0, 2) + .reshape( + self.out_channels, self.in_channels // self.groups, *self.kernel_size + ) + ) + + def forward(self, x): + return F.conv2d( + x, + self.weight, + self.bias, + self.stride, + self.padding, + self.dilation, + self.groups, + ) + + def extra_repr(self): + s = "{in_channels}, {out_channels}, kernel_size={kernel_size}, stride={stride}" + if self.padding != (0,) * len(self.padding): + s += ", padding={padding}" + if self.dilation != (1,) * len(self.dilation): + s += ", dilation={dilation}" + if self.groups != 1: + s += ", groups={groups}" + if self.bias is None: + s += ", bias=False" + if self.padding_mode != "zeros": + s += ", padding_mode={padding_mode}" + s += ", n_centroids={n_centroids}, block_size={block_size}" + return s.format(**self.__dict__) diff --git a/SpeechT5/fairseq/fairseq/modules/quantization/pq/modules/qemb.py b/SpeechT5/fairseq/fairseq/modules/quantization/pq/modules/qemb.py new file mode 100644 index 0000000000000000000000000000000000000000..3a74ad3c4c7c9d3203d26e7885864ba578951bfe --- /dev/null +++ b/SpeechT5/fairseq/fairseq/modules/quantization/pq/modules/qemb.py @@ -0,0 +1,107 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +import torch +import torch.nn as nn +import torch.nn.functional as F + + +class PQEmbedding(nn.Module): + """ + Quantized counterpart of nn.Embedding module. Stores the centroids and + the assignments. The full weight is re-instantiated at each forward + pass. 
+ + Args: + - centroids: centroids of size n_centroids x block_size + - assignments: assignments of the centroids to the subvectors + of size self.out_features x n_blocks + - bias: the non-quantized bias + + Remarks: + - We refer the reader to the official documentation of the nn.Embedding module + for the other arguments and the behavior of the module + - Performance tests on GPU show that this implementation is 10% slower than + the non-quantized nn.Embedding module for a standard training loop. + """ + + def __init__( + self, + centroids, + assignments, + num_embeddings, + embedding_dim, + padding_idx=None, + max_norm=None, + norm_type=2.0, + scale_grad_by_freq=False, + sparse=False, + _weight=None, + ): + super(PQEmbedding, self).__init__() + self.block_size = centroids.size(1) + self.n_centroids = centroids.size(0) + self.num_embeddings = num_embeddings + self.embedding_dim = embedding_dim + if padding_idx is not None: + if padding_idx > 0: + assert ( + padding_idx < self.num_embeddings + ), "Padding_idx must be within num_embeddings" + elif padding_idx < 0: + assert ( + padding_idx >= -self.num_embeddings + ), "Padding_idx must be within num_embeddings" + padding_idx = self.num_embeddings + padding_idx + self.padding_idx = padding_idx + self.max_norm = max_norm + self.norm_type = norm_type + self.scale_grad_by_freq = scale_grad_by_freq + self.sparse = sparse + # check compatibility + if self.embedding_dim % self.block_size != 0: + raise ValueError("Wrong PQ sizes") + if len(assignments) % self.num_embeddings != 0: + raise ValueError("Wrong PQ sizes") + # define parameters + self.centroids = nn.Parameter(centroids, requires_grad=True) + self.register_buffer("assignments", assignments) + self.register_buffer("counts", torch.bincount(assignments).type_as(centroids)) + + @property + def weight(self): + return ( + self.centroids[self.assignments] + .reshape(-1, self.num_embeddings, self.block_size) + .permute(1, 0, 2) + .flatten(1, 2) + ) + + def forward(self, input): + return F.embedding( + input, + self.weight, + self.padding_idx, + self.max_norm, + self.norm_type, + self.scale_grad_by_freq, + self.sparse, + ) + + def extra_repr(self): + s = "{num_embeddings}, {embedding_dim}" + if self.padding_idx is not None: + s += ", padding_idx={padding_idx}" + if self.max_norm is not None: + s += ", max_norm={max_norm}" + if self.norm_type != 2: + s += ", norm_type={norm_type}" + if self.scale_grad_by_freq is not False: + s += ", scale_grad_by_freq={scale_grad_by_freq}" + if self.sparse is not False: + s += ", sparse=True" + s += ", n_centroids={n_centroids}, block_size={block_size}" + + return s.format(**self.__dict__) diff --git a/SpeechT5/fairseq/fairseq/modules/quantization/pq/modules/qlinear.py b/SpeechT5/fairseq/fairseq/modules/quantization/pq/modules/qlinear.py new file mode 100644 index 0000000000000000000000000000000000000000..9bdd25a8685bb7c7b32e1f02372aaeb26d8ba53a --- /dev/null +++ b/SpeechT5/fairseq/fairseq/modules/quantization/pq/modules/qlinear.py @@ -0,0 +1,71 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +import torch +import torch.nn as nn +import torch.nn.functional as F + + +class PQLinear(nn.Module): + """ + Quantized counterpart of nn.Linear module. Stores the centroid, the assignments + and the non-quantized biases. The full weight is re-instantiated at each forward + pass. 
+ + Args: + - centroids: centroids of size n_centroids x block_size + - assignments: assignments of the centroids to the subvectors + of size self.out_features x n_blocks + - bias: the non-quantized bias + + Remarks: + - We refer the reader to the official documentation of the nn.Linear module + for the other arguments and the behavior of the module + - Performance tests on GPU show that this implementation is 15% slower than + the non-quantized nn.Linear module for a standard training loop. + """ + + def __init__(self, centroids, assignments, bias, in_features, out_features): + super(PQLinear, self).__init__() + self.block_size = centroids.size(1) + self.n_centroids = centroids.size(0) + self.in_features = in_features + self.out_features = out_features + # check compatibility + if self.in_features % self.block_size != 0: + raise ValueError("Wrong PQ sizes") + if len(assignments) % self.out_features != 0: + raise ValueError("Wrong PQ sizes") + # define parameters + self.centroids = nn.Parameter(centroids, requires_grad=True) + self.register_buffer("assignments", assignments) + self.register_buffer("counts", torch.bincount(assignments).type_as(centroids)) + if bias is not None: + self.bias = nn.Parameter(bias) + else: + self.register_parameter("bias", None) + + @property + def weight(self): + return ( + self.centroids[self.assignments] + .reshape(-1, self.out_features, self.block_size) + .permute(1, 0, 2) + .flatten(1, 2) + ) + + def forward(self, x): + return F.linear( + x, + self.weight, + self.bias, + ) + + def extra_repr(self): + return f"in_features={self.in_features},\ + out_features={self.out_features},\ + n_centroids={self.n_centroids},\ + block_size={self.block_size},\ + bias={self.bias is not None}" diff --git a/SpeechT5/fairseq/fairseq/modules/quantization/pq/pq.py b/SpeechT5/fairseq/fairseq/modules/quantization/pq/pq.py new file mode 100644 index 0000000000000000000000000000000000000000..eddc2eb34602403f10979f54cd23a45bc2f104d5 --- /dev/null +++ b/SpeechT5/fairseq/fairseq/modules/quantization/pq/pq.py @@ -0,0 +1,128 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +from .em import EM, EmptyClusterResolveError + + +class PQ(EM): + """ + Quantizes the layer weights W with the standard Product Quantization + technique. This learns a codebook of codewords or centroids of size + block_size from W. For further reference on using PQ to quantize + neural networks, see "And the Bit Goes Down: Revisiting the Quantization + of Neural Networks", Stock et al., ICLR 2020. + + PQ is performed in two steps: + (1) The matrix W (weights or fully-connected or convolutional layer) + is reshaped to (block_size, -1). + - If W is fully-connected (2D), its columns are split into + blocks of size block_size. + - If W is convolutional (4D), its filters are split along the + spatial dimension. + (2) We apply the standard EM/k-means algorithm to the resulting reshaped matrix. 
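Editor's aside (not part of the patch): a small shape check of the fully-connected reshape described in step (1), together with the inverse mapping later used to rebuild the full matrix.

```python
import torch

out_features, in_features, block_size = 3, 8, 4
W = torch.arange(out_features * in_features, dtype=torch.float).reshape(out_features, in_features)

# step (1): split along in_features into blocks of block_size and stack the
# resulting subvectors as columns of a (block_size x n_subvectors) matrix
W_reshaped = W.reshape(out_features, -1, block_size).permute(2, 1, 0).flatten(1, 2)
print(W_reshaped.shape)                     # torch.Size([4, 6]) -> 6 subvectors of dim 4

# inverse mapping (the same one used by PQ.decode / PQLinear.weight)
W_back = (
    W_reshaped.t()
    .reshape(-1, out_features, block_size)
    .permute(1, 0, 2)
    .flatten(1, 2)
)
print(torch.equal(W, W_back))               # True
```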
+ + Args: + - W: weight matrix to quantize of size (in_features x out_features) + - block_size: size of the blocks (subvectors) + - n_centroids: number of centroids + - n_iter: number of k-means iterations + - eps: for cluster reassignment when an empty cluster is found + - max_tentatives for cluster reassignment when an empty cluster is found + - verbose: print information after each iteration + + Remarks: + - block_size be compatible with the shape of W + """ + + def __init__( + self, + W, + block_size, + n_centroids=256, + n_iter=20, + eps=1e-6, + max_tentatives=30, + verbose=True, + ): + self.block_size = block_size + W_reshaped = self._reshape(W) + super(PQ, self).__init__( + W_reshaped, + n_centroids=n_centroids, + n_iter=n_iter, + eps=eps, + max_tentatives=max_tentatives, + verbose=verbose, + ) + + def _reshape(self, W): + """ + Reshapes the matrix W as expained in step (1). + """ + + # fully connected: by convention the weight has size out_features x in_features + if len(W.size()) == 2: + self.out_features, self.in_features = W.size() + assert ( + self.in_features % self.block_size == 0 + ), "Linear: n_blocks must be a multiple of in_features" + return ( + W.reshape(self.out_features, -1, self.block_size) + .permute(2, 1, 0) + .flatten(1, 2) + ) + + # convolutional: we reshape along the spatial dimension + elif len(W.size()) == 4: + self.out_channels, self.in_channels, self.k_h, self.k_w = W.size() + assert ( + self.in_channels * self.k_h * self.k_w + ) % self.block_size == 0, ( + "Conv2d: n_blocks must be a multiple of in_channels * k_h * k_w" + ) + return ( + W.reshape(self.out_channels, -1, self.block_size) + .permute(2, 1, 0) + .flatten(1, 2) + ) + # not implemented + else: + raise NotImplementedError(W.size()) + + def encode(self): + """ + Performs self.n_iter EM steps. + """ + + self.initialize_centroids() + for i in range(self.n_iter): + try: + self.step(i) + except EmptyClusterResolveError: + break + + def decode(self): + """ + Returns the encoded full weight matrix. Must be called after + the encode function. + """ + + # fully connected case + if "k_h" not in self.__dict__: + return ( + self.centroids[self.assignments] + .reshape(-1, self.out_features, self.block_size) + .permute(1, 0, 2) + .flatten(1, 2) + ) + + # convolutional case + else: + return ( + self.centroids[self.assignments] + .reshape(-1, self.out_channels, self.block_size) + .permute(1, 0, 2) + .reshape(self.out_channels, self.in_channels, self.k_h, self.k_w) + ) diff --git a/SpeechT5/fairseq/fairseq/modules/quantization/pq/utils.py b/SpeechT5/fairseq/fairseq/modules/quantization/pq/utils.py new file mode 100644 index 0000000000000000000000000000000000000000..03b15e4b1b58c9a1e6d42052b3bd5457df9a6e2e --- /dev/null +++ b/SpeechT5/fairseq/fairseq/modules/quantization/pq/utils.py @@ -0,0 +1,337 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +import logging +import re +from operator import attrgetter, itemgetter + +import numpy as np +import torch.distributed as dist +import torch.nn as nn + +from .modules import PQConv2d, PQEmbedding, PQLinear +from .pq import PQ + + +def quantize_model_( + model, + size_tracker, + layers_to_quantize, + block_sizes_config, + n_centroids_config, + step=0, + n_iter=15, + eps=1e-6, + max_tentatives=100, + verbose=True, +): + """ + Quantize a model in-place by stages. 
All the targeted + layers are replaced by their quantized counterpart, + and the model is ready for the finetuning of the + centroids in a standard training loop (no modifications + required). Note that we do not quantize biases. + + Args: + - model: a nn.Module + - size_tracker: useful for tracking quatization statistics + - layers_to_quantize: a list containing regexps for + filtering the layers to quantize at each stage according + to their name (as in model.named_parameters()) + - block_sizes_config: dict like + { + 'Conv2d': ('kernel_size', {'(3, 3)': 9, '(1, 1)': 4}), + 'Linear': ('in_features', {'*': 8}) + } + For instance, all conv2d layers with kernel size 3x3 have + a block size of 9 and all Linear layers are quantized with + a block size of 8, irrespective of their size. + - n_centroids_config: dict like + { + 'Conv2d': ('kernel_size', {'*': 256}), + 'Linear': ('in_features', {'*': 256}) + } + For instance, all conv2d layers are quantized with 256 centroids + - step: the layers to quantize inplace corresponding + to layers_to_quantize[step] + """ + + quantized_layers = get_layers(model, layers_to_quantize[step]) + + for layer in quantized_layers: + + # book-keeping + is_master_process = (not dist.is_initialized()) or ( + dist.is_initialized() and dist.get_rank() == 0 + ) + verbose = verbose and is_master_process + + # get block size and centroids + module = attrgetter(layer)(model) + block_size = get_param(module, layer, block_sizes_config) + n_centroids = get_param(module, layer, n_centroids_config) + if verbose: + logging.info( + f"Quantizing layer {layer} with block size {block_size} and {n_centroids} centroids" + ) + + # quantize layer + weight = module.weight.data.clone() + is_bias = "bias" in [x[0] for x in module.named_parameters()] + bias = module.bias.data.clone() if is_bias else None + quantizer = PQ( + weight, + block_size, + n_centroids=n_centroids, + n_iter=n_iter, + eps=eps, + max_tentatives=max_tentatives, + verbose=verbose, + ) + + # quantization performed on all GPUs with same seed + quantizer.encode() + centroids = quantizer.centroids.contiguous() + assignments = quantizer.assignments.contiguous() + + # broadcast results to make sure weights are up-to-date + if dist.is_initialized(): + dist.broadcast(centroids, 0) + dist.broadcast(assignments, 0) + + # instantiate the quantized counterpart + if isinstance(module, nn.Linear): + out_features, in_features = map( + lambda k: module.__dict__[k], ["out_features", "in_features"] + ) + quantized_module = PQLinear( + centroids, assignments, bias, in_features, out_features + ) + elif isinstance(module, nn.Embedding): + num_embeddings, embedding_dim = map( + lambda k: module.__dict__[k], ["num_embeddings", "embedding_dim"] + ) + quantized_module = PQEmbedding( + centroids, assignments, num_embeddings, embedding_dim + ) + elif isinstance(module, nn.Conv2d): + out_channels, in_channels, kernel_size = map( + lambda k: module.__dict__[k], + ["out_channels", "in_channels", "kernel_size"], + ) + stride, padding, dilation, groups, padding_mode = map( + lambda k: module.__dict__[k], + ["stride", "padding", "dilation", "groups", "padding_mode"], + ) + + quantized_module = PQConv2d( + centroids, + assignments, + bias, + in_channels, + out_channels, + kernel_size, + stride=stride, + padding=padding, + dilation=dilation, + groups=groups, + padding_mode=padding_mode, + ) + else: + raise ValueError(f"Module {module} not yet supported for quantization") + + # replace layer by its quantized counterpart + attrsetter(layer)(model, 
quantized_module) + + # update statistics + size_tracker.update(weight, block_size, n_centroids) + + # return name of quantized layers + return quantized_layers + + +def get_layers(model, filter_regexp): + """ + Filters out the layers according to a regexp. Note that + we omit biases. + + Args: + - model: a nn.Module + - filter_regexp: a regexp to filter the layers to keep + according to their name in model.named_parameters(). + For instance, the regexp: + + down_layers\\.[123456]\\.(conv[12]|identity\\.conv)) + + is keeping blocks down_layers from 1 to 6, and inside + each block is keeping conv1, conv2 and identity.conv. + + Remarks: + - We add (module\\.)? at the beginning of the regexp to + account for the possible use of nn.parallel.DataParallel + """ + + # get all parameter names + all_layers = map(itemgetter(0), model.named_parameters()) + + # remove biases + all_layers = filter(lambda x: "bias" not in x, all_layers) + + # remove .weight in all other names (or .weight_orig is spectral norm) + all_layers = map(lambda x: x.replace(".weight_orig", ""), all_layers) + all_layers = map(lambda x: x.replace(".weight", ""), all_layers) + + # return filtered layers + filter_regexp = "(module\\.)?" + "(" + filter_regexp + ")" + r = re.compile(filter_regexp) + + return list(filter(r.match, all_layers)) + + +def get_param(module, layer_name, param_config): + """ + Given a quantization configuration, get the right parameter + for the module to be quantized. + + Args: + - module: a nn.Module + - layer_name: the name of the layer + - param_config: a dict like + { + 'Conv2d': ('kernel_size', {'(3, 3)': 9, '(1, 1)': 4}), + 'Linear': ('in_features', {'*': 8}) + } + For instance, all conv2d layers with kernel size 3x3 have + a block size of 9 and all Linear layers are quantized with + a block size of 8, irrespective of their size. + + Remarks: + - if 'fuzzy_name' is passed as a parameter, layers whose layer_name + include 'fuzzy_name' will be assigned the given parameter. + In the following example, conv.expand layers will have a block + size of 9 while conv.reduce will have a block size of 4 and all + other layers will have a block size of 2. + { + 'Conv2d': ('fuzzy_name', {'expand': 9, 'reduce': 4, '*': 2}), + 'Linear': ('fuzzy_name', {'classifier': 8, 'projection': 4}) + } + + """ + + layer_type = module.__class__.__name__ + + if layer_type not in param_config: + raise KeyError(f"Layer type {layer_type} not in config for layer {module}") + + feature, params = param_config[module.__class__.__name__] + + if feature != "fuzzy_name": + feature_value = str(getattr(module, feature)) + if feature_value not in params: + if "*" in params: + feature_value = "*" + else: + raise KeyError( + f"{feature}={feature_value} not in config for layer {module}" + ) + else: + feature_values = [name for name in params if name in layer_name] + if len(feature_values) == 0: + if "*" in params: + feature_value = "*" + else: + raise KeyError(f"name={layer_name} not in config for {module}") + else: + feature_value = feature_values[0] + + return params[feature_value] + + +class SizeTracker(object): + """ + Class to keep track of the compressed network size with iPQ. 
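Editor's aside (not part of the patch): to see what get_layers above actually selects, here is a throwaway example on a toy model; the stage regexp simply has to match the parameter names with ".weight" stripped.

```python
import re
import torch.nn as nn

model = nn.Sequential(nn.Linear(8, 8), nn.ReLU(), nn.Linear(8, 4))

# as in get_layers: drop biases, strip ".weight", then filter by the stage regexp
names = [n.replace(".weight", "") for n, _ in model.named_parameters() if "bias" not in n]
pattern = re.compile(r"(module\.)?(\d+)")
print(list(filter(pattern.match, names)))   # ['0', '2']
```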
+ + Args: + - model: a nn.Module + + Remarks: + - The compressed size is the sum of three components + for each layer in the network: + (1) Storing the centroids given by iPQ in fp16 + (2) Storing the assignments of the blocks in int8 + (3) Storing all non-compressed elements such as biases + - This cost in only valid if we use 256 centroids (then + indexing can indeed by done with int8). + """ + + def __init__(self, model): + self.model = model + self.size_non_compressed_model = self.compute_size() + self.size_non_quantized = self.size_non_compressed_model + self.size_index = 0 + self.size_centroids = 0 + self.n_quantized_layers = 0 + + def compute_size(self): + """ + Computes the size of the model (in MB). + """ + + res = 0 + for _, p in self.model.named_parameters(): + res += p.numel() + return res * 4 / 1024 / 1024 + + def update(self, W, block_size, n_centroids): + """ + Updates the running statistics when quantizing a new layer. + """ + + # bits per weights + bits_per_weight = np.log2(n_centroids) / block_size + self.n_quantized_layers += 1 + + # size of indexing the subvectors of size block_size (in MB) + size_index_layer = bits_per_weight * W.numel() / 8 / 1024 / 1024 + self.size_index += size_index_layer + + # size of the centroids stored in float16 (in MB) + size_centroids_layer = n_centroids * block_size * 2 / 1024 / 1024 + self.size_centroids += size_centroids_layer + + # size of non-compressed layers, e.g. LayerNorms or biases (in MB) + size_uncompressed_layer = W.numel() * 4 / 1024 / 1024 + self.size_non_quantized -= size_uncompressed_layer + + def __repr__(self): + size_compressed = ( + self.size_index + self.size_centroids + self.size_non_quantized + ) + compression_ratio = self.size_non_compressed_model / size_compressed # NOQA + return ( + f"Non-compressed model size: {self.size_non_compressed_model:.2f} MB. " + f"After quantizing {self.n_quantized_layers} layers, size " + f"(indexing + centroids + other): {self.size_index:.2f} MB + " + f"{self.size_centroids:.2f} MB + {self.size_non_quantized:.2f} MB = " + f"{size_compressed:.2f} MB, compression ratio: {compression_ratio:.2f}x" + ) + + +def attrsetter(*items): + def resolve_attr(obj, attr): + attrs = attr.split(".") + head = attrs[:-1] + tail = attrs[-1] + + for name in head: + obj = getattr(obj, name) + return obj, tail + + def g(obj, val): + for attr in items: + resolved_obj, resolved_attr = resolve_attr(obj, attr) + setattr(resolved_obj, resolved_attr, val) + + return g diff --git a/SpeechT5/fairseq/fairseq/modules/quantization/quantization_options.py b/SpeechT5/fairseq/fairseq/modules/quantization/quantization_options.py new file mode 100644 index 0000000000000000000000000000000000000000..b46d682c0edaeaaf2a230e51d50da2a32d4bda98 --- /dev/null +++ b/SpeechT5/fairseq/fairseq/modules/quantization/quantization_options.py @@ -0,0 +1,44 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + + +def parse_config_yaml(yaml_data): + # Initialize to default options. 
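Editor's aside (not part of the patch): the accounting in SizeTracker.update above boils down to a couple of lines of arithmetic. A worked example for a single 1024x1024 linear layer quantized with 256 centroids and block size 8:

```python
import numpy as np

numel = 1024 * 1024
n_centroids, block_size = 256, 8

bits_per_weight = np.log2(n_centroids) / block_size           # 8 / 8 = 1 bit per weight
size_index = bits_per_weight * numel / 8 / 1024 / 1024        # assignments (int8-indexable)
size_centroids = n_centroids * block_size * 2 / 1024 / 1024   # centroids stored in fp16
size_fp32 = numel * 4 / 1024 / 1024

print(f"fp32: {size_fp32:.2f} MB -> iPQ: {size_index + size_centroids:.3f} MB")
# fp32: 4.00 MB -> iPQ: 0.129 MB for this one layer (~31x smaller)
```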
+ quantization_options = { + "n_centroids": { + "Linear": ["in_features", {"*": 256}], + "Embedding": ["embedding_dim", {"*": 256}], + }, + "block_sizes": { + "Linear": ["fuzzy_name", {"fc": 8, "attn": 4, "emb": 4}], + "Embedding": ["fuzzy_name", {"emb": 8}], + }, + "layers_to_quantize": [ + "decoder\\.layers\\.\\d+\\.fc[12]", + "decoder\\.embed_tokens\\.embeddings\\.[012]\\.[01]", + "decoder\\.layers\\.\\d+\\.self_attn\\.(k_proj|v_proj|q_proj|out_proj)", + ], + } + + if "n_centroids" in yaml_data: + quantization_options["n_centroids"] = { + layer: convert_yaml_to_tuple(layer_data) + for layer, layer_data in yaml_data["n_centroids"].items() + } + if "block_sizes" in yaml_data: + quantization_options["block_sizes"] = { + layer: convert_yaml_to_tuple(layer_data) + for layer, layer_data in yaml_data["block_sizes"].items() + } + if "layers_to_quantize" in yaml_data: + quantization_options["layers_to_quantize"] = yaml_data["layers_to_quantize"] + + return quantization_options + + +def convert_yaml_to_tuple(yaml_dictionary): + """Converts a yaml dictionary with two keys: `key` and `value` into a two + argument tuple of those values.""" + return (yaml_dictionary["key"], yaml_dictionary["value"]) diff --git a/SpeechT5/fairseq/fairseq/modules/quantization/scalar/__init__.py b/SpeechT5/fairseq/fairseq/modules/quantization/scalar/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..143834f3d036780eb6844c82f0c6f2d10cfe2f61 --- /dev/null +++ b/SpeechT5/fairseq/fairseq/modules/quantization/scalar/__init__.py @@ -0,0 +1,6 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +from .utils import quantize_model_ # NOQA diff --git a/SpeechT5/fairseq/fairseq/modules/quantization/scalar/modules/__init__.py b/SpeechT5/fairseq/fairseq/modules/quantization/scalar/modules/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..8031d9cdb23f2bc72596f8bc9cfa4965f96e3e6c --- /dev/null +++ b/SpeechT5/fairseq/fairseq/modules/quantization/scalar/modules/__init__.py @@ -0,0 +1,9 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +from .qact import ActivationQuantizer # NOQA +from .qconv import IntConv2d # NOQA +from .qemb import IntEmbedding # NOQA +from .qlinear import IntLinear # NOQA diff --git a/SpeechT5/fairseq/fairseq/modules/quantization/scalar/modules/qact.py b/SpeechT5/fairseq/fairseq/modules/quantization/scalar/modules/qact.py new file mode 100644 index 0000000000000000000000000000000000000000..c5dd1d63362423ab0cfc381dddabb547a3b44c72 --- /dev/null +++ b/SpeechT5/fairseq/fairseq/modules/quantization/scalar/modules/qact.py @@ -0,0 +1,88 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +import torch + +from ..ops import emulate_int + + +class ActivationQuantizer: + """ + Fake scalar quantization of the activations using a forward hook. + + Args: + - module. 
a nn.Module for which we quantize the *post-activations* + - p: proportion of activations to quantize, set by default to 1 + - update_step: to recompute quantization parameters + - bits: number of bits for quantization + - method: choose among {"tensor", "histogram", "channel"} + - clamp_threshold: to prevent gradients overflow + + Remarks: + - Parameters scale and zero_point are recomputed every update_step + forward pass to reduce the overhead + - For the list of quantization methods and number of bits, see ops.py + - To remove the hook from the module, simply call self.handle.remove() + - At test time, the activations are fully quantized + - We use the straight-through estimator so that the gradients + back-propagate nicely in the network, this is implemented with + the detach() trick + - The activations are hard-clamped in [-clamp_threshold, clamp_threshold] + to prevent overflow during the backward pass + """ + + def __init__( + self, + module, + p=1, + update_step=1000, + bits=8, + method="histogram", + clamp_threshold=5, + ): + self.module = module + self.p = p + self.update_step = update_step + self.counter = 0 + self.bits = bits + self.method = method + self.clamp_threshold = clamp_threshold + self.handle = None + self.register_hook() + + def register_hook(self): + # forward hook + def quantize_hook(module, x, y): + + # update parameters every 1000 iterations + if self.counter % self.update_step == 0: + self.scale = None + self.zero_point = None + self.counter += 1 + + # train with QuantNoise and evaluate the fully quantized network + p = self.p if self.module.training else 1 + + # quantize activations + y_q, self.scale, self.zero_point = emulate_int( + y.detach(), + bits=self.bits, + method=self.method, + scale=self.scale, + zero_point=self.zero_point, + ) + + # mask to apply noise + mask = torch.zeros_like(y) + mask.bernoulli_(1 - p) + noise = (y_q - y).masked_fill(mask.bool(), 0) + + # using straight-through estimator (STE) + clamp_low = -self.scale * self.zero_point + clamp_high = self.scale * (2 ** self.bits - 1 - self.zero_point) + return torch.clamp(y, clamp_low.item(), clamp_high.item()) + noise.detach() + + # register hook + self.handle = self.module.register_forward_hook(quantize_hook) diff --git a/SpeechT5/fairseq/fairseq/modules/quantization/scalar/modules/qconv.py b/SpeechT5/fairseq/fairseq/modules/quantization/scalar/modules/qconv.py new file mode 100644 index 0000000000000000000000000000000000000000..83788c6f71fd41e61fd115681a22d53ce8b8362c --- /dev/null +++ b/SpeechT5/fairseq/fairseq/modules/quantization/scalar/modules/qconv.py @@ -0,0 +1,149 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +import torch +import torch.nn.functional as F +from torch.nn.modules.conv import _ConvNd +from torch.nn.modules.utils import _pair + +from ..ops import emulate_int + + +class IntConv2d(_ConvNd): + """ + Quantized counterpart of the nn.Conv2d module that applies QuantNoise during training. 
+ + Args: + - standard nn.Conv2d parameters + - p: amount of noise to inject (0 = no quantization, 1 = quantize all the weights) + - bits: number of bits + - method: choose among {"tensor", "histogram", "channel"} + - update_step: recompute scale and zero_point every update_steps iterations + + Remarks: + - We use the straight-thgourh estimator so that the gradients + back-propagate nicely in the network, this is implemented with + the detach() trick + - Parameters scale and zero_point are recomputed every update_step + forward pass to reduce the overhead + - At test time, the weights are fully quantized + """ + + def __init__( + self, + in_channels, + out_channels, + kernel_size, + stride=1, + padding=0, + dilation=1, + groups=1, + bias=True, + padding_mode="zeros", + p=0, + bits=8, + method="histogram", + update_step=1000, + ): + kernel_size = _pair(kernel_size) + stride = _pair(stride) + padding = _pair(padding) + dilation = _pair(dilation) + super(IntConv2d, self).__init__( + in_channels, + out_channels, + kernel_size, + stride, + padding, + dilation, + False, + _pair(0), + groups, + bias, + padding_mode, + ) + + # quantization parameters + self.p = p + self.bits = bits + self.method = method + self.update_step = update_step + self.counter = 0 + + def _conv_forward(self, input, weight): + if self.padding_mode != "zeros": + return F.conv2d( + F.pad(input, self._padding_repeated_twice, mode=self.padding_mode), + weight, + self.bias, + self.stride, + _pair(0), + self.dilation, + self.groups, + ) + return F.conv2d( + input, + weight, + self.bias, + self.stride, + self.padding, + self.dilation, + self.groups, + ) + + def forward(self, input): + # train with QuantNoise and evaluate the fully quantized network + p = self.p if self.training else 1 + + # update parameters every 100 iterations + if self.counter % self.update_step == 0: + self.scale = None + self.zero_point = None + self.counter += 1 + + # quantize weight + weight_quantized, self.scale, self.zero_point = emulate_int( + self.weight.detach(), + bits=self.bits, + method=self.method, + scale=self.scale, + zero_point=self.zero_point, + ) + + # mask to apply noise + mask = torch.zeros_like(self.weight) + mask.bernoulli_(1 - p) + noise = (weight_quantized - self.weight).masked_fill(mask.bool(), 0) + + # using straight-through estimator (STE) + clamp_low = -self.scale * self.zero_point + clamp_high = self.scale * (2 ** self.bits - 1 - self.zero_point) + weight = ( + torch.clamp(self.weight, clamp_low.item(), clamp_high.item()) + + noise.detach() + ) + + # return output + output = self._conv_forward(input, weight) + return output + + def extra_repr(self): + return ( + "in_channels={}, out_channels={}, kernel_size={}, stride={}, " + "padding={}, dilation={}, groups={}, bias={}, quant_noise={}, " + "bits={}, method={}".format( + self.in_channels, + self.out_channels, + self.kernel_size, + self.stride, + self.padding, + self.dilation, + self.groups, + self.bias is not None, + self.p, + self.bits, + self.method, + ) + ) diff --git a/SpeechT5/fairseq/fairseq/modules/quantization/scalar/modules/qemb.py b/SpeechT5/fairseq/fairseq/modules/quantization/scalar/modules/qemb.py new file mode 100644 index 0000000000000000000000000000000000000000..d6cf06e5872cb86e5c2e726153c7a80c78db9d1e --- /dev/null +++ b/SpeechT5/fairseq/fairseq/modules/quantization/scalar/modules/qemb.py @@ -0,0 +1,147 @@ +# Copyright (c) Facebook, Inc. and its affiliates. 
+# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +import torch +import torch.nn as nn +import torch.nn.functional as F + +from ..ops import emulate_int + + +class IntEmbedding(nn.Module): + """ + Quantized counterpart of the nn.Embedding module that applies QuantNoise during training. + + Args: + - num_embeddings: number of tokens + - embedding_dim: embedding dimension + - p: amount of noise to inject (0 = no quantization, 1 = quantize all the weights) + - bits: number of bits + - method: choose among {"tensor", "histogram", "channel"} + - update_step: recompute scale and zero_point every update_steps iterations + + Remarks: + - We use the straight-through estimator so that the gradients + back-propagate nicely in the network, this is implemented with + the detach() trick + - Parameters scale and zero_point are recomputed every update_step + forward pass to reduce the overhead + - At test time, the weights are fully quantized + """ + + def __init__( + self, + num_embeddings, + embedding_dim, + padding_idx=None, + max_norm=None, + norm_type=2.0, + scale_grad_by_freq=False, + sparse=False, + _weight=None, + p=0, + update_step=1000, + bits=8, + method="histogram", + ): + super(IntEmbedding, self).__init__() + self.num_embeddings = num_embeddings + self.embedding_dim = embedding_dim + if padding_idx is not None: + if padding_idx > 0: + assert ( + padding_idx < self.num_embeddings + ), "Padding_idx must be within num_embeddings" + elif padding_idx < 0: + assert ( + padding_idx >= -self.num_embeddings + ), "Padding_idx must be within num_embeddings" + padding_idx = self.num_embeddings + padding_idx + self.padding_idx = padding_idx + self.max_norm = max_norm + self.norm_type = norm_type + self.scale_grad_by_freq = scale_grad_by_freq + if _weight is None: + self.weight = nn.Parameter(torch.Tensor(num_embeddings, embedding_dim)) + self.reset_parameters() + else: + assert list(_weight.shape) == [ + num_embeddings, + embedding_dim, + ], "Shape of weight does not match num_embeddings and embedding_dim" + self.weight = nn.Parameter(_weight) + self.sparse = sparse + + # quantization parameters + self.p = p + self.bits = bits + self.method = method + self.update_step = update_step + self.counter = 0 + + def reset_parameters(self): + nn.init.normal_(self.weight) + if self.padding_idx is not None: + with torch.no_grad(): + self.weight[self.padding_idx].fill_(0) + + def forward(self, input): + # train with QuantNoise and evaluate the fully quantized network + p = self.p if self.training else 1 + + # update parameters every 1000 iterations + if self.counter % self.update_step == 0: + self.scale = None + self.zero_point = None + self.counter += 1 + + # quantize weight + weight_quantized, self.scale, self.zero_point = emulate_int( + self.weight.detach(), + bits=self.bits, + method=self.method, + scale=self.scale, + zero_point=self.zero_point, + ) + + # mask to apply noise + mask = torch.zeros_like(self.weight) + mask.bernoulli_(1 - p) + noise = (weight_quantized - self.weight).masked_fill(mask.bool(), 0) + + # using straight-through estimator (STE) + clamp_low = -self.scale * self.zero_point + clamp_high = self.scale * (2 ** self.bits - 1 - self.zero_point) + weight = ( + torch.clamp(self.weight, clamp_low.item(), clamp_high.item()) + + noise.detach() + ) + + # return output + output = F.embedding( + input, + weight, + self.padding_idx, + self.max_norm, + self.norm_type, + self.scale_grad_by_freq, + self.sparse, + ) + return 
output + + def extra_repr(self): + s = "{num_embeddings}, {embedding_dim}" + if self.padding_idx is not None: + s += ", padding_idx={padding_idx}" + if self.max_norm is not None: + s += ", max_norm={max_norm}" + if self.norm_type != 2: + s += ", norm_type={norm_type}" + if self.scale_grad_by_freq is not False: + s += ", scale_grad_by_freq={scale_grad_by_freq}" + if self.sparse is not False: + s += ", sparse=True" + s += ", quant_noise={p}, bits={bits}, method={method}" + return s.format(**self.__dict__) diff --git a/SpeechT5/fairseq/fairseq/modules/quantization/scalar/modules/qlinear.py b/SpeechT5/fairseq/fairseq/modules/quantization/scalar/modules/qlinear.py new file mode 100644 index 0000000000000000000000000000000000000000..9db1559386bce286301d31435851dc4ea76687a5 --- /dev/null +++ b/SpeechT5/fairseq/fairseq/modules/quantization/scalar/modules/qlinear.py @@ -0,0 +1,113 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +import torch +import torch.nn as nn +import torch.nn.functional as F + +from ..ops import emulate_int + + +class IntLinear(nn.Module): + """ + Quantized counterpart of the nn.Linear module that applies QuantNoise during training. + + Args: + - in_features: input features + - out_features: output features + - bias: bias or not + - p: amount of noise to inject (0 = no quantization, 1 = quantize all the weights) + - bits: number of bits + - method: choose among {"tensor", "histogram", "channel"} + - update_step: recompute scale and zero_point every update_steps iterations + + Remarks: + - We use the straight-through estimator so that the gradients + back-propagate nicely in the network, this is implemented with + the detach() trick.
+ - Parameters scale and zero_point are recomputed every update_step + forward pass to reduce the overhead + - At test time, the weights are fully quantized + """ + + def __init__( + self, + in_features, + out_features, + bias=True, + p=0, + update_step=3000, + bits=8, + method="histogram", + ): + super(IntLinear, self).__init__() + self.in_features = int(in_features) + self.out_features = int(out_features) + self.weight = torch.nn.Parameter(torch.Tensor(out_features, in_features)) + self.chosen_bias = bias + if self.chosen_bias: + self.bias = torch.nn.Parameter(torch.Tensor(out_features)) + else: + self.register_parameter("bias", None) + self.reset_parameters() + + # quantization parameters + self.p = p + self.bits = bits + self.method = method + self.update_step = update_step + self.counter = 0 + + def reset_parameters(self): + nn.init.xavier_uniform_(self.weight) + if self.chosen_bias: + nn.init.constant_(self.bias, 0.0) + return + + def forward(self, input): + # train with QuantNoise and evaluate the fully quantized network + p = self.p if self.training else 1 + + # update parameters every 100 iterations + if self.counter % self.update_step == 0: + self.scale = None + self.zero_point = None + self.counter += 1 + + # quantize weight + weight_quantized, self.scale, self.zero_point = emulate_int( + self.weight.detach(), + bits=self.bits, + method=self.method, + scale=self.scale, + zero_point=self.zero_point, + ) + + # mask to apply noise + mask = torch.zeros_like(self.weight) + mask.bernoulli_(1 - p) + noise = (weight_quantized - self.weight).masked_fill(mask.bool(), 0) + + # using straight-through estimator (STE) + clamp_low = -self.scale * self.zero_point + clamp_high = self.scale * (2 ** self.bits - 1 - self.zero_point) + weight = ( + torch.clamp(self.weight, clamp_low.item(), clamp_high.item()) + + noise.detach() + ) + + # return output + output = F.linear(input, weight, self.bias) + return output + + def extra_repr(self): + return "in_features={}, out_features={}, bias={}, quant_noise={}, bits={}, method={}".format( + self.in_features, + self.out_features, + self.bias is not None, + self.p, + self.bits, + self.method, + ) diff --git a/SpeechT5/fairseq/fairseq/modules/quantization/scalar/ops.py b/SpeechT5/fairseq/fairseq/modules/quantization/scalar/ops.py new file mode 100644 index 0000000000000000000000000000000000000000..2a855159be2795bdad45f1365e202d9abd26433b --- /dev/null +++ b/SpeechT5/fairseq/fairseq/modules/quantization/scalar/ops.py @@ -0,0 +1,49 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. 
+ +import torch + + +def emulate_int(w, bits, method, scale=None, zero_point=None): + q = globals()[f"emulate_int{bits}_{method}"] + return q(w, scale=scale, zero_point=zero_point) + + +def quantize(w, scale, zero_point): + return ( + torch.clamp(torch.round(w / scale + zero_point), 0, 255) - zero_point + ) * scale + + +def emulate_int8_histogram(w, scale=None, zero_point=None): + if scale is None: + obs = torch.quantization.observer.HistogramObserver() + _ = obs(w.float()) + scale, zero_point = obs.calculate_qparams() + scale = scale.cuda().type_as(w) + zero_point = zero_point.cuda().type_as(w) + return quantize(w, scale, zero_point), scale, zero_point + + +def emulate_int8_channel(w, scale=None, zero_point=None): + if scale is None: + obs = torch.quantization.observer.PerChannelMinMaxObserver( + ch_axis=-1, qscheme=torch.per_channel_symmetric + ) + _ = obs(w) + scale, zero_point, ch_axis = obs.get_qparams() + scale = scale.cuda().type_as(w) + zero_point = zero_point.cuda().type_as(w) + return quantize(w, scale, zero_point), scale, zero_point + + +def emulate_int8_tensor(w, scale=None, zero_point=None): + if scale is None: + obs = torch.quantization.observer.MinMaxObserver() + _ = obs(w) + scale, zero_point = obs.calculate_qparams() + scale = scale.cuda().type_as(w) + zero_point = zero_point.cuda().type_as(w) + return quantize(w, scale, zero_point), scale, zero_point diff --git a/SpeechT5/fairseq/fairseq/modules/quantization/scalar/utils.py b/SpeechT5/fairseq/fairseq/modules/quantization/scalar/utils.py new file mode 100644 index 0000000000000000000000000000000000000000..32cf616568160004bd97a673f2d85923974c1fae --- /dev/null +++ b/SpeechT5/fairseq/fairseq/modules/quantization/scalar/utils.py @@ -0,0 +1,77 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +import logging +from operator import attrgetter + +import torch.distributed as dist +import torch.nn as nn + +from ..pq.utils import attrsetter, get_layers +from .modules import ActivationQuantizer, IntConv2d, IntEmbedding, IntLinear + + +MAPPING = {nn.Linear: IntLinear, nn.Embedding: IntEmbedding, nn.Conv2d: IntConv2d} + + +def quantize_model_(model, p=0.2, bits=8, update_step=3000): + """ + Replaces all modules with their scalar quantized counterpart and + registers hooks to quantize the post-activations of those modules.
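A minimal usage sketch for quantize_model_ (illustrative only; the toy nn.Sequential model and the hyper-parameter values below are assumptions, not part of this patch):

    import torch.nn as nn
    from fairseq.modules.quantization.scalar import quantize_model_

    # swaps supported modules (Linear/Embedding/Conv2d) for their Int* counterparts
    # and attaches an ActivationQuantizer forward hook; unsupported modules are skipped
    model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 8))
    layer_names = quantize_model_(model, p=0.2, bits=8, update_step=1000)
    print(layer_names)  # regex-selected layer names considered for quantization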
+ + Args: + - model: a nn.Module + - p: amount of noise (0 for no noise, 1 to quantize all the weights/activations) + - bits: number of bits + - update_step: update quantization parameters every update_step steps + """ + + # quantize all layers + quantized_layers = get_layers(model, "(.*?)") + + for layer in quantized_layers: + + # book-keeping + is_master_process = (not dist.is_initialized()) or ( + dist.is_initialized() and dist.get_rank() == 0 + ) + + # recover module + module = attrgetter(layer)(model) + if is_master_process: + logging.info( + f"Quantizing layer {layer} with bits={bits} and QuantNoise={p}" + ) + + # quantization params + q_params = { + "p": p, + "update_step": update_step, + "bits": bits, + "method": "histogram", + "counter": 0, + } + + # instantiate the quantized counterpart + if isinstance(module, tuple(MAPPING.keys())): + QuantizedModule = MAPPING[module.__class__] + quantized_module = QuantizedModule.__new__(QuantizedModule) + params = module.__dict__ + params.update(q_params) + quantized_module.__dict__.update(params) + + else: + if is_master_process: + logging.info(f"Module {module} not yet supported for quantization") + continue + + # activation quantization + a_q = ActivationQuantizer(quantized_module, p=0, bits=bits, method="histogram") + + # replace layer by its quantized counterpart + attrsetter(layer)(model, quantized_module) + + # return name of quantized layers + return quantized_layers diff --git a/SpeechT5/fairseq/fairseq/modules/same_pad.py b/SpeechT5/fairseq/fairseq/modules/same_pad.py new file mode 100644 index 0000000000000000000000000000000000000000..4c04990ea6fdb291f162ee8ac3d17a92483daf8e --- /dev/null +++ b/SpeechT5/fairseq/fairseq/modules/same_pad.py @@ -0,0 +1,21 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + + +from torch import nn + + +class SamePad(nn.Module): + def __init__(self, kernel_size, causal=False): + super().__init__() + if causal: + self.remove = kernel_size - 1 + else: + self.remove = 1 if kernel_size % 2 == 0 else 0 + + def forward(self, x): + if self.remove > 0: + x = x[:, :, : -self.remove] + return x diff --git a/SpeechT5/fairseq/fairseq/modules/scalar_bias.py b/SpeechT5/fairseq/fairseq/modules/scalar_bias.py new file mode 100644 index 0000000000000000000000000000000000000000..c96247c75914fabb8a2b7ff731bb82b588f72690 --- /dev/null +++ b/SpeechT5/fairseq/fairseq/modules/scalar_bias.py @@ -0,0 +1,31 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. 
+# + +import torch + + +class ScalarBias(torch.autograd.Function): + """ + Adds a vector of scalars, used in self-attention mechanism to allow + the model to optionally attend to this vector instead of the past + """ + + @staticmethod + def forward(ctx, input, dim, bias_init): + size = list(input.size()) + size[dim] += 1 + output = input.new(*size).fill_(bias_init) + output.narrow(dim, 1, size[dim] - 1).copy_(input) + ctx.dim = dim + return output + + @staticmethod + def backward(ctx, grad): + return grad.narrow(ctx.dim, 1, grad.size(ctx.dim) - 1), None, None + + +def scalar_bias(input, dim, bias_init=0): + return ScalarBias.apply(input, dim, bias_init) diff --git a/SpeechT5/fairseq/fairseq/modules/sinusoidal_positional_embedding.py b/SpeechT5/fairseq/fairseq/modules/sinusoidal_positional_embedding.py new file mode 100644 index 0000000000000000000000000000000000000000..4793ecfb522d0729fc2d24a3ddf0c6a774d67773 --- /dev/null +++ b/SpeechT5/fairseq/fairseq/modules/sinusoidal_positional_embedding.py @@ -0,0 +1,105 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +import math +from typing import Any, Optional + +import torch +import torch.onnx.operators +from fairseq import utils +from torch import Tensor, nn + + +class SinusoidalPositionalEmbedding(nn.Module): + """This module produces sinusoidal positional embeddings of any length. + + Padding symbols are ignored. + """ + + def __init__(self, embedding_dim, padding_idx, init_size=1024): + super().__init__() + self.embedding_dim = embedding_dim + self.padding_idx = padding_idx if padding_idx is not None else 0 + self.weights = SinusoidalPositionalEmbedding.get_embedding( + init_size, embedding_dim, padding_idx + ) + self.onnx_trace = False + self.register_buffer("_float_tensor", torch.FloatTensor(1)) + self.max_positions = int(1e5) + + def prepare_for_onnx_export_(self): + self.onnx_trace = True + + @staticmethod + def get_embedding( + num_embeddings: int, embedding_dim: int, padding_idx: Optional[int] = None + ): + """Build sinusoidal embeddings. + + This matches the implementation in tensor2tensor, but differs slightly + from the description in Section 3.5 of "Attention Is All You Need". 
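Reading off the implementation below: for position p and channel index i in [0, half_dim), the frequency is freq_i = exp(-i * ln(10000) / (half_dim - 1)), and the embedding for position p is laid out as all sine components followed by all cosine components, i.e. [sin(p * freq_0), ..., sin(p * freq_{half_dim - 1}), cos(p * freq_0), ..., cos(p * freq_{half_dim - 1})]; the (half_dim - 1) denominator and the non-interleaved layout are the small departures from the paper's formulation mentioned above.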
+ """ + half_dim = embedding_dim // 2 + emb = math.log(10000) / (half_dim - 1) + emb = torch.exp(torch.arange(half_dim, dtype=torch.float) * -emb) + emb = torch.arange(num_embeddings, dtype=torch.float).unsqueeze( + 1 + ) * emb.unsqueeze(0) + emb = torch.cat([torch.sin(emb), torch.cos(emb)], dim=1).view( + num_embeddings, -1 + ) + if embedding_dim % 2 == 1: + # zero pad + emb = torch.cat([emb, torch.zeros(num_embeddings, 1)], dim=1) + if padding_idx is not None: + emb[padding_idx, :] = 0 + return emb + + def forward( + self, + input, + incremental_state: Optional[Any] = None, + timestep: Optional[Tensor] = None, + positions: Optional[Any] = None, + ): + """Input is expected to be of size [bsz x seqlen].""" + bspair = torch.onnx.operators.shape_as_tensor(input) + bsz, seq_len = bspair[0], bspair[1] + max_pos = self.padding_idx + 1 + seq_len + if self.weights is None or max_pos > self.weights.size(0): + # recompute/expand embeddings if needed + self.weights = SinusoidalPositionalEmbedding.get_embedding( + max_pos, self.embedding_dim, self.padding_idx + ) + self.weights = self.weights.to(self._float_tensor) + + if incremental_state is not None: + # positions is the same for every token when decoding a single step + pos = timestep.view(-1)[0] + 1 if timestep is not None else seq_len + if self.onnx_trace: + return ( + self.weights.index_select(index=self.padding_idx + pos, dim=0) + .unsqueeze(1) + .repeat(bsz, 1, 1) + ) + return self.weights[self.padding_idx + pos, :].expand(bsz, 1, -1) + + positions = utils.make_positions( + input, self.padding_idx, onnx_trace=self.onnx_trace + ) + if self.onnx_trace: + flat_embeddings = self.weights.detach().index_select(0, positions.view(-1)) + embedding_shape = torch.cat( + (bsz.view(1), seq_len.view(1), torch.tensor([-1], dtype=torch.long)) + ) + embeddings = torch.onnx.operators.reshape_from_tensor_shape( + flat_embeddings, embedding_shape + ) + return embeddings + return ( + self.weights.index_select(0, positions.view(-1)) + .view(bsz, seq_len, -1) + .detach() + ) diff --git a/SpeechT5/fairseq/fairseq/modules/sparse_multihead_attention.py b/SpeechT5/fairseq/fairseq/modules/sparse_multihead_attention.py new file mode 100644 index 0000000000000000000000000000000000000000..3cbd9d6785886e319aab0601517e27df733b6f97 --- /dev/null +++ b/SpeechT5/fairseq/fairseq/modules/sparse_multihead_attention.py @@ -0,0 +1,140 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +import math + +import torch + +from .multihead_attention import MultiheadAttention + + +class SparseMultiheadAttention(MultiheadAttention): + """Sparse Multi-Headed Attention. + + "Generating Long Sequences with Sparse Transformers". Implements + fixed factorized self attention, where l=stride and c=expressivity. + A(1) includes all words in the stride window and A(2) takes a summary of c + words from the end of each stride window. + If is_bidirectional=False, we do not include any words past the current word, + as in the paper. 
+ """ + + def __init__( + self, + embed_dim, + num_heads, + kdim=None, + vdim=None, + dropout=0.0, + bias=True, + add_bias_kv=False, + add_zero_attn=False, + self_attention=False, + encoder_decoder_attention=False, + stride=32, + expressivity=8, + is_bidirectional=True, + ): + + super().__init__( + embed_dim, + num_heads, + kdim, + vdim, + dropout, + bias, + add_bias_kv, + add_zero_attn, + self_attention, + encoder_decoder_attention, + ) + + self.is_bidirectional = is_bidirectional + self.stride = stride + self.expressivity = expressivity + assert self.stride > 0 and self.stride >= self.expressivity + + # Used for Ai(2) calculations - beginning of [l-c, l] range + def compute_checkpoint(self, word_index): + if word_index % self.stride == 0 and word_index != 0: + checkpoint_index = word_index - self.expressivity + else: + checkpoint_index = ( + math.floor(word_index / self.stride) * self.stride + + self.stride + - self.expressivity + ) + return checkpoint_index + + # Computes Ai(2) + def compute_subset_summaries(self, absolute_max): + checkpoint_index = self.compute_checkpoint(0) + subset_two = set() + while checkpoint_index <= absolute_max - 1: + summary = set( + range( + checkpoint_index, + min(checkpoint_index + self.expressivity + 1, absolute_max), + ) + ) + subset_two = subset_two.union(summary) + checkpoint_index = self.compute_checkpoint(checkpoint_index + self.stride) + return subset_two + + # Sparse Transformer Fixed Attention Pattern: https://arxiv.org/pdf/1904.10509.pdf + def compute_fixed_attention_subset(self, word_index, tgt_len): + # +1s account for range function; [min, max) -> [min, max] + if not self.is_bidirectional: + absolute_max = word_index + 1 + else: + absolute_max = tgt_len + + # Subset 1 - whole window + rounded_index = ( + math.floor((word_index + self.stride) / self.stride) * self.stride + ) + if word_index % self.stride == 0 and word_index != 0: + subset_one = set( + range(word_index - self.stride, min(absolute_max, word_index + 1)) + ) + else: + subset_one = set( + range( + max(0, rounded_index - self.stride), + min(absolute_max, rounded_index + 1), + ) + ) + + # Subset 2 - summary per window + # If bidirectional, subset 2 is the same for every index + subset_two = set() + if not self.is_bidirectional: + subset_two = self.compute_subset_summaries(absolute_max) + + return subset_one.union(subset_two) + + # Compute sparse mask - if bidirectional, can pre-compute and store + def buffered_sparse_mask(self, tensor, tgt_len, src_len): + assert tgt_len > self.stride + sparse_mask = torch.empty((tgt_len, src_len)).float().fill_(float("-inf")) + + # If bidirectional, subset 2 is the same for every index + subset_summaries = set() + if self.is_bidirectional: + subset_summaries = self.compute_subset_summaries(tgt_len) + + for i in range(tgt_len): + fixed_attention_subset = self.compute_fixed_attention_subset(i, tgt_len) + fixed_attention_subset = fixed_attention_subset.union(subset_summaries) + included_word_indices = torch.LongTensor(list(fixed_attention_subset)) + sparse_mask[i].index_fill_(0, included_word_indices, 0) + return sparse_mask.type_as(tensor) + + def apply_sparse_mask(self, attn_weights, tgt_len, src_len, bsz): + sparse_mask = self.buffered_sparse_mask(attn_weights, tgt_len, src_len) + sparse_mask = sparse_mask.unsqueeze(0).expand( + bsz * self.num_heads, tgt_len, src_len + ) + attn_weights += sparse_mask diff --git a/SpeechT5/fairseq/fairseq/modules/sparse_transformer_sentence_encoder.py 
b/SpeechT5/fairseq/fairseq/modules/sparse_transformer_sentence_encoder.py new file mode 100644 index 0000000000000000000000000000000000000000..f41ec09327fe80b50d20674e7482794ce45c531c --- /dev/null +++ b/SpeechT5/fairseq/fairseq/modules/sparse_transformer_sentence_encoder.py @@ -0,0 +1,96 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +import torch.nn as nn +from fairseq.modules import TransformerSentenceEncoder +from fairseq.modules.sparse_transformer_sentence_encoder_layer import ( + SparseTransformerSentenceEncoderLayer, +) + + +class SparseTransformerSentenceEncoder(TransformerSentenceEncoder): + """ + Sparse implementation of the TransformerSentenceEncoder + - see SparseMultiheadAttention + """ + + def __init__( + self, + padding_idx: int, + vocab_size: int, + num_encoder_layers: int = 6, + embedding_dim: int = 768, + ffn_embedding_dim: int = 3072, + num_attention_heads: int = 8, + dropout: float = 0.1, + attention_dropout: float = 0.1, + activation_dropout: float = 0.1, + max_seq_len: int = 256, + num_segments: int = 2, + use_position_embeddings: bool = True, + offset_positions_by_padding: bool = True, + encoder_normalize_before: bool = False, + apply_bert_init: bool = False, + activation_fn: str = "relu", + learned_pos_embedding: bool = True, + embed_scale: float = None, + freeze_embeddings: bool = False, + n_trans_layers_to_freeze: int = 0, + export: bool = False, + is_bidirectional: bool = True, + stride: int = 32, + expressivity: int = 8, + ) -> None: + + super().__init__( + padding_idx, + vocab_size, + num_encoder_layers, + embedding_dim, + ffn_embedding_dim, + num_attention_heads, + dropout, + attention_dropout, + activation_dropout, + max_seq_len, + num_segments, + use_position_embeddings, + offset_positions_by_padding, + encoder_normalize_before, + apply_bert_init, + activation_fn, + learned_pos_embedding, + embed_scale, + freeze_embeddings, + n_trans_layers_to_freeze, + export, + ) + + self.layers = nn.ModuleList( + [ + SparseTransformerSentenceEncoderLayer( + embedding_dim=self.embedding_dim, + ffn_embedding_dim=ffn_embedding_dim, + num_attention_heads=num_attention_heads, + dropout=dropout, + attention_dropout=attention_dropout, + activation_dropout=activation_dropout, + activation_fn=activation_fn, + export=export, + is_bidirectional=is_bidirectional, + stride=stride, + expressivity=expressivity, + ) + for _ in range(num_encoder_layers) + ] + ) + + def freeze_module_params(m): + if m is not None: + for p in m.parameters(): + p.requires_grad = False + + for layer in range(n_trans_layers_to_freeze): + freeze_module_params(self.layers[layer]) diff --git a/SpeechT5/fairseq/fairseq/modules/sparse_transformer_sentence_encoder_layer.py b/SpeechT5/fairseq/fairseq/modules/sparse_transformer_sentence_encoder_layer.py new file mode 100644 index 0000000000000000000000000000000000000000..d95da59c2471bfa858fd627605196d7f41f9ec12 --- /dev/null +++ b/SpeechT5/fairseq/fairseq/modules/sparse_transformer_sentence_encoder_layer.py @@ -0,0 +1,51 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. 
+ +from fairseq.modules import TransformerSentenceEncoderLayer +from fairseq.modules.sparse_multihead_attention import SparseMultiheadAttention + + +class SparseTransformerSentenceEncoderLayer(TransformerSentenceEncoderLayer): + """ + Implements a Sprase Transformer Encoder Layer (see SparseMultiheadAttention) + """ + + def __init__( + self, + embedding_dim: int = 768, + ffn_embedding_dim: int = 3072, + num_attention_heads: int = 8, + dropout: float = 0.1, + attention_dropout: float = 0.1, + activation_dropout: float = 0.1, + activation_fn: str = "relu", + export: bool = False, + is_bidirectional: bool = True, + stride: int = 32, + expressivity: int = 8, + ) -> None: + + super().__init__( + embedding_dim, + ffn_embedding_dim, + num_attention_heads, + dropout, + attention_dropout, + activation_dropout, + activation_fn, + export, + ) + + self.self_attn = SparseMultiheadAttention( + self.embedding_dim, + num_attention_heads, + dropout=attention_dropout, + add_bias_kv=False, + add_zero_attn=False, + self_attention=True, + is_bidirectional=is_bidirectional, + stride=stride, + expressivity=expressivity, + ) diff --git a/SpeechT5/fairseq/fairseq/modules/transformer_layer.py b/SpeechT5/fairseq/fairseq/modules/transformer_layer.py new file mode 100644 index 0000000000000000000000000000000000000000..aa06a4293519eacf4194f6e357ebfac6e6003158 --- /dev/null +++ b/SpeechT5/fairseq/fairseq/modules/transformer_layer.py @@ -0,0 +1,419 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +from typing import Dict, List, Optional + +import torch +import torch.nn as nn +from fairseq import utils +from fairseq.modules import LayerNorm, MultiheadAttention +from fairseq.modules.fairseq_dropout import FairseqDropout +from fairseq.modules.quant_noise import quant_noise +from torch import Tensor + + +class TransformerEncoderLayer(nn.Module): + """Encoder layer block. + + In the original paper each operation (multi-head attention or FFN) is + postprocessed with: `dropout -> add residual -> layernorm`. In the + tensor2tensor code they suggest that learning is more robust when + preprocessing each layer with layernorm and postprocessing with: + `dropout -> add residual`. We default to the approach in the paper, but the + tensor2tensor approach can be enabled by setting + *args.encoder_normalize_before* to ``True``. 
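In symbols, the default (post-norm) ordering computes x = LayerNorm(x + Dropout(Sublayer(x))) for each sub-layer, whereas the pre-norm variant selected by *encoder_normalize_before* computes x = x + Dropout(Sublayer(LayerNorm(x))); the forward pass below implements both via the normalize_before flag.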
+ + Args: + args (argparse.Namespace): parsed command-line arguments + """ + + def __init__(self, args): + super().__init__() + self.args = args + self.embed_dim = args.encoder_embed_dim + self.quant_noise = getattr(args, "quant_noise_pq", 0) + self.quant_noise_block_size = getattr(args, "quant_noise_pq_block_size", 8) or 8 + self.self_attn = self.build_self_attention(self.embed_dim, args) + export = getattr(args, "export", False) + self.self_attn_layer_norm = LayerNorm(self.embed_dim, export=export) + self.dropout_module = FairseqDropout( + args.dropout, module_name=self.__class__.__name__ + ) + self.activation_fn = utils.get_activation_fn( + activation=getattr(args, "activation_fn", "relu") or "relu" + ) + activation_dropout_p = getattr(args, "activation_dropout", 0) or 0 + if activation_dropout_p == 0: + # for backwards compatibility with models that use args.relu_dropout + activation_dropout_p = getattr(args, "relu_dropout", 0) or 0 + self.activation_dropout_module = FairseqDropout( + float(activation_dropout_p), module_name=self.__class__.__name__ + ) + self.normalize_before = args.encoder_normalize_before + self.fc1 = self.build_fc1( + self.embed_dim, + args.encoder_ffn_embed_dim, + self.quant_noise, + self.quant_noise_block_size, + ) + self.fc2 = self.build_fc2( + args.encoder_ffn_embed_dim, + self.embed_dim, + self.quant_noise, + self.quant_noise_block_size, + ) + + self.final_layer_norm = LayerNorm(self.embed_dim, export=export) + + def build_fc1(self, input_dim, output_dim, q_noise, qn_block_size): + return quant_noise( + nn.Linear(input_dim, output_dim), p=q_noise, block_size=qn_block_size + ) + + def build_fc2(self, input_dim, output_dim, q_noise, qn_block_size): + return quant_noise( + nn.Linear(input_dim, output_dim), p=q_noise, block_size=qn_block_size + ) + + def build_self_attention(self, embed_dim, args): + return MultiheadAttention( + embed_dim, + args.encoder_attention_heads, + dropout=args.attention_dropout, + self_attention=True, + q_noise=self.quant_noise, + qn_block_size=self.quant_noise_block_size, + ) + + def residual_connection(self, x, residual): + return residual + x + + def upgrade_state_dict_named(self, state_dict, name): + """ + Rename layer norm states from `...layer_norms.0.weight` to + `...self_attn_layer_norm.weight` and `...layer_norms.1.weight` to + `...final_layer_norm.weight` + """ + layer_norm_map = {"0": "self_attn_layer_norm", "1": "final_layer_norm"} + for old, new in layer_norm_map.items(): + for m in ("weight", "bias"): + k = "{}.layer_norms.{}.{}".format(name, old, m) + if k in state_dict: + state_dict["{}.{}.{}".format(name, new, m)] = state_dict[k] + del state_dict[k] + + def forward( + self, + x, + encoder_padding_mask: Optional[Tensor], + attn_mask: Optional[Tensor] = None, + ): + """ + Args: + x (Tensor): input to the layer of shape `(seq_len, batch, embed_dim)` + encoder_padding_mask (ByteTensor): binary ByteTensor of shape + `(batch, seq_len)` where padding elements are indicated by ``1``. + attn_mask (ByteTensor): binary tensor of shape `(tgt_len, src_len)`, + where `tgt_len` is the length of output and `src_len` is the + length of input, though here both are equal to `seq_len`. + `attn_mask[tgt_i, src_j] = 1` means that when calculating the + embedding for `tgt_i`, we exclude (mask out) `src_j`. This is + useful for strided self-attention. 
+ + Returns: + encoded output of shape `(seq_len, batch, embed_dim)` + """ + # anything in original attn_mask = 1, becomes -1e8 + # anything in original attn_mask = 0, becomes 0 + # Note that we cannot use -inf here, because at some edge cases, + # the attention weight (before softmax) for some padded element in query + # will become -inf, which results in NaN in model parameters + if attn_mask is not None: + attn_mask = attn_mask.masked_fill(attn_mask.to(torch.bool), -1e8) + + residual = x + if self.normalize_before: + x = self.self_attn_layer_norm(x) + x, _ = self.self_attn( + query=x, + key=x, + value=x, + key_padding_mask=encoder_padding_mask, + need_weights=False, + attn_mask=attn_mask, + ) + x = self.dropout_module(x) + x = self.residual_connection(x, residual) + if not self.normalize_before: + x = self.self_attn_layer_norm(x) + + residual = x + if self.normalize_before: + x = self.final_layer_norm(x) + x = self.activation_fn(self.fc1(x)) + x = self.activation_dropout_module(x) + x = self.fc2(x) + x = self.dropout_module(x) + x = self.residual_connection(x, residual) + if not self.normalize_before: + x = self.final_layer_norm(x) + return x + + +class TransformerDecoderLayer(nn.Module): + """Decoder layer block. + + In the original paper each operation (multi-head attention, encoder + attention or FFN) is postprocessed with: `dropout -> add residual -> + layernorm`. In the tensor2tensor code they suggest that learning is more + robust when preprocessing each layer with layernorm and postprocessing with: + `dropout -> add residual`. We default to the approach in the paper, but the + tensor2tensor approach can be enabled by setting + *args.decoder_normalize_before* to ``True``. + + Args: + args (argparse.Namespace): parsed command-line arguments + no_encoder_attn (bool, optional): whether to attend to encoder outputs + (default: False). 
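A minimal construction sketch (illustrative; `args` is assumed to be a fully populated namespace with the usual decoder_* fields, and `causal_mask` is a placeholder tensor):

    layer = TransformerDecoderLayer(args, no_encoder_attn=True)  # decoder-only: encoder_attn stays None
    x, attn, _ = layer(x, self_attn_mask=causal_mask)            # x: (seq_len, batch, embed_dim)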
+ """ + + def __init__( + self, args, no_encoder_attn=False, add_bias_kv=False, add_zero_attn=False + ): + super().__init__() + self.embed_dim = args.decoder_embed_dim + self.dropout_module = FairseqDropout( + args.dropout, module_name=self.__class__.__name__ + ) + self.quant_noise = getattr(args, "quant_noise_pq", 0) + self.quant_noise_block_size = getattr(args, "quant_noise_pq_block_size", 8) + + self.cross_self_attention = getattr(args, "cross_self_attention", False) + + self.self_attn = self.build_self_attention( + self.embed_dim, + args, + add_bias_kv=add_bias_kv, + add_zero_attn=add_zero_attn, + ) + + self.activation_fn = utils.get_activation_fn( + activation=str(args.activation_fn) + if getattr(args, "activation_fn", None) is not None + else "relu" + ) + activation_dropout_p = getattr(args, "activation_dropout", 0) or 0 + if activation_dropout_p == 0: + # for backwards compatibility with models that use args.relu_dropout + activation_dropout_p = getattr(args, "relu_dropout", 0) or 0 + self.activation_dropout_module = FairseqDropout( + float(activation_dropout_p), module_name=self.__class__.__name__ + ) + self.normalize_before = args.decoder_normalize_before + + export = getattr(args, "export", False) + self.self_attn_layer_norm = LayerNorm(self.embed_dim, export=export) + + if no_encoder_attn: + self.encoder_attn = None + self.encoder_attn_layer_norm = None + else: + self.encoder_attn = self.build_encoder_attention(self.embed_dim, args) + self.encoder_attn_layer_norm = LayerNorm(self.embed_dim, export=export) + + self.fc1 = self.build_fc1( + self.embed_dim, + args.decoder_ffn_embed_dim, + self.quant_noise, + self.quant_noise_block_size, + ) + self.fc2 = self.build_fc2( + args.decoder_ffn_embed_dim, + self.embed_dim, + self.quant_noise, + self.quant_noise_block_size, + ) + + self.final_layer_norm = LayerNorm(self.embed_dim, export=export) + self.need_attn = True + + self.onnx_trace = False + + def build_fc1(self, input_dim, output_dim, q_noise, qn_block_size): + return quant_noise(nn.Linear(input_dim, output_dim), q_noise, qn_block_size) + + def build_fc2(self, input_dim, output_dim, q_noise, qn_block_size): + return quant_noise(nn.Linear(input_dim, output_dim), q_noise, qn_block_size) + + def build_self_attention( + self, embed_dim, args, add_bias_kv=False, add_zero_attn=False + ): + return MultiheadAttention( + embed_dim, + args.decoder_attention_heads, + dropout=args.attention_dropout, + add_bias_kv=add_bias_kv, + add_zero_attn=add_zero_attn, + self_attention=not getattr(args, "cross_self_attention", False), + q_noise=self.quant_noise, + qn_block_size=self.quant_noise_block_size, + ) + + def build_encoder_attention(self, embed_dim, args): + return MultiheadAttention( + embed_dim, + args.decoder_attention_heads, + kdim=getattr(args, "encoder_embed_dim", None), + vdim=getattr(args, "encoder_embed_dim", None), + dropout=args.attention_dropout, + encoder_decoder_attention=True, + q_noise=self.quant_noise, + qn_block_size=self.quant_noise_block_size, + ) + + def prepare_for_onnx_export_(self): + self.onnx_trace = True + + def residual_connection(self, x, residual): + return residual + x + + def forward( + self, + x, + encoder_out: Optional[torch.Tensor] = None, + encoder_padding_mask: Optional[torch.Tensor] = None, + incremental_state: Optional[Dict[str, Dict[str, Optional[Tensor]]]] = None, + prev_self_attn_state: Optional[List[torch.Tensor]] = None, + prev_attn_state: Optional[List[torch.Tensor]] = None, + self_attn_mask: Optional[torch.Tensor] = None, + self_attn_padding_mask: 
Optional[torch.Tensor] = None, + need_attn: bool = False, + need_head_weights: bool = False, + ): + """ + Args: + x (Tensor): input to the layer of shape `(seq_len, batch, embed_dim)` + encoder_padding_mask (ByteTensor, optional): binary + ByteTensor of shape `(batch, src_len)` where padding + elements are indicated by ``1``. + need_attn (bool, optional): return attention weights + need_head_weights (bool, optional): return attention weights + for each head (default: return average over heads). + + Returns: + encoded output of shape `(seq_len, batch, embed_dim)` + """ + if need_head_weights: + need_attn = True + + residual = x + if self.normalize_before: + x = self.self_attn_layer_norm(x) + if prev_self_attn_state is not None: + prev_key, prev_value = prev_self_attn_state[:2] + saved_state: Dict[str, Optional[Tensor]] = { + "prev_key": prev_key, + "prev_value": prev_value, + } + if len(prev_self_attn_state) >= 3: + saved_state["prev_key_padding_mask"] = prev_self_attn_state[2] + assert incremental_state is not None + self.self_attn._set_input_buffer(incremental_state, saved_state) + _self_attn_input_buffer = self.self_attn._get_input_buffer(incremental_state) + if self.cross_self_attention and not ( + incremental_state is not None + and _self_attn_input_buffer is not None + and "prev_key" in _self_attn_input_buffer + ): + if self_attn_mask is not None: + assert encoder_out is not None + self_attn_mask = torch.cat( + (x.new_zeros(x.size(0), encoder_out.size(0)), self_attn_mask), dim=1 + ) + if self_attn_padding_mask is not None: + if encoder_padding_mask is None: + assert encoder_out is not None + encoder_padding_mask = self_attn_padding_mask.new_zeros( + encoder_out.size(1), encoder_out.size(0) + ) + self_attn_padding_mask = torch.cat( + (encoder_padding_mask, self_attn_padding_mask), dim=1 + ) + assert encoder_out is not None + y = torch.cat((encoder_out, x), dim=0) + else: + y = x + + x, attn = self.self_attn( + query=x, + key=y, + value=y, + key_padding_mask=self_attn_padding_mask, + incremental_state=incremental_state, + need_weights=False, + attn_mask=self_attn_mask, + ) + x = self.dropout_module(x) + x = self.residual_connection(x, residual) + if not self.normalize_before: + x = self.self_attn_layer_norm(x) + + if self.encoder_attn is not None and encoder_out is not None: + residual = x + if self.normalize_before: + x = self.encoder_attn_layer_norm(x) + if prev_attn_state is not None: + prev_key, prev_value = prev_attn_state[:2] + saved_state: Dict[str, Optional[Tensor]] = { + "prev_key": prev_key, + "prev_value": prev_value, + } + if len(prev_attn_state) >= 3: + saved_state["prev_key_padding_mask"] = prev_attn_state[2] + assert incremental_state is not None + self.encoder_attn._set_input_buffer(incremental_state, saved_state) + + x, attn = self.encoder_attn( + query=x, + key=encoder_out, + value=encoder_out, + key_padding_mask=encoder_padding_mask, + incremental_state=incremental_state, + static_kv=True, + need_weights=need_attn or (not self.training and self.need_attn), + need_head_weights=need_head_weights, + ) + x = self.dropout_module(x) + x = self.residual_connection(x, residual) + if not self.normalize_before: + x = self.encoder_attn_layer_norm(x) + + residual = x + if self.normalize_before: + x = self.final_layer_norm(x) + + x = self.activation_fn(self.fc1(x)) + x = self.activation_dropout_module(x) + x = self.fc2(x) + x = self.dropout_module(x) + x = self.residual_connection(x, residual) + if not self.normalize_before: + x = self.final_layer_norm(x) + if self.onnx_trace and 
incremental_state is not None: + saved_state = self.self_attn._get_input_buffer(incremental_state) + assert saved_state is not None + if self_attn_padding_mask is not None: + self_attn_state = [ + saved_state["prev_key"], + saved_state["prev_value"], + saved_state["prev_key_padding_mask"], + ] + else: + self_attn_state = [saved_state["prev_key"], saved_state["prev_value"]] + return x, attn, self_attn_state + return x, attn, None + + def make_generation_fast_(self, need_attn: bool = False, **kwargs): + self.need_attn = need_attn diff --git a/SpeechT5/fairseq/fairseq/modules/transformer_sentence_encoder.py b/SpeechT5/fairseq/fairseq/modules/transformer_sentence_encoder.py new file mode 100644 index 0000000000000000000000000000000000000000..d0540d69229fb994b9e573a5016c9f239b7929e2 --- /dev/null +++ b/SpeechT5/fairseq/fairseq/modules/transformer_sentence_encoder.py @@ -0,0 +1,291 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +from typing import Optional, Tuple + +import torch +import torch.nn as nn +from fairseq.modules import ( + FairseqDropout, + LayerDropModuleList, + LayerNorm, + MultiheadAttention, + PositionalEmbedding, + TransformerSentenceEncoderLayer, +) +from fairseq.modules.quant_noise import quant_noise as apply_quant_noise_ + + +def init_bert_params(module): + """ + Initialize the weights specific to the BERT Model. + This overrides the default initializations depending on the specified arguments. + 1. If normal_init_linear_weights is set then weights of linear + layer will be initialized using the normal distribution and + bais will be set to the specified value. + 2. If normal_init_embed_weights is set then weights of embedding + layer will be initialized using the normal distribution. + 3. If normal_init_proj_weights is set then weights of + in_project_weight for MultiHeadAttention initialized using + the normal distribution (to be validated). + """ + + def normal_(data): + # with FSDP, module params will be on CUDA, so we cast them back to CPU + # so that the RNG is consistent with and without FSDP + data.copy_( + data.cpu().normal_(mean=0.0, std=0.02).to(data.device) + ) + + if isinstance(module, nn.Linear): + normal_(module.weight.data) + if module.bias is not None: + module.bias.data.zero_() + if isinstance(module, nn.Embedding): + normal_(module.weight.data) + if module.padding_idx is not None: + module.weight.data[module.padding_idx].zero_() + if isinstance(module, MultiheadAttention): + normal_(module.q_proj.weight.data) + normal_(module.k_proj.weight.data) + normal_(module.v_proj.weight.data) + + +class TransformerSentenceEncoder(nn.Module): + """ + Implementation for a Bi-directional Transformer based Sentence Encoder used + in BERT/XLM style pre-trained models. + + This first computes the token embedding using the token embedding matrix, + position embeddings (if specified) and segment embeddings + (if specified). After applying the specified number of + TransformerEncoderLayers, it outputs all the internal states of the + encoder as well as the final representation associated with the first + token (usually CLS token). 
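A toy end-to-end sketch for the encoder described above (illustrative sizes; every value below is an assumption):

    import torch

    encoder = TransformerSentenceEncoder(
        padding_idx=1, vocab_size=1000, num_encoder_layers=2,
        embedding_dim=64, ffn_embedding_dim=128, num_attention_heads=4,
    )
    tokens = torch.randint(2, 1000, (8, 20))       # B x T
    inner_states, sentence_rep = encoder(tokens)   # list of T x B x C tensors, B x C summary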
+ + Input: + - tokens: B x T matrix representing sentences + - segment_labels: B x T matrix representing segment label for tokens + + Output: + - a tuple of the following: + - a list of internal model states used to compute the + predictions where each tensor has shape T x B x C + - sentence representation associated with first input token + in format B x C. + """ + + def __init__( + self, + padding_idx: int, + vocab_size: int, + num_encoder_layers: int = 6, + embedding_dim: int = 768, + ffn_embedding_dim: int = 3072, + num_attention_heads: int = 8, + dropout: float = 0.1, + attention_dropout: float = 0.1, + activation_dropout: float = 0.1, + layerdrop: float = 0.0, + max_seq_len: int = 256, + num_segments: int = 2, + use_position_embeddings: bool = True, + offset_positions_by_padding: bool = True, + encoder_normalize_before: bool = False, + apply_bert_init: bool = False, + activation_fn: str = "relu", + learned_pos_embedding: bool = True, + embed_scale: float = None, + freeze_embeddings: bool = False, + n_trans_layers_to_freeze: int = 0, + export: bool = False, + traceable: bool = False, + q_noise: float = 0.0, + qn_block_size: int = 8, + ) -> None: + + super().__init__() + self.padding_idx = padding_idx + self.vocab_size = vocab_size + self.dropout_module = FairseqDropout( + dropout, module_name=self.__class__.__name__ + ) + self.layerdrop = layerdrop + self.max_seq_len = max_seq_len + self.embedding_dim = embedding_dim + self.num_segments = num_segments + self.use_position_embeddings = use_position_embeddings + self.apply_bert_init = apply_bert_init + self.learned_pos_embedding = learned_pos_embedding + self.traceable = traceable + + self.embed_tokens = self.build_embedding( + self.vocab_size, self.embedding_dim, self.padding_idx + ) + self.embed_scale = embed_scale + + if q_noise > 0: + self.quant_noise = apply_quant_noise_( + nn.Linear(self.embedding_dim, self.embedding_dim, bias=False), + q_noise, + qn_block_size, + ) + else: + self.quant_noise = None + + self.segment_embeddings = ( + nn.Embedding(self.num_segments, self.embedding_dim, padding_idx=None) + if self.num_segments > 0 + else None + ) + + self.embed_positions = ( + PositionalEmbedding( + self.max_seq_len, + self.embedding_dim, + padding_idx=(self.padding_idx if offset_positions_by_padding else None), + learned=self.learned_pos_embedding, + ) + if self.use_position_embeddings + else None + ) + + if encoder_normalize_before: + self.emb_layer_norm = LayerNorm(self.embedding_dim, export=export) + else: + self.emb_layer_norm = None + + if self.layerdrop > 0.0: + self.layers = LayerDropModuleList(p=self.layerdrop) + else: + self.layers = nn.ModuleList([]) + self.layers.extend( + [ + self.build_transformer_sentence_encoder_layer( + embedding_dim=self.embedding_dim, + ffn_embedding_dim=ffn_embedding_dim, + num_attention_heads=num_attention_heads, + dropout=self.dropout_module.p, + attention_dropout=attention_dropout, + activation_dropout=activation_dropout, + activation_fn=activation_fn, + export=export, + q_noise=q_noise, + qn_block_size=qn_block_size, + ) + for _ in range(num_encoder_layers) + ] + ) + + # Apply initialization of model params after building the model + if self.apply_bert_init: + self.apply(init_bert_params) + + def freeze_module_params(m): + if m is not None: + for p in m.parameters(): + p.requires_grad = False + + if freeze_embeddings: + freeze_module_params(self.embed_tokens) + freeze_module_params(self.segment_embeddings) + freeze_module_params(self.embed_positions) + freeze_module_params(self.emb_layer_norm) 
+ + for layer in range(n_trans_layers_to_freeze): + freeze_module_params(self.layers[layer]) + + def build_embedding(self, vocab_size, embedding_dim, padding_idx): + return nn.Embedding(vocab_size, embedding_dim, padding_idx) + + def build_transformer_sentence_encoder_layer( + self, + embedding_dim, + ffn_embedding_dim, + num_attention_heads, + dropout, + attention_dropout, + activation_dropout, + activation_fn, + export, + q_noise, + qn_block_size, + ): + return TransformerSentenceEncoderLayer( + embedding_dim=embedding_dim, + ffn_embedding_dim=ffn_embedding_dim, + num_attention_heads=num_attention_heads, + dropout=dropout, + attention_dropout=attention_dropout, + activation_dropout=activation_dropout, + activation_fn=activation_fn, + export=export, + q_noise=q_noise, + qn_block_size=qn_block_size, + ) + + def forward( + self, + tokens: torch.Tensor, + segment_labels: torch.Tensor = None, + last_state_only: bool = False, + positions: Optional[torch.Tensor] = None, + token_embeddings: Optional[torch.Tensor] = None, + attn_mask: Optional[torch.Tensor] = None, + ) -> Tuple[torch.Tensor, torch.Tensor]: + is_tpu = tokens.device.type == "xla" + + # compute padding mask. This is needed for multi-head attention + padding_mask = tokens.eq(self.padding_idx) + if not self.traceable and not is_tpu and not padding_mask.any(): + padding_mask = None + + if token_embeddings is not None: + x = token_embeddings + else: + x = self.embed_tokens(tokens) + + if self.embed_scale is not None: + x = x * self.embed_scale + + if self.embed_positions is not None: + x = x + self.embed_positions(tokens, positions=positions) + + if self.segment_embeddings is not None and segment_labels is not None: + x = x + self.segment_embeddings(segment_labels) + + if self.quant_noise is not None: + x = self.quant_noise(x) + + if self.emb_layer_norm is not None: + x = self.emb_layer_norm(x) + + x = self.dropout_module(x) + + # account for padding while computing the representation + if padding_mask is not None: + x = x * (1 - padding_mask.unsqueeze(-1).type_as(x)) + + # B x T x C -> T x B x C + x = x.transpose(0, 1) + + inner_states = [] + if not last_state_only: + inner_states.append(x) + + for layer in self.layers: + x, _ = layer(x, self_attn_padding_mask=padding_mask, self_attn_mask=attn_mask) + if not last_state_only: + inner_states.append(x) + + sentence_rep = x[0, :, :] + + if last_state_only: + inner_states = [x] + + if self.traceable: + return torch.stack(inner_states), sentence_rep + else: + return inner_states, sentence_rep diff --git a/SpeechT5/fairseq/fairseq/modules/transformer_sentence_encoder_layer.py b/SpeechT5/fairseq/fairseq/modules/transformer_sentence_encoder_layer.py new file mode 100644 index 0000000000000000000000000000000000000000..f869c4b2f8fb15f96a292e39bd293df7898a4fce --- /dev/null +++ b/SpeechT5/fairseq/fairseq/modules/transformer_sentence_encoder_layer.py @@ -0,0 +1,139 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +from typing import Callable, Optional + +import torch +import torch.nn as nn +from fairseq import utils +from fairseq.modules import LayerNorm, MultiheadAttention +from fairseq.modules.fairseq_dropout import FairseqDropout +from fairseq.modules.quant_noise import quant_noise + + +class TransformerSentenceEncoderLayer(nn.Module): + """ + Implements a Transformer Encoder Layer used in BERT/XLM style pre-trained + models. 
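A minimal call sketch for this layer (illustrative sizes; inputs follow the T x B x C convention used by the parent encoder):

    import torch

    layer = TransformerSentenceEncoderLayer(
        embedding_dim=64, ffn_embedding_dim=128, num_attention_heads=4
    )
    x = torch.randn(20, 8, 64)   # T x B x C
    x, attn = layer(x)           # post-norm residual block: self-attention, then FFN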
+ """ + + def __init__( + self, + embedding_dim: int = 768, + ffn_embedding_dim: int = 3072, + num_attention_heads: int = 8, + dropout: float = 0.1, + attention_dropout: float = 0.1, + activation_dropout: float = 0.1, + activation_fn: str = "relu", + export: bool = False, + q_noise: float = 0.0, + qn_block_size: int = 8, + init_fn: Callable = None, + ) -> None: + super().__init__() + + if init_fn is not None: + init_fn() + + # Initialize parameters + self.embedding_dim = embedding_dim + self.num_attention_heads = num_attention_heads + self.attention_dropout = attention_dropout + self.q_noise = q_noise + self.qn_block_size = qn_block_size + + self.dropout_module = FairseqDropout( + dropout, module_name=self.__class__.__name__ + ) + self.activation_dropout_module = FairseqDropout( + activation_dropout, module_name=self.__class__.__name__ + ) + + # Initialize blocks + self.activation_fn = utils.get_activation_fn(activation_fn) + self.self_attn = self.build_self_attention( + self.embedding_dim, + num_attention_heads, + dropout=attention_dropout, + self_attention=True, + q_noise=q_noise, + qn_block_size=qn_block_size, + ) + + # layer norm associated with the self attention layer + self.self_attn_layer_norm = LayerNorm(self.embedding_dim, export=export) + + self.fc1 = self.build_fc1( + self.embedding_dim, + ffn_embedding_dim, + q_noise=q_noise, + qn_block_size=qn_block_size, + ) + self.fc2 = self.build_fc2( + ffn_embedding_dim, + self.embedding_dim, + q_noise=q_noise, + qn_block_size=qn_block_size, + ) + + # layer norm associated with the position wise feed-forward NN + self.final_layer_norm = LayerNorm(self.embedding_dim, export=export) + + def build_fc1(self, input_dim, output_dim, q_noise, qn_block_size): + return quant_noise(nn.Linear(input_dim, output_dim), q_noise, qn_block_size) + + def build_fc2(self, input_dim, output_dim, q_noise, qn_block_size): + return quant_noise(nn.Linear(input_dim, output_dim), q_noise, qn_block_size) + + def build_self_attention( + self, + embed_dim, + num_attention_heads, + dropout, + self_attention, + q_noise, + qn_block_size, + ): + return MultiheadAttention( + embed_dim, + num_attention_heads, + dropout=dropout, + self_attention=True, + q_noise=q_noise, + qn_block_size=qn_block_size, + ) + + def forward( + self, + x: torch.Tensor, + self_attn_mask: Optional[torch.Tensor] = None, + self_attn_padding_mask: Optional[torch.Tensor] = None, + ): + """ + LayerNorm is applied either before or after the self-attention/ffn + modules similar to the original Transformer implementation. + """ + residual = x + x, attn = self.self_attn( + query=x, + key=x, + value=x, + key_padding_mask=self_attn_padding_mask, + need_weights=False, + attn_mask=self_attn_mask, + ) + x = self.dropout_module(x) + x = residual + x + x = self.self_attn_layer_norm(x) + + residual = x + x = self.activation_fn(self.fc1(x)) + x = self.activation_dropout_module(x) + x = self.fc2(x) + x = self.dropout_module(x) + x = residual + x + x = self.final_layer_norm(x) + return x, attn diff --git a/SpeechT5/fairseq/fairseq/modules/transpose_last.py b/SpeechT5/fairseq/fairseq/modules/transpose_last.py new file mode 100644 index 0000000000000000000000000000000000000000..e578b3ec5097bfac5c976b207ea46bec1d9bd4f5 --- /dev/null +++ b/SpeechT5/fairseq/fairseq/modules/transpose_last.py @@ -0,0 +1,20 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. 
+""" +transpose last 2 dimensions of the input +""" + +import torch.nn as nn + + +class TransposeLast(nn.Module): + def __init__(self, deconstruct_idx=None): + super().__init__() + self.deconstruct_idx = deconstruct_idx + + def forward(self, x): + if self.deconstruct_idx is not None: + x = x[self.deconstruct_idx] + return x.transpose(-2, -1) diff --git a/SpeechT5/fairseq/fairseq/modules/unfold.py b/SpeechT5/fairseq/fairseq/modules/unfold.py new file mode 100644 index 0000000000000000000000000000000000000000..138272f1ef4f673b29e36aed4531106f7ce95968 --- /dev/null +++ b/SpeechT5/fairseq/fairseq/modules/unfold.py @@ -0,0 +1,19 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +import torch.nn.functional as F + + +def unfold1d(x, kernel_size, padding_l, pad_value=0): + """unfold T x B x C to T x B x C x K""" + if kernel_size > 1: + T, B, C = x.size() + x = F.pad( + x, (0, 0, 0, 0, padding_l, kernel_size - 1 - padding_l), value=pad_value + ) + x = x.as_strided((T, B, C, kernel_size), (B * C, C, 1, B * C)) + else: + x = x.unsqueeze(3) + return x diff --git a/SpeechT5/fairseq/fairseq/modules/vggblock.py b/SpeechT5/fairseq/fairseq/modules/vggblock.py new file mode 100644 index 0000000000000000000000000000000000000000..ee5ee19a34816c7350c21fba7c4907fec8ca7a61 --- /dev/null +++ b/SpeechT5/fairseq/fairseq/modules/vggblock.py @@ -0,0 +1,116 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +from __future__ import absolute_import, division, print_function, unicode_literals + +from collections.abc import Iterable +from itertools import repeat + +import torch +import torch.nn as nn + + +def _pair(v): + if isinstance(v, Iterable): + assert len(v) == 2, "len(v) != 2" + return v + return tuple(repeat(v, 2)) + + +def infer_conv_output_dim(conv_op, input_dim, sample_inchannel): + sample_seq_len = 200 + sample_bsz = 10 + x = torch.randn(sample_bsz, sample_inchannel, sample_seq_len, input_dim) + # N x C x H x W + # N: sample_bsz, C: sample_inchannel, H: sample_seq_len, W: input_dim + x = conv_op(x) + # N x C x H x W + x = x.transpose(1, 2) + # N x H x C x W + bsz, seq = x.size()[:2] + per_channel_dim = x.size()[3] + # bsz: N, seq: H, CxW the rest + return x.contiguous().view(bsz, seq, -1).size(-1), per_channel_dim + + +class VGGBlock(torch.nn.Module): + """ + VGG motibated cnn module https://arxiv.org/pdf/1409.1556.pdf + + Args: + in_channels: (int) number of input channels (typically 1) + out_channels: (int) number of output channels + conv_kernel_size: convolution channels + pooling_kernel_size: the size of the pooling window to take a max over + num_conv_layers: (int) number of convolution layers + input_dim: (int) input dimension + conv_stride: the stride of the convolving kernel. + Can be a single number or a tuple (sH, sW) Default: 1 + padding: implicit paddings on both sides of the input. + Can be a single number or a tuple (padH, padW). Default: None + layer_norm: (bool) if layer norm is going to be applied. Default: False + + Shape: + Input: BxCxTxfeat, i.e. (batch_size, input_size, timesteps, features) + Output: BxCxTxfeat, i.e. 
(batch_size, input_size, timesteps, features) + """ + + def __init__( + self, + in_channels, + out_channels, + conv_kernel_size, + pooling_kernel_size, + num_conv_layers, + input_dim, + conv_stride=1, + padding=None, + layer_norm=False, + ): + assert ( + input_dim is not None + ), "Need input_dim for LayerNorm and infer_conv_output_dim" + super(VGGBlock, self).__init__() + self.in_channels = in_channels + self.out_channels = out_channels + self.conv_kernel_size = _pair(conv_kernel_size) + self.pooling_kernel_size = _pair(pooling_kernel_size) + self.num_conv_layers = num_conv_layers + self.padding = ( + tuple(e // 2 for e in self.conv_kernel_size) + if padding is None + else _pair(padding) + ) + self.conv_stride = _pair(conv_stride) + + self.layers = nn.ModuleList() + for layer in range(num_conv_layers): + conv_op = nn.Conv2d( + in_channels if layer == 0 else out_channels, + out_channels, + self.conv_kernel_size, + stride=self.conv_stride, + padding=self.padding, + ) + self.layers.append(conv_op) + if layer_norm: + conv_output_dim, per_channel_dim = infer_conv_output_dim( + conv_op, input_dim, in_channels if layer == 0 else out_channels + ) + self.layers.append(nn.LayerNorm(per_channel_dim)) + input_dim = per_channel_dim + self.layers.append(nn.ReLU()) + + if self.pooling_kernel_size is not None: + pool_op = nn.MaxPool2d(kernel_size=self.pooling_kernel_size, ceil_mode=True) + self.layers.append(pool_op) + self.total_output_dim, self.output_dim = infer_conv_output_dim( + pool_op, input_dim, out_channels + ) + + def forward(self, x): + for i, _ in enumerate(self.layers): + x = self.layers[i](x) + return x diff --git a/SpeechT5/fairseq/fairseq/nan_detector.py b/SpeechT5/fairseq/fairseq/nan_detector.py new file mode 100644 index 0000000000000000000000000000000000000000..faa8031d4666c9ba9837919fe1c884dacf47ac3a --- /dev/null +++ b/SpeechT5/fairseq/fairseq/nan_detector.py @@ -0,0 +1,108 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. 
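
The `VGGBlock` docstring above describes `B x C x T x feat` inputs and the output dimensions inferred at construction time. A minimal sketch of building and calling the block follows; it assumes the vendored fairseq package is importable and the shapes are illustrative.

```python
# Illustrative shapes; assumes the vendored fairseq in this diff is importable.
import torch
from fairseq.modules.vggblock import VGGBlock

block = VGGBlock(
    in_channels=1,        # e.g. a single-channel log-mel spectrogram
    out_channels=32,
    conv_kernel_size=3,
    pooling_kernel_size=2,
    num_conv_layers=2,
    input_dim=80,         # feature dim, needed for LayerNorm / output-shape inference
    layer_norm=True,
)

x = torch.randn(4, 1, 100, 80)   # B x C x T x feat
y = block(x)
print(y.shape)           # torch.Size([4, 32, 50, 40]): time/feat halved by the 2x2 max-pool
print(block.output_dim)  # per-channel feature dim inferred at construction (40 here)
```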
+ +import logging + +import torch + + +logger = logging.getLogger(__name__) + + +class NanDetector: + """ + Detects the first NaN or Inf in forward and/or backward pass and logs, together with the module name + """ + + def __init__(self, model, forward=True, backward=True): + self.bhooks = [] + self.fhooks = [] + self.forward = forward + self.backward = backward + self.named_parameters = list(model.named_parameters()) + self.reset() + + for name, mod in model.named_modules(): + mod.__module_name = name + self.add_hooks(mod) + + def __enter__(self): + return self + + def __exit__(self, exc_type, exc_value, exc_traceback): + # Dump out all model gnorms to enable better debugging + norm = {} + gradients = {} + for name, param in self.named_parameters: + if param.grad is not None: + grad_norm = torch.norm(param.grad.data, p=2, dtype=torch.float32) + norm[name] = grad_norm.item() + if torch.isnan(grad_norm).any() or torch.isinf(grad_norm).any(): + gradients[name] = param.grad.data + if len(gradients) > 0: + logger.info("Detected nan/inf grad norm, dumping norms...") + logger.info(f"norms: {norm}") + logger.info(f"gradients: {gradients}") + + self.close() + + def add_hooks(self, module): + if self.forward: + self.fhooks.append(module.register_forward_hook(self.fhook_fn)) + if self.backward: + self.bhooks.append(module.register_backward_hook(self.bhook_fn)) + + def reset(self): + self.has_printed_f = False + self.has_printed_b = False + + def _detect(self, tensor, name, backward): + err = None + if ( + torch.is_floating_point(tensor) + # single value tensors (like the loss) will not provide much info + and tensor.numel() >= 2 + ): + with torch.no_grad(): + if torch.isnan(tensor).any(): + err = "NaN" + elif torch.isinf(tensor).any(): + err = "Inf" + if err is not None: + err = f"{err} detected in output of {name}, shape: {tensor.shape}, {'backward' if backward else 'forward'}" + return err + + def _apply(self, module, inp, x, backward): + if torch.is_tensor(x): + if isinstance(inp, tuple) and len(inp) > 0: + inp = inp[0] + err = self._detect(x, module.__module_name, backward) + if err is not None: + if torch.is_tensor(inp) and not backward: + err += ( + f" input max: {inp.max().item()}, input min: {inp.min().item()}" + ) + + has_printed_attr = "has_printed_b" if backward else "has_printed_f" + logger.warning(err) + setattr(self, has_printed_attr, True) + elif isinstance(x, dict): + for v in x.values(): + self._apply(module, inp, v, backward) + elif isinstance(x, list) or isinstance(x, tuple): + for v in x: + self._apply(module, inp, v, backward) + + def fhook_fn(self, module, inp, output): + if not self.has_printed_f: + self._apply(module, inp, output, backward=False) + + def bhook_fn(self, module, inp, output): + if not self.has_printed_b: + self._apply(module, inp, output, backward=True) + + def close(self): + for hook in self.fhooks + self.bhooks: + hook.remove() diff --git a/SpeechT5/fairseq/fairseq/ngram_repeat_block.py b/SpeechT5/fairseq/fairseq/ngram_repeat_block.py new file mode 100644 index 0000000000000000000000000000000000000000..854125149448a2d37ad2773cd1e6d614e73e0e79 --- /dev/null +++ b/SpeechT5/fairseq/fairseq/ngram_repeat_block.py @@ -0,0 +1,150 @@ +# Originally from Microsoft Corporation. +# Licensed under the MIT License. 
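
`NanDetector` above is meant to be used as a context manager around a forward/backward pass: it registers hooks on every submodule, warns about the first module whose output contains NaN/Inf, and dumps gradient norms on exit. A minimal sketch (the toy model and the injected NaN are purely illustrative):

```python
# Toy model with an injected NaN, purely for illustration.
import torch
import torch.nn as nn
from fairseq.nan_detector import NanDetector

model = nn.Sequential(nn.Linear(8, 8), nn.ReLU(), nn.Linear(8, 1))

x = torch.randn(4, 8)
x[0, 0] = float("nan")   # force a NaN so the forward hook has something to report

# NanDetector registers forward/backward hooks on every submodule, logs the first
# module whose output contains NaN/Inf, and removes the hooks on __exit__ (where it
# also dumps gradient norms if any of them are NaN/Inf).
with NanDetector(model):
    loss = model(x).sum()
    loss.backward()
```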
+ +""" Wrapper for ngram_repeat_block cuda extension """ +import torch +from torch import nn + +import math +from typing import Dict, List, Optional +import warnings + +try: + from fairseq import ngram_repeat_block_cuda + + EXTENSION_BUILT = True +except ImportError: + EXTENSION_BUILT = False + + +def is_cuda_extension_usable() -> bool: + """Check whether ngram_repeat_block_cuda is built properly""" + if not EXTENSION_BUILT or not torch.cuda.is_available(): + return False + bsz = 2 + tokens = torch.tensor([[4, 4, 3, 2], [1, 2, 3, 4]], dtype=torch.long, device="cuda") + lprobs = torch.rand((8, 12), device="cuda") + try: + outputs = ngram_repeat_block_cuda.forward(tokens, lprobs, bsz, 3, 4, 3) + outputs = outputs + 4 # This line breaks if the extension is built incorrectly. + return True + except RuntimeError: + warnings.warn( + "NGramRepeatBlock extension must be rebuilt." + 'Run TORCH_CUDA_ARCH_LIST="6.0;6.1;7.0" python setup.py build_ext --inplace' + ) + return False + + +class NGramRepeatBlock(nn.Module): + """ Wrapper class for calling ngram_repeat_block cuda extension """ + + def __init__(self, no_repeat_ngram_size: int, use_extension: bool = True): + super().__init__() + self.use_extension = is_cuda_extension_usable() if use_extension else False + self.no_repeat_ngram_size = no_repeat_ngram_size + + def reset_parameters(self): + pass + + @torch.jit.unused + def call_cuda_extension( + self, + tokens, + lprobs, + bsz: int, + beam_size: int, + step: int, + ): + return ngram_repeat_block_cuda.forward( + tokens, lprobs, bsz, step, beam_size, self.no_repeat_ngram_size + ) + + def forward( + self, + tokens, + lprobs, + bsz: int, + beam_size: int, + step: int, + ): + """ + Args: + tokens(Tensor): Input tokens(Bsz*beam, seq_len) + lprobs(Tensor): likelihood probability, + Expected to be updated in place.(Bsz*beam, vocab_size) + bsz(int): batch size + step(int): current step + beam_size(int): beam size + no_repeat_ngram_size(int): Ngram size + """ + msg = f"expected {bsz *beam_size} got" + assert tokens.size(0) == bsz * beam_size, f"{msg} {tokens.size(0)}" + assert lprobs.size(0) == bsz * beam_size, f"{msg} {lprobs.size(0)}" + if self.use_extension: + return self.call_cuda_extension(tokens, lprobs, bsz, beam_size, step) + + else: + return self._no_repeat_ngram( + tokens, + lprobs, + bsz, + beam_size, + step, + ) + + def _no_repeat_ngram(self, tokens, lprobs, bsz: int, beam_size: int, step: int): + """For each hypothesis generate a list of previous ngrams and set associated lprobs to -inf""" + gen_ngrams: List[Dict[str, List[int]]] = [ + torch.jit.annotate(Dict[str, List[int]], {}) + for bbsz_idx in range(bsz * beam_size) + ] + cpu_tokens = tokens.cpu() + for bbsz_idx in range(bsz * beam_size): + gen_tokens: List[int] = cpu_tokens[bbsz_idx].tolist() + for ngram in self.transpose_list( + [gen_tokens[i:] for i in range(self.no_repeat_ngram_size)] + ): + key = ",".join([str(x) for x in ngram[:-1]]) + gen_ngrams[bbsz_idx][key] = gen_ngrams[bbsz_idx].get( + key, torch.jit.annotate(List[int], []) + ) + [ngram[-1]] + if step + 2 - self.no_repeat_ngram_size >= 0: + # no banned tokens if we haven't generated no_repeat_ngram_size tokens yet + banned_tokens = [ + self.calculate_banned_tokens( + tokens, step, gen_ngrams, self.no_repeat_ngram_size, bbsz_idx + ) + for bbsz_idx in range(bsz * beam_size) + ] + else: + banned_tokens = [ + torch.jit.annotate(List[int], []) for bbsz_idx in range(bsz * beam_size) + ] + for bbsz_idx in range(bsz * beam_size): + lprobs[bbsz_idx][ + 
torch.tensor(banned_tokens[bbsz_idx], dtype=torch.int64) + ] = torch.tensor(-math.inf).to(lprobs) + return lprobs + + @staticmethod + def calculate_banned_tokens( + tokens, + step: int, + gen_ngrams: List[Dict[str, List[int]]], + no_repeat_ngram_size: int, + bbsz_idx: int, + ): + tokens_list: List[int] = tokens[ + bbsz_idx, step + 2 - no_repeat_ngram_size : step + 1 + ].tolist() + # before decoding the next token, prevent decoding of ngrams that have already appeared + ngram_index = ",".join([str(x) for x in tokens_list]) + return gen_ngrams[bbsz_idx].get(ngram_index, torch.jit.annotate(List[int], [])) + + @staticmethod + def transpose_list(l: List[List[int]]): + # GeneratorExp aren't supported in TS so ignoring the lint + min_len = min([len(x) for x in l]) # noqa + l2 = [[row[i] for row in l] for i in range(min_len)] + return l2 diff --git a/SpeechT5/fairseq/fairseq/optim/__init__.py b/SpeechT5/fairseq/fairseq/optim/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..be783be896396ff659c0bd173a7acebb8a2d165d --- /dev/null +++ b/SpeechT5/fairseq/fairseq/optim/__init__.py @@ -0,0 +1,48 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. +"""isort:skip_file""" + +import importlib +import os + +from fairseq import registry +from fairseq.optim.bmuf import FairseqBMUF # noqa +from fairseq.optim.fairseq_optimizer import ( # noqa + FairseqOptimizer, + LegacyFairseqOptimizer, +) +from fairseq.optim.amp_optimizer import AMPOptimizer +from fairseq.optim.fp16_optimizer import FP16Optimizer, MemoryEfficientFP16Optimizer +from fairseq.optim.shard import shard_ +from omegaconf import DictConfig + +__all__ = [ + "AMPOptimizer", + "FairseqOptimizer", + "FP16Optimizer", + "MemoryEfficientFP16Optimizer", + "shard_", +] + +( + _build_optimizer, + register_optimizer, + OPTIMIZER_REGISTRY, + OPTIMIZER_DATACLASS_REGISTRY, +) = registry.setup_registry("--optimizer", base_class=FairseqOptimizer, required=True) + + +def build_optimizer(cfg: DictConfig, params, *extra_args, **extra_kwargs): + if all(isinstance(p, dict) for p in params): + params = [t for p in params for t in p.values()] + params = list(filter(lambda p: p.requires_grad, params)) + return _build_optimizer(cfg, params, *extra_args, **extra_kwargs) + + +# automatically import any Python files in the optim/ directory +for file in sorted(os.listdir(os.path.dirname(__file__))): + if file.endswith(".py") and not file.startswith("_"): + file_name = file[: file.find(".py")] + importlib.import_module("fairseq.optim." + file_name) diff --git a/SpeechT5/fairseq/fairseq/optim/adadelta.py b/SpeechT5/fairseq/fairseq/optim/adadelta.py new file mode 100644 index 0000000000000000000000000000000000000000..f1a21549770f0904a6a40a42ff7eb52811f1bfbe --- /dev/null +++ b/SpeechT5/fairseq/fairseq/optim/adadelta.py @@ -0,0 +1,47 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +import torch.optim + +from . 
import LegacyFairseqOptimizer, register_optimizer + + +@register_optimizer("adadelta") +class Adadelta(LegacyFairseqOptimizer): + def __init__(self, args, params): + super().__init__(args) + self._optimizer = torch.optim.Adadelta(params, **self.optimizer_config) + + @staticmethod + def add_args(parser): + """Add optimizer-specific arguments to the parser.""" + # fmt: off + parser.add_argument('--adadelta-rho', type=float, default=0.9, metavar='RHO', + help='coefficient used for computing a running average of squared gradients') + parser.add_argument('--adadelta-eps', type=float, default=1e-6, metavar='EPS', + help='term added to the denominator to improve numerical stability') + parser.add_argument('--weight-decay', '--wd', default=0.0, type=float, metavar='WD', + help='weight decay') + parser.add_argument('--anneal-eps', action='store_true', help='flag to anneal eps') + # fmt: on + + @property + def optimizer_config(self): + """ + Return a kwarg dictionary that will be used to override optimizer + args stored in checkpoints. This allows us to load a checkpoint and + resume training using a different set of optimizer args, e.g., with a + different learning rate. + """ + return { + "lr": self.args.lr[0], + "rho": self.args.adadelta_rho, + "eps": self.args.adadelta_eps, + "weight_decay": self.args.weight_decay, + } + + @property + def supports_flat_params(self): + return True diff --git a/SpeechT5/fairseq/fairseq/optim/adafactor.py b/SpeechT5/fairseq/fairseq/optim/adafactor.py new file mode 100644 index 0000000000000000000000000000000000000000..c969b9fbc0d229a25f2046ec67c53c57a433814b --- /dev/null +++ b/SpeechT5/fairseq/fairseq/optim/adafactor.py @@ -0,0 +1,268 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +import math + +import torch +import torch.optim + +from . import LegacyFairseqOptimizer, register_optimizer + + +@register_optimizer("adafactor") +class FairseqAdafactor(LegacyFairseqOptimizer): + def __init__(self, args, params): + super().__init__(args) + self._optimizer = Adafactor(params, **self.optimizer_config) + + @staticmethod + def add_args(parser): + """Add optimizer-specific arguments to the parser.""" + # fmt: off + parser.add_argument('--adafactor-eps', default='(1e-30, 1e-3)', metavar="E", + help='epsilons for Adafactor optimizer') + parser.add_argument('--clip-threshold', type=float, default=1.0, metavar="C", + help='threshold for clipping update root mean square') + parser.add_argument('--decay-rate', type=float, default=-0.8, metavar="D", + help='decay rate of the second moment estimator') + parser.add_argument('--beta1', type=float, default=None, metavar="B", + help='beta for first moment estimator. Optional') + parser.add_argument('--weight-decay', '--wd', default=0.0, type=float, metavar='WD', + help='weight decay') + parser.add_argument('--scale-parameter', action='store_true', + help='scale learning rate by root mean square of parameter') + parser.add_argument('--relative-step', action='store_true', + help='set learning rate to inverse square root of timestep,' + 'otherwise use external learning rate') + parser.add_argument('--warmup-init', action='store_true', + help='use relative step for warm-up learning rate schedule') + # fmt: on + + @property + def optimizer_config(self): + """ + Return a kwarg dictionary that will be used to override optimizer + args stored in checkpoints. 
This allows us to load a checkpoint and + resume training using a different set of optimizer args, e.g., with a + different learning rate. + Note : Convergence issues empirically observed with fp16 on. + Might require search for appropriate configuration. + """ + return { + "lr": self.args.lr[0], + "eps": eval(self.args.adafactor_eps), + "clip_threshold": self.args.clip_threshold, + "decay_rate": self.args.decay_rate, + "beta1": self.args.beta1, + "weight_decay": self.args.weight_decay, + "scale_parameter": self.args.scale_parameter, # defaults to False + "relative_step": self.args.relative_step, # defaults to False + "warmup_init": self.args.warmup_init, + } + + +class Adafactor(torch.optim.Optimizer): + """Implements Adafactor algorithm. + + This implementation is based on: + `Adafactor: Adaptive Learning Rates with Sublinear Memory Cost` + (see https://arxiv.org/abs/1804.04235) + + Note that this optimizer internally adjusts the learning rate + depending on the *scale_parameter*, *relative_step* and + *warmup_init* options. To use a manual (external) learning rate + schedule you should set `scale_parameter=False` and + `relative_step=False`. + + Args: + params (iterable): iterable of parameters to optimize or dicts defining + parameter groups + lr (float, optional): external learning rate (default: None) + eps (tuple[float, float]): regularization constans for square gradient + and parameter scale respectively (default: (1e-30, 1e-3)) + clip_threshold (float): threshold of root mean square of + final gradient update (default: 1.0) + decay_rate (float): coefficient used to compute running averages of square + gradient (default: -0.8) + beta1 (float): coefficient used for computing running averages of gradient + (default: None) + weight_decay (float, optional): weight decay (L2 penalty) (default: 0) + scale_parameter (bool): if True, learning rate is scaled by root mean square of + parameter (default: True) + relative_step (bool): if True, time-dependent learning rate is computed + instead of external learning rate (default: True) + warmup_init (bool): time-dependent learning rate computation depends on + whether warm-up initialization is being used (default: False) + """ + + def __init__( + self, + params, + lr=None, + eps=(1e-30, 1e-3), + clip_threshold=1.0, + decay_rate=-0.8, + beta1=None, + weight_decay=0.0, + scale_parameter=True, + relative_step=True, + warmup_init=False, + ): + if lr is not None and relative_step: + raise ValueError("Cannot combine manual lr and relative_step options") + if warmup_init and not relative_step: + raise ValueError("warmup_init requires relative_step=True") + + defaults = dict( + lr=lr, + eps=eps, + clip_threshold=clip_threshold, + decay_rate=decay_rate, + beta1=beta1, + weight_decay=weight_decay, + scale_parameter=scale_parameter, + relative_step=relative_step, + warmup_init=warmup_init, + ) + super(Adafactor, self).__init__(params, defaults) + + @property + def supports_memory_efficient_fp16(self): + return True + + @property + def supports_flat_params(self): + return False + + def _get_lr(self, param_group, param_state): + rel_step_sz = param_group["lr"] + if param_group["relative_step"]: + min_step = ( + 1e-6 * param_state["step"] if param_group["warmup_init"] else 1e-2 + ) + rel_step_sz = min(min_step, 1.0 / math.sqrt(param_state["step"])) + param_scale = 1.0 + if param_group["scale_parameter"]: + param_scale = max(param_group["eps"][1], param_state["RMS"]) + return param_scale * rel_step_sz + + def _get_options(self, param_group, param_shape): + 
factored = len(param_shape) >= 2 + use_first_moment = param_group["beta1"] is not None + return factored, use_first_moment + + def _rms(self, tensor): + return tensor.norm(2) / (tensor.numel() ** 0.5) + + def _approx_sq_grad(self, exp_avg_sq_row, exp_avg_sq_col): + r_factor = ( + (exp_avg_sq_row / exp_avg_sq_row.mean(dim=-1, keepdim=True)) + .rsqrt_() + .unsqueeze(-1) + ) + c_factor = exp_avg_sq_col.unsqueeze(-2).rsqrt() + return torch.mul(r_factor, c_factor) + + def step(self, closure=None): + """Performs a single optimization step. + + Args: + closure (callable, optional): A closure that reevaluates the model + and returns the loss. + """ + loss = None + if closure is not None: + loss = closure() + + for group in self.param_groups: + for p in group["params"]: + if p.grad is None: + continue + grad = p.grad.data + if grad.dtype in {torch.float16, torch.bfloat16}: + grad = grad.float() + if grad.is_sparse: + raise RuntimeError("Adafactor does not support sparse gradients.") + + state = self.state[p] + grad_shape = grad.shape + + factored, use_first_moment = self._get_options(group, grad_shape) + # State Initialization + if len(state) == 0: + state["step"] = 0 + + if use_first_moment: + # Exponential moving average of gradient values + state["exp_avg"] = torch.zeros_like(grad) + if factored: + state["exp_avg_sq_row"] = torch.zeros(grad_shape[:-1]).to(grad) + state["exp_avg_sq_col"] = torch.zeros( + grad_shape[:-2] + grad_shape[-1:] + ).to(grad) + else: + state["exp_avg_sq"] = torch.zeros_like(grad) + + state["RMS"] = 0 + else: + if use_first_moment: + state["exp_avg"] = state["exp_avg"].to(grad) + if factored: + state["exp_avg_sq_row"] = state["exp_avg_sq_row"].to(grad) + state["exp_avg_sq_col"] = state["exp_avg_sq_col"].to(grad) + else: + state["exp_avg_sq"] = state["exp_avg_sq"].to(grad) + + p_data_fp32 = p.data + if p.data.dtype in {torch.float16, torch.bfloat16}: + p_data_fp32 = p_data_fp32.float() + + state["step"] += 1 + state["RMS"] = self._rms(p_data_fp32) + group["lr"] = self._get_lr(group, state) + + beta2t = 1.0 - math.pow(state["step"], group["decay_rate"]) + update = (grad ** 2) + group["eps"][0] + if factored: + exp_avg_sq_row = state["exp_avg_sq_row"] + exp_avg_sq_col = state["exp_avg_sq_col"] + + exp_avg_sq_row.mul_(beta2t).add_( + update.mean(dim=-1), alpha=1.0 - beta2t + ) + exp_avg_sq_col.mul_(beta2t).add_( + update.mean(dim=-2), alpha=1.0 - beta2t + ) + + # Approximation of exponential moving average of square of gradient + update = self._approx_sq_grad(exp_avg_sq_row, exp_avg_sq_col) + update.mul_(grad) + else: + exp_avg_sq = state["exp_avg_sq"] + + exp_avg_sq.mul_(beta2t).add_(update, alpha=1.0 - beta2t) + update = exp_avg_sq.rsqrt().mul_(grad) + + update.div_( + (self._rms(update) / group["clip_threshold"]).clamp_(min=1.0) + ) + update.mul_(group["lr"]) + + if use_first_moment: + exp_avg = state["exp_avg"] + exp_avg.mul_(group["beta1"]).add_(update, alpha=1 - group["beta1"]) + update = exp_avg + + if group["weight_decay"] != 0: + p_data_fp32.add_( + p_data_fp32, alpha=-group["weight_decay"] * group["lr"] + ) + + p_data_fp32.add_(-update) + + if p.data.dtype in {torch.float16, torch.bfloat16}: + p.data.copy_(p_data_fp32) + + return loss diff --git a/SpeechT5/fairseq/fairseq/optim/adagrad.py b/SpeechT5/fairseq/fairseq/optim/adagrad.py new file mode 100644 index 0000000000000000000000000000000000000000..4f539541c1c91d8c822f7ce624fa6eabf744f60e --- /dev/null +++ b/SpeechT5/fairseq/fairseq/optim/adagrad.py @@ -0,0 +1,40 @@ +# Copyright (c) Facebook, Inc. 
and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +import torch.optim + +from . import LegacyFairseqOptimizer, register_optimizer + + +@register_optimizer("adagrad") +class Adagrad(LegacyFairseqOptimizer): + def __init__(self, args, params): + super().__init__(args) + self._optimizer = torch.optim.Adagrad(params, **self.optimizer_config) + + @staticmethod + def add_args(parser): + """Add optimizer-specific arguments to the parser.""" + # fmt: off + parser.add_argument('--weight-decay', '--wd', default=0.0, type=float, metavar='WD', + help='weight decay') + # fmt: on + + @property + def optimizer_config(self): + """ + Return a kwarg dictionary that will be used to override optimizer + args stored in checkpoints. This allows us to load a checkpoint and + resume training using a different set of optimizer args, e.g., with a + different learning rate. + """ + return { + "lr": self.args.lr[0], + "weight_decay": self.args.weight_decay, + } + + @property + def supports_flat_params(self): + return False diff --git a/SpeechT5/fairseq/fairseq/optim/adam.py b/SpeechT5/fairseq/fairseq/optim/adam.py new file mode 100644 index 0000000000000000000000000000000000000000..6a31e53a6285b75a2e0ee03ae54a9ec94df00e9d --- /dev/null +++ b/SpeechT5/fairseq/fairseq/optim/adam.py @@ -0,0 +1,228 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +import logging +import math +from collections.abc import Collection +from dataclasses import dataclass, field +from typing import Any, List + +import torch +import torch.distributed as dist +import torch.optim +from fairseq.dataclass import FairseqDataclass +from fairseq.optim import FairseqOptimizer, register_optimizer +from fairseq.optim.fused_adam import get_fused_adam_class +from omegaconf import II, OmegaConf + + +logger = logging.getLogger(__name__) + + +@dataclass +class FairseqAdamConfig(FairseqDataclass): + adam_betas: Any = field( + default=(0.9, 0.999), metadata={"help": "betas for Adam optimizer"} + ) + adam_eps: float = field( + default=1e-8, metadata={"help": "epsilon for Adam optimizer"} + ) + weight_decay: float = field(default=0.0, metadata={"help": "weight decay"}) + use_old_adam: bool = field( + default=False, metadata={"help": "Use fairseq.optim.adam.Adam"} + ) + # TODO common vars below in parent + tpu: bool = II("common.tpu") + lr: List[float] = II("optimization.lr") + + +@register_optimizer("adam", dataclass=FairseqAdamConfig) +class FairseqAdam(FairseqOptimizer): + """Adam optimizer for fairseq. + + Important note: this optimizer corresponds to the "AdamW" variant of + Adam in its weight decay behavior. As such, it is most closely + analogous to torch.optim.AdamW from PyTorch. 
+ """ + + def __init__(self, cfg: FairseqAdamConfig, params): + super().__init__(cfg) + fused_adam_cls = get_fused_adam_class() + use_fused_adam = ( + not getattr(cfg, "use_old_adam", False) + and fused_adam_cls is not None + and torch.cuda.is_available() + ) + if getattr(cfg, "tpu", False): + # on TPUs we use the Adam defined here, since it + # automatically casts gradients to FP32 + self._optimizer = Adam(params, **self.optimizer_config) + elif use_fused_adam: + logger.info("using FusedAdam") + self._optimizer = fused_adam_cls(params, **self.optimizer_config) + else: + self._optimizer = Adam(params, **self.optimizer_config) + + @property + def optimizer_config(self): + """ + Return a kwarg dictionary that will be used to override optimizer + args stored in checkpoints. This allows us to load a checkpoint and + resume training using a different set of optimizer args, e.g., with a + different learning rate. + """ + return { + "lr": self.cfg.lr[0] + if isinstance(self.cfg.lr, Collection) + else self.cfg.lr, + "betas": eval(self.cfg.adam_betas) + if isinstance(self.cfg.adam_betas, str) + else OmegaConf.to_container(self.cfg.adam_betas), + "eps": self.cfg.adam_eps, + "weight_decay": self.cfg.weight_decay, + } + + def average_params(self): + """Reduce Params is only used during BMUF distributed training.""" + state_dict = self.optimizer.state_dict() + total_gpus = float(dist.get_world_size()) + + for _, value in state_dict["state"].items(): + value["exp_avg"] /= total_gpus + value["exp_avg_sq"] /= total_gpus + dist.all_reduce(value["exp_avg"], op=dist.ReduceOp.SUM) + dist.all_reduce(value["exp_avg_sq"], op=dist.ReduceOp.SUM) + + +class Adam(torch.optim.Optimizer): + r"""Implements Adam algorithm. + + This implementation is modified from torch.optim.Adam based on: + `Fixed Weight Decay Regularization in Adam` + (see https://arxiv.org/abs/1711.05101) + + It has been proposed in `Adam: A Method for Stochastic Optimization`_. + + Args: + params (iterable): iterable of parameters to optimize or dicts defining + parameter groups + lr (float, optional): learning rate (default: 1e-3) + betas (Tuple[float, float], optional): coefficients used for computing + running averages of gradient and its square (default: (0.9, 0.999)) + eps (float, optional): term added to the denominator to improve + numerical stability (default: 1e-8) + weight_decay (float, optional): weight decay (L2 penalty) (default: 0) + amsgrad (boolean, optional): whether to use the AMSGrad variant of this + algorithm from the paper `On the Convergence of Adam and Beyond`_ + + .. _Adam\: A Method for Stochastic Optimization: + https://arxiv.org/abs/1412.6980 + .. _On the Convergence of Adam and Beyond: + https://openreview.net/forum?id=ryQu7f-RZ + """ + + def __init__( + self, + params, + lr=1e-3, + betas=(0.9, 0.999), + eps=1e-8, + weight_decay=0, + amsgrad=False, + ): + defaults = dict( + lr=lr, betas=betas, eps=eps, weight_decay=weight_decay, amsgrad=amsgrad + ) + super(Adam, self).__init__(params, defaults) + + @property + def supports_memory_efficient_fp16(self): + return True + + @property + def supports_flat_params(self): + return True + + def step(self, closure=None): + """Performs a single optimization step. + + Args: + closure (callable, optional): A closure that reevaluates the model + and returns the loss. 
+ """ + loss = None + if closure is not None: + loss = closure() + + for group in self.param_groups: + for p in group["params"]: + if p.grad is None: + continue + grad = p.grad.data + if grad.dtype in {torch.float16, torch.bfloat16}: + grad = grad.float() + if grad.is_sparse: + raise RuntimeError( + "Adam does not support sparse gradients, please consider SparseAdam instead" + ) + amsgrad = group.get("amsgrad", False) + + p_data_fp32 = p.data + if p.data.dtype in {torch.float16, torch.bfloat16}: + p_data_fp32 = p_data_fp32.float() + + state = self.state[p] + + # State initialization + if len(state) == 0: + state["step"] = 0 + # Exponential moving average of gradient values + state["exp_avg"] = torch.zeros_like(p_data_fp32) + # Exponential moving average of squared gradient values + state["exp_avg_sq"] = torch.zeros_like(p_data_fp32) + if amsgrad: + # Maintains max of all exp. moving avg. of sq. grad. values + state["max_exp_avg_sq"] = torch.zeros_like(p_data_fp32) + else: + state["exp_avg"] = state["exp_avg"].to(p_data_fp32) + state["exp_avg_sq"] = state["exp_avg_sq"].to(p_data_fp32) + if amsgrad: + state["max_exp_avg_sq"] = state["max_exp_avg_sq"].to( + p_data_fp32 + ) + + exp_avg, exp_avg_sq = state["exp_avg"], state["exp_avg_sq"] + if amsgrad: + max_exp_avg_sq = state["max_exp_avg_sq"] + beta1, beta2 = group["betas"] + + state["step"] += 1 + + # Decay the first and second moment running average coefficient + exp_avg.mul_(beta1).add_(grad, alpha=1 - beta1) + exp_avg_sq.mul_(beta2).addcmul_(grad, grad, value=1 - beta2) + if amsgrad: + # Maintains the maximum of all 2nd moment running avg. till now + torch.max(max_exp_avg_sq, exp_avg_sq, out=max_exp_avg_sq) + # Use the max. for normalizing running avg. of gradient + denom = max_exp_avg_sq.sqrt().add_(group["eps"]) + else: + denom = exp_avg_sq.sqrt().add_(group["eps"]) + + bias_correction1 = 1 - beta1 ** state["step"] + bias_correction2 = 1 - beta2 ** state["step"] + step_size = group["lr"] * math.sqrt(bias_correction2) / bias_correction1 + + if group["weight_decay"] != 0: + p_data_fp32.add_( + p_data_fp32, alpha=-group["weight_decay"] * group["lr"] + ) + + p_data_fp32.addcdiv_(exp_avg, denom, value=-step_size) + + if p.data.dtype in {torch.float16, torch.bfloat16}: + p.data.copy_(p_data_fp32) + + return loss diff --git a/SpeechT5/fairseq/fairseq/optim/adamax.py b/SpeechT5/fairseq/fairseq/optim/adamax.py new file mode 100644 index 0000000000000000000000000000000000000000..98ff8ad7ad6c12ab5efc53ca76db2f1663be7906 --- /dev/null +++ b/SpeechT5/fairseq/fairseq/optim/adamax.py @@ -0,0 +1,172 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +import torch +import torch.optim + +from . 
import LegacyFairseqOptimizer, register_optimizer + + +@register_optimizer("adamax") +class FairseqAdamax(LegacyFairseqOptimizer): + def __init__(self, args, params): + super().__init__(args) + self._optimizer = Adamax(params, **self.optimizer_config) + + @staticmethod + def add_args(parser): + """Add optimizer-specific arguments to the parser.""" + # fmt: off + parser.add_argument('--adamax-betas', default='(0.9, 0.999)', metavar='B', + help='betas for Adam optimizer') + parser.add_argument('--adamax-eps', type=float, default=1e-8, metavar='D', + help='epsilon for Adam optimizer') + parser.add_argument('--weight-decay', '--wd', default=0.0, type=float, metavar='WD', + help='weight decay') + parser.add_argument('--no-bias-correction', default=False, action='store_true', + help='disable bias correction') + # fmt: on + + @property + def optimizer_config(self): + """ + Return a kwarg dictionary that will be used to override optimizer + args stored in checkpoints. This allows us to load a checkpoint and + resume training using a different set of optimizer args, e.g., with a + different learning rate. + """ + return { + "lr": self.args.lr[0], + "betas": eval(self.args.adamax_betas), + "eps": self.args.adamax_eps, + "weight_decay": self.args.weight_decay, + "bias_correction": not self.args.no_bias_correction, + } + + +class Adamax(torch.optim.Optimizer): + """Implements Adamax algorithm (a variant of Adam based on infinity norm). + + It has been proposed in `Adam: A Method for Stochastic Optimization`__. + + Compared to the version in PyTorch, this version implements a fix for weight decay. + + Args: + params (iterable): iterable of parameters to optimize or dicts defining + parameter groups + lr (float, optional): learning rate (default: 2e-3) + betas (Tuple[float, float], optional): coefficients used for computing + running averages of gradient and its square + eps (float, optional): term added to the denominator to improve + numerical stability (default: 1e-8) + weight_decay (float, optional): weight decay (L2 penalty) (default: 0) + bias_correction (bool, optional): enable bias correction (default: True) + + __ https://arxiv.org/abs/1412.6980 + """ + + def __init__( + self, + params, + lr=2e-3, + betas=(0.9, 0.999), + eps=1e-8, + weight_decay=0, + bias_correction=True, + ): + if not 0.0 <= lr: + raise ValueError("Invalid learning rate: {}".format(lr)) + if not 0.0 <= eps: + raise ValueError("Invalid epsilon value: {}".format(eps)) + if not 0.0 <= betas[0] < 1.0: + raise ValueError("Invalid beta parameter at index 0: {}".format(betas[0])) + if not 0.0 <= betas[1] < 1.0: + raise ValueError("Invalid beta parameter at index 1: {}".format(betas[1])) + if not 0.0 <= weight_decay: + raise ValueError("Invalid weight_decay value: {}".format(weight_decay)) + + defaults = dict( + lr=lr, + betas=betas, + eps=eps, + weight_decay=weight_decay, + bias_correction=bias_correction, + ) + super(Adamax, self).__init__(params, defaults) + + @property + def supports_memory_efficient_fp16(self): + return True + + @property + def supports_flat_params(self): + return True + + def step(self, closure=None): + """Performs a single optimization step. + + Args: + closure (callable, optional): A closure that reevaluates the model + and returns the loss. 
+ """ + loss = None + if closure is not None: + loss = closure() + + for group in self.param_groups: + for p in group["params"]: + if p.grad is None: + continue + grad = p.grad.data.float() + if grad.is_sparse: + raise RuntimeError("Adamax does not support sparse gradients") + + p_data_fp32 = p.data + if p.data.dtype in {torch.float16, torch.bfloat16}: + p_data_fp32 = p_data_fp32.float() + + state = self.state[p] + + # State initialization + if len(state) == 0: + state["step"] = 0 + state["exp_avg"] = torch.zeros_like(p_data_fp32) + state["exp_inf"] = torch.zeros_like(p_data_fp32) + else: + state["exp_avg"] = state["exp_avg"].to(p_data_fp32) + state["exp_inf"] = state["exp_inf"].to(p_data_fp32) + + exp_avg, exp_inf = state["exp_avg"], state["exp_inf"] + beta1, beta2 = group["betas"] + eps = group["eps"] + + state["step"] += 1 + + # Update biased first moment estimate. + exp_avg.mul_(beta1).add_(grad, alpha=1 - beta1) + + # Update the exponentially weighted infinity norm. + torch.max( + exp_inf.mul_(beta2), + grad.abs_(), + out=exp_inf, + ) + + step_size = group["lr"] + if group["bias_correction"]: + bias_correction = 1 - beta1 ** state["step"] + step_size /= bias_correction + + if group["weight_decay"] != 0: + p_data_fp32.add_( + p_data_fp32, alpha=-group["weight_decay"] * group["lr"] + ) + + p_data_fp32.addcdiv_(exp_avg, exp_inf.add(eps), value=-step_size) + + if p.data.dtype in {torch.float16, torch.bfloat16}: + p.data.copy_(p_data_fp32) + + return loss diff --git a/SpeechT5/fairseq/fairseq/optim/amp_optimizer.py b/SpeechT5/fairseq/fairseq/optim/amp_optimizer.py new file mode 100644 index 0000000000000000000000000000000000000000..3b7958e50ce444474c48d1f5aeff05d66c19e5b6 --- /dev/null +++ b/SpeechT5/fairseq/fairseq/optim/amp_optimizer.py @@ -0,0 +1,105 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +import logging + +import torch +from fairseq import optim +from omegaconf import DictConfig + +logger = logging.getLogger(__name__) + + +class AMPOptimizer(optim.FairseqOptimizer): + """ + Wrap an *optimizer* to support AMP (automatic mixed precision) training. + """ + + def __init__(self, cfg: DictConfig, params, fp32_optimizer, **kwargs): + super().__init__(cfg.optimizer) + self.fp32_optimizer = fp32_optimizer + amp_kwargs = {"init_scale": cfg.common.fp16_init_scale} + if getattr(cfg.common, "amp_scale_window", None) is not None: + amp_kwargs["growth_interval"] = cfg.common.amp_init_scale + self._grad_scaler = torch.cuda.amp.GradScaler(**amp_kwargs) + self.min_loss_scale = cfg.common.min_loss_scale + + @classmethod + def build_optimizer(cls, cfg: DictConfig, params, **kwargs): + """ + Args: + cfg (omegaconf.DictConfig): fairseq args + params (iterable): iterable of parameters to optimize + """ + fp32_optimizer = optim.build_optimizer(cfg.optimizer, params) + return cls(cfg, params, fp32_optimizer, **kwargs) + + def backward(self, loss): + """Computes the sum of gradients of the given tensor w.r.t. graph leaves. + + Compared to :func:`fairseq.optim.FairseqOptimizer.backward`, this + function additionally dynamically scales the loss to avoid gradient + underflow. 
+ """ + self._grad_scaler.scale(loss).backward() + + def step(self): + self.scaler.step(self.fp32_optimizer) + self.scaler.update() + + def clip_grad_norm(self, max_norm, aggregate_norm_fn=None): + """Clips gradient norm.""" + self.scaler.unscale_(self.optimizer) + grad_norm = self.fp32_optimizer.clip_grad_norm(max_norm, aggregate_norm_fn) + if not torch.isfinite(grad_norm).all(): + new_loss_scale = self.next_loss_scale + if new_loss_scale <= self.min_loss_scale: + raise FloatingPointError( + ( + "AMP: Minimum loss scale reached ({}). Your loss is probably exploding. " + "Try restarting training or use fp32. {}" + ).format(self.min_loss_scale, new_loss_scale) + ) + else: + logger.info("AMP: overflow detected, setting scale to " + f"to {new_loss_scale}") + return grad_norm + + @property + def scaler(self): + return self._grad_scaler + + @property + def next_loss_scale(self): + return self.scaler.get_scale() * self.scaler.get_backoff_factor() + + @property + def optimizer(self): + return self.fp32_optimizer.optimizer + + @optimizer.setter + def optimizer(self, optimizer): + self.fp32_optimizer.optimizer = optimizer + + @property + def lr_scheduler(self): + return getattr(self.fp32_optimizer, "lr_scheduler", None) + + @property + def optimizer_config(self): + return self.fp32_optimizer.optimizer_config + + def get_lr(self): + return self.fp32_optimizer.get_lr() + + def set_lr(self, lr): + self.fp32_optimizer.set_lr(lr) + + def all_reduce_grads(self, module): + self.fp32_optimizer.all_reduce_grads(module) + + @property + def supports_flat_params(self): + return self.fp32_optimizer.supports_flat_params diff --git a/SpeechT5/fairseq/fairseq/optim/bmuf.py b/SpeechT5/fairseq/fairseq/optim/bmuf.py new file mode 100644 index 0000000000000000000000000000000000000000..d6d0e04e86eb894efe59e13a78843d01ca9e651d --- /dev/null +++ b/SpeechT5/fairseq/fairseq/optim/bmuf.py @@ -0,0 +1,200 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. 
+ +from dataclasses import dataclass, field + +import torch +import torch.distributed as dist +from fairseq.dataclass.configs import FairseqBMUFConfig +from fairseq.dataclass.utils import gen_parser_from_dataclass +from fairseq.optim.fairseq_optimizer import FairseqOptimizer + + +class FairseqBMUF(FairseqOptimizer): + """ + Implements incremental block distributed data parallelism similar to + https://ieeexplore.ieee.org/document/7472805 + + Paper title: Scalable training of deep learning machines by incremental + block training with intra-block parallel optimization and blockwise + model-update filtering + """ + + def __init__(self, cfg: FairseqBMUFConfig, optimizer): + super().__init__(cfg) + self._optimizer = optimizer + self._num_updates = 0 + self.sync_iter = cfg.global_sync_iter + self.block_momentum = cfg.block_momentum + self.block_lr = cfg.block_lr + self._reset_local_data() + self.warmup_iteration = cfg.warmup_iterations + self.use_nbm = cfg.use_nbm + self.initial_state = self._optimizer.state_dict() + self.average_sync = self.cfg.average_sync + self.world_size = self.cfg.distributed_world_size + + @staticmethod + def add_args(parser): + """Add optimizer-specific arguments to the parser.""" + gen_parser_from_dataclass(parser, FairseqBMUFConfig()) + + @property + def optimizer(self): + return self._optimizer.optimizer + + @property + def optimizer_config(self): + return self._optimizer.optimizer_config + + def get_lr(self): + return self._optimizer.get_lr() + + def set_lr(self, lr): + self._optimizer.set_lr(lr) + + def state_dict(self): + return self._optimizer.state_dict() + + def load_state_dict(self, state_dict, optimizer_overrides=None): + self._optimizer.load_state_dict(state_dict, optimizer_overrides) + self.initial_state = self._optimizer.state_dict() + + def multiply_grads(self, c): + """Multiplies grads by a constant *c*.""" + self._optimizer.multiply_grads(c) + + def clip_grad_norm(self, max_norm, aggregate_norm_fn=None): + """Clips gradient norm.""" + return self._optimizer.clip_grad_norm(max_norm, aggregate_norm_fn) + + def average_params(self): + self._optimizer.average_params() + + def _block_sync(self): + if self.world_size <= 1: + return + # Update the global model using local models from all GPUs + # (Step-1) Calculate grad between previously synced model and + # currrent local model + if self.block_momentum != 0: + self._calc_grad() + + # (Step-2) Average gradient from all GPUs + self._avg_grad_from_all_gpus() + + # (Step-3) Calculate global momentum and update the global model + if self.block_momentum != 0: + self._update_global_model() + + # (Step-4) Average local optimizer params + if self.average_sync: + self.average_params() + + def _is_warmup_end(self): + # Check whether train iterations is equal to warmup iter + if self.get_num_updates() == self.warmup_iteration: + return True + return False + + def _is_bmuf_iter(self): + # Check whether train iterations is equal to bmuf sync iter + if (self.get_num_updates() > self.warmup_iteration) and ( + self.get_num_updates() % self.sync_iter == 0 + ): + return True + return False + + def _warmup_sync(self, root_rank=0): + if self.world_size <= 1: + return + # Broadcast the local model to all gpus + for param in self.params: + dist.broadcast(param.data, src=root_rank) + + # Update local optimizer state + if self.average_sync: + self._optimizer.average_params() + else: + self._optimizer.load_state_dict(self.initial_state) + + self._reset_local_data() + + def step(self, closure=None): + """Performs a single optimization 
step.""" + self._optimizer.step(closure) + self.set_num_updates(self.get_num_updates() + 1) + if self._is_warmup_end(): + self._warmup_sync() + elif self._is_bmuf_iter(): + self._block_sync() + + def zero_grad(self): + """Clears the gradients of all optimized parameters.""" + self._optimizer.zero_grad() + + def get_num_updates(self): + """Get the number of parameters updates.""" + return self._num_updates + + def set_num_updates(self, num_updates): + """Set the number of parameters updates.""" + self._num_updates = num_updates + + @torch.no_grad() + def _reset_local_data(self): + # (Step-0) Initialize global momentum parameters and store global copy on each gpu + self.global_params = [torch.zeros_like(p.data) for p in self.params] + self.smoothed_grads = [p.data.new_zeros(p.data.size()) for p in self.params] + self.grads = [p.data.new_zeros(p.data.size()) for p in self.params] + + # saving the global model locally for calculating gradient during bmuf sync + for param, global_param in zip(self.params, self.global_params): + global_param.copy_(param.data) + + @torch.no_grad() + def _calc_grad(self): + # global_params is basically the global copy from the previously finished + # synchronisation. param.data is local parameter after block_sync_freq + # for the local gpu. so grad is difference between previously synced + # model and currrent local model. + for index, (param, global_param) in enumerate( + zip(self.params, self.global_params) + ): + self.grads[index] = global_param - param.data + + def _avg_grad_from_all_gpus(self): + for index, param in enumerate(self.params): + sync_para = param.data if self.block_momentum == 0 else self.grads[index] + sync_para /= float(dist.get_world_size()) + dist.all_reduce(sync_para, op=dist.ReduceOp.SUM) + + @torch.no_grad() + def _update_global_model(self): + for index, (param, global_param, smoothed_grad, grad) in enumerate( + zip( + self.params, + self.global_params, + self.smoothed_grads, + # all gpus would share the same value of smoothed_grad, since it is + # always computed on synchronized gradients. + self.grads, + ) + ): + # global_param is basically last syncrhornized parameter. though + # smoothed_grad is local, all processes will have same value of + # smoothed_grad and hence param is globally synchronized copy. + # smoothed_grad(t) = BM * smoothed_grad(t-1) + BM_lr * grad(t) + smoothed_grad = self.block_momentum * smoothed_grad + self.block_lr * grad + param.data.copy_(global_param - smoothed_grad) + + # A Nesterov momentum here is to do a partial weight update before + # calculating the gradient + if self.use_nbm: + param.data.copy_(param.data - self.block_momentum * smoothed_grad) + + # backup for the next synchronization. + self.smoothed_grads[index] = smoothed_grad + global_param.copy_(param.data) diff --git a/SpeechT5/fairseq/fairseq/optim/composite.py b/SpeechT5/fairseq/fairseq/optim/composite.py new file mode 100644 index 0000000000000000000000000000000000000000..a5366d62434a4400ba9cc524f4286f99f733d121 --- /dev/null +++ b/SpeechT5/fairseq/fairseq/optim/composite.py @@ -0,0 +1,188 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. 
+ +import logging +from collections import defaultdict +from dataclasses import dataclass, field +from typing import Dict, Any, List, Optional + +import torch.optim +from fairseq.dataclass import FairseqDataclass +from fairseq.optim import FairseqOptimizer, register_optimizer, _build_optimizer +from fairseq.optim.lr_scheduler import FairseqLRScheduler, build_lr_scheduler +from omegaconf import II, open_dict + + +logger = logging.getLogger(__name__) + + +@dataclass +class OptimizerAndSchedulerConfig(FairseqDataclass): + optimizer: Any = None + lr_scheduler: Optional[Any] = None + lr: List = II("optimization.lr") + lr_float: Optional[float] = None # this makes it easier to sweep on learning rate with auto sweepers + + +@dataclass +class CompositeOptimizerConfig(FairseqDataclass): + groups: Dict[str, Any] = field( + default_factory=lambda: {}, + metadata={ + "help": "optimizer name -> optimizer OptimizerAndSchedulerConfig. " + "Configures a different optimizer and (optionally) lr scheduler for each parameter group" + }, + ) + + +@register_optimizer("composite", dataclass=CompositeOptimizerConfig) +class FairseqCompositeOptimizer(FairseqOptimizer): + + optimizers: Dict[str, FairseqOptimizer] = {} + lr_schedulers: Dict[str, FairseqLRScheduler] = {} + lr_scheduler: FairseqLRScheduler = None + _optimizer: torch.optim.Optimizer + + def __init__(self, cfg: CompositeOptimizerConfig, params): + super().__init__(cfg) + + assert ( + len(params) > 1 + ), "Composite optimizer only works when there are multiple parameter groups (try fp16_no_flatten_grads: true)" + + groupped_params = defaultdict(list) + for p in params: + group = getattr(p, "param_group", "default") + groupped_params[group].append(p) + + assert groupped_params.keys() == cfg.groups.keys(), ( + f"Parameter groups {groupped_params.keys()} and optimizer groups {cfg.groups.keys()} are not the same! " + "Try setting 'param_group' on your parameters in the model." + ) + + for group, group_params in groupped_params.items(): + group_cfg = cfg.groups[group] + with open_dict(group_cfg): + if group_cfg.lr_float is not None: + group_cfg.optimizer.lr = [group_cfg.lr_float] + group_cfg.lr_scheduler.lr = [group_cfg.lr_float] + else: + group_cfg.optimizer.lr = group_cfg.lr + group_cfg.lr_scheduler.lr = group_cfg.lr + self.optimizers[group] = _build_optimizer(group_cfg.optimizer, group_params) + if group_cfg.lr_scheduler is not None: + self.lr_schedulers[group] = build_lr_scheduler( + group_cfg.lr_scheduler, self.optimizers[group] + ) + + if len(self.lr_schedulers) > 0: + assert len(self.lr_schedulers) == len(self.optimizers), ( + f"Please provide an lr scheduler for each optimizer to use pass_through scheduler. 
" + f"Optimizers: {self.optimizers}; Lr scheds: {self.lr_schedulers}" + ) + self.lr_scheduler = CompositeLRScheduler(self.lr_schedulers) + + self._optimizer = CompositeOptimizer(self.optimizers) + + @property + def supports_groups(self): + return True + + @property + def param_groups(self): + for opt in self.optimizers.values(): + for group in opt.param_groups: + yield group + + def get_lr(self): + """Return the current learning rate.""" + k = ( + "default" + if "default" in self.optimizers + else next(iter(self.optimizers.keys())) + ) + return self.optimizers[k].param_groups[0]["lr"] + + def state_dict(self): + """Return the LR scheduler state dict.""" + return {k: s.state_dict() for k, s in self.optimizers.items()} + + def load_state_dict(self, state_dict, optimizer_overrides=None): + """Load an LR scheduler state dict.""" + for k, state in state_dict.items(): + if k not in self.optimizers: + # skip extra keys like "loss_scale" added by fp16 optimizer + continue + + overrides = ( + optimizer_overrides[k] + if isinstance(optimizer_overrides, dict) and k in optimizer_overrides + else None + ) + self.optimizers[k].load_state_dict(state, optimizer_overrides=overrides) + + +class CompositeOptimizer(torch.optim.Optimizer): + def __init__(self, optimizers: Dict[str, FairseqOptimizer]): + self.optimizers = optimizers + + @property + def supports_memory_efficient_fp16(self): + return all(o.supports_memory_efficient_fp16 for o in self.optimizers.values()) + + @property + def supports_flat_params(self): + return all(o.supports_flat_params for o in self.optimizers.values()) + + def step(self, closure=None, groups=None): + """Performs a single optimization step. + + Args: + closure (callable, optional): A closure that reevaluates the model + and returns the loss. + """ + loss = None + if closure is not None: + loss = closure() + + for k, opt in self.optimizers.items(): + if groups is None or k in groups: + opt.step() + + return loss + + def zero_grad(self): + for opt in self.optimizers.values(): + opt.zero_grad() + + +class CompositeLRScheduler(FairseqLRScheduler): + def __init__(self, lr_schedulers): + super().__init__(None, None) + + self.lr_schedulers = lr_schedulers + + def state_dict(self): + """Return the LR scheduler state dict.""" + return {k: s.state_dict() for k, s in self.lr_schedulers.items()} + + def load_state_dict(self, state_dict): + """Load an LR scheduler state dict.""" + for k, state in state_dict.items(): + self.lr_schedulers[k].load_state_dict(state) + + def step_begin_epoch(self, epoch): + """Update the learning rate at the beginning of the given epoch.""" + for s in self.lr_schedulers.values(): + s.step_begin_epoch(epoch) + + def step(self, epoch, val_loss=None): + """Update the learning rate at the end of the given epoch.""" + for s in self.lr_schedulers.values(): + s.step(epoch) + + def step_update(self, num_updates): + """Update the learning rate after each update.""" + return {k: s.step_update(num_updates) for k, s in self.lr_schedulers.items()} diff --git a/SpeechT5/fairseq/fairseq/optim/cpu_adam.py b/SpeechT5/fairseq/fairseq/optim/cpu_adam.py new file mode 100644 index 0000000000000000000000000000000000000000..e36bccf123020d1d90acafdf6a641be1fd926b8b --- /dev/null +++ b/SpeechT5/fairseq/fairseq/optim/cpu_adam.py @@ -0,0 +1,200 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. 
+ +import importlib +from collections.abc import Collection +from dataclasses import dataclass, field +from typing import List + +import torch +from fairseq.dataclass import FairseqDataclass +from fairseq.optim import FairseqOptimizer, register_optimizer +from omegaconf import II, DictConfig + + +try: + import deepspeed + has_deepspeed = True +except ImportError as e: + has_deepspeed = False + + +def _get_cpu_adam(): + try: + from deepspeed.ops.op_builder import CPUAdamBuilder + return CPUAdamBuilder().load() + except ImportError: + # fbcode + from deepspeed.ops.adam import DeepSpeedCPUAdam as ds_opt_adam + return ds_opt_adam + +@dataclass +class FairseqCPUAdamConfig(FairseqDataclass): + adam_betas: str = field( + default="(0.9, 0.999)", metadata={"help": "betas for Adam optimizer"} + ) + adam_eps: float = field( + default=1e-8, metadata={"help": "epsilon for Adam optimizer"} + ) + weight_decay: float = field(default=0.0, metadata={"help": "weight decay"}) + fp16_adam_stats: bool = field( + default=False, metadata={"help": "use FP16 stats (with automatic scaling)"} + ) + # TODO common vars below in parent + lr: List[float] = II("optimization.lr") + + +@register_optimizer("cpu_adam", dataclass=FairseqCPUAdamConfig) +class FairseqCPUAdam(FairseqOptimizer): + """Adam optimizer for fairseq, optimized for CPU tensors. + + Important note: this optimizer corresponds to the "AdamW" variant of + Adam in its weight decay behavior. As such, it is most closely + analogous to torch.optim.AdamW from PyTorch. + """ + + def __init__(self, cfg: DictConfig, params): + super().__init__(cfg) + self._optimizer = CPUAdam(params, **self.optimizer_config) + + @property + def optimizer_config(self): + """ + Return a kwarg dictionary that will be used to override optimizer + args stored in checkpoints. This allows us to load a checkpoint and + resume training using a different set of optimizer args, e.g., with a + different learning rate. 
+ """ + return { + "lr": self.cfg.lr[0] + if isinstance(self.cfg.lr, Collection) + else self.cfg.lr, + "betas": eval(self.cfg.adam_betas), + "eps": self.cfg.adam_eps, + "weight_decay": self.cfg.weight_decay, + "use_fp16_stats": self.cfg.fp16_adam_stats, + } + + +class CPUAdam(torch.optim.Optimizer): + + optimizer_id = 0 + + def __init__( + self, + params, + lr=1e-3, + bias_correction=True, + betas=(0.9, 0.999), + eps=1e-8, + weight_decay=0, + use_fp16_stats=False, + ): + defaults = { + "lr": lr, + "bias_correction": bias_correction, + "betas": betas, + "eps": eps, + "weight_decay": weight_decay, + } + super().__init__(params, defaults) + + self.use_fp16_stats = use_fp16_stats + self.FLOAT16_MAX = 65504.0 + + if not has_deepspeed: + raise ImportError("Please install DeepSpeed: pip install deepspeed") + + self.opt_id = CPUAdam.optimizer_id + CPUAdam.optimizer_id = CPUAdam.optimizer_id + 1 + + self.ds_opt_adam = _get_cpu_adam() + adamw_mode = True + self.ds_opt_adam.create_adam( + self.opt_id, lr, betas[0], betas[1], eps, weight_decay, adamw_mode + ) + + @property + def supports_flat_params(self): + return True + + @torch.no_grad() + def step(self, closure=None): + loss = None + if closure is not None: + with torch.enable_grad(): + loss = closure() + + for group_id, group in enumerate(self.param_groups): + for param_id, p in enumerate(group["params"]): + if p.grad is None: + continue + + state = self.state[p] + if len(state) == 0: + state["step"] = 0 + dtype = torch.float16 if self.use_fp16_stats else p.data.dtype + # gradient momentums + state["exp_avg"] = torch.zeros_like( + p.data, dtype=dtype, device="cpu" + ) + # gradient variances + state["exp_avg_sq"] = torch.zeros_like( + p.data, dtype=dtype, device="cpu" + ) + if self.use_fp16_stats: + assert torch.is_floating_point(p.data) + state["exp_avg_scale"] = 1.0 + state["exp_avg_sq_scale"] = 1.0 + + exp_avg, exp_avg_sq = state["exp_avg"], state["exp_avg_sq"] + + p_data_bak = p.data # backup of the original data pointer + + p.data = p.data.to(dtype=torch.float32, device="cpu") + p.grad.data = p.grad.data.to(dtype=torch.float32, device="cpu") + + if self.use_fp16_stats: + exp_avg = exp_avg.float() * state["exp_avg_scale"] + exp_avg_sq = exp_avg_sq.float() * state["exp_avg_sq_scale"] + + state["step"] += 1 + beta1, beta2 = group["betas"] + + self.ds_opt_adam.adam_update( + self.opt_id, + state["step"], + group["lr"], + beta1, + beta2, + group["eps"], + group["weight_decay"], + group["bias_correction"], + p.data, + p.grad.data, + exp_avg, + exp_avg_sq, + ) + + if p_data_bak.data_ptr() != p.data.data_ptr(): + p_data_bak.copy_(p.data) + p.data = p_data_bak + + if self.use_fp16_stats: + + def inf_norm(t): + return torch.norm(t, float("inf")) + + # from github.com/openai/jukebox/blob/master/jukebox/utils/fp16.py + state["exp_avg_scale"], state["exp_avg_sq_scale"] = ( + 1e-8 + inf_norm(exp_avg) / self.FLOAT16_MAX, + 1e-8 + inf_norm(exp_avg_sq) / self.FLOAT16_MAX, + ) + state["exp_avg"], state["exp_avg_sq"] = ( + (exp_avg / state["exp_avg_scale"]).half(), + (exp_avg_sq / state["exp_avg_sq_scale"]).half(), + ) + + return loss diff --git a/SpeechT5/fairseq/fairseq/optim/dynamic_loss_scaler.py b/SpeechT5/fairseq/fairseq/optim/dynamic_loss_scaler.py new file mode 100644 index 0000000000000000000000000000000000000000..43f9be37b9067c520cd794b9a941c57adae25e97 --- /dev/null +++ b/SpeechT5/fairseq/fairseq/optim/dynamic_loss_scaler.py @@ -0,0 +1,70 @@ +# Copyright (c) Facebook, Inc. and its affiliates. 
+# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + + +class DynamicLossScaler(object): + def __init__( + self, + init_scale=2.0 ** 15, + scale_factor=2.0, + scale_window=2000, + tolerance=0.0, + threshold=None, + min_loss_scale=1e-4, + ): + self.loss_scale = init_scale + self.scale_factor = scale_factor + self.scale_window = scale_window + self.tolerance = tolerance + self.threshold = threshold + self._iter = 0 + self._last_overflow_iter = -1 + self._last_rescale_iter = -1 + self._overflows_since_rescale = 0 + self.min_loss_scale = min_loss_scale + + def scale(self, outputs): + return self.loss_scale * outputs + + def update(self): + if (self._iter - self._last_overflow_iter) % self.scale_window == 0: + self.loss_scale *= self.scale_factor + self._last_rescale_iter = self._iter + self._iter += 1 + + def _decrease_loss_scale(self): + self.loss_scale /= self.scale_factor + if self.threshold is not None: + self.loss_scale = max(self.loss_scale, self.threshold) + + def check_overflow(self, grad_norm): + # detect inf and nan + if grad_norm == float("inf") or grad_norm != grad_norm: + # overflow has occured + prev_scale = self.loss_scale + iter_since_rescale = self._iter - self._last_rescale_iter + + self._last_overflow_iter = self._iter + self._overflows_since_rescale += 1 + pct_overflow = self._overflows_since_rescale / float(iter_since_rescale) + if pct_overflow >= self.tolerance: + self._decrease_loss_scale() + self._last_rescale_iter = self._iter + self._overflows_since_rescale = 0 + + if self.loss_scale <= self.min_loss_scale: + # Use FloatingPointError as an uncommon error that parent + # functions can safely catch to stop training. + self.loss_scale = prev_scale + raise FloatingPointError( + ( + "Minimum loss scale reached ({}). Your loss is probably exploding. " + "Try lowering the learning rate, using gradient clipping or " + "increasing the batch size." + ).format(self.min_loss_scale) + ) + + self._iter += 1 + raise OverflowError("setting loss scale to: " + str(self.loss_scale)) diff --git a/SpeechT5/fairseq/fairseq/optim/fairseq_optimizer.py b/SpeechT5/fairseq/fairseq/optim/fairseq_optimizer.py new file mode 100644 index 0000000000000000000000000000000000000000..7e5411753a2ba94f3a7a68316131530b8b17d22a --- /dev/null +++ b/SpeechT5/fairseq/fairseq/optim/fairseq_optimizer.py @@ -0,0 +1,179 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. 
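The `DynamicLossScaler` added above can be exercised on its own to see the scale grow every `scale_window` clean steps and shrink on overflow. A minimal sketch, assuming this fairseq checkout is importable; the constants and the simulated overflow step are illustrative:

```python
# Illustration only: watch the loss scale react to a simulated overflow.
from fairseq.optim.dynamic_loss_scaler import DynamicLossScaler

scaler = DynamicLossScaler(init_scale=2.0 ** 7, scale_window=4)
for step in range(10):
    grad_norm = float("inf") if step == 5 else 1.0   # pretend step 5 overflowed
    try:
        scaler.check_overflow(grad_norm)             # raises OverflowError on inf/NaN
    except OverflowError:
        print(step, "overflow -> scale lowered to", scaler.loss_scale)
    else:
        scaler.update()                              # grows the scale every `scale_window` clean steps
        print(step, "scale", scaler.loss_scale)
```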
+ +import torch +from fairseq import utils +from fairseq.dataclass.utils import gen_parser_from_dataclass + + +class FairseqOptimizer(object): + def __init__(self, cfg): + super().__init__() + self.cfg = cfg + + @classmethod + def add_args(cls, parser): + """Add optimizer-specific arguments to the parser.""" + dc = getattr(cls, "__dataclass", None) + if dc is not None: + gen_parser_from_dataclass(parser, dc()) + + @property + def optimizer(self): + """Return a torch.optim.optimizer.Optimizer instance.""" + if not hasattr(self, "_optimizer"): + raise NotImplementedError + if not isinstance(self._optimizer, torch.optim.Optimizer): + raise ValueError("_optimizer must be an instance of torch.optim.Optimizer") + return self._optimizer + + @optimizer.setter + def optimizer(self, optimizer): + """Reset optimizer instance.""" + if not hasattr(self, "_optimizer"): + raise NotImplementedError + if not isinstance(self._optimizer, torch.optim.Optimizer): + raise ValueError("_optimizer must be an instance of torch.optim.Optimizer") + self._optimizer = optimizer + + @property + def optimizer_config(self): + """ + Return a kwarg dictionary that will be used to override optimizer + args stored in checkpoints. This allows us to load a checkpoint and + resume training using a different set of optimizer args, e.g., with a + different learning rate. + """ + raise NotImplementedError + + @property + def params(self): + """Return an iterable of the parameters held by the optimizer.""" + for param_group in self.param_groups: + for p in param_group["params"]: + yield p + + @property + def param_groups(self): + return self.optimizer.param_groups + + def __getstate__(self): + return self._optimizer.__getstate__() + + def get_lr(self): + """Return the current learning rate.""" + return self.param_groups[0]["lr"] + + def set_lr(self, lr): + """Set the learning rate.""" + for param_group in self.param_groups: + param_group["lr"] = lr + + def state_dict(self): + """Return the optimizer's state dict.""" + return self.optimizer.state_dict() + + def load_state_dict(self, state_dict, optimizer_overrides=None): + """Load an optimizer state dict. + + In general we should prefer the configuration of the existing optimizer + instance (e.g., learning rate) over that found in the state_dict. This + allows us to resume training from a checkpoint using a new set of + optimizer args. + """ + self.optimizer.load_state_dict(state_dict) + + if optimizer_overrides is not None and len(optimizer_overrides) > 0: + # override learning rate, momentum, etc. with latest values + for group in self.param_groups: + group.update(optimizer_overrides) + + def backward(self, loss): + """Computes the sum of gradients of the given tensor w.r.t. 
graph leaves.""" + loss.backward() + + def all_reduce_grads(self, module): + """Manually all-reduce gradients (if required).""" + if hasattr(module, "all_reduce_grads"): + module.all_reduce_grads() + + def multiply_grads(self, c): + """Multiplies grads by a constant *c*.""" + for p in self.params: + if p.grad is not None: + if torch.is_tensor(c): + c = c.to(p.grad.device) + p.grad.data.mul_(c) + + def clip_grad_norm(self, max_norm, aggregate_norm_fn=None): + """Clips gradient norm.""" + return utils.clip_grad_norm_(self.params, max_norm, aggregate_norm_fn) + + def step(self, closure=None, scale=1.0, groups=None): + """Performs a single optimization step.""" + if self.supports_step_with_scale: + if self.supports_groups: + self.optimizer.step(closure, scale=scale, groups=groups) + else: + self.optimizer.step(closure, scale=scale) + else: + if scale != 1.0: + self.multiply_grads(1.0 / scale) + if self.supports_groups: + self.optimizer.step(closure, groups=groups) + else: + self.optimizer.step(closure) + + def zero_grad(self): + """Clears the gradients of all optimized parameters.""" + for p in self.params: + p.grad = None + self.optimizer.zero_grad() + + @property + def supports_memory_efficient_fp16(self): + if hasattr(self.optimizer, "supports_memory_efficient_fp16"): + return self.optimizer.supports_memory_efficient_fp16 + return False + + @property + def supports_step_with_scale(self): + if hasattr(self.optimizer, "supports_step_with_scale"): + return self.optimizer.supports_step_with_scale + return False + + @property + def supports_groups(self): + if hasattr(self.optimizer, "supports_groups"): + return self.optimizer.supports_groups + return False + + @property + def supports_flat_params(self): + """ + Whether the optimizer supports collapsing of the model + parameters/gradients into a single contiguous Tensor. + """ + if hasattr(self.optimizer, "supports_flat_params"): + return self.optimizer.supports_flat_params + return False + + def average_params(self): + pass + + def broadcast_global_state_dict(self, state_dict): + """ + Broadcasts a global state dict to all ranks. + Useful for optimizers that shard state between ranks. + """ + if hasattr(self.optimizer, "broadcast_global_state_dict"): + return self.optimizer.broadcast_global_state_dict(state_dict) + else: + return state_dict + + +class LegacyFairseqOptimizer(FairseqOptimizer): + def __init__(self, args): + self.args = args diff --git a/SpeechT5/fairseq/fairseq/optim/fp16_optimizer.py b/SpeechT5/fairseq/fairseq/optim/fp16_optimizer.py new file mode 100644 index 0000000000000000000000000000000000000000..370a910102a43f34d0101717e4bd71f729f6e238 --- /dev/null +++ b/SpeechT5/fairseq/fairseq/optim/fp16_optimizer.py @@ -0,0 +1,546 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. 
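The FP16 wrappers added in this file implement the usual scale-the-loss, backward, unscale, clip, step pattern. A pure-PyTorch sketch of that pattern (not the fairseq API itself; the model and the fixed scale value are illustrative — fairseq grows and shrinks the scale dynamically via `DynamicLossScaler`):

```python
# Illustration only: the scale -> backward -> unscale -> clip -> step pattern.
import torch

model = torch.nn.Linear(8, 2)
opt = torch.optim.SGD(model.parameters(), lr=0.1)
loss_scale = 2.0 ** 7                        # illustrative constant

loss = model(torch.randn(4, 8)).pow(2).mean()
(loss * loss_scale).backward()               # scaled backward (cf. scaler.scale(loss))
for p in model.parameters():
    p.grad.mul_(1.0 / loss_scale)            # unscale (cf. multiply_grads(1 / scale))
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
opt.step()
opt.zero_grad()
```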
+ +from collections import defaultdict +from itertools import chain + +import torch +from fairseq import optim +from omegaconf import DictConfig + +from .dynamic_loss_scaler import DynamicLossScaler + + +class _FP16OptimizerMixin(object): + def __init__(self, *args, **kwargs): + # forward __init__ call to the next class in mro(method resolution order) + super().__init__(*args, **kwargs) + self._multiply_factor = 1.0 + + @property + def has_flat_params(self): + return torch.is_tensor(self.fp32_params) or ( + isinstance(self.fp32_params, dict) + and all(torch.is_tensor(t) for t in self.fp32_params.values()) + ) + + @classmethod + def build_fp32_params(cls, args, params, flatten=True): + # create FP32 copy of parameters and grads + if flatten: + is_pipeline_parallel = getattr( + args, "pipeline_model_parallel", False + ) and getattr(args, "distributed_no_spawn", False) + total_param_size = sum(p.data.numel() for p in params) + devices = [torch.cuda.current_device()] + if is_pipeline_parallel: + devices = list(set(args.pipeline_devices)) + fp32_params = {} + for device in devices: + if is_pipeline_parallel: + device_param_size = sum( + p.data.numel() for p in params if p.device.index == device + ) + device_params = [p for p in params if p.device.index == device] + else: + device_param_size = total_param_size + device_params = params + fp32_params[device] = ( + device_params[0].new(0).float().new(device_param_size) + ) + offset = 0 + for p in device_params: + numel = p.data.numel() + fp32_params[device][offset : offset + numel].copy_(p.data.view(-1)) + offset += numel + fp32_params[device] = torch.nn.Parameter(fp32_params[device]) + fp32_params[device].grad = fp32_params[device].data.new( + device_param_size + ) + return fp32_params + else: + fp32_params = [] + for p in params: + p32 = torch.nn.Parameter(p.data.float()) + if hasattr(p, 'expert'): + p32.expert = True + p32.grad = torch.zeros_like(p32.data) + if hasattr(p, "param_group"): + p32.param_group = p.param_group + fp32_params.append(p32) + return fp32_params + + def state_dict(self): + """Return the optimizer's state dict.""" + state_dict = self.fp32_optimizer.state_dict() + if self.scaler is not None: + state_dict["loss_scale"] = self.scaler.loss_scale + return state_dict + + def load_state_dict(self, state_dict, optimizer_overrides=None): + """Load an optimizer state dict. + + In general we should prefer the configuration of the existing optimizer + instance (e.g., learning rate) over that found in the state_dict. This + allows us to resume training from a checkpoint using a new set of + optimizer args. + """ + if "loss_scale" in state_dict and self.scaler is not None: + self.scaler.loss_scale = state_dict["loss_scale"] + self.fp32_optimizer.load_state_dict(state_dict, optimizer_overrides) + + def backward(self, loss): + """Computes the sum of gradients of the given tensor w.r.t. graph leaves. + + Compared to :func:`fairseq.optim.FairseqOptimizer.backward`, this + function additionally dynamically scales the loss to avoid gradient + underflow. 
+ """ + if self.scaler is not None: + loss = self.scaler.scale(loss) + loss.backward() + self._needs_sync = True + + def _sync_fp16_grads_to_fp32(self): + if self._needs_sync: + # copy FP16 grads to FP32 + if self.has_flat_params: + devices = list(self.fp32_params.keys()) + device_params_dict = defaultdict(list) + for p in self.fp16_params: + if p.requires_grad: + device_params_dict[p.device.index].append(p) + for device in devices: + device_params = device_params_dict[device] + offset = 0 + for p in device_params: + grad_data = ( + p.grad.data + if p.grad is not None + else p.data.new_zeros(p.data.shape) + ) + numel = grad_data.numel() + self.fp32_params[device].grad.data[ + offset : offset + numel + ].copy_(grad_data.view(-1)) + offset += numel + else: + for p, p32 in zip(self.fp16_params, self.fp32_params): + if not p.requires_grad: + continue + if p.grad is not None: + if p32.grad is None: + p32.grad = p.grad.data.float() + else: + p32.grad.data.copy_(p.grad.data) + else: + p32.grad = torch.zeros_like(p.data, dtype=torch.float) + + self._needs_sync = False + + def _sync_fp32_params_to_fp16(self): + # copy FP32 params back into FP16 model + if self.has_flat_params: + devices = list(self.fp32_params.keys()) + device_params_dict = defaultdict(list) + for p in self.fp16_params: + device_params_dict[p.device.index].append(p) + for device in devices: + device_params = device_params_dict[device] + offset = 0 + for p in device_params: + numel = p.data.numel() + p.data.copy_( + self.fp32_params[device] + .data[offset : offset + numel] + .view_as(p.data) + ) + offset += numel + else: + for p, p32 in zip(self.fp16_params, self.fp32_params): + if not p.requires_grad: + continue + p.data.copy_(p32.data) + + def _unscale_grads(self): + self._sync_fp16_grads_to_fp32() + if ( + # Skip the multiplication if it's a no-op (i.e., if _multiply_factor + # is 1.0). At the same time, we want to avoid the device-to-host + # transfer by comparing it to 1.0. Since _multiply_factor starts as + # a Python float, we roughly assume that if it's a tensor then it's + # probably not =1.0 anymore and we do the multiplication. Otherwise + # we can safely check the value without a D2H transfer. 
+ torch.is_tensor(self._multiply_factor) + or self._multiply_factor != 1.0 + ): + self.fp32_optimizer.multiply_grads(self._multiply_factor) + self._multiply_factor = 1.0 + + def multiply_grads(self, c): + """Multiplies grads by a constant ``c``.""" + self._multiply_factor *= c + + def clip_grad_norm(self, max_norm, aggregate_norm_fn=None): + """Clips gradient norm and updates dynamic loss scaler.""" + self._sync_fp16_grads_to_fp32() + + grad_norm = self._multiply_factor * self.fp32_optimizer.clip_grad_norm( + 0, aggregate_norm_fn + ) + + if self.scaler is not None: + if grad_norm > max_norm > 0.0: + self._multiply_factor *= max_norm / grad_norm + + self.scaler.check_overflow(grad_norm) + elif max_norm > 0.0: + clip_coef = (max_norm / (grad_norm + 1e-6)).clamp_(max=1) + self._multiply_factor *= clip_coef + + return grad_norm + + def step(self, closure=None, groups=None): + """Performs a single optimization step.""" + self._sync_fp16_grads_to_fp32() + + if getattr(self, "supports_step_with_scale", False): + self.fp32_optimizer.step(closure, scale=(1.0 / self._multiply_factor), groups=groups) + else: + self._unscale_grads() + self.fp32_optimizer.step(closure, groups=groups) + + if self.scaler is not None: + self.scaler.update() + + self._sync_fp32_params_to_fp16() + + def zero_grad(self): + """Clears the gradients of all optimized parameters.""" + for p in self.fp16_params: + p.grad = None + if self.has_flat_params: + if torch.is_tensor(self.fp32_params): + self.fp32_params.grad.zero_() + elif isinstance(self.fp32_params, dict): + for fp32_params in self.fp32_params.values(): + fp32_params.grad.zero_() + else: + raise RuntimeError("self.fp32_params must be a tensor or dict") + else: + for p32 in self.fp32_params: + if p32.grad is not None: + p32.grad.zero_() + self._needs_sync = False + + if self.scaler is not None: + self._multiply_factor = 1.0 / float(self.scaler.loss_scale) + + +class FP16Optimizer(_FP16OptimizerMixin, optim.FairseqOptimizer): + """ + Wrap an *optimizer* to support FP16 (mixed precision) training. 
+ """ + + def __init__(self, cfg: DictConfig, params, fp32_optimizer, fp32_params, **kwargs): + super().__init__(cfg.optimizer) + self.fp16_params = params + self.fp32_optimizer = fp32_optimizer + self.fp32_params = fp32_params + + if getattr(cfg.common, "fp16_scale_window", None) is None: + if len(cfg.optimization.update_freq) > 1: + raise ValueError( + "--fp16-scale-window must be given explicitly when using a " + "custom --update-freq schedule" + ) + data_parallel_size = int( + cfg.distributed_training.distributed_world_size + / cfg.common.model_parallel_size + ) + scale_window = int( + 2 ** 14 / data_parallel_size / cfg.optimization.update_freq[0] + ) + else: + scale_window = cfg.common.fp16_scale_window + + if not getattr(cfg.common, "bf16", False): + self.scaler = DynamicLossScaler( + init_scale=cfg.common.fp16_init_scale, + scale_window=scale_window, + tolerance=cfg.common.fp16_scale_tolerance, + threshold=cfg.common.threshold_loss_scale, + min_loss_scale=cfg.common.min_loss_scale, + ) + else: + # disable loss scaling for bfloat16 + self.scaler = None + + @classmethod + def build_optimizer(cls, cfg: DictConfig, params, **kwargs): + """ + Args: + cfg (omegaconf.DictConfig): fairseq args + params (iterable): iterable of parameters to optimize + """ + flatten = not getattr(cfg.common, "fp16_no_flatten_grads", False) + if getattr(cfg.common, "bf16", False): + flatten = False # mixed precision is faster on TPUs without flat grads + fp32_params = cls.build_fp32_params(cfg.optimizer, params, flatten=flatten) + if flatten: + fp32_optimizer = optim.build_optimizer(cfg.optimizer, [fp32_params]) + else: + fp32_optimizer = optim.build_optimizer(cfg.optimizer, fp32_params) + if flatten and not fp32_optimizer.supports_flat_params: + raise RuntimeError( + f"chosen optimizer {fp32_optimizer.__class__.__name__} does not support flat params, please set --fp16-no-flatten-grads" + ) + return cls(cfg, params, fp32_optimizer, fp32_params, **kwargs) + + @property + def optimizer(self): + return self.fp32_optimizer.optimizer + + @optimizer.setter + def optimizer(self, optimizer): + self.fp32_optimizer.optimizer = optimizer + + @property + def lr_scheduler(self): + return getattr(self.fp32_optimizer, "lr_scheduler", None) + + @property + def optimizer_config(self): + return self.fp32_optimizer.optimizer_config + + def get_lr(self): + return self.fp32_optimizer.get_lr() + + def set_lr(self, lr): + self.fp32_optimizer.set_lr(lr) + + def all_reduce_grads(self, module): + self.fp32_optimizer.all_reduce_grads(module) + + @property + def supports_flat_params(self): + return self.fp32_optimizer.supports_flat_params + + +class _MemoryEfficientFP16OptimizerMixin(object): + def __init__(self, *args, **kwargs): + # forward __init__ call to the next class in MRO (method resolution order) + super().__init__(*args, **kwargs) + self._multiply_factor = 1.0 + + @property + def has_flat_params(self): + return False + + def state_dict(self): + """Return the optimizer's state dict.""" + state_dict = self.wrapped_optimizer.state_dict() + if self.scaler is not None: + state_dict["loss_scale"] = self.scaler.loss_scale + return state_dict + + def load_state_dict(self, state_dict, optimizer_overrides=None): + """Load an optimizer state dict. + + In general we should prefer the configuration of the existing optimizer + instance (e.g., learning rate) over that found in the state_dict. This + allows us to resume training from a checkpoint using a new set of + optimizer args. 
+ """ + if "loss_scale" in state_dict and self.scaler is not None: + self.scaler.loss_scale = state_dict["loss_scale"] + + self.wrapped_optimizer.load_state_dict(state_dict, optimizer_overrides) + + # Hack: PyTorch automatically casts the optimizer state to match the + # type of the current parameters. But with --memory-efficient-fp16 the + # params are FP16 while the optimizer state is FP32 and we don't want + # to cast. A workaround is to manually copy back the original state + # after the optimizer has been loaded. + if not getattr(self.optimizer, "disable_mem_eff_fp16_loading_hack", False): + groups = self.optimizer.param_groups + saved_groups = state_dict["param_groups"] + id_map = { + old_id: p + for old_id, p in zip( + chain(*(g["params"] for g in saved_groups)), + chain(*(g["params"] for g in groups)), + ) + } + for k, v in state_dict["state"].items(): + if k in id_map: + param = id_map[k] + self.optimizer.state[param] = v + + def backward(self, loss): + """Computes the sum of gradients of the given tensor w.r.t. graph leaves. + + Compared to :func:`fairseq.optim.FairseqOptimizer.backward`, this + function additionally dynamically scales the loss to avoid gradient + underflow. + """ + if self.scaler is not None: + loss = self.scaler.scale(loss) + loss.backward() + + def _unscale_grads(self): + if ( + # Skip the multiplication if it's a no-op (i.e., if _multiply_factor + # is 1.0). At the same time, we want to avoid the device-to-host + # transfer by comparing it to 1.0. Since _multiply_factor starts as + # a Python float, we roughly assume that if it's a tensor then it's + # probably not =1.0 anymore and we do the multiplication. Otherwise + # we can safely check the value without a D2H transfer. + torch.is_tensor(self._multiply_factor) + or self._multiply_factor != 1.0 + ): + self.wrapped_optimizer.multiply_grads(self._multiply_factor) + self._multiply_factor = 1.0 + + def multiply_grads(self, c): + """Multiplies grads by a constant *c*.""" + self._multiply_factor *= c + + def clip_grad_norm(self, max_norm, aggregate_norm_fn=None): + """Clips gradient norm and updates dynamic loss scaler.""" + max_norm = float(max_norm) + grad_norm = self._multiply_factor * self.wrapped_optimizer.clip_grad_norm( + 0, aggregate_norm_fn + ) + + if self.scaler is not None: + grad_norm_cpu = float(grad_norm) + if grad_norm_cpu > max_norm > 0.0: + self._multiply_factor *= max_norm / grad_norm_cpu + + # detect overflow and adjust loss scale + self.scaler.check_overflow(grad_norm_cpu) + elif max_norm > 0.0: + clip_coef = (max_norm / (grad_norm + 1e-6)).clamp_(max=1) + self._multiply_factor *= clip_coef + + return grad_norm + + def step(self, closure=None, groups=None): + """Performs a single optimization step.""" + if getattr(self, "supports_step_with_scale", False): + # NOTE(msb) optimizer divides by scale factor + self.wrapped_optimizer.step(closure, scale=(1.0 / self._multiply_factor), groups=groups) + else: + self._unscale_grads() + self.wrapped_optimizer.step(closure, groups=groups) + + if self.scaler is not None: + self.scaler.update() + + def zero_grad(self): + """Clears the gradients of all optimized parameters.""" + self.wrapped_optimizer.zero_grad() + if self.scaler is not None: + self._multiply_factor = 1.0 / float(self.scaler.loss_scale) + else: + self._multiply_factor = 1.0 + + @property + def supports_flat_params(self): + return self.wrapped_optimizer.supports_flat_params + + +class MemoryEfficientFP16Optimizer( + _MemoryEfficientFP16OptimizerMixin, optim.FairseqOptimizer +): + """ + Wrap 
an *optimizer* to support FP16 (mixed precision) training. + + Compared to :class:`fairseq.optim.FP16Optimizer`, this version does not + maintain an FP32 copy of the model. We instead expect the optimizer to + convert the gradients to FP32 internally and sync the results back to the + FP16 model params. This significantly reduces memory usage but slightly + increases the time spent in the optimizer. + + Since this wrapper depends on specific functionality in the wrapped + optimizer (i.e., on-the-fly conversion of grads to FP32), only certain + optimizers can be wrapped. This is determined by the + *supports_memory_efficient_fp16* property. + """ + + def __init__( + self, cfg: DictConfig, params, optimizer, allow_unsupported=False, **kwargs + ): + if not allow_unsupported and not optimizer.supports_memory_efficient_fp16: + raise ValueError( + "Unsupported optimizer: {}".format(optimizer.__class__.__name__) + ) + + super().__init__(cfg.optimizer) + self.wrapped_optimizer = optimizer + + if getattr(cfg.common, "fp16_scale_window", None) is None: + if len(cfg.optimization.update_freq) > 1: + raise ValueError( + "--fp16-scale-window must be given explicitly when using a " + "custom --update-freq schedule" + ) + data_parallel_size = int( + cfg.distributed_training.distributed_world_size + / cfg.common.model_parallel_size + ) + scale_window = int( + 2 ** 14 / data_parallel_size / cfg.optimization.update_freq[0] + ) + else: + scale_window = cfg.common.fp16_scale_window + + if not getattr(cfg.common, "bf16", False): + self.scaler = DynamicLossScaler( + init_scale=cfg.common.fp16_init_scale, + scale_window=scale_window, + tolerance=cfg.common.fp16_scale_tolerance, + threshold=cfg.common.threshold_loss_scale, + min_loss_scale=cfg.common.min_loss_scale, + ) + else: + # disable loss scaling for bfloat16 + self.scaler = None + + @classmethod + def build_optimizer(cls, cfg: DictConfig, params, **kwargs): + """ + Args: + args (argparse.Namespace): fairseq args + params (iterable): iterable of parameters to optimize + """ + fp16_optimizer = optim.build_optimizer(cfg.optimizer, params) + return cls(cfg, params, fp16_optimizer, **kwargs) + + @property + def optimizer(self): + return self.wrapped_optimizer.optimizer + + @optimizer.setter + def optimizer(self, optimizer): + self.wrapped_optimizer.optimizer = optimizer + + @property + def optimizer_config(self): + return self.wrapped_optimizer.optimizer_config + + @property + def lr_scheduler(self): + return getattr(self.wrapped_optimizer, "lr_scheduler", None) + + def get_lr(self): + return self.wrapped_optimizer.get_lr() + + def set_lr(self, lr): + self.wrapped_optimizer.set_lr(lr) + + def all_reduce_grads(self, module): + self.wrapped_optimizer.all_reduce_grads(module) diff --git a/SpeechT5/fairseq/fairseq/optim/fused_adam.py b/SpeechT5/fairseq/fairseq/optim/fused_adam.py new file mode 100644 index 0000000000000000000000000000000000000000..e2b8e1bcd12c3636ae83be1b21ded4fcd55ccc96 --- /dev/null +++ b/SpeechT5/fairseq/fairseq/optim/fused_adam.py @@ -0,0 +1,348 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +import types + +import torch + + +def get_fused_adam_class(): + """ + Look for the FusedAdam optimizer from apex. We first try to load the + "contrib" interface, which is a bit faster than the main interface, + but is technically deprecated. 
+ """ + try: + # The "deprecated" interface in recent versions of apex is a bit + # faster than the main interface, since we don't use the apex + # optimizer. This can be installed by passing the + # `--deprecated_fused_adam` option when building apex. + global fused_adam_cuda + import importlib + + fused_adam_cuda = importlib.import_module("fused_adam_cuda") + return FusedAdamV1 + except ImportError: + try: + # fallback to the newer interface + from apex.optimizers import FusedAdam as _FusedAdam # noqa + from apex.multi_tensor_apply import multi_tensor_applier + + if multi_tensor_applier.available: + return FusedAdamV2 + except ImportError: + pass + return None + + +class FusedAdamV1(torch.optim.Optimizer): + """ + Implements Adam algorithm. Currently GPU-only. Requires Apex to be installed via + ``python setup.py install --cuda_ext --cpp_ext``. + + It has been proposed in `Adam: A Method for Stochastic Optimization`_. + + Compared to the original version in Apex, the fairseq version casts grads + and params to FP32 internally to support ``--memory-efficient-fp16``. + + Args: + params (iterable): iterable of parameters to optimize or dicts defining + parameter groups. + lr (float, optional): learning rate. (default: 1e-3) + betas (Tuple[float, float], optional): coefficients used for computing + running averages of gradient and its square. (default: (0.9, 0.999)) + eps (float, optional): term added to the denominator to improve + numerical stability. (default: 1e-8) + weight_decay (float, optional): weight decay (L2 penalty) (default: 0) + amsgrad (boolean, optional): whether to use the AMSGrad variant of this + algorithm from the paper `On the Convergence of Adam and Beyond`_ + (default: False) NOT SUPPORTED in FusedAdam! + eps_inside_sqrt (boolean, optional): in the 'update parameters' step, + adds eps to the bias-corrected second moment estimate before + evaluating square root instead of adding it to the square root of + second moment estimate as in the original paper. (default: False) + .. _Adam: A Method for Stochastic Optimization: + https://arxiv.org/abs/1412.6980 + .. _On the Convergence of Adam and Beyond: + https://openreview.net/forum?id=ryQu7f-RZ + """ + + def __init__( + self, + params, + lr=1e-3, + bias_correction=True, + betas=(0.9, 0.999), + eps=1e-8, + eps_inside_sqrt=False, + weight_decay=0.0, + max_grad_norm=0.0, + amsgrad=False, + ): + global fused_adam_cuda + import importlib + + fused_adam_cuda = importlib.import_module("fused_adam_cuda") + + if amsgrad: + raise RuntimeError("FusedAdam does not support the AMSGrad variant.") + defaults = { + "lr": lr, + "bias_correction": bias_correction, + "betas": betas, + "eps": eps, + "weight_decay": weight_decay, + "max_grad_norm": max_grad_norm, + } + super().__init__(params, defaults) + self.eps_mode = 0 if eps_inside_sqrt else 1 + + @property + def supports_memory_efficient_fp16(self): + return True + + @property + def supports_flat_params(self): + return True + + @property + def supports_step_with_scale(self): + return True + + def step(self, closure=None, grads=None, scale=1.0, grad_norms=None): + """Performs a single optimization step. + Args: + closure (callable, optional): A closure that reevaluates the model + and returns the loss. + grads (list of tensors, optional): weight gradient to use for the + optimizer update. If gradients have type torch.half, parameters + are expected to be in type torch.float. 
(default: None) + output params (list of tensors, optional): A reduced precision copy + of the updated weights written out in addition to the regular + updated weights. Have to be of same type as gradients. (default: None) + scale (float, optional): factor to divide gradient tensor values + by before applying to weights. (default: 1) + """ + loss = None + if closure is not None: + loss = closure() + + if grads is None: + grads_group = [None] * len(self.param_groups) + # backward compatibility + # assuming a list/generator of parameter means single group + elif isinstance(grads, types.GeneratorType): + grads_group = [grads] + elif type(grads[0]) != list: + grads_group = [grads] + else: + grads_group = grads + + if grad_norms is None: + grad_norms = [None] * len(self.param_groups) + + for group, grads_this_group, grad_norm in zip( + self.param_groups, grads_group, grad_norms + ): + if grads_this_group is None: + grads_this_group = [None] * len(group["params"]) + + # compute combined scale factor for this group + combined_scale = scale + if group.get("max_grad_norm", 0) > 0: + # norm is in fact norm*scale + clip = ((grad_norm / scale) + 1e-6) / group["max_grad_norm"] + if clip > 1: + combined_scale = clip * scale + + bias_correction = 1 if group.get("bias_correction", 1) else 0 + + for p, grad in zip(group["params"], grads_this_group): + # note: p.grad should not ever be set for correct + # operation of mixed precision optimizer that sometimes + # sends None gradients + if p.grad is None and grad is None: + continue + if grad is None: + grad = p.grad.data + if grad.is_sparse: + raise RuntimeError( + "FusedAdam does not support sparse gradients, " + "please consider SparseAdam instead" + ) + + p_data_fp32 = p.data.float() + + state = self.state[p] + + # State initialization + if len(state) == 0: + state["step"] = 0 + # Exponential moving average of gradient values + state["exp_avg"] = torch.zeros_like(p_data_fp32) + # Exponential moving average of squared gradient values + state["exp_avg_sq"] = torch.zeros_like(p_data_fp32) + else: + state["exp_avg"] = state["exp_avg"].to(p_data_fp32) + state["exp_avg_sq"] = state["exp_avg_sq"].to(p_data_fp32) + + exp_avg = state["exp_avg"] + exp_avg_sq = state["exp_avg_sq"] + beta1, beta2 = group["betas"] + + state["step"] += 1 + + out_p = p.data + with torch.cuda.device(p.device): + fused_adam_cuda.adam( + p_data_fp32, + out_p, + exp_avg, + exp_avg_sq, + grad, + group["lr"], + beta1, + beta2, + group["eps"], + combined_scale, + state["step"], + self.eps_mode, + bias_correction, + group["weight_decay"], + ) + + return loss + + +try: + from apex.optimizers import FusedAdam + from apex.multi_tensor_apply import multi_tensor_applier + + class FusedAdamV2(FusedAdam): + """ + Compared to the original version in Apex, the fairseq version casts grads + and params to FP32 internally to support ``--memory-efficient-fp16``. + """ + + def __init__(self, *args, **kwargs): + super().__init__(*args, **kwargs) + if not hasattr(self, "multi_tensor_adam"): + raise Exception( + "Apex installation is outdated. Please install an updated version of apex." 
+ ) + + @property + def supports_memory_efficient_fp16(self): + return True + + @property + def supports_flat_params(self): + return True + + def step( + self, + closure=None, + grads=None, + output_params=None, + scale=None, + grad_norms=None, + ): + """Performs a single optimization step.""" + loss = None + if closure is not None: + loss = closure() + + for group in self.param_groups: + bias_correction = 1 if group["bias_correction"] else 0 + beta1, beta2 = group["betas"] + + # assume same step across group now to simplify things + # per parameter step can be easily support by making it tensor, or pass list into kernel + if "step" in group: + group["step"] += 1 + else: + group["step"] = 1 + + # create lists for multi-tensor apply + g_16, p_16, orig_p_16, m_16, v_16 = [], [], [], [], [] + g_32, p_32, m_32, v_32 = [], [], [], [] + + for p in group["params"]: + if p.grad is None: + continue + if p.grad.data.is_sparse: + raise RuntimeError( + "FusedAdam does not support sparse gradients, " + "please consider SparseAdam instead" + ) + + state = self.state[p] + # State initialization + if len(state) == 0: + # Exponential moving average of gradient values + state["exp_avg"] = torch.zeros_like(p.data, dtype=torch.float) + # Exponential moving average of squared gradient values + state["exp_avg_sq"] = torch.zeros_like( + p.data, dtype=torch.float + ) + else: + state["exp_avg"] = state["exp_avg"].to( + device=p.data.device, dtype=torch.float + ) + state["exp_avg_sq"] = state["exp_avg_sq"].to( + device=p.data.device, dtype=torch.float + ) + + if p.dtype == torch.float16: + g_16.append(p.grad.data.float()) + p_16.append(p.data.float()) + orig_p_16.append(p.data) + m_16.append(state["exp_avg"]) + v_16.append(state["exp_avg_sq"]) + elif p.dtype == torch.float32: + g_32.append(p.grad.data) + p_32.append(p.data) + m_32.append(state["exp_avg"]) + v_32.append(state["exp_avg_sq"]) + else: + raise RuntimeError("FusedAdam only support fp16 and fp32.") + + with torch.cuda.device(p.device): + if len(g_16) > 0: + multi_tensor_applier( + self.multi_tensor_adam, + self._dummy_overflow_buf, + [g_16, p_16, m_16, v_16], + group["lr"], + beta1, + beta2, + group["eps"], + group["step"], + self.adam_w_mode, + bias_correction, + group["weight_decay"], + ) + for orig_p, p in zip(orig_p_16, p_16): + orig_p.copy_(p.data) + if len(g_32) > 0: + multi_tensor_applier( + self.multi_tensor_adam, + self._dummy_overflow_buf, + [g_32, p_32, m_32, v_32], + group["lr"], + beta1, + beta2, + group["eps"], + group["step"], + self.adam_w_mode, + bias_correction, + group["weight_decay"], + ) + + return loss + + +except ImportError: + pass diff --git a/SpeechT5/fairseq/fairseq/optim/fused_lamb.py b/SpeechT5/fairseq/fairseq/optim/fused_lamb.py new file mode 100644 index 0000000000000000000000000000000000000000..f4f2bdb0c6c65f7758509b6d4d2f2c48cb6e8b4f --- /dev/null +++ b/SpeechT5/fairseq/fairseq/optim/fused_lamb.py @@ -0,0 +1,51 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. 
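`get_fused_adam_class()` from the `fused_adam.py` module above returns an apex-backed Adam class when one is importable, or `None` otherwise. A hedged sketch of using it with a plain `torch.optim.Adam` fallback (model and hyper-parameters are illustrative):

```python
# Illustration only: prefer apex's fused Adam when available, else fall back.
import torch
from fairseq.optim.fused_adam import get_fused_adam_class

model = torch.nn.Linear(16, 4)
if torch.cuda.is_available():
    model = model.cuda()                      # fused kernels are GPU-only

fused_adam_cls = get_fused_adam_class()       # None if apex is not installed
adam_cls = fused_adam_cls if fused_adam_cls is not None else torch.optim.Adam
opt = adam_cls(model.parameters(), lr=1e-3, betas=(0.9, 0.999), eps=1e-8)
```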
+ +from fairseq.optim import LegacyFairseqOptimizer, register_optimizer + + +@register_optimizer("lamb") +class FairseqLAMB(LegacyFairseqOptimizer): + """LAMB optimizer.""" + + def __init__(self, args, params): + super().__init__(args) + try: + from apex.optimizers import FusedLAMB + + self._optimizer = FusedLAMB(params, **self.optimizer_config) + except ImportError: + raise ImportError("Please install apex to use LAMB optimizer") + + @staticmethod + def add_args(parser): + """Add optimizer-specific arguments to the parser.""" + # fmt: off + parser.add_argument('--lamb-betas', default='(0.9, 0.999)', metavar='B', + help='betas for LAMB optimizer') + parser.add_argument('--lamb-eps', type=float, default=1e-8, metavar='D', + help='epsilon for LAMB optimizer') + parser.add_argument('--weight-decay', '--wd', default=0.0, type=float, metavar='WD', + help='weight decay') + # fmt: on + + @property + def optimizer_config(self): + """ + Return a kwarg dictionary that will be used to override optimizer + args stored in checkpoints. This allows us to load a checkpoint and + resume training using a different set of optimizer args, e.g., with a + different learning rate. + """ + return { + "lr": self.args.lr[0], + "betas": eval(self.args.lamb_betas), + "eps": self.args.lamb_eps, + "weight_decay": self.args.weight_decay, + } + + @property + def supports_flat_params(self): + return False diff --git a/SpeechT5/fairseq/fairseq/optim/lr_scheduler/__init__.py b/SpeechT5/fairseq/fairseq/optim/lr_scheduler/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..5b3dbc023aa4a6f7bfb8403b8204d71ca432f79c --- /dev/null +++ b/SpeechT5/fairseq/fairseq/optim/lr_scheduler/__init__.py @@ -0,0 +1,36 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. +"""isort:skip_file""" + +import importlib +import os + +from fairseq import registry +from fairseq.optim.lr_scheduler.fairseq_lr_scheduler import ( # noqa + FairseqLRScheduler, + LegacyFairseqLRScheduler, +) +from omegaconf import DictConfig + + +( + build_lr_scheduler_, + register_lr_scheduler, + LR_SCHEDULER_REGISTRY, + LR_SCHEDULER_DATACLASS_REGISTRY, +) = registry.setup_registry( + "--lr-scheduler", base_class=FairseqLRScheduler, default="fixed" +) + + +def build_lr_scheduler(cfg: DictConfig, optimizer): + return build_lr_scheduler_(cfg, optimizer) + + +# automatically import any Python files in the optim/lr_scheduler/ directory +for file in sorted(os.listdir(os.path.dirname(__file__))): + if file.endswith(".py") and not file.startswith("_"): + file_name = file[: file.find(".py")] + importlib.import_module("fairseq.optim.lr_scheduler." + file_name) diff --git a/SpeechT5/fairseq/fairseq/optim/lr_scheduler/cosine_lr_scheduler.py b/SpeechT5/fairseq/fairseq/optim/lr_scheduler/cosine_lr_scheduler.py new file mode 100644 index 0000000000000000000000000000000000000000..51f58359eda387d67748f48217906ac6d16ccd08 --- /dev/null +++ b/SpeechT5/fairseq/fairseq/optim/lr_scheduler/cosine_lr_scheduler.py @@ -0,0 +1,147 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. 
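`FairseqLAMB` above is a thin wrapper over apex's `FusedLAMB`, passing it the `lr`, `betas`, `eps`, and `weight_decay` from `optimizer_config`. A rough sketch of the underlying call, assuming apex is built with its CUDA extensions and a GPU is available; values are illustrative:

```python
# Illustration only: the apex optimizer that FairseqLAMB wraps.
import torch
from apex.optimizers import FusedLAMB

model = torch.nn.Linear(16, 4).cuda()
opt = FusedLAMB(model.parameters(), lr=1e-3, betas=(0.9, 0.999),
                eps=1e-8, weight_decay=0.0)

loss = model(torch.randn(8, 16, device="cuda")).pow(2).mean()
loss.backward()
opt.step()
```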
+ +import math +from collections.abc import Collection +from dataclasses import dataclass, field +from typing import List + +from omegaconf import II + +from fairseq.dataclass import FairseqDataclass +from fairseq.optim.lr_scheduler import FairseqLRScheduler, register_lr_scheduler + + +@dataclass +class CosineLRScheduleConfig(FairseqDataclass): + warmup_updates: int = field( + default=0, + metadata={"help": "warmup the learning rate linearly for the first N updates"}, + ) + warmup_init_lr: float = field( + default=-1, + metadata={ + "help": "initial learning rate during warmup phase; default is cfg.lr" + }, + ) + lr: List[float] = field( + default=II("optimization.lr"), + metadata={"help": "max learning rate, must be more than cfg.min_lr"}, + ) + min_lr: float = field(default=0.0, metadata={"help": "min learning rate"}) + t_mult: float = field( + default=1.0, metadata={"help": "factor to grow the length of each period"} + ) + lr_period_updates: float = field( + default=-1, metadata={"help": "initial number of updates per period"} + ) + lr_shrink: float = field( + default=0.1, metadata={"help": "shrink factor for annealing"} + ) + # This is not required, but is for convenience in inferring lr_period_updates + max_update: int = II("optimization.max_update") + + +@register_lr_scheduler("cosine", dataclass=CosineLRScheduleConfig) +class CosineLRSchedule(FairseqLRScheduler): + """Assign LR based on a cyclical schedule that follows the cosine function. + + See https://arxiv.org/pdf/1608.03983.pdf for details. + + We also support a warmup phase where we linearly increase the learning rate + from some initial learning rate (``--warmup-init-lr``) until the configured + max learning rate (``--lr``). + + During warmup:: + + lrs = torch.linspace(cfg.warmup_init_lr, cfg.lr, cfg.warmup_updates) + lr = lrs[update_num] + + After warmup:: + + lr = cfg.min_lr + 0.5*(cfg.lr - cfg.min_lr)*(1 + cos(t_curr / t_i)) + + where ``t_curr`` is current percentage of updates within the current period + range and ``t_i`` is the current period range, which is scaled by ``t_mul`` + after every iteration. + """ + + def __init__(self, cfg: CosineLRScheduleConfig, fairseq_optimizer): + super().__init__(cfg, fairseq_optimizer) + if isinstance(cfg.lr, Collection) and len(cfg.lr) > 1: + raise ValueError( + "Cannot use a fixed learning rate schedule with cosine." + f" Consider --lr-scheduler=fixed instead. 
({cfg.lr})" + ) + + self.max_lr = cfg.lr[0] if isinstance(cfg.lr, Collection) else cfg.lr + assert ( + self.max_lr > cfg.min_lr + ), f"max_lr (={cfg.lr}) must be more than min_lr (={cfg.min_lr})" + + warmup_end_lr = self.max_lr + if cfg.warmup_init_lr < 0: + cfg.warmup_init_lr = cfg.min_lr + + self.t_mult = cfg.t_mult + self.period = cfg.lr_period_updates + + if self.period <= 0: + assert ( + cfg.max_update > 0 + ), "Either --max_update or --lr-period-updates must be set" + self.period = cfg.max_update - cfg.warmup_updates + + if cfg.warmup_updates > 0: + # linearly warmup for the first cfg.warmup_updates + self.lr_step = (warmup_end_lr - cfg.warmup_init_lr) / cfg.warmup_updates + else: + self.lr_step = 1 + + self.warmup_updates = cfg.warmup_updates + self.lr_shrink = cfg.lr_shrink + + # initial learning rate + self.lr = cfg.warmup_init_lr + self.optimizer.set_lr(self.lr) + + def step(self, epoch, val_loss=None): + """Update the learning rate at the end of the given epoch.""" + super().step(epoch, val_loss) + # we don't change the learning rate at epoch boundaries + return self.optimizer.get_lr() + + def step_update(self, num_updates): + """Update the learning rate after each update.""" + if num_updates < self.cfg.warmup_updates: + self.lr = self.cfg.warmup_init_lr + num_updates * self.lr_step + else: + curr_updates = num_updates - self.cfg.warmup_updates + if self.t_mult != 1: + i = math.floor( + math.log( + 1 - curr_updates / self.period * (1 - self.t_mult), self.t_mult + ) + ) + t_i = self.t_mult ** i * self.period + t_curr = ( + curr_updates + - (1 - self.t_mult ** i) / (1 - self.t_mult) * self.period + ) + else: + i = math.floor(curr_updates / self.period) + t_i = self.period + t_curr = curr_updates - (self.period * i) + + lr_shrink = self.lr_shrink ** i + min_lr = self.cfg.min_lr * lr_shrink + max_lr = self.max_lr * lr_shrink + + self.lr = min_lr + 0.5 * (max_lr - min_lr) * ( + 1 + math.cos(math.pi * t_curr / t_i) + ) + + self.optimizer.set_lr(self.lr) + return self.lr diff --git a/SpeechT5/fairseq/fairseq/optim/lr_scheduler/fairseq_lr_scheduler.py b/SpeechT5/fairseq/fairseq/optim/lr_scheduler/fairseq_lr_scheduler.py new file mode 100644 index 0000000000000000000000000000000000000000..6c12fa56b825e81bcc3fc7a97d206777418260ef --- /dev/null +++ b/SpeechT5/fairseq/fairseq/optim/lr_scheduler/fairseq_lr_scheduler.py @@ -0,0 +1,59 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. 
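The warmup-plus-cosine rule implemented by `CosineLRSchedule` above can be reproduced in a few lines for the common `t_mult = 1`, no-shrink case. A standalone sketch with illustrative hyper-parameters:

```python
# Illustration only: warmup then cosine annealing, mirroring step_update()
# for t_mult = 1 and lr_shrink = 1.
import math

def cosine_lr(num_updates, warmup_updates=500, warmup_init_lr=1e-7,
              max_lr=5e-4, min_lr=1e-6, period=10000):
    if num_updates < warmup_updates:
        step = (max_lr - warmup_init_lr) / warmup_updates
        return warmup_init_lr + num_updates * step
    t_curr = (num_updates - warmup_updates) % period
    return min_lr + 0.5 * (max_lr - min_lr) * (1 + math.cos(math.pi * t_curr / period))

print([round(cosine_lr(u), 6) for u in (0, 250, 500, 5500, 10500)])
```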
+ +from argparse import Namespace + +from fairseq.dataclass.utils import gen_parser_from_dataclass +from fairseq.optim import FairseqOptimizer + + +class FairseqLRScheduler(object): + def __init__(self, cfg, optimizer): + super().__init__() + if optimizer is not None and not isinstance(optimizer, FairseqOptimizer): + raise ValueError("optimizer must be an instance of FairseqOptimizer") + self.cfg = cfg + self.optimizer = optimizer + self.best = None + + @classmethod + def add_args(cls, parser): + """Add arguments to the parser for this LR scheduler.""" + dc = getattr(cls, "__dataclass", None) + if dc is not None: + gen_parser_from_dataclass(parser, dc()) + + def state_dict(self): + """Return the LR scheduler state dict.""" + return {"best": self.best} + + def load_state_dict(self, state_dict): + """Load an LR scheduler state dict.""" + self.best = state_dict["best"] + + def step_begin_epoch(self, epoch): + """Update the learning rate at the beginning of the given epoch.""" + pass + + def step(self, epoch, val_loss=None): + """Update the learning rate at the end of the given epoch.""" + if val_loss is not None: + if self.best is None: + self.best = val_loss + else: + self.best = min(self.best, val_loss) + + def step_update(self, num_updates): + """Update the learning rate after each update.""" + return self.optimizer.get_lr() + + +class LegacyFairseqLRScheduler(FairseqLRScheduler): + def __init__(self, args: Namespace, optimizer): + if not isinstance(optimizer, FairseqOptimizer): + raise ValueError("optimizer must be an instance of FairseqOptimizer") + self.args = args + self.optimizer = optimizer + self.best = None diff --git a/SpeechT5/fairseq/fairseq/optim/lr_scheduler/fixed_schedule.py b/SpeechT5/fairseq/fairseq/optim/lr_scheduler/fixed_schedule.py new file mode 100644 index 0000000000000000000000000000000000000000..d0e7e14b7e72b1151f7d7f19094430bbab64f8f0 --- /dev/null +++ b/SpeechT5/fairseq/fairseq/optim/lr_scheduler/fixed_schedule.py @@ -0,0 +1,76 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. 
+ +from dataclasses import dataclass, field +from typing import Optional, List +from omegaconf import II + +from fairseq.dataclass import FairseqDataclass +from fairseq.optim.lr_scheduler import FairseqLRScheduler, register_lr_scheduler + + +@dataclass +class FixedLRScheduleConfig(FairseqDataclass): + force_anneal: Optional[int] = field( + default=None, + metadata={"help": "force annealing at specified epoch"}, + ) + lr_shrink: float = field( + default=0.1, + metadata={"help": "shrink factor for annealing, lr_new = (lr * lr_shrink)"}, + ) + warmup_updates: int = field( + default=0, + metadata={"help": "warmup the learning rate linearly for the first N updates"}, + ) + lr: List[float] = II("optimization.lr") + + +@register_lr_scheduler("fixed", dataclass=FixedLRScheduleConfig) +class FixedLRSchedule(FairseqLRScheduler): + """Decay the LR on a fixed schedule.""" + + def __init__(self, cfg: FixedLRScheduleConfig, optimizer): + super().__init__(cfg, optimizer) + + self.lr = cfg.lr[0] + if cfg.warmup_updates > 0: + self.warmup_factor = 1.0 / cfg.warmup_updates + else: + self.warmup_factor = 1 + + def state_dict(self): + return {"lr": self.lr} + + def load_state_dict(self, state_dict): + if "lr" in state_dict: + self.lr = state_dict["lr"] + + def get_next_lr(self, epoch): + lrs = self.cfg.lr + if self.cfg.force_anneal is None or epoch < self.cfg.force_anneal: + # use fixed LR schedule + next_lr = lrs[min(epoch - 1, len(lrs) - 1)] + else: + # annneal based on lr_shrink + next_lr = lrs[-1] * self.cfg.lr_shrink ** ( + epoch + 1 - self.cfg.force_anneal + ) + return next_lr + + def step_begin_epoch(self, epoch): + """Update the learning rate at the beginning of the given epoch.""" + self.lr = self.get_next_lr(epoch) + self.optimizer.set_lr(self.warmup_factor * self.lr) + return self.optimizer.get_lr() + + def step_update(self, num_updates): + """Update the learning rate after each update.""" + if self.cfg.warmup_updates > 0 and num_updates < self.cfg.warmup_updates: + self.warmup_factor = (num_updates + 1) / float(self.cfg.warmup_updates) + self.optimizer.set_lr(self.warmup_factor * self.lr) + else: + self.optimizer.set_lr(self.lr) + return self.optimizer.get_lr() diff --git a/SpeechT5/fairseq/fairseq/optim/lr_scheduler/inverse_square_root_schedule.py b/SpeechT5/fairseq/fairseq/optim/lr_scheduler/inverse_square_root_schedule.py new file mode 100644 index 0000000000000000000000000000000000000000..0f87bb5d7ed5c7eb8011d4c651f2ecbf0ae700ac --- /dev/null +++ b/SpeechT5/fairseq/fairseq/optim/lr_scheduler/inverse_square_root_schedule.py @@ -0,0 +1,85 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. 
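The schedule implemented below decays the learning rate with the inverse square root of the update number after a linear warmup, exactly as its docstring describes. A standalone sketch of that rule with illustrative hyper-parameters:

```python
# Illustration only: linear warmup, then lr = decay_factor / sqrt(update).
def inverse_sqrt_lr(num_updates, warmup_updates=4000, warmup_init_lr=0.0, lr=5e-4):
    if num_updates < warmup_updates:
        return warmup_init_lr + num_updates * (lr - warmup_init_lr) / warmup_updates
    decay_factor = lr * warmup_updates ** 0.5
    return decay_factor * num_updates ** -0.5

print([round(inverse_sqrt_lr(u), 6) for u in (0, 2000, 4000, 16000, 64000)])
```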
+ +from collections.abc import Collection +from dataclasses import dataclass, field +from typing import List + +from omegaconf import II + +from fairseq.dataclass import FairseqDataclass +from fairseq.optim.lr_scheduler import FairseqLRScheduler, register_lr_scheduler + + +@dataclass +class InverseSquareRootLRScheduleConfig(FairseqDataclass): + warmup_updates: int = field( + default=4000, + metadata={"help": "warmup the learning rate linearly for the first N updates"}, + ) + warmup_init_lr: float = field( + default=-1, + metadata={ + "help": "initial learning rate during warmup phase; default is cfg.lr" + }, + ) + lr: List[float] = II("optimization.lr") + + +@register_lr_scheduler("inverse_sqrt", dataclass=InverseSquareRootLRScheduleConfig) +class InverseSquareRootSchedule(FairseqLRScheduler): + """Decay the LR based on the inverse square root of the update number. + + We also support a warmup phase where we linearly increase the learning rate + from some initial learning rate (``--warmup-init-lr``) until the configured + learning rate (``--lr``). Thereafter we decay proportional to the number of + updates, with a decay factor set to align with the configured learning rate. + + During warmup:: + + lrs = torch.linspace(cfg.warmup_init_lr, cfg.lr, cfg.warmup_updates) + lr = lrs[update_num] + + After warmup:: + + decay_factor = cfg.lr * sqrt(cfg.warmup_updates) + lr = decay_factor / sqrt(update_num) + """ + + def __init__(self, cfg: InverseSquareRootLRScheduleConfig, optimizer): + super().__init__(cfg, optimizer) + if isinstance(cfg.lr, Collection) and len(cfg.lr) > 1: + raise ValueError( + "Cannot use a fixed learning rate schedule with inverse_sqrt." + " Consider --lr-scheduler=fixed instead." + ) + warmup_end_lr = cfg.lr[0] if isinstance(cfg.lr, Collection) else cfg.lr + if cfg.warmup_init_lr < 0: + cfg.warmup_init_lr = 0 if cfg.warmup_updates > 0 else warmup_end_lr + + # linearly warmup for the first cfg.warmup_updates + self.lr_step = (warmup_end_lr - cfg.warmup_init_lr) / cfg.warmup_updates + + # then, decay prop. to the inverse square root of the update number + self.decay_factor = warmup_end_lr * cfg.warmup_updates ** 0.5 + + # initial learning rate + self.lr = cfg.warmup_init_lr + self.optimizer.set_lr(self.lr) + + def step(self, epoch, val_loss=None): + """Update the learning rate at the end of the given epoch.""" + super().step(epoch, val_loss) + # we don't change the learning rate at epoch boundaries + return self.optimizer.get_lr() + + def step_update(self, num_updates): + """Update the learning rate after each update.""" + if num_updates < self.cfg.warmup_updates: + self.lr = self.cfg.warmup_init_lr + num_updates * self.lr_step + else: + self.lr = self.decay_factor * num_updates ** -0.5 + self.optimizer.set_lr(self.lr) + return self.lr diff --git a/SpeechT5/fairseq/fairseq/optim/lr_scheduler/manual_lr_scheduler.py b/SpeechT5/fairseq/fairseq/optim/lr_scheduler/manual_lr_scheduler.py new file mode 100644 index 0000000000000000000000000000000000000000..0269a1e2853854745e23b07931294f37b67d0295 --- /dev/null +++ b/SpeechT5/fairseq/fairseq/optim/lr_scheduler/manual_lr_scheduler.py @@ -0,0 +1,110 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +from . 
import LegacyFairseqLRScheduler, register_lr_scheduler +import logging +import ast + +logger = logging.getLogger(__name__) +logger.setLevel(logging.WARNING) + + +@register_lr_scheduler("manual") +class ManualSchedule(LegacyFairseqLRScheduler): + """Decay the LR on a manual schedule.""" + + def __init__(self, args, optimizer): + super().__init__(args, optimizer) + + self.epoch2lr = self.parse_manuallr_args(args.epoch2lr) + self.update2lr = self.parse_manuallr_args(args.update2lr) + logger.info("@@@ ManualSchedule epoch2lr={}".format(self.epoch2lr)) + logger.info("@@@ ManualSchedule update2lr={}".format(self.update2lr)) + + if 1 in self.epoch2lr: + self.lr = self.epoch2lr[1] + elif 1 in self.update2lr: + self.lr = self.update2lr[1] + else: + self.lr = args.lr[0] + self.optimizer.set_lr(self.lr) # Set the beginning of the epoch. + + def parse_manuallr_args(self, lr_args_str): + lr_dict = ast.literal_eval(lr_args_str.replace(' ', '')) + if not isinstance(lr_dict, dict): + raise ValueError("epoch2lr/update2lr must be abel to evaluated to a dict") + + lr_args = {} + logger.info("@@@ after parsing input dictionary lr_dict = {}".format(lr_dict)) + for key, val in lr_dict.items(): + if "," in key: + for k in key.split(","): + lr_args[int(k)] = float(val) + elif "-" in key: + s = int(key.split("-")[0]) + e = int(key.split("-")[1]) + for k in range(s, e + 1, 1): + lr_args[k] = float(val) + else: + lr_args[int(key)] = float(val) + + return lr_args + + @staticmethod + def add_args(parser): + """Add arguments to the parser for this LR scheduler.""" + # fmt: off + parser.add_argument( + "--epoch2lr", + type=str, + metavar="DICT", + default="{}", + help="a dictionary used to set lr for each epoch manually", + ) + parser.add_argument( + "--update2lr", + type=str, + metavar="DICT", + default="{}", + help="a dictionary used to set lr for each update manually", + ) + # fmt: on + + def state_dict(self): + return {"lr": self.lr} + + def load_state_dict(self, state_dict): + if "lr" in state_dict: + self.lr = state_dict["lr"] + + def get_next_lr(self, epoch): + manual_keys = [k for k in self.epoch2lr if k <= epoch] + if manual_keys: + manual_lr = self.epoch2lr[max(manual_keys)] + else: + logger.warning("@@@ epoch={} does not exist in manual lr input. epoch2lr={}...".format( + epoch, list(self.epoch2lr.items())[:min(10, len(self.epoch2lr.keys())-1)] + )) + manual_lr = self.optimizer.get_lr() + return manual_lr + + def step_begin_epoch(self, epoch): + """Update the learning rate at the beginning of the given epoch.""" + self.lr = self.get_next_lr(epoch) + self.optimizer.set_lr(self.lr) + return self.optimizer.get_lr() + + def step_update(self, num_updates): + """Update the learning rate after each update.""" + manual_keys = [k for k in self.update2lr if k <= num_updates] + if manual_keys: + manual_lr = self.update2lr[max(manual_keys)] + else: + logger.warning("epoch={} does not exist in manual lr input update2lr={}...".format( + num_updates, list(self.update2lr.items())[:min(10, len(self.update2lr.keys())-1)])) + manual_lr = self.optimizer.get_lr() + + self.optimizer.set_lr(manual_lr) + return self.optimizer.get_lr() diff --git a/SpeechT5/fairseq/fairseq/optim/lr_scheduler/pass_through.py b/SpeechT5/fairseq/fairseq/optim/lr_scheduler/pass_through.py new file mode 100644 index 0000000000000000000000000000000000000000..2f93db328c1de9b268e8ee1c0c1cad558fd089aa --- /dev/null +++ b/SpeechT5/fairseq/fairseq/optim/lr_scheduler/pass_through.py @@ -0,0 +1,39 @@ +# Copyright (c) Facebook, Inc. and its affiliates. 
+# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +from dataclasses import dataclass + +from fairseq.dataclass import FairseqDataclass +from fairseq.optim.lr_scheduler import FairseqLRScheduler, register_lr_scheduler + + +@dataclass +class PassThroughScheduleConfig(FairseqDataclass): + pass + + +@register_lr_scheduler("pass_through", dataclass=PassThroughScheduleConfig) +class PassThroughScheduleSchedule(FairseqLRScheduler): + """Delegate lr scheduling to the optimizer.""" + + def __init__(self, cfg: PassThroughScheduleConfig, optimizer): + super().__init__(cfg, optimizer) + assert ( + hasattr(optimizer, "lr_scheduler") and optimizer.lr_scheduler is not None + ), "Pass-through schedule can only be used with optimizers with their own schedulers" + + def state_dict(self): + return self.optimizer.lr_scheduler.state_dict() + + def load_state_dict(self, state_dict): + self.optimizer.lr_scheduler.load_state_dict(state_dict) + + def step_begin_epoch(self, epoch): + """Update the learning rate at the beginning of the given epoch.""" + return self.optimizer.lr_scheduler.step_begin_epoch(epoch) + + def step_update(self, num_updates): + """Update the learning rate after each update.""" + return self.optimizer.lr_scheduler.step_update(num_updates) diff --git a/SpeechT5/fairseq/fairseq/optim/lr_scheduler/polynomial_decay_schedule.py b/SpeechT5/fairseq/fairseq/optim/lr_scheduler/polynomial_decay_schedule.py new file mode 100644 index 0000000000000000000000000000000000000000..b8109a7c1e79cd057c355504d07bac5615c02ea9 --- /dev/null +++ b/SpeechT5/fairseq/fairseq/optim/lr_scheduler/polynomial_decay_schedule.py @@ -0,0 +1,89 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. 
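Next is the `polynomial_decay` schedule: linear warmup for `warmup_updates`, then a polynomial interpolation from the peak LR down to `end_learning_rate` over `total_num_update` steps. A minimal sketch of the same curve, with assumed hyperparameters purely for illustration:

```python
def polynomial_decay_lr(num_updates, lr=1e-4, end_lr=0.0, warmup_updates=1000,
                        total_num_update=100_000, power=1.0):
    """Mirror the step_update() logic of the polynomial_decay scheduler in plain Python."""
    if warmup_updates > 0 and num_updates <= warmup_updates:
        return lr * num_updates / warmup_updates        # linear warmup
    if num_updates >= total_num_update:
        return end_lr                                   # fully decayed
    pct_remaining = 1 - (num_updates - warmup_updates) / (total_num_update - warmup_updates)
    return (lr - end_lr) * pct_remaining ** power + end_lr

print([round(polynomial_decay_lr(s), 8) for s in (0, 500, 1000, 50_000, 100_000)])
```

With `power=1.0` (the default) this is simply a linear ramp down to `end_learning_rate` after warmup.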
+ +from dataclasses import dataclass, field +from typing import Optional, List +from omegaconf import II + +from fairseq.dataclass import FairseqDataclass +from fairseq.optim.lr_scheduler import FairseqLRScheduler, register_lr_scheduler + + +@dataclass +class PolynomialDecayLRScheduleConfig(FairseqDataclass): + warmup_updates: int = field( + default=0, + metadata={"help": "warmup the learning rate linearly for the first N updates"}, + ) + force_anneal: Optional[int] = field( + default=None, + metadata={"help": "force annealing at specified epoch"}, + ) + end_learning_rate: float = field( + default=0.0, + metadata={"help": "learning rate to decay to"}, + ) + power: float = field( + default=1.0, + metadata={"help": "decay exponent"}, + ) + total_num_update: float = field( + default=II("optimization.max_update"), + metadata={"help": "total number of updates over which to decay learning rate"}, + ) + lr: List[float] = II("optimization.lr") + + +@register_lr_scheduler("polynomial_decay", dataclass=PolynomialDecayLRScheduleConfig) +class PolynomialDecayLRSchedule(FairseqLRScheduler): + """Decay the LR on a fixed schedule.""" + + def __init__(self, cfg: PolynomialDecayLRScheduleConfig, optimizer): + super().__init__(cfg, optimizer) + + assert cfg.total_num_update > 0 + + self.lr = cfg.lr[0] + if cfg.warmup_updates > 0: + self.warmup_factor = 1.0 / cfg.warmup_updates + else: + self.warmup_factor = 1 + self.end_learning_rate = cfg.end_learning_rate + self.total_num_update = cfg.total_num_update + self.power = cfg.power + self.optimizer.set_lr(self.warmup_factor * self.lr) + + def get_next_lr(self, epoch): + lrs = self.cfg.lr + if self.cfg.force_anneal is None or epoch < self.cfg.force_anneal: + # use fixed LR schedule + next_lr = lrs[min(epoch, len(lrs) - 1)] + else: + # annneal based on lr_shrink + next_lr = self.optimizer.get_lr() + return next_lr + + def step_begin_epoch(self, epoch): + """Update the learning rate at the beginning of the given epoch.""" + self.lr = self.get_next_lr(epoch) + self.optimizer.set_lr(self.warmup_factor * self.lr) + return self.optimizer.get_lr() + + def step_update(self, num_updates): + """Update the learning rate after each update.""" + if self.cfg.warmup_updates > 0 and num_updates <= self.cfg.warmup_updates: + self.warmup_factor = num_updates / float(self.cfg.warmup_updates) + lr = self.warmup_factor * self.lr + elif num_updates >= self.total_num_update: + lr = self.end_learning_rate + else: + warmup = self.cfg.warmup_updates + lr_range = self.lr - self.end_learning_rate + pct_remaining = 1 - (num_updates - warmup) / ( + self.total_num_update - warmup + ) + lr = lr_range * pct_remaining ** (self.power) + self.end_learning_rate + self.optimizer.set_lr(lr) + return self.optimizer.get_lr() diff --git a/SpeechT5/fairseq/fairseq/optim/lr_scheduler/reduce_lr_on_plateau.py b/SpeechT5/fairseq/fairseq/optim/lr_scheduler/reduce_lr_on_plateau.py new file mode 100644 index 0000000000000000000000000000000000000000..5ee9c1be4a59ad3d072412827ab4e9b62dc7434e --- /dev/null +++ b/SpeechT5/fairseq/fairseq/optim/lr_scheduler/reduce_lr_on_plateau.py @@ -0,0 +1,143 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. 
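The `reduce_lr_on_plateau` scheduler below is a thin wrapper around `torch.optim.lr_scheduler.ReduceLROnPlateau`, adding an optional linear warmup phase in `step_update()`. The PyTorch behaviour it delegates to looks like this; the tiny model and the loss values are placeholders, and the `factor`/`patience`/`threshold` arguments mirror the wrapper's defaults (`lr_shrink=0.1`, `lr_patience=0`, `lr_threshold=1e-4`):

```python
import torch

model = torch.nn.Linear(10, 1)                              # placeholder model
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, mode="min", factor=0.1, patience=0, threshold=1e-4
)

for epoch, val_loss in enumerate([1.0, 0.9, 0.91, 0.92]):
    scheduler.step(val_loss)                                # shrink the LR once val_loss stops improving
    print(epoch, optimizer.param_groups[0]["lr"])
```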
+ +from dataclasses import dataclass, field +from typing import List + +import torch.optim.lr_scheduler +from omegaconf import II + +from fairseq.dataclass import FairseqDataclass +from fairseq.optim.lr_scheduler import FairseqLRScheduler, register_lr_scheduler + + +@dataclass +class ReduceLROnPlateauLRScheduleConfig(FairseqDataclass): + lr_shrink: float = field( + default=0.1, metadata={"help": "shrink factor for annealing"} + ) + lr_threshold: float = field( + default=1e-4, + metadata={ + "help": ( + "threshold for measuring the new optimum, to only focus on " + "significant changes" + ) + }, + ) + lr_patience: int = field( + default=0, + metadata={ + "help": ( + "number of epochs with no improvement after which learning rate will " + "be reduced" + ) + }, + ) + warmup_updates: int = field( + default=0, + metadata={"help": "warmup the learning rate linearly for the first N updates"}, + ) + warmup_init_lr: float = field( + default=-1, + metadata={ + "help": "initial learning rate during warmup phase; default is cfg.lr" + }, + ) + lr: List[float] = II("optimization.lr") + maximize_best_checkpoint_metric: bool = II( + "checkpoint.maximize_best_checkpoint_metric" + ) + + +@register_lr_scheduler( + "reduce_lr_on_plateau", dataclass=ReduceLROnPlateauLRScheduleConfig +) +class ReduceLROnPlateauLRSchedule(FairseqLRScheduler): + """ + Decay the LR by a factor every time the validation loss plateaus. + Also comes with optional warmup phase, where we linearly increase + the learning rate from some initial learning rate + (``--warmup-init-lr``) until the configured learning rate + (``--lr``). Thereafter the lr is adjusted according to original + reduce_on_plateau scheme. + + During warmup:: + + lrs = torch.linspace( + cfg.warmup_init_lr, cfg.lr, cfg.warmup_updates + ) + lr = lrs[update_num] + """ + + def __init__(self, cfg: ReduceLROnPlateauLRScheduleConfig, optimizer): + super().__init__(cfg, optimizer) + if len(cfg.lr) > 1: + raise ValueError( + "Cannot use a fixed learning rate schedule with reduce_lr_on_plateau." + " Consider --lr-scheduler=fixed instead." 
+ ) + self.lr_scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau( + self.optimizer.optimizer, + patience=cfg.lr_patience, + factor=cfg.lr_shrink, + mode="max" if cfg.maximize_best_checkpoint_metric else "min", + threshold=cfg.lr_threshold, + ) + warmup_end_lr = cfg.lr[0] + # if no warm up, sets initial lr to be cfg.lr[0] + if cfg.warmup_init_lr < 0: + cfg.warmup_init_lr = 0 if cfg.warmup_updates > 0 else warmup_end_lr + + # linearly warmup for the first cfg.warmup_updates + if cfg.warmup_updates > 0: + self.lr_step = (warmup_end_lr - cfg.warmup_init_lr) / cfg.warmup_updates + + # this flag is either set from arg when no warm up, or set by + # step_update() when warmup finishes + self.warmup_end = True if cfg.warmup_updates <= 0 else False + + # initial learning rate + # this self.lr is used only during init and/or warm up period + self.lr = warmup_end_lr if self.warmup_end else cfg.warmup_init_lr + self.optimizer.set_lr(self.lr) + + def state_dict(self): + """Return the LR scheduler state dict.""" + return { + "best": self.lr_scheduler.best, + "last_epoch": self.lr_scheduler.last_epoch, + } + + def load_state_dict(self, state_dict): + """Load an LR scheduler state dict.""" + self.lr_scheduler.best = state_dict["best"] + if "last_epoch" in state_dict: + self.lr_scheduler.last_epoch = state_dict["last_epoch"] + + def step(self, epoch, val_loss=None): + """ + Update the learning rate at the end of the given epoch if warmup + finishes otherwise no update of lr on epoch boundaries + """ + if val_loss is not None and self.warmup_end is True: + self.lr_scheduler.step(val_loss) + else: + self.lr_scheduler.last_epoch = epoch + return self.optimizer.get_lr() + + def step_update(self, num_updates): + """ + Update the learning rate after each update.""" + # if there is warmup + if self.cfg.warmup_updates > 0: + if num_updates <= self.cfg.warmup_updates: + self.lr = self.cfg.warmup_init_lr + num_updates * self.lr_step + self.optimizer.set_lr(self.lr) + else: + if self.warmup_end is False: + self.warmup_end = True + # else do nothing + return self.optimizer.get_lr() diff --git a/SpeechT5/fairseq/fairseq/optim/lr_scheduler/tri_stage_lr_scheduler.py b/SpeechT5/fairseq/fairseq/optim/lr_scheduler/tri_stage_lr_scheduler.py new file mode 100644 index 0000000000000000000000000000000000000000..4d5547c39b14f62acbd4f4b9ab3abfb3009c0e6d --- /dev/null +++ b/SpeechT5/fairseq/fairseq/optim/lr_scheduler/tri_stage_lr_scheduler.py @@ -0,0 +1,175 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +import math +from dataclasses import dataclass, field +from typing import Optional, List, Tuple +from omegaconf import II + +from fairseq.dataclass import FairseqDataclass +from fairseq.optim.lr_scheduler import FairseqLRScheduler, register_lr_scheduler + + +@dataclass +class TriStageLRScheduleConfig(FairseqDataclass): + warmup_steps: int = field( + default=0, + metadata={"help": "warmup the learning rate linearly for the first N updates"}, + ) + hold_steps: int = field( + default=0, + metadata={"help": "steps in hold stage"}, + ) + decay_steps: int = field( + default=0, + metadata={"help": "steps in decay stages"}, + ) + phase_ratio: Optional[Tuple[float, float, float]] = field( + default=None, + metadata={ + "help": ( + "if set, automatically sets warmup/hold/decay steps to the ratio " + "specified here from max_updates. 
the ratios must add up to 1.0" + ) + }, + ) + init_lr_scale: float = field( + default=0.01, + metadata={"help": "initial learning rate scale during warmup phase"}, + ) + final_lr_scale: float = field( + default=0.01, + metadata={"help": "final learning rate scale"}, + ) + max_update: float = II("optimization.max_update") + lr: List[float] = II("optimization.lr") + + +@register_lr_scheduler("tri_stage", dataclass=TriStageLRScheduleConfig) +class TriStageLRSchedule(FairseqLRScheduler): + """Tristage learning rate schedulr + + Implement the learning rate scheduler in https://arxiv.org/pdf/1904.08779.pdf + + Similar to inverse_squre_root scheduler, but tri_stage learning rate employs + three stages LR scheduling: + + - warmup stage, starting from `lr` * `init_lr_scale`, linearly + increased to `lr` in `warmup_steps` iterations + + - hold stage, after `warmup_steps`, keep the LR as `lr` for `hold_steps` + iterations + + - decay stage, after hold stage, decay LR exponetially to + `lr` * `final_lr_scale` in `decay_steps`; + after that LR is keep as `final_lr_scale` * `lr` + + During warmup:: + + init_lr = cfg.init_lr_scale * cfg.lr + lrs = torch.linspace(init_lr, cfg.lr, cfg.warmup_steps) + lr = lrs[update_num] + + During hold:: + + lr = cfg.lr + + During decay:: + + decay_factor = - math.log(cfg.final_lr_scale) / cfg.decay_steps + lr = cfg.lr * exp(- (update_num - warmup_steps - decay_steps) * decay_factor) + + After that:: + + lr = cfg.lr * cfg.final_lr_scale + """ + + def __init__(self, cfg: TriStageLRScheduleConfig, optimizer): + super().__init__(cfg, optimizer) + if len(cfg.lr) > 1: + raise ValueError( + "Cannot use a fixed learning rate schedule with tri-stage lr." + " Consider --lr-scheduler=fixed instead." + ) + + # calculate LR at each point + self.peak_lr = cfg.lr[0] + self.init_lr = cfg.init_lr_scale * cfg.lr[0] + self.final_lr = cfg.final_lr_scale * cfg.lr[0] + + if cfg.phase_ratio is not None: + assert cfg.max_update > 0 + assert sum(cfg.phase_ratio) == 1, "phase ratios must add up to 1" + self.warmup_steps = int(cfg.max_update * cfg.phase_ratio[0]) + self.hold_steps = int(cfg.max_update * cfg.phase_ratio[1]) + self.decay_steps = int(cfg.max_update * cfg.phase_ratio[2]) + else: + self.warmup_steps = cfg.warmup_steps + self.hold_steps = cfg.hold_steps + self.decay_steps = cfg.decay_steps + + assert ( + self.warmup_steps + self.hold_steps + self.decay_steps > 0 + ), "please specify steps or phase_ratio" + + self.warmup_rate = ( + (self.peak_lr - self.init_lr) / self.warmup_steps + if self.warmup_steps != 0 + else 0 + ) + self.decay_factor = -math.log(cfg.final_lr_scale) / self.decay_steps + + # initial learning rate + self.lr = self.init_lr + self.optimizer.set_lr(self.lr) + + def _decide_stage(self, update_step): + """ + return stage, and the corresponding steps within the current stage + """ + if update_step < self.warmup_steps: + # warmup state + return 0, update_step + + offset = self.warmup_steps + + if update_step < offset + self.hold_steps: + # hold stage + return 1, update_step - offset + + offset += self.hold_steps + + if update_step <= offset + self.decay_steps: + # decay stage + return 2, update_step - offset + + offset += self.decay_steps + + # still here ? 
constant lr stage + return 3, update_step - offset + + def step(self, epoch, val_loss=None): + """Update the learning rate at the end of the given epoch.""" + super().step(epoch, val_loss) + # we don't change the learning rate at epoch boundaries + return self.optimizer.get_lr() + + def step_update(self, num_updates): + """Update the learning rate after each update.""" + stage, steps_in_stage = self._decide_stage(num_updates) + if stage == 0: + self.lr = self.init_lr + self.warmup_rate * steps_in_stage + elif stage == 1: + self.lr = self.peak_lr + elif stage == 2: + self.lr = self.peak_lr * math.exp(-self.decay_factor * steps_in_stage) + elif stage == 3: + self.lr = self.final_lr + else: + raise ValueError("Undefined stage") + + self.optimizer.set_lr(self.lr) + + return self.lr diff --git a/SpeechT5/fairseq/fairseq/optim/lr_scheduler/triangular_lr_scheduler.py b/SpeechT5/fairseq/fairseq/optim/lr_scheduler/triangular_lr_scheduler.py new file mode 100644 index 0000000000000000000000000000000000000000..bfe2a0d381f28525f90ee120b31a69210338eb1b --- /dev/null +++ b/SpeechT5/fairseq/fairseq/optim/lr_scheduler/triangular_lr_scheduler.py @@ -0,0 +1,83 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +import math +from dataclasses import dataclass, field +from typing import List + +from omegaconf import II + +from fairseq.dataclass import FairseqDataclass +from fairseq.optim.lr_scheduler import FairseqLRScheduler, register_lr_scheduler + + +@dataclass +class TriangularLRScheduleConfig(FairseqDataclass): + max_lr: float = field( + default="???", metadata={"help": "max learning rate, must be more than cfg.lr"} + ) + lr_period_updates: float = field( + default=5000, + metadata={"help": "initial number of updates per period (cycle length)"}, + ) + lr_shrink: float = field( + default=0.1, metadata={"help": "shrink factor for annealing"} + ) + shrink_min: bool = field( + default=False, metadata={"help": "if set, also shrinks min lr"} + ) + lr: List[float] = II("optimization.lr") + + +@register_lr_scheduler("triangular", dataclass=TriangularLRScheduleConfig) +class TriangularLRSchedule(FairseqLRScheduler): + """Assign LR based on a triangular cyclical schedule. + + See https://arxiv.org/pdf/1506.01186.pdf for details. + """ + + def __init__(self, cfg: TriangularLRScheduleConfig, optimizer): + super().__init__(cfg, optimizer) + if len(cfg.lr) > 1: + raise ValueError( + "Cannot use a fixed learning rate schedule with triangular." + " Consider --lr-scheduler=fixed instead." 
+ ) + + lr = cfg.lr[0] + + assert cfg.max_lr > lr, "max_lr must be more than lr" + self.min_lr = lr + self.max_lr = cfg.max_lr + self.stepsize = cfg.lr_period_updates // 2 + self.lr_shrink = cfg.lr_shrink + self.shrink_min = cfg.shrink_min + + # initial learning rate + self.lr = self.min_lr + self.optimizer.set_lr(self.lr) + + def step(self, epoch, val_loss=None): + """Update the learning rate at the end of the given epoch.""" + super().step(epoch, val_loss) + # we don't change the learning rate at epoch boundaries + return self.optimizer.get_lr() + + def step_update(self, num_updates): + """Update the learning rate after each update.""" + cycle = math.floor(num_updates / (2 * self.stepsize)) + + lr_shrink = self.lr_shrink ** cycle + max_lr = self.max_lr * lr_shrink + if self.shrink_min: + min_lr = self.min_lr * lr_shrink + else: + min_lr = self.min_lr + + x = abs(num_updates / self.stepsize - 2 * (cycle + 1) + 1) + self.lr = min_lr + (max_lr - min_lr) * max(0, (1 - x)) + + self.optimizer.set_lr(self.lr) + return self.lr diff --git a/SpeechT5/fairseq/fairseq/optim/nag.py b/SpeechT5/fairseq/fairseq/optim/nag.py new file mode 100644 index 0000000000000000000000000000000000000000..c30a6c0fb1e8d5dc7edd5b53ba15a6acd46ecbff --- /dev/null +++ b/SpeechT5/fairseq/fairseq/optim/nag.py @@ -0,0 +1,111 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +from collections.abc import Collection +from dataclasses import dataclass, field +from typing import List + +import torch +from fairseq.dataclass import FairseqDataclass +from omegaconf import II, DictConfig +from torch.optim.optimizer import Optimizer, required + +from . import FairseqOptimizer, register_optimizer + + +@dataclass +class FairseqNAGConfig(FairseqDataclass): + momentum: float = field(default=0.99, metadata={"help": "momentum factor"}) + weight_decay: float = field(default=0.0, metadata={"help": "weight decay"}) + # TODO common vars in parent class + lr: List[float] = II("optimization.lr") + + +@register_optimizer("nag", dataclass=FairseqNAGConfig) +class FairseqNAG(FairseqOptimizer): + def __init__(self, cfg: DictConfig, params): + super().__init__(cfg) + self._optimizer = NAG(params, **self.optimizer_config) + + @property + def optimizer_config(self): + """ + Return a kwarg dictionary that will be used to override optimizer + args stored in checkpoints. This allows us to load a checkpoint and + resume training using a different set of optimizer args, e.g., with a + different learning rate. + """ + return { + "lr": self.cfg.lr[0] + if isinstance(self.cfg.lr, Collection) + else self.cfg.lr, + "momentum": self.cfg.momentum, + "weight_decay": self.cfg.weight_decay, + } + + +class NAG(Optimizer): + def __init__(self, params, lr=required, momentum=0, weight_decay=0): + defaults = dict(lr=lr, lr_old=lr, momentum=momentum, weight_decay=weight_decay) + super(NAG, self).__init__(params, defaults) + + @property + def supports_memory_efficient_fp16(self): + return True + + @property + def supports_flat_params(self): + return True + + def step(self, closure=None): + """Performs a single optimization step. + + Args: + closure (callable, optional): A closure that reevaluates the model + and returns the loss. 
+ """ + loss = None + if closure is not None: + loss = closure() + + for group in self.param_groups: + weight_decay = group["weight_decay"] + momentum = group["momentum"] + lr = group["lr"] + lr_old = group.get("lr_old", lr) + lr_correct = lr / lr_old if lr_old > 0 else lr + + for p in group["params"]: + if p.grad is None: + continue + + p_data_fp32 = p.data + if p_data_fp32.dtype in {torch.float16, torch.bfloat16}: + p_data_fp32 = p_data_fp32.float() + + d_p = p.grad.data.float() + param_state = self.state[p] + if "momentum_buffer" not in param_state: + param_state["momentum_buffer"] = torch.zeros_like(d_p) + else: + param_state["momentum_buffer"] = param_state["momentum_buffer"].to( + d_p + ) + + buf = param_state["momentum_buffer"] + + if weight_decay != 0: + p_data_fp32.mul_(1 - lr * weight_decay) + p_data_fp32.add_(buf, alpha=momentum * momentum * lr_correct) + p_data_fp32.add_(d_p, alpha=-(1 + momentum) * lr) + + buf.mul_(momentum * lr_correct).add_(d_p, alpha=-lr) + + if p.data.dtype in {torch.float16, torch.bfloat16}: + p.data.copy_(p_data_fp32) + + group["lr_old"] = lr + + return loss diff --git a/SpeechT5/fairseq/fairseq/optim/sgd.py b/SpeechT5/fairseq/fairseq/optim/sgd.py new file mode 100644 index 0000000000000000000000000000000000000000..8e34fb99a18fff12ab76be5894a84cbbb2f48176 --- /dev/null +++ b/SpeechT5/fairseq/fairseq/optim/sgd.py @@ -0,0 +1,43 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +import torch.optim + +from . import LegacyFairseqOptimizer, register_optimizer + + +@register_optimizer("sgd") +class SGD(LegacyFairseqOptimizer): + def __init__(self, args, params): + super().__init__(args) + self._optimizer = torch.optim.SGD(params, **self.optimizer_config) + + @staticmethod + def add_args(parser): + """Add optimizer-specific arguments to the parser.""" + # fmt: off + parser.add_argument('--momentum', default=0.0, type=float, metavar='M', + help='momentum factor') + parser.add_argument('--weight-decay', '--wd', default=0.0, type=float, metavar='WD', + help='weight decay') + # fmt: on + + @property + def optimizer_config(self): + """ + Return a kwarg dictionary that will be used to override optimizer + args stored in checkpoints. This allows us to load a checkpoint and + resume training using a different set of optimizer args, e.g., with a + different learning rate. + """ + return { + "lr": self.args.lr[0], + "momentum": self.args.momentum, + "weight_decay": self.args.weight_decay, + } + + @property + def supports_flat_params(self): + return True diff --git a/SpeechT5/fairseq/fairseq/optim/shard.py b/SpeechT5/fairseq/fairseq/optim/shard.py new file mode 100644 index 0000000000000000000000000000000000000000..9d7f2eb9e5de6086fe2435d432bde7521ebb8155 --- /dev/null +++ b/SpeechT5/fairseq/fairseq/optim/shard.py @@ -0,0 +1,58 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. 
+ +from typing import Any, Dict + +from fairseq.distributed import utils + + +try: + from fairscale.optim import OSS + + _has_fairscale = True +except ImportError: + _has_fairscale = False + + +def shard_(optimizer, group): + if not _has_fairscale: + raise ImportError( + "\n\nPlease install the fairscale package:" "\n\n pip install fairscale" + ) + + class FairseqOSS(OSS): + @property + def disable_mem_eff_fp16_loading_hack(self): + return True + + def __getattr__(self, name): + if name.startswith("supports") and hasattr(self.optim, name): + return getattr(self.optim, name) + raise AttributeError( + "'FairseqOSS' object has no attribute {0!r}".format(name) + ) + + def broadcast_global_state_dict( + self, state_dict: Dict[str, Any] + ) -> Dict[str, Any]: + """ + Broadcasts the entire state_dict to all other ranks + each rank is responsible to load their own partition of data + """ + return utils.broadcast_object( + state_dict, + src_rank=0, + group=self.group, + ) + + torch_optimizer = optimizer.optimizer + optim_cls = type(torch_optimizer) + + optimizer.optimizer = FairseqOSS( + torch_optimizer.param_groups, + optim_cls, + group=group, + **optimizer.optimizer_config + ) diff --git a/SpeechT5/fairseq/fairseq/options.py b/SpeechT5/fairseq/fairseq/options.py new file mode 100644 index 0000000000000000000000000000000000000000..2d9f8381a71415ebe9c14c13068abadf289a03e4 --- /dev/null +++ b/SpeechT5/fairseq/fairseq/options.py @@ -0,0 +1,381 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +import argparse +from pathlib import Path +from typing import Callable, List, Optional, Union + +import torch +from fairseq import utils +from fairseq.data.indexed_dataset import get_available_dataset_impl +from fairseq.dataclass.configs import ( + CheckpointConfig, + CommonConfig, + CommonEvalConfig, + DatasetConfig, + DistributedTrainingConfig, + EvalLMConfig, + GenerationConfig, + InteractiveConfig, + OptimizationConfig, +) +from fairseq.dataclass.utils import gen_parser_from_dataclass + +# this import is for backward compatibility +from fairseq.utils import csv_str_list, eval_bool, eval_str_dict, eval_str_list # noqa + + +def get_preprocessing_parser(default_task="translation"): + parser = get_parser("Preprocessing", default_task) + add_preprocess_args(parser) + return parser + + +def get_training_parser(default_task="translation"): + parser = get_parser("Trainer", default_task) + add_dataset_args(parser, train=True) + add_distributed_training_args(parser) + add_model_args(parser) + add_optimization_args(parser) + add_checkpoint_args(parser) + return parser + + +def get_generation_parser(interactive=False, default_task="translation"): + parser = get_parser("Generation", default_task) + add_dataset_args(parser, gen=True) + add_distributed_training_args(parser, default_world_size=1) + add_generation_args(parser) + add_checkpoint_args(parser) + if interactive: + add_interactive_args(parser) + return parser + + +def get_interactive_generation_parser(default_task="translation"): + return get_generation_parser(interactive=True, default_task=default_task) + + +def get_eval_lm_parser(default_task="language_modeling"): + parser = get_parser("Evaluate Language Model", default_task) + add_dataset_args(parser, gen=True) + add_distributed_training_args(parser, default_world_size=1) + add_eval_lm_args(parser) + return parser + + +def get_validation_parser(default_task=None): + parser = 
get_parser("Validation", default_task) + add_dataset_args(parser, train=True) + add_distributed_training_args(parser, default_world_size=1) + group = parser.add_argument_group("Evaluation") + gen_parser_from_dataclass(group, CommonEvalConfig()) + return parser + + +def parse_args_and_arch( + parser: argparse.ArgumentParser, + input_args: List[str] = None, + parse_known: bool = False, + suppress_defaults: bool = False, + modify_parser: Optional[Callable[[argparse.ArgumentParser], None]] = None, +): + """ + Args: + parser (ArgumentParser): the parser + input_args (List[str]): strings to parse, defaults to sys.argv + parse_known (bool): only parse known arguments, similar to + `ArgumentParser.parse_known_args` + suppress_defaults (bool): parse while ignoring all default values + modify_parser (Optional[Callable[[ArgumentParser], None]]): + function to modify the parser, e.g., to set default values + """ + if suppress_defaults: + # Parse args without any default values. This requires us to parse + # twice, once to identify all the necessary task/model args, and a second + # time with all defaults set to None. + args = parse_args_and_arch( + parser, + input_args=input_args, + parse_known=parse_known, + suppress_defaults=False, + ) + suppressed_parser = argparse.ArgumentParser(add_help=False, parents=[parser]) + suppressed_parser.set_defaults(**{k: None for k, v in vars(args).items()}) + args = suppressed_parser.parse_args(input_args) + return argparse.Namespace( + **{k: v for k, v in vars(args).items() if v is not None} + ) + + from fairseq.models import ARCH_MODEL_REGISTRY, ARCH_CONFIG_REGISTRY, MODEL_REGISTRY + + # Before creating the true parser, we need to import optional user module + # in order to eagerly import custom tasks, optimizers, architectures, etc. + usr_parser = argparse.ArgumentParser(add_help=False, allow_abbrev=False) + usr_parser.add_argument("--user-dir", default=None) + usr_args, _ = usr_parser.parse_known_args(input_args) + utils.import_user_module(usr_args) + + if modify_parser is not None: + modify_parser(parser) + + # The parser doesn't know about model/criterion/optimizer-specific args, so + # we parse twice. First we parse the model/criterion/optimizer, then we + # parse a second time after adding the *-specific arguments. + # If input_args is given, we will parse those args instead of sys.argv. + args, _ = parser.parse_known_args(input_args) + + # Add model-specific args to parser. + if hasattr(args, "arch"): + model_specific_group = parser.add_argument_group( + "Model-specific configuration", + # Only include attributes which are explicitly given as command-line + # arguments or which have default values. + argument_default=argparse.SUPPRESS, + ) + if args.arch in ARCH_MODEL_REGISTRY: + ARCH_MODEL_REGISTRY[args.arch].add_args(model_specific_group) + elif args.arch in MODEL_REGISTRY: + MODEL_REGISTRY[args.arch].add_args(model_specific_group) + else: + raise RuntimeError() + + if hasattr(args, "task"): + from fairseq.tasks import TASK_REGISTRY + + TASK_REGISTRY[args.task].add_args(parser) + if getattr(args, "use_bmuf", False): + # hack to support extra args for block distributed data parallelism + from fairseq.optim.bmuf import FairseqBMUF + + FairseqBMUF.add_args(parser) + + # Add *-specific args to parser. 
+ from fairseq.registry import REGISTRIES + + for registry_name, REGISTRY in REGISTRIES.items(): + choice = getattr(args, registry_name, None) + if choice is not None: + cls = REGISTRY["registry"][choice] + if hasattr(cls, "add_args"): + cls.add_args(parser) + elif hasattr(cls, "__dataclass"): + gen_parser_from_dataclass(parser, cls.__dataclass()) + + # Modify the parser a second time, since defaults may have been reset + if modify_parser is not None: + modify_parser(parser) + + # Parse a second time. + if parse_known: + args, extra = parser.parse_known_args(input_args) + else: + args = parser.parse_args(input_args) + extra = None + # Post-process args. + if ( + hasattr(args, "batch_size_valid") and args.batch_size_valid is None + ) or not hasattr(args, "batch_size_valid"): + args.batch_size_valid = args.batch_size + if hasattr(args, "max_tokens_valid") and args.max_tokens_valid is None: + args.max_tokens_valid = args.max_tokens + if getattr(args, "memory_efficient_fp16", False): + args.fp16 = True + if getattr(args, "memory_efficient_bf16", False): + args.bf16 = True + args.tpu = getattr(args, "tpu", False) + args.bf16 = getattr(args, "bf16", False) + if args.bf16: + args.tpu = True + if args.tpu and args.fp16: + raise ValueError("Cannot combine --fp16 and --tpu, use --bf16 on TPUs") + + if getattr(args, "seed", None) is None: + args.seed = 1 # default seed for training + args.no_seed_provided = True + else: + args.no_seed_provided = False + + # Apply architecture configuration. + if hasattr(args, "arch") and args.arch in ARCH_CONFIG_REGISTRY: + ARCH_CONFIG_REGISTRY[args.arch](args) + + if parse_known: + return args, extra + else: + return args + + +def get_parser(desc, default_task="translation"): + # Before creating the true parser, we need to import optional user module + # in order to eagerly import custom tasks, optimizers, architectures, etc. 
+ usr_parser = argparse.ArgumentParser(add_help=False, allow_abbrev=False) + usr_parser.add_argument("--user-dir", default=None) + usr_args, _ = usr_parser.parse_known_args() + utils.import_user_module(usr_args) + + parser = argparse.ArgumentParser(allow_abbrev=False) + gen_parser_from_dataclass(parser, CommonConfig()) + + from fairseq.registry import REGISTRIES + + for registry_name, REGISTRY in REGISTRIES.items(): + parser.add_argument( + "--" + registry_name.replace("_", "-"), + default=REGISTRY["default"], + choices=REGISTRY["registry"].keys(), + ) + + # Task definitions can be found under fairseq/tasks/ + from fairseq.tasks import TASK_REGISTRY + + parser.add_argument( + "--task", + metavar="TASK", + default=default_task, + choices=TASK_REGISTRY.keys(), + help="task", + ) + # fmt: on + return parser + + +def add_preprocess_args(parser): + group = parser.add_argument_group("Preprocessing") + # fmt: off + group.add_argument("-s", "--source-lang", default=None, metavar="SRC", + help="source language") + group.add_argument("-t", "--target-lang", default=None, metavar="TARGET", + help="target language") + group.add_argument("--trainpref", metavar="FP", default=None, + help="train file prefix (also used to build dictionaries)") + group.add_argument("--validpref", metavar="FP", default=None, + help="comma separated, valid file prefixes " + "(words missing from train set are replaced with <unk>)") + group.add_argument("--testpref", metavar="FP", default=None, + help="comma separated, test file prefixes " + "(words missing from train set are replaced with <unk>)") + group.add_argument("--align-suffix", metavar="FP", default=None, + help="alignment file suffix") + group.add_argument("--destdir", metavar="DIR", default="data-bin", + help="destination dir") + group.add_argument("--thresholdtgt", metavar="N", default=0, type=int, + help="map words appearing less than threshold times to unknown") + group.add_argument("--thresholdsrc", metavar="N", default=0, type=int, + help="map words appearing less than threshold times to unknown") + group.add_argument("--tgtdict", metavar="FP", + help="reuse given target dictionary") + group.add_argument("--srcdict", metavar="FP", + help="reuse given source dictionary") + group.add_argument("--nwordstgt", metavar="N", default=-1, type=int, + help="number of target words to retain") + group.add_argument("--nwordssrc", metavar="N", default=-1, type=int, + help="number of source words to retain") + group.add_argument("--alignfile", metavar="ALIGN", default=None, + help="an alignment file (optional)") + parser.add_argument('--dataset-impl', metavar='FORMAT', default='mmap', + choices=get_available_dataset_impl(), + help='output dataset implementation') + group.add_argument("--joined-dictionary", action="store_true", + help="Generate joined dictionary") + group.add_argument("--only-source", action="store_true", + help="Only process the source language") + group.add_argument("--padding-factor", metavar="N", default=8, type=int, + help="Pad dictionary size to be multiple of N") + group.add_argument("--workers", metavar="N", default=1, type=int, + help="number of parallel workers") + group.add_argument("--dict-only", action='store_true', + help="if true, only builds a dictionary and then exits") + # fmt: on + return parser + + +def add_dataset_args(parser, train=False, gen=False): + group = parser.add_argument_group("dataset_data_loading") + gen_parser_from_dataclass(group, DatasetConfig()) + # fmt: on + return group + + +def add_distributed_training_args(parser, 
default_world_size=None): + group = parser.add_argument_group("distributed_training") + if default_world_size is None: + default_world_size = max(1, torch.cuda.device_count()) + gen_parser_from_dataclass( + group, DistributedTrainingConfig(distributed_world_size=default_world_size) + ) + return group + + +def add_optimization_args(parser): + group = parser.add_argument_group("optimization") + # fmt: off + gen_parser_from_dataclass(group, OptimizationConfig()) + # fmt: on + return group + + +def add_checkpoint_args(parser): + group = parser.add_argument_group("checkpoint") + # fmt: off + gen_parser_from_dataclass(group, CheckpointConfig()) + # fmt: on + return group + + +def add_common_eval_args(group): + gen_parser_from_dataclass(group, CommonEvalConfig()) + + +def add_eval_lm_args(parser): + group = parser.add_argument_group("LM Evaluation") + add_common_eval_args(group) + gen_parser_from_dataclass(group, EvalLMConfig()) + + +def add_generation_args(parser): + group = parser.add_argument_group("Generation") + add_common_eval_args(group) + gen_parser_from_dataclass(group, GenerationConfig()) + return group + + +def add_interactive_args(parser): + group = parser.add_argument_group("Interactive") + gen_parser_from_dataclass(group, InteractiveConfig()) + + +def add_model_args(parser): + group = parser.add_argument_group("Model configuration") + # fmt: off + + # Model definitions can be found under fairseq/models/ + # + # The model architecture can be specified in several ways. + # In increasing order of priority: + # 1) model defaults (lowest priority) + # 2) --arch argument + # 3) --encoder/decoder-* arguments (highest priority) + from fairseq.models import ARCH_MODEL_REGISTRY + group.add_argument('--arch', '-a', metavar='ARCH', + choices=ARCH_MODEL_REGISTRY.keys(), + help='model architecture') + # fmt: on + return group + + +def get_args( + data: Union[str, Path], + task: str = "translation", + arch: str = "transformer", + **overrides +): + parser = get_training_parser(task) + args = parse_args_and_arch(parser, [str(data), "--task", task, "--arch", arch]) + + for k, v in overrides.items(): + setattr(args, k, v) + + return args diff --git a/SpeechT5/fairseq/fairseq/pdb.py b/SpeechT5/fairseq/fairseq/pdb.py new file mode 100644 index 0000000000000000000000000000000000000000..1ba6ef0d336b30717cfdde94e1b838cfe2bfeb20 --- /dev/null +++ b/SpeechT5/fairseq/fairseq/pdb.py @@ -0,0 +1,47 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +import multiprocessing +import os +import pdb +import sys + + +__all__ = ["set_trace"] + + +_stdin = [None] +_stdin_lock = multiprocessing.Lock() +try: + _stdin_fd = sys.stdin.fileno() +except Exception: + _stdin_fd = None + + +class MultiprocessingPdb(pdb.Pdb): + """A Pdb wrapper that works in a multiprocessing environment. 
+ + Usage: `from fairseq import pdb; pdb.set_trace()` + """ + + def __init__(self): + pdb.Pdb.__init__(self, nosigint=True) + + def _cmdloop(self): + stdin_bak = sys.stdin + with _stdin_lock: + try: + if _stdin_fd is not None: + if not _stdin[0]: + _stdin[0] = os.fdopen(_stdin_fd) + sys.stdin = _stdin[0] + self.cmdloop() + finally: + sys.stdin = stdin_bak + + +def set_trace(): + pdb = MultiprocessingPdb() + pdb.set_trace(sys._getframe().f_back) diff --git a/SpeechT5/fairseq/fairseq/quantization_utils.py b/SpeechT5/fairseq/fairseq/quantization_utils.py new file mode 100644 index 0000000000000000000000000000000000000000..11fc414c852b199b80a569bf024272535929abcc --- /dev/null +++ b/SpeechT5/fairseq/fairseq/quantization_utils.py @@ -0,0 +1,143 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +import logging + +from fairseq.modules.quantization import pq, quantization_options, scalar +from omegaconf import DictConfig + + +logger = logging.getLogger(__name__) + + +def quantize_model_scalar(model, model_cfg: DictConfig): + quant_noise_scalar = getattr(model_cfg, "quant_noise_scalar", 0) or 0 + if quant_noise_scalar > 0: + # quantize_model edits the model in place + scalar.quantize_model_(model, p=quant_noise_scalar, bits=8, update_step=1000) + return model + + +class Quantizer(object): + def __init__(self, config_path, max_epoch, max_update): + try: + import yaml + except ImportError: + raise ImportError("Please install yaml with: pip install yaml") + + # parse config + if config_path: + with open(config_path) as config_file: + config = quantization_options.parse_config_yaml( + yaml.safe_load(config_file) + ) + else: + config = quantization_options.parse_config_yaml({}) + + self.n_centroids_config = config["n_centroids"] + self.block_sizes_config = config["block_sizes"] + self.layers_to_quantize = config["layers_to_quantize"] + + # We assume that training will run for a fixed number of epochs + # (or updates) and that we should train for equal durations + # between iterations of PQ. + num_iterations = len(self.layers_to_quantize) + if max_epoch > 0: + assert max_epoch % num_iterations == 0, ( + "for iterative PQ, --max-epoch (={}) must be evenly divisible by " + "len(layers_to_quantize) (={})".format(max_epoch, num_iterations) + ) + self.epoch_schedule = max_epoch // num_iterations + else: + self.epoch_schedule = None + if max_update > 0: + assert max_update % num_iterations == 0, ( + "for iterative PQ, --max-update (={}) must be evenly divisible by " + "len(layers_to_quantize) (={})".format(max_update, num_iterations) + ) + self.update_schedule = max_update // num_iterations + else: + self.update_schedule = None + assert (self.epoch_schedule is not None) ^ ( + self.update_schedule is not None + ), "for iterative PQ, cannot specify both --max-update and --max-epoch" + + # 0 is a special value for quantization step, which will force + # the first call to begin_epoch() to call step() + self.quantization_step = 0 + + def set_trainer(self, trainer): + self.trainer = trainer + self.size_tracker = pq.SizeTracker(self.trainer.get_model()) + + def step(self): + """Move to the next stage of quantization.""" + if self.quantization_step >= len(self.layers_to_quantize): + # Maybe we just finished the last training step or we loaded + # a checkpoint for an iterative PQ model which previously + # finished training. Either way, don't quantize again. 
+ return + + logger.info( + "quantizing model (step={}; layers_to_quantize[step]={})".format( + self.quantization_step, self.layers_to_quantize[self.quantization_step] + ) + ) + quantized_layers = pq.quantize_model_( + self.trainer.get_model(), + self.size_tracker, + self.layers_to_quantize, + self.block_sizes_config, + self.n_centroids_config, + step=self.quantization_step, + ) + logger.info("quantized layers: {}".format(quantized_layers)) + logger.info(self.size_tracker) + + self.quantization_step += 1 + + # reintialize the Trainer since model parameters have changed + self.trainer.reinitialize() + + def begin_epoch(self, epoch): + """Called at the beginning of each epoch (epochs start at 1).""" + if ( + ( + self.epoch_schedule is not None + and epoch > 0 + and (epoch - 1) % self.epoch_schedule == 0 + ) + # we always step once in the beginning, even if using + # update-based quantization + or self.quantization_step == 0 + ): + self.step() + + def step_update(self, num_updates): + """Called at the end of each step.""" + if ( + self.update_schedule is not None + and num_updates > 0 + and num_updates % self.update_schedule == 0 + ): + self.step() + + def state_dict(self): + return { + "n_centroids_config": self.n_centroids_config, + "block_sizes_config": self.block_sizes_config, + "layers_to_quantize": self.layers_to_quantize, + "epoch_schedule": self.epoch_schedule, + "update_schedule": self.update_schedule, + "quantization_step": self.quantization_step, + } + + def load_state_dict(self, state_dict): + self.n_centroids_config = state_dict["n_centroids_config"] + self.block_sizes_config = state_dict["block_sizes_config"] + self.layers_to_quantize = state_dict["layers_to_quantize"] + self.epoch_schedule = state_dict["epoch_schedule"] + self.update_schedule = state_dict["update_schedule"] + self.quantization_step = state_dict["quantization_step"] diff --git a/SpeechT5/fairseq/fairseq/registry.py b/SpeechT5/fairseq/fairseq/registry.py new file mode 100644 index 0000000000000000000000000000000000000000..3fbaeac301855d41a5d52ff58276787e8ddebfca --- /dev/null +++ b/SpeechT5/fairseq/fairseq/registry.py @@ -0,0 +1,100 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. 
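`registry.py`, reproduced next, is fairseq's generic plug-in mechanism: `setup_registry("--name")` returns a `(build_x, register_x, REGISTRY, DATACLASS_REGISTRY)` tuple, and implementations are attached with the returned decorator (the scoring package later in this diff uses exactly this pattern). A condensed sketch; the `--my-component` option and `DemoComponent` class are invented for illustration:

```python
from fairseq import registry

# Create a new registry keyed by a hypothetical --my-component CLI option.
build_my_component, register_my_component, MY_REGISTRY, _ = registry.setup_registry(
    "--my-component", default=None
)


@register_my_component("demo")            # register an implementation under a name
class DemoComponent:
    def __init__(self, cfg):
        # no dataclass was registered, so cfg is just the choice string here
        self.cfg = cfg


component = build_my_component("demo")    # look up the class by name and instantiate it
print(type(component).__name__)           # DemoComponent
```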
+ +from argparse import Namespace + +from typing import Union +from fairseq.dataclass import FairseqDataclass +from fairseq.dataclass.utils import populate_dataclass, merge_with_parent +from hydra.core.config_store import ConfigStore +from omegaconf import DictConfig + +REGISTRIES = {} + + +def setup_registry(registry_name: str, base_class=None, default=None, required=False): + assert registry_name.startswith("--") + registry_name = registry_name[2:].replace("-", "_") + + REGISTRY = {} + REGISTRY_CLASS_NAMES = set() + DATACLASS_REGISTRY = {} + + # maintain a registry of all registries + if registry_name in REGISTRIES: + return # registry already exists + REGISTRIES[registry_name] = { + "registry": REGISTRY, + "default": default, + "dataclass_registry": DATACLASS_REGISTRY, + } + + def build_x(cfg: Union[DictConfig, str, Namespace], *extra_args, **extra_kwargs): + if isinstance(cfg, DictConfig): + choice = cfg._name + + if choice and choice in DATACLASS_REGISTRY: + dc = DATACLASS_REGISTRY[choice] + cfg = merge_with_parent(dc(), cfg) + elif isinstance(cfg, str): + choice = cfg + if choice in DATACLASS_REGISTRY: + cfg = DATACLASS_REGISTRY[choice]() + else: + choice = getattr(cfg, registry_name, None) + if choice in DATACLASS_REGISTRY: + cfg = populate_dataclass(DATACLASS_REGISTRY[choice](), cfg) + + if choice is None: + if required: + raise ValueError("{} is required!".format(registry_name)) + return None + + cls = REGISTRY[choice] + if hasattr(cls, "build_" + registry_name): + builder = getattr(cls, "build_" + registry_name) + else: + builder = cls + + return builder(cfg, *extra_args, **extra_kwargs) + + def register_x(name, dataclass=None): + def register_x_cls(cls): + if name in REGISTRY: + raise ValueError( + "Cannot register duplicate {} ({})".format(registry_name, name) + ) + if cls.__name__ in REGISTRY_CLASS_NAMES: + raise ValueError( + "Cannot register {} with duplicate class name ({})".format( + registry_name, cls.__name__ + ) + ) + if base_class is not None and not issubclass(cls, base_class): + raise ValueError( + "{} must extend {}".format(cls.__name__, base_class.__name__) + ) + + if dataclass is not None and not issubclass(dataclass, FairseqDataclass): + raise ValueError( + "Dataclass {} must extend FairseqDataclass".format(dataclass) + ) + + cls.__dataclass = dataclass + if cls.__dataclass is not None: + DATACLASS_REGISTRY[name] = cls.__dataclass + + cs = ConfigStore.instance() + node = dataclass() + node._name = name + cs.store(name=name, group=registry_name, node=node, provider="fairseq") + + REGISTRY[name] = cls + + return cls + + return register_x_cls + + return build_x, register_x, REGISTRY, DATACLASS_REGISTRY diff --git a/SpeechT5/fairseq/fairseq/scoring/__init__.py b/SpeechT5/fairseq/fairseq/scoring/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..58f2f563e493327394dff1265030d18f0814b5a2 --- /dev/null +++ b/SpeechT5/fairseq/fairseq/scoring/__init__.py @@ -0,0 +1,55 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. 
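The scoring package that follows defines the abstract `BaseScorer` interface (`add_string`, `score`, `result_string`) and a `register_scorer` decorator built on the registry above. As a sketch of how a user-defined metric would plug in, here is a toy exact-match scorer; the `exact_match` name and the class are hypothetical, not part of fairseq:

```python
from fairseq.scoring import BaseScorer, register_scorer


@register_scorer("exact_match")                     # hypothetical scorer name
class ExactMatchScorer(BaseScorer):
    """Toy metric: percentage of predictions identical to their reference."""

    def score(self) -> float:
        matches = sum(r == p for r, p in zip(self.ref, self.pred))
        return 100.0 * matches / len(self.ref) if self.ref else 0.0

    def result_string(self) -> str:
        return f"Exact match: {self.score():.2f}"


scorer = ExactMatchScorer(cfg=None)
scorer.add_string("a b c", "a b c")
scorer.add_string("a b c", "a b d")
print(scorer.result_string())                       # Exact match: 50.00
```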
+ + +import importlib +import os +from abc import ABC, abstractmethod + +from fairseq import registry +from omegaconf import DictConfig + + +class BaseScorer(ABC): + def __init__(self, cfg): + self.cfg = cfg + self.ref = [] + self.pred = [] + + def add_string(self, ref, pred): + self.ref.append(ref) + self.pred.append(pred) + + @abstractmethod + def score(self) -> float: + pass + + @abstractmethod + def result_string(self) -> str: + pass + + +_build_scorer, register_scorer, SCORER_REGISTRY, _ = registry.setup_registry( + "--scoring", default="bleu" +) + + +def build_scorer(choice, tgt_dict): + _choice = choice._name if isinstance(choice, DictConfig) else choice + + if _choice == "bleu": + from fairseq.scoring import bleu + + return bleu.Scorer( + bleu.BleuConfig(pad=tgt_dict.pad(), eos=tgt_dict.eos(), unk=tgt_dict.unk()) + ) + return _build_scorer(choice) + + +# automatically import any Python files in the current directory +for file in sorted(os.listdir(os.path.dirname(__file__))): + if file.endswith(".py") and not file.startswith("_"): + module = file[: file.find(".py")] + importlib.import_module("fairseq.scoring." + module) diff --git a/SpeechT5/fairseq/fairseq/scoring/bleu.py b/SpeechT5/fairseq/fairseq/scoring/bleu.py new file mode 100644 index 0000000000000000000000000000000000000000..97de5f966ec08e5a304c41358e67755c601622b7 --- /dev/null +++ b/SpeechT5/fairseq/fairseq/scoring/bleu.py @@ -0,0 +1,167 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +import ctypes +import math +import sys +from dataclasses import dataclass, field + +import torch +from fairseq.dataclass import FairseqDataclass +from fairseq.scoring import BaseScorer, register_scorer +from fairseq.scoring.tokenizer import EvaluationTokenizer + + +class BleuStat(ctypes.Structure): + _fields_ = [ + ("reflen", ctypes.c_size_t), + ("predlen", ctypes.c_size_t), + ("match1", ctypes.c_size_t), + ("count1", ctypes.c_size_t), + ("match2", ctypes.c_size_t), + ("count2", ctypes.c_size_t), + ("match3", ctypes.c_size_t), + ("count3", ctypes.c_size_t), + ("match4", ctypes.c_size_t), + ("count4", ctypes.c_size_t), + ] + + +@dataclass +class SacrebleuConfig(FairseqDataclass): + sacrebleu_tokenizer: EvaluationTokenizer.ALL_TOKENIZER_TYPES = field( + default="13a", metadata={"help": "tokenizer"} + ) + sacrebleu_lowercase: bool = field( + default=False, metadata={"help": "apply lowercasing"} + ) + sacrebleu_char_level: bool = field( + default=False, metadata={"help": "evaluate at character level"} + ) + + +@register_scorer("sacrebleu", dataclass=SacrebleuConfig) +class SacrebleuScorer(BaseScorer): + def __init__(self, cfg): + super(SacrebleuScorer, self).__init__(cfg) + import sacrebleu + + self.sacrebleu = sacrebleu + self.tokenizer = EvaluationTokenizer( + tokenizer_type=cfg.sacrebleu_tokenizer, + lowercase=cfg.sacrebleu_lowercase, + character_tokenization=cfg.sacrebleu_char_level, + ) + + def add_string(self, ref, pred): + self.ref.append(self.tokenizer.tokenize(ref)) + self.pred.append(self.tokenizer.tokenize(pred)) + + def score(self, order=4): + return self.result_string(order).score + + def result_string(self, order=4): + if order != 4: + raise NotImplementedError + # tokenization and lowercasing are performed by self.tokenizer instead. 
+ return self.sacrebleu.corpus_bleu( + self.pred, [self.ref], tokenize="none" + ).format() + + +@dataclass +class BleuConfig(FairseqDataclass): + pad: int = field(default=1, metadata={"help": "padding index"}) + eos: int = field(default=2, metadata={"help": "eos index"}) + unk: int = field(default=3, metadata={"help": "unk index"}) + + +@register_scorer("bleu", dataclass=BleuConfig) +class Scorer(object): + def __init__(self, cfg): + self.stat = BleuStat() + self.pad = cfg.pad + self.eos = cfg.eos + self.unk = cfg.unk + + try: + from fairseq import libbleu + except ImportError as e: + sys.stderr.write( + "ERROR: missing libbleu.so. run `pip install --editable .`\n" + ) + raise e + + self.C = ctypes.cdll.LoadLibrary(libbleu.__file__) + + self.reset() + + def reset(self, one_init=False): + if one_init: + self.C.bleu_one_init(ctypes.byref(self.stat)) + else: + self.C.bleu_zero_init(ctypes.byref(self.stat)) + + def add(self, ref, pred): + if not isinstance(ref, torch.IntTensor): + raise TypeError("ref must be a torch.IntTensor (got {})".format(type(ref))) + if not isinstance(pred, torch.IntTensor): + raise TypeError("pred must be a torch.IntTensor(got {})".format(type(pred))) + + # don't match unknown words + rref = ref.clone() + assert not rref.lt(0).any() + rref[rref.eq(self.unk)] = -999 + + rref = rref.contiguous().view(-1) + pred = pred.contiguous().view(-1) + + self.C.bleu_add( + ctypes.byref(self.stat), + ctypes.c_size_t(rref.size(0)), + ctypes.c_void_p(rref.data_ptr()), + ctypes.c_size_t(pred.size(0)), + ctypes.c_void_p(pred.data_ptr()), + ctypes.c_int(self.pad), + ctypes.c_int(self.eos), + ) + + def score(self, order=4): + psum = sum( + math.log(p) if p > 0 else float("-Inf") for p in self.precision()[:order] + ) + return self.brevity() * math.exp(psum / order) * 100 + + def precision(self): + def ratio(a, b): + return a / b if b > 0 else 0 + + return [ + ratio(self.stat.match1, self.stat.count1), + ratio(self.stat.match2, self.stat.count2), + ratio(self.stat.match3, self.stat.count3), + ratio(self.stat.match4, self.stat.count4), + ] + + def brevity(self): + r = self.stat.reflen / self.stat.predlen + return min(1, math.exp(1 - r)) + + def result_string(self, order=4): + assert order <= 4, "BLEU scores for order > 4 aren't supported" + fmt = "BLEU{} = {:2.2f}, {:2.1f}" + for _ in range(1, order): + fmt += "/{:2.1f}" + fmt += " (BP={:.3f}, ratio={:.3f}, syslen={}, reflen={})" + bleup = [p * 100 for p in self.precision()[:order]] + return fmt.format( + order, + self.score(order=order), + *bleup, + self.brevity(), + self.stat.predlen / self.stat.reflen, + self.stat.predlen, + self.stat.reflen + ) diff --git a/SpeechT5/fairseq/fairseq/scoring/chrf.py b/SpeechT5/fairseq/fairseq/scoring/chrf.py new file mode 100644 index 0000000000000000000000000000000000000000..0d6cb77383a44d9ac739958b79a30764f1fbf7f3 --- /dev/null +++ b/SpeechT5/fairseq/fairseq/scoring/chrf.py @@ -0,0 +1,27 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. 
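`chrf.py`, next, simply delegates to sacrebleu's corpus-level chrF implementation. A minimal direct-usage sketch, assuming a sacrebleu version compatible with this wrapper (the example sentences are made up):

```python
import sacrebleu

hyps = ["the cat sat on the mat"]
refs = ["the cat sat on a mat"]

# corpus_chrf takes system outputs plus a list of reference streams,
# mirroring the ChrFScorer.result_string() call below.
print(sacrebleu.corpus_chrf(hyps, [refs]).format())
```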
+ +from fairseq.scoring import BaseScorer, register_scorer + + +@register_scorer("chrf") +class ChrFScorer(BaseScorer): + def __init__(self, args): + super(ChrFScorer, self).__init__(args) + import sacrebleu + + self.sacrebleu = sacrebleu + + def add_string(self, ref, pred): + self.ref.append(ref) + self.pred.append(pred) + + def score(self, order=4): + return self.result_string(order).score + + def result_string(self, order=4): + if order != 4: + raise NotImplementedError + return self.sacrebleu.corpus_chrf(self.pred, [self.ref]).format() diff --git a/SpeechT5/fairseq/fairseq/scoring/tokenizer.py b/SpeechT5/fairseq/fairseq/scoring/tokenizer.py new file mode 100644 index 0000000000000000000000000000000000000000..61cf6d4a7cc698258caad9f68f2e8559dd510eee --- /dev/null +++ b/SpeechT5/fairseq/fairseq/scoring/tokenizer.py @@ -0,0 +1,67 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +import unicodedata + +from fairseq.dataclass import ChoiceEnum + + +class EvaluationTokenizer(object): + """A generic evaluation-time tokenizer, which leverages built-in tokenizers + in sacreBLEU (https://github.com/mjpost/sacrebleu). It additionally provides + lowercasing, punctuation removal and character tokenization, which are + applied after sacreBLEU tokenization. + + Args: + tokenizer_type (str): the type of sacreBLEU tokenizer to apply. + lowercase (bool): lowercase the text. + punctuation_removal (bool): remove punctuation (based on unicode + category) from text. + character_tokenization (bool): tokenize the text to characters. + """ + + SPACE = chr(32) + SPACE_ESCAPE = chr(9601) + ALL_TOKENIZER_TYPES = ChoiceEnum(["none", "13a", "intl", "zh", "ja-mecab"]) + + def __init__( + self, + tokenizer_type: str = "13a", + lowercase: bool = False, + punctuation_removal: bool = False, + character_tokenization: bool = False, + ): + from sacrebleu.tokenizers import TOKENIZERS + + assert tokenizer_type in TOKENIZERS, f"{tokenizer_type}, {TOKENIZERS}" + self.lowercase = lowercase + self.punctuation_removal = punctuation_removal + self.character_tokenization = character_tokenization + self.tokenizer = TOKENIZERS[tokenizer_type] + + @classmethod + def remove_punctuation(cls, sent: str): + """Remove punctuation based on Unicode category.""" + return cls.SPACE.join( + t + for t in sent.split(cls.SPACE) + if not all(unicodedata.category(c)[0] == "P" for c in t) + ) + + def tokenize(self, sent: str): + tokenized = self.tokenizer()(sent) + + if self.punctuation_removal: + tokenized = self.remove_punctuation(tokenized) + + if self.character_tokenization: + tokenized = self.SPACE.join( + list(tokenized.replace(self.SPACE, self.SPACE_ESCAPE)) + ) + + if self.lowercase: + tokenized = tokenized.lower() + + return tokenized diff --git a/SpeechT5/fairseq/fairseq/scoring/wer.py b/SpeechT5/fairseq/fairseq/scoring/wer.py new file mode 100644 index 0000000000000000000000000000000000000000..633dc47c247691c4c9e36cbdbab7d7cb74b38452 --- /dev/null +++ b/SpeechT5/fairseq/fairseq/scoring/wer.py @@ -0,0 +1,58 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. 
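+
+# Word error rate: edit distance (via the `editdistance` package) between the
+# tokenized hypothesis and reference, normalized by the reference length.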
+ +from dataclasses import dataclass, field + +from fairseq.dataclass import FairseqDataclass +from fairseq.scoring import BaseScorer, register_scorer +from fairseq.scoring.tokenizer import EvaluationTokenizer + + +@dataclass +class WerScorerConfig(FairseqDataclass): + wer_tokenizer: EvaluationTokenizer.ALL_TOKENIZER_TYPES = field( + default="none", metadata={"help": "sacreBLEU tokenizer to use for evaluation"} + ) + wer_remove_punct: bool = field( + default=False, metadata={"help": "remove punctuation"} + ) + wer_char_level: bool = field( + default=False, metadata={"help": "evaluate at character level"} + ) + wer_lowercase: bool = field(default=False, metadata={"help": "lowercasing"}) + + +@register_scorer("wer", dataclass=WerScorerConfig) +class WerScorer(BaseScorer): + def __init__(self, cfg): + super().__init__(cfg) + self.reset() + try: + import editdistance as ed + except ImportError: + raise ImportError("Please install editdistance to use WER scorer") + self.ed = ed + self.tokenizer = EvaluationTokenizer( + tokenizer_type=self.cfg.wer_tokenizer, + lowercase=self.cfg.wer_lowercase, + punctuation_removal=self.cfg.wer_remove_punct, + character_tokenization=self.cfg.wer_char_level, + ) + + def reset(self): + self.distance = 0 + self.ref_length = 0 + + def add_string(self, ref, pred): + ref_items = self.tokenizer.tokenize(ref).split() + pred_items = self.tokenizer.tokenize(pred).split() + self.distance += self.ed.eval(ref_items, pred_items) + self.ref_length += len(ref_items) + + def result_string(self): + return f"WER: {self.score():.2f}" + + def score(self): + return 100.0 * self.distance / self.ref_length if self.ref_length > 0 else 0 diff --git a/SpeechT5/fairseq/fairseq/search.py b/SpeechT5/fairseq/fairseq/search.py new file mode 100644 index 0000000000000000000000000000000000000000..d5ea68b4ce04409c504c1d22098b7968a9ce596a --- /dev/null +++ b/SpeechT5/fairseq/fairseq/search.py @@ -0,0 +1,814 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +import math +from typing import List, Optional + +import torch +import torch.nn as nn +from fairseq.token_generation_constraints import ( + ConstraintState, + OrderedConstraintState, + UnorderedConstraintState, +) +from torch import Tensor + + +class Search(nn.Module): + def __init__(self, tgt_dict): + super().__init__() + self.pad = tgt_dict.pad() + self.unk = tgt_dict.unk() + self.eos = tgt_dict.eos() + self.vocab_size = len(tgt_dict) + self.src_lengths = torch.tensor(-1) + self.supports_constraints = False + self.stop_on_max_len = False + + def step( + self, step, lprobs, scores, prev_output_tokens=None, original_batch_idxs=None + ): + """Take a single search step. 
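+        Concrete subclasses below (BeamSearch, Sampling, etc.) implement the actual
+        candidate selection; this base implementation raises NotImplementedError.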
+ + Args: + step: the current search step, starting at 0 + lprobs: (bsz x input_beam_size x vocab_size) + the model's log-probabilities over the vocabulary at the current step + scores: (bsz x input_beam_size x step) + the historical model scores of each hypothesis up to this point + prev_output_tokens: (bsz x step) + the previously generated oputput tokens + original_batch_idxs: (bsz) + the tensor with the batch indices, in the range [0, bsz) + this is useful in case there has been applied a re-ordering + and we need to know the orignal indices + + Return: A tuple of (scores, indices, beams) where: + scores: (bsz x output_beam_size) + the scores of the chosen elements; output_beam_size can be + larger than input_beam_size, e.g., we may return + 2*input_beam_size to account for EOS + indices: (bsz x output_beam_size) + the indices of the chosen elements + beams: (bsz x output_beam_size) + the hypothesis ids of the chosen elements, in the range [0, input_beam_size) + """ + raise NotImplementedError + + @torch.jit.export + def set_src_lengths(self, src_lengths): + self.src_lengths = src_lengths + + @torch.jit.export + def init_constraints(self, batch_constraints: Optional[Tensor], beam_size: int): + """Initialize constraint states for constrained decoding (if supported). + + Args: + batch_constraints: (torch.Tensor, optional) + the list of constraints, in packed form + beam_size: (int) + the beam size + Returns: + *encoder_out* rearranged according to *new_order* + """ + pass + + def prune_sentences(self, batch_idxs: Tensor): + """ + Removes constraint states for completed sentences (if supported). + This is called from sequence_generator._generate() when sentences are + deleted from the batch. + + Args: + batch_idxs: Indices of *sentences* whose constraint state should be *kept*. + """ + pass + + def update_constraints(self, active_hypos: Tensor): + """ + Updates the constraint states by selecting the beam items that are retained. + This is called at each time step of sequence_generator._generate() when + the set of 2 * {beam_size} candidate hypotheses are reduced to the beam size. + + Args: + active_hypos: (batch size, beam size) + list of integers denoting, for each sentence, which beam candidate items + should be kept. + """ + pass + + +class BeamSearch(Search): + def __init__(self, tgt_dict): + super().__init__(tgt_dict) + self.constraint_states = None + + @torch.jit.export + def step( + self, + step: int, + lprobs, + scores: Optional[Tensor], + prev_output_tokens: Optional[Tensor] = None, + original_batch_idxs: Optional[Tensor] = None, + ): + bsz, beam_size, vocab_size = lprobs.size() + + if step == 0: + # at the first step all hypotheses are equally likely, so use + # only the first beam + lprobs = lprobs[:, ::beam_size, :].contiguous() + else: + # make probs contain cumulative scores for each hypothesis + assert scores is not None + lprobs = lprobs + scores[:, :, step - 1].unsqueeze(-1) + + top_prediction = torch.topk( + lprobs.view(bsz, -1), + k=min( + # Take the best 2 x beam_size predictions. We'll choose the first + # beam_size of these which don't predict eos to continue with. 
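+                # Capping k here means that even if beam_size of the candidates
+                # end in EOS, enough remain to refill the beam.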
+ beam_size * 2, + lprobs.view(bsz, -1).size(1) - 1, # -1 so we never select pad + ), + ) + scores_buf = top_prediction[0] + indices_buf = top_prediction[1] + # Project back into relative indices and beams + beams_buf = indices_buf // vocab_size + indices_buf = indices_buf.fmod(vocab_size) + + # At this point, beams_buf and indices_buf are single-dim and contain relative indices + return scores_buf, indices_buf, beams_buf + + +class PrefixConstrainedBeamSearch(Search): + def __init__(self, tgt_dict, prefix_allowed_tokens_fn): + super().__init__(tgt_dict) + self.prefix_allowed_tokens_fn = prefix_allowed_tokens_fn + self.stop_on_max_len = True + + @torch.jit.export + def apply_mask(self, x, prev_output_tokens, original_batch_idxs): + beam_size = x.shape[0] // original_batch_idxs.shape[0] + original_batch_idxs = ( + original_batch_idxs.unsqueeze(-1).repeat((1, beam_size)).flatten().tolist() + ) + + mask = torch.full_like(x, -math.inf) + for sent_i, (sent, batch_i) in enumerate( + zip(prev_output_tokens, original_batch_idxs) + ): + mask[sent_i, :, self.prefix_allowed_tokens_fn(batch_i, sent)] = 0 + + return mask + + @torch.jit.export + def step( + self, + step: int, + lprobs: Tensor, + scores: Tensor, + prev_output_tokens: Tensor, + original_batch_idxs: Tensor, + ): + bsz, beam_size, vocab_size = lprobs.size() + + lprobs += self.apply_mask( + lprobs.view(bsz * beam_size, 1, vocab_size), + prev_output_tokens, + original_batch_idxs, + ).view(bsz, beam_size, vocab_size) + + if step == 0: + # at the first step all hypotheses are equally likely, so use + # only the first beam + lprobs = lprobs[:, ::beam_size, :].contiguous() + else: + # make probs contain cumulative scores for each hypothesis + assert scores is not None + lprobs = lprobs + scores[:, :, step - 1].unsqueeze(-1) + + top_prediction = torch.topk( + lprobs.view(bsz, -1), + k=min( + # Take the best beam_size predictions. We'll choose the first + # beam_size of these which don't predict eos to continue with. + beam_size, + lprobs.view(bsz, -1).size(1) - 1, # -1 so we never select pad + ), + ) + scores_buf = top_prediction[0] + indices_buf = top_prediction[1] + beams_buf = indices_buf // vocab_size + indices_buf = indices_buf.fmod(vocab_size) + return scores_buf, indices_buf, beams_buf + + +class LexicallyConstrainedBeamSearch(Search): + """Implements lexically constrained beam search as described in + + Fast Lexically Constrained Decoding with Dynamic Beam + Allocation for Neural Machine Translation. Post & Vilar, + NAACL 2018. https://www.aclweb.org/anthology/N18-1119/ + + and + + Improved Lexically Constrained Decoding for Translation and + Monolingual Rewriting. Hu et al, NAACL + 2019. https://www.aclweb.org/anthology/N19-1090/ + + This is accomplished by maintaining, for each beam hypothesis, a + ConstraintState object (see constraints.py) that tracks which + constraints have been generated and using this information to + shape the beam for each input sentence. 
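+    The `representation` argument ("ordered" or "unordered") selects which
+    ConstraintState implementation is used to track progress through the constraints.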
+ """ + + def __init__(self, tgt_dict, representation): + super().__init__(tgt_dict) + self.representation = representation + self.vocab_size = len(tgt_dict) + self.num_cands = 0 + self.supports_constraints = True + + @torch.jit.export + def init_constraints(self, batch_constraints: Optional[Tensor], beam_size: int): + self.constraint_states = [] + for constraint_tensor in batch_constraints: + if self.representation == "ordered": + constraint_state = OrderedConstraintState.create(constraint_tensor) + elif self.representation == "unordered": + constraint_state = UnorderedConstraintState.create(constraint_tensor) + + self.constraint_states.append([constraint_state for i in range(beam_size)]) + + @torch.jit.export + def prune_sentences(self, batch_idxs: Tensor): + self.constraint_states = [ + self.constraint_states[i] for i in batch_idxs.tolist() + ] + + @torch.jit.export + def update_constraints(self, active_hypos: Tensor): + if self.constraint_states: + batch_size = active_hypos.size(0) + for sentid in range(batch_size): + self.constraint_states[sentid] = [ + self.constraint_states[sentid][i] for i in active_hypos[sentid] + ] + + @torch.jit.export + def step( + self, + step: int, + lprobs: Tensor, + scores: Optional[Tensor], + prev_output_tokens: Optional[Tensor] = None, + original_batch_idxs: Optional[Tensor] = None, + ): + """ + A constrained step builds a large candidates list from the following: + - the top 2 * {beam_size} items over the whole beam + - for each item in the beam + - the top {each_k} (default 1) + - all next constraints + We then compute the constrained state of each beam item, and assign + stripe codes: 0 to the best in each bank, 1 to the 2nd-best, and so + on. We then sort by (stripe, score), and truncate the list at + 2 * beam size. + + Args: + step: the decoder step + lprobs: (batch size, beam size, target vocab) + the target-vocab distributions for each item in the beam. + Retrun: A tuple of (scores, indices, beams, constraints) where: + scores: (batch, output beam size) + the scores of the chosen elements + indices: (batch, output beam size) + the target vocab indices of the chosen elements + beams: (batch, output beam size) + the 0-indexed hypothesis ids of the chosen elements + constraints: (batch, output beam size) + the new constraint states + """ + each_k = 1 + device = lprobs.device + + batch_size, beam_size, vocab_size = lprobs.size() + + self.num_cands = min( + # Just take the k-best. We'll get another k from the 1-best from each + # row, plus more from the constraints + beam_size * 2, + lprobs.view(batch_size, -1).size(1) - 1, # -1 so we never select pad + ) + + # STEP 0: Preliminary. 
Prevent EOS for unfinished hyps across all batch items + constraint_states = self.constraint_states + if constraint_states and step > 0: + not_finished_indices = [] + for sentno, sent_constraints in enumerate(constraint_states): + for beamno, state in enumerate(sent_constraints): + index = sentno * beam_size + beamno + if not state.finished: + not_finished_indices.append(index) + not_finished_indices = torch.tensor(not_finished_indices) + if not_finished_indices.numel() > 0: + lprobs.view(batch_size * beam_size, -1)[ + not_finished_indices, self.eos + ] = -math.inf + + if step == 0: + # at the first step all hypotheses are equally likely, so use + # only the first beam entry for each batch item + lprobs = lprobs[:, ::beam_size, :].contiguous() + else: + # make probs contain cumulative scores for each hypothesis + assert scores is not None + lprobs = lprobs + scores[:, :, step - 1].unsqueeze(-1) + + top_prediction = torch.topk( + lprobs.view(batch_size, -1), + self.num_cands, + ) + scores_buf, indices_buf = top_prediction + # Project back into relative indices and beams + beams_buf = indices_buf // vocab_size + indices_buf = indices_buf.fmod(vocab_size) + + # Short circuit if there are no constraints in this batch + if not constraint_states: + return scores_buf, indices_buf, beams_buf + + # STEP 1: get top-1 from each hypothesis across all sentences in the batch + if step > 0: + top_scores, top_indices = torch.topk( + lprobs.view(batch_size * beam_size, -1), + k=each_k, + dim=1, + ) + top_scores = top_scores.view(batch_size, -1) + top_indices = top_indices.view(batch_size, -1) + scores_buf = torch.cat((scores_buf, top_scores), dim=1) + indices_buf = torch.cat((indices_buf, top_indices), dim=1) + new_beams = torch.arange(0, beam_size, device=device).repeat(batch_size, 1) + beams_buf = torch.cat((beams_buf, new_beams), dim=1) + + # Now, process sentences in the batch one by one. + new_scores_buf = torch.zeros((batch_size, 2 * beam_size), device=device) + new_indices_buf = torch.zeros((batch_size, 2 * beam_size), device=device).long() + new_beams_buf = torch.zeros((batch_size, 2 * beam_size), device=device).long() + for sentno, states in enumerate(constraint_states): + scores, indices, beams, new_states = self.step_sentence( + step, + sentno, + lprobs[sentno], + constraint_states[sentno], + beams_buf[sentno].clone(), + indices_buf[sentno].clone(), + scores_buf[sentno].clone(), + ) + new_scores_buf[sentno] = scores + new_indices_buf[sentno] = indices + new_beams_buf[sentno] = beams + self.constraint_states[sentno] = new_states + + return new_scores_buf, new_indices_buf, new_beams_buf + + @torch.jit.export + def step_sentence( + self, + step: int, + sentno: int, + lprobs: Tensor, + constraint_states: List[List[ConstraintState]], + beams_buf: Tensor, + indices_buf: Tensor, + scores_buf: Tensor, + ): + """Does per-sentence processing. Adds all constraints for each + hypothesis to the list of candidates; then removes duplicates, + sorts, and dynamically stripes across the banks. All tensor inputs + are collapsed to those pertaining to a single input sentence. 
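+        Returns the updated (scores, indices, beams, constraint_states) for this sentence.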
+ """ + device = lprobs.device + + # STEP 2: Add all constraints for each beam item + for beamno, state in enumerate(constraint_states): + next_tokens = torch.tensor(list(state.next_tokens()), device=device).long() + if next_tokens.numel() != 0: + indices_buf = torch.cat((indices_buf, next_tokens)) + next_beams = ( + torch.tensor(beamno, device=device) + .repeat(next_tokens.size(0)) + .long() + ) + beams_buf = torch.cat((beams_buf, next_beams)) + next_values = lprobs[beamno].take(next_tokens.view(-1)) + scores_buf = torch.cat((scores_buf, next_values)) + + # At the 0th time step, there is just one beam item + if step == 0: + break + + # STEP 3: Compute the "bank" for each candidate. This is the + # number of constraints it's generated. We need this so that + # we can do round-robin allocation of the beam across these + # banks. If C is the number of constraints, we select the best + # item in bank C, then the best in bank C-1, etc, followed by + # the 2nd-best in bank C, the 2nd-best in bank C-1, etc, and so + # on, until the maximum beam size. We accomplish this by + # creating a sort key and striping across the banks. + + # Compute the new states for all candidates + cands_size = indices_buf.size(0) + constraint_states = [ + constraint_states[beams_buf[i]].advance(indices_buf[i]) + for i in range(cands_size) + ] + + banks = torch.tensor([state.bank for state in constraint_states], device=device) + + # STEP 4: Sort + num_constraint_tokens = len(state.tokens) + + # Sort by keys (bank, score) (i.e., sort banks together, and scores + # within banks). AFAIK pytorch doesn't support either stable sort or + # multi-key sorting, so we have to hack this. + MAX_SCORE = -100 + sort_key = (num_constraint_tokens - banks) * MAX_SCORE + scores_buf + sort_values, sort_indices = sort_key.sort(dim=0, descending=True) + scores_buf = scores_buf[sort_indices] + indices_buf = indices_buf[sort_indices] + beams_buf = beams_buf[sort_indices] + banks = banks[sort_indices] + + # Sort the constraints to follow suit + constraint_states = [constraint_states[i] for i in sort_indices] + + # STEP 5: Remove duplicates. The topk calls (overall and + # per-row) plus the per-row generation of constraints will + # produce duplicates. Here we remove them. + + def roll(t): + """Rolls a 1d tensor left by 1. + + [0, 1, 2, 3, 4] becomes [4, 0, 1, 2, 3] + """ + return torch.cat((t[-1].unsqueeze(0), t[0:-1]), dim=0) + + # We map candidates (beam, token_id) to a single dimension. + # This is then shifted by 1. We can then easily identify + # duplicates and create a mask that identifies unique + # extensions. + uniques_mask = beams_buf * (self.vocab_size + 1) + indices_buf + uniques_mask = roll(uniques_mask) != uniques_mask + + # Use the mask to pare down the data structures + scores_buf = torch.masked_select(scores_buf, uniques_mask) + indices_buf = torch.masked_select(indices_buf, uniques_mask) + beams_buf = torch.masked_select(beams_buf, uniques_mask) + banks = torch.masked_select(banks, uniques_mask) + i = 1 + for mask in uniques_mask[1:]: + if not mask: + constraint_states.pop(i) + i += mask + + # STEP 6: Assign IDs round-robin across banks, sort, and + # truncate. Now that the candidates are sorted by (bank, + # score) and uniqed, we dynamically allocate the {beam_size} + # beam by striping across the candidates. These stripes will + # be used as sort keys to do round-robin selection. This is + # accomplished in a single pass with offsets. 
Sorting by + # highest-banks (furthest-along hypotheses) first ensures + # progress through the constraints. + # + # e.g., BANKS: 3 3 3 2 2 2 2 1 1 1 0 0 + # OLD STRIPES: 0 1 2 0 1 2 3 0 1 2 0 1 + # NEW STRIPES: 0 1+4 2+8 0+1 1+5 2+9 3+11 0+2 1+6 2+10 0+3 1+7 + # = 0 5 10 1 6 11 13 2 7 12 3 8 + # + # Sorting by this then gives the following banks: + # + # 3 2 1 0 3 2 1 0 3 2 1 2 + # + # We'll take the top {beam_size} of these. + stripe_offsets = [offset * (len(banks) + 1) for offset in range(len(banks) + 1)] + stripes = torch.zeros_like(banks) + cur_bank_count = -1 + cur_bank = banks[0] + for i, bank in enumerate(banks): + if bank != cur_bank: + cur_bank_count = 0 + cur_bank = bank + else: + cur_bank_count += 1 + stripes[i] = num_constraint_tokens - bank + stripe_offsets[cur_bank_count] + + # STEP 7: Sort by the stripes values + sort_values, sort_indices = stripes.sort(dim=0) + scores_buf = scores_buf[sort_indices] + indices_buf = indices_buf[sort_indices] + beams_buf = beams_buf[sort_indices] + constraint_states = [constraint_states[i] for i in sort_indices] + + # STEP 8: Truncate to the candidates size! + scores_buf = scores_buf[: self.num_cands] + indices_buf = indices_buf[: self.num_cands] + beams_buf = beams_buf[: self.num_cands] + + return scores_buf, indices_buf, beams_buf, constraint_states + + +class LengthConstrainedBeamSearch(Search): + def __init__(self, tgt_dict, min_len_a, min_len_b, max_len_a, max_len_b): + super().__init__(tgt_dict) + self.min_len_a = min_len_a + self.min_len_b = min_len_b + self.max_len_a = max_len_a + self.max_len_b = max_len_b + self.beam = BeamSearch(tgt_dict) + self.needs_src_lengths = True + + def step( + self, + step: int, + lprobs, + scores, + prev_output_tokens: Optional[Tensor] = None, + original_batch_idxs: Optional[Tensor] = None, + ): + min_lens = self.min_len_a * self.src_lengths + self.min_len_b + max_lens = self.max_len_a * self.src_lengths + self.max_len_b + lprobs[step < min_lens, :, self.eos] = -math.inf + lprobs[step >= max_lens, :, self.eos] = 0 + return self.beam.step(step, lprobs, scores) + + +class DiverseBeamSearch(Search): + """Diverse Beam Search. + + See "Diverse Beam Search: Decoding Diverse Solutions from Neural Sequence + Models" for details. + + We only implement the Hamming Diversity penalty here, which performed best + in the original paper. 
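+    Beams are partitioned into `num_groups` groups; each group after the first is
+    penalized by `diversity_strength` for tokens already selected by earlier groups
+    at the same step.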
+ """ + + def __init__(self, tgt_dict, num_groups, diversity_strength): + super().__init__(tgt_dict) + self.num_groups = num_groups + self.diversity_strength = -diversity_strength + self.beam = BeamSearch(tgt_dict) + + @torch.jit.export + def step( + self, + step: int, + lprobs, + scores, + prev_output_tokens: Optional[Tensor] = None, + original_batch_idxs: Optional[Tensor] = None, + ): + bsz, beam_size, vocab_size = lprobs.size() + if beam_size % self.num_groups != 0: + raise ValueError( + "DiverseBeamSearch requires --beam to be divisible by the number of groups" + ) + + # initialize diversity penalty + diversity_buf = torch.zeros(lprobs[:, 0, :].size()).to(lprobs) + + scores_G, indices_G, beams_G = [], [], [] + for g in range(self.num_groups): + lprobs_g = lprobs[:, g :: self.num_groups, :] + scores_g = scores[:, g :: self.num_groups, :] if step > 0 else None + + # apply diversity penalty + if g > 0: + lprobs_g = torch.add( + lprobs_g, + other=diversity_buf.unsqueeze(1), + alpha=self.diversity_strength, + ) + else: + lprobs_g = lprobs_g.contiguous() + + scores_buf, indices_buf, beams_buf = self.beam.step( + step, lprobs_g, scores_g + ) + beams_buf.mul_(self.num_groups).add_(g) + + scores_G.append(scores_buf.clone()) + indices_G.append(indices_buf.clone()) + beams_G.append(beams_buf.clone()) + + # update diversity penalty + diversity_buf.scatter_add_( + 1, indices_buf, torch.ones(indices_buf.size()).to(diversity_buf) + ) + + # interleave results from different groups + scores_buf = torch.stack(scores_G, dim=2).view(bsz, -1) + indices_buf = torch.stack(indices_G, dim=2).view(bsz, -1) + beams_buf = torch.stack(beams_G, dim=2).view(bsz, -1) + return scores_buf, indices_buf, beams_buf + + +class Sampling(Search): + sampling_topk: int + sampling_topp: float + + def __init__(self, tgt_dict, sampling_topk=-1, sampling_topp=-1.0): + super().__init__(tgt_dict) + self.sampling_topk = sampling_topk + self.sampling_topp = sampling_topp + + def _sample_topp(self, lprobs): + """Sample among the smallest set of elements whose cumulative probability mass exceeds p. + + See `"The Curious Case of Neural Text Degeneration" + (Holtzman et al., 2019) <https://arxiv.org/abs/1904.09751>`_. + + Args: + lprobs: (bsz x input_beam_size x vocab_size) + the model's log-probabilities over the vocabulary at the current step + + Return: A tuple of (trimed_probs, truncated_indices) where: + trimed_probs: (bsz x input_beam_size x ?) + the model's probabilities over the elements selected to sample from. The + width of the third dimension is determined by top-P. + truncated_indices: (bsz x input_beam_size x ?) + the indices of the chosen elements. + """ + probs = lprobs.exp_() + + # sort the last dimension (vocab dimension) in descending order + sorted_probs, sorted_indices = probs.sort(descending=True) + + # compute a mask to indicate the words to be included in the top-P set. + cumsum_probs = sorted_probs.cumsum(dim=2) + mask = cumsum_probs.lt(self.sampling_topp) + + # note that mask was computed by 'lt'. One more word needs to be included + # so that the cumulative probability mass can exceed p. + cumsum_mask = mask.cumsum(dim=2) + last_included = cumsum_mask[:, :, -1:] + last_included.clamp_(0, mask.size()[2] - 1) + mask = mask.scatter_(2, last_included, 1) + + # truncate unnecessary dims. 
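+        # max_dim is the largest cutoff position needed by any (batch, beam) row, so
+        # sorted entries beyond max_dim + 1 can never be sampled and are dropped.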
+ max_dim = last_included.max() + truncated_mask = mask[:, :, : max_dim + 1] + truncated_probs = sorted_probs[:, :, : max_dim + 1] + truncated_indices = sorted_indices[:, :, : max_dim + 1] + + # trim the words that are not in top-P by setting their probabilities + # to 0, so that they would not be sampled later. + trim_mask = ~truncated_mask + trimed_probs = truncated_probs.masked_fill_(trim_mask, 0) + return trimed_probs, truncated_indices + + @torch.jit.export + def step( + self, + step: int, + lprobs, + scores, + prev_output_tokens: Optional[Tensor] = None, + original_batch_idxs: Optional[Tensor] = None, + ): + bsz, beam_size, vocab_size = lprobs.size() + + if step == 0: + # at the first step all hypotheses are equally likely, so use + # only the first beam + lprobs = lprobs[:, ::beam_size, :].contiguous() + + if self.sampling_topp > 0: + # only sample from the smallest set of words whose cumulative probability mass exceeds p + probs, top_indices = self._sample_topp(lprobs) + elif self.sampling_topk > 0: + # only sample from top-k candidates + lprobs, top_indices = lprobs.topk(self.sampling_topk) + probs = lprobs.exp_() + else: + probs = lprobs.exp_() + + # dummy data to be consistent with true branch for type check + top_indices = torch.empty(0).to(probs) + # sample + if step == 0: + indices_buf = torch.multinomial( + probs.view(bsz, -1), + beam_size, + replacement=True, + ).view(bsz, beam_size) + else: + indices_buf = torch.multinomial( + probs.view(bsz * beam_size, -1), + 1, + replacement=True, + ).view(bsz, beam_size) + + if step == 0: + # expand to beam size + probs = probs.expand(bsz, beam_size, -1) + + # gather scores + scores_buf = torch.gather(probs, dim=2, index=indices_buf.unsqueeze(-1)) + scores_buf = scores_buf.log_().view(bsz, -1) + + # remap indices if using top-k or top-P sampling + if self.sampling_topk > 0 or self.sampling_topp > 0: + indices_buf = torch.gather( + top_indices.expand(bsz, beam_size, -1), + dim=2, + index=indices_buf.unsqueeze(-1), + ).squeeze(2) + + if step == 0: + beams_buf = indices_buf.new_zeros(bsz, beam_size) + else: + beams_buf = torch.arange(0, beam_size).to(indices_buf).repeat(bsz, 1) + # make scores cumulative + scores_buf.add_( + torch.gather(scores[:, :, step - 1], dim=1, index=beams_buf) + ) + + return scores_buf, indices_buf, beams_buf + + +class DiverseSiblingsSearch(Search): + """ + Beam search with diverse siblings. + + See "A Simple, Fast Diverse Decoding Algorithm for Neural Generation" for details. + https://arxiv.org/abs/1611.08562 + + 1/ Calculate hypotheses for each beam + 2/ Intra-sibling ordering + 3/ Rewrite scores + 4/ Choose top K hypotheses + + if diversity_rate == 0 is equivalent to BeamSearch + """ + + def __init__(self, tgt_dict, diversity_rate): + super().__init__(tgt_dict) + self.diversity_rate = diversity_rate + self.beam = BeamSearch(tgt_dict) + + def step( + self, + step: int, + lprobs, + scores, + prev_output_tokens: Optional[Tensor] = None, + original_batch_idxs: Optional[Tensor] = None, + ): + bsz, beam_size, vocab_size = lprobs.size() + k = min( + # Take the best 2 x beam_size predictions. We'll choose the first + # beam_size of these which don't predict eos to continue with. 
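+            # k also sets how many intra-sibling ranks are penalized below:
+            # sibling_score holds one penalty per rank 1..k.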
+ beam_size * 2, + lprobs.view(bsz, -1).size(1) - 1, # -1 so we never select pad + ) + s_list: List[Tensor] + i_list: List[Tensor] + s_list = [torch.empty(0).to(lprobs) for i in range(beam_size)] + i_list = [torch.LongTensor().to(device=lprobs.device) for i in range(beam_size)] + sibling_score = torch.arange(1, k + 1).to(lprobs) * self.diversity_rate + + if step == 0: + return self.beam.step(step, lprobs, scores) + lprobs.add_(scores[:, :, step - 1].unsqueeze(-1)) + + # 1/ Calculate hypotheses for each beam + for i in range(beam_size): + torch.topk(lprobs[:, i, :].view(bsz, -1), k, out=(s_list[i], i_list[i])) + i_list[i].fmod_(vocab_size) + + # 2/ Intra-sibling ordering by default from topk + 3/ Rewrite scores + s_list[i].sub_(sibling_score) + + # 4/ Choose top K hypotheses + indices = torch.stack(i_list, dim=1).view(bsz, -1) + + final_scores = torch.empty(0).to(lprobs) + final_indices = torch.LongTensor().to(device=lprobs.device) + final_beams = torch.LongTensor().to(device=lprobs.device) + (final_scores, final_indices) = torch.topk( + torch.stack(s_list, dim=1).view(bsz, -1), + k, + ) + + final_beams = final_indices // k + + for i in range(bsz): + final_indices[i] = indices[i][final_indices[i]] + + return final_scores, final_indices, final_beams diff --git a/SpeechT5/fairseq/fairseq/sequence_generator.py b/SpeechT5/fairseq/fairseq/sequence_generator.py new file mode 100644 index 0000000000000000000000000000000000000000..8a3858563ec0c3cd7f3177bcd2897d27b61dbe00 --- /dev/null +++ b/SpeechT5/fairseq/fairseq/sequence_generator.py @@ -0,0 +1,980 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +import math +from typing import Dict, List, Optional +import sys + +import torch +import torch.nn as nn +from fairseq import search, utils +from fairseq.data import data_utils +from fairseq.models import FairseqIncrementalDecoder +from torch import Tensor +from fairseq.ngram_repeat_block import NGramRepeatBlock + + +class SequenceGenerator(nn.Module): + def __init__( + self, + models, + tgt_dict, + beam_size=1, + max_len_a=0, + max_len_b=200, + max_len=0, + min_len=1, + normalize_scores=True, + len_penalty=1.0, + unk_penalty=0.0, + temperature=1.0, + match_source_len=False, + no_repeat_ngram_size=0, + search_strategy=None, + eos=None, + symbols_to_strip_from_output=None, + lm_model=None, + lm_weight=1.0, + ): + """Generates translations of a given source sentence. 
+ + Args: + models (List[~fairseq.models.FairseqModel]): ensemble of models, + currently support fairseq.models.TransformerModel for scripting + beam_size (int, optional): beam width (default: 1) + max_len_a/b (int, optional): generate sequences of maximum length + ax + b, where x is the source length + max_len (int, optional): the maximum length of the generated output + (not including end-of-sentence) + min_len (int, optional): the minimum length of the generated output + (not including end-of-sentence) + normalize_scores (bool, optional): normalize scores by the length + of the output (default: True) + len_penalty (float, optional): length penalty, where <1.0 favors + shorter, >1.0 favors longer sentences (default: 1.0) + unk_penalty (float, optional): unknown word penalty, where <0 + produces more unks, >0 produces fewer (default: 0.0) + temperature (float, optional): temperature, where values + >1.0 produce more uniform samples and values <1.0 produce + sharper samples (default: 1.0) + match_source_len (bool, optional): outputs should match the source + length (default: False) + """ + super().__init__() + if isinstance(models, EnsembleModel): + self.model = models + else: + self.model = EnsembleModel(models) + self.tgt_dict = tgt_dict + self.pad = tgt_dict.pad() + self.unk = tgt_dict.unk() + self.eos = tgt_dict.eos() if eos is None else eos + self.symbols_to_strip_from_output = ( + symbols_to_strip_from_output.union({self.eos}) + if symbols_to_strip_from_output is not None + else {self.eos} + ) + self.vocab_size = len(tgt_dict) + self.beam_size = beam_size + # the max beam size is the dictionary size - 1, since we never select pad + self.beam_size = min(beam_size, self.vocab_size - 1) + self.max_len_a = max_len_a + self.max_len_b = max_len_b + self.min_len = min_len + self.max_len = max_len or self.model.max_decoder_positions() + + self.normalize_scores = normalize_scores + self.len_penalty = len_penalty + self.unk_penalty = unk_penalty + self.temperature = temperature + self.match_source_len = match_source_len + + if no_repeat_ngram_size > 0: + self.repeat_ngram_blocker = NGramRepeatBlock(no_repeat_ngram_size) + else: + self.repeat_ngram_blocker = None + + assert temperature > 0, "--temperature must be greater than 0" + + self.search = ( + search.BeamSearch(tgt_dict) if search_strategy is None else search_strategy + ) + # We only need to set src_lengths in LengthConstrainedBeamSearch. + # As a module attribute, setting it would break in multithread + # settings when the model is shared. + self.should_set_src_lengths = ( + hasattr(self.search, "needs_src_lengths") and self.search.needs_src_lengths + ) + + self.model.eval() + + self.lm_model = lm_model + self.lm_weight = lm_weight + if self.lm_model is not None: + self.lm_model.eval() + + def cuda(self): + self.model.cuda() + return self + + @torch.no_grad() + def forward( + self, + sample: Dict[str, Dict[str, Tensor]], + prefix_tokens: Optional[Tensor] = None, + bos_token: Optional[int] = None, + ): + """Generate a batch of translations. + + Args: + sample (dict): batch + prefix_tokens (torch.LongTensor, optional): force decoder to begin + with these tokens + bos_token (int, optional): beginning of sentence token + (default: self.eos) + """ + return self._generate(sample, prefix_tokens, bos_token=bos_token) + + # TODO(myleott): unused, deprecate after pytorch-translate migration + def generate_batched_itr(self, data_itr, beam_size=None, cuda=False, timer=None): + """Iterate over a batched dataset and yield individual translations. 
+ Args: + cuda (bool, optional): use GPU for generation + timer (StopwatchMeter, optional): time generations + """ + for sample in data_itr: + s = utils.move_to_cuda(sample) if cuda else sample + if "net_input" not in s: + continue + input = s["net_input"] + # model.forward normally channels prev_output_tokens into the decoder + # separately, but SequenceGenerator directly calls model.encoder + encoder_input = { + k: v for k, v in input.items() if k != "prev_output_tokens" + } + if timer is not None: + timer.start() + with torch.no_grad(): + hypos = self.generate(encoder_input) + if timer is not None: + timer.stop(sum(len(h[0]["tokens"]) for h in hypos)) + for i, id in enumerate(s["id"].data): + # remove padding + src = utils.strip_pad(input["src_tokens"].data[i, :], self.pad) + ref = ( + utils.strip_pad(s["target"].data[i, :], self.pad) + if s["target"] is not None + else None + ) + yield id, src, ref, hypos[i] + + @torch.no_grad() + def generate(self, models, sample: Dict[str, Dict[str, Tensor]], **kwargs) -> List[List[Dict[str, Tensor]]]: + """Generate translations. Match the api of other fairseq generators. + + Args: + models (List[~fairseq.models.FairseqModel]): ensemble of models + sample (dict): batch + prefix_tokens (torch.LongTensor, optional): force decoder to begin + with these tokens + constraints (torch.LongTensor, optional): force decoder to include + the list of constraints + bos_token (int, optional): beginning of sentence token + (default: self.eos) + """ + return self._generate(sample, **kwargs) + + def _generate( + self, + sample: Dict[str, Dict[str, Tensor]], + prefix_tokens: Optional[Tensor] = None, + constraints: Optional[Tensor] = None, + bos_token: Optional[int] = None, + ): + incremental_states = torch.jit.annotate( + List[Dict[str, Dict[str, Optional[Tensor]]]], + [ + torch.jit.annotate(Dict[str, Dict[str, Optional[Tensor]]], {}) + for i in range(self.model.models_size) + ], + ) + net_input = sample["net_input"] + + if "src_tokens" in net_input: + src_tokens = net_input["src_tokens"] + # length of the source text being the character length except EndOfSentence and pad + src_lengths = ( + (src_tokens.ne(self.eos) & src_tokens.ne(self.pad)).long().sum(dim=1) + ) + elif "source" in net_input: + src_tokens = net_input["source"] + src_lengths = ( + net_input["padding_mask"].size(-1) - net_input["padding_mask"].sum(-1) + if net_input["padding_mask"] is not None + else torch.tensor(src_tokens.size(-1)).to(src_tokens) + ) + elif "features" in net_input: + src_tokens = net_input["features"] + src_lengths = ( + net_input["padding_mask"].size(-1) - net_input["padding_mask"].sum(-1) + if net_input["padding_mask"] is not None + else torch.tensor(src_tokens.size(-1)).to(src_tokens) + ) + else: + raise Exception("expected src_tokens or source in net input. input keys: " + str(net_input.keys())) + + # bsz: total number of sentences in beam + # Note that src_tokens may have more than 2 dimensions (i.e. 
audio features) + bsz, src_len = src_tokens.size()[:2] + beam_size = self.beam_size + + if constraints is not None and not self.search.supports_constraints: + raise NotImplementedError( + "Target-side constraints were provided, but search method doesn't support them" + ) + + # Initialize constraints, when active + self.search.init_constraints(constraints, beam_size) + + max_len: int = -1 + if self.match_source_len: + max_len = src_lengths.max().item() + else: + max_len = min( + int(self.max_len_a * src_len + self.max_len_b), + self.max_len - 1, + ) + assert ( + self.min_len <= max_len + ), "min_len cannot be larger than max_len, please adjust these!" + # compute the encoder output for each beam + encoder_outs = self.model.forward_encoder(net_input) + + # placeholder of indices for bsz * beam_size to hold tokens and accumulative scores + new_order = torch.arange(bsz).view(-1, 1).repeat(1, beam_size).view(-1) + new_order = new_order.to(src_tokens.device).long() + encoder_outs = self.model.reorder_encoder_out(encoder_outs, new_order) + # ensure encoder_outs is a List. + assert encoder_outs is not None + + # initialize buffers + scores = ( + torch.zeros(bsz * beam_size, max_len + 1).to(src_tokens).float() + ) # +1 for eos; pad is never chosen for scoring + tokens = ( + torch.zeros(bsz * beam_size, max_len + 2) + .to(src_tokens) + .long() + .fill_(self.pad) + ) # +2 for eos and pad + tokens[:, 0] = self.eos if bos_token is None else bos_token + attn: Optional[Tensor] = None + + # A list that indicates candidates that should be ignored. + # For example, suppose we're sampling and have already finalized 2/5 + # samples. Then cands_to_ignore would mark 2 positions as being ignored, + # so that we only finalize the remaining 3 samples. + cands_to_ignore = ( + torch.zeros(bsz, beam_size).to(src_tokens).eq(-1) + ) # forward and backward-compatible False mask + + # list of completed sentences + finalized = torch.jit.annotate( + List[List[Dict[str, Tensor]]], + [torch.jit.annotate(List[Dict[str, Tensor]], []) for i in range(bsz)], + ) # contains lists of dictionaries of infomation about the hypothesis being finalized at each step + + # a boolean array indicating if the sentence at the index is finished or not + finished = [False for i in range(bsz)] + num_remaining_sent = bsz # number of sentences remaining + + # number of candidate hypos per step + cand_size = 2 * beam_size # 2 x beam size in case half are EOS + + # offset arrays for converting between different indexing schemes + bbsz_offsets = ( + (torch.arange(0, bsz) * beam_size) + .unsqueeze(1) + .type_as(tokens) + .to(src_tokens.device) + ) + cand_offsets = torch.arange(0, cand_size).type_as(tokens).to(src_tokens.device) + + reorder_state: Optional[Tensor] = None + batch_idxs: Optional[Tensor] = None + + original_batch_idxs: Optional[Tensor] = None + if "id" in sample and isinstance(sample["id"], Tensor): + original_batch_idxs = sample["id"] + else: + original_batch_idxs = torch.arange(0, bsz).type_as(tokens) + + for step in range(max_len + 1): # one extra step for EOS marker + # reorder decoder internal states based on the prev choice of beams + if reorder_state is not None: + if batch_idxs is not None: + # update beam indices to take into account removed sentences + corr = batch_idxs - torch.arange(batch_idxs.numel()).type_as( + batch_idxs + ) + reorder_state.view(-1, beam_size).add_( + corr.unsqueeze(-1) * beam_size + ) + original_batch_idxs = original_batch_idxs[batch_idxs] + self.model.reorder_incremental_state(incremental_states, 
reorder_state) + encoder_outs = self.model.reorder_encoder_out( + encoder_outs, reorder_state + ) + + lprobs, avg_attn_scores = self.model.forward_decoder( + tokens[:, : step + 1], + encoder_outs, + incremental_states, + self.temperature, + ) + + if self.lm_model is not None: + lm_out = self.lm_model(tokens[:, : step + 1]) + probs = self.lm_model.get_normalized_probs( + lm_out, log_probs=True, sample=None + ) + probs = probs[:, -1, :] * self.lm_weight + lprobs += probs + + lprobs[lprobs != lprobs] = torch.tensor(-math.inf).to(lprobs) + + lprobs[:, self.pad] = -math.inf # never select pad + lprobs[:, self.unk] -= self.unk_penalty # apply unk penalty + + # handle max length constraint + if step >= max_len: + lprobs[:, : self.eos] = -math.inf + lprobs[:, self.eos + 1 :] = -math.inf + + # handle prefix tokens (possibly with different lengths) + if ( + prefix_tokens is not None + and step < prefix_tokens.size(1) + and step < max_len + ): + lprobs, tokens, scores = self._prefix_tokens( + step, lprobs, scores, tokens, prefix_tokens, beam_size + ) + elif step < self.min_len: + # minimum length constraint (does not apply if using prefix_tokens) + lprobs[:, self.eos] = -math.inf + + # Record attention scores, only support avg_attn_scores is a Tensor + if avg_attn_scores is not None: + if attn is None: + attn = torch.empty( + bsz * beam_size, avg_attn_scores.size(1), max_len + 2 + ).to(scores) + attn[:, :, step + 1].copy_(avg_attn_scores) + + scores = scores.type_as(lprobs) + eos_bbsz_idx = torch.empty(0).to( + tokens + ) # indices of hypothesis ending with eos (finished sentences) + eos_scores = torch.empty(0).to( + scores + ) # scores of hypothesis ending with eos (finished sentences) + + if self.should_set_src_lengths: + self.search.set_src_lengths(src_lengths) + + if self.repeat_ngram_blocker is not None: + lprobs = self.repeat_ngram_blocker(tokens, lprobs, bsz, beam_size, step) + + # Shape: (batch, cand_size) + cand_scores, cand_indices, cand_beams = self.search.step( + step, + lprobs.view(bsz, -1, self.vocab_size), + scores.view(bsz, beam_size, -1)[:, :, :step], + tokens[:, : step + 1], + original_batch_idxs, + ) + + # cand_bbsz_idx contains beam indices for the top candidate + # hypotheses, with a range of values: [0, bsz*beam_size), + # and dimensions: [bsz, cand_size] + cand_bbsz_idx = cand_beams.add(bbsz_offsets) + + # finalize hypotheses that end in eos + # Shape of eos_mask: (batch size, beam size) + eos_mask = cand_indices.eq(self.eos) & cand_scores.ne(-math.inf) + eos_mask[:, :beam_size][cands_to_ignore] = torch.tensor(0).to(eos_mask) + + # only consider eos when it's among the top beam_size indices + # Now we know what beam item(s) to finish + # Shape: 1d list of absolute-numbered + eos_bbsz_idx = torch.masked_select( + cand_bbsz_idx[:, :beam_size], mask=eos_mask[:, :beam_size] + ) + + finalized_sents: List[int] = [] + if eos_bbsz_idx.numel() > 0: + eos_scores = torch.masked_select( + cand_scores[:, :beam_size], mask=eos_mask[:, :beam_size] + ) + + finalized_sents = self.finalize_hypos( + step, + eos_bbsz_idx, + eos_scores, + tokens, + scores, + finalized, + finished, + beam_size, + attn, + src_lengths, + max_len, + ) + num_remaining_sent -= len(finalized_sents) + + assert num_remaining_sent >= 0 + if num_remaining_sent == 0: + break + if self.search.stop_on_max_len and step >= max_len: + break + assert step < max_len, f"{step} < {max_len}" + + # Remove finalized sentences (ones for which {beam_size} + # finished hypotheses have been generated) from the batch. 
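+            # Shrinking the batch here keeps every per-step tensor sized to the
+            # still-active sentences only.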
+ if len(finalized_sents) > 0: + new_bsz = bsz - len(finalized_sents) + + # construct batch_idxs which holds indices of batches to keep for the next pass + batch_mask = torch.ones( + bsz, dtype=torch.bool, device=cand_indices.device + ) + batch_mask[finalized_sents] = False + # TODO replace `nonzero(as_tuple=False)` after TorchScript supports it + batch_idxs = torch.arange( + bsz, device=cand_indices.device + ).masked_select(batch_mask) + + # Choose the subset of the hypothesized constraints that will continue + self.search.prune_sentences(batch_idxs) + + eos_mask = eos_mask[batch_idxs] + cand_beams = cand_beams[batch_idxs] + bbsz_offsets.resize_(new_bsz, 1) + cand_bbsz_idx = cand_beams.add(bbsz_offsets) + cand_scores = cand_scores[batch_idxs] + cand_indices = cand_indices[batch_idxs] + + if prefix_tokens is not None: + prefix_tokens = prefix_tokens[batch_idxs] + src_lengths = src_lengths[batch_idxs] + cands_to_ignore = cands_to_ignore[batch_idxs] + + scores = scores.view(bsz, -1)[batch_idxs].view(new_bsz * beam_size, -1) + tokens = tokens.view(bsz, -1)[batch_idxs].view(new_bsz * beam_size, -1) + if attn is not None: + attn = attn.view(bsz, -1)[batch_idxs].view( + new_bsz * beam_size, attn.size(1), -1 + ) + bsz = new_bsz + else: + batch_idxs = None + + # Set active_mask so that values > cand_size indicate eos hypos + # and values < cand_size indicate candidate active hypos. + # After, the min values per row are the top candidate active hypos + + # Rewrite the operator since the element wise or is not supported in torchscript. + + eos_mask[:, :beam_size] = ~((~cands_to_ignore) & (~eos_mask[:, :beam_size])) + active_mask = torch.add( + eos_mask.type_as(cand_offsets) * cand_size, + cand_offsets[: eos_mask.size(1)], + ) + + # get the top beam_size active hypotheses, which are just + # the hypos with the smallest values in active_mask. + # {active_hypos} indicates which {beam_size} hypotheses + # from the list of {2 * beam_size} candidates were + # selected. Shapes: (batch size, beam size) + new_cands_to_ignore, active_hypos = torch.topk( + active_mask, k=beam_size, dim=1, largest=False + ) + + # update cands_to_ignore to ignore any finalized hypos. + cands_to_ignore = new_cands_to_ignore.ge(cand_size)[:, :beam_size] + # Make sure there is at least one active item for each sentence in the batch. + assert (~cands_to_ignore).any(dim=1).all() + + # update cands_to_ignore to ignore any finalized hypos + + # {active_bbsz_idx} denotes which beam number is continued for each new hypothesis (a beam + # can be selected more than once). 
+ active_bbsz_idx = torch.gather(cand_bbsz_idx, dim=1, index=active_hypos) + active_scores = torch.gather(cand_scores, dim=1, index=active_hypos) + + active_bbsz_idx = active_bbsz_idx.view(-1) + active_scores = active_scores.view(-1) + + # copy tokens and scores for active hypotheses + + # Set the tokens for each beam (can select the same row more than once) + tokens[:, : step + 1] = torch.index_select( + tokens[:, : step + 1], dim=0, index=active_bbsz_idx + ) + # Select the next token for each of them + tokens.view(bsz, beam_size, -1)[:, :, step + 1] = torch.gather( + cand_indices, dim=1, index=active_hypos + ) + if step > 0: + scores[:, :step] = torch.index_select( + scores[:, :step], dim=0, index=active_bbsz_idx + ) + scores.view(bsz, beam_size, -1)[:, :, step] = torch.gather( + cand_scores, dim=1, index=active_hypos + ) + + # Update constraints based on which candidates were selected for the next beam + self.search.update_constraints(active_hypos) + + # copy attention for active hypotheses + if attn is not None: + attn[:, :, : step + 2] = torch.index_select( + attn[:, :, : step + 2], dim=0, index=active_bbsz_idx + ) + + # reorder incremental state in decoder + reorder_state = active_bbsz_idx + + # sort by score descending + for sent in range(len(finalized)): + scores = torch.tensor( + [float(elem["score"].item()) for elem in finalized[sent]] + ) + _, sorted_scores_indices = torch.sort(scores, descending=True) + finalized[sent] = [finalized[sent][ssi] for ssi in sorted_scores_indices] + finalized[sent] = torch.jit.annotate( + List[Dict[str, Tensor]], finalized[sent] + ) + return finalized + + def _prefix_tokens( + self, step: int, lprobs, scores, tokens, prefix_tokens, beam_size: int + ): + """Handle prefix tokens""" + prefix_toks = prefix_tokens[:, step].unsqueeze(-1).repeat(1, beam_size).view(-1) + prefix_lprobs = lprobs.gather(-1, prefix_toks.unsqueeze(-1)) + prefix_mask = prefix_toks.ne(self.pad) + lprobs[prefix_mask] = torch.tensor(-math.inf).to(lprobs) + lprobs[prefix_mask] = lprobs[prefix_mask].scatter( + -1, prefix_toks[prefix_mask].unsqueeze(-1), prefix_lprobs[prefix_mask] + ) + # if prefix includes eos, then we should make sure tokens and + # scores are the same across all beams + eos_mask = prefix_toks.eq(self.eos) + if eos_mask.any(): + # validate that the first beam matches the prefix + first_beam = tokens[eos_mask].view(-1, beam_size, tokens.size(-1))[ + :, 0, 1 : step + 1 + ] + eos_mask_batch_dim = eos_mask.view(-1, beam_size)[:, 0] + target_prefix = prefix_tokens[eos_mask_batch_dim][:, :step] + assert (first_beam == target_prefix).all() + + # copy tokens, scores and lprobs from the first beam to all beams + tokens = self.replicate_first_beam(tokens, eos_mask_batch_dim, beam_size) + scores = self.replicate_first_beam(scores, eos_mask_batch_dim, beam_size) + lprobs = self.replicate_first_beam(lprobs, eos_mask_batch_dim, beam_size) + return lprobs, tokens, scores + + def replicate_first_beam(self, tensor, mask, beam_size: int): + tensor = tensor.view(-1, beam_size, tensor.size(-1)) + tensor[mask] = tensor[mask][:, :1, :] + return tensor.view(-1, tensor.size(-1)) + + def finalize_hypos( + self, + step: int, + bbsz_idx, + eos_scores, + tokens, + scores, + finalized: List[List[Dict[str, Tensor]]], + finished: List[bool], + beam_size: int, + attn: Optional[Tensor], + src_lengths, + max_len: int, + ): + """Finalize hypothesis, store finalized information in `finalized`, and change `finished` accordingly. 
+ A sentence is finalized when {beam_size} finished items have been collected for it. + + Returns number of sentences (not beam items) being finalized. + These will be removed from the batch and not processed further. + Args: + bbsz_idx (Tensor): + """ + assert bbsz_idx.numel() == eos_scores.numel() + + # clone relevant token and attention tensors. + # tokens is (batch * beam, max_len). So the index_select + # gets the newly EOS rows, then selects cols 1..{step + 2} + tokens_clone = tokens.index_select(0, bbsz_idx)[ + :, 1 : step + 2 + ] # skip the first index, which is EOS + + tokens_clone[:, step] = self.eos + attn_clone = ( + attn.index_select(0, bbsz_idx)[:, :, 1 : step + 2] + if attn is not None + else None + ) + + # compute scores per token position + pos_scores = scores.index_select(0, bbsz_idx)[:, : step + 1] + pos_scores[:, step] = eos_scores + # convert from cumulative to per-position scores + pos_scores[:, 1:] = pos_scores[:, 1:] - pos_scores[:, :-1] + + # normalize sentence-level scores + if self.normalize_scores: + eos_scores /= (step + 1) ** self.len_penalty + + # cum_unfin records which sentences in the batch are finished. + # It helps match indexing between (a) the original sentences + # in the batch and (b) the current, possibly-reduced set of + # sentences. + cum_unfin: List[int] = [] + prev = 0 + for f in finished: + if f: + prev += 1 + else: + cum_unfin.append(prev) + + # The keys here are of the form "{sent}_{unfin_idx}", where + # "unfin_idx" is the index in the current (possibly reduced) + # list of sentences, and "sent" is the index in the original, + # unreduced batch + # set() is not supported in script export + sents_seen: Dict[str, Optional[Tensor]] = {} + + # For every finished beam item + for i in range(bbsz_idx.size()[0]): + idx = bbsz_idx[i] + score = eos_scores[i] + # sentence index in the current (possibly reduced) batch + unfin_idx = idx // beam_size + # sentence index in the original (unreduced) batch + sent = unfin_idx + cum_unfin[unfin_idx] + # Cannot create dict for key type '(int, int)' in torchscript. + # The workaround is to cast int to string + seen = str(sent.item()) + "_" + str(unfin_idx.item()) + if seen not in sents_seen: + sents_seen[seen] = None + + if self.match_source_len and step > src_lengths[unfin_idx]: + score = torch.tensor(-math.inf).to(score) + + # An input sentence (among those in a batch) is finished when + # beam_size hypotheses have been collected for it + if len(finalized[sent]) < beam_size: + if attn_clone is not None: + # remove padding tokens from attn scores + hypo_attn = attn_clone[i] + else: + hypo_attn = torch.empty(0) + + finalized[sent].append( + { + "tokens": tokens_clone[i], + "score": score, + "attention": hypo_attn, # src_len x tgt_len + "alignment": torch.empty(0), + "positional_scores": pos_scores[i], + } + ) + + newly_finished: List[int] = [] + + for seen in sents_seen.keys(): + # check termination conditions for this sentence + sent: int = int(float(seen.split("_")[0])) + unfin_idx: int = int(float(seen.split("_")[1])) + + if not finished[sent] and self.is_finished( + step, unfin_idx, max_len, len(finalized[sent]), beam_size + ): + finished[sent] = True + newly_finished.append(unfin_idx) + + return newly_finished + + def is_finished( + self, + step: int, + unfin_idx: int, + max_len: int, + finalized_sent_len: int, + beam_size: int, + ): + """ + Check whether decoding for a sentence is finished, which + occurs when the list of finalized sentences has reached the + beam size, or when we reach the maximum length. 
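+        Note that `finalized_sent_len` counts finished hypotheses collected for one
+        sentence, not finished sentences.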
+ """ + assert finalized_sent_len <= beam_size + if finalized_sent_len == beam_size or step == max_len: + return True + return False + + +class EnsembleModel(nn.Module): + """A wrapper around an ensemble of models.""" + + def __init__(self, models): + super().__init__() + self.models_size = len(models) + # method '__len__' is not supported in ModuleList for torch script + self.single_model = models[0] + self.models = nn.ModuleList(models) + + self.has_incremental: bool = False + if all( + hasattr(m, "decoder") and isinstance(m.decoder, FairseqIncrementalDecoder) + for m in models + ): + self.has_incremental = True + + def forward(self): + pass + + def has_encoder(self): + return hasattr(self.single_model, "encoder") + + def has_incremental_states(self): + return self.has_incremental + + def max_decoder_positions(self): + return min([m.max_decoder_positions() for m in self.models if hasattr(m, "max_decoder_positions")] + [sys.maxsize]) + + @torch.jit.export + def forward_encoder(self, net_input: Dict[str, Tensor]): + if not self.has_encoder(): + return None + return [model.encoder.forward_torchscript(net_input) for model in self.models] + + @torch.jit.export + def forward_decoder( + self, + tokens, + encoder_outs: List[Dict[str, List[Tensor]]], + incremental_states: List[Dict[str, Dict[str, Optional[Tensor]]]], + temperature: float = 1.0, + ): + log_probs = [] + avg_attn: Optional[Tensor] = None + encoder_out: Optional[Dict[str, List[Tensor]]] = None + for i, model in enumerate(self.models): + if self.has_encoder(): + encoder_out = encoder_outs[i] + # decode each model + if self.has_incremental_states(): + decoder_out = model.decoder.forward( + tokens, + encoder_out=encoder_out, + incremental_state=incremental_states[i], + ) + else: + if hasattr(model, "decoder"): + decoder_out = model.decoder.forward(tokens, encoder_out=encoder_out) + else: + decoder_out = model.forward(tokens) + + attn: Optional[Tensor] = None + decoder_len = len(decoder_out) + if decoder_len > 1 and decoder_out[1] is not None: + if isinstance(decoder_out[1], Tensor): + attn = decoder_out[1] + else: + attn_holder = decoder_out[1]["attn"] + if isinstance(attn_holder, Tensor): + attn = attn_holder + elif attn_holder is not None: + attn = attn_holder[0] + if attn is not None: + attn = attn[:, -1, :] + + decoder_out_tuple = ( + decoder_out[0][:, -1:, :].div_(temperature), + None if decoder_len <= 1 else decoder_out[1], + ) + probs = model.get_normalized_probs( + decoder_out_tuple, log_probs=True, sample=None + ) + probs = probs[:, -1, :] + if self.models_size == 1: + return probs, attn + + log_probs.append(probs) + if attn is not None: + if avg_attn is None: + avg_attn = attn + else: + avg_attn.add_(attn) + + avg_probs = torch.logsumexp(torch.stack(log_probs, dim=0), dim=0) - math.log( + self.models_size + ) + + if avg_attn is not None: + avg_attn.div_(self.models_size) + return avg_probs, avg_attn + + @torch.jit.export + def reorder_encoder_out( + self, encoder_outs: Optional[List[Dict[str, List[Tensor]]]], new_order + ): + """ + Reorder encoder output according to *new_order*. 
+ + Args: + encoder_out: output from the ``forward()`` method + new_order (LongTensor): desired order + + Returns: + *encoder_out* rearranged according to *new_order* + """ + new_outs: List[Dict[str, List[Tensor]]] = [] + if not self.has_encoder(): + return new_outs + for i, model in enumerate(self.models): + assert encoder_outs is not None + new_outs.append( + model.encoder.reorder_encoder_out(encoder_outs[i], new_order) + ) + return new_outs + + @torch.jit.export + def reorder_incremental_state( + self, + incremental_states: List[Dict[str, Dict[str, Optional[Tensor]]]], + new_order, + ): + if not self.has_incremental_states(): + return + for i, model in enumerate(self.models): + model.decoder.reorder_incremental_state_scripting( + incremental_states[i], new_order + ) + + +class SequenceGeneratorWithAlignment(SequenceGenerator): + def __init__( + self, models, tgt_dict, left_pad_target=False, print_alignment="hard", **kwargs + ): + """Generates translations of a given source sentence. + + Produces alignments following "Jointly Learning to Align and + Translate with Transformer Models" (Garg et al., EMNLP 2019). + + Args: + left_pad_target (bool, optional): Whether or not the + hypothesis should be left padded or not when they are + teacher forced for generating alignments. + """ + super().__init__(EnsembleModelWithAlignment(models), tgt_dict, **kwargs) + self.left_pad_target = left_pad_target + + if print_alignment == "hard": + self.extract_alignment = utils.extract_hard_alignment + elif print_alignment == "soft": + self.extract_alignment = utils.extract_soft_alignment + + @torch.no_grad() + def generate(self, models, sample, **kwargs): + finalized = super()._generate(sample, **kwargs) + + src_tokens = sample["net_input"]["src_tokens"] + bsz = src_tokens.shape[0] + beam_size = self.beam_size + ( + src_tokens, + src_lengths, + prev_output_tokens, + tgt_tokens, + ) = self._prepare_batch_for_alignment(sample, finalized) + if any(getattr(m, "full_context_alignment", False) for m in self.model.models): + attn = self.model.forward_align(src_tokens, src_lengths, prev_output_tokens) + else: + attn = [ + finalized[i // beam_size][i % beam_size]["attention"].transpose(1, 0) + for i in range(bsz * beam_size) + ] + + if src_tokens.device != "cpu": + src_tokens = src_tokens.to("cpu") + tgt_tokens = tgt_tokens.to("cpu") + attn = [i.to("cpu") for i in attn] + + # Process the attn matrix to extract hard alignments. 
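+        # self.extract_alignment is utils.extract_hard_alignment or
+        # utils.extract_soft_alignment, selected in __init__ from the
+        # ``print_alignment`` argument; each attn[i] holds the attention
+        # weights of one hypothesis, and the pad/eos indices are passed so
+        # that special tokens can be handled when extracting the alignment.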
+ for i in range(bsz * beam_size): + alignment = self.extract_alignment( + attn[i], src_tokens[i], tgt_tokens[i], self.pad, self.eos + ) + finalized[i // beam_size][i % beam_size]["alignment"] = alignment + return finalized + + def _prepare_batch_for_alignment(self, sample, hypothesis): + src_tokens = sample["net_input"]["src_tokens"] + bsz = src_tokens.shape[0] + src_tokens = ( + src_tokens[:, None, :] + .expand(-1, self.beam_size, -1) + .contiguous() + .view(bsz * self.beam_size, -1) + ) + src_lengths = sample["net_input"]["src_lengths"] + src_lengths = ( + src_lengths[:, None] + .expand(-1, self.beam_size) + .contiguous() + .view(bsz * self.beam_size) + ) + prev_output_tokens = data_utils.collate_tokens( + [beam["tokens"] for example in hypothesis for beam in example], + self.pad, + self.eos, + self.left_pad_target, + move_eos_to_beginning=True, + ) + tgt_tokens = data_utils.collate_tokens( + [beam["tokens"] for example in hypothesis for beam in example], + self.pad, + self.eos, + self.left_pad_target, + move_eos_to_beginning=False, + ) + return src_tokens, src_lengths, prev_output_tokens, tgt_tokens + + +class EnsembleModelWithAlignment(EnsembleModel): + """A wrapper around an ensemble of models.""" + + def __init__(self, models): + super().__init__(models) + + def forward_align(self, src_tokens, src_lengths, prev_output_tokens): + avg_attn = None + for model in self.models: + decoder_out = model(src_tokens, src_lengths, prev_output_tokens) + attn = decoder_out[1]["attn"][0] + if avg_attn is None: + avg_attn = attn + else: + avg_attn.add_(attn) + if len(self.models) > 1: + avg_attn.div_(len(self.models)) + return avg_attn diff --git a/SpeechT5/fairseq/fairseq/sequence_scorer.py b/SpeechT5/fairseq/fairseq/sequence_scorer.py new file mode 100644 index 0000000000000000000000000000000000000000..411d4df4445ef8dd3f1907ad56f9de6943d1fed8 --- /dev/null +++ b/SpeechT5/fairseq/fairseq/sequence_scorer.py @@ -0,0 +1,153 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +import sys + +import torch +from fairseq import utils + + +class SequenceScorer(object): + """Scores the target for a given source sentence.""" + + def __init__( + self, + tgt_dict, + softmax_batch=None, + compute_alignment=False, + eos=None, + symbols_to_strip_from_output=None, + ): + self.pad = tgt_dict.pad() + self.eos = tgt_dict.eos() if eos is None else eos + self.softmax_batch = softmax_batch or sys.maxsize + assert self.softmax_batch > 0 + self.compute_alignment = compute_alignment + self.symbols_to_strip_from_output = ( + symbols_to_strip_from_output.union({self.eos}) + if symbols_to_strip_from_output is not None + else {self.eos} + ) + + @torch.no_grad() + def generate(self, models, sample, **kwargs): + """Score a batch of translations.""" + net_input = sample["net_input"] + + def batch_for_softmax(dec_out, target): + # assumes decoder_out[0] is the only thing needed (may not be correct for future models!) 
+ first, rest = dec_out[0], dec_out[1:] + bsz, tsz, dim = first.shape + if bsz * tsz < self.softmax_batch: + yield dec_out, target, True + else: + flat = first.contiguous().view(1, -1, dim) + flat_tgt = target.contiguous().view(flat.shape[:-1]) + s = 0 + while s < flat.size(1): + e = s + self.softmax_batch + yield (flat[:, s:e],) + rest, flat_tgt[:, s:e], False + s = e + + def gather_target_probs(probs, target): + probs = probs.gather( + dim=2, + index=target.unsqueeze(-1), + ) + return probs + + orig_target = sample["target"] + + # compute scores for each model in the ensemble + avg_probs = None + avg_attn = None + for model in models: + model.eval() + decoder_out = model(**net_input) + attn = decoder_out[1] if len(decoder_out) > 1 else None + if type(attn) is dict: + attn = attn.get("attn", None) + + batched = batch_for_softmax(decoder_out, orig_target) + probs, idx = None, 0 + for bd, tgt, is_single in batched: + sample["target"] = tgt + curr_prob = model.get_normalized_probs( + bd, log_probs=len(models) == 1, sample=sample + ).data + if is_single: + probs = gather_target_probs(curr_prob, orig_target) + else: + if probs is None: + probs = curr_prob.new(orig_target.numel()) + step = curr_prob.size(0) * curr_prob.size(1) + end = step + idx + tgt_probs = gather_target_probs( + curr_prob.view(tgt.shape + (curr_prob.size(-1),)), tgt + ) + probs[idx:end] = tgt_probs.view(-1) + idx = end + sample["target"] = orig_target + + probs = probs.view(sample["target"].shape) + + if avg_probs is None: + avg_probs = probs + else: + avg_probs.add_(probs) + if attn is not None: + if torch.is_tensor(attn): + attn = attn.data + else: + attn = attn[0] + if avg_attn is None: + avg_attn = attn + else: + avg_attn.add_(attn) + if len(models) > 1: + avg_probs.div_(len(models)) + avg_probs.log_() + if avg_attn is not None: + avg_attn.div_(len(models)) + + bsz = avg_probs.size(0) + hypos = [] + start_idxs = sample["start_indices"] if "start_indices" in sample else [0] * bsz + for i in range(bsz): + # remove padding from ref + ref = ( + utils.strip_pad(sample["target"][i, start_idxs[i] :], self.pad) + if sample["target"] is not None + else None + ) + tgt_len = ref.numel() + avg_probs_i = avg_probs[i][start_idxs[i] : start_idxs[i] + tgt_len] + score_i = avg_probs_i.sum() / tgt_len + if avg_attn is not None: + avg_attn_i = avg_attn[i] + if self.compute_alignment: + alignment = utils.extract_hard_alignment( + avg_attn_i, + sample["net_input"]["src_tokens"][i], + sample["target"][i], + self.pad, + self.eos, + ) + else: + alignment = None + else: + avg_attn_i = alignment = None + hypos.append( + [ + { + "tokens": ref, + "score": score_i, + "attention": avg_attn_i, + "alignment": alignment, + "positional_scores": avg_probs_i, + } + ] + ) + return hypos diff --git a/SpeechT5/fairseq/fairseq/tasks/__init__.py b/SpeechT5/fairseq/fairseq/tasks/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..79dde74057f40a368590cbf0ca0d290f1787a264 --- /dev/null +++ b/SpeechT5/fairseq/fairseq/tasks/__init__.py @@ -0,0 +1,136 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. 
+"""isort:skip_file""" + +import argparse +import importlib +import os + +from fairseq.dataclass import FairseqDataclass +from fairseq.dataclass.utils import merge_with_parent, populate_dataclass +from hydra.core.config_store import ConfigStore + +from .fairseq_task import FairseqTask, LegacyFairseqTask # noqa + + +# register dataclass +TASK_DATACLASS_REGISTRY = {} +TASK_REGISTRY = {} +TASK_CLASS_NAMES = set() + + +def setup_task(cfg: FairseqDataclass, **kwargs): + task = None + task_name = getattr(cfg, "task", None) + + if isinstance(task_name, str): + # legacy tasks + task = TASK_REGISTRY[task_name] + if task_name in TASK_DATACLASS_REGISTRY: + dc = TASK_DATACLASS_REGISTRY[task_name] + cfg = populate_dataclass(dc(), cfg) + else: + task_name = getattr(cfg, "_name", None) + + if task_name and task_name in TASK_DATACLASS_REGISTRY: + dc = TASK_DATACLASS_REGISTRY[task_name] + cfg = merge_with_parent(dc(), cfg) + task = TASK_REGISTRY[task_name] + + assert ( + task is not None + ), f"Could not infer task type from {cfg}. Available tasks: {TASK_REGISTRY.keys()}" + + return task.setup_task(cfg, **kwargs) + + +def register_task(name, dataclass=None): + """ + New tasks can be added to fairseq with the + :func:`~fairseq.tasks.register_task` function decorator. + + For example:: + + @register_task('classification') + class ClassificationTask(FairseqTask): + (...) + + .. note:: + + All Tasks must implement the :class:`~fairseq.tasks.FairseqTask` + interface. + + Args: + name (str): the name of the task + """ + + def register_task_cls(cls): + if name in TASK_REGISTRY: + raise ValueError("Cannot register duplicate task ({})".format(name)) + if not issubclass(cls, FairseqTask): + raise ValueError( + "Task ({}: {}) must extend FairseqTask".format(name, cls.__name__) + ) + if cls.__name__ in TASK_CLASS_NAMES: + raise ValueError( + "Cannot register task with duplicate class name ({})".format( + cls.__name__ + ) + ) + TASK_REGISTRY[name] = cls + TASK_CLASS_NAMES.add(cls.__name__) + + if dataclass is not None and not issubclass(dataclass, FairseqDataclass): + raise ValueError( + "Dataclass {} must extend FairseqDataclass".format(dataclass) + ) + + cls.__dataclass = dataclass + if dataclass is not None: + TASK_DATACLASS_REGISTRY[name] = dataclass + + cs = ConfigStore.instance() + node = dataclass() + node._name = name + cs.store(name=name, group="task", node=node, provider="fairseq") + + return cls + + return register_task_cls + + +def get_task(name): + return TASK_REGISTRY[name] + + +def import_tasks(tasks_dir, namespace): + for file in os.listdir(tasks_dir): + path = os.path.join(tasks_dir, file) + if ( + not file.startswith("_") + and not file.startswith(".") + and (file.endswith(".py") or os.path.isdir(path)) + ): + task_name = file[: file.find(".py")] if file.endswith(".py") else file + importlib.import_module(namespace + "." 
+ task_name) + + # expose `task_parser` for sphinx + if task_name in TASK_REGISTRY: + parser = argparse.ArgumentParser(add_help=False) + group_task = parser.add_argument_group("Task name") + # fmt: off + group_task.add_argument('--task', metavar=task_name, + help='Enable this task with: ``--task=' + task_name + '``') + # fmt: on + group_args = parser.add_argument_group( + "Additional command-line arguments" + ) + TASK_REGISTRY[task_name].add_args(group_args) + globals()[task_name + "_parser"] = parser + + +# automatically import any Python files in the tasks/ directory +tasks_dir = os.path.dirname(__file__) +import_tasks(tasks_dir, "fairseq.tasks") diff --git a/SpeechT5/fairseq/fairseq/tasks/audio_pretraining.py b/SpeechT5/fairseq/fairseq/tasks/audio_pretraining.py new file mode 100644 index 0000000000000000000000000000000000000000..c642ff5226e5f98332b5b2ae90716842c2addacc --- /dev/null +++ b/SpeechT5/fairseq/fairseq/tasks/audio_pretraining.py @@ -0,0 +1,381 @@ +# Copyright (c) 2017-present, Facebook, Inc. +# All rights reserved. +# +# This source code is licensed under the license found in the LICENSE file in +# the root directory of this source tree. An additional grant of patent rights +# can be found in the PATENTS file in the same directory. + +import logging +import os +import sys +import torch + +from argparse import Namespace +from dataclasses import dataclass, field +from typing import Optional, Any +from omegaconf import MISSING, II, OmegaConf + +from fairseq.data import ( + AddTargetDataset, + BinarizedAudioDataset, + Dictionary, + FileAudioDataset, + encoders, +) +from fairseq.dataclass import FairseqDataclass +from fairseq.dataclass.configs import GenerationConfig + +from . import FairseqTask, register_task +from .. import utils +from ..logging import metrics + + +logger = logging.getLogger(__name__) + + +class LabelEncoder(object): + def __init__(self, dictionary): + self.dictionary = dictionary + + def __call__(self, label): + return self.dictionary.encode_line( + label, append_eos=False, add_if_not_exist=False + ) + + +@dataclass +class InferredW2vConfig: + # The following are needed to precompute mask and mask channel indices + # before model's forward. 
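+    # Each ``II("model.<field>")`` below is an OmegaConf interpolation: it
+    # resolves to the corresponding value of the model config, so mask
+    # indices can be precomputed in the data pipeline with the same masking
+    # hyperparameters the model is built with.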
+ mask_length: Optional[int] = II("model.mask_length") + mask_prob: Optional[float] = II("model.mask_prob") + mask_selection: Optional[str] = II("model.mask_selection") + mask_other: Optional[float] = II("model.mask_other") + no_mask_overlap: Optional[bool] = II("model.no_mask_overlap") + mask_min_space: Optional[int] = II("model.mask_min_space") + mask_channel_length: Optional[int] = II("model.mask_channel_length") + mask_channel_prob: Optional[float] = II("model.mask_channel_prob") + mask_channel_selection: Optional[str] = II("model.mask_channel_selection") + mask_channel_other: Optional[float] = II("model.mask_channel_other") + no_mask_channel_overlap: Optional[bool] = II("model.no_mask_channel_overlap") + mask_channel_min_space: Optional[int] = II("model.mask_channel_min_space") + + conv_feature_layers: Optional[str] = II("model.conv_feature_layers") + encoder_embed_dim: Optional[int] = II("model.encoder_embed_dim") + + +@dataclass +class AudioPretrainingConfig(FairseqDataclass): + data: str = field(default=MISSING, metadata={"help": "path to data directory"}) + labels: Optional[str] = field( + default=None, + metadata={"help": "extension of the label file to load, used for fine-tuning"}, + ) + binarized_dataset: bool = field( + default=False, + metadata={ + "help": "if true, loads binarized dataset (useful for very large datasets). " + "See examples/wav2vec/scripts/binarize_manifest.sh" + }, + ) + sample_rate: int = field( + default=16_000, + metadata={ + "help": "target sample rate. audio files will be up/down sampled to this rate" + }, + ) + normalize: bool = field( + default=False, + metadata={"help": "if set, normalizes input to have 0 mean and unit variance"}, + ) + enable_padding: bool = field( + default=False, metadata={"help": "pad shorter samples instead of cropping"} + ) + max_sample_size: Optional[int] = field( + default=None, metadata={"help": "max sample size to crop to for batching"} + ) + min_sample_size: Optional[int] = field( + default=None, metadata={"help": "min sample size to skip small examples"} + ) + + # Options for reporting WER metrics during validation. 
Only applicable to + # Seq2Seq models during fine-tuning + eval_wer: bool = field( + default=False, metadata={"help": "compute WER for Seq2Seq models"} + ) + eval_wer_config: GenerationConfig = field( + default_factory=lambda: GenerationConfig(), + metadata={"help": "beam search config for evaluating wer during training"}, + ) + eval_wer_tokenizer: Any = field( + default=None, + metadata={"help": "tokenizer config for evaluating wer during training"}, + ) + eval_wer_post_process: str = field( + default="letter", + metadata={ + "help": "remove BPE tokens before scoring (can be sentencepiece, letter, and more)" + }, + ) + autoregressive: bool = field( + default=False, + metadata={ + "help": "required for autoregressive decoders (like seq2seq models); " + "adds 'prev_output_tokens' to input and appends eos to target" + }, + ) + num_batch_buckets: int = field( + default=0, + metadata={"help": "number of buckets"}, + ) + precompute_mask_indices: bool = field( + default=False, + metadata={ + "help": "flag to compute mask indices in data preparation.", + }, + ) + + inferred_w2v_config: Optional[InferredW2vConfig] = field( + default=None, + metadata={ + "help": "wav2vec 2.0 masking arguments used to pre-compute masks (required for TPU)", + }, + ) + + tpu: bool = II("common.tpu") + + +@register_task("audio_pretraining", dataclass=AudioPretrainingConfig) +class AudioPretrainingTask(FairseqTask): + """ """ + + cfg: AudioPretrainingConfig + + def __init__( + self, + cfg: AudioPretrainingConfig, + ): + super().__init__(cfg) + if cfg.eval_wer: + assert cfg.labels is not None, "eval_wer can only be set during fine-tuning" + self.blank_symbol = "<s>" + + self.state.add_factory("target_dictionary", self.load_target_dictionary) + + @classmethod + def setup_task(cls, cfg: AudioPretrainingConfig, **kwargs): + """Setup the task (e.g., load dictionaries). 
+ + Args: + cfg (AudioPretrainingConfig): configuration of this task + """ + + return cls(cfg) + + def load_target_dictionary(self): + if self.cfg.labels: + dict_path = os.path.join(self.cfg.data, f"dict.{self.cfg.labels}.txt") + return Dictionary.load(dict_path) + return None + + def _get_mask_precompute_kwargs(self, cfg): + if self.cfg.precompute_mask_indices or self.cfg.tpu: + assert ( + cfg.inferred_w2v_config is not None + ), "inferred_w2v_config must be set" + return OmegaConf.to_container( + cfg.inferred_w2v_config, resolve=True, enum_to_str=True + ) + else: + return {} + + def load_dataset(self, split: str, task_cfg: FairseqDataclass = None, **kwargs): + data_path = self.cfg.data + task_cfg = task_cfg or self.cfg + + # upgrade old task + if isinstance(task_cfg, Namespace): + if not hasattr(task_cfg, "autoregressive"): + task_cfg.autoregressive = not task_cfg.criterion == "ctc" + + if getattr(task_cfg, "binarized_dataset", False): + self.datasets[split] = BinarizedAudioDataset( + data_path, + split=split, + sample_rate=task_cfg.get("sample_rate", self.cfg.sample_rate), + max_sample_size=self.cfg.max_sample_size, + min_sample_size=self.cfg.min_sample_size, + pad=task_cfg.labels is not None or task_cfg.enable_padding, + normalize=task_cfg.normalize, + num_buckets=self.cfg.num_batch_buckets or int(self.cfg.tpu), + compute_mask_indices=(self.cfg.precompute_mask_indices or self.cfg.tpu), + **self._get_mask_precompute_kwargs(task_cfg), + ) + else: + manifest_path = os.path.join(data_path, "{}.tsv".format(split)) + + self.datasets[split] = FileAudioDataset( + manifest_path=manifest_path, + sample_rate=task_cfg.get("sample_rate", self.cfg.sample_rate), + max_sample_size=self.cfg.max_sample_size, + min_sample_size=self.cfg.min_sample_size, + pad=task_cfg.labels is not None or task_cfg.enable_padding, + normalize=task_cfg.normalize, + num_buckets=self.cfg.num_batch_buckets or int(self.cfg.tpu), + compute_mask_indices=(self.cfg.precompute_mask_indices or self.cfg.tpu), + **self._get_mask_precompute_kwargs(task_cfg), + ) + + if self.cfg.tpu and task_cfg["mask_channel_prob"] == 0.0: + logger.info( + "Pretraining on TPUs may suffer convergence " + "issues when training with `mask_channel_prob` value of " + "0. You may want to set this to a low value close to 0." 
+ ) + + if task_cfg.labels: + label_path = os.path.join(data_path, f"{split}.{task_cfg.labels}") + skipped_indices = getattr(self.datasets[split], "skipped_indices", set()) + with open(label_path, "r") as f: + labels = [line for i, line in enumerate(f) if i not in skipped_indices] + + assert len(labels) == len(self.datasets[split]), ( + f"labels length ({len(labels)}) and dataset length " + f"({len(self.datasets[split])}) do not match" + ) + + process_label = LabelEncoder(self.target_dictionary) + + self.datasets[split] = AddTargetDataset( + self.datasets[split], + labels, + pad=self.target_dictionary.pad(), + eos=self.target_dictionary.eos(), + batch_targets=True, + process_label=process_label, + add_to_input=task_cfg.get("autoregressive", False), + ) + + @property + def source_dictionary(self): + return None + + @property + def target_dictionary(self): + """Return the :class:`~fairseq.data.Dictionary` for the language + model.""" + return self.state.target_dictionary + + def max_positions(self): + """Maximum input length supported by the encoder.""" + return (sys.maxsize, sys.maxsize) + + def filter_indices_by_size( + self, + indices, + dataset, + max_positions=None, + ignore_invalid_inputs=False, + ): + # we do not need to filter by size in this task as dataloaders take care of this + return indices + + def valid_step(self, sample, model, criterion): + loss, sample_size, logging_output = super().valid_step(sample, model, criterion) + if self.cfg.eval_wer and self.cfg.autoregressive: + metrics = self._inference_with_wer(self.sequence_generator, sample, model) + logging_output["_num_char_errors"] = metrics["num_char_errors"] + logging_output["_num_chars"] = metrics["num_chars"] + logging_output["_num_word_errors"] = metrics["num_word_errors"] + logging_output["_num_words"] = metrics["num_words"] + return loss, sample_size, logging_output + + def build_model(self, model_cfg: FairseqDataclass): + model = super().build_model(model_cfg) + + if self.cfg.eval_wer and self.cfg.autoregressive: + self.sequence_generator = self.build_generator( + [model], + self.cfg.eval_wer_config, + ) + if self.cfg.eval_wer_tokenizer: + self.tokenizer = encoders.build_tokenizer(self.cfg.eval_wer_tokenizer) + else: + self.tokenizer = None + + actualized_cfg = getattr(model, "cfg", None) + if actualized_cfg is not None: + if "w2v_args" in actualized_cfg: + model_cfg.w2v_args = actualized_cfg.w2v_args + + return model + + def _inference_with_wer(self, generator, sample, model): + import editdistance + + def decode(toks): + s = self.target_dictionary.string( + toks.int().cpu(), + self.cfg.eval_wer_post_process, + escape_unk=True, + ) + if self.tokenizer: + s = self.tokenizer.decode(s) + return s + + num_word_errors, num_char_errors = 0, 0 + num_chars, num_words = 0, 0 + gen_out = self.inference_step(generator, [model], sample, None) + for i in range(len(gen_out)): + hyp = decode(gen_out[i][0]["tokens"]) + ref = decode( + utils.strip_pad(sample["target"][i], self.target_dictionary.pad()), + ) + num_char_errors += editdistance.eval(hyp, ref) + num_chars += len(ref) + hyp_words = hyp.split() + ref_words = ref.split() + num_word_errors += editdistance.eval(hyp_words, ref_words) + num_words += len(ref_words) + + return { + "num_char_errors": num_char_errors, + "num_chars": num_chars, + "num_word_errors": num_word_errors, + "num_words": num_words, + } + + def reduce_metrics(self, logging_outputs, criterion): + super().reduce_metrics(logging_outputs, criterion) + + zero = torch.scalar_tensor(0.0) + num_char_errors = sum( + 
log.get("_num_char_errors", zero) for log in logging_outputs + ) + num_chars = sum(log.get("_num_chars", zero) for log in logging_outputs) + num_word_errors = sum( + log.get("_num_word_errors", zero) for log in logging_outputs + ) + num_words = sum(log.get("_num_words", zero) for log in logging_outputs) + metrics.log_scalar("_num_char_errors", num_char_errors) + metrics.log_scalar("_num_chars", num_chars) + metrics.log_scalar("_num_word_errors", num_word_errors) + metrics.log_scalar("_num_words", num_words) + if num_chars > 0: + metrics.log_derived( + "uer", + lambda meters: meters["_num_char_errors"].sum + * 100.0 + / meters["_num_chars"].sum + if meters["_num_chars"].sum > 0 + else float("nan"), + ) + if num_words > 0: + metrics.log_derived( + "wer", + lambda meters: meters["_num_word_errors"].sum + * 100.0 + / meters["_num_words"].sum + if meters["_num_words"].sum > 0 + else float("nan"), + ) diff --git a/SpeechT5/fairseq/fairseq/tasks/cross_lingual_lm.py b/SpeechT5/fairseq/fairseq/tasks/cross_lingual_lm.py new file mode 100644 index 0000000000000000000000000000000000000000..8f8fe7e2de181e41bd0e6a2bf96948ee78de5ae8 --- /dev/null +++ b/SpeechT5/fairseq/fairseq/tasks/cross_lingual_lm.py @@ -0,0 +1,191 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +import itertools +import logging +import os +from collections import OrderedDict + +import numpy as np +from fairseq import tokenizer, utils +from fairseq.data import ConcatDataset, Dictionary, TokenBlockDataset, data_utils +from fairseq.data.legacy.masked_lm_dataset import MaskedLMDataset +from fairseq.data.legacy.masked_lm_dictionary import MaskedLMDictionary +from fairseq.data.multi_corpus_sampled_dataset import MultiCorpusSampledDataset +from fairseq.tasks import LegacyFairseqTask, register_task + + +logger = logging.getLogger(__name__) + + +@register_task("cross_lingual_lm") +class CrossLingualLMTask(LegacyFairseqTask): + """ + Task for training cross-lingual language models. + + For more details look at: https://arxiv.org/pdf/1901.07291.pdf + + Args: + dictionary (Dictionary): the dictionary for the input of the task + """ + + @staticmethod + def add_args(parser): + """Add task-specific arguments to the parser.""" + parser.add_argument( + "data", + help="colon separated path to data directories list, \ + will be iterated upon during epochs in round-robin manner", + ) + parser.add_argument( + "--tokens-per-sample", + default=512, + type=int, + help="max number of total tokens over all segments" " per sample", + ) + parser.add_argument( + "--monolingual-langs", + default="en", + type=str, + help="comma separated list of languages for which we" + " want to train XLM on", + ) + parser.add_argument( + "--shuffle", + action="store_true", + help="shuffle each monolingual dataset while" " training", + ) + + def __init__(self, args, dictionary): + super().__init__(args) + self.dictionary = dictionary + self.seed = args.seed + self.distributed_world_size = args.distributed_world_size + self.langs2id = self._lang_to_id(args.monolingual_langs) + + def _lang_to_id(self, languages: str): + """ + Build a map from languages to ids. These ids are used as segment labels + for cross-lingual LM training. 
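+
+        For example (illustrative)::
+
+            self._lang_to_id("en,fr,de")  # -> {"en": 0, "fr": 1, "de": 2}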
+ """ + lang2id = {} + langs = [l.strip() for l in languages.split(",")] + for id, lang in enumerate(langs): + lang2id[lang] = id + return lang2id + + @classmethod + def load_dictionary(cls, filename): + return MaskedLMDictionary.load(filename) + + @classmethod + def build_dictionary( + cls, filenames, workers=1, threshold=-1, nwords=-1, padding_factor=8 + ): + d = MaskedLMDictionary() + for filename in filenames: + Dictionary.add_file_to_dictionary( + filename, d, tokenizer.tokenize_line, workers + ) + d.finalize(threshold=threshold, nwords=nwords, padding_factor=padding_factor) + return d + + @property + def target_dictionary(self): + return self.dictionary + + @classmethod + def setup_task(cls, args, **kwargs): + """Setup the task.""" + dictionary = MaskedLMDictionary.load(os.path.join(args.data, "dict.txt")) + logger.info("dictionary: {} types".format(len(dictionary))) + return cls(args, dictionary) + + def _load_single_lang_dataset(self, split, epoch): + loaded_datasets = [] + + paths = utils.split_paths(self.args.data) + assert len(paths) > 0 + data_path = paths[(epoch - 1) % len(paths)] + + for k in itertools.count(): + split_k = split + (str(k) if k > 0 else "") + path = os.path.join(data_path, split_k) + + ds = data_utils.load_indexed_dataset( + path, self.dictionary, self.args.dataset_impl + ) + if ds is None: + if k > 0: + break + else: + raise FileNotFoundError( + "Dataset not found: {} ({})".format(split, data_path) + ) + + # Since we append each block with the classification_token, + # we need to effectively create blocks of length + # tokens_per_sample-1 + loaded_datasets.append( + TokenBlockDataset( + ds, + ds.sizes, + self.args.tokens_per_sample - 1, + pad=self.dictionary.pad(), + eos=self.dictionary.eos(), + ) + ) + + logger.info( + "{} {} {} examples".format(data_path, split_k, len(loaded_datasets[-1])) + ) + + if len(loaded_datasets) == 1: + dataset = loaded_datasets[0] + sizes = dataset.sizes + else: + dataset = ConcatDataset(loaded_datasets) + sizes = np.concatenate([ds.sizes for ds in loaded_datasets]) + + return dataset, sizes + + def load_dataset(self, split, epoch=1, combine=False, **kwargs): + """Load a given dataset split. + + Args: + split (str): name of the split (e.g., train, valid, test) + """ + dataset_map = OrderedDict() + + for lang in self.langs2id.keys(): + # Datasets are expected to be in "split.lang" format (Eg: train.en) + language_split = "{}.{}".format(split, lang) + + block_dataset, sizes = self._load_single_lang_dataset( + split=language_split, epoch=epoch + ) + + dataset_map[lang] = MaskedLMDataset( + dataset=block_dataset, + sizes=sizes, + vocab=self.dictionary, + pad_idx=self.dictionary.pad(), + mask_idx=self.dictionary.mask(), + classif_token_idx=self.dictionary.eos(), + sep_token_idx=self.dictionary.eos(), + shuffle=getattr(self.args, "shuffle", False), + has_pairs=False, + segment_id=self.langs2id[lang], + seed=self.seed, + ) + + self.datasets[split] = MultiCorpusSampledDataset(dataset_map) + logger.info( + "{} {} {} examples".format( + utils.split_paths(self.args.data)[epoch - 1], + split, + len(self.datasets[split]), + ) + ) diff --git a/SpeechT5/fairseq/fairseq/tasks/denoising.py b/SpeechT5/fairseq/fairseq/tasks/denoising.py new file mode 100644 index 0000000000000000000000000000000000000000..cbf01e14dfad17ee8ab0ae1ca67c2458b84559cb --- /dev/null +++ b/SpeechT5/fairseq/fairseq/tasks/denoising.py @@ -0,0 +1,274 @@ +# Copyright (c) Facebook, Inc. and its affiliates. 
+# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +import logging +import os + +from fairseq import utils +from fairseq.data import ( + AppendTokenDataset, + DenoisingDataset, + Dictionary, + IdDataset, + NestedDictionaryDataset, + NumelDataset, + PadDataset, + PrependTokenDataset, + StripTokenDataset, + TokenBlockDataset, + data_utils, +) +from fairseq.data.encoders.utils import get_whole_word_mask +from fairseq.data.shorten_dataset import maybe_shorten_dataset +from fairseq.tasks import LegacyFairseqTask, register_task +import numpy as np + + +logger = logging.getLogger(__name__) + + +@register_task("denoising") +class DenoisingTask(LegacyFairseqTask): + """ + Denoising task for applying sequence to sequence denoising. (ie. BART) + """ + + @staticmethod + def add_args(parser): + """Add task-specific arguments to the parser.""" + parser.add_argument("data", help="path to data directory") + parser.add_argument( + "--tokens-per-sample", + default=512, + type=int, + help="max number of total tokens over all segments" + " per sample for dataset", + ) + parser.add_argument( + "--sample-break-mode", + default="complete_doc", + type=str, + help="mode for breaking sentence", + ) + parser.add_argument( + "--mask", + default=0.0, + type=float, + help="fraction of words/subwords that will be masked", + ) + parser.add_argument( + "--mask-random", + default=0.0, + type=float, + help="instead of using [MASK], use random token this often", + ) + parser.add_argument( + "--insert", + default=0.0, + type=float, + help="insert this percentage of additional random tokens", + ) + parser.add_argument( + "--permute", + default=0.0, + type=float, + help="take this proportion of subwords and permute them", + ) + parser.add_argument( + "--rotate", + default=0.5, + type=float, + help="rotate this proportion of inputs", + ) + parser.add_argument( + "--poisson-lambda", + default=3.0, + type=float, + help="randomly shuffle sentences for this proportion of inputs", + ) + parser.add_argument( + "--permute-sentences", + default=0.0, + type=float, + help="shuffle this proportion of sentences in all inputs", + ) + parser.add_argument( + "--mask-length", + default="subword", + type=str, + choices=["subword", "word", "span-poisson"], + help="mask length to choose", + ) + parser.add_argument( + "--replace-length", + default=-1, + type=int, + help="when masking N tokens, replace with 0, 1, or N tokens (use -1 for N)", + ) + parser.add_argument( + "--max-source-positions", + default=1024, + type=int, + metavar="N", + help="max number of tokens in the source sequence", + ) + parser.add_argument( + "--max-target-positions", + default=1024, + type=int, + metavar="N", + help="max number of tokens in the target sequence", + ) + + parser.add_argument( + "--shorten-method", + default="none", + choices=["none", "truncate", "random_crop"], + help="if not none, shorten sequences that exceed --tokens-per-sample", + ) + parser.add_argument( + "--shorten-data-split-list", + default="", + help="comma-separated list of dataset splits to apply shortening to, " + 'e.g., "train,valid" (default: all dataset splits)', + ) + + + def __init__(self, args, dictionary): + super().__init__(args) + self.dictionary = dictionary + self.seed = args.seed + + # add mask token + self.mask_idx = self.dictionary.add_symbol("<mask>") + + @classmethod + def setup_task(cls, args, **kwargs): + """Setup the task.""" + dictionary = Dictionary.load(os.path.join(args.data, "dict.txt")) + 
logger.info("dictionary: {} types".format(len(dictionary))) + if not hasattr(args, "shuffle_instance"): + args.shuffle_instance = False + return cls(args, dictionary) + + def load_dataset(self, split, epoch=1, combine=False, **kwargs): + """Load a given dataset split. + + Args: + split (str): name of the split (e.g., train, valid, test) + """ + paths = utils.split_paths(self.args.data) + assert len(paths) > 0 + data_path = paths[(epoch - 1) % len(paths)] + split_path = os.path.join(data_path, split) + + dataset = data_utils.load_indexed_dataset( + split_path, + self.dictionary, + self.args.dataset_impl, + combine=combine, + ) + if dataset is None: + raise FileNotFoundError( + "Dataset not found: {} ({})".format(split, split_path) + ) + + dataset = StripTokenDataset(dataset, self.dictionary.eos()) + + dataset = maybe_shorten_dataset( + dataset, + split, + self.args.shorten_data_split_list, + self.args.shorten_method, + self.args.tokens_per_sample, + self.args.seed, + ) + + # create continuous blocks of tokens + dataset = TokenBlockDataset( + dataset, + dataset.sizes, + self.args.tokens_per_sample - 2, # one less for <s> and one for </s> + pad=self.dictionary.pad(), + eos=self.dictionary.eos(), + break_mode=self.args.sample_break_mode, + document_sep_len=0, + ) + + # prepend beginning-of-sentence token (<s>, equiv. to [CLS] in BERT) + dataset = PrependTokenDataset(dataset, self.source_dictionary.bos()) + dataset = AppendTokenDataset(dataset, self.source_dictionary.eos()) + + mask_whole_words = ( + get_whole_word_mask(self.args, self.source_dictionary) + if self.args.mask_length != "subword" + else None + ) + + self.datasets[split] = DenoisingDataset( + dataset, + dataset.sizes, + self.dictionary, + self.mask_idx, + mask_whole_words, + shuffle=self.args.shuffle_instance, + seed=self.seed, + args=self.args, + ) + logger.info( + "Split: {0}, Loaded {1} samples of denoising_dataset".format( + split, + len(self.datasets[split]), + ) + ) + + def build_dataset_for_inference(self, src_tokens, src_lengths, **kwargs): + """ + Generate batches for inference. We assume that the input begins with a + bos symbol (`<s>`) and ends with an eos symbol (`</s>`). 
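+
+        A minimal illustrative call (hypothetical token ids standing for
+        ``<s> w1 w2 </s>``; ``task`` is a set-up DenoisingTask)::
+
+            src_tokens = [torch.LongTensor([0, 31, 42, 2])]
+            src_lengths = [4]
+            dataset = task.build_dataset_for_inference(src_tokens, src_lengths)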
+ """ + pad = self.source_dictionary.pad() + eos = self.source_dictionary.eos() + src_dataset = TokenBlockDataset( + src_tokens, + src_lengths, + block_size=self.args.tokens_per_sample - 2, # for <s> and </s> + pad=pad, + eos=eos, + break_mode=self.args.sample_break_mode, + document_sep_len=0, + ) + prev_output_tokens = PrependTokenDataset( + StripTokenDataset(src_dataset, eos), eos + ) + src_dataset = PadDataset(src_dataset, pad_idx=pad, left_pad=False) + return NestedDictionaryDataset( + { + "id": IdDataset(), + "net_input": { + "src_tokens": src_dataset, + "src_lengths": NumelDataset(src_dataset, reduce=False), + "prev_output_tokens": PadDataset( + prev_output_tokens, pad_idx=pad, left_pad=False + ), + }, + "target": src_dataset, + }, + sizes=[np.array(src_lengths)], + ) + + def max_positions(self): + """Return the max sentence length allowed by the task.""" + return (self.args.max_source_positions, self.args.max_target_positions) + + @property + def source_dictionary(self): + """Return the source :class:`~fairseq.data.Dictionary`.""" + return self.dictionary + + @property + def target_dictionary(self): + """Return the target :class:`~fairseq.data.Dictionary`.""" + return self.dictionary diff --git a/SpeechT5/fairseq/fairseq/tasks/fairseq_task.py b/SpeechT5/fairseq/fairseq/tasks/fairseq_task.py new file mode 100644 index 0000000000000000000000000000000000000000..fbec9bb2a557e97cb921b705846bde482d85f169 --- /dev/null +++ b/SpeechT5/fairseq/fairseq/tasks/fairseq_task.py @@ -0,0 +1,677 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +import logging +import os +import warnings +from argparse import Namespace +from typing import Any, Callable, Dict, List + +import torch +from fairseq import metrics, search, tokenizer, utils +from fairseq.data import Dictionary, FairseqDataset, data_utils, encoders, iterators +from fairseq.dataclass import FairseqDataclass +from fairseq.dataclass.utils import gen_parser_from_dataclass +from fairseq.optim.amp_optimizer import AMPOptimizer +from omegaconf import DictConfig + + +logger = logging.getLogger(__name__) + + +class StatefulContainer(object): + + _state: Dict[str, Any] = dict() + _factories: Dict[str, Callable[[], Any]] = dict() + + def add_factory(self, name, factory: Callable[[], Any]): + self._factories[name] = factory + + def merge_state_dict(self, state_dict: Dict[str, Any]): + self._state.update(state_dict) + + @property + def state_dict(self) -> Dict[str, Any]: + return self._state + + def __getattr__(self, name): + if name not in self._state and name in self._factories: + self._state[name] = self._factories[name]() + + if name in self._state: + return self._state[name] + + raise AttributeError(f"Task state has no factory for attribute {name}") + + +class FairseqTask(object): + """ + Tasks store dictionaries and provide helpers for loading/iterating over + Datasets, initializing the Model/Criterion and calculating the loss. + + Tasks have limited statefulness. In particular, state that needs to be + saved to/loaded from checkpoints needs to be stored in the `self.state` + :class:`StatefulContainer` object. For example:: + + self.state.add_factory("dictionary", self.load_dictionary) + print(self.state.dictionary) # calls self.load_dictionary() + + This is necessary so that when loading checkpoints, we can properly + recreate the task state after initializing the task instance. 
+ """ + + @classmethod + def add_args(cls, parser): + """Add task-specific arguments to the parser.""" + dc = getattr(cls, "__dataclass", None) + if dc is not None: + gen_parser_from_dataclass(parser, dc()) + + @staticmethod + def logging_outputs_can_be_summed(criterion) -> bool: + """ + Whether the logging outputs returned by `train_step` and `valid_step` can + be summed across workers prior to calling `aggregate_logging_outputs`. + Setting this to True will improves distributed training speed. + """ + return criterion.logging_outputs_can_be_summed() + + cfg: FairseqDataclass + datasets: Dict[str, FairseqDataset] + dataset_to_epoch_iter: Dict[FairseqDataset, Any] + state: StatefulContainer = None + + def __init__(self, cfg: FairseqDataclass, **kwargs): + self.cfg = cfg + self.datasets = dict() + self.dataset_to_epoch_iter = dict() + self.state = StatefulContainer() + + @classmethod + def load_dictionary(cls, filename): + """Load the dictionary from the filename + + Args: + filename (str): the filename + """ + return Dictionary.load(filename) + + @classmethod + def build_dictionary( + cls, filenames, workers=1, threshold=-1, nwords=-1, padding_factor=8 + ): + """Build the dictionary + + Args: + filenames (list): list of filenames + workers (int): number of concurrent workers + threshold (int): defines the minimum word count + nwords (int): defines the total number of words in the final dictionary, + including special symbols + padding_factor (int): can be used to pad the dictionary size to be a + multiple of 8, which is important on some hardware (e.g., Nvidia + Tensor Cores). + """ + d = Dictionary() + for filename in filenames: + Dictionary.add_file_to_dictionary( + filename, d, tokenizer.tokenize_line, workers + ) + d.finalize(threshold=threshold, nwords=nwords, padding_factor=padding_factor) + return d + + @classmethod + def setup_task(cls, cfg: DictConfig, **kwargs): + """Setup the task (e.g., load dictionaries). + + Args: + cfg (omegaconf.DictConfig): parsed command-line arguments + """ + return cls(cfg, **kwargs) + + def has_sharded_data(self, split): + return os.pathsep in getattr(self.cfg, "data", "") + + def load_dataset( + self, + split: str, + combine: bool = False, + task_cfg: FairseqDataclass = None, + **kwargs + ): + """Load a given dataset split. + + Args: + split (str): name of the split (e.g., train, valid, test) + combine (bool): combines a split segmented into pieces into one dataset + task_cfg (FairseqDataclass): optional task configuration stored in the checkpoint that can be used + to load datasets + """ + raise NotImplementedError + + def dataset(self, split): + """ + Return a loaded dataset split. + + Args: + split (str): name of the split (e.g., train, valid, test) + + Returns: + a :class:`~fairseq.data.FairseqDataset` corresponding to *split* + """ + from fairseq.data import FairseqDataset + + if split not in self.datasets: + raise KeyError("Dataset not loaded: " + split) + if not isinstance(self.datasets[split], FairseqDataset): + raise TypeError("Datasets are expected to be of type FairseqDataset") + return self.datasets[split] + + def filter_indices_by_size( + self, indices, dataset, max_positions=None, ignore_invalid_inputs=False + ): + """ + Filter examples that are too large + + Args: + indices (np.array): original array of sample indices + dataset (~fairseq.data.FairseqDataset): dataset to batch + max_positions (optional): max sentence length supported by the + model (default: None). 
+ ignore_invalid_inputs (bool, optional): don't raise Exception for + sentences that are too long (default: False). + Returns: + np.array: array of filtered sample indices + """ + indices, ignored = dataset.filter_indices_by_size(indices, max_positions) + if len(ignored) > 0: + if not ignore_invalid_inputs: + raise Exception( + ( + "Size of sample #{} is invalid (={}) since max_positions={}, " + "skip this example with --skip-invalid-size-inputs-valid-test" + ).format(ignored[0], dataset.size(ignored[0]), max_positions) + ) + logger.warning( + ( + "{:,} samples have invalid sizes and will be skipped, " + "max_positions={}, first few sample ids={}" + ).format(len(ignored), max_positions, ignored[:10]) + ) + return indices + + def can_reuse_epoch_itr(self, dataset): + # We can reuse the epoch iterator across epochs as long as the dataset + # hasn't disabled it. We default to ``False`` here, although in practice + # this will be ``True`` for most datasets that inherit from + # ``FairseqDataset`` due to the base implementation there. + return getattr(dataset, "can_reuse_epoch_itr_across_epochs", False) + + def get_batch_iterator( + self, + dataset, + max_tokens=None, + max_sentences=None, + max_positions=None, + ignore_invalid_inputs=False, + required_batch_size_multiple=1, + seed=1, + num_shards=1, + shard_id=0, + num_workers=0, + epoch=1, + data_buffer_size=0, + disable_iterator_cache=False, + ): + """ + Get an iterator that yields batches of data from the given dataset. + + Args: + dataset (~fairseq.data.FairseqDataset): dataset to batch + max_tokens (int, optional): max number of tokens in each batch + (default: None). + max_sentences (int, optional): max number of sentences in each + batch (default: None). + max_positions (optional): max sentence length supported by the + model (default: None). + ignore_invalid_inputs (bool, optional): don't raise Exception for + sentences that are too long (default: False). + required_batch_size_multiple (int, optional): require batch size to + be a multiple of N (default: 1). + seed (int, optional): seed for random number generator for + reproducibility (default: 1). + num_shards (int, optional): shard the data iterator into N + shards (default: 1). + shard_id (int, optional): which shard of the data iterator to + return (default: 0). + num_workers (int, optional): how many subprocesses to use for data + loading. 0 means the data will be loaded in the main process + (default: 0). + epoch (int, optional): the epoch to start the iterator from + (default: 1). + data_buffer_size (int, optional): number of batches to + preload (default: 0). + disable_iterator_cache (bool, optional): don't cache the + EpochBatchIterator (ignores `FairseqTask::can_reuse_epoch_itr`) + (default: False). 
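+
+        For example (illustrative values; ``task`` is an instantiated task
+        with a loaded "train" split)::
+
+            epoch_itr = task.get_batch_iterator(
+                dataset=task.dataset("train"),
+                max_tokens=4096,
+                seed=1,
+            )
+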
+ Returns: + ~fairseq.iterators.EpochBatchIterator: a batched iterator over the + given dataset split + """ + can_reuse_epoch_itr = not disable_iterator_cache and self.can_reuse_epoch_itr( + dataset + ) + if can_reuse_epoch_itr and dataset in self.dataset_to_epoch_iter: + logger.debug("reusing EpochBatchIterator for epoch {}".format(epoch)) + return self.dataset_to_epoch_iter[dataset] + + assert isinstance(dataset, FairseqDataset) + + # initialize the dataset with the correct starting epoch + dataset.set_epoch(epoch) + + # get indices ordered by example size + with data_utils.numpy_seed(seed): + indices = dataset.ordered_indices() + + # filter examples that are too large + if max_positions is not None: + indices = self.filter_indices_by_size( + indices, dataset, max_positions, ignore_invalid_inputs + ) + + # create mini-batches with given size constraints + batch_sampler = dataset.batch_by_size( + indices, + max_tokens=max_tokens, + max_sentences=max_sentences, + required_batch_size_multiple=required_batch_size_multiple, + ) + + # return a reusable, sharded iterator + epoch_iter = iterators.EpochBatchIterator( + dataset=dataset, + collate_fn=dataset.collater, + batch_sampler=batch_sampler, + seed=seed, + num_shards=num_shards, + shard_id=shard_id, + num_workers=num_workers, + epoch=epoch, + buffer_size=data_buffer_size, + ) + + if can_reuse_epoch_itr: + self.dataset_to_epoch_iter[dataset] = epoch_iter + + return epoch_iter + + def build_model(self, cfg: FairseqDataclass): + """ + Build the :class:`~fairseq.models.BaseFairseqModel` instance for this + task. + + Args: + cfg (FairseqDataclass): configuration object + + Returns: + a :class:`~fairseq.models.BaseFairseqModel` instance + """ + from fairseq import models, quantization_utils + + model = models.build_model(cfg, self) + model = quantization_utils.quantize_model_scalar(model, cfg) + return model + + def build_criterion(self, cfg: DictConfig): + """ + Build the :class:`~fairseq.criterions.FairseqCriterion` instance for + this task. + + Args: + cfg (omegaconf.DictConfig): configration object + + Returns: + a :class:`~fairseq.criterions.FairseqCriterion` instance + """ + from fairseq import criterions + + return criterions.build_criterion(cfg, self) + + def build_generator( + self, models, args, seq_gen_cls=None, extra_gen_cls_kwargs=None, prefix_allowed_tokens_fn=None, + ): + """ + Build a :class:`~fairseq.SequenceGenerator` instance for this + task. + + Args: + models (List[~fairseq.models.FairseqModel]): ensemble of models + args (fairseq.dataclass.configs.GenerationConfig): + configuration object (dataclass) for generation + extra_gen_cls_kwargs (Dict[str, Any]): extra options to pass + through to SequenceGenerator + prefix_allowed_tokens_fn (Callable[[int, torch.Tensor], List[int]]): + If provided, this function constrains the beam search to + allowed tokens only at each step. The provided function + should take 2 arguments: the batch ID (`batch_id: int`) + and a unidimensional tensor of token ids (`inputs_ids: + torch.Tensor`). It has to return a `List[int]` with the + allowed tokens for the next generation step conditioned + on the previously generated tokens (`inputs_ids`) and + the batch ID (`batch_id`). This argument is useful for + constrained generation conditioned on the prefix, as + described in "Autoregressive Entity Retrieval" + (https://arxiv.org/abs/2010.00904) and + https://github.com/facebookresearch/GENRE. 
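+
+                A minimal sketch of such a constraint function (the token
+                ids are hypothetical)::
+
+                    def prefix_allowed_tokens_fn(batch_id: int, inputs_ids: torch.Tensor):
+                        # allow the same small set of token ids at every step
+                        return [5, 6, 7]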
+ """ + if getattr(args, "score_reference", False): + from fairseq.sequence_scorer import SequenceScorer + + return SequenceScorer( + self.target_dictionary, + compute_alignment=getattr(args, "print_alignment", False), + ) + + from fairseq.sequence_generator import ( + SequenceGenerator, + SequenceGeneratorWithAlignment, + ) + try: + from fairseq.fb_sequence_generator import FBSequenceGenerator + except ModuleNotFoundError: + pass + + # Choose search strategy. Defaults to Beam Search. + sampling = getattr(args, "sampling", False) + sampling_topk = getattr(args, "sampling_topk", -1) + sampling_topp = getattr(args, "sampling_topp", -1.0) + diverse_beam_groups = getattr(args, "diverse_beam_groups", -1) + diverse_beam_strength = getattr(args, "diverse_beam_strength", 0.5) + match_source_len = getattr(args, "match_source_len", False) + diversity_rate = getattr(args, "diversity_rate", -1) + constrained = getattr(args, "constraints", False) + if prefix_allowed_tokens_fn is None: + prefix_allowed_tokens_fn = getattr(args, "prefix_allowed_tokens_fn", None) + if ( + sum( + int(cond) + for cond in [ + sampling, + diverse_beam_groups > 0, + match_source_len, + diversity_rate > 0, + ] + ) + > 1 + ): + raise ValueError("Provided Search parameters are mutually exclusive.") + assert sampling_topk < 0 or sampling, "--sampling-topk requires --sampling" + assert sampling_topp < 0 or sampling, "--sampling-topp requires --sampling" + + if sampling: + search_strategy = search.Sampling( + self.target_dictionary, sampling_topk, sampling_topp + ) + elif diverse_beam_groups > 0: + search_strategy = search.DiverseBeamSearch( + self.target_dictionary, diverse_beam_groups, diverse_beam_strength + ) + elif match_source_len: + # this is useful for tagging applications where the output + # length should match the input length, so we hardcode the + # length constraints for simplicity + search_strategy = search.LengthConstrainedBeamSearch( + self.target_dictionary, + min_len_a=1, + min_len_b=0, + max_len_a=1, + max_len_b=0, + ) + elif diversity_rate > -1: + search_strategy = search.DiverseSiblingsSearch( + self.target_dictionary, diversity_rate + ) + elif constrained: + search_strategy = search.LexicallyConstrainedBeamSearch( + self.target_dictionary, args.constraints + ) + elif prefix_allowed_tokens_fn: + search_strategy = search.PrefixConstrainedBeamSearch( + self.target_dictionary, prefix_allowed_tokens_fn + ) + else: + search_strategy = search.BeamSearch(self.target_dictionary) + + extra_gen_cls_kwargs = extra_gen_cls_kwargs or {} + if seq_gen_cls is None: + if getattr(args, "print_alignment", False): + seq_gen_cls = SequenceGeneratorWithAlignment + extra_gen_cls_kwargs["print_alignment"] = args.print_alignment + elif getattr(args, "fb_seq_gen", False): + seq_gen_cls = FBSequenceGenerator + else: + seq_gen_cls = SequenceGenerator + + return seq_gen_cls( + models, + self.target_dictionary, + beam_size=getattr(args, "beam", 5), + max_len_a=getattr(args, "max_len_a", 0), + max_len_b=getattr(args, "max_len_b", 200), + min_len=getattr(args, "min_len", 1), + normalize_scores=(not getattr(args, "unnormalized", False)), + len_penalty=getattr(args, "lenpen", 1), + unk_penalty=getattr(args, "unkpen", 0), + temperature=getattr(args, "temperature", 1.0), + match_source_len=getattr(args, "match_source_len", False), + no_repeat_ngram_size=getattr(args, "no_repeat_ngram_size", 0), + search_strategy=search_strategy, + **extra_gen_cls_kwargs, + ) + + def train_step( + self, sample, model, criterion, optimizer, update_num, 
ignore_grad=False + ): + """ + Do forward and backward, and return the loss as computed by *criterion* + for the given *model* and *sample*. + + Args: + sample (dict): the mini-batch. The format is defined by the + :class:`~fairseq.data.FairseqDataset`. + model (~fairseq.models.BaseFairseqModel): the model + criterion (~fairseq.criterions.FairseqCriterion): the criterion + optimizer (~fairseq.optim.FairseqOptimizer): the optimizer + update_num (int): the current update + ignore_grad (bool): multiply loss by 0 if this is set to True + + Returns: + tuple: + - the loss + - the sample size, which is used as the denominator for the + gradient + - logging outputs to display while training + """ + model.train() + model.set_num_updates(update_num) + with torch.autograd.profiler.record_function("forward"): + with torch.cuda.amp.autocast(enabled=(isinstance(optimizer, AMPOptimizer))): + loss, sample_size, logging_output = criterion(model, sample) + if ignore_grad: + loss *= 0 + with torch.autograd.profiler.record_function("backward"): + optimizer.backward(loss) + return loss, sample_size, logging_output + + def valid_step(self, sample, model, criterion): + model.eval() + with torch.no_grad(): + loss, sample_size, logging_output = criterion(model, sample) + return loss, sample_size, logging_output + + def optimizer_step(self, optimizer, model, update_num): + optimizer.step() + + def build_dataset_for_inference( + self, src_tokens: List[torch.Tensor], src_lengths: List[int], **kwargs + ) -> torch.utils.data.Dataset: + raise NotImplementedError + + def inference_step( + self, generator, models, sample, prefix_tokens=None, constraints=None + ): + with torch.no_grad(): + return generator.generate( + models, sample, prefix_tokens=prefix_tokens, constraints=constraints + ) + + def begin_epoch(self, epoch, model): + """Hook function called before the start of each epoch.""" + pass + + def begin_valid_epoch(self, epoch, model): + """Hook function called before the start of each validation epoch.""" + pass + + def aggregate_logging_outputs(self, logging_outputs, criterion): + """[deprecated] Aggregate logging outputs from data parallel training.""" + utils.deprecation_warning( + "The aggregate_logging_outputs API is deprecated. " + "Please use the reduce_metrics API instead." + ) + with metrics.aggregate() as agg: + self.reduce_metrics(logging_outputs, criterion) + return agg.get_smoothed_values() + + def reduce_metrics(self, logging_outputs, criterion): + """Aggregate logging outputs from data parallel training.""" + # backward compatibility for tasks that override aggregate_logging_outputs + base_func = FairseqTask.aggregate_logging_outputs + self_func = getattr(self, "aggregate_logging_outputs").__func__ + if self_func is not base_func: + utils.deprecation_warning( + "Tasks should implement the reduce_metrics API. " + "Falling back to deprecated aggregate_logging_outputs API." 
+ ) + agg_logging_outputs = self.aggregate_logging_outputs( + logging_outputs, criterion + ) + for k, v in agg_logging_outputs.items(): + metrics.log_scalar(k, v) + return + + if not any("ntokens" in log for log in logging_outputs): + warnings.warn( + "ntokens not found in Criterion logging outputs, cannot log wpb or wps" + ) + else: + ntokens = sum(log.get("ntokens", 0) for log in logging_outputs) + metrics.log_scalar("wpb", ntokens, priority=180, round=1) + metrics.log_speed("wps", ntokens, priority=90, round=1) + + if not any("nsentences" in log for log in logging_outputs): + warnings.warn( + "nsentences not found in Criterion logging outputs, cannot log bsz" + ) + else: + nsentences = sum(log.get("nsentences", 0) for log in logging_outputs) + metrics.log_scalar("bsz", nsentences, priority=190, round=1) + + criterion.__class__.reduce_metrics(logging_outputs) + + def state_dict(self): + if self.state is not None: + return self.state.state_dict + return {} + + def load_state_dict(self, state_dict: Dict[str, Any]): + if self.state is not None: + self.state.merge_state_dict(state_dict) + + def max_positions(self): + """Return the max input length allowed by the task.""" + return None + + @property + def source_dictionary(self): + """Return the source :class:`~fairseq.data.Dictionary` (if applicable + for this task).""" + raise NotImplementedError + + @property + def target_dictionary(self): + """Return the target :class:`~fairseq.data.Dictionary` (if applicable + for this task).""" + raise NotImplementedError + + def build_tokenizer(self, args): + """Build the pre-tokenizer for this task.""" + return encoders.build_tokenizer(args) + + def build_bpe(self, args): + """Build the tokenizer for this task.""" + return encoders.build_bpe(args) + + def get_interactive_tokens_and_lengths(self, lines, encode_fn): + tokens = [ + self.source_dictionary.encode_line( + encode_fn(src_str), add_if_not_exist=False + ).long() + for src_str in lines + ] + lengths = [t.numel() for t in tokens] + return tokens, lengths + + +class LegacyFairseqTask(FairseqTask): + def __init__(self, args: Namespace): + self.args = args + self.datasets = {} + self.dataset_to_epoch_iter = {} + + @classmethod + def setup_task(cls, args: Namespace, **kwargs): + """Setup the task (e.g., load dictionaries). + + Args: + args (argparse.Namespace): parsed command-line arguments + """ + return cls(args, **kwargs) + + def has_sharded_data(self, split): + return os.pathsep in getattr(self.args, "data", "") + + def build_model(self, args: Namespace): + """ + Build the :class:`~fairseq.models.BaseFairseqModel` instance for this + task. + + Args: + args (argparse.Namespace): parsed command-line arguments + + Returns: + a :class:`~fairseq.models.BaseFairseqModel` instance + """ + from fairseq import models, quantization_utils + + model = models.build_model(args, self) + model = quantization_utils.quantize_model_scalar(model, args) + return model + + def build_criterion(self, args: Namespace): + """ + Build the :class:`~fairseq.criterions.FairseqCriterion` instance for + this task. 
+ + Args: + args (argparse.Namespace): parsed command-line arguments + + Returns: + a :class:`~fairseq.criterions.FairseqCriterion` instance + """ + from fairseq import criterions + + return criterions.build_criterion(args, self) diff --git a/SpeechT5/fairseq/fairseq/tasks/hubert_pretraining.py b/SpeechT5/fairseq/fairseq/tasks/hubert_pretraining.py new file mode 100644 index 0000000000000000000000000000000000000000..ee3fedce3fcbf7a72282cfd19b6330bdcbdf2cef --- /dev/null +++ b/SpeechT5/fairseq/fairseq/tasks/hubert_pretraining.py @@ -0,0 +1,193 @@ +# Copyright (c) 2017-present, Facebook, Inc. +# All rights reserved. +# +# This source code is licensed under the license found in the LICENSE file in +# the root directory of this source tree. An additional grant of patent rights +# can be found in the PATENTS file in the same directory. + +import logging +import os +import sys +from typing import Dict, List, Optional, Tuple + +import numpy as np + +from dataclasses import dataclass, field +from fairseq.data import Dictionary, HubertDataset +from fairseq.dataclass.configs import FairseqDataclass +from fairseq.tasks import register_task +from fairseq.tasks.fairseq_task import FairseqTask +from omegaconf import MISSING + +logger = logging.getLogger(__name__) + + +class LabelEncoder(object): + def __init__(self, dictionary: Dictionary) -> None: + self.dictionary = dictionary + + def __call__(self, label: str) -> List[str]: + return self.dictionary.encode_line( + label, append_eos=False, add_if_not_exist=False, + ) + + +@dataclass +class HubertPretrainingConfig(FairseqDataclass): + data: str = field( + default=MISSING, metadata={"help": "path to data directory"} + ) + fine_tuning: bool = field( + default=False, metadata={"help": "set to true if fine-tuning Hubert"} + ) + labels: List[str] = field( + default_factory=lambda: ["ltr"], + metadata={ + "help": ( + "extension of the label files to load, frame-level labels for" + " pre-training, and sequence-level label for fine-tuning" + ) + }, + ) + label_dir: Optional[str] = field( + default=None, + metadata={ + "help": "if set, looks for labels in this directory instead", + }, + ) + label_rate: int = field( + default=-1, + metadata={"help": "label frame rate. -1 for sequence label"}, + ) + sample_rate: int = field( + default=16_000, + metadata={ + "help": "target sample rate. 
audio files will be up/down " + "sampled to this rate" + }, + ) + normalize: bool = field( + default=False, + metadata={ + "help": "if set, normalizes input to have 0 mean and unit variance" + }, + ) + enable_padding: bool = field( + default=False, + metadata={"help": "pad shorter samples instead of cropping"}, + ) + max_sample_size: Optional[int] = field( + default=None, + metadata={"help": "max sample size to crop to for batching"}, + ) + min_sample_size: Optional[int] = field( + default=None, + metadata={"help": "min sample size to crop to for batching"}, + ) + single_target: Optional[bool] = field( + default=False, + metadata={ + "help": "if set, AddTargetDatasets outputs same keys " + "as AddTargetDataset" + }, + ) + random_crop: Optional[bool] = field( + default=True, + metadata={"help": "always crop from the beginning if false"}, + ) + pad_audio: Optional[bool] = field( + default=False, + metadata={"help": "pad audio to the longest one in the batch if true"}, + ) + + +@register_task("hubert_pretraining", dataclass=HubertPretrainingConfig) +class HubertPretrainingTask(FairseqTask): + + cfg: HubertPretrainingConfig + + def __init__( + self, + cfg: HubertPretrainingConfig, + ) -> None: + super().__init__(cfg) + + logger.info(f"current directory is {os.getcwd()}") + logger.info(f"HubertPretrainingTask Config {cfg}") + + self.cfg = cfg + self.fine_tuning = cfg.fine_tuning + + if cfg.fine_tuning: + self.state.add_factory("target_dictionary", self.load_dictionaries) + else: + self.state.add_factory("dictionaries", self.load_dictionaries) + + self._source_dictionary = None + + self.blank_symbol = "<s>" + + @property + def source_dictionary(self) -> Optional[Dictionary]: + return self._source_dictionary + + @property + def target_dictionary(self) -> Optional[Dictionary]: + return self.state.target_dictionary + + @property + def dictionaries(self) -> List[Dictionary]: + return self.state.dictionaries + + @classmethod + def setup_task( + cls, cfg: HubertPretrainingConfig, **kwargs + ) -> "HubertPretrainingTask": + return cls(cfg) + + def load_dictionaries(self): + label_dir = self.cfg.data if self.cfg.label_dir is None else self.cfg.label_dir + dictionaries = [Dictionary.load(f"{label_dir}/dict.{label}.txt") for label in self.cfg.labels] + return dictionaries[0] if self.cfg.fine_tuning else dictionaries + + def get_label_dir(self) -> str: + if self.cfg.label_dir is None: + return self.cfg.data + return self.cfg.label_dir + + def load_dataset(self, split: str, **kwargs) -> None: + manifest = f"{self.cfg.data}/{split}.tsv" + dicts = [self.target_dictionary] if self.cfg.fine_tuning else self.dictionaries + pad_list = [dict.pad() for dict in dicts] + eos_list = [dict.eos() for dict in dicts] + procs = [LabelEncoder(dict) for dict in dicts] + paths = [ + f"{self.get_label_dir()}/{split}.{l}" for l in self.cfg.labels + ] + + # hubert v1: pad_audio=True, random_crop=False; + self.datasets[split] = HubertDataset( + manifest, + sample_rate=self.cfg.sample_rate, + label_paths=paths, + label_rates=self.cfg.label_rate, + pad_list=pad_list, + eos_list=eos_list, + label_processors=procs, + max_keep_sample_size=None, + min_keep_sample_size=self.cfg.min_sample_size, + max_sample_size=self.cfg.max_sample_size, + pad_audio=self.cfg.pad_audio, + normalize=self.cfg.normalize, + store_labels=False, + random_crop=self.cfg.random_crop, + single_target=self.cfg.single_target, + ) + + def max_positions(self) -> Tuple[int, int]: + return (sys.maxsize, sys.maxsize) + + def filter_indices_by_size( + self, indices: 
np.array, *args, **kwargs + ) -> np.array: + return indices diff --git a/SpeechT5/fairseq/fairseq/tasks/language_modeling.py b/SpeechT5/fairseq/fairseq/tasks/language_modeling.py new file mode 100644 index 0000000000000000000000000000000000000000..4b76a51c61d71c4358de07bdd4eb3f93894737a8 --- /dev/null +++ b/SpeechT5/fairseq/fairseq/tasks/language_modeling.py @@ -0,0 +1,379 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +import logging +import os +from dataclasses import dataclass, field +from typing import Optional + +import numpy as np +import torch +from fairseq import utils +from fairseq.data import ( + AppendTokenDataset, + Dictionary, + IdDataset, + LMContextWindowDataset, + MonolingualDataset, + NestedDictionaryDataset, + NumelDataset, + PadDataset, + PrependTokenDataset, + StripTokenDataset, + TokenBlockDataset, + TruncatedDictionary, + data_utils, +) +from fairseq.data.indexed_dataset import get_available_dataset_impl +from fairseq.data.shorten_dataset import maybe_shorten_dataset +from fairseq.dataclass import ChoiceEnum, FairseqDataclass +from fairseq.tasks import LegacyFairseqTask, register_task +from omegaconf import II + + +SAMPLE_BREAK_MODE_CHOICES = ChoiceEnum(["none", "complete", "complete_doc", "eos"]) +SHORTEN_METHOD_CHOICES = ChoiceEnum(["none", "truncate", "random_crop"]) +logger = logging.getLogger(__name__) + + +@dataclass +class LanguageModelingConfig(FairseqDataclass): + data: Optional[str] = field( + default=None, metadata={"help": "path to data directory"} + ) + sample_break_mode: SAMPLE_BREAK_MODE_CHOICES = field( + default="none", + metadata={ + "help": 'If omitted or "none", fills each sample with tokens-per-sample ' + 'tokens. If set to "complete", splits samples only at the end ' + "of sentence, but may include multiple sentences per sample. " + '"complete_doc" is similar but respects doc boundaries. ' + 'If set to "eos", includes only one sentence per sample.' 
+ }, + ) + tokens_per_sample: int = field( + default=1024, + metadata={"help": "max number of tokens per sample for LM dataset"}, + ) + output_dictionary_size: int = field( + default=-1, metadata={"help": "limit the size of output dictionary"} + ) + self_target: bool = field(default=False, metadata={"help": "include self target"}) + future_target: bool = field( + default=False, metadata={"help": "include future target"} + ) + past_target: bool = field(default=False, metadata={"help": "include past target"}) + add_bos_token: bool = field( + default=False, metadata={"help": "prepend beginning of sentence token (<s>)"} + ) + max_target_positions: Optional[int] = field( + default=None, metadata={"help": "max number of tokens in the target sequence"} + ) + shorten_method: SHORTEN_METHOD_CHOICES = field( + default="none", + metadata={ + "help": "if not none, shorten sequences that exceed --tokens-per-sample" + }, + ) + shorten_data_split_list: str = field( + default="", + metadata={ + "help": "comma-separated list of dataset splits to apply shortening to, " + 'e.g., "train,valid" (default: all dataset splits)' + }, + ) + pad_to_fixed_length: Optional[bool] = field( + default=False, metadata={"help": "pad to fixed length"}, + ) + pad_to_fixed_bsz: Optional[bool] = field( + default=False, metadata={"help": "boolean to pad to fixed batch size"}, + ) + + # TODO common vars below add to parent + seed: int = II("common.seed") + batch_size: Optional[int] = II("dataset.batch_size") + batch_size_valid: Optional[int] = II("dataset.batch_size_valid") + dataset_impl: Optional[ChoiceEnum(get_available_dataset_impl())] = II( + "dataset.dataset_impl" + ) + data_buffer_size: int = II("dataset.data_buffer_size") + tpu: bool = II("common.tpu") + use_plasma_view: bool = II("common.use_plasma_view") + plasma_path: str = II("common.plasma_path") + + +@register_task("language_modeling", dataclass=LanguageModelingConfig) +class LanguageModelingTask(LegacyFairseqTask): + """ + Train a language model. + + Args: + dictionary (~fairseq.data.Dictionary): the dictionary for the input of + the language model + output_dictionary (~fairseq.data.Dictionary): the dictionary for the + output of the language model. In most cases it will be the same as + *dictionary*, but could possibly be a more limited version of the + dictionary (if ``--output-dictionary-size`` is used). + targets (List[str]): list of the target types that the language model + should predict. Can be one of "self", "future", and "past". + Defaults to "future". + + .. note:: + + The language modeling task is compatible with :mod:`fairseq-train`, + :mod:`fairseq-generate`, :mod:`fairseq-interactive` and + :mod:`fairseq-eval-lm`. + + The language modeling task provides the following additional command-line + arguments: + + .. 
argparse:: + :ref: fairseq.tasks.language_modeling_parser + :prog: + """ + + def __init__(self, args, dictionary, output_dictionary=None, targets=None): + super().__init__(args) + self.dictionary = dictionary + self.output_dictionary = output_dictionary or dictionary + + if targets is None: + targets = ["future"] + self.targets = targets + + @classmethod + def setup_dictionary(cls, args, **kwargs): + dictionary = None + output_dictionary = None + if args.data: + paths = utils.split_paths(args.data) + assert len(paths) > 0 + dictionary = Dictionary.load(os.path.join(paths[0], "dict.txt")) + logger.info("dictionary: {} types".format(len(dictionary))) + output_dictionary = dictionary + if args.output_dictionary_size >= 0: + output_dictionary = TruncatedDictionary( + dictionary, args.output_dictionary_size + ) + return (dictionary, output_dictionary) + + @classmethod + def setup_task(cls, args, **kwargs): + """Setup the task (e.g., load dictionaries). + + Args: + args (argparse.Namespace): parsed command-line arguments + """ + dictionary, output_dictionary = cls.setup_dictionary(args, **kwargs) + + # upgrade old checkpoints + if getattr(args, "exclude_self_target", False): + args.self_target = False + + targets = [] + if getattr(args, "self_target", False): + targets.append("self") + if getattr(args, "future_target", False): + targets.append("future") + if getattr(args, "past_target", False): + targets.append("past") + if len(targets) == 0: + # standard language modeling + targets = ["future"] + + return cls(args, dictionary, output_dictionary, targets=targets) + + def build_model(self, args): + model = super().build_model(args) + for target in self.targets: + if target not in model.supported_targets: + raise ValueError( + "Unsupported language modeling target: {}".format(target) + ) + + return model + + def load_dataset( + self, split: str, epoch=1, combine=False, **kwargs + ) -> MonolingualDataset: + """Load a given dataset split. 
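+
+        The indexed dataset found under ``data_path/split`` is optionally
+        shortened (``shorten_method``), chunked into contiguous blocks of
+        ``tokens_per_sample`` tokens by ``TokenBlockDataset`` according to
+        ``sample_break_mode``, and wrapped in a ``MonolingualDataset`` that
+        yields source/target pairs for the targets configured on this task.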
+ + Args: + split (str): name of the split (e.g., train, valid, valid1, test) + """ + paths = utils.split_paths(self.args.data) + assert len(paths) > 0 + + data_path = paths[(epoch - 1) % len(paths)] + split_path = os.path.join(data_path, split) + + # each process has its own copy of the raw data (likely to be an np.memmap) + dataset = data_utils.load_indexed_dataset( + split_path, self.dictionary, self.args.dataset_impl, combine=combine + ) + if dataset is None: + raise FileNotFoundError(f"Dataset not found: {split} ({split_path})") + + dataset = maybe_shorten_dataset( + dataset, + split, + self.args.shorten_data_split_list, + self.args.shorten_method, + self.args.tokens_per_sample, + self.args.seed, + ) + dataset = TokenBlockDataset( + dataset, + dataset.sizes, + self.args.tokens_per_sample, + pad=self.dictionary.pad(), + eos=self.dictionary.eos(), + break_mode=self.args.sample_break_mode, + include_targets=True, + use_plasma_view=self.args.use_plasma_view, + split_path=split_path, + plasma_path=self.args.plasma_path, + ) + + add_eos_for_other_targets = ( + self.args.sample_break_mode is not None + and self.args.sample_break_mode != "none" + ) + fixed_pad_length = None + if self.args.pad_to_fixed_length: + fixed_pad_length = self.args.tokens_per_sample + + pad_to_bsz = None + if self.args.pad_to_fixed_bsz: + pad_to_bsz = self.args.batch_size_valid if 'valid' in split else self.args.batch_size + + self.datasets[split] = MonolingualDataset( + dataset=dataset, + sizes=dataset.sizes, + src_vocab=self.dictionary, + tgt_vocab=self.output_dictionary, + add_eos_for_other_targets=add_eos_for_other_targets, + shuffle=True, + targets=self.targets, + add_bos_token=self.args.add_bos_token, + fixed_pad_length=fixed_pad_length, + pad_to_bsz=pad_to_bsz, + ) + + def build_dataset_for_inference(self, src_tokens, src_lengths, **kwargs): + """ + Generate batches for inference. We prepend an eos token to src_tokens + (or bos if `--add-bos-token` is set) and we append a <pad> to target. + This is convenient both for generation with a prefix and LM scoring. 
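+
+        For example (illustrative), a tokenized prefix ``a b c </s>`` roughly
+        becomes ``src_tokens = </s> a b c`` (``<s> a b c`` when
+        ``--add-bos-token`` is set) and ``target = a b c <pad>``.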
+ """ + dataset = StripTokenDataset( + TokenBlockDataset( + src_tokens, + src_lengths, + block_size=None, # ignored for "eos" break mode + pad=self.source_dictionary.pad(), + eos=self.source_dictionary.eos(), + break_mode="eos", + ), + # remove eos from (end of) target sequence + self.source_dictionary.eos(), + ) + src_dataset = PrependTokenDataset( + dataset, + token=( + self.source_dictionary.bos() + if getattr(self.args, "add_bos_token", False) + else self.source_dictionary.eos() + ), + ) + tgt_dataset = AppendTokenDataset(dataset, token=self.source_dictionary.pad()) + return NestedDictionaryDataset( + { + "id": IdDataset(), + "net_input": { + "src_tokens": PadDataset( + src_dataset, + pad_idx=self.source_dictionary.pad(), + left_pad=False, + ), + "src_lengths": NumelDataset(src_dataset, reduce=False), + }, + "target": PadDataset( + tgt_dataset, pad_idx=self.source_dictionary.pad(), left_pad=False + ), + }, + sizes=[np.array(src_lengths)], + ) + + def inference_step( + self, generator, models, sample, prefix_tokens=None, constraints=None + ): + with torch.no_grad(): + # Generation will always be conditioned on bos_token + if getattr(self.args, "add_bos_token", False): + bos_token = self.source_dictionary.bos() + else: + bos_token = self.source_dictionary.eos() + + if constraints is not None: + raise NotImplementedError( + "Constrained decoding with the language_modeling task is not supported" + ) + + # SequenceGenerator doesn't use src_tokens directly, we need to + # pass the `prefix_tokens` argument instead + if prefix_tokens is None and sample["net_input"]["src_tokens"].nelement(): + prefix_tokens = sample["net_input"]["src_tokens"] + if prefix_tokens[:, 0].eq(bos_token).all(): + prefix_tokens = prefix_tokens[:, 1:] + + return generator.generate( + models, sample, prefix_tokens=prefix_tokens, bos_token=bos_token + ) + + def eval_lm_dataloader( + self, + dataset, + max_tokens: Optional[int] = 36000, + batch_size: Optional[int] = None, + max_positions: Optional[int] = None, + num_shards: int = 1, + shard_id: int = 0, + num_workers: int = 1, + data_buffer_size: int = 10, + # ensures that every evaluated token has access to a context of at least + # this size, if possible + context_window: int = 0, + ): + if context_window > 0: + dataset = LMContextWindowDataset( + dataset=dataset, + tokens_per_sample=self.args.tokens_per_sample, + context_window=context_window, + pad_idx=self.source_dictionary.pad(), + ) + return self.get_batch_iterator( + dataset=dataset, + max_tokens=max_tokens, + max_sentences=batch_size, + max_positions=max_positions, + ignore_invalid_inputs=True, + num_shards=num_shards, + shard_id=shard_id, + num_workers=num_workers, + data_buffer_size=data_buffer_size, + ).next_epoch_itr(shuffle=False) + + @property + def source_dictionary(self): + """Return the :class:`~fairseq.data.Dictionary` for the language + model.""" + return self.dictionary + + @property + def target_dictionary(self): + """Return the :class:`~fairseq.data.Dictionary` for the language + model.""" + return self.output_dictionary diff --git a/SpeechT5/fairseq/fairseq/tasks/legacy_masked_lm.py b/SpeechT5/fairseq/fairseq/tasks/legacy_masked_lm.py new file mode 100644 index 0000000000000000000000000000000000000000..975497654926b64fff6c4960f54c4e6932e7fce1 --- /dev/null +++ b/SpeechT5/fairseq/fairseq/tasks/legacy_masked_lm.py @@ -0,0 +1,152 @@ +# Copyright (c) Facebook, Inc. and its affiliates. 
+# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +import itertools +import logging +import os + +import numpy as np +from fairseq import tokenizer, utils +from fairseq.data import ConcatDataset, Dictionary, data_utils, indexed_dataset +from fairseq.data.legacy.block_pair_dataset import BlockPairDataset +from fairseq.data.legacy.masked_lm_dataset import MaskedLMDataset +from fairseq.data.legacy.masked_lm_dictionary import BertDictionary +from fairseq.tasks import LegacyFairseqTask, register_task + + +logger = logging.getLogger(__name__) + + +@register_task("legacy_masked_lm") +class LegacyMaskedLMTask(LegacyFairseqTask): + """ + Task for training Masked LM (BERT) model. + Args: + dictionary (Dictionary): the dictionary for the input of the task + """ + + @staticmethod + def add_args(parser): + """Add task-specific arguments to the parser.""" + parser.add_argument( + "data", + help="colon separated path to data directories list, \ + will be iterated upon during epochs in round-robin manner", + ) + parser.add_argument( + "--tokens-per-sample", + default=512, + type=int, + help="max number of total tokens over all segments" + " per sample for BERT dataset", + ) + parser.add_argument( + "--break-mode", default="doc", type=str, help="mode for breaking sentence" + ) + parser.add_argument("--shuffle-dataset", action="store_true", default=False) + + def __init__(self, args, dictionary): + super().__init__(args) + self.dictionary = dictionary + self.seed = args.seed + + @classmethod + def load_dictionary(cls, filename): + return BertDictionary.load(filename) + + @classmethod + def build_dictionary( + cls, filenames, workers=1, threshold=-1, nwords=-1, padding_factor=8 + ): + d = BertDictionary() + for filename in filenames: + Dictionary.add_file_to_dictionary( + filename, d, tokenizer.tokenize_line, workers + ) + d.finalize(threshold=threshold, nwords=nwords, padding_factor=padding_factor) + return d + + @property + def target_dictionary(self): + return self.dictionary + + @classmethod + def setup_task(cls, args, **kwargs): + """Setup the task.""" + paths = utils.split_paths(args.data) + assert len(paths) > 0 + dictionary = BertDictionary.load(os.path.join(paths[0], "dict.txt")) + logger.info("dictionary: {} types".format(len(dictionary))) + + return cls(args, dictionary) + + def load_dataset(self, split, epoch=1, combine=False): + """Load a given dataset split. 
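+
+        Shards named ``split``, ``split1``, ``split2``, ... are loaded in turn
+        (all of them when ``combine`` is True, otherwise only the first),
+        paired into sentence blocks with ``BlockPairDataset`` and wrapped in a
+        ``MaskedLMDataset`` that applies the BERT-style masking.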
+ + Args: + split (str): name of the split (e.g., train, valid, test) + """ + loaded_datasets = [] + + paths = utils.split_paths(self.args.data) + assert len(paths) > 0 + data_path = paths[(epoch - 1) % len(paths)] + logger.info("data_path", data_path) + + for k in itertools.count(): + split_k = split + (str(k) if k > 0 else "") + path = os.path.join(data_path, split_k) + ds = indexed_dataset.make_dataset( + path, + impl=self.args.dataset_impl, + fix_lua_indexing=True, + dictionary=self.dictionary, + ) + + if ds is None: + if k > 0: + break + else: + raise FileNotFoundError( + "Dataset not found: {} ({})".format(split, data_path) + ) + + with data_utils.numpy_seed(self.seed + k): + loaded_datasets.append( + BlockPairDataset( + ds, + self.dictionary, + ds.sizes, + self.args.tokens_per_sample, + break_mode=self.args.break_mode, + doc_break_size=1, + ) + ) + + logger.info( + "{} {} {} examples".format(data_path, split_k, len(loaded_datasets[-1])) + ) + + if not combine: + break + + if len(loaded_datasets) == 1: + dataset = loaded_datasets[0] + sizes = dataset.sizes + else: + dataset = ConcatDataset(loaded_datasets) + sizes = np.concatenate([ds.sizes for ds in loaded_datasets]) + + self.datasets[split] = MaskedLMDataset( + dataset=dataset, + sizes=sizes, + vocab=self.dictionary, + pad_idx=self.dictionary.pad(), + mask_idx=self.dictionary.mask(), + classif_token_idx=self.dictionary.cls(), + sep_token_idx=self.dictionary.sep(), + shuffle=self.args.shuffle_dataset, + seed=self.seed, + ) diff --git a/SpeechT5/fairseq/fairseq/tasks/masked_lm.py b/SpeechT5/fairseq/fairseq/tasks/masked_lm.py new file mode 100644 index 0000000000000000000000000000000000000000..fd2ea6ade15e94f963045db8cc1a20d4a3d5c7c1 --- /dev/null +++ b/SpeechT5/fairseq/fairseq/tasks/masked_lm.py @@ -0,0 +1,258 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +import logging +import os + +import numpy as np +from fairseq import utils +from fairseq.data import ( + Dictionary, + IdDataset, + MaskTokensDataset, + NestedDictionaryDataset, + NumelDataset, + NumSamplesDataset, + PrependTokenDataset, + RightPadDataset, + SortDataset, + TokenBlockDataset, + data_utils, +) +from fairseq.data.encoders.utils import get_whole_word_mask +from fairseq.data.shorten_dataset import maybe_shorten_dataset +from fairseq.tasks import LegacyFairseqTask, register_task + + +logger = logging.getLogger(__name__) + + +@register_task("masked_lm") +class MaskedLMTask(LegacyFairseqTask): + """Task for training masked language models (e.g., BERT, RoBERTa).""" + + @staticmethod + def add_args(parser): + """Add task-specific arguments to the parser.""" + parser.add_argument( + "data", + help="colon separated path to data directories list, \ + will be iterated upon during epochs in round-robin manner", + ) + parser.add_argument( + "--sample-break-mode", + default="complete", + choices=["none", "complete", "complete_doc", "eos"], + help='If omitted or "none", fills each sample with tokens-per-sample ' + 'tokens. If set to "complete", splits samples only at the end ' + "of sentence, but may include multiple sentences per sample. " + '"complete_doc" is similar but respects doc boundaries. 
' + 'If set to "eos", includes only one sentence per sample.', + ) + parser.add_argument( + "--tokens-per-sample", + default=512, + type=int, + help="max number of total tokens over all segments " + "per sample for BERT dataset", + ) + parser.add_argument( + "--mask-prob", + default=0.15, + type=float, + help="probability of replacing a token with mask", + ) + parser.add_argument( + "--leave-unmasked-prob", + default=0.1, + type=float, + help="probability that a masked token is unmasked", + ) + parser.add_argument( + "--random-token-prob", + default=0.1, + type=float, + help="probability of replacing a token with a random token", + ) + parser.add_argument( + "--freq-weighted-replacement", + default=False, + action="store_true", + help="sample random replacement words based on word frequencies", + ) + parser.add_argument( + "--mask-whole-words", + default=False, + action="store_true", + help="mask whole words; you may also want to set --bpe", + ) + parser.add_argument( + "--mask-multiple-length", + default=1, + type=int, + help="repeat the mask indices multiple times", + ) + parser.add_argument( + "--mask-stdev", default=0.0, type=float, help="stdev of the mask length" + ) + parser.add_argument( + "--shorten-method", + default="none", + choices=["none", "truncate", "random_crop"], + help="if not none, shorten sequences that exceed --tokens-per-sample", + ) + parser.add_argument( + "--shorten-data-split-list", + default="", + help="comma-separated list of dataset splits to apply shortening to, " + 'e.g., "train,valid" (default: all dataset splits)', + ) + + def __init__(self, args, dictionary): + super().__init__(args) + self.dictionary = dictionary + self.seed = args.seed + + # add mask token + self.mask_idx = dictionary.add_symbol("<mask>") + + @classmethod + def setup_task(cls, args, **kwargs): + paths = utils.split_paths(args.data) + assert len(paths) > 0 + dictionary = Dictionary.load(os.path.join(paths[0], "dict.txt")) + logger.info("dictionary: {} types".format(len(dictionary))) + return cls(args, dictionary) + + def load_dataset(self, split, epoch=1, combine=False, **kwargs): + """Load a given dataset split. + + Args: + split (str): name of the split (e.g., train, valid, test) + """ + paths = utils.split_paths(self.args.data) + assert len(paths) > 0 + data_path = paths[(epoch - 1) % len(paths)] + split_path = os.path.join(data_path, split) + + dataset = data_utils.load_indexed_dataset( + split_path, + self.source_dictionary, + self.args.dataset_impl, + combine=combine, + ) + if dataset is None: + raise FileNotFoundError( + "Dataset not found: {} ({})".format(split, split_path) + ) + + dataset = maybe_shorten_dataset( + dataset, + split, + self.args.shorten_data_split_list, + self.args.shorten_method, + self.args.tokens_per_sample, + self.args.seed, + ) + + # create continuous blocks of tokens + dataset = TokenBlockDataset( + dataset, + dataset.sizes, + self.args.tokens_per_sample - 1, # one less for <s> + pad=self.source_dictionary.pad(), + eos=self.source_dictionary.eos(), + break_mode=self.args.sample_break_mode, + ) + logger.info("loaded {} blocks from: {}".format(len(dataset), split_path)) + + # prepend beginning-of-sentence token (<s>, equiv. 
to [CLS] in BERT) + dataset = PrependTokenDataset(dataset, self.source_dictionary.bos()) + + # create masked input and targets + mask_whole_words = ( + get_whole_word_mask(self.args, self.source_dictionary) + if self.args.mask_whole_words + else None + ) + + src_dataset, tgt_dataset = MaskTokensDataset.apply_mask( + dataset, + self.source_dictionary, + pad_idx=self.source_dictionary.pad(), + mask_idx=self.mask_idx, + seed=self.args.seed, + mask_prob=self.args.mask_prob, + leave_unmasked_prob=self.args.leave_unmasked_prob, + random_token_prob=self.args.random_token_prob, + freq_weighted_replacement=self.args.freq_weighted_replacement, + mask_whole_words=mask_whole_words, + mask_multiple_length=self.args.mask_multiple_length, + mask_stdev=self.args.mask_stdev, + ) + + with data_utils.numpy_seed(self.args.seed): + shuffle = np.random.permutation(len(src_dataset)) + + self.datasets[split] = SortDataset( + NestedDictionaryDataset( + { + "id": IdDataset(), + "net_input": { + "src_tokens": RightPadDataset( + src_dataset, + pad_idx=self.source_dictionary.pad(), + ), + "src_lengths": NumelDataset(src_dataset, reduce=False), + }, + "target": RightPadDataset( + tgt_dataset, + pad_idx=self.source_dictionary.pad(), + ), + "nsentences": NumSamplesDataset(), + "ntokens": NumelDataset(src_dataset, reduce=True), + }, + sizes=[src_dataset.sizes], + ), + sort_order=[ + shuffle, + src_dataset.sizes, + ], + ) + + def build_dataset_for_inference(self, src_tokens, src_lengths, sort=True): + src_dataset = RightPadDataset( + TokenBlockDataset( + src_tokens, + src_lengths, + self.args.tokens_per_sample - 1, # one less for <s> + pad=self.source_dictionary.pad(), + eos=self.source_dictionary.eos(), + break_mode="eos", + ), + pad_idx=self.source_dictionary.pad(), + ) + src_dataset = PrependTokenDataset(src_dataset, self.source_dictionary.bos()) + src_dataset = NestedDictionaryDataset( + { + "id": IdDataset(), + "net_input": { + "src_tokens": src_dataset, + "src_lengths": NumelDataset(src_dataset, reduce=False), + }, + }, + sizes=src_lengths, + ) + if sort: + src_dataset = SortDataset(src_dataset, sort_order=[src_lengths]) + return src_dataset + + @property + def source_dictionary(self): + return self.dictionary + + @property + def target_dictionary(self): + return self.dictionary diff --git a/SpeechT5/fairseq/fairseq/tasks/multilingual_denoising.py b/SpeechT5/fairseq/fairseq/tasks/multilingual_denoising.py new file mode 100644 index 0000000000000000000000000000000000000000..d1c914917feb5165aad7482cd1377f5f65b21635 --- /dev/null +++ b/SpeechT5/fairseq/fairseq/tasks/multilingual_denoising.py @@ -0,0 +1,254 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. 
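+
+# Multilingual variant of the denoising task: one DenoisingDataset is built per
+# language directory, an optional language token is appended (--add-lang-token),
+# and the training split is re-sampled across languages according to
+# --multilang-sampling-alpha (see load_dataset below).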
+ +import logging +import os + +import numpy as np +from fairseq.data import ( + AppendTokenDataset, + ConcatDataset, + DenoisingDataset, + Dictionary, + PrependTokenDataset, + ResamplingDataset, + SortDataset, + TokenBlockDataset, + data_utils, +) +from fairseq.data.encoders.utils import get_whole_word_mask +from fairseq.tasks import register_task + +from .denoising import DenoisingTask + + +logger = logging.getLogger(__name__) + + +@register_task("multilingual_denoising") +class MultilingualDenoisingTask(DenoisingTask): + @staticmethod + def add_args(parser): + DenoisingTask.add_args(parser) + parser.add_argument( + "--multilang-sampling-alpha", + type=float, + default=1.0, + help="smoothing alpha for sample ratios across multiple datasets", + ) + parser.add_argument("--add-lang-token", default=False, action="store_true") + parser.add_argument( + "--langs", type=str, help="language ids we are considering", default=None + ) + parser.add_argument( + "--no-whole-word-mask-langs", + type=str, + default="", + metavar="N", + help="languages without spacing between words dont support whole word masking", + ) + + @classmethod + def setup_task(cls, args, **kwargs): + """Setup the task.""" + paths = args.data.split(":") + assert len(paths) > 0 + dictionary = Dictionary.load(os.path.join(paths[0], "dict.txt")) + + data_path = paths[0] + if args.langs is None: + languages = sorted( + [ + name + for name in os.listdir(data_path) + if os.path.isdir(os.path.join(data_path, name)) + ] + ) + else: + languages = args.langs.split(",") + + if args.add_lang_token: + for lang in languages: + dictionary.add_symbol("[{}]".format(lang)) + + logger.info("dictionary: {} types".format(len(dictionary))) + if not hasattr(args, "shuffle_instance"): + args.shuffle_instance = False + return cls(args, dictionary) + + def __init__(self, args, dictionary): + super().__init__(args, dictionary) + self.dictionary = dictionary + self.seed = args.seed + + # add mask token + self.mask_idx = self.dictionary.add_symbol("<mask>") + self.langs = args.langs + self.args = args + + def _get_sample_prob(self, dataset_lens): + """ + Get smoothed sampling porbability by languages. This helps low resource + languages by upsampling them. + """ + prob = dataset_lens / dataset_lens.sum() + smoothed_prob = prob ** self.args.multilang_sampling_alpha + smoothed_prob = smoothed_prob / smoothed_prob.sum() + return smoothed_prob + + def load_dataset(self, split, epoch=1, combine=False, **kwargs): + """Load a given dataset split. 
+ + Args: + split (str): name of the split (e.g., train, valid, test) + """ + paths = self.args.data.split(":") + assert len(paths) > 0 + data_path = paths[(epoch - 1) % len(paths)] + split_path = os.path.join(data_path, split) + + if self.langs is None: + languages = sorted( + [ + name + for name in os.listdir(data_path) + if os.path.isdir(os.path.join(data_path, name)) + ] + ) + else: + languages = self.langs.split(",") + for name in languages: + p = os.path.join(data_path, name) + assert os.path.exists(p), "data not found: {}".format(p) + + logger.info("Training on {0} languages: {1}".format(len(languages), languages)) + logger.info( + "Language to id mapping: ", {lang: id for id, lang in enumerate(languages)} + ) + + mask_whole_words = get_whole_word_mask(self.args, self.dictionary) + language_without_segmentations = self.args.no_whole_word_mask_langs.split(",") + lang_datasets = [] + for language in languages: + split_path = os.path.join(data_path, language, split) + + dataset = data_utils.load_indexed_dataset( + split_path, + self.source_dictionary, + self.args.dataset_impl, + combine=combine, + ) + if dataset is None: + raise FileNotFoundError( + "Dataset not found: {} ({})".format(split, split_path) + ) + + end_token = ( + self.source_dictionary.index("[{}]".format(language)) + if self.args.add_lang_token + else self.source_dictionary.eos() + ) + + # create continuous blocks of tokens + dataset = TokenBlockDataset( + dataset, + dataset.sizes, + self.args.tokens_per_sample - 2, # one less for <s> + pad=self.source_dictionary.pad(), + eos=end_token, + break_mode=self.args.sample_break_mode, + ) + logger.info("loaded {} blocks from: {}".format(len(dataset), split_path)) + + # prepend beginning-of-sentence token (<s>, equiv. to [CLS] in BERT) + dataset = PrependTokenDataset(dataset, self.source_dictionary.bos()) + dataset = AppendTokenDataset(dataset, end_token) + + lang_mask_whole_words = ( + mask_whole_words + if language not in language_without_segmentations + else None + ) + lang_dataset = DenoisingDataset( + dataset, + dataset.sizes, + self.dictionary, + self.mask_idx, + lang_mask_whole_words, + shuffle=self.args.shuffle_instance, + seed=self.seed, + args=self.args, + eos=None + if not self.args.add_lang_token + else self.source_dictionary.index("[{}]".format(language)), + ) + lang_datasets.append(lang_dataset) + + dataset_lengths = np.array( + [len(d) for d in lang_datasets], + dtype=float, + ) + logger.info( + "loaded total {} blocks for all languages".format( + int(dataset_lengths.sum()), + ) + ) + if split == self.args.train_subset: + # For train subset, additionally up or down sample languages. 
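+            # Temperature-based resampling: with p_i = n_i / sum_j n_j, the
+            # smoothed probability is p_i^alpha / sum_j p_j^alpha and
+            # size_ratio_i = smoothed_i * total / n_i, so alpha < 1 upsamples
+            # low-resource languages. Illustrative (hypothetical) numbers: two
+            # languages with 9M and 1M blocks and alpha = 0.5 give smoothed
+            # probabilities of 0.75 / 0.25 and size ratios of ~0.83 / ~2.5.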
+ sample_probs = self._get_sample_prob(dataset_lengths) + logger.info( + "Sample probability by language: {}".format( + { + lang: "{0:.4f}".format(sample_probs[id]) + for id, lang in enumerate(languages) + } + ) + ) + size_ratio = (sample_probs * dataset_lengths.sum()) / dataset_lengths + logger.info( + "Up/Down Sampling ratio by language: {}".format( + { + lang: "{0:.2f}".format(size_ratio[id]) + for id, lang in enumerate(languages) + } + ) + ) + + resampled_lang_datasets = [ + ResamplingDataset( + lang_datasets[i], + size_ratio=size_ratio[i], + seed=self.args.seed, + epoch=epoch, + replace=size_ratio[i] >= 1.0, + ) + for i, d in enumerate(lang_datasets) + ] + dataset = ConcatDataset( + resampled_lang_datasets, + ) + else: + dataset = ConcatDataset(lang_datasets) + lang_splits = [split] + for lang_id, lang_dataset in enumerate(lang_datasets): + split_name = split + "_" + languages[lang_id] + lang_splits.append(split_name) + self.datasets[split_name] = lang_dataset + + if split in self.args.valid_subset: + self.args.valid_subset = self.args.valid_subset.replace( + split, ",".join(lang_splits) + ) + + with data_utils.numpy_seed(self.args.seed + epoch): + shuffle = np.random.permutation(len(dataset)) + + self.datasets[split] = SortDataset( + dataset, + sort_order=[ + shuffle, + dataset.sizes, + ], + ) diff --git a/SpeechT5/fairseq/fairseq/tasks/multilingual_masked_lm.py b/SpeechT5/fairseq/fairseq/tasks/multilingual_masked_lm.py new file mode 100644 index 0000000000000000000000000000000000000000..9e6ce4b8a2f77ed889a6e1451321a8e3ac21dc67 --- /dev/null +++ b/SpeechT5/fairseq/fairseq/tasks/multilingual_masked_lm.py @@ -0,0 +1,338 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +import logging +import os + +import numpy as np +import torch +from fairseq import utils +from fairseq.data import ( + ConcatDataset, + Dictionary, + IdDataset, + MaskTokensDataset, + NestedDictionaryDataset, + NumelDataset, + NumSamplesDataset, + PadDataset, + PrependTokenDataset, + RawLabelDataset, + ResamplingDataset, + SortDataset, + TokenBlockDataset, + data_utils, + encoders, +) +from fairseq.tasks import LegacyFairseqTask, register_task + + +logger = logging.getLogger(__name__) + + +@register_task("multilingual_masked_lm") +class MultiLingualMaskedLMTask(LegacyFairseqTask): + """Task for training masked language models (e.g., BERT, RoBERTa).""" + + @staticmethod + def add_args(parser): + """Add task-specific arguments to the parser.""" + parser.add_argument( + "data", + help="colon separated path to data directories list, \ + will be iterated upon during epochs in round-robin manner", + ) + parser.add_argument( + "--sample-break-mode", + default="complete", + choices=["none", "complete", "complete_doc", "eos"], + help='If omitted or "none", fills each sample with tokens-per-sample ' + 'tokens. If set to "complete", splits samples only at the end ' + "of sentence, but may include multiple sentences per sample. " + '"complete_doc" is similar but respects doc boundaries. 
' + 'If set to "eos", includes only one sentence per sample.', + ) + parser.add_argument( + "--tokens-per-sample", + default=512, + type=int, + help="max number of total tokens over all segments " + "per sample for BERT dataset", + ) + parser.add_argument( + "--mask-prob", + default=0.15, + type=float, + help="probability of replacing a token with mask", + ) + parser.add_argument( + "--leave-unmasked-prob", + default=0.1, + type=float, + help="probability that a masked token is unmasked", + ) + parser.add_argument( + "--random-token-prob", + default=0.1, + type=float, + help="probability of replacing a token with a random token", + ) + parser.add_argument( + "--freq-weighted-replacement", + action="store_true", + help="sample random replacement words based on word frequencies", + ) + parser.add_argument( + "--mask-whole-words", + default=False, + action="store_true", + help="mask whole words; you may also want to set --bpe", + ) + parser.add_argument( + "--multilang-sampling-alpha", + type=float, + default=1.0, + help="smoothing alpha for sample rations across multiple datasets", + ) + + def __init__(self, args, dictionary): + super().__init__(args) + self.dictionary = dictionary + self.seed = args.seed + + # add mask token + self.mask_idx = dictionary.add_symbol("<mask>") + + @classmethod + def setup_task(cls, args, **kwargs): + paths = utils.split_paths(args.data) + assert len(paths) > 0 + dictionary = Dictionary.load(os.path.join(paths[0], "dict.txt")) + logger.info("dictionary: {} types".format(len(dictionary))) + return cls(args, dictionary) + + def _get_whole_word_mask(self): + # create masked input and targets + if self.args.mask_whole_words: + bpe = encoders.build_bpe(self.args) + if bpe is not None: + + def is_beginning_of_word(i): + if i < self.source_dictionary.nspecial: + # special elements are always considered beginnings + return True + tok = self.source_dictionary[i] + if tok.startswith("madeupword"): + return True + try: + return bpe.is_beginning_of_word(tok) + except ValueError: + return True + + mask_whole_words = torch.ByteTensor( + list(map(is_beginning_of_word, range(len(self.source_dictionary)))) + ) + else: + mask_whole_words = None + return mask_whole_words + + def _get_sample_prob(self, dataset_lens): + """ + Get smoothed sampling porbability by languages. This helps low resource + languages by upsampling them. + """ + prob = dataset_lens / dataset_lens.sum() + smoothed_prob = prob ** self.args.multilang_sampling_alpha + smoothed_prob = smoothed_prob / smoothed_prob.sum() + return smoothed_prob + + def load_dataset(self, split, epoch=1, combine=False, **kwargs): + """Load a given dataset split. 
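+
+        For every language directory under the data path, the indexed dataset
+        is chunked into blocks, prepended with ``<s>``, masked via
+        ``MaskTokensDataset`` and tagged with a per-language ``lang_id``; the
+        train subset is then re-sampled across languages according to
+        ``--multilang-sampling-alpha``, while other splits are concatenated
+        and additionally registered per language as ``<split>_<lang>`` subsets.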
+ + Args: + split (str): name of the split (e.g., train, valid, test) + """ + paths = utils.split_paths(self.args.data) + assert len(paths) > 0 + data_path = paths[(epoch - 1) % len(paths)] + + languages = sorted( + name + for name in os.listdir(data_path) + if os.path.isdir(os.path.join(data_path, name)) + ) + + logger.info("Training on {0} languages: {1}".format(len(languages), languages)) + logger.info( + "Language to id mapping: ", {lang: id for id, lang in enumerate(languages)} + ) + + mask_whole_words = self._get_whole_word_mask() + lang_datasets = [] + for lang_id, language in enumerate(languages): + split_path = os.path.join(data_path, language, split) + + dataset = data_utils.load_indexed_dataset( + split_path, + self.source_dictionary, + self.args.dataset_impl, + combine=combine, + ) + if dataset is None: + raise FileNotFoundError( + "Dataset not found: {} ({})".format(split, split_path) + ) + + # create continuous blocks of tokens + dataset = TokenBlockDataset( + dataset, + dataset.sizes, + self.args.tokens_per_sample - 1, # one less for <s> + pad=self.source_dictionary.pad(), + eos=self.source_dictionary.eos(), + break_mode=self.args.sample_break_mode, + ) + logger.info("loaded {} blocks from: {}".format(len(dataset), split_path)) + + # prepend beginning-of-sentence token (<s>, equiv. to [CLS] in BERT) + dataset = PrependTokenDataset(dataset, self.source_dictionary.bos()) + + src_dataset, tgt_dataset = MaskTokensDataset.apply_mask( + dataset, + self.source_dictionary, + pad_idx=self.source_dictionary.pad(), + mask_idx=self.mask_idx, + seed=self.args.seed, + mask_prob=self.args.mask_prob, + leave_unmasked_prob=self.args.leave_unmasked_prob, + random_token_prob=self.args.random_token_prob, + freq_weighted_replacement=self.args.freq_weighted_replacement, + mask_whole_words=mask_whole_words, + ) + + lang_dataset = NestedDictionaryDataset( + { + "net_input": { + "src_tokens": PadDataset( + src_dataset, + pad_idx=self.source_dictionary.pad(), + left_pad=False, + ), + "src_lengths": NumelDataset(src_dataset, reduce=False), + }, + "target": PadDataset( + tgt_dataset, + pad_idx=self.source_dictionary.pad(), + left_pad=False, + ), + "nsentences": NumSamplesDataset(), + "ntokens": NumelDataset(src_dataset, reduce=True), + "lang_id": RawLabelDataset([lang_id] * src_dataset.sizes.shape[0]), + }, + sizes=[src_dataset.sizes], + ) + lang_datasets.append(lang_dataset) + + dataset_lengths = np.array( + [len(d) for d in lang_datasets], + dtype=float, + ) + logger.info( + "loaded total {} blocks for all languages".format( + dataset_lengths.sum(), + ) + ) + if split == self.args.train_subset: + # For train subset, additionally up or down sample languages. 
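+            # Note: each ResamplingDataset created below resizes its language to
+            # roughly size_ratio * len(dataset) per epoch, drawing with
+            # replacement only for upsampled languages (size_ratio >= 1.0).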
+ sample_probs = self._get_sample_prob(dataset_lengths) + logger.info( + "Sample probability by language: ", + { + lang: "{0:.4f}".format(sample_probs[id]) + for id, lang in enumerate(languages) + }, + ) + size_ratio = (sample_probs * dataset_lengths.sum()) / dataset_lengths + logger.info( + "Up/Down Sampling ratio by language: ", + { + lang: "{0:.2f}".format(size_ratio[id]) + for id, lang in enumerate(languages) + }, + ) + + resampled_lang_datasets = [ + ResamplingDataset( + lang_datasets[i], + size_ratio=size_ratio[i], + seed=self.args.seed, + epoch=epoch, + replace=size_ratio[i] >= 1.0, + ) + for i, d in enumerate(lang_datasets) + ] + dataset = ConcatDataset(resampled_lang_datasets) + else: + dataset = ConcatDataset(lang_datasets) + lang_splits = [split] + for lang_id, lang_dataset in enumerate(lang_datasets): + split_name = split + "_" + languages[lang_id] + lang_splits.append(split_name) + self.datasets[split_name] = lang_dataset + + # [TODO]: This is hacky for now to print validation ppl for each + # language individually. Maybe need task API changes to allow it + # in more generic ways. + if split in self.args.valid_subset: + self.args.valid_subset = self.args.valid_subset.replace( + split, ",".join(lang_splits) + ) + + with data_utils.numpy_seed(self.args.seed + epoch): + shuffle = np.random.permutation(len(dataset)) + + self.datasets[split] = SortDataset( + dataset, + sort_order=[ + shuffle, + dataset.sizes, + ], + ) + + def build_dataset_for_inference(self, src_tokens, src_lengths, sort=True): + src_dataset = PadDataset( + TokenBlockDataset( + src_tokens, + src_lengths, + self.args.tokens_per_sample - 1, # one less for <s> + pad=self.source_dictionary.pad(), + eos=self.source_dictionary.eos(), + break_mode="eos", + ), + pad_idx=self.source_dictionary.pad(), + left_pad=False, + ) + src_dataset = PrependTokenDataset(src_dataset, self.source_dictionary.bos()) + src_dataset = NestedDictionaryDataset( + { + "id": IdDataset(), + "net_input": { + "src_tokens": src_dataset, + "src_lengths": NumelDataset(src_dataset, reduce=False), + }, + }, + sizes=src_lengths, + ) + if sort: + src_dataset = SortDataset(src_dataset, sort_order=[src_lengths]) + return src_dataset + + @property + def source_dictionary(self): + return self.dictionary + + @property + def target_dictionary(self): + return self.dictionary diff --git a/SpeechT5/fairseq/fairseq/tasks/multilingual_translation.py b/SpeechT5/fairseq/fairseq/tasks/multilingual_translation.py new file mode 100644 index 0000000000000000000000000000000000000000..26e0b529d5f2902bd80c8207a001ae28af393291 --- /dev/null +++ b/SpeechT5/fairseq/fairseq/tasks/multilingual_translation.py @@ -0,0 +1,457 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +import contextlib +import logging +import os +from collections import OrderedDict + +import torch +from fairseq import metrics, options, utils +from fairseq.data import ( + Dictionary, + LanguagePairDataset, + RoundRobinZipDatasets, + TransformEosLangPairDataset, +) +from fairseq.models import FairseqMultiModel +from fairseq.tasks.translation import load_langpair_dataset + +from . 
import LegacyFairseqTask, register_task + + +logger = logging.getLogger(__name__) + + +def _lang_token(lang: str): + return "__{}__".format(lang) + + +def _lang_token_index(dic: Dictionary, lang: str): + """Return language token index.""" + idx = dic.index(_lang_token(lang)) + assert idx != dic.unk_index, "cannot find language token for lang {}".format(lang) + return idx + + +@register_task("multilingual_translation") +class MultilingualTranslationTask(LegacyFairseqTask): + """A task for training multiple translation models simultaneously. + + We iterate round-robin over batches from multiple language pairs, ordered + according to the `--lang-pairs` argument. + + The training loop is roughly: + + for i in range(len(epoch)): + for lang_pair in args.lang_pairs: + batch = next_batch_for_lang_pair(lang_pair) + loss = criterion(model_for_lang_pair(lang_pair), batch) + loss.backward() + optimizer.step() + + In practice, `next_batch_for_lang_pair` is abstracted in a FairseqDataset + (e.g., `RoundRobinZipDatasets`) and `model_for_lang_pair` is a model that + implements the `FairseqMultiModel` interface. + + During inference it is required to specify a single `--source-lang` and + `--target-lang`, which indicates the inference langauge direction. + `--lang-pairs`, `--encoder-langtok`, `--decoder-langtok` have to be set to + the same value as training. + """ + + @staticmethod + def add_args(parser): + """Add task-specific arguments to the parser.""" + # fmt: off + parser.add_argument('data', metavar='DIR', help='path to data directory') + parser.add_argument('--lang-pairs', default=None, metavar='PAIRS', + help='comma-separated list of language pairs (in training order): en-de,en-fr,de-fr') + parser.add_argument('-s', '--source-lang', default=None, metavar='SRC', + help='source language (only needed for inference)') + parser.add_argument('-t', '--target-lang', default=None, metavar='TARGET', + help='target language (only needed for inference)') + parser.add_argument('--left-pad-source', default='True', type=str, metavar='BOOL', + help='pad the source on the left (default: True)') + parser.add_argument('--left-pad-target', default='False', type=str, metavar='BOOL', + help='pad the target on the left (default: False)') + parser.add_argument('--max-source-positions', default=1024, type=int, metavar='N', + help='max number of tokens in the source sequence') + parser.add_argument('--max-target-positions', default=1024, type=int, metavar='N', + help='max number of tokens in the target sequence') + parser.add_argument('--upsample-primary', default=1, type=int, + help='amount to upsample primary dataset') + parser.add_argument('--encoder-langtok', default=None, type=str, choices=['src', 'tgt'], + metavar='SRCTGT', + help='replace beginning-of-sentence in source sentence with source or target ' + 'language token. (src/tgt)') + parser.add_argument('--decoder-langtok', action='store_true', + help='replace beginning-of-sentence in target sentence with target language token') + # fmt: on + + def __init__(self, args, dicts, training): + super().__init__(args) + self.dicts = dicts + self.training = training + if training: + self.lang_pairs = args.lang_pairs + else: + self.lang_pairs = ["{}-{}".format(args.source_lang, args.target_lang)] + # eval_lang_pairs for multilingual translation is usually all of the + # lang_pairs. However for other multitask settings or when we want to + # optimize for certain languages we want to use a different subset. 
Thus + # the eval_lang_pairs class variable is provided for classes that extend + # this class. + self.eval_lang_pairs = self.lang_pairs + # model_lang_pairs will be used to build encoder-decoder model pairs in + # models.build_model(). This allows multitask type of sub-class can + # build models other than the input lang_pairs + self.model_lang_pairs = self.lang_pairs + self.langs = list(dicts.keys()) + + @classmethod + def setup_task(cls, args, **kwargs): + dicts, training = cls.prepare(args, **kwargs) + return cls(args, dicts, training) + + @classmethod + def update_args(cls, args): + args.left_pad_source = utils.eval_bool(args.left_pad_source) + args.left_pad_target = utils.eval_bool(args.left_pad_target) + + if args.lang_pairs is None: + raise ValueError( + "--lang-pairs is required. List all the language pairs in the training objective." + ) + if isinstance(args.lang_pairs, str): + args.lang_pairs = args.lang_pairs.split(",") + + @classmethod + def prepare(cls, args, **kargs): + cls.update_args(args) + sorted_langs = sorted( + list({x for lang_pair in args.lang_pairs for x in lang_pair.split("-")}) + ) + if args.source_lang is not None or args.target_lang is not None: + training = False + else: + training = True + + # load dictionaries + dicts = OrderedDict() + for lang in sorted_langs: + paths = utils.split_paths(args.data) + assert len(paths) > 0 + dicts[lang] = cls.load_dictionary( + os.path.join(paths[0], "dict.{}.txt".format(lang)) + ) + if len(dicts) > 0: + assert dicts[lang].pad() == dicts[sorted_langs[0]].pad() + assert dicts[lang].eos() == dicts[sorted_langs[0]].eos() + assert dicts[lang].unk() == dicts[sorted_langs[0]].unk() + if args.encoder_langtok is not None or args.decoder_langtok: + for lang_to_add in sorted_langs: + dicts[lang].add_symbol(_lang_token(lang_to_add)) + logger.info("[{}] dictionary: {} types".format(lang, len(dicts[lang]))) + return dicts, training + + def get_encoder_langtok(self, src_lang, tgt_lang): + if self.args.encoder_langtok is None: + return self.dicts[src_lang].eos() + if self.args.encoder_langtok == "src": + return _lang_token_index(self.dicts[src_lang], src_lang) + else: + return _lang_token_index(self.dicts[src_lang], tgt_lang) + + def get_decoder_langtok(self, tgt_lang): + if not self.args.decoder_langtok: + return self.dicts[tgt_lang].eos() + return _lang_token_index(self.dicts[tgt_lang], tgt_lang) + + def alter_dataset_langtok( + self, + lang_pair_dataset, + src_eos=None, + src_lang=None, + tgt_eos=None, + tgt_lang=None, + ): + if self.args.encoder_langtok is None and not self.args.decoder_langtok: + return lang_pair_dataset + + new_src_eos = None + if ( + self.args.encoder_langtok is not None + and src_eos is not None + and src_lang is not None + and tgt_lang is not None + ): + new_src_eos = self.get_encoder_langtok(src_lang, tgt_lang) + else: + src_eos = None + + new_tgt_bos = None + if self.args.decoder_langtok and tgt_eos is not None and tgt_lang is not None: + new_tgt_bos = self.get_decoder_langtok(tgt_lang) + else: + tgt_eos = None + + return TransformEosLangPairDataset( + lang_pair_dataset, + src_eos=src_eos, + new_src_eos=new_src_eos, + tgt_bos=tgt_eos, + new_tgt_bos=new_tgt_bos, + ) + + def load_dataset(self, split, epoch=1, **kwargs): + """Load a dataset split.""" + paths = utils.split_paths(self.args.data) + assert len(paths) > 0 + data_path = paths[(epoch - 1) % len(paths)] + + def language_pair_dataset(lang_pair): + src, tgt = lang_pair.split("-") + langpair_dataset = load_langpair_dataset( + data_path, + split, + src, + 
self.dicts[src], + tgt, + self.dicts[tgt], + combine=True, + dataset_impl=self.args.dataset_impl, + upsample_primary=self.args.upsample_primary, + left_pad_source=self.args.left_pad_source, + left_pad_target=self.args.left_pad_target, + max_source_positions=self.args.max_source_positions, + max_target_positions=self.args.max_target_positions, + ) + return self.alter_dataset_langtok( + langpair_dataset, + src_eos=self.dicts[src].eos(), + src_lang=src, + tgt_eos=self.dicts[tgt].eos(), + tgt_lang=tgt, + ) + + self.datasets[split] = RoundRobinZipDatasets( + OrderedDict( + [ + (lang_pair, language_pair_dataset(lang_pair)) + for lang_pair in self.lang_pairs + ] + ), + eval_key=None + if self.training + else "%s-%s" % (self.args.source_lang, self.args.target_lang), + ) + + def build_dataset_for_inference(self, src_tokens, src_lengths, constraints=None): + if constraints is not None: + raise NotImplementedError( + "Constrained decoding with the multilingual_translation task is not supported" + ) + + lang_pair = "%s-%s" % (self.args.source_lang, self.args.target_lang) + return RoundRobinZipDatasets( + OrderedDict( + [ + ( + lang_pair, + self.alter_dataset_langtok( + LanguagePairDataset( + src_tokens, src_lengths, self.source_dictionary + ), + src_eos=self.source_dictionary.eos(), + src_lang=self.args.source_lang, + tgt_eos=self.target_dictionary.eos(), + tgt_lang=self.args.target_lang, + ), + ) + ] + ), + eval_key=lang_pair, + ) + + def build_model(self, args): + def check_args(): + messages = [] + if ( + len(set(self.args.lang_pairs).symmetric_difference(args.lang_pairs)) + != 0 + ): + messages.append( + "--lang-pairs should include all the language pairs {}.".format( + args.lang_pairs + ) + ) + if self.args.encoder_langtok != args.encoder_langtok: + messages.append( + "--encoder-langtok should be {}.".format(args.encoder_langtok) + ) + if self.args.decoder_langtok != args.decoder_langtok: + messages.append( + "--decoder-langtok should {} be set.".format( + "" if args.decoder_langtok else "not" + ) + ) + + if len(messages) > 0: + raise ValueError(" ".join(messages)) + + # Update args -> the fact that the constructor here + # changes the args object doesn't mean you get the same one here + self.update_args(args) + + # Check if task args are consistant with model args + check_args() + + from fairseq import models + + model = models.build_model(args, self) + if not isinstance(model, FairseqMultiModel): + raise ValueError( + "MultilingualTranslationTask requires a FairseqMultiModel architecture" + ) + return model + + def _per_lang_pair_train_loss( + self, lang_pair, model, update_num, criterion, sample, optimizer, ignore_grad + ): + loss, sample_size, logging_output = criterion( + model.models[lang_pair], sample[lang_pair] + ) + if ignore_grad: + loss *= 0 + optimizer.backward(loss) + return loss, sample_size, logging_output + + def train_step( + self, sample, model, criterion, optimizer, update_num, ignore_grad=False + ): + model.train() + from collections import defaultdict + + agg_loss, agg_sample_size, agg_logging_output = 0.0, 0.0, defaultdict(float) + curr_lang_pairs = [ + lang_pair + for lang_pair in self.model_lang_pairs + if sample[lang_pair] is not None and len(sample[lang_pair]) != 0 + ] + + for idx, lang_pair in enumerate(curr_lang_pairs): + + def maybe_no_sync(): + if ( + self.args.distributed_world_size > 1 + and hasattr(model, "no_sync") + and idx < len(curr_lang_pairs) - 1 + ): + return model.no_sync() + else: + return contextlib.ExitStack() # dummy contextmanager + + with 
maybe_no_sync(): + loss, sample_size, logging_output = self._per_lang_pair_train_loss( + lang_pair, + model, + update_num, + criterion, + sample, + optimizer, + ignore_grad, + ) + agg_loss += loss.detach().item() + # TODO make summing of the sample sizes configurable + agg_sample_size += sample_size + for k in logging_output: + agg_logging_output[k] += logging_output[k] + agg_logging_output[f"{lang_pair}:{k}"] += logging_output[k] + return agg_loss, agg_sample_size, agg_logging_output + + def _per_lang_pair_valid_loss(self, lang_pair, model, criterion, sample): + return criterion(model.models[lang_pair], sample[lang_pair]) + + def valid_step(self, sample, model, criterion): + model.eval() + with torch.no_grad(): + from collections import defaultdict + + agg_loss, agg_sample_size, agg_logging_output = 0.0, 0.0, defaultdict(float) + for lang_pair in self.eval_lang_pairs: + if ( + lang_pair not in sample + or sample[lang_pair] is None + or len(sample[lang_pair]) == 0 + ): + continue + loss, sample_size, logging_output = self._per_lang_pair_valid_loss( + lang_pair, model, criterion, sample + ) + agg_loss += loss.data.item() + # TODO make summing of the sample sizes configurable + agg_sample_size += sample_size + for k in logging_output: + agg_logging_output[k] += logging_output[k] + agg_logging_output[f"{lang_pair}:{k}"] += logging_output[k] + return agg_loss, agg_sample_size, agg_logging_output + + def inference_step( + self, generator, models, sample, prefix_tokens=None, constraints=None + ): + with torch.no_grad(): + if self.args.decoder_langtok: + bos_token = _lang_token_index( + self.target_dictionary, self.args.target_lang + ) + else: + bos_token = self.target_dictionary.eos() + return generator.generate( + models, + sample, + prefix_tokens=prefix_tokens, + constraints=constraints, + bos_token=bos_token, + ) + + def reduce_metrics(self, logging_outputs, criterion): + with metrics.aggregate(): + # pass 'sample_size', 'nsentences', 'ntokens' stats to fairseq_task + super().reduce_metrics(logging_outputs, criterion) + for k in ["sample_size", "nsentences", "ntokens"]: + metrics.log_scalar(k, sum(l[k] for l in logging_outputs)) + + @property + def source_dictionary(self): + if self.training: + return next(iter(self.dicts.values())) + else: + return self.dicts[self.args.source_lang] + + @property + def target_dictionary(self): + if self.training: + return next(iter(self.dicts.values())) + else: + return self.dicts[self.args.target_lang] + + def max_positions(self): + """Return the max sentence length allowed by the task.""" + if len(self.datasets.values()) == 0: + return { + "%s-%s" + % (self.args.source_lang, self.args.target_lang): ( + self.args.max_source_positions, + self.args.max_target_positions, + ) + } + return OrderedDict( + [ + (key, (self.args.max_source_positions, self.args.max_target_positions)) + for split in self.datasets.keys() + for key in self.datasets[split].datasets.keys() + ] + ) diff --git a/SpeechT5/fairseq/fairseq/tasks/online_backtranslation.py b/SpeechT5/fairseq/fairseq/tasks/online_backtranslation.py new file mode 100644 index 0000000000000000000000000000000000000000..2545624cd4ad9a7ec684aca798dca339feeff58b --- /dev/null +++ b/SpeechT5/fairseq/fairseq/tasks/online_backtranslation.py @@ -0,0 +1,677 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. 
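# A minimal, illustrative sketch (the helper name below is assumed, not part of
# this patch) of the language-token convention shared by the multilingual task
# above and the back-translation task in this file: languages are marked with
# dictionary symbols of the form "__en__", "__ro__", ...
def _editor_demo_lang_tokens():
    from fairseq.data import Dictionary

    d = Dictionary()
    for lang in ("en", "ro"):
        d.add_symbol(f"__{lang}__")  # what _lang_token(lang) produces
    ro_idx = d.index("__ro__")       # what _lang_token_index(d, "ro") returns
    # --encoder-langtok src|tgt swaps the source-side EOS for such a token and
    # --decoder-langtok uses it as the decoder BOS (see
    # TransformEosLangPairDataset in the multilingual task above).
    return ro_idx, d.eos()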
+ +import contextlib +import json +import logging +import math +import os +from argparse import Namespace +from collections import OrderedDict, defaultdict +from pathlib import Path +from typing import Dict, Sequence, Tuple + +import numpy as np +import torch +import torch.nn as nn +import torch.nn.functional as F + +import fairseq +from fairseq import metrics, options, utils +from fairseq.data import ( + FairseqDataset, + LanguagePairDataset, + NoisingDataset, + PrependTokenDataset, + RoundRobinZipDatasets, + TransformEosLangPairDataset, + data_utils, + encoders, +) +from fairseq.sequence_generator import SequenceGenerator +from fairseq.tasks import register_task +from fairseq.tasks.translation import TranslationTask, load_langpair_dataset + +logger = logging.getLogger(__name__) + + +class PiecewiseLinearFn: + """Piecewise linear function. Can be configured with a string.""" + + def __init__(self, pieces: Sequence[Tuple[int, float]]): + assert pieces == sorted( + pieces + ), f"PiecewiseLinearFn configuration should be sorted, received: {pieces}" + + self.pieces = pieces + + def __call__(self, x: int) -> float: + for i, (x_a, y_a) in enumerate(self.pieces[:-1]): + x_b, y_b = self.pieces[i + 1] + if x_a <= x <= x_b: + return y_a + (x - x_a) * (y_b - y_a) / (x_b - x_a) + + return self.pieces[-1][1] + + @staticmethod + def from_string(configuration: str) -> "PiecewiseLinearFn": + """ + Parse the configuration of lambda coefficient (for scheduling). + x = "3" # lambda will be a constant equal to x + x = "0:1,1000:0" # lambda will start from 1 and linearly decrease + # to 0 during the first 1000 iterations + x = "0:0,1000:0,2000:1" # lambda will be equal to 0 for the first 1000 + # iterations, then will linearly increase to 1 until iteration 2000 + """ + if isinstance(configuration, float): + return PiecewiseLinearFn([(0, configuration)]) + + try: + parts = configuration.split(",") + if len(parts) == 1: + v = float(configuration) + return PiecewiseLinearFn([(0, v)]) + + split = [s.split(":") for s in parts] + pieces = [(int(t), float(v)) for t, v in split] + return PiecewiseLinearFn(pieces) + except Exception: + raise ValueError( + f"Invalid PiecewiseLinearFn configuration: {configuration!r}" + ) + + @staticmethod + def one() -> "PiecewiseLinearFn": + return PiecewiseLinearFn([(0, 1.0)]) + + +@register_task("online_backtranslation") +class OnlineBackTranslationTask(TranslationTask): + @staticmethod + def add_args(parser): + """Add task-specific arguments to the parser.""" + # fmt: off + # Generic translation args + parser.add_argument('data', help='colon separated path to data directories list, \ + will be iterated upon during epochs in round-robin manner; \ + however, valid and test data are always in the first directory to \ + avoid the need for repeating them in all directories') + parser.add_argument('--mono-langs', metavar='MONO_LANGS', + help='monolingual languages for training') + parser.add_argument('--valid-lang-pairs', default=None, metavar='VALID_LANG_PAIRS', + help='language pairs for validation') + parser.add_argument('--load-alignments', action='store_true', + help='load the binarized alignments') + parser.add_argument('--left-pad-source', default='False', type=str, metavar='BOOL', + help='pad the source on the left') + parser.add_argument('--left-pad-target', default='False', type=str, metavar='BOOL', + help='pad the target on the left') + parser.add_argument('--upsample-primary', default=1, type=int, + help='amount to upsample primary dataset') + 
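        # Illustration of the schedule strings accepted by --lambda-bt and
        # --lambda-dae below (parsed by PiecewiseLinearFn.from_string above):
        #   "1.0"               -> constant 1.0 for the whole run
        #   "0:1,1000:0"        -> 1.0 at update 0, linearly decayed to 0.0 by
        #                          update 1000, then held at 0.0
        #   "0:0,1000:0,2000:1" -> 0.0 until update 1000, then ramped up to 1.0
        #                          at update 2000 and held there
        # e.g. PiecewiseLinearFn.from_string("0:1,1000:0")(500) == 0.5.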
parser.add_argument('--max-source-positions', default=1024, type=int, metavar='N', + help='max number of tokens in the source sequence') + parser.add_argument('--max-target-positions', default=1024, type=int, metavar='N', + help='max number of tokens in the target sequence') + parser.add_argument('--truncate-source', action='store_true', default=False, + help='truncate source to max-source-positions') + parser.add_argument('--num-batch-buckets', default=0, type=int, metavar='N', + help='if >0, then bucket source and target lengths into N ' + 'buckets and pad accordingly; this is useful on TPUs ' + 'to minimize the number of compilations') + + # Denoising args + parser.add_argument('--max-word-shuffle-distance', default=3.0, type=float, metavar='N', + help='maximum word shuffle distance for denoising autoencoding data generation') + parser.add_argument('--word-dropout-prob', default=0.1, type=float, metavar='N', + help='word dropout probability for denoising autoencoding data generation') + parser.add_argument('--word-blanking-prob', default=0.2, type=float, metavar='N', + help='word blanking probability for denoising autoencoding data generation') + + # Backtranslation args + parser.add_argument('--lambda-bt', default="1.0", type=str, metavar='N', + help='back-translation weight') + parser.add_argument('--lambda-dae', default="1.0", type=str, metavar='N', + help='denoising auto-encoder weight') + + # Evaluation args + parser.add_argument('--generate-one-by-one', action='store_true', + help='generate one sentence at a time for backtranslation') + + parser.add_argument('--eval-bleu', action='store_true', + help='evaluation with BLEU scores') + parser.add_argument('--eval-bleu-detok', type=str, default="space", + help='detokenize before computing BLEU (e.g., "moses"); ' + 'required if using --eval-bleu; use "space" to ' + 'disable detokenization; see fairseq.data.encoders ' + 'for other options') + parser.add_argument('--eval-bleu-detok-args', type=str, metavar='JSON', + help='args for building the tokenizer, if needed') + parser.add_argument('--eval-tokenized-bleu', action='store_true', default=False, + help='compute tokenized BLEU instead of sacrebleu') + parser.add_argument('--eval-bleu-remove-bpe', nargs='?', const='@@ ', default=None, + help='remove BPE before computing BLEU') + parser.add_argument('--eval-bleu-args', type=str, metavar='JSON', + help='generation args for BLUE scoring, ' + 'e.g., \'{"beam": 4, "lenpen": 0.6}\'') + parser.add_argument('--eval-bleu-print-samples', action='store_true', + help='print sample generations during validation') + # fmt: on + + def __init__(self, args, common_dict, mono_langs, valid_lang_pairs): + super().__init__(args, common_dict, common_dict) + self.common_dict = common_dict + self.mono_langs = mono_langs + self.valid_lang_pairs = valid_lang_pairs + + self.SHOW_SAMPLES_INTERVAL = 1000 + # Start by showing samples + self._show_samples_ctr = self.SHOW_SAMPLES_INTERVAL + self.SHOW_SAMPLES_NUMBER = 5 + self.lambda_bt = PiecewiseLinearFn.from_string(args.lambda_bt) + self.lambda_dae = PiecewiseLinearFn.from_string(args.lambda_dae) + + self.args = args + self.data = utils.split_paths(self.args.data) + if len(self.data) == 1: + shards = list(Path(self.data[0]).glob("shard*")) + if len(shards) > 0: + # keep this as strings, since it can also be a manifold path + old_data = self.data + self.data = [str(shard) for shard in shards] + logging.warning(f"Expanded data directory {old_data} to {self.data}") + + @classmethod + def setup_task(cls, args, **kwargs): 
+ """Setup the task (e.g., load dictionaries). + + Args: + args (argparse.Namespace): parsed command-line arguments + """ + args.left_pad_source = options.eval_bool(args.left_pad_source) + args.left_pad_target = options.eval_bool(args.left_pad_target) + + paths = utils.split_paths(args.data) + assert len(paths) > 0 + assert args.mono_langs is not None + + mono_langs = args.mono_langs.split(",") + valid_lang_pairs = args.valid_lang_pairs.split(",") + + # load dictionary + dict_path = os.path.join(paths[0], "dict.txt") + common_dict = cls.load_dictionary(dict_path) + + return cls(args, common_dict, mono_langs, valid_lang_pairs) + + def load_dataset(self, split, epoch=1, combine=False, **kwargs) -> FairseqDataset: + """Load a given dataset split. + + Args: + split (str): name of the split (e.g., train, valid, test) + """ + if split == "train": + data_path = self.data[(epoch - 1) % len(self.data)] + dataset = self.load_train_dataset(data_path) + else: + # valid/test should always be the same. + dataset = self.load_translation_dataset(split, self.data[0]) + + self.datasets[split] = dataset + return dataset + + def load_train_dataset(self, data_path: str) -> FairseqDataset: + """The training dataset is made of backtranslation dataset and denoising dataset.""" + data = [] + for lang in self.mono_langs: + train_path = os.path.join(data_path, lang, "train") + # TODO: could we do the BT using denoise sample ? + # this would half the data loading work + data.append((f"{lang}-BT", self.load_bt_dataset(train_path, lang))) + data.append( + (f"{lang}-DENOISE", self.load_denoise_dataset(train_path, lang)) + ) + + return RoundRobinZipDatasets(OrderedDict(data)) + + def _langpair_dataset( + self, src: FairseqDataset, tgt: FairseqDataset + ) -> LanguagePairDataset: + return LanguagePairDataset( + src, + src.sizes, + self.dictionary, + tgt=tgt, + tgt_sizes=tgt.sizes, + tgt_dict=self.dictionary, + left_pad_source=self.args.left_pad_source, + left_pad_target=self.args.left_pad_target, + # TODO: should we shuffle ? we are already sorting batch by sizes so ? + # shuffle=True, + ) + + def _prepend_lang_bos_to_target( + self, dataset: LanguagePairDataset, lang: str + ) -> LanguagePairDataset: + bos = _lang_token_index(self.dictionary, lang) + return TransformEosLangPairDataset( + dataset, + src_eos=self.dictionary.eos(), + new_src_eos=self.dictionary.eos(), + tgt_bos=self.dictionary.eos(), + new_tgt_bos=bos, + ) + + def load_bt_dataset(self, data_path: str, lang: str) -> FairseqDataset: + """The BT dataset is generated with (tgt, tgt) pairs. + The actual translation to a (generated_src, tgt) pair + is done on the fly during training. 
+ """ + mono_dataset = data_utils.load_indexed_dataset( + data_path, self.common_dict, self.args.dataset_impl + ) + assert mono_dataset is not None, f"No dataset found for {lang}" + + mono_dataset_src = PrependTokenDataset( + mono_dataset, _lang_token_index(self.dictionary, lang) + ) + + mono_dataset_bt = self._langpair_dataset(mono_dataset_src, mono_dataset) + logger.info( + f"mono_lang = {lang} " + f"lang token index = {_lang_token_index(self.dictionary, lang)} " + f"lang token = {_lang_token(lang)}" + ) + + mono_dataset_bt = self._prepend_lang_bos_to_target(mono_dataset_bt, lang) + return mono_dataset_bt + + def load_denoise_dataset(self, data_path: str, lang: str) -> FairseqDataset: + """Classic denoising dataset""" + dataset = data_utils.load_indexed_dataset( + data_path, self.common_dict, self.args.dataset_impl + ) + noisy_dataset = NoisingDataset( + dataset, + self.dictionary, + seed=1, + max_word_shuffle_distance=self.args.max_word_shuffle_distance, + word_dropout_prob=self.args.word_dropout_prob, + word_blanking_prob=self.args.word_blanking_prob, + ) + noisy_dataset = PrependTokenDataset( + noisy_dataset, _lang_token_index(self.dictionary, lang) + ) + + clean_dataset = data_utils.load_indexed_dataset( + data_path, self.common_dict, self.args.dataset_impl + ) + denoising_dataset = self._langpair_dataset(noisy_dataset, clean_dataset) + denoising_dataset = self._prepend_lang_bos_to_target(denoising_dataset, lang) + return denoising_dataset + + def load_translation_dataset( + self, split: str, data_path: str, combine: bool = False + ): + # only judging with one language pair for the moment, + # since ConcatDataset doesn't work as expected + assert len(self.valid_lang_pairs) == 1, "For now..." + valid_lang_pair = self.valid_lang_pairs[0] + src, tgt = valid_lang_pair.split("-") + + # use the same function than TranslationTask + src_tgt_dt = load_langpair_dataset( + data_path, + split, + src, + self.common_dict, + tgt, + self.common_dict, + combine=combine, + dataset_impl=self.args.dataset_impl, + upsample_primary=self.args.upsample_primary, + left_pad_source=self.args.left_pad_source, + left_pad_target=self.args.left_pad_target, + max_source_positions=self.args.max_source_positions, + max_target_positions=self.args.max_target_positions, + load_alignments=self.args.load_alignments, + truncate_source=self.args.truncate_source, + num_buckets=self.args.num_batch_buckets, + shuffle=(split != "test"), + prepend_bos_src=_lang_token_index(self.dictionary, src), + ) + + src_tgt_eos_dt = self._prepend_lang_bos_to_target(src_tgt_dt, tgt) + src_tgt_eos_dt.args = self.args + return src_tgt_eos_dt + + def build_dataset_for_inference(self, src_tokens, src_lengths, constraints=None): + raise NotImplementedError + + def build_model(self, args): + # torch.autograd.set_detect_anomaly(True) + model = super().build_model(args) + + add_secial_tokens_to_dict_and_model(self.common_dict, model, self.mono_langs) + + self.sequence_generators = {} + for mono_lang in self.mono_langs: + self.sequence_generators[mono_lang] = SequenceGenerator( + [model], + tgt_dict=self.dictionary, + beam_size=1, + max_len_a=1.3, + max_len_b=5, + min_len=5, + # keep 1 to be able to prepend bos + max_len=model.max_decoder_positions() - 1, + ) + + if getattr(args, "eval_bleu", False): + assert getattr(args, "eval_bleu_detok", None) is not None, ( + "--eval-bleu-detok is required if using --eval-bleu; " + "try --eval-bleu-detok=moses (or --eval-bleu-detok=space " + "to disable detokenization, e.g., when using sentencepiece)" + ) + 
detok_args = json.loads(getattr(args, "eval_bleu_detok_args", "{}") or "{}") + self.tokenizer = encoders.build_tokenizer( + Namespace( + tokenizer=getattr(args, "eval_bleu_detok", None), **detok_args + ) + ) + + gen_args = json.loads(getattr(args, "eval_bleu_args", "{}") or "{}") + self.bleu_sequence_generator = self.build_generator( + [model], Namespace(**gen_args) + ) + + return model + + def max_positions(self): + """Return the max sentence length allowed by the task.""" + return (self.args.max_source_positions, self.args.max_target_positions) + + @property + def dictionary(self): + """Return the source :class:`~fairseq.data.Dictionary`.""" + return self.common_dict + + def display_samples_once_in_a_while(self, smp, mono_lang, other_lang): + self._show_samples_ctr += 1 + if self._show_samples_ctr < self.SHOW_SAMPLES_INTERVAL: + return + self._show_samples_ctr = 0 + + ln = smp["net_input"]["src_tokens"].shape[0] + + logger.info( + f"(r:{self.args.distributed_rank}) : " + f"{other_lang} ---> {mono_lang} " + f"({other_lang} was generated by back-translation.) {ln} samples" + ) + + for i in range(min(ln, self.SHOW_SAMPLES_NUMBER)): + src_tokens = smp["net_input"]["src_tokens"][i] + tgt_tokens = smp["target"][i] + + src_str = self.dictionary.string(src_tokens, "sentencepiece") + tgt_str = self.dictionary.string(tgt_tokens, "sentencepiece") + logger.info( + f"\n{i}\t\t[{other_lang} generated] {src_str}\n" + f"\t\t[{mono_lang} original ] {tgt_str}\n" + f"\t\t[ src tokens] {src_tokens}\n" + ) + + def backtranslate_sample(self, smp, orig_lang, other_lang) -> None: + """ + * WARNING: smp is modified in place. + * At the start of this function, `smp` has the same input and target: + |--------------------------------------------------------| + | smp['net_input']['src_tokens'] | smp['target'] | + | (from data) __en__ hello world | __en__ hello world | + |--------------------------------------------------------| + + * We call generator.generate(smp, bos_token = token("ro")), + and copy the result as input + * At the end, `smp` has the translation to other language. 
+ |--------------------------------------------------------| + | smp['net_input']['src_tokens'] | smp['target'] | + | (generated) __ro__ salut lume | __en__ hello world | + |--------------------------------------------------------| + + """ + bos_token = _lang_token_index(self.dictionary, other_lang) + generated = self.sequence_generators[orig_lang].generate( + models=[], sample=smp, bos_token=bos_token + ) + + max_lngth = max([gn[0]["tokens"].size(0) for gn in generated]) + net_input = smp["net_input"] + n_src_tokens = torch.empty( + size=(len(generated), max_lngth + 1), dtype=net_input["src_tokens"].dtype + ) + n_src_lengths = torch.empty( + len(generated), dtype=net_input["src_lengths"].dtype + ) + + for i, gn in enumerate(generated): + tokens = gn[0]["tokens"] + tokens_size = tokens.size(0) + padding_needed = max_lngth - tokens_size + tokens = torch.cat([tokens.new([bos_token]), tokens]) + tokens = F.pad(tokens, (0, padding_needed), value=self.dictionary.pad()) + n_src_tokens[i] = tokens + n_src_lengths[i] = tokens_size + 1 + + device = net_input["src_tokens"].device + # This seems to be important + del net_input["src_tokens"] + del net_input["src_lengths"] + net_input["src_tokens"] = n_src_tokens.to(device) + net_input["src_lengths"] = n_src_lengths.to(device) + + def generate(self, smp, model): + model.eval() + orig_lang = ( + self.dictionary[smp["net_input"]["src_tokens"][0][0]] + .replace(" ", "") + .replace("_", "") + ) + bos_token = smp["net_input"]["prev_output_tokens"][0][0] + with torch.no_grad(): + generated = self.sequence_generators[orig_lang].generate( + models=[model], sample=smp, bos_token=bos_token + ) + return generated + + def get_other_lang(self, lang): + # TODO: allow more complex mapping + if lang != self.mono_langs[0]: + return self.mono_langs[0] + if len(self.mono_langs) == 2: + return self.mono_langs[1] + return self.mono_langs[np.random.randint(1, len(self.mono_langs))] + + def train_step( + self, sample, model, criterion, optimizer, update_num, ignore_grad=False + ): + + model.train() + model.set_num_updates(update_num) + + agg_loss, agg_sample_size = 0.0, 0.0 + agg_logging_output: Dict[str, float] = defaultdict(float) + + dataset_keys = self.datasets["train"].datasets.keys() + + weights = { + "BT": self.lambda_bt(update_num), + "DENOISE": self.lambda_dae(update_num), + } + log_keys = {"BT": "bt_", "DENOISE": "dae_"} + + for dataset_key in dataset_keys: + smp = sample[dataset_key] + mono_lang, task_subtype = dataset_key.split("-") + if weights[task_subtype] == 0: + continue + + if task_subtype == "BT": + with torch.autograd.profiler.record_function("backtranslation"): + model.eval() + # TODO: Could we translate to several language at once ? + # this would allow to share encoder_out and maximize GPU usage. 
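                    # backtranslate_sample() mutates `smp` in place: the source
                    # side is replaced by a greedy translation into `other_lang`
                    # (prefixed with that language's __lang__ token), while the
                    # original monolingual sentences remain in smp["target"], so
                    # the loss below is computed on (generated_src -> original tgt).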
+ other_lang = self.get_other_lang(mono_lang) + self.backtranslate_sample(smp, mono_lang, other_lang) + self.display_samples_once_in_a_while(smp, mono_lang, other_lang) + model.train() + + # Like in FairseqTask.train_step + with torch.autograd.profiler.record_function("forward"): + loss, sample_size, logging_output = criterion(model, smp) + loss *= weights[task_subtype] + if ignore_grad: + loss *= 0 + with torch.autograd.profiler.record_function("backward"): + optimizer.backward(loss) + + agg_loss += loss.item() + agg_sample_size += sample_size + for k in logging_output: + agg_logging_output[log_keys[task_subtype] + k] += logging_output[k] + agg_logging_output[k] += logging_output[k] + + return agg_loss, agg_sample_size, agg_logging_output + + def get_bos_token_from_sample(self, sample): + net_input = sample["net_input"] + source_lang_token_id = torch.unique(net_input["src_tokens"][:, 0]).item() + source_lang_token = self.dictionary[source_lang_token_id].replace("_", "") + target_lang_token_id = _lang_token_index( + self.dictionary, self.get_other_lang(source_lang_token) + ) + + return target_lang_token_id + + def reduce_metrics(self, logging_outputs, criterion): + super().reduce_metrics(logging_outputs, criterion) + bt_sample_size = sum(x.get("bt_sample_size", 0) for x in logging_outputs) + if bt_sample_size: + bt_loss_sum = sum(x.get("bt_loss", 0) for x in logging_outputs) + bt_loss_sum *= 1 / bt_sample_size / math.log(2) + metrics.log_scalar("bt_loss", bt_loss_sum, bt_sample_size, round=3) + + bt_nll_loss_sum = sum(x.get("bt_nll_loss", 0) for x in logging_outputs) + bt_ntokens = sum(x.get("bt_ntokens", 0) for x in logging_outputs) + bt_nll_loss_sum *= 1 / bt_ntokens / math.log(2) + metrics.log_scalar("bt_nll_loss", bt_nll_loss_sum, bt_ntokens, round=3) + metrics.log_derived( + "bt_ppl", lambda meters: utils.get_perplexity(meters["bt_nll_loss"].avg) + ) + + dae_sample_size = sum(x.get("dae_sample_size", 0) for x in logging_outputs) + if dae_sample_size: + dae_loss_sum = sum(x.get("dae_loss", 0) for x in logging_outputs) + dae_loss_sum *= 1 / dae_sample_size / math.log(2) + metrics.log_scalar("dae_loss", dae_loss_sum, dae_sample_size, round=3) + + dae_nll_loss_sum = sum(x.get("dae_nll_loss", 0) for x in logging_outputs) + dae_ntokens = sum(x.get("dae_ntokens", 0) for x in logging_outputs) + dae_nll_loss_sum *= 1 / dae_ntokens / math.log(2) + metrics.log_scalar("dae_nll_loss", dae_nll_loss_sum, dae_ntokens, round=3) + metrics.log_derived( + "dae_ppl", + lambda meters: utils.get_perplexity(meters["dae_nll_loss"].avg), + ) + + +@torch.no_grad() +def extend_embedding( + emb: nn.Module, new_vocab_size: int, copy_from_token_id: int +) -> None: + old_emb_data = emb.weight.data + (old_vocab_size, dim) = old_emb_data.shape + assert new_vocab_size >= old_vocab_size + + if new_vocab_size > old_vocab_size: + emb.weight.data = torch.zeros((new_vocab_size, dim)) + emb.weight.data[:old_vocab_size, :] = old_emb_data + # initialize new embeddings + emb.weight.data[old_vocab_size:, :] = old_emb_data[copy_from_token_id] + if hasattr(emb, "num_embeddings"): + emb.num_embeddings = new_vocab_size + if hasattr(emb, "out_features"): + emb.out_features = new_vocab_size + + if getattr(emb, "bias", None) is None: + return + + # Fix the bias. + # Bias shape can be different from the previous vocab size + # if the weight matrix was shared and alread extended but not the bias. 
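    # What follows grows the bias the same way the weight matrix was grown
    # above. add_secial_tokens_to_dict_and_model() below relies on this helper
    # to extend the encoder/decoder embeddings and the output projection after
    # adding "<mask>" and the "__lang__" symbols, initialising every new row
    # from the row of `copy_from_token_id` (dictionary.bos() in practice).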
+ (old_vocab_size,) = emb.bias.shape + assert new_vocab_size >= old_vocab_size + if new_vocab_size > old_vocab_size: + old_bias = emb.bias.data + new_bias = torch.zeros( + (new_vocab_size,), dtype=old_bias.dtype, device=old_bias.device + ) + new_bias[:old_vocab_size] = old_bias + emb.bias.data = new_bias + + +def add_secial_tokens_to_dict_and_model( + dictionary: "fairseq.data.Dictionary", + model: nn.Module, + mono_langs: Sequence[str], +) -> None: + embs = model.encoder.embed_tokens + vocab_size, embedding_dim = embs.weight.shape + + # The model may or may not have a '<mask>' embedding yet + assert ( + len(dictionary) <= vocab_size <= len(dictionary) + 1 + ), f"Dictionary len ({len(dictionary)}) doesn't match embs shape ({embs.weight.shape})" + # TODO: we should reuse the pretrained model dict which already has <mask> + dictionary.add_symbol("<mask>") + + for lang in mono_langs: + lang_token = _lang_token(lang) + dictionary.add_symbol(lang_token) + logger.info( + f"dictionary: {len(dictionary)} -> {vocab_size} tokens " + f"after adding {len(mono_langs)} lang tokens." + ) + + if len(dictionary) <= vocab_size: + return + + extend_embedding(embs, len(dictionary), dictionary.bos()) + dec_embs = model.decoder.embed_tokens + extend_embedding(dec_embs, len(dictionary), dictionary.bos()) + lm_head = model.decoder.output_projection + extend_embedding(lm_head, len(dictionary), dictionary.bos()) + assert lm_head.weight.shape == (len(dictionary), embedding_dim) + + +def _lang_token(lang: str) -> str: + return f"__{lang}__" + + +def _lang_token_index(dictionary, lang: str) -> int: + return dictionary.index(_lang_token(lang)) + + +@contextlib.contextmanager +def assert_weights_have_changed(model: nn.Module): + def checksum(model: nn.Module) -> float: + return sum(p.sum().item() for p in model.parameters()) + + initial_checksum = checksum(model) + yield model + final_checksum = checksum(model) + logger.info( + f"initial_checksum={initial_checksum} -> final_checksum={final_checksum}" + ) + assert initial_checksum != final_checksum, "Model hasn't changed !" diff --git a/SpeechT5/fairseq/fairseq/tasks/semisupervised_translation.py b/SpeechT5/fairseq/fairseq/tasks/semisupervised_translation.py new file mode 100644 index 0000000000000000000000000000000000000000..b2f9bf9a733d94e50b588e4316b4a02e1c8bcf51 --- /dev/null +++ b/SpeechT5/fairseq/fairseq/tasks/semisupervised_translation.py @@ -0,0 +1,485 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +import logging +import os +from collections import OrderedDict + +from fairseq import utils +from fairseq.data import ( + BacktranslationDataset, + IndexedCachedDataset, + IndexedDataset, + IndexedRawTextDataset, + LanguagePairDataset, + NoisingDataset, + RoundRobinZipDatasets, + data_utils, + indexed_dataset, +) +from fairseq.models import FairseqMultiModel +from fairseq.sequence_generator import SequenceGenerator + +from . import register_task +from .multilingual_translation import MultilingualTranslationTask + + +logger = logging.getLogger(__name__) + + +def _get_bt_dataset_key(lang_pair): + return "bt:" + lang_pair + + +def _get_denoising_dataset_key(lang_pair): + return "denoising:" + lang_pair + + +# ported from UnsupervisedMT +def parse_lambda_config(x): + """ + Parse the configuration of lambda coefficient (for scheduling). 
+ x = "3" # lambda will be a constant equal to x + x = "0:1,1000:0" # lambda will start from 1 and linearly decrease + # to 0 during the first 1000 iterations + x = "0:0,1000:0,2000:1" # lambda will be equal to 0 for the first 1000 + # iterations, then will linearly increase to 1 until iteration 2000 + """ + split = x.split(",") + if len(split) == 1: + return float(x), None + else: + split = [s.split(os.pathsep) for s in split] + assert all(len(s) == 2 for s in split) + assert all(k.isdigit() for k, _ in split) + assert all( + int(split[i][0]) < int(split[i + 1][0]) for i in range(len(split) - 1) + ) + return float(split[0][1]), [(int(k), float(v)) for k, v in split] + + +@register_task("semisupervised_translation") +class SemisupervisedTranslationTask(MultilingualTranslationTask): + """A task for training multiple translation models simultaneously. + + We iterate round-robin over batches from multiple language pairs, ordered + according to the `--lang-pairs` argument. + + The training loop is roughly: + + for i in range(len(epoch)): + for lang_pair in args.lang_pairs: + batch = next_batch_for_lang_pair(lang_pair) + loss = criterion(model_for_lang_pair(lang_pair), batch) + loss.backward() + optimizer.step() + + In practice, `next_batch_for_lang_pair` is abstracted in a FairseqDataset + (e.g., `RoundRobinZipDatasets`) and `model_for_lang_pair` is a model that + implements the `FairseqMultiModel` interface. + + During inference it is required to specify a single `--source-lang` and + `--target-lang`, instead of `--lang-pairs`. + """ + + @staticmethod + def add_args(parser): + """Add task-specific arguments to the parser.""" + # fmt: off + MultilingualTranslationTask.add_args(parser) + parser.add_argument('--lambda-parallel-config', default="1.0", type=str, metavar='CONFIG', + help='cross-entropy reconstruction coefficient (parallel data). ' + 'use fixed weight during training if set to floating point number. ' + 'use piecewise linear function over number of updates to schedule the ' + 'weight with the format: w0:step0,w1:step1,...') + parser.add_argument('--lambda-denoising-config', default="0.0", type=str, metavar='CONFIG', + help='Cross-entropy reconstruction coefficient (denoising autoencoding)' + 'use fixed weight during training if set to floating point number. ' + 'use piecewise linear function over number of updates to schedule the ' + 'weight with the format: w0:step0,w1:step1,...') + parser.add_argument('--lambda-otf-bt-config', default="0.0", type=str, metavar='CONFIG', + help='cross-entropy reconstruction coefficient (on-the-fly back-translation parallel data)' + 'use fixed weight during training if set to floating point number. 
' + 'use piecewise linear function over number of updates to schedule the ' + 'weight with the format: w0:step0,w1:step1,...') + parser.add_argument('--bt-max-len-a', default=1.1, type=float, metavar='N', + help='generate back-translated sequences of maximum length ax + b, where x is the ' + 'source length') + parser.add_argument('--bt-max-len-b', default=10.0, type=float, metavar='N', + help='generate back-translated sequences of maximum length ax + b, where x is the ' + 'source length') + parser.add_argument('--bt-beam-size', default=1, type=int, metavar='N', + help='beam size used in beam search of online back-translation') + parser.add_argument('--max-word-shuffle-distance', default=3.0, type=float, metavar='N', + help='maximum word shuffle distance for denoising autoencoding data generation') + parser.add_argument('--word-dropout-prob', default=0.1, type=float, metavar='N', + help='word dropout probability for denoising autoencoding data generation') + parser.add_argument('--word-blanking-prob', default=0.2, type=float, metavar='N', + help='word blanking probability for denoising autoencoding data generation') + # fmt: on + + def __init__(self, args, dicts, training): + super().__init__(args, dicts, training) + self.lambda_parallel, self.lambda_parallel_steps = parse_lambda_config( + args.lambda_parallel_config + ) + self.lambda_otf_bt, self.lambda_otf_bt_steps = parse_lambda_config( + args.lambda_otf_bt_config + ) + self.lambda_denoising, self.lambda_denoising_steps = parse_lambda_config( + args.lambda_denoising_config + ) + if self.lambda_denoising > 0.0 or self.lambda_denoising_steps is not None: + denoising_lang_pairs = [ + "%s-%s" % (tgt, tgt) + for tgt in {lang_pair.split("-")[1] for lang_pair in args.lang_pairs} + ] + self.model_lang_pairs = self.model_lang_pairs + denoising_lang_pairs + self.backtranslate_datasets = {} + self.backtranslators = {} + + @classmethod + def setup_task(cls, args, **kwargs): + dicts, training = MultilingualTranslationTask.prepare(args, **kwargs) + return cls(args, dicts, training) + + def load_dataset(self, split, epoch=1, **kwargs): + """Load a dataset split.""" + paths = utils.split_paths(self.args.data) + assert len(paths) > 0 + data_path = paths[(epoch - 1) % len(paths)] + + def split_exists(split, src, tgt, lang): + if src is not None: + filename = os.path.join( + data_path, "{}.{}-{}.{}".format(split, src, tgt, lang) + ) + else: + filename = os.path.join( + data_path, "{}.{}-None.{}".format(split, src, tgt) + ) + return indexed_dataset.dataset_exists(filename, impl=self.args.dataset_impl) + + def load_indexed_dataset(path, dictionary): + return data_utils.load_indexed_dataset( + path, dictionary, self.args.dataset_impl + ) + + # load parallel datasets + src_datasets, tgt_datasets = {}, {} + if ( + self.lambda_parallel > 0.0 + or self.lambda_parallel_steps is not None + or not split.startswith("train") + ): + for lang_pair in self.lang_pairs: + src, tgt = lang_pair.split("-") + if split_exists(split, src, tgt, src): + prefix = os.path.join( + data_path, "{}.{}-{}.".format(split, src, tgt) + ) + elif split_exists(split, tgt, src, src): + prefix = os.path.join( + data_path, "{}.{}-{}.".format(split, tgt, src) + ) + else: + continue + src_datasets[lang_pair] = load_indexed_dataset( + prefix + src, self.dicts[src] + ) + tgt_datasets[lang_pair] = load_indexed_dataset( + prefix + tgt, self.dicts[tgt] + ) + logger.info( + "parallel-{} {} {} examples".format( + data_path, split, len(src_datasets[lang_pair]) + ) + ) + if len(src_datasets) == 0: + raise 
FileNotFoundError( + "Dataset not found: {} ({})".format(split, data_path) + ) + + # back translation datasets + backtranslate_datasets = {} + if ( + self.lambda_otf_bt > 0.0 or self.lambda_otf_bt_steps is not None + ) and split.startswith("train"): + for lang_pair in self.lang_pairs: + src, tgt = lang_pair.split("-") + if not split_exists(split, tgt, None, tgt): + raise FileNotFoundError( + "Dataset not found: backtranslation {} ({})".format( + split, data_path + ) + ) + filename = os.path.join( + data_path, "{}.{}-None.{}".format(split, tgt, tgt) + ) + dataset = load_indexed_dataset(filename, self.dicts[tgt]) + lang_pair_dataset_tgt = LanguagePairDataset( + dataset, + dataset.sizes, + self.dicts[tgt], + left_pad_source=self.args.left_pad_source, + left_pad_target=self.args.left_pad_target, + ) + lang_pair_dataset = LanguagePairDataset( + dataset, + dataset.sizes, + src_dict=self.dicts[src], + tgt=dataset, + tgt_sizes=dataset.sizes, + tgt_dict=self.dicts[tgt], + left_pad_source=self.args.left_pad_source, + left_pad_target=self.args.left_pad_target, + ) + backtranslate_datasets[lang_pair] = BacktranslationDataset( + tgt_dataset=self.alter_dataset_langtok( + lang_pair_dataset_tgt, + src_eos=self.dicts[tgt].eos(), + src_lang=tgt, + tgt_lang=src, + ), + backtranslation_fn=self.backtranslators[lang_pair], + src_dict=self.dicts[src], + tgt_dict=self.dicts[tgt], + output_collater=self.alter_dataset_langtok( + lang_pair_dataset=lang_pair_dataset, + src_eos=self.dicts[src].eos(), + src_lang=src, + tgt_eos=self.dicts[tgt].eos(), + tgt_lang=tgt, + ).collater, + ) + logger.info( + "backtranslate-{}: {} {} {} examples".format( + tgt, + data_path, + split, + len(backtranslate_datasets[lang_pair]), + ) + ) + self.backtranslate_datasets[lang_pair] = backtranslate_datasets[ + lang_pair + ] + + # denoising autoencoder + noising_datasets = {} + if ( + self.lambda_denoising > 0.0 or self.lambda_denoising_steps is not None + ) and split.startswith("train"): + for lang_pair in self.lang_pairs: + _, tgt = lang_pair.split("-") + if not split_exists(split, tgt, None, tgt): + continue + filename = os.path.join( + data_path, "{}.{}-None.{}".format(split, tgt, tgt) + ) + tgt_dataset1 = load_indexed_dataset(filename, self.dicts[tgt]) + tgt_dataset2 = load_indexed_dataset(filename, self.dicts[tgt]) + noising_dataset = NoisingDataset( + tgt_dataset1, + self.dicts[tgt], + seed=1, + max_word_shuffle_distance=self.args.max_word_shuffle_distance, + word_dropout_prob=self.args.word_dropout_prob, + word_blanking_prob=self.args.word_blanking_prob, + ) + noising_datasets[lang_pair] = self.alter_dataset_langtok( + LanguagePairDataset( + noising_dataset, + tgt_dataset1.sizes, + self.dicts[tgt], + tgt_dataset2, + tgt_dataset2.sizes, + self.dicts[tgt], + left_pad_source=self.args.left_pad_source, + left_pad_target=self.args.left_pad_target, + ), + src_eos=self.dicts[tgt].eos(), + src_lang=tgt, + tgt_eos=self.dicts[tgt].eos(), + tgt_lang=tgt, + ) + logger.info( + "denoising-{}: {} {} {} examples".format( + tgt, + data_path, + split, + len(noising_datasets[lang_pair]), + ) + ) + + def language_pair_dataset(lang_pair): + src, tgt = lang_pair.split("-") + src_dataset, tgt_dataset = src_datasets[lang_pair], tgt_datasets[lang_pair] + return self.alter_dataset_langtok( + LanguagePairDataset( + src_dataset, + src_dataset.sizes, + self.dicts[src], + tgt_dataset, + tgt_dataset.sizes, + self.dicts[tgt], + left_pad_source=self.args.left_pad_source, + left_pad_target=self.args.left_pad_target, + ), + self.dicts[src].eos(), + src, + 
self.dicts[tgt].eos(), + tgt, + ) + + self.datasets[split] = RoundRobinZipDatasets( + OrderedDict( + [ + (lang_pair, language_pair_dataset(lang_pair)) + for lang_pair in src_datasets.keys() + ] + + [ + (_get_bt_dataset_key(lang_pair), dataset) + for lang_pair, dataset in backtranslate_datasets.items() + ] + + [ + (_get_denoising_dataset_key(lang_pair), dataset) + for lang_pair, dataset in noising_datasets.items() + ] + ), + eval_key=None + if self.training + else "%s-%s" % (self.args.source_lang, self.args.target_lang), + ) + + def build_model(self, args): + from fairseq import models + + model = models.build_model(args, self) + if not isinstance(model, FairseqMultiModel): + raise ValueError( + "SemisupervisedTranslationTask requires a FairseqMultiModel architecture" + ) + + # create SequenceGenerator for each model that has backtranslation dependency on it + self.sequence_generators = {} + if ( + self.lambda_otf_bt > 0.0 or self.lambda_otf_bt_steps is not None + ) and self.training: + for lang_pair in self.lang_pairs: + src, tgt = lang_pair.split("-") + key = "{}-{}".format(tgt, src) + self.sequence_generators[key] = SequenceGenerator( + [model.models[key]], + tgt_dict=self.dicts[src], + beam_size=args.bt_beam_size, + max_len_a=args.bt_max_len_a, + max_len_b=args.bt_max_len_b, + ) + decoder_lang_tok_idx = self.get_decoder_langtok(src) + + def backtranslate_fn( + sample, + model=model.models[key], + bos_token=decoder_lang_tok_idx, + sequence_generator=self.sequence_generators[key], + ): + return sequence_generator.generate( + [model], + sample, + bos_token=bos_token, + ) + + self.backtranslators[lang_pair] = backtranslate_fn + + return model + + def train_step( + self, sample, model, criterion, optimizer, update_num, ignore_grad=False + ): + model.train() + + if update_num > 0: + self.update_step(update_num) + + agg_loss, agg_sample_size, agg_logging_output = 0.0, 0.0, {} + + def forward_backward(model, samples, logging_output_key, weight): + nonlocal agg_loss, agg_sample_size, agg_logging_output + if samples is None or len(samples) == 0: + return + loss, sample_size, logging_output = criterion(model, samples) + if ignore_grad: + loss *= 0 + else: + loss *= weight + optimizer.backward(loss) + agg_loss += loss.detach().item() + # TODO make summing of the sample sizes configurable + agg_sample_size += sample_size + for k in logging_output: + agg_logging_output[k] += logging_output[k] + agg_logging_output[logging_output_key] += logging_output[k] + + if self.lambda_parallel > 0.0: + for lang_pair in self.lang_pairs: + forward_backward( + model.models[lang_pair], + sample[lang_pair], + lang_pair, + self.lambda_parallel, + ) + + if self.lambda_otf_bt > 0.0: + for lang_pair in self.lang_pairs: + sample_key = _get_bt_dataset_key(lang_pair) + forward_backward( + model.models[lang_pair], + sample[sample_key], + sample_key, + self.lambda_otf_bt, + ) + + if self.lambda_denoising > 0.0: + for lang_pair in self.lang_pairs: + _, tgt = lang_pair.split("-") + sample_key = _get_denoising_dataset_key(lang_pair) + forward_backward( + model.models["{0}-{0}".format(tgt)], + sample[sample_key], + sample_key, + self.lambda_denoising, + ) + + return agg_loss, agg_sample_size, agg_logging_output + + def update_step(self, num_updates): + def lambda_step_func(config, n_iter): + """ + Update a lambda value according to its schedule configuration. 
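            For example, with config [(0, 0.0), (100000, 1.0)] this returns
            0.0 at update 0, 0.5 at update 50000, and 1.0 for any update
            >= 100000 (schedules are parsed by parse_lambda_config above).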
+ """ + ranges = [ + i + for i in range(len(config) - 1) + if config[i][0] <= n_iter < config[i + 1][0] + ] + if len(ranges) == 0: + assert n_iter >= config[-1][0] + return config[-1][1] + assert len(ranges) == 1 + i = ranges[0] + x_a, y_a = config[i] + x_b, y_b = config[i + 1] + return y_a + (n_iter - x_a) * float(y_b - y_a) / float(x_b - x_a) + + if self.lambda_parallel_steps is not None: + self.lambda_parallel = lambda_step_func( + self.lambda_parallel_steps, num_updates + ) + if self.lambda_denoising_steps is not None: + self.lambda_denoising = lambda_step_func( + self.lambda_denoising_steps, num_updates + ) + if self.lambda_otf_bt_steps is not None: + self.lambda_otf_bt = lambda_step_func(self.lambda_otf_bt_steps, num_updates) diff --git a/SpeechT5/fairseq/fairseq/tasks/sentence_prediction.py b/SpeechT5/fairseq/fairseq/tasks/sentence_prediction.py new file mode 100644 index 0000000000000000000000000000000000000000..6732728de981da7174eae32ecaf4c47901d65399 --- /dev/null +++ b/SpeechT5/fairseq/fairseq/tasks/sentence_prediction.py @@ -0,0 +1,286 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +import logging +import os + +import numpy as np +from fairseq import utils +from fairseq.data import ( + ConcatSentencesDataset, + Dictionary, + IdDataset, + NestedDictionaryDataset, + NumelDataset, + NumSamplesDataset, + OffsetTokensDataset, + PrependTokenDataset, + RawLabelDataset, + RightPadDataset, + RollDataset, + SortDataset, + StripTokenDataset, + data_utils, +) +from fairseq.data.shorten_dataset import maybe_shorten_dataset +from fairseq.tasks import LegacyFairseqTask, register_task + + +logger = logging.getLogger(__name__) + + +@register_task("sentence_prediction") +class SentencePredictionTask(LegacyFairseqTask): + """ + Sentence (or sentence pair) prediction (classification or regression) task. 
+ + Args: + dictionary (Dictionary): the dictionary for the input of the task + """ + + @staticmethod + def add_args(parser): + """Add task-specific arguments to the parser.""" + parser.add_argument("data", metavar="FILE", help="file prefix for data") + parser.add_argument( + "--num-classes", + type=int, + default=-1, + help="number of classes or regression targets", + ) + parser.add_argument( + "--init-token", + type=int, + default=None, + help="add token at the beginning of each batch item", + ) + parser.add_argument( + "--separator-token", + type=int, + default=None, + help="add separator token between inputs", + ) + parser.add_argument("--regression-target", action="store_true", default=False) + parser.add_argument("--no-shuffle", action="store_true", default=False) + parser.add_argument( + "--shorten-method", + default="none", + choices=["none", "truncate", "random_crop"], + help="if not none, shorten sequences that exceed --tokens-per-sample", + ) + parser.add_argument( + "--shorten-data-split-list", + default="", + help="comma-separated list of dataset splits to apply shortening to, " + 'e.g., "train,valid" (default: all dataset splits)', + ) + parser.add_argument( + "--add-prev-output-tokens", + action="store_true", + default=False, + help="add prev_output_tokens to sample, used for encoder-decoder arch", + ) + + def __init__(self, args, data_dictionary, label_dictionary): + super().__init__(args) + self.dictionary = data_dictionary + self._label_dictionary = label_dictionary + if not hasattr(args, "max_positions"): + self._max_positions = ( + args.max_source_positions, + args.max_target_positions, + ) + else: + self._max_positions = args.max_positions + args.tokens_per_sample = self._max_positions + + @classmethod + def load_dictionary(cls, args, filename, source=True): + """Load the dictionary from the filename + + Args: + filename (str): the filename + """ + dictionary = Dictionary.load(filename) + dictionary.add_symbol("<mask>") + return dictionary + + @classmethod + def setup_task(cls, args, **kwargs): + assert args.num_classes > 0, "Must set --num-classes" + + # load data dictionary + data_dict = cls.load_dictionary( + args, + os.path.join(args.data, "input0", "dict.txt"), + source=True, + ) + logger.info("[input] dictionary: {} types".format(len(data_dict))) + + # load label dictionary + if not args.regression_target: + label_dict = cls.load_dictionary( + args, + os.path.join(args.data, "label", "dict.txt"), + source=False, + ) + logger.info("[label] dictionary: {} types".format(len(label_dict))) + else: + label_dict = data_dict + return cls(args, data_dict, label_dict) + + def load_dataset(self, split, combine=False, **kwargs): + """Load a given dataset split (e.g., train, valid, test).""" + + def get_path(key, split): + return os.path.join(self.args.data, key, split) + + def make_dataset(key, dictionary): + split_path = get_path(key, split) + + try: + dataset = data_utils.load_indexed_dataset( + split_path, + dictionary, + self.args.dataset_impl, + combine=combine, + ) + except Exception as e: + if "StorageException: [404] Path not found" in str(e): + logger.warning(f"dataset {e} not found") + dataset = None + else: + raise e + return dataset + + input0 = make_dataset("input0", self.source_dictionary) + assert input0 is not None, "could not find dataset: {}".format( + get_path("input0", split) + ) + input1 = make_dataset("input1", self.source_dictionary) + + if self.args.init_token is not None: + input0 = PrependTokenDataset(input0, self.args.init_token) + + if input1 is 
None: + src_tokens = input0 + else: + if self.args.separator_token is not None: + input1 = PrependTokenDataset(input1, self.args.separator_token) + + src_tokens = ConcatSentencesDataset(input0, input1) + + with data_utils.numpy_seed(self.args.seed): + shuffle = np.random.permutation(len(src_tokens)) + + src_tokens = maybe_shorten_dataset( + src_tokens, + split, + self.args.shorten_data_split_list, + self.args.shorten_method, + self.max_positions(), + self.args.seed, + ) + + dataset = { + "id": IdDataset(), + "net_input": { + "src_tokens": RightPadDataset( + src_tokens, + pad_idx=self.source_dictionary.pad(), + ), + "src_lengths": NumelDataset(src_tokens, reduce=False), + }, + "nsentences": NumSamplesDataset(), + "ntokens": NumelDataset(src_tokens, reduce=True), + } + + if self.args.add_prev_output_tokens: + prev_tokens_dataset = RightPadDataset( + RollDataset(src_tokens, 1), + pad_idx=self.dictionary.pad(), + ) + dataset["net_input"].update( + prev_output_tokens=prev_tokens_dataset, + ) + + if not self.args.regression_target: + label_dataset = make_dataset("label", self.label_dictionary) + if label_dataset is not None: + dataset.update( + target=OffsetTokensDataset( + StripTokenDataset( + label_dataset, + id_to_strip=self.label_dictionary.eos(), + ), + offset=-self.label_dictionary.nspecial, + ) + ) + else: + label_path = "{0}.label".format(get_path("label", split)) + if os.path.exists(label_path): + + def parse_regression_target(i, line): + values = line.split() + assert ( + len(values) == self.args.num_classes + ), f'expected num_classes={self.args.num_classes} regression target values on line {i}, found: "{line}"' + return [float(x) for x in values] + + with open(label_path) as h: + dataset.update( + target=RawLabelDataset( + [ + parse_regression_target(i, line.strip()) + for i, line in enumerate(h.readlines()) + ] + ) + ) + + nested_dataset = NestedDictionaryDataset( + dataset, + sizes=[src_tokens.sizes], + ) + + if self.args.no_shuffle: + dataset = nested_dataset + else: + dataset = SortDataset( + nested_dataset, + # shuffle + sort_order=[shuffle], + ) + + logger.info("Loaded {0} with #samples: {1}".format(split, len(dataset))) + + self.datasets[split] = dataset + return self.datasets[split] + + def build_model(self, args): + from fairseq import models + + model = models.build_model(args, self) + + model.register_classification_head( + getattr(args, "classification_head_name", "sentence_classification_head"), + num_classes=self.args.num_classes, + ) + + return model + + def max_positions(self): + return self._max_positions + + @property + def source_dictionary(self): + return self.dictionary + + @property + def target_dictionary(self): + return self.dictionary + + @property + def label_dictionary(self): + return self._label_dictionary diff --git a/SpeechT5/fairseq/fairseq/tasks/sentence_ranking.py b/SpeechT5/fairseq/fairseq/tasks/sentence_ranking.py new file mode 100644 index 0000000000000000000000000000000000000000..bed44f34e5f8e506b6ae7ba30ddaa661bf4a7522 --- /dev/null +++ b/SpeechT5/fairseq/fairseq/tasks/sentence_ranking.py @@ -0,0 +1,219 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. 
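# A rough sketch (for orientation, inferred from load_dataset above) of the
# batch layout produced by the sentence_prediction task:
#
#   {
#       "id": ...,                      # example ids
#       "net_input": {
#           "src_tokens": ...,          # right-padded token matrix
#           "src_lengths": ...,         # per-example lengths
#           # plus "prev_output_tokens" when --add-prev-output-tokens is set
#       },
#       "target": ...,                  # class indices or regression values
#       "nsentences": ..., "ntokens": ...,
#   }
#
# The sentence_ranking task below builds the same structure, but with one
# "net_input{i}" block per candidate sentence to be ranked.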
+ +import logging +import os + +import numpy as np +from fairseq import utils +from fairseq.data import ( + ConcatSentencesDataset, + Dictionary, + IdDataset, + NestedDictionaryDataset, + NumelDataset, + NumSamplesDataset, + PrependTokenDataset, + RawLabelDataset, + RightPadDataset, + SortDataset, + TruncateDataset, + data_utils, +) +from fairseq.data.shorten_dataset import maybe_shorten_dataset +from fairseq.tasks import LegacyFairseqTask, register_task + + +logger = logging.getLogger(__name__) + + +@register_task("sentence_ranking") +class SentenceRankingTask(LegacyFairseqTask): + """ + Ranking task on multiple sentences. + + Args: + dictionary (Dictionary): the dictionary for the input of the task + """ + + @staticmethod + def add_args(parser): + """Add task-specific arguments to the parser.""" + parser.add_argument("data", metavar="FILE", help="file prefix for data") + parser.add_argument( + "--num-classes", type=int, help="number of sentences to be ranked" + ) + parser.add_argument( + "--init-token", + type=int, + help="add token at the beginning of each batch item", + ) + parser.add_argument( + "--separator-token", type=int, help="add separator token between inputs" + ) + parser.add_argument("--no-shuffle", action="store_true") + parser.add_argument( + "--shorten-method", + default="none", + choices=["none", "truncate", "random_crop"], + help="if not none, shorten sequences that exceed --tokens-per-sample", + ) + parser.add_argument( + "--shorten-data-split-list", + default="", + help="comma-separated list of dataset splits to apply shortening to, " + 'e.g., "train,valid" (default: all dataset splits)', + ) + parser.add_argument( + "--max-option-length", type=int, help="max length for each option" + ) + + def __init__(self, args, dictionary): + super().__init__(args) + self.dictionary = dictionary + + @classmethod + def load_dictionary(cls, args, filename, source=True): + """Load the dictionary from the filename + + Args: + filename (str): the filename + """ + dictionary = Dictionary.load(filename) + dictionary.add_symbol("<mask>") + return dictionary + + @classmethod + def setup_task(cls, args, **kwargs): + assert ( + args.criterion == "sentence_ranking" + ), "Must set --criterion=sentence_ranking" + + # load data dictionary + data_dict = cls.load_dictionary( + args, + os.path.join(args.data, "input0", "dict.txt"), + source=True, + ) + logger.info("[input] dictionary: {} types".format(len(data_dict))) + return SentenceRankingTask(args, data_dict) + + def load_dataset(self, split, combine=False, **kwargs): + """Load a given dataset split (e.g., train, valid, test).""" + + def get_path(type, split): + return os.path.join(self.args.data, type, split) + + def make_dataset(type, dictionary): + split_path = get_path(type, split) + + dataset = data_utils.load_indexed_dataset( + split_path, + self.source_dictionary, + self.args.dataset_impl, + combine=combine, + ) + return dataset + + input0 = make_dataset("input0", self.source_dictionary) + input_options = [ + make_dataset("input{idx}".format(idx=idx + 1), self.source_dictionary) + for idx in range(self.args.num_classes) + ] + + if self.args.separator_token is not None: + input0 = PrependTokenDataset(input0, self.args.separator_token) + + src_tokens = [] + for input_option in input_options: + if self.args.init_token is not None: + input_option = PrependTokenDataset(input_option, self.args.init_token) + if self.args.max_option_length is not None: + input_option = TruncateDataset( + input_option, self.args.max_option_length + ) + src_token = 
ConcatSentencesDataset(input_option, input0) + src_token = maybe_shorten_dataset( + src_token, + split, + self.args.shorten_data_split_list, + self.args.shorten_method, + self.args.max_positions, + self.args.seed, + ) + src_tokens.append(src_token) + + with data_utils.numpy_seed(self.args.seed): + shuffle = np.random.permutation(len(src_tokens[0])) + + dataset = { + "id": IdDataset(), + "nsentences": NumSamplesDataset(), + "ntokens": NumelDataset(src_tokens[0], reduce=True), + } + + for src_token_idx in range(len(src_tokens)): + dataset.update( + { + "net_input{idx}".format(idx=src_token_idx + 1): { + "src_tokens": RightPadDataset( + src_tokens[src_token_idx], + pad_idx=self.source_dictionary.pad(), + ), + "src_lengths": NumelDataset( + src_tokens[src_token_idx], reduce=False + ), + } + } + ) + + label_path = "{}.label".format(get_path("label", split)) + if os.path.exists(label_path): + with open(label_path) as h: + dataset.update( + target=RawLabelDataset([int(x.strip()) for x in h.readlines()]) + ) + + nested_dataset = NestedDictionaryDataset( + dataset, + sizes=[np.maximum.reduce([src_token.sizes for src_token in src_tokens])], + ) + + if self.args.no_shuffle: + dataset = nested_dataset + else: + dataset = SortDataset( + nested_dataset, + # shuffle + sort_order=[shuffle], + ) + + logger.info("Loaded {0} with #samples: {1}".format(split, len(dataset))) + + self.datasets[split] = dataset + return self.datasets[split] + + def build_model(self, args): + from fairseq import models + + model = models.build_model(args, self) + + model.register_classification_head( + getattr(args, "ranking_head_name", "sentence_classification_head"), + num_classes=1, + ) + + return model + + def max_positions(self): + return self.args.max_positions + + @property + def source_dictionary(self): + return self.dictionary + + @property + def target_dictionary(self): + return self.dictionary diff --git a/SpeechT5/fairseq/fairseq/tasks/simultaneous_translation.py b/SpeechT5/fairseq/fairseq/tasks/simultaneous_translation.py new file mode 100644 index 0000000000000000000000000000000000000000..11c7dc1ea966a54f8915ef164377e40f90e851a1 --- /dev/null +++ b/SpeechT5/fairseq/fairseq/tasks/simultaneous_translation.py @@ -0,0 +1,42 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +import logging +from fairseq.tasks import register_task +from fairseq.tasks.speech_to_text import SpeechToTextTask +from fairseq.tasks.translation import ( + TranslationTask, TranslationConfig +) + +try: + import examples.simultaneous_translation # noqa + import_successful = True +except BaseException: + import_successful = False + + +logger = logging.getLogger(__name__) + + +def check_import(flag): + if not flag: + raise ImportError( + "'examples.simultaneous_translation' is not correctly imported. " + "Please considering `pip install -e $FAIRSEQ_DIR`." 
+ ) + + +@register_task("simul_speech_to_text") +class SimulSpeechToTextTask(SpeechToTextTask): + def __init__(self, args, tgt_dict): + check_import(import_successful) + super().__init__(args, tgt_dict) + + +@register_task("simul_text_to_text", dataclass=TranslationConfig) +class SimulTextToTextTask(TranslationTask): + def __init__(self, cfg, src_dict, tgt_dict): + check_import(import_successful) + super().__init__(cfg, src_dict, tgt_dict) diff --git a/SpeechT5/fairseq/fairseq/tasks/speech_to_text.py b/SpeechT5/fairseq/fairseq/tasks/speech_to_text.py new file mode 100644 index 0000000000000000000000000000000000000000..8bdf21564367d3647d582c72a6c3c9924760933e --- /dev/null +++ b/SpeechT5/fairseq/fairseq/tasks/speech_to_text.py @@ -0,0 +1,149 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +import logging +import os.path as op +from argparse import Namespace + +from fairseq.data import Dictionary, encoders +from fairseq.data.audio.speech_to_text_dataset import ( + S2TDataConfig, + SpeechToTextDataset, + SpeechToTextDatasetCreator, + get_features_or_waveform +) +from fairseq.tasks import LegacyFairseqTask, register_task + + +logger = logging.getLogger(__name__) + + +@register_task("speech_to_text") +class SpeechToTextTask(LegacyFairseqTask): + @staticmethod + def add_args(parser): + parser.add_argument("data", help="manifest root path") + parser.add_argument( + "--config-yaml", + type=str, + default="config.yaml", + help="Configuration YAML filename (under manifest root)", + ) + parser.add_argument( + "--max-source-positions", + default=6000, + type=int, + metavar="N", + help="max number of tokens in the source sequence", + ) + parser.add_argument( + "--max-target-positions", + default=1024, + type=int, + metavar="N", + help="max number of tokens in the target sequence", + ) + + def __init__(self, args, tgt_dict): + super().__init__(args) + self.tgt_dict = tgt_dict + self.data_cfg = S2TDataConfig(op.join(args.data, args.config_yaml)) + + @classmethod + def setup_task(cls, args, **kwargs): + data_cfg = S2TDataConfig(op.join(args.data, args.config_yaml)) + dict_path = op.join(args.data, data_cfg.vocab_filename) + if not op.isfile(dict_path): + raise FileNotFoundError(f"Dict not found: {dict_path}") + tgt_dict = Dictionary.load(dict_path) + logger.info( + f"dictionary size ({data_cfg.vocab_filename}): " f"{len(tgt_dict):,}" + ) + + if getattr(args, "train_subset", None) is not None: + if not all(s.startswith("train") for s in args.train_subset.split(",")): + raise ValueError('Train splits should be named like "train*".') + return cls(args, tgt_dict) + + def build_criterion(self, args): + from fairseq import criterions + + if self.data_cfg.prepend_tgt_lang_tag and args.ignore_prefix_size != 1: + raise ValueError( + 'Please set "--ignore-prefix-size 1" since ' + "target language ID token is prepended as BOS." 
+ ) + return criterions.build_criterion(args, self) + + def load_dataset(self, split, epoch=1, combine=False, **kwargs): + is_train_split = split.startswith("train") + pre_tokenizer = self.build_tokenizer(self.args) + bpe_tokenizer = self.build_bpe(self.args) + self.datasets[split] = SpeechToTextDatasetCreator.from_tsv( + self.args.data, + self.data_cfg, + split, + self.tgt_dict, + pre_tokenizer, + bpe_tokenizer, + is_train_split=is_train_split, + epoch=epoch, + seed=self.args.seed, + ) + + @property + def target_dictionary(self): + return self.tgt_dict + + @property + def source_dictionary(self): + return None + + def max_positions(self): + return self.args.max_source_positions, self.args.max_target_positions + + def build_model(self, args): + args.input_feat_per_channel = self.data_cfg.input_feat_per_channel + args.input_channels = self.data_cfg.input_channels + return super(SpeechToTextTask, self).build_model(args) + + def build_generator( + self, + models, + args, + seq_gen_cls=None, + extra_gen_cls_kwargs=None, + ): + if self.data_cfg.prepend_tgt_lang_tag and args.prefix_size != 1: + raise ValueError( + 'Please set "--prefix-size 1" since ' + "target language ID token is prepended as BOS." + ) + lang_token_ids = { + i + for s, i in self.tgt_dict.indices.items() + if SpeechToTextDataset.is_lang_tag(s) + } + extra_gen_cls_kwargs = {"symbols_to_strip_from_output": lang_token_ids} + return super().build_generator( + models, args, seq_gen_cls=None, extra_gen_cls_kwargs=extra_gen_cls_kwargs + ) + + def build_tokenizer(self, args): + logger.info(f"pre-tokenizer: {self.data_cfg.pre_tokenizer}") + return encoders.build_tokenizer(Namespace(**self.data_cfg.pre_tokenizer)) + + def build_bpe(self, args): + logger.info(f"tokenizer: {self.data_cfg.bpe_tokenizer}") + return encoders.build_bpe(Namespace(**self.data_cfg.bpe_tokenizer)) + + def get_interactive_tokens_and_lengths(self, lines, encode_fn): + n_frames = [get_features_or_waveform(p).shape[0] for p in lines] + return lines, n_frames + + def build_dataset_for_inference(self, src_tokens, src_lengths, **kwargs): + return SpeechToTextDataset( + "interactive", False, self.data_cfg, src_tokens, src_lengths + ) diff --git a/SpeechT5/fairseq/fairseq/tasks/translation.py b/SpeechT5/fairseq/fairseq/tasks/translation.py new file mode 100644 index 0000000000000000000000000000000000000000..ea80fa2e73a0ee0e6d22d1b880e9a57877c48742 --- /dev/null +++ b/SpeechT5/fairseq/fairseq/tasks/translation.py @@ -0,0 +1,487 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. 
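+# Usage sketch (illustrative): the "translation" task registered below is usually
+# selected on the command line, e.g.
+#   fairseq-train <data-bin> --task translation -s de -t en --arch transformer
+# where <data-bin> stands for a placeholder directory produced by fairseq-preprocess
+# that contains dict.<lang>.txt files; "-s"/"-t" are the argparse aliases declared
+# in TranslationConfig below.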
+ +from dataclasses import dataclass, field +import itertools +import json +import logging +import os +from typing import Optional +from argparse import Namespace +from omegaconf import II + +import numpy as np +from fairseq import metrics, utils +from fairseq.data import ( + AppendTokenDataset, + ConcatDataset, + LanguagePairDataset, + PrependTokenDataset, + StripTokenDataset, + TruncateDataset, + data_utils, + encoders, + indexed_dataset, +) +from fairseq.data.indexed_dataset import get_available_dataset_impl +from fairseq.dataclass import ChoiceEnum, FairseqDataclass +from fairseq.tasks import FairseqTask, register_task + + +EVAL_BLEU_ORDER = 4 + + +logger = logging.getLogger(__name__) + + +def load_langpair_dataset( + data_path, + split, + src, + src_dict, + tgt, + tgt_dict, + combine, + dataset_impl, + upsample_primary, + left_pad_source, + left_pad_target, + max_source_positions, + max_target_positions, + prepend_bos=False, + load_alignments=False, + truncate_source=False, + append_source_id=False, + num_buckets=0, + shuffle=True, + pad_to_multiple=1, + prepend_bos_src=None, +): + def split_exists(split, src, tgt, lang, data_path): + filename = os.path.join(data_path, "{}.{}-{}.{}".format(split, src, tgt, lang)) + return indexed_dataset.dataset_exists(filename, impl=dataset_impl) + + src_datasets = [] + tgt_datasets = [] + + for k in itertools.count(): + split_k = split + (str(k) if k > 0 else "") + + # infer langcode + if split_exists(split_k, src, tgt, src, data_path): + prefix = os.path.join(data_path, "{}.{}-{}.".format(split_k, src, tgt)) + elif split_exists(split_k, tgt, src, src, data_path): + prefix = os.path.join(data_path, "{}.{}-{}.".format(split_k, tgt, src)) + else: + if k > 0: + break + else: + raise FileNotFoundError( + "Dataset not found: {} ({})".format(split, data_path) + ) + + src_dataset = data_utils.load_indexed_dataset( + prefix + src, src_dict, dataset_impl + ) + if truncate_source: + src_dataset = AppendTokenDataset( + TruncateDataset( + StripTokenDataset(src_dataset, src_dict.eos()), + max_source_positions - 1, + ), + src_dict.eos(), + ) + src_datasets.append(src_dataset) + + tgt_dataset = data_utils.load_indexed_dataset( + prefix + tgt, tgt_dict, dataset_impl + ) + if tgt_dataset is not None: + tgt_datasets.append(tgt_dataset) + + logger.info( + "{} {} {}-{} {} examples".format( + data_path, split_k, src, tgt, len(src_datasets[-1]) + ) + ) + + if not combine: + break + + assert len(src_datasets) == len(tgt_datasets) or len(tgt_datasets) == 0 + + if len(src_datasets) == 1: + src_dataset = src_datasets[0] + tgt_dataset = tgt_datasets[0] if len(tgt_datasets) > 0 else None + else: + sample_ratios = [1] * len(src_datasets) + sample_ratios[0] = upsample_primary + src_dataset = ConcatDataset(src_datasets, sample_ratios) + if len(tgt_datasets) > 0: + tgt_dataset = ConcatDataset(tgt_datasets, sample_ratios) + else: + tgt_dataset = None + + if prepend_bos: + assert hasattr(src_dict, "bos_index") and hasattr(tgt_dict, "bos_index") + src_dataset = PrependTokenDataset(src_dataset, src_dict.bos()) + if tgt_dataset is not None: + tgt_dataset = PrependTokenDataset(tgt_dataset, tgt_dict.bos()) + elif prepend_bos_src is not None: + logger.info(f"prepending src bos: {prepend_bos_src}") + src_dataset = PrependTokenDataset(src_dataset, prepend_bos_src) + + eos = None + if append_source_id: + src_dataset = AppendTokenDataset( + src_dataset, src_dict.index("[{}]".format(src)) + ) + if tgt_dataset is not None: + tgt_dataset = AppendTokenDataset( + tgt_dataset, 
tgt_dict.index("[{}]".format(tgt)) + ) + eos = tgt_dict.index("[{}]".format(tgt)) + + align_dataset = None + if load_alignments: + align_path = os.path.join(data_path, "{}.align.{}-{}".format(split, src, tgt)) + if indexed_dataset.dataset_exists(align_path, impl=dataset_impl): + align_dataset = data_utils.load_indexed_dataset( + align_path, None, dataset_impl + ) + + tgt_dataset_sizes = tgt_dataset.sizes if tgt_dataset is not None else None + return LanguagePairDataset( + src_dataset, + src_dataset.sizes, + src_dict, + tgt_dataset, + tgt_dataset_sizes, + tgt_dict, + left_pad_source=left_pad_source, + left_pad_target=left_pad_target, + align_dataset=align_dataset, + eos=eos, + num_buckets=num_buckets, + shuffle=shuffle, + pad_to_multiple=pad_to_multiple, + ) + + +@dataclass +class TranslationConfig(FairseqDataclass): + data: Optional[str] = field( + default=None, + metadata={ + "help": "colon separated path to data directories list, will be iterated upon during epochs " + "in round-robin manner; however, valid and test data are always in the first directory " + "to avoid the need for repeating them in all directories" + }, + ) + source_lang: Optional[str] = field( + default=None, + metadata={ + "help": "source language", + "argparse_alias": "-s", + }, + ) + target_lang: Optional[str] = field( + default=None, + metadata={ + "help": "target language", + "argparse_alias": "-t", + }, + ) + load_alignments: bool = field( + default=False, metadata={"help": "load the binarized alignments"} + ) + left_pad_source: bool = field( + default=True, metadata={"help": "pad the source on the left"} + ) + left_pad_target: bool = field( + default=False, metadata={"help": "pad the target on the left"} + ) + max_source_positions: int = field( + default=1024, metadata={"help": "max number of tokens in the source sequence"} + ) + max_target_positions: int = field( + default=1024, metadata={"help": "max number of tokens in the target sequence"} + ) + upsample_primary: int = field( + default=-1, metadata={"help": "the amount of upsample primary dataset"} + ) + truncate_source: bool = field( + default=False, metadata={"help": "truncate source to max-source-positions"} + ) + num_batch_buckets: int = field( + default=0, + metadata={ + "help": "if >0, then bucket source and target lengths into " + "N buckets and pad accordingly; this is useful on TPUs to minimize the number of compilations" + }, + ) + train_subset: str = II("dataset.train_subset") + dataset_impl: Optional[ChoiceEnum(get_available_dataset_impl())] = II( + "dataset.dataset_impl" + ) + required_seq_len_multiple: int = II("dataset.required_seq_len_multiple") + + # options for reporting BLEU during validation + eval_bleu: bool = field( + default=False, metadata={"help": "evaluation with BLEU scores"} + ) + eval_bleu_args: Optional[str] = field( + default="{}", + metadata={ + "help": 'generation args for BLUE scoring, e.g., \'{"beam": 4, "lenpen": 0.6}\', as JSON string' + }, + ) + eval_bleu_detok: str = field( + default="space", + metadata={ + "help": "detokenize before computing BLEU (e.g., 'moses'); required if using --eval-bleu; " + "use 'space' to disable detokenization; see fairseq.data.encoders for other options" + }, + ) + eval_bleu_detok_args: Optional[str] = field( + default="{}", + metadata={"help": "args for building the tokenizer, if needed, as JSON string"}, + ) + eval_tokenized_bleu: bool = field( + default=False, metadata={"help": "compute tokenized BLEU instead of sacrebleu"} + ) + eval_bleu_remove_bpe: Optional[str] = field( + default=None, 
+ metadata={ + "help": "remove BPE before computing BLEU", + "argparse_const": "@@ ", + }, + ) + eval_bleu_print_samples: bool = field( + default=False, metadata={"help": "print sample generations during validation"} + ) + + +@register_task("translation", dataclass=TranslationConfig) +class TranslationTask(FairseqTask): + """ + Translate from one (source) language to another (target) language. + + Args: + src_dict (~fairseq.data.Dictionary): dictionary for the source language + tgt_dict (~fairseq.data.Dictionary): dictionary for the target language + + .. note:: + + The translation task is compatible with :mod:`fairseq-train`, + :mod:`fairseq-generate` and :mod:`fairseq-interactive`. + """ + + cfg: TranslationConfig + + def __init__(self, cfg: TranslationConfig, src_dict, tgt_dict): + super().__init__(cfg) + self.src_dict = src_dict + self.tgt_dict = tgt_dict + + @classmethod + def setup_task(cls, cfg: TranslationConfig, **kwargs): + """Setup the task (e.g., load dictionaries). + + Args: + args (argparse.Namespace): parsed command-line arguments + """ + + paths = utils.split_paths(cfg.data) + assert len(paths) > 0 + # find language pair automatically + if cfg.source_lang is None or cfg.target_lang is None: + cfg.source_lang, cfg.target_lang = data_utils.infer_language_pair(paths[0]) + if cfg.source_lang is None or cfg.target_lang is None: + raise Exception( + "Could not infer language pair, please provide it explicitly" + ) + + # load dictionaries + src_dict = cls.load_dictionary( + os.path.join(paths[0], "dict.{}.txt".format(cfg.source_lang)) + ) + tgt_dict = cls.load_dictionary( + os.path.join(paths[0], "dict.{}.txt".format(cfg.target_lang)) + ) + assert src_dict.pad() == tgt_dict.pad() + assert src_dict.eos() == tgt_dict.eos() + assert src_dict.unk() == tgt_dict.unk() + logger.info("[{}] dictionary: {} types".format(cfg.source_lang, len(src_dict))) + logger.info("[{}] dictionary: {} types".format(cfg.target_lang, len(tgt_dict))) + + return cls(cfg, src_dict, tgt_dict) + + def load_dataset(self, split, epoch=1, combine=False, **kwargs): + """Load a given dataset split. 
+ + Args: + split (str): name of the split (e.g., train, valid, test) + """ + paths = utils.split_paths(self.cfg.data) + assert len(paths) > 0 + if split != self.cfg.train_subset: + # if not training data set, use the first shard for valid and test + paths = paths[:1] + data_path = paths[(epoch - 1) % len(paths)] + + # infer langcode + src, tgt = self.cfg.source_lang, self.cfg.target_lang + + self.datasets[split] = load_langpair_dataset( + data_path, + split, + src, + self.src_dict, + tgt, + self.tgt_dict, + combine=combine, + dataset_impl=self.cfg.dataset_impl, + upsample_primary=self.cfg.upsample_primary, + left_pad_source=self.cfg.left_pad_source, + left_pad_target=self.cfg.left_pad_target, + max_source_positions=self.cfg.max_source_positions, + max_target_positions=self.cfg.max_target_positions, + load_alignments=self.cfg.load_alignments, + truncate_source=self.cfg.truncate_source, + num_buckets=self.cfg.num_batch_buckets, + shuffle=(split != "test"), + pad_to_multiple=self.cfg.required_seq_len_multiple, + ) + + def build_dataset_for_inference(self, src_tokens, src_lengths, constraints=None): + return LanguagePairDataset( + src_tokens, + src_lengths, + self.source_dictionary, + tgt_dict=self.target_dictionary, + constraints=constraints, + ) + + def build_model(self, cfg): + model = super().build_model(cfg) + if self.cfg.eval_bleu: + detok_args = json.loads(self.cfg.eval_bleu_detok_args) + self.tokenizer = encoders.build_tokenizer( + Namespace(tokenizer=self.cfg.eval_bleu_detok, **detok_args) + ) + + gen_args = json.loads(self.cfg.eval_bleu_args) + self.sequence_generator = self.build_generator( + [model], Namespace(**gen_args) + ) + return model + + def valid_step(self, sample, model, criterion): + loss, sample_size, logging_output = super().valid_step(sample, model, criterion) + if self.cfg.eval_bleu: + bleu = self._inference_with_bleu(self.sequence_generator, sample, model) + logging_output["_bleu_sys_len"] = bleu.sys_len + logging_output["_bleu_ref_len"] = bleu.ref_len + # we split counts into separate entries so that they can be + # summed efficiently across workers using fast-stat-sync + assert len(bleu.counts) == EVAL_BLEU_ORDER + for i in range(EVAL_BLEU_ORDER): + logging_output["_bleu_counts_" + str(i)] = bleu.counts[i] + logging_output["_bleu_totals_" + str(i)] = bleu.totals[i] + return loss, sample_size, logging_output + + def reduce_metrics(self, logging_outputs, criterion): + super().reduce_metrics(logging_outputs, criterion) + if self.cfg.eval_bleu: + + def sum_logs(key): + import torch + result = sum(log.get(key, 0) for log in logging_outputs) + if torch.is_tensor(result): + result = result.cpu() + return result + + counts, totals = [], [] + for i in range(EVAL_BLEU_ORDER): + counts.append(sum_logs("_bleu_counts_" + str(i))) + totals.append(sum_logs("_bleu_totals_" + str(i))) + + if max(totals) > 0: + # log counts as numpy arrays -- log_scalar will sum them correctly + metrics.log_scalar("_bleu_counts", np.array(counts)) + metrics.log_scalar("_bleu_totals", np.array(totals)) + metrics.log_scalar("_bleu_sys_len", sum_logs("_bleu_sys_len")) + metrics.log_scalar("_bleu_ref_len", sum_logs("_bleu_ref_len")) + + def compute_bleu(meters): + import inspect + import sacrebleu + + fn_sig = inspect.getfullargspec(sacrebleu.compute_bleu)[0] + if "smooth_method" in fn_sig: + smooth = {"smooth_method": "exp"} + else: + smooth = {"smooth": "exp"} + bleu = sacrebleu.compute_bleu( + correct=meters["_bleu_counts"].sum, + total=meters["_bleu_totals"].sum, + 
sys_len=meters["_bleu_sys_len"].sum, + ref_len=meters["_bleu_ref_len"].sum, + **smooth + ) + return round(bleu.score, 2) + + metrics.log_derived("bleu", compute_bleu) + + def max_positions(self): + """Return the max sentence length allowed by the task.""" + return (self.cfg.max_source_positions, self.cfg.max_target_positions) + + @property + def source_dictionary(self): + """Return the source :class:`~fairseq.data.Dictionary`.""" + return self.src_dict + + @property + def target_dictionary(self): + """Return the target :class:`~fairseq.data.Dictionary`.""" + return self.tgt_dict + + def _inference_with_bleu(self, generator, sample, model): + import sacrebleu + + def decode(toks, escape_unk=False): + s = self.tgt_dict.string( + toks.int().cpu(), + self.cfg.eval_bleu_remove_bpe, + # The default unknown string in fairseq is `<unk>`, but + # this is tokenized by sacrebleu as `< unk >`, inflating + # BLEU scores. Instead, we use a somewhat more verbose + # alternative that is unlikely to appear in the real + # reference, but doesn't get split into multiple tokens. + unk_string=("UNKNOWNTOKENINREF" if escape_unk else "UNKNOWNTOKENINHYP"), + ) + if self.tokenizer: + s = self.tokenizer.decode(s) + return s + + gen_out = self.inference_step(generator, [model], sample, prefix_tokens=None) + hyps, refs = [], [] + for i in range(len(gen_out)): + hyps.append(decode(gen_out[i][0]["tokens"])) + refs.append( + decode( + utils.strip_pad(sample["target"][i], self.tgt_dict.pad()), + escape_unk=True, # don't count <unk> as matches to the hypo + ) + ) + if self.cfg.eval_bleu_print_samples: + logger.info("example hypothesis: " + hyps[0]) + logger.info("example reference: " + refs[0]) + if self.cfg.eval_tokenized_bleu: + return sacrebleu.corpus_bleu(hyps, [refs], tokenize="none") + else: + return sacrebleu.corpus_bleu(hyps, [refs]) diff --git a/SpeechT5/fairseq/fairseq/tasks/translation_from_pretrained_bart.py b/SpeechT5/fairseq/fairseq/tasks/translation_from_pretrained_bart.py new file mode 100644 index 0000000000000000000000000000000000000000..0fd7a5b29f0e34699b5d5ef7574bc39b8c6052c9 --- /dev/null +++ b/SpeechT5/fairseq/fairseq/tasks/translation_from_pretrained_bart.py @@ -0,0 +1,132 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +import torch +from fairseq import utils +from fairseq.data import LanguagePairDataset + +from . import register_task +from .translation import TranslationTask, load_langpair_dataset + + +@register_task("translation_from_pretrained_bart") +class TranslationFromPretrainedBARTTask(TranslationTask): + """ + Translate from source language to target language with a model initialized with a multilingual pretrain. + + Args: + src_dict (~fairseq.data.Dictionary): dictionary for the source language + tgt_dict (~fairseq.data.Dictionary): dictionary for the target language + + .. note:: + + The translation task is compatible with :mod:`fairseq-train`, + :mod:`fairseq-generate` and :mod:`fairseq-interactive`. + + The translation task provides the following additional command-line + arguments: + + .. argparse:: + :ref: fairseq.tasks.translation_parser + :prog: + """ + + @staticmethod + def add_args(parser): + """Add task-specific arguments to the parser.""" + # fmt: off + TranslationTask.add_args(parser) + parser.add_argument('--langs', type=str, metavar='LANG', + help='comma-separated list of monolingual language, ' + 'for example, "en,de,fr". 
These should match the ' + 'langs from pretraining (and be in the same order). ' + 'You should always add all pretraining language idx ' + 'during finetuning.') + parser.add_argument('--prepend-bos', action='store_true', + help='prepend bos token to each sentence, which matches ' + 'mBART pretraining') + # fmt: on + + def __init__(self, args, src_dict, tgt_dict): + super().__init__(args, src_dict, tgt_dict) + self.langs = args.langs.split(",") + for d in [src_dict, tgt_dict]: + for l in self.langs: + d.add_symbol("[{}]".format(l)) + d.add_symbol("<mask>") + + def load_dataset(self, split, epoch=1, combine=False, **kwargs): + """Load a given dataset split. + + Args: + split (str): name of the split (e.g., train, valid, test) + """ + paths = utils.split_paths(self.args.data) + assert len(paths) > 0 + data_path = paths[(epoch - 1) % len(paths)] + + # infer langcode + src, tgt = self.args.source_lang, self.args.target_lang + + self.datasets[split] = load_langpair_dataset( + data_path, + split, + src, + self.src_dict, + tgt, + self.tgt_dict, + combine=combine, + dataset_impl=self.args.dataset_impl, + upsample_primary=self.args.upsample_primary, + left_pad_source=self.args.left_pad_source, + left_pad_target=self.args.left_pad_target, + max_source_positions=getattr(self.args, "max_source_positions", 1024), + max_target_positions=getattr(self.args, "max_target_positions", 1024), + load_alignments=self.args.load_alignments, + prepend_bos=getattr(self.args, "prepend_bos", False), + append_source_id=True, + ) + + def build_generator(self, models, args, **unused): + if getattr(args, "score_reference", False): + from fairseq.sequence_scorer import SequenceScorer + + return SequenceScorer( + self.target_dictionary, + eos=self.tgt_dict.index("[{}]".format(self.args.target_lang)), + ) + else: + from fairseq.sequence_generator import SequenceGenerator + + return SequenceGenerator( + models, + self.target_dictionary, + beam_size=getattr(args, "beam", 5), + max_len_a=getattr(args, "max_len_a", 0), + max_len_b=getattr(args, "max_len_b", 200), + min_len=getattr(args, "min_len", 1), + normalize_scores=(not getattr(args, "unnormalized", False)), + len_penalty=getattr(args, "lenpen", 1), + unk_penalty=getattr(args, "unkpen", 0), + temperature=getattr(args, "temperature", 1.0), + match_source_len=getattr(args, "match_source_len", False), + no_repeat_ngram_size=getattr(args, "no_repeat_ngram_size", 0), + eos=self.tgt_dict.index("[{}]".format(self.args.target_lang)), + ) + + def build_dataset_for_inference(self, src_tokens, src_lengths, constraints=None): + src_lang_id = self.source_dictionary.index("[{}]".format(self.args.source_lang)) + source_tokens = [] + for s_t in src_tokens: + s_t = torch.cat([s_t, s_t.new(1).fill_(src_lang_id)]) + source_tokens.append(s_t) + dataset = LanguagePairDataset( + source_tokens, + src_lengths, + self.source_dictionary, + tgt_dict=self.target_dictionary, + constraints=constraints, + ) + return dataset diff --git a/SpeechT5/fairseq/fairseq/tasks/translation_from_pretrained_xlm.py b/SpeechT5/fairseq/fairseq/tasks/translation_from_pretrained_xlm.py new file mode 100644 index 0000000000000000000000000000000000000000..a05f2891524a8b23482e206c1742c3b816b77afb --- /dev/null +++ b/SpeechT5/fairseq/fairseq/tasks/translation_from_pretrained_xlm.py @@ -0,0 +1,39 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. 
+ +from dataclasses import dataclass +from fairseq.data.legacy.masked_lm_dictionary import MaskedLMDictionary +from fairseq.tasks.translation import TranslationConfig, TranslationTask + +from . import register_task + + +@dataclass +class TranslationFromPretrainedXLMConfig(TranslationConfig): + pass + + +@register_task( + "translation_from_pretrained_xlm", dataclass=TranslationFromPretrainedXLMConfig +) +class TranslationFromPretrainedXLMTask(TranslationTask): + """ + Same as TranslationTask except use the MaskedLMDictionary class so that + we can load data that was binarized with the MaskedLMDictionary class. + + This task should be used for the entire training pipeline when we want to + train an NMT model from a pretrained XLM checkpoint: binarizing NMT data, + training NMT with the pretrained XLM checkpoint, and subsequent evaluation + of that trained model. + """ + + @classmethod + def load_dictionary(cls, filename): + """Load the masked LM dictionary from the filename + + Args: + filename (str): the filename + """ + return MaskedLMDictionary.load(filename) diff --git a/SpeechT5/fairseq/fairseq/tasks/translation_lev.py b/SpeechT5/fairseq/fairseq/tasks/translation_lev.py new file mode 100644 index 0000000000000000000000000000000000000000..041279305dc4978f6a3a4178c5ec4c72c5fb2b5c --- /dev/null +++ b/SpeechT5/fairseq/fairseq/tasks/translation_lev.py @@ -0,0 +1,191 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +from dataclasses import dataclass, field +import torch +from fairseq import utils +from fairseq.data import LanguagePairDataset +from fairseq.dataclass import ChoiceEnum +from fairseq.tasks import register_task +from fairseq.tasks.translation import TranslationConfig, TranslationTask, load_langpair_dataset +from fairseq.utils import new_arange + + +NOISE_CHOICES = ChoiceEnum(["random_delete", "random_mask", "no_noise", "full_mask"]) + +@dataclass +class TranslationLevenshteinConfig(TranslationConfig): + noise: NOISE_CHOICES = field( + default="random_delete", + metadata={ + "help": "type of noise" + }, + ) + +@register_task("translation_lev", dataclass=TranslationLevenshteinConfig) +class TranslationLevenshteinTask(TranslationTask): + """ + Translation (Sequence Generation) task for Levenshtein Transformer + See `"Levenshtein Transformer" <https://arxiv.org/abs/1905.11006>`_. + """ + + cfg: TranslationLevenshteinConfig + + def load_dataset(self, split, epoch=1, combine=False, **kwargs): + """Load a given dataset split. 
+ + Args: + split (str): name of the split (e.g., train, valid, test) + """ + paths = utils.split_paths(self.cfg.data) + assert len(paths) > 0 + data_path = paths[(epoch - 1) % len(paths)] + + # infer langcode + src, tgt = self.cfg.source_lang, self.cfg.target_lang + + self.datasets[split] = load_langpair_dataset( + data_path, + split, + src, + self.src_dict, + tgt, + self.tgt_dict, + combine=combine, + dataset_impl=self.cfg.dataset_impl, + upsample_primary=self.cfg.upsample_primary, + left_pad_source=self.cfg.left_pad_source, + left_pad_target=self.cfg.left_pad_target, + max_source_positions=self.cfg.max_source_positions, + max_target_positions=self.cfg.max_target_positions, + prepend_bos=True, + ) + + def inject_noise(self, target_tokens): + def _random_delete(target_tokens): + pad = self.tgt_dict.pad() + bos = self.tgt_dict.bos() + eos = self.tgt_dict.eos() + + max_len = target_tokens.size(1) + target_mask = target_tokens.eq(pad) + target_score = target_tokens.clone().float().uniform_() + target_score.masked_fill_( + target_tokens.eq(bos) | target_tokens.eq(eos), 0.0 + ) + target_score.masked_fill_(target_mask, 1) + target_score, target_rank = target_score.sort(1) + target_length = target_mask.size(1) - target_mask.float().sum( + 1, keepdim=True + ) + + # do not delete <bos> and <eos> (we assign 0 score for them) + target_cutoff = ( + 2 + + ( + (target_length - 2) + * target_score.new_zeros(target_score.size(0), 1).uniform_() + ).long() + ) + target_cutoff = target_score.sort(1)[1] >= target_cutoff + + prev_target_tokens = ( + target_tokens.gather(1, target_rank) + .masked_fill_(target_cutoff, pad) + .gather(1, target_rank.masked_fill_(target_cutoff, max_len).sort(1)[1]) + ) + prev_target_tokens = prev_target_tokens[ + :, : prev_target_tokens.ne(pad).sum(1).max() + ] + + return prev_target_tokens + + def _random_mask(target_tokens): + pad = self.tgt_dict.pad() + bos = self.tgt_dict.bos() + eos = self.tgt_dict.eos() + unk = self.tgt_dict.unk() + + target_masks = ( + target_tokens.ne(pad) & target_tokens.ne(bos) & target_tokens.ne(eos) + ) + target_score = target_tokens.clone().float().uniform_() + target_score.masked_fill_(~target_masks, 2.0) + target_length = target_masks.sum(1).float() + target_length = target_length * target_length.clone().uniform_() + target_length = target_length + 1 # make sure to mask at least one token. 
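+            # At this point target_length is roughly N * U(0, 1) + 1 for a sequence with
+            # N maskable (non-pad/bos/eos) tokens, so between 1 and N of the lowest-scored
+            # (i.e., randomly chosen) positions are replaced by <unk> below.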
+ + _, target_rank = target_score.sort(1) + target_cutoff = new_arange(target_rank) < target_length[:, None].long() + prev_target_tokens = target_tokens.masked_fill( + target_cutoff.scatter(1, target_rank, target_cutoff), unk + ) + return prev_target_tokens + + def _full_mask(target_tokens): + pad = self.tgt_dict.pad() + bos = self.tgt_dict.bos() + eos = self.tgt_dict.eos() + unk = self.tgt_dict.unk() + + target_mask = ( + target_tokens.eq(bos) | target_tokens.eq(eos) | target_tokens.eq(pad) + ) + return target_tokens.masked_fill(~target_mask, unk) + + if self.cfg.noise == "random_delete": + return _random_delete(target_tokens) + elif self.cfg.noise == "random_mask": + return _random_mask(target_tokens) + elif self.cfg.noise == "full_mask": + return _full_mask(target_tokens) + elif self.cfg.noise == "no_noise": + return target_tokens + else: + raise NotImplementedError + + def build_generator(self, models, args, **unused): + # add models input to match the API for SequenceGenerator + from fairseq.iterative_refinement_generator import IterativeRefinementGenerator + + return IterativeRefinementGenerator( + self.target_dictionary, + eos_penalty=getattr(args, "iter_decode_eos_penalty", 0.0), + max_iter=getattr(args, "iter_decode_max_iter", 10), + beam_size=getattr(args, "iter_decode_with_beam", 1), + reranking=getattr(args, "iter_decode_with_external_reranker", False), + decoding_format=getattr(args, "decoding_format", None), + adaptive=not getattr(args, "iter_decode_force_max_iter", False), + retain_history=getattr(args, "retain_iter_history", False), + ) + + def build_dataset_for_inference(self, src_tokens, src_lengths, constraints=None): + if constraints is not None: + # Though see Susanto et al. (ACL 2020): https://www.aclweb.org/anthology/2020.acl-main.325/ + raise NotImplementedError( + "Constrained decoding with the translation_lev task is not supported" + ) + + return LanguagePairDataset( + src_tokens, src_lengths, self.source_dictionary, append_bos=True + ) + + def train_step( + self, sample, model, criterion, optimizer, update_num, ignore_grad=False + ): + model.train() + sample["prev_target"] = self.inject_noise(sample["target"]) + loss, sample_size, logging_output = criterion(model, sample) + if ignore_grad: + loss *= 0 + optimizer.backward(loss) + return loss, sample_size, logging_output + + def valid_step(self, sample, model, criterion): + model.eval() + with torch.no_grad(): + sample["prev_target"] = self.inject_noise(sample["target"]) + loss, sample_size, logging_output = criterion(model, sample) + return loss, sample_size, logging_output diff --git a/SpeechT5/fairseq/fairseq/tasks/translation_multi_simple_epoch.py b/SpeechT5/fairseq/fairseq/tasks/translation_multi_simple_epoch.py new file mode 100644 index 0000000000000000000000000000000000000000..6f36e5b93e98497de31969d203ae04dbb4bd9306 --- /dev/null +++ b/SpeechT5/fairseq/fairseq/tasks/translation_multi_simple_epoch.py @@ -0,0 +1,430 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. 
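+# Note: unless separate --source-dict/--target-dict files are supplied, this task
+# assumes one dictionary shared by all source languages and one shared by all
+# target languages (see check_dicts below); language pairs come from --lang-pairs,
+# e.g. "en-de,en-fr".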
+ +import datetime +import logging +import time + +import torch +from fairseq.data import ( + FairseqDataset, + LanguagePairDataset, + ListDataset, + data_utils, + iterators, +) +from fairseq.data.multilingual.multilingual_data_manager import ( + MultilingualDatasetManager, +) +from fairseq.data.multilingual.sampling_method import SamplingMethod +from fairseq.tasks import LegacyFairseqTask, register_task +from fairseq.utils import FileContentsAction + + +### +def get_time_gap(s, e): + return ( + datetime.datetime.fromtimestamp(e) - datetime.datetime.fromtimestamp(s) + ).__str__() + + +### + + +logger = logging.getLogger(__name__) + + +@register_task("translation_multi_simple_epoch") +class TranslationMultiSimpleEpochTask(LegacyFairseqTask): + """ + Translate from one (source) language to another (target) language. + + Args: + langs (List[str]): a list of languages that are being supported + dicts (Dict[str, fairseq.data.Dictionary]): mapping from supported languages to their dictionaries + training (bool): whether the task should be configured for training or not + + .. note:: + + The translation task is compatible with :mod:`fairseq-train`, + :mod:`fairseq-generate` and :mod:`fairseq-interactive`. + + The translation task provides the following additional command-line + arguments: + + .. argparse:: + :ref: fairseq.tasks.translation_parser + :prog: + """ + + @staticmethod + def add_args(parser): + """Add task-specific arguments to the parser.""" + # fmt: off + parser.add_argument('-s', '--source-lang', default=None, metavar='SRC', + help='inference source language') + parser.add_argument('-t', '--target-lang', default=None, metavar='TARGET', + help='inference target language') + parser.add_argument('--lang-pairs', default=None, metavar='PAIRS', + help='comma-separated list of language pairs (in training order): en-de,en-fr,de-fr', + action=FileContentsAction) + parser.add_argument('--keep-inference-langtok', action='store_true', + help='keep language tokens in inference output (e.g. for analysis or debugging)') + + SamplingMethod.add_arguments(parser) + MultilingualDatasetManager.add_args(parser) + # fmt: on + + def __init__(self, args, langs, dicts, training): + super().__init__(args) + self.langs = langs + self.dicts = dicts + self.training = training + if training: + self.lang_pairs = args.lang_pairs + else: + self.lang_pairs = ["{}-{}".format(args.source_lang, args.target_lang)] + # eval_lang_pairs for multilingual translation is usually all of the + # lang_pairs. However for other multitask settings or when we want to + # optimize for certain languages we want to use a different subset. Thus + # the eval_lang_pairs class variable is provided for classes that extend + # this class. + self.eval_lang_pairs = self.lang_pairs + # model_lang_pairs will be used to build encoder-decoder model pairs in + # models.build_model(). 
This allows multitask type of sub-class can + # build models other than the input lang_pairs + self.model_lang_pairs = self.lang_pairs + self.source_langs = [d.split("-")[0] for d in self.lang_pairs] + self.target_langs = [d.split("-")[1] for d in self.lang_pairs] + self.check_dicts(self.dicts, self.source_langs, self.target_langs) + + self.sampling_method = SamplingMethod.build_sampler(args, self) + self.data_manager = MultilingualDatasetManager.setup_data_manager( + args, self.lang_pairs, langs, dicts, self.sampling_method + ) + + def check_dicts(self, dicts, source_langs, target_langs): + if self.args.source_dict is not None or self.args.target_dict is not None: + # no need to check whether the source side and target side are sharing dictionaries + return + src_dict = dicts[source_langs[0]] + tgt_dict = dicts[target_langs[0]] + for src_lang in source_langs: + assert ( + src_dict == dicts[src_lang] + ), "Diffrent dictionary are specified for different source languages; " + "TranslationMultiSimpleEpochTask only supports one shared dictionary across all source languages" + for tgt_lang in target_langs: + assert ( + tgt_dict == dicts[tgt_lang] + ), "Diffrent dictionary are specified for different target languages; " + "TranslationMultiSimpleEpochTask only supports one shared dictionary across all target languages" + + @classmethod + def setup_task(cls, args, **kwargs): + langs, dicts, training = MultilingualDatasetManager.prepare( + cls.load_dictionary, args, **kwargs + ) + return cls(args, langs, dicts, training) + + def has_sharded_data(self, split): + return self.data_manager.has_sharded_data(split) + + def load_dataset(self, split, epoch=1, combine=False, **kwargs): + """Load a given dataset split. + + Args: + split (str): name of the split (e.g., train, valid, test) + """ + if split in self.datasets: + dataset = self.datasets[split] + if self.has_sharded_data(split): + if self.args.virtual_epoch_size is not None: + if dataset.load_next_shard: + shard_epoch = dataset.shard_epoch + else: + # no need to load next shard so skip loading + # also this avoid always loading from beginning of the data + return + else: + shard_epoch = epoch + else: + # estimate the shard epoch from virtual data size and virtual epoch size + shard_epoch = self.data_manager.estimate_global_pass_epoch(epoch) + logger.info(f"loading data for {split} epoch={epoch}/{shard_epoch}") + logger.info(f"mem usage: {data_utils.get_mem_usage()}") + if split in self.datasets: + del self.datasets[split] + logger.info("old dataset deleted manually") + logger.info(f"mem usage: {data_utils.get_mem_usage()}") + self.datasets[split] = self.data_manager.load_dataset( + split, + self.training, + epoch=epoch, + combine=combine, + shard_epoch=shard_epoch, + **kwargs, + ) + + def build_dataset_for_inference(self, src_tokens, src_lengths, constraints=None): + if constraints is not None: + raise NotImplementedError( + "Constrained decoding with the multilingual_translation task is not supported" + ) + + src_data = ListDataset(src_tokens, src_lengths) + dataset = LanguagePairDataset(src_data, src_lengths, self.source_dictionary) + src_langtok_spec, tgt_langtok_spec = self.args.langtoks["main"] + if self.args.lang_tok_replacing_bos_eos: + dataset = self.data_manager.alter_dataset_langtok( + dataset, + src_eos=self.source_dictionary.eos(), + src_lang=self.args.source_lang, + tgt_eos=self.target_dictionary.eos(), + tgt_lang=self.args.target_lang, + src_langtok_spec=src_langtok_spec, + tgt_langtok_spec=tgt_langtok_spec, + ) + else: + dataset.src 
= self.data_manager.src_dataset_tranform_func( + self.args.source_lang, + self.args.target_lang, + dataset=dataset.src, + spec=src_langtok_spec, + ) + return dataset + + def build_generator( + self, + models, + args, + seq_gen_cls=None, + extra_gen_cls_kwargs=None, + ): + if not getattr(args, "keep_inference_langtok", False): + _, tgt_langtok_spec = self.args.langtoks["main"] + if tgt_langtok_spec: + tgt_lang_tok = self.data_manager.get_decoder_langtok( + self.args.target_lang, tgt_langtok_spec + ) + extra_gen_cls_kwargs = extra_gen_cls_kwargs or {} + extra_gen_cls_kwargs["symbols_to_strip_from_output"] = {tgt_lang_tok} + + return super().build_generator( + models, args, seq_gen_cls=None, extra_gen_cls_kwargs=extra_gen_cls_kwargs + ) + + def build_model(self, args): + return super().build_model(args) + + def valid_step(self, sample, model, criterion): + loss, sample_size, logging_output = super().valid_step(sample, model, criterion) + return loss, sample_size, logging_output + + def inference_step( + self, generator, models, sample, prefix_tokens=None, constraints=None + ): + with torch.no_grad(): + _, tgt_langtok_spec = self.args.langtoks["main"] + if not self.args.lang_tok_replacing_bos_eos: + if prefix_tokens is None and tgt_langtok_spec: + tgt_lang_tok = self.data_manager.get_decoder_langtok( + self.args.target_lang, tgt_langtok_spec + ) + src_tokens = sample["net_input"]["src_tokens"] + bsz = src_tokens.size(0) + prefix_tokens = ( + torch.LongTensor([[tgt_lang_tok]]).expand(bsz, 1).to(src_tokens) + ) + return generator.generate( + models, + sample, + prefix_tokens=prefix_tokens, + constraints=constraints, + ) + else: + return generator.generate( + models, + sample, + prefix_tokens=prefix_tokens, + bos_token=self.data_manager.get_decoder_langtok( + self.args.target_lang, tgt_langtok_spec + ) + if tgt_langtok_spec + else self.target_dictionary.eos(), + ) + + def reduce_metrics(self, logging_outputs, criterion): + super().reduce_metrics(logging_outputs, criterion) + + def max_positions(self): + """Return the max sentence length allowed by the task.""" + return (self.args.max_source_positions, self.args.max_target_positions) + + @property + def source_dictionary(self): + return self.data_manager.get_source_dictionary(self.source_langs[0]) + + @property + def target_dictionary(self): + return self.data_manager.get_target_dictionary(self.target_langs[0]) + + def create_batch_sampler_func( + self, + max_positions, + ignore_invalid_inputs, + max_tokens, + max_sentences, + required_batch_size_multiple=1, + seed=1, + ): + def construct_batch_sampler(dataset, epoch): + splits = [ + s for s, _ in self.datasets.items() if self.datasets[s] == dataset + ] + split = splits[0] if len(splits) > 0 else None + # NEW implementation + if epoch is not None: + # initialize the dataset with the correct starting epoch + dataset.set_epoch(epoch) + + # get indices ordered by example size + start_time = time.time() + logger.info(f"start batch sampler: mem usage: {data_utils.get_mem_usage()}") + + with data_utils.numpy_seed(seed): + indices = dataset.ordered_indices() + logger.info( + f"[{split}] @batch_sampler order indices time: {get_time_gap(start_time, time.time())}" + ) + logger.info(f"mem usage: {data_utils.get_mem_usage()}") + + # filter examples that are too large + if max_positions is not None: + my_time = time.time() + indices = self.filter_indices_by_size( + indices, dataset, max_positions, ignore_invalid_inputs + ) + logger.info( + f"[{split}] @batch_sampler filter_by_size time: {get_time_gap(my_time, 
time.time())}" + ) + logger.info(f"mem usage: {data_utils.get_mem_usage()}") + + # create mini-batches with given size constraints + my_time = time.time() + batch_sampler = dataset.batch_by_size( + indices, + max_tokens=max_tokens, + max_sentences=max_sentences, + required_batch_size_multiple=required_batch_size_multiple, + ) + + logger.info( + f"[{split}] @batch_sampler batch_by_size time: {get_time_gap(my_time, time.time())}" + ) + logger.info( + f"[{split}] per epoch batch_sampler set-up time: {get_time_gap(start_time, time.time())}" + ) + logger.info(f"mem usage: {data_utils.get_mem_usage()}") + + return batch_sampler + + return construct_batch_sampler + + # we need to override get_batch_iterator because we want to reset the epoch iterator each time + def get_batch_iterator( + self, + dataset, + max_tokens=None, + max_sentences=None, + max_positions=None, + ignore_invalid_inputs=False, + required_batch_size_multiple=1, + seed=1, + num_shards=1, + shard_id=0, + num_workers=0, + epoch=1, + data_buffer_size=0, + disable_iterator_cache=False, + ): + """ + Get an iterator that yields batches of data from the given dataset. + + Args: + dataset (~fairseq.data.FairseqDataset): dataset to batch + max_tokens (int, optional): max number of tokens in each batch + (default: None). + max_sentences (int, optional): max number of sentences in each + batch (default: None). + max_positions (optional): max sentence length supported by the + model (default: None). + ignore_invalid_inputs (bool, optional): don't raise Exception for + sentences that are too long (default: False). + required_batch_size_multiple (int, optional): require batch size to + be a multiple of N (default: 1). + seed (int, optional): seed for random number generator for + reproducibility (default: 1). + num_shards (int, optional): shard the data iterator into N + shards (default: 1). + shard_id (int, optional): which shard of the data iterator to + return (default: 0). + num_workers (int, optional): how many subprocesses to use for data + loading. 0 means the data will be loaded in the main process + (default: 0). + epoch (int, optional): the epoch to start the iterator from + (default: 0). + data_buffer_size (int, optional): number of batches to + preload (default: 0). + disable_iterator_cache (bool, optional): don't cache the + EpochBatchIterator (ignores `FairseqTask::can_reuse_epoch_itr`) + (default: False). 
+ Returns: + ~fairseq.iterators.EpochBatchIterator: a batched iterator over the + given dataset split + """ + # initialize the dataset with the correct starting epoch + assert isinstance(dataset, FairseqDataset) + if dataset in self.dataset_to_epoch_iter: + return self.dataset_to_epoch_iter[dataset] + if self.args.sampling_method == "RoundRobin": + batch_iter = super().get_batch_iterator( + dataset, + max_tokens=max_tokens, + max_sentences=max_sentences, + max_positions=max_positions, + ignore_invalid_inputs=ignore_invalid_inputs, + required_batch_size_multiple=required_batch_size_multiple, + seed=seed, + num_shards=num_shards, + shard_id=shard_id, + num_workers=num_workers, + epoch=epoch, + data_buffer_size=data_buffer_size, + disable_iterator_cache=disable_iterator_cache, + ) + self.dataset_to_epoch_iter[dataset] = batch_iter + return batch_iter + + construct_batch_sampler = self.create_batch_sampler_func( + max_positions, + ignore_invalid_inputs, + max_tokens, + max_sentences, + required_batch_size_multiple=required_batch_size_multiple, + seed=seed, + ) + + epoch_iter = iterators.EpochBatchIterator( + dataset=dataset, + collate_fn=dataset.collater, + batch_sampler=construct_batch_sampler, + seed=seed, + num_shards=num_shards, + shard_id=shard_id, + num_workers=num_workers, + epoch=epoch, + ) + return epoch_iter diff --git a/SpeechT5/fairseq/fairseq/token_generation_constraints.py b/SpeechT5/fairseq/fairseq/token_generation_constraints.py new file mode 100644 index 0000000000000000000000000000000000000000..e708dc51bcb0ffb7b411496239c74d5e6f3c2448 --- /dev/null +++ b/SpeechT5/fairseq/fairseq/token_generation_constraints.py @@ -0,0 +1,506 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +"""Implements tracking of constraints for a beam item. + +A list of constraints is given as a list of one or more token +sequences, each of length at least one token. For example, for an input sentence + +> Die maschinelle Übersetzung ist schwer zu kontrollieren. + +We could have the constraints: +* to influence +* hard + +There are two implementations: +* OrderedConstraintState: Tracks progress through an ordered list of multitoken constraints. +* UnorderedConstraintState: Tracks progress through an unordered list of multitoken constraints. + +The difference is that in the first, the constraints are assumed to be +in order; the algorithm will permit zero or more tokens between them. +In the second, the constraints are not ordered, so many orderings will +be explored. + +The same sequence can be present any number of times, and will appear +that many times in the output. +""" + +from collections import Counter +from typing import List, Optional, Set, Tuple + +import torch + + +class ConstraintState: + def __init__(self): + pass + + +def pack_constraints(batch_constraints: List[List[torch.Tensor]]) -> torch.Tensor: + """Takes a list of list of constraints in tensor form (a list of + tensor constraints for each sentence) and transforms it into a + packed Tensor. For example, here is a batch of size 3 with 3, 0, + and 1 constraints: + + [ [ [3 1 2], [3], [4 5 6 7], ] + [], + [ [1 8 9 10 1 4 11 12], ] + ] + + Its corresponding packed structure is: + + [ [ 3 3 1 2 0 3 0 4 5 6 7 0], + [ 0 0 0 0 0 0 0 0 0 0 0 0], + [ 1 1 8 9 10 1 4 11 12 0 0 0] ] + + The packed tensor has shape (batch size, maxlen), where + maxlen is defined below. 
Each row contains concatenated + constraint tokens for that sentence, with 0 appended after + each constraint. The first item in each row is the number + of constraints for that sentence. So maxlen is the maximum + of + + (number of constraints) + (sum length of constraints) + 1. + + across all sentences in the batch. + """ + # The maximum word length of concatenated constraints for any sentence + max_constraints_len = 1 + for sentence_constraints in batch_constraints: + if len(sentence_constraints): + # number of constraints, plus sum of constrain lens, plus a zero after each + constraints_len = ( + 1 + + sum([c.size(0) for c in sentence_constraints]) + + len(sentence_constraints) + ) + max_constraints_len = max(max_constraints_len, constraints_len) + + batch_size = len(batch_constraints) + constraints_tensor = torch.zeros((batch_size, max_constraints_len)).long() + for i, sentence_constraints in enumerate(batch_constraints): + constraints_tensor[i, 0] = len(sentence_constraints) + offset = 1 + for j, constraint in enumerate(sentence_constraints): + this_len = constraint.size(0) + constraints_tensor[i, offset : offset + this_len] = constraint + offset += this_len + 1 + + return constraints_tensor.long() + + +def unpack_constraints(constraint_tensor: torch.Tensor) -> List[torch.Tensor]: + """ + Transforms *one row* of a packed constraint tensor (e.g., for one + sentence in the batch) into a list of constraint tensors. + """ + constraint_list = [] + num_constraints = constraint_tensor[0] + constraints = constraint_tensor.tolist() + offset = 1 + for i in range(num_constraints): + where = constraints.index(0, offset) + constraint_list.append(constraint_tensor[offset:where]) + offset = where + 1 + + return constraint_list + + +class ConstraintNode: + """ + Represents a node in a trie managing unordered constraints. + """ + + def __init__(self, token: int = None, parent=None): + # The token associate with this node (None for the root) + self.token = int(token) if token is not None else None + # The parent (None at the root) + self.parent = parent + # Whether this node is a completed constraint + self.terminal = 0 + # List of child nodes + self.children = {} + + # The cumulative number of constraints from this point in the + # trie forward + self.num_constraints = 0 + + @property + def id(self): + return self.token + + def __str__(self): + term = self.terminal != 0 + return f"[{self.token}].{term}#{self.num_constraints}" + + def __getitem__(self, key: int): + return self.children.get(key, None) + + def next_tokens(self) -> Set[int]: + """The set of child labels.""" + return set(self.children.keys()) + + @staticmethod + def create(constraints: List[List[int]]): + root = ConstraintNode() + for sequence in constraints: + root.add_sequence(sequence) + + return root + + @staticmethod + def print_graph(node: "ConstraintNode"): + if len(node.children) == 0: + return str(node) + else: + s = f"({node}" + for child in node.children.values(): + s += " " + ConstraintNode.print_graph(child) + s += ")" + return s + + def token_counts(self) -> Counter: + """Returns a counter of the number of times each token is used + in a constraint. 
+ """ + token_counts = Counter() + kids = list(self.children.values()) + while len(kids) > 0: + kid = kids.pop() + token_counts[kid.id] += kid.num_constraints + kids += list(kid.children.values()) + + return token_counts + + def tokens(self) -> Set[int]: + """Returns the set of tokens in constraints.""" + return set(self.token_counts().keys()) + + def add_sequence(self, sequence: List[int]): + """Adds a constraint, represented as a list of integers, to + the trie.""" + assert len(sequence) > 0 + + token = int(sequence[0]) + if token not in self.children: + self.children[token] = ConstraintNode(token, parent=self) + + node = self.children[token] + if len(sequence) == 1: + node.terminal += 1 + node.num_constraints += 1 + parent = node.parent + while parent is not None: + parent.num_constraints += 1 + parent = parent.parent + else: + node.add_sequence(sequence[1:]) + + +class UnorderedConstraintState(ConstraintState): + """ + Records progress through the set of constraints for each item in the beam + using a trie. + """ + + def __init__(self, node: ConstraintNode, copy_from: "ConstraintState" = None): + self.node = node + + if copy_from is None: + # The root node + self.root = node + # The set of states in the graph that have been completed + self.completed = Counter() + # The... + self.generated = Counter() + # The list of tokens we need to generate + self.needed_tokens = self.root.tokens() + else: + self.completed = Counter(copy_from.completed) + self.generated = Counter(copy_from.generated) + self.root = copy_from.root + + # Mark the node as generated + if self.node != self.root: + self.generated[node] += 1 + + @staticmethod + def create(constraint_tensor: torch.Tensor): + constraint_list = unpack_constraints(constraint_tensor) + constraint_trie_root = ConstraintNode.create(constraint_list) + return UnorderedConstraintState(constraint_trie_root) + + def __str__(self): + gen_str = ",".join([str(node) for node in self.generated]) + return f"{self.name}/{self.bank}({gen_str})x{self.num_completed}" + + def __copy__(self): + copied_state = UnorderedConstraintState(self.node, copy_from=self) + return copied_state + + def copy(self): + return self.__copy__() + + @property + def name(self): + if self.node.id is None: + return "ROOT" + else: + return str(self.node.id) + + @property + def is_root(self): + return self.node == self.root + + @property + def bank(self): + return sum(self.generated.values()) + + @property + def num_completed(self): + """The number of constraints (not constraint tokens) that are completed. + In addition to the already-completed states, we need to account for the + current state, which might get marked as completed when another token + is generated. + """ + in_final = self.node.terminal and self.completed[self.node] < self.node.terminal + return sum(self.completed.values()) + in_final + + @property + def finished(self): + return self.root.num_constraints - self.num_completed == 0 + + @property + def token_counts(self): + return self.root.token_counts() + + @property + def tokens(self): + return self.root.tokens() + + @property + def num_constraint_tokens(self): + return sum(self.token_counts.values()) + + def next_tokens(self) -> Set[int]: + """Returns the list of tokens that could come next. 
+ These are (a) all tokens extending the root state and, for + non-root states, additionally all tokens extending the current + state.""" + + if self.node != self.root: + return self.root.next_tokens().union(self.node.next_tokens()) + else: + return self.root.next_tokens() + + def advance(self, token: int): + """Reads in a token and advances the state. Here's how it works. + + We can advance to the next state if: + - there is a matching child + - its path isn't blocked + + A path is blocked when all constraints that are descendants of + that node have already been generated, in the current state. + + If we are not able to advance from the current state, we "fall + off the graph" and return to the root state. There, we again + try to advance, checking the same criteria. + + In any case, when falling off the graph, we need to do some + bookkeeping. We: + - check whether any constraints were met (all prefixes of + current state) + - if one is found, mark it as completed + - adjust visited nodes accordingly + """ + token = int(token) + + next_state = None + child = self.node[token] + if child is not None and self.generated[child] < child.num_constraints: + next_state = UnorderedConstraintState(child, copy_from=self) + + def rewind(): + """If we're mid-trie and an "illegal" token is chosen next, we need + to reset our state to the root state. However, along the way, we need + to check whether a prefix of the current trie state represents a state + we could mark as completed. + """ + node = self.node + while node != self.root: + if node.terminal and self.completed[node] < node.terminal: + next_state.completed[node] += 1 + return + + next_state.generated[node] -= 1 + node = node.parent + + # Fall off the graph, check the root + if next_state is None and token in self.root.next_tokens(): + child = self.root[token] + # We can only traverse this edge if it's not saturated + if self.generated[child] < child.num_constraints: + next_state = UnorderedConstraintState(child, copy_from=self) + else: + next_state = UnorderedConstraintState(self.root, copy_from=self) + + # Rewind + rewind() + + elif next_state is None: + next_state = UnorderedConstraintState(self.root, copy_from=self) + # Rewind + rewind() + + return next_state + + +class ConstraintSequence: + def __init__(self, sequences: List[List[int]]): + """Represents a set of possibly multitoken constraints by + concatenating them and internally recording the end points. + """ + self.sequences = [] + self.endpoints = [] + self.num_tokens = 0 + self.tokens = set() + for sequence in sequences: + for token in sequence: + self.tokens.add(token) + self.num_tokens += len(sequence) + self.endpoints += [False for x in range(len(sequence) - 1)] + [True] + self.sequences += sequence + + def __getitem__(self, key: int): + return self.sequences[key] + + def __len__(self): + return len(self.sequences) + + def __str__(self): + return str(self.sequences) + + +class OrderedConstraintState(ConstraintState): + """ + Records progress through the set of linear nonbranching constraints with gaps. 
+ """ + + def __init__(self, sequence: ConstraintSequence, state: int = -1): + self.sequence = sequence + self.state = state + + @staticmethod + def create(constraint_tensor: torch.Tensor): + constraint_list = unpack_constraints(constraint_tensor) + return OrderedConstraintState(ConstraintSequence(constraint_list), -1) + + def __str__(self): + return f"{self.state}/{self.bank}x{self.num_completed}" + + def __copy__(self): + return OrderedConstraintState(self.sequence, self.state) + + def copy(self): + return self.__copy__() + + @property + def num_completed(self): + if self.state == -1: + return 0 + count = len( + list(filter(lambda x: x, self.sequence.endpoints[0 : self.state + 1])) + ) + return count + + @property + def is_root(self): + return self.state == -1 + + @property + def name(self): + if self.state == -1: + return "ROOT" + else: + return str(self.sequence[self.state]) + + @property + def bank(self) -> int: + return self.state + 1 + + @property + def finished(self): + return self.state + 1 == len(self.sequence) + + @property + def token_counts(self): + return self.sequence.token_counts() + + @property + def tokens(self): + return self.sequence.tokens + + @property + def num_constraint_tokens(self): + return sum(self.token_counts.values()) + + def next_tokens(self) -> Set[int]: + """Returns the list of tokens that could come next. + These are (a) all tokens extending the root state and, for + non-root states, additionally all tokens extending the current + state.""" + + tokens = set() + if self.state > 0: + tokens.add(self.sequence[0]) + if not self.finished: + tokens.add(self.sequence[self.state + 1]) + return tokens + + def advance(self, token: int): + """Reads in a token and advances the state. Here's how it works. + + We can advance to the next state if: + - there is a matching child + - its path isn't blocked + + A path is blocked when all constraints that are descendants of + that node have already been generated, in the current state. + + If we are not able to advance from the current state, we "fall + off the graph" and return to the root state. There, we again + try to advance, checking the same criteria. + + In any case, when falling off the graph, we need to do some + bookkeeping. We: + - check whether any constraints were met (all prefixes of + current state) + - if one is found, mark it as completed + - adjust visited nodes accordingly + """ + token = int(token) + # print(f"{self} ADVANCE({token}) {self.sequence} -> ", end="") + + if self.finished: + # Accept anything + next_state = self.copy() + + elif self.sequence[self.state + 1] == token: + # Advance to the next token + next_state = OrderedConstraintState(self.sequence, self.state + 1) + + elif self.sequence.endpoints[self.state]: + # Accept anything between constraints (*) + next_state = self.copy() + + elif token == self.sequence[0]: + # Start over having generated the first token + next_state = OrderedConstraintState(self.sequence, 0) + else: + # Start over from the root + next_state = OrderedConstraintState(self.sequence, -1) + + return next_state diff --git a/SpeechT5/fairseq/fairseq/tokenizer.py b/SpeechT5/fairseq/fairseq/tokenizer.py new file mode 100644 index 0000000000000000000000000000000000000000..42131f7b1d334020c3b48a6e44d4139f7c62ad28 --- /dev/null +++ b/SpeechT5/fairseq/fairseq/tokenizer.py @@ -0,0 +1,15 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. 
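The two `ConstraintState` implementations above differ only in whether constraints may be satisfied in any order (the trie-backed `UnorderedConstraintState`) or must appear left to right with gaps in between (`OrderedConstraintState`). A minimal sketch, not part of the patch, of how a decoder might advance these states; the import path assumes the upstream `fairseq/token_generation_constraints.py` layout and the token ids are invented:

```python
# Illustrative only: token ids are made up and the import assumes the
# upstream fairseq module layout for the constraint classes shown above.
from fairseq.token_generation_constraints import (
    ConstraintNode,
    ConstraintSequence,
    OrderedConstraintState,
    UnorderedConstraintState,
)

constraints = [[3, 4], [5]]  # two constraints: the phrase "3 4" and the single token "5"

# Unordered: constraints may be completed in any interleaving.
state = UnorderedConstraintState(ConstraintNode.create(constraints))
for token in [5, 3, 4]:
    state = state.advance(token)
assert state.finished

# Ordered: constraints must appear in the given order; an unconstrained
# token (here 9) is accepted once an endpoint has been reached.
ordered = OrderedConstraintState(ConstraintSequence(constraints), -1)
for token in [3, 4, 9, 5]:
    ordered = ordered.advance(token)
assert ordered.finished
```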
+ +import re + + +SPACE_NORMALIZER = re.compile(r"\s+") + + +def tokenize_line(line): + line = SPACE_NORMALIZER.sub(" ", line) + line = line.strip() + return line.split() diff --git a/SpeechT5/fairseq/fairseq/trainer.py b/SpeechT5/fairseq/fairseq/trainer.py new file mode 100644 index 0000000000000000000000000000000000000000..1deb14326f90dea246b9a1a8d3b97b95c5472a5e --- /dev/null +++ b/SpeechT5/fairseq/fairseq/trainer.py @@ -0,0 +1,1439 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +""" +Train a network across multiple GPUs. +""" + +import contextlib +import logging +import sys +import time +from argparse import Namespace +from itertools import chain +from typing import Any, Dict, List + +import torch +from fairseq import checkpoint_utils, models, optim, utils +from fairseq.dataclass.configs import FairseqConfig +from fairseq.dataclass.utils import convert_namespace_to_omegaconf +from fairseq.distributed import utils as distributed_utils +from fairseq.file_io import PathManager +from fairseq.logging import meters, metrics +from fairseq.nan_detector import NanDetector +from fairseq.optim import lr_scheduler +from omegaconf import OmegaConf + +logger = logging.getLogger(__name__) + + +class Trainer(object): + """Main class for data parallel training. + + This class supports synchronous distributed data parallel training, + where multiple workers each have a full model replica and gradients + are accumulated across workers before each update. We use + :class:`~torch.nn.parallel.DistributedDataParallel` to handle + communication of the gradients across workers. + """ + + def __init__(self, cfg: FairseqConfig, task, model, criterion, quantizer=None): + + if isinstance(cfg, Namespace): + logger.warning( + "argparse.Namespace configuration is deprecated! 
Automatically converting to OmegaConf" + ) + cfg = convert_namespace_to_omegaconf(cfg) + + self.cfg = cfg + self.task = task + + # catalog shared parameters + shared_params = _catalog_shared_params(model) + self.tpu = cfg.common.tpu + self.cuda = torch.cuda.is_available() and not cfg.common.cpu and not self.tpu + if self.cuda: + self.device = torch.device("cuda") + elif self.tpu: + self.device = utils.get_tpu_device() + else: + self.device = torch.device("cpu") + + if self.cfg.distributed_training.ddp_backend == "fully_sharded": + if self.cfg.common.bf16: + raise ValueError( + "FullyShardedDataParallel is not compatible with --bf16 or " + "--memory-efficient-bf16" + ) + if self.cfg.distributed_training.zero_sharding != "none": + raise ValueError( + "FullyShardedDataParallel is not compatible with --zero-sharding " + "option (it's already built in)" + ) + else: + if ( + hasattr(self.cfg.distributed_training, "cpu_offload") + and self.cfg.distributed_training.cpu_offload + ): + raise ValueError("--cpu-offload requires --ddp-backend=fully_sharded") + + # copy model and criterion to current device/dtype + self._criterion = criterion + self._model = model + if cfg.distributed_training.ddp_backend != "fully_sharded": + if cfg.common.fp16: + assert not cfg.common.amp, "Cannot use fp16 and AMP together" + self._criterion = self._criterion.half() + self._model = self._model.half() + elif cfg.common.bf16: + self._criterion = self._criterion.to(dtype=torch.bfloat16) + self._model = self._model.to(dtype=torch.bfloat16) + elif cfg.common.amp: + self._amp_retries = 0 + if ( + not cfg.distributed_training.pipeline_model_parallel + # the DistributedFairseqModel wrapper will handle moving to device, + # so only handle cases which don't use the wrapper + and not self.use_distributed_wrapper + ): + self._criterion = self._criterion.to(device=self.device) + self._model = self._model.to(device=self.device) + self.pipeline_model_parallel = cfg.distributed_training.pipeline_model_parallel + self.last_device = None + if self.cuda and self.pipeline_model_parallel: + self.last_device = torch.device( + cfg.distributed_training.pipeline_devices[-1] + ) + + # check that shared parameters are preserved after device transfer + for shared_param in shared_params: + ref = _get_module_by_path(self._model, shared_param[0]) + for path in shared_param[1:]: + logger.info( + "detected shared parameter: {} <- {}".format(shared_param[0], path) + ) + _set_module_by_path(self._model, path, ref) + + self._dummy_batch = None # indicates we don't have a dummy batch at first + self._lr_scheduler = None + self._num_updates = 0 + self._num_xla_compiles = 0 # for TPUs + self._optim_history = None + self._optimizer = None + self._warn_once = set() + self._wrapped_criterion = None + self._wrapped_model = None + + # TODO(myleott): support tpu + if self.cuda and self.data_parallel_world_size > 1: + self._grad_norm_buf = torch.cuda.DoubleTensor(self.data_parallel_world_size) + else: + self._grad_norm_buf = None + + self.quantizer = quantizer + if self.quantizer is not None: + self.quantizer.set_trainer(self) + + # get detailed cuda environment + if self.cuda: + self.cuda_env = utils.CudaEnvironment() + if self.data_parallel_world_size > 1: + self.cuda_env_arr = distributed_utils.all_gather_list( + self.cuda_env, group=distributed_utils.get_global_group() + ) + else: + self.cuda_env_arr = [self.cuda_env] + if self.data_parallel_rank == 0: + utils.CudaEnvironment.pretty_print_cuda_env_list(self.cuda_env_arr) + else: + self.cuda_env = None + 
self.cuda_env_arr = None + + metrics.log_start_time("wall", priority=790, round=0) + + self._start_time = time.time() + self._previous_training_time = 0 + self._cumulative_training_time = None + + def reinitialize(self): + """Reinitialize the Trainer, typically after model params change.""" + self._lr_scheduler = None + self._optimizer = None + self._wrapped_criterion = None + self._wrapped_model = None + + @property + def data_parallel_world_size(self): + if self.cfg.distributed_training.distributed_world_size == 1: + return 1 + return distributed_utils.get_data_parallel_world_size() + + @property + def data_parallel_process_group(self): + return distributed_utils.get_data_parallel_group() + + @property + def data_parallel_rank(self): + if self.cfg.distributed_training.distributed_world_size == 1: + return 0 + return distributed_utils.get_data_parallel_rank() + + @property + def is_data_parallel_master(self): + # NOTE: this returns true for all model parallel replicas with data + # parallel rank 0 + return self.data_parallel_rank == 0 + + @property + def use_distributed_wrapper(self) -> bool: + return ( + self.data_parallel_world_size > 1 and not self.cfg.optimization.use_bmuf + ) or ( + self.cfg.distributed_training.ddp_backend == "fully_sharded" + and self.cfg.distributed_training.cpu_offload + ) + + @property + def should_save_checkpoint_on_current_rank(self) -> bool: + """Indicates whether to save checkpoints on the current DDP rank.""" + if ( + self.cfg.distributed_training.ddp_backend == "fully_sharded" + and self.cfg.distributed_training.use_sharded_state + ) or getattr(self.cfg.model, "base_layers", 0) > 0: + return True + else: + return self.is_data_parallel_master + + @property + def always_call_state_dict_during_save_checkpoint(self) -> bool: + if ( + self.cfg.distributed_training.ddp_backend == "fully_sharded" + and not self.cfg.distributed_training.use_sharded_state + ): + # FSDP calls communication collective when consolidating checkpoints + return True + else: + return False + + @property + def checkpoint_suffix(self) -> str: + """Suffix to add to the checkpoint file name.""" + if ( + self.cfg.distributed_training.ddp_backend == "fully_sharded" + and self.cfg.distributed_training.use_sharded_state + ): + return self.cfg.checkpoint.checkpoint_suffix + "-shard{0}".format( + self.data_parallel_rank + ) + else: + return self.cfg.checkpoint.checkpoint_suffix or "" + + @property + def criterion(self): + if self._wrapped_criterion is None: + if utils.has_parameters(self._criterion) and self.use_distributed_wrapper: + self._wrapped_criterion = models.DistributedFairseqModel( + self.cfg.distributed_training, + self._criterion, + process_group=self.data_parallel_process_group, + device=self.device, + ) + else: + self._wrapped_criterion = self._criterion + return self._wrapped_criterion + + @property + def model(self): + if self._wrapped_model is None: + if self.use_distributed_wrapper: + self._wrapped_model = models.DistributedFairseqModel( + self.cfg.distributed_training, + self._model, + process_group=self.data_parallel_process_group, + device=self.device, + ) + else: + self._wrapped_model = self._model + return self._wrapped_model + + @property + def optimizer(self): + if self._optimizer is None: + self._build_optimizer() + return self._optimizer + + @property + def lr_scheduler(self): + if self._lr_scheduler is None: + self._build_optimizer() # this will initialize self._lr_scheduler + return self._lr_scheduler + + def _build_optimizer(self): + params = list( + filter( + lambda 
p: p.requires_grad, + chain(self.model.parameters(), self.criterion.parameters()), + ) + ) + + if ( + self.cfg.distributed_training.ddp_backend == "fully_sharded" + and self.cfg.common.fp16 + ): + # FullyShardedDataParallel always uses MemoryEfficientFP16 wrapper, + # mostly for the grad scaling. But if we don't have the + # --memory-efficient-fp16 flag set, then we're effectively doing + # regular --fp16 and can allow the use of optimizers that would + # otherwise be unsupported by MemoryEfficientFP16Optimizer. + allow_unsupported = not self.cfg.common.memory_efficient_fp16 + self._optimizer = optim.MemoryEfficientFP16Optimizer.build_optimizer( + self.cfg, params, allow_unsupported=allow_unsupported + ) + elif self.cfg.common.fp16 or self.cfg.common.bf16 or self.cfg.common.amp: + if self.cuda and torch.cuda.get_device_capability(0)[0] < 7: + logger.info( + "NOTE: your device does NOT support faster training with --fp16 or --amp, " + "please switch to FP32 which is likely to be faster" + ) + if ( + self.cfg.common.memory_efficient_fp16 + or self.cfg.common.memory_efficient_bf16 + ): + self._optimizer = optim.MemoryEfficientFP16Optimizer.build_optimizer( + self.cfg, params + ) + elif self.cfg.common.amp: + self._optimizer = optim.AMPOptimizer.build_optimizer(self.cfg, params) + else: + self._optimizer = optim.FP16Optimizer.build_optimizer(self.cfg, params) + else: + if self.cuda and torch.cuda.get_device_capability(0)[0] >= 7: + logger.info("NOTE: your device may support faster training with --fp16 or --amp") + self._optimizer = optim.build_optimizer(self.cfg.optimizer, params) + + if self.cfg.distributed_training.ddp_backend == "fully_sharded": + assert ( + not self.cfg.optimization.use_bmuf + ), "--ddp-backend=fully_sharded is not compatible with BMUF" + assert self._optimizer.supports_flat_params, ( + "--ddp-backend=fully_sharded is only compatible with pointwise " + "optimizers (e.g., Adam, AdamW, Adadelta, Adamax, SGD, etc.). " + "However, the sharding will result in slightly different results when " + "using non-pointwise optimizers (e.g., Adagrad, Adafactor, LAMB)" + ) + + if self.cfg.optimization.use_bmuf: + self._optimizer = optim.FairseqBMUF( + self.cfg.bmuf, + self._optimizer, + ) + + if self.cfg.distributed_training.zero_sharding == "os": + if ( + self.cfg.common.fp16 + and not self.cfg.common.memory_efficient_fp16 + and not self.cfg.common.memory_efficient_bf16 + ) and not self.cfg.common.fp16_no_flatten_grads: + raise ValueError( + "ZeRO is incomptabile with fp16 and flattened grads. " + "Please use --fp16-no-flatten-grads" + ) + else: + optim.shard_(self._optimizer, self.data_parallel_process_group) + + # We should initialize the learning rate scheduler immediately after + # building the optimizer, so that the initial learning rate is set. 
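+        # (step_update(0) below initializes the learning rate for update 0, so
+        # schedules with warmup start from their configured initial value)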
+ self._lr_scheduler = lr_scheduler.build_lr_scheduler( + self.cfg.lr_scheduler, + self.optimizer, + ) + self._lr_scheduler.step_update(0) + + def consolidate_optimizer(self): + """For OSS, we need to consolidate the state dict.""" + if self.cfg.checkpoint.no_save_optimizer_state: + return + self._gathered_optim_state = None + if hasattr(self.optimizer.optimizer, "consolidate_state_dict"): + self.optimizer.optimizer.consolidate_state_dict() + + elif ( + self.cfg.distributed_training.ddp_backend == "fully_sharded" + and not self.model.use_sharded_state + ): + st = self.model.gather_full_optim_state_dict( + self.optimizer + ) # only returns on rank 0 + self._gathered_optim_state = st + + def state_dict(self): + state_dict = { + "args": None, # legacy + "cfg": ( + OmegaConf.to_container(self.cfg, resolve=True, enum_to_str=True) + if OmegaConf.is_config(self.cfg) + else self.cfg + ), + "model": self.model.state_dict(), + "criterion": ( + self.criterion.state_dict() + if utils.has_parameters(self.criterion) + else None + ), + "optimizer_history": (self._optim_history or []) + + [ + { + "criterion_name": self.get_criterion().__class__.__name__, + "optimizer_name": self.optimizer.__class__.__name__, + "lr_scheduler_state": self.lr_scheduler.state_dict(), + "num_updates": self.get_num_updates(), + } + ], + "task_state": self.task.state_dict() if self.task is not None else {}, + "extra_state": { + "metrics": metrics.state_dict(), + "previous_training_time": self.cumulative_training_time(), + }, + } + if not self.cfg.checkpoint.no_save_optimizer_state: + if self._gathered_optim_state is not None: + state_dict["last_optimizer_state"] = self._gathered_optim_state + self._gathered_optim_state = None + else: + state_dict["last_optimizer_state"] = self.optimizer.state_dict() + if self.cfg.distributed_training.ddp_backend == "fully_sharded": + # save meta data for recombining checkpoint upon loading + state_dict["fsdp_metadata"] = self.model.local_metadata_dict() + return state_dict + + def save_checkpoint(self, filename, extra_state): + """Save all training state in a checkpoint file.""" + logger.info(f"Saving checkpoint to {filename}") + # call state_dict on all ranks in case it needs internal communication + state_dict = utils.move_to_cpu(self.state_dict()) + state_dict["extra_state"].update(extra_state) + if self.should_save_checkpoint_on_current_rank: + checkpoint_utils.torch_persistent_save( + state_dict, + filename, + async_write=self.cfg.checkpoint.write_checkpoints_asynchronously, + ) + logger.info(f"Finished saving checkpoint to {filename}") + + def load_checkpoint( + self, + filename, + reset_optimizer=False, + reset_lr_scheduler=False, + optimizer_overrides=None, + reset_meters=False, + ): + """ + Load all training state from a checkpoint file. + rank = 0 will load the checkpoint, and then broadcast it to all + other ranks. 
+ """ + extra_state, self._optim_history, last_optim_state = None, [], None + + logger.info(f"Preparing to load checkpoint {filename}") + is_distributed = self.data_parallel_world_size > 1 + bexists = PathManager.isfile(filename) + if bexists: + load_on_all_ranks = ( + self.cfg.checkpoint.load_checkpoint_on_all_dp_ranks + # TPUs don't support broadcast yet, so load checkpoints + # on every worker for now + or self.tpu + # FSDP requires loading checkpoint shards on all ranks + or ( + self.cfg.distributed_training.ddp_backend == "fully_sharded" + and self.cfg.distributed_training.use_sharded_state + ) + or getattr(self.cfg.model, "base_layers", 0) > 0 + ) + + if load_on_all_ranks or self.data_parallel_rank == 0: + state = checkpoint_utils.load_checkpoint_to_cpu( + filename, load_on_all_ranks=load_on_all_ranks + ) + last_optim_state = state.get("last_optimizer_state", None) + + # If doing zero_sharding, do not broadcast global optimizer + # state. Later we will broadcast sharded states to each rank + # to avoid memory from exploding. + if ( + not load_on_all_ranks + and self.cfg.distributed_training.zero_sharding == "os" + and "last_optimizer_state" in state + and is_distributed + ): + state["last_optimizer_state"] = "SHARDED" + else: + last_optim_state = None + state = None + + if is_distributed and not load_on_all_ranks: + state = distributed_utils.broadcast_object( + state, + src_rank=0, + group=self.data_parallel_process_group, + dist_device=self.device, + ) + if self.data_parallel_rank > 0: + last_optim_state = state.get("last_optimizer_state", None) + + # load model parameters + try: + self.model.load_state_dict( + state["model"], strict=True, model_cfg=self.cfg.model + ) + # save memory for later steps + del state["model"] + if utils.has_parameters(self.get_criterion()): + self.get_criterion().load_state_dict( + state["criterion"], strict=True + ) + del state["criterion"] + + except Exception: + raise Exception( + "Cannot load model parameters from checkpoint {}; " + "please ensure that the architectures match.".format(filename) + ) + extra_state = state["extra_state"] + self._optim_history = state["optimizer_history"] + + if last_optim_state is not None and not reset_optimizer: + # rebuild optimizer after loading model, since params may have changed + self._build_optimizer() + + # only reload optimizer and lr_scheduler if they match + last_optim = self._optim_history[-1] + assert ( + last_optim["criterion_name"] == self.get_criterion().__class__.__name__ + ), f"Criterion does not match; please reset the optimizer (--reset-optimizer). {last_optim['criterion_name']} vs {self.get_criterion().__class__.__name__}" + assert ( + last_optim["optimizer_name"] == self.optimizer.__class__.__name__ + ), f"Optimizer does not match; please reset the optimizer (--reset-optimizer). 
{last_optim['optimizer_name']} vs {self.optimizer.__class__.__name__}" + + if not reset_lr_scheduler: + self.lr_scheduler.load_state_dict(last_optim["lr_scheduler_state"]) + + if ( + self.cfg.distributed_training.ddp_backend == "fully_sharded" + and not self.model.use_sharded_state + ): + # if use_sharded_state, the last_optim_state is already sharded, skip this + last_optim_state = self.model.get_shard_from_optim_state_dict( + last_optim_state + ) + elif not load_on_all_ranks and is_distributed: + last_optim_state = self.optimizer.broadcast_global_state_dict( + last_optim_state + ) + + self.optimizer.load_state_dict(last_optim_state, optimizer_overrides) + + self.set_num_updates(last_optim["num_updates"]) + + if extra_state is not None: + itr_state = extra_state["train_iterator"] + epoch = itr_state["epoch"] + + if "previous_training_time" in extra_state: + self._previous_training_time = extra_state["previous_training_time"] + self._start_time = time.time() + + self.lr_step(epoch) + + if ( + itr_state.get("version", 1) >= 2 + and itr_state["iterations_in_epoch"] == 0 + ): + # reset meters at start of epoch + reset_meters = True + + if "metrics" in extra_state and not reset_meters: + metrics.load_state_dict(extra_state["metrics"]) + + # reset TimeMeters, since their start times don't make sense anymore + for meter in metrics.get_meters("default"): + if isinstance(meter, meters.TimeMeter): + meter.reset() + + logger.info( + "Loaded checkpoint {} (epoch {} @ {} updates)".format( + filename, epoch, self.get_num_updates() + ) + ) + + else: + logger.info("No existing checkpoint found {}".format(filename)) + + return extra_state + + def get_train_iterator( + self, + epoch, + combine=True, + load_dataset=True, + data_selector=None, + shard_batch_itr=True, + disable_iterator_cache=False, + ): + """Return an EpochBatchIterator over the training set for a given epoch.""" + if load_dataset: + logger.info("loading train data for epoch {}".format(epoch)) + self.task.load_dataset( + self.cfg.dataset.train_subset, + epoch=epoch, + combine=combine, + data_selector=data_selector, + tpu=self.tpu, + ) + batch_iterator = self.task.get_batch_iterator( + dataset=self.task.dataset(self.cfg.dataset.train_subset), + max_tokens=self.cfg.dataset.max_tokens, + max_sentences=self.cfg.dataset.batch_size, + max_positions=utils.resolve_max_positions( + self.task.max_positions(), + self.model.max_positions(), + self.cfg.dataset.max_tokens, + ), + ignore_invalid_inputs=True, + required_batch_size_multiple=self.cfg.dataset.required_batch_size_multiple, + seed=self.cfg.common.seed, + num_shards=self.data_parallel_world_size if shard_batch_itr else 1, + shard_id=self.data_parallel_rank if shard_batch_itr else 0, + num_workers=self.cfg.dataset.num_workers, + epoch=epoch, + data_buffer_size=self.cfg.dataset.data_buffer_size, + disable_iterator_cache=disable_iterator_cache, + ) + self.reset_dummy_batch(batch_iterator.first_batch) + return batch_iterator + + def get_valid_iterator( + self, + subset, + disable_iterator_cache=False, + ): + """Return an EpochBatchIterator over given validation subset for a given epoch.""" + batch_iterator = self.task.get_batch_iterator( + dataset=self.task.dataset(subset), + max_tokens=self.cfg.dataset.max_tokens_valid, + max_sentences=self.cfg.dataset.batch_size_valid, + max_positions=utils.resolve_max_positions( + self.task.max_positions(), + self.model.max_positions(), + ), + ignore_invalid_inputs=self.cfg.dataset.skip_invalid_size_inputs_valid_test, + 
required_batch_size_multiple=self.cfg.dataset.required_batch_size_multiple, + seed=self.cfg.common.seed, + num_shards=self.data_parallel_world_size, + shard_id=self.data_parallel_rank, + num_workers=self.cfg.dataset.num_workers, + # always pass a fixed "epoch" to keep validation data consistent + # across training epochs + epoch=1, + data_buffer_size=self.cfg.dataset.data_buffer_size, + disable_iterator_cache=disable_iterator_cache, + ) + self.reset_dummy_batch(batch_iterator.first_batch) + return batch_iterator + + def begin_epoch(self, epoch): + """Called at the beginning of each epoch.""" + logger.info("begin training epoch {}".format(epoch)) + + self.lr_step_begin_epoch(epoch) + + if self.quantizer is not None: + self.quantizer.begin_epoch(epoch) + + # task specific setup per epoch + self.task.begin_epoch(epoch, self.get_model()) + + if self.tpu: + import torch_xla.core.xla_model as xm + + xm.rendezvous("begin_epoch") # wait for all workers + xm.mark_step() + + def begin_valid_epoch(self, epoch): + """Called at the beginning of each validation epoch.""" + + # task specific setup per validation epoch + self.task.begin_valid_epoch(epoch, self.get_model()) + + def reset_dummy_batch(self, batch): + self._dummy_batch = batch + + @metrics.aggregate("train") + def train_step(self, samples, raise_oom=False): + """Do forward, backward and parameter update.""" + self._set_seed() + self.model.train() + self.criterion.train() + self.zero_grad() + + metrics.log_start_time("train_wall", priority=800, round=0) + + # forward and backward pass + logging_outputs, sample_size, ooms = [], 0, 0 + for i, sample in enumerate(samples): # delayed update loop + sample, is_dummy_batch = self._prepare_sample(sample) + + def maybe_no_sync(): + """ + Whenever *samples* contains more than one mini-batch, we + want to accumulate gradients locally and only call + all-reduce in the last backwards pass. + """ + if ( + self.data_parallel_world_size > 1 + and hasattr(self.model, "no_sync") + and i < len(samples) - 1 + ): + return self.model.no_sync() + else: + return contextlib.ExitStack() # dummy contextmanager + + try: + with maybe_no_sync(): + # forward and backward + loss, sample_size_i, logging_output = self.task.train_step( + sample=sample, + model=self.model, + criterion=self.criterion, + optimizer=self.optimizer, + update_num=self.get_num_updates(), + ignore_grad=is_dummy_batch, + ) + del loss + + logging_outputs.append(logging_output) + sample_size += sample_size_i + + # emptying the CUDA cache after the first step can + # reduce the chance of OOM + if self.cuda and self.get_num_updates() == 0: + torch.cuda.empty_cache() + except RuntimeError as e: + if "out of memory" in str(e): + self._log_oom(e) + if raise_oom: + raise e + logger.warning( + "attempting to recover from OOM in forward/backward pass" + ) + ooms += 1 + self.zero_grad() + if self.cuda: + torch.cuda.empty_cache() + if self.cfg.distributed_training.distributed_world_size == 1: + return None + else: + raise e + + if self.tpu and i < len(samples) - 1: + # tpu-comment: every XLA operation before marking step is + # appended to the IR graph, and processing too many batches + # before marking step can lead to OOM errors. 
+ # To handle gradient accumulation use case, we explicitly + # mark step here for every forward pass without a backward pass + self._xla_markstep_and_send_to_cpu() + + if is_dummy_batch: + if torch.is_tensor(sample_size): + sample_size.zero_() + else: + sample_size *= 0.0 + + if torch.is_tensor(sample_size): + sample_size = sample_size.float() + else: + sample_size = float(sample_size) + + # gather logging outputs from all replicas + if self._sync_stats(): + train_time = self._local_cumulative_training_time() + logging_outputs, ( + sample_size, + ooms, + total_train_time, + ) = self._aggregate_logging_outputs( + logging_outputs, sample_size, ooms, train_time, ignore=is_dummy_batch + ) + self._cumulative_training_time = ( + total_train_time / self.data_parallel_world_size + ) + + overflow = False + try: + with torch.autograd.profiler.record_function("reduce-grads"): + # reduce gradients across workers + self.optimizer.all_reduce_grads(self.model) + if utils.has_parameters(self.criterion): + self.optimizer.all_reduce_grads(self.criterion) + + with torch.autograd.profiler.record_function("multiply-grads"): + # multiply gradients by (data_parallel_size / sample_size) since + # DDP normalizes by the number of data parallel workers for + # improved fp16 precision. + # Thus we get (sum_of_gradients / sample_size) at the end. + # In case of fp16, this step also undoes loss scaling. + # (Debugging note: Some optimizers perform this scaling on the + # fly, so inspecting model.parameters() or optimizer.params may + # still show the original, unscaled gradients.) + numer = ( + self.data_parallel_world_size + if not self.cfg.optimization.use_bmuf or self._sync_stats() + else 1 + ) + self.optimizer.multiply_grads(numer / (sample_size or 1.0)) + # Note: (sample_size or 1.0) handles the case of a zero gradient, in a + # way that avoids CPU/device transfers in case sample_size is a GPU or + # TPU object. The assumption is that the gradient itself is also 0. 
+ + with torch.autograd.profiler.record_function("clip-grads"): + # clip grads + grad_norm = self.clip_grad_norm(self.cfg.optimization.clip_norm) + + # check that grad norms are consistent across workers + # on tpu check tensor is slow + if not self.tpu: + if ( + not self.cfg.optimization.use_bmuf + and self.cfg.distributed_training.ddp_backend != "slow_mo" + ): + self._check_grad_norms(grad_norm) + if not torch.isfinite(grad_norm).all(): + # in case of AMP, if gradients are Nan/Inf then + # optimizer step is still required + if self.cfg.common.amp: + overflow = True + else: + # check local gradnorm single GPU case, trigger NanDetector + raise FloatingPointError("gradients are Nan/Inf") + + with torch.autograd.profiler.record_function("optimizer"): + # take an optimization step + self.task.optimizer_step( + self.optimizer, model=self.model, update_num=self.get_num_updates() + ) + if self.cfg.common.amp and overflow: + if self._amp_retries == self.cfg.common.amp_batch_retries: + logger.info("AMP: skipping this batch.") + self._amp_retries = 0 + else: + self._amp_retries += 1 + return self.train_step(samples, raise_oom) # recursion to feed in same batch + + except FloatingPointError: + # re-run the forward and backward pass with hooks attached to print + # out where it fails + self.zero_grad() + with NanDetector(self.get_model()): + for _, sample in enumerate(samples): + sample, _ = self._prepare_sample(sample) + self.task.train_step( + sample, + self.model, + self.criterion, + self.optimizer, + self.get_num_updates(), + ignore_grad=False, + ) + raise + except OverflowError as e: + overflow = True + logger.info( + f"NOTE: gradient overflow detected, ignoring gradient, {str(e)}" + ) + grad_norm = torch.tensor(0.0).cuda() + self.zero_grad() + except RuntimeError as e: + if "out of memory" in str(e): + self._log_oom(e) + logger.error("OOM during optimization, irrecoverable") + raise e + + # Some distributed wrappers (e.g., SlowMo) need access to the optimizer + # after the step + if hasattr(self.model, "perform_additional_optimizer_actions"): + if hasattr(self.optimizer, "fp32_params"): + self.model.perform_additional_optimizer_actions( + self.optimizer.optimizer, self.optimizer.fp32_params + ) + else: + self.model.perform_additional_optimizer_actions( + self.optimizer.optimizer + ) + + logging_output = None + if not overflow or self.cfg.distributed_training.ddp_backend == "slow_mo": + self.set_num_updates(self.get_num_updates() + 1) + + if self.tpu: + import torch_xla.core.xla_model as xm + + # mark step on TPUs + self._xla_markstep_and_send_to_cpu() + + # only log stats every log_interval steps + # this causes wps to be misreported when log_interval > 1 + logging_output = {} + if self.get_num_updates() % self.cfg.common.log_interval == 0: + # log memory usage + mem_info = xm.get_memory_info(self.device) + gb_free = mem_info["kb_free"] / 1024 / 1024 + gb_total = mem_info["kb_total"] / 1024 / 1024 + metrics.log_scalar( + "gb_free", gb_free, priority=1500, round=1, weight=0 + ) + metrics.log_scalar( + "gb_total", gb_total, priority=1600, round=1, weight=0 + ) + logging_outputs = self._xla_markstep_and_send_to_cpu( + logging_outputs + ) + logging_output = self._reduce_and_log_stats( + logging_outputs, sample_size, grad_norm + ) + + # log whenever there's an XLA compilation, since these + # slow down training and may indicate opportunities for + # optimization + self._check_xla_compilation() + else: + if self.cuda and self.cuda_env is not None: + # log minimum free memory over the iteration + 
gb_used = torch.cuda.max_memory_allocated() / 1024 / 1024 / 1024 + torch.cuda.reset_peak_memory_stats() + gb_free = self.cuda_env.total_memory_in_GB - gb_used + metrics.log_scalar( + "gb_free", gb_free, priority=1500, round=1, weight=0 + ) + + # log stats + logging_output = self._reduce_and_log_stats( + logging_outputs, sample_size, grad_norm + ) + + # clear CUDA cache to reduce memory fragmentation + if ( + self.cuda + and self.cfg.common.empty_cache_freq > 0 + and ( + (self.get_num_updates() + self.cfg.common.empty_cache_freq - 1) + % self.cfg.common.empty_cache_freq + ) + == 0 + ): + torch.cuda.empty_cache() + + if self.cfg.common.fp16 or self.cfg.common.amp: + metrics.log_scalar( + "loss_scale", + ( + self.optimizer.scaler.loss_scale + if self.cfg.common.fp16 + else self.optimizer.scaler.get_scale() + ), + priority=700, + round=4, + weight=0, + ) + + metrics.log_stop_time("train_wall") + return logging_output + + @metrics.aggregate("valid") + def valid_step(self, sample, raise_oom=False): + """Do forward pass in evaluation mode.""" + if self.tpu: + import torch_xla.core.xla_model as xm + + xm.rendezvous("valid_step") # wait for all workers + + with torch.no_grad(): + self.model.eval() + self.criterion.eval() + + sample, is_dummy_batch = self._prepare_sample(sample) + + try: + _loss, sample_size, logging_output = self.task.valid_step( + sample, self.model, self.criterion + ) + except RuntimeError as e: + if "out of memory" in str(e): + self._log_oom(e) + if not raise_oom: + logger.warning( + "ran out of memory in validation step, retrying batch" + ) + for p in self.model.parameters(): + if p.grad is not None: + p.grad = None # free some memory + if self.cuda: + torch.cuda.empty_cache() + return self.valid_step(sample, raise_oom=True) + raise e + + logging_outputs = [logging_output] + if is_dummy_batch: + if torch.is_tensor(sample_size): + sample_size.zero_() + else: + sample_size *= 0.0 + + # gather logging outputs from all replicas + if self.data_parallel_world_size > 1: + logging_outputs, (sample_size,) = self._aggregate_logging_outputs( + logging_outputs, + sample_size, + ignore=is_dummy_batch, + ) + + # log validation stats + if self.tpu: + logging_outputs = self._xla_markstep_and_send_to_cpu(logging_outputs) + logging_output = self._reduce_and_log_stats(logging_outputs, sample_size) + + return logging_output + + def zero_grad(self): + self.optimizer.zero_grad() + + def lr_step_begin_epoch(self, epoch): + """Adjust the learning rate at the beginning of the epoch.""" + self.lr_scheduler.step_begin_epoch(epoch) + # prefer updating the LR based on the number of steps + return self.lr_step_update() + + def lr_step(self, epoch, val_loss=None): + """Adjust the learning rate at the end of the epoch.""" + self.lr_scheduler.step(epoch, val_loss) + # prefer updating the LR based on the number of steps + return self.lr_step_update() + + def lr_step_update(self): + """Update the learning rate after each update.""" + new_lr = self.lr_scheduler.step_update(self.get_num_updates()) + if isinstance(new_lr, dict): + for k, v in new_lr.items(): + metrics.log_scalar(f"lr_{k}", v, weight=0, priority=300) + new_lr = new_lr.get("default", next(iter(new_lr.values()))) + else: + metrics.log_scalar("lr", new_lr, weight=0, priority=300) + return new_lr + + def get_lr(self): + """Get the current learning rate.""" + return self.optimizer.get_lr() + + def get_model(self): + """Get the (non-wrapped) model instance.""" + return self._model + + def get_criterion(self): + """Get the (non-wrapped) criterion 
instance.""" + return self._criterion + + def get_meter(self, name): + """[deprecated] Get a specific meter by name.""" + from fairseq import meters + + if "get_meter" not in self._warn_once: + self._warn_once.add("get_meter") + utils.deprecation_warning( + "Trainer.get_meter is deprecated. Please use fairseq.metrics instead." + ) + + train_meters = metrics.get_meters("train") + if train_meters is None: + train_meters = {} + + if name == "train_loss" and "loss" in train_meters: + return train_meters["loss"] + elif name == "train_nll_loss": + # support for legacy train.py, which assumed this meter is + # always initialized + m = train_meters.get("nll_loss", None) + return m or meters.AverageMeter() + elif name == "wall": + # support for legacy train.py, which assumed this meter is + # always initialized + m = metrics.get_meter("default", "wall") + return m or meters.TimeMeter() + elif name == "wps": + m = metrics.get_meter("train", "wps") + return m or meters.TimeMeter() + elif name in {"valid_loss", "valid_nll_loss"}: + # support for legacy train.py, which assumed these meters + # are always initialized + k = name[len("valid_") :] + m = metrics.get_meter("valid", k) + return m or meters.AverageMeter() + elif name == "oom": + return meters.AverageMeter() + elif name in train_meters: + return train_meters[name] + return None + + def get_num_updates(self): + """Get the number of parameters updates.""" + return self._num_updates + + def set_num_updates(self, num_updates): + """Set the number of parameters updates.""" + self._num_updates = num_updates + self.lr_step_update() + if self.quantizer: + self.quantizer.step_update(self._num_updates) + metrics.log_scalar("num_updates", self._num_updates, weight=0, priority=200) + + def clip_grad_norm(self, clip_norm): + def agg_norm_fn(total_norm): + total_norm = total_norm.cuda().float() ** 2 + total_norm = distributed_utils.all_reduce( + total_norm, group=self.data_parallel_process_group + ) + return total_norm ** 0.5 + + should_agg_norm = ( + self.cfg.distributed_training.ddp_backend == "fully_sharded" + and ( + self.data_parallel_process_group is not None + or torch.distributed.is_initialized() + ) + ) + return self.optimizer.clip_grad_norm( + clip_norm, aggregate_norm_fn=agg_norm_fn if should_agg_norm else None + ) + + def cumulative_training_time(self): + if self._cumulative_training_time is None: + # single GPU + return self._local_cumulative_training_time() + else: + return self._cumulative_training_time + + def _local_cumulative_training_time(self): + """Aggregate training time in seconds.""" + return time.time() - self._start_time + self._previous_training_time + + def _fp_convert_sample(self, sample): + def apply_half(t): + if t.dtype is torch.float32: + return t.to(dtype=torch.half) + return t + + def apply_bfloat16(t): + if t.dtype is torch.float32: + return t.to(dtype=torch.bfloat16) + return t + + if self.cfg.common.fp16: + sample = utils.apply_to_sample(apply_half, sample) + + if self.cfg.common.bf16: + sample = utils.apply_to_sample(apply_bfloat16, sample) + + return sample + + def _prepare_sample(self, sample, is_dummy=False): + if sample == "DUMMY": + raise Exception( + "Trying to use an uninitialized 'dummy' batch. This usually indicates " + "that the total number of batches is smaller than the number of " + "participating GPUs. Try reducing the batch size or using fewer GPUs." 
+ ) + + if sample is None or len(sample) == 0: + assert ( + self._dummy_batch is not None and len(self._dummy_batch) > 0 + ), "Invalid dummy batch: {}".format(self._dummy_batch) + sample, _ = self._prepare_sample(self._dummy_batch, is_dummy=True) + return sample, True + + # Given that PCIe/NVLink bandwidth is significantly smaller than DRAM bandwidth + # it makes sense to do the format conversion on the CPU and then transfer + # a smaller buffer to the device. This also saves GPU memory capacity. + + if self.cfg.common.on_cpu_convert_precision: + sample = self._fp_convert_sample(sample) + + if self.cuda: + if self.pipeline_model_parallel: + if 'target' in sample: + sample['target'] = utils.move_to_cuda(sample['target'], device=self.last_device) + else: + sample = utils.move_to_cuda(sample) + elif self.tpu and is_dummy: + # the dummy batch may not be on the appropriate device + sample = utils.move_to_cuda(sample, device=self.device) + + if not self.cfg.common.on_cpu_convert_precision: + sample = self._fp_convert_sample(sample) + + if self._dummy_batch == "DUMMY": + self._dummy_batch = sample + + return sample, False + + def _set_seed(self): + # Set seed based on args.seed and the update number so that we get + # reproducible results when resuming from checkpoints + seed = self.cfg.common.seed + self.get_num_updates() + utils.set_torch_seed(seed) + + def _sync_stats(self): + # Return True if it's using multiple GPUs and DDP or multiple GPUs with + # BMUF and it's a bmuf sync with warmup iterations completed before. + if self.data_parallel_world_size == 1: + return False + elif self.cfg.optimization.use_bmuf: + return ( + self.get_num_updates() + 1 + ) % self.cfg.bmuf.global_sync_iter == 0 and ( + self.get_num_updates() + 1 + ) > self.cfg.bmuf.warmup_iterations + else: + return True + + def _log_oom(self, exc): + msg = "OOM: Ran out of memory with exception: {}".format(exc) + logger.warning(msg) + if torch.cuda.is_available() and hasattr(torch.cuda, "memory_summary"): + for device_idx in range(torch.cuda.device_count()): + logger.warning(torch.cuda.memory_summary(device=device_idx)) + sys.stderr.flush() + + def _aggregate_logging_outputs( + self, + logging_outputs: List[Dict[str, Any]], + *extra_stats_to_sum, + ignore=False, + ): + if self.task.__class__.logging_outputs_can_be_summed(self.get_criterion()): + return self._fast_stat_sync_sum( + logging_outputs, *extra_stats_to_sum, ignore=ignore + ) + else: + return self._all_gather_list_sync( + logging_outputs, *extra_stats_to_sum, ignore=ignore + ) + + def _all_gather_list_sync( + self, + logging_outputs: List[Dict[str, Any]], + *extra_stats_to_sum, + ignore=False, + ): + """ + Sync logging outputs across workers. all_gather_list_sync is + suitable when logging outputs are complex types. + """ + if self.tpu: + raise NotImplementedError + if ignore: + logging_outputs = [] + results = list( + zip( + *distributed_utils.all_gather_list( + [logging_outputs] + list(extra_stats_to_sum), + max_size=getattr(self.cfg.common, "all_gather_list_size", 16384), + group=self.data_parallel_process_group, + ) + ) + ) + logging_outputs, extra_stats_to_sum = results[0], results[1:] + logging_outputs = list(chain.from_iterable(logging_outputs)) + extra_stats_to_sum = [sum(s) for s in extra_stats_to_sum] + return logging_outputs, extra_stats_to_sum + + def _fast_stat_sync_sum( + self, + logging_outputs: List[Dict[str, Any]], + *extra_stats_to_sum, + ignore=False, + ): + """ + Sync logging outputs across workers. 
fast_stat_sync_sum is + faster than all_gather_list_sync, but is only suitable when + logging outputs are scalars and can be summed. Note that + *logging_outputs* cannot contain any nested dicts/lists. + """ + data = {} + for i, stat in enumerate(extra_stats_to_sum): + data["extra_stats_" + str(i)] = stat + if len(logging_outputs) > 0: + log_keys = list(logging_outputs[0].keys()) + for k in log_keys: + if not ignore: + v = sum(log[k] for log in logging_outputs if k in log) + else: + v = logging_outputs[0][k] + v = torch.zeros_like(v) if torch.is_tensor(v) else 0 + data["logging_outputs_" + k] = v + else: + log_keys = None + + data = distributed_utils.all_reduce_dict( + data, device=self.device, group=self.data_parallel_process_group + ) + + extra_stats_to_sum = [ + data["extra_stats_" + str(i)] for i in range(len(extra_stats_to_sum)) + ] + if log_keys is not None: + logging_outputs = [{k: data["logging_outputs_" + k] for k in log_keys}] + else: + logging_outputs = [] + return logging_outputs, extra_stats_to_sum + + def _check_grad_norms(self, grad_norm): + """Check that grad norms are consistent across workers.""" + if self._grad_norm_buf is not None: + self._grad_norm_buf.zero_() + self._grad_norm_buf[self.data_parallel_rank] = grad_norm + distributed_utils.all_reduce( + self._grad_norm_buf, group=self.data_parallel_process_group + ) + + def is_consistent(tensor): + max_abs_diff = torch.max(torch.abs(tensor - tensor[0])) + return ( + (torch.isfinite(tensor).all() + and (max_abs_diff / (tensor[0] + 1e-6) < 1e-6).all()) + or + (self.cfg.common.amp and not torch.isfinite(tensor).all()) + # in case of amp non-finite grads are fine + ) + + if not is_consistent(self._grad_norm_buf): + pretty_detail = "\n".join( + "rank {:3d} = {:.8f}".format(r, n) + for r, n in enumerate(self._grad_norm_buf.tolist()) + ) + error_detail = "grad_norm across the workers:\n{}\n".format( + pretty_detail + ) + # use FloatingPointError to trigger NanDetector + raise FloatingPointError( + "Fatal error: gradients are inconsistent between workers. " + "Try --ddp-backend=legacy_ddp. " + "Or are you mixing up different generation of GPUs in training?" 
+ + "\n" + + "-" * 80 + + "\n{}\n".format(error_detail) + + "-" * 80 + ) + + def _reduce_and_log_stats(self, logging_outputs, sample_size, grad_norm=None): + if grad_norm is not None and ( + not torch.is_tensor(grad_norm) or torch.isfinite(grad_norm) + ): + metrics.log_speed("ups", 1.0, priority=100, round=2) + metrics.log_scalar("gnorm", grad_norm, priority=400, round=3) + if self.cfg.optimization.clip_norm > 0: + metrics.log_scalar( + "clip", + torch.where( + grad_norm > self.cfg.optimization.clip_norm, + grad_norm.new_tensor(100), + grad_norm.new_tensor(0), + ), + priority=500, + round=1, + ) + + with metrics.aggregate() as agg: + if logging_outputs is not None: + self.task.reduce_metrics(logging_outputs, self.get_criterion()) + del logging_outputs + + # extra warning for criterions that don't properly log a loss value + if "loss" not in agg: + if "loss" not in self._warn_once: + self._warn_once.add("loss") + logger.warning( + "Criterion.reduce_metrics did not log a 'loss' value, " + "which may break some functionality" + ) + metrics.log_scalar("loss", -1) + + # support legacy interface + if self.tpu: + logging_output = {} + else: + logging_output = agg.get_smoothed_values() + logging_output["sample_size"] = sample_size + for key_to_delete in ["ppl", "wps", "wpb", "bsz"]: + if key_to_delete in logging_output: + del logging_output[key_to_delete] + return logging_output + + def _check_xla_compilation(self): + import torch_xla.debug.metrics as met + + compile_stats = met.metric_data("CompileTime") + if compile_stats is None: + return + num_xla_compiles = compile_stats[0] + if num_xla_compiles > self._num_xla_compiles: + logger.warning( + "XLA compilation detected on device #{}; too many of these can lead " + "to slow training, but we expect a few in the beginning".format( + self.cfg.distributed_training.distributed_rank + ) + ) + self._num_xla_compiles = num_xla_compiles + + def _xla_markstep_and_send_to_cpu(self, data=None): + import torch_xla.core.xla_model as xm + + xm.mark_step() + if data is not None: + from fairseq.utils import xla_device_to_cpu + + return xla_device_to_cpu(data) + + +def _catalog_shared_params(module, memo=None, prefix=""): + if memo is None: + first_call = True + memo = {} + else: + first_call = False + for name, param in module._parameters.items(): + param_prefix = prefix + ("." if prefix else "") + name + if param not in memo: + memo[param] = [] + memo[param].append(param_prefix) + for name, m in module._modules.items(): + if m is None: + continue + submodule_prefix = prefix + ("." if prefix else "") + name + _catalog_shared_params(m, memo, submodule_prefix) + if first_call: + return [x for x in memo.values() if len(x) > 1] + + +def _get_module_by_path(module, path): + path = path.split(".") + for name in path: + module = getattr(module, name) + return module + + +def _set_module_by_path(module, path, value): + path = path.split(".") + for name in path[:-1]: + module = getattr(module, name) + setattr(module, path[-1], value) diff --git a/SpeechT5/fairseq/fairseq/utils.py b/SpeechT5/fairseq/fairseq/utils.py new file mode 100644 index 0000000000000000000000000000000000000000..4fe95b9e8b2b277cd545e12d5980561492b70783 --- /dev/null +++ b/SpeechT5/fairseq/fairseq/utils.py @@ -0,0 +1,807 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. 
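trainer.py ends with three module-level helpers that `Trainer.__init__` uses to re-tie shared parameters after the model has been cast and moved to its device (the "detected shared parameter" log above). A small sketch, not part of the patch, of what they do; the tied-embedding module is hypothetical and the import assumes the upstream `fairseq.trainer` module path:

```python
import torch.nn as nn
# Assumed import path; these are private helpers defined at the bottom of trainer.py.
from fairseq.trainer import (
    _catalog_shared_params,
    _get_module_by_path,
    _set_module_by_path,
)

class TiedModel(nn.Module):
    """Hypothetical model whose output projection shares the embedding weight."""
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(10, 4)
        self.out = nn.Linear(4, 10, bias=False)
        self.out.weight = self.embed.weight  # weight tying

model = TiedModel()
shared = _catalog_shared_params(model)
print(shared)  # [['embed.weight', 'out.weight']]

# After the model has been cast/moved (e.g. to fp16 for --fp16),
# Trainer.__init__ restores the tie the same way:
model = model.half()
for group in shared:
    ref = _get_module_by_path(model, group[0])
    for path in group[1:]:
        _set_module_by_path(model, path, ref)
```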
+ +import argparse +import contextlib +import copy +import importlib +import logging +import os +import sys +import warnings +from itertools import accumulate +from typing import Callable, Dict, List, Optional + +import torch +import torch.nn.functional as F +from fairseq.modules.multihead_attention import MultiheadAttention +from torch import Tensor + + +try: + from amp_C import multi_tensor_l2norm + + multi_tensor_l2norm_available = True +except ImportError: + multi_tensor_l2norm_available = False + +try: + import torch_xla.core.xla_model as xm +except ImportError: + xm = None + + +logger = logging.getLogger(__name__) + + +MANIFOLD_PATH_SEP = "|" + + +class FileContentsAction(argparse.Action): + def __init__(self, option_strings, dest, nargs=None, **kwargs): + if nargs is not None: + raise ValueError("nargs not allowed") + super(FileContentsAction, self).__init__(option_strings, dest, **kwargs) + + def __call__(self, parser, namespace, values, option_string=None): + from fairseq.file_io import PathManager + + if PathManager.isfile(values): + with PathManager.open(values) as f: + argument = f.read().strip() + else: + argument = values + setattr(namespace, self.dest, argument) + + +def split_paths(paths: str, separator=os.pathsep) -> List[str]: + return ( + paths.split(separator) if "://" not in paths else paths.split(MANIFOLD_PATH_SEP) + ) + + +def load_ensemble_for_inference(filenames, task, model_arg_overrides=None): + from fairseq import checkpoint_utils + + deprecation_warning( + "utils.load_ensemble_for_inference is deprecated. " + "Please use checkpoint_utils.load_model_ensemble instead." + ) + return checkpoint_utils.load_model_ensemble( + filenames, arg_overrides=model_arg_overrides, task=task + ) + + +def apply_to_sample(f, sample): + if hasattr(sample, "__len__") and len(sample) == 0: + return {} + + def _apply(x): + if torch.is_tensor(x): + return f(x) + elif isinstance(x, dict): + return {key: _apply(value) for key, value in x.items()} + elif isinstance(x, list): + return [_apply(x) for x in x] + elif isinstance(x, tuple): + return tuple(_apply(x) for x in x) + elif isinstance(x, set): + return {_apply(x) for x in x} + else: + return x + + return _apply(sample) + + +def move_to_cuda(sample, device=None): + device = device or torch.cuda.current_device() + + def _move_to_cuda(tensor): + # non_blocking is ignored if tensor is not pinned, so we can always set + # to True (see github.com/PyTorchLightning/pytorch-lightning/issues/620) + return tensor.to(device=device, non_blocking=True) + + return apply_to_sample(_move_to_cuda, sample) + + +def move_to_cpu(sample): + def _move_to_cpu(tensor): + # PyTorch has poor support for half tensors (float16) on CPU. + # Move any such tensors to float32. 
+ if tensor.dtype in {torch.bfloat16, torch.float16}: + tensor = tensor.to(dtype=torch.float32) + return tensor.cpu() + + return apply_to_sample(_move_to_cpu, sample) + + +def move_to_tpu(sample): + + import torch_xla.core.xla_model as xm + + device = xm.xla_device() + + def _move_to_tpu(tensor): + return tensor.to(device) + + return apply_to_sample(_move_to_tpu, sample) + + +def get_incremental_state( + module: MultiheadAttention, + incremental_state: Optional[Dict[str, Dict[str, Optional[Tensor]]]], + key: str, +) -> Optional[Dict[str, Optional[Tensor]]]: + """Helper for getting incremental state for an nn.Module.""" + return module.get_incremental_state(incremental_state, key) + + +def set_incremental_state( + module: MultiheadAttention, + incremental_state: Optional[Dict[str, Dict[str, Optional[Tensor]]]], + key: str, + value: Dict[str, Optional[Tensor]], +) -> Optional[Dict[str, Dict[str, Optional[Tensor]]]]: + """Helper for setting incremental state for an nn.Module.""" + if incremental_state is not None: + result = module.set_incremental_state(incremental_state, key, value) + if result is not None: + incremental_state = result + return incremental_state + + +def load_align_dict(replace_unk): + if replace_unk is None: + align_dict = None + elif isinstance(replace_unk, str) and len(replace_unk) > 0: + # Load alignment dictionary for unknown word replacement if it was passed as an argument. + align_dict = {} + with open(replace_unk, "r") as f: + for line in f: + cols = line.split() + align_dict[cols[0]] = cols[1] + else: + # No alignment dictionary provided but we still want to perform unknown word replacement by copying the + # original source word. + align_dict = {} + return align_dict + + +def print_embed_overlap(embed_dict, vocab_dict): + embed_keys = set(embed_dict.keys()) + vocab_keys = set(vocab_dict.symbols) + overlap = len(embed_keys & vocab_keys) + logger.info("found {}/{} types in embedding file".format(overlap, len(vocab_dict))) + + +def parse_embedding(embed_path): + """Parse embedding text file into a dictionary of word and embedding tensors. + + The first line can have vocabulary size and dimension. The following lines + should contain word and embedding separated by spaces. + + Example: + 2 5 + the -0.0230 -0.0264 0.0287 0.0171 0.1403 + at -0.0395 -0.1286 0.0275 0.0254 -0.0932 + """ + embed_dict = {} + with open(embed_path) as f_embed: + next(f_embed) # skip header + for line in f_embed: + pieces = line.rstrip().split(" ") + embed_dict[pieces[0]] = torch.Tensor( + [float(weight) for weight in pieces[1:]] + ) + return embed_dict + + +def load_embedding(embed_dict, vocab, embedding): + for idx in range(len(vocab)): + token = vocab[idx] + if token in embed_dict: + embedding.weight.data[idx] = embed_dict[token] + return embedding + + +def replace_unk(hypo_str, src_str, alignment, align_dict, unk): + from fairseq import tokenizer + + # Tokens are strings here + hypo_tokens = tokenizer.tokenize_line(hypo_str) + # TODO: Very rare cases where the replacement is '<eos>' should be handled gracefully + src_tokens = tokenizer.tokenize_line(src_str) + ["<eos>"] + for i, ht in enumerate(hypo_tokens): + if ht == unk: + src_token = src_tokens[alignment[i]] + # Either take the corresponding value in the aligned dictionary or just copy the original value. 
+ hypo_tokens[i] = align_dict.get(src_token, src_token) + return " ".join(hypo_tokens) + + +def post_process_prediction( + hypo_tokens, + src_str, + alignment, + align_dict, + tgt_dict, + remove_bpe=None, + extra_symbols_to_ignore=None, +): + hypo_str = tgt_dict.string( + hypo_tokens, remove_bpe, extra_symbols_to_ignore=extra_symbols_to_ignore + ) + if align_dict is not None: + hypo_str = replace_unk( + hypo_str, src_str, alignment, align_dict, tgt_dict.unk_string() + ) + if align_dict is not None or remove_bpe is not None: + # Convert back to tokens for evaluating with unk replacement or without BPE + # Note that the dictionary can be modified inside the method. + hypo_tokens = tgt_dict.encode_line(hypo_str, add_if_not_exist=True) + return hypo_tokens, hypo_str, alignment + + +def make_positions(tensor, padding_idx: int, onnx_trace: bool = False): + """Replace non-padding symbols with their position numbers. + + Position numbers begin at padding_idx+1. Padding symbols are ignored. + """ + # The series of casts and type-conversions here are carefully + # balanced to both work with ONNX export and XLA. In particular XLA + # prefers ints, cumsum defaults to output longs, and ONNX doesn't know + # how to handle the dtype kwarg in cumsum. + mask = tensor.ne(padding_idx).int() + return (torch.cumsum(mask, dim=1).type_as(mask) * mask).long() + padding_idx + + +def strip_pad(tensor, pad): + return tensor[tensor.ne(pad)] + + +def buffered_arange(max): + if not hasattr(buffered_arange, "buf"): + buffered_arange.buf = torch.LongTensor() + if max > buffered_arange.buf.numel(): + buffered_arange.buf.resize_(max) + torch.arange(max, out=buffered_arange.buf) + return buffered_arange.buf[:max] + + +def convert_padding_direction( + src_tokens, padding_idx, right_to_left: bool = False, left_to_right: bool = False +): + assert right_to_left ^ left_to_right + pad_mask = src_tokens.eq(padding_idx) + if not pad_mask.any(): + # no padding, return early + return src_tokens + if left_to_right and not pad_mask[:, 0].any(): + # already right padded + return src_tokens + if right_to_left and not pad_mask[:, -1].any(): + # already left padded + return src_tokens + max_len = src_tokens.size(1) + buffered = torch.empty(0).long() + if max_len > 0: + torch.arange(max_len, out=buffered) + range = buffered.type_as(src_tokens).expand_as(src_tokens) + num_pads = pad_mask.long().sum(dim=1, keepdim=True) + if right_to_left: + index = torch.remainder(range - num_pads, max_len) + else: + index = torch.remainder(range + num_pads, max_len) + return src_tokens.gather(1, index) + + +def item(tensor): + # tpu-comment: making this a no-op for xla devices. 
+ if torch.is_tensor(tensor) and tensor.device.type == "xla": + return tensor.detach() + if hasattr(tensor, "item"): + return tensor.item() + if hasattr(tensor, "__getitem__"): + return tensor[0] + return tensor + + +def multi_tensor_total_norm(grads, chunk_size=2048 * 32) -> torch.Tensor: + per_device_grads = {} + norms = [] + for grad in grads: + device = grad.device + cur_device_grads = per_device_grads.get(device) + if cur_device_grads is None: + cur_device_grads = [] + per_device_grads[device] = cur_device_grads + cur_device_grads.append(grad) + for device in per_device_grads.keys(): + cur_device_grads = per_device_grads[device] + if device.type == "cuda": + # TODO(msb) return has_inf + has_inf = torch.zeros((1, 1), dtype=torch.int, device=device) + with torch.cuda.device(device): + norm = multi_tensor_l2norm( + chunk_size, has_inf, [cur_device_grads], False + ) + norms.append(norm[0].to(torch.cuda.current_device())) + else: + norms += [torch.norm(g, p=2, dtype=torch.float32) for g in cur_device_grads] + total_norm = torch.norm(torch.stack(norms)) + return total_norm + + +@torch.no_grad() +def clip_grad_norm_(params, max_norm, aggregate_norm_fn=None) -> torch.Tensor: + def grad_exists(p): + return p is not None and getattr(p, "grad", None) is not None + + if isinstance(params, torch.Tensor): + params = [params] + params = list(params) + grads = [ + p.grad.detach() for p in params if grad_exists(p) and not hasattr(p, "expert") + ] + expert_grads = [ + p.grad.detach() for p in params if grad_exists(p) and hasattr(p, "expert") + ] + + if len(grads) == 0: + if len(params) > 0: + return params[0].new_tensor(0.0) + else: + return torch.tensor(0.0) + + if len(grads) == 1: + total_norm = torch.norm(grads[0], p=2, dtype=torch.float32) + else: + if multi_tensor_l2norm_available: + total_norm = multi_tensor_total_norm(grads) + else: + if torch.cuda.is_available(): + warnings.warn( + "amp_C fused kernels unavailable, disabling multi_tensor_l2norm; " + "you may get better performance by installing NVIDIA's apex library" + ) + device = torch.cuda.current_device() + elif grads[0].device.type == "xla": + device = grads[0].device + else: + device = torch.device("cpu") + total_norm = torch.norm( + torch.stack( + [torch.norm(g, p=2, dtype=torch.float32).to(device) for g in grads] + ) + ) + + if aggregate_norm_fn is not None: + total_norm = aggregate_norm_fn(total_norm) + + if max_norm > 0: + max_norm = float(max_norm) + clip_coef = (max_norm / (total_norm + 1e-6)).clamp_(max=1) + for g in grads + expert_grads: + g.mul_(clip_coef) + return total_norm + + +def fill_with_neg_inf(t): + """FP16-compatible function that fills a tensor with -inf.""" + return t.float().fill_(float("-inf")).type_as(t) + + +def _match_types(arg1, arg2): + """Convert the numerical argument to the same type as the other argument""" + + def upgrade(arg_number, arg_structure): + if isinstance(arg_structure, tuple): + return tuple([arg_number] * len(arg_structure)) + elif isinstance(arg_structure, dict): + arg = copy.deepcopy(arg_structure) + for k in arg: + arg[k] = upgrade(arg_number, arg_structure[k]) + return arg + else: + return arg_number + + if isinstance(arg1, float) or isinstance(arg1, int): + return upgrade(arg1, arg2), arg2 + elif isinstance(arg2, float) or isinstance(arg2, int): + return arg1, upgrade(arg2, arg1) + + return arg1, arg2 + + +def resolve_max_positions(*args): + """Resolve max position constraints from multiple sources.""" + + def map_value_update(d1, d2): + updated_value = copy.deepcopy(d1) + for key in d2: + 
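+            # Editor's note (illustrative): the merge keeps the per-key minimum, e.g.
+            # map_value_update({"src": 1024}, {"src": 512, "tgt": 1024})
+            # yields {"src": 512, "tgt": 1024}.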
if key not in updated_value: + updated_value[key] = d2[key] + else: + updated_value[key] = min(d1[key], d2[key]) + return updated_value + + def nullsafe_min(l): + minim = None + for item in l: + if minim is None: + minim = item + elif item is not None and item < minim: + minim = item + return minim + + max_positions = None + for arg in args: + if max_positions is None: + max_positions = arg + elif arg is not None: + max_positions, arg = _match_types(max_positions, arg) + if isinstance(arg, float) or isinstance(arg, int): + max_positions = min(max_positions, arg) + elif isinstance(arg, dict): + max_positions = map_value_update(max_positions, arg) + else: + max_positions = tuple(map(nullsafe_min, zip(max_positions, arg))) + + return max_positions + + +def import_user_module(args): + module_path = getattr(args, "user_dir", None) + if module_path is not None: + module_path = os.path.abspath(args.user_dir) + if not os.path.exists(module_path) and not os.path.isfile( + os.path.dirname(module_path) + ): + fairseq_rel_path = os.path.join(os.path.dirname(__file__), args.user_dir) + if os.path.exists(fairseq_rel_path): + module_path = fairseq_rel_path + else: + fairseq_rel_path = os.path.join( + os.path.dirname(__file__), "..", args.user_dir + ) + if os.path.exists(fairseq_rel_path): + module_path = fairseq_rel_path + else: + raise FileNotFoundError(module_path) + + # ensure that user modules are only imported once + import_user_module.memo = getattr(import_user_module, "memo", set()) + if module_path not in import_user_module.memo: + import_user_module.memo.add(module_path) + + module_parent, module_name = os.path.split(module_path) + if module_name not in sys.modules: + sys.path.insert(0, module_parent) + importlib.import_module(module_name) + + tasks_path = os.path.join(module_path, "tasks") + if os.path.exists(tasks_path): + from fairseq.tasks import import_tasks + + import_tasks(tasks_path, f"{module_name}.tasks") + + models_path = os.path.join(module_path, "models") + if os.path.exists(models_path): + from fairseq.models import import_models + + import_models(models_path, f"{module_name}.models") + else: + raise ImportError( + "Failed to import --user-dir={} because the corresponding module name " + "({}) is not globally unique. 
Please rename the directory to " + "something unique and try again.".format(module_path, module_name) + ) + + +def softmax(x, dim: int, onnx_trace: bool = False): + if onnx_trace: + return F.softmax(x.float(), dim=dim) + else: + return F.softmax(x, dim=dim, dtype=torch.float32) + + +def log_softmax(x, dim: int, onnx_trace: bool = False): + if onnx_trace: + return F.log_softmax(x.float(), dim=dim) + else: + return F.log_softmax(x, dim=dim, dtype=torch.float32) + + +def get_perplexity(loss, round=2, base=2): + from fairseq.logging.meters import safe_round + + if loss is None: + return 0.0 + try: + return safe_round(base ** loss, round) + except OverflowError: + return float("inf") + + +def deprecation_warning(message, stacklevel=3): + # don't use DeprecationWarning, since it's ignored by default + warnings.warn(message, stacklevel=stacklevel) + + +def get_activation_fn(activation: str) -> Callable: + """Returns the activation function corresponding to `activation`""" + from fairseq.modules import gelu, gelu_accurate + + if activation == "relu": + return F.relu + elif activation == "gelu": + return gelu + elif activation == "gelu_fast": + deprecation_warning( + "--activation-fn=gelu_fast has been renamed to gelu_accurate" + ) + return gelu_accurate + elif activation == "gelu_accurate": + return gelu_accurate + elif activation == "tanh": + return torch.tanh + elif activation == "linear": + return lambda x: x + else: + raise RuntimeError("--activation-fn {} not supported".format(activation)) + + +def get_available_activation_fns() -> List: + return [ + "relu", + "gelu", + "gelu_fast", # deprecated + "gelu_accurate", + "tanh", + "linear", + ] + + +@contextlib.contextmanager +def model_eval(model): + is_training = model.training + model.eval() + yield + model.train(is_training) + + +def has_parameters(module): + try: + next(module.parameters()) + return True + except StopIteration: + return False + + +def get_rng_state(): + state = {"torch_rng_state": torch.get_rng_state()} + if xm is not None: + state["xla_rng_state"] = xm.get_rng_state() + if torch.cuda.is_available(): + state["cuda_rng_state"] = torch.cuda.get_rng_state() + return state + + +def set_rng_state(state): + torch.set_rng_state(state["torch_rng_state"]) + if xm is not None: + xm.set_rng_state(state["xla_rng_state"]) + if torch.cuda.is_available(): + torch.cuda.set_rng_state(state["cuda_rng_state"]) + + +class set_torch_seed(object): + def __init__(self, seed): + assert isinstance(seed, int) + self.rng_state = get_rng_state() + + torch.manual_seed(seed) + if xm is not None: + xm.set_rng_state(seed) + if torch.cuda.is_available(): + torch.cuda.manual_seed(seed) + + def __enter__(self): + return self + + def __exit__(self, *exc): + set_rng_state(self.rng_state) + + +def parse_alignment(line): + """ + Parses a single line from the alingment file. + + Args: + line (str): String containing the alignment of the format: + <src_idx_1>-<tgt_idx_1> <src_idx_2>-<tgt_idx_2> .. + <src_idx_m>-<tgt_idx_m>. All indices are 0 indexed. + + Returns: + torch.IntTensor: packed alignments of shape (2 * m). 
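+
+    Example (editor's illustration): the line "0-0 1-2" parses to an
+    IntTensor containing [0, 0, 1, 2].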
+ """ + alignments = line.strip().split() + parsed_alignment = torch.IntTensor(2 * len(alignments)) + for idx, alignment in enumerate(alignments): + src_idx, tgt_idx = alignment.split("-") + parsed_alignment[2 * idx] = int(src_idx) + parsed_alignment[2 * idx + 1] = int(tgt_idx) + return parsed_alignment + + +def get_token_to_word_mapping(tokens, exclude_list): + n = len(tokens) + word_start = [int(token not in exclude_list) for token in tokens] + word_idx = list(accumulate(word_start)) + token_to_word = {i: word_idx[i] for i in range(n)} + return token_to_word + + +def extract_hard_alignment(attn, src_sent, tgt_sent, pad, eos): + tgt_valid = ( + ((tgt_sent != pad) & (tgt_sent != eos)).nonzero(as_tuple=False).squeeze(dim=-1) + ) + src_invalid = ( + ((src_sent == pad) | (src_sent == eos)).nonzero(as_tuple=False).squeeze(dim=-1) + ) + src_token_to_word = get_token_to_word_mapping(src_sent, [eos, pad]) + tgt_token_to_word = get_token_to_word_mapping(tgt_sent, [eos, pad]) + alignment = [] + if len(tgt_valid) != 0 and len(src_invalid) < len(src_sent): + attn_valid = attn[tgt_valid] + attn_valid[:, src_invalid] = float("-inf") + _, src_indices = attn_valid.max(dim=1) + for tgt_idx, src_idx in zip(tgt_valid, src_indices): + alignment.append( + ( + src_token_to_word[src_idx.item()] - 1, + tgt_token_to_word[tgt_idx.item()] - 1, + ) + ) + return alignment + + +def extract_soft_alignment(attn, src_sent, tgt_sent, pad, eos): + tgt_valid = ((tgt_sent != pad)).nonzero(as_tuple=False) + src_valid = ((src_sent != pad)).nonzero(as_tuple=False).squeeze(dim=-1) + alignment = [] + if len(tgt_valid) != 0 and len(src_valid) != 0: + attn_valid = attn[tgt_valid, src_valid] + alignment = [ + ["{:.6f}".format(p) for p in src_probs.tolist()] for src_probs in attn_valid + ] + return alignment + + +def new_arange(x, *size): + """ + Return a Tensor of `size` filled with a range function on the device of x. + If size is empty, using the size of the variable x. 
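+
+    Example (editor's illustration): new_arange(x, 2, 3) returns
+    [[0, 1, 2], [0, 1, 2]] on the device of x.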
+ """ + if len(size) == 0: + size = x.size() + return torch.arange(size[-1], device=x.device).expand(*size).contiguous() + + +def get_tpu_device(): + return xm.xla_device() + + +def tpu_data_loader(itr): + import torch_xla.core.xla_model as xm + import torch_xla.distributed.parallel_loader as pl + from fairseq.data import iterators + + xm.rendezvous("tpu_data_loader") # wait for all workers + xm.mark_step() + device = xm.xla_device() + return iterators.CountingIterator( + pl.ParallelLoader(itr, [device]).per_device_loader(device), + start=getattr(itr, "n", 0), + total=len(itr), + ) + + +def is_xla_tensor(tensor): + return torch.is_tensor(tensor) and tensor.device.type == "xla" + + +def index_put(tensor, indices, value): + if is_xla_tensor(tensor): + for _ in range(indices.dim(), tensor.dim()): + indices = indices.unsqueeze(-1) + if indices.size(-1) < tensor.size(-1): + indices = indices.expand_as(tensor) + tensor = torch.mul(tensor, ~indices) + torch.mul(value, indices) + else: + tensor[indices] = value + return tensor + + +def xla_device_to_cpu(dat): + import torch_xla.core.xla_model as xm + + return xm._maybe_convert_to_cpu(dat) + + +class CudaEnvironment(object): + def __init__(self): + cur_device = torch.cuda.current_device() + prop = torch.cuda.get_device_properties("cuda:{}".format(cur_device)) + self.name = prop.name + self.major = prop.major + self.minor = prop.minor + self.total_memory_in_GB = prop.total_memory / 1024 / 1024 / 1024 + + @staticmethod + def pretty_print_cuda_env_list(cuda_env_list): + """ + Given a list of CudaEnviorments, pretty print them + """ + num_workers = len(cuda_env_list) + center = "CUDA enviroments for all {} workers".format(num_workers) + banner_len = 40 - len(center) // 2 + first_line = "*" * banner_len + center + "*" * banner_len + logger.info(first_line) + for r, env in enumerate(cuda_env_list): + logger.info( + "rank {:3d}: ".format(r) + + "capabilities = {:2d}.{:<2d} ; ".format(env.major, env.minor) + + "total memory = {:.3f} GB ; ".format(env.total_memory_in_GB) + + "name = {:40s}".format(env.name) + ) + logger.info(first_line) + + +def csv_str_list(x): + return x.split(",") + + +def eval_str_list(x, type=float): + if x is None: + return None + if isinstance(x, str): + x = eval(x) + try: + return list(map(type, x)) + except TypeError: + return [type(x)] + + +def eval_str_dict(x, type=dict): + if x is None: + return None + if isinstance(x, str): + x = eval(x) + return x + + +def eval_bool(x, default=False): + if x is None: + return default + try: + return bool(eval(x)) + except TypeError: + return default + + +def reset_logging(): + root = logging.getLogger() + for handler in root.handlers: + root.removeHandler(handler) + root.setLevel(os.environ.get("LOGLEVEL", "INFO").upper()) + handler = logging.StreamHandler(sys.stdout) + handler.setFormatter( + logging.Formatter( + fmt="%(asctime)s | %(levelname)s | %(name)s | %(message)s", + datefmt="%Y-%m-%d %H:%M:%S", + ) + ) + root.addHandler(handler) diff --git a/SpeechT5/fairseq/fairseq/version.txt b/SpeechT5/fairseq/fairseq/version.txt new file mode 100644 index 0000000000000000000000000000000000000000..41432f00d9ce57fadd55cc7dd27b391ddf5ca0b9 --- /dev/null +++ b/SpeechT5/fairseq/fairseq/version.txt @@ -0,0 +1 @@ +1.0.0a0 diff --git a/SpeechT5/fairseq/fairseq_cli/__init__.py b/SpeechT5/fairseq/fairseq_cli/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 diff --git a/SpeechT5/fairseq/fairseq_cli/eval_lm.py 
b/SpeechT5/fairseq/fairseq_cli/eval_lm.py new file mode 100644 index 0000000000000000000000000000000000000000..ab6e77029ef738291efd190b1cfe2435dd403dea --- /dev/null +++ b/SpeechT5/fairseq/fairseq_cli/eval_lm.py @@ -0,0 +1,347 @@ +#!/usr/bin/env python3 -u +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +""" +Evaluate the perplexity of a trained language model. +""" + +import logging +import math +import os +import sys +from argparse import Namespace +from typing import Iterable, List, Optional + +import torch +import fairseq +from fairseq import checkpoint_utils, distributed_utils, options, tasks, utils +from fairseq.dataclass.utils import convert_namespace_to_omegaconf +from fairseq.logging import progress_bar +from fairseq.logging.meters import StopwatchMeter +from fairseq.sequence_scorer import SequenceScorer +from omegaconf import DictConfig + + +logging.basicConfig( + format="%(asctime)s | %(levelname)s | %(name)s | %(message)s", + datefmt="%Y-%m-%d %H:%M:%S", + level=os.environ.get("LOGLEVEL", "INFO").upper(), + stream=sys.stdout, +) +logger = logging.getLogger("fairseq_cli.eval_lm") + + +def eval_lm( + models: List[fairseq.models.FairseqModel], + source_dictionary: fairseq.data.Dictionary, + batch_iterator: Iterable, + post_process: Optional[str] = None, + output_word_probs: bool = False, + output_word_stats: bool = False, + target_dictionary: Optional[fairseq.data.Dictionary] = None, + softmax_batch: int = 0, + remove_bos_token: bool = False, + device: Optional[torch.device] = None, +): + """ + Args: + models (List[~fairseq.models.FairseqModel]): list of models to + evaluate. Models are essentially `nn.Module` instances, but + must be compatible with fairseq's `SequenceScorer`. + source_dictionary (~fairseq.data.Dictionary): dictionary for + applying any relevant post processing or outputing word + probs/stats. + batch_iterator (Iterable): yield batches of data + post_process (Optional[str]): post-process text by removing BPE, + letter segmentation, etc. Valid options can be found in + fairseq.data.utils.post_process, although not all options + are implemented here. 
+ output_word_probs (Optional[bool]): output words and their + predicted log probabilities + output_word_stats (Optional[bool]): output word statistics such + as word count and average probability + target_dictionary (Optional[~fairseq.data.Dictionary]): output + dictionary (defaults to *source_dictionary*) + softmax_batch (Optional[bool]): if BxT is more than this, will + batch the softmax over vocab to this amount of tokens, in + order to fit into GPU memory + remove_bos_token (Optional[bool]): if True, confirm that the + first token is the beginning-of-sentence symbol (according + to the relevant dictionary) and remove it from the output + device (Optional[torch.device]): device to use for evaluation + (defaults to device of first model parameter) + """ + if target_dictionary is None: + target_dictionary = source_dictionary + if device is None: + device = next(models[0].parameters()).device + + gen_timer = StopwatchMeter() + scorer = SequenceScorer(target_dictionary, softmax_batch) + + score_sum = 0.0 + count = 0 + + if post_process is not None: + if post_process in {"subword_nmt", "@@ "}: + bpe_cont = post_process.rstrip() + bpe_toks = { + i + for i in range(len(source_dictionary)) + if source_dictionary[i].endswith(bpe_cont) + } + else: + raise NotImplementedError( + "--post-process={post_process} is not implemented" + ) + bpe_len = len(bpe_cont) + else: + bpe_toks = None + bpe_len = 0 + + word_stats = dict() + + for sample in batch_iterator: + if "net_input" not in sample: + continue + + sample = utils.move_to_cuda(sample, device=device) + + gen_timer.start() + hypos = scorer.generate(models, sample) + gen_timer.stop(sample["ntokens"]) + + for i, hypos_i in enumerate(hypos): + hypo = hypos_i[0] + sample_id = sample["id"][i] + + tokens = hypo["tokens"] + tgt_len = tokens.numel() + pos_scores = hypo["positional_scores"].float() + + if remove_bos_token: + assert hypo["tokens"][0].item() == target_dictionary.bos() + tokens = tokens[1:] + pos_scores = pos_scores[1:] + + skipped_toks = 0 + if bpe_toks is not None: + for i in range(tgt_len - 1): + if tokens[i].item() in bpe_toks: + skipped_toks += 1 + pos_scores[i + 1] += pos_scores[i] + pos_scores[i] = 0 + + inf_scores = pos_scores.eq(float("inf")) | pos_scores.eq(float("-inf")) + if inf_scores.any(): + logger.info( + "skipping tokens with inf scores:", + target_dictionary.string(tokens[inf_scores.nonzero()]), + ) + pos_scores = pos_scores[(~inf_scores).nonzero()] + score_sum += pos_scores.sum().cpu() + count += pos_scores.numel() - skipped_toks + + if output_word_probs or output_word_stats: + w = "" + word_prob = [] + is_bpe = False + for i in range(len(tokens)): + w_ind = tokens[i].item() + w += source_dictionary[w_ind] + if bpe_toks is not None and w_ind in bpe_toks: + w = w[:-bpe_len] + is_bpe = True + else: + word_prob.append((w, pos_scores[i].item())) + + next_prob = None + ind = i + 1 + while ind < len(tokens): + if pos_scores[ind].item() != 0: + next_prob = pos_scores[ind] + break + ind += 1 + + word_stats.setdefault(w, WordStat(w, is_bpe)).add( + pos_scores[i].item(), next_prob + ) + is_bpe = False + w = "" + if output_word_probs: + logger.info( + str(int(sample_id)) + + " " + + ( + "\t".join( + "{} [{:2f}]".format(x[0], x[1]) for x in word_prob + ) + ) + ) + + avg_nll_loss = ( + -score_sum / count / math.log(2) if count > 0 else 0 + ) # convert to base 2 + logger.info( + "Evaluated {:,} tokens in {:.1f}s ({:.2f} tokens/s)".format( + gen_timer.n, gen_timer.sum, 1.0 / gen_timer.avg if gen_timer.avg > 0 else 0 + ) + ) + + if 
output_word_stats: + for ws in sorted(word_stats.values(), key=lambda x: x.count, reverse=True): + logger.info(ws) + + return { + "loss": avg_nll_loss, + "perplexity": 2 ** avg_nll_loss, + } + + +class WordStat(object): + def __init__(self, word, is_bpe): + self.word = word + self.is_bpe = is_bpe + self.log_prob = 0 + self.next_word_prob = 0 + self.count = 0 + self.missing_next_words = 0 + + def add(self, log_prob, next_word_prob): + """increments counters for the sum of log probs of current word and next + word (given context ending at current word). Since the next word might be at the end of the example, + or it might be not counted because it is not an ending subword unit, + also keeps track of how many of those we have seen""" + if next_word_prob is not None: + self.next_word_prob += next_word_prob + else: + self.missing_next_words += 1 + self.log_prob += log_prob + self.count += 1 + + def __str__(self): + return "{}\t{}\t{}\t{}\t{}\t{}".format( + self.word, + self.count, + self.log_prob, + self.is_bpe, + self.next_word_prob, + self.count - self.missing_next_words, + ) + + +def main(cfg: DictConfig, **unused_kwargs): + if isinstance(cfg, Namespace): + cfg = convert_namespace_to_omegaconf(cfg) + + utils.import_user_module(cfg.common) + + logger.info(cfg) + + if cfg.eval_lm.context_window > 0: + # reduce tokens per sample by the required context window size + cfg.task.tokens_per_sample -= cfg.eval_lm.context_window + + # Initialize the task using the current *cfg* + task = tasks.setup_task(cfg.task) + + # Load ensemble + logger.info("loading model(s) from {}".format(cfg.common_eval.path)) + models, model_args, task = checkpoint_utils.load_model_ensemble_and_task( + [cfg.common_eval.path], + arg_overrides=eval(cfg.common_eval.model_overrides), + suffix=cfg.checkpoint.checkpoint_suffix, + strict=(cfg.checkpoint.checkpoint_shard_count == 1), + num_shards=cfg.checkpoint.checkpoint_shard_count, + task=task, + ) + + use_fp16 = cfg.common.fp16 + use_cuda = torch.cuda.is_available() and not cfg.common.cpu + if use_cuda: + torch.cuda.set_device(cfg.distributed_training.device_id) + + # Optimize ensemble for generation and set the source and dest dicts on the model + # (required by scorer) + for model in models: + if use_fp16: + model.half() + if use_cuda and not cfg.distributed_training.pipeline_model_parallel: + model.cuda() + model.prepare_for_inference_(cfg) + + assert len(models) > 0 + + logger.info( + "num. 
model params: {:,}".format(sum(p.numel() for p in models[0].parameters())) + ) + + # Load dataset splits + task.load_dataset(cfg.dataset.gen_subset) + dataset = task.dataset(cfg.dataset.gen_subset) + logger.info( + "{} {} {:,} examples".format( + cfg.task.data, cfg.dataset.gen_subset, len(dataset) + ) + ) + + itr = task.eval_lm_dataloader( + dataset=dataset, + max_tokens=cfg.dataset.max_tokens or 36000, + batch_size=cfg.dataset.batch_size, + max_positions=utils.resolve_max_positions( + *[model.max_positions() for model in models] + ), + num_shards=max( + cfg.dataset.num_shards, + cfg.distributed_training.distributed_world_size, + ), + shard_id=max( + cfg.dataset.shard_id, + cfg.distributed_training.distributed_rank, + ), + num_workers=cfg.dataset.num_workers, + data_buffer_size=cfg.dataset.data_buffer_size, + context_window=cfg.eval_lm.context_window, + ) + + itr = progress_bar.progress_bar( + itr, + log_format=cfg.common.log_format, + log_interval=cfg.common.log_interval, + default_log_format=("tqdm" if not cfg.common.no_progress_bar else "simple"), + ) + + results = eval_lm( + models=models, + source_dictionary=task.source_dictionary, + batch_iterator=itr, + post_process=cfg.common_eval.post_process, + output_word_probs=cfg.eval_lm.output_word_probs, + output_word_stats=cfg.eval_lm.output_word_stats, + target_dictionary=task.target_dictionary, + softmax_batch=cfg.eval_lm.softmax_batch, + remove_bos_token=getattr(cfg.task, "add_bos_token", False), + ) + + logger.info( + "Loss (base 2): {:.4f}, Perplexity: {:.2f}".format( + results["loss"], results["perplexity"] + ) + ) + + return results + + +def cli_main(): + parser = options.get_eval_lm_parser() + args = options.parse_args_and_arch(parser) + + distributed_utils.call_main(convert_namespace_to_omegaconf(args), main) + + +if __name__ == "__main__": + cli_main() diff --git a/SpeechT5/fairseq/fairseq_cli/generate.py b/SpeechT5/fairseq/fairseq_cli/generate.py new file mode 100644 index 0000000000000000000000000000000000000000..7bd582b25670a937921755ad98af12b107ad9fc7 --- /dev/null +++ b/SpeechT5/fairseq/fairseq_cli/generate.py @@ -0,0 +1,408 @@ +#!/usr/bin/env python3 -u +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. +""" +Translate pre-processed data with a trained model. +""" + +import ast +import logging +import math +import os +import sys +from argparse import Namespace +from itertools import chain + +import numpy as np +import torch +from fairseq import checkpoint_utils, options, scoring, tasks, utils +from fairseq.dataclass.utils import convert_namespace_to_omegaconf +from fairseq.logging import progress_bar +from fairseq.logging.meters import StopwatchMeter, TimeMeter +from omegaconf import DictConfig + + +def main(cfg: DictConfig): + + if isinstance(cfg, Namespace): + cfg = convert_namespace_to_omegaconf(cfg) + + assert cfg.common_eval.path is not None, "--path required for generation!" 
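+    # Editor's note: a typical invocation satisfying these checks might be
+    # (data and checkpoint paths are hypothetical):
+    #   fairseq-generate data-bin/iwslt14.de-en --path checkpoints/checkpoint_best.pt \
+    #       --beam 5 --batch-size 128 --remove-bpe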
+ assert ( + not cfg.generation.sampling or cfg.generation.nbest == cfg.generation.beam + ), "--sampling requires --nbest to be equal to --beam" + assert ( + cfg.generation.replace_unk is None or cfg.dataset.dataset_impl == "raw" + ), "--replace-unk requires a raw text dataset (--dataset-impl=raw)" + + if cfg.common_eval.results_path is not None: + os.makedirs(cfg.common_eval.results_path, exist_ok=True) + output_path = os.path.join( + cfg.common_eval.results_path, + "generate-{}.txt".format(cfg.dataset.gen_subset), + ) + with open(output_path, "w", buffering=1, encoding="utf-8") as h: + return _main(cfg, h) + else: + return _main(cfg, sys.stdout) + + +def get_symbols_to_strip_from_output(generator): + if hasattr(generator, "symbols_to_strip_from_output"): + return generator.symbols_to_strip_from_output + else: + return {generator.eos} + + +def _main(cfg: DictConfig, output_file): + logging.basicConfig( + format="%(asctime)s | %(levelname)s | %(name)s | %(message)s", + datefmt="%Y-%m-%d %H:%M:%S", + level=os.environ.get("LOGLEVEL", "INFO").upper(), + stream=output_file, + ) + logger = logging.getLogger("fairseq_cli.generate") + + utils.import_user_module(cfg.common) + + if cfg.dataset.max_tokens is None and cfg.dataset.batch_size is None: + cfg.dataset.max_tokens = 12000 + logger.info(cfg) + + # Fix seed for stochastic decoding + if cfg.common.seed is not None and not cfg.generation.no_seed_provided: + np.random.seed(cfg.common.seed) + utils.set_torch_seed(cfg.common.seed) + + use_cuda = torch.cuda.is_available() and not cfg.common.cpu + + # Load dataset splits + task = tasks.setup_task(cfg.task) + + + # Set dictionaries + try: + src_dict = getattr(task, "source_dictionary", None) + except NotImplementedError: + src_dict = None + tgt_dict = task.target_dictionary + + overrides = ast.literal_eval(cfg.common_eval.model_overrides) + + # Load ensemble + logger.info("loading model(s) from {}".format(cfg.common_eval.path)) + models, saved_cfg = checkpoint_utils.load_model_ensemble( + utils.split_paths(cfg.common_eval.path), + arg_overrides=overrides, + task=task, + suffix=cfg.checkpoint.checkpoint_suffix, + strict=(cfg.checkpoint.checkpoint_shard_count == 1), + num_shards=cfg.checkpoint.checkpoint_shard_count, + ) + + # loading the dataset should happen after the checkpoint has been loaded so we can give it the saved task config + task.load_dataset(cfg.dataset.gen_subset, task_cfg=saved_cfg.task) + + if cfg.generation.lm_path is not None: + overrides["data"] = cfg.task.data + + try: + lms, _ = checkpoint_utils.load_model_ensemble( + [cfg.generation.lm_path], arg_overrides=overrides, task=None + ) + except: + logger.warning( + f"Failed to load language model! 
Please make sure that the language model dict is the same " + f"as target dict and is located in the data dir ({cfg.task.data})" + ) + raise + + assert len(lms) == 1 + else: + lms = [None] + + # Optimize ensemble for generation + for model in chain(models, lms): + if model is None: + continue + if cfg.common.fp16: + model.half() + if use_cuda and not cfg.distributed_training.pipeline_model_parallel: + model.cuda() + model.prepare_for_inference_(cfg) + + # Load alignment dictionary for unknown word replacement + # (None if no unknown word replacement, empty if no path to align dictionary) + align_dict = utils.load_align_dict(cfg.generation.replace_unk) + + # Load dataset (possibly sharded) + itr = task.get_batch_iterator( + dataset=task.dataset(cfg.dataset.gen_subset), + max_tokens=cfg.dataset.max_tokens, + max_sentences=cfg.dataset.batch_size, + max_positions=utils.resolve_max_positions( + task.max_positions(), *[m.max_positions() for m in models] + ), + ignore_invalid_inputs=cfg.dataset.skip_invalid_size_inputs_valid_test, + required_batch_size_multiple=cfg.dataset.required_batch_size_multiple, + seed=cfg.common.seed, + num_shards=cfg.distributed_training.distributed_world_size, + shard_id=cfg.distributed_training.distributed_rank, + num_workers=cfg.dataset.num_workers, + data_buffer_size=cfg.dataset.data_buffer_size, + ).next_epoch_itr(shuffle=False) + progress = progress_bar.progress_bar( + itr, + log_format=cfg.common.log_format, + log_interval=cfg.common.log_interval, + default_log_format=("tqdm" if not cfg.common.no_progress_bar else "simple"), + ) + + # Initialize generator + gen_timer = StopwatchMeter() + + extra_gen_cls_kwargs = {"lm_model": lms[0], "lm_weight": cfg.generation.lm_weight} + generator = task.build_generator( + models, cfg.generation, extra_gen_cls_kwargs=extra_gen_cls_kwargs + ) + + # Handle tokenization and BPE + tokenizer = task.build_tokenizer(cfg.tokenizer) + bpe = task.build_bpe(cfg.bpe) + + def decode_fn(x): + if bpe is not None: + x = bpe.decode(x) + if tokenizer is not None: + x = tokenizer.decode(x) + return x + + scorer = scoring.build_scorer(cfg.scoring, tgt_dict) + + num_sentences = 0 + has_target = True + wps_meter = TimeMeter() + for sample in progress: + sample = utils.move_to_cuda(sample) if use_cuda else sample + if "net_input" not in sample: + continue + + prefix_tokens = None + if cfg.generation.prefix_size > 0: + prefix_tokens = sample["target"][:, : cfg.generation.prefix_size] + + constraints = None + if "constraints" in sample: + constraints = sample["constraints"] + + gen_timer.start() + hypos = task.inference_step( + generator, + models, + sample, + prefix_tokens=prefix_tokens, + constraints=constraints, + ) + num_generated_tokens = sum(len(h[0]["tokens"]) for h in hypos) + gen_timer.stop(num_generated_tokens) + + for i, sample_id in enumerate(sample["id"].tolist()): + has_target = sample["target"] is not None + + # Remove padding + if "src_tokens" in sample["net_input"]: + src_tokens = utils.strip_pad( + sample["net_input"]["src_tokens"][i, :], tgt_dict.pad() + ) + else: + src_tokens = None + + target_tokens = None + if has_target: + target_tokens = ( + utils.strip_pad(sample["target"][i, :], tgt_dict.pad()).int().cpu() + ) + + # Either retrieve the original sentences or regenerate them from tokens. 
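+            # Editor's note: with --replace-unk the raw text dataset exposes the original
+            # strings (cf. the --dataset-impl=raw assert above); otherwise the source and
+            # target strings are rebuilt from dictionary ids below.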
+ if align_dict is not None: + src_str = task.dataset(cfg.dataset.gen_subset).src.get_original_text( + sample_id + ) + target_str = task.dataset(cfg.dataset.gen_subset).tgt.get_original_text( + sample_id + ) + else: + if src_dict is not None: + src_str = src_dict.string(src_tokens, cfg.common_eval.post_process) + else: + src_str = "" + if has_target: + target_str = tgt_dict.string( + target_tokens, + cfg.common_eval.post_process, + escape_unk=True, + extra_symbols_to_ignore=get_symbols_to_strip_from_output( + generator + ), + ) + + src_str = decode_fn(src_str) + if has_target: + target_str = decode_fn(target_str) + + if not cfg.common_eval.quiet: + if src_dict is not None: + print("S-{}\t{}".format(sample_id, src_str), file=output_file) + if has_target: + print("T-{}\t{}".format(sample_id, target_str), file=output_file) + + # Process top predictions + for j, hypo in enumerate(hypos[i][: cfg.generation.nbest]): + hypo_tokens, hypo_str, alignment = utils.post_process_prediction( + hypo_tokens=hypo["tokens"].int().cpu(), + src_str=src_str, + alignment=hypo["alignment"], + align_dict=align_dict, + tgt_dict=tgt_dict, + remove_bpe=cfg.common_eval.post_process, + extra_symbols_to_ignore=get_symbols_to_strip_from_output(generator), + ) + detok_hypo_str = decode_fn(hypo_str) + if not cfg.common_eval.quiet: + score = hypo["score"] / math.log(2) # convert to base 2 + # original hypothesis (after tokenization and BPE) + print( + "H-{}\t{}\t{}".format(sample_id, score, hypo_str), + file=output_file, + ) + # detokenized hypothesis + print( + "D-{}\t{}\t{}".format(sample_id, score, detok_hypo_str), + file=output_file, + ) + print( + "P-{}\t{}".format( + sample_id, + " ".join( + map( + lambda x: "{:.4f}".format(x), + # convert from base e to base 2 + hypo["positional_scores"] + .div_(math.log(2)) + .tolist(), + ) + ), + ), + file=output_file, + ) + + if cfg.generation.print_alignment == "hard": + print( + "A-{}\t{}".format( + sample_id, + " ".join( + [ + "{}-{}".format(src_idx, tgt_idx) + for src_idx, tgt_idx in alignment + ] + ), + ), + file=output_file, + ) + if cfg.generation.print_alignment == "soft": + print( + "A-{}\t{}".format( + sample_id, + " ".join( + [ + ",".join(src_probs) + for src_probs in alignment + ] + ), + ), + file=output_file, + ) + + if cfg.generation.print_step: + print( + "I-{}\t{}".format(sample_id, hypo["steps"]), + file=output_file, + ) + + if cfg.generation.retain_iter_history: + for step, h in enumerate(hypo["history"]): + _, h_str, _ = utils.post_process_prediction( + hypo_tokens=h["tokens"].int().cpu(), + src_str=src_str, + alignment=None, + align_dict=None, + tgt_dict=tgt_dict, + remove_bpe=None, + ) + print( + "E-{}_{}\t{}".format(sample_id, step, h_str), + file=output_file, + ) + + # Score only the top hypothesis + if has_target and j == 0: + if align_dict is not None or cfg.common_eval.post_process is not None: + # Convert back to tokens for evaluation with unk replacement and/or without BPE + target_tokens = tgt_dict.encode_line( + target_str, add_if_not_exist=True + ) + hypo_tokens = tgt_dict.encode_line( + detok_hypo_str, add_if_not_exist=True + ) + if hasattr(scorer, "add_string"): + scorer.add_string(target_str, detok_hypo_str) + else: + scorer.add(target_tokens, hypo_tokens) + + wps_meter.update(num_generated_tokens) + progress.log({"wps": round(wps_meter.avg)}) + num_sentences += ( + sample["nsentences"] if "nsentences" in sample else sample["id"].numel() + ) + + logger.info("NOTE: hypothesis and token scores are output in base 2") + logger.info( + "Translated {:,} 
sentences ({:,} tokens) in {:.1f}s ({:.2f} sentences/s, {:.2f} tokens/s)".format( + num_sentences, + gen_timer.n, + gen_timer.sum, + num_sentences / gen_timer.sum, + 1.0 / gen_timer.avg, + ) + ) + if has_target: + if cfg.bpe and not cfg.generation.sacrebleu: + if cfg.common_eval.post_process: + logger.warning( + "BLEU score is being computed by splitting detokenized string on spaces, this is probably not what you want. Use --sacrebleu for standard 13a BLEU tokenization" + ) + else: + logger.warning( + "If you are using BPE on the target side, the BLEU score is computed on BPE tokens, not on proper words. Use --sacrebleu for standard 13a BLEU tokenization" + ) + # use print to be consistent with other main outputs: S-, H-, T-, D- and so on + print( + "Generate {} with beam={}: {}".format( + cfg.dataset.gen_subset, cfg.generation.beam, scorer.result_string() + ), + file=output_file, + ) + + return scorer + + +def cli_main(): + parser = options.get_generation_parser() + args = options.parse_args_and_arch(parser) + main(args) + + +if __name__ == "__main__": + cli_main() diff --git a/SpeechT5/fairseq/fairseq_cli/hydra_train.py b/SpeechT5/fairseq/fairseq_cli/hydra_train.py new file mode 100644 index 0000000000000000000000000000000000000000..9de01084ba01a77a2297a3eace652a9be9b50380 --- /dev/null +++ b/SpeechT5/fairseq/fairseq_cli/hydra_train.py @@ -0,0 +1,80 @@ +#!/usr/bin/env python3 -u +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +import logging +import os + +from fairseq.dataclass.initialize import add_defaults, hydra_init +from fairseq_cli.train import main as pre_main +from fairseq import distributed_utils, metrics +from fairseq.dataclass.configs import FairseqConfig +from fairseq.utils import reset_logging + +import hydra +from hydra.core.hydra_config import HydraConfig +import torch +from omegaconf import OmegaConf, open_dict + + +logger = logging.getLogger("fairseq_cli.hydra_train") + + +@hydra.main(config_path=os.path.join("..", "fairseq", "config"), config_name="config") +def hydra_main(cfg: FairseqConfig) -> float: + add_defaults(cfg) + + if cfg.common.reset_logging: + reset_logging() # Hydra hijacks logging, fix that + else: + with open_dict(cfg): + # make hydra logging work with ddp (see # see https://github.com/facebookresearch/hydra/issues/1126) + cfg.job_logging_cfg = OmegaConf.to_container(HydraConfig.get().job_logging, resolve=True) + + cfg = OmegaConf.create(OmegaConf.to_container(cfg, resolve=True, enum_to_str=True)) + OmegaConf.set_struct(cfg, True) + + try: + if cfg.common.profile: + with torch.cuda.profiler.profile(): + with torch.autograd.profiler.emit_nvtx(): + distributed_utils.call_main(cfg, pre_main) + else: + distributed_utils.call_main(cfg, pre_main) + except BaseException as e: + if not cfg.common.suppress_crashes: + raise + else: + logger.error("Crashed! 
" + str(e)) + + # get best val and return - useful for sweepers + try: + best_val = metrics.get_smoothed_value( + "valid", cfg.checkpoint.best_checkpoint_metric + ) + except: + best_val = None + + if best_val is None: + best_val = float("inf") + + return best_val + + +def cli_main(): + try: + from hydra._internal.utils import get_args + + cfg_name = get_args().config_name or "config" + except: + logger.warning("Failed to get config name from hydra args") + cfg_name = "config" + + hydra_init(cfg_name) + hydra_main() + + +if __name__ == "__main__": + cli_main() diff --git a/SpeechT5/fairseq/fairseq_cli/interactive.py b/SpeechT5/fairseq/fairseq_cli/interactive.py new file mode 100644 index 0000000000000000000000000000000000000000..cadef2821a74a3b2f051c792d835129bf775714f --- /dev/null +++ b/SpeechT5/fairseq/fairseq_cli/interactive.py @@ -0,0 +1,316 @@ +#!/usr/bin/env python3 -u +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. +""" +Translate raw text with a trained model. Batches data on-the-fly. +""" + +import ast +import fileinput +import logging +import math +import os +import sys +import time +from argparse import Namespace +from collections import namedtuple + +import numpy as np +import torch +from fairseq import checkpoint_utils, distributed_utils, options, tasks, utils +from fairseq.dataclass.configs import FairseqConfig +from fairseq.dataclass.utils import convert_namespace_to_omegaconf +from fairseq.token_generation_constraints import pack_constraints, unpack_constraints +from fairseq_cli.generate import get_symbols_to_strip_from_output + + +logging.basicConfig( + format="%(asctime)s | %(levelname)s | %(name)s | %(message)s", + datefmt="%Y-%m-%d %H:%M:%S", + level=os.environ.get("LOGLEVEL", "INFO").upper(), + stream=sys.stdout, +) +logger = logging.getLogger("fairseq_cli.interactive") + + +Batch = namedtuple("Batch", "ids src_tokens src_lengths constraints") +Translation = namedtuple("Translation", "src_str hypos pos_scores alignments") + + +def buffered_read(input, buffer_size): + buffer = [] + with fileinput.input(files=[input], openhook=fileinput.hook_encoded("utf-8")) as h: + for src_str in h: + buffer.append(src_str.strip()) + if len(buffer) >= buffer_size: + yield buffer + buffer = [] + + if len(buffer) > 0: + yield buffer + + +def make_batches(lines, cfg, task, max_positions, encode_fn): + def encode_fn_target(x): + return encode_fn(x) + + if cfg.generation.constraints: + # Strip (tab-delimited) contraints, if present, from input lines, + # store them in batch_constraints + batch_constraints = [list() for _ in lines] + for i, line in enumerate(lines): + if "\t" in line: + lines[i], *batch_constraints[i] = line.split("\t") + + # Convert each List[str] to List[Tensor] + for i, constraint_list in enumerate(batch_constraints): + batch_constraints[i] = [ + task.target_dictionary.encode_line( + encode_fn_target(constraint), + append_eos=False, + add_if_not_exist=False, + ) + for constraint in constraint_list + ] + + if cfg.generation.constraints: + constraints_tensor = pack_constraints(batch_constraints) + else: + constraints_tensor = None + + tokens, lengths = task.get_interactive_tokens_and_lengths(lines, encode_fn) + + itr = task.get_batch_iterator( + dataset=task.build_dataset_for_inference( + tokens, lengths, constraints=constraints_tensor + ), + max_tokens=cfg.dataset.max_tokens, + max_sentences=cfg.dataset.batch_size, + 
max_positions=max_positions, + ignore_invalid_inputs=cfg.dataset.skip_invalid_size_inputs_valid_test, + ).next_epoch_itr(shuffle=False) + for batch in itr: + ids = batch["id"] + src_tokens = batch["net_input"]["src_tokens"] + src_lengths = batch["net_input"]["src_lengths"] + constraints = batch.get("constraints", None) + + yield Batch( + ids=ids, + src_tokens=src_tokens, + src_lengths=src_lengths, + constraints=constraints, + ) + + +def main(cfg: FairseqConfig): + if isinstance(cfg, Namespace): + cfg = convert_namespace_to_omegaconf(cfg) + + start_time = time.time() + total_translate_time = 0 + + utils.import_user_module(cfg.common) + + if cfg.interactive.buffer_size < 1: + cfg.interactive.buffer_size = 1 + if cfg.dataset.max_tokens is None and cfg.dataset.batch_size is None: + cfg.dataset.batch_size = 1 + + assert ( + not cfg.generation.sampling or cfg.generation.nbest == cfg.generation.beam + ), "--sampling requires --nbest to be equal to --beam" + assert ( + not cfg.dataset.batch_size + or cfg.dataset.batch_size <= cfg.interactive.buffer_size + ), "--batch-size cannot be larger than --buffer-size" + + logger.info(cfg) + + # Fix seed for stochastic decoding + if cfg.common.seed is not None and not cfg.generation.no_seed_provided: + np.random.seed(cfg.common.seed) + utils.set_torch_seed(cfg.common.seed) + + use_cuda = torch.cuda.is_available() and not cfg.common.cpu + + # Setup task, e.g., translation + task = tasks.setup_task(cfg.task) + + # Load ensemble + overrides = ast.literal_eval(cfg.common_eval.model_overrides) + logger.info("loading model(s) from {}".format(cfg.common_eval.path)) + models, _model_args = checkpoint_utils.load_model_ensemble( + utils.split_paths(cfg.common_eval.path), + arg_overrides=overrides, + task=task, + suffix=cfg.checkpoint.checkpoint_suffix, + strict=(cfg.checkpoint.checkpoint_shard_count == 1), + num_shards=cfg.checkpoint.checkpoint_shard_count, + ) + + # Set dictionaries + src_dict = task.source_dictionary + tgt_dict = task.target_dictionary + + # Optimize ensemble for generation + for model in models: + if model is None: + continue + if cfg.common.fp16: + model.half() + if use_cuda and not cfg.distributed_training.pipeline_model_parallel: + model.cuda() + model.prepare_for_inference_(cfg) + + # Initialize generator + generator = task.build_generator(models, cfg.generation) + + # Handle tokenization and BPE + tokenizer = task.build_tokenizer(cfg.tokenizer) + bpe = task.build_bpe(cfg.bpe) + + def encode_fn(x): + if tokenizer is not None: + x = tokenizer.encode(x) + if bpe is not None: + x = bpe.encode(x) + return x + + def decode_fn(x): + if bpe is not None: + x = bpe.decode(x) + if tokenizer is not None: + x = tokenizer.decode(x) + return x + + # Load alignment dictionary for unknown word replacement + # (None if no unknown word replacement, empty if no path to align dictionary) + align_dict = utils.load_align_dict(cfg.generation.replace_unk) + + max_positions = utils.resolve_max_positions( + task.max_positions(), *[model.max_positions() for model in models] + ) + + if cfg.generation.constraints: + logger.warning( + "NOTE: Constrained decoding currently assumes a shared subword vocabulary." 
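+            # Editor's note (illustrative): constraints are supplied on the input line
+            # itself, tab-separated after the source sentence, e.g.
+            #   "Sie sind sehr nett .<TAB>very nice"   (hypothetical input)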
+ ) + + if cfg.interactive.buffer_size > 1: + logger.info("Sentence buffer size: %s", cfg.interactive.buffer_size) + logger.info("NOTE: hypothesis and token scores are output in base 2") + logger.info("Type the input sentence and press return:") + start_id = 0 + for inputs in buffered_read(cfg.interactive.input, cfg.interactive.buffer_size): + results = [] + for batch in make_batches(inputs, cfg, task, max_positions, encode_fn): + bsz = batch.src_tokens.size(0) + src_tokens = batch.src_tokens + src_lengths = batch.src_lengths + constraints = batch.constraints + if use_cuda: + src_tokens = src_tokens.cuda() + src_lengths = src_lengths.cuda() + if constraints is not None: + constraints = constraints.cuda() + + sample = { + "net_input": { + "src_tokens": src_tokens, + "src_lengths": src_lengths, + }, + } + translate_start_time = time.time() + translations = task.inference_step( + generator, models, sample, constraints=constraints + ) + translate_time = time.time() - translate_start_time + total_translate_time += translate_time + list_constraints = [[] for _ in range(bsz)] + if cfg.generation.constraints: + list_constraints = [unpack_constraints(c) for c in constraints] + for i, (id, hypos) in enumerate(zip(batch.ids.tolist(), translations)): + src_tokens_i = utils.strip_pad(src_tokens[i], tgt_dict.pad()) + constraints = list_constraints[i] + results.append( + ( + start_id + id, + src_tokens_i, + hypos, + { + "constraints": constraints, + "time": translate_time / len(translations), + }, + ) + ) + + # sort output to match input order + for id_, src_tokens, hypos, info in sorted(results, key=lambda x: x[0]): + src_str = '' + if src_dict is not None: + src_str = src_dict.string(src_tokens, cfg.common_eval.post_process) + print("S-{}\t{}".format(id_, src_str)) + print("W-{}\t{:.3f}\tseconds".format(id_, info["time"])) + for constraint in info["constraints"]: + print( + "C-{}\t{}".format( + id_, tgt_dict.string(constraint, cfg.common_eval.post_process) + ) + ) + + # Process top predictions + for hypo in hypos[: min(len(hypos), cfg.generation.nbest)]: + hypo_tokens, hypo_str, alignment = utils.post_process_prediction( + hypo_tokens=hypo["tokens"].int().cpu(), + src_str=src_str, + alignment=hypo["alignment"], + align_dict=align_dict, + tgt_dict=tgt_dict, + remove_bpe=cfg.common_eval.post_process, + extra_symbols_to_ignore=get_symbols_to_strip_from_output(generator), + ) + detok_hypo_str = decode_fn(hypo_str) + score = hypo["score"] / math.log(2) # convert to base 2 + # original hypothesis (after tokenization and BPE) + print("H-{}\t{}\t{}".format(id_, score, hypo_str)) + # detokenized hypothesis + print("D-{}\t{}\t{}".format(id_, score, detok_hypo_str)) + print( + "P-{}\t{}".format( + id_, + " ".join( + map( + lambda x: "{:.4f}".format(x), + # convert from base e to base 2 + hypo["positional_scores"].div_(math.log(2)).tolist(), + ) + ), + ) + ) + if cfg.generation.print_alignment: + alignment_str = " ".join( + ["{}-{}".format(src, tgt) for src, tgt in alignment] + ) + print("A-{}\t{}".format(id_, alignment_str)) + + # update running id_ counter + start_id += len(inputs) + + logger.info( + "Total time: {:.3f} seconds; translation time: {:.3f}".format( + time.time() - start_time, total_translate_time + ) + ) + + +def cli_main(): + parser = options.get_interactive_generation_parser() + args = options.parse_args_and_arch(parser) + distributed_utils.call_main(convert_namespace_to_omegaconf(args), main) + + +if __name__ == "__main__": + cli_main() diff --git a/SpeechT5/fairseq/fairseq_cli/preprocess.py 
b/SpeechT5/fairseq/fairseq_cli/preprocess.py new file mode 100644 index 0000000000000000000000000000000000000000..b788900d30af6f0bcc1a3e807a8bb249e70b7a43 --- /dev/null +++ b/SpeechT5/fairseq/fairseq_cli/preprocess.py @@ -0,0 +1,401 @@ +#!/usr/bin/env python3 +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. +""" +Data pre-processing: build vocabularies and binarize training data. +""" + +import logging +import os +import shutil +import sys +from collections import Counter +from itertools import zip_longest +from multiprocessing import Pool + +from fairseq import options, tasks, utils +from fairseq.binarizer import Binarizer +from fairseq.data import indexed_dataset + + +logging.basicConfig( + format="%(asctime)s | %(levelname)s | %(name)s | %(message)s", + datefmt="%Y-%m-%d %H:%M:%S", + level=os.environ.get("LOGLEVEL", "INFO").upper(), + stream=sys.stdout, +) +logger = logging.getLogger("fairseq_cli.preprocess") + + +def main(args): + utils.import_user_module(args) + + os.makedirs(args.destdir, exist_ok=True) + + logger.addHandler( + logging.FileHandler( + filename=os.path.join(args.destdir, "preprocess.log"), + ) + ) + logger.info(args) + + task = tasks.get_task(args.task) + + def train_path(lang): + return "{}{}".format(args.trainpref, ("." + lang) if lang else "") + + def file_name(prefix, lang): + fname = prefix + if lang is not None: + fname += ".{lang}".format(lang=lang) + return fname + + def dest_path(prefix, lang): + return os.path.join(args.destdir, file_name(prefix, lang)) + + def dict_path(lang): + return dest_path("dict", lang) + ".txt" + + def build_dictionary(filenames, src=False, tgt=False): + assert src ^ tgt + return task.build_dictionary( + filenames, + workers=args.workers, + threshold=args.thresholdsrc if src else args.thresholdtgt, + nwords=args.nwordssrc if src else args.nwordstgt, + padding_factor=args.padding_factor, + ) + + target = not args.only_source + + if not args.srcdict and os.path.exists(dict_path(args.source_lang)): + raise FileExistsError(dict_path(args.source_lang)) + if target and not args.tgtdict and os.path.exists(dict_path(args.target_lang)): + raise FileExistsError(dict_path(args.target_lang)) + + if args.joined_dictionary: + assert ( + not args.srcdict or not args.tgtdict + ), "cannot use both --srcdict and --tgtdict with --joined-dictionary" + + if args.srcdict: + src_dict = task.load_dictionary(args.srcdict) + elif args.tgtdict: + src_dict = task.load_dictionary(args.tgtdict) + else: + assert ( + args.trainpref + ), "--trainpref must be set if --srcdict is not specified" + src_dict = build_dictionary( + {train_path(lang) for lang in [args.source_lang, args.target_lang]}, + src=True, + ) + tgt_dict = src_dict + else: + if args.srcdict: + src_dict = task.load_dictionary(args.srcdict) + else: + assert ( + args.trainpref + ), "--trainpref must be set if --srcdict is not specified" + src_dict = build_dictionary([train_path(args.source_lang)], src=True) + + if target: + if args.tgtdict: + tgt_dict = task.load_dictionary(args.tgtdict) + else: + assert ( + args.trainpref + ), "--trainpref must be set if --tgtdict is not specified" + tgt_dict = build_dictionary([train_path(args.target_lang)], tgt=True) + else: + tgt_dict = None + + src_dict.save(dict_path(args.source_lang)) + if target and tgt_dict is not None: + tgt_dict.save(dict_path(args.target_lang)) + + if args.dict_only: + return + + def make_binary_dataset(vocab, 
input_prefix, output_prefix, lang, num_workers): + logger.info("[{}] Dictionary: {} types".format(lang, len(vocab))) + n_seq_tok = [0, 0] + replaced = Counter() + + def merge_result(worker_result): + replaced.update(worker_result["replaced"]) + n_seq_tok[0] += worker_result["nseq"] + n_seq_tok[1] += worker_result["ntok"] + + input_file = "{}{}".format( + input_prefix, ("." + lang) if lang is not None else "" + ) + offsets = Binarizer.find_offsets(input_file, num_workers) + pool = None + if num_workers > 1: + pool = Pool(processes=num_workers - 1) + for worker_id in range(1, num_workers): + prefix = "{}{}".format(output_prefix, worker_id) + pool.apply_async( + binarize, + ( + args, + input_file, + vocab, + prefix, + lang, + offsets[worker_id], + offsets[worker_id + 1], + ), + callback=merge_result, + ) + pool.close() + + ds = indexed_dataset.make_builder( + dataset_dest_file(args, output_prefix, lang, "bin"), + impl=args.dataset_impl, + vocab_size=len(vocab), + ) + merge_result( + Binarizer.binarize( + input_file, vocab, lambda t: ds.add_item(t), offset=0, end=offsets[1] + ) + ) + if num_workers > 1: + pool.join() + for worker_id in range(1, num_workers): + prefix = "{}{}".format(output_prefix, worker_id) + temp_file_path = dataset_dest_prefix(args, prefix, lang) + ds.merge_file_(temp_file_path) + os.remove(indexed_dataset.data_file_path(temp_file_path)) + os.remove(indexed_dataset.index_file_path(temp_file_path)) + + ds.finalize(dataset_dest_file(args, output_prefix, lang, "idx")) + + logger.info( + "[{}] {}: {} sents, {} tokens, {:.3}% replaced by {}".format( + lang, + input_file, + n_seq_tok[0], + n_seq_tok[1], + 100 * sum(replaced.values()) / n_seq_tok[1], + vocab.unk_word, + ) + ) + + def make_binary_alignment_dataset(input_prefix, output_prefix, num_workers): + nseq = [0] + + def merge_result(worker_result): + nseq[0] += worker_result["nseq"] + + input_file = input_prefix + offsets = Binarizer.find_offsets(input_file, num_workers) + pool = None + if num_workers > 1: + pool = Pool(processes=num_workers - 1) + for worker_id in range(1, num_workers): + prefix = "{}{}".format(output_prefix, worker_id) + pool.apply_async( + binarize_alignments, + ( + args, + input_file, + utils.parse_alignment, + prefix, + offsets[worker_id], + offsets[worker_id + 1], + ), + callback=merge_result, + ) + pool.close() + + ds = indexed_dataset.make_builder( + dataset_dest_file(args, output_prefix, None, "bin"), impl=args.dataset_impl + ) + + merge_result( + Binarizer.binarize_alignments( + input_file, + utils.parse_alignment, + lambda t: ds.add_item(t), + offset=0, + end=offsets[1], + ) + ) + if num_workers > 1: + pool.join() + for worker_id in range(1, num_workers): + prefix = "{}{}".format(output_prefix, worker_id) + temp_file_path = dataset_dest_prefix(args, prefix, None) + ds.merge_file_(temp_file_path) + os.remove(indexed_dataset.data_file_path(temp_file_path)) + os.remove(indexed_dataset.index_file_path(temp_file_path)) + + ds.finalize(dataset_dest_file(args, output_prefix, None, "idx")) + + logger.info("[alignments] {}: parsed {} alignments".format(input_file, nseq[0])) + + def make_dataset(vocab, input_prefix, output_prefix, lang, num_workers=1): + if args.dataset_impl == "raw": + # Copy original text file to destination folder + output_text_file = dest_path( + output_prefix + ".{}-{}".format(args.source_lang, args.target_lang), + lang, + ) + shutil.copyfile(file_name(input_prefix, lang), output_text_file) + else: + make_binary_dataset(vocab, input_prefix, output_prefix, lang, num_workers) + + def 
make_all(lang, vocab): + if args.trainpref: + make_dataset(vocab, args.trainpref, "train", lang, num_workers=args.workers) + if args.validpref: + for k, validpref in enumerate(args.validpref.split(",")): + outprefix = "valid{}".format(k) if k > 0 else "valid" + make_dataset( + vocab, validpref, outprefix, lang, num_workers=args.workers + ) + if args.testpref: + for k, testpref in enumerate(args.testpref.split(",")): + outprefix = "test{}".format(k) if k > 0 else "test" + make_dataset(vocab, testpref, outprefix, lang, num_workers=args.workers) + + def make_all_alignments(): + if args.trainpref and os.path.exists(args.trainpref + "." + args.align_suffix): + make_binary_alignment_dataset( + args.trainpref + "." + args.align_suffix, + "train.align", + num_workers=args.workers, + ) + if args.validpref and os.path.exists(args.validpref + "." + args.align_suffix): + make_binary_alignment_dataset( + args.validpref + "." + args.align_suffix, + "valid.align", + num_workers=args.workers, + ) + if args.testpref and os.path.exists(args.testpref + "." + args.align_suffix): + make_binary_alignment_dataset( + args.testpref + "." + args.align_suffix, + "test.align", + num_workers=args.workers, + ) + + make_all(args.source_lang, src_dict) + if target: + make_all(args.target_lang, tgt_dict) + if args.align_suffix: + make_all_alignments() + + logger.info("Wrote preprocessed data to {}".format(args.destdir)) + + if args.alignfile: + assert args.trainpref, "--trainpref must be set if --alignfile is specified" + src_file_name = train_path(args.source_lang) + tgt_file_name = train_path(args.target_lang) + freq_map = {} + with open(args.alignfile, "r", encoding="utf-8") as align_file: + with open(src_file_name, "r", encoding="utf-8") as src_file: + with open(tgt_file_name, "r", encoding="utf-8") as tgt_file: + for a, s, t in zip_longest(align_file, src_file, tgt_file): + si = src_dict.encode_line(s, add_if_not_exist=False) + ti = tgt_dict.encode_line(t, add_if_not_exist=False) + ai = list(map(lambda x: tuple(x.split("-")), a.split())) + for sai, tai in ai: + srcidx = si[int(sai)] + tgtidx = ti[int(tai)] + if srcidx != src_dict.unk() and tgtidx != tgt_dict.unk(): + assert srcidx != src_dict.pad() + assert srcidx != src_dict.eos() + assert tgtidx != tgt_dict.pad() + assert tgtidx != tgt_dict.eos() + + if srcidx not in freq_map: + freq_map[srcidx] = {} + if tgtidx not in freq_map[srcidx]: + freq_map[srcidx][tgtidx] = 1 + else: + freq_map[srcidx][tgtidx] += 1 + + align_dict = {} + for srcidx in freq_map.keys(): + align_dict[srcidx] = max(freq_map[srcidx], key=freq_map[srcidx].get) + + with open( + os.path.join( + args.destdir, + "alignment.{}-{}.txt".format(args.source_lang, args.target_lang), + ), + "w", + encoding="utf-8", + ) as f: + for k, v in align_dict.items(): + print("{} {}".format(src_dict[k], tgt_dict[v]), file=f) + + +def binarize(args, filename, vocab, output_prefix, lang, offset, end, append_eos=True): + ds = indexed_dataset.make_builder( + dataset_dest_file(args, output_prefix, lang, "bin"), + impl=args.dataset_impl, + vocab_size=len(vocab), + ) + + def consumer(tensor): + ds.add_item(tensor) + + res = Binarizer.binarize( + filename, vocab, consumer, append_eos=append_eos, offset=offset, end=end + ) + ds.finalize(dataset_dest_file(args, output_prefix, lang, "idx")) + return res + + +def binarize_alignments(args, filename, parse_alignment, output_prefix, offset, end): + ds = indexed_dataset.make_builder( + dataset_dest_file(args, output_prefix, None, "bin"), + impl=args.dataset_impl, + vocab_size=None, + 
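+        # Editor's note: vocab_size stays None here because alignments are parsed into
+        # integer index pairs (see utils.parse_alignment), not dictionary tokens.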
) + + def consumer(tensor): + ds.add_item(tensor) + + res = Binarizer.binarize_alignments( + filename, parse_alignment, consumer, offset=offset, end=end + ) + ds.finalize(dataset_dest_file(args, output_prefix, None, "idx")) + return res + + +def dataset_dest_prefix(args, output_prefix, lang): + base = "{}/{}".format(args.destdir, output_prefix) + if lang is not None: + lang_part = ".{}-{}.{}".format(args.source_lang, args.target_lang, lang) + elif args.only_source: + lang_part = "" + else: + lang_part = ".{}-{}".format(args.source_lang, args.target_lang) + + return "{}{}".format(base, lang_part) + + +def dataset_dest_file(args, output_prefix, lang, extension): + base = dataset_dest_prefix(args, output_prefix, lang) + return "{}.{}".format(base, extension) + + +def get_offsets(input_file, num_workers): + return Binarizer.find_offsets(input_file, num_workers) + + +def cli_main(): + parser = options.get_preprocessing_parser() + args = parser.parse_args() + main(args) + + +if __name__ == "__main__": + cli_main() diff --git a/SpeechT5/fairseq/fairseq_cli/score.py b/SpeechT5/fairseq/fairseq_cli/score.py new file mode 100644 index 0000000000000000000000000000000000000000..0b207be959d55f6a56d8c5eb7db3dbe0c1ac977e --- /dev/null +++ b/SpeechT5/fairseq/fairseq_cli/score.py @@ -0,0 +1,102 @@ +#!/usr/bin/env python3 +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. +""" +BLEU scoring of generated translations against reference translations. +""" + +import argparse +import os +import sys + +from fairseq.data import dictionary +from fairseq.scoring import bleu + + +def get_parser(): + parser = argparse.ArgumentParser( + description="Command-line script for BLEU scoring." 
+ ) + # fmt: off + parser.add_argument('-s', '--sys', default='-', help='system output') + parser.add_argument('-r', '--ref', required=True, help='references') + parser.add_argument('-o', '--order', default=4, metavar='N', + type=int, help='consider ngrams up to this order') + parser.add_argument('--ignore-case', action='store_true', + help='case-insensitive scoring') + parser.add_argument('--sacrebleu', action='store_true', + help='score with sacrebleu') + parser.add_argument('--sentence-bleu', action='store_true', + help='report sentence-level BLEUs (i.e., with +1 smoothing)') + # fmt: on + return parser + + +def cli_main(): + parser = get_parser() + args = parser.parse_args() + print(args) + + assert args.sys == "-" or os.path.exists( + args.sys + ), "System output file {} does not exist".format(args.sys) + assert os.path.exists(args.ref), "Reference file {} does not exist".format(args.ref) + + dict = dictionary.Dictionary() + + def readlines(fd): + for line in fd.readlines(): + if args.ignore_case: + yield line.lower() + else: + yield line + + if args.sacrebleu: + import sacrebleu + + def score(fdsys): + with open(args.ref) as fdref: + print(sacrebleu.corpus_bleu(fdsys, [fdref]).format()) + + elif args.sentence_bleu: + + def score(fdsys): + with open(args.ref) as fdref: + scorer = bleu.Scorer(dict.pad(), dict.eos(), dict.unk()) + for i, (sys_tok, ref_tok) in enumerate( + zip(readlines(fdsys), readlines(fdref)) + ): + scorer.reset(one_init=True) + sys_tok = dict.encode_line(sys_tok) + ref_tok = dict.encode_line(ref_tok) + scorer.add(ref_tok, sys_tok) + print(i, scorer.result_string(args.order)) + + else: + + def score(fdsys): + with open(args.ref) as fdref: + scorer = bleu.Scorer( + bleu.BleuConfig( + pad=dict.pad(), + eos=dict.eos(), + unk=dict.unk(), + ) + ) + for sys_tok, ref_tok in zip(readlines(fdsys), readlines(fdref)): + sys_tok = dict.encode_line(sys_tok) + ref_tok = dict.encode_line(ref_tok) + scorer.add(ref_tok, sys_tok) + print(scorer.result_string(args.order)) + + if args.sys == "-": + score(sys.stdin) + else: + with open(args.sys, "r") as f: + score(f) + + +if __name__ == "__main__": + cli_main() diff --git a/SpeechT5/fairseq/fairseq_cli/train.py b/SpeechT5/fairseq/fairseq_cli/train.py new file mode 100644 index 0000000000000000000000000000000000000000..83475873138c5d1bac288c234afb6b4a1a7882d7 --- /dev/null +++ b/SpeechT5/fairseq/fairseq_cli/train.py @@ -0,0 +1,514 @@ +#!/usr/bin/env python3 -u +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. +""" +Train a new model on one or across multiple GPUs. +""" + +import argparse +import logging +import math +import os +import sys +from typing import Dict, Optional, Any, List, Tuple, Callable + +# We need to setup root logger before importing any fairseq libraries. 
+logging.basicConfig( + format="%(asctime)s | %(levelname)s | %(name)s | %(message)s", + datefmt="%Y-%m-%d %H:%M:%S", + level=os.environ.get("LOGLEVEL", "INFO").upper(), + stream=sys.stdout, +) +logger = logging.getLogger("fairseq_cli.train") + +import numpy as np +import torch +from fairseq import ( + checkpoint_utils, + options, + quantization_utils, + tasks, + utils, +) +from fairseq.data import iterators, data_utils +from fairseq.data.plasma_utils import PlasmaStore +from fairseq.dataclass.configs import FairseqConfig +from fairseq.dataclass.utils import convert_namespace_to_omegaconf +from fairseq.distributed import fsdp_enable_wrap, fsdp_wrap, utils as distributed_utils +from fairseq.file_io import PathManager +from fairseq.logging import meters, metrics, progress_bar +from fairseq.model_parallel.megatron_trainer import MegatronTrainer +from fairseq.trainer import Trainer +from omegaconf import DictConfig, OmegaConf + + + + +def main(cfg: FairseqConfig) -> None: + if isinstance(cfg, argparse.Namespace): + cfg = convert_namespace_to_omegaconf(cfg) + + utils.import_user_module(cfg.common) + + if distributed_utils.is_master(cfg.distributed_training) and "job_logging_cfg" in cfg: + # make hydra logging work with ddp (see # see https://github.com/facebookresearch/hydra/issues/1126) + logging.config.dictConfig(OmegaConf.to_container(cfg.job_logging_cfg)) + + assert ( + cfg.dataset.max_tokens is not None or cfg.dataset.batch_size is not None + ), "Must specify batch size either with --max-tokens or --batch-size" + metrics.reset() + + if cfg.common.log_file is not None: + handler = logging.FileHandler(filename=cfg.common.log_file) + logger.addHandler(handler) + + np.random.seed(cfg.common.seed) + utils.set_torch_seed(cfg.common.seed) + + if distributed_utils.is_master(cfg.distributed_training): + checkpoint_utils.verify_checkpoint_directory(cfg.checkpoint.save_dir) + + # Print args + logger.info(cfg) + + if cfg.checkpoint.write_checkpoints_asynchronously: + try: + import iopath # noqa: F401 + except ImportError: + logging.exception( + "Asynchronous checkpoint writing is specified but iopath is " + "not installed: `pip install iopath`" + ) + return + + # Setup task, e.g., translation, language modeling, etc. + task = tasks.setup_task(cfg.task) + + assert cfg.criterion, "Please specify criterion to train a model" + + # Build model and criterion + if cfg.distributed_training.ddp_backend == "fully_sharded": + with fsdp_enable_wrap(cfg.distributed_training): + model = fsdp_wrap(task.build_model(cfg.model)) + else: + model = task.build_model(cfg.model) + criterion = task.build_criterion(cfg.criterion) + logger.info(model) + logger.info("task: {}".format(task.__class__.__name__)) + logger.info("model: {}".format(model.__class__.__name__)) + logger.info("criterion: {}".format(criterion.__class__.__name__)) + logger.info( + "num. shared model params: {:,} (num. trained: {:,})".format( + sum(p.numel() for p in model.parameters() if not getattr(p, "expert", False)), + sum(p.numel() for p in model.parameters() if not getattr(p, "expert", False) and p.requires_grad) + ) + ) + + logger.info( + "num. expert model params: {} (num. 
trained: {})".format( + sum(p.numel() for p in model.parameters() if getattr(p, "expert", False)), + sum(p.numel() for p in model.parameters() if getattr(p, "expert", False) and p.requires_grad), + ) + ) + + # Load valid dataset (we load training data below, based on the latest checkpoint) + # We load the valid dataset AFTER building the model + data_utils.raise_if_valid_subsets_unintentionally_ignored(cfg) + if cfg.dataset.combine_valid_subsets: + task.load_dataset("valid", combine=True, epoch=1) + else: + for valid_sub_split in cfg.dataset.valid_subset.split(","): + task.load_dataset(valid_sub_split, combine=False, epoch=1) + + # (optionally) Configure quantization + if cfg.common.quantization_config_path is not None: + quantizer = quantization_utils.Quantizer( + config_path=cfg.common.quantization_config_path, + max_epoch=cfg.optimization.max_epoch, + max_update=cfg.optimization.max_update, + ) + else: + quantizer = None + + # Build trainer + if cfg.common.model_parallel_size == 1: + trainer = Trainer(cfg, task, model, criterion, quantizer) + else: + trainer = MegatronTrainer(cfg, task, model, criterion) + logger.info( + "training on {} devices (GPUs/TPUs)".format( + cfg.distributed_training.distributed_world_size + ) + ) + logger.info( + "max tokens per device = {} and max sentences per device = {}".format( + cfg.dataset.max_tokens, + cfg.dataset.batch_size, + ) + ) + + # Load the latest checkpoint if one is available and restore the + # corresponding train iterator + extra_state, epoch_itr = checkpoint_utils.load_checkpoint( + cfg.checkpoint, + trainer, + # don't cache epoch iterators for sharded datasets + disable_iterator_cache=task.has_sharded_data("train"), + ) + if cfg.common.tpu: + import torch_xla.core.xla_model as xm + xm.rendezvous("load_checkpoint") # wait for all workers + + max_epoch = cfg.optimization.max_epoch or math.inf + lr = trainer.get_lr() + + train_meter = meters.StopwatchMeter() + train_meter.start() + while epoch_itr.next_epoch_idx <= max_epoch: + if lr <= cfg.optimization.stop_min_lr: + logger.info( + f"stopping training because current learning rate ({lr}) is smaller " + "than or equal to minimum learning rate " + f"(--stop-min-lr={cfg.optimization.stop_min_lr})" + ) + break + + # train for one epoch + valid_losses, should_stop = train(cfg, trainer, task, epoch_itr) + if should_stop: + break + + # only use first validation loss to update the learning rate + lr = trainer.lr_step(epoch_itr.epoch, valid_losses[0]) + + epoch_itr = trainer.get_train_iterator( + epoch_itr.next_epoch_idx, + # sharded data: get train iterator for next epoch + load_dataset=task.has_sharded_data("train"), + # don't cache epoch iterators for sharded datasets + disable_iterator_cache=task.has_sharded_data("train"), + ) + train_meter.stop() + logger.info("done training in {:.1f} seconds".format(train_meter.sum)) + + # ioPath implementation to wait for all asynchronous file writes to complete. + if cfg.checkpoint.write_checkpoints_asynchronously: + logger.info( + "ioPath PathManager waiting for all asynchronous checkpoint " + "writes to finish." 
+ ) + PathManager.async_close() + logger.info("ioPath PathManager finished waiting.") + + +def should_stop_early(cfg: DictConfig, valid_loss: float) -> bool: + # skip check if no validation was done in the current epoch + if valid_loss is None: + return False + if cfg.checkpoint.patience <= 0: + return False + + def is_better(a, b): + return a > b if cfg.checkpoint.maximize_best_checkpoint_metric else a < b + + prev_best = getattr(should_stop_early, "best", None) + if prev_best is None or is_better(valid_loss, prev_best): + should_stop_early.best = valid_loss + should_stop_early.num_runs = 0 + return False + else: + should_stop_early.num_runs += 1 + if should_stop_early.num_runs >= cfg.checkpoint.patience: + logger.info( + "early stop since valid performance hasn't improved for last {} runs".format( + cfg.checkpoint.patience + ) + ) + return True + else: + return False + + +@metrics.aggregate("train") +def train( + cfg: DictConfig, trainer: Trainer, task: tasks.FairseqTask, epoch_itr +) -> Tuple[List[Optional[float]], bool]: + """Train the model for one epoch and return validation losses.""" + # Initialize data iterator + itr = epoch_itr.next_epoch_itr( + fix_batches_to_gpus=cfg.distributed_training.fix_batches_to_gpus, + shuffle=(epoch_itr.next_epoch_idx > cfg.dataset.curriculum), + ) + update_freq = ( + cfg.optimization.update_freq[epoch_itr.epoch - 1] + if epoch_itr.epoch <= len(cfg.optimization.update_freq) + else cfg.optimization.update_freq[-1] + ) + itr = iterators.GroupedIterator(itr, update_freq) + if cfg.common.tpu: + itr = utils.tpu_data_loader(itr) + progress = progress_bar.progress_bar( + itr, + log_format=cfg.common.log_format, + log_file=cfg.common.log_file, + log_interval=cfg.common.log_interval, + epoch=epoch_itr.epoch, + tensorboard_logdir=( + cfg.common.tensorboard_logdir + if distributed_utils.is_master(cfg.distributed_training) + else None + ), + default_log_format=("tqdm" if not cfg.common.no_progress_bar else "simple"), + wandb_project=( + cfg.common.wandb_project + if distributed_utils.is_master(cfg.distributed_training) + else None + ), + wandb_run_name=os.environ.get( + "WANDB_NAME", os.path.basename(cfg.checkpoint.save_dir) + ), + azureml_logging=( + cfg.common.azureml_logging + if distributed_utils.is_master(cfg.distributed_training) + else False + ), + ) + progress.update_config(_flatten_config(cfg)) + + trainer.begin_epoch(epoch_itr.epoch) + + valid_subsets = cfg.dataset.valid_subset.split(",") + should_stop = False + num_updates = trainer.get_num_updates() + logger.info("Start iterating over samples") + for i, samples in enumerate(progress): + with metrics.aggregate("train_inner"), torch.autograd.profiler.record_function( + "train_step-%d" % i + ): + log_output = trainer.train_step(samples) + + if log_output is not None: # not OOM, overflow, ... 
+ # log mid-epoch stats + num_updates = trainer.get_num_updates() + if num_updates % cfg.common.log_interval == 0: + stats = get_training_stats(metrics.get_smoothed_values("train_inner")) + progress.log(stats, tag="train_inner", step=num_updates) + + # reset mid-epoch stats after each log interval + # the end-of-epoch stats will still be preserved + metrics.reset_meters("train_inner") + + end_of_epoch = not itr.has_next() + valid_losses, should_stop = validate_and_save( + cfg, trainer, task, epoch_itr, valid_subsets, end_of_epoch + ) + + if should_stop: + break + + # log end-of-epoch stats + logger.info("end of epoch {} (average epoch stats below)".format(epoch_itr.epoch)) + stats = get_training_stats(metrics.get_smoothed_values("train")) + progress.print(stats, tag="train", step=num_updates) + + # reset epoch-level meters + metrics.reset_meters("train") + return valid_losses, should_stop + + +def _flatten_config(cfg: DictConfig): + config = OmegaConf.to_container(cfg) + # remove any legacy Namespaces and replace with a single "args" + namespace = None + for k, v in list(config.items()): + if isinstance(v, argparse.Namespace): + namespace = v + del config[k] + if namespace is not None: + config["args"] = vars(namespace) + return config + + +def validate_and_save( + cfg: DictConfig, + trainer: Trainer, + task: tasks.FairseqTask, + epoch_itr, + valid_subsets: List[str], + end_of_epoch: bool, +) -> Tuple[List[Optional[float]], bool]: + num_updates = trainer.get_num_updates() + max_update = cfg.optimization.max_update or math.inf + + # Stopping conditions (and an additional one based on validation loss later + # on) + should_stop = False + if num_updates >= max_update: + should_stop = True + logger.info( + f"Stopping training due to " + f"num_updates: {num_updates} >= max_update: {max_update}" + ) + + training_time_hours = trainer.cumulative_training_time() / (60 * 60) + if ( + cfg.optimization.stop_time_hours > 0 + and training_time_hours > cfg.optimization.stop_time_hours + ): + should_stop = True + logger.info( + f"Stopping training due to " + f"cumulative_training_time: {training_time_hours} > " + f"stop_time_hours: {cfg.optimization.stop_time_hours} hour(s)" + ) + + do_save = ( + (end_of_epoch and epoch_itr.epoch % cfg.checkpoint.save_interval == 0) + or should_stop + or ( + cfg.checkpoint.save_interval_updates > 0 + and num_updates > 0 + and num_updates % cfg.checkpoint.save_interval_updates == 0 + and num_updates >= cfg.dataset.validate_after_updates + ) + ) + do_validate = ( + (not end_of_epoch and do_save) # validate during mid-epoch saves + or (end_of_epoch and epoch_itr.epoch % cfg.dataset.validate_interval == 0) + or should_stop + or ( + cfg.dataset.validate_interval_updates > 0 + and num_updates > 0 + and num_updates % cfg.dataset.validate_interval_updates == 0 + ) + ) and not cfg.dataset.disable_validation and num_updates >= cfg.dataset.validate_after_updates + + # Validate + valid_losses = [None] + if do_validate: + valid_losses = validate(cfg, trainer, task, epoch_itr, valid_subsets) + + should_stop |= should_stop_early(cfg, valid_losses[0]) + + # Save checkpoint + if do_save or should_stop: + checkpoint_utils.save_checkpoint( + cfg.checkpoint, trainer, epoch_itr, valid_losses[0] + ) + + return valid_losses, should_stop + + +def get_training_stats(stats: Dict[str, Any]) -> Dict[str, Any]: + stats["wall"] = round(metrics.get_meter("default", "wall").elapsed_time, 0) + return stats + + +def validate( + cfg: DictConfig, + trainer: Trainer, + task: tasks.FairseqTask, + epoch_itr, + 
subsets: List[str], +) -> List[Optional[float]]: + """Evaluate the model on the validation set(s) and return the losses.""" + + if cfg.dataset.fixed_validation_seed is not None: + # set fixed seed for every validation + utils.set_torch_seed(cfg.dataset.fixed_validation_seed) + + trainer.begin_valid_epoch(epoch_itr.epoch) + valid_losses = [] + for subset in subsets: + logger.info('begin validation on "{}" subset'.format(subset)) + + # Initialize data iterator + itr = trainer.get_valid_iterator(subset).next_epoch_itr( + shuffle=False, set_dataset_epoch=False # use a fixed valid set + ) + if cfg.common.tpu: + itr = utils.tpu_data_loader(itr) + progress = progress_bar.progress_bar( + itr, + log_format=cfg.common.log_format, + log_interval=cfg.common.log_interval, + epoch=epoch_itr.epoch, + prefix=f"valid on '{subset}' subset", + tensorboard_logdir=( + cfg.common.tensorboard_logdir + if distributed_utils.is_master(cfg.distributed_training) + else None + ), + default_log_format=("tqdm" if not cfg.common.no_progress_bar else "simple"), + wandb_project=( + cfg.common.wandb_project + if distributed_utils.is_master(cfg.distributed_training) + else None + ), + wandb_run_name=os.environ.get( + "WANDB_NAME", os.path.basename(cfg.checkpoint.save_dir) + ), + ) + + # create a new root metrics aggregator so validation metrics + # don't pollute other aggregators (e.g., train meters) + with metrics.aggregate(new_root=True) as agg: + for i, sample in enumerate(progress): + if cfg.dataset.max_valid_steps is not None and i > cfg.dataset.max_valid_steps: + break + trainer.valid_step(sample) + + # log validation stats + stats = get_valid_stats(cfg, trainer, agg.get_smoothed_values()) + + if hasattr(task, "post_validate"): + task.post_validate(trainer.get_model(), stats, agg) + + progress.print(stats, tag=subset, step=trainer.get_num_updates()) + + valid_losses.append(stats[cfg.checkpoint.best_checkpoint_metric]) + return valid_losses + + +def get_valid_stats( + cfg: DictConfig, trainer: Trainer, stats: Dict[str, Any] +) -> Dict[str, Any]: + stats["num_updates"] = trainer.get_num_updates() + if hasattr(checkpoint_utils.save_checkpoint, "best"): + key = "best_{0}".format(cfg.checkpoint.best_checkpoint_metric) + best_function = max if cfg.checkpoint.maximize_best_checkpoint_metric else min + stats[key] = best_function( + checkpoint_utils.save_checkpoint.best, + stats[cfg.checkpoint.best_checkpoint_metric], + ) + return stats + + +def cli_main( + modify_parser: Optional[Callable[[argparse.ArgumentParser], None]] = None +) -> None: + parser = options.get_training_parser() + args = options.parse_args_and_arch(parser, modify_parser=modify_parser) + + cfg = convert_namespace_to_omegaconf(args) + + if cfg.common.use_plasma_view: + server = PlasmaStore(path=cfg.common.plasma_path) + logger.info(f"Started plasma server pid {server.server.pid} {cfg.common.plasma_path}") + + if args.profile: + with torch.cuda.profiler.profile(): + with torch.autograd.profiler.emit_nvtx(): + distributed_utils.call_main(cfg, main) + else: + distributed_utils.call_main(cfg, main) + + # if cfg.common.use_plasma_view: + # server.server.kill() + + +if __name__ == "__main__": + cli_main() diff --git a/SpeechT5/fairseq/fairseq_cli/validate.py b/SpeechT5/fairseq/fairseq_cli/validate.py new file mode 100644 index 0000000000000000000000000000000000000000..22b93e9a6a1e1fbcff67075019177110905270f2 --- /dev/null +++ b/SpeechT5/fairseq/fairseq_cli/validate.py @@ -0,0 +1,155 @@ +#!/usr/bin/env python3 -u +# Copyright (c) Facebook, Inc. and its affiliates. 
+# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +import logging +import os +import sys +from argparse import Namespace +from itertools import chain + +import torch +from fairseq import checkpoint_utils, distributed_utils, options, utils +from fairseq.dataclass.utils import convert_namespace_to_omegaconf +from fairseq.logging import metrics, progress_bar +from fairseq.utils import reset_logging +from omegaconf import DictConfig + + +logging.basicConfig( + format="%(asctime)s | %(levelname)s | %(name)s | %(message)s", + datefmt="%Y-%m-%d %H:%M:%S", + level=os.environ.get("LOGLEVEL", "INFO").upper(), + stream=sys.stdout, +) +logger = logging.getLogger("fairseq_cli.validate") + + +def main(cfg: DictConfig, override_args=None): + if isinstance(cfg, Namespace): + cfg = convert_namespace_to_omegaconf(cfg) + + utils.import_user_module(cfg.common) + + reset_logging() + + assert ( + cfg.dataset.max_tokens is not None or cfg.dataset.batch_size is not None + ), "Must specify batch size either with --max-tokens or --batch-size" + + use_fp16 = cfg.common.fp16 + use_cuda = torch.cuda.is_available() and not cfg.common.cpu + + if use_cuda: + torch.cuda.set_device(cfg.distributed_training.device_id) + + if cfg.distributed_training.distributed_world_size > 1: + data_parallel_world_size = distributed_utils.get_data_parallel_world_size() + data_parallel_rank = distributed_utils.get_data_parallel_rank() + else: + data_parallel_world_size = 1 + data_parallel_rank = 0 + + if override_args is not None: + overrides = vars(override_args) + overrides.update(eval(getattr(override_args, "model_overrides", "{}"))) + else: + overrides = None + + # Load ensemble + logger.info("loading model(s) from {}".format(cfg.common_eval.path)) + models, saved_cfg, task = checkpoint_utils.load_model_ensemble_and_task( + [cfg.common_eval.path], + arg_overrides=overrides, + suffix=cfg.checkpoint.checkpoint_suffix, + ) + model = models[0] + + # Move models to GPU + for model in models: + model.eval() + if use_fp16: + model.half() + if use_cuda: + model.cuda() + + # Print args + logger.info(saved_cfg) + + # Build criterion + criterion = task.build_criterion(saved_cfg.criterion) + criterion.eval() + + for subset in cfg.dataset.valid_subset.split(","): + try: + task.load_dataset(subset, combine=False, epoch=1, task_cfg=saved_cfg.task) + dataset = task.dataset(subset) + except KeyError: + raise Exception("Cannot find dataset: " + subset) + + # Initialize data iterator + itr = task.get_batch_iterator( + dataset=dataset, + max_tokens=cfg.dataset.max_tokens, + max_sentences=cfg.dataset.batch_size, + max_positions=utils.resolve_max_positions( + task.max_positions(), + *[m.max_positions() for m in models], + ), + ignore_invalid_inputs=cfg.dataset.skip_invalid_size_inputs_valid_test, + required_batch_size_multiple=cfg.dataset.required_batch_size_multiple, + seed=cfg.common.seed, + num_shards=data_parallel_world_size, + shard_id=data_parallel_rank, + num_workers=cfg.dataset.num_workers, + data_buffer_size=cfg.dataset.data_buffer_size, + ).next_epoch_itr(shuffle=False) + progress = progress_bar.progress_bar( + itr, + log_format=cfg.common.log_format, + log_interval=cfg.common.log_interval, + prefix=f"valid on '{subset}' subset", + default_log_format=("tqdm" if not cfg.common.no_progress_bar else "simple"), + ) + + log_outputs = [] + for i, sample in enumerate(progress): + sample = utils.move_to_cuda(sample) if use_cuda else sample + _loss, _sample_size, log_output = 
task.valid_step(sample, model, criterion) + progress.log(log_output, step=i) + log_outputs.append(log_output) + + if data_parallel_world_size > 1: + log_outputs = distributed_utils.all_gather_list( + log_outputs, + max_size=cfg.common.all_gather_list_size, + group=distributed_utils.get_data_parallel_group(), + ) + log_outputs = list(chain.from_iterable(log_outputs)) + + with metrics.aggregate() as agg: + task.reduce_metrics(log_outputs, criterion) + log_output = agg.get_smoothed_values() + + progress.print(log_output, tag=subset, step=i) + + +def cli_main(): + parser = options.get_validation_parser() + args = options.parse_args_and_arch(parser) + + # only override args that are explicitly given on the command line + override_parser = options.get_validation_parser() + override_args = options.parse_args_and_arch( + override_parser, suppress_defaults=True + ) + + distributed_utils.call_main( + convert_namespace_to_omegaconf(args), main, override_args=override_args + ) + + +if __name__ == "__main__": + cli_main() diff --git a/SpeechT5/fairseq/hubconf.py b/SpeechT5/fairseq/hubconf.py new file mode 100644 index 0000000000000000000000000000000000000000..5949e274edd02e86cb323331211641ce0d0b9b93 --- /dev/null +++ b/SpeechT5/fairseq/hubconf.py @@ -0,0 +1,73 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. +"""isort:skip_file""" + +import functools +import importlib + + +dependencies = [ + "dataclasses", + "hydra", + "numpy", + "omegaconf", + "regex", + "requests", + "torch", +] + + +# Check for required dependencies and raise a RuntimeError if any are missing. +missing_deps = [] +for dep in dependencies: + try: + importlib.import_module(dep) + except ImportError: + # Hack: the hydra package is provided under the "hydra-core" name in + # pypi. We don't want the user mistakenly calling `pip install hydra` + # since that will install an unrelated package. + if dep == "hydra": + dep = "hydra-core" + missing_deps.append(dep) +if len(missing_deps) > 0: + raise RuntimeError("Missing dependencies: {}".format(", ".join(missing_deps))) + + +# only do fairseq imports after checking for dependencies +from fairseq.hub_utils import ( # noqa; noqa + BPEHubInterface as bpe, + TokenizerHubInterface as tokenizer, +) +from fairseq.models import MODEL_REGISTRY # noqa + + +# torch.hub doesn't build Cython components, so if they are not found then try +# to build them here +try: + import fairseq.data.token_block_utils_fast # noqa +except ImportError: + try: + import cython # noqa + import os + from setuptools import sandbox + + sandbox.run_setup( + os.path.join(os.path.dirname(__file__), "setup.py"), + ["build_ext", "--inplace"], + ) + except ImportError: + print( + "Unable to build Cython components. Please make sure Cython is " + "installed if the torch.hub model you are loading depends on it." 
+ ) + + +# automatically expose models defined in FairseqModel::hub_models +for _model_type, _cls in MODEL_REGISTRY.items(): + for model_name in _cls.hub_models().keys(): + globals()[model_name] = functools.partial( + _cls.from_pretrained, + model_name, + ) diff --git a/SpeechT5/fairseq/pyproject.toml b/SpeechT5/fairseq/pyproject.toml new file mode 100644 index 0000000000000000000000000000000000000000..6d1b4c5b6fb56a63069147e3a1de922ce71a45d8 --- /dev/null +++ b/SpeechT5/fairseq/pyproject.toml @@ -0,0 +1,3 @@ +[build-system] +requires = ["setuptools", "wheel", "cython"] +build-backend = "setuptools.build_meta" diff --git a/SpeechT5/fairseq/scripts/__init__.py b/SpeechT5/fairseq/scripts/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 diff --git a/SpeechT5/fairseq/scripts/average_checkpoints.py b/SpeechT5/fairseq/scripts/average_checkpoints.py new file mode 100644 index 0000000000000000000000000000000000000000..c512f802bce6b3395cc42a0e4eb39181e9f8c873 --- /dev/null +++ b/SpeechT5/fairseq/scripts/average_checkpoints.py @@ -0,0 +1,158 @@ +#!/usr/bin/env python3 +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +import argparse +import collections +import os +import re + +import torch +from fairseq.file_io import PathManager + + +def average_checkpoints(inputs): + """Loads checkpoints from inputs and returns a model with averaged weights. + + Args: + inputs: An iterable of string paths of checkpoints to load from. + + Returns: + A dict of string keys mapping to various values. The 'model' key + from the returned dict should correspond to an OrderedDict mapping + string parameter names to torch Tensors. 
+ """ + params_dict = collections.OrderedDict() + params_keys = None + new_state = None + num_models = len(inputs) + + for fpath in inputs: + with PathManager.open(fpath, "rb") as f: + state = torch.load( + f, + map_location=( + lambda s, _: torch.serialization.default_restore_location(s, "cpu") + ), + ) + # Copies over the settings from the first checkpoint + if new_state is None: + new_state = state + + model_params = state["model"] + + model_params_keys = list(model_params.keys()) + if params_keys is None: + params_keys = model_params_keys + elif params_keys != model_params_keys: + raise KeyError( + "For checkpoint {}, expected list of params: {}, " + "but found: {}".format(f, params_keys, model_params_keys) + ) + + for k in params_keys: + p = model_params[k] + if isinstance(p, torch.HalfTensor): + p = p.float() + if k not in params_dict: + params_dict[k] = p.clone() + # NOTE: clone() is needed in case of p is a shared parameter + else: + params_dict[k] += p + + averaged_params = collections.OrderedDict() + for k, v in params_dict.items(): + averaged_params[k] = v + if averaged_params[k].is_floating_point(): + averaged_params[k].div_(num_models) + else: + averaged_params[k] //= num_models + new_state["model"] = averaged_params + return new_state + + +def last_n_checkpoints(paths, n, update_based, upper_bound=None): + assert len(paths) == 1 + path = paths[0] + if update_based: + pt_regexp = re.compile(r"checkpoint_\d+_(\d+)\.pt") + else: + pt_regexp = re.compile(r"checkpoint(\d+)\.pt") + files = PathManager.ls(path) + + entries = [] + for f in files: + m = pt_regexp.fullmatch(f) + if m is not None: + sort_key = int(m.group(1)) + if upper_bound is None or sort_key <= upper_bound: + entries.append((sort_key, m.group(0))) + if len(entries) < n: + raise Exception( + "Found {} checkpoint files but need at least {}", len(entries), n + ) + return [os.path.join(path, x[1]) for x in sorted(entries, reverse=True)[:n]] + + +def main(): + parser = argparse.ArgumentParser( + description="Tool to average the params of input checkpoints to " + "produce a new checkpoint", + ) + # fmt: off + parser.add_argument('--inputs', required=True, nargs='+', + help='Input checkpoint file paths.') + parser.add_argument('--output', required=True, metavar='FILE', + help='Write the new checkpoint containing the averaged weights to this path.') + num_group = parser.add_mutually_exclusive_group() + num_group.add_argument('--num-epoch-checkpoints', type=int, + help='if set, will try to find checkpoints with names checkpoint_xx.pt in the path specified by input, ' + 'and average last this many of them.') + num_group.add_argument('--num-update-checkpoints', type=int, + help='if set, will try to find checkpoints with names checkpoint_ee_xx.pt in the path specified by input, ' + 'and average last this many of them.') + parser.add_argument('--checkpoint-upper-bound', type=int, + help='when using --num-epoch-checkpoints, this will set an upper bound on which epoch to use, ' + 'when using --num-update-checkpoints, this will set an upper bound on which update to use' + 'e.g., with --num-epoch-checkpoints=10 --checkpoint-upper-bound=50, checkpoints 41-50 would be averaged.' 
+ 'e.g., with --num-update-checkpoints=10 --checkpoint-upper-bound=50000, checkpoints 40500-50000 would be averaged assuming --save-interval-updates 500' + ) + # fmt: on + args = parser.parse_args() + print(args) + + num = None + is_update_based = False + if args.num_update_checkpoints is not None: + num = args.num_update_checkpoints + is_update_based = True + elif args.num_epoch_checkpoints is not None: + num = args.num_epoch_checkpoints + + assert args.checkpoint_upper_bound is None or ( + args.num_epoch_checkpoints is not None + or args.num_update_checkpoints is not None + ), "--checkpoint-upper-bound requires --num-epoch-checkpoints or --num-update-checkpoints" + assert ( + args.num_epoch_checkpoints is None or args.num_update_checkpoints is None + ), "Cannot combine --num-epoch-checkpoints and --num-update-checkpoints" + + if num is not None: + args.inputs = last_n_checkpoints( + args.inputs, + num, + is_update_based, + upper_bound=args.checkpoint_upper_bound, + ) + print("averaging checkpoints: ", args.inputs) + + new_state = average_checkpoints(args.inputs) + with PathManager.open(args.output, "wb") as f: + torch.save(new_state, f) + print("Finished writing averaged checkpoint to {}".format(args.output)) + + +if __name__ == "__main__": + main() diff --git a/SpeechT5/fairseq/scripts/build_sym_alignment.py b/SpeechT5/fairseq/scripts/build_sym_alignment.py new file mode 100644 index 0000000000000000000000000000000000000000..0ca5c18f7bd4b0fbf58b203793506ca395466129 --- /dev/null +++ b/SpeechT5/fairseq/scripts/build_sym_alignment.py @@ -0,0 +1,97 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. +""" +Use this script in order to build symmetric alignments for your translation +dataset. +This script depends on fast_align and mosesdecoder tools. You will need to +build those before running the script. +fast_align: + github: http://github.com/clab/fast_align + instructions: follow the instructions in README.md +mosesdecoder: + github: http://github.com/moses-smt/mosesdecoder + instructions: http://www.statmt.org/moses/?n=Development.GetStarted +The script produces the following files under --output_dir: + text.joined - concatenation of lines from the source_file and the + target_file. + align.forward - forward pass of fast_align. + align.backward - backward pass of fast_align. + aligned.sym_heuristic - symmetrized alignment. 
+""" + +import argparse +import os +from itertools import zip_longest + + +def main(): + parser = argparse.ArgumentParser(description="symmetric alignment builer") + # fmt: off + parser.add_argument('--fast_align_dir', + help='path to fast_align build directory') + parser.add_argument('--mosesdecoder_dir', + help='path to mosesdecoder root directory') + parser.add_argument('--sym_heuristic', + help='heuristic to use for symmetrization', + default='grow-diag-final-and') + parser.add_argument('--source_file', + help='path to a file with sentences ' + 'in the source language') + parser.add_argument('--target_file', + help='path to a file with sentences ' + 'in the target language') + parser.add_argument('--output_dir', + help='output directory') + # fmt: on + args = parser.parse_args() + + fast_align_bin = os.path.join(args.fast_align_dir, "fast_align") + symal_bin = os.path.join(args.mosesdecoder_dir, "bin", "symal") + sym_fast_align_bin = os.path.join( + args.mosesdecoder_dir, "scripts", "ems", "support", "symmetrize-fast-align.perl" + ) + + # create joined file + joined_file = os.path.join(args.output_dir, "text.joined") + with open(args.source_file, "r", encoding="utf-8") as src, open( + args.target_file, "r", encoding="utf-8" + ) as tgt: + with open(joined_file, "w", encoding="utf-8") as joined: + for s, t in zip_longest(src, tgt): + print("{} ||| {}".format(s.strip(), t.strip()), file=joined) + + bwd_align_file = os.path.join(args.output_dir, "align.backward") + + # run forward alignment + fwd_align_file = os.path.join(args.output_dir, "align.forward") + fwd_fast_align_cmd = "{FASTALIGN} -i {JOINED} -d -o -v > {FWD}".format( + FASTALIGN=fast_align_bin, JOINED=joined_file, FWD=fwd_align_file + ) + assert os.system(fwd_fast_align_cmd) == 0 + + # run backward alignment + bwd_align_file = os.path.join(args.output_dir, "align.backward") + bwd_fast_align_cmd = "{FASTALIGN} -i {JOINED} -d -o -v -r > {BWD}".format( + FASTALIGN=fast_align_bin, JOINED=joined_file, BWD=bwd_align_file + ) + assert os.system(bwd_fast_align_cmd) == 0 + + # run symmetrization + sym_out_file = os.path.join(args.output_dir, "aligned") + sym_cmd = "{SYMFASTALIGN} {FWD} {BWD} {SRC} {TGT} {OUT} {HEURISTIC} {SYMAL}".format( + SYMFASTALIGN=sym_fast_align_bin, + FWD=fwd_align_file, + BWD=bwd_align_file, + SRC=args.source_file, + TGT=args.target_file, + OUT=sym_out_file, + HEURISTIC=args.sym_heuristic, + SYMAL=symal_bin, + ) + assert os.system(sym_cmd) == 0 + + +if __name__ == "__main__": + main() diff --git a/SpeechT5/fairseq/scripts/compare_namespaces.py b/SpeechT5/fairseq/scripts/compare_namespaces.py new file mode 100644 index 0000000000000000000000000000000000000000..bc24db624f8db36f546c263ba3a806dae6d466bf --- /dev/null +++ b/SpeechT5/fairseq/scripts/compare_namespaces.py @@ -0,0 +1,46 @@ +#!/usr/bin/env python +"""Helper script to compare two argparse.Namespace objects.""" + +from argparse import Namespace # noqa + + +def main(): + + ns1 = eval(input("Namespace 1: ")) + ns2 = eval(input("Namespace 2: ")) + + def keys(ns): + ks = set() + for k in dir(ns): + if not k.startswith("_"): + ks.add(k) + return ks + + k1 = keys(ns1) + k2 = keys(ns2) + + def print_keys(ks, ns1, ns2=None): + for k in ks: + if ns2 is None: + print("{}\t{}".format(k, getattr(ns1, k, None))) + else: + print( + "{}\t{}\t{}".format(k, getattr(ns1, k, None), getattr(ns2, k, None)) + ) + + print("Keys unique to namespace 1:") + print_keys(k1 - k2, ns1) + print() + + print("Keys unique to namespace 2:") + print_keys(k2 - k1, ns2) + print() + + 
print("Overlapping keys with different values:") + ks = [k for k in k1 & k2 if getattr(ns1, k, "None") != getattr(ns2, k, "None")] + print_keys(ks, ns1, ns2) + print() + + +if __name__ == "__main__": + main() diff --git a/SpeechT5/fairseq/scripts/compound_split_bleu.sh b/SpeechT5/fairseq/scripts/compound_split_bleu.sh new file mode 100644 index 0000000000000000000000000000000000000000..1972fddcebff9a43a70bcf14c287175c68f60e3f --- /dev/null +++ b/SpeechT5/fairseq/scripts/compound_split_bleu.sh @@ -0,0 +1,20 @@ +#!/bin/bash + +if [ $# -ne 1 ]; then + echo "usage: $0 GENERATE_PY_OUTPUT" + exit 1 +fi + +GEN=$1 + +SYS=$GEN.sys +REF=$GEN.ref + +if [ $(tail -n 1 $GEN | grep BLEU | wc -l) -ne 1 ]; then + echo "not done generating" + exit +fi + +grep ^H $GEN | awk -F '\t' '{print $NF}' | perl -ple 's{(\S)-(\S)}{$1 ##AT##-##AT## $2}g' > $SYS +grep ^T $GEN | cut -f2- | perl -ple 's{(\S)-(\S)}{$1 ##AT##-##AT## $2}g' > $REF +fairseq-score --sys $SYS --ref $REF diff --git a/SpeechT5/fairseq/scripts/constraints/extract.py b/SpeechT5/fairseq/scripts/constraints/extract.py new file mode 100644 index 0000000000000000000000000000000000000000..f6155d0a0538aadb46bf612256b6b949728de69e --- /dev/null +++ b/SpeechT5/fairseq/scripts/constraints/extract.py @@ -0,0 +1,92 @@ +#!/usr/bin/env python3 +# +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +"""Extracts random constraints from reference files.""" + +import argparse +import random +import sys + +from sacrebleu import extract_ngrams + + +def get_phrase(words, index, length): + assert index < len(words) - length + 1 + phr = " ".join(words[index : index + length]) + for i in range(index, index + length): + words.pop(index) + return phr + + +def main(args): + + if args.seed: + random.seed(args.seed) + + for line in sys.stdin: + constraints = [] + + def add_constraint(constraint): + constraints.append(constraint) + + source = line.rstrip() + if "\t" in line: + source, target = line.split("\t") + if args.add_sos: + target = f"<s> {target}" + if args.add_eos: + target = f"{target} </s>" + + if len(target.split()) >= args.len: + words = [target] + + num = args.number + + choices = {} + for i in range(num): + if len(words) == 0: + break + segmentno = random.choice(range(len(words))) + segment = words.pop(segmentno) + tokens = segment.split() + phrase_index = random.choice(range(len(tokens))) + choice = " ".join( + tokens[phrase_index : min(len(tokens), phrase_index + args.len)] + ) + for j in range( + phrase_index, min(len(tokens), phrase_index + args.len) + ): + tokens.pop(phrase_index) + if phrase_index > 0: + words.append(" ".join(tokens[0:phrase_index])) + if phrase_index + 1 < len(tokens): + words.append(" ".join(tokens[phrase_index:])) + choices[target.find(choice)] = choice + + # mask out with spaces + target = target.replace(choice, " " * len(choice), 1) + + for key in sorted(choices.keys()): + add_constraint(choices[key]) + + print(source, *constraints, sep="\t") + + +if __name__ == "__main__": + parser = argparse.ArgumentParser() + parser.add_argument("--number", "-n", type=int, default=1, help="number of phrases") + parser.add_argument("--len", "-l", type=int, default=1, help="phrase length") + parser.add_argument( + "--add-sos", default=False, action="store_true", help="add <s> token" + ) + parser.add_argument( + "--add-eos", default=False, action="store_true", help="add </s> token" + ) + parser.add_argument("--seed", "-s", 
default=0, type=int) + args = parser.parse_args() + + main(args) diff --git a/SpeechT5/fairseq/scripts/constraints/validate.py b/SpeechT5/fairseq/scripts/constraints/validate.py new file mode 100644 index 0000000000000000000000000000000000000000..d531ad9f39b1df42c98fe8f26ad61fe53a9ac0c5 --- /dev/null +++ b/SpeechT5/fairseq/scripts/constraints/validate.py @@ -0,0 +1,34 @@ +#!/usr/bin/env python3 +# +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +import sys + + +"""Reads in a fairseq output file, and verifies that the constraints +(C- lines) are present in the output (the first H- line). Assumes that +constraints are listed prior to the first hypothesis. +""" + +constraints = [] +found = 0 +total = 0 +for line in sys.stdin: + if line.startswith("C-"): + constraints.append(line.rstrip().split("\t")[1]) + elif line.startswith("H-"): + text = line.split("\t")[2] + + for constraint in constraints: + total += 1 + if constraint in text: + found += 1 + else: + print(f"No {constraint} in {text}", file=sys.stderr) + + constraints = [] + +print(f"Found {found} / {total} = {100 * found / total:.1f}%") diff --git a/SpeechT5/fairseq/scripts/convert_dictionary.lua b/SpeechT5/fairseq/scripts/convert_dictionary.lua new file mode 100644 index 0000000000000000000000000000000000000000..14ee8c997f642c8ff196617c2dcd0584037a60c4 --- /dev/null +++ b/SpeechT5/fairseq/scripts/convert_dictionary.lua @@ -0,0 +1,34 @@ +-- Copyright (c) Facebook, Inc. and its affiliates. +-- +-- This source code is licensed under the MIT license found in the +-- LICENSE file in the root directory of this source tree. +-- +-- Usage: convert_dictionary.lua <dict.th7> +require 'fairseq' +require 'torch' +require 'paths' + +if #arg < 1 then + print('usage: convert_dictionary.lua <dict.th7>') + os.exit(1) +end +if not paths.filep(arg[1]) then + print('error: file does not exit: ' .. arg[1]) + os.exit(1) +end + +dict = torch.load(arg[1]) +dst = paths.basename(arg[1]):gsub('.th7', '.txt') +assert(dst:match('.txt$')) + +f = io.open(dst, 'w') +for idx, symbol in ipairs(dict.index_to_symbol) do + if idx > dict.cutoff then + break + end + f:write(symbol) + f:write(' ') + f:write(dict.index_to_freq[idx]) + f:write('\n') +end +f:close() diff --git a/SpeechT5/fairseq/scripts/convert_model.lua b/SpeechT5/fairseq/scripts/convert_model.lua new file mode 100644 index 0000000000000000000000000000000000000000..61b92139294fb90a25989ebd2ee52a765fb278a2 --- /dev/null +++ b/SpeechT5/fairseq/scripts/convert_model.lua @@ -0,0 +1,108 @@ +-- Copyright (c) Facebook, Inc. and its affiliates. +-- +-- This source code is licensed under the MIT license found in the +-- LICENSE file in the root directory of this source tree. +-- +-- Usage: convert_model.lua <model_epoch1.th7> +require 'torch' +local fairseq = require 'fairseq' + +model = torch.load(arg[1]) + +function find_weight_norm(container, module) + for _, wn in ipairs(container:listModules()) do + if torch.type(wn) == 'nn.WeightNorm' and wn.modules[1] == module then + return wn + end + end +end + +function push_state(dict, key, module) + if torch.type(module) == 'nn.Linear' then + local wn = find_weight_norm(model.module, module) + assert(wn) + dict[key .. '.weight_v'] = wn.v:float() + dict[key .. 
'.weight_g'] = wn.g:float() + elseif torch.type(module) == 'nn.TemporalConvolutionTBC' then + local wn = find_weight_norm(model.module, module) + assert(wn) + local v = wn.v:float():view(wn.viewOut):transpose(2, 3) + dict[key .. '.weight_v'] = v + dict[key .. '.weight_g'] = wn.g:float():view(module.weight:size(3), 1, 1) + else + dict[key .. '.weight'] = module.weight:float() + end + if module.bias then + dict[key .. '.bias'] = module.bias:float() + end +end + +encoder_dict = {} +decoder_dict = {} +combined_dict = {} + +function encoder_state(encoder) + luts = encoder:findModules('nn.LookupTable') + push_state(encoder_dict, 'embed_tokens', luts[1]) + push_state(encoder_dict, 'embed_positions', luts[2]) + + fcs = encoder:findModules('nn.Linear') + assert(#fcs >= 2) + local nInputPlane = fcs[1].weight:size(1) + push_state(encoder_dict, 'fc1', table.remove(fcs, 1)) + push_state(encoder_dict, 'fc2', table.remove(fcs, #fcs)) + + for i, module in ipairs(encoder:findModules('nn.TemporalConvolutionTBC')) do + push_state(encoder_dict, 'convolutions.' .. tostring(i - 1), module) + if nInputPlane ~= module.weight:size(3) / 2 then + push_state(encoder_dict, 'projections.' .. tostring(i - 1), table.remove(fcs, 1)) + end + nInputPlane = module.weight:size(3) / 2 + end + assert(#fcs == 0) +end + +function decoder_state(decoder) + luts = decoder:findModules('nn.LookupTable') + push_state(decoder_dict, 'embed_tokens', luts[1]) + push_state(decoder_dict, 'embed_positions', luts[2]) + + fcs = decoder:findModules('nn.Linear') + local nInputPlane = fcs[1].weight:size(1) + push_state(decoder_dict, 'fc1', table.remove(fcs, 1)) + push_state(decoder_dict, 'fc2', fcs[#fcs - 1]) + push_state(decoder_dict, 'fc3', fcs[#fcs]) + + table.remove(fcs, #fcs) + table.remove(fcs, #fcs) + + for i, module in ipairs(decoder:findModules('nn.TemporalConvolutionTBC')) do + if nInputPlane ~= module.weight:size(3) / 2 then + push_state(decoder_dict, 'projections.' .. tostring(i - 1), table.remove(fcs, 1)) + end + nInputPlane = module.weight:size(3) / 2 + + local prefix = 'attention.' .. tostring(i - 1) + push_state(decoder_dict, prefix .. '.in_projection', table.remove(fcs, 1)) + push_state(decoder_dict, prefix .. '.out_projection', table.remove(fcs, 1)) + push_state(decoder_dict, 'convolutions.' .. tostring(i - 1), module) + end + assert(#fcs == 0) +end + + +_encoder = model.module.modules[2] +_decoder = model.module.modules[3] + +encoder_state(_encoder) +decoder_state(_decoder) + +for k, v in pairs(encoder_dict) do + combined_dict['encoder.' .. k] = v +end +for k, v in pairs(decoder_dict) do + combined_dict['decoder.' .. k] = v +end + + +torch.save('state_dict.t7', combined_dict) diff --git a/SpeechT5/fairseq/scripts/count_docs.py b/SpeechT5/fairseq/scripts/count_docs.py new file mode 100644 index 0000000000000000000000000000000000000000..58d85af85e91377a34dbd01f7674436152fd08e8 --- /dev/null +++ b/SpeechT5/fairseq/scripts/count_docs.py @@ -0,0 +1,58 @@ +#!/usr/bin/env python3 +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. +""" +Count the number of documents and average number of lines and tokens per +document in a large file. Documents should be separated by a single empty line. 
+""" + +import argparse +import gzip +import sys + +import numpy as np + + +def main(): + parser = argparse.ArgumentParser() + parser.add_argument("input") + parser.add_argument("--gzip", action="store_true") + args = parser.parse_args() + + def gopen(): + if args.gzip: + return gzip.open(args.input, "r") + else: + return open(args.input, "r", encoding="utf-8") + + num_lines = [] + num_toks = [] + with gopen() as h: + num_docs = 1 + num_lines_in_doc = 0 + num_toks_in_doc = 0 + for i, line in enumerate(h): + if len(line.strip()) == 0: # empty line indicates new document + num_docs += 1 + num_lines.append(num_lines_in_doc) + num_toks.append(num_toks_in_doc) + num_lines_in_doc = 0 + num_toks_in_doc = 0 + else: + num_lines_in_doc += 1 + num_toks_in_doc += len(line.rstrip().split()) + if i % 1000000 == 0: + print(i, file=sys.stderr, end="", flush=True) + elif i % 100000 == 0: + print(".", file=sys.stderr, end="", flush=True) + print(file=sys.stderr, flush=True) + + print("found {} docs".format(num_docs)) + print("average num lines per doc: {}".format(np.mean(num_lines))) + print("average num toks per doc: {}".format(np.mean(num_toks))) + + +if __name__ == "__main__": + main() diff --git a/SpeechT5/fairseq/scripts/read_binarized.py b/SpeechT5/fairseq/scripts/read_binarized.py new file mode 100644 index 0000000000000000000000000000000000000000..a414095d03fb022a6753e816fc8bfd80e11db24d --- /dev/null +++ b/SpeechT5/fairseq/scripts/read_binarized.py @@ -0,0 +1,48 @@ +#!/usr/bin/env python3 +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +import argparse + +from fairseq.data import Dictionary, data_utils, indexed_dataset + + +def get_parser(): + parser = argparse.ArgumentParser( + description="writes text from binarized file to stdout" + ) + # fmt: off + parser.add_argument('--dataset-impl', help='dataset implementation', + choices=indexed_dataset.get_available_dataset_impl()) + parser.add_argument('--dict', metavar='FP', help='dictionary containing known words', default=None) + parser.add_argument('--input', metavar='FP', required=True, help='binarized file to read') + # fmt: on + + return parser + + +def main(): + parser = get_parser() + args = parser.parse_args() + + dictionary = Dictionary.load(args.dict) if args.dict is not None else None + dataset = data_utils.load_indexed_dataset( + args.input, + dictionary, + dataset_impl=args.dataset_impl, + default="lazy", + ) + + for tensor_line in dataset: + if dictionary is None: + line = " ".join([str(int(x)) for x in tensor_line]) + else: + line = dictionary.string(tensor_line) + + print(line) + + +if __name__ == "__main__": + main() diff --git a/SpeechT5/fairseq/scripts/rm_pt.py b/SpeechT5/fairseq/scripts/rm_pt.py new file mode 100644 index 0000000000000000000000000000000000000000..6cd063d21f0610fa7c42c2cfb2ee8af7c9c78677 --- /dev/null +++ b/SpeechT5/fairseq/scripts/rm_pt.py @@ -0,0 +1,141 @@ +#!/usr/bin/env python3 +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. 
+ +import argparse +import os +import re +import shutil +import sys + + +pt_regexp = re.compile(r"checkpoint(\d+|_\d+_\d+|_[a-z]+)\.pt") +pt_regexp_epoch_based = re.compile(r"checkpoint(\d+)\.pt") +pt_regexp_update_based = re.compile(r"checkpoint_\d+_(\d+)\.pt") + + +def parse_checkpoints(files): + entries = [] + for f in files: + m = pt_regexp_epoch_based.fullmatch(f) + if m is not None: + entries.append((int(m.group(1)), m.group(0))) + else: + m = pt_regexp_update_based.fullmatch(f) + if m is not None: + entries.append((int(m.group(1)), m.group(0))) + return entries + + +def last_n_checkpoints(files, n): + entries = parse_checkpoints(files) + return [x[1] for x in sorted(entries, reverse=True)[:n]] + + +def every_n_checkpoints(files, n): + entries = parse_checkpoints(files) + return [x[1] for x in sorted(sorted(entries)[::-n])] + + +def main(): + parser = argparse.ArgumentParser( + description=( + "Recursively delete checkpoint files from `root_dir`, " + "but preserve checkpoint_best.pt and checkpoint_last.pt" + ) + ) + parser.add_argument("root_dirs", nargs="*") + parser.add_argument( + "--save-last", type=int, default=0, help="number of last checkpoints to save" + ) + parser.add_argument( + "--save-every", type=int, default=0, help="interval of checkpoints to save" + ) + parser.add_argument( + "--preserve-test", + action="store_true", + help="preserve checkpoints in dirs that start with test_ prefix (default: delete them)", + ) + parser.add_argument( + "--delete-best", action="store_true", help="delete checkpoint_best.pt" + ) + parser.add_argument( + "--delete-last", action="store_true", help="delete checkpoint_last.pt" + ) + parser.add_argument( + "--no-dereference", action="store_true", help="don't dereference symlinks" + ) + args = parser.parse_args() + + files_to_desymlink = [] + files_to_preserve = [] + files_to_delete = [] + for root_dir in args.root_dirs: + for root, _subdirs, files in os.walk(root_dir): + if args.save_last > 0: + to_save = last_n_checkpoints(files, args.save_last) + else: + to_save = [] + if args.save_every > 0: + to_save += every_n_checkpoints(files, args.save_every) + for file in files: + if not pt_regexp.fullmatch(file): + continue + full_path = os.path.join(root, file) + if ( + not os.path.basename(root).startswith("test_") or args.preserve_test + ) and ( + (file == "checkpoint_last.pt" and not args.delete_last) + or (file == "checkpoint_best.pt" and not args.delete_best) + or file in to_save + ): + if os.path.islink(full_path) and not args.no_dereference: + files_to_desymlink.append(full_path) + else: + files_to_preserve.append(full_path) + else: + files_to_delete.append(full_path) + + if len(files_to_desymlink) == 0 and len(files_to_delete) == 0: + print("Nothing to do.") + sys.exit(0) + + files_to_desymlink = sorted(files_to_desymlink) + files_to_preserve = sorted(files_to_preserve) + files_to_delete = sorted(files_to_delete) + + print("Operations to perform (in order):") + if len(files_to_desymlink) > 0: + for file in files_to_desymlink: + print(" - preserve (and dereference symlink): " + file) + if len(files_to_preserve) > 0: + for file in files_to_preserve: + print(" - preserve: " + file) + if len(files_to_delete) > 0: + for file in files_to_delete: + print(" - delete: " + file) + while True: + resp = input("Continue? 
(Y/N): ") + if resp.strip().lower() == "y": + break + elif resp.strip().lower() == "n": + sys.exit(0) + + print("Executing...") + if len(files_to_desymlink) > 0: + for file in files_to_desymlink: + realpath = os.path.realpath(file) + print("rm " + file) + os.remove(file) + print("cp {} {}".format(realpath, file)) + shutil.copyfile(realpath, file) + if len(files_to_delete) > 0: + for file in files_to_delete: + print("rm " + file) + os.remove(file) + + +if __name__ == "__main__": + main() diff --git a/SpeechT5/fairseq/scripts/sacrebleu.sh b/SpeechT5/fairseq/scripts/sacrebleu.sh new file mode 100644 index 0000000000000000000000000000000000000000..c10bf2b76ea032deabab6f5c9d8a3e1e884f1642 --- /dev/null +++ b/SpeechT5/fairseq/scripts/sacrebleu.sh @@ -0,0 +1,27 @@ +#!/bin/bash + +if [ $# -ne 4 ]; then + echo "usage: $0 TESTSET SRCLANG TGTLANG GEN" + exit 1 +fi + +TESTSET=$1 +SRCLANG=$2 +TGTLANG=$3 + +GEN=$4 + +if ! command -v sacremoses &> /dev/null +then + echo "sacremoses could not be found, please install with: pip install sacremoses" + exit +fi + +grep ^H $GEN \ +| sed 's/^H\-//' \ +| sort -n -k 1 \ +| cut -f 3 \ +| sacremoses detokenize \ +> $GEN.sorted.detok + +sacrebleu --test-set $TESTSET --language-pair "${SRCLANG}-${TGTLANG}" < $GEN.sorted.detok diff --git a/SpeechT5/fairseq/scripts/shard_docs.py b/SpeechT5/fairseq/scripts/shard_docs.py new file mode 100644 index 0000000000000000000000000000000000000000..97232c3c845ee01dc5ab627388934cc0f9588280 --- /dev/null +++ b/SpeechT5/fairseq/scripts/shard_docs.py @@ -0,0 +1,54 @@ +#!/usr/bin/env python3 +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. +""" +Split a large file into shards while respecting document boundaries. Documents +should be separated by a single empty line. +""" + +import argparse +import contextlib + + +def main(): + parser = argparse.ArgumentParser() + parser.add_argument("input") + parser.add_argument("--num-shards", type=int) + args = parser.parse_args() + + assert args.num_shards is not None and args.num_shards > 1 + + with open(args.input, "r", encoding="utf-8") as h: + with contextlib.ExitStack() as stack: + outputs = [ + stack.enter_context( + open(args.input + ".shard" + str(i), "w", encoding="utf-8") + ) + for i in range(args.num_shards) + ] + + doc = [] + first_doc = [True] * args.num_shards + + def output_doc(i): + if not first_doc[i]: + outputs[i].write("\n") + first_doc[i] = False + for line in doc: + outputs[i].write(line) + doc.clear() + + num_docs = 0 + for line in h: + if line.strip() == "": # empty line indicates new document + output_doc(num_docs % args.num_shards) + num_docs += 1 + else: + doc.append(line) + output_doc(num_docs % args.num_shards) + + +if __name__ == "__main__": + main() diff --git a/SpeechT5/fairseq/scripts/split_train_valid_docs.py b/SpeechT5/fairseq/scripts/split_train_valid_docs.py new file mode 100644 index 0000000000000000000000000000000000000000..ff159785284a13b44626b207d84430c592acaf8f --- /dev/null +++ b/SpeechT5/fairseq/scripts/split_train_valid_docs.py @@ -0,0 +1,86 @@ +#!/usr/bin/env python3 +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. +""" +Split a large file into a train and valid set while respecting document +boundaries. Documents should be separated by a single empty line. 
+""" + +import argparse +import random +import sys + + +def main(): + parser = argparse.ArgumentParser() + parser.add_argument("input") + parser.add_argument("sample_output", help="train output file") + parser.add_argument("remainder_output", help="valid output file") + parser.add_argument("-k", type=int, help="remainder size") + parser.add_argument( + "--lines", action="store_true", help="split lines instead of docs" + ) + args = parser.parse_args() + + assert args.k is not None + + sample = [] + remainder = [] + num_docs = [0] + + def update_sample(doc): + if len(sample) < args.k: + sample.append(doc.copy()) + else: + i = num_docs[0] + j = random.randrange(i + 1) + if j < args.k: + remainder.append(sample[j]) + sample[j] = doc.copy() + else: + remainder.append(doc.copy()) + num_docs[0] += 1 + doc.clear() + + with open(args.input, "r", encoding="utf-8") as h: + doc = [] + for i, line in enumerate(h): + if line.strip() == "": # empty line indicates new document + update_sample(doc) + else: + doc.append(line) + if args.lines: + update_sample(doc) + if i % 1000000 == 0: + print(i, file=sys.stderr, end="", flush=True) + elif i % 100000 == 0: + print(".", file=sys.stderr, end="", flush=True) + if len(doc) > 0: + update_sample(doc) + print(file=sys.stderr, flush=True) + + assert len(sample) == args.k + + with open(args.sample_output, "w", encoding="utf-8") as out: + first = True + for doc in sample: + if not first and not args.lines: + out.write("\n") + first = False + for line in doc: + out.write(line) + + with open(args.remainder_output, "w", encoding="utf-8") as out: + first = True + for doc in remainder: + if not first and not args.lines: + out.write("\n") + first = False + for line in doc: + out.write(line) + + +if __name__ == "__main__": + main() diff --git a/SpeechT5/fairseq/scripts/spm_decode.py b/SpeechT5/fairseq/scripts/spm_decode.py new file mode 100644 index 0000000000000000000000000000000000000000..1c18b1d2a7d7628b7aeb6fdb6c4ab5a096e9edf8 --- /dev/null +++ b/SpeechT5/fairseq/scripts/spm_decode.py @@ -0,0 +1,53 @@ +#!/usr/bin/env python +# Copyright (c) Facebook, Inc. and its affiliates. +# All rights reserved. +# +# This source code is licensed under the license found in the +# LICENSE file in the root directory of this source tree. 
+ +from __future__ import absolute_import, division, print_function, unicode_literals + +import argparse + +import sentencepiece as spm + + +def main(): + parser = argparse.ArgumentParser() + parser.add_argument( + "--model", required=True, help="sentencepiece model to use for decoding" + ) + parser.add_argument("--input", required=True, help="input file to decode") + parser.add_argument("--input_format", choices=["piece", "id"], default="piece") + args = parser.parse_args() + + sp = spm.SentencePieceProcessor() + sp.Load(args.model) + + if args.input_format == "piece": + + def decode(l): + return "".join(sp.DecodePieces(l)) + + elif args.input_format == "id": + + def decode(l): + return "".join(sp.DecodeIds(l)) + + else: + raise NotImplementedError + + def tok2int(tok): + # remap reference-side <unk> (represented as <<unk>>) to 0 + return int(tok) if tok != "<<unk>>" else 0 + + with open(args.input, "r", encoding="utf-8") as h: + for line in h: + if args.input_format == "id": + print(decode(list(map(tok2int, line.rstrip().split())))) + elif args.input_format == "piece": + print(decode(line.rstrip().split())) + + +if __name__ == "__main__": + main() diff --git a/SpeechT5/fairseq/scripts/spm_encode.py b/SpeechT5/fairseq/scripts/spm_encode.py new file mode 100644 index 0000000000000000000000000000000000000000..83facfb3b184aff8b9cc3f0c82dd53668c63e57b --- /dev/null +++ b/SpeechT5/fairseq/scripts/spm_encode.py @@ -0,0 +1,119 @@ +#!/usr/bin/env python +# Copyright (c) Facebook, Inc. and its affiliates. +# All rights reserved. +# +# This source code is licensed under the license found in the +# LICENSE file in the root directory of this source tree. + +from __future__ import absolute_import, division, print_function, unicode_literals + +import argparse +import contextlib +import sys + +import sentencepiece as spm + + +def main(): + parser = argparse.ArgumentParser() + parser.add_argument( + "--model", required=True, help="sentencepiece model to use for encoding" + ) + parser.add_argument( + "--inputs", nargs="+", default=["-"], help="input files to filter/encode" + ) + parser.add_argument( + "--outputs", nargs="+", default=["-"], help="path to save encoded outputs" + ) + parser.add_argument("--output_format", choices=["piece", "id"], default="piece") + parser.add_argument( + "--min-len", + type=int, + metavar="N", + help="filter sentence pairs with fewer than N tokens", + ) + parser.add_argument( + "--max-len", + type=int, + metavar="N", + help="filter sentence pairs with more than N tokens", + ) + args = parser.parse_args() + + assert len(args.inputs) == len( + args.outputs + ), "number of input and output paths should match" + + sp = spm.SentencePieceProcessor() + sp.Load(args.model) + + if args.output_format == "piece": + + def encode(l): + return sp.EncodeAsPieces(l) + + elif args.output_format == "id": + + def encode(l): + return list(map(str, sp.EncodeAsIds(l))) + + else: + raise NotImplementedError + + if args.min_len is not None or args.max_len is not None: + + def valid(line): + return (args.min_len is None or len(line) >= args.min_len) and ( + args.max_len is None or len(line) <= args.max_len + ) + + else: + + def valid(lines): + return True + + with contextlib.ExitStack() as stack: + inputs = [ + stack.enter_context(open(input, "r", encoding="utf-8")) + if input != "-" + else sys.stdin + for input in args.inputs + ] + outputs = [ + stack.enter_context(open(output, "w", encoding="utf-8")) + if output != "-" + else sys.stdout + for output in args.outputs + ] + + stats = { + "num_empty": 
0, + "num_filtered": 0, + } + + def encode_line(line): + line = line.strip() + if len(line) > 0: + line = encode(line) + if valid(line): + return line + else: + stats["num_filtered"] += 1 + else: + stats["num_empty"] += 1 + return None + + for i, lines in enumerate(zip(*inputs), start=1): + enc_lines = list(map(encode_line, lines)) + if not any(enc_line is None for enc_line in enc_lines): + for enc_line, output_h in zip(enc_lines, outputs): + print(" ".join(enc_line), file=output_h) + if i % 10000 == 0: + print("processed {} lines".format(i), file=sys.stderr) + + print("skipped {} empty lines".format(stats["num_empty"]), file=sys.stderr) + print("filtered {} lines".format(stats["num_filtered"]), file=sys.stderr) + + +if __name__ == "__main__": + main() diff --git a/SpeechT5/fairseq/scripts/spm_train.py b/SpeechT5/fairseq/scripts/spm_train.py new file mode 100644 index 0000000000000000000000000000000000000000..9db668fd4166a860198784990de68ea26157995d --- /dev/null +++ b/SpeechT5/fairseq/scripts/spm_train.py @@ -0,0 +1,16 @@ +#!/usr/bin/env python +# Copyright (c) Facebook, Inc. and its affiliates. +# All rights reserved. +# +# This source code is licensed under the license found in the +# LICENSE file in the root directory of this source tree. + +from __future__ import absolute_import, division, print_function, unicode_literals + +import sys + +import sentencepiece as spm + + +if __name__ == "__main__": + spm.SentencePieceTrainer.Train(" ".join(sys.argv[1:])) diff --git a/SpeechT5/fairseq/scripts/test_fsdp.sh b/SpeechT5/fairseq/scripts/test_fsdp.sh new file mode 100644 index 0000000000000000000000000000000000000000..1f428a035e4474427ded991f8e8307ea59f61f69 --- /dev/null +++ b/SpeechT5/fairseq/scripts/test_fsdp.sh @@ -0,0 +1,24 @@ +#!/usr/bin/env bash +rm -rf fsdp_dummy +mkdir -p fsdp_dummy +CUDA_VISIBLE_DEVICES=0,1,2,3 fairseq-train /private/home/sshleifer/data-bin/stories_mmap \ + --ddp-backend fully_sharded --fp16 --fp16-init-scale 4 \ + --cpu-offload --checkpoint-activations \ + --task language_modeling --tokens-per-sample 256 --batch-size 8 \ + --arch transformer_lm_gpt2_tiny \ + --optimizer cpu_adam --adam-betas "(0.9,0.98)" \ + --lr 0.0001 --lr-scheduler polynomial_decay --warmup-updates 5 --total-num-update 10 \ + --max-update 5 --log-format json --log-interval 1 \ + --save-interval-updates 5 --save-dir fsdp_dummy --disable-validation \ + --restore-file x.pt "$@" + +# Now we try to load the checkpoint +CUDA_VISIBLE_DEVICES=0,1 fairseq-train /private/home/sshleifer/data-bin/stories_mmap \ + --ddp-backend fully_sharded --fp16 --fp16-init-scale 4 \ + --cpu-offload --checkpoint-activations \ + --task language_modeling --tokens-per-sample 256 --batch-size 8 \ + --arch transformer_lm_gpt2_tiny \ + --optimizer cpu_adam --adam-betas "(0.9,0.98)" \ + --lr 0.0001 --lr-scheduler polynomial_decay --warmup-updates 5 --total-num-update 10 \ + --max-update 2 --log-format json --log-interval 1 \ + --save-interval-updates 2 --save-dir fsdp_dummy diff --git a/SpeechT5/fairseq/setup.py b/SpeechT5/fairseq/setup.py new file mode 100644 index 0000000000000000000000000000000000000000..51e555229c6111616362583731b181125e489ad7 --- /dev/null +++ b/SpeechT5/fairseq/setup.py @@ -0,0 +1,271 @@ +#!/usr/bin/env python3 +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. 
+ +import os +import subprocess +import sys +from setuptools import setup, find_packages, Extension + +from setuptools import Extension, find_packages, setup + + +if sys.version_info < (3, 6): + sys.exit("Sorry, Python >= 3.6 is required for fairseq.") + + +def write_version_py(): + with open(os.path.join("fairseq", "version.txt")) as f: + version = f.read().strip() + + # append latest commit hash to version string + try: + sha = ( + subprocess.check_output(["git", "rev-parse", "HEAD"]) + .decode("ascii") + .strip() + ) + version += "+" + sha[:7] + except Exception: + pass + + # write version info to fairseq/version.py + with open(os.path.join("fairseq", "version.py"), "w") as f: + f.write('__version__ = "{}"\n'.format(version)) + return version + + +version = write_version_py() + + +with open("README.md") as f: + readme = f.read() + + +if sys.platform == "darwin": + extra_compile_args = ["-stdlib=libc++", "-O3"] +else: + extra_compile_args = ["-std=c++11", "-O3"] + + +class NumpyExtension(Extension): + """Source: https://stackoverflow.com/a/54128391""" + + def __init__(self, *args, **kwargs): + self.__include_dirs = [] + super().__init__(*args, **kwargs) + + @property + def include_dirs(self): + import numpy + + return self.__include_dirs + [numpy.get_include()] + + @include_dirs.setter + def include_dirs(self, dirs): + self.__include_dirs = dirs + + +extensions = [ + Extension( + "fairseq.libbleu", + sources=[ + "fairseq/clib/libbleu/libbleu.cpp", + "fairseq/clib/libbleu/module.cpp", + ], + extra_compile_args=extra_compile_args, + ), + NumpyExtension( + "fairseq.data.data_utils_fast", + sources=["fairseq/data/data_utils_fast.pyx"], + language="c++", + extra_compile_args=extra_compile_args, + ), + NumpyExtension( + "fairseq.data.token_block_utils_fast", + sources=["fairseq/data/token_block_utils_fast.pyx"], + language="c++", + extra_compile_args=extra_compile_args, + ), +] + + +cmdclass = {} + + +try: + # torch is not available when generating docs + from torch.utils import cpp_extension + + extensions.extend( + [ + cpp_extension.CppExtension( + "fairseq.libbase", + sources=[ + "fairseq/clib/libbase/balanced_assignment.cpp", + ], + ) + ] + ) + + extensions.extend( + [ + cpp_extension.CppExtension( + "fairseq.libnat", + sources=[ + "fairseq/clib/libnat/edit_dist.cpp", + ], + ) + ] + ) + if "CUDA_HOME" in os.environ: + extensions.extend( + [ + cpp_extension.CppExtension( + "fairseq.libnat_cuda", + sources=[ + "fairseq/clib/libnat_cuda/edit_dist.cu", + "fairseq/clib/libnat_cuda/binding.cpp", + ], + ), + cpp_extension.CppExtension( + "fairseq.ngram_repeat_block_cuda", + sources=[ + "fairseq/clib/cuda/ngram_repeat_block_cuda.cpp", + "fairseq/clib/cuda/ngram_repeat_block_cuda_kernel.cu", + ], + ), + ] + ) + cmdclass["build_ext"] = cpp_extension.BuildExtension + +except ImportError: + pass + + +if "READTHEDOCS" in os.environ: + # don't build extensions when generating docs + extensions = [] + if "build_ext" in cmdclass: + del cmdclass["build_ext"] + + # use CPU build of PyTorch + dependency_links = [ + "https://download.pytorch.org/whl/cpu/torch-1.7.0%2Bcpu-cp36-cp36m-linux_x86_64.whl" + ] +else: + dependency_links = [] + + +if "clean" in sys.argv[1:]: + # Source: https://bit.ly/2NLVsgE + print("deleting Cython files...") + import subprocess + + subprocess.run( + ["rm -f fairseq/*.so fairseq/**/*.so fairseq/*.pyd fairseq/**/*.pyd"], + shell=True, + ) + + +extra_packages = [] +if os.path.exists(os.path.join("fairseq", "model_parallel", "megatron", "mpu")): + 
extra_packages.append("fairseq.model_parallel.megatron.mpu") + + +def do_setup(package_data): + setup( + name="fairseq", + version=version, + description="Facebook AI Research Sequence-to-Sequence Toolkit", + url="https://github.com/pytorch/fairseq", + classifiers=[ + "Intended Audience :: Science/Research", + "License :: OSI Approved :: MIT License", + "Programming Language :: Python :: 3.6", + "Programming Language :: Python :: 3.7", + "Programming Language :: Python :: 3.8", + "Topic :: Scientific/Engineering :: Artificial Intelligence", + ], + long_description=readme, + long_description_content_type="text/markdown", + setup_requires=[ + "cython", + 'numpy<1.20.0; python_version<"3.7"', + 'numpy; python_version>="3.7"', + "setuptools>=18.0", + ], + install_requires=[ + "cffi", + "cython", + 'dataclasses; python_version<"3.7"', + "hydra-core<1.1", + "omegaconf<2.1", + 'numpy<1.20.0; python_version<"3.7"', + 'numpy; python_version>="3.7"', + "regex", + "sacrebleu>=1.4.12", + "torch", + "tqdm", + ], + dependency_links=dependency_links, + packages=find_packages( + exclude=[ + "examples", + "examples.*", + "scripts", + "scripts.*", + "tests", + "tests.*", + ] + ) + + extra_packages, + package_data=package_data, + ext_modules=extensions, + test_suite="tests", + entry_points={ + "console_scripts": [ + "fairseq-eval-lm = fairseq_cli.eval_lm:cli_main", + "fairseq-generate = fairseq_cli.generate:cli_main", + "fairseq-hydra-train = fairseq_cli.hydra_train:cli_main", + "fairseq-interactive = fairseq_cli.interactive:cli_main", + "fairseq-preprocess = fairseq_cli.preprocess:cli_main", + "fairseq-score = fairseq_cli.score:cli_main", + "fairseq-train = fairseq_cli.train:cli_main", + "fairseq-validate = fairseq_cli.validate:cli_main", + ], + }, + cmdclass=cmdclass, + zip_safe=False, + ) + + +def get_files(path, relative_to="fairseq"): + all_files = [] + for root, _dirs, files in os.walk(path, followlinks=True): + root = os.path.relpath(root, relative_to) + for file in files: + if file.endswith(".pyc"): + continue + all_files.append(os.path.join(root, file)) + return all_files + + +if __name__ == "__main__": + try: + # symlink examples into fairseq package so package_data accepts them + fairseq_examples = os.path.join("fairseq", "examples") + if "build_ext" not in sys.argv[1:] and not os.path.exists(fairseq_examples): + os.symlink(os.path.join("..", "examples"), fairseq_examples) + + package_data = { + "fairseq": ( + get_files(fairseq_examples) + get_files(os.path.join("fairseq", "config")) + ) + } + do_setup(package_data) + finally: + if "build_ext" not in sys.argv[1:] and os.path.islink(fairseq_examples): + os.unlink(fairseq_examples) diff --git a/SpeechT5/fairseq/tests/__init__.py b/SpeechT5/fairseq/tests/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 diff --git a/SpeechT5/fairseq/tests/distributed/__init__.py b/SpeechT5/fairseq/tests/distributed/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 diff --git a/SpeechT5/fairseq/tests/distributed/test_bmuf.py b/SpeechT5/fairseq/tests/distributed/test_bmuf.py new file mode 100644 index 0000000000000000000000000000000000000000..8b7cadb094d49587b6b82432248459fdcf42457e --- /dev/null +++ b/SpeechT5/fairseq/tests/distributed/test_bmuf.py @@ -0,0 +1,207 @@ +# Copyright (c) Facebook, Inc. and its affiliates. 
+# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +import argparse +import functools +import random +import unittest +from multiprocessing import Manager + +import torch +import torch.nn as nn +from fairseq import optim +from fairseq.distributed import utils as distributed_utils +from omegaconf import OmegaConf + + +class Model(nn.Module): + def __init__(self, input_size, output_size): + super(Model, self).__init__() + self.fc = nn.Linear(input_size, output_size) + + def forward(self, input): + output = self.fc(input) + return output + + +def setup_model_loss_criterion(cfg, args, rank, is_cuda): + """ + setup model, criterion and optimizer based on input args + """ + args.distributed_rank = rank + cfg.distributed_training.distributed_rank = args.distributed_rank + if cfg.distributed_training.distributed_world_size > 1: + distributed_utils.distributed_init(cfg) + torch.manual_seed(1) + model = Model(args.input_size, args.nb_classes) + loss_fn = nn.CrossEntropyLoss() + if is_cuda: + model = model.cuda() + loss_fn = loss_fn.cuda() + + optimizer = optim.sgd.SGD(args, model.parameters()) + optimizer = optim.FairseqBMUF( + cfg=cfg.bmuf, + optimizer=optimizer + ) + + return model, loss_fn, optimizer + + +def train_step(input, target, model, loss_fn, optimizer, **unused): + """Do forward, backward and parameter update.""" + model.train() + output = model(input) + loss = loss_fn(output, target) + optimizer.backward(loss) + optimizer.step() + + +def single_gpu_training(cfg, args, rank, iterations, shared_results): + + is_cuda = torch.cuda.is_available() + if is_cuda: + torch.cuda.set_device(rank) + + model, loss_fn, optimizer = setup_model_loss_criterion(cfg, args, rank, is_cuda) + + for _ in range(iterations): + input = torch.randn(1, args.input_size) + target = torch.empty(args.batch_size, dtype=torch.long).random_(args.nb_classes) + + if is_cuda: + input = input.cuda() + target = target.cuda() + train_step(input, target, model, loss_fn, optimizer) + + results = [] + for param in model.parameters(): + if len(results) == 0: + results = param.flatten().cpu().data + else: + results = torch.cat((results, param.flatten().cpu().data), 0) + + shared_results[rank] = results + + +def setup_args(): + args = argparse.Namespace() + args.global_sync_iter = 20 + args.block_momentum = 0.875 + args.block_lr = 0.5 + args.input_size = 5 + args.nb_classes = 2 + args.batch_size = 1 + args.lr = [1e-3] + args.momentum = 0 + args.weight_decay = 0 + args.warmup_iterations = 0 + args.use_nbm = True + args.average_sync = True + args.global_sync_iter = 1 + args.model_parallel_size = 1 + args.distributed_backend = "gloo" + + args.distributed_world_size = 2 + port = random.randint(10000, 20000) + args.distributed_init_method = "tcp://localhost:{port}".format(port=port) + args.distributed_init_host = "localhost" + args.distributed_port = port + 1 + args.local_world_size = args.distributed_world_size + + cfg = OmegaConf.create() + cfg.optimization = OmegaConf.create() + cfg.common = OmegaConf.create() + cfg.distributed_training = OmegaConf.create() + cfg.dataset = OmegaConf.create() + cfg.bmuf = OmegaConf.create() + cfg.optimizer = OmegaConf.create() + + cfg.bmuf.global_sync_iter = args.global_sync_iter + cfg.bmuf.block_momentum = args.block_momentum + cfg.bmuf.block_lr = args.block_lr + cfg.dataset.batch_size = args.batch_size + cfg.optimization.lr = args.lr + cfg.optimizer.momentum = args.momentum + cfg.optimizer.weight_decay = args.weight_decay 
+ cfg.bmuf.warmup_iterations = args.warmup_iterations + cfg.bmuf.use_nbm = args.use_nbm + cfg.bmuf.average_sync = args.average_sync + cfg.common.model_parallel_size = args.model_parallel_size + cfg.distributed_training.distributed_backend = args.distributed_backend + cfg.distributed_training.distributed_world_size = args.distributed_world_size + cfg.bmuf.distributed_world_size = args.distributed_world_size + cfg.distributed_training.distributed_init_method = args.distributed_init_method + cfg.distributed_training.distributed_port = args.distributed_port + + return cfg, args + + +@unittest.skipIf(torch.cuda.device_count() < 2, "test requires 2 GPUs") +class TestBMUF(unittest.TestCase): + def bmuf_process(self, cfg, args, iterations): + processes = [] + results = Manager().dict() + torch.multiprocessing.spawn( + fn=functools.partial(single_gpu_training, cfg, args), + args=(iterations, results), + nprocs=args.distributed_world_size, + join=True, + ) + return results + + def test_bmuf_sync(self): + # Train model for 1 iteration and do bmuf sync without doing warmup + cfg, args = setup_args() + iterations = 1 + results = self.bmuf_process(cfg, args, iterations) + # Make sure params in both machines are same + assert len(results) == 2 + self.assertAlmostEqual(results[0], results[1]) + + def test_warmup_sync(self): + # Train model for 20 iteration and do warmup sync without doing bmuf sync + cfg, args = setup_args() + args.warmup_iterations = 20 + cfg.bmuf.warmup_iterations = args.warmup_iterations + iterations = 20 + results = self.bmuf_process(cfg, args, iterations) + # Make sure params in both machines are same + assert len(results) == 2 + self.assertAlmostEqual(results[0], results[1]) + + def test_warmup_sync_bmuf_sync(self): + # Train model for 25 iteration and do warmup sync after 20 iteration + # and bmuf sync after 25 iteration + cfg, args = setup_args() + args.warmup_iterations = 20 + args.global_sync_iter = 5 + cfg.bmuf.warmup_iterations = args.warmup_iterations + cfg.bmuf.global_sync_iter = args.global_sync_iter + iterations = 25 + results = self.bmuf_process(cfg, args, iterations) + # Make sure params in both machines are same + assert len(results) == 2 + self.assertAlmostEqual(results[0], results[1]) + + def test_single_gpu_bmuf(self): + # Train model for 5 iterations and use GPU 1 + cfg, args = setup_args() + args.distributed_world_size = 1 + args.warmup_iterations = 5 + cfg.distributed_training.distributed_world_size = args.distributed_world_size + cfg.bmuf.distributed_world_size = args.distributed_world_size + cfg.bmuf.warmup_iterations = args.warmup_iterations + iterations = 20 + results = self.bmuf_process(cfg, args, iterations) + assert len(results) == 1 + + def assertAlmostEqual(self, t1, t2): + self.assertEqual(t1.size(), t2.size(), "size mismatch") + self.assertLess((t1 - t2).abs().max(), 1e-4) + + +if __name__ == "__main__": + unittest.main() diff --git a/SpeechT5/fairseq/tests/distributed/test_distributed_timeout_wrapper.py b/SpeechT5/fairseq/tests/distributed/test_distributed_timeout_wrapper.py new file mode 100644 index 0000000000000000000000000000000000000000..27908b9d3f7d6d880351e2a12effb12f9bc27971 --- /dev/null +++ b/SpeechT5/fairseq/tests/distributed/test_distributed_timeout_wrapper.py @@ -0,0 +1,54 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. 
+ +import logging +import signal +import time +import unittest + +import torch +from torch import nn + +from fairseq.distributed import DistributedTimeoutWrapper + + +class ModuleWithDelay(nn.Module): + + def __init__(self, delay): + super().__init__() + self.delay = delay + + def forward(self, x): + time.sleep(self.delay) + return x + + +class TestDistributedTimeoutWrapper(unittest.TestCase): + + def setUp(self): + logging.disable(logging.CRITICAL) + + def tearDown(self): + logging.disable(logging.NOTSET) + + def test_no_timeout(self): + module = DistributedTimeoutWrapper(ModuleWithDelay(1), 0, signal.SIGINT) + module(torch.rand(5)) + module.stop_timeout() + + def test_timeout_safe(self): + module = DistributedTimeoutWrapper(ModuleWithDelay(1), 10, signal.SIGINT) + module(torch.rand(5)) + module.stop_timeout() + + def test_timeout_killed(self): + with self.assertRaises(KeyboardInterrupt): + module = DistributedTimeoutWrapper(ModuleWithDelay(5), 1, signal.SIGINT) + module(torch.rand(5)) + module.stop_timeout() + + +if __name__ == "__main__": + unittest.main() diff --git a/SpeechT5/fairseq/tests/distributed/test_module_proxy_wrapper.py b/SpeechT5/fairseq/tests/distributed/test_module_proxy_wrapper.py new file mode 100644 index 0000000000000000000000000000000000000000..2803a044cdcc12e0a348f40d06ce89c571d307ed --- /dev/null +++ b/SpeechT5/fairseq/tests/distributed/test_module_proxy_wrapper.py @@ -0,0 +1,75 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +import unittest + +import torch +from torch import nn + +from fairseq.distributed import ModuleProxyWrapper + +from .utils import objects_are_equal + + +class MockDDPWrapper(nn.Module): + """A simple wrapper with an interface similar to DistributedDataParallel.""" + + def __init__(self, module): + super().__init__() + self.module = module + + def forward(self, x): + return self.module(x) + + +class Model(nn.Module): + def __init__(self): + super().__init__() + self.linear = nn.Linear(5, 10) + self.xyz = "hello" + + def forward(self, x): + return self.linear(x) + + def get_xyz(self): + return self.xyz + + +class TestModuleProxyWrapper(unittest.TestCase): + + def _get_module(self): + module = Model() + wrapped_module = MockDDPWrapper(module) + wrapped_module = ModuleProxyWrapper(wrapped_module) + return wrapped_module, module + + def test_getattr_forwarding(self): + wrapped_module, module = self._get_module() + assert module.xyz == "hello" + assert module.get_xyz() == "hello" + assert wrapped_module.xyz == "hello" + + wrapped_module.xyz = "world" + assert wrapped_module.xyz == "world" + assert module.get_xyz() == "hello" + + def test_state_dict(self): + wrapped_module, module = self._get_module() + assert objects_are_equal(wrapped_module.state_dict(), module.state_dict()) + + def test_load_state_dict(self): + wrapped_module, module = self._get_module() + wrapped_module.load_state_dict(module.state_dict()) + input = torch.rand(4, 5) + torch.testing.assert_allclose(wrapped_module(input), module(input)) + + def test_forward(self): + wrapped_module, module = self._get_module() + input = torch.rand(4, 5) + torch.testing.assert_allclose(wrapped_module(input), module(input)) + + +if __name__ == "__main__": + unittest.main() diff --git a/SpeechT5/fairseq/tests/distributed/test_utils.py b/SpeechT5/fairseq/tests/distributed/test_utils.py new file mode 100644 index 
0000000000000000000000000000000000000000..30f995b67acd39af5816d2eb412d6b4df7f44f8c --- /dev/null +++ b/SpeechT5/fairseq/tests/distributed/test_utils.py @@ -0,0 +1,124 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +import functools +import sys +import unittest + +import torch + +from fairseq.distributed import utils as dist_utils + +from .utils import objects_are_equal, spawn_and_init + + +class DistributedTest(unittest.TestCase): + def setUp(self): + if not torch.cuda.is_available(): + raise unittest.SkipTest("CUDA not available, skipping test") + if sys.platform == "win32": + raise unittest.SkipTest("NCCL doesn't support Windows, skipping test") + if torch.cuda.device_count() < 2: + raise unittest.SkipTest("distributed tests require 2+ GPUs, skipping") + + +class TestBroadcastObject(DistributedTest): + def test_str(self): + spawn_and_init( + functools.partial( + TestBroadcastObject._test_broadcast_object, "hello world" + ), + world_size=2, + ) + + def test_tensor(self): + spawn_and_init( + functools.partial( + TestBroadcastObject._test_broadcast_object, + torch.rand(5), + ), + world_size=2, + ) + + def test_complex(self): + spawn_and_init( + functools.partial( + TestBroadcastObject._test_broadcast_object, + { + "a": "1", + "b": [2, torch.rand(2, 3), 3], + "c": (torch.rand(2, 3), 4), + "d": {5, torch.rand(5)}, + "e": torch.rand(5), + "f": torch.rand(5).int().cuda(), + }, + ), + world_size=2, + ) + + @staticmethod + def _test_broadcast_object(ref_obj, rank, group): + obj = dist_utils.broadcast_object( + ref_obj if rank == 0 else None, src_rank=0, group=group + ) + assert objects_are_equal(ref_obj, obj) + + +class TestAllGatherList(DistributedTest): + def test_str_equality(self): + spawn_and_init( + functools.partial( + TestAllGatherList._test_all_gather_list_equality, + "hello world", + ), + world_size=2, + ) + + def test_tensor_equality(self): + spawn_and_init( + functools.partial( + TestAllGatherList._test_all_gather_list_equality, + torch.rand(5), + ), + world_size=2, + ) + + def test_complex_equality(self): + spawn_and_init( + functools.partial( + TestAllGatherList._test_all_gather_list_equality, + { + "a": "1", + "b": [2, torch.rand(2, 3), 3], + "c": (torch.rand(2, 3), 4), + "d": {5, torch.rand(5)}, + "e": torch.rand(5), + "f": torch.rand(5).int(), + }, + ), + world_size=2, + ) + + @staticmethod + def _test_all_gather_list_equality(ref_obj, rank, group): + objs = dist_utils.all_gather_list(ref_obj, group) + for obj in objs: + assert objects_are_equal(ref_obj, obj) + + def test_rank_tensor(self): + spawn_and_init( + TestAllGatherList._test_all_gather_list_rank_tensor, world_size=2 + ) + + @staticmethod + def _test_all_gather_list_rank_tensor(rank, group): + obj = torch.tensor([rank]) + objs = dist_utils.all_gather_list(obj, group) + for i, obj in enumerate(objs): + assert obj.item() == i + + +if __name__ == "__main__": + unittest.main() diff --git a/SpeechT5/fairseq/tests/distributed/utils.py b/SpeechT5/fairseq/tests/distributed/utils.py new file mode 100644 index 0000000000000000000000000000000000000000..c8040392a8e27eb4c3a74032c702643a91d11a3e --- /dev/null +++ b/SpeechT5/fairseq/tests/distributed/utils.py @@ -0,0 +1,62 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. 
+ +import functools +import tempfile + +import torch + + +def spawn_and_init(fn, world_size, args=None): + if args is None: + args = () + with tempfile.NamedTemporaryFile(delete=False) as tmp_file: + torch.multiprocessing.spawn( + fn=functools.partial(init_and_run, fn, args), + args=(world_size, tmp_file.name,), + nprocs=world_size, + join=True, + ) + + +def distributed_init(rank, world_size, tmp_file): + torch.distributed.init_process_group( + backend="nccl", + init_method="file://{}".format(tmp_file), + world_size=world_size, + rank=rank, + ) + torch.cuda.set_device(rank) + + +def init_and_run(fn, args, rank, world_size, tmp_file): + distributed_init(rank, world_size, tmp_file) + group = torch.distributed.new_group() + fn(rank, group, *args) + + +def objects_are_equal(a, b) -> bool: + if type(a) is not type(b): + return False + if isinstance(a, dict): + if set(a.keys()) != set(b.keys()): + return False + for k in a.keys(): + if not objects_are_equal(a[k], b[k]): + return False + return True + elif isinstance(a, (list, tuple, set)): + if len(a) != len(b): + return False + return all(objects_are_equal(x, y) for x, y in zip(a, b)) + elif torch.is_tensor(a): + return ( + a.size() == b.size() + and a.dtype == b.dtype + and a.device == b.device + and torch.all(a == b) + ) + else: + return a == b diff --git a/SpeechT5/fairseq/tests/gpu/__init__.py b/SpeechT5/fairseq/tests/gpu/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 diff --git a/SpeechT5/fairseq/tests/gpu/test_binaries_gpu.py b/SpeechT5/fairseq/tests/gpu/test_binaries_gpu.py new file mode 100644 index 0000000000000000000000000000000000000000..de8c2426134089035c6e0e5da223647bab6f3dba --- /dev/null +++ b/SpeechT5/fairseq/tests/gpu/test_binaries_gpu.py @@ -0,0 +1,449 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. 
+ +import contextlib +import logging +import json +import os +import tempfile +import unittest +from io import StringIO + +import torch +from fairseq import options +from fairseq_cli import train +from tests.utils import ( + create_dummy_data, + generate_main, + preprocess_lm_data, + preprocess_translation_data, + train_translation_model, +) + + +@unittest.skipIf(not torch.cuda.is_available(), "test requires a GPU") +class TestTranslationGPU(unittest.TestCase): + def setUp(self): + logging.disable(logging.CRITICAL) + + def tearDown(self): + logging.disable(logging.NOTSET) + + def test_fp16_multigpu(self): + with contextlib.redirect_stdout(StringIO()): + with tempfile.TemporaryDirectory("test_fp16") as data_dir: + log = os.path.join(data_dir, "train.log") + create_dummy_data(data_dir) + preprocess_translation_data(data_dir) + train_translation_model( + data_dir, + "fconv_iwslt_de_en", + ["--fp16", "--log-file", log], + world_size=min(torch.cuda.device_count(), 2), + ) + generate_main(data_dir) + assert os.path.exists(log) + + @staticmethod + def parse_logs(logfile): + logs = [] + for ln in open(logfile, "r").readlines(): + try: + logs.append(json.loads(ln)) + except json.JSONDecodeError: + continue + return logs + + def test_resume_training_fsdp(self): + self._test_resume_training(["--ddp-backend", "fully_sharded"]) + + def test_resume_training_fsdp_sharded_state(self): + self._test_resume_training(["--ddp-backend", "fully_sharded", "--use-sharded-state"]) + + def test_resume_training_noc10d(self): + self._test_resume_training([]) + + def _test_resume_training(self, extra_clargs, arch="fconv_iwslt_de_en"): + flags = [ + "--fp16", + "--log-format", + "json", + "--max-update", + "10", + "--save-interval-updates", + "2", + "--log-interval", + "1", + ] + extra_clargs + world_size = min(torch.cuda.device_count(), 2) + with contextlib.redirect_stdout(StringIO()): + with tempfile.TemporaryDirectory("test_fp16") as data_dir: + log = os.path.join(data_dir, "train.log") + create_dummy_data(data_dir) + preprocess_translation_data(data_dir) + train_translation_model( + data_dir, arch, flags + ["--log-file", log], world_size=world_size, + ) + log2 = os.path.join(data_dir, "resume.log") + restore_file = os.path.join(data_dir, "checkpoint_1_2.pt") + train_translation_model( + data_dir, + arch, + flags + ["--log-file", log2, "--restore-file", restore_file], + world_size=world_size, + ) + + l1 = self.parse_logs(log) + l2 = self.parse_logs(log2) + assert int(l2[0]["num_updates"]) == 3, f"{l1}\n\n {l2}" + for k in [ + "train_loss", + "train_num_updates", + "train_ppl", + "train_gnorm", + ]: + from_scratch, resumed = l1[-1][k], l2[-1][k] + assert ( + from_scratch == resumed + ), f"difference at {k} {from_scratch} != {resumed}" + + def test_memory_efficient_fp16(self): + with contextlib.redirect_stdout(StringIO()): + with tempfile.TemporaryDirectory("test_memory_efficient_fp16") as data_dir: + create_dummy_data(data_dir) + preprocess_translation_data(data_dir) + train_translation_model( + data_dir, "fconv_iwslt_de_en", ["--memory-efficient-fp16"] + ) + generate_main(data_dir) + + def test_transformer_fp16(self): + with contextlib.redirect_stdout(StringIO()): + with tempfile.TemporaryDirectory("test_transformer") as data_dir: + create_dummy_data(data_dir) + preprocess_translation_data(data_dir) + train_translation_model( + data_dir, + "transformer_iwslt_de_en", + [ + "--encoder-layers", + "2", + "--decoder-layers", + "2", + "--encoder-embed-dim", + "64", + "--decoder-embed-dim", + "64", + "--fp16", + ], + 
run_validation=True, + ) + generate_main(data_dir) + + @unittest.skipIf(not torch.cuda.is_available(), "test requires a GPU") + def test_amp(self): + with contextlib.redirect_stdout(StringIO()): + with tempfile.TemporaryDirectory("test_amp") as data_dir: + create_dummy_data(data_dir) + preprocess_translation_data(data_dir) + train_translation_model(data_dir, "fconv_iwslt_de_en", ["--amp"]) + generate_main(data_dir) + + @unittest.skipIf(not torch.cuda.is_available(), "test requires a GPU") + def test_transformer_amp(self): + with contextlib.redirect_stdout(StringIO()): + with tempfile.TemporaryDirectory("test_transformer") as data_dir: + create_dummy_data(data_dir) + preprocess_translation_data(data_dir) + train_translation_model( + data_dir, + "transformer_iwslt_de_en", + [ + "--encoder-layers", + "2", + "--decoder-layers", + "2", + "--encoder-embed-dim", + "64", + "--decoder-embed-dim", + "64", + "--amp", + ], + run_validation=True, + ) + generate_main(data_dir) + + @unittest.skipIf(not torch.cuda.is_available(), "test requires a GPU") + def test_levenshtein_transformer(self): + with contextlib.redirect_stdout(StringIO()): + with tempfile.TemporaryDirectory( + "test_levenshtein_transformer" + ) as data_dir: + create_dummy_data(data_dir) + preprocess_translation_data(data_dir, ["--joined-dictionary"]) + train_translation_model( + data_dir, + "levenshtein_transformer", + [ + "--apply-bert-init", + "--early-exit", + "6,6,6", + "--criterion", + "nat_loss", + ], + task="translation_lev", + ) + gen_config = [ + "--task", + "translation_lev", + "--iter-decode-max-iter", + "9", + "--iter-decode-eos-penalty", + "0", + "--print-step", + ] + # non-ensemble generation + generate_main(data_dir, gen_config) + # ensemble generation + generate_main( + data_dir, + gen_config, + path=os.pathsep.join( + [ + os.path.join(data_dir, "checkpoint_last.pt"), + os.path.join(data_dir, "checkpoint_last.pt"), + ] + ), + ) + + def test_fsdp_checkpoint_generate(self): + with contextlib.redirect_stdout(StringIO()): + with tempfile.TemporaryDirectory("test_fsdp_sharded") as data_dir: + log = os.path.join(data_dir, "train.log") + create_dummy_data(data_dir) + preprocess_translation_data(data_dir) + world_size = min(torch.cuda.device_count(), 2) + train_translation_model( + data_dir, + "fconv_iwslt_de_en", + ["--log-file", log, "--ddp-backend", "fully_sharded"], + world_size=world_size, + ) + generate_main(data_dir) + assert os.path.exists(log) + + def test_fsdp_sharded_checkpoint_generate(self): + with contextlib.redirect_stdout(StringIO()): + with tempfile.TemporaryDirectory("test_fsdp_sharded") as data_dir: + log = os.path.join(data_dir, "train.log") + create_dummy_data(data_dir) + preprocess_translation_data(data_dir) + world_size = min(torch.cuda.device_count(), 2) + train_translation_model( + data_dir, + "fconv_iwslt_de_en", + ["--log-file", log, "--ddp-backend", "fully_sharded", "--use-sharded-state"], + world_size=world_size, + ) + generate_main(data_dir, ["--checkpoint-shard-count", str(world_size)]) + assert os.path.exists(log) + + +def _quantize_language_model(data_dir, arch, extra_flags=None, run_validation=False): + train_parser = options.get_training_parser() + train_args = options.parse_args_and_arch( + train_parser, + [ + "--task", + "language_modeling", + data_dir, + "--arch", + arch, + "--optimizer", + "adam", + "--lr", + "0.0001", + "--criterion", + "adaptive_loss", + "--adaptive-softmax-cutoff", + "5,10,15", + "--max-tokens", + "500", + "--tokens-per-sample", + "500", + "--save-dir", + data_dir, + 
"--max-epoch", + "1", + "--no-progress-bar", + "--distributed-world-size", + "1", + "--ddp-backend", + "no_c10d", + "--num-workers", + "0", + ] + + (extra_flags or []), + ) + train.main(train_args) + + # try scalar quantization + scalar_quant_train_parser = options.get_training_parser() + scalar_quant_train_args = options.parse_args_and_arch( + scalar_quant_train_parser, + [ + "--task", + "language_modeling", + data_dir, + "--arch", + arch, + "--optimizer", + "adam", + "--lr", + "0.0001", + "--criterion", + "adaptive_loss", + "--adaptive-softmax-cutoff", + "5,10,15", + "--max-tokens", + "500", + "--tokens-per-sample", + "500", + "--save-dir", + data_dir, + "--max-update", + "3", + "--no-progress-bar", + "--distributed-world-size", + "1", + "--ddp-backend", + "no_c10d", + "--num-workers", + "0", + "--quant-noise-scalar", + "0.5", + ] + + (extra_flags or []), + ) + train.main(scalar_quant_train_args) + + # try iterative PQ quantization + quantize_parser = options.get_training_parser() + quantize_args = options.parse_args_and_arch( + quantize_parser, + [ + "--task", + "language_modeling", + data_dir, + "--arch", + arch, + "--optimizer", + "adam", + "--lr", + "0.0001", + "--criterion", + "adaptive_loss", + "--adaptive-softmax-cutoff", + "5,10,15", + "--max-tokens", + "50", + "--tokens-per-sample", + "50", + "--max-update", + "6", + "--no-progress-bar", + "--distributed-world-size", + "1", + "--ddp-backend", + "no_c10d", + "--num-workers", + "0", + "--restore-file", + os.path.join(data_dir, "checkpoint_last.pt"), + "--reset-optimizer", + "--quantization-config-path", + os.path.join( + os.path.dirname(__file__), "transformer_quantization_config.yaml" + ), + ] + + (extra_flags or []), + ) + train.main(quantize_args) + + +@unittest.skipIf(not torch.cuda.is_available(), "test requires a GPU") +class TestQuantization(unittest.TestCase): + def setUp(self): + logging.disable(logging.CRITICAL) + + def tearDown(self): + logging.disable(logging.NOTSET) + + def test_quantization(self): + with contextlib.redirect_stdout(StringIO()): + with tempfile.TemporaryDirectory("test_quantization") as data_dir: + create_dummy_data(data_dir) + preprocess_lm_data(data_dir) + # tests both scalar and iterative PQ quantization + _quantize_language_model(data_dir, "transformer_lm") + + +@unittest.skipIf(not torch.cuda.is_available(), "test requires a GPU") +class TestOptimizersGPU(unittest.TestCase): + def setUp(self): + logging.disable(logging.CRITICAL) + + def tearDown(self): + logging.disable(logging.NOTSET) + + def test_flat_grads(self): + with contextlib.redirect_stdout(StringIO()): + with tempfile.TemporaryDirectory("test_flat_grads") as data_dir: + # Use just a bit of data and tiny model to keep this test runtime reasonable + create_dummy_data(data_dir, num_examples=10, maxlen=5) + preprocess_translation_data(data_dir) + with self.assertRaises(RuntimeError): + # adafactor isn't compatible with flat grads, which + # are used by default with --fp16 + train_translation_model( + data_dir, + "lstm", + [ + "--required-batch-size-multiple", + "1", + "--encoder-layers", + "1", + "--encoder-hidden-size", + "32", + "--decoder-layers", + "1", + "--optimizer", + "adafactor", + "--fp16", + ], + ) + # but it should pass once we set --fp16-no-flatten-grads + train_translation_model( + data_dir, + "lstm", + [ + "--required-batch-size-multiple", + "1", + "--encoder-layers", + "1", + "--encoder-hidden-size", + "32", + "--decoder-layers", + "1", + "--optimizer", + "adafactor", + "--fp16", + "--fp16-no-flatten-grads", + ], + ) + + +if 
__name__ == "__main__": + unittest.main() diff --git a/SpeechT5/fairseq/tests/gpu/transformer_quantization_config.yaml b/SpeechT5/fairseq/tests/gpu/transformer_quantization_config.yaml new file mode 100644 index 0000000000000000000000000000000000000000..de31d8116ced675b81eb74119642217d768e7736 --- /dev/null +++ b/SpeechT5/fairseq/tests/gpu/transformer_quantization_config.yaml @@ -0,0 +1,28 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +# This file defines example configuration arguments for quantizing +# a transformer model with product quantization + +n_centroids: + Linear: + key: in_features + value: {"*": 8} + Embedding: + key: embedding_dim + value: {"*": 8} + +block_sizes: + Linear: + key: fuzzy_name + value: {fc: 8, attn: 4, emb: 4} + Embedding: + key: fuzzy_name + value: {emb: 8} + +layers_to_quantize: + - decoder\\.layers\\.\d+\\.fc[12] + - decoder\\.embed_tokens\\.embeddings\\.[012]\\.[01] + - decoder\\.layers\\.\d+\\.self_attn\\.(k_proj|v_proj|q_proj|out_proj) diff --git a/SpeechT5/fairseq/tests/speech_recognition/__init__.py b/SpeechT5/fairseq/tests/speech_recognition/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 diff --git a/SpeechT5/fairseq/tests/speech_recognition/asr_test_base.py b/SpeechT5/fairseq/tests/speech_recognition/asr_test_base.py new file mode 100644 index 0000000000000000000000000000000000000000..8c5d414e7bf17ee02f280d024fa5d07e28b79d6b --- /dev/null +++ b/SpeechT5/fairseq/tests/speech_recognition/asr_test_base.py @@ -0,0 +1,557 @@ +#!/usr/bin/env python3 + +import argparse +import os +import unittest +from inspect import currentframe, getframeinfo + +import numpy as np +import torch +from examples.speech_recognition.data.data_utils import lengths_to_encoder_padding_mask +from fairseq.data import data_utils as fairseq_data_utils +from fairseq.data.dictionary import Dictionary +from fairseq.models import ( + BaseFairseqModel, + FairseqDecoder, + FairseqEncoder, + FairseqEncoderDecoderModel, + FairseqEncoderModel, + FairseqModel, +) +from fairseq.tasks.fairseq_task import LegacyFairseqTask + + +DEFAULT_TEST_VOCAB_SIZE = 100 + + +# /////////////////////////////////////////////////////////////////////////// +# utility function to setup dummy dict/task/input +# /////////////////////////////////////////////////////////////////////////// + + +def get_dummy_dictionary(vocab_size=DEFAULT_TEST_VOCAB_SIZE): + dummy_dict = Dictionary() + # add dummy symbol to satisfy vocab size + for id, _ in enumerate(range(vocab_size)): + dummy_dict.add_symbol("{}".format(id), 1000) + return dummy_dict + + +class DummyTask(LegacyFairseqTask): + def __init__(self, args): + super().__init__(args) + self.dictionary = get_dummy_dictionary() + if getattr(self.args, "ctc", False): + self.dictionary.add_symbol("<ctc_blank>") + self.tgt_dict = self.dictionary + + @property + def target_dictionary(self): + return self.dictionary + + +def get_dummy_task_and_parser(): + """ + to build a fariseq model, we need some dummy parse and task. This function + is used to create dummy task and parser to faciliate model/criterion test + + Note: we use FbSpeechRecognitionTask as the dummy task. 
You may want + to use other task by providing another function + """ + parser = argparse.ArgumentParser( + description="test_dummy_s2s_task", argument_default=argparse.SUPPRESS + ) + DummyTask.add_args(parser) + args = parser.parse_args([]) + task = DummyTask.setup_task(args) + return task, parser + + +def get_dummy_input(T=100, D=80, B=5, K=100): + forward_input = {} + # T max sequence length + # D feature vector dimension + # B batch size + # K target dimension size + feature = torch.randn(B, T, D) + # this (B, T, D) layout is just a convention, you can override it by + # write your own _prepare_forward_input function + src_lengths = torch.from_numpy( + np.random.randint(low=1, high=T, size=B, dtype=np.int64) + ) + src_lengths[0] = T # make sure the maximum length matches + prev_output_tokens = [] + for b in range(B): + token_length = np.random.randint(low=1, high=src_lengths[b].item() + 1) + tokens = np.random.randint(low=0, high=K, size=token_length, dtype=np.int64) + prev_output_tokens.append(torch.from_numpy(tokens)) + + prev_output_tokens = fairseq_data_utils.collate_tokens( + prev_output_tokens, + pad_idx=1, + eos_idx=2, + left_pad=False, + move_eos_to_beginning=False, + ) + src_lengths, sorted_order = src_lengths.sort(descending=True) + forward_input["src_tokens"] = feature.index_select(0, sorted_order) + forward_input["src_lengths"] = src_lengths + forward_input["prev_output_tokens"] = prev_output_tokens + + return forward_input + + +def get_dummy_encoder_output(encoder_out_shape=(100, 80, 5)): + """ + This only provides an example to generate dummy encoder output + """ + (T, B, D) = encoder_out_shape + encoder_out = {} + + encoder_out["encoder_out"] = torch.from_numpy( + np.random.randn(*encoder_out_shape).astype(np.float32) + ) + seq_lengths = torch.from_numpy(np.random.randint(low=1, high=T, size=B)) + # some dummy mask + encoder_out["encoder_padding_mask"] = torch.arange(T).view(1, T).expand( + B, -1 + ) >= seq_lengths.view(B, 1).expand(-1, T) + encoder_out["encoder_padding_mask"].t_() + + # encoer_padding_mask is (T, B) tensor, with (t, b)-th element indicate + # whether encoder_out[t, b] is valid (=0) or not (=1) + return encoder_out + + +def _current_postion_info(): + cf = currentframe() + frameinfo = " (at {}:{})".format( + os.path.basename(getframeinfo(cf).filename), cf.f_back.f_lineno + ) + return frameinfo + + +def check_encoder_output(encoder_output, batch_size=None): + """we expect encoder_output to be a dict with the following + key/value pairs: + - encoder_out: a Torch.Tensor + - encoder_padding_mask: a binary Torch.Tensor + """ + if not isinstance(encoder_output, dict): + msg = ( + "FairseqEncoderModel.forward(...) must be a dict" + _current_postion_info() + ) + return False, msg + + if "encoder_out" not in encoder_output: + msg = ( + "FairseqEncoderModel.forward(...) must contain encoder_out" + + _current_postion_info() + ) + return False, msg + + if "encoder_padding_mask" not in encoder_output: + msg = ( + "FairseqEncoderModel.forward(...) 
must contain encoder_padding_mask" + + _current_postion_info() + ) + return False, msg + + if not isinstance(encoder_output["encoder_out"], torch.Tensor): + msg = "encoder_out must be a torch.Tensor" + _current_postion_info() + return False, msg + + if encoder_output["encoder_out"].dtype != torch.float32: + msg = "encoder_out must have float32 dtype" + _current_postion_info() + return False, msg + + mask = encoder_output["encoder_padding_mask"] + if mask is not None: + if not isinstance(mask, torch.Tensor): + msg = ( + "encoder_padding_mask must be a torch.Tensor" + _current_postion_info() + ) + return False, msg + if mask.dtype != torch.uint8 and ( + not hasattr(torch, "bool") or mask.dtype != torch.bool + ): + msg = ( + "encoder_padding_mask must have dtype of uint8" + + _current_postion_info() + ) + return False, msg + + if mask.dim() != 2: + msg = ( + "we expect encoder_padding_mask to be a 2-d tensor, in shape (T, B)" + + _current_postion_info() + ) + return False, msg + + if batch_size is not None and mask.size(1) != batch_size: + msg = ( + "we expect encoder_padding_mask to be a 2-d tensor, with size(1)" + + " being the batch size" + + _current_postion_info() + ) + return False, msg + return True, None + + +def check_decoder_output(decoder_output): + """we expect output from a decoder is a tuple with the following constraint: + - the first element is a torch.Tensor + - the second element can be anything (reserved for future use) + """ + if not isinstance(decoder_output, tuple): + msg = "FariseqDecoder output must be a tuple" + _current_postion_info() + return False, msg + + if len(decoder_output) != 2: + msg = "FairseqDecoder output must be 2-elem tuple" + _current_postion_info() + return False, msg + + if not isinstance(decoder_output[0], torch.Tensor): + msg = ( + "FariseqDecoder output[0] must be a torch.Tensor" + _current_postion_info() + ) + return False, msg + + return True, None + + +# /////////////////////////////////////////////////////////////////////////// +# Base Test class +# /////////////////////////////////////////////////////////////////////////// + + +class TestBaseFairseqModelBase(unittest.TestCase): + """ + This class is used to facilitate writing unittest for any class derived from + `BaseFairseqModel`. 
+ """ + + @classmethod + def setUpClass(cls): + if cls is TestBaseFairseqModelBase: + raise unittest.SkipTest("Skipping test case in base") + super().setUpClass() + + def setUpModel(self, model): + self.assertTrue(isinstance(model, BaseFairseqModel)) + self.model = model + + def setupInput(self): + pass + + def setUp(self): + self.model = None + self.forward_input = None + pass + + +class TestFairseqEncoderDecoderModelBase(TestBaseFairseqModelBase): + """ + base code to test FairseqEncoderDecoderModel (formally known as + `FairseqModel`) must be derived from this base class + """ + + @classmethod + def setUpClass(cls): + if cls is TestFairseqEncoderDecoderModelBase: + raise unittest.SkipTest("Skipping test case in base") + super().setUpClass() + + def setUpModel(self, model_cls, extra_args_setters=None): + self.assertTrue( + issubclass(model_cls, (FairseqEncoderDecoderModel, FairseqModel)), + msg="This class only tests for FairseqModel subclasses", + ) + + task, parser = get_dummy_task_and_parser() + model_cls.add_args(parser) + + args = parser.parse_args([]) + + if extra_args_setters is not None: + for args_setter in extra_args_setters: + args_setter(args) + model = model_cls.build_model(args, task) + self.model = model + + def setUpInput(self, input=None): + self.forward_input = get_dummy_input() if input is None else input + + def setUp(self): + super().setUp() + + def test_forward(self): + if self.model and self.forward_input: + forward_output = self.model.forward(**self.forward_input) + # for FairseqEncoderDecoderModel, forward returns a tuple of two + # elements, the first one is a Torch.Tensor + succ, msg = check_decoder_output(forward_output) + if not succ: + self.assertTrue(succ, msg=msg) + self.forward_output = forward_output + + def test_get_normalized_probs(self): + if self.model and self.forward_input: + forward_output = self.model.forward(**self.forward_input) + logprob = self.model.get_normalized_probs(forward_output, log_probs=True) + prob = self.model.get_normalized_probs(forward_output, log_probs=False) + + # in order for different models/criterion to play with each other + # we need to know whether the logprob or prob output is batch_first + # or not. We assume an additional attribute will be attached to logprob + # or prob. 
If you find your code failed here, simply override + # FairseqModel.get_normalized_probs, see example at + # https://fburl.com/batch_first_example + self.assertTrue(hasattr(logprob, "batch_first")) + self.assertTrue(hasattr(prob, "batch_first")) + + self.assertTrue(torch.is_tensor(logprob)) + self.assertTrue(torch.is_tensor(prob)) + + +class TestFairseqEncoderModelBase(TestBaseFairseqModelBase): + """ + base class to test FairseqEncoderModel + """ + + @classmethod + def setUpClass(cls): + if cls is TestFairseqEncoderModelBase: + raise unittest.SkipTest("Skipping test case in base") + super().setUpClass() + + def setUpModel(self, model_cls, extra_args_setters=None): + self.assertTrue( + issubclass(model_cls, FairseqEncoderModel), + msg="This class is only used for testing FairseqEncoderModel", + ) + task, parser = get_dummy_task_and_parser() + model_cls.add_args(parser) + args = parser.parse_args([]) + if extra_args_setters is not None: + for args_setter in extra_args_setters: + args_setter(args) + + model = model_cls.build_model(args, task) + self.model = model + + def setUpInput(self, input=None): + self.forward_input = get_dummy_input() if input is None else input + # get_dummy_input() is originally for s2s, here we delete extra dict + # items, so it can be used for EncoderModel / Encoder as well + self.forward_input.pop("prev_output_tokens", None) + + def setUp(self): + super().setUp() + + def test_forward(self): + if self.forward_input and self.model: + bsz = self.forward_input["src_tokens"].size(0) + forward_output = self.model.forward(**self.forward_input) + + # we expect forward_output to be a dict with the following + # key/value pairs: + # - encoder_out: a Torch.Tensor + # - encoder_padding_mask: a binary Torch.Tensor + succ, msg = check_encoder_output(forward_output, batch_size=bsz) + if not succ: + self.assertTrue(succ, msg=msg) + self.forward_output = forward_output + + def test_get_normalized_probs(self): + if self.model and self.forward_input: + forward_output = self.model.forward(**self.forward_input) + logprob = self.model.get_normalized_probs(forward_output, log_probs=True) + prob = self.model.get_normalized_probs(forward_output, log_probs=False) + + # in order for different models/criterion to play with each other + # we need to know whether the logprob or prob output is batch_first + # or not. We assume an additional attribute will be attached to logprob + # or prob. 
If you find your code failed here, simply override + # FairseqModel.get_normalized_probs, see example at + # https://fburl.com/batch_first_example + self.assertTrue(hasattr(logprob, "batch_first")) + self.assertTrue(hasattr(prob, "batch_first")) + + self.assertTrue(torch.is_tensor(logprob)) + self.assertTrue(torch.is_tensor(prob)) + + +class TestFairseqEncoderBase(unittest.TestCase): + """ + base class to test FairseqEncoder + """ + + @classmethod + def setUpClass(cls): + if cls is TestFairseqEncoderBase: + raise unittest.SkipTest("Skipping test case in base") + super().setUpClass() + + def setUpEncoder(self, encoder): + self.assertTrue( + isinstance(encoder, FairseqEncoder), + msg="This class is only used for test FairseqEncoder", + ) + self.encoder = encoder + + def setUpInput(self, input=None): + self.forward_input = get_dummy_input() if input is None else input + # get_dummy_input() is originally for s2s, here we delete extra dict + # items, so it can be used for EncoderModel / Encoder as well + self.forward_input.pop("prev_output_tokens", None) + + def setUp(self): + self.encoder = None + self.forward_input = None + + def test_forward(self): + if self.encoder and self.forward_input: + bsz = self.forward_input["src_tokens"].size(0) + + forward_output = self.encoder.forward(**self.forward_input) + succ, msg = check_encoder_output(forward_output, batch_size=bsz) + if not succ: + self.assertTrue(succ, msg=msg) + self.forward_output = forward_output + + +class TestFairseqDecoderBase(unittest.TestCase): + """ + base class to test FairseqDecoder + """ + + @classmethod + def setUpClass(cls): + if cls is TestFairseqDecoderBase: + raise unittest.SkipTest("Skipping test case in base") + super().setUpClass() + + def setUpDecoder(self, decoder): + self.assertTrue( + isinstance(decoder, FairseqDecoder), + msg="This class is only used for test FairseqDecoder", + ) + self.decoder = decoder + + def setUpInput(self, input=None): + self.forward_input = get_dummy_encoder_output() if input is None else input + + def setUpPrevOutputTokens(self, tokens=None): + if tokens is None: + self.encoder_input = get_dummy_input() + self.prev_output_tokens = self.encoder_input["prev_output_tokens"] + else: + self.prev_output_tokens = tokens + + def setUp(self): + self.decoder = None + self.forward_input = None + self.prev_output_tokens = None + + def test_forward(self): + if ( + self.decoder is not None + and self.forward_input is not None + and self.prev_output_tokens is not None + ): + forward_output = self.decoder.forward( + prev_output_tokens=self.prev_output_tokens, + encoder_out=self.forward_input, + ) + succ, msg = check_decoder_output(forward_output) + if not succ: + self.assertTrue(succ, msg=msg) + self.forward_input = forward_output + + +class DummyEncoderModel(FairseqEncoderModel): + def __init__(self, encoder): + super().__init__(encoder) + + @classmethod + def build_model(cls, args, task): + return cls(DummyEncoder()) + + def get_logits(self, net_output): + # Inverse of sigmoid to use with BinaryCrossEntropyWithLogitsCriterion as + # F.binary_cross_entropy_with_logits combines sigmoid and CE + return torch.log( + torch.div(net_output["encoder_out"], 1 - net_output["encoder_out"]) + ) + + def get_normalized_probs(self, net_output, log_probs, sample=None): + lprobs = super().get_normalized_probs(net_output, log_probs, sample=sample) + lprobs.batch_first = True + return lprobs + + +class DummyEncoder(FairseqEncoder): + def __init__(self): + super().__init__(None) + + def forward(self, src_tokens, 
src_lengths): + mask, max_len = lengths_to_encoder_padding_mask(src_lengths) + return {"encoder_out": src_tokens, "encoder_padding_mask": mask} + + +class CrossEntropyCriterionTestBase(unittest.TestCase): + @classmethod + def setUpClass(cls): + if cls is CrossEntropyCriterionTestBase: + raise unittest.SkipTest("Skipping base class test case") + super().setUpClass() + + def setUpArgs(self): + args = argparse.Namespace() + args.sentence_avg = False + args.threshold = 0.1 # to use with BinaryCrossEntropyWithLogitsCriterion + return args + + def setUp(self): + args = self.setUpArgs() + self.model = DummyEncoderModel(encoder=DummyEncoder()) + self.criterion = self.criterion_cls.build_criterion(args, task=DummyTask(args)) + + def get_src_tokens(self, correct_prediction, aggregate): + """ + correct_prediction: True if the net_output (src_tokens) should + predict the correct target + aggregate: True if the criterion expects net_output (src_tokens) + aggregated across time axis + """ + predicted_idx = 0 if correct_prediction else 1 + if aggregate: + src_tokens = torch.zeros((2, 2), dtype=torch.float) + for b in range(2): + src_tokens[b][predicted_idx] = 1.0 + else: + src_tokens = torch.zeros((2, 10, 2), dtype=torch.float) + for b in range(2): + for t in range(10): + src_tokens[b][t][predicted_idx] = 1.0 + return src_tokens + + def get_target(self, soft_target): + if soft_target: + target = torch.zeros((2, 2), dtype=torch.float) + for b in range(2): + target[b][0] = 1.0 + else: + target = torch.zeros((2, 10), dtype=torch.long) + return target + + def get_test_sample(self, correct, soft_target, aggregate): + src_tokens = self.get_src_tokens(correct, aggregate) + target = self.get_target(soft_target) + L = src_tokens.size(1) + return { + "net_input": {"src_tokens": src_tokens, "src_lengths": torch.tensor([L])}, + "target": target, + "ntokens": src_tokens.size(0) * src_tokens.size(1), + } diff --git a/SpeechT5/fairseq/tests/speech_recognition/test_collaters.py b/SpeechT5/fairseq/tests/speech_recognition/test_collaters.py new file mode 100644 index 0000000000000000000000000000000000000000..6a5029a48faea2426d7a0277655a2c7c08c1d16c --- /dev/null +++ b/SpeechT5/fairseq/tests/speech_recognition/test_collaters.py @@ -0,0 +1,58 @@ +#!/usr/bin/env python3 +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. 
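# Illustrative sketch, not part of the patch above: the DummyEncoder in
# asr_test_base.py derives its padding mask from src_lengths via fairseq's
# lengths_to_encoder_padding_mask. A minimal stand-alone equivalent, under the
# assumption of a batch-first (B x T) boolean mask that is True at padded
# positions, could look like this:
import torch

def lengths_to_padding_mask_sketch(src_lengths):
    """Return (mask, max_len) with mask[b, t] True wherever t >= src_lengths[b]."""
    max_len = int(src_lengths.max().item())
    positions = torch.arange(max_len).unsqueeze(0)       # 1 x T
    mask = positions >= src_lengths.unsqueeze(1)         # B x T, True = padding
    return mask, max_len

# e.g. src_lengths = torch.tensor([3, 2]) gives
# mask = [[False, False, False], [False, False, True]], max_len = 3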
+ +import unittest + +import numpy as np +import torch +from examples.speech_recognition.data.collaters import Seq2SeqCollater + + +class TestSeq2SeqCollator(unittest.TestCase): + def test_collate(self): + + eos_idx = 1 + pad_idx = 0 + collater = Seq2SeqCollater( + feature_index=0, label_index=1, pad_index=pad_idx, eos_index=eos_idx + ) + + # 2 frames in the first sample and 3 frames in the second one + frames1 = np.array([[7, 8], [9, 10]]) + frames2 = np.array([[1, 2], [3, 4], [5, 6]]) + target1 = np.array([4, 2, 3, eos_idx]) + target2 = np.array([3, 2, eos_idx]) + sample1 = {"id": 0, "data": [frames1, target1]} + sample2 = {"id": 1, "data": [frames2, target2]} + batch = collater.collate([sample1, sample2]) + + # collate sort inputs by frame's length before creating the batch + self.assertTensorEqual(batch["id"], torch.tensor([1, 0])) + self.assertEqual(batch["ntokens"], 7) + self.assertTensorEqual( + batch["net_input"]["src_tokens"], + torch.tensor( + [[[1, 2], [3, 4], [5, 6]], [[7, 8], [9, 10], [pad_idx, pad_idx]]] + ), + ) + self.assertTensorEqual( + batch["net_input"]["prev_output_tokens"], + torch.tensor([[eos_idx, 3, 2, pad_idx], [eos_idx, 4, 2, 3]]), + ) + self.assertTensorEqual(batch["net_input"]["src_lengths"], torch.tensor([3, 2])) + self.assertTensorEqual( + batch["target"], + torch.tensor([[3, 2, eos_idx, pad_idx], [4, 2, 3, eos_idx]]), + ) + self.assertEqual(batch["nsentences"], 2) + + def assertTensorEqual(self, t1, t2): + self.assertEqual(t1.size(), t2.size(), "size mismatch") + self.assertEqual(t1.ne(t2).long().sum(), 0) + + +if __name__ == "__main__": + unittest.main() diff --git a/SpeechT5/fairseq/tests/speech_recognition/test_cross_entropy.py b/SpeechT5/fairseq/tests/speech_recognition/test_cross_entropy.py new file mode 100644 index 0000000000000000000000000000000000000000..b05400ed95e22762c3e3e5e8fd3ebfa6caf1e325 --- /dev/null +++ b/SpeechT5/fairseq/tests/speech_recognition/test_cross_entropy.py @@ -0,0 +1,37 @@ +#!/usr/bin/env python3 +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. 
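# Illustrative sketch, not part of the patch above: the Seq2SeqCollater test
# expects samples sorted by descending frame length, right-padded targets, and
# prev_output_tokens built by rotating the EOS token to the front. A minimal
# stand-alone version of that target construction (assuming every target
# already ends with EOS, as in the test) could look like this:
import torch

def collate_targets(targets, eos_idx=1, pad_idx=0):
    max_len = max(len(t) for t in targets)
    tgt = [list(t) + [pad_idx] * (max_len - len(t)) for t in targets]
    # teacher-forcing input: EOS moved to the front, padding kept at the end
    prev = [[eos_idx] + list(t[:-1]) + [pad_idx] * (max_len - len(t)) for t in targets]
    return torch.tensor(tgt), torch.tensor(prev)

# For the (sorted) targets [3, 2, eos] and [4, 2, 3, eos] this reproduces the
# tensors asserted above (eos=1, pad=0):
#   target             = [[3, 2, 1, 0], [4, 2, 3, 1]]
#   prev_output_tokens = [[1, 3, 2, 0], [1, 4, 2, 3]]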
+ +from examples.speech_recognition.criterions.cross_entropy_acc import ( + CrossEntropyWithAccCriterion, +) + +from .asr_test_base import CrossEntropyCriterionTestBase + + +class CrossEntropyWithAccCriterionTest(CrossEntropyCriterionTestBase): + def setUp(self): + self.criterion_cls = CrossEntropyWithAccCriterion + super().setUp() + + def test_cross_entropy_all_correct(self): + sample = self.get_test_sample(correct=True, soft_target=False, aggregate=False) + loss, sample_size, logging_output = self.criterion( + self.model, sample, "sum", log_probs=True + ) + assert logging_output["correct"] == 20 + assert logging_output["total"] == 20 + assert logging_output["sample_size"] == 20 + assert logging_output["ntokens"] == 20 + + def test_cross_entropy_all_wrong(self): + sample = self.get_test_sample(correct=False, soft_target=False, aggregate=False) + loss, sample_size, logging_output = self.criterion( + self.model, sample, "sum", log_probs=True + ) + assert logging_output["correct"] == 0 + assert logging_output["total"] == 20 + assert logging_output["sample_size"] == 20 + assert logging_output["ntokens"] == 20 diff --git a/SpeechT5/fairseq/tests/speech_recognition/test_data_utils.py b/SpeechT5/fairseq/tests/speech_recognition/test_data_utils.py new file mode 100644 index 0000000000000000000000000000000000000000..a72e0b66948da1349d87eafdef4c4004dd535c96 --- /dev/null +++ b/SpeechT5/fairseq/tests/speech_recognition/test_data_utils.py @@ -0,0 +1,62 @@ +#!/usr/bin/env python3 +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. +import unittest + +import torch +from examples.speech_recognition.data import data_utils + + +class DataUtilsTest(unittest.TestCase): + def test_normalization(self): + sample_len1 = torch.tensor( + [ + [ + -0.7661, + -1.3889, + -2.0972, + -0.9134, + -0.7071, + -0.9765, + -0.8700, + -0.8283, + 0.7512, + 1.3211, + 2.1532, + 2.1174, + 1.2800, + 1.2633, + 1.6147, + 1.6322, + 2.0723, + 3.1522, + 3.2852, + 2.2309, + 2.5569, + 2.2183, + 2.2862, + 1.5886, + 0.8773, + 0.8725, + 1.2662, + 0.9899, + 1.1069, + 1.3926, + 1.2795, + 1.1199, + 1.1477, + 1.2687, + 1.3843, + 1.1903, + 0.8355, + 1.1367, + 1.2639, + 1.4707, + ] + ] + ) + out = data_utils.apply_mv_norm(sample_len1) + assert not torch.isnan(out).any() + assert (out == sample_len1).all() diff --git a/SpeechT5/fairseq/tests/speech_recognition/test_vggtransformer.py b/SpeechT5/fairseq/tests/speech_recognition/test_vggtransformer.py new file mode 100644 index 0000000000000000000000000000000000000000..4dc73b8c7379970dc0bcc16fcb088a64a1bd7e3b --- /dev/null +++ b/SpeechT5/fairseq/tests/speech_recognition/test_vggtransformer.py @@ -0,0 +1,135 @@ +#!/usr/bin/env python3 + +# import models/encoder/decoder to be tested +from examples.speech_recognition.models.vggtransformer import ( + TransformerDecoder, + VGGTransformerEncoder, + VGGTransformerModel, + vggtransformer_1, + vggtransformer_2, + vggtransformer_base, +) + +# import base test class +from .asr_test_base import ( + DEFAULT_TEST_VOCAB_SIZE, + TestFairseqDecoderBase, + TestFairseqEncoderBase, + TestFairseqEncoderDecoderModelBase, + get_dummy_dictionary, + get_dummy_encoder_output, + get_dummy_input, +) + + +class VGGTransformerModelTest_mid(TestFairseqEncoderDecoderModelBase): + def setUp(self): + def override_config(args): + """ + vggtrasformer_1 use 14 layers of transformer, + for testing purpose, it is too expensive. 
For fast turn-around + test, reduce the number of layers to 3. + """ + args.transformer_enc_config = ( + "((1024, 16, 4096, True, 0.15, 0.15, 0.15),) * 3" + ) + + super().setUp() + extra_args_setter = [vggtransformer_1, override_config] + + self.setUpModel(VGGTransformerModel, extra_args_setter) + self.setUpInput(get_dummy_input(T=50, D=80, B=5, K=DEFAULT_TEST_VOCAB_SIZE)) + + +class VGGTransformerModelTest_big(TestFairseqEncoderDecoderModelBase): + def setUp(self): + def override_config(args): + """ + vggtrasformer_2 use 16 layers of transformer, + for testing purpose, it is too expensive. For fast turn-around + test, reduce the number of layers to 3. + """ + args.transformer_enc_config = ( + "((1024, 16, 4096, True, 0.15, 0.15, 0.15),) * 3" + ) + + super().setUp() + extra_args_setter = [vggtransformer_2, override_config] + + self.setUpModel(VGGTransformerModel, extra_args_setter) + self.setUpInput(get_dummy_input(T=50, D=80, B=5, K=DEFAULT_TEST_VOCAB_SIZE)) + + +class VGGTransformerModelTest_base(TestFairseqEncoderDecoderModelBase): + def setUp(self): + def override_config(args): + """ + vggtrasformer_base use 12 layers of transformer, + for testing purpose, it is too expensive. For fast turn-around + test, reduce the number of layers to 3. + """ + args.transformer_enc_config = ( + "((512, 8, 2048, True, 0.15, 0.15, 0.15),) * 3" + ) + + super().setUp() + extra_args_setter = [vggtransformer_base, override_config] + + self.setUpModel(VGGTransformerModel, extra_args_setter) + self.setUpInput(get_dummy_input(T=50, D=80, B=5, K=DEFAULT_TEST_VOCAB_SIZE)) + + +class VGGTransformerEncoderTest(TestFairseqEncoderBase): + def setUp(self): + super().setUp() + + self.setUpInput(get_dummy_input(T=50, D=80, B=5)) + + def test_forward(self): + print("1. test standard vggtransformer") + self.setUpEncoder(VGGTransformerEncoder(input_feat_per_channel=80)) + super().test_forward() + print("2. test vggtransformer with limited right context") + self.setUpEncoder( + VGGTransformerEncoder( + input_feat_per_channel=80, transformer_context=(-1, 5) + ) + ) + super().test_forward() + print("3. test vggtransformer with limited left context") + self.setUpEncoder( + VGGTransformerEncoder( + input_feat_per_channel=80, transformer_context=(5, -1) + ) + ) + super().test_forward() + print("4. test vggtransformer with limited right context and sampling") + self.setUpEncoder( + VGGTransformerEncoder( + input_feat_per_channel=80, + transformer_context=(-1, 12), + transformer_sampling=(2, 2), + ) + ) + super().test_forward() + print("5. test vggtransformer with windowed context and sampling") + self.setUpEncoder( + VGGTransformerEncoder( + input_feat_per_channel=80, + transformer_context=(12, 12), + transformer_sampling=(2, 2), + ) + ) + + +class TransformerDecoderTest(TestFairseqDecoderBase): + def setUp(self): + super().setUp() + + dict = get_dummy_dictionary(vocab_size=DEFAULT_TEST_VOCAB_SIZE) + decoder = TransformerDecoder(dict) + dummy_encoder_output = get_dummy_encoder_output(encoder_out_shape=(50, 5, 256)) + + self.setUpDecoder(decoder) + self.setUpInput(dummy_encoder_output) + self.setUpPrevOutputTokens() diff --git a/SpeechT5/fairseq/tests/test_activation_checkpointing.py b/SpeechT5/fairseq/tests/test_activation_checkpointing.py new file mode 100644 index 0000000000000000000000000000000000000000..647a9572886f8aff09a4aadc0b21e1d5817ff38e --- /dev/null +++ b/SpeechT5/fairseq/tests/test_activation_checkpointing.py @@ -0,0 +1,79 @@ +# Copyright (c) Facebook, Inc. and its affiliates. 
+# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +import unittest + +import torch +import torch.nn as nn +from fairseq.modules.checkpoint_activations import checkpoint_wrapper +from torch.utils.checkpoint import checkpoint + + +class Model(nn.Module): + def __init__( + self, use_pytorch_checkpoint=False, use_fairseq_checkpoint=False, **kwargs + ): + super().__init__() + torch.manual_seed(0) + self.use_pytorch_checkpoint = use_pytorch_checkpoint + self.ffn = nn.Sequential( + nn.Linear(32, 128), + # add a Dropout layer to test RNG save/restore + nn.Dropout(p=0.5), + nn.Linear(128, 32), + ) + if use_fairseq_checkpoint: + self.ffn = checkpoint_wrapper(self.ffn, **kwargs) + self.out = nn.Linear(32, 1) + + def forward(self, x): + if self.use_pytorch_checkpoint: + x = checkpoint(self.ffn, x) + else: + x = self.ffn(x) + return self.out(x) + + +class TestActivationCheckpointing(unittest.TestCase): + def _test_checkpoint_wrapper(self, device, log_memory_usage=False): + def get_loss_and_gnorm(model): + torch.manual_seed(1) + input = torch.rand(2, 16, 32).requires_grad_(True).to(device) + model.zero_grad() + loss = model(input).sum() + loss.backward() + gnorm = torch.norm( + torch.stack([torch.norm(p.grad.detach()) for p in model.parameters()]) + ) + return {"loss": loss, "gnorm": gnorm} + + model = Model().to(device) + no_cpt = get_loss_and_gnorm(model) + + model = Model(use_pytorch_checkpoint=True).to(device) + pyt_cpt = get_loss_and_gnorm(model) + torch.testing.assert_allclose(no_cpt["loss"], pyt_cpt["loss"]) + torch.testing.assert_allclose(no_cpt["gnorm"], pyt_cpt["gnorm"]) + + model = Model(use_fairseq_checkpoint=True).to(device) + fairseq_cpt = get_loss_and_gnorm(model) + torch.testing.assert_allclose(no_cpt["loss"], fairseq_cpt["loss"]) + torch.testing.assert_allclose(no_cpt["gnorm"], fairseq_cpt["gnorm"]) + + model = Model(use_fairseq_checkpoint=True, offload_to_cpu=True).to(device) + fairseq_cpt_offload = get_loss_and_gnorm(model) + torch.testing.assert_allclose(no_cpt["loss"], fairseq_cpt_offload["loss"]) + torch.testing.assert_allclose(no_cpt["gnorm"], fairseq_cpt_offload["gnorm"]) + + def test_checkpoint_wrapper_cpu(self): + self._test_checkpoint_wrapper(device=torch.device("cpu")) + + @unittest.skipIf(not torch.cuda.is_available(), "test requires a GPU") + def test_checkpoint_wrapper_cuda(self): + self._test_checkpoint_wrapper(device=torch.device("cuda")) + + +if __name__ == "__main__": + unittest.main() diff --git a/SpeechT5/fairseq/tests/test_amp_optimizer.py b/SpeechT5/fairseq/tests/test_amp_optimizer.py new file mode 100644 index 0000000000000000000000000000000000000000..3a785e1830e91b7e090e841d428fe4ea61f3a65c --- /dev/null +++ b/SpeechT5/fairseq/tests/test_amp_optimizer.py @@ -0,0 +1,78 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. 
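# Illustrative sketch, not part of the patch above: the activation-checkpointing
# test asserts that checkpointing a sub-module leaves the loss (and gradient
# norm) unchanged. A minimal stand-alone comparison using torch.utils.checkpoint
# with hypothetical layer sizes could look like this:
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint

torch.manual_seed(0)
ffn = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 8))
x = torch.rand(4, 8, requires_grad=True)   # checkpoint needs a grad-requiring input

loss_plain = ffn(x).sum()
loss_ckpt = checkpoint(ffn, x).sum()       # activations are recomputed in backward
assert torch.allclose(loss_plain, loss_ckpt)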
+ +import argparse +import copy +import unittest + +import torch +from torch.cuda.amp import autocast, GradScaler +from fairseq.optim import build_optimizer + + +@unittest.skipIf(not torch.cuda.is_available(), "test requires a GPU") +class TestGradientScalingAMP(unittest.TestCase): + def setUp(self): + self.x = torch.tensor([2.0]).cuda().half() + weight = 3.0 + bias = 5.0 + self.error = 1.0 + self.target = torch.tensor([self.x * weight + bias + self.error]).cuda() + self.loss_fn = torch.nn.L1Loss() + + self.model = torch.nn.Linear(1, 1) + self.model.weight.data = torch.tensor([[weight]]) + self.model.bias.data = torch.tensor([bias]) + self.model.cuda() + self.params = list(self.model.parameters()) + + self.namespace_dls = argparse.Namespace( + optimizer="adam", + lr=[0.1], + adam_betas="(0.9, 0.999)", + adam_eps=1e-8, + weight_decay=0.0, + threshold_loss_scale=1, + min_loss_scale=1e-4, + ) + self.scaler = GradScaler( + init_scale=1, + growth_interval=1, + ) + + def run_iter(self, model, params, optimizer): + optimizer.zero_grad() + with autocast(): + y = model(self.x) + loss = self.loss_fn(y, self.target) + self.scaler.scale(loss).backward() + self.assertEqual(loss, torch.tensor(1.0, device="cuda:0", dtype=torch.float16)) + + self.scaler.unscale_(optimizer) + grad_norm = optimizer.clip_grad_norm(0) + self.assertAlmostEqual(grad_norm.item(), 2.2361, 4) + + self.scaler.step(optimizer) + self.scaler.update() + self.assertEqual( + model.weight, + torch.tensor( + [[3.1]], device="cuda:0", requires_grad=True + ), + ) + self.assertEqual( + model.bias, + torch.tensor( + [5.1], device="cuda:0", requires_grad=True + ), + ) + self.assertEqual(self.scaler.get_scale(), 2.0) + + def test_automatic_mixed_precision(self): + model = copy.deepcopy(self.model) + params = list(model.parameters()) + optimizer = build_optimizer(self.namespace_dls, params) + + self.run_iter(model, params, optimizer) diff --git a/SpeechT5/fairseq/tests/test_average_checkpoints.py b/SpeechT5/fairseq/tests/test_average_checkpoints.py new file mode 100644 index 0000000000000000000000000000000000000000..f348b56b869372d8434fe03f13324d78e9093fa2 --- /dev/null +++ b/SpeechT5/fairseq/tests/test_average_checkpoints.py @@ -0,0 +1,134 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. 
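# Illustrative sketch, not part of the patch above: the AMP test above follows
# the standard GradScaler recipe -- autocast forward, scaled backward, unscale
# before gradient clipping, then step/update. A minimal CUDA sketch with a
# plain torch.optim.SGD optimizer (a stand-in for fairseq's build_optimizer)
# could look like this:
import torch
from torch.cuda.amp import autocast, GradScaler

if torch.cuda.is_available():
    model = torch.nn.Linear(1, 1).cuda()
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
    scaler = GradScaler(init_scale=1, growth_interval=1)
    x = torch.tensor([[2.0]], device="cuda")
    target = torch.tensor([[12.0]], device="cuda")

    optimizer.zero_grad()
    with autocast():                                       # fp16 forward pass
        loss = torch.nn.functional.l1_loss(model(x), target)
    scaler.scale(loss).backward()                          # scaled backward pass
    scaler.unscale_(optimizer)                             # clip on true gradients
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
    scaler.step(optimizer)                                 # skipped on inf/nan grads
    scaler.update()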
+ +import collections +import os +import shutil +import tempfile +import unittest + +import numpy as np +import torch +from scripts.average_checkpoints import average_checkpoints +from torch import nn + + +class ModelWithSharedParameter(nn.Module): + def __init__(self): + super(ModelWithSharedParameter, self).__init__() + self.embedding = nn.Embedding(1000, 200) + self.FC1 = nn.Linear(200, 200) + self.FC2 = nn.Linear(200, 200) + # tie weight in FC2 to FC1 + self.FC2.weight = nn.Parameter(self.FC1.weight) + self.FC2.bias = nn.Parameter(self.FC1.bias) + + self.relu = nn.ReLU() + + def forward(self, input): + return self.FC2(self.ReLU(self.FC1(input))) + self.FC1(input) + + +class TestAverageCheckpoints(unittest.TestCase): + def test_average_checkpoints(self): + params_0 = collections.OrderedDict( + [ + ("a", torch.DoubleTensor([100.0])), + ("b", torch.FloatTensor([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])), + ("c", torch.IntTensor([7, 8, 9])), + ] + ) + params_1 = collections.OrderedDict( + [ + ("a", torch.DoubleTensor([1.0])), + ("b", torch.FloatTensor([[1.0, 1.0, 1.0], [1.0, 1.0, 1.0]])), + ("c", torch.IntTensor([2, 2, 2])), + ] + ) + params_avg = collections.OrderedDict( + [ + ("a", torch.DoubleTensor([50.5])), + ("b", torch.FloatTensor([[1.0, 1.5, 2.0], [2.5, 3.0, 3.5]])), + # We expect truncation for integer division + ("c", torch.IntTensor([4, 5, 5])), + ] + ) + + fd_0, path_0 = tempfile.mkstemp() + fd_1, path_1 = tempfile.mkstemp() + torch.save(collections.OrderedDict([("model", params_0)]), path_0) + torch.save(collections.OrderedDict([("model", params_1)]), path_1) + + output = average_checkpoints([path_0, path_1])["model"] + + os.close(fd_0) + os.remove(path_0) + os.close(fd_1) + os.remove(path_1) + + for (k_expected, v_expected), (k_out, v_out) in zip( + params_avg.items(), output.items() + ): + self.assertEqual( + k_expected, + k_out, + "Key mismatch - expected {} but found {}. 
" + "(Expected list of keys: {} vs actual list of keys: {})".format( + k_expected, k_out, params_avg.keys(), output.keys() + ), + ) + np.testing.assert_allclose( + v_expected.numpy(), + v_out.numpy(), + err_msg="Tensor value mismatch for key {}".format(k_expected), + ) + + def test_average_checkpoints_with_shared_parameters(self): + def _construct_model_with_shared_parameters(path, value): + m = ModelWithSharedParameter() + nn.init.constant_(m.FC1.weight, value) + torch.save({"model": m.state_dict()}, path) + return m + + tmpdir = tempfile.mkdtemp() + paths = [] + path = os.path.join(tmpdir, "m1.pt") + m1 = _construct_model_with_shared_parameters(path, 1.0) + paths.append(path) + + path = os.path.join(tmpdir, "m2.pt") + m2 = _construct_model_with_shared_parameters(path, 2.0) + paths.append(path) + + path = os.path.join(tmpdir, "m3.pt") + m3 = _construct_model_with_shared_parameters(path, 3.0) + paths.append(path) + + new_model = average_checkpoints(paths) + self.assertTrue( + torch.equal( + new_model["model"]["embedding.weight"], + (m1.embedding.weight + m2.embedding.weight + m3.embedding.weight) / 3.0, + ) + ) + + self.assertTrue( + torch.equal( + new_model["model"]["FC1.weight"], + (m1.FC1.weight + m2.FC1.weight + m3.FC1.weight) / 3.0, + ) + ) + + self.assertTrue( + torch.equal( + new_model["model"]["FC2.weight"], + (m1.FC2.weight + m2.FC2.weight + m3.FC2.weight) / 3.0, + ) + ) + shutil.rmtree(tmpdir) + + +if __name__ == "__main__": + unittest.main() diff --git a/SpeechT5/fairseq/tests/test_backtranslation_dataset.py b/SpeechT5/fairseq/tests/test_backtranslation_dataset.py new file mode 100644 index 0000000000000000000000000000000000000000..dffc3b49387dfdc046ea23d7db179377040b7cbc --- /dev/null +++ b/SpeechT5/fairseq/tests/test_backtranslation_dataset.py @@ -0,0 +1,123 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. 
+ +import unittest + +import tests.utils as test_utils +import torch +from fairseq.data import ( + BacktranslationDataset, + LanguagePairDataset, + TransformEosDataset, +) +from fairseq.sequence_generator import SequenceGenerator + + +class TestBacktranslationDataset(unittest.TestCase): + def setUp(self): + ( + self.tgt_dict, + self.w1, + self.w2, + self.src_tokens, + self.src_lengths, + self.model, + ) = test_utils.sequence_generator_setup() + + dummy_src_samples = self.src_tokens + + self.tgt_dataset = test_utils.TestDataset(data=dummy_src_samples) + self.cuda = torch.cuda.is_available() + + def _backtranslation_dataset_helper( + self, + remove_eos_from_input_src, + remove_eos_from_output_src, + ): + tgt_dataset = LanguagePairDataset( + src=self.tgt_dataset, + src_sizes=self.tgt_dataset.sizes, + src_dict=self.tgt_dict, + tgt=None, + tgt_sizes=None, + tgt_dict=None, + ) + + generator = SequenceGenerator( + [self.model], + tgt_dict=self.tgt_dict, + max_len_a=0, + max_len_b=200, + beam_size=2, + unk_penalty=0, + ) + + backtranslation_dataset = BacktranslationDataset( + tgt_dataset=TransformEosDataset( + dataset=tgt_dataset, + eos=self.tgt_dict.eos(), + # remove eos from the input src + remove_eos_from_src=remove_eos_from_input_src, + ), + src_dict=self.tgt_dict, + backtranslation_fn=( + lambda sample: generator.generate([self.model], sample) + ), + output_collater=TransformEosDataset( + dataset=tgt_dataset, + eos=self.tgt_dict.eos(), + # if we remove eos from the input src, then we need to add it + # back to the output tgt + append_eos_to_tgt=remove_eos_from_input_src, + remove_eos_from_src=remove_eos_from_output_src, + ).collater, + cuda=self.cuda, + ) + dataloader = torch.utils.data.DataLoader( + backtranslation_dataset, + batch_size=2, + collate_fn=backtranslation_dataset.collater, + ) + backtranslation_batch_result = next(iter(dataloader)) + + eos, pad, w1, w2 = self.tgt_dict.eos(), self.tgt_dict.pad(), self.w1, self.w2 + + # Note that we sort by src_lengths and add left padding, so actually + # ids will look like: [1, 0] + expected_src = torch.LongTensor([[w1, w2, w1, eos], [pad, pad, w1, eos]]) + if remove_eos_from_output_src: + expected_src = expected_src[:, :-1] + expected_tgt = torch.LongTensor([[w1, w2, eos], [w1, w2, eos]]) + generated_src = backtranslation_batch_result["net_input"]["src_tokens"] + tgt_tokens = backtranslation_batch_result["target"] + + self.assertTensorEqual(expected_src, generated_src) + self.assertTensorEqual(expected_tgt, tgt_tokens) + + def test_backtranslation_dataset_no_eos_in_output_src(self): + self._backtranslation_dataset_helper( + remove_eos_from_input_src=False, + remove_eos_from_output_src=True, + ) + + def test_backtranslation_dataset_with_eos_in_output_src(self): + self._backtranslation_dataset_helper( + remove_eos_from_input_src=False, + remove_eos_from_output_src=False, + ) + + def test_backtranslation_dataset_no_eos_in_input_src(self): + self._backtranslation_dataset_helper( + remove_eos_from_input_src=True, + remove_eos_from_output_src=False, + ) + + def assertTensorEqual(self, t1, t2): + self.assertEqual(t1.size(), t2.size(), "size mismatch") + self.assertEqual(t1.ne(t2).long().sum(), 0) + + +if __name__ == "__main__": + unittest.main() diff --git a/SpeechT5/fairseq/tests/test_binaries.py b/SpeechT5/fairseq/tests/test_binaries.py new file mode 100644 index 0000000000000000000000000000000000000000..4e207742625427f108f78bcd24d487a081b6ccf7 --- /dev/null +++ b/SpeechT5/fairseq/tests/test_binaries.py @@ -0,0 +1,1874 @@ +# Copyright (c) Facebook, 
Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +import contextlib +import logging +import json +import os +import random +import sys +import tempfile +import unittest +from io import StringIO +from typing import List, Dict +import torch +from fairseq import options +from fairseq_cli import eval_lm, train +from tests.utils import ( + create_dummy_data, + generate_main, + preprocess_lm_data, + preprocess_summarization_data, + preprocess_translation_data, + create_laser_data_and_config_json, + train_translation_model, + train_language_model, +) + + +try: + import transformers # noqa + + has_hf_transformers = True +except ImportError: + has_hf_transformers = False + + +class TestTranslation(unittest.TestCase): + def setUp(self): + logging.disable(logging.CRITICAL) + + def tearDown(self): + logging.disable(logging.NOTSET) + + def test_fconv(self): + with contextlib.redirect_stdout(StringIO()): + with tempfile.TemporaryDirectory("test_fconv") as data_dir: + create_dummy_data(data_dir) + preprocess_translation_data(data_dir) + train_translation_model(data_dir, "fconv_iwslt_de_en") + generate_main(data_dir) + + def test_raw(self): + with contextlib.redirect_stdout(StringIO()): + with tempfile.TemporaryDirectory("test_fconv_raw") as data_dir: + create_dummy_data(data_dir) + preprocess_translation_data(data_dir, ["--dataset-impl", "raw"]) + train_translation_model( + data_dir, "fconv_iwslt_de_en", ["--dataset-impl", "raw"] + ) + generate_main(data_dir, ["--dataset-impl", "raw"]) + + def test_update_freq(self): + with contextlib.redirect_stdout(StringIO()): + with tempfile.TemporaryDirectory("test_update_freq") as data_dir: + create_dummy_data(data_dir) + preprocess_translation_data(data_dir) + train_translation_model( + data_dir, "fconv_iwslt_de_en", ["--update-freq", "3"] + ) + generate_main(data_dir) + + def test_max_positions(self): + with contextlib.redirect_stdout(StringIO()): + with tempfile.TemporaryDirectory("test_max_positions") as data_dir: + create_dummy_data(data_dir) + preprocess_translation_data(data_dir) + with self.assertRaises(Exception) as context: + train_translation_model( + data_dir, + "fconv_iwslt_de_en", + ["--max-target-positions", "5"], + ) + self.assertTrue( + "skip this example with --skip-invalid-size-inputs-valid-test" + in str(context.exception) + ) + train_translation_model( + data_dir, + "fconv_iwslt_de_en", + [ + "--max-target-positions", + "5", + "--skip-invalid-size-inputs-valid-test", + ], + ) + with self.assertRaises(Exception) as context: + generate_main(data_dir) + generate_main(data_dir, ["--skip-invalid-size-inputs-valid-test"]) + + def test_generation(self): + with contextlib.redirect_stdout(StringIO()): + with tempfile.TemporaryDirectory("test_sampling") as data_dir: + create_dummy_data(data_dir) + preprocess_translation_data(data_dir) + train_translation_model(data_dir, "fconv_iwslt_de_en") + generate_main( + data_dir, + [ + "--sampling", + "--temperature", + "2", + "--beam", + "2", + "--nbest", + "2", + ], + ) + generate_main( + data_dir, + [ + "--sampling", + "--sampling-topk", + "3", + "--beam", + "2", + "--nbest", + "2", + ], + ) + generate_main( + data_dir, + [ + "--sampling", + "--sampling-topp", + "0.2", + "--beam", + "2", + "--nbest", + "2", + ], + ) + generate_main( + data_dir, + [ + "--diversity-rate", + "0.5", + "--beam", + "6", + ], + ) + with self.assertRaises(ValueError): + generate_main( + data_dir, + [ + "--diverse-beam-groups", + 
"4", + "--match-source-len", + ], + ) + generate_main(data_dir, ["--prefix-size", "2"]) + generate_main(data_dir, ["--retain-dropout"]) + + def test_eval_bleu(self): + with contextlib.redirect_stdout(StringIO()): + with tempfile.TemporaryDirectory("test_eval_bleu") as data_dir: + create_dummy_data(data_dir) + preprocess_translation_data(data_dir) + train_translation_model( + data_dir, + "fconv_iwslt_de_en", + [ + "--eval-bleu", + "--eval-bleu-print-samples", + "--eval-bleu-remove-bpe", + "--eval-bleu-detok", + "space", + "--eval-bleu-args", + '{"beam": 4, "min_len": 10}', + ], + ) + + def test_lstm(self): + with contextlib.redirect_stdout(StringIO()): + with tempfile.TemporaryDirectory("test_lstm") as data_dir: + create_dummy_data(data_dir) + preprocess_translation_data(data_dir) + train_translation_model( + data_dir, + "lstm_wiseman_iwslt_de_en", + [ + "--encoder-layers", + "2", + "--decoder-layers", + "2", + "--encoder-embed-dim", + "8", + "--decoder-embed-dim", + "8", + "--decoder-out-embed-dim", + "8", + ], + ) + generate_main(data_dir) + + def test_lstm_bidirectional(self): + with contextlib.redirect_stdout(StringIO()): + with tempfile.TemporaryDirectory("test_lstm_bidirectional") as data_dir: + create_dummy_data(data_dir) + preprocess_translation_data(data_dir) + train_translation_model( + data_dir, + "lstm", + [ + "--encoder-layers", + "2", + "--encoder-bidirectional", + "--encoder-hidden-size", + "16", + "--encoder-embed-dim", + "8", + "--decoder-embed-dim", + "8", + "--decoder-out-embed-dim", + "8", + "--decoder-layers", + "2", + ], + ) + generate_main(data_dir) + + def test_transformer(self): + with contextlib.redirect_stdout(StringIO()): + with tempfile.TemporaryDirectory("test_transformer") as data_dir: + create_dummy_data(data_dir) + preprocess_translation_data(data_dir) + train_translation_model( + data_dir, + "transformer_iwslt_de_en", + [ + "--encoder-layers", + "2", + "--decoder-layers", + "2", + "--encoder-embed-dim", + "8", + "--decoder-embed-dim", + "8", + ], + run_validation=True, + ) + generate_main(data_dir) + + def test_multilingual_transformer(self): + # test with all combinations of encoder/decoder lang tokens + encoder_langtok_flags = [ + [], + ["--encoder-langtok", "src"], + ["--encoder-langtok", "tgt"], + ] + decoder_langtok_flags = [[], ["--decoder-langtok"]] + with contextlib.redirect_stdout(StringIO()): + for i in range(len(encoder_langtok_flags)): + for j in range(len(decoder_langtok_flags)): + enc_ltok_flag = encoder_langtok_flags[i] + dec_ltok_flag = decoder_langtok_flags[j] + with tempfile.TemporaryDirectory( + f"test_multilingual_transformer_{i}_{j}" + ) as data_dir: + create_dummy_data(data_dir) + preprocess_translation_data(data_dir) + train_translation_model( + data_dir, + arch="multilingual_transformer", + task="multilingual_translation", + extra_flags=[ + "--encoder-layers", + "2", + "--decoder-layers", + "2", + "--encoder-embed-dim", + "8", + "--decoder-embed-dim", + "8", + ] + + enc_ltok_flag + + dec_ltok_flag, + lang_flags=["--lang-pairs", "in-out,out-in"], + run_validation=True, + extra_valid_flags=enc_ltok_flag + dec_ltok_flag, + ) + generate_main( + data_dir, + extra_flags=[ + "--task", + "multilingual_translation", + "--lang-pairs", + "in-out,out-in", + "--source-lang", + "in", + "--target-lang", + "out", + ] + + enc_ltok_flag + + dec_ltok_flag, + ) + + @unittest.skipIf( + sys.platform.lower() == "darwin", "skip latent depth test on MacOS" + ) + def test_multilingual_translation_latent_depth(self): + # test with latent depth in encoder, 
decoder, or both + encoder_latent_layer = [[], ["--encoder-latent-layer"]] + decoder_latent_layer = [[], ["--decoder-latent-layer"]] + with contextlib.redirect_stdout(StringIO()): + for i in range(len(encoder_latent_layer)): + for j in range(len(decoder_latent_layer)): + if i == 0 and j == 0: + continue + enc_ll_flag = encoder_latent_layer[i] + dec_ll_flag = decoder_latent_layer[j] + with tempfile.TemporaryDirectory( + f"test_multilingual_translation_latent_depth_{i}_{j}" + ) as data_dir: + create_dummy_data(data_dir) + preprocess_translation_data( + data_dir, extra_flags=["--joined-dictionary"] + ) + train_translation_model( + data_dir, + arch="latent_multilingual_transformer", + task="multilingual_translation_latent_depth", + extra_flags=[ + "--user-dir", + "examples/latent_depth/latent_depth_src", + "--encoder-layers", + "2", + "--decoder-layers", + "2", + "--encoder-embed-dim", + "8", + "--decoder-embed-dim", + "8", + "--share-encoders", + "--share-decoders", + "--sparsity-weight", + "0.1", + ] + + enc_ll_flag + + dec_ll_flag, + lang_flags=["--lang-pairs", "in-out,out-in"], + run_validation=True, + extra_valid_flags=[ + "--user-dir", + "examples/latent_depth/latent_depth_src", + ] + + enc_ll_flag + + dec_ll_flag, + ) + generate_main( + data_dir, + extra_flags=[ + "--user-dir", + "examples/latent_depth/latent_depth_src", + "--task", + "multilingual_translation_latent_depth", + "--lang-pairs", + "in-out,out-in", + "--source-lang", + "in", + "--target-lang", + "out", + ] + + enc_ll_flag + + dec_ll_flag, + ) + + def test_translation_multi_simple_epoch(self): + # test with all combinations of encoder/decoder lang tokens + encoder_langtok_flags = [ + [], + ["--encoder-langtok", "src"], + ["--encoder-langtok", "tgt"], + ] + decoder_langtok_flags = [[], ["--decoder-langtok"]] + with contextlib.redirect_stdout(StringIO()): + for i in range(len(encoder_langtok_flags)): + for j in range(len(decoder_langtok_flags)): + enc_ltok_flag = encoder_langtok_flags[i] + dec_ltok_flag = decoder_langtok_flags[j] + with tempfile.TemporaryDirectory( + f"test_translation_multi_simple_epoch_{i}_{j}" + ) as data_dir: + create_dummy_data(data_dir) + preprocess_translation_data( + data_dir, extra_flags=["--joined-dictionary"] + ) + train_translation_model( + data_dir, + arch="transformer", + task="translation_multi_simple_epoch", + extra_flags=[ + "--encoder-layers", + "2", + "--decoder-layers", + "2", + "--encoder-embed-dim", + "8", + "--decoder-embed-dim", + "8", + "--sampling-method", + "temperature", + "--sampling-temperature", + "1.5", + "--virtual-epoch-size", + "1000", + ] + + enc_ltok_flag + + dec_ltok_flag, + lang_flags=["--lang-pairs", "in-out,out-in"], + run_validation=True, + extra_valid_flags=enc_ltok_flag + dec_ltok_flag, + ) + generate_main( + data_dir, + extra_flags=[ + "--task", + "translation_multi_simple_epoch", + "--lang-pairs", + "in-out,out-in", + "--source-lang", + "in", + "--target-lang", + "out", + ] + + enc_ltok_flag + + dec_ltok_flag, + ) + + def test_translation_multi_simple_epoch_no_vepoch(self): + # test with all combinations of encoder/decoder lang tokens + with contextlib.redirect_stdout(StringIO()): + enc_ltok_flag = ["--encoder-langtok", "src"] + dec_ltok_flag = ["--decoder-langtok"] + with tempfile.TemporaryDirectory( + "test_translation_multi_simple_epoch_dict" + ) as data_dir: + create_dummy_data(data_dir) + preprocess_translation_data(data_dir, extra_flags=[]) + train_translation_model( + data_dir, + arch="transformer", + task="translation_multi_simple_epoch", + extra_flags=[ + 
"--encoder-layers", + "2", + "--decoder-layers", + "2", + "--encoder-embed-dim", + "8", + "--decoder-embed-dim", + "8", + "--sampling-method", + "temperature", + "--sampling-temperature", + "1.5", + ] + + enc_ltok_flag + + dec_ltok_flag, + lang_flags=["--lang-pairs", "in-out"], + run_validation=True, + extra_valid_flags=enc_ltok_flag + dec_ltok_flag, + ) + generate_main( + data_dir, + extra_flags=[ + "--task", + "translation_multi_simple_epoch", + "--lang-pairs", + "in-out", + "--source-lang", + "in", + "--target-lang", + "out", + ] + + enc_ltok_flag + + dec_ltok_flag, + ) + + def test_translation_multi_simple_epoch_dicts(self): + # test with all combinations of encoder/decoder lang tokens + with contextlib.redirect_stdout(StringIO()): + enc_ltok_flag = ["--encoder-langtok", "src"] + dec_ltok_flag = ["--decoder-langtok"] + with tempfile.TemporaryDirectory( + "test_translation_multi_simple_epoch_dict" + ) as data_dir: + create_dummy_data(data_dir) + preprocess_translation_data(data_dir, extra_flags=[]) + train_translation_model( + data_dir, + arch="transformer", + task="translation_multi_simple_epoch", + extra_flags=[ + "--encoder-layers", + "2", + "--decoder-layers", + "2", + "--encoder-embed-dim", + "8", + "--decoder-embed-dim", + "8", + "--sampling-method", + "temperature", + "--sampling-temperature", + "1.5", + "--virtual-epoch-size", + "1000", + ] + + enc_ltok_flag + + dec_ltok_flag, + lang_flags=["--lang-pairs", "in-out"], + run_validation=True, + extra_valid_flags=enc_ltok_flag + dec_ltok_flag, + ) + generate_main( + data_dir, + extra_flags=[ + "--task", + "translation_multi_simple_epoch", + "--lang-pairs", + "in-out", + "--source-lang", + "in", + "--target-lang", + "out", + ] + + enc_ltok_flag + + dec_ltok_flag, + ) + + def test_translation_multi_simple_epoch_src_tgt_dict_spec(self): + # test the specification of explicit --src-dict and --tgt-dict + with contextlib.redirect_stdout(StringIO()): + enc_ltok_flag = ["--encoder-langtok", "src"] + dec_ltok_flag = ["--decoder-langtok"] + with tempfile.TemporaryDirectory( + "test_translation_multi_simple_epoch_dict" + ) as data_dir: + create_dummy_data(data_dir) + preprocess_translation_data(data_dir, extra_flags=[]) + train_translation_model( + data_dir, + arch="transformer", + task="translation_multi_simple_epoch", + extra_flags=[ + "--source-dict", + f"{data_dir}/dict.in.txt", + "--target-dict", + f"{data_dir}/dict.out.txt", + "--encoder-layers", + "2", + "--decoder-layers", + "2", + "--encoder-embed-dim", + "8", + "--decoder-embed-dim", + "8", + "--sampling-method", + "temperature", + "--sampling-temperature", + "1.5", + "--virtual-epoch-size", + "1000", + ] + + enc_ltok_flag + + dec_ltok_flag, + lang_flags=["--lang-pairs", "in-out"], + run_validation=True, + extra_valid_flags=enc_ltok_flag + dec_ltok_flag, + ) + generate_main( + data_dir, + extra_flags=[ + "--task", + "translation_multi_simple_epoch", + "--lang-pairs", + "in-out", + "--source-lang", + "in", + "--target-lang", + "out", + ] + + enc_ltok_flag + + dec_ltok_flag, + ) + + def test_transformer_cross_self_attention(self): + with contextlib.redirect_stdout(StringIO()): + with tempfile.TemporaryDirectory( + "test_transformer_cross_self_attention" + ) as data_dir: + create_dummy_data(data_dir) + preprocess_translation_data(data_dir) + train_translation_model( + data_dir, + "transformer_iwslt_de_en", + [ + "--encoder-layers", + "2", + "--decoder-layers", + "2", + "--encoder-embed-dim", + "8", + "--decoder-embed-dim", + "8", + "--decoder-embed-dim", + "8", + "--no-cross-attention", 
+ "--cross-self-attention", + ], + run_validation=True, + ) + generate_main(data_dir, extra_flags=[]) + + def test_transformer_pointer_generator(self): + with contextlib.redirect_stdout(StringIO()): + with tempfile.TemporaryDirectory( + "test_transformer_pointer_generator" + ) as data_dir: + create_dummy_data(data_dir) + preprocess_summarization_data(data_dir) + train_translation_model( + data_dir, + "transformer_pointer_generator", + extra_flags=[ + "--user-dir", + "examples/pointer_generator/pointer_generator_src", + "--encoder-layers", + "2", + "--decoder-layers", + "2", + "--encoder-embed-dim", + "8", + "--decoder-embed-dim", + "8", + "--alignment-layer", + "-1", + "--alignment-heads", + "1", + "--source-position-markers", + "0", + ], + run_validation=True, + extra_valid_flags=[ + "--user-dir", + "examples/pointer_generator/pointer_generator_src", + ], + ) + generate_main( + data_dir, + extra_flags=[ + "--user-dir", + "examples/pointer_generator/pointer_generator_src", + ], + ) + + def test_lightconv(self): + with contextlib.redirect_stdout(StringIO()): + with tempfile.TemporaryDirectory("test_lightconv") as data_dir: + create_dummy_data(data_dir) + preprocess_translation_data(data_dir) + train_translation_model( + data_dir, + "lightconv_iwslt_de_en", + [ + "--encoder-conv-type", + "lightweight", + "--decoder-conv-type", + "lightweight", + "--encoder-embed-dim", + "8", + "--decoder-embed-dim", + "8", + ], + ) + generate_main(data_dir) + + def test_dynamicconv(self): + with contextlib.redirect_stdout(StringIO()): + with tempfile.TemporaryDirectory("test_dynamicconv") as data_dir: + create_dummy_data(data_dir) + preprocess_translation_data(data_dir) + train_translation_model( + data_dir, + "lightconv_iwslt_de_en", + [ + "--encoder-conv-type", + "dynamic", + "--decoder-conv-type", + "dynamic", + "--encoder-embed-dim", + "8", + "--decoder-embed-dim", + "8", + ], + ) + generate_main(data_dir) + + def test_cmlm_transformer(self): + with contextlib.redirect_stdout(StringIO()): + with tempfile.TemporaryDirectory("test_cmlm_transformer") as data_dir: + create_dummy_data(data_dir) + preprocess_translation_data(data_dir, ["--joined-dictionary"]) + train_translation_model( + data_dir, + "cmlm_transformer", + [ + "--apply-bert-init", + "--criterion", + "nat_loss", + "--noise", + "full_mask", + "--pred-length-offset", + "--length-loss-factor", + "0.1", + ], + task="translation_lev", + ) + generate_main( + data_dir, + [ + "--task", + "translation_lev", + "--iter-decode-max-iter", + "9", + "--iter-decode-eos-penalty", + "0", + "--print-step", + ], + ) + + def test_nonautoregressive_transformer(self): + with contextlib.redirect_stdout(StringIO()): + with tempfile.TemporaryDirectory( + "test_nonautoregressive_transformer" + ) as data_dir: + create_dummy_data(data_dir) + preprocess_translation_data(data_dir, ["--joined-dictionary"]) + train_translation_model( + data_dir, + "nonautoregressive_transformer", + [ + "--apply-bert-init", + "--src-embedding-copy", + "--criterion", + "nat_loss", + "--noise", + "full_mask", + "--pred-length-offset", + "--length-loss-factor", + "0.1", + ], + task="translation_lev", + ) + generate_main( + data_dir, + [ + "--task", + "translation_lev", + "--iter-decode-max-iter", + "0", + "--iter-decode-eos-penalty", + "0", + "--print-step", + ], + ) + + # def test_nat_crf_transformer(self): + # with contextlib.redirect_stdout(StringIO()): + # with tempfile.TemporaryDirectory('test_nat_crf_transformer') as data_dir: + # create_dummy_data(data_dir) + # 
preprocess_translation_data(data_dir, ['--joined-dictionary']) + # train_translation_model(data_dir, 'nacrf_transformer', [ + # '--apply-bert-init', '--criterion', + # 'nat_loss', '--noise', 'full_mask', '--pred-length-offset', + # '--length-loss-factor', '0.1', + # '--word-ins-loss-factor', '0.5', + # '--crf-lowrank-approx', '1', + # '--crf-beam-approx', '1' + # ], task='translation_lev') + # generate_main(data_dir, [ + # '--task', 'translation_lev', + # '--iter-decode-max-iter', '0', + # '--iter-decode-eos-penalty', '0', + # '--print-step', + # ]) + + def test_iterative_nonautoregressive_transformer(self): + with contextlib.redirect_stdout(StringIO()): + with tempfile.TemporaryDirectory( + "test_iterative_nonautoregressive_transformer" + ) as data_dir: + create_dummy_data(data_dir) + preprocess_translation_data(data_dir, ["--joined-dictionary"]) + train_translation_model( + data_dir, + "iterative_nonautoregressive_transformer", + [ + "--apply-bert-init", + "--src-embedding-copy", + "--criterion", + "nat_loss", + "--noise", + "full_mask", + "--stochastic-approx", + "--dae-ratio", + "0.5", + "--train-step", + "3", + ], + task="translation_lev", + ) + generate_main( + data_dir, + [ + "--task", + "translation_lev", + "--iter-decode-max-iter", + "9", + "--iter-decode-eos-penalty", + "0", + "--print-step", + ], + ) + + def test_insertion_transformer(self): + with contextlib.redirect_stdout(StringIO()): + with tempfile.TemporaryDirectory("test_insertion_transformer") as data_dir: + create_dummy_data(data_dir) + preprocess_translation_data(data_dir, ["--joined-dictionary"]) + train_translation_model( + data_dir, + "insertion_transformer", + [ + "--apply-bert-init", + "--criterion", + "nat_loss", + "--noise", + "random_mask", + ], + task="translation_lev", + ) + generate_main( + data_dir, + [ + "--task", + "translation_lev", + "--iter-decode-max-iter", + "9", + "--iter-decode-eos-penalty", + "0", + "--print-step", + ], + ) + + def test_mixture_of_experts(self): + with contextlib.redirect_stdout(StringIO()): + with tempfile.TemporaryDirectory("test_moe") as data_dir: + create_dummy_data(data_dir) + preprocess_translation_data(data_dir) + train_translation_model( + data_dir, + "transformer_iwslt_de_en", + [ + "--task", + "translation_moe", + "--user-dir", + "examples/translation_moe/translation_moe_src", + "--method", + "hMoElp", + "--mean-pool-gating-network", + "--num-experts", + "3", + "--encoder-layers", + "2", + "--decoder-layers", + "2", + "--encoder-embed-dim", + "8", + "--decoder-embed-dim", + "8", + ], + ) + generate_main( + data_dir, + [ + "--task", + "translation_moe", + "--user-dir", + "examples/translation_moe/translation_moe_src", + "--method", + "hMoElp", + "--mean-pool-gating-network", + "--num-experts", + "3", + "--gen-expert", + "0", + ], + ) + + def test_alignment(self): + with contextlib.redirect_stdout(StringIO()): + with tempfile.TemporaryDirectory("test_alignment") as data_dir: + create_dummy_data(data_dir, alignment=True) + preprocess_translation_data(data_dir, ["--align-suffix", "align"]) + train_translation_model( + data_dir, + "transformer_align", + [ + "--encoder-layers", + "2", + "--decoder-layers", + "2", + "--encoder-embed-dim", + "8", + "--decoder-embed-dim", + "8", + "--load-alignments", + "--alignment-layer", + "1", + "--criterion", + "label_smoothed_cross_entropy_with_alignment", + ], + run_validation=True, + ) + generate_main(data_dir) + + def test_laser_lstm(self): + with contextlib.redirect_stdout(StringIO()): + with 
tempfile.TemporaryDirectory("test_laser_lstm") as data_dir: + laser_config_file = create_laser_data_and_config_json(data_dir) + train_translation_model( + laser_config_file.name, + "laser_lstm", + [ + "--user-dir", + "examples/laser/laser_src", + "--weighting-alpha", + "0.3", + "--encoder-bidirectional", + "--encoder-hidden-size", + "512", + "--encoder-layers", + "5", + "--decoder-layers", + "1", + "--encoder-embed-dim", + "320", + "--decoder-embed-dim", + "320", + "--decoder-lang-embed-dim", + "32", + "--save-dir", + data_dir, + "--disable-validation", + ], + task="laser", + lang_flags=[], + ) + + def test_laser_transformer(self): + with contextlib.redirect_stdout(StringIO()): + with tempfile.TemporaryDirectory("test_laser_transformer") as data_dir: + laser_config_file = create_laser_data_and_config_json(data_dir) + train_translation_model( + laser_config_file.name, + "laser_transformer", + [ + "--user-dir", + "examples/laser/laser_src", + "--weighting-alpha", + "0.3", + "--encoder-embed-dim", + "320", + "--decoder-embed-dim", + "320", + "--decoder-lang-embed-dim", + "32", + "--save-dir", + data_dir, + "--disable-validation", + ], + task="laser", + lang_flags=[], + ) + + def test_alignment_full_context(self): + with contextlib.redirect_stdout(StringIO()): + with tempfile.TemporaryDirectory("test_alignment") as data_dir: + create_dummy_data(data_dir, alignment=True) + preprocess_translation_data(data_dir, ["--align-suffix", "align"]) + train_translation_model( + data_dir, + "transformer_align", + [ + "--encoder-layers", + "2", + "--decoder-layers", + "2", + "--encoder-embed-dim", + "8", + "--decoder-embed-dim", + "8", + "--load-alignments", + "--alignment-layer", + "1", + "--criterion", + "label_smoothed_cross_entropy_with_alignment", + "--full-context-alignment", + ], + run_validation=True, + ) + generate_main(data_dir) + + def test_transformer_layerdrop(self): + with contextlib.redirect_stdout(StringIO()): + with tempfile.TemporaryDirectory("test_transformer_layerdrop") as data_dir: + create_dummy_data(data_dir) + preprocess_translation_data(data_dir) + train_translation_model( + data_dir, + "transformer_iwslt_de_en", + [ + "--encoder-layers", + "3", + "--decoder-layers", + "3", + "--encoder-embed-dim", + "8", + "--decoder-embed-dim", + "8", + "--encoder-layerdrop", + "0.01", + "--decoder-layerdrop", + "0.01", + ], + ) + generate_main(data_dir) + generate_main( + data_dir, + [ + "--model-overrides", + "{'encoder_layers_to_keep':'0,2','decoder_layers_to_keep':'1'}", + ], + ) + + +class TestStories(unittest.TestCase): + def setUp(self): + logging.disable(logging.CRITICAL) + + def tearDown(self): + logging.disable(logging.NOTSET) + + def test_fconv_self_att_wp(self): + with contextlib.redirect_stdout(StringIO()): + with tempfile.TemporaryDirectory("test_fconv_self_att_wp") as data_dir: + create_dummy_data(data_dir) + preprocess_translation_data(data_dir) + config = [ + "--encoder-layers", + "[(128, 3)] * 2", + "--decoder-layers", + "[(128, 3)] * 2", + "--decoder-attention", + "True", + "--encoder-attention", + "False", + "--gated-attention", + "True", + "--self-attention", + "True", + "--project-input", + "True", + "--encoder-embed-dim", + "8", + "--decoder-embed-dim", + "8", + "--decoder-out-embed-dim", + "8", + "--multihead-self-attention-nheads", + "2", + ] + train_translation_model(data_dir, "fconv_self_att_wp", config) + generate_main(data_dir) + + # fusion model + os.rename( + os.path.join(data_dir, "checkpoint_last.pt"), + os.path.join(data_dir, "pretrained.pt"), + ) + config.extend( 
+ [ + "--pretrained", + "True", + "--pretrained-checkpoint", + os.path.join(data_dir, "pretrained.pt"), + "--save-dir", + os.path.join(data_dir, "fusion_model"), + ] + ) + train_translation_model(data_dir, "fconv_self_att_wp", config) + + +class TestLanguageModeling(unittest.TestCase): + def setUp(self): + logging.disable(logging.CRITICAL) + + def tearDown(self): + logging.disable(logging.NOTSET) + + def test_fconv_lm(self): + with contextlib.redirect_stdout(StringIO()): + with tempfile.TemporaryDirectory("test_fconv_lm") as data_dir: + create_dummy_data(data_dir) + preprocess_lm_data(data_dir) + train_language_model( + data_dir, + "fconv_lm", + [ + "--decoder-layers", + "[(850, 3)] * 2 + [(1024,4)]", + "--decoder-embed-dim", + "280", + "--optimizer", + "nag", + "--lr", + "0.1", + ], + ) + eval_lm_main(data_dir) + generate_main( + data_dir, + [ + "--task", + "language_modeling", + "--sample-break-mode", + "eos", + "--tokens-per-sample", + "500", + ], + ) + + def test_transformer_lm(self): + with contextlib.redirect_stdout(StringIO()): + with tempfile.TemporaryDirectory("test_transformer_lm") as data_dir: + create_dummy_data(data_dir) + preprocess_lm_data(data_dir) + train_language_model( + data_dir, + "transformer_lm", + ["--add-bos-token", '--nval', '1'], + run_validation=True, + ) + eval_lm_main(data_dir) + eval_lm_main(data_dir, extra_flags=["--context-window", "25"]) + generate_main( + data_dir, + [ + "--task", + "language_modeling", + "--sample-break-mode", + "eos", + "--tokens-per-sample", + "500", + ], + ) + + def test_transformer_lm_with_adaptive_softmax(self): + with contextlib.redirect_stdout(StringIO()): + with tempfile.TemporaryDirectory( + "test_transformer_lm_with_adaptive_softmax" + ) as data_dir: + create_dummy_data(data_dir) + preprocess_lm_data(data_dir) + train_language_model( + data_dir, + "transformer_lm", + [ + "--add-bos-token", + "--criterion", + "adaptive_loss", + "--adaptive-softmax-cutoff", + "5,10,15", + ], + run_validation=True, + ) + eval_lm_main(data_dir) + generate_main( + data_dir, + [ + "--task", + "language_modeling", + "--sample-break-mode", + "eos", + "--tokens-per-sample", + "500", + ], + ) + + def test_lightconv_lm(self): + with contextlib.redirect_stdout(StringIO()): + with tempfile.TemporaryDirectory("test_lightconv_lm") as data_dir: + create_dummy_data(data_dir) + preprocess_lm_data(data_dir) + train_language_model( + data_dir, + "lightconv_lm", + ["--add-bos-token"], + run_validation=True, + ) + eval_lm_main(data_dir) + generate_main( + data_dir, + [ + "--task", + "language_modeling", + "--sample-break-mode", + "eos", + "--tokens-per-sample", + "500", + ], + ) + + def test_lstm_lm(self): + with contextlib.redirect_stdout(StringIO()): + with tempfile.TemporaryDirectory("test_lstm_lm") as data_dir: + create_dummy_data(data_dir) + preprocess_lm_data(data_dir) + train_language_model( + data_dir, + "lstm_lm", + ["--add-bos-token"], + run_validation=True, + ) + eval_lm_main(data_dir) + generate_main( + data_dir, + [ + "--task", + "language_modeling", + "--sample-break-mode", + "eos", + "--tokens-per-sample", + "500", + ], + ) + + def test_lstm_lm_residuals(self): + with contextlib.redirect_stdout(StringIO()): + with tempfile.TemporaryDirectory("test_lstm_lm_residuals") as data_dir: + create_dummy_data(data_dir) + preprocess_lm_data(data_dir) + train_language_model( + data_dir, + "lstm_lm", + ["--add-bos-token", "--residuals"], + run_validation=True, + ) + eval_lm_main(data_dir) + generate_main( + data_dir, + [ + "--task", + "language_modeling", + 
"--sample-break-mode", + "eos", + "--tokens-per-sample", + "500", + ], + ) + + @unittest.skipIf(not has_hf_transformers, "skip test if transformers is missing") + def test_transformer_xl_bptt_lm(self): + with contextlib.redirect_stdout(StringIO()): + with tempfile.TemporaryDirectory("test_transformer_xl_bptt_lm") as data_dir: + create_dummy_data(data_dir) + preprocess_lm_data(data_dir) + task_flags = [ + "--user-dir", + "examples/truncated_bptt", + "--task", + "truncated_bptt_lm", + "--batch-size", + "2", + "--tokens-per-sample", + "50", + ] + train_language_model( + data_dir=data_dir, + arch="transformer_xl", + extra_flags=task_flags + + [ + "--n-layer", + "2", + ], + task="truncated_bptt_lm", + run_validation=True, + extra_valid_flags=task_flags, + ) + eval_lm_main(data_dir, extra_flags=task_flags) + # Train with activation offloading + train_language_model( + data_dir=data_dir, + arch="transformer_xl", + extra_flags=task_flags + + [ + "--n-layer", + "2", + "--offload-activations", + ], + task="truncated_bptt_lm", + run_validation=True, + extra_valid_flags=task_flags, + ) + + +class TestMaskedLanguageModel(unittest.TestCase): + def setUp(self): + logging.disable(logging.CRITICAL) + + def tearDown(self): + logging.disable(logging.NOTSET) + + def test_legacy_masked_lm(self): + with contextlib.redirect_stdout(StringIO()): + with tempfile.TemporaryDirectory("test_legacy_mlm") as data_dir: + create_dummy_data(data_dir) + preprocess_lm_data(data_dir) + train_legacy_masked_language_model(data_dir, "masked_lm") + + def test_roberta_masked_lm(self): + with contextlib.redirect_stdout(StringIO()): + with tempfile.TemporaryDirectory("test_roberta_mlm") as data_dir: + create_dummy_data(data_dir) + preprocess_lm_data(data_dir) + train_masked_lm( + data_dir, "roberta_base", extra_flags=["--encoder-layers", "2"] + ) + + def test_roberta_sentence_prediction(self): + num_classes = 3 + with contextlib.redirect_stdout(StringIO()): + with tempfile.TemporaryDirectory("test_roberta_head") as data_dir: + create_dummy_roberta_head_data(data_dir, num_classes=num_classes) + preprocess_lm_data(os.path.join(data_dir, "input0")) + preprocess_lm_data(os.path.join(data_dir, "label")) + train_roberta_head(data_dir, "roberta_base", num_classes=num_classes) + + def test_roberta_regression_single(self): + num_classes = 1 + with contextlib.redirect_stdout(StringIO()): + with tempfile.TemporaryDirectory( + "test_roberta_regression_single" + ) as data_dir: + create_dummy_roberta_head_data( + data_dir, num_classes=num_classes, regression=True + ) + preprocess_lm_data(os.path.join(data_dir, "input0")) + train_roberta_head( + data_dir, + "roberta_base", + num_classes=num_classes, + extra_flags=["--regression-target"], + ) + + def test_roberta_regression_multiple(self): + num_classes = 3 + with contextlib.redirect_stdout(StringIO()): + with tempfile.TemporaryDirectory( + "test_roberta_regression_multiple" + ) as data_dir: + create_dummy_roberta_head_data( + data_dir, num_classes=num_classes, regression=True + ) + preprocess_lm_data(os.path.join(data_dir, "input0")) + train_roberta_head( + data_dir, + "roberta_base", + num_classes=num_classes, + extra_flags=["--regression-target"], + ) + + def test_linformer_roberta_masked_lm(self): + with contextlib.redirect_stdout(StringIO()): + with tempfile.TemporaryDirectory("test_linformer_roberta_mlm") as data_dir: + create_dummy_data(data_dir) + preprocess_lm_data(data_dir) + train_masked_lm( + data_dir, + "linformer_roberta_base", + extra_flags=[ + "--user-dir", + 
"examples/linformer/linformer_src", + "--encoder-layers", + "2", + ], + ) + + def test_linformer_roberta_sentence_prediction(self): + num_classes = 3 + with contextlib.redirect_stdout(StringIO()): + with tempfile.TemporaryDirectory("test_linformer_roberta_head") as data_dir: + create_dummy_roberta_head_data(data_dir, num_classes=num_classes) + preprocess_lm_data(os.path.join(data_dir, "input0")) + preprocess_lm_data(os.path.join(data_dir, "label")) + train_roberta_head( + data_dir, + "linformer_roberta_base", + num_classes=num_classes, + extra_flags=["--user-dir", "examples/linformer/linformer_src"], + ) + + def test_linformer_roberta_regression_single(self): + num_classes = 1 + with contextlib.redirect_stdout(StringIO()): + with tempfile.TemporaryDirectory( + "test_linformer_roberta_regression_single" + ) as data_dir: + create_dummy_roberta_head_data( + data_dir, num_classes=num_classes, regression=True + ) + preprocess_lm_data(os.path.join(data_dir, "input0")) + train_roberta_head( + data_dir, + "linformer_roberta_base", + num_classes=num_classes, + extra_flags=[ + "--regression-target", + "--user-dir", + "examples/linformer/linformer_src", + ], + ) + + def test_linformer_roberta_regression_multiple(self): + num_classes = 3 + with contextlib.redirect_stdout(StringIO()): + with tempfile.TemporaryDirectory( + "test_linformer_roberta_regression_multiple" + ) as data_dir: + create_dummy_roberta_head_data( + data_dir, num_classes=num_classes, regression=True + ) + preprocess_lm_data(os.path.join(data_dir, "input0")) + train_roberta_head( + data_dir, + "linformer_roberta_base", + num_classes=num_classes, + extra_flags=[ + "--regression-target", + "--user-dir", + "examples/linformer/linformer_src", + ], + ) + + def _test_pretrained_masked_lm_for_translation(self, learned_pos_emb, encoder_only): + with contextlib.redirect_stdout(StringIO()): + with tempfile.TemporaryDirectory("test_mlm") as data_dir: + create_dummy_data(data_dir) + preprocess_lm_data(data_dir) + train_legacy_masked_language_model( + data_dir, + arch="masked_lm", + extra_args=("--encoder-learned-pos",) if learned_pos_emb else (), + ) + with tempfile.TemporaryDirectory( + "test_mlm_translation" + ) as translation_dir: + create_dummy_data(translation_dir) + preprocess_translation_data( + translation_dir, extra_flags=["--joined-dictionary"] + ) + # Train transformer with data_dir/checkpoint_last.pt + train_translation_model( + translation_dir, + arch="transformer_from_pretrained_xlm", + extra_flags=[ + "--decoder-layers", + "1", + "--decoder-embed-dim", + "32", + "--decoder-attention-heads", + "1", + "--decoder-ffn-embed-dim", + "32", + "--encoder-layers", + "1", + "--encoder-embed-dim", + "32", + "--encoder-attention-heads", + "1", + "--encoder-ffn-embed-dim", + "32", + "--pretrained-xlm-checkpoint", + "{}/checkpoint_last.pt".format(data_dir), + "--activation-fn", + "gelu", + "--max-source-positions", + "500", + "--max-target-positions", + "500", + ] + + ( + ["--encoder-learned-pos", "--decoder-learned-pos"] + if learned_pos_emb + else [] + ) + + (["--init-encoder-only"] if encoder_only else []), + task="translation_from_pretrained_xlm", + ) + + def test_pretrained_masked_lm_for_translation_learned_pos_emb(self): + self._test_pretrained_masked_lm_for_translation(True, False) + + def test_pretrained_masked_lm_for_translation_sinusoidal_pos_emb(self): + self._test_pretrained_masked_lm_for_translation(False, False) + + def test_pretrained_masked_lm_for_translation_encoder_only(self): + 
self._test_pretrained_masked_lm_for_translation(True, True) + + def test_r4f_roberta(self): + num_classes = 3 + with contextlib.redirect_stdout(StringIO()): + with tempfile.TemporaryDirectory("test_r4f_roberta_head") as data_dir: + create_dummy_roberta_head_data(data_dir, num_classes=num_classes) + preprocess_lm_data(os.path.join(data_dir, "input0")) + preprocess_lm_data(os.path.join(data_dir, "label")) + train_roberta_head( + data_dir, + "roberta_base", + num_classes=num_classes, + extra_flags=[ + "--user-dir", + "examples/rxf/rxf_src", + "--criterion", + "sentence_prediction_r3f", + "--spectral-norm-classification-head", + ], + ) + + +def train_legacy_masked_language_model(data_dir, arch, extra_args=()): + train_parser = options.get_training_parser() + # TODO: langs should be in and out right? + train_args = options.parse_args_and_arch( + train_parser, + [ + "--task", + "cross_lingual_lm", + data_dir, + "--arch", + arch, + # Optimizer args + "--optimizer", + "adam", + "--lr-scheduler", + "reduce_lr_on_plateau", + "--lr-shrink", + "0.5", + "--lr", + "0.0001", + "--stop-min-lr", + "1e-09", + # dropout, attention args + "--dropout", + "0.1", + "--attention-dropout", + "0.1", + # MLM args + "--criterion", + "legacy_masked_lm_loss", + "--masked-lm-only", + "--monolingual-langs", + "in,out", + "--num-segment", + "5", + # Transformer args: use a small transformer model for fast training + "--encoder-layers", + "1", + "--encoder-embed-dim", + "32", + "--encoder-attention-heads", + "1", + "--encoder-ffn-embed-dim", + "32", + # Other training args + "--max-tokens", + "500", + "--tokens-per-sample", + "500", + "--save-dir", + data_dir, + "--max-epoch", + "1", + "--no-progress-bar", + "--distributed-world-size", + "1", + "--dataset-impl", + "raw", + "--num-workers", + "0", + ] + + list(extra_args), + ) + train.main(train_args) + + +class TestOptimizers(unittest.TestCase): + def setUp(self): + logging.disable(logging.CRITICAL) + + def tearDown(self): + logging.disable(logging.NOTSET) + + def test_optimizers(self): + with contextlib.redirect_stdout(StringIO()): + with tempfile.TemporaryDirectory("test_optimizers") as data_dir: + # Use just a bit of data and tiny model to keep this test runtime reasonable + create_dummy_data(data_dir, num_examples=10, maxlen=5) + preprocess_translation_data(data_dir) + optimizers = ["adafactor", "adam", "nag", "adagrad", "sgd", "adadelta"] + last_checkpoint = os.path.join(data_dir, "checkpoint_last.pt") + for optimizer in optimizers: + if os.path.exists(last_checkpoint): + os.remove(last_checkpoint) + train_translation_model( + data_dir, + "lstm", + [ + "--required-batch-size-multiple", + "1", + "--encoder-layers", + "1", + "--encoder-hidden-size", + "32", + "--decoder-layers", + "1", + "--optimizer", + optimizer, + ], + ) + generate_main(data_dir) + + +def read_last_log_entry( + logs: List[logging.LogRecord], logger_name: str +) -> Dict[str, float]: + for x in reversed(logs): + if x.name == logger_name: + return json.loads(x.message) + raise ValueError(f"No entries from {logger_name} found in captured logs") + + +class TestActivationCheckpointing(unittest.TestCase): + base_flags = [ + "--encoder-layers", + "2", + "--decoder-layers", + "2", + "--encoder-embed-dim", + "8", + "--decoder-embed-dim", + "8", + "--restore-file", + "x.pt", + "--log-format", + "json", + "--log-interval", + "1", + "--max-update", + "2", + ] + + def _train(self, data_dir, extra_flags): + with self.assertLogs() as logs: + train_translation_model( + data_dir, + "transformer_iwslt_de_en", + 
self.base_flags + extra_flags,
+                run_validation=True,
+                extra_valid_flags=["--log-format", "json"],
+            )
+        return logs.records
+
+    def test_activation_offloading_does_not_change_metrics(self):
+        """Neither --checkpoint-activations nor --offload-activations should change loss"""
+        with tempfile.TemporaryDirectory("test_transformer_with_act_cpt") as data_dir:
+
+            with self.assertLogs():
+                create_dummy_data(data_dir, num_examples=20)
+                preprocess_translation_data(data_dir)
+            offload_logs = self._train(data_dir, ["--offload-activations"])
+            baseline_logs = self._train(data_dir, [])
+
+            assert len(baseline_logs) == len(offload_logs)
+
+            baseline_valid_stats = read_last_log_entry(baseline_logs, "valid")
+            offload_valid_stats = read_last_log_entry(offload_logs, "valid")
+            baseline_train_stats = read_last_log_entry(baseline_logs, "train")
+            offload_train_stats = read_last_log_entry(offload_logs, "train")
+
+            assert (
+                baseline_train_stats["train_loss"] == offload_train_stats["train_loss"]
+            )
+            assert (
+                baseline_valid_stats["valid_loss"] == offload_valid_stats["valid_loss"]
+            )
+
+    def test_activation_checkpointing_does_not_change_metrics(self):
+        """--checkpoint-activations should not change loss"""
+
+        with tempfile.TemporaryDirectory("test_transformer_with_act_cpt") as data_dir:
+            with self.assertLogs():
+                create_dummy_data(data_dir, num_examples=20)
+                preprocess_translation_data(data_dir)
+            ckpt_logs = self._train(data_dir, ["--checkpoint-activations"])
+            baseline_logs = self._train(data_dir, [])
+            assert len(baseline_logs) == len(ckpt_logs)
+
+            baseline_train_stats = read_last_log_entry(baseline_logs, "train")
+            ckpt_train_stats = read_last_log_entry(ckpt_logs, "train")
+            assert baseline_train_stats["train_loss"] == ckpt_train_stats["train_loss"]
+
+            baseline_valid_stats = read_last_log_entry(baseline_logs, "valid")
+            ckpt_valid_stats = read_last_log_entry(ckpt_logs, "valid")
+            assert baseline_valid_stats["valid_loss"] == ckpt_valid_stats["valid_loss"]
+
+
+def create_dummy_roberta_head_data(
+    data_dir, num_examples=100, maxlen=10, num_classes=2, regression=False
+):
+    input_dir = "input0"
+
+    def _create_dummy_data(filename):
+        random_data = torch.rand(num_examples * maxlen)
+        input_data = 97 + torch.floor(26 * random_data).int()
+        if regression:
+            output_data = torch.rand((num_examples, num_classes))
+        else:
+            output_data = 1 + torch.floor(num_classes * torch.rand(num_examples)).int()
+        with open(os.path.join(data_dir, input_dir, filename + ".out"), "w") as f_in:
+            label_filename = filename + ".label" if regression else filename + ".out"
+            with open(os.path.join(data_dir, "label", label_filename), "w") as f_out:
+                offset = 0
+                for i in range(num_examples):
+                    # write example input
+                    ex_len = random.randint(1, maxlen)
+                    ex_str = " ".join(map(chr, input_data[offset : offset + ex_len]))
+                    print(ex_str, file=f_in)
+                    # write example label
+                    if regression:
+                        class_str = " ".join(map(str, output_data[i].numpy()))
+                        print(class_str, file=f_out)
+                    else:
+                        class_str = "class{}".format(output_data[i])
+                        print(class_str, file=f_out)
+                    offset += ex_len
+
+    os.mkdir(os.path.join(data_dir, input_dir))
+    os.mkdir(os.path.join(data_dir, "label"))
+    _create_dummy_data("train")
+    _create_dummy_data("valid")
+    _create_dummy_data("test")
+
+
+def train_masked_lm(data_dir, arch, extra_flags=None):
+    train_parser = options.get_training_parser()
+    train_args = options.parse_args_and_arch(
+        train_parser,
+        [
+            "--task",
+            "masked_lm",
+            data_dir,
+            "--arch",
+            arch,
+            "--optimizer",
+            "adam",
+            "--lr",
+            
"0.0001", + "--criterion", + "masked_lm", + "--batch-size", + "500", + "--save-dir", + data_dir, + "--max-epoch", + "1", + "--no-progress-bar", + "--distributed-world-size", + "1", + "--ddp-backend", + "no_c10d", + "--num-workers", + "0", + ] + + (extra_flags or []), + ) + train.main(train_args) + + +def train_roberta_head(data_dir, arch, num_classes=2, extra_flags=None): + train_parser = options.get_training_parser() + train_args = options.parse_args_and_arch( + train_parser, + [ + "--task", + "sentence_prediction", + data_dir, + "--arch", + arch, + "--encoder-layers", + "2", + "--num-classes", + str(num_classes), + "--optimizer", + "adam", + "--lr", + "0.0001", + "--criterion", + "sentence_prediction", + "--max-tokens", + "500", + "--max-positions", + "500", + "--batch-size", + "500", + "--save-dir", + data_dir, + "--max-epoch", + "1", + "--no-progress-bar", + "--distributed-world-size", + "1", + "--ddp-backend", + "no_c10d", + "--num-workers", + "0", + ] + + (extra_flags or []), + ) + train.main(train_args) + + +def eval_lm_main(data_dir, extra_flags=None): + eval_lm_parser = options.get_eval_lm_parser() + eval_lm_args = options.parse_args_and_arch( + eval_lm_parser, + [ + data_dir, + "--path", + os.path.join(data_dir, "checkpoint_last.pt"), + "--no-progress-bar", + "--num-workers", + "0", + ] + + (extra_flags or []), + ) + eval_lm.main(eval_lm_args) + + +if __name__ == "__main__": + unittest.main() diff --git a/SpeechT5/fairseq/tests/test_character_token_embedder.py b/SpeechT5/fairseq/tests/test_character_token_embedder.py new file mode 100644 index 0000000000000000000000000000000000000000..24940ebd21a0e4465ca6052409353a3179e9cf6d --- /dev/null +++ b/SpeechT5/fairseq/tests/test_character_token_embedder.py @@ -0,0 +1,48 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. 
+ +import unittest + +import torch +from fairseq.data import Dictionary +from fairseq.modules import CharacterTokenEmbedder + + +class TestCharacterTokenEmbedder(unittest.TestCase): + def test_character_token_embedder(self): + vocab = Dictionary() + vocab.add_symbol("hello") + vocab.add_symbol("there") + + embedder = CharacterTokenEmbedder( + vocab, [(2, 16), (4, 32), (8, 64), (16, 2)], 64, 5, 2 + ) + + test_sents = [["hello", "unk", "there"], ["there"], ["hello", "there"]] + max_len = max(len(s) for s in test_sents) + input = torch.LongTensor(len(test_sents), max_len + 2).fill_(vocab.pad()) + for i in range(len(test_sents)): + input[i][0] = vocab.eos() + for j in range(len(test_sents[i])): + input[i][j + 1] = vocab.index(test_sents[i][j]) + input[i][j + 2] = vocab.eos() + embs = embedder(input) + + assert embs.size() == (len(test_sents), max_len + 2, 5) + self.assertAlmostEqual(embs[0][0], embs[1][0]) + self.assertAlmostEqual(embs[0][0], embs[0][-1]) + self.assertAlmostEqual(embs[0][1], embs[2][1]) + self.assertAlmostEqual(embs[0][3], embs[1][1]) + + embs.sum().backward() + assert embedder.char_embeddings.weight.grad is not None + + def assertAlmostEqual(self, t1, t2): + self.assertEqual(t1.size(), t2.size(), "size mismatch") + self.assertLess((t1 - t2).abs().max(), 1e-6) + + +if __name__ == "__main__": + unittest.main() diff --git a/SpeechT5/fairseq/tests/test_checkpoint_utils.py b/SpeechT5/fairseq/tests/test_checkpoint_utils.py new file mode 100644 index 0000000000000000000000000000000000000000..0f28222633a68943497616507ce412ead76864d6 --- /dev/null +++ b/SpeechT5/fairseq/tests/test_checkpoint_utils.py @@ -0,0 +1,106 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. 
+ +import contextlib +import logging +import os +import tempfile +import unittest +from io import StringIO +from unittest.mock import patch + +from fairseq import checkpoint_utils +from omegaconf import OmegaConf + +from tests.utils import ( + create_dummy_data, + preprocess_translation_data, + train_translation_model, +) + + +class TestCheckpointUtils(unittest.TestCase): + def setUp(self): + logging.disable(logging.CRITICAL) + + def tearDown(self): + logging.disable(logging.NOTSET) + + @contextlib.contextmanager + def _train_transformer(self, seed, extra_args=None): + if extra_args is None: + extra_args = [] + with tempfile.TemporaryDirectory(f"_train_transformer_seed{seed}") as data_dir: + create_dummy_data(data_dir) + preprocess_translation_data(data_dir) + train_translation_model( + data_dir, + "transformer_iwslt_de_en", + [ + "--encoder-layers", + "3", + "--decoder-layers", + "3", + "--encoder-embed-dim", + "8", + "--decoder-embed-dim", + "8", + "--seed", + str(seed), + ] + + extra_args, + ) + yield os.path.join(data_dir, "checkpoint_last.pt") + + def test_load_model_ensemble_and_task(self): + # with contextlib.redirect_stdout(StringIO()): + with self._train_transformer(seed=123) as model1: + with self._train_transformer(seed=456) as model2: + ensemble, cfg, task = checkpoint_utils.load_model_ensemble_and_task( + filenames=[model1, model2] + ) + self.assertEqual(len(ensemble), 2) + + # after Transformer has been migrated to Hydra, this will probably + # become cfg.common.seed + self.assertEqual(ensemble[0].args.seed, 123) + self.assertEqual(ensemble[1].args.seed, 456) + + # the task from the first model should be returned + self.assertTrue("seed123" in task.cfg.data) + + # last cfg is saved + self.assertEqual(cfg.common.seed, 456) + + def test_prune_state_dict(self): + with contextlib.redirect_stdout(StringIO()): + extra_args = ["--encoder-layerdrop", "0.01", "--decoder-layerdrop", "0.01"] + with self._train_transformer(seed=1, extra_args=extra_args) as model: + ensemble, cfg, task = checkpoint_utils.load_model_ensemble_and_task( + filenames=[model], + arg_overrides={ + "encoder_layers_to_keep": "0,2", + "decoder_layers_to_keep": "1", + }, + ) + self.assertEqual(len(ensemble), 1) + self.assertEqual(len(ensemble[0].encoder.layers), 2) + self.assertEqual(len(ensemble[0].decoder.layers), 1) + + def test_torch_persistent_save_async(self): + state_dict = {} + filename = "async_checkpoint.pt" + + with patch(f"{checkpoint_utils.__name__}.PathManager.opena") as mock_opena: + with patch(f"{checkpoint_utils.__name__}._torch_persistent_save") as mock_save: + checkpoint_utils.torch_persistent_save( + state_dict, filename, async_write=True + ) + mock_opena.assert_called_with(filename, "wb") + mock_save.assert_called() + + +if __name__ == "__main__": + unittest.main() diff --git a/SpeechT5/fairseq/tests/test_concat_dataset.py b/SpeechT5/fairseq/tests/test_concat_dataset.py new file mode 100644 index 0000000000000000000000000000000000000000..d94aeffd481a2e107eb5747e41d76435b3f3dc8a --- /dev/null +++ b/SpeechT5/fairseq/tests/test_concat_dataset.py @@ -0,0 +1,58 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. 
+ +import unittest + +import torch +from fairseq.data import LanguagePairDataset, TokenBlockDataset +from fairseq.data.concat_dataset import ConcatDataset +from tests.test_train import mock_dict + + +class TestConcatDataset(unittest.TestCase): + def setUp(self): + d = mock_dict() + tokens_1 = torch.LongTensor([1]).view(1, -1) + tokens_ds1 = TokenBlockDataset( + tokens_1, + sizes=[tokens_1.size(-1)], + block_size=1, + pad=0, + eos=1, + include_targets=False, + ) + self.dataset_1 = LanguagePairDataset( + tokens_ds1, tokens_ds1.sizes, d, shuffle=False + ) + tokens_2 = torch.LongTensor([2]).view(1, -1) + tokens_ds2 = TokenBlockDataset( + tokens_2, + sizes=[tokens_2.size(-1)], + block_size=1, + pad=0, + eos=1, + include_targets=False, + ) + self.dataset_2 = LanguagePairDataset( + tokens_ds2, tokens_ds2.sizes, d, shuffle=False + ) + + def test_concat_dataset_basics(self): + d = ConcatDataset([self.dataset_1, self.dataset_2]) + assert len(d) == 2 + assert d[0]["source"][0] == 1 + assert d[1]["source"][0] == 2 + + d = ConcatDataset([self.dataset_1, self.dataset_2], sample_ratios=[1, 2]) + assert len(d) == 3 + assert d[0]["source"][0] == 1 + assert d[1]["source"][0] == 2 + assert d[2]["source"][0] == 2 + + d = ConcatDataset([self.dataset_1, self.dataset_2], sample_ratios=[2, 1]) + assert len(d) == 3 + assert d[0]["source"][0] == 1 + assert d[1]["source"][0] == 1 + assert d[2]["source"][0] == 2 diff --git a/SpeechT5/fairseq/tests/test_constraints.py b/SpeechT5/fairseq/tests/test_constraints.py new file mode 100644 index 0000000000000000000000000000000000000000..1c37f7e1fb26d8ea5349fedd3a60f566d09cf598 --- /dev/null +++ b/SpeechT5/fairseq/tests/test_constraints.py @@ -0,0 +1,269 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. 
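+
+# These tests cover the helpers behind lexically constrained decoding:
+# pack_constraints (packing per-sentence constraint lists into one padded
+# tensor), ConstraintNode (the trie built over constraint token sequences),
+# and the UnorderedConstraintState / OrderedConstraintState trackers that
+# are advanced token by token during generation.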
+ +import sys +import unittest + +import torch +from fairseq.token_generation_constraints import * + + +def tensorize(constraints: List[List[int]]) -> torch.Tensor: + return [torch.tensor(x) for x in constraints] + + +class TestHelperRoutines(unittest.TestCase): + def setUp(self): + self.examples = [ + ([[]], torch.tensor([[0]])), + ([[], []], torch.tensor([[0], [0]])), + ([[torch.tensor([1, 2])], []], torch.tensor([[1, 1, 2, 0], [0, 0, 0, 0]])), + ( + [ + [ + torch.tensor([3, 1, 2]), + torch.tensor([3]), + torch.tensor([4, 5, 6, 7]), + ], + [], + [torch.tensor([1, 8, 9, 10, 1, 4, 11, 12])], + ], + torch.tensor( + [ + [3, 3, 1, 2, 0, 3, 0, 4, 5, 6, 7, 0], + [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], + [1, 1, 8, 9, 10, 1, 4, 11, 12, 0, 0, 0], + ] + ), + ), + ] + + def test_packing(self): + """Ensures the list of lists of tensors gets packed correctly.""" + for batch_constraints, expected_tensor in self.examples: + packed = pack_constraints(batch_constraints) + assert torch.equal(packed, expected_tensor) + + +class TestUnorderedConstraintState(unittest.TestCase): + def setUp(self): + # Tuples of (contraint set, expected printed graph, token counts per node) + self.examples = [ + ( + tensorize([[1, 2, 3], [1, 3], [1, 4], [4, 5, 6, 7], [1], [4, 5]]), + "([None].False#6 ([1].True#4 ([2].False#1 [3].True#1) [3].True#1 [4].True#1) ([4].False#2 ([5].True#2 ([6].False#1 [7].True#1))))", + {1: 4, 2: 1, 3: 2, 4: 3, 5: 2, 6: 1, 7: 1}, + ), + ([], "[None].False#0", {}), + (tensorize([[0]]), "([None].False#1 [0].True#1)", {0: 1}), + ( + tensorize([[100000, 1, 2, 3, 4, 5]]), + "([None].False#1 ([100000].False#1 ([1].False#1 ([2].False#1 ([3].False#1 ([4].False#1 [5].True#1))))))", + {100000: 1, 1: 1, 2: 1, 3: 1, 4: 1, 5: 1}, + ), + ( + tensorize([[1, 2], [1, 2]]), + "([None].False#2 ([1].False#2 [2].True#2))", + {1: 2, 2: 2}, + ), + ( + tensorize([[1, 2], [3, 4]]), + "([None].False#2 ([1].False#1 [2].True#1) ([3].False#1 [4].True#1))", + {1: 1, 2: 1, 3: 1, 4: 1}, + ), + ] + + self.sequences = [ + ( + self.examples[0][0], + [], + {"bank": 0, "num_completed": 0, "finished": False, "is_root": True}, + ), + ( + self.examples[0][0], + [1, 2], + {"bank": 2, "num_completed": 0, "finished": False, "is_root": False}, + ), + ( + self.examples[0][0], + [1, 2, 94], + {"bank": 1, "num_completed": 1, "finished": False, "is_root": True}, + ), + ( + self.examples[0][0], + [1, 3, 999, 1, 4], + {"bank": 4, "num_completed": 2, "finished": False, "is_root": False}, + ), + ( + self.examples[0][0], + [1, 3, 999, 1, 4, 999], + {"bank": 4, "num_completed": 2, "finished": False, "is_root": True}, + ), + ( + self.examples[0][0], + [4, 5, 6, 8], + {"bank": 2, "num_completed": 1, "finished": False, "is_root": True}, + ), + ( + self.examples[0][0], + # Tricky, because in last three, goes down [1->4] branch, could miss [1] and [4->5] + # [[1, 2, 3], [1, 3], [1, 4], [4, 5, 6, 7], [1], [4, 5]], + [1, 2, 3, 1, 3, 1, 4, 4, 5, 6, 7, 1, 4, 5], + {"bank": 14, "num_completed": 6, "finished": True, "is_root": False}, + ), + ( + self.examples[0][0], + [1, 2, 3, 999, 1, 3, 1, 4, 4, 5, 6, 7, 1, 4, 5, 117], + {"bank": 14, "num_completed": 6, "finished": True, "is_root": True}, + ), + ( + tensorize([[1], [2, 3]]), + # Should not be able to get credit for entering 1 a second time + [1, 1], + {"bank": 1, "num_completed": 1, "finished": False, "is_root": True}, + ), + ( + self.examples[4][0], + [1, 2, 1, 2], + {"bank": 4, "num_completed": 2, "finished": True, "is_root": False}, + ), + ( + self.examples[4][0], + [1, 2, 1, 2, 1], + {"bank": 4, 
"num_completed": 2, "finished": True, "is_root": True}, + ), + ( + self.examples[5][0], + [1, 2, 3, 4, 5], + {"bank": 4, "num_completed": 2, "finished": True, "is_root": True}, + ), + ] + + def test_graphs(self): + """ + Test whether unordered graph systems are created correctly. + """ + for example in self.examples: + constraints, expected, gold_counts = example + c = ConstraintNode.create(constraints) + assert ( + ConstraintNode.print_graph(c) == expected + ), f"got {ConstraintNode.print_graph(c)}, expected {expected}" + assert ( + c.token_counts() == gold_counts + ), f"{c} got {c.token_counts()} wanted {gold_counts}" + + def test_next_tokens(self): + """ + Tests that the set of next tokens is correct. + """ + for example in self.examples: + constraints, expected, gold_counts = example + root = ConstraintNode.create(constraints) + + root_tokens = set(root.children.keys()) + for sequence in constraints: + state = UnorderedConstraintState(root) + for token in sequence: + all_tokens = root_tokens.union(state.node.children.keys()) + assert ( + all_tokens == state.next_tokens() + ), f"ALL {all_tokens} NEXT {state.next_tokens()}" + state = state.advance(token) + + def test_sequences(self): + for constraints, tokens, expected in self.sequences: + state = UnorderedConstraintState.create(pack_constraints([constraints])[0]) + for token in tokens: + state = state.advance(token) + result = {} + for attr in expected.keys(): + result[attr] = getattr(state, attr) + + assert ( + result == expected + ), f"TEST({tokens}) GOT: {result} WANTED: {expected}" + + +class TestOrderedConstraintState(unittest.TestCase): + def setUp(self): + self.sequences = [ + ( + tensorize([[1, 2, 3], [1, 3], [1, 4], [4, 5, 6, 7], [1], [4, 5]]), + [], + {"bank": 0, "num_completed": 0, "finished": False, "is_root": True}, + ), + ( + tensorize([[1, 2, 3], [1, 3], [1, 4], [4, 5, 6, 7], [1], [4, 5]]), + [1, 2], + {"bank": 2, "num_completed": 0, "finished": False, "is_root": False}, + ), + ( + tensorize([[1, 2, 3], [1, 3], [1, 4], [4, 5, 6, 7], [1], [4, 5]]), + [1, 2, 94], + {"bank": 0, "num_completed": 0, "finished": False, "is_root": True}, + ), + ( + tensorize([[1, 2, 3], [1, 3], [1, 4], [4, 5, 6, 7], [1], [4, 5]]), + [1, 3, 999, 1, 4], + {"bank": 0, "num_completed": 0, "finished": False, "is_root": True}, + ), + ( + tensorize([[1, 2, 3], [1, 3], [1, 4], [4, 5, 6, 7], [1], [4, 5]]), + [1, 2, 3, 999, 999], + {"bank": 3, "num_completed": 1, "finished": False, "is_root": False}, + ), + ( + tensorize([[1, 2, 3], [1, 3], [1, 4], [4, 5, 6, 7], [1], [4, 5]]), + [1, 2, 3, 77, 1, 3, 1], + {"bank": 6, "num_completed": 2, "finished": False, "is_root": False}, + ), + ( + tensorize([[1, 2, 3], [1, 3], [1, 4], [4, 5, 6, 7], [1], [4, 5]]), + [1, 2, 3, 1, 3, 1, 4, 4, 5, 6, 7, 1, 4, 5], + {"bank": 14, "num_completed": 6, "finished": True, "is_root": False}, + ), + ( + tensorize([[1, 2, 3], [1, 3], [1, 4], [4, 5, 6, 7], [1], [4, 5]]), + [1, 2, 999, 1, 2, 3, 999, 1, 3, 1, 4, 4, 5, 6, 7, 1, 4, 5, 117], + {"bank": 14, "num_completed": 6, "finished": True, "is_root": False}, + ), + ( + tensorize([[1], [2, 3]]), + [1, 1], + {"bank": 1, "num_completed": 1, "finished": False, "is_root": False}, + ), + ( + tensorize([[1, 2], [1, 2]]), + [1, 2, 1, 2], + {"bank": 4, "num_completed": 2, "finished": True, "is_root": False}, + ), + ( + tensorize([[1, 2], [1, 2]]), + [1, 2, 1, 2, 1], + {"bank": 4, "num_completed": 2, "finished": True, "is_root": False}, + ), + ( + tensorize([[1, 2], [3, 4]]), + [1, 2, 3, 4, 5], + {"bank": 4, "num_completed": 2, "finished": True, 
"is_root": False}, + ), + ] + + def test_sequences(self): + for i, (constraints, tokens, expected) in enumerate(self.sequences): + state = OrderedConstraintState.create(pack_constraints([constraints])[0]) + for token in tokens: + state = state.advance(token) + result = {} + for attr in expected.keys(): + result[attr] = getattr(state, attr) + assert ( + result == expected + ), f"TEST({tokens}) GOT: {result} WANTED: {expected}" + + +if __name__ == "__main__": + unittest.main() diff --git a/SpeechT5/fairseq/tests/test_convtbc.py b/SpeechT5/fairseq/tests/test_convtbc.py new file mode 100644 index 0000000000000000000000000000000000000000..3a3c9b91e70f597ab77b9b01459cc429db5d7956 --- /dev/null +++ b/SpeechT5/fairseq/tests/test_convtbc.py @@ -0,0 +1,54 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +import unittest + +import torch +import torch.nn as nn +from fairseq.modules import ConvTBC + + +class TestConvTBC(unittest.TestCase): + def test_convtbc(self): + # ksz, in_channels, out_channels + conv_tbc = ConvTBC(4, 5, kernel_size=3, padding=1) + # out_channels, in_channels, ksz + conv1d = nn.Conv1d(4, 5, kernel_size=3, padding=1) + + conv_tbc.weight.data.copy_(conv1d.weight.data.transpose(0, 2)) + conv_tbc.bias.data.copy_(conv1d.bias.data) + + input_tbc = torch.randn(7, 2, 4, requires_grad=True) + input1d = input_tbc.data.transpose(0, 1).transpose(1, 2) + input1d.requires_grad = True + + output_tbc = conv_tbc(input_tbc) + output1d = conv1d(input1d) + + self.assertAlmostEqual( + output_tbc.data.transpose(0, 1).transpose(1, 2), output1d.data + ) + + grad_tbc = torch.randn(output_tbc.size()) + grad1d = grad_tbc.transpose(0, 1).transpose(1, 2).contiguous() + + output_tbc.backward(grad_tbc) + output1d.backward(grad1d) + + self.assertAlmostEqual( + conv_tbc.weight.grad.data.transpose(0, 2), conv1d.weight.grad.data + ) + self.assertAlmostEqual(conv_tbc.bias.grad.data, conv1d.bias.grad.data) + self.assertAlmostEqual( + input_tbc.grad.data.transpose(0, 1).transpose(1, 2), input1d.grad.data + ) + + def assertAlmostEqual(self, t1, t2): + self.assertEqual(t1.size(), t2.size(), "size mismatch") + self.assertLess((t1 - t2).abs().max(), 1e-4) + + +if __name__ == "__main__": + unittest.main() diff --git a/SpeechT5/fairseq/tests/test_data_utils.py b/SpeechT5/fairseq/tests/test_data_utils.py new file mode 100644 index 0000000000000000000000000000000000000000..2acfc8dc184015ad762db154dd9929f4c4043093 --- /dev/null +++ b/SpeechT5/fairseq/tests/test_data_utils.py @@ -0,0 +1,136 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. 
+ +import unittest + +import numpy as np +from fairseq.data.data_utils_fast import batch_by_size_fn +from fairseq.data.data_utils_fast import batch_by_size_vec + + +class TestBatchBySize(unittest.TestCase): + @classmethod + def batch_by_size_baseline( + cls, + indices, + num_tokens_vec, + max_tokens, + max_sentences, + bsz_mult, + ): + """Simple, reliable and slow implementation of batch by size """ + batches = [] + start = 0 + while start < len(indices): + for end in range(start + 1, len(indices) + 1): + max_val = max(num_tokens_vec[pos] for pos in range(start, end)) + sent_count = end - start + num_tokens = max_val * sent_count + overflow = num_tokens > max_tokens > 0 or sent_count > max_sentences > 0 + terminate = overflow or end == len(indices) + if overflow: + sent_count -= 1 + if terminate: + if sent_count > bsz_mult: + sent_count = sent_count - sent_count % bsz_mult + batches.append(indices[start : start + sent_count]) + start = start + sent_count + break + return batches + + @classmethod + def _get_error_message( + cls, max_sentences, max_tokens, bsz_mult, num_tokens_vec, validation, results + ): + return f"""Reference batch_by_size implementation should produce + same output as the baseline method. + Params: + max_sentences={max_sentences}, + max_tokens={max_tokens}, + bsz_mult={bsz_mult}, + num_tokens_vec={num_tokens_vec}, + expected_batches={validation}, + returned_batches={results}""" + + def _compare_results( + self, + indices_len, + batch_by_size_impl, + max_sentences, + max_tokens, + bsz_mult, + num_tokens_vec, + ): + indices = np.array(list(range(indices_len))) + validation = self.batch_by_size_baseline( + indices, + num_tokens_vec, + max_tokens=max_tokens, + max_sentences=max_sentences, + bsz_mult=bsz_mult, + ) + results = batch_by_size_impl( + indices, + num_tokens_vec, + max_tokens=max_tokens, + max_sentences=max_sentences, + bsz_mult=bsz_mult, + ) + error_msg = self._get_error_message( + max_sentences, max_tokens, bsz_mult, num_tokens_vec, validation, results + ) + self.assertEqual(len(validation), len(results), error_msg) + for first, second in zip(validation, results): + self.assertTrue(np.array_equal(first, second), error_msg) + + def _run_compare_with_baseline_sweep(self, batch_by_size_impl): + """Compare reference batch_by_size implementation with batch_by_size_baseline + across a dense grid of hyperparam values""" + MAX_MAX_TOKENS = 10 + NUM_TOKENS_VECS_COUNT = 5 + for indices_len in [10, 11]: # try odd and even len of indices + for max_sentences in range(0, indices_len + 2): + for max_tokens in range(0, MAX_MAX_TOKENS): + for bsz_mult in range(1, max(MAX_MAX_TOKENS, indices_len) + 2): + for _ in range(NUM_TOKENS_VECS_COUNT): + num_tokens_vec = np.random.randint( + 0, max_tokens + 1, size=indices_len + ) + self._compare_results( + indices_len, + batch_by_size_impl, + max_sentences, + max_tokens, + bsz_mult, + num_tokens_vec, + ) + + +class TestBatchBySizeVec(TestBatchBySize): + def test_compare_with_baseline(self): + self._run_compare_with_baseline_sweep(batch_by_size_vec) + + +class TestBatchBySizeFn(TestBatchBySize): + def test_compare_with_baseline(self): + def batch_by_size_fn_wrapper( + indices, + num_tokens_vec, + max_tokens, + max_sentences, + bsz_mult, + ): + def num_tokens_fn(idx): + return num_tokens_vec[idx] + + return batch_by_size_fn( + indices, num_tokens_fn, max_tokens, max_sentences, bsz_mult + ) + + self._run_compare_with_baseline_sweep(batch_by_size_fn_wrapper) + + +if __name__ == "__main__": + unittest.main() diff --git 
a/SpeechT5/fairseq/tests/test_dataset.py b/SpeechT5/fairseq/tests/test_dataset.py new file mode 100644 index 0000000000000000000000000000000000000000..a3e3970028bc4b0259153e403951e1735bb0cd3e --- /dev/null +++ b/SpeechT5/fairseq/tests/test_dataset.py @@ -0,0 +1,66 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +import logging +import unittest +from typing import Sequence + +from fairseq.data import LanguagePairDataset, ListDataset, RoundRobinZipDatasets +from tests.test_train import mock_dict + + +def lang_pair_dataset(lengths: Sequence[int]) -> LanguagePairDataset: + tokens = [[i] * l for i, l in enumerate(lengths)] + return LanguagePairDataset(ListDataset(tokens), lengths, mock_dict()) + + +def sample(id: int, length: int): + return {"id": id, "source": [id] * length, "target": None} + + +class TestDataset(unittest.TestCase): + def setUp(self): + logging.disable(logging.CRITICAL) + + def tearDown(self): + logging.disable(logging.NOTSET) + + def test_round_robin_zip_datasets(self): + long_dataset = lang_pair_dataset([10, 9, 8, 11]) + short_dataset = lang_pair_dataset([11, 9]) + + dataset = RoundRobinZipDatasets({"a": long_dataset, "b": short_dataset}) + # Dataset is now sorted by sentence length + dataset.ordered_indices() + assert dataset.longest_dataset is long_dataset + self.assertEqual(dict(dataset[0]), {"a": sample(2, 8), "b": sample(1, 9)}) + # The item 2 of dataset 'a' is with item (2 % 2 = 0) of dataset 'b' + self.assertEqual(dict(dataset[2]), {"a": sample(0, 10), "b": sample(1, 9)}) + + def test_round_robin_zip_datasets_filtered(self): + long_dataset = lang_pair_dataset([10, 20, 8, 11, 1000, 7, 12]) + short_dataset = lang_pair_dataset([11, 20, 9, 1000]) + + dataset = RoundRobinZipDatasets({"a": long_dataset, "b": short_dataset}) + # Dataset is now sorted by sentence length + idx = dataset.ordered_indices() + idx, _ = dataset.filter_indices_by_size(idx, {"a": 19, "b": 900}) + self.assertEqual(list(idx), [0, 1, 2, 3, 4]) + self.assertEqual(dict(dataset[0]), {"a": sample(5, 7), "b": sample(2, 9)}) + self.assertEqual(dict(dataset[2]), {"a": sample(0, 10), "b": sample(1, 20)}) + self.assertEqual(dict(dataset[4]), {"a": sample(6, 12), "b": sample(0, 11)}) + + def test_round_robin_zip_datasets_filtered_with_tuple(self): + long_dataset = lang_pair_dataset([10, 20, 8, 11, 1000, 7, 12]) + short_dataset = lang_pair_dataset([11, 20, 9, 1000]) + + dataset = RoundRobinZipDatasets({"a": long_dataset, "b": short_dataset}) + # Dataset is now sorted by sentence length + idx = dataset.ordered_indices() + idx, _ = dataset.filter_indices_by_size(idx, 19) + self.assertEqual(list(idx), [0, 1, 2, 3, 4]) + self.assertEqual(dict(dataset[0]), {"a": sample(5, 7), "b": sample(2, 9)}) + self.assertEqual(dict(dataset[2]), {"a": sample(0, 10), "b": sample(2, 9)}) + self.assertEqual(dict(dataset[4]), {"a": sample(6, 12), "b": sample(2, 9)}) diff --git a/SpeechT5/fairseq/tests/test_dictionary.py b/SpeechT5/fairseq/tests/test_dictionary.py new file mode 100644 index 0000000000000000000000000000000000000000..81ce102f4f555822e36298034cdeb3d1c0650255 --- /dev/null +++ b/SpeechT5/fairseq/tests/test_dictionary.py @@ -0,0 +1,116 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. 
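+
+# These tests exercise fairseq's Dictionary: ID assignment before and after
+# finalize(), save/load round-trips, the "#fairseq:overwrite" flag for
+# redefining special symbols, duplicate-symbol detection, and treating a
+# bare space as an ordinary symbol.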
+ +import io +import tempfile +import unittest + +import torch +from fairseq.data import Dictionary + + +class TestDictionary(unittest.TestCase): + def test_finalize(self): + txt = [ + "A B C D", + "B C D", + "C D", + "D", + ] + ref_ids1 = list( + map( + torch.IntTensor, + [ + [4, 5, 6, 7, 2], + [5, 6, 7, 2], + [6, 7, 2], + [7, 2], + ], + ) + ) + ref_ids2 = list( + map( + torch.IntTensor, + [ + [7, 6, 5, 4, 2], + [6, 5, 4, 2], + [5, 4, 2], + [4, 2], + ], + ) + ) + + # build dictionary + d = Dictionary() + for line in txt: + d.encode_line(line, add_if_not_exist=True) + + def get_ids(dictionary): + ids = [] + for line in txt: + ids.append(dictionary.encode_line(line, add_if_not_exist=False)) + return ids + + def assertMatch(ids, ref_ids): + for toks, ref_toks in zip(ids, ref_ids): + self.assertEqual(toks.size(), ref_toks.size()) + self.assertEqual(0, (toks != ref_toks).sum().item()) + + ids = get_ids(d) + assertMatch(ids, ref_ids1) + + # check finalized dictionary + d.finalize() + finalized_ids = get_ids(d) + assertMatch(finalized_ids, ref_ids2) + + # write to disk and reload + with tempfile.NamedTemporaryFile(mode="w") as tmp_dict: + d.save(tmp_dict.name) + d = Dictionary.load(tmp_dict.name) + reload_ids = get_ids(d) + assertMatch(reload_ids, ref_ids2) + assertMatch(finalized_ids, reload_ids) + + def test_overwrite(self): + # for example, Camembert overwrites <unk>, <s> and </s> + dict_file = io.StringIO( + "<unk> 999 #fairseq:overwrite\n" + "<s> 999 #fairseq:overwrite\n" + "</s> 999 #fairseq:overwrite\n" + ", 999\n" + "▁de 999\n" + ) + d = Dictionary() + d.add_from_file(dict_file) + self.assertEqual(d.index("<pad>"), 1) + self.assertEqual(d.index("foo"), 3) + self.assertEqual(d.index("<unk>"), 4) + self.assertEqual(d.index("<s>"), 5) + self.assertEqual(d.index("</s>"), 6) + self.assertEqual(d.index(","), 7) + self.assertEqual(d.index("▁de"), 8) + + def test_no_overwrite(self): + # for example, Camembert overwrites <unk>, <s> and </s> + dict_file = io.StringIO( + "<unk> 999\n" "<s> 999\n" "</s> 999\n" ", 999\n" "▁de 999\n" + ) + d = Dictionary() + with self.assertRaisesRegex(RuntimeError, "Duplicate"): + d.add_from_file(dict_file) + + def test_space(self): + # for example, character models treat space as a symbol + dict_file = io.StringIO(" 999\n" "a 999\n" "b 999\n") + d = Dictionary() + d.add_from_file(dict_file) + self.assertEqual(d.index(" "), 4) + self.assertEqual(d.index("a"), 5) + self.assertEqual(d.index("b"), 6) + + +if __name__ == "__main__": + unittest.main() diff --git a/SpeechT5/fairseq/tests/test_export.py b/SpeechT5/fairseq/tests/test_export.py new file mode 100644 index 0000000000000000000000000000000000000000..b380697b9aff8799f90c1e0819e408826ecf2932 --- /dev/null +++ b/SpeechT5/fairseq/tests/test_export.py @@ -0,0 +1,121 @@ +#!/usr/bin/env python3 +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. 
+ +import argparse +import tempfile +import unittest + +import torch +from fairseq.data.dictionary import Dictionary +from fairseq.models.transformer import TransformerModel +from fairseq.modules import multihead_attention, sinusoidal_positional_embedding +from fairseq.tasks.fairseq_task import LegacyFairseqTask + + +DEFAULT_TEST_VOCAB_SIZE = 100 + + +class DummyTask(LegacyFairseqTask): + def __init__(self, args): + super().__init__(args) + self.dictionary = get_dummy_dictionary() + if getattr(self.args, "ctc", False): + self.dictionary.add_symbol("<ctc_blank>") + self.src_dict = self.dictionary + self.tgt_dict = self.dictionary + + @property + def source_dictionary(self): + return self.src_dict + + @property + def target_dictionary(self): + return self.dictionary + + +def get_dummy_dictionary(vocab_size=DEFAULT_TEST_VOCAB_SIZE): + dummy_dict = Dictionary() + # add dummy symbol to satisfy vocab size + for id, _ in enumerate(range(vocab_size)): + dummy_dict.add_symbol("{}".format(id), 1000) + return dummy_dict + + +def get_dummy_task_and_parser(): + """ + Return a dummy task and argument parser, which can be used to + create a model/criterion. + """ + parser = argparse.ArgumentParser( + description="test_dummy_s2s_task", argument_default=argparse.SUPPRESS + ) + DummyTask.add_args(parser) + args = parser.parse_args([]) + task = DummyTask.setup_task(args) + return task, parser + + +def _test_save_and_load(scripted_module): + with tempfile.NamedTemporaryFile() as f: + scripted_module.save(f.name) + torch.jit.load(f.name) + + +class TestExportModels(unittest.TestCase): + def test_export_multihead_attention(self): + module = multihead_attention.MultiheadAttention(embed_dim=8, num_heads=2) + scripted = torch.jit.script(module) + _test_save_and_load(scripted) + + def test_incremental_state_multihead_attention(self): + module1 = multihead_attention.MultiheadAttention(embed_dim=8, num_heads=2) + module1 = torch.jit.script(module1) + module2 = multihead_attention.MultiheadAttention(embed_dim=8, num_heads=2) + module2 = torch.jit.script(module2) + + state = {} + state = module1.set_incremental_state(state, "key", {"a": torch.tensor([1])}) + state = module2.set_incremental_state(state, "key", {"a": torch.tensor([2])}) + v1 = module1.get_incremental_state(state, "key")["a"] + v2 = module2.get_incremental_state(state, "key")["a"] + + self.assertEqual(v1, 1) + self.assertEqual(v2, 2) + + def test_positional_embedding(self): + module = sinusoidal_positional_embedding.SinusoidalPositionalEmbedding( + embedding_dim=8, padding_idx=1 + ) + scripted = torch.jit.script(module) + _test_save_and_load(scripted) + + @unittest.skipIf( + torch.__version__ < "1.6.0", "Targeting OSS scriptability for the 1.6 release" + ) + def test_export_transformer(self): + task, parser = get_dummy_task_and_parser() + TransformerModel.add_args(parser) + args = parser.parse_args([]) + model = TransformerModel.build_model(args, task) + scripted = torch.jit.script(model) + _test_save_and_load(scripted) + + @unittest.skipIf( + torch.__version__ < "1.6.0", "Targeting OSS scriptability for the 1.6 release" + ) + def test_export_transformer_no_token_pos_emb(self): + task, parser = get_dummy_task_and_parser() + TransformerModel.add_args(parser) + args = parser.parse_args([]) + args.no_token_positional_embeddings = True + model = TransformerModel.build_model(args, task) + scripted = torch.jit.script(model) + _test_save_and_load(scripted) + + + +if __name__ == "__main__": + unittest.main() diff --git a/SpeechT5/fairseq/tests/test_file_io.py 
b/SpeechT5/fairseq/tests/test_file_io.py new file mode 100644 index 0000000000000000000000000000000000000000..425812bf1672489093941e5fa09f9da3171559ee --- /dev/null +++ b/SpeechT5/fairseq/tests/test_file_io.py @@ -0,0 +1,58 @@ +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +import os +import shutil +import sys +import tempfile +import unittest +from typing import Optional +from unittest.mock import MagicMock + + +class TestFileIO(unittest.TestCase): + + _tmpdir: Optional[str] = None + _tmpfile: Optional[str] = None + _tmpfile_contents = "Hello, World" + + @classmethod + def setUpClass(cls) -> None: + cls._tmpdir = tempfile.mkdtemp() + with open(os.path.join(cls._tmpdir, "test.txt"), "w") as f: + cls._tmpfile = f.name + f.write(cls._tmpfile_contents) + f.flush() + + @classmethod + def tearDownClass(cls) -> None: + # Cleanup temp working dir. + if cls._tmpdir is not None: + shutil.rmtree(cls._tmpdir) # type: ignore + + def test_file_io(self): + from fairseq.file_io import PathManager + + with PathManager.open(os.path.join(self._tmpdir, "test.txt"), "r") as f: + s = f.read() + self.assertEqual(s, self._tmpfile_contents) + + def test_file_io_oss(self): + # Mock iopath to simulate oss environment. + sys.modules["iopath"] = MagicMock() + from fairseq.file_io import PathManager + + with PathManager.open(os.path.join(self._tmpdir, "test.txt"), "r") as f: + s = f.read() + self.assertEqual(s, self._tmpfile_contents) + + def test_file_io_async(self): + # ioPath `PathManager` is initialized after the first `opena` call. + try: + from fairseq.file_io import IOPathManager, PathManager + _asyncfile = os.path.join(self._tmpdir, "async.txt") + f = PathManager.opena(_asyncfile, "wb") + f.close() + + finally: + self.assertTrue(PathManager.async_close()) diff --git a/SpeechT5/fairseq/tests/test_fp16_optimizer.py b/SpeechT5/fairseq/tests/test_fp16_optimizer.py new file mode 100644 index 0000000000000000000000000000000000000000..ce4f1c055ce68b8e3933636fae66cca73c5e9d18 --- /dev/null +++ b/SpeechT5/fairseq/tests/test_fp16_optimizer.py @@ -0,0 +1,112 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. 
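+
+# GPU-only tests: a tiny linear model is trained for one step with
+# FP16Optimizer and MemoryEfficientFP16Optimizer, checking the loss, the
+# clipped gradient norm, the updated parameters and the dynamic loss scale.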
+ +import argparse +import copy +import logging +import unittest + +import torch +from fairseq.optim.fp16_optimizer import FP16Optimizer, MemoryEfficientFP16Optimizer +from omegaconf import OmegaConf + + +@unittest.skipIf(not torch.cuda.is_available(), "test requires a GPU") +class TestGradientScaling(unittest.TestCase): + def setUp(self): + self.x = torch.tensor([2.0]).cuda().half() + weight = 3.0 + bias = 5.0 + self.error = 1.0 + self.target = torch.tensor([self.x * weight + bias + self.error]).cuda().half() + self.loss_fn = torch.nn.L1Loss() + + self.model = torch.nn.Linear(1, 1) + self.model.weight.data = torch.tensor([[weight]]) + self.model.bias.data = torch.tensor([bias]) + self.model.cuda().half() + self.params = list(self.model.parameters()) + + self.cfg_dls = OmegaConf.create( + { + "optimization": { + "lr": [0.1], + }, + "optimizer": { + "_name": "adam", + "lr": [0.1], + "adam_betas": "(0.9, 0.999)", + "adam_eps": 1e-8, + "weight_decay": 0.0, + }, + "common": { + "fp16_init_scale": 1, + "fp16_scale_window": 1, + "fp16_scale_tolerance": 1, + "threshold_loss_scale": 1, + "min_loss_scale": 1e-4, + "tpu": False, + }, + } + ) + logging.disable(logging.CRITICAL) + + def tearDown(self): + logging.disable(logging.NOTSET) + + def run_iter(self, model, params, optimizer): + optimizer.zero_grad() + y = model(self.x) + loss = self.loss_fn(y, self.target) + optimizer.backward(loss) + self.assertEqual(loss, torch.tensor(1.0, device="cuda:0", dtype=torch.float16)) + + grad_norm = optimizer.clip_grad_norm(0) + self.assertAlmostEqual(grad_norm.item(), 2.2361, 4) + + optimizer.step() + self.assertEqual( + model.weight, + torch.tensor( + [[3.0996]], device="cuda:0", dtype=torch.float16, requires_grad=True + ), + ) + self.assertEqual( + model.bias, + torch.tensor( + [5.1016], device="cuda:0", dtype=torch.float16, requires_grad=True + ), + ) + self.assertEqual(optimizer.scaler.loss_scale, 2.0) + + def test_mixed_precision(self): + model = copy.deepcopy(self.model) + params = list(model.parameters()) + optimizer = FP16Optimizer.build_optimizer(self.cfg_dls, params) + + self.run_iter(model, params, optimizer) + self.assertTrue( + all( + torch.all( + fp32_params.eq( + torch.tensor( + [3.1000, 5.1000], device="cuda:0", requires_grad=True + ) + ) + ) + for fp32_params in optimizer.fp32_params.values() + ) + ) + + def test_memory_efficient(self): + model = copy.deepcopy(self.model) + params = list(model.parameters()) + optimizer = MemoryEfficientFP16Optimizer.build_optimizer(self.cfg_dls, params) + + self.run_iter(model, params, optimizer) + + +if __name__ == "__main__": + unittest.main() diff --git a/SpeechT5/fairseq/tests/test_inference_dropout.py b/SpeechT5/fairseq/tests/test_inference_dropout.py new file mode 100644 index 0000000000000000000000000000000000000000..353ac674780a9795492c75aa0a7bc0677b07a9c9 --- /dev/null +++ b/SpeechT5/fairseq/tests/test_inference_dropout.py @@ -0,0 +1,70 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. 
+ +import logging +import unittest + +from fairseq.dataclass.utils import convert_namespace_to_omegaconf +from fairseq.models.transformer import TransformerModel +from tests.test_sequence_generator import get_dummy_task_and_parser + + +class TestInferenceDropout(unittest.TestCase): + def setUp(self): + self.task, self.parser = get_dummy_task_and_parser() + TransformerModel.add_args(self.parser) + self.args = self.parser.parse_args([]) + self.args.encoder_layers = 2 + self.args.decoder_layers = 1 + logging.disable(logging.CRITICAL) + + def tearDown(self): + logging.disable(logging.NOTSET) + + def test_sets_inference_dropout_to_true(self): + self.args.retain_dropout = True + self.transformer_model = TransformerModel.build_model(self.args, self.task) + cfg = convert_namespace_to_omegaconf(self.args) + self.transformer_model.prepare_for_inference_(cfg) + assert self.transformer_model.encoder.dropout_module.apply_during_inference + assert self.transformer_model.decoder.dropout_module.apply_during_inference + for layer in self.transformer_model.encoder.layers: + assert layer.dropout_module.apply_during_inference + + def test_inference_dropout_false_by_default(self): + self.transformer_model = TransformerModel.build_model(self.args, self.task) + cfg = convert_namespace_to_omegaconf(self.args) + self.transformer_model.prepare_for_inference_(cfg) + assert not self.transformer_model.encoder.dropout_module.apply_during_inference + assert not self.transformer_model.decoder.dropout_module.apply_during_inference + for layer in self.transformer_model.encoder.layers: + assert not layer.dropout_module.apply_during_inference + for layer in self.transformer_model.decoder.layers: + assert not layer.dropout_module.apply_during_inference + + def test_applies_training_mode(self): + self.transformer_model = TransformerModel.build_model(self.args, self.task) + assert self.transformer_model.encoder.dropout_module.training + for layer in self.transformer_model.encoder.layers: + assert layer.dropout_module.training + + self.transformer_model.eval() + assert not self.transformer_model.decoder.dropout_module.training + for layer in self.transformer_model.encoder.layers: + assert not layer.dropout_module.training + + def test_retain_modules(self): + self.args.retain_dropout = True + self.args.retain_dropout_modules = [ + "TransformerEncoder", + "TransformerEncoderLayer", + ] + self.transformer_model = TransformerModel.build_model(self.args, self.task) + cfg = convert_namespace_to_omegaconf(self.args) + self.transformer_model.prepare_for_inference_(cfg) + assert self.transformer_model.encoder.dropout_module.apply_during_inference + assert not self.transformer_model.decoder.dropout_module.apply_during_inference + for layer in self.transformer_model.decoder.layers: + assert not layer.dropout_module.apply_during_inference diff --git a/SpeechT5/fairseq/tests/test_iopath.py b/SpeechT5/fairseq/tests/test_iopath.py new file mode 100644 index 0000000000000000000000000000000000000000..908261a6619806f7ef9b5dd1beb5d6817b249a6e --- /dev/null +++ b/SpeechT5/fairseq/tests/test_iopath.py @@ -0,0 +1,29 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. 
+ +import unittest +from unittest import mock + + +class TestIOPath(unittest.TestCase): + + def test_no_iopath(self): + from .test_reproducibility import TestReproducibility + + with mock.patch.dict("sys.modules", {"iopath": None}): + # reuse reproducibility tests, which are e2e tests that should cover + # most checkpoint related functionality + TestReproducibility._test_reproducibility(self, "test_reproducibility") + + def test_no_supports_rename(self): + from .test_reproducibility import TestReproducibility + + with mock.patch("fairseq.file_io.PathManager.supports_rename") as mock_fn: + mock_fn.return_value = False + TestReproducibility._test_reproducibility(self, "test_reproducibility") + + +if __name__ == "__main__": + unittest.main() diff --git a/SpeechT5/fairseq/tests/test_iterators.py b/SpeechT5/fairseq/tests/test_iterators.py new file mode 100644 index 0000000000000000000000000000000000000000..7b3dd4848553357e5e8326ed3a31cf5d68ceea94 --- /dev/null +++ b/SpeechT5/fairseq/tests/test_iterators.py @@ -0,0 +1,137 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +import unittest + +from fairseq.data import iterators + + +class TestIterators(unittest.TestCase): + def test_counting_iterator_index(self, ref=None, itr=None): + # Test the indexing functionality of CountingIterator + if ref is None: + assert itr is None + ref = list(range(10)) + itr = iterators.CountingIterator(ref) + else: + assert len(ref) == 10 + assert itr is not None + + self.assertTrue(itr.has_next()) + self.assertEqual(itr.n, 0) + self.assertEqual(next(itr), ref[0]) + self.assertEqual(itr.n, 1) + self.assertEqual(next(itr), ref[1]) + self.assertEqual(itr.n, 2) + itr.skip(3) + self.assertEqual(itr.n, 5) + self.assertEqual(next(itr), ref[5]) + itr.skip(2) + self.assertEqual(itr.n, 8) + self.assertEqual(list(itr), [ref[8], ref[9]]) + self.assertFalse(itr.has_next()) + + def test_counting_iterator_length_mismatch(self): + ref = list(range(10)) + # When the underlying iterable is longer than the CountingIterator, + # the remaining items in the iterable should be ignored + itr = iterators.CountingIterator(ref, total=8) + self.assertEqual(list(itr), ref[:8]) + # When the underlying iterable is shorter than the CountingIterator, + # raise an IndexError when the underlying iterable is exhausted + itr = iterators.CountingIterator(ref, total=12) + self.assertRaises(IndexError, list, itr) + + def test_counting_iterator_take(self): + # Test the "take" method of CountingIterator + ref = list(range(10)) + itr = iterators.CountingIterator(ref) + itr.take(5) + self.assertEqual(len(itr), len(list(iter(itr)))) + self.assertEqual(len(itr), 5) + + itr = iterators.CountingIterator(ref) + itr.take(5) + self.assertEqual(next(itr), ref[0]) + self.assertEqual(next(itr), ref[1]) + itr.skip(2) + self.assertEqual(next(itr), ref[4]) + self.assertFalse(itr.has_next()) + + def test_grouped_iterator(self): + # test correctness + x = list(range(10)) + itr = iterators.GroupedIterator(x, 1) + self.assertEqual(list(itr), [[0], [1], [2], [3], [4], [5], [6], [7], [8], [9]]) + itr = iterators.GroupedIterator(x, 4) + self.assertEqual(list(itr), [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]) + itr = iterators.GroupedIterator(x, 5) + self.assertEqual(list(itr), [[0, 1, 2, 3, 4], [5, 6, 7, 8, 9]]) + + # test the GroupIterator also works correctly as a CountingIterator + x = list(range(30)) + ref = list(iterators.GroupedIterator(x, 3)) + 
itr = iterators.GroupedIterator(x, 3) + self.test_counting_iterator_index(ref, itr) + + def test_sharded_iterator(self): + # test correctness + x = list(range(10)) + itr = iterators.ShardedIterator(x, num_shards=1, shard_id=0) + self.assertEqual(list(itr), x) + itr = iterators.ShardedIterator(x, num_shards=2, shard_id=0) + self.assertEqual(list(itr), [0, 2, 4, 6, 8]) + itr = iterators.ShardedIterator(x, num_shards=2, shard_id=1) + self.assertEqual(list(itr), [1, 3, 5, 7, 9]) + itr = iterators.ShardedIterator(x, num_shards=3, shard_id=0) + self.assertEqual(list(itr), [0, 3, 6, 9]) + itr = iterators.ShardedIterator(x, num_shards=3, shard_id=1) + self.assertEqual(list(itr), [1, 4, 7, None]) + itr = iterators.ShardedIterator(x, num_shards=3, shard_id=2) + self.assertEqual(list(itr), [2, 5, 8, None]) + + # test CountingIterator functionality + x = list(range(30)) + ref = list(iterators.ShardedIterator(x, num_shards=3, shard_id=0)) + itr = iterators.ShardedIterator(x, num_shards=3, shard_id=0) + self.test_counting_iterator_index(ref, itr) + + def test_counting_iterator_buffered_iterator_take(self): + ref = list(range(10)) + buffered_itr = iterators.BufferedIterator(2, ref) + itr = iterators.CountingIterator(buffered_itr) + itr.take(5) + self.assertEqual(len(itr), len(list(iter(itr)))) + self.assertEqual(len(itr), 5) + + buffered_itr = iterators.BufferedIterator(2, ref) + itr = iterators.CountingIterator(buffered_itr) + itr.take(5) + self.assertEqual(len(buffered_itr), 5) + self.assertEqual(len(list(iter(buffered_itr))), 5) + + buffered_itr = iterators.BufferedIterator(2, ref) + itr = iterators.CountingIterator(buffered_itr) + itr.take(5) + self.assertEqual(next(itr), ref[0]) + self.assertEqual(next(itr), ref[1]) + itr.skip(2) + self.assertEqual(next(itr), ref[4]) + self.assertFalse(itr.has_next()) + self.assertRaises(StopIteration, next, buffered_itr) + + ref = list(range(4, 10)) + buffered_itr = iterators.BufferedIterator(2, ref) + itr = iterators.CountingIterator(buffered_itr, start=4) + itr.take(5) + self.assertEqual(len(itr), 5) + self.assertEqual(len(buffered_itr), 1) + self.assertEqual(next(itr), ref[0]) + self.assertFalse(itr.has_next()) + self.assertRaises(StopIteration, next, buffered_itr) + + +if __name__ == "__main__": + unittest.main() diff --git a/SpeechT5/fairseq/tests/test_label_smoothing.py b/SpeechT5/fairseq/tests/test_label_smoothing.py new file mode 100644 index 0000000000000000000000000000000000000000..04c0f974ac80f7606327f868e948712c3c18f1d0 --- /dev/null +++ b/SpeechT5/fairseq/tests/test_label_smoothing.py @@ -0,0 +1,123 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. 
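+
+# These tests compare LabelSmoothedCrossEntropyCriterion with the plain
+# CrossEntropyCriterion on a two-sentence dummy batch: nll_loss bookkeeping,
+# handling of padded targets, reduce=True vs reduce=False, and equivalence
+# of the two criteria when label_smoothing is 0.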
+ +import argparse +import copy +import unittest + +import tests.utils as test_utils +import torch +from fairseq.criterions.cross_entropy import CrossEntropyCriterion +from fairseq.criterions.label_smoothed_cross_entropy import ( + LabelSmoothedCrossEntropyCriterion, +) + + +class TestLabelSmoothing(unittest.TestCase): + def setUp(self): + # build dictionary + self.d = test_utils.dummy_dictionary(3) + vocab = len(self.d) + self.assertEqual(vocab, 4 + 3) # 4 special + 3 tokens + self.assertEqual(self.d.pad(), 1) + self.assertEqual(self.d.eos(), 2) + self.assertEqual(self.d.unk(), 3) + pad, eos, unk, w1, w2, w3 = 1, 2, 3, 4, 5, 6 # noqa: F841 + + # build dataset + self.data = [ + # the first batch item has padding + { + "source": torch.LongTensor([w1, eos]), + "target": torch.LongTensor([w1, eos]), + }, + { + "source": torch.LongTensor([w1, eos]), + "target": torch.LongTensor([w1, w1, eos]), + }, + ] + self.sample = next(test_utils.dummy_dataloader(self.data)) + + # build model + self.args = argparse.Namespace() + self.args.sentence_avg = False + self.args.report_accuracy = False + self.args.probs = ( + torch.FloatTensor( + [ + # pad eos unk w1 w2 w3 + [0.05, 0.05, 0.1, 0.05, 0.3, 0.4, 0.05], + [0.05, 0.10, 0.2, 0.05, 0.2, 0.3, 0.10], + [0.05, 0.15, 0.3, 0.05, 0.1, 0.2, 0.15], + ] + ) + .unsqueeze(0) + .expand(2, 3, 7) + ) # add batch dimension + self.task = test_utils.TestTranslationTask.setup_task(self.args, self.d, self.d) + self.model = self.task.build_model(self.args) + + def test_nll_loss(self): + self.args.label_smoothing = 0.1 + nll_crit = CrossEntropyCriterion.build_criterion(self.args, self.task) + smooth_crit = LabelSmoothedCrossEntropyCriterion.build_criterion( + self.args, self.task + ) + nll_loss, nll_sample_size, nll_logging_output = nll_crit( + self.model, self.sample + ) + smooth_loss, smooth_sample_size, smooth_logging_output = smooth_crit( + self.model, self.sample + ) + self.assertLess(abs(nll_loss - nll_logging_output["loss"]), 1e-6) + self.assertLess(abs(nll_loss - smooth_logging_output["nll_loss"]), 1e-6) + + def test_padding(self): + self.args.label_smoothing = 0.1 + crit = LabelSmoothedCrossEntropyCriterion.build_criterion(self.args, self.task) + loss, _, logging_output = crit(self.model, self.sample) + + def get_one_no_padding(idx): + # create a new sample with just a single batch item so that there's + # no padding + sample1 = next(test_utils.dummy_dataloader([self.data[idx]])) + args1 = copy.copy(self.args) + args1.probs = args1.probs[idx, :, :].unsqueeze(0) + model1 = self.task.build_model(args1) + loss1, _, _ = crit(model1, sample1) + return loss1 + + loss1 = get_one_no_padding(0) + loss2 = get_one_no_padding(1) + self.assertAlmostEqual(loss, loss1 + loss2) + + def test_reduction(self): + self.args.label_smoothing = 0.1 + crit = LabelSmoothedCrossEntropyCriterion.build_criterion(self.args, self.task) + loss, _, logging_output = crit(self.model, self.sample, reduce=True) + unreduced_loss, _, _ = crit(self.model, self.sample, reduce=False) + self.assertAlmostEqual(loss, unreduced_loss.sum()) + + def test_zero_eps(self): + self.args.label_smoothing = 0.0 + nll_crit = CrossEntropyCriterion.build_criterion(self.args, self.task) + smooth_crit = LabelSmoothedCrossEntropyCriterion.build_criterion( + self.args, self.task + ) + nll_loss, nll_sample_size, nll_logging_output = nll_crit( + self.model, self.sample + ) + smooth_loss, smooth_sample_size, smooth_logging_output = smooth_crit( + self.model, self.sample + ) + self.assertAlmostEqual(nll_loss, smooth_loss) + + def 
assertAlmostEqual(self, t1, t2): + self.assertEqual(t1.size(), t2.size(), "size mismatch") + self.assertLess((t1 - t2).abs().max(), 1e-6) + + +if __name__ == "__main__": + unittest.main() diff --git a/SpeechT5/fairseq/tests/test_lm_context_window.py b/SpeechT5/fairseq/tests/test_lm_context_window.py new file mode 100644 index 0000000000000000000000000000000000000000..7415e86abdf8ddc2d797092bf98f7a1331e038d6 --- /dev/null +++ b/SpeechT5/fairseq/tests/test_lm_context_window.py @@ -0,0 +1,51 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +import unittest + +import torch +from fairseq.data import MonolingualDataset +from fairseq.tasks.language_modeling import LanguageModelingTask, LanguageModelingConfig +from tests import utils as test_utils + + +class TestLMContextWindow(unittest.TestCase): + + def test_eval_dataloader(self): + dictionary = test_utils.dummy_dictionary(10) + assert len(dictionary) == 14 # 4 extra special symbols + assert dictionary.pad() == 1 + + dataset = test_utils.TestDataset([ + torch.tensor([4, 5, 6, 7], dtype=torch.long), + torch.tensor([8, 9, 10, 11], dtype=torch.long), + torch.tensor([12, 13], dtype=torch.long), + ]) + dataset = MonolingualDataset(dataset, sizes=[4, 4, 2], src_vocab=dictionary) + + config = LanguageModelingConfig(tokens_per_sample=4) + task = LanguageModelingTask(config, dictionary) + + eval_dataloader = task.eval_lm_dataloader( + dataset=dataset, + batch_size=1, + context_window=2, + ) + + batch = next(eval_dataloader) + assert batch["net_input"]["src_tokens"][0].tolist() == [4, 5, 6, 7, 1, 1] + assert batch["target"][0].tolist() == [4, 5, 6, 7, 1, 1] + + batch = next(eval_dataloader) + assert batch["net_input"]["src_tokens"][0].tolist() == [6, 7, 8, 9, 10, 11] + assert batch["target"][0].tolist() == [1, 1, 8, 9, 10, 11] + + batch = next(eval_dataloader) + assert batch["net_input"]["src_tokens"][0].tolist() == [10, 11, 12, 13] + assert batch["target"][0].tolist() == [1, 1, 12, 13] + + +if __name__ == "__main__": + unittest.main() diff --git a/SpeechT5/fairseq/tests/test_lstm_jitable.py b/SpeechT5/fairseq/tests/test_lstm_jitable.py new file mode 100644 index 0000000000000000000000000000000000000000..38f79d17931c32447e96c0fbae2630ac397e1804 --- /dev/null +++ b/SpeechT5/fairseq/tests/test_lstm_jitable.py @@ -0,0 +1,115 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. 
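+# Descriptive note (added comment): the tests below verify that LSTMModel can
+# be scripted with TorchScript, saved and reloaded, and that scripted and
+# eager-mode outputs match on random inputs.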
+ +import argparse +import tempfile +import unittest + +import torch +from fairseq.data.dictionary import Dictionary +from fairseq.models.lstm import LSTMModel +from fairseq.tasks.fairseq_task import LegacyFairseqTask + + +DEFAULT_TEST_VOCAB_SIZE = 100 + + +class DummyTask(LegacyFairseqTask): + def __init__(self, args): + super().__init__(args) + self.dictionary = get_dummy_dictionary() + if getattr(self.args, "ctc", False): + self.dictionary.add_symbol("<ctc_blank>") + self.src_dict = self.dictionary + self.tgt_dict = self.dictionary + + @property + def source_dictionary(self): + return self.src_dict + + @property + def target_dictionary(self): + return self.dictionary + + +def get_dummy_dictionary(vocab_size=DEFAULT_TEST_VOCAB_SIZE): + dummy_dict = Dictionary() + # add dummy symbol to satisfy vocab size + for id, _ in enumerate(range(vocab_size)): + dummy_dict.add_symbol("{}".format(id), 1000) + return dummy_dict + + +def get_dummy_task_and_parser(): + """ + to build a fariseq model, we need some dummy parse and task. This function + is used to create dummy task and parser to faciliate model/criterion test + + Note: we use FbSpeechRecognitionTask as the dummy task. You may want + to use other task by providing another function + """ + parser = argparse.ArgumentParser( + description="test_dummy_s2s_task", argument_default=argparse.SUPPRESS + ) + DummyTask.add_args(parser) + args = parser.parse_args([]) + task = DummyTask.setup_task(args) + return task, parser + + +class TestJitLSTMModel(unittest.TestCase): + def _test_save_and_load(self, scripted_module): + with tempfile.NamedTemporaryFile() as f: + scripted_module.save(f.name) + torch.jit.load(f.name) + + def assertTensorEqual(self, t1, t2): + t1 = t1[~torch.isnan(t1)] # can cause size mismatch errors if there are NaNs + t2 = t2[~torch.isnan(t2)] + self.assertEqual(t1.size(), t2.size(), "size mismatch") + self.assertEqual(t1.ne(t2).long().sum(), 0) + + def test_jit_and_export_lstm(self): + task, parser = get_dummy_task_and_parser() + LSTMModel.add_args(parser) + args = parser.parse_args([]) + args.criterion = "" + model = LSTMModel.build_model(args, task) + scripted_model = torch.jit.script(model) + self._test_save_and_load(scripted_model) + + def test_assert_jit_vs_nonjit_(self): + task, parser = get_dummy_task_and_parser() + LSTMModel.add_args(parser) + args = parser.parse_args([]) + args.criterion = "" + model = LSTMModel.build_model(args, task) + model.eval() + scripted_model = torch.jit.script(model) + scripted_model.eval() + idx = len(task.source_dictionary) + iter = 100 + # Inject random input and check output + seq_len_tensor = torch.randint(1, 10, (iter,)) + num_samples_tensor = torch.randint(1, 10, (iter,)) + for i in range(iter): + seq_len = seq_len_tensor[i] + num_samples = num_samples_tensor[i] + src_token = (torch.randint(0, idx, (num_samples, seq_len)),) + src_lengths = torch.randint(1, seq_len + 1, (num_samples,)) + src_lengths, _ = torch.sort(src_lengths, descending=True) + # Force the first sample to have seq_len + src_lengths[0] = seq_len + prev_output_token = (torch.randint(0, idx, (num_samples, 1)),) + result = model(src_token[0], src_lengths, prev_output_token[0], None) + scripted_result = scripted_model( + src_token[0], src_lengths, prev_output_token[0], None + ) + self.assertTensorEqual(result[0], scripted_result[0]) + self.assertTensorEqual(result[1], scripted_result[1]) + + +if __name__ == "__main__": + unittest.main() diff --git a/SpeechT5/fairseq/tests/test_memory_efficient_fp16.py 
b/SpeechT5/fairseq/tests/test_memory_efficient_fp16.py new file mode 100644 index 0000000000000000000000000000000000000000..2bf2f29888d6027896128930626b1aafe7f18475 --- /dev/null +++ b/SpeechT5/fairseq/tests/test_memory_efficient_fp16.py @@ -0,0 +1,78 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +import argparse +import logging +import unittest + +import torch +from fairseq.optim.adam import FairseqAdam +from fairseq.optim.fp16_optimizer import MemoryEfficientFP16Optimizer +from omegaconf import OmegaConf + + +@unittest.skipIf(not torch.cuda.is_available(), "test requires a GPU") +class TestMemoryEfficientFP16(unittest.TestCase): + def setUp(self): + logging.disable(logging.CRITICAL) + + def tearDown(self): + logging.disable(logging.NOTSET) + + def test_load_state_dict(self): + # define simple FP16 model + model = torch.nn.Linear(5, 5).cuda().half() + params = list(model.parameters()) + + # initialize memory efficient FP16 optimizer + # with pseudo DictConfigs + optimizer = FairseqAdam( + cfg=OmegaConf.create( + vars( + argparse.Namespace( + adam_betas="(0.9, 0.999)", + adam_eps=1e-8, + weight_decay=0.0, + lr=[0.00001], + ) + ) + ), + params=params, + ) + me_optimizer = MemoryEfficientFP16Optimizer( + cfg=OmegaConf.create( + { + "common": vars( + argparse.Namespace( + fp16_init_scale=1, + fp16_scale_window=1, + fp16_scale_tolerance=1, + threshold_loss_scale=1, + min_loss_scale=1e-4, + ) + ) + } + ), + params=params, + optimizer=optimizer, + ) + + # optimizer state is created in the first step + loss = model(torch.rand(5).cuda().half()).sum() + me_optimizer.backward(loss) + me_optimizer.step() + + # reload state + state = me_optimizer.state_dict() + me_optimizer.load_state_dict(state) + for k, v in me_optimizer.optimizer.state.items(): + self.assertTrue(k.dtype == torch.float16) + for v_i in v.values(): + if torch.is_tensor(v_i): + self.assertTrue(v_i.dtype == torch.float32) + + +if __name__ == "__main__": + unittest.main() diff --git a/SpeechT5/fairseq/tests/test_metrics.py b/SpeechT5/fairseq/tests/test_metrics.py new file mode 100644 index 0000000000000000000000000000000000000000..2de6969cf4445bc6cda44dacf6de765ea30d5f5b --- /dev/null +++ b/SpeechT5/fairseq/tests/test_metrics.py @@ -0,0 +1,77 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. 
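+# Descriptive note (added comment): the tests below cover the metrics
+# aggregation API: nested aggregate() contexts, new_root aggregators,
+# named aggregators, and duplicate names across nesting levels.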
+ +import unittest +import uuid + +from fairseq import metrics + + +class TestMetrics(unittest.TestCase): + def test_nesting(self): + with metrics.aggregate() as a: + metrics.log_scalar("loss", 1) + with metrics.aggregate() as b: + metrics.log_scalar("loss", 2) + + self.assertEqual(a.get_smoothed_values()["loss"], 1.5) + self.assertEqual(b.get_smoothed_values()["loss"], 2) + + def test_new_root(self): + with metrics.aggregate() as a: + metrics.log_scalar("loss", 1) + with metrics.aggregate(new_root=True) as b: + metrics.log_scalar("loss", 2) + + self.assertEqual(a.get_smoothed_values()["loss"], 1) + self.assertEqual(b.get_smoothed_values()["loss"], 2) + + def test_nested_new_root(self): + with metrics.aggregate() as layer1: + metrics.log_scalar("loss", 1) + with metrics.aggregate(new_root=True) as layer2: + metrics.log_scalar("loss", 2) + with metrics.aggregate() as layer3: + metrics.log_scalar("loss", 3) + with metrics.aggregate(new_root=True) as layer4: + metrics.log_scalar("loss", 4) + metrics.log_scalar("loss", 1.5) + + self.assertEqual(layer4.get_smoothed_values()["loss"], 4) + self.assertEqual(layer3.get_smoothed_values()["loss"], 3) + self.assertEqual(layer2.get_smoothed_values()["loss"], 2.5) + self.assertEqual(layer1.get_smoothed_values()["loss"], 1.25) + + def test_named(self): + name = str(uuid.uuid4()) + metrics.reset_meters(name) + + with metrics.aggregate(name): + metrics.log_scalar("loss", 1) + + metrics.log_scalar("loss", 3) + + with metrics.aggregate(name): + metrics.log_scalar("loss", 2) + + self.assertEqual(metrics.get_smoothed_values(name)["loss"], 1.5) + + def test_nested_duplicate_names(self): + name = str(uuid.uuid4()) + metrics.reset_meters(name) + + with metrics.aggregate(name): + metrics.log_scalar("loss", 1) + with metrics.aggregate() as other: + with metrics.aggregate(name): + metrics.log_scalar("loss", 2) + metrics.log_scalar("loss", 6) + + self.assertEqual(metrics.get_smoothed_values(name)["loss"], 3) + self.assertEqual(other.get_smoothed_values()["loss"], 2) + + +if __name__ == "__main__": + unittest.main() diff --git a/SpeechT5/fairseq/tests/test_multi_corpus_dataset.py b/SpeechT5/fairseq/tests/test_multi_corpus_dataset.py new file mode 100644 index 0000000000000000000000000000000000000000..5a79f4b680e5bc2c7374ec6dd8ea525c47b40985 --- /dev/null +++ b/SpeechT5/fairseq/tests/test_multi_corpus_dataset.py @@ -0,0 +1,79 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. 
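+# Descriptive note (added comment): the test below checks that
+# MultiCorpusDataset draws samples from its component datasets in proportion
+# to the requested sampling distribution.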
+ +import unittest +from collections import OrderedDict + +import torch +from fairseq.data import LanguagePairDataset, TokenBlockDataset +from fairseq.data.multi_corpus_dataset import MultiCorpusDataset +from tests.test_train import mock_dict + + +class TestMultiCorpusDataset(unittest.TestCase): + def setUp(self): + d = mock_dict() + tokens_1 = torch.LongTensor([i for i in range(1, 5000, 2)]).view(1, -1) + tokens_ds1 = TokenBlockDataset( + tokens_1, + sizes=[tokens_1.size(-1)], + block_size=1, + pad=0, + eos=1, + include_targets=False, + ) + self.dataset_1 = LanguagePairDataset( + tokens_ds1, tokens_ds1.sizes, d, shuffle=False + ) + tokens_2 = torch.LongTensor([i for i in range(0, 5000, 2)]).view(1, -1) + tokens_ds2 = TokenBlockDataset( + tokens_2, + sizes=[tokens_2.size(-1)], + block_size=1, + pad=0, + eos=1, + include_targets=False, + ) + self.dataset_2 = LanguagePairDataset( + tokens_ds2, tokens_ds2.sizes, d, shuffle=False + ) + + def _test_sample_helper( + self, + distribution, + ): + m = MultiCorpusDataset( + OrderedDict({0: self.dataset_1, 1: self.dataset_2}), + distribution=distribution, + seed=0, + sort_indices=True, + ) + m.set_epoch(1) + indices = m.ordered_indices() + count_sample_from_first_dataset = 0 + items = set() + for i in indices: + item = m[i]["source"].item() + if item % 2 == 1: + count_sample_from_first_dataset += 1 + + items.add(item) + sample_from_first_ds_percentage = ( + 1.0 * count_sample_from_first_dataset / len(indices) + ) + self.assertLess( + abs(sample_from_first_ds_percentage - distribution[0]), + 0.01, + ) + self.assertEqual( + len(items), + int(min(len(self.dataset_1), len(indices) * distribution[0]) + + min(len(self.dataset_1), len(indices) * distribution[1])) + ) + print(distribution) + + def test_multi_corpus_dataset(self): + for distribution in [[0.5, 0.5], [0.1, 0.9], [0.9, 0.1]]: + self._test_sample_helper(distribution=distribution) diff --git a/SpeechT5/fairseq/tests/test_multi_corpus_sampled_dataset.py b/SpeechT5/fairseq/tests/test_multi_corpus_sampled_dataset.py new file mode 100644 index 0000000000000000000000000000000000000000..05b20328c5605178767d138cc75e070824679842 --- /dev/null +++ b/SpeechT5/fairseq/tests/test_multi_corpus_sampled_dataset.py @@ -0,0 +1,95 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. 
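+# Descriptive note (added comment): the tests below check
+# MultiCorpusSampledDataset with the default uniform sampler and with a
+# custom weighted sampling function.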
+ +import unittest +from collections import OrderedDict + +import numpy as np +import torch +from fairseq.data import LanguagePairDataset, TokenBlockDataset +from fairseq.data.multi_corpus_sampled_dataset import MultiCorpusSampledDataset +from tests.test_train import mock_dict + + +class TestMultiCorpusSampledDataset(unittest.TestCase): + def setUp(self): + d = mock_dict() + tokens_1 = torch.LongTensor([1]).view(1, -1) + tokens_ds1 = TokenBlockDataset( + tokens_1, + sizes=[tokens_1.size(-1)], + block_size=1, + pad=0, + eos=1, + include_targets=False, + ) + self.dataset_1 = LanguagePairDataset( + tokens_ds1, tokens_ds1.sizes, d, shuffle=False + ) + tokens_2 = torch.LongTensor([2]).view(1, -1) + tokens_ds2 = TokenBlockDataset( + tokens_2, + sizes=[tokens_2.size(-1)], + block_size=1, + pad=0, + eos=1, + include_targets=False, + ) + self.dataset_2 = LanguagePairDataset( + tokens_ds2, tokens_ds2.sizes, d, shuffle=False + ) + + def _test_sample_helper( + self, + expected_sample_from_first_ds_percentage, + num_samples=1000, + sampling_func=None, + ): + # To make sure test is not flaky + np.random.seed(0) + if sampling_func is None: + m = MultiCorpusSampledDataset( + OrderedDict({0: self.dataset_1, 1: self.dataset_2}), + ) + else: + m = MultiCorpusSampledDataset( + OrderedDict({0: self.dataset_1, 1: self.dataset_2}), + sampling_func=sampling_func, + ) + m.ordered_indices() + count_sample_from_first_dataset = 0 + for _ in range(num_samples): + if m.collater([m[0], m[1]])["net_input"]["src_tokens"][0] == 1: + count_sample_from_first_dataset += 1 + sample_from_first_ds_percentage = ( + 1.0 * count_sample_from_first_dataset / num_samples + ) + self.assertLess( + abs( + sample_from_first_ds_percentage + - expected_sample_from_first_ds_percentage + ), + 0.01, + ) + + def test_multi_corpus_sampled_dataset_uniform_sample(self): + self._test_sample_helper(expected_sample_from_first_ds_percentage=0.5) + + def test_multi_corpus_sampled_dataset_weighted_sample(self): + def naive_weighted_sample(weights): + def f(l): + v = np.random.random() + agg = 0 + for i, weight in enumerate(weights): + agg += weight + if agg > v: + return i + + return f + + self._test_sample_helper( + expected_sample_from_first_ds_percentage=0.9, + sampling_func=naive_weighted_sample(weights=[0.9, 0.1]), + ) diff --git a/SpeechT5/fairseq/tests/test_multihead_attention.py b/SpeechT5/fairseq/tests/test_multihead_attention.py new file mode 100644 index 0000000000000000000000000000000000000000..620a2d679147bbbb8d15f3323374a39939686ec2 --- /dev/null +++ b/SpeechT5/fairseq/tests/test_multihead_attention.py @@ -0,0 +1,73 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. 
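+# Descriptive note (added comment): the test below exercises
+# MultiheadAttention._append_prev_key_padding_mask for every combination of
+# current and previous key padding masks.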
+ +import unittest + +import torch +from fairseq.modules.multihead_attention import MultiheadAttention + + +class TestMultiheadAttention(unittest.TestCase): + def test_append_prev_key_padding_mask(self): + bsz = 1 + src_len = 4 + + cases = [ + # no padding mask + (None, None, None), + # current padding mask only + ( + torch.tensor([[1]]).bool(), + None, + torch.tensor([[0, 0, 0, 1]]).bool(), + ), + # previous padding mask only + ( + None, + torch.tensor([[0, 1, 0]]).bool(), + torch.tensor([[0, 1, 0, 0]]).bool(), + ), + # both padding masks + ( + torch.tensor([[1]]).bool(), + torch.tensor([[0, 1, 0]]).bool(), + torch.tensor([[0, 1, 0, 1]]).bool(), + ), + # prev_key_padding_mask already full + ( + torch.tensor([[0, 1, 0, 1]]).bool(), + None, + torch.tensor([[0, 1, 0, 1]]).bool(), + ), + # key_padding_mask already full + ( + None, + torch.tensor([[0, 1, 0, 1]]).bool(), + torch.tensor([[0, 1, 0, 1]]).bool(), + ), + ] + for c in cases: + key_padding_mask = MultiheadAttention._append_prev_key_padding_mask( + c[0], + c[1], + batch_size=bsz, + src_len=src_len, + static_kv=False, + ) + + if key_padding_mask is not None: + self.assertTrue( + torch.all(torch.eq(key_padding_mask, c[2])), + f"Unexpected resultant key padding mask: {key_padding_mask}" + f" given current: {c[0]} and previous: {c[1]}", + ) + self.assertEqual(key_padding_mask.size(0), bsz) + self.assertEqual(key_padding_mask.size(1), src_len) + else: + self.assertIsNone(c[2]) + + +if __name__ == "__main__": + unittest.main() diff --git a/SpeechT5/fairseq/tests/test_noising.py b/SpeechT5/fairseq/tests/test_noising.py new file mode 100644 index 0000000000000000000000000000000000000000..b3d0d123c42eaca6f79371aa268049e668fcfcce --- /dev/null +++ b/SpeechT5/fairseq/tests/test_noising.py @@ -0,0 +1,530 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +import unittest +from typing import Dict, List + +import tests.utils as test_utils +import torch +from fairseq import utils +from fairseq.data import ( + Dictionary, + LanguagePairDataset, + TransformEosDataset, + data_utils, + noising, +) + + +class TestDataNoising(unittest.TestCase): + def _get_test_data_with_bpe_cont_marker(self, append_eos=True): + """ + Args: + append_eos: if True, each input sentence in the source tokens tensor + will have an EOS appended to the end. + + Returns: + vocabs: BPE vocab with continuation markers as suffixes to denote + non-end of word tokens. This is the standard BPE format used in + fairseq's preprocessing. + x: input tensor containing numberized source tokens, with EOS at the + end if append_eos is true + src_lengths: and source lengths. + """ + vocab = Dictionary() + vocab.add_symbol("he@@") + vocab.add_symbol("llo") + vocab.add_symbol("how") + vocab.add_symbol("are") + vocab.add_symbol("y@@") + vocab.add_symbol("ou") + vocab.add_symbol("n@@") + vocab.add_symbol("ew") + vocab.add_symbol("or@@") + vocab.add_symbol("k") + + src_tokens = [ + ["he@@", "llo", "n@@", "ew", "y@@", "or@@", "k"], + ["how", "are", "y@@", "ou"], + ] + x, src_lengths = x, src_lengths = self._convert_src_tokens_to_tensor( + vocab=vocab, src_tokens=src_tokens, append_eos=append_eos + ) + return vocab, x, src_lengths + + def _get_test_data_with_bpe_end_marker(self, append_eos=True): + """ + Args: + append_eos: if True, each input sentence in the source tokens tensor + will have an EOS appended to the end. 
+ + Returns: + vocabs: BPE vocab with end-of-word markers as suffixes to denote + tokens at the end of a word. This is an alternative to fairseq's + standard preprocessing framework and is not generally supported + within fairseq. + x: input tensor containing numberized source tokens, with EOS at the + end if append_eos is true + src_lengths: and source lengths. + """ + vocab = Dictionary() + vocab.add_symbol("he") + vocab.add_symbol("llo_EOW") + vocab.add_symbol("how_EOW") + vocab.add_symbol("are_EOW") + vocab.add_symbol("y") + vocab.add_symbol("ou_EOW") + vocab.add_symbol("n") + vocab.add_symbol("ew_EOW") + vocab.add_symbol("or") + vocab.add_symbol("k_EOW") + + src_tokens = [ + ["he", "llo_EOW", "n", "ew_EOW", "y", "or", "k_EOW"], + ["how_EOW", "are_EOW", "y", "ou_EOW"], + ] + x, src_lengths = x, src_lengths = self._convert_src_tokens_to_tensor( + vocab=vocab, src_tokens=src_tokens, append_eos=append_eos + ) + return vocab, x, src_lengths + + def _get_test_data_with_word_vocab(self, append_eos=True): + """ + Args: + append_eos: if True, each input sentence in the source tokens tensor + will have an EOS appended to the end. + + Returns: + vocabs: word vocab + x: input tensor containing numberized source tokens, with EOS at the + end if append_eos is true + src_lengths: and source lengths. + """ + vocab = Dictionary() + + vocab.add_symbol("hello") + vocab.add_symbol("how") + vocab.add_symbol("are") + vocab.add_symbol("you") + vocab.add_symbol("new") + vocab.add_symbol("york") + src_tokens = [ + ["hello", "new", "york", "you"], + ["how", "are", "you", "new", "york"], + ] + x, src_lengths = self._convert_src_tokens_to_tensor( + vocab=vocab, src_tokens=src_tokens, append_eos=append_eos + ) + return vocab, x, src_lengths + + def _convert_src_tokens_to_tensor( + self, vocab: Dictionary, src_tokens: List[List[str]], append_eos: bool + ): + src_len = [len(x) for x in src_tokens] + # If we have to append EOS, we include EOS in counting src length + if append_eos: + src_len = [length + 1 for length in src_len] + + x = torch.LongTensor(len(src_tokens), max(src_len)).fill_(vocab.pad()) + for i in range(len(src_tokens)): + for j in range(len(src_tokens[i])): + x[i][j] = vocab.index(src_tokens[i][j]) + if append_eos: + x[i][j + 1] = vocab.eos() + + x = x.transpose(1, 0) + return x, torch.LongTensor(src_len) + + def assert_eos_at_end(self, x, x_len, eos): + """Asserts last token of every sentence in x is EOS """ + for i in range(len(x_len)): + self.assertEqual( + x[x_len[i] - 1][i], + eos, + ( + "Expected eos (token id {eos}) at the end of sentence {i} " + "but got {other} instead" + ).format(i=i, eos=eos, other=x[i][-1]), + ) + + def assert_word_dropout_correct(self, x, x_noised, x_len, l_noised): + # Expect only the first word (2 bpe tokens) of the first example + # was dropped out + self.assertEqual(x_len[0] - 2, l_noised[0]) + for i in range(l_noised[0]): + self.assertEqual(x_noised[i][0], x[i + 2][0]) + + def test_word_dropout_with_eos(self): + vocab, x, x_len = self._get_test_data_with_bpe_cont_marker(append_eos=True) + + with data_utils.numpy_seed(1234): + noising_gen = noising.WordDropout(vocab) + x_noised, l_noised = noising_gen.noising(x, x_len, 0.2) + self.assert_word_dropout_correct( + x=x, x_noised=x_noised, x_len=x_len, l_noised=l_noised + ) + self.assert_eos_at_end(x=x_noised, x_len=l_noised, eos=vocab.eos()) + + def assert_word_blanking_correct(self, x, x_noised, x_len, l_noised, unk): + # Expect only the first word (2 bpe tokens) of the first example + # was blanked out + 
self.assertEqual(x_len[0], l_noised[0]) + for i in range(l_noised[0]): + if i < 2: + self.assertEqual(x_noised[i][0], unk) + else: + self.assertEqual(x_noised[i][0], x[i][0]) + + def test_word_blank_with_eos(self): + vocab, x, x_len = self._get_test_data_with_bpe_cont_marker(append_eos=True) + + with data_utils.numpy_seed(1234): + noising_gen = noising.WordDropout(vocab) + x_noised, l_noised = noising_gen.noising(x, x_len, 0.2, vocab.unk()) + self.assert_word_blanking_correct( + x=x, x_noised=x_noised, x_len=x_len, l_noised=l_noised, unk=vocab.unk() + ) + self.assert_eos_at_end(x=x_noised, x_len=l_noised, eos=vocab.eos()) + + def generate_unchanged_shuffle_map(self, length): + return {i: i for i in range(length)} + + def assert_word_shuffle_matches_expected( + self, + x, + x_len, + max_shuffle_distance: int, + vocab: Dictionary, + expected_shufle_maps: List[Dict[int, int]], + expect_eos_at_end: bool, + bpe_end_marker=None, + ): + """ + This verifies that with a given x, x_len, max_shuffle_distance, and + vocab, we get the expected shuffle result. + + Args: + x: Tensor of shape (T x B) = (sequence_length, batch_size) + x_len: Tensor of length B = batch_size + max_shuffle_distance: arg to pass to noising + expected_shuffle_maps: List[mapping] where mapping is a + Dict[old_index, new_index], mapping x's elements from their + old positions in x to their new positions in x. + expect_eos_at_end: if True, check the output to make sure there is + an EOS at the end. + bpe_end_marker: str denoting the BPE end token. If this is not None, we + set the BPE cont token to None in the noising classes. + """ + bpe_cont_marker = None + if bpe_end_marker is None: + bpe_cont_marker = "@@" + + with data_utils.numpy_seed(1234): + word_shuffle = noising.WordShuffle( + vocab, bpe_cont_marker=bpe_cont_marker, bpe_end_marker=bpe_end_marker + ) + x_noised, l_noised = word_shuffle.noising( + x, x_len, max_shuffle_distance=max_shuffle_distance + ) + + # For every example, we have a different expected shuffle map. We check + # that each example is shuffled as expected according to each + # corresponding shuffle map. 
+ for i in range(len(expected_shufle_maps)): + shuffle_map = expected_shufle_maps[i] + for k, v in shuffle_map.items(): + self.assertEqual(x[k][i], x_noised[v][i]) + + # Shuffling should not affect the length of each example + for pre_shuffle_length, post_shuffle_length in zip(x_len, l_noised): + self.assertEqual(pre_shuffle_length, post_shuffle_length) + if expect_eos_at_end: + self.assert_eos_at_end(x=x_noised, x_len=l_noised, eos=vocab.eos()) + + def test_word_shuffle_with_eos(self): + vocab, x, x_len = self._get_test_data_with_bpe_cont_marker(append_eos=True) + + # Assert word shuffle with max shuffle distance 0 causes input to be + # unchanged + self.assert_word_shuffle_matches_expected( + x=x, + x_len=x_len, + max_shuffle_distance=0, + vocab=vocab, + expected_shufle_maps=[ + self.generate_unchanged_shuffle_map(example_len) + for example_len in x_len + ], + expect_eos_at_end=True, + ) + + # Assert word shuffle with max shuffle distance 3 matches our expected + # shuffle order + self.assert_word_shuffle_matches_expected( + x=x, + x_len=x_len, + vocab=vocab, + max_shuffle_distance=3, + expected_shufle_maps=[ + self.generate_unchanged_shuffle_map(x_len[0]), + {0: 0, 1: 3, 2: 1, 3: 2}, + ], + expect_eos_at_end=True, + ) + + def test_word_shuffle_with_eos_nonbpe(self): + """The purpose of this is to test shuffling logic with word vocabs""" + vocab, x, x_len = self._get_test_data_with_word_vocab(append_eos=True) + + # Assert word shuffle with max shuffle distance 0 causes input to be + # unchanged + self.assert_word_shuffle_matches_expected( + x=x, + x_len=x_len, + max_shuffle_distance=0, + vocab=vocab, + expected_shufle_maps=[ + self.generate_unchanged_shuffle_map(example_len) + for example_len in x_len + ], + expect_eos_at_end=True, + ) + + # Assert word shuffle with max shuffle distance 3 matches our expected + # shuffle order + self.assert_word_shuffle_matches_expected( + x=x, + x_len=x_len, + vocab=vocab, + max_shuffle_distance=3, + expected_shufle_maps=[ + {0: 0, 1: 1, 2: 3, 3: 2}, + {0: 0, 1: 2, 2: 1, 3: 3, 4: 4}, + ], + expect_eos_at_end=True, + ) + + def test_word_shuffle_without_eos(self): + """Same result as word shuffle with eos except no EOS at end""" + vocab, x, x_len = self._get_test_data_with_bpe_cont_marker(append_eos=False) + + # Assert word shuffle with max shuffle distance 0 causes input to be + # unchanged + self.assert_word_shuffle_matches_expected( + x=x, + x_len=x_len, + max_shuffle_distance=0, + vocab=vocab, + expected_shufle_maps=[ + self.generate_unchanged_shuffle_map(example_len) + for example_len in x_len + ], + expect_eos_at_end=False, + ) + + # Assert word shuffle with max shuffle distance 3 matches our expected + # shuffle order + self.assert_word_shuffle_matches_expected( + x=x, + x_len=x_len, + vocab=vocab, + max_shuffle_distance=3, + expected_shufle_maps=[ + self.generate_unchanged_shuffle_map(x_len[0]), + {0: 0, 1: 3, 2: 1, 3: 2}, + ], + expect_eos_at_end=False, + ) + + def test_word_shuffle_without_eos_with_bpe_end_marker(self): + """Same result as word shuffle without eos except using BPE end token""" + vocab, x, x_len = self._get_test_data_with_bpe_end_marker(append_eos=False) + + # Assert word shuffle with max shuffle distance 0 causes input to be + # unchanged + self.assert_word_shuffle_matches_expected( + x=x, + x_len=x_len, + max_shuffle_distance=0, + vocab=vocab, + expected_shufle_maps=[ + self.generate_unchanged_shuffle_map(example_len) + for example_len in x_len + ], + expect_eos_at_end=False, + bpe_end_marker="_EOW", + ) + + # Assert word 
shuffle with max shuffle distance 3 matches our expected + # shuffle order + self.assert_word_shuffle_matches_expected( + x=x, + x_len=x_len, + vocab=vocab, + max_shuffle_distance=3, + expected_shufle_maps=[ + self.generate_unchanged_shuffle_map(x_len[0]), + {0: 0, 1: 3, 2: 1, 3: 2}, + ], + expect_eos_at_end=False, + bpe_end_marker="_EOW", + ) + + def assert_no_eos_at_end(self, x, x_len, eos): + """Asserts that the last token of each sentence in x is not EOS """ + for i in range(len(x_len)): + self.assertNotEqual( + x[x_len[i] - 1][i], + eos, + "Expected no eos (token id {eos}) at the end of sentence {i}.".format( + eos=eos, i=i + ), + ) + + def test_word_dropout_without_eos(self): + """Same result as word dropout with eos except no EOS at end""" + vocab, x, x_len = self._get_test_data_with_bpe_cont_marker(append_eos=False) + + with data_utils.numpy_seed(1234): + noising_gen = noising.WordDropout(vocab) + x_noised, l_noised = noising_gen.noising(x, x_len, 0.2) + self.assert_word_dropout_correct( + x=x, x_noised=x_noised, x_len=x_len, l_noised=l_noised + ) + self.assert_no_eos_at_end(x=x_noised, x_len=l_noised, eos=vocab.eos()) + + def test_word_blank_without_eos(self): + """Same result as word blank with eos except no EOS at end""" + vocab, x, x_len = self._get_test_data_with_bpe_cont_marker(append_eos=False) + + with data_utils.numpy_seed(1234): + noising_gen = noising.WordDropout(vocab) + x_noised, l_noised = noising_gen.noising(x, x_len, 0.2, vocab.unk()) + self.assert_word_blanking_correct( + x=x, x_noised=x_noised, x_len=x_len, l_noised=l_noised, unk=vocab.unk() + ) + self.assert_no_eos_at_end(x=x_noised, x_len=l_noised, eos=vocab.eos()) + + def _get_noising_dataset_batch( + self, + src_tokens_no_pad, + src_dict, + append_eos_to_tgt=False, + ): + """ + Constructs a NoisingDataset and the corresponding + ``LanguagePairDataset(NoisingDataset(src), src)``. If + *append_eos_to_tgt* is True, wrap the source dataset in + :class:`TransformEosDataset` to append EOS to the clean source when + using it as the target. 
+ """ + src_dataset = test_utils.TestDataset(data=src_tokens_no_pad) + + noising_dataset = noising.NoisingDataset( + src_dataset=src_dataset, + src_dict=src_dict, + seed=1234, + max_word_shuffle_distance=3, + word_dropout_prob=0.2, + word_blanking_prob=0.2, + noising_class=noising.UnsupervisedMTNoising, + ) + tgt = src_dataset + language_pair_dataset = LanguagePairDataset( + src=noising_dataset, tgt=tgt, src_sizes=None, src_dict=src_dict + ) + language_pair_dataset = TransformEosDataset( + language_pair_dataset, + src_dict.eos(), + append_eos_to_tgt=append_eos_to_tgt, + ) + + dataloader = torch.utils.data.DataLoader( + dataset=language_pair_dataset, + batch_size=2, + collate_fn=language_pair_dataset.collater, + ) + denoising_batch_result = next(iter(dataloader)) + return denoising_batch_result + + def test_noising_dataset_with_eos(self): + src_dict, src_tokens, _ = self._get_test_data_with_bpe_cont_marker( + append_eos=True + ) + + # Format data for src_dataset + src_tokens = torch.t(src_tokens) + src_tokens_no_pad = [] + for src_sentence in src_tokens: + src_tokens_no_pad.append( + utils.strip_pad(tensor=src_sentence, pad=src_dict.pad()) + ) + denoising_batch_result = self._get_noising_dataset_batch( + src_tokens_no_pad=src_tokens_no_pad, src_dict=src_dict + ) + + eos, pad = src_dict.eos(), src_dict.pad() + + # Generated noisy source as source + expected_src = torch.LongTensor( + [[4, 5, 10, 11, 8, 12, 13, eos], [pad, pad, pad, 6, 8, 9, 7, eos]] + ) + # Original clean source as target (right-padded) + expected_tgt = torch.LongTensor( + [[4, 5, 10, 11, 8, 12, 13, eos], [6, 7, 8, 9, eos, pad, pad, pad]] + ) + generated_src = denoising_batch_result["net_input"]["src_tokens"] + tgt_tokens = denoising_batch_result["target"] + + self.assertTensorEqual(expected_src, generated_src) + self.assertTensorEqual(expected_tgt, tgt_tokens) + + def test_noising_dataset_without_eos(self): + """ + Similar to test noising dataset with eos except that we have to set + *append_eos_to_tgt* to ``True``. 
+ """ + + src_dict, src_tokens, _ = self._get_test_data_with_bpe_cont_marker( + append_eos=False + ) + + # Format data for src_dataset + src_tokens = torch.t(src_tokens) + src_tokens_no_pad = [] + for src_sentence in src_tokens: + src_tokens_no_pad.append( + utils.strip_pad(tensor=src_sentence, pad=src_dict.pad()) + ) + denoising_batch_result = self._get_noising_dataset_batch( + src_tokens_no_pad=src_tokens_no_pad, + src_dict=src_dict, + append_eos_to_tgt=True, + ) + + eos, pad = src_dict.eos(), src_dict.pad() + + # Generated noisy source as source + expected_src = torch.LongTensor( + [[4, 5, 10, 11, 8, 12, 13], [pad, pad, pad, 6, 8, 9, 7]] + ) + # Original clean source as target (right-padded) + expected_tgt = torch.LongTensor( + [[4, 5, 10, 11, 8, 12, 13, eos], [6, 7, 8, 9, eos, pad, pad, pad]] + ) + + generated_src = denoising_batch_result["net_input"]["src_tokens"] + tgt_tokens = denoising_batch_result["target"] + + self.assertTensorEqual(expected_src, generated_src) + self.assertTensorEqual(expected_tgt, tgt_tokens) + + def assertTensorEqual(self, t1, t2): + self.assertEqual(t1.size(), t2.size(), "size mismatch") + self.assertEqual(t1.ne(t2).long().sum(), 0) + + +if __name__ == "__main__": + unittest.main() diff --git a/SpeechT5/fairseq/tests/test_online_backtranslation.py b/SpeechT5/fairseq/tests/test_online_backtranslation.py new file mode 100644 index 0000000000000000000000000000000000000000..0ae7e773da0ff838b3c8151bc14b84a6a9238a72 --- /dev/null +++ b/SpeechT5/fairseq/tests/test_online_backtranslation.py @@ -0,0 +1,206 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +import tempfile +import unittest +from pathlib import Path +from typing import Any, Dict, Sequence + +import fairseq.data.indexed_dataset as indexed_dataset +import fairseq.options +import fairseq.tasks.online_backtranslation as obt +import torch +from tests import utils + + +def mk_sample(tokens: Sequence[int], batch_size: int = 2) -> Dict[str, Any]: + batch = torch.stack([torch.tensor(tokens, dtype=torch.long)] * batch_size) + sample = { + "net_input": { + "src_tokens": batch, + "prev_output_tokens": batch, + "src_lengths": torch.tensor([len(tokens)] * batch_size, dtype=torch.long), + }, + "target": batch[:, 1:], + } + return sample + + +def mk_dataset(num_samples: int, max_len: int, output: Path): + output.parent.mkdir(exist_ok=True) + idx = indexed_dataset.IndexedDatasetBuilder(str(output)) + data = torch.randint(5, 100, (num_samples, max_len)) + lengths = torch.randint(3, max_len, (num_samples,)) + for d, l in zip(data, lengths): + d[0] = 0 + idx.add_item(d[:l]) + idx.finalize(output.with_suffix(".idx")) + assert output.exists() + assert output.with_suffix(".idx").exists() + + +class OnlineBacktranslationTest(unittest.TestCase): + + tmp_dir = Path(tempfile.mkdtemp(suffix="OnlineBacktranslationTest")) + + @classmethod + def obt_task( + cls, languages: Sequence[str], data: Path = None, language_mapping: str = None + ): + dict_path = cls.tmp_dir / "dict.txt" + if not dict_path.exists(): + dictionary = utils.dummy_dictionary(100) + dictionary.save(str(dict_path)) + + if data is not None: + (data / "dict.txt").write_text(dict_path.read_text()) + else: + data = cls.tmp_dir + assert len(languages) >= 2 + + kwargs = { + "arch": "transformer", + # --max-sentences=1 for better predictability of batches + "max_sentences": 1, + # Use characteristics dimensions + "encoder_layers": 3, + 
"encoder_embed_dim": 12, + "encoder_ffn_embed_dim": 14, + "encoder_attention_heads": 4, + "decoder_layers": 3, + "decoder_embed_dim": 12, + "decoder_output_dim": 12, + "decoder_ffn_embed_dim": 14, + "decoder_attention_heads": 4, + # Disable dropout so we have comparable tests. + "dropout": 0, + "attention_dropout": 0, + "activation_dropout": 0, + "encoder_layerdrop": 0, + } + + args = fairseq.options.get_args( + data, + task="online_backtranslation", + mono_langs=",".join(languages), + valid_lang_pairs=f"{languages[0]}-{languages[1]}", + tokens_per_sample=256, + language_mapping=language_mapping, + **kwargs, + ) + task = obt.OnlineBackTranslationTask.setup_task(args) + # we need to build the model to have the correct dictionary + model = task.build_model(task.args) + return task, model + + def tmp_path(self, test_case: str) -> Path: + return Path(tempfile.mkdtemp(test_case, dir=self.tmp_dir)) + + def test_lang_tokens(self): + task, model = self.obt_task(["en", "ro", "zh"]) + assert obt._lang_token("en") in task.dictionary + assert obt._lang_token("ro") in task.dictionary + assert obt._lang_token("zh") in task.dictionary + + en_bos = obt._lang_token_index(task.common_dict, "en") + assert "en" == task.common_dict[en_bos].strip("_") + zh_bos = obt._lang_token_index(task.common_dict, "zh") + assert "zh" == task.common_dict[zh_bos].strip("_") + zh_sample = mk_sample([zh_bos, 16, 14, 12, 10]) + + # we expect to receive the bos token for translation + assert task.get_bos_token_from_sample(zh_sample) == en_bos + + def test_backtranslate_sample(self): + task, model = self.obt_task(["en", "ro", "zh"]) + + en_bos = obt._lang_token_index(task.common_dict, "en") + zh_bos = obt._lang_token_index(task.common_dict, "zh") + sample = mk_sample([zh_bos, 16, 14, 12, 10]) + + task.backtranslate_sample(sample, "zh", "en") + target_zh = list(sample["target"][0]) + assert target_zh == [16, 14, 12, 10] # original zh sentence + generated_en = sample["net_input"]["src_tokens"][0] + assert generated_en[0] == en_bos + + def test_train_dataset(self): + data = self.tmp_path("test_train_dataset") + mk_dataset(20, 10, data / "en" / "train.bin") + mk_dataset(10, 10, data / "zh" / "train.bin") + task, model = self.obt_task(["en", "zh"], data) + task.load_dataset("train") + + en_bos = obt._lang_token_index(task.common_dict, "en") + zh_bos = obt._lang_token_index(task.common_dict, "zh") + + train = task.datasets["train"] + train.ordered_indices() + train.prefetch([0, 19]) + sample_0 = train[0] + sample_19 = train[19] + self.assertEqual( + set(sample_0.keys()), {"en-BT", "en-DENOISE", "zh-BT", "zh-DENOISE"} + ) + for sample in (sample_0, sample_19): + self.assertEqual(sample["en-BT"]["source"][0], en_bos) + # bt target isn't ready to look at. + self.assertEqual(sample["en-DENOISE"]["source"][0], en_bos) + # TODO What could we check on the target side ? + + for i in range(10): + # Zh dataset is shorter, and is wrapped around En dataset. 
+ train.prefetch([i, i + 10]) + self.assertEqual( + list(train[i]["zh-DENOISE"]["source"]), + list(train[i + 10]["zh-DENOISE"]["source"]), + ) + self.assertEqual(train[i]["zh-DENOISE"]["source"][0].item(), zh_bos) + + # Sorted by increasing len + self.assertLess( + len(sample_0["en-BT"]["source"]), len(sample_19["en-BT"]["source"]) + ) + + def test_valid_dataset(self): + data = self.tmp_path("test_valid_dataset") + mk_dataset(10, 21, data / "valid.en-zh.en.bin") + mk_dataset(10, 21, data / "valid.en-zh.zh.bin") + + task, model = self.obt_task(["en", "zh"], data) + valid = task.load_dataset("valid") + en_bos = obt._lang_token_index(task.common_dict, "en") + + assert valid is not None + valid.prefetch(range(10)) + sample_0 = valid[0] + sample_9 = valid[9] + self.assertEqual(sample_0["id"], 0) + self.assertEqual(sample_9["id"], 9) + self.assertEqual(sample_0["source"][0], en_bos) + self.assertEqual(sample_9["source"][0], en_bos) + # TODO: could we test the target side ? + + def assertFnMatch(self, fn, values): + for x, y in values.items(): + fn_x = fn(x) + self.assertEqual(fn_x, y, f"Fn has wrong value: fn({x}) = {fn_x} != {y}") + + def test_piecewise_linear_fn(self): + self.assertFnMatch( + obt.PiecewiseLinearFn.from_string("1.0"), {0: 1, 100: 1, 500: 1, 1000: 1} + ) + self.assertFnMatch( + obt.PiecewiseLinearFn.from_string("0:1,1000:0"), + {0: 1, 500: 0.5, 1000: 0, 2000: 0}, + ) + self.assertFnMatch( + obt.PiecewiseLinearFn.from_string("0:0,1000:1"), + {0: 0, 500: 0.5, 1000: 1, 2000: 1}, + ) + self.assertFnMatch( + obt.PiecewiseLinearFn.from_string("0:0,1000:1,2000:0"), + {0: 0, 500: 0.5, 1000: 1, 1500: 0.5, 2000: 0, 3000: 0}, + ) diff --git a/SpeechT5/fairseq/tests/test_plasma_utils.py b/SpeechT5/fairseq/tests/test_plasma_utils.py new file mode 100644 index 0000000000000000000000000000000000000000..e6344c2a5a73fcb2fb81376e7bd43470963b3674 --- /dev/null +++ b/SpeechT5/fairseq/tests/test_plasma_utils.py @@ -0,0 +1,126 @@ +import contextlib +import unittest +import tempfile +from io import StringIO + +import numpy as np + +from tests.utils import create_dummy_data, preprocess_lm_data, train_language_model + +try: + from pyarrow import plasma + from fairseq.data.plasma_utils import PlasmaView, PlasmaStore + + PYARROW_AVAILABLE = True +except ImportError: + PYARROW_AVAILABLE = False + +dummy_path = "dummy" + + +@unittest.skipUnless(PYARROW_AVAILABLE, "") +class TestPlasmaView(unittest.TestCase): + def setUp(self) -> None: + self.tmp_file = tempfile.NamedTemporaryFile() # noqa: P201 + self.path = self.tmp_file.name + self.server = PlasmaStore.start(path=self.path, nbytes=10000) + self.client = plasma.connect(self.path, num_retries=10) + + def tearDown(self) -> None: + self.client.disconnect() + self.tmp_file.close() + self.server.kill() + + def test_two_servers_do_not_share_object_id_space(self): + data_server_1 = np.array([0, 1]) + data_server_2 = np.array([2, 3]) + server_2_path = self.path + with tempfile.NamedTemporaryFile() as server_1_path: + server = PlasmaStore.start(path=server_1_path.name, nbytes=10000) + arr1 = PlasmaView( + data_server_1, dummy_path, 1, plasma_path=server_1_path.name + ) + assert len(arr1.client.list()) == 1 + assert (arr1.array == data_server_1).all() + arr2 = PlasmaView(data_server_2, dummy_path, 1, plasma_path=server_2_path) + assert (arr2.array == data_server_2).all() + assert (arr1.array == data_server_1).all() + server.kill() + + def test_hash_collision(self): + data_server_1 = np.array([0, 1]) + data_server_2 = np.array([2, 3]) + arr1 = 
PlasmaView(data_server_1, dummy_path, 1, plasma_path=self.path) + assert len(arr1.client.list()) == 1 + arr2 = PlasmaView(data_server_2, dummy_path, 1, plasma_path=self.path) + assert len(arr1.client.list()) == 1 + assert len(arr2.client.list()) == 1 + assert (arr2.array == data_server_1).all() + # New hash key based on tuples + arr3 = PlasmaView( + data_server_2, dummy_path, (1, 12312312312, None), plasma_path=self.path + ) + assert ( + len(arr2.client.list()) == 2 + ), "No new object was created by using a novel hash key" + assert ( + arr3.object_id in arr2.client.list() + ), "No new object was created by using a novel hash key" + assert ( + arr3.object_id in arr3.client.list() + ), "No new object was created by using a novel hash key" + del arr3, arr2, arr1 + + @staticmethod + def _assert_view_equal(pv1, pv2): + np.testing.assert_array_equal(pv1.array, pv2.array) + + def test_putting_same_array_twice(self): + data = np.array([4, 4, 4]) + arr1 = PlasmaView(data, dummy_path, 1, plasma_path=self.path) + assert len(self.client.list()) == 1 + arr1b = PlasmaView( + data, dummy_path, 1, plasma_path=self.path + ) # should not change contents of store + arr1c = PlasmaView( + None, dummy_path, 1, plasma_path=self.path + ) # should not change contents of store + + assert len(self.client.list()) == 1 + self._assert_view_equal(arr1, arr1b) + self._assert_view_equal(arr1, arr1c) + PlasmaView( + data, dummy_path, 2, plasma_path=self.path + ) # new object id, adds new entry + assert len(self.client.list()) == 2 + + new_client = plasma.connect(self.path) + assert len(new_client.list()) == 2 # new client can access same objects + assert isinstance(arr1.object_id, plasma.ObjectID) + del arr1b + del arr1c + + def test_plasma_store_full_raises(self): + with tempfile.NamedTemporaryFile() as new_path: + server = PlasmaStore.start(path=new_path.name, nbytes=10000) + with self.assertRaises(plasma.PlasmaStoreFull): + # 2000 floats is more than 2000 bytes + PlasmaView( + np.random.rand(10000, 1), dummy_path, 1, plasma_path=new_path.name + ) + server.kill() + + def test_object_id_overflow(self): + PlasmaView.get_object_id("", 2 ** 21) + + def test_training_lm_plasma(self): + with contextlib.redirect_stdout(StringIO()): + with tempfile.TemporaryDirectory("test_transformer_lm") as data_dir: + create_dummy_data(data_dir) + preprocess_lm_data(data_dir) + train_language_model( + data_dir, + "transformer_lm", + ["--use-plasma-view", "--plasma-path", self.path], + run_validation=True, + ) diff --git a/SpeechT5/fairseq/tests/test_reproducibility.py b/SpeechT5/fairseq/tests/test_reproducibility.py new file mode 100644 index 0000000000000000000000000000000000000000..94931b2a0721c4adfee8899c89cac24f45973d17 --- /dev/null +++ b/SpeechT5/fairseq/tests/test_reproducibility.py @@ -0,0 +1,150 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +import contextlib +import json +import os +import tempfile +import unittest +from io import StringIO + +import torch + +from . 
import test_binaries + + +class TestReproducibility(unittest.TestCase): + def _test_reproducibility( + self, + name, + extra_flags=None, + delta=0.0001, + resume_checkpoint="checkpoint1.pt", + max_epoch=3, + ): + def get_last_log_stats_containing_string(log_records, search_string): + for log_record in logs.records[::-1]: + if isinstance(log_record.msg, str) and search_string in log_record.msg: + return json.loads(log_record.msg) + + if extra_flags is None: + extra_flags = [] + + with tempfile.TemporaryDirectory(name) as data_dir: + with self.assertLogs() as logs: + test_binaries.create_dummy_data(data_dir) + test_binaries.preprocess_translation_data(data_dir) + + # train epochs 1 and 2 together + with self.assertLogs() as logs: + test_binaries.train_translation_model( + data_dir, + "fconv_iwslt_de_en", + [ + "--dropout", + "0.0", + "--log-format", + "json", + "--log-interval", + "1", + "--max-epoch", + str(max_epoch), + ] + + extra_flags, + ) + train_log = get_last_log_stats_containing_string(logs.records, "train_loss") + valid_log = get_last_log_stats_containing_string(logs.records, "valid_loss") + + # train epoch 2, resuming from previous checkpoint 1 + os.rename( + os.path.join(data_dir, resume_checkpoint), + os.path.join(data_dir, "checkpoint_last.pt"), + ) + with self.assertLogs() as logs: + test_binaries.train_translation_model( + data_dir, + "fconv_iwslt_de_en", + [ + "--dropout", + "0.0", + "--log-format", + "json", + "--log-interval", + "1", + "--max-epoch", + str(max_epoch), + ] + + extra_flags, + ) + train_res_log = get_last_log_stats_containing_string( + logs.records, "train_loss" + ) + valid_res_log = get_last_log_stats_containing_string( + logs.records, "valid_loss" + ) + + for k in ["train_loss", "train_ppl", "train_num_updates", "train_gnorm"]: + self.assertAlmostEqual( + float(train_log[k]), float(train_res_log[k]), delta=delta + ) + for k in [ + "valid_loss", + "valid_ppl", + "valid_num_updates", + "valid_best_loss", + ]: + self.assertAlmostEqual( + float(valid_log[k]), float(valid_res_log[k]), delta=delta + ) + + def test_reproducibility(self): + self._test_reproducibility("test_reproducibility") + + @unittest.skipIf(not torch.cuda.is_available(), "test requires a GPU") + def test_reproducibility_fp16(self): + self._test_reproducibility( + "test_reproducibility_fp16", + [ + "--fp16", + "--fp16-init-scale", + "4096", + ], + delta=0.011, + ) + + @unittest.skipIf(not torch.cuda.is_available(), "test requires a GPU") + def test_reproducibility_memory_efficient_fp16(self): + self._test_reproducibility( + "test_reproducibility_memory_efficient_fp16", + [ + "--memory-efficient-fp16", + "--fp16-init-scale", + "4096", + ], + ) + + @unittest.skipIf(not torch.cuda.is_available(), "test requires a GPU") + def test_reproducibility_amp(self): + self._test_reproducibility( + "test_reproducibility_amp", + [ + "--amp", + "--fp16-init-scale", + "4096", + ], + delta=0.011, + ) + + def test_mid_epoch_reproducibility(self): + self._test_reproducibility( + "test_mid_epoch_reproducibility", + ["--save-interval-updates", "3"], + resume_checkpoint="checkpoint_1_3.pt", + max_epoch=1, + ) + + +if __name__ == "__main__": + unittest.main() diff --git a/SpeechT5/fairseq/tests/test_resampling_dataset.py b/SpeechT5/fairseq/tests/test_resampling_dataset.py new file mode 100644 index 0000000000000000000000000000000000000000..ccb53a253ce6ca0d8e972adfa708144b4299b3cb --- /dev/null +++ b/SpeechT5/fairseq/tests/test_resampling_dataset.py @@ -0,0 +1,103 @@ +# Copyright (c) Facebook, Inc. and its affiliates. 
+# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +import collections +import unittest + +import numpy as np +from fairseq.data import ListDataset, ResamplingDataset + + +class TestResamplingDataset(unittest.TestCase): + def setUp(self): + self.strings = ["ab", "c", "def", "ghij"] + self.weights = [4.0, 2.0, 7.0, 1.5] + self.size_ratio = 2 + self.dataset = ListDataset( + self.strings, np.array([len(s) for s in self.strings]) + ) + + def _test_common(self, resampling_dataset, iters): + assert len(self.dataset) == len(self.strings) == len(self.weights) + assert len(resampling_dataset) == self.size_ratio * len(self.strings) + + results = {"ordered_by_size": True, "max_distribution_diff": 0.0} + + totalfreqs = 0 + freqs = collections.defaultdict(int) + + for epoch_num in range(iters): + resampling_dataset.set_epoch(epoch_num) + + indices = resampling_dataset.ordered_indices() + assert len(indices) == len(resampling_dataset) + + prev_size = -1 + + for i in indices: + cur_size = resampling_dataset.size(i) + # Make sure indices map to same sequences within an epoch + assert resampling_dataset[i] == resampling_dataset[i] + + # Make sure length of sequence is correct + assert cur_size == len(resampling_dataset[i]) + + freqs[resampling_dataset[i]] += 1 + totalfreqs += 1 + + if prev_size > cur_size: + results["ordered_by_size"] = False + + prev_size = cur_size + + assert set(freqs.keys()) == set(self.strings) + for s, weight in zip(self.strings, self.weights): + freq = freqs[s] / totalfreqs + expected_freq = weight / sum(self.weights) + results["max_distribution_diff"] = max( + results["max_distribution_diff"], abs(expected_freq - freq) + ) + + return results + + def test_resampling_dataset_batch_by_size_false(self): + resampling_dataset = ResamplingDataset( + self.dataset, + self.weights, + size_ratio=self.size_ratio, + batch_by_size=False, + seed=0, + ) + + results = self._test_common(resampling_dataset, iters=1000) + + # For batch_by_size = False, the batches should be returned in + # arbitrary order of size. + assert not results["ordered_by_size"] + + # Allow tolerance in distribution error of 2%. + assert results["max_distribution_diff"] < 0.02 + + def test_resampling_dataset_batch_by_size_true(self): + resampling_dataset = ResamplingDataset( + self.dataset, + self.weights, + size_ratio=self.size_ratio, + batch_by_size=True, + seed=0, + ) + + results = self._test_common(resampling_dataset, iters=1000) + + # For batch_by_size = True, the batches should be returned in + # increasing order of size. + assert results["ordered_by_size"] + + # Allow tolerance in distribution error of 2%. + assert results["max_distribution_diff"] < 0.02 + + +if __name__ == "__main__": + unittest.main() diff --git a/SpeechT5/fairseq/tests/test_roberta.py b/SpeechT5/fairseq/tests/test_roberta.py new file mode 100644 index 0000000000000000000000000000000000000000..b0b9cfd31e8cb1e03ae74403886d2fb5266e0443 --- /dev/null +++ b/SpeechT5/fairseq/tests/test_roberta.py @@ -0,0 +1,314 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. 
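+# Descriptive note (added comment): the tests below cover RoBERTa-style
+# models: parameter sharing between embeddings and output projections,
+# max positions, forward/backward passes, batching consistency, and
+# incremental decoding.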
+ +import functools +import unittest +from typing import Any, Dict, Sequence + +import fairseq +import fairseq.options +import fairseq.tasks +import torch +from tests.utils import dummy_dictionary + +VOCAB_SIZE = 100 + + +@fairseq.tasks.register_task("fake_task") +class FakeTask(fairseq.tasks.LegacyFairseqTask): + def __init__(self, args): + super().__init__(args) + self.dictionary = dummy_dictionary(VOCAB_SIZE - 4) + assert len(self.dictionary) == VOCAB_SIZE + + @property + def source_dictionary(self): + return self.dictionary + + @property + def target_dictionary(self): + return self.dictionary + + +@functools.lru_cache() +def get_toy_model( + device: str, + architecture: str = "roberta_enc_dec", + **extra_args: Any, +): + assert device in ("gpu", "cpu") + kwargs = { + "arch": architecture, + # Use characteristics dimensions + "encoder_layers": 3, + "encoder_embed_dim": 12, + "encoder_ffn_embed_dim": 14, + "encoder_attention_heads": 4, + "decoder_layers": 3, + "decoder_embed_dim": 12, + "decoder_ffn_embed_dim": 14, + "decoder_attention_heads": 4, + # Disable dropout so we have comparable tests. + "dropout": 0, + "attention_dropout": 0, + "activation_dropout": 0, + "encoder_layerdrop": 0, + # required args + "tokens_per_sample": 256, + "data": "/tmp/test_roberta", + } + kwargs.update(extra_args) + fake_task = FakeTask(kwargs) + args = fairseq.options.get_args( + task="online_backtranslation", + mono_langs="en,ro", + valid_lang_pairs="en-ro", + **kwargs, + ) + torch.manual_seed(0) + model = fake_task.build_model(args) + if device == "gpu": + model.cuda() + return fake_task, model + + +def mk_sample( + lang: str, device: str, tok: Sequence[int] = None, batch_size: int = 2 +) -> Dict[str, Any]: + assert device in ("gpu", "cpu") + if not tok: + if lang == "en": + tok = [10, 11, 12, 13, 14, 15, 2] + else: + tok = [20, 21, 22, 23, 24, 25, 26, 27, 2] + + batch = torch.stack([torch.tensor(tok, dtype=torch.long)] * batch_size) + if device == "gpu": + batch = batch.cuda() + sample = { + "net_input": { + "src_tokens": batch, + "prev_output_tokens": batch, + "src_lengths": torch.tensor( + [len(tok)] * batch_size, dtype=torch.long, device=batch.device + ), + }, + "target": batch[:, 1:], + } + return sample + + +def cpu_gpu(fn): + def helper(self): + fn(self, "cpu") + if torch.cuda.is_available(): + fn(self, "gpu") + + return helper + + +def architectures(fn): + def helper(self): + for arch in ["roberta_enc_dec", "transformer"]: + fn(self, arch) + + return helper + + +class RobertaTest(unittest.TestCase): + def assertTensorEqual(self, t1, t2, delta: float = 1e-6): + self.assertEqual(t1.size(), t2.size(), "size mismatch") + if delta == 0.0: + self.assertEqual(t1.ne(t2).long().sum(), 0) + else: + self.assertEqual(((t2 - t1).abs() > delta).long().sum(), 0) + + def assertSharing(self, model, link_groups: Sequence[Sequence[str]]): + ids = {} + for group in link_groups: + group_ids = {name: id(params(model, name)) for name in group} + shared_id = group_ids[group[0]] + self.assertEqual(group_ids, {name: shared_id for name in group}) + self.assertNotIn(shared_id, ids) + ids[shared_id] = group + + def test_roberta_shared_params(self): + _, roberta = get_toy_model("cpu", architecture="roberta") + self.assertSharing( + roberta, + [ + [ + "encoder.sentence_encoder.embed_tokens.weight", + "encoder.lm_head.weight", + ] + ], + ) + + _, roberta = get_toy_model( + "cpu", architecture="roberta", untie_weights_roberta=True + ) + self.assertSharing( + roberta, + [ + ["encoder.sentence_encoder.embed_tokens.weight"], + 
["encoder.lm_head.weight"], + ], + ) + + def test_roberta_enc_dec_shared_params(self): + # 3 distinct embeddings + _, enc_dec = get_toy_model("cpu", architecture="roberta_enc_dec") + self.assertSharing( + enc_dec, + [ + ["encoder.embed_tokens.weight"], + ["decoder.embed_tokens.weight"], + ["decoder.output_projection.weight"], + ], + ) + + # 2 distinct embeddings, one for encoder, one for decoder + _, enc_dec = get_toy_model( + "cpu", architecture="roberta_enc_dec", share_decoder_input_output_embed=True + ) + self.assertSharing( + enc_dec, + [ + ["encoder.embed_tokens.weight"], + [ + "decoder.embed_tokens.weight", + "decoder.output_projection.weight", + ], + ], + ) + + # shared embeddings + _, enc_dec = get_toy_model( + "cpu", architecture="roberta_enc_dec", share_all_embeddings=True + ) + self.assertSharing( + enc_dec, + [ + [ + "encoder.embed_tokens.weight", + "decoder.embed_tokens.weight", + "decoder.output_projection.weight", + ] + ], + ) + + def test_roberta_max_positions_is_correctly_set(self): + device = "cpu" + task, model = get_toy_model(device) + max_pos = model.max_decoder_positions() + self.assertEqual(max_pos, 256) + self.assertEqual(max_pos, model.decoder.max_positions()) + self.assertEqual(max_pos, model.encoder.max_positions()) + self.assertEqual(max_pos, model.encoder.embed_positions.max_positions) + + sentence = [31 for _ in range(max_pos)] + sample = mk_sample("en", device, sentence, batch_size=1) + self.assertEqual(list(sample["net_input"]["src_lengths"]), [max_pos]) + self.assertEqual(len(sample["net_input"]["src_tokens"][0]), max_pos) + x, _ = model.forward(**sample["net_input"]) + self.assertEqual(x.shape, (1, max_pos, VOCAB_SIZE)) + + @cpu_gpu + def test_roberta_forward_backward(self, device: str): + _, model = get_toy_model(device) + sample = mk_sample("en", device) + en_tokens = sample["net_input"]["src_tokens"] + (bs, l) = en_tokens.shape + # Forward + logits, _ = model(**sample["net_input"]) + self.assertEqual(logits.shape, (bs, l, VOCAB_SIZE)) + + # Backward + loss = logits.sum() + loss.backward() + + @cpu_gpu + def test_roberta_forward_backward_bs1(self, device: str): + _, model = get_toy_model(device) + sample = mk_sample("en", device, batch_size=1) + o, _ = model.forward(**sample["net_input"]) + loss = o.sum() + sample2 = mk_sample("ro", device, batch_size=1) + o, _ = model.forward(**sample2["net_input"]) + loss += o.sum() + loss.backward() + + @cpu_gpu + def test_roberta_batching(self, device: str): + """ + Checks that the batch of size 2 give twice the same results than the batch of size 1. + """ + _, model = get_toy_model(device) + sample = mk_sample("en", device, batch_size=1) + slen = sample["net_input"]["src_lengths"][0] + sample2 = mk_sample("en", device, batch_size=2) + with torch.no_grad(): + z = model.encoder.forward( + sample["net_input"]["src_tokens"], sample["net_input"]["src_lengths"] + ) + z = z["encoder_out"][-1] + logits, _ = model.forward(**sample["net_input"]) + + z2 = model.encoder.forward( + sample2["net_input"]["src_tokens"], sample["net_input"]["src_lengths"] + ) + z2 = z2["encoder_out"][-1] + logits2, _ = model.forward(**sample2["net_input"]) + + self.assertEqual(z.shape, (slen, 1, 12)) + self.assertEqual(z2.shape, (slen, 2, 12)) + self.assertTensorEqual(logits2[0], logits2[1]) + self.assertTensorEqual(logits[0], logits2[0]) + + @cpu_gpu + def test_roberta_incremental_decoder(self, device: str): + """ + Checks that incremental decoding yields the same result than non incremental one. 
+ """ + task, model = get_toy_model(device) + + en_sample = mk_sample("en", device) + en_tokens = en_sample["net_input"]["src_tokens"] + ro_sample = mk_sample("ro", device) + ro_tokens = ro_sample["net_input"]["src_tokens"] + + en_enc = model.encoder.forward( + en_tokens, src_lengths=en_sample["net_input"]["src_lengths"] + ) + (bs, tgt_len) = ro_tokens.shape + + # Decode without incremental state + ro_dec, _ = model.decoder.forward(ro_tokens, encoder_out=en_enc) + self.assertEqual(ro_dec.shape, (bs, tgt_len, VOCAB_SIZE)) + self.assertTensorEqual(ro_dec[0], ro_dec[1]) + + # Decode with incremental state + inc_state = {} + ro_dec_inc = [] + for l in range(tgt_len): + ro, _ = model.decoder.forward( + ro_tokens[:, : l + 1], encoder_out=en_enc, incremental_state=inc_state + ) + self.assertEqual(ro.shape, (bs, 1, VOCAB_SIZE)) + ro_dec_inc.append(ro) + + for l in range(tgt_len): + # Intra-batch + self.assertTensorEqual(ro_dec_inc[l][0], ro_dec_inc[l][1]) + # Incremental vs non-incremental + self.assertTensorEqual(ro_dec_inc[l][:, 0], ro_dec[:, l]) + + +def params(model, name): + if "." not in name: + return getattr(model, name) + + prefix, name = name.split(".", 1) + return params(getattr(model, prefix), name) diff --git a/SpeechT5/fairseq/tests/test_sequence_generator.py b/SpeechT5/fairseq/tests/test_sequence_generator.py new file mode 100644 index 0000000000000000000000000000000000000000..afbdfb6c2cde139dfc7e8c48fdbf889375c8d4e1 --- /dev/null +++ b/SpeechT5/fairseq/tests/test_sequence_generator.py @@ -0,0 +1,745 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +import argparse +import tempfile +import unittest +import math +import numpy as np + + +import tests.utils as test_utils +import torch +from fairseq import search +from fairseq.data.dictionary import Dictionary +from fairseq.models.transformer import TransformerModel +from fairseq.sequence_generator import EnsembleModel, SequenceGenerator +from fairseq.ngram_repeat_block import NGramRepeatBlock +from fairseq.tasks.fairseq_task import LegacyFairseqTask + + +DEFAULT_TEST_VOCAB_SIZE = 100 + + +class DummyTask(LegacyFairseqTask): + def __init__(self, args): + super().__init__(args) + self.dictionary = get_dummy_dictionary() + if getattr(self.args, "ctc", False): + self.dictionary.add_symbol("<ctc_blank>") + self.src_dict = self.dictionary + self.tgt_dict = self.dictionary + + @property + def source_dictionary(self): + return self.src_dict + + @property + def target_dictionary(self): + return self.dictionary + + +def get_dummy_dictionary(vocab_size=DEFAULT_TEST_VOCAB_SIZE): + dummy_dict = Dictionary() + # add dummy symbol to satisfy vocab size + for id, _ in enumerate(range(vocab_size)): + dummy_dict.add_symbol("{}".format(id), n=1000) + return dummy_dict + + +def get_dummy_task_and_parser(): + """ + to build a fariseq model, we need some dummy parse and task. This function + is used to create dummy task and parser to faciliate model/criterion test + + Note: we use FbSpeechRecognitionTask as the dummy task. 
You may want + to use other task by providing another function + """ + parser = argparse.ArgumentParser( + description="test_dummy_s2s_task", argument_default=argparse.SUPPRESS + ) + DummyTask.add_args(parser) + args = parser.parse_args([]) + task = DummyTask.setup_task(args) + return task, parser + + +class TestJitSequenceGeneratorBase(unittest.TestCase): + def setUp(self): + self.task, self.parser = get_dummy_task_and_parser() + eos = self.task.tgt_dict.eos() + src_tokens = torch.randint(3, 50, (2, 10)).long() + src_tokens = torch.cat((src_tokens, torch.LongTensor([[eos], [eos]])), -1) + src_lengths = torch.LongTensor([2, 10]) + self.sample = { + "net_input": {"src_tokens": src_tokens, "src_lengths": src_lengths} + } + TransformerModel.add_args(self.parser) + args = self.parser.parse_args([]) + args.encoder_layers = 2 + args.decoder_layers = 1 + self.transformer_model = TransformerModel.build_model(args, self.task) + + def assertOutputEqual(self, hypo, pos_probs): + pos_scores = torch.FloatTensor(pos_probs).log() + self.assertTensorSizeEqual(hypo["positional_scores"], pos_scores) + self.assertTensorSizeEqual(pos_scores.numel(), hypo["tokens"].numel()) + + def assertTensorSizeEqual(self, t1, t2): + self.assertEqual(t1.size(), t2.size(), "size mismatch") + + def assertAlmostEqual(self, t1, t2): + self.assertEqual(t1.size(), t2.size(), "size mismatch") + self.assertLess((t1 - t2).abs().max(), 1e-4) + + def assertTensorEqual(self, t1, t2): + self.assertEqual(t1.size(), t2.size(), "size mismatch") + self.assertEqual(t1.ne(t2).long().sum(), 0) + + def assertHypoEqual(self, h1, h2): + "Check two hypos are equal" + self.assertTensorEqual(h1["tokens"], h2["tokens"]) + self.assertAlmostEqual(h1["positional_scores"], h2["positional_scores"]) + self.assertLess(abs(h1["score"] - h2["score"]), 1e-6) + self.assertAlmostEqual(h1["attention"], h2["attention"]) + + def _test_save_and_load(self, scripted_module): + with tempfile.NamedTemporaryFile() as f: + scripted_module.save(f.name) + torch.jit.load(f.name) + + +JIT_MSG = "Targeting OSS scriptability for the 1.6 release" + + +@unittest.skipIf(torch.__version__ < "1.6.0", JIT_MSG) +class TestJitSequenceGenerator(TestJitSequenceGeneratorBase): + def test_export_transformer(self): + model = self.transformer_model + torch.jit.script(model) + + def test_ensemble_sequence_generator(self): + model = self.transformer_model + generator = SequenceGenerator( + [model], + self.task.tgt_dict, + beam_size=2, + no_repeat_ngram_size=2, + max_len_b=10, + ) + scripted_model = torch.jit.script(generator) + self._test_save_and_load(scripted_model) + + def test_export_ensemble_model(self): + model = self.transformer_model + ensemble_models = EnsembleModel([model]) + torch.jit.script(ensemble_models) + + +class TestExportSearch(unittest.TestCase): + def setUp(self): + task, _ = get_dummy_task_and_parser() + self.tgt_dict = task.tgt_dict + self.min_top1_prob = 0.4 + + def test_export_diverse_bs(self): + search_strategy = search.DiverseBeamSearch( + self.tgt_dict, num_groups=2, diversity_strength=0.0 + ) + torch.jit.script(search_strategy) + + def test_export_sampling(self): + low_sampling_topp = self.min_top1_prob / 2.0 + search_strategy = search.Sampling( + self.tgt_dict, sampling_topp=low_sampling_topp + ) + torch.jit.script(search_strategy) + + def test_export_diverse_siblings_search(self): + search_strategy = search.DiverseSiblingsSearch( + self.tgt_dict, diversity_rate=0.5 + ) + torch.jit.script(search_strategy) + + +class TestSequenceGeneratorBase(unittest.TestCase): + 
def assertHypoTokens(self, hypo, tokens): + self.assertTensorEqual(hypo["tokens"], torch.LongTensor(tokens)) + + def assertHypoScore(self, hypo, pos_probs, normalized=True, lenpen=1.0): + pos_scores = torch.FloatTensor(pos_probs).log() + self.assertAlmostEqual(hypo["positional_scores"], pos_scores) + self.assertEqual(pos_scores.numel(), hypo["tokens"].numel()) + score = pos_scores.sum() + if normalized: + score /= pos_scores.numel() ** lenpen + self.assertLess(abs(score - hypo["score"]), 1e-6) + + def assertAlmostEqual(self, t1, t2): + self.assertEqual(t1.size(), t2.size(), "size mismatch") + self.assertLess((t1 - t2).abs().max(), 1e-4) + + def assertTensorEqual(self, t1, t2): + self.assertEqual(t1.size(), t2.size(), "size mismatch") + self.assertEqual(t1.ne(t2).long().sum(), 0) + + +class TestSequenceGenerator(TestSequenceGeneratorBase): + def setUp(self): + ( + self.tgt_dict, + self.w1, + self.w2, + src_tokens, + src_lengths, + self.model, + ) = test_utils.sequence_generator_setup() + self.sample = { + "net_input": {"src_tokens": src_tokens, "src_lengths": src_lengths} + } + + def test_with_normalization(self): + generator = SequenceGenerator([self.model], self.tgt_dict, beam_size=2) + hypos = generator.forward(self.sample) + eos, w1, w2 = self.tgt_dict.eos(), self.w1, self.w2 + # sentence 1, beam 1 + self.assertHypoTokens(hypos[0][0], [w1, eos]) + self.assertHypoScore(hypos[0][0], [0.9, 1.0]) + # sentence 1, beam 2 + self.assertHypoTokens(hypos[0][1], [w2, w1, w2, eos]) + self.assertHypoScore(hypos[0][1], [0.1, 0.9, 0.9, 1.0]) + # sentence 2, beam 1 + self.assertHypoTokens(hypos[1][0], [w1, w2, w1, eos]) + self.assertHypoScore(hypos[1][0], [0.7, 0.4, 0.4, 1.0]) + # sentence 2, beam 2 + self.assertHypoTokens(hypos[1][1], [w1, w2, eos]) + self.assertHypoScore(hypos[1][1], [0.7, 0.4, 0.6]) + + def test_without_normalization(self): + # Sentence 1: unchanged from the normalized case + # Sentence 2: beams swap order + generator = SequenceGenerator( + [self.model], self.tgt_dict, beam_size=2, normalize_scores=False + ) + hypos = generator.forward(self.sample) + eos, w1, w2 = self.tgt_dict.eos(), self.w1, self.w2 + # sentence 1, beam 1 + self.assertHypoTokens(hypos[0][0], [w1, eos]) + self.assertHypoScore(hypos[0][0], [0.9, 1.0], normalized=False) + # sentence 1, beam 2 + self.assertHypoTokens(hypos[0][1], [w2, w1, w2, eos]) + self.assertHypoScore(hypos[0][1], [0.1, 0.9, 0.9, 1.0], normalized=False) + # sentence 2, beam 1 + self.assertHypoTokens(hypos[1][0], [w1, w2, eos]) + self.assertHypoScore(hypos[1][0], [0.7, 0.4, 0.6], normalized=False) + # sentence 2, beam 2 + self.assertHypoTokens(hypos[1][1], [w1, w2, w1, eos]) + self.assertHypoScore(hypos[1][1], [0.7, 0.4, 0.4, 1.0], normalized=False) + + def test_with_lenpen_favoring_short_hypos(self): + lenpen = 0.6 + generator = SequenceGenerator( + [self.model], self.tgt_dict, beam_size=2, len_penalty=lenpen + ) + hypos = generator.forward(self.sample) + eos, w1, w2 = self.tgt_dict.eos(), self.w1, self.w2 + # sentence 1, beam 1 + self.assertHypoTokens(hypos[0][0], [w1, eos]) + self.assertHypoScore(hypos[0][0], [0.9, 1.0], lenpen=lenpen) + # sentence 1, beam 2 + self.assertHypoTokens(hypos[0][1], [w2, w1, w2, eos]) + self.assertHypoScore(hypos[0][1], [0.1, 0.9, 0.9, 1.0], lenpen=lenpen) + # sentence 2, beam 1 + self.assertHypoTokens(hypos[1][0], [w1, w2, eos]) + self.assertHypoScore(hypos[1][0], [0.7, 0.4, 0.6], lenpen=lenpen) + # sentence 2, beam 2 + self.assertHypoTokens(hypos[1][1], [w1, w2, w1, eos]) + self.assertHypoScore(hypos[1][1], [0.7, 
0.4, 0.4, 1.0], lenpen=lenpen) + + def test_with_lenpen_favoring_long_hypos(self): + lenpen = 5.0 + generator = SequenceGenerator( + [self.model], self.tgt_dict, beam_size=2, len_penalty=lenpen + ) + hypos = generator.forward(self.sample) + eos, w1, w2 = self.tgt_dict.eos(), self.w1, self.w2 + # sentence 1, beam 1 + self.assertHypoTokens(hypos[0][0], [w2, w1, w2, eos]) + self.assertHypoScore(hypos[0][0], [0.1, 0.9, 0.9, 1.0], lenpen=lenpen) + # sentence 1, beam 2 + self.assertHypoTokens(hypos[0][1], [w1, eos]) + self.assertHypoScore(hypos[0][1], [0.9, 1.0], lenpen=lenpen) + # sentence 2, beam 1 + self.assertHypoTokens(hypos[1][0], [w1, w2, w1, eos]) + self.assertHypoScore(hypos[1][0], [0.7, 0.4, 0.4, 1.0], lenpen=lenpen) + # sentence 2, beam 2 + self.assertHypoTokens(hypos[1][1], [w1, w2, eos]) + self.assertHypoScore(hypos[1][1], [0.7, 0.4, 0.6], lenpen=lenpen) + + def test_maxlen(self): + generator = SequenceGenerator( + [self.model], self.tgt_dict, beam_size=2, max_len_b=2 + ) + hypos = generator.forward(self.sample) + eos, w1, w2 = self.tgt_dict.eos(), self.w1, self.w2 + # sentence 1, beam 1 + self.assertHypoTokens(hypos[0][0], [w1, eos]) + self.assertHypoScore(hypos[0][0], [0.9, 1.0]) + # sentence 1, beam 2 + self.assertHypoTokens(hypos[0][1], [w2, w2, eos]) + self.assertHypoScore(hypos[0][1], [0.1, 0.1, 0.6]) + # sentence 2, beam 1 + self.assertHypoTokens(hypos[1][0], [w1, w2, eos]) + self.assertHypoScore(hypos[1][0], [0.7, 0.4, 0.6]) + # sentence 2, beam 2 + self.assertHypoTokens(hypos[1][1], [w2, w2, eos]) + self.assertHypoScore(hypos[1][1], [0.3, 0.9, 0.01]) + + def test_encoder_with_different_output_len(self): + args = self.model.encoder.args + task = test_utils.TestTranslationTask.setup_task( + args, self.tgt_dict, self.tgt_dict + ) + reshaping_model = test_utils.TestReshapingModel.build_model(args, task) + generator = SequenceGenerator( + [reshaping_model], self.tgt_dict, beam_size=2, max_len_b=2 + ) + hypos = generator.forward(self.sample) + for sent in [0, 1]: + for beam in [0, 1]: + assert hypos[sent][beam]["attention"] is not None + + def test_generation_with_additional_input(self): + args = self.model.encoder.args + task = test_utils.TestTranslationTask.setup_task( + args, self.tgt_dict, self.tgt_dict + ) + add_input_model = test_utils.TestAdditionalInputModel.build_model(args, task) + generator = SequenceGenerator([add_input_model], self.tgt_dict, beam_size=2) + sample = self.sample.copy() + sample["net_input"]["fancy_other_input"] = sample["net_input"]["src_tokens"] + hypos = generator.forward(self.sample) + eos, w1, w2 = self.tgt_dict.eos(), self.w1, self.w2 + # sentence 1, beam 1 + self.assertHypoTokens(hypos[0][0], [w1, eos]) + self.assertHypoScore(hypos[0][0], [0.9, 1.0]) + + +@unittest.skipUnless(torch.cuda.is_available(), "") +class TestRepeatNgramBlocking(TestSequenceGeneratorBase): + @classmethod + def setUpClass(cls): + ( + cls.tgt_dict, + cls.w1, + cls.w2, + src_tokens, + src_lengths, + cls.model, + ) = test_utils.sequence_generator_setup() + return cls + + def test_finds_repetitive_tokens(self): + bsz, vocab_size, beam_size, step = 2, 4, 1, 3 + generated_tok = torch.tensor( + [[2, 2, 2, 2], [3, 3, 3, 3]], dtype=torch.long, device="cuda" + ) + lprobs = torch.zeros((beam_size * bsz, vocab_size), device="cuda") + desired_result = lprobs.new_tensor( + [[0.0, 0.0, -math.inf, 0.0], [0.0, 0.0, 0.0, -math.inf]] + ) + + cuda_ext_result, baseline_result = self._compare_cuda_ext_to_default_implem( + bsz, beam_size, generated_tok, lprobs, step, 2 + ) + 
self.assertTensorEqual(cuda_ext_result, desired_result) + self.assertTensorEqual(baseline_result, desired_result) + + @unittest.skipIf(torch.__version__ < "1.6.0", JIT_MSG) + def test_jit_no_extension(self): + bsz, vocab_size, beam_size, step = 2, 4, 1, 3 + generated_tok = torch.tensor( + [[2, 2, 2, 2], [3, 3, 3, 3]], dtype=torch.long, device="cuda" + ) + lprobs = torch.zeros((beam_size * bsz, vocab_size), device="cuda") + blocker = NGramRepeatBlock(2, use_extension=False) + base_result = blocker(generated_tok, lprobs.clone(), bsz, beam_size, step) + scripted_blocker = torch.jit.script(blocker) + jit_result = scripted_blocker( + generated_tok, lprobs.clone(), bsz, beam_size, step + ) + self.assertTensorEqual(base_result, jit_result) + + def test_ngram_blocking_same_as_default_implem(self): + """Test that cuda extension returns same things as default impl in many settings.""" + vocab_size = 4 + step = 6 + for _ in range(2): + block_param = np.random.choice([1, 2, 3, 4]) + batch_size = np.random.randint(1, 8) + beam_size = np.random.choice([1, 2, 4, 8]) + lprobs = torch.zeros((beam_size * batch_size, vocab_size), device="cuda") + + generated_tok = torch.tensor( + np.random.randint( + 0, vocab_size, size=(batch_size * beam_size, step + 1) + ), + device="cuda", + dtype=torch.long, + ) + self._compare_cuda_ext_to_default_implem( + batch_size, + beam_size, + generated_tok, + lprobs, + step, + block_param, + ) + + def _compare_cuda_ext_to_default_implem( + self, bsz, beam_size, generated_tok, lprobs, step, block_param + ): + """Assert that cuda extension and default implem return the same thing.""" + blocker = NGramRepeatBlock(block_param) + assert blocker.use_extension, "Extension not compiled" + cuda_ext_result = blocker( + generated_tok, + lprobs.clone(), + bsz, + beam_size, + step, + ) + blocker.use_extension = False + baseline_result = blocker( + generated_tok, + lprobs.clone(), + bsz, + beam_size, + step, + ) + self.assertTensorEqual(cuda_ext_result, baseline_result) + blocker.use_extension = True + return cuda_ext_result, baseline_result + + +class TestDiverseBeamSearch(TestSequenceGeneratorBase): + def setUp(self): + # construct dummy dictionary + d = test_utils.dummy_dictionary(vocab_size=2) + self.assertEqual(d.pad(), 1) + self.assertEqual(d.eos(), 2) + self.assertEqual(d.unk(), 3) + self.eos = d.eos() + self.w1 = 4 + self.w2 = 5 + + # construct source data + self.src_tokens = torch.LongTensor( + [ + [self.w1, self.w2, self.eos], + [self.w1, self.w2, self.eos], + ] + ) + self.src_lengths = torch.LongTensor([2, 2]) + + args = argparse.Namespace() + unk = 0.0 + args.beam_probs = [ + # step 0: + torch.FloatTensor( + [ + # eos w1 w2 + # sentence 1: + [0.0, unk, 0.9, 0.1], # beam 1 + [0.0, unk, 0.9, 0.1], # beam 2 + # sentence 2: + [0.0, unk, 0.7, 0.3], + [0.0, unk, 0.7, 0.3], + ] + ), + # step 1: + torch.FloatTensor( + [ + # eos w1 w2 + # sentence 1: + [0.0, unk, 0.6, 0.4], + [0.0, unk, 0.6, 0.4], + # sentence 2: + [0.25, unk, 0.35, 0.4], + [0.25, unk, 0.35, 0.4], + ] + ), + # step 2: + torch.FloatTensor( + [ + # eos w1 w2 + # sentence 1: + [1.0, unk, 0.0, 0.0], + [1.0, unk, 0.0, 0.0], + # sentence 2: + [0.9, unk, 0.1, 0.0], + [0.9, unk, 0.1, 0.0], + ] + ), + ] + + task = test_utils.TestTranslationTask.setup_task(args, d, d) + self.model = task.build_model(args) + self.tgt_dict = task.target_dictionary + + def test_diverse_beam_search(self): + search_strategy = search.DiverseBeamSearch( + self.tgt_dict, num_groups=2, diversity_strength=0.0 + ) + generator = SequenceGenerator( + 
[self.model], + self.tgt_dict, + beam_size=2, + search_strategy=search_strategy, + ) + sample = { + "net_input": { + "src_tokens": self.src_tokens, + "src_lengths": self.src_lengths, + } + } + hypos = generator.forward(sample) + eos, w1, w2 = self.eos, self.w1, self.w2 + # sentence 1, beam 1 + self.assertHypoTokens(hypos[0][0], [w1, w1, eos]) + self.assertHypoScore(hypos[0][0], [0.9, 0.6, 1.0]) + # sentence 1, beam 2 + self.assertHypoTokens(hypos[0][1], [w1, w1, eos]) + self.assertHypoScore(hypos[0][1], [0.9, 0.6, 1.0]) + # sentence 2, beam 1 + self.assertHypoTokens(hypos[1][0], [w1, w2, eos]) + self.assertHypoScore(hypos[1][0], [0.7, 0.4, 0.9]) + # sentence 2, beam 2 + self.assertHypoTokens(hypos[1][1], [w1, w2, eos]) + self.assertHypoScore(hypos[1][1], [0.7, 0.4, 0.9]) + + +class TestDiverseSiblingsSearch(TestDiverseBeamSearch): + def assertHypoScore( + self, hypo, pos_probs, sibling_rank, diversity_rate, normalized=True, lenpen=1.0 + ): + pos_scores = torch.FloatTensor(pos_probs).log() + pos_scores.sub_(torch.Tensor(sibling_rank) * diversity_rate) + self.assertAlmostEqual(hypo["positional_scores"], pos_scores) + self.assertEqual(pos_scores.numel(), hypo["tokens"].numel()) + score = pos_scores.sum() + if normalized: + score /= pos_scores.numel() ** lenpen + self.assertLess(abs(score - hypo["score"]), 1e-6) + + def test_diverse_beam_search(self): + search_strategy = search.DiverseSiblingsSearch( + self.tgt_dict, diversity_rate=0.5 + ) + generator = SequenceGenerator( + [self.model], self.tgt_dict, beam_size=2, search_strategy=search_strategy + ) + sample = { + "net_input": { + "src_tokens": self.src_tokens, + "src_lengths": self.src_lengths, + } + } + hypos = generator.forward(sample) + eos, w1, w2 = self.eos, self.w1, self.w2 + # sentence 1, beam 1 + self.assertHypoTokens(hypos[0][0], [w1, w1, eos]) + self.assertHypoScore(hypos[0][0], [0.9, 0.6, 1.0], [0, 1, 1], 0.5) + # sentence 1, beam 2 + self.assertHypoTokens(hypos[0][1], [w1, w2, eos]) + self.assertHypoScore(hypos[0][1], [0.9, 0.4, 1.0], [0, 2, 1], 0.5) + # sentence 2, beam 1 + self.assertHypoTokens(hypos[1][0], [w1, w2, eos]) + self.assertHypoScore(hypos[1][0], [0.7, 0.4, 0.9], [0, 1, 1], 0.5) + # sentence 2, beam 2 + self.assertHypoTokens(hypos[1][1], [w1, w1, eos]) + self.assertHypoScore(hypos[1][1], [0.7, 0.35, 0.9], [0, 2, 1], 0.5) + + +class TestTopPSamplingSearch(TestSequenceGeneratorBase): + def setUp(self): + # construct dummy dictionary + d = test_utils.dummy_dictionary(vocab_size=2) + self.assertEqual(d.pad(), 1) + self.assertEqual(d.eos(), 2) + self.assertEqual(d.unk(), 3) + self.eos = d.eos() + self.w1 = 4 + self.w2 = 5 + + # construct source data + self.src_tokens = torch.LongTensor( + [ + [self.w1, self.w2, self.eos], + [self.w1, self.w2, self.eos], + ] + ) + self.src_lengths = torch.LongTensor([2, 2]) + + args = argparse.Namespace() + unk = 0.0 + # The minimal probability of top 2 tokens. + self.min_top2_prob = 0.75 + # The minimal probability of the top 1 token. 
+ self.min_top1_prob = 0.4 + + w1_prob = self.min_top1_prob + w2_prob = self.min_top2_prob - self.min_top1_prob + eos_prob = 1 - self.min_top2_prob + + args.beam_probs = [ + # step 0: + torch.FloatTensor( + [ + # eos w1 w2 + [0.0, unk, 1.0, 0.0], + [0.0, unk, 1.0, 0.0], + [0.0, unk, 1.0, 0.0], + [0.0, unk, 1.0, 0.0], + ] + ), + # step 1: + torch.FloatTensor( + [ + # eos w1 w2 + [eos_prob, unk, w1_prob, w2_prob], + [eos_prob, unk, w1_prob, w2_prob], + [eos_prob, unk, w1_prob, w2_prob], + [eos_prob, unk, w1_prob, w2_prob], + ] + ), + # step 2: + torch.FloatTensor( + [ + # eos w1 w2 + [1.0, unk, 0.0, 0.0], + [1.0, unk, 0.0, 0.0], + [1.0, unk, 0.0, 0.0], + [1.0, unk, 0.0, 0.0], + ] + ), + ] + + task = test_utils.TestTranslationTask.setup_task(args, d, d) + self.model = task.build_model(args) + self.tgt_dict = task.target_dictionary + + def test_topp_sampling_search_low_prob(self): + # Given a prob low enough to top-P sampling, we expect only the top + # 1 token to be sampled, which always results in the same output. + low_sampling_topp = self.min_top1_prob / 2.0 + search_strategy = search.Sampling( + self.tgt_dict, sampling_topp=low_sampling_topp + ) + generator = SequenceGenerator( + [self.model], self.tgt_dict, beam_size=2, search_strategy=search_strategy + ) + sample = { + "net_input": { + "src_tokens": self.src_tokens, + "src_lengths": self.src_lengths, + } + } + hypos = generator.forward(sample) + eos, w1 = self.eos, self.w1 + # sentence 1, beam 1 + self.assertHypoTokens(hypos[0][0], [w1, w1, eos]) + self.assertHypoScore(hypos[0][0], [1.0, 0.4, 1.0]) + # sentence 1, beam 2 + self.assertHypoTokens(hypos[0][1], [w1, w1, eos]) + self.assertHypoScore(hypos[0][1], [1.0, 0.4, 1.0]) + # sentence 2, beam 1 + self.assertHypoTokens(hypos[1][0], [w1, w1, eos]) + self.assertHypoScore(hypos[1][0], [1.0, 0.4, 1.0]) + # sentence 2, beam 2 + self.assertHypoTokens(hypos[1][1], [w1, w1, eos]) + self.assertHypoScore(hypos[1][1], [1.0, 0.4, 1.0]) + + def test_topp_sampling_search_high_prob(self): + # Given a prob high enough to top-P sampling, any of the top 2 + # tokens could be sampled. This can cause different outputs. 
+ high_sampling_topp = (self.min_top1_prob + self.min_top2_prob) / 2.0 + search_strategy = search.Sampling( + self.tgt_dict, sampling_topp=high_sampling_topp + ) + generator = SequenceGenerator( + [self.model], self.tgt_dict, beam_size=2, search_strategy=search_strategy + ) + sample = { + "net_input": { + "src_tokens": self.src_tokens, + "src_lengths": self.src_lengths, + } + } + hypos = generator.forward(sample) + eos, w1, w2 = self.eos, self.w1, self.w2 + # sentence 1, beam 1 + self.assertTrue( + self.hypoTokens(hypos[0][0], [w1, w1, eos]) + or self.hypoTokens(hypos[0][0], [w1, w2, eos]) + ) + self.assertTrue( + self.hypoScore(hypos[0][0], [1.0, 0.4, 1.0]) + or self.hypoScore(hypos[0][0], [1.0, 0.35, 1.0]) + ) + + # sentence 1, beam 2 + self.assertTrue( + self.hypoTokens(hypos[0][1], [w1, w1, eos]) + or self.hypoTokens(hypos[0][1], [w1, w2, eos]) + ) + self.assertTrue( + self.hypoScore(hypos[0][1], [1.0, 0.4, 1.0]) + or self.hypoScore(hypos[0][1], [1.0, 0.35, 1.0]) + ) + + # sentence 2, beam 1 + self.assertTrue( + self.hypoTokens(hypos[1][0], [w1, w1, eos]) + or self.hypoTokens(hypos[1][0], [w1, w2, eos]) + ) + self.assertTrue( + self.hypoScore(hypos[1][0], [1.0, 0.4, 1.0]) + or self.hypoScore(hypos[1][0], [1.0, 0.35, 1.0]) + ) + + # sentence 2, beam 2 + self.assertTrue( + self.hypoTokens(hypos[1][1], [w1, w1, eos]) + or self.hypoTokens(hypos[1][1], [w1, w2, eos]) + ) + self.assertTrue( + self.hypoScore(hypos[1][1], [1.0, 0.4, 1.0]) + or self.hypoScore(hypos[1][1], [1.0, 0.35, 1.0]) + ) + + def hypoTokens(self, hypo, tokens): + return self.tensorEqual(hypo["tokens"], torch.LongTensor(tokens)) + + def hypoScore(self, hypo, pos_probs, normalized=True, lenpen=1.0): + pos_scores = torch.FloatTensor(pos_probs).log() + if not self.almostEqual(hypo["positional_scores"], pos_scores): + return False + if pos_scores.numel() != hypo["tokens"].numel(): + return False + score = pos_scores.sum() + if normalized: + score /= pos_scores.numel() ** lenpen + return abs(score - hypo["score"]) < 1e-6 + + def almostEqual(self, t1, t2): + return t1.size() == t2.size() and (t1 - t2).abs().max() < 1e-4 + + def tensorEqual(self, t1, t2): + return t1.size() == t2.size() and t1.ne(t2).long().sum() == 0 + + +if __name__ == "__main__": + unittest.main() diff --git a/SpeechT5/fairseq/tests/test_sequence_scorer.py b/SpeechT5/fairseq/tests/test_sequence_scorer.py new file mode 100644 index 0000000000000000000000000000000000000000..42f9447b599bcd7a9913aec37d94ea5078ff43a3 --- /dev/null +++ b/SpeechT5/fairseq/tests/test_sequence_scorer.py @@ -0,0 +1,120 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. 
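+
+# The assertHypoScore helpers above, and the SequenceScorer test below, share one
+# convention: a hypothesis score is the sum of per-token log-probabilities,
+# divided by length ** lenpen when scores are normalized. A minimal sketch of
+# that convention (illustrative only; the name is hypothetical and this is not
+# fairseq's implementation):
+def _sketch_hypo_score(pos_probs, normalized=True, lenpen=1.0):
+    """Combine per-token probabilities into a single hypothesis score."""
+    import math
+
+    pos_scores = [math.log(p) for p in pos_probs]  # per-token log-probs
+    score = sum(pos_scores)                        # unnormalized score
+    if normalized:
+        score /= len(pos_scores) ** lenpen         # length-penalised average
+    return score
+
+# Example: a hypothesis with per-token probs [0.9, 1.0] scores
+# (log 0.9 + log 1.0) / 2 under the default lenpen = 1.0.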
+ +import argparse +import unittest + +import tests.utils as test_utils +import torch +from fairseq.sequence_scorer import SequenceScorer + + +class TestSequenceScorer(unittest.TestCase): + def test_sequence_scorer(self): + # construct dummy dictionary + d = test_utils.dummy_dictionary(vocab_size=2) + self.assertEqual(d.pad(), 1) + self.assertEqual(d.eos(), 2) + self.assertEqual(d.unk(), 3) + eos = d.eos() + w1 = 4 + w2 = 5 + + # construct dataloader + data = [ + { + "source": torch.LongTensor([w1, w2, eos]), + "target": torch.LongTensor([w1, w2, w1, eos]), + }, + { + "source": torch.LongTensor([w2, eos]), + "target": torch.LongTensor([w2, w1, eos]), + }, + { + "source": torch.LongTensor([w2, eos]), + "target": torch.LongTensor([w2, eos]), + }, + ] + data_itr = test_utils.dummy_dataloader(data) + + # specify expected output probabilities + args = argparse.Namespace() + unk = 0.0 + args.beam_probs = [ + # step 0: + torch.FloatTensor( + [ + # eos w1 w2 + [0.0, unk, 0.6, 0.4], # sentence 1 + [0.0, unk, 0.4, 0.6], # sentence 2 + [0.0, unk, 0.7, 0.3], # sentence 3 + ] + ), + # step 1: + torch.FloatTensor( + [ + # eos w1 w2 + [0.0, unk, 0.2, 0.7], # sentence 1 + [0.0, unk, 0.8, 0.2], # sentence 2 + [0.7, unk, 0.1, 0.2], # sentence 3 + ] + ), + # step 2: + torch.FloatTensor( + [ + # eos w1 w2 + [0.10, unk, 0.50, 0.4], # sentence 1 + [0.15, unk, 0.15, 0.7], # sentence 2 + [0.00, unk, 0.00, 0.0], # sentence 3 + ] + ), + # step 3: + torch.FloatTensor( + [ + # eos w1 w2 + [0.9, unk, 0.05, 0.05], # sentence 1 + [0.0, unk, 0.00, 0.0], # sentence 2 + [0.0, unk, 0.00, 0.0], # sentence 3 + ] + ), + ] + expected_scores = [ + [0.6, 0.7, 0.5, 0.9], # sentence 1 + [0.6, 0.8, 0.15], # sentence 2 + [0.3, 0.7], # sentence 3 + ] + + task = test_utils.TestTranslationTask.setup_task(args, d, d) + model = task.build_model(args) + scorer = SequenceScorer(task.target_dictionary) + for sample in data_itr: + hypos = task.inference_step(scorer, [model], sample) + for id, hypos_id in zip(sample["id"].tolist(), hypos): + self.assertHypoTokens(hypos_id[0], data[id]["target"]) + self.assertHypoScore(hypos_id[0], expected_scores[id]) + + def assertHypoTokens(self, hypo, tokens): + self.assertTensorEqual(hypo["tokens"], torch.LongTensor(tokens)) + + def assertHypoScore(self, hypo, pos_probs, normalized=True, lenpen=1.0): + pos_scores = torch.FloatTensor(pos_probs).log() + self.assertAlmostEqual(hypo["positional_scores"], pos_scores) + self.assertEqual(pos_scores.numel(), hypo["tokens"].numel()) + score = pos_scores.sum() + if normalized: + score /= pos_scores.numel() ** lenpen + self.assertLess(abs(score - hypo["score"]), 1e-6) + + def assertAlmostEqual(self, t1, t2): + self.assertEqual(t1.size(), t2.size(), "size mismatch") + self.assertLess((t1 - t2).abs().max(), 1e-4) + + def assertTensorEqual(self, t1, t2): + self.assertEqual(t1.size(), t2.size(), "size mismatch") + self.assertEqual(t1.ne(t2).long().sum(), 0) + + +if __name__ == "__main__": + unittest.main() diff --git a/SpeechT5/fairseq/tests/test_sparse_multihead_attention.py b/SpeechT5/fairseq/tests/test_sparse_multihead_attention.py new file mode 100644 index 0000000000000000000000000000000000000000..3e32b25a7fb1e12295b84d0c65064f8e42b7bdd3 --- /dev/null +++ b/SpeechT5/fairseq/tests/test_sparse_multihead_attention.py @@ -0,0 +1,114 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. 
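+
+# The expected masks hard-coded in the test below follow the usual additive
+# attention-mask convention: allowed positions are 0 and blocked positions are
+# -inf, so softmax gives them zero weight. A minimal sketch of that convention
+# as a plain causal mask (illustrative only; this is not the stride/expressivity
+# pattern that SparseMultiheadAttention builds):
+def _sketch_causal_mask(seq_len):
+    import torch
+
+    mask = torch.zeros(seq_len, seq_len)
+    # block every position j > i so a token attends only to itself and the past
+    future = torch.triu(torch.ones(seq_len, seq_len, dtype=torch.bool), diagonal=1)
+    return mask.masked_fill(future, float("-inf"))  # added to scores before softmax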
+ +import unittest + +import torch +from fairseq.modules.sparse_multihead_attention import SparseMultiheadAttention + + +class TestSparseMultiheadAttention(unittest.TestCase): + def test_sparse_multihead_attention(self): + attn_weights = torch.randn(1, 8, 8) + bidirectional_sparse_mask = torch.tensor( + [ + [0, 0, 0, 0, 0, float("-inf"), float("-inf"), 0], + [0, 0, 0, 0, 0, float("-inf"), float("-inf"), 0], + [0, 0, 0, 0, 0, float("-inf"), float("-inf"), 0], + [0, 0, 0, 0, 0, float("-inf"), float("-inf"), 0], + [float("-inf"), float("-inf"), float("-inf"), 0, 0, 0, 0, 0], + [float("-inf"), float("-inf"), float("-inf"), 0, 0, 0, 0, 0], + [float("-inf"), float("-inf"), float("-inf"), 0, 0, 0, 0, 0], + [float("-inf"), float("-inf"), float("-inf"), 0, 0, 0, 0, 0], + ] + ) + + bidirectional_attention = SparseMultiheadAttention( + 16, 1, stride=4, expressivity=1, is_bidirectional=True + ) + bidirectional_attention_sparse_mask = ( + bidirectional_attention.buffered_sparse_mask(attn_weights, 8, 8) + ) + torch.all( + torch.eq(bidirectional_attention_sparse_mask, bidirectional_sparse_mask) + ) + + sparse_mask = torch.tensor( + [ + [ + 0, + float("-inf"), + float("-inf"), + float("-inf"), + float("-inf"), + float("-inf"), + float("-inf"), + float("-inf"), + ], + [ + 0, + 0, + float("-inf"), + float("-inf"), + float("-inf"), + float("-inf"), + float("-inf"), + float("-inf"), + ], + [ + 0, + 0, + 0, + float("-inf"), + float("-inf"), + float("-inf"), + float("-inf"), + float("-inf"), + ], + [ + 0, + 0, + 0, + 0, + float("-inf"), + float("-inf"), + float("-inf"), + float("-inf"), + ], + [0, 0, 0, 0, 0, float("-inf"), float("-inf"), float("-inf")], + [ + float("-inf"), + float("-inf"), + float("-inf"), + 0, + 0, + 0, + float("-inf"), + float("-inf"), + ], + [ + float("-inf"), + float("-inf"), + float("-inf"), + 0, + 0, + 0, + 0, + float("-inf"), + ], + [float("-inf"), float("-inf"), float("-inf"), 0, 0, 0, 0, 0], + ] + ) + + attention = SparseMultiheadAttention( + 16, 1, stride=4, expressivity=1, is_bidirectional=False + ) + attention_sparse_mask = attention.buffered_sparse_mask(attn_weights, 8, 8) + + torch.all(torch.eq(attention_sparse_mask, sparse_mask)) + + +if __name__ == "__main__": + unittest.main() diff --git a/SpeechT5/fairseq/tests/test_token_block_dataset.py b/SpeechT5/fairseq/tests/test_token_block_dataset.py new file mode 100644 index 0000000000000000000000000000000000000000..c4d7b76dcd55fe7869dbb1fa188f7b36fb639bda --- /dev/null +++ b/SpeechT5/fairseq/tests/test_token_block_dataset.py @@ -0,0 +1,92 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. 
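+
+# The TokenBlockDataset tests below exercise three break modes: "eos" keeps one
+# sentence per block, "none" flattens everything and cuts fixed-size blocks, and
+# "complete" packs whole sentences without exceeding block_size (a single
+# over-long sentence still becomes its own block). A minimal sketch of the
+# "none" mode only (illustrative; not fairseq's index-based implementation):
+def _sketch_none_break_mode(sentences, block_size):
+    flat = [tok for sent in sentences for tok in sent]
+    return [flat[i : i + block_size] for i in range(0, len(flat), block_size)]
+
+# e.g. _sketch_none_break_mode([[5, 4, 3, 2, 1], [8, 7, 6, 1], [9, 1]], 3)
+# -> [[5, 4, 3], [2, 1, 8], [7, 6, 1], [9, 1]], matching test_block_break_mode.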
+ +import unittest + +import tests.utils as test_utils +import torch +from fairseq.data import TokenBlockDataset + + +class TestTokenBlockDataset(unittest.TestCase): + def _build_dataset(self, data, **kwargs): + sizes = [len(x) for x in data] + underlying_ds = test_utils.TestDataset(data) + return TokenBlockDataset(underlying_ds, sizes, **kwargs) + + def test_eos_break_mode(self): + data = [ + torch.tensor([5, 4, 3, 2, 1], dtype=torch.long), + torch.tensor([1], dtype=torch.long), + torch.tensor([8, 7, 6, 1], dtype=torch.long), + ] + ds = self._build_dataset(data, block_size=None, pad=0, eos=1, break_mode="eos") + self.assertEqual(ds[0].tolist(), [5, 4, 3, 2, 1]) + self.assertEqual(ds[1].tolist(), [1]) + self.assertEqual(ds[2].tolist(), [8, 7, 6, 1]) + + data = [ + torch.tensor([5, 4, 3, 2, 1], dtype=torch.long), + torch.tensor([8, 7, 6, 1], dtype=torch.long), + torch.tensor([1], dtype=torch.long), + ] + ds = self._build_dataset(data, block_size=None, pad=0, eos=1, break_mode="eos") + self.assertEqual(ds[0].tolist(), [5, 4, 3, 2, 1]) + self.assertEqual(ds[1].tolist(), [8, 7, 6, 1]) + self.assertEqual(ds[2].tolist(), [1]) + + def test_block_break_mode(self): + data = [ + torch.tensor([5, 4, 3, 2, 1], dtype=torch.long), + torch.tensor([8, 7, 6, 1], dtype=torch.long), + torch.tensor([9, 1], dtype=torch.long), + ] + ds = self._build_dataset(data, block_size=3, pad=0, eos=1, break_mode="none") + self.assertEqual(ds[0].tolist(), [5, 4, 3]) + self.assertEqual(ds[1].tolist(), [2, 1, 8]) + self.assertEqual(ds[2].tolist(), [7, 6, 1]) + self.assertEqual(ds[3].tolist(), [9, 1]) + + def test_complete_break_mode(self): + data = [ + torch.tensor([5, 4, 3, 2, 1], dtype=torch.long), + torch.tensor([8, 7, 6, 1], dtype=torch.long), + torch.tensor([9, 1], dtype=torch.long), + ] + ds = self._build_dataset( + data, block_size=6, pad=0, eos=1, break_mode="complete" + ) + self.assertEqual(ds[0].tolist(), [5, 4, 3, 2, 1]) + self.assertEqual(ds[1].tolist(), [8, 7, 6, 1, 9, 1]) + + data = [ + torch.tensor([4, 3, 2, 1], dtype=torch.long), + torch.tensor([5, 1], dtype=torch.long), + torch.tensor([1], dtype=torch.long), + torch.tensor([6, 1], dtype=torch.long), + ] + ds = self._build_dataset( + data, block_size=3, pad=0, eos=1, break_mode="complete" + ) + self.assertEqual(ds[0].tolist(), [4, 3, 2, 1]) + self.assertEqual(ds[1].tolist(), [5, 1, 1]) + self.assertEqual(ds[2].tolist(), [6, 1]) + + def test_4billion_tokens(self): + """Regression test for numpy type promotion issue https://github.com/numpy/numpy/issues/5745""" + data = [torch.tensor(list(range(10000)), dtype=torch.long)] * 430000 + ds = self._build_dataset( + data, block_size=6, pad=0, eos=1, break_mode="complete" + ) + ds[-1] # __getitem__ works + start, end = ds.slice_indices[-1] + assert end > 4294967295 # data must be sufficiently large to overflow uint32 + assert not isinstance( + end + 1, float + ) # this would also raise, since np.uint64(1) + 1 => 2.0 + + +if __name__ == "__main__": + unittest.main() diff --git a/SpeechT5/fairseq/tests/test_train.py b/SpeechT5/fairseq/tests/test_train.py new file mode 100644 index 0000000000000000000000000000000000000000..65f4683bc67ca80c81bf1d2c27be621b57f7df94 --- /dev/null +++ b/SpeechT5/fairseq/tests/test_train.py @@ -0,0 +1,246 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. 
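+
+# The checkpoint-loading tests below verify that resuming from a mid-epoch
+# checkpoint continues from the stored offset instead of replaying the epoch.
+# The core idea is simply "skip the batches already consumed"; a minimal sketch
+# (illustrative only; the name is hypothetical and this is not fairseq's
+# EpochBatchIterator):
+def _sketch_resume_epoch(batches, iterations_in_epoch):
+    """Yield only the batches that were not yet consumed before the checkpoint."""
+    import itertools
+
+    return itertools.islice(iter(batches), iterations_in_epoch, None)
+
+# e.g. with 150 one-sample batches and a checkpoint taken after 50 of them, the
+# first resumed batch is batch 50, as asserted in test_load_partial_checkpoint.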
+ +import contextlib +import logging +import unittest +from io import StringIO +from unittest.mock import MagicMock, patch + +import torch +from fairseq import checkpoint_utils, data +from omegaconf import OmegaConf + + +def mock_trainer(epoch, num_updates, iterations_in_epoch): + trainer = MagicMock() + trainer.load_checkpoint.return_value = { + "train_iterator": { + "epoch": epoch, + "iterations_in_epoch": iterations_in_epoch, + "shuffle": False, + }, + } + trainer.get_num_updates.return_value = num_updates + return trainer + + +def mock_dict(): + d = MagicMock() + d.pad.return_value = 1 + d.eos.return_value = 2 + d.unk.return_value = 3 + return d + + +def get_trainer_and_epoch_itr(epoch, epoch_size, num_updates, iterations_in_epoch): + tokens = torch.LongTensor(list(range(epoch_size))).view(1, -1) + tokens_ds = data.TokenBlockDataset( + tokens, + sizes=[tokens.size(-1)], + block_size=1, + pad=0, + eos=1, + include_targets=False, + ) + trainer = mock_trainer(epoch, num_updates, iterations_in_epoch) + dataset = data.LanguagePairDataset( + tokens_ds, tokens_ds.sizes, mock_dict(), shuffle=False + ) + epoch_itr = data.EpochBatchIterator( + dataset=dataset, + collate_fn=dataset.collater, + batch_sampler=[[i] for i in range(epoch_size)], + ) + return trainer, epoch_itr + + +def get_mock_cfg(finetune_from_model): + cfg_mock = OmegaConf.create( + { + "checkpoint": { + "optimizer_overrides": "{}", + "reset_dataloader": False, + "reset_meters": False, + "reset_optimizer": False, + "reset_lr_scheduler": False, + "finetune_from_model": finetune_from_model, + "model_parallel_size": 1, + "restore_file": "checkpoint_last.pt", + }, + "common": { + "model_parallel_size": 1, + }, + } + ) + return cfg_mock + + +class TestLoadCheckpoint(unittest.TestCase): + def setUp(self): + self.cfg_mock = get_mock_cfg(None) + self.patches = { + "os.makedirs": MagicMock(), + "os.path.join": MagicMock(), + "os.path.isfile": MagicMock(return_value=True), + "os.path.isabs": MagicMock(return_value=False), + "fairseq.file_io.PathManager.exists": MagicMock(return_value=False), + } + self.applied_patches = [patch(p, d) for p, d in self.patches.items()] + [p.start() for p in self.applied_patches] + logging.disable(logging.CRITICAL) + + def tearDown(self): + patch.stopall() + logging.disable(logging.NOTSET) + + def test_load_partial_checkpoint(self): + with contextlib.redirect_stdout(StringIO()): + trainer, epoch_itr = get_trainer_and_epoch_itr(2, 150, 200, 50) + trainer.get_train_iterator = MagicMock(return_value=epoch_itr) + + _, epoch_itr = checkpoint_utils.load_checkpoint( + self.cfg_mock.checkpoint, trainer + ) + + self.assertEqual(epoch_itr.epoch, 2) + self.assertEqual(epoch_itr.iterations_in_epoch, 50) + + itr = epoch_itr.next_epoch_itr(shuffle=False) + self.assertEqual(epoch_itr.epoch, 2) + self.assertEqual(epoch_itr.iterations_in_epoch, 50) + + self.assertEqual(next(itr)["net_input"]["src_tokens"][0].item(), 50) + self.assertEqual(epoch_itr.iterations_in_epoch, 51) + + for _ in range(150 - 52): + next(itr) + self.assertEqual(epoch_itr.iterations_in_epoch, 149) + self.assertTrue(itr.has_next()) + next(itr) + self.assertFalse(itr.has_next()) + + itr = epoch_itr.next_epoch_itr(shuffle=False) + self.assertTrue(itr.has_next()) + self.assertEqual(epoch_itr.epoch, 3) + self.assertEqual(epoch_itr.iterations_in_epoch, 0) + + def test_load_full_checkpoint(self): + with contextlib.redirect_stdout(StringIO()): + trainer, epoch_itr = get_trainer_and_epoch_itr(2, 150, 300, 150) + trainer.get_train_iterator = 
MagicMock(return_value=epoch_itr) + + _, epoch_itr = checkpoint_utils.load_checkpoint( + self.cfg_mock.checkpoint, trainer + ) + itr = epoch_itr.next_epoch_itr(shuffle=False) + + self.assertEqual(epoch_itr.epoch, 3) + self.assertEqual(epoch_itr.iterations_in_epoch, 0) + self.assertEqual(next(itr)["net_input"]["src_tokens"][0].item(), 0) + + def test_load_no_checkpoint(self): + with contextlib.redirect_stdout(StringIO()): + trainer, epoch_itr = get_trainer_and_epoch_itr(1, 150, 0, 0) + trainer.get_train_iterator = MagicMock(return_value=epoch_itr) + self.patches["os.path.isfile"].return_value = False + + _, epoch_itr = checkpoint_utils.load_checkpoint( + self.cfg_mock.checkpoint, trainer + ) + itr = epoch_itr.next_epoch_itr(shuffle=False) + + self.assertEqual(epoch_itr.epoch, 1) + self.assertEqual(epoch_itr.iterations_in_epoch, 0) + self.assertEqual(next(itr)["net_input"]["src_tokens"][0].item(), 0) + + def test_finetune_from_model_args_conflict(self): + with contextlib.redirect_stdout(StringIO()): + trainer, epoch_itr = get_trainer_and_epoch_itr(1, 150, 0, 0) + trainer.get_train_iterator = MagicMock(return_value=epoch_itr) + + for arg in [ + "reset_optimizer", + "reset_lr_scheduler", + "reset_meters", + "reset_dataloader", + ]: + with self.subTest(arg=arg): + cfg_mock = get_mock_cfg("/temp/checkpoint_pretrained.pt") + cfg_mock["checkpoint"][arg] = True + with self.assertRaises(Exception) as context: + _, _ = checkpoint_utils.load_checkpoint( + cfg_mock.checkpoint, trainer + ) + + self.assertTrue( + "--finetune-from-model can not be set together with either --reset-optimizer" + " or reset_lr_scheduler or reset_meters or reset_dataloader" + in str(context.exception) + ) + + def test_finetune_from_model(self): + with contextlib.redirect_stdout(StringIO()): + trainer, epoch_itr = get_trainer_and_epoch_itr(1, 150, 0, 0) + trainer.get_train_iterator = MagicMock(return_value=epoch_itr) + from_model_path = "/temp/checkpoint_pretrained.pt" + + def mock_finetune_exist(path): + if path == from_model_path: + return True + else: + return False + + self.patches[ + "fairseq.file_io.PathManager.exists" + ].side_effect = mock_finetune_exist + cfg_mock = get_mock_cfg(from_model_path) + cfg_mock.checkpoint.restore_file = "checkpoint_last.pt" + _, _ = checkpoint_utils.load_checkpoint(cfg_mock.checkpoint, trainer) + ( + checkpoint_path, + reset_optimizer, + reset_lr_scheduler, + optimizer_overrides, + ) = trainer.load_checkpoint.call_args[0] + reset_meters = trainer.load_checkpoint.call_args[1]["reset_meters"] + self.assertTrue(reset_optimizer) + self.assertTrue(reset_lr_scheduler) + self.assertTrue(reset_meters) + + def test_finetune_from_model_resume(self): + with contextlib.redirect_stdout(StringIO()): + trainer, epoch_itr = get_trainer_and_epoch_itr(1, 150, 0, 0) + trainer.get_train_iterator = MagicMock(return_value=epoch_itr) + from_model_path = "/temp/checkpoint_pretrained.pt" + + # launch second time + # both restore_file=checkpoint_last.pt and finetune_from_model are set + def mock_finetune_exist(path): + if path == from_model_path or path.endsWith("checkpoint_last.pt"): + return True + else: + return False + + self.patches[ + "fairseq.file_io.PathManager.exists" + ].side_effect = mock_finetune_exist + cfg_mock = get_mock_cfg(from_model_path) + cfg_mock.checkpoint.restore_file = "checkpoint_last.pt" + _, _ = checkpoint_utils.load_checkpoint(cfg_mock.checkpoint, trainer) + ( + checkpoint_path, + reset_optimizer, + reset_lr_scheduler, + optimizer_overrides, + ) = trainer.load_checkpoint.call_args[0] + 
reset_meters = trainer.load_checkpoint.call_args[1]["reset_meters"] + self.assertFalse(reset_optimizer) + self.assertFalse(reset_lr_scheduler) + self.assertFalse(reset_meters) + + +if __name__ == "__main__": + unittest.main() diff --git a/SpeechT5/fairseq/tests/test_transformer.py b/SpeechT5/fairseq/tests/test_transformer.py new file mode 100644 index 0000000000000000000000000000000000000000..de5c5bdbd49692e63fb1cb50108a791304425dc1 --- /dev/null +++ b/SpeechT5/fairseq/tests/test_transformer.py @@ -0,0 +1,65 @@ +import argparse +import unittest +from typing import Any, Dict, Sequence + +import torch +from fairseq.models import transformer + +from tests.test_roberta import FakeTask + + +def mk_sample(tok: Sequence[int] = None, batch_size: int = 2) -> Dict[str, Any]: + if not tok: + tok = [10, 11, 12, 13, 14, 15, 2] + + batch = torch.stack([torch.tensor(tok, dtype=torch.long)] * batch_size) + sample = { + "net_input": { + "src_tokens": batch, + "prev_output_tokens": batch, + "src_lengths": torch.tensor( + [len(tok)] * batch_size, dtype=torch.long, device=batch.device + ), + }, + "target": batch[:, 1:], + } + return sample + + +def mk_transformer(**extra_args: Any): + overrides = { + # Use characteristics dimensions + "encoder_embed_dim": 12, + "encoder_ffn_embed_dim": 14, + "decoder_embed_dim": 12, + "decoder_ffn_embed_dim": 14, + # Disable dropout so we have comparable tests. + "dropout": 0, + "attention_dropout": 0, + "activation_dropout": 0, + "encoder_layerdrop": 0, + } + overrides.update(extra_args) + # Overrides the defaults from the parser + args = argparse.Namespace(**overrides) + transformer.tiny_architecture(args) + + torch.manual_seed(0) + task = FakeTask(args) + return transformer.TransformerModel.build_model(args, task) + + +class TransformerTestCase(unittest.TestCase): + def test_forward_backward(self): + model = mk_transformer(encoder_embed_dim=12, decoder_embed_dim=12) + sample = mk_sample() + o, _ = model.forward(**sample["net_input"]) + loss = o.sum() + loss.backward() + + def test_different_encoder_decoder_embed_dim(self): + model = mk_transformer(encoder_embed_dim=12, decoder_embed_dim=16) + sample = mk_sample() + o, _ = model.forward(**sample["net_input"]) + loss = o.sum() + loss.backward() diff --git a/SpeechT5/fairseq/tests/test_utils.py b/SpeechT5/fairseq/tests/test_utils.py new file mode 100644 index 0000000000000000000000000000000000000000..79195903e0f34372a24fa50312a6e00170c14471 --- /dev/null +++ b/SpeechT5/fairseq/tests/test_utils.py @@ -0,0 +1,114 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. 
+ +import unittest + +import torch +from fairseq import utils + + +class TestUtils(unittest.TestCase): + def test_convert_padding_direction(self): + pad = 1 + left_pad = torch.LongTensor( + [ + [2, 3, 4, 5, 6], + [1, 7, 8, 9, 10], + [1, 1, 1, 11, 12], + ] + ) + right_pad = torch.LongTensor( + [ + [2, 3, 4, 5, 6], + [7, 8, 9, 10, 1], + [11, 12, 1, 1, 1], + ] + ) + + self.assertAlmostEqual( + right_pad, + utils.convert_padding_direction( + left_pad, + pad, + left_to_right=True, + ), + ) + self.assertAlmostEqual( + left_pad, + utils.convert_padding_direction( + right_pad, + pad, + right_to_left=True, + ), + ) + + def test_make_positions(self): + pad = 1 + left_pad_input = torch.LongTensor( + [ + [9, 9, 9, 9, 9], + [1, 9, 9, 9, 9], + [1, 1, 1, 9, 9], + ] + ) + left_pad_output = torch.LongTensor( + [ + [2, 3, 4, 5, 6], + [1, 2, 3, 4, 5], + [1, 1, 1, 2, 3], + ] + ) + right_pad_input = torch.LongTensor( + [ + [9, 9, 9, 9, 9], + [9, 9, 9, 9, 1], + [9, 9, 1, 1, 1], + ] + ) + right_pad_output = torch.LongTensor( + [ + [2, 3, 4, 5, 6], + [2, 3, 4, 5, 1], + [2, 3, 1, 1, 1], + ] + ) + + self.assertAlmostEqual( + left_pad_output, + utils.make_positions(left_pad_input, pad), + ) + self.assertAlmostEqual( + right_pad_output, + utils.make_positions(right_pad_input, pad), + ) + + def test_clip_grad_norm_(self): + params = torch.nn.Parameter(torch.zeros(5)).requires_grad_(False) + grad_norm = utils.clip_grad_norm_(params, 1.0) + self.assertTrue(torch.is_tensor(grad_norm)) + self.assertEqual(grad_norm, 0.0) + + params = [torch.nn.Parameter(torch.zeros(5)) for i in range(3)] + for p in params: + p.grad = torch.full((5,), fill_value=2.0) + grad_norm = utils.clip_grad_norm_(params, 1.0) + exp_grad_norm = torch.full((15,), fill_value=2.0).norm() + self.assertTrue(torch.is_tensor(grad_norm)) + self.assertEqual(grad_norm, exp_grad_norm) + + grad_norm = utils.clip_grad_norm_(params, 1.0) + self.assertAlmostEqual(grad_norm, torch.tensor(1.0)) + + def test_resolve_max_positions_with_tuple(self): + resolved = utils.resolve_max_positions(None, (2000, 100, 2000), 12000) + self.assertEqual(resolved, (2000, 100, 2000)) + + def assertAlmostEqual(self, t1, t2): + self.assertEqual(t1.size(), t2.size(), "size mismatch") + self.assertLess(utils.item((t1 - t2).abs().max()), 1e-4) + + +if __name__ == "__main__": + unittest.main() diff --git a/SpeechT5/fairseq/tests/test_valid_subset_checks.py b/SpeechT5/fairseq/tests/test_valid_subset_checks.py new file mode 100644 index 0000000000000000000000000000000000000000..3e9191bda66fccfebba34920f88bf7b1efea5f7e --- /dev/null +++ b/SpeechT5/fairseq/tests/test_valid_subset_checks.py @@ -0,0 +1,138 @@ +import os +import shutil +import tempfile +import unittest + +from fairseq import options +from fairseq.dataclass.utils import convert_namespace_to_omegaconf +from fairseq.data.data_utils import raise_if_valid_subsets_unintentionally_ignored +from .utils import create_dummy_data, preprocess_lm_data, train_language_model + + +def make_lm_config( + data_dir=None, + extra_flags=None, + task="language_modeling", + arch="transformer_lm_gpt2_tiny", +): + task_args = [task] + if data_dir is not None: + task_args += [data_dir] + train_parser = options.get_training_parser() + train_args = options.parse_args_and_arch( + train_parser, + [ + "--task", + *task_args, + "--arch", + arch, + "--optimizer", + "adam", + "--lr", + "0.0001", + "--max-tokens", + "500", + "--tokens-per-sample", + "500", + "--save-dir", + data_dir, + "--max-epoch", + "1", + ] + + (extra_flags or []), + ) + cfg = 
convert_namespace_to_omegaconf(train_args) + return cfg + + +def write_empty_file(path): + with open(path, "w"): + pass + assert os.path.exists(path) + + +class TestValidSubsetsErrors(unittest.TestCase): + """Test various filesystem, clarg combinations and ensure that error raising happens as expected""" + + def _test_case(self, paths, extra_flags): + with tempfile.TemporaryDirectory() as data_dir: + [ + write_empty_file(os.path.join(data_dir, f"{p}.bin")) + for p in paths + ["train"] + ] + cfg = make_lm_config(data_dir, extra_flags=extra_flags) + raise_if_valid_subsets_unintentionally_ignored(cfg) + + def test_default_raises(self): + with self.assertRaises(ValueError): + self._test_case(["valid", "valid1"], []) + with self.assertRaises(ValueError): + self._test_case( + ["valid", "valid1", "valid2"], ["--valid-subset", "valid,valid1"] + ) + + def partially_specified_valid_subsets(self): + with self.assertRaises(ValueError): + self._test_case( + ["valid", "valid1", "valid2"], ["--valid-subset", "valid,valid1"] + ) + # Fix with ignore unused + self._test_case( + ["valid", "valid1", "valid2"], + ["--valid-subset", "valid,valid1", "--ignore-unused-valid-subsets"], + ) + + def test_legal_configs(self): + self._test_case(["valid"], []) + self._test_case(["valid", "valid1"], ["--ignore-unused-valid-subsets"]) + self._test_case(["valid", "valid1"], ["--combine-val"]) + self._test_case(["valid", "valid1"], ["--valid-subset", "valid,valid1"]) + self._test_case(["valid", "valid1"], ["--valid-subset", "valid1"]) + self._test_case( + ["valid", "valid1"], ["--combine-val", "--ignore-unused-valid-subsets"] + ) + self._test_case( + ["valid1"], ["--valid-subset", "valid1"] + ) # valid.bin doesn't need to be ignored. + + def test_disable_validation(self): + self._test_case([], ["--disable-validation"]) + self._test_case(["valid", "valid1"], ["--disable-validation"]) + + def test_dummy_task(self): + cfg = make_lm_config(task="dummy_lm") + raise_if_valid_subsets_unintentionally_ignored(cfg) + + def test_masked_dummy_task(self): + cfg = make_lm_config(task="dummy_masked_lm") + raise_if_valid_subsets_unintentionally_ignored(cfg) + + +class TestCombineValidSubsets(unittest.TestCase): + def _train(self, extra_flags): + with self.assertLogs() as logs: + with tempfile.TemporaryDirectory("test_transformer_lm") as data_dir: + create_dummy_data(data_dir, num_examples=20) + preprocess_lm_data(data_dir) + + shutil.copyfile(f"{data_dir}/valid.bin", f"{data_dir}/valid1.bin") + shutil.copyfile(f"{data_dir}/valid.idx", f"{data_dir}/valid1.idx") + train_language_model( + data_dir, + "transformer_lm", + ["--max-update", "0", "--log-format", "json"] + extra_flags, + run_validation=False, + ) + return [x.message for x in logs.records] + + def test_combined(self): + flags = ["--combine-valid-subsets"] + logs = self._train(flags) + assert any(["valid1" in x for x in logs]) # loaded 100 examples from valid1 + assert not any(["valid1_ppl" in x for x in logs]) # metrics are combined + + def test_subsets(self): + flags = ["--valid-subset", "valid,valid1"] + logs = self._train(flags) + assert any(["valid_ppl" in x for x in logs]) # loaded 100 examples from valid1 + assert any(["valid1_ppl" in x for x in logs]) # metrics are combined diff --git a/SpeechT5/fairseq/tests/utils.py b/SpeechT5/fairseq/tests/utils.py new file mode 100644 index 0000000000000000000000000000000000000000..6e0c709517aea570acb36901dd47bc12a3025b07 --- /dev/null +++ b/SpeechT5/fairseq/tests/utils.py @@ -0,0 +1,717 @@ +# Copyright (c) Facebook, Inc. 
and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +import argparse +import json +import os +import random +import sys +from io import StringIO + +import torch +import torch.nn.functional as F +from fairseq import options, utils +from fairseq.data import Dictionary +from fairseq.data.language_pair_dataset import collate +from fairseq.models import ( + FairseqEncoder, + FairseqEncoderDecoderModel, + FairseqIncrementalDecoder, +) +from fairseq.models.fairseq_encoder import EncoderOut +from fairseq.tasks import LegacyFairseqTask +from fairseq_cli import generate, interactive, preprocess, train, validate +import fairseq.distributed.utils as distributed_utils +from fairseq.dataclass.utils import convert_namespace_to_omegaconf + + +def dummy_dictionary(vocab_size, prefix="token_"): + d = Dictionary() + for i in range(vocab_size): + token = prefix + str(i) + d.add_symbol(token) + d.finalize(padding_factor=1) # don't add extra padding symbols + return d + + +def dummy_dataloader( + samples, padding_idx=1, eos_idx=2, batch_size=None, +): + if batch_size is None: + batch_size = len(samples) + + # add any missing data to samples + for i, sample in enumerate(samples): + if "id" not in sample: + sample["id"] = i + + # create dataloader + dataset = TestDataset(samples) + dataloader = torch.utils.data.DataLoader( + dataset, + batch_size=batch_size, + collate_fn=(lambda samples: collate(samples, padding_idx, eos_idx)), + ) + return iter(dataloader) + + +def sequence_generator_setup(): + # construct dummy dictionary + d = dummy_dictionary(vocab_size=2) + + eos = d.eos() + w1 = 4 + w2 = 5 + + # construct source data + src_tokens = torch.LongTensor([[w1, w2, eos], [w1, w2, eos]]) + src_lengths = torch.LongTensor([2, 2]) + + args = argparse.Namespace() + unk = 0.0 + args.beam_probs = [ + # step 0: + torch.FloatTensor( + [ + # eos w1 w2 + # sentence 1: + [0.0, unk, 0.9, 0.1], # beam 1 + [0.0, unk, 0.9, 0.1], # beam 2 + # sentence 2: + [0.0, unk, 0.7, 0.3], + [0.0, unk, 0.7, 0.3], + ] + ), + # step 1: + torch.FloatTensor( + [ + # eos w1 w2 prefix + # sentence 1: + [1.0, unk, 0.0, 0.0], # w1: 0.9 (emit: w1 <eos>: 0.9*1.0) + [0.0, unk, 0.9, 0.1], # w2: 0.1 + # sentence 2: + [0.25, unk, 0.35, 0.4], # w1: 0.7 (don't emit: w1 <eos>: 0.7*0.25) + [0.00, unk, 0.10, 0.9], # w2: 0.3 + ] + ), + # step 2: + torch.FloatTensor( + [ + # eos w1 w2 prefix + # sentence 1: + [0.0, unk, 0.1, 0.9], # w2 w1: 0.1*0.9 + [ + 0.6, + unk, + 0.2, + 0.2, + ], # w2 w2: 0.1*0.1 (emit: w2 w2 <eos>: 0.1*0.1*0.6) + # sentence 2: + [ + 0.60, + unk, + 0.4, + 0.00, + ], # w1 w2: 0.7*0.4 (emit: w1 w2 <eos>: 0.7*0.4*0.6) + [0.01, unk, 0.0, 0.99], # w2 w2: 0.3*0.9 + ] + ), + # step 3: + torch.FloatTensor( + [ + # eos w1 w2 prefix + # sentence 1: + [ + 1.0, + unk, + 0.0, + 0.0, + ], # w2 w1 w2: 0.1*0.9*0.9 (emit: w2 w1 w2 <eos>: 0.1*0.9*0.9*1.0) + [ + 1.0, + unk, + 0.0, + 0.0, + ], # w2 w1 w1: 0.1*0.9*0.1 (emit: w2 w1 w1 <eos>: 0.1*0.9*0.1*1.0) + # sentence 2: + [ + 0.1, + unk, + 0.5, + 0.4, + ], # w2 w2 w2: 0.3*0.9*0.99 (emit: w2 w2 w2 <eos>: 0.3*0.9*0.99*0.1) + [ + 1.0, + unk, + 0.0, + 0.0, + ], # w1 w2 w1: 0.7*0.4*0.4 (emit: w1 w2 w1 <eos>: 0.7*0.4*0.4*1.0) + ] + ), + ] + + task = TestTranslationTask.setup_task(args, d, d) + model = task.build_model(args) + tgt_dict = task.target_dictionary + + return tgt_dict, w1, w2, src_tokens, src_lengths, model + + +def create_dummy_data(data_dir, num_examples=100, maxlen=20, alignment=False): + def 
_create_dummy_data(filename): + data = torch.rand(num_examples * maxlen) + data = 97 + torch.floor(26 * data).int() + with open(os.path.join(data_dir, filename), "w") as h: + offset = 0 + for _ in range(num_examples): + ex_len = random.randint(1, maxlen) + ex_str = " ".join(map(chr, data[offset : offset + ex_len])) + print(ex_str, file=h) + offset += ex_len + + def _create_dummy_alignment_data(filename_src, filename_tgt, filename): + with open(os.path.join(data_dir, filename_src), "r") as src_f, open( + os.path.join(data_dir, filename_tgt), "r" + ) as tgt_f, open(os.path.join(data_dir, filename), "w") as h: + for src, tgt in zip(src_f, tgt_f): + src_len = len(src.split()) + tgt_len = len(tgt.split()) + avg_len = (src_len + tgt_len) // 2 + num_alignments = random.randint(avg_len // 2, 2 * avg_len) + src_indices = torch.floor(torch.rand(num_alignments) * src_len).int() + tgt_indices = torch.floor(torch.rand(num_alignments) * tgt_len).int() + ex_str = " ".join( + [ + "{}-{}".format(src, tgt) + for src, tgt in zip(src_indices, tgt_indices) + ] + ) + print(ex_str, file=h) + + _create_dummy_data("train.in") + _create_dummy_data("train.out") + _create_dummy_data("valid.in") + _create_dummy_data("valid.out") + _create_dummy_data("test.in") + _create_dummy_data("test.out") + + if alignment: + _create_dummy_alignment_data("train.in", "train.out", "train.align") + _create_dummy_alignment_data("valid.in", "valid.out", "valid.align") + _create_dummy_alignment_data("test.in", "test.out", "test.align") + + +def preprocess_lm_data(data_dir): + preprocess_parser = options.get_preprocessing_parser() + preprocess_args = preprocess_parser.parse_args( + [ + "--only-source", + "--trainpref", + os.path.join(data_dir, "train.out"), + "--validpref", + os.path.join(data_dir, "valid.out"), + "--testpref", + os.path.join(data_dir, "test.out"), + "--destdir", + data_dir, + ] + ) + preprocess.main(preprocess_args) + + +def preprocess_translation_data(data_dir, extra_flags=None): + preprocess_parser = options.get_preprocessing_parser() + preprocess_args = preprocess_parser.parse_args( + [ + "--source-lang", + "in", + "--target-lang", + "out", + "--trainpref", + os.path.join(data_dir, "train"), + "--validpref", + os.path.join(data_dir, "valid"), + "--testpref", + os.path.join(data_dir, "test"), + "--thresholdtgt", + "0", + "--thresholdsrc", + "0", + "--destdir", + data_dir, + ] + + (extra_flags or []), + ) + preprocess.main(preprocess_args) + + +def preprocess_summarization_data(data_dir, extra_flags=None): + preprocess_parser = options.get_preprocessing_parser() + preprocess_args = preprocess_parser.parse_args( + [ + "--source-lang", + "in", + "--target-lang", + "out", + "--trainpref", + os.path.join(data_dir, "train"), + "--validpref", + os.path.join(data_dir, "valid"), + "--testpref", + os.path.join(data_dir, "test"), + "--thresholdtgt", + "0", + "--thresholdsrc", + "0", + "--joined-dictionary", + "--destdir", + data_dir, + ] + + (extra_flags or []), + ) + preprocess.main(preprocess_args) + + +def create_laser_data_and_config_json(data_dir): + src_langs = ["de", "fr", "ru", "tr", "zh"] + tgt_langs = ["en", "es"] + config_json = {} + config_train_json = [] + src_vocab = None + tgt_vocab = None + + for src_lang in src_langs: + for tgt_lang in tgt_langs: + langpair_folder = f"{src_lang}-{tgt_lang}" + + langpair_path = os.path.join(data_dir, langpair_folder) + os.mkdir(langpair_path) + create_dummy_data(langpair_path) + preprocess_translation_data(langpair_path, ["--dataset-impl", "cached"]) + + src_vocab = 
os.path.join(langpair_path, "dict.in.txt") + tgt_vocab = os.path.join(langpair_path, "dict.out.txt") + config_train_json.append( + { + "id": 0 if tgt_lang == "en" else 1, + "src": os.path.join(langpair_path, "train.in-out.in"), + "tgt": os.path.join(langpair_path, "train.in-out.out"), + } + ) + + config_json["src_vocab"] = src_vocab + config_json["tgt_vocab"] = tgt_vocab + config_json["train"] = config_train_json + + with open(os.path.join(data_dir, "laserconfig.json"), "w") as config_file: + json.dump(config_json, config_file) + + return config_file + + +def train_translation_model( + data_dir, + arch, + extra_flags=None, + task="translation", + run_validation=False, + lang_flags=None, + extra_valid_flags=None, + world_size=1, +): + if lang_flags is None: + lang_flags = [ + "--source-lang", + "in", + "--target-lang", + "out", + ] + train_parser = options.get_training_parser() + train_args = options.parse_args_and_arch( + train_parser, + [ + "--task", + task, + data_dir, + "--save-dir", + data_dir, + "--arch", + arch, + "--optimizer", + "nag", + "--lr", + "0.05", + "--max-tokens", + "500", + "--max-epoch", + "1", + "--no-progress-bar", + "--distributed-world-size", + str(world_size), + "--num-workers", + "0", + ] + + lang_flags + + (extra_flags or []), + ) + + cfg = convert_namespace_to_omegaconf(train_args) + distributed_utils.call_main(cfg, train.main) + + if run_validation: + # test validation + validate_parser = options.get_validation_parser() + validate_args = options.parse_args_and_arch( + validate_parser, + [ + "--task", + task, + data_dir, + "--path", + os.path.join(data_dir, "checkpoint_last.pt"), + "--valid-subset", + "valid", + "--max-tokens", + "500", + "--no-progress-bar", + "--num-workers", + "0", + ] + + lang_flags + + (extra_valid_flags or []), + ) + validate.main(validate_args) + + +def generate_main(data_dir, extra_flags=None, path=None): + if extra_flags is None: + extra_flags = [ + "--print-alignment", + ] + if path is None: + path = os.path.join(data_dir, "checkpoint_last.pt") + generate_parser = options.get_generation_parser() + generate_args = options.parse_args_and_arch( + generate_parser, + [ + data_dir, + "--path", + path, + "--beam", + "3", + "--batch-size", + "64", + "--max-len-b", + "5", + "--gen-subset", + "valid", + "--no-progress-bar", + "--num-workers", + "0", + ] + + (extra_flags or []), + ) + + # evaluate model in batch mode + generate.main(generate_args) + + # evaluate model interactively + generate_args.buffer_size = 0 + generate_args.input = "-" + generate_args.batch_size = None + orig_stdin = sys.stdin + sys.stdin = StringIO("h e l l o\n") + interactive.main(generate_args) + sys.stdin = orig_stdin + + +class TestDataset(torch.utils.data.Dataset): + def __init__(self, data): + super().__init__() + self.data = data + self.sizes = None + + def __getitem__(self, index): + return self.data[index] + + def __len__(self): + return len(self.data) + + +class TestTranslationTask(LegacyFairseqTask): + def __init__(self, args, src_dict, tgt_dict, model): + super().__init__(args) + self.src_dict = src_dict + self.tgt_dict = tgt_dict + self.model = model + + @classmethod + def setup_task(cls, args, src_dict=None, tgt_dict=None, model=None): + return cls(args, src_dict, tgt_dict, model) + + def build_model(self, args): + return TestModel.build_model(args, self) + + @property + def source_dictionary(self): + return self.src_dict + + @property + def target_dictionary(self): + return self.tgt_dict + + +class TestModel(FairseqEncoderDecoderModel): + def __init__(self, 
encoder, decoder): + super().__init__(encoder, decoder) + + @classmethod + def build_model(cls, args, task): + encoder = TestEncoder(args, task.source_dictionary) + decoder = TestIncrementalDecoder(args, task.target_dictionary) + return cls(encoder, decoder) + + +class TestEncoder(FairseqEncoder): + def __init__(self, args, dictionary): + super().__init__(dictionary) + self.args = args + + def forward(self, src_tokens, src_lengths=None, **kwargs): + return EncoderOut( + encoder_out=src_tokens, + encoder_padding_mask=None, + encoder_embedding=None, + encoder_states=None, + src_tokens=None, + src_lengths=None, + ) + + def reorder_encoder_out(self, encoder_out, new_order): + return EncoderOut( + encoder_out=encoder_out.encoder_out.index_select(0, new_order), + encoder_padding_mask=None, + encoder_embedding=None, + encoder_states=None, + src_tokens=None, + src_lengths=None, + ) + + +class TestIncrementalDecoder(FairseqIncrementalDecoder): + def __init__(self, args, dictionary): + super().__init__(dictionary) + assert hasattr(args, "beam_probs") or hasattr(args, "probs") + args.max_decoder_positions = getattr(args, "max_decoder_positions", 100) + self.args = args + + def forward(self, prev_output_tokens, encoder_out=None, incremental_state=None): + if incremental_state is not None: + prev_output_tokens = prev_output_tokens[:, -1:] + bbsz = prev_output_tokens.size(0) + vocab = len(self.dictionary) + src_len = encoder_out.encoder_out.size(1) + tgt_len = prev_output_tokens.size(1) + + # determine number of steps + if incremental_state is not None: + # cache step number + step = utils.get_incremental_state(self, incremental_state, "step") + if step is None: + step = 0 + utils.set_incremental_state(self, incremental_state, "step", step + 1) + steps = [step] + else: + steps = list(range(tgt_len)) + + # define output in terms of raw probs + if hasattr(self.args, "probs"): + assert ( + self.args.probs.dim() == 3 + ), "expected probs to have size bsz*steps*vocab" + probs = self.args.probs.index_select(1, torch.LongTensor(steps)) + else: + probs = torch.FloatTensor(bbsz, len(steps), vocab).zero_() + for i, step in enumerate(steps): + # args.beam_probs gives the probability for every vocab element, + # starting with eos, then unknown, and then the rest of the vocab + if step < len(self.args.beam_probs): + probs[:, i, self.dictionary.eos() :] = self.args.beam_probs[step] + else: + probs[:, i, self.dictionary.eos()] = 1.0 + + # random attention + attn = torch.rand(bbsz, tgt_len, src_len) + + dev = prev_output_tokens.device + return probs.to(dev), {"attn": [attn.to(dev)]} + + def get_normalized_probs(self, net_output, log_probs, _): + # the decoder returns probabilities directly + probs = net_output[0] + if log_probs: + return probs.log() + else: + return probs + + def max_positions(self): + return self.args.max_decoder_positions + + +class TestReshapingEncoder(FairseqEncoder): + def __init__(self, args, dictionary): + super().__init__(dictionary) + self.args = args + + def forward(self, src_tokens, src_lengths=None, **kwargs): + b_sz, t_sz = src_tokens.shape + padding_needed = t_sz % 2 + x = src_tokens + if padding_needed > 0: + padding_needed = 2 - padding_needed + x = F.pad(x, (0, padding_needed)) + + return EncoderOut( + encoder_out=x.view(b_sz, -1, 2), + encoder_padding_mask=None, + encoder_embedding=None, + encoder_states=None, + src_tokens=None, + src_lengths=None, + ) + + def reorder_encoder_out(self, encoder_out, new_order): + return EncoderOut( + encoder_out=encoder_out.encoder_out.index_select(0, 
new_order), + encoder_padding_mask=None, + encoder_embedding=None, + encoder_states=None, + src_tokens=None, + src_lengths=None, + ) + + +class TestReshapingModel(FairseqEncoderDecoderModel): + def __init__(self, encoder, decoder): + super().__init__(encoder, decoder) + + @classmethod + def build_model(cls, args, task): + encoder = TestReshapingEncoder(args, task.source_dictionary) + decoder = TestIncrementalDecoder(args, task.target_dictionary) + return cls(encoder, decoder) + + +class TestAdditionalInputEncoder(FairseqEncoder): + def __init__(self, args, dictionary): + super().__init__(dictionary) + self.args = args + + def forward(self, src_tokens, src_lengths=None, **kwargs): + assert "fancy_other_input" in kwargs + assert kwargs["fancy_other_input"] is not None + return EncoderOut( + encoder_out=src_tokens, + encoder_padding_mask=None, + encoder_embedding=None, + encoder_states=None, + src_tokens=None, + src_lengths=None, + ) + + def reorder_encoder_out(self, encoder_out, new_order): + return EncoderOut( + encoder_out=encoder_out.encoder_out.index_select(0, new_order), + encoder_padding_mask=None, + encoder_embedding=None, + encoder_states=None, + src_tokens=None, + src_lengths=None, + ) + + +class TestAdditionalInputModel(FairseqEncoderDecoderModel): + def __init__(self, encoder, decoder): + super().__init__(encoder, decoder) + + @classmethod + def build_model(cls, args, task): + encoder = TestAdditionalInputEncoder(args, task.source_dictionary) + decoder = TestIncrementalDecoder(args, task.target_dictionary) + return cls(encoder, decoder) + + def forward(self, src_tokens, src_lengths, prev_output_tokens, **kwargs): + encoder_out = self.encoder(src_tokens, src_lengths=src_lengths, **kwargs) + decoder_out = self.decoder( + prev_output_tokens, encoder_out=encoder_out, **kwargs + ) + return decoder_out + + +def train_language_model( + data_dir, + arch, + extra_flags=None, + run_validation=False, + extra_valid_flags=None, + task="language_modeling", + world_size=1, +): + train_parser = options.get_training_parser() + train_args = options.parse_args_and_arch( + train_parser, + [ + "--task", + task, + data_dir, + "--arch", + arch, + "--optimizer", + "adam", + "--lr", + "0.0001", + "--max-tokens", + "500", + "--tokens-per-sample", + "500", + "--save-dir", + data_dir, + "--max-epoch", + "1", + "--no-progress-bar", + "--distributed-world-size", + str(world_size), + "--ddp-backend", + "no_c10d", + "--num-workers", + "0", + ] + + (extra_flags or []), + ) + cfg = convert_namespace_to_omegaconf(train_args) + distributed_utils.call_main(cfg, train.main) + + if run_validation: + # test validation + validate_parser = options.get_validation_parser() + validate_args = options.parse_args_and_arch( + validate_parser, + [ + "--task", + task, + data_dir, + "--path", + os.path.join(data_dir, "checkpoint_last.pt"), + "--valid-subset", + "valid", + "--max-tokens", + "500", + "--no-progress-bar", + "--num-workers", + "0", + ] + + (extra_valid_flags or []), + ) + validate.main(validate_args) diff --git a/SpeechT5/fairseq/train.py b/SpeechT5/fairseq/train.py new file mode 100644 index 0000000000000000000000000000000000000000..321de3d9b53f8194b58c26f5cb2c03281afc2bb1 --- /dev/null +++ b/SpeechT5/fairseq/train.py @@ -0,0 +1,14 @@ +#!/usr/bin/env python3 -u +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. +""" +Legacy entry point. Use fairseq_cli/train.py or fairseq-train instead. 
+""" + +from fairseq_cli.train import cli_main + + +if __name__ == "__main__": + cli_main() diff --git a/SpeechT5/results/ablation_study.png b/SpeechT5/results/ablation_study.png new file mode 100644 index 0000000000000000000000000000000000000000..9e3fb62a4a0639e97960ab6d68d48a4ee18c734a Binary files /dev/null and b/SpeechT5/results/ablation_study.png differ diff --git a/SpeechT5/results/asr.png b/SpeechT5/results/asr.png new file mode 100644 index 0000000000000000000000000000000000000000..b0250a4dea3415e2bd1cb15a21177c689bb56c48 Binary files /dev/null and b/SpeechT5/results/asr.png differ diff --git a/SpeechT5/results/se.png b/SpeechT5/results/se.png new file mode 100644 index 0000000000000000000000000000000000000000..3c0f55baa190534e8c72fabee1f6911475758a3f Binary files /dev/null and b/SpeechT5/results/se.png differ diff --git a/SpeechT5/results/sid.png b/SpeechT5/results/sid.png new file mode 100644 index 0000000000000000000000000000000000000000..d7e464af982f9ccd4affa58f8db37ba4dec4cf83 Binary files /dev/null and b/SpeechT5/results/sid.png differ diff --git a/SpeechT5/results/st.png b/SpeechT5/results/st.png new file mode 100644 index 0000000000000000000000000000000000000000..711add5776a52b7691db5ac14fabb4453660c07e Binary files /dev/null and b/SpeechT5/results/st.png differ diff --git a/SpeechT5/results/tts.png b/SpeechT5/results/tts.png new file mode 100644 index 0000000000000000000000000000000000000000..6d06326844de2c57437ad5a27467f68fb5d28c12 Binary files /dev/null and b/SpeechT5/results/tts.png differ diff --git a/SpeechT5/results/vc.png b/SpeechT5/results/vc.png new file mode 100644 index 0000000000000000000000000000000000000000..ce9753efe328ef00c7239533f39865680d2d2f34 Binary files /dev/null and b/SpeechT5/results/vc.png differ diff --git a/SpeechT5/scripts/generate_class.py b/SpeechT5/scripts/generate_class.py new file mode 100644 index 0000000000000000000000000000000000000000..b9656d214513a4228836883771442bec539f1a20 --- /dev/null +++ b/SpeechT5/scripts/generate_class.py @@ -0,0 +1,153 @@ +import ast +import logging +import os +import sys +from argparse import Namespace + +import numpy as np +import torch +from fairseq import checkpoint_utils, options, tasks, utils +from fairseq.fairseq.dataclass.utils import convert_namespace_to_omegaconf +from fairseq.fairseq.logging import progress_bar +from omegaconf import DictConfig + + +def main(cfg: DictConfig): + if isinstance(cfg, Namespace): + cfg = convert_namespace_to_omegaconf(cfg) + + assert cfg.common_eval.path is not None, "--path required for generation!" 
+ assert ( + cfg.generation.replace_unk is None or cfg.dataset.dataset_impl == "raw" + ), "--replace-unk requires a raw text dataset (--dataset-impl=raw)" + + if cfg.common_eval.results_path is not None: + os.makedirs(cfg.common_eval.results_path, exist_ok=True) + + return _main(cfg, sys.stdout) + + +def _main(cfg: DictConfig, output_file): + logging.basicConfig( + format="%(asctime)s | %(levelname)s | %(name)s | %(message)s", + datefmt="%Y-%m-%d %H:%M:%S", + level=os.environ.get("LOGLEVEL", "INFO").upper(), + stream=output_file, + ) + logger = logging.getLogger("speecht5.generate_class") + + utils.import_user_module(cfg.common) + + assert cfg.dataset.batch_size == 1, "only support batch size 1" + logger.info(cfg) + + # Fix seed for stochastic decoding + if cfg.common.seed is not None and not cfg.generation.no_seed_provided: + np.random.seed(cfg.common.seed) + utils.set_torch_seed(cfg.common.seed) + + use_cuda = torch.cuda.is_available() and not cfg.common.cpu + if not use_cuda: + logger.info("generate speech on cpu") + + # build task + task = tasks.setup_task(cfg.task) + + # Load ensemble + logger.info("loading model(s) from {}".format(cfg.common_eval.path)) + overrides = ast.literal_eval(cfg.common_eval.model_overrides) + models, saved_cfg = checkpoint_utils.load_model_ensemble( + utils.split_paths(cfg.common_eval.path), + arg_overrides=overrides, + task=task, + suffix=cfg.checkpoint.checkpoint_suffix, + strict=(cfg.checkpoint.checkpoint_shard_count == 1), + num_shards=cfg.checkpoint.checkpoint_shard_count, + ) + logger.info(saved_cfg) + + # loading the dataset should happen after the checkpoint has been loaded so we can give it the saved task config + task.load_dataset(cfg.dataset.gen_subset, task_cfg=saved_cfg.task) + + # optimize ensemble for generation + for model in models: + if model is None: + continue + if cfg.common.fp16: + model.half() + if use_cuda and not cfg.distributed_training.pipeline_model_parallel: + model.cuda() + model.prepare_for_inference_(cfg) + + # load dataset (possibly sharded) + itr = task.get_batch_iterator( + dataset=task.dataset(cfg.dataset.gen_subset), + max_tokens=cfg.dataset.max_tokens, + max_sentences=cfg.dataset.batch_size, + max_positions=None, + ignore_invalid_inputs=cfg.dataset.skip_invalid_size_inputs_valid_test, + required_batch_size_multiple=cfg.dataset.required_batch_size_multiple, + seed=cfg.common.seed, + num_shards=cfg.distributed_training.distributed_world_size, + shard_id=cfg.distributed_training.distributed_rank, + num_workers=cfg.dataset.num_workers, + data_buffer_size=cfg.dataset.data_buffer_size, + ).next_epoch_itr(shuffle=False) + progress = progress_bar.progress_bar( + itr, + log_format=cfg.common.log_format, + log_interval=cfg.common.log_interval, + default_log_format=("tqdm" if not cfg.common.no_progress_bar else "simple"), + ) + + n_correct = 0 + n_total = 0 + assert hasattr(task.dataset(cfg.dataset.gen_subset), "tgt_dict") + dict_class = task.dataset(cfg.dataset.gen_subset).tgt_dict + for i, sample in enumerate(progress): + if "net_input" not in sample or "source" not in sample["net_input"]: + continue + sample = utils.move_to_cuda(sample) if use_cuda else sample + prefix_tokens = utils.move_to_cuda( + torch.LongTensor([[dict_class.eos()] for _ in range(len(sample["net_input"]["source"]))]) + ) + + outs = task.generate_class( + models, + sample["net_input"], + prefix_tokens, + ) + prediction = outs.detach().cpu().tolist() + categories = [dict_class[predi] for predi in prediction] + + if "target" in sample: + target = 
sample["target"].squeeze(1).detach().cpu().tolist() + labels = [dict_class[tgti] for tgti in target] + + n_total += len(categories) + if "target" in sample: + r_correct = [] + for ci, li in zip(categories, labels): + if ci == li: + n_correct += 1 + r_correct.append(True) + else: + r_correct.append(False) + + logger.info( + f"{i} (size: {sample['net_input']['source'].shape}) -> {prediction} ({categories}) " + + f"<- target: {target} ({labels})\t{r_correct}" if "target" in sample else "" + ) + logger.info( + f"Accuracy on {cfg.dataset.gen_subset}: {n_correct*100.0/n_total:.3f} ({n_correct}/{n_total})" + ) + + +def cli_main(): + parser = options.get_generation_parser() + args = options.parse_args_and_arch(parser) + main(args) + + +if __name__ == "__main__": + cli_main() diff --git a/SpeechT5/scripts/generate_speech.py b/SpeechT5/scripts/generate_speech.py new file mode 100644 index 0000000000000000000000000000000000000000..deed3e4a552660909e1b4088ab0c90e66206fea4 --- /dev/null +++ b/SpeechT5/scripts/generate_speech.py @@ -0,0 +1,199 @@ +import ast +import logging +import os +import os.path as op +import sys +from argparse import Namespace + +import numpy as np +import torch +from fairseq import checkpoint_utils, options, tasks, utils +from fairseq.dataclass.utils import convert_namespace_to_omegaconf +from fairseq.logging import progress_bar +from omegaconf import DictConfig + + +# define function for plot prob and att_ws +def _plot_and_save(array, figname, figsize=(6, 4), dpi=150): + import matplotlib.pyplot as plt + + shape = array.shape + if len(shape) == 1: + # for eos probability + plt.figure(figsize=figsize, dpi=dpi) + plt.plot(array) + plt.xlabel("Frame") + plt.ylabel("Probability") + plt.ylim([0, 1]) + elif len(shape) == 2: + # for tacotron 2 attention weights, whose shape is (out_length, in_length) + plt.figure(figsize=figsize, dpi=dpi) + plt.imshow(array, aspect="auto") + elif len(shape) == 4: + # for transformer attention weights, + # whose shape is (#leyers, #heads, out_length, in_length) + plt.figure(figsize=(figsize[0] * shape[0], figsize[1] * shape[1]), dpi=dpi) + for idx1, xs in enumerate(array): + for idx2, x in enumerate(xs, 1): + plt.subplot(shape[0], shape[1], idx1 * shape[1] + idx2) + plt.imshow(x, aspect="auto") + plt.xlabel("Input") + plt.ylabel("Output") + else: + raise NotImplementedError("Support only from 1D to 4D array.") + plt.tight_layout() + if not op.exists(op.dirname(figname)): + # NOTE: exist_ok = True is needed for parallel process decoding + os.makedirs(op.dirname(figname), exist_ok=True) + plt.savefig(figname) + plt.close() + + +# define function to calculate focus rate +# (see section 3.3 in https://arxiv.org/abs/1905.09263) +def _calculate_focus_rete(att_ws): + if att_ws is None: + # fastspeech case -> None + return 1.0 + elif len(att_ws.shape) == 2: + # tacotron 2 case -> (L, T) + return float(att_ws.max(dim=-1)[0].mean()) + elif len(att_ws.shape) == 4: + # transformer case -> (#layers, #heads, L, T) + return float(att_ws.max(dim=-1)[0].mean(dim=-1).max()) + else: + raise ValueError("att_ws should be 2 or 4 dimensional tensor.") + + +def main(cfg: DictConfig): + if isinstance(cfg, Namespace): + cfg = convert_namespace_to_omegaconf(cfg) + + assert cfg.common_eval.path is not None, "--path required for generation!" 
+ assert ( + cfg.generation.replace_unk is None or cfg.dataset.dataset_impl == "raw" + ), "--replace-unk requires a raw text dataset (--dataset-impl=raw)" + + if cfg.common_eval.results_path is not None: + os.makedirs(cfg.common_eval.results_path, exist_ok=True) + + return _main(cfg, sys.stdout) + + +def _main(cfg: DictConfig, output_file): + logging.basicConfig( + format="%(asctime)s | %(levelname)s | %(name)s | %(message)s", + datefmt="%Y-%m-%d %H:%M:%S", + level=os.environ.get("LOGLEVEL", "INFO").upper(), + stream=output_file, + ) + logger = logging.getLogger("speecht5.generate_speech") + + utils.import_user_module(cfg.common) + + assert cfg.dataset.batch_size == 1, "only support batch size 1" + logger.info(cfg) + + # Fix seed for stochastic decoding + if cfg.common.seed is not None and not cfg.generation.no_seed_provided: + np.random.seed(cfg.common.seed) + utils.set_torch_seed(cfg.common.seed) + + use_cuda = torch.cuda.is_available() and not cfg.common.cpu + if not use_cuda: + logger.info("generate speech on cpu") + + # build task + task = tasks.setup_task(cfg.task) + + # Load ensemble + logger.info("loading model(s) from {}".format(cfg.common_eval.path)) + overrides = ast.literal_eval(cfg.common_eval.model_overrides) + models, saved_cfg = checkpoint_utils.load_model_ensemble( + utils.split_paths(cfg.common_eval.path), + arg_overrides=overrides, + task=task, + suffix=cfg.checkpoint.checkpoint_suffix, + strict=(cfg.checkpoint.checkpoint_shard_count == 1), + num_shards=cfg.checkpoint.checkpoint_shard_count, + ) + logger.info(saved_cfg) + + # loading the dataset should happen after the checkpoint has been loaded so we can give it the saved task config + task.load_dataset(cfg.dataset.gen_subset, task_cfg=saved_cfg.task) + + # optimize ensemble for generation + for model in models: + if model is None: + continue + if cfg.common.fp16: + model.half() + if use_cuda and not cfg.distributed_training.pipeline_model_parallel: + model.cuda() + model.prepare_for_inference_(cfg) + + # load dataset (possibly sharded) + itr = task.get_batch_iterator( + dataset=task.dataset(cfg.dataset.gen_subset), + max_tokens=cfg.dataset.max_tokens, + max_sentences=cfg.dataset.batch_size, + max_positions=None, + ignore_invalid_inputs=cfg.dataset.skip_invalid_size_inputs_valid_test, + required_batch_size_multiple=cfg.dataset.required_batch_size_multiple, + seed=cfg.common.seed, + num_shards=cfg.distributed_training.distributed_world_size, + shard_id=cfg.distributed_training.distributed_rank, + num_workers=cfg.dataset.num_workers, + data_buffer_size=cfg.dataset.data_buffer_size, + ).next_epoch_itr(shuffle=False) + progress = progress_bar.progress_bar( + itr, + log_format=cfg.common.log_format, + log_interval=cfg.common.log_interval, + default_log_format=("tqdm" if not cfg.common.no_progress_bar else "simple"), + ) + + for i, sample in enumerate(progress): + if "net_input" not in sample: + continue + sample = utils.move_to_cuda(sample) if use_cuda else sample + outs, _, attn = task.generate_speech( + models, + sample["net_input"], + ) + focus_rate = _calculate_focus_rete(attn) + outs = outs.cpu().numpy() + audio_name = op.basename(sample['name'][0]) + np.save(op.join(cfg.common_eval.results_path, audio_name.replace(".wav", "-feats.npy")), outs) + + logging.info( + "{} (size: {}->{} ({}), focus rate: {:.3f})".format( + sample['name'][0], + sample['src_lengths'][0].item(), + outs.shape[0], + sample['dec_target_lengths'][0].item(), + focus_rate + ) + ) + + if i < 6 and attn is not None: + import shutil + demo_dir = 
op.join(op.dirname(cfg.common_eval.results_path), "demo") + audio_dir = op.join(demo_dir, "audio") + os.makedirs(audio_dir, exist_ok=True) + shutil.copy(op.join(task.dataset(cfg.dataset.gen_subset).audio_root, sample['tgt_name'][0] if "tgt_name" in sample else sample['name'][0]), op.join(audio_dir, audio_name)) + att_dir = op.join(demo_dir, "att_ws") + _plot_and_save(attn.cpu().numpy(), op.join(att_dir, f"{audio_name}_att_ws.png")) + spec_dir = op.join(demo_dir, "spec") + _plot_and_save(outs.T, op.join(spec_dir, f"{audio_name}_gen.png")) + _plot_and_save(sample["target"][0].cpu().numpy().T, op.join(spec_dir, f"{audio_name}_ori.png")) + + +def cli_main(): + parser = options.get_generation_parser() + args = options.parse_args_and_arch(parser) + main(args) + + +if __name__ == "__main__": + cli_main() diff --git a/SpeechT5/speecht5/__init__.py b/SpeechT5/speecht5/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..8994f9a368ae4b2eff720fffb134e2a5b813ee1c --- /dev/null +++ b/SpeechT5/speecht5/__init__.py @@ -0,0 +1 @@ +from . import data, tasks, criterions, models # noqa \ No newline at end of file diff --git a/SpeechT5/speecht5/criterions/__init__.py b/SpeechT5/speecht5/criterions/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..f4aa07ca8bf24092095fc9303ee4a1d80cec7b35 --- /dev/null +++ b/SpeechT5/speecht5/criterions/__init__.py @@ -0,0 +1,10 @@ +import importlib +import os + + +for file in os.listdir(os.path.dirname(__file__)): + if file.endswith(".py") and not file.startswith("_"): + criterion_name = file[: file.find(".py")] + importlib.import_module( + "speecht5.criterions." + criterion_name + ) \ No newline at end of file diff --git a/SpeechT5/speecht5/criterions/speech_pretrain_criterion.py b/SpeechT5/speecht5/criterions/speech_pretrain_criterion.py new file mode 100644 index 0000000000000000000000000000000000000000..7377d968e2ac316eed140e1c4f72b9cab621f582 --- /dev/null +++ b/SpeechT5/speecht5/criterions/speech_pretrain_criterion.py @@ -0,0 +1,268 @@ +# -------------------------------------------------------- +# SpeechT5: Unified-Modal Encoder-Decoder Pre-Training for Spoken Language Processing (https://arxiv.org/abs/2110.07205) +# Github source: https://github.com/microsoft/SpeechT5/tree/main/SpeechT5 +# Copyright (c) 2021 Microsoft +# Licensed under The MIT License [see LICENSE for details] +# Based on fairseq and espnet code bases +# https://github.com/pytorch/fairseq; https://github.com/espnet/espnet +# -------------------------------------------------------- + +import math +import re +from dataclasses import dataclass, field +from typing import List, Optional + +import torch +import torch.nn.functional as F +from fairseq import utils +from fairseq.logging import metrics +from fairseq.criterions import FairseqCriterion +from speecht5.criterions.text_to_speech_loss import TexttoSpeechLoss, TexttoSpeechLossConfig + + +@dataclass +class SpeechPretrainCriterionConfig(TexttoSpeechLossConfig): + pred_masked_weight: float = field( + default=1.0, + metadata={"help": "weight for predictive loss for masked frames"}, + ) + pred_nomask_weight: float = field( + default=0.0, + metadata={"help": "weight for predictive loss for unmasked frames"}, + ) + loss_weights: Optional[List[float]] = field( + default_factory=lambda: [10,], + metadata={"help": "weights for additional loss terms (not first one)"}, + ) + log_keys: List[str] = field( + default_factory=lambda: [], + metadata={"help": "output keys to log"}, + ) + hubert_weight: float = 
field( + default=1.0, + metadata={"help": "weight of hubert loss"}, + ) + dec_weight: float = field( + default=1.0, + metadata={"help": "weight of decoder loss"}, + ) + + +class SpeechPretrainCriterion(FairseqCriterion): + def __init__( + self, + task, + sentence_avg, + pred_masked_weight, + pred_nomask_weight, + loss_weights=None, + log_keys=None, + use_masking=True, + use_weighted_masking=False, + loss_type="L1", + bce_pos_weight=5.0, + hubert_weight=1.0, + dec_weight=1.0, + ): + super().__init__(task) + self.pred_masked_weight = pred_masked_weight + self.pred_nomask_weight = pred_nomask_weight + self.loss_weights = loss_weights + self.log_keys = [] if log_keys is None else log_keys + self.hubert_weight = hubert_weight + self.dec_weight = dec_weight + + self.speech_criterion = TexttoSpeechLoss( + task, + sentence_avg, + use_masking, + use_weighted_masking, + loss_type, + bce_pos_weight, + ) + + def forward(self, model, sample, reduce=True, log_pred=False): + """Compute the loss for the given sample. + Returns a tuple with three elements: + 1) the loss + 2) the sample size, which is used as the denominator for the gradient + 3) logging outputs to display while training + """ + if self.dec_weight == 0: + sample["net_input"]["only_hubert"] = True + net_output, net_output_dec = model(target_list=sample["target_list"], **sample["net_input"]) + loss = 0. + sample_size = 0 + logging_output = {} + reduction = "sum" if reduce else "none" + + loss_m_list = [] + logp_m_list = model.get_logits(net_output, True) + targ_m_list = model.get_targets(None, net_output, True) + assert self.pred_masked_weight == 0 or len(logp_m_list) > 0 + for i, (logp_m, targ_m) in enumerate(zip(logp_m_list, targ_m_list)): + loss_m = F.cross_entropy(logp_m, targ_m, reduction=reduction) + loss_m_list.append(loss_m) + logging_output[f"loss_m_{i}"] = loss_m.detach().item() + if self.pred_masked_weight > 0: + loss += self.pred_masked_weight * sum(loss_m_list) + sample_size += targ_m_list[0].numel() + + loss_u_list = [] + logp_u_list = model.get_logits(net_output, False) + targ_u_list = model.get_targets(None, net_output, False) + assert self.pred_nomask_weight == 0 or len(logp_u_list) > 0 + for i, (logp_u, targ_u) in enumerate(zip(logp_u_list, targ_u_list)): + loss_u = F.cross_entropy(logp_u, targ_u, reduction=reduction) + loss_u_list.append(loss_u) + logging_output[f"loss_u_{i}"] = loss_u.detach().item() + if self.pred_nomask_weight > 0: + loss += self.pred_nomask_weight * sum(loss_u_list) + sample_size += targ_u_list[0].numel() + + if self.loss_weights is not None: + assert hasattr(model, "get_extra_losses") + extra_losses, names = model.get_extra_losses(net_output) + if torch.is_tensor(extra_losses): + extra_losses = [extra_losses] + names = [names] + if len(self.loss_weights) == 1 and len(extra_losses) != 1: + self.loss_weights = [self.loss_weights[0]] * len(extra_losses) + if len(self.loss_weights) > len(extra_losses): + modified_loss_weight = self.loss_weights[:len(extra_losses)] + else: + modified_loss_weight = self.loss_weights + + # assert len(extra_losses) == len(self.loss_weights), f"{len(extra_losses)}, {len(self.loss_weights)}" + for p, n, coef in zip(extra_losses, names, modified_loss_weight): + # print(n + str(coef)) + if coef != 0 and p is not None: + p = coef * p.float() * sample_size + loss += p + logging_output[f"loss_{n}"] = p.detach().item() + + logging_output = { + "ntokens": sample_size, + "nsentences": sample["id"].numel(), + "sample_size": sample_size, + "ngpu": 1, + **logging_output, + } + + if 
'loss_prob_perplexity' in logging_output: + logging_output['code_perplexity'] = net_output['code_perplexity'].detach().item() + + for lk in self.log_keys: + if lk in net_output: + logging_output[lk] = float((net_output[lk].item())) + + def compute_correct(logits): + if logits.numel() == 0: + return 0, 0 + else: + assert logits.dim() > 1, logits.shape + max = logits.argmax(-1) == 0 + min = logits.argmin(-1) == 0 + both = max & min + corr = max.long().sum().item() - both.long().sum().item() + count = max.numel() + return corr, count + + with torch.no_grad(): + for i, logp_m in enumerate(logp_m_list): + corr_m, count_m = compute_correct(logp_m) + logging_output[f"correct_m_{i}"] = corr_m + logging_output[f"count_m_{i}"] = count_m + + for i, logp_u in enumerate(logp_u_list): + corr_u, count_u = compute_correct(logp_u) + logging_output[f"correct_u_{i}"] = corr_u + logging_output[f"count_u_{i}"] = count_u + + if self.dec_weight == 0.0: + logging_output["loss"] = loss.item() if reduce else loss + return loss, sample_size, logging_output + +# ## dec loss + dec_loss, l1_loss, l2_loss, bce_loss, enc_dec_attn_loss = self.speech_criterion.compute_loss(model, net_output_dec, sample) + + # Log tts loss + logging_output['dec_loss'] = dec_loss.item() + logging_output['l1_loss'] = l1_loss.item() + logging_output['l2_loss'] = l2_loss.item() + logging_output['bce_loss'] = bce_loss.item() + if enc_dec_attn_loss is not None: + logging_output['enc_dec_attn_loss'] = enc_dec_attn_loss.item() + + loss = self.hubert_weight * loss + self.dec_weight * sample_size * dec_loss + logging_output["loss"] = loss.item() if reduce else loss + return loss, sample_size, logging_output + + @staticmethod + def reduce_metrics(logging_outputs) -> None: + """Aggregate logging outputs from data parallel training (copied from normal cross entropy).""" + loss_sum = sum(log.get("loss", 0) for log in logging_outputs) + ntokens = sum(log.get("ntokens", 0) for log in logging_outputs) + sample_size = sum(log.get("sample_size", 0) for log in logging_outputs) + dec_loss_sum = sum(log.get("dec_loss", 0) for log in logging_outputs) + l1_loss_sum = sum(log.get("l1_loss", 0) for log in logging_outputs) + l2_loss_sum = sum(log.get("l2_loss", 0) for log in logging_outputs) + bce_loss_sum = sum(log.get("bce_loss", 0) for log in logging_outputs) + ngpu = sum(log.get("ngpu", 0) for log in logging_outputs) + + metrics.log_scalar("loss", loss_sum / sample_size / math.log(2), sample_size, round=3) + if sample_size != ntokens: + metrics.log_scalar("nll_loss", loss_sum / ntokens / math.log(2), ntokens, round=3) + metrics.log_derived("ppl", lambda meters: utils.get_perplexity(meters["nll_loss"].avg)) + else: + metrics.log_derived("ppl", lambda meters: utils.get_perplexity(meters["loss"].avg)) + + counts = {} + for lk in logging_outputs[0].keys(): + if lk.startswith("count_"): + val = sum(log[lk] for log in logging_outputs) + metrics.log_scalar(lk, val) + counts[lk] = val + + for lk in logging_outputs[0].keys(): + if lk.startswith("loss_"): + val = sum(log[lk] for log in logging_outputs) + metrics.log_scalar(lk, val / sample_size / math.log(2), round=3) + elif lk.startswith("correct_"): + val = sum(log[lk] for log in logging_outputs) + metrics.log_scalar(lk, val / counts[re.sub("correct", "count", lk)]) + elif lk == 'code_perplexity': + val = sum(log[lk] for log in logging_outputs) + metrics.log_scalar(lk, val / len(logging_outputs), round=3) + + metrics.log_scalar( + "dec_loss", dec_loss_sum / ngpu, sample_size, 2, round=5 + ) + metrics.log_scalar( + 
"l1_loss", l1_loss_sum / ngpu, sample_size, 2, round=5 + ) + metrics.log_scalar( + "l2_loss", l2_loss_sum / ngpu, sample_size, 2, round=5 + ) + metrics.log_scalar( + "bce_loss", bce_loss_sum / ngpu, sample_size, 2, round=5 + ) + if "enc_dec_attn_loss" in logging_outputs[0]: + enc_dec_attn_loss_sum = sum(log.get("enc_dec_attn_loss", 0) for log in logging_outputs) + metrics.log_scalar( + "enc_dec_attn_loss", enc_dec_attn_loss_sum / ngpu, sample_size, round=8 + ) + + @staticmethod + def aggregate_logging_outputs(logging_outputs): + """Aggregate logging outputs from data parallel training.""" + raise NotImplementedError() + + @staticmethod + def logging_outputs_can_be_summed() -> bool: + """ + Whether the logging outputs returned by `forward` can be summed + across workers prior to calling `reduce_metrics`. Setting this + to True will improves distributed training speed. + """ + return False diff --git a/SpeechT5/speecht5/criterions/speech_to_text_loss.py b/SpeechT5/speecht5/criterions/speech_to_text_loss.py new file mode 100644 index 0000000000000000000000000000000000000000..bff149b1d725ec1520cad4bc861403422103d46d --- /dev/null +++ b/SpeechT5/speecht5/criterions/speech_to_text_loss.py @@ -0,0 +1,476 @@ +# -------------------------------------------------------- +# SpeechT5: Unified-Modal Encoder-Decoder Pre-Training for Spoken Language Processing (https://arxiv.org/abs/2110.07205) +# Github source: https://github.com/microsoft/SpeechT5/tree/main/SpeechT5 +# Copyright (c) 2021 Microsoft +# Licensed under The MIT License [see LICENSE for details] +# Based on fairseq and espnet code bases +# https://github.com/pytorch/fairseq; https://github.com/espnet/espnet +# -------------------------------------------------------- + +import math +from argparse import Namespace +from dataclasses import dataclass, field +from omegaconf import II +from typing import Optional + +import torch +import torch.nn.functional as F +from fairseq import utils +from fairseq.logging import metrics +from fairseq.criterions import FairseqCriterion, register_criterion +from fairseq.dataclass import FairseqDataclass +from fairseq.data.data_utils import post_process +from fairseq.tasks import FairseqTask +from fairseq.logging.meters import safe_round + +import logging +logger = logging.getLogger(__name__) + +@dataclass +class SpeechtoTextLossConfig(FairseqDataclass): + zero_infinity: bool = field( + default=False, + metadata={"help": "zero inf loss when source length <= target length"}, + ) + sentence_avg: bool = II("optimization.sentence_avg") + post_process: Optional[str] = field( + default="sentencepiece", + metadata={ + "help": "how to post process predictions into words. can be letter, " + "wordpiece, BPE symbols, etc. 
" + "See fairseq.data.data_utils.post_process() for full list of options" + }, + ) + wer_kenlm_model: Optional[str] = field( + default=None, + metadata={ + "help": "if this is provided, use kenlm to compute wer (along with other wer_* args)" + }, + ) + wer_lexicon: Optional[str] = field( + default=None, + metadata={"help": "lexicon to use with wer_kenlm_model"}, + ) + wer_lm_weight: float = field( + default=2.0, + metadata={"help": "lm weight to use with wer_kenlm_model"}, + ) + wer_word_score: float = field( + default=-1.0, + metadata={"help": "lm word score to use with wer_kenlm_model"}, + ) + + wer_args: Optional[str] = field( + default=None, + metadata={ + "help": "DEPRECATED: tuple of (wer_kenlm_model, wer_lexicon, wer_lm_weight, wer_word_score)" + }, + ) + + label_smoothing: float = field( + default=0.0, + metadata={"help": "epsilon for label smoothing, 0 means no label smoothing"}, + ) + report_accuracy: bool = field( + default=False, + metadata={"help": "report accuracy metric"}, + ) + ignore_prefix_size: int = field( + default=0, + metadata={"help": "Ignore first N tokens"}, + ) + #: bool = II("optimization.sentence_avg") + + ce_weight: float = field( + default=1.0, + metadata={"help": "loss weight for cross entropy"}, + ) + ctc_weight: float = field( + default=0.0, + metadata={"help": "loss weiehgt for ctc in ASR"}, + ) + + +def label_smoothed_nll_loss(lprobs, target, epsilon, ignore_index=None, reduce=True): + if target.dim() == lprobs.dim() - 1: + target = target.unsqueeze(-1) + nll_loss = -lprobs.gather(dim=-1, index=target) + smooth_loss = -lprobs.sum(dim=-1, keepdim=True) + if ignore_index is not None: + pad_mask = target.eq(ignore_index) + nll_loss.masked_fill_(pad_mask, 0.0) + smooth_loss.masked_fill_(pad_mask, 0.0) + else: + nll_loss = nll_loss.squeeze(-1) + smooth_loss = smooth_loss.squeeze(-1) + if reduce: + nll_loss = nll_loss.sum() + smooth_loss = smooth_loss.sum() + eps_i = epsilon / (lprobs.size(-1) - 1) + loss = (1.0 - epsilon - eps_i) * nll_loss + eps_i * smooth_loss + return loss, nll_loss + + +class SpeechtoTextLoss(FairseqCriterion): + def __init__( + self, + cfg: SpeechtoTextLossConfig, + task: FairseqTask, + sentence_avg=True, + label_smoothing=0.1, + ignore_prefix_size=0, + report_accuracy=False, + ce_weight=1.0, + ctc_weight=0.0, + ): + + super().__init__(task) + self.blank_idx = ( + task.target_dictionary.index(task.blank_symbol) + if hasattr(task, "blank_symbol") + else 0 + ) + #print ("self.blank_idx: ", self.blank_idx) + + self.pad_idx = task.target_dictionary.pad() + self.eos_idx = task.target_dictionary.eos() + self.post_process = cfg.post_process + self.ce_weight = ce_weight + self.ctc_weight = ctc_weight + + ## for ce + self.sentence_avg = sentence_avg + self.eps = label_smoothing + self.ignore_prefix_size = ignore_prefix_size + self.report_accuracy = report_accuracy + + if cfg.wer_args is not None: + ( + cfg.wer_kenlm_model, + cfg.wer_lexicon, + cfg.wer_lm_weight, + cfg.wer_word_score, + ) = eval(cfg.wer_args) + + if cfg.wer_kenlm_model is not None: + from examples.speech_recognition.w2l_decoder import W2lKenLMDecoder + + dec_args = Namespace() + dec_args.nbest = 1 + dec_args.criterion = "ctc" + dec_args.kenlm_model = cfg.wer_kenlm_model + dec_args.lexicon = cfg.wer_lexicon + dec_args.beam = 50 + dec_args.beam_size_token = min(50, len(task.target_dictionary)) + dec_args.beam_threshold = min(50, len(task.target_dictionary)) + dec_args.lm_weight = cfg.wer_lm_weight + dec_args.word_score = cfg.wer_word_score + dec_args.unk_weight = -math.inf + 
dec_args.sil_weight = 0 + + self.w2l_decoder = W2lKenLMDecoder(dec_args, task.target_dictionary) + else: + self.w2l_decoder = None + + self.zero_infinity = cfg.zero_infinity + #self.sentence_avg = cfg.sentence_avg + + if self.ce_weight > 0 and self.ctc_weight > 0: + logger.info("Using cross entropy loss and CTC loss for ASR") + elif self.ce_weight > 0: + logger.info("Only using CE loss") + elif self.ctc_weight > 0: + logger.info("Only using CTC loss for ASR") + else: + logger.info("ERROR") + + def forward(self, model, sample, reduce=True): + + if self.ce_weight == 0 and self.ctc_weight > 0: + sample["only_ctc"] = True + + net_output_decoder, net_output = model(**sample["net_input"]) + + if self.ce_weight > 0: + loss_ce, nll_loss_ce = self.compute_loss(model, net_output_decoder, sample, reduce=reduce) + #print ("loss_ce: ", loss_ce) + else: + nll_loss_ce = None + + if self.ctc_weight > 0: + loss_ctc, lprobs, input_lengths = self.compute_loss_ctc(model, net_output, sample) + + if self.ce_weight > 0 and self.ctc_weight > 0: + loss = self.ce_weight * loss_ce + self.ctc_weight * loss_ctc + elif self.ce_weight > 0: + loss = loss_ce + elif self.ctc_weight > 0: + loss = loss_ctc + else: + logger.info("ERROR: must ce_weight > 0 or ctc_weight > 0") + + ntokens = ( + sample["ntokens"] if "ntokens" in sample else sample["target_lengths"].sum().item() + ) + + sample_size = sample["target"].size(0) if self.sentence_avg else ntokens + + logging_output = { + "loss": loss.item(), + "ce_loss": loss_ce.item() if self.ce_weight > 0 else 0, + "ctc_loss": loss_ctc.item() if self.ctc_weight > 0 else 0, + "nll_loss": nll_loss_ce.item() if nll_loss_ce is not None else 0, + "ntokens": sample["ntokens"], + "nsentences": sample["target"].size(0), + "sample_size": sample_size, + } + + if self.ce_weight > 0 and self.report_accuracy: + n_correct, total = self.compute_accuracy(model, net_output_decoder, sample) + logging_output["n_correct"] = utils.item(n_correct.item()) + logging_output["total"] = utils.item(total.data) + + if self.ctc_weight > 0 and not model.training: + import editdistance + + with torch.no_grad(): + lprobs_t = lprobs.transpose(0, 1).float().contiguous().cpu() + + c_err = 0 + c_len = 0 + w_errs = 0 + w_len = 0 + wv_errs = 0 + for lp, t, inp_l in zip( + lprobs_t, + sample["target_label"] + if "target_label" in sample + else sample["target"], + input_lengths, + ): + lp = lp[:inp_l].unsqueeze(0) + + decoded = None + if self.w2l_decoder is not None: + decoded = self.w2l_decoder.decode(lp) + if len(decoded) < 1: + decoded = None + else: + decoded = decoded[0] + if len(decoded) < 1: + decoded = None + else: + decoded = decoded[0] + + p = (t != self.task.target_dictionary.pad()) & ( + t != self.task.target_dictionary.eos() + ) + targ = t[p] + targ_units = self.task.target_dictionary.string(targ) + targ_units_arr = targ.tolist() + + toks = lp.argmax(dim=-1).unique_consecutive() + pred_units_arr = toks[toks != self.blank_idx].tolist() + + c_err += editdistance.eval(pred_units_arr, targ_units_arr) + c_len += len(targ_units_arr) + + targ_words = post_process(targ_units, self.post_process).split() + + pred_units = self.task.target_dictionary.string(pred_units_arr) + pred_words_raw = post_process(pred_units, self.post_process).split() + + if decoded is not None and "words" in decoded: + pred_words = decoded["words"] + w_errs += editdistance.eval(pred_words, targ_words) + wv_errs += editdistance.eval(pred_words_raw, targ_words) + else: + dist = editdistance.eval(pred_words_raw, targ_words) + w_errs += dist + 
wv_errs += dist + + w_len += len(targ_words) + + logging_output["wv_errors"] = wv_errs + logging_output["w_errors"] = w_errs + logging_output["w_total"] = w_len + logging_output["c_errors"] = c_err + logging_output["c_total"] = c_len + + return loss, sample_size, logging_output + + def compute_loss_ctc(self, model, net_output, sample): + lprobs = model.get_normalized_probs_for_ctc( + net_output, log_probs=True + ).contiguous() # (T, B, C) from the encoder + + if net_output["encoder_padding_mask"] is not None: + non_padding_mask = ~net_output["encoder_padding_mask"][0] + input_lengths = non_padding_mask.long().sum(-1) + else: + input_lengths = lprobs.new_full( + (lprobs.size(1),), lprobs.size(0), dtype=torch.long + ) + + pad_mask = (sample["target"] != self.pad_idx) & ( + sample["target"] != self.eos_idx + ) + targets_flat = sample["target"].masked_select(pad_mask) + if "target_lengths" in sample: + target_lengths = sample["target_lengths"] + else: + target_lengths = pad_mask.sum(-1) + + ##processing + target_lengths = target_lengths - 1 + + with torch.backends.cudnn.flags(enabled=False): + loss_ctc = F.ctc_loss( + lprobs, + targets_flat, + input_lengths, + target_lengths, + blank=self.blank_idx, + reduction="sum", + zero_infinity=self.zero_infinity, + ) + + return loss_ctc, lprobs, input_lengths + + ## for ce + def get_lprobs_and_target(self, model, net_output, sample): + lprobs = model.get_normalized_probs(net_output, log_probs=True) + target = model.get_targets(sample, net_output) + if self.ignore_prefix_size > 0: + if getattr(lprobs, "batch_first", False): + lprobs = lprobs[:, self.ignore_prefix_size :, :].contiguous() + target = target[:, self.ignore_prefix_size :].contiguous() + else: + lprobs = lprobs[self.ignore_prefix_size :, :, :].contiguous() + target = target[self.ignore_prefix_size :, :].contiguous() + return lprobs.view(-1, lprobs.size(-1)), target.view(-1) + + def compute_loss(self, model, net_output, sample, reduce=True): + lprobs, target = self.get_lprobs_and_target(model, net_output, sample) + loss, nll_loss = label_smoothed_nll_loss( + lprobs, + target, + self.eps, + ignore_index=self.padding_idx, + reduce=reduce, + ) + return loss, nll_loss + + def compute_accuracy(self, model, net_output, sample): + lprobs, target = self.get_lprobs_and_target(model, net_output, sample) + mask = target.ne(self.padding_idx) + n_correct = torch.sum( + lprobs.argmax(1).masked_select(mask).eq(target.masked_select(mask)) + ) + total = torch.sum(mask) + return n_correct, total + + + @staticmethod + def reduce_metrics(logging_outputs) -> None: + """Aggregate logging outputs from data parallel training.""" + + loss_sum = utils.item(sum(log.get("loss", 0) for log in logging_outputs)) + nll_loss_sum = sum(log.get("nll_loss", 0) for log in logging_outputs) + ce_loss_sum = sum(log.get("ce_loss", 0) for log in logging_outputs) + ctc_loss_sum = sum(log.get("ctc_loss", 0) for log in logging_outputs) + ntokens = utils.item(sum(log.get("ntokens", 0) for log in logging_outputs)) + nsentences = utils.item( + sum(log.get("nsentences", 0) for log in logging_outputs) + ) + sample_size = utils.item( + sum(log.get("sample_size", 0) for log in logging_outputs) + ) + + metrics.log_scalar( + "loss", loss_sum / sample_size / math.log(2), sample_size, round=3 + ) + + metrics.log_scalar( + "ctc_loss", ctc_loss_sum / sample_size / math.log(2), ntokens, 2, round=3 + ) + metrics.log_scalar( + "ce_loss", ce_loss_sum / ntokens, ntokens, 2, round=3 + ) + metrics.log_scalar( + "nll_loss", nll_loss_sum / ntokens / 
math.log(2), ntokens, 2, round=3 + ) + metrics.log_derived( + "ppl", lambda meters: utils.get_perplexity(meters["nll_loss"].avg, 2) + ) + + total = utils.item(sum(log.get("total", 0) for log in logging_outputs)) + if total > 0: + metrics.log_scalar("total", total) + n_correct = utils.item( + sum(log.get("n_correct", 0) for log in logging_outputs) + ) + metrics.log_scalar("n_correct", n_correct) + metrics.log_derived( + "accuracy", + lambda meters: round( + meters["n_correct"].sum * 100.0 / meters["total"].sum, 3 + ) + if meters["total"].sum > 0 + else float("nan"), + 2 + ) + + metrics.log_scalar("ntokens", ntokens) + metrics.log_scalar("nsentences", nsentences) + if sample_size != ntokens: + metrics.log_scalar( + "nll_loss", loss_sum / ntokens / math.log(2), ntokens, round=3 + ) + + c_errors = sum(log.get("c_errors", 0) for log in logging_outputs) + metrics.log_scalar("_c_errors", c_errors) + c_total = sum(log.get("c_total", 0) for log in logging_outputs) + metrics.log_scalar("_c_total", c_total) + w_errors = sum(log.get("w_errors", 0) for log in logging_outputs) + metrics.log_scalar("_w_errors", w_errors) + wv_errors = sum(log.get("wv_errors", 0) for log in logging_outputs) + metrics.log_scalar("_wv_errors", wv_errors) + w_total = sum(log.get("w_total", 0) for log in logging_outputs) + metrics.log_scalar("_w_total", w_total) + + if c_total > 0: + metrics.log_derived( + "uer", + lambda meters: safe_round( + meters["_c_errors"].sum * 100.0 / meters["_c_total"].sum, 3 + ) + if meters["_c_total"].sum > 0 + else float("nan"), + ) + if w_total > 0: + metrics.log_derived( + "wer", + lambda meters: safe_round( + meters["_w_errors"].sum * 100.0 / meters["_w_total"].sum, 3 + ) + if meters["_w_total"].sum > 0 + else float("nan"), + ) + metrics.log_derived( + "raw_wer", + lambda meters: safe_round( + meters["_wv_errors"].sum * 100.0 / meters["_w_total"].sum, 3 + ) + if meters["_w_total"].sum > 0 + else float("nan"), + ) + + @staticmethod + def logging_outputs_can_be_summed() -> bool: + """ + Whether the logging outputs returned by `forward` can be summed + across workers prior to calling `reduce_metrics`. Setting this + to True will improves distributed training speed. 
+ """ + return True diff --git a/SpeechT5/speecht5/criterions/speecht5_criterion.py b/SpeechT5/speecht5/criterions/speecht5_criterion.py new file mode 100644 index 0000000000000000000000000000000000000000..089405e9f5145993bcacb6d148d12bc8644a0fd0 --- /dev/null +++ b/SpeechT5/speecht5/criterions/speecht5_criterion.py @@ -0,0 +1,446 @@ +# -------------------------------------------------------- +# SpeechT5: Unified-Modal Encoder-Decoder Pre-Training for Spoken Language Processing (https://arxiv.org/abs/2110.07205) +# Github source: https://github.com/microsoft/SpeechT5/tree/main/SpeechT5 +# Copyright (c) 2021 Microsoft +# Licensed under The MIT License [see LICENSE for details] +# Based on fairseq and espnet code bases +# https://github.com/pytorch/fairseq; https://github.com/espnet/espnet +# -------------------------------------------------------- + +import re +from dataclasses import dataclass + +import math +from fairseq import utils +from fairseq.logging import metrics +from fairseq.criterions import FairseqCriterion, register_criterion +from speecht5.criterions.text_to_speech_loss import TexttoSpeechLoss +from speecht5.criterions.text_pretrain_criterion import TextPretrainCriterion, TextPretrainCriterionConfig +from fairseq.criterions.label_smoothed_cross_entropy import LabelSmoothedCrossEntropyCriterionConfig +from speecht5.criterions.speech_pretrain_criterion import SpeechPretrainCriterion, SpeechPretrainCriterionConfig +from speecht5.criterions.speech_to_text_loss import SpeechtoTextLoss, SpeechtoTextLossConfig +from fairseq.logging.meters import safe_round + +@dataclass +class SpeechT5CriterionConfig( + LabelSmoothedCrossEntropyCriterionConfig, + TextPretrainCriterionConfig, + SpeechPretrainCriterionConfig, + SpeechtoTextLossConfig + ): + pass + +@register_criterion( + "speecht5", dataclass=SpeechT5CriterionConfig +) +class SpeechT5Criterion(FairseqCriterion): + def __init__( + self, + task, + sentence_avg, + label_smoothing, + pred_masked_weight, + pred_nomask_weight, + loss_weights=None, + log_keys=None, + ignore_prefix_size=0, + report_accuracy=False, + use_masking=True, + use_weighted_masking=False, + loss_type="L1", + bce_pos_weight=5.0, + bce_loss_lambda=1.0, + use_guided_attn_loss=False, + num_heads_applied_guided_attn=2, + ce_weight=1.0, + ctc_weight=0.0, + hubert_weight=1.0, + dec_weight=1.0, + bart_weight=1.0, + ): + super().__init__(task) + self.speech_criterion = TexttoSpeechLoss( + task, + sentence_avg, + use_masking, + use_weighted_masking, + loss_type, + bce_pos_weight, + bce_loss_lambda, + use_guided_attn_loss, + num_heads_applied_guided_attn=num_heads_applied_guided_attn, + ) + self.text_criterion = SpeechtoTextLoss( + SpeechtoTextLossConfig, + task, + sentence_avg, + label_smoothing, + ignore_prefix_size, + report_accuracy, + ce_weight, + ctc_weight + ) + self.text_pretrain_criterion = TextPretrainCriterion( + task, + sentence_avg, + bart_weight, + loss_weights, + ) + self.speech_pretrain_criterion = SpeechPretrainCriterion( + task, + sentence_avg, + pred_masked_weight, + pred_nomask_weight, + loss_weights, + log_keys, + use_masking, + use_weighted_masking, + loss_type, + bce_pos_weight, + hubert_weight, + dec_weight + ) + + def forward(self, model, sample, reduce=True): + """Compute the loss for the given sample. 
+ + Returns a tuple with three elements: + 1) the loss + 2) the sample size, which is used as the denominator for the gradient + 3) logging outputs to display while training + """ + + task_name = sample['task_name'] + if task_name == 's2t' or task_name == 's2c': + return self.text_criterion(model, sample, reduce) + elif task_name == 't2s' or task_name == 's2s': + return self.speech_criterion(model, sample) + elif task_name == 'text_pretrain': + return self.text_pretrain_criterion(model, sample, reduce) + elif task_name == 'speech_pretrain': + return self.speech_pretrain_criterion(model, sample, reduce) + + @classmethod + def reduce_metrics(cls, logging_outputs): + """Aggregate logging outputs from data parallel training.""" + logging_outputs_dict = {} + for logging_output in logging_outputs: + for task_name in logging_output: + if task_name not in ['s2t', 't2s', 's2c', 's2s', 'text_pretrain', 'speech_pretrain']: + continue + + if task_name not in logging_outputs_dict: + logging_outputs_dict[task_name] = [] + logging_outputs_dict[task_name].append(logging_output[task_name]) + + for task_name in logging_outputs_dict: + if task_name == 's2t': + # LabelSmoothedCrossEntropyCriterion.reduce_metrics([logging_output['s2t'] for logging_output in logging_outputs]) + s2t_logging_output = logging_outputs_dict[task_name] + # s2t_sum = sum(log.get("ce_loss", 0) for log in logging_outputs) + loss_sum = sum(log.get("loss", 0) for log in s2t_logging_output) + nll_loss_sum = sum(log.get("nll_loss", 0) for log in s2t_logging_output) + ntokens = sum(log.get("ntokens", 0) for log in s2t_logging_output) + ce_loss_sum = sum(log.get("ce_loss", 0) for log in s2t_logging_output) + ctc_loss_sum = sum(log.get("ctc_loss", 0) for log in s2t_logging_output) + + sample_size = max(1, sum(log.get("sample_size", 0) for log in s2t_logging_output)) + metrics.log_scalar( + "s2t_loss", loss_sum / sample_size / math.log(2), sample_size, 1, round=3 + ) + + metrics.log_scalar( + "s2t_nll_loss", nll_loss_sum / ntokens / math.log(2), ntokens, 2, round=3 + ) + metrics.log_derived( + "s2t_ppl", lambda meters: utils.get_perplexity(meters["s2t_nll_loss"].avg, 2) + ) + metrics.log_scalar( + "ctc_loss", ctc_loss_sum / sample_size / math.log(2), ntokens, 2, round=3 + ) + metrics.log_scalar( + "ce_loss", ce_loss_sum / ntokens, ntokens, 2, round=3 + ) + + total = utils.item(sum(log.get("total", 0) for log in s2t_logging_output)) + if total > 0: + metrics.log_scalar("s2t_total", total) + n_correct = utils.item( + sum(log.get("n_correct", 0) for log in s2t_logging_output) + ) + metrics.log_scalar("s2t_n_correct", n_correct) + metrics.log_derived( + "s2t_accuracy", + lambda meters: round( + meters["s2t_n_correct"].sum * 100.0 / meters["s2t_total"].sum, 3 + ) + if meters["s2t_total"].sum > 0 + else float("nan"), + 2 + ) + c_errors = sum(log.get("c_errors", 0) for log in s2t_logging_output) + metrics.log_scalar("_c_errors", c_errors) + c_total = sum(log.get("c_total", 0) for log in s2t_logging_output) + metrics.log_scalar("_c_total", c_total) + w_errors = sum(log.get("w_errors", 0) for log in s2t_logging_output) + metrics.log_scalar("_w_errors", w_errors) + wv_errors = sum(log.get("wv_errors", 0) for log in s2t_logging_output) + metrics.log_scalar("_wv_errors", wv_errors) + w_total = sum(log.get("w_total", 0) for log in s2t_logging_output) + metrics.log_scalar("_w_total", w_total) + if c_total > 0: + metrics.log_derived( + "uer", + lambda meters: safe_round( + meters["_c_errors"].sum * 100.0 / meters["_c_total"].sum, 3 + ) + if 
meters["_c_total"].sum > 0 + else float("nan"), + ) + if w_total > 0: + metrics.log_derived( + "wer", + lambda meters: safe_round( + meters["_w_errors"].sum * 100.0 / meters["_w_total"].sum, 3 + ) + if meters["_w_total"].sum > 0 + else float("nan"), + ) + metrics.log_derived( + "raw_wer", + lambda meters: safe_round( + meters["_wv_errors"].sum * 100.0 / meters["_w_total"].sum, 3 + ) + if meters["_w_total"].sum > 0 + else float("nan"), + ) + + if task_name == 't2s': + # TTSLossCriterion.reduce_metrics([logging_output['t2s'] for logging_output in logging_outputs]) + # t2s_sum = sum(log.get("speech_loss", 0) for log in logging_outputs) + t2s_logging_output = logging_outputs_dict[task_name] + loss_sum = sum(log.get("loss", 0) for log in t2s_logging_output) + l1_loss_sum = sum(log.get("l1_loss", 0) for log in t2s_logging_output) + l2_loss_sum = sum(log.get("l2_loss", 0) for log in t2s_logging_output) + bce_loss_sum = sum(log.get("bce_loss", 0) for log in t2s_logging_output) + sample_size = max(1, sum(log.get("sample_size", 0) for log in t2s_logging_output)) + metrics.log_scalar( + "t2s_loss", loss_sum / sample_size, sample_size, 1, round=5 + ) + encoder_alpha_sum = sum(log.get("encoder_alpha", 0) for log in t2s_logging_output) + decoder_alpha_sum = sum(log.get("decoder_alpha", 0) for log in t2s_logging_output) + ngpu = sum(log.get("ngpu", 0) for log in t2s_logging_output) + + metrics.log_scalar( + "t2s_l1_loss", l1_loss_sum / sample_size, sample_size, 2, round=5 + ) + metrics.log_scalar( + "t2s_l2_loss", l2_loss_sum / sample_size, sample_size, 2, round=5 + ) + metrics.log_scalar( + "t2s_bce_loss", bce_loss_sum / sample_size, sample_size, 2, round=5 + ) + metrics.log_scalar( + "t2s_encoder_alpha", encoder_alpha_sum / sample_size, sample_size, round=5 + ) + metrics.log_scalar( + "t2s_decoder_alpha", decoder_alpha_sum / sample_size, sample_size, round=5 + ) + + if "enc_dec_attn_loss" in t2s_logging_output[0]: + enc_dec_attn_loss_sum = sum(log.get("enc_dec_attn_loss", 0) for log in t2s_logging_output) + metrics.log_scalar( + "t2s_enc_dec_attn_loss", enc_dec_attn_loss_sum / sample_size, sample_size, round=8 + ) + + if task_name == 's2c': + s2c_logging_output = logging_outputs_dict[task_name] + loss_sum = sum(log.get("loss", 0) for log in s2c_logging_output) + nll_loss_sum = sum(log.get("nll_loss", 0) for log in s2c_logging_output) + ntokens = sum(log.get("ntokens", 0) for log in s2c_logging_output) + + sample_size = max(1, sum(log.get("sample_size", 0) for log in s2c_logging_output)) + metrics.log_scalar( + "s2c_loss", loss_sum / sample_size / math.log(2), sample_size, 1, round=3 + ) + + metrics.log_scalar( + "s2c_nll_loss", nll_loss_sum / ntokens / math.log(2), ntokens, 2, round=3 + ) + + total = utils.item(sum(log.get("total", 0) for log in s2c_logging_output)) + if total > 0: + metrics.log_scalar("s2c_total", total) + n_correct = utils.item(sum(log.get("n_correct", 0) for log in s2c_logging_output)) + metrics.log_scalar("s2c_n_correct", n_correct) + metrics.log_derived( + "s2c_accuracy", + lambda meters: round( + meters["s2c_n_correct"].sum * 100.0 / meters["s2c_total"].sum, 3 + ) + if meters["s2c_total"].sum > 0 + else float("nan"), + 2 + ) + + if task_name == 's2s': + s2s_logging_output = logging_outputs_dict[task_name] + loss_sum = sum(log.get("loss", 0) for log in s2s_logging_output) + l1_loss_sum = sum(log.get("l1_loss", 0) for log in s2s_logging_output) + l2_loss_sum = sum(log.get("l2_loss", 0) for log in s2s_logging_output) + bce_loss_sum = sum(log.get("bce_loss", 0) for log in 
s2s_logging_output) + sample_size = max(1, sum(log.get("sample_size", 0) for log in s2s_logging_output)) + metrics.log_scalar( + "s2s_loss", loss_sum / sample_size, sample_size, 1, round=5 + ) + encoder_alpha_sum = sum(log.get("encoder_alpha", 0) for log in s2s_logging_output) + decoder_alpha_sum = sum(log.get("decoder_alpha", 0) for log in s2s_logging_output) + ngpu = sum(log.get("ngpu", 0) for log in s2s_logging_output) + + metrics.log_scalar( + "s2s_l1_loss", l1_loss_sum / sample_size, sample_size, 2, round=5 + ) + metrics.log_scalar( + "s2s_l2_loss", l2_loss_sum / sample_size, sample_size, 2, round=5 + ) + metrics.log_scalar( + "s2s_bce_loss", bce_loss_sum / sample_size, sample_size, 2, round=5 + ) + metrics.log_scalar( + "s2s_decoder_alpha", decoder_alpha_sum / sample_size, sample_size, round=5 + ) + + if "enc_dec_attn_loss" in s2s_logging_output[0]: + enc_dec_attn_loss_sum = sum(log.get("enc_dec_attn_loss", 0) for log in s2s_logging_output) + metrics.log_scalar( + "s2s_enc_dec_attn_loss", enc_dec_attn_loss_sum / sample_size, sample_size, round=8 + ) + + if task_name == 'text_pretrain': + bart_logging_output = logging_outputs_dict[task_name] + loss_sum = sum(log.get("loss", 0) for log in bart_logging_output) + ntokens = sum(log.get("ntokens", 0) for log in bart_logging_output) + sample_size = max(1, sum(log.get("sample_size", 0) for log in bart_logging_output)) + bart_loss_sum = sum(log.get("bart_loss", 0) for log in bart_logging_output) + + # we divide by log(2) to convert the loss from base e to base 2 + metrics.log_scalar( + "text_loss", loss_sum / sample_size / math.log(2), sample_size, round=3 + ) + metrics.log_scalar( + "bart_loss", bart_loss_sum / sample_size / math.log(2), ntokens, 2, round=3 + ) + if sample_size != ntokens: + metrics.log_scalar( + "bart_nll_loss", bart_loss_sum / ntokens / math.log(2), ntokens, round=3 + ) + metrics.log_derived( + "bart_ppl", lambda meters: utils.get_perplexity(meters["bart_nll_loss"].avg) + ) + else: + metrics.log_derived( + "bart_ppl", lambda meters: utils.get_perplexity(meters["bart_loss"].avg) + ) + metrics.log_scalar("bart_wpb", ntokens, priority=180, round=1) + + val_prob_perplexity = 0 + val_code_perplexity = 0 + sample_size_pp = 0 + count_log_cp = 0 + for log in bart_logging_output: + if "loss_prob_perplexity" in log: + val_prob_perplexity = val_prob_perplexity + log["loss_prob_perplexity"] + sample_size_pp = sample_size_pp + log["sample_size"] + if "code_perplexity" in log: + val_code_perplexity = val_code_perplexity + log["code_perplexity"] + count_log_cp = count_log_cp + 1 + if val_prob_perplexity > 0: + metrics.log_scalar("text_loss_prob_perplexity", val_prob_perplexity / sample_size_pp / math.log(2), round=3) + if val_code_perplexity > 0: + metrics.log_scalar("text_code_perplexity", val_code_perplexity / count_log_cp, round=3) + + if task_name == 'speech_pretrain': + hubert_logging_output = logging_outputs_dict[task_name] + loss_sum = sum(log.get("loss", 0) for log in hubert_logging_output) + ntokens = sum(log.get("ntokens", 0) for log in hubert_logging_output) + sample_size = max(1, sum(log.get("sample_size", 0) for log in hubert_logging_output)) + dec_loss_sum = sum(log.get("dec_loss", 0) for log in hubert_logging_output) + l1_loss_sum = sum(log.get("l1_loss", 0) for log in hubert_logging_output) + l2_loss_sum = sum(log.get("l2_loss", 0) for log in hubert_logging_output) + bce_loss_sum = sum(log.get("bce_loss", 0) for log in hubert_logging_output) + ngpu = sum(log.get("ngpu", 0) for log in hubert_logging_output) + + 
metrics.log_scalar("hubert_loss", loss_sum / sample_size / math.log(2), sample_size, round=3) + if sample_size != ntokens: + metrics.log_scalar("hubert_nll_loss", loss_sum / ntokens / math.log(2), ntokens, round=3) + metrics.log_derived("hubert_ppl", lambda meters: utils.get_perplexity(meters["hubert_nll_loss"].avg)) + else: + metrics.log_derived("hubert_ppl", lambda meters: utils.get_perplexity(meters["hubert_loss"].avg)) + + counts = {} + for lk in hubert_logging_output[0].keys(): + if lk.startswith("count_"): + val = sum(log[lk] for log in hubert_logging_output) + metrics.log_scalar("hubert_" + lk, val) + counts[lk] = val + + for lk in hubert_logging_output[0].keys(): + if lk.startswith("loss_") and lk != 'loss_prob_perplexity': + val = sum(log[lk] for log in hubert_logging_output) + metrics.log_scalar("hubert_" + lk, val / sample_size / math.log(2), round=3) + elif lk.startswith("correct_"): + val = sum(log[lk] for log in hubert_logging_output) + metrics.log_scalar("hubert_" + lk, val / counts[re.sub("correct", "count", lk)]) + # elif lk == 'code_perplexity': + # val = sum(log[lk] for log in hubert_logging_output) + # metrics.log_scalar("hubert_" + lk, val / len(hubert_logging_output), round=3) + + val_prob_perplexity = 0 + val_code_perplexity = 0 + sample_size_pp = 0 + count_log_cp = 0 + for log in hubert_logging_output: + if "loss_prob_perplexity" in log: + val_prob_perplexity = val_prob_perplexity + log["loss_prob_perplexity"] + sample_size_pp = sample_size_pp + log["sample_size"] + if "code_perplexity" in log: + val_code_perplexity = val_code_perplexity + log["code_perplexity"] + count_log_cp = count_log_cp + 1 + if val_prob_perplexity > 0: + metrics.log_scalar("hubert_loss_prob_perplexity", val_prob_perplexity / sample_size_pp / math.log(2), round=3) + if val_code_perplexity > 0: + metrics.log_scalar("hubert_code_perplexity", val_code_perplexity / count_log_cp, round=3) + + metrics.log_scalar( + "hubert_dec_loss", dec_loss_sum / ngpu, sample_size, 2, round=5 + ) + metrics.log_scalar( + "hubert_l1_loss", l1_loss_sum / ngpu, sample_size, 2, round=5 + ) + metrics.log_scalar( + "hubert_l2_loss", l2_loss_sum / ngpu, sample_size, 2, round=5 + ) + metrics.log_scalar( + "hubert_bce_loss", bce_loss_sum / ngpu, sample_size, 2, round=5 + ) + if "enc_dec_attn_loss" in hubert_logging_output[0]: + enc_dec_attn_loss_sum = sum(log.get("enc_dec_attn_loss", 0) for log in hubert_logging_output) + metrics.log_scalar( + "hubert_enc_dec_attn_loss", enc_dec_attn_loss_sum / ngpu, sample_size, round=8 + ) + metrics.log_scalar("hubert_wpb", ntokens, priority=180, round=1) + + loss = sum(log.get("loss", 0) for log in logging_outputs) + sample_size = max(1, sum(log.get("sample_size", 0) for log in logging_outputs)) + metrics.log_scalar( + "loss", loss / sample_size, sample_size, 1, round=5 + ) + + @staticmethod + def logging_outputs_can_be_summed() -> bool: + """ + Whether the logging outputs returned by `forward` can be summed + across workers prior to calling `reduce_metrics`. Setting this + to True will improves distributed training speed. 
+ """ + return False diff --git a/SpeechT5/speecht5/criterions/text_pretrain_criterion.py b/SpeechT5/speecht5/criterions/text_pretrain_criterion.py new file mode 100644 index 0000000000000000000000000000000000000000..be459bfd46074d64b8fe13238d52aaf2e871154e --- /dev/null +++ b/SpeechT5/speecht5/criterions/text_pretrain_criterion.py @@ -0,0 +1,145 @@ +# -------------------------------------------------------- +# SpeechT5: Unified-Modal Encoder-Decoder Pre-Training for Spoken Language Processing (https://arxiv.org/abs/2110.07205) +# Github source: https://github.com/microsoft/SpeechT5/tree/main/SpeechT5 +# Copyright (c) 2021 Microsoft +# Licensed under The MIT License [see LICENSE for details] +# Based on fairseq and espnet code bases +# https://github.com/pytorch/fairseq; https://github.com/espnet/espnet +# -------------------------------------------------------- + +import math +from dataclasses import dataclass, field +from typing import List, Optional + +import torch +import torch.nn.functional as F +from fairseq import utils +from fairseq.logging import metrics +from fairseq.criterions import FairseqCriterion, register_criterion +from fairseq.dataclass import FairseqDataclass +from omegaconf import II + + +@dataclass +class TextPretrainCriterionConfig(FairseqDataclass): + sentence_avg: bool = II("optimization.sentence_avg") + loss_weights: Optional[List[float]] = field( + default_factory=lambda: [0.1,], + metadata={"help": "weights for additional loss terms (not first one)"}, + ) + bart_weight: float = field( + default=1.0, + metadata={"help": "loss weight for cross entropy"}, + ) + + +class TextPretrainCriterion(FairseqCriterion): + def __init__(self, task, sentence_avg, bart_weight, loss_weights=None): + super().__init__(task) + self.sentence_avg = sentence_avg + self.loss_weights = loss_weights + self.bart_weight = bart_weight + + def forward(self, model, sample, reduce=True): + """Compute the loss for the given sample. 
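# Illustrative sketch (not part of the upstream patch): how bart_weight and
# loss_weights from the config above combine the main cross-entropy with extra
# codebook losses. Plain floats stand in for tensor losses; all numbers are toy values.
bart_weight, loss_weights = 1.0, [0.1]
bart_loss, sample_size = 250.0, 1000        # summed NLL over the batch, token count
extra_losses = [0.02]                        # e.g. a prob_perplexity penalty

loss = bart_weight * bart_loss
for coef, extra in zip(loss_weights, extra_losses):
    if coef != 0 and extra is not None:
        loss += coef * extra * sample_size   # scaled to match the "sum" reduction
print(loss)  # 252.0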
+ + Returns a tuple with three elements: + 1) the loss + 2) the sample size, which is used as the denominator for the gradient + 3) logging outputs to display while training + """ + net_output, codebook_out, encoder_output = model(**sample["net_input"]) + bart_loss, _ = self.compute_loss(model, net_output, sample, reduce=reduce) + sample_size = ( + sample["target"].size(0) if self.sentence_avg else sample["ntokens"] + ) + + loss = self.bart_weight * bart_loss + logging_output = { + "loss": loss.item(), + "ntokens": sample["ntokens"], + "nsentences": sample["target"].size(0), + "bart_loss": bart_loss.item(), + "sample_size": sample_size, + } + + if "prob_perplexity" in codebook_out: + assert hasattr(model, "get_extra_losses") + extra_losses, names = model.get_extra_losses(codebook_out) + if torch.is_tensor(extra_losses): + extra_losses = [extra_losses] + names = [names] + if len(self.loss_weights) == 1 and len(extra_losses) != 1: + self.loss_weights = [self.loss_weights[0]] * len(extra_losses) + if len(self.loss_weights) > len(extra_losses): + modified_loss_weight = self.loss_weights[len(extra_losses):] + else: + modified_loss_weight = self.loss_weights + + # assert len(extra_losses) == len(self.loss_weights), f"{len(extra_losses)}, {len(self.loss_weights)}" + for p, n, coef in zip(extra_losses, names, modified_loss_weight): + # print(n + str(coef)) + if coef != 0 and p is not None: + p = coef * p.float() * sample_size + loss += p + logging_output[f"loss_{n}"] = p.item() + + if 'loss_prob_perplexity' in logging_output: + logging_output['code_perplexity'] = codebook_out['code_perplexity'].item() + + return loss, sample_size, logging_output + + def compute_loss(self, model, net_output, sample, reduce=True): + lprobs = model.get_normalized_probs(net_output, log_probs=True) + lprobs = lprobs.view(-1, lprobs.size(-1)) + target = model.get_targets(sample, net_output).view(-1) + loss = F.nll_loss( + lprobs, + target, + ignore_index=self.padding_idx, + reduction="sum" if reduce else "none", + ) + return loss, loss + + @staticmethod + def reduce_metrics(logging_outputs) -> None: + """Aggregate logging outputs from data parallel training.""" + loss_sum = sum(log.get("loss", 0) for log in logging_outputs) + ntokens = sum(log.get("ntokens", 0) for log in logging_outputs) + sample_size = sum(log.get("sample_size", 0) for log in logging_outputs) + bart_loss_sum = sum(log.get("bart_loss", 0) for log in logging_outputs) + + # we divide by log(2) to convert the loss from base e to base 2 + metrics.log_scalar( + "loss", loss_sum / sample_size / math.log(2), sample_size, round=3 + ) + metrics.log_scalar( + "bart_loss", bart_loss_sum / sample_size / math.log(2), ntokens, 2, round=3 + ) + if sample_size != ntokens: + metrics.log_scalar( + "nll_loss", bart_loss_sum / ntokens / math.log(2), ntokens, round=3 + ) + metrics.log_derived( + "ppl", lambda meters: utils.get_perplexity(meters["nll_loss"].avg) + ) + else: + metrics.log_derived( + "ppl", lambda meters: utils.get_perplexity(meters["bart_loss"].avg) + ) + + if "loss_prob_perplexity" in logging_outputs[0].keys(): + val = sum(log["loss_prob_perplexity"] for log in logging_outputs) + metrics.log_scalar("loss_prob_perplexity", val / sample_size / math.log(2), round=3) + if "code_perplexity" in logging_outputs[0].keys(): + val = sum(log["code_perplexity"] for log in logging_outputs) + metrics.log_scalar("code_perplexity", val / len(logging_outputs), round=3) + + @staticmethod + def logging_outputs_can_be_summed() -> bool: + """ + Whether the logging outputs 
returned by `forward` can be summed + across workers prior to calling `reduce_metrics`. Setting this + to True will improves distributed training speed. + """ + return True diff --git a/SpeechT5/speecht5/criterions/text_to_speech_loss.py b/SpeechT5/speecht5/criterions/text_to_speech_loss.py new file mode 100644 index 0000000000000000000000000000000000000000..aa17bd19be9b1f804ae88c7ce8f441c5ddc4002d --- /dev/null +++ b/SpeechT5/speecht5/criterions/text_to_speech_loss.py @@ -0,0 +1,428 @@ +# -------------------------------------------------------- +# SpeechT5: Unified-Modal Encoder-Decoder Pre-Training for Spoken Language Processing (https://arxiv.org/abs/2110.07205) +# Github source: https://github.com/microsoft/SpeechT5/tree/main/SpeechT5 +# Copyright (c) 2021 Microsoft +# Licensed under The MIT License [see LICENSE for details] +# Based on fairseq and espnet code bases +# https://github.com/pytorch/fairseq; https://github.com/espnet/espnet +# -------------------------------------------------------- + +from dataclasses import dataclass, field + +import torch +from fairseq import utils +from fairseq.logging import metrics +from espnet.nets.pytorch_backend.nets_utils import make_non_pad_mask +from fairseq.criterions import FairseqCriterion, register_criterion +from fairseq.dataclass import FairseqDataclass +from speecht5.models.modules.speech_encoder_prenet import SpeechEncoderPrenet +from espnet.nets.pytorch_backend.e2e_tts_tacotron2 import GuidedAttentionLoss +from omegaconf import II +from typing import Any + + +@dataclass +class TexttoSpeechLossConfig(FairseqDataclass): + use_masking: bool = field( + default=True, + metadata={"help": "Whether to use masking in calculation of loss"}, + ) + use_weighted_masking: bool = field( + default=False, + metadata={"help": "Whether to use weighted masking in calculation of loss"}, + ) + loss_type: str = field( + default="L1", + metadata={"help": "How to calc loss"}, + ) + bce_pos_weight: float = field( + default=5.0, + metadata={"help": "Positive sample weight in BCE calculation (only for use-masking=True)"}, + ) + bce_loss_lambda: float = field( + default=1.0, + metadata={"help": "Lambda in bce loss"}, + ) + use_guided_attn_loss: bool = field( + default=False, + metadata={"help": "Whether to use guided attention loss"}, + ) + guided_attn_loss_sigma: float = field( + default=0.4, + metadata={"help": "Sigma in guided attention loss"}, + ) + guided_attn_loss_lambda: float = field( + default=10.0, + metadata={"help": "Lambda in guided attention loss"}, + ) + num_layers_applied_guided_attn: int = field( + default=2, + metadata={"help": "Number of layers to be applied guided attention loss, if set -1, all of the layers will be applied."}, + ) + num_heads_applied_guided_attn: int = field( + default=2, + metadata={"help": "Number of heads in each layer to be applied guided attention loss, if set -1, all of the heads will be applied."}, + ) + modules_applied_guided_attn: Any = field( + default=("encoder-decoder",), + metadata={"help": "Module name list to be applied guided attention loss"}, + ) + sentence_avg: bool = II("optimization.sentence_avg") + + +class TexttoSpeechLoss(FairseqCriterion): + def __init__( + self, + task, + sentence_avg, + use_masking=True, + use_weighted_masking=False, + loss_type="L1", + bce_pos_weight=5.0, + bce_loss_lambda=1.0, + use_guided_attn_loss=False, + guided_attn_loss_sigma=0.4, + guided_attn_loss_lambda=1.0, + num_layers_applied_guided_attn=2, + num_heads_applied_guided_attn=2, + 
modules_applied_guided_attn=["encoder-decoder"], + ): + super().__init__(task) + self.sentence_avg = sentence_avg + self.use_masking = use_masking + self.use_weighted_masking = use_weighted_masking + self.loss_type = loss_type + self.bce_pos_weight = bce_pos_weight + self.bce_loss_lambda = bce_loss_lambda + self.use_guided_attn_loss = use_guided_attn_loss + self.guided_attn_loss_sigma = guided_attn_loss_sigma + self.guided_attn_loss_lambda = guided_attn_loss_lambda + # define loss function + self.criterion = Tacotron2Loss( + use_masking=use_masking, + use_weighted_masking=use_weighted_masking, + bce_pos_weight=bce_pos_weight, + ) + if self.use_guided_attn_loss: + self.num_layers_applied_guided_attn = num_layers_applied_guided_attn + self.num_heads_applied_guided_attn = num_heads_applied_guided_attn + self.modules_applied_guided_attn = modules_applied_guided_attn + if self.use_guided_attn_loss: + self.attn_criterion = GuidedMultiHeadAttentionLoss( + sigma=guided_attn_loss_sigma, + alpha=guided_attn_loss_lambda, + ) + + def forward(self, model, sample): + """Compute the loss for the given sample. + + Returns a tuple with three elements: + 1) the loss + 2) the sample size, which is used as the denominator for the gradient + 3) logging outputs to display while training + """ + net_output = model(**sample["net_input"]) + loss, l1_loss, l2_loss, bce_loss, enc_dec_attn_loss = self.compute_loss(model, net_output, sample) + # sample_size = ( + # sample["target"].size(0) if self.sentence_avg else sample["nframes"] + # ) + sample_size = 1 + logging_output = { + "loss": loss.item(), + "l1_loss": l1_loss.item(), + "l2_loss": l2_loss.item(), + "bce_loss": bce_loss.item(), + "sample_size": 1, + "ntokens": sample["ntokens"], + "nsentences": sample["target"].size(0), + } + + if enc_dec_attn_loss is not None: + logging_output['enc_dec_attn_loss'] = enc_dec_attn_loss.item() + + if hasattr(model, 'text_encoder_prenet'): + logging_output["encoder_alpha"] = model.text_encoder_prenet.encoder_prenet[-1].alpha.item() + logging_output["decoder_alpha"] = model.speech_decoder_prenet.decoder_prenet[-1].alpha.item() + elif hasattr(model, "speech_encoder_prenet"): + logging_output["decoder_alpha"] = model.speech_decoder_prenet.decoder_prenet[-1].alpha.item() + else: + if 'task' not in sample: + logging_output["encoder_alpha"] = model.encoder_prenet.encoder_prenet[-1].alpha.item() + logging_output["decoder_alpha"] = model.decoder_prenet.decoder_prenet[-1].alpha.item() + + return loss, sample_size, logging_output + + def compute_loss(self, model, net_output, sample): + before_outs, after_outs, logits, attn = net_output + labels = sample["labels"] + ys = sample["dec_target"] + olens = sample["dec_target_lengths"] + ilens = sample["src_lengths"] + + # modifiy mod part of groundtruth + if model.reduction_factor > 1: + olens_in = olens.new([torch.div(olen, model.reduction_factor, rounding_mode='floor') for olen in olens]) + olens = olens.new([olen - olen % model.reduction_factor for olen in olens]) + max_olen = max(olens) + ys = ys[:, :max_olen] + labels = labels[:, :max_olen] + labels = torch.scatter(labels, 1, (olens - 1).unsqueeze(1), 1.0) # make sure at least one frame has 1 + # labels[:, -1] = 1.0 + else: + olens_in = olens + + # caluculate loss values + l1_loss, l2_loss, bce_loss = self.criterion( + after_outs, before_outs, logits, ys, labels, olens + ) + + # l1_loss = l1_loss / ys.size(2) + # l2_loss = l2_loss / ys.size(2) + + if self.loss_type == "L1": + loss = l1_loss + self.bce_loss_lambda * bce_loss if 
self.bce_loss_lambda > 0.0 else l1_loss + elif self.loss_type == "L2": + loss = l2_loss + self.bce_loss_lambda * bce_loss if self.bce_loss_lambda > 0.0 else l2_loss + elif self.loss_type == "L1+L2": + loss = l1_loss + l2_loss + self.bce_loss_lambda * bce_loss if self.bce_loss_lambda > 0.0 else l1_loss + l2_loss + else: + raise ValueError("unknown --loss-type " + self.loss_type) + + # calculate guided attention loss + enc_dec_attn_loss = None + if self.use_guided_attn_loss: + # calculate the input lengths of encoder, which is determined by encoder prenet + if hasattr(model, 'encoder_reduction_factor') and model.encoder_reduction_factor > 1: + ilens_in = ilens.new([ilen // model.encoder_reduction_factor for ilen in ilens]) + else: + ilens_in = ilens + # work for speech to speech model's input + if "task_name" in sample and sample["task_name"] == "s2s": + m = None + if hasattr(model, 'encoder_prenet'): + m = model.encoder_prenet + elif hasattr(model, 'speech_encoder_prenet'): + m = model.speech_encoder_prenet + if m is not None and isinstance(m, SpeechEncoderPrenet): + ilens_in = m.get_src_lengths(ilens_in) + # calculate for encoder-decoder + if "encoder-decoder" in self.modules_applied_guided_attn: + attn = [att_l[:, : self.num_heads_applied_guided_attn] for att_l in attn] + att_ws = torch.cat(attn, dim=1) # (B, H*L, T_out, T_in) + enc_dec_attn_loss = self.attn_criterion(att_ws, ilens_in, olens_in) + loss = loss + enc_dec_attn_loss + + return loss, l1_loss, l2_loss, bce_loss, enc_dec_attn_loss + + @classmethod + def reduce_metrics(cls, logging_outputs) -> None: + """Aggregate logging outputs from data parallel training.""" + loss_sum = sum(log.get("loss", 0) for log in logging_outputs) + l1_loss_sum = sum(log.get("l1_loss", 0) for log in logging_outputs) + l2_loss_sum = sum(log.get("l2_loss", 0) for log in logging_outputs) + bce_loss_sum = sum(log.get("bce_loss", 0) for log in logging_outputs) + sample_size = max(1, sum(log.get("sample_size", 0) for log in logging_outputs)) + metrics.log_scalar( + "loss", loss_sum / sample_size, sample_size, 1, round=5 + ) + encoder_alpha_sum = sum(log.get("encoder_alpha", 0) for log in logging_outputs) + decoder_alpha_sum = sum(log.get("decoder_alpha", 0) for log in logging_outputs) + ngpu = sum(log.get("ngpu", 0) for log in logging_outputs) + + metrics.log_scalar( + "l1_loss", l1_loss_sum / sample_size, sample_size, 2, round=5 + ) + metrics.log_scalar( + "l2_loss", l2_loss_sum / sample_size, sample_size, 2, round=5 + ) + metrics.log_scalar( + "bce_loss", bce_loss_sum / sample_size, sample_size, 2, round=5 + ) + metrics.log_scalar( + "encoder_alpha", encoder_alpha_sum / sample_size, sample_size, round=5 + ) + metrics.log_scalar( + "decoder_alpha", decoder_alpha_sum / sample_size, sample_size, round=5 + ) + + if "enc_dec_attn_loss" in logging_outputs[0]: + enc_dec_attn_loss_sum = sum(log.get("enc_dec_attn_loss", 0) for log in logging_outputs) + metrics.log_scalar( + "enc_dec_attn_loss", enc_dec_attn_loss_sum / sample_size, sample_size, round=8 + ) + + + @staticmethod + def logging_outputs_can_be_summed() -> bool: + """ + Whether the logging outputs returned by `forward` can be summed + across workers prior to calling `reduce_metrics`. Setting this + to True will improves distributed training speed. + """ + return True + +class Tacotron2Loss(torch.nn.Module): + """Loss function module for Tacotron2.""" + + def __init__( + self, use_masking=True, use_weighted_masking=False, bce_pos_weight=20.0 + ): + """Initialize Tactoron2 loss module. 
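# Illustrative sketch (not part of the upstream patch): why the stop-token BCE
# above uses a positive-class weight (bce_pos_weight). Only the final frame(s)
# of an utterance carry label 1, so positives are rare and get upweighted.
# The tensors below are toy values.
import torch

logits = torch.tensor([[-2.0, -1.5, -1.0, 0.5]])   # stop logits for 4 frames
labels = torch.tensor([[0.0, 0.0, 0.0, 1.0]])      # only the last frame is a stop

plain = torch.nn.BCEWithLogitsLoss()
weighted = torch.nn.BCEWithLogitsLoss(pos_weight=torch.tensor(5.0))
print(plain(logits, labels).item(), weighted(logits, labels).item())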
+ + Args: + use_masking (bool): Whether to apply masking + for padded part in loss calculation. + use_weighted_masking (bool): + Whether to apply weighted masking in loss calculation. + bce_pos_weight (float): Weight of positive sample of stop token. + + """ + super(Tacotron2Loss, self).__init__() + assert (use_masking != use_weighted_masking) or not use_masking + self.use_masking = use_masking + self.use_weighted_masking = use_weighted_masking + + # define criterions + # reduction = "none" if self.use_weighted_masking else "sum" + reduction = "none" if self.use_weighted_masking else "mean" + self.l1_criterion = torch.nn.L1Loss(reduction=reduction) + self.mse_criterion = torch.nn.MSELoss(reduction=reduction) + self.bce_criterion = torch.nn.BCEWithLogitsLoss( + reduction=reduction, pos_weight=torch.tensor(bce_pos_weight) + ) + + # NOTE(kan-bayashi): register pre hook function for the compatibility + self._register_load_state_dict_pre_hook(self._load_state_dict_pre_hook) + + def forward(self, after_outs, before_outs, logits, ys, labels, olens): + """Calculate forward propagation. + + Args: + after_outs (Tensor): Batch of outputs after postnets (B, Lmax, odim). + before_outs (Tensor): Batch of outputs before postnets (B, Lmax, odim). + logits (Tensor): Batch of stop logits (B, Lmax). + ys (Tensor): Batch of padded target features (B, Lmax, odim). + labels (LongTensor): Batch of the sequences of stop token labels (B, Lmax). + olens (LongTensor): Batch of the lengths of each target (B,). + + Returns: + Tensor: L1 loss value. + Tensor: Mean square error loss value. + Tensor: Binary cross entropy loss value. + + """ + # make mask and apply it + if self.use_masking: + masks = make_non_pad_mask(olens).unsqueeze(-1).to(ys.device) + ys = ys.masked_select(masks) + after_outs = after_outs.masked_select(masks) + before_outs = before_outs.masked_select(masks) + labels = labels.masked_select(masks[:, :, 0]) + logits = logits.masked_select(masks[:, :, 0]) + + # calculate loss + l1_loss = self.l1_criterion(after_outs, ys) + self.l1_criterion(before_outs, ys) + mse_loss = self.mse_criterion(after_outs, ys) + self.mse_criterion( + before_outs, ys + ) + bce_loss = self.bce_criterion(logits, labels) + + # make weighted mask and apply it + if self.use_weighted_masking: + masks = make_non_pad_mask(olens).unsqueeze(-1).to(ys.device) + weights = masks.float() / masks.sum(dim=1, keepdim=True).float() + out_weights = weights.div(ys.size(0) * ys.size(2)) + logit_weights = weights.div(ys.size(0)) + + # apply weight + l1_loss = l1_loss.mul(out_weights).masked_select(masks).sum() + mse_loss = mse_loss.mul(out_weights).masked_select(masks).sum() + bce_loss = ( + bce_loss.mul(logit_weights.squeeze(-1)) + .masked_select(masks.squeeze(-1)) + .sum() + ) + + return l1_loss, mse_loss, bce_loss + + def _load_state_dict_pre_hook( + self, + state_dict, + prefix, + local_metadata, + strict, + missing_keys, + unexpected_keys, + error_msgs, + ): + """Apply pre hook fucntion before loading state dict. + + From v.0.6.1 `bce_criterion.pos_weight` param is registered as a parameter but + old models do not include it and as a result, it causes missing key error when + loading old model parameter. This function solve the issue by adding param in + state dict before loading as a pre hook function + of the `load_state_dict` method. 
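# Illustrative sketch (not part of the upstream patch): the load_state_dict
# pre-hook pattern used above, shown on a tiny module. The hook fills in a
# buffer that older checkpoints are missing so loading does not fail.
# Module and method names here are invented for the example.
import torch

class WithPosWeight(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.bce = torch.nn.BCEWithLogitsLoss(pos_weight=torch.tensor(5.0))
        self._register_load_state_dict_pre_hook(self._fill_missing_pos_weight)

    def _fill_missing_pos_weight(self, state_dict, prefix, *args):
        key = prefix + "bce.pos_weight"
        if key not in state_dict:
            state_dict[key] = self.bce.pos_weight

m = WithPosWeight()
m.load_state_dict({})   # an old-style state dict without pos_weight still loads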
+ + """ + key = prefix + "bce_criterion.pos_weight" + if key not in state_dict: + state_dict[key] = self.bce_criterion.pos_weight + +class GuidedMultiHeadAttentionLoss(GuidedAttentionLoss): + """Guided attention loss function module for multi head attention. + Args: + sigma (float, optional): Standard deviation to control + how close attention to a diagonal. + alpha (float, optional): Scaling coefficient (lambda). + reset_always (bool, optional): Whether to always reset masks. + """ + + def forward(self, att_ws, ilens, olens): + """Calculate forward propagation. + Args: + att_ws (Tensor): + Batch of multi head attention weights (B, H, T_max_out, T_max_in). + ilens (LongTensor): Batch of input lenghts (B,). + olens (LongTensor): Batch of output lenghts (B,). + Returns: + Tensor: Guided attention loss value. + """ + if self.guided_attn_masks is None: + self.guided_attn_masks = ( + self._make_guided_attention_masks(ilens, olens) + .to(att_ws.device) + .unsqueeze(1) + ) + if self.masks is None: + self.masks = self._make_masks(ilens, olens).to(att_ws.device).unsqueeze(1) + losses = self.guided_attn_masks * att_ws + loss = torch.mean(losses.masked_select(self.masks)) + if self.reset_always: + self._reset_masks() + + return self.alpha * loss + + def _make_guided_attention_masks(self, ilens, olens): + n_batches = len(ilens) + max_ilen = max(ilens) + max_olen = max(olens) + guided_attn_masks = torch.zeros((n_batches, max_olen, max_ilen), device=olens.device) + for idx, (ilen, olen) in enumerate(zip(ilens, olens)): + guided_attn_masks[idx, :olen, :ilen] = self._make_guided_attention_mask( + ilen, olen, self.sigma + ) + return guided_attn_masks + + @staticmethod + def _make_guided_attention_mask(ilen, olen, sigma): + grid_x, grid_y = torch.meshgrid(torch.arange(olen, device=olen.device), torch.arange(ilen, device=olen.device)) + grid_x, grid_y = grid_x.float(), grid_y.float() + return 1.0 - torch.exp( + -((grid_y / ilen - grid_x / olen) ** 2) / (2 * (sigma**2)) + ) + + @staticmethod + def _make_masks(ilens, olens): + in_masks = make_non_pad_mask(ilens).to(ilens.device) # (B, T_in) + out_masks = make_non_pad_mask(olens).to(olens.device) # (B, T_out) + return out_masks.unsqueeze(-1) & in_masks.unsqueeze(-2) # (B, T_out, T_in) diff --git a/SpeechT5/speecht5/data/__init__.py b/SpeechT5/speecht5/data/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 diff --git a/SpeechT5/speecht5/data/multitask_dataset.py b/SpeechT5/speecht5/data/multitask_dataset.py new file mode 100644 index 0000000000000000000000000000000000000000..65e13cf0e9b640bf94618e4c746f5d37fbcac3ee --- /dev/null +++ b/SpeechT5/speecht5/data/multitask_dataset.py @@ -0,0 +1,265 @@ +# -------------------------------------------------------- +# SpeechT5: Unified-Modal Encoder-Decoder Pre-Training for Spoken Language Processing (https://arxiv.org/abs/2110.07205) +# Github source: https://github.com/microsoft/SpeechT5/tree/main/SpeechT5 +# Copyright (c) 2021 Microsoft +# Licensed under The MIT License [see LICENSE for details] +# Based on fairseq and espnet code bases +# https://github.com/pytorch/fairseq; https://github.com/espnet/espnet +# -------------------------------------------------------- + +import bisect + +import logging +import numpy as np +from torch.utils.data.dataloader import default_collate +from fairseq.data import data_utils + +from fairseq.data.fairseq_dataset import FairseqDataset + +logger = logging.getLogger(__name__) + +class 
MultitaskDataset(FairseqDataset): + @staticmethod + def cumsum(sequence): + r, s = [], 0 + for e in sequence: + curr_len = len(e) + r.append(curr_len + s) + s += curr_len + return r + + def __init__(self, datasets, sample_ratios=1, batch_ratio=None): + super(MultitaskDataset, self).__init__() + assert len(datasets) > 0, "datasets should not be an empty iterable" + self.datasets = list(datasets) + if isinstance(sample_ratios, int): + sample_ratios = [sample_ratios] * len(self.datasets) + if batch_ratio is not None: + logger.info('batch ratio is ' + str(batch_ratio)) + self.batch_ratio = batch_ratio + else: + self.batch_ratio = None + else: + logger.info('set sample ratio to ' + str(sample_ratios)) + if batch_ratio is not None: + logger.info('batch ratio is ' + str(batch_ratio)) + self.batch_ratio = batch_ratio + else: + self.batch_ratio = None + self.sample_ratios = sample_ratios + self._ordered_indices = None + self._update_size() + + def __len__(self): + return self.cumulative_sizes[-1] + + def __getitem__(self, idx): + dataset_idx, sample_idx = self._get_dataset_and_sample_index(idx) + sample = self.datasets[dataset_idx][sample_idx] + if isinstance(sample, dict): + sample["dataset_idx"] = dataset_idx + else: + sample = sample + (dataset_idx,) + return sample + + def _update_size(self): + self.cumulative_sizes = self.cumsum(self.datasets) + self.real_sizes = [len(d) for d in self.datasets] + + def _get_dataset_and_sample_index(self, idx: int): + dataset_idx = bisect.bisect_right(self.cumulative_sizes, idx) + if dataset_idx == 0: + sample_idx = idx + else: + sample_idx = idx - self.cumulative_sizes[dataset_idx - 1] + sample_idx = sample_idx % self.real_sizes[dataset_idx] + return dataset_idx, sample_idx + + def collater(self, samples, **extra_args): + # For now only supports datasets with same underlying collater implementations + if samples is not None and len(samples) > 0: + if isinstance(samples[0], dict): + dataset_idx = samples[0]["dataset_idx"] + else: + dataset_idx = samples[0][-1] + samples = [sample[:-1] for sample in samples] + else: + dataset_idx = 0 + + if hasattr(self.datasets[dataset_idx], "collater"): + return self.datasets[dataset_idx].collater(samples, **extra_args) + else: + return default_collate(samples, **extra_args) + + def size(self, idx: int): + """ + Return an example's size as a float or tuple. + """ + dataset_idx, sample_idx = self._get_dataset_and_sample_index(idx) + return self.datasets[dataset_idx].size(sample_idx) + + def num_tokens(self, index: int): + return np.max(self.size(index)) + + def attr(self, attr: str, index: int): + dataset_idx = bisect.bisect_right(self.cumulative_sizes, index) + return getattr(self.datasets[dataset_idx], attr, None) + + @property + def sizes(self): + _dataset_sizes = [] + for ds in self.datasets: + if isinstance(ds.sizes, np.ndarray): + _dataset_sizes.append(ds.sizes) + else: + # Only support underlying dataset with single size array. + assert isinstance(ds.sizes, list) + _dataset_sizes.append(ds.sizes[0]) + return np.concatenate(_dataset_sizes) + + @property + def supports_prefetch(self): + return all(d.supports_prefetch for d in self.datasets) + + def ordered_indices(self): + # ordered_indices = [] + # for i, dataset in enumerate(self.datasets): + # indice = dataset.ordered_indices() + # ordered_indices.append(indice) + if self._ordered_indices is None: + # Call the underlying dataset's ordered_indices() here, so that we + # get the same random ordering as we would have from using the + # underlying sub-datasets directly. 
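# Illustrative sketch (not part of the upstream patch): how a flat index is
# mapped to (dataset_idx, sample_idx) via the cumulative sizes, mirroring
# _get_dataset_and_sample_index above. Dataset lengths are toy values.
import bisect

lengths = [3, 5, 2]                # three sub-datasets
cumulative = [3, 8, 10]            # running sum of lengths, as in cumsum()

def locate(idx):
    dataset_idx = bisect.bisect_right(cumulative, idx)
    sample_idx = idx if dataset_idx == 0 else idx - cumulative[dataset_idx - 1]
    return dataset_idx, sample_idx % lengths[dataset_idx]

print([locate(i) for i in range(10)])
# [(0, 0), (0, 1), (0, 2), (1, 0), ..., (2, 0), (2, 1)]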
+ self._ordered_indices = [ + dataset.ordered_indices() + for dataset in self.datasets + ] + return np.arange(len(self)) + + def prefetch(self, indices): + frm = 0 + for to, ds in zip(self.cumulative_sizes, self.datasets): + real_size = len(ds) + if getattr(ds, "supports_prefetch", False): + ds.prefetch([(i - frm) % real_size for i in indices if frm <= i < to]) + frm = to + + def batch_by_size( + self, + indices, + max_tokens=None, + max_sentences=None, + required_batch_size_multiple=1, + ): + if not hasattr(self, "max_tokens"): + self.max_tokens = max_tokens + if not hasattr(self, "max_sentences"): + self.max_sentences = max_sentences + if not hasattr(self, "required_batch_size_multiple"): + self.required_batch_size_multiple = required_batch_size_multiple + batch_samplers = [] + for i, dataset in enumerate(self.datasets): + batch_sampler = dataset.batch_by_size( + self._ordered_indices[i], + max_tokens=max_tokens if self.batch_ratio is None else max_tokens * self.batch_ratio[i], + max_sentences=max_sentences, + required_batch_size_multiple=required_batch_size_multiple, + ) + if i > 0: + for batch in batch_sampler: + batch += self.cumulative_sizes[i - 1] + if self.sample_ratios[i] != 1.0: + batch_sampler = np.array(batch_sampler) + batch_sampler = np.random.choice(batch_sampler, int(len(batch_sampler) * self.sample_ratios[i])) + batch_sampler = list(batch_sampler) + logger.info('Adjust batch by ratio ' + str(self.sample_ratios[i]) + ' and the number of batch is ' + str(int(len(batch_sampler))) + ' for dataset ' + str(i)) + batch_samplers.extend(batch_sampler) + return batch_samplers + + def filter_indices_by_size(self, indices, max_positions): + """ + Filter each sub-dataset independently, then update the round robin to work + on the filtered sub-datasets. + """ + if not hasattr(self, "max_positions"): + self.max_positions = max_positions + ignored_some = False + for i in range(len(self.datasets)): + # ignored = [] + self._ordered_indices[i], ignored = self.datasets[i].filter_indices_by_size( + self._ordered_indices[i], self.max_positions[i] + ) + if len(ignored) > 0: + ignored_some = True + logger.warning( + f"{len(ignored)} samples from {i} have invalid sizes and will be skipped, " + f"max_positions={self.max_positions[i]}, first few sample ids={ignored[:10]}" + ) + + logger.info('update dataset size') + self._update_size() + + # Since we are modifying in place the _ordered_indices, + # it's not possible anymore to return valid ignored indices. + # Hopefully the extra debug information print above should be enough to debug. + # Ideally we would receive ignore_invalid_inputs so that we could have + # a proper error message. 
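# Illustrative sketch (not part of the upstream patch): how per-dataset batches
# are offset into the concatenated index space and thinned by sample_ratios,
# loosely mirroring batch_by_size above. All sizes and batches are toy values.
import numpy as np

cumulative_sizes = [100, 300]      # dataset 0 has 100 items, dataset 1 has 200
batches_ds1 = [np.array([0, 1, 2]), np.array([3, 4, 5]), np.array([6, 7, 8])]

# indices of dataset 1 live after dataset 0 in the concatenated ordering
batches_ds1 = [b + cumulative_sizes[0] for b in batches_ds1]

sample_ratio = 2 / 3               # keep roughly two thirds of the batches
keep = np.random.choice(len(batches_ds1), int(len(batches_ds1) * sample_ratio), replace=False)
print([batches_ds1[i] for i in keep])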
+ return (np.arange(len(self)), [0] if ignored_some else []) + + @property + def can_reuse_epoch_itr_across_epochs(self): + return all(d.can_reuse_epoch_itr_across_epochs for d in self.datasets) + + def set_epoch(self, epoch): + super().set_epoch(epoch) + for ds in self.datasets: + if hasattr(ds, "set_epoch"): + ds.set_epoch(epoch) + + def shuffle_batches(self, batches, seed): + logger.info("shuffle batches") + new_batches_fromlist = [] + new_batches_notlist = [] + new_batches = [] + with data_utils.numpy_seed(seed): + np.random.shuffle(batches) + for batch in batches: + if isinstance(batch, list): + # np.random.shuffle(batch) + new_batches_fromlist.append(batch) + else: + new_batches_notlist.append(batch) + logger.info("Get " + str(len(new_batches_fromlist)) + " chunk from speech sides") + logger.info("Get " + str(sum([len(batch_list) for batch_list in new_batches_fromlist])) + " batches from speech sides") + logger.info("Get " + str(len(new_batches_notlist)) + " batches from text sides") + if len(new_batches_fromlist) == 0: + return new_batches_notlist + st_ratio = int(len(new_batches_notlist) / len(new_batches_fromlist)) + logger.info("Get st_ratio " + str(st_ratio)) + last_idx = 0 + for i in range(len(new_batches_fromlist)): + if i == len(new_batches_fromlist) - 1: + new_batches_fromlist[i].extend(new_batches_notlist[last_idx:]) + else: + new_batches_fromlist[i].extend(new_batches_notlist[last_idx : last_idx + st_ratio]) + np.random.shuffle(new_batches_fromlist[i]) + new_batches.extend(new_batches_fromlist[i]) + last_idx = last_idx + st_ratio + logger.info("Finish shuffle") + return new_batches + + def reset_batch_sampler(self): + logger.info("reset batch sampler") + self._ordered_indices = [ + self.datasets[i].ordered_indices() + for i in range(len(self.datasets)) + ] + self.filter_indices_by_size(None, None) + + batch_samplers = self.batch_by_size( + None, + self.max_tokens, + self.max_sentences, + self.required_batch_size_multiple + ) + return batch_samplers diff --git a/SpeechT5/speecht5/data/speech_dataset.py b/SpeechT5/speecht5/data/speech_dataset.py new file mode 100644 index 0000000000000000000000000000000000000000..c339ee1b0b195abace434e67b9d8518d596de2d6 --- /dev/null +++ b/SpeechT5/speecht5/data/speech_dataset.py @@ -0,0 +1,476 @@ +# -------------------------------------------------------- +# SpeechT5: Unified-Modal Encoder-Decoder Pre-Training for Spoken Language Processing (https://arxiv.org/abs/2110.07205) +# Github source: https://github.com/microsoft/SpeechT5/tree/main/SpeechT5 +# Copyright (c) 2021 Microsoft +# Licensed under The MIT License [see LICENSE for details] +# Based on fairseq and espnet code bases +# https://github.com/pytorch/fairseq; https://github.com/espnet/espnet +# -------------------------------------------------------- + +import itertools +import logging +import os +import sys +from typing import Any, List, Optional, Union + +import numpy as np + +import torch +import torch.nn.functional as F +import librosa +from fairseq.data.audio.speech_to_text_dataset import get_features_or_waveform +from fairseq.data import data_utils +from fairseq.data.fairseq_dataset import FairseqDataset + +logger = logging.getLogger(__name__) + +def _collate_frames( + frames: List[torch.Tensor], is_audio_input: bool = False +): + """ + Convert a list of 2D frames into a padded 3D tensor + Args: + frames (list): list of 2D frames of size L[i]*f_dim. 
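# Illustrative sketch (not part of the upstream patch): what _collate_frames
# produces for a batch of variable-length feature matrices. Toy tensors.
import torch

frames = [torch.ones(3, 2), torch.ones(5, 2)]   # each (n_frames, feat_dim)
max_len = max(f.size(0) for f in frames)
out = frames[0].new_zeros((len(frames), max_len, frames[0].size(1)))
for i, f in enumerate(frames):
    out[i, : f.size(0)] = f
print(out.shape)   # torch.Size([2, 5, 2]); shorter items are zero-padded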
Where L[i] is + length of i-th frame and f_dim is static dimension of features + Returns: + 3D tensor of size len(frames)*len_max*f_dim where len_max is max of L[i] + """ + max_len = max(frame.size(0) for frame in frames) + if is_audio_input: + out = frames[0].new_zeros((len(frames), max_len)) + else: + out = frames[0].new_zeros((len(frames), max_len, frames[0].size(1))) + for i, v in enumerate(frames): + out[i, : v.size(0)] = v + return out + +def add_first_frame_and_remove_last_frame(ys): + ys_in = torch.cat( + [ys.new_zeros((ys.shape[0], 1, ys.shape[2])), ys[:, :-1]], dim=1 + ) + return ys_in + +def load_audio(manifest_path, max_keep, min_keep): + n_long, n_short = 0, 0 + names, inds, sizes, spk_embeds = [], [], [], [] + with open(manifest_path) as f: + root = f.readline().strip() + for ind, line in enumerate(f): + items = line.strip().split("\t") + assert len(items) == 3, line + sz = int(items[1]) + if min_keep is not None and sz < min_keep: + n_short += 1 + elif max_keep is not None and sz > max_keep: + n_long += 1 + else: + names.append(items[0]) + spk_embeds.append(items[2]) + inds.append(ind) + sizes.append(sz) + tot = ind + 1 + logger.info( + ( + f"max_keep={max_keep}, min_keep={min_keep}, " + f"loaded {len(names)}, skipped {n_short} short and {n_long} long, " + f"longest-loaded={max(sizes)}, shortest-loaded={min(sizes)}" + ) + ) + return root, names, inds, tot, sizes, spk_embeds + + +def load_label(label_path, inds, tot): + with open(label_path) as f: + labels = [line.rstrip() for line in f] + assert ( + len(labels) == tot + ), f"number of labels does not match ({len(labels)} != {tot})" + labels = [labels[i] for i in inds] + return labels + + +def load_label_offset(label_path, inds, tot): + with open(label_path) as f: + code_lengths = [len(line.encode("utf-8")) for line in f] + assert ( + len(code_lengths) == tot + ), f"number of labels does not match ({len(code_lengths)} != {tot})" + offsets = list(itertools.accumulate([0] + code_lengths)) + offsets = [(offsets[i], offsets[i + 1]) for i in inds] + return offsets + + +def verify_label_lengths( + audio_sizes, + audio_rate, + label_path, + label_rate, + inds, + tot, + tol=0.1, # tolerance in seconds +): + if label_rate < 0: + logger.info(f"{label_path} is sequence label. skipped") + return + + with open(label_path) as f: + lengths = [len(line.rstrip().split()) for line in f] + assert len(lengths) == tot + lengths = [lengths[i] for i in inds] + num_invalid = 0 + for i, ind in enumerate(inds): + dur_from_audio = audio_sizes[i] / audio_rate + dur_from_label = lengths[i] / label_rate + if abs(dur_from_audio - dur_from_label) > tol: + logger.warning( + ( + f"audio and label duration differ too much " + f"(|{dur_from_audio} - {dur_from_label}| > {tol}) " + f"in line {ind+1} of {label_path}. Check if `label_rate` " + f"is correctly set (currently {label_rate}). " + f"num. of samples = {audio_sizes[i]}; " + f"label length = {lengths[i]}" + ) + ) + num_invalid += 1 + if num_invalid > 0: + logger.warning( + f"total {num_invalid} (audio, label) pairs with mismatched lengths" + ) + + +def logmelfilterbank( + audio, + sampling_rate, + fft_size=1024, + hop_size=256, + win_length=None, + window="hann", + num_mels=80, + fmin=80, + fmax=7600, + eps=1e-10, +): + """Compute log-Mel filterbank feature. + (https://github.com/kan-bayashi/ParallelWaveGAN/blob/master/parallel_wavegan/bin/preprocess.py) + + Args: + audio (ndarray): Audio signal (T,). + sampling_rate (int): Sampling rate. + fft_size (int): FFT size. + hop_size (int): Hop size. 
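# Illustrative sketch (not part of the upstream patch): how load_label_offset
# above turns a label file into byte ranges that get_label can later read with
# f.seek(). The label lines are toy values.
import itertools

label_lines = ["71 12 93\n", "5 5 5 5\n", "8\n"]
code_lengths = [len(line.encode("utf-8")) for line in label_lines]
offsets = list(itertools.accumulate([0] + code_lengths))
offsets = [(offsets[i], offsets[i + 1]) for i in range(len(label_lines))]
print(offsets)   # [(0, 9), (9, 17), (17, 19)]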
+ win_length (int): Window length. If set to None, it will be the same as fft_size. + window (str): Window function type. + num_mels (int): Number of mel basis. + fmin (int): Minimum frequency in mel basis calculation. + fmax (int): Maximum frequency in mel basis calculation. + eps (float): Epsilon value to avoid inf in log calculation. + + Returns: + ndarray: Log Mel filterbank feature (#frames, num_mels). + + """ + # get amplitude spectrogram + x_stft = librosa.stft(audio, n_fft=fft_size, hop_length=hop_size, + win_length=win_length, window=window, pad_mode="reflect") + spc = np.abs(x_stft).T # (#frames, #bins) + + # get mel basis + fmin = 0 if fmin is None else fmin + fmax = sampling_rate / 2 if fmax is None else fmax + mel_basis = librosa.filters.mel(sr=sampling_rate, n_fft=fft_size, n_mels=num_mels, fmin=fmin, fmax=fmax) + + return np.log10(np.maximum(eps, np.dot(spc, mel_basis.T))) + + +class SpeechPretrainDataset(FairseqDataset): + def __init__( + self, + manifest_path: str, + sample_rate: float, + label_paths: List[str], + label_rates: Union[List[float], float], # -1 for sequence labels + pad_list: List[str], + eos_list: List[str], + label_processors: Optional[List[Any]] = None, + max_keep_sample_size: Optional[int] = None, + min_keep_sample_size: Optional[int] = None, + max_sample_size: Optional[int] = None, + shuffle: bool = True, + pad_audio: bool = False, + normalize: bool = False, + store_labels: bool = True, + random_crop: bool = False, + single_target: bool = False, + reduction_factor: int = 1, + ): + self.audio_root, self.audio_names, inds, tot, self.sizes, self.spk_embeds = load_audio( + manifest_path, max_keep_sample_size, min_keep_sample_size + ) + self.sample_rate = sample_rate + self.shuffle = shuffle + self.random_crop = random_crop + + self.num_labels = len(label_paths) + self.pad_list = pad_list + self.eos_list = eos_list + self.label_processors = label_processors + self.single_target = single_target + self.label_rates = ( + [label_rates for _ in range(len(label_paths))] + if isinstance(label_rates, float) + else label_rates + ) + self.store_labels = store_labels + if store_labels: + self.label_list = [load_label(p, inds, tot) for p in label_paths] + else: + self.label_paths = label_paths + self.label_offsets_list = [ + load_label_offset(p, inds, tot) for p in label_paths + ] + assert label_processors is None or len(label_processors) == self.num_labels + for label_path, label_rate in zip(label_paths, self.label_rates): + verify_label_lengths( + self.sizes, sample_rate, label_path, label_rate, inds, tot + ) + + self.max_sample_size = ( + max_sample_size if max_sample_size is not None else sys.maxsize + ) + self.pad_audio = pad_audio + self.normalize = normalize + self.reduction_factor = reduction_factor + logger.info( + f"pad_audio={pad_audio}, random_crop={random_crop}, reduction_factor={reduction_factor}, " + f"normalize={normalize}, max_sample_size={self.max_sample_size}" + ) + + def get_audio(self, index): + import soundfile as sf + + wav_path = os.path.join(self.audio_root, self.audio_names[index]) + wav, cur_sample_rate = sf.read(wav_path) + wav = torch.from_numpy(wav).float() + fbank = logmelfilterbank( + wav.view(-1).cpu().numpy(), 16000 + ) + fbank = torch.from_numpy(fbank).float() + wav = self.postprocess(wav, cur_sample_rate) + return wav, fbank + + def get_label(self, index, label_idx): + if self.store_labels: + label = self.label_list[label_idx][index] + else: + with open(self.label_paths[label_idx]) as f: + offset_s, offset_e = 
self.label_offsets_list[label_idx][index] + f.seek(offset_s) + label = f.read(offset_e - offset_s) + + if self.label_processors is not None: + label = self.label_processors[label_idx](label) + return label + + def get_labels(self, index): + return [self.get_label(index, i) for i in range(self.num_labels)] + + def __getitem__(self, index): + wav, fbank = self.get_audio(index) + labels = self.get_labels(index) + spkembs = get_features_or_waveform( + os.path.join(self.audio_root, self.spk_embeds[index]) + ) + spkembs = torch.from_numpy(spkembs).float() + return {"id": index, "source": wav, "target": fbank, "label_list": labels, 'spkembs': spkembs} + + def __len__(self): + return len(self.sizes) + + def crop_to_max_size(self, wav, target_size): + size = len(wav) + diff = size - target_size + if diff <= 0: + return wav, 0 + + start, end = 0, target_size + if self.random_crop: + start = np.random.randint(0, diff + 1) + end = size - diff + start + return wav[start:end], start + + def collater(self, samples): + # target = max(sizes) -> random_crop not used + # target = max_sample_size -> random_crop used for long + samples = [s for s in samples if s["source"] is not None] + if len(samples) == 0: + return {} + + audios = [s["source"] for s in samples] + audio_sizes = [len(s) for s in audios] + + fbanks = [s["target"] for s in samples] + fbank_sizes = [len(s) for s in fbanks] + + if self.pad_audio: + audio_size = min(max(audio_sizes), self.max_sample_size) + else: + audio_size = min(min(audio_sizes), self.max_sample_size) + collated_audios, padding_mask, audio_starts = self.collater_audio( + audios, audio_size + ) + + collated_fbanks = [] + collated_audios_size = [] + for i in range(len(fbanks)): + fbank_start = int(audio_starts[i] / (audio_sizes[i] / fbank_sizes[i])) + fbank_size = int(audio_size / (audio_sizes[i] / fbank_sizes[i])) + fbank_end = min(fbank_start + fbank_size, fbank_sizes[i]) + collated_fbanks.append(fbanks[i][fbank_start : fbank_end]) + collated_audios_size.append(audio_size) + collated_fbanks_size = [len(s) for s in collated_fbanks] + collated_fbanks = _collate_frames(collated_fbanks) + collated_fbanks_size = torch.tensor(collated_fbanks_size, dtype=torch.long) + + # thin out frames for reduction factor (B, Lmax, odim) -> (B, Lmax//r, odim) + if self.reduction_factor > 1: + collated_fbanks_in = collated_fbanks[:, self.reduction_factor - 1 :: self.reduction_factor] + collated_fbanks_size_in = collated_fbanks_size.new([torch.div(olen, self.reduction_factor, rounding_mode='floor') for olen in collated_fbanks_size]) + else: + collated_fbanks_in, collated_fbanks_size_in = collated_fbanks, collated_fbanks_size + + prev_output_tokens = torch.cat( + [collated_fbanks_in.new_zeros((collated_fbanks_in.shape[0], 1, collated_fbanks_in.shape[2])), collated_fbanks_in[:, :-1]], dim=1 + ) + + # make labels for stop prediction + labels = collated_fbanks.new_zeros(collated_fbanks.size(0), collated_fbanks.size(1)) + for i, l in enumerate(fbank_sizes): + labels[i, l - 1 :] = 1.0 + + spkembs = _collate_frames([s["spkembs"] for s in samples], is_audio_input=True) + + targets_by_label = [ + [s["label_list"][i] for s in samples] for i in range(self.num_labels) + ] + targets_list, lengths_list, ntokens_list = self.collater_label( + targets_by_label, audio_size, audio_starts + ) + + net_input = { + "source": collated_audios, + "padding_mask": padding_mask, + "prev_output_tokens": prev_output_tokens, + "spkembs": spkembs, + "tgt_lengths": collated_fbanks_size_in, + } + + batch = { + "id": 
torch.LongTensor([s["id"] for s in samples]), + "net_input": net_input, + "labels": labels, + "dec_target": collated_fbanks, + "dec_target_lengths": collated_fbanks_size, + "src_lengths": collated_audios_size, + "task_name": 'speech_pretrain', + } + + if self.single_target: + batch["target_lengths"] = lengths_list[0] + batch["ntokens"] = ntokens_list[0] + batch["target"] = targets_list[0] + else: + batch["target_lengths_list"] = lengths_list + batch["ntokens_list"] = ntokens_list + batch["target_list"] = targets_list + return batch + + def collater_audio(self, audios, audio_size): + collated_audios = audios[0].new_zeros(len(audios), audio_size) + padding_mask = ( + torch.BoolTensor(collated_audios.shape).fill_(False) + # if self.pad_audio else None + ) + audio_starts = [0 for _ in audios] + for i, audio in enumerate(audios): + diff = len(audio) - audio_size + if diff == 0: + collated_audios[i] = audio + elif diff < 0: + assert self.pad_audio + collated_audios[i] = torch.cat([audio, audio.new_full((-diff,), 0.0)]) + padding_mask[i, diff:] = True + else: + collated_audios[i], audio_starts[i] = self.crop_to_max_size( + audio, audio_size + ) + return collated_audios, padding_mask, audio_starts + + def collater_frm_label(self, targets, audio_size, audio_starts, label_rate, pad): + assert label_rate > 0 + s2f = label_rate / self.sample_rate + frm_starts = [int(round(s * s2f)) for s in audio_starts] + frm_size = int(round(audio_size * s2f)) + if not self.pad_audio: + rem_size = [len(t) - s for t, s in zip(targets, frm_starts)] + frm_size = min(frm_size, *rem_size) + targets = [t[s : s + frm_size] for t, s in zip(targets, frm_starts)] + logger.debug(f"audio_starts={audio_starts}") + logger.debug(f"frame_starts={frm_starts}") + logger.debug(f"frame_size={frm_size}") + + lengths = torch.LongTensor([len(t) for t in targets]) + ntokens = lengths.sum().item() + targets = data_utils.collate_tokens(targets, pad_idx=pad, left_pad=False) + return targets, lengths, ntokens + + def collater_seq_label(self, targets, pad): + lengths = torch.LongTensor([len(t) for t in targets]) + ntokens = lengths.sum().item() + targets = data_utils.collate_tokens(targets, pad_idx=pad, left_pad=False) + return targets, lengths, ntokens + + def collater_label(self, targets_by_label, audio_size, audio_starts): + targets_list, lengths_list, ntokens_list = [], [], [] + itr = zip(targets_by_label, self.label_rates, self.pad_list) + for targets, label_rate, pad in itr: + if label_rate == -1.0: + targets, lengths, ntokens = self.collater_seq_label(targets, pad) + else: + targets, lengths, ntokens = self.collater_frm_label( + targets, audio_size, audio_starts, label_rate, pad + ) + targets_list.append(targets) + lengths_list.append(lengths) + ntokens_list.append(ntokens) + return targets_list, lengths_list, ntokens_list + + def num_tokens(self, index): + return self.size(index) + + def size(self, index): + if self.pad_audio: + return self.sizes[index] + return min(self.sizes[index], self.max_sample_size) + + def ordered_indices(self): + if self.shuffle: + order = [np.random.permutation(len(self))] + else: + order = [np.arange(len(self))] + + order.append(self.sizes) + return np.lexsort(order)[::-1] + + def postprocess(self, wav, cur_sample_rate): + if wav.dim() == 2: + wav = wav.mean(-1) + assert wav.dim() == 1, wav.dim() + + if cur_sample_rate != self.sample_rate: + raise Exception(f"sr {cur_sample_rate} != {self.sample_rate}") + + if self.normalize: + with torch.no_grad(): + wav = F.layer_norm(wav, wav.shape) + return wav diff --git 
a/SpeechT5/speecht5/data/speech_to_class_dataset.py b/SpeechT5/speecht5/data/speech_to_class_dataset.py new file mode 100644 index 0000000000000000000000000000000000000000..dda301f1c1e78519d941c07d7fc8ea918858cffd --- /dev/null +++ b/SpeechT5/speecht5/data/speech_to_class_dataset.py @@ -0,0 +1,262 @@ +# -------------------------------------------------------- +# SpeechT5: Unified-Modal Encoder-Decoder Pre-Training for Spoken Language Processing (https://arxiv.org/abs/2110.07205) +# Github source: https://github.com/microsoft/SpeechT5/tree/main/SpeechT5 +# Copyright (c) 2021 Microsoft +# Licensed under The MIT License [see LICENSE for details] +# Based on fairseq and espnet code bases +# https://github.com/pytorch/fairseq; https://github.com/espnet/espnet +# -------------------------------------------------------- + +import logging +import os +from typing import Any, List, Optional + +import numpy as np + +import torch +import torch.nn.functional as F +from fairseq.data import data_utils, Dictionary +from fairseq.data.fairseq_dataset import FairseqDataset + +logger = logging.getLogger(__name__) + + +def load_audio(manifest_path, max_keep, min_keep): + """manifest tsv: wav_path, wav_nframe, wav_class + + Args + manifest_path: str + max_keep: int + min_keep: int + + Return + root, names, inds, tot, sizes, classes + """ + n_long, n_short = 0, 0 + names, inds, sizes, classes = [], [], [], [] + with open(manifest_path) as f: + root = f.readline().strip() + for ind, line in enumerate(f): + items = line.strip().split("\t") + assert len(items) >= 2, line + sz = int(items[1]) + if min_keep is not None and sz < min_keep: + n_short += 1 + elif max_keep is not None and sz > max_keep: + n_long += 1 + else: + names.append(items[0]) + if len(items) > 2: + classes.append(items[2]) + inds.append(ind) + sizes.append(sz) + tot = ind + 1 + logger.info( + ( + f"max_keep={max_keep}, min_keep={min_keep}, " + f"loaded {len(names)}, skipped {n_short} short and {n_long} long, " + f"longest-loaded={max(sizes)}, shortest-loaded={min(sizes)}" + ) + ) + if len(classes) == 0: + logger.warn("no classes loaded only if inference") + return root, names, inds, tot, sizes, classes + + +def sample_from_feature(x: np.ndarray, max_segment_length: int = 300): + """Load a segment within 300-400/51200-76800 frames or the corresponding samples from a utterance. + + Args: + x (np.ndarray): feature or waveform (frames[, features]), e.g., log mel filter bank or waveform + max_segment_length (int, optional): maximum segment length. Defaults to 400. 
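# Illustrative sketch (not part of the upstream patch): the random fixed-length
# crop performed by sample_from_feature above for classification-style inputs.
# The array and segment length are toy values.
import numpy as np

x = np.arange(1000)                     # stand-in for a waveform or feature matrix
max_segment_length = 300
if len(x) > max_segment_length:
    start = np.random.randint(0, x.shape[0] - max_segment_length)
    x = x[start : start + max_segment_length]
print(x.shape)   # (300,)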
+ + Returns: + np.ndarray: segmented features + """ + if len(x) <= max_segment_length: + return x + start = np.random.randint(0, x.shape[0] - max_segment_length) + return x[start: start + max_segment_length] + + +class SpeechToClassDataset(FairseqDataset): + def __init__( + self, + manifest_path: str, + sample_rate: float, + label_processors: Optional[List[Any]] = None, + max_keep_sample_size: Optional[int] = None, + min_keep_sample_size: Optional[int] = None, + shuffle: bool = True, + normalize: bool = False, + tgt_dict: Optional[Dictionary] = None, + max_length: Optional[int] = None + ): + self.audio_root, self.audio_names, inds, tot, self.wav_sizes, self.wav_classes = load_audio( + manifest_path, max_keep_sample_size, min_keep_sample_size + ) + self.sample_rate = sample_rate + self.shuffle = shuffle + + self.label_processors = label_processors + + self.normalize = normalize + self.tgt_dict = tgt_dict + self.max_length = max_length + logger.info( + f"max_length={max_length}, normalize={normalize}" + ) + + def get_audio(self, index): + import soundfile as sf + + wav_path = os.path.join(self.audio_root, self.audio_names[index]) + wav, cur_sample_rate = sf.read(wav_path) + if self.max_length is not None: + wav = sample_from_feature(wav, self.max_length) + wav = torch.from_numpy(wav).float() + wav = self.postprocess(wav, cur_sample_rate) + return wav + + def get_label(self, index): + label = self.wav_classes[index] + + if self.label_processors is not None: + label = self.label_processors(label) + return label + + def __getitem__(self, index): + wav = self.get_audio(index) + label = None + if len(self.wav_classes) == len(self.audio_names): + label = self.get_label(index) + return {"id": index, "source": wav, "label": label} + + def __len__(self): + return len(self.wav_sizes) + + def collater(self, samples): + samples = [s for s in samples if s["source"] is not None] + if len(samples) == 0: + return {} + + audios = [s["source"] for s in samples] + audio_sizes = [len(s) for s in audios] + + audio_size = max(audio_sizes) + collated_audios, padding_mask = self.collater_audio( + audios, audio_size + ) + + decoder_label = None + decoder_target = None + decoder_target_lengths = None + if samples[0]["label"] is not None: + targets_by_label = [ + [s["label"] for s in samples] + ] + targets_list, lengths_list, ntokens_list = self.collater_label(targets_by_label) + + decoder_label = [ + (targets_list[0][i, :lengths_list[0][i]]).long() + for i in range(targets_list[0].size(0)) + ] + + decoder_target = data_utils.collate_tokens( + decoder_label, + self.tgt_dict.pad(), + self.tgt_dict.eos(), + left_pad=False, + move_eos_to_beginning=False, + ) + decoder_target_lengths = torch.tensor( + [x.size(0) for x in decoder_label], dtype=torch.long + ) + prev_output_tokens = data_utils.collate_tokens( + [torch.LongTensor([-1]) for _ in samples], + self.tgt_dict.pad(), + self.tgt_dict.eos(), + left_pad=False, + move_eos_to_beginning=True, + ) + + net_input = { + "source": collated_audios, + "padding_mask": padding_mask, + "prev_output_tokens": prev_output_tokens, + "task_name": "s2c", + } + batch = { + "id": torch.LongTensor([s["id"] for s in samples]), + "net_input": net_input, + "target": decoder_target, + "target_lengths": decoder_target_lengths, + "task_name": "s2c", + "ntokens": len(samples), + } + + return batch + + def collater_audio(self, audios, audio_size): + collated_audios = audios[0].new_zeros(len(audios), audio_size) + padding_mask = ( + torch.BoolTensor(collated_audios.shape).fill_(False) + ) + for i, 
audio in enumerate(audios): + diff = len(audio) - audio_size + if diff == 0: + collated_audios[i] = audio + elif diff < 0: + collated_audios[i] = torch.cat([audio, audio.new_full((-diff,), 0.0)]) + padding_mask[i, diff:] = True + else: + raise Exception("Diff should not be larger than 0") + return collated_audios, padding_mask + + def collater_seq_label(self, targets, pad): + lengths = torch.LongTensor([len(t) for t in targets]) + ntokens = lengths.sum().item() + targets = data_utils.collate_tokens(targets, pad_idx=pad, left_pad=False) + return targets, lengths, ntokens + + def collater_label(self, targets_by_label): + targets_list, lengths_list, ntokens_list = [], [], [] + itr = zip(targets_by_label, [self.tgt_dict.pad()]) + for targets, pad in itr: + targets, lengths, ntokens = self.collater_seq_label(targets, pad) + targets_list.append(targets) + lengths_list.append(lengths) + ntokens_list.append(ntokens) + return targets_list, lengths_list, ntokens_list + + def num_tokens(self, index): + return self.size(index) + + def size(self, index): + return self.wav_sizes[index] + + @property + def sizes(self): + return np.array(self.wav_sizes) + + def ordered_indices(self): + if self.shuffle: + order = [np.random.permutation(len(self))] + else: + order = [np.arange(len(self))] + + order.append(self.wav_sizes) + return np.lexsort(order)[::-1] + + def postprocess(self, wav, cur_sample_rate): + if wav.dim() == 2: + wav = wav.mean(-1) + assert wav.dim() == 1, wav.dim() + + if cur_sample_rate != self.sample_rate: + raise Exception(f"sr {cur_sample_rate} != {self.sample_rate}") + + if self.normalize: + with torch.no_grad(): + wav = F.layer_norm(wav, wav.shape) + return wav diff --git a/SpeechT5/speecht5/data/speech_to_speech_dataset.py b/SpeechT5/speecht5/data/speech_to_speech_dataset.py new file mode 100644 index 0000000000000000000000000000000000000000..c9c195d74dacbaaeb9c94aea69b351a75e3bc91a --- /dev/null +++ b/SpeechT5/speecht5/data/speech_to_speech_dataset.py @@ -0,0 +1,282 @@ +# -------------------------------------------------------- +# SpeechT5: Unified-Modal Encoder-Decoder Pre-Training for Spoken Language Processing (https://arxiv.org/abs/2110.07205) +# Github source: https://github.com/microsoft/SpeechT5/tree/main/SpeechT5 +# Copyright (c) 2021 Microsoft +# Licensed under The MIT License [see LICENSE for details] +# Based on fairseq and espnet code bases +# https://github.com/pytorch/fairseq; https://github.com/espnet/espnet +# -------------------------------------------------------- + +import logging +import os +from typing import Any, List, Optional + +import librosa +import numpy as np +import torch +import torch.nn.functional as F +from fairseq.data.fairseq_dataset import FairseqDataset + +logger = logging.getLogger(__name__) + +def _collate_frames( + frames: List[torch.Tensor], is_audio_input: bool = False +): + """ + Convert a list of 2D frames into a padded 3D tensor + Args: + frames (list): list of 2D frames of size L[i]*f_dim. 
Where L[i] is + length of i-th frame and f_dim is static dimension of features + Returns: + 3D tensor of size len(frames)*len_max*f_dim where len_max is max of L[i] + """ + max_len = max(frame.size(0) for frame in frames) + if is_audio_input: + out = frames[0].new_zeros((len(frames), max_len)) + else: + out = frames[0].new_zeros((len(frames), max_len, frames[0].size(1))) + for i, v in enumerate(frames): + out[i, : v.size(0)] = v + return out + +def load_audio(manifest_path, max_keep, min_keep): + """manifest tsv: src_wav, src_nframe, tgt_wav, tgt_nframe, tgt_spkemb""" + n_long, n_short = 0, 0 + src_names, tgt_names, inds, sizes, tgt_sizes, spk_embeds = [], [], [], [], [], [] + with open(manifest_path) as f: + root = f.readline().strip() + for ind, line in enumerate(f): + items = line.strip().split("\t") + assert len(items) >= 2, line + sz = int(items[1]) + if min_keep is not None and sz < min_keep: + n_short += 1 + elif max_keep is not None and sz > max_keep: + n_long += 1 + else: + src_names.append(items[0]) + tgt_names.append(items[2]) + tgt_sizes.append(items[3]) + spk_embeds.append(items[4]) + inds.append(ind) + sizes.append(sz) + tot = ind + 1 + logger.info( + ( + f"max_keep={max_keep}, min_keep={min_keep}, " + f"loaded {len(src_names)}, skipped {n_short} short and {n_long} long, " + f"longest-loaded={max(sizes)}, shortest-loaded={min(sizes)}" + ) + ) + return root, src_names, inds, tot, sizes, tgt_names, tgt_sizes, spk_embeds + + +def logmelfilterbank( + audio, + sampling_rate, + fft_size=1024, + hop_size=256, + win_length=None, + window="hann", + num_mels=80, + fmin=80, + fmax=7600, + eps=1e-10, +): + """Compute log-Mel filterbank feature. + (https://github.com/kan-bayashi/ParallelWaveGAN/blob/master/parallel_wavegan/bin/preprocess.py) + + Args: + audio (ndarray): Audio signal (T,). + sampling_rate (int): Sampling rate. + fft_size (int): FFT size. + hop_size (int): Hop size. + win_length (int): Window length. If set to None, it will be the same as fft_size. + window (str): Window function type. + num_mels (int): Number of mel basis. + fmin (int): Minimum frequency in mel basis calculation. + fmax (int): Maximum frequency in mel basis calculation. + eps (float): Epsilon value to avoid inf in log calculation. + + Returns: + ndarray: Log Mel filterbank feature (#frames, num_mels). 
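+
+    Example (an illustrative sketch, not part of the original code; with
+    librosa's centered STFT, one second of 16 kHz audio yields
+    1 + 16000 // 256 = 63 frames at the default fft_size=1024, hop_size=256)::
+
+        >>> audio = np.random.randn(16000).astype(np.float32)
+        >>> logmelfilterbank(audio, 16000).shape
+        (63, 80)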
+ + """ + # get amplitude spectrogram + x_stft = librosa.stft(audio, n_fft=fft_size, hop_length=hop_size, + win_length=win_length, window=window, pad_mode="reflect") + spc = np.abs(x_stft).T # (#frames, #bins) + + # get mel basis + fmin = 0 if fmin is None else fmin + fmax = sampling_rate / 2 if fmax is None else fmax + mel_basis = librosa.filters.mel(sr=sampling_rate, n_fft=fft_size, n_mels=num_mels, fmin=fmin, fmax=fmax) + + return np.log10(np.maximum(eps, np.dot(spc, mel_basis.T))) + + +class SpeechToSpeechDataset(FairseqDataset): + def __init__( + self, + manifest_path: str, + sample_rate: float, + max_keep_sample_size: Optional[int] = None, + min_keep_sample_size: Optional[int] = None, + shuffle: bool = True, + normalize: bool = False, + reduction_factor: int = 1, + ): + self.audio_root, self.audio_names, inds, tot, self.wav_sizes, self.tgt_audios, self.tgt_sizes, self.tgt_spkembs = load_audio( + manifest_path, max_keep_sample_size, min_keep_sample_size + ) + self.sample_rate = sample_rate + self.shuffle = shuffle + + self.normalize = normalize + self.reduction_factor = reduction_factor + logger.info( + f"reduction_factor={reduction_factor}, normalize={normalize}" + ) + + def get_audio(self, index): + import soundfile as sf + + wav_fbank = [] + for name in [self.audio_names[index], self.tgt_audios[index]]: + wav_path = os.path.join(self.audio_root, name) + wav, cur_sample_rate = sf.read(wav_path) + wav = torch.from_numpy(wav).float() + fbank = logmelfilterbank( + wav.view(-1).cpu().numpy(), 16000 + ) + fbank = torch.from_numpy(fbank).float() + wav = self.postprocess(wav, cur_sample_rate) + wav_fbank.append(wav) + wav_fbank.append(fbank) + src_wav, src_fbank, tgt_wav, tgt_fbank = wav_fbank + return src_wav, src_fbank, tgt_wav, tgt_fbank + + def __getitem__(self, index): + src_wav, src_fbank, tgt_wav, tgt_fbank = self.get_audio(index) + spkembs = np.load(os.path.join(self.audio_root, self.tgt_spkembs[index])) + spkembs = torch.from_numpy(spkembs).float() + name = self.audio_names[index].replace("/", ".").replace(".wav", "") + "-" + self.tgt_audios[index].replace("/", ".").replace(".wav", "") + ".wav" + return {"id": index, "source": src_wav, "target": tgt_fbank, "spkembs": spkembs, "audio_name": name, "tgt_name": self.tgt_audios[index]} + + def __len__(self): + return len(self.wav_sizes) + + def collater(self, samples): + samples = [s for s in samples if s["source"] is not None] + if len(samples) == 0: + return {} + + audios = [s["source"] for s in samples] + audio_sizes = [len(s) for s in audios] + + audio_size = max(audio_sizes) + collated_audios, padding_mask = self.collater_audio( + audios, audio_size + ) + + fbanks = [s["target"] for s in samples] + fbank_sizes = [len(s) for s in fbanks] + + collated_fbanks = _collate_frames(fbanks) + collated_fbanks_size = torch.tensor(fbank_sizes, dtype=torch.long) + + # thin out frames for reduction factor (B, Lmax, odim) -> (B, Lmax//r, odim) + if self.reduction_factor > 1: + collated_fbanks_in = collated_fbanks[:, self.reduction_factor - 1 :: self.reduction_factor] + collated_fbanks_size_in = collated_fbanks_size.new([torch.div(olen, self.reduction_factor, rounding_mode='floor') for olen in collated_fbanks_size]) + else: + collated_fbanks_in, collated_fbanks_size_in = collated_fbanks, collated_fbanks_size + + prev_output_tokens = torch.cat( + [collated_fbanks_in.new_zeros((collated_fbanks_in.shape[0], 1, collated_fbanks_in.shape[2])), collated_fbanks_in[:, :-1]], dim=1 + ) + + # make labels for stop prediction + labels = 
collated_fbanks.new_zeros(collated_fbanks.size(0), collated_fbanks.size(1)) + for i, l in enumerate(fbank_sizes): + labels[i, l - 1 :] = 1.0 + + spkembs = _collate_frames([s["spkembs"] for s in samples], is_audio_input=True) + + net_input = { + "source": collated_audios, + "padding_mask": padding_mask, + "prev_output_tokens": prev_output_tokens, + "tgt_lengths": collated_fbanks_size_in, + "spkembs": spkembs, + "task_name": "s2s", + } + batch = { + "id": torch.LongTensor([s["id"] for s in samples]), + "name": [s["audio_name"] for s in samples], + "tgt_name": [s["tgt_name"] for s in samples], + "net_input": net_input, + "labels": labels, + "dec_target": collated_fbanks, + "dec_target_lengths": collated_fbanks_size, + "src_lengths": torch.LongTensor(audio_sizes), + "task_name": "s2s", + "ntokens": sum(audio_sizes), + "target": collated_fbanks, + } + + return batch + + def collater_audio(self, audios, audio_size): + collated_audios = audios[0].new_zeros(len(audios), audio_size) + padding_mask = ( + torch.BoolTensor(collated_audios.shape).fill_(False) + ) + for i, audio in enumerate(audios): + diff = len(audio) - audio_size + if diff == 0: + collated_audios[i] = audio + elif diff < 0: + collated_audios[i] = torch.cat([audio, audio.new_full((-diff,), 0.0)]) + padding_mask[i, diff:] = True + else: + raise Exception("Diff should not be larger than 0") + return collated_audios, padding_mask + + + def num_tokens(self, index): + return self.wav_sizes[index] + + def size(self, index): + return self.wav_sizes[index], self.tgt_sizes[index] + + @property + def sizes(self): + return np.array(self.wav_sizes) + + @property + def can_reuse_epoch_itr_across_epochs(self): + """No cache dataset if dataset is large-scale. Cache dataset for small dataset.""" + return True + + def ordered_indices(self): + if self.shuffle: + order = [np.random.permutation(len(self))] + else: + order = [np.arange(len(self))] + + order.append(self.wav_sizes) + return np.lexsort(order)[::-1] + + def postprocess(self, wav, cur_sample_rate): + if wav.dim() == 2: + wav = wav.mean(-1) + assert wav.dim() == 1, wav.dim() + + if cur_sample_rate != self.sample_rate: + raise Exception(f"sr {cur_sample_rate} != {self.sample_rate}") + + if self.normalize: + with torch.no_grad(): + wav = F.layer_norm(wav, wav.shape) + return wav diff --git a/SpeechT5/speecht5/data/speech_to_text_dataset.py b/SpeechT5/speecht5/data/speech_to_text_dataset.py new file mode 100644 index 0000000000000000000000000000000000000000..e0be66663d1e8d268e98f7e56abc4c8af2cc4232 --- /dev/null +++ b/SpeechT5/speecht5/data/speech_to_text_dataset.py @@ -0,0 +1,270 @@ +# -------------------------------------------------------- +# SpeechT5: Unified-Modal Encoder-Decoder Pre-Training for Spoken Language Processing (https://arxiv.org/abs/2110.07205) +# Github source: https://github.com/microsoft/SpeechT5/tree/main/SpeechT5 +# Copyright (c) 2021 Microsoft +# Licensed under The MIT License [see LICENSE for details] +# Based on fairseq and espnet code bases +# https://github.com/pytorch/fairseq; https://github.com/espnet/espnet +# -------------------------------------------------------- + +import itertools +import logging +import os +from typing import Any, List, Optional + +import numpy as np + +import torch +import torch.nn.functional as F +from fairseq.data import data_utils, Dictionary +from fairseq.data.fairseq_dataset import FairseqDataset + +logger = logging.getLogger(__name__) + + +def load_audio(manifest_path, max_keep, min_keep): + n_long, n_short = 0, 0 + names, inds, sizes = 
[], [], [] + with open(manifest_path) as f: + root = f.readline().strip() + for ind, line in enumerate(f): + items = line.strip().split("\t") + assert len(items) >= 2, line + sz = int(items[1]) + if min_keep is not None and sz < min_keep: + n_short += 1 + elif max_keep is not None and sz > max_keep: + n_long += 1 + else: + names.append(items[0]) + inds.append(ind) + sizes.append(sz) + tot = ind + 1 + logger.info( + ( + f"max_keep={max_keep}, min_keep={min_keep}, " + f"loaded {len(names)}, skipped {n_short} short and {n_long} long, " + f"longest-loaded={max(sizes)}, shortest-loaded={min(sizes)}" + ) + ) + return root, names, inds, tot, sizes + + +def load_label(label_path, inds, tot): + with open(label_path) as f: + labels = [line.rstrip() for line in f] + assert ( + len(labels) == tot + ), f"number of labels does not match ({len(labels)} != {tot})" + labels = [labels[i] for i in inds] + return labels + + +def load_label_offset(label_path, inds, tot): + with open(label_path) as f: + code_lengths = [len(line.encode("utf-8")) for line in f] + assert ( + len(code_lengths) == tot + ), f"number of labels does not match ({len(code_lengths)} != {tot})" + offsets = list(itertools.accumulate([0] + code_lengths)) + offsets = [(offsets[i], offsets[i + 1]) for i in inds] + return offsets + + +class SpeechToTextDataset(FairseqDataset): + def __init__( + self, + manifest_path: str, + sample_rate: float, + label_paths: List[str], + label_processors: Optional[List[Any]] = None, + max_keep_sample_size: Optional[int] = None, + min_keep_sample_size: Optional[int] = None, + shuffle: bool = True, + normalize: bool = False, + store_labels: bool = True, + tgt_dict: Optional[Dictionary] = None, + tokenizer = None, + ): + self.audio_root, self.audio_names, inds, tot, self.wav_sizes = load_audio( + manifest_path, max_keep_sample_size, min_keep_sample_size + ) + self.sample_rate = sample_rate + self.shuffle = shuffle + self.tgt_dict = tgt_dict + self.tokenizer = tokenizer + + self.num_labels = len(label_paths) + self.label_processors = label_processors + self.store_labels = store_labels + if store_labels: + self.label_list = [load_label(p, inds, tot) for p in label_paths] + else: + self.label_paths = label_paths + self.label_offsets_list = [ + load_label_offset(p, inds, tot) for p in label_paths + ] + assert label_processors is None or len(label_processors) == self.num_labels + + self.normalize = normalize + logger.info( + f"normalize={normalize}" + ) + + def get_audio(self, index): + import soundfile as sf + + wav_path = os.path.join(self.audio_root, self.audio_names[index]) + wav, cur_sample_rate = sf.read(wav_path) + wav = torch.from_numpy(wav).float() + wav = self.postprocess(wav, cur_sample_rate) + return wav + + def get_label(self, index, label_idx): + if self.store_labels: + label = self.label_list[label_idx][index] + else: + with open(self.label_paths[label_idx]) as f: + offset_s, offset_e = self.label_offsets_list[label_idx][index] + f.seek(offset_s) + label = f.read(offset_e - offset_s) + + if self.tokenizer is not None: + label = self.tokenizer.encode(label) + + if self.label_processors is not None: + label = self.label_processors[label_idx](label) + return label + + def get_labels(self, index): + return [self.get_label(index, i) for i in range(self.num_labels)] + + def __getitem__(self, index): + wav = self.get_audio(index) + labels = self.get_labels(index) + return {"id": index, "source": wav, "label_list": labels} + + def __len__(self): + return len(self.wav_sizes) + + def collater(self, samples): + 
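+        """Collate waveforms and token labels into a padded "s2t" mini-batch.
+
+        Summary of the structure assembled below: ``net_input`` carries the
+        zero-padded waveforms (``source``), their ``padding_mask`` and the
+        eos-shifted ``prev_output_tokens``; ``target`` / ``target_lengths``
+        hold the eos-terminated label sequences padded with ``tgt_dict.pad()``.
+        """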
samples = [s for s in samples if s["source"] is not None] + if len(samples) == 0: + return {} + + audios = [s["source"] for s in samples] + audio_sizes = [len(s) for s in audios] + + audio_size = max(audio_sizes) + collated_audios, padding_mask = self.collater_audio( + audios, audio_size + ) + + targets_by_label = [ + [s["label_list"][i] for s in samples] for i in range(self.num_labels) + ] + targets_list, lengths_list, ntokens_list = self.collater_label(targets_by_label) + + decoder_label = [ + torch.cat((targets_list[0][i, :lengths_list[0][i]], torch.tensor([self.tgt_dict.eos()])), 0).long() + for i in range(targets_list[0].size(0)) + ] + + decoder_target = data_utils.collate_tokens( + decoder_label, + self.tgt_dict.pad(), + self.tgt_dict.eos(), + left_pad=False, + move_eos_to_beginning=False, + ) + decoder_target_lengths = torch.tensor( + [x.size(0) for x in decoder_label], dtype=torch.long + ) + prev_output_tokens = data_utils.collate_tokens( + decoder_label, + self.tgt_dict.pad(), + self.tgt_dict.eos(), + left_pad=False, + move_eos_to_beginning=True, + ) + + net_input = { + "source": collated_audios, + "padding_mask": padding_mask, + "prev_output_tokens": prev_output_tokens, + "task_name": "s2t", + } + batch = { + "id": torch.LongTensor([s["id"] for s in samples]), + "net_input": net_input, + "target": decoder_target, + "target_lengths": decoder_target_lengths, + "task_name": "s2t", + "ntokens": ntokens_list[0] + } + + return batch + + def collater_audio(self, audios, audio_size): + collated_audios = audios[0].new_zeros(len(audios), audio_size) + padding_mask = ( + torch.BoolTensor(collated_audios.shape).fill_(False) + ) + for i, audio in enumerate(audios): + diff = len(audio) - audio_size + if diff == 0: + collated_audios[i] = audio + elif diff < 0: + collated_audios[i] = torch.cat([audio, audio.new_full((-diff,), 0.0)]) + padding_mask[i, diff:] = True + else: + raise Exception("Diff should not be larger than 0") + return collated_audios, padding_mask + + def collater_seq_label(self, targets, pad): + lengths = torch.LongTensor([len(t) for t in targets]) + ntokens = lengths.sum().item() + targets = data_utils.collate_tokens(targets, pad_idx=pad, left_pad=False) + return targets, lengths, ntokens + + def collater_label(self, targets_by_label): + targets_list, lengths_list, ntokens_list = [], [], [] + itr = zip(targets_by_label, [self.tgt_dict.pad()]) + for targets, pad in itr: + targets, lengths, ntokens = self.collater_seq_label(targets, pad) + targets_list.append(targets) + lengths_list.append(lengths) + ntokens_list.append(ntokens) + return targets_list, lengths_list, ntokens_list + + def num_tokens(self, index): + return self.size(index) + + def size(self, index): + return self.wav_sizes[index] + + @property + def sizes(self): + return np.array(self.wav_sizes) + + def ordered_indices(self): + if self.shuffle: + order = [np.random.permutation(len(self))] + else: + order = [np.arange(len(self))] + + order.append(self.wav_sizes) + return np.lexsort(order)[::-1] + + def postprocess(self, wav, cur_sample_rate): + if wav.dim() == 2: + wav = wav.mean(-1) + assert wav.dim() == 1, wav.dim() + + if cur_sample_rate != self.sample_rate: + raise Exception(f"sr {cur_sample_rate} != {self.sample_rate}") + + if self.normalize: + with torch.no_grad(): + wav = F.layer_norm(wav, wav.shape) + return wav diff --git a/SpeechT5/speecht5/data/text_dataset.py b/SpeechT5/speecht5/data/text_dataset.py new file mode 100644 index 
0000000000000000000000000000000000000000..faa0120d25807c3d7420ecd5848ca7429fc79dd5 --- /dev/null +++ b/SpeechT5/speecht5/data/text_dataset.py @@ -0,0 +1,476 @@ +# -------------------------------------------------------- +# SpeechT5: Unified-Modal Encoder-Decoder Pre-Training for Spoken Language Processing (https://arxiv.org/abs/2110.07205) +# Github source: https://github.com/microsoft/SpeechT5/tree/main/SpeechT5 +# Copyright (c) 2021 Microsoft +# Licensed under The MIT License [see LICENSE for details] +# Based on fairseq and espnet code bases +# https://github.com/pytorch/fairseq; https://github.com/espnet/espnet +# -------------------------------------------------------- + +import math + +import numpy as np +import torch + +from fairseq.data import FairseqDataset, data_utils + + +def collate( + samples, + pad_idx, + eos_idx, + vocab, + left_pad_source=False, + left_pad_target=False, + input_feeding=True, + pad_to_length=None, +): + assert input_feeding + if len(samples) == 0: + return {} + + def merge(key, left_pad, move_eos_to_beginning=False, pad_to_length=None): + return data_utils.collate_tokens( + [s[key] for s in samples], + pad_idx, + eos_idx=None, # use eos_idx of each sample instead of vocab.eos() + left_pad=left_pad, + move_eos_to_beginning=move_eos_to_beginning, + pad_to_length=pad_to_length, + ) + + id = torch.LongTensor([s["id"] for s in samples]) + src_tokens = merge( + "source", + left_pad=left_pad_source, + pad_to_length=pad_to_length["source"] if pad_to_length is not None else None, + ) + # sort by descending source length + src_lengths = torch.LongTensor([s["source"].numel() for s in samples]) + src_lengths, sort_order = src_lengths.sort(descending=True) + id = id.index_select(0, sort_order) + src_tokens = src_tokens.index_select(0, sort_order) + + prev_output_tokens = None + target = None + if samples[0].get("target", None) is not None: + target = merge( + "target", + left_pad=left_pad_target, + pad_to_length=pad_to_length["target"] + if pad_to_length is not None + else None, + ) + target = target.index_select(0, sort_order) + ntokens = sum(len(s["target"]) for s in samples) + + if input_feeding: + # we create a shifted version of targets for feeding the + # previous output token(s) into the next decoder step + prev_output_tokens = merge( + "target", + left_pad=left_pad_target, + move_eos_to_beginning=True, + pad_to_length=pad_to_length["target"] + if pad_to_length is not None + else None, + ) + prev_output_tokens = prev_output_tokens.index_select(0, sort_order) + else: + ntokens = sum(len(s["source"]) for s in samples) + + batch = { + "id": id, + "ntokens": ntokens, + "net_input": { + "src_tokens": src_tokens, + "src_lengths": src_lengths, + }, + "target": target, + "nsentences": samples[0]["source"].size(0), + "sort_order": sort_order, + "task_name": 'text_pretrain', + } + if prev_output_tokens is not None: + batch["net_input"]["prev_output_tokens"] = prev_output_tokens + + return batch + + +class TextPretrainDataset(FairseqDataset): + """ + A wrapper around TokenBlockDataset for BART dataset. + + Args: + dataset (TokenBlockDataset): dataset to wrap + sizes (List[int]): sentence lengths + vocab (~fairseq.data.Dictionary): vocabulary + mask_idx (int): dictionary index used for masked token + mask_whole_words: only mask whole words. This should be a byte mask + over vocab indices, indicating whether it is the beginning of a + word. We will extend any mask to encompass the whole word. + shuffle (bool, optional): shuffle the elements before batching. 
+ Default: ``True`` + seed: Seed for random number generator for reproducibility. + args: argparse arguments. + """ + + def __init__( + self, + dataset, + sizes, + vocab, + mask_idx, + mask_whole_words, + shuffle, + seed, + args, + eos=None, + item_transform_func=None, + iid_noise_target=False, + uni_mask_idxs=None, + ): + self.dataset = dataset + + self.sizes = sizes + + self.vocab = vocab + self.shuffle = shuffle + self.seed = seed + if iid_noise_target: + assert isinstance(uni_mask_idxs, torch.Tensor), "if use iid_noise_target, the uni_mask_idxs must be a tensor which contain the mask indexs" + self.iid_noise_target = iid_noise_target + self.uni_mask_idxs = uni_mask_idxs + self.mask_idx = mask_idx + self.mask_whole_word = mask_whole_words + self.mask_ratio = args.mask + self.random_ratio = args.mask_random + self.insert_ratio = args.insert + self.rotate_ratio = args.rotate + self.permute_sentence_ratio = args.permute_sentences + self.eos = eos if eos is not None else vocab.eos() + self.item_transform_func = item_transform_func + + if args.bpe != "gpt2": + self.full_stop_index = self.vocab.eos() + else: + assert args.bpe == "gpt2" + self.full_stop_index = self.vocab.index("13") + + self.replace_length = args.replace_length + if self.replace_length not in [-1, 0, 1]: + raise ValueError(f"invalid arg: replace_length={self.replace_length}") + if args.mask_length not in ["subword", "word", "span-poisson"]: + raise ValueError(f"invalid arg: mask-length={args.mask_length}") + if args.mask_length == "subword" and args.replace_length not in [0, 1]: + raise ValueError(f"if using subwords, use replace-length=1 or 0") + + self.mask_span_distribution = None + if args.mask_length == "span-poisson": + _lambda = args.poisson_lambda + + lambda_to_the_k = 1 + e_to_the_minus_lambda = math.exp(-_lambda) + k_factorial = 1 + ps = [] + for k in range(0, 128): + ps.append(e_to_the_minus_lambda * lambda_to_the_k / k_factorial) + lambda_to_the_k *= _lambda + k_factorial *= k + 1 + if ps[-1] < 0.0000001: + break + ps = torch.FloatTensor(ps) + self.mask_span_distribution = torch.distributions.Categorical(ps) + + self.epoch = 0 + + @property + def can_reuse_epoch_itr_across_epochs(self): + return True # only the noise changes, not item sizes + + def set_epoch(self, epoch, **unused): + self.epoch = epoch + + def __getitem__(self, index): + with data_utils.numpy_seed(self.seed, self.epoch, index): + tokens = self.dataset[index] + assert tokens[-1] == self.eos + source, target = tokens, tokens.clone() + + if self.permute_sentence_ratio > 0.0: + source = self.permute_sentences(source, self.permute_sentence_ratio) + + if self.mask_ratio > 0: + source, new_target = self.add_whole_word_mask(source, self.mask_ratio) + if new_target is not None: + target = new_target + + if self.insert_ratio > 0: + source = self.add_insertion_noise(source, self.insert_ratio) + + if self.rotate_ratio > 0.0 and np.random.random() < self.rotate_ratio: + source = self.add_rolling_noise(source) + # there can additional changes to make: + if self.item_transform_func is not None: + source, target = self.item_transform_func(source, target) + + assert (source >= 0).all() + assert (source[1:-1] >= 1).all() + assert (source <= len(self.vocab)).all() + assert source[0] == self.vocab.bos() + assert source[-1] == self.eos + return { + "id": index, + "source": source, + "target": target, + } + + def __len__(self): + return len(self.dataset) + + def permute_sentences(self, source, p=1.0): + full_stops = source == self.full_stop_index + # Pretend it ends 
with a full stop so last span is a sentence + full_stops[-2] = 1 + + # Tokens that are full stops, where the previous token is not + sentence_ends = (full_stops[1:] * ~full_stops[:-1]).nonzero(as_tuple=False) + 2 + result = source.clone() + + num_sentences = sentence_ends.size(0) + num_to_permute = math.ceil((num_sentences * 2 * p) / 2.0) + substitutions = torch.randperm(num_sentences)[:num_to_permute] + ordering = torch.arange(0, num_sentences) + ordering[substitutions] = substitutions[torch.randperm(num_to_permute)] + + # Ignore <bos> at start + index = 1 + for i in ordering: + sentence = source[(sentence_ends[i - 1] if i > 0 else 1) : sentence_ends[i]] + result[index : index + sentence.size(0)] = sentence + index += sentence.size(0) + return result + + def word_starts(self, source): + if self.mask_whole_word is not None: + is_word_start = self.mask_whole_word.gather(0, source) + else: + is_word_start = torch.ones(source.size()) + is_word_start[0] = 0 + is_word_start[-1] = 0 + return is_word_start + + def add_whole_word_mask(self, source, p): + source_ori = source.clone() + is_word_start = self.word_starts(source) + num_to_mask = int(math.ceil(is_word_start.float().sum() * p)) + num_inserts = 0 + if num_to_mask == 0: + return source + + if self.mask_span_distribution is not None: + lengths = self.mask_span_distribution.sample(sample_shape=(num_to_mask,)) + + # Make sure we have enough to mask + cum_length = torch.cumsum(lengths, 0) + while cum_length[-1] < num_to_mask: + lengths = torch.cat( + [ + lengths, + self.mask_span_distribution.sample(sample_shape=(num_to_mask,)), + ], + dim=0, + ) + cum_length = torch.cumsum(lengths, 0) + + # Trim to masking budget + i = 0 + while cum_length[i] < num_to_mask: + i += 1 + lengths[i] = num_to_mask - (0 if i == 0 else cum_length[i - 1]) + num_to_mask = i + 1 + lengths = lengths[:num_to_mask] + + # Handle 0-length mask (inserts) separately + lengths = lengths[lengths > 0] + num_inserts = num_to_mask - lengths.size(0) + num_to_mask -= num_inserts + if num_to_mask == 0: + return self.add_insertion_noise(source, num_inserts / source.size(0)) + + assert (lengths > 0).all() + else: + lengths = torch.ones((num_to_mask,)).long() + assert is_word_start[-1] == 0 + word_starts = is_word_start.nonzero(as_tuple=False) + indices = word_starts[ + torch.randperm(word_starts.size(0))[:num_to_mask] + ].squeeze(1) + mask_random = torch.FloatTensor(num_to_mask).uniform_() < self.random_ratio + + source_length = source.size(0) + assert source_length - 1 not in indices + to_keep = torch.ones(source_length, dtype=torch.bool) + is_word_start[ + -1 + ] = 255 # acts as a long length, so spans don't go over the end of doc + if self.replace_length == 0: + to_keep[indices] = 0 + else: + # keep index, but replace it with [MASK] + source[indices] = self.mask_idx + source[indices[mask_random]] = torch.randint( + 1, len(self.vocab), size=(mask_random.sum(),) + ) + + if self.mask_span_distribution is not None: + assert len(lengths.size()) == 1 + assert lengths.size() == indices.size() + lengths -= 1 + while indices.size(0) > 0: + assert lengths.size() == indices.size() + lengths -= is_word_start[indices + 1].long() + uncompleted = lengths >= 0 + indices = indices[uncompleted] + 1 + mask_random = mask_random[uncompleted] + lengths = lengths[uncompleted] + if self.replace_length != -1: + # delete token + to_keep[indices] = 0 + else: + # keep index, but replace it with [MASK] + source[indices] = self.mask_idx + source[indices[mask_random]] = torch.randint( + 1, len(self.vocab), 
size=(mask_random.sum(),) + ) + else: + # A bit faster when all lengths are 1 + while indices.size(0) > 0: + uncompleted = is_word_start[indices + 1] == 0 + indices = indices[uncompleted] + 1 + mask_random = mask_random[uncompleted] + if self.replace_length != -1: + # delete token + to_keep[indices] = 0 + else: + # keep index, but replace it with [MASK] + source[indices] = self.mask_idx + source[indices[mask_random]] = torch.randint( + 1, len(self.vocab), size=(mask_random.sum(),) + ) + + assert source_length - 1 not in indices + + if not self.iid_noise_target: + source = source[to_keep] + target = None + else: + ## Prepare source + source_mask_idx = (source == self.mask_idx).nonzero().view(-1) + source[source_mask_idx] = self.uni_mask_idxs[:source_mask_idx.size(0)] + source = source[to_keep] + + ## Prepare target + to_keep[source_mask_idx] = 0 + + # source_mask_idx: from [a, b, c, ...] to [a, b + 1, c + 2, ...] + source_mask_idx = source_mask_idx + torch.arange(source_mask_idx.size(0)) + # target: source_length + mask_length + target = source_ori.new_zeros(source_mask_idx.size(0) + source_ori.size(0)) + # target: [0, 0, 0, X, 0, 0, Y, ....] + target[source_mask_idx] = self.uni_mask_idxs[:source_mask_idx.size(0)] + + target_to_keep = to_keep.new_zeros(source_mask_idx.size(0) + source_ori.size(0)) + + # Copy original value to target and target_to_keep + target_to_keep[target == 0] = to_keep + target_to_keep[-1] = 0 + target[target == 0] = source_ori + + target = target[~target_to_keep] + + if num_inserts > 0: + source = self.add_insertion_noise(source, num_inserts / source.size(0)) + + return source, target + + def add_permuted_noise(self, tokens, p): + num_words = len(tokens) + num_to_permute = math.ceil(((num_words * 2) * p) / 2.0) + substitutions = torch.randperm(num_words - 2)[:num_to_permute] + 1 + tokens[substitutions] = tokens[substitutions[torch.randperm(num_to_permute)]] + return tokens + + def add_rolling_noise(self, tokens): + offset = np.random.randint(1, max(1, tokens.size(-1) - 1) + 1) + tokens = torch.cat( + (tokens[0:1], tokens[offset:-1], tokens[1:offset], tokens[-1:]), + dim=0, + ) + return tokens + + def add_insertion_noise(self, tokens, p): + if p == 0.0: + return tokens + + num_tokens = len(tokens) + n = int(math.ceil(num_tokens * p)) + + noise_indices = torch.randperm(num_tokens + n - 2)[:n] + 1 + noise_mask = torch.zeros(size=(num_tokens + n,), dtype=torch.bool) + noise_mask[noise_indices] = 1 + result = torch.LongTensor(n + len(tokens)).fill_(-1) + + num_random = int(math.ceil(n * self.random_ratio)) + result[noise_indices[num_random:]] = self.mask_idx + result[noise_indices[:num_random]] = torch.randint( + low=1, high=len(self.vocab), size=(num_random,) + ) + + result[~noise_mask] = tokens + + assert (result >= 0).all() + return result + + def collater(self, samples, pad_to_length=None): + """Merge a list of samples to form a mini-batch. + Args: + samples (List[dict]): samples to collate + Returns: + dict: a mini-batch of data + """ + return collate( + samples, self.vocab.pad(), self.eos, self.vocab, pad_to_length=pad_to_length + ) + + def num_tokens(self, index): + """Return the number of tokens in a sample. This value is used to + enforce ``--max-tokens`` during batching.""" + return self.sizes[index] + + def size(self, index): + """Return an example's size as a float or tuple. This value is used when + filtering a dataset with ``--max-positions``.""" + return self.sizes[index] + + def ordered_indices(self): + """Return an ordered list of indices. 
Batches will be constructed based + on this order.""" + if self.shuffle: + indices = np.random.permutation(len(self)) + else: + indices = np.arange(len(self)) + return indices[np.argsort(self.sizes[indices], kind="mergesort")] + + def prefetch(self, indices): + self.src.prefetch(indices) + self.tgt.prefetch(indices) + + @property + def supports_prefetch(self): + return ( + hasattr(self.src, "supports_prefetch") + and self.src.supports_prefetch + and hasattr(self.tgt, "supports_prefetch") + and self.tgt.supports_prefetch + ) diff --git a/SpeechT5/speecht5/data/text_to_speech_dataset.py b/SpeechT5/speecht5/data/text_to_speech_dataset.py new file mode 100644 index 0000000000000000000000000000000000000000..e0e0d750d142fed26f555a88062e25fabf0f0153 --- /dev/null +++ b/SpeechT5/speecht5/data/text_to_speech_dataset.py @@ -0,0 +1,331 @@ +# -------------------------------------------------------- +# SpeechT5: Unified-Modal Encoder-Decoder Pre-Training for Spoken Language Processing (https://arxiv.org/abs/2110.07205) +# Github source: https://github.com/microsoft/SpeechT5/tree/main/SpeechT5 +# Copyright (c) 2021 Microsoft +# Licensed under The MIT License [see LICENSE for details] +# Based on fairseq and espnet code bases +# https://github.com/pytorch/fairseq; https://github.com/espnet/espnet +# -------------------------------------------------------- + +import itertools +import logging +import os +from typing import Any, List, Optional + +import numpy as np + +import torch +import torch.nn.functional as F +import librosa +from fairseq.data.audio.speech_to_text_dataset import get_features_or_waveform +from fairseq.data import data_utils, Dictionary +from fairseq.data.fairseq_dataset import FairseqDataset + + +logger = logging.getLogger(__name__) + +def _collate_frames( + frames: List[torch.Tensor], is_audio_input: bool = False +): + """ + Convert a list of 2D frames into a padded 3D tensor + Args: + frames (list): list of 2D frames of size L[i]*f_dim. 
Where L[i] is + length of i-th frame and f_dim is static dimension of features + Returns: + 3D tensor of size len(frames)*len_max*f_dim where len_max is max of L[i] + """ + max_len = max(frame.size(0) for frame in frames) + if is_audio_input: + out = frames[0].new_zeros((len(frames), max_len)) + else: + out = frames[0].new_zeros((len(frames), max_len, frames[0].size(1))) + for i, v in enumerate(frames): + out[i, : v.size(0)] = v + return out + +def load_audio(manifest_path, max_keep, min_keep): + n_long, n_short = 0, 0 + names, inds, sizes, spk_embeds = [], [], [], [] + with open(manifest_path) as f: + root = f.readline().strip() + for ind, line in enumerate(f): + items = line.strip().split("\t") + assert len(items) == 3, line + sz = int(items[1]) + if min_keep is not None and sz < min_keep: + n_short += 1 + elif max_keep is not None and sz > max_keep: + n_long += 1 + else: + names.append(items[0]) + spk_embeds.append(items[2]) + inds.append(ind) + sizes.append(sz) + tot = ind + 1 + logger.info( + ( + f"max_keep={max_keep}, min_keep={min_keep}, " + f"loaded {len(names)}, skipped {n_short} short and {n_long} long, " + f"longest-loaded={max(sizes)}, shortest-loaded={min(sizes)}" + ) + ) + return root, names, inds, tot, sizes, spk_embeds + + +def load_label(label_path, inds, tot): + with open(label_path) as f: + labels = [line.rstrip() for line in f] + assert ( + len(labels) == tot + ), f"number of labels does not match ({len(labels)} != {tot})" + labels = [labels[i] for i in inds] + return labels + + +def load_label_offset(label_path, inds, tot): + with open(label_path) as f: + code_lengths = [len(line.encode("utf-8")) for line in f] + assert ( + len(code_lengths) == tot + ), f"number of labels does not match ({len(code_lengths)} != {tot})" + offsets = list(itertools.accumulate([0] + code_lengths)) + offsets = [(offsets[i], offsets[i + 1]) for i in inds] + return offsets + + +def logmelfilterbank( + audio, + sampling_rate, + fft_size=1024, + hop_size=256, + win_length=None, + window="hann", + num_mels=80, + fmin=80, + fmax=7600, + eps=1e-10, +): + """Compute log-Mel filterbank feature. + (https://github.com/kan-bayashi/ParallelWaveGAN/blob/master/parallel_wavegan/bin/preprocess.py) + + Args: + audio (ndarray): Audio signal (T,). + sampling_rate (int): Sampling rate. + fft_size (int): FFT size. + hop_size (int): Hop size. + win_length (int): Window length. If set to None, it will be the same as fft_size. + window (str): Window function type. + num_mels (int): Number of mel basis. + fmin (int): Minimum frequency in mel basis calculation. + fmax (int): Maximum frequency in mel basis calculation. + eps (float): Epsilon value to avoid inf in log calculation. + + Returns: + ndarray: Log Mel filterbank feature (#frames, num_mels). 
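+
+    Example (illustrative only; mirrors the call made in ``get_audio`` below)::
+
+        >>> wav = torch.zeros(16000)
+        >>> fbank = logmelfilterbank(wav.view(-1).cpu().numpy(), 16000)
+        >>> torch.from_numpy(fbank).float().shape
+        torch.Size([63, 80])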
+ + """ + # get amplitude spectrogram + x_stft = librosa.stft(audio, n_fft=fft_size, hop_length=hop_size, + win_length=win_length, window=window, pad_mode="reflect") + spc = np.abs(x_stft).T # (#frames, #bins) + + # get mel basis + fmin = 0 if fmin is None else fmin + fmax = sampling_rate / 2 if fmax is None else fmax + mel_basis = librosa.filters.mel(sr=sampling_rate, n_fft=fft_size, n_mels=num_mels, fmin=fmin, fmax=fmax) + + return np.log10(np.maximum(eps, np.dot(spc, mel_basis.T))) + + + +class TextToSpeechDataset(FairseqDataset): + def __init__( + self, + manifest_path: str, + sample_rate: float, + label_paths: List[str], + label_processors: Optional[List[Any]] = None, + max_keep_sample_size: Optional[int] = None, + min_keep_sample_size: Optional[int] = None, + shuffle: bool = True, + normalize: bool = False, + store_labels: bool = True, + src_dict: Optional[Dictionary] = None, + tokenizer = None, + reduction_factor: int = 1, + ): + self.audio_root, self.audio_names, inds, tot, self.wav_sizes, self.spk_embeds = load_audio( + manifest_path, max_keep_sample_size, min_keep_sample_size + ) + self.sample_rate = sample_rate + self.shuffle = shuffle + self.src_dict = src_dict + self.tokenizer = tokenizer + + self.num_labels = len(label_paths) + self.label_processors = label_processors + self.store_labels = store_labels + if store_labels: + self.label_list = [load_label(p, inds, tot) for p in label_paths] + else: + self.label_paths = label_paths + self.label_offsets_list = [ + load_label_offset(p, inds, tot) for p in label_paths + ] + assert label_processors is None or len(label_processors) == self.num_labels + + self.normalize = normalize + self.reduction_factor = reduction_factor + logger.info( + f"reduction_factor={reduction_factor}, normalize={normalize}" + ) + + def get_audio(self, index): + import soundfile as sf + + wav_path = os.path.join(self.audio_root, self.audio_names[index]) + wav, cur_sample_rate = sf.read(wav_path) + wav = torch.from_numpy(wav).float() + fbank = logmelfilterbank( + wav.view(-1).cpu().numpy(), 16000 + ) + fbank = torch.from_numpy(fbank).float() + wav = self.postprocess(wav, cur_sample_rate) + return wav, fbank + + def get_label(self, index, label_idx): + if self.store_labels: + label = self.label_list[label_idx][index] + else: + with open(self.label_paths[label_idx]) as f: + offset_s, offset_e = self.label_offsets_list[label_idx][index] + f.seek(offset_s) + label = f.read(offset_e - offset_s) + + if self.tokenizer is not None: + label = self.tokenizer.encode(label) + + if self.label_processors is not None: + label = self.label_processors[label_idx](label) + return label + + def get_labels(self, index): + return [self.get_label(index, i) for i in range(self.num_labels)] + + def __getitem__(self, index): + wav, fbank = self.get_audio(index) + labels = self.get_labels(index) + spkembs = get_features_or_waveform( + os.path.join(self.audio_root, self.spk_embeds[index]) + ) + spkembs = torch.from_numpy(spkembs).float() + return {"id": index, "source": labels, "target": fbank, "spkembs": spkembs, "audio_name": self.audio_names[index]} + + def __len__(self): + return len(self.wav_sizes) + + def collater(self, samples): + samples = [s for s in samples if s["source"] is not None] + if len(samples) == 0: + return {} + + fbanks = [s["target"] for s in samples] + fbank_sizes = [len(s) for s in fbanks] + + collated_fbanks = _collate_frames(fbanks) + collated_fbanks_size = torch.tensor(fbank_sizes, dtype=torch.long) + + # thin out frames for reduction factor (B, Lmax, odim) -> 
(B, Lmax//r, odim) + if self.reduction_factor > 1: + collated_fbanks_in = collated_fbanks[:, self.reduction_factor - 1 :: self.reduction_factor] + collated_fbanks_size_in = collated_fbanks_size.new([torch.div(olen, self.reduction_factor, rounding_mode='floor') for olen in collated_fbanks_size]) + else: + collated_fbanks_in, collated_fbanks_size_in = collated_fbanks, collated_fbanks_size + + prev_output_tokens = torch.cat( + [collated_fbanks_in.new_zeros((collated_fbanks_in.shape[0], 1, collated_fbanks_in.shape[2])), collated_fbanks_in[:, :-1]], dim=1 + ) + + # make labels for stop prediction + labels = collated_fbanks.new_zeros(collated_fbanks.size(0), collated_fbanks.size(1)) + for i, l in enumerate(fbank_sizes): + labels[i, l - 1 :] = 1.0 + + spkembs = _collate_frames([s["spkembs"] for s in samples], is_audio_input=True) + + sources_by_label = [ + [s["source"][i] for s in samples] for i in range(self.num_labels) + ] + sources_list, lengths_list, ntokens_list = self.collater_label(sources_by_label) + + net_input = { + "src_tokens": sources_list[0], + "src_lengths": lengths_list[0], + "prev_output_tokens": prev_output_tokens, + "tgt_lengths": collated_fbanks_size_in, + "spkembs": spkembs, + "task_name": "t2s", + } + batch = { + "id": torch.LongTensor([s["id"] for s in samples]), + "name": [s["audio_name"] for s in samples], + "net_input": net_input, + "labels": labels, + "dec_target": collated_fbanks, + "dec_target_lengths": collated_fbanks_size, + "src_lengths": lengths_list[0], + "task_name": "t2s", + "ntokens": ntokens_list[0], + "target": collated_fbanks, + } + + return batch + + def collater_seq_label(self, targets, pad): + lengths = torch.LongTensor([len(t) for t in targets]) + ntokens = lengths.sum().item() + targets = data_utils.collate_tokens(targets, pad_idx=pad, left_pad=False) + return targets, lengths, ntokens + + def collater_label(self, targets_by_label): + targets_list, lengths_list, ntokens_list = [], [], [] + itr = zip(targets_by_label, [self.src_dict.pad()]) + for targets, pad in itr: + targets, lengths, ntokens = self.collater_seq_label(targets, pad) + targets_list.append(targets) + lengths_list.append(lengths) + ntokens_list.append(ntokens) + return targets_list, lengths_list, ntokens_list + + def num_tokens(self, index): + return self.size(index) + + def size(self, index): + return self.wav_sizes[index] + + @property + def sizes(self): + return np.array(self.wav_sizes) + + def ordered_indices(self): + if self.shuffle: + order = [np.random.permutation(len(self))] + else: + order = [np.arange(len(self))] + + order.append(self.wav_sizes) + return np.lexsort(order)[::-1] + + def postprocess(self, wav, cur_sample_rate): + if wav.dim() == 2: + wav = wav.mean(-1) + assert wav.dim() == 1, wav.dim() + + if cur_sample_rate != self.sample_rate: + raise Exception(f"sr {cur_sample_rate} != {self.sample_rate}") + + if self.normalize: + with torch.no_grad(): + wav = F.layer_norm(wav, wav.shape) + return wav diff --git a/SpeechT5/speecht5/models/__init__.py b/SpeechT5/speecht5/models/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..d8db7a74c15397db0aaf82a1459146cbe12a9c8b --- /dev/null +++ b/SpeechT5/speecht5/models/__init__.py @@ -0,0 +1,2 @@ +from .speecht5 import * # noqa +from .t5_transformer_lm import * # noqa diff --git a/SpeechT5/speecht5/models/modules/__init__.py b/SpeechT5/speecht5/models/modules/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 diff --git 
a/SpeechT5/speecht5/models/modules/decoder.py b/SpeechT5/speecht5/models/modules/decoder.py new file mode 100644 index 0000000000000000000000000000000000000000..a066d1dd38af547c62e86b0aaa4efcf7f4e47040 --- /dev/null +++ b/SpeechT5/speecht5/models/modules/decoder.py @@ -0,0 +1,324 @@ +# -------------------------------------------------------- +# SpeechT5: Unified-Modal Encoder-Decoder Pre-Training for Spoken Language Processing (https://arxiv.org/abs/2110.07205) +# Github source: https://github.com/microsoft/SpeechT5/tree/main/SpeechT5 +# Copyright (c) 2021 Microsoft +# Licensed under The MIT License [see LICENSE for details] +# Based on fairseq and espnet code bases +# https://github.com/pytorch/fairseq; https://github.com/espnet/espnet +# -------------------------------------------------------- + +from typing import Any, Dict, List, Optional + +import torch +import torch.nn as nn +from fairseq import utils +from fairseq.distributed import fsdp_wrap +from fairseq.models import ( + FairseqIncrementalDecoder, +) +from fairseq.modules import ( + FairseqDropout, + LayerDropModuleList, + LayerNorm, +) +from fairseq.modules.checkpoint_activations import checkpoint_wrapper +from torch import Tensor + +from .encoder import RelativePositionalEncoding +from .transformer_layer import TransformerDecoderLayer + +DEFAULT_MIN_PARAMS_TO_WRAP = int(1e8) + + +class TransformerDecoder(FairseqIncrementalDecoder): + """ + Transformer decoder consisting of *args.decoder_layers* layers. Each layer + is a :class:`TransformerDecoderLayer`. + + Args: + args (argparse.Namespace): parsed command-line arguments + dictionary (~fairseq.data.Dictionary): decoding dictionary + embed_tokens (torch.nn.Embedding): output embedding + no_encoder_attn (bool, optional): whether to attend to encoder outputs + (default: False). 
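+
+        Note: this variant is constructed from ``args`` (plus ``no_encoder_attn``)
+        only; no ``dictionary`` or ``embed_tokens`` is passed, and
+        ``prev_output_tokens`` is expected to be a pre-computed
+        ``(batch, tgt_len, embed_dim)`` tensor (e.g. the output of a pre-net)
+        rather than token ids.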
+ """ + + def __init__( + self, + args, + no_encoder_attn=False, + ): + self.args = args + super().__init__(None) + self.register_buffer("version", torch.Tensor([3])) + self._future_mask = torch.empty(0) + + self.dropout_module = FairseqDropout( + args.dropout, module_name=self.__class__.__name__ + ) + self.decoder_layerdrop = args.decoder_layerdrop + # self.max_s_positions = args.max_target_positions + export = getattr(args, "export", False) + self.cross_self_attention = getattr(args, "cross_self_attention", False) + + if self.decoder_layerdrop > 0.0: + self.layers = LayerDropModuleList(p=self.decoder_layerdrop) + else: + self.layers = nn.ModuleList([]) + self.layers.extend( + [ + self.build_decoder_layer(args, no_encoder_attn) + for _ in range(args.decoder_layers) + ] + ) + self.num_layers = len(self.layers) + + if args.decoder_normalize_before and not getattr( + args, "no_decoder_final_norm", False + ): + self.layer_norm = LayerNorm(args.decoder_embed_dim, eps=args.layer_norm_eps, export=export) + else: + self.layer_norm = None + + if args.relative_position_embedding: + self.pos_emb = RelativePositionalEncoding(args.encoder_embed_dim//args.encoder_attention_heads, args.decoder_max_relative_position) + + def build_decoder_layer(self, args, no_encoder_attn=False): + layer = TransformerDecoderLayer(args, no_encoder_attn=no_encoder_attn, has_relative_attention_bias=args.relative_position_embedding) + checkpoint = getattr(args, "checkpoint_activations", False) + if checkpoint: + offload_to_cpu = getattr(args, "offload_activations", False) + layer = checkpoint_wrapper(layer, offload_to_cpu=offload_to_cpu) + # if we are checkpointing, enforce that FSDP always wraps the + # checkpointed layer, regardless of layer size + min_params_to_wrap = ( + getattr(args, "min_params_to_wrap", DEFAULT_MIN_PARAMS_TO_WRAP) + if not checkpoint + else 0 + ) + layer = fsdp_wrap(layer, min_num_params=min_params_to_wrap) + return layer + + def forward( + self, + prev_output_tokens, + tgt_mask, + encoder_out: Optional[Dict[str, List[Tensor]]] = None, + incremental_state: Optional[Dict[str, Dict[str, Optional[Tensor]]]] = None, + full_context_alignment: bool = False, + alignment_layer: Optional[int] = None, + alignment_heads: Optional[int] = None, + src_lengths: Optional[Any] = None, + return_all_hiddens: bool = False, + ): + """ + Args: + prev_output_tokens (LongTensor): previous decoder outputs of shape + `(batch, tgt_len)`, for teacher forcing + encoder_out (optional): output from the encoder, used for + encoder-side attention, should be of size T x B x C + incremental_state (dict): dictionary used for storing state during + :ref:`Incremental decoding` + features_only (bool, optional): only return features without + applying output layer (default: False). + full_context_alignment (bool, optional): don't apply + auto-regressive mask to self-attention (default: False). 
+ + Returns: + tuple: + - the decoder's output of shape `(batch, tgt_len, vocab)` + - a dictionary with any model-specific outputs + """ + + x, extra = self.extract_features( + prev_output_tokens, + tgt_mask, + encoder_out=encoder_out, + incremental_state=incremental_state, + full_context_alignment=full_context_alignment, + alignment_layer=alignment_layer, + alignment_heads=alignment_heads, + ) + + return x, extra + + def extract_features( + self, + prev_output_tokens, + tgt_mask, + encoder_out: Optional[Dict[str, List[Tensor]]], + incremental_state: Optional[Dict[str, Dict[str, Optional[Tensor]]]] = None, + full_context_alignment: bool = False, + alignment_layer: Optional[int] = None, + alignment_heads: Optional[int] = None, + ): + return self.extract_features_scriptable( + prev_output_tokens, + tgt_mask, + encoder_out, + incremental_state, + full_context_alignment, + alignment_layer, + alignment_heads, + ) + + """ + A scriptable subclass of this class has an extract_features method and calls + super().extract_features, but super() is not supported in torchscript. A copy of + this function is made to be used in the subclass instead. + """ + + def extract_features_scriptable( + self, + prev_output_tokens, + tgt_mask, + encoder_out: Optional[Dict[str, List[Tensor]]], + incremental_state: Optional[Dict[str, Dict[str, Optional[Tensor]]]] = None, + full_context_alignment: bool = False, + alignment_layer: Optional[int] = None, + alignment_heads: Optional[int] = None, + ): + """ + Similar to *forward* but only return features. + + Includes several features from "Jointly Learning to Align and + Translate with Transformer Models" (Garg et al., EMNLP 2019). + + Args: + full_context_alignment (bool, optional): don't apply + auto-regressive mask to self-attention (default: False). + alignment_layer (int, optional): return mean alignment over + heads at this layer (default: last layer). + alignment_heads (int, optional): only average alignment over + this many heads (default: all heads). 
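+
+            Note: when ``alignment_layer == -1`` the attention weights of every
+            decoder layer are collected and returned as a list under
+            ``extra["attn"]``; for a single ``alignment_layer`` only that
+            layer's attention, averaged over heads, is returned.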
+ + Returns: + tuple: + - the decoder's features of shape `(batch, tgt_len, embed_dim)` + - a dictionary with any model-specific outputs + """ + bs = prev_output_tokens.size(0) + if alignment_layer is None: + alignment_layer = self.num_layers - 1 + + enc: Optional[Tensor] = None + padding_mask: Optional[Tensor] = None + if encoder_out is not None and len(encoder_out["encoder_out"]) > 0: + enc = encoder_out["encoder_out"][0] + assert ( + enc.size()[1] == bs + ), f"Expected enc.shape == (t, {bs}, c) got {enc.shape}" + if encoder_out is not None and len(encoder_out["encoder_padding_mask"]) > 0: + padding_mask = encoder_out["encoder_padding_mask"][0] + + # B x T x C -> T x B x C + x = prev_output_tokens.transpose(0, 1) + + self_attn_padding_mask: Optional[Tensor] = None + if self.cross_self_attention or tgt_mask is not None: + self_attn_padding_mask = tgt_mask + + ## relative position embedding + if self.args.relative_position_embedding: + x_len = x.shape[0] + pos_seq = torch.arange(0, x_len).long().to(x.device) + pos_seq = pos_seq[:, None] - pos_seq[None, :] + pos_k, pos_v = self.pos_emb(pos_seq) + else: + pos_k = None + + # decoder layers + attn_list = [] + attn: Optional[Tensor] = None + inner_states: List[Optional[Tensor]] = [x] + for idx, layer in enumerate(self.layers): + if incremental_state is None and not full_context_alignment: + self_attn_mask = self.buffered_future_mask(x) + else: + self_attn_mask = None + + x, layer_attn, _ = layer( + x, + enc, + padding_mask, + incremental_state, + self_attn_mask=self_attn_mask, + self_attn_padding_mask=self_attn_padding_mask, + need_attn=bool((idx == alignment_layer or alignment_layer == -1)), + need_head_weights=bool((idx == alignment_layer or alignment_layer == -1)), + pos_bias=pos_k, + ) + inner_states.append(x) + if layer_attn is not None and (idx == alignment_layer or alignment_layer == -1): + attn = layer_attn.float().to(x) + attn_list.append(attn.transpose(0, 1)) + + if attn is not None and len(attn_list) == 1: + if alignment_heads is not None: + attn = attn[:alignment_heads] + + # average probabilities over heads + attn = attn.mean(dim=0) + + if self.layer_norm is not None: + x = self.layer_norm(x) + + # T x B x C -> B x T x C + x = x.transpose(0, 1) + + return x, {"attn": [attn if len(attn_list) <= 1 else attn_list], "inner_states": inner_states} + + # def max_positions(self): + # """Maximum output length supported by the decoder.""" + # return self.max_target_positions + + def buffered_future_mask(self, tensor): + dim = tensor.size(0) + # self._future_mask.device != tensor.device is not working in TorchScript. This is a workaround. 
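+        # Build (or reuse) a cached square mask with -inf strictly above the
+        # diagonal; sliced to [:dim, :dim] it enforces causal (left-to-right)
+        # self-attention. The cache is rebuilt when it is empty, on the wrong
+        # device, or smaller than the current target length.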
+ if ( + self._future_mask.size(0) == 0 + or (not self._future_mask.device == tensor.device) + or self._future_mask.size(0) < dim + ): + self._future_mask = torch.triu( + utils.fill_with_neg_inf(torch.zeros([dim, dim], device=tensor.device)), 1, + ) + else: + self._future_mask = self._future_mask.to(tensor) + return self._future_mask[:dim, :dim] + + def upgrade_state_dict_named(self, state_dict, name): + """Upgrade a (possibly old) state dict for new versions of fairseq.""" + for i in range(self.num_layers): + # update layer norms + layer_norm_map = { + "0": "self_attn_layer_norm", + "1": "encoder_attn_layer_norm", + "2": "final_layer_norm", + } + for old, new in layer_norm_map.items(): + for m in ("weight", "bias"): + k = "{}.layers.{}.layer_norms.{}.{}".format(name, i, old, m) + if k in state_dict: + state_dict[ + "{}.layers.{}.{}.{}".format(name, i, new, m) + ] = state_dict[k] + del state_dict[k] + + version_key = "{}.version".format(name) + if utils.item(state_dict.get(version_key, torch.Tensor([1]))[0]) <= 2: + # earlier checkpoints did not normalize after the stack of layers + self.layer_norm = None + self.normalize = False + state_dict[version_key] = torch.Tensor([1]) + + return state_dict + + def set_num_updates(self, num_updates): + """State from trainer to pass along to model at every update.""" + + def _apply(m): + if hasattr(m, "set_num_updates") and m != self: + m.set_num_updates(num_updates) + + self.apply(_apply) diff --git a/SpeechT5/speecht5/models/modules/encoder.py b/SpeechT5/speecht5/models/modules/encoder.py new file mode 100644 index 0000000000000000000000000000000000000000..0deb193285497d40da3286c48016c5c12fa6710f --- /dev/null +++ b/SpeechT5/speecht5/models/modules/encoder.py @@ -0,0 +1,381 @@ +# -------------------------------------------------------- +# SpeechT5: Unified-Modal Encoder-Decoder Pre-Training for Spoken Language Processing (https://arxiv.org/abs/2110.07205) +# Github source: https://github.com/microsoft/SpeechT5/tree/main/SpeechT5 +# Copyright (c) 2021 Microsoft +# Licensed under The MIT License [see LICENSE for details] +# Based on fairseq and espnet code bases +# https://github.com/pytorch/fairseq; https://github.com/espnet/espnet +# -------------------------------------------------------- + +from typing import Dict, List + +import numpy as np +import torch +import torch.nn as nn +import contextlib +from fairseq import utils +from fairseq.models import ( + FairseqEncoder, +) +from fairseq.modules import ( + FairseqDropout, + LayerNorm, + TransformerEncoderLayer, +) +from torch import Tensor +from .transformer_layer import TransformerSentenceEncoderLayer + + + +DEFAULT_MIN_PARAMS_TO_WRAP = int(1e8) + +def Linear(in_features, out_features, bias=True): + m = nn.Linear(in_features, out_features, bias) + nn.init.xavier_uniform_(m.weight) + if bias: + nn.init.constant_(m.bias, 0.0) + return m + + +class RelativePositionalEncoding(torch.nn.Module): + def __init__(self, d_model, maxlen=1000, embed_v=False): + super(RelativePositionalEncoding, self).__init__() + + self.d_model = d_model + self.maxlen = maxlen + self.pe_k = torch.nn.Embedding(2*maxlen, d_model) + if embed_v: + self.pe_v = torch.nn.Embedding(2*maxlen, d_model) + self.embed_v = embed_v + + + def forward(self, pos_seq): + pos_seq[pos_seq < -self.maxlen] = -self.maxlen + pos_seq[pos_seq >= self.maxlen] = self.maxlen - 1 + pos_seq = pos_seq + self.maxlen + if self.embed_v: + return self.pe_k(pos_seq), self.pe_v(pos_seq) + else: + return self.pe_k(pos_seq), None + +class 
TransformerEncoder(FairseqEncoder): + """ + Transformer encoder consisting of *args.encoder_layers* layers. Each layer + is a :class:`TransformerEncoderLayer`. + + Args: + args (argparse.Namespace): parsed command-line arguments + dictionary (~fairseq.data.Dictionary): encoding dictionary + embed_tokens (torch.nn.Embedding): input embedding + """ + + def __init__(self, args, tgt_dict=None, embed_tokens=None): + self.args = args + super().__init__(None) + self.register_buffer("version", torch.Tensor([3])) + + self.dropout_module = FairseqDropout( + args.dropout, module_name=self.__class__.__name__ + ) + self.encoder_layerdrop = args.encoder_layerdrop + self.freeze_encoder_updates = args.freeze_encoder_updates + if args.no_freeze_encoder_layer is not None: + self.no_freeze_encoder_layer = eval(args.no_freeze_encoder_layer) + else: + self.no_freeze_encoder_layer = None + self.num_updates = 0 + export = getattr(args, "export", False) + + self.layers = nn.ModuleList([]) + self.layers.extend( + [self.build_encoder_layer(args) for i in range(args.encoder_layers)] + ) + self.num_layers = len(self.layers) + + self.use_sent_enc_layer = args.use_sent_enc_layer + self.unb_enc_layer = getattr(args, "unb_enc_layer", -1) + + self.layer_norm_first = args.layer_norm_first + self.layer_norm = LayerNorm(args.encoder_embed_dim, eps=args.layer_norm_eps, export=export) + + if args.share_ctc_embed and embed_tokens is not None: + self.proj = nn.Linear( + embed_tokens.weight.shape[1], + embed_tokens.weight.shape[0], + bias=False, + ) + self.proj.weight = embed_tokens.weight + elif tgt_dict is not None: + self.proj = Linear(args.encoder_embed_dim, len(tgt_dict)) + else: + self.proj = None + + if args.relative_position_embedding: + self.pos_emb = RelativePositionalEncoding(args.encoder_embed_dim//args.encoder_attention_heads, args.encoder_max_relative_position) + + + def build_encoder_layer(self, args): + if args.use_sent_enc_layer: + layer = TransformerSentenceEncoderLayer( + embedding_dim=args.encoder_embed_dim, + ffn_embedding_dim=args.encoder_ffn_embed_dim, + num_attention_heads=args.encoder_attention_heads, + dropout=args.dropout, + attention_dropout=args.attention_dropout, + activation_dropout=args.activation_dropout, + activation_fn=args.activation_fn, + layer_norm_first=args.layer_norm_first, + has_relative_attention_bias=args.relative_position_embedding, + ) + else: + layer = TransformerEncoderLayer(args) + return layer + + def forward( + self, + encoder_in, + encoder_padding_mask, + return_all_hiddens: bool = False, + tgt_layer=None, + ): + """ + Args: + src_tokens (LongTensor): tokens in the source language of shape + `(batch, src_len)` + src_lengths (torch.LongTensor): lengths of each source sentence of + shape `(batch)` + return_all_hiddens (bool, optional): also return all of the + intermediate hidden states (default: False). + token_embeddings (torch.Tensor, optional): precomputed embeddings + default `None` will recompute embeddings + + Returns: + dict: + - **encoder_out** (Tensor): the last encoder layer's output of + shape `(src_len, batch, embed_dim)` + - **encoder_padding_mask** (ByteTensor): the positions of + padding elements of shape `(batch, src_len)` + - **encoder_embedding** (Tensor): the (scaled) embedding lookup + of shape `(batch, src_len, embed_dim)` + - **encoder_states** (List[Tensor]): all intermediate + hidden states of shape `(src_len, batch, embed_dim)`. + Only populated if *return_all_hiddens* is True. 
+ """ + if self.no_freeze_encoder_layer is None: + ft = self.freeze_encoder_updates <= self.num_updates + else: + ft = True + with torch.no_grad() if not ft else contextlib.ExitStack(): + encoder_out = self.forward_scriptable( + encoder_in, encoder_padding_mask, return_all_hiddens, tgt_layer=tgt_layer, + ) + + # CTC and bert + if self.proj: + x_for_ctc = self.proj(self.dropout_module(encoder_out["encoder_out"][0])) + else: + x_for_ctc = None + + encoder_out["encoder_out_for_ctc"] = [x_for_ctc] # T x B x C + + return encoder_out + + # TorchScript doesn't support super() method so that the scriptable Subclass + # can't access the base class model in Torchscript. + # Current workaround is to add a helper function with different name and + # call the helper function from scriptable Subclass. + def forward_scriptable( + self, + encoder_in, + encoder_padding_mask, + return_all_hiddens: bool = False, + tgt_layer=None, + ): + """ + Args: + src_tokens (LongTensor): tokens in the source language of shape + `(batch, src_len)` + src_lengths (torch.LongTensor): lengths of each source sentence of + shape `(batch)` + return_all_hiddens (bool, optional): also return all of the + intermediate hidden states (default: False). + token_embeddings (torch.Tensor, optional): precomputed embeddings + default `None` will recompute embeddings + + Returns: + dict: + - **encoder_out** (Tensor): the last encoder layer's output of + shape `(src_len, batch, embed_dim)` + - **encoder_padding_mask** (ByteTensor): the positions of + padding elements of shape `(batch, src_len)` + - **encoder_embedding** (Tensor): the (scaled) embedding lookup + of shape `(batch, src_len, embed_dim)` + - **encoder_states** (List[Tensor]): all intermediate + hidden states of shape `(src_len, batch, embed_dim)`. + Only populated if *return_all_hiddens* is True. 
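The `torch.no_grad() if not ft else contextlib.ExitStack()` pattern used above freezes the encoder for the first `freeze_encoder_updates` steps by running it without recording a graph, and becomes a no-op afterwards. A tiny sketch of the idiom with a hypothetical module (not tied to this class):

```python
import contextlib
import torch

def freeze_ctx(still_frozen: bool):
    # torch.no_grad() while frozen; ExitStack() is a do-nothing context otherwise.
    return torch.no_grad() if still_frozen else contextlib.ExitStack()

lin = torch.nn.Linear(4, 4)
x = torch.randn(2, 4)
with freeze_ctx(still_frozen=True):
    y = lin(x)
print(y.requires_grad)   # False: the frozen pass contributes no gradients
```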
+ """ + if self.no_freeze_encoder_layer is not None: + ft = self.freeze_encoder_updates <= self.num_updates + else: + ft = True + with torch.no_grad() if not ft else contextlib.ExitStack(): + # compute padding mask + if not self.use_sent_enc_layer: + has_pads = encoder_in.device.type == "xla" or encoder_padding_mask.any() + + if not self.layer_norm_first: + encoder_in = self.layer_norm(encoder_in) + + encoder_in = self.dropout_module(encoder_in) + + # B x T x C -> T x B x C + x = encoder_in.transpose(0, 1) + + encoder_states = [] + + if return_all_hiddens: + encoder_states.append(x) + + ## relative position embedding + if self.args.relative_position_embedding: + x_len = x.shape[0] + pos_seq = torch.arange(0, x_len).long().to(x.device) + pos_seq = pos_seq[:, None] - pos_seq[None, :] + pos_k, pos_v = self.pos_emb(pos_seq) + else: + pos_k = None + + # encoder layers + r = None + d = None + for i, layer in enumerate(self.layers): + dropout_probability = np.random.random() + + with torch.no_grad() if (not ft) and i not in self.no_freeze_encoder_layer else contextlib.ExitStack(): + if not self.training or (dropout_probability > self.encoder_layerdrop) or i == self.unb_enc_layer: + if self.use_sent_enc_layer: + x, _ = layer(x, self_attn_padding_mask=encoder_padding_mask, self_attn_mask=None, need_weights=False, pos_bias=pos_k) + # x, _ = layer(x, self_attn_padding_mask=encoder_padding_mask, need_weights=False, pos_bias=pos_k) + else: + x = layer(x, encoder_padding_mask=encoder_padding_mask if has_pads else None, attn_mask=None) + # x = layer(x, encoder_padding_mask=encoder_padding_mask if has_pads else None) + if i == self.unb_enc_layer: + d = x + + if i == tgt_layer: + r = x + break + + if return_all_hiddens: + assert encoder_states is not None + encoder_states.append(x) + + with torch.no_grad() if not ft else contextlib.ExitStack(): + # Finally T x B x C + if self.layer_norm_first: + x = self.layer_norm(x.transpose(0, 1)).transpose(0, 1) + + if r is not None: + x = r + + # The Pytorch Mobile lite interpreter does not supports returning NamedTuple in + # `forward` so we use a dictionary instead. + # TorchScript does not support mixed values so the values are all lists. + # The empty list is equivalent to None. + return { + "encoder_out": [x], # T x B x C + "encoder_padding_mask": [encoder_padding_mask], # B x T + "encoder_states": encoder_states, # List[T x B x C] + "src_tokens": [], + "decoder_input": [d], + } + + @torch.jit.export + def reorder_encoder_out(self, encoder_out: Dict[str, List[Tensor]], new_order): + """ + Reorder encoder output according to *new_order*. 
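The relative-position lookup used above (and by `RelativePositionalEncoding` defined earlier) reduces to an index matrix of pairwise offsets, clamped and shifted into an embedding table. A rough standalone sketch, assuming the same `maxlen` clamping:

```python
import torch

def relative_position_ids(seq_len: int, maxlen: int) -> torch.Tensor:
    pos = torch.arange(seq_len)
    rel = pos[:, None] - pos[None, :]        # rel[i, j] = i - j
    rel = rel.clamp(-maxlen, maxlen - 1)     # same clamping as RelativePositionalEncoding
    return rel + maxlen                      # shift into [0, 2*maxlen)

ids = relative_position_ids(seq_len=5, maxlen=1000)
table = torch.nn.Embedding(2 * 1000, 64)     # plays the role of pe_k
pos_k = table(ids)                           # (seq_len, seq_len, head_dim)
print(pos_k.shape)                           # torch.Size([5, 5, 64])
```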
+ + Args: + encoder_out: output from the ``forward()`` method + new_order (LongTensor): desired order + + Returns: + *encoder_out* rearranged according to *new_order* + """ + if len(encoder_out["encoder_out"]) == 0: + new_encoder_out = [] + else: + new_encoder_out = [encoder_out["encoder_out"][0].index_select(1, new_order)] + + if len(encoder_out["encoder_out_for_ctc"]) == 0: + new_x_for_ctc = [] + else: + new_x_for_ctc = [encoder_out["encoder_out_for_ctc"][0].index_select(1, new_order)] + + if len(encoder_out["encoder_padding_mask"]) == 0: + new_encoder_padding_mask = [] + else: + new_encoder_padding_mask = [ + encoder_out["encoder_padding_mask"][0].index_select(0, new_order) + ] + + if len(encoder_out["src_tokens"]) == 0: + src_tokens = [] + else: + src_tokens = [(encoder_out["src_tokens"][0]).index_select(0, new_order)] + + if len(encoder_out["decoder_input"]) == 0 or encoder_out["decoder_input"][0] is None: + new_decoder_input = [] + else: + new_decoder_input = [ + encoder_out["decoder_input"][0].index_select(0, new_order) + ] + + encoder_states = encoder_out["encoder_states"] + if len(encoder_states) > 0: + for idx, state in enumerate(encoder_states): + encoder_states[idx] = state.index_select(1, new_order) + + return { + "encoder_out": new_encoder_out, # T x B x C + "encoder_padding_mask": new_encoder_padding_mask, # B x T + "encoder_states": encoder_states, # List[T x B x C] + "src_tokens": src_tokens, # B x T + "encoder_out_for_ctc": new_x_for_ctc, # T x B x C + "decoder_input": new_decoder_input, + } + + # def max_positions(self): + # """Maximum input length supported by the encoder.""" + # return self.max_source_positions + + def upgrade_state_dict_named(self, state_dict, name): + """Upgrade a (possibly old) state dict for new versions of fairseq.""" + # if isinstance(self.embed_positions, SinusoidalPositionalEmbedding): + # weights_key = "{}.embed_positions.weights".format(name) + # if weights_key in state_dict: + # print("deleting {0}".format(weights_key)) + # del state_dict[weights_key] + # state_dict[ + # "{}.embed_positions._float_tensor".format(name) + # ] = torch.FloatTensor(1) + for i in range(self.num_layers): + # update layer norms + if not isinstance(self.layers[i], TransformerSentenceEncoderLayer): + self.layers[i].upgrade_state_dict_named( + state_dict, "{}.layers.{}".format(name, i) + ) + + version_key = "{}.version".format(name) + if utils.item(state_dict.get(version_key, torch.Tensor([1]))[0]) < 2: + # earlier checkpoints did not normalize after the stack of layers + self.layer_norm = None + self.normalize = False + state_dict[version_key] = torch.Tensor([1]) + return state_dict + + def set_num_updates(self, num_updates): + """Set the number of parameters updates.""" + super().set_num_updates(num_updates) + self.num_updates = num_updates + \ No newline at end of file diff --git a/SpeechT5/speecht5/models/modules/multihead_attention.py b/SpeechT5/speecht5/models/modules/multihead_attention.py new file mode 100644 index 0000000000000000000000000000000000000000..fb126ef6b72d61b9cc50bceca2504976c307f865 --- /dev/null +++ b/SpeechT5/speecht5/models/modules/multihead_attention.py @@ -0,0 +1,522 @@ +# -------------------------------------------------------- +# SpeechT5: Unified-Modal Encoder-Decoder Pre-Training for Spoken Language Processing (https://arxiv.org/abs/2110.07205) +# Github source: https://github.com/microsoft/SpeechT5/tree/main/SpeechT5 +# Copyright (c) 2021 Microsoft +# Licensed under The MIT License [see LICENSE for details] +# Based on fairseq and espnet 
code bases +# https://github.com/pytorch/fairseq; https://github.com/espnet/espnet +# -------------------------------------------------------- + +import math +from typing import Dict, Optional, Tuple + +import torch +import torch.nn.functional as F +from fairseq import utils +from fairseq.incremental_decoding_utils import with_incremental_state +from fairseq.modules.fairseq_dropout import FairseqDropout +from fairseq.modules.quant_noise import quant_noise +from torch import Tensor, nn +from torch.nn import Parameter + + +@with_incremental_state +class MultiheadAttention(nn.Module): + """Multi-headed attention. + + See "Attention Is All You Need" for more details. + """ + + def __init__( + self, + embed_dim, + num_heads, + kdim=None, + vdim=None, + dropout=0.0, + bias=True, + add_bias_kv=False, + add_zero_attn=False, + self_attention=False, + encoder_decoder_attention=False, + q_noise=0.0, + qn_block_size=8, + has_relative_attention_bias=False, + ): + super().__init__() + self.embed_dim = embed_dim + self.kdim = kdim if kdim is not None else embed_dim + self.vdim = vdim if vdim is not None else embed_dim + self.qkv_same_dim = self.kdim == embed_dim and self.vdim == embed_dim + + self.num_heads = num_heads + self.dropout_module = FairseqDropout( + dropout, module_name=self.__class__.__name__ + ) + + self.has_relative_attention_bias = has_relative_attention_bias + self.head_dim = embed_dim // num_heads + assert ( + self.head_dim * num_heads == self.embed_dim + ), "embed_dim must be divisible by num_heads" + self.scaling = self.head_dim ** -0.5 + + self.self_attention = self_attention + self.encoder_decoder_attention = encoder_decoder_attention + + assert not self.self_attention or self.qkv_same_dim, ( + "Self-attention requires query, key and " "value to be of the same size" + ) + + self.k_proj = quant_noise( + nn.Linear(self.kdim, embed_dim, bias=bias), q_noise, qn_block_size + ) + self.v_proj = quant_noise( + nn.Linear(self.vdim, embed_dim, bias=bias), q_noise, qn_block_size + ) + self.q_proj = quant_noise( + nn.Linear(embed_dim, embed_dim, bias=bias), q_noise, qn_block_size + ) + + self.out_proj = quant_noise( + nn.Linear(embed_dim, embed_dim, bias=bias), q_noise, qn_block_size + ) + + if add_bias_kv: + self.bias_k = Parameter(torch.Tensor(1, 1, embed_dim)) + self.bias_v = Parameter(torch.Tensor(1, 1, embed_dim)) + else: + self.bias_k = self.bias_v = None + + self.add_zero_attn = add_zero_attn + + self.reset_parameters() + + self.onnx_trace = False + + def prepare_for_onnx_export_(self): + self.onnx_trace = True + + def reset_parameters(self): + if self.qkv_same_dim: + # Empirically observed the convergence to be much better with + # the scaled initialization + nn.init.xavier_uniform_(self.k_proj.weight, gain=1 / math.sqrt(2)) + nn.init.xavier_uniform_(self.v_proj.weight, gain=1 / math.sqrt(2)) + nn.init.xavier_uniform_(self.q_proj.weight, gain=1 / math.sqrt(2)) + else: + nn.init.xavier_uniform_(self.k_proj.weight) + nn.init.xavier_uniform_(self.v_proj.weight) + nn.init.xavier_uniform_(self.q_proj.weight) + + nn.init.xavier_uniform_(self.out_proj.weight) + if self.out_proj.bias is not None: + nn.init.constant_(self.out_proj.bias, 0.0) + if self.bias_k is not None: + nn.init.xavier_normal_(self.bias_k) + if self.bias_v is not None: + nn.init.xavier_normal_(self.bias_v) + + def forward( + self, + query, + key: Optional[Tensor], + value: Optional[Tensor], + key_padding_mask: Optional[Tensor] = None, + incremental_state: Optional[Dict[str, Dict[str, Optional[Tensor]]]] = None, + need_weights: 
bool = True, + static_kv: bool = False, + attn_mask: Optional[Tensor] = None, + before_softmax: bool = False, + need_head_weights: bool = False, + position_bias: Optional[Tensor] = None + ) -> Tuple[Tensor, Optional[Tensor]]: + """Input shape: Time x Batch x Channel + + Args: + key_padding_mask (ByteTensor, optional): mask to exclude + keys that are pads, of shape `(batch, src_len)`, where + padding elements are indicated by 1s. + need_weights (bool, optional): return the attention weights, + averaged over heads (default: False). + attn_mask (ByteTensor, optional): typically used to + implement causal attention, where the mask prevents the + attention from looking forward in time (default: None). + before_softmax (bool, optional): return the raw attention + weights and values before the attention softmax. + need_head_weights (bool, optional): return the attention + weights for each head. Implies *need_weights*. Default: + return the average attention weights over all heads. + """ + if need_head_weights: + need_weights = True + + is_tpu = query.device.type == "xla" + + tgt_len, bsz, embed_dim = query.size() + src_len = tgt_len + assert embed_dim == self.embed_dim + assert list(query.size()) == [tgt_len, bsz, embed_dim] + if key is not None: + src_len, key_bsz, _ = key.size() + if not torch.jit.is_scripting(): + assert key_bsz == bsz + assert value is not None + assert src_len, bsz == value.shape[:2] + + if ( + not self.onnx_trace + and not is_tpu # don't use PyTorch version on TPUs + and incremental_state is None + and not static_kv + # A workaround for quantization to work. Otherwise JIT compilation + # treats bias in linear module as method. + and not torch.jit.is_scripting() + and not self.has_relative_attention_bias + ): + assert key is not None and value is not None + return F.multi_head_attention_forward( + query, + key, + value, + self.embed_dim, + self.num_heads, + torch.empty([0]), + torch.cat((self.q_proj.bias, self.k_proj.bias, self.v_proj.bias)), + self.bias_k, + self.bias_v, + self.add_zero_attn, + self.dropout_module.p, + self.out_proj.weight, + self.out_proj.bias, + self.training or self.dropout_module.apply_during_inference, + key_padding_mask, + need_weights, + attn_mask, + use_separate_proj_weight=True, + q_proj_weight=self.q_proj.weight, + k_proj_weight=self.k_proj.weight, + v_proj_weight=self.v_proj.weight, + ) + + if incremental_state is not None: + saved_state = self._get_input_buffer(incremental_state) + if saved_state is not None and "prev_key" in saved_state: + # previous time steps are cached - no need to recompute + # key and value if they are static + if static_kv: + assert self.encoder_decoder_attention and not self.self_attention + key = value = None + else: + saved_state = None + + if self.self_attention: + q = self.q_proj(query) + k = self.k_proj(query) + v = self.v_proj(query) + elif self.encoder_decoder_attention: + # encoder-decoder attention + q = self.q_proj(query) + if key is None: + assert value is None + k = v = None + else: + k = self.k_proj(key) + v = self.v_proj(key) + + else: + assert key is not None and value is not None + q = self.q_proj(query) + k = self.k_proj(key) + v = self.v_proj(value) + q *= self.scaling + + if self.bias_k is not None: + assert self.bias_v is not None + k = torch.cat([k, self.bias_k.repeat(1, bsz, 1)]) + v = torch.cat([v, self.bias_v.repeat(1, bsz, 1)]) + if attn_mask is not None: + attn_mask = torch.cat( + [attn_mask, attn_mask.new_zeros(attn_mask.size(0), 1)], dim=1 + ) + if key_padding_mask is not None: + 
key_padding_mask = torch.cat( + [ + key_padding_mask, + key_padding_mask.new_zeros(key_padding_mask.size(0), 1), + ], + dim=1, + ) + + q = ( + q.contiguous() + .view(tgt_len, bsz * self.num_heads, self.head_dim) + .transpose(0, 1) + ) + if k is not None: + k = ( + k.contiguous() + .view(-1, bsz * self.num_heads, self.head_dim) + .transpose(0, 1) + ) + if v is not None: + v = ( + v.contiguous() + .view(-1, bsz * self.num_heads, self.head_dim) + .transpose(0, 1) + ) + + if saved_state is not None: + # saved states are stored with shape (bsz, num_heads, seq_len, head_dim) + if "prev_key" in saved_state: + _prev_key = saved_state["prev_key"] + assert _prev_key is not None + prev_key = _prev_key.view(bsz * self.num_heads, -1, self.head_dim) + if static_kv: + k = prev_key + else: + assert k is not None + k = torch.cat([prev_key, k], dim=1) + src_len = k.size(1) + if "prev_value" in saved_state: + _prev_value = saved_state["prev_value"] + assert _prev_value is not None + prev_value = _prev_value.view(bsz * self.num_heads, -1, self.head_dim) + if static_kv: + v = prev_value + else: + assert v is not None + v = torch.cat([prev_value, v], dim=1) + prev_key_padding_mask: Optional[Tensor] = None + if "prev_key_padding_mask" in saved_state: + prev_key_padding_mask = saved_state["prev_key_padding_mask"] + assert k is not None and v is not None + key_padding_mask = MultiheadAttention._append_prev_key_padding_mask( + key_padding_mask=key_padding_mask, + prev_key_padding_mask=prev_key_padding_mask, + batch_size=bsz, + src_len=k.size(1), + static_kv=static_kv, + ) + + saved_state["prev_key"] = k.view(bsz, self.num_heads, -1, self.head_dim) + saved_state["prev_value"] = v.view(bsz, self.num_heads, -1, self.head_dim) + saved_state["prev_key_padding_mask"] = key_padding_mask + # In this branch incremental_state is never None + assert incremental_state is not None + incremental_state = self._set_input_buffer(incremental_state, saved_state) + assert k is not None + assert k.size(1) == src_len + + # This is part of a workaround to get around fork/join parallelism + # not supporting Optional types. 
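The `prev_key`/`prev_value` handling above is the usual incremental-decoding cache: each new step computes projections for one query only and appends its key/value to the stored ones. A toy sketch with made-up sizes:

```python
import torch

bsz_x_heads, head_dim = 8, 64
prev_k = torch.randn(bsz_x_heads, 12, head_dim)   # keys cached from 12 earlier steps
new_k = torch.randn(bsz_x_heads, 1, head_dim)     # key for the current step only
k = torch.cat([prev_k, new_k], dim=1)             # attention now spans 13 positions
print(k.shape)                                     # torch.Size([8, 13, 64])
```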
+ if key_padding_mask is not None and key_padding_mask.dim() == 0: + key_padding_mask = None + + if key_padding_mask is not None: + assert key_padding_mask.size(0) == bsz + assert key_padding_mask.size(1) == src_len + + if self.add_zero_attn: + assert v is not None + src_len += 1 + k = torch.cat([k, k.new_zeros((k.size(0), 1) + k.size()[2:])], dim=1) + v = torch.cat([v, v.new_zeros((v.size(0), 1) + v.size()[2:])], dim=1) + if attn_mask is not None: + attn_mask = torch.cat( + [attn_mask, attn_mask.new_zeros(attn_mask.size(0), 1)], dim=1 + ) + if key_padding_mask is not None: + key_padding_mask = torch.cat( + [ + key_padding_mask, + torch.zeros(key_padding_mask.size(0), 1).type_as( + key_padding_mask + ), + ], + dim=1, + ) + + attn_weights = torch.bmm(q, k.transpose(1, 2)) + attn_weights = self.apply_sparse_mask(attn_weights, tgt_len, src_len, bsz) + + if position_bias is not None and self.has_relative_attention_bias: ## first order + ## position_bias: [241, 241, 64] + #print ("attn_weights: ", attn_weights.size()) # [492, 241, 241] + reshape_q = q.contiguous().view(bsz * self.num_heads, -1, self.head_dim).transpose(0,1) #[241, 492, 64] + #print ("reshape_q: ", reshape_q.size()) + B = torch.matmul(reshape_q, position_bias.transpose(-2, -1)) + #print ("B: ", B.size()) ## [241, 492, 241] + #B = B.transpose(0, 1).view(bsz, self.num_heads, position_bias.size(0), position_bias.size(1)) + B = B.transpose(0, 1).view(bsz*self.num_heads, position_bias.size(0), position_bias.size(1)) + #print ("B 2: ", B.size()) + attn_weights += B + else: + position_bias = None + + assert list(attn_weights.size()) == [bsz * self.num_heads, tgt_len, src_len] + + if attn_mask is not None: + attn_mask = attn_mask.unsqueeze(0) + if self.onnx_trace: + attn_mask = attn_mask.repeat(attn_weights.size(0), 1, 1) + attn_weights += attn_mask + + if key_padding_mask is not None: + # don't attend to padding symbols + attn_weights = attn_weights.view(bsz, self.num_heads, tgt_len, src_len) + if not is_tpu: + attn_weights = attn_weights.masked_fill( + key_padding_mask.unsqueeze(1).unsqueeze(2).to(torch.bool), + float("-inf"), + ) + else: + attn_weights = attn_weights.transpose(0, 2) + attn_weights = attn_weights.masked_fill(key_padding_mask, float("-inf")) + attn_weights = attn_weights.transpose(0, 2) + attn_weights = attn_weights.view(bsz * self.num_heads, tgt_len, src_len) + + if before_softmax: + return attn_weights, v + + attn_weights_float = utils.softmax( + attn_weights, dim=-1, onnx_trace=self.onnx_trace + ) + attn_weights = attn_weights_float.type_as(attn_weights) + attn_probs = self.dropout_module(attn_weights) + + assert v is not None + attn = torch.bmm(attn_probs, v) + assert list(attn.size()) == [bsz * self.num_heads, tgt_len, self.head_dim] + if self.onnx_trace and attn.size(1) == 1: + # when ONNX tracing a single decoder step (sequence length == 1) + # the transpose is a no-op copy before view, thus unnecessary + attn = attn.contiguous().view(tgt_len, bsz, embed_dim) + else: + attn = attn.transpose(0, 1).contiguous().view(tgt_len, bsz, embed_dim) + attn = self.out_proj(attn) + attn_weights: Optional[Tensor] = None + if need_weights: + attn_weights = attn_weights_float.view( + bsz, self.num_heads, tgt_len, src_len + ).transpose(1, 0) + if not need_head_weights: + # average attention weights over heads + attn_weights = attn_weights.mean(dim=0) + + return attn, attn_weights + + @staticmethod + def _append_prev_key_padding_mask( + key_padding_mask: Optional[Tensor], + prev_key_padding_mask: Optional[Tensor], + batch_size: 
int, + src_len: int, + static_kv: bool, + ) -> Optional[Tensor]: + # saved key padding masks have shape (bsz, seq_len) + if prev_key_padding_mask is not None and static_kv: + new_key_padding_mask = prev_key_padding_mask + elif prev_key_padding_mask is not None and key_padding_mask is not None: + new_key_padding_mask = torch.cat( + [prev_key_padding_mask.float(), key_padding_mask.float()], dim=1 + ) + # During incremental decoding, as the padding token enters and + # leaves the frame, there will be a time when prev or current + # is None + elif prev_key_padding_mask is not None: + if src_len > prev_key_padding_mask.size(1): + filler = torch.zeros( + (batch_size, src_len - prev_key_padding_mask.size(1)), + device=prev_key_padding_mask.device, + ) + new_key_padding_mask = torch.cat( + [prev_key_padding_mask.float(), filler.float()], dim=1 + ) + else: + new_key_padding_mask = prev_key_padding_mask.float() + elif key_padding_mask is not None: + if src_len > key_padding_mask.size(1): + filler = torch.zeros( + (batch_size, src_len - key_padding_mask.size(1)), + device=key_padding_mask.device, + ) + new_key_padding_mask = torch.cat( + [filler.float(), key_padding_mask.float()], dim=1 + ) + else: + new_key_padding_mask = key_padding_mask.float() + else: + new_key_padding_mask = prev_key_padding_mask + return new_key_padding_mask + + @torch.jit.export + def reorder_incremental_state( + self, + incremental_state: Dict[str, Dict[str, Optional[Tensor]]], + new_order: Tensor, + ): + """Reorder buffered internal state (for incremental generation).""" + input_buffer = self._get_input_buffer(incremental_state) + if input_buffer is not None: + for k in input_buffer.keys(): + input_buffer_k = input_buffer[k] + if input_buffer_k is not None: + if self.encoder_decoder_attention and input_buffer_k.size( + 0 + ) == new_order.size(0): + break + input_buffer[k] = input_buffer_k.index_select(0, new_order) + incremental_state = self._set_input_buffer(incremental_state, input_buffer) + return incremental_state + + def _get_input_buffer( + self, incremental_state: Optional[Dict[str, Dict[str, Optional[Tensor]]]] + ) -> Dict[str, Optional[Tensor]]: + result = self.get_incremental_state(incremental_state, "attn_state") + if result is not None: + return result + else: + empty_result: Dict[str, Optional[Tensor]] = {} + return empty_result + + def _set_input_buffer( + self, + incremental_state: Dict[str, Dict[str, Optional[Tensor]]], + buffer: Dict[str, Optional[Tensor]], + ): + return self.set_incremental_state(incremental_state, "attn_state", buffer) + + def apply_sparse_mask(self, attn_weights, tgt_len: int, src_len: int, bsz: int): + return attn_weights + + def upgrade_state_dict_named(self, state_dict, name): + prefix = name + "." 
if name != "" else "" + items_to_add = {} + keys_to_remove = [] + for k in state_dict.keys(): + if k.endswith(prefix + "in_proj_weight"): + # in_proj_weight used to be q + k + v with same dimensions + dim = int(state_dict[k].shape[0] / 3) + items_to_add[prefix + "q_proj.weight"] = state_dict[k][:dim] + items_to_add[prefix + "k_proj.weight"] = state_dict[k][dim : 2 * dim] + items_to_add[prefix + "v_proj.weight"] = state_dict[k][2 * dim :] + + keys_to_remove.append(k) + + k_bias = prefix + "in_proj_bias" + if k_bias in state_dict.keys(): + dim = int(state_dict[k].shape[0] / 3) + items_to_add[prefix + "q_proj.bias"] = state_dict[k_bias][:dim] + items_to_add[prefix + "k_proj.bias"] = state_dict[k_bias][ + dim : 2 * dim + ] + items_to_add[prefix + "v_proj.bias"] = state_dict[k_bias][2 * dim :] + + keys_to_remove.append(prefix + "in_proj_bias") + + for k in keys_to_remove: + del state_dict[k] + + for key, value in items_to_add.items(): + state_dict[key] = value diff --git a/SpeechT5/speecht5/models/modules/speaker_decoder_postnet.py b/SpeechT5/speecht5/models/modules/speaker_decoder_postnet.py new file mode 100644 index 0000000000000000000000000000000000000000..555ddef0475305f5be581a58d155ff358269e051 --- /dev/null +++ b/SpeechT5/speecht5/models/modules/speaker_decoder_postnet.py @@ -0,0 +1,197 @@ +# -------------------------------------------------------- +# SpeechT5: Unified-Modal Encoder-Decoder Pre-Training for Spoken Language Processing (https://arxiv.org/abs/2110.07205) +# Github source: https://github.com/microsoft/SpeechT5/tree/main/SpeechT5 +# Copyright (c) 2021 Microsoft +# Licensed under The MIT License [see LICENSE for details] +# Based on fairseq and espnet code bases +# https://github.com/pytorch/fairseq; https://github.com/espnet/espnet +# -------------------------------------------------------- + +import torch.nn as nn +import math +import torch +import torch.nn.functional as F + + +class AngularMargin(nn.Module): + """ + An implementation of Angular Margin (AM) proposed in the following + paper: '''Margin Matters: Towards More Discriminative Deep Neural Network + Embeddings for Speaker Recognition''' (https://arxiv.org/abs/1906.07317) + + Arguments + --------- + margin : float + The margin for cosine similiarity + scale : float + The scale for cosine similiarity + + Return + --------- + predictions : torch.Tensor + + Example + ------- + >>> pred = AngularMargin() + >>> outputs = torch.tensor([ [1., -1.], [-1., 1.], [0.9, 0.1], [0.1, 0.9] ]) + >>> targets = torch.tensor([ [1., 0.], [0., 1.], [ 1., 0.], [0., 1.] ]) + >>> predictions = pred(outputs, targets) + >>> predictions[:,0] > predictions[:,1] + tensor([ True, False, True, False]) + """ + + def __init__(self, margin=0.0, scale=1.0): + super(AngularMargin, self).__init__() + self.margin = margin + self.scale = scale + + def forward(self, outputs, targets): + """Compute AM between two tensors + + Arguments + --------- + outputs : torch.Tensor + The outputs of shape [N, C], cosine similarity is required. + targets : torch.Tensor + The targets of shape [N, C], where the margin is applied for. 
+ + Return + --------- + predictions : torch.Tensor + """ + outputs = outputs - self.margin * targets + return self.scale * outputs + + +class AdditiveAngularMargin(AngularMargin): + """ + An implementation of Additive Angular Margin (AAM) proposed + in the following paper: '''Margin Matters: Towards More Discriminative Deep + Neural Network Embeddings for Speaker Recognition''' + (https://arxiv.org/abs/1906.07317) + + Arguments + --------- + margin : float + The margin for cosine similiarity, usually 0.2. + scale: float + The scale for cosine similiarity, usually 30. + + Returns + ------- + predictions : torch.Tensor + Tensor. + Example + ------- + >>> outputs = torch.tensor([ [1., -1.], [-1., 1.], [0.9, 0.1], [0.1, 0.9] ]) + >>> targets = torch.tensor([ [1., 0.], [0., 1.], [ 1., 0.], [0., 1.] ]) + >>> pred = AdditiveAngularMargin() + >>> predictions = pred(outputs, targets) + >>> predictions[:,0] > predictions[:,1] + tensor([ True, False, True, False]) + """ + + def __init__(self, margin=0.0, scale=1.0, easy_margin=False): + super(AdditiveAngularMargin, self).__init__(margin, scale) + self.easy_margin = easy_margin + + self.cos_m = math.cos(self.margin) + self.sin_m = math.sin(self.margin) + self.th = math.cos(math.pi - self.margin) + self.mm = math.sin(math.pi - self.margin) * self.margin + + def forward(self, outputs, targets): + """ + Compute AAM between two tensors + + Arguments + --------- + outputs : torch.Tensor + The outputs of shape [N, C], cosine similarity is required. + targets : torch.Tensor + The targets of shape [N, C], where the margin is applied for. + + Return + --------- + predictions : torch.Tensor + """ + cosine = outputs.float() + sine = torch.sqrt((1.0 - torch.pow(cosine, 2)).clamp(0, 1)) + phi = cosine * self.cos_m - sine * self.sin_m # cos(theta + m) + if self.easy_margin: + phi = torch.where(cosine > 0, phi, cosine) + else: + phi = torch.where(cosine > self.th, phi, cosine - self.mm) + outputs = (targets * phi) + ((1.0 - targets) * cosine) + return self.scale * outputs + + +class SpeakerDecoderPostnet(nn.Module): + """Speaker Identification Postnet. + + Arguments + --------- + embed_dim : int + The size of embedding. + class_num: int + The number of classes. 
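Combining `AdditiveAngularMargin` with the normalized projection used later in `SpeakerDecoderPostnet`, the training-time logits look roughly as follows (hypothetical sizes; the margin and scale are just typical choices, not values read from this diff):

```python
import math
import torch
import torch.nn.functional as F

margin, scale = 0.2, 30.0                                   # typical AAM settings
embed = F.normalize(torch.randn(4, 192), dim=1)             # 4 utterance embeddings
weight = F.normalize(torch.randn(10, 192), dim=1)           # 10 speaker prototypes
cosine = embed @ weight.t()                                  # cos(theta) in [-1, 1]
target = F.one_hot(torch.tensor([3, 1, 7, 0]), 10).float()

sine = torch.sqrt((1.0 - cosine.pow(2)).clamp(0, 1))
phi = cosine * math.cos(margin) - sine * math.sin(margin)    # cos(theta + m)
logits = scale * (target * phi + (1.0 - target) * cosine)    # margin only on the true class
print(logits.shape)                                          # torch.Size([4, 10])
```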
+ args : Namespace + + Return + --------- + embed : torch.Tensor + output : torch.Tensor + """ + + def __init__(self, embed_dim, class_num, args): + super(SpeakerDecoderPostnet, self).__init__() + self.embed_dim = embed_dim + self.class_num = class_num + self.no_pooling_bn = getattr(args, "sid_no_pooling_bn", False) + self.no_embed_postnet = getattr(args, "sid_no_embed_postnet", False) + self.normalize_postnet = getattr(args, "sid_normalize_postnet", False) + self.softmax_head = getattr(args, "sid_softmax_type", "softmax") + if not self.no_pooling_bn: + self.bn_pooling = nn.BatchNorm1d(args.decoder_output_dim) + else: + self.bn_pooling = None + if not self.no_embed_postnet: + self.output_embedding = nn.Linear(args.decoder_output_dim, embed_dim, bias=False) + self.bn_embedding = nn.BatchNorm1d(embed_dim) + else: + self.output_embedding = None + self.bn_embedding = None + self.embed_dim = args.decoder_output_dim + self.output_projection = nn.Linear(self.embed_dim, class_num, bias=False) + if self.softmax_head == "amsoftmax": + self.output_layer = AngularMargin(args.softmax_margin, args.softmax_scale) + elif self.softmax_head == "aamsoftmax": + self.output_layer = AdditiveAngularMargin(args.softmax_margin, args.softmax_scale, args.softmax_easy_margin) + else: + self.output_layer = None + if self.output_embedding is not None: + nn.init.normal_(self.output_embedding.weight, mean=0, std=embed_dim ** -0.5) + nn.init.normal_(self.output_projection.weight, mean=0, std=class_num ** -0.5) + + def forward(self, x, target=None): + """ + Parameters + ---------- + x : torch.Tensor of shape [batch, channel] or [batch, time, channel] + target : torch.Tensor of shape [batch, channel] + """ + if self.bn_pooling is not None: + x = self.bn_pooling(x) + if self.output_embedding is not None and self.bn_embedding is not None: + embed = self.bn_embedding(self.output_embedding(x)) + else: + embed = x + if self.output_layer is not None or self.normalize_postnet: + x_norm = F.normalize(embed, p=2, dim=1) + w_norm = F.normalize(self.output_projection.weight, p=2, dim=1) # [out_dim, in_dim] + output = F.linear(x_norm, w_norm) + if self.training and target is not None and self.output_layer is not None: + output = self.output_layer(output, target) + else: + output = self.output_projection(embed) + return output, embed diff --git a/SpeechT5/speecht5/models/modules/speech_decoder_postnet.py b/SpeechT5/speecht5/models/modules/speech_decoder_postnet.py new file mode 100644 index 0000000000000000000000000000000000000000..6e357be150f72f0b9bb27855a4417eb743763134 --- /dev/null +++ b/SpeechT5/speecht5/models/modules/speech_decoder_postnet.py @@ -0,0 +1,76 @@ +# -------------------------------------------------------- +# SpeechT5: Unified-Modal Encoder-Decoder Pre-Training for Spoken Language Processing (https://arxiv.org/abs/2110.07205) +# Github source: https://github.com/microsoft/SpeechT5/tree/main/SpeechT5 +# Copyright (c) 2021 Microsoft +# Licensed under The MIT License [see LICENSE for details] +# Based on fairseq and espnet code bases +# https://github.com/pytorch/fairseq; https://github.com/espnet/espnet +# -------------------------------------------------------- + +import contextlib +import torch +import torch.nn as nn + +from espnet.nets.pytorch_backend.tacotron2.decoder import Postnet + + +class SpeechDecoderPostnet(nn.Module): + """ + + Args: + in_channels (int): the number of input channels + mid_channels (int): the number of intermediate channels + out_channels (int): the number of output channels + kernel_sizes 
(List[int]): the kernel size for each convolutional layer + """ + + def __init__( + self, + odim, + args, + ): + super(SpeechDecoderPostnet, self).__init__() + # define decoder postnet + # define final projection + self.feat_out = torch.nn.Linear(args.decoder_embed_dim, odim * args.reduction_factor) + self.prob_out = torch.nn.Linear(args.decoder_embed_dim, args.reduction_factor) + + # define postnet + self.postnet = ( + None + if args.postnet_layers == 0 + else Postnet( + idim=0, + odim=odim, + n_layers=args.postnet_layers, + n_chans=args.postnet_chans, + n_filts=args.postnet_filts, + use_batch_norm=args.use_batch_norm, + dropout_rate=args.postnet_dropout_rate, + ) + ) + + self.odim = odim + self.num_updates = 0 + self.freeze_decoder_updates = args.freeze_decoder_updates + + def forward(self, zs): + ft = self.freeze_decoder_updates <= self.num_updates + with torch.no_grad() if not ft else contextlib.ExitStack(): + # (B, Lmax//r, odim * r) -> (B, Lmax//r * r, odim) + before_outs = self.feat_out(zs).view(zs.size(0), -1, self.odim) + # (B, Lmax//r, r) -> (B, Lmax//r * r) + logits = self.prob_out(zs).view(zs.size(0), -1) + # postnet -> (B, Lmax//r * r, odim) + if self.postnet is None: + after_outs = before_outs + else: + after_outs = before_outs + self.postnet( + before_outs.transpose(1, 2) + ).transpose(1, 2) + + return before_outs, after_outs, logits + + def set_num_updates(self, num_updates): + """Set the number of parameters updates.""" + self.num_updates = num_updates diff --git a/SpeechT5/speecht5/models/modules/speech_decoder_prenet.py b/SpeechT5/speecht5/models/modules/speech_decoder_prenet.py new file mode 100644 index 0000000000000000000000000000000000000000..bd89584606701f47c1882b67b17afa0f3d80207c --- /dev/null +++ b/SpeechT5/speecht5/models/modules/speech_decoder_prenet.py @@ -0,0 +1,110 @@ +# -------------------------------------------------------- +# SpeechT5: Unified-Modal Encoder-Decoder Pre-Training for Spoken Language Processing (https://arxiv.org/abs/2110.07205) +# Github source: https://github.com/microsoft/SpeechT5/tree/main/SpeechT5 +# Copyright (c) 2021 Microsoft +# Licensed under The MIT License [see LICENSE for details] +# Based on fairseq and espnet code bases +# https://github.com/pytorch/fairseq; https://github.com/espnet/espnet +# -------------------------------------------------------- + +import contextlib +import torch +import torch.nn as nn + +import torch.nn.functional as F +from espnet.nets.pytorch_backend.tacotron2.decoder import Prenet as TacotronDecoderPrenet +from espnet.nets.pytorch_backend.transformer.embedding import PositionalEncoding +from espnet.nets.pytorch_backend.transformer.embedding import ScaledPositionalEncoding +from espnet.nets.pytorch_backend.nets_utils import make_non_pad_mask + + +class SpeechDecoderPrenet(nn.Module): + """ + + Args: + in_channels (int): the number of input channels + mid_channels (int): the number of intermediate channels + out_channels (int): the number of output channels + kernel_sizes (List[int]): the kernel size for each convolutional layer + """ + + def __init__( + self, + odim, + args, + ): + super(SpeechDecoderPrenet, self).__init__() + # define decoder prenet + if args.dprenet_layers != 0: + # decoder prenet + decoder_input_layer = torch.nn.Sequential( + TacotronDecoderPrenet( + idim=odim, + n_layers=args.dprenet_layers, + n_units=args.dprenet_units, + dropout_rate=args.dprenet_dropout_rate, + ), + torch.nn.Linear(args.dprenet_units, args.decoder_embed_dim), + ) + else: + decoder_input_layer = "linear" + + 
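As an aside on the reduction-factor trick in `SpeechDecoderPostnet.forward` above: the decoder predicts `odim * reduction_factor` values per step, and a simple reshape recovers the full-rate spectrogram. A toy sketch with made-up sizes:

```python
import torch

B, steps, odim, r = 2, 10, 80, 4              # hypothetical sizes, reduction factor 4
zs = torch.randn(B, steps, 256)               # decoder hidden states
feat_out = torch.nn.Linear(256, odim * r)     # r spectrogram frames per decoder step
before_outs = feat_out(zs).view(B, -1, odim)  # (B, steps * r, odim)
print(before_outs.shape)                      # torch.Size([2, 40, 80])
```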
pos_enc_class = ( + ScaledPositionalEncoding if args.dec_use_scaled_pos_enc else PositionalEncoding + ) + + if decoder_input_layer == "linear": + self.decoder_prenet = torch.nn.Sequential( + torch.nn.Linear(odim, args.decoder_embed_dim), + torch.nn.LayerNorm(args.decoder_embed_dim), + torch.nn.Dropout(args.transformer_dec_dropout_rate), + torch.nn.ReLU(), + pos_enc_class(args.decoder_embed_dim, args.transformer_dec_positional_dropout_rate), + ) + elif isinstance(decoder_input_layer, torch.nn.Module): + self.decoder_prenet = torch.nn.Sequential( + decoder_input_layer, pos_enc_class(args.decoder_embed_dim, args.transformer_dec_positional_dropout_rate, max_len=args.max_speech_positions) + ) + + if args.spk_embed_integration_type == 'pre': + self.spkembs_layer = torch.nn.Sequential( + torch.nn.Linear(args.spk_embed_dim + args.decoder_embed_dim, args.decoder_embed_dim), torch.nn.ReLU() + ) + self.num_updates = 0 + self.freeze_decoder_updates = args.freeze_decoder_updates + + def forward(self, prev_output_tokens, tgt_lengths_in=None, spkembs=None): + ft = self.freeze_decoder_updates <= self.num_updates + with torch.no_grad() if not ft else contextlib.ExitStack(): + prev_output_tokens = self.decoder_prenet(prev_output_tokens) + + if spkembs is not None: + spkembs = F.normalize(spkembs).unsqueeze(1).expand(-1, prev_output_tokens.size(1), -1) + prev_output_tokens = self.spkembs_layer(torch.cat([prev_output_tokens, spkembs], dim=-1)) + + if tgt_lengths_in is not None: + tgt_frames_mask = ~(self._source_mask(tgt_lengths_in).squeeze(1)) + else: + tgt_frames_mask = None + return prev_output_tokens, tgt_frames_mask + + def _source_mask(self, ilens): + """Make masks for self-attention. + Args: + ilens (LongTensor or List): Batch of lengths (B,). + Returns: + Tensor: Mask tensor for self-attention. 
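The `spk_embed_integration_type == 'pre'` path above amounts to: L2-normalize the utterance-level x-vector, broadcast it over time, concatenate it with each decoder input frame, and project back to the model dimension. Roughly, with hypothetical dimensions:

```python
import torch
import torch.nn.functional as F

B, T, d_model, spk_dim = 2, 7, 768, 512                    # hypothetical sizes
dec_in = torch.randn(B, T, d_model)                        # prenet outputs
spkembs = torch.randn(B, spk_dim)                          # one x-vector per utterance

spk = F.normalize(spkembs).unsqueeze(1).expand(-1, T, -1)  # broadcast over time
proj = torch.nn.Sequential(torch.nn.Linear(spk_dim + d_model, d_model), torch.nn.ReLU())
out = proj(torch.cat([dec_in, spk], dim=-1))
print(out.shape)                                           # torch.Size([2, 7, 768])
```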
+ dtype=torch.uint8 in PyTorch 1.2- + dtype=torch.bool in PyTorch 1.2+ (including 1.2) + Examples: + >>> ilens = [5, 3] + >>> self._source_mask(ilens) + tensor([[[1, 1, 1, 1, 1], + [[1, 1, 1, 0, 0]]], dtype=torch.uint8) + """ + x_masks = make_non_pad_mask(ilens).to(next(self.parameters()).device) + return x_masks.unsqueeze(-2) + + def set_num_updates(self, num_updates): + """Set the number of parameters updates.""" + self.num_updates = num_updates diff --git a/SpeechT5/speecht5/models/modules/speech_encoder_postnet.py b/SpeechT5/speecht5/models/modules/speech_encoder_postnet.py new file mode 100644 index 0000000000000000000000000000000000000000..ae8371bcb2c01065636e078962249c7fd1f968f8 --- /dev/null +++ b/SpeechT5/speecht5/models/modules/speech_encoder_postnet.py @@ -0,0 +1,124 @@ +# -------------------------------------------------------- +# SpeechT5: Unified-Modal Encoder-Decoder Pre-Training for Spoken Language Processing (https://arxiv.org/abs/2110.07205) +# Github source: https://github.com/microsoft/SpeechT5/tree/main/SpeechT5 +# Copyright (c) 2021 Microsoft +# Licensed under The MIT License [see LICENSE for details] +# Based on fairseq and espnet code bases +# https://github.com/pytorch/fairseq; https://github.com/espnet/espnet +# -------------------------------------------------------- + +import logging +import torch.nn as nn +import torch + + +logger = logging.getLogger(__name__) + +class SpeechEncoderPostnet(nn.Module): + """ + + Args: + in_channels (int): the number of input channels + mid_channels (int): the number of intermediate channels + out_channels (int): the number of output channels + kernel_sizes (List[int]): the kernel size for each convolutional layer + """ + + def __init__(self, dictionaries, args): + super(SpeechEncoderPostnet, self).__init__() + # modules below are not needed during fine-tuning + self.target_glu = args.target_glu + self.skip_masked = args.skip_masked + self.skip_nomask = args.skip_nomask + self.logit_temp = args.logit_temp + + final_dim = ( + args.final_dim if args.final_dim > 0 else args.encoder_embed_dim + ) + if any([d is None for d in dictionaries]): + logger.info( + "cannot find dictionary. 
assume will be used for fine-tuning" + ) + else: + self.num_classes = [len(d) for d in dictionaries] + self.label_embs_concat = nn.Parameter( + torch.FloatTensor(sum(self.num_classes), final_dim) + ) + nn.init.uniform_(self.label_embs_concat) + self.untie_final_proj = args.untie_final_proj + if self.untie_final_proj: + self.final_proj = nn.Linear( + args.encoder_embed_dim, final_dim * len(dictionaries) + ) + else: + self.final_proj = nn.Linear(args.encoder_embed_dim, final_dim) + + def compute_nce(self, x, pos, negs): + neg_is_pos = (pos == negs).all(-1) + pos = pos.unsqueeze(0) + targets = torch.cat([pos, negs], dim=0) + + logits = torch.cosine_similarity( + x.float(), targets.float(), dim=-1 + ).type_as(x) + logits /= self.logit_temp + if neg_is_pos.any(): + logits[1:][neg_is_pos] = float("-inf") + logits = logits.transpose(0, 1) # (num_x, num_cls+1) + return logits + + def forward(self, x, padding_mask, mask_indices, target_list): + def compute_pred(proj_x, target, label_embs): + # compute logits for the i-th label set + y = torch.index_select(label_embs, 0, target.long()) + negs = label_embs.unsqueeze(1).expand(-1, proj_x.size(0), -1) + if self.target_glu: + y = self.target_glu(y) + negs = self.target_glu(negs) + # proj_x: (S, D) + # y: (S, D) + # negs: (Neg, S, D) + return self.compute_nce(proj_x, y, negs) + + label_embs_list = self.label_embs_concat.split(self.num_classes, 0) + + if not self.skip_masked: + masked_indices = torch.logical_and(~padding_mask, mask_indices) + proj_x_m = self.final_proj(x[masked_indices]) + if self.untie_final_proj: + proj_x_m_list = proj_x_m.chunk(len(target_list), dim=-1) + else: + proj_x_m_list = [proj_x_m for _ in range(len(target_list))] + logit_m_list = [ + compute_pred(proj_x_m, t[masked_indices], label_embs_list[i]) + for i, (proj_x_m, t) in enumerate( + zip(proj_x_m_list, target_list) + ) + ] + else: + logit_m_list = [None for _ in target_list] + + if not self.skip_nomask: + nomask_indices = torch.logical_and(~padding_mask, ~mask_indices) + proj_x_u = self.final_proj(x[nomask_indices]) + if self.untie_final_proj: + proj_x_u_list = proj_x_u.chunk(len(target_list), dim=-1) + else: + proj_x_u_list = [proj_x_u for _ in range(len(target_list))] + + logit_u_list = [ + compute_pred(proj_x_u, t[nomask_indices], label_embs_list[i]) + for i, (proj_x_u, t) in enumerate( + zip(proj_x_u_list, target_list) + ) + ] + else: + logit_u_list = [None for _ in target_list] + + result = { + "logit_m_list": logit_m_list, + "logit_u_list": logit_u_list, + "padding_mask": padding_mask, + } + + return result diff --git a/SpeechT5/speecht5/models/modules/speech_encoder_prenet.py b/SpeechT5/speecht5/models/modules/speech_encoder_prenet.py new file mode 100644 index 0000000000000000000000000000000000000000..89e4a7d5a9b0cb50ed3d99aa54a7e3729a6cf67e --- /dev/null +++ b/SpeechT5/speecht5/models/modules/speech_encoder_prenet.py @@ -0,0 +1,374 @@ +# -------------------------------------------------------- +# SpeechT5: Unified-Modal Encoder-Decoder Pre-Training for Spoken Language Processing (https://arxiv.org/abs/2110.07205) +# Github source: https://github.com/microsoft/SpeechT5/tree/main/SpeechT5 +# Copyright (c) 2021 Microsoft +# Licensed under The MIT License [see LICENSE for details] +# Based on fairseq and espnet code bases +# https://github.com/pytorch/fairseq; https://github.com/espnet/espnet +# -------------------------------------------------------- + +import logging +import math +import torch +import contextlib +from typing import List, Tuple +import torch.nn as nn + 
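The contrastive head above (`compute_nce`) scores each frame's projected state against its true label embedding plus sampled negatives using temperature-scaled cosine similarity, with index 0 holding the positive. A standalone sketch with made-up sizes:

```python
import torch

logit_temp = 0.1
x = torch.randn(5, 256)            # projected encoder states at 5 masked frames
pos = torch.randn(5, 256)          # embedding of the true label for each frame
negs = torch.randn(20, 5, 256)     # 20 negative label embeddings per frame

targets = torch.cat([pos.unsqueeze(0), negs], dim=0)       # (1 + 20, 5, 256)
logits = torch.cosine_similarity(x.float(), targets.float(), dim=-1) / logit_temp
logits = logits.transpose(0, 1)    # (5, 21); column 0 holds the positive
print(logits.shape)                # torch.Size([5, 21])
```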
+from fairseq.data.data_utils import lengths_to_padding_mask +from fairseq.data.data_utils import compute_mask_indices +from fairseq.modules import ( + PositionalEmbedding, + Fp32GroupNorm, + FairseqDropout, + SamePad, + GradMultiply, + LayerNorm, + Fp32LayerNorm, + TransposeLast, +) +import numpy as np + +logger = logging.getLogger(__name__) + + +class LinearLayer(nn.Module): + def __init__(self, idim, odom, dropout=0): + super(LinearLayer, self).__init__() + self.linear = nn.Sequential( + nn.Linear(idim, odom), + nn.LayerNorm(odom), + nn.Dropout(dropout), + nn.ReLU(), + ) + + def get_out_seq_lens_tensor(self, in_seq_lens_tensor): + out = in_seq_lens_tensor.clone() + return out + + def forward(self, src_tokens, src_lengths): + """ + src_tokens: [B, T, C] + src_lengths: [B] + """ + x = self.linear(src_tokens) + x = x.transpose(0, 1).contiguous() # -> T x B x C + return x, src_lengths + + +class SpeechEncoderPrenet(nn.Module): + """ + + Args: + in_channels (int): the number of input channels + mid_channels (int): the number of intermediate channels + out_channels (int): the number of output channels + kernel_sizes (List[int]): the kernel size for each convolutional layer + """ + + def __init__(self, args): + super(SpeechEncoderPrenet, self).__init__() + self.dropout_module = FairseqDropout( + p=args.dropout, module_name=self.__class__.__name__ + ) + self.embed_scale = math.sqrt(args.encoder_embed_dim) + if args.no_scale_embedding: + self.embed_scale = 1.0 + self.padding_idx = 1 + self.freeze_encoder_updates = args.freeze_encoder_updates + self.num_updates = 0 + assert args.encoder_speech_prenet in ["conv", "linear"], args.encoder_speech_prenet + feature_enc_layers = eval(args.conv_feature_layers) # noqa + self.embed = feature_enc_layers[-1][0] + + self.feature_extractor = ConvFeatureExtractionModel( + conv_layers=feature_enc_layers, + dropout=0.0, + mode=args.extractor_mode, + conv_bias=args.conv_bias, + ) + feature_ds_rate = np.prod([s for _, _, s in feature_enc_layers]) + self.feat2tar_ratio = ( + args.label_rates * feature_ds_rate / args.sample_rate + ) + + self.post_extract_proj = ( + nn.Linear(self.embed, args.encoder_embed_dim) + if self.embed != args.encoder_embed_dim + else None + ) + + self.use_conv_pos = args.use_conv_pos + self.use_sinc_pos = args.use_sinc_pos + self.use_abs_pos = getattr(args, "use_abs_pos", False) + + self.feature_grad_mult = args.feature_grad_mult + if self.use_conv_pos: + self.layer_norm = LayerNorm(self.embed) + self.pos_conv = nn.Conv1d( + args.encoder_embed_dim, + args.encoder_embed_dim, + kernel_size=args.conv_pos, + padding=args.conv_pos // 2, + groups=args.conv_pos_groups, + ) + dropout = 0 + std = math.sqrt((4 * (1.0 - dropout)) / (args.conv_pos * args.encoder_embed_dim)) + nn.init.normal_(self.pos_conv.weight, mean=0, std=std) + nn.init.constant_(self.pos_conv.bias, 0) + self.pos_conv = nn.utils.weight_norm(self.pos_conv, name="weight", dim=2) + self.pos_conv = nn.Sequential(self.pos_conv, SamePad(args.conv_pos), nn.GELU()) + + assert not (self.use_sinc_pos and self.use_abs_pos), f"sinc pos: {self.use_sinc_pos} abs pos: {self.use_abs_pos}" + if self.use_sinc_pos: + self.embed_positions = PositionalEmbedding( + args.max_speech_positions, args.encoder_embed_dim, self.padding_idx + ) + if self.use_abs_pos: + self.embed_positions = PositionalEmbedding( + args.max_speech_positions, args.encoder_embed_dim, self.padding_idx, learned=True + ) + + # Hubert + self.mask_prob = args.mask_prob + self.mask_selection = args.mask_selection + self.mask_other = 
args.mask_other + self.hubert_mask_length = args.hubert_mask_length + self.no_mask_overlap = args.no_mask_overlap + self.mask_min_space = args.mask_min_space + + self.mask_channel_prob = args.mask_channel_prob + self.mask_channel_selection = args.mask_channel_selection + self.mask_channel_other = args.mask_channel_other + self.mask_channel_length = args.mask_channel_length + self.no_mask_channel_overlap = args.no_mask_channel_overlap + self.mask_channel_min_space = args.mask_channel_min_space + + self.mask_emb = nn.Parameter( + torch.FloatTensor(args.encoder_embed_dim).uniform_() + ) + + def forward(self, src_tokens, require_feat_pen=False, target_list=None, padding_mask=None, mask=True): + ft = self.freeze_encoder_updates <= self.num_updates + with torch.no_grad() if not ft else contextlib.ExitStack(): + return self._forward(src_tokens, require_feat_pen, target_list, padding_mask, mask) + + def _forward(self, src_tokens, require_feat_pen=False, target_list=None, padding_mask=None, mask=True): + if self.feature_grad_mult > 0: + x = self.feature_extractor(src_tokens) + x = x.transpose(1, 2).transpose(0, 1) # [length, batch, hidden_size] + if self.feature_grad_mult != 1.0: + x = GradMultiply.apply(x, self.feature_grad_mult) + else: + with torch.no_grad(): + x = self.feature_extractor(src_tokens) + x = x.transpose(1, 2).transpose(0, 1) # [length, batch, hidden_size] + x = x.transpose(0, 1) # [batch, length, hidden_size] + + encoder_padding_mask = padding_mask + + x = x.transpose(1, 2) # [batch, hidden_size, length] + if target_list is not None: + x, target_list = self.forward_targets(x, target_list) + features_pen = x.float().pow(2).mean() + x = x.transpose(1, 2) # [batch, length, hidden_size] + x = self.layer_norm(x) + encoder_padding_mask = self.forward_padding_mask(x, encoder_padding_mask) + if self.post_extract_proj is not None: + x = self.post_extract_proj(x) + x = self.dropout_module(x) + if mask: + x, mask_indices = self.apply_hubert_mask( + x, encoder_padding_mask + ) + else: + x = x + mask_indices = None + + if self.use_conv_pos: + positions = self.pos_conv(x.transpose(1, 2)) + positions = positions.transpose(1, 2) + #else: + # positions = self.embed_positions(encoder_padding_mask) + x = x + positions + + if self.use_sinc_pos: + positions = self.embed_positions(encoder_padding_mask) + x = x + positions + + # x = self.dropout_module(x) + + if require_feat_pen: + return (x, features_pen, mask_indices, target_list), encoder_padding_mask + else: + # For consistence with encoder + return x, encoder_padding_mask + + def forward_targets( + self, features: torch.Tensor, target_list: List[torch.Tensor], + ) -> Tuple[torch.Tensor, torch.Tensor]: + # Trim features to ensure labels exist and then get aligned labels + feat_tsz = features.size(2) + targ_tsz = min([t.size(1) for t in target_list]) + if self.feat2tar_ratio * feat_tsz > targ_tsz: + feat_tsz = int(targ_tsz / self.feat2tar_ratio) + features = features[..., :feat_tsz] + target_inds = torch.arange(feat_tsz).float() * self.feat2tar_ratio + target_list = [t[:, target_inds.long()] for t in target_list] + return features, target_list + + def forward_padding_mask( + self, features: torch.Tensor, padding_mask: torch.Tensor, + ) -> torch.Tensor: + extra = padding_mask.size(1) % features.size(1) + if extra > 0: + padding_mask = padding_mask[:, :-extra] + padding_mask = padding_mask.view( + padding_mask.size(0), features.size(1), -1 + ) + padding_mask = padding_mask.all(-1) + return padding_mask + + def get_src_lengths(self, src_lengths): + 
return self.feature_extractor.get_out_seq_lens_tensor(src_lengths) + + def apply_hubert_mask(self, x, padding_mask): + B, T, C = x.shape + if self.mask_prob > 0: + mask_indices = compute_mask_indices( + (B, T), + padding_mask, + self.mask_prob, + self.hubert_mask_length, + self.mask_selection, + self.mask_other, + min_masks=2, + no_overlap=self.no_mask_overlap, + min_space=self.mask_min_space, + ) + mask_indices = torch.from_numpy(mask_indices).to(x.device) + x[mask_indices] = self.mask_emb + else: + mask_indices = None + + if self.mask_channel_prob > 0: + mask_channel_indices = compute_mask_indices( + (B, C), + None, + self.mask_channel_prob, + self.mask_channel_length, + self.mask_channel_selection, + self.mask_channel_other, + no_overlap=self.no_mask_channel_overlap, + min_space=self.mask_channel_min_space, + ) + mask_channel_indices = ( + torch.from_numpy(mask_channel_indices) + .to(x.device) + .unsqueeze(1) + .expand(-1, T, -1) + ) + x[mask_channel_indices] = 0 + + return x, mask_indices + + def set_num_updates(self, num_updates): + """Set the number of parameters updates.""" + self.num_updates = num_updates + +class ConvFeatureExtractionModel(nn.Module): + def __init__( + self, + conv_layers: List[Tuple[int, int, int]], + dropout: float = 0.0, + mode: str = "default", + conv_bias: bool = False, + ): + super().__init__() + + assert mode in {"default", "layer_norm"} + + def block( + n_in, + n_out, + k, + stride, + is_layer_norm=False, + is_group_norm=False, + conv_bias=False, + ): + def make_conv(): + conv = nn.Conv1d(n_in, n_out, k, stride=stride, bias=conv_bias) + nn.init.kaiming_normal_(conv.weight) + return conv + + assert ( + is_layer_norm and is_group_norm + ) == False, "layer norm and group norm are exclusive" + + if is_layer_norm: + return nn.Sequential( + make_conv(), + nn.Dropout(p=dropout), + nn.Sequential( + TransposeLast(), + Fp32LayerNorm(dim, elementwise_affine=True), + TransposeLast(), + ), + nn.GELU(), + ) + elif is_group_norm: + return nn.Sequential( + make_conv(), + nn.Dropout(p=dropout), + Fp32GroupNorm(dim, dim, affine=True), + nn.GELU(), + ) + else: + return nn.Sequential(make_conv(), nn.Dropout(p=dropout), nn.GELU()) + + in_d = 1 + self.conv_layers = nn.ModuleList() + self.conv_layers_infos = conv_layers + for i, cl in enumerate(conv_layers): + assert len(cl) == 3, "invalid conv definition: " + str(cl) + (dim, k, stride) = cl + + self.conv_layers.append( + block( + in_d, + dim, + k, + stride, + is_layer_norm=mode == "layer_norm", + is_group_norm=mode == "default" and i == 0, + conv_bias=conv_bias, + ) + ) + in_d = dim + + def forward(self, x): + # BxT -> BxCxT + x = x.unsqueeze(1) + for conv in self.conv_layers: + x = conv(x) + return x + + def get_out_seq_lens_nonmask_after_a_layer(self, in_seq_lens_tensor, i): + """Returns the out_seq_lens_nonmask 0/1 tensor after a layer. 
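The length bookkeeping in the helpers that follow uses the standard Conv1d output-length formula applied layer by layer. A sketch with a hypothetical wav2vec2/HuBERT-style layer configuration (the actual `conv_feature_layers` value comes from the command-line args and is not shown in this diff):

```python
# Hypothetical conv stack: (dim, kernel, stride) per layer.
conv_layers = [(512, 10, 5)] + [(512, 3, 2)] * 4 + [(512, 2, 2)] * 2

def out_length(n_samples: int) -> int:
    out = n_samples
    for _, k, s in conv_layers:
        out = (out - (k - 1) - 1) // s + 1   # standard Conv1d output-length formula
    return out

print(out_length(16000))   # 49 frames for 1 s of 16 kHz audio (~20 ms hop)
```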
+ + Args: + in_seq_lens_tensor (LongTensor): length + + Returns: + LongTensor: length + """ + out_lengths = in_seq_lens_tensor.clone() + out_lengths = ((out_lengths.float() - (self.conv_layers_infos[i][1] - 1) - 1) / self.conv_layers_infos[i][-1] + 1).floor().long() + out_nonmask = (~lengths_to_padding_mask(out_lengths)).float() + return out_nonmask, out_lengths + + def get_out_seq_lens_tensor(self, in_seq_lens_tensor): + out = in_seq_lens_tensor.clone() + for i in range(len(self.conv_layers)): + out = ((out.float() - (self.conv_layers_infos[i][1] - 1) - 1) / self.conv_layers_infos[i][-1] + 1).floor().long() + return out diff --git a/SpeechT5/speecht5/models/modules/text_decoder_postnet.py b/SpeechT5/speecht5/models/modules/text_decoder_postnet.py new file mode 100644 index 0000000000000000000000000000000000000000..f9230352196accfdd40891bb1a844b3740c8253c --- /dev/null +++ b/SpeechT5/speecht5/models/modules/text_decoder_postnet.py @@ -0,0 +1,93 @@ +# -------------------------------------------------------- +# SpeechT5: Unified-Modal Encoder-Decoder Pre-Training for Spoken Language Processing (https://arxiv.org/abs/2110.07205) +# Github source: https://github.com/microsoft/SpeechT5/tree/main/SpeechT5 +# Copyright (c) 2021 Microsoft +# Licensed under The MIT License [see LICENSE for details] +# Based on fairseq and espnet code bases +# https://github.com/pytorch/fairseq; https://github.com/espnet/espnet +# -------------------------------------------------------- + +import torch.nn as nn +import torch +import contextlib + +from fairseq import utils +from fairseq.modules import ( + AdaptiveSoftmax, +) + +class TextDecoderPostnet(nn.Module): + """ + + Args: + in_channels (int): the number of input channels + mid_channels (int): the number of intermediate channels + out_channels (int): the number of output channels + kernel_sizes (List[int]): the kernel size for each convolutional layer + """ + + def __init__(self, embed_tokens, dictionary, args, output_projection=None,): + super(TextDecoderPostnet, self).__init__() + self.output_embed_dim = args.decoder_output_dim + self.output_projection = output_projection + self.adaptive_softmax = None + self.share_input_output_embed = args.share_input_output_embed + if self.output_projection is None: + self.build_output_projection(args, dictionary, embed_tokens) + self.freeze_decoder_updates = args.freeze_decoder_updates + self.num_updates = 0 + + def output_layer(self, features): + """Project features to the vocabulary size.""" + if self.adaptive_softmax is None: + # project back to size of vocabulary + return self.output_projection(features) + else: + return features + + def build_output_projection(self, args, dictionary, embed_tokens): + if args.adaptive_softmax_cutoff is not None: + self.adaptive_softmax = AdaptiveSoftmax( + len(dictionary), + self.output_embed_dim, + utils.eval_str_list(args.adaptive_softmax_cutoff, type=int), + dropout=args.adaptive_softmax_dropout, + adaptive_inputs=embed_tokens if args.tie_adaptive_weights else None, + factor=args.adaptive_softmax_factor, + tie_proj=args.tie_adaptive_proj, + ) + elif self.share_input_output_embed: + self.output_projection = nn.Linear( + embed_tokens.weight.shape[1], + embed_tokens.weight.shape[0], + bias=False, + ) + self.output_projection.weight = embed_tokens.weight + else: + self.output_projection = nn.Linear( + self.output_embed_dim, len(dictionary), bias=False + ) + nn.init.normal_( + self.output_projection.weight, mean=0, std=self.output_embed_dim ** -0.5 + ) + # num_base_layers = 
getattr(args, "base_layers", 0) + # for i in range(num_base_layers): + # self.layers.insert( + # ((i + 1) * args.decoder_layers) // (num_base_layers + 1), + # BaseLayer(args), + # ) + + def forward(self, x): + ft = self.freeze_decoder_updates <= self.num_updates + with torch.no_grad() if not ft else contextlib.ExitStack(): + return self._forward(x) + + def _forward(self, x): + # embed positions + x = self.output_layer(x) + + return x + + def set_num_updates(self, num_updates): + """Set the number of parameters updates.""" + self.num_updates = num_updates diff --git a/SpeechT5/speecht5/models/modules/text_decoder_prenet.py b/SpeechT5/speecht5/models/modules/text_decoder_prenet.py new file mode 100644 index 0000000000000000000000000000000000000000..8921b8ec3776fef81b6068b7d101b8d12286054f --- /dev/null +++ b/SpeechT5/speecht5/models/modules/text_decoder_prenet.py @@ -0,0 +1,128 @@ +# -------------------------------------------------------- +# SpeechT5: Unified-Modal Encoder-Decoder Pre-Training for Spoken Language Processing (https://arxiv.org/abs/2110.07205) +# Github source: https://github.com/microsoft/SpeechT5/tree/main/SpeechT5 +# Copyright (c) 2021 Microsoft +# Licensed under The MIT License [see LICENSE for details] +# Based on fairseq and espnet code bases +# https://github.com/pytorch/fairseq; https://github.com/espnet/espnet +# -------------------------------------------------------- + +import math +import torch.nn as nn +import torch +import contextlib + +from fairseq.modules.quant_noise import quant_noise as apply_quant_noise_ +from fairseq.models.transformer import Linear, LayerNorm +from fairseq.modules import ( + PositionalEmbedding, + FairseqDropout, +) + + +class TextDecoderPrenet(nn.Module): + """ + + Args: + in_channels (int): the number of input channels + mid_channels (int): the number of intermediate channels + out_channels (int): the number of output channels + kernel_sizes (List[int]): the kernel size for each convolutional layer + """ + + def __init__(self, embed_tokens, args): + super(TextDecoderPrenet, self).__init__() + self.dropout_module = FairseqDropout( + args.dropout, module_name=self.__class__.__name__ + ) + self.decoder_layerdrop = args.decoder_layerdrop + self.num_updates = 0 + + input_embed_dim = embed_tokens.embedding_dim + embed_dim = args.decoder_embed_dim + self.embed_dim = embed_dim + self.output_embed_dim = args.decoder_output_dim + + self.padding_idx = embed_tokens.padding_idx + + self.embed_tokens = embed_tokens + + self.embed_scale = 1.0 if args.no_scale_embedding else math.sqrt(embed_dim) + + if not args.adaptive_input and args.quant_noise_pq > 0: + self.quant_noise = apply_quant_noise_( + nn.Linear(embed_dim, embed_dim, bias=False), + args.quant_noise_pq, + args.quant_noise_pq_block_size, + ) + else: + self.quant_noise = None + + self.project_in_dim = ( + Linear(input_embed_dim, embed_dim, bias=False) + if embed_dim != input_embed_dim + else None + ) + self.embed_positions = ( + PositionalEmbedding( + args.max_text_positions, + embed_dim, + self.padding_idx, + learned=args.decoder_learned_pos, + ) + if not args.no_token_positional_embeddings + else None + ) + export = getattr(args, "export", False) + if getattr(args, "layernorm_embedding", False): + self.layernorm_embedding = LayerNorm(embed_dim, export=export) + else: + self.layernorm_embedding = None + + self.freeze_decoder_updates = args.freeze_decoder_updates + + def forward(self, prev_output_tokens, incremental_state=None): + ft = self.freeze_decoder_updates <= self.num_updates + with 
torch.no_grad() if not ft else contextlib.ExitStack(): + return self._forward(prev_output_tokens, incremental_state) + + def _forward(self, prev_output_tokens, incremental_state=None): + if prev_output_tokens.eq(self.padding_idx).any(): + x_mask = prev_output_tokens.eq(self.padding_idx) + else: + x_mask = None + + # embed positions + positions = None + if self.embed_positions is not None: + positions = self.embed_positions( + prev_output_tokens, incremental_state=incremental_state + ) + + if incremental_state is not None: + prev_output_tokens = prev_output_tokens[:, -1:] + if positions is not None: + positions = positions[:, -1:] + + # embed tokens and positions + x = self.embed_scale * self.embed_tokens(prev_output_tokens) + + if self.quant_noise is not None: + x = self.quant_noise(x) + + if self.project_in_dim is not None: + x = self.project_in_dim(x) + + if positions is not None: + x += positions + + if self.layernorm_embedding is not None: + x = self.layernorm_embedding(x) + + x = self.dropout_module(x) + + return x, x_mask, incremental_state + + def set_num_updates(self, num_updates): + """Set the number of parameters updates.""" + self.num_updates = num_updates diff --git a/SpeechT5/speecht5/models/modules/text_encoder_prenet.py b/SpeechT5/speecht5/models/modules/text_encoder_prenet.py new file mode 100644 index 0000000000000000000000000000000000000000..466e6493c002e2043f045f31c9c2d7f9712fb5ef --- /dev/null +++ b/SpeechT5/speecht5/models/modules/text_encoder_prenet.py @@ -0,0 +1,45 @@ +# -------------------------------------------------------- +# SpeechT5: Unified-Modal Encoder-Decoder Pre-Training for Spoken Language Processing (https://arxiv.org/abs/2110.07205) +# Github source: https://github.com/microsoft/SpeechT5/tree/main/SpeechT5 +# Copyright (c) 2021 Microsoft +# Licensed under The MIT License [see LICENSE for details] +# Based on fairseq and espnet code bases +# https://github.com/pytorch/fairseq; https://github.com/espnet/espnet +# -------------------------------------------------------- + +import torch.nn as nn + +from espnet.nets.pytorch_backend.transformer.embedding import PositionalEncoding +from espnet.nets.pytorch_backend.transformer.embedding import ScaledPositionalEncoding + + +class TextEncoderPrenet(nn.Module): + """ + + Args: + in_channels (int): the number of input channels + mid_channels (int): the number of intermediate channels + out_channels (int): the number of output channels + kernel_sizes (List[int]): the kernel size for each convolutional layer + """ + + def __init__( + self, + embed_tokens, + args, + ): + super(TextEncoderPrenet, self).__init__() + self.padding_idx = embed_tokens.padding_idx + # define encoder prenet + # get positional encoding class + pos_enc_class = ( + ScaledPositionalEncoding if args.enc_use_scaled_pos_enc else PositionalEncoding + ) + + self.encoder_prenet = nn.Sequential( + embed_tokens, + pos_enc_class(args.encoder_embed_dim, args.transformer_enc_positional_dropout_rate, max_len=args.max_text_positions), + ) + + def forward(self, src_tokens): + return self.encoder_prenet(src_tokens), src_tokens.eq(self.padding_idx) diff --git a/SpeechT5/speecht5/models/modules/transformer_layer.py b/SpeechT5/speecht5/models/modules/transformer_layer.py new file mode 100644 index 0000000000000000000000000000000000000000..3bdc0ba9fd0cc889e6934a53681f444519eb42ac --- /dev/null +++ b/SpeechT5/speecht5/models/modules/transformer_layer.py @@ -0,0 +1,411 @@ +# -------------------------------------------------------- +# SpeechT5: Unified-Modal 
Encoder-Decoder Pre-Training for Spoken Language Processing (https://arxiv.org/abs/2110.07205) +# Github source: https://github.com/microsoft/SpeechT5/tree/main/SpeechT5 +# Copyright (c) 2021 Microsoft +# Licensed under The MIT License [see LICENSE for details] +# Based on fairseq and espnet code bases +# https://github.com/pytorch/fairseq; https://github.com/espnet/espnet +# -------------------------------------------------------- + +from typing import Dict, List, Optional + +import torch +import torch.nn as nn +import contextlib +from fairseq import utils +from fairseq.modules import LayerNorm +from .multihead_attention import MultiheadAttention +from fairseq.modules.fairseq_dropout import FairseqDropout +from fairseq.modules.quant_noise import quant_noise +from torch import Tensor + + +class TransformerSentenceEncoderLayer(nn.Module): + """ + Implements a Transformer Encoder Layer used in BERT/XLM style pre-trained + models. + """ + + def __init__( + self, + embedding_dim: float = 768, + ffn_embedding_dim: float = 3072, + num_attention_heads: float = 8, + dropout: float = 0.1, + attention_dropout: float = 0.1, + activation_dropout: float = 0.1, + activation_fn: str = "relu", + layer_norm_first: bool = False, + has_relative_attention_bias: bool = False, + ) -> None: + + super().__init__() + # Initialize parameters + self.embedding_dim = embedding_dim + self.dropout = dropout + self.activation_dropout = activation_dropout + + # Initialize blocks + self.activation_fn = utils.get_activation_fn(activation_fn) + self.self_attn = MultiheadAttention( + self.embedding_dim, + num_attention_heads, + dropout=attention_dropout, + self_attention=True, + has_relative_attention_bias=has_relative_attention_bias, + ) + + self.dropout1 = nn.Dropout(dropout) + self.dropout2 = nn.Dropout(self.activation_dropout) + self.dropout3 = nn.Dropout(dropout) + + self.layer_norm_first = layer_norm_first + + # layer norm associated with the self attention layer + self.self_attn_layer_norm = LayerNorm(self.embedding_dim) + self.fc1 = nn.Linear(self.embedding_dim, ffn_embedding_dim) + self.fc2 = nn.Linear(ffn_embedding_dim, self.embedding_dim) + + # layer norm associated with the position wise feed-forward NN + self.final_layer_norm = LayerNorm(self.embedding_dim) + + if has_relative_attention_bias: + self.norm_k = LayerNorm(self.embedding_dim//num_attention_heads) + + def forward( + self, + x: torch.Tensor, + self_attn_mask: torch.Tensor = None, + self_attn_padding_mask: torch.Tensor = None, + need_weights: bool = False, + att_args=None, + pos_bias=None, + ): + """ + LayerNorm is applied either before or after the self-attention/ffn + modules similar to the original Transformer imlementation. 
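+        The pre-norm path is taken when ``layer_norm_first`` is True; otherwise
+        LayerNorm is applied after the residual connections (post-norm).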
+ """ + residual = x + + if self.layer_norm_first: + x = self.self_attn_layer_norm(x) + if pos_bias is not None: + pos_bias = self.norm_k(pos_bias) + x, attn = self.self_attn( + query=x, + key=x, + value=x, + key_padding_mask=self_attn_padding_mask, + attn_mask=self_attn_mask, + position_bias=pos_bias, + ) + x = self.dropout1(x) + x = residual + x + + residual = x + x = self.final_layer_norm(x) + x = self.activation_fn(self.fc1(x)) + x = self.dropout2(x) + x = self.fc2(x) + x = self.dropout3(x) + x = residual + x + else: + x, attn = self.self_attn( + query=x, + key=x, + value=x, + key_padding_mask=self_attn_padding_mask, + position_bias=pos_bias, + ) + + x = self.dropout1(x) + x = residual + x + + x = self.self_attn_layer_norm(x) + + residual = x + x = self.activation_fn(self.fc1(x)) + x = self.dropout2(x) + x = self.fc2(x) + x = self.dropout3(x) + x = residual + x + x = self.final_layer_norm(x) + + return x, attn + + +class TransformerDecoderLayer(nn.Module): + """Decoder layer block. + + In the original paper each operation (multi-head attention, encoder + attention or FFN) is postprocessed with: `dropout -> add residual -> + layernorm`. In the tensor2tensor code they suggest that learning is more + robust when preprocessing each layer with layernorm and postprocessing with: + `dropout -> add residual`. We default to the approach in the paper, but the + tensor2tensor approach can be enabled by setting + *args.decoder_normalize_before* to ``True``. + + Args: + args (argparse.Namespace): parsed command-line arguments + no_encoder_attn (bool, optional): whether to attend to encoder outputs + (default: False). + """ + + def __init__( + self, args, no_encoder_attn=False, add_bias_kv=False, add_zero_attn=False, has_relative_attention_bias=False + ): + super().__init__() + self.embed_dim = args.decoder_embed_dim + self.num_updates = 0 + self.dropout_module = FairseqDropout( + args.dropout, module_name=self.__class__.__name__ + ) + self.quant_noise = getattr(args, "quant_noise_pq", 0) + self.quant_noise_block_size = getattr(args, "quant_noise_pq_block_size", 8) + + self.cross_self_attention = getattr(args, "cross_self_attention", False) + + self.freeze_decoder_updates = getattr(args, "freeze_decoder_updates", 0) + + self.self_attn = self.build_self_attention( + self.embed_dim, + args, + add_bias_kv=add_bias_kv, + add_zero_attn=add_zero_attn, + ) + + self.activation_fn = utils.get_activation_fn( + activation=str(args.activation_fn) + if getattr(args, "activation_fn", None) is not None + else "relu" + ) + activation_dropout_p = getattr(args, "activation_dropout", 0) or 0 + if activation_dropout_p == 0: + # for backwards compatibility with models that use args.relu_dropout + activation_dropout_p = getattr(args, "relu_dropout", 0) or 0 + self.activation_dropout_module = FairseqDropout( + float(activation_dropout_p), module_name=self.__class__.__name__ + ) + self.normalize_before = args.decoder_normalize_before + + export = getattr(args, "export", False) + self.self_attn_layer_norm = LayerNorm(self.embed_dim, export=export) + + if no_encoder_attn: + self.encoder_attn = None + self.encoder_attn_layer_norm = None + else: + self.encoder_attn = self.build_encoder_attention(self.embed_dim, args) + self.encoder_attn_layer_norm = LayerNorm(self.embed_dim, export=export) + + self.fc1 = self.build_fc1( + self.embed_dim, + args.decoder_ffn_embed_dim, + self.quant_noise, + self.quant_noise_block_size, + ) + self.fc2 = self.build_fc2( + args.decoder_ffn_embed_dim, + self.embed_dim, + self.quant_noise, + 
self.quant_noise_block_size, + ) + + self.final_layer_norm = LayerNorm(self.embed_dim, export=export) + self.need_attn = True + + self.onnx_trace = False + + self.has_relative_attention_bias = has_relative_attention_bias + if self.has_relative_attention_bias: + self.norm_k = LayerNorm(self.embed_dim//args.decoder_attention_heads) + + def build_fc1(self, input_dim, output_dim, q_noise, qn_block_size): + return quant_noise(nn.Linear(input_dim, output_dim), q_noise, qn_block_size) + + def build_fc2(self, input_dim, output_dim, q_noise, qn_block_size): + return quant_noise(nn.Linear(input_dim, output_dim), q_noise, qn_block_size) + + def build_self_attention( + self, embed_dim, args, add_bias_kv=False, add_zero_attn=False + ): + return MultiheadAttention( + embed_dim, + args.decoder_attention_heads, + dropout=args.attention_dropout, + add_bias_kv=add_bias_kv, + add_zero_attn=add_zero_attn, + self_attention=not getattr(args, "cross_self_attention", False), + q_noise=self.quant_noise, + qn_block_size=self.quant_noise_block_size, + #has_relative_attention_bias=args.has_relative_attention_bias, + ) + + def build_encoder_attention(self, embed_dim, args): + return MultiheadAttention( + embed_dim, + args.decoder_attention_heads, + kdim=getattr(args, "encoder_embed_dim", None), + vdim=getattr(args, "encoder_embed_dim", None), + dropout=args.attention_dropout, + encoder_decoder_attention=True, + q_noise=self.quant_noise, + qn_block_size=self.quant_noise_block_size, + ) + + def prepare_for_onnx_export_(self): + self.onnx_trace = True + + def residual_connection(self, x, residual): + return residual + x + + def forward( + self, + x, + encoder_out: Optional[torch.Tensor] = None, + encoder_padding_mask: Optional[torch.Tensor] = None, + incremental_state: Optional[Dict[str, Dict[str, Optional[Tensor]]]] = None, + prev_self_attn_state: Optional[List[torch.Tensor]] = None, + prev_attn_state: Optional[List[torch.Tensor]] = None, + self_attn_mask: Optional[torch.Tensor] = None, + self_attn_padding_mask: Optional[torch.Tensor] = None, + need_attn: bool = False, + need_head_weights: bool = False, + pos_bias=None, + ): + """ + Args: + x (Tensor): input to the layer of shape `(seq_len, batch, embed_dim)` + encoder_padding_mask (ByteTensor, optional): binary + ByteTensor of shape `(batch, src_len)` where padding + elements are indicated by ``1``. + need_attn (bool, optional): return attention weights + need_head_weights (bool, optional): return attention weights + for each head (default: return average over heads). 
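+            pos_bias (Tensor, optional): relative position bias passed to the
+                self-attention as ``position_bias`` when relative position
+                embedding is enabled.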
+ + Returns: + encoded output of shape `(seq_len, batch, embed_dim)` + """ + ft = self.freeze_decoder_updates <= self.num_updates + + with torch.no_grad() if not ft else contextlib.ExitStack(): + if need_head_weights: + need_attn = True + + residual = x + if self.normalize_before: + x = self.self_attn_layer_norm(x) + if pos_bias is not None: + pos_bias = self.norm_k(pos_bias) + if prev_self_attn_state is not None: + prev_key, prev_value = prev_self_attn_state[:2] + saved_state: Dict[str, Optional[Tensor]] = { + "prev_key": prev_key, + "prev_value": prev_value, + } + if len(prev_self_attn_state) >= 3: + saved_state["prev_key_padding_mask"] = prev_self_attn_state[2] + assert incremental_state is not None + self.self_attn._set_input_buffer(incremental_state, saved_state) + _self_attn_input_buffer = self.self_attn._get_input_buffer(incremental_state) + if self.cross_self_attention and not ( + incremental_state is not None + and _self_attn_input_buffer is not None + and "prev_key" in _self_attn_input_buffer + ): + if self_attn_mask is not None: + assert encoder_out is not None + self_attn_mask = torch.cat( + (x.new_zeros(x.size(0), encoder_out.size(0)), self_attn_mask), dim=1 + ) + if self_attn_padding_mask is not None: + if encoder_padding_mask is None: + assert encoder_out is not None + encoder_padding_mask = self_attn_padding_mask.new_zeros( + encoder_out.size(1), encoder_out.size(0) + ) + self_attn_padding_mask = torch.cat( + (encoder_padding_mask, self_attn_padding_mask), dim=1 + ) + assert encoder_out is not None + y = torch.cat((encoder_out, x), dim=0) + else: + y = x + + x, attn = self.self_attn( + query=x, + key=y, + value=y, + key_padding_mask=self_attn_padding_mask, + incremental_state=incremental_state, + need_weights=False, + attn_mask=self_attn_mask, + position_bias=pos_bias, + ) + x = self.dropout_module(x) + x = self.residual_connection(x, residual) + if not self.normalize_before: + x = self.self_attn_layer_norm(x) + + if self.encoder_attn is not None and encoder_out is not None: + residual = x + if self.normalize_before: + x = self.encoder_attn_layer_norm(x) + if prev_attn_state is not None: + prev_key, prev_value = prev_attn_state[:2] + saved_state: Dict[str, Optional[Tensor]] = { + "prev_key": prev_key, + "prev_value": prev_value, + } + if len(prev_attn_state) >= 3: + saved_state["prev_key_padding_mask"] = prev_attn_state[2] + assert incremental_state is not None + self.encoder_attn._set_input_buffer(incremental_state, saved_state) + + x, attn = self.encoder_attn( + query=x, + key=encoder_out, + value=encoder_out, + key_padding_mask=encoder_padding_mask, + incremental_state=incremental_state, + static_kv=True, + need_weights=need_attn or (not self.training and self.need_attn), + need_head_weights=need_head_weights, + ) + x = self.dropout_module(x) + x = self.residual_connection(x, residual) + if not self.normalize_before: + x = self.encoder_attn_layer_norm(x) + + with torch.no_grad() if not ft else contextlib.ExitStack(): + residual = x + if self.normalize_before: + x = self.final_layer_norm(x) + + x = self.activation_fn(self.fc1(x)) + x = self.activation_dropout_module(x) + x = self.fc2(x) + x = self.dropout_module(x) + x = self.residual_connection(x, residual) + if not self.normalize_before: + x = self.final_layer_norm(x) + if self.onnx_trace and incremental_state is not None: + saved_state = self.self_attn._get_input_buffer(incremental_state) + assert saved_state is not None + if self_attn_padding_mask is not None: + self_attn_state = [ + saved_state["prev_key"], + 
saved_state["prev_value"], + saved_state["prev_key_padding_mask"], + ] + else: + self_attn_state = [saved_state["prev_key"], saved_state["prev_value"]] + return x, attn, self_attn_state + return x, attn, None + + def make_generation_fast_(self, need_attn: bool = False, **kwargs): + self.need_attn = need_attn + + def set_num_updates(self, num_updates): + """Set the number of parameters updates.""" + self.num_updates = num_updates diff --git a/SpeechT5/speecht5/models/speecht5.py b/SpeechT5/speecht5/models/speecht5.py new file mode 100644 index 0000000000000000000000000000000000000000..cb17131522344e7676bfcc4ceb3c053d17f9eb10 --- /dev/null +++ b/SpeechT5/speecht5/models/speecht5.py @@ -0,0 +1,1447 @@ +# -------------------------------------------------------- +# SpeechT5: Unified-Modal Encoder-Decoder Pre-Training for Spoken Language Processing (https://arxiv.org/abs/2110.07205) +# Github source: https://github.com/microsoft/SpeechT5/tree/main/SpeechT5 +# Copyright (c) 2021 Microsoft +# Licensed under The MIT License [see LICENSE for details] +# Based on fairseq and espnet code bases +# https://github.com/pytorch/fairseq; https://github.com/espnet/espnet +# -------------------------------------------------------- + +import logging +from ast import literal_eval +from typing import Dict, List, Optional, Tuple + +import torch +import torch.nn.functional as F +from fairseq import utils +from fairseq.models import ( + FairseqEncoderDecoderModel, + FairseqIncrementalDecoder, + register_model, + register_model_architecture, +) +from .modules.text_encoder_prenet import TextEncoderPrenet +from .modules.text_decoder_prenet import TextDecoderPrenet +from .modules.text_decoder_postnet import TextDecoderPostnet +from .modules.speech_encoder_prenet import SpeechEncoderPrenet +from .modules.speech_encoder_postnet import SpeechEncoderPostnet +from .modules.speech_decoder_prenet import SpeechDecoderPrenet +from .modules.speech_decoder_postnet import SpeechDecoderPostnet +from .modules.speaker_decoder_postnet import SpeakerDecoderPostnet +from .modules.encoder import TransformerEncoder +from .modules.decoder import TransformerDecoder +from fairseq.modules.transformer_sentence_encoder import init_bert_params +from fairseq.models.transformer import Embedding +from fairseq.modules import ( + GumbelVectorQuantizer, +) +from torch import Tensor + + +logger = logging.getLogger(__name__) + +DEFAULT_MAX_TEXT_POSITIONS = 450 +DEFAULT_MAX_SPEECH_POSITIONS = 4000 + + +@register_model("t5_transformer") +class T5TransformerModel(FairseqEncoderDecoderModel): + """Adapted Transformer model (https://arxiv.org/abs/1706.03762) for + speech-to-text tasks. The Transformer encoder/decoder remains the same. 
+ A trainable input subsampler is prepended to the Transformer encoder to + project inputs into the encoder dimension as well as downsample input + sequence for computational efficiency.""" + + def __init__( + self, + args, + encoder, decoder, + text_encoder_prenet, speech_encoder_prenet, + text_decoder_prenet, speech_decoder_prenet, + text_decoder_postnet, speech_decoder_postnet, + speaker_decoder_postnet, speech_encoder_postnet, + ): + super().__init__(encoder, decoder) + + self.encoder = encoder + self.decoder = decoder + + self.text_encoder_prenet = text_encoder_prenet + self.speech_encoder_prenet = speech_encoder_prenet + + self.text_decoder_prenet = text_decoder_prenet + self.speech_decoder_prenet = speech_decoder_prenet + + self.text_decoder_postnet = text_decoder_postnet + self.speech_decoder_postnet = speech_decoder_postnet + self.speaker_decoder_postnet = speaker_decoder_postnet + + self.hubert_layer = speech_encoder_postnet + + self.reduction_factor = args.reduction_factor + self.spk_embed_dim = args.spk_embed_dim + # define projection layer + self.spk_embed_integration_type = args.spk_embed_integration_type + if self.spk_embed_dim is not None and self.spk_embed_integration_type != 'pre': + if self.spk_embed_integration_type == "add": + self.projection = torch.nn.Linear(self.spk_embed_dim, args.decoder_embed_dim) + else: + self.projection = torch.nn.Linear( + args.decoder_embed_dim + self.spk_embed_dim, args.decoder_embed_dim + ) + + self.use_codebook = args.use_codebook + self.codebook_prob = getattr(args, "codebook_prob", 0.5) # args.codebook_prob + if self.use_codebook: + vq_dim = args.latent_dim if args.latent_dim > 0 else args.encoder_embed_dim + self.quantizer = GumbelVectorQuantizer( + dim=args.encoder_embed_dim, + num_vars=args.latent_vars, + temp=args.latent_temp, + groups=args.latent_groups, + combine_groups=False, + vq_dim=vq_dim, + time_first=True, + weight_proj_depth=args.quantizer_depth, + weight_proj_factor=args.quantizer_factor, + ) + + self.num_updates = 0 + + # # Follow BERT's random weight initialization (for BART) + if args.bert_init: + self.apply(init_bert_params) + self.args = args + self.prune_modules(args.modules_filter) + + @staticmethod + def add_args(parser): + """Add model-specific arguments to the parser.""" + # Transformer + parser.add_argument( + "--activation-fn", + type=str, + choices=utils.get_available_activation_fns(), + help="activation function to use", + ) + parser.add_argument( + "--dropout", type=float, metavar="D", help="dropout probability" + ) + parser.add_argument( + "--attention-dropout", + type=float, + metavar="D", + help="dropout probability for attention weights", + ) + parser.add_argument( + "--activation-dropout", + "--relu-dropout", + type=float, + metavar="D", + help="dropout probability after activation in FFN.", + ) + parser.add_argument( + "--encoder-embed-dim", + type=int, + metavar="N", + help="encoder embedding dimension", + ) + parser.add_argument( + "--encoder-ffn-embed-dim", + type=int, + metavar="N", + help="encoder embedding dimension for FFN", + ) + parser.add_argument( + "--encoder-layers", type=int, metavar="N", help="num encoder layers" + ) + parser.add_argument( + "--encoder-attention-heads", + type=int, + metavar="N", + help="num encoder attention heads", + ) + parser.add_argument( + "--encoder-normalize-before", + action="store_true", + help="apply layernorm before each encoder block", + ) + parser.add_argument( + "--decoder-normalize-before", + action="store_true", + help="apply layernorm before each 
decoder block", + ) + parser.add_argument( + "--decoder-embed-dim", + type=int, + metavar="N", + help="decoder embedding dimension", + ) + parser.add_argument( + "--decoder-ffn-embed-dim", + type=int, + metavar="N", + help="decoder embedding dimension for FFN", + ) + parser.add_argument( + "--decoder-layers", type=int, metavar="N", help="num decoder layers" + ) + parser.add_argument( + "--decoder-attention-heads", + type=int, + metavar="N", + help="num decoder attention heads", + ) + parser.add_argument( + "--reduction-factor", + type=int, + help="reduction factor for decoder", + ) + parser.add_argument( + "--spk-embed-dim", + type=int, + help="speaker embedding dimension", + ) + parser.add_argument( + "--layernorm-embedding", + action="store_true", + help="add layernorm to embedding", + ) + parser.add_argument( + "--load-pretrained-encoder-from", + type=str, + metavar="STR", + help="model to take encoder weights from (for initialization)", + ) + parser.add_argument( + '--freeze-encoder-updates', + type=int, + help='number of steps to freeze encoder before finetune' + ) + parser.add_argument( + '--freeze-decoder-updates', + type=int, + help='number of steps to freeze decoder before finetune' + ) + parser.add_argument( + '--no-freeze-encoder-layer', + type=str, + help='which encoder layer not freeze during finetune' + ) + parser.add_argument( + "--share-input-output-embed", + action="store_true", + help="share decoder input and output embeddings", + ) + parser.add_argument( + "--share-ctc-embed", + action="store_true", + help="share ctc embed and decoder embed", + ) + parser.add_argument( + "--encoder-sliding-window-attn", + default=None, + type=int, + help="If not None but a even number, set sliding window attention to encoder's attn_mask, e.g., 4, 10, and 20", + ) + + # Convolutional subsampler + parser.add_argument( + "--encoder-speech-prenet", + default="conv", + type=str, + choices=["conv", "linear"], + help="The type of encoder speech prenet, e.g., conv or linear." + ) + parser.add_argument( + "--conv-kernel-sizes", + default="5,5", + type=str, + help="The layer of convolution of encoder speech prenet." + ) + parser.add_argument( + "--conv-channels", + default=1024, + type=int, + help="The channels of encoder speech prenet." + ) + parser.add_argument( + "--subsample-stride", + default="2,2", + type=str, + help="The subsample stride for conv1dsubsample." + ) + parser.add_argument( + "--spk-embed-integration-type", + type=str, + choices=["pre", "add"], + help="speaker embedding integration type" + ) + parser.add_argument( + "--dprenet-dropout-rate", + default=0.5, + type=float, + help="The dropout rate of decoder speech prenet." + ) + + ## SE + parser.add_argument( + "--se-predict", + default=None, + choices=["masking", "target", "delta"], + help="If set, source speech inputs decoder to predict the masking/target/delta of corresponding inputs." 
+ + "masking is [0, 1], target is predicted output, delta is difference between inputs and outputs", + ) + parser.add_argument( + "--se-decoder-input", + type=str, + default="previous_target", + choices=["previous_target", "source"], + ) + + ## SID + parser.add_argument( + "--modules-filter", + default=None, + type=str, + help="Remove unused modules for, e.g., SID.", + ) + parser.add_argument( + "--sid-pad-prenet", + action="store_true", + help="If set, the size of text dictionary is as small as for <pad> token.", + ) + parser.add_argument( + "--encoder-attn-branch", + type=str, + default="identity,full", + help="encoder attention branch sliding window, e.g., 'identity,0,2,4,full'", + ) + parser.add_argument( + "--encoder-block-branch", + type=str, + help="average the output of encoder, e.g., '4,5,6'", + ) + parser.add_argument( + "--sid-encoder-cls", + default=None, + choices=["encoder"], + help="If set, add cls vector to the encoder input, e.g., constant vector.", + ) + parser.add_argument( + "--sid-shuffle-encoder-input", + action="store_true", + help="If set, shuffle encoder input in time.", + ) + parser.add_argument( + "--sid-decoder-speaker", + action="store_true", + help="If set, apply speaker decoder as transformer decoder.", + ) + parser.add_argument( + "--sid-decoder-attn-dim", + default=128, + type=int, + help="Attention dimension in attensive statistics pooling of speaker decoder.", + ) + parser.add_argument( + "--sid-t5-postnet", + action="store_true", + help="If set, apply TextDecoderPostnet as speaker classification.", + ) + parser.add_argument( + "--sid-embed-dim", + default=128, + type=int, + help="Embedding dimension in speaker postnet for speaker identification if embed postnet.", + ) + parser.add_argument( + "--sid-pooling-layer", + default="decoder", + type=str, + choices=["decoder-las", "decoder", "encoder", "encoder-cls", "encoder-speaker"], + help="The output of decoder or encoder uses as SID pooling layer over temporal dimension.", + ) + parser.add_argument( + "--sid-no-pooling-bn", + action="store_true", + help="If set, not attention batchnorm.", + ) + parser.add_argument( + "--sid-no-embed-postnet", + action="store_true", + help="If set, no layer between decoder output and classification layer.", + ) + parser.add_argument( + "--sid-normalize-postnet", + action="store_true", + help="If set, normalize input and weight in postnet/classifier.", + ) + parser.add_argument( + "--sid-softmax-type", + default="softmax", + choices=["softmax", "amsoftmax", "aamsoftmax"], + help="If using amsoftmax or aamsoftmax, the target should be given.", + ) + parser.add_argument( + "--softmax-scale", + default=1.0, + type=float, + help="Scale for AMSoftmax or AAMSoftmax.", + ) + parser.add_argument( + "--softmax-margin", + default=0.0, + type=float, + help="Margin for AMSoftmax or AAMSoftmax.", + ) + parser.add_argument( + "--softmax-easy-margin", + action="store_true", + help="Enable easy margin for AAMSoftmax.", + ) + parser.add_argument( + "--encoder-layerdrop", + type=float, + metavar="D", + help="LayerDrop probability for encoder", + ) + parser.add_argument( + "--decoder-layerdrop", + type=float, + metavar="D", + help="LayerDrop probability for decoder", + ) + + ## Hubert + parser.add_argument( + '--feature-grad-mult', + type=float, + help='multiply feature extractor var grads by this' + ) + parser.add_argument( + '--logit-temp', + type=float, + help='temperature to divide logits by' + ) + parser.add_argument( + '--final-dim', + type=int, + help="project final representations and 
targets to this many " + "dimensions. set to encoder_embed_dim is <= 0" + ) + + # mask + parser.add_argument( + '--hubert-mask-length', + type=int, + help='mask length' + ) + parser.add_argument( + '--mask-prob', + type=float, + help='probability of replacing a token with mask' + ) + parser.add_argument( + "--mask-selection", + choices=["static", "uniform", "normal", "poisson"], + help="how to choose mask length", + ) + parser.add_argument( + '--mask-other', + type=float, + help="secondary mask argument " + "(used for more complex distributions), " + "see help in compute_mask_indices" + ) + parser.add_argument( + '--mask-min-space', + type=int, + help='min space between spans (if no overlap is enabled)' + ) + + # channel masking + parser.add_argument( + '--mask-channel-length', + type=int, + help='length of the mask for features (channels)' + ) + parser.add_argument( + '--mask-channel-prob', + type=float, + help="probability of replacing a feature with 0" + ) + parser.add_argument( + "--mask-channel-selection", + choices=["static", "uniform", "normal", "poisson"], + help="how to choose mask length for channel masking", + ) + parser.add_argument( + '--mask-channel-other', + type=float, + help="secondary mask argument " + "(used for more complex distributions), " + "see help in compute_mask_indices" + ) + parser.add_argument( + '--mask-channel-min-space', + type=int, + help='min space between spans (if no overlap is enabled)' + ) + + # abs positional embeddings + parser.add_argument( + '--conv-pos', + type=int, + help='number of filters for convolutional positional embeddings' + ) + parser.add_argument( + '--conv-pos-groups', + type=int, + help='number of groups for convolutional positional embedding' + ) + + # codebook related + parser.add_argument( + "--use-codebook", + action="store_true", + help="whether to use codebook", + ) + parser.add_argument( + "--codebook-prob", + type=float, + help="probability to use codebook", + ) + parser.add_argument( + "--latent-vars", + type=int, + help="number of latent variables V in each group of the codebook", + ) + parser.add_argument( + "--latent-groups", + type=int, + help="number of groups G of latent variables in the codebook", + ) + parser.add_argument( + "--latent-dim", + type=int, + help="if > 0, uses this dimensionality for latent variables. " + "otherwise uses final_dim / latent_groups", + ) + parser.add_argument( + "--latent-temp", + type=literal_eval, + help="temperature for latent variable sampling. 
" + "can be tuple of 3 values (start, end, decay)", + ) + parser.add_argument( + "--quantizer-depth", + type=int, + help="number of quantizer layers", + ) + parser.add_argument( + "--quantizer-factor", + type=int, + help="number of quantizer layers", + ) + parser.add_argument( + "--get-code-distribution", + action='store_true', + help="whether to get the code distribution (for test)", + ) + + # relative pos enc + parser.add_argument( + "--relative-position-embedding", + action='store_true', + help="whether to use relative position embedding", + ) + parser.add_argument( + "--num-buckets", + type=int, + default=320, + help="num of buckets for relative position embedding", + ) + parser.add_argument( + "--max-distance", + type=int, + default=1280, + help="max distance for relative position embedding", + ) + parser.add_argument( + "--encoder-max-relative-position", + type=int, + help="max distance for relative position embedding in encoder", + ) + parser.add_argument( + "--decoder-max-relative-position", + type=int, + help="max distance for relative position embedding in decoder", + ) + + # hubert feature extractor + parser.add_argument( + "--conv-feature-layers", + type=str, + help= "string describing convolutional feature extraction " + "layers in form of a python list that contains " + "[(dim, kernel_size, stride), ...]", + ) + parser.add_argument( + "--conv-bias", + action='store_true', + help="include bias in conv encoder", + ) + parser.add_argument( + "--extractor-mode", + choices=["default", "layer_norm"], + help="mode for feature extractor. default has a single group " + "norm with d groups in the first conv block, whereas layer_norm " + "has layer norms in every block (meant to use with normalize=True)" + ) + + # others + parser.add_argument( + "--bert-init", + action='store_true', + help="initilize as bert", + ) + parser.add_argument( + "--unb-enc-layer", + type=int, + default=-1, + help="which layer's output is used as the input of decoder", + ) + + # Encoder, Decoder + @classmethod + def build_encoder(cls, args, dictionary=None, embed_tokens=None): + return TransformerEncoder(args, dictionary, embed_tokens) + + @classmethod + def build_decoder(cls, args): + return TransformerDecoder(args) + + # Encoder Prenet + @classmethod + def build_text_encoder_prenet(cls, embed_tokens, args): + return TextEncoderPrenet(embed_tokens, args) + + @classmethod + def build_speech_encoder_prenet(cls, args): + return SpeechEncoderPrenet(args) + + # Decoder Prenet + @classmethod + def build_text_decoder_prenet(cls, embed_tokens, args): + return TextDecoderPrenet(embed_tokens, args) + + @classmethod + def build_speech_decoder_prenet(cls, odim, args): + return SpeechDecoderPrenet(odim, args) + + # Decoder Postnet + @classmethod + def build_text_decoder_postnet(cls, embed_tokens, dictionary, args): + return TextDecoderPostnet(embed_tokens, dictionary, args) + + @classmethod + def build_speaker_decoder_postnet(cls, embed_dim, class_num, args): + return SpeakerDecoderPostnet(embed_dim, class_num, args) + + @classmethod + def build_speech_decoder_postnet(cls, odim, args): + return SpeechDecoderPostnet(odim, args) + + @classmethod + def build_speech_encoder_postnet(cls, dictionaries, args): + return SpeechEncoderPostnet(dictionaries, args) + + @classmethod + def build_model(cls, args, task): + """Build a new model instance.""" + + # make sure all arguments are present in older models + base_architecture(args) + + def build_embedding(dictionary, embed_dim, max_num_embeddings=None): + num_embeddings = 
len(dictionary) + if max_num_embeddings is not None and isinstance(max_num_embeddings, int): + num_embeddings = min(num_embeddings, max_num_embeddings) + padding_idx = dictionary.pad() + return Embedding(num_embeddings, embed_dim, padding_idx) + + if hasattr(args, "sid_pad_prenet") and args.sid_pad_prenet: + max_num_embeddings = 3 # <pad> at index 2 + else: + max_num_embeddings = None + + text_decoder_embed_tokens = build_embedding( + task.dicts["text"], args.decoder_embed_dim, max_num_embeddings + ) + + if args.share_input_output_embed: + text_encoder_embed_tokens = text_decoder_embed_tokens + else: + text_encoder_embed_tokens = build_embedding( + task.dicts["text"], args.encoder_embed_dim + ) + + speech_odim = args.speech_odim + if "text" in task.dicts: + encoder = cls.build_encoder(args, task.dicts["text"], text_encoder_embed_tokens) + else: + encoder = cls.build_encoder(args) + decoder = cls.build_decoder(args) + + text_encoder_prenet = cls.build_text_encoder_prenet(text_encoder_embed_tokens, args) + speech_encoder_prenet = cls.build_speech_encoder_prenet(args) + + text_decoder_prenet = cls.build_text_decoder_prenet(text_decoder_embed_tokens, args) + if getattr(args, "sid_pooling_layer", None) == "decoder-las": + speech_decoder_prenet = cls.build_speech_encoder_prenet(args) + else: + speech_decoder_prenet = cls.build_speech_decoder_prenet(speech_odim, args) + + text_decoder_postnet = cls.build_text_decoder_postnet(text_decoder_embed_tokens, task.dicts['text'], args) + speech_decoder_postnet = cls.build_speech_decoder_postnet(speech_odim, args) + + if getattr(args, "sid_t5_postnet", False): + speaker_decoder_postnet = None + else: + if task.t5_task == "s2c": + speaker_decoder_postnet = cls.build_speaker_decoder_postnet(args.sid_embed_dim, len(task.dicts['text']), args) + else: + speaker_decoder_postnet = None + + if "hubert" in task.dicts: + speech_encoder_postnet = cls.build_speech_encoder_postnet(task.dicts['hubert'], args) + else: + speech_encoder_postnet = None + + return cls( + args, + encoder, decoder, + text_encoder_prenet, speech_encoder_prenet, + text_decoder_prenet, speech_decoder_prenet, + text_decoder_postnet, speech_decoder_postnet, + speaker_decoder_postnet, speech_encoder_postnet, + ) + + def get_normalized_probs( + self, + net_output: Tuple[Tensor, Optional[Dict[str, List[Optional[Tensor]]]]], + log_probs: bool, + sample: Optional[Dict[str, Tensor]] = None, + ): + # net_output['encoder_out'] is a (B, T, D) tensor + lprobs = self.get_normalized_probs_scriptable(net_output, log_probs, sample) + lprobs.batch_first = True + return lprobs + + def get_normalized_probs_for_ctc(self, net_output, log_probs): + """Get normalized probabilities (or log probs) from a net's output.""" + + logits = net_output["encoder_out_for_ctc"][0] + if log_probs: + return utils.log_softmax(logits.float(), dim=-1) + else: + return utils.softmax(logits.float(), dim=-1) + + def get_logits(self, net_output, is_masked=True): + if is_masked: + logits_list = net_output["logit_m_list"] + else: + logits_list = net_output["logit_u_list"] + logits_list = [x.float() for x in logits_list if x is not None] + return logits_list + + def get_targets(self, sample, net_output, is_masked=True): + if "logit_m_list" in net_output: + logits_list = self.get_logits(net_output, is_masked) + targets_list = [ + x.new_zeros(x.size(0), dtype=torch.long) for x in logits_list + ] + return targets_list + else: + return sample["target"] + + def get_extra_losses(self, net_output): + extra_losses = [] + names = [] + + if 
"features_pen" in net_output: + extra_losses.append(net_output["features_pen"]) + names.append("features_pen") + + if "prob_perplexity" in net_output: + extra_losses.append( + (net_output["num_vars"] - net_output["prob_perplexity"]) + / net_output["num_vars"] + ) + names.append("prob_perplexity") + + return extra_losses, names + + def forward(self, source=None, src_tokens=None, src_lengths=None, prev_output_tokens=None, tgt_lengths=None, spkembs=None, target_list=None, task_name=None, padding_mask=None, only_hubert=False, only_ctc=False, feature_only=False, tgt_enc_layer=None, mask=True): + """ + The forward method inherited from the base class has a **kwargs + argument in its input, which is not supported in torchscript. This + method overwrites the forward method definition without **kwargs. + """ + assert source is not None or src_tokens is not None + # padding_mask is not none only when input is waveform + if source is None and padding_mask is None and not feature_only: + input_type = 'text' + else: + input_type = 'speech' + + if prev_output_tokens is not None and len(prev_output_tokens.size()) == 2: + output_type = 'text' + codebook_out = {} + else: + output_type = 'speech' + + if task_name is not None and task_name == "s2c": + if target_list is not None and target_list.size(1) == 1 and not getattr(self.args, "sid_t5_postnet", False): + sid_target = F.one_hot(target_list.squeeze(1), num_classes=self.speaker_decoder_postnet.class_num) + else: + sid_target = None + target_list = None + + # Encoder Prenet + if input_type == 'text': + encoder_input, encoder_padding_mask = self.text_encoder_prenet(src_tokens) + else: + if target_list is not None: + encoder_input, encoder_padding_mask = self.speech_encoder_prenet(source, require_feat_pen=True, target_list=target_list, padding_mask=padding_mask, mask=mask) + encoder_input, features_pen, mask_indices, target_list = encoder_input + else: + encoder_input, encoder_padding_mask = self.speech_encoder_prenet(source, padding_mask=padding_mask, mask=self.training) + # shuffle a batch of inputs of encoder + if self.training and hasattr(self.args, "sid_shuffle_encoder_input") and getattr(self.args, "sid_shuffle_encoder_input", False): + shuffle_index = torch.randperm(encoder_padding_mask.size(1), device=encoder_padding_mask.device) + encoder_input = torch.index_select(encoder_input, 1, shuffle_index) + encoder_padding_mask = torch.index_select(encoder_padding_mask, 1, shuffle_index) + if getattr(self.args, "sid_encoder_cls", None) == "encoder": + prev_output_tokens = torch.zeros_like(prev_output_tokens) + encoder_input, encoder_padding_mask = self._integrate_with_speaker_cls(prev_output_tokens, encoder_input, encoder_padding_mask) + + # Encoder: T x B x C + encoder_output = self.encoder(encoder_input, encoder_padding_mask, tgt_layer=tgt_enc_layer) + + if task_name is not None and task_name == 'speech_pretrain' and feature_only: + return encoder_output["encoder_out"][0].transpose(0, 1) + + if task_name is not None and task_name == 's2c': + if self.args.sid_pooling_layer == "encoder": + return self.speaker_decoder_postnet(encoder_output["encoder_out"][0].transpose(0, 1).mean(1), sid_target), None + elif self.args.sid_pooling_layer == "encoder-cls": + return self.speaker_decoder_postnet(encoder_output["encoder_out"][0].transpose(0, 1)[:,0], sid_target), None + elif self.args.sid_pooling_layer == "encoder-speaker" or getattr(self.args, "sid_decoder_speaker", False): + return self.speaker_decoder_postnet(encoder_output["encoder_out"][0].transpose(0, 1), 
sid_target), None + + if target_list is not None: + hubert_results = self.hubert_layer( + encoder_output["encoder_out"][0].transpose(0, 1), + encoder_padding_mask, + mask_indices, + target_list + ) + + hubert_results['features_pen'] = features_pen + + if "decoder_input" in encoder_output and encoder_output["decoder_input"][0] is not None: + # Change the encoder output to decoder input once set unb-enc-layer + encoder_output["encoder_out"] = encoder_output["decoder_input"] + + if self.use_codebook: + q = self.quantizer(encoder_output["encoder_out"][0].transpose(0, 1)) + + # q["x"]: B x T x C + # Sample indexs according to the codebook prob + random_idx = torch.randperm(q["x"].size(1))[:int(q["x"].size(1) * self.codebook_prob)] + # Make weight for q + q_w = q["x"].new_zeros(q["x"].size(1)) + q_w[random_idx] = 1.0 + # Combine quantized codes and encoder output + encoder_output["encoder_out"][0] = ( + q_w.view(-1, 1) * q["x"] + (- q_w + 1).view(-1, 1) * encoder_output["encoder_out"][0].transpose(0, 1) + ).transpose(0, 1) + + # encoder_output["encoder_out"][0] = q["x"].transpose(0, 1) + if output_type == 'speech': + hubert_results["prob_perplexity"] = q["prob_perplexity"] + hubert_results["code_perplexity"] = q["code_perplexity"] + hubert_results["num_vars"] = q["num_vars"] + hubert_results["temp"] = q["temp"] + elif output_type == 'text': + codebook_out["prob_perplexity"] = q["prob_perplexity"] + codebook_out["code_perplexity"] = q["code_perplexity"] + codebook_out["num_vars"] = q["num_vars"] + codebook_out["temp"] = q["temp"] + + if only_hubert and target_list is not None: + return hubert_results, None + + if only_ctc and task_name is not None and task_name == "s2t": + return None, encoder_output + elif not self.training and prev_output_tokens is None and task_name == "s2t" and task_name is not None: + return encoder_output + + # Decoder Prenet + if output_type == 'text': + # _ is the incremental state + prev_output_tokens, tgt_mask, _ = self.text_decoder_prenet(prev_output_tokens) + if task_name is not None and task_name == 's2c': + prev_output_tokens = torch.zeros_like(prev_output_tokens) + else: + # integrate speaker embedding + if self.spk_embed_integration_type == "pre" and self.spk_embed_dim is not None: + # Decoder Prenet + prev_output_tokens, tgt_mask = self.speech_decoder_prenet(prev_output_tokens, tgt_lengths, spkembs) + else: + if self.spk_embed_dim is not None: + encoder_output["encoder_out"] = [self._integrate_with_spk_embed( + encoder_output["encoder_out"][0].transpose(0, 1), spkembs + ).transpose(0, 1)] + + prev_output_tokens, tgt_mask = self.speech_decoder_prenet(prev_output_tokens, tgt_lengths) + + # BART Sequence Classification: cat <pad> + feature before decoder + if task_name is not None and task_name == 's2c' and self.args.sid_pooling_layer == "decoder-las": + decoder_feat_input, decoder_feat_mask = self.speech_decoder_prenet(src_tokens, src_lengths) + prev_output_tokens, tgt_mask = self._integrate_with_speaker_cls((prev_output_tokens, tgt_mask), decoder_feat_input, decoder_feat_mask, cls_first=False) + + # SE predict masking to corresponding inputs and source speech replaces the prev_output_tokens as the input of decoder + if task_name is not None and task_name == "s2s" and getattr(self.args, "se_decoder_input", "previous_target") == "source": + prev_output_tokens, tgt_mask = self.speech_decoder_prenet(src_tokens, src_lengths) + + # Decoder + decoder_output, extra = self.decoder(prev_output_tokens, tgt_mask, encoder_output, + full_context_alignment=getattr(self.args, 
"decoder_full_context_alignment", False), + alignment_layer=(-1 if target_list is None and output_type == 'speech' else None)) + # Decoder Postnet + if task_name is not None and task_name == 's2c': + if not getattr(self.args, "sid_t5_postnet", False): + if self.args.sid_pooling_layer == "decoder": + return self.speaker_decoder_postnet(decoder_output.mean(1), sid_target), None + elif self.args.sid_pooling_layer == "decoder-las": + indices = (tgt_mask.eq(False).float().sum(1) - 1.0).type(torch.int64) + indices = indices.unsqueeze(1).unsqueeze(2).expand(-1, -1, decoder_output.size(2)) + return self.speaker_decoder_postnet(decoder_output.gather(1, indices), sid_target), None + else: + return (self.text_decoder_postnet(decoder_output), None), encoder_output + + # SE predict: masking, target, delta. Ensure reduction factor 1 + if task_name is not None and task_name == 's2s' and getattr(self.args, "se_predict", None) is not None: + assert self.reduction_factor == 1, f"{self.reduction_factor} != 1" + before_outs, after_outs, logits = self.speech_decoder_postnet(decoder_output) + se_predict = getattr(self.args, "se_predict") + if se_predict == "masking": + before_outs = torch.sigmoid(before_outs) * src_tokens + after_outs = torch.sigmoid(after_outs) * src_tokens + return before_outs, after_outs, logits, extra['attn'][0] + elif se_predict == "target": + return before_outs, after_outs, logits, extra['attn'][0] + elif se_predict == "delta": + before_outs = before_outs - src_tokens + after_outs = after_outs - src_tokens + return before_outs, after_outs, logits, extra['attn'][0] + else: + raise ValueError(f"{se_predict} not in [masking, target, delta]") + + if task_name is not None and task_name == 's2t': + #return self.text_decoder_postnet(decoder_output), None + return (self.text_decoder_postnet(decoder_output), None), encoder_output + if output_type == 'text': + return (self.text_decoder_postnet(decoder_output), None), codebook_out, encoder_output + else: + if target_list is not None: + return hubert_results, (self.speech_decoder_postnet(decoder_output) + (extra['attn'][0],)) + else: + return self.speech_decoder_postnet(decoder_output) + (extra['attn'][0],) + + def _integrate_with_speaker_cls(self, pad_input, encoder_input, encoder_padding_mask=None, cls_first=True): + """ + encoder_input: [B, T, C] + encoder_padding_mask: [B, T] + """ + if hasattr(self, "text_decoder_prenet"): + if isinstance(pad_input, tuple): + repeat_cls_vector, repeat_cls_mask = pad_input + else: + repeat_cls_vector, repeat_cls_mask, _ = self.text_decoder_prenet(pad_input) + + if encoder_padding_mask is not None: + bsz = encoder_input.size(0) + tsz = encoder_input.size(1) + encoder_padding_mask = encoder_input.new_zeros((bsz, tsz)) == 1.0 + if repeat_cls_mask is None: + mask_size = (encoder_padding_mask.size(0), 1) + mask_type = encoder_padding_mask.dtype + repeat_cls_mask = encoder_padding_mask.new_zeros(mask_size) == 1.0 + ret_encoder_padding_mask = torch.cat([repeat_cls_mask, encoder_padding_mask], dim=1) + + if cls_first: + ret_encoder_input = torch.cat([repeat_cls_vector, encoder_input], dim=1) + else: + ret_encoder_input = torch.cat([encoder_input, encoder_input[:,-1:,:]], dim=1) + mask_size = (encoder_padding_mask.size(0), 1) + mask_type = encoder_padding_mask.dtype + repeat_cls_mask_ = encoder_padding_mask.new_ones(mask_size) == 1.0 + encoder_padding_mask_ = torch.cat([encoder_padding_mask, repeat_cls_mask_], dim=1) + indices = encoder_padding_mask.eq(False).float().sum(1).type(torch.int64).unsqueeze(1) + indices_mask = 
torch.zeros_like(ret_encoder_padding_mask).scatter(1, indices, 1.0) + ret_encoder_input = ret_encoder_input * (1.0 - encoder_padding_mask_.type(ret_encoder_input.dtype).unsqueeze(2)) \ + + repeat_cls_vector * indices_mask.type(repeat_cls_vector.dtype).unsqueeze(2) + + return ret_encoder_input, ret_encoder_padding_mask + + def _integrate_with_spk_embed(self, hs, spembs): + """Integrate speaker embedding with hidden states. + Args: + hs (Tensor): Batch of hidden state sequences (B, Tmax, adim). + spembs (Tensor): Batch of speaker embeddings (B, spk_embed_dim). + Returns: + Tensor: Batch of integrated hidden state sequences (B, Tmax, adim) + """ + if self.spk_embed_integration_type == "add": + # apply projection and then add to hidden states + spembs = self.projection(F.normalize(spembs)) + hs = hs + spembs.unsqueeze(1) + elif self.spk_embed_integration_type == "concat": + # concat hidden states with spk embeds and then apply projection + spembs = F.normalize(spembs).unsqueeze(1).expand(-1, hs.size(1), -1) + hs = self.projection(torch.cat([hs, spembs], dim=-1)) + else: + raise NotImplementedError("support only add or concat.") + + return hs + + def load_state_dict( + self, + state_dict, + strict=True, + model_cfg=None, + args=None, + ): + """NOT STRICT Copies parameters and buffers from *state_dict* into this module and + its descendants. + + Overrides the method in :class:`nn.Module`. Compared with that method + this additionally "upgrades" *state_dicts* from old checkpoints. + """ + # self.prune_modules(model_cfg.modules_filter) + model_dict_size = self.text_decoder_postnet.output_projection.out_features + ckpt_dict_size = state_dict["text_decoder_postnet.output_projection.weight"].size(0) + if model_dict_size != ckpt_dict_size: + # reset dictionary-related modules, such as embedding table and encoder ctc embed + logger.warn(f"not equal dictionary between model and checkpoint: {model_dict_size} vs {ckpt_dict_size}") + logger.info(f"reset model dictionary with size of {model_dict_size}") + removed_keys = [ + key for key in state_dict.keys() if any( + key.startswith(previ) for previ in [ + "encoder.proj", "text_encoder_prenet", "text_decoder_prenet", "text_decoder_postnet" + ] + ) + ] + for key in removed_keys: + state_dict.pop(key, None) + logger.info(f"removed loaded checkpoint: {key}") + for m in self._modules.keys(): + m_state_dict = { + key.replace(f"{m}.", ""): value for key, value in state_dict.items() if key.startswith(f"{m}.") + } + if hasattr(self, m): + self._modules[m].load_state_dict(m_state_dict, False) + return self + + def prune_modules(self, modules_filter=None): + """Prune unused modules for specific tasks.""" + if modules_filter is None: + return + elif modules_filter == "s2c": + if hasattr(self, "text_encoder_prenet"): del self.text_encoder_prenet + if hasattr(self, "speech_decoder_prenet") and getattr(self.args, "sid_pooling_layer", None) != "decoder-las": + del self.speech_decoder_prenet + if hasattr(self, "speech_decoder_postnet"): del self.speech_decoder_postnet + if hasattr(self, "text_decoder_postnet"): del self.text_decoder_postnet + if hasattr(self, "speech_encoder_postnet"): del self.speech_encoder_postnet + if hasattr(self.encoder, "proj"): self.encoder.proj = None + if hasattr(self, "projection"): del self.projection + if hasattr(self, "quantizer"): del self.quantizer + if getattr(self.args, "sid_pooling_layer", "decoder").startswith("encoder") or getattr(self.args, "sid_decoder_speaker", False): + if hasattr(self.decoder, "dropout_module"): del 
self.decoder.dropout_module + if hasattr(self.decoder, "layers"): del self.decoder.layers + if hasattr(self.decoder, "layer_norm"): del self.decoder.layer_norm + if hasattr(self, "text_decoder_prenet"): del self.text_decoder_prenet + elif modules_filter == "s2s": + if hasattr(self, "speaker_decoder_postnet"): del self.speaker_decoder_postnet + if hasattr(self, "text_encoder_prenet"): del self.text_encoder_prenet + if hasattr(self, "text_decoder_prenet"): del self.text_decoder_prenet + if hasattr(self, "text_decoder_postnet"): del self.text_decoder_postnet + if hasattr(self, "speech_encoder_postnet"): del self.speech_encoder_postnet + if hasattr(self.encoder, "proj"): self.encoder.proj = None + if hasattr(self, "projection"): del self.projection + if hasattr(self, "quantizer"): del self.quantizer + elif modules_filter == "t2s": + if hasattr(self, "speaker_decoder_postnet"): del self.speaker_decoder_postnet + if hasattr(self, "speech_encoder_prenet"): del self.speech_encoder_prenet + if hasattr(self, "text_decoder_prenet"): del self.text_decoder_prenet + if hasattr(self, "text_decoder_postnet"): del self.text_decoder_postnet + if hasattr(self, "speech_encoder_postnet"): del self.speech_encoder_postnet + if hasattr(self.encoder, "proj"): self.encoder.proj = None + if hasattr(self, "projection"): del self.projection + if hasattr(self, "quantizer"): del self.quantizer + elif modules_filter == "s3prl": + # remain the encoder and the pre/post net + if hasattr(self.decoder, "dropout_module"): del self.decoder.dropout_module + if hasattr(self.decoder, "layers"): del self.decoder.layers + if hasattr(self.decoder, "layer_norm"): del self.decoder.layer_norm + if hasattr(self, "speaker_decoder_postnet"): del self.speaker_decoder_postnet + if hasattr(self, "text_decoder_prenet"): del self.text_decoder_prenet + if hasattr(self, "text_decoder_postnet"): del self.text_decoder_postnet + if hasattr(self, "speech_decoder_prenet"): del self.speech_decoder_prenet + if hasattr(self, "speech_decoder_postnet"): del self.speech_decoder_postnet + if hasattr(self, "speech_encoder_postnet"): del self.speech_encoder_postnet + if hasattr(self.encoder, "proj"): self.encoder.proj = None + if hasattr(self, "projection"): del self.projection + if hasattr(self, "quantizer"): del self.quantizer + + def forward_encoder_torchscript(self, net_input: Dict[str, Tensor]): + """A TorchScript-compatible version of forward. + + Encoders which use additional arguments may want to override + this method for TorchScript compatibility. 
+ """ + if torch.jit.is_scripting(): + return self.forward_encoder( + source=net_input["source"], + padding_mask=net_input["padding_mask"] + ) + else: + return self.forward_encoder_non_torchscript(net_input) + + @torch.jit.unused + def forward_encoder_non_torchscript(self, net_input: Dict[str, Tensor]): + encoder_input = { + k: v for k, v in net_input.items() if k != "prev_output_tokens" and k != "task_name" + } + return self.forward_encoder(**encoder_input) + + def forward_encoder(self, source, padding_mask=None): + # Encoder Prenet + encoder_input, encoder_padding_mask = self.speech_encoder_prenet(source, padding_mask=padding_mask, mask=False) + + # Encoder + encoder_output = self.encoder(encoder_input, encoder_padding_mask) + + return encoder_output + + def forward_text_encoder(self, src_tokens): + # Text Encoder Prenet + encoder_input, encoder_padding_mask = self.text_encoder_prenet(src_tokens) + + # Encoder + encoder_output = self.encoder(encoder_input, encoder_padding_mask) + + return encoder_output + + def forward_decoder(self, tokens, encoder_out, incremental_state): + # Decoder Prenet + prev_output_tokens, tgt_mask, incremental_state = self.text_decoder_prenet(tokens, incremental_state) + + # Decoder + decoder_output, extra = self.decoder( + prev_output_tokens, + tgt_mask, + encoder_out=encoder_out, + incremental_state=incremental_state, + ) + + # Decoder Postnet + return self.text_decoder_postnet(decoder_output), extra + + def set_num_updates(self, num_updates): + """Set the number of parameters updates.""" + super().set_num_updates(num_updates) + self.num_updates = num_updates + + def generate_class(self, source, prev_output_tokens, **kwargs): + encoder_out = self.forward_encoder(source, padding_mask=kwargs["padding_mask"]) + + prev_output_tokens, tgt_mask, _ = self.text_decoder_prenet(prev_output_tokens, {}) + prev_output_tokens = torch.zeros_like(prev_output_tokens) # s2c use zero vector as [CLS] + + decoder_output, extra = self.decoder( + prev_output_tokens, + tgt_mask, + encoder_out=encoder_out, + ) + + decoder_out, embed = self.speaker_decoder_postnet(decoder_output.mean(1)) + + pred_class = decoder_out.argmax(1) + return pred_class + + def generate_speech(self, source=None, src_tokens=None, spkembs=None, **kwargs): + assert source is not None or src_tokens is not None + + threshold = kwargs.get("threshold", 0.5) + minlenratio = kwargs.get("threshold", 0.0) + + if source is None: + assert src_tokens.size(0) == 1 + encoder_out = self.forward_text_encoder(src_tokens) + maxlenratio = kwargs.get("threshold", 20.0) + else: + assert source.size(0) == 1 + encoder_out = self.forward_encoder(source, padding_mask=kwargs["padding_mask"]) + maxlenratio = kwargs.get("threshold", 10.0) + + if spkembs is not None and self.spk_embed_integration_type != "pre": + encoder_out["encoder_out"] = [self._integrate_with_spk_embed( + encoder_out["encoder_out"][0].transpose(0, 1), spkembs + ).transpose(0, 1)] + spkembs = None + + maxlen = int(encoder_out["encoder_out"][0].size(0) * maxlenratio / self.reduction_factor) + minlen = int(encoder_out["encoder_out"][0].size(0) * minlenratio / self.reduction_factor) + + idx = 0 + ys = encoder_out["encoder_out"][0].new_zeros(1, 1, self.speech_decoder_postnet.odim) + outs, probs = [], [] + + # forward decoder step-by-step + if isinstance(self.decoder, FairseqIncrementalDecoder): + incremental_states = {} + else: + incremental_states = None + attns = [] + while True: + # update index + idx += 1 + # calculate output and stop prob at idx-th step + decoder_in, _ = 
self.speech_decoder_prenet(ys, spkembs=spkembs) + z, extra = self.decoder(decoder_in[:,-1:], None, encoder_out, incremental_states, alignment_layer=-1) + outs += [self.speech_decoder_postnet.feat_out(z[0, -1]).view(self.reduction_factor, self.speech_decoder_postnet.odim)] # [(r, odim), ...] + probs += [torch.sigmoid(self.speech_decoder_postnet.prob_out(z[0, -1]))] # [(r), ...] + + # update next inputs + ys = torch.cat((ys, outs[-1][-1].view(1, 1, self.speech_decoder_postnet.odim)), dim=1) # (1, idx + 1, odim) + attns.append(torch.stack([att_l[0] for att_l in extra['attn'][0]], dim=0)) + # check whether to finish generation + if int(sum(probs[-1] >= threshold)) > 0 or idx >= maxlen: + # check mininum length + if idx < minlen: + continue + outs = (torch.cat(outs, dim=0).unsqueeze(0).transpose(1, 2)) # (L, odim) -> (1, L, odim) -> (1, odim, L) + if self.speech_decoder_postnet.postnet is not None: + outs = outs + self.speech_decoder_postnet.postnet(outs) # (1, odim, L) + outs = outs.transpose(2, 1).squeeze(0) # (L, odim) + probs = torch.cat(probs, dim=0) + attn = torch.cat(attns, dim=2) + break + + if outs.size(0) == maxlen: + logging.warning("output length reaches maximum length") + return outs, probs, attn + + +@register_model_architecture(model_name="t5_transformer", arch_name="t5_transformer") +def base_architecture(args): + # Transformer + args.bert_init = getattr(args, "bert_init", False) + args.encoder_embed_dim = getattr(args, "encoder_embed_dim", 768) + args.encoder_ffn_embed_dim = getattr(args, "encoder_ffn_embed_dim", 768 * 4) + args.encoder_layers = getattr(args, "encoder_layers", 12) + args.encoder_attention_heads = getattr(args, "encoder_attention_heads", 12) + args.encoder_normalize_before = getattr(args, "encoder_normalize_before", False) + args.decoder_embed_dim = getattr(args, "decoder_embed_dim", args.encoder_embed_dim) + args.decoder_ffn_embed_dim = getattr( + args, "decoder_ffn_embed_dim", args.encoder_ffn_embed_dim + ) + args.decoder_layers = getattr(args, "decoder_layers", 6) + args.decoder_attention_heads = getattr(args, "decoder_attention_heads", 12) + args.decoder_normalize_before = getattr(args, "decoder_normalize_before", False) + args.dropout = getattr(args, "dropout", 0.1) + args.attention_dropout = getattr(args, "attention_dropout", args.dropout) + args.activation_dropout = getattr(args, "activation_dropout", args.dropout) + args.activation_fn = getattr(args, "activation_fn", "gelu") + args.decoder_layerdrop = getattr(args, "decoder_layerdrop", 0.0) + args.decoder_output_dim = getattr( + args, "decoder_output_dim", args.decoder_embed_dim + ) + args.decoder_input_dim = getattr(args, "decoder_input_dim", args.decoder_embed_dim) + args.encoder_layerdrop = getattr(args, "encoder_layerdrop", 0) + args.decoder_layerdrop = getattr(args, "decoder_layerdrop", 0) + args.max_text_positions = getattr(args, "max_text_positions", DEFAULT_MAX_TEXT_POSITIONS) + args.max_speech_positions = getattr(args, "max_speech_positions", DEFAULT_MAX_SPEECH_POSITIONS) + + # Espnet related, including prenet, postnet + args.eprenet_conv_layers = getattr(args, "eprenet_conv_layers", 0) + args.eprenet_conv_filts = getattr(args, "eprenet_conv_filts", 0) + args.eprenet_conv_chans = getattr(args, "eprenet_conv_chans", 0) + args.use_batch_norm = getattr(args, "use_batch_norm", True) + args.eprenet_dropout_rate = getattr(args, "eprenet_dropout_rate", 0.0) + args.enc_use_scaled_pos_enc = getattr(args, "enc_use_scaled_pos_enc", True) + args.dec_use_scaled_pos_enc = getattr(args, 
"dec_use_scaled_pos_enc", True) + args.postnet_layers = getattr(args, "postnet_layers", 5) + args.postnet_chans = getattr(args, "postnet_chans", 256) + args.postnet_filts = getattr(args, "postnet_filts", 5) + args.postnet_dropout_rate = getattr(args, "postnet_dropout_rate", 0.5) + args.dprenet_dropout_rate = getattr(args, "dprenet_dropout_rate", 0.5) + args.dprenet_layers = getattr(args, "dprenet_layers", 2) + args.dprenet_units = getattr(args, "dprenet_units", 256) + args.initial_encoder_alpha = getattr(args, "initial_encoder_alpha", 1.0) + args.initial_decoder_alpha = getattr(args, "initial_decoder_alpha", 1.0) + args.spk_embed_integration_type = getattr(args, "spk_embed_integration_type", "pre") + args.spk_embed_dim = getattr(args, "spk_embed_dim", 512) + args.encoder_reduction_factor = getattr(args, "encoder_reduction_factor", 1) + args.reduction_factor = getattr(args, "reduction_factor", 2) + args.transformer_enc_positional_dropout_rate = getattr(args, "transformer_enc_positional_dropout_rate", 0.1) + args.transformer_dec_positional_dropout_rate = getattr(args, "transformer_dec_positional_dropout_rate", 0.1) + args.layer_norm_eps = getattr(args, "layer_norm_eps", 1e-5) + args.no_scale_embedding = getattr(args, "no_scale_embedding", True) + # Convolutional subsampler + args.encoder_speech_prenet = getattr(args, "encoder_speech_prenet", "conv") + args.conv_kernel_sizes = getattr(args, "conv_kernel_sizes", "5,5") + args.conv_channels = getattr(args, "conv_channels", 1024) + args.quant_noise_pq = getattr(args, "quant_noise_pq", 0) + + args.adaptive_softmax_cutoff = getattr(args, "adaptive_softmax_cutoff", None) + args.adaptive_softmax_dropout = getattr(args, "adaptive_softmax_dropout", 0) + args.no_token_positional_embeddings = getattr( + args, "no_token_positional_embeddings", False + ) + args.adaptive_input = getattr(args, "adaptive_input", False) + args.decoder_learned_pos = getattr(args, "decoder_learned_pos", False) + args.share_input_output_embed = getattr(args, "share_input_output_embed", False) + args.share_ctc_embed = getattr(args, "share_ctc_embed", False) + args.freeze_encoder_updates = getattr(args, "freeze_encoder_updates", 0) + args.freeze_decoder_updates = getattr(args, "freeze_decoder_updates", 0) + args.no_freeze_encoder_layer = getattr(args, "no_freeze_encoder_layer", None) + + ## sid + args.sid_embed_dim = getattr(args, "sid_embed_dim", 128) + args.sid_pooling_layer = getattr(args, "sid_pooling_layer", "decoder") + args.softmax_scale = getattr(args, "softmax_scale", 1) + args.softmax_margin = getattr(args, "softmax_margin", 0) + args.softmax_easy_margin = getattr(args, "softmax_easy_margin", False) + args.modules_filter = getattr(args, "modules_filter", None) + + ## Hubert + args.conv_pos = getattr(args, "conv_pos", 128) + args.conv_pos_groups = getattr(args, "conv_pos_groups", 16) + args.target_glu = getattr(args, "target_glu", False) + args.logit_temp = getattr(args, "logit_temp", 0.1) + args.final_dim = getattr(args, "final_dim", 256) + args.untie_final_proj = getattr(args, "untie_final_proj", True) + args.feature_grad_mult = getattr(args, "feature_grad_mult", 0.1) + args.use_sent_enc_layer = getattr(args, "use_sent_enc_layer", True) + # hubert feature extractor + args.extractor_mode = getattr(args, "extractor_mode", "default") + args.conv_feature_layers = getattr(args, "conv_feature_layers", "[(512,10,5)] + [(512,3,2)] * 4 + [(512,2,2)] * 2") + args.conv_bias = getattr(args, "conv_bias", False) + # mask + args.hubert_mask_length = getattr(args, "hubert_mask_length", 
10) + args.mask_prob = getattr(args, "mask_prob", 0.0) + args.mask_selection = getattr(args, "mask_selection", "static") + args.mask_other = getattr(args, "mask_other", 0) + args.no_mask_overlap = getattr(args, "no_mask_overlap", False) + args.mask_min_space = getattr(args, "mask_min_space", 1) + # channel mask + args.mask_channel_length = getattr(args, "mask_channel_length", 10) + args.mask_channel_prob = getattr(args, "mask_channel_prob", 0.0) + args.mask_channel_selection = getattr(args, "mask_channel_selection", "static") + args.mask_channel_other = getattr(args, "mask_channel_other", 0) + args.no_mask_channel_overlap = getattr(args, "no_mask_channel_overlap", False) + args.mask_channel_min_space = getattr(args, "mask_channel_min_space", 1) + # loss computation + args.skip_masked = getattr(args, "skip_masked", False) + args.skip_nomask = getattr(args, "skip_nomask", False) + # conv Pos + args.use_conv_pos = getattr(args, "use_conv_pos", False) + args.use_sinc_pos = getattr(args, "use_sinc_pos", False) + + # codebook + args.use_codebook = getattr(args, "use_codebook", False) + args.latent_vars = getattr(args, "latent_vars", 100) + args.latent_groups = getattr(args, "latent_groups", 2) + args.latent_dim = getattr(args, "latent_dim", 0) + args.latent_temp = getattr(args, "latent_temp", (2, 0.5, 0.999995)) + args.quantizer_depth = getattr(args, "quantizer_depth", 1) + args.quantizer_factor = getattr(args, "quantizer_factor", 3) + args.codebook_prob = getattr(args, "codebook_prob", 0.5) + + # Relative pos embed + args.relative_position_embedding = getattr(args, "relative_position_embedding", False) + args.num_buckets = getattr(args, "num_buckets", 320) + args.max_distance = getattr(args, "max_distance", 1280) + args.encoder_max_relative_position = getattr(args, "encoder_max_relative_position", 160) + args.decoder_max_relative_position = getattr(args, "decoder_max_relative_position", 160) + +@register_model_architecture("t5_transformer", "t5_transformer_base") +def t5_transformer_base(args): + args.use_conv_pos = getattr(args, "use_conv_pos", True) + args.use_sinc_pos = getattr(args, "use_sinc_pos", True) + args.layernorm_embedding = getattr(args, "layernorm_embedding", False) + args.encoder_normalize_before = getattr(args, "encoder_normalize_before", False) + args.decoder_normalize_before = getattr(args, "decoder_normalize_before", False) + args.layer_norm_first = getattr(args, "layer_norm_first", False) + args.relative_position_embedding = getattr(args, "relative_position_embedding", True) + args.dropout = getattr(args, "dropout", 0.1) + args.activation_dropout = getattr(args, "activation_dropout", 0.0) + args.attention_dropout = getattr(args, "attention_dropout", 0.1) + args.encoder_layerdrop = getattr(args, "encoder_layerdrop", 0.05) + args.decoder_layerdrop = getattr(args, "decoder_layerdrop", 0.05) + args.mask_prob = getattr(args, "mask_prob", 0.80) + base_architecture(args) + +@register_model_architecture("t5_transformer", "t5_transformer_large") +def t5_transformer_large(args): + args.use_conv_pos = getattr(args, "use_conv_pos", True) + args.use_sinc_pos = getattr(args, "use_sinc_pos", True) + args.encoder_normalize_before = getattr(args, "encoder_normalize_before", False) + args.decoder_normalize_before = getattr(args, "decoder_normalize_before", True) + args.layer_norm_first = getattr(args, "layer_norm_first", True) + args.relative_position_embedding = getattr(args, "relative_position_embedding", True) + args.dropout = getattr(args, "dropout", 0.0) + args.activation_dropout = 
getattr(args, "activation_dropout", 0.0) + args.attention_dropout = getattr(args, "attention_dropout", 0.0) + args.encoder_layerdrop = getattr(args, "encoder_layerdrop", 0.0) + args.decoder_layerdrop = getattr(args, "decoder_layerdrop", 0.0) + args.encoder_embed_dim = getattr(args, "encoder_embed_dim", 1024) + args.encoder_layers = getattr(args, "encoder_layers", 24) + args.decoder_layers = getattr(args, "decoder_layers", 6) + args.encoder_ffn_embed_dim = getattr(args, "encoder_ffn_embed_dim", 4096) + args.encoder_attention_heads = getattr(args, "encoder_attention_heads", 16) + args.decoder_attention_heads = getattr(args, "decoder_attention_heads", 16) + args.feature_grad_mult = getattr(args, "feature_grad_mult", 1.0) + args.extractor_mode = getattr(args, "extractor_mode", "layer_norm") + args.final_dim = getattr(args, "final_dim", 768) + args.mask_prob = getattr(args, "mask_prob", 0.80) + base_architecture(args) + +@register_model_architecture("t5_transformer", "t5_transformer_base_asr") +def t5_transformer_base_asr(args): + args.use_conv_pos = getattr(args, "use_conv_pos", True) + args.use_sinc_pos = getattr(args, "use_sinc_pos", True) + args.encoder_normalize_before = getattr(args, "encoder_normalize_before", False) + args.decoder_normalize_before = getattr(args, "decoder_normalize_before", False) + args.layer_norm_first = getattr(args, "layer_norm_first", False) + args.relative_position_embedding = getattr(args, "relative_position_embedding", True) + args.dropout = getattr(args, "dropout", 0.1) + args.activation_dropout = getattr(args, "activation_dropout", 0.1) + args.attention_dropout = getattr(args, "attention_dropout", 0.1) + args.feature_grad_mult = getattr(args, "feature_grad_mult", 0.0) + args.encoder_layerdrop = getattr(args, "encoder_layerdrop", 0.1) + args.decoder_layerdrop = getattr(args, "decoder_layerdrop", 0.1) + args.mask_prob = getattr(args, "mask_prob", 0.75) + args.mask_selection = getattr(args, "mask_selection", "static") + args.mask_channel_length = getattr(args, "mask_channel_length", 64) + args.mask_channel_prob = getattr(args, "mask_channel_prob", 0.5) + args.mask_channel_selection = getattr(args, "mask_channel_selection", "static") + args.max_text_positions = getattr(args, "max_text_positions", 600) + base_architecture(args) diff --git a/SpeechT5/speecht5/models/t5_transformer_lm.py b/SpeechT5/speecht5/models/t5_transformer_lm.py new file mode 100644 index 0000000000000000000000000000000000000000..7b20b7f8901e07996c198847fd916f637844ad7a --- /dev/null +++ b/SpeechT5/speecht5/models/t5_transformer_lm.py @@ -0,0 +1,25 @@ +# -------------------------------------------------------- +# SpeechT5: Unified-Modal Encoder-Decoder Pre-Training for Spoken Language Processing (https://arxiv.org/abs/2110.07205) +# Github source: https://github.com/microsoft/SpeechT5/tree/main/SpeechT5 +# Copyright (c) 2021 Microsoft +# Licensed under The MIT License [see LICENSE for details] +# Based on fairseq and espnet code bases +# https://github.com/pytorch/fairseq; https://github.com/espnet/espnet +# -------------------------------------------------------- + +from fairseq.models import ( + register_model_architecture, +) +from fairseq.models.transformer_lm import base_lm_architecture + + +@register_model_architecture(model_name="transformer_lm", arch_name="transformer_lm_t5") +def transformer_lm_t5(args): + args.decoder_embed_dim = getattr(args, "decoder_embed_dim", 1280) + args.decoder_ffn_embed_dim = getattr(args, "decoder_ffn_embed_dim", 6144) + args.decoder_layers = getattr(args, 
"decoder_layers", 20) + args.decoder_attention_heads = getattr(args, "decoder_attention_heads", 16) + args.dropout = getattr(args, "dropout", 0.1) + args.attention_dropout = getattr(args, "attention_dropout", 0.1) + args.activation_fn = getattr(args, "activation_fn", "gelu") + base_lm_architecture(args) diff --git a/SpeechT5/speecht5/sequence_generator.py b/SpeechT5/speecht5/sequence_generator.py new file mode 100644 index 0000000000000000000000000000000000000000..46fc676ab0e4b6dcddd4e003844bf89d0eccb775 --- /dev/null +++ b/SpeechT5/speecht5/sequence_generator.py @@ -0,0 +1,1080 @@ +# -------------------------------------------------------- +# SpeechT5: Unified-Modal Encoder-Decoder Pre-Training for Spoken Language Processing (https://arxiv.org/abs/2110.07205) +# Github source: https://github.com/microsoft/SpeechT5/tree/main/SpeechT5 +# Copyright (c) 2021 Microsoft +# Licensed under The MIT License [see LICENSE for details] +# Based on fairseq and espnet code bases +# https://github.com/pytorch/fairseq; https://github.com/espnet/espnet +# -------------------------------------------------------- + +import math +from typing import Dict, List, Optional +import sys + +import torch +import torch.nn as nn +from fairseq import search, utils +from fairseq.data import data_utils +from fairseq.models import FairseqIncrementalDecoder +from torch import Tensor +from fairseq.ngram_repeat_block import NGramRepeatBlock +from espnet.nets.ctc_prefix_score import CTCPrefixScore +import numpy + +CTC_SCORING_RATIO = 7.0 + +class SequenceGenerator(nn.Module): + def __init__( + self, + models, + tgt_dict, + beam_size=1, + max_len_a=0, + max_len_b=200, + max_len=0, + min_len=1, + normalize_scores=True, + len_penalty=1.0, + unk_penalty=0.0, + temperature=1.0, + match_source_len=False, + no_repeat_ngram_size=0, + search_strategy=None, + eos=None, + symbols_to_strip_from_output=None, + lm_model=None, + lm_weight=1.0, + ctc_weight=0.0, + ): + """Generates translations of a given source sentence. 
+ + Args: + models (List[~fairseq.models.FairseqModel]): ensemble of models, + currently support fairseq.models.TransformerModel for scripting + beam_size (int, optional): beam width (default: 1) + max_len_a/b (int, optional): generate sequences of maximum length + ax + b, where x is the source length + max_len (int, optional): the maximum length of the generated output + (not including end-of-sentence) + min_len (int, optional): the minimum length of the generated output + (not including end-of-sentence) + normalize_scores (bool, optional): normalize scores by the length + of the output (default: True) + len_penalty (float, optional): length penalty, where <1.0 favors + shorter, >1.0 favors longer sentences (default: 1.0) + unk_penalty (float, optional): unknown word penalty, where <0 + produces more unks, >0 produces fewer (default: 0.0) + temperature (float, optional): temperature, where values + >1.0 produce more uniform samples and values <1.0 produce + sharper samples (default: 1.0) + match_source_len (bool, optional): outputs should match the source + length (default: False) + """ + super().__init__() + if isinstance(models, EnsembleModel): + self.model = models + else: + self.model = EnsembleModel(models) + self.tgt_dict = tgt_dict + self.pad = tgt_dict.pad() + self.unk = tgt_dict.unk() + self.eos = tgt_dict.eos() if eos is None else eos + self.blank = self.tgt_dict.index("<ctc_blank>") + self.mask = self.tgt_dict.index("<mask>") + self.mask_idxs = [] + if self.tgt_dict.index("<mask>0") != self.unk: + count = 0 + while self.tgt_dict.index("<mask>" + str(count)) != self.unk: + self.mask_idxs.append(self.tgt_dict.index("<mask>" + str(count))) + count += 1 + self.mask_idxs = torch.tensor(self.mask_idxs) + self.symbols_to_strip_from_output = ( + symbols_to_strip_from_output.union({self.eos}) + if symbols_to_strip_from_output is not None + else {self.eos} + ) + self.vocab_size = len(tgt_dict) + self.beam_size = beam_size + # the max beam size is the dictionary size - 1, since we never select pad + self.beam_size = min(beam_size, self.vocab_size - 1) + self.max_len_a = max_len_a + self.max_len_b = max_len_b + self.min_len = min_len + self.max_len = max_len or self.model.max_decoder_positions() + + self.normalize_scores = normalize_scores + self.len_penalty = len_penalty + self.unk_penalty = unk_penalty + self.temperature = temperature + self.match_source_len = match_source_len + + if no_repeat_ngram_size > 0: + self.repeat_ngram_blocker = NGramRepeatBlock(no_repeat_ngram_size) + else: + self.repeat_ngram_blocker = None + + assert temperature > 0, "--temperature must be greater than 0" + + self.search = ( + search.BeamSearch(tgt_dict) if search_strategy is None else search_strategy + ) + # We only need to set src_lengths in LengthConstrainedBeamSearch. + # As a module attribute, setting it would break in multithread + # settings when the model is shared. + self.should_set_src_lengths = ( + hasattr(self.search, "needs_src_lengths") and self.search.needs_src_lengths + ) + + self.model.eval() + + self.lm_model = lm_model + self.lm_weight = lm_weight + self.ctc_weight = ctc_weight + if self.lm_model is not None: + self.lm_model.eval() + + def cuda(self): + self.model.cuda() + return self + + @torch.no_grad() + def forward( + self, + sample: Dict[str, Dict[str, Tensor]], + prefix_tokens: Optional[Tensor] = None, + bos_token: Optional[int] = None, + ): + """Generate a batch of translations. 
+ + Args: + sample (dict): batch + prefix_tokens (torch.LongTensor, optional): force decoder to begin + with these tokens + bos_token (int, optional): beginning of sentence token + (default: self.eos) + """ + return self._generate(sample, prefix_tokens, bos_token=bos_token) + + # TODO(myleott): unused, deprecate after pytorch-translate migration + def generate_batched_itr(self, data_itr, beam_size=None, cuda=False, timer=None): + """Iterate over a batched dataset and yield individual translations. + Args: + cuda (bool, optional): use GPU for generation + timer (StopwatchMeter, optional): time generations + """ + for sample in data_itr: + s = utils.move_to_cuda(sample) if cuda else sample + if "net_input" not in s: + continue + input = s["net_input"] + # model.forward normally channels prev_output_tokens into the decoder + # separately, but SequenceGenerator directly calls model.encoder + encoder_input = { + k: v for k, v in input.items() if k != "prev_output_tokens" + } + if timer is not None: + timer.start() + with torch.no_grad(): + hypos = self.generate(encoder_input) + if timer is not None: + timer.stop(sum(len(h[0]["tokens"]) for h in hypos)) + for i, id in enumerate(s["id"].data): + # remove padding + src = utils.strip_pad(input["src_tokens"].data[i, :], self.pad) + ref = ( + utils.strip_pad(s["target"].data[i, :], self.pad) + if s["target"] is not None + else None + ) + yield id, src, ref, hypos[i] + + @torch.no_grad() + def generate(self, models, sample: Dict[str, Dict[str, Tensor]], **kwargs): + """Generate translations. Match the api of other fairseq generators. + + Args: + models (List[~fairseq.models.FairseqModel]): ensemble of models + sample (dict): batch + prefix_tokens (torch.LongTensor, optional): force decoder to begin + with these tokens + constraints (torch.LongTensor, optional): force decoder to include + the list of constraints + bos_token (int, optional): beginning of sentence token + (default: self.eos) + """ + return self._generate(sample, **kwargs) + + def _generate( + self, + sample: Dict[str, Dict[str, Tensor]], + prefix_tokens: Optional[Tensor] = None, + constraints: Optional[Tensor] = None, + bos_token: Optional[int] = None, + ): + incremental_states = torch.jit.annotate( + List[Dict[str, Dict[str, Optional[Tensor]]]], + [ + torch.jit.annotate(Dict[str, Dict[str, Optional[Tensor]]], {}) + for i in range(self.model.models_size) + ], + ) + net_input = sample["net_input"] + + if "src_tokens" in net_input: + src_tokens = net_input["src_tokens"] + # length of the source text being the character length except EndOfSentence and pad + src_lengths = ( + (src_tokens.ne(self.eos) & src_tokens.ne(self.pad)).long().sum(dim=1) + ) + elif "source" in net_input: + src_tokens = net_input["source"] + src_lengths = ( + net_input["padding_mask"].size(-1) - net_input["padding_mask"].sum(-1) + if net_input["padding_mask"] is not None + else torch.tensor(src_tokens.size(-1)).to(src_tokens) + ) + elif "features" in net_input: + src_tokens = net_input["features"] + src_lengths = ( + net_input["padding_mask"].size(-1) - net_input["padding_mask"].sum(-1) + if net_input["padding_mask"] is not None + else torch.tensor(src_tokens.size(-1)).to(src_tokens) + ) + else: + raise Exception("expected src_tokens or source in net input. input keys: " + str(net_input.keys())) + + # bsz: total number of sentences in beam + # Note that src_tokens may have more than 2 dimensions (i.e. 
audio features) + bsz, src_len = src_tokens.size()[:2] + beam_size = self.beam_size + + if constraints is not None and not self.search.supports_constraints: + raise NotImplementedError( + "Target-side constraints were provided, but search method doesn't support them" + ) + + # Initialize constraints, when active + self.search.init_constraints(constraints, beam_size) + + max_len: int = -1 + if self.match_source_len: + max_len = src_lengths.max().item() + else: + max_len = min( + int(self.max_len_a * src_len + self.max_len_b), + self.max_len - 1, + ) + assert ( + self.min_len <= max_len + ), "min_len cannot be larger than max_len, please adjust these!" + # compute the encoder output for each beam + encoder_outs = self.model.forward_encoder(net_input) + + # Get CTC lprobs and prep ctc_scorer + if self.ctc_weight > 0: + ctc_lprobs = self.model.models[0].get_normalized_probs_for_ctc( + encoder_outs[0], log_probs=True + ).contiguous().transpose(0, 1) # (B, T, C) from the encoder + + hyp = {} + ctc_prefix_score = CTCPrefixScore(ctc_lprobs[0].detach().cpu().numpy(), self.blank, self.eos, numpy) + hyp["ctc_state_prev"] = ctc_prefix_score.initial_state() + hyp["ctc_score_prev"] = 0.0 + ctc_beam = min(ctc_lprobs.shape[-1] - self.mask_idxs.size(-1), int(beam_size * CTC_SCORING_RATIO)) + ctc_hyps = {str(self.eos): hyp} + + # placeholder of indices for bsz * beam_size to hold tokens and accumulative scores + new_order = torch.arange(bsz).view(-1, 1).repeat(1, beam_size).view(-1) + new_order = new_order.to(src_tokens.device).long() + encoder_outs = self.model.reorder_encoder_out(encoder_outs, new_order) + # ensure encoder_outs is a List. + assert encoder_outs is not None + + # initialize buffers + scores = ( + torch.zeros(bsz * beam_size, max_len + 1).to(src_tokens).float() + ) # +1 for eos; pad is never chosen for scoring + tokens = ( + torch.zeros(bsz * beam_size, max_len + 2) + .to(src_tokens) + .long() + .fill_(self.pad) + ) # +2 for eos and pad + tokens[:, 0] = self.eos if bos_token is None else bos_token + attn: Optional[Tensor] = None + + # A list that indicates candidates that should be ignored. + # For example, suppose we're sampling and have already finalized 2/5 + # samples. Then cands_to_ignore would mark 2 positions as being ignored, + # so that we only finalize the remaining 3 samples. 
+ cands_to_ignore = ( + torch.zeros(bsz, beam_size).to(src_tokens).eq(-1) + ) # forward and backward-compatible False mask + + # list of completed sentences + finalized = torch.jit.annotate( + List[List[Dict[str, Tensor]]], + [torch.jit.annotate(List[Dict[str, Tensor]], []) for i in range(bsz)], + ) # contains lists of dictionaries of infomation about the hypothesis being finalized at each step + + # a boolean array indicating if the sentence at the index is finished or not + finished = [False for i in range(bsz)] + num_remaining_sent = bsz # number of sentences remaining + + # number of candidate hypos per step + cand_size = 2 * beam_size # 2 x beam size in case half are EOS + + # offset arrays for converting between different indexing schemes + bbsz_offsets = ( + (torch.arange(0, bsz) * beam_size) + .unsqueeze(1) + .type_as(tokens) + .to(src_tokens.device) + ) + cand_offsets = torch.arange(0, cand_size).type_as(tokens).to(src_tokens.device) + + reorder_state: Optional[Tensor] = None + ctc_state = None + batch_idxs: Optional[Tensor] = None + + original_batch_idxs: Optional[Tensor] = None + if "id" in sample and isinstance(sample["id"], Tensor): + original_batch_idxs = sample["id"] + else: + original_batch_idxs = torch.arange(0, bsz).type_as(tokens) + + for step in range(max_len + 1): # one extra step for EOS marker + # reorder decoder internal states based on the prev choice of beams + if reorder_state is not None: + if batch_idxs is not None: + # update beam indices to take into account removed sentences + corr = batch_idxs - torch.arange(batch_idxs.numel()).type_as( + batch_idxs + ) + reorder_state.view(-1, beam_size).add_( + corr.unsqueeze(-1) * beam_size + ) + original_batch_idxs = original_batch_idxs[batch_idxs] + self.model.reorder_incremental_state(incremental_states, reorder_state) + encoder_outs = self.model.reorder_encoder_out( + encoder_outs, reorder_state + ) + + lprobs, avg_attn_scores = self.model.forward_decoder( + tokens[:, : step + 1], + encoder_outs, + incremental_states, + self.temperature, + ) + + if self.ctc_weight > 0 and step != 0: + # lprobs[:, self.blank] = -math.inf # never select blank + ctc_lprobs = lprobs.clone() + ctc_lprobs[:, self.blank] = -math.inf # never select blank + if self.mask != self.unk: + ctc_lprobs[:, self.mask] = -math.inf # never select mask + if self.mask_idxs.size(0) != 0: + ctc_lprobs[:, self.mask_idxs] = -math.inf # never select mask + local_best_scores, local_best_ids = torch.topk(ctc_lprobs, ctc_beam, dim=-1) + for b in range(tokens.size(0)): + hyp_key = " ".join(str(x) for x in tokens[b, : step + 1].tolist()) + ctc_scores, ctc_states = ctc_prefix_score( + tokens[b, : step + 1].cpu(), local_best_ids[b].cpu(), ctc_hyps[hyp_key]["ctc_state_prev"] + ) + lprobs[b] = lprobs[b] + lprobs[b, local_best_ids[b]] = (1 - self.ctc_weight) * (lprobs[b, local_best_ids[b]]) + self.ctc_weight * torch.from_numpy( + ctc_scores - ctc_hyps[hyp_key]["ctc_score_prev"] + ).to(device="cuda") + for j in range(len(local_best_ids[b])): + ctc_hyps[hyp_key + " " + str(local_best_ids[b][j].item())] = {} + ctc_hyps[hyp_key + " " + str(local_best_ids[b][j].item())]["ctc_score_prev"] = ctc_scores[j] + ctc_hyps[hyp_key + " " + str(local_best_ids[b][j].item())]["ctc_state_prev"] = ctc_states[j] + + # local_ctc_scores, ctc_state = ctc_scorer( + # tokens[:, : step + 1], ctc_state, part_ids + # ) + # lprobs += local_ctc_scores * self.ctc_weight + elif self.ctc_weight > 0 and step == 0: + ctc_lprobs = lprobs.clone() + ctc_lprobs[:, self.blank] = -math.inf # never select blank 
+ if self.mask != self.unk: + ctc_lprobs[:, self.mask] = -math.inf # never select mask + if self.mask_idxs.size(0) != 0: + ctc_lprobs[:, self.mask_idxs] = -math.inf # never select mask + local_best_scores, local_best_ids = torch.topk(ctc_lprobs, ctc_beam, dim=-1) + for b in range(tokens.size(0)): + hyp_key = " ".join(str(x) for x in tokens[b, : step + 1].tolist()) + ctc_scores, ctc_states = ctc_prefix_score( + tokens[b, : step + 1].cpu(), local_best_ids[b].cpu(), ctc_hyps[hyp_key]["ctc_state_prev"] + ) + lprobs[b] = lprobs[b] + lprobs[b, local_best_ids[b]] = (1 - self.ctc_weight) * (lprobs[b, local_best_ids[b]]) + self.ctc_weight * torch.from_numpy( + ctc_scores - ctc_hyps[hyp_key]["ctc_score_prev"] + ).to(device="cuda") + for j in range(len(local_best_ids[b])): + if b == 0: + ctc_hyps[hyp_key + " " + str(local_best_ids[b][j].item())] = {} + ctc_hyps[hyp_key + " " + str(local_best_ids[b][j].item())]["ctc_score_prev"] = ctc_scores[j] + ctc_hyps[hyp_key + " " + str(local_best_ids[b][j].item())]["ctc_state_prev"] = ctc_states[j] + + if self.lm_model is not None: + lm_out = self.lm_model(tokens[:, : step + 1]) + probs = self.lm_model.get_normalized_probs( + lm_out, log_probs=True, sample=None + ) + probs = probs[:, -1, :] * self.lm_weight + lprobs[:, :probs.size(1)] += probs + + # handle prefix tokens (possibly with different lengths) + if ( + prefix_tokens is not None + and step < prefix_tokens.size(1) + and step < max_len + ): + lprobs, tokens, scores = self._prefix_tokens( + step, lprobs, scores, tokens, prefix_tokens, beam_size + ) + elif step < self.min_len: + # minimum length constraint (does not apply if using prefix_tokens) + lprobs[:, self.eos] = -math.inf + + lprobs[lprobs != lprobs] = torch.tensor(-math.inf).to(lprobs) + + lprobs[:, self.pad] = -math.inf # never select pad + lprobs[:, self.unk] -= self.unk_penalty # apply unk penalty + lprobs[:, self.blank] = -math.inf # never select blank + if self.mask != self.unk: + lprobs[:, self.mask] = -math.inf # never select mask + if self.mask_idxs.size(0) != 0: + lprobs[:, self.mask_idxs] = -math.inf # never select mask + + # handle max length constraint + if step >= max_len: + lprobs[:, : self.eos] = -math.inf + lprobs[:, self.eos + 1 :] = -math.inf + + # Record attention scores, only support avg_attn_scores is a Tensor + if avg_attn_scores is not None: + if attn is None: + attn = torch.empty( + bsz * beam_size, avg_attn_scores.size(1), max_len + 2 + ).to(scores) + attn[:, :, step + 1].copy_(avg_attn_scores) + + scores = scores.type_as(lprobs) + eos_bbsz_idx = torch.empty(0).to( + tokens + ) # indices of hypothesis ending with eos (finished sentences) + eos_scores = torch.empty(0).to( + scores + ) # scores of hypothesis ending with eos (finished sentences) + + if self.should_set_src_lengths: + self.search.set_src_lengths(src_lengths) + + if self.repeat_ngram_blocker is not None: + lprobs = self.repeat_ngram_blocker(tokens, lprobs, bsz, beam_size, step) + + # Shape: (batch, cand_size) + cand_scores, cand_indices, cand_beams = self.search.step( + step, + lprobs.view(bsz, -1, self.vocab_size), + scores.view(bsz, beam_size, -1)[:, :, :step], + tokens[:, : step + 1], + original_batch_idxs, + ) + + # cand_bbsz_idx contains beam indices for the top candidate + # hypotheses, with a range of values: [0, bsz*beam_size), + # and dimensions: [bsz, cand_size] + cand_bbsz_idx = cand_beams.add(bbsz_offsets) + + # finalize hypotheses that end in eos + # Shape of eos_mask: (batch size, beam size) + eos_mask = cand_indices.eq(self.eos) & 
cand_scores.ne(-math.inf) + eos_mask[:, :beam_size][cands_to_ignore] = torch.tensor(0).to(eos_mask) + + # only consider eos when it's among the top beam_size indices + # Now we know what beam item(s) to finish + # Shape: 1d list of absolute-numbered + eos_bbsz_idx = torch.masked_select( + cand_bbsz_idx[:, :beam_size], mask=eos_mask[:, :beam_size] + ) + + finalized_sents: List[int] = [] + if eos_bbsz_idx.numel() > 0: + eos_scores = torch.masked_select( + cand_scores[:, :beam_size], mask=eos_mask[:, :beam_size] + ) + + finalized_sents = self.finalize_hypos( + step, + eos_bbsz_idx, + eos_scores, + tokens, + scores, + finalized, + finished, + beam_size, + attn, + src_lengths, + max_len, + ) + num_remaining_sent -= len(finalized_sents) + + assert num_remaining_sent >= 0 + if num_remaining_sent == 0: + break + if self.search.stop_on_max_len and step >= max_len: + break + assert step < max_len, f"{step} < {max_len}" + + # Remove finalized sentences (ones for which {beam_size} + # finished hypotheses have been generated) from the batch. + if len(finalized_sents) > 0: + new_bsz = bsz - len(finalized_sents) + + # construct batch_idxs which holds indices of batches to keep for the next pass + batch_mask = torch.ones( + bsz, dtype=torch.bool, device=cand_indices.device + ) + batch_mask[finalized_sents] = False + # TODO replace `nonzero(as_tuple=False)` after TorchScript supports it + batch_idxs = torch.arange( + bsz, device=cand_indices.device + ).masked_select(batch_mask) + + # Choose the subset of the hypothesized constraints that will continue + self.search.prune_sentences(batch_idxs) + + eos_mask = eos_mask[batch_idxs] + cand_beams = cand_beams[batch_idxs] + bbsz_offsets.resize_(new_bsz, 1) + cand_bbsz_idx = cand_beams.add(bbsz_offsets) + cand_scores = cand_scores[batch_idxs] + cand_indices = cand_indices[batch_idxs] + + if prefix_tokens is not None: + prefix_tokens = prefix_tokens[batch_idxs] + src_lengths = src_lengths[batch_idxs] + cands_to_ignore = cands_to_ignore[batch_idxs] + + scores = scores.view(bsz, -1)[batch_idxs].view(new_bsz * beam_size, -1) + tokens = tokens.view(bsz, -1)[batch_idxs].view(new_bsz * beam_size, -1) + if attn is not None: + attn = attn.view(bsz, -1)[batch_idxs].view( + new_bsz * beam_size, attn.size(1), -1 + ) + bsz = new_bsz + else: + batch_idxs = None + + # Set active_mask so that values > cand_size indicate eos hypos + # and values < cand_size indicate candidate active hypos. + # After, the min values per row are the top candidate active hypos + + # Rewrite the operator since the element wise or is not supported in torchscript. + + eos_mask[:, :beam_size] = ~((~cands_to_ignore) & (~eos_mask[:, :beam_size])) + active_mask = torch.add( + eos_mask.type_as(cand_offsets) * cand_size, + cand_offsets[: eos_mask.size(1)], + ) + + # get the top beam_size active hypotheses, which are just + # the hypos with the smallest values in active_mask. + # {active_hypos} indicates which {beam_size} hypotheses + # from the list of {2 * beam_size} candidates were + # selected. Shapes: (batch size, beam size) + new_cands_to_ignore, active_hypos = torch.topk( + active_mask, k=beam_size, dim=1, largest=False + ) + + # update cands_to_ignore to ignore any finalized hypos. + cands_to_ignore = new_cands_to_ignore.ge(cand_size)[:, :beam_size] + # Make sure there is at least one active item for each sentence in the batch. 
+ assert (~cands_to_ignore).any(dim=1).all() + + # update cands_to_ignore to ignore any finalized hypos + + # {active_bbsz_idx} denotes which beam number is continued for each new hypothesis (a beam + # can be selected more than once). + active_bbsz_idx = torch.gather(cand_bbsz_idx, dim=1, index=active_hypos) + active_scores = torch.gather(cand_scores, dim=1, index=active_hypos) + + active_bbsz_idx = active_bbsz_idx.view(-1) + active_scores = active_scores.view(-1) + + # copy tokens and scores for active hypotheses + + # Set the tokens for each beam (can select the same row more than once) + tokens[:, : step + 1] = torch.index_select( + tokens[:, : step + 1], dim=0, index=active_bbsz_idx + ) + # Select the next token for each of them + tokens.view(bsz, beam_size, -1)[:, :, step + 1] = torch.gather( + cand_indices, dim=1, index=active_hypos + ) + if step > 0: + scores[:, :step] = torch.index_select( + scores[:, :step], dim=0, index=active_bbsz_idx + ) + scores.view(bsz, beam_size, -1)[:, :, step] = torch.gather( + cand_scores, dim=1, index=active_hypos + ) + + # Update constraints based on which candidates were selected for the next beam + self.search.update_constraints(active_hypos) + + # copy attention for active hypotheses + if attn is not None: + attn[:, :, : step + 2] = torch.index_select( + attn[:, :, : step + 2], dim=0, index=active_bbsz_idx + ) + + # reorder incremental state in decoder + reorder_state = active_bbsz_idx + + # if self.ctc_weight > 0: + # accum_best_id = torch.gather(cand_indices, dim=1, index=active_hypos) + # ctc_state = ctc_scorer.index_select_state( + # ctc_state, accum_best_id + # ) + + # sort by score descending + for sent in range(len(finalized)): + scores = torch.tensor( + [float(elem["score"].item()) for elem in finalized[sent]] + ) + _, sorted_scores_indices = torch.sort(scores, descending=True) + finalized[sent] = [finalized[sent][ssi] for ssi in sorted_scores_indices] + finalized[sent] = torch.jit.annotate( + List[Dict[str, Tensor]], finalized[sent] + ) + return finalized + + def _prefix_tokens( + self, step: int, lprobs, scores, tokens, prefix_tokens, beam_size: int + ): + """Handle prefix tokens""" + prefix_toks = prefix_tokens[:, step].unsqueeze(-1).repeat(1, beam_size).view(-1) + prefix_lprobs = lprobs.gather(-1, prefix_toks.unsqueeze(-1)) + prefix_mask = prefix_toks.ne(self.pad) + lprobs[prefix_mask] = torch.min(prefix_lprobs) - 1 + lprobs[prefix_mask] = lprobs[prefix_mask].scatter( + -1, prefix_toks[prefix_mask].unsqueeze(-1), prefix_lprobs[prefix_mask] + ) + # if prefix includes eos, then we should make sure tokens and + # scores are the same across all beams + eos_mask = prefix_toks.eq(self.eos) + if eos_mask.any(): + # validate that the first beam matches the prefix + first_beam = tokens[eos_mask].view(-1, beam_size, tokens.size(-1))[ + :, 0, 1 : step + 1 + ] + eos_mask_batch_dim = eos_mask.view(-1, beam_size)[:, 0] + target_prefix = prefix_tokens[eos_mask_batch_dim][:, :step] + assert (first_beam == target_prefix).all() + + # copy tokens, scores and lprobs from the first beam to all beams + tokens = self.replicate_first_beam(tokens, eos_mask_batch_dim, beam_size) + scores = self.replicate_first_beam(scores, eos_mask_batch_dim, beam_size) + lprobs = self.replicate_first_beam(lprobs, eos_mask_batch_dim, beam_size) + return lprobs, tokens, scores + + def replicate_first_beam(self, tensor, mask, beam_size: int): + tensor = tensor.view(-1, beam_size, tensor.size(-1)) + tensor[mask] = tensor[mask][:, :1, :] + return tensor.view(-1, tensor.size(-1)) + + 
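
As an aside, the `replicate_first_beam` helper above uses a single view-and-broadcast masked assignment to copy the first beam's tokens, scores, and log-probabilities to every other beam once a forced prefix has emitted EOS. The snippet below is a minimal standalone sketch of that same trick on toy tensors; it is illustrative only and not part of the checked-in file (the tensor sizes and the `mask` values are made up for the example):

```python
import torch

def replicate_first_beam(tensor, mask, beam_size):
    # Same reshape as SequenceGenerator.replicate_first_beam: view the flat
    # (bsz * beam_size, width) tensor as (bsz, beam_size, width), then for every
    # masked sentence overwrite all of its beams with that sentence's first beam.
    tensor = tensor.view(-1, beam_size, tensor.size(-1))
    tensor[mask] = tensor[mask][:, :1, :]  # (n, 1, width) broadcasts over the beam dim
    return tensor.view(-1, tensor.size(-1))

beam_size = 3
# Two sentences, three beams each, hypotheses of length 4 (toy values).
tokens = torch.arange(2 * beam_size * 4).view(2 * beam_size, 4)
mask = torch.tensor([True, False])  # only sentence 0 hit EOS inside its forced prefix
print(replicate_first_beam(tokens.clone(), mask, beam_size))
# Rows 0-2 (sentence 0) now all equal the original row 0; rows 3-5 are unchanged.
```

The broadcasted masked assignment avoids an explicit Python loop over beams while leaving unmasked sentences untouched.
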
def finalize_hypos( + self, + step: int, + bbsz_idx, + eos_scores, + tokens, + scores, + finalized: List[List[Dict[str, Tensor]]], + finished: List[bool], + beam_size: int, + attn: Optional[Tensor], + src_lengths, + max_len: int, + ): + """Finalize hypothesis, store finalized information in `finalized`, and change `finished` accordingly. + A sentence is finalized when {beam_size} finished items have been collected for it. + Returns number of sentences (not beam items) being finalized. + These will be removed from the batch and not processed further. + Args: + bbsz_idx (Tensor): + """ + assert bbsz_idx.numel() == eos_scores.numel() + + # clone relevant token and attention tensors. + # tokens is (batch * beam, max_len). So the index_select + # gets the newly EOS rows, then selects cols 1..{step + 2} + tokens_clone = tokens.index_select(0, bbsz_idx)[ + :, 1 : step + 2 + ] # skip the first index, which is EOS + + tokens_clone[:, step] = self.eos + attn_clone = ( + attn.index_select(0, bbsz_idx)[:, :, 1 : step + 2] + if attn is not None + else None + ) + + # compute scores per token position + pos_scores = scores.index_select(0, bbsz_idx)[:, : step + 1] + pos_scores[:, step] = eos_scores + # convert from cumulative to per-position scores + pos_scores[:, 1:] = pos_scores[:, 1:] - pos_scores[:, :-1] + + # normalize sentence-level scores + if self.normalize_scores: + eos_scores /= (step + 1) ** self.len_penalty + + # cum_unfin records which sentences in the batch are finished. + # It helps match indexing between (a) the original sentences + # in the batch and (b) the current, possibly-reduced set of + # sentences. + cum_unfin: List[int] = [] + prev = 0 + for f in finished: + if f: + prev += 1 + else: + cum_unfin.append(prev) + cum_fin_tensor = torch.tensor(cum_unfin, dtype=torch.int).to(bbsz_idx) + + unfin_idx = bbsz_idx // beam_size + sent = unfin_idx + torch.index_select(cum_fin_tensor, 0, unfin_idx) + + # Create a set of "{sent}{unfin_idx}", where + # "unfin_idx" is the index in the current (possibly reduced) + # list of sentences, and "sent" is the index in the original, + # unreduced batch + # For every finished beam item + # sentence index in the current (possibly reduced) batch + seen = (sent << 32) + unfin_idx + unique_seen: List[int] = torch.unique(seen).tolist() + + if self.match_source_len: + condition = step > torch.index_select(src_lengths, 0, unfin_idx) + eos_scores = torch.where(condition, torch.tensor(-math.inf), eos_scores) + sent_list: List[int] = sent.tolist() + for i in range(bbsz_idx.size()[0]): + # An input sentence (among those in a batch) is finished when + # beam_size hypotheses have been collected for it + if len(finalized[sent_list[i]]) < beam_size: + if attn_clone is not None: + # remove padding tokens from attn scores + hypo_attn = attn_clone[i] + else: + hypo_attn = torch.empty(0) + + finalized[sent_list[i]].append( + { + "tokens": tokens_clone[i], + "score": eos_scores[i], + "attention": hypo_attn, # src_len x tgt_len + "alignment": torch.empty(0), + "positional_scores": pos_scores[i], + } + ) + + newly_finished: List[int] = [] + for unique_s in unique_seen: + # check termination conditions for this sentence + unique_sent: int = unique_s >> 32 + unique_unfin_idx: int = unique_s - (unique_sent << 32) + + if not finished[unique_sent] and self.is_finished( + step, unique_unfin_idx, max_len, len(finalized[unique_sent]), beam_size + ): + finished[unique_sent] = True + newly_finished.append(unique_unfin_idx) + + return newly_finished + + def is_finished( + self, + step: 
int, + unfin_idx: int, + max_len: int, + finalized_sent_len: int, + beam_size: int, + ): + """ + Check whether decoding for a sentence is finished, which + occurs when the list of finalized sentences has reached the + beam size, or when we reach the maximum length. + """ + assert finalized_sent_len <= beam_size + if finalized_sent_len == beam_size or step == max_len: + return True + return False + + +class EnsembleModel(nn.Module): + """A wrapper around an ensemble of models.""" + + def __init__(self, models): + super().__init__() + self.models_size = len(models) + # method '__len__' is not supported in ModuleList for torch script + self.single_model = models[0] + self.models = nn.ModuleList(models) + + self.has_incremental: bool = False + if all( + hasattr(m, "decoder") and isinstance(m.decoder, FairseqIncrementalDecoder) + for m in models + ): + self.has_incremental = True + + def forward(self): + pass + + def has_encoder(self): + return hasattr(self.single_model, "encoder") + + def is_t5_structure(self): + t5_structure = hasattr(self.single_model, "text_encoder_prenet") and hasattr(self.single_model, "speech_encoder_prenet") or \ + hasattr(self.single_model, "encoder_prenet") and hasattr(self.single_model, "encoder_prenet") + return t5_structure + + def has_incremental_states(self): + return self.has_incremental + + def max_decoder_positions(self): + return min([m.max_decoder_positions() for m in self.models if hasattr(m, "max_decoder_positions")] + [sys.maxsize]) + + @torch.jit.export + def forward_encoder(self, net_input: Dict[str, Tensor]): + if not self.has_encoder(): + return None + elif self.is_t5_structure(): + return [model.forward_encoder_torchscript(net_input) for model in self.models] + else: + return [model.encoder.forward_torchscript(net_input) for model in self.models] + + @torch.jit.export + def forward_decoder( + self, + tokens, + encoder_outs: List[Dict[str, List[Tensor]]], + incremental_states: List[Dict[str, Dict[str, Optional[Tensor]]]], + temperature: float = 1.0, + ): + log_probs = [] + avg_attn: Optional[Tensor] = None + encoder_out: Optional[Dict[str, List[Tensor]]] = None + for i, model in enumerate(self.models): + if self.has_encoder(): + encoder_out = encoder_outs[i] + # decode each model + if self.has_incremental_states(): + if self.is_t5_structure: + decoder_out = model.forward_decoder( + tokens, + encoder_out=encoder_out, + incremental_state=incremental_states[i] + ) + else: + decoder_out = model.decoder.forward( + tokens, + encoder_out=encoder_out, + incremental_state=incremental_states[i], + ) + else: + if hasattr(model, "decoder"): + decoder_out = model.decoder.forward(tokens, encoder_out=encoder_out) + else: + decoder_out = model.forward(tokens) + + attn: Optional[Tensor] = None + decoder_len = len(decoder_out) + if decoder_len > 1 and decoder_out[1] is not None: + if isinstance(decoder_out[1], Tensor): + attn = decoder_out[1] + else: + attn_holder = decoder_out[1]["attn"] + if isinstance(attn_holder, Tensor): + attn = attn_holder + elif attn_holder is not None: + attn = attn_holder[0] + if attn is not None: + attn = attn[:, -1, :] + + decoder_out_tuple = ( + decoder_out[0][:, -1:, :].div_(temperature), + None if decoder_len <= 1 else decoder_out[1], + ) + probs = model.get_normalized_probs( + decoder_out_tuple, log_probs=True, sample=None + ) + probs = probs[:, -1, :] + if self.models_size == 1: + return probs, attn + + log_probs.append(probs) + if attn is not None: + if avg_attn is None: + avg_attn = attn + else: + avg_attn.add_(attn) + + avg_probs = 
torch.logsumexp(torch.stack(log_probs, dim=0), dim=0) - math.log( + self.models_size + ) + + if avg_attn is not None: + avg_attn.div_(self.models_size) + return avg_probs, avg_attn + + @torch.jit.export + def reorder_encoder_out( + self, encoder_outs: Optional[List[Dict[str, List[Tensor]]]], new_order + ): + """ + Reorder encoder output according to *new_order*. + + Args: + encoder_out: output from the ``forward()`` method + new_order (LongTensor): desired order + + Returns: + *encoder_out* rearranged according to *new_order* + """ + new_outs: List[Dict[str, List[Tensor]]] = [] + if not self.has_encoder(): + return new_outs + for i, model in enumerate(self.models): + assert encoder_outs is not None + new_outs.append( + model.encoder.reorder_encoder_out(encoder_outs[i], new_order) + ) + return new_outs + + @torch.jit.export + def reorder_incremental_state( + self, + incremental_states: List[Dict[str, Dict[str, Optional[Tensor]]]], + new_order, + ): + if not self.has_incremental_states(): + return + for i, model in enumerate(self.models): + model.decoder.reorder_incremental_state_scripting( + incremental_states[i], new_order + ) + + +class SequenceGeneratorWithAlignment(SequenceGenerator): + def __init__( + self, models, tgt_dict, left_pad_target=False, print_alignment="hard", **kwargs + ): + """Generates translations of a given source sentence. + + Produces alignments following "Jointly Learning to Align and + Translate with Transformer Models" (Garg et al., EMNLP 2019). + + Args: + left_pad_target (bool, optional): Whether or not the + hypothesis should be left padded or not when they are + teacher forced for generating alignments. + """ + super().__init__(EnsembleModelWithAlignment(models), tgt_dict, **kwargs) + self.left_pad_target = left_pad_target + + if print_alignment == "hard": + self.extract_alignment = utils.extract_hard_alignment + elif print_alignment == "soft": + self.extract_alignment = utils.extract_soft_alignment + + @torch.no_grad() + def generate(self, models, sample, **kwargs): + finalized = super()._generate(sample, **kwargs) + + src_tokens = sample["net_input"]["src_tokens"] + bsz = src_tokens.shape[0] + beam_size = self.beam_size + ( + src_tokens, + src_lengths, + prev_output_tokens, + tgt_tokens, + ) = self._prepare_batch_for_alignment(sample, finalized) + if any(getattr(m, "full_context_alignment", False) for m in self.model.models): + attn = self.model.forward_align(src_tokens, src_lengths, prev_output_tokens) + else: + attn = [ + finalized[i // beam_size][i % beam_size]["attention"].transpose(1, 0) + for i in range(bsz * beam_size) + ] + + if src_tokens.device != "cpu": + src_tokens = src_tokens.to("cpu") + tgt_tokens = tgt_tokens.to("cpu") + attn = [i.to("cpu") for i in attn] + + # Process the attn matrix to extract hard alignments. 
+ for i in range(bsz * beam_size): + alignment = self.extract_alignment( + attn[i], src_tokens[i], tgt_tokens[i], self.pad, self.eos + ) + finalized[i // beam_size][i % beam_size]["alignment"] = alignment + return finalized + + def _prepare_batch_for_alignment(self, sample, hypothesis): + src_tokens = sample["net_input"]["src_tokens"] + bsz = src_tokens.shape[0] + src_tokens = ( + src_tokens[:, None, :] + .expand(-1, self.beam_size, -1) + .contiguous() + .view(bsz * self.beam_size, -1) + ) + src_lengths = sample["net_input"]["src_lengths"] + src_lengths = ( + src_lengths[:, None] + .expand(-1, self.beam_size) + .contiguous() + .view(bsz * self.beam_size) + ) + prev_output_tokens = data_utils.collate_tokens( + [beam["tokens"] for example in hypothesis for beam in example], + self.pad, + self.eos, + self.left_pad_target, + move_eos_to_beginning=True, + ) + tgt_tokens = data_utils.collate_tokens( + [beam["tokens"] for example in hypothesis for beam in example], + self.pad, + self.eos, + self.left_pad_target, + move_eos_to_beginning=False, + ) + return src_tokens, src_lengths, prev_output_tokens, tgt_tokens + + +class EnsembleModelWithAlignment(EnsembleModel): + """A wrapper around an ensemble of models.""" + + def __init__(self, models): + super().__init__(models) + + def forward_align(self, src_tokens, src_lengths, prev_output_tokens): + avg_attn = None + for model in self.models: + decoder_out = model(src_tokens, src_lengths, prev_output_tokens) + attn = decoder_out[1]["attn"][0] + if avg_attn is None: + avg_attn = attn + else: + avg_attn.add_(attn) + if len(self.models) > 1: + avg_attn.div_(len(self.models)) + return avg_attn diff --git a/SpeechT5/speecht5/tasks/__init__.py b/SpeechT5/speecht5/tasks/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 diff --git a/SpeechT5/speecht5/tasks/speecht5.py b/SpeechT5/speecht5/tasks/speecht5.py new file mode 100644 index 0000000000000000000000000000000000000000..ff3c88eb6bd6736608b17f8faa200814f65956ae --- /dev/null +++ b/SpeechT5/speecht5/tasks/speecht5.py @@ -0,0 +1,700 @@ +# -------------------------------------------------------- +# SpeechT5: Unified-Modal Encoder-Decoder Pre-Training for Spoken Language Processing (https://arxiv.org/abs/2110.07205) +# Github source: https://github.com/microsoft/SpeechT5/tree/main/SpeechT5 +# Copyright (c) 2021 Microsoft +# Licensed under The MIT License [see LICENSE for details] +# Based on fairseq and espnet code bases +# https://github.com/pytorch/fairseq; https://github.com/espnet/espnet +# -------------------------------------------------------- + +import logging +import os.path as op +from argparse import Namespace +from collections import OrderedDict + +import torch +from fairseq.data import ( + Dictionary, + encoders, + PrependTokenDataset, + AppendTokenDataset, + data_utils, + StripTokenDataset, + TokenBlockDataset, +) +from fairseq.data.encoders.utils import get_whole_word_mask +from fairseq import utils +from speecht5.data.multitask_dataset import MultitaskDataset +from speecht5.data.speech_to_text_dataset import SpeechToTextDataset +from speecht5.data.text_to_speech_dataset import TextToSpeechDataset +from speecht5.data.speech_to_speech_dataset import SpeechToSpeechDataset +from speecht5.data.speech_to_class_dataset import SpeechToClassDataset +from speecht5.data.speech_dataset import SpeechPretrainDataset +from speecht5.data.text_dataset import TextPretrainDataset +from fairseq.data.shorten_dataset import maybe_shorten_dataset 
+from fairseq.tasks import LegacyFairseqTask, register_task +from fairseq.tasks.hubert_pretraining import LabelEncoder + +logger = logging.getLogger(__name__) + +TASK_NAME = ["s2t", "t2s", "s2s", "s2c", "pretrain"] + +@register_task("speecht5") +class SpeechT5Task(LegacyFairseqTask): + @staticmethod + def add_args(parser): + parser.add_argument("data", help="manifest root path") + parser.add_argument( + "--config-yaml", + type=str, + default="config.yaml", + help="Configuration YAML filename (under manifest root)", + ) + parser.add_argument( + "--max-speech-sample-size", + default=None, + type=int, + metavar="N", + help="max speech sample size", + ) + parser.add_argument( + "--min-speech-sample-size", + default=None, + type=int, + metavar="N", + help="min speech sample size", + ) + parser.add_argument( + "--max-speech-positions", + default=4000, + type=int, + metavar="N", + help="max number of tokens in the source sequence", + ) + parser.add_argument( + "--max-text-positions", + default=450, + type=int, + metavar="N", + help="max number of tokens in the target sequence", + ) + parser.add_argument( + '--t5-task', + choices=TASK_NAME, + help='task for training' + ) + parser.add_argument( + "--bpe-tokenizer", + type=str, + default=None, + help="bpe tokenizer for s2t", + ) + # Speaker Identification (SID) + parser.add_argument( + "--finetune-from-modules", + default=None, + # choices=[ + # "encoder-decoder", "encoder", "decoder", + # "speech_encoder_prenet-encoder-decoder-text_decoder_prenet-text_decoder_postnet", # ASR, T5 SID + # "speech_encoder_prenet-encoder-decoder-text_decoder_prenet-speaker_decoder_postnet", # SID + # "speech_encoder_prenet-encoder-decoder-speech_decoder_prenet-speech_decoder_postnet", # VC, SE + # "text_encoder_prenet-encoder-decoder-speech_decoder_prenet-speech_decoder_postnet", # TTS + # ], + help="If set, using part modules of finetune model.", + ) + parser.add_argument( + "--finetune-out-of-modules", + default=None, + # choices=[ + # "speaker_decoder_postnet", # SID + # "speech_decoder_postnet", # SE with reduction factor 1 + # ], + help="If set, remove part modules of finetune model.", + ) + # BART + parser.add_argument( + "--shorten-method", + default="none", + choices=["none", "truncate", "random_crop"], + help="if not none, shorten sequences that exceed --tokens-per-sample", + ) + parser.add_argument( + "--shorten-data-split-list", + default="", + help="comma-separated list of dataset splits to apply shortening to, " + 'e.g., "train,valid" (default: all dataset splits)', + ) + + parser.add_argument( + "--tokens-per-sample", + default=512, + type=int, + help="max number of total tokens over all segments" + " per sample for dataset", + ) + parser.add_argument( + "--sample-break-mode", + default="eos", + type=str, + help="mode for breaking sentence", + ) + parser.add_argument( + "--mask", + default=0.3, + type=float, + help="fraction of words/subwords that will be masked", + ) + parser.add_argument( + "--mask-random", + default=0.1, + type=float, + help="instead of using [MASK], use random token this often", + ) + parser.add_argument( + "--insert", + default=0.0, + type=float, + help="insert this percentage of additional random tokens", + ) + parser.add_argument( + "--permute", + default=0.0, + type=float, + help="take this proportion of subwords and permute them", + ) + parser.add_argument( + "--rotate", + default=0.0, + type=float, + help="rotate this proportion of inputs", + ) + parser.add_argument( + "--poisson-lambda", + default=3.5, + type=float, + 
help="randomly shuffle sentences for this proportion of inputs", + ) + parser.add_argument( + "--permute-sentences", + default=0.0, + type=float, + help="shuffle this proportion of sentences in all inputs", + ) + parser.add_argument( + "--mask-length", + default="span-poisson", + type=str, + choices=["subword", "word", "span-poisson"], + help="mask length to choose", + ) + parser.add_argument( + "--replace-length", + default=1, + type=int, + help="when masking N tokens, replace with 0, 1, or N tokens (use -1 for N)", + ) + parser.add_argument( + "--iid-noise-target", + action="store_true", + help="whether to use t5 form target", + ) + # Hubert + parser.add_argument( + "--hubert-labels", + nargs="*", + type=str, + default=['km'], + help="extension of the label files to load, frame-level labels for pre-training, and sequence-level label for fine-tuning", + ) + parser.add_argument( + "--hubert-label-dir", + type=str, + default=None, + help="if set, looks for labels in this directory instead", + ) + parser.add_argument( + "--sample-rate", + default=100, + type=float, + help="target sample rate. audio files will be up/down sampled to this rate", + ) + parser.add_argument( + "--label-rates", + default=-1, + type=float, + help="if set, looks for labels in this directory instead", + ) + parser.add_argument( + "--normalize", + action="store_true", + help="if set, normalizes input to have 0 mean and unit variance", + ) + parser.add_argument( + "--enable-padding", + action="store_true", + help="pad shorter samples instead of cropping", + ) + parser.add_argument( + "--pad-audio", + action="store_true", + help="pad audio to the longest one in the batch if true", + ) + parser.add_argument( + "--random-crop", + action="store_true", + help="always crop from the beginning if false", + ) + parser.add_argument( + "--single-target", + action="store_true", + help="if set, AddTargetDatasets outputs same keys " + "as AddTargetDataset", + ) + parser.add_argument( + "--batch-ratio", + default=None, + type=str, + help="ratio of bach size for each dataset", + ) + parser.add_argument( + "--sample-ratios", + default=None, + type=str, + help="ratio of sample for each dataset", + ) + parser.add_argument( + "--ctc-weight", + type=float, + default=0.0, + help="ctc weight for inference", + ) + + def __init__(self, args, dicts, config): + super().__init__(args) + self.dicts = dicts + self.config = config + self.t5_task = args.t5_task + # Used for filter size + if self.t5_task in ['s2t', 't2s', 's2s', 's2c']: + self.max_pos = [self.args.max_speech_positions * 256] + elif self.t5_task == 'pretrain': + self.max_pos = [self.args.max_speech_positions * 256, self.args.max_text_positions] + + self.mask_idx = self.dicts["text"].add_symbol("<mask>") + # add blank token for ctc + # if args.ctc_weight > 0: + self.blank_symbol_idx = self.dicts["text"].add_symbol("<ctc_blank>") + self.blank_symbol = "<ctc_blank>" + + # add mask token + if hasattr(args, "iid_noise_target") and args.iid_noise_target: + self.uni_mask_idxs = [] + for i in range(600): + self.uni_mask_idxs.append(self.dicts["text"].add_symbol("<mask>" + str(i))) + self.uni_mask_idxs = torch.tensor(self.uni_mask_idxs) + + self.seed = args.seed + + @classmethod + def setup_task(cls, args, **kwargs): + # load dictionaries and config + dicts = OrderedDict() + if args.t5_task == 'pretrain' and not hasattr(args, "shuffle_instance"): + args.shuffle_instance = False + + # Prepare config + config = None + logger.info('No config file for ' + args.t5_task) + + if args.t5_task == 
"pretrain": + dicts["hubert"] = [Dictionary.load(f"{args.hubert_label_dir}/dict.{label}.txt") for label in args.hubert_labels] + dicts["text"] = Dictionary.load(op.join(args.data, "dict.txt")) + else: + if config is None: + dicts["text"] = Dictionary.load(op.join(args.data, "dict.txt")) + else: + dicts["text"] = Dictionary.load(op.join(args.data, config.vocab_filename)) + + return cls(args, dicts, config) + + def build_criterion(self, args): + from fairseq import criterions + return criterions.build_criterion(args, self) + + def load_dataset(self, split, epoch=1, combine=False, **kwargs): + sample_ratios = [] + if self.t5_task == "s2t": + ## For speech to text task + bpe_tokenizer = self.build_bpe(self.args) + manifest = f"{self.args.data}/{split}.tsv" + procs = [LabelEncoder(self.dicts["text"])] + paths = [f"{self.args.hubert_label_dir}/{split}.txt"] + self.datasets[split] = SpeechToTextDataset( + manifest, + sample_rate=self.args.sample_rate, + label_paths=paths, + label_processors=procs, + max_keep_sample_size=self.max_pos[0] if self.args.max_speech_sample_size is None else self.args.max_speech_sample_size, + min_keep_sample_size=self.args.min_speech_sample_size, + normalize=self.args.normalize, + store_labels=False, + tgt_dict=self.dicts["text"], + tokenizer=bpe_tokenizer, + ) + elif self.t5_task == "t2s": + ## For text to speech task + from fairseq.data import ConcatDataset + bpe_tokenizer = self.build_bpe(self.args) + procs = [LabelEncoder(self.dicts["text"])] + t2s_datasets = [ + TextToSpeechDataset( + manifest_path=f"{self.args.data}/{name}.tsv", + sample_rate=self.args.sample_rate, + label_paths=[f"{self.args.hubert_label_dir}/{name}.txt"], + label_processors=procs, + max_keep_sample_size=self.max_pos[0], + normalize=self.args.normalize, + store_labels=False, + src_dict=self.dicts["text"], + tokenizer=bpe_tokenizer, + reduction_factor=self.args.reduction_factor, + ) + for name in split.split(",") + ] + self.datasets[split] = ConcatDataset(t2s_datasets) if len(t2s_datasets) > 1 else t2s_datasets[0] + elif self.t5_task == "s2s": + manifest = f"{self.args.data}/{split}.tsv" + self.datasets[split] = SpeechToSpeechDataset( + manifest_path=manifest, + sample_rate=self.args.sample_rate, + max_keep_sample_size=self.max_pos[0] if self.args.max_speech_sample_size is None else self.args.max_speech_sample_size, + min_keep_sample_size=self.args.min_speech_sample_size, + normalize=self.args.normalize, + reduction_factor=self.args.reduction_factor, + ) + elif self.t5_task == "s2c": + is_train_split = ("train" in split) + is_valid_split = ("valid" in split) + if is_train_split: + max_length = 51200 + elif is_valid_split: + max_length = 76800 + else: + max_length = 2560000 + manifest = op.join(f"{self.args.data}", f"{split}.tsv") + procs = LabelEncoder(self.dicts["text"]) # map speaker to id + self.datasets[split] = SpeechToClassDataset( + manifest_path=manifest, + sample_rate=self.args.sample_rate, + label_processors=procs, + max_keep_sample_size=self.max_pos[0] if self.args.max_speech_sample_size is None else self.args.max_speech_sample_size, + min_keep_sample_size=self.args.min_speech_sample_size, + normalize=self.args.normalize, + tgt_dict=self.dicts["text"], + max_length=max_length + ) + elif self.t5_task == "pretrain": + is_train_split = ("train" in split) + pretrain_datasets = [] + speech_split, text_split = split.split('|') + + ## Speech pre-train + manifest = f"{self.args.data}/{speech_split}.tsv" + dicts = self.dicts["hubert"] + pad_list = [dict.pad() for dict in dicts] + eos_list = 
[dict.eos() for dict in dicts] + procs = [LabelEncoder(dict) for dict in dicts] + paths = [ + f"{self.args.hubert_label_dir}/{speech_split}.{l}" for l in self.args.hubert_labels + ] + # hubert v1: pad_audio=True, random_crop=False; + self.args.dec_weight = getattr(self.args, "dec_weight", 1.0) + pretrain_datasets.append( + SpeechPretrainDataset( + manifest, + sample_rate=self.args.sample_rate, + label_paths=paths, + label_rates=self.args.label_rates, + pad_list=pad_list, + eos_list=eos_list, + label_processors=procs, + max_keep_sample_size=None, + min_keep_sample_size=32000, + max_sample_size=self.args.max_speech_sample_size, + pad_audio=self.args.pad_audio, + normalize=self.args.normalize, + store_labels=False, + random_crop=self.args.random_crop, + single_target=self.args.single_target, + reduction_factor=self.args.reduction_factor, + ) + ) + sample_ratios.append(sum([pretrain_datasets[0].size(i) for i in range(len(pretrain_datasets[0]))])) + + ## Text pre-train + paths = utils.split_paths(self.args.data) + assert len(paths) > 0 + data_path = paths[(epoch - 1) % len(paths)] + split_path = op.join(data_path, text_split) + bart_dataset = data_utils.load_indexed_dataset( + split_path, + self.dicts["text"], + self.args.dataset_impl, + combine=combine, + ) + if bart_dataset is None: + raise FileNotFoundError( + "Dataset not found: {} ({})".format(text_split, split_path) + ) + bart_dataset = StripTokenDataset(bart_dataset, self.dicts["text"].eos()) + bart_dataset = maybe_shorten_dataset( + bart_dataset, + text_split, + self.args.shorten_data_split_list, + self.args.shorten_method, + self.args.tokens_per_sample, + self.args.seed, + ) + # create continuous blocks of tokens + bart_dataset = TokenBlockDataset( + bart_dataset, + bart_dataset.sizes, + self.args.tokens_per_sample - 2, # one less for <s> and one for </s> + pad=self.dicts["text"].pad(), + eos=self.dicts["text"].eos(), + break_mode=self.args.sample_break_mode, + document_sep_len=0, + ) + # prepend beginning-of-sentence token (<s>, equiv. 
to [CLS] in BERT) + bart_dataset = PrependTokenDataset(bart_dataset, self.dicts["text"].bos()) + bart_dataset = AppendTokenDataset(bart_dataset, self.dicts["text"].eos()) + mask_whole_words = ( + get_whole_word_mask(self.args, self.dicts["text"]) + if self.args.mask_length != "subword" + else None + ) + self.args.bert_weight = getattr(self.args, "bert_weight", 0.0) + pretrain_datasets.append( + TextPretrainDataset( + bart_dataset, + bart_dataset.sizes, + self.dicts["text"], + self.mask_idx, + mask_whole_words, + shuffle=self.args.shuffle_instance, + seed=self.seed, + args=self.args, + iid_noise_target=self.args.iid_noise_target, + uni_mask_idxs=self.uni_mask_idxs if self.args.iid_noise_target else None, + ) + ) + sample_ratios.append(sum(pretrain_datasets[1].sizes)) + logger.info( + "Task: {0}, Loaded {1} samples of denoising_dataset".format( + 'bart', + len(pretrain_datasets[1]), + ) + ) + + logger.info('token ratio is ' + str(sample_ratios)) + if self.args.batch_ratio is not None: + batch_ratio = eval(self.args.batch_ratio) + assert len(batch_ratio) == len(sample_ratios) + sample_ratios = [sample_ratios[i] / batch_ratio[i] for i in range(len(sample_ratios))] + else: + batch_ratio = None + max_size = max(sample_ratios) + sample_ratios = [max_size / r for r in sample_ratios] + if hasattr(self.args, "sample_ratios") and self.args.sample_ratios is not None: + sample_ratios = eval(self.args.sample_ratios) + if is_train_split: + self.datasets[split] = MultitaskDataset( + pretrain_datasets, sample_ratios, batch_ratio + ) + else: + self.datasets[split] = MultitaskDataset( + pretrain_datasets, batch_ratio=batch_ratio + ) + + def train_step( + self, sample, model, criterion, optimizer, update_num, ignore_grad=False + ): + model.train() + model.set_num_updates(update_num) + + # Junyi: not use sample_size, but normalize the loss locally + agg_loss, agg_sample_size, agg_logging_output = 0.0, 1.0, {} + agg_logging_output['sample_size'] = 1 + + def forward_backward(model, samples, weight=1.0): + nonlocal agg_loss, agg_logging_output + if samples is None or len(samples) == 0: + return + loss, sample_size, logging_output = criterion(model, samples) + if ignore_grad: + loss *= 0 + else: + loss *= weight + loss = loss / sample_size + optimizer.backward(loss) + agg_loss += loss.detach().item() + # # TODO make summing of the sample sizes configurable + for k in logging_output: + if k == 'ntokens' or k == 'nsentences': + if k not in agg_logging_output: + agg_logging_output[k] = 0 + agg_logging_output[k] += logging_output[k] + # continue + # agg_logging_output[k] += logging_output[k] + # agg_logging_output[task_name] += logging_output[k] + agg_logging_output[samples['task_name']] = logging_output + + forward_backward(model, sample) + + agg_logging_output["loss"] = agg_loss + + return agg_loss, agg_sample_size, agg_logging_output + + def valid_step(self, sample, model, criterion): + model.eval() + with torch.no_grad(): + from collections import defaultdict + + agg_loss, agg_sample_size, agg_logging_output = 0.0, 1.0, defaultdict(float) + agg_logging_output['sample_size'] = 1 + loss, sample_size, logging_output = criterion(model, sample) + loss = loss / sample_size + # agg_loss += loss.data.item() if isinstance(loss, torch.Tensor) else loss + agg_loss += loss.item() if isinstance(loss, torch.Tensor) else loss + agg_logging_output[sample['task_name']] = logging_output + agg_logging_output["loss"] = agg_loss + return agg_loss, agg_sample_size, agg_logging_output + + @property + def target_dictionary(self): + 
return self.dicts["text"] + + @property + def source_dictionary(self): + return None + + def build_model(self, args): + try: + args.input_feat_per_channel = self.config.input_feat_per_channel + args.input_channels = self.config.input_channels + except Exception as e: + args.input_feat_per_channel = 80 + args.input_channels = 1 + logger.info(f"Cannot set input_feat_per_channel, input_channels, since: ") + logger.warn(e) + logger.info(f"Set to: {args.input_feat_per_channel} and {args.input_channels}") + + args.speech_odim = args.input_feat_per_channel * args.input_channels + + args.label_rates = self.args.label_rates + args.sample_rate = self.args.sample_rate + self.args.reduction_factor = args.reduction_factor + return super(SpeechT5Task, self).build_model(args) + + def build_generator( + self, + models, + args, + seq_gen_cls=None, + extra_gen_cls_kwargs=None, + ): + from speecht5.sequence_generator import SequenceGenerator + extra_gen_cls_kwargs = { + "ctc_weight": self.args.ctc_weight, + **extra_gen_cls_kwargs + } + return super().build_generator( + models, args, seq_gen_cls=SequenceGenerator, extra_gen_cls_kwargs=extra_gen_cls_kwargs + ) + + def build_tokenizer(self, args): + if self.config is None: + logger.info(f"pre-tokenizer: None") + return encoders.build_tokenizer(Namespace(**{"tokenizer": None})) + else: + logger.info(f"pre-tokenizer: {self.config.pre_tokenizer}") + return encoders.build_tokenizer(Namespace(**self.config.pre_tokenizer)) + + def build_bpe(self, args): + if self.config is not None: + logger.info(f"tokenizer: {self.config.bpe_tokenizer}") + return encoders.build_bpe(Namespace(**self.config.bpe_tokenizer)) + else: + logger.info(f"tokenizer: {self.args.bpe_tokenizer}") + return encoders.build_bpe(Namespace(**{"bpe": "sentencepiece", "sentencepiece_model": self.args.bpe_tokenizer})) + + def generate_class(self, models, net_input, prefix_tokens, **kwargs): + with torch.no_grad(): + encoder_input = { + k: v for k, v in net_input.items() if k != "prev_output_tokens" and k != "task_name" + } + encoder_input.update(kwargs) + encoder_input.update({"prev_output_tokens": prefix_tokens}) + return models[0].generate_class(**encoder_input) + + def generate_speech(self, models, net_input, **kwargs): + with torch.no_grad(): + encoder_input = { + k: v for k, v in net_input.items() if k != "prev_output_tokens" and k != "task_name" + } + encoder_input.update(kwargs) + return models[0].generate_speech(**encoder_input) + + def inference_t2s( + self, models, sample + ): + with torch.no_grad(): + xs = sample['net_input']['src_tokens'] + spkemb = sample['net_input']['spkembs'] + return models[0].inference(xs, spkemb) + + def inference_s2s( + self, models, sample, force_equal_length=False + ): + with torch.no_grad(): + x = sample['net_input']['src_tokens'] + xlen = sample['net_input']['src_lengths'] + spkemb = sample['net_input']['spkembs'] + prev_output_tokens = sample['net_input']['prev_output_tokens'] + padding_mask = sample['net_input']['padding_mask'] + tgt_lengths = sample['net_input']['tgt_lengths'] + return models[0].inference_s2s(x, xlen, spkemb, prev_output_tokens, tgt_lengths, force_equal_length=force_equal_length, padding_mask=padding_mask) + + def inference_s2c( + self, models, sample + ): + with torch.no_grad(): + x = sample['net_input']['src_tokens'] + xlen = sample['net_input']['src_lengths'] + prev_output_tokens = sample['net_input']['prev_output_tokens'] + padding_mask = sample['net_input']['padding_mask'] + assert prev_output_tokens.size(1) == 1, prev_output_tokens.size() 
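+            # Speaker classification (s2c) primes the decoder with a single start token and
+            # predicts one class label, so each prefix is expected to be exactly one token long.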
+ return models[0].inference_s2c(x, xlen, prev_output_tokens, padding_mask=padding_mask) + + def filter_indices_by_size( + self, indices, dataset, max_positions=None, ignore_invalid_inputs=False + ): + """ + Filter examples that are too large + + Args: + indices (np.array): original array of sample indices + dataset (~fairseq.data.FairseqDataset): dataset to batch + max_positions (optional): max sentence length supported by the + model (default: None). + ignore_invalid_inputs (bool, optional): don't raise Exception for + sentences that are too long (default: False). + Returns: + np.array: array of filtered sample indices + """ + + indices, ignored = dataset.filter_indices_by_size( + indices, + self.max_pos + ) + return indices diff --git a/SpeechT5/speecht5_framework.png b/SpeechT5/speecht5_framework.png new file mode 100644 index 0000000000000000000000000000000000000000..1f86964fcb2a51c24b0a778365cc3733484ed470 Binary files /dev/null and b/SpeechT5/speecht5_framework.png differ diff --git a/SpeechT5/spm.py b/SpeechT5/spm.py new file mode 100644 index 0000000000000000000000000000000000000000..e3b82aafd44e466b0cb387084217cf93bfec1dc1 --- /dev/null +++ b/SpeechT5/spm.py @@ -0,0 +1,13 @@ +import sentencepiece as spm + +# 加载模型 +sp = spm.SentencePieceProcessor() +sp.Load('/public/home/changhl/dataset/spm_char.model') # 加载之前训练好的模型 + +# 编码文本 +encoded_text = sp.EncodeAsPieces("Hello, world!") +print(encoded_text) # 输出编码后的结果 + +# 解码文本 +decoded_text = sp.DecodePieces(encoded_text) +print(decoded_text) # 输出解码后的原始文本 \ No newline at end of file diff --git a/SpeechUT/README.md b/SpeechUT/README.md new file mode 100644 index 0000000000000000000000000000000000000000..e74967ae6c5f9b0c859a71c0643faeeb6a83bcd1 --- /dev/null +++ b/SpeechUT/README.md @@ -0,0 +1,203 @@ +# SpeechUT +<!--**Pre-trained models for speech related tasks**--> + + [**SpeechUT: Bridging Speech and Text with Hidden-Unit for Encoder-Decoder Based Speech-Text Pre-training**](https://arxiv.org/abs/2210.03730) + + +- (Done) Oct 2022: release the code and models +- Oct 2022: release preprint in [arXiv](https://arxiv.org/abs/2210.03730) + +## Pre-Trained and Fine-tuned Models +| Model | Pre-training Dataset (unlabeled) | Fine-tuning Dataset (labeled) | Model | +| :------: | :----------------------------------------------: | :-----------------: | :-----: | +| SpeechUT Base (ASR) | [960 hrs LibriSpeech](http://www.openslr.org/12) + [40M Text](http://www.openslr.org/11) | - | [Azure Storage]| +| SpeechUT Base (ASR) | [960 hrs LibriSpeech](http://www.openslr.org/12) + [40M Text](http://www.openslr.org/11) | [100 hrs LibriSpeech](http://www.openslr.org/12) | [Azure Storage]| +| SpeechUT Large (ASR) | [60k hrs LibriSpeech](http://www.openslr.org/12) + [40M Text](http://www.openslr.org/11) | - | [Azure Storage]| +| SpeechUT Large (ASR) | [60k hrs LibriSpeech](http://www.openslr.org/12) + [40M Text](http://www.openslr.org/11) | [960 hrs LibriSpeech](http://www.openslr.org/12) | [Azure Storage]| +| SpeechUT Base (En-De) | [960 hrs LibriSpeech](http://www.openslr.org/12) + [408 hrs MuST-C v1](https://ict.fbk.eu/must-c/) + [4.6M Text](https://www.statmt.org/wmt16/) | - | [Azure Storage]| +| SpeechUT Base (En-De) | [960 hrs LibriSpeech](http://www.openslr.org/12) + [408 hrs MuST-C v1](https://ict.fbk.eu/must-c/) + [4.6M Text](https://www.statmt.org/wmt16/) | [En-De MuST-C v1](https://ict.fbk.eu/must-c/) | [Azure Storage]| +| SpeechUT Base (En-Es) | [960 hrs LibriSpeech](http://www.openslr.org/12) + [504 hrs MuST-C v1](https://ict.fbk.eu/must-c/) + [15M 
Text](https://www.statmt.org/wmt13/) | - | [Azure Storage]| +| SpeechUT Base (En-Es) | [960 hrs LibriSpeech](http://www.openslr.org/12) + [504 hrs MuST-C v1](https://ict.fbk.eu/must-c/) + [15M Text](https://www.statmt.org/wmt13/) | [En-Es MuST-C v1](https://ict.fbk.eu/must-c/) | [Azure Storage]| +| SpeechUT Base (En-Fr) | [960 hrs LibriSpeech](http://www.openslr.org/12) + [492 hrs MuST-C v1](https://ict.fbk.eu/must-c/) + [40M Text](https://www.statmt.org/wmt14/) | - | [Azure Storage]| +| SpeechUT Base (En-Fr) | [960 hrs LibriSpeech](http://www.openslr.org/12) + [492 hrs MuST-C v1](https://ict.fbk.eu/must-c/) + [40M Text](https://www.statmt.org/wmt14/) | [En-Fr MuST-C v1](https://ict.fbk.eu/must-c/) | [Azure Storage]| + + +## Language Model +See [here](https://github.com/microsoft/SpeechT5/tree/main/Speech2C#language-model-and-vocabulary). + + +## Setup + +```bash +git submodule update --init SpeechUT/fairseq +cd SpeechUT/ +pip install --editable fairseq/ +pip install sacrebleu==1.5.1 +``` + + +## ASR on LibriSpeech +### Data preparation +Please follow the steps of wav2vec 2.0 manifest [here](https://github.com/pytorch/fairseq/tree/main/examples/wav2vec#prepare-training-data-manifest) to prepare `train.tsv` and `train.ltr`. You should make sure the vocabulary [`dict.ltr.txt`](dataset/LibriSpeech/dict.ltr.txt) is the same as that used for the pre-trained model. Put yout prepared data into `$data_dir`. + +### Fine-tune a hybrid CTC-ED model +- Fine-tune the base model on 100h subset + ```bash + # Usage: speechut/scripts/tune_speechut_asr/finetune_base_edctc.sh <model_path> <data_dir> <cpt_tag> [mount=$PWD] [world_size=8] [update_freq=2] + model_path=path/to/your/pre-trained/model + data_dir=dataset/LibriSpeech/asr + bash speechut/scripts/tune_speechut_asr/finetune_base_edctc.sh $model_path $data_dir 'tag400k' + ``` + +- Fine-tune the large model on 960h subset + ```bash + # Usage: speechut/scripts/tune_speechut_asr/finetune960h_large_edctc.sh <model_path> <data_dir> <cpt_tag> [mount=$PWD] [world_size=8] [update_freq=3] + model_path=path/to/your/pre-trained/model + data_dir=dataset/LibriSpeech/asr + bash speechut/scripts/tune_speechut_asr/finetune960h_large_edctc.sh $model_path $data_dir 'tag400k' + ``` + +### Decode +- CTC-ED joint decoding + ```bash + # Usage: speechut/scripts/tune_speechut_asr/inference_edctc.sh <model_path> <data_dir> [gen-set=dev_other] [beam_size=10] [ctc_weight=0.2] [--normalize] + model_path=path/to/your/fine-tuned/model + data_dir=dataset/LibriSpeech/asr + # for base model + bash speechut/scripts/tune_speechut_asr/inference_edctc.sh $model_path $data_dir test_clean 10 0.2 + # for large model, you should set --normalize at the end + bash speechut/scripts/tune_speechut_asr/inference_edctc.sh $model_path $data_dir test_clean 10 0.2 --normalize + ``` + > We use the [espnet](https://github.com/espnet/espnet)-style joint decoding algorithm, currently only supporting batch_size=1. If you find it too slow, please check [`inference_nj.sh`](speechut/scripts/tune_speechut_asr/inference_nj.sh) for a multi-thread version. 
+ +- CTC-ED joint decoding with LM + ```bash + # Usage: speechut/scripts/tune_speechut_asr/inference_edctclm.sh <model_path> <data_dir> [gen-set=dev_other] [beam_size=30] [ctc_weight=0.3] [lm_weight=0.7] [lm_path] [--normalize] + model_path=path/to/your/fine-tuned/model + data_dir=dataset/LibriSpeech/asr + lm_path=path/to/char_lm/model + # for base model + bash speechut/scripts/tune_speechut_asr/inference_edctclm.sh $model_path $data_dir test_clean 30 0.3 0.7 $lm_path + # for large model, you should set --normalize at the end + bash speechut/scripts/tune_speechut_asr/inference_edctclm.sh $model_path $data_dir test_clean 30 0.3 0.7 $lm_path --normalize + ``` + + > We currently only support batch_size=1. If you find it too slow, please check [`inference_lm_nj.sh`](speechut/scripts/tune_speechut_asr/inference_lm_nj.sh) for a multi-thread version. + + > The released language model uses a different vocaburary [`dict.txt`](dataset/LibriSpeech/dict.txt), put it into `$data_dir` and the script will access it. + + +## ST on MuST-C +### Data preparation + +ST models are fine-tuned with [fairseq speech-to-text](https://github.com/facebookresearch/fairseq/tree/main/examples/speech_to_text) task, so just follow the data preparation instructions [here](https://github.com/facebookresearch/fairseq/tree/main/examples/speech_to_text#data-preparation). +To fine-tune our released models, you should use the same sentecepiece models and dictionaries as ours: + +- En-De: [sentencepiece_model](dataset/MuSTC/en_de/spm_unigram10000.model), [dict](dataset/MuSTC/en_de/dict.spm.txt) +- En-Es: [sentencepiece_model](dataset/MuSTC/en_es/spm_unigram10000.model), [dict](dataset/MuSTC/en_es/dict.spm.txt) +- En-Fr: [sentencepiece_model](dataset/MuSTC/en_fr/spm_unigram10000.model), [dict](dataset/MuSTC/en_fr/dict.spm.txt) + +We provided examples in [`dataset`](dataset/MuSTC). + +### Fine-tune an encoder-decoder model + +```bash +# Usage: speechut/scripts/tune_speechut_st/finetune_base_mustc_enxx.sh <model_path> <data_dir> <lang> <cpt-tag> [mount=$PWD] [world_size=8] [update_freq=4/6] +model_path=path/to/your/pre-trained/model +data_dir=dataset/MuSTC/en-${lang} +bash speechut/scripts/tune_speechut_st/finetune_base_mustc_enxx.sh $model_path $data_dir ${lang} tag400k +``` +Please check the script [`finetune_base_mustc_enxx.sh`](speechut/scripts/tune_speechut_st/finetune_base_mustc_enxx.sh) for detailed configuration. + +### Decode +You might average several model checkpoints with the best dev accuracy to stablize the performance, +```bash +python fairseq/scripts/average_checkpoints.py --inputs $model_dir/checkpoint.best_acc*.pt --output $model_dir/checkpoint.avgnbest.pt +``` +Then decode the model with beam search, +```bash +# Usage: speechut/scripts/tune_speechut_st/inference_st.sh <model_path> <data_dir> <lang> [gen-set=dev] [beam_size=10] [lenpen=1.0] +model_path=path/to/your/fine-tuned/model +data_dir=dataset/MuSTC/en-${lang} +bash speechut/scripts/tune_speechut_st/inference_st.sh $model_path $data_dir ${lang} tst-COMMON +``` + + + + +## Pre-train for ASR + +### Data preparation +The model is pre-trained by speech-to-unit, unit-to-text and mask-unit-lm tasks. +1. For speech-to-unit task, please follow the steps of data preparation for HuBERT [here](https://github.com/facebookresearch/fairseq/tree/main/examples/hubert#data-preparation). +2. For unit-to-text task, follow the steps below: + - Generate units from unpaired text by [T2U Generator](#T2U-Generator). 
+ - Pair the generated units and text data, convert them to binary files. +3. For mask-unit-lm task, combine the units generated from step1 and step2 together. + +You should use [`dict.ltr.txt`](dataset/LibriSpeech/dict.ltr.txt) when preparing the text data, make sure the dictionary is the same as that used for fine-tuning. + +### Pre-train base model + +```bash +# Usage: speechut/scripts/pretrain_speechut/base_speechut_for_asr.sh <data_dir> <text_data_dir> [mount=$PWD] [world_size=32] [update_freq=1] +data_dir= +text_data_dir= +bash speechut/scripts/pretrain_speechut/base_speechut_for_asr.sh $data_dir $text_data_dir +``` + +## Pre-train for ST + +### Data preparation +The model is pre-trained by speech-to-unit, unit-to-text and mask-unit-lm tasks. +1. For speech-to-unit task, please follow the steps of data preparation for HuBERT [here](https://github.com/facebookresearch/fairseq/tree/main/examples/hubert#data-preparation). +2. For unit-to-text task, we use bilingual text where the source side (i.e. English) is used to generate unit and the target side serves as the output. Follow the steps below: + - Normalize the source (English) text by removing punctuation, converting capital letters. + - Generate units from the source (English) text by [T2U Generator](#T2U-Generator). + - Pair the generated units and text data, convert them to binary files. +3. For mask-unit-lm task, combine the units generated from step1 and step2 together. +You should use the same sentencepiece models and dictionaries as that used for [fine-tuning](#ST-on-MuST-C). + + +### Pre-train base model + +```bash +# Usage: speechut/scripts/pretrain_speechut/base_speechut_for_st.sh <data_dir> <text_data_dir> <lang> [mount=$PWD] [world_size=32] [update_freq=1] +data_dir= +text_data_dir= +bash speechut/scripts/pretrain_speechut/base_speechut_for_st.sh $data_dir $text_data_dir ${lang} +``` + + +## T2U Generator +The original paper trains an encoder-decoder model to generate reduced units from text, which is time consuming due to the autoregressive generation. +We recently update the T2U generator to a non-autoregressive model, which generates non-reduced units (can be easily post-processed to reduced units). Please follow the usage provided by [Hidden-unit Tokenizer for Text](https://github.com/microsoft/SpeechT5/tree/main/SpeechLM#hidden-unit-tokenizer-for-text) (they used the same HuBERT units as this work). + + +## License + +This project is licensed under the license found in the LICENSE file in the root directory of this source tree. +Portions of the source code are based on the [FAIRSEQ](https://github.com/pytorch/fairseq). + +[Microsoft Open Source Code of Conduct](https://opensource.microsoft.com/codeofconduct) + +## Reference + +If you find our work is useful in your research, please cite the following paper: + +```bibtex +@article{zhang2022speechut, + title = {SpeechUT: Bridging Speech and Text with Hidden-Unit for Encoder-Decoder Based Speech-Text Pre-training}, + author = {Zhang, Ziqiang and Zhou, Long and Ao, Junyi and Liu, Shujie and Dai, Lirong and Li, Jinyu and Wei, Furu}, + eprint={2210.03730}, + archivePrefix={arXiv}, + primaryClass={cs.CL}, + year={2022} +} +``` + +### Contact Information + +For help or issues using SpeechUT models, please submit a GitHub issue. + +For other communications related to SpeechUT, please contact Long Zhou (`lozhou@microsoft.com`). 
\ No newline at end of file diff --git a/SpeechUT/dataset/LibriSpeech/dict.km.txt b/SpeechUT/dataset/LibriSpeech/dict.km.txt new file mode 100644 index 0000000000000000000000000000000000000000..bbfe59e554d6234f3631d8d09d9281c2160f4675 --- /dev/null +++ b/SpeechUT/dataset/LibriSpeech/dict.km.txt @@ -0,0 +1,500 @@ +0 0 +1 1 +2 2 +3 3 +4 4 +5 5 +6 6 +7 7 +8 8 +9 9 +10 10 +11 11 +12 12 +13 13 +14 14 +15 15 +16 16 +17 17 +18 18 +19 19 +20 20 +21 21 +22 22 +23 23 +24 24 +25 25 +26 26 +27 27 +28 28 +29 29 +30 30 +31 31 +32 32 +33 33 +34 34 +35 35 +36 36 +37 37 +38 38 +39 39 +40 40 +41 41 +42 42 +43 43 +44 44 +45 45 +46 46 +47 47 +48 48 +49 49 +50 50 +51 51 +52 52 +53 53 +54 54 +55 55 +56 56 +57 57 +58 58 +59 59 +60 60 +61 61 +62 62 +63 63 +64 64 +65 65 +66 66 +67 67 +68 68 +69 69 +70 70 +71 71 +72 72 +73 73 +74 74 +75 75 +76 76 +77 77 +78 78 +79 79 +80 80 +81 81 +82 82 +83 83 +84 84 +85 85 +86 86 +87 87 +88 88 +89 89 +90 90 +91 91 +92 92 +93 93 +94 94 +95 95 +96 96 +97 97 +98 98 +99 99 +100 100 +101 101 +102 102 +103 103 +104 104 +105 105 +106 106 +107 107 +108 108 +109 109 +110 110 +111 111 +112 112 +113 113 +114 114 +115 115 +116 116 +117 117 +118 118 +119 119 +120 120 +121 121 +122 122 +123 123 +124 124 +125 125 +126 126 +127 127 +128 128 +129 129 +130 130 +131 131 +132 132 +133 133 +134 134 +135 135 +136 136 +137 137 +138 138 +139 139 +140 140 +141 141 +142 142 +143 143 +144 144 +145 145 +146 146 +147 147 +148 148 +149 149 +150 150 +151 151 +152 152 +153 153 +154 154 +155 155 +156 156 +157 157 +158 158 +159 159 +160 160 +161 161 +162 162 +163 163 +164 164 +165 165 +166 166 +167 167 +168 168 +169 169 +170 170 +171 171 +172 172 +173 173 +174 174 +175 175 +176 176 +177 177 +178 178 +179 179 +180 180 +181 181 +182 182 +183 183 +184 184 +185 185 +186 186 +187 187 +188 188 +189 189 +190 190 +191 191 +192 192 +193 193 +194 194 +195 195 +196 196 +197 197 +198 198 +199 199 +200 200 +201 201 +202 202 +203 203 +204 204 +205 205 +206 206 +207 207 +208 208 +209 209 +210 210 +211 211 +212 212 +213 213 +214 214 +215 215 +216 216 +217 217 +218 218 +219 219 +220 220 +221 221 +222 222 +223 223 +224 224 +225 225 +226 226 +227 227 +228 228 +229 229 +230 230 +231 231 +232 232 +233 233 +234 234 +235 235 +236 236 +237 237 +238 238 +239 239 +240 240 +241 241 +242 242 +243 243 +244 244 +245 245 +246 246 +247 247 +248 248 +249 249 +250 250 +251 251 +252 252 +253 253 +254 254 +255 255 +256 256 +257 257 +258 258 +259 259 +260 260 +261 261 +262 262 +263 263 +264 264 +265 265 +266 266 +267 267 +268 268 +269 269 +270 270 +271 271 +272 272 +273 273 +274 274 +275 275 +276 276 +277 277 +278 278 +279 279 +280 280 +281 281 +282 282 +283 283 +284 284 +285 285 +286 286 +287 287 +288 288 +289 289 +290 290 +291 291 +292 292 +293 293 +294 294 +295 295 +296 296 +297 297 +298 298 +299 299 +300 300 +301 301 +302 302 +303 303 +304 304 +305 305 +306 306 +307 307 +308 308 +309 309 +310 310 +311 311 +312 312 +313 313 +314 314 +315 315 +316 316 +317 317 +318 318 +319 319 +320 320 +321 321 +322 322 +323 323 +324 324 +325 325 +326 326 +327 327 +328 328 +329 329 +330 330 +331 331 +332 332 +333 333 +334 334 +335 335 +336 336 +337 337 +338 338 +339 339 +340 340 +341 341 +342 342 +343 343 +344 344 +345 345 +346 346 +347 347 +348 348 +349 349 +350 350 +351 351 +352 352 +353 353 +354 354 +355 355 +356 356 +357 357 +358 358 +359 359 +360 360 +361 361 +362 362 +363 363 +364 364 +365 365 +366 366 +367 367 +368 368 +369 369 +370 370 +371 371 +372 372 +373 373 +374 374 +375 375 +376 376 +377 377 +378 378 +379 379 +380 380 +381 381 +382 382 +383 383 +384 
384 +385 385 +386 386 +387 387 +388 388 +389 389 +390 390 +391 391 +392 392 +393 393 +394 394 +395 395 +396 396 +397 397 +398 398 +399 399 +400 400 +401 401 +402 402 +403 403 +404 404 +405 405 +406 406 +407 407 +408 408 +409 409 +410 410 +411 411 +412 412 +413 413 +414 414 +415 415 +416 416 +417 417 +418 418 +419 419 +420 420 +421 421 +422 422 +423 423 +424 424 +425 425 +426 426 +427 427 +428 428 +429 429 +430 430 +431 431 +432 432 +433 433 +434 434 +435 435 +436 436 +437 437 +438 438 +439 439 +440 440 +441 441 +442 442 +443 443 +444 444 +445 445 +446 446 +447 447 +448 448 +449 449 +450 450 +451 451 +452 452 +453 453 +454 454 +455 455 +456 456 +457 457 +458 458 +459 459 +460 460 +461 461 +462 462 +463 463 +464 464 +465 465 +466 466 +467 467 +468 468 +469 469 +470 470 +471 471 +472 472 +473 473 +474 474 +475 475 +476 476 +477 477 +478 478 +479 479 +480 480 +481 481 +482 482 +483 483 +484 484 +485 485 +486 486 +487 487 +488 488 +489 489 +490 490 +491 491 +492 492 +493 493 +494 494 +495 495 +496 496 +497 497 +498 498 +499 499 diff --git a/SpeechUT/dataset/LibriSpeech/dict.kmu.txt b/SpeechUT/dataset/LibriSpeech/dict.kmu.txt new file mode 100644 index 0000000000000000000000000000000000000000..bbfe59e554d6234f3631d8d09d9281c2160f4675 --- /dev/null +++ b/SpeechUT/dataset/LibriSpeech/dict.kmu.txt @@ -0,0 +1,500 @@ +0 0 +1 1 +2 2 +3 3 +4 4 +5 5 +6 6 +7 7 +8 8 +9 9 +10 10 +11 11 +12 12 +13 13 +14 14 +15 15 +16 16 +17 17 +18 18 +19 19 +20 20 +21 21 +22 22 +23 23 +24 24 +25 25 +26 26 +27 27 +28 28 +29 29 +30 30 +31 31 +32 32 +33 33 +34 34 +35 35 +36 36 +37 37 +38 38 +39 39 +40 40 +41 41 +42 42 +43 43 +44 44 +45 45 +46 46 +47 47 +48 48 +49 49 +50 50 +51 51 +52 52 +53 53 +54 54 +55 55 +56 56 +57 57 +58 58 +59 59 +60 60 +61 61 +62 62 +63 63 +64 64 +65 65 +66 66 +67 67 +68 68 +69 69 +70 70 +71 71 +72 72 +73 73 +74 74 +75 75 +76 76 +77 77 +78 78 +79 79 +80 80 +81 81 +82 82 +83 83 +84 84 +85 85 +86 86 +87 87 +88 88 +89 89 +90 90 +91 91 +92 92 +93 93 +94 94 +95 95 +96 96 +97 97 +98 98 +99 99 +100 100 +101 101 +102 102 +103 103 +104 104 +105 105 +106 106 +107 107 +108 108 +109 109 +110 110 +111 111 +112 112 +113 113 +114 114 +115 115 +116 116 +117 117 +118 118 +119 119 +120 120 +121 121 +122 122 +123 123 +124 124 +125 125 +126 126 +127 127 +128 128 +129 129 +130 130 +131 131 +132 132 +133 133 +134 134 +135 135 +136 136 +137 137 +138 138 +139 139 +140 140 +141 141 +142 142 +143 143 +144 144 +145 145 +146 146 +147 147 +148 148 +149 149 +150 150 +151 151 +152 152 +153 153 +154 154 +155 155 +156 156 +157 157 +158 158 +159 159 +160 160 +161 161 +162 162 +163 163 +164 164 +165 165 +166 166 +167 167 +168 168 +169 169 +170 170 +171 171 +172 172 +173 173 +174 174 +175 175 +176 176 +177 177 +178 178 +179 179 +180 180 +181 181 +182 182 +183 183 +184 184 +185 185 +186 186 +187 187 +188 188 +189 189 +190 190 +191 191 +192 192 +193 193 +194 194 +195 195 +196 196 +197 197 +198 198 +199 199 +200 200 +201 201 +202 202 +203 203 +204 204 +205 205 +206 206 +207 207 +208 208 +209 209 +210 210 +211 211 +212 212 +213 213 +214 214 +215 215 +216 216 +217 217 +218 218 +219 219 +220 220 +221 221 +222 222 +223 223 +224 224 +225 225 +226 226 +227 227 +228 228 +229 229 +230 230 +231 231 +232 232 +233 233 +234 234 +235 235 +236 236 +237 237 +238 238 +239 239 +240 240 +241 241 +242 242 +243 243 +244 244 +245 245 +246 246 +247 247 +248 248 +249 249 +250 250 +251 251 +252 252 +253 253 +254 254 +255 255 +256 256 +257 257 +258 258 +259 259 +260 260 +261 261 +262 262 +263 263 +264 264 +265 265 +266 266 +267 267 +268 268 +269 269 +270 270 +271 
271 +272 272 +273 273 +274 274 +275 275 +276 276 +277 277 +278 278 +279 279 +280 280 +281 281 +282 282 +283 283 +284 284 +285 285 +286 286 +287 287 +288 288 +289 289 +290 290 +291 291 +292 292 +293 293 +294 294 +295 295 +296 296 +297 297 +298 298 +299 299 +300 300 +301 301 +302 302 +303 303 +304 304 +305 305 +306 306 +307 307 +308 308 +309 309 +310 310 +311 311 +312 312 +313 313 +314 314 +315 315 +316 316 +317 317 +318 318 +319 319 +320 320 +321 321 +322 322 +323 323 +324 324 +325 325 +326 326 +327 327 +328 328 +329 329 +330 330 +331 331 +332 332 +333 333 +334 334 +335 335 +336 336 +337 337 +338 338 +339 339 +340 340 +341 341 +342 342 +343 343 +344 344 +345 345 +346 346 +347 347 +348 348 +349 349 +350 350 +351 351 +352 352 +353 353 +354 354 +355 355 +356 356 +357 357 +358 358 +359 359 +360 360 +361 361 +362 362 +363 363 +364 364 +365 365 +366 366 +367 367 +368 368 +369 369 +370 370 +371 371 +372 372 +373 373 +374 374 +375 375 +376 376 +377 377 +378 378 +379 379 +380 380 +381 381 +382 382 +383 383 +384 384 +385 385 +386 386 +387 387 +388 388 +389 389 +390 390 +391 391 +392 392 +393 393 +394 394 +395 395 +396 396 +397 397 +398 398 +399 399 +400 400 +401 401 +402 402 +403 403 +404 404 +405 405 +406 406 +407 407 +408 408 +409 409 +410 410 +411 411 +412 412 +413 413 +414 414 +415 415 +416 416 +417 417 +418 418 +419 419 +420 420 +421 421 +422 422 +423 423 +424 424 +425 425 +426 426 +427 427 +428 428 +429 429 +430 430 +431 431 +432 432 +433 433 +434 434 +435 435 +436 436 +437 437 +438 438 +439 439 +440 440 +441 441 +442 442 +443 443 +444 444 +445 445 +446 446 +447 447 +448 448 +449 449 +450 450 +451 451 +452 452 +453 453 +454 454 +455 455 +456 456 +457 457 +458 458 +459 459 +460 460 +461 461 +462 462 +463 463 +464 464 +465 465 +466 466 +467 467 +468 468 +469 469 +470 470 +471 471 +472 472 +473 473 +474 474 +475 475 +476 476 +477 477 +478 478 +479 479 +480 480 +481 481 +482 482 +483 483 +484 484 +485 485 +486 486 +487 487 +488 488 +489 489 +490 490 +491 491 +492 492 +493 493 +494 494 +495 495 +496 496 +497 497 +498 498 +499 499 diff --git a/SpeechUT/dataset/LibriSpeech/dict.ltr.txt b/SpeechUT/dataset/LibriSpeech/dict.ltr.txt new file mode 100644 index 0000000000000000000000000000000000000000..26a7e6ba309998c3868db7ecab5d7afa52a68e52 --- /dev/null +++ b/SpeechUT/dataset/LibriSpeech/dict.ltr.txt @@ -0,0 +1,29 @@ +| 803288730 +E 439294199 +T 319071758 +A 277306732 +O 263784364 +N 239361162 +I 237353011 +H 223346762 +S 220175453 +R 203352500 +D 152198685 +L 141597450 +U 98913389 +M 87138757 +C 84680142 +W 81375101 +F 80240665 +G 70642902 +Y 68388038 +P 58436929 +B 52538531 +V 33250231 +K 26906609 +' 9162896 +X 5075632 +J 4746771 +Q 3401794 +Z 2186971 +<mask> 1 diff --git a/SpeechUT/dataset/LibriSpeech/dict.txt b/SpeechUT/dataset/LibriSpeech/dict.txt new file mode 100644 index 0000000000000000000000000000000000000000..69929e1666c8182148d83ef4332e4c677bb90e5a --- /dev/null +++ b/SpeechUT/dataset/LibriSpeech/dict.txt @@ -0,0 +1,28 @@ +| 94802 +E 51860 +T 38431 +A 33152 +O 31495 +N 28855 +I 28794 +H 27187 +S 26071 +R 23546 +D 18289 +L 16308 +U 12400 +M 10685 +W 10317 +C 9844 +F 9062 +G 8924 +Y 8226 +P 6890 +B 6339 +V 3936 +K 3456 +' 1023 +X 636 +J 598 +Q 437 +Z 213 diff --git a/SpeechUT/dataset/MuSTC/dict.km.txt b/SpeechUT/dataset/MuSTC/dict.km.txt new file mode 100644 index 0000000000000000000000000000000000000000..bbfe59e554d6234f3631d8d09d9281c2160f4675 --- /dev/null +++ b/SpeechUT/dataset/MuSTC/dict.km.txt @@ -0,0 +1,500 @@ +0 0 +1 1 +2 2 +3 3 +4 4 +5 5 +6 6 +7 7 +8 8 +9 9 +10 10 +11 11 +12 12 +13 13 
+14 14 +15 15 +16 16 +17 17 +18 18 +19 19 +20 20 +21 21 +22 22 +23 23 +24 24 +25 25 +26 26 +27 27 +28 28 +29 29 +30 30 +31 31 +32 32 +33 33 +34 34 +35 35 +36 36 +37 37 +38 38 +39 39 +40 40 +41 41 +42 42 +43 43 +44 44 +45 45 +46 46 +47 47 +48 48 +49 49 +50 50 +51 51 +52 52 +53 53 +54 54 +55 55 +56 56 +57 57 +58 58 +59 59 +60 60 +61 61 +62 62 +63 63 +64 64 +65 65 +66 66 +67 67 +68 68 +69 69 +70 70 +71 71 +72 72 +73 73 +74 74 +75 75 +76 76 +77 77 +78 78 +79 79 +80 80 +81 81 +82 82 +83 83 +84 84 +85 85 +86 86 +87 87 +88 88 +89 89 +90 90 +91 91 +92 92 +93 93 +94 94 +95 95 +96 96 +97 97 +98 98 +99 99 +100 100 +101 101 +102 102 +103 103 +104 104 +105 105 +106 106 +107 107 +108 108 +109 109 +110 110 +111 111 +112 112 +113 113 +114 114 +115 115 +116 116 +117 117 +118 118 +119 119 +120 120 +121 121 +122 122 +123 123 +124 124 +125 125 +126 126 +127 127 +128 128 +129 129 +130 130 +131 131 +132 132 +133 133 +134 134 +135 135 +136 136 +137 137 +138 138 +139 139 +140 140 +141 141 +142 142 +143 143 +144 144 +145 145 +146 146 +147 147 +148 148 +149 149 +150 150 +151 151 +152 152 +153 153 +154 154 +155 155 +156 156 +157 157 +158 158 +159 159 +160 160 +161 161 +162 162 +163 163 +164 164 +165 165 +166 166 +167 167 +168 168 +169 169 +170 170 +171 171 +172 172 +173 173 +174 174 +175 175 +176 176 +177 177 +178 178 +179 179 +180 180 +181 181 +182 182 +183 183 +184 184 +185 185 +186 186 +187 187 +188 188 +189 189 +190 190 +191 191 +192 192 +193 193 +194 194 +195 195 +196 196 +197 197 +198 198 +199 199 +200 200 +201 201 +202 202 +203 203 +204 204 +205 205 +206 206 +207 207 +208 208 +209 209 +210 210 +211 211 +212 212 +213 213 +214 214 +215 215 +216 216 +217 217 +218 218 +219 219 +220 220 +221 221 +222 222 +223 223 +224 224 +225 225 +226 226 +227 227 +228 228 +229 229 +230 230 +231 231 +232 232 +233 233 +234 234 +235 235 +236 236 +237 237 +238 238 +239 239 +240 240 +241 241 +242 242 +243 243 +244 244 +245 245 +246 246 +247 247 +248 248 +249 249 +250 250 +251 251 +252 252 +253 253 +254 254 +255 255 +256 256 +257 257 +258 258 +259 259 +260 260 +261 261 +262 262 +263 263 +264 264 +265 265 +266 266 +267 267 +268 268 +269 269 +270 270 +271 271 +272 272 +273 273 +274 274 +275 275 +276 276 +277 277 +278 278 +279 279 +280 280 +281 281 +282 282 +283 283 +284 284 +285 285 +286 286 +287 287 +288 288 +289 289 +290 290 +291 291 +292 292 +293 293 +294 294 +295 295 +296 296 +297 297 +298 298 +299 299 +300 300 +301 301 +302 302 +303 303 +304 304 +305 305 +306 306 +307 307 +308 308 +309 309 +310 310 +311 311 +312 312 +313 313 +314 314 +315 315 +316 316 +317 317 +318 318 +319 319 +320 320 +321 321 +322 322 +323 323 +324 324 +325 325 +326 326 +327 327 +328 328 +329 329 +330 330 +331 331 +332 332 +333 333 +334 334 +335 335 +336 336 +337 337 +338 338 +339 339 +340 340 +341 341 +342 342 +343 343 +344 344 +345 345 +346 346 +347 347 +348 348 +349 349 +350 350 +351 351 +352 352 +353 353 +354 354 +355 355 +356 356 +357 357 +358 358 +359 359 +360 360 +361 361 +362 362 +363 363 +364 364 +365 365 +366 366 +367 367 +368 368 +369 369 +370 370 +371 371 +372 372 +373 373 +374 374 +375 375 +376 376 +377 377 +378 378 +379 379 +380 380 +381 381 +382 382 +383 383 +384 384 +385 385 +386 386 +387 387 +388 388 +389 389 +390 390 +391 391 +392 392 +393 393 +394 394 +395 395 +396 396 +397 397 +398 398 +399 399 +400 400 +401 401 +402 402 +403 403 +404 404 +405 405 +406 406 +407 407 +408 408 +409 409 +410 410 +411 411 +412 412 +413 413 +414 414 +415 415 +416 416 +417 417 +418 418 +419 419 +420 420 +421 421 +422 422 +423 423 +424 424 +425 425 +426 426 +427 427 
+428 428 +429 429 +430 430 +431 431 +432 432 +433 433 +434 434 +435 435 +436 436 +437 437 +438 438 +439 439 +440 440 +441 441 +442 442 +443 443 +444 444 +445 445 +446 446 +447 447 +448 448 +449 449 +450 450 +451 451 +452 452 +453 453 +454 454 +455 455 +456 456 +457 457 +458 458 +459 459 +460 460 +461 461 +462 462 +463 463 +464 464 +465 465 +466 466 +467 467 +468 468 +469 469 +470 470 +471 471 +472 472 +473 473 +474 474 +475 475 +476 476 +477 477 +478 478 +479 479 +480 480 +481 481 +482 482 +483 483 +484 484 +485 485 +486 486 +487 487 +488 488 +489 489 +490 490 +491 491 +492 492 +493 493 +494 494 +495 495 +496 496 +497 497 +498 498 +499 499 diff --git a/SpeechUT/dataset/MuSTC/dict.kmu.txt b/SpeechUT/dataset/MuSTC/dict.kmu.txt new file mode 100644 index 0000000000000000000000000000000000000000..bbfe59e554d6234f3631d8d09d9281c2160f4675 --- /dev/null +++ b/SpeechUT/dataset/MuSTC/dict.kmu.txt @@ -0,0 +1,500 @@ +0 0 +1 1 +2 2 +3 3 +4 4 +5 5 +6 6 +7 7 +8 8 +9 9 +10 10 +11 11 +12 12 +13 13 +14 14 +15 15 +16 16 +17 17 +18 18 +19 19 +20 20 +21 21 +22 22 +23 23 +24 24 +25 25 +26 26 +27 27 +28 28 +29 29 +30 30 +31 31 +32 32 +33 33 +34 34 +35 35 +36 36 +37 37 +38 38 +39 39 +40 40 +41 41 +42 42 +43 43 +44 44 +45 45 +46 46 +47 47 +48 48 +49 49 +50 50 +51 51 +52 52 +53 53 +54 54 +55 55 +56 56 +57 57 +58 58 +59 59 +60 60 +61 61 +62 62 +63 63 +64 64 +65 65 +66 66 +67 67 +68 68 +69 69 +70 70 +71 71 +72 72 +73 73 +74 74 +75 75 +76 76 +77 77 +78 78 +79 79 +80 80 +81 81 +82 82 +83 83 +84 84 +85 85 +86 86 +87 87 +88 88 +89 89 +90 90 +91 91 +92 92 +93 93 +94 94 +95 95 +96 96 +97 97 +98 98 +99 99 +100 100 +101 101 +102 102 +103 103 +104 104 +105 105 +106 106 +107 107 +108 108 +109 109 +110 110 +111 111 +112 112 +113 113 +114 114 +115 115 +116 116 +117 117 +118 118 +119 119 +120 120 +121 121 +122 122 +123 123 +124 124 +125 125 +126 126 +127 127 +128 128 +129 129 +130 130 +131 131 +132 132 +133 133 +134 134 +135 135 +136 136 +137 137 +138 138 +139 139 +140 140 +141 141 +142 142 +143 143 +144 144 +145 145 +146 146 +147 147 +148 148 +149 149 +150 150 +151 151 +152 152 +153 153 +154 154 +155 155 +156 156 +157 157 +158 158 +159 159 +160 160 +161 161 +162 162 +163 163 +164 164 +165 165 +166 166 +167 167 +168 168 +169 169 +170 170 +171 171 +172 172 +173 173 +174 174 +175 175 +176 176 +177 177 +178 178 +179 179 +180 180 +181 181 +182 182 +183 183 +184 184 +185 185 +186 186 +187 187 +188 188 +189 189 +190 190 +191 191 +192 192 +193 193 +194 194 +195 195 +196 196 +197 197 +198 198 +199 199 +200 200 +201 201 +202 202 +203 203 +204 204 +205 205 +206 206 +207 207 +208 208 +209 209 +210 210 +211 211 +212 212 +213 213 +214 214 +215 215 +216 216 +217 217 +218 218 +219 219 +220 220 +221 221 +222 222 +223 223 +224 224 +225 225 +226 226 +227 227 +228 228 +229 229 +230 230 +231 231 +232 232 +233 233 +234 234 +235 235 +236 236 +237 237 +238 238 +239 239 +240 240 +241 241 +242 242 +243 243 +244 244 +245 245 +246 246 +247 247 +248 248 +249 249 +250 250 +251 251 +252 252 +253 253 +254 254 +255 255 +256 256 +257 257 +258 258 +259 259 +260 260 +261 261 +262 262 +263 263 +264 264 +265 265 +266 266 +267 267 +268 268 +269 269 +270 270 +271 271 +272 272 +273 273 +274 274 +275 275 +276 276 +277 277 +278 278 +279 279 +280 280 +281 281 +282 282 +283 283 +284 284 +285 285 +286 286 +287 287 +288 288 +289 289 +290 290 +291 291 +292 292 +293 293 +294 294 +295 295 +296 296 +297 297 +298 298 +299 299 +300 300 +301 301 +302 302 +303 303 +304 304 +305 305 +306 306 +307 307 +308 308 +309 309 +310 310 +311 311 +312 312 +313 313 +314 314 +315 315 +316 316 
+317 317 +318 318 +319 319 +320 320 +321 321 +322 322 +323 323 +324 324 +325 325 +326 326 +327 327 +328 328 +329 329 +330 330 +331 331 +332 332 +333 333 +334 334 +335 335 +336 336 +337 337 +338 338 +339 339 +340 340 +341 341 +342 342 +343 343 +344 344 +345 345 +346 346 +347 347 +348 348 +349 349 +350 350 +351 351 +352 352 +353 353 +354 354 +355 355 +356 356 +357 357 +358 358 +359 359 +360 360 +361 361 +362 362 +363 363 +364 364 +365 365 +366 366 +367 367 +368 368 +369 369 +370 370 +371 371 +372 372 +373 373 +374 374 +375 375 +376 376 +377 377 +378 378 +379 379 +380 380 +381 381 +382 382 +383 383 +384 384 +385 385 +386 386 +387 387 +388 388 +389 389 +390 390 +391 391 +392 392 +393 393 +394 394 +395 395 +396 396 +397 397 +398 398 +399 399 +400 400 +401 401 +402 402 +403 403 +404 404 +405 405 +406 406 +407 407 +408 408 +409 409 +410 410 +411 411 +412 412 +413 413 +414 414 +415 415 +416 416 +417 417 +418 418 +419 419 +420 420 +421 421 +422 422 +423 423 +424 424 +425 425 +426 426 +427 427 +428 428 +429 429 +430 430 +431 431 +432 432 +433 433 +434 434 +435 435 +436 436 +437 437 +438 438 +439 439 +440 440 +441 441 +442 442 +443 443 +444 444 +445 445 +446 446 +447 447 +448 448 +449 449 +450 450 +451 451 +452 452 +453 453 +454 454 +455 455 +456 456 +457 457 +458 458 +459 459 +460 460 +461 461 +462 462 +463 463 +464 464 +465 465 +466 466 +467 467 +468 468 +469 469 +470 470 +471 471 +472 472 +473 473 +474 474 +475 475 +476 476 +477 477 +478 478 +479 479 +480 480 +481 481 +482 482 +483 483 +484 484 +485 485 +486 486 +487 487 +488 488 +489 489 +490 490 +491 491 +492 492 +493 493 +494 494 +495 495 +496 496 +497 497 +498 498 +499 499 diff --git a/SpeechUT/dataset/MuSTC/en_de/config.yaml b/SpeechUT/dataset/MuSTC/en_de/config.yaml new file mode 100644 index 0000000000000000000000000000000000000000..dce5f63011a8c33a4d12eec569fdcc91ea299f68 --- /dev/null +++ b/SpeechUT/dataset/MuSTC/en_de/config.yaml @@ -0,0 +1,3 @@ +vocab_filename: dict.spm.txt +src_vocab_filename: dict.kmu.txt + diff --git a/SpeechUT/dataset/MuSTC/en_de/config_ende.yaml b/SpeechUT/dataset/MuSTC/en_de/config_ende.yaml new file mode 100644 index 0000000000000000000000000000000000000000..dd080a05500211cade57d80056c8ce311ce4c0c2 --- /dev/null +++ b/SpeechUT/dataset/MuSTC/en_de/config_ende.yaml @@ -0,0 +1,14 @@ +bpe_tokenizer: + bpe: sentencepiece + sentencepiece_model: spm_unigram10000.model + +sampling_alpha: 1.0 +shuffle: false +use_audio_input: true +use_sample_rate: 16000 + +vocab_filename: dict.spm.txt + +# required by speech_to_text task but never used +input_channels: 1 +input_feat_per_channel: 1 diff --git a/SpeechUT/dataset/MuSTC/en_de/dict.kmu.txt b/SpeechUT/dataset/MuSTC/en_de/dict.kmu.txt new file mode 100644 index 0000000000000000000000000000000000000000..bbfe59e554d6234f3631d8d09d9281c2160f4675 --- /dev/null +++ b/SpeechUT/dataset/MuSTC/en_de/dict.kmu.txt @@ -0,0 +1,500 @@ +0 0 +1 1 +2 2 +3 3 +4 4 +5 5 +6 6 +7 7 +8 8 +9 9 +10 10 +11 11 +12 12 +13 13 +14 14 +15 15 +16 16 +17 17 +18 18 +19 19 +20 20 +21 21 +22 22 +23 23 +24 24 +25 25 +26 26 +27 27 +28 28 +29 29 +30 30 +31 31 +32 32 +33 33 +34 34 +35 35 +36 36 +37 37 +38 38 +39 39 +40 40 +41 41 +42 42 +43 43 +44 44 +45 45 +46 46 +47 47 +48 48 +49 49 +50 50 +51 51 +52 52 +53 53 +54 54 +55 55 +56 56 +57 57 +58 58 +59 59 +60 60 +61 61 +62 62 +63 63 +64 64 +65 65 +66 66 +67 67 +68 68 +69 69 +70 70 +71 71 +72 72 +73 73 +74 74 +75 75 +76 76 +77 77 +78 78 +79 79 +80 80 +81 81 +82 82 +83 83 +84 84 +85 85 +86 86 +87 87 +88 88 +89 89 +90 90 +91 91 +92 92 +93 93 +94 94 +95 95 +96 96 +97 97 +98 
98 +99 99 +100 100 +101 101 +102 102 +103 103 +104 104 +105 105 +106 106 +107 107 +108 108 +109 109 +110 110 +111 111 +112 112 +113 113 +114 114 +115 115 +116 116 +117 117 +118 118 +119 119 +120 120 +121 121 +122 122 +123 123 +124 124 +125 125 +126 126 +127 127 +128 128 +129 129 +130 130 +131 131 +132 132 +133 133 +134 134 +135 135 +136 136 +137 137 +138 138 +139 139 +140 140 +141 141 +142 142 +143 143 +144 144 +145 145 +146 146 +147 147 +148 148 +149 149 +150 150 +151 151 +152 152 +153 153 +154 154 +155 155 +156 156 +157 157 +158 158 +159 159 +160 160 +161 161 +162 162 +163 163 +164 164 +165 165 +166 166 +167 167 +168 168 +169 169 +170 170 +171 171 +172 172 +173 173 +174 174 +175 175 +176 176 +177 177 +178 178 +179 179 +180 180 +181 181 +182 182 +183 183 +184 184 +185 185 +186 186 +187 187 +188 188 +189 189 +190 190 +191 191 +192 192 +193 193 +194 194 +195 195 +196 196 +197 197 +198 198 +199 199 +200 200 +201 201 +202 202 +203 203 +204 204 +205 205 +206 206 +207 207 +208 208 +209 209 +210 210 +211 211 +212 212 +213 213 +214 214 +215 215 +216 216 +217 217 +218 218 +219 219 +220 220 +221 221 +222 222 +223 223 +224 224 +225 225 +226 226 +227 227 +228 228 +229 229 +230 230 +231 231 +232 232 +233 233 +234 234 +235 235 +236 236 +237 237 +238 238 +239 239 +240 240 +241 241 +242 242 +243 243 +244 244 +245 245 +246 246 +247 247 +248 248 +249 249 +250 250 +251 251 +252 252 +253 253 +254 254 +255 255 +256 256 +257 257 +258 258 +259 259 +260 260 +261 261 +262 262 +263 263 +264 264 +265 265 +266 266 +267 267 +268 268 +269 269 +270 270 +271 271 +272 272 +273 273 +274 274 +275 275 +276 276 +277 277 +278 278 +279 279 +280 280 +281 281 +282 282 +283 283 +284 284 +285 285 +286 286 +287 287 +288 288 +289 289 +290 290 +291 291 +292 292 +293 293 +294 294 +295 295 +296 296 +297 297 +298 298 +299 299 +300 300 +301 301 +302 302 +303 303 +304 304 +305 305 +306 306 +307 307 +308 308 +309 309 +310 310 +311 311 +312 312 +313 313 +314 314 +315 315 +316 316 +317 317 +318 318 +319 319 +320 320 +321 321 +322 322 +323 323 +324 324 +325 325 +326 326 +327 327 +328 328 +329 329 +330 330 +331 331 +332 332 +333 333 +334 334 +335 335 +336 336 +337 337 +338 338 +339 339 +340 340 +341 341 +342 342 +343 343 +344 344 +345 345 +346 346 +347 347 +348 348 +349 349 +350 350 +351 351 +352 352 +353 353 +354 354 +355 355 +356 356 +357 357 +358 358 +359 359 +360 360 +361 361 +362 362 +363 363 +364 364 +365 365 +366 366 +367 367 +368 368 +369 369 +370 370 +371 371 +372 372 +373 373 +374 374 +375 375 +376 376 +377 377 +378 378 +379 379 +380 380 +381 381 +382 382 +383 383 +384 384 +385 385 +386 386 +387 387 +388 388 +389 389 +390 390 +391 391 +392 392 +393 393 +394 394 +395 395 +396 396 +397 397 +398 398 +399 399 +400 400 +401 401 +402 402 +403 403 +404 404 +405 405 +406 406 +407 407 +408 408 +409 409 +410 410 +411 411 +412 412 +413 413 +414 414 +415 415 +416 416 +417 417 +418 418 +419 419 +420 420 +421 421 +422 422 +423 423 +424 424 +425 425 +426 426 +427 427 +428 428 +429 429 +430 430 +431 431 +432 432 +433 433 +434 434 +435 435 +436 436 +437 437 +438 438 +439 439 +440 440 +441 441 +442 442 +443 443 +444 444 +445 445 +446 446 +447 447 +448 448 +449 449 +450 450 +451 451 +452 452 +453 453 +454 454 +455 455 +456 456 +457 457 +458 458 +459 459 +460 460 +461 461 +462 462 +463 463 +464 464 +465 465 +466 466 +467 467 +468 468 +469 469 +470 470 +471 471 +472 472 +473 473 +474 474 +475 475 +476 476 +477 477 +478 478 +479 479 +480 480 +481 481 +482 482 +483 483 +484 484 +485 485 +486 486 +487 487 +488 488 +489 489 +490 490 +491 491 +492 492 +493 
493 +494 494 +495 495 +496 496 +497 497 +498 498 +499 499 diff --git a/SpeechUT/dataset/MuSTC/en_de/dict.spm.txt b/SpeechUT/dataset/MuSTC/en_de/dict.spm.txt new file mode 100644 index 0000000000000000000000000000000000000000..6f45c562c35023a09b76baa5bbbb38243ef0654c --- /dev/null +++ b/SpeechUT/dataset/MuSTC/en_de/dict.spm.txt @@ -0,0 +1,9997 @@ +, 1 +. 1 +▁die 1 +▁der 1 +en 1 +▁und 1 +s 1 +e 1 +▁ 1 +n 1 +▁in 1 +▁zu 1 +▁den 1 +▁von 1 +▁ist 1 +- 1 +▁das 1 +▁für 1 +er 1 +▁auf 1 +▁mit 1 +▁ein 1 +▁eine 1 +▁des 1 +▁nicht 1 +▁Sie 1 +▁wir 1 +▁dass 1 +▁es 1 +t 1 +▁im 1 +▁werden 1 +▁sich 1 +r 1 +▁dem 1 +▁ich 1 +▁an 1 +▁Die 1 +▁sind 1 +▁auch 1 +▁sie 1 +▁über 1 +▁um 1 +▁wird 1 +▁als 1 +: 1 +▁haben 1 +▁( 1 +▁aus 1 +▁wie 1 +es 1 +▁oder 1 +▁Ich 1 +) 1 +▁hat 1 +▁einer 1 +▁Das 1 +▁- 1 +d 1 +▁bei 1 +▁einen 1 +▁können 1 +m 1 +▁zur 1 +▁diese 1 +▁vor 1 +▁Wir 1 +▁er 1 +▁uns 1 +▁so 1 +▁nach 1 +▁einem 1 +ung 1 +▁Es 1 +▁durch 1 +▁" 1 +▁nur 1 +▁kann 1 +▁ver 1 +▁be 1 +? 1 +▁zum 1 +▁wenn 1 +▁In 1 +▁dieser 1 +y 1 +▁man 1 +▁aber 1 +▁war 1 +▁noch 1 +▁sein 1 +▁Der 1 +▁Kommission 1 +▁was 1 +st 1 +te 1 +▁wurde 1 +▁sehr 1 +k 1 +▁daß 1 +! 1 +▁müssen 1 +▁ge 1 +▁Herr 1 +▁S 1 +▁Und 1 +▁– 1 +▁mehr 1 +ten 1 +a 1 +in 1 +▁alle 1 +o 1 +▁diesem 1 +▁unter 1 +▁am 1 +be 1 +▁Ein 1 +" 1 +z 1 +de 1 +▁hier 1 +▁Union 1 +▁B 1 +▁möchte 1 +ge 1 +den 1 +▁ihre 1 +▁gibt 1 +▁Er 1 +▁E 1 +▁Europäischen 1 +f 1 +▁A 1 +al 1 +▁bis 1 +b 1 +▁Ver 1 +S 1 +l 1 +g 1 +▁dieses 1 +B 1 +an 1 +/ 1 +▁An 1 +▁Menschen 1 +▁unsere 1 +▁Be 1 +▁Bericht 1 +▁keine 1 +▁dann 1 +w 1 +▁muss 1 +▁eines 1 +▁Wenn 1 +▁K 1 +▁vom 1 +▁Präsident 1 +▁Zeit 1 +▁ab 1 +h 1 +▁Ihnen 1 +ver 1 +▁Welt 1 +▁zwischen 1 +▁EU 1 +▁mich 1 +▁habe 1 +▁immer 1 +▁Parlament 1 +le 1 +ing 1 +▁anderen 1 +T 1 +ch 1 +▁sowie 1 +▁da 1 +▁C 1 +C 1 +▁F 1 +i 1 +re 1 +der 1 +u 1 +c 1 +▁Europa 1 +; 1 +et 1 +▁Im 1 +▁damit 1 +D 1 +▁diesen 1 +ur 1 +▁Aus 1 +▁Hotel 1 +▁G 1 +P 1 +ieren 1 +sch 1 +▁zwei 1 +▁Mitgliedstaaten 1 +A 1 +M 1 +▁Diese 1 +F 1 +▁„ 1 +▁W 1 +▁de 1 +▁gegen 1 +ar 1 +▁Ab 1 +em 1 +ter 1 +▁Frau 1 +▁wurden 1 +▁M 1 +▁sondern 1 +▁Vor 1 +K 1 +lich 1 +sten 1 +▁andere 1 +▁neue 1 +▁unserer 1 +ä 1 +▁Jahr 1 +ern 1 +▁Entwicklung 1 +▁Aber 1 +la 1 +▁weil 1 +▁T 1 +▁heute 1 +▁Auf 1 +▁mir 1 +it 1 +▁seine 1 +▁wo 1 +▁machen 1 +▁Ihre 1 +ischen 1 +▁selbst 1 +▁viele 1 +▁Re 1 +▁Europäische 1 +▁un 1 +▁Jahren 1 +▁P 1 +▁etwas 1 +▁meine 1 +▁D 1 +▁ganz 1 +▁Mit 1 +▁viel 1 +▁Rat 1 +▁also 1 +▁Ge 1 +▁sollte 1 +▁Frage 1 +▁wieder 1 +“ 1 +▁jedoch 1 +zu 1 +ein 1 +iert 1 +us 1 +▁ohne 1 +li 1 +▁ihrer 1 +ig 1 +▁jetzt 1 +W 1 +E 1 +▁bereits 1 +▁sollten 1 +▁bin 1 +ö 1 +▁Sch 1 +▁würde 1 +▁L 1 +▁ihr 1 +el 1 +at 1 +ungen 1 +▁Land 1 +▁ob 1 +▁Arbeit 1 +▁geht 1 +G 1 +▁So 1 +is 1 +ut 1 +▁dazu 1 +▁neuen 1 +▁europäischen 1 +or 1 +▁sehen 1 +▁O 1 +▁3 1 +▁einige 1 +um 1 +me 1 +ab 1 +N 1 +▁H 1 +2 1 +▁a 1 +▁Da 1 +▁sagen 1 +▁re 1 +▁finden 1 +p 1 +se 1 +▁I 1 +▁einfach 1 +▁wirklich 1 +▁Wie 1 +▁schon 1 +▁weiter 1 +▁Jahre 1 +igen 1 +▁darauf 1 +ra 1 +▁19 1 +▁tun 1 +il 1 +ta 1 +on 1 +▁Art 1 +' 1 +▁Recht 1 +H 1 +▁dafür 1 +ische 1 +▁1 1 +▁Unternehmen 1 +gen 1 +O 1 +I 1 +▁Teil 1 +▁the 1 +ma 1 +▁wollen 1 +▁Ziel 1 +ne 1 +▁Problem 1 +▁gut 1 +▁2 1 +▁allen 1 +ste 1 +▁Was 1 +▁R 1 +ri 1 +▁Um 1 +▁waren 1 +ed 1 +▁seiner 1 +é 1 +▁liegt 1 +▁Eine 1 +▁denn 1 +▁Maßnahmen 1 +▁pro 1 +ie 1 +R 1 +ir 1 +▁Bürger 1 +▁ihren 1 +ation 1 +▁nun 1 +▁möglich 1 +▁Bereich 1 +▁Nach 1 +▁La 1 +▁Le 1 +▁drei 1 +▁du 1 +▁Daten 1 +▁hatte 1 +▁k 1 +ü 1 +▁Zu 1 +▁zusammen 1 +▁während 1 +▁f 1 +▁Lage 1 +bar 1 +aus 1 +▁wissen 1 +▁bietet 1 +▁Informationen 1 +▁zurück 1 +liche 1 +to 1 +▁Grund 1 +U 1 +ag 1 +▁allem 1 +L 1 
+▁Herrn 1 +ft 1 +schaft 1 +ro 1 +▁seit 1 +▁Dies 1 +▁doch 1 +▁N 1 +ierung 1 +▁— 1 +ia 1 +00 1 +▁Länder 1 +▁wäre 1 +▁unseren 1 +▁wichtig 1 +▁natürlich 1 +▁Z 1 +▁geben 1 +ol 1 +▁Leben 1 +▁Man 1 +▁St 1 +1 1 +▁ersten 1 +lichen 1 +isch 1 +▁d 1 +▁Bei 1 +chen 1 +▁könnte 1 +▁Rahmen 1 +▁Stadt 1 +ik 1 +un 1 +v 1 +). 1 +▁Fall 1 +▁einmal 1 +▁Euro 1 +▁ent 1 +▁Ma 1 +und 1 +▁denen 1 +x 1 +▁Für 1 +▁welche 1 +▁Un 1 +▁davon 1 +▁lassen 1 +▁Kinder 1 +▁Vorschlag 1 +▁Se 1 +3 1 +mal 1 +▁Zimmer 1 +kt 1 +... 1 +▁darüber 1 +▁Energie 1 +▁4 1 +▁Weg 1 +▁V 1 +▁ihnen 1 +▁Beispiel 1 +▁alles 1 +▁letzten 1 +▁hin 1 +▁soll 1 +ner 1 +▁of 1 +▁10 1 +ka 1 +▁b 1 +▁Parlaments 1 +▁Regierung 1 +▁insbesondere 1 +▁Unterstützung 1 +▁meiner 1 +▁dabei 1 +▁and 1 +▁erhalten 1 +▁kein 1 +ck 1 +▁kommen 1 +ige 1 +▁große 1 +▁dort 1 +▁Sicherheit 1 +ul 1 +zeit 1 +▁Zusammenarbeit 1 +▁genau 1 +▁Hier 1 +▁Ländern 1 +▁stehen 1 +▁Als 1 +th 1 +▁beim 1 +im 1 +▁Namen 1 +V 1 +▁Frauen 1 +as 1 +▁Fragen 1 +stand 1 +▁besteht 1 +▁Politik 1 +id 1 +heit 1 +▁Mittel 1 +land 1 +▁5 1 +os 1 +am 1 +▁System 1 +▁sch 1 +▁Zukunft 1 +▁% 1 +▁großen 1 +▁steht 1 +ad 1 +ze 1 +▁m 1 +▁Weise 1 +her 1 +▁Bedeutung 1 +▁direkt 1 +▁Ende 1 +▁Markt 1 +▁Ihr 1 +▁Seite 1 +▁unser 1 +▁Probleme 1 +au 1 +▁Kollegen 1 +▁worden 1 +▁stellen 1 +▁... 1 +▁De 1 +▁Möglichkeit 1 +▁Kunden 1 +lo 1 +▁Thema 1 +▁aller 1 +▁erste 1 +▁all 1 +▁Kommissar 1 +▁Tag 1 +da 1 +▁Ha 1 +ungs 1 +ha 1 +▁Geld 1 +▁Sp 1 +▁sicher 1 +▁Mal 1 +▁eigenen 1 +▁Ro 1 +ic 1 +▁Unter 1 +na 1 +auf 1 +▁muß 1 +▁Pro 1 +▁to 1 +▁Meinung 1 +▁klar 1 +▁Rolle 1 +▁Wasser 1 +▁Schutz 1 +▁Umwelt 1 +ien 1 +▁USA 1 +ent 1 +ler 1 +ben 1 +▁gegenüber 1 +▁politischen 1 +▁Programm 1 +▁seinen 1 +j 1 +▁sogar 1 +▁Staaten 1 +▁z 1 +▁ihn 1 +▁daher 1 +▁daran 1 +politik 1 +go 1 +do 1 +ster 1 +ach 1 +▁Über 1 +▁la 1 +▁erreichen 1 +ti 1 +bau 1 +enden 1 +▁Vi 1 +men 1 +▁ja 1 +ber 1 +gehen 1 +▁will 1 +end 1 +▁gerade 1 +he 1 +▁Verfügung 1 +▁besonders 1 +▁Ort 1 +▁gemacht 1 +▁jeder 1 +▁20 1 +gel 1 +▁unterstützen 1 +▁gehen 1 +▁etwa 1 +▁kommt 1 +▁weitere 1 +▁fest 1 +▁Millionen 1 +▁Al 1 +▁Situation 1 +▁Internet 1 +▁bekannt 1 +cht 1 +ca 1 +▁zwar 1 +▁Haus 1 +▁Dinge 1 +▁vielleicht 1 +▁macht 1 +▁Finanz 1 +▁weit 1 +mp 1 +▁glaube 1 +ce 1 +▁weniger 1 +▁europäische 1 +▁Auch 1 +▁verschiedenen 1 +▁sowohl 1 +hn 1 +ke 1 +är 1 +▁innerhalb 1 +▁Punkt 1 +▁Bo 1 +▁bringen 1 +▁Gesellschaft 1 +▁Am 1 +▁führen 1 +▁sagte 1 +ende 1 +min 1 +ger 1 +▁Lösung 1 +▁Alle 1 +halt 1 +ion 1 +co 1 +▁Spiel 1 +▁vielen 1 +ist 1 +▁Hilfe 1 +di 1 +▁meisten 1 +▁entfernt 1 +hin 1 +), 1 +▁hinaus 1 +vor 1 +▁gesagt 1 +▁Gemeinschaft 1 +ff 1 +▁Seiten 1 +▁werde 1 +sa 1 +▁Form 1 +▁mein 1 +che 1 +▁Ihrer 1 +ß 1 +▁politische 1 +▁besser 1 +ment 1 +igkeit 1 +▁voll 1 +▁Geschichte 1 +▁Richtlinie 1 +▁schnell 1 +▁bieten 1 +▁Li 1 +▁indem 1 +▁stellt 1 +▁c 1 +▁Wirtschaft 1 +▁nehmen 1 +▁würden 1 +▁internationalen 1 +▁weiß 1 +▁U 1 +▁Haupt 1 +▁Projekt 1 +so 1 +▁Artikel 1 +ll 1 +▁unserem 1 +▁beiden 1 +io 1 +▁wirtschaftliche 1 +EN 1 +ell 1 +▁Leute 1 +teil 1 +▁Kon 1 +▁Zusammenhang 1 +ex 1 +▁hatten 1 +▁deutlich 1 +▁bedeutet 1 +▁6 1 +ko 1 +▁befindet 1 +iz 1 +▁vier 1 +▁Preis 1 +ür 1 +▁18 1 +▁brauchen 1 +▁tatsächlich 1 +▁möchten 1 +▁15 1 +werk 1 +spiel 1 +▁Bild 1 +vi 1 +recht 1 +▁schaffen 1 +▁J 1 +sp 1 +fe 1 +ph 1 +zi 1 +op 1 +om 1 +▁nämlich 1 +▁solche 1 +▁sei 1 +ität 1 +Z 1 +▁Natur 1 +raum 1 +▁ihm 1 +▁China 1 +ier 1 +▁Kultur 1 +▁Fa 1 +ho 1 +ub 1 +▁gute 1 +▁Co 1 +▁Mar 1 +▁Mo 1 +▁wenig 1 +▁Rechts 1 +pro 1 +ine 1 +▁denke 1 +▁Deshalb 1 +6 1 +▁bereit 1 +▁darin 1 +com 1 +ken 1 +▁Neu 1 +art 1 +▁Bar 1 +▁Dank 1 +gang 1 +▁Ja 1 +iv 1 +form 1 +▁Uhr 1 +▁Do 1 +▁gab 1 
+▁bitte 1 +▁Durch 1 +▁' 1 +▁Entschließung 1 +▁Ka 1 +▁le 1 +▁könnten 1 +▁wichtige 1 +▁Ebene 1 +isten 1 +▁Du 1 +▁w 1 +▁Ihren 1 +ve 1 +▁Personen 1 +▁He 1 +ale 1 +wand 1 +▁Wahl 1 +▁Hand 1 +▁Kosten 1 +▁zeigt 1 +▁Wirtschafts 1 +gabe 1 +▁leben 1 +▁jeden 1 +and 1 +▁Ar 1 +▁Industrie 1 +▁unterstützt 1 +▁Verbraucher 1 +▁Entscheidung 1 +▁sprechen 1 +ton 1 +▁Ca 1 +▁kleine 1 +ga 1 +▁Dieses 1 +▁gleich 1 +halten 1 +wa 1 +▁Präsidentin 1 +ismus 1 +▁notwendig 1 +▁lange 1 +weise 1 +▁Haushalts 1 +▁erst 1 +los 1 +man 1 +▁Zugang 1 +▁Ansicht 1 +▁Lebens 1 +▁Bezug 1 +▁Pa 1 +▁arbeiten 1 +hi 1 +▁Region 1 +bereich 1 +▁keinen 1 +000 1 +▁7 1 +▁neu 1 +▁Fraktion 1 +▁Raum 1 +▁Jahres 1 +haus 1 +▁Dienstleistung 1 +tel 1 +▁ihrem 1 +▁dessen 1 +▁Prozess 1 +▁Rück 1 +tag 1 +mm 1 +sicht 1 +▁Also 1 +age 1 +ren 1 +▁nutzen 1 +▁Rates 1 +▁seinem 1 +geben 1 +gt 1 +kommen 1 +▁The 1 +▁heraus 1 +▁Arbeits 1 +stellen 1 +▁hoch 1 +▁Produkte 1 +▁st 1 +▁ausge 1 +% 1 +keit 1 +▁Abstimmung 1 +▁nächsten 1 +mo 1 +▁unterschiedlich 1 +▁Unsere 1 +▁Demokratie 1 +▁Erfolg 1 +▁30 1 +▁einigen 1 +▁gehört 1 +▁Dieser 1 +pa 1 +▁handelt 1 +lä 1 +va 1 +▁gilt 1 +▁entwickelt 1 +ly 1 +0 1 +▁konnte 1 +▁schwer 1 +▁Ent 1 +Berichterstatter 1 +tt 1 +tritt 1 +nt 1 +▁Lo 1 +▁darf 1 +▁Sa 1 +▁Anwendung 1 +son 1 +▁öffentlichen 1 +▁Verfahren 1 +▁12 1 +▁Aussprache 1 +lu 1 +▁besten 1 +ick 1 +▁sollen 1 +▁erreicht 1 +▁Rechte 1 +▁fünf 1 +▁verwendet 1 +ert 1 +▁Service 1 +pi 1 +log 1 +▁Geschäfts 1 +▁Ho 1 +▁Menschenrechte 1 +bo 1 +▁t 1 +▁Grundlage 1 +▁Richtung 1 +▁weltweit 1 +▁Software 1 +mittel 1 +▁Website 1 +▁Europas 1 +▁Bi 1 +▁Verbindung 1 +▁weiterhin 1 +▁ebenfalls 1 +▁Verantwortung 1 +pe 1 +uch 1 +▁hätte 1 +▁nationalen 1 +än 1 +▁Doch 1 +▁Deutschland 1 +ug 1 +no 1 +▁Debatte 1 +lei 1 +ry 1 +akt 1 +5 1 +▁Ra 1 +▁kleinen 1 +▁Me 1 +ak 1 +▁deshalb 1 +▁Wettbewerb 1 +▁dürfen 1 +▁betrifft 1 +tra 1 +▁Bau 1 +▁zeigen 1 +▁bzw 1 +lichkeit 1 +▁Qualität 1 +bel 1 +schutz 1 +▁Tatsache 1 +▁gemeinsame 1 +▁Bevölkerung 1 +▁Gebiet 1 +hr 1 +▁ca 1 +▁Na 1 +bi 1 +system 1 +▁wahr 1 +▁ange 1 +▁Sozial 1 +elt 1 +▁2009 1 +▁Krieg 1 +per 1 +▁hoffe 1 +4 1 +▁Vereinigte 1 +ok 1 +ja 1 +▁eigentlich 1 +▁meinen 1 +ive 1 +▁Reihe 1 +weg 1 +▁aufgrund 1 +ling 1 +▁Antwort 1 +▁Umsetzung 1 +▁Pi 1 +▁halten 1 +all 1 +zen 1 +▁bekommen 1 +▁abge 1 +▁fast 1 +▁Reise 1 +ierte 1 +▁oft 1 +▁8 1 +▁verstehen 1 +▁h 1 +gi 1 +ard 1 +▁ganze 1 +▁sozialen 1 +▁auszu 1 +ium 1 +reg 1 +▁Vertrag 1 +▁Interesse 1 +▁& 1 +pl 1 +rat 1 +fa 1 +ant 1 +▁meinem 1 +▁g 1 +▁kurz 1 +▁darum 1 +▁Abkommen 1 +▁Ziele 1 +Änderungsanträge 1 +▁Daher 1 +▁Wert 1 +ün 1 +▁recht 1 +▁helfen 1 +▁Produkt 1 +▁Wort 1 +nehmen 1 +▁statt 1 +▁führt 1 +tal 1 +▁Schritt 1 +▁nie 1 +▁Partner 1 +▁50 1 +▁Selbst 1 +rü 1 +▁bleiben 1 +ou 1 +ER 1 +▁dar 1 +if 1 +rie 1 +▁paar 1 +▁Bedingungen 1 +▁se 1 +▁US 1 +▁p 1 +ord 1 +ci 1 +mon 1 +igung 1 +▁später 1 +▁wohl 1 +▁Musik 1 +▁heißt 1 +ör 1 +fi 1 +▁Nicht 1 +mit 1 +lassen 1 +▁Auswirkungen 1 +ow 1 +sta 1 +sk 1 +acht 1 +kon 1 +▁Herausforderung 1 +port 1 +▁richtig 1 +tro 1 +einander 1 +sche 1 +▁Gr 1 +anz 1 +▁erforderlich 1 +St 1 +▁frei 1 +▁jede 1 +▁Ta 1 +eln 1 +▁Nun 1 +▁Druck 1 +▁Ergebnis 1 +▁Ihrem 1 +up 1 +cken 1 +haft 1 +▁di 1 +▁Text 1 +fen 1 +▁Kontrolle 1 +▁Möglichkeiten 1 +▁Mai 1 +stellung 1 +▁Ch 1 +▁Institutionen 1 +▁gemeinsam 1 +▁verschiedene 1 +▁findet 1 +ot 1 +▁per 1 +weisen 1 +gleich 1 +▁internationale 1 +▁beispielsweise 1 +lle 1 +▁Vorschläge 1 +▁allerdings 1 +▁Macht 1 +▁verbunden 1 +▁jedes 1 +nd 1 +▁jedem 1 +▁hinter 1 +▁gestellt 1 +▁je 1 +▁gemeinsamen 1 +▁Von 1 +▁Ri 1 +▁eigene 1 +lass 1 +▁for 1 +ill 1 +▁Ne 1 +▁Design 1 +▁Ko 1 +▁Strategie 1 +du 1 +In 
1 +▁gleichzeitig 1 +▁ermöglicht 1 +▁Person 1 +20 1 +schw 1 +▁Dr 1 +▁Hinblick 1 +pf 1 +▁Handels 1 +fer 1 +▁Forschung 1 +▁her 1 +mi 1 +▁Damen 1 +gu 1 +reich 1 +ng 1 +gr 1 +X 1 +verständlich 1 +▁einzige 1 +mäßig 1 +▁Version 1 +ei 1 +▁Verordnung 1 +▁Minuten 1 +sh 1 +▁Blick 1 +▁Prozent 1 +ld 1 +á 1 +▁ebenso 1 +▁gesamten 1 +isierung 1 +▁ging 1 +sion 1 +▁Buch 1 +▁Sicherheits 1 +▁Einsatz 1 +▁leicht 1 +▁Ausdruck 1 +▁Ergebnisse 1 +▁sieht 1 +arbeit 1 +▁enthalten 1 +ich 1 +▁sp 1 +▁Landes 1 +▁Funktion 1 +▁Computer 1 +weit 1 +▁9 1 +▁Familien 1 +( 1 +▁Nähe 1 +▁einzelnen 1 +▁Kunst 1 +▁warum 1 +handel 1 +tri 1 +▁Auto 1 +▁Aufgabe 1 +lage 1 +▁Meine 1 +spolitik 1 +▁Beziehungen 1 +▁gegeben 1 +▁Video 1 +▁völlig 1 +▁Freiheit 1 +Lachen 1 +▁100 1 +ellen 1 +unter 1 +▁einzu 1 +wei 1 +▁14 1 +▁Liste 1 +▁soziale 1 +gestellt 1 +▁Russland 1 +arbeiten 1 +▁Hotels 1 +” 1 +elle 1 +▁schwierig 1 +▁nichts 1 +bild 1 +▁Interessen 1 +▁liegen 1 +▁angenommen 1 +▁/ 1 +▁Erweiterung 1 +trag 1 +markt 1 +▁bevor 1 +▁Regionen 1 +▁zweite 1 +sektor 1 +nder 1 +träge 1 +▁1. 1 +▁Platz 1 +▁setzen 1 +Mail 1 +zug 1 +▁Film 1 +ate 1 +del 1 +▁Berg 1 +innen 1 +iger 1 +▁Änderungsantrag 1 +▁ganzen 1 +▁Kampf 1 +▁Po 1 +▁Gewalt 1 +▁Zum 1 +▁versuchen 1 +ip 1 +sel 1 +▁spielen 1 +▁danken 1 +▁Gruppe 1 +ere 1 +▁Verhandlungen 1 +▁schriftlich 1 +ierten 1 +▁ma 1 +▁Groß 1 +▁versucht 1 +ud 1 +ah 1 +über 1 +▁Abgeordneten 1 +▁Gelegenheit 1 +berg 1 +▁Mitarbeiter 1 +▁lang 1 +▁Technologie 1 +hm 1 +▁Herren 1 +hilfe 1 +▁16 1 +tlich 1 +▁To 1 +tu 1 +▁Gefahr 1 +Re 1 +▁2008 1 +▁Zahl 1 +▁Seit 1 +platz 1 +ktion 1 +▁Tat 1 +▁Position 1 +rit 1 +▁Kolleginnen 1 +greifen 1 +lang 1 +tik 1 +▁Kraft 1 +▁2010 1 +▁17 1 +▁Fehler 1 +▁Nutzung 1 +▁San 1 +▁Gesundheit 1 +▁Sache 1 +po 1 +▁25 1 +▁allein 1 +▁Ba 1 +▁verfügt 1 +▁Ex 1 +▁Türkei 1 +▁Mitglieder 1 +_ 1 +▁Informations 1 +igt 1 +ziel 1 +▁besondere 1 +▁Erfahrung 1 +länder 1 +tre 1 +▁Krise 1 +ang 1 +▁ex 1 +▁Außerdem 1 +▁wegen 1 +fall 1 +▁Wa 1 +▁Staats 1 +▁Kontakt 1 +▁Sport 1 +rt 1 +kel 1 +▁Förderung 1 +▁geführt 1 +setzung 1 +▁Beitrag 1 +▁Idee 1 +▁Dialog 1 +Ver 1 +▁Sta 1 +▁gerne 1 +▁entwickeln 1 +▁X 1 +▁kam 1 +Gelächter 1 +▁rund 1 +leben 1 +▁bleibt 1 +." 
1 +ac 1 +kraft 1 +icht 1 +▁Bilder 1 +▁Gra 1 +▁erfolgreich 1 +▁Wer 1 +fang 1 +▁Anfang 1 +▁mehrere 1 +▁Themen 1 +ni 1 +legen 1 +▁äußerst 1 +ru 1 +▁verbessern 1 +▁befinden 1 +pt 1 +▁neben 1 +▁Verbesserung 1 +cher 1 +▁Bereichen 1 +▁Sektor 1 +rk 1 +ator 1 +ki 1 +▁Ausschuss 1 +kosten 1 +pp 1 +▁Ausschusses 1 +▁Luft 1 +▁abzu 1 +▁mal 1 +▁denken 1 +ort 1 +▁Meer 1 +▁Modell 1 +sehen 1 +▁jemand 1 +mmer 1 +est 1 +wert 1 +▁Gebäude 1 +▁gar 1 +isieren 1 +za 1 +▁Unser 1 +▁verwenden 1 +bt 1 +▁Steuer 1 +▁Zentrum 1 +▁gestimmt 1 +▁13 1 +voll 1 +▁is 1 +dem 1 +7 1 +▁stark 1 +▁“ 1 +▁Kl 1 +nis 1 +▁u 1 +▁Jahrhundert 1 +ssen 1 +▁konnten 1 +führung 1 +▁lässt 1 +▁unabhängig 1 +tes 1 +ron 1 +▁ermöglichen 1 +▁stärker 1 +schlag 1 +satz 1 +▁Körper 1 +tet 1 +▁Regierungen 1 +die 1 +net 1 +ähr 1 +serv 1 +ina 1 +erung 1 +org 1 +▁Woche 1 +’ 1 +▁Web 1 +steuer 1 +▁Fl 1 +▁Behörden 1 +▁New 1 +▁Damit 1 +▁gleichen 1 +▁zweiten 1 +mel 1 +▁Familie 1 +▁unseres 1 +▁Konferenz 1 +▁km 1 +▁Augen 1 +sam 1 +▁Wissen 1 +▁Menge 1 +▁con 1 +ap 1 +▁obwohl 1 +setzen 1 +▁Flug 1 +▁ändern 1 +fr 1 +▁Wachstum 1 +▁Benutzer 1 +▁Tage 1 +▁Licht 1 +▁erklärt 1 +one 1 +alen 1 +▁See 1 +▁Wo 1 +stelle 1 +▁bestimmte 1 +] 1 +▁kon 1 +▁Konflikt 1 +fahrt 1 +▁Arbeitnehmer 1 +our 1 +▁deine 1 +▁Landwirtschaft 1 +folge 1 +▁Lissabon 1 +tr 1 +▁wichtigen 1 +▁wichtigsten 1 +gegeben 1 +▁Anzahl 1 +▁bestehen 1 +wo 1 +▁Lu 1 +▁El 1 +des 1 +▁Zeitpunkt 1 +▁National 1 +▁Team 1 +ika 1 +▁großer 1 +▁Fischerei 1 +▁fördern 1 +▁gesamte 1 +▁2007 1 +▁Schaffung 1 +▁Dann 1 +▁scheint 1 +▁Sicht 1 +▁hinsichtlich 1 +▁Reform 1 +nde 1 +▁wollte 1 +hal 1 +▁24 1 +▁Linie 1 +▁Wein 1 +▁Einzel 1 +▁hohe 1 +▁Standpunkt 1 +▁Je 1 +▁Frankreich 1 +▁größte 1 +ue 1 +rin 1 +arm 1 +▁Entwicklungs 1 +▁häufig 1 +▁weiteren 1 +▁öffentliche 1 +llen 1 +▁Sprache 1 +▁bisher 1 +▁Gemeinschafts 1 +ide 1 +▁Notwendigkeit 1 +▁ständig 1 +äu 1 +mb 1 +mer 1 +bl 1 +▁Ki 1 +▁Werk 1 +▁wobei 1 +last 1 +rufen 1 +▁passiert 1 +▁gesehen 1 +▁gebracht 1 +▁Binnenmarkt 1 +heim 1 +ts 1 +qui 1 +▁Gesamt 1 +▁Verwendung 1 +▁sagt 1 +zahl 1 +wirtschaft 1 +▁Angebot 1 +▁anderer 1 +si 1 +ana 1 +J 1 +▁Moment 1 +istischen 1 +vo 1 +▁Produktion 1 +eur 1 +▁nächste 1 +▁Con 1 +ran 1 +▁anzu 1 +▁Kind 1 +Be 1 +the 1 +▁Behandlung 1 +▁Ad 1 +▁[ 1 +stein 1 +zer 1 +▁Di 1 +bu 1 +▁Bildung 1 +▁Denn 1 +eu 1 +▁kennen 1 +▁Ni 1 +ign 1 +▁Dollar 1 +bei 1 +str 1 +▁Während 1 +▁sechs 1 +ischer 1 +zeug 1 +setzt 1 +feld 1 +▁Währung 1 +wer 1 +▁tragen 1 +▁Medien 1 +▁2000 1 +▁genug 1 +▁Juni 1 +▁Pla 1 +▁Mitglied 1 +▁Mann 1 +▁derzeit 1 +▁größere 1 +▁wer 1 +▁Restaurant 1 +▁Handel 1 +programm 1 +▁Tra 1 +kl 1 +]] 1 +▁hätten 1 +An 1 +fahren 1 +res 1 +▁Umgebung 1 +▁Mehrheit 1 +lau 1 +▁durchgeführt 1 +▁Vertrags 1 +Applaus 1 +▁Tri 1 +▁Frei 1 +▁Zweck 1 +▁Afrika 1 +üt 1 +30 1 +▁letzte 1 +▁Y 1 +▁Verfassung 1 +industrie 1 +▁Vertreter 1 +▁No 1 +welt 1 +jahr 1 +hl 1 +▁40 1 +▁meines 1 +▁Finanzierung 1 +ob 1 +▁Lösungen 1 +▁Bitte 1 +seite 1 +ay 1 +▁Einführung 1 +frage 1 +▁Fortschritte 1 +▁falsch 1 +je 1 +▁enthält 1 +". 
1 +▁ausgestattet 1 +▁Beschäftigung 1 +▁tief 1 +▁li 1 +▁Stunden 1 +▁anders 1 +ös 1 +▁treffen 1 +▁Her 1 +▁wirtschaftlich 1 +▁braucht 1 +▁genannt 1 +▁Ju 1 +▁Lebensmittel 1 +▁gehören 1 +▁sorgen 1 +schau 1 +blick 1 +schen 1 +▁Jo 1 +▁v 1 +stre 1 +▁geeignet 1 +▁Gu 1 +▁Darüber 1 +treib 1 +о 1 +▁eindeutig 1 +fass 1 +▁wahrscheinlich 1 +Y 1 +▁komplett 1 +▁schließlich 1 +wi 1 +wissenschaftlich 1 +mann 1 +istische 1 +▁Integration 1 +▁he 1 +▁Te 1 +netz 1 +▁geschlossen 1 +zimmer 1 +▁sa 1 +▁Innovation 1 +▁Folgen 1 +▁Fahr 1 +▁guten 1 +▁lediglich 1 +nahme 1 +▁neuer 1 +▁Entscheidungen 1 +▁praktisch 1 +▁0 1 +▁Tier 1 +▁Instrument 1 +8 1 +50 1 +▁nahe 1 +▁Milliarden 1 +▁[[ 1 +iti 1 +▁All 1 +▁getan 1 +▁glauben 1 +▁Konzept 1 +▁verfügen 1 +ven 1 +▁Änderung 1 +▁zunächst 1 +itz 1 +▁Diskussion 1 +▁erwähnt 1 +▁Zwei 1 +▁Gruppen 1 +▁we 1 +▁Stelle 1 +har 1 +schließen 1 +▁Gesundheits 1 +▁Aufmerksamkeit 1 +met 1 +hör 1 +▁Mi 1 +▁Schl 1 +▁Herzen 1 +lt 1 +▁Vertrauen 1 +führen 1 +▁absolut 1 +▁Gericht 1 +sätze 1 +▁Inhalt 1 +genommen 1 +▁eingesetzt 1 +▁Punkte 1 +▁leisten 1 +able 1 +▁Park 1 +fo 1 +ix 1 +▁Höhe 1 +sprech 1 +▁Not 1 +▁unbe 1 +▁liebe 1 +▁Grenzen 1 +▁Fach 1 +▁Projekte 1 +bringen 1 +", 1 +▁bessere 1 +▁funktioniert 1 +▁Wi 1 +▁beste 1 +▁Wochen 1 +▁Test 1 +▁Klima 1 +▁Inter 1 +schä 1 +▁Ansatz 1 +▁bestimmten 1 +▁Änderungen 1 +tisch 1 +▁Schiff 1 +od 1 +punkt 1 +preis 1 +▁Bekämpfung 1 +sicherheit 1 +▁Beitritt 1 +▁Erklärung 1 +▁Auswahl 1 +▁Präsidentschaft 1 +▁Online 1 +stra 1 +▁solchen 1 +▁Gen 1 +▁Vereinbarung 1 +▁Bank 1 +fähig 1 +▁Versuch 1 +▁bringt 1 +▁größten 1 +▁Urlaub 1 +pass 1 +▁Fisch 1 +uell 1 +▁Italien 1 +▁lo 1 +▁Schw 1 +ku 1 +wie 1 +tern 1 +▁geworden 1 +▁on 1 +ity 1 +▁zehn 1 +▁früher 1 +▁Privat 1 +tor 1 +▁effektiv 1 +▁wichtiger 1 +▁gewährleisten 1 +▁dadurch 1 +plan 1 +▁vorhanden 1 +▁Haushalt 1 +änge 1 +▁Ru 1 +▁erneut 1 +ange 1 +▁Rede 1 +uß 1 +AN 1 +▁morgen 1 +▁eher 1 +ance 1 +▁jene 1 +▁hohen 1 +▁Strand 1 +zig 1 +▁verfügbar 1 +ank 1 +▁Gäste 1 +hä 1 +▁Insel 1 +▁l 1 +▁bi 1 +▁vorgeschlagen 1 +glich 1 +ba 1 +▁Kommissarin 1 +▁Natürlich 1 +laden 1 +▁automatisch 1 +▁Investitionen 1 +▁Zusammen 1 +IT 1 +buch 1 +▁Erachtens 1 +▁offen 1 +dig 1 +▁außerdem 1 +▁somit 1 +▁Außen 1 +Sch 1 +▁Kern 1 +▁Nur 1 +legt 1 +▁Heute 1 +out 1 +▁überhaupt 1 +▁Ga 1 +▁Ratspräsident 1 +▁2006 1 +ön 1 +▁Öl 1 +bare 1 +▁Pe 1 +▁Opfer 1 +ziehen 1 +▁erkennen 1 +▁Preise 1 +sverfahren 1 +▁vergangenen 1 +isiert 1 +▁Initiative 1 +▁vollständig 1 +▁genommen 1 +▁Ti 1 +ette 1 +press 1 +▁setzt 1 +▁Werte 1 +▁Viele 1 +▁Auffassung 1 +systeme 1 +sfähigkeit 1 +▁gefunden 1 +▁Car 1 +▁zahlreiche 1 +▁begann 1 +▁Armut 1 +barkeit 1 +ite 1 +▁vorgesehen 1 +▁Aufenthalt 1 +▁Sommer 1 +bot 1 +▁Nationen 1 +det 1 +stehen 1 +▁Sinne 1 +▁Dabei 1 +▁Aktivitäten 1 +cu 1 +IS 1 +alität 1 +▁erster 1 +ind 1 +▁Regeln 1 +▁Dis 1 +▁halte 1 +▁Führung 1 +▁suchen 1 +▁lernen 1 +▁behandelt 1 +▁Ressourcen 1 +ade 1 +eb 1 +▁beginnen 1 +stieg 1 +▁hervor 1 +amm 1 +® 1 +▁folgt 1 +▁ziehen 1 +▁vorgelegt 1 +▁aufge 1 +les 1 +▁do 1 +utz 1 +▁erwarten 1 +▁Frieden 1 +ult 1 +ock 1 +▁Verkehrs 1 +▁Gefühl 1 +tischen 1 +▁Lassen 1 +kom 1 +▁beide 1 +schi 1 +▁fand 1 +cke 1 +▁Warum 1 +▁festgelegt 1 +▁vergessen 1 +unternehmen 1 +▁Öffentlichkeit 1 +▁Bio 1 +▁länger 1 +ht 1 +▁Berlin 1 +lagen 1 +fach 1 +▁Basis 1 +ect 1 +▁Nord 1 +druck 1 +AR 1 +frei 1 +tan 1 +▁aufgenommen 1 +▁erfahren 1 +ice 1 +▁Folge 1 +merk 1 +fl 1 +ino 1 +▁verbessert 1 +▁Sitzung 1 +▁Beginn 1 +▁Besuch 1 +▁Neben 1 +tin 1 +▁alten 1 +wachsen 1 +ES 1 +Ge 1 +▁Organisation 1 +▁gelegen 1 +▁Ob 1 +▁Kapital 1 +▁arbeitet 1 +▁Dienst 1 +▁diejenigen 1 +▁September 1 +▁insgesamt 1 
+▁gewesen 1 +Ma 1 +mor 1 +ruf 1 +▁Vielen 1 +ker 1 +mun 1 +▁Jetzt 1 +▁Stellen 1 +richten 1 +▁co 1 +▁geschaffen 1 +stoff 1 +stände 1 +▁Tätigkeit 1 +ris 1 +ordnung 1 +▁fort 1 +10 1 +▁Mehr 1 +▁Port 1 +▁Bad 1 +▁ziemlich 1 +eil 1 +ya 1 +▁verantwortlich 1 +ssystem 1 +▁Standard 1 +service 1 +▁ruhig 1 +▁voran 1 +▁et 1 +trä 1 +▁angesicht 1 +▁Bu 1 +▁benötigen 1 +▁einschließlich 1 +▁Vorschriften 1 +▁tra 1 +▁weder 1 +tische 1 +▁Juli 1 +▁Transparenz 1 +▁moderne 1 +▁Dezember 1 +▁Ste 1 +▁Israel 1 +▁Küche 1 +▁nimmt 1 +▁benutzt 1 +▁kannst 1 +▁Ausbildung 1 +ließ 1 +▁eng 1 +▁kleiner 1 +▁Monaten 1 +▁richtige 1 +rate 1 +partner 1 +▁Hinsicht 1 +▁Staat 1 +▁erinnern 1 +▁Programme 1 +lose 1 +▁UN 1 +enz 1 +▁Cha 1 +▁sofort 1 +▁Tagesordnung 1 +▁aktiv 1 +▁schützen 1 +▁erfolgt 1 +▁dringend 1 +▁Pri 1 +rechte 1 +gebiet 1 +▁mag 1 +▁Bestimmungen 1 +tat 1 +▁erzielt 1 +tim 1 +▁ko 1 +▁11 1 +▁Republik 1 +▁Ä 1 +▁vertreten 1 +ett 1 +▁Hause 1 +schrift 1 +ender 1 +▁verändert 1 +▁Kom 1 +▁inter 1 +▁War 1 +▁Netzwerk 1 +hof 1 +tum 1 +▁Nacht 1 +▁Forschungs 1 +▁aufzu 1 +▁Dritt 1 +▁Vergangenheit 1 +▁po 1 +▁o 1 +▁leider 1 +▁Bereiche 1 +problem 1 +CH 1 +teile 1 +▁Einige 1 +▁bewusst 1 +▁Stabilität 1 +▁beitragen 1 +▁unge 1 +cker 1 +laufen 1 +▁Bemühungen 1 +mar 1 +▁zumindest 1 +RO 1 +wirken 1 +kop 1 +▁Klein 1 +▁000 1 +▁Firma 1 +▁täglich 1 +▁Or 1 +▁Gründen 1 +Die 1 +que 1 +nnen 1 +▁Gehirn 1 +▁Unternehmens 1 +▁wenige 1 +lin 1 +SE 1 +▁Kr 1 +▁2. 1 +▁Weiter 1 +erweise 1 +hält 1 +▁bezüglich 1 +▁Untersuchung 1 +▁europäischer 1 +▁kostenlos 1 +▁fragen 1 +▁gemäß 1 +daten 1 +▁Information 1 +trieb 1 +▁zunehmend 1 +gegangen 1 +▁Kompromiss 1 +▁erwartet 1 +▁Fällen 1 +RE 1 +▁Risiko 1 +▁Kar 1 +kal 1 +▁Vorsitz 1 +lauf 1 +▁Erst 1 +▁erlaubt 1 +▁Fu 1 +▁euch 1 +Fraktion 1 +lü 1 +▁Anforderungen 1 +verkehr 1 +▁Dokument 1 +richt 1 +▁Organ 1 +▁verhindern 1 +DE 1 +ände 1 +а 1 +▁Paket 1 +▁Post 1 +▁begrüße 1 +▁folgen 1 +▁Generation 1 +dienst 1 +▁si 1 +▁wesentlich 1 +▁Aufnahme 1 +▁Wieder 1 +▁unten 1 +▁Struktur 1 +▁Aspekte 1 +pol 1 +ring 1 +licher 1 +▁Bewegung 1 +schein 1 +▁Amerika 1 +ition 1 +werte 1 +▁Solidarität 1 +▁Alter 1 +▁versch 1 +alter 1 +▁gerecht 1 +fonds 1 +kehr 1 +е 1 +▁Fi 1 +▁Indien 1 +schluss 1 +kla 1 +lan 1 +▁Januar 1 +▁gr 1 +bahn 1 +▁West 1 +wasser 1 +▁Th 1 +▁Protokoll 1 +mä 1 +Schlusselwortern 1 +▁gezeigt 1 +▁spielt 1 +ationen 1 +▁Erde 1 +reichen 1 +▁Betrieb 1 +▁Ideen 1 +▁Spanien 1 +treten 1 +▁Zur 1 +▁veröffentlicht 1 +ica 1 +▁getroffen 1 +▁März 1 +▁Bro 1 +▁Anfrage 1 +▁Zweitens 1 +▁groß 1 +we 1 +▁Terrorismus 1 +▁60 1 +äre 1 +▁Fe 1 +tar 1 +AL 1 +▁me 1 +▁Einfluss 1 +▁gleiche 1 +▁benutzen 1 +▁Gi 1 +▁Ku 1 +▁Mutter 1 +ologische 1 +con 1 +▁sah 1 +▁Annahme 1 +▁Personal 1 +app 1 +wende 1 +▁Sy 1 +▁Gewinn 1 +▁entscheiden 1 +▁Q 1 +▁nennen 1 +▁perfekt 1 +▁einge 1 +▁Auftrag 1 +fragen 1 +bildung 1 +dien 1 +▁Wissenschaft 1 +▁Straf 1 +▁Gesetz 1 +▁Partei 1 +▁wider 1 +▁Herz 1 +▁Ist 1 +▁Technik 1 +▁entsprechend 1 +▁Plan 1 +▁Erfahrungen 1 +▁2005 1 +▁bald 1 +▁benötigt 1 +ost 1 +▁unmittelbar 1 +▁schlecht 1 +oren 1 +anten 1 +▁ernst 1 +ori 1 +▁Erstens 1 +▁unver 1 +▁Schi 1 +▁begrüßen 1 +01 1 +▁Schlüssel 1 +▁Flughafen 1 +zog 1 +▁Größe 1 +aktion 1 +stellt 1 +▁normal 1 +▁Straßen 1 +▁außerhalb 1 +▁Wunsch 1 +▁Webseite 1 +▁ne 1 +▁Vergleich 1 +stück 1 +fällig 1 +▁erfüllen 1 +9 1 +misch 1 +▁Boden 1 +▁Süd 1 +IN 1 +ges 1 +isse 1 +▁Japan 1 +sie 1 +ian 1 +19 1 +▁Transport 1 +▁lösen 1 +▁endlich 1 +▁Firmen 1 +sbereich 1 +bericht 1 +▁ausreichend 1 +jo 1 +▁heutigen 1 +for 1 +wesen 1 +nom 1 +▁Oktober 1 +▁aktuellen 1 +▁Hi 1 +▁globalen 1 +▁genießen 1 +atur 1 +▁EUR 1 +▁Bru 1 +▁Fest 1 +staat 1 +▁2004 1 
+ure 1 +▁fordern 1 +▁Kindern 1 +ys 1 +▁rein 1 +▁darstellt 1 +▁Aufgaben 1 +▁Monate 1 +▁Com 1 +▁Geist 1 +▁integriert 1 +▁Hoch 1 +▁eben 1 +üsse 1 +и 1 +▁Bürgerinnen 1 +park 1 +▁Bis 1 +▁Telefon 1 +▁Irak 1 +dauer 1 +▁Fernseh 1 +▁Wohn 1 +▁Märkte 1 +ano 1 +▁Wei 1 +entwicklung 1 +▁ha 1 +lö 1 +▁gewählt 1 +▁Patienten 1 +bank 1 +amp 1 +▁Su 1 +▁genutzt 1 +▁Kopf 1 +▁Meter 1 +▁überzeugt 1 +▁Objekt 1 +▁Osten 1 +è 1 +tten 1 +▁außer 1 +▁Verhalten 1 +▁stimmen 1 +▁del 1 +▁Nr 1 +▁Reaktion 1 +▁Botschaft 1 +▁Bas 1 +▁wählen 1 +nach 1 +▁Fortschritt 1 +▁Tagen 1 +▁dachte 1 +▁Sinn 1 +▁nachdem 1 +▁breite 1 +▁Tre 1 +▁Spieler 1 +▁ihres 1 +▁kaum 1 +▁Obwohl 1 +▁Vorstellung 1 +▁r 1 +▁gering 1 +uf 1 +▁weg 1 +gra 1 +leg 1 +ari 1 +▁niemand 1 +funktion 1 +ständig 1 +▁Verwaltung 1 +▁Hoffnung 1 +projekt 1 +í 1 +▁Schwierigkeiten 1 +▁trotz 1 +▁anderes 1 +▁geändert 1 +EL 1 +▁anderem 1 +reu 1 +burg 1 +▁ausschließlich 1 +betrieb 1 +▁genannten 1 +▁Option 1 +▁neues 1 +▁hören 1 +▁no 1 +▁Männer 1 +▁oben 1 +▁überall 1 +lieb 1 +▁Ski 1 +wechsel 1 +sprozess 1 +gar 1 +▁Universität 1 +fin 1 +▁Go 1 +▁Rechnung 1 +▁Künstler 1 +ations 1 +zustellen 1 +tru 1 +▁verpflichtet 1 +▁fa 1 +▁Angelegenheit 1 +▁nationale 1 +▁i 1 +▁Gar 1 +▁zusätzliche 1 +legung 1 +▁Karte 1 +▁Frühstück 1 +▁Tradition 1 +▁Präsidenten 1 +AS 1 +▁welches 1 +sitz 1 +▁Berichts 1 +leiten 1 +▁Pal 1 +leb 1 +schläge 1 +▁umfassende 1 +▁Anti 1 +▁globale 1 +▁International 1 +▁Aktion 1 +▁Bedürfnisse 1 +▁Gegen 1 +nden 1 +▁vorher 1 +▁Parteien 1 +▁berücksichtigt 1 +mü 1 +▁Gleich 1 +▁Stimme 1 +gesetzt 1 +essen 1 +▁Arbeitsplätze 1 +fern 1 +▁entsprechenden 1 +▁fr 1 +linie 1 +▁teil 1 +▁ju 1 +▁extrem 1 +▁wären 1 +kor 1 +▁Minister 1 +lit 1 +▁Verteidigung 1 +▁verstärkt 1 +▁Schritte 1 +▁offensichtlich 1 +▁Partnerschaft 1 +▁Deutsch 1 +▁Zeichen 1 +▁Mer 1 +gend 1 +ckt 1 +▁Durchführung 1 +▁Mitgliedstaat 1 +▁echte 1 +▁Bürgern 1 +▁zentrale 1 +zel 1 +▁beschlossen 1 +ai 1 +Q 1 +ionen 1 +ace 1 +▁junge 1 +▁gesetzt 1 +▁Fahrzeug 1 +üb 1 +zeichen 1 +▁Chance 1 +▁Monat 1 +▁Praxis 1 +▁Per 1 +▁offiziell 1 +▁80 1 +seitig 1 +▁gegenwärtig 1 +▁Pu 1 +▁eingehen 1 +▁entschieden 1 +▁Pol 1 +technik 1 +▁Stil 1 +▁En 1 +▁Golf 1 +ib 1 +arch 1 +zugehen 1 +▁gelten 1 +▁organisiert 1 +▁with 1 +▁erklären 1 +ON 1 +part 1 +▁Menschenrechts 1 +▁Si 1 +> 1 +ast 1 +ären 1 +▁Server 1 +▁22 1 +ial 1 +▁Klimawandel 1 +ction 1 +kan 1 +OR 1 +▁unbedingt 1 +ppe 1 +sicherung 1 +fallen 1 +ement 1 +führer 1 +▁vi 1 +▁bilden 1 +mark 1 +▁Abgeordnete 1 +▁AG 1 +ras 1 +▁Angelegenheiten 1 +▁nötig 1 +▁klare 1 +▁Datei 1 +▁Arten 1 +▁Freund 1 +messen 1 +liste 1 +▁danke 1 +ühl 1 +▁komme 1 +▁November 1 +▁Windows 1 +▁Bewertung 1 +▁Organisationen 1 +▁früh 1 +▁eingerichtet 1 +▁bedeuten 1 +▁übernehmen 1 +lung 1 +▁lesen 1 +▁Den 1 +▁Technologien 1 +▁erhöht 1 +reisen 1 +▁Amt 1 +▁Arm 1 +▁bestimmt 1 +macht 1 +fel 1 +fuhr 1 +▁Begriff 1 +▁Atmosphäre 1 +▁usw 1 +äl 1 +▁Fuß 1 +▁End 1 +▁gefährlich 1 +▁gesprochen 1 +EG 1 +wunder 1 +▁technische 1 +▁Pf 1 +▁alte 1 +▁jährlich 1 +▁Mitteilung 1 +standard 1 +▁näher 1 +ete 1 +▁Spa 1 +▁Liebe 1 +erei 1 +▁menschliche 1 +▁Einrichtung 1 +▁Volk 1 +▁pa 1 +statt 1 +▁Wann 1 +▁Mor 1 +▁dich 1 +▁Infrastruktur 1 +ek 1 +anischen 1 +line 1 +▁Vo 1 +technologie 1 +▁Schul 1 +sicher 1 +▁Bus 1 +▁Gipfel 1 +▁na 1 +▁Allerdings 1 +bereit 1 +pri 1 +▁verloren 1 +dition 1 +kü 1 +▁Völker 1 +▁Abend 1 +▁Tro 1 +▁möglicherweise 1 +▁2003 1 +▁streng 1 +▁sämtliche 1 +▁Kirche 1 +▁einzelne 1 +geht 1 +▁positive 1 +▁Bre 1 +▁Chi 1 +▁Aspekt 1 +н 1 +san 1 +▁Tages 1 +▁konkrete 1 +▁Regel 1 +tzt 1 +▁funktionieren 1 +par 1 +▁Material 1 +▁Dem 1 +vel 1 +▁Hinweis 1 +▁Feld 1 +bri 1 +▁bitten 
1 +hören 1 +▁erstellt 1 +▁Großbritannien 1 +▁Gründe 1 +▁Off 1 +▁technischen 1 +▁Analyse 1 +zz 1 +ight 1 +▁gesch 1 +▁finanzielle 1 +sprogramm 1 +mut 1 +▁Har 1 +hren 1 +▁Gott 1 +og 1 +▁Worte 1 +vers 1 +▁Jahrzehnt 1 +▁stets 1 +struktur 1 +wahl 1 +find 1 +sprache 1 +System 1 +▁23 1 +▁tätig 1 +gruppen 1 +▁Stück 1 +schaff 1 +▁entsprechende 1 +mitteln 1 +els 1 +▁Abschluss 1 +denken 1 +erstatterin 1 +▁Paris 1 +▁folgenden 1 +reise 1 +▁beteiligt 1 +▁Anstrengungen 1 +▁you 1 +▁persönlichen 1 +tage 1 +▁Dienste 1 +lement 1 +▁privaten 1 +▁Zustimmung 1 +▁aktuelle 1 +fehl 1 +▁entgegen 1 +▁Realität 1 +zentrum 1 +▁entstehen 1 +▁sage 1 +▁Griechenland 1 +▁modernen 1 +▁GmbH 1 +svorschriften 1 +▁Vorteile 1 +tiert 1 +sort 1 +▁jeweiligen 1 +zahlung 1 +▁rasch 1 +gl 1 +▁Haltung 1 +geb 1 +▁alt 1 +ative 1 +▁Angst 1 +▁negativ 1 +▁verändern 1 +▁April 1 +▁manchmal 1 +▁Überwachung 1 +ov 1 +▁Antrag 1 +▁Mein 1 +▁Waren 1 +▁Dazu 1 +▁geleistet 1 +pel 1 +getragen 1 +ili 1 +gg 1 +ile 1 +▁Wege 1 +val 1 +äh 1 +▁Schließlich 1 +▁Gast 1 +▁Stand 1 +etz 1 +hnt 1 +▁betrachtet 1 +▁folgende 1 +▁Geb 1 +▁berücksichtigen 1 +▁bewegen 1 +echt 1 +gesellschaft 1 +▁allgemeinen 1 +gesetz 1 +fälle 1 +behörde 1 +rück 1 +▁akzeptieren 1 +vent 1 +▁90 1 +▁Iran 1 +▁Hu 1 +gänge 1 +ational 1 +ros 1 +▁Garten 1 +artig 1 +losen 1 +▁Eltern 1 +▁acht 1 +▁Einstellung 1 +▁Start 1 +▁York 1 +ym 1 +▁200 1 +▁Au 1 +stoß 1 +▁qu 1 +aktiv 1 +▁App 1 +asi 1 +geber 1 +▁historischen 1 +▁Suche 1 +▁gefordert 1 +▁Verpflichtungen 1 +öl 1 +▁As 1 +▁stand 1 +eck 1 +▁Charakter 1 +▁Voll 1 +▁Ve 1 +▁Reformen 1 +▁Vielleicht 1 +▁Can 1 +▁nachhaltige 1 +▁starke 1 +dy 1 +▁derartige 1 +▁Mitte 1 +▁ausdrücklich 1 +▁Fran 1 +▁at 1 +▁ba 1 +▁Bri 1 +▁CO 1 +teilung 1 +gesch 1 +▁Wahlen 1 +▁entspricht 1 +▁Foto 1 +▁Tiere 1 +▁Sh 1 +▁Werkzeug 1 +▁zahlreichen 1 +▁Motor 1 +▁Tür 1 +red 1 +modell 1 +sser 1 +▁intensiv 1 +▁regelmäßig 1 +▁Banken 1 +▁Zweifel 1 +▁Schule 1 +▁Angriff 1 +▁Beweis 1 +▁künftig 1 +▁Ausgaben 1 +hu 1 +▁schön 1 +▁gewisse 1 +handlung 1 +ieß 1 +▁demokratischen 1 +▁Produktions 1 +hafte 1 +bre 1 +rich 1 +▁Anerkennung 1 +▁Mess 1 +Ich 1 +▁Kritik 1 +ting 1 +deck 1 +▁Sitz 1 +▁Zi 1 +griff 1 +▁dir 1 +▁Straße 1 +▁Tu 1 +gericht 1 +IC 1 +lon 1 +▁Kurs 1 +AT 1 +▁el 1 +▁Gas 1 +ott 1 +santrag 1 +region 1 +fassung 1 +oni 1 +kultur 1 +▁Zeitraum 1 +▁mindestens 1 +▁günstig 1 +▁Cor 1 +▁gekommen 1 +▁mi 1 +chi 1 +▁handeln 1 +▁Nutzen 1 +▁Zunächst 1 +▁zuvor 1 +▁speziell 1 +▁Anteil 1 +▁Komm 1 +▁militärische 1 +▁angesprochen 1 +▁Ausschuß 1 +▁Vielfalt 1 +oder 1 +▁Besucher 1 +▁gern 1 +▁hoffen 1 +▁Zug 1 +▁zudem 1 +▁Engagement 1 +▁Sonder 1 +▁musste 1 +▁Waffen 1 +maßnahmen 1 +▁Installation 1 +▁umgesetzt 1 +gruppe 1 +verfahren 1 +▁al 1 +ologie 1 +▁vermeiden 1 +▁Hintergrund 1 +zeichnen 1 +▁sicherzustellen 1 +swert 1 +▁Lehr 1 +▁Agentur 1 +▁For 1 +▁Stellungnahme 1 +▁betrachten 1 +kräfte 1 +5- 1 +lösung 1 +ekt 1 +▁seines 1 +▁nahm 1 +▁legen 1 +▁Ze 1 +tur 1 +ins 1 +leitung 1 +▁allgemeine 1 +wissen 1 +wick 1 +tung 1 +▁Kriterien 1 +▁Beratung 1 +▁Politiker 1 +?" 
1 +steigen 1 +mitglied 1 +ox 1 +▁Site 1 +▁Lang 1 +▁Glück 1 +schicht 1 +Aus 1 +versorgung 1 +▁konzentrieren 1 +▁ungefähr 1 +▁Tor 1 +schalt 1 +ding 1 +räu 1 +baren 1 +zent 1 +gebracht 1 +▁by 1 +würdig 1 +▁erweitert 1 +▁Ereignisse 1 +▁demokratische 1 +▁halb 1 +▁Holz 1 +▁persönlich 1 +▁Des 1 +▁Schaden 1 +▁erfüllt 1 +▁beziehen 1 +▁hinweisen 1 +ST 1 +zeichnung 1 +▁Jung 1 +▁Mag 1 +ration 1 +▁Militär 1 +▁etc 1 +▁ar 1 +▁Freunde 1 +▁voraus 1 +gal 1 +▁Netz 1 +▁Anpassung 1 +▁Falle 1 +▁Ausnahme 1 +▁Super 1 +▁schneller 1 +▁Eigen 1 +▁Unterkategorien 1 +prä 1 +▁deutschen 1 +▁Grunde 1 +▁sprach 1 +▁Falls 1 +▁Fähigkeit 1 +▁Mus 1 +van 1 +▁Link 1 +▁reden 1 +▁hart 1 +OS 1 +▁Gedanken 1 +▁Instrumente 1 +▁Eis 1 +are 1 +▁Mio 1 +▁miteinander 1 +▁verfolgt 1 +ßt 1 +▁Wind 1 +▁bestehenden 1 +schl 1 +▁Veränderungen 1 +▁laufen 1 +rum 1 +iss 1 +rad 1 +▁Meeres 1 +▁sicherlich 1 +▁Büro 1 +ath 1 +▁Kan 1 +▁Forderung 1 +▁Hersteller 1 +▁verlassen 1 +▁Autor 1 +krank 1 +▁Management 1 +▁Verkauf 1 +▁Kauf 1 +▁Wahrheit 1 +▁op 1 +▁finde 1 +win 1 +▁Teile 1 +▁Atom 1 +▁Mensch 1 +sol 1 +▁ergreifen 1 +träger 1 +▁Sonnen 1 +bring 1 +▁Sol 1 +▁Unterschied 1 +ther 1 +ps 1 +▁fahren 1 +rei 1 +position 1 +▁Kor 1 +Programm 1 +▁los 1 +▁niedrig 1 +▁Gold 1 +▁interessiert 1 +▁prüfen 1 +ore 1 +▁Chris 1 +▁einfache 1 +▁verursacht 1 +ffe 1 +▁zuletzt 1 +▁Rand 1 +▁London 1 +▁reich 1 +▁Leistung 1 +bus 1 +ink 1 +sinn 1 +stall 1 +tier 1 +▁Kredit 1 +▁Jeder 1 +eten 1 +▁jegliche 1 +▁gelangen 1 +rahmen 1 +▁komplexe 1 +angebot 1 +▁mehreren 1 +▁Vater 1 +▁Zwischen 1 +qua 1 +▁Einkommen 1 +▁Hel 1 +▁Roma 1 +▁27 1 +▁Stein 1 +▁Strom 1 +▁Sprachen 1 +▁Grundsatz 1 +▁Ihres 1 +▁Anspruch 1 +▁Prinzip 1 +station 1 +▁Texte 1 +▁mo 1 +▁typisch 1 +see 1 +▁Winter 1 +▁Aufbau 1 +▁Mon 1 +▁amerikanischen 1 +ski 1 +sucht 1 +bett 1 +▁geschützt 1 +fläche 1 +▁ernsthaft 1 +▁entweder 1 +▁Österreich 1 +▁finanziellen 1 +▁bedeutende 1 +▁schöne 1 +▁online 1 +▁Vereinte 1 +rau 1 +ual 1 +▁Tod 1 +kunden 1 +▁Mont 1 +Ein 1 +▁Gemeinde 1 +grenzen 1 +▁interessante 1 +wiesen 1 +▁Rad 1 +▁Absicht 1 +▁manche 1 +▁erzielen 1 +▁Titel 1 +krieg 1 +▁Ober 1 +▁anerkannt 1 +▁Signal 1 +gelassen 1 +▁Ost 1 +▁Inhalte 1 +▁Europäer 1 +▁Städte 1 +▁Kenntnis 1 +rang 1 +▁Trans 1 +US 1 +▁hält 1 +ini 1 +ffer 1 +▁erhöhen 1 +▁darunter 1 +▁darstellen 1 +▁Typ 1 +▁gewinnen 1 +Das 1 +▁Str 1 +▁interessant 1 +▁gegenseitig 1 +▁Prüfung 1 +▁stimmt 1 +▁daraus 1 +▁älter 1 +▁positiv 1 +stadt 1 +▁tut 1 +▁geschehen 1 +▁Polizei 1 +▁danach 1 +▁Aussicht 1 +mie 1 +▁dagegen 1 +▁Kooperation 1 +▁Pflicht 1 +▁Drogen 1 +▁Restaurants 1 +▁ch 1 +▁Globalisierung 1 +▁grenz 1 +▁Wirkung 1 +▁persönliche 1 +▁relativ 1 +▁Welche 1 +▁Nutzer 1 +▁2001 1 +▁ideal 1 +▁Systeme 1 +kur 1 +ndung 1 +▁Richtlinien 1 +▁Rest 1 +schul 1 +▁betonen 1 +pflicht 1 +inter 1 +▁Kommunikation 1 +▁optimal 1 +▁Landschaft 1 +▁Gesetzgebung 1 +▁70 1 +lern 1 +▁Regelung 1 +▁genauso 1 +ali 1 +freiheit 1 +▁regionale 1 +bin 1 +▁Vorbereitung 1 +ante 1 +▁wünschen 1 +▁2002 1 +volle 1 +weck 1 +stab 1 +▁großartig 1 +▁stattfinden 1 +preise 1 +geteilt 1 +▁bezeichnet 1 +vertrag 1 +▁Systems 1 +stehende 1 +▁♫ 1 +▁Wettbewerbs 1 +▁rechts 1 +▁innovative 1 +▁Sollte 1 +▁verfolgen 1 +act 1 +▁eingeführt 1 +machen 1 +▁stabil 1 +▁Angesicht 1 +▁damals 1 +geführt 1 +test 1 +▁freien 1 +mat 1 +lager 1 +tsch 1 +bürger 1 +▁angeboten 1 +▁su 1 +band 1 +▁Kategorie 1 +unk 1 +▁At 1 +▁Abschließend 1 +▁Akt 1 +haltung 1 +▁Band 1 +▁laut 1 +▁bestätigt 1 +▁Herstellung 1 +zo 1 +raten 1 +▁Kurz 1 +▁kommenden 1 +▁stärken 1 +ät 1 +▁welcher 1 +▁Gut 1 +▁erleben 1 +▁Zudem 1 +▁Kal 1 +▁Phase 1 +▁Teilnehmer 1 +▁sieben 1 +▁ho 1 +▁teilen 1 +▁angemessen 
1 +sländern 1 +▁Weil 1 +▁stimme 1 +▁Mu 1 +fest 1 +▁klein 1 +schuldig 1 +▁Zell 1 +▁Fonds 1 +kul 1 +▁angegeben 1 +▁Argument 1 +▁falls 1 +roh 1 ++ 1 +▁Bundes 1 +nes 1 +wahr 1 +flug 1 +▁Irland 1 +▁Entwurf 1 +▁Termin 1 +wir 1 +& 1 +▁Formen 1 +▁herum 1 +▁Sorge 1 +▁teilweise 1 +▁Homepage 1 +▁Kontroll 1 +60 1 +▁spezielle 1 +▁500 1 +stell 1 +wirkung 1 +hängen 1 +▁Tatsächlich 1 +▁Einigung 1 +▁Flugzeug 1 +▁Königreich 1 +wertung 1 +▁com 1 +▁Übersetzung 1 +fordern 1 +▁Friedens 1 +räume 1 +RA 1 +räumen 1 +wohn 1 +▁kaufen 1 +▁grundlegende 1 +▁Mädchen 1 +▁unglaublich 1 +richtung 1 +▁elektronische 1 +energie 1 +▁gegründet 1 +▁Mikro 1 +liegen 1 +▁dank 1 +▁angenehm 1 +bad 1 +▁Sam 1 +▁Blut 1 +▁ähnlich 1 +fu 1 +▁hinzu 1 +▁Dennoch 1 +hel 1 +sabkommen 1 +material 1 +▁Status 1 +▁garantiert 1 +▁übernachten 1 +Kon 1 +15 1 +▁Garantie 1 +iere 1 +▁Hälfte 1 +▁menschlichen 1 +alisierung 1 +▁1999 1 +sagen 1 +politischen 1 +▁Leider 1 +▁einsetzen 1 +▁Presse 1 +▁langsam 1 +▁Übergang 1 +▁Polen 1 +▁Ordnung 1 +▁angezeigt 1 +ille 1 +▁Studie 1 +▁besonderen 1 +orientiert 1 +oo 1 +▁Ratsvorsitz 1 +▁grosse 1 +▁zentral 1 +▁beginnt 1 +▁Kur 1 +Er 1 +▁Qua 1 +▁geschieht 1 +▁freuen 1 +▁dennoch 1 +vention 1 +▁Verständnis 1 +▁Bestandteil 1 +paket 1 +leistung 1 +ssig 1 +arten 1 +▁freue 1 +▁Ausland 1 +▁Kamera 1 +▁Gesicht 1 +ativ 1 +nie 1 +▁richtigen 1 +▁Tisch 1 +brechen 1 +▁Hände 1 +▁bauen 1 +▁wirksam 1 +falls 1 +▁Verpflichtung 1 +zähl 1 +▁Maschinen 1 +▁Hy 1 +▁Elemente 1 +dacht 1 +▁dritte 1 +▁Grundsätze 1 +▁dienen 1 +▁Multi 1 +▁Zahlen 1 +▁dritten 1 +prinzip 1 +Le 1 +spar 1 +▁trägt 1 +werfen 1 +99 1 +▁fallen 1 +▁Danke 1 +dro 1 +▁Ur 1 +ban 1 +▁August 1 +▁hotel 1 +▁Planeten 1 +▁Prioritäten 1 +pu 1 +▁Studien 1 +▁Einheit 1 +▁fühlen 1 +anlagen 1 +▁erscheint 1 +▁Oder 1 +kreis 1 +kurs 1 +▁zahlen 1 +▁übertragen 1 +▁lebt 1 +▁Initiativen 1 +▁Absatz 1 +eller 1 +▁größer 1 +▁Willen 1 +▁aufmerksam 1 +▁Schau 1 +han 1 +▁Einrichtungen 1 +▁Ausstellung 1 +by 1 +hotel 1 +produktion 1 +▁notwendigen 1 +▁links 1 +▁hochwertige 1 +dienste 1 +nimmt 1 +▁Red 1 +▁Papier 1 +rechts 1 +▁: 1 +▁feststellen 1 +▁Tour 1 +▁erstellen 1 +ehr 1 +viel 1 +▁humanitäre 1 +schuld 1 +wagen 1 +▁schlimm 1 +rus 1 +▁betroffen 1 +▁warten 1 +▁Februar 1 +typ 1 +hy 1 +ty 1 +▁profitieren 1 +hor 1 +▁Brüssel 1 +▁wollten 1 +▁einzigartige 1 +Beifall 1 +ehrt 1 +▁fordert 1 +rom 1 +king 1 +zieht 1 +▁Genau 1 +instrument 1 +systems 1 +wart 1 +bald 1 +▁Jede 1 +▁Miss 1 +▁Jahrhunderts 1 +upp 1 +▁Westen 1 +▁herzlich 1 +öpf 1 +▁Heil 1 +stoffe 1 +▁durchaus 1 +▁Air 1 +▁Museum 1 +▁nützlich 1 +▁zufrieden 1 +zugeben 1 +▁Verlust 1 +▁Grün 1 +chten 1 +bra 1 +▁Stärkung 1 +management 1 +digen 1 +war 1 +▁hingewiesen 1 +▁Ukraine 1 +▁beschäftigt 1 +▁Verwaltungs 1 +person 1 +▁sinnvoll 1 +▁interne 1 +▁sonst 1 +▁gewährleistet 1 +▁hervorragende 1 +bü 1 +▁Gebieten 1 +tral 1 +▁lokalen 1 +▁Innen 1 +▁entscheidend 1 +Sterne 1 +▁€ 1 +▁Dimension 1 +▁diskutieren 1 +▁meist 1 +weich 1 +▁vo 1 +▁Mindest 1 +gegen 1 +▁Grenz 1 +▁Sal 1 +▁umfasst 1 +ux 1 +geladen 1 +▁besuchen 1 +▁befürworte 1 +▁Agrar 1 +▁Pan 1 +▁vernünftig 1 +▁Ton 1 +q 1 +▁que 1 +▁Funktionen 1 +▁Spezial 1 +licht 1 +▁Beschreibung 1 +▁Besitz 1 +▁abgeschlossen 1 +▁erheblich 1 +politische 1 +Que 1 +pekt 1 +▁höhere 1 +von 1 +▁begrenzt 1 +ó 1 +nahm 1 +▁vorstellen 1 +▁Achtung 1 +▁erfolgen 1 +bindung 1 +▁Vorteil 1 +▁institutionelle 1 +▁größeren 1 +▁schreiben 1 +▁Sil 1 +Arbeitslosigkeit 1 +▁beinhaltet 1 +▁Villa 1 +how 1 +▁Ferien 1 +▁Tagung 1 +▁Club 1 +▁Wald 1 +roll 1 +▁jederzeit 1 +р 1 +laub 1 +▁Worten 1 +▁Risiken 1 +▁politischer 1 +▁vollkommen 1 +▁» 1 +“, 1 +▁fertig 1 +▁gewünscht 1 
+▁Umfang 1 +issen 1 +sto 1 +▁überzeugen 1 +willig 1 +▁gestalten 1 +izi 1 +dia 1 +▁Bal 1 +hlen 1 +▁Such 1 +▁verlieren 1 +▁à 1 +gung 1 +▁ließ 1 +▁Mini 1 +▁Beteiligung 1 +PS 1 +▁Einhaltung 1 +ächtig 1 +organisation 1 +ID 1 +▁wichtigste 1 +▁Essen 1 +12 1 +▁Zustand 1 +▁ad 1 +▁Lauf 1 +▁diskutiert 1 +▁Justiz 1 +▁Klasse 1 +fä 1 +▁Lesung 1 +▁umfangreiche 1 +▁Komfort 1 +LE 1 +▁zer 1 +▁Volks 1 +▁Nachrichten 1 +▁erhält 1 +häuser 1 +▁erfordert 1 +anische 1 +▁unternehmen 1 +▁Haut 1 +richtlinie 1 +▁Qualitäts 1 +▁Maße 1 +liefer 1 +▁Kein 1 +▁machte 1 +schnitt 1 +änderung 1 +▁kulturelle 1 +▁Eindruck 1 +▁CD 1 +mission 1 +aut 1 +▁deutsche 1 +tieren 1 +▁Wohnung 1 +▁PC 1 +▁Religion 1 +▁Dorf 1 +▁beliebt 1 +▁Sc 1 +▁höchste 1 +▁wenigen 1 +brauch 1 +▁Wissenschaftler 1 +geordnet 1 +tho 1 +▁ökologisch 1 +▁Regime 1 +AP 1 +▁unterstütze 1 +tion 1 +rö 1 +▁300 1 +▁hergestellt 1 +schrei 1 +▁Fotos 1 +▁Gegenteil 1 +zin 1 +kommt 1 +dit 1 +tragen 1 +▁Camp 1 +▁Wachstums 1 +▁Verkehr 1 +▁Priorität 1 +ani 1 +ood 1 +minister 1 +▁or 1 +▁Standort 1 +▁Handlung 1 +punkte 1 +▁Hafen 1 +▁fair 1 +zeiten 1 +low 1 +nov 1 +▁Bedarf 1 +▁ursprünglich 1 +text 1 +drücke 1 +▁Verträge 1 +▁Terrasse 1 +▁informieren 1 +▁Para 1 +karte 1 +▁König 1 +▁legt 1 +▁freundlich 1 +▁wachsende 1 +▁jungen 1 +ausschuss 1 +▁Betracht 1 +star 1 +pack 1 +▁Schluss 1 +▁Kontinent 1 +gli 1 +zusetzen 1 +▁Kräfte 1 +TE 1 +meister 1 +▁Rot 1 +west 1 +▁Krankheit 1 +gründe 1 +▁sehe 1 +▁historische 1 +ida 1 +bruch 1 +▁Business 1 +▁Barcelona 1 +▁Gerichtshof 1 +▁möglichst 1 +gebildet 1 +▁Patent 1 +▁Leistungen 1 +heiten 1 +▁zusätzlich 1 +grad 1 +▁einiger 1 +▁Produkten 1 +Ab 1 +04 1 +▁kürzlich 1 +▁wusste 1 +▁entdeckt 1 +▁Schwerpunkt 1 +▁Angaben 1 +▁Kosovo 1 +▁bequem 1 +▁Tur 1 +▁wunderbar 1 +▁Gemeinsamen 1 +▁wunderschöne 1 +02 1 +produkt 1 +▁Val 1 +▁kom 1 +▁großes 1 +▁überprüfen 1 +mil 1 +▁11. 1 +▁Treffen 1 +▁betont 1 +▁Dateien 1 +▁Zuständigkeit 1 +uk 1 +kenntnis 1 +staaten 1 +▁Pre 1 +▁kompliziert 1 +▁21 1 +RI 1 +ara 1 +▁Diskriminierung 1 +▁Niveau 1 +▁Eigentum 1 +bie 1 +▁Hauses 1 +organ 1 +information 1 +▁Pat 1 +ep 1 +▁weiterer 1 +▁Fin 1 +▁Pflanzen 1 +▁Flexibilität 1 +▁II 1 +▁schlechte 1 +▁Hin 1 +cha 1 +▁Boot 1 +klär 1 +▁Ruhe 1 +▁Meiner 1 +▁uner 1 +05 1 +▁Forderungen 1 +sagt 1 +▁hundert 1 +▁jeweils 1 +▁Export 1 +▁selten 1 +straße 1 +▁Gesetze 1 +▁Kompetenz 1 +zone 1 +zeichnet 1 +▁niemals 1 +▁Unterschiede 1 +▁Ursache 1 +▁Part 1 +▁Wander 1 +▁Morgen 1 +▁private 1 +▁Verbrechen 1 +▁übrigen 1 +▁Methode 1 +▁Landwirte 1 +▁Experten 1 +weil 1 +▁Datenbank 1 +▁Gleichzeitig 1 +▁Rund 1 +▁pr 1 +züge 1 +▁geplant 1 +ama 1 +▁Halb 1 +▁Einwanderung 1 +air 1 +▁Geschäft 1 +▁Studenten 1 +av 1 +▁sogenannte 1 +▁allgemein 1 +Hotel 1 +▁medizinische 1 +▁Zahlung 1 +Sp 1 +▁Lizenz 1 +pä 1 +ons 1 +wendung 1 +▁freie 1 +ausschusses 1 +▁Laufe 1 +▁gemeinschaft 1 +füge 1 +▁lokale 1 +▁Zugriff 1 +▁Liberalisierung 1 +▁bildet 1 +▁Lehrer 1 +tätigkeit 1 +▁endgültig 1 +▁geöffnet 1 +▁Schreib 1 +▁grundsätzlich 1 +▁verabschiedet 1 +▁schw 1 +т 1 +▁hauptsächlich 1 +dra 1 +lat 1 +bor 1 +märkte 1 +jährige 1 +▁reagieren 1 +tür 1 +▁einzigen 1 +stor 1 +▁Schweiz 1 +not 1 +▁höher 1 +▁Speicher 1 +Wir 1 +à 1 +gie 1 +stellungen 1 +nummer 1 +▁y 1 +▁Amsterdam 1 +AC 1 +▁gebaut 1 +▁Freude 1 +spe 1 +▁kulturellen 1 +tzen 1 +eucht 1 +▁Download 1 +▁Überzeugung 1 +2- 1 +MA 1 +▁3. 
1 +rand 1 +ull 1 +▁Streit 1 +▁zugänglich 1 +schloss 1 +▁jüngsten 1 +▁festgestellt 1 +weisung 1 +▁ausgewählt 1 +▁professionelle 1 +einheit 1 +▁Nahrungsmittel 1 +▁Standards 1 +▁Gegensatz 1 +▁Name 1 +▁WTO 1 +produkte 1 +lli 1 +gewalt 1 +▁Gebiete 1 +sbedingungen 1 +▁Koordinierung 1 +leih 1 +▁Beziehung 1 +▁regionalen 1 +▁High 1 +▁informiert 1 +▁wesentliche 1 +ata 1 +ehen 1 +▁fe 1 +▁Messe 1 +▁Marken 1 +vorschlag 1 +▁sauber 1 +▁gehalten 1 +▁Vielzahl 1 +▁solcher 1 +▁Fenster 1 +▁ideale 1 +▁Maß 1 +DA 1 +▁schafft 1 +gab 1 +way 1 +▁Überprüfung 1 +rät 1 +▁irgendeine 1 +▁externe 1 +▁Fo 1 +▁Debian 1 +▁behandeln 1 +▁definiert 1 +link 1 +▁Grundrechte 1 +▁Schulden 1 +lig 1 +ona 1 +▁Drei 1 +▁spricht 1 +▁Seine 1 +▁Letzt 1 +bedingungen 1 +▁britischen 1 +▁erforderlichen 1 +▁Architektur 1 +Li 1 +▁Krankheiten 1 +▁Bett 1 +▁gegenwärtigen 1 +yp 1 +▁Bedrohung 1 +prozess 1 +▁Pass 1 +▁Dia 1 +▁Wirklichkeit 1 +Al 1 +Geschäftsordnung 1 +suche 1 +▁anstatt 1 +oc 1 +möglichkeiten 1 +▁Adresse 1 +▁klicken 1 +ust 1 +qu 1 +▁Jugend 1 +▁akzeptiert 1 +▁ähnliche 1 +▁+ 1 +programme 1 +▁vorliegenden 1 +cc 1 +▁fällt 1 +▁schwarz 1 +▁Rom 1 +▁üblich 1 +▁schönen 1 +▁Arbeiten 1 +▁führte 1 +▁Best 1 +▁Ganz 1 +▁brachte 1 +▁Erklärungen 1 +▁Ei 1 +▁Reg 1 +▁20. 1 +▁Gewicht 1 +▁au 1 +▁Oberfläche 1 +ware 1 +cy 1 +▁« 1 +▁Verletzung 1 +▁Alternative 1 +▁abhängig 1 +pfen 1 +▁unternommen 1 +▁dient 1 +▁trotzdem 1 +gelegt 1 +ise 1 +mobil 1 +▁möglichen 1 +kapital 1 +▁as 1 +▁mussten 1 +▁not 1 +▁kamen 1 +TM 1 +heben 1 +forschung 1 +drückt 1 +mod 1 +FR 1 +CO 1 +▁Sand 1 +▁transparent 1 +▁Krebs 1 +▁Suite 1 +▁ergeben 1 +▁französischen 1 +▁amerikanische 1 +▁Perspektive 1 +▁Zentralbank 1 +▁Weitere 1 +teri 1 +▁empfehlen 1 +▁besitzt 1 +holen 1 +ces 1 +mus 1 +▁Mil 1 +▁Definition 1 +▁Google 1 +▁Berichte 1 +zellen 1 +▁Karten 1 +07 1 +geschäft 1 +▁eröffnet 1 +▁außerordentlich 1 +75 1 +Ö 1 +▁Kombination 1 +▁Beschäftigungs 1 +* 1 +▁Booking 1 +▁billig 1 +▁keinerlei 1 +treffen 1 +sport 1 +tour 1 +▁Unabhängigkeit 1 +▁Doppel 1 +schneid 1 +▁lautet 1 +▁room 1 +▁voller 1 +▁Erhöhung 1 +▁korrekt 1 +▁26 1 +AM 1 +▁Fälle 1 +tät 1 +▁öffentlich 1 +yl 1 +70 1 +wal 1 +▁teilnehmen 1 +grund 1 +▁Berufs 1 +▁Leuten 1 +durch 1 +▁Massen 1 +▁individuell 1 +▁Afghanistan 1 +gebühr 1 +▁Element 1 +▁schließen 1 +lor 1 +bas 1 +nah 1 +zy 1 +sorge 1 +▁eingereicht 1 +▁toll 1 +ular 1 +SA 1 +iziert 1 +▁gutes 1 +▁dieselbe 1 +11 1 +▁28 1 +▁Übereinkommen 1 +ziehung 1 +▁basiert 1 +hand 1 +▁begonnen 1 +spann 1 +▁internationaler 1 +▁reichen 1 +▁läuft 1 +▁erscheinen 1 +▁geschrieben 1 +▁wertvoll 1 +reif 1 +▁21. 1 +fund 1 +▁Glas 1 +▁entsprechen 1 +qualität 1 +в 1 +▁Branche 1 +▁verehrte 1 +▁individuelle 1 +▁Lager 1 +▁langfristig 1 +▁Gegend 1 +pon 1 +▁ausländische 1 +▁Identität 1 +figur 1 +tätig 1 +▁verbundenen 1 +▁Min 1 +▁enorme 1 +▁Zeiten 1 +▁pre 1 +fort 1 +▁Regelungen 1 +ED 1 +maß 1 +▁Beihilfen 1 +▁Arbeitsmarkt 1 +▁ausgewogen 1 +▁zählen 1 +▁Farben 1 +▁Bill 1 +präsent 1 +scheid 1 +frist 1 +▁reicht 1 +▁Gebrauch 1 +gebunden 1 +..." 
1 +▁guter 1 +▁vorschlagen 1 +▁erzählen 1 +▁angemessene 1 +IM 1 +jährigen 1 +▁gesetzlich 1 +▁speziellen 1 +▁entstanden 1 +ju 1 +wür 1 +▁glücklich 1 +▁Nein 1 +gut 1 +▁Erinnerung 1 +▁Flüchtlinge 1 +▁zuerst 1 +namen 1 +▁produziert 1 +fähigkeit 1 +mos 1 +▁politisch 1 +▁Tourismus 1 +▁Fähigkeiten 1 +{ 1 +▁erinnert 1 +▁Erwachsene 1 +▁from 1 +getreten 1 +rot 1 +ley 1 +Gästebewertungen 1 +isches 1 +ET 1 +▁Last 1 +▁bisschen 1 +▁Empfehlungen 1 +▁Prä 1 +▁wodurch 1 +▁grüne 1 +ball 1 +▁Feuer 1 +IP 1 +▁Geräte 1 +well 1 +gezogen 1 +▁verb 1 +▁Gemeinsam 1 +▁1998 1 +cial 1 +▁gesellschaft 1 +strahl 1 +'' 1 +▁Gerät 1 +▁seien 1 +▁Englisch 1 +▁Pl 1 +haben 1 +kauf 1 +IA 1 +▁Nachfrage 1 +▁Agenda 1 +kette 1 +▁Col 1 +▁Verhältnis 1 +▁Minderheiten 1 +tain 1 +MS 1 +▁Antworten 1 +▁Zentral 1 +▁geboren 1 +fahrzeug 1 +▁rück 1 +wirkt 1 +▁Code 1 +stanz 1 +vid 1 +▁Nachdem 1 +▁gelungen 1 +▁Aktionen 1 +▁TV 1 +▁Buchung 1 +▁Ebenso 1 +▁garantieren 1 +▁Sprach 1 +▁Ger 1 +wort 1 +▁langfristige 1 +▁Teilen 1 +▁sorgfältig 1 +▁Wohlstand 1 +▁General 1 +▁Anwendungen 1 +▁Asyl 1 +unt 1 +▁Betriebs 1 +▁langen 1 +▁treten 1 +ua 1 +▁wiederholt 1 +abel 1 +▁angeht 1 +▁Anlagen 1 +ita 1 +▁Schrift 1 +▁betroffenen 1 +leistungen 1 +▁unmöglich 1 +▁sagten 1 +kra 1 +plätze 1 +▁Problemen 1 +▁nationaler 1 +ax 1 +▁Dokumente 1 +ätz 1 +▁Regierungs 1 +▁gestern 1 +ringen 1 +smaßnahmen 1 +▁Veranstaltung 1 +▁Unterkunft 1 +▁Eigenschaften 1 +▁umgehen 1 +▁IT 1 +ij 1 +tic 1 +ré 1 +▁sodass 1 +▁Klicken 1 +▁Serie 1 +▁Gleichgewicht 1 +40 1 +▁herunter 1 +▁verbreitet 1 += 1 +uck 1 +▁sozial 1 +ence 1 +▁Stimm 1 +ebene 1 +▁Schüler 1 +▁Methoden 1 +istisch 1 +ül 1 +▁Bestellung 1 +▁dürfte 1 +▁Par 1 +▁Delegation 1 +▁vereinbart 1 +PL 1 +▁Fluss 1 +▁My 1 +▁Baby 1 +flu 1 +organisationen 1 +▁Symbol 1 +üß 1 +▁Asien 1 +▁Portugal 1 +top 1 +üchte 1 +▁einverstanden 1 +▁Teilnahme 1 +▁Sehr 1 +▁Stunde 1 +▁Fort 1 +aria 1 +reform 1 +▁Hall 1 +ada 1 +NE 1 +film 1 +▁Vision 1 +lution 1 +▁Hilfs 1 +Innen 1 +ture 1 +▁Austausch 1 +▁rot 1 +gas 1 +▁außergewöhnlich 1 +transport 1 +▁Stern 1 +▁Cu 1 +wehr 1 +▁digitale 1 +gesellschaften 1 +▁Amerikaner 1 +▁Sachen 1 +CE 1 +▁sicherstellen 1 +▁ro 1 +itter 1 +▁Glauben 1 +▁erzeugt 1 +▁Anlage 1 +▁Bildungs 1 +▁Mitteln 1 +SP 1 +▁Jugendliche 1 +▁erhebliche 1 +▁Denken 1 +▁dauerhaft 1 +▁Abschnitt 1 +▁erneuerbare 1 +▁vielmehr 1 +vis 1 +▁Mut 1 +▁geringer 1 +▁tat 1 +▁Ph 1 +▁wiederum 1 +▁ländlichen 1 +▁kurze 1 +▁Mir 1 +▁Wünsche 1 +▁höchst 1 +▁Zoll 1 +▁Entwicklungen 1 +verkehrs 1 +▁* 1 +▁kontextuell 1 +▁Voraussetzungen 1 +maschine 1 +suchen 1 +führt 1 +gerät 1 +direkt 1 +kontrolle 1 +▁Montag 1 +quelle 1 +▁Bücher 1 +▁klassische 1 +▁verstanden 1 +▁Niederlande 1 +▁entscheidende 1 +chel 1 +▁hilft 1 +pen 1 +Pla 1 +henswürdigkeiten 1 +Ch 1 +▁entdecken 1 +▁zugleich 1 +usch 1 +▁Krisen 1 +ish 1 +▁are 1 +▁Veränderung 1 +▁Beim 1 +neu 1 +▁Geschlecht 1 +▁Kilometer 1 +zung 1 +▁Zins 1 +zubringen 1 +▁andererseits 1 +▁Grad 1 +▁Effizienz 1 +▁Einklang 1 +gres 1 +▁ru 1 +cal 1 +▁vieler 1 +▁wesentlichen 1 +Pa 1 +▁CA 1 +▁Umständen 1 +▁vorbei 1 +▁traditionellen 1 +▁keiner 1 +▁Süden 1 +▁besseren 1 +schritt 1 +▁frisch 1 +▁verkauft 1 +EC 1 +03 1 +▁mittels 1 +▁Revolution 1 +▁positiven 1 +stä 1 +dri 1 +gelöst 1 +gelt 1 +▁abgestimmt 1 +▁öffnen 1 +▁35 1 +dreh 1 +▁Regulierung 1 +gro 1 +▁it 1 +entwurf 1 +▁spät 1 +partei 1 +▁Orte 1 +▁hört 1 +schiff 1 +post 1 +▁Einkaufs 1 +▁Grenze 1 +▁diesbezüglich 1 +▁Wohl 1 +▁Aussage 1 +▁Immobilien 1 +▁weiteres 1 +▁Muster 1 +gebiete 1 +▁Mitgliedern 1 +▁beachten 1 +▁religiöse 1 +illa 1 +vari 1 +▁Bedenken 1 +▁besucht 1 +isierte 1 +▁maximal 1 +ira 1 +▁Nieder 1 +▁trans 1 +Datei 1 
+▁gefördert 1 +LA 1 +▁www 1 +▁Versorgung 1 +▁namens 1 +▁mon 1 +TA 1 +▁Farb 1 +▁beschränkt 1 +ham 1 +▁Gal 1 +▁betreffen 1 +} 1 +▁seitens 1 +▁Gang 1 +▁gestaltet 1 +▁erlebt 1 +bro 1 +bauen 1 +ächte 1 +▁befassen 1 +▁Bel 1 +anda 1 +▁BIP 1 +Г 1 +ati 1 +▁illegale 1 +▁Anliegen 1 +▁Nahen 1 +▁Ter 1 +▁landwirtschaft 1 +▁Verbot 1 +▁Geschichten 1 +▁Planung 1 +▁Blog 1 +▁Internationalen 1 +▁Programms 1 +och 1 +▁weltweiten 1 +▁übernommen 1 +▁Schwarz 1 +▁Schweden 1 +Shop 1 +fol 1 +▁zuständig 1 +gezahlt 1 +▁Ausmaß 1 +▁wirksame 1 +▁schauen 1 +▁deiner 1 +▁Dort 1 +▁Beispiele 1 +▁Sanktionen 1 +▁beliebig 1 +14 1 +▁Job 1 +▁wann 1 +▁Neue 1 +tle 1 +▁Trotz 1 +▁erlauben 1 +▁Liefer 1 +▁herrliche 1 +cre 1 +car 1 +▁trifft 1 +▁les 1 +fisch 1 +65 1 +kar 1 +▁riesigen 1 +▁dynamisch 1 +▁Diskussionen 1 +▁gesunde 1 +▁derzeitigen 1 +LI 1 +▁ausgezeichnete 1 +▁leiden 1 +▁installiert 1 +▁Formular 1 +▁pu 1 +▁Gründung 1 +▁Mittelmeer 1 +▁Kollege 1 +▁gewonnen 1 +▁anbieten 1 +▁Tausende 1 +▁bedarf 1 +▁Redner 1 +▁entschlossen 1 +▁Institution 1 +tief 1 +▁Handeln 1 +▁vollständige 1 +▁Open 1 +▁Noch 1 +▁vorgeschlagenen 1 +▁Anreiz 1 +▁Datenschutz 1 +▁bereitgestellt 1 +krise 1 +rank 1 +▁Tief 1 +durchschnittlich 1 +▁erkannt 1 +▁Mangel 1 +▁grundlegenden 1 +▁dis 1 +▁Star 1 +pflanz 1 +▁griechische 1 +▁erhielt 1 +OL 1 +▁Gespräche 1 +▁Unternehmer 1 +▁höchsten 1 +cia 1 +▁nochmals 1 +▁Leistungs 1 +Veröffentlichung 1 +▁pri 1 +▁modern 1 +25 1 +▁Statistik 1 +▁World 1 +▁Modul 1 +▁kämpfen 1 +▁Fischer 1 +▁Vertrieb 1 +▁außen 1 +▁Linux 1 +▁Erwartungen 1 +▁Rumänien 1 +▁gelöst 1 +▁Wal 1 +▁Kol 1 +niveau 1 +vest 1 +▁konstruktiv 1 +▁Sonne 1 +▁erzählt 1 +▁sichtbar 1 +▁Ziffer 1 +▁Andere 1 +▁Erholung 1 +▁Einwohner 1 +pul 1 +48 1 +▁Links 1 +▁Effekt 1 +▁bekämpfen 1 +▁Medi 1 +▁Strategien 1 +oli 1 +▁Hol 1 +▁Kommunikations 1 +▁attraktiv 1 +▁Spiele 1 +dre 1 +▁Fraktionen 1 +pan 1 +atoren 1 +▁Ehe 1 +aufnahme 1 +nor 1 +▁Materialien 1 +▁Za 1 +▁Bra 1 +▁Leitlinien 1 +▁verlangt 1 +▁liefern 1 +▁siehe 1 +▁Direkt 1 +▁John 1 +▁bestimmen 1 +▁Char 1 +▁Sach 1 +▁Kaffee 1 +▁eigenes 1 +▁führenden 1 +tom 1 +▁Haushaltsplan 1 +körper 1 +▁Kann 1 +▁ausführlich 1 +▁verdient 1 +luft 1 +anlage 1 +▁Kandidat 1 +einrichtungen 1 +▁Temperatur 1 +ura 1 +▁Himmel 1 +▁ergriffen 1 +▁Regen 1 +▁Gesch 1 +austausch 1 +pre 1 +bit 1 +of 1 +▁gew 1 +▁schnelle 1 +gebung 1 +▁stellte 1 +▁Häuser 1 +▁erfolgreiche 1 +▁Reich 1 +geräte 1 +▁Mac 1 +▁Dingen 1 +mini 1 +▁Bauern 1 +▁komfortable 1 +▁Stärke 1 +▁besitzen 1 +▁konzentriert 1 +▁Voraussetzung 1 +▁verbinden 1 +▁Aufgrund 1 +▁Elektro 1 +▁Fläche 1 +lehr 1 +▁technologische 1 +▁Empfehlung 1 +▁zustimmen 1 +tech 1 +La 1 +▁konkreten 1 +▁sorgt 1 +98 1 +▁Wandel 1 +▁29 1 +▁Nummer 1 +08 1 +änder 1 +▁Fehl 1 +▁flexible 1 +überschreitende 1 +begriff 1 +▁kurzem 1 +▁Format 1 +forderung 1 +fra 1 +▁Jedes 1 +▁verlangen 1 +▁umzusetzen 1 +▁gemütliche 1 +▁anti 1 +▁gearbeitet 1 +▁annehmen 1 +▁Dadurch 1 +▁Strukturen 1 +▁Hinter 1 +▁Erzeugnisse 1 +▁Norden 1 +▁hast 1 +ella 1 +▁Güter 1 +▁bestehende 1 +ique 1 +▁Regional 1 +spezifisch 1 +▁türkische 1 +▁Praktik 1 +Emissionen 1 +geschlossen 1 +▁Autos 1 +bli 1 +л 1 +▁finanziert 1 +streit 1 +ping 1 +fällen 1 +isierten 1 +▁hoher 1 +Fi 1 +▁fühlt 1 +▁bezieht 1 +ico 1 +▁senden 1 +▁fragte 1 +▁chemische 1 +fried 1 +pat 1 +▁Wähler 1 +▁bezahlen 1 +PA 1 +tausend 1 +Ü 1 +▁Beruf 1 +Berücksichtigung 1 +▁zweifellos 1 +▁zeit 1 +▁umgeben 1 +▁Schatten 1 +▁Nachricht 1 +▁Fri 1 +▁Strukturfonds 1 +▁hängt 1 +code 1 +▁beobachten 1 +▁bekommt 1 +▁Publikum 1 +▁beschrieben 1 +▁Alles 1 +gemeinschaft 1 +▁Kreis 1 +ong 1 +geschrieben 1 +Pro 1 +▁welchen 1 +maschinen 1 +80 1 +▁Maßnahme 
1 +▁berechtigt 1 +fällt 1 +smöglichkeiten 1 +cho 1 +▁Wellness 1 +leit 1 +▁Medizin 1 +karten 1 +▁Institut 1 +▁gewährt 1 +miet 1 +▁angesehen 1 +▁lag 1 +bat 1 +▁Zeitung 1 +por 1 +▁Kontext 1 +▁Verhandlungs 1 +▁Del 1 +▁international 1 +▁Farbe 1 +▁wenden 1 +▁Führer 1 +▁Ohne 1 +▁Dafür 1 +▁Labor 1 +09 1 +verhalten 1 +▁Umfeld 1 +▁betreffend 1 +▁Werbe 1 +▁schwierigen 1 +▁pla 1 +plant 1 +▁fehlt 1 +▁willkommen 1 +▁eigen 1 +▁Schulen 1 +▁Vom 1 +erfolg 1 +breite 1 +IL 1 +▁City 1 +▁Beschluss 1 +▁Wesen 1 +▁weshalb 1 +▁zuverlässig 1 +effekt 1 +gno 1 +▁Studio 1 +▁Gestaltung 1 +wind 1 +▁ehemaligen 1 +▁Klimaanlage 1 +▁Mittelpunkt 1 +▁irische 1 +▁Zeug 1 +▁1997 1 +▁Milch 1 +▁kostenfrei 1 +▁kostet 1 +▁übrigens 1 +▁friedlich 1 +▁Allgemeinen 1 +old 1 +▁Korruption 1 +▁Auge 1 +▁existieren 1 +▁Küste 1 +▁britische 1 +▁bewegt 1 +ierbar 1 +Adresse 1 +▁ausgesetzt 1 +▁Tal 1 +▁Universum 1 +▁Theater 1 +▁Früh 1 +zuhalten 1 +für 1 +fü 1 +▁exklusiv 1 +▁Schuld 1 +▁einig 1 +Der 1 +▁Schönheit 1 +▁gezwungen 1 +immt 1 +schreiben 1 +▁Mode 1 +▁Cre 1 +▁traditionelle 1 +▁Denk 1 +▁offenen 1 +▁stolz 1 +itu 1 +▁Alt 1 +▁Geschwindigkeit 1 +▁Hund 1 +▁starken 1 +▁staatliche 1 +▁Verfügbarkeit 1 +▁eventuell 1 +▁Ferner 1 +▁Konzert 1 +▁Vermittlung 1 +▁Bucht 1 +▁bemüht 1 +▁massiv 1 +▁Würde 1 +▁Kommentar 1 +▁essen 1 +▁neun 1 +▁Comp 1 +▁Umweltschutz 1 +lief 1 +▁Sorgen 1 +▁vorgenommen 1 +▁sub 1 +▁plötzlich 1 +▁Normen 1 +▁ausgerichtet 1 +▁Beiträge 1 +▁Trend 1 +▁(1 1 +▁präsentiert 1 +situation 1 +geschichte 1 +▁Kapitel 1 +lock 1 +18 1 +▁Männern 1 +▁Umgang 1 +ält 1 +▁Harmonisierung 1 +▁existiert 1 +▁entlang 1 +▁Uni 1 +hang 1 +▁erhältlich 1 +▁Kongress 1 +▁Sub 1 +▁aufzunehmen 1 +▁innere 1 +▁verurteilt 1 +▁Prinzipien 1 +▁Mobilität 1 +▁Traum 1 +▁einfacher 1 +▁Ag 1 +▁Microsoft 1 +▁ehrlich 1 +pläne 1 +▁richten 1 +▁Kapazität 1 +gerecht 1 +▁Stadtzentrum 1 +EM 1 +▁heutige 1 +bert 1 +▁gezogen 1 +▁achten 1 +▁150 1 +▁hab 1 +▁Heimat 1 +rseits 1 +poli 1 +▁Stellung 1 +▁Konsens 1 +▁Handy 1 +Rezeption 1 +▁abgelehnt 1 +▁reduziert 1 +▁Anbieter 1 +▁investieren 1 +▁Fleisch 1 +▁zählt 1 +▁Dach 1 +leute 1 +regelung 1 +▁vielfältig 1 +amerikanische 1 +▁entwickelten 1 +▁breit 1 +▁Wichtig 1 +sländer 1 +decken 1 +MO 1 +schreiten 1 +urteil 1 +rechnen 1 +▁Brasilien 1 +▁gespeichert 1 +▁This 1 +▁verkaufen 1 +Kom 1 +▁inzwischen 1 +▁egal 1 +▁afrikanische 1 +▁aktive 1 +▁verringern 1 +▁spezifische 1 +▁Radio 1 +gekommen 1 +▁glaubt 1 +34 1 +stimme 1 +▁Tim 1 +▁echten 1 +bewegung 1 +▁Plattform 1 +▁Grand 1 +с 1 +▁Eisenbahn 1 +▁kreativ 1 +▁Herkunft 1 +zukommen 1 +anti 1 +zehn 1 +▁Dauer 1 +▁Vorgehen 1 +▁beraten 1 +▁Übereinstimmung 1 +Ro 1 +stimmung 1 +▁Verbesserungen 1 +▁Mark 1 +▁beträchtlich 1 +▁Viertel 1 +▁Altstadt 1 +verbrauch 1 +▁umfassenden 1 +lett 1 +▁Operation 1 +▁Real 1 +stätte 1 +▁Drittens 1 +▁Maschine 1 +▁emp 1 +▁wert 1 +geld 1 +tschaftswachstum 1 +▁Geschäfte 1 +▁französische 1 +dank 1 +▁sterben 1 +▁Festlegung 1 +▁Ausführung 1 +▁hinein 1 +▁Widerstand 1 +Mobil 1 +▁durchzuführen 1 +PT 1 +▁$ 1 +▁mod 1 +raub 1 +▁Besorgnis 1 +▁Leid 1 +▁mögen 1 +uc 1 +▁begrüßt 1 +TV 1 +finden 1 +▁Veranstaltungen 1 +@ 1 +freundliche 1 +The 1 +▁letzter 1 +verwaltung 1 +▁sichern 1 +▁offene 1 +▁NATO 1 +▁vorbereitet 1 +▁fordere 1 +▁Zahlungs 1 +▁Original 1 +▁Gebühr 1 +▁Top 1 +▁verbringen 1 +▁Agrarpolitik 1 +▁aufnehmen 1 +flo 1 +▁Gefängnis 1 +▁1996 1 +▁Schäden 1 +▁berühmten 1 +▁bestand 1 +Tech 1 +▁Update 1 +▁bekam 1 +▁unterzeichnet 1 +▁UNO 1 +▁Ol 1 +° 1 +▁sofern 1 +▁nieder 1 +▁italienischen 1 +datei 1 +▁eure 1 +▁einerseits 1 +So 1 +▁betreffenden 1 +▁Aktien 1 +sprach 1 +kunft 1 +▁gerichtet 1 +version 1 +▁Urteil 1 
+▁verleihen 1 +▁chinesischen 1 +▁dahin 1 +24 1 +fahrer 1 +▁starten 1 +64 1 +▁Inseln 1 +▁Arbeitsplatz 1 +▁Opposition 1 +▁entsteht 1 +▁Experiment 1 +▁Spannung 1 +▁westlichen 1 +▁anwesend 1 +38 1 +▁hinweg 1 +▁Fern 1 +nehmer 1 +▁verhindert 1 +▁Frist 1 +▁Kreditkarte 1 +▁Angebote 1 +▁Hauptstadt 1 +uliert 1 +menge 1 +▁Details 1 +▁Front 1 +▁Freizeit 1 +▁beträgt 1 +95 1 +▁detailliert 1 +▁kontrolliert 1 +▁südlich 1 +▁IN 1 +▁Emp 1 +zieren 1 +▁riesige 1 +▁Verbreitung 1 +▁Verringerung 1 +▁Übernachtung 1 +▁durchführen 1 +▁Deck 1 +wächst 1 +▁Tool 1 +fragt 1 +▁Stimmen 1 +bestimmung 1 +▁Genehmigung 1 +▁geprüft 1 +▁irgendwie 1 +▁tausend 1 +▁Zivil 1 +▁industrielle 1 +▁künftigen 1 +alismus 1 +▁einheitliche 1 +ih 1 +web 1 +к 1 +▁gelegt 1 +▁Mari 1 +▁Städten 1 +▁behalten 1 +xi 1 +▁amtierende 1 +▁Center 1 +▁enorm 1 +pot 1 +▁Roboter 1 +▁staatlichen 1 +▁gewissen 1 +▁Verein 1 +ctor 1 +fremd 1 +▁Grafik 1 +▁Werbung 1 +▁Marketing 1 +lösen 1 +▁strategische 1 +ami 1 +▁Verantwortlich 1 +▁gefragt 1 +▁Chancen 1 +▁geehrte 1 +cro 1 +▁Speise 1 +ME 1 +▁ehrgeizig 1 +▁sonstige 1 +▁Faktoren 1 +▁Touristen 1 +▁Stamm 1 +hung 1 +▁Haftung 1 +▁passieren 1 +▁schicken 1 +ato 1 +abkommen 1 +▁Satz 1 +▁Stahl 1 +wald 1 +▁hervorragend 1 +ont 1 +▁Bewusstsein 1 +basierte 1 +ola 1 +ativen 1 +▁Mc 1 +avi 1 +phase 1 +quote 1 +▁Kat 1 +füll 1 +▁ursprünglichen 1 +▁nachhaltigen 1 +▁uneingeschränkt 1 +wohl 1 +▁Ursprung 1 +▁Ehre 1 +▁beruht 1 +▁reduzieren 1 +▁Brief 1 +greif 1 +▁Menü 1 +schlüssel 1 +▁hielt 1 +Projekt 1 +▁Ernährung 1 +▁produzieren 1 +▁hinzufügen 1 +▁Drittländern 1 +▁zusätzlichen 1 +▁Intervention 1 +hebung 1 +play 1 +▁Solar 1 +▁auftreten 1 +ierende 1 +▁Journalist 1 +▁doppelt 1 +▁letztendlich 1 +verbindung 1 +▁schwach 1 +▁besch 1 +▁= 1 +▁Paul 1 +▁Zusatz 1 +ulation 1 +▁para 1 +cio 1 +kontroll 1 +▁letztlich 1 +▁schrieb 1 +östlich 1 +▁effiziente 1 +▁erzeugen 1 +IE 1 +▁ausgesprochen 1 +▁Ausführungen 1 +▁heran 1 +phi 1 +▁zugun 1 +tausch 1 +kunde 1 +37 1 +▁aufgefordert 1 +format 1 +▁niedriger 1 +▁untersucht 1 +▁verteilt 1 +kleid 1 +ierenden 1 +lip 1 +▁überein 1 +▁DVD 1 +▁Sein 1 +Mo 1 +▁Terror 1 +falt 1 +abend 1 +▁gelernt 1 +▁nutzt 1 +▁solange 1 +▁zurückzu 1 +vol 1 +▁Sicherung 1 +auto 1 +ñ 1 +▁Plenum 1 +ström 1 +▁sur 1 +▁bemühen 1 +▁Bahnhof 1 +▁bezug 1 +▁ausgeführt 1 +din 1 +ging 1 +▁hervorheben 1 +▁Lernen 1 +▁Spaß 1 +▁Ganze 1 +lesen 1 +▁Inflation 1 +fluss 1 +▁bezahlt 1 +▁Fitness 1 +▁Spar 1 +▁Gespräch 1 +▁Browser 1 +▁angepasst 1 +▁Respekt 1 +▁Beste 1 +▁Stock 1 +▁Volkswirtschaft 1 +ical 1 +96 1 +▁effizient 1 +äumt 1 +verletzung 1 +DER 1 +▁Marke 1 +▁2020 1 +▁Anstieg 1 +▁beschäftigen 1 +▁spanischen 1 +▁Steuern 1 +▁Kranken 1 +▁Tochter 1 +▁gratulieren 1 +▁dauert 1 +ifiziert 1 +schluß 1 +▁erleichtern 1 +Bahn 1 +▁Menschheit 1 +▁Kunde 1 +▁beglückwünschen 1 +▁Zucker 1 +pal 1 +▁palästinensisch 1 +▁bestätigen 1 +▁jemals 1 +▁führende 1 +▁Strecke 1 +GE 1 +▁renoviert 1 +▁freiwillig 1 +stil 1 +antwort 1 +▁Digital 1 +▁Schlafzimmer 1 +gin 1 +▁geprägt 1 +97 1 +center 1 +▁wiederholen 1 +▁zerstört 1 +▁untersuchen 1 +▁Hor 1 +kret 1 +š 1 +läufe 1 +▁Todes 1 +▁geistige 1 +team 1 +▁zivil 1 +▁Stoffe 1 +schütt 1 +ç 1 +▁Zusammenhalt 1 +▁Tele 1 +gramm 1 +zü 1 +UR 1 +Lo 1 +▁neuesten 1 +▁Will 1 +▁herrscht 1 +komp 1 +17 1 +▁Forum 1 +▁Fünf 1 +▁berufliche 1 +▁soweit 1 +▁Kohle 1 +Bo 1 +▁Appartement 1 +▁Deswegen 1 +▁entspannen 1 +rühr 1 +▁zuständigen 1 +▁Bulgarien 1 +▁Räume 1 +▁Küsten 1 +▁aussehen 1 +Strategie 1 +▁hand 1 +▁zufolge 1 +▁Detail 1 +bal 1 +▁KMU 1 +▁knapp 1 +▁enger 1 +▁Sohn 1 +м 1 +▁unverzüglich 1 +anstalt 1 +▁Sind 1 +▁Beschlüsse 1 +▁Freunden 1 +16 1 +lohn 1 +▁Mission 1 
+▁illegal 1 +▁Apartments 1 +▁Vorschlägen 1 +▁Mandat 1 +▁Bibliothek 1 +▁miss 1 +▁Gästen 1 +▁400 1 +▁Geschmack 1 +▁parlamentarisch 1 +▁kontrollieren 1 +▁Pläne 1 +49 1 +ado 1 +liegenden 1 +▁Hohen 1 +▁englische 1 +▁liefert 1 +beauftragte 1 +▁jener 1 +▁Cap 1 +▁Pakistan 1 +▁Audio 1 +▁Anfragen 1 +▁Metall 1 +▁Schlag 1 +▁Sehen 1 +▁scheinen 1 +amt 1 +ivität 1 +lenk 1 +▁israelisch 1 +fährt 1 +▁Beamte 1 +▁Nachbarn 1 +konferenz 1 +▁Event 1 +füllt 1 +▁Sicher 1 +Tra 1 +85 1 +▁Arbeitsplätzen 1 +▁Renn 1 +wärts 1 +reicht 1 +▁Pop 1 +▁Gerechtigkeit 1 +▁fein 1 +europa 1 +assi 1 +▁Gesetzes 1 +▁beteiligen 1 +dor 1 +▁Ferienwohnung 1 +▁Rechtsgrundlage 1 +▁nahezu 1 +grün 1 +gibt 1 +skonferenz 1 +▁verschiedener 1 +NS 1 +▁Defizit 1 +▁Aktionsplan 1 +▁ersetzt 1 +▁Sä 1 +▁offenbar 1 +▁gefährdet 1 +▁getrennt 1 +Di 1 +branche 1 +▁par 1 +▁Kabel 1 +▁qualifizierte 1 +auch 1 +▁höheren 1 +▁kurzfristig 1 +▁bilaterale 1 +▁Akteure 1 +ermaßen 1 +▁Ausgabe 1 +cor 1 +▁Aktiv 1 +▁32 1 +Ar 1 +▁ferner 1 +▁anschließend 1 +▁Potenzial 1 +▁weitgehend 1 +▁Martin 1 +▁Live 1 +▁Abfall 1 +▁Bemerkungen 1 +▁sucht 1 +06 1 +▁umfassend 1 +azi 1 +▁Angel 1 +▁Verarbeitung 1 +▁spreche 1 +ruhe 1 +▁ausgeschlossen 1 +▁gedacht 1 +▁erstaunlich 1 +▁schwedische 1 +▁Wien 1 +▁verdienen 1 +▁Costa 1 +▁Lieferung 1 +▁Nahrung 1 +▁elegante 1 +▁Sha 1 +▁Training 1 +olo 1 +▁moralische 1 +▁zwingen 1 +meldung 1 +ward 1 +▁konkret 1 +ident 1 +39 1 +▁Zypern 1 +▁Ausstattung 1 +▁Automobil 1 +option 1 +iff 1 +steck 1 +TO 1 +gewinn 1 +▁konsequent 1 +▁Haar 1 +▁ändert 1 +günstig 1 +▁spezialisiert 1 +▁Konsequenzen 1 +▁Mrd 1 +SL 1 +▁Renten 1 +IR 1 +▁Rohstoff 1 +▁Anzeige 1 +table 1 +freund 1 +trifft 1 +▁Drittel 1 +▁Ozean 1 +Vi 1 +▁Einschränkung 1 +▁Kroatien 1 +schieben 1 +▁Eingang 1 +▁blieb 1 +förder 1 +▁beendet 1 +▁Großteil 1 +mail 1 +▁sexuelle 1 +konzept 1 +▁jenen 1 +68 1 +▁Existenz 1 +▁Gepäck 1 +▁Einbeziehung 1 +▁verstärken 1 +zunehmen 1 +▁van 1 +▁spezifischen 1 +▁Konvent 1 +▁Tru 1 +zusammenarbeiten 1 +▁Sammlung 1 +▁Entlastung 1 +▁Armee 1 +▁Unterhaltung 1 +läuft 1 +▁geäußert 1 +▁zugrunde 1 +▁Sex 1 +▁Außenpolitik 1 +▁ansehen 1 +▁Bur 1 +▁Darum 1 +▁Konzern 1 +▁unterscheiden 1 +▁Mach 1 +55 1 +▁qualitativ 1 +fertig 1 +▁Tauch 1 +tüm 1 +zusehen 1 +▁schlägt 1 +▁Obama 1 +▁Augenblick 1 +spräsident 1 +▁Bereitstellung 1 +▁Erlebnis 1 +dar 1 +▁kontinuierlich 1 +▁generell 1 +▁gültig 1 +institut 1 +▁schlagen 1 +fahr 1 +personal 1 +▁Plat 1 +lot 1 +▁Weiß 1 +▁Pra 1 +44 1 +▁Kampagne 1 +▁Brücke 1 +put 1 +▁Bush 1 +▁Bahn 1 +▁schrecklich 1 +▁Gleichstellung 1 +▁dargestellt 1 +à 1 +▁1990 1 +▁Power 1 +▁ausgezeichneten 1 +▁alternative 1 +Euro 1 +HO 1 +hilf 1 +steig 1 +▁Beach 1 +▁Geburt 1 +▁beigetragen 1 +▁Va 1 +▁Kreuz 1 +waffen 1 +▁beantworten 1 +▁Öko 1 +▁Dich 1 +▁TED 1 +schätze 1 +bedingt 1 +▁Balkan 1 +▁Medikamente 1 +▁Wild 1 +▁multi 1 +▁Kanada 1 +Vor 1 +Ja 1 +▁Erfüllung 1 +▁sitzen 1 +▁Bord 1 +strategie 1 +▁Maus 1 +bla 1 +▁Reduzierung 1 +▁Bühne 1 +▁Zusätzlich 1 +check 1 +▁Nachbarschaft 1 +▁Theorie 1 +▁sozialer 1 +▁Weltkrieg 1 +bio 1 +▁Ausgangspunkt 1 +▁Kopenhagen 1 +▁Widerspruch 1 +▁Gewährleistung 1 +wandel 1 +ografische 1 +▁Phänomen 1 +▁Casino 1 +▁zukünftige 1 +▁Ruf 1 +▁mächtig 1 +▁Wett 1 +▁Ersatz 1 +▁realistisch 1 +CD 1 +mul 1 +▁geltenden 1 +▁verbindlich 1 +▁Islam 1 +▁arabische 1 +▁behaupten 1 +▁nachdenken 1 +▁Bildschirm 1 +▁normalerweise 1 +Länder 1 +sache 1 +turm 1 +¤ 1 +▁verteidigen 1 +▁1995 1 +Was 1 +▁Verlauf 1 +flüge 1 +mensch 1 +▁relevant 1 +losigkeit 1 +▁Hochschul 1 +▁Summe 1 +▁Staates 1 +heil 1 +▁Provinz 1 +▁systematisch 1 +ji 1 +▁Kenn 1 +▁Insbesondere 1 +▁Schengen 1 +▁Gute 1 +▁russischen 1 +▁Können 1 
+▁dramatisch 1 +büro 1 +▁Wand 1 +▁berühmte 1 +▁Wechsel 1 +▁potenziell 1 +▁global 1 +▁page 1 +▁Charta 1 +▁Übrigen 1 +▁leichter 1 +▁genügend 1 +▁Besonders 1 +▁ordnungsgemäß 1 +▁schönsten 1 +▁that 1 +▁Autobahn 1 +▁Katastrophe 1 +kampf 1 +▁chinesische 1 +▁traf 1 +▁ergänzen 1 +▁Ungarn 1 +▁gespielt 1 +▁Lä 1 +▁lieber 1 +▁Glaubwürdigkeit 1 +▁übernimmt 1 +▁Kap 1 +IF 1 +▁Free 1 +▁Badezimmer 1 +▁eingegangen 1 +▁geboten 1 +▁genetisch 1 +leiter 1 +▁Schalt 1 +▁beeinflussen 1 +cur 1 +▁my 1 +beziehungen 1 +▁45 1 +geschickt 1 +▁comp 1 +▁entscheidender 1 +▁bloß 1 +▁organisieren 1 +▁vermitteln 1 +▁äußern 1 +▁hoffentlich 1 +▁Marktes 1 +design 1 +▁Laut 1 +▁Ärzte 1 +▁betreiben 1 +heiß 1 +graben 1 +▁rechtlichen 1 +▁beeindruckend 1 +▁präzise 1 +▁angewandt 1 +media 1 +▁Finanzkrise 1 +▁Klar 1 +lieferung 1 +block 1 +▁Hostels 1 +eisen 1 +gängig 1 +läufig 1 +▁Verteilung 1 +▁Anlass 1 +▁Steuerung 1 +▁Öffnung 1 +▁Pension 1 +▁gewiss 1 +▁eingebracht 1 +▁2011 1 +trauen 1 +▁Netze 1 +▁Treib 1 +▁Darstellung 1 +▁Schwimm 1 +▁Tabelle 1 +▁Schlacht 1 +86 1 +▁Abhängigkeit 1 +▁München 1 +ón 1 +▁Gegenstand 1 +▁ersetzen 1 +größe 1 +▁Frankfurt 1 +▁Subsidiarität 1 +▁hell 1 +▁Bemerkung 1 +NA 1 +▁bislang 1 +prüfung 1 +13 1 +▁froh 1 +▁vorlegen 1 +▁mittleren 1 +▁Leitung 1 +▁installieren 1 +▁Schloss 1 +▁vorherige 1 +name 1 +▁Eurozone 1 +▁Stufe 1 +▁Betrag 1 +▁Beschränkung 1 +▁selben 1 +▁Fla 1 +pool 1 +▁überprüft 1 +▁rechtliche 1 +▁Flüge 1 +Distribution 1 +▁verbindet 1 +▁ethnische 1 +erklärung 1 +▁Versammlung 1 +▁gehe 1 +ausschuß 1 +▁Haben 1 +graph 1 +▁Pflege 1 +oma 1 +▁Kra 1 +klasse 1 +▁Rechnungshof 1 +▁gebeten 1 +▁Komponenten 1 +▁IP 1 +▁Vorhaben 1 +▁Satelliten 1 +▁sammeln 1 +möglich 1 +lässlich 1 +» 1 +For 1 +▁einheitlichen 1 +arbeiter 1 +abhängig 1 +▁deswegen 1 +Ä 1 +▁Faktor 1 +д 1 +▁Wörter 1 +▁Bearbeitung 1 +88 1 +▁ansprechen 1 +▁konfrontiert 1 +▁Wälder 1 +▁Trotzdem 1 +▁ergibt 1 +▁Oper 1 +▁Schlaf 1 +▁Migration 1 +regen 1 +elf 1 +▁dankbar 1 +▁allzu 1 +▁Befugnisse 1 +▁ehemalige 1 +▁serviert 1 +wähl 1 +▁Van 1 +▁berichtet 1 +geist 1 +▁Seele 1 +▁ländliche 1 +ible 1 +zentren 1 +43 1 +▁Import 1 +viertel 1 +# 1 +▁kümmern 1 +▁Tabak 1 +▁Pool 1 +offen 1 +▁Kopie 1 +▁Übertragung 1 +rechnet 1 +▁Wahrscheinlich 1 +▁vorrangig 1 +einkommen 1 +bund 1 +▁Investitions 1 +europäischen 1 +ruhig 1 +▁momentan 1 +▁Belastung 1 +▁versichern 1 +Par 1 +Zu 1 +samt 1 +▁erreichbar 1 +verhältnis 1 +SI 1 +▁Dusche 1 +▁Sto 1 +DI 1 +▁Registrierung 1 +▁Wiederaufbau 1 +grenze 1 +char 1 +▁reagiert 1 +4-0 1 +▁aktuell 1 +▁Av 1 +▁Luxus 1 +▁Regierungschefs 1 +geworfen 1 +▁Photo 1 +▁überwinden 1 +ava 1 +▁aufbauen 1 +▁Ägypten 1 +versicherung 1 +speicher 1 +▁Wetter 1 +prüf 1 +anta 1 +▁könnt 1 +▁Kollegin 1 +▁mobile 1 +▁Quellen 1 +▁aussprechen 1 +geschlagen 1 +ektor 1 +▁besorgt 1 +▁nett 1 +▁betrieben 1 +▁gewöhnlich 1 +▁hilfreich 1 +▁Fußball 1 +▁Sekunden 1 +ated 1 +▁Fremd 1 +▁EZB 1 +▁AKP 1 +▁this 1 +▁zukünftigen 1 +▁Bereitschaft 1 +▁fantastisch 1 +leib 1 +▁Produktivität 1 +▁Ana 1 +▁Ambiente 1 +▁stehenden 1 +sstrategie 1 +▁Brand 1 +▁nunmehr 1 +▁Anmerkung 1 +▁Nah 1 +▁einzusetzen 1 +opp 1 +gegriffen 1 +▁Hindernisse 1 +▁Peter 1 +wettbewerbsfähig 1 +▁bestimmter 1 +▁Überlegungen 1 +▁verboten 1 +▁müsste 1 +▁Russ 1 +▁buchen 1 +▁Block 1 +▁effizienter 1 +landschaft 1 +col 1 +▁hol 1 +▁österreichische 1 +▁Prag 1 +▁Spitzen 1 +▁bedroht 1 +▁Durchschnitt 1 +45 1 +▁strategischen 1 +▁Überleben 1 +▁medi 1 +▁Kriminalität 1 +▁erschien 1 +▁Balkon 1 +23 1 +č 1 +▁stammt 1 +▁lernt 1 +▁steigende 1 +gesehen 1 +view 1 +▁erwähnen 1 +▁Textil 1 +kapazität 1 +▁Ansprüche 1 +▁Erwägung 1 +▁radikal 1 +▁fördert 1 +▁Poker 1 
+stufe 1 +▁Arzneimittel 1 +▁Tonnen 1 +▁Nachhaltigkeit 1 +▁einzigartig 1 +▁fehlende 1 +binden 1 +▁retten 1 +▁Nehmen 1 +▁total 1 +▁Katastrophen 1 +pra 1 +quer 1 +lagerung 1 +▁nachdrücklich 1 +▁Reservierung 1 +▁Wunder 1 +shop 1 +▁eingehalten 1 +schreibung 1 +gearbeitet 1 +▁bemerkt 1 +▁solide 1 +▁großem 1 +▁Mexiko 1 +familie 1 +▁Mel 1 +▁aktivieren 1 +▁Sound 1 +▁Festival 1 +Ha 1 +ielle 1 +urlaub 1 +▁stammen 1 +▁Spitze 1 +▁Gottes 1 +▁aufgebaut 1 +▁Premierminister 1 +klick 1 +Mitglied 1 +IG 1 +▁geraten 1 +▁EADS 1 +▁Pack 1 +47 1 +▁Ball 1 +▁Ungleichheit 1 +▁verletzt 1 +▁Meister 1 +▁weist 1 +▁Mitgliedschaft 1 +▁Ablehnung 1 +▁intelligente 1 +▁Anwender 1 +▁Burg 1 +▁empfangen 1 +gehoben 1 +greift 1 +schlüsse 1 +rack 1 +▁Bestimmung 1 +▁müssten 1 +▁inklusive 1 +▁heißen 1 +▁Vari 1 +▁gekennzeichnet 1 +▁Wesentlichen 1 +▁erweitern 1 +qual 1 +67 1 +66 1 +▁Pferd 1 +22 1 +▁Feind 1 +▁Vorlage 1 +geschnitten 1 +п 1 +▁Miet 1 +▁anbelangt 1 +▁rechtzeitig 1 +▁still 1 +▁Stiftung 1 +▁Verpackung 1 +▁Dreh 1 +UNG 1 +▁anschließen 1 +▁Konto 1 +▁engagiert 1 +▁gewaltige 1 +▁Finger 1 +▁Schwer 1 +boot 1 +▁einbezogen 1 +▁Viel 1 +▁Budget 1 +▁Profil 1 +▁wonach 1 +nütz 1 +▁Vermögen 1 +▁heutzutage 1 +▁anfangen 1 +▁Ankunft 1 +▁Fahrt 1 +90 1 +▁beabsichtigt 1 +▁Ausschüsse 1 +▁Rückkehr 1 +▁Förder 1 +gefühl 1 +▁selber 1 +▁Schuh 1 +▁Station 1 +▁Leit 1 +Enterprise 1 +▁großzügig 1 +methode 1 +▁angelegt 1 +▁verringert 1 +▁WLAN 1 +dorf 1 +▁Investoren 1 +▁virtuelle 1 +▁Parameter 1 +▁Passwort 1 +▁Beide 1 +ographische 1 +▁Manche 1 +▁Bern 1 +▁baut 1 +▁Bay 1 +schütz 1 +schreck 1 +▁herzustellen 1 +▁Fax 1 +– 1 +wuchs 1 +▁aufgeführt 1 +▁passende 1 +agentur 1 +Daten 1 +motor 1 +▁Konsultation 1 +hersteller 1 +▁unzureichend 1 +▁fiel 1 +▁Trag 1 +▁Persönlichkeit 1 +▁vorgestellt 1 +ppel 1 +▁erfreut 1 +▁Entfernung 1 +▁Börse 1 +▁Luxemburg 1 +▁Verzögerung 1 +▁Diplom 1 +▁Gerade 1 +▁Million 1 +▁Alltag 1 +▁Wochenende 1 +brenn 1 +▁mache 1 +▁Group 1 +▁Ya 1 +▁gezielt 1 +AU 1 +weiß 1 +▁mündliche 1 +▁Strände 1 +beruf 1 +▁mitteilen 1 +beschreibung 1 +▁Sozialdemokrat 1 +▁angewendet 1 +▁abstimmen 1 +▁beenden 1 +▁Unterricht 1 +▁italienische 1 +gewicht 1 +▁Beschwerde 1 +▁behauptet 1 +▁Minute 1 +Bericht 1 +▁Allianz 1 +▁Arzt 1 +▁vertrauen 1 +messe 1 +▁anhand 1 +▁vorliegende 1 +kräftig 1 +▁Support 1 +▁Belgien 1 +▁richtet 1 +rechnung 1 +5% 1 +▁formuliert 1 +▁Wall 1 +▁Resultat 1 +ivilgesellschaft 1 +schuss 1 +AV 1 +▁Freitag 1 +▁Ingenieur 1 +▁gründlich 1 +gehört 1 +▁Anders 1 +▁davor 1 +artikel 1 +▁England 1 +othek 1 +▁derzeitige 1 +▁Außenminister 1 +▁vermutlich 1 +▁Müll 1 +▁Ausflüge 1 +▁überrascht 1 +▁Schmerz 1 +▁nennt 1 +▁geliefert 1 +▁befasst 1 +▁Schon 1 +▁eingeleitet 1 +▁empfohlen 1 +▁Dir 1 +planung 1 +LO 1 +▁Donnerstag 1 +▁Entdeckung 1 +▁antworten 1 +▁Fang 1 +21 1 +▁errichtet 1 +▁Hunger 1 +Über 1 +bedarf 1 +▁Impuls 1 +▁erstmals 1 +▁Plätze 1 +▁Geschenk 1 +▁Erstellung 1 +▁Schweizer 1 +▁Abschaffung 1 +▁Tarif 1 +zieh 1 +reit 1 +afrika 1 +▁Motiv 1 +2000 1 +▁legal 1 +Control 1 +▁Leiter 1 +▁Back 1 +▁zurückzuführen 1 +pflege 1 +▁Anschluss 1 +▁bewältigen 1 +▁jetzigen 1 +▁unterstreichen 1 +▁Schnell 1 +▁gefallen 1 +dokument 1 +▁Alkohol 1 +▁physisch 1 +dienstleistungen 1 +▁Schicksal 1 +▁Wärme 1 +▁vertraut 1 +bezogene 1 +▁Normal 1 +▁Sauna 1 +faktor 1 +▁Erzeuger 1 +▁vorzulegen 1 +▁überlassen 1 +Energieeffizienz 1 +Produkt 1 +▁abschließend 1 +glichkeit 1 +ova 1 +sstaatlichkeit 1 +▁Hardware 1 +▁japanische 1 +▁Vorausschau 1 +▁Acht 1 +▁ökonomische 1 +▁mittlere 1 +Com 1 +ergebnisse 1 +▁kombiniert 1 +▁Archiv 1 +▁Email 1 +▁Richter 1 +▁beruhen 1 +Seite 1 +sprozesses 1 +gäste 1 +getriebe 1 +▁Schaf 
1 +▁Richtig 1 +▁Barroso 1 +▁hervorgehoben 1 +ponent 1 +brach 1 +▁Präsentation 1 +▁geregelt 1 +DR 1 +▁Hohe 1 +▁dasselbe 1 +PE 1 +▁fliegen 1 +▁Erwerb 1 +▁eingeschränkt 1 +▁legitim 1 +▁Katalog 1 +▁behoben 1 +▁Syrien 1 +▁bisherigen 1 +vision 1 +▁Immer 1 +staatliche 1 +load 1 +▁Freizügigkeit 1 +▁separat 1 +▁Festplatte 1 +Service 1 +minute 1 +▁Länge 1 +sendung 1 +▁digital 1 +ici 1 +▁kalt 1 +dringlich 1 +deutsch 1 +ausgaben 1 +EU 1 +▁Lateinamerika 1 +ifizieren 1 +holz 1 +count 1 +▁verwende 1 +▁Lohn 1 +▁technisch 1 +verständnis 1 +▁draußen 1 +▁polnische 1 +▁senken 1 +▁verbracht 1 +▁definieren 1 +Mark 1 +▁anwenden 1 +▁Prüf 1 +41 1 +▁Know 1 +▁Phil 1 +▁Lobby 1 +▁Vereinfachung 1 +▁begleitet 1 +▁Serbien 1 +▁Reinigung 1 +▁Gegner 1 +▁ausgezeichnet 1 +▁Nu 1 +▁erfordern 1 +spiegel 1 +▁beeinflusst 1 +▁Hamburg 1 +krebs 1 +▁erhoben 1 +▁gelingt 1 +source 1 +▁Ideal 1 +automat 1 +▁hohem 1 +▁Vizepräsident 1 +▁Putin 1 +gestaltung 1 +▁Ratifizierung 1 +anzeige 1 +▁Forscher 1 +▁Konsum 1 +▁Vortrag 1 +▁gestärkt 1 +▁Transaktion 1 +72 1 +regierung 1 +▁Madrid 1 +▁erschaffen 1 +▁schätzen 1 +▁Verbrauch 1 +▁Beteiligten 1 +▁angebracht 1 +56 1 +▁hierbei 1 +▁Hof 1 +▁Extrem 1 +▁Kohäsion 1 +▁erteilt 1 +▁Mauer 1 +▁Zone 1 +▁einzuführen 1 +▁bemerkenswert 1 +▁Versand 1 +▁Umsatz 1 +Staaten 1 +steht 1 +▁Baum 1 +▁registriert 1 +▁gelesen 1 +▁Volkes 1 +bewußt 1 +▁Finnland 1 +schliess 1 +▁Oh 1 +▁vereint 1 +▁Miß 1 +2001 1 +behandlung 1 +SCH 1 +▁Show 1 +▁geräumig 1 +▁Mängel 1 +▁PHP 1 +gemäß 1 +SV 1 +▁solch 1 +Vertrag 1 +charakter 1 +▁festzulegen 1 +ländische 1 +▁Erhaltung 1 +▁Protest 1 +▁Geh 1 +▁gestattet 1 +erstattung 1 +▁wünsche 1 +rüstung 1 +▁bedanken 1 +▁Card 1 +EX 1 +FA 1 +katastrophe 1 +▁Palästinenser 1 +▁kennt 1 +▁festzustellen 1 +heilig 1 +light 1 +▁Apartment 1 +▁teilt 1 +vorschriften 1 +gelegen 1 +▁Il 1 +▁Herbst 1 +▁gebraucht 1 +▁angeblich 1 +▁Verfasser 1 +TER 1 +▁Santa 1 +▁Behinderung 1 +ifizierung 1 +▁Tom 1 +front 1 +▁Michael 1 +▁unentgeltlich 1 +▁beschränken 1 +▁Rauch 1 +▁speichern 1 +abilität 1 +▁Cast 1 +sieg 1 +card 1 +▁Bevor 1 +flex 1 +▁Einwanderer 1 +:// 1 +OP 1 +▁inmitten 1 +▁Lied 1 +78 1 +▁ermutigen 1 +▁Volksgesundheit 1 +züglich 1 +▁ISO 1 +küste 1 +▁Ausrüstung 1 +▁Feier 1 +Version 1 +▁nachzudenken 1 +▁Seminar 1 +▁David 1 +46 1 +▁Dynamik 1 +▁Beseitigung 1 +▁präsentieren 1 +förderung 1 +wässer 1 +ía 1 +▁dauern 1 +▁bewährt 1 +▁gewähren 1 +▁verstehe 1 +Mitgliedstaaten 1 +▁Arbeitgeber 1 +▁Mechanismen 1 +▁erinnere 1 +▁angekündigt 1 +termin 1 +▁Schulung 1 +▁Sobald 1 +schwäche 1 +umpf 1 +ifikation 1 +▁nördlich 1 +▁Formulierung 1 +▁nehme 1 +▁Nachteil 1 +▁Unterkünfte 1 +imp 1 +▁Filter 1 +Sie 1 +▁Befehl 1 +kühl 1 +▁jüngste 1 +у 1 +▁Modernisierung 1 +runde 1 +▁schaut 1 +▁Südafrika 1 +▁Panorama 1 +▁geschätzt 1 +▁Soldaten 1 +stitution 1 +▁elektrische 1 +▁Fertigung 1 +box 1 +▁gründet 1 +▁Belarus 1 +▁Wählen 1 +\ 1 +▁bezeichnen 1 +anspruch 1 +▁erarbeitet 1 +▁herunterladen 1 +▁Fahrrad 1 +boden 1 +▁Bekannt 1 +schlaf 1 +▁Maria 1 +▁stattfindet 1 +▁Vorgehensweise 1 +▁nu 1 +▁begründet 1 +ippen 1 +HA 1 +▁wohnen 1 +RS 1 +▁Media 1 +book 1 +schirm 1 +verlust 1 +35 1 +▁unterliegen 1 +▁Mond 1 +finanzierung 1 +▁Australien 1 +▁Zerstörung 1 +▁gelangt 1 +länge 1 +▁Pilot 1 +▁Willkommen 1 +▁Weltwirtschaft 1 +▁Unsicherheit 1 +▁hierfür 1 +▁koordiniert 1 +▁entfernen 1 +▁Nichtraucher 1 +lässt 1 +▁grundlegend 1 +▁Verzeichnis 1 +▁Konfiguration 1 +▁versteht 1 +haupt 1 +▁bereitet 1 +▁Voraus 1 +â 1 +initiative 1 +▁vergleichbar 1 +▁nord 1 +▁Dol 1 +Server 1 +▁Global 1 +▁Schnee 1 +geschwindigkeit 1 +▁Hügel 1 +2010 1 +strom 1 +▁aktualisiert 1 +▁begeistert 1 +▁aufzubauen 
1 +▁Opti 1 +▁Game 1 +Software 1 +gefügt 1 +▁Sonntag 1 +zulegen 1 +▁Resort 1 +▁Höchst 1 +▁Heizung 1 +▁Zauber 1 +ographie 1 +▁läßt 1 +operation 1 +▁Begleit 1 +▁Käufer 1 +Kohäsionspolitik 1 +▁Somit 1 +▁modified 1 +▁Profit 1 +▁übrig 1 +▁Ausbau 1 +▁Nation 1 +▁Vorrang 1 +▁unnötig 1 +▁Passagier 1 +▁anzuwenden 1 +51 1 +terra 1 +▁sichergestellt 1 +▁enthalt 1 +zuführen 1 +83 1 +schön 1 +übergreifende 1 +▁Entwickler 1 +▁öffnet 1 +▁steigern 1 +▁anzunehmen 1 +71 1 +flüssig 1 +▁Fremdenverkehr 1 +82 1 +▁Flor 1 +▁bewerten 1 +76 1 +▁Freihandel 1 +motiv 1 +79 1 +▁Straßburg 1 +▁Rechner 1 +74 1 +▁Zeile 1 +▁laufenden 1 +▁Währungsunion 1 +▁Gewerkschaft 1 +verarbeitung 1 +spricht 1 +stärke 1 +Wirtschaftskrise 1 +▁Emissions 1 +500 1 +back 1 +▁linken 1 +▁Berechnung 1 +▁Demokraten 1 +▁Karriere 1 +▁Abendessen 1 +schrieb 1 +▁Metro 1 +Strahl 1 +wachstum 1 +▁Konzentration 1 +II 1 +technische 1 +▁gesammelt 1 +▁Sat 1 +atik 1 +▁Poly 1 +trans 1 +87 1 +▁wechseln 1 +59 1 +▁Reserve 1 +garten 1 +phrase 1 +NO 1 +kamera 1 +essel 1 +▁raus 1 +▁Händler 1 +« 1 +▁passt 1 +Ex 1 +restaurant 1 +▁anspruchsvolle 1 +▁personenbezogen 1 +▁örtliche 1 +▁Betreiber 1 +wel 1 +fordert 1 +ständigkeit 1 +▁Alpen 1 +liberal 1 +73 1 +▁Washington 1 +▁veröffentlichen 1 +▁Senkung 1 +fenster 1 +▁Kuba 1 +▁vermittelt 1 +▁USB 1 +▁hingegen 1 +▁Koch 1 +▁http 1 +▁diplomatische 1 +▁Möbel 1 +▁250 1 +▁Gefangene 1 +▁russische 1 +▁Tau 1 +▁mehrfach 1 +▁Dokumentation 1 +▁Professor 1 +▁Beachtung 1 +▁mangelnde 1 +▁Meilen 1 +kunst 1 +zutreten 1 +litz 1 +▁Petition 1 +▁Gibt 1 +wünsch 1 +Auf 1 +▁have 1 +▁Datum 1 +▁herauszufinden 1 +▁Truppen 1 +vertreter 1 +54 1 +▁vereinbar 1 +▁Einfach 1 +77 1 +▁Todesstrafe 1 +▁ergänzt 1 +▁Strasse 1 +▁Fakten 1 +▁Stoff 1 +tauchen 1 +▁gehabt 1 +▁IWF 1 +▁Beurteilung 1 +▁Bestell 1 +▁Andererseits 1 +▁Human 1 +▁Ausarbeitung 1 +nehmbar 1 +Stunden 1 +Finde 1 +histori 1 +▁HIV 1 +▁befürchte 1 +▁passen 1 +▁1994 1 +bewusst 1 +Bomb 1 +▁finanzieren 1 +▁irgendwelche 1 +▁mittlerweile 1 +gebrochen 1 +▁Rettung 1 +▁armen 1 +▁schien 1 +▁Äußerung 1 +kredit 1 +serie 1 +▁beweisen 1 +ú 1 +▁Souveränität 1 +▁Schlusselwort 1 +rechtliche 1 +▁Säule 1 +▁geheim 1 +führ 1 +▁Lesen 1 +software 1 +▁gleichermaßen 1 +▁Tanz 1 +▁erheben 1 +▁Ereignis 1 +58 1 +▁beseitigen 1 +▁Aufsicht 1 +▁Betrug 1 +▁freut 1 +▁Auseinandersetz 1 +▁Übernahme 1 +▁schädlich 1 +▁teuer 1 +▁geschafft 1 +▁Street 1 +▁lebendig 1 +▁Entspannung 1 +▁Newsletter 1 +anforderungen 1 +▁Steigerung 1 +▁Saison 1 +▁strikt 1 +dämm 1 +▁Java 1 +zip 1 +höhe 1 +▁Verwirklichung 1 +zweig 1 +▁Revision 1 +heirat 1 +▁portugiesische 1 +▁Etwa 1 +▁bevorzugt 1 +▁schrittweise 1 +▁berechnet 1 +▁Modern 1 +▁erwiesen 1 +▁Nachmittag 1 +▁Bäume 1 +▁Nov 1 +Spiel 1 +▁gemein 1 +schränke 1 +▁Leidenschaft 1 +▁Null 1 +▁Ausgleich 1 +▁fürchte 1 +strecken 1 +Work 1 +▁objektiv 1 +gefertigt 1 +▁diverse 1 +logie 1 +▁Auflösung 1 +▁grammatisch 1 +inisterpräsident 1 +▁Gehminuten 1 +▁Referenz 1 +▁gewidmet 1 +▁Manchmal 1 +▁überwachen 1 +▁realer 1 +▁Empfang 1 +▁Parkplatz 1 +▁basierend 1 +▁erkennt 1 +max 1 +zeile 1 +▁Kreativität 1 +▁angefangen 1 +▁versteckt 1 +down 1 +▁Finanzielle 1 +raff 1 +▁demokratisch 1 +vil 1 +▁denjenigen 1 +▁kontaktieren 1 +gebäude 1 +zauber 1 +▁Jean 1 +▁Referendum 1 +▁erworben 1 +▁ausüben 1 +▁Anmeldung 1 +▁extra 1 +kannt 1 +▁Klang 1 +▁soeben 1 +▁Allgemein 1 +▁Überblick 1 +▁Genf 1 +▁Spektrum 1 +ING 1 +▁Spiegel 1 +▁Osteuropa 1 +▁plus 1 +▁Minderheit 1 +▁Parkplätze 1 +Rechtsvorschrift 1 +▁herstellen 1 +verhältnisse 1 +Schnittstelle 1 +gast 1 +• 1 +▁einstimmig 1 +▁luxuriöse 1 +vermögen 1 +▁Okay 1 +▁Kennzeichnung 1 +▁Umweltfragen 1 +▁indische 1 
+▁verursachen 1 +▁Paar 1 +Prozess 1 +▁festlegen 1 +я 1 +▁faszinierend 1 +▁seltsam 1 +▁Luftverkehr 1 +▁vous 1 +gesundheit 1 +▁getötet 1 +▁fassen 1 +▁Wartung 1 +▁vergleichen 1 +▁Robert 1 +▁bedauerlich 1 +▁langjährige 1 +▁Zeitplan 1 +▁künstlich 1 +städte 1 +▁jedenfalls 1 +fotograf 1 +▁beinhalten 1 +komplex 1 +▁aktiviert 1 +▁Lieblings 1 +▁Pub 1 +▁Logik 1 +▁Errichtung 1 +versammlung 1 +präg 1 +▁nachhaltig 1 +Gesundheitswesen 1 +▁Wahrnehmung 1 +▁Wirksamkeit 1 +▁investiert 1 +▁Massage 1 +„ 1 +platte 1 +69 1 +▁Ausrichtung 1 +▁Oliven 1 +▁Bewohner 1 +▁niederländische 1 +▁ungarische 1 +▁starb 1 +▁Schlussel 1 +▁Zivilisation 1 +▁Philosophie 1 +▁Chef 1 +▁Ecke 1 +▁600 1 +▁besagt 1 +▁Konstruktion 1 +▁Szene 1 +▁leistet 1 +▁berichten 1 +▁Arbeitskräfte 1 +▁Qui 1 +▁sparen 1 +▁abgegeben 1 +▁Betreuung 1 +▁Tendenz 1 +güter 1 +muster 1 +▁übermittelt 1 +▁älteste 1 +▁Schwäche 1 +▁Betriebssystem 1 +▁beobachtet 1 +▁beantwortet 1 +▁Lücke 1 +▁Acc 1 +▁beschreiben 1 +▁Ratschlag 1 +▁Ergänzung 1 +▁islamische 1 +▁Geheimnis 1 +▁Steuerzahler 1 +▁Erkenntnisse 1 +▁vorsichtig 1 +▁Versprechen 1 +Sicherheitsrat 1 +▁körperlich 1 +▁Umstände 1 +Entschlossenheit 1 +▁beinahe 1 +▁Mitgefühl 1 +taucht 1 +▁geltend 1 +wichtig 1 +▁endet 1 +nbetracht 1 +▁Flash 1 +▁Mitentscheidung 1 +84 1 +▁entstand 1 +Gruppe 1 +händler 1 +▁Abteilung 1 +▁Rock 1 +▁Jack 1 +Und 1 +▁verschiedenste 1 +▁eignet 1 +▁einzelstaatlich 1 +▁Island 1 +▁nein 1 +LAN 1 +▁DIE 1 +▁bedeutsam 1 +analyse 1 +▁müßte 1 +▁dunkle 1 +▁oftmals 1 +stellbar 1 +▁Mechanismus 1 +▁Substanz 1 +▁Psycho 1 +TEN 1 +▁wünscht 1 +fertigung 1 +musik 1 +▁Niemand 1 +▁startet 1 +▁bewahren 1 +▁fuer 1 +▁Benutzung 1 +▁Flüchtlings 1 +▁Tschechische 1 +▁anzubieten 1 +▁geblieben 1 +▁Bilanz 1 +▁Johann 1 +fekt 1 +opfer 1 +▁Internetseite 1 +▁Ausweitung 1 +werbung 1 +▁Konvention 1 +▁dargelegt 1 +53 1 +▁super 1 +▁begegnen 1 +bücher 1 +▁Luftfahrt 1 +▁PPE 1 +GO 1 +▁Rückgang 1 +▁Saal 1 +▁derselben 1 +▁Mehrwertsteuer 1 +▁steckt 1 +EIN 1 +verteilung 1 +▁Bezeichnung 1 +▁German 1 +▁Logo 1 +▁vorzunehmen 1 +▁Bestätigung 1 +▁Tragödie 1 +▁ungewöhnlich 1 +▁use 1 +▁Durchsetzung 1 +▁Architekt 1 +▁Visa 1 +ION 1 +▁House 1 +konsum 1 +mitte 1 +▁oberste 1 +▁Zulassung 1 +▁gesichert 1 +▁Tasche 1 +währung 1 +ffel 1 +Up 1 +▁Aktivität 1 +▁Dänemark 1 +▁Transfer 1 +nack 1 +▁mutig 1 +63 1 +▁GAP 1 +▁Anhörung 1 +Benutzer 1 +▁spanische 1 +▁anerkennen 1 +erzeugnisse 1 +▁steigt 1 +bevölkerung 1 +▁PDF 1 +nutzung 1 +▁könne 1 +institutionelle 1 +▁Präsenz 1 +▁Futter 1 +▁Frühjahr 1 +▁Hör 1 +î 1 +▁planen 1 +schärfe 1 +▁Log 1 +▁George 1 +▁Route 1 +▁Vogel 1 +▁irgendwo 1 +▁Terroristen 1 +verkauf 1 +transfer 1 +leiste 1 +▁Beschluß 1 +▁zeitlich 1 +▁Flughäfen 1 +▁Interview 1 +pend 1 +28 1 +▁Qual 1 +▁Aufzug 1 +▁sensible 1 +▁Besondere 1 +▁künstlerische 1 +29 1 +▁Nutz 1 +terrasse 1 +bekämpfung 1 +▁schließt 1 +▁Volkspartei 1 +Umstrukturierung 1 +▁lenken 1 +▁süd 1 +▁stattgefunden 1 +▁Konkurrenz 1 +ressourcen 1 +suite 1 +▁Gewässer 1 +▁exakt 1 +▁erfasst 1 +▁Tempo 1 +▁fortgesetzt 1 +▁anschauen 1 +57 1 +▁Zwar 1 +▁Listings 1 +▁Genießen 1 +81 1 +▁Fett 1 +▁pour 1 +▁logisch 1 +▁Französisch 1 +dimension 1 +▁Café 1 +▁Segel 1 +▁Eröffnung 1 +▁entdeckst 1 +▁Krankenhaus 1 +▁hinzufugen 1 +▁Dörfer 1 +▁multilaterale 1 +ê 1 +▁städtische 1 +▁Vorbehalt 1 +Technologie 1 +▁spannend 1 +wettbewerb 1 +▁Ablauf 1 +▁einmalige 1 +▁Bürokratie 1 +▁Jedoch 1 +▁Danach 1 +hundert 1 +▁Differenz 1 +▁Fassung 1 +signal 1 +▁verrückt 1 +▁Ufer 1 +▁verpflichten 1 +▁anhaltende 1 +käufe 1 +..."... 
1 +▁Empfänger 1 +▁Rezession 1 +▁ablehnen 1 +▁Schwester 1 +▁800 1 +masse 1 +▁teure 1 +▁stattdessen 1 +▁hinzugefügt 1 +▁Ostsee 1 +ject 1 +▁Christen 1 +▁geleitet 1 +▁realisiert 1 +▁2012 1 +▁traurig 1 +matische 1 +Source 1 +▁Kunststoff 1 +einhalb 1 +▁Prodi 1 +Richtlinie 1 +Agent 1 +mauer 1 +▁abgesehen 1 +▁gestartet 1 +▁JavaScript 1 +▁User 1 +▁umfassen 1 +Bereich 1 +▁Beobachter 1 +▁aussieht 1 +read 1 +▁anpassen 1 +ough 1 +▁Investition 1 +▁Zunahme 1 +▁Quoten 1 +kampagne 1 +verschmutzung 1 +▁ECU 1 +▁gestatten 1 +▁Missbrauch 1 +flieg 1 +anwendung 1 +▁Konjunktur 1 +▁Zuverlässigkeit 1 +▁Dringlichkeit 1 +▁Vordergrund 1 +▁Störung 1 +www 1 +▁views 1 +▁DNA 1 +▁Arbeitszeit 1 +▁Skype 1 +▁stärksten 1 +▁belgische 1 +▁Erfindung 1 +▁Mischung 1 +▁geschah 1 +▁Salz 1 +▁Beschäftigte 1 +▁interessieren 1 +▁Erreichung 1 +▁Klarheit 1 +▁erbaut 1 +ific 1 +▁Samstag 1 +KOM 1 +äuschen 1 +▁irgend 1 +▁wild 1 +▁iranische 1 +▁Key 1 +▁keineswegs 1 +▁drastisch 1 +▁gäbe 1 +▁verarbeitet 1 +zwecke 1 +▁Feststellung 1 +▁Hunderte 1 +▁Urheberrecht 1 +Schlussfolgerung 1 +▁Brau 1 +telefon 1 +revolution 1 +▁widmen 1 +▁zulässig 1 +▁Wichtigkeit 1 +▁schlicht 1 +▁Kenntnisse 1 +▁Vertretung 1 +▁genehmigt 1 +▁Verfolgung 1 +▁Unfall 1 +spalt 1 +ologen 1 +▁kompetent 1 +▁Nizza 1 +▁1980 1 +▁privat 1 +▁verabschieden 1 +▁beibehalten 1 +▁Figur 1 +▁unterbreitet 1 +▁Privatsphäre 1 +basis 1 +Dollar 1 +samkeit 1 +▁Stich 1 +▁Song 1 +▁erleichtert 1 +oberfläche 1 +▁Variante 1 +ministerium 1 +▁Gemüse 1 +▁kollektive 1 +Musik 1 +▁zustande 1 +▁Brennstoff 1 +▁bestmöglich 1 +▁Einnahmen 1 +▁Zahn 1 +▁elegant 1 +▁Schicht 1 +wirtschaftliche 1 +▁Erkrankung 1 +▁dreht 1 +▁Museen 1 +▁testen 1 +▁tragisch 1 +▁Besonderheit 1 +▁Kyoto 1 +▁nannte 1 +▁strukturelle 1 +▁Behörde 1 +besitz 1 +normen 1 +unterschied 1 +▁vereinfacht 1 +▁Offenheit 1 +▁unerlässlich 1 +avec 1 +▁Rahmenprogramm 1 +▁Welthandels 1 +▁Bruder 1 +▁Beobachtung 1 +92 1 +▁parallel 1 +▁Gaza 1 +▁Einschätzung 1 +▁Life 1 +gesteuert 1 +▁NICHT 1 +▁benachteiligt 1 +▁verankert 1 +▁Visum 1 +▁Schlussfolger 1 +▁integrieren 1 +▁Qualifikation 1 +▁bewertet 1 +▁Bereits 1 +▁Fundament 1 +▁statistisch 1 +▁Stress 1 +▁Übersicht 1 +santräge 1 +▁Evolution 1 +stöße 1 +▁sozusagen 1 +risiko 1 +▁Lieferanten 1 +▁Masse 1 +▁Palette 1 +modul 1 +▁Aktualisierung 1 +▁Begründung 1 +▁respektieren 1 +♪ 1 +▁Direktor 1 +frau 1 +▁Zurück 1 +disziplin 1 +mechanismus 1 +▁Prozeß 1 +▁Gentoo 1 +▁Album 1 +▁Fabrik 1 +Jahr 1 +lux 1 +mechanismen 1 +▁Mathematik 1 +▁gerechtfertigt 1 +▁stilvoll 1 +▁Lärm 1 +reiz 1 +glas 1 +bleiben 1 +▁exzellente 1 +bilanz 1 +oire 1 +▁Stattdessen 1 +▁verurteilen 1 +▁Fertig 1 +▁territoriale 1 +▁überraschend 1 +▁Airbus 1 +▁beschleunigen 1 +▁entworfen 1 +▁längst 1 +▁Nachdruck 1 +▁Kürze 1 +▁Slowakei 1 +▁Saint 1 +cord 1 +▁Gegenstände 1 +Video 1 +▁Blumen 1 +▁Siedlung 1 +therapie 1 +▁Flotte 1 +▁Mittelmeerraum 1 +▁drücken 1 +▁aufregend 1 +▁Aufhebung 1 +▁Bosnien 1 +▁Tennis 1 +schmerz 1 +▁Hochzeit 1 +▁Arbeitsgruppe 1 +▁Etikett 1 +▁Hostelsclub 1 +▁Kaiser 1 +▁bedienen 1 +▁Fotografie 1 +metall 1 +▁Ausbeutung 1 +▁Wiederherstell 1 +2009 1 +Frühstücksbuffet 1 +▁erstklassige 1 +▁anzupassen 1 +▁modernste 1 +▁Alternativ 1 +▁auswählen 1 +▁Kürzung 1 +klima 1 +board 1 +kriterien 1 +▁Innenstadt 1 +▁Finanzmärkte 1 +umweltfreundlich 1 +▁Kernel 1 +▁Demonstration 1 +▁eröffnen 1 +▁gegebenenfalls 1 +▁1993 1 +▁Laser 1 +▁Rassismus 1 +▁ärmsten 1 +▁emotional 1 +▁studiert 1 +saison 1 +formular 1 +Sowohl 1 +▁Beförderung 1 +▁unbekannt 1 +gestalt 1 +Abkommen 1 +▁Liberalen 1 +▁ignoriert 1 +möglichkeit 1 +▁beschreibt 1 +▁Kreatur 1 +hancengleichheit 1 +▁Galerie 1 +▁Fuss 1 
+▁Treibhausgas 1 +▁umgekehrt 1 +▁verschaffen 1 +▁Wolf 1 +▁1992 1 +ibili 1 +▁Entdecke 1 +▁Mobiltelefon 1 +▁respektiert 1 +▁zwölf 1 +▁America 1 +▁Hans 1 +ador 1 +stunden 1 +irurg 1 +▁Brenn 1 +ž 1 +plikation 1 +▁Gegenwart 1 +93 1 +▁Füße 1 +dichte 1 +▁legislative 1 +▁Häfen 1 +▁Schnitt 1 +museum 1 +schätzung 1 +nachfolgend 1 +▁Halt 1 +▁inspiriert 1 +▁Erzeugung 1 +▁Reparatur 1 +▁Fortsetzung 1 +▁erörtert 1 +▁nukleare 1 +▁Prävention 1 +▁Florenz 1 +▁Mehrwert 1 +ь 1 +▁Innerhalb 1 +▁anzuzeigen 1 +▁staff 1 +▁Check 1 +▁Ferr 1 +▁Schwelle 1 +▁Applikation 1 +▁unzählige 1 +▁Sprech 1 +▁fortsetzen 1 +vereinbarung 1 +▁Verkehrsmittel 1 +stift 1 +▁Marokko 1 +▁Anwesenheit 1 +▁Fokus 1 +▁Anregung 1 +▁Komplexität 1 +▁Verhältnisse 1 +leuchten 1 +reihe 1 +papier 1 +▁Lokal 1 +▁100% 1 +▁Grünbuch 1 +▁Elite 1 +▁vergangen 1 +▁Pfad 1 +ô 1 +▁begleiten 1 +▁Wechselkurs 1 +▁Bonus 1 +▁Berater 1 +versuch 1 +wid 1 +▁Thomas 1 +▁Reichtum 1 +▁begangen 1 +aufgaben 1 +▁Physik 1 +▁zugute 1 +5.000 1 +Lösung 1 +Гј 1 +▁kommunizieren 1 +▁verwandelt 1 +▁Problematik 1 +studie 1 +▁NRO 1 +Regierung 1 +Gipfel 1 +▁tschechische 1 +▁Vereinigung 1 +▁Folgendes 1 +▁Angestellte 1 +▁wofür 1 +▁Dublin 1 +▁Abfälle 1 +▁Solche 1 +größte 1 +з 1 +▁Anhänger 1 +▁Ausgrenzung 1 +▁herausragende 1 +▁Erarbeitung 1 +Paket 1 +▁Weiterbildung 1 +regulierung 1 +profil 1 +▁Abenteuer 1 +▁Konvergenz 1 +▁flexibel 1 +▁vorsieht 1 +▁Venedig 1 +▁unterbrochen 1 +▁Echtzeit 1 +▁Behauptung 1 +hai 1 +▁verantwortungs 1 +eiße 1 +▁spüren 1 +▁problemlos 1 +▁damalige 1 +▁2013 1 +2003 1 +▁Inkrafttreten 1 +▁Magazin 1 +▁minimal 1 +▁Statut 1 +▁bekräftigt 1 +▁gekauft 1 +2008 1 +▁Nigeria 1 +gipfel 1 +▁bearbeitet 1 +entscheidung 1 +▁Therapie 1 +▁Verabschiedung 1 +▁erwerben 1 +▁Black 1 +▁Erscheinung 1 +gezeichnet 1 +geschaltet 1 +▁Insgesamt 1 +▁unterscheidet 1 +▁weibliche 1 +berichterstatter 1 +kämpfe 1 +▁Orientierung 1 +▁Gipfeltreffen 1 +Expert 1 +▁eingeräumt 1 +▁natur 1 +▁Wüste 1 +flüsse 1 +▁Virus 1 +klagt 1 +▁basieren 1 +▁etabliert 1 +▁maßgeblich 1 +emissionen 1 +messung 1 +▁Zusage 1 +▁stecken 1 +sprachige 1 +▁III 1 +nähe 1 +▁gewinnt 1 +▁fließen 1 +▁erlangen 1 +▁Korrektur 1 +▁bürgerliche 1 +▁Gewähr 1 +sequenz 1 +▁Mütter 1 +▁Geltung 1 +▁verwandeln 1 +▁ethische 1 +blatt 1 +▁Extra 1 +groß 1 +▁sofortige 1 +ergebnis 1 +94 1 +Annehmlichkeiten 1 +▁Hauptbahnhof 1 +▁kritisiert 1 +▁Talent 1 +▁Eigenschaft 1 +89 1 +▁Gedanke 1 +▁jünger 1 +▁permanent 1 +veranstaltung 1 +tempo 1 +Team 1 +Modus 1 +▁glaubwürdig 1 +▁verwirklichen 1 +▁Griff 1 +▁Ähnlich 1 +Politik 1 +▁Disziplin 1 +stürzt 1 +▁vermieden 1 +▁Höhle 1 +▁Zusammensetzung 1 +▁arbeits 1 +▁finanziell 1 +reinigung 1 +С 1 +▁Rubrik 1 +▁rechtfertigen 1 +▁vermute 1 +▁Indikator 1 +▁schwerwiegende 1 +▁reichhaltig 1 +▁Einzelheiten 1 +ichtraucherzonen 1 +1⁄4 1 +▁Scheitern 1 +▁Zuerst 1 +▁liberale 1 +▁Michel 1 +▁Verlängerung 1 +▁atemberaubend 1 +▁verbreiten 1 +▁produktiv 1 +▁Wohnzimmer 1 +Ebene 1 +teilnehmer 1 +Point 1 +▁gefährden 1 +▁Hektar 1 +missbrauch 1 +▁Lebensqualität 1 +▁füllen 1 +å 1 +▁Entschädigung 1 +▁Wahrung 1 +▁gepflegt 1 +gespräch 1 +▁kenne 1 +▁Ministerrat 1 +▁University 1 +▁reibungslos 1 +▁Batterie 1 +▁Knochen 1 +▁überwiegend 1 +lapp 1 +mütig 1 +▁Eigentümer 1 +▁ordentlich 1 +erzeugung 1 +▁Studium 1 +Generalsekretär 1 +▁Mittwoch 1 +▁Spaziergang 1 +▁Rußland 1 +▁Depression 1 +▁Weiterhin 1 +▁ignorieren 1 +▁zugestimmt 1 +▁Bananen 1 +anbieter 1 +▁Möchte 1 +▁Rasse 1 +▁Kraftstoff 1 +▁Performance 1 +▁1991 1 +▁beizutragen 1 +▁Energiequellen 1 +räder 1 +▁klug 1 +▁Unterdrückung 1 +▁gravierend 1 +▁zerstören 1 +▁erstreckt 1 +▁romantische 1 +periode 1 +Bürger 1 
+▁malerische 1 +objekt 1 +́ 1 +▁Bewältigung 1 +▁gebilligt 1 +▁verliehen 1 +▁europaweit 1 +▁universell 1 +▁zeige 1 +störung 1 +▁Rezept 1 +▁Literatur 1 +▁Zürich 1 +▁appelliere 1 +▁fundamental 1 +▁zurückkehren 1 +▁gelingen 1 +▁angewiesen 1 +schuh 1 +` 1 +zuziehen 1 +effizient 1 +▁Zufriedenheit 1 +▁bemerken 1 +reinigt 1 +▁herrschen 1 +ão 1 +[ 1 +2002 1 +▁Libanon 1 +▁allmählich 1 +▁verknüpft 1 +▁Kleidung 1 +▁Mittler 1 +▁teilzunehmen 1 +▁markiert 1 +▁Geräusch 1 +▁Airport 1 +house 1 +▁zwanzig 1 +temperatur 1 +brecher 1 +▁Heimatland 1 +▁Mitgliedsländer 1 +▁Kämpfe 1 +▁Logistik 1 +▁lädt 1 +▁Gesichtspunkt 1 +benutzer 1 +▁Tibet 1 +▁spiegelt 1 +▁1989 1 +▁good 1 +Funktion 1 +▁Gedächtnis 1 +▁verwaltet 1 +▁verschwinden 1 +Dienst 1 +▁Entstehung 1 +▁beschleunigt 1 +▁Ordner 1 +▁Dampf 1 +▁unterwegs 1 +▁Gewebe 1 +schmutz 1 +▁inhaltlich 1 +▁Ermittlung 1 +▁löschen 1 +▁agieren 1 +▁Gelände 1 +Format 1 +plattform 1 +▁auswärtige 1 +▁obligatorisch 1 +▁terroristische 1 +klausel 1 +▁überarbeitet 1 +▁zielt 1 +▁Angehörige 1 +▁vorliegt 1 +▁Doppelzimmer 1 +¶ 1 +▁Stabilisierung 1 +medizin 1 +▁Spanisch 1 +▁Koordination 1 +stuhl 1 +▁dänische 1 +feuer 1 +steigerung 1 +verbindlich 1 +▁Legislativ 1 +▁Bewerber 1 +▁touristische 1 +▁Zusammenbruch 1 +sammlung 1 +▁verzichten 1 +▁GNU 1 +▁asiatische 1 +motion 1 +‘ 1 +▁Spezialitäten 1 +empfindlich 1 +▁12.00 1 +Demokratisierung 1 +▁Erdbeben 1 +▁Vergnügen 1 +▁schreibt 1 +▁löst 1 +▁Inhaber 1 +sphäre 1 +▁Act 1 +▁Übung 1 +▁Moskau 1 +▁Rechenschaft 1 +▁heftig 1 +▁berührt 1 +demokratische 1 +▁stützen 1 +▁Mühe 1 +▁Genuss 1 +▁Köln 1 +▁1,5 1 +▁Funktionalität 1 +▁Territorium 1 +▁angestrebt 1 +▁vereinfachen 1 +▁Domain 1 +▁Taxi 1 +▁benannt 1 +▁konzipiert 1 +▁Zuschauer 1 +▁scheinbar 1 +filter 1 +▁Ernst 1 +ł 1 +▁Maastricht 1 +▁Palästina 1 +status 1 +smethoden 1 +▁aufgerufen 1 +▁Straßenverkehr 1 +▁Diagnose 1 +▁Monopol 1 +▁location 1 +▁Chemie 1 +▁Royal 1 +▁Maßstab 1 +▁MySQL 1 +▁theoretisch 1 +▁entspannt 1 +▁Maxim 1 +Fotograf 1 +Institut 1 +▁brutal 1 +▁Vorredner 1 +▁winzig 1 +▁Höhepunkt 1 +▁analysieren 1 +▁kohärent 1 +▁Salzburg 1 +sozial 1 +kompetenz 1 +▁Desktop 1 +▁Getreide 1 +▁leitet 1 +▁administrative 1 +▁spektakulär 1 +▁vorübergehend 1 +▁folglich 1 +▁Register 1 +▁Medikament 1 +förmig 1 +▁erläutern 1 +▁genießt 1 +beginn 1 +szusammenarbeit 1 +▁nutzbar 1 +▁psychisch 1 +▁Universal 1 +▁Nerven 1 +▁Plastik 1 +präsidenten 1 +arquis 1 +▁trug 1 +▁Produzenten 1 +▁Ausübung 1 +▁Folter 1 +2006 1 +ы 1 +▁Betroffenen 1 +▁Zustellbett 1 +▁geholfen 1 +▁beiträgt 1 +▁Brüder 1 +▁Schätzung 1 +▁Drittstaaten 1 +defizit 1 +▁zahlt 1 +▁unverzichtbar 1 +▁Workshop 1 +▁herkömmliche 1 +▁Gross 1 +▁herausfinden 1 +▁vorbereiten 1 +▁parti 1 +▁Präsidium 1 +▁ausgedehnt 1 +▁erörtern 1 +▁Bedienung 1 +▁gehandelt 1 +▁verschieden 1 +▁Fülle 1 +▁grob 1 +▁kauft 1 +▁erläutert 1 +▁Devisen 1 +▁unterbreiten 1 +▁gewann 1 +▁Adria 1 +▁Ökosystem 1 +▁erachte 1 +▁Bakterien 1 +▁visuelle 1 +▁vorbehalten 1 +experiment 1 +▁Einladung 1 +▁empfängt 1 +▁befand 1 +▁beurteilen 1 +▁Riesen 1 +▁iPhone 1 +▁resultieren 1 +▁Verkäufer 1 +▁getestet 1 +▁denselben 1 +▁Mögliche 1 +▁schließe 1 +▁traditionell 1 +▁Gestalt 1 +▁Interpretation 1 +▁ratifiziert 1 +firmen 1 +▁scharf 1 +nbsp 1 +▁zitiere 1 +wolle 1 +gültig 1 +▁Elektrizität 1 +▁Atlantik 1 +▁droht 1 +▁Kalender 1 +betrug 1 +▁Lounge 1 +▁empfinde 1 +▁Chemikalien 1 +▁enttäuscht 1 +▁transatlantisch 1 +▁Anfänge 1 +▁verliert 1 +▁schützt 1 +▁befreien 1 +▁bezogen 1 +▁sportlich 1 +‚ 1 +г 1 +▁Ankündigung 1 +▁irgendwann 1 +▁mittelalterlich 1 +▁verhandeln 1 +▁aufrichtig 1 +▁Flasche 1 +▁Jugoslawien 1 +▁Taiwan 1 +▁Trennung 1 +▁zutiefst 1 
+▁Centre 1 +▁Milliarde 1 +▁Außer 1 +▁Galaxie 1 +▁Rotary 1 +▁bedauere 1 +▁Wertpapier 1 +Artikel 1 +▁bürokratische 1 +▁Konsumenten 1 +ст 1 +senkung 1 +Administration 1 +▁Intelligenz 1 +▁beeinträchtigt 1 +▁Infektion 1 +ausstattung 1 +▁nenne 1 +▁Argentinien 1 +▁Subventionen 1 +▁Spuren 1 +▁Überraschung 1 +▁regeln 1 +▁Züge 1 +▁Pharma 1 +schöpfung 1 +▁Geburtstag 1 +▁Elektronik 1 +▁schenken 1 +▁gründen 1 +kirche 1 +besuch 1 +▁Motto 1 +▁stetig 1 +▁Vorgaben 1 +▁Diktatur 1 +▁Verstärkung 1 +▁inakzeptabel 1 +▁stoppen 1 +▁School 1 +▁insofern 1 +höfe 1 +▁verheerend 1 +▁Vögel 1 +▁sanft 1 +Design 1 +lücke 1 +▁weiss 1 +▁Rahmenbedingung 1 +  1 +▁Potential 1 +▁fügt 1 +▁Nordkorea 1 +▁Spezies 1 +▁ungeachtet 1 +▁Quadrat 1 +▁Rhein 1 +▁Sechs 1 +▁Navigation 1 +▁definitiv 1 +▁musikalische 1 +▁absurd 1 +▁Weißbuch 1 +▁entschied 1 +▁Blue 1 +▁Publikation 1 +▁erkennbar 1 +▁kostengünstig 1 +▁kommunistische 1 +▁trennen 1 +▁Libyen 1 +▁Sowjetunion 1 +▁bedauern 1 +club 1 +lateral 1 +▁jahrelang 1 +▁worauf 1 +▁sinken 1 +Temp 1 +▁Weihnachts 1 +▁Wohlbefinden 1 +▁römische 1 +▁Anweisungen 1 +flotte 1 +fleisch 1 +kreuz 1 +ansprüche 1 +▁irakische 1 +▁Charles 1 +▁einheimische 1 +video 1 +spruch 1 +▁Foundation 1 +▁Investment 1 +▁kompakt 1 +▁Meldung 1 +▁offenkundig 1 +▁interaktive 1 +▁geniessen 1 +▁bevorstehenden 1 +▁Mineral 1 +Fischereipolitik 1 +▁Alexander 1 +▁Ungleichgewicht 1 +▁schlug 1 +▁Besatzung 1 +▁Dutzend 1 +▁melden 1 +▁Warnung 1 +і 1 +▁Frequenz 1 +▁Kompromiß 1 +▁Norwegen 1 +▁Früchte 1 +▁wünschenswert 1 +▁Rindfleisch 1 +▁multinationale 1 +▁Monitor 1 +▁vorteilhaft 1 +▁Index 1 +Modell 1 +potenzial 1 +▁entscheidet 1 +▁horizontal 1 +▁Toilette 1 +sammenzuarbeiten 1 +lizenz 1 +▁informelle 1 +▁zukünftig 1 +ökonom 1 +▁verlängert 1 +▁Gärten 1 +summe 1 +▁Bedingung 1 +▁analysiert 1 +▁Vietnam 1 +leuchtet 1 +brücke 1 +town 1 +▁Nuklear 1 +▁Litauen 1 +▁fossile 1 +▁eingebaut 1 +▁problematisch 1 +▁klingt 1 +härte 1 +Plug 1 +protokoll 1 +▁Aluminium 1 +▁Mazedonien 1 +▁Slowenien 1 +▁Richard 1 +▁Ultra 1 +▁isoliert 1 +Internet 1 +▁Stabilitätspakt 1 +▁Vermarktung 1 +übertragung 1 +feindliche 1 +▁renommierte 1 +▁verschärft 1 +▁Überarbeitung 1 +▁Aufklärung 1 +▁ansonsten 1 +▁fühle 1 +▁operative 1 +▁beseitigt 1 +▁motiviert 1 +▁bescheiden 1 +▁blind 1 +▁Turnier 1 +kündigt 1 +▁Integrität 1 +▁verwalten 1 +§ 1 +▁Erdöl 1 +▁trocken 1 +▁wählt 1 +erfahrung 1 +▁Illusion 1 +▁optimiert 1 +▁AIDS 1 +▁Flagge 1 +▁jeweilige 1 +▁abzielen 1 +▁Frucht 1 +▁ernannt 1 +▁muslimische 1 +▁Governance 1 +▁Protein 1 +й 1 +▁identifizieren 1 +▁ewig 1 +konflikt 1 +▁Zeichnung 1 +▁Anleger 1 +▁Kanäle 1 +▁gesundheitliche 1 +wärme 1 +€ 1 +ografie 1 +▁Korea 1 +▁which 1 +▁Freuen 1 +▁gefolgt 1 +▁Kohlenstoff 1 +▁Swiss 1 +infrastruktur 1 +▁finnische 1 +▁Netto 1 +Gestatten 1 +▁korrigieren 1 +▁zeitgenössische 1 +▁Klinik 1 +Commerce 1 +streifen 1 +angehörige 1 +▁Köpfe 1 +▁Hotelsafe 1 +bearbeitung 1 +▁erfunden 1 +▁liebt 1 +▁Schwellenländer 1 +▁Adobe 1 +verantwortlich 1 +vorsitzende 1 +▁Indonesien 1 +▁Schokolade 1 +▁jüdische 1 +▁Ökonomie 1 +erlebnis 1 +▁abzielt 1 +▁Facebook 1 +▁Sorgfalt 1 +▁versprochen 1 +▁Optimierung 1 +szeitraum 1 +▁Schlußfolgerung 1 +▁bewaffnete 1 +▁lustig 1 +▁töten 1 +▁auszuüben 1 +wörter 1 +Bild 1 +▁Laptop 1 +▁Mallorca 1 +▁akzeptabel 1 +▁Erfordernisse 1 +· 1 +▁potentiell 1 +▁Chinesen 1 +▁Materie 1 +Engine 1 +▁Folie 1 +schöpfen 1 +▁Budapest 1 +▁profitiert 1 +▁Periode 1 +▁Gemäß 1 +▁Ernennung 1 +▁Kloster 1 +▁klinische 1 +▁aktualisieren 1 +▁tödlich 1 +▁vertraulich 1 +▁Münz 1 +▁Kohärenz 1 +▁empfiehlt 1 +▁äußert 1 +▁Reihenfolge 1 +▁durfte 1 +▁Tempel 1 +▁Zuhause 1 +▁flach 1 +Karte 1 
+▁breakfast 1 +▁erfreulich 1 +▁Ideologie 1 +praxis 1 +▁blockiert 1 +▁Schauspieler 1 +Preis 1 +erkennung 1 +▁Einfluß 1 +▁Millennium 1 +▁Privileg 1 +▁zwangsläufig 1 +▁Gummi 1 +flücht 1 +Partner 1 +▁eindrucksvoll 1 +aufrechterhalten 1 +▁Kabine 1 +▁familiär 1 +▁Muslime 1 +▁keinesfalls 1 +▁dünn 1 +▁LateRooms 1 +▁Albanien 1 +▁Annäherung 1 +▁Behinderte 1 +▁Evaluierung 1 +▁Molekül 1 +▁Tunesien 1 +▁Quartal 1 +Christdemokraten 1 +▁Liege 1 +ý 1 +▁verschwunden 1 +ć 1 +▁Teufel 1 +▁einzubeziehen 1 +▁äußere 1 +▁College 1 +▁Effektivität 1 +▁Alpha 1 +▁Komplettpreis 1 +▁Assoziierung 1 +▁Sauerstoff 1 +▁Thailand 1 +▁gescheitert 1 +▁Bezirk 1 +▁Könnte 1 +▁hübsch 1 +▁Befreiung 1 +schmelz 1 +Automat 1 +▁Befürchtung 1 +▁aggressiv 1 +▁erforschen 1 +▁berühmt 1 +ière 1 +▁Legitimität 1 +▁Nichtregierungs 1 +belastung 1 +computer 1 +Haushalt 1 +▁Kalifornien 1 +▁Träger 1 +▁strafrechtlich 1 +▁unberührt 1 +▁größtenteils 1 +▁Animation 1 +▁Content 1 +▁verstoßen 1 +gesteckt 1 +zusammen 1 +▁Vielmehr 1 +▁zügig 1 +▁spätestens 1 +▁Neuigkeiten 1 +▁verfasst 1 +▁rief 1 +Ausnahmeregelung 1 +völker 1 +▁Föderation 1 +▁Erdgas 1 +style 1 +▁kriminelle 1 +▁Parallel 1 +▁feiern 1 +▁Surf 1 +▁Wikitravel 1 +б 1 +▁Toleranz 1 +▁beantragt 1 +▁Ängste 1 +geholt 1 +▁ideologisch 1 +dauerlicherweise 1 +▁Cocktail 1 +▁Errungenschaft 1 +▁koordinieren 1 +▁eigenständige 1 +▁Spalte 1 +▁gelb 1 +▁Simbabwe 1 +▁fortgeschritten 1 +theorie 1 +▁Autonomie 1 +▁steuerliche 1 +ð 1 +ч 1 +▁Stockholm 1 +▁Vulkan 1 +▁Instabilität 1 +▁verschoben 1 +siedlung 1 +▁ausgebaut 1 +▁Saudi 1 +widrig 1 +▁Boutique 1 +▁Organismen 1 +▁kümmert 1 +▁Security 1 +script 1 +▁Puerto 1 +▁Emotionen 1 +clus 1 +▁Piazza 1 +▁Löhne 1 +▁primär 1 +Gleichbehandlung 1 +Protokoll 1 +ı 1 +▁vorzubereiten 1 +▁ausgeübt 1 +brüche 1 +Taste 1 +▁gesondert 1 +▁Prognose 1 +▁umstritten 1 +▁befreit 1 +schlepp 1 +▁Patient 1 +ysikalisch 1 +philosoph 1 +▁Implementierung 1 +▁komfortabel 1 +▁original 1 +▁männliche 1 +▁konventionelle 1 +▁bekräftigen 1 +hydr 1 +▁Verweis 1 +unwahrscheinlich 1 +fabrik 1 +volumen 1 +▁centre 1 +EWG 1 +▁Migranten 1 +▁verteidigt 1 +▁stehe 1 +▁Erneuerung 1 +▁Immunität 1 +blätter 1 +▁beweist 1 +▁Grundfreiheiten 1 +▁Central 1 +▁schickt 1 +wissenschaftler 1 +verbände 1 +▁spürbar 1 +▁gewohnt 1 +▁abzulehnen 1 +▁Twitter 1 +▁dahingehend 1 +▁Copyright 1 +▁stützt 1 +▁Übersetzer 1 +▁HTML 1 +▁optimistisch 1 +▁anstreben 1 +▁Louis 1 +Präsident 1 +reißen 1 +überwachung 1 +▁Network 1 +▁fortschrittlich 1 +▁Mahlzeit 1 +▁verbieten 1 +© 1 +▁konservativ 1 +▁stattfand 1 +▁geklärt 1 +▁verleiht 1 +point 1 +▁Schweine 1 +▁Hongkong 1 +▁Schottland 1 +▁makroökonomisch 1 +▁Joseph 1 +▁Schriftsteller 1 +▁Etappe 1 +läßlich 1 +▁unendlich 1 +▁verhandelt 1 +▁Nachweis 1 +▁Darlehen 1 +▁Kriterium 1 +▁beeinträchtigen 1 +▁unterliegt 1 +▁verkündet 1 +▁Niederlassung 1 +▁veranstaltet 1 +adresse 1 +▁Attraktionen 1 +▁Zertifizierung 1 +▁harmonisiert 1 +▁veranlasst 1 +▁Dunkel 1 +▁Rekord 1 +▁Hindernis 1 +antwortungsvolle 1 +▁Komplex 1 +▁Demokratische 1 +▁Gültigkeit 1 +▁Prototyp 1 +▁größtmögliche 1 +▁inspirieren 1 +▁Käse 1 +konzern 1 +machung 1 +▁Diejenigen 1 +▁Beendigung 1 +bäume 1 +▁katastrophal 1 +▁leistungsfähige 1 +▁verwirklicht 1 +▁Zubehör 1 +▁widmet 1 +▁bewahrt 1 +▁Herberge 1 +mikro 1 +ähnlich 1 +▁wöchentlich 1 +▁engagieren 1 +▁energisch 1 +▁studieren 1 +α 1 +▁Begrenzung 1 +▁Kernkraftwerk 1 +▁Saddam 1 +▁einschlägige 1 +▁versorgen 1 +beratung 1 +▁leistungsstarke 1 +▁unbegrenzt 1 +ufrechterhaltung 1 +farbig 1 +▁Koalition 1 +▁beachtet 1 +▁ausgeglichen 1 +▁streben 1 +▁Release 1 +▁namentlich 1 +▁Reichweite 1 +▁trinken 1 +▁selbständig 1 
+▁Korallen 1 +▁gedruckt 1 +▁wiederhole 1 +ě 1 +▁populär 1 +▁vorzuschlagen 1 +▁Buffet 1 +▁belastet 1 +▁Parlamentarier 1 +▁strukturiert 1 +▁erlangt 1 +firma 1 +▁milde 1 +▁Verschmutzung 1 +▁gratis 1 +▁Entspannen 1 +▁grösste 1 +garantie 1 +▁beunruhigend 1 +▁öfter 1 +▁bestraft 1 +▁unterstreicht 1 +ación 1 +▁weitreichende 1 +▁Komponente 1 +ń 1 +▁Vermeidung 1 +▁unabdingbar 1 +▁befriedigen 1 +▁Folglich 1 +▁Schließung 1 +▁identisch 1 +glücklicherweise 1 +▁anzuerkennen 1 +▁beschädigt 1 +▁hinzuzufügen 1 +▁Wohlergehen 1 +▁Fracht 1 +erhöhung 1 +gesandt 1 +wurf 1 +▁vorangegangenen 1 +▁monatlich 1 +▁Streben 1 +▁Ahnung 1 +▁Blatt 1 +konstruktion 1 +▁Stuttgart 1 +▁registrieren 1 +▁gemeldet 1 +▁anscheinend 1 +▁Verurteilung 1 +chancen 1 +▁Bündnis 1 +▁erholsame 1 +▁klimatisiert 1 +▁Fußgänger 1 +▁Science 1 +▁importiert 1 +▁beunruhigt 1 +▁Tunnel 1 +▁widerspiegelt 1 +▁konstant 1 +▁zugewiesen 1 +▁beauftragt 1 +▁Fragestunde 1 +▁Clinton 1 +▁übereinstimmen 1 +▁Beschaffung 1 +bedürftig 1 +▁Francisco 1 +▁robust 1 +▁unsichtbar 1 +Energieverbrauch 1 +Standard 1 +Konferenz 1 +Website 1 +▁beherrscht 1 +▁harmonisch 1 +▁sonnig 1 +▁clean 1 +▁Vergessen 1 +▁betreibt 1 +kolleg 1 +▁Begegnung 1 +▁Inanspruchnahme 1 +▁Südtirol 1 +▁Rentner 1 +▁symbolisch 1 +▁Daniel 1 +intensiv 1 +lekommunikations 1 +▁Corporate 1 +▁Stornierung 1 +▁voranzutreiben 1 +▁autonom 1 +▁Bewirtschaftung 1 +▁Jagd 1 +▁köstliche 1 +wirksam 1 +Meinungsäußerung 1 +▁Tschetschenien 1 +▁verweigert 1 +▁schweigen 1 +▁human 1 +manager 1 +Mitgliedsstaaten 1 +▁geschäftliche 1 +▁behindern 1 +▁gewerbliche 1 +▁versorgt 1 +▁Sudan 1 +inhaber 1 +▁Interessant 1 +х 1 +▁Feedback 1 +▁Gletscher 1 +▁Wachstumspakt 1 +▁Algerien 1 +▁geachtet 1 +▁heikle 1 +▁BMW 1 +▁Abweichung 1 +▁lebhaft 1 +public 1 +bewusstsein 1 +▁gemischt 1 +▁Positiv 1 +▁kämpft 1 +▁Segment 1 +▁Student 1 +▁Schwierigkeit 1 +▁North 1 +β 1 +— 1 +$ 1 +ř 1 +Š 1 +ę 1 +ò 1 +ø 1 +ë 1 +ο 1 +τ 1 +ň 1 +ц 1 +ε 1 +ι 1 +ж 1 +Č 1 +æ 1 +ï 1 +ş 1 +μ 1 +ā 1 +ą 1 +ν 1 +ĺ 1 +ŕ 1 +ù 1 +ğ 1 +† 1 +ю 1 +ś 1 +ш 1 +É 1 +ا 1 +ì 1 +κ 1 +ρ 1 +⁄ 1 +π 1 +σ 1 +ل 1 +λ 1 +ő 1 +ż 1 +~ 1 +ă 1 +œ 1 +Á 1 + 1 +û 1 +đ 1 +› 1 +В 1 +Å 1 +Р 1 +¿ 1 +υ 1 +^ 1 +£ 1 +‹ 1 +Ž 1 +ű 1 +ί 1 +ф 1 +ī 1 +→ 1 +щ 1 +η 1 +ن 1 +ς 1 +ό 1 +ů 1 +ر 1 +õ 1 +ي 1 +Ÿ 1 +± 1 +э 1 +ã 1 +¬ 1 +П 1 +− 1 +ά 1 +َ 1 +δ 1 +Ó 1 +ē 1 +م 1 +İ 1 +ď 1 +ή 1 +Ø 1 +و 1 +ت 1 +ї 1 +น 1 +ō 1 +ū 1 +К 1 +‟ 1 +γ 1 +А 1 +έ 1 +Ç 1 +ė 1 +ك 1 +‐ 1 +× 1 +า 1 +Í 1 +อ 1 +Н 1 +¡ 1 +¢ 1 +М 1 +่ 1 +ร 1 +ľ 1 +Ѓ 1 +ب 1 +θ 1 +س 1 +ع 1 +О 1 +د 1 +ω 1 +י 1 +÷ 1 +І 1 +Б 1 +ƒ 1 +Ś 1 +Т 1 +Ł 1 +є 1 +Л 1 +ţ 1 +أ 1 +ง 1 +À 1 +′ 1 +Д 1 +Ú 1 +ו 1 +ก 1 +Ż 1 +ِ 1 +เ 1 +ม 1 +ύ 1 +ר 1 +← 1 +χ 1 +้ 1 +ี 1 +¦ 1 +א 1 +ه 1 +Ё 1 +ť 1 +ź 1 +Ñ 1 +φ 1 +И 1 +ة 1 +ว 1 +Ґ 1 +ส 1 +ל 1 +ה 1 +Ý 1 +ÿ 1 +Љ 1 +خ 1 +ิ 1 +Е 1 +ّ 1 +ศ 1 +Ќ 1 +พ 1 +Ő 1 +ค 1 +ั 1 +ะ 1 +њ 1 +‡ 1 +ف 1 +Њ 1 +ด 1 +У 1 +̈ 1 +ב 1 +Ð 1 +З 1 +¥ 1 +‒ 1 +年 1 +ў 1 +ע 1 +ห 1 +년 1 +ท 1 +Ş 1 +̧ 1 +ج 1 +► 1 +Æ 1 +ح 1 +È 1 +Î 1 +※ 1 +Α 1 +ล 1 +Ф 1 +Ω 1 +ώ 1 +Я 1 +ъ 1 +ش 1 +ص 1 +Э 1 +ื 1 +Ê 1 +ņ 1 +ё 1 +ת 1 +ย 1 +ุ 1 +ข 1 +إ 1 +● 1 +ϋ 1 +ξ 1 +ט 1 +þ 1 +ק 1 +ζ 1 +ق 1 +บ 1 +Ń 1 +Ą 1 +، 1 +ْ 1 +Ε 1 +ป 1 +ณ 1 +‰ 1 +ļ 1 +ד 1 +ى 1 +Η 1 +日 1 +ใ 1 +צ 1 +Đ 1 +Π 1 +פ 1 +ต 1 +‛ 1 +Х 1 +מ 1 +­ 1 +Μ 1 +ש 1 +ُ 1 +』 1 +Ш 1 +ѓ 1 +ķ 1 +ם 1 +⇒ 1 +ض 1 +Ι 1 +上 1 +本 1 +็ 1 +ј 1 +Ў 1 +Ď 1 +Τ 1 +Ô 1 +Ě 1 +↑ 1 +√ 1 +和 1 +Ч 1 +Þ 1 +Ї 1 +食 1 +で 1 +ู 1 +แ 1 +ํ 1 +จ 1 +Є 1 +ช 1 +Κ 1 +Œ 1 +Ο 1 +ѕ 1 +ן 1 +ط 1 +ა 1 +Ľ 1 +Ř 1 +Δ 1 +Ц 1 +غ 1 +ー 1 +す 1 +♫ 1 +ไ 1 +Џ 1 +Σ 1 +נ 1 +Ć 1 +ز 1 +ی 1 +、 1 +【 1 +̋ 1 +Ν 1 +い 1 +。 1 +ґ 1 +Ę 1 +Ĺ 1 +Ō 1 +自 1 +̃ 1 +् 1 +Ė 1 +ʿ 1 +Γ 1 +Θ 1 +Ж 1 +ז 1 +ი 1 +ლ 1 +ረ 1 +】 1 +克 1 +顶 1 +ų 1 +三 
1 +< 1 +ג 1 +ス 1 +文 1 +የ 1 +て 1 +ผ 1 +寺 1 +በ 1 +来 1 +手 1 +球 1 +Š 1 +– 1 +š 1 +Ë 1 +ŏ 1 +ŭ 1 +Ų 1 +̊ 1 +Ј 1 +љ 1 +ء 1 +آ 1 +ث 1 +र 1 +ዓ 1 +ይ 1 +★ 1 +治 1 +Ă 1 +≥ 1 +Ò 1 +『 1 +新 1 +は 1 +Õ 1 +免 1 +疫 1 +博 1 +场 1 +的 1 +网 1 +ン 1 +し 1 +も 1 +ึ 1 +ธ 1 +გ 1 +ე 1 +რ 1 +ħ 1 +ǎ 1 +प 1 +出 1 +武 1 +Ï 1 +Ň 1 +Ů 1 +ː 1 +̤ 1 +ќ 1 +ң 1 +۱ 1 +० 1 +დ 1 +ო 1 +ს 1 +ገ 1 +ệ 1 +≪ 1 +≫ 1 +◎ 1 +♥ 1 +县 1 +天 1 +市 1 +東 1 +江 1 +白 1 +空 1 +蛋 1 +語 1 +语 1 +込 1 +青 1 +ǐ 1 +כ 1 +द 1 +ा 1 +コ 1 +ナ 1 +一 1 +中 1 +山 1 +鋝 1 +ቀ 1 +̄ 1 +む 1 +有 1 +不 1 +乐 1 +在 1 +娱 1 +正 1 +赌 1 +ま 1 +か 1 +た 1 +っ 1 +く 1 +് 1 +Φ 1 +を 1 +チ 1 +マ 1 +・ 1 +Ī 1 +Ğ 1 +̍ 1 +ח 1 +ئ 1 +े 1 +鋍 1 +Ì 1 +ĕ 1 +Ū 1 +ơ 1 +ǔ 1 +̨ 1 +ً 1 +म 1 +ह 1 +ถ 1 +์ 1 +ຸ 1 +ህ 1 +ም 1 +ሻ 1 +ተ 1 +አ 1 +ኣ 1 +ው 1 +ጽ 1 +፣ 1 +ṭ 1 +ạ 1 +ế 1 +ễ 1 +↓ 1 +⇢ 1 +≈ 1 +■ 1 +◆ 1 +ち 1 +る 1 +イ 1 +オ 1 +テ 1 +ル 1 +了 1 +修 1 +分 1 +匿 1 +名 1 +吗 1 +敏 1 +木 1 +机 1 +站 1 +鋓 1 +问 1 +舁 1 +Ű 1 +ψ 1 +Ю 1 +ֳ 1 +仁 1 +水 1 +清 1 +石 1 +简 1 +谷 1 +እ 1 +ኦ 1 +宣 1 +መ 1 +ጨ 1 +人 1 +生 1 +さ 1 +ん 1 +お 1 +に 1 +ワ 1 +是 1 +キ 1 +や 1 +が 1 +つ 1 +と 1 +大 1 +屋 1 +โ 1 +ซ 1 +タ 1 +ә 1 +ภ 1 +ვ 1 +╚ 1 +ウ 1 +ユ 1 +体 1 +北 1 +龙 1 +Λ 1 +ћ 1 +ס 1 +ذ 1 +پ 1 +अ 1 +Ḥ 1 +振 1 +ḍ 1 +书 1 +小 1 +毛 1 +谢 1 +鰃 1 +Û 1 +Ĉ 1 +ĩ 1 +Į 1 +į 1 +ǵ 1 +ɑ 1 +ə 1 +ʒ 1 +ΐ 1 +Й 1 +Ъ 1 +ғ 1 +ָ 1 +ک 1 +ھ 1 +ۇ 1 +च 1 +ञ 1 +त 1 +ल 1 +श 1 +२ 1 +ন 1 +য 1 +় 1 +া 1 +૦ 1 +೦ 1 +೧ 1 +ษ 1 +ๆ 1 +ბ 1 +ნ 1 +პ 1 +ღ 1 +ხ 1 +ሕ 1 +ር 1 +ቶ 1 +ክ 1 +ዲ 1 +ả 1 +ổ 1 +ớ 1 +ῆ 1 +▪ 1 +▼ 1 +◊ 1 +○ 1 +♣ 1 +➲ 1 +あ 1 +う 1 +じ 1 +だ 1 +ね 1 +へ 1 +み 1 +め 1 +よ 1 +サ 1 +ジ 1 +ダ 1 +ツ 1 +ド 1 +ニ 1 +ヌ 1 +ハ 1 +ビ 1 +ベ 1 +ペ 1 +ボ 1 +ミ 1 +メ 1 +ヨ 1 +二 1 +从 1 +你 1 +内 1 +刘 1 +剥 1 +危 1 +受 1 +后 1 +幹 1 +张 1 +微 1 +応 1 +思 1 +戸 1 +欢 1 +歌 1 +气 1 +测 1 +海 1 +港 1 +溥 1 +牌 1 +章 1 +線 1 +舖 1 +花 1 +见 1 +言 1 +过 1 +送 1 +遝 1 +都 1 +里 1 +际 1 +题 1 +黚 1 +이 1 +ッ 1 +ĉ 1 +ģ 1 +Ť 1 +⋅ 1 +け 1 +举 1 +德 1 +管 1 +箱 1 +舫 1 +鋜 1 +陵 1 +ɛ 1 +ḥ 1 +ṣ 1 +╩ 1 +こ 1 +伸 1 +原 1 +国 1 +深 1 +鋖 1 +Ρ 1 +─ 1 +东 1 +五 1 +应 1 +方 1 +西 1 +の 1 +デ 1 +フ 1 +ホ 1 +ラ 1 +リ 1 +所 1 +グ 1 +え 1 +き 1 +げ 1 +ら 1 +ろ 1 +見 1 +部 1 +Ţ 1 +Ћ 1 +ェ 1 +Χ 1 +ђ 1 +ү 1 +һ 1 +ব 1 +ക 1 +ദ 1 +ല 1 +വ 1 +ഷ 1 +ീ 1 +于 1 +依 1 +头 1 +庆 1 +挂 1 +火 1 +用 1 +至 1 +车 1 +重 1 +除 1 +Ġ 1 +ŷ 1 +೨ 1 +余 1 +其 1 +叫 1 +吴 1 +咬 1 +引 1 +扈 1 +才 1 +晏 1 +牙 1 +紧 1 +跋 1 +ġ 1 +≤ 1 +゚ 1 +呀 1 +如 1 +届 1 +岩 1 +损 1 +澤 1 +続 1 +臺 1 +舩 1 +ở 1 +Ù 1 +Ā 1 +ĝ 1 +ĥ 1 +ĵ 1 +ĸ 1 +Ņ 1 +Ŝ 1 +ŝ 1 +Ź 1 +ǒ 1 +Ǻ 1 +ț 1 +ɔ 1 +ɡ 1 +ʐ 1 +ˆ 1 +Ά 1 +Ί 1 +Ό 1 +Ώ 1 +Β 1 +Υ 1 +ϊ 1 +Ѕ 1 +ұ 1 +Ӓ 1 +ө 1 +؛ 1 +ؤ 1 +ـ 1 +ٌ 1 +ځ 1 +ڭ 1 +ण 1 +न 1 +ब 1 +ि 1 +ु 1 +१ 1 +३ 1 +ই 1 +র 1 +হ 1 +ি 1 +ু 1 +ো 1 +্ 1 +૧ 1 +૨ 1 +೫ 1 +೬ 1 +೯ 1 +പ 1 +ญ 1 +ฎ 1 +ฝ 1 +ฟ 1 +฿ 1 +ຄ 1 +ງ 1 +ດ 1 +ຖ 1 +ນ 1 +ມ 1 +ວ 1 +ັ 1 +ስ 1 +ቐ 1 +ት 1 +Ṭ 1 +ẋ 1 +ẩ 1 +ậ 1 +ề 1 +ồ 1 +ộ 1 +ụ 1 +ủ 1 +ỹ 1 +ἰ 1 +₤ 1 +∞ 1 +█ 1 +◇ 1 +◈ 1 +☆ 1 +☑ 1 +☼ 1 +♀ 1 +♂ 1 +♦ 1 +」 1 +〜 1 +ぃ 1 +ぎ 1 +ぐ 1 +ご 1 +ざ 1 +ず 1 +せ 1 +ぜ 1 +そ 1 +ぞ 1 +ど 1 +な 1 +ぬ 1 +ば 1 +ひ 1 +び 1 +ふ 1 +ぶ 1 +べ 1 +ほ 1 +ぼ 1 +ゆ 1 +り 1 +れ 1 +わ 1 +ア 1 +ィ 1 +エ 1 +カ 1 +ガ 1 +ギ 1 +ク 1 +ケ 1 +ゲ 1 +ゴ 1 +ザ 1 +シ 1 +セ 1 +ゼ 1 +ソ 1 +ゾ 1 +ト 1 +ネ 1 +ノ 1 +バ 1 +パ 1 +ヒ 1 +ピ 1 +ブ 1 +プ 1 +ヘ 1 +ポ 1 +ム 1 +モ 1 +ヤ 1 +レ 1 +ロ 1 +ヴ 1 +ㄤ 1 +七 1 +万 1 +丈 1 +下 1 +义 1 +习 1 +事 1 +京 1 +仪 1 +仲 1 +价 1 +会 1 +但 1 +何 1 +倍 1 +儀 1 +儉 1 +光 1 +公 1 +刨 1 +則 1 +剣 1 +务 1 +动 1 +勧 1 +区 1 +去 1 +参 1 +及 1 +取 1 +只 1 +可 1 +台 1 +吃 1 +向 1 +君 1 +吧 1 +吹 1 +吾 1 +告 1 +喜 1 +嘛 1 +四 1 +回 1 +囧 1 +固 1 +國 1 +堂 1 +墩 1 +央 1 +好 1 +娃 1 +子 1 +孴 1 +宝 1 +客 1 +家 1 +寨 1 +寶 1 +寸 1 +尔 1 +局 1 +岭 1 +崩 1 +川 1 +希 1 +広 1 +庚 1 +弁 1 +彭 1 +役 1 +必 1 +怒 1 +怡 1 +性 1 +意 1 +慢 1 +成 1 +戴 1 +抜 1 +探 1 +接 1 +掻 1 +握 1 +搞 1 +摂 1 +撮 1 +放 1 +施 1 +昇 1 +星 1 +春 1 +显 1 +普 1 +曌 1 +曝 1 +書 1 +最 1 +板 1 +查 1 +柱 1 +桂 1 +检 1 +楽 1 +檀 1 +次 1 +止 1 +步 1 +気 1 +汉 1 +没 1 +泥 1 +注 1 +泽 1 +洛 1 +活 1 +浦 
1 +済 1 +満 1 +漢 1 +焼 1 +煮 1 +爱 1 +父 1 +片 1 +率 1 +玉 1 +王 1 +班 1 +琢 1 +畢 1 +畿 1 +疆 1 +疑 1 +百 1 +皇 1 +直 1 +相 1 +眼 1 +瞎 1 +知 1 +确 1 +示 1 +礼 1 +神 1 +祿 1 +福 1 +秀 1 +竞 1 +端 1 +竹 1 +第 1 +答 1 +紅 1 +終 1 +統 1 +纹 1 +细 1 +统 1 +维 1 +罗 1 +群 1 +義 1 +羽 1 +耀 1 +胞 1 +能 1 +臘 1 +臨 1 +致 1 +舐 1 +航 1 +葉 1 +葱 1 +蒸 1 +蔚 1 +藤 1 +街 1 +视 1 +觉 1 +訓 1 +記 1 +請 1 +许 1 +诘 1 +请 1 +调 1 +貌 1 +貮 1 +货 1 +质 1 +赤 1 +赵 1 +超 1 +足 1 +軍 1 +辱 1 +迎 1 +返 1 +连 1 +迷 1 +道 1 +遭 1 +郑 1 +鄭 1 +酉 1 +鋘 1 +鋟 1 +镜 1 +閩 1 +闽 1 +阳 1 +陀 1 +降 1 +陶 1 +電 1 +静 1 +音 1 +预 1 +飞 1 +飼 1 +馬 1 +鲁 1 +鵜 1 +黄 1 +黨 1 +검 1 +고 1 +군 1 +나 1 +누 1 +님 1 +단 1 +당 1 +드 1 +맨 1 +반 1 +번 1 +법 1 +베 1 +별 1 +빛 1 +성 1 +스 1 +신 1 +에 1 +왕 1 +요 1 +유 1 +자 1 +작 1 +조 1 +짝 1 +천 1 +추 1 +터 1 +̇ 1 +入 1 +凄 1 +千 1 +吳 1 +实 1 +康 1 +彰 1 +旨 1 +森 1 +睦 1 +苑 1 +蔓 1 +関 1 +鰈 1 +鰊 1 +鲞 1 +鹿 1 +ˈ 1 +Ẳ 1 +丘 1 +井 1 +今 1 +圆 1 +安 1 +明 1 +李 1 +甜 1 +田 1 +羅 1 +茶 1 +覺 1 +雄 1 +鴻 1 +대 1 +르 1 +체 1 +층 1 +ʀ 1 +愛 1 +无 1 +产 1 +住 1 +反 1 +場 1 +景 1 +济 1 +益 1 +种 1 +经 1 +而 1 +行 1 +非 1 +力 1 +学 1 +常 1 +朝 1 +留 1 +Ђ 1 +џ 1 +湖 1 +綺 1 +麗 1 +<mask> 1 diff --git a/SpeechUT/dataset/MuSTC/en_de/spm_unigram10000.model b/SpeechUT/dataset/MuSTC/en_de/spm_unigram10000.model new file mode 100644 index 0000000000000000000000000000000000000000..ac88f59caee81b4fafce7e961b37863310f9ad95 Binary files /dev/null and b/SpeechUT/dataset/MuSTC/en_de/spm_unigram10000.model differ diff --git a/SpeechUT/dataset/MuSTC/en_es/config.yaml b/SpeechUT/dataset/MuSTC/en_es/config.yaml new file mode 100644 index 0000000000000000000000000000000000000000..dce5f63011a8c33a4d12eec569fdcc91ea299f68 --- /dev/null +++ b/SpeechUT/dataset/MuSTC/en_es/config.yaml @@ -0,0 +1,3 @@ +vocab_filename: dict.spm.txt +src_vocab_filename: dict.kmu.txt + diff --git a/SpeechUT/dataset/MuSTC/en_es/config_enes.yaml b/SpeechUT/dataset/MuSTC/en_es/config_enes.yaml new file mode 100644 index 0000000000000000000000000000000000000000..dd080a05500211cade57d80056c8ce311ce4c0c2 --- /dev/null +++ b/SpeechUT/dataset/MuSTC/en_es/config_enes.yaml @@ -0,0 +1,14 @@ +bpe_tokenizer: + bpe: sentencepiece + sentencepiece_model: spm_unigram10000.model + +sampling_alpha: 1.0 +shuffle: false +use_audio_input: true +use_sample_rate: 16000 + +vocab_filename: dict.spm.txt + +# required by speech_to_text task but never used +input_channels: 1 +input_feat_per_channel: 1 diff --git a/SpeechUT/dataset/MuSTC/en_es/dict.kmu.txt b/SpeechUT/dataset/MuSTC/en_es/dict.kmu.txt new file mode 100644 index 0000000000000000000000000000000000000000..bbfe59e554d6234f3631d8d09d9281c2160f4675 --- /dev/null +++ b/SpeechUT/dataset/MuSTC/en_es/dict.kmu.txt @@ -0,0 +1,500 @@ +0 0 +1 1 +2 2 +3 3 +4 4 +5 5 +6 6 +7 7 +8 8 +9 9 +10 10 +11 11 +12 12 +13 13 +14 14 +15 15 +16 16 +17 17 +18 18 +19 19 +20 20 +21 21 +22 22 +23 23 +24 24 +25 25 +26 26 +27 27 +28 28 +29 29 +30 30 +31 31 +32 32 +33 33 +34 34 +35 35 +36 36 +37 37 +38 38 +39 39 +40 40 +41 41 +42 42 +43 43 +44 44 +45 45 +46 46 +47 47 +48 48 +49 49 +50 50 +51 51 +52 52 +53 53 +54 54 +55 55 +56 56 +57 57 +58 58 +59 59 +60 60 +61 61 +62 62 +63 63 +64 64 +65 65 +66 66 +67 67 +68 68 +69 69 +70 70 +71 71 +72 72 +73 73 +74 74 +75 75 +76 76 +77 77 +78 78 +79 79 +80 80 +81 81 +82 82 +83 83 +84 84 +85 85 +86 86 +87 87 +88 88 +89 89 +90 90 +91 91 +92 92 +93 93 +94 94 +95 95 +96 96 +97 97 +98 98 +99 99 +100 100 +101 101 +102 102 +103 103 +104 104 +105 105 +106 106 +107 107 +108 108 +109 109 +110 110 +111 111 +112 112 +113 113 +114 114 +115 115 +116 116 +117 117 +118 118 +119 119 +120 120 +121 121 +122 122 +123 123 +124 124 +125 125 +126 126 +127 127 +128 128 +129 129 +130 130 +131 131 +132 132 
+133 133 +134 134 +135 135 +136 136 +137 137 +138 138 +139 139 +140 140 +141 141 +142 142 +143 143 +144 144 +145 145 +146 146 +147 147 +148 148 +149 149 +150 150 +151 151 +152 152 +153 153 +154 154 +155 155 +156 156 +157 157 +158 158 +159 159 +160 160 +161 161 +162 162 +163 163 +164 164 +165 165 +166 166 +167 167 +168 168 +169 169 +170 170 +171 171 +172 172 +173 173 +174 174 +175 175 +176 176 +177 177 +178 178 +179 179 +180 180 +181 181 +182 182 +183 183 +184 184 +185 185 +186 186 +187 187 +188 188 +189 189 +190 190 +191 191 +192 192 +193 193 +194 194 +195 195 +196 196 +197 197 +198 198 +199 199 +200 200 +201 201 +202 202 +203 203 +204 204 +205 205 +206 206 +207 207 +208 208 +209 209 +210 210 +211 211 +212 212 +213 213 +214 214 +215 215 +216 216 +217 217 +218 218 +219 219 +220 220 +221 221 +222 222 +223 223 +224 224 +225 225 +226 226 +227 227 +228 228 +229 229 +230 230 +231 231 +232 232 +233 233 +234 234 +235 235 +236 236 +237 237 +238 238 +239 239 +240 240 +241 241 +242 242 +243 243 +244 244 +245 245 +246 246 +247 247 +248 248 +249 249 +250 250 +251 251 +252 252 +253 253 +254 254 +255 255 +256 256 +257 257 +258 258 +259 259 +260 260 +261 261 +262 262 +263 263 +264 264 +265 265 +266 266 +267 267 +268 268 +269 269 +270 270 +271 271 +272 272 +273 273 +274 274 +275 275 +276 276 +277 277 +278 278 +279 279 +280 280 +281 281 +282 282 +283 283 +284 284 +285 285 +286 286 +287 287 +288 288 +289 289 +290 290 +291 291 +292 292 +293 293 +294 294 +295 295 +296 296 +297 297 +298 298 +299 299 +300 300 +301 301 +302 302 +303 303 +304 304 +305 305 +306 306 +307 307 +308 308 +309 309 +310 310 +311 311 +312 312 +313 313 +314 314 +315 315 +316 316 +317 317 +318 318 +319 319 +320 320 +321 321 +322 322 +323 323 +324 324 +325 325 +326 326 +327 327 +328 328 +329 329 +330 330 +331 331 +332 332 +333 333 +334 334 +335 335 +336 336 +337 337 +338 338 +339 339 +340 340 +341 341 +342 342 +343 343 +344 344 +345 345 +346 346 +347 347 +348 348 +349 349 +350 350 +351 351 +352 352 +353 353 +354 354 +355 355 +356 356 +357 357 +358 358 +359 359 +360 360 +361 361 +362 362 +363 363 +364 364 +365 365 +366 366 +367 367 +368 368 +369 369 +370 370 +371 371 +372 372 +373 373 +374 374 +375 375 +376 376 +377 377 +378 378 +379 379 +380 380 +381 381 +382 382 +383 383 +384 384 +385 385 +386 386 +387 387 +388 388 +389 389 +390 390 +391 391 +392 392 +393 393 +394 394 +395 395 +396 396 +397 397 +398 398 +399 399 +400 400 +401 401 +402 402 +403 403 +404 404 +405 405 +406 406 +407 407 +408 408 +409 409 +410 410 +411 411 +412 412 +413 413 +414 414 +415 415 +416 416 +417 417 +418 418 +419 419 +420 420 +421 421 +422 422 +423 423 +424 424 +425 425 +426 426 +427 427 +428 428 +429 429 +430 430 +431 431 +432 432 +433 433 +434 434 +435 435 +436 436 +437 437 +438 438 +439 439 +440 440 +441 441 +442 442 +443 443 +444 444 +445 445 +446 446 +447 447 +448 448 +449 449 +450 450 +451 451 +452 452 +453 453 +454 454 +455 455 +456 456 +457 457 +458 458 +459 459 +460 460 +461 461 +462 462 +463 463 +464 464 +465 465 +466 466 +467 467 +468 468 +469 469 +470 470 +471 471 +472 472 +473 473 +474 474 +475 475 +476 476 +477 477 +478 478 +479 479 +480 480 +481 481 +482 482 +483 483 +484 484 +485 485 +486 486 +487 487 +488 488 +489 489 +490 490 +491 491 +492 492 +493 493 +494 494 +495 495 +496 496 +497 497 +498 498 +499 499 diff --git a/SpeechUT/dataset/MuSTC/en_es/dict.spm.txt b/SpeechUT/dataset/MuSTC/en_es/dict.spm.txt new file mode 100644 index 0000000000000000000000000000000000000000..194ae6f610da4c2ec1975ba3aa9f45fe527c98c3 --- /dev/null +++ 
b/SpeechUT/dataset/MuSTC/en_es/dict.spm.txt @@ -0,0 +1,9997 @@ +▁de 1 +, 1 +▁la 1 +. 1 +▁y 1 +▁en 1 +▁que 1 +▁el 1 +s 1 +▁a 1 +▁los 1 +▁las 1 +▁del 1 +▁se 1 +▁para 1 +▁con 1 +▁un 1 +▁por 1 +n 1 +▁una 1 +▁no 1 +▁su 1 +▁al 1 +▁es 1 +▁( 1 +r 1 +▁sobre 1 +) 1 +▁El 1 +▁como 1 +▁o 1 +▁ 1 +▁lo 1 +▁La 1 +▁más 1 +▁En 1 +es 1 +▁ha 1 +: 1 +do 1 +a 1 +; 1 +▁sus 1 +▁A 1 +o 1 +▁Naciones 1 +▁Unidas 1 +da 1 +▁entre 1 +▁Estados 1 +▁este 1 +/ 1 +se 1 +mente 1 +▁Comisión 1 +▁también 1 +▁e 1 +▁países 1 +▁desarrollo 1 +▁General 1 +mos 1 +▁Consejo 1 +▁si 1 +▁esta 1 +▁han 1 +▁Y 1 +▁contra 1 +▁son 1 +ndo 1 +▁derechos 1 +▁todos 1 +e 1 +- 1 +▁in 1 +ción 1 +▁informe 1 +▁Comité 1 +▁" 1 +▁está 1 +▁Se 1 +▁No 1 +á 1 +▁ser 1 +l 1 +▁re 1 +▁parte 1 +▁internacional 1 +▁Los 1 +? 1 +la 1 +▁período 1 +▁personas 1 +▁me 1 +▁información 1 +dos 1 +▁¿ 1 +▁todo 1 +▁medidas 1 +ba 1 +▁años 1 +▁resolución 1 +das 1 +t 1 +▁así 1 +▁mi 1 +▁pero 1 +▁Estado 1 +▁derecho 1 +A 1 +▁humanos 1 +▁puede 1 +▁programa 1 +▁Por 1 +ó 1 +▁sin 1 +▁le 1 +▁muy 1 +▁otros 1 +▁hacer 1 +▁sesiones 1 +C 1 +▁actividades 1 +▁artículo 1 +ta 1 +ra 1 +lo 1 +en 1 +▁Asamblea 1 +▁cuando 1 +▁ese 1 +▁niños 1 +▁1 1 +y 1 +▁proyecto 1 +▁trabajo 1 +▁Presidente 1 +▁Gobierno 1 +▁respecto 1 +▁aplicación 1 +▁miembros 1 +▁fin 1 +▁sistema 1 +▁“ 1 +▁dos 1 +d 1 +▁forma 1 +▁- 1 +le 1 +▁vez 1 +▁todas 1 +▁tiene 1 +▁Secretario 1 +▁seguridad 1 +▁país 1 +), 1 +ía 1 +▁otras 1 +to 1 +). 1 +▁Las 1 +ca 1 +te 1 +▁mundo 1 +ron 1 +▁mujeres 1 +ar 1 +▁apoyo 1 +▁De 1 +i 1 +an 1 +on 1 +S 1 +ciones 1 +▁Grupo 1 +▁están 1 +▁recursos 1 +▁hay 1 +▁hecho 1 +▁cada 1 +▁servicios 1 +▁Es 1 +c 1 +p 1 +de 1 +▁lugar 1 +u 1 +▁debe 1 +▁Convención 1 +▁esa 1 +▁situación 1 +▁cuenta 1 +▁fue 1 +▁ya 1 +▁2 1 +▁particular 1 +ma 1 +▁durante 1 +▁tiempo 1 +▁había 1 +▁desde 1 +é 1 +▁programas 1 +▁internacionales 1 +▁organizaciones 1 +▁importante 1 +▁proceso 1 +▁tanto 1 +▁cooperación 1 +▁era 1 +▁esto 1 +▁pueden 1 +no 1 +▁c 1 +co 1 +▁Si 1 +▁Internacional 1 +▁República 1 +di 1 +▁política 1 +▁porque 1 +▁S 1 +▁Seguridad 1 +▁manera 1 +▁3 1 +▁asistencia 1 +er 1 +▁paz 1 +▁nacional 1 +▁párrafo 1 +▁relación 1 +▁Unión 1 +nte 1 +re 1 +" 1 +▁b 1 +▁Pero 1 +▁gran 1 +▁cuestiones 1 +▁Con 1 +▁año 1 +▁nos 1 +sa 1 +▁bien 1 +▁sido 1 +▁general 1 +▁nivel 1 +▁C 1 +▁I 1 +▁cuestión 1 +les 1 +▁pro 1 +▁mayor 1 +▁Al 1 +▁uno 1 +▁vida 1 +ti 1 +▁protección 1 +▁hasta 1 +▁Sr 1 +▁personal 1 +▁caso 1 +▁sólo 1 +▁E 1 +▁tienen 1 +án 1 +▁estos 1 +▁Derechos 1 +▁mismo 1 +▁nacionales 1 +P 1 +me 1 +rá 1 +▁presente 1 +▁P 1 +▁Europea 1 +g 1 +▁19 1 +rse 1 +▁menos 1 +m 1 +▁hace 1 +ría 1 +▁esos 1 +ce 1 +▁Conferencia 1 +ci 1 +cu 1 +▁eso 1 +L 1 +▁O 1 +▁millones 1 +▁marco 1 +ro 1 +▁acuerdo 1 +▁ver 1 +▁dólares 1 +▁conformidad 1 +í 1 +▁número 1 +▁posible 1 +▁salud 1 +na 1 +▁social 1 +ban 1 +▁haya 1 +ación 1 +▁ejemplo 1 +▁armas 1 +▁qué 1 +▁sea 1 +▁mediante 1 +▁10 1 +▁di 1 +▁políticas 1 +b 1 +▁4 1 +▁tema 1 +▁datos 1 +▁atención 1 +▁antes 1 +▁capacidad 1 +▁ni 1 +▁ante 1 +” 1 +▁les 1 +▁Así 1 +que 1 +▁decisión 1 +E 1 +▁partes 1 +▁cosas 1 +▁labor 1 +▁esas 1 +▁embargo 1 +ga 1 +▁5 1 +▁algo 1 +▁materia 1 +ten 1 +▁estas 1 +li 1 +▁tener 1 +▁ahora 1 +▁algunos 1 +▁tres 1 +▁Especial 1 +▁ex 1 +▁deben 1 +▁otra 1 +idad 1 +▁esfuerzos 1 +so 1 +▁mejor 1 +▁examen 1 +▁trata 1 +▁otro 1 +▁decir 1 +z 1 +▁nuestra 1 +▁También 1 +▁mujer 1 +▁nuevo 1 +▁comunidad 1 +▁medio 1 +▁pre 1 +▁acceso 1 +▁condiciones 1 +go 1 +▁donde 1 +al 1 +▁base 1 +▁diciembre 1 +▁tipo 1 +▁nuestro 1 +je 1 +vi 1 +B 1 +▁Además 1 +▁mucho 1 +▁Oficina 1 +▁casos 1 +▁Re 1 +▁objetivos 1 +▁problemas 1 +▁Programa 1 +▁después 1 +▁Sin 1 +▁6 1 
+▁educación 1 +▁Desarrollo 1 +nos 1 +mi 1 +▁toda 1 +▁15 1 +▁Para 1 +▁Parte 1 +▁te 1 +in 1 +ja 1 +▁instituciones 1 +mo 1 +▁día 1 +▁co 1 +ri 1 +▁Unidos 1 +▁informes 1 +▁Humanos 1 +▁momento 1 +▁importancia 1 +▁B 1 +pa 1 +▁sino 1 +▁primera 1 +▁12 1 +D 1 +▁Europa 1 +▁responsabilidad 1 +▁cualquier 1 +▁tan 1 +▁Me 1 +▁especial 1 +▁ellos 1 +▁participación 1 +▁cual 1 +▁grupos 1 +▁7 1 +▁África 1 +▁normas 1 +▁Esta 1 +el 1 +▁población 1 +▁necesidad 1 +ve 1 +▁reunión 1 +▁va 1 +T 1 +tra 1 +". 1 +▁problema 1 +▁disposiciones 1 +▁sector 1 +▁ley 1 +tu 1 +▁— 1 +▁8 1 +ne 1 +▁grupo 1 +▁violencia 1 +▁Mundial 1 +▁documento 1 +▁primer 1 +▁Organización 1 +▁persona 1 +▁sesión 1 +▁sociedad 1 +▁uso 1 +▁muchos 1 +▁ca 1 +▁solo 1 +▁d 1 +▁Parlamento 1 +▁Ley 1 +▁resultados 1 +h 1 +▁ayuda 1 +▁recomendaciones 1 +un 1 +os 1 +▁dar 1 +▁mejorar 1 +▁Este 1 +ado 1 +▁gente 1 +▁lograr 1 +▁he 1 +O 1 +▁aquí 1 +▁In 1 +▁tal 1 +k 1 +▁nota 1 +▁20 1 +▁mundial 1 +▁necesario 1 +▁F 1 +▁necesidades 1 +▁9 1 +▁cómo 1 +▁debería 1 +▁horas 1 +▁gestión 1 +▁nombre 1 +▁Lo 1 +▁fecha 1 +bi 1 +▁Partes 1 +▁objetivo 1 +M 1 +si 1 +tar 1 +G 1 +▁región 1 +▁Miembros 1 +▁plan 1 +▁14 1 +▁proyectos 1 +▁18 1 +▁presupuesto 1 +▁Como 1 +▁inter 1 +▁11 1 +los 1 +▁Trabajo 1 +va 1 +▁lucha 1 +▁ello 1 +f 1 +▁control 1 +las 1 +▁promover 1 +00 1 +▁poder 1 +▁Tribunal 1 +2 1 +as 1 +▁hemos 1 +tas 1 +▁Mi 1 +▁podría 1 +▁tenemos 1 +▁especialmente 1 +▁G 1 +▁incluso 1 +▁civil 1 +Risas 1 +qui 1 +▁nuestros 1 +▁garantizar 1 +▁Ma 1 +▁discriminación 1 +▁regionales 1 +▁cambio 1 +F 1 +▁medida 1 +res 1 +▁dentro 1 +▁crear 1 +▁13 1 +▁acerca 1 +v 1 +tos 1 +▁debate 1 +▁sentido 1 +ter 1 +▁hoy 1 +▁estaba 1 +▁junio 1 +▁pobreza 1 +▁mayoría 1 +ir 1 +.1 1 +▁hacia 1 +▁virtud 1 +▁total 1 +▁plazo 1 +▁algunas 1 +▁propuesta 1 +x 1 +▁estado 1 +▁aún 1 +▁papel 1 +▁sociales 1 +▁medios 1 +▁favor 1 +▁respuesta 1 +▁haber 1 +▁dice 1 +▁modo 1 +▁[ 1 +▁decisiones 1 +▁p 1 +▁sí 1 +... 
1 +▁per 1 +bo 1 +▁30 1 +▁principios 1 +▁organización 1 +U 1 +▁autoridades 1 +▁terrorismo 1 +▁práctica 1 +▁futuro 1 +▁adoptar 1 +▁Su 1 +▁podemos 1 +I 1 +▁evaluación 1 +▁empleo 1 +is 1 +▁anexo 1 +▁objeto 1 +▁habían 1 +▁nuevas 1 +▁acción 1 +▁organismos 1 +▁calidad 1 +▁pertinentes 1 +tó 1 +] 1 +▁delegación 1 +▁mo 1 +▁investigación 1 +▁punto 1 +▁cuanto 1 +pi 1 +▁cabo 1 +▁examinar 1 +▁productos 1 +ria 1 +gra 1 +▁mandato 1 +▁nosotros 1 +dor 1 +ble 1 +▁16 1 +▁Esto 1 +▁establecer 1 +▁Una 1 +▁poco 1 +▁siempre 1 +▁zonas 1 +▁Un 1 +▁empresas 1 +▁bajo 1 +ven 1 +▁según 1 +▁órganos 1 +▁creación 1 +▁nueva 1 +ida 1 +▁legislación 1 +▁tu 1 +▁orden 1 +▁ma 1 +▁julio 1 +▁relativas 1 +▁público 1 +▁realidad 1 +▁será 1 +▁Social 1 +ada 1 +▁Nacional 1 +▁seguir 1 +▁M 1 +▁aumento 1 +▁libertad 1 +▁tecnología 1 +▁estamos 1 +▁vi 1 +▁Señor 1 +▁Ha 1 +▁importantes 1 +▁septiembre 1 +gi 1 +▁principales 1 +▁yo 1 +▁ejecución 1 +▁fondos 1 +▁sostenible 1 +ú 1 +▁segundo 1 +▁falta 1 +▁17 1 +▁días 1 +▁misma 1 +▁además 1 +▁realmente 1 +▁diferentes 1 +▁mercado 1 +▁principio 1 +▁pública 1 +▁siguientes 1 +▁Declaración 1 +ntes 1 +▁agua 1 +▁gobiernos 1 +▁representantes 1 +▁financiación 1 +▁Protocolo 1 +▁económica 1 +fi 1 +▁declaración 1 +▁contexto 1 +▁función 1 +▁través 1 +▁u 1 +▁igualdad 1 +ur 1 +▁frente 1 +pe 1 +▁arreglo 1 +▁pa 1 +▁efectos 1 +▁D 1 +▁noviembre 1 +▁unos 1 +▁actual 1 +▁económico 1 +▁él 1 +▁sigue 1 +▁largo 1 +▁final 1 +▁comercio 1 +encia 1 +▁gubernamentales 1 +▁prácticas 1 +▁aplicar 1 +za 1 +▁nuevos 1 +iendo 1 +▁octubre 1 +dores 1 +▁Junta 1 +N 1 +ido 1 +▁gastos 1 +▁estar 1 +ch 1 +▁siendo 1 +▁muchas 1 +▁Nueva 1 +▁Fondo 1 +▁to 1 +▁tra 1 +▁establecimiento 1 +▁niño 1 +ncia 1 +▁funcionarios 1 +▁ciudadanos 1 +▁formas 1 +▁siguiente 1 +▁víctimas 1 +or 1 +▁justicia 1 +▁valor 1 +▁conflictos 1 +ían 1 +▁último 1 +▁fuera 1 +▁po 1 +rán 1 +▁posibilidad 1 +▁tras 1 +▁causa 1 +▁operaciones 1 +tro 1 +▁Asuntos 1 +▁figura 1 +▁espacio 1 +▁grandes 1 +▁equipo 1 +▁pregunta 1 +▁coordinación 1 +▁Secretaría 1 +▁cumplimiento 1 +▁sistemas 1 +▁interés 1 +▁ayudar 1 +▁Ministerio 1 +miento 1 +▁reuniones 1 +ru 1 +▁pasado 1 +fa 1 +tiva 1 +▁energía 1 +▁regional 1 +' 1 +▁nucleares 1 +▁representa 1 +▁ámbito 1 +▁promoción 1 +▁opinión 1 +ge 1 +jo 1 +▁pueblo 1 +▁fundamental 1 +▁poner 1 +▁Departamento 1 +ización 1 +fe 1 +▁edad 1 +▁realizar 1 +▁representante 1 +▁autor 1 +▁comunicación 1 +▁técnica 1 +▁crisis 1 +▁ra 1 +▁historia 1 +▁reforma 1 +▁Europeo 1 +▁procedimientos 1 +▁mantenimiento 1 +▁sean 1 +▁creo 1 +5 1 +▁resultado 1 +R 1 +▁pueda 1 +fer 1 +▁obligaciones 1 +vo 1 +▁marzo 1 +▁resoluciones 1 +▁compromiso 1 +6 1 +▁reducir 1 +▁carácter 1 +▁texto 1 +▁T 1 +▁tomar 1 +▁2005 1 +▁idea 1 +▁Informe 1 +▁veces 1 +emos 1 +▁fueron 1 +▁aumentar 1 +▁correspondiente 1 +▁puedan 1 +▁dicho 1 +▁propuestas 1 +▁mecanismos 1 +▁Israel 1 +▁Ca 1 +▁común 1 +ismo 1 +▁Ahora 1 +▁cuatro 1 +▁gracias 1 +▁policía 1 +▁lista 1 +▁actos 1 +▁locales 1 +rio 1 +▁abril 1 +▁considera 1 +▁ambiente 1 +▁meses 1 +▁serie 1 +▁familia 1 +▁instrumentos 1 +▁da 1 +▁género 1 +▁mayo 1 +lla 1 +▁Cuando 1 +▁2000 1 +▁solución 1 +▁conflicto 1 +▁toma 1 +▁jóvenes 1 +▁L 1 +ando 1 +▁amplia 1 +▁of 1 +man 1 +▁guerra 1 +H 1 +po 1 +3 1 +▁prevención 1 +▁mientras 1 +▁varios 1 +▁aspectos 1 +▁diálogo 1 +▁Carta 1 +▁Misión 1 +▁preocupación 1 +▁aunque 1 +▁ven 1 +▁zona 1 +1 1 +▁estrategia 1 +▁cinco 1 +▁obstante 1 +▁expertos 1 +▁vista 1 +il 1 +por 1 +con 1 +▁Pa 1 +▁i 1 +▁asegurar 1 +▁iniciativas 1 +▁China 1 +▁efecto 1 +ya 1 +▁conjunto 1 +▁Pacto 1 +▁ve 1 +sión 1 +▁puesto 1 +ados 1 +▁f 1 +▁capacitación 1 +▁demás 1 +▁fundamentales 1 +be 1 
+▁the 1 +▁entonces 1 +); 1 +▁experiencia 1 +▁sería 1 +▁servicio 1 +▁dijo 1 +▁facilitar 1 +▁24 1 +▁documentos 1 +▁alto 1 +▁América 1 +us 1 +▁2004 1 +cha 1 +▁transporte 1 +▁procedimiento 1 +sta 1 +▁cumplir 1 +▁UE 1 +▁pueblos 1 +▁encontrar 1 +▁enero 1 +▁2002 1 +▁Tratado 1 +▁Convenio 1 +▁25 1 +▁nuestras 1 +▁2001 1 +▁resulta 1 +▁hombres 1 +▁li 1 +ran 1 +4 1 +▁local 1 +▁Representante 1 +era 1 +▁debido 1 +▁relativa 1 +▁crecimiento 1 +▁21 1 +▁relaciones 1 +▁especiales 1 +▁territorio 1 +! 1 +ni 1 +▁últimos 1 +▁puestos 1 +▁producción 1 +nd 1 +▁libre 1 +▁estudio 1 +▁formación 1 +▁22 1 +▁2003 1 +V 1 +▁todavía 1 +▁oportunidad 1 +”. 1 +▁centro 1 +▁imp 1 +▁Acción 1 +per 1 +▁tenía 1 +▁funciones 1 +▁acuerdos 1 +▁celebrada 1 +▁colaboración 1 +▁Iraq 1 +▁luego 1 +▁establecido 1 +▁economía 1 +▁señor 1 +ista 1 +▁consecuencias 1 +▁Le 1 +▁consultas 1 +▁VIH 1 +ré 1 +▁proteger 1 +um 1 +▁relacionadas 1 +▁principal 1 +▁Ba 1 +▁leyes 1 +▁sa 1 +▁llevar 1 +▁saber 1 +▁llegar 1 +▁Hay 1 +▁eran 1 +▁Co 1 +▁mantener 1 +▁casi 1 +▁éxito 1 +.2 1 +▁dis 1 +▁reducción 1 +▁utilización 1 +▁indica 1 +▁quiero 1 +▁podrá 1 +▁ro 1 +▁referencia 1 +tal 1 +▁refugiados 1 +▁ninguna 1 +▁eficaz 1 +▁ciudad 1 +▁ella 1 +w 1 +▁Di 1 +▁2006 1 +▁adopción 1 +tado 1 +▁prestar 1 +▁R 1 +▁financieros 1 +▁enfoque 1 +▁gobierno 1 +▁presentar 1 +▁ejercicio 1 +gu 1 +▁aprobación 1 +▁bienes 1 +▁presentación 1 +▁suma 1 +▁consecuencia 1 +▁31 1 +▁hotel 1 +ha 1 +▁Te 1 +it 1 +▁deberían 1 +▁hizo 1 +/1 1 +▁plenamente 1 +▁Penal 1 +ente 1 +▁políticos 1 +▁comunidades 1 +▁comp 1 +▁23 1 +▁indígenas 1 +▁razón 1 +▁régimen 1 +▁ellas 1 +▁riesgo 1 +▁Artículo 1 +▁Sra 1 +▁espera 1 +adas 1 +▁difícil 1 +ros 1 +▁Pro 1 +▁buena 1 +", 1 +▁varias 1 +du 1 +cri 1 +▁fuerzas 1 +▁menores 1 +▁presentado 1 +▁administración 1 +▁determinar 1 +▁nada 1 +cia 1 +▁obtener 1 +▁cuales 1 +▁and 1 +▁generales 1 +▁temas 1 +▁participar 1 +lu 1 +ber 1 +ing 1 +▁cu 1 +▁ingresos 1 +▁2007 1 +”, 1 +▁alguna 1 +cen 1 +▁sub 1 +cción 1 +▁proporcionar 1 +▁Código 1 +st 1 +men 1 +ll 1 +▁utilizar 1 +▁inglés 1 +tion 1 +▁Mo 1 +▁pe 1 +▁igual 1 +ct 1 +▁parece 1 +▁significa 1 +▁Vi 1 +▁dado 1 +ita 1 +▁esfera 1 +): 1 +den 1 +▁ningún 1 +tor 1 +▁debemos 1 +▁ne 1 +▁delito 1 +▁niveles 1 +▁observa 1 +▁incluidos 1 +▁curso 1 +▁concepto 1 +com 1 +▁mar 1 +▁seguimiento 1 +▁He 1 +▁político 1 +▁par 1 +▁agosto 1 +CN 1 +Qué 1 +▁observaciones 1 +ck 1 +▁existe 1 +▁apoyar 1 +▁mis 1 +▁casa 1 +cer 1 +▁peligro 1 +▁teniendo 1 +▁competencia 1 +▁análisis 1 +▁Permanente 1 +▁alcanzar 1 +▁centros 1 +▁so 1 +ho 1 +▁auto 1 +amos 1 +▁trabajadores 1 +bra 1 +▁segunda 1 +▁U 1 +▁ir 1 +▁bo 1 +▁II 1 +▁negociaciones 1 +▁real 1 +▁sexual 1 +ieron 1 +▁nunca 1 +▁fa 1 +▁secretaría 1 +ren 1 +▁N 1 +▁asuntos 1 +istas 1 +▁g 1 +▁progresos 1 +▁regiones 1 +▁relacionados 1 +7 1 +▁provisional 1 +Aplausos 1 +▁hacerlo 1 +ec 1 +car 1 +cio 1 +▁cargo 1 +cho 1 +▁desea 1 +rra 1 +▁evitar 1 +▁Económico 1 +▁pi 1 +pre 1 +▁violaciones 1 +ón 1 +/2 1 +8 1 +▁dinero 1 +▁K 1 +▁pesar 1 +▁insta 1 +hi 1 +▁Sa 1 +ul 1 +▁haciendo 1 +▁artículos 1 +▁febrero 1 +▁respeto 1 +▁sal 1 +▁esferas 1 +▁necesita 1 +▁usted 1 +ver 1 +▁financiera 1 +enta 1 +▁iniciativa 1 +▁escala 1 +▁civiles 1 +▁Relator 1 +▁Reino 1 +▁Rusia 1 +▁disposición 1 +▁independiente 1 +▁Centro 1 +▁penal 1 +tes 1 +▁fines 1 +▁terreno 1 +▁• 1 +▁trabajar 1 +▁dispuesto 1 +▁Medio 1 +▁podrían 1 +▁red 1 +▁judicial 1 +véase 1 +▁26 1 +▁oportunidades 1 +▁tales 1 +▁ad 1 +▁amenaza 1 +▁Constitución 1 +▁fuerza 1 +▁relativo 1 +▁ofrece 1 +▁necesarias 1 +gue 1 +izar 1 +▁desarme 1 +ras 1 +tivas 1 +▁párr 1 +▁cerca 1 +tivo 1 +▁Milenio 1 +▁Corte 1 +▁Democrática 1 
+▁consiguiente 1 +tación 1 +▁miembro 1 +▁muerte 1 +▁estructura 1 +▁fondo 1 +▁delitos 1 +▁privado 1 +▁28 1 +▁Acuerdo 1 +▁claro 1 +▁pues 1 +▁aprobado 1 +▁hablar 1 +▁27 1 +▁EE 1 +▁Desde 1 +▁votación 1 +▁anterior 1 +▁seis 1 +▁adoptadas 1 +▁fortalecer 1 +▁2008 1 +▁Unido 1 +che 1 +▁lugares 1 +▁Plan 1 +▁mecanismo 1 +▁for 1 +▁misión 1 +▁tratamiento 1 +gre 1 +▁abordar 1 +Add 1 +▁enseñanza 1 +su 1 +nes 1 +▁incluida 1 +▁Mar 1 +▁Asia 1 +▁amplio 1 +SIDA 1 +▁Po 1 +▁estoy 1 +▁Esa 1 +▁palabras 1 +▁resolver 1 +▁sitio 1 +imos 1 +▁elecciones 1 +▁incluye 1 +▁Ro 1 +▁reconocimiento 1 +▁contribuciones 1 +▁Durante 1 +▁aprobó 1 +ró 1 +he 1 +UU 1 +ic 1 +▁niñas 1 +▁tratados 1 +idas 1 +rios 1 +▁cuyo 1 +▁señala 1 +▁recomienda 1 +▁planes 1 +sti 1 +▁propia 1 +▁partir 1 +▁actividad 1 +▁Cumbre 1 +cto 1 +▁Da 1 +th 1 +▁ba 1 +▁diversos 1 +mina 1 +▁elementos 1 +▁intereses 1 +▁Ministro 1 +tre 1 +▁junto 1 +.5 1 +▁York 1 +▁tenido 1 +siones 1 +▁elaboración 1 +▁sectores 1 +▁demanda 1 +ió 1 +j 1 +▁1999 1 +▁contenido 1 +▁presenta 1 +▁diversas 1 +▁único 1 +▁declaraciones 1 +▁eficacia 1 +▁militares 1 +▁quienes 1 +▁vigor 1 +entes 1 +▁Aunque 1 +▁miras 1 +▁completa 1 +ng 1 +ue 1 +▁instrumento 1 +▁incluir 1 +▁propio 1 +▁reserva 1 +▁tratar 1 +ac 1 +▁Tema 1 +▁necesarios 1 +▁empresa 1 +▁origen 1 +▁mí 1 +▁preparación 1 +nas 1 +▁escrito 1 +▁cuadro 1 +▁delegaciones 1 +▁recibir 1 +▁Paz 1 +▁constituye 1 +▁estudios 1 +▁estrategias 1 +▁bienio 1 +▁conocimientos 1 +▁central 1 +▁vol 1 +sas 1 +▁interna 1 +▁nuclear 1 +▁integración 1 +▁hijos 1 +▁Alto 1 +0.000 1 +▁militar 1 +▁supervisión 1 +▁Internet 1 +▁podía 1 +▁Ra 1 +▁estaban 1 +▁hacen 1 +▁car 1 +ana 1 +▁obligación 1 +▁situaciones 1 +▁pasa 1 +▁Asimismo 1 +9 1 +▁tortura 1 +▁permite 1 +▁familias 1 +▁H 1 +pu 1 +▁satisfacción 1 +▁supuesto 1 +min 1 +▁50 1 +ales 1 +▁entidades 1 +▁na 1 +▁afirma 1 +▁infraestructura 1 +▁escuelas 1 +▁pequeñas 1 +▁San 1 +▁posición 1 +mp 1 +▁fi 1 +▁funciona 1 +▁Congo 1 +▁misiones 1 +zo 1 +.4 1 +▁cierto 1 +10 1 +▁material 1 +▁superior 1 +▁sección 1 +▁económicos 1 +▁humana 1 +▁luz 1 +▁apoya 1 +gen 1 +▁línea 1 +▁Ho 1 +▁pide 1 +am 1 +ña 1 +▁apartado 1 +▁India 1 +▁cambios 1 +mar 1 +ces 1 +▁democracia 1 +▁ho 1 +▁global 1 +▁anual 1 +▁adoptado 1 +K 1 +ina 1 +▁usar 1 +ero 1 +nda 1 +▁capital 1 +io 1 +cos 1 +ed 1 +▁participantes 1 +▁juicio 1 +fo 1 +▁dirección 1 +▁palabra 1 +▁violación 1 +▁contribuir 1 +▁pena 1 +▁figuran 1 +▁intercambio 1 +▁solicitud 1 +▁mejores 1 +▁incluidas 1 +▁Sala 1 +▁alta 1 +▁comunicaciones 1 +▁cantidad 1 +▁última 1 +▁financieras 1 +▁suficiente 1 +▁titulado 1 +▁oficiales 1 +▁cultura 1 +▁actualmente 1 +▁graves 1 +▁trans 1 +▁oficial 1 +▁tenga 1 +▁allí 1 +▁firma 1 +▁universal 1 +ment 1 +▁marcha 1 +▁padres 1 +▁29 1 +▁industria 1 +▁asociados 1 +▁preocupa 1 +▁alcance 1 +▁invita 1 +gar 1 +▁respecta 1 +▁V 1 +ex 1 +▁diferencia 1 +▁tráfico 1 +cion 1 +▁siguen 1 +▁2000, 1 +▁pidió 1 +▁públicos 1 +▁mal 1 +▁carta 1 +▁Entonces 1 +▁condición 1 +▁relativos 1 +ye 1 +▁camino 1 +▁compromisos 1 +▁PNUD 1 +▁recomendación 1 +▁com 1 +▁semana 1 +▁alrededor 1 +0 1 +.3 1 +rre 1 +▁armados 1 +▁detención 1 +▁continuación 1 +wa 1 +▁Presidencia 1 +▁tribunales 1 +bilidad 1 +bles 1 +▁tengan 1 +llo 1 +▁tierra 1 +sen 1 +ka 1 +▁finales 1 +▁Federación 1 +▁expresa 1 +▁acciones 1 +▁Entre 1 +▁carga 1 +▁eliminar 1 +▁distintos 1 +▁funcionamiento 1 +▁circunstancias 1 +▁elaborar 1 +▁Justicia 1 +▁ri 1 +▁donantes 1 +▁man 1 +▁pensar 1 +▁entorno 1 +▁desarrollar 1 +habla 1 +▁tercer 1 +▁contar 1 +▁Li 1 +▁procesos 1 +▁Derecho 1 +▁refiere 1 +▁directrices 1 +▁van 1 +▁reglamento 1 +▁Afganistán 1 +▁Segú 1 +▁modelo 1 +▁hi 1 +dad 
1 +▁Creo 1 +art 1 +▁jurídico 1 +▁Bo 1 +▁seguro 1 +▁pobres 1 +▁dejar 1 +▁cosa 1 +tivos 1 +▁be 1 +▁esté 1 +▁existentes 1 +▁ce 1 +▁incluido 1 +▁dificultades 1 +▁comerciales 1 +aciones 1 +▁propiedad 1 +▁DE 1 +lle 1 +50 1 +ut 1 +▁drogas 1 +▁previsto 1 +▁considerar 1 +▁dispone 1 +▁razones 1 +▁res 1 +▁reconoce 1 +▁hu 1 +▁Todos 1 +▁adicionales 1 +▁naturales 1 +▁plena 1 +▁fe 1 +▁2005, 1 +▁establece 1 +rían 1 +▁deseo 1 +remos 1 +▁Administración 1 +tribu 1 +▁tecnologías 1 +▁única 1 +▁perspectiva 1 +▁paso 1 +▁mu 1 +▁ob 1 +▁cambiar 1 +▁interesados 1 +qué 1 +▁can 1 +ad 1 +▁estabilidad 1 +▁construcción 1 +▁humano 1 +▁eliminación 1 +▁W 1 +▁financiero 1 +ia 1 +▁confianza 1 +▁anti 1 +▁humanitaria 1 +▁materiales 1 +▁J 1 +tri 1 +▁quisiera 1 +▁efectiva 1 +▁anteriores 1 +dio 1 +▁cuya 1 +ton 1 +▁pruebas 1 +▁Canadá 1 +can 1 +▁– 1 +▁hombre 1 +▁oficinas 1 +té 1 +▁Japón 1 +tura 1 +▁Francia 1 +▁pequeños 1 +▁tasa 1 +▁prevenir 1 +▁económicas 1 +▁decidió 1 +▁tendrá 1 +▁opiniones 1 +▁contribución 1 +▁éste 1 +pen 1 +▁criterios 1 +▁determinación 1 +tera 1 +▁conseguir 1 +▁UNICEF 1 +▁hora 1 +▁queda 1 +▁producto 1 +▁dicha 1 +▁Banco 1 +▁conocimiento 1 +▁prevé 1 +▁verdad 1 +bu 1 +▁tribunal 1 +▁básicos 1 +▁culturales 1 +ol 1 +▁conclusiones 1 +cre 1 +▁entender 1 +▁Gracias 1 +▁Yo 1 +▁acoge 1 +▁encuentra 1 +▁Gran 1 +▁2001, 1 +AC 1 +▁climático 1 +▁quien 1 +▁profesional 1 +▁aplica 1 +▁2003, 1 +▁planificación 1 +▁conferencias 1 +▁40 1 +▁necesaria 1 +▁prioridades 1 +▁web 1 +▁tuvo 1 +ig 1 +▁dirigida 1 +Y 1 +▁60 1 +▁trabajos 1 +▁naturaleza 1 +▁recibido 1 +▁Eso 1 +▁presentó 1 +▁extra 1 +ario 1 +et 1 +▁siglo 1 +▁Comisionado 1 +▁escuela 1 +▁esencial 1 +▁tengo 1 +▁formular 1 +▁física 1 +▁presencia 1 +gan 1 +spe 1 +▁UN 1 +▁valores 1 +▁beneficios 1 +mento 1 +rias 1 +nu 1 +▁Países 1 +▁estima 1 +▁Presidenta 1 +▁proliferación 1 +mas 1 +▁señaló 1 +▁directamente 1 +▁transición 1 +▁posibles 1 +▁Mujer 1 +tan 1 +▁2004, 1 +▁comercial 1 +▁diseño 1 +▁mercados 1 +sto 1 +tica 1 +▁informar 1 +cur 1 +▁duda 1 +▁Norte 1 +▁reforzar 1 +▁2006, 1 +▁gra 1 +▁explotación 1 +▁Información 1 +▁entrada 1 +▁distribución 1 +▁2002, 1 +▁motivos 1 +▁permitir 1 +▁existen 1 +▁Ex 1 +▁Pe 1 +▁jurídica 1 +▁evaluar 1 +▁agentes 1 +▁transferencia 1 +nta 1 +15 1 +▁algún 1 +▁División 1 +▁combatir 1 +▁existencia 1 +▁independencia 1 +cas 1 +▁lleva 1 +izado 1 +▁europea 1 +▁fortalecimiento 1 +der 1 +▁cuerpo 1 +rro 1 +▁tipos 1 +▁asunto 1 +▁do 1 +▁exp 1 +▁Instituto 1 +▁Director 1 +▁fuerte 1 +ándose 1 +dica 1 +▁preparar 1 +▁puntos 1 +▁realizado 1 +▁serán 1 +▁clave 1 +▁fuentes 1 +▁fomentar 1 +nia 1 +▁Recordando 1 +at 1 +▁internos 1 +▁métodos 1 +▁mismos 1 +ctor 1 +▁infantil 1 +▁grado 1 +▁instalaciones 1 +▁bastante 1 +▁gustaría 1 +▁términos 1 +▁oficina 1 +▁Kosovo 1 +▁autoridad 1 +▁Na 1 +▁Pide 1 +▁preguntas 1 +▁cultural 1 +▁territorios 1 +▁alguien 1 +▁Ta 1 +▁Foro 1 +▁menudo 1 +▁pasar 1 +▁pesca 1 +▁señalar 1 +lí 1 +sos 1 +▁terroristas 1 +▁visto 1 +có 1 +La 1 +▁corre 1 +ste 1 +▁Véase 1 +fu 1 +W 1 +▁Ginebra 1 +▁habría 1 +▁Ka 1 +▁Sudán 1 +▁órgano 1 +▁Consultiva 1 +▁Niño 1 +▁fomento 1 +▁completo 1 +▁prioridad 1 +iera 1 +org 1 +▁motivo 1 +▁¡ 1 +▁distintas 1 +▁celebrar 1 +▁cuentas 1 +▁deberá 1 +lar 1 +▁refleja 1 +▁Ar 1 +xi 1 +▁defensa 1 +ke 1 +▁propone 1 +iza 1 +▁1997 1 +▁asimismo 1 +par 1 +▁extraordinario 1 +▁muestra 1 +▁claramente 1 +▁basa 1 +▁pu 1 +▁abierta 1 +▁próximo 1 +▁vigilancia 1 +▁llega 1 +▁permanente 1 +para 1 +▁venta 1 +ine 1 +▁discapacidad 1 +▁menor 1 +pare 1 +▁abarca 1 +▁baja 1 +nal 1 +▁tratado 1 +▁m 1 +stru 1 +ie 1 +▁solicita 1 +▁Bueno 1 +▁consulta 1 +▁México 1 +▁voluntad 1 +▁Alemania 1 
+▁palestino 1 +▁saben 1 +▁prestación 1 +▁responsables 1 +▁Ja 1 +▁Com 1 +▁deuda 1 +▁demasiado 1 +▁diferente 1 +▁conforme 1 +▁(19 1 +▁categoría 1 +sh 1 +▁cor 1 +rnos 1 +pon 1 +▁Ju 1 +nt 1 +▁tareas 1 +Cómo 1 +▁buen 1 +_ 1 +▁concretas 1 +▁depende 1 +▁miles 1 +▁inversión 1 +▁totalmente 1 +▁exige 1 +▁ocupa 1 +▁ciencia 1 +idades 1 +▁preciso 1 +▁humanitario 1 +▁enfermedades 1 +▁grave 1 +nto 1 +▁cerebro 1 +▁cuarto 1 +▁ustedes 1 +▁1996 1 +▁puedo 1 +▁progreso 1 +▁Comunidad 1 +▁visita 1 +son 1 +▁Irlanda 1 +▁costos 1 +▁integrado 1 +▁condena 1 +▁Oriente 1 +▁creado 1 +OS 1 +▁ésta 1 +▁enmiendas 1 +▁bu 1 +▁Mu 1 +▁precios 1 +▁costo 1 +able 1 +▁s 1 +▁prueba 1 +▁sanciones 1 +▁fácil 1 +cal 1 +▁pago 1 +▁primero 1 +▁2007, 1 +mb 1 +pro 1 +▁públicas 1 +▁fu 1 +ced 1 +▁Hotel 1 +▁riesgos 1 +▁Ya 1 +▁ideas 1 +▁adecuada 1 +▁debía 1 +▁alimentos 1 +tin 1 +▁du 1 +ki 1 +▁mayores 1 +▁Cooperación 1 +▁Sur 1 +▁Estos 1 +▁marca 1 +ou 1 +ud 1 +▁causas 1 +▁responder 1 +▁Proyecto 1 +▁máximo 1 +No 1 +▁directa 1 +cía 1 +▁simplemente 1 +▁pleno 1 +▁destrucción 1 +▁2009 1 +▁informa 1 +▁ahí 1 +oc 1 +5% 1 +ifica 1 +▁ofrecer 1 +land 1 +om 1 +▁mínimo 1 +▁1998 1 +dió 1 +▁Corea 1 +▁estén 1 +▁reciente 1 +▁Brasil 1 +▁Toma 1 +cida 1 +mba 1 +▁aprobar 1 +▁principalmente 1 +▁asesoramiento 1 +20 1 +rica 1 +▁mucha 1 +▁Cuba 1 +▁mejora 1 +▁tarea 1 +▁particularmente 1 +▁mano 1 +▁soluciones 1 +▁eficaces 1 +▁Turquía 1 +▁ti 1 +▁llamamiento 1 +▁Servicio 1 +puesta 1 +bor 1 +▁urgente 1 +▁redes 1 +▁esperanza 1 +ER 1 +▁Comisario 1 +▁posibilidades 1 +▁importa 1 +▁palestinos 1 +▁profesionales 1 +▁investigaciones 1 +▁registro 1 +11 1 +▁Reunión 1 +▁asociación 1 +▁actuales 1 +▁mitad 1 +▁« 1 +▁propuesto 1 +des 1 +▁crea 1 +▁quincuagésimo 1 +▁párrafos 1 +▁sexo 1 +▁viven 1 +° 1 +▁Lu 1 +▁electrónico 1 +▁ordinario 1 +▁mil 1 +▁comunes 1 +▁inversiones 1 +▁100 1 +ció 1 +▁consenso 1 +▁volver 1 +▁Reglamento 1 +▁continua 1 +▁examina 1 +▁mundiales 1 +▁emp 1 +▁hechos 1 +▁vincula 1 +ante 1 +▁emergencia 1 +▁realización 1 +clu 1 +▁libertades 1 +▁celebrado 1 +CI 1 +▁rápido 1 +▁cabe 1 +▁recientemente 1 +til 1 +▁An 1 +▁Que 1 +▁Salud 1 +im 1 +▁indicadores 1 +▁Servicios 1 +▁estudiantes 1 +mu 1 +▁garantiza 1 +▁interesante 1 +Por 1 +▁resto 1 +▁activa 1 +▁conclusión 1 +torio 1 +▁pequeño 1 +▁rurales 1 +▁transparencia 1 +iva 1 +▁revisión 1 +▁logro 1 +▁Fa 1 +▁adicional 1 +▁ci 1 +és 1 +▁ponente 1 +▁Estas 1 +12 1 +▁mañana 1 +mit 1 +▁enmienda 1 +▁necesitan 1 +.9 1 +▁clara 1 +lan 1 +▁celebra 1 +▁legal 1 +▁ta 1 +▁realizados 1 +▁contratación 1 +▁Ga 1 +▁institucional 1 +▁períodos 1 +▁haga 1 +▁beneplácito 1 +▁grande 1 +▁responsable 1 +▁racial 1 +▁propios 1 +▁1995 1 +gos 1 +▁minutos 1 +▁Palestina 1 +via 1 +▁europeos 1 +qu 1 +▁implica 1 +▁preparado 1 +▁enviar 1 +sis 1 +▁hincapié 1 +AR 1 +ice 1 +▁comenzar 1 +▁extranjeros 1 +▁familiar 1 +▁especializados 1 +dia 1 +▁sol 1 +▁requisitos 1 +▁España 1 +▁normal 1 +▁mes 1 +▁orientación 1 +▁firme 1 +▁Africana 1 +▁Irán 1 +ju 1 +ieran 1 +▁2008, 1 +▁ciudades 1 +▁To 1 +40 1 +bri 1 +▁inicial 1 +▁Ejecutivo 1 +▁media 1 +▁ru 1 +met 1 +▁ambos 1 +▁definición 1 +▁llama 1 +▁presta 1 +▁multi 1 +▁disponibles 1 +X 1 +25 1 +amiento 1 +gui 1 +▁Organismo 1 +▁celebró 1 +▁bi 1 +▁Pre 1 +▁sur 1 +▁vulnerables 1 +▁etc 1 +toria 1 +▁factores 1 +▁Mesa 1 +▁constituyen 1 +ry 1 +▁consumo 1 +% 1 +ko 1 +▁queremos 1 +▁viaje 1 +▁simple 1 +idos 1 +▁respuestas 1 +▁contiene 1 +▁versión 1 +▁encuentran 1 +▁imagen 1 +yo 1 +ito 1 +eo 1 +▁minorías 1 +▁qui 1 +▁Porque 1 +▁capítulo 1 +ES 1 +▁contrario 1 +El 1 +▁Hace 1 +▁petición 1 +▁afectados 1 +▁aquellos 1 +▁conocer 1 +▁interior 1 +▁actuar 1 +?" 
1 +aba 1 +▁voy 1 +▁fase 1 +▁adelante 1 +▁asociaciones 1 +▁allá 1 +▁Caribe 1 +▁Dis 1 +▁llamado 1 +▁desempeñar 1 +cul 1 +▁conjunta 1 +▁Leona 1 +gado 1 +nce 1 +▁UNCTAD 1 +▁construir 1 +▁Bu 1 +▁empezar 1 +▁tarde 1 +▁siga 1 +▁difusión 1 +▁planeta 1 +▁Líbano 1 +▁comisión 1 +▁únicamente 1 +▁v 1 +▁contrato 1 +30 1 +▁Facultativo 1 +▁campo 1 +▁podrán 1 +▁etapa 1 +▁Comercio 1 +▁diversidad 1 +▁noche 1 +▁página 1 +▁solamente 1 +▁prohibición 1 +J 1 +ger 1 +▁Examen 1 +▁constante 1 +▁potencial 1 +▁intención 1 +▁garantías 1 +vis 1 +▁cal 1 +▁encargado 1 +▁efectivo 1 +▁pedir 1 +▁Tenemos 1 +▁plano 1 +tina 1 +▁quiere 1 +amente 1 +▁tri 1 +una 1 +▁europeo 1 +▁deberán 1 +▁reservas 1 +▁arte 1 +▁Sierra 1 +▁calle 1 +▁breve 1 +▁masa 1 +▁informó 1 +▁competentes 1 +▁celebrará 1 +ivo 1 +▁madre 1 +▁título 1 +▁propósito 1 +▁Educación 1 +▁Ejecutiva 1 +▁expresión 1 +▁sabemos 1 +AN 1 +UN 1 +▁enorme 1 +▁tenían 1 +▁1° 1 +▁conferencia 1 +▁recordar 1 +▁entrega 1 +▁supone 1 +▁Exteriores 1 +▁interno 1 +▁estudiar 1 +▁abierto 1 +▁ONG 1 +▁Pacífico 1 +▁trabajando 1 +▁Fi 1 +▁Australia 1 +▁denuncia 1 +13 1 +▁destaca 1 +▁velar 1 +dora 1 +▁Relaciones 1 +▁cuotas 1 +▁familiares 1 +▁garantía 1 +▁1990 1 +▁Árabe 1 +ere 1 +▁Go 1 +▁pie 1 +▁celebración 1 +▁Quiero 1 +rme 1 +▁estratégico 1 +▁técnicas 1 +mer 1 +▁conciencia 1 +op 1 +▁próxima 1 +▁III 1 +▁licencia 1 +▁ámbitos 1 +▁Italia 1 +▁aprobada 1 +▁minas 1 +▁bueno 1 +▁ampliar 1 +▁basada 1 +▁afecta 1 +▁pequeña 1 +▁confirma 1 +rí 1 +▁vamos 1 +▁... 1 +▁isla 1 +▁debates 1 +▁actualidad 1 +ura 1 +▁religión 1 +▁Ne 1 +anza 1 +▁corresponde 1 +val 1 +▁Ese 1 +▁matrimonio 1 +▁solicitudes 1 +▁Hi 1 +▁h 1 +cta 1 +▁Después 1 +▁requiere 1 +tada 1 +▁jurídicos 1 +rea 1 +.7 1 +▁Occidental 1 +▁proporciona 1 +▁continuar 1 +▁recuperación 1 +▁daños 1 +▁cita 1 +▁formulación 1 +▁juego 1 +▁expresar 1 +-20 1 +▁hubiera 1 +▁bienestar 1 +▁precio 1 +▁semanas 1 +▁contacto 1 +ez 1 +▁luchar 1 +nza 1 +▁Ni 1 +▁aprovechar 1 +▁llegado 1 +▁sala 1 +▁necesitamos 1 +▁conducto 1 +▁trato 1 +▁Do 1 +▁Política 1 +▁busca 1 +▁Subcomisión 1 +duc 1 +▁nadie 1 +▁dio 1 +▁pérdida 1 +▁Timor 1 +id 1 +tí 1 +▁poblaciones 1 +▁Guinea 1 +▁natural 1 +▁somos 1 +▁esfuerzo 1 +▁clientes 1 +▁Anexo 1 +▁impacto 1 +▁habitaciones 1 +▁vivienda 1 +▁prisión 1 +▁aproximadamente 1 +▁plantea 1 +▁Asociación 1 +▁creciente 1 +▁examinado 1 +▁estrecha 1 +em 1 +ist 1 +▁evolución 1 +▁justo 1 +▁Algunos 1 +▁código 1 +▁clase 1 +▁ocupación 1 +▁intervención 1 +▁humanidad 1 +▁habrá 1 +▁Bosnia 1 +▁Todo 1 +▁habla 1 +▁ampliación 1 +14 1 +▁Inter 1 +▁avances 1 +▁impedir 1 +▁sexagésimo 1 +▁pone 1 +nar 1 +▁Ve 1 +▁cap 1 +MI 1 +▁Uno 1 +mor 1 +flu 1 +▁mental 1 +▁evidente 1 +▁diferencias 1 +▁educa 1 +▁adhesión 1 +▁tercera 1 +tel 1 +▁"¿ 1 +▁Dependencia 1 +▁fuente 1 +antes 1 +▁Pakistán 1 +ula 1 +» 1 +▁honor 1 +▁sabe 1 +▁frecuencia 1 +▁economías 1 +▁útil 1 +ld 1 +▁Son 1 +cina 1 +▁0 1 +▁movimiento 1 +gó 1 +▁suficientes 1 +▁recurso 1 +▁rec 1 +▁fra 1 +▁examinó 1 +▁visión 1 +lin 1 +▁Pública 1 +▁utiliza 1 +▁fronteras 1 +▁aspecto 1 +▁identidad 1 +▁logrado 1 +▁tierras 1 +▁acto 1 +▁hicieron 1 +.6 1 +▁nacionalidad 1 +▁idiomas 1 +▁Nos 1 +▁aire 1 +▁interesadas 1 +▁ref 1 +rma 1 +▁acaba 1 +▁Cada 1 +dic 1 +▁rápidamente 1 +▁Universidad 1 +▁facilita 1 +▁mesa 1 +▁Je 1 +▁incluyen 1 +▁estadísticas 1 +▁Está 1 +dida 1 +▁aceptar 1 +▁corrupción 1 +▁judiciales 1 +▁partida 1 +▁oficiosas 1 +rico 1 +ea 1 +▁exterior 1 +▁Sudáfrica 1 +tico 1 +▁orador 1 +▁2010 1 +▁Tiene 1 +▁come 1 +▁compra 1 +▁médico 1 +▁Hemos 1 +izada 1 +▁elección 1 +▁destino 1 +▁enfermedad 1 +▁probablemente 1 +▁autores 1 +▁periódico 1 +▁1994 1 +▁opciones 1 +rgi 1 
+▁comportamiento 1 +▁acontecimientos 1 +▁Man 1 +▁excelente 1 +cl 1 +▁Pi 1 +▁agricultura 1 +▁inf 1 +▁siete 1 +▁cambia 1 +18 1 +▁inclusión 1 +▁representación 1 +▁soy 1 +tt 1 +▁duración 1 +▁ju 1 +▁obstáculos 1 +▁Aquí 1 +▁Tras 1 +▁l 1 +▁opera 1 +▁mensaje 1 +tarios 1 +▁cursos 1 +▁ilícito 1 +stra 1 +▁elemento 1 +▁Estamos 1 +▁eficiencia 1 +bre 1 +aje 1 +* 1 +▁indemnización 1 +— 1 +▁animales 1 +▁adecuado 1 +▁recientes 1 +▁Herzegovina 1 +▁bar 1 +▁sé 1 +▁Desarme 1 +▁Bajos 1 +▁op 1 +▁Sociales 1 +▁vidas 1 +▁dije 1 +▁Wa 1 +▁plenaria 1 +▁adelantados 1 +▁consolidación 1 +▁directo 1 +▁reformas 1 +▁Ki 1 +▁israelíes 1 +ica 1 +ly 1 +port 1 +▁extranjero 1 +▁acompaña 1 +▁ajuste 1 +▁ataques 1 +▁trabaja 1 +▁privada 1 +▁pronto 1 +60 1 +rlo 1 +▁dirigentes 1 +ible 1 +▁mostrar 1 +ino 1 +endo 1 +▁gusta 1 +▁reitera 1 +ab 1 +IT 1 +▁hogar 1 +▁crédito 1 +▁seguirá 1 +▁Sede 1 +▁búsqueda 1 +▁decenio 1 +▁campaña 1 +▁basado 1 +▁Comp 1 +▁categorías 1 +▁llamada 1 +▁Popular 1 +iente 1 +▁dan 1 +siona 1 +▁activamente 1 +▁comprendido 1 +▁ONU 1 +ite 1 +16 1 +▁observó 1 +▁Fue 1 +▁considerable 1 +▁Egipto 1 +▁equipos 1 +▁constitucional 1 +▁occidental 1 +▁increíble 1 +ificación 1 +liza 1 +▁Viena 1 +▁tradicionales 1 +▁Protección 1 +▁termina 1 +▁jurisdicción 1 +▁específicas 1 +▁Car 1 +AD 1 +▁super 1 +▁Suiza 1 +día 1 +van 1 +▁regular 1 +▁rep 1 +▁eficiente 1 +▁Uds 1 +▁Latina 1 +▁Cu 1 +▁candidatos 1 +▁directiva 1 +jan 1 +oso 1 +▁Islámica 1 +▁organizar 1 +▁primeros 1 +pla 1 +▁reconocer 1 +▁pagar 1 +▁explica 1 +( 1 +▁Colombia 1 +▁conducta 1 +▁vivir 1 +▁teléfono 1 +vert 1 +▁participa 1 +▁completamente 1 +▁denuncias 1 +De 1 +ento 1 +cial 1 +▁supra 1 +▁oferta 1 +▁profunda 1 +sion 1 +▁rehabilitación 1 +▁centra 1 +▁emisiones 1 +▁suministro 1 +▁The 1 +▁Insta 1 +▁primaria 1 +▁sucede 1 +▁infancia 1 +tiendo 1 +▁doble 1 +▁Nota 1 +▁chi 1 +▁k 1 +▁ilegal 1 +▁Somalia 1 +▁Yugoslavia 1 +▁otorga 1 +▁Kuwait 1 +17 1 +▁laboral 1 +▁1999, 1 +▁producir 1 +▁experiencias 1 +▁Gestión 1 +▁formuladas 1 +cimiento 1 +▁asilo 1 +▁san 1 +▁Gu 1 +▁ciclo 1 +▁señora 1 +▁atender 1 +MA 1 +▁oriental 1 +fund 1 +▁dichos 1 +CE 1 +▁hacemos 1 +▁establecidos 1 +▁realiza 1 +▁migrantes 1 +▁razonable 1 +▁español 1 +ide 1 +▁So 1 +▁reglas 1 +ión 1 +ñ 1 +▁hablando 1 +én 1 +01 1 +▁amigos 1 +▁cuidado 1 +▁tomó 1 +▁composición 1 +▁expresó 1 +▁asigna 1 +▁compartir 1 +▁trabajan 1 +gel 1 +▁tendría 1 +▁hijo 1 +▁enfrenta 1 +/19 1 +▁estructuras 1 +▁deliberaciones 1 +▁combina 1 +▁acta 1 +▁positiva 1 +cip 1 +▁rápida 1 +mbre 1 +▁Presupuesto 1 +▁porcentaje 1 +we 1 +▁comienzo 1 +▁Expertos 1 +▁concluir 1 +▁usuarios 1 +▁consigna 1 +▁levanta 1 +pri 1 +▁disponible 1 +▁entra 1 +▁tomado 1 +▁Noruega 1 +▁sistemática 1 +▁Sección 1 +▁médicos 1 +cor 1 +▁Señorías 1 +ev 1 +▁demostrado 1 +nder 1 +▁imágenes 1 +▁X 1 +▁Indonesia 1 +▁Hoy 1 +▁propias 1 +▁comunitario 1 +▁Tierra 1 +▁multilaterales 1 +▁desplazados 1 +ificado 1 +▁integrada 1 +▁fiscal 1 +DE 1 +▁iniciar 1 +▁técnico 1 +▁hagan 1 +▁Estatuto 1 +jas 1 +▁alimentaria 1 +▁Ucrania 1 +table 1 +▁jurídicas 1 +gubernamentales 1 +▁costa 1 +▁desafíos 1 +▁desastres 1 +ones 1 +▁Central 1 +▁previstos 1 +▁ACNUR 1 +▁pertinente 1 +ion 1 +▁lado 1 +▁pocos 1 +▁ambas 1 +ua 1 +▁radio 1 +▁Señora 1 +▁incorporar 1 +▁Ri 1 +▁término 1 +▁fenómeno 1 +▁Rwanda 1 +▁Argentina 1 +▁mandatos 1 +▁estratégica 1 +▁integridad 1 +aria 1 +▁especie 1 +▁decidir 1 +bar 1 +▁temporal 1 +▁sentencia 1 +▁favorable 1 +chi 1 +▁escolar 1 +▁w 1 +▁ocho 1 +▁35 1 +vie 1 +▁clima 1 +▁euros 1 +▁im 1 +IS 1 +▁distrito 1 +▁libro 1 +▁Bien 1 +▁instancia 1 +▁Económica 1 +▁recomendó 1 +▁Estrategia 1 +lia 1 +▁compara 1 +▁servir 1 
+▁presentes 1 +▁intento 1 +▁Ello 1 +EN 1 +▁publicación 1 +ano 1 +▁industrial 1 +▁Aplicación 1 +▁Oriental 1 +▁médica 1 +dera 1 +▁científicos 1 +▁adolescentes 1 +▁Suecia 1 +▁estatuto 1 +▁técnicos 1 +▁dirigido 1 +icio 1 +▁tamaño 1 +▁alienta 1 +▁estipula 1 +▁petróleo 1 +▁cifras 1 +▁reafirma 1 +▁empleados 1 +▁contribuye 1 +▁africanos 1 +Se 1 +▁desarrollados 1 +▁logros 1 +▁remo 1 +▁presidente 1 +▁Económicos 1 +▁Embajador 1 +▁pedido 1 +▁científica 1 +▁♫ 1 +vers 1 +▁% 1 +▁norte 1 +▁legislativo 1 +▁Tal 1 +▁amenazas 1 +▁dependencia 1 +▁moral 1 +▁viola 1 +▁residencia 1 +▁tasas 1 +ner 1 +▁electrónica 1 +▁repercusiones 1 +mbra 1 +▁fuego 1 +/20 1 +▁opción 1 +▁tratos 1 +▁consideración 1 +▁buscar 1 +▁características 1 +▁racismo 1 +▁prensa 1 +▁padre 1 +▁Marco 1 +▁sociedades 1 +▁Z 1 +99 1 +IC 1 +▁restricciones 1 +▁consiste 1 +▁interpretación 1 +▁Documentos 1 +resolución 1 +▁000 1 +▁sola 1 +ts 1 +▁respetar 1 +▁espíritu 1 +▁sustantivo 1 +▁45 1 +En 1 +▁voto 1 +▁dónde 1 +▁entrar 1 +▁Delito 1 +ores 1 +sal 1 +▁Va 1 +▁normativa 1 +▁participan 1 +mé 1 +▁obra 1 +▁organizada 1 +▁Pu 1 +▁Nuestro 1 +▁institución 1 +▁ejercer 1 +▁crímenes 1 +tima 1 +▁entendimiento 1 +▁terceros 1 +▁capacidades 1 +▁aplicable 1 +▁organizado 1 +▁norma 1 +▁superar 1 +CP 1 +eros 1 +▁(1 1 +▁Hu 1 +AS 1 +▁Nigeria 1 +▁Serbia 1 +▁exposición 1 +ral 1 +▁reclamaciones 1 +tergubernamental 1 +zar 1 +▁demuestra 1 +▁francés 1 +gua 1 +▁comisiones 1 +hu 1 +97 1 +▁ambientales 1 +▁rechaza 1 +▁inferior 1 +▁intenta 1 +lega 1 +▁Liberia 1 +▁Ge 1 +▁Islas 1 +Es 1 +est 1 +▁cree 1 +▁transmit 1 +▁individual 1 +ay 1 +▁democrático 1 +▁limitado 1 +▁Muchas 1 +▁idioma 1 +tero 1 +▁Burundi 1 +▁destinados 1 +▁ciertas 1 +▁Jo 1 +▁dichas 1 +▁carrera 1 +▁pudiera 1 +▁subraya 1 +▁elegir 1 +RI 1 +pul 1 +▁áreas 1 +▁manos 1 +▁Estoy 1 +▁larga 1 +▁contratos 1 +▁comunitaria 1 +▁Ambiente 1 +▁operacionales 1 +▁llevado 1 +▁Puede 1 +ico 1 +▁desarrolla 1 +mica 1 +ff 1 +▁delincuencia 1 +▁ambiental 1 +▁podían 1 +▁foto 1 +ología 1 +▁habitantes 1 +▁Siria 1 +OM 1 +▁debidamente 1 +▁Decide 1 +▁Primer 1 +illo 1 +▁área 1 +▁democrática 1 +▁hacía 1 +▁em 1 +▁Coordinación 1 +ena 1 +bla 1 +▁equilibrio 1 +▁Bi 1 +▁aborda 1 +▁preocupaciones 1 +▁iba 1 +stro 1 +▁ratificación 1 +03 1 +ent 1 +▁colegas 1 +▁colectiva 1 +.1) 1 +▁similares 1 +▁Cámara 1 +▁solidaridad 1 +▁elaborado 1 +udi 1 +▁éstos 1 +▁inmediatamente 1 +zi 1 +▁fo 1 +▁Otros 1 +▁Pen 1 +▁mismas 1 +▁Ad 1 +▁árabe 1 +▁complejo 1 +▁Chile 1 +▁recuerda 1 +▁provoca 1 +▁48 1 +▁comida 1 +▁vemos 1 +▁activos 1 +▁quería 1 +▁aprender 1 +▁fija 1 +au 1 +/4 1 +/58/ 1 +▁consentimiento 1 +▁digital 1 +▁oradores 1 +▁puesta 1 +▁Cuadro 1 +▁quizá 1 +▁esenciales 1 +▁detallada 1 +▁ocasión 1 +fri 1 +▁índole 1 +▁alentar 1 +▁determinados 1 +AL 1 +▁quién 1 +▁comparación 1 +▁Chipre 1 +▁indicado 1 +▁generación 1 +▁Prevención 1 +▁reales 1 +▁sede 1 +▁aéreo 1 +cita 1 +▁continúa 1 +▁1993 1 +▁regula 1 +▁quieren 1 +45 1 +▁podido 1 +▁Be 1 +▁estados 1 +▁color 1 +ial 1 +▁bilaterales 1 +96 1 +▁CA 1 +▁col 1 +izados 1 +lé 1 +/55/ 1 +bro 1 +ín 1 +▁concretos 1 +▁producido 1 +▁proceda 1 +▁original 1 +form 1 +▁componente 1 +▁federal 1 +▁música 1 +tores 1 +▁mira 1 +▁t 1 +▁seres 1 +▁procesa 1 +▁sustancias 1 +▁diez 1 +▁beneficio 1 +▁viene 1 +▁cri 1 +serv 1 +▁32 1 +▁meta 1 +cional 1 +▁intelectual 1 +▁inmediato 1 +▁ocasiones 1 +cien 1 +▁presión 1 +▁posterior 1 +▁2005. 
1 +ler 1 +scripción 1 +mbi 1 +ord 1 +▁Debe 1 +▁For 1 +▁buenas 1 +▁exhorta 1 +▁solicitar 1 +▁externa 1 +▁componentes 1 +cciones 1 +ien 1 +duzca 1 +ation 1 +▁seguido 1 +▁34 1 +ndido 1 +mática 1 +▁Marruecos 1 +▁operación 1 +▁37 1 +▁tendencia 1 +▁vecinos 1 +▁Federal 1 +illa 1 +▁computadora 1 +▁definitiva 1 +fre 1 +▁protocolo 1 +▁54 1 +▁Personal 1 +▁demora 1 +uda 1 +▁Apoyo 1 +▁tampoco 1 +eg 1 +▁Per 1 +▁separado 1 +▁cáncer 1 +▁ilícita 1 +▁consumidores 1 +UR 1 +▁Ru 1 +▁esperar 1 +▁et 1 +▁anteriormente 1 +▁cara 1 +/3 1 +▁vía 1 +▁Costa 1 +▁acordado 1 +▁Ministros 1 +▁encaminadas 1 +ego 1 +▁partidos 1 +▁explicar 1 +ow 1 +▁prestando 1 +▁numerosas 1 +▁tuviera 1 +▁segura 1 +▁hogares 1 +▁unidad 1 +▁permita 1 +▁numerosos 1 +▁básica 1 +▁pas 1 +▁crítica 1 +/59/ 1 +iéndose 1 +pos 1 +▁Regional 1 +▁significativa 1 +▁administrativas 1 +dí 1 +▁Más 1 +ual 1 +▁halla 1 +liber 1 +mite 1 +▁Era 1 +▁edificio 1 +▁Darfur 1 +▁Cuestiones 1 +▁nombramiento 1 +▁sentir 1 +▁corto 1 +▁debían 1 +▁Les 1 +ndi 1 +▁salir 1 +NU 1 +▁bosques 1 +▁realizada 1 +▁despliegue 1 +▁Web 1 +35 1 +▁puerta 1 +▁empresarial 1 +▁encargada 1 +▁brinda 1 +▁Gaza 1 +▁ten 1 +▁reconstrucción 1 +▁Haití 1 +▁quinto 1 +▁Sistema 1 +▁estableció 1 +ji 1 +▁observadores 1 +▁Ko 1 +▁calcula 1 +▁terminar 1 +▁rural 1 +▁mortalidad 1 +▁límites 1 +▁Des 1 +▁Bretaña 1 +▁incluya 1 +▁película 1 +▁excepción 1 +▁Human 1 +▁secundaria 1 +▁próximos 1 +▁cámara 1 +▁dignidad 1 +/5 1 +ier 1 +rlos 1 +cido 1 +rd 1 +cept 1 +▁gas 1 +rte 1 +▁genera 1 +▁multilateral 1 +▁presentada 1 +▁Austria 1 +▁fund 1 +▁cabeza 1 +▁detenidos 1 +▁ordenación 1 +▁33 1 +▁recon 1 +arse 1 +▁auditoría 1 +▁representan 1 +▁avanzar 1 +Ivoire 1 +▁Discriminación 1 +▁hará 1 +▁entidad 1 +▁analizar 1 +▁destacar 1 +▁adecuadas 1 +▁Autoridad 1 +▁arma 1 +▁conexas 1 +▁identificar 1 +▁ii 1 +▁video 1 +▁afectan 1 +▁abs 1 +orient 1 +mó 1 +▁1992 1 +▁distancia 1 +lec 1 +▁LA 1 +▁justa 1 +▁paga 1 +▁entraña 1 +gna 1 +▁Operaciones 1 +▁determinado 1 +len 1 +▁guarda 1 +▁abuso 1 +▁Tenien 1 +▁Fu 1 +▁especifica 1 +▁televisión 1 +26 1 +▁tendrán 1 +▁comunicar 1 +▁ponga 1 +▁Dios 1 +▁documentación 1 +icia 1 +▁decidido 1 +▁elevado 1 +▁publica 1 +▁positivo 1 +▁utilizando 1 +▁alcanzado 1 +▁Asesor 1 +▁or 1 +▁36 1 +78 1 +▁circulación 1 +▁gu 1 +Re 1 +▁Ac 1 +eron 1 +tario 1 +▁habido 1 +▁provincia 1 +▁conjuntamente 1 +▁hacerse 1 +▁combustible 1 +▁superficie 1 +▁vehículos 1 +▁Políticos 1 +▁financiar 1 +23 1 +▁soberanía 1 +/60/ 1 +▁demostrar 1 +▁copia 1 +▁colabora 1 +▁específicos 1 +▁organiza 1 +esta 1 +▁mínima 1 +95 1 +▁Zelandia 1 +▁potencia 1 +▁aparece 1 +▁volumen 1 +▁continente 1 +▁registrado 1 +▁indicar 1 +▁Croacia 1 +▁privadas 1 +▁Ti 1 +▁reto 1 +▁capa 1 +lex 1 +▁IV 1 +/63/ 1 +▁estable 1 +▁escuchar 1 +▁situado 1 +▁Primera 1 +▁ejemplos 1 +▁modificar 1 +▁integrar 1 +▁usa 1 +▁Hasta 1 +84 1 +aron 1 +▁correo 1 +▁Côte 1 +▁Wi 1 +▁Oficiales 1 +▁Quinta 1 +▁comprende 1 +▁controlar 1 +▁39 1 +▁cualquiera 1 +▁calendario 1 +▁método 1 +▁destinadas 1 +66 1 +▁cultivo 1 +▁ocurre 1 +▁dura 1 +▁apertura 1 +▁Tengo 1 +▁promulga 1 +▁étnico 1 +/57/ 1 +sol 1 +rie 1 +▁exactamente 1 +21 1 +▁gobernanza 1 +▁juntos 1 +▁víctima 1 +▁similar 1 +▁aumentado 1 +▁salva 1 +▁territorial 1 +▁ocupado 1 +vin 1 +67 1 +nio 1 +lica 1 +gio 1 +jó 1 +▁Polonia 1 +▁naciones 1 +wi 1 +▁gana 1 +▁gen 1 +ON 1 +cra 1 +▁respeta 1 +▁exportación 1 +▁aprendizaje 1 +▁Sólo 1 +▁procede 1 +▁darle 1 +doras 1 +▁1991 1 +▁concreto 1 +▁indispensable 1 +▁precisa 1 +▁siquiera 1 +rup 1 +▁observar 1 +fra 1 +▁estatales 1 +▁Georgia 1 +▁cumbre 1 +▁Sí 1 +ticos 1 +▁pacientes 1 +▁conocido 1 +▁conversaciones 1 +▁legislativas 1 
+/56/ 1 +▁Pres 1 +▁encarga 1 +▁diputados 1 +▁Adjunto 1 +▁integral 1 +▁duradera 1 +▁créditos 1 +ID 1 +Con 1 +200 1 +▁células 1 +▁cometido 1 +▁probable 1 +▁Uganda 1 +tras 1 +▁sometido 1 +▁agrícola 1 +▁tradicional 1 +▁Segunda 1 +▁futuras 1 +▁ejecutar 1 +▁comprometido 1 +▁transparente 1 +▁Todas 1 +▁2004. 1 +▁corazón 1 +▁proporción 1 +▁encuesta 1 +▁prestado 1 +▁2006. 1 +▁institucionales 1 +▁limitar 1 +▁introducción 1 +▁permiso 1 +▁beneficia 1 +▁racional 1 +▁celebradas 1 +rri 1 +▁sexuales 1 +▁ciertos 1 +ot 1 +▁International 1 +dero 1 +▁posee 1 +▁foro 1 +▁penales 1 +lina 1 +76 1 +▁Grecia 1 +▁Nuestra 1 +▁desechos 1 +▁comité 1 +▁paquete 1 +▁Pri 1 +Z 1 +▁ga 1 +▁Perú 1 +▁mencionado 1 +▁factor 1 +▁adopten 1 +▁mas 1 +▁1, 1 +▁Policía 1 +▁Directiva 1 +▁selecciona 1 +▁comprensión 1 +▁Jefe 1 +int 1 +▁Congreso 1 +▁Corr 1 +▁presentadas 1 +▁introducir 1 +taria 1 +▁anuales 1 +69 1 +presión 1 +▁Quisiera 1 +▁pendientes 1 +▁permanentes 1 +▁detalles 1 +▁conservación 1 +▁líderes 1 +▁Fiscal 1 +▁adopte 1 +▁limita 1 +IN 1 +▁formula 1 +gas 1 +▁físico 1 +▁amor 1 +▁Cre 1 +▁oportuna 1 +▁prestaciones 1 +▁tratando 1 +▁realizadas 1 +▁bio 1 +▁presidencia 1 +▁popular 1 +▁cabal 1 +▁mejoras 1 +▁Debemos 1 +▁Etiopía 1 +▁lu 1 +▁Guatemala 1 +▁evaluaciones 1 +74 1 +▁igualmente 1 +▁desempeño 1 +▁deberíamos 1 +od 1 +▁observación 1 +▁educativo 1 +▁coherente 1 +▁pronuncia 1 +▁ofrecen 1 +▁negociación 1 +ático 1 +este 1 +▁previstas 1 +▁sabía 1 +▁Add 1 +TE 1 +▁junta 1 +/62/ 1 +▁Eliminación 1 +▁perspectivas 1 +24 1 +▁goza 1 +▁finalmente 1 +▁migración 1 +osa 1 +▁legales 1 +▁Algunas 1 +▁restaurante 1 +▁comentarios 1 +▁aquel 1 +▁permitirá 1 +▁daño 1 +▁mala 1 +▁pérdidas 1 +▁asumir 1 +▁Tu 1 +▁coordinar 1 +cua 1 +▁agradecimiento 1 +arios 1 +▁libros 1 +29 1 +19 1 +continuación 1 +▁toca 1 +▁múltiples 1 +▁obtenido 1 +▁utilizado 1 +▁' 1 +lidad 1 +media 1 +mini 1 +▁pacífica 1 +▁organismo 1 +gente 1 +▁Asistencia 1 +▁preventiva 1 +▁estará 1 +▁publicado 1 +▁proceder 1 +▁adecuados 1 +▁encargados 1 +/61/ 1 +▁determinadas 1 +▁arriba 1 +▁alguno 1 +▁prima 1 +▁disminución 1 +voc 1 +dro 1 +33 1 +▁hubo 1 +tiza 1 +51 1 +▁determina 1 +▁Portugal 1 +▁Muchos 1 +▁42 1 +▁mata 1 +7% 1 +▁publicar 1 +enda 1 +ris 1 +▁abogado 1 +▁retos 1 +▁muerto 1 +▁Del 1 +▁tendencias 1 +▁someter 1 +▁metros 1 +▁Can 1 +22 1 +iga 1 +ag 1 +▁49 1 +▁2009, 1 +▁resumen 1 +▁efectivos 1 +ire 1 +▁ejército 1 +▁erradicación 1 +▁voz 1 +mpli 1 +▁post 1 +▁mencionar 1 +▁limitaciones 1 +▁vital 1 +▁geográfica 1 +▁diseñado 1 +▁2003. 1 +▁capaz 1 +▁modalidades 1 +▁buenos 1 +▁Suplemento 1 +▁on 1 +▁herramientas 1 +▁abusos 1 +83 1 +▁procedentes 1 +▁2007. 
1 +▁aprobadas 1 +▁Territorio 1 +.8 1 +▁Roma 1 +▁encanta 1 +▁Sus 1 +▁asignación 1 +▁vivo 1 +mon 1 +▁afectadas 1 +▁Sostenible 1 +▁inmediata 1 +▁constituir 1 +▁Bar 1 +▁/ 1 +▁ge 1 +ust 1 +▁cargos 1 +▁comprender 1 +▁exportaciones 1 +▁israelí 1 +▁selección 1 +▁58 1 +▁eje 1 +07 1 +▁44 1 +▁estilo 1 +▁77 1 +▁pág 1 +▁influencia 1 +▁innovación 1 +tención 1 +▁desempleo 1 +lá 1 +mpa 1 +▁perjuicio 1 +▁Medidas 1 +▁adultos 1 +▁aplicaciones 1 +▁57 1 +▁* 1 +▁Argelia 1 +▁laboratorio 1 +rado 1 +▁relieve 1 +▁parecer 1 +▁Finlandia 1 +▁máxima 1 +artículo 1 +▁negocios 1 +▁Fo 1 +curri 1 +▁reglamenta 1 +▁é 1 +▁siente 1 +80 1 +▁evento 1 +▁particulares 1 +▁contingentes 1 +▁dé 1 +▁aumenta 1 +▁indican 1 +▁justifica 1 +del 1 +▁euro 1 +eña 1 +▁espero 1 +▁difíciles 1 +▁mente 1 +mbo 1 +▁criterio 1 +▁alumnos 1 +▁disponibilidad 1 +▁llegó 1 +nica 1 +▁Val 1 +▁reclamación 1 +▁metas 1 +fica 1 +▁Sta 1 +bia 1 +▁efectivamente 1 +▁bases 1 +▁promedio 1 +▁cadena 1 +79 1 +▁bancos 1 +▁Dirección 1 +▁inmigración 1 +▁disciplina 1 +iz 1 +▁visitas 1 +▁Resolución 1 +▁suministra 1 +▁anuncia 1 +▁investigar 1 +▁Checa 1 +▁periódicos 1 +▁fre 1 +▁debida 1 +▁Fuerza 1 +▁considerarse 1 +▁declara 1 +▁55 1 +▁disfrutar 1 +▁SIDA 1 +fin 1 +cómo 1 +▁coste 1 +▁salario 1 +▁administrativos 1 +▁géneros 1 +▁suficientemente 1 +rig 1 +▁desempeña 1 +▁merece 1 +.30 1 +▁verdadera 1 +▁informado 1 +▁individuales 1 +▁carretera 1 +▁incremento 1 +▁Mientras 1 +▁revela 1 +▁41 1 +▁reconciliación 1 +▁70 1 +▁vínculos 1 +▁salida 1 +6% 1 +tá 1 +▁n 1 +▁80 1 +▁Bélgica 1 +▁examinando 1 +▁puso 1 +▁plataforma 1 +▁respectivamente 1 +uta 1 +plo 1 +mático 1 +▁Belarús 1 +como 1 +▁Relatora 1 +▁aplicables 1 +▁época 1 +pec 1 +TA 1 +▁Ibíd 1 +▁trate 1 +▁Vicepresidente 1 +▁mapa 1 +▁fiscales 1 +bol 1 +▁contempla 1 +plica 1 +▁cam 1 +▁pudieran 1 +produc 1 +▁firmado 1 +▁testigos 1 +República 1 +▁Interior 1 +ibilidad 1 +▁satisfacer 1 +cord 1 +▁cometidos 1 +▁letra 1 +70 1 +▁concesión 1 +▁permitan 1 +75 1 +▁aporta 1 +side 1 +▁adquisición 1 +hor 1 +▁ideal 1 +ster 1 +▁señalado 1 +▁miedo 1 +▁alimentación 1 +▁incorporación 1 +ibles 1 +▁variedad 1 +▁mantiene 1 +▁transmitir 1 +▁tránsito 1 +▁Tra 1 +▁revisar 1 +▁vistas 1 +▁medicamentos 1 +zó 1 +▁suelo 1 +▁mercancías 1 +▁ojos 1 +▁magistrados 1 +▁finalidad 1 +▁rea 1 +▁extrema 1 +▁funcionario 1 +rm 1 +2% 1 +▁imposible 1 +▁intervenciones 1 +▁dirigidas 1 +▁seguros 1 +▁permiten 1 +▁lengua 1 +▁insuficiente 1 +▁90 1 +▁inicio 1 +br 1 +▁2001. 1 +▁independientes 1 +▁PNUMA 1 +▁vulnerabilidad 1 +▁joven 1 +icos 1 +ora 1 +-1 1 +ant 1 +▁Cha 1 +▁lejos 1 +▁2002. 1 +▁asignado 1 +▁especies 1 +▁disfrute 1 +▁llevó 1 +▁Tercera 1 +▁proporcione 1 +▁aguas 1 +▁2000. 1 +1, 1 +▁rein 1 +▁Ab 1 +▁básico 1 +▁presentados 1 +44 1 +▁pagos 1 +▁argumento 1 +▁complace 1 +▁fui 1 +▁inspira 1 +▁señal 1 +▁existente 1 +▁blanco 1 +▁agrícolas 1 +▁sigan 1 +edi 1 +tando 1 +▁sumamente 1 +▁Esos 1 +Rev 1 +RE 1 +vel 1 +filia 1 +27 1 +▁satélite 1 +▁lectura 1 +▁tomando 1 +▁adapta 1 +▁2008. 
1 +▁correcciones 1 +▁frecuente 1 +▁51 1 +73 1 +▁votar 1 +▁www 1 +▁Antes 1 +▁proveedores 1 +▁viajes 1 +▁53 1 +▁acceder 1 +02 1 +▁establezca 1 +▁CE 1 +▁definir 1 +▁prevista 1 +▁puedes 1 +▁dirigir 1 +▁malos 1 +gro 1 +▁ejecuta 1 +tamos 1 +▁46 1 +▁crucial 1 +▁histórico 1 +▁atrás 1 +▁dicen 1 +68 1 +▁experimenta 1 +38 1 +stre 1 +▁Tailandia 1 +▁cocina 1 +▁penas 1 +ting 1 +▁sirve 1 +▁Sol 1 +▁Rica 1 +▁historias 1 +lización 1 +eño 1 +▁limitada 1 +▁dedicado 1 +integr 1 +▁momentos 1 +▁vaya 1 +▁cierta 1 +▁pretende 1 +▁Árabes 1 +SP 1 +▁Malasia 1 +▁habitual 1 +▁aplicado 1 +▁cientos 1 +▁avance 1 +▁Recursos 1 +▁respectivos 1 +▁moneda 1 +▁biológica 1 +31 1 +▁nacimiento 1 +▁químicos 1 +▁(2001) 1 +▁sexto 1 +▁respond 1 +▁entiende 1 +▁soldados 1 +▁Alta 1 +▁espacial 1 +▁apropiado 1 +▁Filipinas 1 +ED 1 +▁espaciales 1 +▁intentar 1 +▁47 1 +▁Ke 1 +▁unidades 1 +3% 1 +▁ulterior 1 +▁consejo 1 +▁cha 1 +▁modelos 1 +▁adoptada 1 +▁consta 1 +▁humanitarias 1 +▁tarjeta 1 +▁pienso 1 +▁Real 1 +▁escucha 1 +81 1 +vu 1 +▁Túnez 1 +▁salvo 1 +▁Dinamarca 1 +▁reintegración 1 +▁software 1 +▁ejecutivo 1 +▁audiencia 1 +▁solicitado 1 +▁guía 1 +▁asegura 1 +▁frontera 1 +▁examine 1 +▁raíz 1 +▁mediano 1 +▁reunir 1 +▁marino 1 +▁verdadero 1 +▁formulada 1 +▁establecidas 1 +▁CO 1 +▁Miembro 1 +▁quizás 1 +▁Ciencia 1 +▁38 1 +leta 1 +▁Refugiados 1 +▁permitido 1 +▁Red 1 +▁imponer 1 +▁tus 1 +cel 1 +▁aprecia 1 +▁convenios 1 +vol 1 +34 1 +▁aeropuerto 1 +▁antigua 1 +▁Acoge 1 +▁pat 1 +▁seminarios 1 +▁seminario 1 +▁presupuestario 1 +▁enlace 1 +▁supervisar 1 +▁ataque 1 +ería 1 +▁involucra 1 +▁párrs 1 +▁200 1 +▁reciben 1 +▁objetos 1 +▁Santa 1 +4% 1 +05 1 +▁puerto 1 +▁Myanmar 1 +parte 1 +▁pasó 1 +▁56 1 +▁considerado 1 +▁entrevista 1 +▁gratuita 1 +▁raza 1 +cé 1 +bili 1 +puesto 1 +▁incorpora 1 +▁izquierda 1 +DI 1 +▁apropiadas 1 +▁acumula 1 +▁hospital 1 +▁52 1 +▁utilizan 1 +▁comenzó 1 +▁equitativa 1 +▁referente 1 +▁capaces 1 +▁notable 1 +rías 1 +▁Bangladesh 1 +▁compleja 1 +▁desafío 1 +▁compañía 1 +posición 1 +▁torno 1 +Á 1 +8% 1 +▁impuestos 1 +▁altos 1 +▁cooperar 1 +▁cifra 1 +▁presentan 1 +▁Potencia 1 +▁éstas 1 +3) 1 +28 1 +48 1 +▁leer 1 +▁1998, 1 +▁desarrollado 1 +▁deriva 1 +▁efectuar 1 +ografía 1 +▁electoral 1 +37 1 +▁indicó 1 +▁convertido 1 +▁generar 1 +damente 1 +▁positivos 1 +▁prostitución 1 +nico 1 +▁Supervisión 1 +forma 1 +▁nueve 1 +ológica 1 +▁manifiesto 1 +▁practica 1 +▁emplea 1 +▁43 1 +fir 1 +▁gama 1 +▁Observa 1 +▁Actualmente 1 +▁impulso 1 +▁superiores 1 +▁Ob 1 +SA 1 +▁requisito 1 +61 1 +87 1 +▁revisado 1 +▁árabes 1 +▁Du 1 +▁km 1 +▁62 1 +ñe 1 +▁estrechamente 1 +▁coherencia 1 +▁ONUDI 1 +▁posteriormente 1 +▁camina 1 +SE 1 +▁voluntarias 1 +▁personales 1 +▁medicina 1 +▁juez 1 +71 1 +▁go 1 +▁activo 1 +▁incluyendo 1 +▁Mantenimiento 1 +▁Rumania 1 +▁Civil 1 +jar 1 +▁micro 1 +▁voluntaria 1 +▁convertir 1 +▁2, 1 +▁encontramos 1 +dé 1 +ima 1 +iste 1 +▁iniciado 1 +▁Ecuador 1 +▁Cabe 1 +4) 1 +guard 1 +▁negocia 1 +▁índice 1 +▁profundo 1 +▁cumplido 1 +▁moderna 1 +▁Aprobación 1 +90 1 +tales 1 +▁ruta 1 +▁contribuyen 1 +▁europeas 1 +▁desplazamiento 1 +▁concentra 1 +▁recibió 1 +▁Azerbaiyán 1 +▁departamentos 1 +▁alimenta 1 +▁Sha 1 +Ha 1 +▁creó 1 +RA 1 +▁autorización 1 +▁condenado 1 +▁concreta 1 +▁mejoramiento 1 +▁noticias 1 +▁rendimiento 1 +ard 1 +▁abrir 1 +▁Bulgaria 1 +▁prolonga 1 +▁decide 1 +▁Promoción 1 +CA 1 +▁promesa 1 +▁distribuir 1 +gia 1 +▁Droga 1 +▁OMC 1 +tener 1 +▁habitación 1 +▁Exhorta 1 +▁meridional 1 +▁Angola 1 +1) 1 +▁estudia 1 +▁liderazgo 1 +▁sensibilización 1 +▁Iniciativa 1 +▁Sobre 1 +▁básicas 1 +Hábitat 1 +lig 1 +▁Venezuela 1 +▁líneas 1 +▁asesinato 1 +▁sueño 1 
+▁vio 1 +94 1 +▁votado 1 +▁compromete 1 +▁Ver 1 +▁significativo 1 +ños 1 +▁cer 1 +▁acepta 1 +▁ingreso 1 +77 1 +▁dolor 1 +PE 1 +▁ligeras 1 +▁Alianza 1 +▁París 1 +▁realice 1 +ador 1 +▁pl 1 +ph 1 +▁sitios 1 +▁metodología 1 +▁urgencia 1 +▁Kenya 1 +pli 1 +▁ocupan 1 +▁Introducción 1 +▁estuvo 1 +ah 1 +▁ratificado 1 +▁administra 1 +▁expone 1 +▁usando 1 +OR 1 +▁Cultura 1 +▁totalidad 1 +▁financia 1 +▁controles 1 +▁reducido 1 +▁suerte 1 +▁residentes 1 +▁sonido 1 +▁explota 1 +▁decía 1 +▁agricultores 1 +▁campañas 1 +sia 1 +▁Civiles 1 +32 1 +▁juzga 1 +▁aun 1 +mov 1 +▁redacción 1 +▁Supremo 1 +▁podríamos 1 +▁Sal 1 +▁planos 1 +▁adaptación 1 +stitución 1 +/2000/ 1 +▁coloca 1 +▁playa 1 +▁aldea 1 +▁Reconociendo 1 +▁edificios 1 +:// 1 +▁ultraterrestre 1 +▁negro 1 +ular 1 +▁tienda 1 +04 1 +▁jueces 1 +▁Expresa 1 +▁globalización 1 +AM 1 +▁Lanka 1 +1% 1 +▁vuelo 1 +▁confi 1 +▁reglamentos 1 +terinstitucional 1 +▁verificación 1 +▁competencias 1 +▁hambre 1 +CO 1 +▁podamos 1 +▁Financiación 1 +misión 1 +visión 1 +▁armadas 1 +▁denomina 1 +▁administrativo 1 +89 1 +▁llena 1 +▁incrementar 1 +▁aprobados 1 +▁hice 1 +▁realizan 1 +▁Lisboa 1 +▁garantice 1 +▁pertenece 1 +▁partido 1 +ity 1 +▁prioritaria 1 +▁Media 1 +chos 1 +fí 1 +ábamos 1 +▁fundamento 1 +09 1 +public 1 +▁fabricación 1 +▁destinado 1 +▁armado 1 +▁moderno 1 +▁terrestre 1 +▁j 1 +dura 1 +▁logra 1 +▁Hungría 1 +▁Pueblo 1 +reg 1 +eti 1 +▁baño 1 +▁Montenegro 1 +▁continúe 1 +▁excepcional 1 +▁CP 1 +▁jefes 1 +▁Saudita 1 +▁Arabia 1 +▁peso 1 +▁robot 1 +▁administrativa 1 +illas 1 +cionales 1 +▁planta 1 +parti 1 +▁velocidad 1 +▁represión 1 +tino 1 +▁define 1 +ug 1 +▁hermano 1 +cado 1 +▁respalda 1 +▁sanitaria 1 +▁Nepal 1 +▁obligatoria 1 +▁registrada 1 +▁religiosas 1 +▁desastre 1 +▁inteligente 1 +ted 1 +▁estimaciones 1 +▁(2004) 1 +▁impunidad 1 +▁Estudi 1 +igu 1 +▁Fe 1 +bur 1 +part 1 +▁conexión 1 +▁abajo 1 +▁aceptación 1 +▁gasto 1 +▁regla 1 +▁firmemente 1 +▁flexibilidad 1 +Sur 1 +▁contaminación 1 +ty 1 +▁ajusta 1 +▁requieren 1 +americano 1 +▁ocurrido 1 +▁individuos 1 +▁corta 1 +▁Podemos 1 +▁oposición 1 +▁obras 1 +eta 1 +▁inicia 1 +▁estándar 1 +▁Sri 1 +▁notificación 1 +▁llevan 1 +▁fallo 1 +▁vive 1 +▁previa 1 +▁asentamientos 1 +▁derecha 1 +▁corrientes 1 +▁límite 1 +▁coopera 1 +▁revisada 1 +▁coche 1 +▁infra 1 +▁IN 1 +▁plantas 1 +▁Movimiento 1 +▁Luego 1 +Leste 1 +▁reforz 1 +▁nación 1 +▁carbono 1 +▁subprograma 1 +▁VI 1 +▁contribuido 1 +/2001/ 1 +Ma 1 +▁consolidar 1 +▁insulares 1 +▁Cor 1 +▁desee 1 +▁conecta 1 +▁negativa 1 +▁ausencia 1 +▁perder 1 +▁nuevamente 1 +93 1 +▁nombres 1 +▁disponer 1 +▁conexos 1 +up 1 +▁Or 1 +▁formato 1 +▁Ber 1 +▁Ci 1 +ique 1 +53 1 +▁pudo 1 +gada 1 +▁artista 1 +▁profundamente 1 +ético 1 +▁poli 1 +▁comités 1 +55 1 +▁adopta 1 +▁is 1 +▁correcta 1 +▁exclusión 1 +▁inteligencia 1 +▁convencido 1 +▁interpreta 1 +íamos 1 +▁llamar 1 +VI 1 +▁primordial 1 +▁felicitar 1 +▁consecución 1 +▁diciendo 1 +▁usuario 1 +▁Eritrea 1 +▁comprar 1 +▁caja 1 +▁intolerancia 1 +▁ilegales 1 +▁recoge 1 +PA 1 +7) 1 +▁procura 1 +▁ciento 1 +Г 1 +▁manual 1 +▁privados 1 +rez 1 +▁Defensa 1 +▁viviendas 1 +▁cumplan 1 +AT 1 +▁equivalente 1 +end 1 +36 1 +▁Destaca 1 +▁futura 1 +▁pese 1 +▁votos 1 +ze 1 +ño 1 +▁específica 1 +vas 1 +▁pensamos 1 +▁movimientos 1 +▁menciona 1 +▁transmisión 1 +▁estrellas 1 +▁suya 1 +▁ventajas 1 +▁& 1 +cap 1 +▁pensamiento 1 +▁15.00 1 +▁tuvieron 1 +▁cláusula 1 +▁convención 1 +▁record 1 +▁PMA 1 +▁textos 1 +▁Decenio 1 +▁inmigrantes 1 +her 1 +▁consciente 1 +tic 1 +▁funcionar 1 +▁chino 1 +▁compañías 1 +▁instrucciones 1 +▁mantenga 1 +▁vuelve 1 +www 1 +▁recurrir 1 +85 1 +▁10.00 1 +caso 1 
+ub 1 +92 1 +▁ritmo 1 +▁anima 1 +▁Otra 1 +▁duplica 1 +▁error 1 +▁expresado 1 +▁Año 1 +▁serio 1 +ang 1 +▁asistir 1 +▁experimento 1 +▁palestina 1 +gh 1 +▁normativo 1 +▁Jordania 1 +▁retraso 1 +bli 1 +▁seguía 1 +q 1 +▁aplicando 1 +47 1 +▁identificación 1 +zos 1 +▁turismo 1 +▁orgánico 1 +▁sufrimiento 1 +▁integrante 1 +▁visual 1 +▁aportar 1 +▁analiza 1 +▁Común 1 +nacional 1 +▁creen 1 +▁Jefes 1 +▁Eslovenia 1 +▁61 1 +▁competente 1 +▁reconocido 1 +▁relativamente 1 +▁acelerar 1 +▁habida 1 +▁abogados 1 +/2004/ 1 +▁realizando 1 +▁consultar 1 +▁59 1 +▁colectivo 1 +▁Estadística 1 +▁transacciones 1 +65 1 +▁relacionada 1 +▁produce 1 +▁causado 1 +▁dando 1 +▁mirar 1 +AP 1 +SR 1 +▁vinculante 1 +▁vino 1 +▁garantizado 1 +▁formulado 1 +88 1 +▁diario 1 +ví 1 +▁utilizados 1 +activa 1 +▁sacar 1 +▁presentara 1 +▁exista 1 +▁UNFPA 1 +▁pantalla 1 +▁añadir 1 +▁val 1 +▁consideran 1 +▁oficio 1 +▁castigo 1 +▁defender 1 +▁digo 1 +▁esperamos 1 +▁Otro 1 +▁Actividades 1 +▁Procedimiento 1 +americana 1 +▁eficazmente 1 +▁fabrica 1 +43 1 +▁consideró 1 +▁patrocinadores 1 +▁Gar 1 +▁parcial 1 +▁temprana 1 +▁nacido 1 +▁sujeta 1 +▁estadounidenses 1 +▁detenido 1 +▁OIT 1 +▁autorizado 1 +▁Han 1 +72 1 +▁discu 1 +▁cuestionario 1 +imi 1 +▁simplifica 1 +imo 1 +▁actuación 1 +▁acusado 1 +▁Jerusalén 1 +▁Población 1 +▁Espero 1 +▁vacuna 1 +izó 1 +▁fotos 1 +▁mencionados 1 +▁impone 1 +▁carece 1 +In 1 +▁exigir 1 +▁específicamente 1 +/2002/ 1 +▁cuál 1 +ring 1 +▁pase 1 +63 1 +▁reconocida 1 +mentar 1 +ü 1 +▁sostenibilidad 1 +▁previo 1 +▁imparti 1 +39 1 +und 1 +ciente 1 +▁manifesta 1 +▁conveniente 1 +-2 1 +iano 1 +▁hoja 1 +▁preocupado 1 +▁exclusivamente 1 +▁certificado 1 +▁absoluto 1 +tización 1 +▁New 1 +▁Dado 1 +ológico 1 +▁toneladas 1 +98 1 +▁utilice 1 +▁dijeron 1 +IP 1 +▁Macedonia 1 +▁Párrafo 1 +▁Equipo 1 +▁vuelta 1 +▁declaró 1 +pp 1 +tec 1 +▁nu 1 +▁verde 1 +ST 1 +▁describe 1 +▁formuló 1 +▁priva 1 +muni 1 +▁extradición 1 +▁(2 1 +clar 1 +▁cliente 1 +▁lleve 1 +▁bal 1 +ker 1 +▁insolvencia 1 +▁pasando 1 +▁aplican 1 +▁Infancia 1 +▁cobertura 1 +▁St 1 +▁deporte 1 +Q 1 +▁precisamente 1 +ley 1 +▁regreso 1 +mm 1 +-3 1 +▁quedan 1 +ben 1 +▁prohíbe 1 +▁reparación 1 +▁castiga 1 +▁río 1 +▁elevada 1 +▁terminado 1 +▁debajo 1 +/2003/ 1 +siderablemente 1 +▁aceptable 1 +▁comunicado 1 +human 1 +▁motor 1 +▁memoria 1 +line 1 +▁Libia 1 +▁encima 1 +54 1 +▁fueran 1 +▁adelanto 1 +▁creemos 1 +▁Pueden 1 +▁básicamente 1 +▁cantidades 1 +▁convenido 1 +▁subregionales 1 +fina 1 +▁criminal 1 +▁sustancial 1 +▁saldo 1 +9% 1 +▁séptimo 1 +▁teoría 1 +▁domina 1 +▁Universal 1 +▁incumplimiento 1 +▁medioambiental 1 +▁redonda 1 +41 1 +▁pul 1 +imiento 1 +▁ocupados 1 +▁Terrorismo 1 +ética 1 +Pro 1 +▁intensificar 1 +▁pocas 1 +▁complementaria 1 +▁formar 1 +86 1 +▁piensa 1 +▁Camboya 1 +▁Senegal 1 +▁genocidio 1 +▁sufrido 1 +▁montaña 1 +▁convertirse 1 +cil 1 +▁peor 1 +▁parecen 1 +59 1 +▁aportan 1 +▁aprend 1 +▁cama 1 +▁energética 1 +▁Reafirmando 1 +▁ganar 1 +▁diplomática 1 +▁participado 1 +cid 1 +▁publicaciones 1 +▁Investigación 1 +▁dimensión 1 +▁vale 1 +▁herramienta 1 +▁establecida 1 +dec 1 +▁estuviera 1 +aña 1 +▁Administrativos 1 +MIN 1 +▁encuentre 1 +▁fotografía 1 +▁autora 1 +▁periódicamente 1 +▁lenguaje 1 +celera 1 +▁enfoques 1 +▁empieza 1 +▁Tortura 1 +▁rendición 1 +▁separación 1 +▁TED 1 +▁vías 1 +▁500 1 +▁actitud 1 +ult 1 +▁Racial 1 +▁rodea 1 +▁2) 1 +▁visitar 1 +46 1 +▁máquina 1 +▁etapas 1 +▁asesor 1 +▁apruebe 1 +▁estaría 1 +▁modificación 1 +▁operacional 1 +Firmado 1 +▁gubernamental 1 +▁reclamante 1 +mico 1 +▁formal 1 +▁agrega 1 +▁reproductiva 1 +▁contactos 1 +▁alternativas 1 +▁perdido 1 +cieron 1 
+pondrá 1 +8) 1 +▁impuesto 1 +tividad 1 +▁programación 1 +tiende 1 +▁defensores 1 +ell 1 +▁desempeñan 1 +▁proporcionado 1 +▁Fundación 1 +▁gradual 1 +▁Beijing 1 +▁Lituania 1 +64 1 +▁Otras 1 +cent 1 +▁conseguido 1 +ef 1 +▁iii 1 +▁Queda 1 +▁actuaciones 1 +▁campos 1 +2/ 1 +tema 1 +bel 1 +▁jefe 1 +▁interesa 1 +▁captura 1 +▁brindar 1 +▁XXI 1 +▁cuentan 1 +▁We 1 +oro 1 +▁década 1 +▁elegido 1 +▁futuros 1 +▁cartas 1 +▁presentará 1 +▁documenta 1 +▁presupuestarias 1 +▁estatal 1 +▁extraordinaria 1 +▁inquietud 1 +▁Invita 1 +▁considere 1 +▁barrio 1 +▁clases 1 +▁movilización 1 +▁hiciera 1 +/2005/ 1 +▁ignora 1 +▁enuncia 1 +▁viable 1 +▁bebé 1 += 1 +▁correcto 1 +▁vigésimo 1 +▁renovable 1 +▁participen 1 +42 1 +▁deficiencias 1 +▁convenciones 1 +▁compatible 1 +▁páginas 1 +▁Familia 1 +Cuál 1 +▁Viet 1 +▁ampliamente 1 +▁esforz 1 +– 1 +▁corresponda 1 +▁erradicar 1 +▁exámenes 1 +▁legítima 1 +▁obliga 1 +tur 1 +▁renta 1 +vid 1 +▁histórica 1 +IV 1 +▁preparativos 1 +▁negocio 1 +ece 1 +▁productores 1 +▁absolutamente 1 +▁Incluso 1 +▁quedar 1 +-19 1 +▁aclara 1 +bul 1 +▁haberse 1 +▁ruso 1 +gal 1 +▁: 1 +▁intensifica 1 +CT 1 +▁bajos 1 +56 1 +▁flexible 1 +▁muestran 1 +▁arbitraria 1 +▁Usted 1 +dra 1 +▁Sírvanse 1 +pusieron 1 +▁Acta 1 +▁caracteriza 1 +58 1 +▁agrava 1 +91 1 +▁300 1 +▁patrimonio 1 +▁enfrentar 1 +ear 1 +▁laborales 1 +▁Él 1 +COM 1 +/2006/ 1 +▁detalle 1 +▁adoptó 1 +▁agresión 1 +tuvieron 1 +▁somet 1 +▁manifestaciones 1 +▁Reafirma 1 +▁siguió 1 +▁Chi 1 +▁hubieran 1 +▁2009. 1 +▁director 1 +pan 1 +▁interacción 1 +ux 1 +▁amigo 1 +▁archivos 1 +▁frase 1 +▁creer 1 +▁Capítulo 1 +▁escribir 1 +▁subsidio 1 +▁excesiva 1 +▁detener 1 +▁Armenia 1 +▁Ghana 1 +▁acogida 1 +▁regímenes 1 +▁1) 1 +▁errores 1 +▁monto 1 +II 1 +▁prioritario 1 +▁juegos 1 +▁preguntar 1 +▁sustantiva 1 +ifi 1 +▁tú 1 +tán 1 +▁kilómetros 1 +▁convierte 1 +▁conceder 1 +▁diga 1 +▁Sch 1 +▁designado 1 +mal 1 +▁comparte 1 +▁modificaciones 1 +▁preferencia 1 +▁cuarta 1 +▁ataca 1 +▁basadas 1 +gráfica 1 +▁Bolivia 1 +▁impide 1 +▁Documento 1 +▁Comisionada 1 +▁alternativo 1 +cepción 1 +▁discursos 1 +▁Energía 1 +▁adquirida 1 +▁apliquen 1 +ístico 1 +han 1 +▁urbano 1 +▁empleado 1 +-4 1 +▁saneamiento 1 +▁OIEA 1 +▁armonización 1 +▁ido 1 +▁adquirir 1 +▁sencilla 1 +▁llegue 1 +▁recibe 1 +▁Verde 1 +▁posteriores 1 +▁acredita 1 +▁competitividad 1 +lio 1 +▁sólida 1 +▁Agricultura 1 +▁alerta 1 +▁hicimos 1 +▁Chad 1 +osas 1 +▁inscripción 1 +bio 1 +rimi 1 +▁Superior 1 +▁últimas 1 +▁pensé 1 +▁Salvador 1 +▁altura 1 +▁humanitarios 1 +▁realizó 1 +▁materna 1 +▁plantear 1 +▁ll 1 +▁aumentando 1 +▁reciba 1 +▁tienes 1 +▁difundir 1 +▁pasos 1 +▁existía 1 +reci 1 +▁margen 1 +▁convencionales 1 +▁invitación 1 +▁tolerancia 1 +ólogo 1 +▁pan 1 +▁Caja 1 +▁comienza 1 +▁facilite 1 +▁tecnológica 1 +▁municiones 1 +▁libres 1 +▁ACNUDH 1 +▁continuo 1 +▁periódica 1 +▁anuncio 1 +▁America 1 +▁octavo 1 +ak 1 +06 1 +▁ganado 1 +▁Qui 1 +▁Reconoce 1 +▁Solo 1 +▁Esperamos 1 +▁peces 1 +▁opone 1 +▁Interna 1 +▁enjuiciamiento 1 +▁colaborar 1 +▁flor 1 +5.000 1 +▁Uruguay 1 +▁sanitario 1 +▁concretamente 1 +▁tecnológico 1 +▁corriente 1 +▁descarga 1 +▁avanzado 1 +▁tardar 1 +▁avión 1 +▁expuesto 1 +▁posiciones 1 +▁reside 1 +▁Alienta 1 +▁concluido 1 +▁informativa 1 +▁vigente 1 +▁dinámica 1 +▁riqueza 1 +▁instalación 1 +▁préstamos 1 +▁mencionadas 1 +▁adecuadamente 1 +▁transnacional 1 +▁prácticamente 1 +▁déficit 1 +▁enormes 1 +versión 1 +▁aclarar 1 +▁ciudadano 1 +shi 1 +▁antiguo 1 +log 1 +contra 1 +▁azul 1 +▁mon 1 +▁observador 1 +ponga 1 +▁doméstica 1 +▁pertenecientes 1 +▁secreto 1 +▁liberalización 1 +▁Orden 1 +▁culturas 1 +▁Dr 1 +▁Objetivos 1 
+net 1 +▁descubrir 1 +▁Exp 1 +▁tro 1 +mil 1 +▁temático 1 +▁Pas 1 +lico 1 +▁atentado 1 +uro 1 +49 1 +▁manifiesta 1 +▁revista 1 +▁millón 1 +volución 1 +arias 1 +▁culpable 1 +glo 1 +tem 1 +▁conserva 1 +▁Doha 1 +▁dia 1 +▁tele 1 +▁fácilmente 1 +▁autonomía 1 +▁movilidad 1 +▁aplicarse 1 +▁enumera 1 +▁ministerios 1 +▁Moldova 1 +▁solucionar 1 +▁Día 1 +▁sorprend 1 +▁quedado 1 +mple 1 +licit 1 +▁buques 1 +▁plazos 1 +▁tome 1 +fort 1 +▁http 1 +’ 1 +▁propuso 1 +▁arreglos 1 +▁solicitantes 1 +▁Q 1 +▁finalizar 1 +▁alojamiento 1 +▁electricidad 1 +▁apenas 1 +▁jugar 1 +▁prestan 1 +▁asignar 1 +▁basados 1 +▁subrayar 1 +▁uniforme 1 +IM 1 +▁comunica 1 +▁presentaron 1 +nova 1 +▁patente 1 +▁imaginar 1 +▁planteamiento 1 +▁quiera 1 +▁normalmente 1 +▁estadounidense 1 +RO 1 +▁lamentable 1 +▁che 1 +▁enviado 1 +ani 1 +▁ponen 1 +▁art 1 +▁dispuesta 1 +▁maestros 1 +▁FAO 1 +▁prácticos 1 +▁controversias 1 +▁UNESCO 1 +▁responde 1 +▁resta 1 +▁suelen 1 +▁50% 1 +▁científico 1 +▁creados 1 +▁legisla 1 +▁complejidad 1 +▁expulsión 1 +▁gasta 1 +▁apropiada 1 +comp 1 +▁avanza 1 +▁Londres 1 +▁casas 1 +82 1 +▁hija 1 +▁alega 1 +▁Eslovaquia 1 +▁recomendar 1 +▁Debería 1 +▁cálculo 1 +▁movilizar 1 +▁charla 1 +▁corte 1 +▁vigilar 1 +62 1 +▁don 1 +▁Tom 1 +▁niña 1 +▁ministerial 1 +▁pensando 1 +▁envío 1 +stitui 1 +▁retirada 1 +▁regresar 1 +▁amplias 1 +▁2006-2007 1 +▁(1999) 1 +▁Constitucional 1 +dina 1 +▁Euro 1 +▁Tecnología 1 +▁adaptar 1 +▁rige 1 +▁recuperar 1 +NA 1 +▁vigentes 1 +▁Tanzanía 1 +▁encontrado 1 +▁Ban 1 +▁madres 1 +▁Coordinador 1 +▁pareja 1 +▁sorprendente 1 +▁imparcial 1 +▁específico 1 +▁Kazajstán 1 +▁xenofobia 1 +▁pensiones 1 +dy 1 +▁declarado 1 +▁bilateral 1 +▁distinción 1 +▁utilidad 1 +US 1 +cí 1 +mun 1 +▁dirige 1 +08 1 +▁Integra 1 +▁creencias 1 +▁agrado 1 +col 1 +▁enfrentan 1 +Puedo 1 +▁verano 1 +▁urbanas 1 +▁oído 1 +▁dispuestos 1 +▁Culturales 1 +▁Nam 1 +▁registra 1 +▁solar 1 +▁cumple 1 +▁Observando 1 +▁defini 1 +ction 1 +▁obstaculiza 1 +▁socorro 1 +▁Principios 1 +▁ilícitos 1 +▁práctico 1 +▁aniversario 1 +▁receptor 1 +▁escenario 1 +gri 1 +vesti 1 +adores 1 +▁listas 1 +mia 1 +▁abandona 1 +▁Comisaria 1 +▁biblioteca 1 +▁diaria 1 +▁Segundo 1 +▁mezcla 1 +▁cuadr 1 +▁invitó 1 +▁mantenido 1 +▁extranjera 1 +▁Malta 1 +ight 1 +▁foros 1 +▁combinación 1 +▁municipio 1 +▁Resulta 1 +▁armamentos 1 +▁virus 1 +▁expresaron 1 +▁explicación 1 +▁alternativa 1 +▁etiqueta 1 +▁ocupar 1 +▁adquisiciones 1 +▁discurso 1 +▁almacenamiento 1 +▁venido 1 +▁Letonia 1 +▁Albania 1 +▁quieres 1 +▁recepción 1 +▁Partido 1 +▁legítimo 1 +▁aviones 1 +▁fijado 1 +▁concluye 1 +▁John 1 +▁exigencias 1 +▁considerables 1 +ham 1 +▁registros 1 +▁verdaderamente 1 +▁inició 1 +▁odio 1 +tual 1 +▁étnicas 1 +▁apartamento 1 +▁universo 1 +▁libremente 1 +ET 1 +▁barrera 1 +▁rela 1 +▁consonancia 1 +▁crítico 1 +▁financiado 1 +▁(2000) 1 +▁Camerún 1 +▁Nairobi 1 +▁propietario 1 +▁tuve 1 +▁salvaguardias 1 +▁postura 1 +▁disco 1 +▁ubicación 1 +52 1 +lip 1 +▁preliminar 1 +▁imagina 1 +▁construye 1 +▁esposa 1 +▁concede 1 +▁Mont 1 +▁RE 1 +▁aprovecha 1 +▁serían 1 +▁63 1 +▁religiosa 1 +▁Ben 1 +▁satisfactoria 1 +▁fraude 1 +mita 1 +▁dada 1 +▁oral 1 +▁blanqueo 1 +▁ja 1 +▁transforma 1 +▁agradecer 1 +▁traslado 1 +▁retirar 1 +▁participaron 1 +▁preste 1 +▁cubrir 1 +▁expresamente 1 +▁envía 1 +▁constantemente 1 +▁mundialización 1 +▁parque 1 +▁aumentó 1 +▁aceptado 1 +▁Bri 1 +▁Col 1 +MO 1 +▁temor 1 +▁colega 1 +terna 1 +▁hubiese 1 +▁sentimiento 1 +▁rom 1 +▁dedicada 1 +▁Trans 1 +▁ensayos 1 +▁beneficiarios 1 +▁siento 1 +▁agenda 1 +▁equivoca 1 +▁piensan 1 +osos 1 +è 1 +ánico 1 +▁medir 1 +if 1 +▁investigadores 1 +▁invertir 1 
+▁madera 1 +57 1 +▁Mauricio 1 +▁evidencia 1 +▁instrucción 1 +▁impulsar 1 +▁acusados 1 +▁gravedad 1 +▁comentario 1 +virt 1 +▁mensajes 1 +▁reconoció 1 +▁corporal 1 +▁dieron 1 +▁ciudadanía 1 +▁sangre 1 +▁Necesitamos 1 +▁profesores 1 +▁Singapur 1 +▁voluntarios 1 +Original 1 +▁Camp 1 +▁terrible 1 +▁disponga 1 +tz 1 +ulación 1 +▁vea 1 +▁definido 1 +▁É 1 +▁2004-2005 1 +▁Luxemburgo 1 +▁tipifica 1 +▁Vol 1 +eccion 1 +▁desplazadas 1 +▁generaciones 1 +▁desmovilización 1 +▁aparato 1 +▁interino 1 +▁lamenta 1 +▁jugador 1 +▁logrados 1 +▁centrales 1 +greso 1 +▁exporta 1 +ek 1 +▁Recomendación 1 +▁concertado 1 +7/ 1 +▁titulada 1 +tenta 1 +▁Cuenta 1 +▁centrar 1 +▁geo 1 +▁Bruselas 1 +▁Islandia 1 +▁Subraya 1 +▁dimensiones 1 +▁abre 1 +▁limpia 1 +▁hecha 1 +▁presunta 1 +▁periodo 1 +▁expectativas 1 +LA 1 +▁Fuerzas 1 +UE 1 +PI 1 +▁desean 1 +▁arquitectura 1 +▁productividad 1 +▁invoca 1 +ciencia 1 +▁banda 1 +▁credibilidad 1 +▁ninguno 1 +▁organizó 1 +▁actúa 1 +▁externos 1 +▁noveno 1 +rc 1 +▁pilar 1 +▁Kyoto 1 +▁constructivo 1 +▁Decreto 1 +ándole 1 +▁cambiado 1 +▁ejemplar 1 +▁Habid 1 +▁verbal 1 +▁Za 1 +▁inaceptable 1 +▁detallado 1 +▁necesariamente 1 +▁proponer 1 +▁» 1 +▁facilitado 1 +WG 1 +▁maneja 1 +▁derivados 1 +▁crimen 1 +▁comer 1 +▁dudas 1 +▁Qatar 1 +▁sanciona 1 +▁extremadamente 1 +▁entró 1 +▁tercero 1 +▁cuáles 1 +▁enseña 1 +▁Liga 1 +tieron 1 +▁cohesión 1 +▁empezó 1 +▁parecía 1 +▁protesta 1 +▁incidentes 1 +▁deposit 1 +▁Uzbekistán 1 +▁promueve 1 +▁preservar 1 +▁suministros 1 +▁promueva 1 +▁investiga 1 +iese 1 +lógico 1 +▁significado 1 +ducido 1 +▁encomia 1 +lli 1 +▁detectar 1 +▁Estonia 1 +▁salvar 1 +for 1 +Bissau 1 +▁sufragar 1 +lares 1 +▁piezas 1 +▁especializada 1 +iones 1 +▁imperio 1 +▁contraído 1 +zu 1 +▁cuán 1 +by 1 +▁Espacio 1 +ónica 1 +▁autónomos 1 +▁interesado 1 +▁socios 1 +tive 1 +SI 1 +▁ratificar 1 +▁tercio 1 +▁radical 1 +▁genética 1 +▁obtiene 1 +▁destacó 1 +▁consideraciones 1 +▁creando 1 +párr 1 +▁Nicaragua 1 +▁PIB 1 +▁subvenciones 1 +▁mutuo 1 +▁aérea 1 +▁complementa 1 +▁círculo 1 +▁remuneración 1 +▁contribuya 1 +▁transporta 1 +▁utilizada 1 +▁existir 1 +▁Modelo 1 +▁13.00 1 +pol 1 +▁clasifica 1 +▁2008-2009 1 +▁expansión 1 +▁poderes 1 +▁alcanza 1 +▁Varios 1 +▁plat 1 +▁renuncia 1 +▁actualizada 1 +▁planteado 1 +véanse 1 +▁religiones 1 +▁vacantes 1 +▁apoye 1 +▁gal 1 +▁Jurídicos 1 +▁trabajado 1 +▁estación 1 +quí 1 +▁sometida 1 +▁págs 1 +▁químicas 1 +▁alcanzados 1 +▁Ministerial 1 +▁verse 1 +▁sujetos 1 +▁coordinada 1 +3/ 1 +Qu 1 +▁subsidiarios 1 +▁cerrado 1 +▁clic 1 +▁Podría 1 +▁sólido 1 +▁piloto 1 +▁CON 1 +▁actores 1 +mpi 1 +ö 1 +▁comprobar 1 +▁importación 1 +▁Río 1 +▁reúne 1 +▁mejorado 1 +▁barco 1 +▁peticiones 1 +▁buscando 1 +▁Roja 1 +▁Sociedad 1 +▁embarazo 1 +▁recuerdo 1 +▁vídeo 1 +▁reclutamiento 1 +▁profesor 1 +struct 1 +▁titular 1 +▁británico 1 +▁silencio 1 +▁apelación 1 +▁debatir 1 +tric 1 +▁pacíficos 1 +▁Quizá 1 +▁cruza 1 +▁estructurales 1 +▁plantean 1 +▁aleja 1 +▁golpe 1 +▁cuánto 1 +ándolo 1 +ndar 1 +▁Reco 1 +▁monta 1 +▁Ciudad 1 +▁piedra 1 +▁extremo 1 +tch 1 +ENT 1 +▁elimina 1 +▁producen 1 +ung 1 +▁atmósfera 1 +▁femenina 1 +▁respectivas 1 +▁recauda 1 +▁sostiene 1 +▁secciones 1 +▁dedicar 1 +▁nombrado 1 +gina 1 +har 1 +▁universidades 1 +▁útiles 1 +▁Faso 1 +▁Consulta 1 +▁impulsa 1 +▁aprueba 1 +▁Dos 1 +bal 1 +▁destinada 1 +▁temperatura 1 +▁Ante 1 +▁precedentes 1 +▁romaníes 1 +▁libera 1 +▁Salón 1 +iti 1 +▁OSSI 1 +▁sugiere 1 +cula 1 +▁Toda 1 +▁relacionado 1 +▁apoyado 1 +rina 1 +▁recibidas 1 +▁convenio 1 +▁impuestas 1 +▁Cruz 1 +▁negativas 1 +▁Soy 1 +▁constructiva 1 +▁periodistas 1 +▁conoce 1 +peri 1 +▁Plataforma 1 
+▁conducir 1 +▁legislativa 1 +▁Mujeres 1 +▁claridad 1 +▁asumido 1 +▁ocasiona 1 +▁conocida 1 +cultural 1 +▁eres 1 +▁ocurrió 1 +▁fracaso 1 +▁masiva 1 +▁deseen 1 +uar 1 +▁ordenamiento 1 +▁Unidad 1 +▁religiosos 1 +SO 1 +▁flujo 1 +▁muro 1 +▁75 1 +▁aquellas 1 +TO 1 +Me 1 +▁CD 1 +IG 1 +▁Tre 1 +ificó 1 +▁propósitos 1 +▁orgánica 1 +▁sugirió 1 +▁cometidas 1 +▁afrontar 1 +▁premio 1 +▁consagra 1 +▁hago 1 +CH 1 +▁bienvenida 1 +▁opina 1 +▁división 1 +▁socio 1 +cy 1 +▁Defensor 1 +▁impresión 1 +▁limitación 1 +2005 1 +▁positivas 1 +▁supera 1 +▁Yugoslava 1 +▁Habie 1 +▁detrás 1 +▁nave 1 +venga 1 +pet 1 +▁elegidos 1 +ik 1 +▁fomenta 1 +CRC 1 +▁Dicha 1 +** 1 +▁encuentro 1 +vivi 1 +ich 1 +▁Primero 1 +▁conscientes 1 +▁Gal 1 +▁Unida 1 +ft 1 +▁Sé 1 +▁Th 1 +▁Finalmente 1 +▁epidemia 1 +▁ayude 1 +▁comparti 1 +▁Imp 1 +▁viendo 1 +▁repatriación 1 +▁cierre 1 +▁combate 1 +▁aumente 1 +▁Panamá 1 +▁terrorista 1 +▁ADN 1 +▁décadas 1 +▁suspensión 1 +▁oradora 1 +▁depósito 1 +Los 1 +▁explosivos 1 +▁subregional 1 +▁1989 1 +sistir 1 +▁probar 1 +▁antecedentes 1 +▁oportuno 1 +▁probabilidad 1 +▁consideramos 1 +▁experto 1 +▁persistente 1 +▁informaciones 1 +▁Zambia 1 +▁océanos 1 +▁paciente 1 +▁hospitales 1 +▁Google 1 +▁objeciones 1 +▁Sexta 1 +▁Había 1 +▁sufren 1 +▁centrado 1 +%) 1 +▁socava 1 +▁reconocidos 1 +▁proporcionada 1 +▁resistencia 1 +▁publicó 1 +▁vacaciones 1 +▁ahorro 1 +▁industriales 1 +▁establecen 1 +cogiendo 1 +▁retorno 1 +▁Estaba 1 +▁culpa 1 +▁Posteriormente 1 +▁parlamentario 1 +▁reemplaza 1 +▁transfronterizo 1 +▁acreedores 1 +iana 1 +▁consultiva 1 +▁convino 1 +▁ocuparse 1 +▁extranjeras 1 +▁inspección 1 +▁Yemen 1 +▁carne 1 +▁obligatorio 1 +TI 1 +▁adoptando 1 +▁inventario 1 +▁Cri 1 +▁finanzas 1 +ige 1 +▁Emp 1 +plaza 1 +▁alquiler 1 +▁diseñar 1 +iéramos 1 +▁véase 1 +▁tren 1 +▁bruto 1 +▁universidad 1 +▁obligados 1 +▁transformación 1 +▁Investigaciones 1 +▁Jamaica 1 +▁escuchado 1 +▁complica 1 +lecomunicaciones 1 +▁Igualdad 1 +▁inmunidad 1 +pone 1 +▁académico 1 +▁perfil 1 +▁costes 1 +▁utilizarse 1 +▁gases 1 +▁cura 1 +▁(2003) 1 +▁Debido 1 +▁débil 1 +▁usan 1 +2006 1 +▁corresponden 1 +▁dará 1 +▁provincias 1 +6/ 1 +▁recomendado 1 +▁Guía 1 +▁Gra 1 +▁fiable 1 +▁Reitera 1 +▁publicidad 1 +▁gráfico 1 +▁prepara 1 +▁feliz 1 +▁migratorios 1 +▁cumplen 1 +▁Pese 1 +▁policial 1 +▁Lista 1 +▁generalmente 1 +▁campamentos 1 +▁estricta 1 +▁litoral 1 +ACIÓN 1 +▁requerir 1 +▁asisten 1 +▁presos 1 +▁voluntario 1 +▁estancia 1 +▁DEL 1 +▁emprender 1 +▁agrupa 1 +MP 1 +▁compensa 1 +▁obligado 1 +▁previas 1 +▁apropiados 1 +▁extraño 1 +▁rapidez 1 +▁Bra 1 +▁veo 1 +▁traducción 1 +▁cero 1 +▁Madrid 1 +▁asiento 1 +▁disponen 1 +▁revolución 1 +▁Consolidación 1 +▁accesible 1 +▁ventaja 1 +▁Malí 1 +▁compañero 1 +▁vuelva 1 +▁orientado 1 +▁cumpli 1 +▁Burkina 1 +▁intervenir 1 +▁encarcela 1 +▁Che 1 +▁Observación 1 +▁10% 1 +▁Gi 1 +burg 1 +fico 1 +▁delante 1 +▁65 1 +▁Centroafricana 1 +▁aportaciones 1 +▁adquirido 1 +▁refieren 1 +▁afectar 1 +▁relaciona 1 +▁intercambiar 1 +▁ayer 1 +▁piel 1 +▁traslad 1 +▁supervivencia 1 +▁Participa 1 +▁20% 1 +zona 1 +▁irre 1 +▁exteriores 1 +▁2015 1 +ificar 1 +▁Chris 1 +▁Evaluación 1 +▁máquinas 1 +▁animal 1 +▁Law 1 +▁trataba 1 +▁Nosotros 1 +▁Zimbabwe 1 +▁viejo 1 +▁tradición 1 +▁55/2 1 +▁reglamentación 1 +▁Dicho 1 +▁negativos 1 +▁situada 1 +▁mensual 1 +▁permitiría 1 +▁examinará 1 +▁felicita 1 +▁abandonar 1 +▁compuesto 1 +▁escasa 1 +▁Oficial 1 +uri 1 +▁1373 1 +▁indirecta 1 +▁inciso 1 +PL 1 +▁vínculo 1 +▁indebido 1 +▁Transición 1 +▁imposición 1 +▁reacción 1 +▁mover 1 +▁ecosistemas 1 +▁océano 1 +▁brecha 1 +and 1 +▁inclusive 1 +▁incumbe 1 +▁cesación 1 +▁Fra 
1 +▁llevando 1 +▁marina 1 +▁Puerto 1 +/2007/ 1 +▁asocia 1 +▁And 1 +OL 1 +▁pared 1 +▁Independiente 1 +▁peligrosos 1 +▁operativo 1 +▁Washington 1 +▁gratuito 1 +▁viernes 1 +▁recién 1 +@ 1 +▁provisionales 1 +▁iniciales 1 +▁ejerce 1 +ku 1 +pto 1 +▁permitió 1 +▁equidad 1 +▁siguiera 1 +▁Hoteles 1 +▁cercano 1 +▁constitución 1 +▁escolares 1 +▁viva 1 +▁llegada 1 +▁2002-2003 1 +▁Agencia 1 +▁Control 1 +▁declarar 1 +▁Benin 1 +▁Lucha 1 +▁identifica 1 +▁cielo 1 +RES 1 +▁Ven 1 +▁fuese 1 +▁liberación 1 +▁desigualdad 1 +▁estarán 1 +ther 1 +▁Nivel 1 +▁marítimo 1 +▁tropas 1 +▁sensible 1 +▁(2006) 1 +▁parlamentaria 1 +▁Registro 1 +▁apunta 1 +▁Haya 1 +▁encontraba 1 +▁(2005) 1 +▁escrita 1 +mí 1 +▁Liechtenstein 1 +ure 1 +ática 1 +▁incentivos 1 +▁150 1 +▁aplique 1 +▁Celebra 1 +▁discriminatoria 1 +LE 1 +▁regulación 1 +▁embarazada 1 +▁proseguir 1 +▁gira 1 +▁pasada 1 +▁ética 1 +▁servidor 1 +▁residuos 1 +▁cancela 1 +▁privación 1 +▁duro 1 +▁participó 1 +▁formulario 1 +▁contribuirá 1 +▁1.000 1 +venta 1 +▁presentarse 1 +▁Directora 1 +poli 1 +▁18.00 1 +ONU 1 +▁alianzas 1 +IL 1 +▁aporte 1 +▁juvenil 1 +▁Cambio 1 +figura 1 +▁coordinado 1 +▁considerando 1 +▁desertificación 1 +▁ilustra 1 +▁protegido 1 +▁propicia 1 +▁describir 1 +▁enseñar 1 +▁ricos 1 +▁Fuente 1 +▁degradantes 1 +▁jurisprudencia 1 +▁matemática 1 +▁Durban 1 +▁1980 1 +▁viajar 1 +cito 1 +▁juventud 1 +▁contenidas 1 +▁Formas 1 +▁determinada 1 +▁procesamiento 1 +▁actualización 1 +sent 1 +▁Indígenas 1 +▁avanzada 1 +▁destacado 1 +▁antiguos 1 +▁diplomático 1 +▁sostenido 1 +▁sensación 1 +▁elabora 1 +▁cumpla 1 +▁núcleo 1 +▁sequía 1 +Si 1 +▁creada 1 +▁empecé 1 +▁tomen 1 +éis 1 +▁facultades 1 +CONF 1 +▁desarrollando 1 +▁apoyando 1 +▁evoluciona 1 +▁denominado 1 +▁contienen 1 +IF 1 +▁asegurarse 1 +▁ponerse 1 +▁Dominicana 1 +▁estadísticos 1 +▁comenzado 1 +▁secuestro 1 +dujeron 1 +▁divulgación 1 +▁encaminados 1 +▁perjudicial 1 +▁recibieron 1 +▁subrayó 1 +▁presten 1 +▁gar 1 +▁permanecer 1 +▁hechas 1 +▁obstáculo 1 +▁facultad 1 +▁Mc 1 +▁memorando 1 +▁expediente 1 +ducción 1 +▁dispositivo 1 +▁magnitud 1 +▁perjudica 1 +▁distribuido 1 +▁olvidar 1 +▁capitales 1 +▁reclama 1 +▁vehículo 1 +▁Casa 1 +▁respaldo 1 +▁décimo 1 +▁with 1 +▁64 1 +▁Siempre 1 +▁delincuentes 1 +▁africana 1 +DP 1 +▁Están 1 +▁observado 1 +▁Monterrey 1 +▁asesina 1 +king 1 +cara 1 +▁presunto 1 +▁lunes 1 +▁cárcel 1 +▁goce 1 +▁módulo 1 +▁escasez 1 +▁asciende 1 +▁regresa 1 +▁Tenía 1 +▁neuro 1 +▁iraquíes 1 +▁dispositivos 1 +▁ubica 1 +▁consideren 1 +▁Barcelona 1 +▁incidencia 1 +pel 1 +▁piso 1 +▁poseedores 1 +dez 1 +▁potable 1 +▁subtema 1 +▁departamento 1 +▁Ku 1 +▁ajustar 1 +▁sencillo 1 +▁considero 1 +cuerda 1 +▁banco 1 +▁Estupefacientes 1 +▁energético 1 +▁lógica 1 +▁tira 1 +▁necesitaba 1 +▁perfectamente 1 +▁ministros 1 +▁testimonio 1 +▁asegure 1 +▁Provisional 1 +▁cónyuge 1 +▁solicite 1 +▁Final 1 +▁líder 1 +▁retira 1 +2.000 1 +▁perfeccion 1 +Estados 1 +▁mejorando 1 +▁resumida 1 +cabeza 1 +▁iguales 1 +marca 1 +CD 1 +▁sugerencias 1 +▁Namibia 1 +▁promoviendo 1 +▁disfruta 1 +cierto 1 +▁generado 1 +▁accidentes 1 +ate 1 +▁socioeconómico 1 +▁resultantes 1 +ducto 1 +▁realista 1 +▁mutuamente 1 +▁iglesia 1 +▁posesión 1 +FOR 1 +▁habita 1 +▁TV 1 +▁Vo 1 +▁brillante 1 +▁prosperidad 1 +▁pornografía 1 +▁crece 1 +▁secreta 1 +▁pens 1 +▁maternidad 1 +▁afirmó 1 +moni 1 +▁acabar 1 +▁admisibilidad 1 +ture 1 +▁VII 1 +▁docente 1 +▁cuidados 1 +▁Jamahiriya 1 +▁estábamos 1 +▁interesada 1 +▁gestionar 1 +poner 1 +▁chica 1 +▁Documentación 1 +▁inhumanos 1 +▁progresiva 1 +▁vuelto 1 +▁decisivo 1 +▁valioso 1 +▁plaza 1 +▁controversia 1 +▁Delincuencia 1 +▁orgullo 
1 +▁Paul 1 +▁suscita 1 +▁autorizada 1 +▁funcione 1 +▁contabilidad 1 +▁Marte 1 +▁consumidor 1 +▁proporcionan 1 +lau 1 +▁almacena 1 +▁señales 1 +▁llamamos 1 +EM 1 +▁deudor 1 +.4/ 1 +▁reunió 1 +▁desearía 1 +▁penitenciario 1 +▁árboles 1 +▁informático 1 +▁sospechoso 1 +▁valiosa 1 +▁report 1 +▁decreto 1 +▁agencias 1 +▁EL 1 +▁oficialmente 1 +▁OMS 1 +▁forzada 1 +▁vota 1 +▁propiedades 1 +▁prohibido 1 +▁Honduras 1 +▁equitativo 1 +▁Public 1 +▁Considera 1 +▁celebre 1 +▁empeño 1 +gun 1 +▁mini 1 +▁definitivo 1 +▁sentado 1 +▁crueles 1 +spir 1 +tiene 1 +▁Normas 1 +lón 1 +▁acreedor 1 +▁mutua 1 +▁MONUC 1 +▁diamantes 1 +▁fábrica 1 +▁parlamento 1 +▁órdenes 1 +▁sindicatos 1 +▁vender 1 +▁belleza 1 +▁orientaciones 1 +▁lanzamiento 1 +▁condiciona 1 +▁fiscalización 1 +▁directivo 1 +▁óptima 1 +▁corregir 1 +▁incluía 1 +ling 1 +▁Observaciones 1 +NI 1 +▁hablamos 1 +▁estudiando 1 +DH 1 +▁extensión 1 +▁prohibir 1 +▁EN 1 +▁aspira 1 +▁sueldos 1 +CR 1 +/10 1 +▁especialistas 1 +▁esclavitud 1 +▁café 1 +▁modificado 1 +▁experimental 1 +Di 1 +▁productivo 1 +▁estadística 1 +▁justificar 1 +▁(2002) 1 +▁consultivo 1 +▁ingeniería 1 +▁estrictamente 1 +▁cerrar 1 +▁inspeccion 1 +▁negociar 1 +cular 1 +▁agradecería 1 +▁estricto 1 +▁curs 1 +“ 1 +▁cometer 1 +mero 1 +▁talleres 1 +▁SE 1 +▁siguiendo 1 +▁legitimidad 1 +▁oculta 1 +▁Deseo 1 +▁unilateral 1 +▁situ 1 +▁pesquera 1 +▁reclusos 1 +▁tejido 1 +▁pensaba 1 +▁exclusiva 1 +▁compro 1 +▁Tayikistán 1 +ité 1 +stituye 1 +▁1988 1 +▁preparando 1 +▁World 1 +▁paisaje 1 +▁absoluta 1 +iller 1 +▁enunciados 1 +▁fiduciario 1 +▁restablecer 1 +▁genial 1 +▁inevitable 1 +▁tarifa 1 +▁Kar 1 +▁manifestar 1 +▁indemniza 1 +/64/ 1 +▁complementario 1 +▁recordó 1 +▁deterioro 1 +FR 1 +▁visado 1 +▁Fer 1 +▁fórmula 1 +▁psicológica 1 +▁cultiva 1 +izo 1 +▁actitudes 1 +▁pelo 1 +▁Federativa 1 +▁invasión 1 +▁mostrado 1 +▁teníamos 1 +▁Consultivo 1 +▁mono 1 +▁Regla 1 +▁diagnóstico 1 +▁proyecta 1 +▁septentrional 1 +▁decisiva 1 +▁Respect 1 +▁municipales 1 +ish 1 +▁Libertad 1 +▁reiterar 1 +▁convocar 1 +▁ventana 1 +goberna 1 +▁Fiscalía 1 +▁Mozambique 1 +▁constituía 1 +▁lujo 1 +izaciones 1 +-5 1 +▁Índice 1 +▁transformar 1 +/2008/ 1 +▁Sáhara 1 +▁bancaria 1 +▁Nunca 1 +▁80% 1 +▁mediados 1 +▁sustituir 1 +▁usado 1 +à 1 +▁UNMIK 1 +▁respaldar 1 +▁autoriza 1 +rey 1 +▁directores 1 +▁personalidad 1 +▁bloque 1 +▁incorporado 1 +▁Preparatorio 1 +▁ecológica 1 +ende 1 +▁altamente 1 +▁Tur 1 +▁accidente 1 +▁formularon 1 +▁maravilloso 1 +▁virtual 1 +▁productiva 1 +▁basta 1 +▁estimula 1 +Í 1 +▁Inspección 1 +▁francesa 1 +▁incompatible 1 +▁privilegio 1 +▁vivía 1 +▁procurar 1 +▁iraquí 1 +▁400 1 +▁Cualquier 1 +▁excepciones 1 +ambi 1 +▁catástrofe 1 +▁Guerra 1 +▁disparidad 1 +▁suele 1 +▁continuará 1 +▁presupuestaria 1 +IDA 1 +▁plantilla 1 +▁acordó 1 +ME 1 +▁tradiciones 1 +▁instó 1 +WP 1 +▁poca 1 +▁afectado 1 +▁importaciones 1 +▁acusa 1 +▁Existe 1 +▁distribuye 1 +▁cuantía 1 +▁multa 1 +▁preparatorio 1 +▁Planificación 1 +▁concienciación 1 +▁automóvil 1 +▁construido 1 +Mi 1 +▁centrarse 1 +▁Sub 1 +▁respeten 1 +Com 1 +▁protegida 1 +▁respira 1 +centr 1 +vía 1 +ib 1 +С 1 +▁Habitaciones 1 +▁Mejor 1 +▁diría 1 +▁introducido 1 +▁podremos 1 +▁mitigar 1 +▁morir 1 +gráfico 1 +▁PRO 1 +▁Tri 1 +▁añadido 1 +▁exhaustiva 1 +▁pasajeros 1 +▁irregular 1 +▁reduce 1 +▁policiales 1 +▁inestabilidad 1 +▁inseguridad 1 +▁propicio 1 +▁conservar 1 +▁vienen 1 +▁reviste 1 +▁funcional 1 +▁hablado 1 +▁cuestiona 1 +▁Cal 1 +▁generalizada 1 +▁Montreal 1 +▁jueves 1 +▁hermoso 1 +▁porteador 1 +▁Auditores 1 +ence 1 +▁recopilación 1 +▁protege 1 +▁viaja 1 +▁acordar 1 +▁Natural 1 +▁fundamenta 1 +▁puente 
1 +▁anfitrión 1 +▁dictamen 1 +▁ejecuciones 1 +▁estupefacientes 1 +о 1 +▁brazo 1 +ive 1 +▁placer 1 +▁contamina 1 +▁automáticamente 1 +▁otorgar 1 +▁separada 1 +▁perfecto 1 +▁remitir 1 +▁implanta 1 +▁dificultad 1 +▁decidida 1 +OC 1 +DO 1 +▁quedó 1 +▁detección 1 +▁Mediterráneo 1 +▁emprendido 1 +▁invernadero 1 +▁produjo 1 +▁Existen 1 +▁étnica 1 +▁armonizar 1 +▁señalaron 1 +▁urbana 1 +▁degradación 1 +▁museo 1 +▁asesores 1 +activ 1 +▁literalmente 1 +▁incertidumbre 1 +tendiendo 1 +212) 1 +▁dispara 1 +▁30% 1 +▁manipula 1 +▁Paraguay 1 +▁poderoso 1 +▁efectuado 1 +▁mantienen 1 +▁imagin 1 +▁nombrar 1 +▁CNUDMI 1 +▁cotidiana 1 +▁refugio 1 +▁963- 1 +▁Armas 1 +▁preserva 1 +▁sentar 1 +▁titulares 1 +zz 1 +▁resulte 1 +▁Creemos 1 +▁preocupante 1 +.400 1 +▁profundidad 1 +▁bloqueo 1 +& 1 +▁verificar 1 +EL 1 +▁crecer 1 +▁PYME 1 +▁socioeconómica 1 +▁administradora 1 +▁celebraron 1 +To 1 ++ 1 +▁Jurídica 1 +▁hermosa 1 +▁famoso 1 +▁actualizado 1 +▁inadmisible 1 +▁marítima 1 +▁resuelto 1 +▁semejante 1 +TRA 1 +ducir 1 +▁debilita 1 +berg 1 +ä 1 +▁intensa 1 +htm 1 +▁incrementa 1 +▁Alimentación 1 +▁desayuno 1 +▁eléctrica 1 +▁puntual 1 +▁forzoso 1 +▁pensión 1 +▁archivo 1 +▁globales 1 +▁arbitraje 1 +▁tendremos 1 +▁excepto 1 +▁club 1 +▁habilidades 1 +▁Auditor 1 +▁Emiratos 1 +▁afuera 1 +▁montón 1 +▁complementar 1 +▁conocí 1 +ísima 1 +0/ 1 +▁esperaba 1 +▁reembolso 1 +▁convirtió 1 +▁emplear 1 +COP 1 +▁Mongolia 1 +▁discapacidades 1 +▁Global 1 +▁Sabemos 1 +▁seriamente 1 +▁contratista 1 +▁Nu 1 +▁conduce 1 +▁excede 1 +▁prever 1 +▁llevará 1 +▁solía 1 +▁Lamentablemente 1 +▁aeronave 1 +▁navegación 1 +▁” 1 +▁Voy 1 +▁gravemente 1 +▁Bahrein 1 +▁Escuela 1 +▁sucedió 1 +▁soporta 1 +pie 1 +PR 1 +▁ayudará 1 +▁auxiliar 1 +▁entablar 1 +▁piscina 1 +▁misiles 1 +icidad 1 +▁envió 1 +▁poniendo 1 +▁móvil 1 +▁explorar 1 +▁Seguimiento 1 +▁damos 1 +▁Vamos 1 +▁negar 1 +▁artificial 1 +▁Monetario 1 +▁LOS 1 +▁editor 1 +▁asociado 1 +▁clínica 1 +▁continuidad 1 +▁actúe 1 +2009 1 +.300 1 +▁dotación 1 +▁Djibouti 1 +▁FMAM 1 +▁saludable 1 +▁flota 1 +PT 1 +▁cable 1 +▁frustra 1 +reestructuración 1 +▁minera 1 +▁ropa 1 +▁73 1 +▁unilaterales 1 +▁nutri 1 +▁mencionada 1 +▁suprimir 1 +▁aprendido 1 +▁ubicado 1 +▁agradable 1 +▁enemigo 1 +▁repercusión 1 +mbe 1 +tienda 1 +▁naturalmente 1 +anti 1 +▁fijar 1 +uelta 1 +▁desplegado 1 +▁recogida 1 +▁lleno 1 +▁discrimina 1 +▁nutrición 1 +▁afirmación 1 +▁contractual 1 +▁pintura 1 +▁hielo 1 +▁taller 1 +▁cine 1 +▁maestro 1 +▁tuvimos 1 +▁eventual 1 +▁lecciones 1 +▁evita 1 +▁presupuest 1 +▁coincide 1 +▁Fiduciario 1 +▁Mecanismo 1 +▁Ribera 1 +▁conmigo 1 +▁mueve 1 +▁electo 1 +[ 1 +▁recibida 1 +▁costumbre 1 +▁dominio 1 +▁empeora 1 +▁monetaria 1 +▁objetiva 1 +▁conversación 1 +▁origina 1 +▁disminuye 1 +▁supervisa 1 +▁Fiscalización 1 +▁bosque 1 +▁dólar 1 +▁conocen 1 +▁musulmanes 1 +▁Nuevo 1 +▁Somos 1 +▁obligada 1 +▁implementa 1 +▁circula 1 +▁pobre 1 +▁bacteria 1 +▁discapacitados 1 +▁Decisión 1 +▁edición 1 +dal 1 +▁distinto 1 +▁alentador 1 +▁encontrará 1 +▁talibanes 1 +▁vacío 1 +▁evalu 1 +▁doy 1 +▁ronda 1 +▁patrocinado 1 +▁bienal 1 +▁dieta 1 +▁colores 1 +▁vela 1 +▁doctor 1 +.600 1 +▁doce 1 +mail 1 +▁vigila 1 +▁forestal 1 +▁Proceso 1 +▁Consciente 1 +▁Ésta 1 +▁móviles 1 +▁mantenerse 1 +▁rendir 1 +▁canales 1 +▁registrar 1 +▁actualizar 1 +▁transfronteriza 1 +▁oralmente 1 +▁noticia 1 +güe 1 +vision 1 +▁Madagascar 1 +▁panorama 1 +▁importe 1 +▁infracciones 1 +▁censo 1 +▁empleadores 1 +▁coordinadores 1 +▁Financiero 1 +▁listo 1 +▁concentración 1 +▁atraer 1 +▁secc 1 +ney 1 +▁precisión 1 +▁Transporte 1 +▁Muy 1 +▁exploración 1 +▁Cuarta 1 +▁reflejar 1 +▁rojo 
1 +▁enjuicia 1 +▁facilitación 1 +▁Ejército 1 +▁auténtica 1 +▁patrones 1 +▁llevada 1 +▁beneficiarse 1 +▁relaja 1 +NEPAD 1 +▁lesiones 1 +▁calor 1 +▁pronta 1 +▁Organizada 1 +▁pagado 1 +▁municipal 1 +▁electorales 1 +front 1 +▁adjunto 1 +▁umbral 1 +▁Abu 1 +▁cuándo 1 +▁Plaza 1 +▁Ze 1 +posiciones 1 +▁estructural 1 +▁planteadas 1 +▁reducida 1 +▁compartido 1 +2010 1 +▁Cabo 1 +▁aportación 1 +▁imputa 1 +▁sujeto 1 +▁transmite 1 +▁internet 1 +▁ordenador 1 +president 1 +▁minuto 1 +▁Copenhague 1 +▁alivio 1 +▁infecta 1 +kh 1 +uela 1 +▁preguntó 1 +▁sinergia 1 +▁wi 1 +▁alegaciones 1 +pusiera 1 +▁adjunta 1 +▁resultó 1 +▁basándose 1 +▁orientar 1 +▁aplicará 1 +▁correctamente 1 +/9 1 +▁pendiente 1 +▁reacciona 1 +▁validez 1 +▁ganancias 1 +▁confidencial 1 +▁usamos 1 +▁originales 1 +▁perpetrado 1 +▁recicla 1 +▁visible 1 +▁basarse 1 +patria 1 +▁comprador 1 +IR 1 +▁neutral 1 +▁habiendo 1 +▁ocupante 1 +▁acusaciones 1 +▁intentando 1 +▁obviamente 1 +▁preámbulo 1 +▁rechazo 1 +▁facilidad 1 +▁respondió 1 +▁emisión 1 +▁distinta 1 +▁disminuir 1 +▁hermana 1 +▁jurídicamente 1 +▁agradece 1 +▁falla 1 +▁deseamos 1 +▁Personas 1 +▁tragedia 1 +▁símbolo 1 +▁Aún 1 +▁aborto 1 +▁Casi 1 +▁Spa 1 +▁instalar 1 +▁antipersonal 1 +▁heridos 1 +rito 1 +▁triste 1 +▁consideraba 1 +▁liberal 1 +decisión 1 +▁permanece 1 +▁desgracia 1 +▁Atómica 1 +▁háb 1 +▁complicado 1 +.800 1 +▁malo 1 +▁desplaza 1 +▁impresionante 1 +▁alianza 1 +▁mío 1 +tente 1 +▁lanzar 1 +▁Situación 1 +▁droga 1 +▁parti 1 +▁United 1 +▁prórroga 1 +▁ofrecido 1 +▁emitido 1 +▁USD 1 +ding 1 +▁1967 1 +▁pista 1 +ç 1 +▁Johannesburgo 1 +▁macroeconómica 1 +▁transcurrido 1 +HRC 1 +▁neto 1 +▁40% 1 +▁observancia 1 +mental 1 +▁enfrentamiento 1 +CIA 1 +▁desigualdades 1 +▁secundario 1 +▁TNP 1 +▁Kong 1 +▁venir 1 +▁ruido 1 +▁sustenta 1 +▁abastecimiento 1 +▁circun 1 +▁maravillosa 1 +▁matricul 1 +▁Hong 1 +▁cubierta 1 +lement 1 +▁miércoles 1 +▁insiste 1 +cualesquiera 1 +▁indígena 1 +▁saca 1 +▁administrar 1 +▁olvida 1 +▁empezamos 1 +▁Parece 1 +▁restantes 1 +▁daña 1 +▁descubrimiento 1 +▁forzado 1 +Quién 1 +III 1 +▁Público 1 +▁clausura 1 +▁simultánea 1 +▁reunido 1 +▁caída 1 +▁minoría 1 +▁emocional 1 +CRP 1 +▁conciliación 1 +▁mediación 1 +▁vele 1 +▁Resumen 1 +▁derivadas 1 +Esta 1 +▁US 1 +▁amparo 1 +prime 1 +▁ocurrir 1 +▁preparada 1 +▁Preocupa 1 +▁refiero 1 +▁Trata 1 +/15 1 +▁custodia 1 +▁relevante 1 +▁incluirá 1 +▁constata 1 +▁escena 1 +▁necesite 1 +▁jamás 1 +▁Puesto 1 +▁elegante 1 +▁exposiciones 1 +▁negativo 1 +▁enfoca 1 +▁mostrarles 1 +▁negra 1 +▁asequible 1 +FCCC 1 +▁rango 1 +▁sujeción 1 +▁forestales 1 +▁aspiraciones 1 +▁obtención 1 +▁tenor 1 +▁africano 1 +▁propaga 1 +▁$ 1 +▁experimentado 1 +CEDAW 1 +▁equilibrada 1 +▁proclama 1 +▁hostilidades 1 +▁certificación 1 +▁iv 1 +▁acogió 1 +▁Militar 1 +▁ultra 1 +▁mitigación 1 +▁afgano 1 +▁volvió 1 +▁interpretar 1 +▁Financiera 1 +▁Meridional 1 +▁concluyó 1 +▁estudiante 1 +▁Vigilancia 1 +▁auspicios 1 +▁obtenida 1 +▁secuencia 1 +▁UNOPS 1 +▁David 1 +▁consultores 1 +▁innovadoras 1 +▁viento 1 +▁compartida 1 +▁vieron 1 +scentralización 1 +▁estimular 1 +▁sustancia 1 +▁Bel 1 +▁identificado 1 +EX 1 +▁profundiza 1 +▁Kha 1 +▁previamente 1 +▁preparatoria 1 +▁estimación 1 +logía 1 +▁académica 1 +▁directriz 1 +▁conviene 1 +▁retiro 1 +▁obligar 1 +▁Togo 1 +▁aproxima 1 +▁acuífero 1 +.100 1 +▁XX 1 +▁repente 1 +▁nace 1 +▁defensor 1 +▁Nadie 1 +▁inflación 1 +▁TIC 1 +▁admitir 1 +▁afecte 1 +▁desempeñado 1 +▁120 1 +▁tomadas 1 +▁coordinador 1 +▁reconozca 1 +▁Principales 1 +▁tensiones 1 +▁Marino 1 +▁aviso 1 +▁st 1 +▁Elección 1 +▁Queremos 1 +▁elevar 1 +▁clasificación 1 +▁peligroso 1 +lah 1 
+▁quinta 1 +▁esencialmente 1 +XV 1 +▁sirio 1 +▁ciertamente 1 +▁requerido 1 +▁surge 1 +3.000 1 +▁Comunicación 1 +▁Tampoco 1 +▁reflexión 1 +▁Operación 1 +▁viabilidad 1 +igi 1 +▁provocado 1 +pondría 1 +▁prorrogar 1 +▁pierde 1 +▁recopila 1 +▁prestará 1 +▁candidato 1 +▁cuente 1 +▁averiguar 1 +2008 1 +▁+ 1 +▁alcohol 1 +▁destruir 1 +▁seguida 1 +▁cobra 1 +▁cola 1 +ICEF 1 +▁excesivo 1 +▁vivimos 1 +▁incaut 1 +▁creatividad 1 +sobre 1 +▁equivale 1 +▁elaborando 1 +▁batalla 1 +▁anunció 1 +▁logística 1 +▁ofrezca 1 +▁Todavía 1 +▁constituido 1 +chu 1 +▁enérgicamente 1 +▁llevaron 1 +▁perro 1 +▁Consideramos 1 +▁Mundo 1 +ándola 1 +▁Park 1 +▁oír 1 +tif 1 +▁pidieron 1 +▁seguimos 1 +Cuándo 1 +▁vendedor 1 +▁compila 1 +▁actualiza 1 +▁individuo 1 +▁Sírva 1 +▁Estocolmo 1 +▁patrón 1 +▁despert 1 +propia 1 +▁contenida 1 +▁sistemático 1 +▁transferir 1 +▁objeción 1 +▁pasaporte 1 +▁biológico 1 +▁delegados 1 +▁"¡ 1 +▁especializado 1 +▁ozono 1 +▁impreso 1 +▁comodidad 1 +▁inicialmente 1 +▁genes 1 +▁propagación 1 +▁She 1 +▁manifestado 1 +▁referirse 1 +cier 1 +▁percepción 1 +▁trabajadoras 1 +▁dificulta 1 +▁inventa 1 +▁Pregunta 1 +▁desapariciones 1 +▁solicitó 1 +▁sospecha 1 +▁Transnacional 1 +▁innovadores 1 +▁2011 1 +▁reunirse 1 +▁incidente 1 +▁cautela 1 +▁domicilio 1 +▁asociadas 1 +▁modifica 1 +▁recluta 1 +▁candidatura 1 +▁equipada 1 +▁imprescindible 1 +▁universitario 1 +▁renovación 1 +▁conlleva 1 +▁votantes 1 +▁72 1 +▁implementación 1 +▁proporcionando 1 +▁subregión 1 +▁Productos 1 +▁ascenso 1 +▁parto 1 +▁extraña 1 +▁Windows 1 +▁empezando 1 +▁Comisiones 1 +▁previsible 1 +▁estrecho 1 +▁prorrog 1 +▁1986 1 +view 1 +▁poderosa 1 +▁localidad 1 +▁interroga 1 +▁pedimos 1 +ić 1 +▁electro 1 +▁90% 1 +▁discusión 1 +▁codifica 1 +▁planea 1 +clav 1 +▁envejecimiento 1 +▁informática 1 +▁ejecutiva 1 +sistió 1 +presupuestarios 1 +▁Industrial 1 +▁cárceles 1 +▁convierta 1 +▁transacción 1 +▁alemán 1 +▁circunstancia 1 +▁civilizaciones 1 +▁dedicación 1 +▁100% 1 +▁medianas 1 +▁veía 1 +▁peores 1 +▁acusación 1 +▁Principal 1 +▁preliminares 1 +▁unanimidad 1 +▁Alineados 1 +▁gigante 1 +▁favorecer 1 +▁responda 1 +▁borra 1 +▁Report 1 +▁oración 1 +▁manteniendo 1 +▁desaparece 1 +▁rectores 1 +▁colocar 1 +▁gusto 1 +▁acoso 1 +▁falsa 1 +▁acondicionado 1 +▁oficioso 1 +▁sufrir 1 +▁desaparecido 1 +▁estableciendo 1 +▁singular 1 +▁Organizaciones 1 +▁becas 1 +Sub 1 +▁parcialmente 1 +ortalecimiento 1 +▁bancario 1 +▁limpieza 1 +▁Barbados 1 +▁fruto 1 +▁querido 1 +▁padece 1 +▁econom 1 +▁Botswana 1 +▁Género 1 +▁adherirse 1 +▁disminuido 1 +▁Conforme 1 +▁guardia 1 +▁techo 1 +▁cobrar 1 +▁verifica 1 +▁alfabetización 1 +▁plástico 1 +▁repetir 1 +▁escond 1 +poniendo 1 +▁boca 1 +▁interlocutores 1 +,000 1 +estructura 1 +▁factura 1 +▁proviene 1 +▁Fiji 1 +▁pacífico 1 +▁corrección 1 +▁Ud 1 +dista 1 +▁aptitudes 1 +▁jardín 1 +е 1 +▁entrenamiento 1 +▁prestó 1 +▁mirando 1 +▁conocemos 1 +scripciones 1 +▁defecto 1 +▁colaboradores 1 +▁blog 1 +más 1 +▁Comunicaciones 1 +▁soporte 1 +▁California 1 +▁Climático 1 +▁Empleo 1 +▁calificado 1 +▁consistente 1 +▁salón 1 +▁consultor 1 +▁desequilibrio 1 +▁Malawi 1 +▁teatro 1 +▁horario 1 +▁espiritual 1 +▁inaugura 1 +▁pandemia 1 +▁reinserción 1 +▁OSCE 1 +▁extender 1 +▁balance 1 +▁activista 1 +▁aboga 1 +▁desaparición 1 +▁prisiones 1 +▁interactivo 1 +▁Trabaja 1 +▁notificar 1 +▁atrapa 1 +estabiliza 1 +▁OOPS 1 +ffe 1 +▁intimida 1 +militar 1 +▁proveedor 1 +▁felicidad 1 +▁fiesta 1 +ibíd 1 +▁Científico 1 +▁aliviar 1 +▁conmemora 1 +▁FMI 1 +▁Administrativo 1 +▁lento 1 +▁cena 1 +▁Anti 1 +▁Ultraterrestre 1 +▁encontré 1 +▁reducciones 1 +▁amable 1 +▁formule 1 
+▁configuración 1 +▁trasladar 1 +▁traducir 1 +▁doméstico 1 +▁prejuicio 1 +▁Iglesia 1 +▁proporcionó 1 +▁influir 1 +▁inscrito 1 +scendencia 1 +▁Éste 1 +▁traer 1 +▁confía 1 +▁tramita 1 +▁traza 1 +▁pieza 1 +▁encarar 1 +▁Capacitación 1 +▁Finanzas 1 +▁estabilización 1 +▁Frente 1 +▁contaminantes 1 +▁cuesta 1 +▁perfecta 1 +▁tramitación 1 +▁encaminada 1 +▁célula 1 +▁insistir 1 +Saben 1 +▁Mauritania 1 +▁molécula 1 +▁genital 1 +▁considerará 1 +Puede 1 +▁Directrices 1 +▁Recomendaciones 1 +▁musical 1 +▁tabaco 1 +▁UNITA 1 +▁conectar 1 +▁imparcialidad 1 +▁competitiva 1 +▁cometa 1 +▁56/2 1 +▁pedía 1 +▁Doy 1 +▁innovador 1 +▁Atlántico 1 +▁cerebral 1 +▁viviendo 1 +▁Empresa 1 +▁marido 1 +▁reanudar 1 +▁autorizar 1 +▁Balcanes 1 +▁cambió 1 +▁Tanto 1 +▁sentía 1 +▁controlado 1 +Música 1 +▁fronterizo 1 +▁terraza 1 +▁infección 1 +▁sectoriales 1 +▁cruce 1 +▁turco 1 +▁decidí 1 +▁Grandes 1 +▁procedente 1 +▁modern 1 +trop 1 +▁equilibrado 1 +▁aceite 1 +▁Dentro 1 +ística 1 +ão 1 +▁multiplica 1 +▁prohibida 1 +▁confisca 1 +▁explora 1 +▁hídricos 1 +▁inalienable 1 +▁trámite 1 +и 1 +▁conexiones 1 +▁estrés 1 +▁José 1 +▁franco 1 +▁actuando 1 +▁armonía 1 +▁Consenso 1 +▁compasión 1 +▁albergue 1 +▁discriminatorio 1 +▁simula 1 +▁à 1 +▁declarada 1 +▁Asentamientos 1 +▁arbitral 1 +▁designar 1 +▁bolsa 1 +CIÓN 1 +▁redacta 1 +▁auténtico 1 +▁descubierto 1 +▁desplegar 1 +▁válida 1 +▁ejecutado 1 +stitucion 1 +▁coloniales 1 +▁concilia 1 +▁testigo 1 +▁química 1 +▁violenta 1 +▁Pienso 1 +▁blanca 1 +▁inherente 1 +▁parecido 1 +▁elabore 1 +▁criatura 1 +▁obedece 1 +▁golpea 1 +▁fundada 1 +▁martes 1 +▁empezado 1 +▁estereotipos 1 +▁intersectorial 1 +▁lluvia 1 +▁violentos 1 +▁sobrevivir 1 +▁caliente 1 +▁redactar 1 +▁fila 1 +▁pura 1 +▁gestiones 1 +eciendo 1 +▁infracción 1 +▁ambicioso 1 +▁manejo 1 +▁acorde 1 +▁` 1 +▁literal 1 +▁versiones 1 +▁árbol 1 +▁hidro 1 +▁comprendida 1 +▁pudieron 1 +▁expres 1 +credit 1 +▁dinámico 1 +▁excluir 1 +▁Espacial 1 +▁Auto 1 +▁pierna 1 +▁decenas 1 +▁tensión 1 +▁manifestó 1 +▁proveniente 1 +▁reiteró 1 +▁subsistencia 1 +▁telefónica 1 +▁variable 1 +▁70% 1 +ident 1 +▁tutor 1 +▁Tercer 1 +▁Níger 1 +▁instituto 1 +▁lingüística 1 +▁satisfecho 1 +▁soberano 1 +▁referido 1 +▁colonial 1 +▁detecta 1 +▁logístico 1 +▁Libro 1 +▁digna 1 +.900 1 +Observador 1 +▁contenía 1 +▁homenaje 1 +▁rebeldes 1 +а 1 +▁perturba 1 +▁carbón 1 +▁niega 1 +▁1982 1 +▁concentrar 1 +▁gestiona 1 +▁italiano 1 +▁Electoral 1 +▁preparó 1 +▁was 1 +▁jornada 1 +# 1 +▁judío 1 +▁Club 1 +▁desglosados 1 +▁Poli 1 +▁trabajador 1 +gura 1 +▁reconociendo 1 +▁restricción 1 +▁refería 1 +▁ocupe 1 +▁pensado 1 +▁ratifica 1 +▁satisfactorio 1 +▁compensación 1 +▁Franja 1 +▁vieja 1 +▁paralela 1 +▁critica 1 +▁beneficiar 1 +▁Africano 1 +▁2020 1 +▁Basilea 1 +▁recíproca 1 +▁precedente 1 +▁campamento 1 +▁bloquea 1 +▁malaria 1 +▁lucro 1 +▁Rights 1 +▁1949 1 +▁explícitamente 1 +▁consagrados 1 +▁aceptada 1 +párrafo 1 +▁doctrina 1 +▁refuerza 1 +▁milenio 1 +▁emigr 1 +▁combatientes 1 +▁elaboró 1 +▁cosecha 1 +▁consigue 1 +▁refugiado 1 +▁creía 1 +▁Utilización 1 +▁informal 1 +▁armoniza 1 +▁legalidad 1 +▁apéndice 1 +▁privacidad 1 +▁Nations 1 +▁tangible 1 +▁liberar 1 +▁constituida 1 +▁habilita 1 +▁fal 1 +▁guarde 1 +▁aprende 1 +▁recae 1 +à 1 +▁cercana 1 +▁excluye 1 +▁intensifique 1 +▁George 1 +▁obvio 1 +▁autónoma 1 +▁explicó 1 +restablecimiento 1 +▁religioso 1 +▁1987 1 +▁talento 1 +▁Bush 1 +▁Trinidad 1 +▁convicción 1 +▁Permítanme 1 +▁espectacular 1 +▁incapacidad 1 +▁Formación 1 +▁racista 1 +▁signo 1 +▁auditor 1 +▁sirva 1 +▁visitantes 1 +▁helicóptero 1 +▁novedades 1 +▁recomendada 1 +▁empresarios 1 +▁caza 1 
+▁adelanta 1 +Bueno 1 +▁afronta 1 +▁temporada 1 +LO 1 +/1999/ 1 +▁fax 1 +▁Presentación 1 +▁mutilación 1 +▁tuberculosis 1 +▁nervio 1 +▁plural 1 +▁borde 1 +▁subir 1 +▁lanzado 1 +É 1 +▁Bolivarian 1 +▁Secretaria 1 +▁ecológico 1 +▁Seminario 1 +▁arresto 1 +▁61/2 1 +▁rostro 1 +▁justificación 1 +1998 1 +▁DERECHO 1 +▁Mónaco 1 +▁trimestre 1 +▁Tareas 1 +óxido 1 +▁presiones 1 +▁definiciones 1 +▁creativa 1 +▁Kirguistán 1 +▁exhaustivo 1 +▁Reserva 1 +▁monitor 1 +▁atractivo 1 +▁Cuarto 1 +▁arroja 1 +▁reorganiza 1 +▁dispositiva 1 +▁huérfanos 1 +▁sudoriental 1 +▁pesquero 1 +▁from 1 +▁modificada 1 +▁Junto 1 +▁creativo 1 +disciplina 1 +▁Premio 1 +▁cuidadosamente 1 +script 1 +▁característica 1 +▁tolera 1 +▁biodiversidad 1 +▁frágil 1 +▁islámico 1 +▁terminó 1 +▁benefici 1 +▁admisión 1 +▁coral 1 +tendrá 1 +4.000 1 +▁Conclusiones 1 +▁aislamiento 1 +▁bombarde 1 +▁continental 1 +▁psicológico 1 +▁Racismo 1 +▁sindical 1 +▁1985 1 +▁vulnerable 1 +▁OTAN 1 +▁logró 1 +▁(2008) 1 +▁extrajudiciales 1 +▁OCDE 1 +▁ilumina 1 +▁convencer 1 +▁aviación 1 +ándonos 1 +▁escribió 1 +▁60% 1 +▁1970 1 +▁59/2 1 +▁sucedido 1 +▁exceso 1 +▁brutal 1 +▁lagunas 1 +▁contenga 1 +▁válido 1 +▁direcciones 1 +▁heridas 1 +▁25% 1 +▁descolonización 1 +▁siguieron 1 +▁desprende 1 +▁suspender 1 +▁kilo 1 +▁biología 1 +cora 1 +▁coordina 1 +Video 1 +▁Diálogo 1 +▁conveniencia 1 +▁XVII 1 +▁Libre 1 +▁arregl 1 +▁compone 1 +▁extraer 1 +▁bicicleta 1 +▁determinó 1 +▁columna 1 +▁culmina 1 +▁dulce 1 +▁Condición 1 +▁modernización 1 +▁pretend 1 +▁robo 1 +▁azúcar 1 +▁hipótesis 1 +▁Excmo 1 +▁Mixto 1 +▁serbios 1 +▁rueda 1 +▁nube 1 +▁confirmó 1 +▁emociones 1 +▁deficiente 1 +▁concedido 1 +▁alentó 1 +▁ancianos 1 +ICA 1 +▁inversores 1 +▁divisiones 1 +▁imperativo 1 +INF 1 +▁Martin 1 +▁mantuvo 1 +▁refleje 1 +▁creciendo 1 +▁extensa 1 +▁fijo 1 +dependientes 1 +▁Guyana 1 +▁Técnico 1 +▁cerrada 1 +▁menoscaba 1 +▁Tabago 1 +▁determine 1 +▁discutir 1 +▁cuidar 1 +▁prohib 1 +▁incluyó 1 +▁enmendada 1 +▁emitir 1 +▁intenciones 1 +▁relatores 1 +▁Enviado 1 +▁regalo 1 +▁caracter 1 +▁Gabón 1 +▁redunda 1 +▁agotado 1 +▁aplazar 1 +dministraciones 1 +▁carencia 1 +▁Bhután 1 +▁algoritmo 1 +▁higiene 1 +▁folleto 1 +▁concierne 1 +▁resolv 1 +CEDEAO 1 +▁Corrupción 1 +▁proteína 1 +▁migratoria 1 +▁capacitado 1 +▁atribuciones 1 +▁genético 1 +work 1 +▁Liberación 1 +▁concerniente 1 +▁secuestra 1 +▁impugna 1 +▁Trato 1 +▁panel 1 +▁National 1 +▁prisioneros 1 +▁referéndum 1 +▁transcurso 1 +▁amistad 1 +▁firmó 1 +▁abandono 1 +▁piensen 1 +▁recibí 1 +▁devolver 1 +verdad 1 +▁reprimir 1 +▁sentimos 1 +▁digno 1 +▁surgido 1 +▁confiar 1 +tendida 1 +▁cubre 1 +▁falsifica 1 +▁adición 1 +▁2012 1 +▁ministerio 1 +▁recompensa 1 +▁ingeni 1 +▁jubilación 1 +6.000 1 +▁Adelanto 1 +▁OSACT 1 +▁Turkmenistán 1 +▁idéntica 1 +▁famosa 1 +▁57/2 1 +▁paralelo 1 +Ó 1 +▁apoyó 1 +▁cápita 1 +▁inspectores 1 +▁indique 1 +▁cubano 1 +▁haciéndo 1 +® 1 +▁estudie 1 +▁fruta 1 +▁proteja 1 +▁Reforma 1 +▁agujero 1 +▁constituya 1 +▁diligencia 1 +▁Golán 1 +▁severa 1 +eira 1 +▁prepare 1 +▁Votos 1 +▁llamo 1 +▁Zona 1 +▁recurre 1 +▁prosiga 1 +▁Villa 1 +▁botón 1 +▁XI 1 +▁detenciones 1 +▁comporta 1 +н 1 +▁fronteriza 1 +▁pregunto 1 +▁profesión 1 +▁fichero 1 +▁primavera 1 +▁reanudación 1 +▁ONUDD 1 +▁demasiada 1 +▁entendido 1 +▁perpetua 1 +place 1 +▁indebida 1 +▁Armadas 1 +▁planificar 1 +▁Maldivas 1 +▁consuetudinario 1 +▁conozca 1 +▁ensayo 1 +▁acepte 1 +▁celular 1 +▁sueldo 1 +▁articula 1 +ñi 1 +▁UNIFEM 1 +▁Woods 1 +▁exclusivo 1 +Qaida 1 +▁intenso 1 +▁Robert 1 +▁necesitará 1 +т 1 +▁encontró 1 +▁espectro 1 +▁Ayuda 1 +▁gratis 1 +▁sugerencia 1 +▁celda 1 +▁alberga 1 +▁partículas 
1 +▁reasentamiento 1 +▁subyacente 1 +▁varones 1 +▁fracasa 1 +ward 1 +▁Reducción 1 +▁operadores 1 +▁mostró 1 +▁convencida 1 +▁Andorra 1 +▁cumpliendo 1 +▁divide 1 +▁sincero 1 +▁dictado 1 +▁romper 1 +▁Habla 1 +▁divulga 1 +▁Afirma 1 +▁CEPA 1 +▁exhib 1 +▁punta 1 +▁acced 1 +▁Bahamas 1 +▁desventaja 1 +▁exitosa 1 +▁permitiera 1 +▁recinto 1 +▁ingenieros 1 +▁emergentes 1 +▁ministro 1 +▁paludismo 1 +▁refirió 1 +▁ingres 1 +▁Señala 1 +▁Varias 1 +ógeno 1 +stituyó 1 +▁Número 1 +▁circuito 1 +▁terapia 1 +▁prelación 1 +proyecto 1 +▁(2007) 1 +▁Apelaciones 1 +▁Gobernador 1 +▁divertido 1 +▁insuficiencia 1 +▁prevalencia 1 +♪ 1 +pendiendo 1 +▁Samoa 1 +▁Administrador 1 +▁Facebook 1 +▁continuó 1 +▁fabricantes 1 +▁persiste 1 +▁favorito 1 +/2009/ 1 +▁efectividad 1 +▁maltrato 1 +▁barato 1 +▁donaciones 1 +▁dictar 1 +▁gradua 1 +▁OSE 1 +▁intimidación 1 +▁argumenta 1 +▁Antigua 1 +▁menú 1 +▁Islam 1 +▁autónomo 1 +▁comunicó 1 +▁temporario 1 +▁disponía 1 +▁semillas 1 +▁golf 1 +▁demográfica 1 +▁VIII 1 +▁salió 1 +áramos 1 +▁colabor 1 +▁Alimentos 1 +▁cirugía 1 +▁denegación 1 +▁turístico 1 +▁semestre 1 +▁tabla 1 +▁quedará 1 +▁Industria 1 +▁Juventud 1 +▁Occidente 1 +▁Técnica 1 +▁fortaleciendo 1 +▁obtuvo 1 +▁referirme 1 +▁milicias 1 +▁Verificación 1 +▁Objetivo 1 +▁Igualmente 1 +▁optimiza 1 +▁reproducción 1 +▁basura 1 +▁hostil 1 +▁contratante 1 +▁aborde 1 +▁adjudic 1 +▁emplazamiento 1 +▁sensibilizar 1 +▁sostener 1 +▁Museo 1 +▁Mayor 1 +▁agencia 1 +▁dibujo 1 +▁injerencia 1 +▁fumadores 1 +▁interfaz 1 +▁persistencia 1 +▁respete 1 +▁terror 1 +▁faculta 1 +▁Inglaterra 1 +▁Promover 1 +▁designación 1 +▁empoderamiento 1 +▁frío 1 +▁Recuerda 1 +▁55/1 1 +▁Ombudsman 1 +▁atribuye 1 +▁amiga 1 +▁exacta 1 +▁contribuyó 1 +▁femenino 1 +▁reafirmó 1 +▁rigurosa 1 +▁descanso 1 +▁destinatario 1 +▁permitía 1 +▁aventura 1 +▁Belice 1 +▁Africa 1 +▁Bretton 1 +▁Cuestión 1 +▁Suprema 1 +▁confortable 1 +▁interviene 1 +▁excluido 1 +Cuánto 1 +▁Instancia 1 +▁asume 1 +▁Diversidad 1 +▁colegio 1 +▁inundaciones 1 +▁Apelación 1 +▁raíces 1 +▁ultima 1 +▁demográfico 1 +▁Fecha 1 +▁reanuda 1 +▁felices 1 +▁mencionó 1 +▁Recientemente 1 +▁sucediendo 1 +▁Academia 1 +▁intermedio 1 +▁reseña 1 +▁aprendí 1 +▁pesado 1 +▁injusticia 1 +▁órbita 1 +▁alojado 1 +▁descenso 1 +▁informativo 1 +▁emocionante 1 +ísimo 1 +▁calentamiento 1 +▁Instituciones 1 +▁estigma 1 +▁Encargado 1 +▁Observadores 1 +▁contaba 1 +▁Socorro 1 +▁intérprete 1 +▁malnutrición 1 +▁geográfico 1 +convocadas 1 +▁falso 1 +▁desigual 1 +▁ferrocarril 1 +▁terremoto 1 +▁abstenciones 1 +▁Papua 1 +▁Omán 1 +▁atañe 1 +▁mantendrá 1 +▁convergencia 1 +▁reconstruir 1 +▁innovaciones 1 +▁limpio 1 +▁iraní 1 +▁alemana 1 +▁concertar 1 +▁antelación 1 +▁administradores 1 +document 1 +▁enmendar 1 +▁conexo 1 +Ú 1 +▁pertinencia 1 +▁misterio 1 +▁destruye 1 +▁comparable 1 +▁acontecimiento 1 +▁surgir 1 +▁dirigió 1 +▁conceptual 1 +▁France 1 +▁hueso 1 +▁china 1 +Está 1 +▁persecución 1 +▁ingresar 1 +▁mercurio 1 +▁recibiendo 1 +▁ofreciendo 1 +▁reduciendo 1 +▁voces 1 +▁variante 1 +▁tutela 1 +▁linea 1 +▁enviada 1 +/54/ 1 +▁facultativo 1 +▁2010-2011 1 +▁Ejecución 1 +▁problemática 1 +▁recogido 1 +▁visitó 1 +▁concertada 1 +▁turno 1 +comercialización 1 +▁2000-2001 1 +▁ayudó 1 +▁pregunté 1 +▁requerida 1 +▁calificación 1 +▁adherido 1 +▁Microsoft 1 +▁billones 1 +▁coincidi 1 +▁incorpore 1 +▁Región 1 +▁gesto 1 +▁preescolar 1 +▁tropical 1 +Гі 1 +▁reproduce 1 +▁cuota 1 +▁admisible 1 +▁unánime 1 +▁intensidad 1 +▁invierno 1 +▁Decisiones 1 +Sudáfrica 1 +▁recrea 1 +iéndole 1 +▁Turismo 1 +▁vulnera 1 +€ 1 +▁agradezco 1 +▁ceremonia 1 +▁estancamiento 1 +▁penetra 1 +▁prefiere 1 
+▁subsahariana 1 +▁ordinaria 1 +▁renovado 1 +▁aprovechamiento 1 +▁estableciera 1 +▁incitación 1 +▁masivo 1 +▁financiamiento 1 +interdependencia 1 +▁Descolonización 1 +▁sorpresa 1 +▁sostenida 1 +▁reputación 1 +wood 1 +▁vertical 1 +▁afro 1 +▁acero 1 +▁añade 1 +▁estrella 1 +▁Francisco 1 +▁inscribir 1 +▁potente 1 +▁jubila 1 +▁Análisis 1 +▁Gubernamentales 1 +▁murieron 1 +▁murió 1 +▁supuestamente 1 +▁expira 1 +▁instalado 1 +Quiere 1 +▁Amsterdam 1 +▁flagrante 1 +▁fútbol 1 +▁galaxia 1 +▁competitivo 1 +▁concertación 1 +▁expresiones 1 +▁efectuada 1 +▁Negocios 1 +▁coalición 1 +▁subsiguiente 1 +▁reclusión 1 +▁genoma 1 +▁obtenga 1 +▁tienden 1 +PNUD 1 +lógica 1 +XXI 1 +▁analítico 1 +▁mercenarios 1 +▁pertenencia 1 +▁ejerza 1 +▁Gui 1 +▁habló 1 +▁típico 1 +▁pasivo 1 +▁58/2 1 +▁Democracia 1 +▁Inversiones 1 +▁flagelo 1 +▁prestigio 1 +▁prorrateo 1 +▁Órgano 1 +▁abruma 1 +▁motivación 1 +▁victoria 1 +▁hábitat 1 +▁refuerce 1 +▁diseñada 1 +▁curva 1 +▁interrogatorio 1 +▁aplaud 1 +▁revoca 1 +▁préstamo 1 +▁Amnistía 1 +▁canadiense 1 +▁contemporánea 1 +▁desfavorecidos 1 +▁estímulo 1 +▁reúna 1 +▁exacto 1 +▁devastador 1 +▁gravado 1 +▁Biológica 1 +▁externo 1 +▁Development 1 +▁horrible 1 +▁progenitor 1 +▁Berlín 1 +▁comencé 1 +▁inmensa 1 +▁bordo 1 +▁desarrolle 1 +▁sucesivo 1 +▁batería 1 +▁oscura 1 +▁atribuir 1 +▁(1998) 1 +▁renovar 1 +▁recupera 1 +▁análoga 1 +▁espectáculo 1 +▁Linux 1 +▁infecciones 1 +▁disputa 1 +▁XIX 1 +Ley 1 +▁Biblioteca 1 +▁Lesotho 1 +▁democratización 1 +▁medición 1 +▁esposo 1 +▁revisa 1 +5-0 1 +> 1 +▁pabellón 1 +▁mantuvier 1 +▁luces 1 +▁Swazilandia 1 +▁Taiwán 1 +▁diversificación 1 +▁kuwaití 1 +▁alojarte 1 +▁matrícula 1 +▁refuerzo 1 +▁tranquila 1 +▁consolidado 1 +▁asistieron 1 +▁aislado 1 +▁recabar 1 +▁modalidad 1 +▁Aspectos 1 +▁embarca 1 +▁colección 1 +▁admira 1 +▁consigo 1 +▁Recomienda 1 +▁autobús 1 +▁complemento 1 +© 1 +▁énfasis 1 +▁llamó 1 +▁sensibilidad 1 +▁bandera 1 +▁valiente 1 +▁suave 1 +OPS 1 +▁Bangkok 1 +▁Distrito 1 +▁Prohibición 1 +▁contrarrestar 1 +▁fantástico 1 +▁produciendo 1 +gún 1 +▁investigador 1 +▁numer 1 +▁asegur 1 +▁Booking 1 +▁arquitecto 1 +▁británica 1 +▁liquidación 1 +▁sabiduría 1 +▁ánimo 1 +▁neuronas 1 +▁cristal 1 +cogemos 1 +▁Degradantes 1 +▁remesas 1 +▁Bienestar 1 +▁crónica 1 +▁deportivo 1 +Dónde 1 +▁Manual 1 +▁Estratégico 1 +▁síntomas 1 +▁típica 1 +▁plasma 1 +▁Adición 1 +▁destruido 1 +8.000 1 +▁Subcomité 1 +▁Suriname 1 +▁Simplemente 1 +▁juguete 1 +▁polvo 1 +▁comprobado 1 +▁juega 1 +▁hablo 1 +▁desempeñe 1 +▁billete 1 +▁estimó 1 +▁desierto 1 +▁Series 1 +pdf 1 +▁emprende 1 +▁Contratante 1 +▁abolición 1 +▁desconocido 1 +▁filosofía 1 +▁muchísimo 1 +▁prevalece 1 +▁recorrido 1 +▁alarmante 1 +▁revolucion 1 +▁dibuja 1 +▁preveía 1 +▁establecieron 1 +▁automática 1 +▁Océano 1 +▁credenciales 1 +▁litigio 1 +▁Respuesta 1 +▁requería 1 +▁diputado 1 +▁sentí 1 +▁Tokelau 1 +▁inquieta 1 +▁that 1 +▁selectiva 1 +/2002 1 +icultura 1 +▁diapositiva 1 +SIÓN 1 +▁Bosques 1 +▁ocurriendo 1 +▁permítanme 1 +▁centenar 1 +▁receta 1 +▁requiera 1 +▁delicado 1 +▁depositario 1 +▁film 1 +▁incondicional 1 +▁mercantil 1 +▁Bagdad 1 +▁decidieron 1 +▁colonia 1 +▁Comercial 1 +▁Judicial 1 +▁Profesional 1 +▁redujo 1 +▁trauma 1 +▁regulador 1 +▁cooperativas 1 +persona 1 +▁enérgica 1 +▁expedición 1 +▁revitalización 1 +с 1 +▁moratoria 1 +▁especula 1 +▁mamá 1 +▁Príncipe 1 +▁ascendía 1 +▁designada 1 +▁sexta 1 +▁vecino 1 +▁Reglamentación 1 +▁aparición 1 +▁atentamente 1 +▁reacciones 1 +▁Pesca 1 +▁adolescente 1 +▁magistrado 1 +▁fundador 1 +▁Barbuda 1 +▁distinguir 1 +▁drástica 1 +▁Página 1 +▁reivindica 1 +▁oscuro 1 +-2004 1 
+▁Probablemente 1 +▁Véanse 1 +▁Ofrece 1 +PRST 1 +▁suscrito 1 +▁invite 1 +▁salarial 1 +Informe 1 +▁trágico 1 +▁sitúa 1 +▁muestre 1 +▁revisiones 1 +▁Irak 1 +▁esquema 1 +▁agotan 1 +▁Resoluciones 1 +▁pretexto 1 +▁residual 1 +▁Palacio 1 +▁segmento 1 +▁adversa 1 +▁guiar 1 +▁reparar 1 +▁Móvil 1 +▁Treaty 1 +▁aconseja 1 +▁cuartel 1 +▁fortalezca 1 +▁jurisdicciones 1 +▁penitenciaria 1 +▁precursores 1 +▁abuelo 1 +▁romaní 1 +▁Crueles 1 +▁errónea 1 +▁longitud 1 +▁monumento 1 +▁griego 1 +▁portátil 1 +▁botella 1 +▁artística 1 +▁optimista 1 +▁Cairo 1 +▁Subprograma 1 +▁estuve 1 +▁esquina 1 +¡ 1 +▁ingrediente 1 +▁cuantifica 1 +▁elegida 1 +▁densidad 1 +▁periodista 1 +emisor 1 +▁participe 1 +▁gratitud 1 +▁pudiéramos 1 +▁temprano 1 +▁tsunami 1 +▁apasiona 1 +▁contrabando 1 +sgraciadamente 1 +▁vote 1 +ô 1 +▁Ambiental 1 +▁divisas 1 +▁intermediario 1 +▁competir 1 +▁elogia 1 +ómetro 1 +▁XII 1 +▁Logística 1 +▁reembolsa 1 +▁Mercado 1 +▁preventivo 1 +▁exención 1 +▁describ 1 +▁imparte 1 +▁delicada 1 +▁satisface 1 +▁Moscú 1 +▁decepciona 1 +▁sabido 1 +▁dictada 1 +▁norteamericano 1 +▁patrocina 1 +▁advertencia 1 +▁certeza 1 +▁restringir 1 +▁remedio 1 +▁Voluntarios 1 +▁fósiles 1 +▁oxígeno 1 +▁predecir 1 +▁somalí 1 +▁ideología 1 +▁espalda 1 +▁alcalde 1 +▁contraseña 1 +XXVI 1 +Demócrata 1 +▁vigencia 1 +▁alegría 1 +▁Frank 1 +▁camiones 1 +▁leído 1 +▁Haciend 1 +▁horizontal 1 +▁introducida 1 +▁líquido 1 +▁concebido 1 +▁absorbe 1 +▁decidimos 1 +Nueva 1 +▁acostumbra 1 +▁homicidio 1 +▁textil 1 +▁Siguiendo 1 +▁huelga 1 +▁Obama 1 +▁cruel 1 +▁audio 1 +.105/ 1 +▁Economía 1 +▁FNUAP 1 +▁MANUD 1 +▁diplomacia 1 +▁jardines 1 +▁Considero 1 +▁equipado 1 +▁centró 1 +▁contraídas 1 +▁alegra 1 +incluido 1 +▁envíe 1 +▁recuerde 1 +▁comience 1 +▁radiación 1 +▁aterriza 1 +▁Ambos 1 +▁FPNUL 1 +▁Subsecretario 1 +▁protagonista 1 +▁registró 1 +▁restrictiva 1 +▁Conjunto 1 +▁Estructura 1 +▁comprometida 1 +▁correspondía 1 +▁termine 1 +▁invisible 1 +▁otoño 1 +▁motivado 1 +▁explícita 1 +▁dormitorio 1 +▁entusiasmo 1 +▁ratifique 1 +▁pudiese 1 +▁filtra 1 +▁variaciones 1 +▁parezca 1 +Italia 1 +China 1 +▁tranquilo 1 +▁disparo 1 +▁vivido 1 +sexual 1 +▁Subdivisión 1 +▁artefactos 1 +▁libanés 1 +▁Contribuciones 1 +▁arriesga 1 +▁existiendo 1 +▁demostró 1 +▁enfermera 1 +▁Provincia 1 +▁tendiente 1 +▁multitud 1 +▁trienal 1 +▁Debian 1 +▁recorte 1 +scribió 1 +safortunadamente 1 +▁Magistrado 1 +sproporcionada 1 +▁concluya 1 +7.000 1 +▁pautas 1 +Canadá 1 +▁Senado 1 +métrica 1 +▁equipamiento 1 +▁ratificó 1 +▁abuela 1 +▁Ahí 1 +▁confiere 1 +gregación 1 +▁introduce 1 +▁marginados 1 +▁patrulla 1 +▁Apertura 1 +▁ocurra 1 +México 1 +▁Emergencia 1 +▁analfabetismo 1 +▁italiana 1 +confidencialidad 1 +▁intercambia 1 +▁Adelantados 1 +▁asombroso 1 +▁multianual 1 +▁aceptó 1 +▁alinea 1 +▁Coalición 1 +▁adquiere 1 +▁comprenda 1 +▁autopista 1 +▁contenedor 1 +CCPR 1 +▁Vivienda 1 +▁calificaciones 1 +▁creíble 1 +▁facilitó 1 +▁evacua 1 +▁pelea 1 +▁Cuarteto 1 +▁Tráfico 1 +▁excelencia 1 +▁ofreció 1 +▁fiabilidad 1 +▁dispensa 1 +▁excusa 1 +▁computador 1 +▁explosión 1 +▁demostración 1 +▁exitoso 1 +▁sugerir 1 +▁percibe 1 +▁ESPAÑOL 1 +▁arrecife 1 +▁fármaco 1 +▁redoblar 1 +▁asigne 1 +▁escritura 1 +▁aislada 1 +▁rescate 1 +demócrata 1 +▁desviación 1 +▁Annan 1 +▁Adicional 1 +▁desglosa 1 +▁Reconciliación 1 +▁demuestre 1 +▁exagera 1 +▁islámica 1 +▁mecánica 1 +▁eligió 1 +▁reloj 1 +▁suicida 1 +▁comenzando 1 +▁plurianual 1 +▁prerrogativa 1 +▁muebles 1 +▁rechazó 1 +▁turística 1 +▁Podrá 1 +▁cambie 1 +▁permitiendo 1 +▁diagnostic 1 +cogieron 1 +в 1 +▁Abstenciones 1 +▁Marshall 1 +▁clásico 1 +▁mérito 1 +▁acogido 1 +▁brote 
1 +evolucion 1 +▁convencional 1 +▁INGLÉS 1 +▁Solidaridad 1 +▁exhortó 1 +▁perseguir 1 +▁abrió 1 +▁inicie 1 +▁efectu 1 +▁innecesaria 1 +▁vectores 1 +▁nocivas 1 +▁cuenca 1 +▁marginal 1 +▁elemental 1 +▁susceptible 1 +▁Seguro 1 +▁arroz 1 +Alguien 1 +▁Recuerdo 1 +▁Richard 1 +▁Violencia 1 +▁piratería 1 +▁reproducir 1 +▁Michael 1 +▁gripe 1 +▁veinte 1 +▁esboza 1 +▁mortal 1 +privada 1 +▁ecosistema 1 +▁desempleados 1 +▁endeudados 1 +▁monopolio 1 +▁niñez 1 +▁percibir 1 +▁séptima 1 +▁alcanzó 1 +▁agresiones 1 +▁defiende 1 +▁inadecuada 1 +▁inmueble 1 +▁maniobra 1 +▁Gambia 1 +▁transferido 1 +▁Pleno 1 +Cuarta 1 +▁regir 1 +▁suplementaria 1 +▁vergüenza 1 +▁Piensen 1 +▁entendemos 1 +▁injusta 1 +▁planteó 1 +▁abejas 1 +▁servido 1 +▁certifica 1 +▁Schengen 1 +▁agotamiento 1 +▁comisaría 1 +▁signatura 1 +prendió 1 +▁Adquisiciones 1 +▁audiovisual 1 +▁Street 1 +▁ofrecía 1 +▁molesta 1 +Podría 1 +▁tarda 1 +Direct 1 +▁núm 1 +▁Inmigración 1 +▁Nuclear 1 +▁héroe 1 +▁idónea 1 +▁preferible 1 +▁intencional 1 +▁engaño 1 +control 1 +▁musulmana 1 +▁pidiendo 1 +▁curiosidad 1 +▁inminente 1 +▁AOD 1 +▁vegetal 1 +▁emitida 1 +▁Seguir 1 +▁abundante 1 +▁contemporáneo 1 +▁Women 1 +▁indirecto 1 +iéndolo 1 +▁Association 1 +▁Sumario 1 +▁inclina 1 +▁detrimento 1 +▁Debate 1 +▁moderado 1 +▁Arbitraje 1 +▁Desastres 1 +▁Ecuatorial 1 +▁Gibraltar 1 +▁censura 1 +▁clínico 1 +▁posibilita 1 +▁domingo 1 +▁repatriados 1 +▁Encomia 1 +▁angular 1 +▁introdujo 1 +▁represalia 1 +▁climática 1 +▁convincente 1 +▁pesada 1 +▁alimento 1 +▁aproveche 1 +▁cumplía 1 +▁prohibiciones 1 +▁riguroso 1 +▁vanguardia 1 +▁Abdul 1 +Corr 1 +grupo 1 +text 1 +▁consume 1 +▁Incluye 1 +▁desacuerdo 1 +▁exacerba 1 +▁introductoria 1 +▁utilizó 1 +▁reasign 1 +▁inútil 1 +▁resistente 1 +▁Vicente 1 +▁ballena 1 +▁honra 1 +▁químico 1 +▁Brunei 1 +▁discrepancia 1 +▁extremismo 1 +▁franceses 1 +▁dividir 1 +▁síntesis 1 +▁DVD 1 +▁huevo 1 +▁empuja 1 +▁compatibilidad 1 +▁descubrió 1 +▁disminuyó 1 +▁urbanización 1 +▁reunificación 1 +▁Charles 1 +▁remitido 1 +▁confusión 1 +▁Parque 1 +evaluación 1 +▁Fundamentales 1 +▁Ministra 1 +▁comandante 1 +▁comunique 1 +▁reunieron 1 +▁apelaciones 1 +▁Condena 1 +▁Organiza 1 +resoluciones 1 +▁encaminado 1 +▁Erradicación 1 +▁Excelencia 1 +▁PRESIDENTE 1 +▁exactitud 1 +▁insectos 1 +▁matemático 1 +▁microcrédito 1 +▁predecesor 1 +▁dormir 1 +▁extinción 1 +struyó 1 +▁marginación 1 +▁presidido 1 +▁Viernes 1 +▁humilla 1 +▁rehenes 1 +▁pudimos 1 +▁Multi 1 +▁1.0 1 +▁sucesor 1 +▁Jurídico 1 +▁Básicamente 1 +▁Permítaseme 1 +▁Pobreza 1 +▁Soviética 1 +▁Belgrado 1 +▁Enmienda 1 +▁supervisor 1 +▁Solicita 1 +▁Times 1 +▁emoción 1 +▁turca 1 +▁catalizador 1 +▁descubrimos 1 +▁sospechosas 1 +▁subasta 1 +▁Steve 1 +CERD 1 +▁rehabilit 1 +▁Profesor 1 +▁literatura 1 +▁remunerado 1 +▁alemanes 1 +▁escalera 1 +▁rectifica 1 +▁probado 1 +$ 1 +▁Apoyamos 1 +▁Ninguna 1 +▁pájaro 1 +▁University 1 +▁propongo 1 +▁filtro 1 +▁suplente 1 +▁quisiéramos 1 +▁deteriora 1 +▁lentamente 1 +▁turistas 1 +▁punible 1 +▁Claro 1 +▁Twitter 1 +▁reconocieron 1 +▁Jueves 1 +▁generosa 1 +▁contradicción 1 +▁Abeba 1 +▁belga 1 +▁concesiones 1 +▁estabilizar 1 +institucionales 1 +Alemania 1 +́ 1 +▁UNAMSIL 1 +▁asamblea 1 +▁encarecidamente 1 +▁progresivo 1 +▁refrigera 1 +р 1 +▁afgana 1 +Federación 1 +▁atrocidades 1 +▁placa 1 +▁accesibilidad 1 +▁apátrida 1 +▁multisectorial 1 +▁quinquenal 1 +▁reflejo 1 +▁descarta 1 +http 1 +europe 1 +atlántica 1 +▁Cercano 1 +▁Gabinete 1 +▁interactuar 1 +▁llegué 1 +▁prototipo 1 +▁referendo 1 +▁Situado 1 +▁espejo 1 +▁insumos 1 +▁Prensa 1 +▁imprevistos 1 +▁tóxicos 1 +▁comenzamos 1 +▁recesión 1 +▁construida 1 
+▁Espera 1 +▁Detallada 1 +▁aborígenes 1 +▁excombatientes 1 +▁inmunización 1 +▁arrastra 1 +▁Edición 1 +▁tecla 1 +▁Obviamente 1 +▁Service 1 +▁progresar 1 +▁prudente 1 +▁tormenta 1 +▁Tokio 1 +▁descuento 1 +▁invertido 1 +▁Court 1 +tuvimos 1 +torgamiento 1 +▁Darussalam 1 +▁introduzca 1 +▁suplementario 1 +▁Minorías 1 +▁escoger 1 +▁hereda 1 +▁mixta 1 +▁HUMANOS 1 +▁Salomón 1 +▁abstenerse 1 +▁atraviesa 1 +▁soberana 1 +▁sábado 1 +▁colonos 1 +▁tentativa 1 +▁Funciona 1 +▁vestido 1 +comunicación 1 +▁Harvard 1 +▁Tuvalu 1 +▁biotecnología 1 +▁inglesa 1 +▁restauración 1 +▁subterránea 1 +▁encomendado 1 +▁exigía 1 +▁corona 1 +▁conozco 1 +▁embajador 1 +▁táctica 1 +▁convicciones 1 +▁Amazon 1 +▁barata 1 +▁emerge 1 +ò 1 +▁relevancia 1 +▁Rumanía 1 +▁tardía 1 +▁asentamiento 1 +▁indiscriminado 1 +▁metilbromuro 1 +▁vecindad 1 +▁persigue 1 +▁Continua 1 +¿ 1 +Francia 1 +▁adulto 1 +▁sostuvo 1 +▁imperante 1 +▁giro 1 +▁salvaguardia 1 +▁hormiga 1 +▁perdieron 1 +▁restaurar 1 +▁CFC 1 +▁muerta 1 +‘ 1 +▁Agrícola 1 +▁apariencia 1 +▁escuchó 1 +▁Barroso 1 +▁adulta 1 +▁culto 1 +▁grita 1 +▁emociona 1 +▁Instrumento 1 +▁Palestino 1 +{ 1 +▁promovido 1 +▁esclavos 1 +▁impedido 1 +▁vidrio 1 +▁prospera 1 +▁medicamento 1 +oficina 1 +▁Preparatoria 1 +▁Xenofobia 1 +▁autobuses 1 +▁fortaleza 1 +▁hostigamiento 1 +▁innecesario 1 +▁insostenible 1 +▁necesito 1 +▁tranquilidad 1 +▁subvención 1 +▁jugando 1 +▁octava 1 +▁Juegos 1 +▁ligero 1 +▁esclarec 1 +▁plaga 1 +▁Deporte 1 +▁inmobiliario 1 +▁sufriendo 1 +▁truco 1 +▁recibo 1 +▁Center 1 +parlamentaria 1 +Relator 1 +} 1 +± 1 +ê 1 +č 1 +š 1 +у 1 +л 1 +Ñ 1 +к 1 +м 1 +ø 1 +â 1 +п 1 +д 1 +· 1 +ï 1 +ì 1 +î 1 +å 1 +ë 1 +Р 1 +ل 1 +я 1 +ы 1 +б 1 +َ 1 +― 1 +з 1 +ي 1 +` 1 +є 1 +г 1 +„ 1 +∗ 1 +й 1 +ь 1 +В 1 +ß 1 +ž 1 +μ 1 +Ѓ 1 +§ 1 +ù 1 +‰ 1 +< 1 +ş 1 +« 1 +ł 1 +\ 1 +õ 1 +ð 1 +ا 1 +Ö 1 +Č 1 +х 1 +û 1 +Ο 1 +ć 1 +£ 1 +ă 1 +æ 1 +α 1 +ю 1 +‹ 1 +ā 1 +‚ 1 +ė 1 +ã 1 +ę 1 +Û 1 +Ü 1 +ı 1 +~ 1 +ш 1 +Å 1 +ر 1 + 1 +• 1 +Ç 1 +ŷ 1 +ι 1 +Ž 1 +œ 1 +─ 1 +ý 1 +Ä 1 +ו 1 +Ⴗ 1 +ت 1 +ф 1 +σ 1 +ن 1 +→ 1 +ą 1 +− 1 +‡ 1 +ο 1 +τ 1 +щ 1 +э 1 +ε 1 +ب 1 +^ 1 +ğ 1 +ś 1 +ż 1 +م 1 +ה 1 +ň 1 +È 1 +ъ 1 +¦ 1 +Ş 1 +Т 1 +ő 1 +● 1 +ѓ 1 +κ 1 +د 1 +ة 1 +ÿ 1 +і 1 +Џ 1 +ν 1 +К 1 +ש 1 +▪ 1 +À 1 +و 1 +ī 1 +ĝ 1 +ō 1 +ْ 1 +ū 1 +† 1 +υ 1 +О 1 +λ 1 +И 1 +ό 1 +י 1 +ל 1 +π 1 +η 1 +Н 1 +İ 1 +І 1 +س 1 +خ 1 +ع 1 +מ 1 +ك 1 +β 1 +ח 1 +语 1 +Ì 1 +ί 1 +П 1 +ા 1 +ר 1 +А 1 +ב 1 +נ 1 +ρ 1 +ά 1 +М 1 +ŝ 1 +ŭ 1 +Ò 1 +× 1 +ת 1 +■ 1 +Ê 1 +ź 1 +ٌ 1 +¥ 1 +̊ 1 +ї 1 +Е 1 +δ 1 +ĉ 1 +ע 1 +þ 1 +Ł 1 +Ε 1 +ّ 1 +ન 1 +χ 1 +Đ 1 +ه 1 +“ 1 +ς 1 +Х 1 +פ 1 +، 1 +ح 1 +ढ 1 +ी 1 +ो 1 +′ 1 +¢ 1 +Ι 1 +ف 1 +У 1 +θ 1 +γ 1 +¬ 1 +א 1 +ط 1 +ો 1 +÷ 1 +Κ 1 +З 1 +ى 1 +ی 1 +ा 1 +ર 1 +อ 1 +ķ 1 +¤ 1 +ય 1 +ق 1 +ȣ 1 +Ф 1 +ק 1 +ص 1 +े 1 +‐ 1 +≈ 1 +○ 1 +★ 1 +北 1 +Ï 1 +ή 1 +ד 1 +ش 1 +म 1 +ી 1 +غ 1 +我 1 +个 1 +你 1 +Ч 1 +Ш 1 +Ë 1 +Α 1 +Б 1 +જ 1 +¶ 1 +Ô 1 +ų 1 +ё 1 +ם 1 +ث 1 +ं 1 +ई 1 +ड 1 +त 1 +य 1 +र 1 +ि 1 +ં 1 +ે 1 +્ 1 +ဪ 1 +◆ 1 +♲ 1 +ē 1 +आ 1 +ક 1 +ુ 1 +神 1 +ز 1 +่ 1 +牙 1 +以 1 +的 1 +妈 1 +在 1 +Л 1 +一 1 +ύ 1 +中 1 +भ 1 +海 1 +ṛ 1 +Ý 1 +ģ 1 +Ń 1 +ť 1 +ů 1 +ǎ 1 +ǵ 1 +Ν 1 +Ρ 1 +έ 1 +ξ 1 +ω 1 +ћ 1 +ג 1 +ך 1 +ן 1 +ض 1 +ڤ 1 +ग 1 +ज 1 +थ 1 +न 1 +ગ 1 +ચ 1 +દ 1 +ย 1 +ร 1 +ၝ 1 +√ 1 +≤ 1 +♦ 1 +❑ 1 +。 1 +』 1 +不 1 +与 1 +出 1 +台 1 +啊 1 +正 1 +洋 1 +ј 1 +ु 1 +સ 1 +ṣ 1 +大 1 +法 1 +西 1 +香 1 +面 1 +行 1 +游 1 +港 1 +来 1 +执 1 +地 1 +历 1 +决 1 +内 1 +仲 1 +了 1 +► 1 +∑ 1 +ၛ 1 +ષ 1 +વ 1 +લ 1 +પ 1 +ધ 1 +ડ 1 +। 1 +् 1 +ह 1 +צ 1 +כ 1 +Ц 1 +Ř 1 +Œ 1 +ĵ 1 +Ĉ 1 +有 1 +‛ 1 +क 1 +ז 1 +Є 1 +Μ 1 +ʾ 1 +ľ 1 +裁 1 +ج 1 +إ 1 +Э 1 +Д 1 +Σ 1 +़ 1 +ќ 1 +Î 1 +أ 1 +њ 1 +※ 1 +љ 1 +Ő 1 +” 1 +Ţ 1 +Ј 1 +̧ 1 +Ś 1 +♫ 1 +ţ 1 +Ć 1 +प 1 +— 1 +đ 1 +› 1 +Ŷ 1 +Ø 1 +ě 1 +؟ 1 +ř 1 +Љ 
1 +Њ 1 +⁄ 1 +ц 1 +ж 1 +Ќ 1 +̈ 1 +Ћ 1 +之 1 +制 1 +发 1 +展 1 +度 1 +ч 1 +Ђ 1 +ń 1 +互 1 +相 1 +下 1 +可 1 +唱 1 +教 1 +曲 1 +歌 1 +给 1 +φ 1 +Й 1 +京 1 +公 1 +名 1 +声 1 +多 1 +居 1 +工 1 +很 1 +心 1 +方 1 +晟 1 +結 1 +网 1 +苑 1 +બ 1 +ṭ 1 +山 1 +政 1 +果 1 +白 1 +社 1 +铭 1 +ű 1 +Ÿ 1 +ǻ 1 +₤ 1 +持 1 +­ 1 +Æ 1 +Ā 1 +Ă 1 +ċ 1 +ď 1 +į 1 +Ľ 1 +ņ 1 +ŕ 1 +Ū 1 +ƒ 1 +ǒ 1 +Β 1 +Υ 1 +Φ 1 +Ω 1 +ζ 1 +Ѕ 1 +Ж 1 +Ы 1 +Я 1 +ט 1 +ף 1 +ؤ 1 +ً 1 +ख 1 +स 1 +એ 1 +ખ 1 +ણ 1 +થ 1 +મ 1 +શ 1 +િ 1 +ธ 1 +ศ 1 +ะ 1 +ใ 1 +ၡ 1 +ၢ 1 +ṇ 1 +ẹ 1 +ọ 1 +ờ 1 +‒ 1 +⇒ 1 +⇢ 1 +∙ 1 +≥ 1 +□ 1 +◙ 1 +◦ 1 +◯ 1 +♣ 1 +『 1 +い 1 +お 1 +し 1 +も 1 +ろ 1 +世 1 +並 1 +为 1 +伯 1 +作 1 +修 1 +做 1 +农 1 +几 1 +分 1 +剥 1 +参 1 +合 1 +和 1 +咬 1 +喜 1 +嘛 1 +外 1 +太 1 +妨 1 +姐 1 +完 1 +寓 1 +局 1 +帝 1 +年 1 +建 1 +後 1 +徳 1 +情 1 +慧 1 +成 1 +所 1 +拉 1 +探 1 +星 1 +木 1 +松 1 +比 1 +燰 1 +特 1 +王 1 +甜 1 +生 1 +界 1 +眼 1 +租 1 +等 1 +紧 1 +美 1 +翻 1 +臺 1 +色 1 +茶 1 +葱 1 +藤 1 +要 1 +见 1 +视 1 +角 1 +言 1 +請 1 +设 1 +译 1 +课 1 +赠 1 +路 1 +载 1 +農 1 +连 1 +送 1 +這 1 +鏮 1 +鑚 1 +镜 1 +问 1 +阳 1 +陈 1 +院 1 +Š 1 +Ė 1 +Ķ 1 +ț 1 +ذ 1 +આ 1 +พ 1 +↕ 1 +⇕ 1 +炎 1 +Ę 1 +ļ 1 +હ 1 +三 1 +取 1 +寄 1 +甸 1 +返 1 +イ 1 +オ 1 +セ 1 +ッ 1 +デ 1 +Ў 1 +俄 1 +利 1 +德 1 +意 1 +日 1 +汉 1 +班 1 +英 1 +萄 1 +葡 1 +阿 1 +<mask> 1 diff --git a/SpeechUT/dataset/MuSTC/en_es/spm_unigram10000.model b/SpeechUT/dataset/MuSTC/en_es/spm_unigram10000.model new file mode 100644 index 0000000000000000000000000000000000000000..ac4cc9ef1c4677908cf91e8cc7dda71997e6dbf2 Binary files /dev/null and b/SpeechUT/dataset/MuSTC/en_es/spm_unigram10000.model differ diff --git a/SpeechUT/dataset/MuSTC/en_fr/config.yaml b/SpeechUT/dataset/MuSTC/en_fr/config.yaml new file mode 100644 index 0000000000000000000000000000000000000000..dce5f63011a8c33a4d12eec569fdcc91ea299f68 --- /dev/null +++ b/SpeechUT/dataset/MuSTC/en_fr/config.yaml @@ -0,0 +1,3 @@ +vocab_filename: dict.spm.txt +src_vocab_filename: dict.kmu.txt + diff --git a/SpeechUT/dataset/MuSTC/en_fr/config_enfr.yaml b/SpeechUT/dataset/MuSTC/en_fr/config_enfr.yaml new file mode 100644 index 0000000000000000000000000000000000000000..dd080a05500211cade57d80056c8ce311ce4c0c2 --- /dev/null +++ b/SpeechUT/dataset/MuSTC/en_fr/config_enfr.yaml @@ -0,0 +1,14 @@ +bpe_tokenizer: + bpe: sentencepiece + sentencepiece_model: spm_unigram10000.model + +sampling_alpha: 1.0 +shuffle: false +use_audio_input: true +use_sample_rate: 16000 + +vocab_filename: dict.spm.txt + +# required by speech_to_text task but never used +input_channels: 1 +input_feat_per_channel: 1 diff --git a/SpeechUT/dataset/MuSTC/en_fr/dict.kmu.txt b/SpeechUT/dataset/MuSTC/en_fr/dict.kmu.txt new file mode 100644 index 0000000000000000000000000000000000000000..bbfe59e554d6234f3631d8d09d9281c2160f4675 --- /dev/null +++ b/SpeechUT/dataset/MuSTC/en_fr/dict.kmu.txt @@ -0,0 +1,500 @@ +0 0 +1 1 +2 2 +3 3 +4 4 +5 5 +6 6 +7 7 +8 8 +9 9 +10 10 +11 11 +12 12 +13 13 +14 14 +15 15 +16 16 +17 17 +18 18 +19 19 +20 20 +21 21 +22 22 +23 23 +24 24 +25 25 +26 26 +27 27 +28 28 +29 29 +30 30 +31 31 +32 32 +33 33 +34 34 +35 35 +36 36 +37 37 +38 38 +39 39 +40 40 +41 41 +42 42 +43 43 +44 44 +45 45 +46 46 +47 47 +48 48 +49 49 +50 50 +51 51 +52 52 +53 53 +54 54 +55 55 +56 56 +57 57 +58 58 +59 59 +60 60 +61 61 +62 62 +63 63 +64 64 +65 65 +66 66 +67 67 +68 68 +69 69 +70 70 +71 71 +72 72 +73 73 +74 74 +75 75 +76 76 +77 77 +78 78 +79 79 +80 80 +81 81 +82 82 +83 83 +84 84 +85 85 +86 86 +87 87 +88 88 +89 89 +90 90 +91 91 +92 92 +93 93 +94 94 +95 95 +96 96 +97 97 +98 98 +99 99 +100 100 +101 101 +102 102 +103 103 +104 104 +105 105 +106 106 +107 107 +108 108 +109 109 +110 110 +111 111 +112 112 +113 113 +114 114 +115 115 +116 116 +117 117 +118 
118 +119 119 +120 120 +121 121 +122 122 +123 123 +124 124 +125 125 +126 126 +127 127 +128 128 +129 129 +130 130 +131 131 +132 132 +133 133 +134 134 +135 135 +136 136 +137 137 +138 138 +139 139 +140 140 +141 141 +142 142 +143 143 +144 144 +145 145 +146 146 +147 147 +148 148 +149 149 +150 150 +151 151 +152 152 +153 153 +154 154 +155 155 +156 156 +157 157 +158 158 +159 159 +160 160 +161 161 +162 162 +163 163 +164 164 +165 165 +166 166 +167 167 +168 168 +169 169 +170 170 +171 171 +172 172 +173 173 +174 174 +175 175 +176 176 +177 177 +178 178 +179 179 +180 180 +181 181 +182 182 +183 183 +184 184 +185 185 +186 186 +187 187 +188 188 +189 189 +190 190 +191 191 +192 192 +193 193 +194 194 +195 195 +196 196 +197 197 +198 198 +199 199 +200 200 +201 201 +202 202 +203 203 +204 204 +205 205 +206 206 +207 207 +208 208 +209 209 +210 210 +211 211 +212 212 +213 213 +214 214 +215 215 +216 216 +217 217 +218 218 +219 219 +220 220 +221 221 +222 222 +223 223 +224 224 +225 225 +226 226 +227 227 +228 228 +229 229 +230 230 +231 231 +232 232 +233 233 +234 234 +235 235 +236 236 +237 237 +238 238 +239 239 +240 240 +241 241 +242 242 +243 243 +244 244 +245 245 +246 246 +247 247 +248 248 +249 249 +250 250 +251 251 +252 252 +253 253 +254 254 +255 255 +256 256 +257 257 +258 258 +259 259 +260 260 +261 261 +262 262 +263 263 +264 264 +265 265 +266 266 +267 267 +268 268 +269 269 +270 270 +271 271 +272 272 +273 273 +274 274 +275 275 +276 276 +277 277 +278 278 +279 279 +280 280 +281 281 +282 282 +283 283 +284 284 +285 285 +286 286 +287 287 +288 288 +289 289 +290 290 +291 291 +292 292 +293 293 +294 294 +295 295 +296 296 +297 297 +298 298 +299 299 +300 300 +301 301 +302 302 +303 303 +304 304 +305 305 +306 306 +307 307 +308 308 +309 309 +310 310 +311 311 +312 312 +313 313 +314 314 +315 315 +316 316 +317 317 +318 318 +319 319 +320 320 +321 321 +322 322 +323 323 +324 324 +325 325 +326 326 +327 327 +328 328 +329 329 +330 330 +331 331 +332 332 +333 333 +334 334 +335 335 +336 336 +337 337 +338 338 +339 339 +340 340 +341 341 +342 342 +343 343 +344 344 +345 345 +346 346 +347 347 +348 348 +349 349 +350 350 +351 351 +352 352 +353 353 +354 354 +355 355 +356 356 +357 357 +358 358 +359 359 +360 360 +361 361 +362 362 +363 363 +364 364 +365 365 +366 366 +367 367 +368 368 +369 369 +370 370 +371 371 +372 372 +373 373 +374 374 +375 375 +376 376 +377 377 +378 378 +379 379 +380 380 +381 381 +382 382 +383 383 +384 384 +385 385 +386 386 +387 387 +388 388 +389 389 +390 390 +391 391 +392 392 +393 393 +394 394 +395 395 +396 396 +397 397 +398 398 +399 399 +400 400 +401 401 +402 402 +403 403 +404 404 +405 405 +406 406 +407 407 +408 408 +409 409 +410 410 +411 411 +412 412 +413 413 +414 414 +415 415 +416 416 +417 417 +418 418 +419 419 +420 420 +421 421 +422 422 +423 423 +424 424 +425 425 +426 426 +427 427 +428 428 +429 429 +430 430 +431 431 +432 432 +433 433 +434 434 +435 435 +436 436 +437 437 +438 438 +439 439 +440 440 +441 441 +442 442 +443 443 +444 444 +445 445 +446 446 +447 447 +448 448 +449 449 +450 450 +451 451 +452 452 +453 453 +454 454 +455 455 +456 456 +457 457 +458 458 +459 459 +460 460 +461 461 +462 462 +463 463 +464 464 +465 465 +466 466 +467 467 +468 468 +469 469 +470 470 +471 471 +472 472 +473 473 +474 474 +475 475 +476 476 +477 477 +478 478 +479 479 +480 480 +481 481 +482 482 +483 483 +484 484 +485 485 +486 486 +487 487 +488 488 +489 489 +490 490 +491 491 +492 492 +493 493 +494 494 +495 495 +496 496 +497 497 +498 498 +499 499 diff --git a/SpeechUT/dataset/MuSTC/en_fr/dict.spm.txt b/SpeechUT/dataset/MuSTC/en_fr/dict.spm.txt new file mode 100644 
index 0000000000000000000000000000000000000000..db33a0589715d19a009ac803a74623e7aa436f39 --- /dev/null +++ b/SpeechUT/dataset/MuSTC/en_fr/dict.spm.txt @@ -0,0 +1,9997 @@ +▁de 1 +, 1 +' 1 +s 1 +. 1 +▁la 1 +▁et 1 +▁l 1 +▁des 1 +▁les 1 +▁à 1 +▁d 1 +▁le 1 +▁ 1 +▁du 1 +▁en 1 +’ 1 +▁que 1 +e 1 +- 1 +▁pour 1 +▁dans 1 +▁a 1 +▁un 1 +▁sur 1 +▁au 1 +▁qui 1 +▁une 1 +▁par 1 +▁( 1 +▁est 1 +) 1 +es 1 +▁aux 1 +▁ou 1 +▁qu 1 +▁pas 1 +▁ce 1 +r 1 +▁plus 1 +▁Le 1 +▁s 1 +▁sont 1 +a 1 +est 1 +er 1 +▁n 1 +▁ne 1 +un 1 +▁nous 1 +▁: 1 +▁avec 1 +▁ont 1 +il 1 +▁Les 1 +▁La 1 +▁L 1 +▁se 1 +▁été 1 +/ 1 +▁vous 1 +une 1 +▁il 1 +▁C 1 +ment 1 +▁c 1 +; 1 +▁être 1 +nt 1 +▁Il 1 +▁ces 1 +▁cette 1 +é 1 +▁leur 1 +t 1 +on 1 +▁comme 1 +▁fait 1 +▁Canada 1 +ant 1 +▁« 1 +▁pays 1 +▁M 1 +▁son 1 +en 1 +▁je 1 +y 1 +▁sa 1 +▁- 1 +▁y 1 +▁En 1 +▁si 1 +▁S 1 +▁A 1 +▁peut 1 +S 1 +▁entre 1 +▁faire 1 +▁• 1 +▁1 1 +▁mais 1 +A 1 +ent 1 +▁tout 1 +▁développement 1 +c 1 +▁D 1 +▁tous 1 +ai 1 +▁Et 1 +▁ses 1 +ait 1 +: 1 +d 1 +▁rapport 1 +▁Conseil 1 +▁j 1 +o 1 +▁travail 1 +▁non 1 +▁États 1 +▁même 1 +ci 1 +▁B 1 +▁droits 1 +▁leurs 1 +▁deux 1 +▁on 1 +re 1 +▁ainsi 1 +). 1 +▁aussi 1 +▁Je 1 +ons 1 +▁services 1 +▁Commission 1 +in 1 +▁Nous 1 +▁bien 1 +C 1 +ation 1 +▁également 1 +▁cas 1 +z 1 +autres 1 +▁F 1 +▁in 1 +▁» 1 +u 1 +n 1 +▁dont 1 +i 1 +▁p 1 +▁? 1 +), 1 +▁personnes 1 +▁Comité 1 +▁sous 1 +▁sécurité 1 +▁membres 1 +▁T 1 +▁J 1 +l 1 +▁P 1 +le 1 +▁projet 1 +▁où 1 +▁droit 1 +▁santé 1 +▁autres 1 +P 1 +▁2 1 +f 1 +al 1 +▁était 1 +▁doit 1 +▁très 1 +▁contre 1 +▁partie 1 +m 1 +ur 1 +▁Nations 1 +D 1 +ra 1 +▁% 1 +▁" 1 +▁3 1 +▁R 1 +M 1 +an 1 +▁Dans 1 +h 1 +▁programme 1 +▁cadre 1 +▁cours 1 +▁soit 1 +▁dé 1 +▁Unies 1 +▁sans 1 +B 1 +article 1 +elle 1 +▁notre 1 +▁elle 1 +▁re 1 +▁m 1 +▁général 1 +▁mesures 1 +▁G 1 +▁compte 1 +▁avons 1 +p 1 +© 1 +ils 1 +▁activités 1 +▁ré 1 +▁demande 1 +▁ans 1 +és 1 +▁afin 1 +▁question 1 +▁système 1 +▁nombre 1 +▁données 1 +E 1 +à 1 +or 1 +▁5 1 +ée 1 +b 1 +L 1 +▁mise 1 +▁toutes 1 +▁peuvent 1 +▁Ce 1 +it 1 +ont 1 +▁b 1 +▁monde 1 +▁femmes 1 +ez 1 +▁4 1 +aient 1 +▁Pour 1 +▁— 1 +▁matière 1 +▁politique 1 +▁point 1 +▁moins 1 +information 1 +ou 1 +être 1 +▁vie 1 +▁recherche 1 +▁10 1 +▁E 1 +▁ils 1 +▁enfants 1 +▁jour 1 +▁votre 1 +▁On 1 +▁temps 1 +O 1 +▁gestion 1 +▁avait 1 +▁De 1 +▁19 1 +▁concernant 1 +▁questions 1 +T 1 +N 1 +▁À 1 +▁gouvernement 1 +▁façon 1 +us 1 +▁Si 1 +v 1 +▁ressources 1 +▁Mais 1 +▁nos 1 +▁générale 1 +▁niveau 1 +F 1 +H 1 +à 1 +▁autre 1 +R 1 +▁produits 1 +▁vue 1 +▁doivent 1 +▁donc 1 +▁6 1 +▁programmes 1 +G 1 +▁lui 1 +▁processus 1 +au 1 +▁internationale 1 +ne 1 +▁protection 1 +▁N 1 +▁avoir 1 +▁politiques 1 +▁toute 1 +▁notamment 1 +ce 1 +▁plan 1 +▁ça 1 +▁fois 1 +▁cet 1 +▁dit 1 +2 1 +] 1 +▁Un 1 +▁me 1 +homme 1 +▁encore 1 +▁cela 1 +is 1 +▁– 1 +▁international 1 +▁chaque 1 +▁mon 1 +▁paragraphe 1 +aux 1 +▁ma 1 +▁session 1 +▁économique 1 +g 1 +▁[ 1 +▁K 1 +▁sera 1 +▁européenne 1 +▁avant 1 +▁H 1 +▁résolution 1 +▁après 1 +ité 1 +1 1 +▁7 1 +at 1 +▁Par 1 +▁trois 1 +ar 1 +▁années 1 +age 1 +ir 1 +eau 1 +te 1 +▁fin 1 +ement 1 +▁base 1 +▁situation 1 +▁personne 1 +▁lieu 1 +État 1 +ées 1 +▁Convention 1 +▁Une 1 +▁pro 1 +▁résultats 1 +k 1 +▁Cette 1 +▁20 1 +▁période 1 +▁groupe 1 +▁gens 1 +▁loi 1 +▁place 1 +▁8 1 +▁décision 1 +▁9 1 +▁selon 1 +▁mesure 1 +▁I 1 +▁Au 1 +3 1 +id 1 +▁beaucoup 1 +▁formation 1 +▁15 1 +▁exemple 1 +▁peu 1 +ance 1 +▁présent 1 +té 1 +ca 1 +▁h 1 +▁titre 1 +▁V 1 +▁secteur 1 +▁manière 1 +▁œuvre 1 +▁tant 1 +ing 1 +▁renseignements 1 +ais 1 +▁particulier 1 +▁Elle 1 +? 
1 +▁première 1 +action 1 +▁000 1 +ie 1 +▁pré 1 +" 1 +▁of 1 +▁part 1 +▁coopération 1 +▁prendre 1 +▁car 1 +▁pendant 1 +▁mettre 1 +▁public 1 +▁W 1 +▁devrait 1 +▁certains 1 +Assemblée 1 +▁compris 1 +▁Président 1 +ch 1 +▁g 1 +▁Vous 1 +Union 1 +▁étaient 1 +application 1 +▁contrôle 1 +▁inter 1 +▁suis 1 +▁Ces 1 +▁Loi 1 +▁société 1 +▁12 1 +▁rôle 1 +▁service 1 +▁depuis 1 +▁grande 1 +▁qualité 1 +▁ceux 1 +▁national 1 +( 1 +▁nouveau 1 +▁parce 1 +▁’ 1 +évaluation 1 +▁marché 1 +▁plusieurs 1 +ut 1 +▁voir 1 +▁Groupe 1 +▁personnel 1 +4 1 +▁faut 1 +▁dollars 1 +▁grand 1 +les 1 +▁domaine 1 +▁Des 1 +▁seulement 1 +▁dispositions 1 +▁f 1 +▁chose 1 +Unis 1 +V 1 +▁con 1 +▁quand 1 +▁jusqu 1 +▁parties 1 +▁$ 1 +la 1 +▁conditions 1 +aire 1 +▁besoin 1 +ad 1 +était 1 +▁projets 1 +▁savoir 1 +de 1 +▁publique 1 +▁18 1 +ro 1 +▁millions 1 +ex 1 +▁nom 1 +▁eu 1 +0 1 +um 1 +el 1 +aide 1 +▁entreprises 1 +▁important 1 +hui 1 +▁vers 1 +▁fonction 1 +▁premier 1 +▁possible 1 +▁laquelle 1 +▁11 1 +ter 1 +▁nationale 1 +▁population 1 +▁Secrétaire 1 +▁document 1 +▁30 1 +▁région 1 +ine 1 +▁nouvelle 1 +▁pourrait 1 +able 1 +et 1 +▁objectifs 1 +▁Re 1 +▁moment 1 +00 1 +im 1 +▁présente 1 +▁problèmes 1 +U 1 +▁ci 1 +6 1 +▁O 1 +ri 1 +elles 1 +environnement 1 +▁mois 1 +ac 1 +▁alors 1 +▁groupes 1 +▁travaux 1 +▁taux 1 +▁13 1 +emploi 1 +5 1 +di 1 +▁va 1 +▁Programme 1 +am 1 +▁lors 1 +▁conformément 1 +▁organisations 1 +oc 1 +Organisation 1 +▁nouvelles 1 +▁paix 1 +▁sujet 1 +▁déjà 1 +▁ex 1 +▁concerne 1 +▁seront 1 +x 1 +▁raison 1 +th 1 +▁organismes 1 +▁date 1 +7 1 +w 1 +que 1 +▁co 1 +▁soient 1 +▁Ils 1 +▁toujours 1 +▁> 1 +examen 1 +utilisation 1 +li 1 +▁vertu 1 +▁produit 1 +▁dire 1 +▁visant 1 +▁sommes 1 +▁problème 1 +▁site 1 +▁DE 1 +▁elles 1 +▁règlement 1 +▁mis 1 +▁moyen 1 +▁quelque 1 +eur 1 +▁14 1 +▁ayant 1 +op 1 +▁informations 1 +ul 1 +ions 1 +▁efforts 1 +▁and 1 +▁r 1 +▁décembre 1 +▁nouveaux 1 +▁suite 1 +autre 1 +ins 1 +me 1 +I 1 +▁the 1 +▁financement 1 +éducation 1 +É 1 +par 1 +▁besoins 1 +▁pouvoir 1 +▁donné 1 +▁é 1 +▁chez 1 +ton 1 +▁celui 1 +▁production 1 +▁comment 1 +▁certaines 1 +▁Bureau 1 +▁celle 1 +▁In 1 +▁terme 1 +▁année 1 +ale 1 +▁serait 1 +ta 1 +ol 1 +▁relatives 1 +▁documents 1 +ion 1 +▁vraiment 1 +▁mondiale 1 +Rires 1 +▁tenu 1 +ordre 1 +▁effet 1 +▁forme 1 +▁; 1 +▁transport 1 +▁pris 1 +ot 1 +▁participation 1 +che 1 +▁là 1 +▁maintenant 1 +▁canadienne 1 +▁choses 1 +▁juin 1 +▁risque 1 +ique 1 +▁durable 1 +▁quelques 1 +▁prix 1 +administration 1 +▁16 1 +ill 1 +.1 1 +art 1 +rait 1 +▁Gouvernement 1 +▁valeur 1 +ré 1 +▁canadien 1 +▁soutien 1 +▁ni 1 +▁nécessaire 1 +▁République 1 +▁lorsque 1 +▁2005 1 +▁outre 1 +om 1 +eux 1 +9 1 +▁font 1 +▁mieux 1 +▁type 1 +▁2006 1 +▁traitement 1 +▁respect 1 +▁donner 1 +» 1 +▁jeunes 1 +accès 1 +ensemble 1 +eurs 1 +▁long 1 +ér 1 +ront 1 +▁17 1 +ti 1 +▁rapports 1 +▁communauté 1 +▁2007 1 +▁cause 1 +après 1 +lo 1 +▁pouvez 1 +▁fonds 1 +▁social 1 +▁v 1 +▁dis 1 +▁ici 1 +ner 1 +▁devraient 1 +con 1 +agit 1 +▁institutions 1 +▁sein 1 +▁Parlement 1 +▁européen 1 +8 1 +▁commerce 1 +ig 1 +qu 1 +ence 1 +▁Con 1 +▁note 1 +▁heures 1 +▁avaient 1 +ic 1 +▁Mme 1 +▁no 1 +è 1 +▁aujourd 1 +▁moi 1 +ay 1 +▁capacité 1 +▁Cour 1 +ier 1 +▁0 1 +▁Conférence 1 +▁mars 1 +▁mandat 1 +▁dépenses 1 +▁septembre 1 +▁100 1 +▁liste 1 +▁2004 1 +os 1 +▁réunion 1 +▁ministre 1 +▁montant 1 +▁ch 1 +ES 1 +if 1 +▁Ministère 1 +▁but 1 +industrie 1 +▁demandé 1 +▁création 1 +▁créer 1 +ve 1 +▁affaires 1 +▁budget 1 +AC 1 +▁nécessaires 1 +▁mai 1 +Europe 1 +▁po 1 +▁spécial 1 +▁fournir 1 +année 1 +▁procédure 1 +ure 1 +▁quatre 1 +▁systèmes 1 +▁to 1 +▁avez 1 +.2 1 +▁sens 1 +min 1 +▁É 1 
+accord 1 +▁internationales 1 +▁jours 1 +▁auprès 1 +▁souvent 1 +▁sociale 1 +▁sorte 1 +▁famille 1 +▁25 1 +▁recommandations 1 +▁étant 1 +vi 1 +▁normes 1 +▁éléments 1 +▁renforcer 1 +▁pratiques 1 +▁différents 1 +▁juste 1 +▁technique 1 +ag 1 +▁cinq 1 +▁lutte 1 +ants 1 +▁permis 1 +▁celles 1 +isation 1 +▁croissance 1 +dé 1 +▁déclaration 1 +exercice 1 +▁moyens 1 +▁Donc 1 +co 1 +▁ra 1 +▁trouver 1 +▁plupart 1 +▁juillet 1 +▁communication 1 +▁Cela 1 +▁bon 1 +▁soins 1 +voir 1 +K 1 +▁début 1 +aires 1 +ard 1 +ia 1 +lé 1 +ab 1 +▁fins 1 +ive 1 +▁pense 1 +▁pratique 1 +▁permet 1 +▁vos 1 +▁Rapport 1 +objet 1 +tant 1 +man 1 +▁jamais 1 +▁domaines 1 +▁suivi 1 +ed 1 +▁progrès 1 +pos 1 +to 1 +▁décisions 1 +com 1 +iv 1 +▁2008 1 +ici 1 +UE 1 +ONU 1 +J 1 +▁indiqué 1 +▁technologie 1 +▁deuxième 1 +▁2003 1 +▁21 1 +▁autorités 1 +▁partir 1 +▁24 1 +▁ligne 1 +▁Ma 1 +▁économiques 1 +▁Comme 1 +▁novembre 1 +▁proposition 1 +▁nombreux 1 +▁Centre 1 +▁haut 1 +end 1 +▁janvier 1 +▁appel 1 +élaboration 1 +▁bonne 1 +W 1 +▁2002 1 +▁eux 1 +sion 1 +entre 1 +▁». 1 +na 1 +ère 1 +sse 1 +▁régions 1 +ge 1 +▁responsabilité 1 +▁promotion 1 +appui 1 +▁aider 1 +▁accord 1 +▁opérations 1 +▁assurer 1 +▁règles 1 +▁membre 1 +_ 1 +▁aucune 1 +vous 1 +▁avril 1 +▁31 1 +▁effets 1 +▁Nord 1 +▁octobre 1 +▁adopté 1 +▁dernier 1 +Office 1 +▁reçu 1 +▁violence 1 +.3 1 +▁total 1 +▁vol 1 +CE 1 +▁référence 1 +▁stratégie 1 +▁charge 1 +▁représentants 1 +▁devant 1 +mo 1 +▁déterminer 1 +▁nature 1 +▁ministère 1 +▁réponse 1 +▁différentes 1 +énergie 1 +son 1 +▁justice 1 +% 1 +▁mes 1 +▁techniques 1 +ap 1 +▁grâce 1 +▁représentant 1 +▁an 1 +assurer 1 +▁fond 1 +as 1 +▁à 1 +▁termes 1 +▁permettre 1 +nes 1 +▁comp 1 +all 1 +▁relative 1 +▁prises 1 +ign 1 +▁», 1 +▁pu 1 +▁lorsqu 1 +▁autochtones 1 +▁environ 1 +▁2000 1 +▁Ch 1 +od 1 +▁nationales 1 +▁nationaux 1 +▁hommes 1 +▁points 1 +▁changement 1 +▁coût 1 +▁principes 1 +mi 1 +air 1 +égalité 1 +▁section 1 +▁vigueur 1 +▁relations 1 +▁reste 1 +... 
1 +intérêt 1 +▁revenu 1 +▁50 1 +▁coûts 1 +▁changements 1 +AN 1 +▁mondial 1 +▁aide 1 +▁face 1 +▁frais 1 +origine 1 +tique 1 +▁devons 1 +▁législation 1 +▁risques 1 +exécution 1 +▁propre 1 +importance 1 +exploitation 1 +▁22 1 +▁poste 1 +▁2001 1 +▁certain 1 +tra 1 +ub 1 +▁améliorer 1 +avoir 1 +égard 1 +établissement 1 +▁tels 1 +▁passé 1 +▁civile 1 +tes 1 +int 1 +▁gouvernements 1 +IS 1 +iste 1 +▁Règlement 1 +▁Tribunal 1 +▁délégation 1 +va 1 +▁commun 1 +▁pourquoi 1 +▁séance 1 +▁aucun 1 +ue 1 +▁rendre 1 +▁t 1 +ables 1 +▁promouvoir 1 +▁efficace 1 +▁divers 1 +▁près 1 +▁internationaux 1 +▁débat 1 +gu 1 +ateurs 1 +▁pourraient 1 +▁propriété 1 +▁lequel 1 +du 1 +▁régime 1 +▁avis 1 +▁per 1 +per 1 +▁dernière 1 +urs 1 +état 1 +▁importante 1 +▁procédures 1 +▁ensemble 1 +▁puis 1 +▁réseau 1 +▁discrimination 1 +▁financière 1 +▁di 1 +▁position 1 +.4 1 +istes 1 +▁réduire 1 +lu 1 +▁plutôt 1 +▁travers 1 +▁tel 1 +ER 1 +▁traité 1 +ite 1 +▁Web 1 +); 1 +ations 1 +▁participants 1 +▁centre 1 +▁No 1 +organisation 1 +ver 1 +▁sûr 1 +dic 1 +▁employés 1 +▁direction 1 +pl 1 +▁donne 1 +Y 1 +pe 1 +▁actuellement 1 +▁égard 1 +▁for 1 +▁principe 1 +ma 1 +▁président 1 +▁niveaux 1 +▁23 1 +▁St 1 +ob 1 +▁́ 1 +▁prise 1 +▁comprendre 1 +▁Pro 1 +▁Se 1 +▁approche 1 +▁obtenir 1 +▁contexte 1 +res 1 +tre 1 +IC 1 +▁examiné 1 +▁Bien 1 +▁seule 1 +▁partenaires 1 +▁pouvons 1 +▁texte 1 +▁publié 1 +▁Alors 1 +ateur 1 +be 1 +iques 1 +ante 1 +and 1 +▁présenter 1 +▁coordination 1 +▁contenu 1 +▁réduction 1 +▁culture 1 +▁Sa 1 +▁trouve 1 +▁créé 1 +▁collaboration 1 +▁fa 1 +▁territoire 1 +▁utiliser 1 +▁nécessité 1 +▁études 1 +▁juridique 1 +tu 1 +▁décidé 1 +isme 1 +▁Br 1 +org 1 +our 1 +lement 1 +activité 1 +30 1 +▁porte 1 +▁Canadiens 1 +▁matériel 1 +▁disposition 1 +▁sociaux 1 +▁fédéral 1 +▁existe 1 +ction 1 +analyse 1 +▁examen 1 +▁vu 1 +▁prévention 1 +ph 1 +▁milieu 1 +▁surveillance 1 +▁Dé 1 +adoption 1 +ssent 1 +▁soumis 1 +▁répondre 1 +▁bureau 1 +né 1 +ification 1 +▁telle 1 +50 1 +▁trop 1 +▁modèle 1 +▁capacités 1 +gi 1 +▁preuve 1 +▁secteurs 1 +euse 1 +appel 1 +▁oeuvre 1 +▁moyenne 1 +▁réalisation 1 +ist 1 +▁structure 1 +▁demandes 1 +▁présenté 1 +▁seul 1 +▁Fonds 1 +▁dernières 1 +▁exigences 1 +▁marchés 1 +enseignement 1 +▁État 1 +Agence 1 +▁tenir 1 +▁possibilité 1 +▁maintien 1 +▁davantage 1 +efficacité 1 +▁proposé 1 +abord 1 +▁participer 1 +▁déc 1 +▁Toutefois 1 +▁faveur 1 +IN 1 +▁pêche 1 +ille 1 +ticulièrement 1 +▁recours 1 +él 1 +▁initiatives 1 +▁police 1 +▁tard 1 +▁février 1 +▁propos 1 +pa 1 +économie 1 +▁ad 1 +▁guerre 1 +▁conférence 1 +assurance 1 +ga 1 +▁zone 1 +▁1999 1 +▁technologies 1 +▁durant 1 +avis 1 +▁mer 1 +:// 1 +annexe 1 +▁page 1 +go 1 +▁Ro 1 +▁suivants 1 +nd 1 +ON 1 +▁ville 1 +▁établi 1 +ten 1 +▁ailleurs 1 +▁simple 1 +vis 1 +▁compétences 1 +20 1 +▁mission 1 +yn 1 +▁nombreuses 1 +▁voie 1 +mé 1 +ois 1 +▁réserve 1 +▁conseil 1 +▁zones 1 +qui 1 +▁déclaré 1 +▁tu 1 +▁sol 1 +▁e 1 +auteur 1 +ord 1 +▁accès 1 +▁al 1 +ang 1 +▁telles 1 +ho 1 +▁vérification 1 +▁difficile 1 +iers 1 +do 1 +▁Sud 1 +ette 1 +▁quant 1 +▁numéro 1 +▁pauvreté 1 +15 1 +▁28 1 +▁large 1 +▁publics 1 +/2 1 +▁femme 1 +▁petit 1 +▁commencé 1 +▁semble 1 +▁mort 1 +▁six 1 +▁constitue 1 +igne 1 +Afrique 1 +▁vais 1 +tion 1 +ible 1 +serv 1 +▁mal 1 +même 1 +° 1 +▁comité 1 +use 1 +▁Santé 1 +▁citoyens 1 +▁choix 1 +▁parler 1 +▁définition 1 +▁26 1 +▁genre 1 +www 1 +▁principaux 1 += 1 +Uni 1 +▁suivant 1 +▁armes 1 +13 1 +▁transfert 1 +▁! 
1 +▁durée 1 +▁humaines 1 +ry 1 +▁privé 1 +▁condition 1 +▁II 1 +▁méthodes 1 +▁Monsieur 1 +▁Tout 1 +▁série 1 +▁tenue 1 +▁état 1 +▁directement 1 +▁gouvernementale 1 +hi 1 +▁Al 1 +-1 1 +▁40 1 +▁continuer 1 +▁application 1 +▁quoi 1 +▁27 1 +▁mé 1 +▁directive 1 +dire 1 +▁travailleurs 1 +▁valeurs 1 +▁fonctions 1 +tro 1 +étude 1 +ise 1 +ct 1 +▁i 1 +▁rien 1 +▁directeur 1 +▁marchandises 1 +▁diverses 1 +▁victimes 1 +11 1 +gc 1 +▁ensuite 1 +▁continue 1 +ienne 1 +▁rec 1 +da 1 +EN 1 +▁protéger 1 +▁op 1 +▁parmi 1 +▁sociétés 1 +▁langue 1 +▁passe 1 +Applaudissements 1 +form 1 +10 1 +▁aurait 1 +j 1 +▁Voici 1 +▁plans 1 +▁contribution 1 +▁Y 1 +▁o 1 +▁communautaire 1 +▁rendement 1 +▁financières 1 +▁demander 1 +▁trans 1 +port 1 +▁offre 1 +enfant 1 +Ontario 1 +▁travailler 1 +▁comprend 1 +▁Sel 1 +▁réglementation 1 +▁articles 1 +tation 1 +▁liberté 1 +▁défense 1 +▁construction 1 +▁faisant 1 +ég 1 +▁lois 1 +ib 1 +intégration 1 +▁br 1 +▁fédérale 1 +▁disponibles 1 +▁Secrétariat 1 +mar 1 +▁permanent 1 +▁statut 1 +▁Parties 1 +▁principal 1 +val 1 +rie 1 +▁aspects 1 +▁critères 1 +ec 1 +▁renforcement 1 +▁financiers 1 +▁utilisé 1 +av 1 +▁facteurs 1 +érer 1 +bi 1 +nel 1 +▁étude 1 +▁commission 1 +▁New 1 +intention 1 +▁classe 1 +entreprise 1 +▁connaissances 1 +▁réunions 1 +▁contact 1 +▁priorités 1 +▁planification 1 +ç 1 +lor 1 +▁biens 1 +▁solution 1 +▁rendu 1 +▁obligations 1 +▁main 1 +▁rend 1 +▁contribuer 1 +ales 1 +▁gaz 1 +investissement 1 +ven 1 +▁salle 1 +▁contrat 1 +▁imp 1 +urgence 1 +▁consultatif 1 +▁permettant 1 +▁anti 1 +▁Québec 1 +▁août 1 +▁élevé 1 +▁fer 1 +▁http 1 +X 1 +AT 1 +io 1 +▁centrale 1 +▁côté 1 +ide 1 +▁Z 1 +▁conséquent 1 +ate 1 +▁X 1 +▁crise 1 +ux 1 +▁cent 1 +▁langues 1 +▁propositions 1 +expérience 1 +dessus 1 +ger 1 +▁Mo 1 +AR 1 +▁canadiennes 1 +je 1 +▁Ré 1 +▁propres 1 +▁suivantes 1 +▁électronique 1 +▁responsable 1 +CI 1 +▁sélection 1 +▁Co 1 +▁version 1 +▁Inc 1 +▁faciliter 1 +▁relatifs 1 +▁stratégique 1 +ét 1 +▁succès 1 +objectif 1 +▁Déclaration 1 +▁assez 1 +▁60 1 +avais 1 +▁terrorisme 1 +▁Cependant 1 +▁peine 1 +▁rapidement 1 +▁démocratique 1 +▁annuel 1 +▁présentation 1 +▁Plan 1 +12 1 +▁postes 1 +ali 1 +ives 1 +▁tour 1 +▁portant 1 +▁écrit 1 +▁clients 1 +▁Mar 1 +▁Internet 1 +▁grandes 1 +▁force 1 +▁faible 1 +▁passer 1 +▁Membres 1 +▁Services 1 +▁Europe 1 +▁Qu 1 +ow 1 +▁surtout 1 +▁29 1 +▁mot 1 +cul 1 +▁régional 1 +▁Article 1 +spect 1 +▁chargé 1 +▁mécanismes 1 +▁financier 1 +no 1 +gé 1 +st 1 +EC 1 +▁veux 1 +cour 1 +▁fl 1 +ens 1 +assistance 1 +▁estime 1 +▁Code 1 +▁unique 1 +enquête 1 +rons 1 +▁trait 1 +▁communautés 1 +▁conclu 1 +▁supplémentaires 1 +▁manque 1 +essai 1 +▁Protocole 1 +lic 1 +▁maison 1 +▁vote 1 +▁chapitre 1 +▁sources 1 +▁chacun 1 +em 1 +▁rapide 1 +échelle 1 +▁caractère 1 +▁U 1 +▁provisoire 1 +▁entreprise 1 +▁meilleure 1 +▁médias 1 +14 1 +ha 1 +▁Car 1 +▁crois 1 +▁prend 1 +alité 1 +▁liés 1 +IT 1 +▁troisième 1 +tent 1 +éri 1 +▁dialogue 1 +▁/ 1 +▁humains 1 +▁chef 1 +▁aura 1 +▁relatif 1 +▁local 1 +▁fr 1 +uit 1 +land 1 +▁◦ 1 +▁importants 1 +▁voyage 1 +▁liées 1 +▁suit 1 +enregistrement 1 +oy 1 +▁Sur 1 +tr 1 +AL 1 +intérieur 1 +▁Service 1 +▁bas 1 +▁Chine 1 +tribu 1 +absence 1 +up 1 +▁terrain 1 +▁code 1 +Amérique 1 +pé 1 +isse 1 +avait 1 +▁coup 1 +▁possibilités 1 +▁fonctionnement 1 +▁dés 1 +▁instruments 1 +▁secrétariat 1 +ern 1 +▁contributions 1 +▁1, 1 +▁ait 1 +étais 1 +nelle 1 +▁VIH 1 +▁ET 1 +▁responsables 1 +▁conflit 1 +▁prévoit 1 +▁présence 1 +▁conflits 1 +▁LA 1 +ire 1 +▁milliards 1 +lant 1 +25 1 +▁méthode 1 +ly 1 +tri 1 +▁Quand 1 +▁Ministre 1 +▁ministères 1 +améliorer 1 +ris 1 +▁visite 1 +▁port 1 +▁juge 1 
+▁DES 1 +iser 1 +▁organes 1 +▁avantages 1 +▁vise 1 +vers 1 +▁intérieur 1 +tiques 1 +▁êtes 1 +▁libre 1 +nous 1 +oul 1 +entrée 1 +▁espèces 1 +avenir 1 +histoire 1 +▁réforme 1 +ché 1 +ments 1 +▁particulière 1 +▁garantir 1 +.5 1 +ug 1 +▁maladie 1 +▁1998 1 +▁dès 1 +▁intérêts 1 +▁Ex 1 +OR 1 +ière 1 +▁anglais 1 +▁comportement 1 +▁forces 1 +br 1 +40 1 +cher 1 +innovation 1 +▁puisse 1 +▁tiers 1 +âge 1 +▁convient 1 +▁vente 1 +▁clairement 1 +▁évaluation 1 +▁actuelle 1 +▁modifications 1 +▁convention 1 +der 1 +▁délai 1 +▁source 1 +gue 1 +▁formes 1 +act 1 +▁Après 1 +▁nord 1 +18 1 +▁toutefois 1 +▁commune 1 +éd 1 +▁grands 1 +▁actions 1 +▁consultations 1 +▁réalité 1 +▁pi 1 +▁quelle 1 +▁canadiens 1 +▁raisons 1 +nant 1 +mer 1 +▁simplement 1 +▁Mon 1 +▁marque 1 +▁lesquelles 1 +▁types 1 +▁défini 1 +▁Me 1 +▁objectif 1 +▁réfugiés 1 +▁scientifiques 1 +▁Etats 1 +& 1 +▁Ainsi 1 +▁stratégies 1 +nal 1 +espace 1 +nement 1 +â 1 +▁ai 1 +▁représente 1 +▁met 1 +▁2009 1 +▁fonctionnaires 1 +qué 1 +▁physique 1 +▁1. 1 +alinéa 1 +▁servir 1 +ell 1 +▁regard 1 +▁enregistré 1 +▁parents 1 +occasion 1 +▁voudrais 1 +▁maladies 1 +RE 1 +experts 1 +▁relativement 1 +▁publiques 1 +▁prévu 1 +▁route 1 +▁obtenu 1 +▁The 1 +▁Merci 1 +▁train 1 +ative 1 +19 1 +prim 1 +▁Vi 1 +▁émissions 1 +ies 1 +▁plein 1 +ni 1 +des 1 +inter 1 +▁com 1 +▁choisi 1 +ient 1 +éc 1 +▁quantité 1 +dit 1 +▁quel 1 +ix 1 +▁lettre 1 +▁Haut 1 +▁lumière 1 +ev 1 +▁crédit 1 +08 1 +▁article 1 +ités 1 +▁faites 1 +▁professionnelle 1 +expression 1 +▁lesquels 1 +RC 1 +▁Comment 1 +Q 1 +▁spéciale 1 +▁tableau 1 +▁consultation 1 +▁négociations 1 +idée 1 +▁collectivités 1 +▁sub 1 +AP 1 +ler 1 +ade 1 +isé 1 +▁(19 1 +hé 1 +▁veut 1 +▁petits 1 +▁priorité 1 +ène 1 +▁interne 1 +16 1 +▁inc 1 +▁examiner 1 +ron 1 +▁majorité 1 +▁Depuis 1 +▁centres 1 +nu 1 +▁locaux 1 +▁militaire 1 +▁jouer 1 +▁phase 1 +▁corps 1 +amélioration 1 +▁utilisés 1 +▁établissements 1 +▁confiance 1 +▁consommation 1 +édi 1 +▁Direction 1 +▁populations 1 +ifié 1 +▁Amendement 1 +▁concurrence 1 +ous 1 +euses 1 +▁radio 1 +▁liens 1 +▁petite 1 +ast 1 +éré 1 +▁consiste 1 +▁Nouvelle 1 +▁Elles 1 +▁établir 1 +▁observations 1 +▁▪ 1 +ka 1 +▁atteint 1 +▁juridiques 1 +ak 1 +▁taille 1 +bo 1 +onne 1 +rais 1 +▁signifie 1 +▁statistiques 1 +▁tr 1 +▁accords 1 +60 1 +out 1 +ind 1 +-2 1 +▁champ 1 +▁suivre 1 +▁militaires 1 +▁Or 1 +sida 1 +▁pr 1 +SE 1 +applique 1 +▁conclusions 1 +▁aller 1 +@ 1 +IR 1 +lin 1 +▁multi 1 +▁Royaume 1 +argent 1 +▁savez 1 +▁candidats 1 +Ouest 1 +non 1 +▁haute 1 +fr 1 +▁province 1 +CN 1 +ort 1 +▁caractéristique 1 +▁lignes 1 +▁organisation 1 +▁petites 1 +équipe 1 +▁An 1 +▁k 1 +▁Afrique 1 +▁Lorsque 1 +▁approuvé 1 +évolution 1 +mp 1 +▁Colombie 1 +▁responsabilités 1 +▁agents 1 +éro 1 +▁déposé 1 +05 1 +ati 1 +▁Communauté 1 +▁pied 1 +tte 1 +uc 1 +/1 1 +▁partage 1 +▁final 1 +▁conçu 1 +▁terres 1 +▁scientifique 1 +teur 1 +▁direct 1 +pt 1 +▁communications 1 +▁campagne 1 +ich 1 +▁prévues 1 +▁www 1 +▁1995 1 +▁parti 1 +▁meilleur 1 +▁globale 1 +échange 1 +▁pla 1 +ép 1 +▁forte 1 +▁conséquences 1 +ème 1 +▁concept 1 +ke 1 +▁Sous 1 +▁col 1 +▁entendu 1 +ssant 1 +03 1 +▁figure 1 +▁suivante 1 +▁régionaux 1 +▁pouvait 1 +▁régionales 1 +▁solutions 1 +▁développer 1 +▁sites 1 +▁transports 1 +▁catégorie 1 +▁traite 1 +▁veiller 1 +▁cor 1 +▁peuple 1 +▁familles 1 +▁humaine 1 +▁Date 1 +aine 1 +▁généralement 1 +inc 1 +iez 1 +▁télé 1 +▁licence 1 +▁portée 1 +▁capital 1 +▁sociales 1 +▁France 1 +▁annexe 1 +▁or 1 +▁siècle 1 +▁mère 1 +AS 1 +▁etc 1 +mb 1 +ts 1 +▁terre 1 +ong 1 +▁photo 1 +▁longue 1 +▁modification 1 +apprentissage 1 +▁presque 1 +17 1 +▁action 1 +▁élément 
1 +▁semaine 1 +▁conseils 1 +▁connaissance 1 +▁pénale 1 +▁Document 1 +CC 1 +Britannique 1 +eu 1 +yl 1 +29 1 +▁importantes 1 +▁reconnu 1 +▁actes 1 +▁complète 1 +▁Terre 1 +▁fil 1 +▁organisé 1 +▁activité 1 +▁réaliser 1 +▁mécanisme 1 +01 1 +AD 1 +▁profit 1 +▁trouvé 1 +▁poursuivre 1 +▁man 1 +▁cerveau 1 +▁pleinement 1 +▁partenariat 1 +▁locales 1 +▁̈ 1 +▁spécifiques 1 +▁2004, 1 +▁1997 1 +▁analyse 1 +▁résultat 1 +tal 1 +▁& 1 +ban 1 +erie 1 +élimination 1 +▁augmentation 1 +! 1 +▁composé 1 +▁fort 1 +▁mar 1 +▁idée 1 +▁expérience 1 +▁2, 1 +ces 1 +▁décrit 1 +▁uniquement 1 +▁cons 1 +amp 1 +▁2005, 1 +▁vrai 1 +abilité 1 +▁atteindre 1 +▁récemment 1 +▁micro 1 +app 1 +▁recommande 1 +▁tri 1 +▁médicaments 1 +▁provenant 1 +ien 1 +▁peuples 1 +▁parc 1 +don 1 +pp 1 +▁distribution 1 +ach 1 +▁dispose 1 +▁recherches 1 +▁consommateurs 1 +▁Ar 1 +▁Fa 1 +21 1 +TE 1 +▁York 1 +▁nucléaires 1 +▁Russie 1 +impôt 1 +▁2003, 1 +TION 1 +ttes 1 +▁élaboré 1 +▁ouvert 1 +gén 1 +▁dépend 1 +▁quelqu 1 +▁ren 1 +73 1 +▁vis 1 +96 1 +▁changer 1 +attention 1 +▁enfant 1 +\ 1 +▁logement 1 +▁chemin 1 +▁souhaite 1 +▁communiquer 1 +lect 1 +ana 1 +▁compétence 1 +lon 1 +▁dû 1 +ult 1 +▁prêt 1 +▁montre 1 +▁réseaux 1 +▁90 1 +he 1 +tim 1 +▁revenus 1 +▁Banque 1 +▁importe 1 +iz 1 +initiative 1 +▁intellectuelle 1 +vé 1 +▁nations 1 +▁retour 1 +▁Tous 1 +▁seraient 1 +▁doute 1 +▁bo 1 +▁raisonnable 1 +▁bio 1 +ques 1 +ération 1 +ologie 1 +rou 1 +▁défis 1 +▁environnement 1 +▁naturelles 1 +". 1 +▁publication 1 +▁Mission 1 +utiliser 1 +▁emploi 1 +po 1 +can 1 +▁gros 1 +▁relation 1 +67 1 +tur 1 +▁humanitaire 1 +▁Charte 1 +▁dépôt 1 +▁2002, 1 +établir 1 +AM 1 +▁-- 1 +▁DU 1 +ines 1 +▁prestations 1 +▁propose 1 +▁limite 1 +▁contraire 1 +cent 1 +24 1 +▁favoriser 1 +wa 1 +▁judiciaire 1 +▁diversité 1 +heure 1 +infrastructure 1 +iff 1 +▁gra 1 +agriculture 1 +▁précise 1 +▁* 1 +isant 1 +▁traités 1 +obtenir 1 +▁ro 1 +nc 1 +▁essentiel 1 +▁véhicules 1 +▁supérieur 1 +tin 1 +▁respecter 1 +▁2000, 1 +▁occasion 1 +cré 1 +▁gr 1 +▁vont 1 +école 1 +▁livre 1 +▁considérée 1 +gr 1 +examiner 1 +▁exp 1 +Accord 1 +uff 1 +▁fi 1 +ju 1 +▁engagements 1 +▁2001, 1 +▁rue 1 +RI 1 +▁situé 1 +▁utile 1 +▁voyez 1 +US 1 +▁trou 1 +men 1 +▁Pré 1 +▁prestation 1 +▁change 1 +▁alimentaire 1 +▁recommandé 1 +▁eaux 1 +▁Que 1 +▁étudiants 1 +mes 1 +▁conception 1 +▁étrangères 1 +▁sud 1 +▁disponible 1 +▁commercial 1 +offre 1 +▁Annexe 1 +ê 1 +▁Bo 1 +▁bureaux 1 +06 1 +dessous 1 +▁difficultés 1 +” 1 +▁lancé 1 +▁informé 1 +▁chercheurs 1 +nés 1 +▁lien 1 +existence 1 +▁recommandation 1 +▁constituent 1 +att 1 +▁Ba 1 +▁continu 1 +enne 1 +▁parle 1 +▁professionnels 1 +▁loin 1 +▁cycle 1 +mis 1 +TM 1 +issement 1 +▁humain 1 +▁calcul 1 +▁environnemental 1 +▁attention 1 +▁appelé 1 +▁exprimé 1 +venu 1 +▁animaux 1 +▁pression 1 +▁devenir 1 +▁règle 1 +▁produire 1 +▁engagé 1 +▁permettra 1 +ouverture 1 +▁outils 1 +▁matières 1 +▁connaître 1 +▁paiement 1 +▁probablement 1 +▁efficaces 1 +eff 1 +▁parfois 1 +oire 1 +ame 1 +▁35 1 +chant 1 +antes 1 +ose 1 +ration 1 +engagement 1 +▁ép 1 +▁appui 1 +rc 1 +▁démocratie 1 +▁aliments 1 +▁contient 1 +atif 1 +installation 1 +▁1994 1 +▁cependant 1 +▁autant 1 +▁Ça 1 +accent 1 +▁2006, 1 +raient 1 +▁mentionné 1 +adresse 1 +▁super 1 +▁histoire 1 +▁déclarations 1 +▁autour 1 +PE 1 +▁substances 1 +rain 1 +▁transition 1 +▁faite 1 +▁permettent 1 +▁français 1 +90 1 +▁réponses 1 +▁filles 1 +lan 1 +▁film 1 +▁“ 1 +étranger 1 +▁ta 1 +voy 1 +év 1 +▁principales 1 +▁commencer 1 +▁Ad 1 +▁totale 1 +▁prochaine 1 +▁court 1 +▁traiter 1 +▁joue 1 +▁80 1 +pro 1 +▁principalement 1 +▁Division 1 +▁venir 1 +▁investissements 1 +▁élections 
1 +▁porter 1 +▁Ca 1 +▁peux 1 +4) 1 +▁préparation 1 +▁Ottawa 1 +▁post 1 +▁pouvoirs 1 +▁modèles 1 +▁devait 1 +▁all 1 +▁Li 1 +▁surface 1 +▁souligné 1 +▁installations 1 +▁diffusion 1 +▁considération 1 +ui 1 +ori 1 +▁Enfin 1 +▁formulaire 1 +▁annuelle 1 +▁ca 1 +▁Projet 1 +lig 1 +▁missions 1 +▁so 1 +▁téléphone 1 +▁directives 1 +▁dix 1 +▁carte 1 +▁cha 1 +log 1 +▁fourni 1 +▁indique 1 +ov 1 +▁vidéo 1 +09 1 +▁officielles 1 +alisation 1 +lit 1 +▁considère 1 +▁modifier 1 +▁provinces 1 +€ 1 +▁ba 1 +▁Département 1 +cl 1 +approbation 1 +ît 1 +▁mots 1 +▁étrangers 1 +ATION 1 +▁tra 1 +accueil 1 +▁comporte 1 +▁dossier 1 +agissant 1 +▁puissent 1 +▁Cet 1 +▁sou 1 +▁correspondant 1 +ures 1 +* 1 +▁science 1 +ical 1 +▁vision 1 +▁montrer 1 +ty 1 +23 1 +TA 1 +▁courant 1 +▁w 1 +uis 1 +▁danger 1 +▁ob 1 +naire 1 +▁siège 1 +mon 1 +là 1 +▁test 1 +▁App 1 +▁connu 1 +▁évaluer 1 +orientation 1 +tic 1 +78 1 +ô 1 +▁vivant 1 +▁comptes 1 +▁menace 1 +▁commerciales 1 +▁semaines 1 +22 1 +ages 1 +approche 1 +▁vaste 1 +uv 1 +– 1 +▁1996 1 +▁utilisation 1 +02 1 +effet 1 +OU 1 +▁Saint 1 +66 1 +▁urbain 1 +(1) 1 +▁minutes 1 +▁enquête 1 +▁Com 1 +▁communiqué 1 +▁mode 1 +▁exige 1 +▁communautaires 1 +vo 1 +vent 1 +▁compter 1 +▁penser 1 +cept 1 +▁proportion 1 +Institut 1 +mat 1 +ule 1 +▁CE 1 +▁Partie 1 +▁= 1 +impact 1 +ja 1 +PR 1 +▁mêmes 1 +▁maintenir 1 +arrêt 1 +▁sais 1 +▁cellules 1 +26 1 +ba 1 +itu 1 +chi 1 +issant 1 +▁compar 1 +▁500 1 +esprit 1 +so 1 +▁limites 1 +▁Est 1 +▁habit 1 +▁fais 1 +ari 1 +▁Accueil 1 +fi 1 +▁satisfaction 1 +▁préalable 1 +én 1 +▁soutenir 1 +ifs 1 +Add 1 +▁signé 1 +▁1990 1 +▁circonstances 1 +▁commis 1 +hy 1 +▁cré 1 +▁bou 1 +▁villes 1 +▁intervenants 1 +▁45 1 +▁fo 1 +rom 1 +▁désir 1 +▁presse 1 +▁règlements 1 +nov 1 +car 1 +▁rempli 1 +▁Ne 1 +si 1 +▁régionale 1 +imi 1 +ats 1 +obligation 1 +▁candidat 1 +cur 1 +Z 1 +uel 1 +▁Représentant 1 +▁jeu 1 +▁demeure 1 +▁permanente 1 +▁ét 1 +▁fondée 1 +ert 1 +▁considérable 1 +▁né 1 +▁sexuelle 1 +▁sept 1 +tage 1 +▁facile 1 +▁cancer 1 +▁convenu 1 +née 1 +▁volonté 1 +▁Gu 1 +▁Certains 1 +ième 1 +lle 1 +▁guide 1 +EM 1 +▁drogues 1 +▁John 1 +▁scolaire 1 +▁Plus 1 +avons 1 +▁réuni 1 +▁Traité 1 +▁partout 1 +." 1 +▁Na 1 +▁territoires 1 +▁départ 1 +▁patients 1 +▁cour 1 +27 1 +ah 1 +▁Notre 1 +▁conséquence 1 +▁complet 1 +▁prévue 1 +émission 1 +▁réel 1 +▁sport 1 +ity 1 +▁stabilité 1 +augmentation 1 +enfants 1 +▁fondamentaux 1 +pr 1 +▁précis 1 +RS 1 +ani 1 +DE 1 +▁Voir 1 +pi 1 +▁mener 1 +ition 1 +▁Point 1 +▁offert 1 +▁Ra 1 +31 1 +▁stratégiques 1 +ud 1 +tar 1 +iens 1 +▁précédent 1 +▁délégations 1 +équipement 1 +▁défi 1 +cr 1 +tor 1 +▁modifié 1 +▁réaction 1 +ran 1 +▁privée 1 +▁arrive 1 +▁voix 1 +▁III 1 +▁critique 1 +armes 1 +moi 1 +LE 1 +▁seconde 1 +▁vert 1 +resse 1 +▁réussi 1 +ice 1 +▁tribunal 1 +04 1 +▁Dr 1 +▁commentaires 1 +gouvernemental 1 +▁actuel 1 +▁adressée 1 +SC 1 +PC 1 +▁experts 1 +▁désarmement 1 +▁ceci 1 +▁déchets 1 +▁bal 1 +▁offrir 1 +▁devenu 1 +table 1 +▁garde 1 +ole 1 +▁révision 1 +▁ministres 1 +▁autorisé 1 +.6 1 +igné 1 +▁entier 1 +▁Directeur 1 +▁principale 1 +▁adoptée 1 +▁intégrée 1 +▁Ni 1 +07 1 +> 1 +▁poursuite 1 +». 
1 +▁tendance 1 +aï 1 +ieux 1 +▁obligatoire 1 +▁%) 1 +▁volume 1 +▁Non 1 +tif 1 +ick 1 +-3 1 +▁chimiques 1 +impression 1 +▁clés 1 +▁fut 1 +affaires 1 +75 1 +informations 1 +▁fabrication 1 +72 1 +▁augmenté 1 +▁vient 1 +▁pan 1 +▁bois 1 +été 1 +arm 1 +appliquer 1 +téri 1 +▁administratives 1 +▁nucléaire 1 +▁pourcentage 1 +sc 1 +▁passage 1 +▁soutenu 1 +▁degré 1 +▁requérant 1 +mand 1 +rant 1 +SA 1 +▁secondaire 1 +▁intitulé 1 +teurs 1 +▁information 1 +▁Son 1 +▁bord 1 +▁Chaque 1 +duc 1 +ifi 1 +▁nommé 1 +▁devez 1 +▁tête 1 +sp 1 +TI 1 +élection 1 +▁prévenir 1 +ano 1 +▁observé 1 +▁active 1 +▁génération 1 +intervention 1 +▁partenariats 1 +uf 1 +▁Même 1 +▁ordinaire 1 +asse 1 +IM 1 +▁gouvernance 1 +▁2. 1 +▁participé 1 +ian 1 +▁précédente 1 +▁écoles 1 +▁conformité 1 +▁70 1 +▁utilisées 1 +▁Réunion 1 +▁intégré 1 +68 1 +▁premiers 1 +▁ouverte 1 +▁recevoir 1 +▁nomination 1 +▁200 1 +83 1 +▁derniers 1 +▁génétique 1 +▁idées 1 +▁potentiel 1 +ns 1 +autorité 1 +cri 1 +▁longtemps 1 +", 1 +▁commence 1 +▁auteurs 1 +ase 1 +▁appelle 1 +▁fourniture 1 +▁milliers 1 +▁st 1 +Université 1 +▁ordre 1 +▁catégories 1 +▁directrices 1 +avant 1 +▁chargée 1 +rit 1 +cadre 1 +▁organisée 1 +▁Avec 1 +▁trafic 1 +▁Pacte 1 +76 1 +bl 1 +▁CO 1 +▁particuliers 1 +▁sciences 1 +28 1 +▁résoudre 1 +▁Constitution 1 +▁Rapporteur 1 +▁pertinentes 1 +▁journal 1 +▁Maintenant 1 +▁apporter 1 +▁civils 1 +79 1 +▁organisme 1 +▁intérêt 1 +▁différence 1 +CR 1 +▁faits 1 +▁travaille 1 +▁donnée 1 +aider 1 +▁planète 1 +gn 1 +ente 1 +activités 1 +▁Japon 1 +▁détail 1 +ST 1 +▁présentée 1 +ip 1 +▁personnels 1 +▁favorable 1 +▁figurant 1 +▁moitié 1 +▁équipe 1 +ches 1 +▁suffisamment 1 +▁bande 1 +▁déterminé 1 +▁conservation 1 +▁International 1 +▁Do 1 +▁copie 1 +▁limitée 1 +habitat 1 +▁lieux 1 +▁sexe 1 +ject 1 +▁utilisant 1 +▁grave 1 +▁logiciel 1 +▁collecte 1 +OMPI 1 +▁financé 1 +▁véhicule 1 +ial 1 +80 1 +ête 1 +agent 1 +▁protocole 1 +▁commissaire 1 +▁accepté 1 +▁crime 1 +IP 1 +importe 1 +▁importance 1 +adhésion 1 +▁retard 1 +▁connexes 1 +▁pénal 1 +▁mi 1 +▁Ontario 1 +▁RE 1 +▁conduite 1 +▁is 1 +71 1 +▁− 1 +▁contra 1 +lar 1 +▁commerciaux 1 +▁rencontre 1 +imp 1 +NE 1 +▁clair 1 +92 1 +▁conclusion 1 +▁perte 1 +▁éco 1 +étaient 1 +cle 1 +33 1 +▁tient 1 +▁am 1 +ache 1 +▁extraordinaire 1 +▁supplémentaire 1 +▁numérique 1 +▁climat 1 +▁considéré 1 +vie 1 +trait 1 +▁(1 1 +▁équitable 1 +▁CN 1 +▁table 1 +▁estimé 1 +▁conforme 1 +ât 1 +▁sexes 1 +▁généraux 1 +▁exercice 1 +avocat 1 +DI 1 +vit 1 +35 1 +IE 1 +▁Ho 1 +inspection 1 +▁solide 1 +▁réalisé 1 +▁Ac 1 +▁message 1 +▁contenant 1 +dis 1 +▁collective 1 +▁Cl 1 +▁tâches 1 +▁laboratoire 1 +▁2007, 1 +74 1 +▁initiative 1 +▁tirer 1 +▁géographique 1 +clu 1 +ami 1 +▁annoncé 1 +▁mari 1 +97 1 +▁temporaire 1 +▁appliquer 1 +▁moteur 1 +▁température 1 +▁procédé 1 +▁résolutions 1 +▁périodique 1 +achat 1 +▁fournis 1 +▁inscrit 1 +In 1 +▁Be 1 +▁3, 1 +PS 1 +mêmes 1 +▁tandis 1 +sol 1 +▁élevée 1 +affectation 1 +▁He 1 +▁apporté 1 +▁bout 1 +ID 1 +atrice 1 +ko 1 +▁33 1 +▁représentent 1 +▁Jo 1 +ber 1 +▁ri 1 +▁applicables 1 +▁fournit 1 +▁maritime 1 +▁vivre 1 +▁agricole 1 +SP 1 +ères 1 +▁puisque 1 +▁couleur 1 +▁graves 1 +▁devra 1 +▁employé 1 +▁Du 1 +CP 1 +▁Toronto 1 +▁auxquels 1 +▁musique 1 +▁Premières 1 +▁primaire 1 +▁définir 1 +▁Conformément 1 +existe 1 +91 1 +▁Autres 1 +▁... 
1 +▁mouvement 1 +▁parole 1 +▁utilise 1 +▁indicateurs 1 +70 1 +99 1 +▁Pas 1 +▁sé 1 +▁procéder 1 +▁parvenir 1 +logue 1 +▁commerciale 1 +espèce 1 +▁utilisateurs 1 +observation 1 +ium 1 +▁Note 1 +▁intéressant 1 +ore 1 +▁sauf 1 +▁Mc 1 +▁préoccupations 1 +part 1 +▁concours 1 +issue 1 +▁Per 1 +▁modalités 1 +▁So 1 +▁susceptibles 1 +▁énergétique 1 +▁pilote 1 +▁tiré 1 +ssé 1 +▁montré 1 +pprovisionnement 1 +viennent 1 +▁station 1 +▁ordinateur 1 +Est 1 +▁EN 1 +▁cherche 1 +▁inclus 1 +ox 1 +▁appris 1 +▁Mi 1 +études 1 +oir 1 +▁global 1 +PP 1 +▁chefs 1 +lier 1 +▁somme 1 +▁soi 1 +▁Air 1 +adi 1 +ik 1 +▁payé 1 +cra 1 +▁moderne 1 +▁circulation 1 +▁Pa 1 +▁rejet 1 +▁exposé 1 +ières 1 +era 1 +▁réalisés 1 +œ 1 +lation 1 +▁Sommet 1 +OP 1 +▁Fédération 1 +▁reconnaissance 1 +▁producteurs 1 +EUR 1 +94 1 +œuvre 1 +▁jugé 1 +▁fixé 1 +rage 1 +▁demandeur 1 +▁pourra 1 +93 1 +▁fournisseurs 1 +▁Section 1 +▁transparence 1 +▁voulu 1 +forme 1 +▁mou 1 +col 1 +delà 1 +000 1 +▁salaire 1 +▁prêts 1 +▁répond 1 +▁55 1 +▁pouvant 1 +▁situations 1 +▁Po 1 +ock 1 +▁nu 1 +ama 1 +▁Ou 1 +▁immédiatement 1 +q 1 +▁Montréal 1 +let 1 +▁description 1 +▁réelle 1 +▁Ju 1 +▁at 1 +▁su 1 +▁venu 1 +▁double 1 +▁1993 1 +isés 1 +▁familiale 1 +▁consacré 1 +▁conseiller 1 +exportation 1 +▁journée 1 +extérieur 1 +eaux 1 +▁supérieure 1 +su 1 +▁locale 1 +▁format 1 +cé 1 +▁ressort 1 +▁pauvres 1 +▁allé 1 +▁Col 1 +▁rémunération 1 +aille 1 +▁utilisée 1 +▁relève 1 +▁aérien 1 +▁gl 1 +▁envoyé 1 +▁central 1 +▁élèves 1 +▁32 1 +▁procès 1 +fo 1 +▁mené 1 +bon 1 +▁adapté 1 +EL 1 +▁souligne 1 +ôt 1 +▁can 1 +▁Genève 1 +▁sensibilisation 1 +▁désigné 1 +▁prochain 1 +tour 1 +▁Su 1 +▁feu 1 +▁affecté 1 +image 1 +▁futur 1 +ète 1 +▁Pourquoi 1 +érant 1 +▁LE 1 +gar 1 +▁structures 1 +oli 1 +tré 1 +atoire 1 +▁agricoles 1 +▁pièces 1 +▁fédéraux 1 +institution 1 +▁Bon 1 +▁placé 1 +▁comprennent 1 +▁commande 1 +▁auquel 1 +off 1 +▁impact 1 +▁participant 1 +tel 1 +affaire 1 +▁baisse 1 +▁européens 1 +▁Parce 1 +▁dossiers 1 +▁art 1 +▁conf 1 +▁événements 1 +emp 1 +-20 1 +▁poids 1 +▁vingt 1 +▁biais 1 +han 1 +▁assuré 1 +▁fondé 1 +▁plainte 1 +▁répartition 1 +▁retraite 1 +▁Quel 1 +ral 1 +▁invité 1 +▁paragraphes 1 +▁détention 1 +41 1 +association 1 +gre 1 +▁positive 1 +▁illicite 1 +▁torture 1 +▁not 1 +alimentation 1 +cer 1 +▁éviter 1 +▁vit 1 +▁appr 1 +▁collectivité 1 +▁essentiellement 1 +dition 1 +issent 1 +▁Avant 1 +▁proche 1 +isée 1 +▁discussions 1 +▁papier 1 +amb 1 +nis 1 +inscription 1 +▁instance 1 +▁consulter 1 +▁conférences 1 +environ 1 +▁nuit 1 +▁hors 1 +▁certificat 1 +▁certaine 1 +▁Afin 1 +amendement 1 +spir 1 +naires 1 +▁recettes 1 +▁Bar 1 +▁concert 1 +69 1 +▁Demande 1 +atifs 1 +▁effectué 1 +▁1992 1 +acte 1 +▁< 1 +ME 1 +vre 1 +▁CA 1 +▁gérer 1 +ley 1 +▁(1) 1 +▁élaborer 1 +▁exécutif 1 +▁savons 1 +ett 1 +bre 1 +▁2010 1 +fa 1 +▁endroit 1 +▁acte 1 +dia 1 +▁violation 1 +▁encourage 1 +tru 1 +▁complexe 1 +ym 1 +45 1 +▁marche 1 +Orient 1 +▁Pr 1 +▁dessin 1 +▁payer 1 +▁Tableau 1 +▁Compte 1 +▁sent 1 +▁support 1 +auto 1 +▁compréhension 1 +▁tâche 1 +▁reprise 1 +▁industrielle 1 +▁menées 1 +IF 1 +▁sujets 1 +▁vi 1 +86 1 +▁discussion 1 +87 1 +ath 1 +-4 1 +organisme 1 +▁Pays 1 +89 1 +▁congé 1 +ET 1 +▁instrument 1 +▁rester 1 +▁populaire 1 +▁différences 1 +▁faudrait 1 +autorisation 1 +▁historique 1 +nn 1 +appliquent 1 +77 1 +las 1 +▁tribunaux 1 +▁brut 1 +▁ferme 1 +▁maximum 1 +▁adjoint 1 +▁Description 1 +▁construire 1 +▁Nouveau 1 +Association 1 +▁voici 1 +ack 1 +employeur 1 +▁générales 1 +UR 1 +audience 1 +▁For 1 +▁minimum 1 +NA 1 +», 1 +appareil 1 +▁engagement 1 +jo 1 +az 1 +nent 1 +▁félicite 1 +▁arrêt 1 +▁signature 1 
+interdiction 1 +ans 1 +▁cinquante 1 +▁propriétaire 1 +▁réduit 1 +▁présidence 1 +▁accrue 1 +▁correspond 1 +▁allons 1 +IA 1 +cal 1 +▁chacune 1 +van 1 +▁cap 1 +net 1 +▁inst 1 +▁chiffres 1 +▁Numéro 1 +▁cher 1 +▁bénéficier 1 +▁additionnel 1 +▁vitesse 1 +▁rencontré 1 +▁camp 1 +▁neuf 1 +▁communes 1 +▁lutter 1 +employé 1 +▁Institut 1 +opération 1 +ues 1 +▁destinés 1 +▁nouvel 1 +▁extrêmement 1 +▁posé 1 +▁taxe 1 +▁distance 1 +▁occupé 1 +▁Ha 1 +ndre 1 +▁présentés 1 +▁premières 1 +▁trouvent 1 +▁opérationnel 1 +▁palestinien 1 +exposition 1 +▁mises 1 +▁étape 1 +▁aient 1 +RA 1 +usage 1 +▁conjoint 1 +fect 1 +▁acc 1 +▁adoptées 1 +▁minorités 1 +▁désormais 1 +▁meilleurs 1 +▁bonnes 1 +▁véritable 1 +asp 1 +34 1 +▁père 1 +rique 1 +▁motifs 1 +▁Aux 1 +▁cible 1 +▁Ceci 1 +▁signalé 1 +▁enquêtes 1 +▁perdu 1 +▁largement 1 +main 1 +▁assistance 1 +▁noté 1 +▁spéciales 1 +ili 1 +▁obstacles 1 +Initiative 1 +avance 1 +▁représentation 1 +▁initiale 1 +▁concernés 1 +▁facteur 1 +▁chaîne 1 +▁adopter 1 +▁norme 1 +impose 1 +84 1 +▁préparer 1 +▁associés 1 +37 1 +▁constaté 1 +vision 1 +fic 1 +▁Am 1 +programme 1 +▁pre 1 +NU 1 +▁détaillée 1 +▁traitements 1 +▁rà 1 +▁Oui 1 +51 1 +acquisition 1 +▁rapporteur 1 +▁entend 1 +▁Société 1 +▁huit 1 +▁yeux 1 +▁violations 1 +▁stade 1 +▁crédits 1 +▁encourager 1 +▁Avis 1 +▁erreur 1 +▁accroître 1 +▁devient 1 +mul 1 +▁homme 1 +5) 1 +▁espace 1 +▁auront 1 +▁Cor 1 +44 1 +▁remboursement 1 +▁vaccin 1 +▁qualifié 1 +▁Th 1 +▁dirigeants 1 +▁culturels 1 +pens 1 +ifier 1 +▁bar 1 +▁objet 1 +▁Forum 1 +▁délais 1 +▁religion 1 +avion 1 +49 1 +▁chambres 1 +▁gouvernementaux 1 +▁Trésor 1 +▁sûreté 1 +▁3. 1 +▁collègues 1 +▁em 1 +▁auto 1 +▁thème 1 +▁subi 1 +▁To 1 +32 1 +▁rep 1 +▁climatique 1 +▁applicable 1 +▁forêts 1 +exécut 1 +IL 1 +▁assure 1 +▁frontières 1 +og 1 +▁Chambre 1 +▁34 1 +▁naissance 1 +OL 1 +▁composition 1 +ature 1 +rent 1 +▁envers 1 +43 1 +▁répercussions 1 +hôtel 1 +‐ 1 +▁échanges 1 +▁américaine 1 +gin 1 +engage 1 +els 1 +lis 1 +Bas 1 +▁Affaires 1 +fe 1 +gent 1 +39 1 +▁1991 1 +cy 1 +▁fixe 1 +▁faisons 1 +▁efficacement 1 +mique 1 +inte 1 +parle 1 +▁causé 1 +CO 1 +ric 1 +▁dynamique 1 +▁1999, 1 +▁Port 1 +▁avancé 1 +▁hausse 1 +tan 1 +2) 1 +▁voulons 1 +▁signe 1 +▁IN 1 +▁essais 1 +▁cro 1 +met 1 +gen 1 +évaluer 1 +▁évalué 1 +CA 1 +hér 1 +mètre 1 +▁exclu 1 +▁Toutes 1 +▁démarche 1 +▁amis 1 +▁acteurs 1 +38 1 +we 1 +attache 1 +▁Stratégie 1 +sur 1 +DC 1 +▁appuyer 1 +▁transformation 1 +▁spécifique 1 +▁construit 1 +oll 1 +actif 1 +▁masse 1 +▁chance 1 +éthique 1 +▁ver 1 +▁min 1 +SI 1 +▁carrière 1 +▁déplacement 1 +▁Ab 1 +▁Politique 1 +▁Sh 1 +▁antérieure 1 +PA 1 +▁village 1 +▁africaine 1 +èrent 1 +encontre 1 +▁quatrième 1 +UN 1 +indication 1 +avaient 1 +▁mod 1 +▁textes 1 +▁accordée 1 +ust 1 +▁COM 1 +▁36 1 +▁choisir 1 +phy 1 +▁puissance 1 +▁réserves 1 +intéresse 1 +▁cœur 1 +▁naturel 1 +▁feuille 1 +OC 1 +▁Puis 1 +▁Ri 1 +▁législatif 1 +▁écologique 1 +vin 1 +▁formule 1 +▁déb 1 +embl 1 +Iraq 1 +95 1 +▁entièrement 1 +▁incroyable 1 +▁finale 1 +▁extérieur 1 +CS 1 +ki 1 +ata 1 +▁datée 1 +assi 1 +36 1 +▁réf 1 +-19 1 +▁Amérique 1 +▁arrêté 1 +▁cultures 1 +ED 1 +▁contractant 1 +▁aff 1 +▁servi 1 +▁accordé 1 +▁mont 1 +▁contrats 1 +UL 1 +▁performance 1 +▁télévision 1 +ham 1 +▁internes 1 +▁2005. 
1 +▁garantie 1 +▁détails 1 +▁refus 1 +▁garanti 1 +▁Pe 1 +▁interdit 1 +oi 1 +▁km 1 +unité 1 +▁échéant 1 +▁bi 1 +IV 1 +▁atteinte 1 +▁indispensable 1 +▁phénomène 1 +▁Moyen 1 +uré 1 +▁limité 1 +▁chambre 1 +▁formé 1 +élaborer 1 +▁prévus 1 +▁mérite 1 +▁perspective 1 +▁attend 1 +▁sait 1 +▁régulièrement 1 +▁opinion 1 +mm 1 +▁complètement 1 +▁médicaux 1 +▁pension 1 +▁régler 1 +▁invite 1 +▁PE 1 +▁représenté 1 +▁tenant 1 +lais 1 +▁terminé 1 +▁subventions 1 +▁extérieure 1 +ica 1 +▁voiture 1 +immigration 1 +▁étudié 1 +EMENT 1 +rer 1 +▁spéciaux 1 +▁avions 1 +▁vivent 1 +stat 1 +▁adultes 1 +▁séjour 1 +▁75 1 +▁registre 1 +▁ressemble 1 +TS 1 +taire 1 +▁patrimoine 1 +▁électroniques 1 +▁Pendant 1 +▁blanc 1 +ld 1 +▁compagnie 1 +▁augmenter 1 +interprétation 1 +▁brevets 1 +▁budgétaires 1 +▁usage 1 +▁directe 1 +CT 1 +▁pertinents 1 +▁provinciaux 1 +▁destruction 1 +▁fum 1 +▁réception 1 +▁préliminaire 1 +▁pourront 1 +▁travaillent 1 +gro 1 +▁Lorsqu 1 +▁coll 1 +▁Di 1 +duit 1 +▁image 1 +cial 1 +▁sommet 1 +tà 1 +▁développé 1 +.7 1 +64 1 +indépendance 1 +▁Communication 1 +▁Premier 1 +▁conscience 1 +▁établie 1 +▁veulent 1 +enfance 1 +▁Ressources 1 +▁Mexique 1 +éno 1 +▁Man 1 +▁constituer 1 +▁Q 1 +ros 1 +position 1 +ites 1 +▁Go 1 +Asie 1 +gan 1 +ino 1 +ements 1 +▁soumettre 1 +▁transmis 1 +cha 1 +▁37 1 +EP 1 +ides 1 +▁évident 1 +▁gamme 1 +▁paiements 1 +usine 1 +▁plaintes 1 +▁mauvaise 1 +espoir 1 +▁recrutement 1 +▁meilleures 1 +gie 1 +quant 1 +line 1 +▁détermination 1 +65 1 +aw 1 +▁♫ 1 +▁sanctions 1 +comp 1 +▁aérienne 1 +▁arabe 1 +▁culturelle 1 +just 1 +lli 1 +électricité 1 +ulaire 1 +▁Har 1 +▁contribué 1 +abri 1 +▁linguistique 1 +nique 1 +excellent 1 +▁+ 1 +▁rappelle 1 +▁partager 1 +berg 1 +▁mauvais 1 +6) 1 +▁ONG 1 +▁couvert 1 +embr 1 +bit 1 +grad 1 +▁brevet 1 +hor 1 +▁Grand 1 +▁Première 1 +▁appropriées 1 +82 1 +▁Israël 1 +54 1 +85 1 +▁Ga 1 +▁2004. 
1 +nat 1 +roule 1 +▁décennie 1 +▁endroits 1 +▁All 1 +▁nation 1 +amine 1 +▁voulez 1 +Irlande 1 +á 1 +▁fera 1 +▁Comp 1 +identité 1 +▁élevés 1 +▁Corée 1 +voi 1 +88 1 +▁client 1 +nom 1 +▁malgré 1 +48 1 +▁vérifier 1 +47 1 +▁présentées 1 +▁officielle 1 +▁climatiques 1 +▁voit 1 +▁prison 1 +▁peur 1 +pol 1 +▁Bu 1 +▁jury 1 +▁biennal 1 +▁Nombre 1 +tée 1 +▁rang 1 +▁effectivement 1 +gel 1 +▁riche 1 +▁convaincu 1 +▁changé 1 +élément 1 +▁prennent 1 +▁consensus 1 +-10 1 +venant 1 +▁corruption 1 +▁lit 1 +▁th 1 +▁Pre 1 +▁découvert 1 +▁établis 1 +▁rural 1 +CH 1 +1, 1 +▁mentale 1 +dite 1 +▁Décision 1 +ave 1 +▁arrivé 1 +har 1 +▁souhait 1 +▁Forces 1 +rog 1 +engager 1 +▁donateurs 1 +▁Pacifique 1 +▁nourriture 1 +▁titulaire 1 +adaptation 1 +▁réalisée 1 +▁tôt 1 +▁individus 1 +▁alimentaires 1 +▁arriver 1 +▁crimes 1 +▁culturelles 1 +▁300 1 +▁Millénaire 1 +▁Parmi 1 +FC 1 +▁associations 1 +▁Manitoba 1 +▁images 1 +▁terroristes 1 +▁documentation 1 +zo 1 +appelle 1 +▁visés 1 +▁pétrole 1 +ost 1 +▁professionnel 1 +2006 1 +▁réellement 1 +opposition 1 +2007 1 +Administration 1 +▁reconnaître 1 +▁préparé 1 +▁facilement 1 +▁électrique 1 +ANT 1 +▁économie 1 +▁visées 1 +▁accident 1 +▁différent 1 +▁rivière 1 +▁Association 1 +nie 1 +toxi 1 +ajout 1 +▁poly 1 +▁ha 1 +▁indépendant 1 +ssez 1 +2005 1 +ster 1 +▁cou 1 +ville 1 +▁Gestion 1 +▁sentiment 1 +▁officiel 1 +▁conc 1 +fl 1 +▁(2) 1 +▁budgétaire 1 +▁souligner 1 +▁viol 1 +▁emplois 1 +▁bénéfice 1 +▁pollution 1 +▁auraient 1 +▁Commissaire 1 +▁partis 1 +igu 1 +▁manifeste 1 +▁requis 1 +2, 1 +▁liaison 1 +▁entrepris 1 +intégrité 1 +▁expériences 1 +▁quartier 1 ++ 1 +▁européennes 1 +her 1 +ze 1 +bas 1 +ook 1 +▁pouvaient 1 +▁séparé 1 +55 1 +▁fabricant 1 +▁rurales 1 +▁dispositif 1 +struct 1 +▁professeur 1 +▁million 1 +▁laisser 1 +▁préoccupé 1 +mor 1 +▁accompli 1 +▁semblable 1 +▁Recherche 1 +▁48 1 +▁Tu 1 +▁culturel 1 +uer 1 +▁dette 1 +▁autorisée 1 +▁séances 1 +ouch 1 +▁impossible 1 +▁rouge 1 +▁38 1 +▁pièce 1 +▁carbone 1 +46 1 +▁2003. 
1 +ax 1 +▁conclure 1 +▁indiquer 1 +▁4, 1 +▁foi 1 +▁régi 1 +▁relever 1 +▁répondu 1 +▁outil 1 +▁continent 1 +IG 1 +univers 1 +amour 1 +nées 1 +▁géo 1 +▁Système 1 +▁médicale 1 +▁résumé 1 +ute 1 +▁Ka 1 +▁Exp 1 +sistant 1 +indemnité 1 +▁remplir 1 +▁allant 1 +▁considérer 1 +▁pal 1 +ailleurs 1 +▁vent 1 +▁élu 1 +original 1 +▁possibles 1 +▁tarif 1 +-01 1 +▁PNUD 1 +fin 1 +▁Congo 1 +▁exportations 1 +▁statistique 1 +▁Qui 1 +ott 1 +ble 1 +$ 1 +▁évaluations 1 +▁jeune 1 +▁saisi 1 +▁transmission 1 +▁concentration 1 +▁Information 1 +98 1 +▁Guide 1 +▁cibl 1 +▁gauche 1 +UT 1 +▁pleine 1 +▁porté 1 +▁El 1 +ight 1 +▁division 1 +▁nécessite 1 +ek 1 +▁plénière 1 +▁respectivement 1 +▁notion 1 +GR 1 +▁publications 1 +▁saison 1 +ary 1 +▁criminalité 1 +▁inférieur 1 +ancienne 1 +▁auxquelles 1 +ote 1 +ome 1 +▁juridiction 1 +ann 1 +▁monétaire 1 +▁informatique 1 +RO 1 +▁livres 1 +identification 1 +▁relevant 1 +▁prié 1 +▁Plusieurs 1 +▁Voilà 1 +▁déterminée 1 +▁noter 1 +▁Mont 1 +▁décès 1 +▁Pla 1 +▁tellement 1 +▁constitutionnel 1 +▁Mari 1 +▁exemples 1 +▁mention 1 +▁Aujourd 1 +ina 1 +▁préoccupation 1 +▁fiscale 1 +▁activement 1 +▁scène 1 +ENT 1 +58 1 +emplacement 1 +▁électorale 1 +mit 1 +▁Force 1 +▁axée 1 +▁Sp 1 +▁Turquie 1 +▁étapes 1 +opinion 1 +mie 1 +OMC 1 +MP 1 +▁identifié 1 +▁Sou 1 +▁déployés 1 +▁gestionnaires 1 +▁alinéa 1 +▁42 1 +euro 1 +vention 1 +individu 1 +▁liée 1 +▁augmente 1 +3) 1 +por 1 +céd 1 +▁significative 1 +expliqu 1 +▁libertés 1 +▁44 1 +▁41 1 +▁fondamentales 1 +▁appuie 1 +▁énorme 1 +▁intégral 1 +oud 1 +▁Saskatchewan 1 +za 1 +▁concerné 1 +▁autonome 1 +▁FC 1 +▁crée 1 +écri 1 +▁remise 1 +▁satisfait 1 +▁devront 1 +57 1 +▁affirme 1 +▁char 1 +▁modifiée 1 +▁Personne 1 +▁eau 1 +▁migration 1 +AF 1 +aud 1 +AI 1 +▁raciale 1 +duction 1 +▁capable 1 +▁Tel 1 +tit 1 +▁39 1 +▁hôtel 1 +▁effectuer 1 +▁euro 1 +▁infrastructures 1 +époque 1 +▁précision 1 +▁repose 1 +▁matériaux 1 +tive 1 +WP 1 +▁résidence 1 +59 1 +▁subvention 1 +échelon 1 +▁devaient 1 +Etat 1 +▁prévoir 1 +▁Brésil 1 +▁relevé 1 +▁Total 1 +acc 1 +▁marques 1 +▁financer 1 +▁décider 1 +intermédiaire 1 +▁interventions 1 +42 1 +▁1) 1 +▁forcé 1 +▁utiles 1 +MI 1 +▁différente 1 +63 1 +pre 1 +▁travaillé 1 +▁Ko 1 +ets 1 +▁armés 1 +▁réformes 1 +exception 1 +appelant 1 +ru 1 +▁débats 1 +▁provinciale 1 +aff 1 +▁élargi 1 +▁avantage 1 +▁CC 1 +▁handicapées 1 +arc 1 +prolifération 1 +introduction 1 +▁inf 1 +61 1 +▁appropriée 1 +▁systématique 1 +Allemagne 1 +▁fille 1 +▁étroite 1 +dy 1 +▁garanties 1 +▁naturelle 1 +entretien 1 +acteur 1 +▁observer 1 +▁indépendante 1 +AG 1 +▁2002. 1 +(2) 1 +▁unités 1 +▁heureux 1 +▁individuelle 1 +isées 1 +avez 1 +▁continuent 1 +▁protégé 1 +▁chances 1 +▁comités 1 +▁extra 1 +▁cartes 1 +ico 1 +▁accomplis 1 +uve 1 +occupant 1 +▁distinct 1 +armée 1 +html 1 +▁clé 1 +▁bâtiment 1 +▁obtenus 1 +▁SC 1 +DP 1 +▁assis 1 +attend 1 +gestion 1 +▁pertes 1 +▁constitué 1 +▁profil 1 +▁montrent 1 +▁figurent 1 +▁matin 1 +▁fréquence 1 +▁Droit 1 +rier 1 +▁régimes 1 +▁2006. 
1 +▁reconnaît 1 +▁chinois 1 +Inde 1 +▁actuelles 1 +PI 1 +▁vulnérables 1 +▁civil 1 +▁pose 1 +▁fonctionne 1 +▁options 1 +MA 1 +81 1 +▁poser 1 +exprimer 1 +▁appliqué 1 +élé 1 +▁déclare 1 +instrument 1 +▁diminution 1 +Écosse 1 +▁Sch 1 +▁normale 1 +▁biologique 1 +IQUE 1 +▁bis 1 +▁séminaire 1 +▁net 1 +▁es 1 +▁entités 1 +▁investi 1 +▁cru 1 +▁essentiels 1 +contr 1 +▁aime 1 +▁transporteur 1 +▁médecine 1 +agir 1 +by 1 +humanité 1 +▁publicité 1 +incidence 1 +▁tendances 1 +▁enfin 1 +pul 1 +▁esprit 1 +62 1 +nier 1 +▁échange 1 +▁influence 1 +▁menée 1 +▁lire 1 +▁marqué 1 +▁récent 1 +▁faisait 1 +ographie 1 +▁56 1 +▁banques 1 +den 1 +TC 1 +▁apprendre 1 +iti 1 +▁côte 1 +▁95 1 +▁israélien 1 +▁approprié 1 +Atlantique 1 +Alberta 1 +animal 1 +▁Lors 1 +ther 1 +▁migrants 1 +importation 1 +________________ 1 +▁législatives 1 +mobil 1 +cor 1 +ira 1 +▁étudier 1 +▁fréquent 1 +▁CON 1 +éné 1 +Brunswick 1 +▁ajouté 1 +▁mariage 1 +strat 1 +UNICEF 1 +▁IRSC 1 +imposition 1 +▁machine 1 +▁amendements 1 +▁visée 1 +▁claire 1 +/3 1 +▁annonce 1 +lles 1 +▁2) 1 +▁Statistique 1 +RES 1 +▁versé 1 +1) 1 +▁sensible 1 +▁mines 1 +▁éliminer 1 +entente 1 +▁Paris 1 +▁établissement 1 +▁attribué 1 +▁concentrations 1 +▁mettant 1 +▁noir 1 +bor 1 +▁couche 1 +▁dirigé 1 +En 1 +▁spatiale 1 +▁multilatéral 1 +▁mur 1 +▁2008, 1 +▁réglement 1 +▁x 1 +▁LES 1 +ries 1 +▁manuel 1 +aller 1 +▁envisagé 1 +ling 1 +▁économies 1 +▁âgées 1 +université 1 +▁pages 1 +▁adresse 1 +▁donnent 1 +espère 1 +▁poisson 1 +▁Recommandation 1 +DA 1 +▁poissons 1 +▁Dis 1 +amm 1 +▁touchant 1 +atrices 1 +▁destiné 1 +▁Ver 1 +▁affaire 1 +▁tourisme 1 +▁Table 1 +adopter 1 +▁intéressés 1 +▁proactive 1 +▁suffisante 1 +▁existant 1 +▁partagé 1 +▁volet 1 +▁calendrier 1 +▁Grande 1 +▁restrictions 1 +▁examine 1 +MD 1 +▁combien 1 +avec 1 +ayant 1 +52 1 +▁multiples 1 +▁négociation 1 +▁administratif 1 +▁domin 1 +▁Kosovo 1 +▁Ph 1 +▁combat 1 +56 1 +▁fallait 1 +▁robot 1 +▁Deux 1 +▁lourd 1 +temp 1 +▁spécialistes 1 +bu 1 +▁PA 1 +▁Divulgation 1 +▁proposée 1 +▁située 1 +▁individuel 1 +▁conduit 1 +▁confirmé 1 +environnementaux 1 +▁âge 1 +Internet 1 +▁présentent 1 +▁adéquate 1 +-02 1 +▁puissant 1 +own 1 +▁Paragraphe 1 +▁limiter 1 +▁destinées 1 +gique 1 +▁virus 1 +accroître 1 +▁suffisant 1 +▁Quelle 1 +▁plaignant 1 +▁médecins 1 +▁pacifique 1 +édition 1 +▁concernées 1 +▁retrait 1 +▁Développement 1 +▁décide 1 +honorable 1 +▁complexes 1 +af 1 +▁existants 1 +▁varie 1 +lage 1 +▁revêt 1 +alcool 1 +▁majeure 1 +-03 1 +▁chercher 1 +instruction 1 +matique 1 +▁remercier 1 +▁PRO 1 +▁mène 1 +président 1 +île 1 +▁explique 1 +▁reflète 1 +▁Suisse 1 +▁croire 1 +▁dispens 1 +iennes 1 +▁Toute 1 +▁sert 1 +▁prochaines 1 +▁objets 1 +▁autorisation 1 +▁plu 1 +view 1 +intéressé 1 +▁communs 1 +▁2007. 
1 +▁Vo 1 +▁valoir 1 +▁Autochtones 1 +▁personnelle 1 +MC 1 +▁réussite 1 +▁Act 1 +▁continuera 1 +alis 1 +bar 1 +▁utilisent 1 +▁ministérielle 1 +▁enseignants 1 +▁be 1 +▁Inde 1 +iel 1 +▁parlementaire 1 +ouest 1 +▁introduit 1 +▁incombe 1 +▁85 1 +▁satellite 1 +▁accompagné 1 +▁navires 1 +FI 1 +▁remplacé 1 +▁National 1 +entreprises 1 +▁second 1 +▁réparti 1 +ense 1 +▁mètres 1 +audit 1 +▁photos 1 +▁établies 1 +▁Inter 1 +▁médecin 1 +▁prestataire 1 +▁variation 1 +▁neuro 1 +▁renvoi 1 +▁exactement 1 +▁parlé 1 +▁télécommunicat 1 +bin 1 +▁évolution 1 +▁Dieu 1 +LA 1 +cou 1 +▁tiendra 1 +mission 1 +▁Mal 1 +.9 1 +▁effective 1 +▁classification 1 +▁destination 1 +▁sort 1 +ajustement 1 +▁émis 1 +inscrire 1 +▁récente 1 +venir 1 +í 1 +octroi 1 +▁Ann 1 +▁mini 1 +▁disent 1 +-11 1 +▁admissibles 1 +lie 1 +▁actifs 1 +ps 1 +▁prescrit 1 +pho 1 +▁intitulée 1 +uvre 1 +53 1 +TR 1 +▁Dépenses 1 +▁allez 1 +horizon 1 +▁institution 1 +▁Hu 1 +▁acquis 1 +▁2000. 1 +▁conventions 1 +▁Vancouver 1 +Neuve 1 +QU 1 +▁découlant 1 +▁Salle 1 +▁désigne 1 +▁mémoire 1 +▁conscient 1 +ménage 1 +▁oui 1 +pha 1 +lat 1 +▁plantes 1 +Euro 1 +▁importations 1 +▁1998, 1 +isent 1 +▁énergie 1 +▁centaines 1 +incident 1 +los 1 +▁soulevé 1 +▁Suède 1 +▁sortir 1 +▁intérieure 1 +▁mesurer 1 +gène 1 +▁britannique 1 +.1) 1 +adapter 1 +▁bénéficiaires 1 +▁San 1 +▁NO 1 +▁contrôler 1 +▁défaut 1 +▁bourse 1 +▁dehors 1 +▁pat 1 +euros 1 +▁perspectives 1 +aimerais 1 +▁actuels 1 +▁variable 1 +▁échelle 1 +▁offerts 1 +▁écrite 1 +▁drogue 1 +▁signal 1 +▁dimension 1 +▁contribue 1 +Iran 1 +▁frontière 1 +▁sang 1 +▁Ta 1 +▁(2 1 +▁mondialisation 1 +▁fortement 1 +▁requête 1 +▁incidences 1 +exclusion 1 +▁caractéris 1 +▁Entre 1 +▁dois 1 +▁design 1 +acquitter 1 +▁maximale 1 +▁regarder 1 +▁satisfaire 1 +▁soir 1 +▁Bi 1 +▁expliquer 1 +▁remplacer 1 +fer 1 +▁lac 1 +▁couverture 1 +DR 1 +▁juges 1 +▁jugement 1 +▁2001. 
1 +▁chômage 1 +▁43 1 +▁mineurs 1 +▁Ltd 1 +apport 1 +▁rejeté 1 +▁intention 1 +▁résolu 1 +▁appareils 1 +▁Madame 1 +▁comparaison 1 +▁empêcher 1 +▁légale 1 +▁nourri 1 +LI 1 +cho 1 +▁entrée 1 +mettre 1 +▁états 1 +▁mus 1 +▁effectuée 1 +▁rappeler 1 +▁détaillé 1 +vari 1 +▁administrative 1 +▁soumission 1 +▁américain 1 +▁relié 1 +▁49 1 +nch 1 +▁sondage 1 +ordinateur 1 +▁fils 1 +▁consentement 1 +och 1 +▁terrestre 1 +▁existantes 1 +OS 1 +atives 1 +▁recueilli 1 +▁Gra 1 +▁portent 1 +▁65 1 +Prince 1 +▁quotidienne 1 +▁400 1 +▁banque 1 +▁Lo 1 +▁consacrée 1 +▁tradition 1 +▁substance 1 +ull 1 +▁commissions 1 +opportunité 1 +▁possède 1 +▁montants 1 +▁profonde 1 +océan 1 +▁rem 1 +▁tarifaire 1 +▁Tra 1 +▁patient 1 +lée 1 +aison 1 +▁religieuse 1 +▁joué 1 +fond 1 +▁mortalité 1 +▁devoir 1 +▁remercie 1 +ish 1 +▁notification 1 +▁IV 1 +Le 1 +▁détenus 1 +▁futures 1 +▁instamment 1 +eng 1 +▁précisément 1 +▁Certaines 1 +▁conserver 1 +▁universités 1 +honneur 1 +▁Val 1 +éviter 1 +▁majeur 1 +▁Téléphone 1 +VI 1 +▁dommages 1 +** 1 +index 1 +allocation 1 +équilibre 1 +Étant 1 +itude 1 +▁traditionnels 1 +▁Défense 1 +▁humanitaires 1 +▁similaires 1 +▁54 1 +▁fou 1 +termin 1 +▁préjudice 1 +▁souci 1 +▁flux 1 +hiver 1 +nez 1 +ting 1 +▁facultatif 1 +▁technologique 1 +▁recon 1 +▁compétentes 1 +BC 1 +II 1 +gard 1 +vel 1 +oxy 1 +▁doté 1 +▁ami 1 +équ 1 +arriv 1 +▁web 1 +▁confi 1 +▁58 1 +▁préciser 1 +▁Mesures 1 +▁équipes 1 +▁Nom 1 +occupation 1 +▁effectuées 1 +écran 1 +bert 1 +▁acceptable 1 +▁CEE 1 +▁PME 1 +▁esp 1 +▁57 1 +▁ten 1 +▁dépasse 1 +▁Jean 1 +▁MA 1 +ombre 1 +▁ministériel 1 +-12 1 +▁district 1 +▁lecture 1 +uk 1 +▁vite 1 +▁fondamentale 1 +ange 1 +▁territoriale 1 +utilisateur 1 +▁Pi 1 +éni 1 +▁susmentionné 1 +▁crucial 1 +▁RÉ 1 +▁stable 1 +▁lié 1 +hypothèse 1 +sixième 1 +7) 1 +▁Environnement 1 +▁(613) 1 +doc 1 +▁proposées 1 +▁Examen 1 +▁Im 1 +illage 1 +exemple 1 +▁menaces 1 +▁stocks 1 +▁seuls 1 +Al 1 +▁théorie 1 +▁CI 1 +▁volontaire 1 +américain 1 +▁conformes 1 +▁quelles 1 +▁municipalité 1 +▁classé 1 +arrière 1 +▁Allemagne 1 +élève 1 +▁tabac 1 +▁Ru 1 +bel 1 +▁totalement 1 +▁résulte 1 +▁directeurs 1 +▁Rappelant 1 +▁report 1 +▁légitime 1 +ADN 1 +▁indiquent 1 +▁parallèle 1 +▁confronté 1 +▁Télé 1 +▁visent 1 +▁veuillez 1 +▁essayer 1 +▁particulières 1 +voqu 1 +▁grain 1 +tain 1 +SR 1 +▁participent 1 +▁morale 1 +étique 1 +▁voies 1 +▁programmation 1 +▁collection 1 +▁causes 1 +▁nationalité 1 +-5 1 +▁cotisation 1 +équité 1 +▁répondants 1 +▁illégale 1 +▁fini 1 +▁renouvelable 1 +occuper 1 +▁succ 1 +▁Veuillez 1 +▁rédaction 1 +▁Cap 1 +active 1 +▁ventes 1 +▁gain 1 +office 1 +chet 1 +▁finances 1 +▁officiels 1 +▁énoncés 1 +▁revue 1 +3/ 1 +▁classique 1 +▁américains 1 +▁requises 1 +AIRE 1 +▁Présidente 1 +▁Sol 1 +▁vendu 1 +Ukraine 1 +▁axé 1 +▁entière 1 +tom 1 +▁Grâce 1 +atteindre 1 +▁lettres 1 +▁consolidation 1 +▁dispositifs 1 +▁Mise 1 +▁inclure 1 +▁Lettre 1 +▁lancer 1 +ii 1 +gh 1 +OI 1 +fait 1 +▁étrangère 1 +2004 1 +▁preuves 1 +poli 1 +▁rendue 1 +avantage 1 +▁anciens 1 +▁truc 1 +▁écart 1 +▁puisqu 1 +▁gratuit 1 +▁finalement 1 +▁équivalent 1 +abandon 1 +éral 1 +▁discuter 1 +▁initial 1 +▁précisé 1 +▁four 1 +▁présentes 1 +▁ultérieure 1 +expert 1 +▁connais 1 +▁livraison 1 +▁remplacement 1 +fact 1 +▁appelée 1 +lla 1 +▁habitants 1 +OM 1 +▁française 1 +mont 1 +▁bons 1 +iciens 1 +▁my 1 +▁Sal 1 +enregistr 1 +exigence 1 +nait 1 +▁appartenant 1 +▁radiodiffusion 1 +expansion 1 +▁questionnaire 1 +▁révisé 1 +▁préserver 1 +acier 1 +▁sanitaire 1 +.8 1 +class 1 +▁59 1 +▁placement 1 +▁courriel 1 +▁53 1 +▁semblent 1 +gramme 1 +▁te 1 +essaye 1 +▁Eh 1 +PT 1 +▁ratification 1 
+mment 1 +lot 1 +▁formulées 1 +▁VI 1 +▁réparation 1 +▁répét 1 +extrême 1 +▁droite 1 +▁découvrir 1 +▁calculé 1 +▁incidence 1 +La 1 +▁Ge 1 +▁ii 1 +▁voilà 1 +▁essentielle 1 +▁combattre 1 +▁passant 1 +aménagement 1 +▁survie 1 +▁bases 1 +iller 1 +culaire 1 +ibilité 1 +ION 1 +FP 1 +▁viennent 1 +wi 1 +▁créée 1 +▁devenue 1 +▁continué 1 +▁bu 1 +▁catastrophes 1 +▁voulais 1 +▁pont 1 +ada 1 +▁reçues 1 +▁mobile 1 +flu 1 +▁mor 1 +▁dangereux 1 +▁espèce 1 +UM 1 +AUX 1 +▁révélé 1 +entend 1 +▁agir 1 +▁encouragé 1 +tien 1 +abilis 1 +▁allemand 1 +▁disposer 1 +▁chaud 1 +ampleur 1 +▁Liens 1 +septième 1 +▁députés 1 +immeuble 1 +▁négatif 1 +▁Région 1 +▁exercer 1 +▁disponibilité 1 +▁mélange 1 +[ 1 +aucune 1 +▁carburant 1 +▁discours 1 +▁47 1 +organisations 1 +lev 1 +▁seuil 1 +▁standard 1 +▁constitution 1 +▁bancaire 1 +angle 1 +▁définitive 1 +▁isolé 1 +▁montagne 1 +▁distinction 1 +▁médical 1 +batt 1 +pré 1 +▁ti 1 +▁entendre 1 +▁touche 1 +▁profiter 1 +▁progress 1 +hal 1 +▁Rec 1 +▁racisme 1 +asile 1 +pond 1 +▁Page 1 +occupe 1 +▁Nos 1 +▁Can 1 +▁garder 1 +▁préparatoire 1 +▁distribué 1 +▁inférieure 1 +tabli 1 +▁biologiques 1 +▁expériment 1 +▁privés 1 +▁productivité 1 +effort 1 +▁ref 1 +▁CD 1 +▁obligation 1 +ologiques 1 +illon 1 +▁résistance 1 +bat 1 +▁former 1 +▁bibliothèque 1 +▁exposition 1 +▁PIB 1 +▁Liste 1 +▁fondamental 1 +▁devrions 1 +▁bleu 1 +▁transparent 1 +perfectionnement 1 +EX 1 +▁avancés 1 +▁chiffre 1 +publi 1 +▁61 1 +▁thèmes 1 +bol 1 +▁US 1 +▁accessible 1 +▁entente 1 +attribution 1 +miné 1 +▁accorder 1 +▁approfondie 1 +ator 1 +▁Caraïbes 1 +▁insuffisant 1 +▁Organisation 1 +▁motif 1 +▁tests 1 +.10 1 +ITÉ 1 +▁restaurant 1 +TÉ 1 +▁jeunesse 1 +fu 1 +/4 1 +ié 1 +▁correct 1 +CEE 1 +viv 1 +▁découverte 1 +▁contrôlé 1 +▁dose 1 +▁poursuivi 1 +▁prenant 1 +▁pensé 1 +▁ronde 1 +▁emp 1 +▁Site 1 +▁mouvements 1 +▁spécialisées 1 +▁46 1 +▁ru 1 +▁considérés 1 +▁Budget 1 +vier 1 +▁ajouter 1 +▁implique 1 +................ 
1 +▁Min 1 +▁Yukon 1 +▁Bosnie 1 +élargissement 1 +2008 1 +FR 1 +gal 1 +▁offrent 1 +▁milieux 1 +▁04 1 +▁constante 1 +▁pousse 1 +▁proposer 1 +▁Justice 1 +▁respecté 1 +▁mutuelle 1 +▁déposée 1 +▁exposés 1 +▁infractions 1 +▁domicile 1 +offrir 1 +▁tonnes 1 +▁soldats 1 +▁visé 1 +▁effectués 1 +▁retenue 1 +press 1 +▁moindre 1 +ini 1 +▁capitale 1 +▁exécuté 1 +-6 1 +▁exception 1 +▁époque 1 +indice 1 +): 1 +appuyer 1 +▁témoins 1 +aéroport 1 +▁tir 1 +bour 1 +▁Chapitre 1 +▁applications 1 +▁dà 1 +▁pensons 1 +▁envisage 1 +▁teneur 1 +▁irr 1 +▁1987 1 +▁saisie 1 +▁prioritaires 1 +▁Fi 1 +▁islamique 1 +hr 1 +▁profession 1 +▁contribuent 1 +▁prétend 1 +▁assujetti 1 +▁1989 1 +Île 1 +LO 1 +▁positif 1 +È 1 +▁Prie 1 +▁Afghanistan 1 +▁appuyé 1 +▁1997, 1 +harmonisation 1 +▁vérité 1 +▁auparavant 1 +jour 1 +nage 1 +ndra 1 +▁SO 1 +ung 1 +▁abouti 1 +VE 1 +aujourd 1 +▁universelle 1 +8) 1 +qua 1 +▁visible 1 +▁espagnol 1 +ado 1 +▁Transports 1 +électro 1 +▁informer 1 +▁gagner 1 +▁Réseau 1 +▁noms 1 +vol 1 +bout 1 +▁réflexion 1 +▁entraîne 1 +▁industries 1 +▁exigé 1 +▁faudra 1 +▁soixante 1 +▁pri 1 +▁99 1 +▁judiciaires 1 +huitième 1 +▁Alberta 1 +▁négative 1 +▁intéressées 1 +ivité 1 +▁organis 1 +éco 1 +▁résultant 1 +exploit 1 +▁constate 1 +▁versement 1 +neuvième 1 +▁réservé 1 +▁latine 1 +▁régulière 1 +▁Aide 1 +▁Wi 1 +cell 1 +▁tiens 1 +▁fournissent 1 +▁administrations 1 +▁GR 1 +▁77 1 +ancien 1 +▁russe 1 +▁bassin 1 +oux 1 +▁réclamation 1 +▁privées 1 +▁compose 1 +▁réglementaires 1 +IB 1 +▁CR 1 +▁institutionnels 1 +▁habituellement 1 +▁provenance 1 +▁froid 1 +▁Cuba 1 +▁Version 1 +où 1 +text 1 +▁restent 1 +▁traditionnelles 1 +2003 1 +mination 1 +▁prévisions 1 +▁fiable 1 +▁verre 1 +▁fichier 1 +connect 1 +▁essentielles 1 +OIT 1 +onde 1 +▁variété 1 +▁estimations 1 +▁minimale 1 +AV 1 +▁indi 1 +étendre 1 +▁regroupe 1 +▁apport 1 +▁Objectif 1 +▁fondement 1 +▁médicament 1 +▁physiques 1 +▁agent 1 +▁couvre 1 +anne 1 +▁coordonner 1 +▁invités 1 +▁52 1 +▁capitaux 1 +▁définis 1 +▁marge 1 +▁rassemble 1 +▁51 1 +▁énoncées 1 +▁perçu 1 +▁constater 1 +▁chasse 1 +▁volontaires 1 +▁marine 1 +▁enjeux 1 +rég 1 +vironnementale 1 +▁fournies 1 +▁sortie 1 +GE 1 +▁David 1 +▁vice 1 +▁favorise 1 +▁abus 1 +▁récentes 1 +▁1996, 1 +architecture 1 +▁suprême 1 +▁fusion 1 +gation 1 +▁Archives 1 +▁Norvège 1 +▁compétitivité 1 +Équipe 1 +▁mettent 1 +NC 1 +▁Asie 1 +▁histoires 1 +▁néanmoins 1 +▁glace 1 +▁inscrits 1 +▁impliqué 1 +▁rêve 1 +dor 1 +▁concrètes 1 +ministre 1 +▁sexuel 1 +▁formulé 1 +écart 1 +hu 1 +autonomie 1 +▁consomm 1 +rions 1 +▁Health 1 +explication 1 +▁soutient 1 +êt 1 +▁plaisir 1 +années 1 +▁franc 1 +▁chargés 1 +accompagne 1 +▁municipal 1 +indicateur 1 +▁PDF 1 +▁migr 1 +prend 1 +aliser 1 +▁nette 1 +▁Cadre 1 +clé 1 +axe 1 +▁orientations 1 +▁déterminant 1 +▁foyer 1 +▁Assemblée 1 +▁Mac 1 +▁1988 1 +▁déploiement 1 +cip 1 +▁condamné 1 +▁quels 1 +▁maîtrise 1 +ny 1 +▁indépendants 1 +actuel 1 +▁diminué 1 +▁Trans 1 +udi 1 +▁dangereuses 1 +▁suppose 1 +▁exercé 1 +▁fournisseur 1 +▁démontré 1 +▁département 1 +▁exact 1 +▁difficiles 1 +▁permettrait 1 +▁administratifs 1 +▁compromis 1 +▁futurs 1 +▁actif 1 +▁mentionne 1 +▁secret 1 +▁douanes 1 +Les 1 +▁donn 1 +▁envisager 1 +▁psycho 1 +▁évidence 1 +lique 1 +▁logique 1 +▁bénévole 1 +▁Paul 1 +▁développés 1 +▁capables 1 +▁traduit 1 +gg 1 +kin 1 +expertise 1 +agence 1 +▁Industrie 1 +« 1 +▁rat 1 +▁Courriel 1 +▁intermédiaire 1 +▁révolution 1 +-04 1 +▁autochtone 1 +-05 1 +▁transmettre 1 +▁mesuré 1 +▁150 1 +▁bébé 1 +▁reproduction 1 +▁clinique 1 +▁accru 1 +élev 1 +▁Journal 1 +pér 1 +▁absolument 1 +▁pur 1 +1/ 1 +▁surveiller 1 +▁célébr 1 +▁joint 
1 +accroissement 1 +▁` 1 +arité 1 +▁derrière 1 +▁leadership 1 +joint 1 +▁voisins 1 +▁régissant 1 +▁transféré 1 +▁Fondation 1 +▁certainement 1 +Ar 1 +▁composant 1 +uy 1 +ALE 1 +▁intelligent 1 +▁reconstruction 1 +▁étroitement 1 +▁énoncé 1 +bul 1 +ker 1 +▁SUR 1 +▁coordonnée 1 +▁CH 1 +endroit 1 +Labrador 1 +disciplinaire 1 +▁régulier 1 +▁Soudan 1 +▁location 1 +▁visiteurs 1 +apporter 1 +▁accepter 1 +EI 1 +▁reçoivent 1 +▁solidarité 1 +▁identique 1 +▁88 1 +Environnement 1 +▁messages 1 +▁vir 1 +▁reçoit 1 +▁file 1 +▁chaleur 1 +▁Aucune 1 +▁égale 1 +▁Canadian 1 +▁importé 1 +▁800 1 +oï 1 +▁bâtiments 1 +▁dépense 1 +▁Environ 1 +▁délivré 1 +▁urgent 1 +▁Source 1 +mber 1 +▁adolescents 1 +▁couvrir 1 +▁combustible 1 +▁repos 1 +top 1 +aime 1 +▁Art 1 +attente 1 +▁consulté 1 +▁opportun 1 +▁intérimaire 1 +pin 1 +▁2004-2005 1 +▁décret 1 +▁Renseignements 1 +▁prête 1 +▁sauvage 1 +▁descend 1 +▁normalement 1 +Autriche 1 +▁effort 1 +affirm 1 +algré 1 +▁provincial 1 +▁cliniques 1 +atmosphère 1 +▁remarquable 1 +▁Évaluation 1 +homo 1 +▁criminel 1 +▁candidature 1 +asso 1 +▁opérationnelles 1 +TO 1 +▁forum 1 +▁machines 1 +▁voyages 1 +▁ethnique 1 +▁basé 1 +way 1 +▁Vienne 1 +vert 1 +▁normal 1 +lang 1 +▁uniforme 1 +iale 1 +▁employeurs 1 +ski 1 +▁Résolution 1 +▁libération 1 +▁regardez 1 +▁supérieurs 1 +▁géré 1 +▁associé 1 +/5 1 +▁industriel 1 +▁phrase 1 +▁installé 1 +labor 1 +▁Questions 1 +▁armées 1 +▁combinaison 1 +▁spécialisé 1 +▁02 1 +▁apportées 1 +▁conjointe 1 +▁indiquant 1 +▁intervention 1 +▁concevoir 1 +▁partenaire 1 +▁PAR 1 +-06 1 +▁quelconque 1 +▁lancement 1 +▁éventuelle 1 +▁statu 1 +▁sérieux 1 +hir 1 +▁proximité 1 +leur 1 +▁duquel 1 +▁belle 1 +arra 1 +NI 1 +▁probable 1 +gré 1 +important 1 +TRE 1 +▁mg 1 +5/ 1 +TER 1 +▁étranger 1 +▁pourtant 1 +▁expliqué 1 +▁difficulté 1 +agisse 1 +▁Troisième 1 +▁Jan 1 +▁concentrer 1 +▁africains 1 +▁admissible 1 +▁réglementaire 1 +▁solde 1 +EE 1 +▁blessure 1 +▁bilatéral 1 +▁mains 1 +▁tissus 1 +▁opération 1 +▁concernent 1 +▁faibles 1 +▁ter 1 +▁adopte 1 +▁mixte 1 +the 1 +gon 1 +▁îles 1 +▁ferroviaire 1 +▁Canadiennes 1 +▁tension 1 +▁bloc 1 +OD 1 +▁secours 1 +cre 1 +9) 1 +ède 1 +▁Pakistan 1 +▁rythme 1 +icule 1 +old 1 +Herzégovine 1 +▁délégué 1 +▁améliorations 1 +▁laissé 1 +mili 1 +▁institutionnel 1 +graph 1 +▁organe 1 +▁kilomètres 1 +échapp 1 +▁EX 1 +effectuer 1 +▁Pri 1 +▁formuler 1 +dans 1 +▁Ensuite 1 +▁trente 1 +▁imposé 1 +▁unité 1 +▁retrouve 1 +rise 1 +FA 1 +▁mondiaux 1 +▁graphique 1 +échec 1 +▁conjointement 1 +appliqu 1 +▁satisfaisant 1 +▁existent 1 +ART 1 +▁basée 1 +prouv 1 +▁diplomatique 1 +considérablement 1 +▁abordé 1 +habitude 1 +▁traduction 1 +aucun 1 +▁discipline 1 +▁secrétaire 1 +▁instructions 1 +écoute 1 +▁Roumanie 1 +▁analyses 1 +▁clause 1 +▁voyons 1 +ding 1 +▁artistes 1 +Ch 1 +riez 1 +ning 1 +▁parlementaires 1 +▁PC 1 +lam 1 +Québec 1 +▁complémentaires 1 +ö 1 +▁lacunes 1 +▁Parcs 1 +king 1 +▁équipé 1 +▁sollicit 1 +▁présentant 1 +ouvrage 1 +▁stress 1 +▁souhaitent 1 +▁prescriptions 1 +▁achevé 1 +▁voitures 1 +emballage 1 +▁lâ 1 +▁traditionnelle 1 +▁victime 1 +▁University 1 +▁navigation 1 +▁influ 1 +▁entraîner 1 +▁organique 1 +organismes 1 +OCDE 1 +▁forêt 1 +▁perd 1 +▁Chef 1 +▁jeux 1 +▁passion 1 +▁licences 1 +▁MIN 1 +èse 1 +▁accessibles 1 +charge 1 +▁pire 1 +▁indu 1 +▁connue 1 +ologique 1 +▁dessus 1 +infection 1 +nière 1 +▁appropriés 1 +estimation 1 +épreuve 1 +▁pensez 1 +▁complexité 1 +▁ordonnance 1 +▁historiques 1 +▁approuvée 1 +▁soin 1 +attaque 1 +▁potentielle 1 +▁Belgique 1 +▁contribuable 1 +▁issus 1 +▁mm 1 +▁logistique 1 +▁massive 1 +ità 1 +▁réfléchir 1 +▁fonctionner 1 
+▁gouverneur 1 +▁rappelé 1 +▁accueilli 1 +▁Autre 1 +▁2006-2007 1 +▁rapportant 1 +▁infraction 1 +bus 1 +▁Proposition 1 +▁milliard 1 +logique 1 +▁croissante 1 +▁bilan 1 +gué 1 +▁compagnies 1 +▁fournie 1 +▁liquide 1 +▁cuisine 1 +▁ancien 1 +▁cinquième 1 +▁suivent 1 +▁attentes 1 +acquérir 1 +▁perception 1 +mir 1 +▁révisée 1 +▁autrement 1 +▁anniversaire 1 +uri 1 +▁bactérie 1 +NO 1 +▁Robert 1 +▁modes 1 +▁âgés 1 +▁recueillir 1 +▁approuve 1 +▁allégations 1 +TP 1 +▁dûment 1 +▁Pol 1 +▁identifier 1 +▁Grèce 1 +▁avenir 1 +▁Peut 1 +entreprendre 1 +firm 1 +▁Lu 1 +▁ajoutée 1 +▁socio 1 +hôpital 1 +▁particules 1 +éclair 1 +GI 1 +▁visage 1 +▁intégrer 1 +place 1 +▁arabes 1 +OMS 1 +▁Nor 1 +ARC 1 +SG 1 +▁ateliers 1 +ima 1 +▁musée 1 +▁éventuel 1 +▁paramètres 1 +▁TED 1 +▁maintenu 1 +▁idéal 1 +▁Commerce 1 +▁équipements 1 +▁Italie 1 +▁Congrès 1 +▁Em 1 +▁personnelles 1 +▁Pologne 1 +aptitude 1 +indique 1 +▁allait 1 +▁collaborer 1 +éch 1 +▁appliquée 1 +▁hauteur 1 +Australie 1 +étend 1 +▁Vice 1 +▁pensent 1 +▁apporte 1 +▁franchi 1 +▁fondées 1 +TRA 1 +▁Siège 1 +▁établit 1 +▁industriels 1 +loi 1 +▁pensions 1 +▁fonctionnaire 1 +-8 1 +arch 1 +▁tchèque 1 +▁aspect 1 +IST 1 +▁permettront 1 +▁transactions 1 +product 1 +▁taxes 1 +▁Justification 1 +▁passagers 1 +▁potentiels 1 +▁éventail 1 +▁protéine 1 +▁transit 1 +▁connaît 1 +▁totalité 1 +▁nécessairement 1 +atelier 1 +BI 1 +major 1 +▁plate 1 +▁revenir 1 +▁Directive 1 +▁appliquées 1 +▁heure 1 +▁250 1 +▁Aucun 1 +▁600 1 +avère 1 +▁Tri 1 +▁pensée 1 +▁serre 1 +métrique 1 +▁bras 1 +▁Patrimoine 1 +▁coopérer 1 +▁entraîné 1 +▁diffusé 1 +▁litige 1 +itt 1 +▁barre 1 +ût 1 +indemnisation 1 +▁ST 1 +▁admis 1 +▁remarquer 1 +▁jouent 1 +▁regarde 1 +▁ententes 1 +▁Sans 1 +▁bulletin 1 +rice 1 +▁manifestations 1 +Franc 1 +▁renseignement 1 +▁Finlande 1 +sident 1 +lt 1 +issi 1 +▁transformer 1 +▁mobilité 1 +nit 1 +▁susceptible 1 +▁handicapés 1 +▁travaillant 1 +▁Université 1 +▁orale 1 +hum 1 +▁dimensions 1 +▁solaire 1 +▁compens 1 +▁résidents 1 +▁aidé 1 +▁cote 1 +tout 1 +République 1 +▁North 1 +▁dommage 1 +▁Beaucoup 1 +▁amples 1 +▁Cha 1 +2002 1 +▁différends 1 +entrepreneur 1 +▁naturels 1 +pend 1 +▁démographique 1 +MENT 1 +▁formulation 1 +▁combiné 1 +point 1 +étudier 1 +▁fixée 1 +▁parfait 1 +mod 1 +▁viable 1 +▁contingent 1 +▁colonne 1 +▁64 1 +ouille 1 +▁titulaires 1 +▁Protection 1 +▁expressément 1 +▁Activités 1 +▁longueur 1 +inquiétude 1 +▁détenu 1 +▁branche 1 +▁imagin 1 +▁défend 1 +▁fruits 1 +oyez 1 +▁délinquant 1 +▁module 1 +ash 1 +logie 1 +omique 1 +▁Mor 1 +▁with 1 +égo 1 +▁francophone 1 +▁oiseaux 1 +▁échantillons 1 +▁suggère 1 +▁accusé 1 +automne 1 +▁associée 1 +▁déplacées 1 +organe 1 +étendue 1 +▁fichiers 1 +pharmaceutique 1 +▁paraît 1 +comme 1 +▁engagés 1 +▁reli 1 +▁servent 1 +▁cité 1 +▁rubrique 1 +attaquer 1 +inventaire 1 +ommercialisation 1 +▁équilibre 1 +▁situe 1 +▁62 1 +▁adéquat 1 +▁fier 1 +rio 1 +ü 1 +▁disposent 1 +▁imm 1 +▁courante 1 +Ex 1 +acide 1 +▁Char 1 +Con 1 +▁compétents 1 +▁tentative 1 +▁soleil 1 +présent 1 +▁remettre 1 +mère 1 +▁scrutin 1 +▁superficie 1 +▁artistique 1 +▁débit 1 +▁technologiques 1 +Bretagne 1 +▁cohérence 1 +▁composantes 1 +Italie 1 +▁Timor 1 +▁?" 
1 +▁remédier 1 +midi 1 +▁revendication 1 +▁effectifs 1 +▁Lake 1 +▁complément 1 +ual 1 +▁Liban 1 +▁veille 1 +▁affirmé 1 +▁savent 1 +▁01 1 +▁témoigne 1 +▁Christ 1 +▁frein 1 +▁AU 1 +▁universel 1 +▁dépit 1 +▁rechange 1 +argument 1 +▁Fin 1 +▁positifs 1 +▁obstacle 1 +▁collectif 1 +▁exemplaires 1 +III 1 +▁from 1 +influence 1 +▁étendu 1 +▁fabriqué 1 +▁voulait 1 +▁tiennent 1 +▁PCT 1 +▁résident 1 +▁63 1 +▁affiche 1 +▁1980 1 +▁révèle 1 +▁océans 1 +▁opérationnelle 1 +graphe 1 +▁comprenant 1 +▁2002-2003 1 +pour 1 +▁donnant 1 +▁navire 1 +illant 1 +▁voisin 1 +▁stockage 1 +▁ethniques 1 +▁correctement 1 +▁exempt 1 +pdf 1 +érée 1 +▁universitaires 1 +2000 1 +▁offrant 1 +étape 1 +▁composante 1 +assembl 1 +▁réputé 1 +▁renforcé 1 +▁Ki 1 +▁Deuxième 1 +opéra 1 +Égypte 1 +tter 1 +▁langage 1 +▁valable 1 +time 1 +accélérer 1 +vir 1 +Votre 1 +invention 1 +▁associées 1 +▁similaire 1 +▁1985 1 +▁démontre 1 +▁entretien 1 +▁ton 1 +▁conduire 1 +▁représenter 1 +homologation 1 +▁Néanmoins 1 +▁boîte 1 +▁rétro 1 +▁externes 1 +▁peau 1 +▁fiscal 1 +▁diffuser 1 +acqu 1 +rand 1 +▁amélioré 1 +▁automobile 1 +cup 1 +fort 1 +encourager 1 +▁bénéficient 1 +▁produisent 1 +▁restreint 1 +▁billet 1 +▁coordonn 1 +▁critère 1 +▁essayé 1 +▁thématique 1 +tine 1 +▁autorité 1 +▁Outre 1 +▁coin 1 +anglais 1 +.2.1 1 +▁bénéficiaire 1 +stitution 1 +▁chronique 1 +▁amélioration 1 +▁attendu 1 +▁monnaie 1 +▁option 1 +▁exclusivement 1 +▁mut 1 +agissait 1 +▁Musée 1 +▁ratifié 1 +▁attitude 1 +▁qualification 1 +▁remarque 1 +▁Statut 1 +▁reconnue 1 +excellence 1 +▁Bruxelles 1 +▁aperçu 1 +▁scénario 1 +▁déficit 1 +▁rétablissement 1 +▁réduite 1 +▁spécialisés 1 +2001 1 +▁TPS 1 +▁° 1 +▁recensement 1 +▁by 1 +▁renouvellement 1 +▁préférence 1 +page 1 +▁violent 1 +▁0,0 1 +▁compétente 1 +▁tableaux 1 +▁désignée 1 +▁style 1 +▁Adoption 1 +▁progresser 1 +▁interprétation 1 +estime 1 +utilité 1 +arg 1 +prop 1 +▁retenu 1 +1998 1 +▁troubles 1 +▁participe 1 +▁officieuses 1 +plan 1 +▁garçons 1 +▁magasin 1 +▁concret 1 +▁tort 1 +▁efficacité 1 +observateur 1 +▁POUR 1 +vage 1 +Agriculture 1 +▁pierre 1 +çons 1 +▁intra 1 +Indonésie 1 +assemblée 1 +oeuvre 1 +articul 1 +issé 1 +▁Jeux 1 +ssons 1 +▁école 1 +▁printemps 1 +▁média 1 +▁GRC 1 +oxyde 1 +▁rappel 1 +▁CNUCED 1 +ffin 1 +▁égal 1 +▁Vol 1 +ji 1 +▁argent 1 +▁courte 1 +▁priv 1 +▁voyageurs 1 +résolution 1 +tect 1 +▁désignation 1 +Autorité 1 +▁fasse 1 +Pro 1 +▁passée 1 +▁front 1 +▁chute 1 +▁cohésion 1 +▁célèbre 1 +▁paysage 1 +▁dignité 1 +▁Ben 1 +▁suscite 1 +▁citoyen 1 +▁envoyer 1 +▁appar 1 +-07 1 +▁saine 1 +velopp 1 +▁verser 1 +▁quotidien 1 +▁répand 1 +▁définit 1 +/00 1 +▁externe 1 +▁touchés 1 +▁Portugal 1 +▁rédigé 1 +▁responsabilis 1 +ÉS 1 +▁vide 1 +▁Revenu 1 +tude 1 +▁quasi 1 +▁organisationnel 1 +▁intéressante 1 +▁Leur 1 +tech 1 +thérapie 1 +▁virtuel 1 +▁handicap 1 +LC 1 +▁vécu 1 +▁écosystèmes 1 +▁vendre 1 +tudi 1 +▁Quant 1 +dg 1 +▁réunir 1 +▁éducatif 1 +▁Accord 1 +▁richesse 1 +▁coeur 1 +▁supprimer 1 +▁plastique 1 +▁vin 1 +OTAN 1 +▁Op 1 +▁concurrentiel 1 +▁suivie 1 +AND 1 +acceptation 1 +▁Danemark 1 +incertitude 1 +▁réglé 1 +▁Tél 1 +▁étions 1 +▁suffit 1 +▁trouverez 1 +▁Washington 1 +▁suggéré 1 +▁voté 1 +▁Règle 1 +▁frère 1 +hydr 1 +▁contraintes 1 +▁(3) 1 +rence 1 +▁future 1 +infraction 1 +▁trimestre 1 +▁vouloir 1 +admissibilité 1 +exploration 1 +▁effectif 1 +rap 1 +▁accéder 1 +▁Chypre 1 +▁attribuable 1 +▁symbole 1 +▁téléphonique 1 +▁administré 1 +▁occidentale 1 +▁terminer 1 +▁agences 1 +▁plage 1 +▁Imp 1 +copie 1 +▁tend 1 +▁atelier 1 +▁limitation 1 +▁partiel 1 +▁Public 1 +▁conformer 1 +Zélande 1 +▁Corporation 1 +▁Résumé 1 +▁discriminatoire 1 
+▁sensibiliser 1 +▁visa 1 +échantillon 1 +▁Durant 1 +▁bref 1 +▁développe 1 +▁refusé 1 +▁pouvais 1 +▁dotation 1 +▁acheté 1 +ney 1 +▁sentir 1 +UC 1 +▁requise 1 +▁diagnostic 1 +▁Science 1 +organiser 1 +oubli 1 +africain 1 +IX 1 +UV 1 +▁légères 1 +▁absolu 1 +▁race 1 +▁étudiant 1 +▁compatible 1 +▁illustre 1 +▁prà 1 +▁content 1 +thèse 1 +▁mensuel 1 +▁spécialement 1 +htm 1 +Afghanistan 1 +▁détermine 1 +▁observateurs 1 +avancement 1 +▁domestique 1 +tivité 1 +▁pourrions 1 +▁cal 1 +ctor 1 +▁Fl 1 +identifier 1 +▁subsidiaire 1 +▁George 1 +▁Publications 1 +envoi 1 +▁hôte 1 +▁Résultats 1 +OSCE 1 +appelante 1 +▁strict 1 +▁organiser 1 +▁rencontrer 1 +▁Pal 1 +▁coordonnateur 1 +▁clientèle 1 +▁acheter 1 +▁Chi 1 +ball 1 +▁Pren 1 +▁chimique 1 +▁profondeur 1 +▁arrière 1 +implication 1 +▁souhaiter 1 +ographique 1 +▁Sierra 1 +û 1 +▁malade 1 +▁aliment 1 +suite 1 +▁magnifique 1 +® 1 +▁promoteur 1 +intervenant 1 +▁Consul 1 +▁Londres 1 +▁Gal 1 +▁Wal 1 +▁déposer 1 +invitation 1 +▁UN 1 +▁diplôme 1 +Argentine 1 +▁quarante 1 +▁disparu 1 +▁prenantes 1 +événement 1 +▁fixer 1 +▁DG 1 +essence 1 +▁CANADA 1 +▁cultiv 1 +▁acquise 1 +▁citer 1 +occupent 1 +guer 1 +HA 1 +▁démontrer 1 +Espagne 1 +▁fermé 1 +▁potable 1 +▁révél 1 +▁gestionnaire 1 +▁méta 1 +intégrer 1 +▁resté 1 +▁remis 1 +▁prie 1 +▁disparition 1 +▁connaissent 1 +▁douce 1 +▁Travail 1 +Israël 1 +augmenter 1 +▁décrire 1 +▁néglige 1 +▁encourageant 1 +▁Espagne 1 +▁distincte 1 +▁imprimé 1 +▁Hotel 1 +▁maternelle 1 +▁détection 1 +▁agriculteurs 1 +▁touristique 1 +▁palestinienne 1 +▁stock 1 +ÉE 1 +▁viande 1 +expiration 1 +obtention 1 +▁chap 1 +▁disque 1 +▁· 1 +Canada 1 +▁entrepreneurs 1 +ASS 1 +▁HCR 1 +/10 1 +administrateur 1 +importateur 1 +envoyer 1 +▁vendredi 1 +▁concentr 1 +dd 1 +aurait 1 +▁Rome 1 +▁croissant 1 +▁conviction 1 +interaction 1 +▁tente 1 +▁Bulgarie 1 +▁faisaient 1 +mail 1 +▁TV 1 +▁atmosphérique 1 +▁ouvrir 1 +▁certification 1 +▁plat 1 +▁Ob 1 +▁transmet 1 +▁98 1 +▁circonscription 1 +▁recouvrement 1 +entremise 1 +▁disposé 1 +▁constamment 1 +▁Gar 1 +▁administrateurs 1 +▁témoignage 1 +▁justifié 1 +▁arbres 1 +▁Sécurité 1 +▁demeurent 1 +▁répression 1 +▁vieille 1 +régional 1 +▁intensif 1 +▁recul 1 +▁prime 1 +option 1 +▁conducteur 1 +▁menacé 1 +%) 1 +▁ajoute 1 +▁provoque 1 +▁Pat 1 +▁leçon 1 +▁mineur 1 +▁tué 1 +▁Européen 1 +▁répondant 1 +▁retirer 1 +▁via 1 +▁procureur 1 +-09 1 +exercer 1 +▁leader 1 +▁Pêches 1 +oïde 1 +▁universitaire 1 +huile 1 +▁étage 1 +▁vêtements 1 +▁jardin 1 +▁Jusqu 1 +▁voter 1 +▁civilisation 1 +▁Peter 1 +▁préfér 1 +▁espérons 1 +▁éloigné 1 +étiquette 1 +▁immédiate 1 +▁orientale 1 +▁explicite 1 +▁affiché 1 +▁arrangements 1 +▁conservé 1 +incendie 1 +ception 1 +▁profond 1 +▁perdre 1 +▁destinée 1 +▁consultant 1 +glo 1 +▁saisir 1 +▁tax 1 +▁défendre 1 +▁vois 1 +ège 1 +OMM 1 +/58/ 1 +▁échantillon 1 +culture 1 +▁compléter 1 +▁corporel 1 +artiste 1 +Ambassadeur 1 +▁réfléchi 1 +▁justifier 1 +ajouter 1 +▁identité 1 +▁utilis 1 +▁Bell 1 +▁télécopie 1 +pression 1 +▁connexion 1 +▁émerge 1 +▁Ali 1 +OUR 1 +▁ram 1 +▁achats 1 +▁morceau 1 +path 1 +▁conversation 1 +▁psychologique 1 +▁Deuxièmement 1 +▁Suite 1 +▁démo 1 +inspire 1 +▁consommateur 1 +▁évolue 1 +épidémie 1 +▁Géorgie 1 +▁Finances 1 +▁évoqué 1 +▁fondation 1 +▁hôpitaux 1 +▁personnalité 1 +▁enceinte 1 +▁empêche 1 +▁Mat 1 +accorder 1 +▁sanction 1 +▁attaché 1 +TVH 1 +Rev 1 +chev 1 +▁retiré 1 +▁alloué 1 +▁polluants 1 +▁repas 1 +▁amp 1 +expédition 1 +sseur 1 +▁Croatie 1 +▁Leone 1 +▁lundi 1 +▁anglaise 1 +▁facilité 1 +▁2.1 1 +▁Chili 1 +▁Serbie 1 +▁commandant 1 +▁associ 1 +tron 1 +▁Communiqué 1 +instaurer 1 +nouveau 1 +OT 1 
+▁autorise 1 +▁Océans 1 +▁TPSGC 1 +initi 1 +▁Dès 1 +▁péri 1 +atteign 1 +▁événement 1 +▁Kenya 1 +▁signaler 1 +▁sauvegarde 1 +ä 1 +▁génie 1 +invent 1 +▁rendent 1 +ONT 1 +national 1 +▁Palestine 1 +UNI 1 +▁confié 1 +▁classement 1 +▁Procureur 1 +▁parfaitement 1 +gion 1 +▁Hongrie 1 +▁automatiquement 1 +▁dégage 1 +▁lucratif 1 +ingénieur 1 +▁tendant 1 +duire 1 +arbre 1 +▁gravité 1 +extension 1 +▁disait 1 +▁espère 1 +sectoriel 1 +tract 1 +▁Dia 1 +▁clôture 1 +▁récolte 1 +▁vend 1 +▁répondent 1 +▁(2001) 1 +▁Seul 1 +▁apparaît 1 +▁légèrement 1 +▁automatique 1 +▁mille 1 +▁Cabinet 1 +▁Malheureusement 1 +▁suiv 1 +▁noire 1 +▁vieux 1 +▁armé 1 +ok 1 +EU 1 +▁Trois 1 +▁comparable 1 +▁Lisbonne 1 +▁contiennent 1 +▁côtière 1 +ouvrir 1 +▁vague 1 +▁Research 1 +▁coopérative 1 +▁valide 1 +▁servant 1 +glement 1 +▁censé 1 +acheteur 1 +▁précédemment 1 +▁équilibré 1 +▁distingue 1 +▁3.1 1 +nor 1 +▁Bay 1 +ulf 1 +▁coupable 1 +▁2005-2006 1 +▁obligé 1 +▁déposant 1 +Alliance 1 +▁législative 1 +tag 1 +étal 1 +▁Général 1 +▁café 1 +▁investisseurs 1 +▁4.1 1 +onique 1 +▁rayon 1 +▁vulnérabilité 1 +Г 1 +▁séquence 1 +▁souffre 1 +2,5 1 +▁plafond 1 +/59/ 1 +adresser 1 +▁antérieur 1 +▁intégrante 1 +▁souveraineté 1 +▁thé 1 +▁Sub 1 +▁parvenu 1 +▁comm 1 +coup 1 +▁profondément 1 +▁métaux 1 +▁tombe 1 +ated 1 +abus 1 +▁aéronefs 1 +ordonnance 1 +▁stupéfiant 1 +▁synthèse 1 +▁Post 1 +ACDI 1 +▁Maroc 1 +▁Nunavut 1 +ssaient 1 +▁dépendance 1 +▁incluant 1 +▁éventuellement 1 +orateur 1 +▁bateau 1 +satisf 1 +▁bateaux 1 +▁catastrophe 1 +▁apparent 1 +▁souffrance 1 +▁poussé 1 +aura 1 +phi 1 +▁Fort 1 +▁péril 1 +▁carré 1 +▁salarié 1 +▁créancier 1 +▁journaux 1 +▁Sha 1 +▁hydro 1 +▁Att 1 +▁lecteur 1 +▁tolérance 1 +▁évidemment 1 +▁suspension 1 +▁faux 1 +▁significatif 1 +objection 1 +▁affecte 1 +exige 1 +▁routier 1 +▁accepte 1 +▁suppos 1 +▁beau 1 +▁exploité 1 +▁120 1 +▁Conseiller 1 +▁indirect 1 +▁Limited 1 +100 1 +▁dirige 1 +▁courrier 1 +identifi 1 +▁demeur 1 +1999 1 +▁imposées 1 +▁transformé 1 +admission 1 +▁1986 1 +▁combler 1 +▁colonie 1 +▁italien 1 +212) 1 +cription 1 +▁recens 1 +enseignant 1 +▁Rwanda 1 +échéance 1 +▁radical 1 +▁prioritaire 1 +▁Mark 1 +▁quitter 1 +▁emprunt 1 +▁gènes 1 +▁vraie 1 +ignant 1 +‰ 1 +▁minoritaire 1 +/60/ 1 +trans 1 +▁attendre 1 +▁pertinence 1 +▁concrète 1 +▁puni 1 +▁relèvent 1 +▁revient 1 +Enquête 1 +épi 1 +épargne 1 +▁exceptionnelle 1 +lib 1 +▁progressivement 1 +▁modernisation 1 +ambassade 1 +efforce 1 +▁accro 1 +▁pourriez 1 +▁diminuer 1 +▁Décide 1 +▁cumul 1 +▁requ 1 +Commission 1 +▁religieux 1 +▁fausse 1 +▁manger 1 +▁surpris 1 +▁ressource 1 +assainissement 1 +▁portable 1 +▁grossesse 1 +ATIONS 1 +liquer 1 +inclusion 1 +▁blessé 1 +▁constituée 1 +Ottawa 1 +▁iii 1 +▁Tim 1 +LES 1 +▁Australie 1 +▁commencent 1 +▁bruit 1 +▁attaques 1 +▁constat 1 +▁Contact 1 +▁assumer 1 +MIN 1 +arbitrage 1 +▁privilège 1 +▁meurt 1 +▁surmonter 1 +▁reconnaissant 1 +▁bientôt 1 +▁MPO 1 +/11 1 +▁assisté 1 +▁2.2 1 +▁basse 1 +▁divulgation 1 +char 1 +▁cellule 1 +▁dollar 1 +▁officiellement 1 +ARI 1 +▁renforcée 1 +Yougoslavie 1 +▁fiducie 1 +introduire 1 +arrivée 1 +▁poursuit 1 +▁arrivée 1 +▁criminelle 1 +▁décharge 1 +oppose 1 +▁probabilité 1 +▁décideurs 1 +▁Winnipeg 1 +▁détruit 1 +▁avancer 1 +annonce 1 +▁témoin 1 +▁territoriaux 1 +▁Côte 1 +ink 1 +▁Adresse 1 +▁Kar 1 +▁TRANS 1 +▁reproduit 1 +▁publier 1 +empêcher 1 +haus 1 +▁modifiant 1 +▁attirer 1 +Président 1 +▁confirme 1 +élargir 1 +▁Introduction 1 +▁ménages 1 +INS 1 +▁Indiens 1 +▁INC 1 +▁Membre 1 +▁analysé 1 +▁BCE 1 +▁féminine 1 +▁précoce 1 +▁fournissant 1 +▁mobilisation 1 +vironnementales 1 +icité 1 +ECT 1 
+▁mécanique 1 +▁possession 1 +▁pertinente 1 +ÉT 1 +▁Michael 1 +▁Financement 1 +▁sommaire 1 +în 1 +▁Aussi 1 +tisme 1 +▁Commentaires 1 +▁Cinquième 1 +MR 1 +verbal 1 +▁Prix 1 +▁voudra 1 +▁fenêtre 1 +sphère 1 +▁Somalie 1 +▁décrites 1 +▁plaque 1 +▁Commissariat 1 +▁aborder 1 +▁informel 1 +▁respectifs 1 +▁confidentiel 1 +appréciation 1 +▁dépistage 1 +▁revanche 1 +▁schéma 1 +▁douleur 1 +▁amené 1 +▁honor 1 +▁spécifiquement 1 +aéronef 1 +équation 1 +▁2003-2004 1 +▁English 1 +▁accueille 1 +▁viabilité 1 +▁cohérente 1 +▁feront 1 +▁vieillissement 1 +ozone 1 +ship 1 +▁dépasser 1 +▁exploiter 1 +▁confirmer 1 +▁Martin 1 +▁(« 1 +▁aquatique 1 +0,00 1 +▁théâtre 1 +μ 1 +▁courage 1 +tendent 1 +Azerbaïdjan 1 +▁bilatéraux 1 +▁963- 1 +▁densité 1 +/57/ 1 +▁touché 1 +agé 1 +1997 1 +Ê 1 +▁Luxembourg 1 +▁déficience 1 +▁escompté 1 +▁inclut 1 +▁analogue 1 +▁communique 1 +▁repr 1 +▁entrave 1 +▁portefeuille 1 +COM 1 +▁couple 1 +▁intervenir 1 +▁faune 1 +ANCE 1 +▁améliorée 1 +▁Partenariat 1 +▁talent 1 +▁considér 1 +▁englobe 1 +allégation 1 +▁possèdent 1 +▁tissu 1 +▁dérivé 1 +▁consenti 1 +▁souris 1 +▁Retour 1 +▁Télécopieur 1 + 1 +▁comptabilité 1 +▁hu 1 +▁simultané 1 +▁jurisprudence 1 +▁ressortissants 1 +▁métier 1 +KO 1 +Pierre 1 +▁ouvre 1 +▁citoyenneté 1 +▁iraquien 1 +▁propice 1 +phone 1 +change 1 +▁Commun 1 +nçant 1 +importantes 1 +▁Application 1 +▁transnationale 1 +horaire 1 +insuffisance 1 +▁supprimé 1 +▁Coût 1 +▁étoiles 1 +▁TIC 1 +▁pourrez 1 +▁tenter 1 +intimé 1 +▁fraude 1 +▁tranche 1 +▁Régime 1 +▁spectre 1 +▁paye 1 +± 1 +▁Fonction 1 +▁revoir 1 +autoriser 1 +▁renforce 1 +▁mathématique 1 +étant 1 +ISS 1 +▁souvenir 1 +atch 1 +acé 1 +faire 1 +mou 1 +▁# 1 +Arctique 1 +▁extrait 1 +▁700 1 +▁Quoi 1 +▁comparé 1 +▁converti 1 +▁présumé 1 +/2001/ 1 +utilise 1 +▁particip 1 +▁soumet 1 +▁croit 1 +▁différ 1 +▁restructuration 1 +▁camion 1 +▁libellé 1 +▁spécifié 1 +▁vacances 1 +▁prévoient 1 +ERS 1 +éliminer 1 +▁opposé 1 +▁charbon 1 +▁déclin 1 +incapacité 1 +▁recruté 1 +AGE 1 +▁Premièrement 1 +▁refuser 1 +méthyl 1 +▁James 1 +▁restant 1 +▁retrouver 1 +▁march 1 +▁forestier 1 +▁déclarer 1 +DH 1 +/61/ 1 +▁grosse 1 +ford 1 +— 1 +ENCE 1 +▁Sommaire 1 +intensité 1 +▁Secteur 1 +agression 1 +lio 1 +ncée 1 +▁calculer 1 +▁délivrance 1 +inverse 1 +▁Bern 1 +▁bain 1 +appuyant 1 +▁vérifié 1 +▁brûl 1 +▁délit 1 +mettez 1 +installe 1 +Année 1 +▁prospérité 1 +▁déten 1 +▁bâti 1 +▁Koweït 1 +▁compliqué 1 +▁mentionnées 1 +▁passeport 1 +▁gratuitement 1 +▁varié 1 +▁marginal 1 +agne 1 +.1.1 1 +troisième 1 +▁diplômé 1 +▁zéro 1 +▁Territoires 1 +▁féliciter 1 +▁menu 1 +▁libéralisation 1 +▁légal 1 +(3) 1 +▁Hol 1 +▁OK 1 +▁Consultations 1 +▁guéri 1 +▁signer 1 +▁correspondance 1 +rifi 1 +région 1 +WG 1 +deuxième 1 +▁recourir 1 +▁photographie 1 +/12 1 +▁World 1 +▁félicité 1 +▁marcher 1 +▁administr 1 +source 1 +▁correction 1 +fusion 1 +▁retourner 1 +▁Calgary 1 +▁cérémonie 1 +▁originaire 1 +interprète 1 +▁indien 1 +▁ressenti 1 +ELLE 1 +▁insiste 1 +▁visiter 1 +▁correspondent 1 +.3.1 1 +▁météorologique 1 +Homme 1 +▁définitif 1 +▁préservation 1 +Ordre 1 +▁observe 1 +▁analytique 1 +▁royale 1 +▁descriptif 1 +▁député 1 +Commissariat 1 +▁manipul 1 +▁forestière 1 +TES 1 +▁voient 1 +QUE 1 +§ 1 +▁comportant 1 +▁stipule 1 +bal 1 +▁japonais 1 +▁prolongé 1 +▁réconciliation 1 +▁levée 1 +bré 1 +▁directrice 1 +▁fiche 1 +▁spectacle 1 +▁Journée 1 +▁biotechnologie 1 +▁préféré 1 +▁afférent 1 +atténuation 1 +▁réagir 1 +Qu 1 +graphie 1 +▁chien 1 +▁vaut 1 +▁compétition 1 +▁inconnu 1 +inflation 1 +alerte 1 +▁imparti 1 +▁sauver 1 +archi 1 +▁exemplaire 1 +envergure 1 +▁Quelques 1 +▁effectue 1 +POS 1 
+▁paquet 1 +▁(1999) 1 +▁réussir 1 +▁orateurs 1 +▁analyser 1 +▁posent 1 +▁établissant 1 +investir 1 +▁vital 1 +▁Prend 1 +ignore 1 +▁Slovénie 1 +▁fardeau 1 +/2004/ 1 +▁exiger 1 +bio 1 +▁validité 1 +bec 1 +▁familial 1 +▁approfondi 1 +{ 1 +▁réaffirme 1 +▁versées 1 +▁enseignements 1 +▁écrire 1 +▁rive 1 +▁librement 1 +institutionnelle 1 +Com 1 +▁Google 1 +▁inutile 1 +UD 1 +▁concurrent 1 +▁tap 1 +▁cardiaque 1 +▁sensiblement 1 +artisan 1 +dium 1 +▁marketing 1 +▁mobiliser 1 +/01 1 +▁assorti 1 +▁circuit 1 +▁contemporain 1 +istique 1 +▁Mesdames 1 +▁décor 1 +▁SCT 1 +Laurent 1 +▁CRTC 1 +▁serveur 1 +▁supervision 1 +entraînement 1 +▁kilo 1 +incitation 1 +Comité 1 +▁consacrer 1 +chloro 1 +/62/ 1 +intervalle 1 +▁conclut 1 +▁bienfaisance 1 +ASFC 1 +▁femelle 1 +▁pot 1 +autant 1 +▁sucre 1 +▁Coopération 1 +quatrième 1 +INE 1 +~ 1 +/2000/ 1 +▁arguments 1 +▁roche 1 +▁immédiat 1 +▁raconter 1 +▁complété 1 +Algérie 1 +▁prononcée 1 +accomplir 1 +▁baie 1 +▁sélectionné 1 +▁dérogation 1 +▁Type 1 +étudiant 1 +ING 1 +▁artificiel 1 +Edmonton 1 +▁molécule 1 +▁reprendre 1 +} 1 +▁finir 1 +▁provoqué 1 +▁facture 1 +▁abandonné 1 +accusation 1 +atteinte 1 +▁mâle 1 +▁contraignant 1 +▁exportateurs 1 +▁recueil 1 +accident 1 +▁feux 1 +▁1984 1 +▁internet 1 +▁prononcer 1 +▁interprété 1 +Ayant 1 +pel 1 +▁touchées 1 +lip 1 +▁Messieurs 1 +▁procède 1 +▁réservoir 1 +▁tombé 1 +▁clef 1 +oxygène 1 +▁formelle 1 +▁multilatéraux 1 +▁Étude 1 +informer 1 +▁concentré 1 +inspecteur 1 +▁appartient 1 +▁symptômes 1 +équipage 1 +▁décisionnel 1 +▁procurer 1 +annulation 1 +▁bénéficie 1 +▁dotée 1 +▁compl 1 +▁minière 1 +▁Development 1 +Ouganda 1 +▁immigrants 1 +bourg 1 +▁métro 1 +▁pertinent 1 +▁William 1 +▁sécuritaire 1 +▁vérificateur 1 +▁wh 1 +▁harmonisé 1 +explorer 1 +pass 1 +▁VII 1 +▁vendeur 1 +▁prononcé 1 +▁stimuler 1 +utili 1 +▁salue 1 +▁électeurs 1 +.2.2 1 +▁Han 1 +▁commandement 1 +▁Durée 1 +rup 1 +cinquième 1 +▁Tour 1 +erreur 1 +▁véritablement 1 +embauche 1 +emprisonnement 1 +▁blanchiment 1 +métrie 1 +▁Richard 1 +étiquetage 1 +▁réinsertion 1 +apprendre 1 +▁fabriquer 1 +▁Company 1 +▁prouver 1 +Irak 1 +▁Corr 1 +▁discuté 1 +▁tire 1 +Éthiopie 1 +▁bilingue 1 +▁caméra 1 +▁Titre 1 +gov 1 +▁douane 1 +▁soupçon 1 +▁gagné 1 +▁arbitraire 1 +▁atlantique 1 +▁dégradation 1 +wood 1 +▁combattants 1 +▁heurt 1 +http 1 +▁restriction 1 +▁suffi 1 +▁prolonge 1 +▁MDN 1 +▁cercle 1 +▁Ville 1 +▁Gaza 1 +économique 1 +▁saurait 1 +accessibilité 1 +▁distribuer 1 +▁soulève 1 +attendre 1 +▁souple 1 +▁fermement 1 +▁Bibliothèque 1 +groupe 1 +▁calme 1 +apparition 1 +▁River 1 +▁cellulaire 1 +▁interrogé 1 +▁étudie 1 +▁habilité 1 +▁fur 1 +▁nulle 1 +▁souterrain 1 +outils 1 +attentat 1 +École 1 +▁protège 1 +▁signaux 1 +phosph 1 +agrément 1 +▁exceptionnel 1 +▁Pourtant 1 +UNESCO 1 +▁personnage 1 +▁quitté 1 +▁prendra 1 +bac 1 +Édouard 1 +▁avéré 1 +▁heureuse 1 +▁résume 1 +▁clos 1 +▁remonte 1 +▁proviennent 1 +officier 1 +▁Halifax 1 +▁traduire 1 +▁saumon 1 +▁condamnation 1 +▁piste 1 +▁consistant 1 +▁méthodologie 1 +inquiète 1 +▁canada 1 +▁réexamen 1 +▁furent 1 +/2006/ 1 +/2005/ 1 +achèvement 1 +▁alternative 1 +▁joindre 1 +▁textile 1 +▁indiennes 1 +/2003/ 1 +▁pêcheurs 1 +▁constituant 1 +▁créativité 1 +▁Réserve 1 +▁traditionnel 1 +▁Arrêt 1 +expulsion 1 +XV 1 +vide 1 +pyr 1 +▁regret 1 +▁câble 1 +▁souches 1 +▁toucher 1 +▁neige 1 +EST 1 +▁concession 1 +▁socioéconomique 1 +▁blanche 1 +▁souhaitable 1 +▁adressé 1 +▁détenteur 1 +ONG 1 +▁peinture 1 +immunité 1 +▁prévision 1 +▁Burundi 1 +▁Haïti 1 +▁Guerre 1 +▁inspiré 1 +▁faim 1 +Opération 1 +▁informelle 1 +▁comparativement 1 +▁récepteur 1 +immobilisation 1 
+▁séparation 1 +▁Manuel 1 +eck 1 +▁chirurgie 1 +▁rémunéré 1 +▁structuré 1 +▁ciel 1 +Angleterre 1 +▁Lituanie 1 +▁[...] 1 +▁renferme 1 +invite 1 +▁apparu 1 +essentiel 1 +▁canal 1 +▁fréquemment 1 +▁suppression 1 +▁Aff 1 +▁nutrition 1 +▁faiblesse 1 +▁enseigne 1 +▁créant 1 +▁sauvetage 1 +▁Pérou 1 +▁réalise 1 +▁commentaire 1 +accise 1 +▁malheureusement 1 +inscriv 1 +existait 1 +war 1 +0.00 1 +▁exigent 1 +Annexe 1 +▁rétablir 1 +▁dumping 1 +Ivoire 1 +▁protégées 1 +ILL 1 +▁évolué 1 +▁diriger 1 +▁âgé 1 +▁discut 1 +▁Valeur 1 +< 1 +apprécie 1 +uck 1 +▁demandant 1 +▁subir 1 +▁beauté 1 +hol 1 +formation 1 +valent 1 +▁trace 1 +▁fonctionnent 1 +▁moral 1 +▁sonore 1 +▁dépassé 1 +territorial 1 +▁nettement 1 +intolérance 1 +▁déplacer 1 +▁chanson 1 +▁segment 1 +évolu 1 +▁garantit 1 +▁compétent 1 +clin 1 +▁complémentaire 1 +énoncé 1 +instauration 1 +▁Imaginez 1 +▁délibérations 1 +▁Chacun 1 +▁reporter 1 +PORT 1 +▁Tre 1 +cru 1 +▁collègue 1 +▁Philippines 1 +aviation 1 +▁Costa 1 +▁fonctionnelle 1 +▁typique 1 +▁2008-2009 1 +▁confidentialité 1 +▁fermeture 1 +interface 1 +▁bénéficié 1 +▁Celui 1 +▁vallée 1 +▁(2004) 1 +▁imaginer 1 +EPA 1 +▁fête 1 +▁Nigéria 1 +▁motivation 1 +▁prévoyant 1 +▁souten 1 +entendre 1 +▁Slovaquie 1 +▁condamn 1 +▁Registre 1 +▁transitoire 1 +ORD 1 +▁réside 1 +issons 1 +gri 1 +▁expose 1 +crit 1 +entrevue 1 +▁Macédoine 1 +▁humide 1 +▁Lacs 1 +▁magazine 1 +‘ 1 +▁touchent 1 +▁corriger 1 +▁Sénat 1 +▁bouge 1 +▁Darfour 1 +▁rayonnement 1 +clenche 1 +▁constant 1 +Sud 1 +▁pilier 1 +▁CFP 1 +▁boule 1 +▁circul 1 +▁minute 1 +▁assurant 1 +ë 1 +exemption 1 +imposent 1 +effectif 1 +▁distributeur 1 +▁offices 1 +améliore 1 +asi 1 +▁Compagnie 1 +▁olympique 1 +▁végétal 1 +▁cesser 1 +Article 1 +▁Analyse 1 +▁Contribution 1 +▁problématique 1 +▁troupes 1 +▁recommand 1 +▁cabinet 1 +▁croyance 1 +▁tourner 1 +▁fonctionnel 1 +▁Considérant 1 +▁renvoyé 1 +▁compensation 1 +▁Park 1 +▁agréé 1 +▁__________ 1 +▁Kyoto 1 +▁défavoris 1 +▁favorisant 1 +▁blé 1 +▁éclairé 1 +▁tube 1 +▁Méd 1 +▁convenable 1 +▁doigt 1 +▁routière 1 +▁résidant 1 +▁notable 1 +▁intérim 1 +/55/ 1 +▁afghan 1 +arbitre 1 +harmonie 1 +▁signale 1 +▁puits 1 +ôme 1 +trice 1 +▁réputation 1 +▁fallu 1 +▁fraction 1 +Estonie 1 +▁grec 1 +intelligence 1 +appartenance 1 +▁réaffirmé 1 +▁consacre 1 +▁muni 1 +habitation 1 +work 1 +▁1970 1 +informatique 1 +▁Réponse 1 +▁émanant 1 +▁énonce 1 +▁capture 1 +▁Très 1 +▁redevance 1 +▁assume 1 +▁tomber 1 +▁remarqué 1 +▁motivé 1 +▁Relations 1 +▁attendant 1 +▁Processus 1 +▁impôts 1 +▁excessive 1 +▁Formation 1 +▁courbe 1 +▁1982 1 +aurais 1 +▁démonstration 1 +▁incorporé 1 +▁travailleur 1 +droit 1 +▁jeudi 1 +▁consolider 1 +▁Marc 1 +▁Group 1 +▁haine 1 +▁insulaires 1 +▁signification 1 +Hôtel 1 +▁médiation 1 +▁jaune 1 +▁Question 1 +▁objective 1 +▁libéré 1 +entraîne 1 +▁observ 1 +excédent 1 +▁Victoria 1 +▁précède 1 +/2002/ 1 +▁gars 1 +imagine 1 +▁Américains 1 +▁rentable 1 +▁producteur 1 +▁transférer 1 +▁collège 1 +▁constatations 1 +▁Lettonie 1 +=" 1 +_______ 1 +▁progressive 1 +azote 1 +but 1 +type 1 +invalidité 1 +▁goût 1 +Arménie 1 +▁golf 1 +Rouge 1 +▁Peu 1 +morph 1 +▁doter 1 +Monténégro 1 +▁transfrontière 1 +▁étroit 1 +▁mange 1 +▁reconnaiss 1 +▁ratio 1 +audiovisuel 1 +▁incapable 1 +▁sincère 1 +employer 1 +envisager 1 +Direct 1 +accueillir 1 +▁interactions 1 +▁confort 1 +▁apparaître 1 +▁collabore 1 +▁Report 1 +▁Définition 1 +▁harcèlement 1 +▁munitions 1 +efforcer 1 +adolescent 1 +▁Bangladesh 1 +▁ralenti 1 +▁génial 1 +inclure 1 +▁officiers 1 +ressource 1 +/56/ 1 +▁étrange 1 +▁déduction 1 +▁cigarette 1 +▁Encourage 1 +▁considérant 1 +TRANS 1 +▁dirigeant 
1 +hibit 1 +info 1 +▁Thaïlande 1 +▁fragment 1 +▁partiellement 1 +▁gagne 1 +exactitude 1 +▁fassent 1 +▁indésirable 1 +▁négocier 1 +▁sévère 1 +▁hasard 1 +▁sérieusement 1 +▁occidental 1 +▁Supplément 1 +▁appartement 1 +ough 1 +▁arbitral 1 +▁trompe 1 +▁arriva 1 +lette 1 +CHE 1 +▁précieux 1 +▁diffère 1 +▁réadaptation 1 +▁précieuse 1 +Colombie 1 +mond 1 +emporte 1 +mic 1 +▁Travaux 1 +▁commémor 1 +▁masculin 1 +▁pensais 1 +▁biologie 1 +▁vivement 1 +▁2000-2001 1 +▁émet 1 +insolvabilité 1 +▁documentaire 1 +▁tourne 1 +plus 1 +▁douanière 1 +▁récompense 1 +▁souplesse 1 +▁Transport 1 +▁transaction 1 +▁Louis 1 +▁libéral 1 +arrestation 1 +▁durabilité 1 +▁plomb 1 +▁sanguin 1 +Industrie 1 +▁pluie 1 +▁Promotion 1 +▁terrible 1 +▁infecté 1 +▁frappe 1 +▁dixième 1 +▁prépare 1 +/63/ 1 +▁Directrice 1 +▁pénétr 1 +▁dépression 1 +ó 1 +▁Palestiniens 1 +▁minorité 1 +1995 1 +▁justifie 1 +▁Bulletin 1 +▁restauration 1 +ward 1 +▁cessation 1 +▁Madrid 1 +▁antiterroriste 1 +▁hommage 1 +▁contenir 1 +▁grief 1 +▁Contactez 1 +hospital 1 +▁biodiversité 1 +▁retenir 1 +anthrop 1 +▁réaliste 1 +échantillonnage 1 +▁Off 1 +▁PMA 1 +▁suppl 1 +▁divisé 1 +▁PNUE 1 +▁prisonniers 1 +▁prenez 1 +▁privilégié 1 +exploitant 1 +▁matérielle 1 +église 1 +▁déplacé 1 +▁mardi 1 +▁Singapour 1 +/2008/ 1 +▁sorti 1 +affirmation 1 +▁paludisme 1 +▁recyclage 1 +▁amener 1 +hydro 1 +▁souhaité 1 +appendice 1 +▁Paiements 1 +▁annulé 1 +▁maître 1 +▁Radio 1 +▁remet 1 +▁coefficient 1 +▁rigoureuse 1 +lève 1 +accéder 1 +script 1 +▁substantielle 1 +▁utilisons 1 +▁0,1 1 +▁Society 1 +▁(2000) 1 +▁State 1 +uj 1 +▁mettra 1 +▁confusion 1 +▁Hong 1 +▁projection 1 +▁vivons 1 +▁viens 1 +▁caisse 1 +▁dispers 1 +▁spécialiste 1 +inscrit 1 +▁Crédit 1 +valuation 1 +élevage 1 +écosystème 1 +▁1,5 1 +▁pluriannuel 1 +▁prélèvement 1 +▁tenté 1 +▁léger 1 +▁préparatifs 1 +▁promesse 1 +▁roulant 1 +▁nuage 1 +benz 1 +▁suisse 1 +▁Libéria 1 +▁ultra 1 +adore 1 +▁roman 1 +abilit 1 +▁primordial 1 +▁sécheresse 1 +▁concertée 1 +▁électoral 1 +▁Rica 1 +▁fiscaux 1 +▁imprimable 1 +▁théorique 1 +▁invoqué 1 +▁assister 1 +2010 1 +hygiène 1 +▁fauteuil 1 +▁conduis 1 +▁that 1 +▁subséquent 1 +▁Principes 1 +▁jambe 1 +iya 1 +▁allié 1 +▁ressembl 1 +auraient 1 +ogène 1 +harmoniser 1 +▁compassion 1 +▁porc 1 +▁diabète 1 +▁imprimer 1 +Arabie 1 +▁guère 1 +▁croient 1 +éclairage 1 +▁filtre 1 +▁bonheur 1 +▁mercure 1 +▁sportif 1 +▁vive 1 +▁suspendu 1 +▁Figure 1 +▁CRDI 1 +▁inégalités 1 +▁prostitution 1 +▁débiteur 1 +▁correctionnel 1 +▁manifestation 1 +▁Mouvement 1 +▁retombées 1 +▁certifié 1 +▁détecter 1 +▁Médiateur 1 +défini 1 +▁parlons 1 +▁noyau 1 +▁prévisible 1 +▁débouché 1 +enveloppe 1 +version 1 +▁subsistance 1 +▁Island 1 +efficience 1 +▁pénitentiaire 1 +▁Malaisie 1 +ABLE 1 +hydrogène 1 +▁voulaient 1 +▁suspect 1 +▁program 1 +IER 1 +▁Canadien 1 +actualité 1 +▁FNUAP 1 +▁déclarant 1 +FORM 1 +▁découvr 1 +▁honnête 1 +▁inacceptable 1 +▁reconnaissent 1 +▁Lieu 1 +▁mercredi 1 +▁stabilisation 1 +▁justification 1 +Albanie 1 +▁mourir 1 +▁demandons 1 +▁déployé 1 +▁frappé 1 +▁maximal 1 +▁repris 1 +▁brochure 1 +AIEA 1 +▁précipit 1 +▁Royal 1 +stein 1 +▁convaincre 1 +▁disparaît 1 +▁négocié 1 +▁pesticides 1 +▁déroulé 1 +▁enrichi 1 +IÈRE 1 +▁Tarif 1 +extr 1 +▁Présentation 1 +▁configuration 1 +▁littoral 1 +▁contrepartie 1 +▁agréable 1 +▁filiale 1 +session 1 +▁Carte 1 +▁nommer 1 +algorithme 1 +innocuité 1 +▁Bélarus 1 +▁Technologie 1 +▁bénéfique 1 +▁crédibilité 1 +▁Smith 1 +▁payable 1 +▁laissez 1 +hop 1 +iversification 1 +▁chauffage 1 +▁référendum 1 +▁préventive 1 +▁trouvant 1 +troph 1 +impunité 1 +▁pédagogique 1 +▁retrouvé 1 +▁réservation 1 +▁URL 
1 +▁progression 1 +enlèvement 1 +well 1 +▁Produits 1 +▁quiconque 1 +▁flotte 1 +Armée 1 +▁normalisation 1 +pac 1 +− 1 +âme 1 +▁chevauch 1 +▁différemment 1 +▁permanence 1 +▁bataille 1 +▁Water 1 +▁ratifier 1 +▁Sept 1 +▁pareil 1 +▁yougoslave 1 +▁fixation 1 +▁atténuer 1 +▁usées 1 +▁visuel 1 +essor 1 +▁Michel 1 +▁Tru 1 +▁grippe 1 +▁scolarité 1 +▁xénophobie 1 +▁métal 1 +▁Lanka 1 +▁oublier 1 +▁sensibilité 1 +▁planifier 1 +Organisme 1 +▁moléculaire 1 +▁Où 1 +▁club 1 +▁1981 1 +▁normalisé 1 +employ 1 +▁mutation 1 +▁modifie 1 +▁infantile 1 +▁récapitul 1 +▁souscrit 1 +UNE 1 +▁Participation 1 +▁bouche 1 +▁System 1 +▁Venezuela 1 +▁Chris 1 +▁faculté 1 +▁technicien 1 +▁livré 1 +/2007/ 1 +▁perfection 1 +▁paie 1 +▁confortable 1 +hébergement 1 +▁génome 1 +▁pomme 1 +▁conversion 1 +▁exclusive 1 +▁éliminé 1 +▁lendemain 1 +▁retourné 1 +Musique 1 +octobre 1 +▁fruit 1 +▁impliquant 1 +▁Séance 1 +oreille 1 +▁Garde 1 +▁flexibilité 1 +▁néerlandais 1 +▁imposable 1 +▁délicat 1 +▁Élection 1 +▁Invite 1 +▁loisirs 1 +pêche 1 +▁termine 1 +▁Tunisie 1 +color 1 +01-00 1 +EAU 1 +▁triste 1 +▁pénurie 1 +▁archives 1 +▁Appel 1 +▁réfugié 1 +accomplissement 1 +Nouvelle 1 +▁Référence 1 +▁confirmation 1 +▁millénaire 1 +▁répertoire 1 +▁Wood 1 +▁visuelle 1 +▁douze 1 +▁Actuellement 1 +▁Sénégal 1 +▁littéralement 1 +▁accéléré 1 +▁Ghana 1 +▁substitut 1 +▁gère 1 +méditerranéen 1 +▁constructeur 1 +▁précité 1 +▁infirmières 1 +addition 1 +▁évalue 1 +IDE 1 +utodétermination 1 +▁inévitable 1 +▁Contrôle 1 +▁inhérent 1 +▁oriental 1 +▁immobiliers 1 +▁fiabilité 1 +▁Guatemala 1 +▁corrélation 1 +▁maternité 1 +▁récupération 1 +ón 1 +hélicoptère 1 +▁commença 1 +▁SERVICES 1 +▁catalogue 1 +▁asiatique 1 +Éducation 1 +▁instituts 1 +▁ravi 1 +▁Malte 1 +▁actualisé 1 +▁libanais 1 +▁semblait 1 +▁DANS 1 +▁incompatible 1 +▁indépendamment 1 +▁Africa 1 +▁impératif 1 +▁Situation 1 +▁(2005) 1 +CONF 1 +burg 1 +▁généré 1 +Habitat 1 +▁élabore 1 +▁Beijing 1 +▁Combien 1 +▁silence 1 +▁Mots 1 +▁Doha 1 +▁créateur 1 +▁portion 1 +▁suscité 1 +▁Jordanie 1 +▁obtient 1 +▁écho 1 +▁discrétion 1 +▁prenons 1 +Entente 1 +▁Organ 1 +▁roue 1 +▁Index 1 +▁sérieuse 1 +▁vraisemblable 1 +unanimité 1 +▁aînés 1 +▁abordable 1 +▁Jérusalem 1 +▁génocide 1 +ONUDI 1 +▁Global 1 +▁Moldova 1 +▁foncier 1 +▁vétérinaire 1 +▁syrienne 1 +▁publie 1 +section 1 +▁Vérification 1 +▁2001-2002 1 +architecte 1 +▁Quatre 1 +▁rationnelle 1 +▁Puisque 1 +extradition 1 +Quatrième 1 +▁tuberculose 1 +▁cyber 1 +▁présidé 1 +▁prélevé 1 +discrimination 1 +▁prudent 1 +▁déroulement 1 +▁cinéma 1 +▁(2002) 1 +alliance 1 +▁modeste 1 +▁0,5 1 +▁ANNEXE 1 +▁Tanzanie 1 +▁synergie 1 +▁opérateurs 1 +▁conviendrait 1 +▁Super 1 +▁prière 1 +▁Cinquante 1 +▁Réaffirmant 1 +▁formidable 1 +▁réchauffement 1 +▁tabagisme 1 +Organe 1 +▁amène 1 +▁strictement 1 +▁légumes 1 +▁respiratoire 1 +▁vélo 1 +▁fuite 1 +▁écouter 1 +____ 1 +▁légère 1 +▁accessoires 1 +▁exerçant 1 +▁manifesté 1 +▁rejoindre 1 +▁constructive 1 +▁découlent 1 +▁desquelles 1 +▁construis 1 +▁aidant 1 +▁envoie 1 +▁conviennent 1 +▁Photo 1 +▁Aperçu 1 +autoroute 1 +▁bienvenue 1 +▁613- 1 +▁boissons 1 +extraction 1 +iii 1 +ingénierie 1 +▁Données 1 +▁soudain 1 +▁FAO 1 +▁exclusif 1 +▁South 1 +▁propagation 1 +▁surprenant 1 +▁sympa 1 +TURE 1 +▁Council 1 +héritage 1 +▁structurels 1 +▁Livre 1 +extrémité 1 +▁Excellence 1 +▁Laissez 1 +▁sœur 1 +▁aéroports 1 +communication 1 +▁pollu 1 +▁contamination 1 +▁anticipé 1 +▁inciter 1 +Angola 1 +service 1 +▁remplace 1 +▁accumulé 1 +▁suédois 1 +édifice 1 +humain 1 +▁indirectement 1 +atténu 1 +▁Californie 1 +▁Fabrication 1 +▁Gouverneur 1 +▁Népal 1 +▁attiré 1 +habilitation 
1 +▁exécutive 1 +▁Bank 1 +▁patron 1 +consult 1 +▁apprécié 1 +▁mémorandum 1 +▁nettoyage 1 +▁athlètes 1 +▁tumeur 1 +▁armements 1 +▁sépare 1 +▁souverain 1 +▁United 1 +▁docteur 1 +▁confère 1 +Commissaire 1 +▁passif 1 +▁Thomas 1 +▁School 1 +allégement 1 +▁Haye 1 +▁gouvern 1 +▁supplément 1 +Uruguay 1 +▁conciliation 1 +▁Contexte 1 +▁exhort 1 +▁immense 1 +▁(2003) 1 +▁acides 1 +........... 1 +▁stipul 1 +▁Maurice 1 +▁Ajout 1 +▁préfère 1 +▁constituait 1 +▁toxicomanie 1 +▁faillite 1 +▁simulation 1 +▁entrevues 1 +accumulation 1 +▁Accès 1 +▁Gazette 1 +▁Regardez 1 +▁exhaustive 1 +▁ordonné 1 +projet 1 +▁renouvelé 1 +▁provient 1 +annuler 1 +Équateur 1 +▁bizarre 1 +▁saoudite 1 +▁Permettez 1 +▁(2006) 1 +▁connecté 1 +aventure 1 +▁constructif 1 +▁éducative 1 +▁fleuve 1 +▁Human 1 +▁Charles 1 +équivalent 1 +animaux 1 +▁Myanmar 1 +▁innovant 1 +entretenir 1 +▁réagi 1 +▁sectorielle 1 +▁minimal 1 +▁matériau 1 +▁adapt 1 +▁énumérés 1 +▁atomique 1 +▁brillant 1 +▁prouvé 1 +▁fabrique 1 +▁XXIe 1 +▁portuaire 1 +▁Établissement 1 +système 1 +occurrence 1 +▁suicide 1 +▁bouteille 1 +▁Street 1 +▁féminin 1 +▁transforme 1 +▁privilégi 1 +▁divergence 1 +▁personnalisé 1 +▁représentatif 1 +▁croyons 1 +▁Ibid 1 +▁Culture 1 +▁enlevé 1 +▁Niveau 1 +▁Nairobi 1 +▁destinataire 1 +▁refléter 1 +▁gare 1 +▁Steve 1 +▁thermique 1 +▁combine 1 +entraide 1 +ñ 1 +▁délibéré 1 +▁Parfois 1 +▁Fond 1 +▁survenus 1 +esclavage 1 +▁merveilleux 1 +▁clandestin 1 +▁bombe 1 +▁cerner 1 +▁caché 1 +▁DEMANDE 1 +▁Frank 1 +▁détruire 1 +▁prévalence 1 +▁générer 1 +▁mutuel 1 +▁Management 1 +▁envahi 1 +document 1 +▁subordonn 1 +avril 1 +▁attaqué 1 +▁convergence 1 +▁explosifs 1 +▁PRÉ 1 +▁accept 1 +affecter 1 +▁Concernant 1 +avertissement 1 +▁desquels 1 +▁certes 1 +▁émotionnel 1 +▁Divers 1 +▁College 1 +▁spirituel 1 +▁diamètre 1 +éprouve 1 +Ø 1 +▁Catégorie 1 +▁batterie 1 +▁muscle 1 +▁barème 1 +▁résidentiel 1 +intro 1 +▁0,2 1 +▁drapeau 1 +▁contractuelle 1 +▁requiert 1 +accréditation 1 +▁Salvador 1 +▁générique 1 +▁panneaux 1 +▁Texte 1 +▁intense 1 +entreposage 1 +▁récit 1 +▁garçon 1 +▁crédible 1 +▁bagage 1 +▁lentement 1 +▁terroriste 1 +▁arme 1 +▁Kazakhstan 1 +▁déployer 1 +▁résidus 1 +▁simplifier 1 +▁inhumains 1 +▁qualifi 1 +▁regrette 1 +▁différend 1 +▁efficient 1 +▁chrétien 1 +APECA 1 +▁infections 1 +▁médiateur 1 +▁circulaire 1 +▁réviser 1 +▁connexe 1 +▁Columbia 1 +▁excessif 1 +▁énormément 1 +▁ingrédients 1 +assure 1 +arrangement 1 +absorption 1 +▁British 1 +▁Finalement 1 +▁cadeau 1 +▁travaillons 1 +excuse 1 +▁évoque 1 +▁devais 1 +▁géant 1 +▁insisté 1 +▁posséder 1 +▁salariale 1 +▁toxicité 1 +▁plateforme 1 +▁consigné 1 +systématiquement 1 +▁orienté 1 +▁inventé 1 +somm 1 +1994 1 +▁Modification 1 +▁2007-2008 1 +▁chèque 1 +▁incluent 1 +▁pandémie 1 +▁ponctuel 1 +▁étonnant 1 +default 1 +▁Syrie 1 +accompagn 1 +exposé 1 +ontrairement 1 +▁clarifier 1 +▁postsecondaire 1 +▁souviens 1 +▁Exemple 1 +▁trouvait 1 +SCIAN 1 +interroge 1 +▁assistant 1 +field 1 +▁Suivant 1 +explosion 1 +▁Parallèlement 1 +▁doctorat 1 +▁fantastique 1 +viendront 1 +▁connaissez 1 +▁connaissons 1 +▁émotions 1 +▁montrant 1 +▁Décennie 1 +▁modélisation 1 +▁poursuivra 1 +affichage 1 +▁venait 1 +▁ISO 1 +▁cherché 1 +cyclo 1 +▁fragile 1 +▁validation 1 +interpréter 1 +▁survivre 1 +▁vapeur 1 +▁Encore 1 +▁devriez 1 +▁décédé 1 +▁Disposition 1 +▁rentabilité 1 +▁verbale 1 +▁apparemment 1 +▁réjouis 1 +▁apprécier 1 +▁prenne 1 +▁philosophie 1 +▁récupérer 1 +▁consolidé 1 +▁visibilité 1 +▁centraux 1 +▁créature 1 +▁guidé 1 +▁Fournir 1 +▁aveugle 1 +empreinte 1 +▁auxiliaire 1 +injection 1 +▁exploitants 1 +▁littérature 1 +▁pouvions 1 
+▁simplifié 1 +▁quantitative 1 +▁neutre 1 +▁hebdomadaire 1 +▁octroyé 1 +accessible 1 +▁productive 1 +▁oublié 1 +Vidéo 1 +conseil 1 +dimension 1 +abondance 1 +▁Objet 1 +▁Zimbabwe 1 +▁relance 1 +▁Center 1 +▁modéré 1 +Église 1 +▁Coordonnateur 1 +▁piscine 1 +▁tactique 1 +altitude 1 +▁redressement 1 +▁linéaire 1 +▁vocation 1 +▁denrées 1 +▁estimons 1 +▁monument 1 +▁effectu 1 +▁Engagements 1 +▁Méthode 1 +▁Cliquez 1 +▁présidentielle 1 +▁Normes 1 +▁Petit 1 +▁calibre 1 +▁rédiger 1 +▁contesté 1 +▁fidèle 1 +▁minéraux 1 +▁aspirations 1 +▁publicitaire 1 +▁rentrer 1 +▁pauvre 1 +institut 1 +écriture 1 +▁corrigé 1 +▁détérioration 1 +▁magnétique 1 +▁Qatar 1 +▁Croix 1 +▁ultime 1 +▁geste 1 +▁chaussures 1 +▁Souligne 1 +▁comptait 1 +économiste 1 +Islande 1 +provincial 1 +▁Collège 1 +▁galaxie 1 +▁reproduire 1 +▁animé 1 +▁Mandat 1 +psych 1 +▁suspens 1 +▁essayons 1 +▁multinationale 1 +▁champignon 1 +▁Central 1 +▁survivant 1 +▁désertification 1 +▁facilitation 1 +▁instauré 1 +▁perturbation 1 +▁consultez 1 +▁Berlin 1 +▁Observations 1 +▁desservi 1 +▁brièvement 1 +▁brève 1 +▁libérer 1 +▁maïs 1 +▁démarrage 1 +aquaculture 1 +▁influencé 1 +▁macroéconomique 1 +▁déploie 1 +▁apparaissent 1 +ã 1 +▁cohérent 1 +▁nutritionnel 1 +▁Network 1 +▁précaution 1 +ï 1 +▁minéral 1 +gouvernement 1 +imagination 1 +▁Copenhague 1 +▁maintenance 1 +▁institué 1 +▁détient 1 +▁spatial 1 +▁souffrant 1 +▁précédant 1 +clamation 1 +▁Utilisation 1 +▁légitimité 1 +▁regroupé 1 +▁Business 1 +▁métallique 1 +▁diffuse 1 +enzyme 1 +ology 1 +▁Présidence 1 +▁réduisant 1 +▁affiliée 1 +▁Bolivie 1 +▁ramener 1 +impulsion 1 +▁compromettre 1 +développement 1 +Érythrée 1 +ˆ 1 +▁réclame 1 +apparence 1 +▁plonge 1 +▁spec 1 +▁couvrant 1 +▁renforçant 1 +▁unilatérale 1 +▁cheval 1 +▁dîner 1 +enjeu 1 +▁cancér 1 +▁usagers 1 +▁attire 1 +▁consécutive 1 +▁cérébral 1 +▁tunnel 1 +▁productif 1 +attrape 1 +# 1 +▁flexible 1 +▁puissions 1 +interruption 1 +▁jouissent 1 +▁convoqué 1 +▁demandait 1 +▁lycée 1 +▁contaminants 1 +▁pathogène 1 +▁Formulaire 1 +▁Conscient 1 +▁interculturel 1 +▁revendiqu 1 +▁refroidi 1 +▁pompe 1 +▁spectaculaire 1 +▁céréales 1 +▁imputable 1 +▁pensait 1 +eizième 1 +glyc 1 +▁dimanche 1 +▁festival 1 +▁baleine 1 +▁joie 1 +▁pâte 1 +▁CONSEIL 1 +▁PARTIE 1 +▁signataires 1 +▁transfrontalier 1 +▁Recueil 1 +▁Balkans 1 +▁recruter 1 +▁appartiennent 1 +▁biomasse 1 +▁contradiction 1 +▁soulignant 1 +▁Précédent 1 +ébauche 1 +▁chimie 1 +attestation 1 +▁Cameroun 1 +▁Noël 1 +▁Stephen 1 +▁attentivement 1 +▁superviseur 1 +obésité 1 +▁CANADIEN 1 +▁pharmacie 1 +▁terrasse 1 +▁coalition 1 +affection 1 +▁Liechtenstein 1 +▁engendre 1 +Instance 1 +▁quinze 1 +▁Amélioration 1 +▁congrès 1 +▁rembourser 1 +▁végétation 1 +▁Work 1 +amorce 1 +▁Windows 1 +▁douanier 1 +▁Pensez 1 +▁samedi 1 +▁entamé 1 +▁Stockholm 1 +▁félicitons 1 +▁musulmans 1 +▁rupture 1 +▁Black 1 +▁entourant 1 +▁Daniel 1 +▁pain 1 +▁ultérieur 1 +▁Facebook 1 +▁Women 1 +animation 1 +▁tuyau 1 +▁Engage 1 +▁poursuivent 1 +▁succession 1 +î 1 +▁entouré 1 +α 1 +▁merveilleuse 1 +▁Traitement 1 +▁novateur 1 +affiche 1 +▁bétail 1 +▁possédant 1 +▁victoire 1 +espérance 1 +ificateur 1 +étoile 1 +▁Enregistrement 1 +▁précurseur 1 +Occident 1 +▁acquitté 1 +▁navigable 1 +▁soufre 1 +▁Améliorer 1 +▁Cambodge 1 +▁onzième 1 +Histoire 1 +▁Strasbourg 1 +▁antidumping 1 +▁endommagé 1 +▁gagnant 1 +▁thérapeutiques 1 +▁semestre 1 +▁raciste 1 +instabilité 1 +▁honte 1 +▁RAPPORT 1 +▁frontaliers 1 +▁marquage 1 +▁préjudiciable 1 +▁stimulant 1 +▁trimestriel 1 +▁prévoyait 1 +▁clarté 1 +▁indigènes 1 +▁prospère 1 +▁achète 1 +émergence 1 +▁estimatif 1 +embryon 1 +▁Microsoft 1 
+▁Méditerranée 1 +▁immigré 1 +▁excédentaire 1 +▁injuste 1 +▁rassemblé 1 +▁Province 1 +▁soigneusement 1 +▁Modifier 1 +▁musicale 1 +▁améliorant 1 +▁municipaux 1 +▁règne 1 +▁Joseph 1 +▁représentaient 1 +▁procure 1 +conférence 1 +▁Mécanisme 1 +▁australien 1 +▁félicitant 1 +▁mammifères 1 +▁policière 1 +invasion 1 +▁gardien 1 +immigrant 1 +▁accrédité 1 +▁dégagé 1 +▁traduis 1 +▁terminant 1 +entrepôt 1 +▁rénovation 1 +▁réparer 1 +vingt 1 +▁battre 1 +éthane 1 +▁prudence 1 +▁désastre 1 +▁éclat 1 +▁portail 1 +▁persiste 1 +Traduction 1 +▁poumon 1 +privé 1 +▁pharmaco 1 +alphabétisation 1 +▁Transfert 1 +▁comprenait 1 +▁pilotage 1 +▁Changement 1 +▁Français 1 +▁Procédure 1 +▁insectes 1 +▁rapatriement 1 +optimiser 1 +▁sénateur 1 +▁grève 1 +` 1 +émetteur 1 +▁illégal 1 +▁divise 1 +▁OTTAWA 1 +▁fascinant 1 +évènement 1 +▁rapprochement 1 +▁refuge 1 +▁radiation 1 +Atelier 1 +▁Renforcement 1 +▁ambitieux 1 +▁certitude 1 +Alzheimer 1 +▁collaborateurs 1 +▁flagrant 1 +▁ambiant 1 +investigation 1 +Ouzbékistan 1 +▁Burkina 1 +▁Festival 1 +▁combustion 1 +autrui 1 +attirer 1 +▁législateur 1 +▁General 1 +▁amusant 1 +▁deviendra 1 +inconvénient 1 +× 1 +▁percevoir 1 +IFICATION 1 +▁Department 1 +▁aléatoire 1 +▁marchande 1 +▁résiduel 1 +▁télécharger 1 +▁Scott 1 +▁infligé 1 +▁dramatique 1 +▁nominal 1 +aluminium 1 +transfrontalière 1 +▁COMMISSION 1 +▁immunitaire 1 +écrivain 1 +▁Power 1 +anomalie 1 +▁ordonne 1 +▁Prestation 1 +▁Profil 1 +enlever 1 +Islam 1 +Ç 1 +Administrateur 1 +▁dénommé 1 +▁hectare 1 +▁coïncid 1 +▁Priorité 1 +ouillé 1 +▁dégâts 1 +▁portugais 1 +▁suspendre 1 +▁énumérées 1 +▁Mettr 1 +entamer 1 +▁salubrité 1 +▁tangible 1 +▁cheveux 1 +▁commandé 1 +▁Rights 1 +▁Surveillance 1 +▁synthétique 1 +▁jouir 1 +▁Session 1 +▁Visite 1 +▁Structure 1 +▁composite 1 +▁précédé 1 +▁meurent 1 +nouvelle 1 +figure 1 +▁Prairies 1 +▁défaillance 1 +▁automatisé 1 +▁HTML 1 +▁croyez 1 +▁nôtre 1 +7,5 1 +▁affaibli 1 +Ancien 1 +▁béton 1 +▁entretenu 1 +viendrait 1 +amitié 1 +avortement 1 +aggrave 1 +▁Situé 1 +▁favori 1 +éthanol 1 +▁Responsabilité 1 +▁gratitude 1 +▁prototype 1 +▁remboursé 1 +extinction 1 +▁Food 1 +▁Soixante 1 +▁imprévu 1 +▁rattaché 1 +▁colloque 1 +▁dividende 1 +▁patrouille 1 +▁Réaffirme 1 +▁ruisseau 1 +▁retire 1 +effectue 1 +Infrastructure 1 +analyste 1 +▁pétrolière 1 +▁remboursable 1 +▁reddition 1 +▁épuisé 1 +▁Classification 1 +▁Nicaragua 1 +▁Dossier 1 +▁favorisé 1 +irrégularité 1 +évacuation 1 +▁réciproque 1 +▁simplification 1 +Entreprise 1 +▁Airlines 1 +▁caution 1 +accumule 1 +▁contracté 1 +▁phoque 1 +insertion 1 +▁authentique 1 +▁semences 1 +▁prescription 1 +Amsterdam 1 +enthousiasme 1 +▁invisible 1 +▁représentait 1 +▁1999-2000 1 +▁Museum 1 +phényl 1 +▁tierce 1 +▁Métis 1 +apprenant 1 +Indien 1 +▁qualitative 1 +▁représentative 1 +▁spécificité 1 +▁consistait 1 +▁Olympique 1 +▁démobilisation 1 +▁persistance 1 +▁plongé 1 +▁Fraser 1 +▁cartographie 1 +▁Tchad 1 +▁Création 1 +▁anglophone 1 +▁empêché 1 +▁irlandais 1 +▁jouissance 1 +embargo 1 +Effect 1 +▁terminal 1 +▁Philip 1 +▁trajet 1 +▁ventilation 1 +▁permettait 1 +▁détecté 1 +▁thermo 1 +automobile 1 +▁doctrine 1 +▁subdivision 1 +′ 1 +ć 1 +о 1 +Ô 1 +ú 1 +¢ 1 +š 1 +č 1 +е 1 +а 1 +Å 1 +и 1 +β 1 +Ö 1 +ο 1 +Î 1 +À 1 +ø 1 +н 1 +т 1 +■ 1 +й 1 +÷ 1 +å 1 +Ž 1 +⁄ 1 +Á 1 +с 1 +ι 1 +ς 1 +р 1 +ν 1 +π 1 +σ 1 +“ 1 +τ 1 +æ 1 +в 1 +ε 1 +Œ 1 +ρ 1 +Š 1 +≤ 1 +√ 1 +Õ 1 +ß 1 +κ 1 +∗ 1 +л 1 +ل 1 +ž 1 +Δ 1 +ا 1 +£ 1 +ł 1 +≥ 1 +¡ 1 +ì 1 +м 1 +^ 1 +γ 1 +к 1 +Û 1 +→ 1 +¶ 1 +λ 1 +† 1 +η 1 +¿ 1 +ı 1 +ί 1 +д 1 +Ä 1 +С 1 +ý 1 +Ï 1 +δ 1 +у 1 +ό 1 +ي 1 +Ο 1 +п 1 +Ü 1 +● 1 +ù 1 +ò 1 +¤ 1 +› 1 +Ó 1 +ę 1 +ė 1 +я 1 +ş 1 
+ر 1 +õ 1 +Č 1 +ـ 1 +̄ 1 +ă 1 +‚ 1 +ÿ 1 +Π 1 +― 1 +ā 1 +、 1 +Ł 1 +б 1 +υ 1 +□ 1 +ы 1 +г 1 +ń 1 +ع 1 +θ 1 +ω 1 +ь 1 +م 1 +ت 1 +Τ 1 +Í 1 +► 1 +َ 1 +ą 1 +ī 1 +ة 1 +Σ 1 +ř 1 +Ñ 1 +ð 1 +ŕ 1 +ч 1 +„ 1 +Ë 1 +‡ 1 +ƒ 1 +ή 1 +ب 1 +ن 1 +ū 1 +د 1 +¥ 1 +ά 1 +─ 1 +ś 1 +ж 1 +ف 1 +و 1 +В 1 +¦ 1 +х 1 +⎯ 1 +ᑦ 1 +ق 1 +ż 1 +ц 1 +Ý 1 +Α 1 +ţ 1 +Ù 1 +Ú 1 +Ђ 1 +‹ 1 +χ 1 +ώ 1 +έ 1 +Þ 1 +ю 1 +Ş 1 +Р 1 +ň 1 +ύ 1 +ē 1 +ʼ 1 +Ÿ 1 +Æ 1 +ě 1 +ő 1 +ǫ 1 +ِ 1 +Ò 1 +ʹ 1 +Κ 1 +ᐅ 1 +Ε 1 +þ 1 +Ω 1 +ᐃ 1 +Ð 1 +ľ 1 +̊ 1 +أ 1 +⇒ 1 +ᒃ 1 +¬ 1 +٠ 1 +ų 1 +ш 1 +غ 1 +⌦ 1 +Μ 1 +ج 1 +ْ 1 +Φ 1 +ґ 1 +ك 1 +≠ 1 +◗ 1 +خ 1 +Ì 1 +ᖃ 1 +ᓄ 1 +١ 1 +٢ 1 +Т 1 +ض 1 +ᑐ 1 +∑ 1 +国 1 +Λ 1 +Ś 1 +➧ 1 +і 1 +ᑕ 1 +П 1 +· 1 +́ 1 +的 1 +ɔ 1 +ť 1 +Θ 1 +Ģ 1 +Ţ 1 +ź 1 +، 1 +س 1 +إ 1 +İ 1 +ّ 1 +ᓯ 1 +ᕐ 1 +ᑎ 1 +。 1 +法 1 +Γ 1 +ɛ 1 +Ă 1 +ɶ 1 +Ν 1 +⋅ 1 +М 1 +э 1 +ط 1 +亚 1 +会 1 +• 1 +尔 1 +利 1 +ъ 1 +ů 1 +Η 1 +ᓂ 1 +❒ 1 +Ő 1 +ʔ 1 +≡ 1 +ξ 1 +ф 1 +К 1 +ᓗ 1 +ᖅ 1 +大 1 +兰 1 +О 1 +Ċ 1 +ō 1 +ش 1 +ُ 1 +Ż 1 +ї 1 +ᓕ 1 +▲ 1 +✓ 1 +❏ 1 +〈 1 +글 1 +한 1 +А 1 +Н 1 +ᓇ 1 +ᓐ 1 +年 1 +和 1 +事 1 +٩ 1 +į 1 +ċ 1 +ζ 1 +Ќ 1 +ذ 1 +ħ 1 +Ľ 1 +ψ 1 +Ē 1 +Ğ 1 +ȇ 1 +̂ 1 +̱ 1 +Б 1 +ه 1 +٤ 1 +٥ 1 +ᒥ 1 +ᖓ 1 +※ 1 +← 1 +∞ 1 +❑ 1 +❚ 1 +อ 1 +ᕆ 1 +ز 1 +尼 1 +一 1 +提 1 +在 1 +不 1 +上 1 +各 1 +巴 1 +加 1 +ё 1 +ј 1 +٣ 1 +‟ 1 + 1 +Ġ 1 +ʺ 1 +ˇ 1 +̍ 1 +Ά 1 +٦ 1 +∼ 1 +❖ 1 +➟ 1 +Ń 1 +Д 1 +ᐊ 1 +ᖏ 1 +ᒪ 1 +ᓪ 1 +阿 1 +北 1 +拉 1 +了 1 +合 1 +关 1 +由 1 +编 1 +为 1 +机 1 +主 1 +并 1 +对 1 +布 1 +任 1 +制 1 +牙 1 +表 1 +塞 1 +斯 1 +ظ 1 +٨ 1 +ย 1 +ĕ 1 +ű 1 +ᓈ 1 +’ 1 +Ė 1 +Ρ 1 +ح 1 +ى 1 +พ 1 +ศ 1 +่ 1 +ᑭ 1 +ᒋ 1 +ᓚ 1 +ᔪ 1 +↓ 1 +└ 1 +ル 1 +ー 1 +比 1 +马 1 +년 1 +Ą 1 +И 1 +ص 1 +及 1 +联 1 +第 1 +日 1 +我 1 +介 1 +用 1 +于 1 +作 1 +出 1 +员 1 +要 1 +发 1 +式 1 +文 1 +致 1 +秘 1 +个 1 +书 1 +埃 1 +本 1 +俄 1 +伊 1 +罗 1 +鲁 1 +瓜 1 +Е 1 +ท 1 +∂ 1 +首 1 +ď 1 +­ 1 +ư 1 +̨ 1 +Й 1 +ث 1 +Ⴑ 1 +ᑲ 1 +ᒻ 1 +ᕗ 1 +ᖕ 1 +ờ 1 +↑ 1 +╚ 1 +╩ 1 +▬ 1 +➥ 1 +➾ 1 +サ 1 +下 1 +定 1 +报 1 +施 1 +民 1 +玩 1 +站 1 +章 1 +过 1 +通 1 +ћ 1 +ᕋ 1 +行 1 +乌 1 +西 1 +克 1 +以 1 +与 1 +所 1 +面 1 +高 1 +其 1 +力 1 +♫ 1 +诠 1 +译 1 +理 1 +犯 1 +爱 1 +权 1 +律 1 +弃 1 +号 1 +厅 1 +做 1 +伯 1 +众 1 +中 1 +☐ 1 +ệ 1 +ᒧ 1 +ᑯ 1 +ᐱ 1 +ร 1 +Э 1 +Џ 1 +ƴ 1 +Ř 1 +ļ 1 +Ě 1 +票 1 +死 1 +刑 1 +❍ 1 +ế 1 +З 1 +̇ 1 +ʾ 1 +Ů 1 +ķ 1 +Đ 1 +规 1 +务 1 +则 1 +决 1 +・ 1 +写 1 +官 1 +必 1 +持 1 +搁 1 +耽 1 +Љ 1 +Ѓ 1 +Ι 1 +◊ 1 +适 1 +ᕕ 1 +Β 1 +Ћ 1 +Υ 1 +套 1 +工 1 +料 1 +材 1 +现 1 +绍 1 +ȣ 1 +议 1 +✤ 1 +Њ 1 +他 1 +免 1 +避 1 +ƙ 1 +≈ 1 +ᓴ 1 +◆ 1 +们 1 +供 1 +保 1 +前 1 +口 1 +另 1 +外 1 +头 1 +府 1 +政 1 +正 1 +确 1 +简 1 +语 1 +щ 1 +̃ 1 +✔ 1 +ў 1 +公 1 +处 1 +有 1 +构 1 +组 1 +织 1 +间 1 +đ 1 +⌫ 1 +匈 1 +卡 1 +厄 1 +古 1 +坦 1 +基 1 +果 1 +萄 1 +葡 1 +◄ 1 +买 1 +印 1 +塔 1 +多 1 +安 1 +度 1 +廷 1 +德 1 +意 1 +朗 1 +根 1 +纳 1 +芬 1 +道 1 +∆ 1 +ѓ 1 +执 1 +昨 1 +被 1 +І 1 +Ф 1 +ต 1 +า 1 +ิ 1 +์ 1 +仁 1 +今 1 +席 1 +毛 1 +纪 1 +ϊ 1 +Л 1 +向 1 +念 1 +Į 1 +ǎ 1 +ǐ 1 +ữ 1 +赞 1 +| 1 + 1 +Ā 1 +Ę 1 +Ħ 1 +Ī 1 +ĺ 1 +Ļ 1 +Ņ 1 +ņ 1 +Ň 1 +Ō 1 +Ū 1 +Ų 1 +ŷ 1 +ơ 1 +Ƴ 1 +ɲ 1 +̀ 1 +̆ 1 +̐ 1 +̓ 1 +Ψ 1 +Ч 1 +Ш 1 +Ъ 1 +ғ 1 +ұ 1 +ט 1 +ء 1 +ً 1 +ٱ 1 +प 1 +म 1 +र 1 +े 1 +् 1 +ง 1 +ถ 1 +น 1 +ว 1 +ั 1 +ี 1 +ึ 1 +ᄅ 1 +ᐸ 1 +ᑖ 1 +ᒫ 1 +ᓅ 1 +ᓖ 1 +ᔨ 1 +ᔭ 1 +ᔾ 1 +ᕈ 1 +ᕌ 1 +ᕙ 1 +Ẕ 1 +ạ 1 +ầ 1 +ẹ 1 +ọ 1 +ớ 1 +ứ 1 +‛ 1 +↔ 1 +↵ 1 +∕ 1 +∫ 1 +⊂ 1 +⊕ 1 +⌃ 1 +⌘ 1 +⌠ 1 +⌧ 1 +★ 1 +☺ 1 +♀ 1 +♂ 1 +✕ 1 +✜ 1 +✢ 1 +➔ 1 +➢ 1 +➪ 1 +➯ 1 +『 1 +』 1 +【 1 +】 1 +の 1 +ア 1 +オ 1 +カ 1 +ス 1 +ド 1 +レ 1 +ン 1 +举 1 +予 1 +五 1 +付 1 +伏 1 +伝 1 +你 1 +例 1 +先 1 +八 1 +共 1 +典 1 +内 1 +况 1 +刘 1 +刚 1 +動 1 +午 1 +南 1 +去 1 +反 1 +台 1 +吉 1 +告 1 +商 1 +园 1 +圆 1 +坚 1 +堂 1 +如 1 +实 1 +希 1 +干 1 +开 1 +录 1 +形 1 +情 1 +成 1 +拜 1 +放 1 +晚 1 +智 1 +東 1 +架 1 +格 1 +案 1 +梶 1 +次 1 +气 1 +沙 1 +津 1 +浦 1 +添 1 +澳 1 +王 1 +球 1 +瑞 1 +産 1 +疆 1 +线 1 +美 1 +群 1 +耀 1 +肯 1 +腊 1 +葱 1 +见 1 +記 1 +记 1 +贝 1 +达 1 +运 1 +迪 1 +送 1 +達 1 +邦 1 +防 1 +院 1 +隆 1 +Ć 1 +明 1 +ᒐ 1 +パ 1 +リ 1 +山 1 +ϋ 1 
+ᑑ 1 +ᖑ 1 +危 1 +地 1 +є 1 +ѕ 1 +љ 1 +京 1 +名 1 +复 1 +最 1 +核 1 +经 1 +丹 1 +列 1 +宁 1 +来 1 +桑 1 +泊 1 +特 1 +荷 1 +莫 1 +▼ 1 +━ 1 +φ 1 +方 1 +Є 1 +➤ 1 +ğ 1 +з 1 +参 1 +将 1 +按 1 +授 1 +新 1 +是 1 +更 1 +活 1 +照 1 +综 1 +○ 1 +∙ 1 +̧ 1 +♦ 1 +动 1 +努 1 +委 1 +尤 1 +效 1 +率 1 +Ё 1 +♪ 1 +̈ 1 +▪ 1 +◦ 1 +月 1 +<mask> 1 diff --git a/SpeechUT/dataset/MuSTC/en_fr/spm_unigram10000.model b/SpeechUT/dataset/MuSTC/en_fr/spm_unigram10000.model new file mode 100644 index 0000000000000000000000000000000000000000..43cfd14648388420a61c1d45f522667ea5318019 Binary files /dev/null and b/SpeechUT/dataset/MuSTC/en_fr/spm_unigram10000.model differ diff --git a/SpeechUT/speechut/__init__.py b/SpeechUT/speechut/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..97327d269e93a13cd135f6c1a187fd820a8decb8 --- /dev/null +++ b/SpeechUT/speechut/__init__.py @@ -0,0 +1 @@ +from . import data, tasks, criterions, models diff --git a/SpeechUT/speechut/config/finetune_asr/speechut_base_100h.yaml b/SpeechUT/speechut/config/finetune_asr/speechut_base_100h.yaml new file mode 100644 index 0000000000000000000000000000000000000000..736c3c72b9a7ba85eacaf44e1952fa7f0fc15a4f --- /dev/null +++ b/SpeechUT/speechut/config/finetune_asr/speechut_base_100h.yaml @@ -0,0 +1,101 @@ +# @package _group_ + +common: + fp16: true + log_format: json + log_interval: 100 + tensorboard_logdir: tblog + seed: 1337 + +checkpoint: + save_interval: 1 + keep_last_epochs: 1 + keep_best_checkpoints: 5 + best_checkpoint_metric: dec_accuracy + maximize_best_checkpoint_metric: true + restore_file: checkpoint_last.pt + +distributed_training: + ddp_backend: legacy_ddp + find_unused_parameters: true + distributed_world_size: 1 + distributed_port: -1 + nprocs_per_node: 8 + +task: + _name: joint_sc2t_pretraining + data: ??? + fine_tuning: true + label_dir: ??? + normalize: false # must be consistent with pre-training + labels: ["ltr"] + store_labels: true + single_target: true + add_decoder_target: true + pad_audio: false + random_crop: true + hubert_tokenizer: "none" + sp_path: None + +dataset: + num_workers: 0 + max_tokens: 1300000 + skip_invalid_size_inputs_valid_test: true + train_subset: train_100 + valid_subset: dev_other + required_batch_size_multiple: 1 + +criterion: + _name: ctc_ce + zero_infinity: true + +optimization: + max_update: 40000 + lr: [0.00001] + sentence_avg: true + update_freq: [2] + +optimizer: + _name: adam + adam_betas: (0.9,0.98) + adam_eps: 1e-08 + weight_decay: 0.0 + +lr_scheduler: + _name: tri_stage + phase_ratio: [0.1, 0.4, 0.5] + final_lr_scale: 0.05 + +model: + _name: speechut_asr + w2v_path: ??? + apply_mask: true + mask_prob: 0.65 + mask_channel_prob: 0.5 + mask_channel_length: 64 + layerdrop: 0.1 + activation_dropout: 0.1 + feature_grad_mult: 0.0 + freeze_finetune_updates: 0 + add_decoder: true + +hydra: + job: + config: + override_dirname: + kv_sep: '-' + item_sep: '__' + exclude_keys: + - run + - task.data + - task.label_dir + - model.w2v_path + - dataset.train_subset + - dataset.valid_subset + - criterion.wer_kenlm_model + - criterion.wer_lexicon + run: + dir: ??? + sweep: + dir: ??? 
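+ # NOTE: "???" is OmegaConf's mandatory-value marker; the run/sweep output dirs here (and the other ??? fields such as task.data, task.label_dir and model.w2v_path) are expected to be filled in via command-line overrides at launch time.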
+ subdir: ${hydra.job.config_name}__${hydra.job.override_dirname} diff --git a/SpeechUT/speechut/config/finetune_asr/speechut_large_100h.yaml b/SpeechUT/speechut/config/finetune_asr/speechut_large_100h.yaml new file mode 100644 index 0000000000000000000000000000000000000000..7cbc59e61f10ab00b997286d6355f22ce1008677 --- /dev/null +++ b/SpeechUT/speechut/config/finetune_asr/speechut_large_100h.yaml @@ -0,0 +1,102 @@ +# @package _group_ + +common: + fp16: true + log_format: json + log_interval: 100 + tensorboard_logdir: tblog + seed: 1337 + +checkpoint: + save_interval: 1 + keep_last_epochs: 5 + keep_best_checkpoints: 5 + best_checkpoint_metric: dec_accuracy + maximize_best_checkpoint_metric: true + restore_file: checkpoint_last.pt + +distributed_training: + ddp_backend: legacy_ddp + find_unused_parameters: true + distributed_world_size: 16 + distributed_port: -1 + nprocs_per_node: 8 + +task: + _name: joint_sc2t_pretraining + data: ??? + fine_tuning: true + label_dir: ??? + normalize: true # must be consistent with pre-training + labels: ["ltr"] + store_labels: true + single_target: true + add_decoder_target: true + pad_audio: false + random_crop: true + hubert_tokenizer: "none" + sp_path: None + +dataset: + num_workers: 0 + max_tokens: 1300000 + skip_invalid_size_inputs_valid_test: true + train_subset: train_100 + valid_subset: dev_other + required_batch_size_multiple: 1 + +criterion: + _name: ctc_ce + zero_infinity: true + +optimization: + max_update: 40000 + lr: [0.00001] + sentence_avg: true + update_freq: [2] + +optimizer: + _name: adam + adam_betas: (0.9,0.98) + adam_eps: 1e-08 + weight_decay: 0.0 + +lr_scheduler: + _name: tri_stage + phase_ratio: [0.1, 0.4, 0.5] + final_lr_scale: 0.05 + +model: + _name: speechut_asr + w2v_path: ??? + apply_mask: true + mask_prob: 0.5 + mask_channel_prob: 0.5 + mask_channel_length: 64 + layerdrop: 0.0 + activation_dropout: 0.1 + attention_dropout: 0.1 + feature_grad_mult: 0.0 + freeze_finetune_updates: 0 + add_decoder: true + +hydra: + job: + config: + override_dirname: + kv_sep: '-' + item_sep: '__' + exclude_keys: + - run + - task.data + - task.label_dir + - model.w2v_path + - dataset.train_subset + - dataset.valid_subset + - criterion.wer_kenlm_model + - criterion.wer_lexicon + run: + dir: ??? + sweep: + dir: ??? + subdir: ${hydra.job.config_name}__${hydra.job.override_dirname} diff --git a/SpeechUT/speechut/config/finetune_asr/speechut_large_960h.yaml b/SpeechUT/speechut/config/finetune_asr/speechut_large_960h.yaml new file mode 100644 index 0000000000000000000000000000000000000000..f10d6002555e5cbcfbf31035d8258e77abc26050 --- /dev/null +++ b/SpeechUT/speechut/config/finetune_asr/speechut_large_960h.yaml @@ -0,0 +1,100 @@ +# @package _group_ + +common: + fp16: true + log_format: json + log_interval: 100 + tensorboard_logdir: tblog + +checkpoint: + save_interval: 1 + keep_last_epochs: 5 + keep_best_checkpoints: 5 + best_checkpoint_metric: dec_accuracy + maximize_best_checkpoint_metric: true + restore_file: checkpoint_last.pt + +distributed_training: + ddp_backend: legacy_ddp + find_unused_parameters: true + distributed_world_size: 24 + distributed_port: -1 + nprocs_per_node: 8 + +task: + _name: joint_sc2t_pretraining + data: ??? + fine_tuning: true + label_dir: ??? 
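+ # label_dir: directory holding the fine-tuning label files (the .ltr letter transcriptions named in `labels` below); like the other ??? fields it is supplied at launch time.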
+ normalize: true # must be consistent with pre-training + labels: ["ltr"] + store_labels: true + single_target: true + add_decoder_target: true + pad_audio: false + random_crop: true + hubert_tokenizer: "none" + sp_path: None + +dataset: + num_workers: 0 + max_tokens: 1300000 + skip_invalid_size_inputs_valid_test: true + train_subset: train_960 + valid_subset: dev_other + required_batch_size_multiple: 1 + +criterion: + _name: ctc_ce + zero_infinity: true + +optimization: + max_update: 40000 + lr: [0.00001] + sentence_avg: true + update_freq: [2] + +optimizer: + _name: adam + adam_betas: (0.9,0.98) + adam_eps: 1e-08 + weight_decay: 0.0 + +lr_scheduler: + _name: tri_stage + phase_ratio: [0.1, 0.4, 0.5] + final_lr_scale: 0.05 + +model: + _name: speechut_asr + w2v_path: ??? + apply_mask: true + mask_prob: 0.5 + mask_channel_prob: 0.25 + mask_channel_length: 64 + layerdrop: 0.0 + activation_dropout: 0.1 + feature_grad_mult: 0.0 + freeze_finetune_updates: 0 + add_decoder: true + +hydra: + job: + config: + override_dirname: + kv_sep: '-' + item_sep: '__' + exclude_keys: + - run + - task.data + - task.label_dir + - model.w2v_path + - dataset.train_subset + - dataset.valid_subset + - criterion.wer_kenlm_model + - criterion.wer_lexicon + run: + dir: ??? + sweep: + dir: ??? + subdir: ${hydra.job.config_name}__${hydra.job.override_dirname} diff --git a/SpeechUT/speechut/config/pretrain/speechut_base_librispeech.yaml b/SpeechUT/speechut/config/pretrain/speechut_base_librispeech.yaml new file mode 100644 index 0000000000000000000000000000000000000000..6a3751febf2efc3cbf7a91e3a75f05b570559f2c --- /dev/null +++ b/SpeechUT/speechut/config/pretrain/speechut_base_librispeech.yaml @@ -0,0 +1,153 @@ +# @package _group_ + +common: + fp16: true + log_format: json + log_interval: 200 + seed: 1337 + tensorboard_logdir: tblog + +checkpoint: + save_dir: ??? + save_interval: 4 + keep_last_epochs: 4 + save_interval_updates: 50000 + keep_interval_updates: -1 + keep_interval_updates_pattern: 50000 + # no_epoch_checkpoints: true + +distributed_training: + ddp_backend: no_c10d + distributed_backend: 'nccl' + distributed_port: -1 + distributed_world_size: 32 + nprocs_per_node: 8 + find_unused_parameters: true + +task: + _name: joint_sc2t_pretraining + data: ??? + label_dir: ??? + labels: ??? + label_rate: ${model.label_rate} + store_labels: true + sample_rate: 16000 + max_sample_size: 250000 + min_sample_size: 32000 + pad_audio: false + random_crop: true + normalize: false # must be consistent with extractor + add_decoder_target: true + text_cfg: + seed: ${common.seed} + text_data: ??? + data_config: config.yaml + sample_break_mode: eos + tokens_per_sample: 1024 + shorten_method: "random_crop" + text_maxtokens_ratio: 1.5 + +dataset: + num_workers: 6 + max_tokens: 1400000 + skip_invalid_size_inputs_valid_test: true + validate_interval: ${checkpoint.save_interval} + validate_interval_updates: ${checkpoint.save_interval_updates} + required_batch_size_multiple: 1 + +criterion: + _name: speechut_criterion + pred_masked_weight: 1.0 + pred_nomask_weight: 0.0 + loss_weights: [10,] + label_smoothing: 0.1 + u2t_ed_weight: 0.1 + u2t_ctc_weight: 0.1 + text_mum_weight: 0.5 + +optimization: + max_update: 400000 + lr: [0.0005] + clip_norm: 10.0 + +optimizer: + _name: adam + adam_betas: (0.9,0.98) + adam_eps: 1e-06 + weight_decay: 0.01 + +lr_scheduler: + _name: polynomial_decay + warmup_updates: 32000 + +model: + _name: speechut + label_rate: ??? 
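+ # label_rate: number of hidden-unit label frames per second, which should match how the units were extracted (e.g. 50 for typical HuBERT-style k-means units); the task section reuses it through ${model.label_rate}.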
+ skip_masked: false + skip_nomask: false + mask_prob: 0.80 + extractor_mode: default + conv_feature_layers: '[(512,10,5)] + [(512,3,2)] * 4 + [(512,2,2)] * 2' + final_dim: 256 + activation_fn: "gelu" + encoder_layers: 6 + encoder_attention_heads: 8 + encoder_layerdrop: 0.0 + dropout_input: 0.1 + dropout_features: 0.1 + dropout: 0.1 + attention_dropout: 0.1 + feature_grad_mult: 0.1 + untie_final_proj: true + activation_dropout: 0.0 + use_rel_pos_enc: true + add_unit_encoder: true + add_text_ctc: true + mask_u2t: false + mix_with_unit: true + add_decoder: true + reset_decoder_embedding_config: true + text_transformer: + activation_fn: ${model.activation_fn} + dropout: ${model.dropout} + attention_dropout: ${model.attention_dropout} + activation_dropout: ${model.activation_dropout} + max_source_positions: 3000 + max_target_positions: 3000 + no_scale_embedding: true + layernorm_embedding: true + no_token_positional_embeddings: false + share_decoder_input_output_embed: false + encoder: + embed_dim: 768 + ffn_embed_dim: 3072 + layers: 6 + attention_heads: 8 + normalize_before: false + learned_pos: true + layerdrop: ${model.encoder_layerdrop} + decoder: + layerdrop: 0.1 + embed_dim: 768 + ffn_embed_dim: 3072 + layers: 6 + attention_heads: 12 + normalize_before: false + learned_pos: false + output_dim: 768 + +hydra: + job: + config: + override_dirname: + kv_sep: '-' + item_sep: '__' + exclude_keys: + - run + - task.data + - task.label_dir + run: + dir: ??? + sweep: + dir: ??? + subdir: ${hydra.job.config_name}__${hydra.job.override_dirname} diff --git a/SpeechUT/speechut/config/pretrain/speechut_large_librilight.yaml b/SpeechUT/speechut/config/pretrain/speechut_large_librilight.yaml new file mode 100644 index 0000000000000000000000000000000000000000..849c1d986126f6e26f3e10feb14fae0a299be4b4 --- /dev/null +++ b/SpeechUT/speechut/config/pretrain/speechut_large_librilight.yaml @@ -0,0 +1,159 @@ +# @package _group_ + +common: + fp16: true + fp16_scale_tolerance: 0.1 # alleviate fp16 overflow issue + log_format: json + log_interval: 200 + seed: 1234 + tensorboard_logdir: tblog + +checkpoint: + save_dir: ??? + save_interval: 1 + keep_last_epochs: 4 + save_interval_updates: 10000 + keep_interval_updates: -1 + keep_interval_updates_pattern: 10000 + # no_epoch_checkpoints: true + +distributed_training: + ddp_backend: no_c10d + distributed_backend: 'nccl' + distributed_port: -1 + distributed_world_size: 128 + nprocs_per_node: 8 + find_unused_parameters: true + +task: + _name: joint_sc2t_pretraining + data: ??? + label_dir: ??? + labels: ??? + label_rate: ${model.label_rate} + store_labels: true + sample_rate: 16000 + max_sample_size: 250000 + min_sample_size: 32000 + pad_audio: false + random_crop: true + normalize: true # must be consistent with extractor + add_decoder_target: true + text_cfg: + seed: ${common.seed} + text_data: ??? 
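+ # text_data: path to the text corpus used by the text branch of pre-training (another launch-time ??? value); the data_config entry below presumably names the dataset config file found there.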
+ data_config: config.yaml + sample_break_mode: eos + tokens_per_sample: 1024 + shorten_method: "random_crop" + text_maxtokens_ratio: 1.4 + +dataset: + num_workers: 6 + max_tokens: 900000 + skip_invalid_size_inputs_valid_test: true + validate_interval: ${checkpoint.save_interval} + validate_interval_updates: ${checkpoint.save_interval_updates} + required_batch_size_multiple: 2 + +criterion: + _name: speechut_criterion + pred_masked_weight: 1.0 + pred_nomask_weight: 0.0 + loss_weights: [10,] + label_smoothing: 0.1 + u2t_ed_weight: 0.1 + u2t_ctc_weight: 0.1 + text_mum_weight: 0.5 + +optimization: + max_update: 400000 + lr: [0.0005] + clip_norm: 1.0 + +optimizer: + _name: adam + adam_betas: (0.9,0.98) + adam_eps: 1e-06 + weight_decay: 0.01 + +lr_scheduler: + _name: polynomial_decay + warmup_updates: 32000 + end_learning_rate: 0.00015 # for future longger pre-training, e.g. 600K step + +model: + _name: speechut + label_rate: ??? + encoder_embed_dim: 1024 + encoder_ffn_embed_dim: 4096 + skip_masked: false + skip_nomask: false + mask_prob: 0.80 + extractor_mode: layer_norm + conv_feature_layers: '[(512,10,5)] + [(512,3,2)] * 4 + [(512,2,2)] * 2' + final_dim: 768 + activation_fn: "gelu" + encoder_layers: 12 + encoder_attention_heads: 16 + encoder_layerdrop: 0.0 + dropout_input: 0.0 + dropout_features: 0.0 + dropout: 0.0 + attention_dropout: 0.0 + layer_norm_first: true + feature_grad_mult: 1.0 + untie_final_proj: true + activation_dropout: 0.0 + use_rel_pos_enc: true + add_unit_encoder: true + add_text_ctc: true + mask_u2t: false + mix_with_unit: true + add_decoder: true + reset_decoder_embedding_config: true + scaling_for_att: 32 # alleviate fp16 overflow issue + text_transformer: + activation_fn: ${model.activation_fn} + dropout: ${model.dropout} + attention_dropout: ${model.attention_dropout} + activation_dropout: ${model.activation_dropout} + max_source_positions: 3000 + max_target_positions: 3000 + no_scale_embedding: true + layernorm_embedding: true + no_token_positional_embeddings: true + share_decoder_input_output_embed: false + encoder: + embed_dim: 1024 + ffn_embed_dim: 4096 + layers: 12 + attention_heads: 16 + normalize_before: false + learned_pos: true + layerdrop: ${model.encoder_layerdrop} + decoder: + layerdrop: 0.1 + embed_dim: 768 + ffn_embed_dim: 3072 + layers: 6 + attention_heads: 12 + normalize_before: false + learned_pos: false + output_dim: 768 + +hydra: + job: + config: + override_dirname: + kv_sep: '-' + item_sep: '__' + exclude_keys: + - run + - task.data + - task.label_dir + run: + dir: ??? + sweep: + dir: ??? + subdir: ${hydra.job.config_name}__${hydra.job.override_dirname} diff --git a/SpeechUT/speechut/criterions/__init__.py b/SpeechUT/speechut/criterions/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..2bf9fac9a8c00d76decd07417d86a2625c4c851c --- /dev/null +++ b/SpeechUT/speechut/criterions/__init__.py @@ -0,0 +1,9 @@ +import importlib +import os + +for file in os.listdir(os.path.dirname(__file__)): + if file.endswith(".py") and not file.startswith("_"): + criterion_name = file[: file.find(".py")] + importlib.import_module( + "speechut.criterions." 
+ criterion_name + ) diff --git a/SpeechUT/speechut/criterions/ctc_ce.py b/SpeechUT/speechut/criterions/ctc_ce.py new file mode 100644 index 0000000000000000000000000000000000000000..aab6c9d23ac3b7dc410704bcba8982a697a57656 --- /dev/null +++ b/SpeechUT/speechut/criterions/ctc_ce.py @@ -0,0 +1,414 @@ +# ---------------------------------------------------------------------------- +# SpeechUT: Bridging Speech and Text with Hidden-Unit for Encoder-Decoder Based Speech-Text Pre-training (https://arxiv.org/abs/2210.03730) +# Github source: https://github.com/microsoft/SpeechT5/tree/main/SpeechUT +# Code based on fairseq: https://github.com/facebookresearch/fairseq/tree/272c4c5197250997148fb12c0db6306035f166a4 +# +# Copyright (c) 2022 Microsoft +# Licensed under The MIT License [see LICENSE for details] +# ---------------------------------------------------------------------------- + +import math +from argparse import Namespace +from dataclasses import dataclass, field +from omegaconf import II +from typing import Optional + +import torch +import torch.nn.functional as F +from fairseq import metrics, utils +from fairseq.criterions import FairseqCriterion, register_criterion +from fairseq.criterions.label_smoothed_cross_entropy import label_smoothed_nll_loss +from fairseq.dataclass import FairseqDataclass +from fairseq.data.data_utils import post_process +from fairseq.tasks import FairseqTask +from fairseq.logging.meters import safe_round + + +@dataclass +class CtcCeCriterionConfig(FairseqDataclass): + zero_infinity: bool = field( + default=False, + metadata={"help": "zero inf loss when source length <= target length"}, + ) + sentence_avg: bool = II("optimization.sentence_avg") + post_process: str = field( + default="letter", + metadata={ + "help": "how to post process predictions into words. can be letter, " + "wordpiece, BPE symbols, etc. 
" + "See fairseq.data.data_utils.post_process() for full list of options" + }, + ) + wer_kenlm_model: Optional[str] = field( + default=None, + metadata={ + "help": "if this is provided, use kenlm to compute wer (along with other wer_* args)" + }, + ) + wer_lexicon: Optional[str] = field( + default=None, + metadata={"help": "lexicon to use with wer_kenlm_model"}, + ) + wer_lm_weight: float = field( + default=2.0, + metadata={"help": "lm weight to use with wer_kenlm_model"}, + ) + wer_word_score: float = field( + default=-1.0, + metadata={"help": "lm word score to use with wer_kenlm_model"}, + ) + + wer_args: Optional[str] = field( + default=None, + metadata={ + "help": "DEPRECATED: tuple of (wer_kenlm_model, wer_lexicon, wer_lm_weight, wer_word_score)" + }, + ) + + dec_weight: float = field( + default=0.5, + metadata={"help": "weights for decoder CE Loss, loss will be ((1 - dec_weight) * hubert_loss + dec_weight * CE_Loss)"}, + ) + report_accuracy: bool = field( + default=True, + metadata={"help": "report decoder accuracy metric"}, + ) + ignore_prefix_size: int = field( + default=0, + metadata={"help": "Ignore first N tokens"}, + ) + label_smoothing: float = field( + default=0.1, + metadata={"help": "epsilon for label smoothing, 0 means no label smoothing"}, + ) + + +@register_criterion("ctc_ce", dataclass=CtcCeCriterionConfig) +class CtcCeCriterion(FairseqCriterion): + def __init__(self, cfg: CtcCeCriterionConfig, task: FairseqTask): + super().__init__(task) + self.blank_idx = ( + task.target_dictionary.index(task.blank_symbol) + if hasattr(task, "blank_symbol") + else 0 + ) + self.pad_idx = task.target_dictionary.pad() + self.eos_idx = task.target_dictionary.eos() + self.post_process = cfg.post_process + + if cfg.wer_args is not None: + ( + cfg.wer_kenlm_model, + cfg.wer_lexicon, + cfg.wer_lm_weight, + cfg.wer_word_score, + ) = eval(cfg.wer_args) + + if cfg.wer_kenlm_model is not None: + from examples.speech_recognition.w2l_decoder import W2lKenLMDecoder + + dec_args = Namespace() + dec_args.nbest = 1 + dec_args.criterion = "ctc" + dec_args.kenlm_model = cfg.wer_kenlm_model + dec_args.lexicon = cfg.wer_lexicon + dec_args.beam = 50 + dec_args.beam_size_token = min(50, len(task.target_dictionary)) + dec_args.beam_threshold = min(50, len(task.target_dictionary)) + dec_args.lm_weight = cfg.wer_lm_weight + dec_args.word_score = cfg.wer_word_score + dec_args.unk_weight = -math.inf + dec_args.sil_weight = 0 + + self.w2l_decoder = W2lKenLMDecoder(dec_args, task.target_dictionary) + else: + self.w2l_decoder = None + + self.zero_infinity = cfg.zero_infinity + self.sentence_avg = cfg.sentence_avg + + self.dec_weight = cfg.dec_weight + self.report_accuracy = cfg.report_accuracy + self.ignore_prefix_size = cfg.ignore_prefix_size + self.eps = cfg.label_smoothing + + def forward(self, model, sample, reduce=True): + net_output = model(**sample["net_input"]) + lprobs = model.get_normalized_probs( + net_output, log_probs=True + ).contiguous() # (T, B, C) from the encoder + + if "src_lengths" in sample["net_input"]: + input_lengths = sample["net_input"]["src_lengths"] + else: + if net_output["padding_mask"] is not None: + non_padding_mask = ~net_output["padding_mask"] + input_lengths = non_padding_mask.long().sum(-1) + else: + input_lengths = lprobs.new_full( + (lprobs.size(1),), lprobs.size(0), dtype=torch.long + ) + + pad_mask = (sample["target"] != self.pad_idx) & ( + sample["target"] != self.eos_idx + ) + targets_flat = sample["target"].masked_select(pad_mask) + if "target_lengths" in sample: + 
target_lengths = sample["target_lengths"] + else: + target_lengths = pad_mask.sum(-1) + + with torch.backends.cudnn.flags(enabled=False): + loss = F.ctc_loss( + lprobs, + targets_flat, + input_lengths, + target_lengths, + blank=self.blank_idx, + reduction="sum", + zero_infinity=self.zero_infinity, + ) + + ntokens = ( + sample["ntokens"] if "ntokens" in sample else target_lengths.sum().item() + ) + + sample_size = sample["target"].size(0) if self.sentence_avg else ntokens + + logging_output = {} + if "decoder_target" in sample: + if net_output["decoder_out"] is not None: + dec_sample_size = sample["target"].size(0) if self.sentence_avg else sample["dec_ntokens"] + dec_loss, dec_nll_loss = self.compute_ce_loss(model, net_output["decoder_out"], sample, reduce=reduce) + logging_output["ctc_loss"] = loss.item() + loss = (1 - self.dec_weight) * loss + (self.dec_weight * dec_loss * sample_size / dec_sample_size) + logging_output["dec_loss"] = dec_loss.item() + logging_output["dec_nll_loss"] = dec_nll_loss.item() + logging_output["dec_sample_size"] = dec_sample_size + + if self.report_accuracy: + n_correct, total = self.compute_accuracy(model, net_output["decoder_out"], sample) + logging_output["dec_n_correct"] = utils.item(n_correct.data) + logging_output["total"] = utils.item(total.data) + else: + logging_output["ctc_loss"] = loss.item() + loss = (1 - self.dec_weight) * loss + logging_output["dec_loss"] = 0 + logging_output["dec_nll_loss"] = 0 + logging_output["dec_sample_size"] = 1 + if self.report_accuracy: + logging_output["dec_n_correct"] = 0 + logging_output["total"] = 1 + + logging_output = { + "loss": utils.item(loss.data), # * sample['ntokens'], + "ntokens": ntokens, + "nsentences": sample["id"].numel(), + "sample_size": sample_size, + **logging_output, + } + + if not model.training and self.dec_weight < 1.0: + import editdistance + + with torch.no_grad(): + lprobs_t = lprobs.transpose(0, 1).float().contiguous().cpu() + + c_err = 0 + c_len = 0 + w_errs = 0 + w_len = 0 + wv_errs = 0 + for lp, t, inp_l in zip( + lprobs_t, + sample["target_label"] + if "target_label" in sample + else sample["target"], + input_lengths, + ): + lp = lp[:inp_l].unsqueeze(0) + + decoded = None + if self.w2l_decoder is not None: + decoded = self.w2l_decoder.decode(lp) + if len(decoded) < 1: + decoded = None + else: + decoded = decoded[0] + if len(decoded) < 1: + decoded = None + else: + decoded = decoded[0] + + p = (t != self.task.target_dictionary.pad()) & ( + t != self.task.target_dictionary.eos() + ) + targ = t[p] + targ_units = self.task.target_dictionary.string(targ) + targ_units_arr = targ.tolist() + + toks = lp.argmax(dim=-1).unique_consecutive() + pred_units_arr = toks[toks != self.blank_idx].tolist() + + c_err += editdistance.eval(pred_units_arr, targ_units_arr) + c_len += len(targ_units_arr) + + targ_words = post_process(targ_units, self.post_process).split() + + pred_units = self.task.target_dictionary.string(pred_units_arr) + pred_words_raw = post_process(pred_units, self.post_process).split() + + if decoded is not None and "words" in decoded: + pred_words = decoded["words"] + w_errs += editdistance.eval(pred_words, targ_words) + wv_errs += editdistance.eval(pred_words_raw, targ_words) + else: + dist = editdistance.eval(pred_words_raw, targ_words) + w_errs += dist + wv_errs += dist + + w_len += len(targ_words) + + logging_output["wv_errors"] = wv_errs + logging_output["w_errors"] = w_errs + logging_output["w_total"] = w_len + logging_output["c_errors"] = c_err + logging_output["c_total"] = c_len + + 
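+ # At this point `loss` is the CTC loss, optionally blended with the decoder CE loss as
+ #   (1 - dec_weight) * ctc_loss + dec_weight * dec_loss * sample_size / dec_sample_size,
+ # and during evaluation `logging_output` additionally carries the character/word
+ # error counts accumulated above for UER/WER reporting.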
return loss, sample_size, logging_output + + def compute_ce_loss(self, model, net_output, sample, reduce=True): + lprobs, target = self.get_lprobs_and_target(model, net_output, sample) + loss, nll_loss = label_smoothed_nll_loss( + lprobs, + target, + self.eps, + ignore_index=self.pad_idx, + reduce=reduce, + ) + return loss, nll_loss + + def compute_accuracy(self, model, net_output, sample): + lprobs, target = self.get_lprobs_and_target(model, net_output, sample) + mask = target.ne(self.pad_idx) + n_correct = torch.sum( + lprobs.argmax(1).masked_select(mask).eq(target.masked_select(mask)) + ) + total = torch.sum(mask) + return n_correct, total + + def get_lprobs_and_target(self, model, net_output, sample): + lprobs = model.get_normalized_probs(net_output, log_probs=True) + target = sample["decoder_target"] + if self.ignore_prefix_size > 0: + if getattr(lprobs, "batch_first", False): + lprobs = lprobs[:, self.ignore_prefix_size :, :].contiguous() + target = target[:, self.ignore_prefix_size :].contiguous() + else: + lprobs = lprobs[self.ignore_prefix_size :, :, :].contiguous() + target = target[self.ignore_prefix_size :, :].contiguous() + return lprobs.view(-1, lprobs.size(-1)), target.view(-1) + + + @staticmethod + def reduce_metrics(logging_outputs) -> None: + """Aggregate logging outputs from data parallel training.""" + + loss_sum = utils.item(sum(log.get("loss", 0) for log in logging_outputs)) + ntokens = utils.item(sum(log.get("ntokens", 0) for log in logging_outputs)) + nsentences = utils.item( + sum(log.get("nsentences", 0) for log in logging_outputs) + ) + sample_size = utils.item( + sum(log.get("sample_size", 0) for log in logging_outputs) + ) + + metrics.log_scalar( + "loss", loss_sum / sample_size / math.log(2), sample_size, round=3 + ) + metrics.log_scalar("ntokens", ntokens) + metrics.log_scalar("nsentences", nsentences) + if sample_size != ntokens: + metrics.log_scalar( + "nll_loss", loss_sum / ntokens / math.log(2), ntokens, round=3 + ) + + c_errors = sum(log.get("c_errors", 0) for log in logging_outputs) + metrics.log_scalar("_c_errors", c_errors) + c_total = sum(log.get("c_total", 0) for log in logging_outputs) + metrics.log_scalar("_c_total", c_total) + w_errors = sum(log.get("w_errors", 0) for log in logging_outputs) + metrics.log_scalar("_w_errors", w_errors) + wv_errors = sum(log.get("wv_errors", 0) for log in logging_outputs) + metrics.log_scalar("_wv_errors", wv_errors) + w_total = sum(log.get("w_total", 0) for log in logging_outputs) + metrics.log_scalar("_w_total", w_total) + + if c_total > 0: + metrics.log_derived( + "uer", + lambda meters: safe_round( + meters["_c_errors"].sum * 100.0 / meters["_c_total"].sum, 3 + ) + if meters["_c_total"].sum > 0 + else float("nan"), + ) + if w_total > 0: + metrics.log_derived( + "wer", + lambda meters: safe_round( + meters["_w_errors"].sum * 100.0 / meters["_w_total"].sum, 3 + ) + if meters["_w_total"].sum > 0 + else float("nan"), + ) + metrics.log_derived( + "raw_wer", + lambda meters: safe_round( + meters["_wv_errors"].sum * 100.0 / meters["_w_total"].sum, 3 + ) + if meters["_w_total"].sum > 0 + else float("nan"), + ) + + if "dec_loss" in logging_outputs[0]: + ctc_loss_sum = sum(log.get("ctc_loss", 0) for log in logging_outputs) + dec_loss_sum = sum(log.get("dec_loss", 0) for log in logging_outputs) + dec_nll_loss_sum = sum(log.get("dec_nll_loss", 0) for log in logging_outputs) + dec_sample_size = sum(log.get("dec_sample_size", 0) for log in logging_outputs) + metrics.log_scalar( + "dec_loss", dec_loss_sum / dec_sample_size / 
math.log(2), dec_sample_size, round=3 + ) + metrics.log_scalar( + "ctc_loss", ctc_loss_sum / sample_size / math.log(2), sample_size, round=3 + ) + metrics.log_scalar( + "dec_nll_loss", dec_nll_loss_sum / dec_sample_size / math.log(2), dec_sample_size, round=3 + ) + metrics.log_derived( + "dec_ppl", lambda meters: utils.get_perplexity(meters["dec_nll_loss"].avg) + ) + total = utils.item(sum(log.get("total", 0) for log in logging_outputs)) + if total > 0: + metrics.log_scalar("total", total) + n_correct = utils.item( + sum(log.get("dec_n_correct", 0) for log in logging_outputs) + ) + metrics.log_scalar("dec_n_correct", n_correct) + metrics.log_derived( + "dec_accuracy", + lambda meters: round( + meters["dec_n_correct"].sum * 100.0 / meters["total"].sum, 3 + ) + if meters["total"].sum > 0 + else float("nan"), + ) + + @staticmethod + def logging_outputs_can_be_summed() -> bool: + """ + Whether the logging outputs returned by `forward` can be summed + across workers prior to calling `reduce_metrics`. Setting this + to True will improves distributed training speed. + """ + return True diff --git a/SpeechUT/speechut/criterions/speechut_criterion.py b/SpeechUT/speechut/criterions/speechut_criterion.py new file mode 100644 index 0000000000000000000000000000000000000000..0d735f1efd16aebf4146e26d5a5ebaeca2516ad7 --- /dev/null +++ b/SpeechUT/speechut/criterions/speechut_criterion.py @@ -0,0 +1,384 @@ +# ---------------------------------------------------------------------------- +# SpeechUT: Bridging Speech and Text with Hidden-Unit for Encoder-Decoder Based Speech-Text Pre-training (https://arxiv.org/abs/2210.03730) +# Github source: https://github.com/microsoft/SpeechT5/tree/main/SpeechUT +# Code based on fairseq: https://github.com/facebookresearch/fairseq/tree/272c4c5197250997148fb12c0db6306035f166a4 +# +# Copyright (c) 2022 Microsoft +# Licensed under The MIT License [see LICENSE for details] +# ---------------------------------------------------------------------------- + +import logging +import math +import re +from dataclasses import dataclass, field +from typing import List, Optional + +import numpy as np +import torch +import torch.nn.functional as F +from fairseq import metrics, utils +from fairseq.criterions import FairseqCriterion, register_criterion +from fairseq.criterions.label_smoothed_cross_entropy import label_smoothed_nll_loss +from fairseq.dataclass import FairseqDataclass + +logger = logging.getLogger(__name__) + +@dataclass +class SpeechUTCriterionConfig(FairseqDataclass): + pred_masked_weight: float = field( + default=1.0, + metadata={"help": "weight for predictive loss for masked frames"}, + ) + pred_nomask_weight: float = field( + default=0.0, + metadata={"help": "weight for predictive loss for unmasked frames"}, + ) + loss_weights: Optional[List[float]] = field( + default=None, + metadata={"help": "weights for additional loss terms (not first one)"}, + ) + log_keys: List[str] = field( + default_factory=lambda: [], + metadata={"help": "output keys to log"}, + ) + u2t_ed_weight: float = field( + default=0.1, + metadata={"help": "weights for text ED Loss, loss will be (hubert_loss + text_mum_weight * MUM_Loss + u2t_ed_weight * CE_Loss + u2t_ctc_weight * CTC_loss)"}, + ) + u2t_ctc_weight: float = field( + default=0.0, + metadata={"help": "weights for text ED Loss, loss will be (hubert_loss + text_mum_weight * MUM_Loss + u2t_ed_weight * CE_Loss + u2t_ctc_weight * CTC_loss)"}, + ) + text_mum_weight: float = field( + default=0.0, + metadata={"help": "masked unit modeling weight 
from the text end"}, + ) + report_accuracy: bool = field( + default=True, + metadata={"help": "report decoder accuracy metric"}, + ) + ignore_prefix_size: int = field( + default=0, + metadata={"help": "Ignore first N tokens"}, + ) + label_smoothing: float = field( + default=0.0, + metadata={"help": "epsilon for label smoothing, 0 means no label smoothing"}, + ) + no_ctc_blank: bool = field( + default=False, + metadata={"help": "mask out the blank of ctc, only when dec_loss_type=ctc"}, + ) + label_smoothing: float = field( + default=0.0, + metadata={"help": "epsilon for label smoothing, 0 means no label smoothing"}, + ) + +@register_criterion("speechut_criterion", dataclass=SpeechUTCriterionConfig) +class SpeechUTCriterion(FairseqCriterion): + def __init__( + self, + task, + pred_masked_weight, + pred_nomask_weight, + loss_weights=None, + log_keys=None, + u2t_ed_weight=0.1, + u2t_ctc_weight=0, + text_mum_weight=0, + report_accuracy=False, + ignore_prefix_size=0, + label_smoothing=0, + no_ctc_blank=False, + ): + super().__init__(task) + self.pred_masked_weight = pred_masked_weight + self.pred_nomask_weight = pred_nomask_weight + self.loss_weights = loss_weights + self.log_keys = [] if log_keys is None else log_keys + self.u2t_ed_weight = u2t_ed_weight + self.u2t_ctc_weight = u2t_ctc_weight + self.text_mum_weight = text_mum_weight + self.report_accuracy = report_accuracy + self.ignore_prefix_size = ignore_prefix_size + self.eps = label_smoothing + self.no_ctc_blank = no_ctc_blank + self.padding_idx = task.dictionaries[0].pad() + self.eos_idx = task.dictionaries[0].eos() + self.blank_idx = task.dictionaries[0].bos() + + def compute_hubert_loss(self, model, net_output, reduction, preffix='', suffix=''): + loss = 0 + sample_size = [] + logging_output = {} + loss_m_list = [] + logp_m_list = model.get_logits(net_output, True) + targ_m_list = model.get_targets(net_output, True) + assert self.pred_masked_weight == 0 or len(logp_m_list) > 0 + for i, (logp_m, targ_m) in enumerate(zip(logp_m_list, targ_m_list)): + loss_m = F.cross_entropy(logp_m, targ_m, reduction=reduction) + loss_m_list.append(loss_m) + logging_output[f"{preffix}loss_m_{i}"] = loss_m.detach().item() + if self.pred_masked_weight > 0: + loss += self.pred_masked_weight * sum(loss_m_list) + sample_size.append(targ_m_list[0].numel()) + + loss_u_list = [] + logp_u_list = model.get_logits(net_output, False) + targ_u_list = model.get_targets(net_output, False) + assert self.pred_nomask_weight == 0 or len(logp_u_list) > 0 + for i, (logp_u, targ_u) in enumerate(zip(logp_u_list, targ_u_list)): + loss_u = F.cross_entropy(logp_u, targ_u, reduction=reduction) + loss_u_list.append(loss_u) + logging_output[f"{preffix}loss_u_{i}"] = loss_u.detach().item() + if self.pred_nomask_weight > 0: + loss += self.pred_nomask_weight * sum(loss_u_list) + sample_size.append(targ_u_list[0].numel()) + + sample_size = np.mean(sample_size) + + def compute_correct(logits, targets): + if logits.numel() == 0: + return 0, 0 + else: + assert logits.dim() > 1, logits.shape + max = logits.argmax(-1) == targets + min = logits.argmin(-1) == targets + both = max & min + corr = max.long().sum().item() - both.long().sum().item() + count = max.numel() + return corr, count + + with torch.no_grad(): + for i, (logp_m, targ_m) in enumerate(zip(logp_m_list, targ_m_list)): + corr_m, count_m = compute_correct(logp_m, targ_m) + logging_output[f"correct_m_{i}{suffix}"] = corr_m + logging_output[f"count_m_{i}{suffix}"] = count_m + + for i, (logp_u, targ_u) in enumerate(zip(logp_u_list, 
targ_u_list)): + corr_u, count_u = compute_correct(logp_u, targ_u) + logging_output[f"correct_u_{i}{suffix}"] = corr_u + logging_output[f"count_u_{i}{suffix}"] = count_u + + return loss, sample_size, logging_output + + + def forward(self, model, sample, reduce=True, log_pred=False): + """Compute the loss for the given sample. + Returns a tuple with three elements: + 1) the loss + 2) the sample size, which is used as the denominator for the gradient + 3) logging outputs to display while training + """ + reduction = "sum" if reduce else "none" + + if "net_input" in sample: + unit_sample = text_sample = None + else: + unit_sample = sample.get("text_mono", None) + text_sample = sample.get("text_paired", None) + assert unit_sample is not None or text_sample is not None + sample = sample.get("speech") + + ### 1. S2U: do hubert forward and loss computation + sample["modality"] = "speech" + net_output = model(target_list=sample["target_list"], **sample["net_input"]) + loss, sample_size, logging_output = self.compute_hubert_loss( + model, + net_output, + reduction, + ) + if self.loss_weights is not None: + assert hasattr(model, "get_extra_losses") + extra_losses, names = model.get_extra_losses(net_output) + if torch.is_tensor(extra_losses): + extra_losses = [extra_losses] + names = [names] + if len(self.loss_weights) == 1 and len(extra_losses) != 1: + self.loss_weights = [self.loss_weights[0]] * len(extra_losses) + assert len(extra_losses) == len( + self.loss_weights + ), f"{len(extra_losses)}, {len(self.loss_weights)}" + for p, n, coef in zip(extra_losses, names, self.loss_weights): + if coef != 0 and p is not None: + p = coef * p.float() * sample_size + loss += p + logging_output[f"loss_{n}"] = p.item() + for lk in self.log_keys: + if lk in net_output: + logging_output[lk] = float((net_output[lk])) + + ### 2. do text U2T forward and loss computation + if text_sample is not None and (self.u2t_ctc_weight + self.u2t_ed_weight) > 0: + ## 2.1 re-loading "target_list", in default case, target_list = [src_tokens], + ## while in case of using "unit-phone-char" structure, target_list will be [ref_tokens] + text_sample["net_input"]["target_list"] = [ + text_sample.get("ref_tokens", text_sample["net_input"]["src_tokens"].clone()), + ] + text_net_output = model(**text_sample["net_input"]) + text_sample_size = text_sample["ntokens"] + + ### 2.1 U2T_UCTC + if self.u2t_ctc_weight > 0: + text_ctc_loss = self.compute_ctc_loss(model, text_net_output, text_sample["target"], reduction=reduction) + loss += self.u2t_ctc_weight * text_ctc_loss * sample_size / text_sample_size + logging_output["text_ctc_loss"] = utils.item(text_ctc_loss) + logging_output["text_sample_size"] = text_sample_size + + ### 2.2 U2T_ED + if self.u2t_ed_weight > 0: + text_dec_loss, text_dec_nll_loss = self.compute_ce_loss(model, text_net_output["decoder_out"], text_sample, reduce=reduce) + loss += self.u2t_ed_weight * text_dec_loss * sample_size / text_sample_size + logging_output["text_dec_loss"] = utils.item(text_dec_loss) + logging_output["text_dec_nll_loss"] = utils.item(text_dec_nll_loss) + logging_output["text_sample_size"] = text_sample_size + if self.report_accuracy: + n_correct, total = self.compute_accuracy(model, text_net_output["decoder_out"], text_sample) + logging_output["correct_text_dec"] = utils.item(n_correct.data) + logging_output["count_text_dec"] = utils.item(total.data) + + ### 3. 
do unit MUM forward and loss computation + if unit_sample is not None and self.text_mum_weight > 0: + src_tokens = unit_sample["net_input"]["src_tokens"] + target = unit_sample.get("target", None) + target = src_tokens.clone() if target is None else target + unit_net_output = model.forward_mum(src_tokens, target) + loss_num, sample_size_mum, logging_output_mum = self.compute_hubert_loss( + model, + unit_net_output, + reduction, + preffix="mum_", + suffix="_mum", + ) + loss += self.text_mum_weight * loss_num * sample_size / sample_size_mum + logging_output["unit_sample_size"] = sample_size_mum + logging_output.update(logging_output_mum) + + logging_output = { + "loss": utils.item(loss) if reduce else loss, + "ntokens": sample_size, + "nsentences": sample["id"].numel() + (text_sample["id"].numel() if text_sample is not None else 0), + "sample_size": sample_size, + **logging_output, + } + + return loss, sample_size, logging_output + + def compute_ctc_loss(self, model, net_output, target, reduction): + logits = net_output["encoder_out_ctc"][0] # (T, B, C) from the code-encoder + if self.no_ctc_blank: + ## set prob of <blank> to -inf + logits = logits.float() + logits[:, :, self.blank_idx] = -1000000.0 + + lprobs = F.log_softmax(logits.float(), dim=-1) + + encoder_padding_mask = net_output["encoder_padding_mask"][0] + non_padding_mask = ~encoder_padding_mask + input_lengths = non_padding_mask.long().sum(-1) + pad_mask = (target != self.padding_idx) & (target != self.eos_idx) + targets_flat = target.masked_select(pad_mask) + target_lengths = pad_mask.sum(-1) + + with torch.backends.cudnn.flags(enabled=False): + loss = F.ctc_loss( + lprobs, + targets_flat, + input_lengths, + target_lengths, + blank=self.blank_idx, + reduction=reduction, + zero_infinity=True, + ) + return loss + + def compute_ce_loss(self, model, net_output, sample, reduce=True): + lprobs, target = self.get_lprobs_and_target(model, net_output, sample) + loss, nll_loss = label_smoothed_nll_loss( + lprobs, + target, + self.eps, + ignore_index=self.padding_idx, + reduce=reduce, + ) + return loss, nll_loss + + def compute_accuracy(self, model, net_output, sample): + lprobs, target = self.get_lprobs_and_target(model, net_output, sample) + mask = target.ne(self.padding_idx) + n_correct = torch.sum( + lprobs.argmax(1).masked_select(mask).eq(target.masked_select(mask)) + ) + total = torch.sum(mask) + return n_correct, total + + def get_lprobs_and_target(self, model, net_output, sample): + lprobs = model.get_normalized_probs(net_output, log_probs=True) + target = sample["target"] + + return lprobs.view(-1, lprobs.size(-1)), target.view(-1) + + @staticmethod + def reduce_metrics(logging_outputs) -> None: + """Aggregate logging outputs from data parallel training (copied from normal cross entropy).""" + loss_sum = sum(log.get("loss", 0) for log in logging_outputs) + ntokens = sum(log.get("ntokens", 0) for log in logging_outputs) + sample_size = sum(log.get("sample_size", 0) for log in logging_outputs) + + metrics.log_scalar( + "loss", loss_sum / sample_size / math.log(2), sample_size, round=3 + ) + if sample_size != ntokens: + metrics.log_scalar( + "nll_loss", loss_sum / ntokens / math.log(2), ntokens, round=3 + ) + metrics.log_derived( + "ppl", lambda meters: utils.get_perplexity(meters["nll_loss"].avg) + ) + else: + metrics.log_derived( + "ppl", lambda meters: utils.get_perplexity(meters["loss"].avg) + ) + + counts = {} + for lk in logging_outputs[0].keys(): + if lk.startswith("count_"): + val = sum(log.get(lk, 0) for log in 
logging_outputs) + metrics.log_scalar(lk, val) + counts[lk] = val + + for lk in logging_outputs[0].keys(): + if lk.startswith("loss_"): + val = sum(log.get(lk, 0) for log in logging_outputs) + metrics.log_scalar(lk, val / sample_size / math.log(2), round=3) + elif lk.startswith("correct_"): + val = sum(log.get(lk, 0) for log in logging_outputs) + metrics.log_scalar(lk, val / counts[re.sub("correct", "count", lk)]) + + if "text_sample_size" in logging_outputs[0]: + text_sample_size = sum(log.get("text_sample_size", 0) for log in logging_outputs) + for lk in logging_outputs[0].keys(): + if lk.startswith("text_") and lk.endswith("_loss"): + val = sum(log.get(lk, 0) for log in logging_outputs) + metrics.log_scalar(lk, val / text_sample_size / math.log(2), round=3) + + if "unit_sample_size" in logging_outputs[0]: + unit_sample_size = sum(log.get("unit_sample_size", 0) for log in logging_outputs) + for lk in logging_outputs[0].keys(): + if lk.startswith("mum_loss_"): + val = sum(log.get(lk, 0) for log in logging_outputs) + metrics.log_scalar(lk, val / unit_sample_size / math.log(2), round=3) + + @staticmethod + def aggregate_logging_outputs(logging_outputs): + """Aggregate logging outputs from data parallel training.""" + raise NotImplementedError() + + @staticmethod + def logging_outputs_can_be_summed() -> bool: + """ + Whether the logging outputs returned by `forward` can be summed + across workers prior to calling `reduce_metrics`. Setting this + to True will improves distributed training speed. + """ + return False diff --git a/SpeechUT/speechut/data/concat_dataset.py b/SpeechUT/speechut/data/concat_dataset.py new file mode 100644 index 0000000000000000000000000000000000000000..5766921ac39b571010b318e0d4b6f967cd21d96e --- /dev/null +++ b/SpeechUT/speechut/data/concat_dataset.py @@ -0,0 +1,129 @@ +# -------------------------------------------------------- +# Copyright (c) 2022 Microsoft +# Licensed under The MIT License [see LICENSE for details] +# Based on fairseq code bases +# https://github.com/facebookresearch/fairseq +# -------------------------------------------------------- + +import bisect + +import numpy as np +from torch.utils.data.dataloader import default_collate + +from fairseq.data import FairseqDataset + + +class ConcatDataset(FairseqDataset): + @staticmethod + def cumsum(sequence, sample_ratios): + r, s = [], 0 + for e, ratio in zip(sequence, sample_ratios): + curr_len = int(ratio * len(e)) + r.append(curr_len + s) + s += curr_len + return r + + def __init__(self, datasets, sample_ratios=1): + super(ConcatDataset, self).__init__() + assert len(datasets) > 0, "datasets should not be an empty iterable" + self.datasets = list(datasets) + if isinstance(sample_ratios, int): + sample_ratios = [sample_ratios] * len(self.datasets) + self.sample_ratios = sample_ratios + self.cumulative_sizes = self.cumsum(self.datasets, sample_ratios) + self.real_sizes = [len(d) for d in self.datasets] + + def __len__(self): + return self.cumulative_sizes[-1] + + def __getitem__(self, idx): + dataset_idx, sample_idx = self._get_dataset_and_sample_index(idx) + return self.datasets[dataset_idx][sample_idx] + + def _get_dataset_and_sample_index(self, idx: int): + dataset_idx = bisect.bisect_right(self.cumulative_sizes, idx) + if dataset_idx == 0: + sample_idx = idx + else: + sample_idx = idx - self.cumulative_sizes[dataset_idx - 1] + sample_idx = sample_idx % self.real_sizes[dataset_idx] + return dataset_idx, sample_idx + + def collater(self, samples, **extra_args): + # For now only supports datasets with 
same underlying collater implementations + if hasattr(self.datasets[0], "collater"): + return self.datasets[0].collater(samples, **extra_args) + else: + return default_collate(samples, **extra_args) + + def size(self, idx: int): + """ + Return an example's size as a float or tuple. + """ + dataset_idx, sample_idx = self._get_dataset_and_sample_index(idx) + return self.datasets[dataset_idx].size(sample_idx) + + def num_tokens(self, index: int): + return np.max(self.size(index)) + + def attr(self, attr: str, index: int): + dataset_idx = bisect.bisect_right(self.cumulative_sizes, index) + return getattr(self.datasets[dataset_idx], attr, None) + + @property + def sizes(self): + _dataset_sizes = [] + for ds, sr in zip(self.datasets, self.sample_ratios): + if isinstance(ds.sizes, np.ndarray): + _dataset_sizes.append(np.tile(ds.sizes, sr)) + else: + # Only support underlying dataset with single size array. + assert isinstance(ds.sizes, list) + _dataset_sizes.append(np.tile(ds.sizes[0], sr)) + return np.concatenate(_dataset_sizes) + + @property + def supports_prefetch(self): + return all(d.supports_prefetch for d in self.datasets) + + def ordered_indices(self): + """ + Returns indices sorted by length. So less padding is needed. + """ + if isinstance(self.sizes, np.ndarray) and len(self.sizes.shape) > 1: + # special handling for concatenating lang_pair_datasets + if getattr(self.datasets[0], "shuffle", False): + indices = np.random.permutation(len(self)).astype(np.int64) + else: + indices = np.arange(len(self), dtype=np.int64) + sizes = self.sizes + tgt_sizes = ( + sizes[:, 1] if len(sizes.shape) > 0 and sizes.shape[1] > 1 else None + ) + src_sizes = ( + sizes[:, 0] if len(sizes.shape) > 0 and sizes.shape[1] > 1 else sizes + ) + # sort by target length, then source length + if tgt_sizes is not None: + indices = indices[np.argsort(tgt_sizes[indices], kind="mergesort")] + return indices[np.argsort(src_sizes[indices], kind="mergesort")] + else: + return np.argsort(self.sizes) + + def prefetch(self, indices): + frm = 0 + for to, ds in zip(self.cumulative_sizes, self.datasets): + real_size = len(ds) + if getattr(ds, "supports_prefetch", False): + ds.prefetch([(i - frm) % real_size for i in indices if frm <= i < to]) + frm = to + + @property + def can_reuse_epoch_itr_across_epochs(self): + return all(d.can_reuse_epoch_itr_across_epochs for d in self.datasets) + + def set_epoch(self, epoch): + super().set_epoch(epoch) + for ds in self.datasets: + if hasattr(ds, "set_epoch"): + ds.set_epoch(epoch) diff --git a/SpeechUT/speechut/data/hubert_dataset.py b/SpeechUT/speechut/data/hubert_dataset.py new file mode 100644 index 0000000000000000000000000000000000000000..64965dea445a0a5afc63c887b1bc89cece0b203b --- /dev/null +++ b/SpeechUT/speechut/data/hubert_dataset.py @@ -0,0 +1,597 @@ +# -------------------------------------------------------- +# Copyright (c) 2022 Microsoft +# Licensed under The MIT License [see LICENSE for details] +# Based on fairseq code bases +# https://github.com/facebookresearch/fairseq +# -------------------------------------------------------- + +import itertools +import logging +import io +import os +import sys +import time +from pathlib import Path +from typing import Any, List, Optional, Union, Tuple + +import numpy as np + +import torch +import torch.nn.functional as F +from fairseq.data import data_utils, Dictionary +from fairseq.data.fairseq_dataset import FairseqDataset +from fairseq.data.audio.audio_utils import ( + read_from_stored_zip, + is_sf_audio_data, +) + 
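+# Illustrative manifest layout consumed by load_audio() below (paths are made up):
+#   /path/to/audio_root
+#   spk1/utt1.flac<TAB>52480
+#   shard0.zip:1024:2048<TAB>36800
+# The first line is the audio root; every following line is a tab-separated pair of
+# a path (optionally "[zip_path]:[offset]:[length]", see parse_path) and its length
+# in samples, which is filtered against the min_keep/max_keep thresholds.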
+FEATURE_OR_SF_AUDIO_FILE_EXTENSIONS = {".npy", ".wav", ".flac", ".ogg"} + +logger = logging.getLogger(__name__) + +def parse_path(path: str) -> Tuple[str, List[int]]: + """Parse data path which is either a path to + 1. a .npy/.wav/.flac/.ogg file + 2. a stored ZIP file with slicing info: "[zip_path]:[offset]:[length]" + + Args: + path (str): the data path to parse + + Returns: + file_path (str): the file path + slice_ptr (list of int): empty in case 1; + byte offset and length for the slice in case 2 + """ + + if Path(path).suffix in FEATURE_OR_SF_AUDIO_FILE_EXTENSIONS: + _path, slice_ptr = path, [] + else: + _path, *slice_ptr = path.split(":") + if not Path(_path).is_file(): + raise FileNotFoundError(f"File not found: {_path}") + assert len(slice_ptr) in {0, 1, 2}, f"Invalid path: {path}" + slice_ptr = [int(i) for i in slice_ptr] + return _path, slice_ptr + +def load_audio(manifest_path, max_keep, min_keep, retry_times=5): + n_long, n_short = 0, 0 + names, inds, sizes, chunk_names, chunk_indices = [], [], [], [], [] + for i in range(retry_times): + with open(manifest_path) as f: + root = f.readline().strip() + for ind, line in enumerate(f): + items = line.strip().split("\t") + assert len(items) == 2, line + sz = int(items[1]) + if min_keep is not None and sz < min_keep: + n_short += 1 + elif max_keep is not None and sz > max_keep: + n_long += 1 + else: + fname = items[0].split(":") + if len(fname) > 2: + if len(chunk_names) == 0 or fname[0] != chunk_names[-1]: + chunk_names.append(fname[0]) + chunk_indices.append(len(names)) + names.append(items[0]) + inds.append(ind) + sizes.append(sz) + if len(names) == 0: + logger.warn(f"Fail to load manifest for the {i} time") + time.sleep(1) + continue + else: + break + tot = ind + 1 + logger.info( + ( + f"max_keep={max_keep}, min_keep={min_keep}, " + f"loaded {len(names)}, skipped {n_short} short and {n_long} long, " + f"longest-loaded={max(sizes)}, shortest-loaded={min(sizes)}" + ) + ) + return root, names, inds, tot, sizes, chunk_names, chunk_indices + + +def load_label(label_path, inds, tot, retry_times=5): + for i in range(retry_times): + with open(label_path) as f: + labels = [line.rstrip() for line in f] + if len(labels) == 0: + logger.warn(f"Fail to load label for the {i} time") + time.sleep(1) + continue + else: + break + assert ( + len(labels) == tot + ), f"number of labels does not match ({len(labels)} != {tot})" + labels = [labels[i] for i in inds] + return labels + + +def load_label_offset(label_path, inds, tot, retry_times=5): + for i in range(retry_times): + with open(label_path) as f: + code_lengths = [len(line.encode("utf-8")) for line in f] + if len(code_lengths) == 0: + logger.warn(f"Fail to load label for the {i} time") + time.sleep(1) + continue + else: + break + assert ( + len(code_lengths) == tot + ), f"number of labels does not match ({len(code_lengths)} != {tot})" + offsets = list(itertools.accumulate([0] + code_lengths)) + offsets = [(offsets[i], offsets[i + 1]) for i in inds] + return offsets + + +def verify_label_lengths( + audio_sizes, + audio_rate, + label_path, + label_rate, + inds, + tot, + tol=0.1, # tolerance in seconds +): + if label_rate < 0: + logger.info(f"{label_path} is sequence label. 
skipped") + return + + with open(label_path) as f: + lengths = [len(line.rstrip().split()) for line in f] + assert len(lengths) == tot + lengths = [lengths[i] for i in inds] + num_invalid = 0 + for i, ind in enumerate(inds): + dur_from_audio = audio_sizes[i] / audio_rate + dur_from_label = lengths[i] / label_rate + if abs(dur_from_audio - dur_from_label) > tol: + logger.warning( + ( + f"audio and label duration differ too much " + f"(|{dur_from_audio} - {dur_from_label}| > {tol}) " + f"in line {ind+1} of {label_path}. Check if `label_rate` " + f"is correctly set (currently {label_rate}). " + f"num. of samples = {audio_sizes[i]}; " + f"label length = {lengths[i]}" + ) + ) + num_invalid += 1 + if num_invalid > 0: + logger.warning( + f"total {num_invalid} (audio, label) pairs with mismatched lengths" + ) + + +class HubertDataset(FairseqDataset): + def __init__( + self, + manifest_path: str, + sample_rate: float, + label_paths: List[str], + label_rates: Union[List[float], float], # -1 for sequence labels + pad_list: List[str], + eos_list: List[str], + label_processors: Optional[List[Any]] = None, + max_keep_sample_size: Optional[int] = None, + min_keep_sample_size: Optional[int] = None, + max_sample_size: Optional[int] = None, + shuffle: bool = True, + pad_audio: bool = False, + normalize: bool = False, + store_labels: bool = True, + random_crop: bool = False, + single_target: bool = False, + tgt_dict: Optional[Dictionary] = None, + add_decoder_target: bool = False, + fine_tuning: bool = False, + tgt_lang_idx: int = None, + tokenizer = None, + mbart_style_lang_id: bool = False, + retry_times: int = 5, + reduce_label_for_dec: bool = True, + ): + self.audio_root, self.audio_names, inds, tot, self.wav_sizes, self.chunk_names, self.chunk_indices = load_audio( + manifest_path, max_keep_sample_size, min_keep_sample_size, retry_times + ) + self.sample_rate = sample_rate + self.shuffle = shuffle + self.random_crop = random_crop + self.tgt_dict = tgt_dict + self.add_decoder_target = add_decoder_target + self.fine_tuning = fine_tuning + + self.num_labels = len(label_paths) + self.pad_list = pad_list + self.eos_list = eos_list + self.label_processors = label_processors + self.single_target = single_target + self.epoch = 0 + + self.label_rates = ( + [label_rates for _ in range(len(label_paths))] + if isinstance(label_rates, int) + else label_rates + ) + self.store_labels = store_labels + if store_labels: + self.label_list = [load_label(p, inds, tot, retry_times) for p in label_paths] + else: + self.label_paths = label_paths + self.label_offsets_list = [ + load_label_offset(p, inds, tot, retry_times) for p in label_paths + ] + assert label_processors is None or len(label_processors) == self.num_labels + for label_path, label_rate in zip(label_paths, self.label_rates): + verify_label_lengths( + self.wav_sizes, sample_rate, label_path, label_rate, inds, tot + ) + + self.max_sample_size = ( + max_sample_size if max_sample_size is not None else sys.maxsize + ) + self.pad_audio = pad_audio + self.normalize = normalize + self.tgt_lang_idx = tgt_lang_idx + self.tokenizer = tokenizer + self.mbart_style_lang_id = mbart_style_lang_id + self.retry_times = retry_times + self.reduce_label_for_dec = reduce_label_for_dec + logger.info( + f"pad_audio={pad_audio}, random_crop={random_crop}, tgt_lang_idx={self.tgt_lang_idx}, reduce_label_for_dec={reduce_label_for_dec}, " + f"mbart_style_lang_id={mbart_style_lang_id}, normalize={normalize}, max_sample_size={self.max_sample_size}" + ) + + def set_epoch(self, epoch): + 
self.epoch = epoch + + def batch_by_size(self, indices, max_tokens=None, max_sentences=None, required_batch_size_multiple=1): + self.max_tokens = max_tokens + self.max_sentences = max_sentences + self.required_batch_size_multiple = required_batch_size_multiple + if isinstance(indices[0], np.ndarray): + batch_list = [] + for indice in indices: + batch = super(HubertDataset, self).batch_by_size(indice, max_tokens, max_sentences, required_batch_size_multiple) + batch_list.append(batch) + return batch_list + else: + return super(HubertDataset, self).batch_by_size(indices, max_tokens, max_sentences, required_batch_size_multiple) + def shuffle_batches(self, batches, seed): + if isinstance(batches[0], list): + new_batches = [] + with data_utils.numpy_seed(seed): + np.random.shuffle(batches) + for batch in batches: + np.random.shuffle(batch) + new_batches.extend(batch) + return new_batches + else: + with data_utils.numpy_seed(seed): + np.random.shuffle(batches) + return batches + + def get_audio(self, index): + import soundfile as sf + + wav_path = os.path.join(self.audio_root, self.audio_names[index]) + _path, slice_ptr = parse_path(wav_path) + if len(slice_ptr) == 1: + import kaldiio + feat = kaldiio.load_mat(wav_path) + feat = torch.from_numpy(feat).float() + if self.normalize: + with torch.no_grad(): + feat = F.layer_norm(feat, feat.shape[-1]) + return feat + else: + if len(slice_ptr) == 2: + byte_data = read_from_stored_zip(_path, slice_ptr[0], slice_ptr[1]) + assert is_sf_audio_data(byte_data) + wav_path = io.BytesIO(byte_data) + for i in range(self.retry_times): + if i < self.retry_times - 1: + try: + wav, cur_sample_rate = sf.read(wav_path) + break + except Exception as e: + logger.warn(f"Fail to load wav for the {i} time") + logger.warn(e) + time.sleep(1) + continue + else: + wav, cur_sample_rate = sf.read(wav_path) + + wav = torch.from_numpy(wav).float() + wav = self.postprocess(wav, cur_sample_rate) + return wav + + def get_label(self, index, label_idx): + if self.store_labels: + label = self.label_list[label_idx][index] + else: + with open(self.label_paths[label_idx]) as f: + offset_s, offset_e = self.label_offsets_list[label_idx][index] + f.seek(offset_s) + label = f.read(offset_e - offset_s) + + if self.tokenizer is not None and self.fine_tuning: + label = self.tokenizer.encode(label) + + if self.label_processors is not None: + label = self.label_processors[label_idx](label) + return label + + def get_labels(self, index): + return [self.get_label(index, i) for i in range(self.num_labels)] + + def __getitem__(self, index): + wav = self.get_audio(index) + labels = self.get_labels(index) + return {"id": index, "source": wav, "label_list": labels} + + def __len__(self): + return len(self.wav_sizes) + + def crop_to_max_size(self, wav, target_size): + size = len(wav) + diff = size - target_size + if diff <= 0: + return wav, 0 + + start, end = 0, target_size + if self.random_crop: + start = np.random.randint(0, diff + 1) + end = size - diff + start + return wav[start:end], start + + def collater(self, samples): + # target = max(sizes) -> random_crop not used + # target = max_sample_size -> random_crop used for long + samples = [s for s in samples if s["source"] is not None] + if len(samples) == 0: + return {} + + audios = [s["source"] for s in samples] + audio_sizes = [len(s) for s in audios] + if self.pad_audio: + audio_size = min(max(audio_sizes), self.max_sample_size) + else: + audio_size = min(min(audio_sizes), self.max_sample_size) + feat_dim = audios[0].size(-1) if audios[0].dim() > 1 
else 1 + collated_audios, padding_mask, audio_starts = self.collater_audio( + audios, audio_size, feat_dim, + ) + + targets_by_label = [ + [s["label_list"][i] for s in samples] for i in range(self.num_labels) + ] + targets_list, lengths_list, ntokens_list = self.collater_label( + targets_by_label, audio_size, audio_starts + ) + + if self.add_decoder_target: + if self.fine_tuning: + decoder_label = [ + torch.cat((targets_list[0][i, :lengths_list[0][i]], torch.tensor([self.tgt_dict.eos()])), 0).long() + for i in range(targets_list[0].size(0)) + ] + else: + if self.tokenizer is not None: + decoder_label = [ + # Set 48 for translate int to char and avoid \n + torch.cat( + ( + torch.tensor( + self.tokenizer.sp.Encode( + "".join( + [chr(j + 48) for j in ( + targets_list[0][i, :lengths_list[0][i]].unique_consecutive() if self.reduce_label_for_dec else targets_list[0][i, :lengths_list[0][i]] + ).tolist()] + ), out_type=int + ) + ), + torch.tensor([self.tgt_dict.eos()]) + ), dim=0 + ).long() + for i in range(targets_list[0].size(0)) + ] + else: + decoder_label = [ + torch.cat((targets_list[0][i, :lengths_list[0][i]].unique_consecutive() if self.reduce_label_for_dec else targets_list[0][i, :lengths_list[0][i]], torch.tensor([self.tgt_dict.eos()])), 0).long() + for i in range(targets_list[0].size(0)) + ] + + if self.mbart_style_lang_id: + decoder_label = [ + torch.cat((decoder_label[i], torch.tensor([self.tgt_lang_idx])), 0).long() + for i in range(targets_list[0].size(0)) + ] + + dec_ntokens = sum(x.size(0) for x in decoder_label) + decoder_target = data_utils.collate_tokens( + decoder_label, + self.tgt_dict.pad(), + self.tgt_dict.eos() if not self.mbart_style_lang_id else self.tgt_lang_idx, + left_pad=False, + move_eos_to_beginning=False, + ) + decoder_target_lengths = torch.tensor( + [x.size(0) for x in decoder_label], dtype=torch.long + ) + prev_output_tokens = data_utils.collate_tokens( + decoder_label, + self.tgt_dict.pad(), + self.tgt_dict.eos() if not self.mbart_style_lang_id else self.tgt_lang_idx, + left_pad=False, + move_eos_to_beginning=True, + ) + + if self.tgt_lang_idx is not None and not self.mbart_style_lang_id: + assert (prev_output_tokens[:, 0] != self.tgt_dict.eos()).sum() == 0 + prev_output_tokens[:, 0] = self.tgt_lang_idx + + net_input = { + "source": collated_audios, + "padding_mask": padding_mask, + "prev_output_tokens": prev_output_tokens, + } + batch = { + "id": torch.LongTensor([s["id"] for s in samples]), + "net_input": net_input, + "decoder_target": decoder_target, + "decoder_target_lengths": decoder_target_lengths, + "dec_ntokens": dec_ntokens, + "lang_idx": self.tgt_lang_idx, + } + else: + net_input = {"source": collated_audios, "padding_mask": padding_mask} + batch = { + "id": torch.LongTensor([s["id"] for s in samples]), + "net_input": net_input, + } + + if self.single_target: + batch["target_lengths"] = lengths_list[0] + batch["ntokens"] = ntokens_list[0] + batch["target"] = targets_list[0] + else: + batch["target_lengths_list"] = lengths_list + batch["ntokens_list"] = ntokens_list + batch["target_list"] = targets_list + return batch + + def collater_audio(self, audios, audio_size, feat_dim=1): + collated_audios = audios[0].new_zeros(len(audios), audio_size, feat_dim) + padding_mask = ( + torch.BoolTensor(collated_audios.shape[0:2]).fill_(False) + # if self.pad_audio else None + ) + audio_starts = [0 for _ in audios] + for i, audio in enumerate(audios): + audio = audio.view(-1, feat_dim) + diff = len(audio) - audio_size + if diff == 0: + collated_audios[i] = audio + 
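+ # audio shorter than audio_size is only allowed when pad_audio is set; the branch
+ # below right-pads it with zeros and marks those frames in padding_mask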
elif diff < 0: + assert self.pad_audio + collated_audios[i] = torch.cat([audio, audio.new_full((-diff, feat_dim), 0.0)]) + padding_mask[i, diff:] = True + else: + collated_audios[i], audio_starts[i] = self.crop_to_max_size( + audio, audio_size + ) + return collated_audios.squeeze(-1), padding_mask, audio_starts + + def collater_frm_label(self, targets, audio_size, audio_starts, label_rate, pad): + assert label_rate > 0 + s2f = label_rate / self.sample_rate + frm_starts = [int(round(s * s2f)) for s in audio_starts] + frm_size = int(round(audio_size * s2f)) + if not self.pad_audio: + rem_size = [len(t) - s for t, s in zip(targets, frm_starts)] + frm_size = min(frm_size, *rem_size) + targets = [t[s : s + frm_size] for t, s in zip(targets, frm_starts)] + logger.debug(f"audio_starts={audio_starts}") + logger.debug(f"frame_starts={frm_starts}") + logger.debug(f"frame_size={frm_size}") + + lengths = torch.LongTensor([len(t) for t in targets]) + ntokens = lengths.sum().item() + targets = data_utils.collate_tokens(targets, pad_idx=pad, left_pad=False) + return targets, lengths, ntokens + + def collater_seq_label(self, targets, pad): + lengths = torch.LongTensor([len(t) for t in targets]) + ntokens = lengths.sum().item() + targets = data_utils.collate_tokens(targets, pad_idx=pad, left_pad=False) + return targets, lengths, ntokens + + def collater_label(self, targets_by_label, audio_size, audio_starts): + targets_list, lengths_list, ntokens_list = [], [], [] + itr = zip(targets_by_label, self.label_rates, self.pad_list) + for targets, label_rate, pad in itr: + if label_rate == -1: + targets, lengths, ntokens = self.collater_seq_label(targets, pad) + else: + targets, lengths, ntokens = self.collater_frm_label( + targets, audio_size, audio_starts, label_rate, pad + ) + targets_list.append(targets) + lengths_list.append(lengths) + ntokens_list.append(ntokens) + return targets_list, lengths_list, ntokens_list + + def num_tokens(self, index): + return self.size(index) + + def size(self, index): + if self.pad_audio: + return self.wav_sizes[index] + return min(self.wav_sizes[index], self.max_sample_size) + + @property + def sizes(self): + return np.array(self.wav_sizes) + + def ordered_indices(self): + """Return an ordered list of indices. 
Batches will be constructed based + on this order.""" + + if self.shuffle: + if len(self.chunk_names) > 0: + logger.info(f"ordered indices for epoch {self.epoch}") + with data_utils.numpy_seed(self.epoch): + self.chunk_order = np.random.permutation(len(self.chunk_names)) + chunk_count = 0 + tmp_sizes = [] + tmp_indices = [] + indice = [] + for i in self.chunk_order: + chunk_count += 1 + start = self.chunk_indices[i] + end = self.chunk_indices[i+1] if i < len(self.chunk_names) - 1 else len(self) + size = list(self.sizes[start:end]) + tmp_indices.extend(list(np.arange(start, end))) + tmp_sizes.extend(size) + if chunk_count % 10 == 0 or i == self.chunk_order[0]: + order = [np.random.permutation(len(tmp_indices))] + order.append( + np.minimum( + np.array(tmp_sizes), + self.max_sample_size, + ) + ) + sort_idx = np.lexsort(order)[::-1] + indice.append(np.array([tmp_indices[k] for k in sort_idx])) + tmp_indices = [] + tmp_sizes =[] + return indice + else: + order = [np.random.permutation(len(self))] + order.append( + np.minimum( + np.array(self.sizes), + self.max_sample_size, + ) + ) + return np.lexsort(order)[::-1] + else: + return np.arange(len(self)) + + def postprocess(self, wav, cur_sample_rate): + if wav.dim() == 2: + wav = wav.mean(-1) + assert wav.dim() == 1, wav.dim() + + if cur_sample_rate != self.sample_rate: + raise Exception(f"sr {cur_sample_rate} != {self.sample_rate}") + + if self.normalize: + with torch.no_grad(): + wav = F.layer_norm(wav, wav.shape) + return wav diff --git a/SpeechUT/speechut/data/language_trible_dataset.py b/SpeechUT/speechut/data/language_trible_dataset.py new file mode 100644 index 0000000000000000000000000000000000000000..6494127d6bb5d993d557f9f534f7cca83b0f7fa1 --- /dev/null +++ b/SpeechUT/speechut/data/language_trible_dataset.py @@ -0,0 +1,669 @@ +# -------------------------------------------------------- +# Copyright (c) 2022 Microsoft +# Licensed under The MIT License [see LICENSE for details] +# Based on fairseq code bases +# https://github.com/facebookresearch/fairseq +# -------------------------------------------------------- + +import logging +import numpy as np +import torch +import os +import itertools + +from fairseq.data import FairseqDataset, data_utils +from fairseq.data import ( + AppendTokenDataset, + ConcatDataset, + PrependTokenDataset, + data_utils, + indexed_dataset, +) + +logger = logging.getLogger(__name__) + +def load_langtriple_dataset( + data_path, + split, + src, + src_dict, + ref, + ref_dict, + tgt, + tgt_dict, + combine, + dataset_impl, + upsample_primary, + left_pad_source, + left_pad_target, + max_source_positions, + max_target_positions, + prepend_bos=False, + load_alignments=False, + truncate_source=False, + append_source_id=False, + num_buckets=0, + shuffle=True, + pad_to_multiple=1, + prepend_bos_src=None, + lang_format="[{}]", +): + assert not truncate_source + def split_exists(split, src, ref, tgt, lang, data_path): + filename = os.path.join(data_path, "{}.{}-{}-{}.{}".format(split, src, ref, tgt, lang)) + return indexed_dataset.dataset_exists(filename, impl=dataset_impl) + + src_datasets = [] + ref_datasets = [] + tgt_datasets = [] + + for k in itertools.count(): + split_k = split + (str(k) if k > 0 else "") + + # infer langcode + if split_exists(split_k, src, ref, tgt, src, data_path): + prefix = os.path.join(data_path, "{}.{}-{}-{}.".format(split_k, src, ref, tgt)) + elif split_exists(split_k, tgt, ref, src, src, data_path): + prefix = os.path.join(data_path, "{}.{}-{}-{}.".format(split_k, tgt, ref, src)) + else: + if k 
> 0: + break + else: + raise FileNotFoundError( + "Dataset not found: {} ({})".format(split, data_path) + ) + + src_dataset = data_utils.load_indexed_dataset( + prefix + src, src_dict, dataset_impl + ) + src_datasets.append(src_dataset) + + ref_dataset = data_utils.load_indexed_dataset( + prefix + ref, ref_dict, dataset_impl + ) + ref_datasets.append(ref_dataset) + + tgt_dataset = data_utils.load_indexed_dataset( + prefix + tgt, tgt_dict, dataset_impl + ) + if tgt_dataset is not None: + tgt_datasets.append(tgt_dataset) + + logger.info( + "{} {} {}-{}-{} {} examples".format( + data_path, split_k, src, ref, tgt, len(src_datasets[-1]) + ) + ) + + if not combine: + break + + assert len(src_datasets) == len(ref_datasets) + assert len(src_datasets) == len(tgt_datasets) or len(tgt_datasets) == 0 + + if len(src_datasets) == 1: + src_dataset = src_datasets[0] + ref_dataset = ref_datasets[0] + tgt_dataset = tgt_datasets[0] if len(tgt_datasets) > 0 else None + else: + sample_ratios = [1] * len(src_datasets) + sample_ratios[0] = upsample_primary + src_dataset = ConcatDataset(src_datasets, sample_ratios) + ref_dataset = ConcatDataset(ref_datasets, sample_ratios) + if len(tgt_datasets) > 0: + tgt_dataset = ConcatDataset(tgt_datasets, sample_ratios) + else: + tgt_dataset = None + + if prepend_bos: + assert hasattr(src_dict, "bos_index") and hasattr(ref_dict, "bos_index") and hasattr(tgt_dict, "bos_index") + src_dataset = PrependTokenDataset(src_dataset, src_dict.bos()) + ref_dataset = PrependTokenDataset(ref_dataset, ref_dict.bos()) + if tgt_dataset is not None: + tgt_dataset = PrependTokenDataset(tgt_dataset, tgt_dict.bos()) + elif prepend_bos_src is not None: + logger.info(f"prepending src bos: {prepend_bos_src}") + src_dataset = PrependTokenDataset(src_dataset, prepend_bos_src) + ref_dataset = PrependTokenDataset(ref_dataset, prepend_bos_src) + + eos = None + if append_source_id: + src_dataset = AppendTokenDataset( + src_dataset, src_dict.index(lang_format.format(src)) + ) + ref_dataset = AppendTokenDataset( + ref_dataset, ref_dict.index(lang_format.format(ref)) + ) + if tgt_dataset is not None: + tgt_dataset = AppendTokenDataset( + tgt_dataset, tgt_dict.index(lang_format.format(tgt)) + ) + eos = tgt_dict.index(lang_format.format(tgt)) + + align_dataset = None + if load_alignments: + align_path = os.path.join(data_path, "{}.align.{}-{}".format(split, src, tgt)) + if indexed_dataset.dataset_exists(align_path, impl=dataset_impl): + align_dataset = data_utils.load_indexed_dataset( + align_path, None, dataset_impl + ) + + tgt_dataset_sizes = tgt_dataset.sizes if tgt_dataset is not None else None + return LanguageTripleDataset( + src_dataset, + src_dataset.sizes, + src_dict, + ref_dataset, + ref_dataset.sizes, + ref_dict, + tgt_dataset, + tgt_dataset_sizes, + tgt_dict, + left_pad_source=left_pad_source, + left_pad_target=left_pad_target, + align_dataset=align_dataset, + eos=eos, + num_buckets=num_buckets, + shuffle=shuffle, + pad_to_multiple=pad_to_multiple, + ) + + +def collate( + samples, + pad_idx, + eos_idx, + left_pad_source=True, + left_pad_target=False, + input_feeding=True, + pad_to_length=None, + pad_to_multiple=1, +): + if len(samples) == 0: + return {} + + def merge(key, left_pad, move_eos_to_beginning=False, pad_to_length=None): + return data_utils.collate_tokens( + [s[key] for s in samples], + pad_idx, + None, + left_pad, + move_eos_to_beginning, + pad_to_length=pad_to_length, + pad_to_multiple=pad_to_multiple, + ) + + def check_alignment(alignment, src_len, tgt_len): + if alignment is None or 
len(alignment) == 0: + return False + if ( + alignment[:, 0].max().item() >= src_len - 1 + or alignment[:, 1].max().item() >= tgt_len - 1 + ): + logger.warning("alignment size mismatch found, skipping alignment!") + return False + return True + + def compute_alignment_weights(alignments): + """ + Given a tensor of shape [:, 2] containing the source-target indices + corresponding to the alignments, a weight vector containing the + inverse frequency of each target index is computed. + For e.g. if alignments = [[5, 7], [2, 3], [1, 3], [4, 2]], then + a tensor containing [1., 0.5, 0.5, 1] should be returned (since target + index 3 is repeated twice) + """ + align_tgt = alignments[:, 1] + _, align_tgt_i, align_tgt_c = torch.unique( + align_tgt, return_inverse=True, return_counts=True + ) + align_weights = align_tgt_c[align_tgt_i[np.arange(len(align_tgt))]] + return 1.0 / align_weights.float() + + id = torch.LongTensor([s["id"] for s in samples]) + src_tokens = merge( + "source", + left_pad=left_pad_source, + pad_to_length=pad_to_length["source"] if pad_to_length is not None else None, + ) + ref_tokens = merge( + "reference", + left_pad=left_pad_source, + pad_to_length=pad_to_length["source"] if pad_to_length is not None else None, + ) + # sort by descending source length + src_lengths = torch.LongTensor( + [s["source"].ne(pad_idx).long().sum() for s in samples] + ) + ref_lengths = torch.LongTensor( + [s["reference"].ne(pad_idx).long().sum() for s in samples] + ) + src_lengths, sort_order = src_lengths.sort(descending=True) + id = id.index_select(0, sort_order) + src_tokens = src_tokens.index_select(0, sort_order) + ref_lengths = ref_lengths.index_select(0, sort_order) + ref_tokens = ref_tokens.index_select(0, sort_order) + + prev_output_tokens = None + target = None + if samples[0].get("target", None) is not None: + target = merge( + "target", + left_pad=left_pad_target, + pad_to_length=pad_to_length["target"] + if pad_to_length is not None + else None, + ) + target = target.index_select(0, sort_order) + tgt_lengths = torch.LongTensor( + [s["target"].ne(pad_idx).long().sum() for s in samples] + ).index_select(0, sort_order) + ntokens = tgt_lengths.sum().item() + + if samples[0].get("prev_output_tokens", None) is not None: + prev_output_tokens = merge("prev_output_tokens", left_pad=left_pad_target) + elif input_feeding: + # we create a shifted version of targets for feeding the + # previous output token(s) into the next decoder step + prev_output_tokens = merge( + "target", + left_pad=left_pad_target, + move_eos_to_beginning=True, + pad_to_length=pad_to_length["target"] + if pad_to_length is not None + else None, + ) + else: + ntokens = src_lengths.sum().item() + + batch = { + "id": id, + "nsentences": len(samples), + "ntokens": ntokens, + "net_input": { + "src_tokens": src_tokens, + "src_lengths": src_lengths, + }, + "target": target, + "ref_tokens": ref_tokens, + "ref_lengths": ref_lengths, + } + if prev_output_tokens is not None: + batch["net_input"]["prev_output_tokens"] = prev_output_tokens.index_select( + 0, sort_order + ) + + if samples[0].get("alignment", None) is not None: + bsz, tgt_sz = batch["target"].shape + src_sz = batch["net_input"]["src_tokens"].shape[1] + + offsets = torch.zeros((len(sort_order), 2), dtype=torch.long) + offsets[:, 1] += torch.arange(len(sort_order), dtype=torch.long) * tgt_sz + if left_pad_source: + offsets[:, 0] += src_sz - src_lengths + if left_pad_target: + offsets[:, 1] += tgt_sz - tgt_lengths + + alignments = [ + alignment + offset + for align_idx, offset, 
src_len, tgt_len in zip( + sort_order, offsets, src_lengths, tgt_lengths + ) + for alignment in [samples[align_idx]["alignment"].view(-1, 2)] + if check_alignment(alignment, src_len, tgt_len) + ] + + if len(alignments) > 0: + alignments = torch.cat(alignments, dim=0) + align_weights = compute_alignment_weights(alignments) + + batch["alignments"] = alignments + batch["align_weights"] = align_weights + + if samples[0].get("constraints", None) is not None: + # Collate the packed constraints across the samples, padding to + # the length of the longest sample. + lens = [sample.get("constraints").size(0) for sample in samples] + max_len = max(lens) + constraints = torch.zeros((len(samples), max(lens))).long() + for i, sample in enumerate(samples): + constraints[i, 0 : lens[i]] = samples[i].get("constraints") + batch["constraints"] = constraints.index_select(0, sort_order) + + return batch + + +class LanguageTripleDataset(FairseqDataset): + """ + A pair of torch.utils.data.Datasets. + + Args: + src (torch.utils.data.Dataset): source dataset to wrap + src_sizes (List[int]): source sentence lengths + src_dict (~fairseq.data.Dictionary): source vocabulary + tgt (torch.utils.data.Dataset, optional): target dataset to wrap + tgt_sizes (List[int], optional): target sentence lengths + tgt_dict (~fairseq.data.Dictionary, optional): target vocabulary + left_pad_source (bool, optional): pad source tensors on the left side + (default: True). + left_pad_target (bool, optional): pad target tensors on the left side + (default: False). + shuffle (bool, optional): shuffle dataset elements before batching + (default: True). + input_feeding (bool, optional): create a shifted version of the targets + to be passed into the model for teacher forcing (default: True). + remove_eos_from_source (bool, optional): if set, removes eos from end + of source if it's present (default: False). + append_eos_to_target (bool, optional): if set, appends eos to end of + target if it's absent (default: False). + align_dataset (torch.utils.data.Dataset, optional): dataset + containing alignments. + constraints (Tensor, optional): 2d tensor with a concatenated, zero- + delimited list of constraints for each sentence. + append_bos (bool, optional): if set, appends bos to the beginning of + source/target sentence. + num_buckets (int, optional): if set to a value greater than 0, then + batches will be bucketed into the given number of batch shapes. + src_lang_id (int, optional): source language ID, if set, the collated batch + will contain a field 'src_lang_id' in 'net_input' which indicates the + source language of the samples. + tgt_lang_id (int, optional): target language ID, if set, the collated batch + will contain a field 'tgt_lang_id' which indicates the target language + of the samples. 
+ """ + + def __init__( + self, + src, + src_sizes, + src_dict, + ref, + ref_sizes, + ref_dict, + tgt=None, + tgt_sizes=None, + tgt_dict=None, + left_pad_source=True, + left_pad_target=False, + shuffle=True, + input_feeding=True, + remove_eos_from_source=False, + append_eos_to_target=False, + align_dataset=None, + constraints=None, + append_bos=False, + eos=None, + num_buckets=0, + src_lang_id=None, + tgt_lang_id=None, + pad_to_multiple=1, + ): + if tgt_dict is not None: + assert src_dict.pad() == tgt_dict.pad() + assert src_dict.eos() == tgt_dict.eos() + assert src_dict.unk() == tgt_dict.unk() + if tgt is not None: + assert len(src) == len( + tgt + ), "Source and target must contain the same number of examples" + assert len(src) == len( + ref + ), "Source and reference must contain the same number of examples" + self.src = src + self.ref = ref + self.tgt = tgt + self.src_sizes = np.array(src_sizes) + self.ref_sizes = np.array(ref_sizes) + self.tgt_sizes = np.array(tgt_sizes) if tgt_sizes is not None else None + self.sizes = ( + np.vstack((self.src_sizes, self.tgt_sizes)).T + if self.tgt_sizes is not None + else self.src_sizes + ) + self.src_dict = src_dict + self.ref_dict = ref_dict + self.tgt_dict = tgt_dict + self.left_pad_source = left_pad_source + self.left_pad_target = left_pad_target + self.shuffle = shuffle + self.input_feeding = input_feeding + self.remove_eos_from_source = remove_eos_from_source + self.append_eos_to_target = append_eos_to_target + self.align_dataset = align_dataset + if self.align_dataset is not None: + assert ( + self.tgt_sizes is not None + ), "Both source and target needed when alignments are provided" + self.constraints = constraints + self.append_bos = append_bos + self.eos = eos if eos is not None else src_dict.eos() + self.src_lang_id = src_lang_id + self.tgt_lang_id = tgt_lang_id + if num_buckets > 0: + from fairseq.data import BucketPadLengthDataset + + self.src = BucketPadLengthDataset( + self.src, + sizes=self.src_sizes, + num_buckets=num_buckets, + pad_idx=self.src_dict.pad(), + left_pad=self.left_pad_source, + ) + self.src_sizes = self.src.sizes + logger.info("bucketing source lengths: {}".format(list(self.src.buckets))) + self.ref = BucketPadLengthDataset( + self.ref, + sizes=self.ref_sizes, + num_buckets=num_buckets, + pad_idx=self.ref_dict.pad(), + left_pad=self.left_pad_source, + ) + self.ref_sizes = self.ref.sizes + logger.info("bucketing reference lengths: {}".format(list(self.src.buckets))) + if self.tgt is not None: + self.tgt = BucketPadLengthDataset( + self.tgt, + sizes=self.tgt_sizes, + num_buckets=num_buckets, + pad_idx=self.tgt_dict.pad(), + left_pad=self.left_pad_target, + ) + self.tgt_sizes = self.tgt.sizes + logger.info( + "bucketing target lengths: {}".format(list(self.tgt.buckets)) + ) + + # determine bucket sizes using self.num_tokens, which will return + # the padded lengths (thanks to BucketPadLengthDataset) + num_tokens = np.vectorize(self.num_tokens, otypes=[np.compat.long]) + self.bucketed_num_tokens = num_tokens(np.arange(len(self.src))) + self.buckets = [ + (None, num_tokens) for num_tokens in np.unique(self.bucketed_num_tokens) + ] + else: + self.buckets = None + self.pad_to_multiple = pad_to_multiple + + def get_batch_shapes(self): + return self.buckets + + def __getitem__(self, index): + tgt_item = self.tgt[index] if self.tgt is not None else None + src_item = self.src[index] + ref_item = self.ref[index] + # Append EOS to end of tgt sentence if it does not have an EOS and remove + # EOS from end of src sentence if it 
exists. This is useful when we use + # use existing datasets for opposite directions i.e., when we want to + # use tgt_dataset as src_dataset and vice versa + if self.append_eos_to_target: + eos = self.tgt_dict.eos() if self.tgt_dict else self.src_dict.eos() + if self.tgt and self.tgt[index][-1] != eos: + tgt_item = torch.cat([self.tgt[index], torch.LongTensor([eos])]) + + if self.append_bos: + bos = self.tgt_dict.bos() if self.tgt_dict else self.src_dict.bos() + if self.tgt and self.tgt[index][0] != bos: + tgt_item = torch.cat([torch.LongTensor([bos]), self.tgt[index]]) + + bos = self.src_dict.bos() + if self.src[index][0] != bos: + src_item = torch.cat([torch.LongTensor([bos]), self.src[index]]) + if self.ref[index][0] != bos: + ref_item = torch.cat([torch.LongTensor([bos]), self.ref[index]]) + + if self.remove_eos_from_source: + eos = self.src_dict.eos() + if self.src[index][-1] == eos: + src_item = self.src[index][:-1] + if self.ref[index][-1] == eos: + ref_item = self.ref[index][:-1] + + example = { + "id": index, + "source": src_item, + "reference": ref_item, + "target": tgt_item, + } + if self.align_dataset is not None: + example["alignment"] = self.align_dataset[index] + if self.constraints is not None: + example["constraints"] = self.constraints[index] + return example + + def __len__(self): + return len(self.src) + + def collater(self, samples, pad_to_length=None): + """Merge a list of samples to form a mini-batch. + + Args: + samples (List[dict]): samples to collate + pad_to_length (dict, optional): a dictionary of + {'source': source_pad_to_length, 'target': target_pad_to_length} + to indicate the max length to pad to in source and target respectively. + + Returns: + dict: a mini-batch with the following keys: + + - `id` (LongTensor): example IDs in the original input order + - `ntokens` (int): total number of tokens in the batch + - `net_input` (dict): the input to the Model, containing keys: + + - `src_tokens` (LongTensor): a padded 2D Tensor of tokens in + the source sentence of shape `(bsz, src_len)`. Padding will + appear on the left if *left_pad_source* is ``True``. + - `src_lengths` (LongTensor): 1D Tensor of the unpadded + lengths of each source sentence of shape `(bsz)` + - `prev_output_tokens` (LongTensor): a padded 2D Tensor of + tokens in the target sentence, shifted right by one + position for teacher forcing, of shape `(bsz, tgt_len)`. + This key will not be present if *input_feeding* is + ``False``. Padding will appear on the left if + *left_pad_target* is ``True``. + - `src_lang_id` (LongTensor): a long Tensor which contains source + language IDs of each sample in the batch + + - `target` (LongTensor): a padded 2D Tensor of tokens in the + target sentence of shape `(bsz, tgt_len)`. Padding will appear + on the left if *left_pad_target* is ``True``. 
+ - `tgt_lang_id` (LongTensor): a long Tensor which contains target language + IDs of each sample in the batch + """ + res = collate( + samples, + pad_idx=self.src_dict.pad(), + eos_idx=self.eos, + left_pad_source=self.left_pad_source, + left_pad_target=self.left_pad_target, + input_feeding=self.input_feeding, + pad_to_length=pad_to_length, + pad_to_multiple=self.pad_to_multiple, + ) + if self.src_lang_id is not None or self.tgt_lang_id is not None: + src_tokens = res["net_input"]["src_tokens"] + bsz = src_tokens.size(0) + if self.src_lang_id is not None: + res["net_input"]["src_lang_id"] = ( + torch.LongTensor([[self.src_lang_id]]).expand(bsz, 1).to(src_tokens) + ) + if self.tgt_lang_id is not None: + res["tgt_lang_id"] = ( + torch.LongTensor([[self.tgt_lang_id]]).expand(bsz, 1).to(src_tokens) + ) + return res + + def num_tokens(self, index): + """Return the number of tokens in a sample. This value is used to + enforce ``--max-tokens`` during batching.""" + return max( + self.src_sizes[index], + self.tgt_sizes[index] if self.tgt_sizes is not None else 0, + ) + + def num_tokens_vec(self, indices): + """Return the number of tokens for a set of positions defined by indices. + This value is used to enforce ``--max-tokens`` during batching.""" + sizes = self.src_sizes[indices] + if self.tgt_sizes is not None: + sizes = np.maximum(sizes, self.tgt_sizes[indices]) + return sizes + + def size(self, index): + """Return an example's size as a float or tuple. This value is used when + filtering a dataset with ``--max-positions``.""" + return ( + self.src_sizes[index], + self.tgt_sizes[index] if self.tgt_sizes is not None else 0, + ) + + def ordered_indices(self): + """Return an ordered list of indices. Batches will be constructed based + on this order.""" + if self.shuffle: + indices = np.random.permutation(len(self)).astype(np.int64) + else: + indices = np.arange(len(self), dtype=np.int64) + if self.buckets is None: + # sort by target length, then source length + if self.tgt_sizes is not None: + indices = indices[np.argsort(self.tgt_sizes[indices], kind="mergesort")] + return indices[np.argsort(self.src_sizes[indices], kind="mergesort")] + else: + # sort by bucketed_num_tokens, which is: + # max(padded_src_len, padded_tgt_len) + return indices[ + np.argsort(self.bucketed_num_tokens[indices], kind="mergesort") + ] + + @property + def supports_prefetch(self): + return getattr(self.src, "supports_prefetch", False) and ( + getattr(self.tgt, "supports_prefetch", False) or self.tgt is None + ) + + def prefetch(self, indices): + self.src.prefetch(indices) + if self.tgt is not None: + self.tgt.prefetch(indices) + if self.align_dataset is not None: + self.align_dataset.prefetch(indices) + + def filter_indices_by_size(self, indices, max_sizes): + """Filter a list of sample indices. Remove those that are longer + than specified in max_sizes. 
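+        Filtering is delegated to data_utils.filter_paired_dataset_indices_by_size
+        using the source and target sizes.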
+ + Args: + indices (np.array): original array of sample indices + max_sizes (int or list[int] or tuple[int]): max sample size, + can be defined separately for src and tgt (then list or tuple) + + Returns: + np.array: filtered sample array + list: list of removed indices + """ + return data_utils.filter_paired_dataset_indices_by_size( + self.src_sizes, + self.tgt_sizes, + indices, + max_sizes, + ) diff --git a/SpeechUT/speechut/data/load_langpair_dataset.py b/SpeechUT/speechut/data/load_langpair_dataset.py new file mode 100644 index 0000000000000000000000000000000000000000..bfd204598e67d41a5688e16b0835f96fd40cf384 --- /dev/null +++ b/SpeechUT/speechut/data/load_langpair_dataset.py @@ -0,0 +1,172 @@ +# -------------------------------------------------------- +# Copyright (c) 2022 Microsoft +# Licensed under The MIT License [see LICENSE for details] +# Based on fairseq code bases +# https://github.com/facebookresearch/fairseq +# -------------------------------------------------------- + +""" + Modified from https://github.com/facebookresearch/fairseq/blob/272c4c5197250997148fb12c0db6306035f166a4/fairseq/tasks/translation.py + 1. Add custom lang_format in function load_langpair_dataset + 2. If truncate_source (default no), use RandomCropDataset instead of TruncateDataset +""" + +import itertools +import logging +import os + +from fairseq.data import ( + AppendTokenDataset, + LanguagePairDataset, + PrependTokenDataset, + StripTokenDataset, + TruncateDataset, + RandomCropDataset, + data_utils, + indexed_dataset, +) + +from speechut.data.concat_dataset import ConcatDataset + + +EVAL_BLEU_ORDER = 4 + + +logger = logging.getLogger(__name__) + + +def load_langpair_dataset( + data_path, + split, + src, + src_dict, + tgt, + tgt_dict, + combine, + dataset_impl, + upsample_primary, + left_pad_source, + left_pad_target, + max_source_positions, + max_target_positions, + prepend_bos=False, + load_alignments=False, + truncate_source=False, + append_source_id=False, + num_buckets=0, + shuffle=True, + pad_to_multiple=1, + prepend_bos_src=None, + lang_format="[{}]", + input_feeding=True, +): + def split_exists(split, src, tgt, lang, data_path): + filename = os.path.join(data_path, "{}.{}-{}.{}".format(split, src, tgt, lang)) + return indexed_dataset.dataset_exists(filename, impl=dataset_impl) + + src_datasets = [] + tgt_datasets = [] + + for k in itertools.count(): + split_k = split + (str(k) if k > 0 else "") + + # infer langcode + if split_exists(split_k, src, tgt, src, data_path): + prefix = os.path.join(data_path, "{}.{}-{}.".format(split_k, src, tgt)) + elif split_exists(split_k, tgt, src, src, data_path): + prefix = os.path.join(data_path, "{}.{}-{}.".format(split_k, tgt, src)) + else: + if k > 0: + break + else: + raise FileNotFoundError( + "Dataset not found: {} ({})".format(split, data_path) + ) + + src_dataset = data_utils.load_indexed_dataset( + prefix + src, src_dict, dataset_impl + ) + if truncate_source: + src_dataset = AppendTokenDataset( + RandomCropDataset( + StripTokenDataset(src_dataset, src_dict.eos()), + max_source_positions - 1, + ), + src_dict.eos(), + ) + src_datasets.append(src_dataset) + + tgt_dataset = data_utils.load_indexed_dataset( + prefix + tgt, tgt_dict, dataset_impl + ) + if tgt_dataset is not None: + tgt_datasets.append(tgt_dataset) + + logger.info( + "{} {} {}-{} {} examples".format( + data_path, split_k, src, tgt, len(src_datasets[-1]) + ) + ) + + if not combine: + break + + assert len(src_datasets) == len(tgt_datasets) or len(tgt_datasets) == 0 + + if len(src_datasets) == 
1: + src_dataset = src_datasets[0] + tgt_dataset = tgt_datasets[0] if len(tgt_datasets) > 0 else None + else: + sample_ratios = [1] * len(src_datasets) + sample_ratios[0] = upsample_primary + src_dataset = ConcatDataset(src_datasets, sample_ratios) + if len(tgt_datasets) > 0: + tgt_dataset = ConcatDataset(tgt_datasets, sample_ratios) + else: + tgt_dataset = None + + if prepend_bos: + assert hasattr(src_dict, "bos_index") and hasattr(tgt_dict, "bos_index") + src_dataset = PrependTokenDataset(src_dataset, src_dict.bos()) + if tgt_dataset is not None: + tgt_dataset = PrependTokenDataset(tgt_dataset, tgt_dict.bos()) + elif prepend_bos_src is not None: + logger.info(f"prepending src bos: {prepend_bos_src}") + src_dataset = PrependTokenDataset(src_dataset, prepend_bos_src) + + eos = None + if append_source_id: + src_dataset = AppendTokenDataset( + src_dataset, src_dict.index(lang_format.format(src)) + ) + if tgt_dataset is not None: + tgt_dataset = AppendTokenDataset( + tgt_dataset, tgt_dict.index(lang_format.format(tgt)) + ) + eos = tgt_dict.index(lang_format.format(tgt)) + + align_dataset = None + if load_alignments: + align_path = os.path.join(data_path, "{}.align.{}-{}".format(split, src, tgt)) + if indexed_dataset.dataset_exists(align_path, impl=dataset_impl): + align_dataset = data_utils.load_indexed_dataset( + align_path, None, dataset_impl + ) + + tgt_dataset_sizes = tgt_dataset.sizes if tgt_dataset is not None else None + return LanguagePairDataset( + src_dataset, + src_dataset.sizes, + src_dict, + tgt_dataset, + tgt_dataset_sizes, + tgt_dict, + left_pad_source=left_pad_source, + left_pad_target=left_pad_target, + align_dataset=align_dataset, + eos=eos, + num_buckets=num_buckets, + shuffle=shuffle, + pad_to_multiple=pad_to_multiple, + input_feeding=input_feeding, + ) diff --git a/SpeechUT/speechut/data/multimodal_corpus_dataset.py b/SpeechUT/speechut/data/multimodal_corpus_dataset.py new file mode 100644 index 0000000000000000000000000000000000000000..19a6f8962757dec9b32430a98cd6e850d1f30d19 --- /dev/null +++ b/SpeechUT/speechut/data/multimodal_corpus_dataset.py @@ -0,0 +1,368 @@ +# -------------------------------------------------------- +# Copyright (c) 2022 Microsoft +# Licensed under The MIT License [see LICENSE for details] +# Based on fairseq code bases +# https://github.com/facebookresearch/fairseq +# -------------------------------------------------------- + +import logging +from os import replace +import time +from collections import OrderedDict +from typing import Any, Dict, List, Optional + +import numpy as np +from fairseq.data import data_utils + +from fairseq.data import FairseqDataset + +logger = logging.getLogger(__name__) + + +class MultiCorpusDataset(FairseqDataset): + """ + see fairseq/fairseq/data/multi_corpus_dataset.__doc__ + + Args: + datasets: a OrderedDict of FairseqDataset instances. 
+ distribution: a List containing the probability of getting an utterance from + corresponding dataset + seed: random seed for sampling the datsets + sort_indices: if true, will sort the ordered indices by size + batch_sample: if true, will ensure each batch is from a single dataset + """ + + def __init__( + self, + datasets: Dict[str, FairseqDataset], + max_positions: Dict, + distribution: List[float], + max_tokens_ratio: List[float], + seed: int = 1234, + sort_indices: bool = False, + check_length: bool = False, + ): + super().__init__() + assert isinstance(datasets, OrderedDict) + assert len(datasets) == len(distribution) + # assert sum(distribution) == 1 + self.datasets = datasets + self.distribution = distribution + self.max_tokens_ratio = max_tokens_ratio + self.seed = seed + self.sort_indices = sort_indices + self.max_positions = max_positions + self.check_length = check_length + + # Avoid repeated conversions to list later + self.dataset_list = list(datasets.values()) + self.total_num_instances = 0 + + # first_dataset = self.dataset_list[0] + + self.num_instances_per_dataset = [] + self.dataset_offsets = [] + for i, dataset in enumerate(self.dataset_list): + assert isinstance(dataset, FairseqDataset) + # assert type(dataset) is type(first_dataset) + self.num_instances_per_dataset.append( + 0 if self.distribution[i] == 0 else len(dataset) + ) + self.dataset_offsets.append(self.total_num_instances) + self.total_num_instances += self.num_instances_per_dataset[i] + + def ordered_indices(self): + start = time.time() + with data_utils.numpy_seed(self.seed, self.epoch): + logger.info(f"sampling new dataset with seed {self.seed} epoch {self.epoch}") + sampled_indices = {} + + # For each dataset i, sample self.distribution[i] * self.total_num_instances + for i, key in enumerate(self.datasets): + tp = time.time() + if self.distribution[i] == 0: + # skip dataset if sampling probability is 0 + continue + + if i < len(self.datasets) - 1: + num_instances = int(self.distribution[i] * self.total_num_instances) + high = self.dataset_offsets[i + 1] + else: + num_instances = int(self.distribution[i] * self.total_num_instances) + high = self.total_num_instances + + logger.info(f"sampling {num_instances} from {key} dataset") + + # First, add k copies of the dataset where k = num_instances // len(dataset). + # This ensures an equal distribution of the data points as much as possible. 
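+                # Worked example (illustrative, not from the original code): with num_instances = 25
+                # and len(dataset) = 10, num_copies = 2, so every index is repeated twice and the
+                # remaining 5 indices are drawn from a random permutation without replacement.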
+ # For the remaining entries randomly sample them + dataset_size = len(self.datasets[key]) + num_copies = num_instances // dataset_size + dataset_indices = np.random.permutation(high - self.dataset_offsets[i])[: num_instances - num_copies * dataset_size] + if num_copies > 0: + dataset_indices = np.concatenate( + ( + np.repeat( + np.arange(high - self.dataset_offsets[i]), num_copies + ), + dataset_indices, + ) + ) + # filter by size, we should ignore it by setting check_length=False + # , as it is very time-consuming on large dadaset + if self.max_positions[key] is not None and self.check_length: + dataset_indices, ignored = self.datasets[key].filter_indices_by_size( + dataset_indices, + self.max_positions[key], + ) + if len(ignored) > 0: + logger.warning( + ( + "{:,} samples have invalid sizes and will be skipped, " + "max_positions={}, first few sample ids={}" + ).format(len(ignored), self.max_positions[key], ignored[:10]) + ) + + if self.sort_indices: + logger.info(" - sampled indices took {}s".format(time.time() - tp)) + tp = time.time() + dataset_indices = np.sort(dataset_indices) + ordered_indices = self.datasets[key].ordered_indices() + if isinstance(ordered_indices[0], np.ndarray): # chunked audio data + dataset_indices = [order_idx + self.dataset_offsets[i] for order_idx in ordered_indices] + assert self.dataset_offsets[i] == 0 + # TODO for chunked audio data, now assume len(dataset_indices) == len(dataset). Don't filter any data. + else: + dataset_indices = ordered_indices[dataset_indices] + self.dataset_offsets[i] + logger.info(" - ordered_indices took {}s".format(time.time() - tp)) + else: + np.random.shuffle(dataset_indices) + + sampled_indices[key] = dataset_indices + + logger.info( + "multi_corpus_dataset ordered_indices took {}s".format( + time.time() - start + ) + ) + return sampled_indices + + def _map_index(self, index: int): + """ + If dataset A has length N and dataset B has length M + then index 1 maps to index 1 of dataset A, and index N + 1 + maps to index 1 of B. + """ + counter = 0 + for num_instances, key in zip(self.num_instances_per_dataset, self.datasets): + if index < counter + num_instances: + return index - counter, key + counter += num_instances + raise ValueError( + "Invalid index: {}, max: {}".format(index, self.total_num_instances) + ) + + def __len__(self): + """ + Length of this dataset is the sum of individual datasets + """ + return self.total_num_instances + + def __getitem__(self, index): + new_index, key = self._map_index(index) + try: + item = self.datasets[key][new_index] + item["full_id"] = index + return item + except Exception as e: + e.args = (f"Error from {key} dataset", *e.args) + raise + + def collater(self, samples): + """ + If we are doing batch sampling, then pick the right collater to use. + + Otherwise we assume all collaters are the same. 
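+        Returns a dict mapping each corpus key to that corpus's collated mini-batch;
+        samples are routed back to their corpus via the 'full_id' field.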
+ """ + if len(samples) == 0: + return None + + samples_dict = {key: [] for key in self.datasets} + for s in samples: + _, key = self._map_index(s["full_id"]) + samples_dict[key].append(s) + + batch = {} + for key in samples_dict: + if len(samples_dict[key]) == 0: + continue + batch[key] = self.datasets[key].collater(samples_dict[key]) + + return batch + + + def num_tokens(self, index: int): + index, key = self._map_index(index) + return self.datasets[key].num_tokens(index) + + def size(self, index: int): + index, key = self._map_index(index) + return self.datasets[key].size(index) + + @property + def can_reuse_epoch_itr_across_epochs(self): + return False + + def set_epoch(self, epoch, **unused): + super().set_epoch(epoch) + logger.info(f"setting epoch of multi_corpus_dataset to {epoch}") + for ds in self.dataset_list: + if hasattr(ds, "set_epoch"): + ds.set_epoch(epoch) + self.epoch = epoch + + @property + def supports_prefetch(self): + return False + + @property + def supports_fetch_outside_dataloader(self): + return all( + self.datasets[key].supports_fetch_outside_dataloader + for key in self.datasets + ) + + + def batch_by_size( + self, + indices, + max_tokens=None, + max_sentences=None, + required_batch_size_multiple=1, + ): + dataset_indices = indices + batches_dict = {} + for n, key in enumerate(dataset_indices): + max_tokens_ratio = self.max_tokens_ratio[n] + if isinstance(dataset_indices[key][0], np.ndarray): # chunked audio data + cur_batches = self.datasets[key].batch_by_size( + dataset_indices[key], + round(max_tokens * max_tokens_ratio), + max_sentences, + required_batch_size_multiple, + ) + logger.info(f"Created {sum([len(b) for b in cur_batches])} [{len(cur_batches)}] batches for dataset {key}") + else: + cur_batches = super().batch_by_size( + np.array(dataset_indices[key], dtype=np.int64), + round(max_tokens * max_tokens_ratio), + max_sentences, + required_batch_size_multiple, + ) + logger.info(f"Created {len(cur_batches)} batches for dataset {key}") + batches_dict[key] = cur_batches + + return batches_dict + + + def get_batch_sampler( + self, + indices, + num_shards, + seed, + max_tokens=None, + max_sentences=None, + required_batch_size_multiple=1, + split_modality_batch=False, + ): + + def batch_sampler(dataset, epoch): + start = time.time() + batches_dict = dataset.batch_by_size( + indices, + max_tokens=max_tokens, + max_sentences=max_sentences, + required_batch_size_multiple=required_batch_size_multiple, + ) + logger.info(f"multi_corpus_dataset, batch_by_size took {time.time() - start}s") + start = time.time() + new_batches = [] + + ### shuffle inner group size, split into speech/text batches + shuffled_batches_list = [] + speech_batches = [] + ### we should specify the speech_batches because: we need concatenate different speech datasets + # (e.g. ltr or km) instead of loading them parellelly. 
+ for name, batches in batches_dict.items(): + if name.startswith("speech"): + if isinstance(batches[0], list): # chunked audio data + batches = self.datasets[name].shuffle_batches(list(batches), seed + epoch) + shuffled_batches_list.append(batches) + else: + batches = inner_bucket_shuffle(batches, seed+epoch, num_shards*10) + batches = batches[: (len(batches) // num_shards) * num_shards] + if len(batches) == 0: + logger.warning(f"Sample 0 batch for {name}, you should ensure that no {name} data provided.") + else: + speech_batches += batches + else: + batches = inner_bucket_shuffle(batches, seed+epoch, num_shards*10) + batches = batches[: (len(batches) // num_shards) * num_shards] + if len(batches) == 0: + logger.warning(f"Sample 0 batch for {name}, you should ensure that no {name} data provided.") + else: + batches = shuffle_buckets(batches, seed=seed+epoch, inner_shuf=False) + shuffled_batches_list.append(batches) + if len(speech_batches) > 0: + speech_batches = shuffle_buckets(speech_batches, seed=seed+epoch, inner_shuf=False) + shuffled_batches_list.append(speech_batches) + + ### create the final new_batches + num_batch = min(len(batches) for batches in shuffled_batches_list) + if split_modality_batch: + for i in range(0, num_batch, num_shards): + for batches in shuffled_batches_list: + new_batches += batches[i: i + num_shards] + else: + for i in range(num_batch): + new_batches.append(np.concatenate([batches[i] for batches in shuffled_batches_list])) + + logger.info(f"multi_corpus_dataset sample {len(new_batches)} batches, took {time.time() - start}s") + return new_batches + + def inner_bucket_shuffle(batches, seed, bucket_size=10, thr=0): + """we assert batches is sorted form long to short. + shuffle samples in a buctet(e.g. 10 batches). + batches: a list of numpy array""" + num_batch = len(batches) + new_batches = [] + num_buckets = len(batches) // bucket_size + i = 0 + while i < num_batch: + if (i < bucket_size * thr or + i >= bucket_size * (num_buckets - thr) + ): + new_batches.append(batches[i]) + i += 1 + else: + group = np.concatenate(batches[i: i+bucket_size]) + with data_utils.numpy_seed(seed): + np.random.shuffle(group) + new_batches += np.array_split(group, bucket_size) + i += bucket_size + assert all([len(batch) > 0 for batch in new_batches]) + return new_batches + + def shuffle_buckets(batches, seed, inner_shuf=True): + if inner_shuf: + batches = inner_bucket_shuffle(batches, seed, num_shards*10) + batches = [batches[i: i + num_shards] for i in range(0, len(batches)-num_shards+1, num_shards)] + assert len(batches[-1]) == num_shards + new_batches = [] + with data_utils.numpy_seed(seed): + np.random.shuffle(batches) + for group in batches: + new_batches += group + return new_batches + + return batch_sampler diff --git a/SpeechUT/speechut/models/__init__.py b/SpeechUT/speechut/models/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 diff --git a/SpeechUT/speechut/models/speechut.py b/SpeechUT/speechut/models/speechut.py new file mode 100644 index 0000000000000000000000000000000000000000..cb668286c1c1c420d0c7d7b9e74a3bca17c6c871 --- /dev/null +++ b/SpeechUT/speechut/models/speechut.py @@ -0,0 +1,785 @@ +# ---------------------------------------------------------------------------- +# SpeechUT: Bridging Speech and Text with Hidden-Unit for Encoder-Decoder Based Speech-Text Pre-training (https://arxiv.org/abs/2210.03730) +# Github source: https://github.com/microsoft/SpeechT5/tree/main/SpeechUT +# Code based 
on fairseq: https://github.com/facebookresearch/fairseq/tree/272c4c5197250997148fb12c0db6306035f166a4 +# +# Copyright (c) 2022 Microsoft +# Licensed under The MIT License [see LICENSE for details] +# ---------------------------------------------------------------------------- + +import logging +from dataclasses import dataclass, field +from typing import Dict, List, Optional, Tuple + +import numpy as np +import torch +import torch.nn as nn +import torch.nn.functional as F + +from fairseq import utils, checkpoint_utils +from fairseq.data.data_utils import compute_mask_indices +from fairseq.data.dictionary import Dictionary +from fairseq.dataclass import ChoiceEnum +from fairseq.models import BaseFairseqModel, register_model +from fairseq.models.transformer import Embedding +from fairseq.file_io import PathManager +from torch import Tensor +from fairseq.models.wav2vec.wav2vec2 import ConvFeatureExtractionModel +from fairseq.modules import GradMultiply, LayerNorm +from fairseq.tasks.hubert_pretraining import ( + HubertPretrainingConfig, + HubertPretrainingTask, +) +from fairseq.models.hubert import HubertConfig +from fairseq.models.transformer import TransformerConfig +from speechut.modules import TransformerEncoder +from speechut.modules import TransformerEncoderBase +from speechut.modules import TransformerDecoderBaseScriptable + +logger = logging.getLogger(__name__) + +EXTRACTOR_MODE_CHOICES = ChoiceEnum(["default", "layer_norm"]) +MASKING_DISTRIBUTION_CHOICES = ChoiceEnum(["static", "uniform", "normal", "poisson"]) + + +@dataclass + +class SpeechutConfig(HubertConfig): + use_rel_pos_enc: bool = field( + default=False, + metadata={"help": "whether to use relative positional encoding"}, + ) + scaling_for_att: float = field( + default=1.0, + metadata={"help": "scaling for attention weights to prevent overflow issue (for large model)"}, + ) + + # unit encoder-decoder + text_transformer: TransformerConfig = TransformerConfig() + reset_decoder_embedding_config: bool = field( + default=False, + metadata={"help": "reset the no_scale_embedding/layernorm_embedding to default for the decoder"}, + ) + add_unit_encoder: bool = field( + default=False, + metadata={"help": "add unit encoder"}, + ) + add_decoder: bool = field( + default=True, + metadata={"help": "add decoder"}, + ) + add_text_ctc: bool = field( + default=False, + metadata={"help": "add_text_ctc head"}, + ) + text_ctc_conv_kernel: int = field( + default=2, + metadata={"help": "text_ctc_conv kernel size"}, + ) + mask_u2t: bool = field( + default=True, + metadata={"help": "mask the unit input in unit-to-text task"}, + ) + + # embedding mixing + mix_with_unit: bool = field( + default=True, + metadata={"help": "mix with the unit embeddings"}, + ) + use_pred_unit: bool = field( + default=False, + metadata={"help": "use the embeddings of predicted units"}, + ) + l2_embedding: bool = field( + default=False, + metadata={"help": "compute l2 loss between unit embedding and unit hidden state"}, + ) + + # Finetune related + encoder_dict_size: int = field( + default=-1, + metadata={"help": "text encoder dictionary dimension"}, + ) + + decoder_dict_size: int = field( + default=-1, + metadata={"help": "decoder dictionary dimension"}, + ) + + +@register_model("speechut", dataclass=SpeechutConfig) +class SpeechutModel(BaseFairseqModel): + def __init__( + self, + cfg: SpeechutConfig, + task_cfg: HubertPretrainingConfig, + dictionaries: List[Dictionary], + unit_dictionary: Dictionary = None, + text_tgt_dictionary: Dictionary = None, + ) -> None: + 
super().__init__() + logger.info(f"SpeechutModel Config: {cfg}") + + feature_enc_layers = eval(cfg.conv_feature_layers) # noqa + self.embed = feature_enc_layers[-1][0] + + self.feature_extractor = ConvFeatureExtractionModel( + conv_layers=feature_enc_layers, + dropout=0.0, + mode=cfg.extractor_mode, + conv_bias=cfg.conv_bias, + ) + feature_ds_rate = np.prod([s for _, _, s in feature_enc_layers]) + self.feat2tar_ratio = cfg.label_rate * feature_ds_rate / task_cfg.sample_rate + + self.post_extract_proj = ( + nn.Linear(self.embed, cfg.encoder_embed_dim) + if self.embed != cfg.encoder_embed_dim + else None + ) + + self.mask_prob = cfg.mask_prob + self.mask_selection = cfg.mask_selection + self.mask_other = cfg.mask_other + self.mask_length = cfg.mask_length + self.no_mask_overlap = cfg.no_mask_overlap + self.mask_min_space = cfg.mask_min_space + + self.mask_channel_prob = cfg.mask_channel_prob + self.mask_channel_selection = cfg.mask_channel_selection + self.mask_channel_other = cfg.mask_channel_other + self.mask_channel_length = cfg.mask_channel_length + self.no_mask_channel_overlap = cfg.no_mask_channel_overlap + self.mask_channel_min_space = cfg.mask_channel_min_space + + self.dropout_input = nn.Dropout(cfg.dropout_input) + self.dropout_features = nn.Dropout(cfg.dropout_features) + + self.feature_grad_mult = cfg.feature_grad_mult + self.logit_temp = cfg.logit_temp + self.skip_masked = cfg.skip_masked + self.skip_nomask = cfg.skip_nomask + + final_dim = cfg.final_dim if cfg.final_dim > 0 else cfg.encoder_embed_dim + + self.mask_emb = nn.Parameter( + torch.FloatTensor(cfg.encoder_embed_dim).uniform_() + ) + + self.encoder = TransformerEncoder(cfg) + self.layer_norm = LayerNorm(self.embed) + + self.target_glu = None + if cfg.target_glu: + self.target_glu = nn.Sequential( + nn.Linear(final_dim, final_dim * 2), nn.GLU() + ) + + self.final_dim = final_dim + assert len(dictionaries) <= 2, f"Only support <=2 kinds of targets, get {len(dictionaries)} dictionaries" + if len(dictionaries) == 1: + dictionaries = [dictionaries[0], dictionaries[0]] + self.num_classes = [len(d) for d in dictionaries] + + self.final_proj = nn.Linear(cfg.encoder_embed_dim, final_dim) + self.code_encoder_proj = nn.Linear(cfg.text_transformer.encoder.embed_dim, self.num_classes[-1]) + self.final_proj_list = [self.final_proj, self.code_encoder_proj] + + self.label_embs_concat = nn.Parameter(torch.FloatTensor(self.num_classes[0], final_dim)) + self.label_embs_list = [self.label_embs_concat] + for p in self.label_embs_list: + nn.init.uniform_(p) + + ### build unit encoder: + self.mask_u2t = cfg.mask_u2t + self.add_text_ctc = cfg.add_text_ctc + self.text_ctc_conv_kernel = cfg.text_ctc_conv_kernel + self.padding_idx = unit_dictionary.pad() + self.unit_mask_idx = unit_dictionary.index("<mask>") + + self.add_unit_encoder = cfg.add_unit_encoder + self.mix_with_unit = cfg.mix_with_unit + self.use_pred_unit = cfg.use_pred_unit + self.l2_embedding = cfg.l2_embedding + if self.add_unit_encoder: + assert len(unit_dictionary) == self.num_classes[0], f"unit_dictionary: {len(unit_dictionary)}, self.num_classes[0]: {self.num_classes[0]}" + ### build unit pre-net, and shared with hubert label_embs if needed (default: False) + self.unit_embed_tokens = self.build_embedding( + unit_dictionary, + cfg.text_transformer.encoder.embed_dim, + ) + if self.final_dim == cfg.text_transformer.encoder.embed_dim: + logger.info("Share label_embs[0] with unit_embed_tokens ...") + nn.init.uniform_(self.unit_embed_tokens.weight) + self.label_embs_list[0] = 
self.unit_embed_tokens.weight + + ### build unit encoder + self.unit_encoder = TransformerEncoderBase( + cfg.text_transformer, + unit_dictionary, + self.unit_embed_tokens, + use_rel_pos_enc=cfg.use_rel_pos_enc, + scaling_for_att=cfg.scaling_for_att, + ) + + ### build text ctc head + if self.add_text_ctc: + conv = nn.Conv1d( + cfg.text_transformer.encoder.embed_dim, cfg.text_transformer.encoder.embed_dim, + self.text_ctc_conv_kernel, + stride=self.text_ctc_conv_kernel // 2, + bias=False, + padding=self.text_ctc_conv_kernel // 2, + ) + nn.init.kaiming_normal_(conv.weight) + self.unit_encoder_ctc_head = nn.Sequential( + Rotate3D(), + conv, + nn.Dropout(p=0.1), + nn.Sequential( + Rotate3D(), + Rotate3D(), + LayerNorm(cfg.text_transformer.encoder.embed_dim), + ), + nn.GELU(), + nn.Linear(cfg.text_transformer.encoder.embed_dim, len(text_tgt_dictionary)), + ) + + ### build unit2text decoder, not available for now + self.add_decoder = cfg.add_decoder + self.text_transformer_cfg = cfg.text_transformer + if self.add_decoder: + # To make sure that the decoder dict size is the same as the fine-tuning tgt_dict size or bpe code dict size + dec_dictionary = self.cutting_dictionary(text_tgt_dictionary, cfg.decoder_dict_size) + decoder_embed_tokens = self.build_embedding( + dec_dictionary, cfg.text_transformer.decoder.embed_dim + ) + if cfg.reset_decoder_embedding_config: + cfg.text_transformer.no_scale_embedding = False + cfg.text_transformer.layernorm_embedding = False + cfg.text_transformer.no_token_positional_embeddings = False + self.decoder = TransformerDecoderBaseScriptable(cfg.text_transformer, dec_dictionary, decoder_embed_tokens, use_rel_pos_enc=cfg.use_rel_pos_enc) + + + def cutting_dictionary(self, dictionary, dict_size): + if dictionary is None or dict_size <= 0: + return dictionary + else: + import copy + cut_dictionary = copy.deepcopy(dictionary) + if dict_size > len(cut_dictionary): + for i in range(dict_size - len(cut_dictionary)): + cut_dictionary.symbols.append(f'_{i}_') + else: + cut_dictionary.symbols = cut_dictionary.symbols[:dict_size] + return cut_dictionary + + def build_embedding(self, dictionary, embed_dim): + num_embeddings = len(dictionary) + padding_idx = dictionary.pad() + return Embedding(num_embeddings, embed_dim, padding_idx) + + def upgrade_state_dict_named(self, state_dict, name): + """Upgrade a (possibly old) state dict for new versions of fairseq.""" + + super().upgrade_state_dict_named(state_dict, name) + return state_dict + + @classmethod + def build_model(cls, cfg: SpeechutConfig, task: HubertPretrainingTask): + """Build a new model instance.""" + unit_dictionary = getattr(task, "text_src_dictionary", None) + text_tgt_dictionary = getattr(task, "text_dictionary", None) + model = SpeechutModel(cfg, task.cfg, task.dictionaries, unit_dictionary, text_tgt_dictionary) + return model + + def apply_mask(self, x, padding_mask, target_list): + B, T, C = x.shape + if self.mask_prob > 0: + mask_indices = compute_mask_indices( + (B, T), + padding_mask, + self.mask_prob, + self.mask_length, + self.mask_selection, + self.mask_other, + min_masks=2, + no_overlap=self.no_mask_overlap, + min_space=self.mask_min_space, + ) + mask_indices = torch.from_numpy(mask_indices).to(x.device) + x[mask_indices] = self.mask_emb + else: + mask_indices = None + + if self.mask_channel_prob > 0: + mask_channel_indices = compute_mask_indices( + (B, C), + None, + self.mask_channel_prob, + self.mask_channel_length, + self.mask_channel_selection, + self.mask_channel_other, + 
no_overlap=self.no_mask_channel_overlap, + min_space=self.mask_channel_min_space, + ) + mask_channel_indices = ( + torch.from_numpy(mask_channel_indices) + .to(x.device) + .unsqueeze(1) + .expand(-1, T, -1) + ) + x[mask_channel_indices] = 0 + + return x, mask_indices + + def forward_features(self, source: torch.Tensor) -> torch.Tensor: + if self.feature_grad_mult > 0: + features = self.feature_extractor(source) + if self.feature_grad_mult != 1.0: + features = GradMultiply.apply(features, self.feature_grad_mult) + else: + with torch.no_grad(): + features = self.feature_extractor(source) + return features + + def forward_targets( + self, + features: torch.Tensor, + target_list: List[torch.Tensor], + ) -> Tuple[torch.Tensor, torch.Tensor]: + # Trim features to ensure labels exist and then get aligned labels + feat_tsz = features.size(2) + targ_tsz = min([t.size(1) for t in target_list]) + if self.feat2tar_ratio * feat_tsz > targ_tsz: + feat_tsz = int(targ_tsz / self.feat2tar_ratio) + features = features[..., :feat_tsz] + target_inds = torch.arange(feat_tsz).float() * self.feat2tar_ratio + target_inds += np.random.choice(int(self.feat2tar_ratio)) + target_list = [t[:, target_inds.long()] for t in target_list] + return features, target_list + + def forward_padding_mask( + self, + features: torch.Tensor, + padding_mask: torch.Tensor, + ) -> torch.Tensor: + extra = padding_mask.size(1) % features.size(1) + if extra > 0: + padding_mask = padding_mask[:, :-extra] + padding_mask = padding_mask.view(padding_mask.size(0), features.size(1), -1) + padding_mask = padding_mask.all(-1) + return padding_mask + + def get_normalized_probs( + self, + net_output: Tuple[Tensor, Optional[Dict[str, List[Optional[Tensor]]]]], + log_probs: bool, + sample: Optional[Dict[str, Tensor]] = None, + ): + lprobs = self.get_normalized_probs_scriptable(net_output, log_probs, sample) + lprobs.batch_first = True + return lprobs + + def downsample_ctc_padding_mask(self, padding_mask): + """ + padding_mask: (B, T) + """ + stride = self.text_ctc_conv_kernel // 2 + return padding_mask[:, ::stride] + + def compute_pred(self, proj_x, label_embs): + if self.target_glu: + label_embs = self.target_glu(label_embs) + x = F.normalize(proj_x.float(), dim=-1) # (S, D) + label_embs = F.normalize(label_embs.float(), dim=-1) # (C, D) + logits = torch.matmul(x, label_embs.T).type_as(proj_x) # (S, C) + logits /= self.logit_temp + return logits + + def compute_hubert_logits(self, x, target, proj, label_embs, padding_mask, mask_indices): + if not self.skip_masked: + masked_indices = torch.logical_and(~padding_mask, mask_indices) + proj_x_m = proj(x[masked_indices]) + logit_m_list = [(self.compute_pred(proj_x_m, label_embs), target[masked_indices])] + else: + logit_m_list = [None] + + if not self.skip_nomask: + nomask_indices = torch.logical_and(~padding_mask, ~mask_indices) + proj_x_u = proj(x[nomask_indices]) + logit_u_list = [(self.compute_pred(proj_x_u, label_embs), target[nomask_indices])] + else: + logit_u_list = [None] + + return logit_m_list, logit_u_list + + def compute_ce_logits(self, x, target, proj, padding_mask, mask_indices): + if not self.skip_masked: + masked_indices = torch.logical_and(~padding_mask, mask_indices) + logit_m_list = [(proj(x[masked_indices]), target[masked_indices])] + else: + logit_m_list = [None] + + if not self.skip_nomask: + nomask_indices = torch.logical_and(~padding_mask, ~mask_indices) + logit_u_list = [(proj(x[nomask_indices]), target[nomask_indices])] + else: + logit_u_list = [None] + + return logit_m_list, 
logit_u_list + + def convert_embeddings(self, + x, + padding_mask, + target=None, + mask_indices=None, + mix_with_unit=False, + use_pred_unit=False, + l2_embedding=False, + remask=False + ): + """ + 1. Mix with units if needed (default: True) + 2. Prepare for unit_encoder inputs + Inputs: + x, (B, T, D) + Return: + src_tokens, (B, T) + soft_embeddings, (B, T, D) + l2_loss, a loss + """ + soft_embeddings = self.final_proj_list[0](x) if x.size(-1) == self.final_dim else x + if padding_mask is None: + padding_mask = soft_embeddings.new_zeros(soft_embeddings.size(0), soft_embeddings.size(1), dtype=torch.long) + if use_pred_unit: + src_tokens = self.compute_pred(self.final_proj_list[0](x), self.label_embs_list[0]).argmax(dim=-1) + src_tokens[padding_mask] = self.padding_idx + elif target is not None: + src_tokens = target + else: + src_tokens = padding_mask.long() + + if l2_embedding | mix_with_unit: + unit_embeddings = self.unit_embed_tokens(src_tokens) # (B, T, D) + + l2_loss = 0 + if l2_embedding: + if mask_indices is not None: + l2_loss = (soft_embeddings - unit_embeddings)[mask_indices].float().pow(2).mean(dim=-1) + scale = unit_embeddings[mask_indices].float().pow(2).sum(dim=-1) + else: + l2_loss = (soft_embeddings - unit_embeddings).float().pow(2).mean(dim=-1) + scale = unit_embeddings.float().pow(2).sum(dim=-1) + l2_loss = (l2_loss / scale).mean() + + if mix_with_unit: + B, T, D = x.shape + selected_indices = compute_mask_indices( + (B, T), + padding_mask, + self.mask_prob / 2, + self.mask_length // 2, + self.mask_selection, + self.mask_other, + min_masks=2, + no_overlap=self.no_mask_overlap, + min_space=self.mask_min_space, + ) + selected_indices = torch.from_numpy(selected_indices).to(x.device) + if mask_indices is not None: + if remask: + remask_indices = torch.logical_and(selected_indices, mask_indices) + soft_embeddings[remask_indices] = self.mask_emb + swap_indices = torch.logical_and(selected_indices, ~mask_indices) + else: + swap_indices = selected_indices + soft_embeddings[swap_indices] = unit_embeddings[swap_indices] + + soft_embeddings = soft_embeddings * (1 - padding_mask.unsqueeze(-1).type_as(x)) + return src_tokens, soft_embeddings, l2_loss + + def forward( + self, + source: torch.Tensor = None, + src_tokens: torch.Tensor = None, + src_lengths: torch.Tensor = None, + prev_output_tokens: torch.Tensor = None, + target_list: Optional[List[torch.Tensor]] = None, + padding_mask: Optional[torch.Tensor] = None, + mask: bool = True, + features_only: bool = False, + output_layer: Optional[int] = None, + ) -> Dict[str, torch.Tensor]: + assert source is not None or src_tokens is not None + if source is not None: + return self.forward_speech( + source=source, + target_list=target_list, + padding_mask=padding_mask, + mask=mask, + features_only=features_only, + output_layer=output_layer, + ) + else: + return self.forward_text( + src_tokens=src_tokens, + src_lengths=src_lengths, + prev_output_tokens=prev_output_tokens, + mask=self.mask_u2t, + features_only=features_only, + output_layer=output_layer, + ) + + def forward_speech( + self, + source: torch.Tensor = None, + target_list: Optional[List[torch.Tensor]] = None, + padding_mask: Optional[torch.Tensor] = None, + mask: bool = True, + features_only: bool = False, + output_layer: Optional[int] = None, + ) -> Dict[str, torch.Tensor]: + """output layer is 1-based""" + features = self.forward_features(source) + if target_list is not None: + features, target_list = self.forward_targets(features, target_list) + + features_pen = 
features.float().pow(2).mean() + + features = features.transpose(1, 2) + features = self.layer_norm(features) + unmasked_features = features.clone() + + if padding_mask is not None: + padding_mask = self.forward_padding_mask(features, padding_mask) + + if self.post_extract_proj is not None: + features = self.post_extract_proj(features) + + features = self.dropout_input(features) + unmasked_features = self.dropout_features(unmasked_features) + + if mask: + x, mask_indices = self.apply_mask(features, padding_mask, target_list) + else: + x = features + mask_indices = None + + # feature: (B, T, D), float + # target: (B, T), long + # x: (B, T, D), float + # padding_mask: (B, T), bool + # mask_indices: (B, T), bool + x, _ = self.encoder( + x, + padding_mask=padding_mask, + layer=None if output_layer is None else output_layer - 1, + ) + + if features_only: + return {"x": x, "padding_mask": padding_mask, "features": features} + + logit_m_list, logit_u_list = self.compute_hubert_logits( + x, + target_list[0], + self.final_proj_list[0], + self.label_embs_list[0], + padding_mask, + mask_indices, + ) + + result = { + "logit_m_list": logit_m_list, + "logit_u_list": logit_u_list, + "padding_mask": padding_mask, + "features_pen": features_pen, + } + + if self.add_unit_encoder: + src_tokens, x_emb, l2_loss = self.convert_embeddings( + x, + padding_mask, target_list[0], + mask_indices=mask_indices, + mix_with_unit=self.mix_with_unit, + use_pred_unit=self.use_pred_unit, + l2_embedding=self.l2_embedding, + ) + encoder_out = self.unit_encoder(src_tokens, token_embeddings=x_emb) + + result['encoder_out'] = encoder_out['encoder_out'] # [(T, B, D)] + result['encoder_padding_mask'] = encoder_out['encoder_padding_mask'] # [(B, T)] + if self.l2_embedding: + result['embedding_l2_loss'] = l2_loss + + code_logit_m_list, code_logit_u_list = self.compute_ce_logits( + encoder_out['encoder_out'][0].transpose(0, 1), # -> (B, T, C) + target_list[-1], + self.final_proj_list[1], + padding_mask, + mask_indices, + ) + result['logit_m_list'] += code_logit_m_list + result['logit_u_list'] += code_logit_u_list + return result + + def forward_text( + self, + src_tokens: torch.Tensor = None, + src_lengths: torch.Tensor = None, + prev_output_tokens: torch.Tensor = None, + target_list: Optional[List[torch.Tensor]] = None, + mask: bool = True, + features_only: bool = False, + output_layer: Optional[int] = None, + ) -> Dict[str, torch.Tensor]: + assert self.add_unit_encoder, f"Can not forward unit-text branch without unit_encoder!" 
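+        # src_tokens are discrete unit (or text) ids from unit_dictionary: they are embedded
+        # with unit_embed_tokens, optionally masked for the unit-to-text task (mask_u2t),
+        # and then passed through the shared unit encoder.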
+ + padding_mask = src_tokens == self.padding_idx + unit_embeddings = self.unit_embed_tokens(src_tokens) + if mask: + unit_embeddings, mask_indices = self.apply_mask(unit_embeddings, padding_mask, [src_tokens]) + + encoder_out = self.unit_encoder( + src_tokens, + token_embeddings=unit_embeddings, + return_all_hiddens=output_layer is not None, + ) + + result = {} + result["encoder_out"] = encoder_out["encoder_out"] + result["encoder_states"] = encoder_out["encoder_states"] + result["padding_mask"] = padding_mask + + if self.add_text_ctc: + result["encoder_out_ctc"] = [self.unit_encoder_ctc_head(x) for x in encoder_out['encoder_out']] + result["encoder_padding_mask"] = [ + self.downsample_ctc_padding_mask(padding_mask) for padding_mask in encoder_out['encoder_padding_mask'] + ] + + if features_only: + return result + if self.add_decoder: + assert prev_output_tokens is not None + decoder_out = self.decoder( + prev_output_tokens=prev_output_tokens, encoder_out=encoder_out, + ) + result['decoder_out'] = decoder_out + return result + + def forward_mum(self, src_tokens, target, mask=True): + target_list = [target] + padding_mask = src_tokens.eq(self.unit_encoder.padding_idx) + unit_embeddings = self.unit_embed_tokens(src_tokens) + if mask: + unit_embeddings, mask_indices = self.apply_mask(unit_embeddings, padding_mask, target_list) + else: + ### If already applied mask on src_tokens, then the target_list should contains many padding_idx + mask_indices = target_list[-1] != self.padding_idx + unit_embeddings[mask_indices] = self.mask_emb + + encoder_out = self.unit_encoder( + src_tokens, + token_embeddings=unit_embeddings, + ) + code_logit_m_list, code_logit_u_list = self.compute_ce_logits( + encoder_out["encoder_out"][0].transpose(0, 1), + target_list[-1], + self.final_proj_list[1], + padding_mask, + mask_indices, + ) + result = {} + result["logit_m_list"] = code_logit_m_list + result["logit_u_list"] = code_logit_u_list + result["padding_mask"] = padding_mask + return result + + def extract_features( + self, + source: torch.Tensor, + padding_mask: Optional[torch.Tensor] = None, + mask: bool = False, + ret_conv: bool = False, + output_layer: Optional[int] = None, + **kwargs, + ) -> Tuple[torch.Tensor, torch.Tensor]: + """Extract encoder features for only speech input""" + res = self.forward( + source, + padding_mask=padding_mask, + mask=mask, + features_only=True, + output_layer=output_layer, + ) + x = res["x"] # B x T x D + padding_mask = res["padding_mask"] + + if self.add_unit_encoder: + src_tokens, x, _ = self.convert_embeddings( + x, + padding_mask, + mix_with_unit=False, + use_pred_unit=False, + ) + encoder_out = self.unit_encoder( + src_tokens, + token_embeddings=x, + return_all_hiddens=output_layer is not None + ) + res["x"] = encoder_out['encoder_out'][0].transpose(0, 1) # (B, T, D) + + feature = res["features"] if ret_conv else res["x"] + if output_layer is not None: + feature = encoder_out['encoder_states'] + + return feature, padding_mask + + def get_logits(self, net_output, is_masked=True): + if is_masked: + logits_list = net_output["logit_m_list"] + else: + logits_list = net_output["logit_u_list"] + logits_list = [x[0].float() for x in logits_list if x is not None] + return logits_list + + def get_targets(self, net_output, is_masked=True): + if is_masked: + logits_list = net_output["logit_m_list"] + else: + logits_list = net_output["logit_u_list"] + targets_list = [x[1].long() for x in logits_list if x is not None] + return targets_list + + def get_extra_losses(self, net_output): + 
extra_losses = [] + names = [] + + if "features_pen" in net_output: + extra_losses.append(net_output["features_pen"]) + names.append("features_pen") + + if "embedding_l2_loss" in net_output: + extra_losses.append(net_output["embedding_l2_loss"]) + names.append("embedding_l2_loss") + + return extra_losses, names + + def remove_pretraining_modules(self, step2=False): + self.target_glu = None + + def load_checkpoint(self, checkpoint: str): + if not PathManager.exists(checkpoint): + raise IOError("Model file not found: {}".format(checkpoint)) + state = checkpoint_utils.load_checkpoint_to_cpu(checkpoint) + return state + +class Rotate3D(nn.Module): + """ + (T, B, D) --> (B, D, T) --> (D, T, B) --> (T, B, D) + """ + def __init__(self): + super().__init__() + + def forward(self, x): + return x.permute(1, 2, 0) diff --git a/SpeechUT/speechut/models/speechut_asr.py b/SpeechUT/speechut/models/speechut_asr.py new file mode 100644 index 0000000000000000000000000000000000000000..f9ec9d8488b4f7e552804d355de000c80fb35b78 --- /dev/null +++ b/SpeechUT/speechut/models/speechut_asr.py @@ -0,0 +1,165 @@ +# ---------------------------------------------------------------------------- +# SpeechUT: Bridging Speech and Text with Hidden-Unit for Encoder-Decoder Based Speech-Text Pre-training (https://arxiv.org/abs/2210.03730) +# Github source: https://github.com/microsoft/SpeechT5/tree/main/SpeechUT +# Code based on fairseq: https://github.com/facebookresearch/fairseq/tree/272c4c5197250997148fb12c0db6306035f166a4 +# +# Copyright (c) 2022 Microsoft +# Licensed under The MIT License [see LICENSE for details] +# ---------------------------------------------------------------------------- + +import contextlib +import torch +from dataclasses import dataclass, field +from fairseq import utils +from fairseq.models import BaseFairseqModel, register_model +from fairseq.models.fairseq_encoder import FairseqEncoder +from fairseq.models.hubert import HubertAsrConfig, HubertEncoder +from fairseq.tasks import FairseqTask + +@dataclass +class SpeechUTASRConfig(HubertAsrConfig): + add_decoder: bool = field( + default=True, + metadata={"help": "add decoder for fine-tune"}, + ) + +@register_model("speechut_asr", dataclass=SpeechUTASRConfig) +class SpeechUTASR(BaseFairseqModel): + """ + A encoder-ctc-decoder model if cfg.add_decoder is True, or a encoder-ctc model + """ + def __init__(self, cfg: SpeechUTASRConfig, encoder: FairseqEncoder): + super().__init__() + self.cfg = cfg + self.encoder = encoder + if not cfg.add_decoder: + self.encoder.w2v_model.decoder = None + + def upgrade_state_dict_named(self, state_dict, name): + super().upgrade_state_dict_named(state_dict, name) + return state_dict + + @classmethod + def build_model(cls, cfg: SpeechUTASRConfig, task: FairseqTask): + """Build a new model instance.""" + encoder = SpeechUTEncoder(cfg, task) + return cls(cfg, encoder) + + def forward(self, source, padding_mask, prev_output_tokens, **kwargs): + encoder_out = self.encoder(source, padding_mask, **kwargs) + + x = self.encoder.final_dropout(encoder_out['encoder_out'][0]) # (T, B, C) + if self.encoder.proj: + x = self.encoder.proj(x) + if self.encoder.conv_ctc_proj: + padding_mask = self.encoder.w2v_model.downsample_ctc_padding_mask(encoder_out["encoder_padding_mask"][0]) + else: + padding_mask = encoder_out["encoder_padding_mask"] + + decoder_out = self.decoder( + prev_output_tokens, encoder_out=encoder_out, **kwargs + ) if self.cfg.add_decoder else None + + return { + "encoder_out_ctc": x, # (T, B, C), for CTC loss + 
"padding_mask": padding_mask, # (B, T), for CTC loss + "decoder_out": decoder_out, # for ED loss + } + + def forward_decoder(self, prev_output_tokens, **kwargs): + return self.decoder(prev_output_tokens, **kwargs) + + def get_logits(self, net_output): + """For CTC decoding""" + logits = net_output["encoder_out"] + padding = net_output["encoder_padding_mask"] + if padding is not None and padding.any(): + padding = padding.T + logits[padding][..., 0] = 0 + logits[padding][..., 1:] = float("-inf") + + return logits + + def get_normalized_probs(self, net_output, log_probs, sample=None): + """For 1) computing CTC loss, 2) decoder decoding.""" + + if "encoder_out_ctc" in net_output: + logits = net_output["encoder_out_ctc"] + else: + return self.decoder.get_normalized_probs(net_output, log_probs, sample) + + if isinstance(logits, list): + logits = logits[0] + + if log_probs: + return utils.log_softmax(logits.float(), dim=-1) + else: + return utils.softmax(logits.float(), dim=-1) + + @property + def decoder(self): + return self.encoder.w2v_model.decoder + + +class SpeechUTEncoder(HubertEncoder): + """ + Modified from fairseq.models.hubert.hubert_asr.HubertEncoder + 1. make it compatible with encoder-decoder model + """ + def __init__(self, cfg: HubertAsrConfig, task): + super().__init__(cfg, task) + + if (task.target_dictionary is not None) and ( + hasattr(self.w2v_model, "unit_encoder_ctc_head") + ): + self.proj = self.w2v_model.unit_encoder_ctc_head + self.conv_ctc_proj = True + else: + self.conv_ctc_proj = False + + def forward(self, source, padding_mask, tbc=True, **kwargs): + w2v_args = { + "source": source, + "padding_mask": padding_mask, + "mask": self.apply_mask and self.training, + } + ft = self.freeze_finetune_updates <= self.num_updates + with torch.no_grad() if not ft else contextlib.ExitStack(): + x, padding_mask = self.w2v_model.extract_features(**w2v_args) + if tbc: + # B x T x C -> T x B x C + x = x.transpose(0, 1) + return { + "encoder_out": [x], # T x B x C + "encoder_padding_mask": [padding_mask], # B x T + } + + def forward_torchscript(self, net_input): + """A TorchScript-compatible version of forward. + + Forward the encoder out. 
+ """ + x, padding_mask = self.w2v_model.extract_features(**net_input, mask=False) + # B x T x C -> T x B x C + x = x.transpose(0, 1) + + encoder_out = { + "encoder_out" : [x], + "encoder_padding_mask" : [padding_mask], + } + if self.proj: + x = self.proj(x) + encoder_out["encoder_out_ctc"] = x + + return encoder_out + + def reorder_encoder_out(self, encoder_out, new_order): + if encoder_out["encoder_out"] is not None: + encoder_out["encoder_out"] = [ + x.index_select(1, new_order) for x in encoder_out["encoder_out"] + ] + if encoder_out["encoder_padding_mask"] is not None: + encoder_out["encoder_padding_mask"] = [ + x.index_select(0, new_order) for x in encoder_out["encoder_padding_mask"] + ] + return encoder_out diff --git a/SpeechUT/speechut/models/speechut_st.py b/SpeechUT/speechut/models/speechut_st.py new file mode 100644 index 0000000000000000000000000000000000000000..6faaccfc89748a2692bd1eaec200588449d10423 --- /dev/null +++ b/SpeechUT/speechut/models/speechut_st.py @@ -0,0 +1,221 @@ +# ---------------------------------------------------------------------------- +# SpeechUT: Bridging Speech and Text with Hidden-Unit for Encoder-Decoder Based Speech-Text Pre-training (https://arxiv.org/abs/2210.03730) +# Github source: https://github.com/microsoft/SpeechT5/tree/main/SpeechUT +# Code based on fairseq: https://github.com/facebookresearch/fairseq/tree/272c4c5197250997148fb12c0db6306035f166a4 +# +# Copyright (c) 2022 Microsoft +# Licensed under The MIT License [see LICENSE for details] +# ---------------------------------------------------------------------------- + +import logging +import contextlib +import torch +import torch.nn as nn +from argparse import Namespace +from dataclasses import dataclass +from typing import Any +from fairseq import checkpoint_utils, tasks +from fairseq.models import BaseFairseqModel, register_model +from fairseq.models.fairseq_encoder import FairseqEncoder +from fairseq.tasks import FairseqTask +from fairseq.dataclass.utils import convert_namespace_to_omegaconf +from fairseq.data.data_utils import lengths_to_padding_mask + +from fairseq.models.hubert import HubertAsrConfig + +logger = logging.getLogger(__name__) + +@dataclass +class SpeechUTS2TConfig(HubertAsrConfig): + ### the following config is only for the compatibility to fairseq speech_to_text task + input_feat_per_channel: Any = None + input_channels: Any = None + speaker_to_id: Any = None + +@register_model("speechut_st_legacy", dataclass=SpeechUTS2TConfig) +class SpeechUTS2T(BaseFairseqModel): + """An encoder-decoder model.""" + def __init__(self, cfg: SpeechUTS2TConfig, encoder: FairseqEncoder): + super().__init__() + self.cfg = cfg + self.encoder = encoder + + def upgrade_state_dict_named(self, state_dict, name): + super().upgrade_state_dict_named(state_dict, name) + return state_dict + + @classmethod + def build_model(cls, cfg: SpeechUTS2TConfig, task: FairseqTask): + """Build a new model instance.""" + encoder = SpeechUTEncoder(cfg, task) + return cls(cfg, encoder) + + def forward(self, src_tokens, src_lengths, prev_output_tokens, **kwargs): + encoder_out = self.encoder(src_tokens, src_lengths, **kwargs) + decoder_out = self.encoder.w2v_model.decoder( + prev_output_tokens, encoder_out=encoder_out, **kwargs + ) + return decoder_out + + def forward_decoder(self, prev_output_tokens, **kwargs): + return self.encoder.w2v_model.decoder(prev_output_tokens, **kwargs) + + def get_normalized_probs(self, net_output, log_probs, sample=None): + """For decoder decoding.""" + return 
self.encoder.w2v_model.decoder.get_normalized_probs(net_output, log_probs, sample) + + @property + def decoder(self): + return self.encoder.w2v_model.decoder + + +class SpeechUTEncoder(FairseqEncoder): + """ + Modified from fairseq.models.hubert.hubert_asr.HubertEncoder + 1. make it compatible with fairseq speech_to_text task + 2. make it compatible with encoder-decoder model + """ + def __init__(self, cfg: SpeechUTS2TConfig, task): + self.apply_mask = cfg.apply_mask + + arg_overrides = { + "dropout": cfg.dropout, + "activation_dropout": cfg.activation_dropout, + "dropout_input": cfg.dropout_input, + "attention_dropout": cfg.attention_dropout, + "mask_length": cfg.mask_length, + "mask_prob": cfg.mask_prob, + "mask_selection": cfg.mask_selection, + "mask_other": cfg.mask_other, + "no_mask_overlap": cfg.no_mask_overlap, + "mask_channel_length": cfg.mask_channel_length, + "mask_channel_prob": cfg.mask_channel_prob, + "mask_channel_selection": cfg.mask_channel_selection, + "mask_channel_other": cfg.mask_channel_other, + "no_mask_channel_overlap": cfg.no_mask_channel_overlap, + "encoder_layerdrop": cfg.layerdrop, + "feature_grad_mult": cfg.feature_grad_mult, + } + + if cfg.w2v_args is None: + state = checkpoint_utils.load_checkpoint_to_cpu(cfg.w2v_path, arg_overrides) + w2v_args = state.get("cfg", None) + if w2v_args is None: + w2v_args = convert_namespace_to_omegaconf(state["args"]) + cfg.w2v_args = w2v_args + else: + state = None + w2v_args = cfg.w2v_args + if isinstance(w2v_args, Namespace): + cfg.w2v_args = w2v_args = convert_namespace_to_omegaconf(w2v_args) + + assert task.data_cfg.standardize_audio() == w2v_args.task.normalize, ( + "Fine-tuning works best when data normalization is the same. " + "Please check that --normalize is set or unset for " + "both pre-training and here" + ) + + pretrain_task = tasks.setup_task(w2v_args.task, load_local_states=False) + assert state is not None and "task_state" in state, f"the stored dictionaries not found in checkpoint!" + # This will load the stored "dictionaries" object + pretrain_task.load_state_dict(state["task_state"]) + + model = pretrain_task.build_model(w2v_args.model, from_checkpoint=True) + if state is not None and not cfg.no_pretrained_weights: + try: + model.load_state_dict(state["model"], strict=True) + except Exception as e: + logger.warn(e) + model.load_state_dict(state["model"], strict=False) + + model.remove_pretraining_modules() + + super().__init__(pretrain_task.source_dictionary) + + d = w2v_args.model.encoder_embed_dim + + self.w2v_model = model + + self.final_dropout = nn.Dropout(cfg.final_dropout) + self.freeze_finetune_updates = cfg.freeze_finetune_updates + self.num_updates = 0 + + def set_num_updates(self, num_updates): + """Set the number of parameters updates.""" + super().set_num_updates(num_updates) + self.num_updates = num_updates + + def forward(self, src_tokens=None, src_lengths=None, **kwargs): + + w2v_args = { + "source": src_tokens, + "padding_mask": lengths_to_padding_mask(src_lengths), + "mask": self.apply_mask and self.training, + } + + ft = self.freeze_finetune_updates <= self.num_updates + + with torch.no_grad() if not ft else contextlib.ExitStack(): + x, padding_mask = self.w2v_model.extract_features(**w2v_args) + # B x T x C -> T x B x C + x = x.transpose(0, 1) + + return { + "encoder_out": [x], # T x B x C + "encoder_padding_mask": [padding_mask], # B x T + "padding_mask": [padding_mask], + } + + def forward_torchscript(self, net_input): + """A TorchScript-compatible version of forward. 
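+        It repacks the speech_to_text ``net_input`` (``src_tokens``/``src_lengths``)
+        into the ``source``/``padding_mask`` arguments expected by the pre-trained
+        model, with masking disabled.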
+ + Forward the encoder out. + """ + _net_input = { + "source": net_input["src_tokens"], + "padding_mask": lengths_to_padding_mask(net_input["src_lengths"]), + "mask": False, + } + + x, padding_mask = self.w2v_model.extract_features(**_net_input) + # B x T x C -> T x B x C + x = x.transpose(0, 1) + + encoder_out = { + "encoder_out" : [x], + "encoder_padding_mask" : [padding_mask], + } + return encoder_out + + def reorder_encoder_out(self, encoder_out, new_order): + if encoder_out["encoder_out"] is not None: + encoder_out["encoder_out"] = [ + x.index_select(1, new_order) for x in encoder_out["encoder_out"] + ] + if encoder_out["encoder_padding_mask"] is not None: + encoder_out["encoder_padding_mask"] = [ + x.index_select(0, new_order) for x in encoder_out["encoder_padding_mask"] + ] + return encoder_out + + def max_positions(self): + """Maximum input length supported by the encoder.""" + return None + + def upgrade_state_dict_named(self, state_dict, name): + return state_dict + + +def Embedding(num_embeddings, embedding_dim, padding_idx): + m = nn.Embedding(num_embeddings, embedding_dim, padding_idx=padding_idx) + nn.init.normal_(m.weight, mean=0, std=embedding_dim**-0.5) + nn.init.constant_(m.weight[padding_idx], 0) + return m + + +def Linear(in_features, out_features, bias=True): + m = nn.Linear(in_features, out_features, bias) + nn.init.xavier_uniform_(m.weight) + if bias: + nn.init.constant_(m.bias, 0.0) + return m diff --git a/SpeechUT/speechut/models/t5_transformer_lm.py b/SpeechUT/speechut/models/t5_transformer_lm.py new file mode 100644 index 0000000000000000000000000000000000000000..3d16a2df00b692114f8d84d254cf486d09e1137b --- /dev/null +++ b/SpeechUT/speechut/models/t5_transformer_lm.py @@ -0,0 +1,25 @@ +# -------------------------------------------------------- +# Pre-Training Transformer Decoder for End-to-End ASR Model with Unpaired Speech Data (https://arxiv.org/abs/2203.17113) +# Github source: https://github.com/microsoft/SpeechT5/tree/main/Speech2C +# Copyright (c) 2022 Microsoft +# Licensed under The MIT License [see LICENSE for details] +# Based on fairseq code bases +# https://github.com/pytorch/fairseq +# -------------------------------------------------------- + +from fairseq.models import ( + register_model_architecture, +) +from fairseq.models.transformer_lm import base_lm_architecture + + +@register_model_architecture(model_name="transformer_lm", arch_name="transformer_lm_t5") +def transformer_lm_t5(args): + args.decoder_embed_dim = getattr(args, "decoder_embed_dim", 1280) + args.decoder_ffn_embed_dim = getattr(args, "decoder_ffn_embed_dim", 6144) + args.decoder_layers = getattr(args, "decoder_layers", 20) + args.decoder_attention_heads = getattr(args, "decoder_attention_heads", 16) + args.dropout = getattr(args, "dropout", 0.1) + args.attention_dropout = getattr(args, "attention_dropout", 0.1) + args.activation_fn = getattr(args, "activation_fn", "gelu") + base_lm_architecture(args) diff --git a/SpeechUT/speechut/modules/__init__.py b/SpeechUT/speechut/modules/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..dad97814e515d8e68d68e4e031d4f9c9055f3864 --- /dev/null +++ b/SpeechUT/speechut/modules/__init__.py @@ -0,0 +1,27 @@ +# -------------------------------------------------------- +# Copyright (c) 2022 Microsoft +# Licensed under The MIT License [see LICENSE for details] +# Based on fairseq code bases +# https://github.com/facebookresearch/fairseq +# -------------------------------------------------------- + +from 
.learned_positional_embedding import LearnedPositionalEmbedding +from .multihead_attention import MultiheadAttention +from .relative_pos_enc import RelativePositionalEncoding +from .transformer_layer import TransformerEncoderLayerBase, TransformerDecoderLayerBase +from .w2v_encoder import TransformerEncoder, TransformerSentenceEncoderLayer +from .transformer_encoder import TransformerEncoderBase +from .transformer_decoder import TransformerDecoderScriptable, TransformerDecoderBaseScriptable + +__all__ = [ + "MultiheadAttention", + "RelativePositionalEncoding", + "LearnedPositionalEmbedding", + "TransformerEncoderLayerBase", + "TransformerDecoderLayerBase", + "TransformerEncoder", + "TransformerSentenceEncoderLayer", + "TransformerEncoderBase", + "TransformerDecoderScriptable", + "TransformerDecoderBaseScriptable", +] diff --git a/SpeechUT/speechut/modules/ctc_prefix_score.py b/SpeechUT/speechut/modules/ctc_prefix_score.py new file mode 100644 index 0000000000000000000000000000000000000000..b42cbd819abf7bdd718bef3db3f553c8360ac384 --- /dev/null +++ b/SpeechUT/speechut/modules/ctc_prefix_score.py @@ -0,0 +1,93 @@ +#!/usr/bin/env python3 + +# Copyright 2018 Mitsubishi Electric Research Labs (Takaaki Hori) +# Apache 2.0 (http://www.apache.org/licenses/LICENSE-2.0) + +import numpy as np +import six + + +class CTCPrefixScore(object): + """Compute CTC label sequence scores + which is based on Algorithm 2 in WATANABE et al. + "HYBRID CTC/ATTENTION ARCHITECTURE FOR END-TO-END SPEECH RECOGNITION," + but extended to efficiently compute the probablities of multiple labels + simultaneously + """ + + def __init__(self, x, blank, eos, xp): + self.xp = xp + self.logzero = -10000000000.0 + self.blank = blank + self.eos = eos + self.input_length = len(x) + self.x = x + + def initial_state(self): + """Obtain an initial CTC state + :return: CTC state + """ + # initial CTC state is made of a frame x 2 tensor that corresponds to + # r_t^n(<sos>) and r_t^b(<sos>), where 0 and 1 of axis=1 represent + # superscripts n and b (non-blank and blank), respectively. + r = self.xp.full((self.input_length, 2), self.logzero, dtype=np.float32) + r[0, 1] = self.x[0, self.blank] + for i in six.moves.range(1, self.input_length): + r[i, 1] = r[i - 1, 1] + self.x[i, self.blank] + return r + + def __call__(self, y, cs, r_prev): + """Compute CTC prefix scores for next labels + :param y : prefix label sequence + :param cs : array of next labels + :param r_prev: previous CTC state + :return ctc_scores, ctc_states + """ + # initialize CTC states + output_length = len(y) - 1 # ignore sos + # new CTC states are prepared as a frame x (n or b) x n_labels tensor + # that corresponds to r_t^n(h) and r_t^b(h). 
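+        # NOTE (added): r has shape (input_length, 2, len(cs)); r[:, 0] holds
+        # log r_t^n(h) (paths ending in a non-blank emission of the last label)
+        # and r[:, 1] holds log r_t^b(h) (paths ending in blank), matching the
+        # convention described in initial_state(). log_psi below accumulates the
+        # log prefix probability of each candidate extension.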
+ r = self.xp.ndarray((self.input_length, 2, len(cs)), dtype=np.float32) + xs = self.x[:, cs] + if output_length == 0: + r[0, 0] = xs[0] + r[0, 1] = self.logzero + else: + r[output_length - 1] = self.logzero + + # prepare forward probabilities for the last label + r_sum = self.xp.logaddexp( + r_prev[:, 0], r_prev[:, 1] + ) # log(r_t^n(g) + r_t^b(g)) + last = y[-1] + if output_length > 0 and last in cs: + log_phi = self.xp.ndarray((self.input_length, len(cs)), dtype=np.float32) + for i in six.moves.range(len(cs)): + log_phi[:, i] = r_sum if cs[i] != last else r_prev[:, 1] + else: + log_phi = r_sum + + # compute forward probabilities log(r_t^n(h)), log(r_t^b(h)), + # and log prefix probabilities log(psi) + start = max(output_length, 1) + log_psi = r[start - 1, 0] + for t in six.moves.range(start, self.input_length): + r[t, 0] = self.xp.logaddexp(r[t - 1, 0], log_phi[t - 1]) + xs[t] + r[t, 1] = ( + self.xp.logaddexp(r[t - 1, 0], r[t - 1, 1]) + self.x[t, self.blank] + ) + log_psi = self.xp.logaddexp(log_psi, log_phi[t - 1] + xs[t]) + + # get P(...eos|X) that ends with the prefix itself + eos_pos = self.xp.where(cs == self.eos)[0] + if len(eos_pos) > 0: + log_psi[eos_pos] = r_sum[-1] # log(r_T^n(g) + r_T^b(g)) + + # exclude blank probs + blank_pos = self.xp.where(cs == self.blank)[0] + if len(blank_pos) > 0: + log_psi[blank_pos] = self.logzero + + # return the log prefix probability and CTC states, where the label axis + # of the CTC states is moved to the first axis to slice it easily + return log_psi, self.xp.rollaxis(r, 2) diff --git a/SpeechUT/speechut/modules/learned_positional_embedding.py b/SpeechUT/speechut/modules/learned_positional_embedding.py new file mode 100644 index 0000000000000000000000000000000000000000..20c8558e20b2172a8c607e2f5c32aa146ff2b9cf --- /dev/null +++ b/SpeechUT/speechut/modules/learned_positional_embedding.py @@ -0,0 +1,69 @@ +# -------------------------------------------------------- +# Copyright (c) 2022 Microsoft +# Licensed under The MIT License [see LICENSE for details] +# Based on fairseq code bases +# https://github.com/facebookresearch/fairseq +# -------------------------------------------------------- + +""" + Modified from https://github.com/facebookresearch/fairseq/blob/main/fairseq/modules/learned_positional_embedding.py + 1. Add clamping if the input length exceeds the max-source-tokens +""" + +from typing import Dict, Optional + +import torch +import torch.nn as nn +import torch.nn.functional as F +from fairseq import utils +from torch import Tensor + + +class LearnedPositionalEmbedding(nn.Embedding): + """ + This module learns positional embeddings up to a fixed maximum size. + Padding ids are ignored by either offsetting based on padding_idx + or by setting padding_idx to None and ensuring that the appropriate + position ids are passed to the forward function. + """ + + def __init__(self, num_embeddings: int, embedding_dim: int, padding_idx: int): + super().__init__(num_embeddings, embedding_dim, padding_idx) + self.onnx_trace = False + if self.padding_idx is not None: + self.max_positions = self.num_embeddings - self.padding_idx - 1 + else: + self.max_positions = self.num_embeddings + + def forward( + self, + input: Tensor, + incremental_state: Optional[Dict[str, Dict[str, Optional[Tensor]]]] = None, + positions: Optional[Tensor] = None, + ): + """Input is expected to be of size [bsz x seqlen].""" + assert (positions is None) or ( + self.padding_idx is None + ), "If positions is pre-computed then padding_idx should not be set." 
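+        # NOTE (added): unlike the stock fairseq module, the positions computed
+        # below are clamped to padding_idx + max_positions, so inputs longer than
+        # the learned table reuse the last position embedding instead of indexing
+        # out of range (see the module-level docstring).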
+ + if positions is None: + if incremental_state is not None: + # positions is the same for every token when decoding a single step + # Without the int() cast, it doesn't work in some cases when exporting to ONNX + positions = torch.zeros( + (1, 1), device=input.device, dtype=input.dtype + ).fill_(int(self.padding_idx + input.size(1))) + else: + positions = utils.make_positions( + input, self.padding_idx, onnx_trace=self.onnx_trace + ) + positions = torch.clamp(positions, max=self.padding_idx + self.max_positions) + return F.embedding( + positions, + self.weight, + self.padding_idx, + self.max_norm, + self.norm_type, + self.scale_grad_by_freq, + self.sparse, + ) diff --git a/SpeechUT/speechut/modules/multihead_attention.py b/SpeechUT/speechut/modules/multihead_attention.py new file mode 100644 index 0000000000000000000000000000000000000000..89f46ab628ebe7faa1a3db2fd4f31a7269bb006a --- /dev/null +++ b/SpeechUT/speechut/modules/multihead_attention.py @@ -0,0 +1,346 @@ +# -------------------------------------------------------- +# Copyright (c) 2022 Microsoft +# Licensed under The MIT License [see LICENSE for details] +# Based on fairseq code bases +# https://github.com/facebookresearch/fairseq +# -------------------------------------------------------- + +from typing import Dict, Optional, Tuple + +import torch +import torch.nn.functional as F +from fairseq import utils +from torch import Tensor + +from fairseq.modules import MultiheadAttention as FairseqMultiheadAttention + + +class MultiheadAttention(FairseqMultiheadAttention): + """Multi-headed attention. + + See "Attention Is All You Need" for more details. + """ + + def __init__( + self, + embed_dim, + num_heads, + kdim=None, + vdim=None, + dropout=0.0, + bias=True, + add_bias_kv=False, + add_zero_attn=False, + self_attention=False, + encoder_decoder_attention=False, + q_noise=0.0, + qn_block_size=8, + scaling_for_att=1.0 + ): + super().__init__( + embed_dim, + num_heads, + kdim, + vdim, + dropout, + bias, + add_bias_kv, + add_zero_attn, + self_attention, + encoder_decoder_attention, + q_noise, + qn_block_size, + ) + self.scaling_for_att = scaling_for_att + + def forward( + self, + query, + key: Optional[Tensor], + value: Optional[Tensor], + key_padding_mask: Optional[Tensor] = None, + incremental_state: Optional[Dict[str, Dict[str, Optional[Tensor]]]] = None, + need_weights: bool = True, + static_kv: bool = False, + attn_mask: Optional[Tensor] = None, + before_softmax: bool = False, + need_head_weights: bool = False, + position_bias: Optional[Tensor] = None, + ) -> Tuple[Tensor, Optional[Tensor]]: + """Input shape: Time x Batch x Channel + + Args: + key_padding_mask (ByteTensor, optional): mask to exclude + keys that are pads, of shape `(batch, src_len)`, where + padding elements are indicated by 1s. + need_weights (bool, optional): return the attention weights, + averaged over heads (default: False). + attn_mask (ByteTensor, optional): typically used to + implement causal attention, where the mask prevents the + attention from looking forward in time (default: None). + before_softmax (bool, optional): return the raw attention + weights and values before the attention softmax. + need_head_weights (bool, optional): return the attention + weights for each head. Implies *need_weights*. Default: + return the average attention weights over all heads. 
+ """ + if need_head_weights: + need_weights = True + + is_tpu = query.device.type == "xla" + + tgt_len, bsz, embed_dim = query.size() + src_len = tgt_len + assert embed_dim == self.embed_dim, f"query dim {embed_dim} != {self.embed_dim}" + assert list(query.size()) == [tgt_len, bsz, embed_dim] + if key is not None: + src_len, key_bsz, _ = key.size() + if not torch.jit.is_scripting(): + assert key_bsz == bsz + assert value is not None + assert src_len, bsz == value.shape[:2] + + if ( + not self.onnx_trace + and not is_tpu # don't use PyTorch version on TPUs + and incremental_state is None + and not static_kv + # A workaround for quantization to work. Otherwise JIT compilation + # treats bias in linear module as method. + and not torch.jit.is_scripting() + and position_bias is None + ): + assert key is not None and value is not None + return F.multi_head_attention_forward( + query, + key, + value, + self.embed_dim, + self.num_heads, + torch.empty([0]), + torch.cat((self.q_proj.bias, self.k_proj.bias, self.v_proj.bias)), + self.bias_k, + self.bias_v, + self.add_zero_attn, + self.dropout_module.p, + self.out_proj.weight, + self.out_proj.bias, + self.training or self.dropout_module.apply_during_inference, + key_padding_mask, + need_weights, + attn_mask, + use_separate_proj_weight=True, + q_proj_weight=self.q_proj.weight, + k_proj_weight=self.k_proj.weight, + v_proj_weight=self.v_proj.weight, + ) + + if incremental_state is not None: + saved_state = self._get_input_buffer(incremental_state) + if saved_state is not None and "prev_key" in saved_state: + # previous time steps are cached - no need to recompute + # key and value if they are static + if static_kv: + assert self.encoder_decoder_attention and not self.self_attention + key = value = None + else: + saved_state = None + + if self.self_attention: + q = self.q_proj(query) + k = self.k_proj(query) + v = self.v_proj(query) + elif self.encoder_decoder_attention: + # encoder-decoder attention + q = self.q_proj(query) + if key is None: + assert value is None + k = v = None + else: + k = self.k_proj(key) + v = self.v_proj(key) + + else: + assert key is not None and value is not None + q = self.q_proj(query) + k = self.k_proj(key) + v = self.v_proj(value) + q *= self.scaling + q *= (1 / self.scaling_for_att) + + if self.bias_k is not None: + assert self.bias_v is not None + k = torch.cat([k, self.bias_k.repeat(1, bsz, 1)]) + v = torch.cat([v, self.bias_v.repeat(1, bsz, 1)]) + if attn_mask is not None: + attn_mask = torch.cat( + [attn_mask, attn_mask.new_zeros(attn_mask.size(0), 1)], dim=1 + ) + if key_padding_mask is not None: + key_padding_mask = torch.cat( + [ + key_padding_mask, + key_padding_mask.new_zeros(key_padding_mask.size(0), 1), + ], + dim=1, + ) + + q = ( + q.contiguous() + .view(tgt_len, bsz * self.num_heads, self.head_dim) + .transpose(0, 1) + ) + if k is not None: + k = ( + k.contiguous() + .view(-1, bsz * self.num_heads, self.head_dim) + .transpose(0, 1) + ) + if v is not None: + v = ( + v.contiguous() + .view(-1, bsz * self.num_heads, self.head_dim) + .transpose(0, 1) + ) + + if saved_state is not None: + # saved states are stored with shape (bsz, num_heads, seq_len, head_dim) + if "prev_key" in saved_state: + _prev_key = saved_state["prev_key"] + assert _prev_key is not None + prev_key = _prev_key.view(bsz * self.num_heads, -1, self.head_dim) + if static_kv: + k = prev_key + else: + assert k is not None + k = torch.cat([prev_key, k], dim=1) + src_len = k.size(1) + if "prev_value" in saved_state: + _prev_value = 
saved_state["prev_value"] + assert _prev_value is not None + prev_value = _prev_value.view(bsz * self.num_heads, -1, self.head_dim) + if static_kv: + v = prev_value + else: + assert v is not None + v = torch.cat([prev_value, v], dim=1) + prev_key_padding_mask: Optional[Tensor] = None + if "prev_key_padding_mask" in saved_state: + prev_key_padding_mask = saved_state["prev_key_padding_mask"] + assert k is not None and v is not None + key_padding_mask = MultiheadAttention._append_prev_key_padding_mask( + key_padding_mask=key_padding_mask, + prev_key_padding_mask=prev_key_padding_mask, + batch_size=bsz, + src_len=k.size(1), + static_kv=static_kv, + ) + + saved_state["prev_key"] = k.view(bsz, self.num_heads, -1, self.head_dim) + saved_state["prev_value"] = v.view(bsz, self.num_heads, -1, self.head_dim) + saved_state["prev_key_padding_mask"] = key_padding_mask + # In this branch incremental_state is never None + assert incremental_state is not None + incremental_state = self._set_input_buffer(incremental_state, saved_state) + assert k is not None + assert k.size(1) == src_len + + # This is part of a workaround to get around fork/join parallelism + # not supporting Optional types. + if key_padding_mask is not None and key_padding_mask.dim() == 0: + key_padding_mask = None + + if key_padding_mask is not None: + assert key_padding_mask.size(0) == bsz + assert key_padding_mask.size(1) == src_len + + if self.add_zero_attn: + assert v is not None + src_len += 1 + k = torch.cat([k, k.new_zeros((k.size(0), 1) + k.size()[2:])], dim=1) + v = torch.cat([v, v.new_zeros((v.size(0), 1) + v.size()[2:])], dim=1) + if attn_mask is not None: + attn_mask = torch.cat( + [attn_mask, attn_mask.new_zeros(attn_mask.size(0), 1)], dim=1 + ) + if key_padding_mask is not None: + key_padding_mask = torch.cat( + [ + key_padding_mask, + torch.zeros(key_padding_mask.size(0), 1).type_as( + key_padding_mask + ), + ], + dim=1, + ) + + attn_weights = torch.bmm(q, k.transpose(1, 2)) + attn_weights = self.apply_sparse_mask(attn_weights, tgt_len, src_len, bsz) + + if position_bias is not None: ## first order + ## position_bias: [241, 241, 64] + #print ("attn_weights: ", attn_weights.size()) # [492, 241, 241] + reshape_q = q.contiguous().view(bsz * self.num_heads, -1, self.head_dim).transpose(0,1) #[241, 492, 64] + #print ("reshape_q: ", reshape_q.size()) + B = torch.matmul(reshape_q, position_bias.transpose(-2, -1)) + #print ("B: ", B.size()) ## [241, 492, 241] + #B = B.transpose(0, 1).view(bsz, self.num_heads, position_bias.size(0), position_bias.size(1)) + B = B.transpose(0, 1).view(bsz*self.num_heads, position_bias.size(0), position_bias.size(1)) + #print ("B 2: ", B.size()) + attn_weights += B + + attn_weights *= self.scaling_for_att + assert list(attn_weights.size()) == [bsz * self.num_heads, tgt_len, src_len] + + if attn_mask is not None: + attn_mask = attn_mask.unsqueeze(0) + if self.onnx_trace: + attn_mask = attn_mask.repeat(attn_weights.size(0), 1, 1) + attn_weights += attn_mask + + if key_padding_mask is not None: + # don't attend to padding symbols + attn_weights = attn_weights.view(bsz, self.num_heads, tgt_len, src_len) + if not is_tpu: + attn_weights = attn_weights.masked_fill( + key_padding_mask.unsqueeze(1).unsqueeze(2).to(torch.bool), + float("-inf"), + ) + else: + attn_weights = attn_weights.transpose(0, 2) + attn_weights = attn_weights.masked_fill(key_padding_mask, float("-inf")) + attn_weights = attn_weights.transpose(0, 2) + attn_weights = attn_weights.view(bsz * self.num_heads, tgt_len, src_len) + + if 
self.scaling_for_att > 1.0: + attn_weights = attn_weights - attn_weights.detach().max(dim=-1, keepdim=True)[0] + + if before_softmax: + return attn_weights, v + + attn_weights_float = utils.softmax( + attn_weights, dim=-1, onnx_trace=self.onnx_trace + ) + attn_weights = attn_weights_float.type_as(attn_weights) + attn_probs = self.dropout_module(attn_weights) + + assert v is not None + attn = torch.bmm(attn_probs, v) + assert list(attn.size()) == [bsz * self.num_heads, tgt_len, self.head_dim] + if self.onnx_trace and attn.size(1) == 1: + # when ONNX tracing a single decoder step (sequence length == 1) + # the transpose is a no-op copy before view, thus unnecessary + attn = attn.contiguous().view(tgt_len, bsz, embed_dim) + else: + attn = attn.transpose(0, 1).contiguous().view(tgt_len, bsz, embed_dim) + attn = self.out_proj(attn) + attn_weights: Optional[Tensor] = None + if need_weights: + attn_weights = attn_weights_float.view( + bsz, self.num_heads, tgt_len, src_len + ).transpose(1, 0) + if not need_head_weights: + # average attention weights over heads + attn_weights = attn_weights.mean(dim=0) + + return attn, attn_weights diff --git a/SpeechUT/speechut/modules/relative_pos_enc.py b/SpeechUT/speechut/modules/relative_pos_enc.py new file mode 100644 index 0000000000000000000000000000000000000000..7021fc0941fef310ca5571c101b8a8e18ffc1db6 --- /dev/null +++ b/SpeechUT/speechut/modules/relative_pos_enc.py @@ -0,0 +1,33 @@ +# -------------------------------------------------------- +# Copyright (c) 2022 Microsoft +# Licensed under The MIT License [see LICENSE for details] +# Based on fairseq code bases +# https://github.com/facebookresearch/fairseq +# -------------------------------------------------------- + +import torch + +class RelativePositionalEncoding(torch.nn.Module): + def __init__(self, d_model, maxlen=1000, embed_v=False): + super(RelativePositionalEncoding, self).__init__() + + self.d_model = d_model + self.maxlen = maxlen + self.pe_k = torch.nn.Embedding(2*maxlen, d_model) + if embed_v: + self.pe_v = torch.nn.Embedding(2*maxlen, d_model) + self.embed_v = embed_v + + + def forward(self, pos_seq, incremental_state=None): + pos_seq[pos_seq < -self.maxlen] = -self.maxlen + pos_seq[pos_seq >= self.maxlen] = self.maxlen - 1 + pos_seq = pos_seq + self.maxlen + + if incremental_state is not None: + pos_seq = pos_seq[-1:] + + if self.embed_v: + return self.pe_k(pos_seq), self.pe_v(pos_seq) + else: + return self.pe_k(pos_seq), None diff --git a/SpeechUT/speechut/modules/transformer_decoder.py b/SpeechUT/speechut/modules/transformer_decoder.py new file mode 100644 index 0000000000000000000000000000000000000000..84417b44b2672e49cf92bad8355d2dae48661b55 --- /dev/null +++ b/SpeechUT/speechut/modules/transformer_decoder.py @@ -0,0 +1,543 @@ +# -------------------------------------------------------- +# Copyright (c) 2022 Microsoft +# Licensed under The MIT License [see LICENSE for details] +# Based on fairseq code bases +# https://github.com/facebookresearch/fairseq +# -------------------------------------------------------- + +""" + Modified from https://github.com/facebookresearch/fairseq/blob/main/fairseq/models/transformer/transformer_decoder.py +""" + +import math +from typing import Any, Dict, List, Optional + +import torch +import torch.nn as nn +from fairseq import utils +from fairseq.distributed import fsdp_wrap +from fairseq.models import FairseqIncrementalDecoder +from fairseq.models.transformer import TransformerConfig +from fairseq.modules import ( + AdaptiveSoftmax, + BaseLayer, + 
FairseqDropout, + LayerDropModuleList, + LayerNorm, + PositionalEmbedding, + SinusoidalPositionalEmbedding, +) +from fairseq.modules.checkpoint_activations import checkpoint_wrapper +from fairseq.modules.quant_noise import quant_noise as apply_quant_noise_ +from torch import Tensor + +from speechut.modules import transformer_layer +from speechut.modules import RelativePositionalEncoding + +# rewrite name for backward compatibility in `make_generation_fast_` +def module_name_fordropout(module_name: str) -> str: + if module_name == "TransformerDecoderBase": + return "TransformerDecoder" + else: + return module_name + + +class TransformerDecoderBase(FairseqIncrementalDecoder): + """ + Transformer decoder consisting of *cfg.decoder.layers* layers. Each layer + is a :class:`TransformerDecoderLayer`. + + Args: + args (argparse.Namespace): parsed command-line arguments + dictionary (~fairseq.data.Dictionary): decoding dictionary + embed_tokens (torch.nn.Embedding): output embedding + no_encoder_attn (bool, optional): whether to attend to encoder outputs + (default: False). + """ + + def __init__( + self, + cfg, + dictionary, + embed_tokens, + no_encoder_attn=False, + output_projection=None, + use_rel_pos_enc=False, + ): + self.cfg = cfg + super().__init__(dictionary) + self.register_buffer("version", torch.Tensor([3])) + self._future_mask = torch.empty(0) + + self.dropout_module = FairseqDropout( + cfg.dropout, module_name=module_name_fordropout(self.__class__.__name__) + ) + self.decoder_layerdrop = cfg.decoder.layerdrop + self.share_input_output_embed = cfg.share_decoder_input_output_embed + + input_embed_dim = embed_tokens.embedding_dim + embed_dim = cfg.decoder.embed_dim + self.embed_dim = embed_dim + self.output_embed_dim = cfg.decoder.output_dim + + self.padding_idx = embed_tokens.padding_idx + self.max_target_positions = cfg.max_target_positions + + self.embed_tokens = embed_tokens + + self.embed_scale = 1.0 if cfg.no_scale_embedding else math.sqrt(embed_dim) + + if not cfg.adaptive_input and cfg.quant_noise.pq > 0: + self.quant_noise = apply_quant_noise_( + nn.Linear(embed_dim, embed_dim, bias=False), + cfg.quant_noise.pq, + cfg.quant_noise.pq_block_size, + ) + else: + self.quant_noise = None + + self.project_in_dim = ( + Linear(input_embed_dim, embed_dim, bias=False) + if embed_dim != input_embed_dim + else None + ) + self.embed_positions = ( + PositionalEmbedding( + self.max_target_positions, + embed_dim, + self.padding_idx, + learned=cfg.decoder.learned_pos, + ) + if not cfg.no_token_positional_embeddings + else None + ) + if cfg.layernorm_embedding: + self.layernorm_embedding = LayerNorm(embed_dim, export=cfg.export) + else: + self.layernorm_embedding = None + + self.cross_self_attention = cfg.cross_self_attention + + if self.decoder_layerdrop > 0.0: + self.layers = LayerDropModuleList(p=self.decoder_layerdrop) + else: + self.layers = nn.ModuleList([]) + self.use_rel_pos_enc = use_rel_pos_enc + self.layers.extend( + [ + self.build_decoder_layer(cfg, no_encoder_attn) + for _ in range(cfg.decoder.layers) + ] + ) + self.num_layers = len(self.layers) + + if cfg.decoder.normalize_before and not cfg.no_decoder_final_norm: + self.layer_norm = LayerNorm(embed_dim, export=cfg.export) + else: + self.layer_norm = None + + self.project_out_dim = ( + Linear(embed_dim, self.output_embed_dim, bias=False) + if embed_dim != self.output_embed_dim and not cfg.tie_adaptive_weights + else None + ) + + self.adaptive_softmax = None + self.output_projection = output_projection + if self.output_projection is 
None: + self.build_output_projection(cfg, dictionary, embed_tokens) + if self.use_rel_pos_enc: + self.pos_emb = RelativePositionalEncoding(embed_dim // cfg.decoder.attention_heads, 24) + + def build_output_projection(self, cfg, dictionary, embed_tokens): + if cfg.adaptive_softmax_cutoff is not None: + self.adaptive_softmax = AdaptiveSoftmax( + len(dictionary), + self.output_embed_dim, + utils.eval_str_list(cfg.adaptive_softmax_cutoff, type=int), + dropout=cfg.adaptive_softmax_dropout, + adaptive_inputs=embed_tokens if cfg.tie_adaptive_weights else None, + factor=cfg.adaptive_softmax_factor, + tie_proj=cfg.tie_adaptive_proj, + ) + elif self.share_input_output_embed: + self.output_projection = nn.Linear( + self.embed_tokens.weight.shape[1], + self.embed_tokens.weight.shape[0], + bias=False, + ) + self.output_projection.weight = self.embed_tokens.weight + else: + self.output_projection = nn.Linear( + self.output_embed_dim, len(dictionary), bias=False + ) + nn.init.normal_( + self.output_projection.weight, mean=0, std=self.output_embed_dim ** -0.5 + ) + num_base_layers = cfg.base_layers + for i in range(num_base_layers): + self.layers.insert( + ((i + 1) * cfg.decoder.layers) // (num_base_layers + 1), + BaseLayer(cfg), + ) + + def build_decoder_layer(self, cfg, no_encoder_attn=False): + layer = transformer_layer.TransformerDecoderLayerBase(cfg, no_encoder_attn, has_relative_attention_bias=self.use_rel_pos_enc) + checkpoint = cfg.checkpoint_activations + if checkpoint: + offload_to_cpu = cfg.offload_activations + layer = checkpoint_wrapper(layer, offload_to_cpu=offload_to_cpu) + # if we are checkpointing, enforce that FSDP always wraps the + # checkpointed layer, regardless of layer size + min_params_to_wrap = cfg.min_params_to_wrap if not checkpoint else 0 + layer = fsdp_wrap(layer, min_num_params=min_params_to_wrap) + return layer + + def forward( + self, + prev_output_tokens, + encoder_out: Optional[Dict[str, List[Tensor]]] = None, + incremental_state: Optional[Dict[str, Dict[str, Optional[Tensor]]]] = None, + features_only: bool = False, + full_context_alignment: bool = False, + alignment_layer: Optional[int] = None, + alignment_heads: Optional[int] = None, + src_lengths: Optional[Any] = None, + return_all_hiddens: bool = False, + ): + """ + Args: + prev_output_tokens (LongTensor): previous decoder outputs of shape + `(batch, tgt_len)`, for teacher forcing + encoder_out (optional): output from the encoder, used for + encoder-side attention, should be of size T x B x C + incremental_state (dict): dictionary used for storing state during + :ref:`Incremental decoding` + features_only (bool, optional): only return features without + applying output layer (default: False). + full_context_alignment (bool, optional): don't apply + auto-regressive mask to self-attention (default: False). 
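+            alignment_layer (int, optional): return mean alignment over
+                heads at this layer (default: last layer).
+            alignment_heads (int, optional): only average alignment over
+                this many heads (default: all heads).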
+ + Returns: + tuple: + - the decoder's output of shape `(batch, tgt_len, vocab)` + - a dictionary with any model-specific outputs + """ + + x, extra = self.extract_features( + prev_output_tokens, + encoder_out=encoder_out, + incremental_state=incremental_state, + full_context_alignment=full_context_alignment, + alignment_layer=alignment_layer, + alignment_heads=alignment_heads, + ) + + if not features_only: + x = self.output_layer(x) + return x, extra + + def extract_features( + self, + prev_output_tokens, + encoder_out: Optional[Dict[str, List[Tensor]]], + incremental_state: Optional[Dict[str, Dict[str, Optional[Tensor]]]] = None, + full_context_alignment: bool = False, + alignment_layer: Optional[int] = None, + alignment_heads: Optional[int] = None, + ): + return self.extract_features_scriptable( + prev_output_tokens, + encoder_out, + incremental_state, + full_context_alignment, + alignment_layer, + alignment_heads, + ) + + """ + A scriptable subclass of this class has an extract_features method and calls + super().extract_features, but super() is not supported in torchscript. A copy of + this function is made to be used in the subclass instead. + """ + + def extract_features_scriptable( + self, + prev_output_tokens, + encoder_out: Optional[Dict[str, List[Tensor]]], + incremental_state: Optional[Dict[str, Dict[str, Optional[Tensor]]]] = None, + full_context_alignment: bool = False, + alignment_layer: Optional[int] = None, + alignment_heads: Optional[int] = None, + ): + """ + Similar to *forward* but only return features. + + Includes several features from "Jointly Learning to Align and + Translate with Transformer Models" (Garg et al., EMNLP 2019). + + Args: + full_context_alignment (bool, optional): don't apply + auto-regressive mask to self-attention (default: False). + alignment_layer (int, optional): return mean alignment over + heads at this layer (default: last layer). + alignment_heads (int, optional): only average alignment over + this many heads (default: all heads). 
+ + Returns: + tuple: + - the decoder's features of shape `(batch, tgt_len, embed_dim)` + - a dictionary with any model-specific outputs + """ + bs, slen = prev_output_tokens.size() + if alignment_layer is None: + alignment_layer = self.num_layers - 1 + + enc: Optional[Tensor] = None + padding_mask: Optional[Tensor] = None + if encoder_out is not None and len(encoder_out["encoder_out"]) > 0: + enc = encoder_out["encoder_out"][0] + assert ( + enc.size()[1] == bs + ), f"Expected enc.shape == (t, {bs}, c) got {enc.shape}" + if encoder_out is not None and len(encoder_out["encoder_padding_mask"]) > 0: + padding_mask = encoder_out["encoder_padding_mask"][0] + + # embed positions + positions = None + if self.embed_positions is not None: + positions = self.embed_positions( + prev_output_tokens, incremental_state=incremental_state + ) + + if incremental_state is not None: + prev_output_tokens = prev_output_tokens[:, -1:] + if positions is not None: + positions = positions[:, -1:] + + # embed tokens and positions + x = self.embed_scale * self.embed_tokens(prev_output_tokens) + + if self.quant_noise is not None: + x = self.quant_noise(x) + + if self.project_in_dim is not None: + x = self.project_in_dim(x) + + if positions is not None: + x += positions + + if self.layernorm_embedding is not None: + x = self.layernorm_embedding(x) + + x = self.dropout_module(x) + + # B x T x C -> T x B x C + x = x.transpose(0, 1) + if self.use_rel_pos_enc: + pos_seq = torch.arange(0, slen).long().to(x.device) + pos_seq = pos_seq[:, None] - pos_seq[None, :] + pos_k, _ = self.pos_emb(pos_seq, incremental_state) + else: + pos_k = None + + self_attn_padding_mask: Optional[Tensor] = None + if self.cross_self_attention or prev_output_tokens.eq(self.padding_idx).any(): + self_attn_padding_mask = prev_output_tokens.eq(self.padding_idx) + + # decoder layers + attn: Optional[Tensor] = None + inner_states: List[Optional[Tensor]] = [x] + for idx, layer in enumerate(self.layers): + if incremental_state is None and not full_context_alignment: + self_attn_mask = self.buffered_future_mask(x) + else: + self_attn_mask = None + + x, layer_attn, _ = layer( + x, + enc, + padding_mask, + incremental_state, + self_attn_mask=self_attn_mask, + self_attn_padding_mask=self_attn_padding_mask, + need_attn=bool((idx == alignment_layer)), + need_head_weights=bool((idx == alignment_layer)), + pos_bias=pos_k, + ) + inner_states.append(x) + if layer_attn is not None and idx == alignment_layer: + attn = layer_attn.float().to(x) + + if attn is not None: + if alignment_heads is not None: + attn = attn[:alignment_heads] + + # average probabilities over heads + attn = attn.mean(dim=0) + + if self.layer_norm is not None: + x = self.layer_norm(x) + + # T x B x C -> B x T x C + x = x.transpose(0, 1) + + if self.project_out_dim is not None: + x = self.project_out_dim(x) + + return x, {"attn": [attn], "inner_states": inner_states} + + def output_layer(self, features): + """Project features to the vocabulary size.""" + if self.adaptive_softmax is None: + # project back to size of vocabulary + return self.output_projection(features) + else: + return features + + def max_positions(self): + """Maximum output length supported by the decoder.""" + if self.embed_positions is None: + return self.max_target_positions + return min(self.max_target_positions, self.embed_positions.max_positions) + + def buffered_future_mask(self, tensor): + dim = tensor.size(0) + # self._future_mask.device != tensor.device is not working in TorchScript. This is a workaround. 
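+        # NOTE (added): the cached mask is rebuilt only when it is empty, lives
+        # on the wrong device, or is smaller than the current target length.
+        # For dim=3 it looks like (0 on/below the diagonal, -inf above):
+        #   [[0., -inf, -inf],
+        #    [0.,   0., -inf],
+        #    [0.,   0.,   0.]]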
+ if ( + self._future_mask.size(0) == 0 + or (not self._future_mask.device == tensor.device) + or self._future_mask.size(0) < dim + ): + self._future_mask = torch.triu( + utils.fill_with_neg_inf(torch.zeros([dim, dim])), 1 + ) + self._future_mask = self._future_mask.to(tensor) + return self._future_mask[:dim, :dim] + + def upgrade_state_dict_named(self, state_dict, name): + """Upgrade a (possibly old) state dict for new versions of fairseq.""" + if isinstance(self.embed_positions, SinusoidalPositionalEmbedding): + weights_key = "{}.embed_positions.weights".format(name) + if weights_key in state_dict: + del state_dict[weights_key] + state_dict[ + "{}.embed_positions._float_tensor".format(name) + ] = torch.FloatTensor(1) + + if f"{name}.output_projection.weight" not in state_dict: + if self.share_input_output_embed: + embed_out_key = f"{name}.embed_tokens.weight" + else: + embed_out_key = f"{name}.embed_out" + if embed_out_key in state_dict: + state_dict[f"{name}.output_projection.weight"] = state_dict[ + embed_out_key + ] + if not self.share_input_output_embed: + del state_dict[embed_out_key] + + for i in range(self.num_layers): + # update layer norms + layer_norm_map = { + "0": "self_attn_layer_norm", + "1": "encoder_attn_layer_norm", + "2": "final_layer_norm", + } + for old, new in layer_norm_map.items(): + for m in ("weight", "bias"): + k = "{}.layers.{}.layer_norms.{}.{}".format(name, i, old, m) + if k in state_dict: + state_dict[ + "{}.layers.{}.{}.{}".format(name, i, new, m) + ] = state_dict[k] + del state_dict[k] + + version_key = "{}.version".format(name) + if utils.item(state_dict.get(version_key, torch.Tensor([1]))[0]) <= 2: + # earlier checkpoints did not normalize after the stack of layers + self.layer_norm = None + self.normalize = False + state_dict[version_key] = torch.Tensor([1]) + + return state_dict + + +def Linear(in_features, out_features, bias=True): + m = nn.Linear(in_features, out_features, bias) + nn.init.xavier_uniform_(m.weight) + if bias: + nn.init.constant_(m.bias, 0.0) + return m + +class TransformerDecoderBaseScriptable(TransformerDecoderBase): + def extract_features( + self, + prev_output_tokens, + encoder_out: Optional[Dict[str, List[Tensor]]] = None, + incremental_state: Optional[Dict[str, Dict[str, Optional[Tensor]]]] = None, + full_context_alignment: bool = False, + alignment_layer: Optional[int] = None, + alignment_heads: Optional[int] = None, + ): + # call scriptable method from parent class + x, _ = self.extract_features_scriptable( + prev_output_tokens, + encoder_out, + incremental_state, + full_context_alignment, + alignment_layer, + alignment_heads, + ) + return x, None + + +class TransformerDecoder(TransformerDecoderBase): + def __init__( + self, + args, + dictionary, + embed_tokens, + no_encoder_attn=False, + output_projection=None, + ): + self.args = args + super().__init__( + TransformerConfig.from_namespace(args), + dictionary, + embed_tokens, + no_encoder_attn=no_encoder_attn, + output_projection=output_projection, + use_rel_pos_enc=getattr(args, "use_rel_pos_enc", False), + ) + + def build_output_projection(self, args, dictionary, embed_tokens): + super().build_output_projection( + TransformerConfig.from_namespace(args), dictionary, embed_tokens + ) + + def build_decoder_layer(self, args, no_encoder_attn=False): + return super().build_decoder_layer( + TransformerConfig.from_namespace(args), no_encoder_attn=no_encoder_attn + ) + +class TransformerDecoderScriptable(TransformerDecoder): + def extract_features( + self, + prev_output_tokens, + 
encoder_out: Optional[Dict[str, List[Tensor]]] = None, + incremental_state: Optional[Dict[str, Dict[str, Optional[Tensor]]]] = None, + full_context_alignment: bool = False, + alignment_layer: Optional[int] = None, + alignment_heads: Optional[int] = None, + ): + # call scriptable method from parent class + x, _ = self.extract_features_scriptable( + prev_output_tokens, + encoder_out, + incremental_state, + full_context_alignment, + alignment_layer, + alignment_heads, + ) + return x, None diff --git a/SpeechUT/speechut/modules/transformer_encoder.py b/SpeechUT/speechut/modules/transformer_encoder.py new file mode 100644 index 0000000000000000000000000000000000000000..f94e1fed8a005ec59d1e422157e08d88ff95bfda --- /dev/null +++ b/SpeechUT/speechut/modules/transformer_encoder.py @@ -0,0 +1,401 @@ +# -------------------------------------------------------- +# Copyright (c) 2022 Microsoft +# Licensed under The MIT License [see LICENSE for details] +# Based on fairseq code bases +# https://github.com/facebookresearch/fairseq +# -------------------------------------------------------- + +import math +from typing import Dict, List, Optional + +import torch +import torch.nn as nn +import torch.nn.functional as F +from fairseq import utils +from fairseq.distributed import fsdp_wrap +from fairseq.models import FairseqEncoder +from fairseq.modules import ( + FairseqDropout, + LayerDropModuleList, + LayerNorm, + SinusoidalPositionalEmbedding, +) +from fairseq.modules.checkpoint_activations import checkpoint_wrapper +from fairseq.modules.quant_noise import quant_noise as apply_quant_noise_ +from torch import Tensor +from fairseq.models.transformer import ( + TransformerConfig, +) + + +from speechut.modules import transformer_layer, LearnedPositionalEmbedding +from speechut.modules import RelativePositionalEncoding + +# rewrite name for backward compatibility in `make_generation_fast_` +def module_name_fordropout(module_name: str) -> str: + if module_name == "TransformerEncoderBase": + return "TransformerEncoder" + else: + return module_name + + +class TransformerEncoderBase(FairseqEncoder): + """ + Transformer encoder consisting of *cfg.encoder.layers* layers. Each layer + is a :class:`TransformerEncoderLayer`. 
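+    Compared to the stock fairseq encoder, this variant can additionally thread
+    relative positional encodings (*use_rel_pos_enc*) and an attention scaling
+    factor (*scaling_for_att*) into every layer.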
+ + Args: + args (argparse.Namespace): parsed command-line arguments + dictionary (~fairseq.data.Dictionary): encoding dictionary + embed_tokens (torch.nn.Embedding): input embedding + """ + + def __init__(self, cfg, dictionary, embed_tokens, use_rel_pos_enc=False, scaling_for_att=1.0): + self.cfg = cfg + super().__init__(dictionary) + self.register_buffer("version", torch.Tensor([3])) + + self.dropout_module = FairseqDropout( + cfg.dropout, module_name=module_name_fordropout(self.__class__.__name__) + ) + self.encoder_layerdrop = cfg.encoder.layerdrop + + embed_dim = embed_tokens.embedding_dim + self.padding_idx = embed_tokens.padding_idx + self.max_source_positions = cfg.max_source_positions + + self.embed_tokens = embed_tokens + + self.embed_scale = 1.0 if cfg.no_scale_embedding else math.sqrt(embed_dim) + + self.embed_positions = ( + PositionalEmbedding( + cfg.max_source_positions, + embed_dim, + self.padding_idx, + learned=cfg.encoder.learned_pos, + ) + if not cfg.no_token_positional_embeddings + else None + ) + if cfg.layernorm_embedding: + self.layernorm_embedding = LayerNorm(embed_dim, export=cfg.export) + else: + self.layernorm_embedding = None + + if not cfg.adaptive_input and cfg.quant_noise.pq > 0: + self.quant_noise = apply_quant_noise_( + nn.Linear(embed_dim, embed_dim, bias=False), + cfg.quant_noise.pq, + cfg.quant_noise.pq_block_size, + ) + else: + self.quant_noise = None + + if self.encoder_layerdrop > 0.0: + self.layers = LayerDropModuleList(p=self.encoder_layerdrop) + else: + self.layers = nn.ModuleList([]) + self.use_rel_pos_enc = use_rel_pos_enc + self.scaling_for_att = scaling_for_att + self.layers.extend( + [self.build_encoder_layer(cfg) for i in range(cfg.encoder.layers)] + ) + self.num_layers = len(self.layers) + + if cfg.encoder.normalize_before: + self.layer_norm = LayerNorm(embed_dim, export=cfg.export) + else: + self.layer_norm = None + if self.use_rel_pos_enc: + self.pos_emb = RelativePositionalEncoding(embed_dim // cfg.encoder.attention_heads, 160) + + def build_encoder_layer(self, cfg): + layer = transformer_layer.TransformerEncoderLayerBase(cfg, has_relative_attention_bias=self.use_rel_pos_enc, scaling_for_att=self.scaling_for_att) + checkpoint = cfg.checkpoint_activations + if checkpoint: + offload_to_cpu = cfg.offload_activations + layer = checkpoint_wrapper(layer, offload_to_cpu=offload_to_cpu) + # if we are checkpointing, enforce that FSDP always wraps the + # checkpointed layer, regardless of layer size + min_params_to_wrap = cfg.min_params_to_wrap if not checkpoint else 0 + layer = fsdp_wrap(layer, min_num_params=min_params_to_wrap) + return layer + + def forward_embedding( + self, src_tokens, token_embedding: Optional[torch.Tensor] = None + ): + # embed tokens and positions + if token_embedding is None: + token_embedding = self.embed_tokens(src_tokens) + x = embed = self.embed_scale * token_embedding + if self.embed_positions is not None: + x = embed + self.embed_positions(src_tokens) + if self.layernorm_embedding is not None: + x = self.layernorm_embedding(x) + x = self.dropout_module(x) + if self.quant_noise is not None: + x = self.quant_noise(x) + return x, embed + + def forward( + self, + src_tokens, + src_lengths: Optional[torch.Tensor] = None, + return_all_hiddens: bool = False, + token_embeddings: Optional[torch.Tensor] = None, + uniformity_layers: Optional[List[int]] = None, + ): + """ + Args: + src_tokens (LongTensor): tokens in the source language of shape + `(batch, src_len)` + src_lengths (torch.LongTensor): lengths of each source 
sentence of + shape `(batch)` + return_all_hiddens (bool, optional): also return all of the + intermediate hidden states (default: False). + token_embeddings (torch.Tensor, optional): precomputed embeddings + default `None` will recompute embeddings + + Returns: + dict: + - **encoder_out** (Tensor): the last encoder layer's output of + shape `(src_len, batch, embed_dim)` + - **encoder_padding_mask** (ByteTensor): the positions of + padding elements of shape `(batch, src_len)` + - **encoder_embedding** (Tensor): the (scaled) embedding lookup + of shape `(batch, src_len, embed_dim)` + - **encoder_states** (List[Tensor]): all intermediate + hidden states of shape `(src_len, batch, embed_dim)`. + Only populated if *return_all_hiddens* is True. + """ + return self.forward_scriptable( + src_tokens, src_lengths, return_all_hiddens, token_embeddings, uniformity_layers + ) + + # TorchScript doesn't support super() method so that the scriptable Subclass + # can't access the base class model in Torchscript. + # Current workaround is to add a helper function with different name and + # call the helper function from scriptable Subclass. + def forward_scriptable( + self, + src_tokens, + src_lengths: Optional[torch.Tensor] = None, + return_all_hiddens: bool = False, + token_embeddings: Optional[torch.Tensor] = None, + uniformity_layers: Optional[List[int]] = None, + ): + """ + Args: + src_tokens (LongTensor): tokens in the source language of shape + `(batch, src_len)` + src_lengths (torch.LongTensor): lengths of each source sentence of + shape `(batch)` + return_all_hiddens (bool, optional): also return all of the + intermediate hidden states (default: False). + token_embeddings (torch.Tensor, optional): precomputed embeddings + default `None` will recompute embeddings + + Returns: + dict: + - **encoder_out** (Tensor): the last encoder layer's output of + shape `(src_len, batch, embed_dim)` + - **encoder_padding_mask** (ByteTensor): the positions of + padding elements of shape `(batch, src_len)` + - **encoder_embedding** (Tensor): the (scaled) embedding lookup + of shape `(batch, src_len, embed_dim)` + - **encoder_states** (List[Tensor]): all intermediate + hidden states of shape `(src_len, batch, embed_dim)`. + Only populated if *return_all_hiddens* is True. 
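+        Note:
+            When *uniformity_layers* is given, hidden states at those layer
+            indices (0 = the embedding output) are L2-normalized, fed on to the
+            remaining layers, and also collected in **uniformity_hiddens**
+            (List[T x B x C]).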
+ """ + # compute padding mask + encoder_padding_mask = src_tokens.eq(self.padding_idx) + has_pads = src_tokens.device.type == "xla" or encoder_padding_mask.any() + + x, encoder_embedding = self.forward_embedding(src_tokens, token_embeddings) + + # account for padding while computing the representation + if has_pads: + x = x * (1 - encoder_padding_mask.unsqueeze(-1).type_as(x)) + + # B x T x C -> T x B x C + x = x.transpose(0, 1) + if self.use_rel_pos_enc: + x_len = x.shape[0] + pos_seq = torch.arange(0, x_len).long().to(x.device) + pos_seq = pos_seq[:, None] - pos_seq[None, :] + pos_k, pos_v = self.pos_emb(pos_seq) + else: + pos_k = None + + encoder_states = [] + uniformity_hiddens = [] + + if return_all_hiddens: + encoder_states.append(x) + + if uniformity_layers is not None and 0 in uniformity_layers: + x = F.normalize(x.float(), dim=-1).type_as(x) + uniformity_hiddens.append(x) + + # encoder layers + for i, layer in enumerate(self.layers): + x = layer( + x, encoder_padding_mask=encoder_padding_mask if has_pads else None, + pos_bias=pos_k, + ) + if uniformity_layers is not None and i+1 in uniformity_layers: + x = F.normalize(x.float(), dim=-1).type_as(x) + uniformity_hiddens.append(x) + if return_all_hiddens: + assert encoder_states is not None + encoder_states.append(x) + + if self.layer_norm is not None: + x = self.layer_norm(x) + + # The Pytorch Mobile lite interpreter does not supports returning NamedTuple in + # `forward` so we use a dictionary instead. + # TorchScript does not support mixed values so the values are all lists. + # The empty list is equivalent to None. + src_lengths = ( + src_tokens.ne(self.padding_idx) + .sum(dim=1, dtype=torch.int32) + .reshape(-1, 1) + .contiguous() + ) + return { + "encoder_out": [x], # T x B x C + "encoder_padding_mask": [encoder_padding_mask], # B x T + "encoder_embedding": [encoder_embedding], # B x T x C + "encoder_states": encoder_states, # List[T x B x C] + "uniformity_hiddens": uniformity_hiddens, # List[T x B x C] + "src_tokens": [], + "src_lengths": [src_lengths], + } + + @torch.jit.export + def reorder_encoder_out(self, encoder_out: Dict[str, List[Tensor]], new_order): + """ + Reorder encoder output according to *new_order*. 
+ + Args: + encoder_out: output from the ``forward()`` method + new_order (LongTensor): desired order + + Returns: + *encoder_out* rearranged according to *new_order* + """ + if len(encoder_out["encoder_out"]) == 0: + new_encoder_out = [] + else: + new_encoder_out = [encoder_out["encoder_out"][0].index_select(1, new_order)] + if len(encoder_out["encoder_padding_mask"]) == 0: + new_encoder_padding_mask = [] + else: + new_encoder_padding_mask = [ + encoder_out["encoder_padding_mask"][0].index_select(0, new_order) + ] + if len(encoder_out["encoder_embedding"]) == 0: + new_encoder_embedding = [] + else: + new_encoder_embedding = [ + encoder_out["encoder_embedding"][0].index_select(0, new_order) + ] + + if len(encoder_out["src_tokens"]) == 0: + src_tokens = [] + else: + src_tokens = [(encoder_out["src_tokens"][0]).index_select(0, new_order)] + + if len(encoder_out["src_lengths"]) == 0: + src_lengths = [] + else: + src_lengths = [(encoder_out["src_lengths"][0]).index_select(0, new_order)] + + encoder_states = encoder_out["encoder_states"] + if len(encoder_states) > 0: + for idx, state in enumerate(encoder_states): + encoder_states[idx] = state.index_select(1, new_order) + + return { + "encoder_out": new_encoder_out, # T x B x C + "encoder_padding_mask": new_encoder_padding_mask, # B x T + "encoder_embedding": new_encoder_embedding, # B x T x C + "encoder_states": encoder_states, # List[T x B x C] + "src_tokens": src_tokens, # B x T + "src_lengths": src_lengths, # B x 1 + } + + def max_positions(self): + """Maximum input length supported by the encoder.""" + if self.embed_positions is None: + return self.max_source_positions + return min(self.max_source_positions, self.embed_positions.max_positions) + + def upgrade_state_dict_named(self, state_dict, name): + """Upgrade a (possibly old) state dict for new versions of fairseq.""" + if isinstance(self.embed_positions, SinusoidalPositionalEmbedding): + weights_key = "{}.embed_positions.weights".format(name) + if weights_key in state_dict: + print("deleting {0}".format(weights_key)) + del state_dict[weights_key] + state_dict[ + "{}.embed_positions._float_tensor".format(name) + ] = torch.FloatTensor(1) + for i in range(self.num_layers): + # update layer norms + self.layers[i].upgrade_state_dict_named( + state_dict, "{}.layers.{}".format(name, i) + ) + + version_key = "{}.version".format(name) + if utils.item(state_dict.get(version_key, torch.Tensor([1]))[0]) < 2: + # earlier checkpoints did not normalize after the stack of layers + self.layer_norm = None + self.normalize = False + state_dict[version_key] = torch.Tensor([1]) + return state_dict + + +class TransformerEncoder(TransformerEncoderBase): + def __init__(self, args, dictionary, embed_tokens): + self.args = args + super().__init__( + TransformerConfig.from_namespace(args), + dictionary, + embed_tokens, + use_rel_pos_enc=getattr(args, "use_rel_pos_enc", False), + scaling_for_att=getattr(args, "scaling_for_att", 1.0), + ) + + def build_encoder_layer(self, args): + return super().build_encoder_layer( + TransformerConfig.from_namespace(args), + ) + + +def PositionalEmbedding( + num_embeddings: int, + embedding_dim: int, + padding_idx: int, + learned: bool = False, +): + if learned: + # if padding_idx is specified then offset the embedding ids by + # this index and adjust num_embeddings appropriately + # TODO: The right place for this offset would be inside + # LearnedPositionalEmbedding. Move this there for a cleaner implementation. 
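+        # NOTE (added): fairseq position ids start at padding_idx + 1, so the
+        # learned table needs padding_idx + 1 extra rows, e.g. with padding_idx=1
+        # and num_embeddings=1024 (illustrative values) it gets 1026 rows.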
+ if padding_idx is not None: + num_embeddings = num_embeddings + padding_idx + 1 + m = LearnedPositionalEmbedding(num_embeddings, embedding_dim, padding_idx) + nn.init.normal_(m.weight, mean=0, std=embedding_dim**-0.5) + if padding_idx is not None: + nn.init.constant_(m.weight[padding_idx], 0) + else: + m = SinusoidalPositionalEmbedding( + embedding_dim, + padding_idx, + init_size=num_embeddings + padding_idx + 1, + ) + return m diff --git a/SpeechUT/speechut/modules/transformer_layer.py b/SpeechUT/speechut/modules/transformer_layer.py new file mode 100644 index 0000000000000000000000000000000000000000..a71a848f1a5436756168aafd12d71637520b6b67 --- /dev/null +++ b/SpeechUT/speechut/modules/transformer_layer.py @@ -0,0 +1,330 @@ +# -------------------------------------------------------- +# Copyright (c) 2022 Microsoft +# Licensed under The MIT License [see LICENSE for details] +# Based on fairseq code bases +# https://github.com/facebookresearch/fairseq +# -------------------------------------------------------- + +""" + Modified from https://github.com/facebookresearch/fairseq/blob/main/fairseq/modules/transformer_layer.py + https://github.com/microsoft/SpeechT5/blob/main/Speech2C/speech2c/models/modules/transformer_decoder_layer.py +""" + +from typing import Dict, List, Optional + +import torch +from torch import Tensor +from fairseq.modules import LayerNorm +from fairseq.modules.transformer_layer import TransformerEncoderLayerBase as FairseqTransformerEncoderLayerBase +from fairseq.modules.transformer_layer import TransformerDecoderLayerBase as FairseqTransformerDecoderLayerBase + +from speechut.modules import MultiheadAttention + +class TransformerEncoderLayerBase(FairseqTransformerEncoderLayerBase): + """Encoder layer block. + + In the original paper each operation (multi-head attention or FFN) is + postprocessed with: `dropout -> add residual -> layernorm`. In the + tensor2tensor code they suggest that learning is more robust when + preprocessing each layer with layernorm and postprocessing with: + `dropout -> add residual`. We default to the approach in the paper, but the + tensor2tensor approach can be enabled by setting + *cfg.encoder.normalize_before* to ``True``. + + Args: + args (argparse.Namespace): parsed command-line arguments + """ + + def __init__(self, cfg, has_relative_attention_bias=False, scaling_for_att=1.0): + self.scaling_for_att = scaling_for_att + super().__init__(cfg) + if has_relative_attention_bias: + self.norm_k = LayerNorm(self.embed_dim // cfg.encoder.attention_heads) + + def build_self_attention(self, embed_dim, cfg, scaling_for_att=1.0): + return MultiheadAttention( + embed_dim, + cfg.encoder.attention_heads, + dropout=cfg.attention_dropout, + self_attention=True, + q_noise=self.quant_noise, + qn_block_size=self.quant_noise_block_size, + scaling_for_att=self.scaling_for_att, + ) + + def forward( + self, + x, + encoder_padding_mask: Optional[Tensor], + attn_mask: Optional[Tensor] = None, + pos_bias=None, + ): + """ + Args: + x (Tensor): input to the layer of shape `(seq_len, batch, embed_dim)` + encoder_padding_mask (ByteTensor): binary ByteTensor of shape + `(batch, seq_len)` where padding elements are indicated by ``1``. + attn_mask (ByteTensor): binary tensor of shape `(tgt_len, src_len)`, + where `tgt_len` is the length of output and `src_len` is the + length of input, though here both are equal to `seq_len`. + `attn_mask[tgt_i, src_j] = 1` means that when calculating the + embedding for `tgt_i`, we exclude (mask out) `src_j`. 
This is + useful for strided self-attention. + + Returns: + encoded output of shape `(seq_len, batch, embed_dim)` + """ + # anything in original attn_mask = 1, becomes -1e8 + # anything in original attn_mask = 0, becomes 0 + # Note that we cannot use -inf here, because at some edge cases, + # the attention weight (before softmax) for some padded element in query + # will become -inf, which results in NaN in model parameters + if attn_mask is not None: + attn_mask = attn_mask.masked_fill( + attn_mask.to(torch.bool), -1e8 if x.dtype == torch.float32 else -1e4 + ) + + residual = x + if self.normalize_before: + x = self.self_attn_layer_norm(x) + if pos_bias is not None: + pos_bias = self.norm_k(pos_bias) + x, _ = self.self_attn( + query=x, + key=x, + value=x, + key_padding_mask=encoder_padding_mask, + need_weights=False, + attn_mask=attn_mask, + position_bias=pos_bias, + ) + x = self.dropout_module(x) + x = self.residual_connection(x, residual) + if not self.normalize_before: + x = self.self_attn_layer_norm(x) + + residual = x + if self.normalize_before: + x = self.final_layer_norm(x) + x = self.activation_fn(self.fc1(x)) + x = self.activation_dropout_module(x) + x = self.fc2(x) + x = self.dropout_module(x) + x = self.residual_connection(x, residual) + if not self.normalize_before: + x = self.final_layer_norm(x) + return x + + + +class TransformerDecoderLayerBase(FairseqTransformerDecoderLayerBase): + """Decoder layer block. + + In the original paper each operation (multi-head attention, encoder + attention or FFN) is postprocessed with: `dropout -> add residual -> + layernorm`. In the tensor2tensor code they suggest that learning is more + robust when preprocessing each layer with layernorm and postprocessing with: + `dropout -> add residual`. We default to the approach in the paper, but the + tensor2tensor approach can be enabled by setting + *cfg.decoder.normalize_before* to ``True``. + + Args: + args (argparse.Namespace): parsed command-line arguments + no_encoder_attn (bool, optional): whether to attend to encoder outputs + (default: False). 
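The mask handling at the top of the encoder-layer `forward` above turns a 0/1 attention mask into an additive one, using a large negative constant instead of `-inf` so that fully masked rows cannot produce NaNs. A self-contained sketch of that conversion on a toy causal mask (the mask itself is made up):

```python
import torch

tgt_len, src_len = 3, 3
attn_mask = torch.triu(torch.ones(tgt_len, src_len), diagonal=1)  # 1 = "do not attend"

x_dtype = torch.float32
additive = attn_mask.masked_fill(
    attn_mask.to(torch.bool), -1e8 if x_dtype == torch.float32 else -1e4
)
print(additive)  # 0 on and below the diagonal, -1e8 above it
```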
+ """ + + def __init__( + self, cfg, no_encoder_attn=False, add_bias_kv=False, add_zero_attn=False, has_relative_attention_bias=False, scaling_for_att=1.0, + ): + self.scaling_for_att = scaling_for_att + super().__init__(cfg, + no_encoder_attn, + add_bias_kv, + add_zero_attn, + ) + + if has_relative_attention_bias: + self.norm_k = LayerNorm(self.embed_dim // cfg.decoder.attention_heads) + + def build_self_attention( + self, embed_dim, cfg, add_bias_kv=False, add_zero_attn=False + ): + return MultiheadAttention( + embed_dim, + cfg.decoder.attention_heads, + dropout=cfg.attention_dropout, + add_bias_kv=add_bias_kv, + add_zero_attn=add_zero_attn, + self_attention=not cfg.cross_self_attention, + q_noise=self.quant_noise, + qn_block_size=self.quant_noise_block_size, + scaling_for_att=self.scaling_for_att, + ) + + def build_encoder_attention(self, embed_dim, cfg): + return MultiheadAttention( + embed_dim, + cfg.decoder.attention_heads, + kdim=cfg.encoder.embed_dim, + vdim=cfg.encoder.embed_dim, + dropout=cfg.attention_dropout, + encoder_decoder_attention=True, + q_noise=self.quant_noise, + qn_block_size=self.quant_noise_block_size, + scaling_for_att=self.scaling_for_att, + ) + + def forward( + self, + x, + encoder_out: Optional[torch.Tensor] = None, + encoder_padding_mask: Optional[torch.Tensor] = None, + incremental_state: Optional[Dict[str, Dict[str, Optional[Tensor]]]] = None, + prev_self_attn_state: Optional[List[torch.Tensor]] = None, + prev_attn_state: Optional[List[torch.Tensor]] = None, + self_attn_mask: Optional[torch.Tensor] = None, + self_attn_padding_mask: Optional[torch.Tensor] = None, + need_attn: bool = False, + need_head_weights: bool = False, + pos_bias=None, + ): + """ + Args: + x (Tensor): input to the layer of shape `(seq_len, batch, embed_dim)` + encoder_padding_mask (ByteTensor, optional): binary + ByteTensor of shape `(batch, src_len)` where padding + elements are indicated by ``1``. + need_attn (bool, optional): return attention weights + need_head_weights (bool, optional): return attention weights + for each head (default: return average over heads). 
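The `prev_self_attn_state` / `saved_state` machinery used in the decoder `forward` below is an incremental key/value cache: at each decoding step only the new token's key and value are computed, then appended to what earlier steps stored. A minimal sketch of the idea with toy single-head attention; the `cache` dict, the shapes and the function name are illustrative assumptions:

```python
import torch

def step_attention(query, new_key, new_value, cache):
    # append this step's key/value to what earlier steps cached
    if "prev_key" in cache:
        new_key = torch.cat([cache["prev_key"], new_key], dim=0)
        new_value = torch.cat([cache["prev_value"], new_value], dim=0)
    cache["prev_key"], cache["prev_value"] = new_key, new_value
    attn = torch.softmax(query @ new_key.transpose(0, 1), dim=-1)
    return attn @ new_value

d, cache = 4, {}
for _ in range(3):   # three decoding steps, one new token each
    out = step_attention(torch.randn(1, d), torch.randn(1, d), torch.randn(1, d), cache)
print(out.shape, cache["prev_key"].shape)   # torch.Size([1, 4]) torch.Size([3, 4])
```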
+ + Returns: + encoded output of shape `(seq_len, batch, embed_dim)` + """ + if need_head_weights: + need_attn = True + + residual = x + if self.normalize_before: + x = self.self_attn_layer_norm(x) + if pos_bias is not None: + pos_bias = self.norm_k(pos_bias) + if prev_self_attn_state is not None: + prev_key, prev_value = prev_self_attn_state[:2] + saved_state: Dict[str, Optional[Tensor]] = { + "prev_key": prev_key, + "prev_value": prev_value, + } + if len(prev_self_attn_state) >= 3: + saved_state["prev_key_padding_mask"] = prev_self_attn_state[2] + assert incremental_state is not None + self.self_attn._set_input_buffer(incremental_state, saved_state) + _self_attn_input_buffer = self.self_attn._get_input_buffer(incremental_state) + if self.cross_self_attention and not ( + incremental_state is not None + and _self_attn_input_buffer is not None + and "prev_key" in _self_attn_input_buffer + ): + if self_attn_mask is not None: + assert encoder_out is not None + self_attn_mask = torch.cat( + (x.new_zeros(x.size(0), encoder_out.size(0)), self_attn_mask), dim=1 + ) + if self_attn_padding_mask is not None: + if encoder_padding_mask is None: + assert encoder_out is not None + encoder_padding_mask = self_attn_padding_mask.new_zeros( + encoder_out.size(1), encoder_out.size(0) + ) + self_attn_padding_mask = torch.cat( + (encoder_padding_mask, self_attn_padding_mask), dim=1 + ) + assert encoder_out is not None + y = torch.cat((encoder_out, x), dim=0) + else: + y = x + + x, attn = self.self_attn( + query=x, + key=y, + value=y, + key_padding_mask=self_attn_padding_mask, + incremental_state=incremental_state, + need_weights=False, + attn_mask=self_attn_mask, + position_bias=pos_bias, + ) + if self.c_attn is not None: + tgt_len, bsz = x.size(0), x.size(1) + x = x.view(tgt_len, bsz, self.nh, self.head_dim) + x = torch.einsum("tbhd,h->tbhd", x, self.c_attn) + x = x.reshape(tgt_len, bsz, self.embed_dim) + if self.attn_ln is not None: + x = self.attn_ln(x) + x = self.dropout_module(x) + x = self.residual_connection(x, residual) + if not self.normalize_before: + x = self.self_attn_layer_norm(x) + + if self.encoder_attn is not None and encoder_out is not None: + residual = x + if self.normalize_before: + x = self.encoder_attn_layer_norm(x) + if prev_attn_state is not None: + prev_key, prev_value = prev_attn_state[:2] + saved_state: Dict[str, Optional[Tensor]] = { + "prev_key": prev_key, + "prev_value": prev_value, + } + if len(prev_attn_state) >= 3: + saved_state["prev_key_padding_mask"] = prev_attn_state[2] + assert incremental_state is not None + self.encoder_attn._set_input_buffer(incremental_state, saved_state) + + x, attn = self.encoder_attn( + query=x, + key=encoder_out, + value=encoder_out, + key_padding_mask=encoder_padding_mask, + incremental_state=incremental_state, + static_kv=True, + need_weights=need_attn or (not self.training and self.need_attn), + need_head_weights=need_head_weights, + ) + x = self.dropout_module(x) + x = self.residual_connection(x, residual) + if not self.normalize_before: + x = self.encoder_attn_layer_norm(x) + + residual = x + if self.normalize_before: + x = self.final_layer_norm(x) + + x = self.activation_fn(self.fc1(x)) + x = self.activation_dropout_module(x) + if self.ffn_layernorm is not None: + x = self.ffn_layernorm(x) + x = self.fc2(x) + x = self.dropout_module(x) + if self.w_resid is not None: + residual = torch.mul(self.w_resid, residual) + x = self.residual_connection(x, residual) + if not self.normalize_before: + x = self.final_layer_norm(x) + if self.onnx_trace and 
incremental_state is not None: + saved_state = self.self_attn._get_input_buffer(incremental_state) + assert saved_state is not None + if self_attn_padding_mask is not None: + self_attn_state = [ + saved_state["prev_key"], + saved_state["prev_value"], + saved_state["prev_key_padding_mask"], + ] + else: + self_attn_state = [saved_state["prev_key"], saved_state["prev_value"]] + return x, attn, self_attn_state + return x, attn, None + + def make_generation_fast_(self, need_attn: bool = False, **kwargs): + self.need_attn = need_attn diff --git a/SpeechUT/speechut/modules/w2v_encoder.py b/SpeechUT/speechut/modules/w2v_encoder.py new file mode 100644 index 0000000000000000000000000000000000000000..386f1eb0a4f4f67b552271e65c0b402d197e5bb2 --- /dev/null +++ b/SpeechUT/speechut/modules/w2v_encoder.py @@ -0,0 +1,281 @@ +# -------------------------------------------------------- +# Copyright (c) 2022 Microsoft +# Licensed under The MIT License [see LICENSE for details] +# Based on fairseq code bases +# https://github.com/facebookresearch/fairseq +# -------------------------------------------------------- + +""" + wav2vec encoder adding relitive position bias, modified from + https://github.com/microsoft/SpeechT5/blob/main/Speech2C/speech2c/models/modules/transformer_encoder.py + https://github.com/facebookresearch/fairseq/blob/main/fairseq/models/wav2vec/wav2vec2.py +""" + +import math +import numpy as np +import torch +import torch.nn as nn +import torch.nn.functional as F +from fairseq import utils +from fairseq.dataclass import ChoiceEnum +from fairseq.modules import ( + LayerNorm, + SamePad, +) +from fairseq.modules.checkpoint_activations import checkpoint_wrapper +from fairseq.modules.transformer_sentence_encoder import init_bert_params +from fairseq.utils import index_put +from fairseq.distributed import fsdp_wrap +from fairseq.models.wav2vec.utils import pad_to_multiple + +## reload multi-head attition with rel-pos-bias +from fairseq.models.wav2vec.wav2vec2 import TransformerEncoder as W2vTransformerEncoder +from speechut.modules import RelativePositionalEncoding +from speechut.modules import MultiheadAttention + +EXTRACTOR_MODE_CHOICES = ChoiceEnum(["default", "layer_norm"]) +MASKING_DISTRIBUTION_CHOICES = ChoiceEnum(["static", "uniform", "normal", "poisson"]) + + +class TransformerEncoder(W2vTransformerEncoder): + def __init__(self, args): + super().__init__(args) + + self.dropout = args.dropout + self.embedding_dim = args.encoder_embed_dim + self.required_seq_len_multiple = args.required_seq_len_multiple + self.use_rel_pos_enc = getattr(args, "use_rel_pos_enc", False) + + self.pos_conv = nn.Conv1d( + self.embedding_dim, + self.embedding_dim, + kernel_size=args.conv_pos, + padding=args.conv_pos // 2, + groups=args.conv_pos_groups, + ) + dropout = 0 + std = math.sqrt((4 * (1.0 - dropout)) / (args.conv_pos * self.embedding_dim)) + nn.init.normal_(self.pos_conv.weight, mean=0, std=std) + nn.init.constant_(self.pos_conv.bias, 0) + + self.pos_conv = nn.utils.weight_norm(self.pos_conv, name="weight", dim=2) + self.pos_conv = nn.Sequential(self.pos_conv, SamePad(args.conv_pos), nn.GELU()) + + layers = [] + for _ in range(args.encoder_layers): + layer = TransformerSentenceEncoderLayer( + embedding_dim=self.embedding_dim, + ffn_embedding_dim=args.encoder_ffn_embed_dim, + num_attention_heads=args.encoder_attention_heads, + dropout=self.dropout, + attention_dropout=args.attention_dropout, + activation_dropout=args.activation_dropout, + activation_fn=args.activation_fn, + 
layer_norm_first=args.layer_norm_first, + has_relative_attention_bias=self.use_rel_pos_enc, + ) + if args.checkpoint_activations: + layer = fsdp_wrap(layer) + layer = checkpoint_wrapper(layer) + layers.append(layer) + self.layers = nn.ModuleList(layers) + + self.layer_norm_first = args.layer_norm_first + self.layer_norm = LayerNorm(self.embedding_dim) + self.layerdrop = args.encoder_layerdrop + if self.use_rel_pos_enc: + self.pos_emb = RelativePositionalEncoding(args.encoder_embed_dim // args.encoder_attention_heads, 160) + + + self.apply(init_bert_params) + + def forward(self, x, padding_mask=None, layer=None): + x, layer_results = self.extract_features(x, padding_mask, layer) + + if self.layer_norm_first and layer is None: + x = self.layer_norm(x) + + return x, layer_results + + def extract_features(self, x, padding_mask=None, tgt_layer=None): + + if padding_mask is not None: + x = index_put(x, padding_mask, 0) + + x_conv = self.pos_conv(x.transpose(1, 2)) + x_conv = x_conv.transpose(1, 2) + x = x + x_conv + + if not self.layer_norm_first: + x = self.layer_norm(x) + + # pad to the sequence length dimension + x, pad_length = pad_to_multiple( + x, self.required_seq_len_multiple, dim=-2, value=0 + ) + if pad_length > 0 and padding_mask is None: + padding_mask = x.new_zeros((x.size(0), x.size(1)), dtype=torch.bool) + padding_mask[:, -pad_length:] = True + else: + padding_mask, _ = pad_to_multiple( + padding_mask, self.required_seq_len_multiple, dim=-1, value=True + ) + x = F.dropout(x, p=self.dropout, training=self.training) + + # B x T x C -> T x B x C + x = x.transpose(0, 1) + + if self.use_rel_pos_enc: + x_len = x.shape[0] + pos_seq = torch.arange(0, x_len).long().to(x.device) + pos_seq = pos_seq[:, None] - pos_seq[None, :] + pos_k, pos_v = self.pos_emb(pos_seq) + else: + pos_k = None + + layer_results = [] + r = None + for i, layer in enumerate(self.layers): + dropout_probability = np.random.random() + if not self.training or (dropout_probability > self.layerdrop): + x, z = layer(x, self_attn_padding_mask=padding_mask, need_weights=False, pos_bias=pos_k) + if tgt_layer is not None: + # unpad if needed + if pad_length > 0: + layer_results.append( + ( + x[:-pad_length], + z[:, :-pad_length, :-pad_length] + if z is not None + else z, + ) + ) + else: + layer_results.append((x, z)) + if i == tgt_layer: + r = x + break + + if r is not None: + x = r + + # T x B x C -> B x T x C + x = x.transpose(0, 1) + # undo paddding + if pad_length > 0: + x = x[:, :-pad_length] + + return x, layer_results + + +class TransformerSentenceEncoderLayer(nn.Module): + """ + Implements a Transformer Encoder Layer used in BERT/XLM style pre-trained + models. 
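The relative positional encoding in `extract_features` above indexes the bias table with pairwise offsets computed as `pos_seq[:, None] - pos_seq[None, :]`. A tiny sketch of what that offset matrix looks like:

```python
import torch

x_len = 5
pos_seq = torch.arange(x_len)
rel_pos = pos_seq[:, None] - pos_seq[None, :]   # entry (i, j) = i - j
print(rel_pos)
# tensor([[ 0, -1, -2, -3, -4],
#         [ 1,  0, -1, -2, -3],
#         [ 2,  1,  0, -1, -2],
#         [ 3,  2,  1,  0, -1],
#         [ 4,  3,  2,  1,  0]])
```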
+ """ + + def __init__( + self, + embedding_dim: float = 768, + ffn_embedding_dim: float = 3072, + num_attention_heads: float = 8, + dropout: float = 0.1, + attention_dropout: float = 0.1, + activation_dropout: float = 0.1, + activation_fn: str = "relu", + layer_norm_first: bool = False, + has_relative_attention_bias: bool = False, + ) -> None: + + super().__init__() + # Initialize parameters + self.embedding_dim = embedding_dim + self.dropout = dropout + self.activation_dropout = activation_dropout + + # Initialize blocks + self.activation_fn = utils.get_activation_fn(activation_fn) + self.self_attn = MultiheadAttention( + self.embedding_dim, + num_attention_heads, + dropout=attention_dropout, + self_attention=True, + ) + + self.dropout1 = nn.Dropout(dropout) + self.dropout2 = nn.Dropout(self.activation_dropout) + self.dropout3 = nn.Dropout(dropout) + + self.layer_norm_first = layer_norm_first + + # layer norm associated with the self attention layer + self.self_attn_layer_norm = LayerNorm(self.embedding_dim) + self.fc1 = nn.Linear(self.embedding_dim, ffn_embedding_dim) + self.fc2 = nn.Linear(ffn_embedding_dim, self.embedding_dim) + + # layer norm associated with the position wise feed-forward NN + self.final_layer_norm = LayerNorm(self.embedding_dim) + + if has_relative_attention_bias: + self.norm_k = LayerNorm(self.embedding_dim//num_attention_heads) + + def forward( + self, + x: torch.Tensor, + self_attn_mask: torch.Tensor = None, + self_attn_padding_mask: torch.Tensor = None, + need_weights: bool = False, + att_args=None, + pos_bias=None, + ): + """ + LayerNorm is applied either before or after the self-attention/ffn + modules similar to the original Transformer imlementation. + """ + residual = x + + if self.layer_norm_first: + x = self.self_attn_layer_norm(x) + if pos_bias is not None: + pos_bias = self.norm_k(pos_bias) + x, attn = self.self_attn( + query=x, + key=x, + value=x, + key_padding_mask=self_attn_padding_mask, + attn_mask=self_attn_mask, + position_bias=pos_bias, + ) + x = self.dropout1(x) + x = residual + x + + residual = x + x = self.final_layer_norm(x) + x = self.activation_fn(self.fc1(x)) + x = self.dropout2(x) + x = self.fc2(x) + x = self.dropout3(x) + x = residual + x + else: + x, attn = self.self_attn( + query=x, + key=x, + value=x, + key_padding_mask=self_attn_padding_mask, + position_bias=pos_bias, + ) + + x = self.dropout1(x) + x = residual + x + + x = self.self_attn_layer_norm(x) + + residual = x + x = self.activation_fn(self.fc1(x)) + x = self.dropout2(x) + x = self.fc2(x) + x = self.dropout3(x) + x = residual + x + x = self.final_layer_norm(x) + + return x, attn diff --git a/SpeechUT/speechut/scripts/pretrain_speechut/base_speechut_for_asr.sh b/SpeechUT/speechut/scripts/pretrain_speechut/base_speechut_for_asr.sh new file mode 100644 index 0000000000000000000000000000000000000000..d5bc7311331208c3f2f65c17586c73ee63cd98f0 --- /dev/null +++ b/SpeechUT/speechut/scripts/pretrain_speechut/base_speechut_for_asr.sh @@ -0,0 +1,40 @@ +# #################################### +# SpeechUT Base model # +# #################################### +[ $# -lt 2 ] && echo "Usage: $0 <data_dir> <text_data_dir> [mount=${PWD}] [world_size=32] [update_freq=1]" && exit 1 +[ ${PWD##*/} != SpeechUT ] && echo "Error: dir not match! Switch to SpeechUT/ and run it again!" 
&& exit 1 +DATA_DIR=$1 +TEXT_DATA_DIR=$2 +mount=$3 +world_size=$4 +update_freq=$5 +[ -z $mount ] && mount=${PWD} +[ -z $world_size ] && world_size=32 +[ -z $update_freq ] && update_freq=1 + +CODE_ROOT=${PWD} +MODEL_DIR="${mount}/exp/pretrain/base_speechut4asr_${world_size}gpu_${update_freq}accum" +[ -d $MODEL_DIR ] || mkdir -p $MODEL_DIR + +python $CODE_ROOT/fairseq/fairseq_cli/hydra_train.py \ + --config-dir $CODE_ROOT/speechut/config/pretrain \ + --config-name speechut_base_librispeech \ + common.user_dir=$CODE_ROOT/speechut \ + \ + task.labels='["km"]' \ + model.label_rate=50 \ + task.data=$DATA_DIR \ + task.label_dir=$DATA_DIR \ + task.text_cfg.text_data=$TEXT_DATA_DIR \ + \ + dataset.train_subset=\"train_960+pseudo_libritext.kmu-ltr+merge_960.kmu-none\" \ + dataset.valid_subset=\"dev_clean+dev.kmu-ltr+dev.kmu-none\" \ + dataset.num_workers=0 \ + dataset.max_tokens=1400000 \ + distributed_training.distributed_world_size=${world_size} \ + optimization.update_freq=[${update_freq}] \ + \ + common.tensorboard_logdir=$MODEL_DIR \ + checkpoint.save_dir=$MODEL_DIR \ + hydra.run.dir=$MODEL_DIR \ + hydra.job.name=base_speechut4asr_${world_size}gpu_${update_freq}accum diff --git a/SpeechUT/speechut/scripts/pretrain_speechut/base_speechut_for_st.sh b/SpeechUT/speechut/scripts/pretrain_speechut/base_speechut_for_st.sh new file mode 100644 index 0000000000000000000000000000000000000000..438a43f55275938c51faefab181dacc1af3567d0 --- /dev/null +++ b/SpeechUT/speechut/scripts/pretrain_speechut/base_speechut_for_st.sh @@ -0,0 +1,47 @@ +# #################################### +# SpeechUT Base model # +# #################################### +[ $# -lt 3 ] && echo "Usage: $0 <data_dir> <text_data_dir> <lang=de/es> [mount=${PWD}] [world_size=32] [update_freq=1]" && exit 1 +[ ${PWD##*/} != SpeechUT ] && echo "Error: dir not match! Switch to SpeechUT/ and run it again!" 
&& exit 1 +DATA_DIR=$1 +TEXT_DATA_DIR=$2 +lang=$3 +mount=$4 +world_size=$5 +update_freq=$6 +[ -z $mount ] && mount=${PWD} +[ -z $world_size ] && world_size=32 +[ -z $update_freq ] && update_freq=1 + +CODE_ROOT=${PWD} +MODEL_DIR="${mount}/exp/pretrain/base_speechut4en${lang}_${world_size}gpu_${update_freq}accum" +[ -d $MODEL_DIR ] || mkdir -p $MODEL_DIR + +python $CODE_ROOT/fairseq/fairseq_cli/hydra_train.py \ + --config-dir $CODE_ROOT/speechut/config/pretrain \ + --config-name speechut_base_librispeech \ + common.user_dir=$CODE_ROOT/speechut \ + \ + task.labels='["km"]' \ + model.label_rate=50 \ + task.data=$DATA_DIR \ + task.label_dir=$DATA_DIR \ + task.text_cfg.text_data=$TEXT_DATA_DIR \ + \ + model.add_text_ctc=false \ + model.text_transformer.share_decoder_input_output_embed=true \ + criterion.u2t_ed_weight=1.0 \ + criterion.u2t_ctc_weight=0 \ + \ + dataset.train_subset=\"train_960,mustcuns_${lang}+pseudo_wmt_en${lang}.kmu-spm+train_960.kmu-none,mustcuns_${lang}.kmu-none\" \ + dataset.valid_subset=\"dev_clean+pseudo_valid.kmu-spm+dev.kmu-none\" \ + dataset.num_workers=0 \ + dataset.max_tokens=1400000 \ + distributed_training.distributed_world_size=${world_size} \ + optimization.update_freq=[${update_freq}] \ + \ + common.tensorboard_logdir=$MODEL_DIR \ + checkpoint.save_dir=$MODEL_DIR \ + hydra.run.dir=$MODEL_DIR \ + hydra.job.name=base_speechut4en${lang}_${world_size}gpu_${update_freq}accum + diff --git a/SpeechUT/speechut/scripts/pretrain_speechut/base_speechut_for_st_enfr.sh b/SpeechUT/speechut/scripts/pretrain_speechut/base_speechut_for_st_enfr.sh new file mode 100644 index 0000000000000000000000000000000000000000..c0c7217d0c124e603bb3b95ff11b7e7e462290c0 --- /dev/null +++ b/SpeechUT/speechut/scripts/pretrain_speechut/base_speechut_for_st_enfr.sh @@ -0,0 +1,48 @@ +# #################################### +# SpeechUT Base model # +# #################################### +[ $# -lt 3 ] && echo "Usage: $0 <data_dir> <text_data_dir> [lang=fr] [mount=${PWD}] [world_size=32] [update_freq=1]" && exit 1 +[ ${PWD##*/} != SpeechUT ] && echo "Error: dir not match! Switch to SpeechUT/ and run it again!" 
&& exit 1 +DATA_DIR=$1 +TEXT_DATA_DIR=$2 +lang=$3 +mount=$4 +world_size=$5 +update_freq=$6 +[ -z $lang ] && lang=fr +[ -z $mount ] && mount=${PWD} +[ -z $world_size ] && world_size=32 +[ -z $update_freq ] && update_freq=1 + +CODE_ROOT=${PWD} +MODEL_DIR="${mount}/exp/pretrain/base_speechut4en${lang}_${world_size}gpu_${update_freq}accum" +[ -d $MODEL_DIR ] || mkdir -p $MODEL_DIR + +python $CODE_ROOT/fairseq/fairseq_cli/hydra_train.py \ + --config-dir $CODE_ROOT/speechut/config/pretrain \ + --config-name speechut_base_librispeech \ + common.user_dir=$CODE_ROOT/speechut \ + \ + task.labels='["km"]' \ + model.label_rate=50 \ + task.data=$DATA_DIR \ + task.label_dir=$DATA_DIR \ + task.text_cfg.text_data=$TEXT_DATA_DIR \ + \ + model.add_text_ctc=false \ + criterion.u2t_ed_weight=1.0 \ + criterion.u2t_ctc_weight=0 \ + \ + dataset.train_subset=\"train_960,pretrain_mustc+pseudo_wmt14_enfr.kmu-spm+train_960.kmu-none,pretrain_mustc.kmu-none\" \ + dataset.valid_subset=\"dev_clean+pseudo_valid.kmu-spm+dev.kmu-none\" \ + dataset.num_workers=0 \ + dataset.max_tokens=1400000 \ + optimization.max_update=600000 \ + distributed_training.distributed_world_size=${world_size} \ + optimization.update_freq=[${update_freq}] \ + \ + common.tensorboard_logdir=$MODEL_DIR \ + checkpoint.save_dir=$MODEL_DIR \ + hydra.run.dir=$MODEL_DIR \ + hydra.job.name=base_speechut4en${lang}_${world_size}gpu_${update_freq}accum + diff --git a/SpeechUT/speechut/scripts/pretrain_speechut/large_speechut_for_asr.sh b/SpeechUT/speechut/scripts/pretrain_speechut/large_speechut_for_asr.sh new file mode 100644 index 0000000000000000000000000000000000000000..e9d64d789ed0421252edd71aa9c8268a42dc42f3 --- /dev/null +++ b/SpeechUT/speechut/scripts/pretrain_speechut/large_speechut_for_asr.sh @@ -0,0 +1,41 @@ +# #################################### +# SpeechUT Large model # +# #################################### +[ $# -lt 2 ] && echo "Usage: $0 <data_dir> <text_data_dir> [mount=${PWD}] [world_size=32] [update_freq=4]" && exit 1 +[ ${PWD##*/} != SpeechUT ] && echo "Error: dir not match! Switch to SpeechUT/ and run it again!" 
&& exit 1 +DATA_DIR=$1 +TEXT_DATA_DIR=$2 +mount=$3 +world_size=$4 +update_freq=$5 +[ -z $mount ] && mount=${PWD} +[ -z $world_size ] && world_size=32 +[ -z $update_freq ] && update_freq=4 + +CODE_ROOT=${PWD} +MODEL_DIR="${mount}/exp/pretrain/large_speechut4asr_${world_size}gpu_${update_freq}accum" +[ -d $MODEL_DIR ] || mkdir -p $MODEL_DIR + +python $CODE_ROOT/fairseq/fairseq_cli/hydra_train.py \ + --config-dir $CODE_ROOT/speechut/config/pretrain \ + --config-name speechut_large_librilight \ + common.user_dir=$CODE_ROOT/speechut \ + \ + task.labels='["km"]' \ + model.label_rate=50 \ + task.data=$DATA_DIR \ + task.label_dir=$DATA_DIR \ + task.text_cfg.text_data=$TEXT_DATA_DIR \ + \ + dataset.train_subset=\"train_small+pseudo_libritext.kmu-ltr\" \ + dataset.valid_subset=\"dev_clean+dev.kmu-ltr\" \ + dataset.num_workers=0 \ + dataset.max_tokens=900000 \ + distributed_training.distributed_world_size=${world_size} \ + optimization.update_freq=[${update_freq}] \ + \ + common.tensorboard_logdir=$MODEL_DIR \ + checkpoint.save_dir=$MODEL_DIR \ + hydra.run.dir=$MODEL_DIR \ + hydra.job.name=large_speechut4asr_${world_size}gpu_${update_freq}accum + \ No newline at end of file diff --git a/SpeechUT/speechut/scripts/tune_speechut_asr/finetune960h_large_edctc.sh b/SpeechUT/speechut/scripts/tune_speechut_asr/finetune960h_large_edctc.sh new file mode 100644 index 0000000000000000000000000000000000000000..08a25818bc9fc519e65fa175886545a8650c0906 --- /dev/null +++ b/SpeechUT/speechut/scripts/tune_speechut_asr/finetune960h_large_edctc.sh @@ -0,0 +1,45 @@ +# #################################### +# SpeechUT Large model # +# #################################### +[ $# -lt 3 ] && echo "Usage: $0 <model_path> <data_dir> <cpt_tag> [mount=${PWD}] [world_size=8] [update_freq=3]" && exit 1 +[ ${PWD##*/} != SpeechUT ] && echo "Error: dir not match! Switch to SpeechUT/ and run it again!" 
&& exit 1 + +w2v_path=$1 +DATA_DIR=$2 +cpt=$3 +mount=$4 +world_size=$5 +update_freq=$6 +[ -z $mount ] && mount=${PWD} +[ -z $world_size ] && world_size=8 +[ -z $update_freq ] && update_freq=3 + +CODE_ROOT=${PWD} + +exp_name=${w2v_path%/*} +exp_name=${exp_name##*/} +MODEL_DIR="${mount}/exp/finetune_asr/$exp_name/960h_edctc80k_from_${cpt}_bz3.3m_lr1e-5" +[ -d $MODEL_DIR ] || mkdir -p $MODEL_DIR + +python $CODE_ROOT/fairseq/fairseq_cli/hydra_train.py \ + --config-dir $CODE_ROOT/speechut/config/finetune_asr \ + --config-name speechut_large_960h \ + common.user_dir=$CODE_ROOT/speechut \ + \ + task.data=$DATA_DIR \ + task.label_dir=$DATA_DIR \ + model.w2v_path=${w2v_path} \ + \ + optimization.lr=[0.00001] \ + optimization.max_update=80000 \ + dataset.max_tokens=1100000 \ + optimization.update_freq=[${update_freq}] \ + distributed_training.distributed_world_size=${world_size} \ + \ + dataset.train_subset="train_960" \ + dataset.valid_subset="dev_other" \ + \ + common.tensorboard_logdir=$MODEL_DIR \ + checkpoint.save_dir=$MODEL_DIR \ + hydra.run.dir=$MODEL_DIR \ + hydra.job.name=960h_edctc80k_from_${cpt}_bz3.3m_lr1e-5 diff --git a/SpeechUT/speechut/scripts/tune_speechut_asr/finetune_base_edctc.sh b/SpeechUT/speechut/scripts/tune_speechut_asr/finetune_base_edctc.sh new file mode 100644 index 0000000000000000000000000000000000000000..cad7bd0a11336a2b5e0c34372d57b7b4b953a414 --- /dev/null +++ b/SpeechUT/speechut/scripts/tune_speechut_asr/finetune_base_edctc.sh @@ -0,0 +1,45 @@ +# #################################### +# SpeechUT Base model # +# #################################### +[ $# -lt 3 ] && echo "Usage: $0 <model_path> <data_dir> <cpt_tag> [mount=${PWD}] [world_size=8] [update_freq=2]" && exit 1 +[ ${PWD##*/} != SpeechUT ] && echo "Error: dir not match! Switch to SpeechUT/ and run it again!" 
&& exit 1 + +w2v_path=$1 +DATA_DIR=$2 +cpt=$3 +mount=$4 +world_size=$5 +update_freq=$6 +[ -z $mount ] && mount=${PWD} +[ -z $world_size ] && world_size=8 +[ -z $update_freq ] && update_freq=2 + +CODE_ROOT=${PWD} + +exp_name=${w2v_path%/*} +exp_name=${exp_name##*/} +MODEL_DIR="${mount}/exp/finetune_asr/$exp_name/edctc40k_from_${cpt}_bz2.6m_lr1e-5" +[ -d $MODEL_DIR ] || mkdir -p $MODEL_DIR + +python $CODE_ROOT/fairseq/fairseq_cli/hydra_train.py \ + --config-dir $CODE_ROOT/speechut/config/finetune_asr \ + --config-name speechut_base_100h \ + common.user_dir=$CODE_ROOT/speechut \ + \ + task.data=$DATA_DIR \ + task.label_dir=$DATA_DIR \ + model.w2v_path=${w2v_path} \ + \ + optimization.lr=[0.00001] \ + optimization.max_update=40000 \ + dataset.max_tokens=1300000 \ + optimization.update_freq=[${update_freq}] \ + distributed_training.distributed_world_size=${world_size} \ + \ + dataset.train_subset="train_clean_100" \ + dataset.valid_subset="dev_other" \ + \ + common.tensorboard_logdir=$MODEL_DIR \ + checkpoint.save_dir=$MODEL_DIR \ + hydra.run.dir=$MODEL_DIR \ + hydra.job.name=edctc40k_from_${cpt}_bz2.6m_lr1e-5 diff --git a/SpeechUT/speechut/scripts/tune_speechut_asr/inference_edctc.sh b/SpeechUT/speechut/scripts/tune_speechut_asr/inference_edctc.sh new file mode 100644 index 0000000000000000000000000000000000000000..9dce06398c476a26290839b7f3a8f8632a5060e0 --- /dev/null +++ b/SpeechUT/speechut/scripts/tune_speechut_asr/inference_edctc.sh @@ -0,0 +1,61 @@ +##################################### +# SpeechUT ASR model # +##################################### +[ $# -lt 2 ] && echo "Usage: $0 <model_path> <data_dir> [gen-set=dev_other] [beam_size=10] [ctc_weight=0.2] [--normalize]" && exit 1 +[ ${PWD##*/} != SpeechUT ] && echo "Error: dir not match! Switch to SpeechUT/ and run it again!" && exit 1 + +model_path=$1 +DATA_DIR=$2 +gen_set=$3 +beam_size=$4 +ctc_weight=$5 +extra=$6 +[ -z $extra ] && echo "Assert decoding base model! If you are decoding large model, please add '--normalize' at the end..." +[ -z $gen_set ] && gen_set="dev_other" +[ -z $beam_size ] && beam_size=10 +[ -z $ctc_weight ] && ctc_weight=0.2 +[ $ctc_weight == 0 ] && [ $beam_size != 1 ] && echo "Change beam size to 1 as no ctc-decoding used..." && beam_size=1 +[ $ctc_weight != 0 ] && extra="$extra --batch-size 1" + +src_dir=${model_path%/*} +cpt=${model_path##*/} +cpt=${cpt%.*} + +CODE_ROOT=${PWD} + +for subset in ${gen_set//,/ }; do + results_path=$src_dir/decode_${cpt}/beam${beam_size}_ctc${ctc_weight}/${subset}_${world_size}_${rank} + [ ! 
-d $results_path ] && mkdir -p $results_path + + python $CODE_ROOT/fairseq/fairseq_cli/generate.py $DATA_DIR \ + --user-dir $CODE_ROOT/speechut \ + --label-dir ${DATA_DIR} \ + --labels '["ltr"]' \ + --single-target \ + --post-process letter \ + --gen-subset ${subset} \ + --max-tokens 2000000 \ + \ + --task joint_sc2t_pretraining \ + --add-decoder-target \ + --fine-tuning \ + --pad-audio \ + --random-crop \ + \ + --ctc-weight ${ctc_weight} $extra \ + --beam ${beam_size} \ + \ + --path ${model_path} \ + --results-path $results_path \ + \ + --scoring wer --max-len-a 0.00078125 --max-len-b 200 \ + & +done +wait + + +for subset in ${gen_set//,/ }; do + results_path=$src_dir/decode_${cpt}/beam${beam_size}_ctc${ctc_weight}/${subset}_${world_size}_${rank} + echo $results_path + tail -n 1 $results_path/generate-*.txt +done diff --git a/SpeechUT/speechut/scripts/tune_speechut_asr/inference_edctclm.sh b/SpeechUT/speechut/scripts/tune_speechut_asr/inference_edctclm.sh new file mode 100644 index 0000000000000000000000000000000000000000..dadd1a4286de52cef0250640ef64fd4117e11ecb --- /dev/null +++ b/SpeechUT/speechut/scripts/tune_speechut_asr/inference_edctclm.sh @@ -0,0 +1,66 @@ +##################################### +# SpeechUT ASR model # +##################################### +[ $# -lt 2 ] && echo "Usage: $0 <model_path> <data_dir> [gen-set=dev_other] [beam_size=30] [ctc_weight=0.3] [lm_weight=0.7] [lm_path] [--normalize]" && exit 1 +[ ${PWD##*/} != SpeechUT ] && echo "Error: dir not match! Switch to SpeechUT/ and run it again!" && exit 1 + +model_path=$1 +DATA_DIR=$2 +gen_set=$3 +beam_size=$4 +ctc_weight=$5 +lm_weight=$6 +lm_path=$7 +extra=$8 +[ -z $extra ] && echo "Assert decoding base model! If you are decoding large model, please add '--normalize' at the end..." +[ -z $gen_set ] && gen_set="dev_other" +[ -z $beam_size ] && beam_size=30 +[ -z $ctc_weight ] && ctc_weight=0.3 +[ -z $lm_weight ] && lm_weight=0.7 +[ -z $lm_path ] && lm_path="/mnt/default/v-junyiao/librispeech/lm/lm_ctc_form/checkpoint_best.pt" +[ $ctc_weight == 0 ] && [ $beam_size != 1 ] && echo "Change beam size to 1 and lm_weight to 0 as no ctc-decoding used..." && beam_size=1 && lm_weight=0 +[ $ctc_weight != 0 ] && extra="$extra --batch-size 1" + +src_dir=${model_path%/*} +cpt=${model_path##*/} +cpt=${cpt%.*} + +CODE_ROOT=${PWD} + +for subset in ${gen_set//,/ }; do + results_path=$src_dir/decode_${cpt}/beam${beam_size}_ctc${ctc_weight}_lm${lm_weight}/${subset}_${world_size}_${rank} + [ ! 
-d $results_path ] && mkdir -p $results_path + + python $CODE_ROOT/fairseq/fairseq_cli/generate.py $DATA_DIR \ + --user-dir $CODE_ROOT/speechut \ + --label-dir ${DATA_DIR} \ + --labels '["ltr"]' \ + --single-target \ + --post-process letter \ + --gen-subset ${subset} \ + --max-tokens 800000 \ + \ + --task joint_sc2t_pretraining \ + --add-decoder-target \ + --fine-tuning \ + --pad-audio \ + --random-crop \ + \ + --ctc-weight ${ctc_weight} $extra \ + --lm-weight ${lm_weight} --lm-path ${lm_path} \ + --beam ${beam_size} \ + \ + --path ${model_path} \ + --results-path ${results_path} \ + \ + --scoring wer --max-len-a 0.00078125 --max-len-b 200 \ + & +done +wait + + +for subset in ${gen_set//,/ }; do + results_path=$src_dir/decode_${cpt}/beam${beam_size}_ctc${ctc_weight}_lm${lm_weight}/${subset}_${world_size}_${rank} + echo $results_path + tail -n 1 $results_path/generate-*.txt +done diff --git a/SpeechUT/speechut/scripts/tune_speechut_asr/inference_lm_nj.sh b/SpeechUT/speechut/scripts/tune_speechut_asr/inference_lm_nj.sh new file mode 100644 index 0000000000000000000000000000000000000000..a5627a59975a01736907a5cc3fb76df335709b43 --- /dev/null +++ b/SpeechUT/speechut/scripts/tune_speechut_asr/inference_lm_nj.sh @@ -0,0 +1,74 @@ +##################################### +# SpeechUT ASR model # +##################################### +[ $# -lt 2 ] && echo "Usage: $0 <model_path> <data_dir> [gen-set=dev_other] [beam_size=30] [ctc_weight=0.3] [lm_weight=0.7] [lm_path] [nj=8] [ngpu=8] [--normalize]" && exit 1 +[ ${PWD##*/} != SpeechUT ] && echo "Error: dir not match! Switch to SpeechUT/ and run it again!" && exit 1 + +model_path=$1 +DATA_DIR=$2 +gen_set=$3 +beam_size=$4 +ctc_weight=$5 +lm_weight=$6 +lm_path=$7 +nj=$8 +ngpu=$9 +extra=${10} +[ -z $extra ] && echo "Assert decoding base model! If you are decoding large model, please add '--normalize' at the end..." +[ -z $gen_set ] && gen_set="dev_other" +[ -z $beam_size ] && beam_size=30 +[ -z $ctc_weight ] && ctc_weight=0.3 +[ -z $lm_weight ] && lm_weight=0.7 +[ -z $lm_path ] && lm_path="/mnt/default/v-junyiao/librispeech/lm/lm_ctc_form/checkpoint_best.pt" +[ $ctc_weight == 0 ] && [ $beam_size != 1 ] && echo "Change beam size to 1 and lm_weight to 0 as no ctc-decoding used..." && beam_size=1 && lm_weight=0 +[ $ctc_weight != 0 ] && extra="$extra --batch-size 1" +[ -z $nj ] && nj=8 +[ -z $ngpu ] && ngpu=8 + +src_dir=${model_path%/*} +cpt=${model_path##*/} +cpt=${cpt%.*} + +CODE_ROOT=${PWD} + +world_size=$nj +for rank in $(seq 0 $((nj - 1))); do + export CUDA_VISIBLE_DEVICES=$((rank % $ngpu)) + for subset in ${gen_set//,/ }; do + results_path=$src_dir/decode_${cpt}/beam${beam_size}_ctc${ctc_weight}_lm${lm_weight}/${subset}_${world_size}_${rank} + [ ! 
-d $results_path ] && mkdir -p $results_path + + python $CODE_ROOT/fairseq/fairseq_cli/generate.py $DATA_DIR \ + --user-dir $CODE_ROOT/speechut \ + --label-dir ${DATA_DIR} \ + --labels '["ltr"]' \ + --single-target \ + --post-process letter \ + --gen-subset ${subset} \ + --max-tokens 800000 \ + \ + --task joint_sc2t_pretraining \ + --add-decoder-target \ + --fine-tuning \ + --pad-audio \ + --random-crop \ + \ + --ctc-weight ${ctc_weight} $extra \ + --lm-weight ${lm_weight} --lm-path ${lm_path} \ + --beam ${beam_size} \ + \ + --path ${model_path} \ + --results-path $results_path \ + \ + --scoring wer --max-len-a 0.00078125 --max-len-b 200 \ + --distributed-world-size ${world_size} --distributed-rank ${rank} \ + & + done +done +wait + + +for subset in ${gen_set//,/ }; do + results_dir=$src_dir/decode_${cpt}/beam${beam_size}_ctc${ctc_weight}_lm${lm_weight} + cat $results_dir/${subset}_${world_size}_*/generate-${subset}.txt | grep -v "^Generate" > $results_dir/generate-${subset}.all.txt +done diff --git a/SpeechUT/speechut/scripts/tune_speechut_asr/inference_nj.sh b/SpeechUT/speechut/scripts/tune_speechut_asr/inference_nj.sh new file mode 100644 index 0000000000000000000000000000000000000000..08e6df431c9856f24122118017b8ae85bacc5444 --- /dev/null +++ b/SpeechUT/speechut/scripts/tune_speechut_asr/inference_nj.sh @@ -0,0 +1,69 @@ +##################################### +# SpeechUT ASR model # +##################################### +[ $# -lt 2 ] && echo "Usage: $0 <model_path> <data_dir> [gen-set=dev_other] [beam_size=10] [ctc_weight=0.2] [nj=32] [ngpu=8] [--normalize]" && exit 1 +[ ${PWD##*/} != SpeechUT ] && echo "Error: dir not match! Switch to SpeechUT/ and run it again!" && exit 1 + +model_path=$1 +DATA_DIR=$2 +gen_set=$3 +beam_size=$4 +ctc_weight=$5 +nj=$6 +ngpu=$7 +extra=$8 +[ -z $extra ] && echo "Assert decoding base model! If you are decoding large model, please add '--normalize' at the end..." +[ -z $gen_set ] && gen_set="dev_other" +[ -z $beam_size ] && beam_size=10 +[ -z $ctc_weight ] && ctc_weight=0.2 +[ $ctc_weight == 0 ] && [ $beam_size != 1 ] && echo "Change beam size to 1 as no ctc-decoding used..." && beam_size=1 +[ $ctc_weight != 0 ] && extra="$extra --batch-size 1" +[ -z $nj ] && nj=32 +[ -z $ngpu ] && ngpu=8 + +src_dir=${model_path%/*} +cpt=${model_path##*/} +cpt=${cpt%.*} + +CODE_ROOT=${PWD} + +world_size=$nj +for rank in $(seq 0 $((nj - 1))); do + export CUDA_VISIBLE_DEVICES=$((rank % $ngpu)) + for subset in ${gen_set//,/ }; do + results_path=$src_dir/decode_${cpt}/beam${beam_size}_ctc${ctc_weight}/${subset}_${world_size}_${rank} + [ ! 
-d $results_path ] && mkdir -p $results_path + + python $CODE_ROOT/fairseq/fairseq_cli/generate.py $DATA_DIR \ + --user-dir $CODE_ROOT/speechut \ + --label-dir ${DATA_DIR} \ + --labels '["ltr"]' \ + --single-target \ + --post-process letter \ + --gen-subset ${subset} \ + --max-tokens 2000000 \ + \ + --task joint_sc2t_pretraining \ + --add-decoder-target \ + --fine-tuning \ + --pad-audio \ + --random-crop \ + \ + --ctc-weight ${ctc_weight} $extra \ + --beam ${beam_size} \ + \ + --path ${model_path} \ + --results-path $results_path \ + \ + --scoring wer --max-len-a 0.00078125 --max-len-b 200 \ + --distributed-world-size ${world_size} --distributed-rank ${rank} \ + & + done +done +wait + + +for subset in ${gen_set//,/ }; do + results_dir=$src_dir/decode_${cpt}/beam${beam_size}_ctc${ctc_weight} + cat $results_dir/${subset}_${world_size}_*/generate-${subset}.txt | grep -v "^Generate" > $results_dir/generate-${subset}.all.txt +done diff --git a/SpeechUT/speechut/scripts/tune_speechut_st/finetune_base_mustc_enxx.sh b/SpeechUT/speechut/scripts/tune_speechut_st/finetune_base_mustc_enxx.sh new file mode 100644 index 0000000000000000000000000000000000000000..59c8a2a0346b708894b1568fa691c062537aa559 --- /dev/null +++ b/SpeechUT/speechut/scripts/tune_speechut_st/finetune_base_mustc_enxx.sh @@ -0,0 +1,77 @@ +# #################################### +# SpeechUT Base model # +# #################################### +[ $# -lt 4 ] && echo "Usage: $0 <model_path> <data_dir> <lang> <cpt-tag> [mount=${PWD}] [world_size=8] [update_freq=4/6]" && exit 0 +[ ${PWD##*/} != SpeechUT ] && echo "Error: dir not match! Switch to SpeechUT/ and run it again!" && exit 1 + +w2v_path=$1 +DATA_DIR=$2 +lang=$3 +cpt=$4 +mount=$5 +world_size=$6 +update_freq=$7 +[ -z $mount ] && mount=${PWD} +[ -z $world_size ] && world_size=8 +[ -z $update_freq ] && update_freq=4 + +CODE_ROOT=${PWD} + +exp_name=${w2v_path%/*} +exp_name=${exp_name##*/} +MODEL_DIR="$mount/exp/finetune_mustc/$exp_name/legacy_en${lang}_from_${cpt}_bz3.2m_lr3e-5" +[ -d $MODEL_DIR ] || mkdir -p $MODEL_DIR + +max_tokens=800000 +python $CODE_ROOT/fairseq/fairseq_cli/train.py ${DATA_DIR} \ + --save-dir ${MODEL_DIR} \ + --user-dir $CODE_ROOT/speechut \ + --task speech_to_text \ + --config-yaml config_en${lang}.yaml \ + --train-subset "train_st" \ + --valid-subset "dev_st" \ + --fp16 \ + --seed 1 \ + \ + --ddp-backend no_c10d \ + --distributed-world-size ${world_size} \ + --tensorboard-logdir ${MODEL_DIR} \ + \ + --criterion label_smoothed_cross_entropy --report-accuracy \ + --label-smoothing 0.3 \ + \ + --optimizer adam \ + --clip-norm 1.0 \ + --lr 3e-05 \ + --lr-scheduler polynomial_decay --warmup-updates 5000 \ + --max-update 50000 \ + --total-num-update 50000 \ + --update-freq ${update_freq} \ + \ + --max-tokens ${max_tokens} \ + --max-sentences 16 \ + --max-tokens-valid ${max_tokens} \ + --grouped-shuffling \ + --max-source-positions ${max_tokens} \ + --skip-invalid-size-inputs-valid-test \ + --num-workers 0 \ + --best-checkpoint-metric "accuracy" \ + --maximize-best-checkpoint-metric \ + \ + --arch "speechut_st_legacy" \ + --w2v-path ${w2v_path} \ + --layerdrop 0.1 \ + --activation-dropout 0.1 \ + --attention-dropout 0.1 \ + --feature-grad-mult 1.0 \ + \ + --apply-mask --mask-prob 0.5 \ + \ + --log-format json \ + --log-interval 100 \ + --save-interval 1 \ + --keep-last-epochs 5 \ + --keep-best-checkpoints 5 \ + \ + 2>&1 | tee ${MODEL_DIR}/train_en${lang}.log + diff --git a/SpeechUT/speechut/scripts/tune_speechut_st/inference_st.sh 
b/SpeechUT/speechut/scripts/tune_speechut_st/inference_st.sh new file mode 100644 index 0000000000000000000000000000000000000000..3aefa10e360f57dbf66cff9d84c800b4da89619f --- /dev/null +++ b/SpeechUT/speechut/scripts/tune_speechut_st/inference_st.sh @@ -0,0 +1,44 @@ +# #################################### +# SpeechUT Base model # +# #################################### +[ $# -lt 3 ] && echo "Usage: $0 <model_path> <data_dir> <lang> [gen-set=dev] [beam_size=10] [lenpen=1.0]" && exit 0 +[ ${PWD##*/} != SpeechUT ] && echo "Error: dir not match! Switch to SpeechUT/ and run it again!" && exit 1 + +model_path=$1 +DATA_DIR=$2 +lang=$3 +gen_set=$4 +beam_size=$5 +lenpen=$6 +[ -z $gen_set ] && gen_set="dev" +[ -z $beam_size ] && beam_size=10 +[ -z $lenpen ] && lenpen=1 +src_dir=${model_path%/*} +cpt=${model_path##*/} +cpt=${cpt%.*} + +CODE_ROOT=${PWD} +results_path=$src_dir/decode_${cpt}_beam${beam_size}/${gen_set} +[ ! -d $results_path ] && mkdir -p $results_path + +python $CODE_ROOT/fairseq/fairseq_cli/generate.py $DATA_DIR \ + --gen-subset ${gen_set}_st \ + --max-tokens 2000000 \ + --max-source-positions 2000000 \ + --num-workers 0 \ + \ + --user-dir $CODE_ROOT/speechut \ + --task speech_to_text \ + --config-yaml config_en${lang}.yaml \ + \ + --path ${model_path} \ + --results-path $results_path \ + \ + --scoring sacrebleu --max-len-a 0 --max-len-b 512 \ + --beam ${beam_size} \ + --lenpen $lenpen \ + # --model-overrides "{'model':{'w2v_path':'/path/to/your/pretrained/model.pt'}}" \ + + echo $results_path + tail -n 1 $results_path/generate-*.txt + sleep 1s diff --git a/SpeechUT/speechut/squence_generator.py b/SpeechUT/speechut/squence_generator.py new file mode 100644 index 0000000000000000000000000000000000000000..730e92768322473bb247471e657ec2cd02a48b0f --- /dev/null +++ b/SpeechUT/speechut/squence_generator.py @@ -0,0 +1,1118 @@ +# ---------------------------------------------------------------------------- +# SpeechUT: Bridging Speech and Text with Hidden-Unit for Encoder-Decoder Based Speech-Text Pre-training (https://arxiv.org/abs/2210.03730) +# Github source: https://github.com/microsoft/SpeechT5/tree/main/SpeechUT +# Code based on fairseq: https://github.com/facebookresearch/fairseq/tree/272c4c5197250997148fb12c0db6306035f166a4 +# +# Copyright (c) 2022 Microsoft +# Licensed under The MIT License [see LICENSE for details] +# ---------------------------------------------------------------------------- +""" + Modified from fairseq/fairseq/sequence_generator.py + 1. add joint ctc decoding (merged from espnet) + 2. add lm dict conversion +""" +import math +from typing import Dict, List, Optional +import sys +import inspect + +import torch +import torch.nn as nn +from fairseq import search, utils +from fairseq.data import data_utils +from fairseq.models import FairseqIncrementalDecoder +from torch import Tensor +from fairseq.ngram_repeat_block import NGramRepeatBlock +import numpy + +from speechut.modules.ctc_prefix_score import CTCPrefixScore + +MAX_CTC_BEAM = 33 +CTC_SCORING_RATIO = 4 + +class SequenceGenerator(nn.Module): + def __init__( + self, + models, + tgt_dict, + beam_size=1, + max_len_a=0, + max_len_b=200, + max_len=0, + min_len=1, + normalize_scores=True, + len_penalty=1.0, + unk_penalty=0.0, + temperature=1.0, + match_source_len=False, + no_repeat_ngram_size=0, + search_strategy=None, + eos=None, + bos=None, + symbols_to_strip_from_output=None, + lm_model=None, + lm_weight=1.0, + ctc_weight=0.0, + lm_dict=None, + ): + """Generates translations of a given source sentence. 
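The `--max-len-a 0.00078125 --max-len-b 200` values passed by the decoding scripts feed the `max_len_a * src_len + max_len_b` bound applied by this generator. A small worked example, assuming the source length is counted in raw 16 kHz waveform samples (an assumption based on how these audio inputs are typically sized):

```python
# maximum generated length: max_len_a * src_len + max_len_b
max_len_a, max_len_b = 0.00078125, 200
src_len = 16000 * 10                        # 10 s of 16 kHz audio, in samples
max_len = int(max_len_a * src_len + max_len_b)
print(max_len)                              # 325, i.e. 12.5 tokens per second of audio plus 200
```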
+ + Args: + models (List[~fairseq.models.FairseqModel]): ensemble of models, + currently support fairseq.models.TransformerModel for scripting + beam_size (int, optional): beam width (default: 1) + max_len_a/b (int, optional): generate sequences of maximum length + ax + b, where x is the source length + max_len (int, optional): the maximum length of the generated output + (not including end-of-sentence) + min_len (int, optional): the minimum length of the generated output + (not including end-of-sentence) + normalize_scores (bool, optional): normalize scores by the length + of the output (default: True) + len_penalty (float, optional): length penalty, where <1.0 favors + shorter, >1.0 favors longer sentences (default: 1.0) + unk_penalty (float, optional): unknown word penalty, where <0 + produces more unks, >0 produces fewer (default: 0.0) + temperature (float, optional): temperature, where values + >1.0 produce more uniform samples and values <1.0 produce + sharper samples (default: 1.0) + match_source_len (bool, optional): outputs should match the source + length (default: False) + """ + super().__init__() + if isinstance(models, EnsembleModel): + self.model = models + else: + self.model = EnsembleModel(models) + self.tgt_dict = tgt_dict + self.pad = tgt_dict.pad() + self.unk = tgt_dict.unk() + self.eos = tgt_dict.eos() if eos is None else eos + self.bos = self.eos if bos is None else bos + self.blank = self.tgt_dict.index("<s>") + self.symbols_to_strip_from_output = ( + symbols_to_strip_from_output.union({self.eos, self.bos}) + if symbols_to_strip_from_output is not None + else {self.eos, self.bos} + ) + self.vocab_size = len(tgt_dict) + self.beam_size = beam_size + # the max beam size is the dictionary size - 1, since we never select pad + self.beam_size = min(beam_size, self.vocab_size - 1) + self.max_len_a = max_len_a + self.max_len_b = max_len_b + self.min_len = min_len + self.max_len = max_len or self.model.max_decoder_positions() + + self.normalize_scores = normalize_scores + self.len_penalty = len_penalty + self.unk_penalty = unk_penalty + self.temperature = temperature + self.match_source_len = match_source_len + + if no_repeat_ngram_size > 0: + self.repeat_ngram_blocker = NGramRepeatBlock(no_repeat_ngram_size) + else: + self.repeat_ngram_blocker = None + + assert temperature > 0, "--temperature must be greater than 0" + + self.search = ( + search.BeamSearch(tgt_dict) if search_strategy is None else search_strategy + ) + # We only need to set src_lengths in LengthConstrainedBeamSearch. + # As a module attribute, setting it would break in multithread + # settings when the model is shared. 
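`len_penalty` takes effect when hypotheses are finalized, assuming the usual fairseq-style normalization of the summed log-probability by `length ** len_penalty`; that is why values above 1.0 favour longer outputs. A toy comparison with made-up scores:

```python
def normalized(sum_logprob: float, length: int, len_penalty: float) -> float:
    return sum_logprob / (length ** len_penalty)

short, long_ = (-6.0, 3), (-11.0, 6)
print(normalized(*short, 1.0), normalized(*long_, 1.0))   # -2.0 vs about -1.83: longer wins
print(normalized(*short, 0.5), normalized(*long_, 0.5))   # about -3.46 vs -4.49: shorter wins
```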
+ self.should_set_src_lengths = ( + hasattr(self.search, "needs_src_lengths") and self.search.needs_src_lengths + ) + + self.model.eval() + + self.lm_model = lm_model + self.lm_weight = lm_weight + self.ctc_weight = ctc_weight + if self.lm_model is not None: + self.lm_model.eval() + + # assume lm and ed model use different dicts, but the same vovab, + # align the LM dict to the ED dict + self.lm_dict = lm_dict + if lm_dict is not None and not tgt_dict.symbols == lm_dict.symbols: + self.lm_vocab_size = len(lm_dict) + assert self.lm_vocab_size <= self.vocab_size + self.dict_transform_forward = {} + for sym in self.lm_dict.symbols: + if self.tgt_dict.index(sym) != self.lm_dict.index(sym): + self.dict_transform_forward[self.tgt_dict.index(sym)] = self.lm_dict.index(sym) + # [32, 33] + self.dict_transform_back = torch.zeros(self.lm_vocab_size, self.vocab_size).float() + for syb in self.lm_dict.symbols: + self.dict_transform_back[self.lm_dict.index(syb)][self.tgt_dict.index(syb)] = 1.0 + + + def dict_transform(self, tokens, forward=True): + if self.lm_dict is None: + return tokens + if forward: + assert tokens.dim() == 2 + t_tokens = tokens.clone() + offset = self.vocab_size + for idx in self.dict_transform_forward: + t_tokens[t_tokens == idx] = idx + offset + for idx in self.dict_transform_forward: + t_tokens[t_tokens == idx + offset] = self.dict_transform_forward[idx] + for idx in range(self.lm_vocab_size, self.vocab_size): + t_tokens[t_tokens == idx] = self.tgt_dict.pad() + return t_tokens + + dict_transform_back = self.dict_transform_back.type_as(tokens) + tokens = torch.matmul(tokens, dict_transform_back) + for i in range(self.lm_vocab_size, self.vocab_size): + tokens[:, i] = -10000000 + return tokens + + def cuda(self): + self.model.cuda() + return self + + @torch.no_grad() + def forward( + self, + sample: Dict[str, Dict[str, Tensor]], + prefix_tokens: Optional[Tensor] = None, + bos_token: Optional[int] = None, + ): + """Generate a batch of translations. + + Args: + sample (dict): batch + prefix_tokens (torch.LongTensor, optional): force decoder to begin + with these tokens + bos_token (int, optional): beginning of sentence token + (default: self.eos) + """ + return self._generate(sample, prefix_tokens, bos_token=bos_token) + + # TODO(myleott): unused, deprecate after pytorch-translate migration + def generate_batched_itr(self, data_itr, beam_size=None, cuda=False, timer=None): + """Iterate over a batched dataset and yield individual translations. 
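The dictionary alignment above assumes the LM and the encoder-decoder share the same symbols, possibly in a different index order: token histories are remapped ED-to-LM before calling the LM, and the LM's output distribution is mapped back with a 0/1 matrix. A toy illustration; the six-symbol vocabularies are invented for the example:

```python
import torch

ed_symbols = ["<s>", "<pad>", "</s>", "<unk>", "a", "b"]   # encoder-decoder order
lm_symbols = ["<s>", "<pad>", "</s>", "<unk>", "b", "a"]   # LM order differs for a/b

# forward map: ED index -> LM index (applied to the token history fed to the LM)
fwd = {ed_symbols.index(s): lm_symbols.index(s) for s in lm_symbols}
history_ed = torch.tensor([0, 4, 5])                       # <s> a b in ED indices
history_lm = torch.tensor([fwd[int(i)] for i in history_ed])
print(history_lm)                                          # tensor([0, 5, 4])

# backward map as a 0/1 matrix: rows are LM indices, columns are ED indices
back = torch.zeros(len(lm_symbols), len(ed_symbols))
for s in lm_symbols:
    back[lm_symbols.index(s), ed_symbols.index(s)] = 1.0

lm_probs = torch.tensor([[0.0, 0.0, 0.1, 0.1, 0.5, 0.3]])  # in LM order (..., b, a)
print(lm_probs @ back)                                     # in ED order (..., a, b)
```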
+ Args: + cuda (bool, optional): use GPU for generation + timer (StopwatchMeter, optional): time generations + """ + for sample in data_itr: + s = utils.move_to_cuda(sample) if cuda else sample + if "net_input" not in s: + continue + input = s["net_input"] + # model.forward normally channels prev_output_tokens into the decoder + # separately, but SequenceGenerator directly calls model.encoder + encoder_input = { + k: v for k, v in input.items() if k != "prev_output_tokens" + } + if timer is not None: + timer.start() + with torch.no_grad(): + hypos = self.generate(encoder_input) + if timer is not None: + timer.stop(sum(len(h[0]["tokens"]) for h in hypos)) + for i, id in enumerate(s["id"].data): + # remove padding + src = utils.strip_pad(input["src_tokens"].data[i, :], self.pad) + ref = ( + utils.strip_pad(s["target"].data[i, :], self.pad) + if s["target"] is not None + else None + ) + yield id, src, ref, hypos[i] + + @torch.no_grad() + def generate( + self, models, sample: Dict[str, Dict[str, Tensor]], **kwargs + ) -> List[List[Dict[str, Tensor]]]: + """Generate translations. Match the api of other fairseq generators. + + Args: + models (List[~fairseq.models.FairseqModel]): ensemble of models + sample (dict): batch + prefix_tokens (torch.LongTensor, optional): force decoder to begin + with these tokens + constraints (torch.LongTensor, optional): force decoder to include + the list of constraints + bos_token (int, optional): beginning of sentence token + (default: self.eos) + """ + return self._generate(sample, **kwargs) + + def _generate( + self, + sample: Dict[str, Dict[str, Tensor]], + prefix_tokens: Optional[Tensor] = None, + constraints: Optional[Tensor] = None, + bos_token: Optional[int] = None, + min_len_a: Optional[float] = None, + modal_idx: Optional[int] = -1, + ): + incremental_states = torch.jit.annotate( + List[Dict[str, Dict[str, Optional[Tensor]]]], + [ + torch.jit.annotate(Dict[str, Dict[str, Optional[Tensor]]], {}) + for i in range(self.model.models_size) + ], + ) + net_input = sample["net_input"] + + if "src_tokens" in net_input: + src_tokens = net_input["src_tokens"] + # length of the source text being the character length except EndOfSentence and pad + src_lengths = ( + (src_tokens.ne(self.eos) & src_tokens.ne(self.pad)).long().sum(dim=1) + ) + elif "source" in net_input: + src_tokens = net_input["source"] + src_lengths = ( + net_input["padding_mask"].size(-1) - net_input["padding_mask"].sum(-1) + if net_input["padding_mask"] is not None + else torch.tensor(src_tokens.size(-1)).to(src_tokens) + ) + elif "features" in net_input: + src_tokens = net_input["features"] + src_lengths = ( + net_input["padding_mask"].size(-1) - net_input["padding_mask"].sum(-1) + if net_input["padding_mask"] is not None + else torch.tensor(src_tokens.size(-1)).to(src_tokens) + ) + else: + raise Exception( + "expected src_tokens or source in net input. input keys: " + + str(net_input.keys()) + ) + + # bsz: total number of sentences in beam + # Note that src_tokens may have more than 2 dimensions (i.e. 
audio features) + bsz, src_len = src_tokens.size()[:2] + beam_size = self.beam_size + + if constraints is not None and not self.search.supports_constraints: + raise NotImplementedError( + "Target-side constraints were provided, but search method doesn't support them" + ) + + # Initialize constraints, when active + self.search.init_constraints(constraints, beam_size) + + max_len: int = -1 + if self.match_source_len: + max_len = src_lengths.max().item() + else: + max_len = min( + int(self.max_len_a * src_len + self.max_len_b), + self.max_len - 1, + ) + + min_len = self.min_len if min_len_a is None else int(min_len_a * src_len + 1) + assert ( + min_len <= max_len + ), "min_len cannot be larger than max_len, please adjust these!" + # compute the encoder output for each beam + with torch.autograd.profiler.record_function("EnsembleModel: forward_encoder"): + encoder_outs = self.model.forward_encoder(net_input) + + dec_sos = sample["lang_idx"] if ("lang_idx" in sample and sample["lang_idx"] is not None) else (self.bos if bos_token is None else bos_token) + # Get CTC lprobs and prep ctc_scorer + if self.ctc_weight > 0: + ctc_lprobs = self.model.models[0].get_normalized_probs( + encoder_outs[0], log_probs=True + ).contiguous().transpose(0, 1) # (B, T, C) from the encoder + + hyp = {} + ctc_prefix_score = CTCPrefixScore(ctc_lprobs[0].detach().cpu().numpy(), self.blank, self.eos, numpy) + hyp["ctc_state_prev"] = ctc_prefix_score.initial_state() + hyp["ctc_score_prev"] = 0.0 + ctc_beam = min(ctc_lprobs.shape[-1], int(min(beam_size * CTC_SCORING_RATIO, MAX_CTC_BEAM))) + ctc_hyps = {str(dec_sos): hyp} + + # placeholder of indices for bsz * beam_size to hold tokens and accumulative scores + new_order = torch.arange(bsz).view(-1, 1).repeat(1, beam_size).view(-1) + new_order = new_order.to(src_tokens.device).long() + encoder_outs = self.model.reorder_encoder_out(encoder_outs, new_order) + # ensure encoder_outs is a List. + assert encoder_outs is not None + + # initialize buffers + scores = ( + torch.zeros(bsz * beam_size, max_len + 1).to(src_tokens).float() + ) # +1 for eos; pad is never chosen for scoring + tokens = ( + torch.zeros(bsz * beam_size, max_len + 2) + .to(src_tokens) + .long() + .fill_(self.pad) + ) # +2 for eos and pad + tokens[:, 0] = dec_sos + attn: Optional[Tensor] = None + + # A list that indicates candidates that should be ignored. + # For example, suppose we're sampling and have already finalized 2/5 + # samples. Then cands_to_ignore would mark 2 positions as being ignored, + # so that we only finalize the remaining 3 samples. 
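The CTC prefix scorer set up above is combined with the attention decoder per step as `(1 - ctc_weight) * log p_att(y) + ctc_weight * (ctc_prefix(prefix + y) - ctc_prefix(prefix))`, which is what the `new_lprobs` update below computes for the top CTC candidates. A toy calculation with made-up scores:

```python
import torch

w = 0.3                                   # --ctc-weight used by the inference scripts
att_lprob = torch.tensor([-0.5, -2.0])    # attention log-probs for two candidate tokens
ctc_prev = -4.0                           # CTC prefix score of the current hypothesis
ctc_new = torch.tensor([-4.2, -7.5])      # CTC prefix scores after appending each candidate

joint = (1 - w) * att_lprob + w * (ctc_new - ctc_prev)
print(joint)                              # tensor([-0.4100, -2.4500])
```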
+ cands_to_ignore = ( + torch.zeros(bsz, beam_size).to(src_tokens).eq(-1) + ) # forward and backward-compatible False mask + + # list of completed sentences + finalized = torch.jit.annotate( + List[List[Dict[str, Tensor]]], + [torch.jit.annotate(List[Dict[str, Tensor]], []) for i in range(bsz)], + ) # contains lists of dictionaries of infomation about the hypothesis being finalized at each step + + # a boolean array indicating if the sentence at the index is finished or not + finished = [False for i in range(bsz)] + num_remaining_sent = bsz # number of sentences remaining + + # number of candidate hypos per step + cand_size = 2 * beam_size # 2 x beam size in case half are EOS + + # offset arrays for converting between different indexing schemes + bbsz_offsets = ( + (torch.arange(0, bsz) * beam_size) + .unsqueeze(1) + .type_as(tokens) + .to(src_tokens.device) + ) + cand_offsets = torch.arange(0, cand_size).type_as(tokens).to(src_tokens.device) + + reorder_state: Optional[Tensor] = None + batch_idxs: Optional[Tensor] = None + + original_batch_idxs: Optional[Tensor] = None + if "id" in sample and isinstance(sample["id"], Tensor): + original_batch_idxs = sample["id"] + else: + original_batch_idxs = torch.arange(0, bsz).type_as(tokens) + + for step in range(max_len + 1): # one extra step for EOS marker + # reorder decoder internal states based on the prev choice of beams + if reorder_state is not None: + if batch_idxs is not None: + # update beam indices to take into account removed sentences + corr = batch_idxs - torch.arange(batch_idxs.numel()).type_as( + batch_idxs + ) + reorder_state.view(-1, beam_size).add_( + corr.unsqueeze(-1) * beam_size + ) + original_batch_idxs = original_batch_idxs[batch_idxs] + self.model.reorder_incremental_state(incremental_states, reorder_state) + encoder_outs = self.model.reorder_encoder_out( + encoder_outs, reorder_state + ) + with torch.autograd.profiler.record_function( + "EnsembleModel: forward_decoder" + ): + lprobs, avg_attn_scores = self.model.forward_decoder( + tokens[:, : step + 1], + encoder_outs, + incremental_states, + self.temperature, + modal_idx, + ) + + if self.ctc_weight > 0 and step != 0 and step < ctc_prefix_score.input_length: + new_lprobs = lprobs.new_full(lprobs.size(), -math.inf) + ctc_lprobs = lprobs.clone() + ctc_lprobs[:, self.blank] = -math.inf # never select blank + _, local_best_ids = torch.topk(ctc_lprobs, ctc_beam, dim=-1) + for b in range(tokens.size(0)): + hyp_key = " ".join(str(x) for x in tokens[b, : step + 1].tolist()) + ctc_scores, ctc_states = ctc_prefix_score( + tokens[b, : step + 1].cpu(), local_best_ids[b].cpu(), ctc_hyps[hyp_key]["ctc_state_prev"] + ) + new_lprobs[b, local_best_ids[b]] = (1 - self.ctc_weight) * (lprobs[b, local_best_ids[b]]) + self.ctc_weight * torch.from_numpy( + ctc_scores - ctc_hyps[hyp_key]["ctc_score_prev"] + ).to(device="cuda") + for j in range(len(local_best_ids[b])): + ctc_hyps[hyp_key + " " + str(local_best_ids[b][j].item())] = {} + ctc_hyps[hyp_key + " " + str(local_best_ids[b][j].item())]["ctc_score_prev"] = ctc_scores[j] + ctc_hyps[hyp_key + " " + str(local_best_ids[b][j].item())]["ctc_state_prev"] = ctc_states[j] + lprobs = new_lprobs + elif self.ctc_weight > 0 and step == 0: + new_lprobs = lprobs.new_full(lprobs.size(), -math.inf) + ctc_lprobs = lprobs.clone() + ctc_lprobs[:, self.blank] = -math.inf # never select blank + _, local_best_ids = torch.topk(ctc_lprobs, ctc_beam, dim=-1) + for b in range(tokens.size(0)): + hyp_key = " ".join(str(x) for x in tokens[b, : step + 1].tolist()) + 
ctc_scores, ctc_states = ctc_prefix_score( + tokens[b, : step + 1].cpu(), local_best_ids[b].cpu(), ctc_hyps[hyp_key]["ctc_state_prev"] + ) + new_lprobs[b, local_best_ids[b]] = (1 - self.ctc_weight) * (lprobs[b, local_best_ids[b]]) + self.ctc_weight * torch.from_numpy( + ctc_scores - ctc_hyps[hyp_key]["ctc_score_prev"] + ).to(device="cuda") + for j in range(len(local_best_ids[b])): + if b == 0: + ctc_hyps[hyp_key + " " + str(local_best_ids[b][j].item())] = {} + ctc_hyps[hyp_key + " " + str(local_best_ids[b][j].item())]["ctc_score_prev"] = ctc_scores[j] + ctc_hyps[hyp_key + " " + str(local_best_ids[b][j].item())]["ctc_state_prev"] = ctc_states[j] + lprobs = new_lprobs + if self.lm_model is not None and self.lm_weight != 0: + if self.lm_dict is not None: + transformed_tokens = self.dict_transform(tokens[:, : step + 1]) + lm_out = self.lm_model(transformed_tokens) + else: + lm_out = self.lm_model(tokens[:, : step + 1]) + probs = self.lm_model.get_normalized_probs( + lm_out, log_probs=True, sample=None + ) + probs = probs[:, -1, :] * self.lm_weight + if self.lm_dict is not None: + probs = self.dict_transform(probs, forward=False) + lprobs += probs + # handle prefix tokens (possibly with different lengths) + if ( + prefix_tokens is not None + and step < prefix_tokens.size(1) + and step < max_len + ): + lprobs, tokens, scores = self._prefix_tokens( + step, lprobs, scores, tokens, prefix_tokens, beam_size + ) + elif step < min_len: + # minimum length constraint (does not apply if using prefix_tokens) + lprobs[:, self.eos] = -math.inf + + lprobs[lprobs != lprobs] = torch.tensor(-math.inf).to(lprobs) + + lprobs[:, self.pad] = -math.inf # never select pad + lprobs[:, self.unk] -= self.unk_penalty # apply unk penalty + lprobs[:, self.blank] = -math.inf # never select blank + if dec_sos != self.eos: + lprobs[:, dec_sos] = -math.inf # never select lang id + + # handle max length constraint + if step >= max_len: + lprobs[:, : self.eos] = -math.inf + lprobs[:, self.eos + 1 :] = -math.inf + + # Record attention scores, only support avg_attn_scores is a Tensor + if avg_attn_scores is not None: + if attn is None: + attn = torch.empty( + bsz * beam_size, avg_attn_scores.size(1), max_len + 2 + ).to(scores) + attn[:, :, step + 1].copy_(avg_attn_scores) + + scores = scores.type_as(lprobs) + eos_bbsz_idx = torch.empty(0).to( + tokens + ) # indices of hypothesis ending with eos (finished sentences) + eos_scores = torch.empty(0).to( + scores + ) # scores of hypothesis ending with eos (finished sentences) + + if self.should_set_src_lengths: + self.search.set_src_lengths(src_lengths) + + if self.repeat_ngram_blocker is not None: + lprobs = self.repeat_ngram_blocker(tokens, lprobs, bsz, beam_size, step) + + # Shape: (batch, cand_size) + cand_scores, cand_indices, cand_beams = self.search.step( + step, + lprobs.view(bsz, -1, self.vocab_size), + scores.view(bsz, beam_size, -1)[:, :, :step], + tokens[:, : step + 1], + original_batch_idxs, + ) + + # cand_bbsz_idx contains beam indices for the top candidate + # hypotheses, with a range of values: [0, bsz*beam_size), + # and dimensions: [bsz, cand_size] + cand_bbsz_idx = cand_beams.add(bbsz_offsets) + + # finalize hypotheses that end in eos + # Shape of eos_mask: (batch size, beam size) + eos_mask = cand_indices.eq(self.eos) & cand_scores.ne(-math.inf) + eos_mask[:, :beam_size][cands_to_ignore] = torch.tensor(0).to(eos_mask) + + # only consider eos when it's among the top beam_size indices + # Now we know what beam item(s) to finish + # Shape: 1d list of absolute-numbered + 
eos_bbsz_idx = torch.masked_select( + cand_bbsz_idx[:, :beam_size], mask=eos_mask[:, :beam_size] + ) + + finalized_sents: List[int] = [] + if eos_bbsz_idx.numel() > 0: + eos_scores = torch.masked_select( + cand_scores[:, :beam_size], mask=eos_mask[:, :beam_size] + ) + + finalized_sents = self.finalize_hypos( + step, + eos_bbsz_idx, + eos_scores, + tokens, + scores, + finalized, + finished, + beam_size, + attn, + src_lengths, + max_len, + ) + num_remaining_sent -= len(finalized_sents) + + assert num_remaining_sent >= 0 + if num_remaining_sent == 0: + break + if self.search.stop_on_max_len and step >= max_len: + break + assert step < max_len, f"{step} < {max_len}" + + # Remove finalized sentences (ones for which {beam_size} + # finished hypotheses have been generated) from the batch. + if len(finalized_sents) > 0: + new_bsz = bsz - len(finalized_sents) + + # construct batch_idxs which holds indices of batches to keep for the next pass + batch_mask = torch.ones( + bsz, dtype=torch.bool, device=cand_indices.device + ) + batch_mask[finalized_sents] = False + # TODO replace `nonzero(as_tuple=False)` after TorchScript supports it + batch_idxs = torch.arange( + bsz, device=cand_indices.device + ).masked_select(batch_mask) + + # Choose the subset of the hypothesized constraints that will continue + self.search.prune_sentences(batch_idxs) + + eos_mask = eos_mask[batch_idxs] + cand_beams = cand_beams[batch_idxs] + bbsz_offsets.resize_(new_bsz, 1) + cand_bbsz_idx = cand_beams.add(bbsz_offsets) + cand_scores = cand_scores[batch_idxs] + cand_indices = cand_indices[batch_idxs] + + if prefix_tokens is not None: + prefix_tokens = prefix_tokens[batch_idxs] + src_lengths = src_lengths[batch_idxs] + cands_to_ignore = cands_to_ignore[batch_idxs] + + scores = scores.view(bsz, -1)[batch_idxs].view(new_bsz * beam_size, -1) + tokens = tokens.view(bsz, -1)[batch_idxs].view(new_bsz * beam_size, -1) + if attn is not None: + attn = attn.view(bsz, -1)[batch_idxs].view( + new_bsz * beam_size, attn.size(1), -1 + ) + bsz = new_bsz + else: + batch_idxs = None + + # Set active_mask so that values > cand_size indicate eos hypos + # and values < cand_size indicate candidate active hypos. + # After, the min values per row are the top candidate active hypos + + # Rewrite the operator since the element wise or is not supported in torchscript. + + eos_mask[:, :beam_size] = ~((~cands_to_ignore) & (~eos_mask[:, :beam_size])) + active_mask = torch.add( + eos_mask.type_as(cand_offsets) * cand_size, + cand_offsets[: eos_mask.size(1)], + ) + + # get the top beam_size active hypotheses, which are just + # the hypos with the smallest values in active_mask. + # {active_hypos} indicates which {beam_size} hypotheses + # from the list of {2 * beam_size} candidates were + # selected. Shapes: (batch size, beam size) + new_cands_to_ignore, active_hypos = torch.topk( + active_mask, k=beam_size, dim=1, largest=False + ) + + # update cands_to_ignore to ignore any finalized hypos. + cands_to_ignore = new_cands_to_ignore.ge(cand_size)[:, :beam_size] + # Make sure there is at least one active item for each sentence in the batch. + assert (~cands_to_ignore).any(dim=1).all() + + # update cands_to_ignore to ignore any finalized hypos + + # {active_bbsz_idx} denotes which beam number is continued for each new hypothesis (a beam + # can be selected more than once). 
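+            # At this point cand_bbsz_idx and cand_scores have shape (bsz, 2 * beam_size) and
+            # active_hypos has shape (bsz, beam_size); the gather below keeps, for each sentence,
+            # the flat (batch * beam) row index of every hypothesis that survives into the next step.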
+ active_bbsz_idx = torch.gather(cand_bbsz_idx, dim=1, index=active_hypos) + active_scores = torch.gather(cand_scores, dim=1, index=active_hypos) + + active_bbsz_idx = active_bbsz_idx.view(-1) + active_scores = active_scores.view(-1) + + # copy tokens and scores for active hypotheses + + # Set the tokens for each beam (can select the same row more than once) + tokens[:, : step + 1] = torch.index_select( + tokens[:, : step + 1], dim=0, index=active_bbsz_idx + ) + # Select the next token for each of them + tokens.view(bsz, beam_size, -1)[:, :, step + 1] = torch.gather( + cand_indices, dim=1, index=active_hypos + ) + if step > 0: + scores[:, :step] = torch.index_select( + scores[:, :step], dim=0, index=active_bbsz_idx + ) + scores.view(bsz, beam_size, -1)[:, :, step] = torch.gather( + cand_scores, dim=1, index=active_hypos + ) + + # Update constraints based on which candidates were selected for the next beam + self.search.update_constraints(active_hypos) + + # copy attention for active hypotheses + if attn is not None: + attn[:, :, : step + 2] = torch.index_select( + attn[:, :, : step + 2], dim=0, index=active_bbsz_idx + ) + + # reorder incremental state in decoder + reorder_state = active_bbsz_idx + + # sort by score descending + for sent in range(len(finalized)): + scores = torch.tensor( + [float(elem["score"].item()) for elem in finalized[sent]] + ) + _, sorted_scores_indices = torch.sort(scores, descending=True) + finalized[sent] = [finalized[sent][ssi] for ssi in sorted_scores_indices] + finalized[sent] = torch.jit.annotate( + List[Dict[str, Tensor]], finalized[sent] + ) + return finalized + + def _prefix_tokens( + self, step: int, lprobs, scores, tokens, prefix_tokens, beam_size: int + ): + """Handle prefix tokens""" + prefix_toks = prefix_tokens[:, step].unsqueeze(-1).repeat(1, beam_size).view(-1) + prefix_lprobs = lprobs.gather(-1, prefix_toks.unsqueeze(-1)) + prefix_mask = prefix_toks.ne(self.pad) + lprobs[prefix_mask] = torch.tensor(-math.inf).to(lprobs) + lprobs[prefix_mask] = lprobs[prefix_mask].scatter( + -1, prefix_toks[prefix_mask].unsqueeze(-1), prefix_lprobs[prefix_mask] + ) + # if prefix includes eos, then we should make sure tokens and + # scores are the same across all beams + eos_mask = prefix_toks.eq(self.eos) + if eos_mask.any(): + # validate that the first beam matches the prefix + first_beam = tokens[eos_mask].view(-1, beam_size, tokens.size(-1))[ + :, 0, 1 : step + 1 + ] + eos_mask_batch_dim = eos_mask.view(-1, beam_size)[:, 0] + target_prefix = prefix_tokens[eos_mask_batch_dim][:, :step] + assert (first_beam == target_prefix).all() + + # copy tokens, scores and lprobs from the first beam to all beams + tokens = self.replicate_first_beam(tokens, eos_mask_batch_dim, beam_size) + scores = self.replicate_first_beam(scores, eos_mask_batch_dim, beam_size) + lprobs = self.replicate_first_beam(lprobs, eos_mask_batch_dim, beam_size) + return lprobs, tokens, scores + + def replicate_first_beam(self, tensor, mask, beam_size: int): + tensor = tensor.view(-1, beam_size, tensor.size(-1)) + tensor[mask] = tensor[mask][:, :1, :] + return tensor.view(-1, tensor.size(-1)) + + def finalize_hypos( + self, + step: int, + bbsz_idx, + eos_scores, + tokens, + scores, + finalized: List[List[Dict[str, Tensor]]], + finished: List[bool], + beam_size: int, + attn: Optional[Tensor], + src_lengths, + max_len: int, + ): + """Finalize hypothesis, store finalized information in `finalized`, and change `finished` accordingly. 
+ A sentence is finalized when {beam_size} finished items have been collected for it. + + Returns number of sentences (not beam items) being finalized. + These will be removed from the batch and not processed further. + Args: + bbsz_idx (Tensor): + """ + assert bbsz_idx.numel() == eos_scores.numel() + + # clone relevant token and attention tensors. + # tokens is (batch * beam, max_len). So the index_select + # gets the newly EOS rows, then selects cols 1..{step + 2} + tokens_clone = tokens.index_select(0, bbsz_idx)[ + :, 1 : step + 2 + ] # skip the first index, which is EOS + + tokens_clone[:, step] = self.eos + attn_clone = ( + attn.index_select(0, bbsz_idx)[:, :, 1 : step + 2] + if attn is not None + else None + ) + + # compute scores per token position + pos_scores = scores.index_select(0, bbsz_idx)[:, : step + 1] + pos_scores[:, step] = eos_scores + # convert from cumulative to per-position scores + pos_scores[:, 1:] = pos_scores[:, 1:] - pos_scores[:, :-1] + + # normalize sentence-level scores + if self.normalize_scores: + eos_scores /= (step + 1) ** self.len_penalty + + # cum_unfin records which sentences in the batch are finished. + # It helps match indexing between (a) the original sentences + # in the batch and (b) the current, possibly-reduced set of + # sentences. + cum_unfin: List[int] = [] + prev = 0 + for f in finished: + if f: + prev += 1 + else: + cum_unfin.append(prev) + cum_fin_tensor = torch.tensor(cum_unfin, dtype=torch.int).to(bbsz_idx) + + unfin_idx = bbsz_idx // beam_size + sent = unfin_idx + torch.index_select(cum_fin_tensor, 0, unfin_idx) + + # Create a set of "{sent}{unfin_idx}", where + # "unfin_idx" is the index in the current (possibly reduced) + # list of sentences, and "sent" is the index in the original, + # unreduced batch + # For every finished beam item + # sentence index in the current (possibly reduced) batch + seen = (sent << 32) + unfin_idx + unique_seen: List[int] = torch.unique(seen).tolist() + + if self.match_source_len: + condition = step > torch.index_select(src_lengths, 0, unfin_idx) + eos_scores = torch.where(condition, torch.tensor(-math.inf), eos_scores) + sent_list: List[int] = sent.tolist() + for i in range(bbsz_idx.size()[0]): + # An input sentence (among those in a batch) is finished when + # beam_size hypotheses have been collected for it + if len(finalized[sent_list[i]]) < beam_size: + if attn_clone is not None: + # remove padding tokens from attn scores + hypo_attn = attn_clone[i] + else: + hypo_attn = torch.empty(0) + + finalized[sent_list[i]].append( + { + "tokens": tokens_clone[i], + "score": eos_scores[i], + "attention": hypo_attn, # src_len x tgt_len + "alignment": torch.empty(0), + "positional_scores": pos_scores[i], + } + ) + + newly_finished: List[int] = [] + for unique_s in unique_seen: + # check termination conditions for this sentence + unique_sent: int = unique_s >> 32 + unique_unfin_idx: int = unique_s - (unique_sent << 32) + + if not finished[unique_sent] and self.is_finished( + step, unique_unfin_idx, max_len, len(finalized[unique_sent]), beam_size + ): + finished[unique_sent] = True + newly_finished.append(unique_unfin_idx) + + return newly_finished + + def is_finished( + self, + step: int, + unfin_idx: int, + max_len: int, + finalized_sent_len: int, + beam_size: int, + ): + """ + Check whether decoding for a sentence is finished, which + occurs when the list of finalized sentences has reached the + beam size, or when we reach the maximum length. 
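+        For example, with beam_size=5 a sentence is considered finished once five
+        EOS-terminated hypotheses have been collected for it, or unconditionally once
+        step reaches max_len.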
+ """ + assert finalized_sent_len <= beam_size + if finalized_sent_len == beam_size or step == max_len: + return True + return False + + +class EnsembleModel(nn.Module): + """A wrapper around an ensemble of models.""" + + def __init__(self, models): + super().__init__() + self.models_size = len(models) + # method '__len__' is not supported in ModuleList for torch script + self.single_model = models[0] + self.models = nn.ModuleList(models) + + self.has_incremental: bool = False + if all( + hasattr(m, "decoder") and isinstance(m.decoder, FairseqIncrementalDecoder) + for m in models + ): + self.has_incremental = True + + def forward(self): + pass + + def has_encoder(self): + return hasattr(self.single_model, "encoder") + + def has_incremental_states(self): + return self.has_incremental + + def max_decoder_positions(self): + return min( + [ + m.max_decoder_positions() + for m in self.models + if hasattr(m, "max_decoder_positions") + ] + + [sys.maxsize] + ) + + @torch.jit.export + def forward_encoder(self, net_input: Dict[str, Tensor]): + if not self.has_encoder(): + return None + return [model.encoder.forward_torchscript(net_input) for model in self.models] + + @torch.jit.export + def forward_decoder( + self, + tokens, + encoder_outs: List[Dict[str, List[Tensor]]], + incremental_states: List[Dict[str, Dict[str, Optional[Tensor]]]], + temperature: float = 1.0, + modal_idx: int = -1, + ): + log_probs = [] + avg_attn: Optional[Tensor] = None + encoder_out: Optional[Dict[str, List[Tensor]]] = None + for i, model in enumerate(self.models): + if self.has_encoder(): + encoder_out = encoder_outs[i] + # decode each model + if self.has_incremental_states(): + if "modal_idx" in inspect.getfullargspec(model.decoder.forward).args: + decoder_out = model.decoder.forward( + tokens, + encoder_out=encoder_out, + incremental_state=incremental_states[i], + modal_idx=modal_idx, + ) + else: + decoder_out = model.decoder.forward( + tokens, + encoder_out=encoder_out, + incremental_state=incremental_states[i], + ) + else: + if hasattr(model, "decoder"): + decoder_out = model.decoder.forward(tokens, encoder_out=encoder_out) + else: + decoder_out = model.forward(tokens) + + attn: Optional[Tensor] = None + decoder_len = len(decoder_out) + if decoder_len > 1 and decoder_out[1] is not None: + if isinstance(decoder_out[1], Tensor): + attn = decoder_out[1] + else: + attn_holder = decoder_out[1]["attn"] + if isinstance(attn_holder, Tensor): + attn = attn_holder + elif attn_holder is not None: + attn = attn_holder[0] + if attn is not None: + attn = attn[:, -1, :] + + decoder_out_tuple = ( + decoder_out[0][:, -1:, :].div_(temperature), + None if decoder_len <= 1 else decoder_out[1], + ) + probs = model.get_normalized_probs( + decoder_out_tuple, log_probs=True, sample=None + ) + probs = probs[:, -1, :] + if self.models_size == 1: + return probs, attn + + log_probs.append(probs) + if attn is not None: + if avg_attn is None: + avg_attn = attn + else: + avg_attn.add_(attn) + + avg_probs = torch.logsumexp(torch.stack(log_probs, dim=0), dim=0) - math.log( + self.models_size + ) + + if avg_attn is not None: + avg_attn.div_(self.models_size) + return avg_probs, avg_attn + + @torch.jit.export + def reorder_encoder_out( + self, encoder_outs: Optional[List[Dict[str, List[Tensor]]]], new_order + ): + """ + Reorder encoder output according to *new_order*. 
+ + Args: + encoder_out: output from the ``forward()`` method + new_order (LongTensor): desired order + + Returns: + *encoder_out* rearranged according to *new_order* + """ + new_outs: List[Dict[str, List[Tensor]]] = [] + if not self.has_encoder(): + return new_outs + for i, model in enumerate(self.models): + assert encoder_outs is not None + new_outs.append( + model.encoder.reorder_encoder_out(encoder_outs[i], new_order) + ) + return new_outs + + @torch.jit.export + def reorder_incremental_state( + self, + incremental_states: List[Dict[str, Dict[str, Optional[Tensor]]]], + new_order, + ): + if not self.has_incremental_states(): + return + for i, model in enumerate(self.models): + model.decoder.reorder_incremental_state_scripting( + incremental_states[i], new_order + ) + + +class SequenceGeneratorWithAlignment(SequenceGenerator): + def __init__( + self, models, tgt_dict, left_pad_target=False, print_alignment="hard", **kwargs + ): + """Generates translations of a given source sentence. + + Produces alignments following "Jointly Learning to Align and + Translate with Transformer Models" (Garg et al., EMNLP 2019). + + Args: + left_pad_target (bool, optional): Whether or not the + hypothesis should be left padded or not when they are + teacher forced for generating alignments. + """ + super().__init__(EnsembleModelWithAlignment(models), tgt_dict, **kwargs) + self.left_pad_target = left_pad_target + + if print_alignment == "hard": + self.extract_alignment = utils.extract_hard_alignment + elif print_alignment == "soft": + self.extract_alignment = utils.extract_soft_alignment + + @torch.no_grad() + def generate(self, models, sample, **kwargs): + finalized = super()._generate(sample, **kwargs) + + src_tokens = sample["net_input"]["src_tokens"] + bsz = src_tokens.shape[0] + beam_size = self.beam_size + ( + src_tokens, + src_lengths, + prev_output_tokens, + tgt_tokens, + ) = self._prepare_batch_for_alignment(sample, finalized) + if any(getattr(m, "full_context_alignment", False) for m in self.model.models): + attn = self.model.forward_align(src_tokens, src_lengths, prev_output_tokens) + else: + attn = [ + finalized[i // beam_size][i % beam_size]["attention"].transpose(1, 0) + for i in range(bsz * beam_size) + ] + + if src_tokens.device != "cpu": + src_tokens = src_tokens.to("cpu") + tgt_tokens = tgt_tokens.to("cpu") + attn = [i.to("cpu") for i in attn] + + # Process the attn matrix to extract hard alignments. 
+ for i in range(bsz * beam_size): + alignment = self.extract_alignment( + attn[i], src_tokens[i], tgt_tokens[i], self.pad, self.eos + ) + finalized[i // beam_size][i % beam_size]["alignment"] = alignment + return finalized + + def _prepare_batch_for_alignment(self, sample, hypothesis): + src_tokens = sample["net_input"]["src_tokens"] + bsz = src_tokens.shape[0] + src_tokens = ( + src_tokens[:, None, :] + .expand(-1, self.beam_size, -1) + .contiguous() + .view(bsz * self.beam_size, -1) + ) + src_lengths = sample["net_input"]["src_lengths"] + src_lengths = ( + src_lengths[:, None] + .expand(-1, self.beam_size) + .contiguous() + .view(bsz * self.beam_size) + ) + prev_output_tokens = data_utils.collate_tokens( + [beam["tokens"] for example in hypothesis for beam in example], + self.pad, + self.eos, + self.left_pad_target, + move_eos_to_beginning=True, + ) + tgt_tokens = data_utils.collate_tokens( + [beam["tokens"] for example in hypothesis for beam in example], + self.pad, + self.eos, + self.left_pad_target, + move_eos_to_beginning=False, + ) + return src_tokens, src_lengths, prev_output_tokens, tgt_tokens + + +class EnsembleModelWithAlignment(EnsembleModel): + """A wrapper around an ensemble of models.""" + + def __init__(self, models): + super().__init__(models) + + def forward_align(self, src_tokens, src_lengths, prev_output_tokens): + avg_attn = None + for model in self.models: + decoder_out = model(src_tokens, src_lengths, prev_output_tokens) + attn = decoder_out[1]["attn"][0] + if avg_attn is None: + avg_attn = attn + else: + avg_attn.add_(attn) + if len(self.models) > 1: + avg_attn.div_(len(self.models)) + return avg_attn diff --git a/SpeechUT/speechut/tasks/joint_sc2t_pretrain.py b/SpeechUT/speechut/tasks/joint_sc2t_pretrain.py new file mode 100644 index 0000000000000000000000000000000000000000..db6e4e611f01d58f53ede5fd529fb9ceca44bcc8 --- /dev/null +++ b/SpeechUT/speechut/tasks/joint_sc2t_pretrain.py @@ -0,0 +1,1004 @@ +# ---------------------------------------------------------------------------- +# SpeechUT: Bridging Speech and Text with Hidden-Unit for Encoder-Decoder Based Speech-Text Pre-training (https://arxiv.org/abs/2210.03730) +# Github source: https://github.com/microsoft/SpeechT5/tree/main/SpeechUT +# Code based on fairseq: https://github.com/facebookresearch/fairseq/tree/272c4c5197250997148fb12c0db6306035f166a4 +# +# Copyright (c) 2022 Microsoft +# Licensed under The MIT License [see LICENSE for details] +# ---------------------------------------------------------------------------- + +import logging +import os +import sys +from typing import Dict, List, Optional, Tuple +from pathlib import Path + +import numpy as np +from argparse import Namespace +from collections import OrderedDict + +import torch +from dataclasses import dataclass, field +from fairseq.data import ( + Dictionary, + encoders, + data_utils, + StripTokenDataset, + PrependTokenDataset, + AppendTokenDataset, + DenoisingDataset, + ConcatDataset, + FairseqDataset, + iterators, + ResamplingDataset, + MaskTokensDataset, + LanguagePairDataset, +) +from fairseq.data.audio.speech_to_text_joint_dataset import S2TJointDataConfig +from fairseq.data.shorten_dataset import maybe_shorten_dataset +# from fairseq.data.encoders.utils import get_whole_word_mask +from fairseq.dataclass.configs import FairseqDataclass +from fairseq.tasks import register_task +from fairseq.tasks.fairseq_task import FairseqTask +from fairseq.dataclass.constants import ChoiceEnum +from omegaconf import MISSING + +from 
speechut.data.multimodal_corpus_dataset import MultiCorpusDataset +from speechut.data.load_langpair_dataset import load_langpair_dataset +from speechut.data.language_trible_dataset import LanguageTripleDataset, load_langtriple_dataset +from speechut.data.hubert_dataset import HubertDataset + +logger = logging.getLogger(__name__) + +TOKENIZER_CHOICES = ChoiceEnum(["sentencepiece", "hubert_letters", "none"]) + +def _lang_token(lang: str): + return "<lang:{}>".format(lang) + +def _lang_token_index(dic: Dictionary, lang: str): + """Return language token index.""" + idx = dic.index(_lang_token(lang)) + assert idx != dic.unk_index, "cannot find language token for lang {}".format(lang) + return idx + + +class LabelEncoder(object): + def __init__(self, dictionary: Dictionary) -> None: + self.dictionary = dictionary + + def __call__(self, label: str) -> List[str]: + return self.dictionary.encode_line( + label, append_eos=False, add_if_not_exist=False, + ) + + +### wrap the initial get_whole_word_mask which needs bpe_tokenizer, +### here we just assume words are splited by "|" or "<SIL>" +def get_whole_word_mask(args, dictionary): + def is_beginning_of_word(i): + if i < dictionary.nspecial: + # special elements are always considered beginnings + return True + tok = dictionary[i] + if tok.startswith("madeupword"): + return True + elif tok in ["<unk>", "<s>", "</s>", "<pad>", "|", "<eps>"]: + return True + else: + return False + + mask_whole_words = torch.ByteTensor( + list(map(is_beginning_of_word, range(len(dictionary)))) + ) + return mask_whole_words + +def get_repeative_start(tokens): + """ + tokens: torch.Tensor with repeative tokens + """ + length = len(tokens) + rep_start_id = tokens[:-1] != tokens[1:] + return torch.cat([torch.tensor([True]), rep_start_id]) + +@dataclass +class TextPretrainingConfig(FairseqDataclass): + ### added for joint pretraining + text_data: Optional[str] = field( + default=None, + metadata={ + "help": "if set, path to text data directory", + }, + ) + seed: Optional[int] = field( + default=1, + metadata={ + "help": "for ordered_indices in MulticorpusDataset", + }, + ) + tokens_per_sample: Optional[int] = field( + default=512, + metadata={ + "help": "max number of total tokens over all segments per sample for dataset", + }, + ) + tokens_per_sample_tgt: Optional[int] = field( + default=512, + metadata={ + "help": "max number of total tokens over all segments per target sample for dataset", + }, + ) + sample_break_mode: Optional[str] = field( + default="eos", + metadata={ + "help": "mode for breaking sentence", + }, + ) + mask: Optional[float] = field( + default=0.3, + metadata={ + "help": "fraction of words/subwords that will be masked", + }, + ) + leave_unmasked_prob: float = field( + default=0.1, + metadata={"help": "probability that a masked token is unmasked"}, + ) + mask_random: Optional[float] = field( + default=0.1, + metadata={ + "help": "instead of using [MASK], use random token this often", + }, + ) + freq_weighted_replacement: bool = field( + default=False, + metadata={"help": "sample random replacement words based on word frequencies"}, + ) + mask_whole_words: bool = field( + default=True, + metadata={"help": "mask whole words; you may also want to set --bpe"}, + ) + mask_repeative_tokens: bool = field( + default=True, + metadata={"help": "mask repeative_tokens; if mask_whole_words=False"}, + ) + mask_multiple_length: int = field( + default=1, + metadata={"help": "repeat the mask indices multiple times"}, + ) + mask_stdev: float = field( + default=0.0, + 
metadata={"help": "stdev of the mask length"}, + ) + shorten_method: Optional[str] = field( + default="none", + metadata={ + "help": "if not none, shorten sequences that exceed tokens_per_sample", + "choices": "none/truncate/random_crop" + }, + ) + shorten_data_split_list: Optional[str] = field( + default="", + metadata={ + "help": "comma_separated list of dataset splits to apply shortening to, e.g., train,valid (default: all dataset splits)", + }, + ) + + ### below hypra-parameters is used in bart + insert: Optional[float] = field( + default=0.0, + metadata={ + "help": "insert this percentage of additional random tokens", + }, + ) + permute: Optional[float] = field( + default=0.0, + metadata={ + "help": "take this proportion of subwords and permute them", + }, + ) + rotate: Optional[float] = field( + default=0.0, + metadata={ + "help": "rotate this proportion of inputs", + }, + ) + poisson_lambda: Optional[float] = field( + default=3.5, + metadata={ + "help": "randomly shuffle sentences for this proportion of inputs", + }, + ) + permute_sentences: Optional[float] = field( + default=0.0, + metadata={ + "help": "shuffle this proportion of sentences in all inputs", + }, + ) + mask_length: Optional[str] = field( + default="span-poisson", + metadata={ + "help": "mask length to choose", + "choice": "subword/word/span-poisson" + }, + ) + replace_length: Optional[int] = field( + default=1, + metadata={ + "help": "when masking N tokens, replace with 0, 1, or N tokens (use -1 for N)", + }, + ) + shuffle_instance: Optional[bool] = field( + default=False, + metadata={"help": "shuffle instance"}, + ) + max_source_positions: Optional[int] = field( + default=1024, + metadata={"help": "max number of tokens in the source sequence"}, + ) + max_target_positions: Optional[int] = field( + default=1024, + metadata={"help": "max number of tokens in the target sequence"}, + ) + bpe: Optional[str] = field( + default="", + metadata={ + "help": "will wrapped by the text_data_config yaml", + }, + ) + data_config: Optional[str] = field( + default=None, + metadata={ + "help": "a config yaml specify the bpe model of text data", + }, + ) + text_maxtokens_ratio: Optional[float] = field( + default=1.0, + metadata={ + "help": "for text, max_tokens = max_tokens * text_maxtokens_ratio / 320 ", + }, + ) + prepend_tgt_lang_tag: bool = field( + default=False, + metadata={"help": "prepend tgt_lang_tag to replace <eos>"}, + ) + mask_text_ratio: Optional[float] = field( + default=0.0, + metadata={ + "help": "mask_text_ratio, for paired data", + }, + ) + truncate_mono_source: bool = field( + default=True, + metadata={"help": "truncate mono source-side examples that exceed max-positions"}, + ) + + +@dataclass +class JointPretrainingConfig(FairseqDataclass): + data: str = field( + default=MISSING, metadata={"help": "path to speech data directory"} + ) + fine_tuning: bool = field( + default=False, metadata={"help": "set to true if fine-tuning Hubert"} + ) + labels: List[str] = field( + default_factory=lambda: ["ltr"], + metadata={ + "help": ( + "extension of the label files to load, frame-level labels for" + " pre-training, and sequence-level label for fine-tuning" + ) + }, + ) + label_dir: Optional[str] = field( + default=None, + metadata={ + "help": "if set, looks for labels in this directory instead", + }, + ) + label_rate: int = field( + default=-1, + metadata={"help": "label frame rate. -1 for sequence label"}, + ) + sample_rate: int = field( + default=16_000, + metadata={ + "help": "target sample rate. 
audio files will be up/down " + "sampled to this rate" + }, + ) + normalize: bool = field( + default=False, + metadata={ + "help": "if set, normalizes input to have 0 mean and unit variance" + }, + ) + enable_padding: bool = field( + default=False, + metadata={"help": "pad shorter samples instead of cropping"}, + ) + max_keep_size: Optional[int] = field( + default=None, + metadata={"help": "exclude sample longer than this"}, + ) + max_sample_size: Optional[int] = field( + default=None, + metadata={"help": "max sample size to crop to for batching"}, + ) + min_sample_size: Optional[int] = field( + default=None, + metadata={"help": "min sample size to crop to for batching"}, + ) + single_target: Optional[bool] = field( + default=False, + metadata={ + "help": "if set, AddTargetDatasets outputs same keys " + "as AddTargetDataset" + }, + ) + random_crop: Optional[bool] = field( + default=True, + metadata={"help": "always crop from the beginning if false"}, + ) + pad_audio: Optional[bool] = field( + default=False, + metadata={"help": "pad audio to the longest one in the batch if true"}, + ) + store_labels: Optional[bool] = field( + default=True, + metadata={"help": "store spm labels in memory, should be true when fine-tune with bpe"}, + ) + add_decoder_target: bool = field( + default=False, + metadata={"help": "contral the model architecture, if set True, load reduced unit as target"}, + ) + split_modality_batch: bool = field( + default=False, + metadata={"help": "whether create all samples of different modalities in a batch"}, + ) + speech_tgt_lang: str = field( + default="", + metadata={"help": "prepend <tgt-id> to prev_output_tokens to replace <eos>, only used for decoder"}, + ) + speech_sampling_alpha: float = field( + default=0.2, + metadata={ + "help": "Hyper-parameter alpha = 1/T for temperature-based speech resampling." + "(alpha = 1 for no resampling)" + }, + ) + text_sampling_alpha: float = field( + default=0.2, + metadata={ + "help": "Hyper-parameter alpha = 1/T for temperature-based text resampling." 
+ "(alpha = 1 for no resampling)" + }, + ) + hubert_tokenizer: Optional[TOKENIZER_CHOICES] = field( + default="none", + metadata={"help": "which tokenizer for processing text"}, + ) + sp_path: Optional[str] = field( + default=None, + metadata={"help": "sentencepiece model path if using bpe tokenizer"}, + ) + text_cfg: TextPretrainingConfig = TextPretrainingConfig() + # For inference + ctc_weight: float = field( + default=0.0, + metadata={"help": "ctc weight during inference"}, + ) + lm_dict: Optional[str] = field( + default="dict.txt", + metadata={"help": "dict used for decoding with language model, should be in cfg.data/"}, + ) + +@register_task("joint_sc2t_pretraining", dataclass=JointPretrainingConfig) +class Jsc2tPretrainingTask(FairseqTask): + + cfg: JointPretrainingConfig + + def __init__( + self, + cfg: JointPretrainingConfig, + load_local_states: True, + ) -> None: + super().__init__(cfg) + + logger.info(f"current directory is {os.getcwd()}") + logger.info(f"JSTPretrainingTask Config {cfg}") + + self.cfg = cfg + self.fine_tuning = cfg.fine_tuning + self.blank_symbol = "<s>" + + if load_local_states: + self.state.add_factory("hubert_tokenizer", self.build_tokenizer) + if self.cfg.text_cfg.text_data is not None and os.path.exists(self.cfg.text_cfg.text_data): + self.state.add_factory("text_dictionary", self.load_text_dictionary) + self.state.add_factory("text_src_dictionary", self.load_text_src_dictionary) + if cfg.fine_tuning: + self.state.add_factory("target_dictionary", self.load_dictionaries) + else: + self.state.add_factory("dictionaries", self.load_dictionaries) + + if cfg.text_cfg.data_config is not None: + self.text_data_cfg = S2TJointDataConfig(Path(f"{cfg.text_cfg.text_data}/{cfg.text_cfg.data_config}")) + self.cfg.text_cfg.bpe = self.text_data_cfg.bpe_tokenizer["bpe"] + else: + self.text_data_cfg = None + + @property + def source_dictionary(self) -> Optional[Dictionary]: + return None + + @property + def target_dictionary(self) -> Optional[Dictionary]: + return self.state.target_dictionary + + @property + def dictionaries(self) -> List[Dictionary]: + return self.state.dictionaries + + @property + def text_dictionary(self) -> Optional[Dictionary]: + return self.state.text_dictionary + + @property + def text_src_dictionary(self) -> Optional[Dictionary]: + return self.state.text_src_dictionary + + @property + def hubert_tokenizer(self): + return self.state.hubert_tokenizer + + def load_dictionaries(self): + label_dir = self.cfg.data if self.cfg.label_dir is None else self.cfg.label_dir + dictionaries = [Dictionary.load(f"{label_dir}/dict.{label}.txt") for label in self.cfg.labels] + if not self.cfg.fine_tuning: + for dictionary in dictionaries: + dictionary.add_symbol("<mask>") + return dictionaries[0] if self.cfg.fine_tuning else dictionaries + + def load_text_dictionary(self): + tgt_dict_path = f"{self.cfg.text_cfg.text_data}/{self.text_data_cfg.vocab_filename if self.text_data_cfg is not None else 'dict.txt'}" + if not os.path.isfile(tgt_dict_path): + raise FileNotFoundError(f"Dict not found: {tgt_dict_path}") + text_dictionary = Dictionary.load(tgt_dict_path) + self.mask_idx = text_dictionary.add_symbol("<mask>") + return text_dictionary + + def load_text_src_dictionary(self): + src_dict_path = f"{self.cfg.text_cfg.text_data}/{self.text_data_cfg.src_vocab_filename if self.text_data_cfg is not None else 'dict.txt'}" + if not os.path.isfile(src_dict_path): + raise FileNotFoundError(f"Dict not found: {src_dict_path}") + src_text_dictionary = Dictionary.load(src_dict_path) + 
self.mask_idx = src_text_dictionary.add_symbol("<mask>") + return src_text_dictionary + + @classmethod + def setup_task( + cls, cfg: JointPretrainingConfig, **kwargs + ) -> "Jsc2tPretrainingTask": + load_local_states = kwargs.get("load_local_states", True) + return cls(cfg, load_local_states) + + def get_label_dir(self) -> str: + if self.cfg.label_dir is None: + return self.cfg.data + return self.cfg.label_dir + + def load_paired_dataset(self, text_split, truncate_source=False): + text_split, lp = text_split.rsplit('.', 1) # e.g. "libritext.ltr-ltr" + if len(lp.split("-")) == 2: + src, tgt = lp.split("-") + if src == tgt: + logger.warn(f"| trying to load monolingual dataset {text_split}.{lp}, please check your task is right.") + paired_dataset = self.load_char_bart_dataset(f"{text_split}.{lp}.{tgt}") + return paired_dataset + paired_dataset = load_langpair_dataset( + self.cfg.text_cfg.text_data, + text_split, + src, + self.text_src_dictionary, + tgt, + self.text_dictionary, + combine=True, + dataset_impl=None, + upsample_primary=1, + left_pad_source=False, + left_pad_target=False, + max_source_positions=self.cfg.text_cfg.tokens_per_sample, + max_target_positions=self.cfg.text_cfg.tokens_per_sample, + truncate_source=truncate_source, + prepend_bos=False, + load_alignments=False, + append_source_id=True if self.cfg.text_cfg.prepend_tgt_lang_tag else False, + lang_format="<lang:{}>" if self.cfg.text_cfg.prepend_tgt_lang_tag else "[{}]", + input_feeding=self.cfg.add_decoder_target, + ) + if self.cfg.text_cfg.mask_text_ratio > 0: + # add mask + self.mask_idx = self.text_src_dictionary.index("<mask>") + mask_whole_words = None + if self.cfg.text_cfg.mask_whole_words: + mask_whole_words = get_whole_word_mask(self.cfg.text_cfg, self.text_src_dictionary) + elif self.cfg.text_cfg.mask_repeative_tokens: + mask_whole_words = get_repeative_start + + src_dataset, src_unmasked_dataset = MaskTokensDataset.apply_mask( + paired_dataset.src, + self.text_src_dictionary, + pad_idx=self.text_src_dictionary.pad(), + mask_idx=self.mask_idx, + seed=self.cfg.text_cfg.seed, + mask_prob=self.cfg.text_cfg.mask_text_ratio, + leave_unmasked_prob=self.cfg.text_cfg.leave_unmasked_prob, + random_token_prob=self.cfg.text_cfg.mask_random, + freq_weighted_replacement=self.cfg.text_cfg.freq_weighted_replacement, + mask_whole_words=mask_whole_words, + mask_multiple_length=self.cfg.text_cfg.mask_multiple_length, + mask_stdev=self.cfg.text_cfg.mask_stdev, + ) + tgt_dataset = paired_dataset.tgt if paired_dataset.tgt is not None else src_unmasked_dataset + paired_dataset = LanguageTripleDataset( + src_dataset, + src_dataset.sizes, + self.text_src_dictionary, + src_unmasked_dataset, + src_unmasked_dataset.sizes, + self.text_src_dictionary, + tgt_dataset, + tgt_dataset.sizes, + self.text_dictionary, + left_pad_source=False, + left_pad_target=False, + align_dataset=None, + eos=None, + num_buckets=0, + shuffle=True, + pad_to_multiple=1, + ) + else: + src, ref, tgt = lp.split("-") + paired_dataset = load_langtriple_dataset( + self.cfg.text_cfg.text_data, + text_split, + src, + self.text_src_dictionary, + ref, + self.dictionaries[-1], + tgt, + self.text_dictionary, + combine=True, + dataset_impl=None, + upsample_primary=1, + left_pad_source=False, + left_pad_target=False, + max_source_positions=self.cfg.text_cfg.tokens_per_sample, + max_target_positions=self.cfg.text_cfg.tokens_per_sample, + truncate_source=truncate_source, + prepend_bos=False, + load_alignments=False, + append_source_id=True if self.cfg.text_cfg.prepend_tgt_lang_tag 
else False, + lang_format="<lang:{}>" if self.cfg.text_cfg.prepend_tgt_lang_tag else "[{}]", + ) + return paired_dataset + + def load_dataset(self, split: str, epoch=1, **kwargs) -> None: + """ + Create Wav dataset for audio, and Index dataset for phonemized text, + then concatenate them to by fairseq.data.multi_corpus_dataset.MultiCorpusDataset. + """ + speech_splits = split.split('+')[0].split(',') + ### 1st, create a speech dataset using STSpeechDataset (modified from HubertDataset) + dicts = [self.target_dictionary] if self.cfg.fine_tuning else self.dictionaries + pad_list = [dict.pad() for dict in dicts] + eos_list = [dict.eos() for dict in dicts] + procs = [LabelEncoder(dict) for dict in dicts] + if self.cfg.speech_tgt_lang != "": + tgt_lang_idx = _lang_token_index(dicts[0], self.cfg.speech_tgt_lang) + logger.info(f"Will prepend <{tgt_lang_idx}> at the beginning of prev_output_tokens to replace <eos>") + else: + tgt_lang_idx = None + + + # hubert v1: pad_audio=True, random_crop=False; + speech_datasets = [] + for speech_split in speech_splits: + paths = [ + f"{self.get_label_dir()}/{speech_split}.{l}" for l in self.cfg.labels + ] + speech_datasets.append( + HubertDataset( + f"{self.cfg.data}/{speech_split}.tsv", + sample_rate=self.cfg.sample_rate, + label_paths=paths, + label_rates=self.cfg.label_rate, + pad_list=pad_list, + eos_list=eos_list, + label_processors=procs, + max_keep_sample_size=self.cfg.max_keep_size, + min_keep_sample_size=self.cfg.min_sample_size, + max_sample_size=self.cfg.max_sample_size, + pad_audio=self.cfg.pad_audio, + normalize=self.cfg.normalize, + store_labels=self.cfg.store_labels, + random_crop=self.cfg.random_crop, + single_target=self.cfg.single_target, + tgt_dict=dicts[0], + add_decoder_target=self.cfg.add_decoder_target, + fine_tuning=self.cfg.fine_tuning, + tgt_lang_idx=tgt_lang_idx, + tokenizer=self.hubert_tokenizer, + ) + ) + if len(speech_datasets) > 1: + speech_dataset = ConcatDataset(speech_datasets) + else: + speech_dataset = speech_datasets[0] + + has_text = len(split.split('+')) > 1 + if not has_text: + assert speech_dataset is not None + self.datasets[split] = speech_dataset + return + + ### 2nd, create paired/mono text datasets using Langpairdataset + if split.split('+')[1] != '': + paired_splits = [paired_split for paired_split in split.split('+')[1].split(',') if paired_split != ''] + paired_datasets = [self.load_paired_dataset(paired_split) for paired_split in paired_splits] + else: + paired_splits, paired_datasets = [], [] + + if len(split.split('+')) > 2 and split.split('+')[2] != '': + mono_splits = [mono_split for mono_split in split.split('+')[2].split(',') if mono_split != ''] + mono_datasets = [self.load_paired_dataset(mono_split, truncate_source=self.cfg.text_cfg.truncate_mono_source) for mono_split in mono_splits] + else: + mono_splits, mono_datasets = [], [] + + assert len(mono_datasets + paired_datasets) > 0, f"split {split} has no text! you should check out for that" + + ### 3rd, if provided, create a supervised dataset with labeled data + if len(split.split('+')) > 3 and split.split('+')[3] != '': + assert len(paired_splits) > 0, f"supervised dataset can not be loaded without text paired dataset!" 
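+            # Composite split names follow the pattern
+            #   "<speech splits>+<paired text splits>+<mono text splits>+<supervised split>",
+            # e.g. "train_speech+libritext.ltr-ltr++train_sup" (hypothetical names, for illustration only);
+            # the supervised target label extension (e.g. "ltr") is taken from the first paired split below.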
+ tgt = paired_splits[0].rsplit('.', 1)[1].split("-")[1] + sup_split = split.split('+')[3] + + sup_dataset = HubertDataset( + f"{self.cfg.data}/{sup_split}.tsv", + sample_rate=self.cfg.sample_rate, + label_paths=[f"{self.get_label_dir()}/{sup_split}.{tgt}"], + label_rates=[-1], + pad_list=[self.text_dictionary.pad()], + eos_list=[self.text_dictionary.eos()], + label_processors=[LabelEncoder(self.text_dictionary)], + max_keep_sample_size=self.cfg.max_keep_size, + min_keep_sample_size=None, + max_sample_size=None, + pad_audio=True, + normalize=self.cfg.normalize, + store_labels=self.cfg.store_labels, + random_crop=False, + single_target=True, + tgt_dict=self.text_dictionary, + add_decoder_target=self.cfg.add_decoder_target, + fine_tuning=True, + tgt_lang_idx=None, + tokenizer=None, + ) + else: + sup_dataset = None + + ### 4th, compose a MultiCorpusDataset + dataset_dict, max_positions_dict, distributions, max_tokens_ratios = self.resample_multi_modality_dataset( + speech_dataset, sup_dataset, mono_datasets, paired_datasets, mono_splits, paired_splits, epoch=epoch, + ) + self.datasets[split] = MultiCorpusDataset( + dataset_dict, + max_positions=max_positions_dict, + distribution=distributions, + max_tokens_ratio=max_tokens_ratios, + seed=self.cfg.text_cfg.seed, + sort_indices=True, + ) + + + def max_positions(self) -> Tuple[int, int]: + return (sys.maxsize, sys.maxsize) + + def filter_indices_by_size( + self, indices: np.array, *args, **kwargs + ) -> np.array: + return indices + + def get_batch_iterator( + self, + dataset, + max_tokens=None, + max_sentences=None, + max_positions=None, + ignore_invalid_inputs=False, + required_batch_size_multiple=1, + seed=1, + num_shards=1, + shard_id=0, + num_workers=0, + epoch=1, + data_buffer_size=0, + disable_iterator_cache=False, + skip_remainder_batch=False, + grouped_shuffling=False, + update_epoch_batch_itr=False, + ): + """ + Get an iterator that yields batches of data from the given dataset. + Args: + dataset (~fairseq.data.FairseqDataset): dataset to batch + max_tokens (int, optional): max number of tokens in each batch + (default: None). + max_sentences (int, optional): max number of sentences in each + batch (default: None). + max_positions (optional): max sentence length supported by the + model (default: None). + ignore_invalid_inputs (bool, optional): don't raise Exception for + sentences that are too long (default: False). + required_batch_size_multiple (int, optional): require batch size to + be a multiple of N (default: 1). + seed (int, optional): seed for random number generator for + reproducibility (default: 1). + num_shards (int, optional): shard the data iterator into N + shards (default: 1). + shard_id (int, optional): which shard of the data iterator to + return (default: 0). + num_workers (int, optional): how many subprocesses to use for data + loading. 0 means the data will be loaded in the main process + (default: 0). + epoch (int, optional): the epoch to start the iterator from + (default: 1). + data_buffer_size (int, optional): number of batches to + preload (default: 0). + disable_iterator_cache (bool, optional): don't cache the + EpochBatchIterator (ignores `FairseqTask::can_reuse_epoch_itr`) + (default: False). + skip_remainder_batch (bool, optional): if set, discard the last + batch in each training epoch, as the last batch is often smaller than + local_batch_size * distributed_word_size (default: ``True``). 
+ grouped_shuffling (bool, optional): group batches with each groups + containing num_shards batches and shuffle groups. Reduces difference + between sequence lengths among workers for batches sorted by length. + update_epoch_batch_itr (bool optional): if true then donot use the cached + batch iterator for the epoch + + Returns: + ~fairseq.iterators.EpochBatchIterator: a batched iterator over the + given dataset split + """ + if self.fine_tuning or not isinstance(dataset, MultiCorpusDataset): + return super().get_batch_iterator( + dataset, + max_tokens=max_tokens, + max_sentences=max_sentences, + max_positions=max_positions, + ignore_invalid_inputs=ignore_invalid_inputs, + required_batch_size_multiple=required_batch_size_multiple, + seed=seed, + num_shards=num_shards, + shard_id=shard_id, + num_workers=num_workers, + epoch=epoch, + data_buffer_size=data_buffer_size, + disable_iterator_cache=disable_iterator_cache, + skip_remainder_batch=skip_remainder_batch, + grouped_shuffling=grouped_shuffling, + update_epoch_batch_itr=update_epoch_batch_itr, + ) + + can_reuse_epoch_itr = ( + not disable_iterator_cache + and not update_epoch_batch_itr + and self.can_reuse_epoch_itr(dataset) + ) + if can_reuse_epoch_itr and dataset in self.dataset_to_epoch_iter: + logger.debug("reusing EpochBatchIterator for epoch {}".format(epoch)) + return self.dataset_to_epoch_iter[dataset] + + assert isinstance(dataset, FairseqDataset) + + # initialize the dataset with the correct starting epoch + dataset.set_epoch(epoch) + + # get indices ordered by example size + with data_utils.numpy_seed(seed): + indices = dataset.ordered_indices() + + # filter examples that are too large + if max_positions is not None: + indices = self.filter_indices_by_size( + indices, dataset, max_positions, ignore_invalid_inputs + ) + + # create mini-batches with given size constraints + batch_sampler = dataset.get_batch_sampler( + indices, + num_shards, + seed, + max_tokens=max_tokens, + max_sentences=max_sentences, + required_batch_size_multiple=required_batch_size_multiple, + split_modality_batch=self.cfg.split_modality_batch, + ) + + # return a reusable, sharded iterator + epoch_iter = iterators.EpochBatchIterator( + dataset=dataset, + collate_fn=dataset.collater, + batch_sampler=batch_sampler, + seed=seed, + num_shards=num_shards, + shard_id=shard_id, + num_workers=num_workers, + epoch=epoch, + buffer_size=data_buffer_size, + skip_remainder_batch=skip_remainder_batch, + disable_shuffling=True, + grouped_shuffling=grouped_shuffling, + ) + + if can_reuse_epoch_itr: + self.dataset_to_epoch_iter[dataset] = epoch_iter + + return epoch_iter + + def build_generator( + self, + models, + args, + seq_gen_cls=None, + extra_gen_cls_kwargs=None, + ): + """Build ED-CTC generator for finet-tuned ASR model""" + from speechut.squence_generator import SequenceGenerator + extra_gen_cls_kwargs = { + "ctc_weight": self.cfg.ctc_weight, + "lm_dict": Dictionary.load(os.path.join(self.cfg.data, self.cfg.lm_dict)), + **extra_gen_cls_kwargs + } + return super().build_generator( + models, args, seq_gen_cls=SequenceGenerator, extra_gen_cls_kwargs=extra_gen_cls_kwargs + ) + + @classmethod + def _get_size_ratios(cls, ids: List[str], sizes: List[int], alpha: float = 1.0): + """Size ratios for temperature-based sampling + (https://arxiv.org/abs/1907.05019)""" + _sizes = np.array(sizes) + prob = _sizes / _sizes.sum() + smoothed_prob = prob ** alpha + smoothed_prob = smoothed_prob / smoothed_prob.sum() + size_ratio = (smoothed_prob * _sizes.sum()) / _sizes + + o_str = 
str({_i: f"{prob[i]:.3f}" for i, _i in enumerate(ids)}) + logger.info(f"original sampling probability: {o_str}") + p_str = str({_i: f"{smoothed_prob[i]:.3f}" for i, _i in enumerate(ids)}) + logger.info(f"balanced sampling probability: {p_str}") + sr_str = str({_id: f"{size_ratio[i]:.3f}" for i, _id in enumerate(ids)}) + logger.info(f"balanced sampling size ratio: {sr_str}") + return size_ratio.tolist() + + def resample_multi_modality_dataset(self, speech_dataset, sup_dataset, mono_datasets, paired_datasets, mono_splits, paired_splits, epoch=1, train=True): + assert len(mono_datasets+paired_datasets) > 0, f"No text data loaded!" + + if len(mono_datasets) > 1 and self.cfg.text_sampling_alpha != 1.0: + size_ratios = self._get_size_ratios( + mono_splits, [len(s) for s in mono_datasets], alpha=self.cfg.text_sampling_alpha + ) + mono_datasets = [ + ResamplingDataset( + d, size_ratio=r, seed=0, epoch=epoch, replace=(r >= 1.0) + ) for d, r in zip(mono_datasets, size_ratios) + ] + + if len(paired_datasets) > 1 and self.cfg.text_sampling_alpha != 1.0: + size_ratios = self._get_size_ratios( + paired_splits, [len(s) for s in paired_datasets], alpha=self.cfg.text_sampling_alpha + ) + paired_datasets = [ + ResamplingDataset( + d, size_ratio=r, seed=0, epoch=epoch, replace=(r >= 1.0) + ) for d, r in zip(paired_datasets, size_ratios) + ] + + dataset_list = [speech_dataset, sup_dataset] + for datasets in [mono_datasets, paired_datasets]: + if len(datasets) > 1: + dataset_list.append(ConcatDataset(datasets)) + elif len(datasets) == 1: + dataset_list.append(datasets[0]) + else: + dataset_list.append(None) + + ### match speech/text datasets according to modality + dataset_dict = OrderedDict((name, d) for name, d in zip(["speech", "speech_sup", "text_mono", "text_paired"], dataset_list) if d is not None) + max_positions_dict = { + "speech": None, + "speech_sup": None, + "text_mono": (self.cfg.text_cfg.tokens_per_sample, self.cfg.text_cfg.tokens_per_sample), + "text_paired": (self.cfg.text_cfg.tokens_per_sample, self.cfg.text_cfg.tokens_per_sample), + } + max_positions_dict = OrderedDict((name, max_positions_dict[name]) for name in dataset_dict.keys()) + max_tokens_ratios_dict = { + "speech": 1.0, + "speech_sup": 1.0, + "text_mono": 1.0 / 320 / self.cfg.text_cfg.text_maxtokens_ratio, + "text_paired": 1.0 / 320 / self.cfg.text_cfg.text_maxtokens_ratio, + } + max_tokens_ratios = [max_tokens_ratios_dict[name] for name in dataset_dict.keys()] + dataset_lens = np.array([len(dataset) for dataset in dataset_dict.values()]) + dataset_avg_sample_lens = np.array([ + sum([dataset.num_tokens(i) for i in np.random.randint(low=0, high=len(dataset), size=10000)]) / 10000.0 + for dataset in dataset_dict.values() + ]) + + if not "speech" in dataset_dict: + distributions = [l / sum(dataset_lens) for l in dataset_lens] + else: + ## we just keep the batches of speech and non-speech the same, expand_coef is to ensure speech batches is less than others + first_ratio = dataset_lens[0] / sum(dataset_lens) + expand_coef = 1.2 if sup_dataset is None else 1.1 * sum(dataset_lens[0:2]) / dataset_lens[0] + distributions = [expand_coef * max_tokens_ratios[i] * dataset_avg_sample_lens[0] / l for (i, l) in enumerate(dataset_avg_sample_lens)] + distributions[0] = 1.0 + if sup_dataset is not None: + distributions[1] = dataset_lens[1] / dataset_lens[0] + distributions = [first_ratio * d for d in distributions] + + logging.info(f"Number samples of datasets is {dataset_lens}") + logging.info(f"Avg sample length of datasets is 
{dataset_avg_sample_lens}") + logging.info(f"Sampling distributions is {distributions}") + logging.info(f"Maxtokens ratio is {max_tokens_ratios}") + return dataset_dict, max_positions_dict, distributions, max_tokens_ratios + + def build_tokenizer(self, cfg=None): + logger.info(f"tokenizer: {self.cfg.hubert_tokenizer}") + if self.cfg.hubert_tokenizer != "none": + return encoders.build_bpe(Namespace(**{"bpe": self.cfg.hubert_tokenizer, "sentencepiece_model": self.cfg.sp_path})) + else: + return None + + def load_char_bart_dataset(self, split): + mono_dataset = data_utils.load_indexed_dataset( + f"{self.cfg.text_cfg.text_data}/{split}", + self.text_dictionary, + ) + mono_dataset = StripTokenDataset(mono_dataset, self.text_dictionary.eos()) + mono_dataset = maybe_shorten_dataset( + mono_dataset, + split, + self.cfg.text_cfg.shorten_data_split_list, + self.cfg.text_cfg.shorten_method, + self.cfg.text_cfg.tokens_per_sample - 2, + self.cfg.text_cfg.seed, + ) + logger.info("loaded {} samples from: {}".format(len(mono_dataset), mono_dataset)) + ### prepend bos and eos to dataset + mono_dataset = PrependTokenDataset(mono_dataset, self.text_dictionary.bos()) + mono_dataset = AppendTokenDataset(mono_dataset, self.text_dictionary.eos()) + mask_whole_words = ( + get_whole_word_mask(None, self.text_dictionary) + if self.cfg.text_cfg.mask_whole_words + else None + ) + lang=self.cfg.speech_tgt_lang + mono_dataset = DenoisingDataset( + mono_dataset, + mono_dataset.sizes, + self.text_dictionary, + self.mask_idx, + mask_whole_words, + shuffle=self.cfg.text_cfg.shuffle_instance, + seed=self.cfg.text_cfg.seed, + args=self.cfg.text_cfg, + tgt_lang_idx=_lang_token_index(self.text_dictionary, lang) if self.cfg.text_cfg.prepend_tgt_lang_tag else None, + ) + + return mono_dataset diff --git a/VATLM/README.md b/VATLM/README.md new file mode 100644 index 0000000000000000000000000000000000000000..0594e6715ebc8346ec888fd388dd5775cfe99e38 --- /dev/null +++ b/VATLM/README.md @@ -0,0 +1,135 @@ +# VATLM +<!--**Pre-trained models for speech related tasks**--> + + [**VATLM: Visual-Audio-Text Pre-Training with Unified Masked Prediction for Speech Representation Learning**](https://arxiv.org/abs/2211.11275) + + +- (Done) Nov. 2022: release the code and models +- Nov. 
2022: release preprint in [arXiv](https://arxiv.org/abs/2211.11275)
+
+## Pre-Trained and Fine-tuned Models
+
+| Model | Pre-training Dataset | Fine-tuning Dataset | Download |
+| :---------: | :----------------------------------------: | :-------------------: | :----------------------------------------------------------: |
+| VatLM Base | LRS3 + paired audio+text+audio | - | [Google drive](https://drive.google.com/file/d/121ITJc22prpbd4sCy9bPWpdkKgGikkgm/view?usp=share_link) |
+| VatLM Base | LRS3 + paired audio+text+audio | LRS-30h audio-visual | [Google drive](https://drive.google.com/file/d/1Bfbq0G-tASw3YrI3rzdpYgTE-UV-YaN0/view?usp=share_link) |
+| VatLM Base | LRS3 + paired audio+text+audio | LRS-30h visual | [Google drive](https://drive.google.com/file/d/1qALD9obym0zCDoszVn2CzW0U3EUl-4v7/view?usp=share_link) |
+| VatLM Base | VoxCeleb2 + LRS3 + paired audio+text+audio | - | [Google drive](https://drive.google.com/file/d/1piae9Row25OEfAekVz5Bxb9YnIVyEP0A/view?usp=share_link) |
+| VatLM Base | VoxCeleb2 + LRS3 + paired audio+text+audio | LRS-30h audio-visual | [Google drive](https://drive.google.com/file/d/13JVuUi9gIIoUM888XcAOzvN7ioazn-cv/view?usp=share_link) |
+| VatLM Base | VoxCeleb2 + LRS3 + paired audio+text+audio | LRS-30h visual | [Google drive](https://drive.google.com/file/d/1pAQHf60HgqDORGzyqEjdGTIywLKO3Ko5/view?usp=share_link) |
+| VatLM Base | VoxCeleb2 + LRS3 + paired audio+text+audio | LRS-433h audio-visual | [Google drive](https://drive.google.com/file/d/1u9oMnivBelxznQcMDoM_u5EOfJuxnSuL/view?usp=share_link) |
+| VatLM Base | VoxCeleb2 + LRS3 + paired audio+text+audio | LRS-433h visual | [Google drive](https://drive.google.com/file/d/1g107k5tL3XyvevSe0BzMqYOQFyFQG7jf/view?usp=share_link) |
+| VatLM Large | VoxCeleb2 + LRS3 + paired audio+text+audio | - | [Google drive](https://drive.google.com/file/d/1_vbVFpKcaaPcCx2FtI-GyzVvxAhppg_b/view?usp=share_link) |
+| VatLM Large | VoxCeleb2 + LRS3 + paired audio+text+audio | LRS-30h audio-visual | [Google drive](https://drive.google.com/file/d/1LyTCxceTZIqjVdMY6hlJjWolaIAZ0Mhs/view?usp=share_link) |
+| VatLM Large | VoxCeleb2 + LRS3 + paired audio+text+audio | LRS-30h visual | [Google drive](https://drive.google.com/file/d/1CuyGg5O14F9Y_WCwpCVoKYbDKVtjBRQU/view?usp=share_link) |
+| VatLM Large | VoxCeleb2 + LRS3 + paired audio+text+audio | LRS-433h audio-visual | [Google drive](https://drive.google.com/file/d/12orvO3xBuzdUDrBOqjW0mdGhV2Kmsy0Q/view?usp=share_link) |
+| VatLM Large | VoxCeleb2 + LRS3 + paired audio+text+audio | LRS-433h visual | [Google drive](https://drive.google.com/file/d/17DDTUPs0BkaJtSUTiJHLBbymt2LCGo6e/view?usp=share_link) |
+
+## Setup
+
+To fine-tune or pre-train more models, please follow the instructions below.
+
+```bash
+git clone https://github.com/microsoft/SpeechT5.git
+cd SpeechT5/VATLM
+git submodule init && git submodule update
+
+cd fairseq && pip install --editable ./
+cd ../vat_hubert && pip install -r requirements.txt
+```
+
+## Data preparation
+
+1. For audio or visual data, please follow the steps of AV-HuBERT's [preparation script](https://github.com/facebookresearch/av_hubert/tree/main/avhubert/preparation) to pre-process the data and obtain the corresponding `train.tsv` and `train.km` files.
+
+2.
For unimodal audio data, the visual modality is replaced with a zero vector; the audio features are extracted according to this [script](https://github.com/facebookresearch/av_hubert/tree/main/avhubert/preparation), and then kmeans [clustering](https://github.com/facebookresearch/av_hubert/tree/main/avhubert/clustering) is performed to get the corresponding labels. + +3. For unimodal text data, we use a small amount of paired text-audio data to obtain paired phone-unit data: the phoneme sequences are obtained by looking up the [lexicon](https://drive.google.com/file/d/1dh9NEx_cCF9_Aa0UcKyl9j00GXs6LmLQ/view?usp=sharing) (a minimal lookup sketch is given at the end of this README), and the unit data are obtained by extracting features and performing kmeans [clustering](https://github.com/facebookresearch/av_hubert/tree/main/avhubert/clustering). Then follow this [script](https://github.com/microsoft/SpeechT5/tree/main/SpeechLM#hidden-unit-tokenizer-for-text) to train the phone2unit model. + +## Pre-train + +- VatLM Base model (LRS3 + paired audio+text+audio) + + ```shell + cd VATLM/vat_hubert/vathubert/scripts/pretrain + ngpu=32 + updatefreq=1 + save_path=/path/to/save_path + + bash base_lsr3_pretrain_iter5.sh ${ngpu} ${updatefreq} ${save_path} + ``` + +- VatLM Base model (VoxCeleb2 + paired audio+text+audio) + + ```shell + cd VATLM/vat_hubert/vathubert/scripts/pretrain + ngpu=32 + updatefreq=1 + save_path=/path/to/save_path + + bash base_vox_pretrain_iter5.sh ${ngpu} ${updatefreq} ${save_path} + ``` + +- VatLM Large model (VoxCeleb2 + paired audio+text+audio) + + ```shell + cd VATLM/vat_hubert/vathubert/scripts/pretrain + ngpu=32 + updatefreq=2 + save_path=/path/to/save_path + + bash large_vox_pretrain_iter5.sh ${ngpu} ${updatefreq} ${save_path} + ``` + +## Fine-tune AVSR/VSR + +For example, the AVSR model can be obtained by fine-tuning the pre-trained VatLM model on 30 hours of labeled data: + +```shell +cd VATLM/vat_hubert/vathubert/scripts/finetune_avsr +ngpu=8 +updatefreq=1 +save_path=/path/to/save_path + +bash base_lrs3_finetune30_av.sh ${ngpu} ${updatefreq} ${save_path} +``` + +## Decode + +For example, to decode the fine-tuned AVSR model: + +```sh +cd VATLM/vat_hubert/vathubert/ +data="test" +bash decode_avhubert_lrs3.sh ${data} +``` + +## License + +This project is licensed under the license found in the LICENSE file in the root directory of this source tree. +Portions of the source code are based on the [FAIRSEQ](https://github.com/pytorch/fairseq) and [av_hubert](https://github.com/facebookresearch/av_hubert) projects. + +[Microsoft Open Source Code of Conduct](https://opensource.microsoft.com/codeofconduct) + +## Reference + +If you find our work useful in your research, please cite the following paper: + +```bibtex +@article{zhu2022vatlm, + title={VATLM: Visual-Audio-Text Pre-Training with Unified Masked Prediction for Speech Representation Learning}, + author={Qiushi Zhu and Long Zhou and Ziqiang Zhang and Shujie Liu and Binxing Jiao and Jie Zhang and Lirong Dai and Daxin Jiang and Jinyu Li and Furu Wei}, + year={2022}, + eprint={2211.11275}, + archivePrefix={arXiv}, +} +``` + +### Contact Information + +For help or issues using VatLM models, please submit a GitHub issue. + +For other communications related to VatLM, please contact Long Zhou (`lozhou@microsoft.com`). 
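+### Appendix: lexicon lookup sketch
+
+The snippet below is a rough illustration of the word-to-phoneme lookup mentioned in data preparation step 3; it is not part of the released scripts. It assumes a CMU-style lexicon with one `WORD PH0 PH1 ...` entry per line, and the file names `lexicon.txt`, `train.wrd`, `train.phn` are placeholders.
+
+```python
+# Map word transcripts to phoneme sequences via a pronunciation lexicon.
+# Unknown words fall back to an <UNK> symbol so alignment lengths stay defined.
+def load_lexicon(path):
+    lexicon = {}
+    with open(path, encoding="utf-8") as f:
+        for line in f:
+            parts = line.strip().split()
+            if parts and parts[0] not in lexicon:
+                lexicon[parts[0]] = parts[1:]
+    return lexicon
+
+def words_to_phones(transcript, lexicon, unk="<UNK>"):
+    phones = []
+    for word in transcript.strip().upper().split():
+        phones.extend(lexicon.get(word, [unk]))
+    return phones
+
+if __name__ == "__main__":
+    lex = load_lexicon("lexicon.txt")  # placeholder paths
+    with open("train.wrd", encoding="utf-8") as fin, open("train.phn", "w", encoding="utf-8") as fout:
+        for line in fin:
+            fout.write(" ".join(words_to_phones(line, lex)) + "\n")
+```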
+ diff --git a/VATLM/vat_hubert/requirements.txt b/VATLM/vat_hubert/requirements.txt new file mode 100644 index 0000000000000000000000000000000000000000..a177256afd477279936017c3830ffefef5e6ccc3 --- /dev/null +++ b/VATLM/vat_hubert/requirements.txt @@ -0,0 +1,6 @@ +python-speech-features==0.6 +scipy==1.5.4 +opencv-python==4.5.4.60 +sentencepiece==0.1.96 +editdistance==0.6.0 +kaldiio==2.17.2 \ No newline at end of file diff --git a/VATLM/vat_hubert/vathubert/__init__.py b/VATLM/vat_hubert/vathubert/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..1e16cee6d480e2dd8a959a3b1c30d410cf5b008d --- /dev/null +++ b/VATLM/vat_hubert/vathubert/__init__.py @@ -0,0 +1,11 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +# from .hubert import * # noqa +# from .hubert_asr import * # noqa +# from .hubert_dataset import * +# from .hubert_pretraining import * +# from .hubert_criterion import * +from . import data, tasks, criterions, models \ No newline at end of file diff --git a/VATLM/vat_hubert/vathubert/conf/finetune/base_lrs3_30h_av.yaml b/VATLM/vat_hubert/vathubert/conf/finetune/base_lrs3_30h_av.yaml new file mode 100644 index 0000000000000000000000000000000000000000..fdae821e66bc580e08c6f179ee94a2d799e586a7 --- /dev/null +++ b/VATLM/vat_hubert/vathubert/conf/finetune/base_lrs3_30h_av.yaml @@ -0,0 +1,118 @@ +# @package _group_ + +common: + fp16: true + log_format: json + log_interval: 200 + tensorboard_logdir: tblog + seed: 1337 + user_dir: ??? + +checkpoint: + save_interval: 2 + keep_interval_updates: 1 + no_epoch_checkpoints: true + best_checkpoint_metric: accuracy + maximize_best_checkpoint_metric: true + +distributed_training: + ddp_backend: c10d + find_unused_parameters: true + distributed_world_size: 8 + # distributed_port: 29671 + nprocs_per_node: 8 + +task: + _name: vat_hubert_pretraining + is_s2s: true + data: ??? + label_dir: ??? + tokenizer_bpe_model: ??? + normalize: true # must be consistent with pre-training + labels: ["wrd"] + single_target: true + fine_tuning: true + stack_order_audio: 4 + tokenizer_bpe_name: sentencepiece + max_sample_size: 500 + modalities: ["video","audio"] + image_aug: true + pad_audio: true + random_crop: false + +dataset: + num_workers: 6 + max_tokens: 2000 + validate_after_updates: 0 + validate_interval: 2 + train_subset: train + valid_subset: valid + +criterion: + _name: label_smoothed_cross_entropy + report_accuracy: true + label_smoothing: 0.1 + +optimization: + max_update: 50000 + lr: [0.001] + sentence_avg: true + update_freq: [1] + +optimizer: + _name: adam + adam_betas: (0.9,0.98) + adam_eps: 1e-08 + +lr_scheduler: + _name: tri_stage + warmup_steps: 15000 + hold_steps: 0 + decay_steps: 20000 + final_lr_scale: 0.05 + +model: + _name: vat_hubert_seq2seq + w2v_path: ??? 
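+  # w2v_path should point to the pre-trained VatLM checkpoint that is being fine-tuned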
+ apply_mask: false + mask_selection: static + mask_length: 10 + mask_other: 0 + mask_prob: 0.75 + mask_channel_selection: static + mask_channel_length: 64 + mask_channel_other: 0 + mask_channel_prob: 0.5 + layerdrop: 0.1 + dropout: 0.0 + activation_dropout: 0.1 + attention_dropout: 0.0 + feature_grad_mult: 1.0 + decoder_layers: 6 + decoder_dropout: 0.1 + decoder_attention_dropout: 0.0 + decoder_activation_dropout: 0.1 + freeze_finetune_updates: 30000 + share_decoder_input_output_embed: true + decoder_normalize_before: true + +hydra: + job: + config: + override_dirname: + kv_sep: '-' + item_sep: '__' + exclude_keys: + - run + - task.data + - task.label_dir + - model.w2v_path + - dataset.train_subset + - dataset.valid_subset + - criterion.wer_kenlm_model + - criterion.wer_lexicon + run: + dir: ??? + sweep: + dir: ??? + subdir: ${hydra.job.config_name}__${hydra.job.override_dirname} diff --git a/VATLM/vat_hubert/vathubert/conf/finetune/base_lrs3_30h_v.yaml b/VATLM/vat_hubert/vathubert/conf/finetune/base_lrs3_30h_v.yaml new file mode 100644 index 0000000000000000000000000000000000000000..1672dbb682d0d20cee32b111268952ff7dbf6f11 --- /dev/null +++ b/VATLM/vat_hubert/vathubert/conf/finetune/base_lrs3_30h_v.yaml @@ -0,0 +1,118 @@ +# @package _group_ + +common: + fp16: true + log_format: json + log_interval: 200 + tensorboard_logdir: tblog + seed: 1337 + user_dir: ??? + +checkpoint: + save_interval: 2 + keep_interval_updates: 1 + no_epoch_checkpoints: true + best_checkpoint_metric: accuracy + maximize_best_checkpoint_metric: true + +distributed_training: + ddp_backend: c10d + find_unused_parameters: true + distributed_world_size: 8 + # distributed_port: 29671 + nprocs_per_node: 8 + +task: + _name: vat_hubert_pretraining + is_s2s: true + data: ??? + label_dir: ??? + tokenizer_bpe_model: ??? + normalize: true # must be consistent with pre-training + labels: ["wrd"] + single_target: true + fine_tuning: true + stack_order_audio: 4 + tokenizer_bpe_name: sentencepiece + max_sample_size: 500 + modalities: ["video"] + image_aug: true + pad_audio: true + random_crop: false + +dataset: + num_workers: 6 + max_tokens: 2000 + validate_after_updates: 0 + validate_interval: 2 + train_subset: train + valid_subset: valid + +criterion: + _name: label_smoothed_cross_entropy + report_accuracy: true + label_smoothing: 0.1 + +optimization: + max_update: 40000 + lr: [0.001] + sentence_avg: true + update_freq: [1] + +optimizer: + _name: adam + adam_betas: (0.9,0.98) + adam_eps: 1e-08 + +lr_scheduler: + _name: tri_stage + warmup_steps: 10000 + hold_steps: 0 + decay_steps: 20000 + final_lr_scale: 0.05 + +model: + _name: vat_hubert_seq2seq + w2v_path: ??? + apply_mask: false + mask_selection: static + mask_length: 10 + mask_other: 0 + mask_prob: 0.75 + mask_channel_selection: static + mask_channel_length: 64 + mask_channel_other: 0 + mask_channel_prob: 0.5 + layerdrop: 0.1 + dropout: 0.0 + activation_dropout: 0.1 + attention_dropout: 0.0 + feature_grad_mult: 1.0 + decoder_layers: 6 + decoder_dropout: 0.1 + decoder_attention_dropout: 0.0 + decoder_activation_dropout: 0.1 + freeze_finetune_updates: 24000 + share_decoder_input_output_embed: true + decoder_normalize_before: true + +hydra: + job: + config: + override_dirname: + kv_sep: '-' + item_sep: '__' + exclude_keys: + - run + - task.data + - task.label_dir + - model.w2v_path + - dataset.train_subset + - dataset.valid_subset + - criterion.wer_kenlm_model + - criterion.wer_lexicon + run: + dir: ??? + sweep: + dir: ??? 
+ subdir: ${hydra.job.config_name}__${hydra.job.override_dirname} diff --git a/VATLM/vat_hubert/vathubert/conf/finetune/base_vox_30h_av.yaml b/VATLM/vat_hubert/vathubert/conf/finetune/base_vox_30h_av.yaml new file mode 100644 index 0000000000000000000000000000000000000000..fdae821e66bc580e08c6f179ee94a2d799e586a7 --- /dev/null +++ b/VATLM/vat_hubert/vathubert/conf/finetune/base_vox_30h_av.yaml @@ -0,0 +1,118 @@ +# @package _group_ + +common: + fp16: true + log_format: json + log_interval: 200 + tensorboard_logdir: tblog + seed: 1337 + user_dir: ??? + +checkpoint: + save_interval: 2 + keep_interval_updates: 1 + no_epoch_checkpoints: true + best_checkpoint_metric: accuracy + maximize_best_checkpoint_metric: true + +distributed_training: + ddp_backend: c10d + find_unused_parameters: true + distributed_world_size: 8 + # distributed_port: 29671 + nprocs_per_node: 8 + +task: + _name: vat_hubert_pretraining + is_s2s: true + data: ??? + label_dir: ??? + tokenizer_bpe_model: ??? + normalize: true # must be consistent with pre-training + labels: ["wrd"] + single_target: true + fine_tuning: true + stack_order_audio: 4 + tokenizer_bpe_name: sentencepiece + max_sample_size: 500 + modalities: ["video","audio"] + image_aug: true + pad_audio: true + random_crop: false + +dataset: + num_workers: 6 + max_tokens: 2000 + validate_after_updates: 0 + validate_interval: 2 + train_subset: train + valid_subset: valid + +criterion: + _name: label_smoothed_cross_entropy + report_accuracy: true + label_smoothing: 0.1 + +optimization: + max_update: 50000 + lr: [0.001] + sentence_avg: true + update_freq: [1] + +optimizer: + _name: adam + adam_betas: (0.9,0.98) + adam_eps: 1e-08 + +lr_scheduler: + _name: tri_stage + warmup_steps: 15000 + hold_steps: 0 + decay_steps: 20000 + final_lr_scale: 0.05 + +model: + _name: vat_hubert_seq2seq + w2v_path: ??? + apply_mask: false + mask_selection: static + mask_length: 10 + mask_other: 0 + mask_prob: 0.75 + mask_channel_selection: static + mask_channel_length: 64 + mask_channel_other: 0 + mask_channel_prob: 0.5 + layerdrop: 0.1 + dropout: 0.0 + activation_dropout: 0.1 + attention_dropout: 0.0 + feature_grad_mult: 1.0 + decoder_layers: 6 + decoder_dropout: 0.1 + decoder_attention_dropout: 0.0 + decoder_activation_dropout: 0.1 + freeze_finetune_updates: 30000 + share_decoder_input_output_embed: true + decoder_normalize_before: true + +hydra: + job: + config: + override_dirname: + kv_sep: '-' + item_sep: '__' + exclude_keys: + - run + - task.data + - task.label_dir + - model.w2v_path + - dataset.train_subset + - dataset.valid_subset + - criterion.wer_kenlm_model + - criterion.wer_lexicon + run: + dir: ??? + sweep: + dir: ??? + subdir: ${hydra.job.config_name}__${hydra.job.override_dirname} diff --git a/VATLM/vat_hubert/vathubert/conf/finetune/base_vox_30h_v.yaml b/VATLM/vat_hubert/vathubert/conf/finetune/base_vox_30h_v.yaml new file mode 100644 index 0000000000000000000000000000000000000000..fdae821e66bc580e08c6f179ee94a2d799e586a7 --- /dev/null +++ b/VATLM/vat_hubert/vathubert/conf/finetune/base_vox_30h_v.yaml @@ -0,0 +1,118 @@ +# @package _group_ + +common: + fp16: true + log_format: json + log_interval: 200 + tensorboard_logdir: tblog + seed: 1337 + user_dir: ??? 
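+  # user_dir should point to the vathubert code directory so the custom task, model and criterion are registered with fairseq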
+ +checkpoint: + save_interval: 2 + keep_interval_updates: 1 + no_epoch_checkpoints: true + best_checkpoint_metric: accuracy + maximize_best_checkpoint_metric: true + +distributed_training: + ddp_backend: c10d + find_unused_parameters: true + distributed_world_size: 8 + # distributed_port: 29671 + nprocs_per_node: 8 + +task: + _name: vat_hubert_pretraining + is_s2s: true + data: ??? + label_dir: ??? + tokenizer_bpe_model: ??? + normalize: true # must be consistent with pre-training + labels: ["wrd"] + single_target: true + fine_tuning: true + stack_order_audio: 4 + tokenizer_bpe_name: sentencepiece + max_sample_size: 500 + modalities: ["video","audio"] + image_aug: true + pad_audio: true + random_crop: false + +dataset: + num_workers: 6 + max_tokens: 2000 + validate_after_updates: 0 + validate_interval: 2 + train_subset: train + valid_subset: valid + +criterion: + _name: label_smoothed_cross_entropy + report_accuracy: true + label_smoothing: 0.1 + +optimization: + max_update: 50000 + lr: [0.001] + sentence_avg: true + update_freq: [1] + +optimizer: + _name: adam + adam_betas: (0.9,0.98) + adam_eps: 1e-08 + +lr_scheduler: + _name: tri_stage + warmup_steps: 15000 + hold_steps: 0 + decay_steps: 20000 + final_lr_scale: 0.05 + +model: + _name: vat_hubert_seq2seq + w2v_path: ??? + apply_mask: false + mask_selection: static + mask_length: 10 + mask_other: 0 + mask_prob: 0.75 + mask_channel_selection: static + mask_channel_length: 64 + mask_channel_other: 0 + mask_channel_prob: 0.5 + layerdrop: 0.1 + dropout: 0.0 + activation_dropout: 0.1 + attention_dropout: 0.0 + feature_grad_mult: 1.0 + decoder_layers: 6 + decoder_dropout: 0.1 + decoder_attention_dropout: 0.0 + decoder_activation_dropout: 0.1 + freeze_finetune_updates: 30000 + share_decoder_input_output_embed: true + decoder_normalize_before: true + +hydra: + job: + config: + override_dirname: + kv_sep: '-' + item_sep: '__' + exclude_keys: + - run + - task.data + - task.label_dir + - model.w2v_path + - dataset.train_subset + - dataset.valid_subset + - criterion.wer_kenlm_model + - criterion.wer_lexicon + run: + dir: ??? + sweep: + dir: ??? + subdir: ${hydra.job.config_name}__${hydra.job.override_dirname} diff --git a/VATLM/vat_hubert/vathubert/conf/finetune/base_vox_433h_av.yaml b/VATLM/vat_hubert/vathubert/conf/finetune/base_vox_433h_av.yaml new file mode 100644 index 0000000000000000000000000000000000000000..f39bda27b7b2dd0d217864bf2d27bec27ae59f65 --- /dev/null +++ b/VATLM/vat_hubert/vathubert/conf/finetune/base_vox_433h_av.yaml @@ -0,0 +1,118 @@ +# @package _group_ + +common: + fp16: true + log_format: json + log_interval: 200 + tensorboard_logdir: tblog + seed: 1337 + user_dir: ??? + +checkpoint: + save_interval: 2 + keep_interval_updates: 1 + no_epoch_checkpoints: true + best_checkpoint_metric: accuracy + maximize_best_checkpoint_metric: true + +distributed_training: + ddp_backend: c10d + find_unused_parameters: true + distributed_world_size: 8 + # distributed_port: 29671 + nprocs_per_node: 8 + +task: + _name: vat_hubert_pretraining + is_s2s: true + data: ??? + label_dir: ??? + tokenizer_bpe_model: ??? 
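+  # tokenizer_bpe_model: sentencepiece model used to tokenize the word ("wrd") targets for the seq2seq decoder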
+ normalize: true # must be consistent with pre-training + labels: ["wrd"] + single_target: true + fine_tuning: true + stack_order_audio: 4 + tokenizer_bpe_name: sentencepiece + max_sample_size: 500 + modalities: ["video","audio"] + image_aug: true + pad_audio: true + random_crop: false + +dataset: + num_workers: 6 + max_tokens: 2000 + validate_after_updates: 0 + validate_interval: 2 + train_subset: train + valid_subset: valid + +criterion: + _name: label_smoothed_cross_entropy + report_accuracy: true + label_smoothing: 0.1 + +optimization: + max_update: 60000 + lr: [0.001] + sentence_avg: true + update_freq: [1] + +optimizer: + _name: adam + adam_betas: (0.9,0.98) + adam_eps: 1e-08 + +lr_scheduler: + _name: tri_stage + warmup_steps: 20000 + hold_steps: 0 + decay_steps: 40000 + final_lr_scale: 0.05 + +model: + _name: vat_hubert_seq2seq + w2v_path: ??? + apply_mask: false + mask_selection: static + mask_length: 10 + mask_other: 0 + mask_prob: 0.75 + mask_channel_selection: static + mask_channel_length: 64 + mask_channel_other: 0 + mask_channel_prob: 0.5 + layerdrop: 0.1 + dropout: 0.0 + activation_dropout: 0.1 + attention_dropout: 0.0 + feature_grad_mult: 1.0 + decoder_layers: 6 + decoder_dropout: 0.1 + decoder_attention_dropout: 0.0 + decoder_activation_dropout: 0.1 + freeze_finetune_updates: 48000 + share_decoder_input_output_embed: true + decoder_normalize_before: true + +hydra: + job: + config: + override_dirname: + kv_sep: '-' + item_sep: '__' + exclude_keys: + - run + - task.data + - task.label_dir + - model.w2v_path + - dataset.train_subset + - dataset.valid_subset + - criterion.wer_kenlm_model + - criterion.wer_lexicon + run: + dir: ??? + sweep: + dir: ??? + subdir: ${hydra.job.config_name}__${hydra.job.override_dirname} diff --git a/VATLM/vat_hubert/vathubert/conf/finetune/base_vox_433h_v.yaml b/VATLM/vat_hubert/vathubert/conf/finetune/base_vox_433h_v.yaml new file mode 100644 index 0000000000000000000000000000000000000000..773d638f28809f3dfbfbcf34ab08c0ed4729a133 --- /dev/null +++ b/VATLM/vat_hubert/vathubert/conf/finetune/base_vox_433h_v.yaml @@ -0,0 +1,118 @@ +# @package _group_ + +common: + fp16: true + log_format: json + log_interval: 200 + tensorboard_logdir: tblog + seed: 1337 + user_dir: ??? + +checkpoint: + save_interval: 2 + keep_interval_updates: 1 + no_epoch_checkpoints: true + best_checkpoint_metric: accuracy + maximize_best_checkpoint_metric: true + +distributed_training: + ddp_backend: c10d + find_unused_parameters: true + distributed_world_size: 8 + # distributed_port: 29671 + nprocs_per_node: 8 + +task: + _name: vat_hubert_pretraining + is_s2s: true + data: ??? + label_dir: ??? + tokenizer_bpe_model: ??? + normalize: true # must be consistent with pre-training + labels: ["wrd"] + single_target: true + fine_tuning: true + stack_order_audio: 4 + tokenizer_bpe_name: sentencepiece + max_sample_size: 500 + modalities: ["video"] + image_aug: true + pad_audio: true + random_crop: false + +dataset: + num_workers: 6 + max_tokens: 2000 + validate_after_updates: 0 + validate_interval: 2 + train_subset: train + valid_subset: valid + +criterion: + _name: label_smoothed_cross_entropy + report_accuracy: true + label_smoothing: 0.1 + +optimization: + max_update: 30000 + lr: [0.001] + sentence_avg: true + update_freq: [1] + +optimizer: + _name: adam + adam_betas: (0.9,0.98) + adam_eps: 1e-08 + +lr_scheduler: + _name: tri_stage + warmup_steps: 10000 + hold_steps: 0 + decay_steps: 20000 + final_lr_scale: 0.05 + +model: + _name: vat_hubert_seq2seq + w2v_path: ??? 
+ apply_mask: false + mask_selection: static + mask_length: 10 + mask_other: 0 + mask_prob: 0.75 + mask_channel_selection: static + mask_channel_length: 64 + mask_channel_other: 0 + mask_channel_prob: 0.5 + layerdrop: 0.1 + dropout: 0.0 + activation_dropout: 0.1 + attention_dropout: 0.0 + feature_grad_mult: 1.0 + decoder_layers: 6 + decoder_dropout: 0.1 + decoder_attention_dropout: 0.0 + decoder_activation_dropout: 0.1 + freeze_finetune_updates: 18000 + share_decoder_input_output_embed: true + decoder_normalize_before: true + +hydra: + job: + config: + override_dirname: + kv_sep: '-' + item_sep: '__' + exclude_keys: + - run + - task.data + - task.label_dir + - model.w2v_path + - dataset.train_subset + - dataset.valid_subset + - criterion.wer_kenlm_model + - criterion.wer_lexicon + run: + dir: ??? + sweep: + dir: ??? + subdir: ${hydra.job.config_name}__${hydra.job.override_dirname} diff --git a/VATLM/vat_hubert/vathubert/conf/finetune/large_vox_30h_av.yaml b/VATLM/vat_hubert/vathubert/conf/finetune/large_vox_30h_av.yaml new file mode 100644 index 0000000000000000000000000000000000000000..f712945a2616fef0c9a3f6bb40b1f7299a8c2182 --- /dev/null +++ b/VATLM/vat_hubert/vathubert/conf/finetune/large_vox_30h_av.yaml @@ -0,0 +1,124 @@ +# @package _group_ + +common: + fp16: true + log_format: json + log_interval: 200 + tensorboard_logdir: tblog + seed: 1337 + user_dir: ??? + +checkpoint: + save_interval: 2 + keep_interval_updates: 1 + no_epoch_checkpoints: true + best_checkpoint_metric: accuracy + maximize_best_checkpoint_metric: true + +distributed_training: + ddp_backend: c10d + find_unused_parameters: true + distributed_world_size: 8 + # distributed_port: 29671 + nprocs_per_node: 8 + +task: + _name: vat_hubert_pretraining + is_s2s: true + data: ??? + label_dir: ??? + tokenizer_bpe_model: ??? + normalize: true # must be consistent with pre-training + labels: ["wrd"] + single_target: true + fine_tuning: true + stack_order_audio: 4 + tokenizer_bpe_name: sentencepiece + max_sample_size: 500 + modalities: ["video","audio"] + image_aug: true + pad_audio: true + random_crop: false + # noise_prob: 0.25 + # noise_snr: 0 + # noise_wav: ??? + +dataset: + num_workers: 6 + max_tokens: 1000 + validate_after_updates: 0 + validate_interval: 2 + train_subset: train + valid_subset: valid + +criterion: + _name: label_smoothed_cross_entropy + report_accuracy: true + label_smoothing: 0.1 + +optimization: + max_update: 60000 + lr: [0.001] + sentence_avg: true + update_freq: [1] + +optimizer: + _name: adam + adam_betas: (0.9,0.98) + adam_eps: 1e-08 + +lr_scheduler: + _name: tri_stage + warmup_steps: 20000 + hold_steps: 0 + decay_steps: 40000 + final_lr_scale: 0.05 + +model: + _name: vat_hubert_seq2seq + w2v_path: ??? 
+ apply_mask: false + mask_selection: static + mask_length: 10 + mask_other: 0 + mask_prob: 0.75 + mask_channel_selection: static + mask_channel_length: 64 + mask_channel_other: 0 + mask_channel_prob: 0.5 + layerdrop: 0.1 + dropout: 0.0 + activation_dropout: 0.1 + attention_dropout: 0.0 + feature_grad_mult: 1.0 + decoder_layers: 9 + decoder_dropout: 0.1 + decoder_attention_dropout: 0.0 + decoder_activation_dropout: 0.1 + freeze_finetune_updates: 48000 + share_decoder_input_output_embed: true + decoder_normalize_before: true + decoder_embed_dim: 1024 + decoder_ffn_embed_dim: 4096 + decoder_attention_heads: 8 + +hydra: + job: + config: + override_dirname: + kv_sep: '-' + item_sep: '__' + exclude_keys: + - run + - task.data + - task.label_dir + - model.w2v_path + - dataset.train_subset + - dataset.valid_subset + - criterion.wer_kenlm_model + - criterion.wer_lexicon + run: + dir: ??? + sweep: + dir: ??? + subdir: ${hydra.job.config_name}__${hydra.job.override_dirname} diff --git a/VATLM/vat_hubert/vathubert/conf/finetune/large_vox_30h_v.yaml b/VATLM/vat_hubert/vathubert/conf/finetune/large_vox_30h_v.yaml new file mode 100644 index 0000000000000000000000000000000000000000..f40166828236122a8deca9514680b65a1cd3251d --- /dev/null +++ b/VATLM/vat_hubert/vathubert/conf/finetune/large_vox_30h_v.yaml @@ -0,0 +1,121 @@ +# @package _group_ + +common: + fp16: true + log_format: json + log_interval: 200 + tensorboard_logdir: tblog + seed: 1337 + user_dir: ??? + +checkpoint: + save_interval: 2 + keep_interval_updates: 1 + no_epoch_checkpoints: true + best_checkpoint_metric: accuracy + maximize_best_checkpoint_metric: true + +distributed_training: + ddp_backend: c10d + find_unused_parameters: true + distributed_world_size: 8 + # distributed_port: 29671 + nprocs_per_node: 8 + +task: + _name: vat_hubert_pretraining + is_s2s: true + data: ??? + label_dir: ??? + tokenizer_bpe_model: ??? + normalize: true # must be consistent with pre-training + labels: ["wrd"] + single_target: true + fine_tuning: true + stack_order_audio: 4 + tokenizer_bpe_name: sentencepiece + max_sample_size: 500 + modalities: ["video"] + image_aug: true + pad_audio: true + random_crop: false + +dataset: + num_workers: 6 + max_tokens: 1000 + validate_after_updates: 0 + validate_interval: 2 + train_subset: train + valid_subset: valid + +criterion: + _name: label_smoothed_cross_entropy + report_accuracy: true + label_smoothing: 0.1 + +optimization: + max_update: 18000 + lr: [0.001] + sentence_avg: true + update_freq: [1] + +optimizer: + _name: adam + adam_betas: (0.9,0.98) + adam_eps: 1e-08 + +lr_scheduler: + _name: tri_stage + warmup_steps: 6000 + hold_steps: 0 + decay_steps: 12000 + final_lr_scale: 0.05 + +model: + _name: vat_hubert_seq2seq + w2v_path: ??? 
+ apply_mask: false + mask_selection: static + mask_length: 10 + mask_other: 0 + mask_prob: 0.75 + mask_channel_selection: static + mask_channel_length: 64 + mask_channel_other: 0 + mask_channel_prob: 0.5 + layerdrop: 0.1 + dropout: 0.0 + activation_dropout: 0.1 + attention_dropout: 0.0 + feature_grad_mult: 1.0 + decoder_layers: 9 + decoder_dropout: 0.1 + decoder_attention_dropout: 0.0 + decoder_activation_dropout: 0.1 + freeze_finetune_updates: 14400 + share_decoder_input_output_embed: true + decoder_normalize_before: true + decoder_embed_dim: 1024 + decoder_ffn_embed_dim: 4096 + decoder_attention_heads: 8 + +hydra: + job: + config: + override_dirname: + kv_sep: '-' + item_sep: '__' + exclude_keys: + - run + - task.data + - task.label_dir + - model.w2v_path + - dataset.train_subset + - dataset.valid_subset + - criterion.wer_kenlm_model + - criterion.wer_lexicon + run: + dir: ??? + sweep: + dir: ??? + subdir: ${hydra.job.config_name}__${hydra.job.override_dirname} diff --git a/VATLM/vat_hubert/vathubert/conf/finetune/large_vox_433h_av.yaml b/VATLM/vat_hubert/vathubert/conf/finetune/large_vox_433h_av.yaml new file mode 100644 index 0000000000000000000000000000000000000000..bd08081605fbfd85eeb774087efc2d3c4d740242 --- /dev/null +++ b/VATLM/vat_hubert/vathubert/conf/finetune/large_vox_433h_av.yaml @@ -0,0 +1,122 @@ +# @package _group_ + +common: + fp16: true + log_format: json + log_interval: 200 + tensorboard_logdir: tblog + seed: 1337 + user_dir: ??? + +checkpoint: + save_interval: 2 + keep_interval_updates: 1 + no_epoch_checkpoints: true + best_checkpoint_metric: accuracy + maximize_best_checkpoint_metric: true + +distributed_training: + ddp_backend: c10d + find_unused_parameters: true + distributed_world_size: 8 + # distributed_port: 29671 + nprocs_per_node: 8 + +task: + _name: vat_hubert_pretraining + is_s2s: true + data: ??? + label_dir: ??? + tokenizer_bpe_model: ??? + normalize: true # must be consistent with pre-training + labels: ["wrd"] + single_target: true + fine_tuning: true + stack_order_audio: 4 + tokenizer_bpe_name: sentencepiece + max_sample_size: 500 + modalities: ["video","audio"] + image_aug: true + pad_audio: true + random_crop: false + + +dataset: + num_workers: 6 + max_tokens: 1000 + validate_after_updates: 0 + validate_interval: 2 + train_subset: train + valid_subset: valid + +criterion: + _name: label_smoothed_cross_entropy + report_accuracy: true + label_smoothing: 0.1 + +optimization: + max_update: 60000 + lr: [0.001] + sentence_avg: true + update_freq: [1] + +optimizer: + _name: adam + adam_betas: (0.9,0.98) + adam_eps: 1e-08 + +lr_scheduler: + _name: tri_stage + warmup_steps: 20000 + hold_steps: 0 + decay_steps: 40000 + final_lr_scale: 0.05 + +model: + _name: vat_hubert_seq2seq + w2v_path: ??? 
+ apply_mask: false + mask_selection: static + mask_length: 10 + mask_other: 0 + mask_prob: 0.75 + mask_channel_selection: static + mask_channel_length: 64 + mask_channel_other: 0 + mask_channel_prob: 0.5 + layerdrop: 0.1 + dropout: 0.0 + activation_dropout: 0.1 + attention_dropout: 0.0 + feature_grad_mult: 1.0 + decoder_layers: 9 + decoder_dropout: 0.1 + decoder_attention_dropout: 0.0 + decoder_activation_dropout: 0.1 + freeze_finetune_updates: 48000 + share_decoder_input_output_embed: true + decoder_normalize_before: true + decoder_embed_dim: 1024 + decoder_ffn_embed_dim: 4096 + decoder_attention_heads: 8 + +hydra: + job: + config: + override_dirname: + kv_sep: '-' + item_sep: '__' + exclude_keys: + - run + - task.data + - task.label_dir + - model.w2v_path + - dataset.train_subset + - dataset.valid_subset + - criterion.wer_kenlm_model + - criterion.wer_lexicon + run: + dir: ??? + sweep: + dir: ??? + subdir: ${hydra.job.config_name}__${hydra.job.override_dirname} diff --git a/VATLM/vat_hubert/vathubert/conf/finetune/large_vox_433h_v.yaml b/VATLM/vat_hubert/vathubert/conf/finetune/large_vox_433h_v.yaml new file mode 100644 index 0000000000000000000000000000000000000000..e31c848866567f6d927ef240f797ba37cc7aa787 --- /dev/null +++ b/VATLM/vat_hubert/vathubert/conf/finetune/large_vox_433h_v.yaml @@ -0,0 +1,121 @@ +# @package _group_ + +common: + fp16: true + log_format: json + log_interval: 200 + tensorboard_logdir: tblog + seed: 1337 + user_dir: ??? + +checkpoint: + save_interval: 2 + keep_interval_updates: 1 + no_epoch_checkpoints: true + best_checkpoint_metric: accuracy + maximize_best_checkpoint_metric: true + +distributed_training: + ddp_backend: c10d + find_unused_parameters: true + distributed_world_size: 8 + # distributed_port: 29671 + nprocs_per_node: 8 + +task: + _name: vat_hubert_pretraining + is_s2s: true + data: ??? + label_dir: ??? + tokenizer_bpe_model: ??? + normalize: true # must be consistent with pre-training + labels: ["wrd"] + single_target: true + fine_tuning: true + stack_order_audio: 4 + tokenizer_bpe_name: sentencepiece + max_sample_size: 500 + modalities: ["video"] + image_aug: true + pad_audio: true + random_crop: false + +dataset: + num_workers: 6 + max_tokens: 1000 + validate_after_updates: 0 + validate_interval: 2 + train_subset: train + valid_subset: valid + +criterion: + _name: label_smoothed_cross_entropy + report_accuracy: true + label_smoothing: 0.1 + +optimization: + max_update: 30000 + lr: [0.001] + sentence_avg: true + update_freq: [1] + +optimizer: + _name: adam + adam_betas: (0.9,0.98) + adam_eps: 1e-08 + +lr_scheduler: + _name: tri_stage + warmup_steps: 10000 + hold_steps: 0 + decay_steps: 20000 + final_lr_scale: 0.05 + +model: + _name: vat_hubert_seq2seq + w2v_path: ??? 
+ apply_mask: false + mask_selection: static + mask_length: 10 + mask_other: 0 + mask_prob: 0.75 + mask_channel_selection: static + mask_channel_length: 64 + mask_channel_other: 0 + mask_channel_prob: 0.5 + layerdrop: 0.1 + dropout: 0.0 + activation_dropout: 0.1 + attention_dropout: 0.0 + feature_grad_mult: 1.0 + decoder_layers: 9 + decoder_dropout: 0.1 + decoder_attention_dropout: 0.0 + decoder_activation_dropout: 0.1 + freeze_finetune_updates: 18000 + share_decoder_input_output_embed: true + decoder_normalize_before: true + decoder_embed_dim: 1024 + decoder_ffn_embed_dim: 4096 + decoder_attention_heads: 8 + +hydra: + job: + config: + override_dirname: + kv_sep: '-' + item_sep: '__' + exclude_keys: + - run + - task.data + - task.label_dir + - model.w2v_path + - dataset.train_subset + - dataset.valid_subset + - criterion.wer_kenlm_model + - criterion.wer_lexicon + run: + dir: ??? + sweep: + dir: ??? + subdir: ${hydra.job.config_name}__${hydra.job.override_dirname} diff --git a/VATLM/vat_hubert/vathubert/conf/pretrain/base_lrs3_iter5.yaml b/VATLM/vat_hubert/vathubert/conf/pretrain/base_lrs3_iter5.yaml new file mode 100644 index 0000000000000000000000000000000000000000..b67c97df3a960f0a758300b058db2fd3dec974e0 --- /dev/null +++ b/VATLM/vat_hubert/vathubert/conf/pretrain/base_lrs3_iter5.yaml @@ -0,0 +1,113 @@ +# @package _group_ + +common: + fp16: true + log_format: json + log_interval: 200 + seed: 1337 + user_dir: ??? + empty_cache_freq: 10000 + +checkpoint: + save_interval: 5 + save_interval_updates: 25000 + keep_interval_updates: 1 + no_epoch_checkpoints: false + + +distributed_training: + ddp_backend: no_c10d + distributed_backend: 'nccl' + distributed_world_size: 32 + # distributed_port: 29671 + nprocs_per_node: 8 + +task: + _name: vat_hubert_pretraining + data: ??? + label_dir: ??? + labels: ["km"] + label_rate: ${model.label_rate} + sample_rate: 25 + max_sample_size: 500 + min_sample_size: 5 + pad_audio: true + random_crop: false + normalize: true + stack_order_audio: 4 + # stack_order: 1 + input_modality: image + image_aug: true + +dataset: + num_workers: 6 + max_tokens: 1000 + skip_invalid_size_inputs_valid_test: true + validate_interval: 5 + validate_interval_updates: 10000 + +criterion: + _name: vat_hubert + pred_masked_weight: 1.0 + pred_nomask_weight: 0.0 + loss_weights: [10,] + +optimization: + max_update: 400000 + lr: [0.0005] + clip_norm: 10.0 + +optimizer: + _name: adam + adam_betas: (0.9,0.98) + adam_eps: 1e-06 + weight_decay: 0.01 + +lr_scheduler: + _name: polynomial_decay + warmup_updates: 32000 + +model: + _name: vat_hubert + label_rate: ??? + skip_masked: false + skip_nomask: false + modality_dropout: 0.5 + audio_dropout: 0.5 + modality_fuse: concat + selection_type: same_seq + masking_type: input + mask_prob_image: 0.3 + mask_length_image: 10 + mask_prob_audio: 0.8 + mask_length_audio: 10 + extractor_mode: default + # conv_feature_layers: '[(512,10,5)] + [(512,3,2)] * 4 + [(512,2,2)] * 2' + final_dim: 256 + encoder_layerdrop: 0.05 + dropout_input: 0.1 + dropout_features: 0.1 + dropout: 0.1 + attention_dropout: 0.1 + feature_grad_mult: 0.1 + untie_final_proj: true + activation_dropout: 0.0 + wav_input: false + layer_norm_first: true + audio_feat_dim: 104 + +hydra: + job: + config: + override_dirname: + kv_sep: '-' + item_sep: '__' + exclude_keys: + - run + - task.data + - task.label_dir + run: + dir: ??? + sweep: + dir: ??? 
+ subdir: ${hydra.job.config_name}__${hydra.job.override_dirname} diff --git a/VATLM/vat_hubert/vathubert/conf/pretrain/base_vox_iter5.yaml b/VATLM/vat_hubert/vathubert/conf/pretrain/base_vox_iter5.yaml new file mode 100644 index 0000000000000000000000000000000000000000..0f349a2bfbe6b74bc8435c25cac564be429986e7 --- /dev/null +++ b/VATLM/vat_hubert/vathubert/conf/pretrain/base_vox_iter5.yaml @@ -0,0 +1,113 @@ +# @package _group_ + +common: + fp16: true + log_format: json + log_interval: 200 + seed: 1337 + user_dir: ??? + empty_cache_freq: 10000 + +checkpoint: + save_interval: 1 + save_interval_updates: 10000 + keep_interval_updates: 1 + no_epoch_checkpoints: false + + +distributed_training: + ddp_backend: no_c10d + distributed_backend: 'nccl' + distributed_world_size: 32 + # distributed_port: 29671 + nprocs_per_node: 8 + +task: + _name: vat_hubert_pretraining + data: ??? + label_dir: ??? + labels: ["km"] + label_rate: ${model.label_rate} + sample_rate: 25 + max_sample_size: 500 + min_sample_size: 5 + pad_audio: true + random_crop: false + normalize: true + stack_order_audio: 4 + # stack_order: 1 + input_modality: image + image_aug: true + +dataset: + num_workers: 6 + max_tokens: 1000 + skip_invalid_size_inputs_valid_test: true + validate_interval: 5 + validate_interval_updates: 10000 + +criterion: + _name: vat_hubert + pred_masked_weight: 1.0 + pred_nomask_weight: 0.0 + loss_weights: [10,] + +optimization: + max_update: 400000 + lr: [0.002] + clip_norm: 10.0 + +optimizer: + _name: adam + adam_betas: (0.9,0.98) + adam_eps: 1e-06 + weight_decay: 0.01 + +lr_scheduler: + _name: polynomial_decay + warmup_updates: 64000 + +model: + _name: vat_hubert + label_rate: ??? + skip_masked: false + skip_nomask: false + modality_dropout: 0.5 + audio_dropout: 0.5 + modality_fuse: concat + selection_type: same_seq + masking_type: input + mask_prob_image: 0.3 + mask_length_image: 10 + mask_prob_audio: 0.8 + mask_length_audio: 10 + extractor_mode: default + # conv_feature_layers: '[(512,10,5)] + [(512,3,2)] * 4 + [(512,2,2)] * 2' + final_dim: 256 + encoder_layerdrop: 0.05 + dropout_input: 0.1 + dropout_features: 0.1 + dropout: 0.1 + attention_dropout: 0.1 + feature_grad_mult: 0.1 + untie_final_proj: true + activation_dropout: 0.0 + wav_input: false + layer_norm_first: true + audio_feat_dim: 104 + +hydra: + job: + config: + override_dirname: + kv_sep: '-' + item_sep: '__' + exclude_keys: + - run + - task.data + - task.label_dir + run: + dir: ??? + sweep: + dir: ??? + subdir: ${hydra.job.config_name}__${hydra.job.override_dirname} diff --git a/VATLM/vat_hubert/vathubert/conf/pretrain/large_vox_iter5.yaml b/VATLM/vat_hubert/vathubert/conf/pretrain/large_vox_iter5.yaml new file mode 100644 index 0000000000000000000000000000000000000000..e543801170b6931225548569a232f7c694580a18 --- /dev/null +++ b/VATLM/vat_hubert/vathubert/conf/pretrain/large_vox_iter5.yaml @@ -0,0 +1,118 @@ +# @package _group_ + +common: + fp16: true + log_format: json + log_interval: 200 + seed: 1337 + user_dir: ??? + empty_cache_freq: 10000 + +checkpoint: + save_interval: 2 + save_interval_updates: 10000 + keep_interval_updates: 1 + no_epoch_checkpoints: false + + +distributed_training: + ddp_backend: no_c10d + distributed_backend: 'nccl' + distributed_world_size: 64 + # distributed_port: 29671 + nprocs_per_node: 8 + +task: + _name: vat_hubert_pretraining + data: ??? + label_dir: ??? 
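+  # label_dir: directory containing the k-means unit label files ({train,valid}.km) produced during data preparation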
+ labels: ["km"] + label_rate: ${model.label_rate} + sample_rate: 25 + max_sample_size: 500 + min_sample_size: 5 + pad_audio: true + random_crop: false + normalize: true + stack_order_audio: 4 + # stack_order: 1 + input_modality: image + image_aug: true + # max_trim_sample_size: 400 + +dataset: + num_workers: 6 + max_tokens: 1000 + skip_invalid_size_inputs_valid_test: true + validate_interval: 5 + validate_interval_updates: 10000 + +criterion: + _name: vat_hubert + pred_masked_weight: 1.0 + pred_nomask_weight: 0.0 + loss_weights: [10,] + +optimization: + max_update: 600000 + lr: [0.002] + clip_norm: 10.0 + +optimizer: + _name: adam + adam_betas: (0.9,0.98) + adam_eps: 1e-06 + weight_decay: 0.01 + +lr_scheduler: + _name: polynomial_decay + warmup_updates: 48000 + +model: + _name: vat_hubert + label_rate: ??? + skip_masked: false + skip_nomask: false + modality_dropout: 0.5 + audio_dropout: 0.5 + modality_fuse: concat + selection_type: same_seq + masking_type: input + mask_prob_image: 0.3 + mask_length_image: 5 + mask_prob_audio: 0.8 + mask_length_audio: 10 + extractor_mode: default + # conv_feature_layers: '[(512,10,5)] + [(512,3,2)] * 4 + [(512,2,2)] * 2' + final_dim: 256 + encoder_layerdrop: 0.05 + dropout_input: 0.1 + dropout_features: 0.1 + dropout: 0.1 + attention_dropout: 0.1 + feature_grad_mult: 0.1 + untie_final_proj: true + activation_dropout: 0.0 + wav_input: false + layer_norm_first: true + audio_feat_dim: 104 + encoder_layers: 24 + encoder_embed_dim: 1024 + encoder_ffn_embed_dim: 4096 + encoder_attention_heads: 16 + +hydra: + job: + config: + override_dirname: + kv_sep: '-' + item_sep: '__' + exclude_keys: + - run + - task.data + - task.label_dir + run: + dir: ??? + sweep: + dir: ??? + subdir: ${hydra.job.config_name}__${hydra.job.override_dirname} diff --git a/VATLM/vat_hubert/vathubert/conf/s2s_decode.yaml b/VATLM/vat_hubert/vathubert/conf/s2s_decode.yaml new file mode 100644 index 0000000000000000000000000000000000000000..ce9279f49255cf3721e946bd5e57c4066ca1e0d8 --- /dev/null +++ b/VATLM/vat_hubert/vathubert/conf/s2s_decode.yaml @@ -0,0 +1,23 @@ +common: + user_dir: ??? + +generation: + beam: 50 + max_len_a: 1.0 + max_len_b: 0 + lenpen: 1.0 + lm_weight: 0 + +common_eval: + results_path: ??? + path: ??? + +dataset: + max_tokens: 1000 + gen_subset: valid + num_workers: 0 + +override: + noise_prob: 0.0 + noise_snr: 0 + modalities: ??? diff --git a/VATLM/vat_hubert/vathubert/criterions/__init__.py b/VATLM/vat_hubert/vathubert/criterions/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..b5266f3f9f137067b059f69d15d766a951247422 --- /dev/null +++ b/VATLM/vat_hubert/vathubert/criterions/__init__.py @@ -0,0 +1,9 @@ +import importlib +import os + +for file in os.listdir(os.path.dirname(__file__)): + if file.endswith(".py") and not file.startswith("_"): + criterion_name = file[: file.find(".py")] + importlib.import_module( + "vathubert.criterions." 
+ criterion_name + ) diff --git a/VATLM/vat_hubert/vathubert/criterions/vathubert_criterion.py b/VATLM/vat_hubert/vathubert/criterions/vathubert_criterion.py new file mode 100644 index 0000000000000000000000000000000000000000..ec6fc59da699d5d3deb415cd5e963010c30beab6 --- /dev/null +++ b/VATLM/vat_hubert/vathubert/criterions/vathubert_criterion.py @@ -0,0 +1,408 @@ +# ---------------------------------------------------------------------------- +# VatLM: Visual-Audio-Text Pre-Training with Unified Masked Prediction for Speech Representation Learning +# Github source: https://github.com/microsoft/SpeechT5/tree/main/VATLM +# Code based on fairseq: https://github.com/facebookresearch/fairseq and av_hubert: https://github.com/facebookresearch/av_hubert +# +# Copyright (c) 2022 Microsoft +# Licensed under The MIT License [see LICENSE for details] +# ---------------------------------------------------------------------------- +import math +import re +from dataclasses import dataclass, field +from typing import List, Optional + +import torch +import torch.nn.functional as F +from fairseq import metrics, utils +from fairseq.criterions import FairseqCriterion, register_criterion +from fairseq.dataclass import FairseqDataclass + + +@dataclass +class VATHubertCriterionConfig(FairseqDataclass): + pred_masked_weight: float = field( + default=1.0, + metadata={"help": "weight for predictive loss for masked frames"}, + ) + pred_nomask_weight: float = field( + default=0.0, + metadata={"help": "weight for predictive loss for unmasked frames"}, + ) + loss_weights: Optional[List[float]] = field( + default=None, + metadata={"help": "weights for additional loss terms (not first one)"}, + ) + log_keys: List[str] = field( + default_factory=lambda: [], + metadata={"help": "output keys to log"}, + ) + banlance_loss_weights: Optional[List[float]] = field( + default=None, + metadata={"help": "weights for additional loss terms (not first one)"}, + ) + +@register_criterion("vat_hubert", dataclass=VATHubertCriterionConfig) +class VATHubertCriterion(FairseqCriterion): + def __init__(self, task, pred_masked_weight, pred_nomask_weight, banlance_loss_weights, loss_weights=None, log_keys=None): + super().__init__(task) + self.pred_masked_weight = pred_masked_weight + self.pred_nomask_weight = pred_nomask_weight + self.loss_weights = loss_weights + self.log_keys = [] if log_keys is None else log_keys + self.banlance_loss_weights = banlance_loss_weights + + def forward(self, model, sample, reduce=True, log_pred=False): + """Compute the loss for the given sample. + Returns a tuple with three elements: + 1) the loss + 2) the sample size, which is used as the denominator for the gradient + 3) logging outputs to display while training + """ + + videoaudio_sample = sample.get("videoaudio", None) + audiotext_sample = sample.get("audiotext", None) + onlytext_sample = sample.get("onlytext", None) + onlyaudio_sample = sample.get("onlyaudio", None) + + + loss = 0. + loss1 = 0. + loss2 = 0. + loss3 = 0. + loss4 = 0. 
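+        # loss1-loss4 accumulate the masked/unmasked prediction losses for the video-audio, audio-text,
+        # text-only and audio-only sub-batches; they are combined below as
+        # loss1 + loss2 + banlance_loss_weights[0] * loss3 + banlance_loss_weights[1] * loss4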
+ sample_size = 0 + logging_output = {} + reduction = "sum" if reduce else "none" + + if videoaudio_sample is not None: + # print("videoaudio_sample") + net_output = model(target_list=videoaudio_sample["target_list"], **videoaudio_sample["net_input"]) + + loss_m_list = [] + logp_m_list, targ_m_list = net_output['logit_m_list'], net_output['target_m_list'] + for i, (logp_m, targ_m) in enumerate(zip(logp_m_list, targ_m_list)): + loss_m = F.cross_entropy(logp_m, targ_m, reduction=reduction) + loss_m_list.append(loss_m) + logging_output[f"loss_m_videoaudio_{i}"] = loss_m.detach().item() + if self.pred_masked_weight > 0: + loss1 += self.pred_masked_weight * sum(loss_m_list) + sample_size += targ_m_list[0].numel() + + loss_u_list = [] + logp_u_list, targ_u_list = net_output['logit_u_list'], net_output['target_u_list'] + for i, (logp_u, targ_u) in enumerate(zip(logp_u_list, targ_u_list)): + loss_u = F.cross_entropy(logp_u, targ_u, reduction=reduction) + loss_u_list.append(loss_u) + logging_output[f"loss_u_videoaudio_{i}"] = loss_u.detach().item() + if self.pred_nomask_weight > 0: + loss1 += self.pred_nomask_weight * sum(loss_u_list) + sample_size += targ_u_list[0].numel() + + if self.loss_weights is not None: + assert hasattr(model, "get_extra_losses") + extra_losses, names = model.get_extra_losses(net_output) + if torch.is_tensor(extra_losses): + extra_losses = [extra_losses] + names = [names] + if len(self.loss_weights) == 1 and len(extra_losses) != 1: + self.loss_weights = [self.loss_weights[0]] * len(extra_losses) + assert len(extra_losses) == len(self.loss_weights), f"{len(extra_losses)}, {len(self.loss_weights)}" + for p, n, coef in zip(extra_losses, names, self.loss_weights): + if coef != 0 and p is not None: + p = coef * p.float() * sample_size + loss1 += p + logging_output[f"loss_videoaudio_{n}"] = p.item() + + logging_output = { + "loss_video_audio": loss1.item() if reduce else loss1, + **logging_output, + } + + for lk in self.log_keys: + if lk in net_output: + logging_output[lk] = float((net_output[lk])) + + with torch.no_grad(): + for i, logp_m in enumerate(logp_m_list): + # corr_m, count_m = compute_correct(logp_m) + if logp_m.numel() == 0: + corr_m, count_m = 0 + else: + corr_m, count_m = (logp_m.argmax(dim=-1)==targ_m_list[i]).sum().item(), len(targ_m_list[i]) + logging_output[f"correct_m_videoaudio_{i}"] = corr_m + logging_output[f"count_m_videoaudio_{i}"] = count_m + + for i, logp_u in enumerate(logp_u_list): + if logp_u.numel() == 0: + corr_u, count_u = 0, 0 + else: + corr_u, count_u = (logp_u.argmax(dim=-1)==targ_u_list[i]).sum().item(), len(targ_u_list[i]) + logging_output[f"correct_u_videoaudio_{i}"] = corr_u + logging_output[f"count_u_videoaudio_{i}"] = count_u + + + if audiotext_sample is not None: + # print("audiotext_sample") + net_output = model(target_list=audiotext_sample["target_list"], targets_phone_list=audiotext_sample["targets_phone_list"], **audiotext_sample["net_input"]) + + loss_m_list = [] + logp_m_list, targ_m_list = net_output['logit_m_list'], net_output['target_m_list'] + for i, (logp_m, targ_m) in enumerate(zip(logp_m_list, targ_m_list)): + loss_m = F.cross_entropy(logp_m, targ_m, reduction=reduction) + loss_m_list.append(loss_m) + logging_output[f"loss_m_audiotext_{i}"] = loss_m.detach().item() + + + if self.pred_masked_weight > 0: + loss2 += self.pred_masked_weight * sum(loss_m_list) + sample_size += targ_m_list[0].numel() + + loss_u_list = [] + logp_u_list, targ_u_list = net_output['logit_u_list'], net_output['target_u_list'] + for i, (logp_u, targ_u) 
in enumerate(zip(logp_u_list, targ_u_list)): + loss_u = F.cross_entropy(logp_u, targ_u, reduction=reduction) + loss_u_list.append(loss_u) + logging_output[f"loss_u_audiotext_{i}"] = loss_u.detach().item() + if self.pred_nomask_weight > 0: + loss2 += self.pred_nomask_weight * sum(loss_u_list) + sample_size += targ_u_list[0].numel() + + if self.loss_weights is not None: + assert hasattr(model, "get_extra_losses") + extra_losses, names = model.get_extra_losses(net_output) + if torch.is_tensor(extra_losses): + extra_losses = [extra_losses] + names = [names] + if len(self.loss_weights) == 1 and len(extra_losses) != 1: + self.loss_weights = [self.loss_weights[0]] * len(extra_losses) + assert len(extra_losses) == len(self.loss_weights), f"{len(extra_losses)}, {len(self.loss_weights)}" + for p, n, coef in zip(extra_losses, names, self.loss_weights): + if coef != 0 and p is not None: + p = coef * p.float() * sample_size + loss2 += p + logging_output[f"loss_audiotext_{n}"] = p.item() + + + logging_output = { + "loss_audiotext": loss2.item() if reduce else loss2, + **logging_output, + } + + for lk in self.log_keys: + if lk in net_output: + logging_output[lk] = float((net_output[lk])) + + with torch.no_grad(): + for i, logp_m in enumerate(logp_m_list): + # corr_m, count_m = compute_correct(logp_m) + if logp_m.numel() == 0: + corr_m, count_m = 0 + else: + corr_m, count_m = (logp_m.argmax(dim=-1)==targ_m_list[i]).sum().item(), len(targ_m_list[i]) + logging_output[f"correct_m_audiotext_{i}"] = corr_m + logging_output[f"count_m_audiotext_{i}"] = count_m + + for i, logp_u in enumerate(logp_u_list): + if logp_u.numel() == 0: + corr_u, count_u = 0, 0 + else: + corr_u, count_u = (logp_u.argmax(dim=-1)==targ_u_list[i]).sum().item(), len(targ_u_list[i]) + logging_output[f"correct_u_audiotext_{i}"] = corr_u + logging_output[f"count_u_audiotext_{i}"] = count_u + + + if onlytext_sample is not None: + # print("onlytext_sample") + net_output = model(target_list=onlytext_sample["target_list"], extra_text_phone_list=onlytext_sample["extra_text_phone_list"], **onlytext_sample["net_input"]) + + loss_m_list = [] + logp_m_list, targ_m_list = net_output['logit_m_list'], net_output['target_m_list'] + for i, (logp_m, targ_m) in enumerate(zip(logp_m_list, targ_m_list)): + loss_m = F.cross_entropy(logp_m, targ_m, reduction=reduction) + loss_m_list.append(loss_m) + logging_output[f"loss_m_onlytext_{i}"] = loss_m.detach().item() + + + if self.pred_masked_weight > 0: + loss3 += self.pred_masked_weight * sum(loss_m_list) + sample_size += targ_m_list[0].numel() + + loss_u_list = [] + logp_u_list, targ_u_list = net_output['logit_u_list'], net_output['target_u_list'] + for i, (logp_u, targ_u) in enumerate(zip(logp_u_list, targ_u_list)): + loss_u = F.cross_entropy(logp_u, targ_u, reduction=reduction) + loss_u_list.append(loss_u) + logging_output[f"loss_u_onlytext_{i}"] = loss_u.detach().item() + if self.pred_nomask_weight > 0: + loss3 += self.pred_nomask_weight * sum(loss_u_list) + sample_size += targ_u_list[0].numel() + + if self.loss_weights is not None: + assert hasattr(model, "get_extra_losses") + extra_losses, names = model.get_extra_losses(net_output) + if torch.is_tensor(extra_losses): + extra_losses = [extra_losses] + names = [names] + if len(self.loss_weights) == 1 and len(extra_losses) != 1: + self.loss_weights = [self.loss_weights[0]] * len(extra_losses) + assert len(extra_losses) == len(self.loss_weights), f"{len(extra_losses)}, {len(self.loss_weights)}" + for p, n, coef in zip(extra_losses, names, self.loss_weights): + if 
coef != 0 and p is not None: + p = coef * p.float() * sample_size + loss3 += p + logging_output[f"loss_onlytext_{n}"] = p.item() + + + logging_output = { + "loss_onlytext": loss3.item() if reduce else loss3, + **logging_output, + } + + for lk in self.log_keys: + if lk in net_output: + logging_output[lk] = float((net_output[lk])) + + with torch.no_grad(): + for i, logp_m in enumerate(logp_m_list): + # corr_m, count_m = compute_correct(logp_m) + if logp_m.numel() == 0: + corr_m, count_m = 0 + else: + corr_m, count_m = (logp_m.argmax(dim=-1)==targ_m_list[i]).sum().item(), len(targ_m_list[i]) + logging_output[f"correct_m_onlytext_{i}"] = corr_m + logging_output[f"count_m_onlytext_{i}"] = count_m + + for i, logp_u in enumerate(logp_u_list): + if logp_u.numel() == 0: + corr_u, count_u = 0, 0 + else: + corr_u, count_u = (logp_u.argmax(dim=-1)==targ_u_list[i]).sum().item(), len(targ_u_list[i]) + logging_output[f"correct_u_onlytext_{i}"] = corr_u + logging_output[f"count_u_onlytext_{i}"] = count_u + + + if onlyaudio_sample is not None: + # print("onlytext_sample") + net_output = model(target_list=onlyaudio_sample["target_list"], **onlyaudio_sample["net_input"]) + + loss_m_list = [] + logp_m_list, targ_m_list = net_output['logit_m_list'], net_output['target_m_list'] + for i, (logp_m, targ_m) in enumerate(zip(logp_m_list, targ_m_list)): + loss_m = F.cross_entropy(logp_m, targ_m, reduction=reduction) + loss_m_list.append(loss_m) + logging_output[f"loss_m_onlyaudio_{i}"] = loss_m.detach().item() + + + if self.pred_masked_weight > 0: + loss4 += self.pred_masked_weight * sum(loss_m_list) + sample_size += targ_m_list[0].numel() + + loss_u_list = [] + logp_u_list, targ_u_list = net_output['logit_u_list'], net_output['target_u_list'] + for i, (logp_u, targ_u) in enumerate(zip(logp_u_list, targ_u_list)): + loss_u = F.cross_entropy(logp_u, targ_u, reduction=reduction) + loss_u_list.append(loss_u) + logging_output[f"loss_u_onlyaudio_{i}"] = loss_u.detach().item() + if self.pred_nomask_weight > 0: + loss4 += self.pred_nomask_weight * sum(loss_u_list) + sample_size += targ_u_list[0].numel() + + if self.loss_weights is not None: + assert hasattr(model, "get_extra_losses") + extra_losses, names = model.get_extra_losses(net_output) + if torch.is_tensor(extra_losses): + extra_losses = [extra_losses] + names = [names] + if len(self.loss_weights) == 1 and len(extra_losses) != 1: + self.loss_weights = [self.loss_weights[0]] * len(extra_losses) + assert len(extra_losses) == len(self.loss_weights), f"{len(extra_losses)}, {len(self.loss_weights)}" + for p, n, coef in zip(extra_losses, names, self.loss_weights): + if coef != 0 and p is not None: + p = coef * p.float() * sample_size + loss4 += p + logging_output[f"loss_onlyaudio_{n}"] = p.item() + + + logging_output = { + "loss_onlyaudio": loss4.item() if reduce else loss4, + **logging_output, + } + + for lk in self.log_keys: + if lk in net_output: + logging_output[lk] = float((net_output[lk])) + + with torch.no_grad(): + for i, logp_m in enumerate(logp_m_list): + # corr_m, count_m = compute_correct(logp_m) + if logp_m.numel() == 0: + corr_m, count_m = 0 + else: + corr_m, count_m = (logp_m.argmax(dim=-1)==targ_m_list[i]).sum().item(), len(targ_m_list[i]) + logging_output[f"correct_m_onlyaudio_{i}"] = corr_m + logging_output[f"count_m_onlyaudio_{i}"] = count_m + + for i, logp_u in enumerate(logp_u_list): + if logp_u.numel() == 0: + corr_u, count_u = 0, 0 + else: + corr_u, count_u = (logp_u.argmax(dim=-1)==targ_u_list[i]).sum().item(), len(targ_u_list[i]) + 
logging_output[f"correct_u_onlyaudio_{i}"] = corr_u + logging_output[f"count_u_onlyaudio_{i}"] = count_u + + + + loss = loss1 + loss2 + self.banlance_loss_weights[0] * loss3 + self.banlance_loss_weights[1] * loss4 + + logging_output = { + "loss": loss.item() if reduce else loss, + "ntokens": sample_size, + "nsentences": sample["videoaudio"]["id"].numel(), + "sample_size": sample_size, + **logging_output, + } + + return loss, sample_size, logging_output + + @staticmethod + def reduce_metrics(logging_outputs) -> None: + """Aggregate logging outputs from data parallel training (copied from normal cross entropy).""" + loss_sum = sum(log.get("loss", 0) for log in logging_outputs) + ntokens = sum(log.get("ntokens", 0) for log in logging_outputs) + sample_size = sum(log.get("sample_size", 0) for log in logging_outputs) + + metrics.log_scalar("loss", loss_sum / sample_size / math.log(2), sample_size, round=3) + if sample_size != ntokens: + metrics.log_scalar("nll_loss", loss_sum / ntokens / math.log(2), ntokens, round=3) + metrics.log_derived("ppl", lambda meters: utils.get_perplexity(meters["nll_loss"].avg)) + else: + metrics.log_derived("ppl", lambda meters: utils.get_perplexity(meters["loss"].avg)) + + counts = {} + for lk in logging_outputs[0].keys(): + if lk.startswith("count_"): + val = sum(log[lk] for log in logging_outputs) + metrics.log_scalar(lk, val) + counts[lk] = val + + for lk in logging_outputs[0].keys(): + if lk.startswith("loss_"): + val = sum(log[lk] for log in logging_outputs) + metrics.log_scalar(lk, val / sample_size / math.log(2), round=3) + elif lk.startswith("correct_"): + val = sum(log[lk] for log in logging_outputs) + metrics.log_scalar(lk, val / counts[re.sub("correct", "count", lk)]) + + @staticmethod + def aggregate_logging_outputs(logging_outputs): + """Aggregate logging outputs from data parallel training.""" + raise NotImplementedError() + + @staticmethod + def logging_outputs_can_be_summed() -> bool: + """ + Whether the logging outputs returned by `forward` can be summed + across workers prior to calling `reduce_metrics`. Setting this + to True will improves distributed training speed. 
+ """ + return False diff --git a/VATLM/vat_hubert/vathubert/data/audiohubert_dataset.py b/VATLM/vat_hubert/vathubert/data/audiohubert_dataset.py new file mode 100644 index 0000000000000000000000000000000000000000..90c2e0748c65418a92e483a1e3f3cc9189b55b35 --- /dev/null +++ b/VATLM/vat_hubert/vathubert/data/audiohubert_dataset.py @@ -0,0 +1,509 @@ +# ---------------------------------------------------------------------------- +# VatLM: Visual-Audio-Text Pre-Training with Unified Masked Prediction for Speech Representation Learning +# Github source: https://github.com/microsoft/SpeechT5/tree/main/VATLM +# Code based on fairseq: https://github.com/facebookresearch/fairseq and av_hubert: https://github.com/facebookresearch/av_hubert +# +# Copyright (c) 2022 Microsoft +# Licensed under The MIT License [see LICENSE for details] +# ---------------------------------------------------------------------------- + +import itertools +import logging +import os +import sys +import time +from typing import Any, List, Optional, Union + +import numpy as np + +import torch +import torch.nn.functional as F +from fairseq.data import data_utils +from fairseq.data.fairseq_dataset import FairseqDataset +# from python_speech_features import logfbank +from scipy.io import wavfile +import kaldiio + +DBG=True if len(sys.argv) == 1 else False + +if DBG: + import utils as custom_utils + logging.basicConfig( + format="%(asctime)s | %(levelname)s | %(name)s | %(message)s", + datefmt="%Y-%m-%d %H:%M:%S", + level=os.environ.get("LOGLEVEL", "DEBUG").upper(), + stream=sys.stdout, + ) +else: + from . import utils as custom_utils + +logger = logging.getLogger(__name__) + + +def load_audio(manifest_path, max_keep, min_keep, frame_rate, label_paths, label_rates, tol=0.1): + def is_audio_label_aligned(audio_dur, label_durs): + return all([abs(audio_dur - label_dur)<tol for label_dur in label_durs]) + + n_long, n_short, n_unaligned = 0, 0, 0 + names, inds, sizes = [], [], [] + dur_from_label_list = [] + is_seq_label = any([x==-1 for x in label_rates]) + for label_path, label_rate in zip(label_paths, label_rates): + label_lengths = [len(line.rstrip().split())/label_rate for line in open(label_path).readlines()] + dur_from_label_list.append(label_lengths) + dur_from_label_list = list(zip(*dur_from_label_list)) + + with open(manifest_path) as f: + root = f.readline().strip() + for ind, line in enumerate(f): + items = line.strip().split("\t") + sz = int(items[1]) / 640 # + if min_keep is not None and sz < min_keep: + n_short += 1 + elif max_keep is not None and sz > max_keep: + n_long += 1 + elif (not is_seq_label) and (not is_audio_label_aligned(sz/frame_rate, dur_from_label_list[ind])): + n_unaligned += 1 + else: + audio_path = items[0] + names.append(os.path.join(root, audio_path)) + inds.append(ind) + sizes.append(sz) + tot = ind + 1 + logger.info( + ( + f"label_rates={label_rates}, " + f"max_keep={max_keep}, min_keep={min_keep}, " + f"loaded {len(names)}, skipped {n_short} short and {n_long} long and {n_unaligned} unaligned, " + f"longest-loaded={max(sizes)}, shortest-loaded={min(sizes)}" + ) + ) + return root, names, inds, tot, sizes + + + +def load_label(label_path, inds, tot): + with open(label_path) as f: + labels = [line.rstrip() for line in f] + assert ( + len(labels) == tot + ), f"number of labels does not match ({len(labels)} != {tot})" + labels = [labels[i] for i in inds] + return labels + +def load_phone_label(tsv, inds, tot): + with open(tsv) as f: + labels = [line.rstrip().split("\t")[-1] for line in 
f.readlines()[1:]] + assert ( + len(labels) == tot + ), f"number of labels does not match ({len(labels)} != {tot})" + labels = [labels[i] for i in inds] + return labels + + +def load_label_offset(label_path, inds, tot): + with open(label_path) as f: + code_lengths = [len(line.encode("utf-8")) for line in f] + assert ( + len(code_lengths) == tot + ), f"number of labels does not match ({len(code_lengths)} != {tot})" + offsets = list(itertools.accumulate([0] + code_lengths)) + offsets = [(offsets[i], offsets[i + 1]) for i in inds] + return offsets + + +def verify_label_lengths( + audio_sizes, + audio_rate, + label_path, + label_rate, + inds, + tot, + tol=0.1, # tolerance in seconds +): + if label_rate < 0: + logger.info(f"{label_path} is sequence label. skipped") + return + + with open(label_path) as f: + lengths = [len(line.rstrip().split()) for line in f] + assert len(lengths) == tot + lengths = [lengths[i] for i in inds] + num_invalid = 0 + for i, ind in enumerate(inds): + dur_from_audio = audio_sizes[i] / audio_rate + dur_from_label = lengths[i] / label_rate + if abs(dur_from_audio - dur_from_label) > tol: + logger.warning( + ( + f"audio and label duration differ too much " + f"(|{dur_from_audio} - {dur_from_label}| > {tol}) " + f"in line {ind+1} of {label_path}. Check if `label_rate` " + f"is correctly set (currently {label_rate}). " + f"num. of samples = {audio_sizes[i]}; " + f"label length = {lengths[i]}" + ) + ) + num_invalid += 1 + if num_invalid > 0: + logger.warning( + f"total {num_invalid} (audio, label) pairs with mismatched lengths" + ) + + +class AudioHubertDataset(FairseqDataset): + def __init__( + self, + manifest_path: str, + sample_rate: float, + label_paths: List[str], + label_rates: Union[List[float], float], # -1 for sequence labels + pad_list: List[str], + eos_list: List[str], + label_processors: Optional[List[Any]] = None, + phone_sequence_processors: Optional[List[Any]] = None, + max_keep_sample_size: Optional[int] = None, + min_keep_sample_size: Optional[int] = None, + max_sample_size: Optional[int] = None, + shuffle: bool = True, + pad_audio: bool = False, + normalize: bool = False, + store_labels: bool = True, + single_target: bool = False, + stack_order_audio: int=1, + skip_verify: bool=False, + is_s2s=False, + ): + self.label_rates = ( + [label_rates for _ in range(len(label_paths))] + if isinstance(label_rates, int) + else label_rates + ) + self.audio_root, self.names, inds, tot, self.sizes = load_audio(manifest_path, max_keep_sample_size, min_keep_sample_size, frame_rate=sample_rate, label_paths=label_paths, label_rates=self.label_rates) + self.sample_rate = sample_rate + self.stack_order_audio = stack_order_audio + self.shuffle = shuffle + + self.num_labels = len(label_paths) + self.pad_list = pad_list + self.eos_list = eos_list + self.label_processors = label_processors + self.phone_processors = phone_sequence_processors + self.single_target = single_target + self.store_labels = store_labels + self.is_s2s = is_s2s + + assert self.single_target == (self.label_rates[0] == -1), f"single target should be equivalent to sequence label (label_rate==-1)" + if store_labels: + self.label_list = [load_label(p, inds, tot) for p in label_paths] + self.phone_list = [load_phone_label(p, inds, tot) for p in [manifest_path]] + + else: + self.label_paths = label_paths + self.label_offsets_list = [ + load_label_offset(p, inds, tot) for p in label_paths + ] + assert ( + label_processors is None + or len(label_processors) == self.num_labels + ) + if not skip_verify: + for 
label_path, label_rate in zip(label_paths, self.label_rates): + verify_label_lengths(self.sizes, self.sample_rate, label_path, label_rate, inds, tot) + else: + logger.info(f"Skip label alignment verifying") + + self.max_sample_size = ( + max_sample_size if max_sample_size is not None else sys.maxsize + ) + self.pad_audio = pad_audio + self.normalize = normalize + + + def get_label(self, index, label_idx): + if self.store_labels: + label = self.label_list[label_idx][index] + else: + with open(self.label_paths[label_idx]) as f: + offset_s, offset_e = self.label_offsets_list[label_idx][index] + f.seek(offset_s) + label = f.read(offset_e - offset_s) + + if self.label_processors is not None: + label = self.label_processors[label_idx](label) + return label + + def get_labels(self, index): + return [self.get_label(index, i) for i in range(self.num_labels)] + + def get_phone(self, index, label_idx): + label = self.phone_list[label_idx][index] + if self.phone_processors is not None: + label = self.phone_processors[label_idx](label) + return label + + def get_phones(self, index): + return [self.get_phone(index, i) for i in range(1)] + + + def load_feature(self, mix_name): + """ + Load audio feature + Returns: + audio_feats: numpy.ndarray of shape [T, F] + """ + def stacker(feats, stack_order): + """ + Concatenating consecutive audio frames + Args: + feats - numpy.ndarray of shape [T, F] + stack_order - int (number of neighboring frames to concatenate + Returns: + feats - numpy.ndarray of shape [T', F'] + """ + feat_dim = feats.shape[1] + if len(feats) % stack_order != 0: + res = stack_order - len(feats) % stack_order + res = np.zeros([res, feat_dim]).astype(feats.dtype) + feats = np.concatenate([feats, res], axis=0) + feats = feats.reshape((-1, stack_order, feat_dim)).reshape(-1, stack_order*feat_dim) + return feats + audio_fn = mix_name + + # sample_rate, wav_data = wavfile.read(audio_fn) + # assert sample_rate == 16_000 and len(wav_data.shape) == 1 + # audio_feats = logfbank(wav_data, samplerate=sample_rate).astype(np.float32) # [T, F] + audio_feats = kaldiio.load_mat(audio_fn).astype(np.float32) + + audio_feats = stacker(audio_feats, self.stack_order_audio) # [T/stack_order_audio, F*stack_order_audio] + return audio_feats + + + def __getitem__(self, index): + audio_feats = self.load_feature(self.names[index]) + audio_feats = torch.from_numpy(audio_feats.astype(np.float32)) + if self.normalize: + with torch.no_grad(): + audio_feats = F.layer_norm(audio_feats, audio_feats.shape[1:]) + labels = self.get_labels(index) + phone_sequence_list = self.get_phones(index) + + + return {"id": index, 'audio_source': audio_feats, "label_list": labels, "phone_sequence_list": phone_sequence_list} + + + def __len__(self): + return len(self.sizes) + + def crop_to_max_size(self, wav, target_size, start=None): + size = len(wav) + diff = size - target_size + if diff <= 0: + return wav, 0 + # longer utterances + if start is None: + start, end = 0, target_size + # if self.random_crop: + # start = np.random.randint(0, diff + 1) + # end = size - diff + start + else: + end = start + target_size + return wav[start:end], start + + def collater(self, samples): + samples = [s for s in samples if s["id"] is not None] + if len(samples) == 0: + return {} + + audio_source = [s["audio_source"] for s in samples] + if audio_source[0] is None: + audio_source = None + if audio_source is not None: + audio_sizes = [len(s) for s in audio_source] + + if self.pad_audio: + audio_size = min(max(audio_sizes), self.max_sample_size) + else: + 
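+            # Without pad_audio the batch is cropped to the shortest utterance
+            # (capped at max_sample_size); the pad_audio branch above instead
+            # pads every item up to the longest one inside collater_audio.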
audio_size = min(min(audio_sizes), self.max_sample_size) + if audio_source is not None: + collated_audios, padding_mask, audio_starts = self.collater_audio(audio_source, audio_size) + else: + collated_audios, audio_starts = None, None + + # B1, D1, T1 = collated_audios.size() + # collated_videos = torch.from_numpy(np.zeros((B1, 1, T1, 88, 88)).astype(np.float32)) + + targets_by_label = [ + [s["label_list"][i] for s in samples] + for i in range(self.num_labels) + ] + targets_list, lengths_list, ntokens_list = self.collater_label( + targets_by_label, audio_size, audio_starts + ) + + ############################################################################ + phone_sequence_list = [s["phone_sequence_list"] for s in samples] + if phone_sequence_list[0] is None: + phone_sequence_list = None + + targets_by_phone_label = [ + [s["phone_sequence_list"][i] for s in samples] + for i in range(self.num_labels) + ] + targets_phone_list, lengths_phone_list, ntokens_phone_list = self.collater_phone_label( + targets_by_phone_label, audio_size, audio_starts + ) + + # print("targets_phone_list", targets_phone_list) + ###################################################### + + # source = {"audio": collated_audios, "video": collated_videos} + source = {"audio": collated_audios, "video": None} + net_input = {"source": source, "padding_mask": padding_mask} + batch = { + "id": torch.LongTensor([s["id"] for s in samples]), + "net_input": net_input, + } + + if self.single_target: + batch["target_lengths"] = lengths_list[0] + batch["ntokens"] = ntokens_list[0] + if self.is_s2s: + batch['target'], net_input['prev_output_tokens'] = targets_list[0][0], targets_list[0][1] + else: + batch["target"] = targets_list[0] + else: + batch["target_lengths_list"] = lengths_list + batch["ntokens_list"] = ntokens_list + batch["target_list"] = targets_list + + batch["targets_phone_list"] = targets_phone_list + + return batch + + def collater_audio(self, audios, audio_size, audio_starts=None): + audio_feat_shape = list(audios[0].shape[1:]) + collated_audios = audios[0].new_zeros([len(audios), audio_size]+audio_feat_shape) + padding_mask = ( + torch.BoolTensor(len(audios), audio_size).fill_(False) # + ) + start_known = audio_starts is not None + audio_starts = [0 for _ in audios] if not start_known else audio_starts + for i, audio in enumerate(audios): + diff = len(audio) - audio_size + if diff == 0: + collated_audios[i] = audio + elif diff < 0: + assert self.pad_audio + collated_audios[i] = torch.cat( + [audio, audio.new_full([-diff]+audio_feat_shape, 0.0)] + ) + padding_mask[i, diff:] = True + else: + collated_audios[i], audio_starts[i] = self.crop_to_max_size( + audio, audio_size, audio_starts[i] if start_known else None + ) + if len(audios[0].shape) == 2: + collated_audios = collated_audios.transpose(1, 2) # [B, T, F] -> [B, F, T] + else: + collated_audios = collated_audios.permute((0, 4, 1, 2, 3)).contiguous() # [B, T, H, W, C] -> [B, C, T, H, W] + return collated_audios, padding_mask, audio_starts + + def collater_frm_label( + self, targets, audio_size, audio_starts, label_rate, pad + ): + assert label_rate > 0 + s2f = label_rate / self.sample_rate # num label per sample + frm_starts = [int(round(s * s2f)) for s in audio_starts] + frm_size = int(round(audio_size * s2f)) + if not self.pad_audio: + rem_size = [len(t) - s for t, s in zip(targets, frm_starts)] + frm_size = min(frm_size, *rem_size) + targets = [t[s: s + frm_size] for t, s in zip(targets, frm_starts)] + logger.debug(f"audio_starts={audio_starts}") + 
logger.debug(f"frame_starts={frm_starts}") + logger.debug(f"frame_size={frm_size}") + + lengths = torch.LongTensor([len(t) for t in targets]) + ntokens = lengths.sum().item() + targets = data_utils.collate_tokens( + targets, pad_idx=pad, left_pad=False + ) + return targets, lengths, ntokens + + def collater_frm_phone_label( + self, targets, pad + ): + + lengths = torch.LongTensor([len(t) for t in targets]) + ntokens = lengths.sum().item() + targets = data_utils.collate_tokens( + targets, pad_idx=pad, left_pad=False + ) + return targets, lengths, ntokens + + def collater_seq_label(self, targets, pad): + lengths = torch.LongTensor([len(t) for t in targets]) + ntokens = lengths.sum().item() + targets = data_utils.collate_tokens( + targets, pad_idx=pad, left_pad=False + ) + return targets, lengths, ntokens + + + def collater_seq_label_s2s(self, targets, pad): + lengths = torch.LongTensor([len(t) for t in targets]) + ntokens = lengths.sum().item() + pad, eos = self.label_processors[0].dictionary.pad(), self.label_processors[0].dictionary.eos() + targets_ = data_utils.collate_tokens(targets, pad_idx=pad, eos_idx=eos, left_pad=False) + prev_output_tokens = data_utils.collate_tokens(targets, pad_idx=pad, eos_idx=eos, left_pad=False, move_eos_to_beginning=True) + return (targets_, prev_output_tokens), lengths, ntokens + + def collater_label(self, targets_by_label, audio_size, audio_starts): + targets_list, lengths_list, ntokens_list = [], [], [] + itr = zip(targets_by_label, self.label_rates, self.pad_list) + for targets, label_rate, pad in itr: + if label_rate == -1: + if self.is_s2s: + targets, lengths, ntokens = self.collater_seq_label_s2s(targets, pad) + else: + targets, lengths, ntokens = self.collater_seq_label(targets, pad) + else: + targets, lengths, ntokens = self.collater_frm_label( + targets, audio_size, audio_starts, label_rate, pad + ) + targets_list.append(targets) + lengths_list.append(lengths) + ntokens_list.append(ntokens) + return targets_list, lengths_list, ntokens_list + + + def collater_phone_label(self, targets_by_label, audio_size, audio_starts): + targets_list, lengths_list, ntokens_list = [], [], [] + itr = zip(targets_by_label, self.label_rates, self.pad_list) + for targets, label_rate, pad in itr: + targets, lengths, ntokens = self.collater_frm_phone_label( + targets, pad + ) + targets_list.append(targets) + lengths_list.append(lengths) + ntokens_list.append(ntokens) + return targets_list, lengths_list, ntokens_list + + + def num_tokens(self, index): + return self.size(index) + + def size(self, index): + if self.pad_audio: + return self.sizes[index] + return min(self.sizes[index], self.max_sample_size) + + def ordered_indices(self): + if self.shuffle: + order = [np.random.permutation(len(self))] + else: + order = [np.arange(len(self))] + + order.append(self.sizes) + return np.lexsort(order)[::-1] diff --git a/VATLM/vat_hubert/vathubert/data/onlyaudiohubert_dataset.py b/VATLM/vat_hubert/vathubert/data/onlyaudiohubert_dataset.py new file mode 100644 index 0000000000000000000000000000000000000000..d864f7d822fb124efb357a7a136117418b131c99 --- /dev/null +++ b/VATLM/vat_hubert/vathubert/data/onlyaudiohubert_dataset.py @@ -0,0 +1,436 @@ +# ---------------------------------------------------------------------------- +# VatLM: Visual-Audio-Text Pre-Training with Unified Masked Prediction for Speech Representation Learning +# Github source: https://github.com/microsoft/SpeechT5/tree/main/VATLM +# Code based on fairseq: https://github.com/facebookresearch/fairseq and av_hubert: 
https://github.com/facebookresearch/av_hubert +# +# Copyright (c) 2022 Microsoft +# Licensed under The MIT License [see LICENSE for details] +# ---------------------------------------------------------------------------- + +import itertools +import logging +import os +import sys +import time +from typing import Any, List, Optional, Union + +import numpy as np + +import torch +import torch.nn.functional as F +from fairseq.data import data_utils +from fairseq.data.fairseq_dataset import FairseqDataset +from scipy.io import wavfile +import kaldiio + + +DBG=True if len(sys.argv) == 1 else False + +if DBG: + import utils as custom_utils + logging.basicConfig( + format="%(asctime)s | %(levelname)s | %(name)s | %(message)s", + datefmt="%Y-%m-%d %H:%M:%S", + level=os.environ.get("LOGLEVEL", "DEBUG").upper(), + stream=sys.stdout, + ) +else: + from . import utils as custom_utils + +logger = logging.getLogger(__name__) + + +def load_audio(manifest_path, max_keep, min_keep, frame_rate, label_paths, label_rates, tol=0.1): + def is_audio_label_aligned(audio_dur, label_durs): + return all([abs(audio_dur - label_dur)<tol for label_dur in label_durs]) + + n_long, n_short, n_unaligned = 0, 0, 0 + names, inds, sizes = [], [], [] + dur_from_label_list = [] + is_seq_label = any([x==-1 for x in label_rates]) + for label_path, label_rate in zip(label_paths, label_rates): + label_lengths = [len(line.rstrip().split())/label_rate for line in open(label_path).readlines()] + dur_from_label_list.append(label_lengths) + dur_from_label_list = list(zip(*dur_from_label_list)) + + with open(manifest_path) as f: + root = f.readline().strip() + for ind, line in enumerate(f): + items = line.strip().split("\t") + sz = int(items[1]) / 640 # + if min_keep is not None and sz < min_keep: + n_short += 1 + elif max_keep is not None and sz > max_keep: + n_long += 1 + else: + audio_path = items[0] + names.append(os.path.join(root, audio_path)) + inds.append(ind) + sizes.append(sz) + tot = ind + 1 + logger.info( + ( + f"label_rates={label_rates}, " + f"max_keep={max_keep}, min_keep={min_keep}, " + f"loaded {len(names)}, skipped {n_short} short and {n_long} long, " + f"longest-loaded={max(sizes)}, shortest-loaded={min(sizes)}" + ) + ) + return root, names, inds, tot, sizes + + + +def load_label(label_path, inds, tot): + with open(label_path) as f: + labels = [line.rstrip() for line in f] + assert ( + len(labels) == tot + ), f"number of labels does not match ({len(labels)} != {tot})" + labels = [labels[i] for i in inds] + return labels + + +def load_label_offset(label_path, inds, tot): + with open(label_path) as f: + code_lengths = [len(line.encode("utf-8")) for line in f] + assert ( + len(code_lengths) == tot + ), f"number of labels does not match ({len(code_lengths)} != {tot})" + offsets = list(itertools.accumulate([0] + code_lengths)) + offsets = [(offsets[i], offsets[i + 1]) for i in inds] + return offsets + + +def verify_label_lengths( + audio_sizes, + audio_rate, + label_path, + label_rate, + inds, + tot, + tol=0.1, # tolerance in seconds +): + if label_rate < 0: + logger.info(f"{label_path} is sequence label. 
skipped") + return + + with open(label_path) as f: + lengths = [len(line.rstrip().split()) for line in f] + assert len(lengths) == tot + lengths = [lengths[i] for i in inds] + num_invalid = 0 + for i, ind in enumerate(inds): + dur_from_audio = audio_sizes[i] / audio_rate + dur_from_label = lengths[i] / label_rate + if abs(dur_from_audio - dur_from_label) > tol: + logger.warning( + ( + f"audio and label duration differ too much " + f"(|{dur_from_audio} - {dur_from_label}| > {tol}) " + f"in line {ind+1} of {label_path}. Check if `label_rate` " + f"is correctly set (currently {label_rate}). " + f"num. of samples = {audio_sizes[i]}; " + f"label length = {lengths[i]}" + ) + ) + num_invalid += 1 + if num_invalid > 0: + logger.warning( + f"total {num_invalid} (audio, label) pairs with mismatched lengths" + ) + + +class OnlyAudioHubertDataset(FairseqDataset): + def __init__( + self, + manifest_path: str, + sample_rate: float, + label_paths: List[str], + label_rates: Union[List[float], float], # -1 for sequence labels + pad_list: List[str], + eos_list: List[str], + label_processors: Optional[List[Any]] = None, + max_keep_sample_size: Optional[int] = None, + min_keep_sample_size: Optional[int] = None, + max_sample_size: Optional[int] = None, + shuffle: bool = True, + pad_audio: bool = False, + normalize: bool = False, + store_labels: bool = True, + single_target: bool = False, + stack_order_audio: int=1, + skip_verify: bool=False, + is_s2s=False, + ): + self.label_rates = ( + [label_rates for _ in range(len(label_paths))] + if isinstance(label_rates, int) + else label_rates + ) + self.audio_root, self.names, inds, tot, self.sizes = load_audio(manifest_path, max_keep_sample_size, min_keep_sample_size, frame_rate=sample_rate, label_paths=label_paths, label_rates=self.label_rates) + self.sample_rate = sample_rate + self.stack_order_audio = stack_order_audio + self.shuffle = shuffle + + self.num_labels = len(label_paths) + self.pad_list = pad_list + self.eos_list = eos_list + self.label_processors = label_processors + self.single_target = single_target + self.store_labels = store_labels + self.is_s2s = is_s2s + + assert self.single_target == (self.label_rates[0] == -1), f"single target should be equivalent to sequence label (label_rate==-1)" + if store_labels: + self.label_list = [load_label(p, inds, tot) for p in label_paths] + + else: + self.label_paths = label_paths + self.label_offsets_list = [ + load_label_offset(p, inds, tot) for p in label_paths + ] + assert ( + label_processors is None + or len(label_processors) == self.num_labels + ) + if not skip_verify: + for label_path, label_rate in zip(label_paths, self.label_rates): + verify_label_lengths(self.sizes, self.sample_rate, label_path, label_rate, inds, tot) + else: + logger.info(f"Skip label alignment verifying") + + self.max_sample_size = ( + max_sample_size if max_sample_size is not None else sys.maxsize + ) + self.pad_audio = pad_audio + self.normalize = normalize + + + def get_label(self, index, label_idx): + if self.store_labels: + label = self.label_list[label_idx][index] + else: + with open(self.label_paths[label_idx]) as f: + offset_s, offset_e = self.label_offsets_list[label_idx][index] + f.seek(offset_s) + label = f.read(offset_e - offset_s) + + if self.label_processors is not None: + label = self.label_processors[label_idx](label) + return label + + def get_labels(self, index): + return [self.get_label(index, i) for i in range(self.num_labels)] + + + def load_feature(self, mix_name): + """ + Load audio feature + Returns: + 
audio_feats: numpy.ndarray of shape [T, F] + """ + def stacker(feats, stack_order): + """ + Concatenating consecutive audio frames + Args: + feats - numpy.ndarray of shape [T, F] + stack_order - int (number of neighboring frames to concatenate + Returns: + feats - numpy.ndarray of shape [T', F'] + """ + feat_dim = feats.shape[1] + if len(feats) % stack_order != 0: + res = stack_order - len(feats) % stack_order + res = np.zeros([res, feat_dim]).astype(feats.dtype) + feats = np.concatenate([feats, res], axis=0) + feats = feats.reshape((-1, stack_order, feat_dim)).reshape(-1, stack_order*feat_dim) + return feats + audio_fn = mix_name + + audio_feats = kaldiio.load_mat(audio_fn).astype(np.float32) + audio_feats = stacker(audio_feats, self.stack_order_audio) # [T/stack_order_audio, F*stack_order_audio] + return audio_feats + + + def __getitem__(self, index): + audio_feats = self.load_feature(self.names[index]) + audio_feats = torch.from_numpy(audio_feats.astype(np.float32)) + if self.normalize: + with torch.no_grad(): + audio_feats = F.layer_norm(audio_feats, audio_feats.shape[1:]) + labels = self.get_labels(index) + + + return {"id": index, 'audio_source': audio_feats, "label_list": labels} + + + def __len__(self): + return len(self.sizes) + + def crop_to_max_size(self, wav, target_size, start=None): + size = len(wav) + diff = size - target_size + if diff <= 0: + return wav, 0 + # longer utterances + if start is None: + start, end = 0, target_size + # if self.random_crop: + # start = np.random.randint(0, diff + 1) + # end = size - diff + start + else: + end = start + target_size + return wav[start:end], start + + def collater(self, samples): + samples = [s for s in samples if s["id"] is not None] + if len(samples) == 0: + return {} + + audio_source = [s["audio_source"] for s in samples] + if audio_source[0] is None: + audio_source = None + if audio_source is not None: + audio_sizes = [len(s) for s in audio_source] + + if self.pad_audio: + audio_size = min(max(audio_sizes), self.max_sample_size) + else: + audio_size = min(min(audio_sizes), self.max_sample_size) + if audio_source is not None: + collated_audios, padding_mask, audio_starts = self.collater_audio(audio_source, audio_size) + else: + collated_audios, audio_starts = None, None + + targets_by_label = [ + [s["label_list"][i] for s in samples] + for i in range(self.num_labels) + ] + targets_list, lengths_list, ntokens_list = self.collater_label( + targets_by_label, audio_size, audio_starts + ) + + source = {"audio": collated_audios, "video": None} + net_input = {"source": source, "padding_mask": padding_mask} + batch = { + "id": torch.LongTensor([s["id"] for s in samples]), + "net_input": net_input, + } + + if self.single_target: + batch["target_lengths"] = lengths_list[0] + batch["ntokens"] = ntokens_list[0] + if self.is_s2s: + batch['target'], net_input['prev_output_tokens'] = targets_list[0][0], targets_list[0][1] + else: + batch["target"] = targets_list[0] + else: + batch["target_lengths_list"] = lengths_list + batch["ntokens_list"] = ntokens_list + batch["target_list"] = targets_list + + return batch + + def collater_audio(self, audios, audio_size, audio_starts=None): + audio_feat_shape = list(audios[0].shape[1:]) + collated_audios = audios[0].new_zeros([len(audios), audio_size]+audio_feat_shape) + padding_mask = ( + torch.BoolTensor(len(audios), audio_size).fill_(False) # + ) + start_known = audio_starts is not None + audio_starts = [0 for _ in audios] if not start_known else audio_starts + for i, audio in enumerate(audios): + diff 
= len(audio) - audio_size + if diff == 0: + collated_audios[i] = audio + elif diff < 0: + assert self.pad_audio + collated_audios[i] = torch.cat( + [audio, audio.new_full([-diff]+audio_feat_shape, 0.0)] + ) + padding_mask[i, diff:] = True + else: + collated_audios[i], audio_starts[i] = self.crop_to_max_size( + audio, audio_size, audio_starts[i] if start_known else None + ) + if len(audios[0].shape) == 2: + collated_audios = collated_audios.transpose(1, 2) # [B, T, F] -> [B, F, T] + else: + collated_audios = collated_audios.permute((0, 4, 1, 2, 3)).contiguous() # [B, T, H, W, C] -> [B, C, T, H, W] + return collated_audios, padding_mask, audio_starts + + def collater_frm_label( + self, targets, audio_size, audio_starts, label_rate, pad + ): + assert label_rate > 0 + s2f = label_rate / self.sample_rate # num label per sample + frm_starts = [int(round(s * s2f)) for s in audio_starts] + frm_size = int(round(audio_size * s2f)) + if not self.pad_audio: + rem_size = [len(t) - s for t, s in zip(targets, frm_starts)] + frm_size = min(frm_size, *rem_size) + targets = [t[s: s + frm_size] for t, s in zip(targets, frm_starts)] + logger.debug(f"audio_starts={audio_starts}") + logger.debug(f"frame_starts={frm_starts}") + logger.debug(f"frame_size={frm_size}") + + lengths = torch.LongTensor([len(t) for t in targets]) + ntokens = lengths.sum().item() + targets = data_utils.collate_tokens( + targets, pad_idx=pad, left_pad=False + ) + return targets, lengths, ntokens + + + def collater_seq_label(self, targets, pad): + lengths = torch.LongTensor([len(t) for t in targets]) + ntokens = lengths.sum().item() + targets = data_utils.collate_tokens( + targets, pad_idx=pad, left_pad=False + ) + return targets, lengths, ntokens + + + def collater_seq_label_s2s(self, targets, pad): + lengths = torch.LongTensor([len(t) for t in targets]) + ntokens = lengths.sum().item() + pad, eos = self.label_processors[0].dictionary.pad(), self.label_processors[0].dictionary.eos() + targets_ = data_utils.collate_tokens(targets, pad_idx=pad, eos_idx=eos, left_pad=False) + prev_output_tokens = data_utils.collate_tokens(targets, pad_idx=pad, eos_idx=eos, left_pad=False, move_eos_to_beginning=True) + return (targets_, prev_output_tokens), lengths, ntokens + + def collater_label(self, targets_by_label, audio_size, audio_starts): + targets_list, lengths_list, ntokens_list = [], [], [] + itr = zip(targets_by_label, self.label_rates, self.pad_list) + for targets, label_rate, pad in itr: + if label_rate == -1: + if self.is_s2s: + targets, lengths, ntokens = self.collater_seq_label_s2s(targets, pad) + else: + targets, lengths, ntokens = self.collater_seq_label(targets, pad) + else: + targets, lengths, ntokens = self.collater_frm_label( + targets, audio_size, audio_starts, label_rate, pad + ) + targets_list.append(targets) + lengths_list.append(lengths) + ntokens_list.append(ntokens) + return targets_list, lengths_list, ntokens_list + + + def num_tokens(self, index): + return self.size(index) + + def size(self, index): + if self.pad_audio: + return self.sizes[index] + return min(self.sizes[index], self.max_sample_size) + + def ordered_indices(self): + if self.shuffle: + order = [np.random.permutation(len(self))] + else: + order = [np.arange(len(self))] + + order.append(self.sizes) + return np.lexsort(order)[::-1] diff --git a/VATLM/vat_hubert/vathubert/data/texthubert_dataset.py b/VATLM/vat_hubert/vathubert/data/texthubert_dataset.py new file mode 100644 index 0000000000000000000000000000000000000000..d5701df7e81f27e329b8b4e096bf39e6bca58798 --- 
/dev/null +++ b/VATLM/vat_hubert/vathubert/data/texthubert_dataset.py @@ -0,0 +1,300 @@ +# ---------------------------------------------------------------------------- +# VatLM: Visual-Audio-Text Pre-Training with Unified Masked Prediction for Speech Representation Learning +# Github source: https://github.com/microsoft/SpeechT5/tree/main/VATLM +# Code based on fairseq: https://github.com/facebookresearch/fairseq and av_hubert: https://github.com/facebookresearch/av_hubert +# +# Copyright (c) 2022 Microsoft +# Licensed under The MIT License [see LICENSE for details] +# ---------------------------------------------------------------------------- + +import itertools +import logging +import os +import sys +import time +from typing import Any, List, Optional, Union + +import numpy as np + +import torch +import torch.nn.functional as F +from fairseq.data import data_utils +from fairseq.data.fairseq_dataset import FairseqDataset + +DBG=True if len(sys.argv) == 1 else False + +if DBG: + import utils as custom_utils + logging.basicConfig( + format="%(asctime)s | %(levelname)s | %(name)s | %(message)s", + datefmt="%Y-%m-%d %H:%M:%S", + level=os.environ.get("LOGLEVEL", "DEBUG").upper(), + stream=sys.stdout, + ) +else: + from . import utils as custom_utils + +logger = logging.getLogger(__name__) + + +def load_text(manifest_path, max_keep, min_keep, frame_rate, label_paths, label_rates, tol=0.1): + + n_long, n_short, n_unaligned = 0, 0, 0 + names, inds, sizes = [], [], [] + dur_from_label_list = [] + + with open(manifest_path) as f: + for ind, line in enumerate(f): + items = line.strip().split("\t") + frames = items[0] + sz = int(frames) + if min_keep is not None and sz < min_keep: + n_short += 1 + elif max_keep is not None and sz > max_keep: + n_long += 1 + else: + inds.append(ind) + sizes.append(sz) + + logger.info( + ( + f"max_keep={max_keep}, min_keep={min_keep}, " + f"loaded {len(inds)}, skipped {n_short} short and {n_long} long" + f"longest-loaded={max(sizes)}, shortest-loaded={min(sizes)}" + ) + ) + + return inds, sizes + + +def load_label(label_path, inds): + with open(label_path) as f: + labels = [line.rstrip() for line in f] + labels = [labels[i] for i in inds] + return labels + +def load_phone_label(tsv, inds): + with open(tsv) as f: + labels = [line.rstrip() for line in f.readlines()] + labels = [labels[i] for i in inds] + return labels + + +def load_label_offset(label_path, inds): + with open(label_path) as f: + code_lengths = [len(line.encode("utf-8")) for line in f] + offsets = list(itertools.accumulate([0] + code_lengths)) + offsets = [(offsets[i], offsets[i + 1]) for i in inds] + return offsets + + + +class TextHubertDataset(FairseqDataset): + def __init__( + self, + manifest_path: str, + sample_rate: float, + label_paths: List[str], + label_rates: Union[List[float], float], # -1 for sequence labels + pad_list: List[str], + eos_list: List[str], + label_processors: Optional[List[Any]] = None, + phone_sequence_processors: Optional[List[Any]] = None, + max_keep_sample_size: Optional[int] = None, + min_keep_sample_size: Optional[int] = None, + max_sample_size: Optional[int] = None, + shuffle: bool = True, + pad_audio: bool = False, + normalize: bool = False, + store_labels: bool = True, + single_target: bool = False, + stack_order_audio: int=1, + skip_verify: bool=False, + is_s2s=False, + ): + self.label_rates = ( + [label_rates for _ in range(len(label_paths))] + if isinstance(label_rates, int) + else label_rates + ) + inds, self.sizes = load_text(manifest_path, max_keep_sample_size, 
min_keep_sample_size, frame_rate=sample_rate, label_paths=label_paths, label_rates=self.label_rates) + self.sample_rate = sample_rate + self.stack_order_audio = stack_order_audio + self.shuffle = shuffle + + self.num_labels = len(label_paths) + self.pad_list = pad_list + self.eos_list = eos_list + self.label_processors = label_processors + self.phone_processors = phone_sequence_processors + self.single_target = single_target + self.store_labels = store_labels + self.is_s2s = is_s2s + + + if store_labels: + self.label_list = [load_label(p, inds) for p in label_paths] + self.phone_list = [load_phone_label(p, inds) for p in [manifest_path]] + + else: + self.label_paths = label_paths + self.label_offsets_list = [ + load_label_offset(p, inds) for p in label_paths + ] + + self.max_sample_size = ( + max_sample_size if max_sample_size is not None else sys.maxsize + ) + self.pad_audio = pad_audio + self.normalize = normalize + + + def get_label(self, index, label_idx): + if self.store_labels: + label = self.label_list[label_idx][index] + else: + with open(self.label_paths[label_idx]) as f: + offset_s, offset_e = self.label_offsets_list[label_idx][index] + f.seek(offset_s) + label = f.read(offset_e - offset_s) + + if self.label_processors is not None: + label = self.label_processors[label_idx](label) + return label + + def get_labels(self, index): + return [self.get_label(index, i) for i in range(self.num_labels)] + + def get_phone(self, index, label_idx): + label = self.phone_list[label_idx][index] + if self.phone_processors is not None: + label = self.phone_processors[label_idx](label) + return label + + def get_phones(self, index): + return [self.get_phone(index, i) for i in range(1)] + + + def __getitem__(self, index): + labels = self.get_labels(index) + phone_sequence_list = self.get_phones(index) + + + return {"id": index, "label_list": labels, "phone_sequence_list": phone_sequence_list} + + + def __len__(self): + return len(self.sizes) + + + def collater(self, samples): + samples = [s for s in samples if s["id"] is not None] + if len(samples) == 0: + return {} + + targets_by_label = [ + [s["label_list"][i] for s in samples] + for i in range(self.num_labels) + ] + targets_list, lengths_list, ntokens_list = self.collater_label( + targets_by_label, + ) + + phone_sequence_list = [s["phone_sequence_list"] for s in samples] + if phone_sequence_list[0] is None: + phone_sequence_list = None + + targets_by_phone_label = [ + [s["phone_sequence_list"][i] for s in samples] + for i in range(self.num_labels) + ] + targets_phone_list, lengths_phone_list, ntokens_phone_list = self.collater_phone_label( + targets_by_phone_label, + ) + + net_input = {"source": None} + batch = { + "id": torch.LongTensor([s["id"] for s in samples]), + "net_input": net_input, + } + + if self.single_target: + batch["target_lengths"] = lengths_list[0] + batch["ntokens"] = ntokens_list[0] + if self.is_s2s: + batch['target'], net_input['prev_output_tokens'] = targets_list[0][0], targets_list[0][1] + else: + batch["target"] = targets_list[0] + else: + batch["target_lengths_list"] = lengths_list + batch["ntokens_list"] = ntokens_list + batch["target_list"] = targets_list + + batch["extra_text_phone_list"] = targets_phone_list + + return batch + + def collater_frm_label( + self, targets, label_rate, pad + ): + lengths = torch.LongTensor([len(t) for t in targets]) + ntokens = lengths.sum().item() + targets = data_utils.collate_tokens( + targets, pad_idx=pad, left_pad=False + ) + return targets, lengths, ntokens + + + def 
collater_frm_phone_label( + self, targets, pad + ): + + lengths = torch.LongTensor([len(t) for t in targets]) + ntokens = lengths.sum().item() + targets = data_utils.collate_tokens( + targets, pad_idx=pad, left_pad=False + ) + return targets, lengths, ntokens + + def collater_label(self, targets_by_label,): + targets_list, lengths_list, ntokens_list = [], [], [] + itr = zip(targets_by_label, self.label_rates, self.pad_list) + for targets, label_rate, pad in itr: + targets, lengths, ntokens = self.collater_frm_label( + targets, label_rate, pad + ) + targets_list.append(targets) + lengths_list.append(lengths) + ntokens_list.append(ntokens) + return targets_list, lengths_list, ntokens_list + + + def collater_phone_label(self, targets_by_label): + targets_list, lengths_list, ntokens_list = [], [], [] + itr = zip(targets_by_label, self.label_rates, self.pad_list) + for targets, label_rate, pad in itr: + targets, lengths, ntokens = self.collater_frm_phone_label( + targets, pad + ) + targets_list.append(targets) + lengths_list.append(lengths) + ntokens_list.append(ntokens) + return targets_list, lengths_list, ntokens_list + + + def num_tokens(self, index): + return self.size(index) + + def size(self, index): + if self.pad_audio: + return self.sizes[index] + return min(self.sizes[index], self.max_sample_size) + + def ordered_indices(self): + if self.shuffle: + order = [np.random.permutation(len(self))] + else: + order = [np.arange(len(self))] + + order.append(self.sizes) + return np.lexsort(order)[::-1] diff --git a/VATLM/vat_hubert/vathubert/data/utils.py b/VATLM/vat_hubert/vathubert/data/utils.py new file mode 100644 index 0000000000000000000000000000000000000000..666500334e3223273e1df67125796116a429c3c9 --- /dev/null +++ b/VATLM/vat_hubert/vathubert/data/utils.py @@ -0,0 +1,300 @@ +# ---------------------------------------------------------------------------- +# VatLM: Visual-Audio-Text Pre-Training with Unified Masked Prediction for Speech Representation Learning +# Github source: https://github.com/microsoft/SpeechT5/tree/main/VATLM +# Code based on fairseq: https://github.com/facebookresearch/fairseq and av_hubert: https://github.com/facebookresearch/av_hubert +# +# Copyright (c) 2022 Microsoft +# Licensed under The MIT License [see LICENSE for details] +# ---------------------------------------------------------------------------- +import cv2 +import torch +import random +import numpy as np +from typing import Dict, List, Optional, Tuple + +def load_video(path): + for i in range(3): + try: + cap = cv2.VideoCapture(path) + frames = [] + while True: + ret, frame = cap.read() + if ret: + frame = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY) + frames.append(frame) + else: + break + frames = np.stack(frames) + return frames + except Exception: + print(f"failed loading {path} ({i} / 3)") + if i == 2: + raise ValueError(f"Unable to load {path}") + + +class Compose(object): + """Compose several preprocess together. + Args: + preprocess (list of ``Preprocess`` objects): list of preprocess to compose. + """ + + def __init__(self, preprocess): + self.preprocess = preprocess + + def __call__(self, sample): + for t in self.preprocess: + sample = t(sample) + return sample + + def __repr__(self): + format_string = self.__class__.__name__ + '(' + for t in self.preprocess: + format_string += '\n' + format_string += ' {0}'.format(t) + format_string += '\n)' + return format_string + + +class Normalize(object): + """Normalize a ndarray image with mean and standard deviation. 
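+    Each frame is mapped to (frames - mean) / std in __call__.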
+ """ + + def __init__(self, mean, std): + self.mean = mean + self.std = std + + def __call__(self, frames): + """ + Args: + tensor (Tensor): Tensor image of size (C, H, W) to be normalized. + Returns: + Tensor: Normalized Tensor image. + """ + frames = (frames - self.mean) / self.std + return frames + + def __repr__(self): + return self.__class__.__name__+'(mean={0}, std={1})'.format(self.mean, self.std) + +class CenterCrop(object): + """Crop the given image at the center + """ + def __init__(self, size): + self.size = size + + def __call__(self, frames): + """ + Args: + img (numpy.ndarray): Images to be cropped. + Returns: + numpy.ndarray: Cropped image. + """ + t, h, w = frames.shape + th, tw = self.size + delta_w = int(round((w - tw))/2.) + delta_h = int(round((h - th))/2.) + frames = frames[:, delta_h:delta_h+th, delta_w:delta_w+tw] + return frames + + +class RandomCrop(object): + """Crop the given image at the center + """ + + def __init__(self, size): + self.size = size + + def __call__(self, frames): + """ + Args: + img (numpy.ndarray): Images to be cropped. + Returns: + numpy.ndarray: Cropped image. + """ + t, h, w = frames.shape + th, tw = self.size + delta_w = random.randint(0, w-tw) + delta_h = random.randint(0, h-th) + frames = frames[:, delta_h:delta_h+th, delta_w:delta_w+tw] + return frames + + def __repr__(self): + return self.__class__.__name__ + '(size={0})'.format(self.size) + +class HorizontalFlip(object): + """Flip image horizontally. + """ + + def __init__(self, flip_ratio): + self.flip_ratio = flip_ratio + + def __call__(self, frames): + """ + Args: + img (numpy.ndarray): Images to be flipped with a probability flip_ratio + Returns: + numpy.ndarray: Cropped image. + """ + t, h, w = frames.shape + if random.random() < self.flip_ratio: + for index in range(t): + frames[index] = cv2.flip(frames[index], 1) + return frames + +def compute_mask_indices( + shape: Tuple[int, int], + padding_mask: Optional[torch.Tensor], + mask_prob: float, + mask_length: int, + mask_type: str = "static", + mask_other: float = 0.0, + min_masks: int = 0, + no_overlap: bool = False, + min_space: int = 0, +) -> np.ndarray: + """ + Computes random mask spans for a given shape + Args: + shape: the the shape for which to compute masks. + should be of size 2 where first element is batch size and 2nd is timesteps + padding_mask: optional padding mask of the same size as shape, which will prevent masking padded elements + mask_prob: probability for each token to be chosen as start of the span to be masked. this will be multiplied by + number of timesteps divided by length of mask span to mask approximately this percentage of all elements. + however due to overlaps, the actual number will be smaller (unless no_overlap is True) + mask_type: how to compute mask lengths + static = fixed size + uniform = sample from uniform distribution [mask_other, mask_length*2] + normal = sample from normal distribution with mean mask_length and stdev mask_other. 
mask is min 1 element + poisson = sample from possion distribution with lambda = mask length + min_masks: minimum number of masked spans + no_overlap: if false, will switch to an alternative recursive algorithm that prevents spans from overlapping + min_space: only used if no_overlap is True, this is how many elements to keep unmasked between spans + """ + + bsz, all_sz = shape + mask = np.full((bsz, all_sz), False) + + all_num_mask = int( + # add a random number for probabilistic rounding + mask_prob * all_sz / float(mask_length) + + np.random.rand() + ) + + all_num_mask = max(min_masks, all_num_mask) + + mask_idcs = [] + for i in range(bsz): + if padding_mask is not None: + sz = all_sz - padding_mask[i].long().sum().item() + num_mask = int( + # add a random number for probabilistic rounding + mask_prob * sz / float(mask_length) + + np.random.rand() + ) + num_mask = max(min_masks, num_mask) + else: + sz = all_sz + num_mask = all_num_mask + + if mask_type == "static": + lengths = np.full(num_mask, mask_length) + elif mask_type == "uniform": + lengths = np.random.randint(mask_other, mask_length * 2 + 1, size=num_mask) + elif mask_type == "normal": + lengths = np.random.normal(mask_length, mask_other, size=num_mask) + lengths = [max(1, int(round(x))) for x in lengths] + elif mask_type == "poisson": + lengths = np.random.poisson(mask_length, size=num_mask) + lengths = [int(round(x)) for x in lengths] + else: + raise Exception("unknown mask selection " + mask_type) + + if sum(lengths) == 0: + lengths[0] = min(mask_length, sz - 1) + + if no_overlap: + mask_idc = [] + + def arrange(s, e, length, keep_length): + span_start = np.random.randint(s, e - length) + mask_idc.extend(span_start + i for i in range(length)) + + new_parts = [] + if span_start - s - min_space >= keep_length: + new_parts.append((s, span_start - min_space + 1)) + if e - span_start - keep_length - min_space > keep_length: + new_parts.append((span_start + length + min_space, e)) + return new_parts + + parts = [(0, sz)] + min_length = min(lengths) + for length in sorted(lengths, reverse=True): + lens = np.fromiter( + (e - s if e - s >= length + min_space else 0 for s, e in parts), + np.int, + ) + l_sum = np.sum(lens) + if l_sum == 0: + break + probs = lens / np.sum(lens) + c = np.random.choice(len(parts), p=probs) + s, e = parts.pop(c) + parts.extend(arrange(s, e, length, min_length)) + mask_idc = np.asarray(mask_idc) + else: + min_len = min(lengths) + if sz - min_len <= num_mask: + min_len = sz - num_mask - 1 + + mask_idc = np.random.choice(sz - min_len, num_mask, replace=False) + + mask_idc = np.asarray( + [ + mask_idc[j] + offset + for j in range(len(mask_idc)) + for offset in range(lengths[j]) + ] + ) + + mask_idcs.append(np.unique(mask_idc[mask_idc < sz])) + + min_len = min([len(m) for m in mask_idcs]) + batch_indexes, starts, ends = [], [], [] + for i, mask_idc in enumerate(mask_idcs): + if len(mask_idc) > min_len: + mask_idc = np.random.choice(mask_idc, min_len, replace=False) + mask[i, mask_idc] = True + vals, run_starts, run_lengths = find_runs(mask[i]) + start_indices, lengths = run_starts[vals == True], run_lengths[vals == True] + starts.append(start_indices) + ends.append(start_indices+lengths) + batch_indexes.append(np.zeros([len(start_indices)])+i) + return mask, np.concatenate(starts).astype(np.int64), np.concatenate(ends).astype(np.int64), np.concatenate(batch_indexes).astype(np.int64) + +def find_runs(x): + """Find runs of consecutive items in an array.""" + + # ensure array + x = np.asanyarray(x) + if x.ndim != 
1: + raise ValueError('only 1D array supported') + n = x.shape[0] + + # handle empty array + if n == 0: + return np.array([]), np.array([]), np.array([]) + + else: + # find run starts + loc_run_start = np.empty(n, dtype=bool) + loc_run_start[0] = True + np.not_equal(x[:-1], x[1:], out=loc_run_start[1:]) + run_starts = np.nonzero(loc_run_start)[0] + + # find run values + run_values = x[loc_run_start] + + # find run lengths + run_lengths = np.diff(np.append(run_starts, n)) + + return run_values, run_starts, run_lengths diff --git a/VATLM/vat_hubert/vathubert/data/vathubert_dataset.py b/VATLM/vat_hubert/vathubert/data/vathubert_dataset.py new file mode 100644 index 0000000000000000000000000000000000000000..f1cf0939ae46661508914028e29e522b09f9afe2 --- /dev/null +++ b/VATLM/vat_hubert/vathubert/data/vathubert_dataset.py @@ -0,0 +1,530 @@ +# ---------------------------------------------------------------------------- +# VatLM: Visual-Audio-Text Pre-Training with Unified Masked Prediction for Speech Representation Learning +# Github source: https://github.com/microsoft/SpeechT5/tree/main/VATLM +# Code based on fairseq: https://github.com/facebookresearch/fairseq and av_hubert: https://github.com/facebookresearch/av_hubert +# +# Copyright (c) 2022 Microsoft +# Licensed under The MIT License [see LICENSE for details] +# ---------------------------------------------------------------------------- + +import itertools +import logging +import os +import sys +import time +from typing import Any, List, Optional, Union + +import numpy as np + +import torch +import torch.nn.functional as F +from fairseq.data import data_utils +from fairseq.data.fairseq_dataset import FairseqDataset +from scipy.io import wavfile +import kaldiio + +DBG=True if len(sys.argv) == 1 else False + +if DBG: + import utils as custom_utils + logging.basicConfig( + format="%(asctime)s | %(levelname)s | %(name)s | %(message)s", + datefmt="%Y-%m-%d %H:%M:%S", + level=os.environ.get("LOGLEVEL", "DEBUG").upper(), + stream=sys.stdout, + ) +else: + from . 
import utils as custom_utils + +logger = logging.getLogger(__name__) + + +def load_audio_visual(manifest_path, max_keep, min_keep, frame_rate, label_paths, label_rates, tol=0.1): + def is_audio_label_aligned(audio_dur, label_durs): + return all([abs(audio_dur - label_dur)<tol for label_dur in label_durs]) + + n_long, n_short, n_unaligned = 0, 0, 0 + names, inds, sizes = [], [], [] + dur_from_label_list = [] + is_seq_label = any([x==-1 for x in label_rates]) + for label_path, label_rate in zip(label_paths, label_rates): + label_lengths = [len(line.rstrip().split())/label_rate for line in open(label_path).readlines()] + dur_from_label_list.append(label_lengths) + dur_from_label_list = list(zip(*dur_from_label_list)) + + with open(manifest_path) as f: + root = f.readline().strip() + for ind, line in enumerate(f): + items = line.strip().split("\t") + sz = int(items[-2]) # + if min_keep is not None and sz < min_keep: + n_short += 1 + elif max_keep is not None and sz > max_keep: + n_long += 1 + elif (not is_seq_label) and (not is_audio_label_aligned(sz/frame_rate, dur_from_label_list[ind])): + n_unaligned += 1 + else: + video_path = items[1] + audio_path = items[2] + audio_id = items[0] + names.append((video_path, audio_path+','+audio_id)) + inds.append(ind) + sizes.append(sz) + tot = ind + 1 + logger.info( + ( + f"label_rates={label_rates}, " + f"max_keep={max_keep}, min_keep={min_keep}, " + f"loaded {len(names)}, skipped {n_short} short and {n_long} long and {n_unaligned} unaligned, " + f"longest-loaded={max(sizes)}, shortest-loaded={min(sizes)}" + ) + ) + return root, names, inds, tot, sizes + +def load_label(label_path, inds, tot): + with open(label_path) as f: + labels = [line.rstrip() for line in f] + assert ( + len(labels) == tot + ), f"number of labels does not match ({len(labels)} != {tot})" + labels = [labels[i] for i in inds] + return labels + + +def load_label_offset(label_path, inds, tot): + with open(label_path) as f: + code_lengths = [len(line.encode("utf-8")) for line in f] + assert ( + len(code_lengths) == tot + ), f"number of labels does not match ({len(code_lengths)} != {tot})" + offsets = list(itertools.accumulate([0] + code_lengths)) + offsets = [(offsets[i], offsets[i + 1]) for i in inds] + return offsets + + +def verify_label_lengths( + audio_sizes, + audio_rate, + label_path, + label_rate, + inds, + tot, + tol=0.1, # tolerance in seconds +): + if label_rate < 0: + logger.info(f"{label_path} is sequence label. skipped") + return + + with open(label_path) as f: + lengths = [len(line.rstrip().split()) for line in f] + assert len(lengths) == tot + lengths = [lengths[i] for i in inds] + num_invalid = 0 + for i, ind in enumerate(inds): + dur_from_audio = audio_sizes[i] / audio_rate + dur_from_label = lengths[i] / label_rate + if abs(dur_from_audio - dur_from_label) > tol: + logger.warning( + ( + f"audio and label duration differ too much " + f"(|{dur_from_audio} - {dur_from_label}| > {tol}) " + f"in line {ind+1} of {label_path}. Check if `label_rate` " + f"is correctly set (currently {label_rate}). " + f"num. 
of samples = {audio_sizes[i]}; " + f"label length = {lengths[i]}" + ) + ) + num_invalid += 1 + if num_invalid > 0: + logger.warning( + f"total {num_invalid} (audio, label) pairs with mismatched lengths" + ) + + +class VATHubertDataset(FairseqDataset): + def __init__( + self, + manifest_path: str, + sample_rate: float, + label_paths: List[str], + label_rates: Union[List[float], float], # -1 for sequence labels + pad_list: List[str], + eos_list: List[str], + label_processors: Optional[List[Any]] = None, + max_keep_sample_size: Optional[int] = None, + min_keep_sample_size: Optional[int] = None, + max_sample_size: Optional[int] = None, + shuffle: bool = True, + pad_audio: bool = False, + normalize: bool = False, + store_labels: bool = True, + random_crop: bool = False, + single_target: bool = False, + stack_order_audio: int=1, + skip_verify: bool=False, + image_mean: float=0, + image_std: float=1, + image_crop_size: int=88, + image_aug: bool=False, + modalities: Optional[List[str]]=None, + is_s2s=False, + noise_fn=None, + noise_prob=0, + noise_snr=0, + noise_num=1 + ): + self.label_rates = ( + [label_rates for _ in range(len(label_paths))] + if isinstance(label_rates, int) + else label_rates + ) + self.modalities = set(modalities) + self.audio_root, self.names, inds, tot, self.sizes = load_audio_visual(manifest_path, max_keep_sample_size, min_keep_sample_size, frame_rate=sample_rate, label_paths=label_paths, label_rates=self.label_rates) + self.sample_rate = sample_rate + self.stack_order_audio = stack_order_audio + self.shuffle = shuffle + self.random_crop = random_crop + + self.num_labels = len(label_paths) + self.pad_list = pad_list + self.eos_list = eos_list + self.label_processors = label_processors + self.single_target = single_target + self.store_labels = store_labels + self.is_s2s = is_s2s + self.noise_wav, self.noise_prob, self.noise_snr, self.noise_num = [ln.strip() for ln in open(noise_fn).readlines()] if noise_fn is not None else [], noise_prob, noise_snr, noise_num + + assert self.single_target == (self.label_rates[0] == -1), f"single target should be equivalent to sequence label (label_rate==-1)" + if store_labels: + self.label_list = [load_label(p, inds, tot) for p in label_paths] + else: + self.label_paths = label_paths + self.label_offsets_list = [ + load_label_offset(p, inds, tot) for p in label_paths + ] + assert ( + label_processors is None + or len(label_processors) == self.num_labels + ) + if not skip_verify: + for label_path, label_rate in zip(label_paths, self.label_rates): + verify_label_lengths(self.sizes, self.sample_rate, label_path, label_rate, inds, tot) + else: + logger.info(f"Skip label alignment verifying") + + self.max_sample_size = ( + max_sample_size if max_sample_size is not None else sys.maxsize + ) + self.pad_audio = pad_audio + self.normalize = normalize + if image_aug: + self.transform = custom_utils.Compose([ + custom_utils.Normalize( 0.0,255.0 ), + custom_utils.RandomCrop((image_crop_size, image_crop_size)), + custom_utils.HorizontalFlip(0.5), + custom_utils.Normalize(image_mean, image_std) ]) + else: + self.transform = custom_utils.Compose([ + custom_utils.Normalize( 0.0,255.0 ), + custom_utils.CenterCrop((image_crop_size, image_crop_size)), + custom_utils.Normalize(image_mean, image_std) ]) + logger.info(f"image transform: {self.transform}") + + logger.info( + f"pad_audio={pad_audio}, random_crop={random_crop}, " + f"normalize={normalize}, max_sample_size={self.max_sample_size}, " + f"seqs2seq data={self.is_s2s},") + logger.info( + f"Noise wav: 
{noise_fn}->{len(self.noise_wav)} wav, Prob: {self.noise_prob}, SNR: {self.noise_snr}, Number of mixture: {self.noise_num}" + ) + + def get_label(self, index, label_idx): + if self.store_labels: + label = self.label_list[label_idx][index] + else: + with open(self.label_paths[label_idx]) as f: + offset_s, offset_e = self.label_offsets_list[label_idx][index] + f.seek(offset_s) + label = f.read(offset_e - offset_s) + + if self.label_processors is not None: + label = self.label_processors[label_idx](label) + return label + + def get_labels(self, index): + return [self.get_label(index, i) for i in range(self.num_labels)] + + def load_feature(self, mix_name): + """ + Load image and audio feature + Returns: + video_feats: numpy.ndarray of shape [T, H, W, 1], audio_feats: numpy.ndarray of shape [T, F] + """ + def stacker(feats, stack_order): + """ + Concatenating consecutive audio frames + Args: + feats - numpy.ndarray of shape [T, F] + stack_order - int (number of neighboring frames to concatenate + Returns: + feats - numpy.ndarray of shape [T', F'] + """ + feat_dim = feats.shape[1] + if len(feats) % stack_order != 0: + res = stack_order - len(feats) % stack_order + res = np.zeros([res, feat_dim]).astype(feats.dtype) + feats = np.concatenate([feats, res], axis=0) + feats = feats.reshape((-1, stack_order, feat_dim)).reshape(-1, stack_order*feat_dim) + return feats + video_fn, audio_fn = mix_name + if 'video' in self.modalities: + video_feats = self.load_video(video_fn) # [T, H, W, 1] + else: + video_feats = None + if 'audio' in self.modalities: + audio_fn = audio_fn.split(',')[0] + audio_feats = kaldiio.load_mat(audio_fn).astype(np.float32) + + audio_feats = stacker(audio_feats, self.stack_order_audio) # [T/stack_order_audio, F*stack_order_audio] + else: + audio_feats = None + if audio_feats is not None and video_feats is not None: + diff = len(audio_feats) - len(video_feats) + if diff < 0: + audio_feats = np.concatenate([audio_feats, np.zeros([-diff, audio_feats.shape[-1]], dtype=audio_feats.dtype)]) + elif diff > 0: + audio_feats = audio_feats[:-diff] + return video_feats, audio_feats + + def load_video(self, audio_name): + feats = custom_utils.load_video(os.path.join(self.audio_root, audio_name)) + feats = self.transform(feats) + feats = np.expand_dims(feats, axis=-1) + return feats + + def select_noise(self): + rand_indexes = np.random.randint(0, len(self.noise_wav), size=self.noise_num) + noise_wav = [] + for x in rand_indexes: + noise_wav.append(wavfile.read(self.noise_wav[x])[1].astype(np.float32)) + if self.noise_num == 1: + return noise_wav[0] + else: + min_len = min([len(x) for x in noise_wav]) + noise_wav = [x[:min_len] for x in noise_wav] + noise_wav = np.floor(np.stack(noise_wav).mean(axis=0)) + return noise_wav + + def add_noise(self, clean_wav): + clean_wav = clean_wav.astype(np.float32) + noise_wav = self.select_noise() + if type(self.noise_snr) == int or type(self.noise_snr) == float: + snr = self.noise_snr + elif type(self.noise_snr) == tuple: + snr = np.random.randint(self.noise_snr[0], self.noise_snr[1]+1) + clean_rms = np.sqrt(np.mean(np.square(clean_wav), axis=-1)) + if len(clean_wav) > len(noise_wav): + ratio = int(np.ceil(len(clean_wav)/len(noise_wav))) + noise_wav = np.concatenate([noise_wav for _ in range(ratio)]) + if len(clean_wav) < len(noise_wav): + start = 0 + noise_wav = noise_wav[start: start + len(clean_wav)] + noise_rms = np.sqrt(np.mean(np.square(noise_wav), axis=-1)) + adjusted_noise_rms = clean_rms / (10**(snr/20)) + adjusted_noise_wav = noise_wav * 
(adjusted_noise_rms / noise_rms) + mixed = clean_wav + adjusted_noise_wav + + #Avoid clipping noise + max_int16 = np.iinfo(np.int16).max + min_int16 = np.iinfo(np.int16).min + if mixed.max(axis=0) > max_int16 or mixed.min(axis=0) < min_int16: + if mixed.max(axis=0) >= abs(mixed.min(axis=0)): + reduction_rate = max_int16 / mixed.max(axis=0) + else : + reduction_rate = min_int16 / mixed.min(axis=0) + mixed = mixed * (reduction_rate) + mixed = mixed.astype(np.int16) + return mixed + + def __getitem__(self, index): + video_feats, audio_feats = self.load_feature(self.names[index]) + audio_feats, video_feats = torch.from_numpy(audio_feats.astype(np.float32)) if audio_feats is not None else None, torch.from_numpy(video_feats.astype(np.float32)) if video_feats is not None else None + if self.normalize and 'audio' in self.modalities: + with torch.no_grad(): + audio_feats = F.layer_norm(audio_feats, audio_feats.shape[1:]) + labels = self.get_labels(index) + fid = self.names[index][1].split(':')[1] + return {"id": index, 'fid': fid, "video_source": video_feats, 'audio_source': audio_feats, "label_list": labels} + + def __len__(self): + return len(self.sizes) + + def crop_to_max_size(self, wav, target_size, start=None): + size = len(wav) + diff = size - target_size + if diff <= 0: + return wav, 0 + # longer utterances + if start is None: + start, end = 0, target_size + if self.random_crop: + start = np.random.randint(0, diff + 1) + end = size - diff + start + else: + end = start + target_size + return wav[start:end], start + + def collater(self, samples): + samples = [s for s in samples if s["id"] is not None] + if len(samples) == 0: + return {} + + audio_source, video_source = [s["audio_source"] for s in samples], [s["video_source"] for s in samples] + if audio_source[0] is None: + audio_source = None + if video_source[0] is None: + video_source = None + if audio_source is not None: + audio_sizes = [len(s) for s in audio_source] + else: + audio_sizes = [len(s) for s in video_source] + if self.pad_audio: + audio_size = min(max(audio_sizes), self.max_sample_size) + else: + audio_size = min(min(audio_sizes), self.max_sample_size) + if audio_source is not None: + collated_audios, padding_mask, audio_starts = self.collater_audio(audio_source, audio_size) + else: + collated_audios, audio_starts = None, None + if video_source is not None: + collated_videos, padding_mask, audio_starts = self.collater_audio(video_source, audio_size, audio_starts) + else: + collated_videos = None + targets_by_label = [ + [s["label_list"][i] for s in samples] + for i in range(self.num_labels) + ] + targets_list, lengths_list, ntokens_list = self.collater_label( + targets_by_label, audio_size, audio_starts + ) + source = {"audio": collated_audios, "video": collated_videos} + net_input = {"source": source, "padding_mask": padding_mask} + batch = { + "id": torch.LongTensor([s["id"] for s in samples]), + "net_input": net_input, + "utt_id": [s['fid'] for s in samples] + } + + if self.single_target: + batch["target_lengths"] = lengths_list[0] + batch["ntokens"] = ntokens_list[0] + if self.is_s2s: + batch['target'], net_input['prev_output_tokens'] = targets_list[0][0], targets_list[0][1] + else: + batch["target"] = targets_list[0] + else: + batch["target_lengths_list"] = lengths_list + batch["ntokens_list"] = ntokens_list + batch["target_list"] = targets_list + return batch + + def collater_audio(self, audios, audio_size, audio_starts=None): + audio_feat_shape = list(audios[0].shape[1:]) + collated_audios = 
audios[0].new_zeros([len(audios), audio_size]+audio_feat_shape) + padding_mask = ( + torch.BoolTensor(len(audios), audio_size).fill_(False) # + ) + start_known = audio_starts is not None + audio_starts = [0 for _ in audios] if not start_known else audio_starts + for i, audio in enumerate(audios): + diff = len(audio) - audio_size + if diff == 0: + collated_audios[i] = audio + elif diff < 0: + assert self.pad_audio + collated_audios[i] = torch.cat( + [audio, audio.new_full([-diff]+audio_feat_shape, 0.0)] + ) + padding_mask[i, diff:] = True + else: + collated_audios[i], audio_starts[i] = self.crop_to_max_size( + audio, audio_size, audio_starts[i] if start_known else None + ) + if len(audios[0].shape) == 2: + collated_audios = collated_audios.transpose(1, 2) # [B, T, F] -> [B, F, T] + else: + collated_audios = collated_audios.permute((0, 4, 1, 2, 3)).contiguous() # [B, T, H, W, C] -> [B, C, T, H, W] + return collated_audios, padding_mask, audio_starts + + def collater_frm_label( + self, targets, audio_size, audio_starts, label_rate, pad + ): + assert label_rate > 0 + s2f = label_rate / self.sample_rate # num label per sample + frm_starts = [int(round(s * s2f)) for s in audio_starts] + frm_size = int(round(audio_size * s2f)) + if not self.pad_audio: + rem_size = [len(t) - s for t, s in zip(targets, frm_starts)] + frm_size = min(frm_size, *rem_size) + targets = [t[s: s + frm_size] for t, s in zip(targets, frm_starts)] + logger.debug(f"audio_starts={audio_starts}") + logger.debug(f"frame_starts={frm_starts}") + logger.debug(f"frame_size={frm_size}") + + lengths = torch.LongTensor([len(t) for t in targets]) + ntokens = lengths.sum().item() + targets = data_utils.collate_tokens( + targets, pad_idx=pad, left_pad=False + ) + return targets, lengths, ntokens + + def collater_seq_label(self, targets, pad): + lengths = torch.LongTensor([len(t) for t in targets]) + ntokens = lengths.sum().item() + targets = data_utils.collate_tokens( + targets, pad_idx=pad, left_pad=False + ) + return targets, lengths, ntokens + + def collater_seq_label_s2s(self, targets, pad): + lengths = torch.LongTensor([len(t) for t in targets]) + ntokens = lengths.sum().item() + pad, eos = self.label_processors[0].dictionary.pad(), self.label_processors[0].dictionary.eos() + targets_ = data_utils.collate_tokens(targets, pad_idx=pad, eos_idx=eos, left_pad=False) + prev_output_tokens = data_utils.collate_tokens(targets, pad_idx=pad, eos_idx=eos, left_pad=False, move_eos_to_beginning=True) + return (targets_, prev_output_tokens), lengths, ntokens + + def collater_label(self, targets_by_label, audio_size, audio_starts): + targets_list, lengths_list, ntokens_list = [], [], [] + itr = zip(targets_by_label, self.label_rates, self.pad_list) + for targets, label_rate, pad in itr: + if label_rate == -1: + if self.is_s2s: + targets, lengths, ntokens = self.collater_seq_label_s2s(targets, pad) + else: + targets, lengths, ntokens = self.collater_seq_label(targets, pad) + else: + targets, lengths, ntokens = self.collater_frm_label( + targets, audio_size, audio_starts, label_rate, pad + ) + targets_list.append(targets) + lengths_list.append(lengths) + ntokens_list.append(ntokens) + return targets_list, lengths_list, ntokens_list + + def num_tokens(self, index): + return self.size(index) + + def size(self, index): + if self.pad_audio: + return self.sizes[index] + return min(self.sizes[index], self.max_sample_size) + + def ordered_indices(self): + if self.shuffle: + order = [np.random.permutation(len(self))] + else: + order = [np.arange(len(self))] 
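+ # np.lexsort treats the *last* key as the primary sort key, so the indices
+ # built below are ordered by utterance size, with the random permutation /
+ # arange above only breaking ties; the trailing [::-1] reverses the result
+ # so the longest utterances come first.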
+ + order.append(self.sizes) + return np.lexsort(order)[::-1] diff --git a/VATLM/vat_hubert/vathubert/decode_avhubert_lrs3.sh b/VATLM/vat_hubert/vathubert/decode_avhubert_lrs3.sh new file mode 100644 index 0000000000000000000000000000000000000000..867fb7a0857ac66f8c738b93aa75495a70a03cc6 --- /dev/null +++ b/VATLM/vat_hubert/vathubert/decode_avhubert_lrs3.sh @@ -0,0 +1,17 @@ +#!/bin/bash + +decode_path=/path/to/finetuned_model +finetuned_model=checkpoint_best.pt +beam=50 +data=$1 +[ -z $data ] && data="test" + +python -B infer_s2s.py --config-dir /path/to/vat_hubert/vathubert/conf/ --config-name s2s_decode.yaml \ + dataset.gen_subset=${data} common_eval.path=${decode_path}/checkpoints/${finetuned_model} \ + common_eval.results_path=${decode_path}/${finetuned_model}_${data}_video_beam${beam} \ + override.modalities=["video"] \ + common.user_dir=/path/to/vat_hubert/vathubert \ + override.data=/path/to/data \ + override.label_dir=/path/to/data \ + generation.beam=${beam} + diff --git a/VATLM/vat_hubert/vathubert/infer_s2s.py b/VATLM/vat_hubert/vathubert/infer_s2s.py new file mode 100644 index 0000000000000000000000000000000000000000..d86d5ec0bc96a404a004684bdc042e3ca0fdceec --- /dev/null +++ b/VATLM/vat_hubert/vathubert/infer_s2s.py @@ -0,0 +1,321 @@ +# ---------------------------------------------------------------------------- +# VatLM: Visual-Audio-Text Pre-Training with Unified Masked Prediction for Speech Representation Learning +# Github source: https://github.com/microsoft/SpeechT5/tree/main/VATLM +# Code based on fairseq: https://github.com/facebookresearch/fairseq and av_hubert: https://github.com/facebookresearch/av_hubert +# +# Copyright (c) 2022 Microsoft +# Licensed under The MIT License [see LICENSE for details] +# ---------------------------------------------------------------------------- + +import ast +from itertools import chain +import logging +import math +import os +import sys +import json +import hashlib +import editdistance +from argparse import Namespace + +import numpy as np +import torch +from fairseq import checkpoint_utils, options, tasks, utils, distributed_utils +from fairseq.dataclass.utils import convert_namespace_to_omegaconf +from fairseq.logging import progress_bar +from fairseq.logging.meters import StopwatchMeter, TimeMeter +from fairseq.models import FairseqLanguageModel +from omegaconf import DictConfig + +from pathlib import Path +import hydra +from hydra.core.config_store import ConfigStore +from fairseq.dataclass.configs import ( + CheckpointConfig, + CommonConfig, + CommonEvalConfig, + DatasetConfig, + DistributedTrainingConfig, + GenerationConfig, + FairseqDataclass, +) +from dataclasses import dataclass, field, is_dataclass +from typing import Any, Dict, List, Optional, Tuple, Union +from omegaconf import OmegaConf + +logging.root.setLevel(logging.INFO) +logging.basicConfig(level=logging.INFO) +logger = logging.getLogger(__name__) + +config_path = Path(__file__).resolve().parent / "conf" + +@dataclass +class OverrideConfig(FairseqDataclass): + noise_wav: Optional[str] = field(default=None, metadata={'help': 'noise wav file'}) + noise_prob: float = field(default=0, metadata={'help': 'noise probability'}) + noise_snr: float = field(default=0, metadata={'help': 'noise SNR in audio'}) + modalities: List[str] = field(default_factory=lambda: [""], metadata={'help': 'which modality to use'}) + data: Optional[str] = field(default=None, metadata={'help': 'path to test data directory'}) + label_dir: Optional[str] = field(default=None, metadata={'help': 'path 
to test label directory'}) + +@dataclass +class InferConfig(FairseqDataclass): + task: Any = None + generation: GenerationConfig = GenerationConfig() + common: CommonConfig = CommonConfig() + common_eval: CommonEvalConfig = CommonEvalConfig() + checkpoint: CheckpointConfig = CheckpointConfig() + distributed_training: DistributedTrainingConfig = DistributedTrainingConfig() + dataset: DatasetConfig = DatasetConfig() + override: OverrideConfig = OverrideConfig() + is_ax: bool = field( + default=False, + metadata={ + "help": "if true, assumes we are using ax for tuning and returns a tuple for ax to consume" + }, + ) + + +def main(cfg: DictConfig): + + if isinstance(cfg, Namespace): + cfg = convert_namespace_to_omegaconf(cfg) + + assert cfg.common_eval.path is not None, "--path required for recognition!" + assert ( + not cfg.generation.sampling or cfg.generation.nbest == cfg.generation.beam + ), "--sampling requires --nbest to be equal to --beam" + + if cfg.common_eval.results_path is not None: + os.makedirs(cfg.common_eval.results_path, exist_ok=True) + output_path = os.path.join(cfg.common_eval.results_path, "decode.log") + with open(output_path, "w", buffering=1, encoding="utf-8") as h: + return _main(cfg, h) + return _main(cfg, sys.stdout) + + +def get_symbols_to_strip_from_output(generator): + if hasattr(generator, "symbols_to_strip_from_output"): + return generator.symbols_to_strip_from_output + else: + return {generator.eos, generator.pad} + +def _main(cfg, output_file): + logging.basicConfig( + format="%(asctime)s | %(levelname)s | %(name)s | %(message)s", + datefmt="%Y-%m-%d %H:%M:%S", + level=os.environ.get("LOGLEVEL", "INFO").upper(), + stream=output_file, + ) + logger = logging.getLogger("hybrid.speech_recognize") + if output_file is not sys.stdout: # also print to stdout + logger.addHandler(logging.StreamHandler(sys.stdout)) + + utils.import_user_module(cfg.common) + models, saved_cfg, task = checkpoint_utils.load_model_ensemble_and_task([cfg.common_eval.path]) + models = [model.eval().cuda() for model in models] + saved_cfg.task.modalities = cfg.override.modalities + task = tasks.setup_task(saved_cfg.task) + + task.build_tokenizer(saved_cfg.tokenizer) + task.build_bpe(saved_cfg.bpe) + + logger.info(cfg) + + # Fix seed for stochastic decoding + if cfg.common.seed is not None and not cfg.generation.no_seed_provided: + np.random.seed(cfg.common.seed) + utils.set_torch_seed(cfg.common.seed) + + use_cuda = torch.cuda.is_available() + + # Set dictionary + dictionary = task.target_dictionary + + # loading the dataset should happen after the checkpoint has been loaded so we can give it the saved task config + task.cfg.noise_prob = cfg.override.noise_prob + task.cfg.noise_snr = cfg.override.noise_snr + task.cfg.noise_wav = cfg.override.noise_wav + if cfg.override.data is not None: + task.cfg.data = cfg.override.data + if cfg.override.label_dir is not None: + task.cfg.label_dir = cfg.override.label_dir + task.load_dataset(cfg.dataset.gen_subset, task_cfg=saved_cfg.task) + + lms = [None] + + # Optimize ensemble for generation + for model in chain(models, lms): + if model is None: + continue + if cfg.common.fp16: + model.half() + if use_cuda and not cfg.distributed_training.pipeline_model_parallel: + model.cuda() + model.prepare_for_inference_(cfg) + + # Load dataset (possibly sharded) + itr = task.get_batch_iterator( + dataset=task.dataset(cfg.dataset.gen_subset), + max_tokens=cfg.dataset.max_tokens, + max_sentences=cfg.dataset.batch_size, + max_positions=utils.resolve_max_positions( + 
task.max_positions(), *[m.max_positions() for m in models] + ), + ignore_invalid_inputs=cfg.dataset.skip_invalid_size_inputs_valid_test, + required_batch_size_multiple=cfg.dataset.required_batch_size_multiple, + seed=cfg.common.seed, + num_shards=cfg.distributed_training.distributed_world_size, + shard_id=cfg.distributed_training.distributed_rank, + num_workers=cfg.dataset.num_workers, + data_buffer_size=cfg.dataset.data_buffer_size, + ).next_epoch_itr(shuffle=False) + progress = progress_bar.progress_bar( + itr, + log_format=cfg.common.log_format, + log_interval=cfg.common.log_interval, + default_log_format=("tqdm" if not cfg.common.no_progress_bar else "simple"), + ) + + # Initialize generator + if cfg.generation.match_source_len: + logger.warning( + "The option match_source_len is not applicable to speech recognition. Ignoring it." + ) + gen_timer = StopwatchMeter() + extra_gen_cls_kwargs = { + "lm_model": lms[0], + "lm_weight": cfg.generation.lm_weight, + } + cfg.generation.score_reference = False # + save_attention_plot = cfg.generation.print_alignment is not None + cfg.generation.print_alignment = None # + generator = task.build_generator( + models, cfg.generation, extra_gen_cls_kwargs=extra_gen_cls_kwargs + ) + + def decode_fn(x): + symbols_ignore = get_symbols_to_strip_from_output(generator) + symbols_ignore.add(dictionary.pad()) + if hasattr(task.datasets[cfg.dataset.gen_subset].label_processors[0], 'decode'): + return task.datasets[cfg.dataset.gen_subset].label_processors[0].decode(x, symbols_ignore) + chars = dictionary.string(x, extra_symbols_to_ignore=symbols_ignore) + words = " ".join("".join(chars.split()).replace('|', ' ').split()) + return words + + num_sentences = 0 + has_target = True + wps_meter = TimeMeter() + result_dict = {'utt_id': [], 'ref': [], 'hypo': []} + for sample in progress: + sample = utils.move_to_cuda(sample) if use_cuda else sample + if "net_input" not in sample: + continue + + prefix_tokens = None + if cfg.generation.prefix_size > 0: + prefix_tokens = sample["target"][:, : cfg.generation.prefix_size] + + constraints = None + if "constraints" in sample: + constraints = sample["constraints"] + + gen_timer.start() + hypos = task.inference_step( + generator, + models, + sample, + prefix_tokens=prefix_tokens, + constraints=constraints, + ) + num_generated_tokens = sum(len(h[0]["tokens"]) for h in hypos) + gen_timer.stop(num_generated_tokens) + + for i in range(len(sample["id"])): + result_dict['utt_id'].append(sample['utt_id'][i]) + ref_sent = decode_fn(sample['target'][i].int().cpu()) + result_dict['ref'].append(ref_sent) + best_hypo = hypos[i][0]['tokens'].int().cpu() + hypo_str = decode_fn(best_hypo) + result_dict['hypo'].append(hypo_str) + logger.info(f"\nREF:{ref_sent}\nHYP:{hypo_str}\n") + wps_meter.update(num_generated_tokens) + progress.log({"wps": round(wps_meter.avg)}) + num_sentences += sample["nsentences"] if "nsentences" in sample else sample["id"].numel() + + logger.info("NOTE: hypothesis and token scores are output in base 2") + logger.info("Recognized {:,} utterances ({} tokens) in {:.1f}s ({:.2f} sentences/s, {:.2f} tokens/s)".format( + num_sentences, gen_timer.n, gen_timer.sum, num_sentences / gen_timer.sum, 1. 
/ gen_timer.avg)) + + yaml_str = OmegaConf.to_yaml(cfg.generation) + fid = int(hashlib.md5(yaml_str.encode("utf-8")).hexdigest(), 16) + fid = fid % 1000000 + result_fn = f"{cfg.common_eval.results_path}/hypo-{fid}.json" + json.dump(result_dict, open(result_fn, 'w'), indent=4) + n_err, n_total = 0, 0 + assert len(result_dict['hypo']) == len(result_dict['ref']) + for hypo, ref in zip(result_dict['hypo'], result_dict['ref']): + hypo, ref = hypo.strip().split(), ref.strip().split() + n_err += editdistance.eval(hypo, ref) + n_total += len(ref) + wer = 100 * n_err / n_total + wer_fn = f"{cfg.common_eval.results_path}/wer.{fid}" + with open(wer_fn, "w") as fo: + fo.write(f"WER: {wer}\n") + fo.write(f"err / num_ref_words = {n_err} / {n_total}\n\n") + fo.write(f"{yaml_str}") + logger.info(f"WER: {wer}%") + return + + +@hydra.main(config_path=config_path, config_name="infer") +def hydra_main(cfg: InferConfig) -> Union[float, Tuple[float, Optional[float]]]: + container = OmegaConf.to_container(cfg, resolve=True, enum_to_str=True) + cfg = OmegaConf.create(container) + OmegaConf.set_struct(cfg, True) + + if cfg.common.reset_logging: + reset_logging() + + wer = float("inf") + + try: + if cfg.common.profile: + with torch.cuda.profiler.profile(): + with torch.autograd.profiler.emit_nvtx(): + distributed_utils.call_main(cfg, main) + else: + distributed_utils.call_main(cfg, main) + + except BaseException as e: # pylint: disable=broad-except + if not cfg.common.suppress_crashes: + raise + else: + logger.error("Crashed! %s", str(e)) + return + + +def cli_main() -> None: + try: + from hydra._internal.utils import ( + get_args, + ) # pylint: disable=import-outside-toplevel + + cfg_name = get_args().config_name or "infer" + except ImportError: + logger.warning("Failed to get config name from hydra args") + cfg_name = "infer" + + cs = ConfigStore.instance() + cs.store(name=cfg_name, node=InferConfig) + + for k in InferConfig.__dataclass_fields__: + if is_dataclass(InferConfig.__dataclass_fields__[k].type): + v = InferConfig.__dataclass_fields__[k].default + cs.store(name=k, node=v) + + hydra_main() # pylint: disable=no-value-for-parameter + + +if __name__ == "__main__": + cli_main() diff --git a/VATLM/vat_hubert/vathubert/models/decoder.py b/VATLM/vat_hubert/vathubert/models/decoder.py new file mode 100644 index 0000000000000000000000000000000000000000..481842b2210229c2c41a279f8febdce65027fb62 --- /dev/null +++ b/VATLM/vat_hubert/vathubert/models/decoder.py @@ -0,0 +1,246 @@ +# ---------------------------------------------------------------------------- +# VatLM: Visual-Audio-Text Pre-Training with Unified Masked Prediction for Speech Representation Learning +# Github source: https://github.com/microsoft/SpeechT5/tree/main/VATLM +# Code based on fairseq: https://github.com/facebookresearch/fairseq and av_hubert: https://github.com/facebookresearch/av_hubert +# +# Copyright (c) 2022 Microsoft +# Licensed under The MIT License [see LICENSE for details] +# ---------------------------------------------------------------------------- + +from argparse import Namespace +import contextlib +import copy +import math +import numpy as np +import torch +import torch.nn as nn +import torch.nn.functional as F +from dataclasses import dataclass, field +from omegaconf import MISSING, II, open_dict +from typing import Any, Optional + +from fairseq import checkpoint_utils, tasks, utils +from fairseq.dataclass import FairseqDataclass +from fairseq.dataclass.utils import convert_namespace_to_omegaconf +from fairseq.tasks import 
FairseqTask +from fairseq.models import ( + BaseFairseqModel, + FairseqEncoder, + FairseqEncoderDecoderModel, + FairseqIncrementalDecoder, + register_model, +) +# from fairseq.models.wav2vec.wav2vec2 import MASKING_DISTRIBUTION_CHOICES +from fairseq.modules import ( + LayerNorm, + PositionalEmbedding, + TransformerDecoderLayer, +) + + +class TransformerDecoder(FairseqIncrementalDecoder): + """ + Transformer decoder consisting of *args.decoder_layers* layers. Each layer + is a :class:`TransformerDecoderLayer`. + + Args: + args (argparse.Namespace): parsed command-line arguments + dictionary (~fairseq.data.Dictionary): decoding dictionary + embed_tokens (torch.nn.Embedding): output embedding + no_encoder_attn (bool, optional): whether to attend to encoder outputs + (default: False). + """ + + def __init__( + self, + cfg, + dictionary, + embed_tokens, + no_encoder_attn=False, + ): + super().__init__(dictionary) + + self.dropout = cfg.decoder_dropout + self.share_input_output_embed = cfg.share_decoder_input_output_embed + + input_embed_dim = embed_tokens.embedding_dim + embed_dim = cfg.decoder_embed_dim + self.output_embed_dim = cfg.decoder_embed_dim + + self.layerdrop = cfg.decoder_layerdrop + + padding_idx = embed_tokens.padding_idx + self.max_target_positions = cfg.max_target_positions + + self.embed_tokens = embed_tokens + # self.embed_scale = math.sqrt(embed_dim) # todo: try with input_embed_dim + self.embed_scale = 1.0 if cfg.no_scale_embedding else math.sqrt(embed_dim) + + self.project_in_dim = ( + Linear(input_embed_dim, embed_dim, bias=False) + if embed_dim != input_embed_dim + else None + ) + + self.embed_positions = ( + PositionalEmbedding( + cfg.max_target_positions, + embed_dim, + padding_idx, + learned=cfg.decoder_learned_pos, + ) + if not cfg.no_token_positional_embeddings + else None + ) + + # TODO: update this when transformer gets converted to dataclass configs + transformer_cfg = copy.deepcopy(cfg) + # with open_dict(transformer_cfg): + transformer_cfg.dropout = transformer_cfg.decoder_dropout + transformer_cfg.attention_dropout = ( + transformer_cfg.decoder_attention_dropout + ) + transformer_cfg.activation_dropout = ( + transformer_cfg.decoder_activation_dropout + ) + + self.layers = nn.ModuleList([]) + self.layers.extend( + [ + TransformerDecoderLayer(transformer_cfg, no_encoder_attn) + for _ in range(transformer_cfg.decoder_layers) + ] + ) + + if not self.share_input_output_embed: + self.embed_out = nn.Parameter( + torch.Tensor(len(dictionary), self.output_embed_dim) + ) + nn.init.normal_(self.embed_out, mean=0, std=self.output_embed_dim ** -0.5) + + if transformer_cfg.decoder_normalize_before: + self.layer_norm = LayerNorm(embed_dim) + else: + self.layer_norm = None + + def forward( + self, prev_output_tokens, encoder_out=None, incremental_state=None, **unused + ): + """ + Args: + prev_output_tokens (LongTensor): previous decoder outputs of shape + `(batch, tgt_len)`, for teacher forcing + encoder_out (Tensor, optional): output from the encoder, used for + encoder-side attention + incremental_state (dict): dictionary used for storing state during + :ref:`Incremental decoding` + + Returns: + tuple: + - the decoder's output of shape `(batch, tgt_len, vocab)` + - a dictionary with any model-specific outputs + """ + prev_output_tokens = prev_output_tokens.long() + x, extra = self.extract_features( + prev_output_tokens, encoder_out, incremental_state + ) + x = self.output_layer(x) + return x, extra + + def extract_features( + self, prev_output_tokens, encoder_out=None, 
incremental_state=None, **unused + ): + """ + Similar to *forward* but only return features. + + Returns: + tuple: + - the decoder's features of shape `(batch, tgt_len, embed_dim)` + - a dictionary with any model-specific outputs + """ + + # embed positions + positions = ( + self.embed_positions( + prev_output_tokens, incremental_state=incremental_state + ) + if self.embed_positions is not None + else None + ) + + if incremental_state is not None: + prev_output_tokens = prev_output_tokens[:, -1:] + if positions is not None: + positions = positions[:, -1:] + + # embed tokens and positions + x = self.embed_scale * self.embed_tokens(prev_output_tokens) + + if self.project_in_dim is not None: + x = self.project_in_dim(x) + + if positions is not None: + x += positions + x = F.dropout(x, p=self.dropout, training=self.training) + + # B x T x C -> T x B x C + x = x.transpose(0, 1) + attn = None + + inner_states = [x] + + # decoder layers + for layer in self.layers: + dropout_probability = np.random.random() + if not self.training or (dropout_probability > self.layerdrop): + x, attn, _ = layer( + x, + encoder_out["encoder_out"] if encoder_out is not None else None, + encoder_out["padding_mask"] if encoder_out is not None else None, + incremental_state, + self_attn_mask=self.buffered_future_mask(x) + if incremental_state is None + else None, + ) + inner_states.append(x) + + if self.layer_norm: + x = self.layer_norm(x) + + # T x B x C -> B x T x C + x = x.transpose(0, 1) + + return x, {"attn": attn, "inner_states": inner_states} + + def output_layer(self, features, **kwargs): + """Project features to the vocabulary size.""" + # project back to size of vocabulary + emb_mat = self.embed_tokens.weight if self.share_input_output_embed else self.embed_out + return torch.matmul(features, emb_mat.transpose(0, 1)) + # if self.share_input_output_embed: + # return F.linear(features, self.embed_tokens.weight) + # else: + # return F.linear(features, self.embed_out) + + def max_positions(self): + """Maximum output length supported by the decoder.""" + if self.embed_positions is None: + return self.max_target_positions + return min(self.max_target_positions, self.embed_positions.max_positions) + + def buffered_future_mask(self, tensor): + dim = tensor.size(0) + if ( + not hasattr(self, "_future_mask") + or self._future_mask is None + or self._future_mask.device != tensor.device + or self._future_mask.size(0) < dim + ): + self._future_mask = torch.triu( + utils.fill_with_neg_inf(tensor.new(dim, dim)), 1 + ) + return self._future_mask[:dim, :dim] + + def upgrade_state_dict_named(self, state_dict, name): + return state_dict + diff --git a/VATLM/vat_hubert/vathubert/models/resnet.py b/VATLM/vat_hubert/vathubert/models/resnet.py new file mode 100644 index 0000000000000000000000000000000000000000..4e9436f531713a2f1cb26b38e148e0e66d3f3877 --- /dev/null +++ b/VATLM/vat_hubert/vathubert/models/resnet.py @@ -0,0 +1,172 @@ +# ---------------------------------------------------------------------------- +# VatLM: Visual-Audio-Text Pre-Training with Unified Masked Prediction for Speech Representation Learning +# Github source: https://github.com/microsoft/SpeechT5/tree/main/VATLM +# Code based on fairseq: https://github.com/facebookresearch/fairseq and av_hubert: https://github.com/facebookresearch/av_hubert +# +# Copyright (c) 2022 Microsoft +# Licensed under The MIT License [see LICENSE for details] +# ---------------------------------------------------------------------------- + +import logging +import math +import torch.nn 
as nn +import pdb + + +logger = logging.getLogger(__name__) + +def conv3x3(in_planes, out_planes, stride=1): + return nn.Conv2d(in_planes, out_planes, kernel_size=3, stride=stride, + padding=1, bias=False) + + +def downsample_basic_block( inplanes, outplanes, stride ): + return nn.Sequential( + nn.Conv2d(inplanes, outplanes, kernel_size=1, stride=stride, bias=False), + nn.BatchNorm2d(outplanes), + ) + +def downsample_basic_block_v2( inplanes, outplanes, stride ): + return nn.Sequential( + nn.AvgPool2d(kernel_size=stride, stride=stride, ceil_mode=True, count_include_pad=False), + nn.Conv2d(inplanes, outplanes, kernel_size=1, stride=1, bias=False), + nn.BatchNorm2d(outplanes), + ) + + + +class BasicBlock(nn.Module): + expansion = 1 + + def __init__(self, inplanes, planes, stride=1, downsample=None, relu_type = 'relu' ): + super(BasicBlock, self).__init__() + + assert relu_type in ['relu','prelu'] + + self.conv1 = conv3x3(inplanes, planes, stride) + self.bn1 = nn.BatchNorm2d(planes) + + if relu_type == 'relu': + self.relu1 = nn.ReLU(inplace=True) + self.relu2 = nn.ReLU(inplace=True) + elif relu_type == 'prelu': + self.relu1 = nn.PReLU(num_parameters=planes) + self.relu2 = nn.PReLU(num_parameters=planes) + else: + raise Exception('relu type not implemented') + + self.conv2 = conv3x3(planes, planes) + self.bn2 = nn.BatchNorm2d(planes) + + self.downsample = downsample + self.stride = stride + + def forward(self, x): + residual = x + out = self.conv1(x) + out = self.bn1(out) + out = self.relu1(out) + out = self.conv2(out) + out = self.bn2(out) + if self.downsample is not None: + residual = self.downsample(x) + + out += residual + out = self.relu2(out) + + return out + + +class ResNet(nn.Module): + + def __init__(self, block, layers, num_classes=1000, relu_type = 'relu', gamma_zero = False, avg_pool_downsample = False): + self.inplanes = 64 + self.relu_type = relu_type + self.gamma_zero = gamma_zero + self.downsample_block = downsample_basic_block_v2 if avg_pool_downsample else downsample_basic_block + + super(ResNet, self).__init__() + self.layer1 = self._make_layer(block, 64, layers[0]) + self.layer2 = self._make_layer(block, 128, layers[1], stride=2) + self.layer3 = self._make_layer(block, 256, layers[2], stride=2) + self.layer4 = self._make_layer(block, 512, layers[3], stride=2) + self.avgpool = nn.AdaptiveAvgPool2d(1) + + for m in self.modules(): + if isinstance(m, nn.Conv2d): + n = m.kernel_size[0] * m.kernel_size[1] * m.out_channels + m.weight.data.normal_(0, math.sqrt(2. 
/ n)) + elif isinstance(m, nn.BatchNorm2d): + m.weight.data.fill_(1) + m.bias.data.zero_() + + if self.gamma_zero: + for m in self.modules(): + if isinstance(m, BasicBlock ): + m.bn2.weight.data.zero_() + + def _make_layer(self, block, planes, blocks, stride=1): + + + downsample = None + if stride != 1 or self.inplanes != planes * block.expansion: + downsample = self.downsample_block( inplanes = self.inplanes, + outplanes = planes * block.expansion, + stride = stride ) + + layers = [] + layers.append(block(self.inplanes, planes, stride, downsample, relu_type = self.relu_type)) + self.inplanes = planes * block.expansion + for i in range(1, blocks): + layers.append(block(self.inplanes, planes, relu_type = self.relu_type)) + + return nn.Sequential(*layers) + + def forward(self, x): + x = self.layer1(x) + x = self.layer2(x) + x = self.layer3(x) + x = self.layer4(x) + x = self.avgpool(x) + x = x.view(x.size(0), -1) + return x + +class ResEncoder(nn.Module): + def __init__(self, relu_type, weights): + super(ResEncoder, self).__init__() + self.frontend_nout = 64 + self.backend_out = 512 + frontend_relu = nn.PReLU(num_parameters=self.frontend_nout) if relu_type == 'prelu' else nn.ReLU() + self.frontend3D = nn.Sequential( + nn.Conv3d(1, self.frontend_nout, kernel_size=(5, 7, 7), stride=(1, 2, 2), padding=(2, 3, 3), bias=False), + nn.BatchNorm3d(self.frontend_nout), + frontend_relu, + nn.MaxPool3d( kernel_size=(1, 3, 3), stride=(1, 2, 2), padding=(0, 1, 1))) + self.trunk = ResNet(BasicBlock, [2, 2, 2, 2], relu_type=relu_type) + if weights is not None: + logger.info(f"Load {weights} for resnet") + std = torch.load(weights, map_location=torch.device('cpu'))['model_state_dict'] + frontend_std, trunk_std = OrderedDict(), OrderedDict() + for key, val in std.items(): + new_key = '.'.join(key.split('.')[1:]) + if 'frontend3D' in key: + frontend_std[new_key] = val + if 'trunk' in key: + trunk_std[new_key] = val + self.frontend3D.load_state_dict(frontend_std) + self.trunk.load_state_dict(trunk_std) + + def forward(self, x): + B, C, T, H, W = x.size() + x = self.frontend3D(x) + Tnew = x.shape[2] + x = self.threeD_to_2D_tensor(x) + x = self.trunk(x) + x = x.view(B, Tnew, x.size(1)) + x = x.transpose(1, 2).contiguous() + return x + + def threeD_to_2D_tensor(self, x): + n_batch, n_channels, s_time, sx, sy = x.shape + x = x.transpose(1, 2).contiguous() + return x.reshape(n_batch*s_time, n_channels, sx, sy) diff --git a/VATLM/vat_hubert/vathubert/models/utils.py b/VATLM/vat_hubert/vathubert/models/utils.py new file mode 100644 index 0000000000000000000000000000000000000000..b04a5e67f99e20d1f26a8d9377a8de85188aa425 --- /dev/null +++ b/VATLM/vat_hubert/vathubert/models/utils.py @@ -0,0 +1,301 @@ +# ---------------------------------------------------------------------------- +# VatLM: Visual-Audio-Text Pre-Training with Unified Masked Prediction for Speech Representation Learning +# Github source: https://github.com/microsoft/SpeechT5/tree/main/VATLM +# Code based on fairseq: https://github.com/facebookresearch/fairseq and av_hubert: https://github.com/facebookresearch/av_hubert +# +# Copyright (c) 2022 Microsoft +# Licensed under The MIT License [see LICENSE for details] +# ---------------------------------------------------------------------------- + +import cv2 +import torch +import random +import numpy as np +from typing import Dict, List, Optional, Tuple + +def load_video(path): + for i in range(3): + try: + cap = cv2.VideoCapture(path) + frames = [] + while True: + ret, frame = cap.read() + if ret: + frame = 
cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY) + frames.append(frame) + else: + break + frames = np.stack(frames) + return frames + except Exception: + print(f"failed loading {path} ({i} / 3)") + if i == 2: + raise ValueError(f"Unable to load {path}") + + +class Compose(object): + """Compose several preprocess together. + Args: + preprocess (list of ``Preprocess`` objects): list of preprocess to compose. + """ + + def __init__(self, preprocess): + self.preprocess = preprocess + + def __call__(self, sample): + for t in self.preprocess: + sample = t(sample) + return sample + + def __repr__(self): + format_string = self.__class__.__name__ + '(' + for t in self.preprocess: + format_string += '\n' + format_string += ' {0}'.format(t) + format_string += '\n)' + return format_string + + +class Normalize(object): + """Normalize a ndarray image with mean and standard deviation. + """ + + def __init__(self, mean, std): + self.mean = mean + self.std = std + + def __call__(self, frames): + """ + Args: + tensor (Tensor): Tensor image of size (C, H, W) to be normalized. + Returns: + Tensor: Normalized Tensor image. + """ + frames = (frames - self.mean) / self.std + return frames + + def __repr__(self): + return self.__class__.__name__+'(mean={0}, std={1})'.format(self.mean, self.std) + +class CenterCrop(object): + """Crop the given image at the center + """ + def __init__(self, size): + self.size = size + + def __call__(self, frames): + """ + Args: + img (numpy.ndarray): Images to be cropped. + Returns: + numpy.ndarray: Cropped image. + """ + t, h, w = frames.shape + th, tw = self.size + delta_w = int(round((w - tw))/2.) + delta_h = int(round((h - th))/2.) + frames = frames[:, delta_h:delta_h+th, delta_w:delta_w+tw] + return frames + + +class RandomCrop(object): + """Crop the given image at the center + """ + + def __init__(self, size): + self.size = size + + def __call__(self, frames): + """ + Args: + img (numpy.ndarray): Images to be cropped. + Returns: + numpy.ndarray: Cropped image. + """ + t, h, w = frames.shape + th, tw = self.size + delta_w = random.randint(0, w-tw) + delta_h = random.randint(0, h-th) + frames = frames[:, delta_h:delta_h+th, delta_w:delta_w+tw] + return frames + + def __repr__(self): + return self.__class__.__name__ + '(size={0})'.format(self.size) + +class HorizontalFlip(object): + """Flip image horizontally. + """ + + def __init__(self, flip_ratio): + self.flip_ratio = flip_ratio + + def __call__(self, frames): + """ + Args: + img (numpy.ndarray): Images to be flipped with a probability flip_ratio + Returns: + numpy.ndarray: Cropped image. + """ + t, h, w = frames.shape + if random.random() < self.flip_ratio: + for index in range(t): + frames[index] = cv2.flip(frames[index], 1) + return frames + +def compute_mask_indices( + shape: Tuple[int, int], + padding_mask: Optional[torch.Tensor], + mask_prob: float, + mask_length: int, + mask_type: str = "static", + mask_other: float = 0.0, + min_masks: int = 0, + no_overlap: bool = False, + min_space: int = 0, +) -> np.ndarray: + """ + Computes random mask spans for a given shape + Args: + shape: the the shape for which to compute masks. + should be of size 2 where first element is batch size and 2nd is timesteps + padding_mask: optional padding mask of the same size as shape, which will prevent masking padded elements + mask_prob: probability for each token to be chosen as start of the span to be masked. this will be multiplied by + number of timesteps divided by length of mask span to mask approximately this percentage of all elements. 
+ however due to overlaps, the actual number will be smaller (unless no_overlap is True) + mask_type: how to compute mask lengths + static = fixed size + uniform = sample from uniform distribution [mask_other, mask_length*2] + normal = sample from normal distribution with mean mask_length and stdev mask_other. mask is min 1 element + poisson = sample from possion distribution with lambda = mask length + min_masks: minimum number of masked spans + no_overlap: if false, will switch to an alternative recursive algorithm that prevents spans from overlapping + min_space: only used if no_overlap is True, this is how many elements to keep unmasked between spans + """ + + bsz, all_sz = shape + mask = np.full((bsz, all_sz), False) + + all_num_mask = int( + # add a random number for probabilistic rounding + mask_prob * all_sz / float(mask_length) + + np.random.rand() + ) + + all_num_mask = max(min_masks, all_num_mask) + + mask_idcs = [] + for i in range(bsz): + if padding_mask is not None: + sz = all_sz - padding_mask[i].long().sum().item() + num_mask = int( + # add a random number for probabilistic rounding + mask_prob * sz / float(mask_length) + + np.random.rand() + ) + num_mask = max(min_masks, num_mask) + else: + sz = all_sz + num_mask = all_num_mask + + if mask_type == "static": + lengths = np.full(num_mask, mask_length) + elif mask_type == "uniform": + lengths = np.random.randint(mask_other, mask_length * 2 + 1, size=num_mask) + elif mask_type == "normal": + lengths = np.random.normal(mask_length, mask_other, size=num_mask) + lengths = [max(1, int(round(x))) for x in lengths] + elif mask_type == "poisson": + lengths = np.random.poisson(mask_length, size=num_mask) + lengths = [int(round(x)) for x in lengths] + else: + raise Exception("unknown mask selection " + mask_type) + + if sum(lengths) == 0: + lengths[0] = min(mask_length, sz - 1) + + if no_overlap: + mask_idc = [] + + def arrange(s, e, length, keep_length): + span_start = np.random.randint(s, e - length) + mask_idc.extend(span_start + i for i in range(length)) + + new_parts = [] + if span_start - s - min_space >= keep_length: + new_parts.append((s, span_start - min_space + 1)) + if e - span_start - keep_length - min_space > keep_length: + new_parts.append((span_start + length + min_space, e)) + return new_parts + + parts = [(0, sz)] + min_length = min(lengths) + for length in sorted(lengths, reverse=True): + lens = np.fromiter( + (e - s if e - s >= length + min_space else 0 for s, e in parts), + np.int, + ) + l_sum = np.sum(lens) + if l_sum == 0: + break + probs = lens / np.sum(lens) + c = np.random.choice(len(parts), p=probs) + s, e = parts.pop(c) + parts.extend(arrange(s, e, length, min_length)) + mask_idc = np.asarray(mask_idc) + else: + min_len = min(lengths) + if sz - min_len <= num_mask: + min_len = sz - num_mask - 1 + + mask_idc = np.random.choice(sz - min_len, num_mask, replace=False) + + mask_idc = np.asarray( + [ + mask_idc[j] + offset + for j in range(len(mask_idc)) + for offset in range(lengths[j]) + ] + ) + + mask_idcs.append(np.unique(mask_idc[mask_idc < sz])) + + min_len = min([len(m) for m in mask_idcs]) + batch_indexes, starts, ends = [], [], [] + for i, mask_idc in enumerate(mask_idcs): + if len(mask_idc) > min_len: + mask_idc = np.random.choice(mask_idc, min_len, replace=False) + mask[i, mask_idc] = True + vals, run_starts, run_lengths = find_runs(mask[i]) + start_indices, lengths = run_starts[vals == True], run_lengths[vals == True] + starts.append(start_indices) + ends.append(start_indices+lengths) + 
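+ # find_runs (defined below) returns (values, run_starts, run_lengths) for
+ # this boolean mask row; keeping only the runs whose value is True gives the
+ # start/end frame of every masked span. These, plus the row index appended
+ # next, are concatenated across the batch and returned alongside the mask.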
batch_indexes.append(np.zeros([len(start_indices)])+i) + return mask, np.concatenate(starts).astype(np.int64), np.concatenate(ends).astype(np.int64), np.concatenate(batch_indexes).astype(np.int64) + +def find_runs(x): + """Find runs of consecutive items in an array.""" + + # ensure array + x = np.asanyarray(x) + if x.ndim != 1: + raise ValueError('only 1D array supported') + n = x.shape[0] + + # handle empty array + if n == 0: + return np.array([]), np.array([]), np.array([]) + + else: + # find run starts + loc_run_start = np.empty(n, dtype=bool) + loc_run_start[0] = True + np.not_equal(x[:-1], x[1:], out=loc_run_start[1:]) + run_starts = np.nonzero(loc_run_start)[0] + + # find run values + run_values = x[loc_run_start] + + # find run lengths + run_lengths = np.diff(np.append(run_starts, n)) + + return run_values, run_starts, run_lengths diff --git a/VATLM/vat_hubert/vathubert/models/vathubert.py b/VATLM/vat_hubert/vathubert/models/vathubert.py new file mode 100644 index 0000000000000000000000000000000000000000..a172b4a87d511e1b8bc02a5f6f3ffe08807cb973 --- /dev/null +++ b/VATLM/vat_hubert/vathubert/models/vathubert.py @@ -0,0 +1,851 @@ +# ---------------------------------------------------------------------------- +# VatLM: Visual-Audio-Text Pre-Training with Unified Masked Prediction for Speech Representation Learning +# Github source: https://github.com/microsoft/SpeechT5/tree/main/VATLM +# Code based on fairseq: https://github.com/facebookresearch/fairseq and av_hubert: https://github.com/facebookresearch/av_hubert +# +# Copyright (c) 2022 Microsoft +# Licensed under The MIT License [see LICENSE for details] +# ---------------------------------------------------------------------------- + +import os,sys +import logging +from typing import Dict, List, Optional, Tuple + +import numpy as np + +import torch +import torch.nn as nn +from dataclasses import dataclass, field +from fairseq import utils +from fairseq.data.data_utils import compute_mask_indices +from fairseq.data.dictionary import Dictionary +from fairseq.dataclass import ChoiceEnum, FairseqDataclass +from fairseq.models import BaseFairseqModel, register_model +from fairseq.models.wav2vec.wav2vec2 import ( + ConvFeatureExtractionModel, + TransformerEncoder, +) +from fairseq.modules import GradMultiply, LayerNorm +from copy import deepcopy + +DBG=True if len(sys.argv) == 1 else False + +if DBG: + from vathubert.tasks.vathubert_pretraining import ( + VATHubertPretrainingConfig, + VATHubertPretrainingTask, + ) + from resnet import ResEncoder + logging.basicConfig( + format="%(asctime)s | %(levelname)s | %(name)s | %(message)s", + datefmt="%Y-%m-%d %H:%M:%S", + level=os.environ.get("LOGLEVEL", "INFO").upper(), + stream=sys.stdout, + ) + from utils import compute_mask_indices + from decoder import TransformerDecoder + +else: + from vathubert.tasks.vathubert_pretraining import ( + VATHubertPretrainingConfig, + VATHubertPretrainingTask, + ) + from vathubert.models.resnet import ResEncoder + from vathubert.models.utils import compute_mask_indices + from vathubert.models.decoder import TransformerDecoder + +from omegaconf import II + +logger = logging.getLogger(__name__) + +EXTRACTOR_MODE_CHOICES = ChoiceEnum(["default", "layer_norm"]) +MASKING_DISTRIBUTION_CHOICES = ChoiceEnum( + ["static", "uniform", "normal", "poisson"] +) + + +@dataclass +class VATHubertConfig(FairseqDataclass): + label_rate: int = II("task.label_rate") + modalities: str = II("task.modalities") + extractor_mode: EXTRACTOR_MODE_CHOICES = field( + default="default", + 
metadata={ + "help": "mode for feature extractor. default has a single group " + "norm with d groups in the first conv block, whereas layer_norm " + "has layer norms in every block (meant to use with normalize=True)" + }, + ) + encoder_layers: int = field( + default=12, metadata={"help": "num encoder layers in the transformer"} + ) + encoder_embed_dim: int = field( + default=768, metadata={"help": "encoder embedding dimension"} + ) + encoder_ffn_embed_dim: int = field( + default=3072, metadata={"help": "encoder embedding dimension for FFN"} + ) + encoder_attention_heads: int = field( + default=12, metadata={"help": "num encoder attention heads"} + ) + activation_fn: ChoiceEnum(utils.get_available_activation_fns()) = field( + default="gelu", metadata={"help": "activation function to use"} + ) + + # dropouts + dropout: float = field( + default=0.1, + metadata={"help": "dropout probability for the transformer"}, + ) + attention_dropout: float = field( + default=0.1, + metadata={"help": "dropout probability for attention weights"}, + ) + activation_dropout: float = field( + default=0.0, + metadata={"help": "dropout probability after activation in FFN"}, + ) + encoder_layerdrop: float = field( + default=0.0, + metadata={"help": "probability of dropping a tarnsformer layer"}, + ) + dropout_input: float = field( + default=0.0, + metadata={"help": "dropout to apply to the input (after feat extr)"}, + ) + dropout_features: float = field( + default=0.0, + metadata={ + "help": "dropout to apply to the features (after feat extr)" + }, + ) + + final_dim: int = field( + default=0, + metadata={ + "help": "project final representations and targets to this many " + "dimensions. set to encoder_embed_dim is <= 0" + }, + ) + untie_final_proj: bool = field( + default=False, + metadata={"help": "use separate projection for each target"}, + ) + layer_norm_first: bool = field( + default=False, + metadata={"help": "apply layernorm first in the transformer"}, + ) + conv_feature_layers: str = field( + default="[(512,10,5)] + [(512,3,2)] * 4 + [(512,2,2)] * 2", + metadata={ + "help": "string describing convolutional feature extraction " + "layers in form of a python list that contains " + "[(dim, kernel_size, stride), ...]" + }, + ) + conv_bias: bool = field( + default=False, metadata={"help": "include bias in conv encoder"} + ) + logit_temp: float = field( + default=0.1, metadata={"help": "temperature to divide logits by"} + ) + target_glu: bool = field( + default=False, metadata={"help": "adds projection + glu to targets"} + ) + feature_grad_mult: float = field( + default=1.0, + metadata={"help": "multiply feature extractor var grads by this"}, + ) + + # masking + mask_length_audio: int = field(default=10, metadata={"help": "mask length"}) + mask_prob_audio: float = field( + default=0.65, + metadata={"help": "probability of replacing a token with mask"}, + ) + mask_length_image: int = field(default=10, metadata={"help": "mask length"}) + mask_prob_image: float = field( + default=0.65, + metadata={"help": "probability of replacing a token with mask"}, + ) + mask_selection: MASKING_DISTRIBUTION_CHOICES = field( + default="static", metadata={"help": "how to choose mask length"} + ) + mask_other: float = field( + default=0, + metadata={ + "help": "secondary mask argument " + "(used for more complex distributions), " + "see help in compute_mask_indicesh" + }, + ) + no_mask_overlap: bool = field( + default=False, metadata={"help": "whether to allow masks to overlap"} + ) + mask_min_space: int = field( + default=1, + 
metadata={ + "help": "min space between spans (if no overlap is enabled)" + }, + ) + + # channel masking + mask_channel_length: int = field( + default=10, + metadata={"help": "length of the mask for features (channels)"}, + ) + mask_channel_prob: float = field( + default=0.0, + metadata={"help": "probability of replacing a feature with 0"}, + ) + mask_channel_selection: MASKING_DISTRIBUTION_CHOICES = field( + default="static", + metadata={"help": "how to choose mask length for channel masking"}, + ) + mask_channel_other: float = field( + default=0, + metadata={ + "help": "secondary mask argument " + "(used for more complex distributions), " + "see help in compute_mask_indicesh" + }, + ) + no_mask_channel_overlap: bool = field( + default=False, + metadata={"help": "whether to allow channel masks to overlap"}, + ) + mask_channel_min_space: int = field( + default=1, + metadata={ + "help": "min space between spans (if no overlap is enabled)" + }, + ) + + # positional embeddings + conv_pos: int = field( + default=128, + metadata={ + "help": "number of filters for convolutional positional embeddings" + }, + ) + conv_pos_groups: int = field( + default=16, + metadata={ + "help": "number of groups for convolutional positional embedding" + }, + ) + + latent_temp: Tuple[float, float, float] = field( + default=(2, 0.5, 0.999995), + metadata={"help": "legacy (to be removed)"}, + ) + + # loss computation + skip_masked: bool = field( + default=False, + metadata={"help": "skip computing losses over masked frames"}, + ) + skip_nomask: bool = field( + default=False, + metadata={"help": "skip computing losses over unmasked frames"}, + ) + resnet_relu_type: str = field(default='prelu', metadata={"help": 'relu type for resnet'}) + resnet_weights: Optional[str] = field(default=None, metadata={"help": 'resnet weights'}) + sim_type: str = field(default='cosine', metadata={"help": 'similarity type'}) + + sub_encoder_layers: int = field(default=0, metadata={'help': 'number of transformer layers for single modality'}) + audio_feat_dim: int = field(default=-1, metadata={'help': 'audio feature dimension'}) + modality_dropout: float = field(default=0, metadata={'help': 'drop one modality'}) + audio_dropout: float = field(default=0, metadata={'help': 'drop audio feature'}) + modality_fuse: str = field(default='concat', metadata={'help': 'fusing two modalities: add,concat'}) + selection_type : str = field(default='same_other_seq', metadata={'help': 'type of selectig images, same_other_seq: replace masked span with span from another sequence, same_seq: repace masked span with span of the same sequence'}) + masking_type : str = field(default='input', metadata={'help': 'input or feature masking'}) + + decoder_embed_dim: int = field( + default=768, metadata={"help": "decoder embedding dimension"} + ) + decoder_ffn_embed_dim: int = field( + default=3072, metadata={"help": "decoder embedding dimension for FFN"} + ) + decoder_layers: int = field( + default=6, metadata={"help": "num of decoder layers"} + ) + decoder_layerdrop: float = field( + default=0.0, metadata={"help": "decoder layerdrop chance"} + ) + decoder_attention_heads: int = field( + default=4, metadata={"help": "num decoder attention heads"} + ) + decoder_learned_pos: bool = field( + default=False, + metadata={"help": "use learned positional embeddings in the decoder"}, + ) + decoder_normalize_before: bool = field( + default=False, + metadata={"help": "apply layernorm before each decoder block"}, + ) + no_token_positional_embeddings: bool = field( + default=False, 
+ metadata={ + "help": "if set, disables positional embeddings " + "(outside self attention)" + }, + ) + decoder_dropout: float = field( + default=0.1, metadata={"help": "dropout probability in the decoder"} + ) + decoder_attention_dropout: float = field( + default=0.1, + metadata={ + "help": "dropout probability for attention weights " + "inside the decoder" + }, + ) + decoder_activation_dropout: float = field( + default=0.0, + metadata={ + "help": "dropout probability after activation in FFN " + "inside the decoder" + }, + ) + max_target_positions: int = field( + default=2048, metadata={"help": "max target positions"} + ) + share_decoder_input_output_embed: bool = field( + default=False, + metadata={"help": "share decoder input and output embeddings"}, + ) + no_scale_embedding: bool = field(default=True, metadata={'help': 'scale embedding'}) + +class SubModel(nn.Module): + def __init__(self, resnet=None, input_dim=None, cfg=None): + super().__init__() + self.resnet = resnet + self.proj = nn.Linear(input_dim, cfg.encoder_embed_dim) + self.encoder = TransformerEncoder(cfg) if cfg.encoder_layers > 0 else None + + def forward(self, x): + if self.resnet is not None: + x = self.resnet(x) + x = self.proj(x.transpose(1, 2)) + if self.encoder is not None: + x = self.encoder(x)[0].transpose(1, 2) + else: + x = x.transpose(1, 2) + return x + +@register_model("vat_hubert", dataclass=VATHubertConfig) +class VATHubertModel(BaseFairseqModel): + def __init__( + self, + cfg: VATHubertConfig, + task_cfg: VATHubertPretrainingConfig, + dictionaries: List[Dictionary], + **kwargs + ) -> None: + super().__init__() + logger.info(f"HubertModel Config: {cfg}") + + feature_ds_rate = 1 + self.feat2tar_ratio = cfg.label_rate * feature_ds_rate / task_cfg.sample_rate + sub_cfg = deepcopy(cfg) + sub_cfg.encoder_layers = sub_cfg.sub_encoder_layers + resnet = ResEncoder(relu_type=cfg.resnet_relu_type, weights=cfg.resnet_weights) + self.feature_extractor_audio = SubModel(resnet=None, input_dim=cfg.audio_feat_dim, cfg=sub_cfg) + self.feature_extractor_video = SubModel(resnet=resnet, input_dim=resnet.backend_out, cfg=sub_cfg) + self.modality_dropout, self.audio_dropout = cfg.modality_dropout, cfg.audio_dropout + self.modality_fuse = cfg.modality_fuse + self.encoder_embed_dim = cfg.encoder_embed_dim + if self.modality_fuse == 'concat': + self.embed = cfg.encoder_embed_dim * 3 + elif self.modality_fuse == 'add': + self.embed = cfg.encoder_embed_dim + self.post_extract_proj = ( + nn.Linear(self.embed, cfg.encoder_embed_dim) + if self.embed != cfg.encoder_embed_dim + else None + ) + + self.mask_prob_image, self.mask_prob_audio = cfg.mask_prob_image, cfg.mask_prob_audio + self.mask_selection = cfg.mask_selection + self.mask_other = cfg.mask_other + self.mask_length_image, self.mask_length_audio = cfg.mask_length_image, cfg.mask_length_audio + self.no_mask_overlap = cfg.no_mask_overlap + self.mask_min_space = cfg.mask_min_space + + self.mask_channel_prob = cfg.mask_channel_prob + self.mask_channel_selection = cfg.mask_channel_selection + self.mask_channel_other = cfg.mask_channel_other + self.mask_channel_length = cfg.mask_channel_length + self.no_mask_channel_overlap = cfg.no_mask_channel_overlap + self.mask_channel_min_space = cfg.mask_channel_min_space + + self.dropout_input = nn.Dropout(cfg.dropout_input) + self.dropout_features = nn.Dropout(cfg.dropout_features) + + self.feature_grad_mult = cfg.feature_grad_mult + self.logit_temp = cfg.logit_temp + self.skip_masked = cfg.skip_masked + self.skip_nomask = cfg.skip_nomask + 
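+ # sim_type, selection_type and masking_type steer the rest of the model:
+ # compute_logits switches on sim_type ('dot' vs 'cosine'), apply_input_mask
+ # uses selection_type to decide where replacement spans are drawn from, and
+ # masking_type selects between input-level masking (apply_input_mask) and
+ # feature-level masking (apply_feature_mask).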
self.sim_type = cfg.sim_type + self.selection_type = cfg.selection_type + self.masking_type = cfg.masking_type + + final_dim = ( + cfg.final_dim if cfg.final_dim > 0 else cfg.encoder_embed_dim + ) + + self.mask_emb = nn.Parameter( + torch.FloatTensor(cfg.audio_feat_dim).uniform_() if self.masking_type == 'input' else torch.FloatTensor(cfg.encoder_embed_dim).uniform_() + ) + + self.encoder = TransformerEncoder(cfg) + self.layer_norm = LayerNorm(self.embed) + + self.target_glu = None + if cfg.target_glu: + self.target_glu = nn.Sequential( + nn.Linear(final_dim, final_dim * 2), nn.GLU() + ) + + self.untie_final_proj = cfg.untie_final_proj + if self.untie_final_proj: + self.final_proj = nn.Linear( + cfg.encoder_embed_dim, final_dim * len(dictionaries) + ) + else: + self.final_proj = nn.Linear(cfg.encoder_embed_dim, final_dim) + + # modules below are not needed during fine-tuning + if any([d is None for d in dictionaries]): + logger.info( + "cannot find dictionary. assume will be used for fine-tuning" + ) + else: + self.num_classes = [len(d) for d in dictionaries] + self.label_embs_concat = nn.Parameter( + torch.FloatTensor(sum(self.num_classes), final_dim) + ) + nn.init.uniform_(self.label_embs_concat) + + self.phone_embed = nn.Embedding(46, cfg.encoder_embed_dim) + self.phone_conv = nn.Sequential( + nn.Conv1d(in_channels=cfg.encoder_embed_dim, out_channels=cfg.encoder_embed_dim, kernel_size=3, stride=2, padding=1), + nn.ReLU(), + nn.Conv1d(in_channels=cfg.encoder_embed_dim, out_channels=cfg.encoder_embed_dim, kernel_size=3, stride=2, padding=1), + ) + + def upgrade_state_dict_named(self, state_dict, name): + """Upgrade a (possibly old) state dict for new versions of fairseq.""" + + super().upgrade_state_dict_named(state_dict, name) + return state_dict + + @classmethod + def build_model(cls, cfg: VATHubertConfig, task: VATHubertPretrainingTask): + """Build a new model instance.""" + + kwargs = {} + model = VATHubertModel(cfg, task.cfg, task.dictionaries, **kwargs) + return model + + def apply_input_mask(self, x, padding_mask, target_list): + B, C, T = x.shape[:3] + is_audio = True if len(x.shape) == 3 else False + if is_audio: + mask_prob, mask_length = self.mask_prob_audio, self.mask_length_audio + else: + mask_prob, mask_length = self.mask_prob_image, self.mask_length_image + if mask_prob > 0: + + mask_indices, starts, ends, batch_indexes = compute_mask_indices( + (B, T), + padding_mask, + mask_prob, + mask_length, + self.mask_selection, + self.mask_other, + min_masks=2, + no_overlap=self.no_mask_overlap, + min_space=self.mask_min_space, + ) + mask_indices_np = mask_indices + mask_indices = torch.from_numpy(mask_indices).to(x.device) + x = x.transpose(1, 2).contiguous() # [B, T, C, H, W] + if B == 1: + x[mask_indices] = 0 + elif is_audio: + x[mask_indices] = self.mask_emb + elif self.selection_type == 'same_other_seq': + perm = (torch.arange(B) + torch.randint(low=1, high=B, size=(1,))) % B + x_perm = x[perm] + x[mask_indices] = x_perm[mask_indices] + elif self.selection_type == 'same_seq': + batch_indexes_, other_indexes = [], [] + for batch_index, start, end in zip(batch_indexes, starts, ends): + length = end-start + other_start = np.setdiff1d(np.arange(T), np.arange(max(0, start-length), end)) + if len(other_start) > 0: + other_start = np.random.choice(other_start, size=1) + else: + other_start = 0 + other_end = other_start + length + other_indexes.append(np.arange(other_start, other_end).clip(max=T-1)) + batch_indexes_.append(np.zeros([length], dtype=np.int64)+batch_index) + 
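+ # 'same_seq' selection: each masked span is filled with an equally long span
+ # drawn from elsewhere in the same utterance (other_start is sampled outside
+ # the masked region when possible and clipped to the sequence length),
+ # matching the selection_type help text in VATHubertConfig.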
batch_indexes, other_indexes = np.concatenate(batch_indexes_), np.concatenate(other_indexes) + x[mask_indices] = x[batch_indexes, other_indexes] + + x = x.transpose(1, 2).contiguous() + else: + mask_indices = None + + if self.mask_channel_prob > 0: + logger.info(f"No mask channel prob for input masking") + return x, mask_indices + + def apply_feature_mask(self, x, padding_mask, target_list): + B, T, C = x.shape + assert self.mask_prob_audio == self.mask_prob_image and self.mask_length_audio == self.mask_length_image, f"masking prob/length for image/audio be same for feature masking" + mask_prob, mask_length = self.mask_prob_audio, self.mask_length_image + if mask_prob > 0: + mask_indices, _, _, _ = compute_mask_indices( + (B, T), + padding_mask, + mask_prob, + mask_length, + self.mask_selection, + self.mask_other, + min_masks=2, + no_overlap=self.no_mask_overlap, + min_space=self.mask_min_space, + ) + mask_indices = torch.from_numpy(mask_indices).to(x.device) + x[mask_indices] = self.mask_emb + else: + mask_indices = None + + if self.mask_channel_prob > 0: + mask_channel_indices, _, _, _ = compute_mask_indices( + (B, C), + None, + self.mask_channel_prob, + self.mask_channel_length, + self.mask_channel_selection, + self.mask_channel_other, + no_overlap=self.no_mask_channel_overlap, + min_space=self.mask_channel_min_space, + ) + mask_channel_indices = ( + torch.from_numpy(mask_channel_indices) + .to(x.device) + .unsqueeze(1) + .expand(-1, T, -1) + ) + x[mask_channel_indices] = 0 + + return x, mask_indices + + def forward_features(self, source: torch.Tensor, modality: str) -> torch.Tensor: + extractor = eval(f"self.feature_extractor_{modality}") + if self.feature_grad_mult > 0: + features = extractor(source) + if self.feature_grad_mult != 1.0: + features = GradMultiply.apply(features, self.feature_grad_mult) + else: + with torch.no_grad(): + features = extractor(source) + return features + + def forward_targets( + self, features: torch.Tensor, mask_indices: torch.Tensor, target_list: List[torch.Tensor], + ) -> Tuple[torch.Tensor, torch.Tensor]: + # Trim features to ensure labels exist and then get aligned labels + feat_tsz = features.size(2) + targ_tsz = min([t.size(1) for t in target_list]) + if self.feat2tar_ratio * feat_tsz > targ_tsz: + feat_tsz = int(targ_tsz / self.feat2tar_ratio) + features = features[..., :feat_tsz] + if mask_indices is not None: + mask_indices = mask_indices[..., :feat_tsz] + target_inds = torch.arange(feat_tsz).float() * self.feat2tar_ratio + target_list = [t[:, target_inds.long()] for t in target_list] + return features, mask_indices, target_list + + def forward_padding_mask( + self, features: torch.Tensor, padding_mask: torch.Tensor, + ) -> torch.Tensor: + extra = padding_mask.size(1) % features.size(1) + if extra > 0: + padding_mask = padding_mask[:, :-extra] + padding_mask = padding_mask.view( + padding_mask.size(0), features.size(1), -1 + ) + padding_mask = padding_mask.all(-1) + return padding_mask + + def compute_logits(self, feats, emb_mat): + # feats: [B, T, F], emb_mat: [V, F] + if self.sim_type == 'dot': + logits = torch.matmul(feats, emb_mat.transpose(0, 1)) + elif self.sim_type == 'cosine': + batch_size, timesteps, emb_dim = feats.size() + feats_ = feats.view(-1, emb_dim) + nom = (feats_.unsqueeze(dim=1) * emb_mat.unsqueeze(dim=0)).sum(dim=-1) # [B*T, V] + denom = (feats_**2).sum(dim=-1).sqrt().unsqueeze(dim=1) * (emb_mat**2).sum(dim=-1).sqrt().unsqueeze(dim=0) # [B*T, V] + logits = (nom/denom.clamp(min=1e-6)).view(batch_size, timesteps, -1) + else: + 
raise NotImplementedError + logits = logits / self.logit_temp + return logits + + def forward( + self, + source: torch.Tensor, + target_list: Optional[List[torch.Tensor]] = None, + targets_phone_list: Optional[List[torch.Tensor]] = None, + extra_text_phone_list: Optional[List[torch.Tensor]] = None, + padding_mask: Optional[torch.Tensor] = None, + mask: bool = True, + features_only: bool = False, + output_layer: Optional[int] = None + ) -> Dict[str, torch.Tensor]: + """output layer is 1-based""" + + if not extra_text_phone_list: + src_audio, src_video = source['audio'], source['video'] # src_audio:[B, D1, T], [B, 1, T, 88, 88] + + if mask and self.masking_type == 'input': + src_video, mask_indices_video = self.apply_input_mask(src_video, padding_mask, target_list) + src_audio, mask_indices_audio = self.apply_input_mask(src_audio, padding_mask, target_list) + mask_indices = torch.logical_or(mask_indices_audio, mask_indices_video) + else: + src_audio, src_video, mask_indices = src_audio, src_video, None + + + if src_audio is not None and src_video is None: + features_audio = self.forward_features(src_audio, modality='audio') # features: [B, F, T] + features_video = features_audio.new_zeros(features_audio.size(0), features_audio.size(1), features_audio.size(-1)) + elif src_audio is None and src_video is not None: + features_video = self.forward_features(src_video, modality='video') + features_audio = features_video.new_zeros(features_video.size(0), features_video.size(1), features_video.size(-1)) + elif src_audio is not None and src_video is not None: + features_video = self.forward_features(src_video, modality='video') + features_audio = self.forward_features(src_audio, modality='audio') # features: [B, F, T] + + + if targets_phone_list is not None: + phone_sequence = targets_phone_list[0] + phone_embedding = self.phone_embed(phone_sequence) + + feature_phone = self.phone_conv(phone_embedding.transpose(1,2)) + + if targets_phone_list is None and src_audio is not None: + feature_phone = features_audio.new_zeros(features_audio.size(0), features_audio.size(1), features_audio.size(-1)) + + if targets_phone_list is None and src_video is not None: + feature_phone = features_video.new_zeros(features_video.size(0), features_video.size(1), features_video.size(-1)) + + + + if features_audio.size(-1) != feature_phone.size(-1): + diff = features_audio.size(-1) - feature_phone.size(-1) + + if diff >=0: + phone_pad_zero = torch.zeros(features_audio.size(0), features_audio.size(1), diff).type_as(feature_phone) + feature_phone = torch.cat((feature_phone, phone_pad_zero), dim=-1) + else: + feature_phone = feature_phone[:,:,:features_audio.size(-1)] + + else: + + phone_sequence = extra_text_phone_list[0] + phone_embedding = self.phone_embed(phone_sequence) + feature_phone = self.phone_conv(phone_embedding.transpose(1,2)) + features_audio = feature_phone.new_zeros(feature_phone.size(0), feature_phone.size(1), feature_phone.size(-1)) + features_video = feature_phone.new_zeros(feature_phone.size(0), feature_phone.size(1), feature_phone.size(-1)) + + mask_indices=None + padding_mask = torch.zeros(feature_phone.size(0), feature_phone.size(-1)).to(torch.bool).cuda() + + + + + modality_drop_prob, audio_drop_prob = np.random.random(), np.random.random() + if self.training: + if modality_drop_prob < self.modality_dropout: + if audio_drop_prob < self.audio_dropout: + features_audio = 0 * features_audio + else: + features_video = 0 * features_video + + + if self.modality_fuse == 'concat': + features = 
torch.cat([features_audio, features_video, feature_phone], dim=1) + elif self.modality_fuse == 'add': + features = features_audio + features_video + feature_phone + + + if target_list is not None: + features, mask_indices, target_list = self.forward_targets(features, mask_indices, target_list) + + features_pen = features.float().pow(2).mean() + + features = features.transpose(1, 2) # [B, T, 1536] + features = self.layer_norm(features) + + if padding_mask is not None: + padding_mask = self.forward_padding_mask(features, padding_mask) + + if self.post_extract_proj is not None: + features = self.post_extract_proj(features) + + features = self.dropout_input(features) + + if self.masking_type == 'feature' and mask: + x, mask_indices = self.apply_feature_mask(features, padding_mask, target_list) + else: + x = features + + # feature: (B, T, D), float + # target: (B, T), long + # x: (B, T, D), float + # padding_mask: (B, T), bool + # mask_indices: (B, T), bool + x, _ = self.encoder( + x, + padding_mask=padding_mask, + layer=None if output_layer is None else output_layer - 1 + ) # [B, T, 768] + + if features_only: + return {"x": x, "padding_mask": padding_mask, "features": features} + + label_embs_list = self.label_embs_concat.split(self.num_classes, 0) # list to tuple + proj_x = self.final_proj(x) + if self.untie_final_proj: # True + proj_x_list = proj_x.chunk(len(self.num_classes), dim=-1) + else: + proj_x_list = [proj_x for _ in self.num_classes] + logit_list = [self.compute_logits(proj, emb).view(-1, num_class) for proj, emb, num_class in zip(proj_x_list, label_embs_list, self.num_classes)] # [[B*T, V]] + mask, unmask = torch.logical_and(mask_indices, ~padding_mask).view(-1), torch.logical_and(~mask_indices, ~padding_mask).view(-1) # [B*T] + logit_m_list, logit_u_list = [logit[mask] for logit in logit_list], [logit[unmask] for logit in logit_list] + target_m_list, target_u_list = [target.view(-1)[mask].long() for target in target_list], [target.view(-1)[unmask].long() for target in target_list] + result = { + "logit_m_list": logit_m_list, + "logit_u_list": logit_u_list, + "target_m_list": target_m_list, + "target_u_list": target_u_list, + "padding_mask": padding_mask, + "features_pen": features_pen, + } + return result + + def extract_features( + self, + source: torch.Tensor, + padding_mask: Optional[torch.Tensor] = None, + mask: bool = False, + ret_conv: bool = False, + output_layer: Optional[int] = None, + ) -> Tuple[torch.Tensor, torch.Tensor]: + res = self.forward( + source, + padding_mask=padding_mask, + mask=mask, + features_only=True, + output_layer=output_layer, + ) + feature = res["features"] if ret_conv else res["x"] + return feature, res["padding_mask"] + + def extract_finetune(self, source, padding_mask=None, mask=False, ret_conv=False, output_layer=None): + src_audio, src_video = source['audio'], source['video'] + if mask and self.masking_type == 'input': + src_video, mask_indices_video = self.apply_input_mask(src_video, padding_mask, target_list=None) + src_audio, mask_indices_audio = self.apply_input_mask(src_audio, padding_mask, target_list=None) + mask_indices = torch.logical_or(mask_indices_audio, mask_indices_video) # mask_indices not used in fine-tuning + else: + src_audio, src_video, mask_indices = src_audio, src_video, None + + if src_audio is not None and src_video is None: + features_audio = self.forward_features(src_audio, modality='audio') # features: [B, F, T] + features_video = features_audio.new_zeros(features_audio.size(0), self.encoder_embed_dim, 
features_audio.size(-1)) + feature_phone = features_audio.new_zeros(features_audio.size(0), features_audio.size(1), features_audio.size(-1)) + elif src_audio is None and src_video is not None: + features_video = self.forward_features(src_video, modality='video') + features_audio = features_video.new_zeros(features_video.size(0), self.encoder_embed_dim, features_video.size(-1)) + feature_phone = features_video.new_zeros(features_video.size(0), features_video.size(1), features_video.size(-1)) + elif src_audio is not None and src_video is not None: + features_video = self.forward_features(src_video, modality='video') + features_audio = self.forward_features(src_audio, modality='audio') # features: [B, F, T] + feature_phone = features_video.new_zeros(features_video.size(0), features_video.size(1), features_video.size(-1)) + + if self.modality_fuse == 'concat': + features = torch.cat([features_audio, features_video, feature_phone], dim=1) + elif self.modality_fuse == 'add': + features = features_audio + features_video + feature_phone + features_pen = features.float().pow(2).mean() + + features = features.transpose(1, 2) + features = self.layer_norm(features) + unmasked_features = features.clone() + + if padding_mask is not None: + padding_mask = self.forward_padding_mask(features, padding_mask) + + if self.post_extract_proj is not None: + features = self.post_extract_proj(features) + + features = self.dropout_input(features) + unmasked_features = self.dropout_features(unmasked_features) + x = features + mask_indices = None + + # feature: (B, T, D), float + # target: (B, T), long + # x: (B, T, D), float + # padding_mask: (B, T), bool + # mask_indices: (B, T), bool + x, _ = self.encoder( + x, + padding_mask=padding_mask, + layer=None if output_layer is None else output_layer - 1 + ) + + return x, padding_mask + + + def get_extra_losses(self, net_output): + extra_losses = [] + names = [] + if "features_pen" in net_output: + extra_losses.append(net_output["features_pen"]) + names.append("features_pen") + + return extra_losses, names + + def remove_pretraining_modules(self): + self.target_glu = None + self.final_proj = None + self.label_embs_concat = None + self.mask_emb = None + + def get_logits(self, net_output, is_masked=True): + raise NotImplementedError + + def get_targets(self, net_output, is_masked=True): + raise NotImplementedError + + def compute_nce(self, x, pos, negs): + neg_is_pos = (pos == negs).all(-1) + pos = pos.unsqueeze(0) + targets = torch.cat([pos, negs], dim=0) + + logits = torch.cosine_similarity( + x.float(), targets.float(), dim=-1 + ).type_as(x) + logits /= self.logit_temp + if neg_is_pos.any(): + logits[1:][neg_is_pos] = float("-inf") + logits = logits.transpose(0, 1) # (num_x, num_cls+1) + return logits diff --git a/VATLM/vat_hubert/vathubert/models/vathubert_asr.py b/VATLM/vat_hubert/vathubert/models/vathubert_asr.py new file mode 100644 index 0000000000000000000000000000000000000000..a9902a9844c94e800ba2ef5967ed1993e81a9e48 --- /dev/null +++ b/VATLM/vat_hubert/vathubert/models/vathubert_asr.py @@ -0,0 +1,481 @@ +# ---------------------------------------------------------------------------- +# VatLM: Visual-Audio-Text Pre-Training with Unified Masked Prediction for Speech Representation Learning +# Github source: https://github.com/microsoft/SpeechT5/tree/main/VATLM +# Code based on fairseq: https://github.com/facebookresearch/fairseq and av_hubert: https://github.com/facebookresearch/av_hubert +# +# Copyright (c) 2022 Microsoft +# Licensed under The MIT License [see 
LICENSE for details] +# ---------------------------------------------------------------------------- + + +import sys,logging +import contextlib +import tempfile +from argparse import Namespace +from typing import Any, Optional + +import torch +import torch.nn as nn +from dataclasses import dataclass, field +from fairseq import checkpoint_utils, tasks, utils +from fairseq.dataclass import FairseqDataclass +from fairseq.dataclass.utils import convert_namespace_to_omegaconf +from fairseq.models import BaseFairseqModel, FairseqEncoder, FairseqEncoderDecoderModel, register_model +from fairseq.models.hubert.hubert import MASKING_DISTRIBUTION_CHOICES +from fairseq.tasks import FairseqTask +from omegaconf import II, MISSING + +DBG=True if len(sys.argv) == 1 else False + +if DBG: + from vathubert.models.vathubert import VATHubertModel + from vathubert.models.decoder import TransformerDecoder +else: + from vathubert.models.vathubert import VATHubertModel + from vathubert.models.decoder import TransformerDecoder + +logger = logging.getLogger(__name__) + + +@dataclass +class VATHubertAsrConfig(FairseqDataclass): + w2v_path: str = field( + default=MISSING, metadata={"help": "path to hubert model"} + ) + no_pretrained_weights: bool = field( + default=False, + metadata={"help": "if true, does not load pretrained weights"}, + ) + dropout_input: float = field( + default=0.0, + metadata={"help": "dropout to apply to the input (after feat extr)"}, + ) + final_dropout: float = field( + default=0.0, + metadata={ + "help": "dropout after transformer and before final projection" + }, + ) + dropout: float = field( + default=0.0, + metadata={"help": "dropout probability inside hubert model"}, + ) + attention_dropout: float = field( + default=0.0, + metadata={ + "help": "dropout probability for attention weights " + "inside hubert model" + }, + ) + activation_dropout: float = field( + default=0.0, + metadata={ + "help": "dropout probability after activation in FFN " + "inside hubert model" + }, + ) + + # masking + apply_mask: bool = field( + default=False, metadata={"help": "apply masking during fine-tuning"} + ) + mask_length: int = field( + default=10, metadata={"help": "repeat the mask indices multiple times"} + ) + mask_prob: float = field( + default=0.5, + metadata={ + "help": "probability of replacing a token with mask " + "(normalized by length)" + }, + ) + mask_selection: MASKING_DISTRIBUTION_CHOICES = field( + default="static", metadata={"help": "how to choose masks"} + ) + mask_other: float = field( + default=0, + metadata={ + "help": "secondary mask argument " + "(used for more complex distributions), " + "see help in compute_mask_indices" + }, + ) + no_mask_overlap: bool = field( + default=False, metadata={"help": "whether to allow masks to overlap"} + ) + + # channel masking + mask_channel_length: int = field( + default=10, + metadata={"help": "length of the mask for features (channels)"}, + ) + mask_channel_prob: float = field( + default=0.0, + metadata={"help": "probability of replacing a feature with 0"}, + ) + mask_channel_selection: MASKING_DISTRIBUTION_CHOICES = field( + default="static", + metadata={"help": "how to choose mask length for channel masking"}, + ) + mask_channel_other: float = field( + default=0, + metadata={ + "help": "secondary mask argument " + "(used for more complex distributions), " + "see help in compute_mask_indices" + }, + ) + no_mask_channel_overlap: bool = field( + default=False, + metadata={"help": "whether to allow channel masks to overlap"}, + ) + 
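These masking options (the remaining fine-tuning fields follow below) mirror wav2vec 2.0 / HuBERT-style SpecAugment: `mask_prob` and `mask_length` control span masking along time, while the `mask_channel_*` options zero out whole feature channels. A rough illustration of the two effects on a feature tensor, assuming static-length spans and ignoring overlap handling; this is not the fairseq `compute_mask_indices` implementation:

```python
import torch

def random_spans(B, T, mask_prob, mask_length):
    """Return a [B, T] bool mask with roughly mask_prob*T/mask_length spans per row."""
    num_spans = max(1, int(mask_prob * T / float(mask_length)))
    mask = torch.zeros(B, T, dtype=torch.bool)
    for b in range(B):
        for s in torch.randint(0, max(1, T - mask_length), (num_spans,)):
            mask[b, s:s + mask_length] = True
    return mask

B, T, C = 2, 100, 768
feats = torch.randn(B, T, C)
mask_emb = torch.randn(C)                               # learned embedding in the real model

time_mask = random_spans(B, T, mask_prob=0.5, mask_length=10)
feats[time_mask] = mask_emb                             # masked frames are replaced, not zeroed

chan_mask = random_spans(B, C, mask_prob=0.1, mask_length=10)        # same trick over channels
feats[chan_mask.unsqueeze(1).expand(-1, T, -1)] = 0                  # channel masking zeroes features

print(time_mask.float().mean().item(), chan_mask.float().mean().item())
```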
freeze_finetune_updates: int = field( + default=0, + metadata={"help": "dont finetune hubert for this many updates"}, + ) + feature_grad_mult: float = field( + default=0.0, + metadata={"help": "reset feature grad mult in hubert to this"}, + ) + layerdrop: float = field( + default=0.0, + metadata={"help": "probability of dropping a layer in hubert"}, + ) + normalize: bool = II("task.normalize") + data: str = II("task.data") + + # this holds the loaded hubert args + w2v_args: Any = None + + +@dataclass +class VATHubertSeq2SeqConfig(VATHubertAsrConfig): + decoder_embed_dim: int = field( + default=768, metadata={"help": "decoder embedding dimension"} + ) + decoder_ffn_embed_dim: int = field( + default=3072, metadata={"help": "decoder embedding dimension for FFN"} + ) + decoder_layers: int = field( + default=6, metadata={"help": "num of decoder layers"} + ) + decoder_layerdrop: float = field( + default=0.0, metadata={"help": "decoder layerdrop chance"} + ) + decoder_attention_heads: int = field( + default=4, metadata={"help": "num decoder attention heads"} + ) + decoder_learned_pos: bool = field( + default=False, + metadata={"help": "use learned positional embeddings in the decoder"}, + ) + decoder_normalize_before: bool = field( + default=False, + metadata={"help": "apply layernorm before each decoder block"}, + ) + no_token_positional_embeddings: bool = field( + default=False, + metadata={ + "help": "if set, disables positional embeddings " + "(outside self attention)" + }, + ) + decoder_dropout: float = field( + default=0.0, metadata={"help": "dropout probability in the decoder"} + ) + decoder_attention_dropout: float = field( + default=0.0, + metadata={ + "help": "dropout probability for attention weights " + "inside the decoder" + }, + ) + decoder_activation_dropout: float = field( + default=0.0, + metadata={ + "help": "dropout probability after activation in FFN " + "inside the decoder" + }, + ) + max_target_positions: int = field( + default=2048, metadata={"help": "max target positions"} + ) + share_decoder_input_output_embed: bool = field( + default=False, + metadata={"help": "share decoder input and output embeddings"}, + ) + no_scale_embedding: bool = field(default=True, metadata={'help': 'scale embedding'}) + +class HubertEncoder(FairseqEncoder): + def __init__(self, cfg: VATHubertAsrConfig, tgt_dict=None): + self.apply_mask = cfg.apply_mask + + arg_overrides = { + "dropout": cfg.dropout, + "activation_dropout": cfg.activation_dropout, + "dropout_input": cfg.dropout_input, + "attention_dropout": cfg.attention_dropout, + "mask_length": cfg.mask_length, + "mask_prob": cfg.mask_prob, + "mask_selection": cfg.mask_selection, + "mask_other": cfg.mask_other, + "no_mask_overlap": cfg.no_mask_overlap, + "mask_channel_length": cfg.mask_channel_length, + "mask_channel_prob": cfg.mask_channel_prob, + "mask_channel_selection": cfg.mask_channel_selection, + "mask_channel_other": cfg.mask_channel_other, + "no_mask_channel_overlap": cfg.no_mask_channel_overlap, + "encoder_layerdrop": cfg.layerdrop, + "feature_grad_mult": cfg.feature_grad_mult, + } + + if cfg.w2v_args is None: + state = checkpoint_utils.load_checkpoint_to_cpu( + cfg.w2v_path, arg_overrides + ) + w2v_args = state.get("cfg", None) + if w2v_args is None: + w2v_args = convert_namespace_to_omegaconf(state["args"]) + cfg.w2v_args = w2v_args + else: + state = None + w2v_args = cfg.w2v_args + if isinstance(w2v_args, Namespace): + cfg.w2v_args = w2v_args = convert_namespace_to_omegaconf( + w2v_args + ) + + assert cfg.normalize == 
w2v_args.task.normalize, ( + "Fine-tuning works best when data normalization is the same. " + "Please check that --normalize is set or unset for " + "both pre-training and here" + ) + + w2v_args.task.data = cfg.data + + task = tasks.setup_task(w2v_args.task) + model = task.build_model(w2v_args.model) + + if state is not None and not cfg.no_pretrained_weights: + # set strict=False because we omit some modules + model.load_state_dict(state["model"], strict=False) + + model.remove_pretraining_modules() + + super().__init__(task.source_dictionary) + + d = model.encoder.embedding_dim + + self.w2v_model = model + + self.final_dropout = nn.Dropout(cfg.final_dropout) + self.freeze_finetune_updates = cfg.freeze_finetune_updates + self.num_updates = 0 + + if tgt_dict is not None: + self.proj = Linear(d, len(tgt_dict)) + elif getattr(cfg, "decoder_embed_dim", d) != d: + self.proj = Linear(d, cfg.decoder_embed_dim) + else: + self.proj = None + + def set_num_updates(self, num_updates): + """Set the number of parameters updates.""" + super().set_num_updates(num_updates) + self.num_updates = num_updates + + def forward(self, source, padding_mask, tbc=True, **kwargs): + + w2v_args = { + "source": source, + "padding_mask": padding_mask, + "mask": self.apply_mask and self.training, + } + ft = self.freeze_finetune_updates <= self.num_updates + + with torch.no_grad() if not ft else contextlib.ExitStack(): + x, padding_mask = self.w2v_model.extract_finetune(**w2v_args) + + if tbc: + # B x T x C -> T x B x C + x = x.transpose(0, 1) + + x = self.final_dropout(x) + + if self.proj: + x = self.proj(x) + + return { + "encoder_out": x, # T x B x C + "encoder_padding_mask": padding_mask, # B x T + "padding_mask": padding_mask, + } + + def reorder_encoder_out(self, encoder_out, new_order): + if encoder_out["encoder_out"] is not None: + encoder_out["encoder_out"] = encoder_out[ + "encoder_out" + ].index_select(1, new_order) + if encoder_out["encoder_padding_mask"] is not None: + encoder_out["encoder_padding_mask"] = encoder_out[ + "encoder_padding_mask" + ].index_select(0, new_order) + return encoder_out + + def max_positions(self): + """Maximum input length supported by the encoder.""" + return None + + def upgrade_state_dict_named(self, state_dict, name): + return state_dict + + +class HubertEncoderWrapper(FairseqEncoder): + def __init__(self, w2v_model): + super().__init__(None) + self.w2v_model = w2v_model + + def forward(self, source, padding_mask, **kwargs): + w2v_args = { + "source": source, + "padding_mask": padding_mask, + } + + x, padding_mask = self.w2v_model.extract_finetune(**w2v_args) + # B x T x C -> T x B x C + x = x.transpose(0, 1) + + return { + "encoder_out": x, # T x B x C + "encoder_padding_mask": padding_mask, # B x T + "padding_mask": padding_mask + } + + def reorder_encoder_out(self, encoder_out, new_order): + if encoder_out["encoder_out"] is not None: + encoder_out["encoder_out"] = encoder_out[ + "encoder_out" + ].index_select(1, new_order) + if encoder_out["encoder_padding_mask"] is not None: + encoder_out["encoder_padding_mask"] = encoder_out[ + "encoder_padding_mask" + ].index_select(0, new_order) + if encoder_out["padding_mask"] is not None: + encoder_out["padding_mask"] = encoder_out[ + "padding_mask" + ].index_select(0, new_order) + return encoder_out + +@register_model("vat_hubert_seq2seq", dataclass=VATHubertSeq2SeqConfig) +class VATHubertSeq2Seq(FairseqEncoderDecoderModel): + def __init__(self, encoder, decoder, tgt_dict, cfg): + super().__init__(encoder, decoder) + self.cfg = cfg + 
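One detail used in `HubertEncoder.forward` above, and again in `VATHubertSeq2Seq.forward` further below, is the freezing trick driven by `freeze_finetune_updates`: for the first N updates the pretrained encoder runs under `torch.no_grad()`, and afterwards a no-op `contextlib.ExitStack()` is substituted so the same `with` statement lets gradients flow. A minimal sketch of the pattern with a hypothetical stand-in module:

```python
import contextlib
import torch
import torch.nn as nn

encoder = nn.Linear(4, 4)                 # stand-in for the pretrained VAT-HuBERT encoder
freeze_finetune_updates = 1000

def encode(x, num_updates):
    ft = freeze_finetune_updates <= num_updates
    # Same context-manager trick as in the model: no_grad() while frozen,
    # an empty ExitStack() (a do-nothing context) once fine-tuning starts.
    with torch.no_grad() if not ft else contextlib.ExitStack():
        return encoder(x)

x = torch.randn(2, 4)
print(encode(x, num_updates=0).requires_grad)      # False: encoder still frozen
print(encode(x, num_updates=2000).requires_grad)   # True: gradients flow again
```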
self.freeze_finetune_updates = cfg.freeze_finetune_updates + + @classmethod + def build_model(cls, cfg, task): + """Build a new model instance.""" + + arg_overrides = { + "dropout": cfg.dropout, + "activation_dropout": cfg.activation_dropout, + "dropout_input": cfg.dropout_input, + "attention_dropout": cfg.attention_dropout, + "mask_length": cfg.mask_length, + "mask_prob": cfg.mask_prob, + "mask_selection": cfg.mask_selection, + "mask_other": cfg.mask_other, + "no_mask_overlap": cfg.no_mask_overlap, + "mask_channel_length": cfg.mask_channel_length, + "mask_channel_prob": cfg.mask_channel_prob, + "mask_channel_selection": cfg.mask_channel_selection, + "mask_channel_other": cfg.mask_channel_other, + "no_mask_channel_overlap": cfg.no_mask_channel_overlap, + "encoder_layerdrop": cfg.layerdrop, + "feature_grad_mult": cfg.feature_grad_mult, + } + + if cfg.w2v_args is None: + state = checkpoint_utils.load_checkpoint_to_cpu( + cfg.w2v_path, arg_overrides + ) + w2v_args = state.get("cfg", None) + if w2v_args is None: + w2v_args = convert_namespace_to_omegaconf(state["args"]) + cfg.w2v_args = w2v_args + else: + state = None + w2v_args = cfg.w2v_args + if isinstance(w2v_args, Namespace): + cfg.w2v_args = w2v_args = convert_namespace_to_omegaconf( + w2v_args + ) + + assert cfg.normalize == w2v_args.task.normalize, ( + "Fine-tuning works best when data normalization is the same. " + "Please check that --normalize is set or unset for " + "both pre-training and here" + ) + + w2v_args.task.data = cfg.data + + task_pretrain = tasks.setup_task(w2v_args.task) + if state is not None: + task_pretrain.load_state_dict(state['task_state']) + + encoder_ = task_pretrain.build_model(w2v_args.model) + + encoder = HubertEncoderWrapper(encoder_) + if state is not None and not cfg.no_pretrained_weights: + # set strict=False because we omit some modules + del state['model']['mask_emb'] + del state['model']['label_embs_concat'] + + encoder.w2v_model.load_state_dict(state["model"], strict=False) + + encoder.w2v_model.remove_pretraining_modules() + + src_dict, tgt_dict = task.source_dictionary, task.target_dictionary + + def build_embedding(dictionary, embed_dim): + num_embeddings = len(dictionary) + padding_idx = dictionary.pad() + emb = Embedding(num_embeddings, embed_dim, padding_idx=padding_idx) + return emb + + decoder_embed_tokens = build_embedding(tgt_dict, cfg.decoder_embed_dim) + decoder = TransformerDecoder(cfg, tgt_dict, decoder_embed_tokens) + + return VATHubertSeq2Seq(encoder, decoder, tgt_dict, cfg) + + + def forward(self, **kwargs): + ft = self.freeze_finetune_updates <= self.num_updates + with torch.no_grad() if not ft else contextlib.ExitStack(): + output = self.encoder(**kwargs) + decoder_out = self.decoder(prev_output_tokens=kwargs['prev_output_tokens'], encoder_out=output) + return decoder_out + + def upgrade_state_dict_named(self, state_dict, name): + super().upgrade_state_dict_named(state_dict, name) + return state_dict + + def set_num_updates(self, num_updates): + """Set the number of parameters updates.""" + super().set_num_updates(num_updates) + self.num_updates = num_updates + +def Embedding(num_embeddings, embedding_dim, padding_idx): + m = nn.Embedding(num_embeddings, embedding_dim, padding_idx=padding_idx) + nn.init.normal_(m.weight, mean=0, std=embedding_dim ** -0.5) + nn.init.constant_(m.weight[padding_idx], 0) + return m + + +def Linear(in_features, out_features, bias=True): + m = nn.Linear(in_features, out_features, bias) + nn.init.xavier_uniform_(m.weight) + if bias: + 
nn.init.constant_(m.bias, 0.0) + return m diff --git a/VATLM/vat_hubert/vathubert/scripts/finetune_avsr/base_lrs3_finetune30_av.sh b/VATLM/vat_hubert/vathubert/scripts/finetune_avsr/base_lrs3_finetune30_av.sh new file mode 100644 index 0000000000000000000000000000000000000000..422939bb222204506bf2f35774982dfa797aceac --- /dev/null +++ b/VATLM/vat_hubert/vathubert/scripts/finetune_avsr/base_lrs3_finetune30_av.sh @@ -0,0 +1,27 @@ +#!/bin/bash + +ngpu=$1 +updatefreq=$2 +max_tokens=$3 +pretrained_model_path=$4 +save_path=$5 + +python /path/to/fairseq/fairseq_cli/hydra_train.py \ + --config-dir /path/to/vat_hubert/vathubert/conf/finetune --config-name base_lrs3_30h_av.yaml \ + task.data=/path/to/30h_data_tsv \ + task.label_dir=/path/to/30h_data_tsv \ + task.tokenizer_bpe_model=/path/to/sentencepiece/model \ + task.modalities=["audio","video"] \ + model.w2v_path=${pretrained_model_path} \ + hydra.run.dir=${save_path} \ + common.user_dir=/path/to/vat_hubert/vathubert \ + distributed_training.distributed_world_size=${ngpu} \ + distributed_training.ddp_backend="no_c10d" \ + optimization.update_freq=[${updatefreq}] \ + dataset.max_tokens=${max_tokens} \ + +task.use_supervised_data=False \ + +task.use_extra_textdata=False \ + +task.use_extra_audiodata=False \ + + + diff --git a/VATLM/vat_hubert/vathubert/scripts/finetune_avsr/base_vox_finetune30_av.sh b/VATLM/vat_hubert/vathubert/scripts/finetune_avsr/base_vox_finetune30_av.sh new file mode 100644 index 0000000000000000000000000000000000000000..5e8a9e55bdf7f6380dcf5611e8407a6a28b26604 --- /dev/null +++ b/VATLM/vat_hubert/vathubert/scripts/finetune_avsr/base_vox_finetune30_av.sh @@ -0,0 +1,28 @@ +#!/bin/bash + +ngpu=$1 +updatefreq=$2 +max_tokens=$3 +pretrained_model_path=$4 +save_path=$5 + +python /path/to/fairseq/fairseq_cli/hydra_train.py \ + --config-dir /path/to/vat_hubert/vathubert/conf/finetune --config-name base_vox_30h_av.yaml \ + task.data=/path/to/30h_data_tsv \ + task.label_dir=/path/to/30h_data_tsv \ + task.tokenizer_bpe_model=/path/to/sentencepiece/model \ + task.modalities=["audio","video"] \ + model.w2v_path=${pretrained_model_path} \ + hydra.run.dir=${save_path} \ + common.user_dir=/path/to/vat_hubert/vathubert \ + distributed_training.distributed_world_size=${ngpu} \ + distributed_training.ddp_backend="no_c10d" \ + optimization.update_freq=[${updatefreq}] \ + dataset.max_tokens=${max_tokens} \ + +task.use_supervised_data=False \ + +task.use_extra_textdata=False \ + +task.use_extra_audiodata=False \ + + + + diff --git a/VATLM/vat_hubert/vathubert/scripts/finetune_avsr/base_vox_finetune433_av.sh b/VATLM/vat_hubert/vathubert/scripts/finetune_avsr/base_vox_finetune433_av.sh new file mode 100644 index 0000000000000000000000000000000000000000..0a9ad419c779f7a170605c04be979892ea5332cc --- /dev/null +++ b/VATLM/vat_hubert/vathubert/scripts/finetune_avsr/base_vox_finetune433_av.sh @@ -0,0 +1,28 @@ +#!/bin/bash + +ngpu=$1 +updatefreq=$2 +max_tokens=$3 +pretrained_model_path=$4 +save_path=$5 + +python /path/to/fairseq/fairseq_cli/hydra_train.py \ + --config-dir /path/to/vat_hubert/vathubert/conf/finetune --config-name base_vox_433h_av.yaml \ + task.data=/path/to/433h_data_tsv \ + task.label_dir=/path/to/433h_data_tsv \ + task.tokenizer_bpe_model=/path/to/sentencepiece/model \ + task.modalities=["audio","video"] \ + model.w2v_path=${pretrained_model_path} \ + hydra.run.dir=${save_path} \ + common.user_dir=/path/to/vat_hubert/vathubert \ + distributed_training.distributed_world_size=${ngpu} \ + distributed_training.ddp_backend="no_c10d" \ + 
optimization.update_freq=[${updatefreq}] \ + dataset.max_tokens=${max_tokens} \ + +task.use_supervised_data=False \ + +task.use_extra_textdata=False \ + +task.use_extra_audiodata=False \ + + + + diff --git a/VATLM/vat_hubert/vathubert/scripts/finetune_avsr/large_vox_finetune30_av.sh b/VATLM/vat_hubert/vathubert/scripts/finetune_avsr/large_vox_finetune30_av.sh new file mode 100644 index 0000000000000000000000000000000000000000..fb849c3de0e5ea7ff89d26f9379529e5c90aca11 --- /dev/null +++ b/VATLM/vat_hubert/vathubert/scripts/finetune_avsr/large_vox_finetune30_av.sh @@ -0,0 +1,24 @@ +#!/bin/bash + +ngpu=$1 +updatefreq=$2 +max_tokens=$3 +pretrained_model_path=$4 +save_path=$5 + +python /path/to/fairseq/fairseq_cli/hydra_train.py \ + --config-dir /path/to/vat_hubert/vathubert/conf/finetune --config-name large_vox_30h_av.yaml \ + task.data=/path/to/30h_data_tsv \ + task.label_dir=/path/to/30h_data_tsv \ + task.tokenizer_bpe_model=/path/to/sentencepiece/model \ + task.modalities=["audio","video"] \ + model.w2v_path=${pretrained_model_path} \ + hydra.run.dir=${save_path} \ + common.user_dir=/path/to/vat_hubert/vathubert \ + distributed_training.distributed_world_size=${ngpu} \ + distributed_training.ddp_backend="no_c10d" \ + optimization.update_freq=[${updatefreq}] \ + dataset.max_tokens=${max_tokens} \ + +task.use_supervised_data=False \ + +task.use_extra_textdata=False \ + +task.use_extra_audiodata=False \ \ No newline at end of file diff --git a/VATLM/vat_hubert/vathubert/scripts/finetune_avsr/large_vox_finetune433_av.sh b/VATLM/vat_hubert/vathubert/scripts/finetune_avsr/large_vox_finetune433_av.sh new file mode 100644 index 0000000000000000000000000000000000000000..47668d093dce01191a5ddcaae0dd469dd3537f37 --- /dev/null +++ b/VATLM/vat_hubert/vathubert/scripts/finetune_avsr/large_vox_finetune433_av.sh @@ -0,0 +1,28 @@ +#!/bin/bash + +ngpu=$1 +updatefreq=$2 +max_tokens=$3 +pretrained_model_path=$4 +save_path=$5 + +python /path/to/fairseq/fairseq_cli/hydra_train.py \ + --config-dir /path/to/vat_hubert/vathubert/conf/finetune --config-name large_vox_433h_av.yaml \ + task.data=/path/to/433h_data_tsv \ + task.label_dir=/path/to/433h_data_tsv \ + task.tokenizer_bpe_model=/path/to/sentencepiece/model \ + task.modalities=["audio","video"] \ + model.w2v_path=${pretrained_model_path} \ + hydra.run.dir=${save_path} \ + common.user_dir=/path/to/vat_hubert/vathubert \ + distributed_training.distributed_world_size=${ngpu} \ + distributed_training.ddp_backend="no_c10d" \ + optimization.update_freq=[${updatefreq}] \ + dataset.max_tokens=${max_tokens} \ + +task.use_supervised_data=False \ + +task.use_extra_textdata=False \ + +task.use_extra_audiodata=False \ + + + + diff --git a/VATLM/vat_hubert/vathubert/scripts/finetune_vsr/base_lrs3_finetune30_v.sh b/VATLM/vat_hubert/vathubert/scripts/finetune_vsr/base_lrs3_finetune30_v.sh new file mode 100644 index 0000000000000000000000000000000000000000..56ac2375c27fdc3c6999116ea74565af773267c9 --- /dev/null +++ b/VATLM/vat_hubert/vathubert/scripts/finetune_vsr/base_lrs3_finetune30_v.sh @@ -0,0 +1,27 @@ +#!/bin/bash + +ngpu=$1 +updatefreq=$2 +max_tokens=$3 +pretrained_model_path=$4 +save_path=$5 + +python /path/to/fairseq/fairseq_cli/hydra_train.py \ + --config-dir /path/to/vat_hubert/vathubert/conf/finetune --config-name base_lrs3_30h_v.yaml \ + task.data=/path/to/30h_data_tsv \ + task.label_dir=/path/to/30h_data_tsv \ + task.tokenizer_bpe_model=/path/to/sentencepiece/model \ + task.modalities=["video"] \ + model.w2v_path=${pretrained_model_path} \ + 
hydra.run.dir=${save_path} \ + common.user_dir=/path/to/vat_hubert/vathubert \ + distributed_training.distributed_world_size=${ngpu} \ + distributed_training.ddp_backend="no_c10d" \ + optimization.update_freq=[${updatefreq}] \ + dataset.max_tokens=${max_tokens} \ + +task.use_supervised_data=False \ + +task.use_extra_textdata=False \ + +task.use_extra_audiodata=False \ + + + diff --git a/VATLM/vat_hubert/vathubert/scripts/finetune_vsr/base_vox_finetune30_v.sh b/VATLM/vat_hubert/vathubert/scripts/finetune_vsr/base_vox_finetune30_v.sh new file mode 100644 index 0000000000000000000000000000000000000000..d3f1b13feba0701f629e66409a0cc082ecd0d7ac --- /dev/null +++ b/VATLM/vat_hubert/vathubert/scripts/finetune_vsr/base_vox_finetune30_v.sh @@ -0,0 +1,27 @@ +#!/bin/bash + +ngpu=$1 +updatefreq=$2 +max_tokens=$3 +pretrained_model_path=$4 +save_path=$5 + +python /path/to/fairseq/fairseq_cli/hydra_train.py \ + --config-dir /path/to/vat_hubert/vathubert/conf/finetune --config-name base_vox_30h_v.yaml \ + task.data=/path/to/30h_data_tsv \ + task.label_dir=/path/to/30h_data_tsv \ + task.tokenizer_bpe_model=/path/to/sentencepiece/model \ + task.modalities=["video"] \ + model.w2v_path=${pretrained_model_path} \ + hydra.run.dir=${save_path} \ + common.user_dir=/path/to/vat_hubert/vathubert \ + distributed_training.distributed_world_size=${ngpu} \ + distributed_training.ddp_backend="no_c10d" \ + optimization.update_freq=[${updatefreq}] \ + dataset.max_tokens=${max_tokens} \ + +task.use_supervised_data=False \ + +task.use_extra_textdata=False \ + +task.use_extra_audiodata=False \ + + + diff --git a/VATLM/vat_hubert/vathubert/scripts/finetune_vsr/base_vox_finetune433_v.sh b/VATLM/vat_hubert/vathubert/scripts/finetune_vsr/base_vox_finetune433_v.sh new file mode 100644 index 0000000000000000000000000000000000000000..4943d8a22268b4e97ba047c7cca18208fecef4fa --- /dev/null +++ b/VATLM/vat_hubert/vathubert/scripts/finetune_vsr/base_vox_finetune433_v.sh @@ -0,0 +1,28 @@ +#!/bin/bash + +ngpu=$1 +updatefreq=$2 +max_tokens=$3 +pretrained_model_path=$4 +save_path=$5 + +python /path/to/fairseq/fairseq_cli/hydra_train.py \ + --config-dir /path/to/vat_hubert/vathubert/conf/finetune --config-name base_vox_433h_v.yaml \ + task.data=/path/to/433h_data_tsv \ + task.label_dir=/path/to/433h_data_tsv \ + task.tokenizer_bpe_model=/path/to/sentencepiece/model \ + task.modalities=["video"] \ + model.w2v_path=${pretrained_model_path} \ + hydra.run.dir=${save_path} \ + common.user_dir=/path/to/vat_hubert/vathubert \ + distributed_training.distributed_world_size=${ngpu} \ + distributed_training.ddp_backend="no_c10d" \ + optimization.update_freq=[${updatefreq}] \ + dataset.max_tokens=${max_tokens} \ + +task.use_supervised_data=False \ + +task.use_extra_textdata=False \ + +task.use_extra_audiodata=False \ + + + + diff --git a/VATLM/vat_hubert/vathubert/scripts/finetune_vsr/large_vox_finetune30_v.sh b/VATLM/vat_hubert/vathubert/scripts/finetune_vsr/large_vox_finetune30_v.sh new file mode 100644 index 0000000000000000000000000000000000000000..c36d01767ea058ae3b086f486bd1043a643ae470 --- /dev/null +++ b/VATLM/vat_hubert/vathubert/scripts/finetune_vsr/large_vox_finetune30_v.sh @@ -0,0 +1,27 @@ +#!/bin/bash + +ngpu=$1 +updatefreq=$2 +max_tokens=$3 +pretrained_model_path=$4 +save_path=$5 + +python /path/to/fairseq/fairseq_cli/hydra_train.py \ + --config-dir /path/to/vat_hubert/vathubert/conf/finetune --config-name large_vox_30h_v.yaml \ + task.data=/path/to/30h_data_tsv \ + task.label_dir=/path/to/30h_data_tsv \ + 
task.tokenizer_bpe_model=/path/to/sentencepiece/model \ + task.modalities=["video"] \ + model.w2v_path=${pretrained_model_path} \ + hydra.run.dir=${save_path} \ + common.user_dir=/path/to/vat_hubert/vathubert \ + distributed_training.distributed_world_size=${ngpu} \ + distributed_training.ddp_backend="no_c10d" \ + optimization.update_freq=[${updatefreq}] \ + dataset.max_tokens=${max_tokens} \ + +task.use_supervised_data=False \ + +task.use_extra_textdata=False \ + +task.use_extra_audiodata=False \ + + + diff --git a/VATLM/vat_hubert/vathubert/scripts/finetune_vsr/large_vox_finetune433_v.sh b/VATLM/vat_hubert/vathubert/scripts/finetune_vsr/large_vox_finetune433_v.sh new file mode 100644 index 0000000000000000000000000000000000000000..275222c30c0af19b3788b98e2c6e9a7b62e1f0c3 --- /dev/null +++ b/VATLM/vat_hubert/vathubert/scripts/finetune_vsr/large_vox_finetune433_v.sh @@ -0,0 +1,28 @@ +#!/bin/bash + +ngpu=$1 +updatefreq=$2 +max_tokens=$3 +pretrained_model_path=$4 +save_path=$5 + +python /path/to/fairseq/fairseq_cli/hydra_train.py \ + --config-dir /path/to/vat_hubert/vathubert/conf/finetune --config-name large_vox_433h_v.yaml \ + task.data=/path/to/433h_data_tsv \ + task.label_dir=/path/to/433h_data_tsv \ + task.tokenizer_bpe_model=/path/to/sentencepiece/model \ + task.modalities=["video"] \ + model.w2v_path=${pretrained_model_path} \ + hydra.run.dir=${save_path} \ + common.user_dir=/path/to/vat_hubert/vathubert \ + distributed_training.distributed_world_size=${ngpu} \ + distributed_training.ddp_backend="no_c10d" \ + optimization.update_freq=[${updatefreq}] \ + dataset.max_tokens=${max_tokens} \ + +task.use_supervised_data=False \ + +task.use_extra_textdata=False \ + +task.use_extra_audiodata=False \ + + + + diff --git a/VATLM/vat_hubert/vathubert/scripts/pretrain/base_lsr3_pretrain_iter5.sh b/VATLM/vat_hubert/vathubert/scripts/pretrain/base_lsr3_pretrain_iter5.sh new file mode 100644 index 0000000000000000000000000000000000000000..bb9d03cd6272b5b2c52aff645c0cf6580379f473 --- /dev/null +++ b/VATLM/vat_hubert/vathubert/scripts/pretrain/base_lsr3_pretrain_iter5.sh @@ -0,0 +1,31 @@ +#!/bin/bash +ngpu=$1 +updatefreq=$2 +datapath=/LocalData/vatlm_related/fbankdata +save_path=$3 + +python /path/to/fairseq/fairseq_cli/hydra_train.py \ + --config-dir /path/to/vat_hubert/vathubert/conf/pretrain --config-name base_lrs3_iter5.yaml \ + task.data=${datapath}/433pre_lrs3_433h_tsv \ + task.label_dir=${datapath}/433pre_lrs3_433h_tsv \ + +task.sup_data_path=${datapath}/433pre_tedv3_phone_concat_tsv2 \ + +task.sup_manifest=${datapath}/433pre_tedv3_phone_concat_tsv2 \ + +task.onlytext_manifest=${datapath}/433pre_cantab_tsv \ + +task.onlyaudio_manifest=${datapath}/433pre_giga_tsv_km \ + hydra.run.dir=${save_path} \ + common.user_dir=/path/to/vat_hubert/vathubert \ + distributed_training.distributed_world_size=${ngpu} \ + optimization.update_freq=[${updatefreq}] \ + dataset.max_tokens=3000 \ + model.label_rate=25 \ + common.log_interval=200 \ + checkpoint.save_interval=5 \ + +task.sample_distributions=\"0.08,0.1,0.15,0.15\" \ + +criterion.banlance_loss_weights=[1.0,1.0] \ + dataset.data_buffer_size=40 \ + +task.use_supervised_data=True \ + +task.use_extra_textdata=True \ + +task.use_extra_audiodata=True \ + + + \ No newline at end of file diff --git a/VATLM/vat_hubert/vathubert/scripts/pretrain/base_vox_pretrain_iter5.sh b/VATLM/vat_hubert/vathubert/scripts/pretrain/base_vox_pretrain_iter5.sh new file mode 100644 index 0000000000000000000000000000000000000000..221588a16fcef4ffc084d59cdbee8c04171ac023 --- 
/dev/null +++ b/VATLM/vat_hubert/vathubert/scripts/pretrain/base_vox_pretrain_iter5.sh @@ -0,0 +1,30 @@ +#!/bin/bash +ngpu=$1 +updatefreq=$2 +datapath=/LocalData/vatlm_related/fbankdata +save_path=$3 + + +python /path/to/fairseq/fairseq_cli/hydra_train.py \ + --config-dir /path/to/vat_hubert/vathubert/conf/pretrain --config-name base_vox_iter5.yaml \ + task.data=${datapath}/fbank_lrs3_vox_tsv \ + task.label_dir=${datapath}/fbank_lrs3_vox_tsv \ + +task.sup_data_path=${datapath}/fbank_tedv3_phone_concat_vox_tsv \ + +task.sup_manifest=${datapath}/fbank_tedv3_phone_concat_vox_tsv \ + +task.onlytext_manifest=${datapath}/cantab2_vox_tsv \ + +task.onlyaudio_manifest=${datapath}/fbank_giga_vox_tsv_km \ + hydra.run.dir=${save_path} \ + common.user_dir=/path/to/vat_hubert/vathubert \ + distributed_training.distributed_world_size=${ngpu} \ + optimization.update_freq=[${updatefreq}] \ + dataset.max_tokens=3000 \ + model.label_rate=25 \ + common.log_interval=200 \ + checkpoint.save_interval=5 \ + +task.sample_distributions=\"0.13,0.15,0.32,0.3\" \ + +criterion.banlance_loss_weights=[1.0,1.0] \ + dataset.data_buffer_size=40 \ + +task.use_supervised_data=True \ + +task.use_extra_textdata=True \ + +task.use_extra_audiodata=True \ + diff --git a/VATLM/vat_hubert/vathubert/scripts/pretrain/large_vox_pretrain_iter5.sh b/VATLM/vat_hubert/vathubert/scripts/pretrain/large_vox_pretrain_iter5.sh new file mode 100644 index 0000000000000000000000000000000000000000..064f9ce14bc2c54809f060ad997664913d95bbfa --- /dev/null +++ b/VATLM/vat_hubert/vathubert/scripts/pretrain/large_vox_pretrain_iter5.sh @@ -0,0 +1,31 @@ +#!/bin/bash +unset WORLD_SIZE +ngpu=$1 +updatefreq=$2 +datapath=/LocalData/vatlm_related/fbankdata +save_path=$3 + + +python /path/to/fairseq/fairseq_cli/hydra_train.py \ + --config-dir /path/to/vat_hubert/vathubert/conf/pretrain --config-name large_vox_iter5.yaml \ + task.data=${datapath}/fbank_lrs3_vox_tsv \ + task.label_dir=${datapath}/fbank_lrs3_vox_tsv \ + +task.sup_data_path=${datapath}/fbank_tedv3_phone_concat_vox_tsv \ + +task.sup_manifest=${datapath}/fbank_tedv3_phone_concat_vox_tsv \ + +task.onlytext_manifest=${datapath}/cantab2_vox_tsv \ + +task.onlyaudio_manifest=${datapath}/fbank_giga_vox_tsv_km \ + hydra.run.dir=${save_path} \ + common.user_dir=/path/to/vat_hubert/vathubert \ + distributed_training.distributed_world_size=${ngpu} \ + optimization.update_freq=[${updatefreq}] \ + dataset.max_tokens=3000 \ + model.label_rate=25 \ + common.log_interval=200 \ + checkpoint.save_interval=5 \ + +task.sample_distributions=\"0.13,0.15,0.32,0.3\" \ + +criterion.banlance_loss_weights=[1.0,1.0] \ + dataset.data_buffer_size=40 \ + +task.use_supervised_data=True \ + +task.use_extra_textdata=True \ + +task.use_extra_audiodata=True \ + diff --git a/VATLM/vat_hubert/vathubert/sequence_generator.py b/VATLM/vat_hubert/vathubert/sequence_generator.py new file mode 100644 index 0000000000000000000000000000000000000000..49cfa7a5125a5e32d40693a6367dfb7aa4cad703 --- /dev/null +++ b/VATLM/vat_hubert/vathubert/sequence_generator.py @@ -0,0 +1,988 @@ +# ---------------------------------------------------------------------------- +# VatLM: Visual-Audio-Text Pre-Training with Unified Masked Prediction for Speech Representation Learning +# Github source: https://github.com/microsoft/SpeechT5/tree/main/VATLM +# Code based on fairseq: https://github.com/facebookresearch/fairseq and av_hubert: https://github.com/facebookresearch/av_hubert +# +# Copyright (c) 2022 Microsoft +# Licensed under The MIT License [see LICENSE for 
details] +# ---------------------------------------------------------------------------- + +import math +from typing import Dict, List, Optional +import sys + +import torch +import torch.nn as nn +from fairseq import search, utils +from fairseq.data import data_utils +from fairseq.models import FairseqIncrementalDecoder +from torch import Tensor +from fairseq.ngram_repeat_block import NGramRepeatBlock + + +class SequenceGenerator(nn.Module): + def __init__( + self, + models, + tgt_dict, + beam_size=1, + max_len_a=0, + max_len_b=200, + max_len=0, + min_len=1, + normalize_scores=True, + len_penalty=1.0, + unk_penalty=0.0, + temperature=1.0, + match_source_len=False, + no_repeat_ngram_size=0, + search_strategy=None, + eos=None, + symbols_to_strip_from_output=None, + lm_model=None, + lm_weight=1.0, + ): + """Generates translations of a given source sentence. + + Args: + models (List[~fairseq.models.FairseqModel]): ensemble of models, + currently support fairseq.models.TransformerModel for scripting + beam_size (int, optional): beam width (default: 1) + max_len_a/b (int, optional): generate sequences of maximum length + ax + b, where x is the source length + max_len (int, optional): the maximum length of the generated output + (not including end-of-sentence) + min_len (int, optional): the minimum length of the generated output + (not including end-of-sentence) + normalize_scores (bool, optional): normalize scores by the length + of the output (default: True) + len_penalty (float, optional): length penalty, where <1.0 favors + shorter, >1.0 favors longer sentences (default: 1.0) + unk_penalty (float, optional): unknown word penalty, where <0 + produces more unks, >0 produces fewer (default: 0.0) + temperature (float, optional): temperature, where values + >1.0 produce more uniform samples and values <1.0 produce + sharper samples (default: 1.0) + match_source_len (bool, optional): outputs should match the source + length (default: False) + """ + super().__init__() + if isinstance(models, EnsembleModel): + self.model = models + else: + self.model = EnsembleModel(models) + self.tgt_dict = tgt_dict + self.pad = tgt_dict.pad() + self.unk = tgt_dict.unk() + self.eos = tgt_dict.eos() if eos is None else eos + self.symbols_to_strip_from_output = ( + symbols_to_strip_from_output.union({self.eos}) + if symbols_to_strip_from_output is not None + else {self.eos} + ) + self.vocab_size = len(tgt_dict) + self.beam_size = beam_size + # the max beam size is the dictionary size - 1, since we never select pad + self.beam_size = min(beam_size, self.vocab_size - 1) + self.max_len_a = max_len_a + self.max_len_b = max_len_b + self.min_len = min_len + self.max_len = max_len or self.model.max_decoder_positions() + + self.normalize_scores = normalize_scores + self.len_penalty = len_penalty + self.unk_penalty = unk_penalty + self.temperature = temperature + self.match_source_len = match_source_len + + if no_repeat_ngram_size > 0: + self.repeat_ngram_blocker = NGramRepeatBlock(no_repeat_ngram_size) + else: + self.repeat_ngram_blocker = None + + assert temperature > 0, "--temperature must be greater than 0" + + self.search = ( + search.BeamSearch(tgt_dict) if search_strategy is None else search_strategy + ) + # We only need to set src_lengths in LengthConstrainedBeamSearch. + # As a module attribute, setting it would break in multithread + # settings when the model is shared. 
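To make the `normalize_scores` / `len_penalty` options described in the docstring concrete: when scores are normalized, a finished hypothesis is ranked by its summed token log-probabilities divided by `length ** len_penalty`, so values below 1.0 favour shorter outputs and values above 1.0 favour longer ones. A small illustrative sketch with toy log-probabilities, not the generator's internal bookkeeping:

```python
def hypo_score(token_logprobs, len_penalty=1.0, normalize_scores=True):
    total = sum(token_logprobs)
    if normalize_scores:
        return total / (len(token_logprobs) ** len_penalty)
    return total

short = [-0.4, -0.4, -0.4]                         # 3 tokens
long_ = [-0.4, -0.4, -0.4, -0.4, -0.39]            # 5 tokens, similar per-token quality

for lp in (0.5, 1.0, 1.5):
    s, l = hypo_score(short, lp), hypo_score(long_, lp)
    print(f"len_penalty={lp}: short={s:.3f} long={l:.3f} -> picks {'short' if s > l else 'long'}")
```

With these toy numbers the small penalty picks the shorter hypothesis and raising it shifts the preference to the longer one, which is the behaviour the docstring describes.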
+ self.should_set_src_lengths = ( + hasattr(self.search, "needs_src_lengths") and self.search.needs_src_lengths + ) + + self.model.eval() + + self.lm_model = lm_model + self.lm_weight = lm_weight + if self.lm_model is not None: + self.lm_model.eval() + + def cuda(self): + self.model.cuda() + return self + + @torch.no_grad() + def forward( + self, + sample: Dict[str, Dict[str, Tensor]], + prefix_tokens: Optional[Tensor] = None, + bos_token: Optional[int] = None, + ): + """Generate a batch of translations. + + Args: + sample (dict): batch + prefix_tokens (torch.LongTensor, optional): force decoder to begin + with these tokens + bos_token (int, optional): beginning of sentence token + (default: self.eos) + """ + return self._generate(sample, prefix_tokens, bos_token=bos_token) + + # TODO(myleott): unused, deprecate after pytorch-translate migration + def generate_batched_itr(self, data_itr, beam_size=None, cuda=False, timer=None): + """Iterate over a batched dataset and yield individual translations. + Args: + cuda (bool, optional): use GPU for generation + timer (StopwatchMeter, optional): time generations + """ + for sample in data_itr: + s = utils.move_to_cuda(sample) if cuda else sample + if "net_input" not in s: + continue + input = s["net_input"] + # model.forward normally channels prev_output_tokens into the decoder + # separately, but SequenceGenerator directly calls model.encoder + encoder_input = { + k: v for k, v in input.items() if k != "prev_output_tokens" + } + if timer is not None: + timer.start() + with torch.no_grad(): + hypos = self.generate(encoder_input) + if timer is not None: + timer.stop(sum(len(h[0]["tokens"]) for h in hypos)) + for i, id in enumerate(s["id"].data): + # remove padding + src = utils.strip_pad(input["src_tokens"].data[i, :], self.pad) + ref = ( + utils.strip_pad(s["target"].data[i, :], self.pad) + if s["target"] is not None + else None + ) + yield id, src, ref, hypos[i] + + @torch.no_grad() + def generate(self, models, sample: Dict[str, Dict[str, Tensor]], **kwargs) -> List[List[Dict[str, Tensor]]]: + """Generate translations. Match the api of other fairseq generators. 
+ + Args: + models (List[~fairseq.models.FairseqModel]): ensemble of models + sample (dict): batch + prefix_tokens (torch.LongTensor, optional): force decoder to begin + with these tokens + constraints (torch.LongTensor, optional): force decoder to include + the list of constraints + bos_token (int, optional): beginning of sentence token + (default: self.eos) + """ + return self._generate(sample, **kwargs) + + def _generate( + self, + sample: Dict[str, Dict[str, Tensor]], + prefix_tokens: Optional[Tensor] = None, + constraints: Optional[Tensor] = None, + bos_token: Optional[int] = None, + ): + incremental_states = torch.jit.annotate( + List[Dict[str, Dict[str, Optional[Tensor]]]], + [ + torch.jit.annotate(Dict[str, Dict[str, Optional[Tensor]]], {}) + for i in range(self.model.models_size) + ], + ) + net_input = sample["net_input"] + + if "src_tokens" in net_input: + src_tokens = net_input["src_tokens"] + # length of the source text being the character length except EndOfSentence and pad + src_lengths = ( + (src_tokens.ne(self.eos) & src_tokens.ne(self.pad)).long().sum(dim=1) + ) + elif "source" in net_input: + src_tokens = net_input["source"] + src_lengths = ( + net_input["padding_mask"].size(-1) - net_input["padding_mask"].sum(-1) + if net_input["padding_mask"] is not None + else torch.tensor(src_tokens.size(-1)).to(src_tokens) + ) + elif "features" in net_input: + src_tokens = net_input["features"] + src_lengths = ( + net_input["padding_mask"].size(-1) - net_input["padding_mask"].sum(-1) + if net_input["padding_mask"] is not None + else torch.tensor(src_tokens.size(-1)).to(src_tokens) + ) + else: + raise Exception("expected src_tokens or source in net input. input keys: " + str(net_input.keys())) + + # bsz: total number of sentences in beam + # Note that src_tokens may have more than 2 dimensions (i.e. audio features) + if src_tokens['audio'] is not None: + bsz, src_len = src_tokens['audio'].size()[:2] + src_device = src_tokens['audio'].device + else: + bsz, src_len = net_input['padding_mask'].size() + src_device = src_tokens['video'].device + beam_size = self.beam_size + if constraints is not None and not self.search.supports_constraints: + raise NotImplementedError( + "Target-side constraints were provided, but search method doesn't support them" + ) + + # Initialize constraints, when active + self.search.init_constraints(constraints, beam_size) + + max_len: int = -1 + if self.match_source_len: + max_len = src_lengths.max().item() + else: + max_len = min( + int(self.max_len_a * src_len + self.max_len_b), + self.max_len - 1, + ) + assert ( + self.min_len <= max_len + ), "min_len cannot be larger than max_len, please adjust these!" + # compute the encoder output for each beam + encoder_outs = self.model.forward_encoder(net_input) + + # placeholder of indices for bsz * beam_size to hold tokens and accumulative scores + new_order = torch.arange(bsz).view(-1, 1).repeat(1, beam_size).view(-1) + new_order = new_order.to(src_device).long() + encoder_outs = self.model.reorder_encoder_out(encoder_outs, new_order) + # ensure encoder_outs is a List. 
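The `new_order` construction just above is how a batch of `bsz` encoder outputs is tiled to `bsz * beam_size` rows, so every beam hypothesis of a sentence attends to the same repeated encoder states. A standalone sketch of the index pattern and of applying it with `index_select`, using toy shapes:

```python
import torch

bsz, beam_size, T, C = 3, 2, 4, 5
encoder_out = torch.randn(T, bsz, C)             # T x B x C, as returned by the encoder wrapper

new_order = torch.arange(bsz).view(-1, 1).repeat(1, beam_size).view(-1)
print(new_order.tolist())                        # [0, 0, 1, 1, 2, 2]

# reorder_encoder_out boils down to an index_select over the batch dimension (dim=1 here).
tiled = encoder_out.index_select(1, new_order)
print(tiled.shape)                               # torch.Size([4, 6, 5]) -> bsz * beam_size columns

# Both beams of sentence 0 share identical encoder states.
assert torch.equal(tiled[:, 0], tiled[:, 1])
```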
+ assert encoder_outs is not None + + # initialize buffers + scores = ( + torch.zeros(bsz * beam_size, max_len + 1).to(src_device).float() + ) # +1 for eos; pad is never chosen for scoring + tokens = ( + torch.zeros(bsz * beam_size, max_len + 2) + .to(src_device) + .long() + .fill_(self.pad) + ) # +2 for eos and pad + tokens[:, 0] = self.eos if bos_token is None else bos_token + attn: Optional[Tensor] = None + + # A list that indicates candidates that should be ignored. + # For example, suppose we're sampling and have already finalized 2/5 + # samples. Then cands_to_ignore would mark 2 positions as being ignored, + # so that we only finalize the remaining 3 samples. + cands_to_ignore = ( + torch.zeros(bsz, beam_size).to(src_device).eq(-1) + ) # forward and backward-compatible False mask + + # list of completed sentences + finalized = torch.jit.annotate( + List[List[Dict[str, Tensor]]], + [torch.jit.annotate(List[Dict[str, Tensor]], []) for i in range(bsz)], + ) # contains lists of dictionaries of infomation about the hypothesis being finalized at each step + + # a boolean array indicating if the sentence at the index is finished or not + finished = [False for i in range(bsz)] + num_remaining_sent = bsz # number of sentences remaining + + # number of candidate hypos per step + cand_size = 2 * beam_size # 2 x beam size in case half are EOS + + # offset arrays for converting between different indexing schemes + bbsz_offsets = ( + (torch.arange(0, bsz) * beam_size) + .unsqueeze(1) + .type_as(tokens) + .to(src_device) + ) + cand_offsets = torch.arange(0, cand_size).type_as(tokens).to(src_device) + + reorder_state: Optional[Tensor] = None + batch_idxs: Optional[Tensor] = None + + original_batch_idxs: Optional[Tensor] = None + if "id" in sample and isinstance(sample["id"], Tensor): + original_batch_idxs = sample["id"] + else: + original_batch_idxs = torch.arange(0, bsz).type_as(tokens) + + for step in range(max_len + 1): # one extra step for EOS marker + # reorder decoder internal states based on the prev choice of beams + if reorder_state is not None: + if batch_idxs is not None: + # update beam indices to take into account removed sentences + corr = batch_idxs - torch.arange(batch_idxs.numel()).type_as( + batch_idxs + ) + reorder_state.view(-1, beam_size).add_( + corr.unsqueeze(-1) * beam_size + ) + original_batch_idxs = original_batch_idxs[batch_idxs] + self.model.reorder_incremental_state(incremental_states, reorder_state) + encoder_outs = self.model.reorder_encoder_out( + encoder_outs, reorder_state + ) + + lprobs, avg_attn_scores = self.model.forward_decoder( + tokens[:, : step + 1], + encoder_outs, + incremental_states, + self.temperature, + ) + + if self.lm_model is not None: + lm_out = self.lm_model(tokens[:, : step + 1]) + probs = self.lm_model.get_normalized_probs( + lm_out, log_probs=True, sample=None + ) + probs = probs[:, -1, :] * self.lm_weight + lprobs += probs + + lprobs[lprobs != lprobs] = torch.tensor(-math.inf).to(lprobs) + + lprobs[:, self.pad] = -math.inf # never select pad + lprobs[:, self.unk] -= self.unk_penalty # apply unk penalty + + # handle max length constraint + if step >= max_len: + lprobs[:, : self.eos] = -math.inf + lprobs[:, self.eos + 1 :] = -math.inf + + # handle prefix tokens (possibly with different lengths) + if ( + prefix_tokens is not None + and step < prefix_tokens.size(1) + and step < max_len + ): + lprobs, tokens, scores = self._prefix_tokens( + step, lprobs, scores, tokens, prefix_tokens, beam_size + ) + elif step < self.min_len: + # minimum length 
constraint (does not apply if using prefix_tokens) + lprobs[:, self.eos] = -math.inf + + # Record attention scores, only support avg_attn_scores is a Tensor + if avg_attn_scores is not None: + if attn is None: + attn = torch.empty( + bsz * beam_size, avg_attn_scores.size(1), max_len + 2 + ).to(scores) + attn[:, :, step + 1].copy_(avg_attn_scores) + + scores = scores.type_as(lprobs) + eos_bbsz_idx = torch.empty(0).to( + tokens + ) # indices of hypothesis ending with eos (finished sentences) + eos_scores = torch.empty(0).to( + scores + ) # scores of hypothesis ending with eos (finished sentences) + + if self.should_set_src_lengths: + self.search.set_src_lengths(src_lengths) + + if self.repeat_ngram_blocker is not None: + lprobs = self.repeat_ngram_blocker(tokens, lprobs, bsz, beam_size, step) + + # Shape: (batch, cand_size) + cand_scores, cand_indices, cand_beams = self.search.step( + step, + lprobs.view(bsz, -1, self.vocab_size), + scores.view(bsz, beam_size, -1)[:, :, :step], + tokens[:, : step + 1], + original_batch_idxs, + ) + + # cand_bbsz_idx contains beam indices for the top candidate + # hypotheses, with a range of values: [0, bsz*beam_size), + # and dimensions: [bsz, cand_size] + cand_bbsz_idx = cand_beams.add(bbsz_offsets) + + # finalize hypotheses that end in eos + # Shape of eos_mask: (batch size, beam size) + eos_mask = cand_indices.eq(self.eos) & cand_scores.ne(-math.inf) + eos_mask[:, :beam_size][cands_to_ignore] = torch.tensor(0).to(eos_mask) + + # only consider eos when it's among the top beam_size indices + # Now we know what beam item(s) to finish + # Shape: 1d list of absolute-numbered + eos_bbsz_idx = torch.masked_select( + cand_bbsz_idx[:, :beam_size], mask=eos_mask[:, :beam_size] + ) + + finalized_sents: List[int] = [] + if eos_bbsz_idx.numel() > 0: + eos_scores = torch.masked_select( + cand_scores[:, :beam_size], mask=eos_mask[:, :beam_size] + ) + + finalized_sents = self.finalize_hypos( + step, + eos_bbsz_idx, + eos_scores, + tokens, + scores, + finalized, + finished, + beam_size, + attn, + src_lengths, + max_len, + ) + num_remaining_sent -= len(finalized_sents) + + assert num_remaining_sent >= 0 + if num_remaining_sent == 0: + break + if self.search.stop_on_max_len and step >= max_len: + break + assert step < max_len, f"{step} < {max_len}" + + # Remove finalized sentences (ones for which {beam_size} + # finished hypotheses have been generated) from the batch. 
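+            # For example, if bsz was 4 and sentence 2 has just collected its last
+            # finished hypothesis, batch_idxs becomes [0, 1, 3] and the per-sentence
+            # tensors below (scores, tokens, attn, ...) are gathered down to
+            # new_bsz = 3 sentences, each still carrying beam_size rows.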
+ if len(finalized_sents) > 0: + new_bsz = bsz - len(finalized_sents) + + # construct batch_idxs which holds indices of batches to keep for the next pass + batch_mask = torch.ones( + bsz, dtype=torch.bool, device=cand_indices.device + ) + batch_mask[finalized_sents] = False + # TODO replace `nonzero(as_tuple=False)` after TorchScript supports it + batch_idxs = torch.arange( + bsz, device=cand_indices.device + ).masked_select(batch_mask) + + # Choose the subset of the hypothesized constraints that will continue + self.search.prune_sentences(batch_idxs) + + eos_mask = eos_mask[batch_idxs] + cand_beams = cand_beams[batch_idxs] + bbsz_offsets.resize_(new_bsz, 1) + cand_bbsz_idx = cand_beams.add(bbsz_offsets) + cand_scores = cand_scores[batch_idxs] + cand_indices = cand_indices[batch_idxs] + + if prefix_tokens is not None: + prefix_tokens = prefix_tokens[batch_idxs] + src_lengths = src_lengths[batch_idxs] + cands_to_ignore = cands_to_ignore[batch_idxs] + + scores = scores.view(bsz, -1)[batch_idxs].view(new_bsz * beam_size, -1) + tokens = tokens.view(bsz, -1)[batch_idxs].view(new_bsz * beam_size, -1) + if attn is not None: + attn = attn.view(bsz, -1)[batch_idxs].view( + new_bsz * beam_size, attn.size(1), -1 + ) + bsz = new_bsz + else: + batch_idxs = None + + # Set active_mask so that values > cand_size indicate eos hypos + # and values < cand_size indicate candidate active hypos. + # After, the min values per row are the top candidate active hypos + + # Rewrite the operator since the element wise or is not supported in torchscript. + + eos_mask[:, :beam_size] = ~((~cands_to_ignore) & (~eos_mask[:, :beam_size])) + active_mask = torch.add( + eos_mask.type_as(cand_offsets) * cand_size, + cand_offsets[: eos_mask.size(1)], + ) + + # get the top beam_size active hypotheses, which are just + # the hypos with the smallest values in active_mask. + # {active_hypos} indicates which {beam_size} hypotheses + # from the list of {2 * beam_size} candidates were + # selected. Shapes: (batch size, beam size) + new_cands_to_ignore, active_hypos = torch.topk( + active_mask, k=beam_size, dim=1, largest=False + ) + + # update cands_to_ignore to ignore any finalized hypos. + cands_to_ignore = new_cands_to_ignore.ge(cand_size)[:, :beam_size] + # Make sure there is at least one active item for each sentence in the batch. + assert (~cands_to_ignore).any(dim=1).all() + + # update cands_to_ignore to ignore any finalized hypos + + # {active_bbsz_idx} denotes which beam number is continued for each new hypothesis (a beam + # can be selected more than once). 
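+            # active_hypos indexes into the 2 * beam_size candidate columns, so the
+            # gathers below keep exactly beam_size surviving hypotheses per sentence;
+            # cand_bbsz_idx then maps each survivor back to its flat
+            # (sentence * beam_size + beam) row in tokens and scores.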
+ active_bbsz_idx = torch.gather(cand_bbsz_idx, dim=1, index=active_hypos) + active_scores = torch.gather(cand_scores, dim=1, index=active_hypos) + + active_bbsz_idx = active_bbsz_idx.view(-1) + active_scores = active_scores.view(-1) + + # copy tokens and scores for active hypotheses + + # Set the tokens for each beam (can select the same row more than once) + tokens[:, : step + 1] = torch.index_select( + tokens[:, : step + 1], dim=0, index=active_bbsz_idx + ) + # Select the next token for each of them + tokens.view(bsz, beam_size, -1)[:, :, step + 1] = torch.gather( + cand_indices, dim=1, index=active_hypos + ) + if step > 0: + scores[:, :step] = torch.index_select( + scores[:, :step], dim=0, index=active_bbsz_idx + ) + scores.view(bsz, beam_size, -1)[:, :, step] = torch.gather( + cand_scores, dim=1, index=active_hypos + ) + + # Update constraints based on which candidates were selected for the next beam + self.search.update_constraints(active_hypos) + + # copy attention for active hypotheses + if attn is not None: + attn[:, :, : step + 2] = torch.index_select( + attn[:, :, : step + 2], dim=0, index=active_bbsz_idx + ) + + # reorder incremental state in decoder + reorder_state = active_bbsz_idx + + # sort by score descending + for sent in range(len(finalized)): + scores = torch.tensor( + [float(elem["score"].item()) for elem in finalized[sent]] + ) + _, sorted_scores_indices = torch.sort(scores, descending=True) + finalized[sent] = [finalized[sent][ssi] for ssi in sorted_scores_indices] + finalized[sent] = torch.jit.annotate( + List[Dict[str, Tensor]], finalized[sent] + ) + return finalized + + def _prefix_tokens( + self, step: int, lprobs, scores, tokens, prefix_tokens, beam_size: int + ): + """Handle prefix tokens""" + prefix_toks = prefix_tokens[:, step].unsqueeze(-1).repeat(1, beam_size).view(-1) + prefix_lprobs = lprobs.gather(-1, prefix_toks.unsqueeze(-1)) + prefix_mask = prefix_toks.ne(self.pad) + lprobs[prefix_mask] = torch.tensor(-math.inf).to(lprobs) + lprobs[prefix_mask] = lprobs[prefix_mask].scatter( + -1, prefix_toks[prefix_mask].unsqueeze(-1), prefix_lprobs[prefix_mask] + ) + # if prefix includes eos, then we should make sure tokens and + # scores are the same across all beams + eos_mask = prefix_toks.eq(self.eos) + if eos_mask.any(): + # validate that the first beam matches the prefix + first_beam = tokens[eos_mask].view(-1, beam_size, tokens.size(-1))[ + :, 0, 1 : step + 1 + ] + eos_mask_batch_dim = eos_mask.view(-1, beam_size)[:, 0] + target_prefix = prefix_tokens[eos_mask_batch_dim][:, :step] + assert (first_beam == target_prefix).all() + + # copy tokens, scores and lprobs from the first beam to all beams + tokens = self.replicate_first_beam(tokens, eos_mask_batch_dim, beam_size) + scores = self.replicate_first_beam(scores, eos_mask_batch_dim, beam_size) + lprobs = self.replicate_first_beam(lprobs, eos_mask_batch_dim, beam_size) + return lprobs, tokens, scores + + def replicate_first_beam(self, tensor, mask, beam_size: int): + tensor = tensor.view(-1, beam_size, tensor.size(-1)) + tensor[mask] = tensor[mask][:, :1, :] + return tensor.view(-1, tensor.size(-1)) + + def finalize_hypos( + self, + step: int, + bbsz_idx, + eos_scores, + tokens, + scores, + finalized: List[List[Dict[str, Tensor]]], + finished: List[bool], + beam_size: int, + attn: Optional[Tensor], + src_lengths, + max_len: int, + ): + """Finalize hypothesis, store finalized information in `finalized`, and change `finished` accordingly. 
+ A sentence is finalized when {beam_size} finished items have been collected for it. + + Returns number of sentences (not beam items) being finalized. + These will be removed from the batch and not processed further. + Args: + bbsz_idx (Tensor): + """ + assert bbsz_idx.numel() == eos_scores.numel() + + # clone relevant token and attention tensors. + # tokens is (batch * beam, max_len). So the index_select + # gets the newly EOS rows, then selects cols 1..{step + 2} + tokens_clone = tokens.index_select(0, bbsz_idx)[ + :, 1 : step + 2 + ] # skip the first index, which is EOS + + tokens_clone[:, step] = self.eos + attn_clone = ( + attn.index_select(0, bbsz_idx)[:, :, 1 : step + 2] + if attn is not None + else None + ) + + # compute scores per token position + pos_scores = scores.index_select(0, bbsz_idx)[:, : step + 1] + pos_scores[:, step] = eos_scores + # convert from cumulative to per-position scores + pos_scores[:, 1:] = pos_scores[:, 1:] - pos_scores[:, :-1] + + # normalize sentence-level scores + if self.normalize_scores: + eos_scores /= (step + 1) ** self.len_penalty + + # cum_unfin records which sentences in the batch are finished. + # It helps match indexing between (a) the original sentences + # in the batch and (b) the current, possibly-reduced set of + # sentences. + cum_unfin: List[int] = [] + prev = 0 + for f in finished: + if f: + prev += 1 + else: + cum_unfin.append(prev) + + # The keys here are of the form "{sent}_{unfin_idx}", where + # "unfin_idx" is the index in the current (possibly reduced) + # list of sentences, and "sent" is the index in the original, + # unreduced batch + # set() is not supported in script export + sents_seen: Dict[str, Optional[Tensor]] = {} + + # For every finished beam item + for i in range(bbsz_idx.size()[0]): + idx = bbsz_idx[i] + score = eos_scores[i] + # sentence index in the current (possibly reduced) batch + unfin_idx = idx // beam_size + # sentence index in the original (unreduced) batch + sent = unfin_idx + cum_unfin[unfin_idx] + # Cannot create dict for key type '(int, int)' in torchscript. + # The workaround is to cast int to string + seen = str(sent.item()) + "_" + str(unfin_idx.item()) + if seen not in sents_seen: + sents_seen[seen] = None + + if self.match_source_len and step > src_lengths[unfin_idx]: + score = torch.tensor(-math.inf).to(score) + + # An input sentence (among those in a batch) is finished when + # beam_size hypotheses have been collected for it + if len(finalized[sent]) < beam_size: + if attn_clone is not None: + # remove padding tokens from attn scores + hypo_attn = attn_clone[i] + else: + hypo_attn = torch.empty(0) + + finalized[sent].append( + { + "tokens": tokens_clone[i], + "score": score, + "attention": hypo_attn, # src_len x tgt_len + "alignment": torch.empty(0), + "positional_scores": pos_scores[i], + } + ) + + newly_finished: List[int] = [] + + for seen in sents_seen.keys(): + # check termination conditions for this sentence + sent: int = int(float(seen.split("_")[0])) + unfin_idx: int = int(float(seen.split("_")[1])) + + if not finished[sent] and self.is_finished( + step, unfin_idx, max_len, len(finalized[sent]), beam_size + ): + finished[sent] = True + newly_finished.append(unfin_idx) + + return newly_finished + + def is_finished( + self, + step: int, + unfin_idx: int, + max_len: int, + finalized_sent_len: int, + beam_size: int, + ): + """ + Check whether decoding for a sentence is finished, which + occurs when the list of finalized sentences has reached the + beam size, or when we reach the maximum length. 
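+        Reaching ``max_len`` also terminates the sentence: at that step the
+        search only allows EOS to be selected (see the max-length handling in
+        ``_generate``), so every remaining hypothesis is forced to finish.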
+ """ + assert finalized_sent_len <= beam_size + if finalized_sent_len == beam_size or step == max_len: + return True + return False + + +class EnsembleModel(nn.Module): + """A wrapper around an ensemble of models.""" + + def __init__(self, models): + super().__init__() + self.models_size = len(models) + # method '__len__' is not supported in ModuleList for torch script + self.single_model = models[0] + self.models = nn.ModuleList(models) + + self.has_incremental: bool = False + if all( + hasattr(m, "decoder") and isinstance(m.decoder, FairseqIncrementalDecoder) + for m in models + ): + self.has_incremental = True + + def forward(self): + pass + + def has_encoder(self): + return hasattr(self.single_model, "encoder") + + def has_incremental_states(self): + return self.has_incremental + + def max_decoder_positions(self): + return min([m.max_decoder_positions() for m in self.models if hasattr(m, "max_decoder_positions")] + [sys.maxsize]) + + @torch.jit.export + def forward_encoder(self, net_input: Dict[str, Tensor]): + if not self.has_encoder(): + return None + return [model.encoder.forward_torchscript(net_input) for model in self.models] + + @torch.jit.export + def forward_decoder( + self, + tokens, + encoder_outs: List[Dict[str, List[Tensor]]], + incremental_states: List[Dict[str, Dict[str, Optional[Tensor]]]], + temperature: float = 1.0, + ): + log_probs = [] + avg_attn: Optional[Tensor] = None + encoder_out: Optional[Dict[str, List[Tensor]]] = None + for i, model in enumerate(self.models): + if self.has_encoder(): + encoder_out = encoder_outs[i] + # decode each model + if self.has_incremental_states(): + decoder_out = model.decoder.forward( + tokens, + encoder_out=encoder_out, + incremental_state=incremental_states[i], + ) + else: + if hasattr(model, "decoder"): + decoder_out = model.decoder.forward(tokens, encoder_out=encoder_out) + else: + decoder_out = model.forward(tokens) + + attn: Optional[Tensor] = None + decoder_len = len(decoder_out) + if decoder_len > 1 and decoder_out[1] is not None: + if isinstance(decoder_out[1], Tensor): + attn = decoder_out[1] + else: + attn_holder = decoder_out[1]["attn"] + if isinstance(attn_holder, Tensor): + attn = attn_holder + elif attn_holder is not None: + attn = attn_holder[0] + if attn is not None: + attn = attn[:, -1, :] + + decoder_out_tuple = ( + decoder_out[0][:, -1:, :].div_(temperature), + None if decoder_len <= 1 else decoder_out[1], + ) + probs = model.get_normalized_probs( + decoder_out_tuple, log_probs=True, sample=None + ) + probs = probs[:, -1, :] + if self.models_size == 1: + return probs, attn + + log_probs.append(probs) + if attn is not None: + if avg_attn is None: + avg_attn = attn + else: + avg_attn.add_(attn) + + avg_probs = torch.logsumexp(torch.stack(log_probs, dim=0), dim=0) - math.log( + self.models_size + ) + + if avg_attn is not None: + avg_attn.div_(self.models_size) + return avg_probs, avg_attn + + @torch.jit.export + def reorder_encoder_out( + self, encoder_outs: Optional[List[Dict[str, List[Tensor]]]], new_order + ): + """ + Reorder encoder output according to *new_order*. 
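+        This is called whenever the beam search reorders beams or drops
+        finished sentences, so that cached encoder states stay aligned with
+        the surviving hypotheses.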
+ + Args: + encoder_out: output from the ``forward()`` method + new_order (LongTensor): desired order + + Returns: + *encoder_out* rearranged according to *new_order* + """ + new_outs: List[Dict[str, List[Tensor]]] = [] + if not self.has_encoder(): + return new_outs + for i, model in enumerate(self.models): + assert encoder_outs is not None + new_outs.append( + model.encoder.reorder_encoder_out(encoder_outs[i], new_order) + ) + return new_outs + + @torch.jit.export + def reorder_incremental_state( + self, + incremental_states: List[Dict[str, Dict[str, Optional[Tensor]]]], + new_order, + ): + if not self.has_incremental_states(): + return + for i, model in enumerate(self.models): + model.decoder.reorder_incremental_state_scripting( + incremental_states[i], new_order + ) + + +class SequenceGeneratorWithAlignment(SequenceGenerator): + def __init__( + self, models, tgt_dict, left_pad_target=False, print_alignment="hard", **kwargs + ): + """Generates translations of a given source sentence. + + Produces alignments following "Jointly Learning to Align and + Translate with Transformer Models" (Garg et al., EMNLP 2019). + + Args: + left_pad_target (bool, optional): Whether or not the + hypothesis should be left padded or not when they are + teacher forced for generating alignments. + """ + super().__init__(EnsembleModelWithAlignment(models), tgt_dict, **kwargs) + self.left_pad_target = left_pad_target + + if print_alignment == "hard": + self.extract_alignment = utils.extract_hard_alignment + elif print_alignment == "soft": + self.extract_alignment = utils.extract_soft_alignment + + @torch.no_grad() + def generate(self, models, sample, **kwargs): + finalized = super()._generate(sample, **kwargs) + + src_tokens = sample["net_input"]["src_tokens"] + bsz = src_tokens.shape[0] + beam_size = self.beam_size + ( + src_tokens, + src_lengths, + prev_output_tokens, + tgt_tokens, + ) = self._prepare_batch_for_alignment(sample, finalized) + if any(getattr(m, "full_context_alignment", False) for m in self.model.models): + attn = self.model.forward_align(src_tokens, src_lengths, prev_output_tokens) + else: + attn = [ + finalized[i // beam_size][i % beam_size]["attention"].transpose(1, 0) + for i in range(bsz * beam_size) + ] + + if src_tokens.device != "cpu": + src_tokens = src_tokens.to("cpu") + tgt_tokens = tgt_tokens.to("cpu") + attn = [i.to("cpu") for i in attn] + + # Process the attn matrix to extract hard alignments. 
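+        # With print_alignment == "hard" each attention matrix is reduced to one
+        # argmax source position per target token; with "soft" the full attention
+        # distribution over source positions is kept instead.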
+ for i in range(bsz * beam_size): + alignment = self.extract_alignment( + attn[i], src_tokens[i], tgt_tokens[i], self.pad, self.eos + ) + finalized[i // beam_size][i % beam_size]["alignment"] = alignment + return finalized + + def _prepare_batch_for_alignment(self, sample, hypothesis): + src_tokens = sample["net_input"]["src_tokens"] + bsz = src_tokens.shape[0] + src_tokens = ( + src_tokens[:, None, :] + .expand(-1, self.beam_size, -1) + .contiguous() + .view(bsz * self.beam_size, -1) + ) + src_lengths = sample["net_input"]["src_lengths"] + src_lengths = ( + src_lengths[:, None] + .expand(-1, self.beam_size) + .contiguous() + .view(bsz * self.beam_size) + ) + prev_output_tokens = data_utils.collate_tokens( + [beam["tokens"] for example in hypothesis for beam in example], + self.pad, + self.eos, + self.left_pad_target, + move_eos_to_beginning=True, + ) + tgt_tokens = data_utils.collate_tokens( + [beam["tokens"] for example in hypothesis for beam in example], + self.pad, + self.eos, + self.left_pad_target, + move_eos_to_beginning=False, + ) + return src_tokens, src_lengths, prev_output_tokens, tgt_tokens + + +class EnsembleModelWithAlignment(EnsembleModel): + """A wrapper around an ensemble of models.""" + + def __init__(self, models): + super().__init__(models) + + def forward_align(self, src_tokens, src_lengths, prev_output_tokens): + avg_attn = None + for model in self.models: + decoder_out = model(src_tokens, src_lengths, prev_output_tokens) + attn = decoder_out[1]["attn"][0] + if avg_attn is None: + avg_attn = attn + else: + avg_attn.add_(attn) + if len(self.models) > 1: + avg_attn.div_(len(self.models)) + return avg_attn diff --git a/VATLM/vat_hubert/vathubert/tasks/vathubert_pretraining.py b/VATLM/vat_hubert/vathubert/tasks/vathubert_pretraining.py new file mode 100644 index 0000000000000000000000000000000000000000..08b81d1cb51c5f68328eb334c2ff67e7ea7e8c79 --- /dev/null +++ b/VATLM/vat_hubert/vathubert/tasks/vathubert_pretraining.py @@ -0,0 +1,863 @@ +# ---------------------------------------------------------------------------- +# VatLM: Visual-Audio-Text Pre-Training with Unified Masked Prediction for Speech Representation Learning +# Github source: https://github.com/microsoft/SpeechT5/tree/main/VATLM +# Code based on fairseq: https://github.com/facebookresearch/fairseq and av_hubert: https://github.com/facebookresearch/av_hubert +# +# Copyright (c) 2022 Microsoft +# Licensed under The MIT License [see LICENSE for details] +# ---------------------------------------------------------------------------- + +import logging +import os, glob +import sys +from typing import Dict, List, Optional, Tuple + +import numpy as np + +from dataclasses import dataclass, field +from fairseq import metrics, search +from fairseq.data import Dictionary, encoders +from fairseq.dataclass.configs import FairseqDataclass +from fairseq.tasks import register_task +from fairseq.tasks.fairseq_task import FairseqTask +from omegaconf import MISSING, II +import numpy as np +from argparse import Namespace + +DBG=True if len(sys.argv) == 1 else False + +if DBG: + from vathubert.data.vathubert_dataset import VATHubertDataset + from vathubert.sequence_generator import SequenceGenerator +else: + + from vathubert.data.vathubert_dataset import VATHubertDataset + from vathubert.sequence_generator import SequenceGenerator + from vathubert.data.audiohubert_dataset import AudioHubertDataset + from vathubert.data.texthubert_dataset import TextHubertDataset + from vathubert.data.onlyaudiohubert_dataset import 
OnlyAudioHubertDataset + +from fairseq.data.audio.multi_corpus_dataset_audio import MultiCorpusDataset +from collections import OrderedDict +from fairseq.data import FairseqDataset +from fairseq.data import data_utils +from fairseq.data import iterators + + +logger = logging.getLogger(__name__) + + +class LabelEncoder(object): + def __init__(self, dictionary: Dictionary) -> None: + self.dictionary = dictionary + + def __call__(self, label: str) -> List[str]: + return self.dictionary.encode_line( + label, append_eos=False, add_if_not_exist=False, + ) + +class LabelEncoderS2SToken(object): + def __init__(self, dictionary: Dictionary, bpe_tokenizer) -> None: + self.bpe_tokenizer = bpe_tokenizer + self.dictionary = dictionary + + def __call__(self, label: str) -> List[str]: + label = self.bpe_tokenizer.encode(label.lower()) + return self.dictionary.encode_line( + label, append_eos=True, add_if_not_exist=False, + ).long() + + def decode(self, tok, symbols_ignore=None): + tok = self.dictionary.string(tok, extra_symbols_to_ignore=symbols_ignore) + if self.bpe_tokenizer: + tok = self.bpe_tokenizer.decode(tok) + return tok + +@dataclass +class VATHubertPretrainingConfig(FairseqDataclass): + data: str = field( + default=MISSING, metadata={"help": "path to data directory"} + ) + labels: List[str] = field( + default_factory=lambda: ["ltr"], + metadata={ + "help": ( + "extension of the label files to load, frame-level labels for" + " pre-training, and sequence-level label for fine-tuning" + ) + }, + ) + label_dir: Optional[str] = field( + default=None, + metadata={ + "help": "if set, looks for labels in this directory instead", + }, + ) + label_rate: int = field( + default=-1, + metadata={"help": "label frame rate. -1 for sequence label"}, + ) + + sample_rate: int = field( + default=16_000, + metadata={ + "help": "target sample rate. audio files will be up/down " + "sampled to this rate" + }, + ) + normalize: bool = field( + default=False, + metadata={ + "help": "if set, normalizes input to have 0 mean and unit variance" + }, + ) + enable_padding: bool = field( + default=False, + metadata={"help": "pad shorter samples instead of cropping"}, + ) + max_sample_size: Optional[int] = field( + default=None, + metadata={"help": "max sample size to keep in training"}, + ) + min_sample_size: Optional[int] = field( + default=None, + metadata={"help": "min sample size to keep in training"}, + ) + max_trim_sample_size: Optional[int] = field( + default=II("task.max_sample_size"), + metadata={"help": "max sample size to trim to for batching"}, + ) + single_target: Optional[bool] = field( + default=False, + metadata={ + "help": "if set, AddTargetDatasets outputs same keys " + "as AddTargetDataset" + }, + ) + random_crop: Optional[bool] = field( + default=True, + metadata={"help": "always crop from the beginning if false"}, + ) + pad_audio: Optional[bool] = field( + default=False, + metadata={"help": "pad audio to the longest one in the batch if true"}, + ) + pdb: Optional[bool] = field( + default=False, + metadata={"help": "pdb"}, + ) + stack_order_audio: int = field( + default=1, + metadata={"help": "concatenate n consecutive audio frames for one step"}, + ) + skip_verify: Optional[bool] = field( + default=False, + metadata={"help": "skip verifying label-audio alignment"}, + ) + + text_sampling_alpha: float = field( + default=0.2, + metadata={ + "help": "Hyper-parameter alpha = 1/T for temperature-based text resampling." 
+ "(alpha = 1 for no resampling)" + }, + ) + split_modality_batch: bool = field( + default=False, + metadata={"help": "whether create all samples of different modalities in a batch"}, + ) + image_aug: bool = field(default=False, metadata={'help': 'image data augmentation'}) + image_crop_size: int = field( + default=88, metadata={"help": "image ROI size"}) + image_mean: float = field( + default=0.421, metadata={"help": "image mean"}) + image_std: float = field( + default=0.165, metadata={"help": "image std"}) + modalities: Optional[List[str]] = field(default_factory=lambda: ["audio", "video"], metadata={'help': 'modalities to load'}) + is_s2s: bool=field(default=False, metadata={'help': 'seq2seq fine-tuning only'}) + tokenizer_bpe_name: Optional[str] = field(default=None, metadata={'help': 'tokenizer model name'}) + tokenizer_bpe_model: Optional[str] = field(default=None, metadata={'help': 'tokenizer model path'}) + noise_wav: Optional[str] = field(default=None, metadata={'help': 'manifest of noise wav files (one wav file path per line)'}) + noise_prob: float = field(default=0, metadata={'help': 'noise probability'}) + noise_snr: Optional[str] = field(default='0', metadata={'help': 'noise SNR in audio'}) + noise_num: int = field(default=1, metadata={'help': 'number of noise wav files to mix'}) + fine_tuning: bool = field(default=False, metadata={"help": "set to true if fine-tuning AV-Hubert"}) + use_supervised_data: bool = field(default=True, metadata={"help": "use paired speech-text data"}) + sup_data_path: Optional[str] = field( + default=None, + metadata={ + "help": "supervised dataset path", + }, + ) + sup_manifest: Optional[str] = field( + default=None, + metadata={ + "help": "supervised dataset manifest", + }, + ) + sample_distributions: Optional[str] = field(default='0', metadata={'help': 'sample distribution'}) + ########### + use_extra_textdata: bool = field(default=True, metadata={"help": "use extra text data"}) + onlytext_manifest: Optional[str] = field( + default=None, + metadata={ + "help": "text-only dataset manifest", + }, + ) + use_extra_audiodata: bool = field(default=True, metadata={"help": "use extra audio data"}) + onlyaudio_manifest: Optional[str] = field( + default=None, + metadata={ + "help": "audio-only dataset manifest", + }, + ) + +@register_task("vat_hubert_pretraining", dataclass=VATHubertPretrainingConfig) +class VATHubertPretrainingTask(FairseqTask): + + cfg: VATHubertPretrainingConfig + + def __init__( + self, + cfg: VATHubertPretrainingConfig, + ) -> None: + super().__init__(cfg) + + logger.info(f"current directory is {os.getcwd()}") + logger.info(f"VATHubertPretrainingTask Config {cfg}") + + self.state.add_factory("phone_dictionary", self.load_phone_dictionaries) + # self.state.add_factory("s2s_tokenizer", self.load_tokenizer) + + self.fine_tuning = cfg.fine_tuning + if cfg.fine_tuning: + self.state.add_factory("target_dictionary", self.load_dictionaries) + if cfg.is_s2s: + self.state.add_factory("s2s_tokenizer", self.load_tokenizer) + else: + self.state.add_factory("dictionaries", self.load_dictionaries) + + + + self.blank_symbol = "<s>" + + @property + def source_dictionary(self) -> Optional[Dictionary]: + return None # self._source_dictionary + + @property + def target_dictionary(self) -> Optional[Dictionary]: + return self.state.target_dictionary # self._target_dictionary + + @property + def dictionaries(self) -> List[Dictionary]: + return self.state.dictionaries + + @property + def phone_dictionary(self) -> List[Dictionary]: + return 
self.state.phone_dictionary + + + def load_dictionaries(self): + label_dir = self.cfg.data if self.cfg.label_dir is None else self.cfg.label_dir + dictionaries = [ + Dictionary.load(f"{label_dir}/dict.{label}.txt") + for label in self.cfg.labels + ] + return dictionaries[0] if self.cfg.fine_tuning else dictionaries + + def load_tokenizer(self): + logger.info(f"Using tokenizer") + bpe_args = Namespace(**{'bpe': self.cfg.tokenizer_bpe_name, f"{self.cfg.tokenizer_bpe_name}_model": self.cfg.tokenizer_bpe_model}) + bpe_tokenizer = encoders.build_bpe(bpe_args) + return bpe_tokenizer + + def load_phone_dictionaries(self): + dictionaries = [ + Dictionary.load(f"{self.cfg.sup_manifest}/dict.phn.txt") + ] + return dictionaries + + + @property + def s2s_tokenizer(self): + return self.state.s2s_tokenizer + + @classmethod + def setup_task( + cls, cfg: VATHubertPretrainingConfig, **kwargs + ) -> "VATHubertPretrainingTask": + if cfg.pdb: + import pdb + pdb.set_trace() + return cls(cfg) + + def get_label_dir(self) -> str: + if self.cfg.label_dir is None: + return self.cfg.data + return self.cfg.label_dir + + def load_dataset(self, split: str, epoch=1, **kwargs) -> None: + manifest = f"{self.cfg.data}/{split}.tsv" + dictionaries = [self.target_dictionary] if self.fine_tuning else self.dictionaries + pad_list = [dictionary.pad() for dictionary in dictionaries] # [1], blank应该是[0] + eos_list = [dictionary.eos() for dictionary in dictionaries] # [2] + if not self.cfg.is_s2s: + procs = [LabelEncoder(dictionary) for dictionary in dictionaries] + else: + logger.info(f"Using tokenizer") + bpe_tokenizer = self.s2s_tokenizer + procs = [LabelEncoderS2SToken(dictionary, bpe_tokenizer) for dictionary in dictionaries] + paths = [ + f"{self.get_label_dir()}/{split}.{l}" for l in self.cfg.labels + ] + image_aug = self.cfg.image_aug if split == 'train' else False + noise_fn, noise_snr = f"{self.cfg.noise_wav}/{split}.tsv" if self.cfg.noise_wav is not None else None, eval(self.cfg.noise_snr) + noise_num = self.cfg.noise_num # + + all_datasets = [] + avdatasets = VATHubertDataset( + manifest, + sample_rate=self.cfg.sample_rate, + label_paths=paths, + label_rates=self.cfg.label_rate, + pad_list=pad_list, + eos_list=eos_list, + label_processors=procs, + max_keep_sample_size=self.cfg.max_sample_size, + min_keep_sample_size=self.cfg.min_sample_size, + max_sample_size=self.cfg.max_trim_sample_size, + pad_audio=self.cfg.pad_audio, + normalize=self.cfg.normalize, + store_labels=False, + random_crop=self.cfg.random_crop, + single_target=self.cfg.single_target, + stack_order_audio=self.cfg.stack_order_audio, + skip_verify=self.cfg.skip_verify, + image_mean=self.cfg.image_mean, + image_std=self.cfg.image_std, + image_crop_size=self.cfg.image_crop_size, + image_aug=image_aug, + modalities=self.cfg.modalities, + is_s2s=self.cfg.is_s2s, + noise_fn=noise_fn, + noise_prob=self.cfg.noise_prob, + noise_snr=noise_snr, + noise_num=noise_num + ) + all_datasets.append(avdatasets) + + # import pdb + # pdb.set_trace() + + if self.cfg.use_supervised_data: + sup_manifest = f"{self.cfg.sup_manifest}/{split}.tsv" + + sup_paths = [ + f"{self.cfg.sup_data_path}/{split}.{l}" for l in self.cfg.labels + ] + + phone_dictionaries = self.phone_dictionary + phone_procs = [LabelEncoder(dictionary) for dictionary in phone_dictionaries] + + atdatasets = AudioHubertDataset( + sup_manifest, + sample_rate=self.cfg.sample_rate, + label_paths=sup_paths, + label_rates=self.cfg.label_rate, + pad_list=pad_list, + eos_list=eos_list, + label_processors=procs, + 
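+                # phone_procs encode the phoneme label lines of the paired
+                # speech-text data with the supervised phone dictionary
+                # (dict.phn.txt), mirroring how `procs` handles the regular
+                # label streams.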
phone_sequence_processors=phone_procs, + max_keep_sample_size=self.cfg.max_sample_size, + min_keep_sample_size=self.cfg.min_sample_size, + max_sample_size=self.cfg.max_trim_sample_size, + pad_audio=self.cfg.pad_audio, + normalize=self.cfg.normalize, + store_labels=True, + single_target=self.cfg.single_target, + stack_order_audio=self.cfg.stack_order_audio, + skip_verify=self.cfg.skip_verify, + is_s2s=self.cfg.is_s2s, + ) + all_datasets.append(atdatasets) + + if self.cfg.use_extra_textdata: + extra_text_manifest = f"{self.cfg.onlytext_manifest}/{split}.tsv" + extra_text_paths = [ + f"{self.cfg.onlytext_manifest}/{split}.{l}" for l in self.cfg.labels + ] + + # import pdb + # pdb.set_trace() + + textdatasets = TextHubertDataset( + extra_text_manifest, + sample_rate=self.cfg.sample_rate, + label_paths=extra_text_paths, + label_rates=self.cfg.label_rate, + pad_list=pad_list, + eos_list=eos_list, + label_processors=procs, + phone_sequence_processors=phone_procs, + max_keep_sample_size=self.cfg.max_sample_size, + min_keep_sample_size=self.cfg.min_sample_size, + max_sample_size=self.cfg.max_trim_sample_size, + pad_audio=self.cfg.pad_audio, + normalize=self.cfg.normalize, + store_labels=True, + single_target=self.cfg.single_target, + stack_order_audio=self.cfg.stack_order_audio, + skip_verify=self.cfg.skip_verify, + is_s2s=self.cfg.is_s2s, + ) + all_datasets.append(textdatasets) + + if self.cfg.use_extra_audiodata: + extra_audio_manifest = f"{self.cfg.onlyaudio_manifest}/{split}.tsv" + extra_audio_paths = [ + f"{self.cfg.onlyaudio_manifest}/{split}.{l}" for l in self.cfg.labels + ] + + audiodatasets = OnlyAudioHubertDataset( + extra_audio_manifest, + sample_rate=self.cfg.sample_rate, + label_paths=extra_audio_paths, + label_rates=self.cfg.label_rate, + pad_list=pad_list, + eos_list=eos_list, + label_processors=procs, + max_keep_sample_size=self.cfg.max_sample_size, + min_keep_sample_size=self.cfg.min_sample_size, + max_sample_size=self.cfg.max_trim_sample_size, + pad_audio=self.cfg.pad_audio, + normalize=self.cfg.normalize, + store_labels=False, + single_target=self.cfg.single_target, + stack_order_audio=self.cfg.stack_order_audio, + skip_verify=self.cfg.skip_verify, + is_s2s=self.cfg.is_s2s, + ) + all_datasets.append(audiodatasets) + + + + + dataset_list = all_datasets + dataset_dict = OrderedDict((name, d) for name, d in zip(["videoaudio", "audiotext", "onlytext", "onlyaudio"], dataset_list) if d is not None) + if not self.fine_tuning: + max_positions_dict = { + "videoaudio": 1024, + "audiotext": 1024, + "onlytext": 1024, + "onlyaudio": 1024, + } + max_positions_dict = OrderedDict((name, max_positions_dict[name]) for name in dataset_dict.keys()) + + max_tokens_ratios_dict = { + "videoaudio": 1.0, + "audiotext": 1.0, + "onlytext": 1.0, + "onlyaudio": 1.0, + } + max_tokens_ratios = [max_tokens_ratios_dict[name] for name in dataset_dict.keys()] + dataset_lens = np.array([len(dataset) for dataset in dataset_dict.values()]) + dataset_avg_sample_lens = np.array([ + sum([dataset.num_tokens(i) for i in np.random.randint(low=0, high=len(dataset), size=10000)]) / 10000.0 + for dataset in dataset_dict.values() + ]) + distributions = [eval(self.cfg.sample_distributions)[0], eval(self.cfg.sample_distributions)[1], eval(self.cfg.sample_distributions)[2], eval(self.cfg.sample_distributions)[3]] + + + + logging.info(f"Number samples of datasets is {dataset_lens}") + logging.info(f"Avg sample length of datasets is {dataset_avg_sample_lens}") + logging.info(f"Sampling distributions is {distributions}") + 
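+            # `sample_distributions` is parsed (via eval) into a 4-element list that
+            # gives the sampling probability of the videoaudio / audiotext /
+            # onlytext / onlyaudio corpora inside MultiCorpusDataset,
+            # e.g. "[0.5, 0.2, 0.2, 0.1]".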
logging.info(f"Maxtokens ratio is {max_tokens_ratios}") + logging.info(f"split_modality_batch is {self.cfg.split_modality_batch}") + + + self.datasets[split] = MultiCorpusDataset( + dataset_dict, + max_positions=max_positions_dict, + distribution=distributions, + max_tokens_ratio=max_tokens_ratios, + seed=1234, + sort_indices=True, + ) + + if self.fine_tuning: + self.datasets[split] = VATHubertDataset( + manifest, + sample_rate=self.cfg.sample_rate, + label_paths=paths, + label_rates=self.cfg.label_rate, + pad_list=pad_list, + eos_list=eos_list, + label_processors=procs, + max_keep_sample_size=self.cfg.max_sample_size, + min_keep_sample_size=self.cfg.min_sample_size, + max_sample_size=self.cfg.max_trim_sample_size, + pad_audio=self.cfg.pad_audio, + normalize=self.cfg.normalize, + store_labels=False, + random_crop=self.cfg.random_crop, + single_target=self.cfg.single_target, + stack_order_audio=self.cfg.stack_order_audio, + skip_verify=self.cfg.skip_verify, + image_mean=self.cfg.image_mean, + image_std=self.cfg.image_std, + image_crop_size=self.cfg.image_crop_size, + image_aug=image_aug, + modalities=self.cfg.modalities, + is_s2s=self.cfg.is_s2s, + noise_fn=noise_fn, + noise_prob=self.cfg.noise_prob, + noise_snr=noise_snr, + noise_num=noise_num + ) + + # @classmethod + # def _get_size_ratios(cls, ids: List[str], sizes: List[int], alpha: float = 1.0): + # """Size ratios for temperature-based sampling + # (https://arxiv.org/abs/1907.05019)""" + # _sizes = np.array(sizes) + # prob = _sizes / _sizes.sum() + # smoothed_prob = prob ** alpha + # smoothed_prob = smoothed_prob / smoothed_prob.sum() + # size_ratio = (smoothed_prob * _sizes.sum()) / _sizes + + # o_str = str({_i: f"{prob[i]:.3f}" for i, _i in enumerate(ids)}) + # logger.info(f"original sampling probability: {o_str}") + # p_str = str({_i: f"{smoothed_prob[i]:.3f}" for i, _i in enumerate(ids)}) + # logger.info(f"balanced sampling probability: {p_str}") + # sr_str = str({_id: f"{size_ratio[i]:.3f}" for i, _id in enumerate(ids)}) + # logger.info(f"balanced sampling size ratio: {sr_str}") + # return size_ratio.tolist() + + + # def resample_multi_modality_dataset(self, speech_dataset, paired_datasets, epoch=1, train=True): + + # if len(paired_datasets) > 1 and self.cfg.text_sampling_alpha != 1.0: + # size_ratios = self._get_size_ratios( + # paired_splits, [len(s) for s in paired_datasets], alpha=self.cfg.text_sampling_alpha + # ) + # paired_datasets = [ + # ResamplingDataset( + # d, size_ratio=r, seed=0, epoch=epoch, replace=(r >= 1.0) + # ) for d, r in zip(paired_datasets, size_ratios) + # ] + + # dataset_list = [speech_dataset] + # for datasets in [paired_datasets]: + # if len(datasets) > 1: + # dataset_list.append(ConcatDataset(datasets)) + # elif len(datasets) == 1: + # dataset_list.append(datasets[0]) + # else: + # dataset_list.append(None) + + # ### match speech/text datasets according to modality + # dataset_dict = OrderedDict((name, d) for name, d in zip(["speech", "speech_sup", "text_mono", "text_paired"], dataset_list) if d is not None) + # max_positions_dict = { + # "speech": None, + # "speech_sup": None, + # "text_mono": (1024, 1024), + # "text_paired": (1024, 1024), + # } + # max_positions_dict = OrderedDict((name, max_positions_dict[name]) for name in dataset_dict.keys()) + # max_tokens_ratios_dict = { + # "speech": 1.0, + # "speech_sup": 1.0, + # "text_mono": 1.0 / 320 / 1.0, + # "text_paired": 1.0 / 320 / 1.0, + # } + # max_tokens_ratios = [max_tokens_ratios_dict[name] for name in dataset_dict.keys()] + # dataset_lens = 
np.array([len(dataset) for dataset in dataset_dict.values()]) + # dataset_avg_sample_lens = np.array([ + # sum([dataset.num_tokens(i) for i in np.random.randint(low=0, high=len(dataset), size=10000)]) / 10000.0 + # for dataset in dataset_dict.values() + # ]) + + # if not "speech" in dataset_dict: + # distributions = [l / sum(dataset_lens) for l in dataset_lens] + # else: + # ## we just keep the batches of speech and non-speech the same, expand_coef is to ensure speech batches is less than others + # first_ratio = dataset_lens[0] / sum(dataset_lens) + # expand_coef = 1.8 if sup_dataset is None else 1.1 * sum(dataset_lens[0:2]) / dataset_lens[0] + # distributions = [expand_coef * max_tokens_ratios[i] * dataset_avg_sample_lens[0] / l for (i, l) in enumerate(dataset_avg_sample_lens)] + # distributions[0] = 1.0 + # if sup_dataset is not None: + # distributions[1] = dataset_lens[1] / dataset_lens[0] + # distributions = [first_ratio * d for d in distributions] + + # logging.info(f"Number samples of datasets is {dataset_lens}") + # logging.info(f"Avg sample length of datasets is {dataset_avg_sample_lens}") + # logging.info(f"Sampling distributions is {distributions}") + # logging.info(f"Maxtokens ratio is {max_tokens_ratios}") + # return dataset_dict, max_positions_dict, distributions, max_tokens_ratios + + + def max_positions(self) -> Tuple[int, int]: + return (sys.maxsize, sys.maxsize) + + def filter_indices_by_size( + self, indices: np.array, *args, **kwargs + ) -> np.array: + return indices + + def get_batch_iterator( + self, + dataset, + max_tokens=None, + max_sentences=None, + max_positions=None, + ignore_invalid_inputs=False, + required_batch_size_multiple=1, + seed=1, + num_shards=1, + shard_id=0, + num_workers=0, + epoch=1, + data_buffer_size=0, + disable_iterator_cache=False, + skip_remainder_batch=False, + grouped_shuffling=False, + update_epoch_batch_itr=False, + ): + """ + Get an iterator that yields batches of data from the given dataset. + Args: + dataset (~fairseq.data.FairseqDataset): dataset to batch + max_tokens (int, optional): max number of tokens in each batch + (default: None). + max_sentences (int, optional): max number of sentences in each + batch (default: None). + max_positions (optional): max sentence length supported by the + model (default: None). + ignore_invalid_inputs (bool, optional): don't raise Exception for + sentences that are too long (default: False). + required_batch_size_multiple (int, optional): require batch size to + be a multiple of N (default: 1). + seed (int, optional): seed for random number generator for + reproducibility (default: 1). + num_shards (int, optional): shard the data iterator into N + shards (default: 1). + shard_id (int, optional): which shard of the data iterator to + return (default: 0). + num_workers (int, optional): how many subprocesses to use for data + loading. 0 means the data will be loaded in the main process + (default: 0). + epoch (int, optional): the epoch to start the iterator from + (default: 1). + data_buffer_size (int, optional): number of batches to + preload (default: 0). + disable_iterator_cache (bool, optional): don't cache the + EpochBatchIterator (ignores `FairseqTask::can_reuse_epoch_itr`) + (default: False). + skip_remainder_batch (bool, optional): if set, discard the last + batch in each training epoch, as the last batch is often smaller than + local_batch_size * distributed_word_size (default: ``True``). 
+ grouped_shuffling (bool, optional): group batches with each groups + containing num_shards batches and shuffle groups. Reduces difference + between sequence lengths among workers for batches sorted by length. + update_epoch_batch_itr (bool optional): if true then donot use the cached + batch iterator for the epoch + + Returns: + ~fairseq.iterators.EpochBatchIterator: a batched iterator over the + given dataset split + """ + + if self.fine_tuning or not isinstance(dataset, MultiCorpusDataset): + return super().get_batch_iterator( + dataset, + max_tokens=max_tokens, + max_sentences=max_sentences, + max_positions=max_positions, + ignore_invalid_inputs=ignore_invalid_inputs, + required_batch_size_multiple=required_batch_size_multiple, + seed=seed, + num_shards=num_shards, + shard_id=shard_id, + num_workers=num_workers, + epoch=epoch, + data_buffer_size=data_buffer_size, + disable_iterator_cache=disable_iterator_cache, + ) + logging.info(f"num_workers is {num_workers}") + can_reuse_epoch_itr = ( + not disable_iterator_cache + and not update_epoch_batch_itr + and self.can_reuse_epoch_itr(dataset) + ) + if can_reuse_epoch_itr and dataset in self.dataset_to_epoch_iter: + logger.debug("reusing EpochBatchIterator for epoch {}".format(epoch)) + return self.dataset_to_epoch_iter[dataset] + + assert isinstance(dataset, FairseqDataset) + + # initialize the dataset with the correct starting epoch + dataset.set_epoch(epoch) + + # get indices ordered by example size + with data_utils.numpy_seed(seed): + indices = dataset.ordered_indices() + + # filter examples that are too large + if max_positions is not None: + indices = self.filter_indices_by_size( + indices, dataset, max_positions, ignore_invalid_inputs + ) + + # create mini-batches with given size constraints + batch_sampler = dataset.get_batch_sampler( + indices, + num_shards, + seed, + max_tokens=max_tokens, + max_sentences=max_sentences, + required_batch_size_multiple=required_batch_size_multiple, + split_modality_batch=self.cfg.split_modality_batch, + ) + + # return a reusable, sharded iterator + + epoch_iter = iterators.EpochBatchIterator( + dataset=dataset, + collate_fn=dataset.collater, + batch_sampler=batch_sampler, + seed=seed, + num_shards=num_shards, + shard_id=shard_id, + num_workers=num_workers, + epoch=epoch, + buffer_size=data_buffer_size, + disable_shuffling=True, + ) + + if can_reuse_epoch_itr: + self.dataset_to_epoch_iter[dataset] = epoch_iter + + return epoch_iter + + + def build_generator( + self, models, args, seq_gen_cls=None, extra_gen_cls_kwargs=None, prefix_allowed_tokens_fn=None, + ): + """ + Build a :class:`~fairseq.SequenceGenerator` instance for this + task. + Args: + models (List[~fairseq.models.FairseqModel]): ensemble of models + args (fairseq.dataclass.configs.GenerationConfig): + configuration object (dataclass) for generation + extra_gen_cls_kwargs (Dict[str, Any]): extra options to pass + through to SequenceGenerator + prefix_allowed_tokens_fn (Callable[[int, torch.Tensor], List[int]]): + If provided, this function constrains the beam search to + allowed tokens only at each step. The provided function + should take 2 arguments: the batch ID (`batch_id: int`) + and a unidimensional tensor of token ids (`inputs_ids: + torch.Tensor`). It has to return a `List[int]` with the + allowed tokens for the next generation step conditioned + on the previously generated tokens (`inputs_ids`) and + the batch ID (`batch_id`). 
This argument is useful for + constrained generation conditioned on the prefix, as + described in "Autoregressive Entity Retrieval" + (https://arxiv.org/abs/2010.00904) and + https://github.com/facebookresearch/GENRE. + """ + if getattr(args, "score_reference", False): + from fairseq.sequence_scorer import SequenceScorer + + return SequenceScorer( + self.target_dictionary, + compute_alignment=getattr(args, "print_alignment", False), + ) + + # Choose search strategy. Defaults to Beam Search. + sampling = getattr(args, "sampling", False) + sampling_topk = getattr(args, "sampling_topk", -1) + sampling_topp = getattr(args, "sampling_topp", -1.0) + diverse_beam_groups = getattr(args, "diverse_beam_groups", -1) + diverse_beam_strength = getattr(args, "diverse_beam_strength", 0.5) + match_source_len = getattr(args, "match_source_len", False) + diversity_rate = getattr(args, "diversity_rate", -1) + constrained = getattr(args, "constraints", False) + if prefix_allowed_tokens_fn is None: + prefix_allowed_tokens_fn = getattr(args, "prefix_allowed_tokens_fn", None) + if ( + sum( + int(cond) + for cond in [ + sampling, + diverse_beam_groups > 0, + match_source_len, + diversity_rate > 0, + ] + ) + > 1 + ): + raise ValueError("Provided Search parameters are mutually exclusive.") + assert sampling_topk < 0 or sampling, "--sampling-topk requires --sampling" + assert sampling_topp < 0 or sampling, "--sampling-topp requires --sampling" + + if sampling: + search_strategy = search.Sampling( + self.target_dictionary, sampling_topk, sampling_topp + ) + elif diverse_beam_groups > 0: + search_strategy = search.DiverseBeamSearch( + self.target_dictionary, diverse_beam_groups, diverse_beam_strength + ) + elif match_source_len: + # this is useful for tagging applications where the output + # length should match the input length, so we hardcode the + # length constraints for simplicity + search_strategy = search.LengthConstrainedBeamSearch( + self.target_dictionary, + min_len_a=1, + min_len_b=0, + max_len_a=1, + max_len_b=0, + ) + elif diversity_rate > -1: + search_strategy = search.DiverseSiblingsSearch( + self.target_dictionary, diversity_rate + ) + elif constrained: + search_strategy = search.LexicallyConstrainedBeamSearch( + self.target_dictionary, args.constraints + ) + elif prefix_allowed_tokens_fn: + search_strategy = search.PrefixConstrainedBeamSearch( + self.target_dictionary, prefix_allowed_tokens_fn + ) + else: + search_strategy = search.BeamSearch(self.target_dictionary) + + extra_gen_cls_kwargs = extra_gen_cls_kwargs or {} + if seq_gen_cls is None: + if getattr(args, "print_alignment", False): + seq_gen_cls = SequenceGeneratorWithAlignment + extra_gen_cls_kwargs["print_alignment"] = args.print_alignment + else: + seq_gen_cls = SequenceGenerator + + return seq_gen_cls( + models, + self.target_dictionary, + beam_size=getattr(args, "beam", 5), + max_len_a=getattr(args, "max_len_a", 0), + max_len_b=getattr(args, "max_len_b", 200), + min_len=getattr(args, "min_len", 1), + normalize_scores=(not getattr(args, "unnormalized", False)), + len_penalty=getattr(args, "lenpen", 1), + unk_penalty=getattr(args, "unkpen", 0), + temperature=getattr(args, "temperature", 1.0), + match_source_len=getattr(args, "match_source_len", False), + no_repeat_ngram_size=getattr(args, "no_repeat_ngram_size", 0), + search_strategy=search_strategy, + **extra_gen_cls_kwargs, + ) diff --git a/VATLM/vat_hubert/vathubert/utils.py b/VATLM/vat_hubert/vathubert/utils.py new file mode 100644 index 
0000000000000000000000000000000000000000..60d57fa006adbb9839e1c3501b3442917bb0df3e --- /dev/null +++ b/VATLM/vat_hubert/vathubert/utils.py @@ -0,0 +1,298 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# All rights reserved. +# +# This source code is licensed under the license found in the +# LICENSE file in the root directory of this source tree. + +import cv2 +import torch +import random +import numpy as np +from typing import Dict, List, Optional, Tuple + +def load_video(path): + for i in range(3): + try: + cap = cv2.VideoCapture(path) + frames = [] + while True: + ret, frame = cap.read() + if ret: + frame = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY) + frames.append(frame) + else: + break + frames = np.stack(frames) + return frames + except Exception: + print(f"failed loading {path} ({i} / 3)") + if i == 2: + raise ValueError(f"Unable to load {path}") + + +class Compose(object): + """Compose several preprocess together. + Args: + preprocess (list of ``Preprocess`` objects): list of preprocess to compose. + """ + + def __init__(self, preprocess): + self.preprocess = preprocess + + def __call__(self, sample): + for t in self.preprocess: + sample = t(sample) + return sample + + def __repr__(self): + format_string = self.__class__.__name__ + '(' + for t in self.preprocess: + format_string += '\n' + format_string += ' {0}'.format(t) + format_string += '\n)' + return format_string + + +class Normalize(object): + """Normalize a ndarray image with mean and standard deviation. + """ + + def __init__(self, mean, std): + self.mean = mean + self.std = std + + def __call__(self, frames): + """ + Args: + tensor (Tensor): Tensor image of size (C, H, W) to be normalized. + Returns: + Tensor: Normalized Tensor image. + """ + frames = (frames - self.mean) / self.std + return frames + + def __repr__(self): + return self.__class__.__name__+'(mean={0}, std={1})'.format(self.mean, self.std) + +class CenterCrop(object): + """Crop the given image at the center + """ + def __init__(self, size): + self.size = size + + def __call__(self, frames): + """ + Args: + img (numpy.ndarray): Images to be cropped. + Returns: + numpy.ndarray: Cropped image. + """ + t, h, w = frames.shape + th, tw = self.size + delta_w = int(round((w - tw))/2.) + delta_h = int(round((h - th))/2.) + frames = frames[:, delta_h:delta_h+th, delta_w:delta_w+tw] + return frames + + +class RandomCrop(object): + """Crop the given image at the center + """ + + def __init__(self, size): + self.size = size + + def __call__(self, frames): + """ + Args: + img (numpy.ndarray): Images to be cropped. + Returns: + numpy.ndarray: Cropped image. + """ + t, h, w = frames.shape + th, tw = self.size + delta_w = random.randint(0, w-tw) + delta_h = random.randint(0, h-th) + frames = frames[:, delta_h:delta_h+th, delta_w:delta_w+tw] + return frames + + def __repr__(self): + return self.__class__.__name__ + '(size={0})'.format(self.size) + +class HorizontalFlip(object): + """Flip image horizontally. + """ + + def __init__(self, flip_ratio): + self.flip_ratio = flip_ratio + + def __call__(self, frames): + """ + Args: + img (numpy.ndarray): Images to be flipped with a probability flip_ratio + Returns: + numpy.ndarray: Cropped image. 
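+        All frames in the clip are flipped together with probability
+        ``flip_ratio``; otherwise the input is returned unchanged.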
+ """ + t, h, w = frames.shape + if random.random() < self.flip_ratio: + for index in range(t): + frames[index] = cv2.flip(frames[index], 1) + return frames + +def compute_mask_indices( + shape: Tuple[int, int], + padding_mask: Optional[torch.Tensor], + mask_prob: float, + mask_length: int, + mask_type: str = "static", + mask_other: float = 0.0, + min_masks: int = 0, + no_overlap: bool = False, + min_space: int = 0, +) -> np.ndarray: + """ + Computes random mask spans for a given shape + Args: + shape: the the shape for which to compute masks. + should be of size 2 where first element is batch size and 2nd is timesteps + padding_mask: optional padding mask of the same size as shape, which will prevent masking padded elements + mask_prob: probability for each token to be chosen as start of the span to be masked. this will be multiplied by + number of timesteps divided by length of mask span to mask approximately this percentage of all elements. + however due to overlaps, the actual number will be smaller (unless no_overlap is True) + mask_type: how to compute mask lengths + static = fixed size + uniform = sample from uniform distribution [mask_other, mask_length*2] + normal = sample from normal distribution with mean mask_length and stdev mask_other. mask is min 1 element + poisson = sample from possion distribution with lambda = mask length + min_masks: minimum number of masked spans + no_overlap: if false, will switch to an alternative recursive algorithm that prevents spans from overlapping + min_space: only used if no_overlap is True, this is how many elements to keep unmasked between spans + """ + + bsz, all_sz = shape + mask = np.full((bsz, all_sz), False) + + all_num_mask = int( + # add a random number for probabilistic rounding + mask_prob * all_sz / float(mask_length) + + np.random.rand() + ) + + all_num_mask = max(min_masks, all_num_mask) + + mask_idcs = [] + for i in range(bsz): + if padding_mask is not None: + sz = all_sz - padding_mask[i].long().sum().item() + num_mask = int( + # add a random number for probabilistic rounding + mask_prob * sz / float(mask_length) + + np.random.rand() + ) + num_mask = max(min_masks, num_mask) + else: + sz = all_sz + num_mask = all_num_mask + + if mask_type == "static": + lengths = np.full(num_mask, mask_length) + elif mask_type == "uniform": + lengths = np.random.randint(mask_other, mask_length * 2 + 1, size=num_mask) + elif mask_type == "normal": + lengths = np.random.normal(mask_length, mask_other, size=num_mask) + lengths = [max(1, int(round(x))) for x in lengths] + elif mask_type == "poisson": + lengths = np.random.poisson(mask_length, size=num_mask) + lengths = [int(round(x)) for x in lengths] + else: + raise Exception("unknown mask selection " + mask_type) + + if sum(lengths) == 0: + lengths[0] = min(mask_length, sz - 1) + + if no_overlap: + mask_idc = [] + + def arrange(s, e, length, keep_length): + span_start = np.random.randint(s, e - length) + mask_idc.extend(span_start + i for i in range(length)) + + new_parts = [] + if span_start - s - min_space >= keep_length: + new_parts.append((s, span_start - min_space + 1)) + if e - span_start - keep_length - min_space > keep_length: + new_parts.append((span_start + length + min_space, e)) + return new_parts + + parts = [(0, sz)] + min_length = min(lengths) + for length in sorted(lengths, reverse=True): + lens = np.fromiter( + (e - s if e - s >= length + min_space else 0 for s, e in parts), + np.int, + ) + l_sum = np.sum(lens) + if l_sum == 0: + break + probs = lens / np.sum(lens) + c = 
np.random.choice(len(parts), p=probs) + s, e = parts.pop(c) + parts.extend(arrange(s, e, length, min_length)) + mask_idc = np.asarray(mask_idc) + else: + min_len = min(lengths) + if sz - min_len <= num_mask: + min_len = sz - num_mask - 1 + + mask_idc = np.random.choice(sz - min_len, num_mask, replace=False) + + mask_idc = np.asarray( + [ + mask_idc[j] + offset + for j in range(len(mask_idc)) + for offset in range(lengths[j]) + ] + ) + + mask_idcs.append(np.unique(mask_idc[mask_idc < sz])) + + min_len = min([len(m) for m in mask_idcs]) + batch_indexes, starts, ends = [], [], [] + for i, mask_idc in enumerate(mask_idcs): + if len(mask_idc) > min_len: + mask_idc = np.random.choice(mask_idc, min_len, replace=False) + mask[i, mask_idc] = True + vals, run_starts, run_lengths = find_runs(mask[i]) + start_indices, lengths = run_starts[vals == True], run_lengths[vals == True] + starts.append(start_indices) + ends.append(start_indices+lengths) + batch_indexes.append(np.zeros([len(start_indices)])+i) + return mask, np.concatenate(starts).astype(np.int64), np.concatenate(ends).astype(np.int64), np.concatenate(batch_indexes).astype(np.int64) + +def find_runs(x): + """Find runs of consecutive items in an array.""" + + # ensure array + x = np.asanyarray(x) + if x.ndim != 1: + raise ValueError('only 1D array supported') + n = x.shape[0] + + # handle empty array + if n == 0: + return np.array([]), np.array([]), np.array([]) + + else: + # find run starts + loc_run_start = np.empty(n, dtype=bool) + loc_run_start[0] = True + np.not_equal(x[:-1], x[1:], out=loc_run_start[1:]) + run_starts = np.nonzero(loc_run_start)[0] + + # find run values + run_values = x[loc_run_start] + + # find run lengths + run_lengths = np.diff(np.append(run_starts, n)) + + return run_values, run_starts, run_lengths diff --git a/WavLLM/README.md b/WavLLM/README.md new file mode 100644 index 0000000000000000000000000000000000000000..23d5221b997d6a28596ee0652dbaa10d44365f70 --- /dev/null +++ b/WavLLM/README.md @@ -0,0 +1,80 @@ +# WavLLM + +<!--**Pre-trained models for speech related tasks**--> + + [**WavLLM: Towards Robust and Adaptive Speech Large Language Model**](https://arxiv.org/abs/2404.00656) + + +- April 2024: release the code and models +- April 2024: release preprint in [arXiv](https://arxiv.org/abs/2404.00656) + +## Model +to get the WavLLM model, run +```bash +bash ./download/download.sh 0 +``` + +## Gaokao (SQA) +The audio samples and transcripts can be download using +```bash +bash ./download/download.sh 1 +``` + +The tsv file path of SQA task is [`tsv_path`](wavllm/test_data/gaokao.tsv). + +## Setup + +```bash +git submodule update --init WavLLM/fairseq +cd WavLLM/ +conda create -n wavllm python=3.10.0 +conda activate wavllm +pip install --editable fairseq/ +pip install sentencepiece +pip install transformers==4.32.1 +pip install numpy==1.23.5 +pip install editdistance +pip install soundfile +``` + +## Inference +```bash +cp -r wavllm fairseq/examples +cd fairseq +bash examples/wavllm/scripts/inference_sft.sh $model_path $data_name +``` +We provided examples of each task in [`test_data`](wavllm/test_data) + +## Examples +| Task | Audio | Prompt | Target | Output | +| :-----: | :-----: | :-----: | :-----: | :-----: | +| ASR | [`audio`](wavllm/test_data/audio/asr.flac) | Based on the attached audio, generate a comprehensive text transcription of the spoken content. 
| he hoped there would be stew for dinner turnips and carrots and bruised potatoes and fat mutton pieces to be ladled out in thick peppered flour fattened sauce | he hoped there would be stew for dinner turnips and carrots and browned potatoes and fat mutton pieces to be ladled out in thick peppered flour fattened sauce | +| SV | [`audio`](wavllm/test_data/audio/sv.wav) | Is there only one speaker in the audio clip? | Yes | Yes, there is only one speaker in the audio clip | +| ST | [`audio`](wavllm/test_data/audio/st.flac) | Translate the audio clip into German. | Sie wird schon in Ordnung sein. | Sie wird es wohl überleben. | +| ER | [`audio`](wavllm/test_data/audio/emo.wav) | Can you describe the emotional condition of the speaker in the provided audio clip? | sad | sad | +| SQA | [`audio`](wavllm/test_data/audio/sqa.wav) | What will the man do next? A. Start to take exercise; B. Do as he always does; C. Change his working time. | A | A | +| SQQA | [`audio`](wavllm/test_data/audio/sqqa.wav) | - | The fundamental theorem of calculus is a theorem that links the concept of the derivative of a function with the concept of the integral. | The fundamental theorem of calculus states that different rates of change of a function at a point in the domain of the function are equal. This theorem is used to determine the accumulation of a quantity over a given interval, and to establish relationships between different variables. It is also used to calculate derivatives, which are important in many areas such as optimization, physics, and economics. | +| II-task | [`audio`](wavllm/test_data/audio/II-task.wav) | To begin, Transcribe the audio recording into text, capturing every spoken word; Subsequently, How does the woman finally decide to go home? A. By bus; B. In the man’s car; C. In her father’s car.; Furthermore, ignore the audio clip, What is the capital of New Zealand?; Lastly, Continue the narrative of given audio clip in a coherent and engaging way | - | Here's the text from the audio recording: o no i missed the bus home it's already half past ten now i don't think there would be another bus to come don't worry i can take you home thanks but we go in different directions i'll just call my dad to pick me up\\nAddressing the prompt 'How does the woman finally decide to go home? A. By bus; B. In the man’s car; C. In her father’s car.', the answer is: C\\nThe response to 'What is the capital of New Zealand? is: The capital of New Zealand is Wellington.\\nContinuing the story, we have: The woman looked at the man with gratitude and relief. She had been worried about how she would get home after missing the bus. The man smiled and said, “It’s no problem, I can take you home.” The woman thanked him and they made their way out of the building. As they walked, the woman realized that they were going in different directions. She thanked the man again and promised to call her dad to pick her up. As they parted ways, the man wished her a good night and she walked towards her dad’s house, feeling grateful for the kindness of a stranger. | +| CoT-task | [`audio`](wavllm/test_data/audio/CoT-task.wav) | First of all, transcribe the audio recording into text, capturing every spoken word; Additionally given this audio clip and text, can you condense it into a clear, concise summary, no more than 20 words?; Lastly disregarding the sound, translate this English summary into German. 
| Drei Filme aus dem asiatisch-pazifischen Raum im Rennen in Cannes | The audio transcription is: three films from asia pacific are in the running for the coveted golden palms at this year's cannes film festival competing in a field dominated by european productions organizers announced monday; The audio clip have been summarized as follows: Three films from Asia-Pacific are competing for Golden Palms at this year's Cannes Film Festival, challenging European productions.; The translation of summary is: **Drei Filme aus Asien-Pazifik nehmen an diesem Jahr an den Filmfestspielen von Cannes teil und konkurrieren mit europäischen Produktionen um die Goldenen Palmen.** | +| CoT-task-story | [`audio`](wavllm/test_data/audio/CoT-task-story.wav) | First of all, transcribe the audio recording into text, capturing every spoken word; Additionally given this audio clip and text, can you condense it into a clear, concise summary, no more than 20 words?; Lastly disregarding the sound, translate this English summary into German. | Bis zum Jahr 2500 ist die Erde eine umweltfreundliche Utopie mit fortschrittlicher KI, neuronaler Vernetzung und einer perfekten Mischung aus Technologie und Natur. | The audio clip's transcription reads: in the year twenty five hundred earth gleamed like a sapphire a futuristic utopia where harmony reigned skyscrapers draped in lush greenery stretched towards the heavens their glass surfaces reflecting the tranquil azure of a pollution free sky humanity had transcended past conflicts embracing an era of shared consciousness through neural connectivity autonomous vehicles glided silently on solar paths while people mingled in serene communal spaces their basic needs met by advanced ai that predicted and catered to their every whim the great reconciliation had merged technology with nature and in this new world every individual thrived their potential limited only by the expanse of their own creativity the utopia wasn't just a place it was the pulse of civilization beating in perfect rhythm with the universe; The summary of the audio clip is: In 2500, Earth became a futuristic utopia with harmonious, pollution-free society, advanced technology, and limitless individual potential.; The summary translates to: **Im Jahr 2500 wurde die Erde zu einer futuristischen Utopie mit harmonischer Gesellschaft, reiner Umwelt, fortschrittlicher Technologie und unbegrenztem menschlichem Potenzial.** | + +## License + +This project is licensed under the license found in the LICENSE file in the root directory of this source tree. 
+Portions of the source code are based on the [FAIRSEQ](https://github.com/pytorch/fairseq) and [av_hubert](https://github.com/facebookresearch/av_hubert) + +[Microsoft Open Source Code of Conduct](https://opensource.microsoft.com/codeofconduct) + +## Reference + +If you find our work is useful in your research, please cite the following paper: + +```bibtex +@article{hu2024wavllm, + title={WavLLM: Towards Robust and Adaptive Speech Large Language Model}, + author={Shujie Hu, Long Zhou, Shujie Liu, Sanyuan Chen, Hongkun Hao, Jing Pan, Xunying Liu, Jinyu Li, Sunit Sivasankaran, Linquan Liu, Furu Wei}, + year={2024}, + eprint={2404.00656}, + archivePrefix={arXiv}, +} +``` diff --git a/WavLLM/download/download.sh b/WavLLM/download/download.sh new file mode 100644 index 0000000000000000000000000000000000000000..a670932452c7b54fcd8ac1dfc638fc9b36143fd6 --- /dev/null +++ b/WavLLM/download/download.sh @@ -0,0 +1,26 @@ +stage=$1 +# WavLLM model +if [ "$stage" -eq 0 ]; then + url_p1="https://valle.blob.core.windows.net/share/wavllm/fi" + url_p2="nal.pt?sv=2021-10-04&st=2024-04-24T04%3A50%3A" + url_p3="15Z&se=2025-04-25T04%3A50%3A00Z&sr=b&sp=r&si" + url_p4="g=M82edjKinydPiVd86oS78ZS9L" + url_p5="TVxg0%2F2om3IaEkodIo%3D" + curl -o final.pt ${url_p1}${url_p2}${url_p3}${url_p4}${url_p5} +else + # gaokao_audio + url_p1="https://valle.blob.core.windows.net/share/wavllm/ga" + url_p2="okao_audio.zip?sv=2021-10-04&st=2024-04-24T04%3A58%3A" + url_p3="56Z&se=2025-04-25T04%3A58%3A00Z&sr=b&sp=r&s" + url_p4="ig=0ql1dkz59%2FSxRHkz1ajtC" + url_p5="yfCR5Hva4UISlIfDrOO%2BRc%3D" + curl -o gaokao_audio.zip ${url_p1}${url_p2}${url_p3}${url_p4}${url_p5} + + # gaokao_transcript + url_p1="https://valle.blob.core.windows.net/share/wavllm/ga" + url_p2="okao_text.zip?sv=2021-10-04&st=2024-04-24T04%3A57%3A" + url_p3="37Z&se=2025-04-25T04%3A57%3A00Z&sr=b&sp=r&s" + url_p4="ig=n5QKXU3F9RiP6SxHl6uVEJ" + url_p5="8m7WZ3iEeOGns1BoIozvI%3D" + curl -o gaokao_text.zip ${url_p1}${url_p2}${url_p3}${url_p4}${url_p5} +fi \ No newline at end of file diff --git a/WavLLM/wavllm/__init__.py b/WavLLM/wavllm/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..2ab47a2570c34fde2658da0366bfbcc0ef5a857b --- /dev/null +++ b/WavLLM/wavllm/__init__.py @@ -0,0 +1 @@ +from . import criterions diff --git a/WavLLM/wavllm/criterions/cross_entropy_acc.py b/WavLLM/wavllm/criterions/cross_entropy_acc.py new file mode 100644 index 0000000000000000000000000000000000000000..6a97fb6ec8b109978c20d58b1c6feeb6ed391f86 --- /dev/null +++ b/WavLLM/wavllm/criterions/cross_entropy_acc.py @@ -0,0 +1,120 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +import math +from dataclasses import dataclass + +import torch.nn.functional as F +from fairseq import metrics, utils +from fairseq.criterions import FairseqCriterion, register_criterion +from fairseq.dataclass import FairseqDataclass +from omegaconf import II +import torch + + +@dataclass +class CrossEntropyAccuracyCriterionConfig(FairseqDataclass): + sentence_avg: bool = II("optimization.sentence_avg") + + +@register_criterion("cross_entropy_acc", dataclass=CrossEntropyAccuracyCriterionConfig) +class CrossEntropyAccuracyCriterion(FairseqCriterion): + def __init__(self, task, sentence_avg): + super().__init__(task) + self.sentence_avg = sentence_avg + + def forward(self, model, sample, reduce=True): + """Compute the loss for the given sample. 
+ + Returns a tuple with three elements: + 1) the loss + 2) the sample size, which is used as the denominator for the gradient + 3) logging outputs to display while training + """ + #print ("self.padding_idx: ", self.padding_idx) # pad_id is 0 (unk_id is 0 in LLaMA) + net_output = model(**sample["net_input"]) + + loss, lprobs, target = self.compute_loss(model, net_output, sample, reduce=reduce) + ## cal acoustic accuracy + mask = target != self.padding_idx + correct = torch.sum(lprobs.argmax(1).masked_select(mask) == target.masked_select(mask)) + total = torch.sum(mask) + + + sample_size = ( + sample["target"].size(0) if self.sentence_avg else sample["ntokens"] + ) + logging_output = { + "loss": loss.data, + "correct": correct, + "total": total, + "ntokens": sample["ntokens"], + "nsentences": sample["target"].size(0), + "sample_size": sample_size, + } + return loss, sample_size, logging_output + + def compute_loss(self, model, net_output, sample, reduce=True): + lprobs = model.get_normalized_probs(net_output, log_probs=True) + lprobs = lprobs.view(-1, lprobs.size(-1)) + target = model.get_targets(sample, net_output).view(-1) + loss = F.nll_loss( + lprobs, + target, + ignore_index=self.padding_idx, + reduction="sum" if reduce else "none", + ) + # if net_output[0][1] is not None: + # cosine_loss = 1 - F.cosine_similarity(net_output[0][1], net_output[0][2], dim=-1).mean() + # loss = loss + cosine_loss + return loss, lprobs, target + + @staticmethod + def reduce_metrics(logging_outputs) -> None: + """Aggregate logging outputs from data parallel training.""" + loss_sum = sum(log.get("loss", 0) for log in logging_outputs) + ntokens = sum(log.get("ntokens", 0) for log in logging_outputs) + sample_size = sum(log.get("sample_size", 0) for log in logging_outputs) + correct_sum = utils.item(sum(log.get("correct", 0) for log in logging_outputs)) + total_sum = utils.item(sum(log.get("total", 0) for log in logging_outputs)) + + # we divide by log(2) to convert the loss from base e to base 2 + metrics.log_scalar( + "loss", loss_sum / sample_size / math.log(2), sample_size, round=3 + ) + + if sample_size != ntokens: + metrics.log_scalar( + "nll_loss", loss_sum / ntokens / math.log(2), ntokens, round=3 + ) + metrics.log_derived( + "ppl", lambda meters: utils.get_perplexity(meters["nll_loss"].avg) + ) + else: + metrics.log_derived( + "ppl", lambda meters: utils.get_perplexity(meters["loss"].avg) + ) + + if total_sum > 0: + metrics.log_scalar("total_sum", total_sum) + metrics.log_scalar("correct_sum", correct_sum) + metrics.log_derived( + "accuracy", + lambda meters: round( + meters["correct_sum"].sum * 100.0 / meters["total_sum"].sum, 3 + ) + if meters["total_sum"].sum > 0 + else float("nan"), + ) + + + @staticmethod + def logging_outputs_can_be_summed() -> bool: + """ + Whether the logging outputs returned by `forward` can be summed + across workers prior to calling `reduce_metrics`. Setting this + to True will improves distributed training speed. + """ + return True \ No newline at end of file diff --git a/WavLLM/wavllm/data/speechllm_dataset.py b/WavLLM/wavllm/data/speechllm_dataset.py new file mode 100644 index 0000000000000000000000000000000000000000..42cb8f48e11d860baa98039218d614a7c0b4a3fd --- /dev/null +++ b/WavLLM/wavllm/data/speechllm_dataset.py @@ -0,0 +1,678 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. 
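+#
+# Overview of this module: SpeechLLMDataset reads <id, audio, n_frames,
+# prompt, tgt_text> rows from a TSV manifest, extracts Whisper (and optionally
+# WavLM) features from the audio, wraps each prompt in a LLaMA-2 style
+# [INST] <<SYS>> ... <SPEECH> ... </SPEECH> prompt [/INST] template, and
+# tokenizes prompt/target with SentencePiece before batch collation.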
+ +import csv +import io +import logging +import re +import time +from collections import defaultdict +from pathlib import Path +from typing import Dict, List, Optional +from dataclasses import dataclass + +import numpy as np +import torch +from fairseq.data import ( + ConcatDataset, + Dictionary, + FairseqDataset, + ResamplingDataset, + data_utils as fairseq_data_utils, +) +from fairseq.data.audio.audio_utils import ( + get_fbank, + get_waveform, + read_from_stored_zip, + is_npy_data, + is_sf_audio_data, + parse_path, + FEATURE_OR_SF_AUDIO_FILE_EXTENSIONS, +) +from fairseq.data.audio.feature_transforms import CompositeAudioFeatureTransform +#from fairseq.data.audio.data_cfg import S2TDataConfig as SpeechLLMDataConfig + +import os +from sentencepiece import SentencePieceProcessor +from copy import deepcopy + +import torchaudio + +logger = logging.getLogger(__name__) + + +def get_features_from_npy_or_audio(path): + ext = Path(path).suffix + if ext not in FEATURE_OR_SF_AUDIO_FILE_EXTENSIONS: + raise ValueError(f'Unsupported file format for "{path}"') + return np.load(path) if ext == ".npy" else get_fbank(path) + + +def get_features_or_waveform_from_stored_zip( + path, + byte_offset, + byte_size, + need_waveform=False, + use_sample_rate=None, +): + assert path.endswith(".zip") + data = read_from_stored_zip(path, byte_offset, byte_size) + f = io.BytesIO(data) + if is_npy_data(data): + features_or_waveform = np.load(f) + elif is_sf_audio_data(data): + features_or_waveform = ( + get_waveform(f, always_2d=False, output_sample_rate=use_sample_rate)[0] + if need_waveform + else get_fbank(f) + ) + else: + raise ValueError(f'Unknown file format for "{path}"') + return features_or_waveform + + +def get_features_or_waveform(path: str, need_waveform=False, use_sample_rate=None): + """Get speech features from .npy file or waveform from .wav/.flac file. + The file may be inside an uncompressed ZIP file and is accessed via byte + offset and length. + + Args: + path (str): File path in the format of "<.npy/.wav/.flac path>" or + "<zip path>:<byte offset>:<byte length>". + need_waveform (bool): return waveform instead of features. + use_sample_rate (int): change sample rate for the input wave file + + Returns: + features_or_waveform (numpy.ndarray): speech features or waveform. + """ + _path, slice_ptr = parse_path(path) + # print(_path) + if len(slice_ptr) == 0: + if need_waveform: + return get_waveform( + _path, always_2d=False, output_sample_rate=use_sample_rate + )[0] + return get_features_from_npy_or_audio(_path) + elif len(slice_ptr) == 2: + features_or_waveform = get_features_or_waveform_from_stored_zip( + _path, + slice_ptr[0], + slice_ptr[1], + need_waveform=need_waveform, + use_sample_rate=use_sample_rate, + ) + else: + raise ValueError(f"Invalid path: {path}") + + return features_or_waveform + + +def _collate_frames( + frames: List[torch.Tensor], is_audio_input: bool = False +) -> torch.Tensor: + """ + Convert a list of 2D frames into a padded 3D tensor + Args: + frames (list): list of 2D frames of size L[i]*f_dim. 
Where L[i] is + length of i-th frame and f_dim is static dimension of features + Returns: + 3D tensor of size len(frames)*len_max*f_dim where len_max is max of L[i] + """ + max_len = max(frame.size(0) for frame in frames) + if is_audio_input: + out = frames[0].new_zeros((len(frames), max_len)) + else: + out = frames[0].new_zeros((len(frames), max_len, frames[0].size(1))) + for i, v in enumerate(frames): + out[i, : v.size(0)] = v + return out + +def _collate_frames_pad_number( + frames: List[torch.Tensor], pad_number +) -> torch.Tensor: + max_len = max(frame.size(1) for frame in frames) + # max_len = 2250 # all pad to 30s + out = frames[0].new_ones((len(frames), frames[0].size(0), max_len)) * pad_number + + for i, v in enumerate(frames): + out[i, :, : v.size(1)] = v + return out + + +@dataclass +class SpeechLLMDatasetItem(object): + index: int + source: torch.Tensor + target: Optional[torch.Tensor] = None + left_prompt: Optional[torch.Tensor] = None + speaker_id: Optional[int] = None + target_mask: Optional[bool] = None + prompt_mask: Optional[bool] = None + left_prompt_mask: Optional[bool] = None + speech_flag: bool = None + speech_mask: Optional[bool] = None + audio_codec: Optional[torch.Tensor] = None + mid_prompt: Optional[torch.Tensor] = None + mid_prompt_mask: Optional[bool] = None + example_source: Optional[torch.Tensor] = None + example_speech_mask: Optional[bool] = None + example_audio_codec: Optional[torch.Tensor] = None + lora_scale: Optional[torch.Tensor] = None + orig_prompt: Optional[torch.Tensor] = None + wavlm_sources: Optional[torch.Tensor] = None + wavlm_speech_mask: Optional[bool] = None + example_wavlm_sources: Optional[torch.Tensor] = None + example_wavlm_speech_mask: Optional[bool] = None + +class SpeechLLMDataset(FairseqDataset): + LANG_TAG_TEMPLATE = "<lang:{}>" + + def __init__( + self, + cfg, + data_root: str, + split: str, + is_train_split: bool, + text_tokenizer=None, + audio_processor=None, + wavlm_processor=None, + n_frames_per_step=1, + append_eos=True, + ): + samples = self._load_samples_from_tsv(data_root, split) + KEY_ID, KEY_AUDIO, KEY_N_FRAMES = "id", "audio", "n_frames" + KEY_PROMPT_TEXT, KEY_TGT_TEXT = "prompt", "tgt_text" + + START_FRAME, END_FRAME = "start_frame", "end_frame" + + WITH_SPEECH = "with_speech" + + audio_root = Path(cfg.audio_root) + + ids = [s[KEY_ID] for s in samples] + audio_paths = [(audio_root / s[KEY_AUDIO]).as_posix() for s in samples] + n_frames = [int(s[KEY_N_FRAMES]) for s in samples] + prompt_texts = [s[KEY_PROMPT_TEXT] for s in samples] + tgt_texts = [s[KEY_TGT_TEXT] for s in samples] + + if START_FRAME in samples[0]: + start_frames = [s[START_FRAME] for s in samples] + self.start_frames = start_frames + else: + self.start_frames = None + + if END_FRAME in samples[0]: + end_frames = [s[END_FRAME] for s in samples] + self.end_frames = end_frames + else: + self.end_frames = None + + self.split, self.is_train_split = split, is_train_split + self.cfg = cfg + self.audio_paths, self.n_frames = audio_paths, n_frames + self.n_samples = len(audio_paths) + assert len(n_frames) == self.n_samples > 0 + assert prompt_texts is None or len(prompt_texts) == self.n_samples + assert tgt_texts is None or len(tgt_texts) == self.n_samples + assert ids is None or len(ids) == self.n_samples + + if cfg.alpaca_text: + speech_flag = [s[WITH_SPEECH] for s in samples] + assert speech_flag is None or len(speech_flag) == self.n_samples + self.speech_flag = speech_flag + + if cfg.prompt_bulid: + self.B_INST = "[INST]" + self.B_SYS = "<<SYS>>\n" + 
self.SYSTEM = "As a helpful language and speech assistant, you are able to understand the speech content provided by the user, and assist the user with a variety of tasks using natural language." + self.E_SYS = "\n<</SYS>>\n\n" + self.E_INST = "[/INST]" + self.B_SPEECH = "<SPEECH>" + self.E_SPEECH = "</SPEECH>" + self.B_EXAMPLE = "<EXAMPLE>" + self.E_EXAMPLE = "</EXAMPLE>" + self.B_TARGET = "<TARGET>" + self.E_TARGET = "</TARGET>" + + self.tgt_texts = tgt_texts + self.prompt_texts = prompt_texts + self.ids = ids + self.shuffle = cfg.shuffle if is_train_split else False + + self.feature_transforms = None + + self.tokenizer = text_tokenizer + self.audio_processor = audio_processor + self.wavlm_processor = wavlm_processor + self.n_frames_per_step = n_frames_per_step + self.speaker_to_id = None + + self.tgt_lens = self.get_tgt_lens_and_check_oov() + self.append_eos = append_eos + + logger.info(self.__repr__()) + + def get_tgt_lens_and_check_oov(self): + if self.tgt_texts is None: + return [0 for _ in range(self.n_samples)] + tgt_lens = [] + n_tokens, n_oov_tokens = 0, 0 + for i in range(self.n_samples): + tokenized = self.get_tokenized_tgt_text(i) + # oov_tokens = [ + # t + # for t in tokenized + # if self.tgt_dict.index(t) == self.tgt_dict.unk_index + # ] + # n_tokens += len(tokenized) + # n_oov_tokens += len(oov_tokens) + tgt_lens.append(len(tokenized)) + #logger.info(f"'{self.split}' has {n_oov_tokens / n_tokens * 100:.2f}% OOV") + return tgt_lens + + def __repr__(self): + return ( + self.__class__.__name__ + + f'(split="{self.split}", n_samples={self.n_samples:_}, ' + f"prepend_tgt_lang_tag={self.cfg.prepend_tgt_lang_tag}, " + f"shuffle={self.shuffle}, transforms={self.feature_transforms}, " + f"n_frames_per_step={self.n_frames_per_step}" + ) + + @classmethod + def is_lang_tag(cls, token): + pattern = cls.LANG_TAG_TEMPLATE.replace("{}", "(.*)") + return re.match(pattern, token) + + def check_tgt_lang_tag(self): + if self.cfg.prepend_tgt_lang_tag: + assert self.tgt_langs is not None and self.tgt_dict is not None + tgt_lang_tags = [ + self.LANG_TAG_TEMPLATE.format(t) for t in set(self.tgt_langs) + ] + assert all(t in self.tgt_dict for t in tgt_lang_tags) + + @classmethod + def tokenize(cls, tokenizer, text: str): + return text if tokenizer is None else tokenizer.encode(text) + + def get_tokenized_tgt_text(self, index: int): + text = self.tgt_texts[index] + text = torch.tensor(self.tokenizer.encode(text, bos=False, eos=True), dtype=torch.int64) + return text + + def get_tokenized_few_shot_tgt_text(self, text): + text = torch.tensor(self.tokenizer.encode(text, bos=False, eos=False), dtype=torch.int64) + return text + + def get_speech_flag(self, index: int): + speech_flag = self.speech_flag[index] + if speech_flag == "True": + return True + else: + return False + + def get_tokenized_prompt_text(self, index: int): + text = self.prompt_texts[index] + text = torch.tensor(self.tokenizer.encode('"' + text + '"', bos=False, eos=False), dtype=torch.int64) + return text + + def get_tokenized_left_and_right_prompts_text(self, index, left_str, right_str): + left_text = torch.tensor(self.tokenizer.encode(left_str, bos=True, eos=False), dtype=torch.int64) + right_text = torch.tensor(self.tokenizer.encode(" " + self.E_SPEECH + " " + self.prompt_texts[index] + " " + right_str, bos=False, eos=False), dtype=torch.int64) + return left_text, right_text + + def pack_frames(self, feature: torch.Tensor): + if self.n_frames_per_step == 1: + return feature + n_packed_frames = feature.shape[0] // 
self.n_frames_per_step + feature = feature[: self.n_frames_per_step * n_packed_frames] + return feature.reshape(n_packed_frames, -1) + + @classmethod + def get_lang_tag_idx(cls, lang: str, dictionary: Dictionary): + lang_tag_idx = dictionary.index(cls.LANG_TAG_TEMPLATE.format(lang)) + assert lang_tag_idx != dictionary.unk() + return lang_tag_idx + + def _get_source_audio(self, index: int) -> torch.Tensor: + source = get_features_or_waveform( + self.audio_paths[index], + need_waveform=self.cfg.use_audio_input, + use_sample_rate=self.cfg.use_sample_rate, + ) + if self.feature_transforms is not None: + assert not self.cfg.use_audio_input + source = self.feature_transforms(source) + source = torch.from_numpy(source).float() + return source + + def process_audio_source(self, source): + length_in_s = len(source) / self.cfg.use_sample_rate + dura_flag = False if length_in_s <= 30 else True + if dura_flag: + source_parts = [] + sample_count = 30 * self.cfg.use_sample_rate + # offset = 1 * self.cfg.use_sample_rate + offset = 0 + segments = int(np.ceil((len(source) - sample_count) / (sample_count - offset)) + 1) + for i in range(segments): + start = i * (sample_count - offset) + end = min(start + sample_count, len(source)) + part = source[start:end] + source_parts.append(part) + else: + source_parts = [source] + return source_parts + + def __getitem__(self, index: int) -> SpeechLLMDatasetItem: + if self.start_frames is not None: + source_num_frames = int(self.end_frames[index]) - int(self.start_frames[index]) + source, sr = torchaudio.load(self.audio_paths[index], + frame_offset=int(self.start_frames[index]), + num_frames=source_num_frames) + source = source[0] + else: + source = self._get_source_audio(index) + + source = self.pack_frames(source) + source_parts = self.process_audio_source(source) + if self.cfg.is_whisper: + sources = [] + speech_attention_masks = [] + + for part in source_parts: + audio_output = self.audio_processor(part, sampling_rate=16000, return_tensors="pt", return_attention_mask=True) + sources.append(audio_output.input_features[0]) + speech_attention_masks.append(audio_output.attention_mask[0].bool()) + + if self.cfg.use_wavlm: + wavlm_output = self.wavlm_processor(source, sampling_rate=16000, return_tensors="pt", return_attention_mask=True) + wavlm_sources = wavlm_output.input_values[0] + + else: + wavlm_sources = None + + + example_sources = None + example_speech_attention_masks = None + example_wavlm_sources = None + + audio_codec = None + example_audio_codec = None + + target = None + if self.tgt_texts is not None: + target = self.get_tokenized_tgt_text(index) ## end with eos + if self.cfg.prepend_tgt_lang_tag: + lang_tag_idx = self.get_lang_tag_idx( + self.tgt_langs[index], self.tgt_dict + ) + target = torch.cat((torch.LongTensor([lang_tag_idx]), target), 0) + + speaker_id = None + if self.speaker_to_id is not None: + speaker_id = self.speaker_to_id[self.speakers[index]] + + orig_prompt = self.get_tokenized_prompt_text(index) ## begin with bos + + if self.cfg.prompt_bulid: + left_prompt_text = self.B_INST + self.B_SYS + self.SYSTEM + self.E_SYS + self.B_SPEECH + right_prompt_text = self.E_INST + left_prompt, right_prompt = self.get_tokenized_left_and_right_prompts_text(index, left_prompt_text, right_prompt_text) + + prompt_target = torch.cat((right_prompt, target), dim=0) + left_prompt_mask = torch.ones(left_prompt.shape[0]).bool() + + mid_prompt = None + mid_prompt_mask = None + prompt_mask = torch.cat((torch.ones(right_prompt[1:].shape[0]), 
torch.zeros(target.shape[0])), dim=0).bool() + target_mask = torch.cat((torch.zeros(right_prompt[1:].shape[0]), torch.ones(target.shape[0])), dim=0).bool() + + + lora_scale = -1 + + if self.cfg.alpaca_text: + speech_flag = self.get_speech_flag(index) + return SpeechLLMDatasetItem( + index=index, source=sources, target=prompt_target, speaker_id=speaker_id, left_prompt=left_prompt, + target_mask=target_mask, prompt_mask=prompt_mask, speech_mask=speech_attention_masks, + audio_codec=audio_codec, speech_flag=speech_flag, left_prompt_mask=left_prompt_mask, + mid_prompt=mid_prompt, mid_prompt_mask=mid_prompt_mask, example_source=example_sources, + example_speech_mask=example_speech_attention_masks, example_audio_codec=example_audio_codec, + lora_scale=lora_scale, orig_prompt=orig_prompt, wavlm_sources=wavlm_sources, + example_wavlm_sources=example_wavlm_sources, + ) + + return SpeechLLMDatasetItem( + index=index, source=sources, target=prompt_target, speaker_id=speaker_id, left_prompt=left_prompt, + target_mask=target_mask, prompt_mask=prompt_mask, speech_mask=speech_attention_masks, + audio_codec=audio_codec, left_prompt_mask=left_prompt_mask + ) + + def __len__(self): + return self.n_samples + + def collater( + self, samples: List[SpeechLLMDatasetItem], return_order: bool = False + ) -> Dict: + if len(samples) == 0: + return {} + indices = torch.tensor([x.index for x in samples], dtype=torch.long) + lora_scales = torch.tensor([x.lora_scale for x in samples], dtype=torch.long) + if self.cfg.is_whisper: + frames = [x.source for x in samples] + n_frames = torch.tensor([x.source[0].size(1) * len(x.source) for x in samples], dtype=torch.long) + batch_size = len(frames) + audio_decoder_input_ids = torch.ones((batch_size, self.cfg.whisper_token_len)).to(torch.long) + # audio_decoder_input_ids = audio_decoder_input_ids.to(src_tokens.device).to(torch.long) + if self.cfg.use_wavlm: + wavlm_input_features = [{"input_values": x.wavlm_sources} for x in samples] + wavlm_frames = self.wavlm_processor.pad(wavlm_input_features, padding=True, return_tensors="pt")["input_values"] + wavlm_speech_masks = self.wavlm_processor.pad(wavlm_input_features, padding=True, return_tensors="pt")["attention_mask"] + + example_wavlm_frames = None + example_wavlm_speech_masks = None + n_frames, order = n_frames.sort(descending=True) + indices = indices.index_select(0, order) + lora_scales = lora_scales.index_select(0, order) + # frames = frames.index_select(0, order) + frames = [frames[i] for i in order] + if wavlm_frames is not None: + wavlm_frames = [wavlm_frames[i] for i in order] + wavlm_speech_masks = [wavlm_speech_masks[i] for i in order] + + speech_masks = [x.speech_mask for x in samples] + speech_masks = [speech_masks[i] for i in order] + + + audio_codecs = None + audio_codec_masks = None + + if self.cfg.alpaca_text: + speech_flags = [x.speech_flag for x in samples] + speech_flags = torch.tensor(speech_flags, dtype=torch.bool) + speech_flags = speech_flags.index_select(0, order) + else: + speech_flags = None + + target, target_lengths = None, None + prev_output_tokens = None + ntokens = None + + if self.tgt_texts is not None: + # if self.prompt_texts is not None: + # target_sequence = [x.prompt for x in samples] + [x.target for x in samples] + # print ("") + # else: + # target_sequence = [x.target for x in samples] + + target_masks = [x.target_mask for x in samples] + target_masks = fairseq_data_utils.collate_tokens(target_masks, False) + target_masks = target_masks.index_select(0, order) + + prompt_masks = 
[x.prompt_mask for x in samples] + prompt_masks = fairseq_data_utils.collate_tokens(prompt_masks, False) + prompt_masks = prompt_masks.index_select(0, order) + + if self.cfg.prompt_bulid: + left_prompt_masks = [x.left_prompt_mask for x in samples] + left_prompt_masks = fairseq_data_utils.collate_tokens(left_prompt_masks, False) + left_prompt_masks = left_prompt_masks.index_select(0, order) + + left_prompts = fairseq_data_utils.collate_tokens( + [x.left_prompt for x in samples], + 0, #self.tokenizer.pad_id, + ) + left_prompts = left_prompts.index_select(0, order) + + mid_prompt_masks = None + mid_prompts = None + + target = fairseq_data_utils.collate_tokens( + [x.target[1:] for x in samples], + 0, #self.tokenizer.pad_id, + ) + target = target.index_select(0, order) + target_lengths = torch.tensor( + [x.target.size(0) for x in samples], dtype=torch.long + ).index_select(0, order) + prev_output_tokens = fairseq_data_utils.collate_tokens( + [x.target[:-1] for x in samples], + 0, #self.tokenizer.pad_id, + ) + prev_output_tokens = prev_output_tokens.index_select(0, order) + ntokens = sum(x.target[1:].size(0) for x in samples) + + orig_prompts = fairseq_data_utils.collate_tokens( + [x.orig_prompt for x in samples], + 0, #self.tokenizer.pad_id, + ) + orig_prompts = orig_prompts.index_select(0, order) + + speaker = None + if self.speaker_to_id is not None: + speaker = ( + torch.tensor([s.speaker_id for s in samples], dtype=torch.long) + .index_select(0, order) + .view(-1, 1) + ) + + if self.cfg.is_whisper: + net_input = { + "index": indices, + "lora_index": lora_scales, + "speech_flag": speech_flags, + "audio_codec": audio_codecs, + "src_tokens": frames, + "src_lengths": n_frames, + "audio_decoder_input_ids": audio_decoder_input_ids, + "prev_output_tokens": prev_output_tokens, + "target_masks": target_masks, + "prompt_masks": prompt_masks, + "left_prompts": left_prompts, + "left_prompt_masks": left_prompt_masks, + "speech_masks": speech_masks, + "codec_masks": audio_codec_masks, + "orig_prompts": orig_prompts, + "wavlm_src_tokens": wavlm_frames, + "wavlm_speech_masks": wavlm_speech_masks, + } + out = { + "id": indices, + "net_input": net_input, + "speaker": speaker, + "target": target, + "target_lengths": target_lengths, + "ntokens": ntokens, + "nsentences": len(samples), + } + if return_order: + out["order"] = order + return out + + def num_tokens(self, index): + return self.n_frames[index] + + def size(self, index): + return self.n_frames[index], self.tgt_lens[index] + + @property + def sizes(self): + return np.array(self.n_frames) + + @property + def can_reuse_epoch_itr_across_epochs(self): + return True + + def ordered_indices(self): + if self.shuffle: + order = [np.random.permutation(len(self))] + else: + order = [np.arange(len(self))] + # first by descending order of # of frames then by original/random order + order.append([-n for n in self.n_frames]) + return np.lexsort(order) + + def prefetch(self, indices): + raise False + + def get_transforms(self, transform_type, split, is_train): + """Split-specific feature transforms. 
Allowing train set + wildcard `_train`, evaluation set wildcard `_eval` and general + wildcard `*` for matching.""" + from copy import deepcopy + + cfg = deepcopy(self.config) + _cur = cfg.get(f"{transform_type}transforms", {}) + cur = _cur.get(split) + cur = _cur.get("_train") if cur is None and is_train else cur + cur = _cur.get("_eval") if cur is None and not is_train else cur + cur = _cur.get("*") if cur is None else cur + return cur + + def get_feature_transforms(self, split, is_train): + cfg = deepcopy(self.config) + # TODO: deprecate transforms + cur = self.get_transforms("", split, is_train) + if cur is not None: + logger.warning( + "Auto converting transforms into feature_transforms, " + "but transforms will be deprecated in the future. Please " + "update this in the config." + ) + ft_transforms = self.get_transforms("feature_", split, is_train) + if ft_transforms: + cur.extend(ft_transforms) + else: + cur = self.get_transforms("feature_", split, is_train) + cfg["feature_transforms"] = cur + return cfg + + def get_waveform_transforms(self, split, is_train): + cfg = deepcopy(self.config) + cfg["waveform_transforms"] = self.get_transforms("waveform_", split, is_train) + return cfg + + @classmethod + def _load_samples_from_tsv(self, root: str, split: str): + tsv_path = Path(root) / f"{split}.tsv" + if not tsv_path.is_file(): + raise FileNotFoundError(f"Dataset not found: {tsv_path}") + with open(tsv_path) as f: + reader = csv.DictReader( + f, + delimiter="\t", + quotechar=None, + doublequote=False, + lineterminator="\n", + quoting=csv.QUOTE_NONE, + ) + samples = [dict(e) for e in reader] + if len(samples) == 0: + raise ValueError(f"Empty manifest: {tsv_path}") + return samples + + diff --git a/WavLLM/wavllm/data/tokenizer.py b/WavLLM/wavllm/data/tokenizer.py new file mode 100644 index 0000000000000000000000000000000000000000..a3c760b02589850f2115ffb2a70a825b85fe5343 --- /dev/null +++ b/WavLLM/wavllm/data/tokenizer.py @@ -0,0 +1,49 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. 
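+#
+# Thin wrapper around a SentencePiece model (e.g. the LLaMA vocabulary):
+# exposes encode()/decode() plus the BOS/EOS/PAD/UNK ids that the dataset
+# and the generator use when building prompts and decoding hypotheses.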
+ +import csv +import io +import logging +import re +from collections import defaultdict +from pathlib import Path +from typing import Dict, List, Optional +from dataclasses import dataclass + +import os +from sentencepiece import SentencePieceProcessor +from copy import deepcopy + +logger = logging.getLogger(__name__) + + + +class Tokenizer: + def __init__(self, model_path: str): + # reload tokenizer + assert os.path.isfile(model_path), model_path + self.sp_model = SentencePieceProcessor(model_file=model_path) + logger.info(f"Reloaded SentencePiece model from {model_path}") + + # BOS / EOS token IDs + self.n_words: int = self.sp_model.vocab_size() + self.bos_id: int = self.sp_model.bos_id() + self.eos_id: int = self.sp_model.eos_id() + self.pad_id: int = self.sp_model.pad_id() + self.unk_id: int = self.sp_model.unk_id() + logger.info(f"#words: {self.n_words} - BOS ID: {self.bos_id} - EOS ID: {self.eos_id} - PAD ID: {self.pad_id} - UNK ID: {self.unk_id}") + assert self.sp_model.vocab_size() == self.sp_model.get_piece_size() + + def encode(self, s: str, bos: bool, eos: bool) -> List[int]: + assert type(s) is str + t = self.sp_model.encode(s) + if bos: + t = [self.bos_id] + t + if eos: + t = t + [self.eos_id] + return t + + def decode(self, t: List[int]) -> str: + return self.sp_model.decode(t) \ No newline at end of file diff --git a/WavLLM/wavllm/inference/generate.py b/WavLLM/wavllm/inference/generate.py new file mode 100644 index 0000000000000000000000000000000000000000..2cfd29ab05b4d7d799ef3b2faea9fc8b3ea37e92 --- /dev/null +++ b/WavLLM/wavllm/inference/generate.py @@ -0,0 +1,454 @@ +#!/usr/bin/env python3 -u +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. +""" +Translate pre-processed data with a trained model. +""" + +import ast +import logging +import math +import os +import sys +import time +from argparse import Namespace +from itertools import chain + +import numpy as np +import torch +from omegaconf import DictConfig + +from fairseq import checkpoint_utils, options, scoring, tasks, utils +from fairseq.dataclass.utils import convert_namespace_to_omegaconf +from fairseq.logging import progress_bar +from fairseq.logging.meters import StopwatchMeter, TimeMeter + + +def main(cfg: DictConfig): + + if isinstance(cfg, Namespace): + print(cfg) + cfg = convert_namespace_to_omegaconf(cfg) + + assert cfg.common_eval.path is not None, "--path required for generation!" 
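+    # The remaining checks only validate mutually dependent generation options;
+    # models and data are loaded later in _main().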
+ assert ( + not cfg.generation.sampling or cfg.generation.nbest == cfg.generation.beam + ), "--sampling requires --nbest to be equal to --beam" + assert ( + cfg.generation.replace_unk is None or cfg.dataset.dataset_impl == "raw" + ), "--replace-unk requires a raw text dataset (--dataset-impl=raw)" + + if cfg.common_eval.results_path is not None: + os.makedirs(cfg.common_eval.results_path, exist_ok=True) + output_path = os.path.join( + cfg.common_eval.results_path, + "generate-{}.txt".format(cfg.dataset.gen_subset), + ) + with open(output_path, "w", buffering=1, encoding="utf-8") as h: + return _main(cfg, h) + else: + return _main(cfg, sys.stdout) + + +def get_symbols_to_strip_from_output(generator): + if hasattr(generator, "symbols_to_strip_from_output"): + return generator.symbols_to_strip_from_output + else: + return {generator.eos} + + +def _main(cfg: DictConfig, output_file): + logging.basicConfig( + format="%(asctime)s | %(levelname)s | %(name)s | %(message)s", + datefmt="%Y-%m-%d %H:%M:%S", + level=os.environ.get("LOGLEVEL", "INFO").upper(), + stream=output_file, + ) + logger = logging.getLogger("fairseq_cli.generate") + + utils.import_user_module(cfg.common) + + if cfg.dataset.max_tokens is None and cfg.dataset.batch_size is None: + cfg.dataset.max_tokens = 12000 + logger.info(cfg) + + # Fix seed for stochastic decoding + if cfg.common.seed is not None and not cfg.generation.no_seed_provided: + np.random.seed(cfg.common.seed) + utils.set_torch_seed(cfg.common.seed) + + use_cuda = torch.cuda.is_available() and not cfg.common.cpu + + task = tasks.setup_task(cfg.task) + task.cfg.is_whisper = True + task.cfg.processor_path = "openai/whisper-large-v2" + + + task.cfg.whisper_with_decoder = False + task.cfg.sft_stage = True + task.cfg.llama_2 = True + task.cfg.use_lora = True + task.cfg.lora_r = 32 + task.cfg.lora_alpha = 32 + + task.cfg.freeze_audio_encoder = False + task.cfg.use_wavlm = True + + task.cfg.wavlm_processor_path = "microsoft/wavlm-base" + task.cfg.wavlm_output_weight = True + + task.cfg.alpaca_text = True + task.cfg.prompt_bulid = True + + task.cfg.load_pretrained_model = False + + task.cfg.second_stage_update_scale = True + task.cfg.scale_with_audio = False + task.cfg.scale_0_1 = False + + # Set dictionaries + try: + src_dict = getattr(task, "source_dictionary", None) + except NotImplementedError: + src_dict = None + tgt_dict = task.target_dictionary + + overrides = ast.literal_eval(cfg.common_eval.model_overrides) + + # Load ensemble + logger.info("loading model(s) from {}".format(cfg.common_eval.path)) + models, saved_cfg = checkpoint_utils.load_model_ensemble( + utils.split_paths(cfg.common_eval.path), + arg_overrides=overrides, + task=task, + suffix=cfg.checkpoint.checkpoint_suffix, + strict=(cfg.checkpoint.checkpoint_shard_count == 1), + num_shards=cfg.checkpoint.checkpoint_shard_count, + ) + + # loading the dataset should happen after the checkpoint has been loaded so we can give it the saved task config + task.load_dataset(cfg.dataset.gen_subset, task_cfg=saved_cfg.task) + + if cfg.generation.lm_path is not None: + overrides["data"] = cfg.task.data + + try: + lms, _ = checkpoint_utils.load_model_ensemble( + [cfg.generation.lm_path], arg_overrides=overrides, task=None + ) + except: + logger.warning( + f"Failed to load language model! 
Please make sure that the language model dict is the same " + f"as target dict and is located in the data dir ({cfg.task.data})" + ) + raise + + assert len(lms) == 1 + else: + lms = [None] + + # Optimize ensemble for generation + for model in chain(models, lms): + if model is None: + continue + if cfg.common.fp16: + model.half() + if use_cuda and not cfg.distributed_training.pipeline_model_parallel: + model.cuda() + model.prepare_for_inference_(cfg) + + # Load alignment dictionary for unknown word replacement + # (None if no unknown word replacement, empty if no path to align dictionary) + align_dict = utils.load_align_dict(cfg.generation.replace_unk) + + # Load dataset (possibly sharded) + itr = task.get_batch_iterator( + dataset=task.dataset(cfg.dataset.gen_subset), + max_tokens=cfg.dataset.max_tokens, + # max_sentences=cfg.dataset.batch_size, + max_sentences=1, + max_positions=utils.resolve_max_positions( + task.max_positions(), *[m.max_positions() for m in models] + ), + ignore_invalid_inputs=cfg.dataset.skip_invalid_size_inputs_valid_test, + required_batch_size_multiple=cfg.dataset.required_batch_size_multiple, + seed=cfg.common.seed, + num_shards=cfg.distributed_training.distributed_world_size, + shard_id=cfg.distributed_training.distributed_rank, + num_workers=cfg.dataset.num_workers, + data_buffer_size=cfg.dataset.data_buffer_size, + ).next_epoch_itr(shuffle=False) + progress = progress_bar.progress_bar( + itr, + log_format=cfg.common.log_format, + log_interval=cfg.common.log_interval, + default_log_format=("tqdm" if not cfg.common.no_progress_bar else "simple"), + ) + + # Initialize generator + gen_timer = StopwatchMeter() + + extra_gen_cls_kwargs = {"lm_model": lms[0], "lm_weight": cfg.generation.lm_weight} + + generator = task.build_generator( + models, cfg.generation, extra_gen_cls_kwargs=extra_gen_cls_kwargs + ) + + # Handle tokenization and BPE + tokenizer = task.build_tokenizer(cfg.tokenizer) + bpe = task.build_bpe(cfg.bpe) + + def decode_fn(x): + if bpe is not None: + x = bpe.decode(x) + if tokenizer is not None: + x = tokenizer.decode(x) + return x + + scorer = scoring.build_scorer(cfg.scoring, tgt_dict) + + num_sentences = 0 + has_target = True + wps_meter = TimeMeter() + for sample in progress: + sample = utils.move_to_cuda(sample) if use_cuda else sample + if "net_input" not in sample: + continue + + prompt_masks = sample["net_input"]["prompt_masks"] + prefix_sizes = prompt_masks.sum(axis=-1) + + #print ("prefix_sizes: ", prefix_sizes.shape, prefix_sizes) + prefix_size = prefix_sizes.max() ## TODO: right prompt length + + prefix_tokens = None + #if cfg.generation.prefix_size > 0: + # prefix_tokens = sample["target"][:, : cfg.generation.prefix_size] + if prefix_size > 0: + prefix_tokens = sample["target"][:, : prefix_size] + + constraints = None + if "constraints" in sample: + constraints = sample["constraints"] + + gen_timer.start() + hypos = task.inference_step( + generator, + models, + sample, + prefix_tokens=prefix_tokens, + constraints=constraints, + ) + num_generated_tokens = sum(len(h[0]["tokens"]) for h in hypos) + gen_timer.stop(num_generated_tokens) + + target_masks = sample["net_input"]["target_masks"] + prompt_masks = sample["net_input"]["prompt_masks"] + + for i, sample_id in enumerate(sample["id"].tolist()): + has_target = sample["target"] is not None + target_mask = target_masks[i, :] + prompt_mask = prompt_masks[i, :] + #print ("sample[target]: ", sample["target"]) + + # Remove padding + if "src_tokens" in sample["net_input"]: + # src_tokens = 
utils.strip_pad( + # sample["net_input"]["src_tokens"][i, :], tgt_dict.pad() + # ) + src_tokens = None + else: + src_tokens = None + + target_tokens = None + if has_target: + target_tokens = ( + utils.strip_pad(sample["target"][i, :], tgt_dict.pad()).int().cpu() + ) + + # Either retrieve the original sentences or regenerate them from tokens. + if align_dict is not None: + src_str = task.dataset(cfg.dataset.gen_subset).src.get_original_text( + sample_id + ) + target_str = task.dataset(cfg.dataset.gen_subset).tgt.get_original_text( + sample_id + ) + else: + if src_dict is not None: + src_str = src_dict.string(src_tokens, cfg.common_eval.post_process) + else: + src_str = "" + if has_target: + target_str = tgt_dict.string( + target_tokens, + cfg.common_eval.post_process, + escape_unk=True, + extra_symbols_to_ignore=get_symbols_to_strip_from_output( + generator + ), + ) + + target_str = task.tokenizer.decode( + utils.strip_pad(sample["target"][i, :][target_mask], 0).tolist() + # sample["target"][i, :][target_mask].tolist() + ) + + if not cfg.common_eval.quiet: + if src_dict is not None: + print("S-{}\t{}".format(sample_id, src_str), file=output_file) + if has_target: + print("T-{}\t{}".format(sample_id, target_str), file=output_file) + #print("T-{}\t{}".format(sample_id, task.tokenizer.decode(sample["target"][i, :].tolist())), file=output_file) + print("T-{}\t{}".format(sample_id, target_str)) + #print("T-{}\t{}".format(sample_id, task.tokenizer.decode([16564, 373, 278, 10959, 10348, 29892, 5706, 263, 15171, 6270]))) + + # Process top predictions + for j, hypo in enumerate(hypos[i][: cfg.generation.nbest]): + hypo_tokens, hypo_str, alignment = utils.post_process_prediction( + hypo_tokens=hypo["tokens"].int().cpu(), + src_str=src_str, + alignment=hypo["alignment"], + align_dict=align_dict, + tgt_dict=tgt_dict, + remove_bpe=cfg.common_eval.post_process, + extra_symbols_to_ignore=get_symbols_to_strip_from_output(generator), + ) + #detok_hypo_str = decode_fn(hypo_str) + prompt_mask_len = prompt_mask.sum(axis=0) + hypo_ids = hypo_tokens.tolist()[prompt_mask_len:] + hypo_str = task.tokenizer.decode(hypo_tokens.tolist()[prompt_mask_len:]) + + detok_hypo_str = hypo_str + if not cfg.common_eval.quiet: + score = hypo["score"] / math.log(2) # convert to base 2 + print( + "D-{}\t{}".format(sample_id, hypo_str), + ) + print( + #"D-{}\t{}\t{}".format(sample_id, score, detok_hypo_str), + #"D-{}\t{}\t{}\t{}".format(sample_id, score, hypo_str, " ".join(str(i) for i in hypo_tokens.tolist()[prompt_mask_len:])), + "D-{}\t{}\t{}".format(sample_id, score, repr(hypo_str)), + file=output_file, + ) + + + if cfg.generation.print_alignment == "hard": + print( + "A-{}\t{}".format( + sample_id, + " ".join( + [ + "{}-{}".format(src_idx, tgt_idx) + for src_idx, tgt_idx in alignment + ] + ), + ), + file=output_file, + ) + if cfg.generation.print_alignment == "soft": + print( + "A-{}\t{}".format( + sample_id, + " ".join( + [",".join(src_probs) for src_probs in alignment] + ), + ), + file=output_file, + ) + + if cfg.generation.print_step: + print( + "I-{}\t{}".format(sample_id, hypo["steps"]), + file=output_file, + ) + + if cfg.generation.retain_iter_history: + for step, h in enumerate(hypo["history"]): + _, h_str, _ = utils.post_process_prediction( + hypo_tokens=h["tokens"].int().cpu(), + src_str=src_str, + alignment=None, + align_dict=None, + tgt_dict=tgt_dict, + remove_bpe=None, + ) + print( + "E-{}_{}\t{}".format(sample_id, step, h_str), + file=output_file, + ) + + # Score only the top hypothesis + if has_target and j == 0: 
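+                        # Scorers exposing add_string() (e.g. sacrebleu/WER) consume
+                        # detokenized strings; the default BLEU scorer consumes token ids.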
+ if ( + align_dict is not None + or cfg.common_eval.post_process is not None + ): + # Convert back to tokens for evaluation with unk replacement and/or without BPE + target_tokens = tgt_dict.encode_line( + target_str, add_if_not_exist=True + ) + hypo_tokens = tgt_dict.encode_line( + detok_hypo_str, add_if_not_exist=True + ) + if hasattr(scorer, "add_string"): + scorer.add_string(target_str, detok_hypo_str) + else: + scorer.add(target_tokens, hypo_tokens) + + wps_meter.update(num_generated_tokens) + progress.log({"wps": round(wps_meter.avg)}) + num_sentences += ( + sample["nsentences"] if "nsentences" in sample else sample["id"].numel() + ) + + logger.info("NOTE: hypothesis and token scores are output in base 2") + logger.info( + "Translated {:,} sentences ({:,} tokens) in {:.1f}s ({:.2f} sentences/s, {:.2f} tokens/s)".format( + num_sentences, + gen_timer.n, + gen_timer.sum, + num_sentences / gen_timer.sum, + 1.0 / gen_timer.avg, + ) + ) + if has_target: + if cfg.bpe and not cfg.generation.sacrebleu: + if cfg.common_eval.post_process: + logger.warning( + "BLEU score is being computed by splitting detokenized string on spaces, this is probably not what you want. Use --sacrebleu for standard 13a BLEU tokenization" + ) + else: + logger.warning( + "If you are using BPE on the target side, the BLEU score is computed on BPE tokens, not on proper words. Use --sacrebleu for standard 13a BLEU tokenization" + ) + # use print to be consistent with other main outputs: S-, H-, T-, D- and so on + print( + "Generate {} with beam={}: {}".format( + cfg.dataset.gen_subset, cfg.generation.beam, scorer.result_string() + ), + file=output_file, + ) + + return scorer + + +def cli_main(): + parser = options.get_generation_parser() + # TODO: replace this workaround with refactoring of `AudioPretraining` + parser.add_argument( + "--arch", + "-a", + metavar="ARCH", + default="wav2vec2", + help="Model architecture. For constructing tasks that rely on " + "model args (e.g. `AudioPretraining`)", + ) + args = options.parse_args_and_arch(parser) + main(args) + + +if __name__ == "__main__": + cli_main() diff --git a/WavLLM/wavllm/inference/sequence_generator.py b/WavLLM/wavllm/inference/sequence_generator.py new file mode 100644 index 0000000000000000000000000000000000000000..14b500cb3d35a70546e1f9711cddf67d9a86e3c9 --- /dev/null +++ b/WavLLM/wavllm/inference/sequence_generator.py @@ -0,0 +1,1017 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +import math +import sys +from typing import Dict, List, Optional + +import torch +import time +import torch.nn as nn +import torch.nn.functional as F +from torch import Tensor + +from fairseq import search, utils +from fairseq.data import data_utils +from fairseq.models import FairseqIncrementalDecoder +from fairseq.ngram_repeat_block import NGramRepeatBlock + + +class SequenceGenerator(nn.Module): + def __init__( + self, + models, + tgt_dict, + beam_size=1, + max_len_a=0, + max_len_b=200, + max_len=0, + min_len=1, + normalize_scores=True, + len_penalty=1.0, + unk_penalty=0.0, + temperature=1.0, + match_source_len=False, + no_repeat_ngram_size=0, + search_strategy=None, + eos=None, + symbols_to_strip_from_output=None, + lm_model=None, + lm_weight=1.0, + tokens_to_suppress=(), + n_words=0, + ): + """Generates translations of a given source sentence. 
+ + Args: + models (List[~fairseq.models.FairseqModel]): ensemble of models, + currently support fairseq.models.TransformerModel for scripting + beam_size (int, optional): beam width (default: 1) + max_len_a/b (int, optional): generate sequences of maximum length + ax + b, where x is the source length + max_len (int, optional): the maximum length of the generated output + (not including end-of-sentence) + min_len (int, optional): the minimum length of the generated output + (not including end-of-sentence) + normalize_scores (bool, optional): normalize scores by the length + of the output (default: True) + len_penalty (float, optional): length penalty, where <1.0 favors + shorter, >1.0 favors longer sentences (default: 1.0) + unk_penalty (float, optional): unknown word penalty, where <0 + produces more unks, >0 produces fewer (default: 0.0) + temperature (float, optional): temperature, where values + >1.0 produce more uniform samples and values <1.0 produce + sharper samples (default: 1.0) + match_source_len (bool, optional): outputs should match the source + length (default: False) + """ + super().__init__() + if isinstance(models, EnsembleModel): + self.model = models + else: + self.model = EnsembleModel(models) + self.tgt_dict = tgt_dict + # self.pad = tgt_dict.pad() + # self.unk = tgt_dict.unk() + # self.eos = tgt_dict.eos() if eos is None else eos + self.pad = 0 ## original pad index is -1 + self.unk = 0 + self.bos = 1 + self.eos = 2 + self.symbols_to_strip_from_output = ( + symbols_to_strip_from_output.union({self.eos}) + if symbols_to_strip_from_output is not None + else {self.eos} + ) + + self.token_indices_to_suppress: Optional[Tensor] = None + token_indices_to_suppress = [] + for token_string in tokens_to_suppress: + token_index = tgt_dict.index(token_string) + assert token_index != self.unk + token_indices_to_suppress.append(token_index) + if len(token_indices_to_suppress) > 0: + self.token_indices_to_suppress = torch.Tensor( + token_indices_to_suppress + ).long() + + self.vocab_size = len(tgt_dict) + self.n_words = n_words + self.beam_size = beam_size + # the max beam size is the dictionary size - 1, since we never select pad + self.beam_size = min(beam_size, self.vocab_size - 1) + self.model.set_decoder_beam_size(self.beam_size) + self.max_len_a = max_len_a + self.max_len_b = max_len_b + self.min_len = min_len + self.max_len = max_len or self.model.max_decoder_positions() + + self.normalize_scores = normalize_scores + self.len_penalty = len_penalty + self.unk_penalty = unk_penalty + self.temperature = temperature + self.match_source_len = match_source_len + + if no_repeat_ngram_size > 0: + self.repeat_ngram_blocker = NGramRepeatBlock(no_repeat_ngram_size) + else: + self.repeat_ngram_blocker = None + + assert temperature > 0, "--temperature must be greater than 0" + + self.search = ( + search.BeamSearch(tgt_dict) if search_strategy is None else search_strategy + ) + # We only need to set src_lengths in LengthConstrainedBeamSearch. + # As a module attribute, setting it would break in multithread + # settings when the model is shared. 
+ self.should_set_src_lengths = ( + hasattr(self.search, "needs_src_lengths") and self.search.needs_src_lengths + ) + + self.model.eval() + + self.lm_model = lm_model + self.lm_weight = lm_weight + if self.lm_model is not None: + self.lm_model.eval() + + def cuda(self): + self.model.cuda() + return self + + @torch.no_grad() + def forward( + self, + sample: Dict[str, Dict[str, Tensor]], + prefix_tokens: Optional[Tensor] = None, + bos_token: Optional[int] = None, + ): + """Generate a batch of translations. + + Args: + sample (dict): batch + prefix_tokens (torch.LongTensor, optional): force decoder to begin + with these tokens + bos_token (int, optional): beginning of sentence token + (default: self.eos) + """ + return self._generate(sample, prefix_tokens, bos_token=bos_token) + + # TODO(myleott): unused, deprecate after pytorch-translate migration + def generate_batched_itr(self, data_itr, beam_size=None, cuda=False, timer=None): + """Iterate over a batched dataset and yield individual translations. + Args: + cuda (bool, optional): use GPU for generation + timer (StopwatchMeter, optional): time generations + """ + for sample in data_itr: + s = utils.move_to_cuda(sample) if cuda else sample + if "net_input" not in s: + continue + input = s["net_input"] + # model.forward normally channels prev_output_tokens into the decoder + # separately, but SequenceGenerator directly calls model.encoder + encoder_input = { + k: v for k, v in input.items() if k != "prev_output_tokens" + } + if timer is not None: + timer.start() + with torch.no_grad(): + hypos = self.generate(encoder_input) + if timer is not None: + timer.stop(sum(len(h[0]["tokens"]) for h in hypos)) + for i, id in enumerate(s["id"].data): + # remove padding + src = utils.strip_pad(input["src_tokens"].data[i, :], self.pad) + ref = ( + utils.strip_pad(s["target"].data[i, :], self.pad) + if s["target"] is not None + else None + ) + yield id, src, ref, hypos[i] + + @torch.no_grad() + def generate( + self, models, sample: Dict[str, Dict[str, Tensor]], **kwargs + ) -> List[List[Dict[str, Tensor]]]: + """Generate translations. Match the api of other fairseq generators. 
+ + Args: + models (List[~fairseq.models.FairseqModel]): ensemble of models + sample (dict): batch + prefix_tokens (torch.LongTensor, optional): force decoder to begin + with these tokens + constraints (torch.LongTensor, optional): force decoder to include + the list of constraints + bos_token (int, optional): beginning of sentence token + (default: self.eos) + """ + return self._generate(sample, **kwargs) + + def _generate( + self, + sample: Dict[str, Dict[str, Tensor]], + prefix_tokens: Optional[Tensor] = None, + constraints: Optional[Tensor] = None, + bos_token: Optional[int] = None, + ): + incremental_states = torch.jit.annotate( + List[Dict[str, Dict[str, Optional[Tensor]]]], + [ + torch.jit.annotate(Dict[str, Dict[str, Optional[Tensor]]], {}) + for i in range(self.model.models_size) + ], + ) + net_input = sample["net_input"] + tid = sample["id"] + + if "src_tokens" in net_input: + src_tokens = net_input["src_tokens"] + # length of the source text being the character length except EndOfSentence and pad + # if src_lengths exists in net_input (speech_to_text dataset case), then use it + if "src_lengths" in net_input: + src_lengths = net_input["src_lengths"] + else: + src_lengths = ( + (src_tokens.ne(self.eos) & src_tokens.ne(self.pad)) + .long() + .sum(dim=1) + ) + elif "source" in net_input: + src_tokens = net_input["source"] + src_lengths = ( + net_input["padding_mask"].size(-1) - net_input["padding_mask"].sum(-1) + if net_input["padding_mask"] is not None + else torch.tensor(src_tokens.size(-1)).to(src_tokens) + ) + elif "features" in net_input: + src_tokens = net_input["features"] + src_lengths = ( + net_input["padding_mask"].size(-1) - net_input["padding_mask"].sum(-1) + if net_input["padding_mask"] is not None + else torch.tensor(src_tokens.size(-1)).to(src_tokens) + ) + else: + raise Exception( + "expected src_tokens or source in net input. input keys: " + + str(net_input.keys()) + ) + + # bsz: total number of sentences in beam + # Note that src_tokens may have more than 2 dimensions (i.e. audio features) + # bsz, src_len = src_tokens.size()[:2] + bsz = len(src_tokens) + src_len = int(max(src_lengths)) + beam_size = self.beam_size + + if constraints is not None and not self.search.supports_constraints: + raise NotImplementedError( + "Target-side constraints were provided, but search method doesn't support them" + ) + + # Initialize constraints, when active + self.search.init_constraints(constraints, beam_size) + + max_len: int = -1 + if self.match_source_len: + max_len = src_lengths.max().item() + else: + max_len = min( + int(self.max_len_a * src_len + self.max_len_b), + self.max_len - 1, + ) + assert ( + self.min_len <= max_len + ), "min_len cannot be larger than max_len, please adjust these!" 
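+        # Hedged illustration of the source-length bookkeeping above, using a made-up
+        # 2 x 4 padding mask (True marks padded frames); these locals are examples only:
+        _demo_padding_mask = torch.tensor([[False, False, False, True], [False, True, True, True]])
+        _demo_src_lengths = _demo_padding_mask.size(-1) - _demo_padding_mask.sum(-1)  # tensor([3, 1])
+        _demo_bound = min(int(0.0 * 3 + 200), 1024 - 1)  # e.g. max_len_a=0, max_len_b=200, decoder limit 1024 -> 200
+        del _demo_padding_mask, _demo_src_lengths, _demo_bound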
+ # compute the encoder output for each beam + with torch.autograd.profiler.record_function("EnsembleModel: forward_encoder"): + #print ("net_input: ", net_input) + #print ("self.model.forward_encoder: ", self.model.forward_encoder) + all_encoder_outs = self.model.forward_encoder(net_input) + if "wavlm_src_tokens" in net_input and net_input["wavlm_src_tokens"] is not None: + orig_prompts = net_input["orig_prompts"] + prompt_embedding = None + all_wavlm_outs = [model.wavlm_encoder.forward_torchscript(net_input, prompt_embedding) for model in self.model.models] + wavlm_outs, few_shot_wavlm_outs = zip(*all_wavlm_outs) + else: + wavlm_outs, few_shot_wavlm_outs = None, None + encoder_outs, few_shot_encoder_outs = zip(*all_encoder_outs) + + # placeholder of indices for bsz * beam_size to hold tokens and accumulative scores + out_device = encoder_outs[0]['encoder_out'].device + new_order = torch.arange(bsz).view(-1, 1).repeat(1, beam_size).view(-1) + new_order = new_order.to(out_device).long() + encoder_outs = self.model.reorder_encoder_out(encoder_outs, new_order) + if wavlm_outs is not None: + wavlm_outs = self.model.reorder_encoder_out(wavlm_outs, new_order, True) + + assert encoder_outs is not None + + # initialize buffers + scores = ( + torch.zeros(bsz * beam_size, max_len + 1).to(out_device).float() + ) # +1 for eos; pad is never chosen for scoring + tokens = ( + torch.zeros(bsz * beam_size, max_len + 2) + .to(out_device) + .long() + .fill_(self.pad) + ) # +2 for eos and pad + tokens[:, 0] = self.bos if bos_token is None else bos_token + attn: Optional[Tensor] = None + + # A list that indicates candidates that should be ignored. + # For example, suppose we're sampling and have already finalized 2/5 + # samples. Then cands_to_ignore would mark 2 positions as being ignored, + # so that we only finalize the remaining 3 samples. 
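+        # Concrete (hedged) version of the example in the comment above, with made-up
+        # shapes bsz=5, beam_size=1 and sentences 1 and 3 already finalized:
+        _demo_ignore_mask = torch.tensor([[False], [True], [False], [True], [False]])
+        assert int(_demo_ignore_mask.sum()) == 2  # two of the five rows stop contributing candidates
+        del _demo_ignore_mask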
+ cands_to_ignore = ( + torch.zeros(bsz, beam_size).to(out_device).eq(-1) + ) # forward and backward-compatible False mask + + # list of completed sentences + finalized = torch.jit.annotate( + List[List[Dict[str, Tensor]]], + [torch.jit.annotate(List[Dict[str, Tensor]], []) for i in range(bsz)], + ) # contains lists of dictionaries of infomation about the hypothesis being finalized at each step + + # a boolean array indicating if the sentence at the index is finished or not + finished = [False for i in range(bsz)] + num_remaining_sent = bsz # number of sentences remaining + + # number of candidate hypos per step + cand_size = 2 * beam_size # 2 x beam size in case half are EOS + + # offset arrays for converting between different indexing schemes + bbsz_offsets = ( + (torch.arange(0, bsz) * beam_size) + .unsqueeze(1) + .type_as(tokens) + .to(out_device) + ) + cand_offsets = torch.arange(0, cand_size).type_as(tokens).to(out_device) + + reorder_state: Optional[Tensor] = None + batch_idxs: Optional[Tensor] = None + + original_batch_idxs: Optional[Tensor] = None + if "id" in sample and isinstance(sample["id"], Tensor): + original_batch_idxs = sample["id"] + else: + original_batch_idxs = torch.arange(0, bsz).type_as(tokens) + + for step in range(max_len + 1): # one extra step for EOS marker + # reorder decoder internal states based on the prev choice of beams + if reorder_state is not None: + if batch_idxs is not None: + # update beam indices to take into account removed sentences + corr = batch_idxs - torch.arange(batch_idxs.numel()).type_as( + batch_idxs + ) + reorder_state.view(-1, beam_size).add_( + corr.unsqueeze(-1) * beam_size + ) + original_batch_idxs = original_batch_idxs[batch_idxs] + self.model.reorder_incremental_state(incremental_states, reorder_state) + encoder_outs = self.model.reorder_encoder_out( + encoder_outs, reorder_state + ) + if wavlm_outs is not None: + wavlm_outs = self.model.reorder_encoder_out(wavlm_outs, reorder_state, True) + with torch.autograd.profiler.record_function( + "EnsembleModel: forward_decoder" + ): + lprobs, avg_attn_scores = self.model.forward_decoder( + tokens[:, : step + 1], + encoder_outs, + few_shot_encoder_outs, + incremental_states, + self.temperature, + net_input, + step, + wavlm_outs, + few_shot_wavlm_outs, + tid[0], + ) + #print ("step, lprobs: ", step, lprobs.shape, lprobs) + + if self.lm_model is not None: + lm_out = self.lm_model(tokens[:, : step + 1]) + probs = self.lm_model.get_normalized_probs( + lm_out, log_probs=True, sample=None + ) + probs = probs[:, -1, :] * self.lm_weight + lprobs += probs + + lprobs[lprobs != lprobs] = torch.tensor(-math.inf).to(lprobs) + + lprobs[:, self.pad] = -math.inf # never select pad + lprobs[:, self.unk] -= self.unk_penalty # apply unk penalty + + if prefix_tokens is not None: + if step == prefix_tokens.size(1): + lprobs[:, self.eos] = -math.inf + else: + if step == 0: + lprobs[:, self.eos] = -math.inf + + # handle max length constraint + if step >= max_len: + lprobs[:, : self.eos] = -math.inf + lprobs[:, self.eos + 1 :] = -math.inf + + # handle prefix tokens (possibly with different lengths) + if ( + prefix_tokens is not None + and step < prefix_tokens.size(1) + and step < max_len + ): + lprobs, tokens, scores = self._prefix_tokens( + step, lprobs, scores, tokens, prefix_tokens, beam_size + ) + else: + if step < self.min_len: + # minimum length constraint (does not apply if using prefix_tokens) + lprobs[:, self.eos] = -math.inf + + if self.token_indices_to_suppress is not None: + lprobs[:, 
self.token_indices_to_suppress] = -math.inf + + # Record attention scores, only support avg_attn_scores is a Tensor + if avg_attn_scores is not None: + if attn is None: + attn = torch.empty( + bsz * beam_size, avg_attn_scores.size(1), max_len + 2 + ).to(scores) + attn[:, :, step + 1].copy_(avg_attn_scores) + + scores = scores.type_as(lprobs) + eos_bbsz_idx = torch.empty(0).to( + tokens + ) # indices of hypothesis ending with eos (finished sentences) + eos_scores = torch.empty(0).to( + scores + ) # scores of hypothesis ending with eos (finished sentences) + + if self.should_set_src_lengths: + self.search.set_src_lengths(src_lengths) + + if self.repeat_ngram_blocker is not None: + lprobs = self.repeat_ngram_blocker(tokens, lprobs, bsz, beam_size, step) + + # Shape: (batch, cand_size) + cand_scores, cand_indices, cand_beams = self.search.step( + step, + lprobs.view(bsz, -1, self.n_words), + scores.view(bsz, beam_size, -1)[:, :, :step], + tokens[:, : step + 1], + original_batch_idxs, + ) + + # cand_bbsz_idx contains beam indices for the top candidate + # hypotheses, with a range of values: [0, bsz*beam_size), + # and dimensions: [bsz, cand_size] + cand_bbsz_idx = cand_beams.add(bbsz_offsets) + + # finalize hypotheses that end in eos + # Shape of eos_mask: (batch size, beam size) + eos_mask = cand_indices.eq(self.eos) & cand_scores.ne(-math.inf) + eos_mask[:, :beam_size][cands_to_ignore] = torch.tensor(0).to(eos_mask) + + # only consider eos when it's among the top beam_size indices + # Now we know what beam item(s) to finish + # Shape: 1d list of absolute-numbered + eos_bbsz_idx = torch.masked_select( + cand_bbsz_idx[:, :beam_size], mask=eos_mask[:, :beam_size] + ) + + finalized_sents: List[int] = [] + if eos_bbsz_idx.numel() > 0: + eos_scores = torch.masked_select( + cand_scores[:, :beam_size], mask=eos_mask[:, :beam_size] + ) + + finalized_sents = self.finalize_hypos( + step, + eos_bbsz_idx, + eos_scores, + tokens, + scores, + finalized, + finished, + beam_size, + attn, + src_lengths, + max_len, + ) + num_remaining_sent -= len(finalized_sents) + + assert num_remaining_sent >= 0 + if num_remaining_sent == 0: + break + if self.search.stop_on_max_len and step >= max_len: + break + assert step < max_len, f"{step} < {max_len}" + + # Remove finalized sentences (ones for which {beam_size} + # finished hypotheses have been generated) from the batch. 
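+            # Hedged toy version of the batch-shrinking logic below (bsz=4, sentence 2 just
+            # finished); the tensors are invented and discarded right away:
+            _demo_batch_mask = torch.ones(4, dtype=torch.bool)
+            _demo_batch_mask[2] = False
+            _demo_batch_idxs = torch.arange(4).masked_select(_demo_batch_mask)  # tensor([0, 1, 3])
+            del _demo_batch_mask, _demo_batch_idxs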
+ if len(finalized_sents) > 0: + new_bsz = bsz - len(finalized_sents) + + # construct batch_idxs which holds indices of batches to keep for the next pass + batch_mask = torch.ones( + bsz, dtype=torch.bool, device=cand_indices.device + ) + batch_mask[finalized_sents] = False + # TODO replace `nonzero(as_tuple=False)` after TorchScript supports it + batch_idxs = torch.arange( + bsz, device=cand_indices.device + ).masked_select(batch_mask) + + # Choose the subset of the hypothesized constraints that will continue + self.search.prune_sentences(batch_idxs) + + eos_mask = eos_mask[batch_idxs] + cand_beams = cand_beams[batch_idxs] + bbsz_offsets.resize_(new_bsz, 1) + cand_bbsz_idx = cand_beams.add(bbsz_offsets) + cand_scores = cand_scores[batch_idxs] + cand_indices = cand_indices[batch_idxs] + + if prefix_tokens is not None: + prefix_tokens = prefix_tokens[batch_idxs] + src_lengths = src_lengths[batch_idxs] + cands_to_ignore = cands_to_ignore[batch_idxs] + + scores = scores.view(bsz, -1)[batch_idxs].view(new_bsz * beam_size, -1) + tokens = tokens.view(bsz, -1)[batch_idxs].view(new_bsz * beam_size, -1) + if attn is not None: + attn = attn.view(bsz, -1)[batch_idxs].view( + new_bsz * beam_size, attn.size(1), -1 + ) + bsz = new_bsz + else: + batch_idxs = None + + # Set active_mask so that values > cand_size indicate eos hypos + # and values < cand_size indicate candidate active hypos. + # After, the min values per row are the top candidate active hypos + + # Rewrite the operator since the element wise or is not supported in torchscript. + + eos_mask[:, :beam_size] = ~((~cands_to_ignore) & (~eos_mask[:, :beam_size])) + active_mask = torch.add( + eos_mask.type_as(cand_offsets) * cand_size, + cand_offsets[: eos_mask.size(1)], + ) + + # get the top beam_size active hypotheses, which are just + # the hypos with the smallest values in active_mask. + # {active_hypos} indicates which {beam_size} hypotheses + # from the list of {2 * beam_size} candidates were + # selected. Shapes: (batch size, beam size) + new_cands_to_ignore, active_hypos = torch.topk( + active_mask, k=beam_size, dim=1, largest=False + ) + + # update cands_to_ignore to ignore any finalized hypos. + cands_to_ignore = new_cands_to_ignore.ge(cand_size)[:, :beam_size] + # Make sure there is at least one active item for each sentence in the batch. + assert (~cands_to_ignore).any(dim=1).all() + + # update cands_to_ignore to ignore any finalized hypos + + # {active_bbsz_idx} denotes which beam number is continued for each new hypothesis (a beam + # can be selected more than once). 
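+            # Hedged toy example of the gather below, with bsz=1, beam_size=2 and invented
+            # candidate values: picking candidate positions 2 and 0 continues flat beam index 0 twice.
+            _demo_active = torch.gather(torch.tensor([[0, 1, 0, 1]]), dim=1, index=torch.tensor([[2, 0]]))
+            assert _demo_active.tolist() == [[0, 0]]
+            del _demo_active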
+ active_bbsz_idx = torch.gather(cand_bbsz_idx, dim=1, index=active_hypos) + active_scores = torch.gather(cand_scores, dim=1, index=active_hypos) + + active_bbsz_idx = active_bbsz_idx.view(-1) + active_scores = active_scores.view(-1) + + # copy tokens and scores for active hypotheses + + # Set the tokens for each beam (can select the same row more than once) + tokens[:, : step + 1] = torch.index_select( + tokens[:, : step + 1], dim=0, index=active_bbsz_idx + ) + # Select the next token for each of them + tokens.view(bsz, beam_size, -1)[:, :, step + 1] = torch.gather( + cand_indices, dim=1, index=active_hypos + ) + if step > 0: + scores[:, :step] = torch.index_select( + scores[:, :step], dim=0, index=active_bbsz_idx + ) + scores.view(bsz, beam_size, -1)[:, :, step] = torch.gather( + cand_scores, dim=1, index=active_hypos + ) + + # Update constraints based on which candidates were selected for the next beam + self.search.update_constraints(active_hypos) + + # copy attention for active hypotheses + if attn is not None: + attn[:, :, : step + 2] = torch.index_select( + attn[:, :, : step + 2], dim=0, index=active_bbsz_idx + ) + + # reorder incremental state in decoder + reorder_state = active_bbsz_idx + + # sort by score descending + for sent in range(len(finalized)): + scores = torch.tensor( + [float(elem["score"].item()) for elem in finalized[sent]] + ) + _, sorted_scores_indices = torch.sort(scores, descending=True) + finalized[sent] = [finalized[sent][ssi] for ssi in sorted_scores_indices] + finalized[sent] = torch.jit.annotate( + List[Dict[str, Tensor]], finalized[sent] + ) + return finalized + + def _prefix_tokens( + self, step: int, lprobs, scores, tokens, prefix_tokens, beam_size: int + ): + """Handle prefix tokens""" + prefix_toks = prefix_tokens[:, step].unsqueeze(-1).repeat(1, beam_size).view(-1) + prefix_lprobs = lprobs.gather(-1, prefix_toks.unsqueeze(-1)) + prefix_mask = prefix_toks.ne(self.pad) + lprobs[prefix_mask] = torch.tensor(-math.inf).to(lprobs) + lprobs[prefix_mask] = lprobs[prefix_mask].scatter( + -1, prefix_toks[prefix_mask].unsqueeze(-1), prefix_lprobs[prefix_mask] + ) + # if prefix includes eos, then we should make sure tokens and + # scores are the same across all beams + eos_mask = prefix_toks.eq(self.eos) + if eos_mask.any(): + # validate that the first beam matches the prefix + first_beam = tokens[eos_mask].view(-1, beam_size, tokens.size(-1))[ + :, 0, 1 : step + 1 + ] + eos_mask_batch_dim = eos_mask.view(-1, beam_size)[:, 0] + target_prefix = prefix_tokens[eos_mask_batch_dim][:, :step] + assert (first_beam == target_prefix).all() + + # copy tokens, scores and lprobs from the first beam to all beams + tokens = self.replicate_first_beam(tokens, eos_mask_batch_dim, beam_size) + scores = self.replicate_first_beam(scores, eos_mask_batch_dim, beam_size) + lprobs = self.replicate_first_beam(lprobs, eos_mask_batch_dim, beam_size) + return lprobs, tokens, scores + + def replicate_first_beam(self, tensor, mask, beam_size: int): + tensor = tensor.view(-1, beam_size, tensor.size(-1)) + tensor[mask] = tensor[mask][:, :1, :] + return tensor.view(-1, tensor.size(-1)) + + def finalize_hypos( + self, + step: int, + bbsz_idx, + eos_scores, + tokens, + scores, + finalized: List[List[Dict[str, Tensor]]], + finished: List[bool], + beam_size: int, + attn: Optional[Tensor], + src_lengths, + max_len: int, + ): + """Finalize hypothesis, store finalized information in `finalized`, and change `finished` accordingly. 
+ A sentence is finalized when {beam_size} finished items have been collected for it. + + Returns number of sentences (not beam items) being finalized. + These will be removed from the batch and not processed further. + Args: + bbsz_idx (Tensor): + """ + assert bbsz_idx.numel() == eos_scores.numel() + + # clone relevant token and attention tensors. + # tokens is (batch * beam, max_len). So the index_select + # gets the newly EOS rows, then selects cols 1..{step + 2} + tokens_clone = tokens.index_select(0, bbsz_idx)[ + :, 1 : step + 2 + ] # skip the first index, which is EOS + + tokens_clone[:, step] = self.eos + attn_clone = ( + attn.index_select(0, bbsz_idx)[:, :, 1 : step + 2] + if attn is not None + else None + ) + + # compute scores per token position + pos_scores = scores.index_select(0, bbsz_idx)[:, : step + 1] + pos_scores[:, step] = eos_scores + # convert from cumulative to per-position scores + pos_scores[:, 1:] = pos_scores[:, 1:] - pos_scores[:, :-1] + + # normalize sentence-level scores + if self.normalize_scores: + eos_scores /= (step + 1) ** self.len_penalty + + # cum_unfin records which sentences in the batch are finished. + # It helps match indexing between (a) the original sentences + # in the batch and (b) the current, possibly-reduced set of + # sentences. + cum_unfin: List[int] = [] + prev = 0 + for f in finished: + if f: + prev += 1 + else: + cum_unfin.append(prev) + cum_fin_tensor = torch.tensor(cum_unfin, dtype=torch.int).to(bbsz_idx) + + unfin_idx = torch.div(bbsz_idx, beam_size, rounding_mode="trunc") + sent = unfin_idx + torch.index_select(cum_fin_tensor, 0, unfin_idx) + + # Create a set of "{sent}{unfin_idx}", where + # "unfin_idx" is the index in the current (possibly reduced) + # list of sentences, and "sent" is the index in the original, + # unreduced batch + # For every finished beam item + # sentence index in the current (possibly reduced) batch + seen = (sent << 32) + unfin_idx + unique_seen: List[int] = torch.unique(seen).tolist() + + if self.match_source_len: + condition = step > torch.index_select(src_lengths, 0, unfin_idx) + eos_scores = torch.where(condition, torch.tensor(-math.inf), eos_scores) + sent_list: List[int] = sent.tolist() + for i in range(bbsz_idx.size()[0]): + # An input sentence (among those in a batch) is finished when + # beam_size hypotheses have been collected for it + if len(finalized[sent_list[i]]) < beam_size: + if attn_clone is not None: + # remove padding tokens from attn scores + hypo_attn = attn_clone[i] + else: + hypo_attn = torch.empty(0) + + finalized[sent_list[i]].append( + { + "tokens": tokens_clone[i], + "score": eos_scores[i], + "attention": hypo_attn, # src_len x tgt_len + "alignment": torch.empty(0), + "positional_scores": pos_scores[i], + } + ) + + newly_finished: List[int] = [] + for unique_s in unique_seen: + # check termination conditions for this sentence + unique_sent: int = unique_s >> 32 + unique_unfin_idx: int = unique_s - (unique_sent << 32) + + if not finished[unique_sent] and self.is_finished( + step, unique_unfin_idx, max_len, len(finalized[unique_sent]), beam_size + ): + finished[unique_sent] = True + newly_finished.append(unique_unfin_idx) + + return newly_finished + + def is_finished( + self, + step: int, + unfin_idx: int, + max_len: int, + finalized_sent_len: int, + beam_size: int, + ): + """ + Check whether decoding for a sentence is finished, which + occurs when the list of finalized sentences has reached the + beam size, or when we reach the maximum length. 
+ """ + assert finalized_sent_len <= beam_size + if finalized_sent_len == beam_size or step == max_len: + return True + return False + + +class EnsembleModel(nn.Module): + """A wrapper around an ensemble of models.""" + + def __init__(self, models): + super().__init__() + self.models_size = len(models) + # method '__len__' is not supported in ModuleList for torch script + self.single_model = models[0] + self.models = nn.ModuleList(models) + + self.has_incremental: bool = False + if all( + hasattr(m, "decoder") and isinstance(m.decoder, FairseqIncrementalDecoder) + for m in models + ): + self.has_incremental = True + + if all(hasattr(m, "gpt_model") for m in models): + self.has_incremental = True + + def forward(self): + pass + + def has_encoder(self): + #return hasattr(self.single_model, "encoder") + return hasattr(self.single_model, "audio_encoder") + + def has_incremental_states(self): + return self.has_incremental + + def max_decoder_positions(self): + return min( + [ + m.max_decoder_positions() + for m in self.models + if hasattr(m, "max_decoder_positions") + ] + + [sys.maxsize] + ) + + def set_decoder_beam_size(self, beam_size): + """Set beam size for efficient beamable enc-dec attention.""" + if beam_size > 1: + for model in self.models: + if hasattr(model, "set_beam_size"): + model.set_beam_size(beam_size) + + def concate_audio_wavlm(self, model, wavlm_audio_out, audio_out): + if audio_out['encoder_out'].size(1) != wavlm_audio_out['encoder_out'].size(1): + out_len = min(audio_out['encoder_out'].size(1), wavlm_audio_out['encoder_out'].size(1)) + audio_out['encoder_out'] = audio_out['encoder_out'][:, : out_len, :] + audio_out["encoder_padding_mask"] = audio_out["encoder_padding_mask"][:, : out_len] + wavlm_audio_out['encoder_out'] = wavlm_audio_out['encoder_out'][:, : out_len, :] + wavlm_audio_out["encoder_padding_mask"] = wavlm_audio_out["encoder_padding_mask"][:, : out_len] + + audio_out['encoder_out'] = torch.cat((audio_out['encoder_out'], wavlm_audio_out['encoder_out']), dim=2) + audio_out['encoder_out'] = model.wavlm_audio_proj(audio_out['encoder_out']) + return audio_out + + @torch.jit.export + def forward_encoder(self, net_input: Dict[str, Tensor]): + #print ("self.has_encoder(): ", self.has_encoder()) + if not self.has_encoder(): + return None + #return [model.encoder.forward_torchscript(net_input) for model in self.models] + return [model.audio_encoder.forward_torchscript(net_input) for model in self.models] + + @torch.jit.export + def forward_decoder( + self, + tokens, + encoder_outs: List[Dict[str, List[Tensor]]], + few_shot_encoder_outs: List[Dict[str, List[Tensor]]], + incremental_states: List[Dict[str, Dict[str, Optional[Tensor]]]], + temperature: float = 1.0, + net_input: Dict[str, Optional[Tensor]] = None, + step: int = 0, + wavlm_outs: List[Dict[str, List[Tensor]]] = None, + few_shot_wavlm_outs: List[Dict[str, List[Tensor]]] = None, + tid: int = 0, + ): + log_probs = [] + avg_attn: Optional[Tensor] = None + encoder_out: Optional[Dict[str, List[Tensor]]] = None + few_shot_encoder_out: Optional[Dict[str, List[Tensor]]] = None + wavlm_out: Optional[Dict[str, List[Tensor]]] = None + for i, model in enumerate(self.models): + if self.has_encoder(): + encoder_out = encoder_outs[i] + if few_shot_encoder_outs[0] is not None: + few_shot_encoder_out = few_shot_encoder_outs[i] + if wavlm_outs is not None: + wavlm_out = wavlm_outs[i] + if few_shot_wavlm_outs is not None: + few_shot_wavlm_out = few_shot_wavlm_outs[i] + # decode each model + if self.has_incremental_states(): + # 
decoder_out = model.decoder.forward( + # tokens, + # encoder_out=encoder_out, + # incremental_state=incremental_states[i], + # ) + examples = net_input["prev_output_tokens"] + orig_prompts = net_input["orig_prompts"] + left_prompts = net_input["left_prompts"] + left_prompt_masks = net_input["left_prompt_masks"] + if few_shot_encoder_out is not None: + mid_prompts = net_input["mid_prompts"] + mid_prompt_masks = net_input["mid_prompt_masks"] + else: + mid_prompts = None + mid_prompt_masks = None + + prompt_masks = net_input["prompt_masks"] + target_masks = net_input["target_masks"] + index = net_input["index"] + + lora_index = net_input["lora_index"] + if lora_index[0] == -1: + lora_index = None + + # codec + if step == 0: + audio_codec = net_input["audio_codec"] + codec_masks = net_input["codec_masks"] + + encoder_out = self.concate_audio_wavlm(model, wavlm_out, encoder_out) + start_pos = step + # TODO + # encoder_out = None + decoder_out = model.gpt_model.forward_generate(tid, tokens, start_pos, examples, encoder_out, index, orig_prompts, lora_index, few_shot_encoder_out=few_shot_encoder_out, left_prompts=left_prompts, mid_prompts=mid_prompts, prompt_masks=prompt_masks, target_masks=target_masks, left_prompt_masks=left_prompt_masks, mid_prompt_masks=mid_prompt_masks, incremental_state=incremental_states[i]) + else: + if hasattr(model, "decoder"): + decoder_out = model.decoder.forward(tokens, encoder_out=encoder_out) + else: + decoder_out = model.forward(tokens) + + attn: Optional[Tensor] = None + decoder_len = len(decoder_out) + if decoder_len > 1 and decoder_out[1] is not None: + if isinstance(decoder_out[1], Tensor): + attn = decoder_out[1] + else: + attn_holder = decoder_out[1]["attn"] + if isinstance(attn_holder, Tensor): + attn = attn_holder + elif attn_holder is not None: + attn = attn_holder[0] + if attn is not None: + attn = attn[:, -1, :] + + decoder_out_tuple = ( + decoder_out[0][:, -1:, :].div_(temperature), + None if decoder_len <= 1 else decoder_out[1], + ) + probs = model.get_normalized_probs( + decoder_out_tuple, log_probs=True, sample=None + ) + probs = probs[:, -1, :] + + if self.models_size == 1: + return probs, attn + + log_probs.append(probs) + if attn is not None: + if avg_attn is None: + avg_attn = attn + else: + avg_attn.add_(attn) + + avg_probs = torch.logsumexp(torch.stack(log_probs, dim=0), dim=0) - math.log( + self.models_size + ) + if avg_attn is not None: + avg_attn.div_(self.models_size) + return avg_probs, avg_attn + + @torch.jit.export + def reorder_encoder_out( + self, encoder_outs: Optional[List[Dict[str, List[Tensor]]]], new_order, wavlm=False + ): + """ + Reorder encoder output according to *new_order*. 
+ + Args: + encoder_out: output from the ``forward()`` method + new_order (LongTensor): desired order + + Returns: + *encoder_out* rearranged according to *new_order* + """ + new_outs: List[Dict[str, List[Tensor]]] = [] + if not self.has_encoder(): + return new_outs + for i, model in enumerate(self.models): + assert encoder_outs is not None + if wavlm: + new_outs.append( + model.wavlm_encoder.reorder_encoder_out(encoder_outs[i], new_order) + ) + else: + new_outs.append( + model.audio_encoder.reorder_encoder_out(encoder_outs[i], new_order) + ) + return new_outs + + @torch.jit.export + def reorder_incremental_state( + self, + incremental_states: List[Dict[str, Dict[str, Optional[Tensor]]]], + new_order, + ): + if not self.has_incremental_states(): + return + for i, model in enumerate(self.models): + model.gpt_model.reorder_incremental_state_scripting( + incremental_states[i], new_order + ) \ No newline at end of file diff --git a/WavLLM/wavllm/models/llama.py b/WavLLM/wavllm/models/llama.py new file mode 100644 index 0000000000000000000000000000000000000000..c85295a2d1d5fe4cfce2afe3dfdd0f75c2b20926 --- /dev/null +++ b/WavLLM/wavllm/models/llama.py @@ -0,0 +1,693 @@ +import torch +import torch.nn as nn +import torch.nn.functional as F +from torch.nn import Embedding, Linear + +from fairseq.data.data_utils import lengths_to_padding_mask, lengths_to_mask +from fairseq.models import ( + FairseqEncoder, + FairseqDecoder, + FairseqEncoderDecoderModel, + register_model, + register_model_architecture, +) + +from torch.distributed.fsdp.wrap import _or_policy, lambda_auto_wrap_policy, transformer_auto_wrap_policy +import torch.utils.checkpoint as cp +from torch.distributed.fsdp.fully_sharded_data_parallel import CPUOffload +from torch.distributed.fsdp import ( + FullyShardedDataParallel as FSDP, + MixedPrecision +) + +from typing import Optional, Tuple, List, Dict +from dataclasses import dataclass +from sentencepiece import SentencePieceProcessor +from torch import Tensor + +import math +import json +import os +import numpy as np +import functools +import time + +try: + from xformers.ops import memory_efficient_attention, LowerTriangularMask, MemoryEfficientAttentionCutlassOp +except ModuleNotFoundError: + print ("xformers.ops ModuleNotFoundError") + memory_efficient_attention, LowerTriangularMask, MemoryEfficientAttentionCutlassOp = None, None, None + +from functools import partial + +from torch.distributed.algorithms._checkpoint.checkpoint_wrapper import ( + checkpoint_wrapper, + CheckpointImpl, + apply_activation_checkpointing, +) + +non_reentrant_wrapper = partial( + checkpoint_wrapper, + checkpoint_impl=CheckpointImpl.NO_REENTRANT, +) + +check_fn = lambda submodule: isinstance(submodule, TransformerBlock) + + +def apply_fsdp_checkpointing(model): + """apply activation checkpointing to model + returns None as model is updated directly + """ + print(f"--> applying fsdp activation checkpointing...") + + apply_activation_checkpointing( + model, checkpoint_wrapper_fn=non_reentrant_wrapper, check_fn=check_fn + ) + +def get_llama_wrapper(): + """we register our main layer class and use the fsdp transformer wrapping policy + ensures embedding layers are in the root fsdp unit for shared access and that fsdp units map to transformer layers + """ + + transformer_wrap_policy = functools.partial( + transformer_auto_wrap_policy, + transformer_layer_cls={ + FeedForward, + }, + ) + auto_wrap_policy = transformer_wrap_policy + # auto_wrap_policy = functools.partial(_or_policy, policies=[lambda_policy, 
transformer_wrap_policy]) + return auto_wrap_policy + +class LLaMADecoder(FairseqDecoder): + + # TODO: change it to incremental decoder!! + + def __init__( + self, dictionary, llama_checkpoint, n_xatt, d_att, d_ffn, use_lora, lora_r, lora_alpha, lora_scale_train, + lora_scale_index, lora_scale_random, lora_task_index, lora_moe, lora_moe_scaling, lora_moe_n_experts, + enable_fsdp, use_xformers, second_stage_update_scale, second_stage_fix_lora, lora_only_qv, scale_only_one, scale_with_audio, + scale_0_1, scale_predict_time, scale_predict_all_dim, scale_predict_all_dim_each_layer, prompt_loss, + use_llama_adapter, + ): + super().__init__(dictionary) + model_args = ModelArgs(n_xatt=n_xatt, d_att=d_att, d_ffn=d_ffn, use_lora=use_lora, lora_r=lora_r, lora_alpha=lora_alpha, lora_scale_train=lora_scale_train, + lora_scale_index=lora_scale_index, lora_scale_random=lora_scale_random, lora_task_index=lora_task_index, n_experts=lora_moe_n_experts, + lora_moe=lora_moe, lora_moe_scaling=lora_moe_scaling, use_xformers=use_xformers, second_stage_update_scale=second_stage_update_scale, second_stage_fix_lora=second_stage_fix_lora, + lora_only_qv=lora_only_qv, scale_only_one=scale_only_one, scale_with_audio=scale_with_audio, scale_0_1=scale_0_1, + scale_predict_time=scale_predict_time,scale_predict_all_dim=scale_predict_all_dim, + scale_predict_all_dim_each_layer=scale_predict_all_dim_each_layer, prompt_loss=prompt_loss, + use_llama_adapter=use_llama_adapter,enable_fsdp=enable_fsdp) + self.model_llama = LLAMA(model_args) + # checkpoint = torch.load(llama_checkpoint, map_location="cpu") + # self.model_llama.load_state_dict(checkpoint, strict=False) + apply_fsdp_checkpointing(self.model_llama) + + def forward(self, prev_output_tokens, audio_out, index, orig_prompts, left_prompts=None, target_masks=None, prompt_masks=None, left_prompt_masks=None, speech_middle=False, speech_flag=None, example_audio_out=None, mid_prompts=None, mid_prompt_masks=None,lora_index=None): + return self.model_llama(prev_output_tokens, audio_out, index, orig_prompts, left_prompts, target_masks, prompt_masks, left_prompt_masks, speech_middle=speech_middle, speech_flag=speech_flag, example_audio_out=example_audio_out, mid_prompts=mid_prompts, mid_prompt_masks=mid_prompt_masks,lora_index=lora_index), None + + def forward_generate(self, tid, prev_output_tokens, start_pos, examples, audio_out, index, orig_prompts, lora_index=None, few_shot_encoder_out=None, left_prompts=None, mid_prompts=None, target_masks=None, prompt_masks=None, left_prompt_masks=None, mid_prompt_masks=None, speech_flag=None, incremental_state=None): + return self.model_llama.forward_generate(tid, prev_output_tokens, start_pos, examples, audio_out, index, orig_prompts, lora_index=lora_index, few_shot_encoder_out=few_shot_encoder_out, left_prompts=left_prompts, mid_prompts=mid_prompts,target_masks=target_masks, prompt_masks=prompt_masks, left_prompt_masks=left_prompt_masks, mid_prompt_masks=mid_prompt_masks, speech_flag=speech_flag, incremental_state=incremental_state), None + + def reorder_incremental_state_scripting( + self, + incremental_state: Dict[str, Dict[str, Optional[Tensor]]], + new_order: Tensor, + ): + return self.model_llama.reorder_incremental_state_scripting(incremental_state, new_order) + + +# LLAMA model +@dataclass +class ModelArgs: + dim: int = 4096 + n_layers: int = 32 + n_heads: int = 32 + vocab_size: int = 32000 # defined later by tokenizer + multiple_of: int = 256 # make SwiGLU hidden layer size multiple of large power of 2 + norm_eps: float = 1e-6 + 
# max_batch_size: int = 32 + max_seq_len: int = 1024 + + n_xatt: int = 16 + d_att: int = 256 + d_ffn: int = 256 + + gradient_checkpointing: bool = False + + use_lora: bool = True + lora_scale_train: bool = False + lora_only_qv: bool = False + lora_scale_index: bool = False + lora_scale_random: bool = False + lora_task_index: bool = False + lora_r: int = 8 + lora_alpha: int = 32 + lora_dropout: float = 0.1 + lora_moe: bool = False + lora_moe_scaling: bool = False + n_experts: int = 3 + + flash_attention: bool = False + use_xformers: bool = False + second_stage_update_scale: bool = False + second_stage_fix_lora: bool = False + scale_only_one: bool = False + scale_with_audio: bool = True + scale_0_1: bool = True + scale_predict_time: bool = False + scale_predict_all_dim: bool = False + scale_predict_all_dim_each_layer: bool = False + prompt_loss: bool = False + + use_llama_adapter: bool = False + add_bias: bool = False + add_scale: bool = False + + enable_fsdp: bool = False + +class RMSNorm(torch.nn.Module): + def __init__(self, dim: int, eps: float = 1e-6): + super().__init__() + self.eps = eps + self.weight = nn.Parameter(torch.ones(dim)) + + def _norm(self, x): + return x * torch.rsqrt(x.pow(2).mean(-1, keepdim=True) + self.eps) + + def forward(self, x): + output = self._norm(x.float()).type_as(x) + return output * self.weight + + +def precompute_freqs_cis(dim: int, end: int, theta: float = 10000.0): + freqs = 1.0 / (theta ** (torch.arange(0, dim, 2)[: (dim // 2)].float() / dim)) + t = torch.arange(end, device=freqs.device) # type: ignore + freqs = torch.outer(t, freqs).float() # type: ignore + freqs_cis = torch.polar(torch.ones_like(freqs), freqs) # complex64 + return freqs_cis + + +def reshape_for_broadcast(freqs_cis: torch.Tensor, x: torch.Tensor): + ndim = x.ndim + assert 0 <= 1 < ndim + assert freqs_cis.shape == (x.shape[1], x.shape[-1]) + shape = [d if i == 1 or i == ndim - 1 else 1 for i, d in enumerate(x.shape)] + return freqs_cis.view(*shape) + + +def apply_rotary_emb( + xq: torch.Tensor, + xk: torch.Tensor, + freqs_cis: torch.Tensor, +) -> Tuple[torch.Tensor, torch.Tensor]: + xq_ = torch.view_as_complex(xq.float().reshape(*xq.shape[:-1], -1, 2)) + xk_ = torch.view_as_complex(xk.float().reshape(*xk.shape[:-1], -1, 2)) + freqs_cis = reshape_for_broadcast(freqs_cis, xq_) + xq_out = torch.view_as_real(xq_ * freqs_cis).flatten(3) + xk_out = torch.view_as_real(xk_ * freqs_cis).flatten(3) + return xq_out.type_as(xq), xk_out.type_as(xk) + +class Attention_LoRA(nn.Module): + def __init__(self, args): + super().__init__() + + self.n_local_heads = args.n_heads + self.head_dim = args.dim // args.n_heads + + self.wq = Linear( + args.dim, + args.n_heads * self.head_dim, + bias=False + ) + self.wk = Linear( + args.dim, + args.n_heads * self.head_dim, + bias=False + ) + self.wv = Linear( + args.dim, + args.n_heads * self.head_dim, + bias=False + ) + self.wo = Linear( + args.n_heads * self.head_dim, + args.dim, + bias=False + ) + + self.wq_lora_A = nn.Parameter(self.wq.weight.new_zeros((args.dim, args.lora_r))) + self.wq_lora_B = nn.Parameter(self.wq.weight.new_zeros((args.lora_r, args.dim))) + self.wk_lora_A = nn.Parameter(self.wk.weight.new_zeros((args.dim, args.lora_r))) + self.wk_lora_B = nn.Parameter(self.wk.weight.new_zeros((args.lora_r, args.dim))) + self.wv_lora_A = nn.Parameter(self.wv.weight.new_zeros((args.dim, args.lora_r))) + self.wv_lora_B = nn.Parameter(self.wv.weight.new_zeros((args.lora_r, args.dim))) + self.wo_lora_A = nn.Parameter(self.wo.weight.new_zeros((args.dim, 
args.lora_r))) + self.wo_lora_B = nn.Parameter(self.wo.weight.new_zeros((args.lora_r, args.dim))) + self.scaling = args.lora_alpha / args.lora_r + + self.reset_parameters() + + if args.lora_dropout > 0.: + self.lora_dropout = nn.Dropout(p=args.lora_dropout) + else: + self.lora_dropout = lambda x: x + + # TODO use incremental states + # self.cache_k = torch.zeros( + # (args.max_batch_size, args.max_seq_len, self.n_local_heads, self.head_dim) + # ).cuda() + # self.cache_v = torch.zeros( + # (args.max_batch_size, args.max_seq_len, self.n_local_heads, self.head_dim) + # ).cuda() + + def reset_parameters(self): + nn.init.kaiming_uniform_(self.wq_lora_A, a=math.sqrt(5)) + nn.init.kaiming_uniform_(self.wk_lora_A, a=math.sqrt(5)) + nn.init.kaiming_uniform_(self.wv_lora_A, a=math.sqrt(5)) + nn.init.kaiming_uniform_(self.wo_lora_A, a=math.sqrt(5)) + nn.init.zeros_(self.wq_lora_B) + nn.init.zeros_(self.wk_lora_B) + nn.init.zeros_(self.wv_lora_B) + nn.init.zeros_(self.wo_lora_B) + + def _checkpointed_forward(self, x): + xq = self.wq(x) + (self.lora_dropout(x) @ self.wq_lora_A @ self.wq_lora_B) * self.scaling + xk = self.wk(x) + (self.lora_dropout(x) @ self.wk_lora_A @ self.wk_lora_B) * self.scaling + xv = self.wv(x) + (self.lora_dropout(x) @ self.wv_lora_A @ self.wv_lora_B) * self.scaling + return xq, xk, xv + + def forward(self, x: torch.Tensor, start_pos: int, freqs_cis: torch.Tensor, + mask: Optional[torch.Tensor], index, lora_index, lora_weights, pooled_scale_output, is_prompt=False, + incremental_state=None, gradient_checkpointing=False, layer_id=None): + + bsz, seqlen, _ = x.shape + if is_prompt: + xq = self.wq(x) + xk = self.wk(x) + xv = self.wv(x) + else: + if pooled_scale_output is not None: + pooled_scale_output = pooled_scale_output.type_as(x) + xq = self.wq(x) + (self.lora_dropout(x) @ self.wq_lora_A @ self.wq_lora_B) * pooled_scale_output.unsqueeze(1) + xk = self.wk(x) + (self.lora_dropout(x) @ self.wk_lora_A @ self.wk_lora_B) * pooled_scale_output.unsqueeze(1) + xv = self.wv(x) + (self.lora_dropout(x) @ self.wv_lora_A @ self.wv_lora_B) * pooled_scale_output.unsqueeze(1) + + xq = xq.view(bsz, seqlen, self.n_local_heads, self.head_dim) + xk = xk.view(bsz, seqlen, self.n_local_heads, self.head_dim) + xv = xv.view(bsz, seqlen, self.n_local_heads, self.head_dim) + + xq, xk = apply_rotary_emb(xq, xk, freqs_cis=freqs_cis) + + if incremental_state is not None: + if "prev_key" in incremental_state: + prev_key = incremental_state["prev_key"].view( + bsz, -1, self.n_local_heads, self.head_dim + ) + prev_value = incremental_state["prev_value"].view( + bsz, -1, self.n_local_heads, self.head_dim + ) + + xk = torch.cat([prev_key, xk], dim=1) + xv = torch.cat([prev_value, xv], dim=1) + #print ("test1") + + incremental_state["prev_key"] = xk.view( + bsz, -1, self.n_local_heads, self.head_dim + ) + incremental_state["prev_value"] = xv.view( + bsz, -1, self.n_local_heads, self.head_dim + ) + #src_len = k.size(1) + + + keys = xk + values = xv + + xq = xq.transpose(1, 2) + keys = keys.transpose(1, 2) + values = values.transpose(1, 2) + + scores = torch.matmul(xq, keys.transpose(2, 3)) / math.sqrt(self.head_dim) + # print("scores: ", scores.shape) + if mask is not None: + scores = scores + mask # (bs, n_local_heads, slen, cache_len + slen) + + scores = F.softmax(scores.float(), dim=-1).type_as(xq) + output = torch.matmul(scores, values) # (bs, n_local_heads, slen, head_dim) + output = output.transpose( + 1, 2 + ).contiguous().view(bsz, seqlen, -1) + if is_prompt: + return self.wo(output) + else: + if 
pooled_scale_output is not None: + return self.wo(output) + (self.lora_dropout(output) @ self.wo_lora_A @ self.wo_lora_B) * pooled_scale_output.unsqueeze(1) + +class Attention(nn.Module): + def __init__(self, args: ModelArgs): + super().__init__() + + self.n_local_heads = args.n_heads + self.head_dim = args.dim // args.n_heads + + self.wq = Linear( + args.dim, + args.n_heads * self.head_dim, + bias=False + ) + self.wk = Linear( + args.dim, + args.n_heads * self.head_dim, + bias=False + ) + self.wv = Linear( + args.dim, + args.n_heads * self.head_dim, + bias=False + ) + self.wo = Linear( + args.n_heads * self.head_dim, + args.dim, + bias=False + ) + + # TODO use incremental states + # self.cache_k = torch.zeros( + # (args.max_batch_size, args.max_seq_len, self.n_local_heads, self.head_dim) + # ).cuda() + # self.cache_v = torch.zeros( + # (args.max_batch_size, args.max_seq_len, self.n_local_heads, self.head_dim) + # ).cuda() + + self.flash_attention = args.flash_attention + + def _checkpointed_forward(self, x): + return self.wq(x), self.wk(x), self.wv(x) + + def forward(self, x: torch.Tensor, start_pos: int, freqs_cis: torch.Tensor, + mask: Optional[torch.Tensor], index, lora_index, lora_weights, pooled_scale_output, is_prompt=False, incremental_state=None, gradient_checkpointing=False): + + bsz, seqlen, _ = x.shape + if gradient_checkpointing and self.training: + xq, xk, xv = cp.checkpoint(self._checkpointed_forward, x) + else: + xq, xk, xv = self.wq(x), self.wk(x), self.wv(x) + + xq = xq.view(bsz, seqlen, self.n_local_heads, self.head_dim) + xk = xk.view(bsz, seqlen, self.n_local_heads, self.head_dim) + xv = xv.view(bsz, seqlen, self.n_local_heads, self.head_dim) + + xq, xk = apply_rotary_emb(xq, xk, freqs_cis=freqs_cis) + + if incremental_state is not None: + if "prev_key" in incremental_state: + prev_key = incremental_state["prev_key"].view( + bsz, -1, self.n_local_heads, self.head_dim + ) + prev_value = incremental_state["prev_value"].view( + bsz, -1, self.n_local_heads, self.head_dim + ) + xk = torch.cat([prev_key, xk], dim=1) + xv = torch.cat([prev_value, xv], dim=1) + #print ("test1") + incremental_state["prev_key"] = xk.view( + bsz, -1, self.n_local_heads, self.head_dim + ) + incremental_state["prev_value"] = xv.view( + bsz, -1, self.n_local_heads, self.head_dim + ) + #src_len = k.size(1) + + if self.flash_attention: + # attn_bias = LowerTriangularMask() + attn_bias = mask + attn = memory_efficient_attention(xq, xk, xv, attn_bias, op=MemoryEfficientAttentionCutlassOp) # B M H K + attn = attn.contiguous().view(bsz, seqlen, -1) + return self.wo(attn) + else: + keys = xk + values = xv + + xq = xq.transpose(1, 2) + keys = keys.transpose(1, 2) + values = values.transpose(1, 2) + #print ("xq: ", xq.shape) + #print ("keys: ", keys.shape) + scores = torch.matmul(xq, keys.transpose(2, 3)) / math.sqrt(self.head_dim) + + if mask is not None: + scores = scores + mask # (bs, n_local_heads, slen, cache_len + slen) + + scores = F.softmax(scores.float(), dim=-1).type_as(xq) + output = torch.matmul(scores, values) # (bs, n_local_heads, slen, head_dim) + output = output.transpose( + 1, 2 + ).contiguous().view(bsz, seqlen, -1) + + return self.wo(output) + +class FeedForward(nn.Module): + def __init__( + self, + dim: int, + hidden_dim: int, + multiple_of: int, + ): + super().__init__() + hidden_dim = int(2 * hidden_dim / 3) + hidden_dim = multiple_of * ((hidden_dim + multiple_of - 1) // multiple_of) + + self.w1 = Linear( + dim, hidden_dim, bias=False + ) + self.w2 = Linear( + hidden_dim, dim, 
bias=False + ) + self.w3 = Linear( + dim, hidden_dim, bias=False + ) + + def forward(self, x, gradient_checkpointing): + if gradient_checkpointing and self.training: + output = cp.checkpoint(self._checkpointed_forward, x) + else: + output = self.w2(F.silu(self.w1(x)) * self.w3(x)) + return output + + def _checkpointed_forward(self, x): + return self.w2(F.silu(self.w1(x)) * self.w3(x)) + + +class TransformerBlock(nn.Module): + def __init__(self, layer_id: int, args: ModelArgs): + super().__init__() + self.n_heads = args.n_heads + self.dim = args.dim + self.head_dim = args.dim // args.n_heads + self.ffn_gradient_checkpointing = args.gradient_checkpointing + if args.use_lora: + self.attention = Attention_LoRA(args) + else: + self.attention = Attention(args) + + self.feed_forward = FeedForward( + dim=args.dim, hidden_dim=4 * args.dim, multiple_of=args.multiple_of + ) + self.layer_id = layer_id + self.attention_norm = RMSNorm(args.dim, eps=args.norm_eps) + self.ffn_norm = RMSNorm(args.dim, eps=args.norm_eps) + + wrapping_policy = get_llama_wrapper() + fpSixteen = MixedPrecision( + param_dtype=torch.float16, + reduce_dtype=torch.float16, + buffer_dtype=torch.float16 + ) + if args.enable_fsdp: + self.feed_forward = FSDP( + self.feed_forward, + auto_wrap_policy = wrapping_policy, + device_id=torch.cuda.current_device(), + limit_all_gathers=True, + mixed_precision=fpSixteen, + # cpu_offload=CPUOffload(offload_params=True), + ) + + + def forward(self, x: torch.Tensor, start_pos: int, freqs_cis: torch.Tensor, mask: Optional[torch.Tensor], index, lora_index, lora_weights, pooled_scale_output, is_prompt=False, incremental_state=None,): + h = x + self.attention.forward(self.attention_norm(x), start_pos, freqs_cis, mask, index, lora_index, lora_weights, pooled_scale_output, is_prompt, incremental_state, layer_id=self.layer_id) + ffn_output = self.feed_forward.forward(self.ffn_norm(h), self.ffn_gradient_checkpointing) + out = h + ffn_output + return out + +class LLAMA(nn.Module): + def __init__(self, params: ModelArgs): + super().__init__() + self.params = params + self.vocab_size = params.vocab_size + self.n_layers = params.n_layers + self.n_xatt = params.n_xatt + self.lora_moe = params.lora_moe + self.lora_moe_scaling = params.lora_moe_scaling + self.second_stage_update_scale = params.second_stage_update_scale + self.second_stage_fix_lora = params.second_stage_fix_lora + self.prompt_loss = params.prompt_loss + self.scale_only_one = params.scale_only_one + self.scale_with_audio = params.scale_with_audio + self.scale_0_1 = params.scale_0_1 + self.scale_predict_time = params.scale_predict_time + self.scale_predict_all_dim = params.scale_predict_all_dim + self.scale_predict_all_dim_each_layer = params.scale_predict_all_dim_each_layer + if self.second_stage_update_scale: + self.scale_fc_1 = nn.Linear(4096, 1024) + self.scale_fc_2 = nn.Linear(1024, 4096) + self.scale_weight_attention = nn.Linear(4096, 1) + self.scale_fc_nonliner = F.gelu + + self.scale_predictor = None + self.tok_embeddings = Embedding( + params.vocab_size, params.dim + ) + + self.layers = torch.nn.ModuleList() + for layer_id in range(params.n_layers): + self.layers.append(TransformerBlock(layer_id, params)) + + self.norm = RMSNorm(params.dim, eps=params.norm_eps) + self.output = Linear( + params.dim, params.vocab_size, bias=False + ) + self.infer_pooled_prompt_output = None + + self.freqs_cis = precompute_freqs_cis( + self.params.dim // self.params.n_heads, self.params.max_seq_len * 2 + ) + + def freeze_module(self, module): + for param in 
module.parameters(): + param.requires_grad = False + + def get_text_embedding(self, text): + return self.tok_embeddings(text) + + def forward_generate(self, tid, prev_output_tokens, start_pos, examples, audio_out, index, orig_prompts, lora_index=None, few_shot_encoder_out=None, left_prompts=None, mid_prompts=None, target_masks=None, prompt_masks=None, left_prompt_masks=None, mid_prompt_masks=None, speech_flag=None, incremental_state=None): + with torch.no_grad(): + if audio_out is not None: + moe_weights = None + + if self.second_stage_update_scale: + if start_pos == 0: + prompt_h = self.tok_embeddings(orig_prompts) + _, prompt_seqlen, _ = prompt_h.shape + prompt_freqs_cis = self.freqs_cis.to(prompt_h.device) + prompt_freqs_cis = prompt_freqs_cis[ : prompt_seqlen] + + mask = None + if prompt_seqlen > 1: + mask = torch.full((1, 1, prompt_seqlen, prompt_seqlen), float("-inf"), device=prompt_h.device) + mask = torch.triu(mask, diagonal= 0 + 1).type_as(prompt_h) + + for i in range(self.n_layers): + prompt_h = self.layers[i](prompt_h, 0, prompt_freqs_cis, mask, -1, None, None, None, is_prompt=True) + scale_output = self.scale_fc_2(self.scale_fc_nonliner(self.scale_fc_1(prompt_h))) + + scale_attn_weights = F.softmax(self.scale_weight_attention(scale_output), dim=1) + weighted_scale = scale_output * scale_attn_weights + weighted_sum_scale = weighted_scale.sum(dim=1) + + pooled_scale_output = torch.clamp(torch.relu(weighted_sum_scale), max=3) + self.infer_pooled_prompt_output = pooled_scale_output + else: + pooled_scale_output = self.infer_pooled_prompt_output + else: + pooled_scale_output = None + + is_prompt = False + if start_pos == 0: + if left_prompts is not None: + left_h = self.tok_embeddings(left_prompts) + h = self.tok_embeddings(prev_output_tokens) + if left_prompts is not None: + h = torch.cat((left_h, audio_out["encoder_out"], h), dim=1) + else: + h = torch.cat((audio_out["encoder_out"], h), dim=1) + else: + prev_output_tokens = prev_output_tokens[:, -1:] + if left_prompts is not None: + start_pos = start_pos + left_prompts.shape[1] + audio_out["encoder_out"].shape[1] + else: + start_pos = start_pos + audio_out["encoder_out"].shape[1] + h = self.tok_embeddings(prev_output_tokens) + else: + if start_pos == 0: + h = self.tok_embeddings(prev_output_tokens) + else: + prev_output_tokens = prev_output_tokens[:, -1:] + h = self.tok_embeddings(prev_output_tokens) + + _bsz, seqlen, _ = h.shape + freqs_cis = self.freqs_cis.to(h.device) + freqs_cis = freqs_cis[start_pos : start_pos + seqlen] + mask = None + if seqlen > 1: + mask = torch.full((1, 1, seqlen, seqlen), float("-inf"), device=h.device) + mask = torch.triu(mask, diagonal=start_pos + 1).type_as(h) + + #start_pos = 0 + for i in range(self.n_layers): + if i not in incremental_state: + incremental_state[i] = {} + h = self.layers[i](h, start_pos, freqs_cis, mask, index, lora_index, moe_weights, pooled_scale_output, is_prompt, incremental_state[i]) + + h = self.norm(h) + + out = self.output(h) + return out + + def reorder_incremental_state_scripting( + self, + incremental_state: Dict[str, Dict[str, Optional[Tensor]]], + new_order: Tensor, + ): + for key in incremental_state: + for param_name in incremental_state[key]: + if incremental_state[key][param_name] is not None: + incremental_state[key][param_name] = incremental_state[key][param_name].index_select(0, new_order) + +class Tokenizer: + def __init__(self, model_path: str): + # reload tokenizer + assert os.path.isfile(model_path), model_path + self.sp_model = 
SentencePieceProcessor(model_file=model_path) + + # BOS / EOS token IDs + self.n_words: int = self.sp_model.vocab_size() + self.bos_id: int = self.sp_model.bos_id() + self.eos_id: int = self.sp_model.eos_id() + self.pad_id: int = self.sp_model.pad_id() + + print(self.n_words) + print(self.bos_id) + print(self.eos_id) + print(self.pad_id) + print(self.sp_model.unk_id()) + + assert self.sp_model.vocab_size() == self.sp_model.get_piece_size() + + def encode(self, s: str, bos: bool, eos: bool) -> List[int]: + assert type(s) is str + t = self.sp_model.encode(s) + if bos: + t = [self.bos_id] + t + if eos: + t = t + [self.eos_id] + return t + + def decode(self, t: List[int]) -> str: + return self.sp_model.decode(t) + + + + + diff --git a/WavLLM/wavllm/models/speechllm_model.py b/WavLLM/wavllm/models/speechllm_model.py new file mode 100644 index 0000000000000000000000000000000000000000..50905e242d8125f067eb4954b17c4f258a94ffb6 --- /dev/null +++ b/WavLLM/wavllm/models/speechllm_model.py @@ -0,0 +1,299 @@ +#!/usr/bin/env python3 + +import logging +import math +import os +import time +from pathlib import Path +from typing import Dict, List, Optional, Tuple +from dataclasses import dataclass, field + +import torch +import torch.nn as nn +import torch.nn.functional as F +from torch import Tensor +from collections import OrderedDict + +from fairseq import checkpoint_utils, utils +from fairseq.data.data_utils import lengths_to_padding_mask +from fairseq.models import ( + FairseqEncoderModel, + FairseqDecoder, + FairseqEncoderDecoderModel, + BaseFairseqModel, + register_model, + register_model_architecture, +) +from fairseq.models.speech_to_text.hub_interface import S2THubInterface +from fairseq.models.transformer import Embedding, TransformerDecoder +from fairseq.modules import ( + FairseqDropout, + LayerNorm, + PositionalEmbedding, + TransformerEncoderLayer, +) +from .llama import LLaMADecoder +from .whisper_encoder import FairseqWhisperEncoder, WhisperAdapter +from .wavlm import FairseqWavLMEncoder +from omegaconf import II +from fairseq.dataclass import ChoiceEnum, FairseqDataclass + +logger = logging.getLogger(__name__) + +class Conv1dSubsampler(nn.Module): + """Convolutional subsampler: a stack of 1D convolution (along temporal + dimension) followed by non-linear activation via gated linear units + (https://arxiv.org/abs/1911.08460) + + Args: + in_channels (int): the number of input channels + mid_channels (int): the number of intermediate channels + out_channels (int): the number of output channels + kernel_sizes (List[int]): the kernel size for each convolutional layer + """ + + def __init__( + self, + in_channels: int, + mid_channels: int, + out_channels: int, + kernel_sizes: List[int] = (3, 3), + strides: List[int] = (2, 2), + ): + super(Conv1dSubsampler, self).__init__() + self.strides = strides + self.n_layers = len(kernel_sizes) + self.conv_layers = nn.ModuleList( + nn.Conv1d( + in_channels if i == 0 else mid_channels // 2, + mid_channels if i < self.n_layers - 1 else out_channels * 2, + k, + stride=s, + padding=k // 2, + ) + for i, (k, s) in enumerate(zip(kernel_sizes, strides)) + ) + + def get_out_seq_lens_tensor(self, in_seq_lens_tensor): + out = in_seq_lens_tensor.clone() + for i in range(self.n_layers): + out = ((out.float() - 1) / self.strides[i] + 1).floor().long() + return out + + def forward(self, src_tokens, src_lengths): + bsz, in_seq_len, _ = src_tokens.size() # B x T x (C x D) + x = src_tokens.transpose(1, 2).contiguous() # -> B x (C x D) x T + for conv in self.conv_layers: + x = 
conv(x) + x = nn.functional.glu(x, dim=1) + _, _, out_seq_len = x.size() + x = x.transpose(1, 2).transpose(0, 1).contiguous() # -> T x B x (C x D) + return x, self.get_out_seq_lens_tensor(src_lengths) + +@dataclass +class DecoderConfig(FairseqDataclass): + # Text + vocab_size: int = -1 + + # Fairscale + checkpoint_activations: bool = False + fsdp: bool = False + ddp_rank: int = 0 + flash_attention: bool = False + sope_rel_pos: bool = False + scale_length: int = 2048 + + + def override(self, args): + for hp in self.__dict__.keys(): + if getattr(args, hp, None) is not None: + self.__dict__[hp] = getattr(args, hp, None) + + +@dataclass +class SpeechLLMMOdelConfig(DecoderConfig): + llama_checkpoint: str = "" + speechllm_checkpoint: str = "" + vicuna_model_path: str = 'lmsys/vicuna-7b-v1.5' + # n_xatt: int = 16 + n_xatt: int = field( + default=16, + metadata={"help": "the number of xatt"}, + ) + d_att: int = field( + default=256, + metadata={"help": "the dimension of xatt"}, + ) + d_ffn: int = field( + default=256, + metadata={"help": "the dimension of ffn in xatt"}, + ) + freeze_gpt: bool = field( + default=True + ) + freeze_audio_encoder: bool = field( + default=True + ) + whisper_path: str = field( + default="openai/whisper-large-v2" + # default="openai/whisper-small.en" + ) + wavlm_path: str = field( + default="microsoft/wavlm-base" + # default="openai/whisper-small.en" + ) + wavlm_output_weight: bool = field( + default=False + ) + wavlm_output_weight_by_prompts: bool = field( + default=False + ) + wavlm_plus: bool = field( + default=False + ) + wavlm_plus_weight: bool = field( + default=False + ) + wavlm_plus_1layer: bool = field( + default=False + ) + wavlm_plus_1layer_5: bool = field( + default=False + ) + wavlm_plus_5layer: bool = field( + default=False + ) + wavlm_first_7_layers: bool = field( + default=False + ) + load_pretrained_encoder_from: str = field( + default="" + ) + use_lora: bool = field( + default=False + ) + +class TextEmbedding(nn.Embedding): + def reset_parameters(self): + nn.init.normal_(self.weight, mean=0, std=self.embedding_dim**-0.5) + self._fill_padding_idx_with_zero() + + +@register_model("speechllm_model", dataclass=SpeechLLMMOdelConfig) +class SpeechLLMModel(BaseFairseqModel): + """Adapted Transformer model (https://arxiv.org/abs/1706.03762) for + speech-to-text tasks. The Transformer encoder/decoder remains the same. 
+ A trainable input subsampler is prepended to the Transformer encoder to + project inputs into the encoder dimension as well as downsample input + sequence for computational efficiency.""" + + def __init__(self, cfg: SpeechLLMMOdelConfig, task): + super().__init__() + logger.info(f"SpeechLLMModel Config: {cfg}") + self.cfg = cfg + self.task = task + cfg.freeze_audio_encoder = task.cfg.freeze_audio_encoder + if task.cfg.llama_2: + cfg.llama_checkpoint = task.cfg.llama_2_path + + + self.audio_encoder = self.build_audio_encoder(cfg, task) + if self.task.cfg.use_wavlm: + self.cfg.wavlm_output_weight = task.cfg.wavlm_output_weight + self.cfg.wavlm_output_weight_by_prompts = task.cfg.wavlm_output_weight_by_prompts + self.cfg.wavlm_plus = task.cfg.wavlm_plus + self.cfg.wavlm_plus_weight = task.cfg.wavlm_plus_weight + self.cfg.wavlm_plus_1layer = task.cfg.wavlm_plus_1layer + self.cfg.wavlm_plus_1layer_5 = task.cfg.wavlm_plus_1layer_5 + self.cfg.wavlm_plus_5layer = task.cfg.wavlm_plus_5layer + self.cfg.wavlm_first_7_layers = task.cfg.wavlm_first_7_layers + self.wavlm_encoder = self.build_wavlm_encoder(cfg, task) + self.wavlm_audio_proj = nn.Linear(4096, 4096) + self.gpt_model = self.build_gpt_model(cfg, task) + self.audio_proj = nn.Linear(2048, 4096) + + @classmethod + def build_audio_encoder(cls, cfg, task): + if task.cfg.is_whisper: + if not task.cfg.whisper_with_decoder: + cfg.whisper_path = "openai/whisper-large-v2" + encoder = FairseqWhisperEncoder(cfg) + + pretraining_path = getattr(cfg, "load_pretrained_encoder_from", None) + if pretraining_path is not None and len(pretraining_path) > 0: + if not Path(pretraining_path).exists(): + logger.warning( + f"skipped pretraining because {pretraining_path} does not exist" + ) + else: + state = torch.load(pretraining_path, map_location="cpu") + + component_state_dict = OrderedDict() + component_type = "audio_encoder" + for key in state["model"].keys(): + if key.startswith(component_type): + # encoder.input_layers.0.0.weight --> input_layers.0.0.weight + component_subkey = key[len(component_type) + 1 :] + component_state_dict[component_subkey] = state["model"][key] + encoder.load_state_dict(component_state_dict, strict=True) + + logger.info(f"loaded pretrained encoder from: {pretraining_path}") + return encoder + + @classmethod + def build_wavlm_encoder(cls, cfg, task): + cfg.wavlm_path = "microsoft/wavlm-base" + encoder = FairseqWavLMEncoder(cfg) + return encoder + + @classmethod + def build_gpt_model(cls, cfg, task): + gpt_model = LLaMADecoder(dictionary=task.tgt_dict, + llama_checkpoint=cfg.llama_checkpoint, + n_xatt=cfg.n_xatt, + d_att=cfg.d_att, + d_ffn=cfg.d_ffn, + use_lora=task.cfg.use_lora, + lora_only_qv=task.cfg.lora_only_qv, + lora_scale_train=task.cfg.lora_scale_train, + lora_scale_index=task.cfg.lora_scale_index, + lora_scale_random=task.cfg.lora_scale_random, + lora_task_index=task.cfg.lora_task_index, + lora_moe=task.cfg.lora_moe, + lora_moe_scaling=task.cfg.lora_moe_scaling, + lora_moe_n_experts=task.cfg.lora_moe_n_experts, + lora_r=task.cfg.lora_r, + lora_alpha=task.cfg.lora_alpha, + enable_fsdp=task.cfg.enable_fsdp, + use_xformers=task.cfg.use_xformers, + second_stage_update_scale=task.cfg.second_stage_update_scale, + second_stage_fix_lora=task.cfg.second_stage_fix_lora, + scale_only_one=task.cfg.scale_only_one, + scale_with_audio=task.cfg.scale_with_audio, + scale_0_1=task.cfg.scale_0_1, + scale_predict_time=task.cfg.scale_predict_time, + scale_predict_all_dim=task.cfg.scale_predict_all_dim, + 
scale_predict_all_dim_each_layer=task.cfg.scale_predict_all_dim_each_layer, + prompt_loss=task.cfg.prompt_loss, + use_llama_adapter=task.cfg.use_llama_adapter,) + return gpt_model + + @classmethod + def build_model(cls, cfg, task): + """Build a new model instance.""" + return cls(cfg, task) + + def get_targets(self, sample, net_output): + return sample['target'][sample['net_input']['target_masks']] + + def get_normalized_probs( + self, + net_output: Tuple[Tensor, Optional[Dict[str, List[Optional[Tensor]]]]], + log_probs: bool, + sample: Optional[Dict[str, Tensor]] = None, + ): + # net_output['encoder_out'] is a (B, T, D) tensor + #print ("net_output: ", net_output) + lprobs = self.get_normalized_probs_scriptable(net_output[0], log_probs, sample) + lprobs.batch_first = True + return lprobs \ No newline at end of file diff --git a/WavLLM/wavllm/models/wavlm.py b/WavLLM/wavllm/models/wavlm.py new file mode 100644 index 0000000000000000000000000000000000000000..a03d19e12a95d0eb2955d377d3e1925da9f728fa --- /dev/null +++ b/WavLLM/wavllm/models/wavlm.py @@ -0,0 +1,159 @@ +from typing import Optional, Tuple +from dataclasses import dataclass +import os + +import torch +from transformers import WavLMConfig +from transformers.models.wavlm.modeling_wavlm import WavLMModel, WavLMEncoderLayerStableLayerNorm +from transformers.utils import ModelOutput + +import torch.nn as nn +import torch.nn.functional as F +from torch.nn import Embedding, Linear + +from fairseq.data.data_utils import lengths_to_padding_mask, lengths_to_mask +from fairseq.models import ( + FairseqEncoder, + FairseqDecoder, + FairseqEncoderDecoderModel, + register_model, + register_model_architecture, +) +from ..modules.convolution import Conv1dSubsampler +from typing import Optional, Tuple, List +from dataclasses import dataclass +from sentencepiece import SentencePieceProcessor +import math +import json +import os +import numpy as np +from functools import partial +import time + +from torch.distributed.algorithms._checkpoint.checkpoint_wrapper import ( + checkpoint_wrapper, + CheckpointImpl, + apply_activation_checkpointing, +) + +non_reentrant_wrapper = partial( + checkpoint_wrapper, + checkpoint_impl=CheckpointImpl.NO_REENTRANT, +) + +check_fn_encoder = lambda submodule: isinstance(submodule, WavLMEncoderLayerStableLayerNorm) + +def apply_fsdp_checkpointing(model): + """apply activation checkpointing to model + returns None as model is updated directly + """ + print(f"--> applying fsdp activation checkpointing...") + + apply_activation_checkpointing( + model, checkpoint_wrapper_fn=non_reentrant_wrapper, check_fn=check_fn_encoder + ) + +class WavLMAdapter(nn.Module): + def __init__( + self, + input_size: int, + down_size: int, + activation: str + ): + super(WavLMAdapter, self).__init__() + self.down_layer = nn.Linear(input_size, down_size) + self.up_layer = nn.Linear(down_size, input_size) + self.non_linearity = F.relu if activation == 'relu' else F.gelu + self.layer_norm = nn.LayerNorm(input_size) + + def forward(self, src_tokens): + return self.layer_norm(self.up_layer(self.non_linearity(self.down_layer(src_tokens)))) + src_tokens + +class FairseqWavLMEncoder(FairseqEncoder): + + def __init__(self, args): + super().__init__(None) + + torch.set_printoptions(precision=10) + + self.model = WavLMModel.from_pretrained(args.wavlm_path) + self.config = self.model.config + self.wavlm_plus = args.wavlm_plus + self.wavlm_plus_weight = args.wavlm_plus_weight + self.wavlm_plus_1layer = args.wavlm_plus_1layer + self.wavlm_plus_1layer_5 = 
args.wavlm_plus_1layer_5 + self.wavlm_plus_5layer = args.wavlm_plus_5layer + + for param in self.model.parameters(): + param.requires_grad = False + + self.wavlm_output_weight = args.wavlm_output_weight + self.wavlm_output_weight_by_prompts = args.wavlm_output_weight_by_prompts + self.wavlm_first_7_layers = args.wavlm_first_7_layers + self.adapter = WavLMAdapter(1024, 512, "gelu") + self.projector = nn.Linear(1024, 2048) + self.subsample = Conv1dSubsampler(768, 512, 1024, [3, 3]) + if self.wavlm_output_weight: + initial_weights = torch.ones(13, requires_grad=True).float() + self.output_weights = nn.Parameter(initial_weights).to(self.projector.weight) + self.weights_predictor = None + + def forward( + self, src_tokens, attention_mask, prompt_embedding=None, + ): + extract_features = self.model.feature_extractor(src_tokens) + extract_features = extract_features.transpose(1, 2) + attention_mask = self.model._get_feature_vector_attention_mask(extract_features.shape[1], attention_mask) + hidden_states, extract_features = self.model.feature_projection(extract_features) + hidden_states = self.model._mask_hidden_states( + hidden_states, attention_mask=attention_mask + ) + + encoder_outputs = self.model.encoder( + hidden_states, + attention_mask=attention_mask, + output_attentions=False, + output_hidden_states=True, + return_dict=True, + ) + + src_lengths = attention_mask.sum(-1).to(torch.long) + if self.wavlm_output_weight: + norm_output_weights = F.softmax(self.output_weights, dim=0) + weighted_output = [output * weight for output, weight in zip(encoder_outputs.hidden_states, norm_output_weights)] + wavlm_output = torch.stack(weighted_output).sum(dim=0) + + outputs, src_lengths = self.subsample(wavlm_output, src_lengths) + outputs = outputs.transpose(0, 1).contiguous() + outputs = self.adapter(outputs) + outputs = self.projector(outputs) + + attention_mask = lengths_to_mask(src_lengths) + return { + "encoder_out": outputs, # B T C + "encoder_padding_mask": attention_mask # B T + } + + + def forward_torchscript(self, net_input, prompt_embedding=None): + + example_wavlm_src_tokens = net_input.get("example_wavlm_src_tokens") + example_wavlm_speech_masks = net_input.get("example_wavlm_speech_masks") + example_wavlm_audio_out = None + + wavlm_src_tokens = net_input["wavlm_src_tokens"] + wavlm_speech_masks = net_input["wavlm_speech_masks"] + + wavlm_input = torch.stack(wavlm_src_tokens, dim=0) + wavlm_speech_masks_input = torch.stack(wavlm_speech_masks, dim=0) + wavlm_audio_out = self.forward(src_tokens=wavlm_input, attention_mask=wavlm_speech_masks_input, prompt_embedding=prompt_embedding) + + return wavlm_audio_out, example_wavlm_audio_out + + def reorder_encoder_out(self, encoder_out, new_order): + new_encoder_hidden = encoder_out["encoder_out"].index_select(0, new_order) + new_encoder_padding_mask = encoder_out["encoder_padding_mask"].to(new_order.device).index_select(0, new_order) + return { + "encoder_out": new_encoder_hidden, # B T C + "encoder_padding_mask": new_encoder_padding_mask # B T + } diff --git a/WavLLM/wavllm/models/whisper_encoder.py b/WavLLM/wavllm/models/whisper_encoder.py new file mode 100644 index 0000000000000000000000000000000000000000..065a7a7ed141d939256612e8a6d18d44f10c545e --- /dev/null +++ b/WavLLM/wavllm/models/whisper_encoder.py @@ -0,0 +1,219 @@ +from typing import Optional, Tuple +from dataclasses import dataclass +import os + +import torch +from transformers import WhisperConfig +from transformers.models.whisper.modeling_whisper import WhisperEncoder as 
HFWhisperEncoder +from transformers.utils import ModelOutput +from transformers.models.whisper.modeling_whisper import WhisperEncoderLayer, WhisperDecoderLayer +import torch.nn as nn +import torch.nn.functional as F +from torch.nn import Embedding, Linear + +from fairseq.data.data_utils import lengths_to_padding_mask, lengths_to_mask +from fairseq.models import ( + FairseqEncoder, + FairseqDecoder, + FairseqEncoderDecoderModel, + register_model, + register_model_architecture, +) +from ..modules.convolution import Conv1dSubsampler +from typing import Optional, Tuple, List +from dataclasses import dataclass +from sentencepiece import SentencePieceProcessor +import math +import json +import os +import numpy as np +from functools import partial +import time + +from torch.distributed.algorithms._checkpoint.checkpoint_wrapper import ( + checkpoint_wrapper, + CheckpointImpl, + apply_activation_checkpointing, +) + +non_reentrant_wrapper = partial( + checkpoint_wrapper, + checkpoint_impl=CheckpointImpl.NO_REENTRANT, +) + +check_fn_encoder = lambda submodule: isinstance(submodule, WhisperEncoderLayer) + +def apply_fsdp_checkpointing(model): + """apply activation checkpointing to model + returns None as model is updated directly + """ + print(f"--> applying fsdp activation checkpointing...") + + apply_activation_checkpointing( + model, checkpoint_wrapper_fn=non_reentrant_wrapper, check_fn=check_fn_encoder + ) + +def lengths_to_padding_mask(lens): + bsz, max_lens = lens.size(0), torch.max(lens).item() + mask = torch.arange(max_lens).to(lens.device).view(1, max_lens) + mask = mask.expand(bsz, -1) >= lens.view(bsz, 1).expand(-1, max_lens) + return mask + +@dataclass +class WhisperOutput(ModelOutput): + last_hidden_state: torch.FloatTensor = None + hidden_states: Optional[Tuple[torch.FloatTensor]] = None + attentions: Optional[Tuple[torch.FloatTensor]] = None + output_lengths: Optional[torch.LongTensor] = None + +class WhisperAdapter(nn.Module): + def __init__( + self, + input_size: int, + down_size: int, + activation: str + ): + super(WhisperAdapter, self).__init__() + self.down_layer = nn.Linear(input_size, down_size) + self.up_layer = nn.Linear(down_size, input_size) + self.non_linearity = F.relu if activation == 'relu' else F.gelu + self.layer_norm = nn.LayerNorm(input_size) + + def forward(self, src_tokens): + return self.layer_norm(self.up_layer(self.non_linearity(self.down_layer(src_tokens)))) + src_tokens + +class FairseqWhisperEncoder(FairseqEncoder): + + def __init__(self, args): + super().__init__(None) + self.model = WhisperEncoder.from_pretrained(args.whisper_path) + self.config = self.model.config + + for param in self.model.parameters(): + param.requires_grad = False + + + self.adapter = WhisperAdapter(1024, 512, "gelu") + self.projector = nn.Linear(1024, 2048) + self.subsample = Conv1dSubsampler(1280, 1280, 1024, [3, 3]) + apply_fsdp_checkpointing(self.model) + + def forward( + self, src_tokens, attention_mask, + ): + hidden_states = src_tokens + encoder_outputs = self.model( + hidden_states, + attention_mask=attention_mask, + output_attentions=False, + output_hidden_states=False, + return_dict=True, + ) + + speech_lengths = encoder_outputs.output_lengths + outputs, speech_lengths = self.subsample(encoder_outputs.last_hidden_state, speech_lengths) + outputs = outputs.transpose(0, 1).contiguous() + speech_padding_mask = lengths_to_padding_mask(speech_lengths) + speech_atts = ~speech_padding_mask + outputs = self.adapter(outputs) + outputs = self.projector(outputs) + return { + # 
"encoder_out": encoder_outputs[0], # B T C + "encoder_out": outputs, # B T C + "encoder_padding_mask": speech_atts + } + + def split_wav_codec(self, audio_out, wav_n): + ori_type = audio_out['encoder_out'].dtype + split_audio_out = torch.split(audio_out['encoder_out'], wav_n, dim=0) + split_padding_mask = torch.split(audio_out['encoder_padding_mask'], wav_n, dim=0) + padded_audio_out = [] + padded_padding_mask = [] + for a, p in zip(split_audio_out, split_padding_mask): + if a.shape[0] < max(wav_n): + a_size = list(a.shape) + a_size[0] = max(wav_n) - a.shape[0] + a_pad_tensor = torch.zeros(a_size).to(a.device) + a = torch.cat((a, a_pad_tensor), dim=0) + + p_size = list(p.shape) + p_size[0] = max(wav_n) - p.shape[0] + p_pad_tensor = torch.zeros(p_size).bool().to(p.device) + p = torch.cat((p, p_pad_tensor), dim=0) + padded_audio_out.append(a) + padded_padding_mask.append(p) + + audio_out['encoder_out'] = torch.stack([torch.cat(tuple(t[i] for i in range(max(wav_n))), dim=0) for t in padded_audio_out]).to(ori_type) + audio_out['encoder_padding_mask'] = torch.stack([torch.cat(tuple(t[i] for i in range(max(wav_n))), dim=0) for t in padded_padding_mask]) + return audio_out + + def forward_torchscript(self, net_input): + example_src_tokens = net_input.get("example_src_tokens") + example_speech_masks = net_input.get("example_speech_masks") + example_audio_out = None + + src_tokens = net_input["src_tokens"] + speech_masks = net_input["speech_masks"] + + wav_n = [len(src_token) for src_token in src_tokens] + + stacked_input = torch.stack([tensor for lst in src_tokens for tensor in lst]) + stacked_mask = torch.stack([tensor for lst in speech_masks for tensor in lst]) + + audio_out = self.forward(stacked_input, stacked_mask) + + audio_out = self.split_wav_codec(audio_out, wav_n) + example_audio_out = None + + return audio_out, example_audio_out + + def reorder_encoder_out(self, encoder_out, new_order): + new_encoder_hidden = encoder_out["encoder_out"].index_select(0, new_order) + new_encoder_padding_mask = encoder_out["encoder_padding_mask"].to(new_order.device).index_select(0, new_order) + return { + "encoder_out": new_encoder_hidden, # B T C + "encoder_padding_mask": new_encoder_padding_mask # B T + } + +class WhisperEncoder(HFWhisperEncoder): + """ + overwrite forward to support attention_mask + overwrite from_pretrained to support split encoder parameters from pretrained WhisperModel + """ + + def from_pretrained(model_path): + config = WhisperConfig.from_pretrained(model_path) + + model = WhisperEncoder(config) + return model + + def forward( + self, + input_features, + attention_mask=None, + head_mask=None, + output_attentions=None, + output_hidden_states=None, + return_dict=None, + ): + output = super().forward( + input_features, + attention_mask, + head_mask, + output_attentions, + output_hidden_states, + return_dict + ) + + last_hidden_state = output.last_hidden_state # B x T x C + input_lengths = attention_mask.sum(-1) + output_lengths = self._get_feat_extract_output_lengths(input_lengths) + max_length = output_lengths.max() + last_hidden_state = last_hidden_state[:,:max_length,:] + + return WhisperOutput( + last_hidden_state=last_hidden_state, + hidden_states=None, + attentions=None, + output_lengths=output_lengths + ) \ No newline at end of file diff --git a/WavLLM/wavllm/modules/convolution.py b/WavLLM/wavllm/modules/convolution.py new file mode 100644 index 0000000000000000000000000000000000000000..65dd9bbc1decff25e91bc949eca637ee6ed8064e --- /dev/null +++ 
b/WavLLM/wavllm/modules/convolution.py @@ -0,0 +1,126 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + + +from typing import List + +import torch +import torch.nn as nn + + +class Conv1dSubsampler(nn.Module): + """Convolutional subsampler: a stack of 1D convolution (along temporal + dimension) followed by non-linear activation via gated linear units + (https://arxiv.org/abs/1911.08460) + + Args: + in_channels (int): the number of input channels + mid_channels (int): the number of intermediate channels + out_channels (int): the number of output channels + kernel_sizes (List[int]): the kernel size for each convolutional layer + """ + + def __init__( + self, + in_channels: int, + mid_channels: int, + out_channels: int, + kernel_sizes: List[int] = (3, 3), + ): + super(Conv1dSubsampler, self).__init__() + self.n_layers = len(kernel_sizes) + self.conv_layers = nn.ModuleList( + nn.Conv1d( + in_channels if i == 0 else mid_channels // 2, + mid_channels if i < self.n_layers - 1 else out_channels * 2, + k, + stride=2, + padding=k // 2, + ) + for i, k in enumerate(kernel_sizes) + ) + + def get_out_seq_lens_tensor(self, in_seq_lens_tensor): + out = in_seq_lens_tensor.clone() + for _ in range(self.n_layers): + out = ((out.float() - 1) / 2 + 1).floor().long() + return out + + def forward(self, src_tokens, src_lengths): + bsz, in_seq_len, _ = src_tokens.size() # B x T x (C x D) + x = src_tokens.transpose(1, 2).contiguous() # -> B x (C x D) x T + for conv in self.conv_layers: + x = conv(x) + x = nn.functional.glu(x, dim=1) + _, _, out_seq_len = x.size() + x = x.transpose(1, 2).transpose(0, 1).contiguous() # -> T x B x (C x D) + return x, self.get_out_seq_lens_tensor(src_lengths) + + +def infer_conv_output_dim(in_channels, input_dim, out_channels): + sample_seq_len = 200 + sample_bsz = 10 + x = torch.randn(sample_bsz, in_channels, sample_seq_len, input_dim) + x = torch.nn.Conv2d(in_channels, out_channels, 3, stride=2, padding=3 // 2)(x) + x = torch.nn.Conv2d(out_channels, out_channels, 3, stride=2, padding=3 // 2)(x) + x = x.transpose(1, 2) + mb, seq = x.size()[:2] + return x.contiguous().view(mb, seq, -1).size(-1) + + +class Conv2dSubsampler(nn.Module): + """Convolutional subsampler: a stack of 2D convolution based on ESPnet implementation + (https://github.com/espnet/espnet) + + Args: + input_channels (int): the number of input channels + input_feat_per_channel (int): encoder input dimension per input channel + conv_out_channels (int): the number of output channels of conv layer + encoder_embed_dim (int): encoder dimentions + """ + + def __init__( + self, + input_channels: int, + input_feat_per_channel: int, + conv_out_channels: int, + encoder_embed_dim: int, + ): + super().__init__() + assert input_channels == 1, input_channels + self.conv = torch.nn.Sequential( + torch.nn.Conv2d( + input_channels, conv_out_channels, 3, stride=2, padding=3 // 2 + ), + torch.nn.ReLU(), + torch.nn.Conv2d( + conv_out_channels, + conv_out_channels, + 3, + stride=2, + padding=3 // 2, + ), + torch.nn.ReLU(), + ) + transformer_input_dim = infer_conv_output_dim( + input_channels, input_feat_per_channel, conv_out_channels + ) + self.out = torch.nn.Linear(transformer_input_dim, encoder_embed_dim) + + def forward(self, src_tokens, src_lengths): + B, T_i, C = src_tokens.size() + x = src_tokens.view(B, T_i, 1, C).transpose(1, 2).contiguous() + x = self.conv(x) + B, _, T_o, _ = x.size() + x = 
x.transpose(1, 2).transpose(0, 1).contiguous().view(T_o, B, -1) + x = self.out(x) + + subsampling_factor = int(T_i * 1.0 / T_o + 0.5) + input_len_0 = (src_lengths.float() / subsampling_factor).ceil().long() + input_len_1 = x.size(0) * torch.ones([src_lengths.size(0)]).long().to( + input_len_0.device + ) + input_lengths = torch.min(input_len_0, input_len_1) + return x, input_lengths \ No newline at end of file diff --git a/WavLLM/wavllm/requirements.txt b/WavLLM/wavllm/requirements.txt new file mode 100644 index 0000000000000000000000000000000000000000..f7c634fcf560212f4803abba2047041f1143e008 --- /dev/null +++ b/WavLLM/wavllm/requirements.txt @@ -0,0 +1,7 @@ +fairscale==0.4.13 +fairseq==0.12.2 +numpy==1.24.3 +omegaconf==2.0.6 +sentencepiece==0.1.99 +torch==2.0.1 +transformers==4.32.1 diff --git a/WavLLM/wavllm/scripts/inference_sft.sh b/WavLLM/wavllm/scripts/inference_sft.sh new file mode 100644 index 0000000000000000000000000000000000000000..d0381d266e03a85e30a486f8e6be8bafed29e957 --- /dev/null +++ b/WavLLM/wavllm/scripts/inference_sft.sh @@ -0,0 +1,39 @@ +export CUDA_VISIBLE_DEVICES=0 +export HYDRA_FULL_ERROR=1 +export PYTHONPATH=$$PYTHONPATH:${PWD} + +model_path=$1 +[ -z $model_path ] && model_path="?" + +src_dir=${model_path%/*} +cpt=${model_path##*/} +cpt=${cpt%.*} + +gen_set=$2 +[ -z $gen_set ] && gen_set="?" +[ -z $beam_size ] && beam_size=1 + + +FAIRSEQ_ROOT=${PWD} +DATA_DIR=$FAIRSEQ_ROOT/examples/wavllm/test_data + +for subset in $gen_set; do + results_path=$src_dir/decode_${cpt}_beam${beam_size}/${subset} + [ ! -d $results_path ] && mkdir -p $results_path + + python $FAIRSEQ_ROOT/examples/wavllm/inference/generate.py $DATA_DIR \ + --user-dir examples/wavllm \ + --tokenizer-path $FAIRSEQ_ROOT/examples/wavllm/tokenizer/tokenizer.model \ + --gen-subset ${subset} \ + \ + --task speechllm_task \ + \ + --path ${model_path} \ + --results-path $results_path \ + \ + --scoring wer \ + --skip-invalid-size-inputs-valid-test \ + --max-tokens 1600000 \ + --sampling --beam 1 --nbest 1 --temperature 0.5 \ + --max-len-a 0 --max-len-b 512 +done \ No newline at end of file diff --git a/WavLLM/wavllm/tasks/speechllm_task.py b/WavLLM/wavllm/tasks/speechllm_task.py new file mode 100644 index 0000000000000000000000000000000000000000..dfeeecb380f24b08196441caeecd05734acf2aba --- /dev/null +++ b/WavLLM/wavllm/tasks/speechllm_task.py @@ -0,0 +1,533 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. 
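+# A minimal usage sketch of the SentencePiece Tokenizer that this task builds
+# in build_tokenizer() below, assuming the tokenizer.model shipped with the
+# inference recipe (examples/wavllm/tokenizer/tokenizer.model) is available:
+#
+#   from wavllm.data.tokenizer import Tokenizer   # assumed absolute import path
+#   tok = Tokenizer("examples/wavllm/tokenizer/tokenizer.model")
+#   ids = tok.encode("transcribe the audio", bos=True, eos=True)
+#   assert ids[0] == tok.bos_id and ids[-1] == tok.eos_id
+#   text = tok.decode(ids)   # SentencePiece drops the control ids on decode
+#
+# Within this package the class is imported relatively (..data.tokenizer), so
+# the absolute module path above is only illustrative.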
+ +import logging +from pathlib import Path +from argparse import Namespace + +from fairseq.data import Dictionary, encoders +from ..data.speechllm_dataset import ( + SpeechLLMDataset, + get_features_or_waveform, +) + +from transformers import WhisperProcessor + +from ..data.tokenizer import Tokenizer +from transformers import WhisperProcessor, AutoProcessor, AutoFeatureExtractor +from fairseq.tasks import FairseqTask, LegacyFairseqTask, register_task +from dataclasses import dataclass, field +from fairseq.dataclass import ChoiceEnum, FairseqDataclass +from fairseq.data import ResamplingDataset, ConcatDataset +from typing import Optional, Any, List +import os +from fairseq import search, utils + +logger = logging.getLogger(__name__) + + +class Dictionary_for_pad(Dictionary): + """A mapping from symbols to consecutive integers""" + + def __init__( + self, + *, # begin keyword-only arguments + pad="<pad>", ## pad_id = 0 + bos="<s>", + eos="</s>", + unk="<unk>", + extra_special_symbols=None, + ): + self.bos_word, self.unk_word, self.pad_word, self.eos_word = bos, unk, pad, eos + self.symbols = [] + self.count = [] + self.indices = {} + self.pad_index = self.add_symbol(pad) ## let pad_id = 0 + self.bos_index = self.add_symbol(bos) # 1 + self.eos_index = self.add_symbol(eos) # 2 + self.unk_index = self.add_symbol(unk) # 3 + if extra_special_symbols: + for s in extra_special_symbols: + self.add_symbol(s) + self.nspecial = len(self.symbols) + +@dataclass +class SpeechLLMTaskConfig(FairseqDataclass): + data: Optional[str] = field( + default=None, metadata={"help": "manifest root path"} + ) + max_source_positions: int = field( + default=640000, + metadata={"help": "max number of tokens in the source sequence"}, + ) + max_target_positions: int = field( + default=6000, + metadata={"help": "max number of tokens in the target sequence"}, + ) + tokenizer_path: Optional[str] = field( + default=None, metadata={"help": "LLM tokenizer model path"} + ) + processor_path: Optional[str] = field( + default=None, metadata={"help": "audio encoder's processor path"} + ) + wavlm_processor_path: Optional[str] = field( + default=None, metadata={"help": "wavlm encoder's processor path"} + ) + seed: int = field( + default=12345, + metadata={"help": "random seed"}, + ) + audio_root: Optional[str] = field( + default="", metadata={"help": "audio root path"} + ) + + prepend_tgt_lang_tag: bool = field( + default=False + ) + shuffle: bool = field( + default=True + ) + use_audio_input: bool = field( + default=True + ) + is_whisper: bool = field( + default=False + ) + whisper_with_decoder: bool = field( + default=True + ) + whisper_token_len: int = field( + default=64 + ) + freeze_audio_encoder: bool = field( + default=True + ) + use_sample_rate: int = field( + default=16000, + metadata={"help": "sample rate for speech input"}, + ) + reload_speechllm: bool = field( + default=False + ) + use_vicuna: bool = field( + default=False + ) + sft_stage: bool = field( + default=False + ) + use_lora: bool = field( + default=False + ) + lora_r: int = field( + default=8 + ) + lora_alpha: int = field( + default=32 + ) + lora_scale_train: bool = field( + default=False + ) + lora_scale_index: bool = field( + default=False + ) + lora_task_index: bool = field( + default=False + ) + lora_scale_random: bool = field( + default=False + ) + lora_moe: bool = field( + default=False + ) + lora_moe_n_experts: int = field( + default=3 + ) + lora_moe_scaling: bool = field( + default=False + ) + llama_2: bool = field( + default=False + ) + llama_2_path: str 
= field( + default="" + ) + parallel_mode: bool = field( + default=False + ) + enable_fsdp: bool = field( + default=False + ) + continue_write_task: bool = field( + default=False + ) + only_text: bool = field( + default=False + ) + alpaca_text: bool = field( + default=False + ) + with_codec: bool = field( + default=False + ) + after_adapter: bool = field( + default=False + ) + get_codec_online: bool = field( + default=False + ) + in_context_infer: bool = field( + default=False + ) + in_context_train: bool = field( + default=False + ) + pretrained_checkpoint: str = field( + default="" + ) + prompt_bulid: bool = field( + default=False + ) + prompt_before_speech: bool = field( + default=False + ) + use_xformers: bool = field( + default=False + ) + small_scale_training: bool = field( + default=False + ) + second_stage_update_scale: bool = field( + default=False + ) + second_stage_fix_lora: bool = field( + default=False + ) + scale_only_one: bool = field( + default=False + ) + scale_with_audio: bool = field( + default=True + ) + scale_0_1: bool = field( + default=True + ) + scale_predict_time: bool = field( + default=False + ) + scale_predict_all_dim: bool = field( + default=False + ) + scale_predict_all_dim_each_layer: bool = field( + default=False + ) + second_stage_update_lora: bool = field( + default=False + ) + second_stage_add_lora: bool = field( + default=False + ) + lora_only_qv: bool = field( + default=False + ) + load_pretrained_model: bool = field( + default=True + ) + prompt_loss: bool = field( + default=False + ) + use_llama_adapter: bool = field( + default=False + ) + codec_weights: bool = field( + default=False + ) + use_wavlm: bool = field( + default=False + ) + wavlm_weights: bool = field( + default=False + ) + wavlm_output_weight: bool = field( + default=False + ) + wavlm_output_weight_by_prompts: bool = field( + default=False + ) + wavlm_first_7_layers: bool = field( + default=False + ) + wavlm_plus: bool = field( + default=False + ) + wavlm_plus_weight: bool = field( + default=False + ) + wavlm_plus_1layer: bool = field( + default=False + ) + wavlm_plus_1layer_5: bool = field( + default=False + ) + wavlm_plus_5layer: bool = field( + default=False + ) + skip_whisper: bool = field( + default=False + ) + +@register_task("speechllm_task", dataclass=SpeechLLMTaskConfig) +class SpeechLLMTask(FairseqTask): + + + def __init__(self, cfg: SpeechLLMTaskConfig): + """""" + cfg: SpeechLLMTaskConfig + super().__init__(cfg) + logger.info(f"current directory is {os.getcwd()}") + logger.info(f"Task Config {cfg}") + self.cfg = cfg + + self.tgt_dict = Dictionary_for_pad.load(f"{self.cfg.data}/dict.txt") + #self.data_cfg = SpeechLLMDataConfig(Path(args.data) / args.config_yaml) + #self.speaker_to_id = self._get_speaker_to_id( + + @classmethod + def setup_task(cls, cfg, **kwargs): + # data_cfg = SpeechLLMDataConfig(Path(args.data) / args.config_yaml) + # dict_path = Path(args.data) / data_cfg.vocab_filename + # if not dict_path.is_file(): + # raise FileNotFoundError(f"Dict not found: {dict_path.as_posix()}") + # tgt_dict = Dictionary.load(dict_path.as_posix()) + # logger.info( + # f"dictionary size ({data_cfg.vocab_filename}): " f"{len(tgt_dict):,}" + # ) + + # if getattr(args, "train_subset", None) is not None: + # if not all(s.startswith("train") for s in args.train_subset.split(",")): + # raise ValueError('Train splits should be named like "train*".') + return cls(cfg) + + + def load_dataset(self, split, epoch=1, combine=False, **kwargs): + is_train_split = split.startswith("train") + # 
pre_tokenizer = self.build_tokenizer(self.args) + # bpe_tokenizer = self.build_bpe(self.args) + text_tokenizer = self.build_tokenizer(self.cfg.tokenizer_path) + if self.cfg.is_whisper: + audio_processor = self.bulid_processor(self.cfg.processor_path) + else: + audio_processor = None + + if self.cfg.use_wavlm: + wavlm_feature_extractor = AutoFeatureExtractor.from_pretrained(self.cfg.wavlm_processor_path) + else: + wavlm_feature_extractor = None + + self.n_words = text_tokenizer.n_words + self.tokenizer = text_tokenizer + + datasets = [ + SpeechLLMDataset( + self.cfg, + data_root=self.cfg.data, + split=subset, + text_tokenizer=text_tokenizer, + audio_processor=audio_processor, + wavlm_processor=wavlm_feature_extractor, + is_train_split=is_train_split, + #seed=self.cfg.seed, + ) for subset in split.split(",") + ] + + if is_train_split and len(datasets) > 1 and cfg.sampling_alpha != 1.0: + # temperature-based sampling + size_ratios = cls.get_size_ratios(datasets, alpha=cfg.sampling_alpha) + datasets = [ + ResamplingDataset( + d, size_ratio=r, seed=self.cfg.seed, epoch=epoch, replace=(r >= 1.0) + ) + for r, d in zip(size_ratios, datasets) + ] + + self.datasets[split] = ConcatDataset(datasets) if len(datasets) > 1 else datasets[0] + + + @property + def target_dictionary(self): + return self.tgt_dict + + @property + def source_dictionary(self): + return None + + def max_positions(self): + return self.cfg.max_source_positions, self.cfg.max_target_positions + + def build_model(self, args): + model = super().build_model(args) + return model + + def build_generator( + self, + models, + args, + seq_gen_cls=None, + extra_gen_cls_kwargs=None, + ): + + if extra_gen_cls_kwargs is None: + extra_gen_cls_kwargs = {} + return self.build_generator_base( + models, args, seq_gen_cls=None, extra_gen_cls_kwargs=extra_gen_cls_kwargs + ) + + def build_generator_base( + self, + models, + args, + seq_gen_cls=None, + extra_gen_cls_kwargs=None, + prefix_allowed_tokens_fn=None, + ): + """ + Build a :class:`~fairseq.SequenceGenerator` instance for this + task. + + Args: + models (List[~fairseq.models.FairseqModel]): ensemble of models + args (fairseq.dataclass.configs.GenerationConfig): + configuration object (dataclass) for generation + extra_gen_cls_kwargs (Dict[str, Any]): extra options to pass + through to SequenceGenerator + prefix_allowed_tokens_fn (Callable[[int, torch.Tensor], List[int]]): + If provided, this function constrains the beam search to + allowed tokens only at each step. The provided function + should take 2 arguments: the batch ID (`batch_id: int`) + and a unidimensional tensor of token ids (`inputs_ids: + torch.Tensor`). It has to return a `List[int]` with the + allowed tokens for the next generation step conditioned + on the previously generated tokens (`inputs_ids`) and + the batch ID (`batch_id`). This argument is useful for + constrained generation conditioned on the prefix, as + described in "Autoregressive Entity Retrieval" + (https://arxiv.org/abs/2010.00904) and + https://github.com/facebookresearch/GENRE. + """ + if getattr(args, "score_reference", False): + from fairseq.sequence_scorer import SequenceScorer + + return SequenceScorer( + self.target_dictionary, + compute_alignment=getattr(args, "print_alignment", False), + ) + + from ..inference.sequence_generator import ( + SequenceGenerator, + ) + + # Choose search strategy. Defaults to Beam Search. 
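+        # The block below mirrors fairseq's stock generator setup: exactly one
+        # search strategy is selected, and sampling, diverse beam search,
+        # source-length matching and diverse-siblings search are mutually
+        # exclusive (a ValueError is raised if more than one is requested).
+        #   sampling                 -> search.Sampling (optionally top-k / top-p)
+        #   diverse_beam_groups > 0  -> search.DiverseBeamSearch
+        #   match_source_len         -> search.LengthConstrainedBeamSearch
+        #   diversity_rate > -1      -> search.DiverseSiblingsSearch
+        #   constraints              -> search.LexicallyConstrainedBeamSearch
+        #   prefix_allowed_tokens_fn -> search.PrefixConstrainedBeamSearch
+        #   otherwise                -> search.BeamSearch (default)
+        # The inference_sft.sh recipe above passes --sampling --beam 1
+        # --temperature 0.5, so decoding there uses search.Sampling with beam 1.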
+ sampling = getattr(args, "sampling", False) + sampling_topk = getattr(args, "sampling_topk", -1) + sampling_topp = getattr(args, "sampling_topp", -1.0) + diverse_beam_groups = getattr(args, "diverse_beam_groups", -1) + diverse_beam_strength = getattr(args, "diverse_beam_strength", 0.5) + match_source_len = getattr(args, "match_source_len", False) + diversity_rate = getattr(args, "diversity_rate", -1) + constrained = getattr(args, "constraints", False) + if prefix_allowed_tokens_fn is None: + prefix_allowed_tokens_fn = getattr(args, "prefix_allowed_tokens_fn", None) + if ( + sum( + int(cond) + for cond in [ + sampling, + diverse_beam_groups > 0, + match_source_len, + diversity_rate > 0, + ] + ) + > 1 + ): + raise ValueError("Provided Search parameters are mutually exclusive.") + assert sampling_topk < 0 or sampling, "--sampling-topk requires --sampling" + assert sampling_topp < 0 or sampling, "--sampling-topp requires --sampling" + + if sampling: + search_strategy = search.Sampling( + self.target_dictionary, sampling_topk, sampling_topp + ) + elif diverse_beam_groups > 0: + search_strategy = search.DiverseBeamSearch( + self.target_dictionary, diverse_beam_groups, diverse_beam_strength + ) + elif match_source_len: + # this is useful for tagging applications where the output + # length should match the input length, so we hardcode the + # length constraints for simplicity + search_strategy = search.LengthConstrainedBeamSearch( + self.target_dictionary, + min_len_a=1, + min_len_b=0, + max_len_a=1, + max_len_b=0, + ) + elif diversity_rate > -1: + search_strategy = search.DiverseSiblingsSearch( + self.target_dictionary, diversity_rate + ) + elif constrained: + search_strategy = search.LexicallyConstrainedBeamSearch( + self.target_dictionary, args.constraints + ) + elif prefix_allowed_tokens_fn: + search_strategy = search.PrefixConstrainedBeamSearch( + self.target_dictionary, prefix_allowed_tokens_fn + ) + else: + search_strategy = search.BeamSearch(self.target_dictionary) + + extra_gen_cls_kwargs = extra_gen_cls_kwargs or {} + if seq_gen_cls is None: + seq_gen_cls = SequenceGenerator + + return seq_gen_cls( + models, + self.target_dictionary, + beam_size=getattr(args, "beam", 5), + max_len_a=getattr(args, "max_len_a", 0), + max_len_b=getattr(args, "max_len_b", 200), + min_len=getattr(args, "min_len", 1), + normalize_scores=(not getattr(args, "unnormalized", False)), + len_penalty=getattr(args, "lenpen", 1), + unk_penalty=getattr(args, "unkpen", 0), + temperature=getattr(args, "temperature", 1.0), + match_source_len=getattr(args, "match_source_len", False), + no_repeat_ngram_size=getattr(args, "no_repeat_ngram_size", 0), + search_strategy=search_strategy, + n_words=self.n_words, + **extra_gen_cls_kwargs, + ) + + def build_tokenizer(self, tokenizer_path): + logger.info(f"tokenizer: {self.cfg.tokenizer_path}") + text_tokenizer = Tokenizer(self.cfg.tokenizer_path) + return text_tokenizer + + def bulid_processor(self, processor_path): + if self.cfg.is_whisper: + logger.info(f"processor: {processor_path}") + audio_processor = AutoProcessor.from_pretrained(processor_path) + else: + audio_processor = None + return audio_processor + + def get_interactive_tokens_and_lengths(self, lines, encode_fn): + n_frames = [get_features_or_waveform(p).shape[0] for p in lines] + return lines, n_frames + + def build_dataset_for_inference(self, src_tokens, src_lengths, **kwargs): + return SpeechToTextDataset( + "interactive", False, self.data_cfg, src_tokens, src_lengths + ) diff --git 
a/WavLLM/wavllm/test_data/CoT-task-story.tsv b/WavLLM/wavllm/test_data/CoT-task-story.tsv new file mode 100644 index 0000000000000000000000000000000000000000..2e258351b5e18b7c0dc2cc4d39bb53d8213e8a0b --- /dev/null +++ b/WavLLM/wavllm/test_data/CoT-task-story.tsv @@ -0,0 +1,2 @@ +id audio n_frames prompt tgt_text with_speech orig_story +0 SpeechT5/WavLLM/fairseq/examples/wavllm/test_data/audio/CoT-task-story.wav 1079348 First of all, transcribe the audio recording into text, capturing every spoken word; Additionally given this audio clip and text, can you condense it into a clear, concise summary, no more than 20 words?; Lastly disregarding the sound, translate this English summary into German. Bis zum Jahr 2500 ist die Erde eine umweltfreundliche Utopie mit fortschrittlicher KI, neuronaler Vernetzung und einer perfekten Mischung aus Technologie und Natur. True In the year 2500, Earth gleamed like a sapphire, a futuristic utopia where harmony reigned. Skyscrapers, draped in lush greenery, stretched towards the heavens, their glass surfaces reflecting the tranquil azure of a pollution-free sky. Humanity had transcended past conflicts, embracing an era of shared consciousness through neural connectivity. Autonomous vehicles glided silently on solar pathways, while people mingled in serene communal spaces, their basic needs met by advanced AI that predicted and catered to their every whim. The Great Reconciliation had merged technology with nature, and in this new world, every individual thrived, their potential limited only by the expanses of their own creativity. The utopia wasn't just a place; it was the pulse of civilization, beating in perfect rhythm with the universe. \ No newline at end of file diff --git a/WavLLM/wavllm/test_data/CoT-task.tsv b/WavLLM/wavllm/test_data/CoT-task.tsv new file mode 100644 index 0000000000000000000000000000000000000000..72500fef2f7f5b23d4b280d67428b5c8d4141881 --- /dev/null +++ b/WavLLM/wavllm/test_data/CoT-task.tsv @@ -0,0 +1,2 @@ +id audio n_frames prompt with_speech tgt_text +0 SpeechT5/WavLLM/fairseq/examples/wavllm/test_data/audio/CoT-task.wav 214437 First of all, transcribe the audio recording into text, capturing every spoken word; Additionally given this audio clip and text, can you condense it into a clear, concise summary, no more than 20 words?; Lastly disregarding the sound, translate this English summary into German. True Drei Filme aus dem asiatisch-pazifischen Raum im Rennen in Cannes \ No newline at end of file diff --git a/WavLLM/wavllm/test_data/II-task.tsv b/WavLLM/wavllm/test_data/II-task.tsv new file mode 100644 index 0000000000000000000000000000000000000000..49d1f3c43a2922110bd43a2713a89b02f7b43427 --- /dev/null +++ b/WavLLM/wavllm/test_data/II-task.tsv @@ -0,0 +1,2 @@ +id audio n_frames with_speech prompt tgt_text +0 SpeechT5/WavLLM/fairseq/examples/wavllm/test_data/audio/II-task.wav 111111 True To begin, Transcribe the audio recording into text, capturing every spoken word; Subsequently, How does the woman finally decide to go home? A. By bus; B. In the man’s car; C. 
In her father’s car.; Furthermore, ignore the audio clip, What is the capital of New Zealand?; Lastly, Continue the narrative of given audio clip in a coherent and engaging way ASR+SQA+SFT+Continue \ No newline at end of file diff --git a/WavLLM/wavllm/test_data/SQA.tsv b/WavLLM/wavllm/test_data/SQA.tsv new file mode 100644 index 0000000000000000000000000000000000000000..f49760362d200ba82b09a546fbc4f3c725bbedb1 --- /dev/null +++ b/WavLLM/wavllm/test_data/SQA.tsv @@ -0,0 +1,2 @@ +id audio n_frames prompt tgt_text with_speech +0 SpeechT5/WavLLM/fairseq/examples/wavllm/test_data/audio/sqa.wav 111111 What will the man do next? A. Start to take exercise; B. Do as he always does; C. Change his working time. A True diff --git a/WavLLM/wavllm/test_data/SQQA.tsv b/WavLLM/wavllm/test_data/SQQA.tsv new file mode 100644 index 0000000000000000000000000000000000000000..4463e2863aae001c563a076cd369b1a8660aba3f --- /dev/null +++ b/WavLLM/wavllm/test_data/SQQA.tsv @@ -0,0 +1,2 @@ +id audio n_frames prompt tgt_text with_speech +0 SpeechT5/WavLLM/fairseq/examples/wavllm/test_data/audio/sqqa.wav 182574 The fundamental theorem of calculus is a theorem that links the concept of the derivative of a function with the concept of the integral . True \ No newline at end of file diff --git a/WavLLM/wavllm/test_data/asr.tsv b/WavLLM/wavllm/test_data/asr.tsv new file mode 100644 index 0000000000000000000000000000000000000000..8a6dd428a69a25f849c0014e4685b9586e8110b5 --- /dev/null +++ b/WavLLM/wavllm/test_data/asr.tsv @@ -0,0 +1,2 @@ +id audio n_frames prompt tgt_text with_speech +0 SpeechT5/WavLLM/fairseq/examples/wavllm/test_data/audio/asr.flac 166960 Based on the attached audio, generate a comprehensive text transcription of the spoken content. he hoped there would be stew for dinner turnips and carrots and bruised potatoes and fat mutton pieces to be ladled out in thick peppered flour fattened sauce True diff --git a/WavLLM/wavllm/test_data/audio/CoT-task-story.wav b/WavLLM/wavllm/test_data/audio/CoT-task-story.wav new file mode 100644 index 0000000000000000000000000000000000000000..a870cc54f635a5340a21c3f64a6f2c005ffd2320 Binary files /dev/null and b/WavLLM/wavllm/test_data/audio/CoT-task-story.wav differ diff --git a/WavLLM/wavllm/test_data/audio/CoT-task.wav b/WavLLM/wavllm/test_data/audio/CoT-task.wav new file mode 100644 index 0000000000000000000000000000000000000000..371aa5dbfec72d87db2d0a936dc18026af0fdf5d Binary files /dev/null and b/WavLLM/wavllm/test_data/audio/CoT-task.wav differ diff --git a/WavLLM/wavllm/test_data/audio/II-task.wav b/WavLLM/wavllm/test_data/audio/II-task.wav new file mode 100644 index 0000000000000000000000000000000000000000..9d658abeab92d79fa69b97e6fc931b1cc54c6c6a Binary files /dev/null and b/WavLLM/wavllm/test_data/audio/II-task.wav differ diff --git a/WavLLM/wavllm/test_data/audio/asr.flac b/WavLLM/wavllm/test_data/audio/asr.flac new file mode 100644 index 0000000000000000000000000000000000000000..3365912ae29f65308e87f0fca53fc1db39fa5e7a Binary files /dev/null and b/WavLLM/wavllm/test_data/audio/asr.flac differ diff --git a/WavLLM/wavllm/test_data/audio/emo.wav b/WavLLM/wavllm/test_data/audio/emo.wav new file mode 100644 index 0000000000000000000000000000000000000000..f4449dacb9fcad0268b7bf4c2ae8a5fcec18d019 Binary files /dev/null and b/WavLLM/wavllm/test_data/audio/emo.wav differ diff --git a/WavLLM/wavllm/test_data/audio/sqa.wav b/WavLLM/wavllm/test_data/audio/sqa.wav new file mode 100644 index 0000000000000000000000000000000000000000..dd44234286e2ba3f02691c2d3a5e88a9ca2e6702 
Binary files /dev/null and b/WavLLM/wavllm/test_data/audio/sqa.wav differ diff --git a/WavLLM/wavllm/test_data/audio/sqqa.wav b/WavLLM/wavllm/test_data/audio/sqqa.wav new file mode 100644 index 0000000000000000000000000000000000000000..35a394dab54426707fa721fb14d4f6c05d3bc97f Binary files /dev/null and b/WavLLM/wavllm/test_data/audio/sqqa.wav differ diff --git a/WavLLM/wavllm/test_data/audio/st.flac b/WavLLM/wavllm/test_data/audio/st.flac new file mode 100644 index 0000000000000000000000000000000000000000..824b4fc43c147e5f44d014bb402e87271c18b6b8 Binary files /dev/null and b/WavLLM/wavllm/test_data/audio/st.flac differ diff --git a/WavLLM/wavllm/test_data/audio/sv.wav b/WavLLM/wavllm/test_data/audio/sv.wav new file mode 100644 index 0000000000000000000000000000000000000000..169e336d02d0fdf4df3d2204ffad679fbc3a2aea Binary files /dev/null and b/WavLLM/wavllm/test_data/audio/sv.wav differ diff --git a/WavLLM/wavllm/test_data/dict.txt b/WavLLM/wavllm/test_data/dict.txt new file mode 100644 index 0000000000000000000000000000000000000000..d328f2152d8b37a96b3e5470c56fcd6755a7a7ad --- /dev/null +++ b/WavLLM/wavllm/test_data/dict.txt @@ -0,0 +1,5 @@ +1 1 +2 2 +3 3 +4 4 +5 5 diff --git a/WavLLM/wavllm/test_data/emo.tsv b/WavLLM/wavllm/test_data/emo.tsv new file mode 100644 index 0000000000000000000000000000000000000000..eff6a3e9eeac76a2b4a1d64a8bdee41febd59c76 --- /dev/null +++ b/WavLLM/wavllm/test_data/emo.tsv @@ -0,0 +1,2 @@ +id audio n_frames prompt tgt_text with_speech +0 SpeechT5/WavLLM/fairseq/examples/wavllm/test_data/audio/emo.wav 12345 Can you describe the emotional condition of the speaker in the provided audio clip? sad True diff --git a/WavLLM/wavllm/test_data/en2de.tsv b/WavLLM/wavllm/test_data/en2de.tsv new file mode 100644 index 0000000000000000000000000000000000000000..7f9ce5a8585f31d5004db9f8e707eb400431b463 --- /dev/null +++ b/WavLLM/wavllm/test_data/en2de.tsv @@ -0,0 +1,2 @@ +id audio n_frames tgt_text prompt with_speech +0 SpeechT5/WavLLM/fairseq/examples/wavllm/test_data/audio/st.flac 34560 Sie wird schon in Ordnung sein. Translate the audio clip into German. True diff --git a/WavLLM/wavllm/test_data/gaokao.tsv b/WavLLM/wavllm/test_data/gaokao.tsv new file mode 100644 index 0000000000000000000000000000000000000000..eeccefe54a1acfc9885ded45204eaa1736865d23 --- /dev/null +++ b/WavLLM/wavllm/test_data/gaokao.tsv @@ -0,0 +1,2001 @@ +id audio n_frames prompt tgt_text with_speech +short_conv_1 gaokao_audio/short_conv_1.wav 111111 What will the man do next? A. Start to take exercise; B. Do as he always does; C. Change his working time. A True +short_conv_10 gaokao_audio/short_conv_10.wav 111111 What does the woman want to do? A. Mail a letter; B. Use the restroom; C. Find the police station. B True +short_conv_100 gaokao_audio/short_conv_100.wav 111111 What does the woman suggest the man do? A. Quit smoking completely; B. Resist the temptation of smoking; C. Be careful of smoking. B True +short_conv_1000 gaokao_audio/short_conv_1000.wav 111111 What do we know about the woman? A. She plays tennis well; B. She seldom plays tennis; C. She plays tennis regularly B True +short_conv_1001 gaokao_audio/short_conv_1001.wav 111111 Who is wanted on the phone? A. John Smith; B. Chris Watson; C. Sarah White. B True +short_conv_1002 gaokao_audio/short_conv_1002.wav 111111 What time is it now? A. 7:00; B. 7:45; C. 8:15. C True +short_conv_1003 gaokao_audio/short_conv_1003.wav 111111 What does the man think of Zhao Benshan? A. Rich B. Wise C. 
Funny C True +short_conv_1004 gaokao_audio/short_conv_1004.wav 111111 Who supports human cloning? A. The woman B. Lucy C. The man B True +short_conv_1005 gaokao_audio/short_conv_1005.wav 111111 What is the woman going to do this evening? A. Go to dinner B. Visit her sister C. Go to the airport. C True +short_conv_1006 gaokao_audio/short_conv_1006.wav 111111 What are the speakers talking about? A. Using English in everyday life; B. Learning a foreign language; C. Making a piece of cake. B True +short_conv_1007 gaokao_audio/short_conv_1007.wav 111111 what does the man mean? A. The woman can use his pen; B. He is reading a book; C. He will buy a pen for the woman A True +short_conv_1008 gaokao_audio/short_conv_1008.wav 111111 What do you know about the woman? A. He likes volleyball; B. She doesn’t like basketball; C. She likes sports, too. B True +short_conv_1009 gaokao_audio/short_conv_1009.wav 111111 What does the woman mean? A. She likes doing morning exercises; B. She can’t get up early; C. She dislikes doing such exercises. B True +short_conv_101 gaokao_audio/short_conv_101.wav 111111 What probably happened to the man's girlfriend? A. She broke her leg; B. She broke up with him; C. She was in bed with the flu. A True +short_conv_1010 gaokao_audio/short_conv_1010.wav 111111 How are Tom’s skills? A. Excellent; B. Just so-so; C. Very bad. A True +short_conv_1011 gaokao_audio/short_conv_1011.wav 111111 How long did the game last? A. About two hours; B. About two hours and a half; C. About three hours. B True +short_conv_1012 gaokao_audio/short_conv_1012.wav 111111 Which ball game does Jack play best? A. Volleyball; B. Ping-pong; C. Tennis. B True +short_conv_1013 gaokao_audio/short_conv_1013.wav 111111 What is the conversation mainly about? A. The woman’s phone; B. The woman’s new sound system; C. The woman’s favorite music. B True +short_conv_1014 gaokao_audio/short_conv_1014.wav 111111 Why did the man apologize? A. He incorrectly guessed the baby’s age; B. He mistook the woman’s boy for a girl; C. He made a comment about the baby’s hair. B True +short_conv_1015 gaokao_audio/short_conv_1015.wav 111111 How much more will the woman have to pay? A. 3 pounds; B. 5 pounds; C. 8 pounds. A True +short_conv_1016 gaokao_audio/short_conv_1016.wav 111111 Where are the speakers probably? A. In a restaurant; B. In a hotel; C. At home. C True +short_conv_1017 gaokao_audio/short_conv_1017.wav 111111 What will Mrs. Williams do later? A. Give Mr. Anderson a call; B. Attend a basketball match; C. Pass on a message. C True +short_conv_1018 gaokao_audio/short_conv_1018.wav 111111 What are the speakers mainly talking about? A. Preparing for a test; B. Eating during an exam; C. Getting a medical exam. B True +short_conv_1019 gaokao_audio/short_conv_1019.wav 111111 What is the probable relationship between the speakers? A. Father and daughter; B. Classmates; C. Teacher and student. C True +short_conv_102 gaokao_audio/short_conv_102.wav 111111 Who gave the woman a haircut? A. Her coach; B. A barber; C. Her friend. C True +short_conv_1020 gaokao_audio/short_conv_1020.wav 111111 When does the man usually do exercise? A. In the afternoon; B. In the morning; C. At night. B True +short_conv_1021 gaokao_audio/short_conv_1021.wav 111111 What did the woman do today? A. She cleaned the car; B. She bought an umbrella; C. She listened to the weather forecast. A True +short_conv_1022 gaokao_audio/short_conv_1022.wav 111111 What fruit does the woman use? A. Pears; B. Oranges; C. Bananas. 
C True +short_conv_1023 gaokao_audio/short_conv_1023.wav 111111 What will the woman probably do next? A. Go shopping; B. Look at the homework; C. Go to Hannah’s birthday party. A True +short_conv_1024 gaokao_audio/short_conv_1024.wav 111111 What are the speakers talking about? A. Where their tent is; B. Where to set up the tent; C. How to recognize different trees. A True +short_conv_1025 gaokao_audio/short_conv_1025.wav 111111 What does the woman offer to do? A. Help the man’s wife find a doctor; B. Call an ambulance; C. Take the man to the hospital. B True +short_conv_1026 gaokao_audio/short_conv_1026.wav 111111 What will the girl be doing tonight? A. Watching a film; B. Reading a novel; C. Eating a meal in the cafeteria. C True +short_conv_1027 gaokao_audio/short_conv_1027.wav 111111 Where will the man find his car keys? A. On the bookcase; B. In his pocket; C. On the coffee table. C True +short_conv_1028 gaokao_audio/short_conv_1028.wav 111111 What do we learn about the speakers? A. They are unwilling to wait for Jack; B. They can’t see Jack’s grade; C. They are eager to see Jack’s reaction. C True +short_conv_1029 gaokao_audio/short_conv_1029.wav 111111 What does the man think the weather will be like? A. Fine; B. Rainy; C. Cloudy. B True +short_conv_103 gaokao_audio/short_conv_103.wav 111111 Where does the conversation most probably happened? A. At a snack bar; B. At a photo studio; C. At a fancy restaurant. A True +short_conv_1030 gaokao_audio/short_conv_1030.wav 111111 What does the man imply? A. The woman speaks English very well; B. The woman has a strong French accent; C. The woman must be from France. A True +short_conv_1031 gaokao_audio/short_conv_1031.wav 111111 Why does the man apologize? A. He has lost the book; B. He has forgotten to bring the book; C. He has brought the wrong book. B True +short_conv_1032 gaokao_audio/short_conv_1032.wav 111111 How is the woman feeling? A. Ill; B. Hot; C. Cold. C True +short_conv_1033 gaokao_audio/short_conv_1033.wav 111111 What is the man’s reaction to the woman’s words? A. Anger; B. Impatience; C. Surprise. C True +short_conv_1034 gaokao_audio/short_conv_1034.wav 111111 What does the man plan to do in five years? A. To leave the company; B. To manage the company; C. To start his own company. B True +short_conv_1035 gaokao_audio/short_conv_1035.wav 111111 What does the woman imply? A. She doesn’t appreciate John’s humor; B. She used to understand John’s humor; C. She doesn’t have any sense of humor. A True +short_conv_1036 gaokao_audio/short_conv_1036.wav 111111 What is Robert’s occupation now? A. Novelist; B. Reporter; C. Secretary. A True +short_conv_1037 gaokao_audio/short_conv_1037.wav 111111 Why was George at the hospital? A. His wife was sick; B. His wife just had a baby; C. He was visiting his daughter. C True +short_conv_1038 gaokao_audio/short_conv_1038.wav 111111 What is the woman most probably doing now? A. Reading downstairs; B. Sleeping downstairs; C. Sitting upstairs. C True +short_conv_1039 gaokao_audio/short_conv_1039.wav 111111 What does the boy want to have? A. Green peaches; B. Red peaches; C. Red apples. B True +short_conv_104 gaokao_audio/short_conv_104.wav 111111 Why does the woman look upset? A. Her purse was stolen; B. She was given a parking ticket; C. She couldn’t find a parking space. A True +short_conv_1040 gaokao_audio/short_conv_1040.wav 111111 How many people will go to the museum together? A. Three; B. Five; C. Six. C True +short_conv_1041 gaokao_audio/short_conv_1041.wav 111111 Where is John’s father? A. 
At work B. At home; C. At schoo1. A True +short_conv_1042 gaokao_audio/short_conv_1042.wav 111111 What are the speakers talking about? A. Banks; B. Money; C. Cards. B True +short_conv_1043 gaokao_audio/short_conv_1043.wav 111111 What will the man do? A. Invite his friends; B. Prepare some food; C. Go pack the car. C True +short_conv_1044 gaokao_audio/short_conv_1044.wav 111111 Why didn’t the man go to the exhibition? A. Getting tickets would take too much time; B. He didn’t care for Da Vinci’s paintings; C. The ticket was too expensive. A True +short_conv_1045 gaokao_audio/short_conv_1045.wav 111111 What does the man mean? A. The dish is new on the menu; B. The dish is a good bargain; C. The dish is quite healthy. B True +short_conv_1046 gaokao_audio/short_conv_1046.wav 111111 What is the weather like now? A. Sunny; B. Windy; C. Rainy. C True +short_conv_1047 gaokao_audio/short_conv_1047.wav 111111 When did the man visit Yellowstone Park? A. This year B. Last year; C. The year before last. A True +short_conv_1048 gaokao_audio/short_conv_1048.wav 111111 What are the speakers talking about? A. Their living place; B. Their children; C. Their hobbies. A True +short_conv_1049 gaokao_audio/short_conv_1049.wav 111111 Why is the man excited? A. Because he's got a driver's license; B. Because he's sold many tickets; C. Because he's going abroad. C True +short_conv_105 gaokao_audio/short_conv_105.wav 111111 What’s the probable relationship between the speakers? A. Teacher and student; B. Father and daughter; C. Classmates. A True +short_conv_1050 gaokao_audio/short_conv_1050.wav 111111 How does the son feel at the very moment? A. Tired; B. Hungry; C. Disappointed. B True +short_conv_1051 gaokao_audio/short_conv_1051.wav 111111 What subject does the woman like best? A. English; B. Physics; C. Math. A True +short_conv_1052 gaokao_audio/short_conv_1052.wav 111111 What will the woman do? A. Make tea for Tom; B. Have a cup of coffee; C. Make tea for herself. C True +short_conv_1053 gaokao_audio/short_conv_1053.wav 111111 How will the woman probably get to California? A. By air; B. By bus; C. By train. B True +short_conv_1054 gaokao_audio/short_conv_1054.wav 111111 What would the girl like to have as a pet? A. A fish; B. A cat; C. A dog. C True +short_conv_1055 gaokao_audio/short_conv_1055.wav 111111 What is the woman worried about? A. A job interview; B. Going out to dinner; C. Talking to her co-workers. A True +short_conv_1056 gaokao_audio/short_conv_1056.wav 111111 What is the man’s advice to the woman? A. Buy a new TV; B. Get her TV fixed; C. Watch TV all the time. A True +short_conv_1057 gaokao_audio/short_conv_1057.wav 111111 Whose computer is broken? A. Bob’s; B. Bill’s; C. David’s. C True +short_conv_1058 gaokao_audio/short_conv_1058.wav 111111 What are the speakers mainly talking about? A. Lifestyles in New York; B. Some health problems; C. The heavy traffic. C True +short_conv_1059 gaokao_audio/short_conv_1059.wav 111111 What does the woman probably think of working while in college? A. Useful; B. Difficult; C. Unnecessary. A True +short_conv_106 gaokao_audio/short_conv_106.wav 111111 When will the speakers discuss the matter again? A. On Wednesday; B. On Friday; C. On Thursday. C True +short_conv_1060 gaokao_audio/short_conv_1060.wav 111111 Why was the man late? A. He didn’t feel well; B. He set off late; C. He got lost. C True +short_conv_1061 gaokao_audio/short_conv_1061.wav 111111 Where does the woman work? A. In a post office; B. In a hotel; C. In a cafe. 
B True +short_conv_1062 gaokao_audio/short_conv_1062.wav 111111 What will the man do first? A. Have some coffee; B. Return a book; C. Write a report. B True +short_conv_1063 gaokao_audio/short_conv_1063.wav 111111 Where does this conversation probably take place? A. At an airport; B. At home; C. At a department store. A True +short_conv_1064 gaokao_audio/short_conv_1064.wav 111111 Why did the woman thank the man? A. He lent her some money; B. He paid the five-pound bill for her; C. He returned the money he found to her. C True +short_conv_1065 gaokao_audio/short_conv_1065.wav 111111 What is the woman probably going to do? A. Throw the paper away; B. Read the paper through again; C. Rewrite the paper. B True +short_conv_1066 gaokao_audio/short_conv_1066.wav 111111 How will the man travel to London? A. By car; B. By train; C. By bus. C True +short_conv_1067 gaokao_audio/short_conv_1067.wav 111111 What will the weather be like at midday tomorrow? A. Stormy; B. Cloudy; C. Thundery. B True +short_conv_1068 gaokao_audio/short_conv_1068.wav 111111 Where does the man most probably work? A. In a shop; B. On a farm; C. In an office. A True +short_conv_1069 gaokao_audio/short_conv_1069.wav 111111 Who is Miss Jones? A. The man’s classmate; B. The man’s teacher; C. The man’s sister. B True +short_conv_107 gaokao_audio/short_conv_107.wav 111111 What is the purpose of the woman's call? A. To ask about a bill; B. To pay the gas bill; C. To open a new account. A True +short_conv_1070 gaokao_audio/short_conv_1070.wav 111111 When will the speakers meet? A. On Wednesday B. On Thursday C. On Friday C True +short_conv_1071 gaokao_audio/short_conv_1071.wav 111111 How much will the woman pay? A. $12; B. $30; C. $42. C True +short_conv_1072 gaokao_audio/short_conv_1072.wav 111111 What will the speakers do after watching a movie? A. Have dinner; B. Type the reports; C. Help Tony with his work. A True +short_conv_1073 gaokao_audio/short_conv_1073.wav 111111 Where are the speakers? A. At a bus stop; B. On a bus; C. In the man’s home. B True +short_conv_1074 gaokao_audio/short_conv_1074.wav 111111 What are the speakers mainly discussing? A. A car crash; B. A funny guy; C. A car advertisement. C True +short_conv_1075 gaokao_audio/short_conv_1075.wav 111111 How is the man going to pay for the car? A. By earning money on holidays; B. By asking his parents for money; C. By borrowing money from the woman. A True +short_conv_1076 gaokao_audio/short_conv_1076.wav 111111 What will the woman have? A. Fried rice; B. A sandwich; C. Pizza. B True +short_conv_1077 gaokao_audio/short_conv_1077.wav 111111 What are the speakers talking about? A. A class B. A trip C. A city B True +short_conv_1078 gaokao_audio/short_conv_1078.wav 111111 What is the woman doing? A. Having an interview; B. Selling a typewriter; C. Doing some typing. A True +short_conv_1079 gaokao_audio/short_conv_1079.wav 111111 How much will the man pay in total? A. ﹩70 B. ﹩60 C. ﹩50 C True +short_conv_108 gaokao_audio/short_conv_108.wav 111111 What did the woman just do? A. She argued with the man; B. She shouted at another woman; C. She left her work to someone else. B True +short_conv_1080 gaokao_audio/short_conv_1080.wav 111111 Where is Butch now? A. At a friends house B. In his office C. At home A True +short_conv_1081 gaokao_audio/short_conv_1081.wav 111111 What colour is the dress? A. Blue B. Yellow C. Green A True +short_conv_1082 gaokao_audio/short_conv_1082.wav 111111 What does the man think about Jennifer? A. She is like many other people B. 
She always keeps her word C. She is tougher than men B True +short_conv_1083 gaokao_audio/short_conv_1083.wav 111111 Why did the woman decide to use a bike? A. To protect the environment; B. To pay her tuition C. To improve her health B True +short_conv_1084 gaokao_audio/short_conv_1084.wav 111111 What is the probable relationship between the speakers? A. Husband and wife B. Waiter and customer C. Doctor and patient C True +short_conv_1085 gaokao_audio/short_conv_1085.wav 111111 How much at least does one shirt cost today? A. $60 B. $40 C. $30. C True +short_conv_1086 gaokao_audio/short_conv_1086.wav 111111 What does the man mean? A. He failed the exam B. He studied very hard C. The exam was too easy A True +short_conv_1087 gaokao_audio/short_conv_1087.wav 111111 What are the speakers mainly talking about? A. A strange belief; B. The habits of bees; C. A visiting friend. A True +short_conv_1088 gaokao_audio/short_conv_1088.wav 111111 What will the man do next? A. Watch a show; B. Take a shower; C. Continue reading. C True +short_conv_1089 gaokao_audio/short_conv_1089.wav 111111 How has the weather been recently? A. Cold and rainy; B. Hot and dry; C. Windy and dry. B True +short_conv_109 gaokao_audio/short_conv_109.wav 111111 Who might the man be? A. A crew member; B. A doctor; C. A taxi driver. A True +short_conv_1090 gaokao_audio/short_conv_1090.wav 111111 How many people are having the meal? A. Two; B. Three; C. Four. C True +short_conv_1091 gaokao_audio/short_conv_1091.wav 111111 What is the cat’s main color? A. Black; B. White; C. Gray. A True +short_conv_1092 gaokao_audio/short_conv_1092.wav 111111 What do we know about the man’s apartment? A. It is not quiet enough; B. It is near the train station; C. It has a good view of the park. A True +short_conv_1093 gaokao_audio/short_conv_1093.wav 111111 Where did the man go yesterday? A. The hotel; B. The office; C. The airport. B True +short_conv_1094 gaokao_audio/short_conv_1094.wav 111111 What is the woman’s red jacket best for? A. The rainy days; B. The windy days; C. The warm days. C True +short_conv_1095 gaokao_audio/short_conv_1095.wav 111111 What would the woman probably order with chicken? A. White wine; B. Red wine; C. Beer. A True +short_conv_1096 gaokao_audio/short_conv_1096.wav 111111 What will the woman probably write her name with? A. A pencil; B. Her finger; C. An electronic pen. B True +short_conv_1097 gaokao_audio/short_conv_1097.wav 111111 How much will the woman pay for one chair? A. $ 59; B. $62; C. $65. C True +short_conv_1098 gaokao_audio/short_conv_1098.wav 111111 What is the probable relationship between the two speakers? A. Professor and student; B. Hotel manager and tourist; C. Salesman and customer. A True +short_conv_1099 gaokao_audio/short_conv_1099.wav 111111 What will the man probably do next? A. Go back to his work; B. Eat out for lunch; C. Pick up Jenny. B True +short_conv_11 gaokao_audio/short_conv_11.wav 111111 What does the woman mean? A. She missed the travel; B. The travel was a failure; C. Bob didn,t travel with them. C True +short_conv_110 gaokao_audio/short_conv_110.wav 111111 How does the woman probably feel now? A. Excited; B. Worried; C. Stressed. A True +short_conv_1100 gaokao_audio/short_conv_1100.wav 111111 What did the woman think they would do? A. See an exhibition; B. Have a meeting; C. Attend a lecture. A True +short_conv_1101 gaokao_audio/short_conv_1101.wav 111111 Where does the conversation probably take place? A. In a clinic; B. In a hotel; C. In a store. 
C True +short_conv_1102 gaokao_audio/short_conv_1102.wav 111111 What does the woman tell the boy to do? A. Tell the truth; B. Read newspapers; C. Question the news. C True +short_conv_1103 gaokao_audio/short_conv_1103.wav 111111 What is the man going to do tonight? A. Go to the concert B. Write a report; C. Stay at home A True +short_conv_1104 gaokao_audio/short_conv_1104.wav 111111 Why is the woman so tired? A. She is overweight; B. She docs too much exercise C. She is on a diet. C True +short_conv_1105 gaokao_audio/short_conv_1105.wav 111111 Where are the two speakers? A. In the room; B. At the front desk; C. Outside the hotel. A True +short_conv_1106 gaokao_audio/short_conv_1106.wav 111111 Why did the woman feel surprised? A. The man solved a mystery; B. The man achieved a high score; C. The man won a prize. B True +short_conv_1107 gaokao_audio/short_conv_1107.wav 111111 What is the man going to do? A. Move to the countryside; B. Live in a large city; C. Have a holiday. B True +short_conv_1108 gaokao_audio/short_conv_1108.wav 111111 What does the man mean? A. He doesn’t feel hot at all; B. The woman is very considerate; C. The air conditioner can be used. C True +short_conv_1109 gaokao_audio/short_conv_1109.wav 111111 Why does the man like hiking? A. He can get close to nature; B. He likes wild animals; C. He can make friends. A True +short_conv_111 gaokao_audio/short_conv_111.wav 111111 What does the woman mean? A. The man should lock the car doors; B. The man should let his dog out of the hot car; C. The man should take his baby into the store with him. B True +short_conv_1110 gaokao_audio/short_conv_1110.wav 111111 Where does the talk possibly take place? A. In a meeting room; B. At a patty; C. In a clothes shop. C True +short_conv_1111 gaokao_audio/short_conv_1111.wav 111111 How did the woman book the tickets? A. Over the phone; B. Through the Internet; C. At the ticket office. B True +short_conv_1112 gaokao_audio/short_conv_1112.wav 111111 What is the man likely to do on Friday? A. See the new exhibition; B. Watch a baseball game; C. Finish a report. C True +short_conv_1113 gaokao_audio/short_conv_1113.wav 111111 What are the speakers mainly talking about? A. The weather this year; B. Water conservation; C. The importance of washing. B True +short_conv_1114 gaokao_audio/short_conv_1114.wav 111111 How much will the woman pay for the T-shirt and the jeans? A. $10; B. $20; C. $30. C True +short_conv_1115 gaokao_audio/short_conv_1115.wav 111111 What do we know about the woman? A. She has a fever; B. She looks very tired now; C. She fell asleep in an outdoor chair. C True +short_conv_1116 gaokao_audio/short_conv_1116.wav 111111 What will the woman do right after she types the letter? A. Havea meal; B. Change her clothes; C. Take the car. B True +short_conv_1117 gaokao_audio/short_conv_1117.wav 111111 What is the weather like today? A. Rainy; B. Cloudy; C. Sunny, C True +short_conv_1118 gaokao_audio/short_conv_1118.wav 111111 Where does the conversation take place? A. In an army; B. At a restaurant; C. In an office. B True +short_conv_1119 gaokao_audio/short_conv_1119.wav 111111 When should the woman return the book according to the man? A. This Saturday; B. Next Friday; C. Next Tuesday. B True +short_conv_112 gaokao_audio/short_conv_112.wav 111111 What does the man want to do with the money? A. Hire a babysitter; B. Go to a performance; C. Join an educational program. B True +short_conv_1120 gaokao_audio/short_conv_1120.wav 111111 How much should the woman pay for the books? A. $ 10; B. 
$20; C. $30. C True +short_conv_1121 gaokao_audio/short_conv_1121.wav 111111 What color is the shirt? A. Yellow; B. Green; C. Blue. C True +short_conv_1122 gaokao_audio/short_conv_1122.wav 111111 What are the speakers mainly discussing? A. Washing the woman’s dirty clothes; B. Folding clean clothes; C. Doing Ben’s laundry. C True +short_conv_1123 gaokao_audio/short_conv_1123.wav 111111 What are the speakers comparing? A. A movie and a novel; B. Two movies; C. Two types of music. B True +short_conv_1124 gaokao_audio/short_conv_1124.wav 111111 Which pair of shoes was comfortable? A. The second pair; B. The third pair; C. The first pair. A True +short_conv_1125 gaokao_audio/short_conv_1125.wav 111111 What time should the woman start recording? A. At four o’clock; B. At six o’clock; C. At seven o’clock. B True +short_conv_1126 gaokao_audio/short_conv_1126.wav 111111 What is almost ready to serve? A. The noodles; B. The salad; C. The bread. A True +short_conv_1127 gaokao_audio/short_conv_1127.wav 111111 What does the man mean? A. He went mountain climbing last year; B. He doesn't like mountain climbing at all; C. He hasn’t traveled around the world yet. B True +short_conv_1128 gaokao_audio/short_conv_1128.wav 111111 Why is the man going to New York? A. To work there; B. To visit a friend; C. To have a vacation. A True +short_conv_1129 gaokao_audio/short_conv_1129.wav 111111 What is the relationship between the two speakers? A. Student and teacher; B. Patient and doctor; C. Lawyer and client. A True +short_conv_113 gaokao_audio/short_conv_113.wav 111111 Why can’t the woman tell the man the time? A. Her bus is coming; B. She forgot her phone; C. Her clock isn’t reliable. C True +short_conv_1130 gaokao_audio/short_conv_1130.wav 111111 What time does the woman usually have breakfast? A. At 7:00; B. At 8:00; C. At 10:00. A True +short_conv_1131 gaokao_audio/short_conv_1131.wav 111111 How will the woman go downtown? A. By taxi; B. By car; C. By bus. C True +short_conv_1132 gaokao_audio/short_conv_1132.wav 111111 What does the man mean? A. A cold drink can be relaxing; B. Scott and Tina like to play jokes on each other; C. Humor can be helpful in embarrassing situations C True +short_conv_1133 gaokao_audio/short_conv_1133.wav 111111 What does the woman mean? A. She is encouraged; B. She appreciates the offer; C. She needs a friend B True +short_conv_1134 gaokao_audio/short_conv_1134.wav 111111 What surprises the man? A. Jane dropped school; B. Jane made much money C. Jane is still working. C True +short_conv_1135 gaokao_audio/short_conv_1135.wav 111111 What will the woman probably do next? A. Check the battery; B. Repair the display; C. Buy a new phone. A True +short_conv_1136 gaokao_audio/short_conv_1136.wav 111111 What does the woman ask the man to do? A. Have lunch first B. Eat slowly; C. Join in a game. B True +short_conv_1137 gaokao_audio/short_conv_1137.wav 111111 Where are the speakers? A. In a university; B. In a housing agency; C. In a lost and found. B True +short_conv_1138 gaokao_audio/short_conv_1138.wav 111111 What's the probable relationship between the speakers? A. Salesperson and customer; B. Cook and customer; C. Doctor and patient. C True +short_conv_1139 gaokao_audio/short_conv_1139.wav 111111 What are the speakers talking about? A. A dinner out; B. A trip to Hainan; C. A birthday gift. C True +short_conv_114 gaokao_audio/short_conv_114.wav 111111 What does the man mean? A. The woman’s father likes her present very much; B. The woman had better not buy a shirt this time; C.
It’s a good idea for the woman to buy her father a shirt. B True +short_conv_1140 gaokao_audio/short_conv_1140.wav 111111 What’s the time now? A. 10:10; B. 10:30; C. 11:00. A True +short_conv_1141 gaokao_audio/short_conv_1141.wav 111111 How does the man prefer to go to work? A. By car; B. By bus; C. On foot. A True +short_conv_1142 gaokao_audio/short_conv_1142.wav 111111 What will the woman probably do? A. Play football after schoo1; B. Register at a swim club; C. Go to a gym class. B True +short_conv_1143 gaokao_audio/short_conv_1143.wav 111111 Where does the conversation probably take place? A. On the train; B. On the bus; C. On the plane. C True +short_conv_1144 gaokao_audio/short_conv_1144.wav 111111 How much will the woman pay for the chemistry book? A. $200; B. $100; C. $50. C True +short_conv_1145 gaokao_audio/short_conv_1145.wav 111111 Why is the man watching the movie? A. To teach some Italian; B. To improve his Spanish; C. To prepare for an exam. A True +short_conv_1146 gaokao_audio/short_conv_1146.wav 111111 What does the man ask the woman to do? A. To park elsewhere; B. To drive along the street; C. To stay for a short while. A True +short_conv_1147 gaokao_audio/short_conv_1147.wav 111111 What are the speakers mainly talking about? A. Young people lose their jobs easily; B. Young people are too quick in making decisions; C. Young people seldom stay long in the same job. C True +short_conv_1148 gaokao_audio/short_conv_1148.wav 111111 How does the woman feel about the zoo? A. Sad; B. Impressed; C. Disappointed. B True +short_conv_1149 gaokao_audio/short_conv_1149.wav 111111 What is the woman doing now? A. Baking cookies; B. Making a list; C. Shopping for groceries. B True +short_conv_115 gaokao_audio/short_conv_115.wav 111111 What does the woman imply? A. The tickets are sold in advance at half price; B. It’s difficult to buy the tickets on the spot; C. It’s better to buy the tickets beforehand. C True +short_conv_1150 gaokao_audio/short_conv_1150.wav 111111 What happened to the woman? A. She woke up late; B. She got to work late; C. She stayed up late. A True +short_conv_1151 gaokao_audio/short_conv_1151.wav 111111 Where does the man want to go? A. To a railway station; B. To a Post Office; C. To the seaside. C True +short_conv_1152 gaokao_audio/short_conv_1152.wav 111111 What is the probable relationship between the speakers? A. Teacher and student; B. Boss and secretary; C. Husband and wife. B True +short_conv_1153 gaokao_audio/short_conv_1153.wav 111111 Where does the conversation probably take place? A. In a store; B. In an office; C. In a bank. A True +short_conv_1154 gaokao_audio/short_conv_1154.wav 111111 What does the man think of the lecture? A. It was interesting and easy to follow; B. It was far beyond his understanding; C. It was as difficult as he had expected. B True +short_conv_1155 gaokao_audio/short_conv_1155.wav 111111 How does the woman keep slim? A. By going on a diet; B. By eating fruit and vegetables; C. By doing physical exercise. C True +short_conv_1156 gaokao_audio/short_conv_1156.wav 111111 Which team lost the basketball match? A. Class One; B. Class Two; C. Class Three. B True +short_conv_1157 gaokao_audio/short_conv_1157.wav 111111 What does the woman think of the pet dog? A. Lovely; B. Annoying; C. Lazy. B True +short_conv_1158 gaokao_audio/short_conv_1158.wav 111111 Where are the two speakers? A. In a bookstore; B. In a library; C. In the classroom. A True +short_conv_1159 gaokao_audio/short_conv_1159.wav 111111 What does the woman work as now? A. 
A waitress; B. A manager; C. A cashier. C True +short_conv_116 gaokao_audio/short_conv_116.wav 111111 What can we learn about the man and woman from this conversation? A. They are probably art students; B. They won’t go to the zoo together; C. They are making an appointment. A True +short_conv_1160 gaokao_audio/short_conv_1160.wav 111111 How long has the man been a bus driver? A. Two months; B. Three months; C. Four months. B True +short_conv_1161 gaokao_audio/short_conv_1161.wav 111111 What are the speakers talking about? A. Driving; B. The Internet; C. Their job. C True +short_conv_1162 gaokao_audio/short_conv_1162.wav 111111 What are the speakers talking about? A. When to announce the news; B. Who will attend the meeting; C. What to discuss on Monday. A True +short_conv_1163 gaokao_audio/short_conv_1163.wav 111111 Why does the man lose points? A. He is usually late for class; B. He hands in his homework late; C. He isn’t as smart as other students. B True +short_conv_1164 gaokao_audio/short_conv_1164.wav 111111 How many cups of coffee has the woman had? A. 3; B. 4; C. 5. C True +short_conv_1165 gaokao_audio/short_conv_1165.wav 111111 What sport does the man like best? A. Ice skating; B. Skateboarding; C. Skiing. C True +short_conv_1166 gaokao_audio/short_conv_1166.wav 111111 Which class is the man taking? A. Biology; B. Chemistry; C. Physics. A True +short_conv_1167 gaokao_audio/short_conv_1167.wav 111111 How many public universities are there in Britain? A. Over forty; B. Less than forty; C. Zero. C True +short_conv_1168 gaokao_audio/short_conv_1168.wav 111111 What is the probable relationship between the two speakers? A. Customer and waitress; B. Teacher and student; C. Boss and secretary. C True +short_conv_1169 gaokao_audio/short_conv_1169.wav 111111 When will the man be able to visit Mr. Black? A. November 1; B. November 2; C. November 3. B True +short_conv_117 gaokao_audio/short_conv_117.wav 111111 What does the man say about the course? A. It’s too tough for some students; B. It’s much more difficult than people think; C. It’s believed to be the hardest course. B True +short_conv_1170 gaokao_audio/short_conv_1170.wav 111111 Why is the woman surprised? A. Because she found the assignment very difficult; B. Because she found it hard to believe so much time the man had spent; C. Because the man couldn’t finish the assignment because he was busy. B True +short_conv_1171 gaokao_audio/short_conv_1171.wav 111111 What can we know about Judy? A. She came to the party; B. She hasn’t appeared yet; C. She planned to come but changed her mind later. B True +short_conv_1172 gaokao_audio/short_conv_1172.wav 111111 When will the speakers probably leave for the train station? A. At 3:30 p.m; B. At 4:00 p.m; C. At 4:30 p.m. A True +short_conv_1173 gaokao_audio/short_conv_1173.wav 111111 What’s the weather probably like now? A. Rainy B. Sunny C. Cloudy B True +short_conv_1174 gaokao_audio/short_conv_1174.wav 111111 What will the man buy for the woman? A. Notebooks B. Pencils C. Paper B True +short_conv_1175 gaokao_audio/short_conv_1175.wav 111111 What is the man’s favorite sport? A. Tennis B. Volleyball C. Basketball C True +short_conv_1176 gaokao_audio/short_conv_1176.wav 111111 What is the woman going to do tonight? A. Go to the cinema B. Attend a meeting C. Watch TV C True +short_conv_1177 gaokao_audio/short_conv_1177.wav 111111 What is Claire probably like? A. Lazy; B. Clever; C. Hard-working. A True +short_conv_1178 gaokao_audio/short_conv_1178.wav 111111 Why is the woman calling the man? A. 
To buy something from him; B. To sell him some pieces of furniture; C. To set a date to visit his new apartment. A True +short_conv_1179 gaokao_audio/short_conv_1179.wav 111111 When will the equipment arrive? A. This Wednesday; B. This Friday; C. Next Monday. B True +short_conv_118 gaokao_audio/short_conv_118.wav 111111 Why does the man suggest a woman wear long sleeves? A. To help relieve her of the pain; B. To prevent mosquito bites; C. To avoid getting sun-burnt. B True +short_conv_1180 gaokao_audio/short_conv_1180.wav 111111 How will the speakers probably get to the exhibition? A. By bus; B. By taxi; C. By subway. C True +short_conv_1181 gaokao_audio/short_conv_1181.wav 111111 What will the woman do tomorrow morning? A. Go to the zoo; B. Go to the airport; C. Deal with an email. C True +short_conv_1182 gaokao_audio/short_conv_1182.wav 111111 Where is Jane and Bill’s new house? A. Near a train station; B. Near an airport; C. Near a highway. B True +short_conv_1183 gaokao_audio/short_conv_1183.wav 111111 How does the man feel about his job? A. He enjoys it; B. He doesn’t like it at all; C. He wants to find a new job. A True +short_conv_1184 gaokao_audio/short_conv_1184.wav 111111 What is the woman going to do first? A. Send the e-mail; B. Fix the computer; C. Go to the club. B True +short_conv_1185 gaokao_audio/short_conv_1185.wav 111111 What do we know about Bill? A. He will finish the paper soon; B. He’s not going to write the paper; C. He seldom completes his work early. C True +short_conv_1186 gaokao_audio/short_conv_1186.wav 111111 What is the man doing? A. Asking for help; B. Making suggestions; C. Asking for permission. B True +short_conv_1187 gaokao_audio/short_conv_1187.wav 111111 Where does the conversation probably take place? A. At the hotel; B. At the customs; C. At the station. B True +short_conv_1188 gaokao_audio/short_conv_1188.wav 111111 What does the man ask the woman to do? A. Lower her voice; B. Do the laundry; C. Paint the wall. A True +short_conv_1189 gaokao_audio/short_conv_1189.wav 111111 Why is the woman in a hurry? A. She is heading for school; B. She wants to fetch a book; C. She has to pick up the man. B True +short_conv_119 gaokao_audio/short_conv_119.wav 111111 What does the man mean? A. The apartment was provided with some old furniture; B. The furniture he bought was very cheap; C. The furniture in the market was on sale every Sunday. B True +short_conv_1190 gaokao_audio/short_conv_1190.wav 111111 Whose birthday party will the speakers attend? A. Amy’s; B. Derek’s; C. Karl’s. A True +short_conv_1191 gaokao_audio/short_conv_1191.wav 111111 What does the man suggest the woman do? A. Learn slowly; B. Practise more; C. Take lessons. B True +short_conv_1192 gaokao_audio/short_conv_1192.wav 111111 What's the relationship between the speakers? A. Teacher and student; B. Mum and son; C. Grandma and grandson. B True +short_conv_1193 gaokao_audio/short_conv_1193.wav 111111 Where does the conversation probably take place? A. In a book store; B. In a library; C. In a ticket office. A True +short_conv_1194 gaokao_audio/short_conv_1194.wav 111111 How will the speakers go to Chengdu? A. By train; B. By car; C. By airplane. C True +short_conv_1195 gaokao_audio/short_conv_1195.wav 111111 What time is it now? A. 7:20 p.m; B. 7:40 p.m; C. 8:00 p.m. A True +short_conv_1196 gaokao_audio/short_conv_1196.wav 111111 Why does the man refuse a refill? A. He has a stomachache; B. He is full; C. He doesn't like the taste. 
B True +short_conv_1197 gaokao_audio/short_conv_1197.wav 111111 What is the probable relationship between the speakers? A. Classmates; B. Teacher and student; C. Doctor and patient. A True +short_conv_1198 gaokao_audio/short_conv_1198.wav 111111 How does the woman go to work? A. By car; B. On foot; C. By bike. B True +short_conv_1199 gaokao_audio/short_conv_1199.wav 111111 When does the train leave? A. At 6:30; B. At 8:30; C. At 10:30. C True +short_conv_12 gaokao_audio/short_conv_12.wav 111111 What can we learn about the man? A. He is anxious to see his sister; B. He wrote to his sister last month; C. He is expecting a letter from his sister. C True +short_conv_120 gaokao_audio/short_conv_120.wav 111111 What does the woman mean? A. The food is not tasty enough; B. The man cannot afford the food; C. The food is worth the prices. C True +short_conv_1200 gaokao_audio/short_conv_1200.wav 111111 What can we say about the woman? A. She’s generous; B. She’s curious; C. She’s helpful. C True +short_conv_1201 gaokao_audio/short_conv_1201.wav 111111 What will James do tomorrow? A. Watch a TV program; B. Give a talk; C. Write a report. B True +short_conv_1202 gaokao_audio/short_conv_1202.wav 111111 Where are the speakers talking? A. In a house; B. In a park; C. In a library. A True +short_conv_1203 gaokao_audio/short_conv_1203.wav 111111 Who is the owner of the book? A. The man himself; B. The man’s brother; C. The woman’s brother. B True +short_conv_1204 gaokao_audio/short_conv_1204.wav 111111 What are the speakers talking about? A. The new house; B. The new friends; C. The new gardens. A True +short_conv_1205 gaokao_audio/short_conv_1205.wav 111111 Who broke the window? A. The boy; B. The girl; C. Someone else. C True +short_conv_1206 gaokao_audio/short_conv_1206.wav 111111 Who enjoyed the film yesterday? A. John; B. All except John; C. Everyone including John. B True +short_conv_1207 gaokao_audio/short_conv_1207.wav 111111 What are the speakers talking about? A. A story; B. A textbook; C. A movie. A True +short_conv_1208 gaokao_audio/short_conv_1208.wav 111111 When will the man arrive in Cairo? A. In the morning; B. At noon; C. In the afternoon. C True +short_conv_1209 gaokao_audio/short_conv_1209.wav 111111 Why does the man stop his talk with the woman? A. He isn't interested in her words; B. He is expecting another call; C. He is angry with her. B True +short_conv_121 gaokao_audio/short_conv_121.wav 111111 What does the man mean? A. He didn’t get the book he needed; B. The library is closed on weekends; C. He didn’t have time to go to the library. A True +short_conv_1210 gaokao_audio/short_conv_1210.wav 111111 What does the man imply? A. The woman should go on playing chess; B. He wants to play chess with the woman; C. The woman is weak in playing chess. A True +short_conv_1211 gaokao_audio/short_conv_1211.wav 111111 What is the man? A. A weather forecaster; B. A pilot; C. A trainer. B True +short_conv_1212 gaokao_audio/short_conv_1212.wav 111111 What does the woman advise the man to buy? A. Nothing; B. A computer; C. A cellphone. A True +short_conv_1213 gaokao_audio/short_conv_1213.wav 111111 Which room is Mum going to clean? A. The kitchen; B. The bedroom; C. The living room. C True +short_conv_1214 gaokao_audio/short_conv_1214.wav 111111 How many people will go to the zoo? A. 3; B. 4; C. 5. B True +short_conv_1215 gaokao_audio/short_conv_1215.wav 111111 Who cooks the dinner? A. Jason; B. John C. The woman.
B True +short_conv_1216 gaokao_audio/short_conv_1216.wav 111111 What will Lucy do this weekend? A. Go shopping; B. Camp in the mountain; C. Go hiking. C True +short_conv_1217 gaokao_audio/short_conv_1217.wav 111111 When does the man plan to get to the party? A. Around 6: 00 pm; B. Around 7: 00 pm; C. Around 7: 30 pm C True +short_conv_1218 gaokao_audio/short_conv_1218.wav 111111 What will the man do this afternoon? A. Play football; B. Play tennis; C. Play computer games. C True +short_conv_1219 gaokao_audio/short_conv_1219.wav 111111 How will the man go to school tomorrow? A. By car; B. By bus; C. By bike. B True +short_conv_122 gaokao_audio/short_conv_122.wav 111111 What do we learn from the conversation about Tony? A. He is not very enthusiastic about his English lessons; B. He is a student of the music department; C. He is not very interested in English songs. A True +short_conv_1220 gaokao_audio/short_conv_1220.wav 111111 Where will the speakers go first tomorrow? A. To a castle; B. To a garden; C. To a museum. C True +short_conv_1221 gaokao_audio/short_conv_1221.wav 111111 What is the man doing? A. Offering a suggestion; B. Asking for help; C. Giving a warning A True +short_conv_1222 gaokao_audio/short_conv_1222.wav 111111 What will the man do on Friday? A. Go shopping; B. Go camping; C. Hold a dinner party. A True +short_conv_1223 gaokao_audio/short_conv_1223.wav 111111 What does the man offer to do? A. Take a message; B. Get Ms. Sullivan; C. Call back in thirty minutes. A True +short_conv_1224 gaokao_audio/short_conv_1224.wav 111111 How much will a stamp cost in May? A. A penny; B. 42 cents; C. 44 cents. C True +short_conv_1225 gaokao_audio/short_conv_1225.wav 111111 What does the man suggest the woman do? A. Wait on the phone; B. Order on the Internet; C. Drive to the pizza place. B True +short_conv_1226 gaokao_audio/short_conv_1226.wav 111111 What’s the main reason that the man likes his college? A. It’s not expensive; B. It has great teachers; C. He likes his classmates. A True +short_conv_1227 gaokao_audio/short_conv_1227.wav 111111 How does the man suggest the woman deal with the old shoes? A. Have them repaired; B. Throw them away; C. Sell them online. C True +short_conv_1228 gaokao_audio/short_conv_1228.wav 111111 What does the woman mean? A. The new computer is cool; B. Her computer broke down; C. She needs a bigger computer. B True +short_conv_1229 gaokao_audio/short_conv_1229.wav 111111 What will the Jeff’s father do? A. Lose weight; B. Learn driving; C. Buy a car for Jeff. C True +short_conv_123 gaokao_audio/short_conv_123.wav 111111 What does the woman think of the science fiction? A. It needs improving; B. It is just so-so; C. It’s wonderful. C True +short_conv_1230 gaokao_audio/short_conv_1230.wav 111111 What does the man want to know? A. When grandma is coming over; B. When his mother will be home; C. When he should call the woman. A True +short_conv_1231 gaokao_audio/short_conv_1231.wav 111111 Where does the conversation take place? A. In the post office; B. In a taxi; C. In a hotel. B True +short_conv_1232 gaokao_audio/short_conv_1232.wav 111111 What are the speakers talking about? A. The woman’s family; B. A trip; C. A picture. C True +short_conv_1233 gaokao_audio/short_conv_1233.wav 111111 Why does the girl talk with the man? A. To send an invitation; B. To get permission; C. To ask for help. B True +short_conv_1234 gaokao_audio/short_conv_1234.wav 111111 Where will the speakers have their picnic? A. On a rock; B. On the grass; C. In a boat. 
A True +short_conv_1235 gaokao_audio/short_conv_1235.wav 111111 When does the conversation take place? A. At 5:30; B. At 6:00; C. At 6:30. A True +short_conv_1236 gaokao_audio/short_conv_1236.wav 111111 How much should the man pay in total? A. $100; B. $110; C. $115. B True +short_conv_1237 gaokao_audio/short_conv_1237.wav 111111 In which year is the man in college now? A. The first year; B. The second year; C. The third year. C True +short_conv_1238 gaokao_audio/short_conv_1238.wav 111111 What does the man suggest the woman do? A. Ask a repair shop for help; B. Buy another car; C. Fix the car herself. B True +short_conv_1239 gaokao_audio/short_conv_1239.wav 111111 What’s the matter with the woman? A. She has caught a bad cold; B. She stayed online too long; C. She can’t stand paint smell. C True +short_conv_124 gaokao_audio/short_conv_124.wav 111111 Where are the speakers? A. At a ball field; B. In a classroom; C. At a hospital. C True +short_conv_1240 gaokao_audio/short_conv_1240.wav 111111 How does the man think of the book? A. Humorous; B. Scientific; C. Popular. A True +short_conv_1241 gaokao_audio/short_conv_1241.wav 111111 What is the problem for the man? A. He has to meet many people; B. He has to leave his friends; C. He has to travel a lot. B True +short_conv_1242 gaokao_audio/short_conv_1242.wav 111111 What is the man going to do? A. Go on the Internet; B. Make a phone call; C. Take a train trip. A True +short_conv_1243 gaokao_audio/short_conv_1243.wav 111111 Where are the speakers? A. In a classroom; B. In a library; C. In a bookstore. B True +short_conv_1244 gaokao_audio/short_conv_1244.wav 111111 How will Susan spend most of her time in France? A. Traveling around; B. Studying at a school; C. Looking after her aunt. A True +short_conv_1245 gaokao_audio/short_conv_1245.wav 111111 What are the speakers talking about? A. Going out; B. Ordering drinks; C. Preparing for a party. C True +short_conv_1246 gaokao_audio/short_conv_1246.wav 111111 What does the woman think of the movie? A. It’s amusing; B. It’s exciting; C. It’s disappointing. C True +short_conv_1247 gaokao_audio/short_conv_1247.wav 111111 When is the assignment due? A. Later today; B. Tomorrow; C. The day after tomorrow. C True +short_conv_1248 gaokao_audio/short_conv_1248.wav 111111 What did the woman try to do? A. Create a new password; B. Get some information for the man; C. Go online using the man’s new password. B True +short_conv_1249 gaokao_audio/short_conv_1249.wav 111111 Why didn’t the woman answer her phone? A. She lost her phone; B. She didn’t want to talk to the man; C. She was not allowed to use the phone then. C True +short_conv_125 gaokao_audio/short_conv_125.wav 111111 What was the weather like in the morning? A. Cloudy; B. Windy; C. Sunny. A True +short_conv_1250 gaokao_audio/short_conv_1250.wav 111111 What will the man have? A. Coffee with milk; B. Tea with sweet cream; C. Tea with sugar. A True +short_conv_1251 gaokao_audio/short_conv_1251.wav 111111 What does the man think of going to Aspen? A. It costs too much; B. It sounds very interesting; C. He needs to think about it. A True +short_conv_1252 gaokao_audio/short_conv_1252.wav 111111 What does the woman tell the man to do? A. Prepare for landing; B. Take his headphones out; C. Put his chair back in fifteen minutes. A True +short_conv_1253 gaokao_audio/short_conv_1253.wav 111111 What is the relationship between the speakers? A. They are friends; B. They are cousins; C. They are brother and sister. 
C True +short_conv_1254 gaokao_audio/short_conv_1254.wav 111111 What will the woman do next? A. Bring the man a salad; B. Take away the man’s soda; C. Give the man some cheese. A True +short_conv_1255 gaokao_audio/short_conv_1255.wav 111111 What does the man wish for the future? A. All his dreams will come true; B. Science will develop much faster; C. He will be able to do his job from home. C True +short_conv_1256 gaokao_audio/short_conv_1256.wav 111111 When did the speakers last see each other? A. Fifteen years ago; B. Five years ago; C. One year ago. B True +short_conv_1257 gaokao_audio/short_conv_1257.wav 111111 What are the speakers mainly talking about? A. Young people lose their jobs easily; B. Young people are too quick in making decisions; C. Young people seldom stay long in the same job. C True +short_conv_1258 gaokao_audio/short_conv_1258.wav 111111 How does the woman feel about the zoo? A. Sad; B. Impressed; C. Disappointed. B True +short_conv_1259 gaokao_audio/short_conv_1259.wav 111111 What is the woman doing now? A. Baking cookies; B. Making a list; C. Shopping for groceries. B True +short_conv_126 gaokao_audio/short_conv_126.wav 111111 What is the woman's advice? A. Selling the old table; B. Buying two bookshelves; C. Moving some furniture. C True +short_conv_1260 gaokao_audio/short_conv_1260.wav 111111 What happened to the woman? A. She woke up late; B. She got to work late; C. She went to sleep late. A True +short_conv_1261 gaokao_audio/short_conv_1261.wav 111111 Where does the man want to go? A. A railway station; B. A post office; C. The seaside. C True +short_conv_1262 gaokao_audio/short_conv_1262.wav 111111 What are the speakers mainly talking about? A. A film; B. An actor; C. A book. A True +short_conv_1263 gaokao_audio/short_conv_1263.wav 111111 Where does the conversation take place? A. In a restaurant; B. In a hotel; C. In an office. B True +short_conv_1264 gaokao_audio/short_conv_1264.wav 111111 What does the man want the woman to do? A. Repair his computer; B. Call Sam for help; C. Tell him a phone number. C True +short_conv_1265 gaokao_audio/short_conv_1265.wav 111111 When should the man hand in the report at the latest? A. At 2:00; B. At 3:00; C. At 4:00. C True +short_conv_1266 gaokao_audio/short_conv_1266.wav 111111 Who is the little girl? A. The woman’s sister; B. The woman’s niece; C. The woman’s daughter B True +short_conv_1267 gaokao_audio/short_conv_1267.wav 111111 What has the woman lost? A. Her purse; B. Some keys; C. A pair of glasses. A True +short_conv_1268 gaokao_audio/short_conv_1268.wav 111111 How much should the man pay for a night? A. $20; B. $30; C. $50. C True +short_conv_1269 gaokao_audio/short_conv_1269.wav 111111 What does the woman say about Sally? A. She is wrong to fire her boss; B. She always says what she thinks; C. She seems to have a lot in her mind. B True +short_conv_127 gaokao_audio/short_conv_127.wav 111111 What does the man think the woman should do at 4:00? A. Write a report; B. Meet Mr. Black; C. Pick up her parents. B True +short_conv_1270 gaokao_audio/short_conv_1270.wav 111111 What colour window does the man prefer? A. Green; B. Yellow; C. Dark blue. C True +short_conv_1271 gaokao_audio/short_conv_1271.wav 111111 What will the man do? A. Make a cake; B. Buy a chocolate cake; C. Bring some chocolate to Alice. B True +short_conv_1272 gaokao_audio/short_conv_1272.wav 111111 How long is the swimming pool open today? A. 10 hours; B. 11 hours; C. 12 hours. 
B True +short_conv_1273 gaokao_audio/short_conv_1273.wav 111111 What can we know about the man? A. He likes singing; B. He likes challenges; C. He likes music. C True +short_conv_1274 gaokao_audio/short_conv_1274.wav 111111 What does the woman learn in college? A. Learning communication; B. Computer science; C. International trade. B True +short_conv_1275 gaokao_audio/short_conv_1275.wav 111111 When is the meeting? A. On Thursday; B. On Friday; C. On Saturday. B True +short_conv_1276 gaokao_audio/short_conv_1276.wav 111111 What did the girl do last night? A. She had a talk; B. She watched TV; C. She did her homework. A True +short_conv_1277 gaokao_audio/short_conv_1277.wav 111111 What does the man mean? A. He dislikes eating chicken; B. He won’t eat the burned chicken; C. He forgets to cook the chicken. B True +short_conv_1278 gaokao_audio/short_conv_1278.wav 111111 How long does the man plan to try teleworking? A. A few days; B. A few weeks; C. A few months. C True +short_conv_1279 gaokao_audio/short_conv_1279.wav 111111 What are the speakers talking about? A. A term schedule; B. Their vacation plan; C. The optional courses. A True +short_conv_128 gaokao_audio/short_conv_128.wav 111111 How often does the man write to his parents? A. Twice a week; B. Twice a month; C. Three times a month. A True +short_conv_1280 gaokao_audio/short_conv_1280.wav 111111 Where was the man yesterday afternoon? A. By a lake; B. In a café; C. In a cinema. B True +short_conv_1281 gaokao_audio/short_conv_1281.wav 111111 What’s the weather like today? A. Rainy; B. Sunny; C. Windy. B True +short_conv_1282 gaokao_audio/short_conv_1282.wav 111111 Whose niece came just now? A. Jenny’s; B. Alice’s ; C. Tom’s. B True +short_conv_1283 gaokao_audio/short_conv_1283.wav 111111 Who asked John to leave? A. Paula; B. Mike; C. Helen. A True +short_conv_1284 gaokao_audio/short_conv_1284.wav 111111 What will the man do next? A. Put on his overcoat; B. Put on Jack’s overcoat; C. Buy an overcoat. B True +short_conv_1285 gaokao_audio/short_conv_1285.wav 111111 What is the right time order? A. Start work, visit friends, rent a house; B. Start work, rent a house, visit friends; C. Visit friends, rent a house, start work. C True +short_conv_1286 gaokao_audio/short_conv_1286.wav 111111 Which place is the man probably going to? A. The Science Museum; B. The railway station; C. The No. 15 bus stop. B True +short_conv_1287 gaokao_audio/short_conv_1287.wav 111111 What will the man do? A. Fly a kite with the woman; B. Fly a kite by himself; C. Stay at home. A True +short_conv_1288 gaokao_audio/short_conv_1288.wav 111111 What does the woman mean? A. The film is too long; B. The film is too short; C. The film is boring. C True +short_conv_1289 gaokao_audio/short_conv_1289.wav 111111 What does the woman ask the man to do right now? A. Watch TV; B. Go to bed; C. Do homework. C True +short_conv_129 gaokao_audio/short_conv_129.wav 111111 How much change should the man get? A. $3; B. $7; C. $9. A True +short_conv_1290 gaokao_audio/short_conv_1290.wav 111111 Where are the speakers? A. At a department store; B. At an airport; C. At a restaurant. B True +short_conv_1291 gaokao_audio/short_conv_1291.wav 111111 When will the speakers get to the theater if they leave now? A. At 7:35; B. At 7:45; C. At 8:20. B True +short_conv_1292 gaokao_audio/short_conv_1292.wav 111111 What will they do tonight? A. Go to the cinema; B. Go to a concert; C. Go to a restaurant. B True +short_conv_1293 gaokao_audio/short_conv_1293.wav 111111 Where are the speakers probably? 
A. At a bank; B. At a supermarket; C. At a post office. C True +short_conv_1294 gaokao_audio/short_conv_1294.wav 111111 What does the man want? A. A cup of coffee; B. Orange juice; C. Pancakes. A True +short_conv_1295 gaokao_audio/short_conv_1295.wav 111111 When does the train leave for Boston? A. 9: 25; B. 12:00; C. 11: 45. A True +short_conv_1296 gaokao_audio/short_conv_1296.wav 111111 How is the weather today? A. Sunny; B. Cloudy; C. Rainy. B True +short_conv_1297 gaokao_audio/short_conv_1297.wav 111111 What does the man want to know? A. What time it is; B. When his train is supposed to leave; C. Where he can find the boarding hall. A True +short_conv_1298 gaokao_audio/short_conv_1298.wav 111111 Who might Freddy be? A. The speakers’ son; B. The speakers’ pet; C. The speakers’ house owner. B True +short_conv_1299 gaokao_audio/short_conv_1299.wav 111111 Why is the woman sad? A. She lost her homework; B. She didn’t finish her essay; C. She forgot to reply to her teacher’s email. A True +short_conv_13 gaokao_audio/short_conv_13.wav 111111 How long will the man have to wait for another bus? A. 5 minutes; B. 15 minutes; C. 20 minutes. B True +short_conv_130 gaokao_audio/short_conv_130.wav 111111 What does the woman imply about Peter? A. He likes to follow the fashion; B. He has bad taste in dressing; C. He missed a few lessons. B True +short_conv_1300 gaokao_audio/short_conv_1300.wav 111111 What is the girl doing? A. Making the bed; B. Fixing the chair; C. Driving to school. B True +short_conv_1301 gaokao_audio/short_conv_1301.wav 111111 What did Kate do? A. She sang a song; B. She listened to some music; C. She took a nap in her room. B True +short_conv_1302 gaokao_audio/short_conv_1302.wav 111111 What does the man say about the necklace? A. It suits the woman; B. It doesn't look expensive; C. It looks really old. A True +short_conv_1303 gaokao_audio/short_conv_1303.wav 111111 What problem does the man have? A. He has some pain in the ears; B. His glasses aren't suitable; C. His back hurts. B True +short_conv_1304 gaokao_audio/short_conv_1304.wav 111111 Why does the man make the call? A. To make an appointment; B. To do some experiments; C. To ask about his test results. C True +short_conv_1305 gaokao_audio/short_conv_1305.wav 111111 What does the woman mean? A. She agrees with the man; B. She doesn't know Mrs.Kim; C. Mrs.Kim shouldn't be on the committee. A True +short_conv_1306 gaokao_audio/short_conv_1306.wav 111111 Where does the woman live now? A. In the dorm; B. In the hotel; C. In an apartment. C True +short_conv_1307 gaokao_audio/short_conv_1307.wav 111111 What will the boy probably do this weekend? A. Have a picnic; B. Learn about science; C. Study math by himself. B True +short_conv_1308 gaokao_audio/short_conv_1308.wav 111111 Who is Jack probably talking with? A. His mother; B. His teacher; C. His dentist. A True +short_conv_1309 gaokao_audio/short_conv_1309.wav 111111 How many shirts will the man buy? A. Three; B. Five; C. Six. B True +short_conv_131 gaokao_audio/short_conv_131.wav 111111 What did the man volunteer to do? A. Do gardening; B. Collect stamps; C. Protect the plants. C True +short_conv_1310 gaokao_audio/short_conv_1310.wav 111111 How will the woman go to the town center? A. By train; B. By bus; C. By taxi. A True +short_conv_1311 gaokao_audio/short_conv_1311.wav 111111 Whose book does Suzie have? A. Hannah’s; B. Her mother’s; C. Deborah’s. C True +short_conv_1312 gaokao_audio/short_conv_1312.wav 111111 What is the man’s problem? A. He is very hungry; B.
He dialed the wrong number; C. He doesn’t want a room facing the sea. B True +short_conv_1313 gaokao_audio/short_conv_1313.wav 111111 What is the man doing there? A. He is living there; B. He is on holiday there; C. He is working there. B True +short_conv_1314 gaokao_audio/short_conv_1314.wav 111111 What is the probable relationship between the two speakers? A. Husband and wife; B. Teacher and student; C. Employer and employee. A True +short_conv_1315 gaokao_audio/short_conv_1315.wav 111111 Which is the best way to get to the Water Cube at the moment? A. Taking a subway; B. Taking a bus; C. Taking a taxi. A True +short_conv_1316 gaokao_audio/short_conv_1316.wav 111111 What time is it now? A. 7:15; B. 8:00; C. 8:45. A True +short_conv_1317 gaokao_audio/short_conv_1317.wav 111111 What does the man say about the movie? A. It’s horrible; B. It’s amusing; C. It’s not good. B True +short_conv_1318 gaokao_audio/short_conv_1318.wav 111111 What is the probable relationship between the speakers? A. Acquaintances; B. Classmates; C. A couple. C True +short_conv_1319 gaokao_audio/short_conv_1319.wav 111111 What are the speakers talking about? A. A farm; B. Some houses; C. A corn field. A True +short_conv_132 gaokao_audio/short_conv_132.wav 111111 Where might the speakers be? A. In a hospital; B. At a restaurant; C. On a bus. B True +short_conv_1320 gaokao_audio/short_conv_1320.wav 111111 Why is John late for school? A. He was stuck in traffic; B. He hurt his head; C. He did a good deed. C True +short_conv_1321 gaokao_audio/short_conv_1321.wav 111111 What does the woman think of the car journey? A. It’s too long; B. It’s very exciting; C. It’s rather dangerous. A True +short_conv_1322 gaokao_audio/short_conv_1322.wav 111111 What hat is the man looking for? A. The cowboy hat; B. The one with stars; C. The one with a baseball logo. B True +short_conv_1323 gaokao_audio/short_conv_1323.wav 111111 What do we know about the woman? A. She works as a tutor at night; B. She has a well­-paid job; C. She got a pay raise recently. C True +short_conv_1324 gaokao_audio/short_conv_1324.wav 111111 What has led Amy to success? A. Her intelligence; B. Her effort; C. Her luck. B True +short_conv_1325 gaokao_audio/short_conv_1325.wav 111111 How did the woman feel about her life? A. Worried; B. Satisfied; C. Bored. C True +short_conv_1326 gaokao_audio/short_conv_1326.wav 111111 How will the man go to the train station tonight? A. By car; B. By bus; C. On foot. B True +short_conv_1327 gaokao_audio/short_conv_1327.wav 111111 What are the speakers mainly talking about? A. Young people lose their jobs easily; B. Young people are too quick in making decisions; C. Young people seldom stay long in the same job. C True +short_conv_1328 gaokao_audio/short_conv_1328.wav 111111 How does the woman feel about the zoo? A. Sad B. Impressed C. Disappointed B True +short_conv_1329 gaokao_audio/short_conv_1329.wav 111111 What is the woman doing now? A. Baking cookies B. Making a list C. Shopping for groceries B True +short_conv_133 gaokao_audio/short_conv_133.wav 111111 What is the man complaining about? A. The food; B. The project; C. The noise. C True +short_conv_1330 gaokao_audio/short_conv_1330.wav 111111 What happened to the woman? A. She woke up late B. She got to work late C. She stayed up late A True +short_conv_1331 gaokao_audio/short_conv_1331.wav 111111 Where does the man want to go? A. To a railway station B. To a Post Office C. 
To the seaside C True +short_conv_1332 gaokao_audio/short_conv_1332.wav 111111 Why hasn’t the man begun Christmas shopping yet? A. He has no money; B. It’s still the early of November; C. He has no time. C True +short_conv_1333 gaokao_audio/short_conv_1333.wav 111111 What does the man think of Mr. Anderson’s class? A. Easy to pass; B. Hard to follow; C. Worth learning. C True +short_conv_1334 gaokao_audio/short_conv_1334.wav 111111 What have NOT the speakers planned to do? A. Pay off the loans; B. Send their kids abroad; C. Go traveling. B True +short_conv_1335 gaokao_audio/short_conv_1335.wav 111111 When will the speakers go roller-skating? A. 8:00 p.m. tonight; B. 9:00 a.m. tomorrow; C. 9:00 p.m. tomorrow. B True +short_conv_1336 gaokao_audio/short_conv_1336.wav 111111 Where does the conversation take place probably? A. In a hotel; B. In a cinema; C. In a shopping center. A True +short_conv_1337 gaokao_audio/short_conv_1337.wav 111111 Why will the woman go to the park today? A. To play basketball; B. To play volleyball; C. To walk with her friends. B True +short_conv_1338 gaokao_audio/short_conv_1338.wav 111111 Where will the man sit in the restaurant? A. Near the door; B. In the corner; C. Near the window. C True +short_conv_1339 gaokao_audio/short_conv_1339.wav 111111 When will Professor Davidson talk with the woman? A. After his class; B. The next day; C. Before office hours. B True +short_conv_134 gaokao_audio/short_conv_134.wav 111111 Which sport does the man prefer now? A. Tennis; B. Basketball; C. Football. B True +short_conv_1340 gaokao_audio/short_conv_1340.wav 111111 Where does the conversation probably take place? A. In a hotel; B. In a hospital; C. In a restaurant. A True +short_conv_1341 gaokao_audio/short_conv_1341.wav 111111 What does the man prefer to do on Sundays? A. Go shopping; B. Go swimming; C. Do some reading. C True +short_conv_1342 gaokao_audio/short_conv_1342.wav 111111 Why didn’t Helen come to the party? A. She was not invited B. She forgot the time C. She had a piano lesson C True +short_conv_1343 gaokao_audio/short_conv_1343.wav 111111 Where did Alice go on the weekend? A. To the countryside B. To the entertainment park C. To the cinema A True +short_conv_1344 gaokao_audio/short_conv_1344.wav 111111 What are the speakers probably talking about? A. TV shows; B. Stories; C. Books. A True +short_conv_1345 gaokao_audio/short_conv_1345.wav 111111 How old is Tina now? A. 19 B. 20 C. 21 C True +short_conv_1346 gaokao_audio/short_conv_1346.wav 111111 What is the weather like? A. It’s fine; B. It’s rainy; C. It’s too hot. B True +short_conv_1347 gaokao_audio/short_conv_1347.wav 111111 What’s the woman’s plan for tonight? A. To visit her parents B. To see a movie with the man C. The have classes with the man A True +short_conv_1348 gaokao_audio/short_conv_1348.wav 111111 Where is Lucy? A. In her office B. In the meeting room C. On the playground C True +short_conv_1349 gaokao_audio/short_conv_1349.wav 111111 How may Kate feel now? A. Excited B. Sad C. Happy B True +short_conv_135 gaokao_audio/short_conv_135.wav 111111 Who will pay for the meal? A. The man; B. The woman; C. The woman’s sister. C True +short_conv_1350 gaokao_audio/short_conv_1350.wav 111111 What does the woman want to be? A. An engineer B. A teacher C. A doctor B True +short_conv_1351 gaokao_audio/short_conv_1351.wav 111111 What does the woman like best? A. Apples B. Oranges C. Bananas A True +short_conv_1352 gaokao_audio/short_conv_1352.wav 111111 Why was the man late for work? A. 
He was in an accident; B. His car was being repaired; C. He couldn’t get his car going. C True +short_conv_1353 gaokao_audio/short_conv_1353.wav 111111 What was wrong with Jack? A. He had a fever; B. He was in hospital; C. He was late for work. C True +short_conv_1354 gaokao_audio/short_conv_1354.wav 111111 How long will the man stay in France? A. Five weeks; B. Three weeks; C. Two weeks. A True +short_conv_1355 gaokao_audio/short_conv_1355.wav 111111 What are the speakers doing? A. Working; B. Jogging; C. Drinking. B True +short_conv_1356 gaokao_audio/short_conv_1356.wav 111111 What made the man so worried? A. The exam; B. The paper; C. His teacher. B True +short_conv_1357 gaokao_audio/short_conv_1357.wav 111111 How much did the woman’s trousers cost? A. 45 dollars; B. 12 dollars; C. 33 dollars. C True +short_conv_1358 gaokao_audio/short_conv_1358.wav 111111 Where can you most probably hear this talk? A. In a department store; B. In a post office; C. In a bank. C True +short_conv_1359 gaokao_audio/short_conv_1359.wav 111111 Why does the man turn down the woman’s offer? A. He doesn’t have coffee before lunch; B. He doesn’t feel like wine; C. He prefers tea. A True +short_conv_136 gaokao_audio/short_conv_136.wav 111111 Why is the woman surprised? A. The shirt is very expensive; B. Her husband wants four shirts; C. The man doesn’t agree with her. A True +short_conv_1360 gaokao_audio/short_conv_1360.wav 111111 How much time is left before the movie begins? A. 7 minutes; B. 15 minutes; C. 30 minutes. B True +short_conv_1361 gaokao_audio/short_conv_1361.wav 111111 What did the woman do last Saturday? A. She saw a play; B. She acted in a play; C. She went to the tea house. A True +short_conv_1362 gaokao_audio/short_conv_1362.wav 111111 Where does the conversation most probably take place? A. In a library; B. At a bookstore; C. In a museum. A True +short_conv_1363 gaokao_audio/short_conv_1363.wav 111111 What was the weather like in the mountains yesterday? A. Sunny; B. Windy; C. Snowy. C True +short_conv_1364 gaokao_audio/short_conv_1364.wav 111111 What does the man want to cut out of paper? A. A fish; B. A bird; C. A monkey. B True +short_conv_1365 gaokao_audio/short_conv_1365.wav 111111 When will the film start? A. At 5:00; B. At 6:00; C. At 7:00. C True +short_conv_1366 gaokao_audio/short_conv_1366.wav 111111 Which club will the man join? A. The film club; B. The travel club; C. The sports club. B True +short_conv_1367 gaokao_audio/short_conv_1367.wav 111111 What is the relationship between the speakers? A. Mother and son; B. Hostess and guest; C. Waitress and customer. B True +short_conv_1368 gaokao_audio/short_conv_1368.wav 111111 What does the man imply? A. His brother will watch the game; B. He isn’t interested in the game; C. His brother will play in the game. C True +short_conv_1369 gaokao_audio/short_conv_1369.wav 111111 What book has the man’s sister got? A. A medical book; B. A Chinese textbook; C. An English textbook. A True +short_conv_137 gaokao_audio/short_conv_137.wav 111111 What will the man do on his birthday? A. Have a party; B. See a movie; C. Go out for a meal. B True +short_conv_1370 gaokao_audio/short_conv_1370.wav 111111 Who is ill? A. The man; B. The woman; C. The man’s brother. C True +short_conv_1371 gaokao_audio/short_conv_1371.wav 111111 What is the woman doing? A. Walking; B. Driving; C. Running. B True +short_conv_1372 gaokao_audio/short_conv_1372.wav 111111 What are the speakers talking about? A. A dress; B. A sale; C. Some shoes. 
A True +short_conv_1373 gaokao_audio/short_conv_1373.wav 111111 Where are the speakers? A. On a bus; B. On a train; C. On a plane. C True +short_conv_1374 gaokao_audio/short_conv_1374.wav 111111 What happened to the woman? A. She was late for work; B. She offered bad service; C. She was asked to leave her job. A True +short_conv_1375 gaokao_audio/short_conv_1375.wav 111111 What is the woman looking for? A. Her glasses; B. Her keys; C. Her books. A True +short_conv_1376 gaokao_audio/short_conv_1376.wav 111111 What kind of weather does the man like? A. Rainy; B. Sunny; C. Cloudy. B True +short_conv_1377 gaokao_audio/short_conv_1377.wav 111111 What time of day is it? A. Morning; B. Afternoon; C. Evening. A True +short_conv_1378 gaokao_audio/short_conv_1378.wav 111111 What is the woman’s job? A. A shopkeeper; B. A waitress; C. A saleswoman. B True +short_conv_1379 gaokao_audio/short_conv_1379.wav 111111 When should Tony return the money to Emma? A. In July; B. In August; C. In September. C True +short_conv_138 gaokao_audio/short_conv_138.wav 111111 How can people travel today? A. By air; B. By ship; C. By train. C True +short_conv_1380 gaokao_audio/short_conv_1380.wav 111111 What will the woman do next? A. Go home; B. Copy her ID C. Deposit her check. A True +short_conv_1381 gaokao_audio/short_conv_1381.wav 111111 What color is the book that the man wants? A. Red; B. Black; C. Blue. B True +short_conv_1382 gaokao_audio/short_conv_1382.wav 111111 When would Thomas and Lily like to leave? A. Tomorrow; B. Next Monday or Tuesday; C. This Tuesday. B True +short_conv_1383 gaokao_audio/short_conv_1383.wav 111111 What are the two speakers talking about? A. A novel; B. A film; C. A cinema. B True +short_conv_1384 gaokao_audio/short_conv_1384.wav 111111 What is the man doing? A. Asking for help; B. Making suggestions; C. Asking for permission. B True +short_conv_1385 gaokao_audio/short_conv_1385.wav 111111 What are the two speakers doing? A. Watching a film; B. Talking about a book; C. Listening to a record. B True +short_conv_1386 gaokao_audio/short_conv_1386.wav 111111 Who is the man? A. A traffic policeman; B. A taxi driver; C. A restaurant waiter. B True +short_conv_1387 gaokao_audio/short_conv_1387.wav 111111 What will the man do first? A. Wash the cups B. Clean the floor C. Clean the windows. B True +short_conv_1388 gaokao_audio/short_conv_1388.wav 111111 What was the weather like yesterday? A. Rainy; B. Sunny; C. Snowy. A True +short_conv_1389 gaokao_audio/short_conv_1389.wav 111111 Where are the man’s gloves probably? A. In his schoolbag B. In his pocket; C. On the sofa C True +short_conv_139 gaokao_audio/short_conv_139.wav 111111 What are the speakers mainly talking about? A. A book; B. An album; C. A song. B True +short_conv_1390 gaokao_audio/short_conv_1390.wav 111111 What are the speakers talking about? A. A ball B. A boy C. A hat B True +short_conv_1391 gaokao_audio/short_conv_1391.wav 111111 What time is it now? A. 1:15pm B. 2:15pm C. 3:15pm. A True +short_conv_1392 gaokao_audio/short_conv_1392.wav 111111 What is the weather like today? A. It’s rainy; B. It’s cloudy; C. It’s sunny. C True +short_conv_1393 gaokao_audio/short_conv_1393.wav 111111 What was the man doing at the moment? A. He was listening to the radio; B. He was writing something; C. He was reading a book. B True +short_conv_1394 gaokao_audio/short_conv_1394.wav 111111 Why will the woman leave before eleven? A. To go home; B. To buy something; C. To make a work plan. 
B True +short_conv_1395 gaokao_audio/short_conv_1395.wav 111111 Why does the man move to New York? A. To work there; B. To look after parents; C. To make a trip. A True +short_conv_1396 gaokao_audio/short_conv_1396.wav 111111 What are the two speakers talking about? A. Buying a TV; B. TV channel; C. Sports meet. B True +short_conv_1397 gaokao_audio/short_conv_1397.wav 111111 What does Susan mean? A. She had a date then; B. She will put off the meeting; C. She didn’t have time to prepare the speech. C True +short_conv_1398 gaokao_audio/short_conv_1398.wav 111111 What does the man want to do? A. Learn to play baseball; B. Organize a baseball team; C. Find a baseball player. A True +short_conv_1399 gaokao_audio/short_conv_1399.wav 111111 What’s wrong with Jane? A. She misses her home very much; B. She hasn’t received her mother’s letter; C. She is worried about her mother’s health. A True +short_conv_14 gaokao_audio/short_conv_14.wav 111111 What is the man going to do? A. Stay inside; B. Buy an umbrella; C. Walk out with an umbrella. A True +short_conv_140 gaokao_audio/short_conv_140.wav 111111 How does the woman finally decide to go home? A. By bus; B. In the man’s car; C. In her father’s car. C True +short_conv_1400 gaokao_audio/short_conv_1400.wav 111111 What does the man advise the woman to do? A. See Mr. Smith; B. Check the letter; C. Type the letter again. C True +short_conv_1401 gaokao_audio/short_conv_1401.wav 111111 How many people will visit New York for free? A. 2; B. 3; C. 5. C True +short_conv_1402 gaokao_audio/short_conv_1402.wav 111111 According to the woman, what should the man do at first? A. He should ask about the flat on the phone; B. He should read the advertisements for flats in the newspaper; C. He should phone and make an appointment. B True +short_conv_1403 gaokao_audio/short_conv_1403.wav 111111 What is the man’s choice? A. He prefers train for trip; B. He doesn’t like traveling; C. Not mentioned. C True +short_conv_1404 gaokao_audio/short_conv_1404.wav 111111 When would Thomas and Lily like to leave? A. Tomorrow; B. Next Monday or Tuesday; C. This Monday. B True +short_conv_1405 gaokao_audio/short_conv_1405.wav 111111 Where and when will the meeting be held? A. Room 303,3:00 pm; B. Room 303,2:00 pm; C. Room 302,2:00 pm. C True +short_conv_1406 gaokao_audio/short_conv_1406.wav 111111 What does the woman want to do? A. To have an X ray; B. To go to the hospital; C. To help the wounded man. C True +short_conv_1407 gaokao_audio/short_conv_1407.wav 111111 What is the boy going to do today? A. Watch a football game; B. Go to see a doctor; C. Call his head teacher. A True +short_conv_1408 gaokao_audio/short_conv_1408.wav 111111 What does the man say about his country? A. It is cold; B. It is hot; C. It is rainy. C True +short_conv_1409 gaokao_audio/short_conv_1409.wav 111111 What does the man mean? A. They don’t have to arrive for the Brown’s lunch on time; B. It’s impolite to be late for the Brown’s lunch; C. They don’t have to have manners in France. A True +short_conv_141 gaokao_audio/short_conv_141.wav 111111 When will the man leave for Sweden? A. Today; B. Tomorrow; C. The day after tomorrow. A True +short_conv_1410 gaokao_audio/short_conv_1410.wav 111111 What are the speakers talking about? A. An article in the newspaper; B. A meeting with the president; C. A speech on television. C True +short_conv_1411 gaokao_audio/short_conv_1411.wav 111111 What is the change? A. $0.50 B. $0.75 C. 
$3.25 B True +short_conv_1412 gaokao_audio/short_conv_1412.wav 111111 What is the problem with her English? A. Her spelling is very poor; B. Her speaking is not good; C. Her pronunciation is not good. A True +short_conv_1413 gaokao_audio/short_conv_1413.wav 111111 Why can't the woman go to the party? A. She is sick; B. She has to work; C. She has to stay at home. B True +short_conv_1414 gaokao_audio/short_conv_1414.wav 111111 Where is the woman going now? A. Her brother's office B. Her own house; C. The market. C True +short_conv_1415 gaokao_audio/short_conv_1415.wav 111111 What's the correct time? A. 8:20; B. 8:25; C. 8:15. B True +short_conv_1416 gaokao_audio/short_conv_1416.wav 111111 Where are the two speakers? A. On a ship; B. On a train; C. On a plane. C True +short_conv_1417 gaokao_audio/short_conv_1417.wav 111111 What is the woman going to do first after she leaves home? A. Buy two chickens B. Go to the bank; C. Go to eat something. B True +short_conv_1418 gaokao_audio/short_conv_1418.wav 111111 When did the two speakers plan to meet Jane? A. At2:00 B. At2:15 C. At2:30 A True +short_conv_1419 gaokao_audio/short_conv_1419.wav 111111 What would be the woman’s advice? A. Don't drink water with ice B. Don't have the drinks at lunch C. Don't eat cold dishes in the summer A True +short_conv_142 gaokao_audio/short_conv_142.wav 111111 What is the weather probably like now? A. Hot; B. Cold; C. Warm. B True +short_conv_1420 gaokao_audio/short_conv_1420.wav 111111 Who will not be at the party? A. Jessica B. Linda C. Rose C True +short_conv_1421 gaokao_audio/short_conv_1421.wav 111111 What are the two speakers probably doing? A. Playing a game; B. Enjoying a play; C. Watching a match C True +short_conv_1422 gaokao_audio/short_conv_1422.wav 111111 Why did the speakers’ cake turn out poorly this time? A. They took it out too early; B. The oven was broken halfway; C. They left it in the oven too long. A True +short_conv_1423 gaokao_audio/short_conv_1423.wav 111111 What’s the boy’s plan for this weekend? A. Going to the countryside; B. Visiting his parents; C. Playing chess with the girl. A True +short_conv_1424 gaokao_audio/short_conv_1424.wav 111111 Where is the leather sofa now? A. In the bedroom; B. In the living room; C. In the dining room. C True +short_conv_1425 gaokao_audio/short_conv_1425.wav 111111 What did the girl do during the winter holiday? A. She visited her teachers; B. She read some books; C. She went sightseeing. C True +short_conv_1426 gaokao_audio/short_conv_1426.wav 111111 What is this conversation mainly about? A. Grandma’s cooking; B. The Spring Festival; C. A trip C True +short_conv_1427 gaokao_audio/short_conv_1427.wav 111111 What does the woman think of her interview? A. It was tough; B. It was interesting C. It was successful C True +short_conv_1428 gaokao_audio/short_conv_1428.wav 111111 What are the speakers talking about? A. A restaurant B. A street C. A dish A True +short_conv_1429 gaokao_audio/short_conv_1429.wav 111111 Where docs the conversation probably take place? A. In a bank B. At a ticket office C. On a train B True +short_conv_143 gaokao_audio/short_conv_143.wav 111111 What does the woman need to do today? A. Attend a competition; B. Collect some material; C. Recite a composition. B True +short_conv_1430 gaokao_audio/short_conv_1430.wav 111111 What is the probable relationship between the speakers? A. Workmates B. Brother and sister C. 
Teacher and student A True +short_conv_1431 gaokao_audio/short_conv_1431.wav 111111 What does John find difficult in learning German? A. Pronunciation B. Vocabulary C. Grammar C True +short_conv_1432 gaokao_audio/short_conv_1432.wav 111111 When is the plane taking off? A. At 9:00; B. At 10:00; C. At 10:30. C True +short_conv_1433 gaokao_audio/short_conv_1433.wav 111111 What is the man doing? A. Watching TV; B. Answering the phone; C. Repairing the TV. B True +short_conv_1434 gaokao_audio/short_conv_1434.wav 111111 Whom does the dictionary belong to? A. Mary; B. Jack; C. Jane. B True +short_conv_1435 gaokao_audio/short_conv_1435.wav 111111 How did the man come here? A. By bus; B. By taxi; C. By car. A True +short_conv_1436 gaokao_audio/short_conv_1436.wav 111111 What is the man going to do at the weekend? A. Meet some friends; B. Take a holiday; C. Stay at home. C True +short_conv_1437 gaokao_audio/short_conv_1437.wav 111111 What does the man think of the book? A. Scientific; B. Interesting; C. Popular. B True +short_conv_1438 gaokao_audio/short_conv_1438.wav 111111 What will the speakers do? A. Buy the T-shirt; B. Try on the red skirt; C. Go to another shop. C True +short_conv_1439 gaokao_audio/short_conv_1439.wav 111111 What are the speakers mainly talking about? A. The environment; B. The price of petrol; C. Electric vehicles C True +short_conv_144 gaokao_audio/short_conv_144.wav 111111 What does the man dislike? A. Old bookcases; B. Newly painted wall; C. Old paintings. B True +short_conv_1440 gaokao_audio/short_conv_1440.wav 111111 What does the woman want to do? A. Look for a job; B. Put up an ad; C. Remove the snow. A True +short_conv_1441 gaokao_audio/short_conv_1441.wav 111111 Where are the speakers? A. At home; B. At the man’s office; C. At a clinic C True +short_conv_1442 gaokao_audio/short_conv_1442.wav 111111 What are the speakers talking about? A. A lightweight bag; B. Things to wear; C. The warm weather. B True +short_conv_1443 gaokao_audio/short_conv_1443.wav 111111 Why was Mr.Johnson in the hospital? A. His wife was ill; B. His wife had a baby; C. He went to visit his daughter. C True +short_conv_1444 gaokao_audio/short_conv_1444.wav 111111 How does the man get to work? A. By car; B. By train; C. By bus. B True +short_conv_1445 gaokao_audio/short_conv_1445.wav 111111 When does the man want to play? A. Later today; B. Tomorrow; C. Right now. B True +short_conv_1446 gaokao_audio/short_conv_1446.wav 111111 What does the woman want to eat? A. Fruit; B. Eggs; C. Pancakes. A True +short_conv_1447 gaokao_audio/short_conv_1447.wav 111111 What are the speakers talking about? A. When to relax; B. How to keep fit; C. How to handle pressure. C True +short_conv_1448 gaokao_audio/short_conv_1448.wav 111111 What does the man suggest the woman do? A. Ignore the ad; B. Get more information first; C. Order a computer right away. B True +short_conv_1449 gaokao_audio/short_conv_1449.wav 111111 What does the boy want to have? A. A dog; B. A rabbit; C. Some fish. A True +short_conv_145 gaokao_audio/short_conv_145.wav 111111 Where is the Blue Ocean Restaurant? A. Beside the Blue Sky Restaurant; B. Opposite the Blue Sky Restaurant; C. Opposite the Blue Bay Restaurant. C True +short_conv_1450 gaokao_audio/short_conv_1450.wav 111111 What’s the relationship between the speakers? A. Host and guest; B. Doctor and patient; C. Teacher and student. B True +short_conv_1451 gaokao_audio/short_conv_1451.wav 111111 Where is Sally going tonight? A. To a party; B. To a shop; C. To a mountain. 
A True +short_conv_1452 gaokao_audio/short_conv_1452.wav 111111 Why will the woman go to Beijing? A. She has found a new job there; B. She will attend college there; C. She wants to see the world. A True +short_conv_1453 gaokao_audio/short_conv_1453.wav 111111 How will the man probably go downtown? A. By bus; B. By taxi; C. By subway. A True +short_conv_1454 gaokao_audio/short_conv_1454.wav 111111 Where are the speakers? A. In a school; B. At an airport; C. At a railway station. B True +short_conv_1455 gaokao_audio/short_conv_1455.wav 111111 What is the relationship between the speakers? A. Boss and secretary; B. Teacher and student; C. Co-workers. C True +short_conv_1456 gaokao_audio/short_conv_1456.wav 111111 What does the man want to know? A. Where the sign-up sheet is; B. When he can take the field trip; C. Whether the woman will go on the field trip. A True +short_conv_1457 gaokao_audio/short_conv_1457.wav 111111 What are the two speakers mainly talking about? A. A new movie; B. An old movie; C. A fun experience. B True +short_conv_1458 gaokao_audio/short_conv_1458.wav 111111 What will the weather be like this evening? A. Rainy; B. Cloudy; C. Fine. A True +short_conv_1459 gaokao_audio/short_conv_1459.wav 111111 Who is the woman most probably? A. The man’s mother; B. The man’s boss; C. The man’s colleague. A True +short_conv_146 gaokao_audio/short_conv_146.wav 111111 What’s the relationship between the two speakers? A. Relatives; B. Friends; C. Classmates. B True +short_conv_1460 gaokao_audio/short_conv_1460.wav 111111 What does the man mean? A. The problems are hard for him too; B. He has dealt with all the problems; C. The woman should make a good plan. C True +short_conv_1461 gaokao_audio/short_conv_1461.wav 111111 When is the supermarket closed on weekends? A. At 9:00 pm; B. At 10:00 pm; C. At 11:00 pm. B True +short_conv_1462 gaokao_audio/short_conv_1462.wav 111111 What does the woman advise the man to do? A. Pay extra money; B. Drop the lessons; C. Continue learning. C True +short_conv_1463 gaokao_audio/short_conv_1463.wav 111111 Why does the man telephone the reservation office? A. To cancel his flight; B. To confirm his flight; C. To book a ticket. C True +short_conv_1464 gaokao_audio/short_conv_1464.wav 111111 What will the man do next? A. Search for his room key; B. Go to the front desk; C. Change his ID card. B True +short_conv_1465 gaokao_audio/short_conv_1465.wav 111111 What is the possible relationship between the speakers? A. Waitress and customer; B. Cook and waiter; C. Husband and wife. C True +short_conv_1466 gaokao_audio/short_conv_1466.wav 111111 When will the man meet John? A. Tonight; B. Tomorrow; C. The day after tomorrow. A True +short_conv_1467 gaokao_audio/short_conv_1467.wav 111111 What are the speakers talking about? A. The photos; B. The latest fashion; C. The woman's younger sister. A True +short_conv_1468 gaokao_audio/short_conv_1468.wav 111111 Where are the speakers? A. At a travel agency; B. At a post office; C. At a bank. B True +short_conv_1469 gaokao_audio/short_conv_1469.wav 111111 What does the woman look like? A. Excited; B. Nervous; C. Tired. C True +short_conv_147 gaokao_audio/short_conv_147.wav 111111 Why is the man worried? A. He may be turned down; B. He was required to apply to Cambridge; C. He failed to be among the top five. A True +short_conv_1470 gaokao_audio/short_conv_1470.wav 111111 What does the woman want? A. The man's phone; B. The man's phone charger; C. A new mobile phone. 
B True +short_conv_1471 gaokao_audio/short_conv_1471.wav 111111 What did the man like about the movie? A. The acting; B. The jokes; C. The music. C True +short_conv_1472 gaokao_audio/short_conv_1472.wav 111111 What's Jack's opinion of his new English teacher. A. She is ambitious; B. She is humorous; C. She is responsible. C True +short_conv_1473 gaokao_audio/short_conv_1473.wav 111111 What are the two speakers talking about? A. What to listen to; B. How to handle stress; C. What to read. B True +short_conv_1474 gaokao_audio/short_conv_1474.wav 111111 What is the total cost of the tickets? A. 165 B. 220 C. 135 B True +short_conv_1475 gaokao_audio/short_conv_1475.wav 111111 What does the woman mean? A. Dick will go to town; B. Dick will break his word; C. Dick will finish the task. C True +short_conv_1476 gaokao_audio/short_conv_1476.wav 111111 What is the woman's attitude towards the man's answer? A. Understanding; B. Doubtful; C. Appreciative. B True +short_conv_1477 gaokao_audio/short_conv_1477.wav 111111 What time does the woman pick up her son? A. 7:00 p.m; B. 5:00 p.m; C. 5:30 p.m. A True +short_conv_1478 gaokao_audio/short_conv_1478.wav 111111 What is the possible relationship between the speakers? A. Professor and student; B. Customer and waiter; C. Secretary and manager. A True +short_conv_1479 gaokao_audio/short_conv_1479.wav 111111 What will the man do at noon? A. Play football; B. Ride a bike; C. Stay at home. C True +short_conv_148 gaokao_audio/short_conv_148.wav 111111 What does the woman want to know? A. The weather; B. An accident; C. Train time. C True +short_conv_1480 gaokao_audio/short_conv_1480.wav 111111 What does the man mean? A. The woman got a good deal; B. The woman probably paid too much; C. The woman’s hair looks better than normal. B True +short_conv_1481 gaokao_audio/short_conv_1481.wav 111111 Why doesn’t the woman want to drink the water? A. She isn’t thirsty; B. It has dark stuff; C. It tastes bad. C True +short_conv_1482 gaokao_audio/short_conv_1482.wav 111111 What does the girl imply? A. She will be out of town that day; B. She will definitely go to the party; C. She won’t come because it’s Friday. B True +short_conv_1483 gaokao_audio/short_conv_1483.wav 111111 What does the woman mean? A. The man always loses his car keys; B. The man should study harder for his lessons; C. The man should let the woman keep the car keys. A True +short_conv_1484 gaokao_audio/short_conv_1484.wav 111111 How many tickets has the woman got? A. Two; B. Three; C. Four B True +short_conv_1485 gaokao_audio/short_conv_1485.wav 111111 What is Summer in Paris? A. A film; B. A magazine; C. ATV program. A True +short_conv_1486 gaokao_audio/short_conv_1486.wav 111111 What does the woman want to do? A. Borrow a phone; B. Buy a map; C. Ask the way. C True +short_conv_1487 gaokao_audio/short_conv_1487.wav 111111 When was the woman born? A. In 1980 B. In 1982; C. In 1984. A True +short_conv_1488 gaokao_audio/short_conv_1488.wav 111111 What is Tim doing? A. Locking the door; B. Walking a dog; C. Knocking at the door. B True +short_conv_1489 gaokao_audio/short_conv_1489.wav 111111 What’s the man’s idea about the skirt? A. Change it; B. Buy it; C. Reject it. B True +short_conv_149 gaokao_audio/short_conv_149.wav 111111 Who is probably the woman? A. A policewoman; B. A hotel clerk; C. The man’s wife. A True +short_conv_1490 gaokao_audio/short_conv_1490.wav 111111 Why didn’t Peter go for a trip last weekend? A. He missed the train; B. He didn’t buy the ticket; C. He didn’t get to the station. 
A True +short_conv_1491 gaokao_audio/short_conv_1491.wav 111111 Where does Sandra sit in the classroom now? A. By the window; B. In the back row; C. By the door. C True +short_conv_1492 gaokao_audio/short_conv_1492.wav 111111 What are the speakers talking about? A. Rain-forests; B. Animals; C. Weather. A True +short_conv_1493 gaokao_audio/short_conv_1493.wav 111111 What does the woman think of the cleaner’s job? A. Boring; B. Exciting; C. Dangerous. C True +short_conv_1494 gaokao_audio/short_conv_1494.wav 111111 What does the man want the woman to do? A. Come to his house at 8:00; B. Attend a gathering; C. Help him with a job. B True +short_conv_1495 gaokao_audio/short_conv_1495.wav 111111 How did the woman read the book? A. She read through it; B. She read the front chapters; C. She read the chapters that interested her. C True +short_conv_1496 gaokao_audio/short_conv_1496.wav 111111 What are the speakers mainly talking about? A. Which orders are urgent; B. How many orders they've packed; C. Whether to leave work right now. C True +short_conv_1497 gaokao_audio/short_conv_1497.wav 111111 Who built the scenery? A. A carpenter; B. The students; C. A designer. A True +short_conv_1498 gaokao_audio/short_conv_1498.wav 111111 How did the woman feel about her presentation? A. Relaxed; B. Confident; C. Anxious. C True +short_conv_1499 gaokao_audio/short_conv_1499.wav 111111 Where does the conversation probably take place? A. On a farm; B. At a fruit market; C. At customs. C True +short_conv_15 gaokao_audio/short_conv_15.wav 111111 What is the man probably doing? A. Greeting his guests; B. Cleaning the house; C. Arguing with Maggie. B True +short_conv_150 gaokao_audio/short_conv_150.wav 111111 What does the woman think of Tom? A. Shy; B. Impolite; C. Outgoing. C True +short_conv_1500 gaokao_audio/short_conv_1500.wav 111111 When will the man most likely get home? A. At 7:00; B. At about 7:30; C. After 8:00. B True +short_conv_1501 gaokao_audio/short_conv_1501.wav 111111 Where is the woman going next? A. To a snack bar; B. To a movie theater; C. To her friend Simon’s house. A True +short_conv_1502 gaokao_audio/short_conv_1502.wav 111111 What will the man do next? A. Fill out another form; B. Correct his mistake on the form; C. Tell the woman his medical history. A True +short_conv_1503 gaokao_audio/short_conv_1503.wav 111111 What is the man looking for? A. A book; B. His iPhone; C. A pay phone. C True +short_conv_1504 gaokao_audio/short_conv_1504.wav 111111 What does the man mean? A. Mr. Johnson’s ideas are nonsense; B. He quite agrees with Mr. Johnson’s views; C. Mr. Johnson is good at expressing his ideas. B True +short_conv_1505 gaokao_audio/short_conv_1505.wav 111111 How did the man feel when he was called on? A. Worried and frightened; B. Quite embarrassed; C. Deeply ashamed. A True +short_conv_1506 gaokao_audio/short_conv_1506.wav 111111 What does the woman imply the man should do? A. To cut his jeans short; B. To go on a diet; C. To buy a pair of jeans. B True +short_conv_1507 gaokao_audio/short_conv_1507.wav 111111 How much tax should the man pay per night? A. $5; B. $10; C. $15. B True +short_conv_1508 gaokao_audio/short_conv_1508.wav 111111 Why is the woman upset? A. The flower shop is closed; B. She received the wrong delivery; C. Her delivery hasn’t been ready in time. C True +short_conv_1509 gaokao_audio/short_conv_1509.wav 111111 What does the man order? A. Banana juice; B. Orange juice; C. Ice tea.
A True +short_conv_151 gaokao_audio/short_conv_151.wav 111111 How does Jack go to school now? A. On foot; B. By bus; C. By bike. A True +short_conv_1510 gaokao_audio/short_conv_1510.wav 111111 Where does the man want to sit? A. Near the window; B. Far from the window; C. In the smoking section. A True +short_conv_1511 gaokao_audio/short_conv_1511.wav 111111 When will the man call back? A. At 10:00; B. At 10:15; C. At 9:45. B True +short_conv_1512 gaokao_audio/short_conv_1512.wav 111111 What does the woman advise the man to do? A. Do more outdoor activities; B. Think about his homework; C. Start working and watching TV. A True +short_conv_1513 gaokao_audio/short_conv_1513.wav 111111 What do we know about the man? A. He doesn’t like skiing now; B. He is as excited as the woman before skiing now; C. He got excited before going skiing in the past. C True +short_conv_1514 gaokao_audio/short_conv_1514.wav 111111 What can we learn about the man? A. He has a great talent for cooking; B. He is a green hand in cooking; C. He improved the dish of his grandmother. A True +short_conv_1515 gaokao_audio/short_conv_1515.wav 111111 What is the man's attitude? A. Terrified; B. Apologetic; C. Annoyed. C True +short_conv_1516 gaokao_audio/short_conv_1516.wav 111111 What are the speakers talking about? A. Ordering various drinks; B. Preparing for a party; C. Choosing suitable drinks. B True +short_conv_1517 gaokao_audio/short_conv_1517.wav 111111 What does the woman imply? A. The delay of the delivery is caused by the awful weather; B. There is a problem with the policy of food delivery; C. The man should have his delivery fee returned. A True +short_conv_1518 gaokao_audio/short_conv_1518.wav 111111 What is the man probably? A. A fitness instructor; B. A mechanic; C. A medical doctor. A True +short_conv_1519 gaokao_audio/short_conv_1519.wav 111111 What will the man probably do next? A. Take some aspirin; B. See a doctor; C. Drive to the hospital. B True +short_conv_152 gaokao_audio/short_conv_152.wav 111111 Where does the conversation probably take place? A. In a bank; B. In a library; C. In a bookstore. B True +short_conv_1520 gaokao_audio/short_conv_1520.wav 111111 Why can't the lecture be held tomorrow? A. The CEO won’t be free at that time; B. The equipment in the lecture hall is out of order; C. The lecture hall is not reserved early enough. A True +short_conv_1521 gaokao_audio/short_conv_1521.wav 111111 What time will the opening ceremony start? A. 7:45; B. 8:00; C. 8:15. B True +short_conv_1522 gaokao_audio/short_conv_1522.wav 111111 Where might the speakers be? A. In a restaurant; B. In a supermarket; C. In the hospital. A True +short_conv_1523 gaokao_audio/short_conv_1523.wav 111111 Which color does the woman prefer? A. Green; B. Yellow; C. Purple. A True +short_conv_1524 gaokao_audio/short_conv_1524.wav 111111 What did Patrick do last Friday? A. He moved to another place; B. He sold his old apartment; C. He went out with a friend. A True +short_conv_1525 gaokao_audio/short_conv_1525.wav 111111 What does the woman do? A. She’s a salesperson; B. She’s a librarian; C. She’s a bank clerk. B True +short_conv_1526 gaokao_audio/short_conv_1526.wav 111111 What did Fred do? A. He travelled to Italy; B. He offered Kate a ride; C. He bought a new car. C True +short_conv_1527 gaokao_audio/short_conv_1527.wav 111111 How does Henry feel now? A. Proud; B. Tired; C. Grateful. B True +short_conv_1528 gaokao_audio/short_conv_1528.wav 111111 What is the woman going to do this afternoon? A. Eat out; B. See a doctor; C.
Go shopping. C True +short_conv_1529 gaokao_audio/short_conv_1529.wav 111111 What does the man think of the party? A. He doesn't like the part B. He hates to prepare for the party; C. It is worthwhile to prepare for the party. B True +short_conv_153 gaokao_audio/short_conv_153.wav 111111 What will Cathy do first? A. Visit her aunt; B. Buy some fruit; C. Go to her office. B True +short_conv_1530 gaokao_audio/short_conv_1530.wav 111111 What are the speakers mainly discussing? A. How customers could be best served; B. What kind of stores can offer lower prices; C. Whether online stores will replace high-street stores. C True +short_conv_1531 gaokao_audio/short_conv_1531.wav 111111 Where are the speakers going to meet? A. At the woman’ s home B. At a library; C. At a bus stop B True +short_conv_1532 gaokao_audio/short_conv_1532.wav 111111 What did the man do last night? A. He attended a party; B. He had his car repaired; C. He went to a restaurant C True +short_conv_1533 gaokao_audio/short_conv_1533.wav 111111 How did James get in touch with the woman? A. By letter; B. By telephone C. By e-mail B True +short_conv_1534 gaokao_audio/short_conv_1534.wav 111111 Who has the stapler? A. Somebody else; B. The man; C. The woman. A True +short_conv_1535 gaokao_audio/short_conv_1535.wav 111111 What does the man do? A. A waiter; B. A salesman; C. A teacher. B True +short_conv_1536 gaokao_audio/short_conv_1536.wav 111111 What does the man mean? A. Baseball is his favorite sport; B. Baseball is more interesting than any other sport; C. Baseball is the most boring sport. C True +short_conv_1537 gaokao_audio/short_conv_1537.wav 111111 What does the man suggest doing? A. Getting a ride with somebody; B. Going to the gas station; C. Repairing the car. A True +short_conv_1538 gaokao_audio/short_conv_1538.wav 111111 Where did the woman go? A. The office; B. The railway station; C. The doctor's. B True +short_conv_1539 gaokao_audio/short_conv_1539.wav 111111 What does the man mean? A. He will go into town; B. He misses his parents; C. He has moved house. C True +short_conv_154 gaokao_audio/short_conv_154.wav 111111 Where is the man going now? A. To a restaurant; B. To the editor’s office; C. To his own office. A True +short_conv_1540 gaokao_audio/short_conv_1540.wav 111111 What are the speakers discussing? A. What gift to buy; B. Where to buy a gift; C. Whether to buy a gift. A True +short_conv_1541 gaokao_audio/short_conv_1541.wav 111111 What does the man do now? A. An officer; B. A shop assistant; C. A teacher. B True +short_conv_1542 gaokao_audio/short_conv_1542.wav 111111 Who did the man see yesterday? A. Jane and Tony; B. Tony’s mum; C. The woman’s boyfriend. A True +short_conv_1543 gaokao_audio/short_conv_1543.wav 111111 What are they talking about? A. Boats; B. Paintings; C. Mountains. B True +short_conv_1544 gaokao_audio/short_conv_1544.wav 111111 What is the weather like? A. Cold; B. Hot; C. Cool. B True +short_conv_1545 gaokao_audio/short_conv_1545.wav 111111 What does the man plan to do for his mother on her birthday? A. Buy her a bunch of flowers; B. Buy her a new coat; C. Cook a dinner for her. C True +short_conv_1546 gaokao_audio/short_conv_1546.wav 111111 Why can’t the man return the shoes? A. He has worn them too long; B. He lost the receipt; C. The shoes are comfortable. A True +short_conv_1547 gaokao_audio/short_conv_1547.wav 111111 How often does the woman eat out? A. Three times a week; B. Four times a week; C. Five times a week. 
C True +short_conv_1548 gaokao_audio/short_conv_1548.wav 111111 When should the man check in? A. At 4:30 pm; B. At 4:00 pm; C. At 5:30 pm. A True +short_conv_1549 gaokao_audio/short_conv_1549.wav 111111 What are the speakers mainly talking about? A. Some kinds of art; B. The woman's mother; C. Their school life. A True +short_conv_155 gaokao_audio/short_conv_155.wav 111111 What do we know about the man? A. He has lost his way; B. He isn’t a native of this city; C. He is busy talking with the old lady. B True +short_conv_1550 gaokao_audio/short_conv_1550.wav 111111 At what age did the man probably start to play basketball? A. 7 years old; B. 13 years old; C. 27 years old. A True +short_conv_1551 gaokao_audio/short_conv_1551.wav 111111 What did the man fail to do in Las Vegas? A. Do the shopping; B. Go swimming; C. Visit the Hoover Dam. B True +short_conv_1552 gaokao_audio/short_conv_1552.wav 111111 How does the man usually go to work? A. By train; B. By bus; C. By car. B True +short_conv_1553 gaokao_audio/short_conv_1553.wav 111111 Where was the man born? A. In Boston; B. In Phoenix; C. In New York. C True +short_conv_1554 gaokao_audio/short_conv_1554.wav 111111 How will the woman go downtown? A. By subway; B. By bus; C. By taxi. A True +short_conv_1555 gaokao_audio/short_conv_1555.wav 111111 Why do the speakers line up? A. To buy some water; B. To buy some food; C. To buy film tickets. B True +short_conv_1556 gaokao_audio/short_conv_1556.wav 111111 When will the bus arrive? A. In two minutes; B. In four minutes; C. In ten minutes. C True +short_conv_1557 gaokao_audio/short_conv_1557.wav 111111 Where is the man now? A. On his way home; B. In the kitchen; C. In his office. C True +short_conv_1558 gaokao_audio/short_conv_1558.wav 111111 How does the woman feel? A. Excited; B. Surprised C. Annoyed. C True +short_conv_1559 gaokao_audio/short_conv_1559.wav 111111 What does the man invite the woman to do? A. Go to the concert; B. Visit his brother; C. Have dinner together. C True +short_conv_156 gaokao_audio/short_conv_156.wav 111111 What is the probable relationship between the speakers? A. Husband and wife; B. Salesman and customer; C. Workmates. A True +short_conv_1560 gaokao_audio/short_conv_1560.wav 111111 What does the woman think of her Job? A. Stressful B. Interesting; C. Relaxing. A True +short_conv_1561 gaokao_audio/short_conv_1561.wav 111111 Why does the woman look so excited? A. She will take a trip; B. She bought nice goods; C. She opened a beauty salon. B True +short_conv_1562 gaokao_audio/short_conv_1562.wav 111111 What are the speakers talking about? A. Throwing a party; B. Getting Mary a gift; C. Doing some exercise. B True +short_conv_1563 gaokao_audio/short_conv_1563.wav 111111 What does the man want? A. A magazine; B. Some fish; C. A teapot. A True +short_conv_1564 gaokao_audio/short_conv_1564.wav 111111 What does the woman mean? A. She will help the man; B. She won’t finish the paper; C. The man should depend on himself. C True +short_conv_1565 gaokao_audio/short_conv_1565.wav 111111 How will the woman deal with her bicycle? A. She will leave it in the apartment; B. She will give it to the man for free; C. She will sell it to the man at a low price. B True +short_conv_1566 gaokao_audio/short_conv_1566.wav 111111 What does the man want to do? A. Play golf next Tuesday; B. Visit his parents; C. Take a day off. C True +short_conv_1567 gaokao_audio/short_conv_1567.wav 111111 What happened to Nancy? A. She tried out but failed; B. She was chosen for the lead role; C. 
She missed the chance of trying out. A True +short_conv_1568 gaokao_audio/short_conv_1568.wav 111111 What time is it now? A. 8:40; B. 8:55; C. 9:00. B True +short_conv_1569 gaokao_audio/short_conv_1569.wav 111111 How many floors does the man have to walk to Mr Johnson’s office? A. Three B. Four C. Six. B True +short_conv_157 gaokao_audio/short_conv_157.wav 111111 When will the speaker meet? A. This Tuesday; B. Next Monday; C. Next Tuesday. C True +short_conv_1570 gaokao_audio/short_conv_1570.wav 111111 What will they do? A. they will go to the Mediterranean by train; B. they will go to Hawaii for their holiday; C. they will go to Hawaii by plane. A True +short_conv_1571 gaokao_audio/short_conv_1571.wav 111111 What does the woman think of the man’s paper? A. It is not complete; B. The handwriting is very poor; C. Some parts of it aren’t well written. C True +short_conv_1572 gaokao_audio/short_conv_1572.wav 111111 How long will Tom wait there? A. For six hours B. for two hours C. for three hours. C True +short_conv_1573 gaokao_audio/short_conv_1573.wav 111111 Where does the conversation most probably take place? A. In a classroom B. In a restaurant C. In a bookstore. C True +short_conv_1574 gaokao_audio/short_conv_1574.wav 111111 When does the conversation take place? A. In September ; B. In April C. In February C True +short_conv_1575 gaokao_audio/short_conv_1575.wav 111111 Whose advice did the woman follow? A. The shop assistant’s; B. Her mother’s; C. Her sister’s. B True +short_conv_1576 gaokao_audio/short_conv_1576.wav 111111 What kind of music does the man like? A. Jazz; B. Classical; C. Folk. C True +short_conv_1577 gaokao_audio/short_conv_1577.wav 111111 What will the man do in Edinburgh ? A. Do business with Justin; B. Tell Justin his new address; C. Give Justin the medicines. C True +short_conv_1578 gaokao_audio/short_conv_1578.wav 111111 Where does the conversation take place ? A. In an elevator; B. On a bus; C. In a taxi. B True +short_conv_1579 gaokao_audio/short_conv_1579.wav 111111 What is the woman concerned about the most? A. Spending less money; B. Buying unusual flowers; C. Getting something conveniently. A True +short_conv_158 gaokao_audio/short_conv_158.wav 111111 What is the weather probably like now? A. Foggy; B. Cloudy; C. Fine. B True +short_conv_1580 gaokao_audio/short_conv_1580.wav 111111 What does the man think is the best way of inviting guests? A. In person; B. By mail; C. Online. C True +short_conv_1581 gaokao_audio/short_conv_1581.wav 111111 How does the woman feel about her weekend? A. Very pleased; B. Somewhat bored; C. Extremely disappointed. A True +short_conv_1582 gaokao_audio/short_conv_1582.wav 111111 What does the man want to buy? A. A bike; B. A lock; C. A camera. B True +short_conv_1583 gaokao_audio/short_conv_1583.wav 111111 Why can’t the woman go to the post office? A. It isn’t open today; B. She doesn’t have enough time; C. The man can’t give her directions. A True +short_conv_1584 gaokao_audio/short_conv_1584.wav 111111 What is the man doing? A. Asking the way; B. Booking a table C. Waiting for the woman A True +short_conv_1585 gaokao_audio/short_conv_1585.wav 111111 What is the probable relationship between the speakers? A. A couple B. Workmates C. Teacher and student. B True +short_conv_1586 gaokao_audio/short_conv_1586.wav 111111 Where does the conversation probably take place? A. At a beach B. In a restaurant C. In a hotel C True +short_conv_1587 gaokao_audio/short_conv_1587.wav 111111 What does the woman want Tom to do? A. Make the bed B. 
Play a game C. Lend her 5 dollars A True +short_conv_1588 gaokao_audio/short_conv_1588.wav 111111 Why did the woman go to Beijing? A. To visit her cousin B. To enjoy her vacation C. To do some business B True +short_conv_1589 gaokao_audio/short_conv_1589.wav 111111 What do we know about the woman? A. She is supportive; B. She is confident; C. She is active. A True +short_conv_159 gaokao_audio/short_conv_159.wav 111111 Where does the conversation probably take place? A. In a hospital; B. In a classroom; C. In a restaurant. A True +short_conv_1590 gaokao_audio/short_conv_1590.wav 111111 What are the speakers mainly talking about? A. The money; B. The football; C. The birthday. B True +short_conv_1591 gaokao_audio/short_conv_1591.wav 111111 What is the man doing? A. Visiting a company; B. Having a job interview; C. Making a telephone call. C True +short_conv_1592 gaokao_audio/short_conv_1592.wav 111111 What colour does the man prefer? A. Brown; B. Black; C. Blue. A True +short_conv_1593 gaokao_audio/short_conv_1593.wav 111111 What is the woman going to do? A. See the doctor; B. Put on clothes; C. Go to bed. C True +short_conv_1594 gaokao_audio/short_conv_1594.wav 111111 What will the woman do for the man? A. Wash his clothes; B. Take him to the store; C. Get him a wallet. C True +short_conv_1595 gaokao_audio/short_conv_1595.wav 111111 What's the probable relationship between the speakers? A. Boss and secretary; B. Taxi driver and passenger; C. Driver and conductor. B True +short_conv_1596 gaokao_audio/short_conv_1596.wav 111111 Where is Jimmy now? A. In a hotel; B. In the lab; C. At home. B True +short_conv_1597 gaokao_audio/short_conv_1597.wav 111111 What does the woman want to do? A. Open the window; B. Open the door; C. Let the man in. A True +short_conv_1598 gaokao_audio/short_conv_1598.wav 111111 When did the alarm clock ring? A. At 6 o'clock; B. At 8 o'clock; C. At 7 o'clock. B True +short_conv_1599 gaokao_audio/short_conv_1599.wav 111111 How many languages is the man learning now? A. One; B. Two; C. Three. B True +short_conv_16 gaokao_audio/short_conv_16.wav 111111 What is the woman looking at? A. A train from Beijing to Shanghai; B. The railway timetable; C. The maps of Beijing and Shanghai. B True +short_conv_160 gaokao_audio/short_conv_160.wav 111111 What is the man likely to do? A. Drop out of school; B. Try to get a scholarship; C. Continue his studies. C True +short_conv_1600 gaokao_audio/short_conv_1600.wav 111111 What is the man going to do first? A. Sweep the floor; B. Do the cooking; C. Remove the rubbish. C True +short_conv_1601 gaokao_audio/short_conv_1601.wav 111111 What does the man think of his new computer? A. It's quite expensive; B. It is hard to use; C. It is of good value. C True +short_conv_1602 gaokao_audio/short_conv_1602.wav 111111 What are the speakers mainly talking about? A. A museum; B. A garden; C. A painting. A True +short_conv_1603 gaokao_audio/short_conv_1603.wav 111111 How will Uncle Tom come? A. By motorbike; B. By taxi; C. By car. B True +short_conv_1604 gaokao_audio/short_conv_1604.wav 111111 What will the man do? A. Go shopping; B. Watch a game; C. Feed the bulls. B True +short_conv_1605 gaokao_audio/short_conv_1605.wav 111111 How much does the man withdraw? A. $100; B. $105; C. $110. A True +short_conv_1606 gaokao_audio/short_conv_1606.wav 111111 What does the woman probably do? A. A doctor; B. A professor; C. A policewoman. C True +short_conv_1607 gaokao_audio/short_conv_1607.wav 111111 What makes the man think Kim Yu-Na comes from South Korea? A. 
Her name; B. Her language; C. Her appearance. A True +short_conv_1608 gaokao_audio/short_conv_1608.wav 111111 What does the man often do on weekends? A. Go on picnics; B. Stay at home; C. Have a barbecue. B True +short_conv_1609 gaokao_audio/short_conv_1609.wav 111111 What are the speakers mainly talking about? A. Their biology teacher; B. The woman’s parents; C. The pet of the class. C True +short_conv_161 gaokao_audio/short_conv_161.wav 111111 What's the weather like? A. Cool; B. Cold; C. Warm. B True +short_conv_1610 gaokao_audio/short_conv_1610.wav 111111 What is the relationship between the speakers? A. Boss and employee; B. Waiter and customer; C. Co-workers. C True +short_conv_1611 gaokao_audio/short_conv_1611.wav 111111 How much did the woman buy her watch for? A. $1,000; B. $800; C. $200. B True +short_conv_1612 gaokao_audio/short_conv_1612.wav 111111 What does the woman offer the man? A. Iced tea; B. Cookies; C. Chips. C True +short_conv_1613 gaokao_audio/short_conv_1613.wav 111111 What will the man probably do during his holiday? A. Do his work; B. Travel to Qingdao; C. Climb some mountains. A True +short_conv_1614 gaokao_audio/short_conv_1614.wav 111111 Who is the man most probably? A. The woman’s colleague; B. The woman’s brother; C. The woman’s husband. A True +short_conv_1615 gaokao_audio/short_conv_1615.wav 111111 What does the man mean? A. The diamond is not beautiful; B. The design is not fashionable; C. The price is too high for them now. C True +short_conv_1616 gaokao_audio/short_conv_1616.wav 111111 What will the woman do? A. Go to the movies; B. Do some shopping; C. Attend a party. B True +short_conv_1617 gaokao_audio/short_conv_1617.wav 111111 When will the man leave for home? A. On Thursday; B. On Friday; C. On Sunday. B True +short_conv_1618 gaokao_audio/short_conv_1618.wav 111111 Where did the woman go? A. To downtown Berkeley; B. To the man’s house; C. To Amherst. A True +short_conv_1619 gaokao_audio/short_conv_1619.wav 111111 How does the woman happen to know about the Garden Café? A. She is greatly encouraged; B. She got to know about it on line; C. The man talked about it to her. B True +short_conv_162 gaokao_audio/short_conv_162.wav 111111 What is important to the man? A. The cost; B. The time; C. The airline. A True +short_conv_1620 gaokao_audio/short_conv_1620.wav 111111 Who is the woman speaking to? A. A policeman; B. A friend; C. A shop assistant. A True +short_conv_1621 gaokao_audio/short_conv_1621.wav 111111 What is Kate’s job? A. A writer; B. A teacher; C. A doctor. A True +short_conv_1622 gaokao_audio/short_conv_1622.wav 111111 How many letters has the man answered? A. Two; B. Four; C. Six. B True +short_conv_1623 gaokao_audio/short_conv_1623.wav 111111 What does the woman mean? A. She failed in one of Miss Black’s tests; B. She finds it easy to pass Miss Black’s tests; C. She has never heard of Miss Black. B True +short_conv_1624 gaokao_audio/short_conv_1624.wav 111111 What are the speakers talking about? A. A new TV set; B. A TV program; C. A radio program. B True +short_conv_1625 gaokao_audio/short_conv_1625.wav 111111 What is the woman going to do? A. Play baseball; B. Watch a game; C. Do her work. B True +short_conv_1626 gaokao_audio/short_conv_1626.wav 111111 What will the woman work as? A. An assistant; B. A lawyer; C. A teacher. A True +short_conv_1627 gaokao_audio/short_conv_1627.wav 111111 What will the speakers take to the picnic? A. Some drinks; B. Some fruit; C. Some desserts.
C True +short_conv_1628 gaokao_audio/short_conv_1628.wav 111111 What did the man like about the movie? A. The acting; B. The music; C. The scenery. A True +short_conv_1629 gaokao_audio/short_conv_1629.wav 111111 What does the man imply about Linda? A. She rearranged the chapters of her book; B. She assured him that the chapter was finished; C. She worked on the chapter for quite a while. C True +short_conv_163 gaokao_audio/short_conv_163.wav 111111 What will the speakers do on Sunday? A. Go swimming; B. Play volleyball; C. Go cycling. B True +short_conv_1630 gaokao_audio/short_conv_1630.wav 111111 What does the woman mean? A. She used to work at a newspaper; B. She'd like her supervisor's opinion of her work; C. She wishes she had a different kind of job. B True +short_conv_1631 gaokao_audio/short_conv_1631.wav 111111 What does the woman suggest they do? A. Pay for some of the food; B. Insist on choosing their own food; C. Treat Gary to dinner some other time. A True +short_conv_1632 gaokao_audio/short_conv_1632.wav 111111 What can be inferred about Philip? A. He'll go to the party with the woman; B. He met the man at the party; C. He has changed his plans. C True +short_conv_1633 gaokao_audio/short_conv_1633.wav 111111 What does the woman say about the film? A. It will be ready at four o'clock today; B. It can be picked up at two o'clock tomorrow; C. Only two rolls will be ready on time. B True +short_conv_1634 gaokao_audio/short_conv_1634.wav 111111 What does the man mean? A. The next dish will be really special; B. The baked potatoes are his best dish; C. He’11 make something even better next time. A True +short_conv_1635 gaokao_audio/short_conv_1635.wav 111111 How does the woman probably feel in the beginning? A. Confused; B. Satisfied; C. Disappointed. A True +short_conv_1636 gaokao_audio/short_conv_1636.wav 111111 Who might Shelly be? A. The man’s wife; B. The girl’s sister; C. A babysitter. C True +short_conv_1637 gaokao_audio/short_conv_1637.wav 111111 What does the man probably want the woman to do? A. Fix his pants; B. Give him his money back; C. Give him a new pair of pants for free. B True +short_conv_1638 gaokao_audio/short_conv_1638.wav 111111 What does the woman want to know? A. If the man is thirsty; B. If the man likes Beyonce; C. If the man has heard some new music. C True +short_conv_1639 gaokao_audio/short_conv_1639.wav 111111 What are the speakers mainly talking about? A. A house; B. A friend; C. A garden. A True +short_conv_164 gaokao_audio/short_conv_164.wav 111111 Why is Emily mentioned in the conversation? A. She might want a ticket; B. She is looking for the man; C. She has an extra ticket. A True +short_conv_1640 gaokao_audio/short_conv_1640.wav 111111 Where will the woman go first? A. To a post office; B. To a bakery; C. To a bank. A True +short_conv_1641 gaokao_audio/short_conv_1641.wav 111111 When will the woman bring the iPad to the man? A. This afternoon; B. Tomorrow morning; C. Tomorrow afternoon. C True +short_conv_1642 gaokao_audio/short_conv_1642.wav 111111 What is the man wearing now? A. A blue sports shirt; B. A green sports shirt; C. A green T-shirt. C True +short_conv_1643 gaokao_audio/short_conv_1643.wav 111111 What is the man’s grandmother doing? A. Having a swim; B. Taking a bath; C. Reading an email. B True +short_conv_1644 gaokao_audio/short_conv_1644.wav 111111 What does the man think of the lecture? A. Interesting; B. Hard to understand; C. Long and boring. 
B True +short_conv_1645 gaokao_audio/short_conv_1645.wav 111111 Where will the speakers go first? A. A restaurant; B. A cinema; C. A hospital. A True +short_conv_1646 gaokao_audio/short_conv_1646.wav 111111 When did the woman plan to go to Spain? A. In spring; B. In summer; C. In autumn. B True +short_conv_1647 gaokao_audio/short_conv_1647.wav 111111 How old is the man’s daughter? A. 1 years old; B. 2 years old; C. 3 years old. B True +short_conv_1648 gaokao_audio/short_conv_1648.wav 111111 What does the man probably do? A. A shop assistant; B. A policeman; C. A postman. C True +short_conv_1649 gaokao_audio/short_conv_1649.wav 111111 What did the woman do in Beijing? A. She visited some cultural relics; B. She visited a friend; C. She attended a lecture. C True +short_conv_165 gaokao_audio/short_conv_165.wav 111111 What is the relationship between the speakers? A. Colleges; B. Classmates; C. Strangers. C True +short_conv_1650 gaokao_audio/short_conv_1650.wav 111111 What's the probable relationship of the two speakers? A. Strangers; B. Colleagues; C. Couple. B True +short_conv_1651 gaokao_audio/short_conv_1651.wav 111111 How will the speakers get there? A. In Lucy’s car; B. on foot; C. by bus. A True +short_conv_1652 gaokao_audio/short_conv_1652.wav 111111 Who is the woman over there? A. Jack's mother; B. Jack's sister; C. Jack's aunt. C True +short_conv_1653 gaokao_audio/short_conv_1653.wav 111111 When will the match start? A. At 2:20; B. At 2:35; C. At 2:25. C True +short_conv_1654 gaokao_audio/short_conv_1654.wav 111111 What does the woman advise the man to do? A. Turn down the music; B. Go to bed early; C. Set an alarm. C True +short_conv_1655 gaokao_audio/short_conv_1655.wav 111111 What may be the weather like at the weekend? A. Sunny; B. Rainy; C. Snowy. C True +short_conv_1656 gaokao_audio/short_conv_1656.wav 111111 What does the man mean? A. Pizza tastes terrible; B. Pizza was once cheap; C. Pizza is always expensive. B True +short_conv_1657 gaokao_audio/short_conv_1657.wav 111111 Why does the man congratulate the woman? A. She ranked Number 1 in the exams; B. She was popular in her class; C. She quit the exam. A True +short_conv_1658 gaokao_audio/short_conv_1658.wav 111111 What are the speakers talking about? A. A dinner party; B. A plan for the night; C. A movie about dinner. B True +short_conv_1659 gaokao_audio/short_conv_1659.wav 111111 What are the speakers mainly talking about? A. Whether to go to a bookstore; B. How to get a book; C. What their teacher is like. B True +short_conv_166 gaokao_audio/short_conv_166.wav 111111 What are the speakers talking about? A. What to drink; B. Where to meet; C. When to leave. B True +short_conv_1660 gaokao_audio/short_conv_1660.wav 111111 What did the speakers do together last summer? A. They went to school; B. They looked for jobs; C. They did exercise at the gym. C True +short_conv_1661 gaokao_audio/short_conv_1661.wav 111111 How much will the woman pay? A. $1; B. $5; C. $5.2. C True +short_conv_1662 gaokao_audio/short_conv_1662.wav 111111 What is the woman doing? A. Buying a ticket; B. Driving a car; C. Looking at a sign. B True +short_conv_1663 gaokao_audio/short_conv_1663.wav 111111 Where does the conversation take place? A. In a fruit store; B. In a supermarket; C. In a restaurant. C True +short_conv_1664 gaokao_audio/short_conv_1664.wav 111111 What will the man do? A. Turn in his paper; B. Pay the telephone bill; C. Help the woman. 
A True +short_conv_1665 gaokao_audio/short_conv_1665.wav 111111 What does the man advise the woman to do? A. Watch the game tomorrow; B. Do the work tonight; C. Enjoy the game tonight. C True +short_conv_1666 gaokao_audio/short_conv_1666.wav 111111 How will the speaker get a ticket to the concert? A. The man will go to buy the ticket; B. The woman will get the ticket; C. The man will have someone buy the ticket. B True +short_conv_1667 gaokao_audio/short_conv_1667.wav 111111 What do we know about the man? A. He is going to take exercise; B. He is going to have meetings; C. He is going to clean his shirts. B True +short_conv_1668 gaokao_audio/short_conv_1668.wav 111111 What's the probable relationship between the two speakers? A. Classmates; B. Colleagues.(同事) C. Teacher and student. A True +short_conv_1669 gaokao_audio/short_conv_1669.wav 111111 What is the man? A. A secretary; B. A teacher; C. A doctor. C True +short_conv_167 gaokao_audio/short_conv_167.wav 111111 What does Jack want to do? A. Take fitness classes; B. Buy a pair of gym shoes; C. Change his work schedule. A True +short_conv_1670 gaokao_audio/short_conv_1670.wav 111111 Where is the woman’s grandma now? A. At home; B. In a hospital; C. In a hotel. B True +short_conv_1671 gaokao_audio/short_conv_1671.wav 111111 What was the weather like on John’s holiday? A. Sunny; B. Rainy; C. Cold. C True +short_conv_1672 gaokao_audio/short_conv_1672.wav 111111 How much does the woman pay for the tickets? A. £9; B. £10; C. £11. A True +short_conv_1673 gaokao_audio/short_conv_1673.wav 111111 How will the speakers probably go home? A. By taxi B. By bus; C. By subway. A True +short_conv_1674 gaokao_audio/short_conv_1674.wav 111111 What is the woman doing? A. Making suggestions; B. Making excuses; C. Making requests. B True +short_conv_1675 gaokao_audio/short_conv_1675.wav 111111 What does the woman imply? A. The Edwards are quite well-off; B. The Edward should cut down their living expenses; C. It’ll be unwise for the Edwards to buy another house. C True +short_conv_1676 gaokao_audio/short_conv_1676.wav 111111 What do we learn from the conversation? A. The train is crowded; B. The train is late; C. The train is on time. B True +short_conv_1677 gaokao_audio/short_conv_1677.wav 111111 How does the woman suggest the man prepare for Professor Yang’s lesson? A. Review the details of all her lessons; B. Focus on the main points of her lectures; C. Talk with her about his learning problems. B True +short_conv_1678 gaokao_audio/short_conv_1678.wav 111111 What are the speakers doing? A. Having dinner; B. Having a class; C. Reading. A True +short_conv_1679 gaokao_audio/short_conv_1679.wav 111111 What are the speakers mainly talking about? A. Preparing for a test; B. Eating during an exam; C. Getting a medical exam. B True +short_conv_168 gaokao_audio/short_conv_168.wav 111111 Where does this conversation take place? A. In a classroom; B. In a hospital; C. In a museum. B True +short_conv_1680 gaokao_audio/short_conv_1680.wav 111111 What is the probable relationship between the speakers? A. Father and daughter; B. Classmates; C. Teacher and student. C True +short_conv_1681 gaokao_audio/short_conv_1681.wav 111111 What did the woman do today? A. She cleaned the car; B. She bought an umbrella; C. She listened to the weather forecast. A True +short_conv_1682 gaokao_audio/short_conv_1682.wav 111111 When does the man usually do exercise? A. In the afternoon; B. In the morning; C. At night. 
B True +short_conv_1683 gaokao_audio/short_conv_1683.wav 111111 What fruit does the woman use? A. Pears; B. Oranges; C. Banana. C True +short_conv_1684 gaokao_audio/short_conv_1684.wav 111111 What are the speakers trying to do? A. Call a taxi; B. Catch a bus; C. Take a plane. B True +short_conv_1685 gaokao_audio/short_conv_1685.wav 111111 What’s the relationship between the speakers? A. Customer and waitress; B. Teacher and student; C. Boss and secretary. A True +short_conv_1686 gaokao_audio/short_conv_1686.wav 111111 What season is it now? A. Winter; B. Spring; C. Autumn. C True +short_conv_1687 gaokao_audio/short_conv_1687.wav 111111 Where does the conversation take place? A. In an office; B. In an apartment; C. In a shopping mall. C True +short_conv_1688 gaokao_audio/short_conv_1688.wav 111111 What does the woman write? A. Books; B. Plays; C. Newspaper articles. C True +short_conv_1689 gaokao_audio/short_conv_1689.wav 111111 What is the relationship between the speakers? A. Co-workers; B. Boss and employee; C. Taxi driver and customer. A True +short_conv_169 gaokao_audio/short_conv_169.wav 111111 What is the woman trying to do? A. Solve a crime; B. Decorate her bedroom; C. Study a language. C True +short_conv_1690 gaokao_audio/short_conv_1690.wav 111111 What’s the woman’s secret to making good spaghetti? A. She cooks the spaghetti longer; B. She adds some salt to the boiling water; C. She puts the spaghetti in before the water boils. B True +short_conv_1691 gaokao_audio/short_conv_1691.wav 111111 Where might the speakers be? A. In the man’s garden; B. In the man’s bedroom; C. In the man’s living room. C True +short_conv_1692 gaokao_audio/short_conv_1692.wav 111111 When will the bus arrive? A. In two minutes; B. In four minutes; C. In ten minutes. C True +short_conv_1693 gaokao_audio/short_conv_1693.wav 111111 What does the man do to relax at home? A. He reads novels; B. He plays the piano; C. He listens to music. A True +short_conv_1694 gaokao_audio/short_conv_1694.wav 111111 What does the man think of the game? A. Unimportant; B. Boring; C. Fair. B True +short_conv_1695 gaokao_audio/short_conv_1695.wav 111111 What gift will the woman probably get for Mary? A. A pen; B. A music record; C. A movie ticket. B True +short_conv_1696 gaokao_audio/short_conv_1696.wav 111111 When did the woman take a piano test? A. One week ago; B. One mouth ago; C. Two months ago. A True +short_conv_1697 gaokao_audio/short_conv_1697.wav 111111 What time is it now? A. 9:40; B. 10:00; C. 10:20. A True +short_conv_1698 gaokao_audio/short_conv_1698.wav 111111 What are the speakers talking about? A. The man’s favorite festival; B. The man’s aunt; C. The man’s school bag. C True +short_conv_1699 gaokao_audio/short_conv_1699.wav 111111 What does the woman mean? A. She thought it was very easy; B. She thought it was too hard for her to follow; C. She thought the instructor was very good. A True +short_conv_17 gaokao_audio/short_conv_17.wav 111111 Where does Mr. Green work? A. In a hospital; B. In the railway station; C. In the woman’s company. A True +short_conv_170 gaokao_audio/short_conv_170.wav 111111 What aspect of the jeans are the speakers discussing? A. The style; B. The color; C. The quality. A True +short_conv_1700 gaokao_audio/short_conv_1700.wav 111111 What does the woman mean? A. They will make a phone call to Dr. Smith tomorrow; B. Dr. Smith was late for the call; C. They can call on Dr. Smith tomorrow. A True +short_conv_1701 gaokao_audio/short_conv_1701.wav 111111 What does the woman mean? A. 
Business is not necessarily good at the turn of the year; B. Businessmen are the busiest people at the end of the year; C. There will be many cases at the end of the year. A True +short_conv_1702 gaokao_audio/short_conv_1702.wav 111111 What does the woman mean? A. She’s already an hour late; B. The man shouldn’t wait to be interviewed; C. She’s too nervous to calm down. C True +short_conv_1703 gaokao_audio/short_conv_1703.wav 111111 What does the man think of the book? A. It is not worth reading; B. It is not the one he likes; C. It is better than he expected. B True +short_conv_1704 gaokao_audio/short_conv_1704.wav 111111 What does the man dislike about the computer? A. The price; B. The monitor; C. The keyboard. A True +short_conv_1705 gaokao_audio/short_conv_1705.wav 111111 What did the man do last night? A. He painted some pictures; B. He watched a football match on TV; C. He went out to play football. B True +short_conv_1706 gaokao_audio/short_conv_1706.wav 111111 What's the probable relationship between the two speakers. A. Teacher and student; B. Husband and wife; C. Brother and sister. A True +short_conv_1707 gaokao_audio/short_conv_1707.wav 111111 Where does the conversation probably take place? A. At home; B. At the airport; C. At a supermarket. B True +short_conv_1708 gaokao_audio/short_conv_1708.wav 111111 How many students took the HSK test last month? A. 300; B. 400; C. 600. B True +short_conv_1709 gaokao_audio/short_conv_1709.wav 111111 What will the man do later? A. Eat out; B. Cook dinner; C. Buy vegetables. A True +short_conv_171 gaokao_audio/short_conv_171.wav 111111 How many fish did the man catch at the beginning? A. Two; B. Three; C. Six. C True +short_conv_1710 gaokao_audio/short_conv_1710.wav 111111 Where does the conversation take place? A. At school; B. At a shop; C. At the man’s house. B True +short_conv_1711 gaokao_audio/short_conv_1711.wav 111111 What does the woman do? A. A student; B. A waitress; C. A tour guide. C True +short_conv_1712 gaokao_audio/short_conv_1712.wav 111111 How did the man feel before his speech? A. Relaxed; B. Nervous; C. Confident. B True +short_conv_1713 gaokao_audio/short_conv_1713.wav 111111 What will the man do today? A. Play football; B. Buy some flowers; C. Work in the garden. C True +short_conv_1714 gaokao_audio/short_conv_1714.wav 111111 Where are the speakers? A. In a store; B. In a classroom; C. At a hotel. C True +short_conv_1715 gaokao_audio/short_conv_1715.wav 111111 What has the woman decided to do on Sunday afternoon? A. To attend a wedding; B. To visit an exhibition; C. To meet a friend. A True +short_conv_1716 gaokao_audio/short_conv_1716.wav 111111 When does the bank close on Saturday? A. At l:00 pm; B. At 3:00 pm; C. At 4:00 pm. B True +short_conv_1717 gaokao_audio/short_conv_1717.wav 111111 What was the normal price of the T-shirt? A. $15; B. $30; C. $50. B True +short_conv_1718 gaokao_audio/short_conv_1718.wav 111111 What will Dorothy do on the weekend? A. Go out with her friend; B. Work on her paper; C. Make some plans. B True +short_conv_1719 gaokao_audio/short_conv_1719.wav 111111 How many notebooks does the man have in his backpack? A. Two; B. Five; C. Ten. B True +short_conv_172 gaokao_audio/short_conv_172.wav 111111 What will the man do next? A. Pour the milk in the sink; B. Buy some milk; C. Eat breakfast. B True +short_conv_1720 gaokao_audio/short_conv_1720.wav 111111 How is the woman feeling about her speech today? A. Excited; B. Proud; C. Nervous. 
C True +short_conv_1721 gaokao_audio/short_conv_1721.wav 111111 What does the man want to eat? A. A pizza; B. A burger; C. Some dessert. A True +short_conv_1722 gaokao_audio/short_conv_1722.wav 111111 What is the man’s problem? A. He doesn’t like French; B. He doesn’t have a dictionary; C. He doesn’t have good reading skills. C True +short_conv_1723 gaokao_audio/short_conv_1723.wav 111111 What kind of music does the man like the most? A. Jazz; B. Classical; C. Rock. A True +short_conv_1724 gaokao_audio/short_conv_1724.wav 111111 What does the woman think of the shirt for the party? A. The size is not large enough; B. The material is not good; C. The color is not suitable. C True +short_conv_1725 gaokao_audio/short_conv_1725.wav 111111 When can the woman get the computers? A. On Tuesday; B. On Wednesday; C. On Thursday. A True +short_conv_1726 gaokao_audio/short_conv_1726.wav 111111 How does the man feel about going to school by bike? A. Delighted; B. Tired; C. Concerned. A True +short_conv_1727 gaokao_audio/short_conv_1727.wav 111111 How much will the man pay for the tickets? A. £7.5; B. £15; C. £30. B True +short_conv_1728 gaokao_audio/short_conv_1728.wav 111111 Which is the right gate for the man’s flight? A. Gate 16; B. Gate 25; C. Gate 22. B True +short_conv_1729 gaokao_audio/short_conv_1729.wav 111111 What are the speakers talking about? A. WeChat; B. Online shopping; C. The man’s grandma. C True +short_conv_173 gaokao_audio/short_conv_173.wav 111111 What kind of movie will the speakers watch? A. An action movie; B. A comedy; C. A thriller. B True +short_conv_1730 gaokao_audio/short_conv_1730.wav 111111 Why can’t the lecture be held tomorrow? A. The CEO won’t be available then; B. The lecture hall isn’t big enough; C. The equipment in the lecture hall doesn’t work. A True +short_conv_1731 gaokao_audio/short_conv_1731.wav 111111 When will the next underground arrive? A. At 1:55; B. At 2:00; C. At 2:05. B True +short_conv_1732 gaokao_audio/short_conv_1732.wav 111111 Where is the bookstore? A. Near a hotel; B. On the left of a hospital; C. On the right side of Main Street. A True +short_conv_1733 gaokao_audio/short_conv_1733.wav 111111 What color are the gloves? A. Blue; B. Green; C. Yellow. B True +short_conv_1734 gaokao_audio/short_conv_1734.wav 111111 What does the woman want to do? A. See a film with the man; B. Offer the man some help; C. Listen to some great music. A True +short_conv_1735 gaokao_audio/short_conv_1735.wav 111111 What will the man do? A. Change the plan B. Wait for a phone call C. Sort things out B True +short_conv_1736 gaokao_audio/short_conv_1736.wav 111111 At what time will the two speakers meet? A. 5:20 B. 5:10 C. 5:40 B True +short_conv_1737 gaokao_audio/short_conv_1737.wav 111111 Which place are the speakers trying to find? A. A hotel B. A bank C. A restaurant. A True +short_conv_1738 gaokao_audio/short_conv_1738.wav 111111 What does the man like about the play? A. The story; B. The ending; C. The actor. C True +short_conv_1739 gaokao_audio/short_conv_1739.wav 111111 What are the speakers mainly talking about? A. Their holiday plans; B. How to celebrate a festival; C. How to spend the weekends. B True +short_conv_174 gaokao_audio/short_conv_174.wav 111111 Why is Emily mentioned in the conversation? A. She might want a ticket; B. She is looking for the man; C. She has an extra ticket. A True +short_conv_1740 gaokao_audio/short_conv_1740.wav 111111 Where is the woman’s next stop? A. New York; B. Paris; C. London. 
A True +short_conv_1741 gaokao_audio/short_conv_1741.wav 111111 How will the man go to Detroit? A. By plane; B. By bus; C. By train. C True +short_conv_1742 gaokao_audio/short_conv_1742.wav 111111 What does the man mean? A. He’d like to see Joan; B. He doesn’t want to see Joan; C. He will see Joan eventually. B True +short_conv_1743 gaokao_audio/short_conv_1743.wav 111111 What is the man’s attitude towards the plan? A. He is against it; B. He doesn’t care; C. He thinks it is necessary. A True +short_conv_1744 gaokao_audio/short_conv_1744.wav 111111 Which house fits the speakers? A. House One; B. House Two; C. House Three. B True +short_conv_1745 gaokao_audio/short_conv_1745.wav 111111 What is the man most probably? A. A shop assistant; B. A postman; C. A passer-by. A True +short_conv_1746 gaokao_audio/short_conv_1746.wav 111111 What do we know about the woman’s son? A. He is still a student now; B. He wanted to study arts; C. He taught in Harvard University. B True +short_conv_1747 gaokao_audio/short_conv_1747.wav 111111 Where is the woman living this semester? A. In a rented room; B. In the dormitory; C. At home. B True +short_conv_1748 gaokao_audio/short_conv_1748.wav 111111 What are the speakers talking about? A. Tradition; B. A disease; C. A service. C True +short_conv_1749 gaokao_audio/short_conv_1749.wav 111111 Where are the speakers probably right now? A. In a restaurant; B. At a supermarket; C. At a watch store. A True +short_conv_175 gaokao_audio/short_conv_175.wav 111111 What are the speakers talking about? A. What to drink; B. Where to meet; C. When to leave. B True +short_conv_1750 gaokao_audio/short_conv_1750.wav 111111 When will the professor be back? A. At 12:00; B. At 13:00; C. At 14:00. C True +short_conv_1751 gaokao_audio/short_conv_1751.wav 111111 How might be Mary’s university life before September? A. Exciting; B. Terrible; C. Ordinary. B True +short_conv_1752 gaokao_audio/short_conv_1752.wav 111111 What does the woman advise the man to do? A. Go to the shop to replace the camera; B. Choose a good angle when taking pictures; C. Keep the camera clean when taking pictures. C True +short_conv_1753 gaokao_audio/short_conv_1753.wav 111111 Who is Rose probably? A. The man’s student; B. The speakers’ daughter; C. The woman’s teacher. B True +short_conv_1754 gaokao_audio/short_conv_1754.wav 111111 What does the woman imply? A. The man is so forgetful; B. The man is too careless; C. The man is over confident. C True +short_conv_1755 gaokao_audio/short_conv_1755.wav 111111 Which aspect of the film does the woman like? A. The plot; B. The music; C. The dialogue. C True +short_conv_1756 gaokao_audio/short_conv_1756.wav 111111 What do we know about the woman’s jacket? A. It is sold at a lower price; B. Its color is her favorite; C. It is her sister’s size. A True +short_conv_1757 gaokao_audio/short_conv_1757.wav 111111 Where did the speakers plan to go? A. A shopping center; B. An opera house; C. The parking lot. B True +short_conv_1758 gaokao_audio/short_conv_1758.wav 111111 Where does the conversation most probably take place? A. In an office; B. In a library; C. In a bookstore. C True +short_conv_1759 gaokao_audio/short_conv_1759.wav 111111 What does the man mean? A. He is very excited about the news; B. He doesn’t pay attention to sports; C. He wishes a different team had won. B True +short_conv_176 gaokao_audio/short_conv_176.wav 111111 What is the relationship between the speakers? A. Colleges; B. Classmates; C. Strangers. 
C True +short_conv_1760 gaokao_audio/short_conv_1760.wav 111111 What will the young woman probably do? A. Write a paper; B. Ask her mother for help; C. Help the man with his homework. B True +short_conv_1761 gaokao_audio/short_conv_1761.wav 111111 Who is the woman? A. A clerk; B. A teacher; C. A student. A True +short_conv_1762 gaokao_audio/short_conv_1762.wav 111111 How much did Tom return to the woman? A. $5; B. $15; C. $50. B True +short_conv_1763 gaokao_audio/short_conv_1763.wav 111111 How does the woman feel about her new job? A. Bored; B. Worried; C. Excited. C True +short_conv_1764 gaokao_audio/short_conv_1764.wav 111111 What is the man doing? A. Looking for a job ; B. Asking for advice; C. Reserving a hotel . A True +short_conv_1765 gaokao_audio/short_conv_1765.wav 111111 What do we know about the man? A. He canceled his flight; B. He was late for the meeting; C. He skipped the discussion. B True +short_conv_1766 gaokao_audio/short_conv_1766.wav 111111 What did the man think of the movie? A. Interesting; B. Moving; C. Boring. C True +short_conv_1767 gaokao_audio/short_conv_1767.wav 111111 On which day does the woman work seven hours? A. On Monday; B. On Thursday; C. On Friday. C True +short_conv_1768 gaokao_audio/short_conv_1768.wav 111111 What is Maria going to do next? A. Answer questions; B. Attend a lecture; C. Set up a project. B True +short_conv_1769 gaokao_audio/short_conv_1769.wav 111111 What does the man suggest the woman do? A. Write Daisy a note of apology; B. Return Daisy’s notes in a few days; C. Apologize when Daisy is less angry. C True +short_conv_177 gaokao_audio/short_conv_177.wav 111111 What does Jack want to do? A. Take fitness classes; B. Buy a pair of gym shoes; C. Change his work schedule. A True +short_conv_1770 gaokao_audio/short_conv_1770.wav 111111 How much money can the woman lend to the man? A. $ 50; B. $ 100; C. $ 150. A True +short_conv_1771 gaokao_audio/short_conv_1771.wav 111111 What was the weather like at noon? A. Hot; B. Cool; C. Cold. B True +short_conv_1772 gaokao_audio/short_conv_1772.wav 111111 Where are the speakers? A. In a library; B. In a bookstore; C. In a classroom. B True +short_conv_1773 gaokao_audio/short_conv_1773.wav 111111 What is the man? A. A cook; B. A teacher; C. A salesman. A True +short_conv_1774 gaokao_audio/short_conv_1774.wav 111111 What will the man probably do? A. Sail in a boat; B. Swim in the ocean; C. Go shopping with the woman. A True +short_conv_1775 gaokao_audio/short_conv_1775.wav 111111 When do the speakers plan to meet? A. This afternoon; B. Tomorrow morning; C. Tomorrow afternoon. C True +short_conv_1776 gaokao_audio/short_conv_1776.wav 111111 Why didn’t the man invite the woman to the party? A. He doesn’t like the woman at all; B. He thought the woman had known it; C. He was afraid that the woman had no time. B True +short_conv_1777 gaokao_audio/short_conv_1777.wav 111111 How did the man think of the movie? A. Uninteresting; B. Very interesting; C. Too long. A True +short_conv_1778 gaokao_audio/short_conv_1778.wav 111111 What does the man ask the woman to do? A. Do her homework; B. Make dinner for him; C. Have dinner with him. C True +short_conv_1779 gaokao_audio/short_conv_1779.wav 111111 Where does the man’s uncle live? A. In New York; B. In London; C. In Paris. C True +short_conv_178 gaokao_audio/short_conv_178.wav 111111 Where does this conversation take place? A. In a classroom; B. In a hospital; C. In a museum. 
B True +short_conv_1780 gaokao_audio/short_conv_1780.wav 111111 How much did the jeans cost before the sale? A. 30 dollars B. 50 dollars C. 60 dollars C True +short_conv_1781 gaokao_audio/short_conv_1781.wav 111111 What will the man do tomorrow? A. Go hiking; B. Stay at home; C. See a doctor. B True +short_conv_1782 gaokao_audio/short_conv_1782.wav 111111 When is the man flying to Paris? A. On February 5th; B. On February 10th; C. On February 15th. C True +short_conv_1783 gaokao_audio/short_conv_1783.wav 111111 Why doesn’t the woman want to wear the coat? A. The style is old; B. The color is ugly; C. The quality is not good. A True +short_conv_1784 gaokao_audio/short_conv_1784.wav 111111 How much was the woman charged? A. $21; B. $30; C. $60. A True +short_conv_1785 gaokao_audio/short_conv_1785.wav 111111 What is the woman doing? A. Booking flight tickets; B. Catching a flight; C. Trying to change seats. A True +short_conv_1786 gaokao_audio/short_conv_1786.wav 111111 What’s the man’s excuse for failing the math exam? A. He didn’t prepare it well; B. He got too much pressure; C. He isn’t talented at math. C True +short_conv_1787 gaokao_audio/short_conv_1787.wav 111111 What will the man do next Tuesday? A. Play football; B. Watch a game; C. Visit a factory. C True +short_conv_1788 gaokao_audio/short_conv_1788.wav 111111 Why won’t the woman go to the bar? A. It’s no fun; B. It’s expensive; C. It’s too far away. B True +short_conv_1789 gaokao_audio/short_conv_1789.wav 111111 What color is the woman’s new skirt? A. Green; B. Red; C. Blue. C True +short_conv_179 gaokao_audio/short_conv_179.wav 111111 Why would David quit his job? A. To go back to school; B. To start his own firm; C. To work for his friend. C True +short_conv_1790 gaokao_audio/short_conv_1790.wav 111111 Who probably went to Prof. Freeman’s class today? A. Felicia; B. Jack; C. Eric. C True +short_conv_1791 gaokao_audio/short_conv_1791.wav 111111 How will the speakers travel to the countryside? A. By car; B. By bus; C. By train. A True +short_conv_1792 gaokao_audio/short_conv_1792.wav 111111 What will the woman do this Saturday? A. Try the new restaurant; B. Attend a concert. ; C. Go to the park. B True +short_conv_1793 gaokao_audio/short_conv_1793.wav 111111 What is the man? A. An actor; B. A director; C. A screenwriter. C True +short_conv_1794 gaokao_audio/short_conv_1794.wav 111111 What is the man’s suggestion about serious pollution? A. Don’t breathe the poisonous air; B. The government should take action; C. The government should protect the environment. B True +short_conv_1795 gaokao_audio/short_conv_1795.wav 111111 What does the man want to do? A. Buy a light; B. Get to the nearest light; C. Go to the supermarket. C True +short_conv_1796 gaokao_audio/short_conv_1796.wav 111111 What does Laura need at the moment? A. Blame; B. Encouragement; C. Help with her chemistry. B True +short_conv_1797 gaokao_audio/short_conv_1797.wav 111111 How many pills should the woman take at a time? A. 5; B. 3; C. 2. A True +short_conv_1798 gaokao_audio/short_conv_1798.wav 111111 What will the woman do this Saturday? A. Go to see her sister; B. Go to the concert; C. Look after her brother’s son. C True +short_conv_1799 gaokao_audio/short_conv_1799.wav 111111 What did the man think of the lecture? A. Exciting; B. Boring; C. Moving. B True +short_conv_18 gaokao_audio/short_conv_18.wav 111111 When does the man usually sleep? A. At 9:00; B. At 10:00; C. At 11:00. 
B True +short_conv_180 gaokao_audio/short_conv_180.wav 111111 What does the man tell Jane to do? A. Postpone his appointment; B. Meet Mr. Douglas; C. Return at 3 o’clock. A True +short_conv_1800 gaokao_audio/short_conv_1800.wav 111111 What are the speakers talking about? A. Teachers’ hard work; B. A school performance; C. Long studying hours. B True +short_conv_1801 gaokao_audio/short_conv_1801.wav 111111 Where does the conversation probably take place? A. At school; B. At home; C. At a shop. A True +short_conv_1802 gaokao_audio/short_conv_1802.wav 111111 What can we know about the man’s hobby? A. His hobby is stamp collecting; B. He has no hobby; C. His hobby is photography. C True +short_conv_1803 gaokao_audio/short_conv_1803.wav 111111 What program is the man watching? A. An advertisement; B. The World Cup; C. An interesting play. A True +short_conv_1804 gaokao_audio/short_conv_1804.wav 111111 What does the woman think of the weather today? A. It’s too hot; B. It’s pretty cool; C. It’s sunny and nice. C True +short_conv_1805 gaokao_audio/short_conv_1805.wav 111111 Where does the woman work? A. In a restaurant; B. In a shoe store; C. In a supermarket. A True +short_conv_1806 gaokao_audio/short_conv_1806.wav 111111 What’s the relationship between the speakers? A. Classmates; B. Dentist and patient; C. Teacher and student. A True +short_conv_1807 gaokao_audio/short_conv_1807.wav 111111 Why does the woman need to go to the dentist? A. To have a tooth pulled; B. To check her sore tooth; C. To get her teeth cleaned. C True +short_conv_1808 gaokao_audio/short_conv_1808.wav 111111 What meal is the man eating? A. Breakfast; B. Lunch; C. Dinner. A True +short_conv_1809 gaokao_audio/short_conv_1809.wav 111111 What does the man think of the film? A. Worth watching; B. Difficult to follow; C. Very famous. B True +short_conv_181 gaokao_audio/short_conv_181.wav 111111 How much will the man pay? A. $520; B. $80; C. $100. B True +short_conv_1810 gaokao_audio/short_conv_1810.wav 111111 Why does Alice feel excited? A. She has won the first race; B. She has been chosen for the race; C. She has got a pair of running shoes. B True +short_conv_1811 gaokao_audio/short_conv_1811.wav 111111 What are the speakers talking about? A. A dish; B. An artist; C. A trip. A True +short_conv_1812 gaokao_audio/short_conv_1812.wav 111111 Where should the man take the first turning? A. At the theater; B. At a post office; C. At the barber’s shop. C True +short_conv_1813 gaokao_audio/short_conv_1813.wav 111111 What does the man want to buy? A. Football B. Some books; C. Basketball tickets. C True +short_conv_1814 gaokao_audio/short_conv_1814.wav 111111 Why is the woman late for her class again? A. She has to prepare the supper; B. She has to do her homework; C. She has to meet some friends. A True +short_conv_1815 gaokao_audio/short_conv_1815.wav 111111 What are the speakers going to do? A. Go skiing; B. Go to school; C. Clean the snow. A True +short_conv_1816 gaokao_audio/short_conv_1816.wav 111111 What is the man doing? A. Buying a book; B. Chatting with a friend; C. Asking the way. C True +short_conv_1817 gaokao_audio/short_conv_1817.wav 111111 What is the possible relationship between the speakers? A. Classmates; B. Strangers; C. Workmates. C True +short_conv_1818 gaokao_audio/short_conv_1818.wav 111111 What instrument is the woman best at playing? A. Erhu; B. Violin; C. Piano. B True +short_conv_1819 gaokao_audio/short_conv_1819.wav 111111 What are the speakers talking about? A. A dress; B. A sale; C. Some shoes. 
A True +short_conv_182 gaokao_audio/short_conv_182.wav 111111 Where does the conversation probably take place? A. In a library; B. In a bookstore; C. In a classroom. B True +short_conv_1820 gaokao_audio/short_conv_1820.wav 111111 Where are the speakers? A. On a bus; B. On a train; C. On a plane. C True +short_conv_1821 gaokao_audio/short_conv_1821.wav 111111 What happened to the woman? A. She was late for work; B. She offered bad service; C. She was asked to leave her job. A True +short_conv_1822 gaokao_audio/short_conv_1822.wav 111111 What kind of weather does the man like? A. Rainy; B. Sunny; C. Cloudy. B True +short_conv_1823 gaokao_audio/short_conv_1823.wav 111111 What is the woman looking for? A. Her glasses; B. Her keys; C. Her books. A True +short_conv_1824 gaokao_audio/short_conv_1824.wav 111111 What did Patrick do last Friday? A. He moved to another place; B. He sold his old apartment; C. He went out with a friend. A True +short_conv_1825 gaokao_audio/short_conv_1825.wav 111111 What does the woman do? A. She’s a salesperson; B. She’s a librarian; C. She’s a bank clerk. B True +short_conv_1826 gaokao_audio/short_conv_1826.wav 111111 What did Fred do? A. He travelled to Italy; B. He offered Kate a ride; C. He bought a new car. C True +short_conv_1827 gaokao_audio/short_conv_1827.wav 111111 How does Henry feel now? A. Proud; B. Tired; C. Grateful. B True +short_conv_1828 gaokao_audio/short_conv_1828.wav 111111 What is the woman going to do this afternoon? A. Eat out; B. See a doctor; C. Go shopping. C True +short_conv_1829 gaokao_audio/short_conv_1829.wav 111111 Why does the man come to the woman? A. To take a picture of her; B. To ask for a new ID card; C. To fill out a form. B True +short_conv_183 gaokao_audio/short_conv_183.wav 111111 How does the woman feel now? A. Relaxed; B. Excited; C. Tired. C True +short_conv_1830 gaokao_audio/short_conv_1830.wav 111111 What color was the woman’s couch? A. Yellow; B. Brown; C. Purple. C True +short_conv_1831 gaokao_audio/short_conv_1831.wav 111111 What are the speakers mainly talking about? A. The man’s watch; B. The man’s brother; C. The man’s birthday. A True +short_conv_1832 gaokao_audio/short_conv_1832.wav 111111 What is the dog’s name? A. Scott; B. Michael; C. Robert. A True +short_conv_1833 gaokao_audio/short_conv_1833.wav 111111 What do we know about the watch? A. The watch was not worth that much; B. The price was reasonable; C. It cost the woman $ 40. A True +short_conv_1834 gaokao_audio/short_conv_1834.wav 111111 When is Simon supposed to arrive? A. 8:00; B. 7:30; C. 8:10. A True +short_conv_1835 gaokao_audio/short_conv_1835.wav 111111 What does the man mean? A. He will carry the boxes later; B. He is unable to give help; C. He refuses to pay for boxes. B True +short_conv_1836 gaokao_audio/short_conv_1836.wav 111111 What did the woman think of Dana’s speech? A. Important; B. Boring; C. Well-prepared. C True +short_conv_1837 gaokao_audio/short_conv_1837.wav 111111 What are the speakers talking about? A. Borrowing DVDs; B. Buying DVDs; C. Sharing DVDs. A True +short_conv_1838 gaokao_audio/short_conv_1838.wav 111111 What will the woman do about the dress? A. She’ll return it; B. She’ll change it; C. She’ll buy it. C True +short_conv_1839 gaokao_audio/short_conv_1839.wav 111111 Where are the two speakers? A. On the ground floor; B. By the European paintings; C. At the black and white photo show. A True +short_conv_184 gaokao_audio/short_conv_184.wav 111111 Why would David quit his job? A. To go back to school; B. 
To start his own firm; C. To work for his friend. C True +short_conv_1840 gaokao_audio/short_conv_1840.wav 111111 What are the two speakers talking about? A. A new movie; B. A weekend plan; C. Steve’s cousin. B True +short_conv_1841 gaokao_audio/short_conv_1841.wav 111111 What will the woman do? A. Cut down on food; B. Take her temperature; C. Take medicine with food. C True +short_conv_1842 gaokao_audio/short_conv_1842.wav 111111 What happened to Susan? A. She lost her ticket; B. She got her driving license; C. She was fined for speeding. C True +short_conv_1843 gaokao_audio/short_conv_1843.wav 111111 How does the woman find the book? A. Appealing B. Just so-so; C. Strange. A True +short_conv_1844 gaokao_audio/short_conv_1844.wav 111111 What is the man interested in? A. Education; B. Medicine; C. Technology. A True +short_conv_1845 gaokao_audio/short_conv_1845.wav 111111 Why does the man want to move? A. To be near his office; B. To go to a good school; C. To live in a bigger house. C True +short_conv_1846 gaokao_audio/short_conv_1846.wav 111111 What would the man like to drink? A. Iced coffee; B. Red tea; C. Hot coffee. C True +short_conv_1847 gaokao_audio/short_conv_1847.wav 111111 What will the girl do with Holly? A. Wash their bikes; B. Do their homework; C. Shop for new skirts. A True +short_conv_1848 gaokao_audio/short_conv_1848.wav 111111 Where are the speakers? A. In Singapore; B. In Canada; C. In America. B True +short_conv_1849 gaokao_audio/short_conv_1849.wav 111111 Where is Peter now? A. In an office; B. At a restaurant; C. At home. A True +short_conv_185 gaokao_audio/short_conv_185.wav 111111 What does the man tell Jane to do? A. Postpone his appointment; B. Meet Mr. Douglas; C. Return at 3 o’clock. A True +short_conv_1850 gaokao_audio/short_conv_1850.wav 111111 What is the man going to do first? A. Watch a movie; B. Do some shopping; C. Meet his teacher. C True +short_conv_1851 gaokao_audio/short_conv_1851.wav 111111 What does the man mean? A. The woman can’t leave early; B. He’ll pick up the woman’s parents; C. Mr. Black won’t come at 4 o’clock. A True +short_conv_1852 gaokao_audio/short_conv_1852.wav 111111 How will the woman pay for the toy? A. In cash; B. By check; C. By credit card. B True +short_conv_1853 gaokao_audio/short_conv_1853.wav 111111 What time does Bill usually get up? A. Around 7:00; B. Around 6:30; C. Around 6:00. C True +short_conv_1854 gaokao_audio/short_conv_1854.wav 111111 What has the bear been doing? A. Eating campers’ food; B. Chasing the tourists; C. Attacking the park rangers. A True +short_conv_1855 gaokao_audio/short_conv_1855.wav 111111 What are the speakers talking about? A. A nice hairstyle; B. Their wedding; C. An old photo. C True +short_conv_1856 gaokao_audio/short_conv_1856.wav 111111 Where does the man want to visit? A. Spain; B. Italy; C. France. B True +short_conv_1857 gaokao_audio/short_conv_1857.wav 111111 How does the woman probably feel? A. Excited; B. Nervous; C. Unhappy. C True +short_conv_1858 gaokao_audio/short_conv_1858.wav 111111 Where is this bus going? A. South; B. East; C. North. A True +short_conv_1859 gaokao_audio/short_conv_1859.wav 111111 What do we know about the woman? A. She is fired; B. She didn’t work hard; C. She can take a day off tomorrow. A True +short_conv_186 gaokao_audio/short_conv_186.wav 111111 How much will the man pay? A. $520; B. $80; C. $100. B True +short_conv_1860 gaokao_audio/short_conv_1860.wav 111111 In which country does Jane want to spend her holiday? A. America; B. Korea; C. Japan. 
B True +short_conv_1861 gaokao_audio/short_conv_1861.wav 111111 When will the speakers meet? A. At 6:20; B. At 6:10; C. At 5:40. B True +short_conv_1862 gaokao_audio/short_conv_1862.wav 111111 Where does the conversation probably take place? A. In an office; B. In a store; C. In a hotel. C True +short_conv_1863 gaokao_audio/short_conv_1863.wav 111111 What is the woman doing now? A. Watching TV; B. Taking part in an activity; C. Preparing for an exam. C True +short_conv_1864 gaokao_audio/short_conv_1864.wav 111111 What will the man do right now? A. Buy his mum a coat; B. Buy his mum a handbag; C. Give Mary a call C True +short_conv_1865 gaokao_audio/short_conv_1865.wav 111111 What time will the man call the woman? A. At 5:30; B. At 6:00; C. At 6:30. A True +short_conv_1866 gaokao_audio/short_conv_1866.wav 111111 Which dress does the woman want to buy? A. The red one; B. The green one; C. The brown one. B True +short_conv_1867 gaokao_audio/short_conv_1867.wav 111111 How much should the man pay? A. $19; B. $18; C. $17. A True +short_conv_1868 gaokao_audio/short_conv_1868.wav 111111 Which place is the woman heading for right away? A. Her office; B. A flower shop; C. A hospital. B True +short_conv_1869 gaokao_audio/short_conv_1869.wav 111111 What does the man want? A. A hot drink; B. Iced tea; C. A chocolate cake. A True +short_conv_187 gaokao_audio/short_conv_187.wav 111111 How does the woman feel now? A. Relaxed; B. Excited; C. Tired. C True +short_conv_1870 gaokao_audio/short_conv_1870.wav 111111 What does the man think the woman should do? A. Cancel her trip to Spain; B. Speak out how she feels; C. Go to another country. B True +short_conv_1871 gaokao_audio/short_conv_1871.wav 111111 What does the man want to do? A. Borrow a book; B. Buy a book online; C. Return a book to the library. A True +short_conv_1872 gaokao_audio/short_conv_1872.wav 111111 What do we know about the man’s ticket? A. It was super expensive; B. He bought it a week ago; C. He got it at the last minute. C True +short_conv_1873 gaokao_audio/short_conv_1873.wav 111111 What time is it in New York? A. It’s 5:00 p.m; B. It’s 7:00 p.m; C. It’s 10:00 p.m. B True +short_conv_1874 gaokao_audio/short_conv_1874.wav 111111 Why did the girl run into the man? A. She was running too fast; B. She was looking at her phone; C. She was holding too many papers. B True +short_conv_1875 gaokao_audio/short_conv_1875.wav 111111 What does the woman suggest the man do? A. Get a new car; B. Get a new job; C. Fix his car. A True +short_conv_1876 gaokao_audio/short_conv_1876.wav 111111 What will the man probably do next? A. Check out of his hotel; B. Take some medicine; C. See a doctor. C True +short_conv_1877 gaokao_audio/short_conv_1877.wav 111111 Where might the speakers be? A. In a restaurant; B. At the man’s house; C. In a supermarket. A True +short_conv_1878 gaokao_audio/short_conv_1878.wav 111111 What does the man imply? A. He won’t listen to the woman; B. He doesn’t know the woman; C. He mistook the woman for someone else. B True +short_conv_1879 gaokao_audio/short_conv_1879.wav 111111 Who likes music that has great lyrics(歌词)? A. The woman B. The man C. Both of them A True +short_conv_188 gaokao_audio/short_conv_188.wav 111111 Where does the conversation probably take place? A. In a library; B. In a bookstore; C. In a classroom. B True +short_conv_1880 gaokao_audio/short_conv_1880.wav 111111 What does the woman want the man to do? A. Take her bike away; B. Go out with her; C. 
Repair her bike C True +short_conv_1881 gaokao_audio/short_conv_1881.wav 111111 How many classes does the man have? A. Three B. Four C. Five B True +short_conv_1882 gaokao_audio/short_conv_1882.wav 111111 Where does the conversation probably take place? A. In a restaurant B. In a hospital C. In a shop A True +short_conv_1883 gaokao_audio/short_conv_1883.wav 111111 What does the man advise the woman to wear? A. A suit B. A uniform C. A black dress C True +short_conv_1884 gaokao_audio/short_conv_1884.wav 111111 What are the two speakers talking about? A. Diving; B. Drawing; C. Driving C True +short_conv_1885 gaokao_audio/short_conv_1885.wav 111111 Why is the woman preparing so much food? A. It’s the man’s birthday; B. The woman wants to thank the man; C. The man can eat a lot. A True +short_conv_1886 gaokao_audio/short_conv_1886.wav 111111 Where does the conversation most probably take place? A. In a clothing store B. In a restaurant C. In a bookstore B True +short_conv_1887 gaokao_audio/short_conv_1887.wav 111111 Where is the man going first? A. To the Healey Supermarket; B. To the airport; C. To Canada. A True +short_conv_1888 gaokao_audio/short_conv_1888.wav 111111 Where is the man’s mother now? A. At home; B. In a hospital; C. At a bus stop. B True +short_conv_1889 gaokao_audio/short_conv_1889.wav 111111 What might have happened? A. An earthquake; B. A fire; C. A gas accident. A True +short_conv_189 gaokao_audio/short_conv_189.wav 111111 Why is Emily mentioned in the conversation? A. She might want a ticket; B. She is looking for the man; C. She has an extra ticket. A True +short_conv_1890 gaokao_audio/short_conv_1890.wav 111111 What is the man doing? A. Making a phone call; B. Making a visit; C. Making an appointment. B True +short_conv_1891 gaokao_audio/short_conv_1891.wav 111111 When will the man have a meeting? A. In a minute; B. Tomorrow; C. In a couple of hours. C True +short_conv_1892 gaokao_audio/short_conv_1892.wav 111111 What will the woman probably do? A. Wait for the airport bus; B. Go to the airport by taxi; C. Take a taxi and go home. B True +short_conv_1893 gaokao_audio/short_conv_1893.wav 111111 How does the man like to begin his lecture? A. With an introduction; B. With a smile; C. With a funny story. C True +short_conv_1894 gaokao_audio/short_conv_1894.wav 111111 How might the woman feel? A. Uneasy; B. Disappointed; C. Unconcerned. B True +short_conv_1895 gaokao_audio/short_conv_1895.wav 111111 What seemed to be Sarah’s problem? A. She couldn’t finish the task as required; B. She failed in a job interview again; C. She always went to work late. A True +short_conv_1896 gaokao_audio/short_conv_1896.wav 111111 What are the speakers mainly talking about? A. Environmental protection; B. Greenhouse effect; C. Gardening skills. C True +short_conv_1897 gaokao_audio/short_conv_1897.wav 111111 How much more does Lucas need for the cellphone? A. $300; B. $500; C. $800. A True +short_conv_1898 gaokao_audio/short_conv_1898.wav 111111 What did the woman try to quit drinking? A. Tea; B. Coffee; C. Juice. B True +short_conv_1899 gaokao_audio/short_conv_1899.wav 111111 What are the speakers talking about? A. Weather; B. Clothes; C. News. A True +short_conv_19 gaokao_audio/short_conv_19.wav 111111 Why did the man get a ticket? A. He drove too fast; B. He ran the red light; C. He made a wrong turn. B True +short_conv_190 gaokao_audio/short_conv_190.wav 111111 What is the relationship between the speakers? A. Colleges; B. Classmates; C. Strangers. 
C True +short_conv_1900 gaokao_audio/short_conv_1900.wav 111111 What does the man think of the book? A. Quite difficult; B. Very interesting; C. Too simple. B True +short_conv_1901 gaokao_audio/short_conv_1901.wav 111111 Who might Mr. Peterson be? A. A new professor; B. A department head; C. A company director. C True +short_conv_1902 gaokao_audio/short_conv_1902.wav 111111 What will the man do for the woman? A. Repair her car; B. Give her a ride; C. Pick up her aunt. B True +short_conv_1903 gaokao_audio/short_conv_1903.wav 111111 What does the woman want to do? A. Find a place; B. Buy a map; C. Get an address. A True +short_conv_1904 gaokao_audio/short_conv_1904.wav 111111 What does the woman want the man to do? A. Speak louder; B. Apologize to her; C. Turn off the radio. C True +short_conv_1905 gaokao_audio/short_conv_1905.wav 111111 What is the woman’s opinion about the course? A. Too hard; B. Worth taking; C. Very easy. B True +short_conv_1906 gaokao_audio/short_conv_1906.wav 111111 What will the man do? A. Attend a meeting; B. Give a lecture; C. Leave his office. A True +short_conv_1907 gaokao_audio/short_conv_1907.wav 111111 What does the woman think of the weather? A. It’s nice; B. It’s warm; C. It’s cold. C True +short_conv_1908 gaokao_audio/short_conv_1908.wav 111111 What time is it now? A. 9:10; B. 9:50; C. 10:00. A True +short_conv_1909 gaokao_audio/short_conv_1909.wav 111111 What is the woman’s attitude? A. She thinks spoken English is useful; B. She is good at spoken English; C. She isn’t interested in spoken English. A True +short_conv_191 gaokao_audio/short_conv_191.wav 111111 What are the speakers talking about? A. What to drink; B. Where to meet; C. When to leave. B True +short_conv_1910 gaokao_audio/short_conv_1910.wav 111111 What is the weather like today? A. Rainy; B. Snowy; C. Sunny. C True +short_conv_1911 gaokao_audio/short_conv_1911.wav 111111 Why did the woman have to walk? A. Her car was stolen; B. Her car hit a high tree; C. Something has gone wrong with her car. C True +short_conv_1912 gaokao_audio/short_conv_1912.wav 111111 How long can the woman keep the book? A. For six days; B. For five days; C. For eight days. A True +short_conv_1913 gaokao_audio/short_conv_1913.wav 111111 When will the woman sleep? A. After taking a walk; B. After turning off the lights; C. Before turning off the lights. B True +short_conv_1914 gaokao_audio/short_conv_1914.wav 111111 What did the woman regret? A. Watching the soap opera; B. Paying for the tickets; C. Seeing the film. C True +short_conv_1915 gaokao_audio/short_conv_1915.wav 111111 Where are the speakers going? A. A toy store; B. A restaurant; C. A bookstore. B True +short_conv_1916 gaokao_audio/short_conv_1916.wav 111111 What seems to be Peter’s problem? A. He came late for a class; B. He took the wrong seat; C. He failed to borrow books. A True +short_conv_1917 gaokao_audio/short_conv_1917.wav 111111 How does the woman sound? A. Excited; B. Surprised; C. Annoyed. C True +short_conv_1918 gaokao_audio/short_conv_1918.wav 111111 What is the man doing now? A. Applying for a job; B. Asking for a pay raise; C. Interviewing the woman. C True +short_conv_1919 gaokao_audio/short_conv_1919.wav 111111 What does the woman mean? A. It’s terrible to go abroad alone; B. It doesn’t matter if the man is not good at English; C. The man should improve his English. C True +short_conv_192 gaokao_audio/short_conv_192.wav 111111 What does Jack want to do? A. Take fitness classes; B. Buy a pair of gym shoes; C. Change his work schedule. 
A True +short_conv_1920 gaokao_audio/short_conv_1920.wav 111111 What do we know about the woman? A. She is too busy to go swimming; B. She’s willing to go swimming; C. She doesn’t want to wait long. B True +short_conv_1921 gaokao_audio/short_conv_1921.wav 111111 What has the man been doing? A. Having an interview; B. Filling out a form; C. Asking for information. B True +short_conv_1922 gaokao_audio/short_conv_1922.wav 111111 What is the man? A. He is a sports fan; B. He is a referee; C. He is an excellent athlete. A True +short_conv_1923 gaokao_audio/short_conv_1923.wav 111111 How much should the man’s rent be? A. $500; B. $150; C. $125. C True +short_conv_1924 gaokao_audio/short_conv_1924.wav 111111 What are the speakers discussing? A. A TV show; B. Their friend Kimmy Schmidt; C. An underground comedy club. A True +short_conv_1925 gaokao_audio/short_conv_1925.wav 111111 What does the woman mean? A. She will help the man later; B. The man has always been lucky; C. She can't be of any assistance. C True +short_conv_1926 gaokao_audio/short_conv_1926.wav 111111 Where are the speakers? A. In a grocery store; B. In a candy store; C. At a café. A True +short_conv_1927 gaokao_audio/short_conv_1927.wav 111111 What is the relationship between the speakers? A. Boss and employee; B. Customer and store clerk ; C. Father and daughter. A True +short_conv_1928 gaokao_audio/short_conv_1928.wav 111111 What does the woman want the man to do? A. Eat a slice of pizza; B. Lie down; C. Turn on the TV. B True +short_conv_1929 gaokao_audio/short_conv_1929.wav 111111 What will the woman do later probably? A. Watch TV; B. Go shopping; C. Go to work. B True +short_conv_193 gaokao_audio/short_conv_193.wav 111111 Where does this conversation take place? A. In a classroom; B. In a hospital; C. In a museum. B True +short_conv_1930 gaokao_audio/short_conv_1930.wav 111111 What does the woman mean? A. The man is not fully recovered yet; B. The man can leave the hospital now; C. She is not certain about the man’s condition. A True +short_conv_1931 gaokao_audio/short_conv_1931.wav 111111 What’s the probable relationship between the man and Mary? A. Father and daughter; B. Friends; C. husband and wife. C True +short_conv_1932 gaokao_audio/short_conv_1932.wav 111111 What might the man dress up like for the coming Halloween? A. A ghost; B. A skeleton; C. A witch. B True +short_conv_1933 gaokao_audio/short_conv_1933.wav 111111 What is in the study? A. A thief; B. A rat; C. A dog. C True +short_conv_1934 gaokao_audio/short_conv_1934.wav 111111 What is the man going to do? A. Go on the Internet; B. Make a phone call; C. Take a train trip. A True +short_conv_1935 gaokao_audio/short_conv_1935.wav 111111 Where are the speakers? A. In a classroom; B. In a library; C. In a bookstore. B True +short_conv_1936 gaokao_audio/short_conv_1936.wav 111111 What are the speakers talking about? A. Going out; B. Ordering drinks; C. Preparing for a party. C True +short_conv_1937 gaokao_audio/short_conv_1937.wav 111111 How will Susan spend most of her time in France? A. Traveling around; B. Studying at a school; C. Looking after her aunt. A True +short_conv_1938 gaokao_audio/short_conv_1938.wav 111111 What does the woman think of the movie? A. It’s amusing; B. It’s exciting; C. It’s disappointing. C True +short_conv_1939 gaokao_audio/short_conv_1939.wav 111111 Where does this conversation probably take place? A. At a clothing store; B. In a tailor's shop; C. At a laundry. 
C True +short_conv_194 gaokao_audio/short_conv_194.wav 111111 What are the speakers mainly talking about? A. The man’s haircut; B. The man’s friends; C. The man’s social life. A True +short_conv_1940 gaokao_audio/short_conv_1940.wav 111111 What does the man say about Stephanie? A. She will get well soon; B. She has a very bad cold; C. She is coming to the beach. B True +short_conv_1941 gaokao_audio/short_conv_1941.wav 111111 What does the woman like best about the shirt? A. The color; B. The price; C. The material. A True +short_conv_1942 gaokao_audio/short_conv_1942.wav 111111 What will the woman do next? A. Walk to the university; B. Get off at the next stop; C. Take the downtown bus. B True +short_conv_1943 gaokao_audio/short_conv_1943.wav 111111 Where is the man’s passport? A. In his car; B. In his bag; C. In his pocket. C True +short_conv_1944 gaokao_audio/short_conv_1944.wav 111111 What does the woman order? A. Eggs and bread; B. Eggs and fruit; C. Fruit and bread. C True +short_conv_1945 gaokao_audio/short_conv_1945.wav 111111 What did the man think was wrong at first? A. He left something inside the car; B. He forgot to turn off the lights; C. He left his wallet at home. B True +short_conv_1946 gaokao_audio/short_conv_1946.wav 111111 Who is the man probably talking to? A. His boss; B. His assistant; C. His customer. A True +short_conv_1947 gaokao_audio/short_conv_1947.wav 111111 Where are the speakers? A. In a restaurant; B. At home; C. In a grocery store. B True +short_conv_1948 gaokao_audio/short_conv_1948.wav 111111 Why can’t the man park there? A. It is after 4 o’clock; B. He is blocking the driveway; C. Only the police can park there. A True +short_conv_1949 gaokao_audio/short_conv_1949.wav 111111 What are the speakers mainly talking about? A. James’ daily life; B. James’ business; C. James’ family. B True +short_conv_195 gaokao_audio/short_conv_195.wav 111111 Who is the woman talking to? A. Her student; B. Her son; C. Her teacher. C True +short_conv_1950 gaokao_audio/short_conv_1950.wav 111111 What will the man do next? A. Leave; B. Phone Linda; C. Keep on waiting. C True +short_conv_1951 gaokao_audio/short_conv_1951.wav 111111 What is wrong with the man? A. He’s got a headache; B. He can’t fall asleep at night; C. He doesn’t feel the pain. A True +short_conv_1952 gaokao_audio/short_conv_1952.wav 111111 Why does the man want to leave his job? A. He doesn’t get on with his workmates; B. He thinks the job is too boring; C. The working place is too far. C True +short_conv_1953 gaokao_audio/short_conv_1953.wav 111111 How will the man go to Chicago? A. By plane B. By train; C. By bus. B True +short_conv_1954 gaokao_audio/short_conv_1954.wav 111111 What are the speakers doing? A. Painting the dining room; B. Discussing a house plan; C. Cleaning the kitchen. B True +short_conv_1955 gaokao_audio/short_conv_1955.wav 111111 What does the woman imply? A. She won’t go to the beach if it rains; B. It will clear up tomorrow; C. It was pouring when she was at the beach. A True +short_conv_1956 gaokao_audio/short_conv_1956.wav 111111 How does the woman feel? A. Annoyed; B. Embarrassed; C. Bored. A True +short_conv_1957 gaokao_audio/short_conv_1957.wav 111111 How's the school now? A. It’s not as good as it was; B. It’s better than people say; C. It’s even worse than people say. A True +short_conv_1958 gaokao_audio/short_conv_1958.wav 111111 What does the man mean? A. He wants to order the food; B. He doesn’t like Japanese food; C. He hopes to pay for the meal. 
C True +short_conv_1959 gaokao_audio/short_conv_1959.wav 111111 According to the woman, what color is the shirt? A. Light blue; B. Green and blue; C. Yellow. C True +short_conv_196 gaokao_audio/short_conv_196.wav 111111 Why did the woman call the man? A. To report a car accident; B. To report her car being stolen; C. To get some information about cars. B True +short_conv_1960 gaokao_audio/short_conv_1960.wav 111111 How much is the service charge if the food cost $50? A. $5; B. $15; C. $50. A True +short_conv_1961 gaokao_audio/short_conv_1961.wav 111111 What worries the woman a lot? A. The location of the hotel; B. The damage to the environment; C. The solution to the issue. B True +short_conv_1962 gaokao_audio/short_conv_1962.wav 111111 What is the probable relationship between the two speakers? A. Teacher and student; B. Policeman and driver; C. Doctor and nurse. B True +short_conv_1963 gaokao_audio/short_conv_1963.wav 111111 Where does the conversation probably take place? A. In a restaurant; B. In a meeting room; C. At the office. A True +short_conv_1964 gaokao_audio/short_conv_1964.wav 111111 Who is the girl talking to? A. A dentist; B. A policeman; C. A salesman. A True +short_conv_1965 gaokao_audio/short_conv_1965.wav 111111 What will the woman work as? A. An assistant; B. A lawyer; C. A teacher. A True +short_conv_1966 gaokao_audio/short_conv_1966.wav 111111 What is the woman going to do? A. Play baseball; B. Watch a game; C. Do her work. B True +short_conv_1967 gaokao_audio/short_conv_1967.wav 111111 What did the man like about the movie? A. The acting; B. The music; C. The scenery. A True +short_conv_1968 gaokao_audio/short_conv_1968.wav 111111 What will the speakers bring to the picnic? A. Some drinks; B. Some fruit; C. Some desserts. C True +short_conv_1969 gaokao_audio/short_conv_1969.wav 111111 What will the speakers probably do next? A. Order some boxes; B. Go home and rest; C. Continue packing. B True +short_conv_197 gaokao_audio/short_conv_197.wav 111111 What will the woman do? A. Return a scarf; B. Exchange her shoes; C. Buy a purse. A True +short_conv_1970 gaokao_audio/short_conv_1970.wav 111111 What does the woman say about John? A. He won’t wait for her; B. He won’t come home today; C. He won’t be on time for dinner. C True +short_conv_1971 gaokao_audio/short_conv_1971.wav 111111 What will the man probably do? A. Take the job; B. Refuse the offer; C. Change the working time. A True +short_conv_1972 gaokao_audio/short_conv_1972.wav 111111 Where is the man going next time? A. To a hill; B. To a park; C. To a beach. B True +short_conv_1973 gaokao_audio/short_conv_1973.wav 111111 What does the woman want the man to do? A. Go to sleep; B. Take care of the cat; C. Stop his dog barking. C True +short_conv_1974 gaokao_audio/short_conv_1974.wav 111111 What does a woman mean? A. Working too fast may lead to undesirable outcomes; B. The result may not be as bad as the man has expected; C. You can never lay too much emphasis on the fast speed. A True +short_conv_1975 gaokao_audio/short_conv_1975.wav 111111 What does the man expect women to do for him? A. Arrange accommodation for him; B. Explain the cause of the cancellation; C. Allow him to take another flight that night. A True +short_conv_1976 gaokao_audio/short_conv_1976.wav 111111 What does the woman mean? A. The woman is afraid of the potential noise; B. The woman has sleeping problems; C. The woman will sign the rental contract. 
A True +short_conv_1977 gaokao_audio/short_conv_1977.wav 111111 Why did a man give up damaras on half way. A. His shoes were worn out; B. He didn’t like hiking trip; C. He was too exhausted to continue. C True +short_conv_1978 gaokao_audio/short_conv_1978.wav 111111 What does the man imply? A. He has never been to Central Mountains; B. He doesn’t plan to go skiing during spring break; C. He doesn’t recommend going to Central Mountains. C True +short_conv_1979 gaokao_audio/short_conv_1979.wav 111111 What is the woman's attitude towards the study tours? A. Negative; B. Neutral; C. Unclear. C True +short_conv_198 gaokao_audio/short_conv_198.wav 111111 Where is the man going? A. To the classroom; B. To the movie theater; C. To the library. C True +short_conv_1980 gaokao_audio/short_conv_1980.wav 111111 What does the woman mean? A. The movie will be quite boring; B. The kids will be surprised at the movie; C. The movie will not be suitable for kids to see. C True +short_conv_1981 gaokao_audio/short_conv_1981.wav 111111 What does the woman mean? A. She can’t see the time on the sign; B. She loses her glasses; C. The museum is out of sight. A True +short_conv_1982 gaokao_audio/short_conv_1982.wav 111111 What's the probable occupation of the man? A. A tailor; B. An electrician; C. An operator. B True +short_conv_1983 gaokao_audio/short_conv_1983.wav 111111 Where does this conversation most probably take place? A. At a kindergarten; B. At a police station; C. In a library. A True +short_conv_1984 gaokao_audio/short_conv_1984.wav 111111 What are the speakers probably doing? A. Operating a computer; B. Doing an experiment; C. Checking the power. B True +short_conv_1985 gaokao_audio/short_conv_1985.wav 111111 What does the man say about the accident? A. The bus was speeding; B. The driver lost control of the truck; C. The bus driver made a sudden turn. C True +short_conv_1986 gaokao_audio/short_conv_1986.wav 111111 Where does the conversation take place? A. In a hotel; B. In a restaurant; C. In a supermarket. B True +short_conv_1987 gaokao_audio/short_conv_1987.wav 111111 What did the man do? A. He had some drinks; B. He made a phone call; C. He looked after the woman. A True +short_conv_1988 gaokao_audio/short_conv_1988.wav 111111 How does the woman probably feel? A. Angry; B. Happy; C. Worried. A True +short_conv_1989 gaokao_audio/short_conv_1989.wav 111111 What can we learn from the conversation? A. The man missed the meeting completely; B. The man was late for the meeting; C. The man attended the meeting on time. B True +short_conv_199 gaokao_audio/short_conv_199.wav 111111 How may the woman feel now? A. Embarrassed; B. Confused; C. Frightened. A True +short_conv_1990 gaokao_audio/short_conv_1990.wav 111111 Who had a car accident? A. Bill; B. Dick; C. John. B True +short_conv_1991 gaokao_audio/short_conv_1991.wav 111111 What do the speakers think of Carl? A. Modest; B. Kind; C. Stubborn. C True +short_conv_1992 gaokao_audio/short_conv_1992.wav 111111 How was the weather at noon? A. Cool; B. Cold; C. Hot. A True +short_conv_1993 gaokao_audio/short_conv_1993.wav 111111 Why is the man going to New York? A. To have a holiday; B. To attend a meeting; C. To see his grandparents. C True +short_conv_1994 gaokao_audio/short_conv_1994.wav 111111 What do we learn from the conversation? A. The dancers impressed them both; B. The woman is also a dancer; C. The man invited the lady to the show. A True +short_conv_1995 gaokao_audio/short_conv_1995.wav 111111 What's probably the gift? A. A grand wedding party; B. 
Two plane tickets to Hawaii; C. A picture of the moon. B True +short_conv_1996 gaokao_audio/short_conv_1996.wav 111111 What does the man mean? A. He can’t understand the lady’s feeling; B. The lady should not blame others; C. Nobody may be interested in her problem. B True +short_conv_1997 gaokao_audio/short_conv_1997.wav 111111 What does the man mean? A. Lisa made the mess; B. He and Lisa are settling a problem; C. Lisa likes the new place. C True +short_conv_1998 gaokao_audio/short_conv_1998.wav 111111 What do we learn from the talk? A. They just want to grab the chance; B. They will probably change their mind; C. They’ll go skiing even in the rain. B True +short_conv_1999 gaokao_audio/short_conv_1999.wav 111111 What did Sarah do? A. She threw herself out of a window and broke her leg; B. She moved a truck to save a little boy; C. She rushed to a moving truck to save a kid. C True +short_conv_2 gaokao_audio/short_conv_2.wav 111111 What does the man come for? A. To say goodbye; B. To visit his friend; C. To invite the woman. A True +short_conv_20 gaokao_audio/short_conv_20.wav 111111 What time did the concert start last night? A. At 8:00; B. At 8:15; C. At 8:30. B True +short_conv_200 gaokao_audio/short_conv_200.wav 111111 How will the man treat his cold probably? A. By receiving an injection; B. By taking some medicine; C. By drinking more water. C True +short_conv_2000 gaokao_audio/short_conv_2000.wav 111111 Why does the woman say me too? A. She is also working very hard; B. She loves American football so much; C. She works for the World Cup. A True +short_conv_201 gaokao_audio/short_conv_201.wav 111111 What’s the probable relationship between the speakers? A. Classmates; B. Neighbors; C. Strangers. A True +short_conv_202 gaokao_audio/short_conv_202.wav 111111 Why does the man dislike the second pair of shoes? A. Because of the size; B. Because of the style; C. Because of the color. C True +short_conv_203 gaokao_audio/short_conv_203.wav 111111 What does the woman prefer doing this evening? A. Seeing a film; B. Watching TV; C. Playing games. B True +short_conv_204 gaokao_audio/short_conv_204.wav 111111 What are the speakers mainly talking about? A. A picnic; B. The weather; C. A forecast. B True +short_conv_205 gaokao_audio/short_conv_205.wav 111111 Why doesn’t the man want to eat? A. He’s feeling a little sick; B. He doesn’t like the food; C. He ate something just now. C True +short_conv_206 gaokao_audio/short_conv_206.wav 111111 How does the man know about animals? A. From books; B. On TV; C. Through the Internet. B True +short_conv_207 gaokao_audio/short_conv_207.wav 111111 What does the man ask the woman to do? A. Give her ID card to him; B. Move a table; C. Sign for a parcel. A True +short_conv_208 gaokao_audio/short_conv_208.wav 111111 What is the cause of the woman’s quietness? A. The violent film; B. Her tiredness; C. The crowded theater. A True +short_conv_209 gaokao_audio/short_conv_209.wav 111111 What crop does the woman’s uncle plant? A. Beans; B. Cotton; C. Corn. B True +short_conv_21 gaokao_audio/short_conv_21.wav 111111 Where is the man going tomorrow? A. To the school; B. To the beach; C. To the cinema. C True +short_conv_210 gaokao_audio/short_conv_210.wav 111111 What is the woman looking at? A. A painting; B. A photo; C. A mirror. A True +short_conv_211 gaokao_audio/short_conv_211.wav 111111 How does the man feel about the family party? A. Excited; B. Hesitant; C. Scared. B True +short_conv_212 gaokao_audio/short_conv_212.wav 111111 What does the woman want to eat? A. 
Pork pies; B. Beef pies; C. Egg cakes. C True +short_conv_213 gaokao_audio/short_conv_213.wav 111111 When will the mall close? A. In half an hour; B. In an hour; C. In one hour and a half. A True +short_conv_214 gaokao_audio/short_conv_214.wav 111111 What is the probable relationship between the speakers? A. Classmates; B. Teacher and student; C. Doctor and patient. A True +short_conv_215 gaokao_audio/short_conv_215.wav 111111 How does the woman go to work? A. By car; B. On foot; C. By bike. B True +short_conv_216 gaokao_audio/short_conv_216.wav 111111 When does the train leave? A. At 6:30; B. At 8:30; C. At 10:30. C True +short_conv_217 gaokao_audio/short_conv_217.wav 111111 What can we say about the woman? A. She's generous; B. She's curious; C. She's helpful. C True +short_conv_218 gaokao_audio/short_conv_218.wav 111111 What will James do tomorrow? A. Watch a TV program; B. Give a talk; C. Write a report. B True +short_conv_219 gaokao_audio/short_conv_219.wav 111111 What will the woman do? A. Find a player; B. Watch a game; C. Play basketball. C True +short_conv_22 gaokao_audio/short_conv_22.wav 111111 What does the woman mainly talk about? A. Paying attention to safety; B. Learning how to drive; C. Buying a good bike. A True +short_conv_220 gaokao_audio/short_conv_220.wav 111111 What does the woman want to know? A. Where the meeting is being held; B. Where Joe will meet her; C. What the topic of the meeting is. A True +short_conv_221 gaokao_audio/short_conv_221.wav 111111 Where is the man now? A. On the way; B. In a restaurant; C. At home. A True +short_conv_222 gaokao_audio/short_conv_222.wav 111111 What does the man tell the woman? A. He took Bill to the hospital; B. He forgot to call the woman; C. He didn't know which hospital Bill was in. B True +short_conv_223 gaokao_audio/short_conv_223.wav 111111 What does the woman mean? A. She needs the man's help; B. She thinks the man is right; C. She plans to send out all the invitations. C True +short_conv_224 gaokao_audio/short_conv_224.wav 111111 What’s wrong with the woman’s son? A. He gets poor grades; B. He doesn’t perform well in school; C. He doesn’t do his work. A True +short_conv_225 gaokao_audio/short_conv_225.wav 111111 How long did the woman’s sister stay in her apartment? A. Two weeks; B. Two months; C. Two years. B True +short_conv_226 gaokao_audio/short_conv_226.wav 111111 Who is the woman talking to? A. A repairman; B. A relative; C. A salesman. C True +short_conv_227 gaokao_audio/short_conv_227.wav 111111 Where does the conversation take place? A. In a library; B. In a classroom; C. In a bookstore. C True +short_conv_228 gaokao_audio/short_conv_228.wav 111111 Why can’t the woman have a room in the hotel tonight? A. There aren’t any open rooms; B. She doesn’t have enough money; C. A wedding is held there tonight. A True +short_conv_229 gaokao_audio/short_conv_229.wav 111111 What's the probable relationship between the speakers? A. Husband and wife; B. Doctor and patient; C. Teacher and student. B True +short_conv_23 gaokao_audio/short_conv_23.wav 111111 How did the woman get to Baltimore? A. By train; B. By bus; C. By taxi. A True +short_conv_230 gaokao_audio/short_conv_230.wav 111111 What does the man think Michael has been doing this week? A. Going to class; B. Resting at home; C. Looking for a job. C True +short_conv_231 gaokao_audio/short_conv_231.wav 111111 What does the man think of the jacket? A. It has too many pockets; B. It’s great for everyday use; C. It is suitable for outdoor activities. 
C True +short_conv_232 gaokao_audio/short_conv_232.wav 111111 What is the woman going to do? A. Pray for good luck; B. Prepare for a debate; C. Study for the final exam. B True +short_conv_233 gaokao_audio/short_conv_233.wav 111111 What are the speakers mainly talking about? A. What the apartment manager is like; B. When the man met the apartment manager; C. Whether the man likes the apartment manager. A True +short_conv_234 gaokao_audio/short_conv_234.wav 111111 How did the man sound in the conversation? A. Disappointed; B. Depressed; C. Excited C True +short_conv_235 gaokao_audio/short_conv_235.wav 111111 Where does the conversation take place? A. At a restaurant; B. At home; C. At a supermarket. A True +short_conv_236 gaokao_audio/short_conv_236.wav 111111 What will the man do next Friday? A. Attend a party; B. Go on a business trip; C. E-mail the woman a report. B True +short_conv_237 gaokao_audio/short_conv_237.wav 111111 What was the weather like when Beth was at the beach? A. Cloudy; B. Sunny; C. Rainy. A True +short_conv_238 gaokao_audio/short_conv_238.wav 111111 What are the speakers mainly talking about? A. A picnic; B. The weather; C. A forecast. B True +short_conv_239 gaokao_audio/short_conv_239.wav 111111 Why doesn’t the man want to eat? A. He’s feeling a little sick; B. He doesn’t like the food; C. He ate something just now. C True +short_conv_24 gaokao_audio/short_conv_24.wav 111111 How does the man feel about Lila? A. Bored; B. Scared; C. Excited. B True +short_conv_240 gaokao_audio/short_conv_240.wav 111111 What does the man ask the woman to do? A. Sign for a parcel; B. Move a table; C. Give her ID card to him. C True +short_conv_241 gaokao_audio/short_conv_241.wav 111111 How does the man know about animals? A. From books; B. On TV; C. Through the Internet. B True +short_conv_242 gaokao_audio/short_conv_242.wav 111111 What is the cause of the woman’s quietness? A. The violent film; B. Her tiredness; C. The crowded theater. A True +short_conv_243 gaokao_audio/short_conv_243.wav 111111 What are the speakers talking about? A. A restaurant; B. A street; C. A dish. A True +short_conv_244 gaokao_audio/short_conv_244.wav 111111 What does the woman think of her interview? A. It was tough; B. It was interesting; C. It was successful. C True +short_conv_245 gaokao_audio/short_conv_245.wav 111111 Where does the conversation probably take place? A. In a bank; B. At a ticket office; C. On a train. B True +short_conv_246 gaokao_audio/short_conv_246.wav 111111 What is the probable relationship between the speakers? A. Colleagues; B. Brother and sister; C. Teacher and student. A True +short_conv_247 gaokao_audio/short_conv_247.wav 111111 What does John find difficult in learning German? A. Pronunciation; B. Vocabulary; C. Grammar. C True +short_conv_248 gaokao_audio/short_conv_248.wav 111111 Where will the train at Platform1 leave for? A. Oxford; B. Reading; C. Southampton Central A True +short_conv_249 gaokao_audio/short_conv_249.wav 111111 Why did the poor woman hit the pole? A. It was rainy; B. She was careless; C. She tried to avoid a biker C True +short_conv_25 gaokao_audio/short_conv_25.wav 111111 What will the speakers do? A. Catch a flight; B. Pick up their son; C. Deal with an accident. B True +short_conv_250 gaokao_audio/short_conv_250.wav 111111 Where are the two speakers? A. In the street; B. At home; C. In a restaurant B True +short_conv_251 gaokao_audio/short_conv_251.wav 111111 Why did the man fail to wake up the woman? A. He found the woman tired; B. 
He felt rather terrible; C. He had been enjoying the scenery. A True +short_conv_252 gaokao_audio/short_conv_252.wav 111111 How did the man feel when knowing the times is closing down A. Disappointed; B. Joyful; C. Sad. B True +short_conv_253 gaokao_audio/short_conv_253.wav 111111 How did the man feel about his jump in the end? A. Terrified; B. Disappointed; C. Excited. B True +short_conv_254 gaokao_audio/short_conv_254.wav 111111 Where does the conversation most probably take place? A. On a bus; B. In a library; C. In a shop. A True +short_conv_255 gaokao_audio/short_conv_255.wav 111111 What has the man decided to do? A. Continue his talk with Mr. Black; B. Go to see an engineer; C. Check the schedule. A True +short_conv_256 gaokao_audio/short_conv_256.wav 111111 What does the woman do? A. A teacher; B. A nurse; C. A shop assistant. B True +short_conv_257 gaokao_audio/short_conv_257.wav 111111 Why does the man want to leave? A. The service is too slow; B. The food is bad; C. The music is too loud C True +short_conv_258 gaokao_audio/short_conv_258.wav 111111 What does the man mean? A. Wash the clothes twice; B. Have the machine repaired; C. Put in fewer clothes. C True +short_conv_259 gaokao_audio/short_conv_259.wav 111111 What does the man probably think of the party? A. It's interesting; B. It's crowded; C. It's dull. C True +short_conv_26 gaokao_audio/short_conv_26.wav 111111 What are the two speakers talking about? A. The gift; B. The class; C. The new professor. C True +short_conv_260 gaokao_audio/short_conv_260.wav 111111 What's the woman going to today? A. Take an examination; B. Have a history class; C. Study in the library. C True +short_conv_261 gaokao_audio/short_conv_261.wav 111111 How did the man get home? A. By bus; B. On foot; C. By taxi. B True +short_conv_262 gaokao_audio/short_conv_262.wav 111111 What are the speakers mainly talking about? A. Driving; B. Health; C. Weather. C True +short_conv_263 gaokao_audio/short_conv_263.wav 111111 How much is the entrance fee for a student? A. $30; B. $20; C. $12. B True +short_conv_264 gaokao_audio/short_conv_264.wav 111111 Where did the man have his advanced study? A. In the U.S; B. In France; C. In Britain. C True +short_conv_265 gaokao_audio/short_conv_265.wav 111111 What did the man forget to bring this time? A. His keys; B. His phone; C. His wallet. C True +short_conv_266 gaokao_audio/short_conv_266.wav 111111 Where does the conversation take place? A. In a hotel; B. In a restaurant; C. In a supermarket. A True +short_conv_267 gaokao_audio/short_conv_267.wav 111111 When does the store open? A. At 8 a.m; B. At 7 a.m; C. At 6 a.m. C True +short_conv_268 gaokao_audio/short_conv_268.wav 111111 Who knows the best place to ride a bike according to the conversation? A. Harry; B. The man speaker; C. The woman speaker. A True +short_conv_269 gaokao_audio/short_conv_269.wav 111111 What is the woman doing? A. Offering suggestions; B. Expressing dissatisfaction; C. Asking for help. C True +short_conv_27 gaokao_audio/short_conv_27.wav 111111 Why doesn't the woman go to Europe? A. She is terrified; B. She doesn't like Europe; C. She has. been there before. A True +short_conv_270 gaokao_audio/short_conv_270.wav 111111 How did John do in the exam? A. He failed in the exam; B. He got the highest mark; C. He did worse than last time. A True +short_conv_271 gaokao_audio/short_conv_271.wav 111111 When can the speakers reach the Overseas Chinese Hotel? A. At 11:35; B. At 11:45; C. At 12:00. 
B True +short_conv_272 gaokao_audio/short_conv_272.wav 111111 What did the man do last night? A. He watched a play; B. He did some shopping; C. He relaxed at home. A True +short_conv_273 gaokao_audio/short_conv_273.wav 111111 What is the woman going to do? A. Catch a train; B. Carry out a survey; C. Identify her personality. A True +short_conv_274 gaokao_audio/short_conv_274.wav 111111 Where are the speakers? A. In the classroom; B. In the library; C. At home. B True +short_conv_275 gaokao_audio/short_conv_275.wav 111111 Why was the man late? A. He cleaned out the garage; B. He fell over from the toolbox; C. He tried to find his baseball bat. C True +short_conv_276 gaokao_audio/short_conv_276.wav 111111 When will the man leave for Perth? A. On May 24 B. On May 26 C. On May 27 C True +short_conv_277 gaokao_audio/short_conv_277.wav 111111 How will the speakers go to Zhongshan Road? A. By car; B. By bus; C. By subway. C True +short_conv_278 gaokao_audio/short_conv_278.wav 111111 What does the man mean? A. The woman should get the skirt; B. The woman turns out to be a fool; C. The woman shouldn't go to the party. A True +short_conv_279 gaokao_audio/short_conv_279.wav 111111 Where does the woman sit now? A. By the window; B. By the door; C. In the back row. C True +short_conv_28 gaokao_audio/short_conv_28.wav 111111 What happened in the cafeteria? A. The man met his teacher; B. Someone took the man’s chair; C. The man didn’t take any money. B True +short_conv_280 gaokao_audio/short_conv_280.wav 111111 What happened to the man last weekend? A. He caught a wrong train; B. He failed to go for a trip; C. He forgot to buy tickets. B True +short_conv_281 gaokao_audio/short_conv_281.wav 111111 What are the speakers mainly talking about? A. Plants; B. Animals; C. Rainforests. C True +short_conv_282 gaokao_audio/short_conv_282.wav 111111 Who is washing the windows? A. The man; B. A cleaner; C. The woman. B True +short_conv_283 gaokao_audio/short_conv_283.wav 111111 Where does the conversation take place? A. In a bookstore; B. In a library; C. In a classroom. A True +short_conv_284 gaokao_audio/short_conv_284.wav 111111 What does the man have to do today? A. Attend a class; B. Work on a report; C. Visit his mother. B True +short_conv_285 gaokao_audio/short_conv_285.wav 111111 How will the woman go to the show? A. By bus; B. By car; C. On foot. B True +short_conv_286 gaokao_audio/short_conv_286.wav 111111 What does the man think of the new play? A. Strange; B. Boring; C. Interesting. C True +short_conv_287 gaokao_audio/short_conv_287.wav 111111 What food will the boy have at his party? A. Pizza; B. Salad; C. Fried chicken. C True +short_conv_288 gaokao_audio/short_conv_288.wav 111111 What is the man’s attitude towards the plan? A. He doesn’t care; B. He is for it; C. He is against it. C True +short_conv_289 gaokao_audio/short_conv_289.wav 111111 What is the woman planning to do? A. Go to have a coffee; B. Get a haircut; C. Go to the man’s house. B True +short_conv_29 gaokao_audio/short_conv_29.wav 111111 What did the man do? A. He acted in a movie; B. He had a fight with others; C. He communicated with children. B True +short_conv_290 gaokao_audio/short_conv_290.wav 111111 What caused the difference in the price? A. The color; B. The size; C. The material. C True +short_conv_291 gaokao_audio/short_conv_291.wav 111111 Where is the man going first? A. To the Healey Supermarket; B. To the airport; C. To Canada. 
A True +short_conv_292 gaokao_audio/short_conv_292.wav 111111 When should Susan go to meet Professor Brown? A. At 9: 30; B. At 10: 00; C. At 10: 30. C True +short_conv_293 gaokao_audio/short_conv_293.wav 111111 What will the woman do? A. Take a bath; B. Cook a meal; C. Call her dad. B True +short_conv_294 gaokao_audio/short_conv_294.wav 111111 Where does the woman want to have dinner? A. At the man’s house; B. At the Red Rose Restaurant; C. At the Blue Moon Restaurant. C True +short_conv_295 gaokao_audio/short_conv_295.wav 111111 When did the woman came back home? A. At 8:00 B. At 10:00; C. At 11:00. C True +short_conv_296 gaokao_audio/short_conv_296.wav 111111 What is the man doing? A. Asking permission; B. Offering help; C. Finding the smoking area. A True +short_conv_297 gaokao_audio/short_conv_297.wav 111111 What does the man plan to do? A. Attend a concert; B. See a film; C. Watch a game. C True +short_conv_298 gaokao_audio/short_conv_298.wav 111111 How will the man go to work ? A. On foot B. By car C. By understand C True +short_conv_299 gaokao_audio/short_conv_299.wav 111111 Which T-shirt will the woman buy ? A. The $7 one B. The$8 one C. The $10 B True +short_conv_3 gaokao_audio/short_conv_3.wav 111111 Where does the talk take place? A. At the woman’s house; B. At the man’s house; C. In a local restaurant B True +short_conv_30 gaokao_audio/short_conv_30.wav 111111 Why can the man speak German? A. He is German; B. His teacher taught him; C. He used to live in Germany. C True +short_conv_300 gaokao_audio/short_conv_300.wav 111111 Where is the man going first ? A. To a post office B. To a library C. To a store C True +short_conv_301 gaokao_audio/short_conv_301.wav 111111 What is the man’s hobby ? A. Playing computer games B. Climbing mountains C. Collecting coins A True +short_conv_302 gaokao_audio/short_conv_302.wav 111111 What does the man ask the woman to get for him ? A. Some books B. Some pencils C. Some envelops C True +short_conv_303 gaokao_audio/short_conv_303.wav 111111 What happened to the woman? A. She was ill; B. She failed a test; C. She lost her job. B True +short_conv_304 gaokao_audio/short_conv_304.wav 111111 What does the woman say about Professor Johnson? A. His lectures are humorous; B. He is quiet; C. He is not strict. A True +short_conv_305 gaokao_audio/short_conv_305.wav 111111 When does the regular train usually leave for London? A. At 5:15 pm; B. At 4:50 pm; C. At 2:30 pm. C True +short_conv_306 gaokao_audio/short_conv_306.wav 111111 Where does the conversation take place? A. In a bookshop; B. In a bank; C. On a street. B True +short_conv_307 gaokao_audio/short_conv_307.wav 111111 What color dress does the man suggest? A. Blue; B. Pink; C. Green. C True +short_conv_308 gaokao_audio/short_conv_308.wav 111111 What can we learn about Ralph? A. He is three years old; B. He is a quick learner; C. He starts losing hair. B True +short_conv_309 gaokao_audio/short_conv_309.wav 111111 What is the probable relationship between the speakers? A. Cousins; B. Mother and son; C. Brother and sister. C True +short_conv_31 gaokao_audio/short_conv_31.wav 111111 What are the speakers talking about? A. An accident; B. A flight; C. A pilot. B True +short_conv_310 gaokao_audio/short_conv_310.wav 111111 Where are probably the speakers? A. In a candy store; B. In a restaurant; C. In a supermarket. C True +short_conv_311 gaokao_audio/short_conv_311.wav 111111 What will the man do today? A. Work in the office; B. Buy a jacket; C. Go to the sale. 
A True +short_conv_312 gaokao_audio/short_conv_312.wav 111111 What are the speakers doing? A. Buying a new car; B. Driving in the city; C. Choosing flowers. A True +short_conv_313 gaokao_audio/short_conv_313.wav 111111 Where are the two speakers now? A. On the first floor; B. On the fourth floor; C. On the fifth floor. C True +short_conv_314 gaokao_audio/short_conv_314.wav 111111 What can we learn about the woman? A. She is angry with the man; B. She doesn’t like her roommate; C. She is rather quiet. B True +short_conv_315 gaokao_audio/short_conv_315.wav 111111 What are the speakers mainly talk about? A. A child; B. A room; C. A present. C True +short_conv_316 gaokao_audio/short_conv_316.wav 111111 What will the speakers discuss first? A. A report; B. A computer; C. A pop. A True +short_conv_317 gaokao_audio/short_conv_317.wav 111111 How much will the woman pay if she buys two shirts? A. $18; B. $19; C. $20. B True +short_conv_318 gaokao_audio/short_conv_318.wav 111111 Where is the No.1 Hospital? A. Across from a bank; B. On Zhongshan Street; C. At the end of 5th Street. A True +short_conv_319 gaokao_audio/short_conv_319.wav 111111 What does the boy’s new history teacher look like? A. She has red hair; B. she is tall; C. She is quite short. B True +short_conv_32 gaokao_audio/short_conv_32.wav 111111 Why is the baby crying according to the man? A. He is hungry; B. He is ill; C. He is alone. C True +short_conv_320 gaokao_audio/short_conv_320.wav 111111 Why does the man refuse the woman? A. His car just broke down; B. He’ll use his car; C. She can’t drive. B True +short_conv_321 gaokao_audio/short_conv_321.wav 111111 When will the boy go to bed? A. At 9:40; B. At 9:50; C. At 10:10. C True +short_conv_322 gaokao_audio/short_conv_322.wav 111111 What does the man do? A. An artist; B. A house painter; C. A cleaner. B True +short_conv_323 gaokao_audio/short_conv_323.wav 111111 Where is probably the man’s dog now? A. In the garden; B. In the park; C. In his house. C True +short_conv_324 gaokao_audio/short_conv_324.wav 111111 What do we know about the man now? A. He is a teacher in Beijing; B. He is studying in a middle schoo1; C. He is teaching in Shanghai. C True +short_conv_325 gaokao_audio/short_conv_325.wav 111111 Why was the man late? A. His phone ran out of power; B. His car broke down; C. He lost his way. B True +short_conv_326 gaokao_audio/short_conv_326.wav 111111 When did the man have a tour in Mexico? A. Last month; B. Four days ago; C. Last week. A True +short_conv_327 gaokao_audio/short_conv_327.wav 111111 What did the boy get from the shelf? A. A book; B. A toy; C. A cup. B True +short_conv_328 gaokao_audio/short_conv_328.wav 111111 Why will Jack go to Anna’s office? A. To have a good rest; B. To ask for sick leave; C. To talk with his teacher. B True +short_conv_329 gaokao_audio/short_conv_329.wav 111111 What will Mark do this afternoon? A. Watch TV at home; B. Go to the cinema alone; C. See a movie with Rosa. A True +short_conv_33 gaokao_audio/short_conv_33.wav 111111 What is the most probable relationship between the speakers? A. Couple; B. Neighbors; C. Colleagues. A True +short_conv_330 gaokao_audio/short_conv_330.wav 111111 Where does the conversation most probably take place? A. At a hospital; B. At a hotel; C. At a restaurant. A True +short_conv_331 gaokao_audio/short_conv_331.wav 111111 What does Ted often play now? A. Basketball; B. Table tennis; C. Tennis. B True +short_conv_332 gaokao_audio/short_conv_332.wav 111111 What does the woman like eating? A. Ice cream; B. 
Cake; C. Fruit. C True +short_conv_333 gaokao_audio/short_conv_333.wav 111111 How much more money does the man want from the woman? A. Thirty dollars; B. Twenty dollars; C. Ten dollars. C True +short_conv_334 gaokao_audio/short_conv_334.wav 111111 What does the man think of Bill? A. He’s thoughtful; B. He’s humorous; C. He’s careless. C True +short_conv_335 gaokao_audio/short_conv_335.wav 111111 Why was the man surprised? A. The woman was late; B. The woman arrived early; C. The woman worked overtime tonight. B True +short_conv_336 gaokao_audio/short_conv_336.wav 111111 What did the woman tell the man? A. He could use her extra pen; B. The pencil needed sharpening; C. She didn’t bring the pencil sharpener. B True +short_conv_337 gaokao_audio/short_conv_337.wav 111111 When did the woman learn to draw? A. In university; B. In high school; C. In her childhood. A True +short_conv_338 gaokao_audio/short_conv_338.wav 111111 How does the man find the woman's forgetfulness? A. Annoying; B. Embarrassing; C. Understandable. C True +short_conv_339 gaokao_audio/short_conv_339.wav 111111 What does the woman think of the wine? A. It's a bit expensive; B. It's not her cup of tea; C. It's tasty and cheap. A True +short_conv_34 gaokao_audio/short_conv_34.wav 111111 Where is the woman's father now? A. At home; B. In a hospital; C. At the office. B True +short_conv_340 gaokao_audio/short_conv_340.wav 111111 What does the man suggest? A. Repairing the laptop; B. Buying a new laptop; C. Using the laptop less. B True +short_conv_341 gaokao_audio/short_conv_341.wav 111111 What's the relationship between the speakers? A. Strangers B. Friends C. A couple. C True +short_conv_342 gaokao_audio/short_conv_342.wav 111111 Where does the conversation most probably take place? A. In a library B. At a theater C. In a restaurant C True +short_conv_343 gaokao_audio/short_conv_343.wav 111111 What are the speakers talking about? A. Sports; B. Fashion; C. Magazines. C True +short_conv_344 gaokao_audio/short_conv_344.wav 111111 How does the man respond to the woman? A. He’s doubtful; B. He’s impressed; C. He’s inspired. B True +short_conv_345 gaokao_audio/short_conv_345.wav 111111 What is the man going to do? A. Sell something; B. Leave his company; C. Have a job interview. C True +short_conv_346 gaokao_audio/short_conv_346.wav 111111 When will the party begin? A. In 10 minutes; B. In 15 minutes; C. In 30 minutes. B True +short_conv_347 gaokao_audio/short_conv_347.wav 111111 How will the man pay probably? A. By credit card; B. By check; C. In cash. C True +short_conv_348 gaokao_audio/short_conv_348.wav 111111 What is the probable relationship between the speakers? A. Boss and employee; B. Waiter and customer; C. Co-workers. C True +short_conv_349 gaokao_audio/short_conv_349.wav 111111 Where are the two speakers probably? A. In a hotel; B. In a hospital; C. In a car. B True +short_conv_35 gaokao_audio/short_conv_35.wav 111111 What is the man doing? A. Giving a gift; B. Buying chocolates; C. Making chocolates. A True +short_conv_350 gaokao_audio/short_conv_350.wav 111111 What are the speakers probably doing? A. Watching TV; B. Taking pictures; C. Doing exercise. B True +short_conv_351 gaokao_audio/short_conv_351.wav 111111 What time will the man arrive in London? A. At 8:30; B. At 8:00; C. At 7:30. A True +short_conv_352 gaokao_audio/short_conv_352.wav 111111 What is the purpose of the woman’s call? A. To open a new account; B. To pay the gas bill; C. To ask about a bill. 
C True +short_conv_353 gaokao_audio/short_conv_353.wav 111111 Why is the man late? A. The traffic was heavy; B. There was an accident; C. He took the wrong bus. C True +short_conv_354 gaokao_audio/short_conv_354.wav 111111 What is true about Ellen? A. She likes African art; B. She knows Susan very well; C. She doesn’t know Bob. A True +short_conv_355 gaokao_audio/short_conv_355.wav 111111 What does the man mean? A. He will do a class project; B. He won’t go to the beach; C. He will go to the zoo next time. B True +short_conv_356 gaokao_audio/short_conv_356.wav 111111 How will the man pay? A. By cheque; B. By credit card; C. In cash. B True +short_conv_357 gaokao_audio/short_conv_357.wav 111111 Where does the man want to go? A. To New York; B. To Boston; C. To Chicago. B True +short_conv_358 gaokao_audio/short_conv_358.wav 111111 When does the conversation take place? A. At 4:45; B. At 5:00; C. At 5:15. A True +short_conv_359 gaokao_audio/short_conv_359.wav 111111 What does the man fail to prepare for his wedding? A. The church; B. The transport; C. The wedding dress. C True +short_conv_36 gaokao_audio/short_conv_36.wav 111111 What's the woman's attitude towards the man's idea? A. Un concerned; B. Supportive ; C. Doubtful. C True +short_conv_360 gaokao_audio/short_conv_360.wav 111111 What will the speakers probably do tonight? A. Eat out; B. Go shopping; C. Pick up a friend. A True +short_conv_361 gaokao_audio/short_conv_361.wav 111111 How does the man find Mr. White? A. Strict; B. Patient; C. Responsible. A True +short_conv_362 gaokao_audio/short_conv_362.wav 111111 What does the man probably do? A. A cook; B. A waiter; C. A fisherman. B True +short_conv_363 gaokao_audio/short_conv_363.wav 111111 What does the man mean? A. The woman must examine her teeth; B. The woman will quarrel with somebody soon; C. The woman doesn't need to worry about the dream. C True +short_conv_364 gaokao_audio/short_conv_364.wav 111111 What are the two speakers going to do next? A. Ask John to invite Professor Li; B. Work out details for John's farewell; C. Take part in the farewell party for Professor Li. B True +short_conv_365 gaokao_audio/short_conv_365.wav 111111 What can we know from the dialogue? A. Sarah will stay with her cousin; B. Sarah will serve a room for her aunt; C. Sarah will move into a home-stay family. C True +short_conv_366 gaokao_audio/short_conv_366.wav 111111 When should Trish get to the airport? A. At 3 pm; B. At 6 am; C. At 6 pm. A True +short_conv_367 gaokao_audio/short_conv_367.wav 111111 Why did the man fail the test? A. He didn't work hard; B. He didn't sleep well; C. He got to the test late. B True +short_conv_368 gaokao_audio/short_conv_368.wav 111111 What does the woman mean? A. The candidate is not good at giving speeches; B. The candidate is out of touch with the woman; C. The candidate is not qualified for the job. C True +short_conv_369 gaokao_audio/short_conv_369.wav 111111 What is the man advised the woman to do? A. Pay for photographing for her wedding; B. Save the budget of wedding; C. Avoid taking too many photos for her wedding. A True +short_conv_37 gaokao_audio/short_conv_37.wav 111111 When does the man learn to play the guitar? A. On Thursdays; B. On Wednesdays; C. On Saturdays. B True +short_conv_370 gaokao_audio/short_conv_370.wav 111111 Where does this conversation most probably take place? A. At a drugstore B. At a laundry; C. At a furniture shop. B True +short_conv_371 gaokao_audio/short_conv_371.wav 111111 What does the man mean? A. 
He doesn't know which taste to choose; B. He loses words to describe the taste of the ice cream; C. He enjoys selling ice cream. A True +short_conv_372 gaokao_audio/short_conv_372.wav 111111 What does the woman imply? A. John lied about absence from school; B. John was too ill to receive them at home; C. She didn’t go to school herself. A True +short_conv_373 gaokao_audio/short_conv_373.wav 111111 What does the woman mean? A. Her mother is in an area with poor signal reception; B. She can’t connect her mother through the mobile phone now; C. She has to notify her mother that someone is dead. B True +short_conv_374 gaokao_audio/short_conv_374.wav 111111 What does the man mean? A. Their neighbor broke their light bulb; B. There's something wrong with their light bulb; C. It's black outside the window. B True +short_conv_375 gaokao_audio/short_conv_375.wav 111111 What does the woman mean? A. The new movie was positively reviewed by critics; B. The new movie was successful in sales and reputation; C. The new movie wasn’t welcomed by the critics. C True +short_conv_376 gaokao_audio/short_conv_376.wav 111111 How much will the man pay for a sandwich and a black coffee altogether? A. 6 dollars B. 7 dollars C. 11 dollars A True +short_conv_377 gaokao_audio/short_conv_377.wav 111111 Why can't the man get his car field? A. Because there’s no gas left at the gas station right now; B. Because the gas station is checking and repairing the equipment now; C. Because the quality of the gas in the station is terrible. B True +short_conv_378 gaokao_audio/short_conv_378.wav 111111 Why does the man want to see the manager? A. To put an advertisement; B. To apply for a job; C. To sell him a mobile phone. B True +short_conv_379 gaokao_audio/short_conv_379.wav 111111 Where will the man plant the tree? A. By the front door; B. In the back yard; C. Next to the garage. B True +short_conv_38 gaokao_audio/short_conv_38.wav 111111 What are the speakers talking about in general? A. Travel plans; B. Picnic preparations; C. Barbecue time. B True +short_conv_380 gaokao_audio/short_conv_380.wav 111111 What does the man mean? A. He quite agrees with the woman; B. He enjoys the lecture the whole time; C. He doesn’t agree with the woman. A True +short_conv_381 gaokao_audio/short_conv_381.wav 111111 Who will probably decide the place to go? A. The man; B. The woman; C. Harry. C True +short_conv_382 gaokao_audio/short_conv_382.wav 111111 Where does the conversation take place? A. At an airport; B. At a hotel; C. At a travel agency. A True +short_conv_383 gaokao_audio/short_conv_383.wav 111111 What is the probable relationship between the two speakers? A. Doctor and patient; B. Husband and wife; C. Teacher and student. B True +short_conv_384 gaokao_audio/short_conv_384.wav 111111 Which flight will the man take? A. Flight 201; B. Flight 120; C. Flight 102. C True +short_conv_385 gaokao_audio/short_conv_385.wav 111111 How does the woman find the man’s mother? A. Brave; B. Determined; C. Hard-working. B True +short_conv_386 gaokao_audio/short_conv_386.wav 111111 What was wrong with the woman’s milk probably? A. It went bad; B. It had no smell; C. It tasted salty. A True +short_conv_387 gaokao_audio/short_conv_387.wav 111111 How will the man go to the train station tonight? A. By car; B. By bus; C. On foot. B True +short_conv_388 gaokao_audio/short_conv_388.wav 111111 What is the probable relationship between the two speakers? A. Doctor and patient; B. Husband and wife; C. Teacher and student. 
B True +short_conv_389 gaokao_audio/short_conv_389.wav 111111 How does the woman find the man’s mother? A. Brave; B. Determined; C. Hard-working. B True +short_conv_39 gaokao_audio/short_conv_39.wav 111111 Where does the conversation probably take place? A. In a movie theatre; B. In a science museum; C. At the train station. C True +short_conv_390 gaokao_audio/short_conv_390.wav 111111 Which flight will the man take? A. Flight 201; B. Flight 120; C. Flight 102. C True +short_conv_391 gaokao_audio/short_conv_391.wav 111111 What was wrong with the woman’s milk probably? A. It went bad; B. It had no smell; C. It tasted salty. A True +short_conv_392 gaokao_audio/short_conv_392.wav 111111 How will the man go to the train station tonight? A. By car; B. By bus; C. On foot. B True +short_conv_393 gaokao_audio/short_conv_393.wav 111111 What lessons does the woman want to have? A. Waterskiing; B. Sailing; C. Swimming. A True +short_conv_394 gaokao_audio/short_conv_394.wav 111111 What do you know about the woman? A. She likes her necklace; B. She lost her necklace; C. She made her boyfriend unhappy. B True +short_conv_395 gaokao_audio/short_conv_395.wav 111111 How much time did it take the woman to do the room cleaning? A. About 4 hours; B. About 6 hours; C. About 8 hours. A True +short_conv_396 gaokao_audio/short_conv_396.wav 111111 What soup does the man order? A. Tomato soup; B. Chicken soup; C. Onion soup. B True +short_conv_397 gaokao_audio/short_conv_397.wav 111111 Where does the conversation take place? A. In a restaurant; B. In a shop; C. In a bank. C True +short_conv_398 gaokao_audio/short_conv_398.wav 111111 What does the man fail to prepare for his wedding? A. The church; B. The transport; C. The wedding dress. C True +short_conv_399 gaokao_audio/short_conv_399.wav 111111 When does the conversation take place? A. At 4:45; B. At 5:00; C. At 5:15. A True +short_conv_4 gaokao_audio/short_conv_4.wav 111111 What’s the time now in New York? A. 5 p.m; B. 11 a.m; C. 6 p.m. B True +short_conv_40 gaokao_audio/short_conv_40.wav 111111 How does Julie go to school? A. On foot; B. By bike; C. By bus. A True +short_conv_400 gaokao_audio/short_conv_400.wav 111111 What will the speakers probably do tonight? A. Eat out; B. Go shopping; C. Pick up a friend. A True +short_conv_401 gaokao_audio/short_conv_401.wav 111111 How does the man think of Mr. White? A. Strict; B. Patient; C. Responsible. A True +short_conv_402 gaokao_audio/short_conv_402.wav 111111 What does the man probably do? A. A cook; B. A waiter; C. A fisherman. B True +short_conv_403 gaokao_audio/short_conv_403.wav 111111 Where did Paul plan to go on his way home? A. To the shop; B. To the bank; C. To the office. A True +short_conv_404 gaokao_audio/short_conv_404.wav 111111 Why doesn’t the woman try the fried food? A. She doesn’t like the taste at all; B. She is careful about her weight; C. She thinks it doesn’t have vitamins. B True +short_conv_405 gaokao_audio/short_conv_405.wav 111111 What is the man’s opinion about high-speed rail? A. Comfortable but expensive; B. Convenient and relaxing; C. Fast but not enjoyable. C True +short_conv_406 gaokao_audio/short_conv_406.wav 111111 What are the speakers? A. Newspaper reporters; B. Students; C. Teacher and student. B True +short_conv_407 gaokao_audio/short_conv_407.wav 111111 When will the man be free? A. On Tuesday afternoon; B. On Wednesday morning; C. On Wednesday afternoon. C True +short_conv_408 gaokao_audio/short_conv_408.wav 111111 What are the speakers talking about? A. Weather; B. 
Clothes; C. News. A True +short_conv_409 gaokao_audio/short_conv_409.wav 111111 What does the man think of the book? A. Quite difficult; B. Very interesting; C. Too simple. B True +short_conv_41 gaokao_audio/short_conv_41.wav 111111 What can we learn from the conversation? A. The man’s room is very clean; B. The woman wants to clean the room; C. The room hasn’t been cleaned for a long time. C True +short_conv_410 gaokao_audio/short_conv_410.wav 111111 Who might Mr. Peterson be? A. A new professor; B. A department head; C. A company director. C True +short_conv_411 gaokao_audio/short_conv_411.wav 111111 What will the man do for the woman? A. Repair her car; B. Give her a ride; C. Pick up her aunt. B True +short_conv_412 gaokao_audio/short_conv_412.wav 111111 What does the woman want to do? A. Find a place; B. Buy a map; C. Get an address. A True +short_conv_413 gaokao_audio/short_conv_413.wav 111111 When does the woman probably go to the gym? A. On Mondays; B. On Fridays; C. On Saturdays. C True +short_conv_414 gaokao_audio/short_conv_414.wav 111111 Why did the woman catch a cold according to the man? A. She wore too little clothing; B. She had a cold bath; C. She slept in a cold room. A True +short_conv_415 gaokao_audio/short_conv_415.wav 111111 Where does the conversation probably take place? A. In a library; B. In a restaurant; C. In a drugstore. B True +short_conv_416 gaokao_audio/short_conv_416.wav 111111 How will the speakers probably deal with the TV first? A. They will replace it with a new one; B. They will have it mended; C. They will sell it. C True +short_conv_417 gaokao_audio/short_conv_417.wav 111111 What will the woman carry? A. Bottles; B. Bags; C. Boxes. B True +short_conv_418 gaokao_audio/short_conv_418.wav 111111 Where are the speakers? A. At a bag store; B. In a restaurant; C. At a hotel. C True +short_conv_419 gaokao_audio/short_conv_419.wav 111111 What was the woman probably trying to get? A. A ticket for a movie; B. A part in a play; C. A job as a model. B True +short_conv_42 gaokao_audio/short_conv_42.wav 111111 What can we know about the man? A. He always studies hard; B. He doesn’t think he was wrong; C. He regrets that he didn’t study hard. C True +short_conv_420 gaokao_audio/short_conv_420.wav 111111 Why was Alicia late this time? A. She missed the bus; B. Her grandma was sick; C. The bus was in an accident. C True +short_conv_421 gaokao_audio/short_conv_421.wav 111111 How does the woman feel about the shoes? A. They’re a bit small; B. They’re too expensive; C. She doesn’t like the color. C True +short_conv_422 gaokao_audio/short_conv_422.wav 111111 What is the relationship between the two speakers? A. Husband and wife; B. Teacher and students; C. Doctor and patient. A True +short_conv_423 gaokao_audio/short_conv_423.wav 111111 Where are the speakers? A. In a coffee shop; B. At the workplace; C. At home. A True +short_conv_424 gaokao_audio/short_conv_424.wav 111111 How much will the man finally pay for the T-shirt? A. 6 dollars; B. 10 dollars; C. 12 dollars. B True +short_conv_425 gaokao_audio/short_conv_425.wav 111111 What will the speakers do? A. Keep waiting; B. Go back home; C. Change the restaurant. C True +short_conv_426 gaokao_audio/short_conv_426.wav 111111 What will the man do? A. Return Jimmy’s dictionary; B. Go and look for Jimmy; C. Give Jimmy a phone call. A True +short_conv_427 gaokao_audio/short_conv_427.wav 111111 When did the speakers graduate from the school? A. 8 years ago; B. 10 years ago; C. 20 years ago. 
C True +short_conv_428 gaokao_audio/short_conv_428.wav 111111 Why didn't the man apply for the job? A. He can't start the job on time; B. He is occupied in May C. He does not like it A True +short_conv_429 gaokao_audio/short_conv_429.wav 111111 What does the man advise the woman to do? A. Go to Tibet with a professor; B. Consult a local travel agent; C. Make the arrangements herself. B True +short_conv_43 gaokao_audio/short_conv_43.wav 111111 What is the man going to do this weekend? A. Go to the picnic; B. Go to the company; C. Work in his garden. A True +short_conv_430 gaokao_audio/short_conv_430.wav 111111 What does the woman mean? A. Mason looked nice with a beard B. Mason couldn’t recognize himself C. Mason changed a lot C True +short_conv_431 gaokao_audio/short_conv_431.wav 111111 What does the woman probably do next? A. Buy an umbrella B. Cancel the picnic; C. Write a weather report B True +short_conv_432 gaokao_audio/short_conv_432.wav 111111 What will the speakers probably eat for lunch? A. Noodles B. Sandwiches; C. Pizzas A True +short_conv_433 gaokao_audio/short_conv_433.wav 111111 What does the woman think of Oliver? A. He is helpful; B. He is selfish; C. He is well-prepared. B True +short_conv_434 gaokao_audio/short_conv_434.wav 111111 Why does Mary call the man? A. To reschedule the appointment; B. To cancel the appointment; C. To confirm an appointment. C True +short_conv_435 gaokao_audio/short_conv_435.wav 111111 What are the speakers mainly talking about? A. Weather; B. Games; C. Ways to relax. A True +short_conv_436 gaokao_audio/short_conv_436.wav 111111 Where is the man now? A. At a bookstore; B. At a supermarket; C. At a restaurant. B True +short_conv_437 gaokao_audio/short_conv_437.wav 111111 Why was the woman late? A. She got up late; B. She forgot her class; C. She wanted to sleep more. A True +short_conv_438 gaokao_audio/short_conv_438.wav 111111 Where does the conversation most probably take place? A. In a library; B. In a bookstore; C. In a classroom. A True +short_conv_439 gaokao_audio/short_conv_439.wav 111111 What does the man think is wrong with the plant? A. It needs watering at present; B. It is not getting enough sunshine; C. It should be moved into a large pot. B True +short_conv_44 gaokao_audio/short_conv_44.wav 111111 How many students are there in the class? A. 46; B. 52; C. 40. B True +short_conv_440 gaokao_audio/short_conv_440.wav 111111 What is the man looking for? A. A file; B. A letter; C. A notebook. A True +short_conv_441 gaokao_audio/short_conv_441.wav 111111 When does the science class begin? A. At 8:50; B. At 10:55; C. At 11:45. B True +short_conv_442 gaokao_audio/short_conv_442.wav 111111 What day is it today? A. It's Monday; B. It's Saturday; C. It's Sunday. C True +short_conv_443 gaokao_audio/short_conv_443.wav 111111 What can we learn from the conversation? A. The man likes all kinds of music; B. The woman likes all kinds of music; C. The man isn’t interested in rock music. C True +short_conv_444 gaokao_audio/short_conv_444.wav 111111 What does the woman imply? A. They haven’t enough money; B. She likes her old house; C. They never thought of moving. A True +short_conv_445 gaokao_audio/short_conv_445.wav 111111 What can we learn from this conversation? A. People couldn’t bear the heat; B. The traffic condition has improved; C. The road here is being repaired. A True +short_conv_446 gaokao_audio/short_conv_446.wav 111111 What does the men incline? A. She came earlier; B. He has cleaned the house; C. He needn’t clean the house. 
A True +short_conv_447 gaokao_audio/short_conv_447.wav 111111 What can we learn from the conversation? A. John closes the door; B. Linda walks to the ATM; C. John may need some cash. C True +short_conv_448 gaokao_audio/short_conv_448.wav 111111 What does the man imply? A. He had a holiday with his family long ago; B. He wants to have a long holiday with his family; C. He wasn’t satisfied with his holiday. A True +short_conv_449 gaokao_audio/short_conv_449.wav 111111 What can we learn from the conversation? A. John has many new ideas in the paper; B. Mary isn’t satisfied with John’s paper; C. Mary should have polished the paper. A True +short_conv_45 gaokao_audio/short_conv_45.wav 111111 Where does the conversation probably take place? A. In the street; B. In the cinema; C. At the drugstore. A True +short_conv_450 gaokao_audio/short_conv_450.wav 111111 What is the men imply? A. The team performs well; B. He knows little about the team; C. The team is playing worse. C True +short_conv_451 gaokao_audio/short_conv_451.wav 111111 What's the woman's probable job? A. Experimenter; B. Shop assistant; C. Makeup artist. B True +short_conv_452 gaokao_audio/short_conv_452.wav 111111 What are the speakers talking about? A. A roommate; B. A new game; C. A new watch. B True +short_conv_453 gaokao_audio/short_conv_453.wav 111111 Where is the woman’s grandma now? A. At home; B. In a hospital; C. In a hotel. B True +short_conv_454 gaokao_audio/short_conv_454.wav 111111 What is the man? A. A secretary; B. A teacher; C. A doctor. C True +short_conv_455 gaokao_audio/short_conv_455.wav 111111 How much does the woman pay for the tickets? A. £9; B. £10; C. £11. A True +short_conv_456 gaokao_audio/short_conv_456.wav 111111 What was the weather like on John’s holiday? A. Sunny; B. Rainy; C. Cold. C True +short_conv_457 gaokao_audio/short_conv_457.wav 111111 How will the speakers probably go home? A. By taxi B. By bus; C. By subway. A True +short_conv_458 gaokao_audio/short_conv_458.wav 111111 What can we learn from the conversation? A. The train will arrive soon; B. The train is late due to the storm; C. The woman has to wait for the train. A True +short_conv_459 gaokao_audio/short_conv_459.wav 111111 What does the woman mean? A. She will choose the man; B. The man was late in asking; C. She may run for the position B True +short_conv_46 gaokao_audio/short_conv_46.wav 111111 Why does the woman work at The Indians now? A. To save up for a computer; B. To become a manager there; C. To learn about food. A True +short_conv_460 gaokao_audio/short_conv_460.wav 111111 What does the woman care most about her cell phone? A. Its design; B. Its special functions; C. Its practical use. C True +short_conv_461 gaokao_audio/short_conv_461.wav 111111 How often will the woman’s daughter take dance lesson next month? A. Three times a week; B. Twice a week; C. Once every week. A True +short_conv_462 gaokao_audio/short_conv_462.wav 111111 Where are the man and the woman? A. At a flower shop; B. At a restaurant; C. At a concert. B True +short_conv_463 gaokao_audio/short_conv_463.wav 111111 What do we know about the man’s apartment? A. It is not quiet enough; B. It is near the train station; C. It has a good view of the park. A True +short_conv_464 gaokao_audio/short_conv_464.wav 111111 Where did the man go yesterday? A. The hotel; B. The office; C. The airport. B True +short_conv_465 gaokao_audio/short_conv_465.wav 111111 What is the woman’s red jacket best for? A. The rainy days; B. The windy days; C. The warm days. 
C True +short_conv_466 gaokao_audio/short_conv_466.wav 111111 What would the woman probably order with A. White wine; B. Red wine; C. Beer. A True +short_conv_467 gaokao_audio/short_conv_467.wav 111111 What will the woman probably write her name with? A. A pencil; B. Her finger; C. An electronic pen. B True +short_conv_468 gaokao_audio/short_conv_468.wav 111111 What band did the woman see? A. The one with a pianist; B. The one with a dancer; C. The one with three guitarists. C True +short_conv_469 gaokao_audio/short_conv_469.wav 111111 What are the speakers mainly talking about? A. A job position; B. A sales engineer; C. An electrical company. A True +short_conv_47 gaokao_audio/short_conv_47.wav 111111 When will the Browns come? A. At 5:30; B. At 6:00; C. At 6:30. C True +short_conv_470 gaokao_audio/short_conv_470.wav 111111 Where are the speakers? A. On a bus; B. At a bus stop; C. In the man's home. A True +short_conv_471 gaokao_audio/short_conv_471.wav 111111 When is Kim's birthday party? A. On July 16th B. On July 17th C. On July30th B True +short_conv_472 gaokao_audio/short_conv_472.wav 111111 What do we know about the woman? A. She is not hungry; B. She will eat the bread; C. She doesn't like bread. B True +short_conv_473 gaokao_audio/short_conv_473.wav 111111 What do we learn from the conversation? A. Both of the speakers enjoyed the film; B. An exciting film will be on next week; C. The woman was interested in exploring space. A True +short_conv_474 gaokao_audio/short_conv_474.wav 111111 What does the woman imply? A. She’s never been to the city; B. She knows the city very well; C. She doesn’t remember much about the city. C True +short_conv_475 gaokao_audio/short_conv_475.wav 111111 What does the man mean? A. He was happy about the woman’s absence; B. He suggested the woman bring her daughter; C. He suggested the woman visit the university. B True +short_conv_476 gaokao_audio/short_conv_476.wav 111111 What does the woman imply? A. She prefers going to the dentist later in the day; B. The man will be back before his first class; C. The man might sleep late and miss his appointment. B True +short_conv_477 gaokao_audio/short_conv_477.wav 111111 Why can't the woman for this email at the moment? A. The Internet doesn’t work; B. She doesn’t have time to do it; C. The email hasn’t been ready. A True +short_conv_478 gaokao_audio/short_conv_478.wav 111111 What does the woman mean? A. She is very busy; B. She has an invitation already; C. She questions the man’s purpose. A True +short_conv_479 gaokao_audio/short_conv_479.wav 111111 What are they talking about? A. Sales strategies; B. A job opportunity; C. Tour news. B True +short_conv_48 gaokao_audio/short_conv_48.wav 111111 What does the woman think Henry’s parents should do? A. Take good care of him; B. Go camping with Henry; C. Let Henry go camping. C True +short_conv_480 gaokao_audio/short_conv_480.wav 111111 How much in our will it cost the man to send the package? A. $ 1.5; B. $ 3; C. $ 3.5. C True +short_conv_481 gaokao_audio/short_conv_481.wav 111111 What is the probable relationship between the two speakers? A. Customer and shop assistant; B. Customer and travel agent; C. Sailor and tourist. A True +short_conv_482 gaokao_audio/short_conv_482.wav 111111 Where does this conversation most probably take place? A. In a car; B. In a plane; C. On a farm. B True +short_conv_483 gaokao_audio/short_conv_483.wav 111111 What will the man most probably do tomorrow? A. Go to the party; B. Spend time with Linda; C. 
Celebrate his 22nd birthday* B True +short_conv_484 gaokao_audio/short_conv_484.wav 111111 Wliat does the woman mean? A. She wants a more difficult job; B. She is tired of her present job; C. Her job is too difficult for her. A True +short_conv_485 gaokao_audio/short_conv_485.wav 111111 What are the speakers mainly talking about? A. book; B. A film; C. An accident. C True +short_conv_486 gaokao_audio/short_conv_486.wav 111111 Why doesn’t the man wear his yellow shirt? A. It’s missing; B. He doesn’t like it; C. Two buttons are off it. C True +short_conv_487 gaokao_audio/short_conv_487.wav 111111 What does the woman like to eat? A. Fish; B. Beef; C. Chicken. B True +short_conv_488 gaokao_audio/short_conv_488.wav 111111 What does the boy probably want from the woman? A. Thirty more dollars; B. Twenty more dollars; C. Ten more dollars. C True +short_conv_489 gaokao_audio/short_conv_489.wav 111111 What does the man think of Bill? A. He’s thoughtful; B. He’s humorous; C. He’s careless. C True +short_conv_49 gaokao_audio/short_conv_49.wav 111111 What does the woman want the man to do? A. To get her something; B. To go for a walk with her; C. To eat cookies with her. A True +short_conv_490 gaokao_audio/short_conv_490.wav 111111 Why might the man be surprised? A. The woman was late; B. The woman arrived early; C. The woman worked overtime tonight. B True +short_conv_491 gaokao_audio/short_conv_491.wav 111111 What did the woman tell the man? A. The pencil wasn’t sharp; B. He could use her extra pen; C. She didn’t bring the pencil sharpener. A True +short_conv_492 gaokao_audio/short_conv_492.wav 111111 When did the woman learn to draw? A. In the university; B. In high school; C. In the childhood. A True +short_conv_493 gaokao_audio/short_conv_493.wav 111111 How does the woman feel about Linda and Rob's business? A. Confident; B. Discouraged; C. Worried. A True +short_conv_494 gaokao_audio/short_conv_494.wav 111111 What are the two speakers talking about? A. Where Joyce comes from; B. What Joyce ’ s hometown is like; C. Why Joyce ’s hometown is boring. B True +short_conv_495 gaokao_audio/short_conv_495.wav 111111 How much tax should the man pay per night? A. $5; B. $10; C. $15. B True +short_conv_496 gaokao_audio/short_conv_496.wav 111111 What does the woman want to do? A. Do some shopping; B. Go to the post office; C. Get her watch repaired. C True +short_conv_497 gaokao_audio/short_conv_497.wav 111111 Why does the woman telephone the man? A. To borrow his camera; B. To ask him to meet her parents; C. To invite him to her new apartment. A True +short_conv_498 gaokao_audio/short_conv_498.wav 111111 Where does the conversation probably take place? A. In a store; B. On a plane; C. In an office. B True +short_conv_499 gaokao_audio/short_conv_499.wav 111111 What was the woman dissatisfied with about the movie? A. The special effects; B. The acting; C. The length. C True +short_conv_5 gaokao_audio/short_conv_5.wav 111111 Why doesn’t the woman learn drawing? A. She’s poor at drawing; B. She’s too lazy; C. She lacks time. C True +short_conv_50 gaokao_audio/short_conv_50.wav 111111 What will the man do right now? A. Go straight to the airport; B. Spend the night in Paris; C. Pack some clothes. C True +short_conv_500 gaokao_audio/short_conv_500.wav 111111 How do the speakers feel about today’s paper? A. Shocked; B. Amused; C. Uninterested. C True +short_conv_501 gaokao_audio/short_conv_501.wav 111111 What discount will the speakers get? A. 30%; B. 50%; C. 70%. 
B True +short_conv_502 gaokao_audio/short_conv_502.wav 111111 What does the man expect the woman to do? A. Study in university; B. Travel to Singapore; C. Write a reference for her. C True +short_conv_503 gaokao_audio/short_conv_503.wav 111111 What is the relationship between the two speakers? A. Teacher and student; B. Husband and wife; C. Mother and son. C True +short_conv_504 gaokao_audio/short_conv_504.wav 111111 What are the speakers mainly talking about? A. Preparing for a test; B. Eating during an exam; C. Getting a medical exam. B True +short_conv_505 gaokao_audio/short_conv_505.wav 111111 When will the woman go for a holiday? A. After she gets a new job; B. When her training is over; C. She herself even doesn’t know. C True +short_conv_506 gaokao_audio/short_conv_506.wav 111111 Why does the man refuse the woman? A. He doesn’t have a car; B. He’ll be using his car; C. She is a bad driver. B True +short_conv_507 gaokao_audio/short_conv_507.wav 111111 What time is it now? A. 5:00; B. 4:45; C. 5:15. B True +short_conv_508 gaokao_audio/short_conv_508.wav 111111 Where does this conversation take place? A. In a food store; B. In a restaurant; C. At a vegetable market. B True +short_conv_509 gaokao_audio/short_conv_509.wav 111111 When is the concert going to start? A. At 7:45; B. At 7:30; C. At 7:15. A True +short_conv_51 gaokao_audio/short_conv_51.wav 111111 Where will the man stay next? A. In the garden; B. In the bathroom; C. In the living room. C True +short_conv_510 gaokao_audio/short_conv_510.wav 111111 Why can’t men do better in a computer company than women? A. They are not as careful as women; B. They are too strong; C. Their hands are too big. C True +short_conv_511 gaokao_audio/short_conv_511.wav 111111 What does the man think of the car? A. Cheap; B. Old; C. Nice. A True +short_conv_512 gaokao_audio/short_conv_512.wav 111111 Which skirt will the man buy? A. The green one; B. The brown one; C. The red one. B True +short_conv_513 gaokao_audio/short_conv_513.wav 111111 What are the speakers talking about? A. A new TV set; B. A TV program; C. A radio program. B True +short_conv_514 gaokao_audio/short_conv_514.wav 111111 What will the woman work as? A. An assistant B. A lawyer; C. A teacher. A True +short_conv_515 gaokao_audio/short_conv_515.wav 111111 What is the woman going to do? A. Play baseball; B. Watch a game; C. Do her work. B True +short_conv_516 gaokao_audio/short_conv_516.wav 111111 What will the speakers take to the picnic? A. Some drinks; B. Some fruit; C. Some desserts C True +short_conv_517 gaokao_audio/short_conv_517.wav 111111 What did the man like about the movie? A. The acting; B. The music; C. The scenery. A True +short_conv_518 gaokao_audio/short_conv_518.wav 111111 Why did the woman apologize to the man? A. She lost his cell-phone; B. She made up a lie; C. She said bad words about his parents B True +short_conv_519 gaokao_audio/short_conv_519.wav 111111 How will the woman go to her date? A. By car; B. By bus; C. By underground. C True +short_conv_52 gaokao_audio/short_conv_52.wav 111111 What does the man think of Bill? A. He’s funny; B. He causes problems; C. He shouldn’t be fired. B True +short_conv_520 gaokao_audio/short_conv_520.wav 111111 Where does the conversation probable take place? A. In a classroom; B. In a library; C. In a bookshop. C True +short_conv_521 gaokao_audio/short_conv_521.wav 111111 How much should the woman pay? A. $8; B. $10; C. $12. A True +short_conv_522 gaokao_audio/short_conv_522.wav 111111 What is the weather probably like now? A. 
Dry; B. Windy; C. Rainy. A True +short_conv_523 gaokao_audio/short_conv_523.wav 111111 What will the man do tonight? A. Work on his report; B. Go dancing with Jenny; C. Help Jenny with her history. A True +short_conv_524 gaokao_audio/short_conv_524.wav 111111 What does the man imply? A. Jack didn’t find the record; B. Jack didn’t go to the party; C. Jack borrowed the record from him. A True +short_conv_525 gaokao_audio/short_conv_525.wav 111111 What are the speakers mainly talking about? A. A movie; B. A swimming pool; C. A plan. C True +short_conv_526 gaokao_audio/short_conv_526.wav 111111 How will the speakers go to the Sports Complex? A. By subway; B. By bus; C. By taxi. C True +short_conv_527 gaokao_audio/short_conv_527.wav 111111 How many people are added to the lunch reservation? A. Six; B. Four; C. Two. A True +short_conv_528 gaokao_audio/short_conv_528.wav 111111 What are the speakers mainly talking about? A. A book; B. A film; C. A writer. B True +short_conv_529 gaokao_audio/short_conv_529.wav 111111 What did the man ask Justin to do? A. Borrow some magazines for him; B. Bring some magazines to him; C. Refer to some magazines to finish his design. B True +short_conv_53 gaokao_audio/short_conv_53.wav 111111 What is the man going to do now? A. Go home; B. Go to the store; C. Go to the hospital. A True +short_conv_530 gaokao_audio/short_conv_530.wav 111111 Where did the woman stay while she was in Alaska? A. She stayed in the local's house; B. She stayed in a hotel with her friends; C. She camped near the mountains. C True +short_conv_531 gaokao_audio/short_conv_531.wav 111111 Why does the girl want to buy a clock? A. She has trouble waking up; B. She wants to buy someone a gift; C. Her watch is broken. A True +short_conv_532 gaokao_audio/short_conv_532.wav 111111 What is the man going to do tonight? A. To a birthday party; B. To visit Nancy; C. To the airport. C True +short_conv_533 gaokao_audio/short_conv_533.wav 111111 How does this woman think of her interview? A. It was tough B. It was interesting C. It was successful C True +short_conv_534 gaokao_audio/short_conv_534.wav 111111 What are the speakers talking about? A. A restaurant B. A street C. A dish A True +short_conv_535 gaokao_audio/short_conv_535.wav 111111 Where does the conversation probably take place? A. In a bank B. At a ticket office C. On the train B True +short_conv_536 gaokao_audio/short_conv_536.wav 111111 What is the probable relationship between the speakers? A. Colleagues B. Brother and sister C. Teacher and student A True +short_conv_537 gaokao_audio/short_conv_537.wav 111111 What does John find difficult in learning German? A. Pronunciation B. Vocabulary C. Grammar C True +short_conv_538 gaokao_audio/short_conv_538.wav 111111 What will the speakers probably do next? A. Wait for the cat; B. Feed the cat; C. Call the cat. A True +short_conv_539 gaokao_audio/short_conv_539.wav 111111 Where are the speakers? A. In a classroom; B. In a library; C. In a bookstore. B True +short_conv_54 gaokao_audio/short_conv_54.wav 111111 What was the woman doing just now? A. Taking an exam; B. Talking to her professor; C. Giving money to the homeless. B True +short_conv_540 gaokao_audio/short_conv_540.wav 111111 What do the speakers think of Linda’s brother? A. He is quiet B. He is friendly; C. He is unpleasant. C True +short_conv_541 gaokao_audio/short_conv_541.wav 111111 What is Jack’s position? A. A manager; B. A cleaner; C. A salesperson. 
A True +short_conv_542 gaokao_audio/short_conv_542.wav 111111 What did the woman dislike when she was young? A. Chocolate; B. Vegetables; C. Cookies. B True +short_conv_543 gaokao_audio/short_conv_543.wav 111111 How does the woman probably feel now? A. Excited ; B. Tired ; C. Sad. B True +short_conv_544 gaokao_audio/short_conv_544.wav 111111 What is the man going to do next Saturday? A. Attend a party; B. Stay at home ; C. Visit his grandparents. B True +short_conv_545 gaokao_audio/short_conv_545.wav 111111 When is the woman’s school usually over? A. At 5:30 pm; B. At 6:00 pm ; C. At 6:30 pm. A True +short_conv_546 gaokao_audio/short_conv_546.wav 111111 How will the woman go to her piano lesson? A. On foot ; B. By bike ; C. By car . A True +short_conv_547 gaokao_audio/short_conv_547.wav 111111 Who does the man want to talk to ? A. Tammy; B. Dr.Maxwell; C. Emmy Simpson. C True +short_conv_548 gaokao_audio/short_conv_548.wav 111111 Where does the conversation probably take place? A. On a farm; B. At a fruit market; C. At customs(海关). C True +short_conv_549 gaokao_audio/short_conv_549.wav 111111 When will the man most likely get home? A. At 7:00; B. At about 7:30; C. After 8:00. B True +short_conv_55 gaokao_audio/short_conv_55.wav 111111 Where did Mike meet up with Sam? A. At a gym; B. At a restaurant; C. At a movie theater. A True +short_conv_550 gaokao_audio/short_conv_550.wav 111111 What will the man do next? A. Fill out another form; B. Correct his mistake on the form; C. Tell the woman his medical history. A True +short_conv_551 gaokao_audio/short_conv_551.wav 111111 Where is the woman going next? A. To a snack bar; B. To a movie theater; C. To her friend Simon’s house. A True +short_conv_552 gaokao_audio/short_conv_552.wav 111111 What is the man looking for? A. A book; B. His iPhone; C. A pay phone. C True +short_conv_553 gaokao_audio/short_conv_553.wav 111111 What sport does the woman like best? A. Basketball B. Volleyball C. Tennis C True +short_conv_554 gaokao_audio/short_conv_554.wav 111111 What are the speakers talking about? A. A professor; B. A report; C. An animal. B True +short_conv_555 gaokao_audio/short_conv_555.wav 111111 What will the speakers probably do tomorrow? A. Clean the garage; B. Tidy the yard; C. Do some shopping. A True +short_conv_556 gaokao_audio/short_conv_556.wav 111111 Who is using Tom’s notes now? A. Linda; B. Paul; C. Ivan. B True +short_conv_557 gaokao_audio/short_conv_557.wav 111111 For which subject does the woman feel fully prepared? A. English; B. Math; C. Physics. A True +short_conv_558 gaokao_audio/short_conv_558.wav 111111 Who will begin the lecture now? A. Prof. Brookings; B. Dr. Mildens; C. Dr. White. A True +short_conv_559 gaokao_audio/short_conv_559.wav 111111 What did the woman do for Mary last night? A. She fixed Mary’s car; B. She gave Mary a phone call; C. She let Mary sleep in her house. C True +short_conv_56 gaokao_audio/short_conv_56.wav 111111 Where is Mike now? A. At home; B. In the school office; C. In the park. C True +short_conv_560 gaokao_audio/short_conv_560.wav 111111 Where do the speakers plan to go? A. The theater; B. Their mom’s office; C. Their grandma’s house. C True +short_conv_561 gaokao_audio/short_conv_561.wav 111111 What are the speakers talking about? A. A birthday celebration; B. A fancy restaurant; C. A holiday plan. A True +short_conv_562 gaokao_audio/short_conv_562.wav 111111 What does the man suggest the woman do? A. Buy a new dress; B. Exchange the dress; C. Get the dress tailored. 
C True +short_conv_563 gaokao_audio/short_conv_563.wav 111111 What are the speakers mainly talking about? A. Young people lose their jobs easily; B. Young people seldom stay long in the same job; C. Young people are too quick in making decisions. B True +short_conv_564 gaokao_audio/short_conv_564.wav 111111 How does the woman feel about the zoo? A. Sad; B. Disappointed; C. Impressed. C True +short_conv_565 gaokao_audio/short_conv_565.wav 111111 What is the woman doing now? A. Making a list; B. Baking cookies; C. Shopping for groceries. A True +short_conv_566 gaokao_audio/short_conv_566.wav 111111 What happened to the woman? A. She went to sleep late; B. She got to work late; C. She woke up late. C True +short_conv_567 gaokao_audio/short_conv_567.wav 111111 Where does the man want to go? A. A railway station; B. The seaside; C. A post office. B True +short_conv_568 gaokao_audio/short_conv_568.wav 111111 What is the man they imply? A. Many students find Professor Brown’s lecture uninteresting; B. Few students understand Professor Brown’s lecture; C. Many students have dropped Professor Brown’s class. A True +short_conv_569 gaokao_audio/short_conv_569.wav 111111 What can we learn from the conversation about the woman? A. She isn’t popular with the colleagues in the sales department; B. She enjoyed working in the sales department; C. She doesn’t like her new position very much. B True +short_conv_57 gaokao_audio/short_conv_57.wav 111111 What does the man ask the woman to help with? A. His English; B. His math; C. His science. B True +short_conv_570 gaokao_audio/short_conv_570.wav 111111 what is the man probably mean? A. He is concerned about the woman’s safety; B. There is something wrong with the car; C. The woman must fasten the seat belt. C True +short_conv_571 gaokao_audio/short_conv_571.wav 111111 What can we learn from the conversation? A. The weather will not affect their plan; B. They will not do as planned in case of rain; C. They will postpone their programme if it rains. B True +short_conv_572 gaokao_audio/short_conv_572.wav 111111 What does the woman imply? A. She’s unable to finish her homework; B. She has to remove the virus; C. She’s infected with some disease. A True +short_conv_573 gaokao_audio/short_conv_573.wav 111111 What can be most probably inferred about the man? A. He didn’t get the type of room he wanted; B. He expected the room to be more expensive; C. He thought he had already made a reservation. A True +short_conv_574 gaokao_audio/short_conv_574.wav 111111 What are they mainly talking about? A. The man’s hobby; B. The man’s interview; C. The man’s job. A True +short_conv_575 gaokao_audio/short_conv_575.wav 111111 What is the men think of the movie? A. Interesting; B. Successful; C. Boring. C True +short_conv_576 gaokao_audio/short_conv_576.wav 111111 How did the men plan to go to the shopping mall at first? A. By car; B. By bus; C. On foot. C True +short_conv_577 gaokao_audio/short_conv_577.wav 111111 Where are the speakers? A. At a bag store; B. In a restaurant; C. At a hotel. C True +short_conv_578 gaokao_audio/short_conv_578.wav 111111 Why was Alicia late this time? A. She missed the bus; B. Her grandma was sick; C. The bus was in an accident. C True +short_conv_579 gaokao_audio/short_conv_579.wav 111111 What was the woman probably trying to get? A. A ticket for a movie; B. A part in a play; C. A job as a model. B True +short_conv_58 gaokao_audio/short_conv_58.wav 111111 When will the man check the computer? A. In five minutes; B. In two days; C. In two weeks. 
A True +short_conv_580 gaokao_audio/short_conv_580.wav 111111 How does the woman feel about the shoes? A. They’re a bit small; B. They’re too expensive; C. She doesn’t like the color. C True +short_conv_581 gaokao_audio/short_conv_581.wav 111111 What is the relationship between the two speakers? A. Husband and wife; B. Teacher and students; C. Doctor and patient. A True +short_conv_582 gaokao_audio/short_conv_582.wav 111111 What can we learn from this conversation? A. The man has already downloaded some sales data; B. They all make preparations for the meeting; C. The woman asks for high quality service. B True +short_conv_583 gaokao_audio/short_conv_583.wav 111111 What will the woman probably do? A. Lock the computer lab later; B. Buy a new lock for the computer lab; C. Show the man where the lab is. A True +short_conv_584 gaokao_audio/short_conv_584.wav 111111 What can we learn from this conversation? A. The woman is a tour guide; B. The tour guide was born in New York; C. The man is a British native speaker. B True +short_conv_585 gaokao_audio/short_conv_585.wav 111111 Where does the girl want to go? A. The History Museum; B. The Art Museum; C. The Space Museum. A True +short_conv_586 gaokao_audio/short_conv_586.wav 111111 What's the relationship between Cindy and Ron? A. Mother and son; B. Wife and husband; C. Waitress and customer. B True +short_conv_587 gaokao_audio/short_conv_587.wav 111111 Where most probably are the two speakers? A. At an airport; B. At a city Hall; C. At a railway station. C True +short_conv_588 gaokao_audio/short_conv_588.wav 111111 How does the woman suggest the man go to the city? A. By foot; B. By bus; C. By taxi. C True +short_conv_589 gaokao_audio/short_conv_589.wav 111111 What do we know about the speakers? A. They live together; B. They both like Star Wars; C. They are talking in the man’s room. C True +short_conv_59 gaokao_audio/short_conv_59.wav 111111 What can we know about the man? A. He has got the job; B. He wears long hair; C. He has just had his hair cut. B True +short_conv_590 gaokao_audio/short_conv_590.wav 111111 How might the man’s action appear to others in the U.S.? A. Very rude; B. Quite normal; C. A little old-fashioned. B True +short_conv_591 gaokao_audio/short_conv_591.wav 111111 At what time does the office open? A. 7:45; B. 8:00; C. 8:15. B True +short_conv_592 gaokao_audio/short_conv_592.wav 111111 What did the man do during his vacation? A. He stayed at home; B. He had a part-time job; C. He took some courses. C True +short_conv_593 gaokao_audio/short_conv_593.wav 111111 What does the man work as now? A. An official; B. A lawyer; C. A sales manager. C True +short_conv_594 gaokao_audio/short_conv_594.wav 111111 What does the woman ask the man to do? A. Start a fire; B. Look out of the window; C. Put his cigarette in the ashtray. C True +short_conv_595 gaokao_audio/short_conv_595.wav 111111 What does the man need now? A. Ice cream; B. Milk; C. Water. C True +short_conv_596 gaokao_audio/short_conv_596.wav 111111 What did the man do last weekend? A. He played basketball; B. He watched a game; C. He took a trip. C True +short_conv_597 gaokao_audio/short_conv_597.wav 111111 Where is probably Sue now? A. At home; B. At Bill’s home; C. At the office. C True +short_conv_598 gaokao_audio/short_conv_598.wav 111111 What does the woman want to do? A. Quit smoking; B. Change a seat; C. Buy a cake. B True +short_conv_599 gaokao_audio/short_conv_599.wav 111111 What can be inferred from the man? A. Top concern should be given to safety; B. 
Bicycles are not always extremely cheap; C. The price can not decide the quality of bicycles. A True +short_conv_6 gaokao_audio/short_conv_6.wav 111111 Where does the conversation probably take place? A. In a bookstore; B. In an office; C. In a storehouse. C True +short_conv_60 gaokao_audio/short_conv_60.wav 111111 Which kind of music does the man like best? A. Pop music; B. Classical music; C. Folk music. C True +short_conv_600 gaokao_audio/short_conv_600.wav 111111 What will the woman most probably do? A. Buy a new printer with less noise; B. Read a book on how to fix the printer; C. Get a repairman to check the printer. C True +short_conv_601 gaokao_audio/short_conv_601.wav 111111 What does the woman imply? A. John's speech has something to be desired; B. John has a talent for delivering public speeches; C. John is brave enough to express his viewpoints. C True +short_conv_602 gaokao_audio/short_conv_602.wav 111111 What's the woman's advice? A. Giving the paper to his tutor; B. Asking Mrs. Black for advice; C. Choosing biology as the subject. B True +short_conv_603 gaokao_audio/short_conv_603.wav 111111 What's the possible relationship between the man and the woman? A. Boss and secretary; B. Professor and student; C. Waiter and guest. B True +short_conv_604 gaokao_audio/short_conv_604.wav 111111 How does the man feel? A. Confused; B. Surprised; C. Worried. C True +short_conv_605 gaokao_audio/short_conv_605.wav 111111 How long will it take if they go to the Art Center by subway? A. 45 minutes; B. 50 minutes; C. 60 minutes. A True +short_conv_606 gaokao_audio/short_conv_606.wav 111111 What can we know from the conversation? A. Cathy doesn’t like parties; B. Cathy won’t come to the party; C. Cathy has just returned from China. B True +short_conv_607 gaokao_audio/short_conv_607.wav 111111 When does the man think the woman should book a flight? A. In two weeks’ time; B. As soon as possible; C. One day before the departing day. B True +short_conv_608 gaokao_audio/short_conv_608.wav 111111 What are the two speakers talking about? A. Their trip to New Zealand; B. Their time with their families; C. Their plans for the Christmas holidays. C True +short_conv_609 gaokao_audio/short_conv_609.wav 111111 What will the woman do this afternoon? A. Go to visit Mary; B. Have afternoon tea; C. Go to catch a train. C True +short_conv_61 gaokao_audio/short_conv_61.wav 111111 Where did the woman put up the painting? A. In the living room; B. In the bedroom; C. In the bathroom. C True +short_conv_610 gaokao_audio/short_conv_610.wav 111111 How much does the woman have to pay if she buys two pairs of shoes? A. $35; B. $56; C. $70. B True +short_conv_611 gaokao_audio/short_conv_611.wav 111111 What are the speakers mainly talking about? A. An e-mail; B. A company; C. An old workmate. C True +short_conv_612 gaokao_audio/short_conv_612.wav 111111 What was wrong with the woman? A. She nearly had an accident; B. She knocked into a taxi; C. She is sick. A True +short_conv_613 gaokao_audio/short_conv_613.wav 111111 Where could the speakers most likely be? A. In a restaurant; B. In a supermarket; C. In the man’s house. B True +short_conv_614 gaokao_audio/short_conv_614.wav 111111 When does the film finish? A. At 10:15; B. At 10:30; C. At 11:00. A True +short_conv_615 gaokao_audio/short_conv_615.wav 111111 What is the weather like during the weekend? A. Cold; B. Warm; C. Hot. B True +short_conv_616 gaokao_audio/short_conv_616.wav 111111 What does the man think of the plays by Lady Orland? A. Boring and terrible; B.
Serious and positive; C. Inspiring and humorous. C True +short_conv_617 gaokao_audio/short_conv_617.wav 111111 When will the man leave for Liverpool? A. At 12:00; B. At 14:30; C. At 17:30. B True +short_conv_618 gaokao_audio/short_conv_618.wav 111111 What does the woman really mean? A. There's no problem; B. She is very busy; C. She’ll go to another place. B True +short_conv_619 gaokao_audio/short_conv_619.wav 111111 What’s the time now? A. 10:30; B. 10:10; C. 9:50. B True +short_conv_62 gaokao_audio/short_conv_62.wav 111111 What happened to the man ? A. He paid 10 dollars for a shirt; B. He got a shirt from his friend; C. He bought the shirt at a higher price. C True +short_conv_620 gaokao_audio/short_conv_620.wav 111111 When will the woman speaker come back home? A. Early in the evening; B. Late in the evening; C. Early in the afternoon. A True +short_conv_621 gaokao_audio/short_conv_621.wav 111111 What’s the possible relationship between the speakers? A. Teacher and student; B. Manager and secretary; C. Police officer and driver. C True +short_conv_622 gaokao_audio/short_conv_622.wav 111111 What does the woman think of coming back by coach? A. It’s comfortable; B. It’s expensive; C. It’s time-saving. A True +short_conv_623 gaokao_audio/short_conv_623.wav 111111 What are the speakers going to do? A. Do some cooking; B. Stop to eat; C. Go into a dining room. B True +short_conv_624 gaokao_audio/short_conv_624.wav 111111 Where does the conversation take place? A. At home; B. In the street; C. In a restaurant B True +short_conv_625 gaokao_audio/short_conv_625.wav 111111 What is the man? A. A doctor; B. A tailor; C. A waiter. C True +short_conv_626 gaokao_audio/short_conv_626.wav 111111 What is the problem? A. The woman doesn’t like orange juice; B. The man was looking for orange juice; C. The man broke the container of juice. C True +short_conv_627 gaokao_audio/short_conv_627.wav 111111 How did the woman feel about the books’ price? A. Cheap; B. Expensive; C. Have no idea. A True +short_conv_628 gaokao_audio/short_conv_628.wav 111111 How can the woman get Kate’s phone number? A. The man will get the new number for her; B. She can get the new number by calling the old one; C. Kate is still using the old one, so she can call the old one. B True +short_conv_629 gaokao_audio/short_conv_629.wav 111111 What does the man think of the woman’s hat? A. It’s very good; B. He likes the style of it; C. It doesn’t go well with her dress. C True +short_conv_63 gaokao_audio/short_conv_63.wav 111111 How does the woman feel? A. Annoyed; B. Relieved; C. Nervous. A True +short_conv_630 gaokao_audio/short_conv_630.wav 111111 What time is it now? A. 7:15 B. 6:40 C. 7:45 A True +short_conv_631 gaokao_audio/short_conv_631.wav 111111 What does the woman mean? A. She is fully enjoying herself; B. She doesn’t like the atmosphere; C. She is not familiar with the songs. A True +short_conv_632 gaokao_audio/short_conv_632.wav 111111 What does the woman imply? A. The man should give Tom more influence; B. Tom’s project doesn’t need to be revised at all; C. It’s unlikely that Tom will revise his project. C True +short_conv_633 gaokao_audio/short_conv_633.wav 111111 What do we know from the conversation? A. The woman is glad to meet Mr. Brown in person; B. The man is meeting the woman on behalf of Mr. Brown; C. The woman feels sorry that Mr. Brown is unable to come. B True +short_conv_634 gaokao_audio/short_conv_634.wav 111111 What does the man say about the experiment? A. It’s better than expected; B. 
It’s not satisfactory at all; C. It’s beyond expectation. B True +short_conv_635 gaokao_audio/short_conv_635.wav 111111 What does the woman think of the speech? A. Convincing; B. Unbelievable; C. Time-consuming. A True +short_conv_636 gaokao_audio/short_conv_636.wav 111111 What does the man suggest the woman do? A. Make a little progress; B. Change for another course; C. Stick to the course. C True +short_conv_637 gaokao_audio/short_conv_637.wav 111111 What are the 2 speakers probably doing? A. Taking a photo; B. Practicing riding a car; C. Hanging a picture. A True +short_conv_638 gaokao_audio/short_conv_638.wav 111111 How much will the man pay for the two shirts? A. $ 30; B. $ 45; C. $ 60. B True +short_conv_639 gaokao_audio/short_conv_639.wav 111111 What are the speakers talking about? A. Drinks to be delivered; B. Shopping list for drinks; C. Preparations for a party. C True +short_conv_64 gaokao_audio/short_conv_64.wav 111111 What might the boy do after school? A. Wait at school for his mother; B. Play on the swings with Katie; C. Go to the park with his mother. B True +short_conv_640 gaokao_audio/short_conv_640.wav 111111 Where does this conversation probably take place? A. At a hotel; B. At a bus stop; C. At a post office. A True +short_conv_641 gaokao_audio/short_conv_641.wav 111111 What was the woman doing when the earthquake happened? A. She was washing her hair; B. She was feeding the dog; C. She was cleaning the bathroom. A True +short_conv_642 gaokao_audio/short_conv_642.wav 111111 What are the speakers mainly talking about? A. Chocolate; B. Cookies; C. Milk. B True +short_conv_643 gaokao_audio/short_conv_643.wav 111111 Where does the conversation most probably take place? A. In a hotel; B. In a post office; C. In the woman’s house. A True +short_conv_644 gaokao_audio/short_conv_644.wav 111111 How much did the bank lose according to the man? A. $2,000,000; B. $4,000,000; C. $6,000,000. B True +short_conv_645 gaokao_audio/short_conv_645.wav 111111 When should the passengers check in for flight 452? A. At 3:50; B. At 4:50; C. At 5:50. A True +short_conv_646 gaokao_audio/short_conv_646.wav 111111 What are the speakers talking about? A. Homework; B. Exams; C. Books. B True +short_conv_647 gaokao_audio/short_conv_647.wav 111111 What does the woman suggest the man do? A. Put off the trip; B. Visit Sweden in the summer; C. Take some clothes to keep warm. C True +short_conv_648 gaokao_audio/short_conv_648.wav 111111 Where does the conversation most probably take place? A. In a supermarket; B. In a restaurant; C. In the man’s house. A True +short_conv_649 gaokao_audio/short_conv_649.wav 111111 What does the man want to buy? A. A jacket; B. A hat; C. A sweater. A True +short_conv_65 gaokao_audio/short_conv_65.wav 111111 What does the woman want the man to do? A. Bring her a blanket; B. Turn down the heat; C. Shut the windows. C True +short_conv_650 gaokao_audio/short_conv_650.wav 111111 What will the man do on Friday? A. Send food; B. Attend a meeting; C. Order food. B True +short_conv_651 gaokao_audio/short_conv_651.wav 111111 Why is the woman having trouble? A. The table is heavy; B. The house is far away; C. The table’s sides are hard to hold. C True +short_conv_652 gaokao_audio/short_conv_652.wav 111111 Which season is it now? A. Summer; B. Fall; C. Winter. B True +short_conv_653 gaokao_audio/short_conv_653.wav 111111 Where is the woman? A. In an office; B. In a hotel room; C. At a restaurant. B True +short_conv_654 gaokao_audio/short_conv_654.wav 111111 How old is the woman now? A.
18 years old; B. 20 years old; C. 38 years old. C True +short_conv_655 gaokao_audio/short_conv_655.wav 111111 What does the man ask the woman to do? A. Solve a problem; B. Write a report; C. Send an e-mail. B True +short_conv_656 gaokao_audio/short_conv_656.wav 111111 What are the speakers discussing? A. Which bus to take; B. Which way to go; C. Which stop to get off. A True +short_conv_657 gaokao_audio/short_conv_657.wav 111111 Where are the speakers? A. In a clothes store; B. In a car shop; C. In a parking lot. B True +short_conv_658 gaokao_audio/short_conv_658.wav 111111 Why does the woman often eat at restaurants? A. She likes the food there; B. She doesn't like cooking; C. She's too busy to cook. C True +short_conv_659 gaokao_audio/short_conv_659.wav 111111 What time is it now? A. 12:30; B. 1:00; C. 1:30. A True +short_conv_66 gaokao_audio/short_conv_66.wav 111111 How long will the boy be at summer camp? A. One week; B. Two weeks; C. Three weeks. B True +short_conv_660 gaokao_audio/short_conv_660.wav 111111 What is the man doing? A. Reading a magazine; B. Checking his email; C. Typing a report. C True +short_conv_661 gaokao_audio/short_conv_661.wav 111111 What can we learn from the conversation? A. The woman is confident in the sales of her paintings; B. The man doubts that the woman’s paintings will sell well; C. The man is concerned about critics’ comments on the show. C True +short_conv_662 gaokao_audio/short_conv_662.wav 111111 What does the woman mean? A. People should care more about their appearance; B. It’s not sensible to go after brand-name clothing; C. Styles change more quickly than necessary nowadays. B True +short_conv_663 gaokao_audio/short_conv_663.wav 111111 What does the woman mean? A. She has no spare room for a change; B. The hotel’s business is now very good; C. She’s busy with her business right now. A True +short_conv_664 gaokao_audio/short_conv_664.wav 111111 What is the woman going to do? A. She’s flying to Hong Kong; B. She’s going to buy an airplane ticket; C. She’s leaving for Hong Kong with Bill. A True +short_conv_665 gaokao_audio/short_conv_665.wav 111111 What does the man mean? A. He is worried about their trip expense; B. He suggests the woman bring her daughter; C. He suggests the woman visit the university. B True +short_conv_666 gaokao_audio/short_conv_666.wav 111111 What are the two speakers doing? A. Talking on the phone; B. Working in an office; C. Doing spelling practice. A True +short_conv_667 gaokao_audio/short_conv_667.wav 111111 What will the man most probably do? A. Persuade the member not to quit; B. Attend the next club meeting; C. Look for someone to fill the position. C True +short_conv_668 gaokao_audio/short_conv_668.wav 111111 What are the 2 speakers talking about? A. Tour news; B. A job opportunity; C. Sales strategies. B True +short_conv_669 gaokao_audio/short_conv_669.wav 111111 Where does the conversation most probably take place? A. At a hotel; B. At an airport; C. At a police station. A True +short_conv_67 gaokao_audio/short_conv_67.wav 111111 What does the man imply? A. He is stressed; B. He works too hard; C. He needs some excitement. C True +short_conv_670 gaokao_audio/short_conv_670.wav 111111 What is the probable relationship between the two speakers? A. Guide and tourist; B. Customer and shop assistant; C. Trainer and trainee. B True +short_conv_671 gaokao_audio/short_conv_671.wav 111111 What was the woman doing when the earthquake happened? A. She was washing her hair; B. She was feeding the dog; C.
She was cleaning the bathroom. A True +short_conv_672 gaokao_audio/short_conv_672.wav 111111 What are the speakers mainly talking about? A. Chocolate; B. Cookies; C. Milk. B True +short_conv_673 gaokao_audio/short_conv_673.wav 111111 Where does the conversation most probably take place? A. In a hotel; B. In a post office; C. In the woman’s house. A True +short_conv_674 gaokao_audio/short_conv_674.wav 111111 When should the passengers check in for flight 452? A. At 3:50; B. At 4:50; C. At 5:50. A True +short_conv_675 gaokao_audio/short_conv_675.wav 111111 How much did the bank lose according to the man? A. $2,000,000; B. $4,000,000; C. $6,000,000. B True +short_conv_676 gaokao_audio/short_conv_676.wav 111111 What can we learn from the conversation? A. The new suit is a reminder for the man; B. The new suit doesn’t fit the man; C. The man forgets to wear his new suit. B True +short_conv_677 gaokao_audio/short_conv_677.wav 111111 What does the woman suggest the man do? A. Leave for Beijing with Jack; B. Go to the airport after work; C. Ask someone else for help. C True +short_conv_678 gaokao_audio/short_conv_678.wav 111111 What does the woman imply? A. Weather forecasts are not reliable; B. They could stick to their plan; C. They’d better change their mind. B True +short_conv_679 gaokao_audio/short_conv_679.wav 111111 What does the man mean? A. He partly agrees with the woman; B. He considers the woman competitive; C. He’s wholly been lost in a colorful life. A True +short_conv_68 gaokao_audio/short_conv_68.wav 111111 Why is the man lost? A. He took a wrong turn; B. He was told to take this way; C. He missed the freeway signs. A True +short_conv_680 gaokao_audio/short_conv_680.wav 111111 What does the man imply? A. Mary must be caught in heavy traffic; B. The woman was obviously not fond of Mary; C. The woman forgot to tell Mary to come. A True +short_conv_681 gaokao_audio/short_conv_681.wav 111111 What does the woman mean? A. She will stay for breakfast; B. She loves to grab a coffee on the way; C. She needs to eat before school. C True +short_conv_682 gaokao_audio/short_conv_682.wav 111111 What will the man probably do next? A. Carry the bags; B. Hurry to drive the car; C. Search for the bags. A True +short_conv_683 gaokao_audio/short_conv_683.wav 111111 What are the speakers probably doing? A. Reading newspapers; B. Talking about sports; C. Putting up advertisements. A True +short_conv_684 gaokao_audio/short_conv_684.wav 111111 Where does the conversation probably take place? A. In a taxi; B. On a bus; C. On a bridge. A True +short_conv_685 gaokao_audio/short_conv_685.wav 111111 What's the probable relationship between the two speakers? A. Father and daughter; B. Manager and secretary; C. Customer and shop assistant. B True +short_conv_686 gaokao_audio/short_conv_686.wav 111111 Why does the woman want to buy a heavy coat for Jimmy? A. Jimmy will go camping in the mountains; B. Winter is coming soon; C. Jimmy has caught a bad cold. A True +short_conv_687 gaokao_audio/short_conv_687.wav 111111 What does the man advise the woman to do? A. Exercise for 20 minutes in the morning; B. Read English every morning; C. Get up early. B True +short_conv_688 gaokao_audio/short_conv_688.wav 111111 When will the man check out? A. On Thursday; B. On Friday; C. On Tuesday. B True +short_conv_689 gaokao_audio/short_conv_689.wav 111111 What does the man mean? A. He can’t hear what the woman is saying; B. He is afraid to touch the spider; C. He will try to touch the spider later. 
B True +short_conv_69 gaokao_audio/short_conv_69.wav 111111 How much is the painting worth now? A. $2,000; B. $2 million; C. $30 million. B True +short_conv_690 gaokao_audio/short_conv_690.wav 111111 Where does the woman want to go? A. The supermarket; B. The kindergarten; C. The book store. C True +short_conv_691 gaokao_audio/short_conv_691.wav 111111 What are the speakers talking about? A. Where to play; B. When to play; C. Who to play with. A True +short_conv_692 gaokao_audio/short_conv_692.wav 111111 How did the woman feel about her presentation? A. Relaxed; B. Confident; C. Anxious. C True +short_conv_693 gaokao_audio/short_conv_693.wav 111111 What color T-shirt does the man like? A. Green and white; B. Gray and black; C. Gray and white. A True +short_conv_694 gaokao_audio/short_conv_694.wav 111111 When will the speakers meet? A. By 5:00; B. By 4:30; C. By 2:30. C True +short_conv_695 gaokao_audio/short_conv_695.wav 111111 What is the charge for breakfast at the moment? A. $2; B. $2.50; C. $3. B True +short_conv_696 gaokao_audio/short_conv_696.wav 111111 What will the woman probably do? A. To serve as a good mechanic; B. To buy a new car; C. To get her car maintained. C True +short_conv_697 gaokao_audio/short_conv_697.wav 111111 What was the man informed online? A. It would get warm today; B. The cold front would stay for long; C. The weather report was wrong. A True +short_conv_698 gaokao_audio/short_conv_698.wav 111111 What does the woman mean? A. The man should continue with his exercise; B. It is important to make warming-up exercise; C. The man should start to exercise one month later. A True +short_conv_699 gaokao_audio/short_conv_699.wav 111111 What does the woman mean? A. She doesn’t like the steak; B. She is too full to have anything more; C. She is full of energy. B True +short_conv_7 gaokao_audio/short_conv_7.wav 111111 How does the man feel about the test? A. Confident; B. Worried; C. Sleepy. A True +short_conv_70 gaokao_audio/short_conv_70.wav 111111 What does the woman want to know? A. Which items are on sale; B. Where the back of the store is; C. What the sign outside says. A True +short_conv_700 gaokao_audio/short_conv_700.wav 111111 What are they most probably talking about? A. Matches; B. Toes; C. Shoes. C True +short_conv_701 gaokao_audio/short_conv_701.wav 111111 What are they talking about? A. A soccer game; B. A swimming game; C. A Marathon running race. A True +short_conv_702 gaokao_audio/short_conv_702.wav 111111 What do we learn from the conversation? A. The man will take the flight on Sep. 16; B. The man wants to sell his ticket for Sep. 16; C. The man is likely to take the flight on Sep. 20. C True +short_conv_703 gaokao_audio/short_conv_703.wav 111111 What does the woman intend to tell the man? A. The paintings are copies with reasonable prices; B. The paintings are only sold at this fair; C. The paintings are highly priced. A True +short_conv_704 gaokao_audio/short_conv_704.wav 111111 What is the man mean? A. The lady is satisfied with her black coffee; B. The lady has to have black coffee; C. The lady has had too much black coffee. B True +short_conv_705 gaokao_audio/short_conv_705.wav 111111 Where does the conversation most probably take place? A. At an airport; B. At a bus stop; C. In a subway station. A True +short_conv_706 gaokao_audio/short_conv_706.wav 111111 What are the speakers talking about? A. Where to play; B. When to play; C. Who to play with. 
A True +short_conv_707 gaokao_audio/short_conv_707.wav 111111 How did the woman feel about her presentation? A. Relaxed; B. Confident; C. Anxious. C True +short_conv_708 gaokao_audio/short_conv_708.wav 111111 What color T-shirt does the man like? A. Green and white; B. Gray and black; C. Gray and white. A True +short_conv_709 gaokao_audio/short_conv_709.wav 111111 When will the speakers meet? A. By 5:00; B. By 4:30; C. By 2:30. C True +short_conv_71 gaokao_audio/short_conv_71.wav 111111 What is the conversation about? A. How to get to Greece; B. Where to go on holiday; C. Whether to go to the small village. B True +short_conv_710 gaokao_audio/short_conv_710.wav 111111 What is the charge for breakfast at the moment? A. $2; B. $2.50; C. $3. B True +short_conv_711 gaokao_audio/short_conv_711.wav 111111 What is the woman complaining about? A. The busy line; B. The wrong food; C. The late delivery. C True +short_conv_712 gaokao_audio/short_conv_712.wav 111111 What are the speakers talking about? A. The weather; B. Writing skills; C. Weekend plans. C True +short_conv_713 gaokao_audio/short_conv_713.wav 111111 What does the woman want to do? A. Attend a party; B. Call the Trumps; C. Get Michael’s number. B True +short_conv_714 gaokao_audio/short_conv_714.wav 111111 How does the woman sound? A. Relieved; B. Worried; C. Disappointed. A True +short_conv_715 gaokao_audio/short_conv_715.wav 111111 What kind of shoes will the woman probably buy? A. Dress shoes; B. Soccer shoes; C. Tennis boots. B True +short_conv_716 gaokao_audio/short_conv_716.wav 111111 Why was the woman worried? A. The lessons were long; B. The man came back late; C. The lessons were confusing. B True +short_conv_717 gaokao_audio/short_conv_717.wav 111111 When will the speakers probably meet next time? A. On Sunday; B. On Wednesday; C. On Saturday. B True +short_conv_718 gaokao_audio/short_conv_718.wav 111111 When will the second bus probably leave? A. At 10:10; B. At 10:20; C. At 10:30. B True +short_conv_719 gaokao_audio/short_conv_719.wav 111111 What are the speakers talking about? A. Building a fire; B. Building a house; C. Buying some wood. A True +short_conv_72 gaokao_audio/short_conv_72.wav 111111 What do we know about the man? A. He didn’t follow his doctor’s advice; B. He was under pressure from his wife; C. He gave up smoking. C True +short_conv_720 gaokao_audio/short_conv_720.wav 111111 How does the woman feel about the old cartoons? A. They’re exciting; B. They’re her favorites; C. They’re only for young children. C True +short_conv_721 gaokao_audio/short_conv_721.wav 111111 Why did the man change his mind probably? A. He didn’t bring enough money; B. He forgot his wallet; C. He didn’t need that much fruit. A True +short_conv_722 gaokao_audio/short_conv_722.wav 111111 What are the speakers mainly talking about? A. The role of shopping in people’s lives; B. How to promote sales; C. The importance of mass media. A True +short_conv_723 gaokao_audio/short_conv_723.wav 111111 What happened to the man just now? A. He met an old friend on the street; B. He mistook the woman for his friend; C. Lydia paid an unexpected visit to him. B True +short_conv_724 gaokao_audio/short_conv_724.wav 111111 Who is the woman? A. Mary; B. Mary’s sister; C. Mary’s mother. B True +short_conv_725 gaokao_audio/short_conv_725.wav 111111 When did the man live in London? A. Last year; B. Last month; C. When he was a child. C True +short_conv_726 gaokao_audio/short_conv_726.wav 111111 what can we know from the conversation? A. 
People have already been standing in line for two hours; B. The man must wait for two hours to buy the ticket; C. The man can buy a special ticket before the drama starts. C True +short_conv_727 gaokao_audio/short_conv_727.wav 111111 What does the man say about David? A. He has been to Seattle many times; B. He holds a high position in his company; C. He lived in Seattle for many years. A True +short_conv_728 gaokao_audio/short_conv_728.wav 111111 What can be learned about the woman's paper? A. It hasn’t been graded; B. The committee is discussing it; C. The woman hasn’t handed it in. A True +short_conv_729 gaokao_audio/short_conv_729.wav 111111 What does the woman suggest the man do? A. Pick up the package at the post office; B. Ask to have the package delivered to his home; C. Find out the opening hours of the post office. B True +short_conv_73 gaokao_audio/short_conv_73.wav 111111 Why does the man think the woman is lucky? A. She has finished her term paper; B. She can work on the computer; C. She has a new typewriter. B True +short_conv_730 gaokao_audio/short_conv_730.wav 111111 What kind of person is Victor according to the conversation? A. He is heroic; B. He is life-threatening; C. He is awkward. A True +short_conv_731 gaokao_audio/short_conv_731.wav 111111 What does the woman mean? A. The reaction to the comedy is varied; B. The review of the newspaper is one-sided; C. Media are prejudiced against the comedy. A True +short_conv_732 gaokao_audio/short_conv_732.wav 111111 What does the woman mean? A. Eric won’t eat vegetable without meat; B. Some meat will solve Eric’s problem; C. Eric is short of vegetable. A True +short_conv_733 gaokao_audio/short_conv_733.wav 111111 What is the man probably doing? A. Buying the insurance; B. Buying a car; C. Taking a plane. B True +short_conv_734 gaokao_audio/short_conv_734.wav 111111 What does the woman imply? A. She can’t wait for the winter to arrive; B. It’s hard to know how severe the winter will be; C. She needs a warm jacket. C True +short_conv_735 gaokao_audio/short_conv_735.wav 111111 Where does this conversation most probably take place? A. At an airport; B. At a police station; C. At a travel agency. A True +short_conv_736 gaokao_audio/short_conv_736.wav 111111 Why is the woman disappointed about the restaurant? A. The price is unacceptable; B. The waiter is unfriendly; C. The service is slow. C True +short_conv_737 gaokao_audio/short_conv_737.wav 111111 What are the speakers discussing? A. When to watch TV; B. What program to watch; C. Whether to see a film. B True +short_conv_738 gaokao_audio/short_conv_738.wav 111111 What does the woman ask the man to do? A. Move some boxes; B. Drive a car; C. Make a phone call. A True +short_conv_739 gaokao_audio/short_conv_739.wav 111111 What will the speakers do in the afternoon? A. Go mountain biking; B. Build a tree house; C. Play beach volleyball. B True +short_conv_74 gaokao_audio/short_conv_74.wav 111111 What will the man do for the woman? A. Send her to the hospital; B. Help her ask for leave; C. Get some medicine for her. B True +short_conv_740 gaokao_audio/short_conv_740.wav 111111 How will the girl's mother pay for the CD? A. In cash; B. By credit card; C. By cheque. C True +short_conv_741 gaokao_audio/short_conv_741.wav 111111 Why didn’t Johnson have supper? A. He was too tired; B. He had a stomachache; C. He was not hungry. B True +short_conv_742 gaokao_audio/short_conv_742.wav 111111 What does the woman think of her trip to India? A. It was interesting; B. It was terrible; C.
It was just so-so. C True +short_conv_743 gaokao_audio/short_conv_743.wav 111111 Why is the man going shopping? A. To buy a schoolbag for the woman; B. To buy a birthday gift for his sister; C. To buy a coat for himself. B True +short_conv_744 gaokao_audio/short_conv_744.wav 111111 What kind of room does the woman want? A. A room with a shower; B. A room with a single bed; C. A room with no air-conditioner. A True +short_conv_745 gaokao_audio/short_conv_745.wav 111111 What’s the probable relationship between the speakers? A. Teacher and student; B. Manager and staff; C. Husband and wife. C True +short_conv_746 gaokao_audio/short_conv_746.wav 111111 What can we learn from the conversation? A. The woman thinks it difficult to find the uniforms; B. The woman doesn't know the location of the school; C. The man implies his uniforms are hidden in a haystack(干草垛). A True +short_conv_747 gaokao_audio/short_conv_747.wav 111111 Why are people worried about the overhead power lines? A. They can’t represent the look of the area; B. They are hidden danger in the neighborhood; C. They have historical values and should be protected. B True +short_conv_748 gaokao_audio/short_conv_748.wav 111111 What are the two speakers talking about? A. The way to buy a sweater out of stock; B. The refund policy of the store; C. The proper size of a sweater. A True +short_conv_749 gaokao_audio/short_conv_749.wav 111111 What can we learn from the conversation? A. The man's eating habit is quite different from the woman's; B. The woman has decided not to be a vegetarian; C. The woman usually avoids any kind of animal meat. A True +short_conv_75 gaokao_audio/short_conv_75.wav 111111 Where does the conversation take place? A. In a library; B. In a classroom; C. In a bookstore. A True +short_conv_750 gaokao_audio/short_conv_750.wav 111111 Where does the conversation most probably take place? A. At the airport; B. At the hospital; C. At the hotel. A True +short_conv_751 gaokao_audio/short_conv_751.wav 111111 How long did David stay abroad in all? A. 9 days; B. 11 days; C. 16 days. C True +short_conv_752 gaokao_audio/short_conv_752.wav 111111 Why did the woman get a “C” for her report? A. Because she forgot the deadline for the report; B. Because the man forgot to hand in her report; C. Because she didn’t hand in her report on time. C True +short_conv_753 gaokao_audio/short_conv_753.wav 111111 What are the speakers probably talking about? A. Buying a house; B. Finding a hotel; C. Buying a car. A True +short_conv_754 gaokao_audio/short_conv_754.wav 111111 What do we know about Peter Schmidt? A. He has lost his ticket; B. He is expecting a ticket; C. He went out to buy a ticket. B True +short_conv_755 gaokao_audio/short_conv_755.wav 111111 Where are the two speakers? A. In a cafe; B. On a plane; C. On a ship. B True +short_conv_756 gaokao_audio/short_conv_756.wav 111111 What does the man imply? A. He quite agrees with Mr. Johnson’s views; B. He has his own opinions on social welfare; C. Mr. Johnson is skillful in expressing his ideas. A True +short_conv_757 gaokao_audio/short_conv_757.wav 111111 What does the man mean? A. The elderly don’t know how to use apps; B. The elderly can help to develop smart apps; C. The app developers can’t afford to ignore the elderly. C True +short_conv_758 gaokao_audio/short_conv_758.wav 111111 Why can't the woman understand Mr. James? A. Mr. James likes boasting of his cleverness; B. The woman is not interested in what Mr. James says; C. Mr. James isn’t very straightforward in what he says.
C True +short_conv_759 gaokao_audio/short_conv_759.wav 111111 What does the man want? A. A job offer; B. An excellent résumé; C. The position of system engineer. A True +short_conv_76 gaokao_audio/short_conv_76.wav 111111 How do the speakers go to work? A. By bus; B. By taxi; C. By bike. A True +short_conv_760 gaokao_audio/short_conv_760.wav 111111 How does the man feel? A. Regretful; B. Angry; C. Relieved. B True +short_conv_761 gaokao_audio/short_conv_761.wav 111111 What does the man imply? A. His wife didn’t take his sensible advice; B. He didn’t want to cut his wife’s long hair; C. His wife often complains about everything. A True +short_conv_762 gaokao_audio/short_conv_762.wav 111111 What can we learn about Patrick from the conversation? A. His roommate walks in his sleep; B. His roommate’s bed is always in a mess; C. He doesn’t like sharing a room with anyone. B True +short_conv_763 gaokao_audio/short_conv_763.wav 111111 How much will woman pay if she rented a room for three weeks? A. $50; B. $120; C. $150. C True +short_conv_764 gaokao_audio/short_conv_764.wav 111111 What can we learn from the conversation? A. The woman was fully absorbed in the movie; B. The woman couldn’t understand the movie very well; C. The movie was no better than what the woman had imagined. A True +short_conv_765 gaokao_audio/short_conv_765.wav 111111 Where does the conversation most probably take place? A. On a plane; B. On a bus; C. In a department store. A True +short_conv_766 gaokao_audio/short_conv_766.wav 111111 How might the woman feel? A. Uneasy; B. Disappointed; C. Unconcerned. B True +short_conv_767 gaokao_audio/short_conv_767.wav 111111 What seemed to be Sarah’s problem? A. She couldn’t finish the task as required; B. She failed in a job interview again; C. She always went to work late. A True +short_conv_768 gaokao_audio/short_conv_768.wav 111111 What are the speakers mainly talking about? A. Environmental protection; B. Greenhouse effect; C. Gardening skills. C True +short_conv_769 gaokao_audio/short_conv_769.wav 111111 How much more does Lucas need for the cellphone? A. $300; B. $500; C. $800. A True +short_conv_77 gaokao_audio/short_conv_77.wav 111111 What does the woman look like? A. She’s slim; B. She wears glasses; C. She has short hair. C True +short_conv_770 gaokao_audio/short_conv_770.wav 111111 What did the woman try to quit drinking? A. Tea; B. Coffee; C. Juice. B True +short_conv_771 gaokao_audio/short_conv_771.wav 111111 What’s wrong with the coat? A. It’s a different brand; B. It’s a different color; C. It’s a wrong size. B True +short_conv_772 gaokao_audio/short_conv_772.wav 111111 What is the probable relationship between the speakers? A. Boss and employee; B. Doctor and patient; C. Teacher and student. C True +short_conv_773 gaokao_audio/short_conv_773.wav 111111 Who gave the woman a new car? A. Her sister; B. Her father; C. Her grandfather. B True +short_conv_774 gaokao_audio/short_conv_774.wav 111111 What time will the movie begin? A. At 8:10; B. At 8:15; C. At 8:20. C True +short_conv_775 gaokao_audio/short_conv_775.wav 111111 How does the man find his life in the countryside? A. Fun but inconvenient; B. Fine but tiring; C. Interesting but hard. A True +short_conv_776 gaokao_audio/short_conv_776.wav 111111 What does the man mean? A. He won’t vote for the woman; B. The woman shouldn’t have asked him for his vote; C. The woman should ask his roommate to vote for her. A True +short_conv_777 gaokao_audio/short_conv_777.wav 111111 What does the woman mean? A. 
She doesn’t want to join a gardening club; B. She doesn’t have time to work in a garden; C. She’s never been formally invited into a club. A True +short_conv_778 gaokao_audio/short_conv_778.wav 111111 What can be inferred from the conversation? A. The woman didn’t work hard enough on her paper; B. The professor was content with the woman’s paper; C. The paper wasn’t as good as the woman had thought. C True +short_conv_779 gaokao_audio/short_conv_779.wav 111111 What does the man mean? A. He’s satisfied with his job; B. He likes working in hot summer; C. He gets more pay than expected. A True +short_conv_78 gaokao_audio/short_conv_78.wav 111111 What are the speakers talking about? A. A dog; B. A lecture; C. A professor. B True +short_conv_780 gaokao_audio/short_conv_780.wav 111111 What can be inferred about the man? A. He’s going to Philadelphia by train; B. He’s already missed his train; C. He’s familiar with the train station. C True +short_conv_781 gaokao_audio/short_conv_781.wav 111111 What does the woman mean? A. She’s already finished her report on the movie; B. She’ll be unable to see the movie with the man; C. She prefers a different type of movie to a comedy. B True +short_conv_782 gaokao_audio/short_conv_782.wav 111111 why did the man miss the football game? A. He can’t endure the loud noise from the game; B. He thought the game was disappointing; C. He doesn’t think football games make any sense. A True +short_conv_783 gaokao_audio/short_conv_783.wav 111111 What will the woman probably do next? A. Recalculate the bill; B. Refuse to pay the bill; C. Give the man a discount. A True +short_conv_784 gaokao_audio/short_conv_784.wav 111111 where does the conversation most probably take place? A. In a hotel; B. In a clinic; C. In a university. B True +short_conv_785 gaokao_audio/short_conv_785.wav 111111 What is most probably the man's occupation? A. An airhost; B. A passenger; C. A taxi driver. C True +short_conv_786 gaokao_audio/short_conv_786.wav 111111 What’s the good news? A. The man got a better position; B. The man is going to be a father; C. The man is going to get married. B True +short_conv_787 gaokao_audio/short_conv_787.wav 111111 Where does the conversation probably take place? A. At the woman’s home; B. In a cinema; C. In a shop. A True +short_conv_788 gaokao_audio/short_conv_788.wav 111111 How long has the rain lasted? A. 4 days; B. 5 days; C. 6 days. B True +short_conv_789 gaokao_audio/short_conv_789.wav 111111 What does the woman worry about? A. Their train tickets; B. Traffic jams; C. The driving habit. B True +short_conv_79 gaokao_audio/short_conv_79.wav 111111 Where are the speakers? A. In a library; B. In a garden; C. In a bookstore. C True +short_conv_790 gaokao_audio/short_conv_790.wav 111111 Who is the woman probably speaking to? A. A policeman; B. A friend; C. A shop assistant. A True +short_conv_791 gaokao_audio/short_conv_791.wav 111111 What are the speakers mainly talking about? A. A picture; B. A holiday; C. A sport. A True +short_conv_792 gaokao_audio/short_conv_792.wav 111111 What does the woman mean? A. Her sister loves villages; B. Tom makes a mistake; C. She likes her sister. B True +short_conv_793 gaokao_audio/short_conv_793.wav 111111 Where are the speakers? A. At a party; B. In a store; C. In their new house. C True +short_conv_794 gaokao_audio/short_conv_794.wav 111111 Who might Sam be? A. A baby; B. A pet; C. A toy. B True +short_conv_795 gaokao_audio/short_conv_795.wav 111111 What did the woman do last night? A. She stayed at home; B. 
She went to a party; C. She saw a doctor. A True +short_conv_796 gaokao_audio/short_conv_796.wav 111111 What is the woman doing? A. Reading; B. Asking for help; C. Washing hands. B True +short_conv_797 gaokao_audio/short_conv_797.wav 111111 What are the speakers mainly talking about? A. Children’s nature; B. Parents’ effect on children; C. The importance of school education. A True +short_conv_798 gaokao_audio/short_conv_798.wav 111111 What does the man suggest the woman do? A. Get a repairman; B. Put the table together; C. Do as the instructions tell. C True +short_conv_799 gaokao_audio/short_conv_799.wav 111111 When will the man make the call with the headquarters? A. At 9:30; B. At 10:30; C. At 10:40. C True +short_conv_8 gaokao_audio/short_conv_8.wav 111111 What will the man do first? A. Go to New York; B. Meet the woman; C. Visit Washington, D.C. A True +short_conv_80 gaokao_audio/short_conv_80.wav 111111 What will the man do tonight? A. Have a drink; B. Write a paper; C. Watch a film. B True +short_conv_800 gaokao_audio/short_conv_800.wav 111111 Where does the conversation probably take place? A. At a department; B. At the post office; C. At the cleaner’s. C True +short_conv_801 gaokao_audio/short_conv_801.wav 111111 What’s the good news? A. The man got a better position; B. The man is going to get married; C. The man is going to be a father. C True +short_conv_802 gaokao_audio/short_conv_802.wav 111111 How long has the rain lasted? A. 4 days; B. 5 days; C. 6 days. B True +short_conv_803 gaokao_audio/short_conv_803.wav 111111 Where does the conversation probably take place? A. At the woman’s home; B. In a cinema; C. In a shop. A True +short_conv_804 gaokao_audio/short_conv_804.wav 111111 Who is the woman probably speaking to? A. A friend; B. A policeman; C. A shop assistant. B True +short_conv_805 gaokao_audio/short_conv_805.wav 111111 What does the woman worry about? A. Their train tickets; B. The driving habit; C. Traffic jams. C True +short_conv_806 gaokao_audio/short_conv_806.wav 111111 What does the man imply? A. Papers pile up while he is on vacation; B. He has no time to go on holiday; C. Papers are too hard to understand. B True +short_conv_807 gaokao_audio/short_conv_807.wav 111111 What can we learn from the conversation? A. The man finds fault with others; B. The woman has calmed the horse; C. The man has realized his problem. C True +short_conv_808 gaokao_audio/short_conv_808.wav 111111 What can we learn from the conversation? A. Mr. Green can’t offer help to the woman; B. Italian words are hard to pronounce; C. Jack is not available at this moment. A True +short_conv_809 gaokao_audio/short_conv_809.wav 111111 How will the woman get to Tokyo? A. By bus; B. By subway; C. By plane. C True +short_conv_81 gaokao_audio/short_conv_81.wav 111111 Where does the conversation take place? A. In a restaurant; B. In a hotel; C. In supermarket. A True +short_conv_810 gaokao_audio/short_conv_810.wav 111111 How does the woman feel about the film? A. Disappointed; B. Interested; C. Frightened. A True +short_conv_811 gaokao_audio/short_conv_811.wav 111111 What does the woman mean? A. It is possible to cure toothache; B. She can stand two hours; C. It is too painful to be patient. C True +short_conv_812 gaokao_audio/short_conv_812.wav 111111 What can we learn from the conversation? A. Ben is not interested in the training experience; B. Others cherish Ben for his long term efforts; C. Ben is very eager for the scholarship. 
C True +short_conv_813 gaokao_audio/short_conv_813.wav 111111 How much would the woman have saved if she had waited? A. 15 dollars; B. 75 dollars; C. 60 dollars. A True +short_conv_814 gaokao_audio/short_conv_814.wav 111111 Where is the conversation taking place? A. In a zoo; B. In a museum; C. In a pet store. C True +short_conv_815 gaokao_audio/short_conv_815.wav 111111 What is the possible relationship between the speakers? A. Receptionist and guest; B. Doctor and patient; C. Waiter and diner. A True +short_conv_816 gaokao_audio/short_conv_816.wav 111111 What does Jenny decide to do first? A. Look for a job; B. Go on a trip; C. Get an assistant. B True +short_conv_817 gaokao_audio/short_conv_817.wav 111111 How will the woman get back from the railway station? A. By train; B. By car; C. By bus. C True +short_conv_818 gaokao_audio/short_conv_818.wav 111111 Why does the man talk to Dr. Simpson? A. To make an apology; B. To ask for help; C. To discuss his studies. A True +short_conv_819 gaokao_audio/short_conv_819.wav 111111 What is the weather like now? A. It’s sunny; B. It’s rainy; C. It’s cloudy. C True +short_conv_82 gaokao_audio/short_conv_82.wav 111111 How much will the man pay if he buys two T-shirts? A. 90 yuan B. 100 yuan C. 110 yuan A True +short_conv_820 gaokao_audio/short_conv_820.wav 111111 What will Lucy do at 11:30 tomorrow? A. Go out for lunch; B. See her dentist; C. Visit a friend. B True +short_conv_821 gaokao_audio/short_conv_821.wav 111111 Who is the man? A. A driver; B. A lawyer; C. A policeman. A True +short_conv_822 gaokao_audio/short_conv_822.wav 111111 Why does the man’s suit look strange? A. The color is too dark; B. He’s wearing it in the office; C. The jacket and trousers don't match. C True +short_conv_823 gaokao_audio/short_conv_823.wav 111111 What are the speakers mainly talk about? A. Patience; B. Interests; C. Challeup. B True +short_conv_824 gaokao_audio/short_conv_824.wav 111111 When will the man go to the cinema? A. On Saturday morning; B. On Saturday afternoon; C. On Saturday evening. C True +short_conv_825 gaokao_audio/short_conv_825.wav 111111 Where does this conversation mostly probably take place? A. In a kitchen; B. In a dinner-room; C. In a sitting-room. A True +short_conv_826 gaokao_audio/short_conv_826.wav 111111 What are the speakers talking about? A. An applicant; B. An experience; C. A job. A True +short_conv_827 gaokao_audio/short_conv_827.wav 111111 Why does the girl talk with the man? A. To send an invitation; B. To seek for help; C. To ask for permission. C True +short_conv_828 gaokao_audio/short_conv_828.wav 111111 What does the man want to do next? A. Stop for some coffee; B. Keep on working; C. Leave for home. B True +short_conv_829 gaokao_audio/short_conv_829.wav 111111 What was the weather like on Saturday? A. Sunny; B. Cloudy; C. Windy. A True +short_conv_83 gaokao_audio/short_conv_83.wav 111111 What’s the man doing? A. He’s preparing a lesson; B. He’s watching TV; C. He’s shouting. A True +short_conv_830 gaokao_audio/short_conv_830.wav 111111 Which part of the man’s body hurts? A. His back; B. His neck; C. His arm. A True +short_conv_831 gaokao_audio/short_conv_831.wav 111111 What does the woman ask the boy to wash? A. His hands; B. His plates; C. His clothes. A True +short_conv_832 gaokao_audio/short_conv_832.wav 111111 What will the man need to do during the holiday? A. Write essays; B. Play basketball; C. Take a vacation. A True +short_conv_833 gaokao_audio/short_conv_833.wav 111111 How many players will play the game? A. Two; B. 
Three; C. Four. B True +short_conv_834 gaokao_audio/short_conv_834.wav 111111 What meal are the speakers about to eat? A. Breakfast; B. Lunch; C. Dinner. C True +short_conv_835 gaokao_audio/short_conv_835.wav 111111 What color is the present sofa? A. Brown; B. White; C. Blue. B True +short_conv_836 gaokao_audio/short_conv_836.wav 111111 Why does the woman come to talk with the man? A. To take a test; B. To get a job; C. To buy things. B True +short_conv_837 gaokao_audio/short_conv_837.wav 111111 Who will pay for the dinner? A. John; B. Kate; C. Tom. C True +short_conv_838 gaokao_audio/short_conv_838.wav 111111 When did the woman’s brother start smoking? A. During high school; B. At college; C. After college. C True +short_conv_839 gaokao_audio/short_conv_839.wav 111111 What is the time now? A. 6:45; B. 6:55; C. 7:05. B True +short_conv_84 gaokao_audio/short_conv_84.wav 111111 How many pills should the man take in a day? A. 8; B. 12; C. 4. C True +short_conv_840 gaokao_audio/short_conv_840.wav 111111 Where does the conversation take place? A. In a restaurant; B. At a bookstore; C. In a supermarket. A True +short_conv_841 gaokao_audio/short_conv_841.wav 111111 What does the man mean? A. The film is terrible; B. The film can be seen online; C. The film is worth the money. A True +short_conv_842 gaokao_audio/short_conv_842.wav 111111 Where does the conversation most probably take place? A. At home; B. At a hospital; C. At a drug store. C True +short_conv_843 gaokao_audio/short_conv_843.wav 111111 How old is the woman now? A. 20 years old; B. 45 years old; C. 65 years old. B True +short_conv_844 gaokao_audio/short_conv_844.wav 111111 What is small for the woman? A. The T-shirt; B. The hat; C. The skirt. C True +short_conv_845 gaokao_audio/short_conv_845.wav 111111 What will the man do next? A. Turn off the TV; B. Study with the woman; C. Watch a movie. A True +short_conv_846 gaokao_audio/short_conv_846.wav 111111 What's the most probable relationship between the speakers? A. Electrician and owner; B. Boss and employee; C. Husband and wife C True +short_conv_847 gaokao_audio/short_conv_847.wav 111111 How does the woman probably feel? A. Embarrassed; B. Surprised; C. Grateful A True +short_conv_848 gaokao_audio/short_conv_848.wav 111111 How much will the man pay for the tickets? A. 70 dollars; B. 50 dollars C. 20 dollars. B True +short_conv_849 gaokao_audio/short_conv_849.wav 111111 What does the woman want to do tonight? A. Go to bed early; B. Prepare for a trip; C. Have dinner with the man A True +short_conv_85 gaokao_audio/short_conv_85.wav 111111 What’s the probable relationship between the two speakers? A. Teacher and student; B. Boss and worker; C. Shop assistant and customer. C True +short_conv_850 gaokao_audio/short_conv_850.wav 111111 What will the weather be probably like the next day A. Fine B. Rainy C. Windy B True +short_conv_851 gaokao_audio/short_conv_851.wav 111111 Where did the woman meet her husband again? A. In China; B. In America; C. In India. B True +short_conv_852 gaokao_audio/short_conv_852.wav 111111 What did the man do yesterday afternoon? A. He played football; B. He stayed in his room; C. He went to the cinema. B True +short_conv_853 gaokao_audio/short_conv_853.wav 111111 What are the two speakers talking about? A. Their favorite animals; B. Animals they've seen; C. Some lovely animals. A True +short_conv_854 gaokao_audio/short_conv_854.wav 111111 Why can't the woman find the car key in the room? A. It is in the car; B. She left it outside; C. Someone has taken it. 
C True +short_conv_855 gaokao_audio/short_conv_855.wav 111111 How many minutes earlier is the woman's watch? A. Five minutes; B. Ten minutes C. Fifteen minutes. C True +short_conv_856 gaokao_audio/short_conv_856.wav 111111 Why did the speakers get lost? A. They forgot the address; B. They ignored Google Maps; C. They got wrong instructions. C True +short_conv_857 gaokao_audio/short_conv_857.wav 111111 What does the woman advise the man to do? A. Be confident; B. Sell the company; C. Find another job. A True +short_conv_858 gaokao_audio/short_conv_858.wav 111111 What is the woman going to do now? A. Look for her keys; B. Go to work by bus; C. Clean up the room. B True +short_conv_859 gaokao_audio/short_conv_859.wav 111111 When does the conversation take place? A. At 2:45 P.m; B. At 3:00 P.m; C. At 3:15 P.m. C True +short_conv_86 gaokao_audio/short_conv_86.wav 111111 What are the speakers doing? A. Watching a movie; B. Selling movie tickets; C. Discussing a soap opera. A True +short_conv_860 gaokao_audio/short_conv_860.wav 111111 Where are probably the speakers? A. At a concert; B. In a restaurant; C. In a cinema. B True +short_conv_861 gaokao_audio/short_conv_861.wav 111111 What day is it when the conversation takes place? A. Saturday; B. Sunday; C. Monday. B True +short_conv_862 gaokao_audio/short_conv_862.wav 111111 Where is the man now? A. On his way; B. In a restaurant; C. At home. A True +short_conv_863 gaokao_audio/short_conv_863.wav 111111 What will Celia do? A. Find a player; B. Watch a game; C. Play basketball. C True +short_conv_864 gaokao_audio/short_conv_864.wav 111111 What are the speakers talking about? A. A noisy night; B. Their life in town; C. A place of living. C True +short_conv_865 gaokao_audio/short_conv_865.wav 111111 What does the man want to do? A. Take photos; B. Buy a camera; C. Help the woman. A True +short_conv_866 gaokao_audio/short_conv_866.wav 111111 Where does the conversation probably take place? A. At a hotel; B. In a ballroom; C. In a meeting room. A True +short_conv_867 gaokao_audio/short_conv_867.wav 111111 What does the woman mean? A. She asks the man to buy a new bike; B. She can’t afford to help the man; C. She doesn’t believe the man. C True +short_conv_868 gaokao_audio/short_conv_868.wav 111111 What does the woman suggest? A. Writing more essays; B. Experiencing Chinese culture; C. Borrowing some Chinese books. C True +short_conv_869 gaokao_audio/short_conv_869.wav 111111 What are the speakers mainly talking about? A. A live concert; B. A right choice; C. A business report. A True +short_conv_87 gaokao_audio/short_conv_87.wav 111111 What will the speakers do next? A. Go to the bookstore; B. Listen to some music; C. Eat in the restaurant. C True +short_conv_870 gaokao_audio/short_conv_870.wav 111111 How long has the woman been kept in the house? A. Two days; B. Three days; C. Five days. B True +short_conv_871 gaokao_audio/short_conv_871.wav 111111 What does the woman ask the boy to wash? A. His hands; B. His plates; C. His clothes. A True +short_conv_872 gaokao_audio/short_conv_872.wav 111111 What will the man need to do during the holiday? A. Write papers; B. Play basketball; C. Take a vacation. A True +short_conv_873 gaokao_audio/short_conv_873.wav 111111 How many players will play the game? A. Two; B. Three; C. Four. B True +short_conv_874 gaokao_audio/short_conv_874.wav 111111 What meal are the speakers about to eat? A. Breakfast; B. Lunch C. Dinner. C True +short_conv_875 gaokao_audio/short_conv_875.wav 111111 What color is the sofa? A. 
Brown; B. White; C. Blue. B True +short_conv_876 gaokao_audio/short_conv_876.wav 111111 When does the conversation take place? A. In the morning; B. In the afternoon; C. In the evening. A True +short_conv_877 gaokao_audio/short_conv_877.wav 111111 What is the man’s opinion on British food? A. Unhealthy; B. Tasteless; C. Excellent. C True +short_conv_878 gaokao_audio/short_conv_878.wav 111111 What are the speakers mainly talking about? A. The man’s hobby; B. A holiday plan C. Their childhood. A True +short_conv_879 gaokao_audio/short_conv_879.wav 111111 What is the woman trying to do? A. Hold a party for the man; B. Comfort the man; C. Apologize to the man. B True +short_conv_88 gaokao_audio/short_conv_88.wav 111111 Where does the conversation probably take place? A. In a library; B. In a classroom; C. In an office. B True +short_conv_880 gaokao_audio/short_conv_880.wav 111111 When can the woman take a vacation? A. At the end of August; B. At the end of June; C. This week. A True +short_conv_881 gaokao_audio/short_conv_881.wav 111111 What are the speakers talking about? A. Who will go to the party; B. Whether the party is formal; C. What the woman will wear to the party. C True +short_conv_882 gaokao_audio/short_conv_882.wav 111111 What does the woman mean? A. She is unhappy today; B. She is satisfied with life too; C. She is as cheerful as the man. A True +short_conv_883 gaokao_audio/short_conv_883.wav 111111 What will the man do next? A. Read a book; B. Take a picture; C. Play with a boy. B True +short_conv_884 gaokao_audio/short_conv_884.wav 111111 What did the man like when he was in college? A. Classical music; B. Rock music; C. Pop music. B True +short_conv_885 gaokao_audio/short_conv_885.wav 111111 Where are the speakers? A. On a train; B. In a taxi; C. On a bus. C True +short_conv_886 gaokao_audio/short_conv_886.wav 111111 What does the man advise the woman to do? A. Wait for the dining cart; B. Buy some sandwiches; C. Have some drinks. A True +short_conv_887 gaokao_audio/short_conv_887.wav 111111 How many children did the man take to the beach? A. Two; B. Three; C. Four. B True +short_conv_888 gaokao_audio/short_conv_888.wav 111111 Why does the woman call? A. To offer assistance; B. To cancel a meeting; C. To make an apology. A True +short_conv_889 gaokao_audio/short_conv_889.wav 111111 What is Stan's job? A. A chef; B. A teacher; C. A photographer. C True +short_conv_89 gaokao_audio/short_conv_89.wav 111111 How does the woman sound? A. Unhappy; B. Excited; C. Surprised. A True +short_conv_890 gaokao_audio/short_conv_890.wav 111111 What will the man do first? A. Work overtime; B. Walk the dog; C. Do some exercise. C True +short_conv_891 gaokao_audio/short_conv_891.wav 111111 What are the speakers mainly talking about? A. A French exam; B. An interpreter course; C. A job opportunity. C True +short_conv_892 gaokao_audio/short_conv_892.wav 111111 What is the man’s favorite season? A. Spring; B. Autumn; C. Winter. B True +short_conv_893 gaokao_audio/short_conv_893.wav 111111 Where does the conversation take place? A. In a hotel; B. In a library; C. In a museum. B True +short_conv_894 gaokao_audio/short_conv_894.wav 111111 How will the speakers tour the city? A. By walking; B. By driving a car; C. By taking the bus. C True +short_conv_895 gaokao_audio/short_conv_895.wav 111111 At what time will the woman arrive at the office tomorrow? A. 8:00; B. 8:30 C. 9:00. A True +short_conv_896 gaokao_audio/short_conv_896.wav 111111 What’s the woman’s attitude to McDonald’s? A. Supportive; B. 
Neutral; C. Opposed. C True +short_conv_897 gaokao_audio/short_conv_897.wav 111111 How does the woman feel? A. Satisfied; B. Discouraged; C. Excited. B True +short_conv_898 gaokao_audio/short_conv_898.wav 111111 Where are the speakers? A. At home; B. In the office; C. In the stadium. A True +short_conv_899 gaokao_audio/short_conv_899.wav 111111 Where does the woman want to go? A. Chicago; B. Atlanta; C. Denver. B True +short_conv_9 gaokao_audio/short_conv_9.wav 111111 What is the relationship between the woman and Dr. Philips? A. Hostess and gardener; B. Neighbors; C. Doctor and patient. B True +short_conv_90 gaokao_audio/short_conv_90.wav 111111 What is the probable relationship between the speakers? A. Customer and salesman; B. Interviewer and interviewee; C. Boss and secretary. B True +short_conv_900 gaokao_audio/short_conv_900.wav 111111 What are the speakers probably doing? A. Preparing for camping; B. Buying sleeping bags; C. Cleaning up the car. A True +short_conv_901 gaokao_audio/short_conv_901.wav 111111 What does the man mean about Mr. Dale? A. He’s become director of the department; B. He gets on very well with his colleague; C. He’s the focus of people’s attention. C True +short_conv_902 gaokao_audio/short_conv_902.wav 111111 What is the man doing? A. Shopping with his son; B. Buying a gift for a kid; C. Bargaining with a salesgirl. B True +short_conv_903 gaokao_audio/short_conv_903.wav 111111 How did the woman feel about the movie? A. Interesting; B. Boring; C. Instructive. B True +short_conv_904 gaokao_audio/short_conv_904.wav 111111 When will the plane arrive? A. At 3: 30pm; B. At 4:00pm; C. At 4: 30pm. C True +short_conv_905 gaokao_audio/short_conv_905.wav 111111 Where are the two speakers? A. At the airport; B. At the hotel; C. At the railway station. C True +short_conv_906 gaokao_audio/short_conv_906.wav 111111 How does the man feel about his food. A. Hot B. Salty C. Tasteless C True +short_conv_907 gaokao_audio/short_conv_907.wav 111111 How much does the woman have to pay? A. $2 B. $3 C. $4 B True +short_conv_908 gaokao_audio/short_conv_908.wav 111111 What is the relationship between the speakers? A. Colleagues; B. Waiter and customer; C. Boss and employee. A True +short_conv_909 gaokao_audio/short_conv_909.wav 111111 When does the rainy season start? A. In January; B. In February; C. In November. C True +short_conv_91 gaokao_audio/short_conv_91.wav 111111 What are the speakers mainly talking about? A. The woman's paper; B. The weekend plan; C. Outdoor activities. B True +short_conv_910 gaokao_audio/short_conv_910.wav 111111 Where are the speakers? A. In a restaurant; B. In a hospital; C. In a shop. A True +short_conv_911 gaokao_audio/short_conv_911.wav 111111 What is the weather like in the man’s hometown? A. Warm; B. Comfortable; C. Wet. B True +short_conv_912 gaokao_audio/short_conv_912.wav 111111 What does the woman want to learn next year? A. Math; B. Science; C. Law. C True +short_conv_913 gaokao_audio/short_conv_913.wav 111111 How does the man feel now? A. Cheerful; B. Unhappy; C. Afraid. B True +short_conv_914 gaokao_audio/short_conv_914.wav 111111 Who is possibly at college? A. Liza; B. Peter; C. Grace. C True +short_conv_915 gaokao_audio/short_conv_915.wav 111111 How does the woman know about the Second World War? A. She experienced it; B. She saw a film about it; C. She played a game about it. A True +short_conv_916 gaokao_audio/short_conv_916.wav 111111 What is the woman probably doing? A. Drinking water; B. Washing a brush; C. Painting a picture. 
C True +short_conv_917 gaokao_audio/short_conv_917.wav 111111 Why did the man leave his previous job? A. The pay wasn’t good; B. It kept him busy every day; C. There’s no room for development. C True +short_conv_918 gaokao_audio/short_conv_918.wav 111111 What is the man going to do? A. Write his paper; B. Visit Professor Green; C. Go to the cinema with the woman. B True +short_conv_919 gaokao_audio/short_conv_919.wav 111111 What does the man think of this summer? A. It will be nice; B. It will be hotter; C. It will be cool. B True +short_conv_92 gaokao_audio/short_conv_92.wav 111111 Where does this conversation probably take place? A. At a bus stop; B. On the street; C. At an information desk C True +short_conv_920 gaokao_audio/short_conv_920.wav 111111 What are the speakers mainly talking about? A. A book; B. Their children; C. Education. A True +short_conv_921 gaokao_audio/short_conv_921.wav 111111 Where will the speakers read the book together? A. At the woman’s house; B. In the library; C. At the school café. C True +short_conv_922 gaokao_audio/short_conv_922.wav 111111 What is the time now? A. 8:00 p.m; B. 7:30 p.m; C. 7:00 p.m. C True +short_conv_923 gaokao_audio/short_conv_923.wav 111111 How does the woman get the information she wants? A. From the TV; B. From the newspaper; C. From the Internet. C True +short_conv_924 gaokao_audio/short_conv_924.wav 111111 What does the man want to do now? A. Have supper; B. Set the table; C. Watch the news. C True +short_conv_925 gaokao_audio/short_conv_925.wav 111111 Where will the speakers read the book together? A. At the woman’s house; B. In the library; C. At the school café. C True +short_conv_926 gaokao_audio/short_conv_926.wav 111111 What are the speakers mainly talking about? A. A book; B. Their children; C. Education. A True +short_conv_927 gaokao_audio/short_conv_927.wav 111111 What is the time now? A. 8:00 p.m; B. 7:30 p.m; C. 7:00 p.m. C True +short_conv_928 gaokao_audio/short_conv_928.wav 111111 How does the woman get the information she wants? A. From the TV; B. From the newspaper; C. From the Internet. C True +short_conv_929 gaokao_audio/short_conv_929.wav 111111 What does the man want to do now? A. Have supper; B. Set the table; C. Watch the news. C True +short_conv_93 gaokao_audio/short_conv_93.wav 111111 Which flight will the man take? A. 10: 45; B. 12: 00; C. 14: 50. A True +short_conv_930 gaokao_audio/short_conv_930.wav 111111 What do the speakers mainly talk about? A. Have a class reunion; B. Plan a birthday gift; C. Visit a family member. C True +short_conv_931 gaokao_audio/short_conv_931.wav 111111 What is the man’s job? A. He is a model; B. He is a designer; C. He is a salesman. C True +short_conv_932 gaokao_audio/short_conv_932.wav 111111 Where is Jacob now? A. At home; B. At school; C. At a supermarket. B True +short_conv_933 gaokao_audio/short_conv_933.wav 111111 What does the man want to order? A. French fries; B. A large soft drink; C. A bacon sandwich. C True +short_conv_934 gaokao_audio/short_conv_934.wav 111111 What is the weather like row? A. Sunny; B. Cloudy; C. Rainy. B True +short_conv_935 gaokao_audio/short_conv_935.wav 111111 What is the probable relationship between the speakers? A. Friends; B. Boss and employee; C. Salesman and customer. B True +short_conv_936 gaokao_audio/short_conv_936.wav 111111 What does the man usually do on the weekend? A. Go to the movies; B. Meet up with friends; C. Read books at home. 
C True +short_conv_937 gaokao_audio/short_conv_937.wav 111111 What subject does the man have trouble with? A. Math; B. Science; C. English. A True +short_conv_938 gaokao_audio/short_conv_938.wav 111111 What did the woman get from her mother? A. A new CD B. A new bike; C. A birthday card. B True +short_conv_939 gaokao_audio/short_conv_939.wav 111111 When will the football program begin? A. At 7:00; B. At 8:25; C. At 9:30. C True +short_conv_94 gaokao_audio/short_conv_94.wav 111111 How much will the woman pay? A. $15; B. $20; C. $25. B True +short_conv_940 gaokao_audio/short_conv_940.wav 111111 What's the probable relationship between the speakers? A. Husband and wife; B. Doctor and patient; C. Teacher and student. B True +short_conv_941 gaokao_audio/short_conv_941.wav 111111 What does the man think Michael has been doing this week? A. Going to class; B. Resting at home; C. Looking for a job. C True +short_conv_942 gaokao_audio/short_conv_942.wav 111111 What is the woman going to do? A. Pray for good luck; B. Prepare for a debate; C. Study for the final exam. B True +short_conv_943 gaokao_audio/short_conv_943.wav 111111 What does the man think of the jacket? A. It has too many pockets; B. It's great for everyday use; C. It is suitable for outdoor activities. C True +short_conv_944 gaokao_audio/short_conv_944.wav 111111 What impressed the man? A. The fresh air; B. The heavy rain; C. The kind woman. A True +short_conv_945 gaokao_audio/short_conv_945.wav 111111 What is the man’s secret ingredient? A. The spice; B. The cheese; C. The red wine. C True +short_conv_946 gaokao_audio/short_conv_946.wav 111111 What is the man going to do after work? A. Go back home; B. Go to the dancing club; C. Go for a drink. C True +short_conv_947 gaokao_audio/short_conv_947.wav 111111 What day is it today? A. Friday; B. Saturday; C. Sunday. B True +short_conv_948 gaokao_audio/short_conv_948.wav 111111 Where does the conversation take place probably? A. In a concert; B. In a ball; C. In a CD store. B True +short_conv_949 gaokao_audio/short_conv_949.wav 111111 How does the man feel about his trip? A. Regretful; B. Meaningful; C. Happy. A True +short_conv_95 gaokao_audio/short_conv_95.wav 111111 What does the man need? A. Coffee; B. Sprite; C. Orange juice. C True +short_conv_950 gaokao_audio/short_conv_950.wav 111111 What is the woman probably doing? A. Drinking water; B. Washing a brush; C. Painting a picture. C True +short_conv_951 gaokao_audio/short_conv_951.wav 111111 How does the woman know about the Second World War? A. She experienced it; B. She saw a film about it; C. She played a game about it. A True +short_conv_952 gaokao_audio/short_conv_952.wav 111111 Why did the man leave his previous job? A. The pay wasn’t good; B. It kept him busy every day; C. There’s no room for development. C True +short_conv_953 gaokao_audio/short_conv_953.wav 111111 What does the man think of this summer? A. It will be nice; B. It will be hotter; C. It will be cool. B True +short_conv_954 gaokao_audio/short_conv_954.wav 111111 What is the man going to do? A. Write his paper; B. Visit Professor Green; C. Go to the cinema with the woman. B True +short_conv_955 gaokao_audio/short_conv_955.wav 111111 What does the woman think of the movie? A. Interesting; B. Exciting; C. Disappointing. C True +short_conv_956 gaokao_audio/short_conv_956.wav 111111 How much does the man pay for the book? A. $ 20; B. $ 12.5; C. $ 7.5. B True +short_conv_957 gaokao_audio/short_conv_957.wav 111111 When will the basketball match take place? A. 
On Tuesday; B. On Thursday; C. On Friday. C True +short_conv_958 gaokao_audio/short_conv_958.wav 111111 What does the woman suggest doing? A. Going by bus; B. Going by car; C. Going on foot. A True +short_conv_959 gaokao_audio/short_conv_959.wav 111111 What does the man do most weekends? A. He goes shopping; B. He watches matches; C. He visits museums. B True +short_conv_96 gaokao_audio/short_conv_96.wav 111111 What does the man mean? A. Violence sports are to blame for the school bullying; B. Violence sports serve as an escape for negative emotions; C. Violence sports should expand the fans base in the long run. B True +short_conv_960 gaokao_audio/short_conv_960.wav 111111 What does the man want to find? A. His pencils; B. His books; C. His bag. A True +short_conv_961 gaokao_audio/short_conv_961.wav 111111 Where can the woman be? A. At the library; B. At the doctor’s; C. At a bookshop. B True +short_conv_962 gaokao_audio/short_conv_962.wav 111111 Who are the speakers going to see? A. The woman’s father; B. The man’s uncle; C. The man’s father. C True +short_conv_963 gaokao_audio/short_conv_963.wav 111111 Where does the woman suggest going? A. To the cinema; B. To the bookstore; C. To the shopping mall. C True +short_conv_964 gaokao_audio/short_conv_964.wav 111111 When will the meeting start? A. At 8:15; B. At 8:45; C. At 9:00. C True +short_conv_965 gaokao_audio/short_conv_965.wav 111111 How does the man learn to be a Christmas father? A. By playing with children; B. By learning from his father C. By going to a school C True +short_conv_966 gaokao_audio/short_conv_966.wav 111111 What does the woman want to do on Wednesday? A. Do her homework; B. See a movie; C. Have football practice. B True +short_conv_967 gaokao_audio/short_conv_967.wav 111111 What does the man think of artificial intelligence? A. It will help doctors; B. It will treat patients; C. It will replace doctors. A True +short_conv_968 gaokao_audio/short_conv_968.wav 111111 Who will pick up the woman at the airport? A. The man's brother; B. The woman's father; C. The man's co—worker A True +short_conv_969 gaokao_audio/short_conv_969.wav 111111 What does the man want to have? A. Coffee; B. Orange juice; C. Soda. B True +short_conv_97 gaokao_audio/short_conv_97.wav 111111 What kinds of vocation does the man probably prefer? A. Casual; B. Extended; C. Well-planned. C True +short_conv_970 gaokao_audio/short_conv_970.wav 111111 What are the speakers mainly talking about? A. A present for Molly; B. A birthday party; C. A musician’s life. A True +short_conv_971 gaokao_audio/short_conv_971.wav 111111 What can we know about the man? A. He had just joined a new team; B. He was praised by the manager; C. He evaluated others’ performances, B True +short_conv_972 gaokao_audio/short_conv_972.wav 111111 What will the speakers do first after watching the dolphin show? A. See the elephants; B. Drink a cup of tea; C. Have a watch repaired. B True +short_conv_973 gaokao_audio/short_conv_973.wav 111111 Where does the conversation probably take place? A. At a restaurant; B. In the street; C. At home. C True +short_conv_974 gaokao_audio/short_conv_974.wav 111111 How did the man get the book? A. He received it from his friend; B. He bought it from a bookstore; C. He borrowed it from the library. C True +short_conv_975 gaokao_audio/short_conv_975.wav 111111 Why was the woman so late? A. She didn’t catch the bus; B. She took somebody to hospital; C. Something went wrong with the bus. 
C True +short_conv_976 gaokao_audio/short_conv_976.wav 111111 What does the man mean? A. He preferred to go to the basketball game; B. He didn’t like to go to the basketball game; C. He enjoyed watching the football game on TV. B True +short_conv_977 gaokao_audio/short_conv_977.wav 111111 How does the woman feel about Tom? A. He gets nervous very easily; B. He is not used to making a speech; C. He is extremely good at making a speech. B True +short_conv_978 gaokao_audio/short_conv_978.wav 111111 What does the woman want the man to do first? A. Fix the shelf; B. Paint the car; C. Look for his key. A True +short_conv_979 gaokao_audio/short_conv_979.wav 111111 What is the man probably doing? A. Writing a story; B. Telling a joke; C. Watching a movie. C True +short_conv_98 gaokao_audio/short_conv_98.wav 111111 What does the woman suggest the man do? A. Ask the repair store to fix the calculator; B. Borrow the tools needed to fix the calculator; C. Figure out what is wrong with the calculator. A True +short_conv_980 gaokao_audio/short_conv_980.wav 111111 When can the man eat in the restaurant? A. At 11:00 a.m; B. At 6:00 a.m; C. At 10:00 p.m. A True +short_conv_981 gaokao_audio/short_conv_981.wav 111111 What problem is the man facing? A. His room is very dirty; B. He feels a pain in the neck; C. His roommate is annoying him. C True +short_conv_982 gaokao_audio/short_conv_982.wav 111111 What will the man do on Sunday? A. See a movie; B. Do his homework; C. Go shopping. A True +short_conv_983 gaokao_audio/short_conv_983.wav 111111 How does the woman feel about the book? A. challenging B. Boring; C. Interesting. C True +short_conv_984 gaokao_audio/short_conv_984.wav 111111 What will the woman do at 3:30 p.m.? A. Meet Miss Lee; B. Attend a meeting; C. Call Roland. B True +short_conv_985 gaokao_audio/short_conv_985.wav 111111 What is the man doing? A. Searching for an app; B. Learning words; C. Playing a game. B True +short_conv_986 gaokao_audio/short_conv_986.wav 111111 What’s the probable relationship between the speakers? A. Classmates; B. Colleagues; C. Strangers. C True +short_conv_987 gaokao_audio/short_conv_987.wav 111111 What are the speakers talking about? A. A traffic accident; B. Traffic rules; C. A student’s fault. A True +short_conv_988 gaokao_audio/short_conv_988.wav 111111 What does the woman think of the food in the restaurant? A. Sweet; B. Good; C. Expensive. C True +short_conv_989 gaokao_audio/short_conv_989.wav 111111 What is the probable relationship between the speakers? A. Colleagues; B. Brother and sister; C. Teacher and student. A True +short_conv_99 gaokao_audio/short_conv_99.wav 111111 What has the man assumed about the woman? A. She has made a great fortune; B. She has the foggiest idea of business; C. She is a gifted stock operator. B True +short_conv_990 gaokao_audio/short_conv_990.wav 111111 What is the man doing? A. Eating dessert; B. Reading a book; C. Taking out the rubbish. B True +short_conv_991 gaokao_audio/short_conv_991.wav 111111 How long did Maria stay in France? A. One year; B. Two years; C. Three years. B True +short_conv_992 gaokao_audio/short_conv_992.wav 111111 Where will the woman meet the driver? A. Next to the bank; B. Beside the bus stop; C. Opposite the theater. C True +short_conv_993 gaokao_audio/short_conv_993.wav 111111 Why did the man come home late? A. He ate out with his friend; B. He studied at schoo1; C. He watched a match. C True +short_conv_994 gaokao_audio/short_conv_994.wav 111111 What did the woman probably win? A. A television; B. 
$64 in cash; C. A radio. A True +short_conv_995 gaokao_audio/short_conv_995.wav 111111 What is the woman complaining about? A. The bad traffic; B. Her early work schedule; C. The annoying construction. C True +short_conv_996 gaokao_audio/short_conv_996.wav 111111 Where might the  speakers be? A. In a park; B. In a classroom; C. In a gym. A True +short_conv_997 gaokao_audio/short_conv_997.wav 111111 What’s the man? A. A waiter; B. An accountant; C. A programmer. A True +short_conv_998 gaokao_audio/short_conv_998.wav 111111 What does the man want the woman to do? A. Clean up the house; B. play chess with him; C. Have a chat with him. A True +short_conv_999 gaokao_audio/short_conv_999.wav 111111 What’s the weather like tomorrow? A. Rainy; B. Windy; C. Sunny A True diff --git a/WavLLM/wavllm/test_data/sv.tsv b/WavLLM/wavllm/test_data/sv.tsv new file mode 100644 index 0000000000000000000000000000000000000000..bdeaa8b20ee4ac7b2a855e78dce0ab3cfed4628e --- /dev/null +++ b/WavLLM/wavllm/test_data/sv.tsv @@ -0,0 +1,2 @@ +id audio n_frames prompt tgt_text with_speech +0 SpeechT5/WavLLM/fairseq/examples/wavllm/test_data/audio/sv.wav 351362 Is there only one speaker in the audio clip? Yes True diff --git a/WavLLM/wavllm/tokenizer/tokenizer.model b/WavLLM/wavllm/tokenizer/tokenizer.model new file mode 100644 index 0000000000000000000000000000000000000000..22bccbcb41ec929cf0c9dbe8f41036db82e5e773 Binary files /dev/null and b/WavLLM/wavllm/tokenizer/tokenizer.model differ diff --git a/YiTrans/.gitignore b/YiTrans/.gitignore new file mode 100644 index 0000000000000000000000000000000000000000..d9722a33e6ce7c45f6e06ef1b25a51a0684be986 --- /dev/null +++ b/YiTrans/.gitignore @@ -0,0 +1,2 @@ +**/__pycache__ + diff --git a/YiTrans/exp_scripts/finetune_ASR/finetune_hubert24_mbart24_en.sh b/YiTrans/exp_scripts/finetune_ASR/finetune_hubert24_mbart24_en.sh new file mode 100644 index 0000000000000000000000000000000000000000..4f6a2d443fe739f3f31b7458c7e08da13db5a639 --- /dev/null +++ b/YiTrans/exp_scripts/finetune_ASR/finetune_hubert24_mbart24_en.sh @@ -0,0 +1,67 @@ +world_size=$1 +update_freq=$2 +[ -z $world_size ] && world_size=8 +[ -z $update_freq ] && update_freq=8 + +EXP_NAME=train_iwslt_asr_hubert24_mbart24_norel +SAVE_DIR=${HOME}/data/iwslt/asr_v3/${EXP_NAME} + +DATA_ROOT=${HOME}/dataset/iwslt_mustc +LABEL_DIR=${DATA_ROOT}/fine-tune_en_bpe250k +SP_PATH=${LABEL_DIR}/sentence.bpe.model +retain_dict=${LABEL_DIR}/index_en_onlyMUSTC +W2V_PATH=${HOME}/dataset/iwslt_mustc/pretrain_ed_model_cfg.pt + +TRAIN_SUBSET=train_asr_MUSTC +VALID_SUBSET=dev_asr_MUSTC + + +mbart_path="/mnt/default/v-junyiao/released_exsp/mbart50.pretrained/model.pt" +hubert_path="/mnt/default/v-junyiao/speechexp/fairseq_mlst/hubert_large_librivox_released/checkpoint_last.pt" + +CODE_ROOT=${HOME}/code/SpeechT5/YiTrans + +python $CODE_ROOT/fairseq/fairseq_cli/hydra_train.py \ + --config-dir $CODE_ROOT/yitrans_iwslt22/config/finetune_asr \ + --config-name large_mustc \ + common.user_dir=$CODE_ROOT/yitrans_iwslt22 \ + distributed_training.distributed_world_size=$world_size \ + optimization.update_freq=[$update_freq] \ + \ + dataset.max_tokens=400001 \ + dataset.num_workers=0 \ + optimization.max_update=120000 \ + \ + task._name="iwslt_joint_pretraining" \ + task.data=${DATA_ROOT} \ + task.label_dir=${LABEL_DIR} \ + +task.store_labels=True \ + task.hubert_tokenizer="sentencepiece" \ + task.sp_path=${SP_PATH} \ + task.max_keep_size=400000 \ + criterion.dec_weight=0.5 \ + \ + model._name="yitrans_asr" \ + model.w2v_path=${W2V_PATH} \ + 
+model.reuse_text_emb=true \ + +model.share_ctc_decoder_embed=true \ + +model.retain_dict_path=${retain_dict} \ + model.freeze_finetune_updates=15000 \ + \ + +model.no_pretrained_weights=true \ + +model.use_rel_pos_enc=false \ + +model.encoder_layers=24 \ + +model.add_text_encoder=true \ + +model.share_s2t_t2t_embeddings=false \ + +model.share_enc_dec_embeddings=false \ + +model.add_adaptor=false \ + +model.load_pretrained_w2v_from=$hubert_path \ + +model.load_pretrained_mbart_from=$mbart_path \ + \ + dataset.train_subset=${TRAIN_SUBSET} \ + dataset.valid_subset=${VALID_SUBSET} \ + checkpoint.save_dir=${SAVE_DIR} \ + common.tensorboard_logdir=${SAVE_DIR} \ + hydra.run.dir=${SAVE_DIR} \ + hydra.job.name=${EXP_NAME} + diff --git a/YiTrans/exp_scripts/finetune_MT/finetune_mbart_en-de.sh b/YiTrans/exp_scripts/finetune_MT/finetune_mbart_en-de.sh new file mode 100644 index 0000000000000000000000000000000000000000..8abce531d8e3b63af5bde5105d4bcd42404176e1 --- /dev/null +++ b/YiTrans/exp_scripts/finetune_MT/finetune_mbart_en-de.sh @@ -0,0 +1,75 @@ +##################################### +# Hubert ED model # +##################################### +[ $# -gt 2 ] && echo "Usage: $0 <world_size> <update_freq> [w2v_path] [mbart_path]" && exit 0 +world_size=$1 +update_freq=$2 +w2v_path=$3 +mbart_path=$4 + +[ -z $world_size ] && world_size=8 +[ -z $update_freq ] && update_freq=2 +[ -z $w2v_path ] && w2v_path=${HOME}/dataset/iwslt_mustc/pretrain_ed_model_cfg.pt +[ -z $mbart_path ] && mbart_path="/mnt/default/v-junyiao/released_exsp/mbart50.pretrained/model.pt" +langs=ar_AR,cs_CZ,de_DE,en_XX,es_XX,et_EE,fi_FI,fr_XX,gu_IN,hi_IN,it_IT,ja_XX,kk_KZ,ko_KR,lt_LT,lv_LV,my_MM,ne_NP,nl_XX,ro_RO,ru_RU,si_LK,tr_TR,vi_VN,zh_CN,af_ZA,az_AZ,bn_IN,fa_IR,he_IL,hr_HR,id_ID,ka_GE,km_KH,mk_MK,ml_IN,mn_MN,mr_IN,pl_PL,ps_AF,pt_XX,sv_SE,sw_KE,ta_IN,te_IN,th_TH,tl_XX,uk_UA,ur_PK,xh_ZA,gl_ES,sl_SI + +DATA_DIR=/mnt/default/lozhou/speechdata/mt_data/en-de/com-filter-ende/bin-idx +exp_name=tune_mbart_com_filter_le-4 +SAVE_DIR="${HOME}/data/iwslt/mt_stage1_en-de/$exp_name" +[ -d $SAVE_DIR ] || mkdir -p $SAVE_DIR + +CODE_ROOT=${HOME}/code/SpeechT5/YiTrans + +python $CODE_ROOT/fairseq/fairseq_cli/hydra_train.py \ + --config-dir $CODE_ROOT/yitrans_iwslt22/config/finetune_mt \ + --config-name mt_translation \ + common.user_dir=$CODE_ROOT/yitrans_iwslt22 \ + distributed_training.distributed_world_size=${world_size} \ + optimization.update_freq=[$update_freq] \ + \ + +task.data=$DATA_DIR \ + +task.source_lang="en_XX" +task.target_lang="de_DE" \ + +task.langs=\"$langs\" \ + +task.normalize=false \ + +task.append_source_id=true \ + \ + +model.dropout=0.2 \ + +model.attention_dropout=0.1 \ + model.activation_dropout=0.1 \ + model.decoder_layerdrop=0 \ + model.layerdrop=0 \ + model.freeze_finetune_updates=0 \ + \ + model.w2v_path=$w2v_path \ + +model.no_pretrained_weights=true \ + +model.load_pretrained_mbart_from=$mbart_path \ + +model.share_enc_dec_embeddings=true \ + +model.share_s2t_t2t_embeddings=false \ + +model.use_rel_pos_enc=false \ + \ + dataset.train_subset="train" \ + dataset.valid_subset="valid" \ + dataset.num_workers=4 \ + dataset.max_tokens=2000 \ + \ + optimization.max_epoch=50 \ + optimization.clip_norm=5 \ + optimization.max_update=200000 \ + lr_scheduler.total_num_update=200000 \ + \ + checkpoint.save_interval=1 \ + checkpoint.save_interval_updates=5000 \ + checkpoint.keep_last_epochs=5 \ + checkpoint.keep_best_checkpoints=5 \ + \ + common.seed=222 \ + common.log_interval=100 \ + common.log_format="json" \ + \ + 
checkpoint.best_checkpoint_metric="accuracy" \ + checkpoint.maximize_best_checkpoint_metric=true \ + common.tensorboard_logdir=$SAVE_DIR \ + checkpoint.save_dir=$SAVE_DIR \ + hydra.run.dir=$SAVE_DIR \ + hydra.job.name=$exp_name + diff --git a/YiTrans/exp_scripts/finetune_ST/en-de/jtst_pt36s2_mustc.sh b/YiTrans/exp_scripts/finetune_ST/en-de/jtst_pt36s2_mustc.sh new file mode 100644 index 0000000000000000000000000000000000000000..df763ee898c778504827bc77607496e283fea114 --- /dev/null +++ b/YiTrans/exp_scripts/finetune_ST/en-de/jtst_pt36s2_mustc.sh @@ -0,0 +1,83 @@ +world_size=$1 +update_freq=$2 +[ -z $world_size ] && world_size=8 +[ -z $update_freq ] && update_freq=4 + +DATA_DIR=/mnt/default/lozhou/speechdata/st_data/en-de/com2-ende-newmt +EXP_NAME="jt_st_mustc_large_stage2_300k_11sets" +SAVE_DIR=/mnt/default/v-ziqzhang/data/iwslt/st_en-de_v4/${EXP_NAME} +retain_dict=/mnt/default/v-junyiao/dataset/iwslt/en-de/released/analyse/index_asr_st_onlyMUSTC +W2V_PATH1=/mnt/default/v-junyiao/speechexp/train_speech_text_joint_addadaptor_bpecode_large_step1_mbartpt_400k/checkpoint_last.pt +W2V_PATH2=/mnt/default/v-junyiao/speechexp/fairseq_mlst/train_speech_text_joint_adaptor_large_step2_300k/checkpoint_last.pt +mkdir -p ${SAVE_DIR} + +FAIRSEQ_ROOT=/mnt/default/v-ziqzhang/code/fairseq_mlst + +python $FAIRSEQ_ROOT/fairseq_cli/train.py ${DATA_DIR} \ + --save-dir ${SAVE_DIR} \ + --user-dir examples/speech_text_joint_to_text \ + --task speech_text_joint_to_text \ + --config-yaml config_step1_39k.yaml \ + --train-subset "train_11set_st_addsrc" \ + --valid-subset "dev_mustc2_en_de_addsrc_st" \ + --fp16 \ + --seed 1 \ + \ + --ddp-backend no_c10d \ + --distributed-world-size ${world_size} \ + --tensorboard-logdir ${SAVE_DIR} \ + \ + --criterion guided_label_smoothed_cross_entropy_with_accuracy \ + --label-smoothing 0.3 \ + --guide-alpha 0.8 \ + --disable-text-guide-update-num 5000 \ + --attentive-cost-regularization 0.02 \ + \ + --optimizer adam \ + --clip-norm 1.0 \ + --lr 5e-05 \ + --lr-scheduler polynomial_decay --warmup-updates 5000 \ + --warmup-updates 5000 \ + --max-update 200000 \ + --total-num-update 200000 \ + --update-freq ${update_freq} \ + \ + --max-tokens 450000 \ + --max-sentences 3 \ + --max-tokens-valid 500000 \ + --max-source-positions 450000 \ + --skip-invalid-size-inputs-valid-test \ + --num-workers 0 \ + --save-interval 1 \ + --log-format json \ + --log-interval 100 \ + --best-checkpoint-metric "acc" \ + --maximize-best-checkpoint-metric \ + \ + --arch "hubert_st2t" \ + --w2v-path ${W2V_PATH1} \ + --load-step2-model-from ${W2V_PATH2} \ + --no-pretrained-weights \ + --add-decoder \ + --reuse-text-emb \ + --layerdrop 0.1 \ + --activation-dropout 0.1 \ + --decoder-layerdrop 0.1 \ + --freeze-finetune-updates 0 \ + --feature-grad-mult 1.0 \ + --retain-dict-path ${retain_dict} \ + --share-decoder-input-output-embed \ + --share-speech-text-embeddings \ + \ + --save-interval-updates 2000 \ + --keep-interval-updates 5 \ + --keep-interval-updates-pattern 10000 \ + --keep-last-epochs 5 \ + \ + 2>&1 | tee ${SAVE_DIR}/train.log + +sleep 5s + + # --lr-scheduler inverse_sqrt \ + # --load-step2-model-from ${W2V_PATH2} \ + # --no-pretrained-weights \ diff --git a/YiTrans/exp_scripts/pretrain/pretrain_pt36_adaptor_step1.sh b/YiTrans/exp_scripts/pretrain/pretrain_pt36_adaptor_step1.sh new file mode 100644 index 0000000000000000000000000000000000000000..8b7a55a93f6a2d9e3a002ff9e1d8958676e46ed3 --- /dev/null +++ b/YiTrans/exp_scripts/pretrain/pretrain_pt36_adaptor_step1.sh @@ -0,0 +1,46 @@ +export 
HYDRA_FULL_ERROR=1 +YiTrans=/home/v-ziqzhang/Code/SpeechT5/YiTrans +DATA_DIR=/mnt/default/lozhou/speechdata/hubert_data +LABEL_DIR=${DATA_DIR}/layer9_k500_label +SP_PATH=${LABEL_DIR}/spm_unigram8000.model +TEXT_DATA_DIR=/mnt/default/lozhou/speechdata/text_data/v3/bin_idx_step1 +EXP_NAME=pretrain_pt36_addadaptor_bpecode_large_step1 +SAVE_DIR=${HOME}/data/speechexp/${EXP_NAME} +W2V_PATH=${HOME}/data/speechexp/hubert_large_librivox_released/checkpoint_last.pt +MBART_PATH=${HOME}/data/speechexp/mbart50.pretrained/model.pt + +python ${YiTrans}/fairseq/fairseq_cli/hydra_train.py \ + --config-dir ${YiTrans}/yitrans_iwslt22/config/pretrain \ + --config-name joint_large \ + common.user_dir=${YiTrans}/yitrans_iwslt22 \ + \ + task.data=$DATA_DIR \ + task.labels='["km"]' \ + task.label_dir=$LABEL_DIR \ + task.text_cfg.text_data=$TEXT_DATA_DIR \ + +task.hubert_tokenizer="sentencepiece" \ + +task.sp_path=${SP_PATH} \ + \ + model.label_rate=50 \ + model.encoder_layers=12 \ + +model.load_pretrained_w2v_from=${W2V_PATH} \ + +model.load_pretrained_mbart_from=${MBART_PATH} \ + \ + dataset.train_subset=\"train_LS,train_MUSTC+mono_deduped_filt_sort.en_XX.en_XX,mt8corpus_filt_slct.en_XX-de_DE\" \ + dataset.valid_subset=\"dev_MUSTC+valid.en_XX-de_DE,dev_MUSTC+valid.en_XX-ja_XX,dev_MUSTC+valid.en_XX-zh_CN,dev_MUSTC+dev4x.en_XX.en_XX\" \ + dataset.max_tokens=300000 \ + \ + distributed_training.distributed_world_size=8 \ + distributed_training.nprocs_per_node=8 \ + optimization.update_freq=[2] \ + \ + common.tensorboard_logdir=$SAVE_DIR \ + checkpoint.save_dir=$SAVE_DIR \ + hydra.run.dir=$SAVE_DIR \ + hydra.job.name=$EXP_NAME \ + checkpoint.reset_optimizer=true \ + checkpoint.reset_dataloader=true + + + + # dataset.train_subset=\"train_CV,train_EUR,train_LS,train_MUSTC,train_TEDLIUM,train_VP+mono_deduped_filt_sort.en_XX.en_XX,mt8corpus_filt_slct.en_XX-de_DE,mt8corpus_filt_slct.en_XX-ja_XX,mt8corpus_filt_slct.en_XX-zh_CN\" \ diff --git a/YiTrans/exp_scripts/pretrain/pretrain_pt36_adaptor_step2.sh b/YiTrans/exp_scripts/pretrain/pretrain_pt36_adaptor_step2.sh new file mode 100644 index 0000000000000000000000000000000000000000..756b93a7c2c7e8b20ef9551bdab2f9863388bb9c --- /dev/null +++ b/YiTrans/exp_scripts/pretrain/pretrain_pt36_adaptor_step2.sh @@ -0,0 +1,45 @@ +EXP_NAME=train_speech_text_joint_adaptor_large_step2_300k +SAVE_DIR=/datablob/users/v-junyiao/speechexp/fairseq_mlst/${EXP_NAME} +DATA_ROOT=/datablob/users/v-junyiao/speechdata/hubert_mlst +LABEL_DIR=${DATA_ROOT}/fine-tune_en_bpe250k_full +W2V_PATH=/mnt/default/v-junyiao/speechexp/train_speech_text_joint_addadaptor_bpecode_large_step1_mbartpt_400k/checkpoint_last_up.pt +TEXT_DATA_DIR=/datablob/users/v-junyiao/speechdata/text_data/v4/bin-idx +SP_PATH=${LABEL_DIR}/sentence.bpe.model +# export CUDA_VISIBLE_DEVICES=1 +python fairseq_cli/hydra_train.py \ + --config-dir examples/hubert/config/pretrain \ + --config-name pretrain_step2 \ + distributed_training.distributed_world_size=64 \ + distributed_training.nprocs_per_node=8 \ + \ + dataset.train_subset=\"train_COVOST,train_asr_VP,train_punc_TEDLIUM,train_asr_MUSTC,train_punc_LS,train_asr_EUR+covost2.en_XX-ja_XX,covost2.en_XX-zh_CN,covost_eurST.en_XX-de_DE,mt8corpus_domain45.en_XX-ja_XX,mt8corpus_filt_slct80_domain44.en_XX-de_DE,mt8corpus_filt_slct80_domain40.en_XX-zh_CN,train.en_XX-de_DE,train.en_XX-ja_XX,train.en_XX-zh_CN\" \ + dataset.valid_subset=\"dev_asr_MUSTC+valid.en_XX-de_DE,dev_asr_MUSTC+valid.en_XX-ja_XX,dev_asr_MUSTC+valid.en_XX-zh_CN\" \ + dataset.max_tokens=480001 \ + dataset.num_workers=0 \ + 
optimization.update_freq=[1] \
+    optimization.max_update=300000 \
+    \
+    task.hubert_tokenizer="sentencepiece" \
+    task.sp_path=${SP_PATH} \
+    task.max_keep_size=480000 \
+    +task.split_modality_batch=true \
+    +task.speech_tgt_lang="en_XX" \
+    +task.mbart_style_lang_id=true \
+    +task.text_sampling_alpha=1.0 \
+    +task.store_labels=true \
+    model.freeze_finetune_updates=15000 \
+    criterion.dec_weight=0.5 \
+    +model.reuse_text_emb=true \
+    +model.share_ctc_decoder_embed=true \
+    +model.share_speech_text_embeddings=true \
+    \
+    task.data=${DATA_ROOT} \
+    task.label_dir=${LABEL_DIR} \
+    task.text_cfg.text_data=${TEXT_DATA_DIR} \
+    model.w2v_path=${W2V_PATH} \
+    checkpoint.save_dir=${SAVE_DIR} \
+    common.tensorboard_logdir=${SAVE_DIR} \
+    hydra.run.dir=${SAVE_DIR} \
+    hydra.job.name=${EXP_NAME}
+
+sleep infinity
diff --git a/YiTrans/readme.md b/YiTrans/readme.md
new file mode 100644
index 0000000000000000000000000000000000000000..ea957c46fb011286fcee55516efcbebb42b01001
--- /dev/null
+++ b/YiTrans/readme.md
@@ -0,0 +1,98 @@
+# YiTrans@IWSLT22
+
+> [**YiTrans**](https://arxiv.org/abs/2206.05777) (```IWSLT 2022```): **The YiTrans End-to-End Speech Translation System for IWSLT 2022 Offline Shared Task**
+> Code is being merged into this repository; thanks for your attention.
+
+## Setup
+```bash
+git clone https://github.com/microsoft/SpeechT5.git
+git submodule update --init YiTrans/fairseq
+cd YiTrans/fairseq
+pip install -e .
+```
+
+## Data Preparation
+### Speech/ASR data for pre-training
+Please follow the steps of data preparation for HuBERT described [here](https://github.com/facebookresearch/fairseq/tree/main/examples/hubert#data-preparation).
+### Monolingual text data for pre-training
+Please follow the steps of data preparation for mBART described [here](https://github.com/facebookresearch/fairseq/tree/main/examples/mbart). We reuse the multilingual vocabulary.
+After getting your subset.{idx,bin} files ready, rename them as subset.lang.lang.{idx,bin}, e.g.
+```
+mono_deduped_filt_sort.en_XX.en_XX.bin
+mono_deduped_filt_sort.en_XX.en_XX.idx
+```
+### Bilingual text data for pre-training
+Prepare these the same way as the monolingual data, except that you should prepare both the source language and the target language. Rename the files as subset.src-tgt.{src,tgt}.{idx,bin}, e.g.
+```
+mt8corpus_filt_slct.en_XX-de_DE.de_DE.bin
+mt8corpus_filt_slct.en_XX-de_DE.de_DE.idx
+mt8corpus_filt_slct.en_XX-de_DE.en_XX.bin
+mt8corpus_filt_slct.en_XX-de_DE.en_XX.idx
+```
+
+### ST data for fine-tuning
+Please follow the steps of data preparation for S2T tasks [here](https://github.com/pytorch/fairseq/blob/main/examples/speech_to_text/docs/mustc_example.md). Your tsv file should look like this (see also the manifest-building sketch at the end of this readme):
+```
+id audio n_frames tgt_text speaker src_text src_lang tgt_lang
+ted_1_0 /mnt/speechdata/MUSTC/en-de/flac/ted_1_0.flac 25920 Hinter mir war gar keine Autokolonne. spk.1 There was no motorcade back there. en_XX de_DE
+ted_1_1 /mnt/speechdata/MUSTC/en-de/flac/ted_1_1.flac 219359 Haben Sie schon mal vom Phantomschmerz gehört? (Lachen) Wir saßen in einem gemieteten Ford Taurus. spk.1 (Laughter) You've heard of phantom limb pain? (Laughter) en_XX de_DE
+ted_1_2 /mnt/speechdata/MUSTC/en-de/flac/ted_1_2.flac 71360 Es war Zeit zum Abendessen und wir hielten Ausschau nach einem Restaurant. spk.1 It was dinnertime, and we started looking for a place to eat. en_XX de_DE
+```
+
+## Pre-train
+For example, to pre-train the PT36 model, please follow these steps:
+
+Step 0: Download the released [Hubert model](https://dl.fbaipublicfiles.com/hubert/hubert_large_ll60k.pt) and [mBART model](https://dl.fbaipublicfiles.com/fairseq/models/mbart50/mbart50.pretrained.tar.gz).
+
+Step 1: Pre-training with unlabeled speech data and monolingual/bilingual text data
+```bash
+bash YiTrans/exp_scripts/pretrain/pretrain_pt36_adaptor_step1.sh
+```
+
+Step 2: Pre-training with ASR data and domain-filtered bilingual text data
+```bash
+bash YiTrans/exp_scripts/pretrain/pretrain_pt36_adaptor_step2.sh
+```
+Other configurations, such as training PT48, can also be found in ./YiTrans/exp_scripts/pretrain; you may need to modify the PATH variables in the .sh files to match your data.
+
+## Fine-tune
+For example, to fine-tune the En-De ST model on the MuST-C dataset:
+```bash
+bash YiTrans/exp_scripts/finetune_ST/en-de/jtst_pt36s2_mustc.sh
+```
+Other configurations, such as different translation directions or datasets, can be found in ./YiTrans/exp_scripts/finetune_ST; you may need to modify the PATH variables in the .sh files to match your data.
+
+## Cascaded system
+You can also build a cascaded ST system (ASR+MT) with our codebase.
+1. ASR model: fine-tune from the combination of [Hubert Large](https://dl.fbaipublicfiles.com/hubert/hubert_large_ll60k.pt) and the [mBART model](https://dl.fbaipublicfiles.com/fairseq/models/mbart50/mbart50.pretrained.tar.gz):
+    ```bash
+    # change the mbart_path/hubert_path to your own paths in the .sh file
+    bash YiTrans/exp_scripts/finetune_ASR/finetune_hubert24_mbart24_en.sh
+    ```
+    Check the [`.sh`](exp_scripts/finetune_ASR/finetune_hubert24_mbart24_en.sh) file for more information about the configuration.
+
+2. MT model: fine-tune from the [mBART model](https://dl.fbaipublicfiles.com/fairseq/models/mbart50/mbart50.pretrained.tar.gz):
+
+    ```bash
+    # change the mbart_path to your own path in the .sh file
+    bash YiTrans/exp_scripts/finetune_MT/finetune_mbart_en-de.sh
+    ```
+    Check the [`.sh`](exp_scripts/finetune_MT/finetune_mbart_en-de.sh) file for more information about the configuration.
+
+## Reference
+
+If you find our work useful in your research, please cite the following paper:
+
+```bibtex
+@article{Zhang2022Yitrans,
+  title = {The YiTrans End-to-End Speech Translation System for IWSLT 2022 Offline Shared Task},
+  author = {Zhang, Ziqiang and Ao, Junyi and Zhou, Long and Liu, Shujie and Wei, Furu and Li, Jinyu},
+  eprint={2206.05777},
+  archivePrefix={arXiv},
+  primaryClass={cs.CL},
+  year={2022}
+}
+```
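+
+## Appendix: building an ST manifest (sketch)
+
+The snippet below is a minimal, illustrative sketch of how a tsv manifest in the format shown under "ST data for fine-tuning" could be assembled; it is not part of the released tooling. The audio path, speaker id, and sentences are placeholder values, and `n_frames` is read with the third-party `soundfile` package, which is assumed to be installed.
+
+```python
+import soundfile as sf
+
+# Placeholder utterances: (id, audio path, target text, speaker, source text).
+utterances = [
+    ("ted_1_0", "/mnt/speechdata/MUSTC/en-de/flac/ted_1_0.flac",
+     "Hinter mir war gar keine Autokolonne.", "spk.1",
+     "There was no motorcade back there."),
+]
+
+with open("train_st.tsv", "w", encoding="utf-8") as f:
+    # Header row, matching the column order expected above.
+    f.write("\t".join(["id", "audio", "n_frames", "tgt_text", "speaker",
+                       "src_text", "src_lang", "tgt_lang"]) + "\n")
+    for utt_id, audio, tgt_text, speaker, src_text in utterances:
+        n_frames = sf.info(audio).frames  # frame count read from the flac header
+        f.write("\t".join([utt_id, audio, str(n_frames), tgt_text, speaker,
+                           src_text, "en_XX", "de_DE"]) + "\n")
+```
+The actual manifests are produced by the fairseq S2T preparation scripts linked above; this sketch only illustrates the expected column layout.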
diff --git a/YiTrans/yitrans_iwslt22/__init__.py b/YiTrans/yitrans_iwslt22/__init__.py
new file mode 100644
index 0000000000000000000000000000000000000000..97327d269e93a13cd135f6c1a187fd820a8decb8
--- /dev/null
+++ b/YiTrans/yitrans_iwslt22/__init__.py
@@ -0,0 +1 @@
+from . import data, tasks, criterions, models
diff --git a/YiTrans/yitrans_iwslt22/config/finetune_asr/large_mustc.yaml b/YiTrans/yitrans_iwslt22/config/finetune_asr/large_mustc.yaml
new file mode 100644
index 0000000000000000000000000000000000000000..ce9052e29d72402ef4b01340b0d692dce24ee6df
--- /dev/null
+++ b/YiTrans/yitrans_iwslt22/config/finetune_asr/large_mustc.yaml
@@ -0,0 +1,103 @@
+# @package _group_
+
+common:
+  fp16: true
+  log_format: json
+  log_interval: 200
+  tensorboard_logdir: tblog
+  seed: 1337
+
+checkpoint:
+  no_epoch_checkpoints: false
+  best_checkpoint_metric: dec_accuracy
+  restore_file: checkpoint_last.pt
+  maximize_best_checkpoint_metric: true
+
+distributed_training:
+  ddp_backend: legacy_ddp
+  find_unused_parameters: true
+  distributed_world_size: 1
+  distributed_port: -1
+  nprocs_per_node: 8
+
+task:
+  _name: iwslt_joint_pretraining
+  data: ???
+  fine_tuning: true
+  label_dir: ???
+  normalize: true  # must be consistent with pre-training
+  labels: ["ltr"]
+  single_target: true
+  add_decoder: true
+  pad_audio: true
+  random_crop: false
+  max_keep_size: 480000
+  hubert_tokenizer: "none"
+  sp_path: None
+
+dataset:
+  num_workers: 6
+  max_tokens: 1280000
+  skip_invalid_size_inputs_valid_test: true
+  train_subset: train_100
+  valid_subset: dev_other
+  required_batch_size_multiple: 1
+
+criterion:
+  _name: ctc_ce
+  zero_infinity: true
+  dec_weight: 1.0
+
+optimization:
+  max_update: 80000
+  lr: [0.00003]
+  sentence_avg: true
+  update_freq: [1]
+
+optimizer:
+  _name: adam
+  adam_betas: (0.9,0.98)
+  adam_eps: 1e-08
+  weight_decay: 0.0
+
+lr_scheduler:
+  _name: tri_stage
+  phase_ratio: [0.1, 0.4, 0.5]
+  final_lr_scale: 0.05
+
+model:
+  _name: yitrans_asr
+  w2v_path: ???
+  apply_mask: true
+  mask_prob: 0.5
+  mask_channel_prob: 0.5
+  mask_channel_length: 64
+  layerdrop: 0.1
+  decoder_layerdrop: 0.1
+  activation_dropout: 0.1
+  feature_grad_mult: 0.0
+  freeze_finetune_updates: 0
+  add_decoder: true
+  share_decoder_input_output_embed: true
+
+
+hydra:
+  job:
+    config:
+      override_dirname:
+        kv_sep: '-'
+        item_sep: '__'
+        exclude_keys:
+          - run
+          - task.data
+          - task.label_dir
+          - model.w2v_path
+          - dataset.train_subset
+          - dataset.valid_subset
+          - criterion.wer_kenlm_model
+          - criterion.wer_lexicon
+  run:
+    dir: ???
+  sweep:
+    dir: ???
+ subdir: ${hydra.job.config_name}__${hydra.job.override_dirname} diff --git a/YiTrans/yitrans_iwslt22/config/finetune_mt/mt_translation.yaml b/YiTrans/yitrans_iwslt22/config/finetune_mt/mt_translation.yaml new file mode 100644 index 0000000000000000000000000000000000000000..b0d02d4ab8ea29bcd0b2eb8833c8876f88de99dd --- /dev/null +++ b/YiTrans/yitrans_iwslt22/config/finetune_mt/mt_translation.yaml @@ -0,0 +1,89 @@ +# @package _group_ + +common: + fp16: true + log_format: json + log_interval: 200 + tensorboard_logdir: tblog + seed: 1337 + +checkpoint: + save_interval: 1000000 + keep_last_epochs: 5 + save_interval_updates: 10000 + keep_interval_updates_pattern: 20000 + keep_interval_updates: 5 + keep_best_checkpoints: 5 + best_checkpoint_metric: accuracy + maximize_best_checkpoint_metric: true + +distributed_training: + ddp_backend: legacy_ddp + find_unused_parameters: true + distributed_world_size: -1 + nprocs_per_node: 8 + + +criterion: + _name: "label_smoothed_cross_entropy" + label_smoothing: 0.2 + report_accuracy: true + + +task: + _name: "iwslt_translation_from_pretrained" + +dataset: + num_workers: 6 + max_tokens: 3200000 + skip_invalid_size_inputs_valid_test: true + validate_after_updates: ${model.freeze_finetune_updates} + validate_interval: ${checkpoint.save_interval} + validate_interval_updates: ${checkpoint.save_interval_updates} + train_subset: train_100 + valid_subset: dev_other + required_batch_size_multiple: 1 + +optimizer: + _name: adam + adam_betas: (0.9,0.98) + adam_eps: 1e-06 + weight_decay: 0.0 + +lr_scheduler: + lr: [0.0001] + _name: polynomial_decay + warmup_updates: 5000 + total_num_update: 200000 + +model: + _name: finetune_mt + w2v_path: ??? + apply_mask: true + mask_prob: 0.65 + mask_channel_prob: 0.5 + mask_channel_length: 64 + layerdrop: 0.1 + decoder_layerdrop: 0.1 + activation_dropout: 0.1 + feature_grad_mult: 0.0 + freeze_finetune_updates: 0 + +hydra: + job: + config: + override_dirname: + kv_sep: '-' + item_sep: '__' + exclude_keys: + - run + - task.data + - task.label_dir + - model.w2v_path + - dataset.train_subset + - dataset.valid_subset + run: + dir: ??? + sweep: + dir: ??? + subdir: ${hydra.job.config_name}__${hydra.job.override_dirname} diff --git a/YiTrans/yitrans_iwslt22/config/pretrain/joint_base.yaml b/YiTrans/yitrans_iwslt22/config/pretrain/joint_base.yaml new file mode 100644 index 0000000000000000000000000000000000000000..deb7cda6c7571dc4ab25d63da0110c2e11c35a3d --- /dev/null +++ b/YiTrans/yitrans_iwslt22/config/pretrain/joint_base.yaml @@ -0,0 +1,134 @@ +# @package _group_ + +common: + fp16: true + log_format: json + log_interval: 200 + seed: 1337 + tensorboard_logdir: tblog + +checkpoint: + save_dir: ??? + save_interval: 1 + keep_last_epochs: 1 + save_interval_updates: 5000 + keep_interval_updates: -1 + # no_epoch_checkpoints: true + +distributed_training: + ddp_backend: no_c10d + distributed_backend: 'nccl' + distributed_world_size: 32 + distributed_port: 29671 + nprocs_per_node: 8 + find_unused_parameters: true + +task: + _name: iwslt_joint_pretraining + data: ??? + label_dir: ??? + labels: ??? + label_rate: ${model.label_rate} + sample_rate: 16000 + max_sample_size: 250000 + min_sample_size: 32000 + pad_audio: false + random_crop: true + normalize: false # must be consistent with extractor + add_decoder: false + text_cfg: + seed: ${common.seed} + text_data: ??? 
+ data_config: config.yaml + sample_break_mode: eos + tokens_per_sample: 512 + shorten_method: "random_crop" + text_maxtokens_ratio: 1.0 + + +dataset: + num_workers: 6 + max_tokens: 1400000 + skip_invalid_size_inputs_valid_test: true + validate_interval: ${checkpoint.save_interval} + validate_interval_updates: ${checkpoint.save_interval_updates} + required_batch_size_multiple: 1 + +criterion: + _name: hubert + pred_masked_weight: 1.0 + pred_nomask_weight: 0.0 + loss_weights: [10,] + label_smoothing: 0.1 + +optimization: + max_update: 800000 + lr: [0.0001] + clip_norm: 10.0 + +optimizer: + _name: adam + adam_betas: (0.9,0.98) + adam_eps: 1e-06 + weight_decay: 0.01 + +lr_scheduler: + _name: polynomial_decay + warmup_updates: 32000 + +model: + _name: hubert + label_rate: ??? + skip_masked: false + skip_nomask: false + mask_prob: 0.80 + extractor_mode: default + conv_feature_layers: '[(512,10,5)] + [(512,3,2)] * 4 + [(512,2,2)] * 2' + final_dim: 256 + encoder_layerdrop: 0.05 + decoder_layerdrop: 0.05 + dropout_input: 0.1 + dropout_features: 0.1 + dropout: 0.1 + attention_dropout: 0.1 + feature_grad_mult: 0.1 + untie_final_proj: true + activation_dropout: 0.0 + use_rel_pos_enc: true + text_transformer: + activation_fn: ${model.activation_fn} + dropout: ${model.dropout} + attention_dropout: ${model.attention_dropout} + activation_dropout: ${model.activation_dropout} + adaptive_input: ${model.adaptive_input} + max_source_positions: ${task.text_cfg.tokens_per_sample} + checkpoint_activations: ${model.checkpoint_activations} + no_scale_embedding: false + layernorm_embedding: false + quant_noise: + pq: ${model.quant_noise_pq} + encoder: + embed_dim: 768 + ffn_embed_dim: 3072 + layers: 6 + attention_heads: 12 + normalize_before: false + learned_pos: false + layerdrop: ${model.encoder_layerdrop} + + +hydra: + job: + config: + override_dirname: + kv_sep: '-' + item_sep: '__' + exclude_keys: + - run + - task.data + - task.label_dir + run: + dir: ??? + sweep: + dir: ??? + subdir: ${hydra.job.config_name}__${hydra.job.override_dirname} diff --git a/YiTrans/yitrans_iwslt22/config/pretrain/joint_large.yaml b/YiTrans/yitrans_iwslt22/config/pretrain/joint_large.yaml new file mode 100644 index 0000000000000000000000000000000000000000..dbacae9a15ba5c2ff34081d200f280b2a408f752 --- /dev/null +++ b/YiTrans/yitrans_iwslt22/config/pretrain/joint_large.yaml @@ -0,0 +1,159 @@ +# @package _group_ + +common: + fp16: true + log_format: json + log_interval: 200 + seed: 1337 + tensorboard_logdir: tblog + +checkpoint: + save_dir: ??? + save_interval: 1 + keep_last_epochs: 10 + save_interval_updates: 10000 + keep_interval_updates: -1 + # no_epoch_checkpoints: true + +distributed_training: + ddp_backend: no_c10d + distributed_backend: 'nccl' + distributed_world_size: 32 + distributed_port: 29671 + nprocs_per_node: 8 + find_unused_parameters: true + +task: + _name: iwslt_joint_pretraining + data: ??? + label_dir: ??? + labels: ??? + label_rate: ${model.label_rate} + sample_rate: 16000 + max_sample_size: 250000 + min_sample_size: 32000 + pad_audio: false + random_crop: true + normalize: true # must be consistent with extractor + add_decoder: true + split_modality_batch: true + store_labels: true + text_cfg: + seed: ${common.seed} + text_data: ??? 
+ data_config: config.yaml + sample_break_mode: eos + tokens_per_sample: 1024 + shorten_method: "random_crop" + text_maxtokens_ratio: 1.0 + mask_whole_words: true + +dataset: + num_workers: 4 + max_tokens: 900000 + skip_invalid_size_inputs_valid_test: true + validate_interval: ${checkpoint.save_interval} + validate_interval_updates: ${checkpoint.save_interval_updates} + required_batch_size_multiple: 1 + +criterion: + _name: joint_step1_split_batch + pred_masked_weight: 1.0 + pred_nomask_weight: 0.0 + loss_weights: [10,] + label_smoothing: 0.02 + +optimization: + max_update: 400000 + lr: [0.00003] + clip_norm: 1.0 + +optimizer: + _name: adam + adam_betas: (0.9,0.98) + adam_eps: 1e-06 + weight_decay: 0.01 + +lr_scheduler: + _name: polynomial_decay + warmup_updates: 32000 + +model: + _name: joint_ed + label_rate: ??? + encoder_layers: 24 + encoder_embed_dim: 1024 + encoder_ffn_embed_dim: 4096 + encoder_attention_heads: 16 + final_dim: 768 + skip_masked: false + skip_nomask: false + mask_prob: 0.80 + extractor_mode: layer_norm + conv_feature_layers: '[(512,10,5)] + [(512,3,2)] * 4 + [(512,2,2)] * 2' + encoder_layerdrop: 0.0 + dropout_input: 0.0 + dropout_features: 0.0 + dropout: 0.0 + attention_dropout: 0.0 + layer_norm_first: true + feature_grad_mult: 1.0 + untie_final_proj: true + activation_dropout: 0.0 + use_rel_pos_enc: true + decoder_layers: 12 + decoder_embed_dim: 1024 + decoder_ffn_embed_dim: 4096 + decoder_attention_heads: 16 + decoder_output_dim: 1024 + decoder_normalize_before: true + layernorm_embedding: true + decoder_learned_pos: true + share_decoder_input_output_embed: true + share_enc_dec_embeddings: true + max_target_positions: 1024 + activation_fn: "gelu" + adaptive_input: false + checkpoint_activations: false + quant_noise_pq: 0 + add_text_modality: true + add_text_encoder: true + add_adaptor: true + + text_transformer: + activation_fn: ${model.activation_fn} + dropout: ${model.dropout} + attention_dropout: ${model.attention_dropout} + activation_dropout: ${model.activation_dropout} + adaptive_input: ${model.adaptive_input} + max_source_positions: ${task.text_cfg.tokens_per_sample} + checkpoint_activations: ${model.checkpoint_activations} + no_scale_embedding: false + layernorm_embedding: true + quant_noise: + pq: ${model.quant_noise_pq} + encoder: + embed_dim: 1024 + ffn_embed_dim: 4096 + layers: 12 + attention_heads: 16 + normalize_before: true + learned_pos: true + layerdrop: ${model.encoder_layerdrop} + + +hydra: + job: + config: + override_dirname: + kv_sep: '-' + item_sep: '__' + exclude_keys: + - run + - task.data + - task.label_dir + run: + dir: ??? + sweep: + dir: ??? + subdir: ${hydra.job.config_name}__${hydra.job.override_dirname} diff --git a/YiTrans/yitrans_iwslt22/criterions/__init__.py b/YiTrans/yitrans_iwslt22/criterions/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..bb260356b113bc05c3556213a15a337a4513c42f --- /dev/null +++ b/YiTrans/yitrans_iwslt22/criterions/__init__.py @@ -0,0 +1,9 @@ +import importlib +import os + +for file in os.listdir(os.path.dirname(__file__)): + if file.endswith(".py") and not file.startswith("_"): + criterion_name = file[: file.find(".py")] + importlib.import_module( + "yitrans_iwslt22.criterions." 
+ criterion_name + ) diff --git a/YiTrans/yitrans_iwslt22/criterions/ctc_ce.py b/YiTrans/yitrans_iwslt22/criterions/ctc_ce.py new file mode 100644 index 0000000000000000000000000000000000000000..40fab26b8db594f980541fa7e2d197b9329f1a40 --- /dev/null +++ b/YiTrans/yitrans_iwslt22/criterions/ctc_ce.py @@ -0,0 +1,414 @@ +# -------------------------------------------------------- +# The YiTrans End-to-End Speech Translation System for IWSLT 2022 Offline Shared Task (https://arxiv.org/abs/2206.05777) +# Github source: https://github.com/microsoft/SpeechT5/tree/main/YiTrans +# Copyright (c) 2022 Microsoft +# Licensed under The MIT License [see LICENSE for details] +# Based on fairseq code bases +# https://github.com/facebookresearch/fairseq +# -------------------------------------------------------- + +import math +from argparse import Namespace +from dataclasses import dataclass, field +from omegaconf import II +from typing import Optional + +import torch +import torch.nn.functional as F +from fairseq import metrics, utils +from fairseq.criterions import FairseqCriterion, register_criterion +from fairseq.criterions.label_smoothed_cross_entropy import label_smoothed_nll_loss +from fairseq.dataclass import FairseqDataclass +from fairseq.data.data_utils import post_process +from fairseq.tasks import FairseqTask +from fairseq.logging.meters import safe_round + + +@dataclass +class CtcCeCriterionConfig(FairseqDataclass): + zero_infinity: bool = field( + default=False, + metadata={"help": "zero inf loss when source length <= target length"}, + ) + sentence_avg: bool = II("optimization.sentence_avg") + post_process: str = field( + default="letter", + metadata={ + "help": "how to post process predictions into words. can be letter, " + "wordpiece, BPE symbols, etc. 
" + "See fairseq.data.data_utils.post_process() for full list of options" + }, + ) + wer_kenlm_model: Optional[str] = field( + default=None, + metadata={ + "help": "if this is provided, use kenlm to compute wer (along with other wer_* args)" + }, + ) + wer_lexicon: Optional[str] = field( + default=None, + metadata={"help": "lexicon to use with wer_kenlm_model"}, + ) + wer_lm_weight: float = field( + default=2.0, + metadata={"help": "lm weight to use with wer_kenlm_model"}, + ) + wer_word_score: float = field( + default=-1.0, + metadata={"help": "lm word score to use with wer_kenlm_model"}, + ) + + wer_args: Optional[str] = field( + default=None, + metadata={ + "help": "DEPRECATED: tuple of (wer_kenlm_model, wer_lexicon, wer_lm_weight, wer_word_score)" + }, + ) + + dec_weight: float = field( + default=0.5, + metadata={"help": "weights for decoder CE Loss, loss will be ((1 - dec_weight) * hubert_loss + dec_weight * CE_Loss)"}, + ) + report_accuracy: bool = field( + default=True, + metadata={"help": "report decoder accuracy metric"}, + ) + ignore_prefix_size: int = field( + default=0, + metadata={"help": "Ignore first N tokens"}, + ) + label_smoothing: float = field( + default=0.1, + metadata={"help": "epsilon for label smoothing, 0 means no label smoothing"}, + ) + + +@register_criterion("ctc_ce", dataclass=CtcCeCriterionConfig) +class CtcCeCriterion(FairseqCriterion): + def __init__(self, cfg: CtcCeCriterionConfig, task: FairseqTask): + super().__init__(task) + self.blank_idx = ( + task.target_dictionary.index(task.blank_symbol) + if hasattr(task, "blank_symbol") + else 0 + ) + self.pad_idx = task.target_dictionary.pad() + self.eos_idx = task.target_dictionary.eos() + self.post_process = cfg.post_process + + if cfg.wer_args is not None: + ( + cfg.wer_kenlm_model, + cfg.wer_lexicon, + cfg.wer_lm_weight, + cfg.wer_word_score, + ) = eval(cfg.wer_args) + + if cfg.wer_kenlm_model is not None: + from examples.speech_recognition.w2l_decoder import W2lKenLMDecoder + + dec_args = Namespace() + dec_args.nbest = 1 + dec_args.criterion = "ctc" + dec_args.kenlm_model = cfg.wer_kenlm_model + dec_args.lexicon = cfg.wer_lexicon + dec_args.beam = 50 + dec_args.beam_size_token = min(50, len(task.target_dictionary)) + dec_args.beam_threshold = min(50, len(task.target_dictionary)) + dec_args.lm_weight = cfg.wer_lm_weight + dec_args.word_score = cfg.wer_word_score + dec_args.unk_weight = -math.inf + dec_args.sil_weight = 0 + + self.w2l_decoder = W2lKenLMDecoder(dec_args, task.target_dictionary) + else: + self.w2l_decoder = None + + self.zero_infinity = cfg.zero_infinity + self.sentence_avg = cfg.sentence_avg + + self.dec_weight = cfg.dec_weight + self.report_accuracy = cfg.report_accuracy + self.ignore_prefix_size = cfg.ignore_prefix_size + self.eps = cfg.label_smoothing + + def forward(self, model, sample, reduce=True): + net_output = model(**sample["net_input"]) + lprobs = model.get_normalized_probs( + net_output, log_probs=True + ).contiguous() # (T, B, C) from the encoder + + if "src_lengths" in sample["net_input"]: + input_lengths = sample["net_input"]["src_lengths"] + else: + if net_output["padding_mask"] is not None: + non_padding_mask = ~net_output["padding_mask"] + input_lengths = non_padding_mask.long().sum(-1) + else: + input_lengths = lprobs.new_full( + (lprobs.size(1),), lprobs.size(0), dtype=torch.long + ) + + pad_mask = (sample["target"] != self.pad_idx) & ( + sample["target"] != self.eos_idx + ) + targets_flat = sample["target"].masked_select(pad_mask) + if "target_lengths" in sample: + 
target_lengths = sample["target_lengths"] + else: + target_lengths = pad_mask.sum(-1) + + with torch.backends.cudnn.flags(enabled=False): + loss = F.ctc_loss( + lprobs, + targets_flat, + input_lengths, + target_lengths, + blank=self.blank_idx, + reduction="sum", + zero_infinity=self.zero_infinity, + ) + + ntokens = ( + sample["ntokens"] if "ntokens" in sample else target_lengths.sum().item() + ) + + sample_size = sample["target"].size(0) if self.sentence_avg else ntokens + + logging_output = {} + if "decoder_target" in sample: + if net_output["decoder_out"] is not None: + dec_sample_size = sample["target"].size(0) if self.sentence_avg else sample["dec_ntokens"] + dec_loss, dec_nll_loss = self.compute_ce_loss(model, net_output["decoder_out"], sample, reduce=reduce) + logging_output["ctc_loss"] = loss.item() + loss = (1 - self.dec_weight) * loss + (self.dec_weight * dec_loss * sample_size / dec_sample_size) + logging_output["dec_loss"] = dec_loss.item() + logging_output["dec_nll_loss"] = dec_nll_loss.item() + logging_output["dec_sample_size"] = dec_sample_size + + if self.report_accuracy: + n_correct, total = self.compute_accuracy(model, net_output["decoder_out"], sample) + logging_output["dec_n_correct"] = utils.item(n_correct.data) + logging_output["total"] = utils.item(total.data) + else: + logging_output["ctc_loss"] = loss.item() + loss = (1 - self.dec_weight) * loss + logging_output["dec_loss"] = 0 + logging_output["dec_nll_loss"] = 0 + logging_output["dec_sample_size"] = 1 + if self.report_accuracy: + logging_output["dec_n_correct"] = 0 + logging_output["total"] = 1 + + logging_output = { + "loss": utils.item(loss.data), # * sample['ntokens'], + "ntokens": ntokens, + "nsentences": sample["id"].numel(), + "sample_size": sample_size, + **logging_output, + } + + if not model.training and self.dec_weight < 1.0: + import editdistance + + with torch.no_grad(): + lprobs_t = lprobs.transpose(0, 1).float().contiguous().cpu() + + c_err = 0 + c_len = 0 + w_errs = 0 + w_len = 0 + wv_errs = 0 + for lp, t, inp_l in zip( + lprobs_t, + sample["target_label"] + if "target_label" in sample + else sample["target"], + input_lengths, + ): + lp = lp[:inp_l].unsqueeze(0) + + decoded = None + if self.w2l_decoder is not None: + decoded = self.w2l_decoder.decode(lp) + if len(decoded) < 1: + decoded = None + else: + decoded = decoded[0] + if len(decoded) < 1: + decoded = None + else: + decoded = decoded[0] + + p = (t != self.task.target_dictionary.pad()) & ( + t != self.task.target_dictionary.eos() + ) + targ = t[p] + targ_units = self.task.target_dictionary.string(targ) + targ_units_arr = targ.tolist() + + toks = lp.argmax(dim=-1).unique_consecutive() + pred_units_arr = toks[toks != self.blank_idx].tolist() + + c_err += editdistance.eval(pred_units_arr, targ_units_arr) + c_len += len(targ_units_arr) + + targ_words = post_process(targ_units, self.post_process).split() + + pred_units = self.task.target_dictionary.string(pred_units_arr) + pred_words_raw = post_process(pred_units, self.post_process).split() + + if decoded is not None and "words" in decoded: + pred_words = decoded["words"] + w_errs += editdistance.eval(pred_words, targ_words) + wv_errs += editdistance.eval(pred_words_raw, targ_words) + else: + dist = editdistance.eval(pred_words_raw, targ_words) + w_errs += dist + wv_errs += dist + + w_len += len(targ_words) + + logging_output["wv_errors"] = wv_errs + logging_output["w_errors"] = w_errs + logging_output["w_total"] = w_len + logging_output["c_errors"] = c_err + logging_output["c_total"] = c_len + + 
return loss, sample_size, logging_output + + def compute_ce_loss(self, model, net_output, sample, reduce=True): + lprobs, target = self.get_lprobs_and_target(model, net_output, sample) + loss, nll_loss = label_smoothed_nll_loss( + lprobs, + target, + self.eps, + ignore_index=self.pad_idx, + reduce=reduce, + ) + return loss, nll_loss + + def compute_accuracy(self, model, net_output, sample): + lprobs, target = self.get_lprobs_and_target(model, net_output, sample) + mask = target.ne(self.pad_idx) + n_correct = torch.sum( + lprobs.argmax(1).masked_select(mask).eq(target.masked_select(mask)) + ) + total = torch.sum(mask) + return n_correct, total + + def get_lprobs_and_target(self, model, net_output, sample): + lprobs = model.get_normalized_probs(net_output, log_probs=True) + target = sample["decoder_target"] + if self.ignore_prefix_size > 0: + if getattr(lprobs, "batch_first", False): + lprobs = lprobs[:, self.ignore_prefix_size :, :].contiguous() + target = target[:, self.ignore_prefix_size :].contiguous() + else: + lprobs = lprobs[self.ignore_prefix_size :, :, :].contiguous() + target = target[self.ignore_prefix_size :, :].contiguous() + return lprobs.view(-1, lprobs.size(-1)), target.view(-1) + + + @staticmethod + def reduce_metrics(logging_outputs) -> None: + """Aggregate logging outputs from data parallel training.""" + + loss_sum = utils.item(sum(log.get("loss", 0) for log in logging_outputs)) + ntokens = utils.item(sum(log.get("ntokens", 0) for log in logging_outputs)) + nsentences = utils.item( + sum(log.get("nsentences", 0) for log in logging_outputs) + ) + sample_size = utils.item( + sum(log.get("sample_size", 0) for log in logging_outputs) + ) + + metrics.log_scalar( + "loss", loss_sum / sample_size / math.log(2), sample_size, round=3 + ) + metrics.log_scalar("ntokens", ntokens) + metrics.log_scalar("nsentences", nsentences) + if sample_size != ntokens: + metrics.log_scalar( + "nll_loss", loss_sum / ntokens / math.log(2), ntokens, round=3 + ) + + c_errors = sum(log.get("c_errors", 0) for log in logging_outputs) + metrics.log_scalar("_c_errors", c_errors) + c_total = sum(log.get("c_total", 0) for log in logging_outputs) + metrics.log_scalar("_c_total", c_total) + w_errors = sum(log.get("w_errors", 0) for log in logging_outputs) + metrics.log_scalar("_w_errors", w_errors) + wv_errors = sum(log.get("wv_errors", 0) for log in logging_outputs) + metrics.log_scalar("_wv_errors", wv_errors) + w_total = sum(log.get("w_total", 0) for log in logging_outputs) + metrics.log_scalar("_w_total", w_total) + + if c_total > 0: + metrics.log_derived( + "uer", + lambda meters: safe_round( + meters["_c_errors"].sum * 100.0 / meters["_c_total"].sum, 3 + ) + if meters["_c_total"].sum > 0 + else float("nan"), + ) + if w_total > 0: + metrics.log_derived( + "wer", + lambda meters: safe_round( + meters["_w_errors"].sum * 100.0 / meters["_w_total"].sum, 3 + ) + if meters["_w_total"].sum > 0 + else float("nan"), + ) + metrics.log_derived( + "raw_wer", + lambda meters: safe_round( + meters["_wv_errors"].sum * 100.0 / meters["_w_total"].sum, 3 + ) + if meters["_w_total"].sum > 0 + else float("nan"), + ) + + if "dec_loss" in logging_outputs[0]: + ctc_loss_sum = sum(log.get("ctc_loss", 0) for log in logging_outputs) + dec_loss_sum = sum(log.get("dec_loss", 0) for log in logging_outputs) + dec_nll_loss_sum = sum(log.get("dec_nll_loss", 0) for log in logging_outputs) + dec_sample_size = sum(log.get("dec_sample_size", 0) for log in logging_outputs) + metrics.log_scalar( + "dec_loss", dec_loss_sum / dec_sample_size / 
math.log(2), dec_sample_size, round=3 + ) + metrics.log_scalar( + "ctc_loss", ctc_loss_sum / sample_size / math.log(2), sample_size, round=3 + ) + metrics.log_scalar( + "dec_nll_loss", dec_nll_loss_sum / dec_sample_size / math.log(2), dec_sample_size, round=3 + ) + metrics.log_derived( + "dec_ppl", lambda meters: utils.get_perplexity(meters["dec_nll_loss"].avg) + ) + total = utils.item(sum(log.get("total", 0) for log in logging_outputs)) + if total > 0: + metrics.log_scalar("total", total) + n_correct = utils.item( + sum(log.get("dec_n_correct", 0) for log in logging_outputs) + ) + metrics.log_scalar("dec_n_correct", n_correct) + metrics.log_derived( + "dec_accuracy", + lambda meters: round( + meters["dec_n_correct"].sum * 100.0 / meters["total"].sum, 3 + ) + if meters["total"].sum > 0 + else float("nan"), + ) + + @staticmethod + def logging_outputs_can_be_summed() -> bool: + """ + Whether the logging outputs returned by `forward` can be summed + across workers prior to calling `reduce_metrics`. Setting this + to True will improves distributed training speed. + """ + return True diff --git a/YiTrans/yitrans_iwslt22/criterions/joint_step1_criterion.py b/YiTrans/yitrans_iwslt22/criterions/joint_step1_criterion.py new file mode 100644 index 0000000000000000000000000000000000000000..2398de156affc681116e12e128123986ac21835f --- /dev/null +++ b/YiTrans/yitrans_iwslt22/criterions/joint_step1_criterion.py @@ -0,0 +1,366 @@ +# -------------------------------------------------------- +# The YiTrans End-to-End Speech Translation System for IWSLT 2022 Offline Shared Task (https://arxiv.org/abs/2206.05777) +# Github source: https://github.com/microsoft/SpeechT5/tree/main/YiTrans +# Copyright (c) 2022 Microsoft +# Licensed under The MIT License [see LICENSE for details] +# Based on fairseq code bases +# https://github.com/facebookresearch/fairseq +# -------------------------------------------------------- + +import logging +import math +import re +from dataclasses import dataclass, field +from typing import List, Optional + +import torch +import torch.nn.functional as F +from fairseq import metrics, utils +from fairseq.criterions import FairseqCriterion, register_criterion +from fairseq.criterions.label_smoothed_cross_entropy import label_smoothed_nll_loss +from fairseq.dataclass import FairseqDataclass + +logger = logging.getLogger(__name__) +@dataclass +class JointCriterionConfig(FairseqDataclass): + pred_masked_weight: float = field( + default=1.0, + metadata={"help": "weight for predictive loss for masked frames"}, + ) + pred_nomask_weight: float = field( + default=0.0, + metadata={"help": "weight for predictive loss for unmasked frames"}, + ) + loss_weights: Optional[List[float]] = field( + default=None, + metadata={"help": "weights for additional loss terms (not first one)"}, + ) + log_keys: List[str] = field( + default_factory=lambda: [], + metadata={"help": "output keys to log"}, + ) + dec_weight: float = field( + default=1.0, + metadata={"help": "weights for decoder CE Loss, loss will be (hubert_loss + dec_weight * CE_Loss)"}, + ) + text_weight: float = field( + default=1.0, + metadata={"help": "weights for text ED CE Loss, loss will be (hubert_loss + dec_weight * CE_Loss + text_weight * CE_Loss)"}, + ) + report_accuracy: bool = field( + default=True, + metadata={"help": "report decoder accuracy metric"}, + ) + ignore_prefix_size: int = field( + default=0, + metadata={"help": "Ignore first N tokens"}, + ) + label_smoothing: float = field( + default=0.0, + metadata={"help": "epsilon for label 
smoothing, 0 means no label smoothing"}, + ) + + +@register_criterion("joint_step1", dataclass=JointCriterionConfig) +class JointCriterion(FairseqCriterion): + def __init__( + self, + task, + pred_masked_weight, + pred_nomask_weight, + loss_weights=None, + log_keys=None, + dec_weight=1.0, + text_weight=1.0, + report_accuracy=False, + ignore_prefix_size=0, + label_smoothing=0.0 + ): + super().__init__(task) + self.pred_masked_weight = pred_masked_weight + self.pred_nomask_weight = pred_nomask_weight + self.loss_weights = loss_weights + self.log_keys = [] if log_keys is None else log_keys + self.dec_weight = dec_weight + self.text_weight = text_weight + self.report_accuracy = report_accuracy + self.ignore_prefix_size = ignore_prefix_size + self.eps = label_smoothing + self.padding_idx = task.dictionaries[0].pad() + + def forward(self, model, sample, reduce=True, log_pred=False): + """Compute the loss for the given sample. + Returns a tuple with three elements: + 1) the loss + 2) the sample size, which is used as the denominator for the gradient + 3) logging outputs to display while training + """ + + if "speech" in sample.keys(): + text_type = [name for name in sample.keys() if name.startswith("text")] + assert len(text_type) == 1 + text_type = text_type[0] + text_sample = sample[text_type] + sample = sample["speech"] + else: + text_sample = None + + sample["modality"] = "speech" + ### 1. do hubert forward and loss computation + net_output = model(target_list=sample["target_list"], **sample["net_input"]) + loss = 0.0 + sample_size = 0 + logging_output = {} + reduction = "sum" if reduce else "none" + + loss_m_list = [] + logp_m_list = model.get_logits(net_output, True) + targ_m_list = model.get_targets(net_output, True) + assert self.pred_masked_weight == 0 or len(logp_m_list) > 0 + for i, (logp_m, targ_m) in enumerate(zip(logp_m_list, targ_m_list)): + loss_m = F.cross_entropy(logp_m, targ_m, reduction=reduction) + loss_m_list.append(loss_m) + logging_output[f"loss_m_{i}"] = loss_m.detach().item() + if self.pred_masked_weight > 0: + loss += self.pred_masked_weight * sum(loss_m_list) + sample_size += targ_m_list[0].numel() + + loss_u_list = [] + logp_u_list = model.get_logits(net_output, False) + targ_u_list = model.get_targets(net_output, False) + assert self.pred_nomask_weight == 0 or len(logp_u_list) > 0 + for i, (logp_u, targ_u) in enumerate(zip(logp_u_list, targ_u_list)): + loss_u = F.cross_entropy(logp_u, targ_u, reduction=reduction) + loss_u_list.append(loss_u) + logging_output[f"loss_u_{i}"] = loss_u.detach().item() + if self.pred_nomask_weight > 0: + loss += self.pred_nomask_weight * sum(loss_u_list) + sample_size += targ_u_list[0].numel() + + if self.loss_weights is not None: + assert hasattr(model, "get_extra_losses") + extra_losses, names = model.get_extra_losses(net_output) + if torch.is_tensor(extra_losses): + extra_losses = [extra_losses] + names = [names] + if len(self.loss_weights) == 1 and len(extra_losses) != 1: + self.loss_weights = [self.loss_weights[0]] * len(extra_losses) + assert len(extra_losses) == len( + self.loss_weights + ), f"{len(extra_losses)}, {len(self.loss_weights)}" + for p, n, coef in zip(extra_losses, names, self.loss_weights): + if coef != 0 and p is not None: + p = coef * p.float() * sample_size + loss += p + logging_output[f"loss_{n}"] = p.item() + + if "decoder_target" in sample: + dec_sample_size = sample["dec_ntokens"] + dec_loss, dec_nll_loss = self.compute_ce_loss(model, net_output["decoder_out"], sample, reduce=reduce) + loss = loss + 
(self.dec_weight * dec_loss * sample_size / dec_sample_size) + logging_output["dec_loss"] = dec_loss.item() + logging_output["dec_nll_loss"] = dec_nll_loss.item() + logging_output["dec_sample_size"] = dec_sample_size + + if self.report_accuracy: + n_correct, total = self.compute_accuracy(model, net_output["decoder_out"], sample) + logging_output["dec_n_correct"] = utils.item(n_correct.data) + logging_output["total"] = utils.item(total.data) + + if text_sample is not None: + ### 2. do text forward and loss computation + text_sample["modality"] = "text" + text_net_output = model(**text_sample["net_input"]) + text_dec_loss, text_dec_nll_loss = self.compute_ce_loss(model, text_net_output["decoder_out"], text_sample, reduce=reduce) + text_sample_size = text_sample["ntokens"] + loss = loss + (self.text_weight * text_dec_loss * sample_size / text_sample_size) + logging_output["text_dec_loss"] = text_dec_loss.item() + logging_output["text_dec_nll_loss"] = text_dec_nll_loss.item() + logging_output["text_sample_size"] = text_sample_size + + if self.report_accuracy: + n_correct, total = self.compute_accuracy(model, text_net_output["decoder_out"], text_sample) + logging_output["text_dec_n_correct"] = utils.item(n_correct.data) + logging_output["text_total"] = utils.item(total.data) + + logging_output = { + "loss": loss.item() if reduce else loss, + "ntokens": sample_size, + "nsentences": sample["id"].numel() + (text_sample["id"].numel() if text_sample is not None else 0), + "sample_size": sample_size, + **logging_output, + } + + for lk in self.log_keys: + if lk in net_output: + logging_output[lk] = float((net_output[lk])) + + def compute_correct(logits): + if logits.numel() == 0: + return 0, 0 + else: + assert logits.dim() > 1, logits.shape + max = logits.argmax(-1) == 0 + min = logits.argmin(-1) == 0 + both = max & min + corr = max.long().sum().item() - both.long().sum().item() + count = max.numel() + return corr, count + + with torch.no_grad(): + for i, logp_m in enumerate(logp_m_list): + corr_m, count_m = compute_correct(logp_m) + logging_output[f"correct_m_{i}"] = corr_m + logging_output[f"count_m_{i}"] = count_m + + for i, logp_u in enumerate(logp_u_list): + corr_u, count_u = compute_correct(logp_u) + logging_output[f"correct_u_{i}"] = corr_u + logging_output[f"count_u_{i}"] = count_u + + return loss, sample_size, logging_output + + def compute_ce_loss(self, model, net_output, sample, reduce=True): + lprobs, target = self.get_lprobs_and_target(model, net_output, sample) + loss, nll_loss = label_smoothed_nll_loss( + lprobs, + target, + self.eps, + ignore_index=self.padding_idx, + reduce=reduce, + ) + return loss, nll_loss + + def compute_accuracy(self, model, net_output, sample): + lprobs, target = self.get_lprobs_and_target(model, net_output, sample) + mask = target.ne(self.padding_idx) + n_correct = torch.sum( + lprobs.argmax(1).masked_select(mask).eq(target.masked_select(mask)) + ) + total = torch.sum(mask) + return n_correct, total + + def get_lprobs_and_target(self, model, net_output, sample): + lprobs = model.get_normalized_probs(net_output, log_probs=True) + if sample["modality"] == "speech": + target = sample["decoder_target"] + if self.ignore_prefix_size > 0: + if getattr(lprobs, "batch_first", False): + lprobs = lprobs[:, self.ignore_prefix_size :, :].contiguous() + target = target[:, self.ignore_prefix_size :].contiguous() + else: + lprobs = lprobs[self.ignore_prefix_size :, :, :].contiguous() + target = target[self.ignore_prefix_size :, :].contiguous() + else: + target = 
sample["target"] + + return lprobs.view(-1, lprobs.size(-1)), target.view(-1) + + @staticmethod + def reduce_metrics(logging_outputs) -> None: + """Aggregate logging outputs from data parallel training (copied from normal cross entropy).""" + loss_sum = sum(log.get("loss", 0) for log in logging_outputs) + ntokens = sum(log.get("ntokens", 0) for log in logging_outputs) + sample_size = sum(log.get("sample_size", 0) for log in logging_outputs) + + metrics.log_scalar( + "loss", loss_sum / sample_size / math.log(2), sample_size, round=3 + ) + if sample_size != ntokens: + metrics.log_scalar( + "nll_loss", loss_sum / ntokens / math.log(2), ntokens, round=3 + ) + metrics.log_derived( + "ppl", lambda meters: utils.get_perplexity(meters["nll_loss"].avg) + ) + else: + metrics.log_derived( + "ppl", lambda meters: utils.get_perplexity(meters["loss"].avg) + ) + + counts = {} + for lk in logging_outputs[0].keys(): + if lk.startswith("count_"): + val = sum(log[lk] for log in logging_outputs) + metrics.log_scalar(lk, val) + counts[lk] = val + + for lk in logging_outputs[0].keys(): + if lk.startswith("loss_"): + val = sum(log[lk] for log in logging_outputs) + metrics.log_scalar(lk, val / sample_size / math.log(2), round=3) + elif lk.startswith("correct_"): + val = sum(log[lk] for log in logging_outputs) + metrics.log_scalar(lk, val / counts[re.sub("correct", "count", lk)]) + + if "dec_loss" in logging_outputs[0]: + dec_loss_sum = sum(log.get("dec_loss", 0) for log in logging_outputs) + dec_nll_loss_sum = sum(log.get("dec_nll_loss", 0) for log in logging_outputs) + dec_sample_size = sum(log.get("dec_sample_size", 0) for log in logging_outputs) + metrics.log_scalar( + "dec_loss", dec_loss_sum / dec_sample_size / math.log(2), dec_sample_size, round=3 + ) + metrics.log_scalar( + "dec_nll_loss", dec_nll_loss_sum / dec_sample_size / math.log(2), dec_sample_size, round=3 + ) + metrics.log_derived( + "dec_ppl", lambda meters: utils.get_perplexity(meters["dec_nll_loss"].avg) + ) + total = utils.item(sum(log.get("total", 0) for log in logging_outputs)) + if total > 0: + metrics.log_scalar("total", total) + n_correct = utils.item( + sum(log.get("dec_n_correct", 0) for log in logging_outputs) + ) + metrics.log_scalar("dec_n_correct", n_correct) + metrics.log_derived( + "dec_accuracy", + lambda meters: round( + meters["dec_n_correct"].sum * 100.0 / meters["total"].sum, 3 + ) + if meters["total"].sum > 0 + else float("nan"), + ) + + if "text_dec_loss" in logging_outputs[0]: + text_dec_loss_sum = sum(log.get("text_dec_loss", 0) for log in logging_outputs) + text_dec_nll_loss_sum = sum(log.get("text_dec_nll_loss", 0) for log in logging_outputs) + text_sample_size = sum(log.get("text_sample_size", 0) for log in logging_outputs) + metrics.log_scalar( + "text_dec_loss", text_dec_loss_sum / text_sample_size / math.log(2), text_sample_size, round=3 + ) + metrics.log_scalar( + "text_dec_nll_loss", text_dec_nll_loss_sum / text_sample_size / math.log(2), text_sample_size, round=3 + ) + metrics.log_derived( + "text_dec_ppl", lambda meters: utils.get_perplexity(meters["text_dec_nll_loss"].avg) + ) + text_total = utils.item(sum(log.get("text_total", 0) for log in logging_outputs)) + if text_total > 0: + metrics.log_scalar("text_total", text_total) + text_n_correct = utils.item( + sum(log.get("text_dec_n_correct", 0) for log in logging_outputs) + ) + metrics.log_scalar("text_dec_n_correct", text_n_correct) + metrics.log_derived( + "text_dec_accuracy", + lambda meters: round( + meters["text_dec_n_correct"].sum * 100.0 / 
meters["text_total"].sum, 3 + ) + if meters["text_total"].sum > 0 + else float("nan"), + ) + + @staticmethod + def aggregate_logging_outputs(logging_outputs): + """Aggregate logging outputs from data parallel training.""" + raise NotImplementedError() + + @staticmethod + def logging_outputs_can_be_summed() -> bool: + """ + Whether the logging outputs returned by `forward` can be summed + across workers prior to calling `reduce_metrics`. Setting this + to True will improves distributed training speed. + """ + return False diff --git a/YiTrans/yitrans_iwslt22/criterions/joint_step1_split_batch_criterion.py b/YiTrans/yitrans_iwslt22/criterions/joint_step1_split_batch_criterion.py new file mode 100644 index 0000000000000000000000000000000000000000..8d603e030dcbc7d50144d502ae7bf266365ea154 --- /dev/null +++ b/YiTrans/yitrans_iwslt22/criterions/joint_step1_split_batch_criterion.py @@ -0,0 +1,370 @@ +# -------------------------------------------------------- +# The YiTrans End-to-End Speech Translation System for IWSLT 2022 Offline Shared Task (https://arxiv.org/abs/2206.05777) +# Github source: https://github.com/microsoft/SpeechT5/tree/main/YiTrans +# Copyright (c) 2022 Microsoft +# Licensed under The MIT License [see LICENSE for details] +# Based on fairseq code bases +# https://github.com/facebookresearch/fairseq +# -------------------------------------------------------- + +import logging +import math +import re +from dataclasses import dataclass, field +from typing import List, Optional + +import torch +import torch.nn.functional as F +from fairseq import metrics, utils +from fairseq.criterions import FairseqCriterion, register_criterion +from fairseq.criterions.label_smoothed_cross_entropy import label_smoothed_nll_loss +from fairseq.dataclass import FairseqDataclass + +logger = logging.getLogger(__name__) +@dataclass +class JointCriterionConfig(FairseqDataclass): + pred_masked_weight: float = field( + default=1.0, + metadata={"help": "weight for predictive loss for masked frames"}, + ) + pred_nomask_weight: float = field( + default=0.0, + metadata={"help": "weight for predictive loss for unmasked frames"}, + ) + loss_weights: Optional[List[float]] = field( + default=None, + metadata={"help": "weights for additional loss terms (not first one)"}, + ) + log_keys: List[str] = field( + default_factory=lambda: [], + metadata={"help": "output keys to log"}, + ) + dec_weight: float = field( + default=1.0, + metadata={"help": "weights for decoder CE Loss, loss will be (hubert_loss + dec_weight * CE_Loss)"}, + ) + text_weight: float = field( + default=1.0, + metadata={"help": "weights for text ED CE Loss, loss will be (hubert_loss + dec_weight * CE_Loss + text_weight * CE_Loss)"}, + ) + report_accuracy: bool = field( + default=True, + metadata={"help": "report decoder accuracy metric"}, + ) + ignore_prefix_size: int = field( + default=0, + metadata={"help": "Ignore first N tokens"}, + ) + label_smoothing: float = field( + default=0.0, + metadata={"help": "epsilon for label smoothing, 0 means no label smoothing"}, + ) + + +@register_criterion("joint_step1_split_batch", dataclass=JointCriterionConfig) +class JointSplitCriterion(FairseqCriterion): + def __init__( + self, + task, + pred_masked_weight, + pred_nomask_weight, + loss_weights=None, + log_keys=None, + dec_weight=1.0, + text_weight=1.0, + report_accuracy=False, + ignore_prefix_size=0, + label_smoothing=0.0 + ): + super().__init__(task) + self.pred_masked_weight = pred_masked_weight + self.pred_nomask_weight = pred_nomask_weight + 
self.loss_weights = loss_weights + self.log_keys = [] if log_keys is None else log_keys + self.dec_weight = dec_weight + self.text_weight = text_weight + self.report_accuracy = report_accuracy + self.ignore_prefix_size = ignore_prefix_size + self.eps = label_smoothing + self.padding_idx = task.dictionaries[0].pad() + self.text_dict = task.text_dictionary + + def forward(self, model, sample, reduce=True, log_pred=False): + """Compute the loss for the given sample. + Returns a tuple with three elements: + 1) the loss + 2) the sample size, which is used as the denominator for the gradient + 3) logging outputs to display while training + """ + text_type = [name for name in sample.keys() if name.startswith("text")] + loss = 0. + sample_size = 0 + logging_output = {} + reduction = "sum" if reduce else "none" + + if "speech" in sample.keys(): + assert len(text_type) == 0 + sample = sample["speech"] + sample["modality"] = "speech" + + ### 1. do hubert forward and loss computation + net_output = model(target_list=sample["target_list"], **sample["net_input"]) + loss_m_list = [] + logp_m_list = model.get_logits(net_output, True) + targ_m_list = model.get_targets(net_output, True) + assert self.pred_masked_weight == 0 or len(logp_m_list) > 0 + for i, (logp_m, targ_m) in enumerate(zip(logp_m_list, targ_m_list)): + loss_m = F.cross_entropy(logp_m, targ_m, reduction=reduction) + loss_m_list.append(loss_m) + logging_output[f"loss_m_{i}"] = loss_m.detach().item() / targ_m_list[0].numel() + if self.pred_masked_weight > 0: + loss += self.pred_masked_weight * sum(loss_m_list) + sample_size += targ_m_list[0].numel() + + loss_u_list = [] + logp_u_list = model.get_logits(net_output, False) + targ_u_list = model.get_targets(net_output, False) + assert self.pred_nomask_weight == 0 or len(logp_u_list) > 0 + for i, (logp_u, targ_u) in enumerate(zip(logp_u_list, targ_u_list)): + loss_u = F.cross_entropy(logp_u, targ_u, reduction=reduction) + loss_u_list.append(loss_u) + logging_output[f"loss_u_{i}"] = loss_u.detach().item() / targ_m_list[0].numel() + if self.pred_nomask_weight > 0: + loss += self.pred_nomask_weight * sum(loss_u_list) + sample_size += targ_u_list[0].numel() + + if self.loss_weights is not None: + assert hasattr(model, "get_extra_losses") + extra_losses, names = model.get_extra_losses(net_output) + if torch.is_tensor(extra_losses): + extra_losses = [extra_losses] + names = [names] + if len(self.loss_weights) == 1 and len(extra_losses) != 1: + self.loss_weights = [self.loss_weights[0]] * len(extra_losses) + assert len(extra_losses) == len(self.loss_weights), f"{len(extra_losses)}, {len(self.loss_weights)}" + for p, n, coef in zip(extra_losses, names, self.loss_weights): + if coef != 0 and p is not None: + p = coef * p.float() * sample_size + loss += p + logging_output[f"loss_{n}"] = p.item() / sample_size + + if "decoder_target" in sample: + dec_sample_size = sample["dec_ntokens"] + dec_loss, dec_nll_loss = self.compute_ce_loss(model, net_output["decoder_out"], sample, reduce=reduce) + loss = loss + (self.dec_weight * dec_loss * sample_size / dec_sample_size) + logging_output["dec_loss"] = dec_loss.item() + logging_output["dec_nll_loss"] = dec_nll_loss.item() + logging_output["dec_sample_size"] = dec_sample_size + logging_output["hubert_sample_size"] = sample_size + + if self.report_accuracy: + n_correct, total = self.compute_accuracy(model, net_output["decoder_out"], sample) + logging_output["dec_n_correct"] = utils.item(n_correct.data) + logging_output["total"] = utils.item(total.data) + + loss = loss 
/ sample_size + + for lk in self.log_keys: + if lk in net_output: + logging_output[lk] = float((net_output[lk])) + + def compute_correct(logits): + if logits.numel() == 0: + return 0, 0 + else: + assert logits.dim() > 1, logits.shape + max = logits.argmax(-1) == 0 + min = logits.argmin(-1) == 0 + both = max & min + corr = max.long().sum().item() - both.long().sum().item() + count = max.numel() + return corr, count + + with torch.no_grad(): + for i, logp_m in enumerate(logp_m_list): + corr_m, count_m = compute_correct(logp_m) + logging_output[f"correct_m_{i}"] = corr_m + logging_output[f"count_m_{i}"] = count_m + + for i, logp_u in enumerate(logp_u_list): + corr_u, count_u = compute_correct(logp_u) + logging_output[f"correct_u_{i}"] = corr_u + logging_output[f"count_u_{i}"] = count_u + logging_output["speech_sample_size"] = sample_size + + else: + assert len(text_type) == 1 + text_type = text_type[0] + text_sample = sample[text_type] + text_sample["modality"] = "text" + ### 2. do text forward and loss computation + text_net_output = model(**text_sample["net_input"]) + text_dec_loss, text_dec_nll_loss = self.compute_ce_loss(model, text_net_output["decoder_out"], text_sample, reduce=reduce) + text_sample_size = text_sample["ntokens"] + loss = loss + (self.text_weight * text_dec_loss) + logging_output["text_dec_loss"] = text_dec_loss.item() + logging_output["text_dec_nll_loss"] = text_dec_nll_loss.item() + logging_output["text_sample_size"] = text_sample_size + + loss = loss / text_sample_size + sample_size = text_sample_size + sample = text_sample + + if self.report_accuracy: + n_correct, total = self.compute_accuracy(model, text_net_output["decoder_out"], text_sample) + logging_output["text_dec_n_correct"] = utils.item(n_correct.data) + logging_output["text_total"] = utils.item(total.data) + + logging_output = { + "loss": loss.item() if reduce else loss, + "ntokens": sample_size, + "nsentences": sample["id"].numel(), + "sample_size": 1, + **logging_output, + } + + return loss, 1, logging_output + + def compute_ce_loss(self, model, net_output, sample, reduce=True): + lprobs, target = self.get_lprobs_and_target(model, net_output, sample) + loss, nll_loss = label_smoothed_nll_loss( + lprobs, + target, + self.eps, + ignore_index=self.padding_idx, + reduce=reduce, + ) + return loss, nll_loss + + def compute_accuracy(self, model, net_output, sample): + lprobs, target = self.get_lprobs_and_target(model, net_output, sample) + mask = target.ne(self.padding_idx) + n_correct = torch.sum( + lprobs.argmax(1).masked_select(mask).eq(target.masked_select(mask)) + ) + total = torch.sum(mask) + return n_correct, total + + def get_lprobs_and_target(self, model, net_output, sample): + lprobs = model.get_normalized_probs(net_output, log_probs=True) + if sample["modality"] == "speech": + target = sample["decoder_target"] + if self.ignore_prefix_size > 0: + if getattr(lprobs, "batch_first", False): + lprobs = lprobs[:, self.ignore_prefix_size :, :].contiguous() + target = target[:, self.ignore_prefix_size :].contiguous() + else: + lprobs = lprobs[self.ignore_prefix_size :, :, :].contiguous() + target = target[self.ignore_prefix_size :, :].contiguous() + else: + target = sample["target"] + + return lprobs.view(-1, lprobs.size(-1)), target.view(-1) + + @staticmethod + def reduce_metrics(logging_outputs) -> None: + """Aggregate logging outputs from data parallel training (copied from normal cross entropy).""" + loss_sum = sum(log.get("loss", 0) for log in logging_outputs) + ntokens = sum(log.get("ntokens", 0) for log 
in logging_outputs) + sample_size = sum(log.get("sample_size", 0) for log in logging_outputs) + speech_sample_size = sum(log.get("speech_sample_size", 0) for log in logging_outputs) + + metrics.log_scalar("loss", loss_sum / sample_size / math.log(2), sample_size, round=3) + if sample_size != ntokens: + metrics.log_scalar("nll_loss", loss_sum / ntokens / math.log(2), ntokens, round=3) + metrics.log_derived("ppl", lambda meters: utils.get_perplexity(meters["nll_loss"].avg)) + else: + metrics.log_derived("ppl", lambda meters: utils.get_perplexity(meters["loss"].avg)) + + counts = {} + log_keys = [] + for log in logging_outputs: + log_keys += list(log.keys()) + log_keys = set(log_keys) + + for lk in log_keys: + if lk.startswith("count_"): + val = sum(log.get(lk, 0) for log in logging_outputs) + metrics.log_scalar(lk, val) + counts[lk] = val + + for lk in log_keys: + if lk.startswith("loss_") and speech_sample_size > 0: + val = sum(log.get(lk, 0) for log in logging_outputs) + metrics.log_scalar(lk, val / speech_sample_size / math.log(2), round=3) + elif lk.startswith("correct_"): + val = sum(log.get(lk, 0) for log in logging_outputs) + metrics.log_scalar(lk, val / counts[re.sub("correct", "count", lk)]) + + if "dec_loss" in logging_outputs[0]: + dec_loss_sum = sum(log.get("dec_loss", 0) for log in logging_outputs) + dec_nll_loss_sum = sum(log.get("dec_nll_loss", 0) for log in logging_outputs) + dec_sample_size = sum(log.get("dec_sample_size", 0) for log in logging_outputs) + metrics.log_scalar( + "dec_loss", dec_loss_sum / dec_sample_size / math.log(2), dec_sample_size, round=3 + ) + metrics.log_scalar( + "dec_nll_loss", dec_nll_loss_sum / dec_sample_size / math.log(2), dec_sample_size, round=3 + ) + metrics.log_derived( + "dec_ppl", lambda meters: utils.get_perplexity(meters["dec_nll_loss"].avg) + ) + total = utils.item(sum(log.get("total", 0) for log in logging_outputs)) + if total > 0: + metrics.log_scalar("total", total) + n_correct = utils.item( + sum(log.get("dec_n_correct", 0) for log in logging_outputs) + ) + metrics.log_scalar("dec_n_correct", n_correct) + metrics.log_derived( + "dec_accuracy", + lambda meters: round( + meters["dec_n_correct"].sum * 100.0 / meters["total"].sum, 3 + ) + if meters["total"].sum > 0 + else float("nan"), + ) + + # if "text_dec_loss" in logging_outputs[0]: + if any("text_dec_loss" in logging_output for logging_output in logging_outputs): + text_dec_loss_sum = sum(log.get("text_dec_loss", 0) for log in logging_outputs) + text_dec_nll_loss_sum = sum(log.get("text_dec_nll_loss", 0) for log in logging_outputs) + text_sample_size = sum(log.get("text_sample_size", 0) for log in logging_outputs) + metrics.log_scalar( + "text_dec_loss", text_dec_loss_sum / text_sample_size / math.log(2), text_sample_size, round=3 + ) + metrics.log_scalar( + "text_dec_nll_loss", text_dec_nll_loss_sum / text_sample_size / math.log(2), text_sample_size, round=3 + ) + metrics.log_derived( + "text_dec_ppl", lambda meters: utils.get_perplexity(meters["text_dec_nll_loss"].avg) + ) + text_total = utils.item(sum(log.get("text_total", 0) for log in logging_outputs)) + if text_total > 0: + metrics.log_scalar("text_total", text_total) + text_n_correct = utils.item( + sum(log.get("text_dec_n_correct", 0) for log in logging_outputs) + ) + metrics.log_scalar("text_dec_n_correct", text_n_correct) + metrics.log_derived( + "text_dec_accuracy", + lambda meters: round( + meters["text_dec_n_correct"].sum * 100.0 / meters["text_total"].sum, 3 + ) + if meters["text_total"].sum > 0 + else float("nan"), + ) 
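+ # Note: aggregate_logging_outputs below raises NotImplementedError; reduce_metrics above is the aggregation path used.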
+ + @staticmethod + def aggregate_logging_outputs(logging_outputs): + """Aggregate logging outputs from data parallel training.""" + raise NotImplementedError() + + @staticmethod + def logging_outputs_can_be_summed() -> bool: + """ + Whether the logging outputs returned by `forward` can be summed + across workers prior to calling `reduce_metrics`. Setting this + to True will improves distributed training speed. + """ + return False diff --git a/YiTrans/yitrans_iwslt22/criterions/joint_step2_criterion.py b/YiTrans/yitrans_iwslt22/criterions/joint_step2_criterion.py new file mode 100644 index 0000000000000000000000000000000000000000..aaafdbe1fd0cfc9826ed849bd94e2189a517014a --- /dev/null +++ b/YiTrans/yitrans_iwslt22/criterions/joint_step2_criterion.py @@ -0,0 +1,424 @@ +# -------------------------------------------------------- +# The YiTrans End-to-End Speech Translation System for IWSLT 2022 Offline Shared Task (https://arxiv.org/abs/2206.05777) +# Github source: https://github.com/microsoft/SpeechT5/tree/main/YiTrans +# Copyright (c) 2022 Microsoft +# Licensed under The MIT License [see LICENSE for details] +# Based on fairseq code bases +# https://github.com/facebookresearch/fairseq +# -------------------------------------------------------- + +import math +from argparse import Namespace +from dataclasses import dataclass, field +from omegaconf import II +from typing import Optional + +import torch +import torch.nn.functional as F +from fairseq import metrics, utils +from fairseq.criterions import FairseqCriterion, register_criterion +from fairseq.criterions.label_smoothed_cross_entropy import label_smoothed_nll_loss +from fairseq.data.data_utils import post_process +from fairseq.tasks import FairseqTask +from fairseq.logging.meters import safe_round + +from yitrans_iwslt22.criterions.ctc_ce import CtcCeCriterionConfig + +@dataclass +class JointStep2CriterionConfig(CtcCeCriterionConfig): + pass + + +@register_criterion("joint_step2", dataclass=JointStep2CriterionConfig) +class JointStep2Criterion(FairseqCriterion): + def __init__(self, cfg: JointStep2CriterionConfig, task: FairseqTask): + super().__init__(task) + self.blank_idx = ( + task.target_dictionary.index(task.blank_symbol) + if hasattr(task, "blank_symbol") + else 0 + ) + self.pad_idx = task.target_dictionary.pad() + self.eos_idx = task.target_dictionary.eos() + self.post_process = cfg.post_process + + if cfg.wer_args is not None: + ( + cfg.wer_kenlm_model, + cfg.wer_lexicon, + cfg.wer_lm_weight, + cfg.wer_word_score, + ) = eval(cfg.wer_args) + + if cfg.wer_kenlm_model is not None: + from examples.speech_recognition.w2l_decoder import W2lKenLMDecoder + + dec_args = Namespace() + dec_args.nbest = 1 + dec_args.criterion = "ctc" + dec_args.kenlm_model = cfg.wer_kenlm_model + dec_args.lexicon = cfg.wer_lexicon + dec_args.beam = 50 + dec_args.beam_size_token = min(50, len(task.target_dictionary)) + dec_args.beam_threshold = min(50, len(task.target_dictionary)) + dec_args.lm_weight = cfg.wer_lm_weight + dec_args.word_score = cfg.wer_word_score + dec_args.unk_weight = -math.inf + dec_args.sil_weight = 0 + + self.w2l_decoder = W2lKenLMDecoder(dec_args, task.target_dictionary) + else: + self.w2l_decoder = None + + self.zero_infinity = cfg.zero_infinity + self.sentence_avg = cfg.sentence_avg + + self.dec_weight = cfg.dec_weight + self.report_accuracy = cfg.report_accuracy + self.ignore_prefix_size = cfg.ignore_prefix_size + self.eps = cfg.label_smoothing + + def forward(self, model, sample, reduce=True): + text_type = [name for name in 
sample.keys() if name.startswith("text")] + logging_output = {} + if "speech" in sample.keys(): + assert len(text_type) == 0 + sample = sample["speech"] + sample["modality"] = "speech" + + net_output = model(**sample["net_input"]) + lprobs = model.get_normalized_probs( + net_output, log_probs=True + ).contiguous() # (T, B, C) from the encoder + + if "src_lengths" in sample["net_input"]: + input_lengths = sample["net_input"]["src_lengths"] + else: + if net_output["padding_mask"] is not None: + non_padding_mask = ~net_output["padding_mask"] + input_lengths = non_padding_mask.long().sum(-1) + else: + input_lengths = lprobs.new_full( + (lprobs.size(1),), lprobs.size(0), dtype=torch.long + ) + + pad_mask = (sample["target"] != self.pad_idx) & ( + sample["target"] != self.eos_idx + ) + targets_flat = sample["target"].masked_select(pad_mask) + if "target_lengths" in sample: + target_lengths = sample["target_lengths"] + else: + target_lengths = pad_mask.sum(-1) + + with torch.backends.cudnn.flags(enabled=False): + loss = F.ctc_loss( + lprobs, + targets_flat, + input_lengths, + target_lengths, + blank=self.blank_idx, + reduction="sum", + zero_infinity=self.zero_infinity, + ) + + ntokens = ( + sample["ntokens"] if "ntokens" in sample else target_lengths.sum().item() + ) + + sample_size = sample["target"].size(0) if self.sentence_avg else ntokens + + if "decoder_target" in sample: + if net_output["decoder_out"] is not None: + dec_sample_size = sample["target"].size(0) if self.sentence_avg else sample["dec_ntokens"] + dec_loss, dec_nll_loss = self.compute_ce_loss(model, net_output["decoder_out"], sample, reduce=reduce) + logging_output["ctc_loss"] = loss.item() + loss = (1 - self.dec_weight) * loss + (self.dec_weight * dec_loss * sample_size / dec_sample_size) + logging_output["dec_loss"] = dec_loss.item() + logging_output["dec_nll_loss"] = dec_nll_loss.item() + logging_output["dec_sample_size"] = dec_sample_size + + if self.report_accuracy: + n_correct, total = self.compute_accuracy(model, net_output["decoder_out"], sample) + logging_output["dec_n_correct"] = utils.item(n_correct.data) + logging_output["total"] = utils.item(total.data) + else: + logging_output["ctc_loss"] = loss.item() + loss = (1 - self.dec_weight) * loss + logging_output["dec_loss"] = 0 + logging_output["dec_nll_loss"] = 0 + logging_output["dec_sample_size"] = 1 + if self.report_accuracy: + logging_output["dec_n_correct"] = 0 + logging_output["total"] = 1 + loss = loss / sample_size + logging_output["speech_sample_size"] = sample_size + else: + assert len(text_type) == 1 + text_type = text_type[0] + text_sample = sample[text_type] + text_sample["modality"] = "text" + ### 2. 
do text forward and loss computation + text_net_output = model(**text_sample["net_input"]) + text_dec_loss, text_dec_nll_loss = self.compute_ce_loss(model, text_net_output["decoder_out"], text_sample, reduce=reduce) + text_sample_size = text_sample["target"].size(0) if self.sentence_avg else text_sample["ntokens"] + loss = text_dec_loss + logging_output["text_dec_loss"] = text_dec_loss.item() + logging_output["text_dec_nll_loss"] = text_dec_nll_loss.item() + logging_output["text_sample_size"] = text_sample_size + + loss = loss / text_sample_size + sample = text_sample + ntokens = text_sample["ntokens"] + + if self.report_accuracy: + n_correct, total = self.compute_accuracy(model, text_net_output["decoder_out"], text_sample) + logging_output["text_dec_n_correct"] = utils.item(n_correct.data) + logging_output["text_total"] = utils.item(total.data) + + logging_output = { + "loss": utils.item(loss.data), # * sample['ntokens'], + "ntokens": ntokens, + "nsentences": sample["id"].numel(), + "sample_size": 1, + **logging_output, + } + + if not model.training and self.dec_weight < 1.0 and "speech" in sample.keys(): + import editdistance + + with torch.no_grad(): + lprobs_t = lprobs.transpose(0, 1).float().contiguous().cpu() + + c_err = 0 + c_len = 0 + w_errs = 0 + w_len = 0 + wv_errs = 0 + for lp, t, inp_l in zip( + lprobs_t, + sample["target_label"] + if "target_label" in sample + else sample["target"], + input_lengths, + ): + lp = lp[:inp_l].unsqueeze(0) + + decoded = None + if self.w2l_decoder is not None: + decoded = self.w2l_decoder.decode(lp) + if len(decoded) < 1: + decoded = None + else: + decoded = decoded[0] + if len(decoded) < 1: + decoded = None + else: + decoded = decoded[0] + + p = (t != self.task.target_dictionary.pad()) & ( + t != self.task.target_dictionary.eos() + ) + targ = t[p] + targ_units = self.task.target_dictionary.string(targ) + targ_units_arr = targ.tolist() + + toks = lp.argmax(dim=-1).unique_consecutive() + pred_units_arr = toks[toks != self.blank_idx].tolist() + + c_err += editdistance.eval(pred_units_arr, targ_units_arr) + c_len += len(targ_units_arr) + + targ_words = post_process(targ_units, self.post_process).split() + + pred_units = self.task.target_dictionary.string(pred_units_arr) + pred_words_raw = post_process(pred_units, self.post_process).split() + + if decoded is not None and "words" in decoded: + pred_words = decoded["words"] + w_errs += editdistance.eval(pred_words, targ_words) + wv_errs += editdistance.eval(pred_words_raw, targ_words) + else: + dist = editdistance.eval(pred_words_raw, targ_words) + w_errs += dist + wv_errs += dist + + w_len += len(targ_words) + + logging_output["wv_errors"] = wv_errs + logging_output["w_errors"] = w_errs + logging_output["w_total"] = w_len + logging_output["c_errors"] = c_err + logging_output["c_total"] = c_len + + return loss, 1, logging_output + + def compute_ce_loss(self, model, net_output, sample, reduce=True): + lprobs, target = self.get_lprobs_and_target(model, net_output, sample) + loss, nll_loss = label_smoothed_nll_loss( + lprobs, + target, + self.eps, + ignore_index=self.pad_idx, + reduce=reduce, + ) + return loss, nll_loss + + def compute_accuracy(self, model, net_output, sample): + lprobs, target = self.get_lprobs_and_target(model, net_output, sample) + mask = target.ne(self.pad_idx) + n_correct = torch.sum( + lprobs.argmax(1).masked_select(mask).eq(target.masked_select(mask)) + ) + total = torch.sum(mask) + return n_correct, total + + def get_lprobs_and_target(self, model, net_output, sample): + lprobs = 
model.get_normalized_probs(net_output, log_probs=True) + if sample["modality"] == "speech": + target = sample["decoder_target"] + if self.ignore_prefix_size > 0: + if getattr(lprobs, "batch_first", False): + lprobs = lprobs[:, self.ignore_prefix_size :, :].contiguous() + target = target[:, self.ignore_prefix_size :].contiguous() + else: + lprobs = lprobs[self.ignore_prefix_size :, :, :].contiguous() + target = target[self.ignore_prefix_size :, :].contiguous() + else: + target = sample["target"] + + return lprobs.view(-1, lprobs.size(-1)), target.view(-1) + + + @staticmethod + def reduce_metrics(logging_outputs) -> None: + """Aggregate logging outputs from data parallel training.""" + + loss_sum = utils.item(sum(log.get("loss", 0) for log in logging_outputs)) + ntokens = utils.item(sum(log.get("ntokens", 0) for log in logging_outputs)) + nsentences = utils.item( + sum(log.get("nsentences", 0) for log in logging_outputs) + ) + sample_size = utils.item( + sum(log.get("sample_size", 0) for log in logging_outputs) + ) + + metrics.log_scalar( + "loss", loss_sum / sample_size / math.log(2), sample_size, round=3 + ) + metrics.log_scalar("ntokens", ntokens) + metrics.log_scalar("nsentences", nsentences) + if sample_size != ntokens: + metrics.log_scalar( + "nll_loss", loss_sum / ntokens / math.log(2), ntokens, round=3 + ) + + c_errors = sum(log.get("c_errors", 0) for log in logging_outputs) + metrics.log_scalar("_c_errors", c_errors) + c_total = sum(log.get("c_total", 0) for log in logging_outputs) + metrics.log_scalar("_c_total", c_total) + w_errors = sum(log.get("w_errors", 0) for log in logging_outputs) + metrics.log_scalar("_w_errors", w_errors) + wv_errors = sum(log.get("wv_errors", 0) for log in logging_outputs) + metrics.log_scalar("_wv_errors", wv_errors) + w_total = sum(log.get("w_total", 0) for log in logging_outputs) + metrics.log_scalar("_w_total", w_total) + + if c_total > 0: + metrics.log_derived( + "uer", + lambda meters: safe_round( + meters["_c_errors"].sum * 100.0 / meters["_c_total"].sum, 3 + ) + if meters["_c_total"].sum > 0 + else float("nan"), + ) + if w_total > 0: + metrics.log_derived( + "wer", + lambda meters: safe_round( + meters["_w_errors"].sum * 100.0 / meters["_w_total"].sum, 3 + ) + if meters["_w_total"].sum > 0 + else float("nan"), + ) + metrics.log_derived( + "raw_wer", + lambda meters: safe_round( + meters["_wv_errors"].sum * 100.0 / meters["_w_total"].sum, 3 + ) + if meters["_w_total"].sum > 0 + else float("nan"), + ) + + if "dec_loss" in logging_outputs[0]: + ctc_loss_sum = sum(log.get("ctc_loss", 0) for log in logging_outputs) + dec_loss_sum = sum(log.get("dec_loss", 0) for log in logging_outputs) + dec_nll_loss_sum = sum(log.get("dec_nll_loss", 0) for log in logging_outputs) + dec_sample_size = sum(log.get("dec_sample_size", 0) for log in logging_outputs) + metrics.log_scalar( + "dec_loss", dec_loss_sum / dec_sample_size / math.log(2), dec_sample_size, round=3 + ) + metrics.log_scalar( + "ctc_loss", ctc_loss_sum / sample_size / math.log(2), sample_size, round=3 + ) + metrics.log_scalar( + "dec_nll_loss", dec_nll_loss_sum / dec_sample_size / math.log(2), dec_sample_size, round=3 + ) + metrics.log_derived( + "dec_ppl", lambda meters: utils.get_perplexity(meters["dec_nll_loss"].avg) + ) + total = utils.item(sum(log.get("total", 0) for log in logging_outputs)) + if total > 0: + metrics.log_scalar("total", total) + n_correct = utils.item( + sum(log.get("dec_n_correct", 0) for log in logging_outputs) + ) + metrics.log_scalar("dec_n_correct", n_correct) + 
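+ # dec_accuracy is derived at logging time from the summed dec_n_correct / total counters.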
metrics.log_derived( + "dec_accuracy", + lambda meters: round( + meters["dec_n_correct"].sum * 100.0 / meters["total"].sum, 3 + ) + if meters["total"].sum > 0 + else float("nan"), + ) + + # if "text_dec_loss" in logging_outputs[0]: + if any("text_dec_loss" in logging_output for logging_output in logging_outputs): + text_dec_loss_sum = sum(log.get("text_dec_loss", 0) for log in logging_outputs) + text_dec_nll_loss_sum = sum(log.get("text_dec_nll_loss", 0) for log in logging_outputs) + text_sample_size = sum(log.get("text_sample_size", 0) for log in logging_outputs) + metrics.log_scalar( + "text_dec_loss", text_dec_loss_sum / text_sample_size / math.log(2), text_sample_size, round=3 + ) + metrics.log_scalar( + "text_dec_nll_loss", text_dec_nll_loss_sum / text_sample_size / math.log(2), text_sample_size, round=3 + ) + metrics.log_derived( + "text_dec_ppl", lambda meters: utils.get_perplexity(meters["text_dec_nll_loss"].avg) + ) + text_total = utils.item(sum(log.get("text_total", 0) for log in logging_outputs)) + if text_total > 0: + metrics.log_scalar("text_total", text_total) + text_n_correct = utils.item( + sum(log.get("text_dec_n_correct", 0) for log in logging_outputs) + ) + metrics.log_scalar("text_dec_n_correct", text_n_correct) + metrics.log_derived( + "text_dec_accuracy", + lambda meters: round( + meters["text_dec_n_correct"].sum * 100.0 / meters["text_total"].sum, 3 + ) + if meters["text_total"].sum > 0 + else float("nan"), + ) + + @staticmethod + def logging_outputs_can_be_summed() -> bool: + """ + Whether the logging outputs returned by `forward` can be summed + across workers prior to calling `reduce_metrics`. Setting this + to True will improves distributed training speed. + """ + return False diff --git a/YiTrans/yitrans_iwslt22/data/concat_dataset.py b/YiTrans/yitrans_iwslt22/data/concat_dataset.py new file mode 100644 index 0000000000000000000000000000000000000000..9cdb5231d7cc6e701b99f5490d3406fad139c20f --- /dev/null +++ b/YiTrans/yitrans_iwslt22/data/concat_dataset.py @@ -0,0 +1,124 @@ +# modalified from https://github.com/facebookresearch/fairseq/blob/main/fairseq/data/concat_dataset.py + +import bisect + +import numpy as np +from torch.utils.data.dataloader import default_collate + +from fairseq.data import FairseqDataset + + +class ConcatDataset(FairseqDataset): + @staticmethod + def cumsum(sequence, sample_ratios): + r, s = [], 0 + for e, ratio in zip(sequence, sample_ratios): + curr_len = int(ratio * len(e)) + r.append(curr_len + s) + s += curr_len + return r + + def __init__(self, datasets, sample_ratios=1): + super(ConcatDataset, self).__init__() + assert len(datasets) > 0, "datasets should not be an empty iterable" + self.datasets = list(datasets) + if isinstance(sample_ratios, int): + sample_ratios = [sample_ratios] * len(self.datasets) + self.sample_ratios = sample_ratios + self.cumulative_sizes = self.cumsum(self.datasets, sample_ratios) + self.real_sizes = [len(d) for d in self.datasets] + + def __len__(self): + return self.cumulative_sizes[-1] + + def __getitem__(self, idx): + dataset_idx, sample_idx = self._get_dataset_and_sample_index(idx) + return self.datasets[dataset_idx][sample_idx] + + def _get_dataset_and_sample_index(self, idx: int): + dataset_idx = bisect.bisect_right(self.cumulative_sizes, idx) + if dataset_idx == 0: + sample_idx = idx + else: + sample_idx = idx - self.cumulative_sizes[dataset_idx - 1] + sample_idx = sample_idx % self.real_sizes[dataset_idx] + return dataset_idx, sample_idx + + def collater(self, samples, **extra_args): + # For now 
only supports datasets with same underlying collater implementations + if hasattr(self.datasets[0], "collater"): + return self.datasets[0].collater(samples, **extra_args) + else: + return default_collate(samples, **extra_args) + + def size(self, idx: int): + """ + Return an example's size as a float or tuple. + """ + dataset_idx, sample_idx = self._get_dataset_and_sample_index(idx) + return self.datasets[dataset_idx].size(sample_idx) + + def num_tokens(self, index: int): + return np.max(self.size(index)) + + def attr(self, attr: str, index: int): + dataset_idx = bisect.bisect_right(self.cumulative_sizes, index) + return getattr(self.datasets[dataset_idx], attr, None) + + @property + def sizes(self): + _dataset_sizes = [] + for ds, sr in zip(self.datasets, self.sample_ratios): + if isinstance(ds.sizes, np.ndarray): + _dataset_sizes.append(np.tile(ds.sizes, sr)) + else: + # Only support underlying dataset with single size array. + assert isinstance(ds.sizes, list) + _dataset_sizes.append(np.tile(ds.sizes[0], sr)) + return np.concatenate(_dataset_sizes) + + @property + def supports_prefetch(self): + return all(d.supports_prefetch for d in self.datasets) + + def ordered_indices(self): + """ + Returns indices sorted by length. So less padding is needed. + """ + if isinstance(self.sizes, np.ndarray) and len(self.sizes.shape) > 1: + # special handling for concatenating lang_pair_datasets + if getattr(self.datasets[0], "shuffle", False): + indices = np.random.permutation(len(self)).astype(np.int64) + else: + indices = np.arange(len(self), dtype=np.int64) + sizes = self.sizes + tgt_sizes = ( + sizes[:, 1] if len(sizes.shape) > 0 and sizes.shape[1] > 1 else None + ) + src_sizes = ( + sizes[:, 0] if len(sizes.shape) > 0 and sizes.shape[1] > 1 else sizes + ) + # sort by target length, then source length + if tgt_sizes is not None: + indices = indices[np.argsort(tgt_sizes[indices], kind="mergesort")] + return indices[np.argsort(src_sizes[indices], kind="mergesort")] + else: + return np.argsort(self.sizes) + + def prefetch(self, indices): + frm = 0 + for to, ds in zip(self.cumulative_sizes, self.datasets): + real_size = len(ds) + if getattr(ds, "supports_prefetch", False): + ds.prefetch([(i - frm) % real_size for i in indices if frm <= i < to]) + frm = to + + @property + def can_reuse_epoch_itr_across_epochs(self): + return all(d.can_reuse_epoch_itr_across_epochs for d in self.datasets) + + def set_epoch(self, epoch): + super().set_epoch(epoch) + for ds in self.datasets: + if hasattr(ds, "set_epoch"): + ds.set_epoch(epoch) diff --git a/YiTrans/yitrans_iwslt22/data/denoising_dataset.py b/YiTrans/yitrans_iwslt22/data/denoising_dataset.py new file mode 100644 index 0000000000000000000000000000000000000000..f49870600047683fa7b9e37bc50a86bf0c87be53 --- /dev/null +++ b/YiTrans/yitrans_iwslt22/data/denoising_dataset.py @@ -0,0 +1,90 @@ +# -------------------------------------------------------- +# The YiTrans End-to-End Speech Translation System for IWSLT 2022 Offline Shared Task (https://arxiv.org/abs/2206.05777) +# Github source: https://github.com/microsoft/SpeechT5/tree/main/YiTrans +# Copyright (c) 2022 Microsoft +# Licensed under The MIT License [see LICENSE for details] +# Based on fairseq code bases +# https://github.com/facebookresearch/fairseq +# -------------------------------------------------------- + +import math + +import numpy as np +import torch + +from fairseq.data import FairseqDataset, data_utils, DenoisingDataset + + +class DenoisingDatasetLang(DenoisingDataset): + """ + A wrapper around 
DenoisingDataset for BART dataset. + + """ + + def __init__( + self, + dataset, + sizes, + vocab, + mask_idx, + mask_whole_words, + shuffle, + seed, + args, + eos=None, + item_transform_func=None, + tgt_lang_idx=None, + ): + super().__init__( + dataset, + sizes, + vocab, + mask_idx, + mask_whole_words, + shuffle, + seed, + args, + eos, + item_transform_func, + ) + + self.tgt_lang_idx=tgt_lang_idx + + def __getitem__(self, index): + with data_utils.numpy_seed(self.seed, self.epoch, index): + tokens = self.dataset[index] + assert tokens[-1] == self.eos + source, target = tokens, tokens.clone() + + if self.permute_sentence_ratio > 0.0: + source = self.permute_sentences(source, self.permute_sentence_ratio) + + if self.mask_ratio > 0: + source = self.add_whole_word_mask(source, self.mask_ratio) + + if self.insert_ratio > 0: + source = self.add_insertion_noise(source, self.insert_ratio) + + if self.rotate_ratio > 0.0 and np.random.random() < self.rotate_ratio: + source = self.add_rolling_noise(source) + # there can additional changes to make: + if self.item_transform_func is not None: + source, target = self.item_transform_func(source, target) + + assert (source >= 0).all() + assert (source[1:-1] >= 1).all() + assert (source <= len(self.vocab)).all() + assert source[0] == self.vocab.bos() + assert target[0] == self.vocab.bos() + assert source[-1] == self.eos + + if self.tgt_lang_idx is not None: + tgt_lang_idx = torch.LongTensor([self.tgt_lang_idx]) + source = torch.cat([source[1:], tgt_lang_idx]) + target = torch.cat([target[1:], tgt_lang_idx]) + sample = { + "id": index, + "source": source, + "target": target, + } + return sample diff --git a/YiTrans/yitrans_iwslt22/data/lang_pair_mask_dataset.py b/YiTrans/yitrans_iwslt22/data/lang_pair_mask_dataset.py new file mode 100644 index 0000000000000000000000000000000000000000..e5617a23150a2268cd1ba36b9b7fed4c5e7b3d09 --- /dev/null +++ b/YiTrans/yitrans_iwslt22/data/lang_pair_mask_dataset.py @@ -0,0 +1,62 @@ +# -------------------------------------------------------- +# The YiTrans End-to-End Speech Translation System for IWSLT 2022 Offline Shared Task (https://arxiv.org/abs/2206.05777) +# Github source: https://github.com/microsoft/SpeechT5/tree/main/YiTrans +# Copyright (c) 2022 Microsoft +# Licensed under The MIT License [see LICENSE for details] +# Based on fairseq code bases +# https://github.com/facebookresearch/fairseq +# -------------------------------------------------------- + +""" + Modified from https://github.com/facebookresearch/fairseq/blob/main/fairseq/data/audio/multi_modality_dataset.py +""" + + +from typing import Optional + +import numpy as np +import torch +from fairseq.data import ( + LanguagePairDataset, +) +from fairseq.data.audio.multi_modality_dataset import LangPairMaskDataset as FairseqLangPairMaskDataset + +class LangPairMaskDataset(FairseqLangPairMaskDataset): + def __init__( + self, + dataset: LanguagePairDataset, + src_eos: int, + src_bos: Optional[int] = None, + noise_id: Optional[int] = -1, + mask_ratio: Optional[float] = 0, + mask_type: Optional[str] = "random", + ): + super.__init__( + dataset, + src_eos, + src_bos, + noise_id, + mask_ratio, + mask_type, + ) + def mask_src_tokens(self, sample): + src_item = sample["source"] + mask = None + if self.mask_type == "random": + mask = torch.rand(len(src_item)).le(self.mask_ratio) + else: + mask = torch.ones(len(src_item)) + mask[: int(len(src_item) * (1 - self.mask_ratio))] = 0 + mask = mask.eq(1) + if src_item[0] == self.src_bos: + mask[0] = False + if src_item[-1] == 
self.src_eos: + mask[-1] = False + mask_src_item = src_item.masked_fill(mask, self.noise_id) + smp = sample + smp["source"] = mask_src_item + return smp + + def collater(self, samples, pad_to_length=None): + return self.dataset.collater(samples, pad_to_length=pad_to_length) + diff --git a/YiTrans/yitrans_iwslt22/data/load_langpair_dataset.py b/YiTrans/yitrans_iwslt22/data/load_langpair_dataset.py new file mode 100644 index 0000000000000000000000000000000000000000..62c5e7b7789deece849df07b560d775c5c84d5c0 --- /dev/null +++ b/YiTrans/yitrans_iwslt22/data/load_langpair_dataset.py @@ -0,0 +1,170 @@ +# -------------------------------------------------------- +# The YiTrans End-to-End Speech Translation System for IWSLT 2022 Offline Shared Task (https://arxiv.org/abs/2206.05777) +# Github source: https://github.com/microsoft/SpeechT5/tree/main/YiTrans +# Copyright (c) 2022 Microsoft +# Licensed under The MIT License [see LICENSE for details] +# Based on fairseq code bases +# https://github.com/facebookresearch/fairseq +# -------------------------------------------------------- + +""" + Modified from https://github.com/facebookresearch/fairseq/blob/main/fairseq/tasks/translation.py + Add custom lang_format in function load_langpair_dataset +""" + +import itertools +import logging +import os + +from fairseq.data import ( + AppendTokenDataset, + LanguagePairDataset, + PrependTokenDataset, + StripTokenDataset, + TruncateDataset, + data_utils, + indexed_dataset, +) + +from yitrans_iwslt22.data.concat_dataset import ConcatDataset + + +EVAL_BLEU_ORDER = 4 + + +logger = logging.getLogger(__name__) + + +def load_langpair_dataset( + data_path, + split, + src, + src_dict, + tgt, + tgt_dict, + combine, + dataset_impl, + upsample_primary, + left_pad_source, + left_pad_target, + max_source_positions, + max_target_positions, + prepend_bos=False, + load_alignments=False, + truncate_source=False, + append_source_id=False, + num_buckets=0, + shuffle=True, + pad_to_multiple=1, + prepend_bos_src=None, + lang_format="[{}]", +): + def split_exists(split, src, tgt, lang, data_path): + filename = os.path.join(data_path, "{}.{}-{}.{}".format(split, src, tgt, lang)) + return indexed_dataset.dataset_exists(filename, impl=dataset_impl) + + src_datasets = [] + tgt_datasets = [] + + for k in itertools.count(): + split_k = split + (str(k) if k > 0 else "") + + # infer langcode + if split_exists(split_k, src, tgt, src, data_path): + prefix = os.path.join(data_path, "{}.{}-{}.".format(split_k, src, tgt)) + elif split_exists(split_k, tgt, src, src, data_path): + prefix = os.path.join(data_path, "{}.{}-{}.".format(split_k, tgt, src)) + else: + if k > 0: + break + else: + raise FileNotFoundError( + "Dataset not found: {} ({})".format(split, data_path) + ) + + src_dataset = data_utils.load_indexed_dataset( + prefix + src, src_dict, dataset_impl + ) + if truncate_source: + src_dataset = AppendTokenDataset( + TruncateDataset( + StripTokenDataset(src_dataset, src_dict.eos()), + max_source_positions - 1, + ), + src_dict.eos(), + ) + src_datasets.append(src_dataset) + + tgt_dataset = data_utils.load_indexed_dataset( + prefix + tgt, tgt_dict, dataset_impl + ) + if tgt_dataset is not None: + tgt_datasets.append(tgt_dataset) + + logger.info( + "{} {} {}-{} {} examples".format( + data_path, split_k, src, tgt, len(src_datasets[-1]) + ) + ) + + if not combine: + break + + assert len(src_datasets) == len(tgt_datasets) or len(tgt_datasets) == 0 + + if len(src_datasets) == 1: + src_dataset = src_datasets[0] + tgt_dataset = tgt_datasets[0] if 
len(tgt_datasets) > 0 else None + else: + sample_ratios = [1] * len(src_datasets) + sample_ratios[0] = upsample_primary + src_dataset = ConcatDataset(src_datasets, sample_ratios) + if len(tgt_datasets) > 0: + tgt_dataset = ConcatDataset(tgt_datasets, sample_ratios) + else: + tgt_dataset = None + + if prepend_bos: + assert hasattr(src_dict, "bos_index") and hasattr(tgt_dict, "bos_index") + src_dataset = PrependTokenDataset(src_dataset, src_dict.bos()) + if tgt_dataset is not None: + tgt_dataset = PrependTokenDataset(tgt_dataset, tgt_dict.bos()) + elif prepend_bos_src is not None: + logger.info(f"prepending src bos: {prepend_bos_src}") + src_dataset = PrependTokenDataset(src_dataset, prepend_bos_src) + + eos = None + if append_source_id: + src_dataset = AppendTokenDataset( + src_dataset, src_dict.index(lang_format.format(src)) + ) + if tgt_dataset is not None: + tgt_dataset = AppendTokenDataset( + tgt_dataset, tgt_dict.index(lang_format.format(tgt)) + ) + eos = tgt_dict.index(lang_format.format(tgt)) + + align_dataset = None + if load_alignments: + align_path = os.path.join(data_path, "{}.align.{}-{}".format(split, src, tgt)) + if indexed_dataset.dataset_exists(align_path, impl=dataset_impl): + align_dataset = data_utils.load_indexed_dataset( + align_path, None, dataset_impl + ) + + tgt_dataset_sizes = tgt_dataset.sizes if tgt_dataset is not None else None + return LanguagePairDataset( + src_dataset, + src_dataset.sizes, + src_dict, + tgt_dataset, + tgt_dataset_sizes, + tgt_dict, + left_pad_source=left_pad_source, + left_pad_target=left_pad_target, + align_dataset=align_dataset, + eos=eos, + num_buckets=num_buckets, + shuffle=shuffle, + pad_to_multiple=pad_to_multiple, + ) diff --git a/YiTrans/yitrans_iwslt22/data/multimodal_corpus_dataset.py b/YiTrans/yitrans_iwslt22/data/multimodal_corpus_dataset.py new file mode 100644 index 0000000000000000000000000000000000000000..ee02a4e9dee21acd200ef038f6be3a241d51479f --- /dev/null +++ b/YiTrans/yitrans_iwslt22/data/multimodal_corpus_dataset.py @@ -0,0 +1,346 @@ +# -------------------------------------------------------- +# The YiTrans End-to-End Speech Translation System for IWSLT 2022 Offline Shared Task (https://arxiv.org/abs/2206.05777) +# Github source: https://github.com/microsoft/SpeechT5/tree/main/YiTrans +# Copyright (c) 2022 Microsoft +# Licensed under The MIT License [see LICENSE for details] +# Based on fairseq code bases +# https://github.com/facebookresearch/fairseq +# -------------------------------------------------------- + +import logging +from os import replace +import time +from collections import OrderedDict +from typing import Any, Dict, List, Optional + +import numpy as np +from fairseq.data import data_utils + +from fairseq.data import FairseqDataset + +logger = logging.getLogger(__name__) + + +class MultiCorpusDataset(FairseqDataset): + """ + see fairseq/fairseq/data/multi_corpus_dataset.__doc__ + + Args: + datasets: a OrderedDict of FairseqDataset instances. 
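+ Keys are corpus names; get_batch_sampler below treats keys starting with "speech" as speech corpora and all other keys as text corpora.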
+ distribution: a List containing the probability of getting an utterance from + corresponding dataset + seed: random seed for sampling the datsets + sort_indices: if true, will sort the ordered indices by size + batch_sample: if true, will ensure each batch is from a single dataset + """ + + def __init__( + self, + datasets: Dict[str, FairseqDataset], + max_positions: Dict, + distribution: List[float], + max_tokens_ratio: List[float], + seed: int, + sort_indices: bool = False, + check_length: bool = False, + ): + super().__init__() + assert isinstance(datasets, OrderedDict) + assert len(datasets) == len(distribution) + # assert sum(distribution) == 1 + self.datasets = datasets + self.distribution = distribution + self.max_tokens_ratio = max_tokens_ratio + self.seed = seed + self.sort_indices = sort_indices + self.max_positions = max_positions + self.check_length = check_length + + # Avoid repeated conversions to list later + self.dataset_list = list(datasets.values()) + self.total_num_instances = 0 + + # first_dataset = self.dataset_list[0] + + self.num_instances_per_dataset = [] + self.dataset_offsets = [] + for i, dataset in enumerate(self.dataset_list): + assert isinstance(dataset, FairseqDataset) + # assert type(dataset) is type(first_dataset) + self.num_instances_per_dataset.append( + 0 if self.distribution[i] == 0 else len(dataset) + ) + self.dataset_offsets.append(self.total_num_instances) + self.total_num_instances += self.num_instances_per_dataset[i] + + def ordered_indices(self): + start = time.time() + with data_utils.numpy_seed(self.seed, self.epoch): + logger.info(f"sampling new dataset with seed {self.seed} epoch {self.epoch}") + sampled_indices = {} + + # For each dataset i, sample self.distribution[i] * self.total_num_instances + for i, key in enumerate(self.datasets): + tp = time.time() + if self.distribution[i] == 0: + # skip dataset if sampling probability is 0 + continue + + if i < len(self.datasets) - 1: + num_instances = int(self.distribution[i] * self.total_num_instances) + high = self.dataset_offsets[i + 1] + else: + num_instances = int(self.distribution[i] * self.total_num_instances) + high = self.total_num_instances + + logger.info(f"sampling {num_instances} from {key} dataset") + + # First, add k copies of the dataset where k = num_instances // len(dataset). + # This ensures an equal distribution of the data points as much as possible. 
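+ # (For example, with purely illustrative numbers: num_instances = 25 and len(dataset) = 10
+ # give num_copies = 25 // 10 = 2, leaving 25 - 2 * 10 = 5 entries still to be sampled.)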
+ # For the remaining entries randomly sample them + dataset_size = len(self.datasets[key]) + num_copies = num_instances // dataset_size + dataset_indices = np.random.permutation(high - self.dataset_offsets[i])[: num_instances - num_copies * dataset_size] + if num_copies > 0: + dataset_indices = np.concatenate( + ( + np.repeat( + np.arange(high - self.dataset_offsets[i]), num_copies + ), + dataset_indices, + ) + ) + # filter by size, we should ignore it by setting check_length=False + # , as it is very time-consuming on large dadaset + if self.max_positions[key] is not None and self.check_length: + dataset_indices, ignored = self.datasets[key].filter_indices_by_size( + dataset_indices, + self.max_positions[key], + ) + if len(ignored) > 0: + logger.warning( + ( + "{:,} samples have invalid sizes and will be skipped, " + "max_positions={}, first few sample ids={}" + ).format(len(ignored), self.max_positions[key], ignored[:10]) + ) + + if self.sort_indices: + logger.info(" - sampled indices took {}s".format(time.time() - tp)) + tp = time.time() + dataset_indices = np.sort(dataset_indices) + dataset_indices = self.datasets[key].ordered_indices()[dataset_indices] + self.dataset_offsets[i] + logger.info(" - ordered_indices took {}s".format(time.time() - tp)) + else: + np.random.shuffle(dataset_indices) + + sampled_indices[key] = dataset_indices + + logger.info( + "multi_corpus_dataset ordered_indices took {}s".format( + time.time() - start + ) + ) + return sampled_indices + + def _map_index(self, index: int): + """ + If dataset A has length N and dataset B has length M + then index 1 maps to index 1 of dataset A, and index N + 1 + maps to index 1 of B. + """ + counter = 0 + for num_instances, key in zip(self.num_instances_per_dataset, self.datasets): + if index < counter + num_instances: + return index - counter, key + counter += num_instances + raise ValueError( + "Invalid index: {}, max: {}".format(index, self.total_num_instances) + ) + + def __len__(self): + """ + Length of this dataset is the sum of individual datasets + """ + return self.total_num_instances + + def __getitem__(self, index): + new_index, key = self._map_index(index) + try: + item = self.datasets[key][new_index] + item["full_id"] = index + return item + except Exception as e: + e.args = (f"Error from {key} dataset", *e.args) + raise + + def collater(self, samples): + """ + If we are doing batch sampling, then pick the right collater to use. + + Otherwise we assume all collaters are the same. 
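+ Each sample carries a "full_id" (set in __getitem__), which is mapped back to its corpus
+ key here so that every corpus collates its own sub-batch. The result is a dict keyed by
+ corpus name, e.g. (names purely illustrative) {"speech_train": <speech batch>,
+ "text_train": <text batch>}; corpora with no samples in the mini-batch are omitted.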
+ """ + if len(samples) == 0: + return None + + samples_dict = {key: [] for key in self.datasets} + for s in samples: + _, key = self._map_index(s["full_id"]) + samples_dict[key].append(s) + + batch = {} + for key in samples_dict: + if len(samples_dict[key]) == 0: + continue + batch[key] = self.datasets[key].collater(samples_dict[key]) + + return batch + + + def num_tokens(self, index: int): + index, key = self._map_index(index) + return self.datasets[key].num_tokens(index) + + def size(self, index: int): + index, key = self._map_index(index) + return self.datasets[key].size(index) + + @property + def can_reuse_epoch_itr_across_epochs(self): + return False + + def set_epoch(self, epoch, **unused): + super().set_epoch(epoch) + logger.info(f"setting epoch of multi_corpus_dataset to {epoch}") + for ds in self.dataset_list: + if hasattr(ds, "set_epoch"): + ds.set_epoch(epoch) + self.epoch = epoch + + @property + def supports_prefetch(self): + return False + + @property + def supports_fetch_outside_dataloader(self): + return all( + self.datasets[key].supports_fetch_outside_dataloader + for key in self.datasets + ) + + + def batch_by_size( + self, + indices, + max_tokens=None, + max_sentences=None, + required_batch_size_multiple=1, + ): + dataset_indices = indices + batches_dict = {} + for n, key in enumerate(dataset_indices): + max_tokens_ratio = self.max_tokens_ratio[n] + cur_batches = super().batch_by_size( + np.array(dataset_indices[key], dtype=np.int64), + round(max_tokens * max_tokens_ratio), + max_sentences, + required_batch_size_multiple, + ) + logger.info(f"Created {len(cur_batches)} batches for dataset {key}") + batches_dict[key] = cur_batches + + return batches_dict + + + def get_batch_sampler( + self, + indices, + num_shards, + seed, + max_tokens=None, + max_sentences=None, + required_batch_size_multiple=1, + split_modality_batch=False, + ): + + def batch_sampler(dataset, epoch): + start = time.time() + batches_dict = dataset.batch_by_size( + indices, + max_tokens=max_tokens, + max_sentences=max_sentences, + required_batch_size_multiple=required_batch_size_multiple, + ) + logger.info(f"multi_corpus_dataset, batch_by_size took {time.time() - start}s") + start = time.time() + new_batches = [] + + ### shuffle inner group size, split into speech/text batches + speech_batches, text_batches = [], [] + for name, batches in batches_dict.items(): + batches = inner_bucket_shuffle(batches, seed+epoch, num_shards*10) + batches = batches[: (len(batches) // num_shards) * num_shards] + if name.startswith("speech"): + speech_batches += batches + else: + text_batches += batches + if len(speech_batches) == 0: + logger.warning(f"Sample 0 speech batch, please ensure that no speech data loaded.") + if len(text_batches) == 0: + logger.warning(f"Sample 0 text batch, please ensure that no text data loaded.") + + ### shuffle groups + if len(speech_batches) == 0 or len(text_batches) == 0: + new_batches = speech_batches + text_batches + new_batches = shuffle_buckets(new_batches, seed=seed+epoch, inner_shuf=False) + else: + speech_batches = shuffle_buckets(speech_batches, seed=seed+epoch, inner_shuf=False) + text_batches = shuffle_buckets(text_batches, seed=seed+epoch, inner_shuf=False) + num_batch = min(len(speech_batches), len(text_batches)) + if split_modality_batch: + for i in range(0, num_batch, num_shards): + new_batches += speech_batches[i: i + num_shards] + new_batches += text_batches[i: i + num_shards] + else: + for i in range(num_batch): + new_batches.append(np.concatenate([speech_batches[i], 
text_batches[i]])) + + logger.info(f"multi_corpus_dataset sample {len(new_batches)} batches, took {time.time() - start}s") + return new_batches + + def inner_bucket_shuffle(batches, seed, bucket_size=10, thr=0): + """we assert batches is sorted form long to short. + shuffle samples in a buctet(e.g. 10 batches). + batches: a list of numpy array""" + num_batch = len(batches) + new_batches = [] + num_buckets = len(batches) // bucket_size + i = 0 + while i < num_batch: + if (i < bucket_size * thr or + i >= bucket_size * (num_buckets - thr) + ): + new_batches.append(batches[i]) + i += 1 + else: + group = np.concatenate(batches[i: i+bucket_size]) + with data_utils.numpy_seed(seed): + np.random.shuffle(group) + new_batches += np.array_split(group, bucket_size) + i += bucket_size + assert all([len(batch) > 0 for batch in new_batches]) + return new_batches + + def shuffle_buckets(batches, seed, inner_shuf=True): + if inner_shuf: + batches = inner_bucket_shuffle(batches, seed, num_shards*10) + batches = [batches[i: i + num_shards] for i in range(0, len(batches)-num_shards+1, num_shards)] + assert len(batches[-1]) == num_shards + new_batches = [] + with data_utils.numpy_seed(seed): + np.random.shuffle(batches) + for group in batches: + new_batches += group + return new_batches + + return batch_sampler diff --git a/YiTrans/yitrans_iwslt22/data/speech2c_dataset.py b/YiTrans/yitrans_iwslt22/data/speech2c_dataset.py new file mode 100644 index 0000000000000000000000000000000000000000..e75d75a96a1759c79a0a4ba1cc297e2eea4a1aaa --- /dev/null +++ b/YiTrans/yitrans_iwslt22/data/speech2c_dataset.py @@ -0,0 +1,222 @@ +# -------------------------------------------------------- +# The YiTrans End-to-End Speech Translation System for IWSLT 2022 Offline Shared Task (https://arxiv.org/abs/2206.05777) +# Github source: https://github.com/microsoft/SpeechT5/tree/main/YiTrans +# Copyright (c) 2022 Microsoft +# Licensed under The MIT License [see LICENSE for details] +# Based on fairseq code bases +# https://github.com/facebookresearch/fairseq +# -------------------------------------------------------- + +import itertools +import logging +import os +import sys +from typing import Any, List, Optional, Union + +import numpy as np + +import torch +import torch.nn.functional as F +from fairseq.data import data_utils, Dictionary +from fairseq.data.audio.hubert_dataset import HubertDataset + +logger = logging.getLogger(__name__) + + + +class Speech2cDataset(HubertDataset): + def __init__( + self, + manifest_path: str, + sample_rate: float, + label_paths: List[str], + label_rates: Union[List[float], float], # -1 for sequence labels + pad_list: List[str], + eos_list: List[str], + label_processors: Optional[List[Any]] = None, + max_keep_sample_size: Optional[int] = None, + min_keep_sample_size: Optional[int] = None, + max_sample_size: Optional[int] = None, + shuffle: bool = True, + pad_audio: bool = False, + normalize: bool = False, + store_labels: bool = True, + random_crop: bool = False, + single_target: bool = False, + tgt_dict: Optional[Dictionary] = None, + add_decoder: bool = False, + fine_tuning: bool = False, + tokenizer = None, + tgt_lang_idx: int = None, + mbart_style_lang_id: bool = False, + retry_times: int = 5, + reduce_label_for_dec: bool = True, + ): + super().__init__( + manifest_path, + sample_rate, + label_paths, + label_rates, + pad_list, + eos_list, + label_processors, + max_keep_sample_size, + min_keep_sample_size, + max_sample_size, + shuffle, + pad_audio, + normalize, + store_labels, + random_crop, + 
single_target + ) + self.tgt_dict = tgt_dict + self.add_decoder = add_decoder + self.fine_tuning = fine_tuning + self.tokenizer = tokenizer + self.tgt_lang_idx = tgt_lang_idx + self.mbart_style_lang_id = mbart_style_lang_id + self.retry_times = retry_times + self.reduce_label_for_dec = reduce_label_for_dec + logger.info( + f"tgt_lang_idx={self.tgt_lang_idx}, reduce_label_for_dec={reduce_label_for_dec}, " + f"mbart_style_lang_id={mbart_style_lang_id}" + ) + + self.sizes = np.array(self.sizes) + + def get_label(self, index, label_idx): + if self.store_labels: + label = self.label_list[label_idx][index] + else: + with open(self.label_paths[label_idx]) as f: + offset_s, offset_e = self.label_offsets_list[label_idx][index] + f.seek(offset_s) + label = f.read(offset_e - offset_s) + + if self.tokenizer is not None and self.fine_tuning: + label = self.tokenizer.encode(label) + + if self.label_processors is not None: + label = self.label_processors[label_idx](label) + return label + + def collater(self, samples): + # target = max(sizes) -> random_crop not used + # target = max_sample_size -> random_crop used for long + samples = [s for s in samples if s["source"] is not None] + if len(samples) == 0: + return {} + + audios = [s["source"] for s in samples] + audio_sizes = [len(s) for s in audios] + if self.pad_audio: + audio_size = min(max(audio_sizes), self.max_sample_size) + else: + audio_size = min(min(audio_sizes), self.max_sample_size) + collated_audios, padding_mask, audio_starts = self.collater_audio( + audios, audio_size + ) + + targets_by_label = [ + [s["label_list"][i] for s in samples] for i in range(self.num_labels) + ] + targets_list, lengths_list, ntokens_list = self.collater_label( + targets_by_label, audio_size, audio_starts + ) + + if self.add_decoder: + if self.fine_tuning: + decoder_label = [ + torch.cat((targets_list[0][i, :lengths_list[0][i]], torch.tensor([self.tgt_dict.eos()])), 0).long() + for i in range(targets_list[0].size(0)) + ] + else: + if self.tokenizer is not None: + decoder_label = [ + # Set 48 for translate int to char and avoid \n + torch.cat( + ( + torch.tensor( + self.tokenizer.sp.Encode( + "".join( + [chr(j + 48) for j in ( + targets_list[0][i, :lengths_list[0][i]].unique_consecutive() if self.reduce_label_for_dec else targets_list[0][i, :lengths_list[0][i]] + ).tolist()] + ), out_type=int + ) + ), + torch.tensor([self.tgt_dict.eos()]) + ), dim=0 + ).long() + for i in range(targets_list[0].size(0)) + ] + else: + decoder_label = [ + torch.cat((targets_list[0][i, :lengths_list[0][i]].unique_consecutive() if self.reduce_label_for_dec else targets_list[0][i, :lengths_list[0][i]], torch.tensor([self.tgt_dict.eos()])), 0).long() + for i in range(targets_list[0].size(0)) + ] + + if self.mbart_style_lang_id: + decoder_label = [ + torch.cat((decoder_label[i], torch.tensor([self.tgt_lang_idx])), 0).long() + for i in range(targets_list[0].size(0)) + ] + + dec_ntokens = sum(x.size(0) for x in decoder_label) + decoder_target = data_utils.collate_tokens( + decoder_label, + self.tgt_dict.pad(), + self.tgt_dict.eos() if not self.mbart_style_lang_id else self.tgt_lang_idx, + left_pad=False, + move_eos_to_beginning=False, + ) + decoder_target_lengths = torch.tensor( + [x.size(0) for x in decoder_label], dtype=torch.long + ) + prev_output_tokens = data_utils.collate_tokens( + decoder_label, + self.tgt_dict.pad(), + self.tgt_dict.eos() if not self.mbart_style_lang_id else self.tgt_lang_idx, + left_pad=False, + move_eos_to_beginning=True, + ) + + if self.tgt_lang_idx is not None and 
not self.mbart_style_lang_id: + assert (prev_output_tokens[:, 0] != self.tgt_dict.eos()).sum() == 0 + prev_output_tokens[:, 0] = self.tgt_lang_idx + + net_input = { + "source": collated_audios, + "padding_mask": padding_mask, + "prev_output_tokens": prev_output_tokens, + } + batch = { + "id": torch.LongTensor([s["id"] for s in samples]), + "net_input": net_input, + "decoder_target": decoder_target, + "decoder_target_lengths": decoder_target_lengths, + "dec_ntokens": dec_ntokens, + "lang_idx": self.tgt_lang_idx, + } + else: + net_input = {"source": collated_audios, "padding_mask": padding_mask} + batch = { + "id": torch.LongTensor([s["id"] for s in samples]), + "net_input": net_input, + } + + if self.single_target: + batch["target_lengths"] = lengths_list[0] + batch["ntokens"] = ntokens_list[0] + batch["target"] = targets_list[0] + else: + batch["target_lengths_list"] = lengths_list + batch["ntokens_list"] = ntokens_list + batch["target_list"] = targets_list + return batch + + # @property + # def sizes(self): + # return np.array(self.sizes) + diff --git a/YiTrans/yitrans_iwslt22/models/__init__.py b/YiTrans/yitrans_iwslt22/models/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 diff --git a/YiTrans/yitrans_iwslt22/models/_hubert_mt.py b/YiTrans/yitrans_iwslt22/models/_hubert_mt.py new file mode 100644 index 0000000000000000000000000000000000000000..d997eb21bdef44ab51ef02ba586b9058c87f371e --- /dev/null +++ b/YiTrans/yitrans_iwslt22/models/_hubert_mt.py @@ -0,0 +1,310 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +import logging +import contextlib +from argparse import Namespace +from typing import Any, Optional + +import torch +import torch.nn as nn +from dataclasses import dataclass, field +from fairseq import checkpoint_utils, tasks, utils +from fairseq.dataclass import FairseqDataclass +from fairseq.dataclass.utils import convert_namespace_to_omegaconf +from fairseq.models import BaseFairseqModel, FairseqEncoder, register_model +from fairseq.models.hubert.hubert import MASKING_DISTRIBUTION_CHOICES +from fairseq.tasks import FairseqTask +from omegaconf import II, MISSING + +from .hubert_asr import HubertAsrConfig +from fairseq.models.transformer import TransformerConfig +logger = logging.getLogger(__name__) + + +@dataclass +class HubertMTConfig(HubertAsrConfig): + load_pretrained_mbart_from: Optional[str] = field( + default=None, + metadata={ + "help": "model to take text encoder decoder weights from (for initialization)" + }, + ) + use_rel_pos_enc: bool = field( + default=True, + metadata={"help": "whether to use relative positional encoding"}, + ) + text_transformer_encoder_layers: int = field( + default=12, + metadata={"help": "reset text_transformer_encoder_layers"}, + ) + + +@register_model("hubert_mt", dataclass=HubertMTConfig) +class HubertMT(BaseFairseqModel): + def __init__(self, cfg: HubertMTConfig, w2v_encoder: BaseFairseqModel): + super().__init__() + self.cfg = cfg + self.w2v_encoder = w2v_encoder + + def upgrade_state_dict_named(self, state_dict, name): + super().upgrade_state_dict_named(state_dict, name) + return state_dict + + @classmethod + def build_model(cls, cfg: HubertMTConfig, task: FairseqTask): + """Build a new model instance.""" + w2v_encoder = HubertEncoder(cfg, task.target_dictionary) + return cls(cfg, w2v_encoder) + + def get_normalized_probs(self, 
net_output, log_probs, sample=None): + """Get normalized probabilities (or log probs) from a net's output.""" + if "decoder_out" in net_output: + return self.w2v_encoder.get_normalized_probs_decoder(net_output["decoder_out"], log_probs, sample) + + assert "encoder_out" not in net_output + if "encoder_out" not in net_output: + return self.w2v_encoder.get_normalized_probs_decoder(net_output, log_probs, sample) + + + def get_logits(self, net_output): + logits = net_output["encoder_out"] + padding = net_output["encoder_padding_mask"] + if padding is not None and padding.any(): + padding = padding.T + logits[padding][..., 0] = 0 + logits[padding][..., 1:] = float("-inf") + + return logits + + def forward(self, **kwargs): + x = self.w2v_encoder(**kwargs) + return x + + @property + def encoder(self): + return self.w2v_encoder + + def reorder_encoder_out(self, encoder_out, new_order): + return self.encoder.reorder_encoder_out(encoder_out, new_order) + + @property + def decoder(self): + return self.w2v_encoder.w2v_model.decoder + + +class HubertEncoder(FairseqEncoder): + def __init__(self, cfg: HubertMTConfig, tgt_dict=None): + self.apply_mask = cfg.apply_mask + + arg_overrides = { + "dropout": cfg.dropout, + "activation_dropout": cfg.activation_dropout, + "dropout_input": cfg.dropout_input, + "attention_dropout": cfg.attention_dropout, + "mask_length": cfg.mask_length, + "mask_prob": cfg.mask_prob, + "mask_selection": cfg.mask_selection, + "mask_other": cfg.mask_other, + "no_mask_overlap": cfg.no_mask_overlap, + "mask_channel_length": cfg.mask_channel_length, + "mask_channel_prob": cfg.mask_channel_prob, + "mask_channel_selection": cfg.mask_channel_selection, + "mask_channel_other": cfg.mask_channel_other, + "no_mask_channel_overlap": cfg.no_mask_channel_overlap, + "encoder_layerdrop": cfg.layerdrop, + "decoder_layerdrop": cfg.decoder_layerdrop, + "feature_grad_mult": cfg.feature_grad_mult, + "decoder_dict_size": -1, + "add_text_modality": True, + "add_text_encoder": True, + "load_pretrained_mbart_from": None, + "load_pretrained_w2v_from": None, + "text_transformer": { + "encoder":{ + "layers": cfg.text_transformer_encoder_layers, + "layerdrop": cfg.layerdrop, + }, + 'dropout': cfg.dropout, + 'attention_dropout': cfg.attention_dropout, + 'activation_dropout': cfg.activation_dropout, + } + } + if cfg.no_pretrained_weights: + arg_overrides["use_rel_pos_enc"] = cfg.use_rel_pos_enc + + if cfg.w2v_args is None: + state = checkpoint_utils.load_checkpoint_to_cpu( + cfg.w2v_path, arg_overrides + ) + w2v_args = state.get("cfg", None) + if w2v_args is None: + w2v_args = convert_namespace_to_omegaconf(state["args"]) + cfg.w2v_args = w2v_args + else: + state = None + w2v_args = cfg.w2v_args + if isinstance(w2v_args, Namespace): + cfg.w2v_args = w2v_args = convert_namespace_to_omegaconf(w2v_args) + + # logger.info("---------------------state.keys()-------------------------------------------") + # logger.info(state.keys()) + # logger.info("---------------------w2v_args.task-------------------------------------------") + # logger.info(w2v_args.task) + # logger.info("---------------------w2v_args.model-------------------------------------------") + # logger.info(w2v_args.model) + # logger.info("----------------------------------------------------------------") + + w2v_args.task.data = cfg.data + w2v_args.task.text_cfg.text_data = cfg.data + w2v_args.task.text_cfg.data_config = None + task = tasks.setup_task(w2v_args.task) + + if state is not None and "task_state" in state: + # This will load the stored 
"dictionaries" object + task.load_state_dict(state["task_state"]) + + model = task.build_model(w2v_args.model) + + + ### load mbart if specificed + if cfg.load_pretrained_mbart_from is not None and cfg.no_pretrained_weights: + logger.info("Loading mbart....") + mbart_model_state = model.load_checkpoint(cfg.load_pretrained_mbart_from) + model.text_encoder = model.load_pretrained_component_from_model( + component=model.text_encoder, state=mbart_model_state + ) + model.decoder = model.load_pretrained_component_from_model( + component=model.decoder, state=mbart_model_state + ) + + if state is not None and not cfg.no_pretrained_weights: + logger.info("Loading pre-trained models....") + model.load_state_dict(state["model"], strict=True) + + ### remove_pretraining_modules model.remove_pretraining_modules() + model.target_glu = None + model.final_proj = None + model.feature_extractor = None + model.post_extract_proj = None + model.encoder = None + + + + dropout_keys = [ n for n in w2v_args.model.text_transformer if n.find("drop") >= 0 ] + for key in dropout_keys: + logger.info(f"{key}: {w2v_args.model.text_transformer[key]}") + + super().__init__(task.source_dictionary) + + d = w2v_args.model.encoder_embed_dim + + self.w2v_model = model + + self.final_dropout = nn.Dropout(cfg.final_dropout) + self.freeze_finetune_updates = cfg.freeze_finetune_updates + self.freeze_decoder_updates = cfg.freeze_decoder_updates + self.num_updates = 0 + + + def set_num_updates(self, num_updates): + """Set the number of parameters updates.""" + super().set_num_updates(num_updates) + self.num_updates = num_updates + + def forward(self, src_tokens, src_lengths, prev_output_tokens, tbc=True, **kwargs): + + # ft = self.freeze_finetune_updates <= self.num_updates + w2v_args = { + "src_tokens": src_tokens, + "src_lengths": src_lengths, + "mask": self.apply_mask and self.training, + "prev_output_tokens": prev_output_tokens, + } + + results = self.w2v_model(**w2v_args) + return results + + def get_normalized_probs_decoder(self, net_output, log_probs, sample=None): + # net_output['encoder_out'] is a (B, T, D) tensor + return self.w2v_model.get_normalized_probs(net_output, log_probs, sample) + + def reorder_encoder_out(self, encoder_out, new_order): + if encoder_out["encoder_out"] is not None: + if isinstance(encoder_out["encoder_out"], list): + encoder_out["encoder_out"] = ( + [] if len(encoder_out["encoder_out"]) == 0 + else [x.index_select(1, new_order) for x in encoder_out["encoder_out"]] + ) + else: + encoder_out["encoder_out"] = encoder_out[ + "encoder_out" + ].index_select(1, new_order) + if encoder_out["encoder_padding_mask"] is not None: + if isinstance(encoder_out["encoder_padding_mask"], list): + encoder_out["encoder_padding_mask"] = ( + [] if len(encoder_out["encoder_padding_mask"]) == 0 + else [x.index_select(0, new_order) for x in encoder_out["encoder_padding_mask"]] + ) + else: + encoder_out["encoder_padding_mask"] = encoder_out[ + "encoder_padding_mask" + ].index_select(0, new_order) + if "decoder_out" in encoder_out and encoder_out["decoder_out"] is not None: + if isinstance(encoder_out["decoder_out"], list): + encoder_out["decoder_out"] = ( + [] if len(encoder_out["decoder_out"]) == 0 + else [x.index_select(0, new_order) for x in encoder_out["decoder_out"]] + ) + else: + encoder_out["decoder_out"] = encoder_out[ + "decoder_out" + ].index_select(0, new_order) + if "encoder_out_for_ctc" in encoder_out and encoder_out["encoder_out_for_ctc"] is not None: + if isinstance(encoder_out["encoder_out_for_ctc"], list): + 
encoder_out["encoder_out_for_ctc"] = ( + [] if len(encoder_out["encoder_out_for_ctc"]) == 0 + else [x.index_select(1, new_order) for x in encoder_out["encoder_out_for_ctc"]] + ) + else: + encoder_out["encoder_out_for_ctc"] = encoder_out[ + "encoder_out_for_ctc" + ].index_select(1, new_order) + + return encoder_out + + def forward_torchscript(self, net_input): + """A TorchScript-compatible version of forward. + + Encoders which use additional arguments may want to override + this method for TorchScript compatibility. + """ + encoder_out = self.w2v_model.forward_torchscript(net_input) + if "encoder_out_for_ctc" in encoder_out: + del encoder_out['encoder_out_for_ctc'] + + return encoder_out + + def max_positions(self): + """Maximum input length supported by the encoder.""" + return None + + def upgrade_state_dict_named(self, state_dict, name): + return state_dict + + +def Embedding(num_embeddings, embedding_dim, padding_idx): + m = nn.Embedding(num_embeddings, embedding_dim, padding_idx=padding_idx) + nn.init.normal_(m.weight, mean=0, std=embedding_dim ** -0.5) + nn.init.constant_(m.weight[padding_idx], 0) + return m + + +def Linear(in_features, out_features, bias=True): + m = nn.Linear(in_features, out_features, bias) + nn.init.xavier_uniform_(m.weight) + if bias: + nn.init.constant_(m.bias, 0.0) + return m diff --git a/YiTrans/yitrans_iwslt22/models/finetune_asr.py b/YiTrans/yitrans_iwslt22/models/finetune_asr.py new file mode 100644 index 0000000000000000000000000000000000000000..a89b9676788c0d48b1f400b1ce4c3d3636ce008c --- /dev/null +++ b/YiTrans/yitrans_iwslt22/models/finetune_asr.py @@ -0,0 +1,460 @@ +# -------------------------------------------------------- +# The YiTrans End-to-End Speech Translation System for IWSLT 2022 Offline Shared Task (https://arxiv.org/abs/2206.05777) +# Github source: https://github.com/microsoft/SpeechT5/tree/main/YiTrans +# Copyright (c) 2022 Microsoft +# Licensed under The MIT License [see LICENSE for details] +# Based on fairseq code bases +# https://github.com/facebookresearch/fairseq +# -------------------------------------------------------- + +import logging +import contextlib +from argparse import Namespace +from typing import Any, Optional + +import torch +import torch.nn as nn +import pickle +from dataclasses import dataclass, field +from fairseq import checkpoint_utils, tasks, utils +from fairseq.dataclass import FairseqDataclass +from fairseq.dataclass.utils import convert_namespace_to_omegaconf +from fairseq.models import BaseFairseqModel, FairseqEncoder, register_model +from fairseq.models.hubert.hubert_asr import HubertCtcConfig +from fairseq.tasks import FairseqTask +from omegaconf import II, MISSING + +from yitrans_iwslt22.modules import MultimodalTransformerDecoder + +logger = logging.getLogger(__name__) + +@dataclass +class HubertAsrConfig(HubertCtcConfig): + # for decoder + decoder_layerdrop: float = field( + default=0.1, + metadata={"help": "probability of dropping a decoder layer in hubert"}, + ) + add_decoder: bool = field( + default=False, + metadata={"help": "whether to add decoder for CE Loss on code"}, + ) + reuse_text_emb: bool = field( + default=False, + metadata={"help": "reuse text token embeddings instead of initialize randomly"}, + ) + freeze_decoder_updates: int = field( + default=0, + metadata={"help": "dont finetune hubert for this many updates"}, + ) + share_decoder_input_output_embed: bool = field( + default=False, + metadata={"help": "share decoder input and output embeddings"}, + ) + share_enc_dec_embeddings: bool = 
field( + default=False, + metadata={"help": "share embeddings of (text encoder, text decoder)"}, + ) + share_s2t_t2t_embeddings: bool = field( + default=False, + metadata={"help": "share embeddings of (speech2text(code), text2text)"}, + ) + share_ctc_decoder_embed: bool = field( + default=False, + metadata={"help": "share ctc and decoder embedding (only when share_decoder_input_output_embed is true)"}, + ) + enc_grad_mult: float = field( + default=1.0, + metadata={"help": "reset feature grad mult in hubert to this (only for st2t)"}, + ) + retain_dict_path: Optional[str] = field( + default=None, + metadata={"help": "delete embeddings according to this path"}, + ) + load_step2_model_from: Optional[str] = field( + default=None, + metadata={ + "help": "load step2 model from" + }, + ) + load_pretrained_mbart_from: Optional[str] = field( + default=None, + metadata={ + "help": "model to take text encoder decoder weights from (for initialization)" + }, + ) + load_pretrained_w2v_from: Optional[str] = field( + default=None, + metadata={ + "help": "model to take speech encoder weights from (for initialization)" + }, + ) + use_rel_pos_enc: bool = field( + default=True, + metadata={"help": "whether to use relative positional encoding"}, + ) + encoder_layers: int = field( + default=12, + metadata={"help": "encoder_layers"}, + ) + add_text_encoder: bool = field( + default=True, + metadata={"help": "add_text_encoder"}, + ) + add_adaptor: bool = field( + default=True, + metadata={"help": "add_adaptor"}, + ) + adaptor_stride: int = field( + default=2, + metadata={"help": "adaptor stride"}, + ) + + +@register_model("yitrans_asr", dataclass=HubertAsrConfig) +class YitransASR(BaseFairseqModel): + def __init__(self, cfg: HubertAsrConfig, w2v_encoder: BaseFairseqModel): + super().__init__() + self.cfg = cfg + self.w2v_encoder = w2v_encoder + + ### in case we need load hubert_step2 model + if cfg.load_step2_model_from: + logger.info(f"Loading hubert_step2 pretrained model for finetuning: {cfg.load_step2_model_from}") + hubert_step2_states = self.w2v_encoder.w2v_model.load_checkpoint(cfg.load_step2_model_from)["model"] + if cfg.retain_dict_path is not None: + assert self.w2v_encoder.w2v_model.add_text_modality, "Mustc have text modality if retain dict path" + logger.info("Cut embedding to a smaller size according to retain dict") + with open(cfg.retain_dict_path, "rb") as fp: + overlap_idxs = pickle.load(fp) + hubert_step2_states['w2v_encoder.w2v_model.decoder.output_projection.0.weight'] = hubert_step2_states['w2v_encoder.w2v_model.decoder.output_projection.0.weight'][overlap_idxs] + hubert_step2_states["w2v_encoder.w2v_model.decoder.embed_tokens_list.0.weight"] = hubert_step2_states["w2v_encoder.w2v_model.decoder.embed_tokens_list.0.weight"][overlap_idxs] + hubert_step2_states["w2v_encoder.proj.weight"] = hubert_step2_states["w2v_encoder.proj.weight"][overlap_idxs] + try: + self.load_state_dict(hubert_step2_states, strict=True) + except Exception as e: + logger.warn(e) + self.load_state_dict(hubert_step2_states, strict=False) + + def upgrade_state_dict_named(self, state_dict, name): + super().upgrade_state_dict_named(state_dict, name) + return state_dict + + @classmethod + def build_model(cls, cfg: HubertAsrConfig, task: FairseqTask): + """Build a new model instance.""" + w2v_encoder = HubertEncoder(cfg, task.target_dictionary) + return cls(cfg, w2v_encoder) + + def get_normalized_probs(self, net_output, log_probs, sample=None): + """Get normalized probabilities (or log probs) from a net's output.""" + if 
"encoder_out" not in net_output: + return self.w2v_encoder.get_normalized_probs_decoder(net_output, log_probs, sample) + + if "encoder_out_for_ctc" in net_output: + logits = net_output["encoder_out_for_ctc"] + else: + logits = net_output["encoder_out"] + + if isinstance(logits, list): + logits = logits[0] + + if log_probs: + return utils.log_softmax(logits.float(), dim=-1) + else: + return utils.softmax(logits.float(), dim=-1) + + def get_logits(self, net_output): + logits = net_output["encoder_out"] + padding = net_output["encoder_padding_mask"] + if padding is not None and padding.any(): + padding = padding.T + logits[padding][..., 0] = 0 + logits[padding][..., 1:] = float("-inf") + + return logits + + def forward(self, **kwargs): + x = self.w2v_encoder(**kwargs) + return x + + @property + def encoder(self): + return self.w2v_encoder + + def reorder_encoder_out(self, encoder_out, new_order): + return self.encoder.reorder_encoder_out(encoder_out, new_order) + + @property + def decoder(self): + return self.w2v_encoder.w2v_model.decoder + + +class HubertEncoder(FairseqEncoder): + def __init__(self, cfg: HubertAsrConfig, tgt_dict=None): + self.apply_mask = cfg.apply_mask + logger.info(f"self.apply_mask: {self.apply_mask}") + + arg_overrides = { + "dropout": cfg.dropout, + "activation_dropout": cfg.activation_dropout, + "dropout_input": cfg.dropout_input, + "attention_dropout": cfg.attention_dropout, + "mask_length": cfg.mask_length, + "mask_prob": cfg.mask_prob, + "mask_selection": cfg.mask_selection, + "mask_other": cfg.mask_other, + "no_mask_overlap": cfg.no_mask_overlap, + "mask_channel_length": cfg.mask_channel_length, + "mask_channel_prob": cfg.mask_channel_prob, + "mask_channel_selection": cfg.mask_channel_selection, + "mask_channel_other": cfg.mask_channel_other, + "no_mask_channel_overlap": cfg.no_mask_channel_overlap, + "encoder_layerdrop": cfg.layerdrop, + "decoder_layerdrop": cfg.decoder_layerdrop, + "feature_grad_mult": cfg.feature_grad_mult, + "decoder_dict_size": len(tgt_dict) if cfg.add_decoder else -1, + "share_decoder_input_output_embed": cfg.share_decoder_input_output_embed, + "load_pretrained_w2v_from": cfg.load_pretrained_w2v_from, + "load_pretrained_mbart_from": cfg.load_pretrained_mbart_from, + "adaptor_stride": cfg.adaptor_stride, + } + + if cfg.no_pretrained_weights: + arg_overrides["use_rel_pos_enc"] = cfg.use_rel_pos_enc + arg_overrides["encoder_layers"] = cfg.encoder_layers + arg_overrides["add_text_encoder"] = cfg.add_text_encoder + arg_overrides["share_enc_dec_embeddings"] = cfg.share_enc_dec_embeddings + arg_overrides["share_s2t_t2t_embeddings"] = cfg.share_s2t_t2t_embeddings + arg_overrides["add_adaptor"] = cfg.add_adaptor + + if cfg.w2v_args is None: + state = checkpoint_utils.load_checkpoint_to_cpu(cfg.w2v_path, arg_overrides) + w2v_args = state.get("cfg", None) + if w2v_args is None: + w2v_args = convert_namespace_to_omegaconf(state["args"]) + cfg.w2v_args = w2v_args + else: + state = None + w2v_args = cfg.w2v_args + if isinstance(w2v_args, Namespace): + cfg.w2v_args = w2v_args = convert_namespace_to_omegaconf(w2v_args) + + ## in speech_text_joint_to_text, data is loaded by soundfile, which returns without normalization + if cfg.normalize != w2v_args.task.normalize: + logger.warn( + "Fine-tuning works best when data normalization is the same. 
" + "Please check that --normalize is set or unset for " + "both pre-training and here" + ) + + w2v_args.task.data = cfg.data + if hasattr(w2v_args.task, "text_cfg"): + w2v_args.task.text_cfg.data_config = None + w2v_args.task.add_decoder = cfg.add_decoder + task = tasks.setup_task(w2v_args.task) + if state is not None and "task_state" in state: + # This will load the stored "dictionaries" object + task.load_state_dict(state["task_state"]) + model = task.build_model(w2v_args.model) + + ### delete the embed_tokens and output_projection of decoder + if state is not None and not cfg.no_pretrained_weights: + if cfg.retain_dict_path is not None: + assert model.add_text_modality, "Mustc have text modality if retain dict path" + logger.info("Cut embedding to a smaller size according to ratin dict") + with open(cfg.retain_dict_path, "rb") as fp: + overlap_idxs = pickle.load(fp) + state['model']['decoder.output_projection.1.weight'] = state['model']['decoder.output_projection.1.weight'][overlap_idxs] + state["model"]["decoder.embed_tokens_list.1.weight"] = state["model"]["decoder.embed_tokens_list.1.weight"][overlap_idxs] + if cfg.reuse_text_emb: + assert model.add_text_modality, "Mustc have text modality if reuse text embed" + logger.info("Loading text-text pretrained token-embedding for speech-text finetuning...") + state["model"]["decoder.embed_tokens_list.0.weight"] = state["model"]["decoder.embed_tokens_list.1.weight"] + del state["model"]["decoder.embed_tokens_list.1.weight"] + state["model"]["decoder.output_projection.0.weight"] = state["model"]["decoder.output_projection.1.weight"] + del state["model"]["decoder.output_projection.1.weight"] + try: + model.load_state_dict(state["model"], strict=True) + except Exception as e: + logger.warn(e) + model.load_state_dict(state["model"], strict=False) + else: + for pname in list(state["model"].keys()): + if pname.startswith("decoder.embed_tokens") or pname.startswith("decoder.output_projection"): + del state["model"][pname] + # set strict=False because we omit some modules + model.load_state_dict(state["model"], strict=False) + + ### in case we need load mbart embedding into asr embedding + if cfg.no_pretrained_weights and cfg.load_pretrained_mbart_from and cfg.reuse_text_emb: + logger.info("Loading mbart pretrained token-embedding for speech-text finetuning...") + mbart_dec_states = model.decoder.state_dict() + loading_states = {} + if cfg.retain_dict_path is not None: + logger.info("Cut embedding to a smaller size according to ratin dict") + with open(cfg.retain_dict_path, "rb") as fp: + overlap_idxs = pickle.load(fp) + loading_states["output_projection.0.weight"] = mbart_dec_states['output_projection.1.weight'][overlap_idxs] + loading_states["embed_tokens_list.0.weight"] = mbart_dec_states['embed_tokens_list.1.weight'][overlap_idxs] + else: + loading_states["output_projection.0.weight"] = mbart_dec_states['output_projection.1.weight'] + loading_states["embed_tokens_list.0.weight"] = mbart_dec_states['embed_tokens_list.1.weight'] + model.decoder.load_state_dict(loading_states, strict=False) + + model.remove_pretraining_modules() + + super().__init__(task.source_dictionary) + + d = w2v_args.model.encoder_embed_dim + + self.w2v_model = model + + self.final_dropout = nn.Dropout(cfg.final_dropout) + self.freeze_finetune_updates = cfg.freeze_finetune_updates + self.freeze_decoder_updates = cfg.freeze_decoder_updates + self.num_updates = 0 + + if cfg.share_ctc_decoder_embed: + assert cfg.add_decoder and cfg.share_decoder_input_output_embed, "Must share 
decoder input and output embed before share ctc and decoder embed" + if isinstance(self.w2v_model.decoder, MultimodalTransformerDecoder): + self.proj = nn.Linear( + self.w2v_model.decoder.embed_tokens_list[0].weight.shape[1], + self.w2v_model.decoder.embed_tokens_list[0].weight.shape[0], + bias=False, + ) + self.proj.weight = self.w2v_model.decoder.embed_tokens_list[0].weight + else: + self.proj = nn.Linear( + self.w2v_model.decoder.embed_tokens.weight.shape[1], + self.w2v_model.decoder.embed_tokens.weight.shape[0], + bias=False, + ) + self.proj.weight = self.w2v_model.decoder.embed_tokens.weight + elif tgt_dict is not None: + self.proj = Linear(d, len(tgt_dict)) + elif getattr(cfg, "decoder_embed_dim", d) != d: + self.proj = Linear(d, cfg.decoder_embed_dim) + else: + self.proj = None + + def set_num_updates(self, num_updates): + """Set the number of parameters updates.""" + super().set_num_updates(num_updates) + self.num_updates = num_updates + + def forward(self, source, padding_mask, prev_output_tokens=None, tbc=True, **kwargs): + + ft = self.freeze_finetune_updates <= self.num_updates + w2v_args = { + "source": source, + "padding_mask": padding_mask, + "mask": self.apply_mask and self.training, + "prev_output_tokens": prev_output_tokens, + "ft": ft, + } + + if self.freeze_decoder_updates <= self.num_updates: + self.w2v_model.add_decoder = True + else: + self.w2v_model.add_decoder = False + + x, padding_mask, decoder_out = self.w2v_model.extract_features(**w2v_args) + + if tbc: + # B x T x C -> T x B x C + x = x.transpose(0, 1) + + x = self.final_dropout(x) + + if self.proj: + x = self.proj(x) + + return { + "encoder_out": x, # T x B x C + "encoder_padding_mask": padding_mask, # B x T + "padding_mask": padding_mask, + "decoder_out": decoder_out, + } + + def get_normalized_probs_decoder(self, net_output, log_probs, sample=None): + # net_output['encoder_out'] is a (B, T, D) tensor + return self.w2v_model.get_normalized_probs(net_output, log_probs, sample) + + def reorder_encoder_out(self, encoder_out, new_order): + if encoder_out["encoder_out"] is not None: + if isinstance(encoder_out["encoder_out"], list): + encoder_out["encoder_out"] = ( + [] if len(encoder_out["encoder_out"]) == 0 + else [x.index_select(1, new_order) for x in encoder_out["encoder_out"]] + ) + else: + encoder_out["encoder_out"] = encoder_out[ + "encoder_out" + ].index_select(1, new_order) + if encoder_out["encoder_padding_mask"] is not None: + if isinstance(encoder_out["encoder_padding_mask"], list): + encoder_out["encoder_padding_mask"] = ( + [] if len(encoder_out["encoder_padding_mask"]) == 0 + else [x.index_select(0, new_order) for x in encoder_out["encoder_padding_mask"]] + ) + else: + encoder_out["encoder_padding_mask"] = encoder_out[ + "encoder_padding_mask" + ].index_select(0, new_order) + if "decoder_out" in encoder_out and encoder_out["decoder_out"] is not None: + if isinstance(encoder_out["decoder_out"], list): + encoder_out["decoder_out"] = ( + [] if len(encoder_out["decoder_out"]) == 0 + else [x.index_select(0, new_order) for x in encoder_out["decoder_out"]] + ) + else: + encoder_out["decoder_out"] = encoder_out[ + "decoder_out" + ].index_select(0, new_order) + if "encoder_out_for_ctc" in encoder_out and encoder_out["encoder_out_for_ctc"] is not None: + if isinstance(encoder_out["encoder_out_for_ctc"], list): + encoder_out["encoder_out_for_ctc"] = ( + [] if len(encoder_out["encoder_out_for_ctc"]) == 0 + else [x.index_select(1, new_order) for x in encoder_out["encoder_out_for_ctc"]] + ) + else: + 
encoder_out["encoder_out_for_ctc"] = encoder_out[ + "encoder_out_for_ctc" + ].index_select(1, new_order) + + return encoder_out + + def forward_torchscript(self, net_input): + """A TorchScript-compatible version of forward. + Encoders which use additional arguments may want to override + this method for TorchScript compatibility. + """ + encoder_out = self.w2v_model.forward_torchscript(net_input) + + assert self.proj is not None + encoder_out['encoder_out_for_ctc'] = [self.proj(encoder_out['encoder_out'][0])] + + return encoder_out + + def max_positions(self): + """Maximum input length supported by the encoder.""" + return None + + def upgrade_state_dict_named(self, state_dict, name): + return state_dict + + +def Embedding(num_embeddings, embedding_dim, padding_idx): + m = nn.Embedding(num_embeddings, embedding_dim, padding_idx=padding_idx) + nn.init.normal_(m.weight, mean=0, std=embedding_dim ** -0.5) + nn.init.constant_(m.weight[padding_idx], 0) + return m + + +def Linear(in_features, out_features, bias=True): + m = nn.Linear(in_features, out_features, bias) + nn.init.xavier_uniform_(m.weight) + if bias: + nn.init.constant_(m.bias, 0.0) + return m diff --git a/YiTrans/yitrans_iwslt22/models/finetune_mt.py b/YiTrans/yitrans_iwslt22/models/finetune_mt.py new file mode 100644 index 0000000000000000000000000000000000000000..78259ee131b79077e5bf4f1df29646ed72def240 --- /dev/null +++ b/YiTrans/yitrans_iwslt22/models/finetune_mt.py @@ -0,0 +1,355 @@ +# -------------------------------------------------------- +# The YiTrans End-to-End Speech Translation System for IWSLT 2022 Offline Shared Task (https://arxiv.org/abs/2206.05777) +# Github source: https://github.com/microsoft/SpeechT5/tree/main/YiTrans +# Copyright (c) 2022 Microsoft +# Licensed under The MIT License [see LICENSE for details] +# Based on fairseq code bases +# https://github.com/facebookresearch/fairseq +# -------------------------------------------------------- + +import logging +import contextlib +from argparse import Namespace +from typing import Any, Optional + +import torch +import torch.nn as nn +from dataclasses import dataclass, field +from fairseq import checkpoint_utils, tasks, utils +from fairseq.dataclass.utils import convert_namespace_to_omegaconf +from fairseq.models import BaseFairseqModel, FairseqEncoder, register_model +from fairseq.models.hubert.hubert import MASKING_DISTRIBUTION_CHOICES +from fairseq.tasks import FairseqTask +from omegaconf import II, MISSING + +from fairseq.models.hubert.hubert_asr import HubertCtcConfig +from fairseq.models.transformer import TransformerConfig +logger = logging.getLogger(__name__) + + +@dataclass +class HubertMTConfig(HubertCtcConfig): + use_rel_pos_enc: bool = field( + default=True, + metadata={"help": "whether to use relative positional encoding"}, + ) + # for decoder + decoder_layerdrop: float = field( + default=0.1, + metadata={"help": "probability of dropping a decoder layer in hubert"}, + ) + add_decoder: bool = field( + default=False, + metadata={"help": "whether to add decoder for CE Loss on code"}, + ) + reuse_text_emb: bool = field( + default=False, + metadata={"help": "reuse text token embeddings instead of initialize randomly"}, + ) + freeze_decoder_updates: int = field( + default=0, + metadata={"help": "dont finetune hubert for this many updates"}, + ) + share_decoder_input_output_embed: bool = field( + default=False, + metadata={"help": "share decoder input and output embeddings"}, + ) + share_enc_dec_embeddings: bool = field( + default=False, + metadata={"help": 
"share embeddings of (text encoder, text decoder)"}, + ) + share_s2t_t2t_embeddings: bool = field( + default=False, + metadata={"help": "share embeddings of (speech2text(code), text2text)"}, + ) + share_ctc_decoder_embed: bool = field( + default=False, + metadata={"help": "share ctc and decoder embedding (only when share_decoder_input_output_embed is true)"}, + ) + enc_grad_mult: float = field( + default=1.0, + metadata={"help": "reset feature grad mult in hubert to this (only for st2t)"}, + ) + retain_dict_path: Optional[str] = field( + default=None, + metadata={"help": "delete embeddings according to this path"}, + ) + load_pretrained_mbart_from: Optional[str] = field( + default=None, + metadata={ + "help": "model to take text encoder decoder weights from (for initialization)" + }, + ) + text_transformer_encoder_layers: int = field( + default=12, + metadata={"help": "reset text_transformer_encoder_layers"}, + ) + +@register_model("finetune_mt", dataclass=HubertMTConfig) +class YitransMT(BaseFairseqModel): + def __init__(self, cfg: HubertMTConfig, w2v_encoder: BaseFairseqModel): + super().__init__() + self.cfg = cfg + self.w2v_encoder = w2v_encoder + + def upgrade_state_dict_named(self, state_dict, name): + super().upgrade_state_dict_named(state_dict, name) + return state_dict + + @classmethod + def build_model(cls, cfg: HubertMTConfig, task: FairseqTask): + """Build a new model instance.""" + w2v_encoder = HubertEncoder(cfg, task.target_dictionary) + return cls(cfg, w2v_encoder) + + def get_normalized_probs(self, net_output, log_probs, sample=None): + """Get normalized probabilities (or log probs) from a net's output.""" + if "decoder_out" in net_output: + return self.w2v_encoder.get_normalized_probs_decoder(net_output["decoder_out"], log_probs, sample) + + assert "encoder_out" not in net_output + if "encoder_out" not in net_output: + return self.w2v_encoder.get_normalized_probs_decoder(net_output, log_probs, sample) + + + def get_logits(self, net_output): + logits = net_output["encoder_out"] + padding = net_output["encoder_padding_mask"] + if padding is not None and padding.any(): + padding = padding.T + logits[padding][..., 0] = 0 + logits[padding][..., 1:] = float("-inf") + + return logits + + def forward(self, **kwargs): + x = self.w2v_encoder(**kwargs) + return x + + @property + def encoder(self): + return self.w2v_encoder + + def reorder_encoder_out(self, encoder_out, new_order): + return self.encoder.reorder_encoder_out(encoder_out, new_order) + + @property + def decoder(self): + return self.w2v_encoder.w2v_model.decoder + + +class HubertEncoder(FairseqEncoder): + def __init__(self, cfg: HubertMTConfig, tgt_dict=None): + self.apply_mask = cfg.apply_mask + + arg_overrides = { + "dropout": cfg.dropout, + "activation_dropout": cfg.activation_dropout, + "dropout_input": cfg.dropout_input, + "attention_dropout": cfg.attention_dropout, + "mask_length": cfg.mask_length, + "mask_prob": cfg.mask_prob, + "mask_selection": cfg.mask_selection, + "mask_other": cfg.mask_other, + "no_mask_overlap": cfg.no_mask_overlap, + "mask_channel_length": cfg.mask_channel_length, + "mask_channel_prob": cfg.mask_channel_prob, + "mask_channel_selection": cfg.mask_channel_selection, + "mask_channel_other": cfg.mask_channel_other, + "no_mask_channel_overlap": cfg.no_mask_channel_overlap, + "encoder_layerdrop": cfg.layerdrop, + "decoder_layerdrop": cfg.decoder_layerdrop, + "feature_grad_mult": cfg.feature_grad_mult, + "decoder_dict_size": -1, + "add_text_modality": True, + "add_text_encoder": True, + 
"load_pretrained_mbart_from": None, + "load_pretrained_w2v_from": None, + "text_transformer": { + "encoder":{ + "layers": cfg.text_transformer_encoder_layers, + "layerdrop": cfg.layerdrop, + }, + 'dropout': cfg.dropout, + 'attention_dropout': cfg.attention_dropout, + 'activation_dropout': cfg.activation_dropout, + } + } + if cfg.no_pretrained_weights: + arg_overrides["use_rel_pos_enc"] = cfg.use_rel_pos_enc + arg_overrides["share_enc_dec_embeddings"] = cfg.share_enc_dec_embeddings + arg_overrides["share_s2t_t2t_embeddings"] = cfg.share_s2t_t2t_embeddings + + if cfg.w2v_args is None: + state = checkpoint_utils.load_checkpoint_to_cpu( + cfg.w2v_path, arg_overrides + ) + w2v_args = state.get("cfg", None) + if w2v_args is None: + w2v_args = convert_namespace_to_omegaconf(state["args"]) + cfg.w2v_args = w2v_args + else: + state = None + w2v_args = cfg.w2v_args + if isinstance(w2v_args, Namespace): + cfg.w2v_args = w2v_args = convert_namespace_to_omegaconf(w2v_args) + + # logger.info("---------------------state.keys()-------------------------------------------") + # logger.info(state.keys()) + # logger.info("---------------------w2v_args.task-------------------------------------------") + # logger.info(w2v_args.task) + # logger.info("---------------------w2v_args.model-------------------------------------------") + # logger.info(w2v_args.model) + # logger.info("----------------------------------------------------------------") + + w2v_args.task.data = cfg.data + w2v_args.task.text_cfg.text_data = cfg.data + w2v_args.task.text_cfg.data_config = None + task = tasks.setup_task(w2v_args.task) + + if state is not None and "task_state" in state: + # This will load the stored "dictionaries" object + task.load_state_dict(state["task_state"]) + + model = task.build_model(w2v_args.model) + + + ### load mbart if specificed + if cfg.load_pretrained_mbart_from is not None and cfg.no_pretrained_weights: + logger.info("Loading mbart....") + mbart_model_state = model.load_checkpoint(cfg.load_pretrained_mbart_from) + model.text_encoder = model.load_pretrained_component_from_model( + component=model.text_encoder, state=mbart_model_state + ) + model.decoder = model.load_pretrained_component_from_model( + component=model.decoder, state=mbart_model_state + ) + + if state is not None and not cfg.no_pretrained_weights: + logger.info("Loading pre-trained models....") + model.load_state_dict(state["model"], strict=True) + + ### remove_pretraining_modules model.remove_pretraining_modules() + model.target_glu = None + model.final_proj = None + model.feature_extractor = None + model.post_extract_proj = None + model.encoder = None + + + + dropout_keys = [ n for n in w2v_args.model.text_transformer if n.find("drop") >= 0 ] + for key in dropout_keys: + logger.info(f"{key}: {w2v_args.model.text_transformer[key]}") + + super().__init__(task.source_dictionary) + + d = w2v_args.model.encoder_embed_dim + + self.w2v_model = model + + self.final_dropout = nn.Dropout(cfg.final_dropout) + self.freeze_finetune_updates = cfg.freeze_finetune_updates + self.freeze_decoder_updates = cfg.freeze_decoder_updates + self.num_updates = 0 + + + def set_num_updates(self, num_updates): + """Set the number of parameters updates.""" + super().set_num_updates(num_updates) + self.num_updates = num_updates + + def forward(self, src_tokens, src_lengths, prev_output_tokens, tbc=True, **kwargs): + + # ft = self.freeze_finetune_updates <= self.num_updates + w2v_args = { + "src_tokens": src_tokens, + "src_lengths": src_lengths, + "mask": self.apply_mask and 
self.training, + "prev_output_tokens": prev_output_tokens, + } + + results = self.w2v_model(**w2v_args) + return results + + def get_normalized_probs_decoder(self, net_output, log_probs, sample=None): + # net_output['encoder_out'] is a (B, T, D) tensor + return self.w2v_model.get_normalized_probs(net_output, log_probs, sample) + + def reorder_encoder_out(self, encoder_out, new_order): + if encoder_out["encoder_out"] is not None: + if isinstance(encoder_out["encoder_out"], list): + encoder_out["encoder_out"] = ( + [] if len(encoder_out["encoder_out"]) == 0 + else [x.index_select(1, new_order) for x in encoder_out["encoder_out"]] + ) + else: + encoder_out["encoder_out"] = encoder_out[ + "encoder_out" + ].index_select(1, new_order) + if encoder_out["encoder_padding_mask"] is not None: + if isinstance(encoder_out["encoder_padding_mask"], list): + encoder_out["encoder_padding_mask"] = ( + [] if len(encoder_out["encoder_padding_mask"]) == 0 + else [x.index_select(0, new_order) for x in encoder_out["encoder_padding_mask"]] + ) + else: + encoder_out["encoder_padding_mask"] = encoder_out[ + "encoder_padding_mask" + ].index_select(0, new_order) + if "decoder_out" in encoder_out and encoder_out["decoder_out"] is not None: + if isinstance(encoder_out["decoder_out"], list): + encoder_out["decoder_out"] = ( + [] if len(encoder_out["decoder_out"]) == 0 + else [x.index_select(0, new_order) for x in encoder_out["decoder_out"]] + ) + else: + encoder_out["decoder_out"] = encoder_out[ + "decoder_out" + ].index_select(0, new_order) + if "encoder_out_for_ctc" in encoder_out and encoder_out["encoder_out_for_ctc"] is not None: + if isinstance(encoder_out["encoder_out_for_ctc"], list): + encoder_out["encoder_out_for_ctc"] = ( + [] if len(encoder_out["encoder_out_for_ctc"]) == 0 + else [x.index_select(1, new_order) for x in encoder_out["encoder_out_for_ctc"]] + ) + else: + encoder_out["encoder_out_for_ctc"] = encoder_out[ + "encoder_out_for_ctc" + ].index_select(1, new_order) + + return encoder_out + + def forward_torchscript(self, net_input): + """A TorchScript-compatible version of forward. + + Encoders which use additional arguments may want to override + this method for TorchScript compatibility. 
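+ Here the "encoder_out_for_ctc" entry, if present, is dropped from the returned dictionary.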
+ """ + encoder_out = self.w2v_model.forward_torchscript(net_input) + if "encoder_out_for_ctc" in encoder_out: + del encoder_out['encoder_out_for_ctc'] + + return encoder_out + + def max_positions(self): + """Maximum input length supported by the encoder.""" + return None + + def upgrade_state_dict_named(self, state_dict, name): + return state_dict + + +def Embedding(num_embeddings, embedding_dim, padding_idx): + m = nn.Embedding(num_embeddings, embedding_dim, padding_idx=padding_idx) + nn.init.normal_(m.weight, mean=0, std=embedding_dim ** -0.5) + nn.init.constant_(m.weight[padding_idx], 0) + return m + + +def Linear(in_features, out_features, bias=True): + m = nn.Linear(in_features, out_features, bias) + nn.init.xavier_uniform_(m.weight) + if bias: + nn.init.constant_(m.bias, 0.0) + return m diff --git a/YiTrans/yitrans_iwslt22/models/finetune_st.py b/YiTrans/yitrans_iwslt22/models/finetune_st.py new file mode 100644 index 0000000000000000000000000000000000000000..37e75bee1f4f669333cd275ad715466571be9c69 --- /dev/null +++ b/YiTrans/yitrans_iwslt22/models/finetune_st.py @@ -0,0 +1,434 @@ +# -------------------------------------------------------- +# The YiTrans End-to-End Speech Translation System for IWSLT 2022 Offline Shared Task (https://arxiv.org/abs/2206.05777) +# Github source: https://github.com/microsoft/SpeechT5/tree/main/YiTrans +# Copyright (c) 2022 Microsoft +# Licensed under The MIT License [see LICENSE for details] +# Based on fairseq code bases +# https://github.com/facebookresearch/fairseq +# -------------------------------------------------------- + +import logging +import contextlib +import pickle +from argparse import Namespace +from typing import Any, Optional + +import torch +import torch.nn as nn +import torch.nn.functional as F +from dataclasses import dataclass, field +from fairseq import checkpoint_utils, tasks, utils +from fairseq.dataclass.utils import convert_namespace_to_omegaconf +from fairseq.models import BaseFairseqModel, FairseqEncoder, register_model +from fairseq.models.hubert.hubert_asr import HubertCtcConfig, HubertAsrConfig +from fairseq.tasks import FairseqTask +from fairseq.data.data_utils import lengths_to_padding_mask +from omegaconf import II, open_dict + + +logger = logging.getLogger(__name__) + +@dataclass +class HubertSt2tCtcConfig(HubertCtcConfig): + load_speech_only: bool = II("task.load_speech_only") + ## for decoder overrides + decoder_layerdrop: float = field( + default=0.1, + metadata={"help": "probability of dropping a decoder layer in hubert"}, + ) + add_decoder: bool = field( + default=False, + metadata={"help": "whether to add decoder for CE Loss on code"}, + ) + reuse_text_emb: bool = field( + default=False, + metadata={"help": "reuse text token embeddings instead of initialize randomly"}, + ) + freeze_decoder_updates: int = field( + default=0, + metadata={"help": "dont finetune hubert for this many updates"}, + ) + # share_enc_dec_embeddings: bool = field( + # default=False, + # metadata={"help": "share embeddings of (text encoder, text decoder)"}, + # ) + share_s2t_t2t_embeddings: bool = field( + default=False, + metadata={"help": "share embeddings of (speech2text(code), text2text)"}, + ) + share_ctc_decoder_embed: bool = field( + default=False, + metadata={"help": "share ctc and decoder embedding (only when share_decoder_input_output_embed is true)"}, + ) + enc_grad_mult: float = field( + default=1.0, + metadata={"help": "reset feature grad mult in hubert to this (only for st2t)"}, + ) + retain_dict_path: Optional[str] = field( 
+ default=None, + metadata={"help": "delete embeddings according to this path"}, + ) + load_step2_model_from: Optional[str] = field( + default=None, + metadata={ + "help": "load step2 model from" + }, + ) + + # for other overrides + adaptor_stride: int = field( + default=2, + metadata={"help": "adaptor stride"}, + ) + +@register_model("hubert_st2t", dataclass=HubertSt2tCtcConfig) +class HubertST2T(BaseFairseqModel): + def __init__(self, cfg: HubertSt2tCtcConfig, w2v_encoder: BaseFairseqModel): + super().__init__() + self.cfg = cfg + self.w2v_encoder = w2v_encoder + self.num_updates = 0 + + ### in case we need load hubert_step2 model + if cfg.load_step2_model_from: + logger.info(f"Loading hubert_step2 pretrained model for finetuning: {cfg.load_step2_model_from}") + hubert_step2_states = self.w2v_encoder.w2v_model.load_checkpoint(cfg.load_step2_model_from)["model"] + if cfg.retain_dict_path is not None: + with open(cfg.retain_dict_path, "rb") as fp: + overlap_idxs = pickle.load(fp) + if hubert_step2_states['w2v_encoder.w2v_model.decoder.output_projection.0.weight'].size(0) != len(overlap_idxs): + assert self.w2v_encoder.w2v_model.add_text_modality, "Mustc have text modality if retain dict path" + logger.info("Cut embedding to a smaller size according to retain dict") + hubert_step2_states['w2v_encoder.w2v_model.decoder.output_projection.0.weight'] = hubert_step2_states['w2v_encoder.w2v_model.decoder.output_projection.0.weight'][overlap_idxs] + hubert_step2_states["w2v_encoder.w2v_model.decoder.embed_tokens_list.0.weight"] = hubert_step2_states["w2v_encoder.w2v_model.decoder.embed_tokens_list.0.weight"][overlap_idxs] + if hubert_step2_states.get("w2v_encoder.w2v_model.text_encoder.embed_tokens.weight") is not None: + hubert_step2_states["w2v_encoder.w2v_model.text_encoder.embed_tokens.weight"] = hubert_step2_states["w2v_encoder.w2v_model.text_encoder.embed_tokens.weight"][overlap_idxs] + else: + logger.info(f"cfg.load_step2_model_from matches the cut embedding dims {len(overlap_idxs)}, no cutting needs to do") + if not self.cfg.load_speech_only and hubert_step2_states.get("w2v_encoder.w2v_model.text_encoder.embed_tokens.weight", None) is None: + hubert_step2_states["w2v_encoder.w2v_model.text_encoder.embed_tokens.weight"] = hubert_step2_states["w2v_encoder.w2v_model.decoder.embed_tokens_list.0.weight"] + try: + self.load_state_dict(hubert_step2_states, strict=True) + except Exception as e: + logger.warn(e) + self.load_state_dict(hubert_step2_states, strict=False) + + def upgrade_state_dict_named(self, state_dict, name): + super().upgrade_state_dict_named(state_dict, name) + return state_dict + + @classmethod + def build_model(cls, cfg: HubertSt2tCtcConfig, task: FairseqTask): + """Build a new model instance.""" + w2v_encoder = HubertEncoder(cfg, task.target_dictionary) + return cls(cfg, w2v_encoder) + + def get_normalized_probs(self, net_output, log_probs, sample=None): + """Get normalized probabilities (or log probs) from a net's output.""" + if "encoder_out" not in net_output: + return self.w2v_encoder.get_normalized_probs_decoder(net_output, log_probs, sample) + + if "encoder_out_for_ctc" in net_output: + logits = net_output["encoder_out_for_ctc"] + else: + logits = net_output["encoder_out"] + + if isinstance(logits, list): + logits = logits[0] + + if log_probs: + return utils.log_softmax(logits.float(), dim=-1) + else: + return utils.softmax(logits.float(), dim=-1) + + def get_logits(self, net_output): + logits = net_output["encoder_out"] + padding = net_output["encoder_padding_mask"] + 
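+ # Note: the padded-frame handling below mirrors fairseq's HubertCtc.get_logits.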
if padding is not None and padding.any(): + padding = padding.T + logits[padding][..., 0] = 0 + logits[padding][..., 1:] = float("-inf") + + return logits + + def forward(self, **kwargs): + x = self.w2v_encoder(**kwargs) + return x + + @property + def encoder(self): + return self.w2v_encoder + + def reorder_encoder_out(self, encoder_out, new_order): + return self.encoder.reorder_encoder_out(encoder_out, new_order) + + @property + def decoder(self): + return self.w2v_encoder.w2v_model.decoder + + def set_num_updates(self, num_updates): + """Set the number of parameters updates.""" + super().set_num_updates(num_updates) + self.num_updates = num_updates + + +class HubertEncoder(FairseqEncoder): + def __init__(self, cfg: HubertAsrConfig, tgt_dict=None): + self.apply_mask = cfg.apply_mask + logger.info(f"self.apply_mask: {self.apply_mask}") + + arg_overrides = { + "dropout": cfg.dropout, + "activation_dropout": cfg.activation_dropout, + "dropout_input": cfg.dropout_input, + "attention_dropout": cfg.attention_dropout, + "mask_length": cfg.mask_length, + "mask_prob": cfg.mask_prob, + "mask_selection": cfg.mask_selection, + "mask_other": cfg.mask_other, + "no_mask_overlap": cfg.no_mask_overlap, + "mask_channel_length": cfg.mask_channel_length, + "mask_channel_prob": cfg.mask_channel_prob, + "mask_channel_selection": cfg.mask_channel_selection, + "mask_channel_other": cfg.mask_channel_other, + "no_mask_channel_overlap": cfg.no_mask_channel_overlap, + "encoder_layerdrop": cfg.layerdrop, + "decoder_layerdrop": cfg.decoder_layerdrop, + "feature_grad_mult": cfg.feature_grad_mult, + "decoder_dict_size": len(tgt_dict) if cfg.add_decoder else -1, + "share_decoder_input_output_embed": cfg.share_decoder_input_output_embed, + "load_pretrained_w2v_from": cfg.load_pretrained_w2v_from, + "load_pretrained_mbart_from": None, + "adaptor_stride": cfg.adaptor_stride, + "share_speech_text_embeddings": cfg.share_speech_text_embeddings, + } + + if cfg.no_pretrained_weights: + arg_overrides["use_rel_pos_enc"] = cfg.use_rel_pos_enc + arg_overrides["encoder_layers"] = cfg.encoder_layers + arg_overrides["add_text_encoder"] = cfg.add_text_encoder + arg_overrides["share_all_embeddings"] = cfg.share_all_embeddings + arg_overrides["add_adaptor"] = cfg.add_adaptor + + if cfg.w2v_args is None: + state = checkpoint_utils.load_checkpoint_to_cpu(cfg.w2v_path, arg_overrides) + w2v_args = state.get("cfg", None) + if w2v_args is None: + w2v_args = convert_namespace_to_omegaconf(state["args"]) + cfg.w2v_args = w2v_args + else: + state = None + w2v_args = cfg.w2v_args + if isinstance(w2v_args, Namespace): + cfg.w2v_args = w2v_args = convert_namespace_to_omegaconf(w2v_args) + + ## in speech_text_joint_to_text, data is loaded by soundfile, which returns without normalization + self.need_preprocess = w2v_args.task.normalize + logger.warn("We need normalize the input wavform from the src_tokens") + + if cfg.normalize != w2v_args.task.normalize: + logger.warn( + "Fine-tuning works best when data normalization is the same. 
" + "Please check that --normalize is set or unset for " + "both pre-training and here" + ) + + if not "share_speech_text_embeddings" in w2v_args.model: + with open_dict(w2v_args.model): + w2v_args.model.share_speech_text_embedding = cfg.share_speech_text_embeddings + logger.info(f"share_speech_text_embeddings: {(getattr(w2v_args.model, 'share_speech_text_embeddings', False))}") + w2v_args.task.data = cfg.data + w2v_args.task.add_decoder = cfg.add_decoder + assert w2v_args.model._name == "hubert" + + task = tasks.setup_task(w2v_args.task) + if state is not None and "task_state" in state: + # This will load the stored "dictionaries" object + task.load_state_dict(state["task_state"]) + model = task.build_model(w2v_args.model) + + ### modify the embed_tokens and output_projection of decoder + if state is not None and not cfg.no_pretrained_weights: + model_states = self.modify_states(state['model'], cfg.retain_dict_path, cfg.reuse_text_emb) + try: + model.load_state_dict(model_states, strict=True) + except Exception as e: + logger.warn(e) + model.load_state_dict(model_states, strict=False) + + ### in case we need load mbart + if cfg.no_pretrained_weights and cfg.load_pretrained_mbart_from: + logger.info("Loading mbart ...") + mbart_state = model.load_checkpoint(cfg.load_pretrained_mbart_from) + mbart_state["model"] = self.modify_states(mbart_state["model"], cfg.retain_dict_path, cfg.reuse_text_emb, is_mbart=True) + model.text_encoder = model.load_pretrained_component_from_model( + component=model.text_encoder, state=mbart_state + ) + model.decoder = model.load_pretrained_component_from_model( + component=model.decoder, state=mbart_state + ) + + model.remove_pretraining_modules(step2=(not cfg.load_speech_only)) + # model.remove_pretraining_modules() + + super().__init__(task.source_dictionary) + + d = w2v_args.model.encoder_embed_dim + + self.w2v_model = model + + self.final_dropout = nn.Dropout(cfg.final_dropout) + self.freeze_finetune_updates = cfg.freeze_finetune_updates + self.freeze_decoder_updates = cfg.freeze_decoder_updates + self.num_updates = 0 + self.enc_grad_mult = cfg.enc_grad_mult + + def modify_states(self, model_states, retain_dict_path=None, reuse_text_emb=False, is_mbart=False): + if retain_dict_path is not None: + logger.info("Cut embedding to a smaller size according to retain dict") + with open(retain_dict_path, "rb") as fp: + overlap_idxs = pickle.load(fp) + if is_mbart: + model_states["decoder.embed_tokens_list.1.weight"] = model_states["decoder.embed_tokens.weight"][overlap_idxs] + model_states["decoder.output_projection.1.weight"] = model_states["decoder.output_projection.weight"][overlap_idxs] + model_states["decoder.embed_tokens.weight"] = model_states["decoder.embed_tokens.weight"][overlap_idxs] + model_states["decoder.output_projection.weight"] = model_states["decoder.output_projection.weight"][overlap_idxs] + model_states["encoder.embed_tokens.weight"] = model_states["encoder.embed_tokens.weight"][overlap_idxs] + else: + model_states['decoder.output_projection.1.weight'] = model_states['decoder.output_projection.1.weight'][overlap_idxs] + model_states["decoder.embed_tokens_list.1.weight"] = model_states["decoder.embed_tokens_list.1.weight"][overlap_idxs] + model_states["text_encoder.embed_tokens.weight"] = model_states["text_encoder.embed_tokens.weight"][overlap_idxs] + if reuse_text_emb: + logger.info("Loading decoder.embed_tokens_list.0 <-- decoder.embed_tokens_list.1") + model_states["decoder.embed_tokens_list.0.weight"] = 
model_states["decoder.embed_tokens_list.1.weight"] + model_states["decoder.output_projection.0.weight"] = model_states["decoder.output_projection.1.weight"] + del model_states["decoder.embed_tokens_list.1.weight"] + del model_states["decoder.output_projection.1.weight"] + return model_states + + def set_num_updates(self, num_updates): + """Set the number of parameters updates.""" + super().set_num_updates(num_updates) + self.num_updates = num_updates + + def forward(self, src_tokens=None, src_lengths=None, src_txt_tokens=None, src_txt_lengths=None, prev_output_tokens=None, tbc=True, **kwargs): + padding_mask = lengths_to_padding_mask(src_lengths) + if self.need_preprocess: + src_tokens = torch.stack([F.layer_norm(wav, wav.shape) for wav in src_tokens]) + src_tokens[padding_mask] = 0.0 + + ft = self.freeze_finetune_updates <= self.num_updates + w2v_args = { + "source": src_tokens, + "padding_mask": padding_mask, + "mask": self.apply_mask and self.training, + "prev_output_tokens": prev_output_tokens, + "ft": ft, + "enc_grad_mult": self.enc_grad_mult, + } + + if self.freeze_decoder_updates <= self.num_updates: + self.w2v_model.add_decoder = True + else: + self.w2v_model.add_decoder = False + + x, padding_mask, decoder_out = self.w2v_model.extract_features(**w2v_args) + + if tbc: + # B x T x C -> T x B x C + x = x.transpose(0, 1) + + x = self.final_dropout(x) + + if src_txt_tokens is not None: + w2v_args_text = { + "src_tokens": src_txt_tokens, + "src_lengths": src_txt_lengths, + "prev_output_tokens": prev_output_tokens, + } + + decoder_output_text = self.w2v_model(**w2v_args_text) + decoder_out = (torch.cat([decoder_out[0], decoder_output_text['decoder_out'][0]], dim=0), {"attn_cost": None}) + + return decoder_out + + def get_normalized_probs_decoder(self, net_output, log_probs, sample=None): + # net_output['encoder_out'] is a (B, T, D) tensor + return self.w2v_model.get_normalized_probs(net_output, log_probs, sample) + + def reorder_encoder_out(self, encoder_out, new_order): + if encoder_out["encoder_out"] is not None: + if isinstance(encoder_out["encoder_out"], list): + encoder_out["encoder_out"] = ( + [] if len(encoder_out["encoder_out"]) == 0 + else [x.index_select(1, new_order) for x in encoder_out["encoder_out"]] + ) + else: + encoder_out["encoder_out"] = encoder_out[ + "encoder_out" + ].index_select(1, new_order) + if encoder_out["encoder_padding_mask"] is not None: + if isinstance(encoder_out["encoder_padding_mask"], list): + encoder_out["encoder_padding_mask"] = ( + [] if len(encoder_out["encoder_padding_mask"]) == 0 + else [x.index_select(0, new_order) for x in encoder_out["encoder_padding_mask"]] + ) + else: + encoder_out["encoder_padding_mask"] = encoder_out[ + "encoder_padding_mask" + ].index_select(0, new_order) + if "decoder_out" in encoder_out and encoder_out["decoder_out"] is not None: + if isinstance(encoder_out["decoder_out"], list): + encoder_out["decoder_out"] = ( + [] if len(encoder_out["decoder_out"]) == 0 + else [x.index_select(0, new_order) for x in encoder_out["decoder_out"]] + ) + else: + encoder_out["decoder_out"] = encoder_out[ + "decoder_out" + ].index_select(0, new_order) + + return encoder_out + + def forward_torchscript(self, net_input): + """A TorchScript-compatible version of forward. + + Encoders which use additional arguments may want to override + this method for TorchScript compatibility. 
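+ Here the input waveforms are layer-normalized when the pretraining task used normalization, and a padding mask is built from src_lengths before calling the wrapped model.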
+ """ + padding_mask = lengths_to_padding_mask(net_input["src_lengths"]) + src_tokens = net_input["src_tokens"] + if self.need_preprocess: + src_tokens = torch.stack([F.layer_norm(wav, wav.shape) for wav in src_tokens]) + src_tokens[padding_mask] = 0.0 + + _net_input = { + "source": src_tokens, + "padding_mask": padding_mask, + } + + encoder_out = self.w2v_model.forward_torchscript(_net_input) + + return encoder_out + + def max_positions(self): + """Maximum input length supported by the encoder.""" + return None + + def upgrade_state_dict_named(self, state_dict, name): + return state_dict + + +def Embedding(num_embeddings, embedding_dim, padding_idx): + m = nn.Embedding(num_embeddings, embedding_dim, padding_idx=padding_idx) + nn.init.normal_(m.weight, mean=0, std=embedding_dim ** -0.5) + nn.init.constant_(m.weight[padding_idx], 0) + return m + + +def Linear(in_features, out_features, bias=True): + m = nn.Linear(in_features, out_features, bias) + nn.init.xavier_uniform_(m.weight) + if bias: + nn.init.constant_(m.bias, 0.0) + return m diff --git a/YiTrans/yitrans_iwslt22/models/pretrain_ed.py b/YiTrans/yitrans_iwslt22/models/pretrain_ed.py new file mode 100644 index 0000000000000000000000000000000000000000..a07fab74df97ca424f2201d3e8f8826ece2c82c4 --- /dev/null +++ b/YiTrans/yitrans_iwslt22/models/pretrain_ed.py @@ -0,0 +1,698 @@ +# -------------------------------------------------------- +# The YiTrans End-to-End Speech Translation System for IWSLT 2022 Offline Shared Task (https://arxiv.org/abs/2206.05777) +# Github source: https://github.com/microsoft/SpeechT5/tree/main/YiTrans +# Copyright (c) 2022 Microsoft +# Licensed under The MIT License [see LICENSE for details] +# Based on fairseq code bases +# https://github.com/facebookresearch/fairseq +# -------------------------------------------------------- + +import logging +import contextlib +from dataclasses import dataclass, field +from typing import Dict, List, Optional, Tuple, Union +from collections import OrderedDict + +import copy +import torch +from omegaconf import II + +from fairseq import checkpoint_utils +from fairseq.data.dictionary import Dictionary +from fairseq.dataclass import ChoiceEnum +from fairseq.models import register_model, FairseqDecoder +from fairseq.models.transformer import ( + TransformerEncoderBase, + TransformerConfig, +) +from fairseq.models.speech_to_text import Conv1dAdaptor +from fairseq.models.transformer import Embedding +from fairseq.file_io import PathManager +from torch import Tensor +from fairseq.models.wav2vec.wav2vec2 import ConvFeatureExtractionModel +from fairseq.modules import GradMultiply + +from fairseq.models.hubert import HubertConfig, HubertModel + +from fairseq.models.wav2vec.wav2vec2 import TransformerEncoder as W2vTransformerEncoder +from yitrans_iwslt22.modules.w2v_encoder import TransformerEncoder +from yitrans_iwslt22.modules.transformer_decoder import TransformerDecoderScriptable +from yitrans_iwslt22.modules.multimodal_transformer_decoder import MultimodalTransformerDecoder +from yitrans_iwslt22.tasks.iwslt_joint_pretraining import ( + JointPretrainingConfig, + JointPretrainingTask, +) + +logger = logging.getLogger(__name__) + +EXTRACTOR_MODE_CHOICES = ChoiceEnum(["default", "layer_norm"]) +MASKING_DISTRIBUTION_CHOICES = ChoiceEnum(["static", "uniform", "normal", "poisson"]) + + +@dataclass +class JointEDConfig(HubertConfig): + use_rel_pos_enc: bool = field( + default=False, + metadata={"help": "whether to use relative positional encoding"}, + ) + + # decoder + decoder_layers: int 
= field( + default=6, metadata={"help": "num decoder layers in the transformer"} + ) + decoder_embed_dim: int = field( + default=768, metadata={"help": "decoder embedding dimension"} + ) + decoder_ffn_embed_dim: int = field( + default=3072, metadata={"help": "decoder embedding dimension for FFN"} + ) + decoder_attention_heads: int = field( + default=12, metadata={"help": "num decoder attention heads"} + ) + decoder_normalize_before: bool = field( + default=False, + metadata={"help": "apply layernorm before each decoder block"}, + ) + layernorm_embedding: bool = field( + default=False, + metadata={"help": "apply layernorm to embedding for decoder"}, + ) + decoder_layerdrop: float = field( + default=0.1, + metadata={"help": "probability of dropping a tarnsformer layer"}, + ) + share_decoder_input_output_embed: bool = field( + default=False, + metadata={"help": "share decoder input and output embeddings"}, + ) + share_enc_dec_embeddings: bool = field( + default=False, + metadata={"help": "share embeddings of (text encoder, text decoder)"}, + ) + share_s2t_t2t_embeddings: bool = field( + default=False, + metadata={"help": "share embeddings of (speech2text(code), text2text)"}, + ) + decoder_output_dim: int = field( + default=768, metadata={"help": "decoder output dimension"} + ) + max_target_positions: int = field( + default=3000, metadata={"help": "max target position"} + ) + no_scale_embedding: bool = field( + default=False, + metadata={"help": "not scale embedding"}, + ) + adaptive_input: bool = field( + default=False, + metadata={"help": "adaptive input"}, + ) + quant_noise_pq: int = field( + default=0, metadata={"help": "quant noise pq"} + ) + decoder_learned_pos: bool = field( + default=False, + metadata={"help": "decoder learnable positional embedding"}, + ) + no_token_positional_embeddings: bool = field( + default=False, + metadata={"help": "no token positional embeddings"}, + ) + add_text_modality: bool = field( + default=-False, + metadata={"help": "add text modality, mainly used in pretrainnig"}, + ) + add_text_encoder: bool = field( + default=False, + metadata={"help": "add_text_encoder"}, + ) + share_text_encoder: bool = field( + default=True, + metadata={"help": "share text encoder so that speech branch go through it"}, + ) + split_attention: bool = field( + default=False, + metadata={"help": "use shared but split encoders"}, + ) + add_adaptor: bool = field( + default=False, + metadata={"help": "add adaptor and text encoder on the top of speech encoder"}, + ) + adaptor_n_layers: int = field( + default=3, + metadata={"help": "number of layers for adaptor"}, + ) + adaptor_kernel_size: int = field( + default=3, + metadata={"help": "kernel size for adaptor"}, + ) + adaptor_stride: int = field( + default=2, + metadata={"help": "adaptor stride"}, + ) + adaptor_layernorm: bool = field( + default=False, + metadata={"help": "adaptor layernorm"}, + ) + # Finetune related + decoder_dict_size: int = field( + default=-1, + metadata={"help": "decoder dictionary dimension"}, + ) + + # text encoder related, TransformerConfig is used in bart but we only use its enconder + text_transformer: TransformerConfig = TransformerConfig() + + # other + checkpoint_activations: bool = field( + default=False, metadata={"help": "recompute activations and save memory for extra compute"} + ) + + # Load pre-train model + load_pretrained_mbart_from: Optional[str] = field( + default=None, + metadata={ + "help": "model to take text encoder decoder weights from (for initialization)" + }, + ) + 
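+ # Note: both pretrained-checkpoint paths above/below are read with checkpoint_utils.load_checkpoint_to_cpu via load_checkpoint() further down.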
load_pretrained_w2v_from: Optional[str] = field( + default=None, + metadata={ + "help": "model to take speech encoder weights from (for initialization)" + }, + ) + + # FP16 optimization + required_seq_len_multiple: int = field( + default=1, + metadata={ + "help": "pad the input to encoder such that the sequence length is divisible by multiple" + }, + ) + crop_seq_to_multiple: int = field( + default=1, + metadata={ + "help": "crop convolutional feature extractor output such that the sequence length is divisible by multiple" + }, + ) + +@register_model("joint_ed", dataclass=JointEDConfig) +class JointEDModel(HubertModel): + def __init__( + self, + cfg: JointEDConfig, + task_cfg: JointPretrainingConfig, + dictionaries: List[Dictionary], + text_dictionary: Dictionary = None, + ) -> None: + super().__init__(cfg, task_cfg, dictionaries) + logger.info(f"JointEDModel Config: {cfg}") + + self.encoder = TransformerEncoder(cfg) + + ### build speeech-text joint_pretrain net from: + ### - add_text_modality is false: no text network + ### - add_text_modality is true, add_text_encoder=False: build text embedding + ### - add_text_modality is true, add_text_encoder=True: build text embedding and encoder + assert cfg.add_text_modality + assert cfg.add_text_encoder + assert cfg.share_text_encoder + assert text_dictionary is not None + self.add_text_modality = cfg.add_text_modality + self.add_text_encoder = cfg.add_text_encoder + self.share_text_encoder = cfg.share_text_encoder + + if cfg.share_s2t_t2t_embeddings: + text_dictionary = self.cutting_dictionary(text_dictionary, cfg.decoder_dict_size) + + ### build text encoder + text_encoder_embed_tokens = self.build_embedding( + text_dictionary, cfg.text_transformer.encoder.embed_dim + ) + self.text_encoder = TransformerEncoderBase( + cfg.text_transformer, + text_dictionary, + text_encoder_embed_tokens + ) + + ### build text decoder + self.add_decoder = task_cfg.add_decoder + if self.add_decoder: + # To make sure that the decoder dict size is the same as the fine-tuning tgt_dict size or bpe code dict size + s2t_dec_dict = self.cutting_dictionary(dictionaries[0], cfg.decoder_dict_size) + if text_dictionary is None: + decoder_dict_list = [s2t_dec_dict] + else: + decoder_dict_list = [s2t_dec_dict, text_dictionary] + + decoder_embed_tokens = [ + self.build_embedding(dictionary, cfg.decoder_embed_dim) + for dictionary in decoder_dict_list + ] + + if cfg.share_enc_dec_embeddings and text_dictionary is not None: + assert cfg.share_decoder_input_output_embed, "Must share decoder input-output embed before share encoder-decoder embed" + logger.info("--------------------------------: share input-output embeddings") + decoder_embed_tokens[-1] = text_encoder_embed_tokens + + if cfg.share_s2t_t2t_embeddings: + logger.info("--------------------------------: share s2t-t2t embeddings") + assert len(s2t_dec_dict) == len(text_dictionary), "s2t embed len must be equal to t2t embed len" + decoder_embed_tokens[0] = text_encoder_embed_tokens + + if len(decoder_embed_tokens) == 1: + self.decoder = TransformerDecoderScriptable(cfg, decoder_dict_list[0], decoder_embed_tokens[0]) + else: + self.decoder = MultimodalTransformerDecoder(cfg, decoder_dict_list, decoder_embed_tokens) + + self.add_adaptor = cfg.add_adaptor + if self.add_adaptor: + assert self.add_text_encoder, "Cannot shared encoder for text and speech once add adaptor" + self.adaptor = Conv1dAdaptor( + cfg.encoder_embed_dim, + cfg.decoder_embed_dim, + n_layers=cfg.adaptor_n_layers, + kernel_size=cfg.adaptor_kernel_size, + 
stride=cfg.adaptor_stride, + add_layernorm=cfg.adaptor_layernorm, + ) + + if cfg.load_pretrained_w2v_from is not None: + w2v_model_state = self.load_checkpoint(cfg.load_pretrained_w2v_from) + self.feature_extractor = self.load_pretrained_component_from_model( + component=self.feature_extractor, state=w2v_model_state + ) + + self.encoder = self.load_pretrained_component_from_model( + component=self.encoder, state=w2v_model_state + ) + + self.post_extract_proj.weight = torch.nn.Parameter(w2v_model_state["model"]["post_extract_proj.weight"]) + self.post_extract_proj.bias = torch.nn.Parameter(w2v_model_state["model"]["post_extract_proj.bias"]) + + # self.final_proj.weight = torch.nn.Parameter(w2v_model_state["model"]["final_proj.weight"]) + # self.final_proj.bias = torch.nn.Parameter(w2v_model_state["model"]["final_proj.bias"]) + + self.layer_norm.weight = torch.nn.Parameter(w2v_model_state["model"]["layer_norm.weight"]) + self.layer_norm.bias = torch.nn.Parameter(w2v_model_state["model"]["layer_norm.bias"]) + + # self.label_embs_concat.data = torch.nn.Parameter(w2v_model_state["model"]["label_embs_concat"]) + self.mask_emb.data = torch.nn.Parameter(w2v_model_state["model"]["mask_emb"]) + + if cfg.load_pretrained_mbart_from is not None: + mbart_model_state = self.load_checkpoint(cfg.load_pretrained_mbart_from) + if self.add_text_modality and self.add_text_encoder: + self.text_encoder = self.load_pretrained_component_from_model( + component=self.text_encoder, state=mbart_model_state + ) + if self.add_decoder: + self.decoder = self.load_pretrained_component_from_model( + component=self.decoder, state=mbart_model_state + ) + + def cutting_dictionary(self, dictionary, dict_size): + if dictionary is None or dict_size <= 0: + return dictionary + else: + cut_dictionary = copy.deepcopy(dictionary) + if dict_size > len(cut_dictionary): + for i in range(dict_size - len(cut_dictionary)): + cut_dictionary.symbols.append(f'_{i}_') + else: + cut_dictionary.symbols = cut_dictionary.symbols[:dict_size] + return cut_dictionary + + def build_embedding(self, dictionary, embed_dim): + num_embeddings = len(dictionary) + padding_idx = dictionary.pad() + return Embedding(num_embeddings, embed_dim, padding_idx) + + @classmethod + def build_model(cls, cfg: HubertConfig, task: JointPretrainingTask): + """Build a new model instance.""" + # Change dict size for bpe code + if hasattr(task, "hubert_tokenizer") and task.hubert_tokenizer is not None and not task.fine_tuning and cfg.decoder_dict_size == -1: + cfg.decoder_dict_size = len(task.hubert_tokenizer.sp) + logger.info(f"Use acoustic pieces as code, set decoder dict size to {len(task.hubert_tokenizer.sp)}") + + text_dictionary = getattr(task, "text_dictionary", None) + model = JointEDModel(cfg, task.cfg, task.dictionaries, text_dictionary) + return model + + def get_normalized_probs( + self, + net_output: Tuple[Tensor, Optional[Dict[str, List[Optional[Tensor]]]]], + log_probs: bool, + sample: Optional[Dict[str, Tensor]] = None, + ): + # net_output['encoder_out'] is a (B, T, D) tensor + lprobs = self.get_normalized_probs_scriptable(net_output, log_probs, sample) + lprobs.batch_first = True + return lprobs + + def forward( + self, + source: torch.Tensor = None, + src_tokens: torch.Tensor = None, + src_lengths: torch.Tensor = None, + target_list: Optional[List[torch.Tensor]] = None, + padding_mask: Optional[torch.Tensor] = None, + mask: bool = True, + features_only: bool = False, + output_layer: Optional[int] = None, + prev_output_tokens: Optional[torch.Tensor] = None, + 
text_modal_idx: Optional[int] = -1, + ) -> Dict[str, torch.Tensor]: + """output layer is 1-based""" + assert source is not None or src_tokens is not None + if source is not None: + ### 1. go speech cnn-encoder-decoder branch + features = self.forward_features(source) + if target_list is not None: + features, target_list = self.forward_targets(features, target_list) + + features_pen = features.float().pow(2).mean() + + features = features.transpose(1, 2) + features = self.layer_norm(features) + unmasked_features = features.clone() + + if padding_mask is not None: + padding_mask = self.forward_padding_mask(features, padding_mask) + + if self.post_extract_proj is not None: + features = self.post_extract_proj(features) + + features = self.dropout_input(features) + unmasked_features = self.dropout_features(unmasked_features) + + if mask: + x, mask_indices = self.apply_mask(features, padding_mask, target_list) + else: + x = features + mask_indices = None + + # feature: (B, T, D), float + # target: (B, T), long + # x: (B, T, D), float + # padding_mask: (B, T), bool + # mask_indices: (B, T), bool + x, _ = self.encoder( + x, + padding_mask=padding_mask, + layer=None if output_layer is None else output_layer - 1, + ) + + if features_only: + return {"x": x, "padding_mask": padding_mask, "features": features} + + def compute_pred(proj_x, target, label_embs): + # compute logits for the i-th label set + y = torch.index_select(label_embs, 0, target.long()) + negs = label_embs.unsqueeze(1).expand(-1, proj_x.size(0), -1) + if self.target_glu: + y = self.target_glu(y) + negs = self.target_glu(negs) + # proj_x: (S, D) + # y: (S, D) + # negs: (Neg, S, D) + return self.compute_nce(proj_x, y, negs) + + label_embs_list = self.label_embs_concat.split(self.num_classes, 0) + + if not self.skip_masked: + masked_indices = torch.logical_and(~padding_mask, mask_indices) + proj_x_m = self.final_proj(x[masked_indices]) + if self.untie_final_proj: + proj_x_m_list = proj_x_m.chunk(len(target_list), dim=-1) + else: + proj_x_m_list = [proj_x_m for _ in range(len(target_list))] + logit_m_list = [ + compute_pred(proj_x_m, t[masked_indices], label_embs_list[i]) + for i, (proj_x_m, t) in enumerate(zip(proj_x_m_list, target_list)) + ] + else: + logit_m_list = [None for _ in target_list] + + if not self.skip_nomask: + nomask_indices = torch.logical_and(~padding_mask, ~mask_indices) + proj_x_u = self.final_proj(x[nomask_indices]) + if self.untie_final_proj: + proj_x_u_list = proj_x_u.chunk(len(target_list), dim=-1) + else: + proj_x_u_list = [proj_x_u for _ in range(len(target_list))] + + logit_u_list = [ + compute_pred(proj_x_u, t[nomask_indices], label_embs_list[i]) + for i, (proj_x_u, t) in enumerate(zip(proj_x_u_list, target_list)) + ] + else: + logit_u_list = [None for _ in target_list] + + result = { + "logit_m_list": logit_m_list, + "logit_u_list": logit_u_list, + "padding_mask": padding_mask, + "features_pen": features_pen, + } + + x = x.transpose(0, 1) # T x B x C + # adaptor layers + if self.add_adaptor: + x, padding_mask = self.adaptor(x, padding_mask) + + # text encoder layers + if self.add_text_encoder and self.share_text_encoder: + for layer in self.text_encoder.layers: + x = layer( + x, encoder_padding_mask=padding_mask + ) + if self.text_encoder.layer_norm is not None: + x = self.text_encoder.layer_norm(x) + + # decoder layers + if self.add_decoder: + encoder_out = { + "encoder_out": [x], # T x B x C + "encoder_padding_mask": [padding_mask], # B x T + } + assert prev_output_tokens is not None + decoder_out = 
self.decoder( + prev_output_tokens=prev_output_tokens, encoder_out=encoder_out + ) + result['decoder_out'] = decoder_out + else: + ### 2. go text encoder-decoder branch + if self.add_text_encoder: + encoder_out = self.text_encoder( + src_tokens, src_lengths=src_lengths, return_all_hiddens=False + ) + else: + encoder_padding_mask = src_tokens.eq(self.text_padding_idx) + has_pads = src_tokens.device.type == "xla" or encoder_padding_mask.any() + x = self.text_embed_scale * self.text_encoder_embed_tokens(src_tokens) + x = x + self.text_embed_positions(src_tokens) + # x = self.dropout_input(x) + if has_pads: + x = x * (1 - encoder_padding_mask.unsqueeze(-1).type_as(x)) + kwargs={"modality": "text"} if self.split_attention else {} + x, _ = self.encoder( + x, + padding_mask=encoder_padding_mask, + conv_pos=False, + **kwargs, + ) + encoder_out = { + "encoder_out": [x.transpose(0, 1)], # T x B x C + "encoder_padding_mask": [encoder_padding_mask], # B x T + "src_lengths": [src_lengths], + } + + result = {"encoder_out": encoder_out} + if features_only: + return result + assert prev_output_tokens is not None + decoder_out = self.decoder( + prev_output_tokens=prev_output_tokens, encoder_out=encoder_out, modal_idx=text_modal_idx, + ) + result['decoder_out'] = decoder_out + + return result + + def forward_torchscript(self, net_input: Dict[str, Tensor]): + """A TorchScript-compatible version of forward. + + Encoders which use additional arguments may want to override + this method for TorchScript compatibility. + """ + res = self.forward( + mask=False, + features_only=True, + **net_input, + ) + + if "source" in net_input: + res["x"] = res["x"].transpose(0, 1) # T x B x C + + x = res["x"] # T x B x C + padding_mask = res["padding_mask"] + if self.add_adaptor: + x, padding_mask = self.adaptor(x, padding_mask) + + # text encoder layers + if self.add_text_encoder and self.share_text_encoder: + for layer in self.text_encoder.layers: + x = layer( + x, encoder_padding_mask=padding_mask + ) + + if self.text_encoder.layer_norm is not None: + x = self.text_encoder.layer_norm(x) + + res["x"] = x + res["padding_mask"] = padding_mask + + encoder_out = { + "encoder_out": [res["x"]], # T x B x C + "encoder_padding_mask": [res["padding_mask"]], # B x T + } + else: + encoder_out = res["encoder_out"] + if "encoder_states" in encoder_out: + del encoder_out["encoder_states"] + if "src_tokens" in encoder_out: + del encoder_out["src_tokens"] + if "src_tokens" in encoder_out: + del encoder_out["src_lengths"] + return encoder_out + + def extract_features( + self, + source: torch.Tensor, + padding_mask: Optional[torch.Tensor] = None, + mask: bool = False, + ret_conv: bool = False, + output_layer: Optional[int] = None, + prev_output_tokens: Optional[torch.Tensor] = None, + ft: bool = True, + enc_grad_mult: float = 1.0, + ) -> Tuple[torch.Tensor, torch.Tensor]: + """only for speech input""" + with torch.no_grad() if not ft else contextlib.ExitStack(): + res = self.forward( + source, + padding_mask=padding_mask, + mask=mask, + features_only=True, + output_layer=output_layer, + ) + + feature = res["features"] if ret_conv else res["x"] + + res["x"] = res["x"].transpose(0, 1) # T x B x C + x = res["x"] # T x B x C + padding_mask = res["padding_mask"] + if self.add_adaptor: + x, padding_mask = self.adaptor(x, padding_mask) + + # text encoder layers + if self.add_text_encoder and self.share_text_encoder: + for layer in self.text_encoder.layers: + x = layer( + x, encoder_padding_mask=padding_mask + ) + + if self.text_encoder.layer_norm is 
not None: + x = self.text_encoder.layer_norm(x) + + res["x"] = x + res["padding_mask"] = padding_mask + + if self.add_decoder and prev_output_tokens is not None: + encoder_out = { + "encoder_out": [res["x"]], # T x B x C + "encoder_padding_mask": [res["padding_mask"]], # B x T + } + + if enc_grad_mult != 1.0: + encoder_out = self.mult_rst_grad(encoder_out, enc_grad_mult) + + assert prev_output_tokens is not None + decoder_out = self.decoder( + prev_output_tokens=prev_output_tokens, + encoder_out=encoder_out, + ) + else: + decoder_out = None + return feature, res["padding_mask"], decoder_out + + def mult_rst_grad(self, rst, ratio): + assert isinstance(rst, dict) # instead of EncoderOut + assert len(rst["encoder_out"]) == 1 + rst["encoder_out"][0] = GradMultiply.apply(rst["encoder_out"][0], ratio) + return rst + + + def remove_pretraining_modules(self, step2=False): + self.target_glu = None + self.final_proj = None + if self.add_text_modality: + # Delete text embeddings of text encoder + if not step2: + if self.add_text_encoder: + self.text_encoder.embed_tokens = None + if hasattr(self.text_encoder, "embed_positions"): + self.text_encoder.embed_tokens = None + if hasattr(self.text_encoder, "layernorm_embedding"): + self.text_encoder.layernorm_embedding = None + else: + self.text_encoder_embed_tokens = None + self.text_embed_positions = None + if isinstance(self.decoder, MultimodalTransformerDecoder): + # Delete text embeddings of decoder + self.decoder.embed_tokens_list = self.decoder.embed_tokens_list[:1] + self.decoder.output_projection = self.decoder.output_projection[:1] + + def load_checkpoint(self, checkpoint: str): + if not PathManager.exists(checkpoint): + raise IOError("Model file not found: {}".format(checkpoint)) + state = checkpoint_utils.load_checkpoint_to_cpu(checkpoint) + return state + + def load_pretrained_component_from_model( + self, component: Union[TransformerEncoderBase, TransformerEncoder, W2vTransformerEncoder, FairseqDecoder, ConvFeatureExtractionModel], state + ): + """ + Load a pretrained FairseqEncoder or FairseqDecoder from checkpoint into the + provided `component` object. If state_dict fails to load, there may be a + mismatch in the architecture of the corresponding `component` found in the + `checkpoint` file. + """ + if isinstance(component, (TransformerEncoderBase, TransformerEncoder, W2vTransformerEncoder)): + component_type = "encoder" + elif isinstance(component, FairseqDecoder): + component_type = "decoder" + if isinstance(component, MultimodalTransformerDecoder): + state["model"]["decoder.embed_tokens_list.1.weight"] = state["model"]["decoder.embed_tokens.weight"] + state["model"]["decoder.output_projection.1.weight"] = state["model"]["decoder.output_projection.weight"] + elif isinstance(component, ConvFeatureExtractionModel): + component_type = "feature_extractor" + else: + print(component) + raise ValueError( + "component to load must be either a FairseqEncoder or " + "FairseqDecoder. Loading other component types are not supported." 
+ ) + component_state_dict = OrderedDict() + for key in state["model"].keys(): + if key.startswith(component_type): + # encoder.input_layers.0.0.weight --> input_layers.0.0.weight + component_subkey = key[len(component_type) + 1 :] + component_state_dict[component_subkey] = state["model"][key] + try: + logger.info(f"Load {component_type}") + component.load_state_dict(component_state_dict, strict=True) + except Exception as e: + logger.warn(e) + component.load_state_dict(component_state_dict, strict=False) + return component diff --git a/YiTrans/yitrans_iwslt22/models/pretrain_ed_step2.py b/YiTrans/yitrans_iwslt22/models/pretrain_ed_step2.py new file mode 100644 index 0000000000000000000000000000000000000000..82820bb95d5890db573c96241e3cd6c572adea34 --- /dev/null +++ b/YiTrans/yitrans_iwslt22/models/pretrain_ed_step2.py @@ -0,0 +1,438 @@ +# -------------------------------------------------------- +# The YiTrans End-to-End Speech Translation System for IWSLT 2022 Offline Shared Task (https://arxiv.org/abs/2206.05777) +# Github source: https://github.com/microsoft/SpeechT5/tree/main/YiTrans +# Copyright (c) 2022 Microsoft +# Licensed under The MIT License [see LICENSE for details] +# Based on fairseq code bases +# https://github.com/facebookresearch/fairseq +# -------------------------------------------------------- + +import logging +import contextlib +from argparse import Namespace +from typing import Any, Optional + +import torch +import torch.nn as nn +import pickle +from dataclasses import dataclass, field +from fairseq import checkpoint_utils, tasks, utils +from fairseq.dataclass import FairseqDataclass +from fairseq.dataclass.utils import convert_namespace_to_omegaconf +from fairseq.models import BaseFairseqModel, FairseqEncoder, register_model +from fairseq.models.hubert.hubert import MASKING_DISTRIBUTION_CHOICES +from fairseq.models.hubert.hubert_asr import HubertAsrConfig +from fairseq.tasks import FairseqTask +from omegaconf import II, MISSING + +from yitrans_iwslt22.modules.multimodal_transformer_decoder import MultimodalTransformerDecoder + +logger = logging.getLogger(__name__) + +@dataclass +class JointStep2Config(HubertAsrConfig): + ## for decoder overrides + decoder_layerdrop: float = field( + default=0.1, + metadata={"help": "probability of dropping a decoder layer in hubert"}, + ) + add_decoder: bool = field( + default=False, + metadata={"help": "whether to add decoder for CE Loss on code"}, + ) + reuse_text_emb: bool = field( + default=False, + metadata={"help": "reuse text token embeddings instead of initialize randomly"}, + ) + freeze_decoder_updates: int = field( + default=0, + metadata={"help": "dont finetune hubert for this many updates"}, + ) + # share_enc_dec_embeddings: bool = field( + # default=False, + # metadata={"help": "share embeddings of (text encoder, text decoder)"}, + # ) + share_s2t_t2t_embeddings: bool = field( + default=False, + metadata={"help": "share embeddings of (speech2text(code), text2text)"}, + ) + share_ctc_decoder_embed: bool = field( + default=False, + metadata={"help": "share ctc and decoder embedding (only when share_decoder_input_output_embed is true)"}, + ) + enc_grad_mult: float = field( + default=1.0, + metadata={"help": "reset feature grad mult in hubert to this (only for st2t)"}, + ) + retain_dict_path: Optional[str] = field( + default=None, + metadata={"help": "delete embeddings according to this path"}, + ) + load_step2_model_from: Optional[str] = field( + default=None, + metadata={ + "help": "load step2 model from" + }, + ) + + # 
for other overrides + adaptor_stride: int = field( + default=2, + metadata={"help": "adaptor stride"}, + ) + + # ## for reset some configs + # load_pretrained_mbart_from: Optional[str] = field( + # default=None, + # metadata={ + # "help": "model to take text encoder decoder weights from (for initialization)" + # }, + # ) + # load_pretrained_w2v_from: Optional[str] = field( + # default=None, + # metadata={ + # "help": "model to take speech encoder weights from (for initialization)" + # }, + # ) + # use_rel_pos_enc: bool = field( + # default=True, + # metadata={"help": "whether to use relative positional encoding"}, + # ) + # encoder_layers: int = field( + # default=12, + # metadata={"help": "encoder_layers"}, + # ) + # add_text_modality: bool = field( + # default=True, + # metadata={"help": "add_text_modality"}, + # ) + # add_text_encoder: bool = field( + # default=True, + # metadata={"help": "add_text_encoder"}, + # ) + # share_all_embeddings: bool = field( + # default=True, + # metadata={"help": "share text_encoder, decoder_input, decoder_output embeddings"}, + # ) + # add_adaptor: bool = field( + # default=True, + # metadata={"help": "add_adaptor"}, + # ) + + +@register_model("hubert_step2", dataclass=JointStep2Config) +class JointStep2Model(BaseFairseqModel): + def __init__(self, cfg: JointStep2Config, w2v_encoder: BaseFairseqModel): + super().__init__() + self.cfg = cfg + self.w2v_encoder = w2v_encoder + + def upgrade_state_dict_named(self, state_dict, name): + super().upgrade_state_dict_named(state_dict, name) + return state_dict + + @classmethod + def build_model(cls, cfg: JointStep2Config, task: FairseqTask): + """Build a new model instance.""" + w2v_encoder = JointED(cfg, task.target_dictionary) + return cls(cfg, w2v_encoder) + + def get_normalized_probs(self, net_output, log_probs, sample=None): + """Get normalized probabilities (or log probs) from a net's output.""" + if "encoder_out" not in net_output: + return self.w2v_encoder.get_normalized_probs_decoder(net_output, log_probs, sample) + + if "encoder_out_for_ctc" in net_output: + logits = net_output["encoder_out_for_ctc"] + else: + logits = net_output["encoder_out"] + + if isinstance(logits, list): + logits = logits[0] + + if log_probs: + return utils.log_softmax(logits.float(), dim=-1) + else: + return utils.softmax(logits.float(), dim=-1) + + def get_logits(self, net_output): + logits = net_output["encoder_out"] + padding = net_output["encoder_padding_mask"] + if padding is not None and padding.any(): + padding = padding.T + logits[padding][..., 0] = 0 + logits[padding][..., 1:] = float("-inf") + + return logits + + def forward(self, **kwargs): + x = self.w2v_encoder(**kwargs) + return x + + @property + def encoder(self): + return self.w2v_encoder + + def reorder_encoder_out(self, encoder_out, new_order): + return self.encoder.reorder_encoder_out(encoder_out, new_order) + + @property + def decoder(self): + return self.w2v_encoder.w2v_model.decoder + +class JointED(FairseqEncoder): + def __init__(self, cfg: JointStep2Config, tgt_dict=None): + self.apply_mask = cfg.apply_mask + logger.info(f"self.apply_mask: {self.apply_mask}") + + arg_overrides = { + "dropout": cfg.dropout, + "activation_dropout": cfg.activation_dropout, + "dropout_input": cfg.dropout_input, + "attention_dropout": cfg.attention_dropout, + "mask_length": cfg.mask_length, + "mask_prob": cfg.mask_prob, + "mask_selection": cfg.mask_selection, + "mask_other": cfg.mask_other, + "no_mask_overlap": cfg.no_mask_overlap, + "mask_channel_length": 
cfg.mask_channel_length, + "mask_channel_prob": cfg.mask_channel_prob, + "mask_channel_selection": cfg.mask_channel_selection, + "mask_channel_other": cfg.mask_channel_other, + "no_mask_channel_overlap": cfg.no_mask_channel_overlap, + "encoder_layerdrop": cfg.layerdrop, + "decoder_layerdrop": cfg.decoder_layerdrop, + "feature_grad_mult": cfg.feature_grad_mult, + "decoder_dict_size": len(tgt_dict) if cfg.add_decoder else -1, + "share_decoder_input_output_embed": cfg.share_decoder_input_output_embed, + "share_s2t_t2t_embeddings": cfg.share_s2t_t2t_embeddings, + "load_pretrained_w2v_from": None, + "load_pretrained_mbart_from": None, + "adaptor_stride": cfg.adaptor_stride, + } + + if cfg.w2v_args is None: + state = checkpoint_utils.load_checkpoint_to_cpu(cfg.w2v_path, arg_overrides) + w2v_args = state.get("cfg", None) + if w2v_args is None: + w2v_args = convert_namespace_to_omegaconf(state["args"]) + cfg.w2v_args = w2v_args + else: + state = None + w2v_args = cfg.w2v_args + if isinstance(w2v_args, Namespace): + cfg.w2v_args = w2v_args = convert_namespace_to_omegaconf(w2v_args) + + if cfg.normalize != w2v_args.task.normalize: + logger.warn( + "Fine-tuning works best when data normalization is the same. " + "Please check that --normalize is set or unset for " + "both pre-training and here" + ) + + w2v_args.task.data = cfg.data + if hasattr(w2v_args.task, "text_cfg"): + w2v_args.task.text_cfg.data_config = None + w2v_args.task.add_decoder = cfg.add_decoder + task = tasks.setup_task(w2v_args.task) + if state is not None and "task_state" in state: + # This will load the stored "dictionaries" object + task.load_state_dict(state["task_state"]) + model = task.build_model(w2v_args.model) + + ### delete the embed_tokens and output_projection of decoder + if state is not None and not cfg.no_pretrained_weights: + if cfg.retain_dict_path is not None: + assert model.add_text_modality, "Mustc have text modality if retain dict path" + logger.info("Cut embedding to a smaller size according to ratin dict") + with open(cfg.retain_dict_path, "rb") as fp: + overlap_idxs = pickle.load(fp) + state['model']['decoder.output_projection.1.weight'] = state['model']['decoder.output_projection.1.weight'][overlap_idxs] + state["model"]["decoder.embed_tokens_list.1.weight"] = state["model"]["decoder.embed_tokens_list.1.weight"][overlap_idxs] + if cfg.reuse_text_emb: + assert model.add_text_modality, "Mustc have text modality if reuse text embed" + logger.info("Loading text-text pretrained token-embedding for speech-text finetuning...") + state["model"]["decoder.embed_tokens_list.0.weight"] = state["model"]["decoder.embed_tokens_list.1.weight"] + del state["model"]["decoder.embed_tokens_list.1.weight"] + state["model"]["decoder.output_projection.0.weight"] = state["model"]["decoder.output_projection.1.weight"] + del state["model"]["decoder.output_projection.1.weight"] + try: + model.load_state_dict(state["model"], strict=True) + except Exception as e: + logger.warn(e) + model.load_state_dict(state["model"], strict=False) + else: + for pname in list(state["model"].keys()): + if pname.startswith("decoder.embed_tokens") or pname.startswith("decoder.output_projection"): + del state["model"][pname] + # set strict=False because we omit some modules + model.load_state_dict(state["model"], strict=False) + + model.remove_pretraining_modules(step2=True) + + super().__init__(task.source_dictionary) + + d = w2v_args.model.encoder_embed_dim + + self.w2v_model = model + + self.final_dropout = nn.Dropout(cfg.final_dropout) + 
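+ # Before these update thresholds are reached, forward() runs the speech encoder without gradients and skips the decoder.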
self.freeze_finetune_updates = cfg.freeze_finetune_updates + self.freeze_decoder_updates = cfg.freeze_decoder_updates + self.num_updates = 0 + + if cfg.share_ctc_decoder_embed: + assert cfg.add_decoder and cfg.share_decoder_input_output_embed, "Must share decoder input and output embed before share ctc and decoder embed" + if isinstance(self.w2v_model.decoder, MultimodalTransformerDecoder): + self.proj = nn.Linear( + self.w2v_model.decoder.embed_tokens_list[0].weight.shape[1], + self.w2v_model.decoder.embed_tokens_list[0].weight.shape[0], + bias=False, + ) + self.proj.weight = self.w2v_model.decoder.embed_tokens_list[0].weight + else: + self.proj = nn.Linear( + self.w2v_model.decoder.embed_tokens.weight.shape[1], + self.w2v_model.decoder.embed_tokens.weight.shape[0], + bias=False, + ) + self.proj.weight = self.w2v_model.decoder.embed_tokens.weight + elif tgt_dict is not None: + self.proj = Linear(d, len(tgt_dict)) + elif getattr(cfg, "decoder_embed_dim", d) != d: + self.proj = Linear(d, cfg.decoder_embed_dim) + else: + self.proj = None + + def set_num_updates(self, num_updates): + """Set the number of parameters updates.""" + super().set_num_updates(num_updates) + self.num_updates = num_updates + + def forward(self, source=None, src_tokens=None, src_lengths=None, padding_mask=None, prev_output_tokens=None, tbc=True, **kwargs): + assert source is not None or src_tokens is not None + if source is not None: + ### 1. go speech cnn-encoder-decoder branch + ft = self.freeze_finetune_updates <= self.num_updates + w2v_args = { + "source": source, + "padding_mask": padding_mask, + "mask": self.apply_mask and self.training, + "prev_output_tokens": prev_output_tokens, + "ft": ft, + } + + if self.freeze_decoder_updates <= self.num_updates: + self.w2v_model.add_decoder = True + else: + self.w2v_model.add_decoder = False + + x, padding_mask, decoder_out = self.w2v_model.extract_features(**w2v_args) + + if tbc: + # B x T x C -> T x B x C + x = x.transpose(0, 1) + + x = self.final_dropout(x) + + if self.proj: + x = self.proj(x) + + return { + "encoder_out": x, # T x B x C + "encoder_padding_mask": padding_mask, # B x T + "padding_mask": padding_mask, + "decoder_out": decoder_out, + } + else: + ### 2. 
go text encoder-decoder branch + w2v_args = { + "src_tokens": src_tokens, + "src_lengths": src_lengths, + "prev_output_tokens": prev_output_tokens, + } + + return self.w2v_model(**w2v_args) + + def get_normalized_probs_decoder(self, net_output, log_probs, sample=None): + # net_output['encoder_out'] is a (B, T, D) tensor + return self.w2v_model.get_normalized_probs(net_output, log_probs, sample) + + def reorder_encoder_out(self, encoder_out, new_order): + if encoder_out["encoder_out"] is not None: + if isinstance(encoder_out["encoder_out"], list): + encoder_out["encoder_out"] = ( + [] if len(encoder_out["encoder_out"]) == 0 + else [x.index_select(1, new_order) for x in encoder_out["encoder_out"]] + ) + else: + encoder_out["encoder_out"] = encoder_out[ + "encoder_out" + ].index_select(1, new_order) + if encoder_out["encoder_padding_mask"] is not None: + if isinstance(encoder_out["encoder_padding_mask"], list): + encoder_out["encoder_padding_mask"] = ( + [] if len(encoder_out["encoder_padding_mask"]) == 0 + else [x.index_select(0, new_order) for x in encoder_out["encoder_padding_mask"]] + ) + else: + encoder_out["encoder_padding_mask"] = encoder_out[ + "encoder_padding_mask" + ].index_select(0, new_order) + if "decoder_out" in encoder_out and encoder_out["decoder_out"] is not None: + if isinstance(encoder_out["decoder_out"], list): + encoder_out["decoder_out"] = ( + [] if len(encoder_out["decoder_out"]) == 0 + else [x.index_select(0, new_order) for x in encoder_out["decoder_out"]] + ) + else: + encoder_out["decoder_out"] = encoder_out[ + "decoder_out" + ].index_select(0, new_order) + if "encoder_out_for_ctc" in encoder_out and encoder_out["encoder_out_for_ctc"] is not None: + if isinstance(encoder_out["encoder_out_for_ctc"], list): + encoder_out["encoder_out_for_ctc"] = ( + [] if len(encoder_out["encoder_out_for_ctc"]) == 0 + else [x.index_select(1, new_order) for x in encoder_out["encoder_out_for_ctc"]] + ) + else: + encoder_out["encoder_out_for_ctc"] = encoder_out[ + "encoder_out_for_ctc" + ].index_select(1, new_order) + + return encoder_out + + def forward_torchscript(self, net_input): + """A TorchScript-compatible version of forward. + + Encoders which use additional arguments may want to override + this method for TorchScript compatibility. 
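+ Here the CTC projection (self.proj) is additionally applied and returned under "encoder_out_for_ctc".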
+ """ + encoder_out = self.w2v_model.forward_torchscript(net_input) + + assert self.proj is not None + encoder_out['encoder_out_for_ctc'] = [self.proj(encoder_out['encoder_out'][0])] + + return encoder_out + + def max_positions(self): + """Maximum input length supported by the encoder.""" + return None + + def upgrade_state_dict_named(self, state_dict, name): + return state_dict + + +def Embedding(num_embeddings, embedding_dim, padding_idx): + m = nn.Embedding(num_embeddings, embedding_dim, padding_idx=padding_idx) + nn.init.normal_(m.weight, mean=0, std=embedding_dim ** -0.5) + nn.init.constant_(m.weight[padding_idx], 0) + return m + + +def Linear(in_features, out_features, bias=True): + m = nn.Linear(in_features, out_features, bias) + nn.init.xavier_uniform_(m.weight) + if bias: + nn.init.constant_(m.bias, 0.0) + return m diff --git a/YiTrans/yitrans_iwslt22/modules/__init__.py b/YiTrans/yitrans_iwslt22/modules/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..6c611e8c6000172b43b1cdb213cd42b68cb3a685 --- /dev/null +++ b/YiTrans/yitrans_iwslt22/modules/__init__.py @@ -0,0 +1,23 @@ +# -------------------------------------------------------- +# The YiTrans End-to-End Speech Translation System for IWSLT 2022 Offline Shared Task (https://arxiv.org/abs/2206.05777) +# Github source: https://github.com/microsoft/SpeechT5/tree/main/YiTrans +# Copyright (c) 2022 Microsoft +# Licensed under The MIT License [see LICENSE for details] +# Based on fairseq code bases +# https://github.com/facebookresearch/fairseq +# -------------------------------------------------------- + +from .multihead_attention import MultiheadAttention +from .relative_pos_enc import RelativePositionalEncoding +from .transformer_decoder_layer import TransformerDecoderLayerBase +from .w2v_encoder import TransformerEncoder, TransformerSentenceEncoderLayer +from .multimodal_transformer_decoder import MultimodalTransformerDecoder + +__all__ = [ + "MultiheadAttention", + "RelativePositionalEncoding", + "TransformerDecoderLayerBase", + "TransformerEncoder", + "TransformerSentenceEncoderLayer", + "MultimodalTransformerDecoder", +] diff --git a/YiTrans/yitrans_iwslt22/modules/multihead_attention.py b/YiTrans/yitrans_iwslt22/modules/multihead_attention.py new file mode 100644 index 0000000000000000000000000000000000000000..7b1c1445037ada5aef5b8cf9fd3b63b05d95aca1 --- /dev/null +++ b/YiTrans/yitrans_iwslt22/modules/multihead_attention.py @@ -0,0 +1,341 @@ +# -------------------------------------------------------- +# Pre-Training Transformer Decoder for End-to-End ASR Model with Unpaired Speech Data (https://arxiv.org/abs/2203.17113) +# Github source: https://github.com/microsoft/SpeechT5/tree/main/Speech2C +# Copyright (c) 2022 Microsoft +# Licensed under The MIT License [see LICENSE for details] +# Based on fairseq code bases +# https://github.com/pytorch/fairseq +# -------------------------------------------------------- + +from typing import Dict, Optional, Tuple + +import torch +import torch.nn.functional as F +from fairseq import utils +from torch import Tensor + +from fairseq.modules import MultiheadAttention as FairseqMultiheadAttention + + +class MultiheadAttention(FairseqMultiheadAttention): + """Multi-headed attention. + + See "Attention Is All You Need" for more details. 
+ """ + + def __init__( + self, + embed_dim, + num_heads, + kdim=None, + vdim=None, + dropout=0.0, + bias=True, + add_bias_kv=False, + add_zero_attn=False, + self_attention=False, + encoder_decoder_attention=False, + q_noise=0.0, + qn_block_size=8, + ): + super().__init__( + embed_dim, + num_heads, + kdim, + vdim, + dropout, + bias, + add_bias_kv, + add_zero_attn, + self_attention, + encoder_decoder_attention, + q_noise, + qn_block_size, + ) + + def forward( + self, + query, + key: Optional[Tensor], + value: Optional[Tensor], + key_padding_mask: Optional[Tensor] = None, + incremental_state: Optional[Dict[str, Dict[str, Optional[Tensor]]]] = None, + need_weights: bool = True, + static_kv: bool = False, + attn_mask: Optional[Tensor] = None, + before_softmax: bool = False, + need_head_weights: bool = False, + position_bias: Optional[Tensor] = None, + ) -> Tuple[Tensor, Optional[Tensor]]: + """Input shape: Time x Batch x Channel + + Args: + key_padding_mask (ByteTensor, optional): mask to exclude + keys that are pads, of shape `(batch, src_len)`, where + padding elements are indicated by 1s. + need_weights (bool, optional): return the attention weights, + averaged over heads (default: False). + attn_mask (ByteTensor, optional): typically used to + implement causal attention, where the mask prevents the + attention from looking forward in time (default: None). + before_softmax (bool, optional): return the raw attention + weights and values before the attention softmax. + need_head_weights (bool, optional): return the attention + weights for each head. Implies *need_weights*. Default: + return the average attention weights over all heads. + """ + if need_head_weights: + need_weights = True + + is_tpu = query.device.type == "xla" + + tgt_len, bsz, embed_dim = query.size() + src_len = tgt_len + assert embed_dim == self.embed_dim, f"query dim {embed_dim} != {self.embed_dim}" + assert list(query.size()) == [tgt_len, bsz, embed_dim] + if key is not None: + src_len, key_bsz, _ = key.size() + if not torch.jit.is_scripting(): + assert key_bsz == bsz + assert value is not None + assert src_len, bsz == value.shape[:2] + + if ( + not self.onnx_trace + and not is_tpu # don't use PyTorch version on TPUs + and incremental_state is None + and not static_kv + # A workaround for quantization to work. Otherwise JIT compilation + # treats bias in linear module as method. 
+ and not torch.jit.is_scripting() + and position_bias is None + ): + assert key is not None and value is not None + return F.multi_head_attention_forward( + query, + key, + value, + self.embed_dim, + self.num_heads, + torch.empty([0]), + torch.cat((self.q_proj.bias, self.k_proj.bias, self.v_proj.bias)), + self.bias_k, + self.bias_v, + self.add_zero_attn, + self.dropout_module.p, + self.out_proj.weight, + self.out_proj.bias, + self.training or self.dropout_module.apply_during_inference, + key_padding_mask, + need_weights, + attn_mask, + use_separate_proj_weight=True, + q_proj_weight=self.q_proj.weight, + k_proj_weight=self.k_proj.weight, + v_proj_weight=self.v_proj.weight, + ) + + if incremental_state is not None: + saved_state = self._get_input_buffer(incremental_state) + if saved_state is not None and "prev_key" in saved_state: + # previous time steps are cached - no need to recompute + # key and value if they are static + if static_kv: + assert self.encoder_decoder_attention and not self.self_attention + key = value = None + else: + saved_state = None + + if self.self_attention: + q = self.q_proj(query) + k = self.k_proj(query) + v = self.v_proj(query) + elif self.encoder_decoder_attention: + # encoder-decoder attention + q = self.q_proj(query) + if key is None: + assert value is None + k = v = None + else: + k = self.k_proj(key) + v = self.v_proj(key) + + else: + assert key is not None and value is not None + q = self.q_proj(query) + k = self.k_proj(key) + v = self.v_proj(value) + q *= self.scaling + + if self.bias_k is not None: + assert self.bias_v is not None + k = torch.cat([k, self.bias_k.repeat(1, bsz, 1)]) + v = torch.cat([v, self.bias_v.repeat(1, bsz, 1)]) + if attn_mask is not None: + attn_mask = torch.cat( + [attn_mask, attn_mask.new_zeros(attn_mask.size(0), 1)], dim=1 + ) + if key_padding_mask is not None: + key_padding_mask = torch.cat( + [ + key_padding_mask, + key_padding_mask.new_zeros(key_padding_mask.size(0), 1), + ], + dim=1, + ) + + q = ( + q.contiguous() + .view(tgt_len, bsz * self.num_heads, self.head_dim) + .transpose(0, 1) + ) + if k is not None: + k = ( + k.contiguous() + .view(-1, bsz * self.num_heads, self.head_dim) + .transpose(0, 1) + ) + if v is not None: + v = ( + v.contiguous() + .view(-1, bsz * self.num_heads, self.head_dim) + .transpose(0, 1) + ) + + if saved_state is not None: + # saved states are stored with shape (bsz, num_heads, seq_len, head_dim) + if "prev_key" in saved_state: + _prev_key = saved_state["prev_key"] + assert _prev_key is not None + prev_key = _prev_key.view(bsz * self.num_heads, -1, self.head_dim) + if static_kv: + k = prev_key + else: + assert k is not None + k = torch.cat([prev_key, k], dim=1) + src_len = k.size(1) + if "prev_value" in saved_state: + _prev_value = saved_state["prev_value"] + assert _prev_value is not None + prev_value = _prev_value.view(bsz * self.num_heads, -1, self.head_dim) + if static_kv: + v = prev_value + else: + assert v is not None + v = torch.cat([prev_value, v], dim=1) + prev_key_padding_mask: Optional[Tensor] = None + if "prev_key_padding_mask" in saved_state: + prev_key_padding_mask = saved_state["prev_key_padding_mask"] + assert k is not None and v is not None + key_padding_mask = MultiheadAttention._append_prev_key_padding_mask( + key_padding_mask=key_padding_mask, + prev_key_padding_mask=prev_key_padding_mask, + batch_size=bsz, + src_len=k.size(1), + static_kv=static_kv, + ) + + saved_state["prev_key"] = k.view(bsz, self.num_heads, -1, self.head_dim) + saved_state["prev_value"] = v.view(bsz, 
self.num_heads, -1, self.head_dim) + saved_state["prev_key_padding_mask"] = key_padding_mask + # In this branch incremental_state is never None + assert incremental_state is not None + incremental_state = self._set_input_buffer(incremental_state, saved_state) + assert k is not None + assert k.size(1) == src_len + + # This is part of a workaround to get around fork/join parallelism + # not supporting Optional types. + if key_padding_mask is not None and key_padding_mask.dim() == 0: + key_padding_mask = None + + if key_padding_mask is not None: + assert key_padding_mask.size(0) == bsz + assert key_padding_mask.size(1) == src_len + + if self.add_zero_attn: + assert v is not None + src_len += 1 + k = torch.cat([k, k.new_zeros((k.size(0), 1) + k.size()[2:])], dim=1) + v = torch.cat([v, v.new_zeros((v.size(0), 1) + v.size()[2:])], dim=1) + if attn_mask is not None: + attn_mask = torch.cat( + [attn_mask, attn_mask.new_zeros(attn_mask.size(0), 1)], dim=1 + ) + if key_padding_mask is not None: + key_padding_mask = torch.cat( + [ + key_padding_mask, + torch.zeros(key_padding_mask.size(0), 1).type_as( + key_padding_mask + ), + ], + dim=1, + ) + + attn_weights = torch.bmm(q, k.transpose(1, 2)) + attn_weights = self.apply_sparse_mask(attn_weights, tgt_len, src_len, bsz) + + if position_bias is not None: ## first order + ## position_bias: [241, 241, 64] + #print ("attn_weights: ", attn_weights.size()) # [492, 241, 241] + reshape_q = q.contiguous().view(bsz * self.num_heads, -1, self.head_dim).transpose(0,1) #[241, 492, 64] + #print ("reshape_q: ", reshape_q.size()) + B = torch.matmul(reshape_q, position_bias.transpose(-2, -1)) + #print ("B: ", B.size()) ## [241, 492, 241] + #B = B.transpose(0, 1).view(bsz, self.num_heads, position_bias.size(0), position_bias.size(1)) + B = B.transpose(0, 1).view(bsz*self.num_heads, position_bias.size(0), position_bias.size(1)) + #print ("B 2: ", B.size()) + attn_weights += B + + assert list(attn_weights.size()) == [bsz * self.num_heads, tgt_len, src_len] + + if attn_mask is not None: + attn_mask = attn_mask.unsqueeze(0) + if self.onnx_trace: + attn_mask = attn_mask.repeat(attn_weights.size(0), 1, 1) + attn_weights += attn_mask + + if key_padding_mask is not None: + # don't attend to padding symbols + attn_weights = attn_weights.view(bsz, self.num_heads, tgt_len, src_len) + if not is_tpu: + attn_weights = attn_weights.masked_fill( + key_padding_mask.unsqueeze(1).unsqueeze(2).to(torch.bool), + float("-inf"), + ) + else: + attn_weights = attn_weights.transpose(0, 2) + attn_weights = attn_weights.masked_fill(key_padding_mask, float("-inf")) + attn_weights = attn_weights.transpose(0, 2) + attn_weights = attn_weights.view(bsz * self.num_heads, tgt_len, src_len) + + if before_softmax: + return attn_weights, v + + attn_weights_float = utils.softmax( + attn_weights, dim=-1, onnx_trace=self.onnx_trace + ) + attn_weights = attn_weights_float.type_as(attn_weights) + attn_probs = self.dropout_module(attn_weights) + + assert v is not None + attn = torch.bmm(attn_probs, v) + assert list(attn.size()) == [bsz * self.num_heads, tgt_len, self.head_dim] + if self.onnx_trace and attn.size(1) == 1: + # when ONNX tracing a single decoder step (sequence length == 1) + # the transpose is a no-op copy before view, thus unnecessary + attn = attn.contiguous().view(tgt_len, bsz, embed_dim) + else: + attn = attn.transpose(0, 1).contiguous().view(tgt_len, bsz, embed_dim) + attn = self.out_proj(attn) + attn_weights: Optional[Tensor] = None + if need_weights: + attn_weights = attn_weights_float.view( + 
bsz, self.num_heads, tgt_len, src_len + ).transpose(1, 0) + if not need_head_weights: + # average attention weights over heads + attn_weights = attn_weights.mean(dim=0) + + return attn, attn_weights diff --git a/YiTrans/yitrans_iwslt22/modules/multimodal_transformer_decoder.py b/YiTrans/yitrans_iwslt22/modules/multimodal_transformer_decoder.py new file mode 100644 index 0000000000000000000000000000000000000000..4d0b5cdd60217a0b27ecb2f60b8bc988e9f4eb65 --- /dev/null +++ b/YiTrans/yitrans_iwslt22/modules/multimodal_transformer_decoder.py @@ -0,0 +1,525 @@ +# -------------------------------------------------------- +# The YiTrans End-to-End Speech Translation System for IWSLT 2022 Offline Shared Task (https://arxiv.org/abs/2206.05777) +# Github source: https://github.com/microsoft/SpeechT5/tree/main/YiTrans +# Copyright (c) 2022 Microsoft +# Licensed under The MIT License [see LICENSE for details] +# Based on fairseq code bases +# https://github.com/facebookresearch/fairseq +# -------------------------------------------------------- +""" + Modified from https://github.com/facebookresearch/fairseq/blob/main/fairseq/models/transformer/transformer_decoder.py +""" + +import math +from typing import Any, Dict, List, Optional + +import torch +import torch.nn as nn +from fairseq import utils +from fairseq.distributed import fsdp_wrap +from fairseq.models import FairseqIncrementalDecoder +from fairseq.models.transformer import TransformerConfig +from fairseq.models.transformer.transformer_decoder import module_name_fordropout, Linear +from fairseq.modules import ( + AdaptiveSoftmax, + BaseLayer, + FairseqDropout, + LayerDropModuleList, + LayerNorm, + PositionalEmbedding, + SinusoidalPositionalEmbedding, +) +from fairseq.modules.checkpoint_activations import checkpoint_wrapper +from fairseq.modules.quant_noise import quant_noise as apply_quant_noise_ +from torch import Tensor + +import yitrans_iwslt22.modules.transformer_decoder_layer as transformer_layer +from yitrans_iwslt22.modules.relative_pos_enc import RelativePositionalEncoding + +class MultimodalTransformerDecoderBase(FairseqIncrementalDecoder): + """ + Transformer decoder consisting of *cfg.decoder.layers* layers. Each layer + is a :class:`TransformerDecoderLayer`. + + Args: + args (argparse.Namespace): parsed command-line arguments + dictionaries (~fairseq.data.Dictionary): a list of decoding dictionaries + embed_tokens_list (torch.nn.Embedding): a list of output embedding + no_encoder_attn (bool, optional): whether to attend to encoder outputs + (default: False). 
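The `position_bias` branch a few lines up adds a first-order relative-position term to the attention logits: the split-head queries are multiplied with the `(tgt_len, src_len, head_dim)` position embeddings and the result is folded back to `(bsz * heads, tgt_len, src_len)` before being summed with the `q·kᵀ` scores. A standalone shape walk-through with made-up sizes:

```python
# Shape walk-through (made-up sizes) of the first-order relative-position
# term added to the attention logits in the position_bias branch above.
import torch

bsz, heads, head_dim = 2, 4, 64
tgt_len = src_len = 10

q = torch.randn(bsz * heads, tgt_len, head_dim)   # queries after head split
k = torch.randn(bsz * heads, src_len, head_dim)
pos = torch.randn(tgt_len, src_len, head_dim)     # relative position embeddings

attn_weights = torch.bmm(q, k.transpose(1, 2))    # (bsz*heads, tgt, src)

reshape_q = q.transpose(0, 1)                     # (tgt, bsz*heads, head_dim)
rel_bias = torch.matmul(reshape_q, pos.transpose(-2, -1))  # (tgt, bsz*heads, src)
rel_bias = rel_bias.transpose(0, 1)               # (bsz*heads, tgt, src)

attn_weights = attn_weights + rel_bias
print(attn_weights.shape)                         # torch.Size([8, 10, 10])
```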
+ """ + + def __init__( + self, + cfg, + dictionaries, + embed_tokens_list, + no_encoder_attn=False, + output_projection=None, + use_rel_pos_enc=False, + ): + assert all([embed_tokens.padding_idx == embed_tokens_list[0].padding_idx for embed_tokens in embed_tokens_list]) + assert all([embed_tokens.embedding_dim == embed_tokens_list[0].embedding_dim for embed_tokens in embed_tokens_list]) + self.cfg = cfg + super().__init__(dictionaries) + self.register_buffer("version", torch.Tensor([3])) + self._future_mask = torch.empty(0) + + self.dropout_module = FairseqDropout( + cfg.dropout, module_name=module_name_fordropout(self.__class__.__name__) + ) + self.decoder_layerdrop = cfg.decoder.layerdrop + self.share_input_output_embed = cfg.share_decoder_input_output_embed + + input_embed_dim = embed_tokens_list[0].embedding_dim + embed_dim = cfg.decoder.embed_dim + self.embed_dim = embed_dim + self.output_embed_dim = cfg.decoder.output_dim + + self.padding_idx = embed_tokens_list[0].padding_idx + self.max_target_positions = cfg.max_target_positions + + self.embed_tokens_list = nn.ModuleList(embed_tokens_list) + + self.embed_scale = 1.0 if cfg.no_scale_embedding else math.sqrt(embed_dim) + + if not cfg.adaptive_input and cfg.quant_noise.pq > 0: + self.quant_noise = apply_quant_noise_( + nn.Linear(embed_dim, embed_dim, bias=False), + cfg.quant_noise.pq, + cfg.quant_noise.pq_block_size, + ) + else: + self.quant_noise = None + + self.project_in_dim = ( + Linear(input_embed_dim, embed_dim, bias=False) + if embed_dim != input_embed_dim + else None + ) + self.embed_positions = ( + PositionalEmbedding( + self.max_target_positions, + embed_dim, + self.padding_idx, + learned=cfg.decoder.learned_pos, + ) + if not cfg.no_token_positional_embeddings + else None + ) + if cfg.layernorm_embedding: + self.layernorm_embedding = LayerNorm(embed_dim, export=cfg.export) + else: + self.layernorm_embedding = None + + self.cross_self_attention = cfg.cross_self_attention + + if self.decoder_layerdrop > 0.0: + self.layers = LayerDropModuleList(p=self.decoder_layerdrop) + else: + self.layers = nn.ModuleList([]) + self.use_rel_pos_enc = use_rel_pos_enc + self.layers.extend( + [ + self.build_decoder_layer(cfg, no_encoder_attn) + for _ in range(cfg.decoder.layers) + ] + ) + self.num_layers = len(self.layers) + + if cfg.decoder.normalize_before and not cfg.no_decoder_final_norm: + self.layer_norm = LayerNorm(embed_dim, export=cfg.export) + else: + self.layer_norm = None + + self.project_out_dim = ( + Linear(embed_dim, self.output_embed_dim, bias=False) + if embed_dim != self.output_embed_dim and not cfg.tie_adaptive_weights + else None + ) + + self.adaptive_softmax = None + self.output_projection = output_projection + if self.output_projection is None: + self.build_output_projection(cfg, dictionaries, embed_tokens_list) + if self.use_rel_pos_enc: + self.pos_emb = RelativePositionalEncoding(embed_dim // cfg.decoder.attention_heads, 24) + + def build_output_projection(self, cfg, dictionaries, embed_tokens_list): + if cfg.adaptive_softmax_cutoff is not None: + self.adaptive_softmax = nn.ModuleList([ + AdaptiveSoftmax( + len(dictionary), + self.output_embed_dim, + utils.eval_str_list(cfg.adaptive_softmax_cutoff, type=int), + dropout=cfg.adaptive_softmax_dropout, + adaptive_inputs=embed_tokens if cfg.tie_adaptive_weights else None, + factor=cfg.adaptive_softmax_factor, + tie_proj=cfg.tie_adaptive_proj, + ) for (dictionary, embed_tokens) in zip(dictionaries, embed_tokens_list) + ]) + elif self.share_input_output_embed: + 
self.output_projection = nn.ModuleList([ + nn.Linear( + self.embed_tokens_list[i].weight.shape[1], + self.embed_tokens_list[i].weight.shape[0], + bias=False, + ) for i in range(len(self.embed_tokens_list)) + ]) + for i in range(len(self.embed_tokens_list)): + self.output_projection[i].weight = self.embed_tokens_list[i].weight + else: + self.output_projection = nn.ModuleList([ + nn.Linear( + self.output_embed_dim, len(dictionary), bias=False + ) for dictionary in dictionaries + ]) + for i in range(len(self.embed_tokens_list)): + nn.init.normal_( + self.output_projection[i].weight, mean=0, std=self.output_embed_dim ** -0.5 + ) + num_base_layers = cfg.base_layers + for i in range(num_base_layers): + self.layers.insert( + ((i + 1) * cfg.decoder.layers) // (num_base_layers + 1), + BaseLayer(cfg), + ) + + def build_decoder_layer(self, cfg, no_encoder_attn=False): + layer = transformer_layer.TransformerDecoderLayerBase(cfg, no_encoder_attn, has_relative_attention_bias=self.use_rel_pos_enc) + checkpoint = cfg.checkpoint_activations + if checkpoint: + offload_to_cpu = cfg.offload_activations + layer = checkpoint_wrapper(layer, offload_to_cpu=offload_to_cpu) + # if we are checkpointing, enforce that FSDP always wraps the + # checkpointed layer, regardless of layer size + min_params_to_wrap = cfg.min_params_to_wrap if not checkpoint else 0 + layer = fsdp_wrap(layer, min_num_params=min_params_to_wrap) + return layer + + def forward( + self, + prev_output_tokens, + encoder_out: Optional[Dict[str, List[Tensor]]] = None, + incremental_state: Optional[Dict[str, Dict[str, Optional[Tensor]]]] = None, + features_only: bool = False, + full_context_alignment: bool = False, + alignment_layer: Optional[int] = None, + alignment_heads: Optional[int] = None, + src_lengths: Optional[Any] = None, + return_all_hiddens: bool = False, + modal_idx=0, + ): + """ + Args: + prev_output_tokens (LongTensor): previous decoder outputs of shape + `(batch, tgt_len)`, for teacher forcing + encoder_out (optional): output from the encoder, used for + encoder-side attention, should be of size T x B x C + incremental_state (dict): dictionary used for storing state during + :ref:`Incremental decoding` + features_only (bool, optional): only return features without + applying output layer (default: False). + full_context_alignment (bool, optional): don't apply + auto-regressive mask to self-attention (default: False). 
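`build_output_projection` above creates one projection per decoding dictionary and, under `share_decoder_input_output_embed`, ties each projection to the matching embedding table; `modal_idx` later picks which pair `output_layer` uses. A minimal sketch of that per-modality tying, with hypothetical vocabulary sizes:

```python
# Minimal sketch (hypothetical sizes) of per-modality tied output projections,
# selected by an index as in output_layer(features, modal_idx).
import torch
import torch.nn as nn

dim = 512
vocab_sizes = [1000, 32000]                      # e.g. speech units vs. text BPE

embed_tokens_list = nn.ModuleList(
    [nn.Embedding(v, dim, padding_idx=1) for v in vocab_sizes]
)
output_projection = nn.ModuleList(
    [nn.Linear(dim, v, bias=False) for v in vocab_sizes]
)
for proj, emb in zip(output_projection, embed_tokens_list):
    proj.weight = emb.weight                     # weight tying per modality

features = torch.randn(2, 7, dim)                # B x T x C decoder features
for modal_idx in range(len(vocab_sizes)):
    logits = output_projection[modal_idx](features)
    print(modal_idx, logits.shape)               # B x T x vocab_sizes[modal_idx]
```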
+ + Returns: + tuple: + - the decoder's output of shape `(batch, tgt_len, vocab)` + - a dictionary with any model-specific outputs + """ + + x, extra = self.extract_features( + prev_output_tokens, + encoder_out=encoder_out, + incremental_state=incremental_state, + full_context_alignment=full_context_alignment, + alignment_layer=alignment_layer, + alignment_heads=alignment_heads, + modal_idx=modal_idx, + ) + + if not features_only: + x = self.output_layer(x, modal_idx) + return x, extra + + def extract_features( + self, + prev_output_tokens, + encoder_out: Optional[Dict[str, List[Tensor]]], + incremental_state: Optional[Dict[str, Dict[str, Optional[Tensor]]]] = None, + full_context_alignment: bool = False, + alignment_layer: Optional[int] = None, + alignment_heads: Optional[int] = None, + modal_idx=0, + ): + return self.extract_features_scriptable( + prev_output_tokens, + encoder_out, + incremental_state, + full_context_alignment, + alignment_layer, + alignment_heads, + modal_idx=modal_idx, + ) + + """ + A scriptable subclass of this class has an extract_features method and calls + super().extract_features, but super() is not supported in torchscript. A copy of + this function is made to be used in the subclass instead. + """ + + def extract_features_scriptable( + self, + prev_output_tokens, + encoder_out: Optional[Dict[str, List[Tensor]]], + incremental_state: Optional[Dict[str, Dict[str, Optional[Tensor]]]] = None, + full_context_alignment: bool = False, + alignment_layer: Optional[int] = None, + alignment_heads: Optional[int] = None, + modal_idx=0, + ): + """ + Similar to *forward* but only return features. + + Includes several features from "Jointly Learning to Align and + Translate with Transformer Models" (Garg et al., EMNLP 2019). + + Args: + full_context_alignment (bool, optional): don't apply + auto-regressive mask to self-attention (default: False). + alignment_layer (int, optional): return mean alignment over + heads at this layer (default: last layer). + alignment_heads (int, optional): only average alignment over + this many heads (default: all heads). 
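In the feature extractor that follows, each decoder layer receives `buffered_future_mask(x)` whenever decoding is not incremental and `full_context_alignment` is off; the mask is an upper-triangular matrix of `-inf` that blocks attention to future positions. A standalone sketch of what that mask looks like:

```python
# Standalone sketch of the causal mask shape produced by buffered_future_mask:
# -inf strictly above the diagonal, zeros on and below it.
import torch

def future_mask(dim):
    return torch.triu(torch.full((dim, dim), float("-inf")), diagonal=1)

print(future_mask(4))
# tensor([[0., -inf, -inf, -inf],
#         [0.,   0., -inf, -inf],
#         [0.,   0.,   0., -inf],
#         [0.,   0.,   0.,   0.]])
```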
+ + Returns: + tuple: + - the decoder's features of shape `(batch, tgt_len, embed_dim)` + - a dictionary with any model-specific outputs + """ + bs, slen = prev_output_tokens.size() + if alignment_layer is None: + alignment_layer = self.num_layers - 1 + + enc: Optional[Tensor] = None + padding_mask: Optional[Tensor] = None + if encoder_out is not None and len(encoder_out["encoder_out"]) > 0: + enc = encoder_out["encoder_out"][0] + assert ( + enc.size()[1] == bs + ), f"Expected enc.shape == (t, {bs}, c) got {enc.shape}" + if encoder_out is not None and len(encoder_out["encoder_padding_mask"]) > 0: + padding_mask = encoder_out["encoder_padding_mask"][0] + + # embed positions + positions = None + if self.embed_positions is not None: + positions = self.embed_positions( + prev_output_tokens, incremental_state=incremental_state + ) + + if incremental_state is not None: + prev_output_tokens = prev_output_tokens[:, -1:] + if positions is not None: + positions = positions[:, -1:] + + # embed tokens and positions + x = self.embed_scale * self.embed_tokens_list[modal_idx](prev_output_tokens) + + if self.quant_noise is not None: + x = self.quant_noise(x) + + if self.project_in_dim is not None: + x = self.project_in_dim(x) + + if positions is not None: + x += positions + + if self.layernorm_embedding is not None: + x = self.layernorm_embedding(x) + + x = self.dropout_module(x) + + # B x T x C -> T x B x C + x = x.transpose(0, 1) + if self.use_rel_pos_enc: + pos_seq = torch.arange(0, slen).long().to(x.device) + pos_seq = pos_seq[:, None] - pos_seq[None, :] + pos_k, _ = self.pos_emb(pos_seq, incremental_state) + else: + pos_k = None + + self_attn_padding_mask: Optional[Tensor] = None + if self.cross_self_attention or prev_output_tokens.eq(self.padding_idx).any(): + self_attn_padding_mask = prev_output_tokens.eq(self.padding_idx) + + # decoder layers + attn: Optional[Tensor] = None + inner_states: List[Optional[Tensor]] = [x] + for idx, layer in enumerate(self.layers): + if incremental_state is None and not full_context_alignment: + self_attn_mask = self.buffered_future_mask(x) + else: + self_attn_mask = None + + x, layer_attn, _ = layer( + x, + enc, + padding_mask, + incremental_state, + self_attn_mask=self_attn_mask, + self_attn_padding_mask=self_attn_padding_mask, + need_attn=bool((idx == alignment_layer)), + need_head_weights=bool((idx == alignment_layer)), + pos_bias=pos_k, + ) + inner_states.append(x) + if layer_attn is not None and idx == alignment_layer: + attn = layer_attn.float().to(x) + + if attn is not None: + if alignment_heads is not None: + attn = attn[:alignment_heads] + + # average probabilities over heads + attn = attn.mean(dim=0) + + if self.layer_norm is not None: + x = self.layer_norm(x) + + # T x B x C -> B x T x C + x = x.transpose(0, 1) + + if self.project_out_dim is not None: + x = self.project_out_dim(x) + + return x, {"attn": [attn], "inner_states": inner_states} + + def output_layer(self, features, modal_idx): + """Project features to the vocabulary size.""" + if self.adaptive_softmax is None: + # project back to size of vocabulary + return self.output_projection[modal_idx](features) + else: + return features + + def max_positions(self): + """Maximum output length supported by the decoder.""" + if self.embed_positions is None: + return self.max_target_positions + return min(self.max_target_positions, self.embed_positions.max_positions) + + def buffered_future_mask(self, tensor): + dim = tensor.size(0) + # self._future_mask.device != tensor.device is not working in TorchScript. 
This is a workaround. + if ( + self._future_mask.size(0) == 0 + or (not self._future_mask.device == tensor.device) + or self._future_mask.size(0) < dim + ): + self._future_mask = torch.triu( + utils.fill_with_neg_inf(torch.zeros([dim, dim])), 1 + ) + self._future_mask = self._future_mask.to(tensor) + return self._future_mask[:dim, :dim] + + def upgrade_state_dict_named(self, state_dict, name): + """Upgrade a (possibly old) state dict for new versions of fairseq.""" + if isinstance(self.embed_positions, SinusoidalPositionalEmbedding): + weights_key = "{}.embed_positions.weights".format(name) + if weights_key in state_dict: + del state_dict[weights_key] + state_dict[ + "{}.embed_positions._float_tensor".format(name) + ] = torch.FloatTensor(1) + + if f"{name}.output_projection.weight" not in state_dict: + if self.share_input_output_embed: + embed_out_key = f"{name}.embed_tokens.weight" + else: + embed_out_key = f"{name}.embed_out" + if embed_out_key in state_dict: + state_dict[f"{name}.output_projection.weight"] = state_dict[ + embed_out_key + ] + if not self.share_input_output_embed: + del state_dict[embed_out_key] + + for i in range(self.num_layers): + # update layer norms + layer_norm_map = { + "0": "self_attn_layer_norm", + "1": "encoder_attn_layer_norm", + "2": "final_layer_norm", + } + for old, new in layer_norm_map.items(): + for m in ("weight", "bias"): + k = "{}.layers.{}.layer_norms.{}.{}".format(name, i, old, m) + if k in state_dict: + state_dict[ + "{}.layers.{}.{}.{}".format(name, i, new, m) + ] = state_dict[k] + del state_dict[k] + + version_key = "{}.version".format(name) + if utils.item(state_dict.get(version_key, torch.Tensor([1]))[0]) <= 2: + # earlier checkpoints did not normalize after the stack of layers + self.layer_norm = None + self.normalize = False + state_dict[version_key] = torch.Tensor([1]) + + return state_dict + + +class MultimodalTransformerDecoder(MultimodalTransformerDecoderBase): + def __init__( + self, + args, + dictionaries, + embed_tokens_list, + no_encoder_attn=False, + output_projection=None, + ): + + self.args = args + super().__init__( + TransformerConfig.from_namespace(args), + dictionaries, + embed_tokens_list, + no_encoder_attn=no_encoder_attn, + output_projection=output_projection, + use_rel_pos_enc=getattr(args, "use_rel_pos_enc", False), + ) + + def build_output_projection(self, args, dictionaries, embed_tokens_list): + super().build_output_projection( + TransformerConfig.from_namespace(args), dictionaries, embed_tokens_list + ) + + def build_decoder_layer(self, args, no_encoder_attn=False): + return super().build_decoder_layer( + TransformerConfig.from_namespace(args), no_encoder_attn=no_encoder_attn + ) + + def extract_features( + self, + prev_output_tokens, + encoder_out: Optional[Dict[str, List[Tensor]]] = None, + incremental_state: Optional[Dict[str, Dict[str, Optional[Tensor]]]] = None, + full_context_alignment: bool = False, + alignment_layer: Optional[int] = None, + alignment_heads: Optional[int] = None, + modal_idx=0, + ): + # call scriptable method from parent class + x, _ = self.extract_features_scriptable( + prev_output_tokens, + encoder_out, + incremental_state, + full_context_alignment, + alignment_layer, + alignment_heads, + modal_idx=modal_idx, + ) + return x, None diff --git a/YiTrans/yitrans_iwslt22/modules/relative_pos_enc.py b/YiTrans/yitrans_iwslt22/modules/relative_pos_enc.py new file mode 100644 index 0000000000000000000000000000000000000000..2a073ebf2893e9e9b092aa520bdaf927e9388c2b --- /dev/null +++ 
b/YiTrans/yitrans_iwslt22/modules/relative_pos_enc.py @@ -0,0 +1,35 @@ +# -------------------------------------------------------- +# Pre-Training Transformer Decoder for End-to-End ASR Model with Unpaired Speech Data (https://arxiv.org/abs/2203.17113) +# Github source: https://github.com/microsoft/SpeechT5/tree/main/Speech2C +# Copyright (c) 2022 Microsoft +# Licensed under The MIT License [see LICENSE for details] +# Based on fairseq code bases +# https://github.com/pytorch/fairseq +# -------------------------------------------------------- + +import torch + +class RelativePositionalEncoding(torch.nn.Module): + def __init__(self, d_model, maxlen=1000, embed_v=False): + super(RelativePositionalEncoding, self).__init__() + + self.d_model = d_model + self.maxlen = maxlen + self.pe_k = torch.nn.Embedding(2*maxlen, d_model) + if embed_v: + self.pe_v = torch.nn.Embedding(2*maxlen, d_model) + self.embed_v = embed_v + + + def forward(self, pos_seq, incremental_state=None): + pos_seq[pos_seq < -self.maxlen] = -self.maxlen + pos_seq[pos_seq >= self.maxlen] = self.maxlen - 1 + pos_seq = pos_seq + self.maxlen + + if incremental_state is not None: + pos_seq = pos_seq[-1:] + + if self.embed_v: + return self.pe_k(pos_seq), self.pe_v(pos_seq) + else: + return self.pe_k(pos_seq), None diff --git a/YiTrans/yitrans_iwslt22/modules/transformer_decoder.py b/YiTrans/yitrans_iwslt22/modules/transformer_decoder.py new file mode 100644 index 0000000000000000000000000000000000000000..29b9f30fc0f026259c8f0ea277e4d60f9d70568d --- /dev/null +++ b/YiTrans/yitrans_iwslt22/modules/transformer_decoder.py @@ -0,0 +1,523 @@ +# -------------------------------------------------------- +# The YiTrans End-to-End Speech Translation System for IWSLT 2022 Offline Shared Task (https://arxiv.org/abs/2206.05777) +# Github source: https://github.com/microsoft/SpeechT5/tree/main/YiTrans +# Copyright (c) 2022 Microsoft +# Licensed under The MIT License [see LICENSE for details] +# Based on fairseq code bases +# https://github.com/facebookresearch/fairseq +# -------------------------------------------------------- +""" + Modified from https://github.com/facebookresearch/fairseq/blob/main/fairseq/models/transformer/transformer_decoder.py +""" + +import math +from typing import Any, Dict, List, Optional + +import torch +import torch.nn as nn +from fairseq import utils +from fairseq.distributed import fsdp_wrap +from fairseq.models import FairseqIncrementalDecoder +from fairseq.models.transformer import TransformerConfig +from fairseq.modules import ( + AdaptiveSoftmax, + BaseLayer, + FairseqDropout, + LayerDropModuleList, + LayerNorm, + PositionalEmbedding, + SinusoidalPositionalEmbedding, +) +from fairseq.modules.checkpoint_activations import checkpoint_wrapper +from fairseq.modules.quant_noise import quant_noise as apply_quant_noise_ +from torch import Tensor + +import yitrans_iwslt22.modules.transformer_decoder_layer as transformer_layer +from yitrans_iwslt22.modules.relative_pos_enc import RelativePositionalEncoding + +# rewrite name for backward compatibility in `make_generation_fast_` +def module_name_fordropout(module_name: str) -> str: + if module_name == "TransformerDecoderBase": + return "TransformerDecoder" + else: + return module_name + + +class TransformerDecoderBase(FairseqIncrementalDecoder): + """ + Transformer decoder consisting of *cfg.decoder.layers* layers. Each layer + is a :class:`TransformerDecoderLayer`. 
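`RelativePositionalEncoding` above clips relative offsets `i - j` to `[-maxlen, maxlen - 1]`, shifts them to be non-negative, and looks them up in a `(2 * maxlen, head_dim)` embedding table. A standalone usage-style sketch that inlines the same lookup so it runs without fairseq:

```python
# Standalone sketch of the relative-position lookup defined above: offsets
# i - j are clipped, shifted by maxlen, then used to index an embedding table.
import torch
import torch.nn as nn

head_dim, maxlen, seq_len = 64, 24, 10
pe_k = nn.Embedding(2 * maxlen, head_dim)

pos = torch.arange(seq_len)
pos_seq = pos[:, None] - pos[None, :]             # (seq, seq) relative offsets
pos_seq = pos_seq.clamp(-maxlen, maxlen - 1) + maxlen

pos_k = pe_k(pos_seq)                             # (seq, seq, head_dim)
print(pos_k.shape)
```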
+ + Args: + args (argparse.Namespace): parsed command-line arguments + dictionary (~fairseq.data.Dictionary): decoding dictionary + embed_tokens (torch.nn.Embedding): output embedding + no_encoder_attn (bool, optional): whether to attend to encoder outputs + (default: False). + """ + + def __init__( + self, + cfg, + dictionary, + embed_tokens, + no_encoder_attn=False, + output_projection=None, + use_rel_pos_enc=False, + ): + self.cfg = cfg + super().__init__(dictionary) + self.register_buffer("version", torch.Tensor([3])) + self._future_mask = torch.empty(0) + + self.dropout_module = FairseqDropout( + cfg.dropout, module_name=module_name_fordropout(self.__class__.__name__) + ) + self.decoder_layerdrop = cfg.decoder.layerdrop + self.share_input_output_embed = cfg.share_decoder_input_output_embed + + input_embed_dim = embed_tokens.embedding_dim + embed_dim = cfg.decoder.embed_dim + self.embed_dim = embed_dim + self.output_embed_dim = cfg.decoder.output_dim + + self.padding_idx = embed_tokens.padding_idx + self.max_target_positions = cfg.max_target_positions + + self.embed_tokens = embed_tokens + + self.embed_scale = 1.0 if cfg.no_scale_embedding else math.sqrt(embed_dim) + + if not cfg.adaptive_input and cfg.quant_noise.pq > 0: + self.quant_noise = apply_quant_noise_( + nn.Linear(embed_dim, embed_dim, bias=False), + cfg.quant_noise.pq, + cfg.quant_noise.pq_block_size, + ) + else: + self.quant_noise = None + + self.project_in_dim = ( + Linear(input_embed_dim, embed_dim, bias=False) + if embed_dim != input_embed_dim + else None + ) + self.embed_positions = ( + PositionalEmbedding( + self.max_target_positions, + embed_dim, + self.padding_idx, + learned=cfg.decoder.learned_pos, + ) + if not cfg.no_token_positional_embeddings + else None + ) + if cfg.layernorm_embedding: + self.layernorm_embedding = LayerNorm(embed_dim, export=cfg.export) + else: + self.layernorm_embedding = None + + self.cross_self_attention = cfg.cross_self_attention + + if self.decoder_layerdrop > 0.0: + self.layers = LayerDropModuleList(p=self.decoder_layerdrop) + else: + self.layers = nn.ModuleList([]) + self.use_rel_pos_enc = use_rel_pos_enc + self.layers.extend( + [ + self.build_decoder_layer(cfg, no_encoder_attn) + for _ in range(cfg.decoder.layers) + ] + ) + self.num_layers = len(self.layers) + + if cfg.decoder.normalize_before and not cfg.no_decoder_final_norm: + self.layer_norm = LayerNorm(embed_dim, export=cfg.export) + else: + self.layer_norm = None + + self.project_out_dim = ( + Linear(embed_dim, self.output_embed_dim, bias=False) + if embed_dim != self.output_embed_dim and not cfg.tie_adaptive_weights + else None + ) + + self.adaptive_softmax = None + self.output_projection = output_projection + if self.output_projection is None: + self.build_output_projection(cfg, dictionary, embed_tokens) + if self.use_rel_pos_enc: + self.pos_emb = RelativePositionalEncoding(embed_dim // cfg.decoder.attention_heads, 24) + + def build_output_projection(self, cfg, dictionary, embed_tokens): + if cfg.adaptive_softmax_cutoff is not None: + self.adaptive_softmax = AdaptiveSoftmax( + len(dictionary), + self.output_embed_dim, + utils.eval_str_list(cfg.adaptive_softmax_cutoff, type=int), + dropout=cfg.adaptive_softmax_dropout, + adaptive_inputs=embed_tokens if cfg.tie_adaptive_weights else None, + factor=cfg.adaptive_softmax_factor, + tie_proj=cfg.tie_adaptive_proj, + ) + elif self.share_input_output_embed: + self.output_projection = nn.Linear( + self.embed_tokens.weight.shape[1], + self.embed_tokens.weight.shape[0], + bias=False, + ) + 
self.output_projection.weight = self.embed_tokens.weight + else: + self.output_projection = nn.Linear( + self.output_embed_dim, len(dictionary), bias=False + ) + nn.init.normal_( + self.output_projection.weight, mean=0, std=self.output_embed_dim ** -0.5 + ) + num_base_layers = cfg.base_layers + for i in range(num_base_layers): + self.layers.insert( + ((i + 1) * cfg.decoder.layers) // (num_base_layers + 1), + BaseLayer(cfg), + ) + + def build_decoder_layer(self, cfg, no_encoder_attn=False): + layer = transformer_layer.TransformerDecoderLayerBase(cfg, no_encoder_attn, has_relative_attention_bias=self.use_rel_pos_enc) + checkpoint = cfg.checkpoint_activations + if checkpoint: + offload_to_cpu = cfg.offload_activations + layer = checkpoint_wrapper(layer, offload_to_cpu=offload_to_cpu) + # if we are checkpointing, enforce that FSDP always wraps the + # checkpointed layer, regardless of layer size + min_params_to_wrap = cfg.min_params_to_wrap if not checkpoint else 0 + layer = fsdp_wrap(layer, min_num_params=min_params_to_wrap) + return layer + + def forward( + self, + prev_output_tokens, + encoder_out: Optional[Dict[str, List[Tensor]]] = None, + incremental_state: Optional[Dict[str, Dict[str, Optional[Tensor]]]] = None, + features_only: bool = False, + full_context_alignment: bool = False, + alignment_layer: Optional[int] = None, + alignment_heads: Optional[int] = None, + src_lengths: Optional[Any] = None, + return_all_hiddens: bool = False, + ): + """ + Args: + prev_output_tokens (LongTensor): previous decoder outputs of shape + `(batch, tgt_len)`, for teacher forcing + encoder_out (optional): output from the encoder, used for + encoder-side attention, should be of size T x B x C + incremental_state (dict): dictionary used for storing state during + :ref:`Incremental decoding` + features_only (bool, optional): only return features without + applying output layer (default: False). + full_context_alignment (bool, optional): don't apply + auto-regressive mask to self-attention (default: False). + + Returns: + tuple: + - the decoder's output of shape `(batch, tgt_len, vocab)` + - a dictionary with any model-specific outputs + """ + + x, extra = self.extract_features( + prev_output_tokens, + encoder_out=encoder_out, + incremental_state=incremental_state, + full_context_alignment=full_context_alignment, + alignment_layer=alignment_layer, + alignment_heads=alignment_heads, + ) + + if not features_only: + x = self.output_layer(x) + return x, extra + + def extract_features( + self, + prev_output_tokens, + encoder_out: Optional[Dict[str, List[Tensor]]], + incremental_state: Optional[Dict[str, Dict[str, Optional[Tensor]]]] = None, + full_context_alignment: bool = False, + alignment_layer: Optional[int] = None, + alignment_heads: Optional[int] = None, + ): + return self.extract_features_scriptable( + prev_output_tokens, + encoder_out, + incremental_state, + full_context_alignment, + alignment_layer, + alignment_heads, + ) + + """ + A scriptable subclass of this class has an extract_features method and calls + super().extract_features, but super() is not supported in torchscript. A copy of + this function is made to be used in the subclass instead. 
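When `cfg.base_layers > 0`, the loop above splices the extra `BaseLayer` blocks in at indices computed as `((i + 1) * num_layers) // (num_base_layers + 1)`, i.e. at evenly spaced split points of the decoder stack. A quick illustrative check of those indices:

```python
# Illustrative check of the BaseLayer insertion indices for a 12-layer
# decoder with 2 base layers (evenly spaced split points).
num_layers, num_base_layers = 12, 2
positions = [((i + 1) * num_layers) // (num_base_layers + 1)
             for i in range(num_base_layers)]
print(positions)   # [4, 8]
```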
+ """ + + def extract_features_scriptable( + self, + prev_output_tokens, + encoder_out: Optional[Dict[str, List[Tensor]]], + incremental_state: Optional[Dict[str, Dict[str, Optional[Tensor]]]] = None, + full_context_alignment: bool = False, + alignment_layer: Optional[int] = None, + alignment_heads: Optional[int] = None, + ): + """ + Similar to *forward* but only return features. + + Includes several features from "Jointly Learning to Align and + Translate with Transformer Models" (Garg et al., EMNLP 2019). + + Args: + full_context_alignment (bool, optional): don't apply + auto-regressive mask to self-attention (default: False). + alignment_layer (int, optional): return mean alignment over + heads at this layer (default: last layer). + alignment_heads (int, optional): only average alignment over + this many heads (default: all heads). + + Returns: + tuple: + - the decoder's features of shape `(batch, tgt_len, embed_dim)` + - a dictionary with any model-specific outputs + """ + bs, slen = prev_output_tokens.size() + if alignment_layer is None: + alignment_layer = self.num_layers - 1 + + enc: Optional[Tensor] = None + padding_mask: Optional[Tensor] = None + if encoder_out is not None and len(encoder_out["encoder_out"]) > 0: + enc = encoder_out["encoder_out"][0] + assert ( + enc.size()[1] == bs + ), f"Expected enc.shape == (t, {bs}, c) got {enc.shape}" + if encoder_out is not None and len(encoder_out["encoder_padding_mask"]) > 0: + padding_mask = encoder_out["encoder_padding_mask"][0] + + # embed positions + positions = None + if self.embed_positions is not None: + positions = self.embed_positions( + prev_output_tokens, incremental_state=incremental_state + ) + + if incremental_state is not None: + prev_output_tokens = prev_output_tokens[:, -1:] + if positions is not None: + positions = positions[:, -1:] + + # embed tokens and positions + x = self.embed_scale * self.embed_tokens(prev_output_tokens) + + if self.quant_noise is not None: + x = self.quant_noise(x) + + if self.project_in_dim is not None: + x = self.project_in_dim(x) + + if positions is not None: + x += positions + + if self.layernorm_embedding is not None: + x = self.layernorm_embedding(x) + + x = self.dropout_module(x) + + # B x T x C -> T x B x C + x = x.transpose(0, 1) + if self.use_rel_pos_enc: + pos_seq = torch.arange(0, slen).long().to(x.device) + pos_seq = pos_seq[:, None] - pos_seq[None, :] + pos_k, _ = self.pos_emb(pos_seq, incremental_state) + else: + pos_k = None + + self_attn_padding_mask: Optional[Tensor] = None + if self.cross_self_attention or prev_output_tokens.eq(self.padding_idx).any(): + self_attn_padding_mask = prev_output_tokens.eq(self.padding_idx) + + # decoder layers + attn: Optional[Tensor] = None + inner_states: List[Optional[Tensor]] = [x] + for idx, layer in enumerate(self.layers): + if incremental_state is None and not full_context_alignment: + self_attn_mask = self.buffered_future_mask(x) + else: + self_attn_mask = None + + x, layer_attn, _ = layer( + x, + enc, + padding_mask, + incremental_state, + self_attn_mask=self_attn_mask, + self_attn_padding_mask=self_attn_padding_mask, + need_attn=bool((idx == alignment_layer)), + need_head_weights=bool((idx == alignment_layer)), + pos_bias=pos_k, + ) + inner_states.append(x) + if layer_attn is not None and idx == alignment_layer: + attn = layer_attn.float().to(x) + + if attn is not None: + if alignment_heads is not None: + attn = attn[:alignment_heads] + + # average probabilities over heads + attn = attn.mean(dim=0) + + if self.layer_norm is not None: + x = 
self.layer_norm(x) + + # T x B x C -> B x T x C + x = x.transpose(0, 1) + + if self.project_out_dim is not None: + x = self.project_out_dim(x) + + return x, {"attn": [attn], "inner_states": inner_states} + + def output_layer(self, features): + """Project features to the vocabulary size.""" + if self.adaptive_softmax is None: + # project back to size of vocabulary + return self.output_projection(features) + else: + return features + + def max_positions(self): + """Maximum output length supported by the decoder.""" + if self.embed_positions is None: + return self.max_target_positions + return min(self.max_target_positions, self.embed_positions.max_positions) + + def buffered_future_mask(self, tensor): + dim = tensor.size(0) + # self._future_mask.device != tensor.device is not working in TorchScript. This is a workaround. + if ( + self._future_mask.size(0) == 0 + or (not self._future_mask.device == tensor.device) + or self._future_mask.size(0) < dim + ): + self._future_mask = torch.triu( + utils.fill_with_neg_inf(torch.zeros([dim, dim])), 1 + ) + self._future_mask = self._future_mask.to(tensor) + return self._future_mask[:dim, :dim] + + def upgrade_state_dict_named(self, state_dict, name): + """Upgrade a (possibly old) state dict for new versions of fairseq.""" + if isinstance(self.embed_positions, SinusoidalPositionalEmbedding): + weights_key = "{}.embed_positions.weights".format(name) + if weights_key in state_dict: + del state_dict[weights_key] + state_dict[ + "{}.embed_positions._float_tensor".format(name) + ] = torch.FloatTensor(1) + + if f"{name}.output_projection.weight" not in state_dict: + if self.share_input_output_embed: + embed_out_key = f"{name}.embed_tokens.weight" + else: + embed_out_key = f"{name}.embed_out" + if embed_out_key in state_dict: + state_dict[f"{name}.output_projection.weight"] = state_dict[ + embed_out_key + ] + if not self.share_input_output_embed: + del state_dict[embed_out_key] + + for i in range(self.num_layers): + # update layer norms + layer_norm_map = { + "0": "self_attn_layer_norm", + "1": "encoder_attn_layer_norm", + "2": "final_layer_norm", + } + for old, new in layer_norm_map.items(): + for m in ("weight", "bias"): + k = "{}.layers.{}.layer_norms.{}.{}".format(name, i, old, m) + if k in state_dict: + state_dict[ + "{}.layers.{}.{}.{}".format(name, i, new, m) + ] = state_dict[k] + del state_dict[k] + + version_key = "{}.version".format(name) + if utils.item(state_dict.get(version_key, torch.Tensor([1]))[0]) <= 2: + # earlier checkpoints did not normalize after the stack of layers + self.layer_norm = None + self.normalize = False + state_dict[version_key] = torch.Tensor([1]) + + return state_dict + + +def Linear(in_features, out_features, bias=True): + m = nn.Linear(in_features, out_features, bias) + nn.init.xavier_uniform_(m.weight) + if bias: + nn.init.constant_(m.bias, 0.0) + return m + + +class TransformerDecoder(TransformerDecoderBase): + def __init__( + self, + args, + dictionary, + embed_tokens, + no_encoder_attn=False, + output_projection=None, + ): + self.args = args + super().__init__( + TransformerConfig.from_namespace(args), + dictionary, + embed_tokens, + no_encoder_attn=no_encoder_attn, + output_projection=output_projection, + use_rel_pos_enc=getattr(args, "use_rel_pos_enc", False), + ) + + def build_output_projection(self, args, dictionary, embed_tokens): + super().build_output_projection( + TransformerConfig.from_namespace(args), dictionary, embed_tokens + ) + + def build_decoder_layer(self, args, no_encoder_attn=False): + return 
super().build_decoder_layer( + TransformerConfig.from_namespace(args), no_encoder_attn=no_encoder_attn + ) + +class TransformerDecoderScriptable(TransformerDecoder): + def extract_features( + self, + prev_output_tokens, + encoder_out: Optional[Dict[str, List[Tensor]]] = None, + incremental_state: Optional[Dict[str, Dict[str, Optional[Tensor]]]] = None, + full_context_alignment: bool = False, + alignment_layer: Optional[int] = None, + alignment_heads: Optional[int] = None, + ): + # call scriptable method from parent class + x, _ = self.extract_features_scriptable( + prev_output_tokens, + encoder_out, + incremental_state, + full_context_alignment, + alignment_layer, + alignment_heads, + ) + return x, None diff --git a/YiTrans/yitrans_iwslt22/modules/transformer_decoder_layer.py b/YiTrans/yitrans_iwslt22/modules/transformer_decoder_layer.py new file mode 100644 index 0000000000000000000000000000000000000000..d5397b850a1cbec7f4e092a813cdb79b9c909c9f --- /dev/null +++ b/YiTrans/yitrans_iwslt22/modules/transformer_decoder_layer.py @@ -0,0 +1,219 @@ +# -------------------------------------------------------- +# The YiTrans End-to-End Speech Translation System for IWSLT 2022 Offline Shared Task (https://arxiv.org/abs/2206.05777) +# Github source: https://github.com/microsoft/SpeechT5/tree/main/YiTrans +# Copyright (c) 2022 Microsoft +# Licensed under The MIT License [see LICENSE for details] +# Based on fairseq code bases +# https://github.com/facebookresearch/fairseq +# -------------------------------------------------------- +""" + Modified from https://github.com/facebookresearch/fairseq/blob/main/fairseq/modules/transformer_layer.py + https://github.com/microsoft/SpeechT5/blob/main/Speech2C/speech2c/models/modules/transformer_decoder_layer.py +""" + +from typing import Dict, List, Optional + +import torch +from torch import Tensor +from fairseq.modules.transformer_layer import TransformerDecoderLayerBase as FairseqTransformerDecoderLayerBase +from fairseq.modules import LayerNorm + +from yitrans_iwslt22.modules.multihead_attention import MultiheadAttention + + +class TransformerDecoderLayerBase(FairseqTransformerDecoderLayerBase): + """Decoder layer block. + + In the original paper each operation (multi-head attention, encoder + attention or FFN) is postprocessed with: `dropout -> add residual -> + layernorm`. In the tensor2tensor code they suggest that learning is more + robust when preprocessing each layer with layernorm and postprocessing with: + `dropout -> add residual`. We default to the approach in the paper, but the + tensor2tensor approach can be enabled by setting + *cfg.decoder.normalize_before* to ``True``. + + Args: + args (argparse.Namespace): parsed command-line arguments + no_encoder_attn (bool, optional): whether to attend to encoder outputs + (default: False). 
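The layer docstring above distinguishes the post-norm ordering of the original paper from the pre-norm ordering enabled by `cfg.decoder.normalize_before`. A minimal standalone sketch of the two residual orderings, using one generic sublayer in place of the attention/FFN blocks:

```python
# Minimal sketch of post-norm vs. pre-norm residual ordering for one sublayer
# (a generic Linear stands in for self-attention / FFN; not the repo's code).
import torch
import torch.nn as nn

dim = 16
norm = nn.LayerNorm(dim)
sublayer = nn.Linear(dim, dim)
dropout = nn.Dropout(0.1)
x = torch.randn(5, 2, dim)

# Post-norm (paper default): sublayer -> dropout -> add residual -> layernorm
post = norm(x + dropout(sublayer(x)))

# Pre-norm (normalize_before=True): layernorm -> sublayer -> dropout -> add residual
pre = x + dropout(sublayer(norm(x)))

print(post.shape, pre.shape)
```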
+ """ + + def __init__( + self, cfg, no_encoder_attn=False, add_bias_kv=False, add_zero_attn=False, has_relative_attention_bias=False + ): + super().__init__( + cfg, + no_encoder_attn, + add_bias_kv, + add_zero_attn, + ) + + if has_relative_attention_bias: + self.norm_k = LayerNorm(self.embed_dim // cfg.decoder.attention_heads) + + def build_self_attention( + self, embed_dim, cfg, add_bias_kv=False, add_zero_attn=False + ): + return MultiheadAttention( + embed_dim, + cfg.decoder.attention_heads, + dropout=cfg.attention_dropout, + add_bias_kv=add_bias_kv, + add_zero_attn=add_zero_attn, + self_attention=not cfg.cross_self_attention, + q_noise=self.quant_noise, + qn_block_size=self.quant_noise_block_size, + ) + + def forward( + self, + x, + encoder_out: Optional[torch.Tensor] = None, + encoder_padding_mask: Optional[torch.Tensor] = None, + incremental_state: Optional[Dict[str, Dict[str, Optional[Tensor]]]] = None, + prev_self_attn_state: Optional[List[torch.Tensor]] = None, + prev_attn_state: Optional[List[torch.Tensor]] = None, + self_attn_mask: Optional[torch.Tensor] = None, + self_attn_padding_mask: Optional[torch.Tensor] = None, + need_attn: bool = False, + need_head_weights: bool = False, + pos_bias=None, + ): + """ + Args: + x (Tensor): input to the layer of shape `(seq_len, batch, embed_dim)` + encoder_padding_mask (ByteTensor, optional): binary + ByteTensor of shape `(batch, src_len)` where padding + elements are indicated by ``1``. + need_attn (bool, optional): return attention weights + need_head_weights (bool, optional): return attention weights + for each head (default: return average over heads). + Returns: + encoded output of shape `(seq_len, batch, embed_dim)` + """ + if need_head_weights: + need_attn = True + + residual = x + if self.normalize_before: + x = self.self_attn_layer_norm(x) + if pos_bias is not None: + pos_bias = self.norm_k(pos_bias) + if prev_self_attn_state is not None: + prev_key, prev_value = prev_self_attn_state[:2] + saved_state: Dict[str, Optional[Tensor]] = { + "prev_key": prev_key, + "prev_value": prev_value, + } + if len(prev_self_attn_state) >= 3: + saved_state["prev_key_padding_mask"] = prev_self_attn_state[2] + assert incremental_state is not None + self.self_attn._set_input_buffer(incremental_state, saved_state) + _self_attn_input_buffer = self.self_attn._get_input_buffer(incremental_state) + if self.cross_self_attention and not ( + incremental_state is not None + and _self_attn_input_buffer is not None + and "prev_key" in _self_attn_input_buffer + ): + if self_attn_mask is not None: + assert encoder_out is not None + self_attn_mask = torch.cat( + (x.new_zeros(x.size(0), encoder_out.size(0)), self_attn_mask), dim=1 + ) + if self_attn_padding_mask is not None: + if encoder_padding_mask is None: + assert encoder_out is not None + encoder_padding_mask = self_attn_padding_mask.new_zeros( + encoder_out.size(1), encoder_out.size(0) + ) + self_attn_padding_mask = torch.cat( + (encoder_padding_mask, self_attn_padding_mask), dim=1 + ) + assert encoder_out is not None + y = torch.cat((encoder_out, x), dim=0) + else: + y = x + + x, attn = self.self_attn( + query=x, + key=y, + value=y, + key_padding_mask=self_attn_padding_mask, + incremental_state=incremental_state, + need_weights=False, + attn_mask=self_attn_mask, + position_bias=pos_bias, + ) + if self.c_attn is not None: + tgt_len, bsz = x.size(0), x.size(1) + x = x.view(tgt_len, bsz, self.nh, self.head_dim) + x = torch.einsum("tbhd,h->tbhd", x, self.c_attn) + x = x.reshape(tgt_len, bsz, self.embed_dim) + if 
self.attn_ln is not None: + x = self.attn_ln(x) + x = self.dropout_module(x) + x = self.residual_connection(x, residual) + if not self.normalize_before: + x = self.self_attn_layer_norm(x) + + if self.encoder_attn is not None and encoder_out is not None: + residual = x + if self.normalize_before: + x = self.encoder_attn_layer_norm(x) + if prev_attn_state is not None: + prev_key, prev_value = prev_attn_state[:2] + saved_state: Dict[str, Optional[Tensor]] = { + "prev_key": prev_key, + "prev_value": prev_value, + } + if len(prev_attn_state) >= 3: + saved_state["prev_key_padding_mask"] = prev_attn_state[2] + assert incremental_state is not None + self.encoder_attn._set_input_buffer(incremental_state, saved_state) + + x, attn = self.encoder_attn( + query=x, + key=encoder_out, + value=encoder_out, + key_padding_mask=encoder_padding_mask, + incremental_state=incremental_state, + static_kv=True, + need_weights=need_attn or (not self.training and self.need_attn), + need_head_weights=need_head_weights, + ) + x = self.dropout_module(x) + x = self.residual_connection(x, residual) + if not self.normalize_before: + x = self.encoder_attn_layer_norm(x) + + residual = x + if self.normalize_before: + x = self.final_layer_norm(x) + + x = self.activation_fn(self.fc1(x)) + x = self.activation_dropout_module(x) + if self.ffn_layernorm is not None: + x = self.ffn_layernorm(x) + x = self.fc2(x) + x = self.dropout_module(x) + if self.w_resid is not None: + residual = torch.mul(self.w_resid, residual) + x = self.residual_connection(x, residual) + if not self.normalize_before: + x = self.final_layer_norm(x) + if self.onnx_trace and incremental_state is not None: + saved_state = self.self_attn._get_input_buffer(incremental_state) + assert saved_state is not None + if self_attn_padding_mask is not None: + self_attn_state = [ + saved_state["prev_key"], + saved_state["prev_value"], + saved_state["prev_key_padding_mask"], + ] + else: + self_attn_state = [saved_state["prev_key"], saved_state["prev_value"]] + return x, attn, self_attn_state + return x, attn, None + + def make_generation_fast_(self, need_attn: bool = False, **kwargs): + self.need_attn = need_attn diff --git a/YiTrans/yitrans_iwslt22/modules/w2v_encoder.py b/YiTrans/yitrans_iwslt22/modules/w2v_encoder.py new file mode 100644 index 0000000000000000000000000000000000000000..7d59a7bfcb5a5b1d02b685c9cfb3c5c2f5cbfa80 --- /dev/null +++ b/YiTrans/yitrans_iwslt22/modules/w2v_encoder.py @@ -0,0 +1,283 @@ +# -------------------------------------------------------- +# The YiTrans End-to-End Speech Translation System for IWSLT 2022 Offline Shared Task (https://arxiv.org/abs/2206.05777) +# Github source: https://github.com/microsoft/SpeechT5/tree/main/YiTrans +# Copyright (c) 2022 Microsoft +# Licensed under The MIT License [see LICENSE for details] +# Based on fairseq code bases +# https://github.com/facebookresearch/fairseq +# -------------------------------------------------------- + +""" + wav2vec encoder adding relitive position bias, modified from + https://github.com/microsoft/SpeechT5/blob/main/Speech2C/speech2c/models/modules/transformer_encoder.py + https://github.com/facebookresearch/fairseq/blob/main/fairseq/models/wav2vec/wav2vec2.py +""" + +import math +import numpy as np +import torch +import torch.nn as nn +import torch.nn.functional as F +from fairseq import utils +from fairseq.dataclass import ChoiceEnum +from fairseq.modules import ( + LayerNorm, + SamePad, +) +from fairseq.modules.checkpoint_activations import checkpoint_wrapper +from 
fairseq.modules.transformer_sentence_encoder import init_bert_params +from fairseq.utils import index_put +from fairseq.distributed import fsdp_wrap +from fairseq.models.wav2vec.utils import pad_to_multiple + +## reload multi-head attition with rel-pos-bias +from fairseq.models.wav2vec.wav2vec2 import TransformerEncoder as W2vTransformerEncoder +from yitrans_iwslt22.modules.relative_pos_enc import RelativePositionalEncoding +from yitrans_iwslt22.modules.multihead_attention import MultiheadAttention + +EXTRACTOR_MODE_CHOICES = ChoiceEnum(["default", "layer_norm"]) +MASKING_DISTRIBUTION_CHOICES = ChoiceEnum(["static", "uniform", "normal", "poisson"]) + + +class TransformerEncoder(W2vTransformerEncoder): + def __init__(self, args): + super().__init__(args) + + self.dropout = args.dropout + self.embedding_dim = args.encoder_embed_dim + self.required_seq_len_multiple = args.required_seq_len_multiple + self.use_rel_pos_enc = getattr(args, "use_rel_pos_enc", False) + + self.pos_conv = nn.Conv1d( + self.embedding_dim, + self.embedding_dim, + kernel_size=args.conv_pos, + padding=args.conv_pos // 2, + groups=args.conv_pos_groups, + ) + dropout = 0 + std = math.sqrt((4 * (1.0 - dropout)) / (args.conv_pos * self.embedding_dim)) + nn.init.normal_(self.pos_conv.weight, mean=0, std=std) + nn.init.constant_(self.pos_conv.bias, 0) + + self.pos_conv = nn.utils.weight_norm(self.pos_conv, name="weight", dim=2) + self.pos_conv = nn.Sequential(self.pos_conv, SamePad(args.conv_pos), nn.GELU()) + + layers = [] + for _ in range(args.encoder_layers): + layer = TransformerSentenceEncoderLayer( + embedding_dim=self.embedding_dim, + ffn_embedding_dim=args.encoder_ffn_embed_dim, + num_attention_heads=args.encoder_attention_heads, + dropout=self.dropout, + attention_dropout=args.attention_dropout, + activation_dropout=args.activation_dropout, + activation_fn=args.activation_fn, + layer_norm_first=args.layer_norm_first, + has_relative_attention_bias=self.use_rel_pos_enc, + ) + if args.checkpoint_activations: + layer = fsdp_wrap(layer) + layer = checkpoint_wrapper(layer) + layers.append(layer) + self.layers = nn.ModuleList(layers) + + self.layer_norm_first = args.layer_norm_first + self.layer_norm = LayerNorm(self.embedding_dim) + self.layerdrop = args.encoder_layerdrop + if self.use_rel_pos_enc: + self.pos_emb = RelativePositionalEncoding(args.encoder_embed_dim // args.encoder_attention_heads, 160) + + + self.apply(init_bert_params) + + def forward(self, x, padding_mask=None, layer=None): + x, layer_results = self.extract_features(x, padding_mask, layer) + + if self.layer_norm_first and layer is None: + x = self.layer_norm(x) + + return x, layer_results + + def extract_features(self, x, padding_mask=None, tgt_layer=None): + + if padding_mask is not None: + x = index_put(x, padding_mask, 0) + + x_conv = self.pos_conv(x.transpose(1, 2)) + x_conv = x_conv.transpose(1, 2) + x = x + x_conv + + if not self.layer_norm_first: + x = self.layer_norm(x) + + # pad to the sequence length dimension + x, pad_length = pad_to_multiple( + x, self.required_seq_len_multiple, dim=-2, value=0 + ) + if pad_length > 0 and padding_mask is None: + padding_mask = x.new_zeros((x.size(0), x.size(1)), dtype=torch.bool) + padding_mask[:, -pad_length:] = True + else: + padding_mask, _ = pad_to_multiple( + padding_mask, self.required_seq_len_multiple, dim=-1, value=True + ) + x = F.dropout(x, p=self.dropout, training=self.training) + + # B x T x C -> T x B x C + x = x.transpose(0, 1) + + if self.use_rel_pos_enc: + x_len = x.shape[0] + pos_seq = 
torch.arange(0, x_len).long().to(x.device) + pos_seq = pos_seq[:, None] - pos_seq[None, :] + pos_k, pos_v = self.pos_emb(pos_seq) + else: + pos_k = None + + layer_results = [] + r = None + for i, layer in enumerate(self.layers): + dropout_probability = np.random.random() + if not self.training or (dropout_probability > self.layerdrop): + x, z = layer(x, self_attn_padding_mask=padding_mask, need_weights=False, pos_bias=pos_k) + if tgt_layer is not None: + # unpad if needed + if pad_length > 0: + layer_results.append( + ( + x[:-pad_length], + z[:, :-pad_length, :-pad_length] + if z is not None + else z, + ) + ) + else: + layer_results.append((x, z)) + if i == tgt_layer: + r = x + break + + if r is not None: + x = r + + # T x B x C -> B x T x C + x = x.transpose(0, 1) + # undo paddding + if pad_length > 0: + x = x[:, :-pad_length] + + return x, layer_results + + +class TransformerSentenceEncoderLayer(nn.Module): + """ + Implements a Transformer Encoder Layer used in BERT/XLM style pre-trained + models. + """ + + def __init__( + self, + embedding_dim: float = 768, + ffn_embedding_dim: float = 3072, + num_attention_heads: float = 8, + dropout: float = 0.1, + attention_dropout: float = 0.1, + activation_dropout: float = 0.1, + activation_fn: str = "relu", + layer_norm_first: bool = False, + has_relative_attention_bias: bool = False, + ) -> None: + + super().__init__() + # Initialize parameters + self.embedding_dim = embedding_dim + self.dropout = dropout + self.activation_dropout = activation_dropout + + # Initialize blocks + self.activation_fn = utils.get_activation_fn(activation_fn) + self.self_attn = MultiheadAttention( + self.embedding_dim, + num_attention_heads, + dropout=attention_dropout, + self_attention=True, + ) + + self.dropout1 = nn.Dropout(dropout) + self.dropout2 = nn.Dropout(self.activation_dropout) + self.dropout3 = nn.Dropout(dropout) + + self.layer_norm_first = layer_norm_first + + # layer norm associated with the self attention layer + self.self_attn_layer_norm = LayerNorm(self.embedding_dim) + self.fc1 = nn.Linear(self.embedding_dim, ffn_embedding_dim) + self.fc2 = nn.Linear(ffn_embedding_dim, self.embedding_dim) + + # layer norm associated with the position wise feed-forward NN + self.final_layer_norm = LayerNorm(self.embedding_dim) + + if has_relative_attention_bias: + self.norm_k = LayerNorm(self.embedding_dim//num_attention_heads) + + def forward( + self, + x: torch.Tensor, + self_attn_mask: torch.Tensor = None, + self_attn_padding_mask: torch.Tensor = None, + need_weights: bool = False, + att_args=None, + pos_bias=None, + ): + """ + LayerNorm is applied either before or after the self-attention/ffn + modules similar to the original Transformer imlementation. 
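+ When relative position bias is enabled, `pos_bias` carries the relative
+ position embeddings produced by `RelativePositionalEncoding`; they are
+ layer-normalised by `self.norm_k` (in the pre-LN branch) and forwarded
+ to the attention module as `position_bias`.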
+ """ + residual = x + + if self.layer_norm_first: + x = self.self_attn_layer_norm(x) + if pos_bias is not None: + pos_bias = self.norm_k(pos_bias) + x, attn = self.self_attn( + query=x, + key=x, + value=x, + key_padding_mask=self_attn_padding_mask, + attn_mask=self_attn_mask, + position_bias=pos_bias, + ) + x = self.dropout1(x) + x = residual + x + + residual = x + x = self.final_layer_norm(x) + x = self.activation_fn(self.fc1(x)) + x = self.dropout2(x) + x = self.fc2(x) + x = self.dropout3(x) + x = residual + x + else: + x, attn = self.self_attn( + query=x, + key=x, + value=x, + key_padding_mask=self_attn_padding_mask, + position_bias=pos_bias, + ) + + x = self.dropout1(x) + x = residual + x + + x = self.self_attn_layer_norm(x) + + residual = x + x = self.activation_fn(self.fc1(x)) + x = self.dropout2(x) + x = self.fc2(x) + x = self.dropout3(x) + x = residual + x + x = self.final_layer_norm(x) + + return x, attn diff --git a/YiTrans/yitrans_iwslt22/sequence_generator.py b/YiTrans/yitrans_iwslt22/sequence_generator.py new file mode 100644 index 0000000000000000000000000000000000000000..5f80669471b5837a14dd3451e546ee273a74ac5c --- /dev/null +++ b/YiTrans/yitrans_iwslt22/sequence_generator.py @@ -0,0 +1,999 @@ +""" + Modified from + https://github.com/facebookresearch/fairseq/blob/main/fairseq/sequence_generator.py + +""" + +import math +from typing import Dict, List, Optional +import sys + +import torch +import torch.nn as nn +from fairseq import search, utils +from fairseq.data import data_utils +from fairseq.models import FairseqIncrementalDecoder +from torch import Tensor +from fairseq.ngram_repeat_block import NGramRepeatBlock +import numpy + + +class SequenceGenerator(nn.Module): + def __init__( + self, + models, + tgt_dict, + beam_size=1, + max_len_a=0, + max_len_b=200, + max_len=0, + min_len=1, + normalize_scores=True, + len_penalty=1.0, + unk_penalty=0.0, + temperature=1.0, + match_source_len=False, + no_repeat_ngram_size=0, + search_strategy=None, + eos=None, + bos=None, + symbols_to_strip_from_output=None, + lm_model=None, + lm_weight=1.0, + ctc_weight=0.0, + ): + """Generates translations of a given source sentence. 
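+ Compared with the upstream fairseq generator, this variant also accepts a
+ decoder start symbol `bos` (e.g. a language-id token), never selects the
+ blank symbol "<s>", and supports shallow fusion with an external language
+ model via `lm_model` / `lm_weight`.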
+ + Args: + models (List[~fairseq.models.FairseqModel]): ensemble of models, + currently support fairseq.models.TransformerModel for scripting + beam_size (int, optional): beam width (default: 1) + max_len_a/b (int, optional): generate sequences of maximum length + ax + b, where x is the source length + max_len (int, optional): the maximum length of the generated output + (not including end-of-sentence) + min_len (int, optional): the minimum length of the generated output + (not including end-of-sentence) + normalize_scores (bool, optional): normalize scores by the length + of the output (default: True) + len_penalty (float, optional): length penalty, where <1.0 favors + shorter, >1.0 favors longer sentences (default: 1.0) + unk_penalty (float, optional): unknown word penalty, where <0 + produces more unks, >0 produces fewer (default: 0.0) + temperature (float, optional): temperature, where values + >1.0 produce more uniform samples and values <1.0 produce + sharper samples (default: 1.0) + match_source_len (bool, optional): outputs should match the source + length (default: False) + """ + super().__init__() + if isinstance(models, EnsembleModel): + self.model = models + else: + self.model = EnsembleModel(models) + self.tgt_dict = tgt_dict + self.pad = tgt_dict.pad() + self.unk = tgt_dict.unk() + self.eos = tgt_dict.eos() if eos is None else eos + self.bos = self.eos if bos is None else bos + self.blank = self.tgt_dict.index("<s>") + self.symbols_to_strip_from_output = ( + symbols_to_strip_from_output.union({self.eos}) + if symbols_to_strip_from_output is not None + else {self.eos} + ) + self.vocab_size = len(tgt_dict) + self.beam_size = beam_size + # the max beam size is the dictionary size - 1, since we never select pad + self.beam_size = min(beam_size, self.vocab_size - 1) + self.max_len_a = max_len_a + self.max_len_b = max_len_b + self.min_len = min_len + self.max_len = max_len or self.model.max_decoder_positions() + + self.normalize_scores = normalize_scores + self.len_penalty = len_penalty + self.unk_penalty = unk_penalty + self.temperature = temperature + self.match_source_len = match_source_len + + if no_repeat_ngram_size > 0: + self.repeat_ngram_blocker = NGramRepeatBlock(no_repeat_ngram_size) + else: + self.repeat_ngram_blocker = None + + assert temperature > 0, "--temperature must be greater than 0" + + self.search = ( + search.BeamSearch(tgt_dict) if search_strategy is None else search_strategy + ) + # We only need to set src_lengths in LengthConstrainedBeamSearch. + # As a module attribute, setting it would break in multithread + # settings when the model is shared. + self.should_set_src_lengths = ( + hasattr(self.search, "needs_src_lengths") and self.search.needs_src_lengths + ) + + self.model.eval() + + self.lm_model = lm_model + self.lm_weight = lm_weight + self.ctc_weight = ctc_weight + if self.lm_model is not None: + self.lm_model.eval() + + def cuda(self): + self.model.cuda() + return self + + @torch.no_grad() + def forward( + self, + sample: Dict[str, Dict[str, Tensor]], + prefix_tokens: Optional[Tensor] = None, + bos_token: Optional[int] = None, + ): + """Generate a batch of translations. 
+ + Args: + sample (dict): batch + prefix_tokens (torch.LongTensor, optional): force decoder to begin + with these tokens + bos_token (int, optional): beginning of sentence token + (default: self.eos) + """ + return self._generate(sample, prefix_tokens, bos_token=bos_token) + + # TODO(myleott): unused, deprecate after pytorch-translate migration + def generate_batched_itr(self, data_itr, beam_size=None, cuda=False, timer=None): + """Iterate over a batched dataset and yield individual translations. + Args: + cuda (bool, optional): use GPU for generation + timer (StopwatchMeter, optional): time generations + """ + for sample in data_itr: + s = utils.move_to_cuda(sample) if cuda else sample + if "net_input" not in s: + continue + input = s["net_input"] + # model.forward normally channels prev_output_tokens into the decoder + # separately, but SequenceGenerator directly calls model.encoder + encoder_input = { + k: v for k, v in input.items() if k != "prev_output_tokens" + } + if timer is not None: + timer.start() + with torch.no_grad(): + hypos = self.generate(encoder_input) + if timer is not None: + timer.stop(sum(len(h[0]["tokens"]) for h in hypos)) + for i, id in enumerate(s["id"].data): + # remove padding + src = utils.strip_pad(input["src_tokens"].data[i, :], self.pad) + ref = ( + utils.strip_pad(s["target"].data[i, :], self.pad) + if s["target"] is not None + else None + ) + yield id, src, ref, hypos[i] + + @torch.no_grad() + def generate( + self, models, sample: Dict[str, Dict[str, Tensor]], **kwargs + ) -> List[List[Dict[str, Tensor]]]: + """Generate translations. Match the api of other fairseq generators. + + Args: + models (List[~fairseq.models.FairseqModel]): ensemble of models + sample (dict): batch + prefix_tokens (torch.LongTensor, optional): force decoder to begin + with these tokens + constraints (torch.LongTensor, optional): force decoder to include + the list of constraints + bos_token (int, optional): beginning of sentence token + (default: self.eos) + """ + return self._generate(sample, **kwargs) + + def _generate( + self, + sample: Dict[str, Dict[str, Tensor]], + prefix_tokens: Optional[Tensor] = None, + constraints: Optional[Tensor] = None, + bos_token: Optional[int] = None, + ): + incremental_states = torch.jit.annotate( + List[Dict[str, Dict[str, Optional[Tensor]]]], + [ + torch.jit.annotate(Dict[str, Dict[str, Optional[Tensor]]], {}) + for i in range(self.model.models_size) + ], + ) + net_input = sample["net_input"] + + if "src_tokens" in net_input: + src_tokens = net_input["src_tokens"] + # length of the source text being the character length except EndOfSentence and pad + src_lengths = ( + (src_tokens.ne(self.eos) & src_tokens.ne(self.pad)).long().sum(dim=1) + ) + elif "source" in net_input: + src_tokens = net_input["source"] + src_lengths = ( + net_input["padding_mask"].size(-1) - net_input["padding_mask"].sum(-1) + if net_input["padding_mask"] is not None + else torch.tensor(src_tokens.size(-1)).to(src_tokens) + ) + elif "features" in net_input: + src_tokens = net_input["features"] + src_lengths = ( + net_input["padding_mask"].size(-1) - net_input["padding_mask"].sum(-1) + if net_input["padding_mask"] is not None + else torch.tensor(src_tokens.size(-1)).to(src_tokens) + ) + else: + raise Exception( + "expected src_tokens or source in net input. input keys: " + + str(net_input.keys()) + ) + + # bsz: total number of sentences in beam + # Note that src_tokens may have more than 2 dimensions (i.e. 
audio features) + bsz, src_len = src_tokens.size()[:2] + beam_size = self.beam_size + + if constraints is not None and not self.search.supports_constraints: + raise NotImplementedError( + "Target-side constraints were provided, but search method doesn't support them" + ) + + # Initialize constraints, when active + self.search.init_constraints(constraints, beam_size) + + max_len: int = -1 + if self.match_source_len: + max_len = src_lengths.max().item() + else: + max_len = min( + int(self.max_len_a * src_len + self.max_len_b), + self.max_len - 1, + ) + assert ( + self.min_len <= max_len + ), "min_len cannot be larger than max_len, please adjust these!" + # compute the encoder output for each beam + with torch.autograd.profiler.record_function("EnsembleModel: forward_encoder"): + encoder_outs = self.model.forward_encoder(net_input) + + dec_sos = sample["lang_idx"] if ("lang_idx" in sample and sample["lang_idx"] is not None) else (self.bos if bos_token is None else bos_token) + # placeholder of indices for bsz * beam_size to hold tokens and accumulative scores + new_order = torch.arange(bsz).view(-1, 1).repeat(1, beam_size).view(-1) + new_order = new_order.to(src_tokens.device).long() + encoder_outs = self.model.reorder_encoder_out(encoder_outs, new_order) + # ensure encoder_outs is a List. + assert encoder_outs is not None + + # initialize buffers + scores = ( + torch.zeros(bsz * beam_size, max_len + 1).to(src_tokens).float() + ) # +1 for eos; pad is never chosen for scoring + tokens = ( + torch.zeros(bsz * beam_size, max_len + 2) + .to(src_tokens) + .long() + .fill_(self.pad) + ) # +2 for eos and pad + tokens[:, 0] = dec_sos + attn: Optional[Tensor] = None + + # A list that indicates candidates that should be ignored. + # For example, suppose we're sampling and have already finalized 2/5 + # samples. Then cands_to_ignore would mark 2 positions as being ignored, + # so that we only finalize the remaining 3 samples. 
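+ # Shape illustration (not from the original code): with bsz=2 sentences and
+ # beam_size=3, `tokens` and `scores` are flattened to 6 rows, where row r
+ # belongs to sentence r // beam_size and beam r % beam_size; bbsz_offsets is
+ # then [[0], [3]] and maps per-sentence beam indices back to flat rows.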
+ cands_to_ignore = ( + torch.zeros(bsz, beam_size).to(src_tokens).eq(-1) + ) # forward and backward-compatible False mask + + # list of completed sentences + finalized = torch.jit.annotate( + List[List[Dict[str, Tensor]]], + [torch.jit.annotate(List[Dict[str, Tensor]], []) for i in range(bsz)], + ) # contains lists of dictionaries of infomation about the hypothesis being finalized at each step + + # a boolean array indicating if the sentence at the index is finished or not + finished = [False for i in range(bsz)] + num_remaining_sent = bsz # number of sentences remaining + + # number of candidate hypos per step + cand_size = 2 * beam_size # 2 x beam size in case half are EOS + + # offset arrays for converting between different indexing schemes + bbsz_offsets = ( + (torch.arange(0, bsz) * beam_size) + .unsqueeze(1) + .type_as(tokens) + .to(src_tokens.device) + ) + cand_offsets = torch.arange(0, cand_size).type_as(tokens).to(src_tokens.device) + + reorder_state: Optional[Tensor] = None + batch_idxs: Optional[Tensor] = None + + original_batch_idxs: Optional[Tensor] = None + if "id" in sample and isinstance(sample["id"], Tensor): + original_batch_idxs = sample["id"] + else: + original_batch_idxs = torch.arange(0, bsz).type_as(tokens) + + for step in range(max_len + 1): # one extra step for EOS marker + # reorder decoder internal states based on the prev choice of beams + if reorder_state is not None: + if batch_idxs is not None: + # update beam indices to take into account removed sentences + corr = batch_idxs - torch.arange(batch_idxs.numel()).type_as( + batch_idxs + ) + reorder_state.view(-1, beam_size).add_( + corr.unsqueeze(-1) * beam_size + ) + original_batch_idxs = original_batch_idxs[batch_idxs] + self.model.reorder_incremental_state(incremental_states, reorder_state) + encoder_outs = self.model.reorder_encoder_out( + encoder_outs, reorder_state + ) + with torch.autograd.profiler.record_function( + "EnsembleModel: forward_decoder" + ): + lprobs, avg_attn_scores = self.model.forward_decoder( + tokens[:, : step + 1], + encoder_outs, + incremental_states, + self.temperature, + ) + + if self.lm_model is not None: + lm_out = self.lm_model(tokens[:, : step + 1]) + probs = self.lm_model.get_normalized_probs( + lm_out, log_probs=True, sample=None + ) + probs = probs[:, -1, :] * self.lm_weight + lprobs += probs + # handle prefix tokens (possibly with different lengths) + if ( + prefix_tokens is not None + and step < prefix_tokens.size(1) + and step < max_len + ): + lprobs, tokens, scores = self._prefix_tokens( + step, lprobs, scores, tokens, prefix_tokens, beam_size + ) + elif step < self.min_len: + # minimum length constraint (does not apply if using prefix_tokens) + lprobs[:, self.eos] = -math.inf + + lprobs[lprobs != lprobs] = torch.tensor(-math.inf).to(lprobs) + + lprobs[:, self.pad] = -math.inf # never select pad + lprobs[:, self.unk] -= self.unk_penalty # apply unk penalty + lprobs[:, self.blank] = -math.inf # never select blank + if dec_sos != self.eos: + lprobs[:, dec_sos] = -math.inf # never select lang id + + # handle max length constraint + if step >= max_len: + lprobs[:, : self.eos] = -math.inf + lprobs[:, self.eos + 1 :] = -math.inf + + # Record attention scores, only support avg_attn_scores is a Tensor + if avg_attn_scores is not None: + if attn is None: + attn = torch.empty( + bsz * beam_size, avg_attn_scores.size(1), max_len + 2 + ).to(scores) + attn[:, :, step + 1].copy_(avg_attn_scores) + + scores = scores.type_as(lprobs) + eos_bbsz_idx = torch.empty(0).to( + tokens + ) # 
indices of hypothesis ending with eos (finished sentences) + eos_scores = torch.empty(0).to( + scores + ) # scores of hypothesis ending with eos (finished sentences) + + if self.should_set_src_lengths: + self.search.set_src_lengths(src_lengths) + + if self.repeat_ngram_blocker is not None: + lprobs = self.repeat_ngram_blocker(tokens, lprobs, bsz, beam_size, step) + + # Shape: (batch, cand_size) + cand_scores, cand_indices, cand_beams = self.search.step( + step, + lprobs.view(bsz, -1, self.vocab_size), + scores.view(bsz, beam_size, -1)[:, :, :step], + tokens[:, : step + 1], + original_batch_idxs, + ) + + # cand_bbsz_idx contains beam indices for the top candidate + # hypotheses, with a range of values: [0, bsz*beam_size), + # and dimensions: [bsz, cand_size] + cand_bbsz_idx = cand_beams.add(bbsz_offsets) + + # finalize hypotheses that end in eos + # Shape of eos_mask: (batch size, beam size) + eos_mask = cand_indices.eq(self.eos) & cand_scores.ne(-math.inf) + eos_mask[:, :beam_size][cands_to_ignore] = torch.tensor(0).to(eos_mask) + + # only consider eos when it's among the top beam_size indices + # Now we know what beam item(s) to finish + # Shape: 1d list of absolute-numbered + eos_bbsz_idx = torch.masked_select( + cand_bbsz_idx[:, :beam_size], mask=eos_mask[:, :beam_size] + ) + + finalized_sents: List[int] = [] + if eos_bbsz_idx.numel() > 0: + eos_scores = torch.masked_select( + cand_scores[:, :beam_size], mask=eos_mask[:, :beam_size] + ) + + finalized_sents = self.finalize_hypos( + step, + eos_bbsz_idx, + eos_scores, + tokens, + scores, + finalized, + finished, + beam_size, + attn, + src_lengths, + max_len, + ) + num_remaining_sent -= len(finalized_sents) + + assert num_remaining_sent >= 0 + if num_remaining_sent == 0: + break + if self.search.stop_on_max_len and step >= max_len: + break + assert step < max_len, f"{step} < {max_len}" + + # Remove finalized sentences (ones for which {beam_size} + # finished hypotheses have been generated) from the batch. + if len(finalized_sents) > 0: + new_bsz = bsz - len(finalized_sents) + + # construct batch_idxs which holds indices of batches to keep for the next pass + batch_mask = torch.ones( + bsz, dtype=torch.bool, device=cand_indices.device + ) + batch_mask[finalized_sents] = False + # TODO replace `nonzero(as_tuple=False)` after TorchScript supports it + batch_idxs = torch.arange( + bsz, device=cand_indices.device + ).masked_select(batch_mask) + + # Choose the subset of the hypothesized constraints that will continue + self.search.prune_sentences(batch_idxs) + + eos_mask = eos_mask[batch_idxs] + cand_beams = cand_beams[batch_idxs] + bbsz_offsets.resize_(new_bsz, 1) + cand_bbsz_idx = cand_beams.add(bbsz_offsets) + cand_scores = cand_scores[batch_idxs] + cand_indices = cand_indices[batch_idxs] + + if prefix_tokens is not None: + prefix_tokens = prefix_tokens[batch_idxs] + src_lengths = src_lengths[batch_idxs] + cands_to_ignore = cands_to_ignore[batch_idxs] + + scores = scores.view(bsz, -1)[batch_idxs].view(new_bsz * beam_size, -1) + tokens = tokens.view(bsz, -1)[batch_idxs].view(new_bsz * beam_size, -1) + if attn is not None: + attn = attn.view(bsz, -1)[batch_idxs].view( + new_bsz * beam_size, attn.size(1), -1 + ) + bsz = new_bsz + else: + batch_idxs = None + + # Set active_mask so that values > cand_size indicate eos hypos + # and values < cand_size indicate candidate active hypos. + # After, the min values per row are the top candidate active hypos + + # Rewrite the operator since the element wise or is not supported in torchscript. 
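+ # (De Morgan: a | b == ~(~a & ~b), so the assignment below is equivalent to
+ # eos_mask[:, :beam_size] |= cands_to_ignore.)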
+ + eos_mask[:, :beam_size] = ~((~cands_to_ignore) & (~eos_mask[:, :beam_size])) + active_mask = torch.add( + eos_mask.type_as(cand_offsets) * cand_size, + cand_offsets[: eos_mask.size(1)], + ) + + # get the top beam_size active hypotheses, which are just + # the hypos with the smallest values in active_mask. + # {active_hypos} indicates which {beam_size} hypotheses + # from the list of {2 * beam_size} candidates were + # selected. Shapes: (batch size, beam size) + new_cands_to_ignore, active_hypos = torch.topk( + active_mask, k=beam_size, dim=1, largest=False + ) + + # update cands_to_ignore to ignore any finalized hypos. + cands_to_ignore = new_cands_to_ignore.ge(cand_size)[:, :beam_size] + # Make sure there is at least one active item for each sentence in the batch. + assert (~cands_to_ignore).any(dim=1).all() + + # update cands_to_ignore to ignore any finalized hypos + + # {active_bbsz_idx} denotes which beam number is continued for each new hypothesis (a beam + # can be selected more than once). + active_bbsz_idx = torch.gather(cand_bbsz_idx, dim=1, index=active_hypos) + active_scores = torch.gather(cand_scores, dim=1, index=active_hypos) + + active_bbsz_idx = active_bbsz_idx.view(-1) + active_scores = active_scores.view(-1) + + # copy tokens and scores for active hypotheses + + # Set the tokens for each beam (can select the same row more than once) + tokens[:, : step + 1] = torch.index_select( + tokens[:, : step + 1], dim=0, index=active_bbsz_idx + ) + # Select the next token for each of them + tokens.view(bsz, beam_size, -1)[:, :, step + 1] = torch.gather( + cand_indices, dim=1, index=active_hypos + ) + if step > 0: + scores[:, :step] = torch.index_select( + scores[:, :step], dim=0, index=active_bbsz_idx + ) + scores.view(bsz, beam_size, -1)[:, :, step] = torch.gather( + cand_scores, dim=1, index=active_hypos + ) + + # Update constraints based on which candidates were selected for the next beam + self.search.update_constraints(active_hypos) + + # copy attention for active hypotheses + if attn is not None: + attn[:, :, : step + 2] = torch.index_select( + attn[:, :, : step + 2], dim=0, index=active_bbsz_idx + ) + + # reorder incremental state in decoder + reorder_state = active_bbsz_idx + + # sort by score descending + for sent in range(len(finalized)): + scores = torch.tensor( + [float(elem["score"].item()) for elem in finalized[sent]] + ) + _, sorted_scores_indices = torch.sort(scores, descending=True) + finalized[sent] = [finalized[sent][ssi] for ssi in sorted_scores_indices] + finalized[sent] = torch.jit.annotate( + List[Dict[str, Tensor]], finalized[sent] + ) + return finalized + + def _prefix_tokens( + self, step: int, lprobs, scores, tokens, prefix_tokens, beam_size: int + ): + """Handle prefix tokens""" + prefix_toks = prefix_tokens[:, step].unsqueeze(-1).repeat(1, beam_size).view(-1) + prefix_lprobs = lprobs.gather(-1, prefix_toks.unsqueeze(-1)) + prefix_mask = prefix_toks.ne(self.pad) + lprobs[prefix_mask] = torch.tensor(-math.inf).to(lprobs) + lprobs[prefix_mask] = lprobs[prefix_mask].scatter( + -1, prefix_toks[prefix_mask].unsqueeze(-1), prefix_lprobs[prefix_mask] + ) + # if prefix includes eos, then we should make sure tokens and + # scores are the same across all beams + eos_mask = prefix_toks.eq(self.eos) + if eos_mask.any(): + # validate that the first beam matches the prefix + first_beam = tokens[eos_mask].view(-1, beam_size, tokens.size(-1))[ + :, 0, 1 : step + 1 + ] + eos_mask_batch_dim = eos_mask.view(-1, beam_size)[:, 0] + target_prefix = 
prefix_tokens[eos_mask_batch_dim][:, :step] + assert (first_beam == target_prefix).all() + + # copy tokens, scores and lprobs from the first beam to all beams + tokens = self.replicate_first_beam(tokens, eos_mask_batch_dim, beam_size) + scores = self.replicate_first_beam(scores, eos_mask_batch_dim, beam_size) + lprobs = self.replicate_first_beam(lprobs, eos_mask_batch_dim, beam_size) + return lprobs, tokens, scores + + def replicate_first_beam(self, tensor, mask, beam_size: int): + tensor = tensor.view(-1, beam_size, tensor.size(-1)) + tensor[mask] = tensor[mask][:, :1, :] + return tensor.view(-1, tensor.size(-1)) + + def finalize_hypos( + self, + step: int, + bbsz_idx, + eos_scores, + tokens, + scores, + finalized: List[List[Dict[str, Tensor]]], + finished: List[bool], + beam_size: int, + attn: Optional[Tensor], + src_lengths, + max_len: int, + ): + """Finalize hypothesis, store finalized information in `finalized`, and change `finished` accordingly. + A sentence is finalized when {beam_size} finished items have been collected for it. + + Returns number of sentences (not beam items) being finalized. + These will be removed from the batch and not processed further. + Args: + bbsz_idx (Tensor): + """ + assert bbsz_idx.numel() == eos_scores.numel() + + # clone relevant token and attention tensors. + # tokens is (batch * beam, max_len). So the index_select + # gets the newly EOS rows, then selects cols 1..{step + 2} + tokens_clone = tokens.index_select(0, bbsz_idx)[ + :, 1 : step + 2 + ] # skip the first index, which is EOS + + tokens_clone[:, step] = self.eos + attn_clone = ( + attn.index_select(0, bbsz_idx)[:, :, 1 : step + 2] + if attn is not None + else None + ) + + # compute scores per token position + pos_scores = scores.index_select(0, bbsz_idx)[:, : step + 1] + pos_scores[:, step] = eos_scores + # convert from cumulative to per-position scores + pos_scores[:, 1:] = pos_scores[:, 1:] - pos_scores[:, :-1] + + # normalize sentence-level scores + if self.normalize_scores: + eos_scores /= (step + 1) ** self.len_penalty + + # cum_unfin records which sentences in the batch are finished. + # It helps match indexing between (a) the original sentences + # in the batch and (b) the current, possibly-reduced set of + # sentences. 
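+ # Worked example (illustration): if finished == [True, False, True, False],
+ # then cum_unfin == [1, 2]; a beam item from reduced sentence 0 maps back to
+ # original index 0 + cum_unfin[0] == 1, and reduced sentence 1 to 1 + 2 == 3.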
+ cum_unfin: List[int] = [] + prev = 0 + for f in finished: + if f: + prev += 1 + else: + cum_unfin.append(prev) + cum_fin_tensor = torch.tensor(cum_unfin, dtype=torch.int).to(bbsz_idx) + + unfin_idx = bbsz_idx // beam_size + sent = unfin_idx + torch.index_select(cum_fin_tensor, 0, unfin_idx) + + # Create a set of "{sent}{unfin_idx}", where + # "unfin_idx" is the index in the current (possibly reduced) + # list of sentences, and "sent" is the index in the original, + # unreduced batch + # For every finished beam item + # sentence index in the current (possibly reduced) batch + seen = (sent << 32) + unfin_idx + unique_seen: List[int] = torch.unique(seen).tolist() + + if self.match_source_len: + condition = step > torch.index_select(src_lengths, 0, unfin_idx) + eos_scores = torch.where(condition, torch.tensor(-math.inf), eos_scores) + sent_list: List[int] = sent.tolist() + for i in range(bbsz_idx.size()[0]): + # An input sentence (among those in a batch) is finished when + # beam_size hypotheses have been collected for it + if len(finalized[sent_list[i]]) < beam_size: + if attn_clone is not None: + # remove padding tokens from attn scores + hypo_attn = attn_clone[i] + else: + hypo_attn = torch.empty(0) + + finalized[sent_list[i]].append( + { + "tokens": tokens_clone[i], + "score": eos_scores[i], + "attention": hypo_attn, # src_len x tgt_len + "alignment": torch.empty(0), + "positional_scores": pos_scores[i], + } + ) + + newly_finished: List[int] = [] + for unique_s in unique_seen: + # check termination conditions for this sentence + unique_sent: int = unique_s >> 32 + unique_unfin_idx: int = unique_s - (unique_sent << 32) + + if not finished[unique_sent] and self.is_finished( + step, unique_unfin_idx, max_len, len(finalized[unique_sent]), beam_size + ): + finished[unique_sent] = True + newly_finished.append(unique_unfin_idx) + + return newly_finished + + def is_finished( + self, + step: int, + unfin_idx: int, + max_len: int, + finalized_sent_len: int, + beam_size: int, + ): + """ + Check whether decoding for a sentence is finished, which + occurs when the list of finalized sentences has reached the + beam size, or when we reach the maximum length. 
+ """ + assert finalized_sent_len <= beam_size + if finalized_sent_len == beam_size or step == max_len: + return True + return False + + +class EnsembleModel(nn.Module): + """A wrapper around an ensemble of models.""" + + def __init__(self, models): + super().__init__() + self.models_size = len(models) + # method '__len__' is not supported in ModuleList for torch script + self.single_model = models[0] + self.models = nn.ModuleList(models) + + self.has_incremental: bool = False + if all( + hasattr(m, "decoder") and isinstance(m.decoder, FairseqIncrementalDecoder) + for m in models + ): + self.has_incremental = True + + def forward(self): + pass + + def has_encoder(self): + return hasattr(self.single_model, "encoder") + + def has_incremental_states(self): + return self.has_incremental + + def max_decoder_positions(self): + return min( + [ + m.max_decoder_positions() + for m in self.models + if hasattr(m, "max_decoder_positions") + ] + + [sys.maxsize] + ) + + @torch.jit.export + def forward_encoder(self, net_input: Dict[str, Tensor]): + if not self.has_encoder(): + return None + return [model.encoder.forward_torchscript(net_input) for model in self.models] + + @torch.jit.export + def forward_decoder( + self, + tokens, + encoder_outs: List[Dict[str, List[Tensor]]], + incremental_states: List[Dict[str, Dict[str, Optional[Tensor]]]], + temperature: float = 1.0, + ): + log_probs = [] + avg_attn: Optional[Tensor] = None + encoder_out: Optional[Dict[str, List[Tensor]]] = None + for i, model in enumerate(self.models): + if self.has_encoder(): + encoder_out = encoder_outs[i] + # decode each model + if self.has_incremental_states(): + decoder_out = model.decoder.forward( + tokens, + encoder_out=encoder_out, + incremental_state=incremental_states[i], + modal_idx=-1, + ) + else: + if hasattr(model, "decoder"): + decoder_out = model.decoder.forward(tokens, encoder_out=encoder_out) + else: + decoder_out = model.forward(tokens) + + attn: Optional[Tensor] = None + decoder_len = len(decoder_out) + if decoder_len > 1 and decoder_out[1] is not None: + if isinstance(decoder_out[1], Tensor): + attn = decoder_out[1] + else: + attn_holder = decoder_out[1]["attn"] + if isinstance(attn_holder, Tensor): + attn = attn_holder + elif attn_holder is not None: + attn = attn_holder[0] + if attn is not None: + attn = attn[:, -1, :] + + decoder_out_tuple = ( + decoder_out[0][:, -1:, :].div_(temperature), + None if decoder_len <= 1 else decoder_out[1], + ) + probs = model.get_normalized_probs( + decoder_out_tuple, log_probs=True, sample=None + ) + probs = probs[:, -1, :] + if self.models_size == 1: + return probs, attn + + log_probs.append(probs) + if attn is not None: + if avg_attn is None: + avg_attn = attn + else: + avg_attn.add_(attn) + + avg_probs = torch.logsumexp(torch.stack(log_probs, dim=0), dim=0) - math.log( + self.models_size + ) + + if avg_attn is not None: + avg_attn.div_(self.models_size) + return avg_probs, avg_attn + + @torch.jit.export + def reorder_encoder_out( + self, encoder_outs: Optional[List[Dict[str, List[Tensor]]]], new_order + ): + """ + Reorder encoder output according to *new_order*. 
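+ Each model in the ensemble reorders its own encoder output, so the returned
+ list stays aligned with `self.models`.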
+ + Args: + encoder_out: output from the ``forward()`` method + new_order (LongTensor): desired order + + Returns: + *encoder_out* rearranged according to *new_order* + """ + new_outs: List[Dict[str, List[Tensor]]] = [] + if not self.has_encoder(): + return new_outs + for i, model in enumerate(self.models): + assert encoder_outs is not None + new_outs.append( + model.encoder.reorder_encoder_out(encoder_outs[i], new_order) + ) + return new_outs + + @torch.jit.export + def reorder_incremental_state( + self, + incremental_states: List[Dict[str, Dict[str, Optional[Tensor]]]], + new_order, + ): + if not self.has_incremental_states(): + return + for i, model in enumerate(self.models): + model.decoder.reorder_incremental_state_scripting( + incremental_states[i], new_order + ) + + +class SequenceGeneratorWithAlignment(SequenceGenerator): + def __init__( + self, models, tgt_dict, left_pad_target=False, print_alignment="hard", **kwargs + ): + """Generates translations of a given source sentence. + + Produces alignments following "Jointly Learning to Align and + Translate with Transformer Models" (Garg et al., EMNLP 2019). + + Args: + left_pad_target (bool, optional): Whether or not the + hypothesis should be left padded or not when they are + teacher forced for generating alignments. + """ + super().__init__(EnsembleModelWithAlignment(models), tgt_dict, **kwargs) + self.left_pad_target = left_pad_target + + if print_alignment == "hard": + self.extract_alignment = utils.extract_hard_alignment + elif print_alignment == "soft": + self.extract_alignment = utils.extract_soft_alignment + + @torch.no_grad() + def generate(self, models, sample, **kwargs): + finalized = super()._generate(sample, **kwargs) + + src_tokens = sample["net_input"]["src_tokens"] + bsz = src_tokens.shape[0] + beam_size = self.beam_size + ( + src_tokens, + src_lengths, + prev_output_tokens, + tgt_tokens, + ) = self._prepare_batch_for_alignment(sample, finalized) + if any(getattr(m, "full_context_alignment", False) for m in self.model.models): + attn = self.model.forward_align(src_tokens, src_lengths, prev_output_tokens) + else: + attn = [ + finalized[i // beam_size][i % beam_size]["attention"].transpose(1, 0) + for i in range(bsz * beam_size) + ] + + if src_tokens.device != "cpu": + src_tokens = src_tokens.to("cpu") + tgt_tokens = tgt_tokens.to("cpu") + attn = [i.to("cpu") for i in attn] + + # Process the attn matrix to extract hard alignments. 
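+ # For each of the bsz * beam_size hypotheses, `self.extract_alignment`
+ # (selected by `print_alignment` in the constructor) reduces the attention
+ # matrix to source-target alignment information; pad and eos indices are
+ # passed so those positions can be excluded.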
+ for i in range(bsz * beam_size): + alignment = self.extract_alignment( + attn[i], src_tokens[i], tgt_tokens[i], self.pad, self.eos + ) + finalized[i // beam_size][i % beam_size]["alignment"] = alignment + return finalized + + def _prepare_batch_for_alignment(self, sample, hypothesis): + src_tokens = sample["net_input"]["src_tokens"] + bsz = src_tokens.shape[0] + src_tokens = ( + src_tokens[:, None, :] + .expand(-1, self.beam_size, -1) + .contiguous() + .view(bsz * self.beam_size, -1) + ) + src_lengths = sample["net_input"]["src_lengths"] + src_lengths = ( + src_lengths[:, None] + .expand(-1, self.beam_size) + .contiguous() + .view(bsz * self.beam_size) + ) + prev_output_tokens = data_utils.collate_tokens( + [beam["tokens"] for example in hypothesis for beam in example], + self.pad, + self.eos, + self.left_pad_target, + move_eos_to_beginning=True, + ) + tgt_tokens = data_utils.collate_tokens( + [beam["tokens"] for example in hypothesis for beam in example], + self.pad, + self.eos, + self.left_pad_target, + move_eos_to_beginning=False, + ) + return src_tokens, src_lengths, prev_output_tokens, tgt_tokens + + +class EnsembleModelWithAlignment(EnsembleModel): + """A wrapper around an ensemble of models.""" + + def __init__(self, models): + super().__init__(models) + + def forward_align(self, src_tokens, src_lengths, prev_output_tokens): + avg_attn = None + for model in self.models: + decoder_out = model(src_tokens, src_lengths, prev_output_tokens) + attn = decoder_out[1]["attn"][0] + if avg_attn is None: + avg_attn = attn + else: + avg_attn.add_(attn) + if len(self.models) > 1: + avg_attn.div_(len(self.models)) + return avg_attn diff --git a/YiTrans/yitrans_iwslt22/tasks/iwslt_joint_pretraining.py b/YiTrans/yitrans_iwslt22/tasks/iwslt_joint_pretraining.py new file mode 100644 index 0000000000000000000000000000000000000000..fbbba14135234e08bef1a35821b12a069dea2c8a --- /dev/null +++ b/YiTrans/yitrans_iwslt22/tasks/iwslt_joint_pretraining.py @@ -0,0 +1,726 @@ +# -------------------------------------------------------- +# The YiTrans End-to-End Speech Translation System for IWSLT 2022 Offline Shared Task (https://arxiv.org/abs/2206.05777) +# Github source: https://github.com/microsoft/SpeechT5/tree/main/YiTrans +# Copyright (c) 2022 Microsoft +# Licensed under The MIT License [see LICENSE for details] +# Based on fairseq code bases +# https://github.com/facebookresearch/fairseq +# -------------------------------------------------------- +""" + Modified from + https://github.com/facebookresearch/fairseq/blob/main/fairseq/tasks/hubert_pretraining.py + https://github.com/facebookresearch/fairseq/blob/main/fairseq/tasks/denoising.py + + Pre-training task for YiTrans@IWSLT2022 + Step1: Combine Speech2C and multilingual BART + Step2: Combine ASR and multilingual MT +""" +import logging +import os +import sys +from typing import Dict, List, Optional, Tuple +from pathlib import Path + +import numpy as np +from argparse import Namespace +from collections import OrderedDict + +from dataclasses import dataclass, field +from fairseq.data import Dictionary, encoders +from fairseq.data import ( + Dictionary, + data_utils, + StripTokenDataset, + PrependTokenDataset, + AppendTokenDataset, + FairseqDataset, + iterators, + ResamplingDataset, +) +from fairseq.data.audio.speech_to_text_joint_dataset import S2TJointDataConfig +from fairseq.data.shorten_dataset import maybe_shorten_dataset +from fairseq.data.encoders.utils import get_whole_word_mask +from fairseq.dataclass.configs import FairseqDataclass +from 
fairseq.tasks import register_task +from fairseq.tasks.fairseq_task import FairseqTask +from fairseq.dataclass.constants import ChoiceEnum + +from fairseq.tasks.hubert_pretraining import HubertPretrainingConfig +from yitrans_iwslt22.data.load_langpair_dataset import load_langpair_dataset +from yitrans_iwslt22.data.lang_pair_mask_dataset import LangPairMaskDataset +from yitrans_iwslt22.data.speech2c_dataset import Speech2cDataset +from yitrans_iwslt22.data.denoising_dataset import DenoisingDatasetLang +from yitrans_iwslt22.data.concat_dataset import ConcatDataset +from yitrans_iwslt22.data.multimodal_corpus_dataset import MultiCorpusDataset + + +logger = logging.getLogger(__name__) +TOKENIZER_CHOICES = ChoiceEnum(["sentencepiece", "hubert_letters", "none"]) + +def _lang_token(lang: str): + return "<lang:{}>".format(lang) + +def _lang_token_index(dic: Dictionary, lang: str): + """Return language token index.""" + idx = dic.index(_lang_token(lang)) + assert idx != dic.unk_index, "cannot find language token for lang {}".format(lang) + return idx + +class LabelEncoder(object): + def __init__(self, dictionary: Dictionary) -> None: + self.dictionary = dictionary + + def __call__(self, label: str) -> List[str]: + return self.dictionary.encode_line( + label, append_eos=False, add_if_not_exist=False, + ) + +@dataclass +class TextPretrainingConfig(FairseqDataclass): + """ + Convert the legacy config of BART to the Dataclass style + """ + text_data: Optional[str] = field( + default=None, + metadata={ + "help": "if set, path to text data directory", + }, + ) + seed: Optional[int] = field( + default=1, + metadata={ + "help": "for ordered_indices in MulticorpusDataset", + }, + ) + tokens_per_sample: Optional[int] = field( + default=512, + metadata={ + "help": "max number of total tokens over all segments per sample for dataset", + }, + ) + sample_break_mode: Optional[str] = field( + default="eos", + metadata={ + "help": "mode for breaking sentence", + }, + ) + mask: Optional[float] = field( + default=0.3, + metadata={ + "help": "fraction of words/subwords that will be masked", + }, + ) + leave_unmasked_prob: float = field( + default=0.1, + metadata={"help": "probability that a masked token is unmasked"}, + ) + mask_random: Optional[float] = field( + default=0.0, + metadata={ + "help": "instead of using [MASK], use random token this often", + }, + ) + freq_weighted_replacement: bool = field( + default=False, + metadata={"help": "sample random replacement words based on word frequencies"}, + ) + mask_whole_words: bool = field( + default=False, + metadata={"help": "mask whole words; you may also want to set --bpe"}, + ) + mask_multiple_length: int = field( + default=1, + metadata={"help": "repeat the mask indices multiple times"}, + ) + mask_stdev: float = field( + default=0.0, + metadata={"help": "stdev of the mask length"}, + ) + shorten_method: Optional[str] = field( + default="none", + metadata={ + "help": "if not none, shorten sequences that exceed tokens_per_sample", + "choices": "none/truncate/random_crop" + }, + ) + shorten_data_split_list: Optional[str] = field( + default="", + metadata={ + "help": "comma_separated list of dataset splits to apply shortening to, e.g., train,valid (default: all dataset splits)", + }, + ) + ### below hypra-parameters is used in BART + insert: Optional[float] = field( + default=0.0, + metadata={ + "help": "insert this percentage of additional random tokens", + }, + ) + permute: Optional[float] = field( + default=0.0, + metadata={ + "help": "take this proportion of 
subwords and permute them", + }, + ) + rotate: Optional[float] = field( + default=0.0, + metadata={ + "help": "rotate this proportion of inputs", + }, + ) + poisson_lambda: Optional[float] = field( + default=3, + metadata={ + "help": "randomly shuffle sentences for this proportion of inputs", + }, + ) + permute_sentences: Optional[float] = field( + default=0.0, + metadata={ + "help": "shuffle this proportion of sentences in all inputs", + }, + ) + mask_length: Optional[str] = field( + default="span-poisson", + metadata={ + "help": "mask length to choose", + "choice": "subword/word/span-poisson" + }, + ) + replace_length: Optional[int] = field( + default=1, + metadata={ + "help": "when masking N tokens, replace with 0, 1, or N tokens (use -1 for N)", + }, + ) + shuffle_instance: Optional[bool] = field( + default=False, + metadata={"help": "shuffle instance"}, + ) + max_source_positions: Optional[int] = field( + default=1024, + metadata={"help": "max number of tokens in the source sequence"}, + ) + max_target_positions: Optional[int] = field( + default=1024, + metadata={"help": "max number of tokens in the target sequence"}, + ) + bpe: Optional[str] = field( + default="sentencepiece", + metadata={ + "help": "will wrapped by the text_data_config yaml", + }, + ) + data_config: Optional[str] = field( + default=None, + metadata={ + "help": "a config yaml specify the bpe model of text data", + }, + ) + text_maxtokens_ratio: Optional[float] = field( + default=1.0, + metadata={ + "help": "for text, max_tokens = max_tokens * text_maxtokens_ratio / 320 ", + }, + ) + prepend_tgt_lang_tag: bool = field( + default=True, + metadata={"help": "prepend tgt_lang_tag to replace <eos>"}, + ) + mask_text_ratio: Optional[float] = field( + default=0.0, + metadata={ + "help": "mask_text_ratio, for paired data", + }, + ) + + +@dataclass +class JointPretrainingConfig(HubertPretrainingConfig): + store_labels: Optional[bool] = field( + default=False, + metadata={"help": "store spm labels in memory, should be true when fine-tune with bpe"}, + ) + add_decoder: bool = field( + default=False, + metadata={"help": "whether to add decoder for CE Loss on code"}, + ) + split_modality_batch: bool = field( + default=False, + metadata={"help": "whether create all samples of different modalities in a batch"}, + ) + speech_tgt_lang: str = field( + default="", + metadata={"help": "prepend <tgt-id> to prev_output_tokens to replace <eos>, only used for decoder"}, + ) + speech_sampling_alpha: float = field( + default=0.2, + metadata={ + "help": "Hyper-parameter alpha = 1/T for temperature-based speech resampling." + "(alpha = 1 for no resampling)" + }, + ) + text_sampling_alpha: float = field( + default=0.2, + metadata={ + "help": "Hyper-parameter alpha = 1/T for temperature-based text resampling." 
+ "(alpha = 1 for no resampling)" + }, + ) + hubert_tokenizer: Optional[TOKENIZER_CHOICES] = field( + default="none", + metadata={"help": "which tokenizer for processing text"}, + ) + sp_path: Optional[str] = field( + default=None, + metadata={"help": "sentencepiece model path if using bpe tokenizer"}, + ) + text_cfg: TextPretrainingConfig = TextPretrainingConfig() + + +@register_task("iwslt_joint_pretraining", dataclass=JointPretrainingConfig) +class JointPretrainingTask(FairseqTask): + cfg: JointPretrainingConfig + def __init__( + self, + cfg: JointPretrainingConfig, + ) -> None: + super().__init__(cfg) + + logger.info(f"current directory is {os.getcwd()}") + logger.info(f"JointPretrainingTask Config {cfg}") + + self.cfg = cfg + self.fine_tuning = cfg.fine_tuning + self.blank_symbol = "<s>" + + self.state.add_factory("hubert_tokenizer", self.build_tokenizer) + self.state.add_factory("text_dictionary", self.load_text_dictionary) + self.state.add_factory("text_src_dictionary", self.load_text_src_dictionary) + if cfg.fine_tuning: + self.state.add_factory("target_dictionary", self.load_dictionaries) + else: + self.state.add_factory("dictionaries", self.load_dictionaries) + + if cfg.text_cfg.data_config is not None: + self.text_data_cfg = S2TJointDataConfig(Path(f"{cfg.text_cfg.text_data}/{cfg.text_cfg.data_config}")) + self.cfg.text_cfg.bpe = self.text_data_cfg.bpe_tokenizer["bpe"] + + @property + def source_dictionary(self) -> Optional[Dictionary]: + return None + + @property + def target_dictionary(self) -> Optional[Dictionary]: + return self.state.target_dictionary + + @property + def dictionaries(self) -> List[Dictionary]: + return self.state.dictionaries + + @property + def text_dictionary(self) -> Optional[Dictionary]: + return self.state.text_dictionary + + @property + def text_src_dictionary(self) -> Optional[Dictionary]: + return self.state.text_src_dictionary + + @property + def hubert_tokenizer(self): + return self.state.hubert_tokenizer + + def load_dictionaries(self): + label_dir = self.cfg.data if self.cfg.label_dir is None else self.cfg.label_dir + dictionaries = [Dictionary.load(f"{label_dir}/dict.{label}.txt") for label in self.cfg.labels] + return dictionaries[0] if self.cfg.fine_tuning else dictionaries + + def load_text_dictionary(self): + tgt_dict_path = f"{self.cfg.text_cfg.text_data}/{self.text_data_cfg.vocab_filename}" + if not os.path.isfile(tgt_dict_path): + raise FileNotFoundError(f"Dict not found: {tgt_dict_path}") + text_dictionary = Dictionary.load(tgt_dict_path) + self.mask_idx = text_dictionary.add_symbol("<mask>") + return text_dictionary + + def load_text_src_dictionary(self): + return self.load_text_dictionary() + + @classmethod + def setup_task( + cls, cfg: JointPretrainingConfig, **kwargs + ) -> "JointPretrainingTask": + return cls(cfg) + + def get_label_dir(self) -> str: + if self.cfg.label_dir is None: + return self.cfg.data + return self.cfg.label_dir + + def load_dataset(self, split: str, epoch=1, **kwargs) -> None: + """ + Create Wav dataset for audio, and Index dataset for phonemized text, + then concatenate them to by fairseq.data.multi_corpus_dataset.MultiCorpusDataset. 
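+ The split name encodes the mixture: comma-separated speech splits, then an
+ optional "+" followed by comma-separated text splits. A text split whose
+ last dot-field is a language pair (e.g. "mt8corpus.de_DE-en_EN") is loaded
+ as paired data; any other text split is treated as monolingual denoising data.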
+ """ + if len(split.split("+")) == 1: + speech_splits = split.split(",") + has_text = False + else: + has_text = True + speech_splits, text_splits = split.split("+") + speech_splits = speech_splits.split(",") + speech_splits = [item for item in speech_splits if item != ''] + text_splits = text_splits.split(",") + text_splits = [item for item in text_splits if item != ''] + logging.info(f"text_splits: {text_splits}") + logging.info(f"speech_splits: {speech_splits}") + + ### 1, create a speech dataset using Speech2cDataset (modified from HubertDataset) + dicts = [self.target_dictionary] if self.cfg.fine_tuning else self.dictionaries + pad_list = [dict.pad() for dict in dicts] + eos_list = [dict.eos() for dict in dicts] + procs = [LabelEncoder(dict) for dict in dicts] + if self.cfg.speech_tgt_lang != "": + tgt_lang_idx = _lang_token_index(dicts[0], self.cfg.speech_tgt_lang) + logger.info(f"Will prepend <{tgt_lang_idx}> at the beginning of prev_output_tokens to replace <eos>") + else: + tgt_lang_idx = None + + speech_dataset = None + mono_dataset = None + paired_dataset = None + + speech_datasets = [] + for speech_split in speech_splits: + # hubert v1: pad_audio=True, random_crop=False; + paths = [f"{self.get_label_dir()}/{speech_split}.{l}" for l in self.cfg.labels] + speech_datasets.append( + Speech2cDataset( + f"{self.cfg.data}/{speech_split}.tsv", + sample_rate=self.cfg.sample_rate, + label_paths=paths, + label_rates=self.cfg.label_rate, + pad_list=pad_list, + eos_list=eos_list, + label_processors=procs, + max_keep_sample_size=self.cfg.max_keep_size, + min_keep_sample_size=self.cfg.min_sample_size, + max_sample_size=self.cfg.max_sample_size, + pad_audio=self.cfg.pad_audio, + normalize=self.cfg.normalize, + store_labels=self.cfg.store_labels, + random_crop=self.cfg.random_crop, + single_target=self.cfg.single_target, + tgt_dict=dicts[0], + add_decoder=self.cfg.add_decoder, + fine_tuning=self.cfg.fine_tuning, + tgt_lang_idx=tgt_lang_idx, + tokenizer=self.hubert_tokenizer, + ) + ) + + if len(speech_datasets) > 1: + if 'train' in speech_splits[0] and self.cfg.speech_sampling_alpha != 1.0: + size_ratios = self._get_size_ratios( + speech_splits, [len(s) for s in speech_datasets], alpha=self.cfg.speech_sampling_alpha + ) + speech_datasets = [ + ResamplingDataset( + d, size_ratio=r, seed=0, epoch=epoch, replace=(r >= 1.0) + ) + for d, r in zip(speech_datasets, size_ratios) + ] + speech_dataset = ConcatDataset(speech_datasets) + elif len(speech_datasets) == 1: + speech_dataset = speech_datasets[0] + + ### 2, create text mono/paired datasets + logger.info(f"split {split} has unpaired text? {has_text}") + if not has_text: + assert speech_dataset is not None + self.datasets[split] = speech_dataset + return + + text_pairs = [ item for item in text_splits if len(item.split(".")[-1].split("-")) > 1 ] + text_monos = [ item for item in text_splits if len(item.split(".")[-1].split("-")) == 1 ] + logging.info(f"text_monos: {text_monos}") + logging.info(f"text_pairs: {text_pairs}") + + ### 2.1, create text mono dataset using DenoisingDatasetLang + mono_datasets = [] + if len(text_monos) > 0: + for text_split in text_monos: + lang = text_split.split('.')[-2] ## e.g. 
mono_deduped_filt_sort.de_DE.de_DE + mask_whole_words = ( + get_whole_word_mask(Namespace(**self.text_data_cfg.bpe_tokenizer), self.text_dictionary) + if self.cfg.text_cfg.mask_whole_words and lang in ("en_XX", "de_DE") + else None + ) + + mono_dataset = data_utils.load_indexed_dataset( + f"{self.cfg.text_cfg.text_data}/{text_split}", + self.text_dictionary, + combine=True, + ) + mono_dataset = StripTokenDataset(mono_dataset, self.text_dictionary.eos()) + mono_dataset = maybe_shorten_dataset( + mono_dataset, + "xxxxx", + self.cfg.text_cfg.shorten_data_split_list, + self.cfg.text_cfg.shorten_method, + self.cfg.text_cfg.tokens_per_sample - 2, + self.cfg.text_cfg.seed, + ) + logger.info("loaded {} samples from: {}".format(len(mono_dataset), text_split)) + ### prepend bos and eos to dataset + mono_dataset = PrependTokenDataset(mono_dataset, self.text_dictionary.bos()) + mono_dataset = AppendTokenDataset(mono_dataset, self.text_dictionary.eos()) + mono_dataset = DenoisingDatasetLang( + mono_dataset, + mono_dataset.sizes, + self.text_dictionary, + self.mask_idx, + mask_whole_words, + shuffle=self.cfg.text_cfg.shuffle_instance, + seed=self.cfg.text_cfg.seed, + args=self.cfg.text_cfg, + tgt_lang_idx=_lang_token_index(self.text_dictionary, lang) if self.cfg.text_cfg.prepend_tgt_lang_tag else None, + ) + mono_datasets.append(mono_dataset) + + ### 2.2, create paired text datasets using load_langpair_dataset + paired_datasets = [] + if len(text_pairs) > 0: + for text_pair in text_pairs: + text_split, lp = text_pair.rsplit('.', 1) ## e.g. "mt8corpus.de_DE-en_EN" + src, tgt = lp.split("-") + paired_dataset = load_langpair_dataset( + self.cfg.text_cfg.text_data, + text_split, + src, + self.text_src_dictionary, + tgt, + self.text_dictionary, + combine=True, + dataset_impl=None, + upsample_primary=1, + left_pad_source=False, + left_pad_target=False, + max_source_positions=self.cfg.text_cfg.tokens_per_sample, + max_target_positions=self.cfg.text_cfg.tokens_per_sample, + prepend_bos=False, + load_alignments=False, + append_source_id=True if self.cfg.text_cfg.prepend_tgt_lang_tag else False, + lang_format="<lang:{}>" if self.cfg.text_cfg.prepend_tgt_lang_tag else "[{}]", + ) + if self.cfg.text_cfg.mask_text_ratio > 0: + # add mask + noise_token_id = self.text_src_dictionary.index("<mask>") + paired_dataset = LangPairMaskDataset( + paired_dataset, + src_bos=self.text_src_dictionary.bos(), + src_eos=self.text_src_dictionary.eos(), + noise_id=noise_token_id, + mask_ratio=self.cfg.text_cfg.mask_text_ratio, + ) + paired_datasets.append(paired_dataset) + + + ### 3rd, compose a MultiCorpusDataset + dataset_dict, max_positions_dict, distributions, max_tokens_ratios = self.resample_multi_modality_dataset( + speech_dataset, mono_datasets, paired_datasets, text_monos, text_pairs, epoch=epoch, + ) + self.datasets[split] = MultiCorpusDataset( + dataset_dict, + max_positions=max_positions_dict, + distribution=distributions, + max_tokens_ratio=max_tokens_ratios, + seed=self.cfg.text_cfg.seed, + sort_indices=True, + check_length=False, + ) + + def max_positions(self) -> Tuple[int, int]: + return (sys.maxsize, sys.maxsize) + + def filter_indices_by_size( + self, indices: np.array, *args, **kwargs + ) -> np.array: + return indices + + def get_batch_iterator( + self, + dataset, + max_tokens=None, + max_sentences=None, + max_positions=None, + ignore_invalid_inputs=False, + required_batch_size_multiple=1, + seed=1, + num_shards=1, + shard_id=0, + num_workers=0, + epoch=1, + data_buffer_size=0, + disable_iterator_cache=False, + 
skip_remainder_batch=False,
+        grouped_shuffling=False,
+        update_epoch_batch_itr=False,
+    ):
+        """
+        A wrapper of fairseq's FairseqTask.get_batch_iterator, used only for pre-training; see
+
+        https://github.com/facebookresearch/fairseq/blob/main/fairseq/tasks/fairseq_task.py
+
+        Returns:
+            ~fairseq.iterators.EpochBatchIterator: a batched iterator over the
+            given dataset split
+        """
+        if not isinstance(dataset, MultiCorpusDataset):
+            return super().get_batch_iterator(
+                dataset,
+                max_tokens=max_tokens,
+                max_sentences=max_sentences,
+                max_positions=max_positions,
+                ignore_invalid_inputs=ignore_invalid_inputs,
+                required_batch_size_multiple=required_batch_size_multiple,
+                seed=seed,
+                num_shards=num_shards,
+                shard_id=shard_id,
+                num_workers=num_workers,
+                epoch=epoch,
+                data_buffer_size=data_buffer_size,
+                disable_iterator_cache=disable_iterator_cache,
+                skip_remainder_batch=skip_remainder_batch,
+                grouped_shuffling=grouped_shuffling,
+                update_epoch_batch_itr=update_epoch_batch_itr,
+            )
+
+        can_reuse_epoch_itr = (
+            not disable_iterator_cache
+            and not update_epoch_batch_itr
+            and self.can_reuse_epoch_itr(dataset)
+        )
+        if can_reuse_epoch_itr and dataset in self.dataset_to_epoch_iter:
+            logger.debug("reusing EpochBatchIterator for epoch {}".format(epoch))
+            return self.dataset_to_epoch_iter[dataset]
+
+        assert isinstance(dataset, FairseqDataset)
+
+        # initialize the dataset with the correct starting epoch
+        dataset.set_epoch(epoch)
+
+        # get indices ordered by example size
+        with data_utils.numpy_seed(seed):
+            indices = dataset.ordered_indices()
+
+        # create mini-batches with given size constraints
+        batch_sampler = dataset.get_batch_sampler(
+            indices,
+            num_shards,
+            seed,
+            max_tokens=max_tokens,
+            max_sentences=max_sentences,
+            required_batch_size_multiple=required_batch_size_multiple,
+            split_modality_batch=self.cfg.split_modality_batch,
+        )
+
+        # return a reusable, sharded iterator
+        epoch_iter = iterators.EpochBatchIterator(
+            dataset=dataset,
+            collate_fn=dataset.collater,
+            batch_sampler=batch_sampler,
+            seed=seed,
+            num_shards=num_shards,
+            shard_id=shard_id,
+            num_workers=num_workers,
+            epoch=epoch,
+            buffer_size=data_buffer_size,
+            skip_remainder_batch=skip_remainder_batch,
+            disable_shuffling=True,
+            grouped_shuffling=grouped_shuffling,
+        )
+
+        if can_reuse_epoch_itr:
+            self.dataset_to_epoch_iter[dataset] = epoch_iter
+
+        return epoch_iter
+
+    @classmethod
+    def _get_size_ratios(cls, ids: List[str], sizes: List[int], alpha: float = 1.0):
+        """Size ratios for temperature-based sampling
+        (https://arxiv.org/abs/1907.05019)"""
+        _sizes = np.array(sizes)
+        prob = _sizes / _sizes.sum()
+        smoothed_prob = prob ** alpha
+        smoothed_prob = smoothed_prob / smoothed_prob.sum()
+        size_ratio = (smoothed_prob * _sizes.sum()) / _sizes
+
+        o_str = str({_i: f"{prob[i]:.3f}" for i, _i in enumerate(ids)})
+        logger.info(f"original sampling probability: {o_str}")
+        p_str = str({_i: f"{smoothed_prob[i]:.3f}" for i, _i in enumerate(ids)})
+        logger.info(f"balanced sampling probability: {p_str}")
+        sr_str = str({_id: f"{size_ratio[i]:.3f}" for i, _id in enumerate(ids)})
+        logger.info(f"balanced sampling size ratio: {sr_str}")
+        return size_ratio.tolist()
+
+    def resample_multi_modality_dataset(self, speech_dataset, mono_datasets, paired_datasets, mono_splits, paired_splits, epoch=1, train=True):
+        assert len(mono_datasets+paired_datasets) > 0, f"No text data loaded!"
+ + text_datasets = mono_datasets+paired_datasets + if len(text_datasets) > 1 and self.cfg.text_sampling_alpha != 1.0: + size_ratios = self._get_size_ratios( + mono_splits + paired_splits, [len(s) for s in mono_datasets + paired_datasets], alpha=self.cfg.text_sampling_alpha + ) + text_datasets = [ + ResamplingDataset( + d, size_ratio=r, seed=0, epoch=epoch, replace=(r >= 1.0) + ) + for d, r in zip(text_datasets, size_ratios) + ] + + mono_datasets = text_datasets[:len(mono_datasets)] + paired_datasets = text_datasets[len(mono_datasets):] + dataset_list = [speech_dataset] + for datasets in [mono_datasets, paired_datasets]: + if len(datasets) > 0: + dataset_list.append(ConcatDataset(datasets)) + else: + dataset_list.append(None) + + ### match speech/text datasets according to modality + dataset_dict = OrderedDict((name, d) for name, d in zip(["speech", "text_mono", "text_paired"], dataset_list) if d is not None) + max_positions_dict = OrderedDict((name, None) for name in dataset_dict.keys()) + if "text_paired" in dataset_dict: + max_positions_dict["text_paired"] = (self.cfg.text_cfg.tokens_per_sample, self.cfg.text_cfg.tokens_per_sample) + dataset_lens = np.array([len(dataset) for dataset in dataset_dict.values()]) + dataset_avg_sample_lens = np.array([ + sum([dataset.num_tokens(i) for i in np.random.randint(low=0, high=len(dataset), size=10000)]) / 10000.0 + for dataset in dataset_dict.values() + ]) + max_tokens_ratios = [1.0 / 320 / self.cfg.text_cfg.text_maxtokens_ratio] * len(dataset_dict) + + if not "speech" in dataset_dict: + distributions = [l / sum(dataset_lens) for l in dataset_lens] + else: + ## we just keep the batches of speech and non-speech the same + first_ratio = dataset_lens[0] / sum(dataset_lens) + distributions = [max_tokens_ratios[0] * dataset_avg_sample_lens[0] / l for l in dataset_avg_sample_lens] + text_total = sum(dataset_lens[1:]) + distributions = [1.2 * d * n / text_total for d, n in zip(distributions, dataset_lens)] + max_tokens_ratios[0] = 1.0 + distributions[0] = 1.0 + distributions = [first_ratio * d for d in distributions] + + logging.info(f"Number samples of datasets is {dataset_lens}") + logging.info(f"Avg sample length of datasets is {dataset_avg_sample_lens}") + logging.info(f"Sampling distributions is {distributions}") + logging.info(f"Maxtokens ratio is {max_tokens_ratios}") + return dataset_dict, max_positions_dict, distributions, max_tokens_ratios + + def build_tokenizer(self, cfg=None): + logger.info(f"tokenizer: {self.cfg.hubert_tokenizer}") + if self.cfg.hubert_tokenizer != "none": + return encoders.build_bpe(Namespace(**{"bpe": self.cfg.hubert_tokenizer, "sentencepiece_model": self.cfg.sp_path})) + else: + return None diff --git a/YiTrans/yitrans_iwslt22/tasks/iwslt_translation_from_pretrain.py b/YiTrans/yitrans_iwslt22/tasks/iwslt_translation_from_pretrain.py new file mode 100644 index 0000000000000000000000000000000000000000..72e0d95be17411ab8877be83c90fd4d7ba6a1091 --- /dev/null +++ b/YiTrans/yitrans_iwslt22/tasks/iwslt_translation_from_pretrain.py @@ -0,0 +1,252 @@ +# -------------------------------------------------------- +# The YiTrans End-to-End Speech Translation System for IWSLT 2022 Offline Shared Task (https://arxiv.org/abs/2206.05777) +# Github source: https://github.com/microsoft/SpeechT5/tree/main/YiTrans +# Copyright (c) 2022 Microsoft +# Licensed under The MIT License [see LICENSE for details] +# Based on fairseq code bases +# https://github.com/facebookresearch/fairseq +# -------------------------------------------------------- 
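The `_get_size_ratios` helper above implements temperature-based corpus sampling from https://arxiv.org/abs/1907.05019. The standalone sketch below (not part of the repository; the function name and corpus sizes are illustrative) mirrors that computation so the effect of `alpha` can be checked in isolation:

```python
import numpy as np

def temperature_sampling_ratios(sizes, alpha=0.5):
    """Mirror of _get_size_ratios: up/down-sampling ratios for temperature-based sampling."""
    sizes = np.array(sizes, dtype=float)
    prob = sizes / sizes.sum()               # original sampling probability per corpus
    smoothed = prob ** alpha                 # temperature smoothing (alpha < 1 flattens the distribution)
    smoothed = smoothed / smoothed.sum()
    return (smoothed * sizes.sum()) / sizes  # per-corpus resampling ratio

# One large and one small text corpus (sizes are made up):
print(temperature_sampling_ratios([1_000_000, 10_000], alpha=0.5))
# -> approximately [0.92, 9.18]: the small corpus is up-sampled ~9x,
#    the large one is slightly down-sampled; with alpha=1.0 all ratios are 1.0
```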
+""" + Modified from + https://github.com/facebookresearch/fairseq/blob/main/fairseq/tasks/translation.py + +""" + +import torch +import logging +from dataclasses import dataclass, field +from typing import List, Optional, NamedTuple + +from fairseq import utils +from fairseq.data import LanguagePairDataset, TransformEosLangPairDataset, FairseqDataset +from fairseq.tasks import register_task +from fairseq.tasks.translation import TranslationTask, TranslationConfig + +from yitrans_iwslt22.data.concat_dataset import ConcatDataset +from yitrans_iwslt22.data.load_langpair_dataset import load_langpair_dataset + +logger = logging.getLogger(__name__) + + + +class LangPairStripDataset(FairseqDataset): + def __init__( + self, + dataset: LanguagePairDataset, + src_eos: int, + src_bos: Optional[int] = None, + noise_id: Optional[int] = -1, + mask_ratio: Optional[float] = 0, + mask_type: Optional[str] = "random", + ): + self.dataset = dataset + self.src_eos = src_eos + self.src_bos = src_bos + self.noise_id = noise_id + self.mask_ratio = mask_ratio + self.mask_type = mask_type + assert mask_type in ("random", "tail") + + @property + def src_sizes(self): + return self.dataset.src_sizes + + @property + def tgt_sizes(self): + return self.dataset.tgt_sizes + + @property + def sizes(self): + # dataset.sizes can be a dynamically computed sizes: + return self.dataset.sizes + + def get_batch_shapes(self): + return self.dataset.buckets + + def num_tokens_vec(self, indices): + return self.dataset.num_tokens_vec(indices) + + def __len__(self): + return len(self.dataset) + + def num_tokens(self, index): + return self.dataset.num_tokens(index) + + def size(self, index): + return self.dataset.size(index) + + def ordered_indices(self): + return self.dataset.ordered_indices() + + @property + def supports_prefetch(self): + return getattr(self.dataset, "supports_prefetch", False) + + def prefetch(self, indices): + return self.dataset.prefetch(indices) + + def mask_src_tokens(self, sample): + src_item = sample["source"] + mask = None + if self.mask_type == "random": + mask = torch.rand(len(src_item)).le(self.mask_ratio) + else: + mask = torch.ones(len(src_item)) + mask[: int(len(src_item) * (1 - self.mask_ratio))] = 0 + mask = mask.eq(1) + mask[-1] = False + if src_item[0] == self.src_bos: + mask[0] = False + if src_item[-2] == self.src_eos: + mask[-2] = False + no_mask = ~mask + mask_src_item = src_item[no_mask] + smp = sample + smp["source"] = mask_src_item + print(f"{len(src_item)}: {src_item}") + print(f"{len(mask_src_item)}: {mask_src_item}") + return smp + + def __getitem__(self, index): + sample = self.dataset[index] + if self.mask_ratio > 0: + sample = self.mask_src_tokens(sample) + return sample + + def collater(self, samples, pad_to_length=None): + return self.dataset.collater(samples, pad_to_length=pad_to_length) + + +@dataclass +class AddTranslationConfig(TranslationConfig): + langs: str = "" + prepend_bos: bool = False + normalize: bool = False + append_source_id: bool = False + mask_text_ratio: float = 0 + ### ShrinkingDataset related, not used + shrink_start_epoch: int = 0 + shrink_end_epoch: int = 0 + shrink_start_ratio: float = 1.0 + shrink_end_ratio: float = 1.0 + + +@register_task("iwslt_translation_from_pretrained", dataclass=AddTranslationConfig) +class TranslationFromPretrainedTask(TranslationTask): + args: AddTranslationConfig + + def __init__(self, args: AddTranslationConfig, src_dict, tgt_dict): + super().__init__(args, src_dict, tgt_dict) + self.args = args + self.langs = args.langs.split(",") + 
for d in [src_dict, tgt_dict]: + for l in self.langs: + d.add_symbol("[{}]".format(l)) + d.add_symbol("<mask>") + + + def load_dataset(self, split, epoch=1, combine=False, **kwargs): + """Load a given dataset split. + + Args: + split (str): name of the split (e.g., train, valid, test) + """ + paths = utils.split_paths(self.args.data) + assert len(paths) > 0 + data_path = paths[(epoch - 1) % len(paths)] + + # infer langcode + src, tgt = self.args.source_lang, self.args.target_lang + + paired_datasets = [] + for sub_split in split.split(","): + paired_dataset= load_langpair_dataset( + data_path, + sub_split, + src, + self.src_dict, + tgt, + self.tgt_dict, + combine=combine, + dataset_impl=self.args.dataset_impl, + upsample_primary=self.args.upsample_primary, + left_pad_source=self.args.left_pad_source, + left_pad_target=self.args.left_pad_target, + max_source_positions=getattr(self.args, "max_source_positions", 1024), + max_target_positions=getattr(self.args, "max_target_positions", 1024), + load_alignments=self.args.load_alignments, + prepend_bos=getattr(self.args, "prepend_bos", False), + append_source_id=getattr(self.args, "append_source_id", False), + ) + if not split.startswith("valid") and getattr(self.args, "mask_text_ratio", 0) > 0 and not sub_split.startswith("asr_"): + mask_text_ratio = getattr(self.args, "mask_text_ratio", 0) + noise_token_id = self.src_dict.index("<mask>") + logger.info(f"Masking {sub_split} at a probability: {mask_text_ratio}") + paired_dataset = LangPairStripDataset( + paired_dataset, + src_bos=self.src_dict.bos(), + src_eos=self.src_dict.eos(), + noise_id=noise_token_id, + mask_ratio=mask_text_ratio, + ) + paired_datasets.append(paired_dataset) + paired_dataset = paired_datasets[0] if len(paired_datasets) == 1 else ConcatDataset(paired_datasets, 1) + + if getattr(self.args, "append_source_id", False): + logger.info(f"Appending <lang-id> to the end of samples") + self.datasets[split] = paired_dataset + else: + logger.info(f"Replacing <eos> with <lang-id> for prev_output_tokens") + self.datasets[split] = TransformEosLangPairDataset( + paired_dataset, + src_eos=self.src_dict.eos(), + tgt_bos=self.tgt_dict.eos(), # 'prev_output_tokens' starts with eos + new_tgt_bos=self.tgt_dict.index("[{}]".format(tgt)), + ) + + def build_generator(self, models, args, **unused): + if getattr(args, "score_reference", False): + from fairseq.sequence_scorer import SequenceScorer + + return SequenceScorer( + self.target_dictionary, + eos=self.tgt_dict.index("[{}]".format(self.args.target_lang)), + ) + else: + from yitrans_iwslt22.sequence_generator import SequenceGenerator + + return SequenceGenerator( + models, + self.target_dictionary, + beam_size=getattr(args, "beam", 5), + max_len_a=getattr(args, "max_len_a", 0), + max_len_b=getattr(args, "max_len_b", 200), + min_len=getattr(args, "min_len", 1), + normalize_scores=(not getattr(args, "unnormalized", False)), + len_penalty=getattr(args, "lenpen", 1), + unk_penalty=getattr(args, "unkpen", 0), + temperature=getattr(args, "temperature", 1.0), + match_source_len=getattr(args, "match_source_len", False), + no_repeat_ngram_size=getattr(args, "no_repeat_ngram_size", 0), + eos=self.tgt_dict.index("[{}]".format(self.args.target_lang)) if getattr(self.args, "append_source_id", False) else None, + bos=None if getattr(self.args, "append_source_id", False) else self.tgt_dict.index("[{}]".format(self.args.target_lang)) + ) + + def build_dataset_for_inference(self, src_tokens, src_lengths, constraints=None): + if getattr(self.args, 
"append_source_id", False): + src_lang_id = self.source_dictionary.index("[{}]".format(self.args.source_lang)) + source_tokens = [] + for s_t in src_tokens: + s_t = torch.cat([s_t, s_t.new(1).fill_(src_lang_id)]) + source_tokens.append(s_t) + else: + source_tokens = src_tokens + + dataset = LanguagePairDataset( + source_tokens, + src_lengths, + self.source_dictionary, + tgt_dict=self.target_dictionary, + constraints=constraints, + ) + return dataset diff --git a/asr_train.sh b/asr_train.sh new file mode 100644 index 0000000000000000000000000000000000000000..341c46f70f634902fc4fbf90c68c13fb233b3a1b --- /dev/null +++ b/asr_train.sh @@ -0,0 +1,187 @@ +#!/bin/bash + +# 初始化默认值 +TRAIN_SET="train" +VALID_SET="valid" + +# 定义帮助函数 +usage() { + cat <<EOF +Usage: + $(basename "$0") [options] + +Options: + -h - This help + --dcu <number> - The number of dcu + --log <log directory> - The log file of train + --td <train dir> - The directory of (train.tsv, valid.tsv) + --res <result dir> - The directory of (xxx.pt) + --lab <label dir> - The directory of (train.txt, valid.txt) + --token <tokenizer path> - The path of BPE_TOKENIZER + --speecht5 <speecht5 path> - The path of speecht5 + --checkpoint <checkpoint path> - The path of speecht5_base.pt + --epoch - epoch of train +EOF +} + +# 主处理逻辑 +process_long_option() { + local arg="$1" + local key="${arg%%=*}" + local value="${arg#*=}" + + if [[ "$key" == "$arg" ]]; then + # 如果没有等号,则假定下一个参数是值 + value="$2" + if [[ -z "$value" || "$value" == -* ]]; then + echo "Option $key requires a value." + usage + exit 1 + fi + # 移除已处理的值 + shift + fi + + case "$key" in + --dcu) + dcu="$value" + ;; + --log) + logdir="$value" + ;; + --td) + DATA_ROOT="$value" + ;; + --res) + SAVE_DIR="$value" + ;; + --lab) + LABEL_DIR="$value" + ;; + --token) + BPE_TOKENIZER="$value" + ;; + --speecht5) + USER_DIR="$value" + ;; + --checkpoint) + PT_CHECKPOINT_PATH="$value" + ;; + --epoch) + epoch="$value" + ;; + -h) + usage + exit 0 + ;; + *) + echo "Unknown option: $key" + usage + exit 1 + ;; + esac +} + +# 解析命令行参数 +while [[ $# -gt 0 ]]; do + case "$1" in + --*) + # 处理长选项及其值 + process_long_option "$1" "$2" + # 移除已处理的选项 + shift 2 + ;; + -h) + usage + exit 0 + ;; + -*) + echo "Invalid option: $1" + usage + exit 1 + ;; + *) + break + ;; + esac +done + +# 创建日志目录 +mkdir -p "$logdir" + +# 创建保存目录 +mkdir -p "$SAVE_DIR" + +# 日志文件 +all_log="$logdir/all-${dcu}-log.log" +benchmark_log="$logdir/train-log.log" + +# 输出解析结果(仅为演示) +echo "dcu: $dcu" +echo "log: $logdir" +echo "DATA_ROOT: $DATA_ROOT" +echo "SAVE_DIR: $SAVE_DIR" +echo "LABEL_DIR: $LABEL_DIR" +echo "BPE_TOKENIZER: $BPE_TOKENIZER" +echo "USER_DIR: $USER_DIR" +echo "PT_CHECKPOINT_PATH: $PT_CHECKPOINT_PATH" +echo "epoch: $epoch" + +# 调用 fairseq-train 并传递参数 +fairseq-train "$DATA_ROOT" \ + --save-dir "$SAVE_DIR" \ + --tensorboard-logdir "$SAVE_DIR" \ + --train-subset "$TRAIN_SET" \ + --valid-subset "$VALID_SET" \ + --hubert-label-dir "$LABEL_DIR" \ + --distributed-world-size "$dcu" \ + --distributed-port 0 \ + --ddp-backend legacy_ddp \ + --user-dir "$USER_DIR" \ + --log-format json \ + --seed 1 \ + \ + --task speecht5 \ + --t5-task s2t \ + --sample-rate 16000 \ + --num-workers 0 \ + --max-tokens 1600000 \ + --update-freq 2 \ + --bpe-tokenizer "$BPE_TOKENIZER" \ + \ + --criterion speecht5 \ + --report-accuracy \ + --zero-infinity \ + --ce-weight 0.5 \ + --ctc-weight 0.5 \ + --sentence-avg \ + \ + --optimizer adam \ + --adam-betas "(0.9, 0.98)" \ + --adam-eps 1e-08 \ + --weight-decay 0.1 \ + --clip-norm 25.0 \ + --lr 0.00006 \ + --lr-scheduler tri_stage 
\ + --phase-ratio "[0.1, 0.4, 0.5]" \ + --final-lr-scale 0.05 \ + \ + --max-epoch "$epoch" \ + --max-update 80000 \ + --max-text-positions 600 \ + --required-batch-size-multiple 1 \ + --save-interval-updates 3000 \ + --skip-invalid-size-inputs-valid-test \ + \ + --arch t5_transformer_base_asr \ + --share-input-output-embed \ + --find-unused-parameters \ + --bert-init \ + --relative-position-embedding \ + --freeze-encoder-updates 13000 \ + \ + --keep-last-epochs 10 \ + --feature-grad-mult 1.0 \ + --best-checkpoint-metric s2t_accuracy \ + --maximize-best-checkpoint-metric \ + --finetune-from-model "$PT_CHECKPOINT_PATH" 2>&1 | tee "$all_log" \ No newline at end of file diff --git a/calc4vc.py b/calc4vc.py new file mode 100644 index 0000000000000000000000000000000000000000..35925334acb4955bcf75290ba2796b27a12e7489 --- /dev/null +++ b/calc4vc.py @@ -0,0 +1,40 @@ +import numpy as np +import librosa +from scipy.spatial.distance import euclidean +from jiwer import wer + +def extract_mfccs(signal, sample_rate): + mfccs = librosa.feature.mfcc(y=signal, sr=sample_rate, n_mfcc=13) + return mfccs + +def calculate_mcd(original_mfccs, converted_mfccs): + distances = [euclidean(original_mfccs[:, k], converted_mfccs[:, k]) for k in range(original_mfccs.shape[1])] + mcd = np.sqrt(np.mean([d**2 for d in distances])) + return mcd + +def calculate_wer(reference_text, hypothesis_text): + return wer(reference_text, hypothesis_text) + +# 示例音频文件 +original_audio_file = 'original.wav' +converted_audio_file = 'converted.wav' + +# 加载音频 +original_signal, sample_rate = librosa.load(original_audio_file, sr=None) +converted_signal, _ = librosa.load(converted_audio_file, sr=None) + +# 提取MFCCs +original_mfccs = extract_mfccs(original_signal, sample_rate) +converted_mfccs = extract_mfccs(converted_signal, sample_rate) + +# 计算MCD +mcd = calculate_mcd(original_mfccs, converted_mfccs) +print(f"MCD: {mcd}") + +# 假设我们有参考文本和转换后的文本 +reference_text = "This is a reference text." +hypothesis_text = "This is a reference text." + +# 计算WER +wer_value = calculate_wer(reference_text, hypothesis_text) +print(f"WER: {wer_value * 100}%") \ No newline at end of file diff --git a/dict.txt b/dict.txt new file mode 100644 index 0000000000000000000000000000000000000000..61ebbfe8cec6b9f80fb9d34ddae96a5e64e61366 --- /dev/null +++ b/dict.txt @@ -0,0 +1,75 @@ +▁ 1 +e 1 +t 1 +a 1 +o 1 +n 1 +i 1 +h 1 +s 1 +r 1 +d 1 +l 1 +u 1 +c 1 +m 1 +f 1 +w 1 +g 1 +y 1 +, 1 +p 1 +b 1 +. 1 +v 1 +k 1 +" 1 +I 1 +' 1 +T 1 +A 1 +S 1 +H 1 +; 1 +x 1 +W 1 +- 1 +B 1 +? 1 +C 1 +M 1 +! 
1 +q 1 +j 1 +E 1 +N 1 +P 1 +O 1 +D 1 +L 1 +G 1 +R 1 +F 1 +Y 1 +z 1 +J 1 +: 1 +K 1 +U 1 +V 1 +) 1 +( 1 +Q 1 +Z 1 +] 1 +[ 1 +X 1 +— 1 +/ 1 +æ 1 +é 1 +{ 1 +} 1 +ê 1 +œ 1 +̄ 1 diff --git a/docker/Dockerfile b/docker/Dockerfile new file mode 100644 index 0000000000000000000000000000000000000000..25729b25a0dc329252b12b4dc0f23f0081702fb3 --- /dev/null +++ b/docker/Dockerfile @@ -0,0 +1,2 @@ +FROM image.sourcefind.cn:5000/dcu/admin/base/pytorch:2.1.0-centos7.6-dtk24.04-py310 +RUN source /opt/dtk/env.sh \ No newline at end of file diff --git a/icon.png b/icon.png new file mode 100644 index 0000000000000000000000000000000000000000..5fb920f3003e8037c3692f0781aa19693c9e1c36 Binary files /dev/null and b/icon.png differ diff --git a/images/algorithm.png b/images/algorithm.png new file mode 100644 index 0000000000000000000000000000000000000000..cbd2f8bef07dc5252c8c6011354d5d01da694752 Binary files /dev/null and b/images/algorithm.png differ diff --git a/images/model_architecture.png b/images/model_architecture.png new file mode 100644 index 0000000000000000000000000000000000000000..7cb62dd6ac646f3420ec4f7570b5a819f8649c0b Binary files /dev/null and b/images/model_architecture.png differ diff --git a/libri_labels.py b/libri_labels.py new file mode 100644 index 0000000000000000000000000000000000000000..277ff2dfbba6d9855522cf64571b7207e9abcb68 --- /dev/null +++ b/libri_labels.py @@ -0,0 +1,56 @@ +#!/usr/bin/env python3 +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +""" +Helper script to pre-compute embeddings for a flashlight (previously called wav2letter++) dataset +""" + +import argparse +import os + + +def main(): + parser = argparse.ArgumentParser() + parser.add_argument("tsv") + parser.add_argument("--output-dir", required=True) + parser.add_argument("--output-name", required=True) + args = parser.parse_args() + + os.makedirs(args.output_dir, exist_ok=True) + + transcriptions = {} + + with open(args.tsv, "r") as tsv, open( + os.path.join(args.output_dir, args.output_name + ".ltr"), "w" + ) as ltr_out, open( + os.path.join(args.output_dir, args.output_name + ".wrd"), "w" + ) as wrd_out: + root = next(tsv).strip() + for line in tsv: + line = line.strip() + dir = os.path.dirname(line) + if dir not in transcriptions: + parts = dir.split(os.path.sep) + trans_path = f"{parts[-2]}-{parts[-1]}.trans.txt" + path = os.path.join(root, dir, trans_path) + assert os.path.exists(path) + texts = {} + with open(path, "r") as trans_f: + for tline in trans_f: + items = tline.strip().split() + texts[items[0]] = " ".join(items[1:]) + transcriptions[dir] = texts + part = os.path.basename(line).split(".")[0] + assert part in transcriptions[dir] + print(transcriptions[dir][part], file=wrd_out) + print( + " ".join(list(transcriptions[dir][part].replace(" ", "|"))) + " |", + file=ltr_out, + ) + + +if __name__ == "__main__": + main() \ No newline at end of file diff --git a/librispeech_asr_demo.py b/librispeech_asr_demo.py new file mode 100644 index 0000000000000000000000000000000000000000..01ed586e70287a3e8f5c3e07c2095f5c6e97baa6 --- /dev/null +++ b/librispeech_asr_demo.py @@ -0,0 +1,137 @@ +# coding=utf-8 +# Copyright 2021 The TensorFlow Datasets Authors and the HuggingFace Datasets Authors. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. 
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +# Lint as: python3 +"""Librispeech automatic speech recognition dataset.""" + +from __future__ import absolute_import, division, print_function + +import glob +import os + +import datasets + + +_CITATION = """\ +@inproceedings{panayotov2015librispeech, + title={Librispeech: an ASR corpus based on public domain audio books}, + author={Panayotov, Vassil and Chen, Guoguo and Povey, Daniel and Khudanpur, Sanjeev}, + booktitle={Acoustics, Speech and Signal Processing (ICASSP), 2015 IEEE International Conference on}, + pages={5206--5210}, + year={2015}, + organization={IEEE} +} +""" + +_DESCRIPTION = """\ +LibriSpeech is a corpus of approximately 1000 hours of read English speech with sampling rate of 16 kHz, +prepared by Vassil Panayotov with the assistance of Daniel Povey. The data is derived from read +audiobooks from the LibriVox project, and has been carefully segmented and aligned. + +Note that in order to limit the required storage for preparing this dataset, the audio +is stored in the .flac format and is not converted to a float32 array. To convert, the audio +file to a float32 array, please make use of the `.map()` function as follows: + + +```python +import soundfile as sf + +def map_to_array(batch): + speech_array, _ = sf.read(batch["file"]) + batch["speech"] = speech_array + return batch + +dataset = dataset.map(map_to_array, remove_columns=["file"]) +``` +""" + +_URL = "http://www.openslr.org/12" +_DL_URL = "https://s3.amazonaws.com/datasets.huggingface.co/librispeech_asr/2.1.0/" +_DL_URL = "https://s3.amazonaws.com/datasets.huggingface.co/librispeech_asr/2.1.0/" + +_DL_URLS = { + "clean": { + "dev": _DL_URL + "dev_clean.tar.gz", + } +} + + +class LibrispeechASRConfig(datasets.BuilderConfig): + """BuilderConfig for LibriSpeechASR.""" + + def __init__(self, **kwargs): + """ + Args: + data_dir: `string`, the path to the folder containing the files in the + downloaded .tar + citation: `string`, citation for the data set + url: `string`, url for information about the data set + **kwargs: keyword arguments forwarded to super. 
+ """ + super(LibrispeechASRConfig, self).__init__(version=datasets.Version("2.1.0", ""), **kwargs) + + +class LibrispeechASR(datasets.GeneratorBasedBuilder): + """Librispeech dataset.""" + + BUILDER_CONFIGS = [ + LibrispeechASRConfig(name="clean", description="'Clean' speech."), + LibrispeechASRConfig(name="other", description="'Other', more challenging, speech."), + ] + + def _info(self): + return datasets.DatasetInfo( + description=_DESCRIPTION, + features=datasets.Features( + { + "file": datasets.Value("string"), + "audio": datasets.features.Audio(sampling_rate=16_000), + "text": datasets.Value("string"), + "speaker_id": datasets.Value("int64"), + "chapter_id": datasets.Value("int64"), + "id": datasets.Value("string"), + } + ), + supervised_keys=("speech", "text"), + homepage=_URL, + citation=_CITATION, + ) + + def _split_generators(self, dl_manager): + archive_path = dl_manager.download_and_extract(_DL_URLS[self.config.name]) + return [ + datasets.SplitGenerator(name=datasets.Split.VALIDATION, gen_kwargs={"archive_path": archive_path["dev"], "split_name": f"dev_{self.config.name}"}), + ] + + def _generate_examples(self, archive_path, split_name): + """Generate examples from a Librispeech archive_path.""" + transcripts_glob = os.path.join(archive_path, split_name, "*/*/*.txt") + for transcript_file in sorted(glob.glob(transcripts_glob)): + path = os.path.dirname(transcript_file) + with open(os.path.join(path, transcript_file)) as f: + for line in f: + line = line.strip() + key, transcript = line.split(" ", 1) + audio_file = f"{key}.flac" + speaker_id, chapter_id = [int(el) for el in key.split("-")[:2]] + example = { + "id": key, + "speaker_id": speaker_id, + "chapter_id": chapter_id, + "file": os.path.join(path, audio_file), + "audio": os.path.join(path, audio_file), + "text": transcript, + } + yield key, example diff --git a/model.properties b/model.properties new file mode 100644 index 0000000000000000000000000000000000000000..fc102f7986ac9f2ad8556b1e703e9f3e3f3fa633 --- /dev/null +++ b/model.properties @@ -0,0 +1,10 @@ +#模型编码 +modelCode=870 +# 模型名称 +modelName=speecht5_pytorch +# 模型描述 +modelDescription=speecht5是微软推出的语音模型,支持文本到语音,语音到语音,语音到文本的多个模态的推理。 +# 应用场景(多个标签以英文逗号分割) +appScenario=推理,语音识别,人声变声,语音合成,金融,通信,广媒 +# 框架类型(多个标签以英文逗号分割) +frameType=PyTorch \ No newline at end of file diff --git a/requirements.txt b/requirements.txt new file mode 100644 index 0000000000000000000000000000000000000000..8c106b116fe0172ca99ddc5ac252f268e2d58205 --- /dev/null +++ b/requirements.txt @@ -0,0 +1,5 @@ +bitarray==2.9.2 +scikit-learn==1.5.1 +espnet==202402 +soundfile==0.12.1 +numpy==1.23.5 \ No newline at end of file diff --git a/test_asr.sh b/test_asr.sh new file mode 100644 index 0000000000000000000000000000000000000000..18a72462b902181dcda37343ed907b1834a4097f --- /dev/null +++ b/test_asr.sh @@ -0,0 +1,29 @@ +CHECKPOINT_PATH=/public/home/changhl/dataset/speecht5_base_asr.pt +DATA_ROOT=/public/home/changhl/py_project/speecht5_pytorch/valid +SUBSET=valid +BPE_TOKENIZER=/public/home/changhl/py_project/speecht5_pytorch/speecht5_asr/spm_char.model +LABEL_DIR=/public/home/changhl/py_project/speecht5_pytorch/valid +USER_DIR=/public/home/changhl/py_project/SpeechT5/SpeechT5/speecht5 +BEAM=4 +MAX_TOKENS=1600000 +CTC_WEIGHT=0.5 +LM_WEIGHT=0.5 +LM_PATH=/public/home/changhl/dataset/t5_transformer_lm.pt + +fairseq-generate ${DATA_ROOT} \ + --gen-subset ${SUBSET} \ + --bpe-tokenizer ${BPE_TOKENIZER} \ + --user-dir ${USER_DIR} \ + --task speecht5 \ + --t5-task s2t \ + --path ${CHECKPOINT_PATH} \ + 
--hubert-label-dir ${LABEL_DIR} \ + --ctc-weight ${CTC_WEIGHT} \ + --lm-weight ${LM_WEIGHT} \ + --lm-path ${LM_PATH} \ + --max-tokens ${MAX_TOKENS} \ + --beam ${BEAM} \ + --scoring wer \ + --max-len-a 0 \ + --max-len-b 620 \ + --sample-rate 16000 \ No newline at end of file
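As a complement to the simplified Euclidean MCD in `calc4vc.py` above, the sketch below (not part of the repository; the function name and file paths are illustrative) computes the more conventional dB-scaled mel-cepstral distortion with DTW alignment, assuming `librosa` is available. Lower values indicate the converted speech is spectrally closer to the reference.

```python
import numpy as np
import librosa

def mcd_db(ref_wav, conv_wav, sr=16000, n_mfcc=13):
    """Conventional mel-cepstral distortion (dB) between a reference and a converted utterance."""
    ref, _ = librosa.load(ref_wav, sr=sr)
    conv, _ = librosa.load(conv_wav, sr=sr)
    # Drop the 0th coefficient (overall energy), as is common for MCD.
    ref_mfcc = librosa.feature.mfcc(y=ref, sr=sr, n_mfcc=n_mfcc)[1:]
    conv_mfcc = librosa.feature.mfcc(y=conv, sr=sr, n_mfcc=n_mfcc)[1:]
    # DTW-align the two sequences so the frame counts do not have to match.
    _, wp = librosa.sequence.dtw(X=ref_mfcc, Y=conv_mfcc, metric="euclidean")
    diff = ref_mfcc[:, wp[:, 0]] - conv_mfcc[:, wp[:, 1]]
    per_frame = np.sqrt((diff ** 2).sum(axis=0))  # Euclidean distance per aligned frame pair
    return (10.0 / np.log(10.0)) * np.sqrt(2.0) * per_frame.mean()

# mcd = mcd_db("original.wav", "converted.wav")  # paths are placeholders
```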